Measuring novelty in science with word embedding

doi:10.1371/journal.pone.0254034

Fig 1.

Algorithm of novelty computation.

Table 1.

Previous novelty measures.

Table 2.

Questionnaire of novelty.

Fig 2.

Distribution of distance and novelty.

The sample is the same as in the third validation study (prediction of future citation), except that the oversampled highly-cited documents are excluded. The 947 selected documents contain approximately 230,000 combinations of cited references in total, and the distance (Eq 1) is computed for each combination (A). The distances are then summarized at the focal document level (Eq 2); Novel₁₀₀ is displayed as an example (B). Novelty measures with different q values are illustrated in S1 Appendix. Because abstracts and keywords are not available for all documents, the corresponding sample sizes are smaller.
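The two-step computation described in this caption can be sketched as follows. Eq 1 and Eq 2 are not reproduced here, so the sketch assumes that the distance is the cosine distance between the embedding vectors of two cited references and that Novel_q is the q-th percentile of a focal document's distance distribution. The function names and the toy vectors are hypothetical and serve only as illustration.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Assumed form of Eq 1: 1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def percentile(values, q):
    """q-th percentile (0 <= q <= 100) with linear interpolation between ranks."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("at least one distance is required")
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def novelty(ref_vectors, q=100):
    """Assumed form of Eq 2: summarize all pairwise distances of one focal document."""
    distances = [cosine_distance(u, v) for u, v in combinations(ref_vectors, 2)]
    return percentile(distances, q)


# Toy example: three cited references embedded in two dimensions (made-up vectors).
refs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
novel_100 = novelty(refs, q=100)  # maximum distance among the pairs
novel_50 = novelty(refs, q=50)    # median distance among the pairs
```

Under these assumptions, q = 100 picks out the most distant pair of references, while lower q values are less sensitive to a single outlying pair.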

Table 3.

Validation of distance measures.

Fig 3.

Correlation between bibliometric and self-reported novelty measures.

Pearson’s correlation coefficients are shown. Novel_q (q∈{100,99,95,90,80,50}) is correlated with the mean of the four self-reported newness scores (Column 9 in Table 4). †p<0.1, *p<0.05, **p<0.01, ***p<0.001.
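As a reading aid for the coefficient and the significance markers in this caption, the following minimal sketch computes Pearson's r from two equal-length lists and maps a p-value to the marker convention stated above. The function names and the sample numbers are hypothetical and are not data from the study; computing the p-value itself is left out because it requires a t-distribution routine.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def significance_marker(p):
    """Map a p-value to the marker convention used in the caption."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    if p < 0.1:
        return "†"
    return ""


# Made-up numbers: a bibliometric novelty score and a mean self-reported score.
novelty_scores = [0.2, 0.4, 0.6, 0.8]
self_reported = [1.0, 2.0, 3.0, 4.0]
r = pearson_r(novelty_scores, self_reported)
```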

Table 4.

Validation of novelty measures.

Table 5.

Odds ratio of top-1% citation rank.

Fig 4.

Prediction of top-1% citation rank.

The predicted probability of a focal document falling within the top 1 percentile of citations is shown. For easier interpretation and comparison, the horizontal axis shows the percentile of the novelty measures. (A) is based on Row 1 in Table 5; (B) and (C) are based on curvilinear models that incorporate the quadratic term of the novelty measures (S1 Appendix).
