Fig 1.
Description of PreprintMatch.
Fig 2.
CONSORT-style flow diagram showing the process of test set construction. From a simple random sample of 1,000 preprints drawn from the set of all bioRxiv and medRxiv preprints, 578 were considered true positives and 404 true negatives. Preprints were classified as true negatives when neither bio/medRxiv nor PreprintMatch reported a match, and as true positives when both reported the same matched paper. In all other cases, manual curation was used to assign the true label.
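The labeling rule in the Fig 2 caption can be illustrated with a minimal sketch. The function name `assign_label`, its arguments, and the returned strings are hypothetical and are not taken from PreprintMatch; a match is represented here as an identifier string such as a DOI, with `None` meaning no match was reported.

```python
from typing import Optional


def assign_label(biorxiv_match: Optional[str],
                 preprintmatch_match: Optional[str]) -> str:
    """Return a provisional test-set label for one preprint.

    Both arguments are hypothetical: an identifier of the matched paper
    (for example a DOI), or None when the source reports no match.
    """
    # Neither source reports a match: true negative.
    if biorxiv_match is None and preprintmatch_match is None:
        return "true_negative"
    # Both sources report the same matched paper: true positive.
    if biorxiv_match is not None and biorxiv_match == preprintmatch_match:
        return "true_positive"
    # Any disagreement is sent to manual curation.
    return "manual_curation"
```

Under these assumptions, a preprint for which only one source reports a match, or for which the two sources report different papers, is routed to manual curation rather than labeled automatically.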
Table 1.
Preprint matching method comparison on test set.
Table 2.
Tool comparison on Cabanac et al.’s dataset.
Fig 3.
Effect of time between preprint and paper publication.
(a) Histogram of the time gap, in weeks, between preprint posting and paper publication. (b) Median abstract similarity between preprint and paper, measured with cosine similarity, over time. (c) Mean title similarity between preprint and paper, measured with cosine similarity, over time.
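For readers unfamiliar with the measure used in panels (b) and (c), the sketch below computes cosine similarity between two texts. The captions do not state how texts are converted to vectors, so the bag-of-words word counts used here are an assumption made purely for illustration, and the function name `cosine_similarity` is hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity of two texts using word-count vectors.

    The word-count representation is an illustrative assumption, not
    necessarily the representation used for the figure.
    """
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    # Dot product over the words the two texts share.
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    # An empty text has no direction; treat its similarity as zero.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

With this representation, identical texts score 1 (up to floating-point rounding) and texts that share no words score 0.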
Fig 4.
Country-level preprint analysis.
(a) Map of the rate of preprint publication for the top 50 research-producing countries. (b) Preprint publication rate by World Bank country income group. (c) Time gap between preprint posting and paper publication, in days, for income groups. (d) Median abstract similarity between preprint and paper for income groups. Horizontal line shows the median of all published preprints. (e) Mean title similarity between preprint and paper for income groups. Horizontal line shows the mean of all published preprints. *** p < 0.001, n.s. not significant.
Table 3.
Income groups and top countries from each.
Table 4.
Preprint reporting of funding.
Percentage of medRxiv preprints from each income group that report external funding.
Fig 5.
(a) Average number of authors added from preprint to paper for each income group. Preprint-paper pairs with 10 or more author changes were excluded. (b) Percentage of preprints from upper middle or low/lower middle income countries that are published, split by whether at least one other author on the preprint is from a high income country. (c) Percentage of authors who retain their position from preprint to paper for each income group, for first and last author positions. (d) Percentage of papers, by country, where at least one author was added between preprint and paper publication. *** p < 0.001, * p < 0.05, n.s. not significant.
Fig 6.
Top publishers from high and low/lower middle income groups.
The top four publishers from each income group are shown. Oxford University Press is in the top four publishers of papers from high income countries, but not low/lower middle income countries, so it is only shown in one of the charts. Similarly, Springer-Verlag is only in the top four publishers of low/lower middle income countries.