Fig 1.
We developed a semi-automated pipeline for detection of misidentified SEM instruments in the MSE literature and tested it on published articles in 50 journals.
a, Annual count of articles featuring SEM based on Scopus search results. b, SEM image featuring an auto-generated metadata banner from a TESCAN MAIA3 SEM, after [19]. c, Illustrative metadata banner snippets allowing for identification of the manufacturer and/or model of the instrument used. d, Publication timeline of the 1,067,108 articles extracted from 50 journals between 2010 and early 2023. SEM images were present in 174,046 articles (16.3%).
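As a rough illustration of panel c, the sketch below matches OCR text from a banner against a lookup table of brandmarks. It is not the authors' implementation: the function name `identify_instrument` and the table `BRANDMARKS` are hypothetical, and only the TESCAN MAIA3 entry is taken from the caption above.

```python
import re

# Illustrative lookup table (hypothetical). Only TESCAN / MAIA3 comes from
# the caption; a real table would list many more manufacturers and models.
BRANDMARKS = {"TESCAN": ("MAIA3",)}


def identify_instrument(ocr_text):
    """Return (manufacturer, model) found in OCR text; None where not found."""
    text = ocr_text.upper()
    for manufacturer, models in BRANDMARKS.items():
        # Whole-word match so the name is not picked up inside another token.
        if re.search(rf"\b{re.escape(manufacturer)}\b", text):
            for model in models:
                if re.search(rf"\b{re.escape(model)}\b", text):
                    return manufacturer, model
            return manufacturer, None
    return None, None


# Made-up banner text, for illustration only.
print(identify_instrument("SEM HV: 15.0 kV  MAIA3 TESCAN"))
```

A banner may reveal only the manufacturer, which is why the model is allowed to be `None`.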
Fig 2.
Comparison of the sizes of images, in terms of image file height in pixels, from which we were able to extract brandmarks using optical character recognition (OCR) (red boxes) versus images for which we were not (white boxes).
a, For all years, image files for which we were able to extract brandmarks were larger. b, Comparison across four publishers of the sizes of images for which we were able to extract brandmarks (red boxes) versus images for which we were not (white boxes). Except for studies published by the Public Library of Science, images for which we were able to extract brandmarks were significantly larger. Center line shows median, boxes show inter-quartile range, whiskers show 2.5th percentile and 97.5th percentile. n.s. = p > 0.05, * = p < 0.05, ** = p < 0.01 and *** = p < 0.001 by two-sided Mann-Whitney U test. Additional analysis shown in S2 Fig.
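For readers unfamiliar with the test named above, here is a minimal sketch of a two-sided Mann-Whitney U test on two samples of image heights. It assumes a normal approximation with no tie or continuity correction, which is a simplification; a statistics library would be preferable for a real analysis. The function name and the sample values are hypothetical.

```python
import math


def mann_whitney_u(xs, ys):
    """Return (U, two-sided p) for xs versus ys.

    U counts pairs with x > y, plus half of the tied pairs. The p-value
    uses a normal approximation without tie or continuity correction.
    """
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    n1, n2 = len(xs), len(ys)
    mean_u = n1 * n2 / 2
    sigma_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sigma_u
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u, p_value


# Hypothetical image heights in pixels, for illustration only.
extracted = [1200, 1500, 1800, 2100]
not_extracted = [600, 800, 900, 1000]
u_stat, p = mann_whitney_u(extracted, not_extracted)
```

The pairwise loop is quadratic in sample size. It is used here only because it makes the definition of U explicit.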
Fig 3.
Flowchart of the implemented semi-automated pipeline.
Fig 4.
Summary of instrument identification for 11,314 articles with SEM images with extractable metadata banners.
a, Publication timeline of the articles analyzed. 6,115 (54.0%) correctly identified the SEM used, 2,400 (21.2%) articles incorrectly identified the SEM used, and 2,799 (24.7%) articles did not identify the SEM used. b, Top 15 surveyed journals with the most articles with SEM misidentification. Most articles with SEM misidentification incorrectly identified both the manufacturer (Mfr.) and the model of SEM used.
Fig 5.
Assessments of methodological sensitivity and selectivity.
a, Estimation of our pipeline’s false discovery rate. We manually assessed a subset of 150 articles with SEM misidentification. 1 article (0.7%) was incorrectly classified during the manual step of our pipeline. 32 articles (21.3%) had fewer instruments identified in the text than were identified in the figures. b, Comparison of the results of our pipeline to PubPeer comments. Only 43 out of 2,400 articles labeled by our pipeline as problematic were already commented on by PubPeer users. For 30 of the 43, the issues commented upon were unrelated to SEM misidentification. c, Estimation of our pipeline’s false negative rate. We determined the ability of our pipeline to recover all articles for which a PubPeer user had already reported SEM misidentification. Our pipeline caught 13 of 45 eligible articles (28.9%). d, Pre-existing PubPeer comments were more frequently found on articles with misidentified SEM instruments than on other MSE articles. Error bars show standard error of the proportion. n.s. = p > 0.05, * = p < 0.05, ** = p < 0.01 and *** = p < 0.001 by two-sided Z-test of proportions.
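The error bars and significance marks in panel d can be illustrated with a short sketch of the standard error of a proportion and a pooled two-sided Z-test of two proportions. This is the textbook form and may differ in detail from the authors' analysis. The count of 43 commented articles out of 2,400 comes from the caption; the comparison-group counts are hypothetical placeholders.

```python
import math


def standard_error(successes, n):
    """Standard error of a sample proportion, sqrt(p * (1 - p) / n)."""
    p = successes / n
    return math.sqrt(p * (1 - p) / n)


def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z, two-sided p) for H0: both proportions are equal."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# 43 of 2,400 is from the caption; 50 of 8,914 is a made-up comparison group.
z, p = two_proportion_z_test(43, 2400, 50, 8914)
se_flagged = standard_error(43, 2400)
```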
Fig 6.
Assessment of Tauc plots in articles flagged by our pipeline.
a, The mockup Tauc plot on the left demonstrates how to correctly estimate band gap energy (by taking the intersection of the plot's linear fit with the line $y = 0$) [28]. The mockup Tauc plot on the right demonstrates a common error where the band gap is estimated by taking the intersection of the linear fit with the line $y = y_{\min}$, where $y_{\min}$ is the lowest value on the y-axis as plotted. b, This specific error occurred more frequently in articles with misidentified SEM instruments than in other MSE articles. Error bars show standard error of the proportion. n.s. = p > 0.05, * = p < 0.05, ** = p < 0.01 and *** = p < 0.001 by two-sided Z-test of proportions.
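The difference between the two estimates in panel a can be made concrete with a small sketch: fit a line to the linear region of a Tauc plot, then intersect it either with the x-axis (y = 0) or with a nonzero baseline such as the lowest value shown on the y-axis. The data points and the baseline value are hypothetical, and the helper names are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def crossing_energy(slope, intercept, baseline=0.0):
    """Photon energy at which the fitted line crosses y = baseline."""
    return (baseline - intercept) / slope


# Hypothetical linear region: photon energy (eV) versus the Tauc ordinate.
energy = [3.2, 3.3, 3.4]
ordinate = [2.0, 3.0, 4.0]
slope, intercept = fit_line(energy, ordinate)

correct_gap = crossing_energy(slope, intercept)           # intersect y = 0
erroneous_gap = crossing_energy(slope, intercept, 1.0)    # intersect y = y_min
```

With these made-up points the fit crosses y = 0 near 3.0 eV and y = 1.0 near 3.1 eV. For a rising linear region, a positive baseline therefore shifts the estimated band gap upward.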
Fig 7.
Systemic characteristics of articles with SEM misidentification.
a, Some terms found in article titles were enriched or depleted among articles that misidentified SEM instruments compared to other articles featuring SEM images. For instance, articles containing ‘green’ (as in ‘green synthesis’) in the title were 2.8 times as likely to contain misidentified SEM equipment as articles without ‘green’ in the title (p by two-sided Fisher exact test). We show only non-stopwords found in 100 or more articles with Benjamini-Hochberg false discovery rate < 0.01 by two-sided Fisher exact test. b, Rate of misidentification of SEM instrumentation for the five most frequent authors of articles with SEM misidentification.
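The enrichment analysis in panel a combines three standard ingredients, sketched below under stated assumptions: a ratio of misidentification rates, a two-sided Fisher exact test on a 2x2 table, and a Benjamini-Hochberg adjustment across terms. The table layout (rows: title contains the term or not; columns: misidentified or not) and all counts are assumptions for illustration, and reading "times as likely" as a ratio of rates is also an assumption.

```python
from math import comb


def rate_ratio(a, b, c, d):
    """Ratio of rates a/(a+b) to c/(c+d) for the table [[a, b], [c, d]]."""
    return (a / (a + b)) / (c / (c + d))


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value: sum the probabilities of all tables
    with the same margins that are no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    total = 0.0
    for k in range(max(0, col1 - row2), min(row1, col1) + 1):
        p_k = prob(k)
        if p_k <= p_obs * (1 + 1e-7):  # small tolerance for float ties
            total += p_k
    return min(1.0, total)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts for one title term, for illustration only.
p_term = fisher_exact_two_sided(30, 70, 200, 1700)
ratio = rate_ratio(30, 70, 200, 1700)
```

In this sketch a term would be reported only if its adjusted p-value falls below the chosen threshold, which the caption gives as 0.01.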
Fig 8.
Systemic characteristics in the affiliations of articles with SEM misidentification.
a, Location of affiliation of authors of articles with SEM misidentification. Many authors are affiliated with institutions located in China, Iran, India and Egypt. A minority of articles had authorship from multiple countries. b, Enrichment of location of affiliations of authors of articles with SEM misidentification. Articles with authorship from Iran were 6.3 times as likely to have misidentified their SEM instruments as articles without authorship from Iran (p ). n.s. = p > 0.05, * = p < 0.05, ** = p < 0.01 and *** = p < 0.001 by two-sided Fisher exact test.