Reader Comments
Implausible career lengths = Possible misidentification of individual authors?
Posted by spavlov on 05 Jan 2021 at 08:52 GMT
I was studying the table intending to check how career length influenced the ranking when something odd came up:
First, I noticed that one author (Marshall, William S., Saint Francis Xavier University, rank(ns)=46295) appears to have had a long and prolific publishing career lasting 186 years: their first paper is listed as published in 1834 and their last in 2020. This made me suspicious (at first I thought a year had been entered incorrectly). I then calculated career length for every author as the difference between lastyr and firstyr:
Career length ranges from 2 to 186 years, with a median of 34 (IQR 27-42).
It turned out that suspiciously many authors have very long publishing careers. While it is certainly plausible that someone has been publishing for 50 years, I cannot say the same for a 90-year scientific publishing career. The numbers of authors with implausibly long publishing careers are quite disturbing:
N(Career longer than 50 years) = 14706
N(Career longer than 60 years) = 2539
N(Career longer than 70 years) = 395
N(Career longer than 80 years) = 130
N(Career longer than 90 years) = 62
N(Career longer than 100 years) = 45
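A check of this kind could be reproduced roughly as follows. This is a minimal sketch in Python, not the script used for the numbers above: only the column names firstyr and lastyr come from the table, while the helper names, the toy rows, and the reading of "longer than" as strictly greater than are assumptions made for illustration.

```python
from statistics import median, quantiles


def career_lengths(records):
    """Career length per author, defined as lastyr - firstyr."""
    return [r["lastyr"] - r["firstyr"] for r in records]


def count_longer_than(lengths, thresholds=(50, 60, 70, 80, 90, 100)):
    """Number of careers strictly longer than each threshold (in years)."""
    return {t: sum(1 for n in lengths if n > t) for t in thresholds}


# Made-up rows for illustration only; they are NOT taken from the published table.
toy_records = [
    {"author": "A", "firstyr": 1834, "lastyr": 2020},
    {"author": "B", "firstyr": 1980, "lastyr": 2019},
    {"author": "C", "firstyr": 1965, "lastyr": 2020},
    {"author": "D", "firstyr": 1990, "lastyr": 2020},
]

lengths = career_lengths(toy_records)
counts = count_longer_than(lengths)

print("min/max:", min(lengths), max(lengths))
print("median:", median(lengths))
# quantiles(n=4) returns the three quartile cut points (Q1, Q2, Q3).
print("quartiles:", quantiles(lengths, n=4))
for threshold, n_authors in counts.items():
    print(f"N(Career longer than {threshold} years) = {n_authors}")
```

To run it on the real data, the toy rows would be replaced by rows read from the published table, one record per author.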
I suspect that this results from different authors with the same name being merged into a single identity, or from some other error in the automated data extraction and processing. The same error might also affect scientists with shorter careers, where it cannot be detected as easily.
This makes the integrity and authenticity of the data questionable and reduces the trustworthiness of the related research.