Hirsch Index and Truth Survival in Clinical Research

Background Factors associated with the survival of truth of clinical conclusions in the medical literature are unknown. We hypothesized that publications with a first author having a higher Hirsch index (h-I), which quantifies and predicts an individual's scientific research output, should have a longer half-life.
Methods and Results 474 original articles concerning cirrhosis or hepatitis published from 1945 to 1999 were selected. The survival of the main conclusions was updated in 2009. Truth survival was assessed by time-dependent methods (Kaplan-Meier method and Cox model). A conclusion was considered to be true, obsolete or false when three or more observers out of the six stated it to be so. 284 out of 474 conclusions (60%) were still considered true, 90 (19%) were considered obsolete and 100 (21%) false. The median h-I was 24 (range 1–85). Authors with true conclusions had a significantly higher h-I (median = 28) than those with obsolete (h-I = 19; P = 0.002) or false conclusions (h-I = 19; P = 0.01). The factors associated (P<0.0001) with h-I were: scientific life (h-I = 33 for >30 years vs. 16 for <30 years), methodological quality score (h-I = 36 for high vs. 20 for low scores), and positive predictive value combining power, ratio of true to not-true relationships and bias (h-I = 33 for high vs. 20 for low values). In multivariate analysis, the risk ratio of h-I was 1.003 (95%CI, 0.994–1.011), and was not significant (P = 0.56). In a subgroup restricted to 111 articles with a negative conclusion, we observed a significant independent prognostic value of h-I (risk ratio = 1.033; 95%CI, 1.008–1.059; P = 0.009). Using an extrapolation of h-I at the time of article publication, there was a significant and independent prognostic value of baseline h-I (risk ratio = 0.027; P = 0.0001).
Conclusions The present study failed to clearly demonstrate that the h-index of authors was a prognostic factor for truth survival. However, the h-index was associated with true conclusions, methodological quality of trials and positive predictive values.


Introduction
Science progresses via a series of paradigms that are held to be true until they are replaced by a better approximation of reality [1]. In surgery and medicine, two studies have estimated that the half-life of truth for clinical conclusions in the literature is 45 years [2,3]. We previously tried to identify factors that were independently associated with this truth survival, and found only two: one expected (the negative conclusion of the publication) and one unexpected (the absence of meta-analysis in the methodology used) [3]. We therefore concluded that better prognostic factors should be found to better convince clinicians of the long-term utility of evidence-based medicine [3,4].
In the previous study, we did not analyze any author-related factor [3]. In the present study we hypothesized that publications with a first author having a higher h-I, which quantifies [5] and predicts an individual's scientific research output [6,7], should have longer survival. An association between the h-I and truth survival could also be a proof of concept for using this type of method to validate such indexes. So far, the h-I has been validated using "scientific achievement", as defined by criteria which are ultimately very redundant: the number of citations [6], peer review [8], grant proposals [9] or quantitative performance measurements [10,11].
We used 474 previously assessed articles [3] with an identified first author, in which the survival of the main conclusions was updated in 2009.

Methods
Summary of the initial study methodology [3]
Selection of articles. We identified original articles concerning cirrhosis or hepatitis in adults from 1945 to 1999 in 11 five-year periods. Article selection was stratified into 3 categories: non-randomized studies, randomized trials and meta-analyses. In each five-year period we selected 20 non-randomized articles from two journals, 10 published in Lancet and 10 in Gastroenterology. We chose these two journals because they have published clinical studies in hepatitis and cirrhosis since at least 1945, because they are peer-reviewed with a high level of selection, and because they have high impact factors (greater than 10). A hand search was used to select articles from 1945 to 1985. As true randomization was very difficult to organize, we used a selection by order of publication inside each five-year period. The first article of the period concerning cirrhosis or hepatitis was chosen, then the last of the period, then the second, then the one before the last, and so on up to 20 articles. From 1985 to 1999 we used a PubMed electronic search specifying the following "limits": cirrhosis or hepatitis, human, Lancet or Gastroenterology. Abstracts were downloaded using a similar selection method, stratified by five-year periods. We selected the first abstract listed on the first electronic page, then the first on the last electronic page, then the last on the second electronic page, then the last on the page before the last, and so on up to 20 articles.
In each period we tried to select 20 randomized trials on cirrhosis or hepatitis, 10 from Lancet and 10 from Gastroenterology. This was possible from 1970 to 1999. In the periods from 1945 to 1969 we selected all identified randomized trials whatever the journal, with a range from four (1945-1950) to 20 trials (1965-1969). From 1945 to 1982 we used the hand searching method as previously described [5]. From 1982 to 1985 we completed the random selection by hand searching and from 1985 to 1999 by PubMed as described for non-randomized studies.
For the meta-analyses, we used a hand searching method as described in the systematic review of meta-analyses [12]. To be included, a meta-analysis had to be based on trials in the field of hepatology and published as a full paper before 2000. The following operational definition of meta-analysis was adopted: a study in which a computation of an overall treatment effect, based on the estimation of treatment effect in each trial, was performed and reported with its 95% confidence interval or with the corresponding statistical test. Meta-analyses on childhood diseases were not included [12].
Selection of conclusion. The one conclusion from each abstract that seemed to best summarize the findings was copied to a database. Editing of these sentences was restricted to the rephrasing of outdated terminology and the elimination of redundant words.
Observers. Six hepatologists, called the observers, assessed the form which contained the selected conclusions in a random order. The observers were fulltime hepatologists from different subspecialties but working in a hospital and aged from 31 to 65 years. Observers were blind to the period, the journal, the authors, the method (meta-analysis, randomized, non randomized), and the methodological quality from which each conclusion was derived. They classified each conclusion into one of three categories: 1) still true in 2000 (updated in 2009), 2) obsolete but not false, 3) false.
Prognostic factors. The following seven factors were analyzed: 1) the design (meta-analysis, randomized trial, non-randomized study); 2) the methodological quality of randomized trials and meta-analyses, which had been assessed independently of this study by one of us (TP) by means of scoring methods [13-15]; articles were rated as high quality when the score was greater than or equal to the mean (12 for randomized trials, 27 for meta-analyses) and as low quality when lower than the mean. Non-randomized studies were classified as low methodological quality as there is no specific scoring method; 3) negative or positive conclusions; 4) the type of disease (hepatitis, portal hypertension, other); 5) the domain of clinical research (therapeutic, diagnostic or other study; other studies were defined as explanatory studies not assessing treatment or diagnostic tests); 6) the journal of publication (Lancet, Gastroenterology, other); and 7) the specialty (medicine or surgery).
Statistical analysis. A conclusion was considered to be true, obsolete or false when three or more observers out of the six stated it to be so. When there was a 3 to 3 split decision regarding a conclusion being true or not true, the final conclusion was considered to be true; these splits concerned 9 out of 474 (1.9%) articles. When there was a 3 to 3 split decision regarding a conclusion being obsolete or not obsolete, the final conclusion was considered obsolete; these splits concerned 26 articles out of 474 (5.5%). When the article was not classified as either true or obsolete it was considered as false. The half-life was calculated according to the Kaplan-Meier method using the censored time as the duration between the year of publication and the year 2000 (updated in 2009). The censored time is the time at risk of being refuted or found to be obsolete. We analyzed the truth survival: if the conclusion was assessed to be still true, the case was censored at the end of follow-up. If the conclusion was assessed to be false or obsolete, it was considered as a failure. The comparison between factors used the two-sided logrank test, and the multivariate analysis used proportional hazards regression.
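The consensus rule described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `classify_conclusion` and the label strings are hypothetical, and applying the "true" rule before the "obsolete" rule is an assumption for cases the text does not spell out (for example three "true" and three "obsolete" votes).

```python
from collections import Counter


def classify_conclusion(votes):
    """Combine six observer ratings ("true", "obsolete", "false") into one label.

    Assumed order of rules: "true" with >= 3 votes (so a 3-3 true/not-true
    split counts as true), then "obsolete" with >= 3 votes, otherwise "false".
    """
    counts = Counter(votes)
    if counts["true"] >= 3:
        return "true"
    if counts["obsolete"] >= 3:
        return "obsolete"
    return "false"


def is_failure(label):
    """In the survival analysis, obsolete or false conclusions are failures."""
    return label != "true"
```

For example, `classify_conclusion(["true"] * 3 + ["false"] * 3)` returns `"true"`, and a 2-2-2 split returns `"false"` because the conclusion is classified as neither true nor obsolete.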

Hirsch index
The h-I of first authors was the main prognostic factor assessed in the present study. The h-I values were assessed in the first 6 months of 2010. The h-I was originally computed using Google Scholar ("Google Scholar Universal Gadget") for first authors. Because Google Scholar is not a perfect gold standard for estimating the h-I, other methods were also used. The commonness of last names can introduce a false estimate of the h-I [16], and therefore for the high-risk names we used "liver" as a supplementary selection criterion in the Scholar search. As the Scholar search was expected to perform less well for the oldest publications, the h-I was also assessed using the Scopus database for first authors of articles that were published after 1995, and using the ISI database. Only the authors still publishing after 1980 were taken into account, as the applicability of the ISI search was very low in the older periods.
The date of the publication as well as the scientific age of the author (time between first and last publications) are mathematically associated with the h-I, which is cumulative and increases over time [5,8,10,16,17]. Therefore analyses were stratified according to the publication date (1945-1964, 1965-1979, 1980-1999), the rate of the h-I (h-I/scientific life in years) was estimated, and the scientific life duration of the author was included in multivariate analyses.
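To make the two quantities concrete, the sketch below computes an h-index from a list of per-paper citation counts (the standard definition: the largest h such that h papers have at least h citations each) and the h-I rate as h-I divided by scientific life in years. The function names and the example numbers are hypothetical; the study itself obtained the h-I from Google Scholar, Scopus and ISI rather than computing it this way.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def h_rate(h, first_publication_year, last_publication_year):
    """h-I divided by scientific life (years between first and last publication)."""
    scientific_life = last_publication_year - first_publication_year
    if scientific_life <= 0:
        raise ValueError("scientific life must be a positive number of years")
    return h / scientific_life
```

With invented counts, `h_index([10, 8, 5, 4, 3])` gives 4, and an author with h-I = 30 over 1980 to 2010 has a rate of 1.0 per year.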
The seven characteristics of studies [3] and two author characteristics associated with the h-I in the literature (gender of author, and place of residence) [10,16,17] were analyzed as possible confounding factors in the prognostic analyses. The gender could not be determined from the Scholar search or from the first-name initials; we therefore used the personal knowledge of coauthors and the first-name details given by Scopus.

Updated methods
No change was made for the selection of articles, and methodological quality assessment. Observer conclusions were updated in 2009, that is with 10 years more of follow-up. One previous observer had retired, two had moved and two new ones agreed to participate (MM, DT). The observers were asked to modify their previous conclusions if necessary. A conclusion was changed when at least three observers out of the five stated it to be so. Five changes occurred, one previously false conclusion and one previously obsolete became true, two previously true became false and one became obsolete.
The main a priori endpoint was the prognostic value of the h-I (quantitative value) in the multivariate analysis including previously identified prognostic factors. The other "significant" P values were detailed when ≤0.10 and were described as NS if >0.10.
Statistical descriptions and analyses used non-parametric methods. Medians were expressed with a 95% confidence interval. Multiple comparisons used the Kruskal-Wallis variance analysis with Dunn's multiple comparison test. The same time-dependent analyses were used as in the previous analysis [3]. A modification was made to the time assigned to obsolete or false conclusions, according to a pertinent critique [18]. Very old publications that had been declared obsolete at the end of follow-up could cause the duration of survival to be overestimated if they had in fact been obsolete or false many years earlier. Therefore, for each obsolete or false conclusion, we estimated the year in which it became obsolete or false. We added the duration of scientific life to the Cox proportional hazards regression model as a covariate for adjusting the prognostic value of the h-I. The conclusions of the first analysis and the factors associated or not associated with truth survival did not change [19].
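As an illustration of the time-dependent analysis, here is a minimal pure-Python sketch of the Kaplan-Meier product-limit estimator and of the half-life read from it. It is not the authors' software; the function names and the six example observations are invented. Durations are in years from publication, with event = 1 when the conclusion became obsolete or false (at the estimated year) and event = 0 when it was still true at the end of follow-up (censored).

```python
def kaplan_meier(durations, events):
    """Return [(time, survival probability)] at each distinct failure time."""
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        time = data[i][0]
        failures = 0
        leaving = 0
        # Group all observations that share the same time.
        while i < len(data) and data[i][0] == time:
            failures += data[i][1]
            leaving += 1
            i += 1
        if failures:
            survival *= 1 - failures / at_risk
            curve.append((time, survival))
        at_risk -= leaving
    return curve


def half_life(curve):
    """First time at which estimated survival is <= 50%, or None if never reached."""
    for time, survival in curve:
        if survival <= 0.5:
            return time
    return None


# Invented example: six conclusions, two of them censored (still true).
curve = kaplan_meier([5, 10, 10, 20, 30, 40], [1, 0, 1, 1, 0, 1])
```

The Cox regression and logrank comparisons used in the study are not reproduced here; in practice a dedicated survival-analysis package would normally be used for those.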
It was not possible to assess directly the h-I of the author at the time of publication (baseline h-I) for each article included in the present survey. However, it was possible to estimate the baseline h-I by applying the progression rate of the current h-I backwards. For example, for an author with a Scholar h-I of 81 in 2010 (h-I 2010) and a mean speed (h-speed) of 2.53 per year, the baseline h-I for an article of the present database published in 1995 was extrapolated as: h-I baseline = h-I 2010 - h-speed × (2010 - 1995) = 81 - 2.53 × 15 ≈ 43. This baseline h-I was also assessed in the prognostic analysis.
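The backward extrapolation can be written as a one-line function. This is a sketch under the linear-progression assumption stated in the text; the function and parameter names are hypothetical. The text does not say how extrapolated values below zero were handled, so the sketch returns them unchanged.

```python
def baseline_h_index(h_current, h_speed, assessment_year, publication_year):
    """Extrapolate the h-I back to the publication year, assuming linear growth.

    h_speed is the mean yearly increase of the h-I (h-I / scientific life).
    """
    years_elapsed = assessment_year - publication_year
    return h_current - h_speed * years_elapsed


# Worked example from the text: 81 - 2.53 * 15, which rounds to 43.
example = baseline_h_index(81, 2.53, 2010, 1995)
```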
It has been suggested that for a special "outstanding category" of top scientists, citation indexes can reflect scientific "quality" [20]. Therefore we planned an analysis of "top-hepatologists" conclusions, using the cutoff which selects the 30 highest h-I. Using h-Scholar the cutoff was h-I = 60; this resulted in 33 articles (6.1%), as there were 4 ties at h-I = 60. Using h-Scopus the h-I cutoff was 33 and for ISI 38.
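The reason a "top 30" cutoff can select more than 30 articles is that every value tied with the cutoff is kept. A minimal sketch, with a hypothetical function name and invented values:

```python
def top_n_with_ties(h_values, n):
    """Return (cutoff, selected) where cutoff is the n-th highest value
    and selected contains every value greater than or equal to it."""
    if n <= 0 or n > len(h_values):
        raise ValueError("n must be between 1 and the number of values")
    cutoff = sorted(h_values, reverse=True)[n - 1]
    selected = [h for h in h_values if h >= cutoff]
    return cutoff, selected


# Invented example: asking for the top 3 keeps 5 values because of ties at 60.
cutoff, selected = top_n_with_ties([70, 65, 60, 60, 60, 50], 3)
```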
We have not previously observed a prognostic value of studies according to criteria based on methodological quality scoring systems [3]. Recently Ioannidis proposed a classification of research findings in 9 classes of positive predictive values according to various combinations of power, ratio of true to not-true relationships and bias [4]. The details of this classification are available in Table S1. Therefore we planned an analysis using this classification in the multivariate prognostic analysis.

Results
A total of 474 articles were included. The characteristics of the included first authors are given in Table 1 and those of the articles in Table 2, stratified by period. The majority of articles were published by residents of the US and UK before 1980, and by residents of continental Europe after 1980. A large majority of articles were published by male first authors, who were not surgeons, with a median scientific life of 30 years. The methodological quality, expressed according to scoring systems or predictive value, was much better after 1980. In the year 2009, 284 out of 474 conclusions (60%) were still considered true, 90 were considered obsolete (19%) and 100 (21%) false. The half-life of truth was 45 years. The survival rate of conclusions was 85% (95%CI, 83-89%) at 20 years and 52% (95%CI, 47-57%) at 40 years.

The h-Index
The first author Scholar h-I (median; 95%CI) was 24 (20-
Factors associated with the h-I estimated using Google Scholar
As expected the h-I was highly associated with duration of scientific life and recent publications (
Using univariate and non-time-dependent analysis, the h-I was also associated with methodological quality, either using scores (Table 4) or positive predictive value categories (Figure 1), with randomization design, and with authors with several articles included (Table 5).
There was no significant association between the h-I and truth survival using time-dependent analysis in both uni- and multivariate analyses (Table 5). Comparing the Scholar h-I, there was no significant difference in 50-year survival (main endpoint): 50±5% (h-I above the median) and 46±4% (below the median), respectively (P = 0.63) (Figure 2). There was also no difference in truth survival for Scopus h-I (Figure 3).
For the main endpoint the risk ratio of the h-I was 1.003 (0.994-1.011) and was not significant (P = 0.56). There was a significant difference in the 50-year survival of conclusions according to the negative or positive finding: 72±12% (negative finding) and 40±3% (positive finding), respectively (P<0.0001) (Figure 4).

Concordance between the h-index estimated using Google Scholar, Scopus, and ISI
Concordance between the h-I estimated using Google Scholar on the overall scientific life of authors and the h-I estimated using Scopus and ISI for the scientific life after 1994 was assessed for the 217 authors of articles published after 1994 for whom ISI was applicable (10 not applicable out of 227). There was a highly significant concordance between the 3 h-I estimates. The Spearman's rank correlation between Scholar and Scopus was 0.72, between Scholar and ISI 0.81 (P<0.0001) and between Scopus and ISI 0.31 (SE = 0.06; P = 0.01). In comparison with the h-I estimated using Google Scholar, the h-I estimated using Scopus or ISI had similar variability according to characteristics of included first authors (Table 3) and original articles (Table 4), and were also not independently associated with truth survival (Table 5).
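For readers unfamiliar with the concordance measure, the sketch below computes Spearman's rank correlation in pure Python: each series is converted to ranks (ties receive the average rank) and the Pearson correlation of the ranks is returned. The function names are hypothetical, no data from the study are used, and P values and standard errors are not computed.

```python
def average_ranks(values):
    """1-based ranks of values; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation between two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks are used, two h-I estimates that order authors identically have a correlation of 1 even when their absolute values differ, which suits a comparison of databases with different coverage.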

Predictive value of baseline Scholar h-index
For baseline h-I the prognostic value differed in direction between uni- and multivariate analyses. Using univariate comparison (Table 5), articles whose author had a baseline h-I greater than 3 (the median value) had lower 50-year survival (18%) than articles with a lower baseline h-I (42%; P<0.0001), whereas in multivariate analysis the quantitative value was positively associated with survival (risk ratio = 0.027; P = 0.0001). This discrepancy was due to a very significant period effect. After 1980 the 25-year survival for authors with baseline h-I >3 was 66% (54-77%) versus 63% (50-76%; NS) for h-I ≤3, with a significant positive prognostic value in multivariate analysis (risk ratio = 0.027; P = 0.0001). Before 1980 the 25-year survival for authors with baseline h-I >3 was 19% (5-34%) versus 63% (50-76%; NS) for h-I ≤3 (negative prognostic value), with in multivariate analysis a significant positive prognostic value (risk ratio = 1.052; P = 0.04).

Discussion
We observed that the h-I at the end of the study was associated with true conclusions, but its prognostic value did not survive time-dependent analysis, as previously observed for methodological quality. In contrast, baseline h-I (when the paper was written) was significantly and independently associated with truth survival when adjusted for other covariates. Negative conclusions remained a robust and independent predictor of truth survival [3].

Strength
We confirmed in the present study the intriguing prognostic value of negative conclusions (72% vs. 40% for 50-year survival for positive conclusions), which persisted after other factors had been taken into account. This prognostic value was not due to obsolete conclusions, as only 2% of negative conclusions had been rated as obsolete compared to 25% of positive conclusions. We found few negative studies which had been published in order to reveal previous false positive conclusions (Proteus phenomenon) [21]. An example is the article which concluded that hepatitis B virus was not responsible for primary biliary cirrhosis, which was published 18 months after another article had suggested this association [3]. There was no significant difference in the h-I of authors with negative (h-I = 21) or positive (h-I = 25) conclusions. If we accept that most published research findings are false [4], the better survival of negative findings ("no relationships") is a corollary of this statement. This is therefore the most plausible explanation of the better long-term survival of negative findings.
Subgroup analyses are hazardous, but in a multivariate analysis restricted to 111 articles with negative conclusions we observed a significant independent predictive value of the h-I. This retrospective observation, made without an a priori hypothesis, must be confirmed by another study. We previously observed in the present cohort that the prognostic value of negative versus positive conclusions was mainly due to large differences among the randomized trials' conclusions: 68±13% for 52 negative conclusions compared with 14±4% (P<0.001) for 118 positive conclusions [3]. One hypothesis is that authors with an elevated h-I are principal investigators of "better trials" whose findings survive longer than those of authors with a lower h-I. From our analysis we cannot conclude whether this "author effect" is a cause or a consequence of scientific performance. Some authors may be supported more by industry for reasons other than their "intrinsic" quality. A means of verifying whether an "intrinsic" author effect exists would have been to assess the factors associated with survival among articles published at the beginning of the authors' scientific life.

Limitations
Our study has significant limitations. The study is retrospective between 1945 and 2000 and only prospective for the last 10 years of follow-up (updated in 2009). The inclusion criteria selected authors who may not have been representative of the overall biomedical community. They had published articles on liver diseases with high methodological levels (majority of randomized trials) in two competitive journals (mainly Lancet and Gastroenterology) with high impact factors in 2008, 28.4 and 12.6, respectively. We also used methods to assess methodological quality which are not the most recent and valid ones.
This selection should explain the high observed h-I (median of 24 for all periods and 30 for the period of 1980-1999). The h-I cannot be compared between different scientific fields or between different periods of publication [16,17]. However, the observed median (h-I = 30) is higher than the h-I reported for other medical faculty members over the same period: a mean h-I of 7.6 in 826 US oncologists [22], a median of 10 for 29 Dutch professors in cardiology [23], and a median of 23 for 45 editorial board members [24]. Because of this rather high h-I level, it is possible
There is no gold standard for the definition of scientific truth. We used a definition that was decided by the majority vote of a panel of 5 experts, 10 to 65 years after publication of the findings. The main advantage was the duration of follow-up, with subsequent progress in the field of knowledge. The main weakness was the arbitrary choice of experts. To limit the risk of bias, the experts were chosen from different domains of hepatology and had different ages [3]. We also adjusted the prognostic analysis using the classification of studies according to positive predictive values per Ioannidis [4]. The results were similar to the previous adjustments using the validated quality scoring system of randomized trials and meta-analyses [3]. However, we think that the positive predictive value estimates could be improved for negative findings and for diagnostic studies, which are a growing part of clinical research.
The h-I estimates had limitations, and we cannot rule out that these limitations explain the absence of clear and independent prognostic values [7-10,16,17,25]. The first limitation is the reliability of a citation index in the oldest years, before PubMed and Google Scholar existed. The second main limitation is the commonness of last names, which could introduce false estimates of the h-I. However, for the high-risk names we used "liver" as a supplementary selection criterion in the Scholar search and checked the authorship twice using Scopus for authors still publishing after 1994. Moreover, the main results were similar using two other estimates, Scopus and ISI (Table S2), which were significantly concordant.
Finally, the extrapolation of baseline h-I to the year when the paper was written suggests a clear and independent prognostic value of the h-I. The main limitation of this index, in comparison with the 2010 h-I estimates, is its indirect assessment. This extrapolation relies on the normality and linearity of the h-I progression rate. We used the median to reduce the risk of variability, but a real prospective validation of the h-I prognostic value is needed.

Conclusion
The h-I is simple, probably more accurate than other citation indexes for estimating authors' scientific outputs, and it is accepted when its limitations are understood [25], with [26] or without [7,10] irony. We agree with Horne et al, that retaining a dignified aloofness to the h-I could be difficult for those with scores of less than 30 [7].
For living hepatologists, at least, our conclusions were balanced. The present study failed to clearly demonstrate that the h-index of authors was a prognostic factor for truth survival. However the h-I was partly validated as associated with true conclusions, the methodological quality of trials and with positive predictive values combining power, ratio of true to not-true relationships and bias.
Furthermore, an indirect (extrapolated) estimate of baseline h-I showed a high and independent prognostic value for articles published after 1980. Prospective studies should be initiated in the coming decades to confirm this observation.

Author Contributions
Conceived and designed the experiments: TP. Performed the experiments: TP. Analyzed the data: TP. Contributed reagents/materials/analysis tools: TP MM VR YB DT OD. Wrote the paper: TP.