The Impact of Article Length on the Number of Future Citations: A Bibliometric Analysis of General Medicine Journals

Background: The number of citations received is considered an index of study quality and impact. We aimed to examine the factors associated with the number of citations of published articles, focusing on article length.
Methods: Original human studies published in the first quarter of 2006 in 5 major General Medicine journals were analyzed with regard to the number of authors and of author-affiliated institutions, title and abstract word count, article length (number of print pages), number of bibliographic references, study design, and 2006 journal impact factor (JIF). A multiple linear regression model was employed to identify the variables independently associated with the number of article citations received through January 2012.
Results: On univariate analysis, the JIF, number of authors, article length, study design (interventional/observational and prospective/retrospective), title and abstract word count, number of author-affiliated institutions, and number of references were all associated with the number of citations received. On multivariate analysis with the logarithm of citations as the dependent variable, only article length [regression coefficient: 14.64 (95% confidence interval: 5.76–23.50)] and JIF [3.37 (1.80–4.948)] independently predicted the number of citations. The variance of citations explained by these parameters was 51.2%.
Conclusion: In a sample of articles published in major General Medicine journals, article length independently predicted the number of citations, in addition to the journal impact factor. This may reflect a higher complexity level and quality of longer studies.


Introduction
An article's citations are considered a measure of the scientific recognition the study has received, and thus an indicator of its value and impact on the scientific field [1]. Citations are also the main determinant of a journal's scientific impact, as expressed by the journal impact factor [2]. This indicator represents the mean number of citations received in a given calendar year by the citable articles that a journal published during the previous two years [3,4]. Researchers commonly aim to publish articles that will attract citations and thus be regarded as having high scientific impact, as this may be associated with career advancement.
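The two-year impact factor described above reduces to a simple ratio. The minimal Python sketch below illustrates only the arithmetic; the function name and the journal counts are hypothetical, and the official calculation involves further editorial decisions about which items count as citable.

```python
def journal_impact_factor(citations: int, citable_items: int) -> float:
    """Two-year journal impact factor for a census year Y (hypothetical helper).

    citations: citations received during year Y by items the journal
        published in years Y-1 and Y-2.
    citable_items: number of citable items the journal published in
        years Y-1 and Y-2.
    """
    if citable_items <= 0:
        raise ValueError("citable_items must be positive")
    return citations / citable_items


# Hypothetical journal: 600 citable items published in 2004-2005 that were
# cited 12,000 times during 2006, giving a 2006 impact factor of 20.0.
print(journal_impact_factor(12_000, 600))  # 20.0
```

The same ratio explains why a 2006 impact factor reflects articles published in 2004 and 2005 rather than those published in 2006.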
In this context, we aimed to examine the factors associated with the number of citations received by published articles, focusing on the article's length.

Data extraction
The abstract and/or full text of each article was accessed to collect information on article length and on characteristics reported in previous studies to affect the number of citations. Specifically, the documented variables comprised the number of authors and of affiliated institutions, title and abstract word count, article length (number of print pages), number of bibliographic references, study design (human or experimental studies; prospective or retrospective; interventional or observational), access to the article (open access or subscription required), and the 2006 journal impact factor (JIF).

Data analysis and statistical methods
Statistical analyses were performed using SPSS Version 20.0. Initially, the association of each independent variable with the dependent variable (citation count) was assessed with univariate analyses (the Mann-Whitney U test for categorical variables and Spearman's rank correlation for continuous variables); non-parametric methods were used because citations of articles published in General Medicine journals are known to follow a skewed, non-normal distribution [21]. Variables associated with the citation count in univariate analysis at p<0.10 were then entered in a multiple linear regression model with backward elimination to identify independent predictors of a higher number of citations. The model was also run with a logarithmic transformation of the dependent variable (number of citations) to assess for a logarithmic rather than linear relationship between the dependent and independent variables; since the log-transformed model performed better, only its results are presented. Because the journal impact factor is well established as a major factor affecting citations, we also repeated the multiple regression analysis separately for each journal, to reduce the possibility that the associations of article length and number of authors with the number of citations were false positives.
The final model met the assumptions of linear regression, including lack of correlation of the error terms (Durbin-Watson = 2.013). Graphical examination of the residuals did not suggest violation of the linearity and normality assumptions. Multicollinearity was deemed unimportant (VIF <5 for every independent variable). Homoscedasticity was checked by examining the scatterplot of residuals against predicted values and was met once outliers were excluded from the model. We also tested for outliers using added-variable and residual plots; three outliers, with 1314, 1185 and 793 citations, were identified and excluded.
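Two of the diagnostics mentioned above are simple to compute by hand. The sketch below, in plain Python with hypothetical helper names, computes the Durbin-Watson statistic from an ordered sequence of residuals (it ranges from 0 to 4, and values near 2, such as the reported 2.013, suggest no first-order autocorrelation). It also computes the variance inflation factor for the special case of exactly two predictors, where VIF = 1/(1 − r²) and r is the Pearson correlation between them. It illustrates the formulas and is not the SPSS implementation used in the study.

```python
import math


def durbin_watson(residuals: list[float]) -> float:
    """Sum of squared successive differences divided by the residual sum of squares."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1: list[float], x2: list[float]) -> float:
    """With exactly two predictors, regressing one on the other gives R^2 = r^2,
    so both predictors share VIF = 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Alternating residuals indicate negative autocorrelation (statistic above 2).
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0
# Uncorrelated predictors give the minimum possible VIF of 1.
print(vif_two_predictors([1.0, 2.0, 3.0, 4.0], [1.0, -1.0, -1.0, 1.0]))  # 1.0
```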
A variable was considered statistically significant if it had a p-value <0.05 in the final multivariable model.
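The backward procedure described in this section can be sketched in plain Python as follows. This is a hypothetical illustration, not the SPSS routine the authors used. It assumes ordinary least squares with an intercept on an already log-transformed outcome, and it removes the predictor with the largest p-value until all remaining p-values fall below 0.05. The p-values come from a normal approximation to the t distribution, which is adequate for a sample of about 190 articles but not for small samples.

```python
import math


def _invert(matrix: list[list[float]]) -> list[list[float]]:
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(y: list[float], columns: dict[str, list[float]]) -> dict[str, tuple[float, float]]:
    """OLS with intercept; returns {predictor: (coefficient, two-sided p-value)}."""
    names = list(columns)
    n = len(y)
    X = [[1.0] + [columns[c][i] for c in names] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    inv = _invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [y[i] - sum(X[i][a] * beta[a] for a in range(k)) for i in range(n)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # residual variance
    out = {}
    for j, name in enumerate(names, start=1):  # index 0 is the intercept
        se = math.sqrt(sigma2 * inv[j][j])
        z = beta[j] / se
        out[name] = (beta[j], math.erfc(abs(z) / math.sqrt(2)))  # normal approximation
    return out


def backward_eliminate(y, columns, alpha=0.05):
    """Drop the least significant predictor until every p-value is < alpha."""
    kept = dict(columns)
    while kept:
        fit = ols(y, kept)
        worst = max(fit, key=lambda name: fit[name][1])
        if fit[worst][1] < alpha:
            return fit
        del kept[worst]
    return {}


# Hypothetical demo data: "length" (centered page counts) has a true slope of
# 0.5 on the log scale, while "noise" is exactly orthogonal to the outcome.
# With real data the outcome would be [math.log(c) for c in citations].
length = [-7.0, -5.0, -3.0, -1.0, 1.0, 3.0, 5.0, 7.0]
noise = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]
error = [1.0, 1.0, -1.0, -1.0, -1.0, -1.0, 1.0, 1.0]
log_citations = [10 + 0.5 * p + 0.1 * e for p, e in zip(length, error)]
final = backward_eliminate(log_citations, {"length": length, "noise": noise})
print(sorted(final))  # ['length']
```

The demo data are constructed so that the outcome is unrelated to `noise`, which makes the elimination result deterministic; real citation data would of course not behave so cleanly.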

Results
A total of 196 articles were identified; after exclusion of experimental studies, 192 articles were analyzed. Citation counts ranged from 5 to 1314 (median 96.5; mean 166). Most studies were prospective (67.2%), open access (90.2%) and multi-center (67.2%). The most common study type was the trial (39.6%, including both randomized controlled and non-randomized trials). The study characteristics are presented in Table 1.
On univariate analysis, all tested independent variables except access (free versus restricted) and multi-center versus single-center design were significantly associated with citations (Table 2). Therefore, the following variables were entered in the multivariate model, with the logarithm of the number of citations as the dependent variable: JIF, number of authors, article length, prospective or retrospective design, type of study (interventional or observational), abstract and title word count, number of affiliated institutions, and number of references.
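The univariate screening step that produced this list can be sketched in plain Python as follows. This is a hypothetical illustration, not the SPSS procedure used. Spearman's rho is computed as the Pearson correlation of average ranks, and the Mann-Whitney U test compares citation counts between the two levels of a binary variable. Both p-values use large-sample normal approximations (no tie or continuity correction), which is reasonable for about 190 articles but rough for the tiny demo data. Variables with p<0.10, the entry threshold stated in the statistical methods, are retained.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of positions i..j, 1-based
        i = j + 1
    return ranks


def two_sided_p(z):
    return math.erfc(abs(z) / math.sqrt(2))


def spearman(x, y):
    """Spearman rho with a large-sample p-value from z = rho * sqrt(n - 1)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    rho = sxy / math.sqrt(sxx * syy)
    return rho, two_sided_p(rho * math.sqrt(n - 1))


def mann_whitney(group_a, group_b):
    """U statistic for group_a and a normal-approximation p-value."""
    n1, n2 = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, two_sided_p((u - mu) / sigma)


def screen(citations, continuous, binary, threshold=0.10):
    """Names of candidate predictors with univariate p < threshold."""
    selected = []
    for name, values in continuous.items():
        if spearman(values, citations)[1] < threshold:
            selected.append(name)
    for name, flags in binary.items():
        a = [c for c, f in zip(citations, flags) if f]
        b = [c for c, f in zip(citations, flags) if not f]
        if mann_whitney(a, b)[1] < threshold:
            selected.append(name)
    return selected


# Hypothetical demo data (not from the study).
citations = [5, 10, 20, 40, 80, 160, 320, 640]
pages = [2, 3, 4, 5, 6, 7, 8, 9]       # perfectly monotone with citations
noise = [1, 2, 3, 4, 4, 3, 2, 1]       # Spearman rho of exactly 0
open_access = [False, True, False, True, False, True, False, True]
print(screen(citations, {"pages": pages, "noise": noise}, {"open_access": open_access}))
# ['pages']
```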
Backward elimination was performed, removing non-significant independent variables one at a time. Two variables independently predicted the number of citations: article length (number of pages) [regression coefficient (95% confidence interval): 0.079 (0.055-0.102), p<0.001; Figure 1] and JIF [0.008 (0.004-0.013), p<0.001; Figure 2]. These two variables explained 51.2% of the variance in the logarithm of citations (adjusted R² = 50.7%; p<0.001). The findings of the univariate and multivariate analyses are presented in Table 2.
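As a consistency check, the reported adjusted R² can be recovered from R² with the standard formula, adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1). The Python sketch below assumes n = 189 observations (the 192 analyzed articles minus the three excluded outliers; the exact n entering the final model is not stated) and k = 2 predictors. The helper name is hypothetical.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for n observations and k predictors (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Assumed n = 189 (192 articles minus 3 outliers) and k = 2 predictors.
print(round(adjusted_r2(0.512, 189, 2), 3))  # 0.507
```

Using n = 192 instead also rounds to 0.507, so the reported 50.7% is consistent with either sample size.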

Discussion
The main finding of this study is that article length and journal impact factor are independently associated with the number of citations received by each article. Although several previous studies have reported that the journal impact factor is associated with article citations, this is, to the best of our knowledge, the first study to report a positive association between article length and article citations after adjustment for several potentially confounding variables, such as study design, prospective or retrospective nature of the study, abstract and title word count, number of author-affiliated institutions and number of bibliographic references. Specifically, the logarithm of citations increased on average by 0.079 for each additional page and by 0.008 for each unit increase in the journal impact factor. Greater article length could reflect greater scientific complexity and higher methodological quality; in addition, longer articles are expected to contain more information, increasing the likelihood that some of it will be cited by other researchers. Furthermore, longer articles allow the study methodology and findings to be presented and discussed more clearly and in more detail, and can therefore have a greater impact. It should be highlighted that our findings probably do not apply to long articles whose results have been improperly "inflated"; after all, some of the greatest discoveries in science have been described only briefly [22].
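Because the dependent variable was log-transformed, the coefficients act multiplicatively on citation counts. A coefficient b corresponds to a change of 100 × (base^b − 1) percent in the geometric mean number of citations per unit increase of the predictor. The base of the logarithm is not stated in the text, so the Python sketch below (hypothetical helper name) shows both the natural-log and the base-10 readings of the reported coefficients.

```python
import math

PAGE_COEF = 0.079  # reported change in log(citations) per additional print page
JIF_COEF = 0.008   # reported change in log(citations) per unit of JIF


def percent_change(coef: float, units: float = 1.0, base: float = math.e) -> float:
    """Percent change in the geometric mean of citations implied by a
    log-scale regression coefficient for a given increase in the predictor."""
    return (base ** (coef * units) - 1.0) * 100.0


for label, base in (("natural log", math.e), ("log10", 10.0)):
    print(
        f"{label}: {percent_change(PAGE_COEF, base=base):.1f}% per page, "
        f"{percent_change(JIF_COEF, units=10, base=base):.1f}% per 10 JIF units"
    )
```

Under the natural-log reading, each additional page corresponds to roughly 8% more citations; under the base-10 reading, roughly 20%.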
A few studies have assessed, albeit not comprehensively, the impact of article length on future citations. In Astronomy and Astrophysics, longer articles were cited more often in some journals [23]. In Infectious Diseases, Clinical Microbiology and Antimicrobial Agents, brief reports were cited less often than full articles, even after adjustment for the journal impact factor [24]. This was not the case in another study that assessed 504 articles and adjusted for several confounding factors [13]. In contrast to our study, which assessed only original research articles, that study included numerous Cochrane reviews and reports from the Technology Assessment database (n = 108), which are typically lengthy; in addition, it excluded articles not meeting specific methodological and clinical relevance criteria. It reported a slightly negative correlation between article length and the number of citations received [−0.11 (−0.02 to −0.01)]; however, when Cochrane reviews and reports from the Technology Assessment database were excluded, no association between article length and citations was identified. Although the difference between those findings and ours is probably attributable to the different types of articles assessed (inclusion or exclusion of review articles), it remains to be shown whether our findings can be generalized to a larger part of the biomedical literature than just the 5 highest impact factor journals in General & Internal Medicine. In addition to the number of print pages, we found that the impact factor of the journal and the number of authors were associated with the citation count. Even though we limited our analysis to articles from high impact factor journals, articles published in the journals with the highest impact factors were cited significantly more often.
It should be noted that we used the 2006 journal impact factor, which is based on citations to articles published in 2004 and 2005, whereas the analyzed articles were published in 2006; the citations received by the analyzed articles therefore did not contribute to the impact factor used as a predictor, avoiding a potential bias. Our findings regarding the impact factor are consistent with previous studies that found the journal impact factor to be a major predictor of the article citation count [5–10].
Several other variables assessed in previous studies were incorporated in our analysis but failed to show a statistically significant association with the number of citations. The characteristics and findings of all relevant studies are briefly presented in Table 3. Some authors have described an association between study type and future citations, with more citations received by meta-analyses and randomized controlled trials and fewer by observational studies [11,12,14,16]; their findings have been limited by selection bias (articles from a specific specialty) [11,12,14,16] and inadequate adjustment for confounding factors [16]. Such findings were not verified in our analysis, as we found no citation advantage either for interventional over observational studies or for any specific type of study (trial, cohort, cross-sectional or case-control); however, this could also be attributed to the relatively small size of each subset of articles of a given study type. It has been debated whether open access distribution of articles leads to more citations [18–20,25,26] and whether scientific collaboration positively influences citation count [13,15,27]; we confirmed neither association. Last, we did not observe a significant impact of title length (word count) on future citations, in contrast to what other researchers have found [25,26]. This may be attributed to the lack of adjustment for confounding factors in those studies.
Our study is subject to certain limitations. First, it is subject to selection bias, as articles published in high impact factor General Medicine journals may not be representative of all published articles; for example, they are more likely to report multi-center RCTs than single-center case-control studies. Second, although our results are statistically significant, the observed associations may not be causal. Third, we did not assess the analyzed articles with regard to topic [11,15,16], paper quality [9,10], funding [15,18] or the authors' country of origin [7,18], factors that other authors have found to affect citations. Last, we assessed article length only as page count (not word count), so a confounding effect of between-journal differences in the number of words per page cannot be excluded.
In conclusion, for original research articles published in the major General Medicine journals, article length independently predicts the number of future citations, in addition to the journal impact factor. This may reflect a higher complexity level and quality of longer studies, and probably does not apply to inappropriately inflated articles. Additional studies are warranted to verify whether our findings generalize to a larger part of the biomedical literature.