A Review of the Study Designs and Statistical Methods Used in the Determination of Predictors of All-Cause Mortality in HIV-Infected Cohorts: 2002–2011

Background: Research on the predictors of all-cause mortality in HIV-infected people has been widely reported in the literature. Making an informed decision based on this research requires an understanding of the methods used.
Objectives: We present a review of the study designs, statistical methods and their appropriateness in original articles reporting on predictors of all-cause mortality in HIV-infected people between January 2002 and December 2011. Statistical methods were compared between 2002–2006 and 2007–2011. Time-to-event analysis techniques were considered appropriate.
Data Sources: PubMed/MEDLINE.
Study Eligibility Criteria: Original English-language articles were abstracted. Letters to the editor, editorials, reviews, systematic reviews, meta-analyses, case reports and any other ineligible articles were excluded.
Results: A total of 189 studies were identified (n = 91 in 2002–2006 and n = 98 in 2007–2011), of which 130 (69%) were prospective and 56 (30%) were retrospective. One hundred and eighty-two (96%) studies described their sample using descriptive statistics, while 32 (17%) made comparisons using t-tests. Kaplan-Meier methods for time-to-event analysis were more commonly used in the earlier period than in the later one (n = 69, 76% vs. n = 53, 54%, p = 0.002). In both periods, predictors of mortality were commonly determined using Cox regression analysis (n = 67, 75% vs. n = 63, 64%, p = 0.12). Only 7 (4%) studies used the advanced survival analysis method of Cox regression with frailty, 6 (3%) of them in the later period. Thirty-two (17%) used logistic regression, while 8 (4%) used other methods. Significantly more articles from the first period than from the second used appropriate methods (n = 80, 88% vs. n = 69, 70%, p = 0.003).
Conclusion: Descriptive statistics and survival analysis techniques remain the most common methods of analysis in publications on predictors of all-cause mortality in HIV-infected cohorts, and prospective research designs are favoured. Sophisticated techniques such as time-dependent Cox regression and Cox regression with frailty are scarce. This points to a need for more training in the use of advanced time-to-event methods.


Introduction
Appropriate use of biostatistical methods is becoming increasingly important in biomedical research. Many journals have a dedicated statistical committee that scrutinizes the methods used to analyze data. In the last decade, several papers addressing study design issues and statistical analysis approaches in different clinical fields have been published, underscoring the importance of methodological robustness [1][2][3][4][5][6][7][8]. There is consensus that inappropriate study designs and statistical methodology lead to incorrect results, poor interpretation of study findings and wrong conclusions.
A wide array of study designs and statistical techniques of varying complexity exists. Selecting an appropriate study design and statistical analysis technique depends largely on the complexity of the study and its objectives. Studies of the statistical content of medical research show increasing use of techniques beyond descriptive statistics [9,10], as a result of software that can handle complex analyses. Although advanced analyses are increasingly conducted, simple descriptive and inferential techniques such as Student's t-tests and chi-square tests remain popular in the literature [4,6,11].
Despite major successes in the development of interventions for the prevention of mother-to-child transmission (PMTCT) and of antiretrovirals (ARVs), HIV remains a major public health concern. To date, little if any information is available on the study designs and statistical techniques used to determine the predictors of all-cause mortality in HIV-positive cohorts over the last decade. With many clinicians and public health experts relying on published research for new developments in HIV research, it is important that they understand the appropriateness of the study designs and statistical techniques used to determine predictors of all-cause mortality. This study reviews relevant original articles in HIV-infected cohorts with the aim of identifying the study designs and statistical methods used and assessing their appropriateness. We also sought to determine whether the use of time-to-event analysis techniques increased over time, and to highlight the need for methodological training.

Search strategy and selection criteria
In this bibliometric analysis, we searched all original English-language articles indexed in PubMed/MEDLINE using the terms "Predictors of HIV Mortality", "Determinants of HIV Mortality" and "Factors associated with HIV mortality". The search covered the ten-year period between January 2002 and December 2011. This was further split into two five-year periods, January 2002 to December 2006 and January 2007 to December 2011, in order to assess whether the methods used varied over time. Original articles on HIV-infected cohorts published within this period were eligible for inclusion. Letters to the editor, editorials, reviews, systematic reviews, meta-analyses and case reports were excluded, as were studies comparing HIV-positive and HIV-negative participants. We identified a total of 91 and 98 papers in the periods 2002–2006 and 2007–2011, respectively.
Each article was reviewed to determine the study design, the nature of the statistical methods used and their appropriateness. Time-to-event analysis methods were considered optimal, or appropriate, in this study. A spreadsheet containing a checklist of items of interest was prepared as a data collection tool. Findings were systematically recorded based on previously reported statistical methods [12]. We used a modified version of the classification proposed by Colditz and Emerson (Table 1) [13,14]. Where a statistical technique was used more than once in an article, it was recorded only once. The number of statistical techniques employed in each article was counted in order to compare the two periods.
The statistical methods used in the research articles were classified as either parametric or non-parametric, and further as either basic or advanced. Methods classified as basic included the Student's t-test, chi-square and Fisher's exact tests, the Mann-Whitney, Kruskal-Wallis and Wilcoxon tests, simple one-way ANOVA and correlation statistics. Modelling approaches such as logistic regression, conditional logistic regression, Poisson regression, Cox regression, time-varying Cox regression and Cox regression with frailty, together with epidemiologic statistics, were classified as advanced (Table 1).
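The recording and classification rules above (each technique counted once per article, then grouped as basic or advanced) can be sketched in Python. This is a minimal illustration, not the tool used in the review, which was a spreadsheet checklist. The technique labels, the functions `classify`, `count_bin` and `summarise`, and the article identifiers are hypothetical, and the sets list only the techniques named in the text rather than the full Table 1.

```python
from collections import Counter

# Hypothetical labels; only techniques named in the text are listed (Table 1 is not reproduced).
BASIC = {
    "t-test", "chi-square", "fisher exact", "mann-whitney",
    "kruskal-wallis", "wilcoxon", "one-way anova", "correlation",
}
ADVANCED = {
    "logistic regression", "conditional logistic regression",
    "poisson regression", "cox regression", "time-varying cox regression",
    "cox regression with frailty", "epidemiologic statistics",
}


def classify(technique: str) -> str:
    """Return 'basic', 'advanced' or 'other' for one technique label."""
    if technique in BASIC:
        return "basic"
    if technique in ADVANCED:
        return "advanced"
    return "other"


def count_bin(n_methods: int) -> str:
    """Group the number of distinct techniques into the bands used in the Results."""
    if n_methods < 2:
        return "fewer than 2"
    if n_methods <= 3:
        return "2 to 3"
    if n_methods <= 5:
        return "4 to 5"
    return "more than 5"


def summarise(articles: dict[str, list[str]]) -> dict[str, dict]:
    """Per article: number of distinct techniques, its band, and basic/advanced counts."""
    summary = {}
    for article_id, techniques in articles.items():
        distinct = set(techniques)  # a technique used repeatedly is recorded once
        summary[article_id] = {
            "n_methods": len(distinct),
            "band": count_bin(len(distinct)),
            "classes": Counter(classify(t) for t in distinct),
        }
    return summary


if __name__ == "__main__":
    demo = {"article-1": ["t-test", "chi-square", "cox regression", "cox regression"]}
    print(summarise(demo)["article-1"]["n_methods"])  # 3: the repeated Cox regression counts once
```

Techniques outside the two sets (for example Kaplan-Meier, which the text does not place in either group) fall into "other", so the coding scheme can be extended without changing the counting logic.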
Logistic regression is used to analyse the relationship between a binary dependent variable and one or more independent predictors by estimating the probability of an event occurring. It does not assume normality or homogeneity of variance, nor a linear relationship between the predictors and the outcome itself; linearity is assumed only on the log-odds scale. However, when used with time-to-event outcomes, it fails to account for follow-up time. For this reason, articles reporting the use of logistic regression for such outcomes were classified as sub-optimal [15]. Cox regression analysis is a survival analysis technique for time-to-event data that incorporates follow-up time and fixed covariates [16]. Participants who have not experienced the event by the end of follow-up, or who are lost to follow-up, are censored at their last observed time. The standard model assumes proportional hazards and that, given the covariates, the risk of the event is homogeneous across individuals. Extensions of Cox regression include time-dependent Cox regression and Cox regression with frailty [17,18]. Time-dependent Cox regression accommodates covariates whose values change during follow-up (for example, repeated CD4 count measurements) rather than fixing them at baseline. Cox regression with frailty, used in a few of the reviewed articles, adds a random effect (the frailty) to account for unobserved heterogeneity, including correlation among individuals in the same cluster.
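To make concrete how Cox regression uses follow-up time, the following minimal Python sketch fits a single fixed covariate by Newton-Raphson on the partial likelihood, comparing each death with everyone still at risk at that time. It rests on simplifying assumptions (one baseline covariate, Breslow's approximation for tied times, no time-dependent covariates or frailty) and is not the software used in the reviewed articles; the function names and toy data are hypothetical.

```python
import math


def partial_loglik(beta, times, events, x):
    """Cox partial log-likelihood, score and information for one covariate.

    times: follow-up time; events: 1 = death observed, 0 = censored;
    x: baseline covariate. Tied event times use Breslow's approximation.
    """
    loglik = score = info = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored people contribute only through the risk sets
        # Risk set: everyone still under follow-up at the time of this death
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        w = [math.exp(beta * x[j]) for j in risk]
        s0 = sum(w)
        s1 = sum(wj * x[j] for wj, j in zip(w, risk))
        s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
        loglik += beta * x[i] - math.log(s0)
        score += x[i] - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return loglik, score, info


def fit_cox(times, events, x, max_iter=50, tol=1e-10):
    """Maximise the partial likelihood by Newton-Raphson; returns beta."""
    beta = 0.0
    for _ in range(max_iter):
        _, score, info = partial_loglik(beta, times, events, x)
        step = score / info  # assumes x varies within some risk set (info > 0)
        beta += step
        if abs(step) < tol:
            break
    return beta


if __name__ == "__main__":
    # Toy data: four deaths at times 1-4, alternating exposure x = 1, 0, 1, 0
    beta = fit_cox([1, 2, 3, 4], [1, 1, 1, 1], [1, 0, 1, 0])
    print(round(math.exp(beta), 4))  # hazard ratio for x = 1 vs x = 0; prints 2.5616
```

For this toy data, setting the score to zero gives u² − u − 4 = 0 with u = exp(β), so the hazard ratio is (1 + √17)/2 ≈ 2.56, which the iteration reproduces.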
The data collected in this study were compared between the two periods. Frequencies were used to determine the number of studies reporting each specified statistical technique and the number using optimal or sub-optimal methods to determine predictors of mortality.
The numbers of methods reported in the two periods were compared using the chi-square test where appropriate. All statistical analyses were performed using SAS 9.3, and p-values ≤ 0.05 were considered statistically significant.

Results
The total number of studies reporting on predictors of HIV mortality that met our criteria between January 2002 and December 2011 was 189 (91 in 2002–2006 and 98 in 2007–2011). Table 2 presents the study design and sample size distribution of the included articles in the two periods.
There were no significant differences in the study designs and sample sizes used between the two periods.
The number of studies reporting descriptive statistics was similar in the two periods. Table 3 presents the distribution of commonly reported statistical methods. The numbers of articles reporting the use of t-tests, contingency table analysis, correlation and epidemiological statistics were similar. Few studies in either period reported using one-way analysis of variance (ANOVA). The number of articles reporting modeling approaches such as logistic regression, conditional logistic regression, generalized estimating equations and Poisson regression was significantly higher in the later period than in the earlier one (n = 31, 32% vs. n = 13, 14%, p = 0.005).
A total of 122 (65%) articles reported using Kaplan-Meier methods, which were more commonly used in the first period than in the second (p = 0.002). Use of Cox proportional hazards regression modeling was reported by 131 (69%) articles, and the number was similar between the two eras (p = 0.12). Time-dependent Cox regression was used more often in the first period (n = 21, 23% vs. n = 11, 11%; p = 0.03). Overall, Cox regression with frailty was scarcely used (n = 7, 4%), and 6 (3%) of these articles were from the later period.
There were 22 (12%), 96 (51%) and 71 (38%) articles reporting the use of 2 to 3, 4 to 5 and more than 5 statistical methods, respectively. There were no significant differences between the two eras in the number of methods used, nor in the number of articles reporting the use of basic or advanced statistical analysis methods.
A total of 149 (79%) articles used appropriate methods, while 40 (21%) used sub-optimal methods to determine the predictors of mortality in HIV-infected participants. Of the articles using appropriate methods, 116 (78%) were prospective and 33 (22%) retrospective. Significantly more articles from the first period than from the second used appropriate methods (n = 80, 88% vs. n = 69, 70%, p = 0.003). Table 4 presents findings on the appropriateness of the statistical methods used. Overall, 147 (78%) articles had clustered data and could therefore have used Cox regression with frailty as the appropriate method; this was significantly more common in the first period (n = 82, 92% vs. n = 65, 68%; p < 0.0001).

Discussion
This paper aimed to review articles on predictors of all-cause mortality in HIV-infected people in order to investigate the nature of the study designs and the appropriateness of the statistical methods used. We reviewed a total of 189 articles. As with any study, this review has limitations. Articles were searched only in PubMed/MEDLINE, because this was not a systematic review requiring a measure of effect. Relevant articles indexed elsewhere or published in languages other than English were therefore not considered.
Our findings concur with other reports on study designs, statistical methods and their appropriateness. Prospective designs were the most common type of design used in studies of predictors of HIV mortality over the last decade.
Retrospective designs formed about one third of all articles included in this study. They may be used as a cost-effective way of avoiding the large expenses required to run prospective studies, and as a way of stimulating academic research. However, there was no significant difference in the type of study designs used between the two periods.
Basic statistical analysis procedures such as t-tests, chi-square and Fisher's exact tests, and the Mann-Whitney, Kruskal-Wallis and Wilcoxon tests are commonly used. There was no difference between the two periods in the number of articles reporting the use of t-tests, which is consistent with previous studies showing the popularity of t-tests [4,19].
All the studies used at least two statistical tests. We attribute this to our inclusion criteria and to the nature of the included studies, all of which required some form of statistical analysis to address their research question. Our findings concur with earlier reports showing that the majority of articles apply more than one statistical test [5,7,20]. They differ, however, from a review of study designs and statistical methods in Chinese journals, which found a low proportion of studies reporting the use of multiple statistical tests [21].
Survival analysis approaches, especially Cox proportional hazards regression modeling, remain popular in studies of predictors of mortality in HIV-infected people. Although fewer studies used extensions of Cox proportional hazards regression, our findings show interest in advanced approaches such as time-dependent Cox proportional hazards regression or Cox proportional hazards regression with frailty for modeling survival data in HIV-infected cohorts. We found that a high proportion (78%) of the studies could have used Cox regression with frailty, an appropriate technique for their clustered data. While the methods used were not wrong, these studies could have gained more information by using Cox regression with frailty. Previous work on statistical methods in medical research shows that while the use of sophisticated methods is increasing, inappropriate techniques remain a challenge [1,6,7,22]. Recent techniques may be demanding and require rigour to implement, and they are not necessarily available in standard statistical software [23]. As a result, researchers use techniques that are fairly straightforward and implementable in standard statistical software.
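For reference, the Cox model and the two extensions discussed above can be written in standard textbook notation; this is a general formulation, not one taken from the reviewed articles. Here h_0(t) is the unspecified baseline hazard, β the regression coefficients, R(t_i) the set of people still at risk at time t_i, and δ_i = 1 if a death was observed. The gamma distribution for the frailty is one common choice and is an assumption here; letting its variance θ go to zero recovers the ordinary Cox model.

```latex
% Cox proportional hazards model with fixed (baseline) covariates x_i
h_i(t) = h_0(t)\,\exp\!\left(\beta^{\top} x_i\right)

% Partial likelihood over observed deaths (\delta_i = 1); R(t_i) = people still at risk at t_i
L(\beta) = \prod_{i:\,\delta_i = 1}
  \frac{\exp\!\left(\beta^{\top} x_i\right)}{\sum_{j \in R(t_i)} \exp\!\left(\beta^{\top} x_j\right)}

% Time-dependent Cox model: covariate values (e.g. repeated CD4 counts) are updated during follow-up
h_i(t) = h_0(t)\,\exp\!\left(\beta^{\top} x_i(t)\right)

% Shared frailty model: person i in cluster g (e.g. a clinic) shares the frailty u_g
h_{gi}(t) = u_g\,h_0(t)\,\exp\!\left(\beta^{\top} x_{gi}\right),
  \qquad u_g \sim \operatorname{Gamma}(\text{mean } 1,\ \text{variance } \theta)
```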
Our findings show that not all the studies in our sample used optimal statistical methods to determine the predictors of mortality. For time-to-event outcomes, survival analysis techniques give more informative estimates because they use follow-up time and account for censoring. Furthermore, in clinical research where the objectives require a multivariable analysis, it is prudent to adjust for confounding appropriately by using optimal statistical methods [24]. Cox regression analysis and its extensions provide a better picture than logistic regression when analysing survival data. Unlike previously reported research [21,25] and findings in other clinical fields, where a high proportion of articles used sub-optimal methods [3,11,19,26,27], the proportion of studies using sub-optimal statistical tests was lower in our sample.
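The information lost when logistic regression is applied to survival data can be seen in a small hypothetical example. Two cohorts have the same number of deaths, so the proportion who died (which an intercept-only logistic model of died yes/no would estimate) is identical, yet deaths occur much earlier in one cohort. A crude mortality rate per person-time, used here as a simple stand-in for a full Cox model, picks up the difference. The cohorts and function names are invented for illustration.

```python
def proportion_died(events):
    """Crude proportion who died; ignores when the deaths happened."""
    return sum(events) / len(events)


def deaths_per_100_person_years(times, events):
    """Deaths divided by total follow-up time (years), per 100 person-years."""
    return 100 * sum(events) / sum(times)


# Hypothetical cohorts: follow-up in years; 1 = died, 0 = censored (alive at last contact)
early = ([0.5, 1.0, 5.0, 5.0], [1, 1, 0, 0])  # deaths soon after enrolment
late = ([4.0, 4.5, 5.0, 5.0], [1, 1, 0, 0])   # same number of deaths, much later

if __name__ == "__main__":
    for name, (times, events) in (("early", early), ("late", late)):
        print(name,
              proportion_died(events),
              round(deaths_per_100_person_years(times, events), 1))
```

Both cohorts give a proportion of 0.5, whereas the rates are about 17.4 and 10.8 deaths per 100 person-years, a difference that any analysis ignoring follow-up time cannot see.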
Descriptive statistics and survival analysis techniques remain the most common methods of analysis in publications on predictors of all-cause mortality in HIV-infected cohorts, and prospective research designs are favoured. Our findings agree with others who also reported a high use of descriptive statistics [4,6]. These results highlight the importance of understanding advanced survival analysis methods when interpreting research findings in this setting. However, the more sophisticated techniques of time-dependent Cox regression and Cox regression with frailty remain scarcely used. This points to a need for more training in the use of advanced time-to-event methods.

Supporting Information
Appendix S1 PRISMA checklist.