Systematic review of statistically-derived models of immunological response in HIV-infected adults on antiretroviral therapy in Sub-Saharan Africa

Introduction In Sub-Saharan African (SSA) resource limited settings, Cluster of Differentiation 4 (CD4) counts continue to be used for clinical decision making in antiretroviral therapy (ART). Here, HIV-infected people often remain with CD4 counts <350 cells/μL even after 5 years of viral load suppression. Ongoing immunological monitoring is necessary. Due to varying statistical modeling methods comparing immune response to ART across different cohorts is difficult. We systematically review such models and detail the similarities, differences and problems. Methods ‘Preferred Reporting Items for Systematic Review and Meta-Analyses’ guidelines were used. Only studies of immune-response after ART initiation from SSA in adults were included. Data was extracted from each study and tabulated. Outcomes were categorized into 3 groups: ‘slope’, ‘survival’, and ‘asymptote’ models. Wordclouds were drawn wherein the frequency of variables occurring in the reviewed models is indicated by their size and color. Results 69 covariates were identified in the final models of 35 studies. Effect sizes of covariates were not directly quantitatively comparable in view of the combination of differing variables and scale transformation methods across models. Wordclouds enabled the identification of qualitative and semi-quantitative covariate sets for each outcome category. Comparison across categories identified sex, baseline age, baseline log viral load, baseline CD4, ART initiation regimen and ART duration as a minimal consensus set. Conclusion Most models were different with respect to covariates included, variable transformations and scales, model assumptions, modelling strategies and reporting methods, even for the same outcomes. To enable comparison across cohorts, statistical models would benefit from the application of more uniform modelling techniques. Historic efforts have produced results that are anecdotal to individual cohorts only. This study was able to define ‘prior’ knowledge in the Bayesian sense. Such information has value for prospective modelling efforts.


Introduction
The successful roll out of antiretroviral therapy (ART) in Sub-Saharan Africa (SSA) has dramatically improved the survival of Human Immunodeficiency Virus (HIV) infected people in this region, which remains a focal point of the HIV epidemic [1]. In the majority of cases, the successful suppression of plasma viral load (VL) after ART initiation to below detection levels facilitates immunological recovery in the form of rising CD4 (+) T cell counts. However, 'residual viremia', involving the multiplication of the virus, within for example gut reservoirs, may continue even after circulating VL has been suppressed [2]. As a result CD4 cell count depletion may continue in long term treatment [2,3]. Patients particularly at risk of secondary opportunistic infections include immune 'non-responders' who have low CD4 counts in spite of a suppressed VL [2].
In resource limited settings (RLS) such as SSA, CD4 counts continue to be used for clinical decision making, e.g. when to initiate first-line, switch to second-line ART [4] and to benchmark the risk of incident clinical events [5,6]. In this region, patients who fail to reach >350 cells/μL after 5 years of ART [7] are common and ongoing immunological monitoring is necessary. CD4 count is more affordable than VL monitoring and continues to be the only immunological biomarker recommended by the World Health Organization [8].
However, as a biomarker, CD4 counts are known to be inherently variable both within and between individuals [2,9]. Further, prior multivariate models of CD4 count response to ART have employed varying outcome measures and have consequently produced inconsistent results [10][11][12][13][14][15][16][17]. This variation in models complicates the effects of inherent variation in CD4 counts and hinders the comparison of immunological responses to ART across different cohorts.
In this study we systematically review statistical, or empirically-derived rather than biological-mechanistic mathematical, models of immunological response (CD4 counts) in SSA cohorts. We highlight the similarities, differences and problems associated with the varying methodologies with the aim of defining prior knowledge, in the Bayesian sense, for prospective modeling exercises in the future.

Search strategy
The guidelines from the Preferred Reporting Items for Systematic Review and Meta-Analyses (PRISMA) were used (S1 File) [18]. The search syntax was constructed around 4 major terms, allowing for small variations within each. These included 'immune response', 'HIV antiretroviral treatment' or 'ART', 'Statistical model', and 'Sub Saharan Africa' or 'SSA'. Each term was defined based on Medical Subject Heading (MESH) terms or other common, published terminology. Online electronic databases were searched using SCOPUS [19], from 1 st January 2004 up to 2 nd April 2015 (S2 File). This start date was selected as it corresponds to the commencement of ART scale-up in most of SSA [4]. Only studies published in peer-reviewed Englishlanguage journals, which existed in all four sets mentioned above were selected.

Study selection
Abstracts and full-texts of potentially relevant studies were reviewed by JBS and ELU. MN provided the deciding vote if consensus was not unanimous regarding the inclusion or exclusion of a study. Only studies of immunological response, measured as an outcome in any form, after ART initiation in adults were included. Although immune response is not limited to CD4, our searches only returned modeling studies that employed it. Studies were excluded where: 1. there was no multivariate statistical model, 2. immune response was combined with any other treatment outcome, 3. data was analyzed that contained a combination of people from SSA with those from other regions, and 4. immune response prior to ART initiation was analyzed.
Model outcomes were categorized into 3 general groups, further sub-divided by the type of regression used: 1. the trajectory of CD4 counts within particular time-frames after ART initiation, or 'slope' models, with Generalized Estimating Equations (GEE), and Generalized Linear Mixed Effects (GLME), 2. the time to a particular immune response, or 'survival' models, with Cox Proportional Hazards (CPH) and 3. the specified overall gain in CD4 count, or 'asymptote' models, with Logistic, Simple Linear, Difference-in-Difference, Log-Binomial and Poisson regression.

Data extraction
The following data was extracted from each study: first author, year published, country, the sex/es studied, sample size, study design, ART follow-up years, initiating ART regimen (if reported), outcome/s analyzed, variable scale transformation methods, criteria for model variable/s selection (e.g. statistical methods and/or a priori clinical information), assessment of confounding and covariates adjusted for in the final model. For each of the final model variables, the unit and scale of measurement, effect sizes, 95% confidence intervals and, where available, standard deviations were noted. Effect sizes were rounded off to the nearest whole number and 95% confidence intervals and standard deviation to one decimal place. If 'immunologic failure' was mentioned, we checked if it was defined according to the WHO criteria [4]. Risk of bias was also assessed in each study as follows: Low risk-covariate adjusted for in model based on its clinical/biological plausibility; medium risk-covariates included based on both biological and statistical significance; and high risk-model employed only statistical significance (p-value). The provision by authors of biological reasoning, including references, for their covariate adjustments was noted.

Statistical analysis
All data was collated in MS Excel (version 2013) and comparisons made as per the tables below. In R version 3.2.2 using package 'wordcloud' [20], variables adjusted for in the final multivariate models were presented. In wordclouds, the size and color of each word is determined by the frequency of its appearance in a list, in this case all covariates adjusted-for within a specified outcome. This enabled the comparison of variables with potentially different units and/or numeric scales. A minimal frequency cutoff of !3 was used to define the 'consensus' set of covariates across all models reviewed.

Results
Of the 615 articles identified 580 were excluded based on the specified inclusion criteria (Fig  1). Of the remaining 35 the median sample size (and IQR) was 1002 (351-5448) with followup of 2 years (1)(2)(3)(4)(5). Across all models, 75 unique covariates were included in multivariate analysis, of which 69 were adjusted for in the final models. In the majority of cases the effect sizes of covariates were not directly comparable in view of the combination of different variables and varying scale transformations methods across models. However, the frequency of the occurrence of variables, independent of their scales, enabled the identification of a consensus set (Fig 2).
For slope models this included, gender, baseline age, baseline CD4 count, baseline WHO stage, ART initiating or 'baseline' regimen, e.g. efavirenze vs nevirapine, baseline exposure to   Figure 2A: Covariates adjusted for in the final slope models; Figure 2B: Covariates adjusted for in the final Survival models; and figure 2C: Covariates adjusted for in the final Asymptote models. The word size and color represents the frequency of covariates, hence the larger the size of the covariate, the higher its frequency in the list of adjusted covariates. Sitelocation of the study; KSincid-Kaposis' sarcoma diagnosed after ART start; HBVprev-Hepatitis B virus diagnosed at ART start; TBprev-History of TB at ART start; TDFbl-treated with tenofovir at ART start; 3TCbl-treated with lamivudine at ART start; DistanceHC-distance from health center; Maritstatus-marital status of the subject; Season-season of the tear when patient was initiated on ART; ALTbl-alanine aminotransferase at ART start; sdNVP-history of single does nevirapine; Parity-number of children; CD8bl-CD8 count at ART start; CONSULTratio-cadre levels at health center; Hhassets-possession of any household assets; OralCandida-Oral candidiasis at ART start; ChronDiarrhea-Chronic diarrhea at ART start; VLsupress-ever had viral suppression; NNRTIcr-time-updated exposure to either nevirapine or efavirenz; NRTI cr -time-updated exposure to d4T cr (stavudine) or AZT cr (zidovudine) or TDF cr (tenofovir) or 3TC cr (lamivudine); CD4preART-pre-ART start CD4 count; VLpreART-pre-ART start viral load; PreARTexp-pre-ART exposure; AlcoholCons-consumption of alcohol; DurapreART-duration between ART start and diagnosis; duraCD4<200-duration while CD4 <200 cells/μL before ART start; and antiTBstart-patient initiated on anti-tuberculosis medicine. For other variable definitions, please refer to the notes below Tables 2, 3 and 4. zidovudine or stavudine, ART duration, log-VL, baseline hemoglobin level, baseline Body Mass Index (BMI), year of ART start, study site and tuberculosis incidence. For survival models, baseline CD4 count, gender, baseline age and either prevalent or incident tuberculosis. For asymptote models, gender, baseline age, baseline CD4 count, baseline zidovudine exposure, year of ART start, ART adherence, log-VL and baseline BMI. Across all three types of models, Sex, Age, baseline log VL, baseline CD4, ART initiation regimen and ART duration count were the most commonly adjusted for covariates and also those most often significantly associated with the immunological outcomes (Table 1).
Differences were found in the estimation of effect sizes and residuals across all 19 slope models ( Table 2, [50], see Table 4. For criteria used to select covariates for final multivariate models and assessment for confounding (Table 5), 9 studies reported using significance cutoffs ranging from 0.05 to 0.25, and ; HB cr -current hemoglobin level; YrARTstart-year of ART start; ART dura -duration on ART; AZT bl exposure to zidovudine at baseline; NRTI bl -exposure to d4T bl (stavudine) or 3TC bl (lamivudine) at ART start; NNRTI bl -exposure to either efavirenze or nevirapine at ART start; ARTresist-pre-ART drug resistance; TBincid-Incident tuberculosis diagnosis after ART start; TIMEpreg-duration between pregnancies; CD4pregwhether CD4 count was taken during pregnancy; doi:10.1371/journal.pone.0171658.t002   Systematic review of statistically-derived immune response models in SSA

Discussion
This study systematically reviewed recent statistical or empirically-defined models of CD4 count response in HIV-infected adults on ART in SSA. The aim was to arrive at a set of model covariates and outcomes that might allow the comparison of modeling results between cohorts. From the studies reviewed, Sex, Age, baseline log VL, baseline CD4, ART initiation regimen and ART duration were the most commonly adjusted covariates and also those most often significantly associated with the different metrics of immune response across all models reviewed. Many permutations were found, in fact, the majority of the models were different with respect to variable transformations and scales, varying model assumptions, modeling strategies, model reporting methods and the use of different covariates, even if the same outcomes had been studied. In particular: In the CPH models studied, authors did not adjust for time-updated variables [38][39][40]42]. It was assumed that patients remained on their initiation regimen throughout the period of follow-up. It is known from studies of ART regimen durability and tolerability that drug toxicity will often occur in the period soon after initiation, necessitating drug substitutions [51,52]. Such switches are obviously important in understanding CD4 responses, particularly if more potent drugs are subsequently employed. 'Joint' time-to-event and longitudinal (or repeated) measure models may be used for time-updated covariates, in which a 2-phase process involves combining the model/s of the endogenous longitudinal covariate/s with a CPH model [53].
All seven studies which analyzed time to immunological failure did so for only the time to the first failure episode [17, [38][39][40][41]. However, it has been conjectured that multiple failures may actually occur and be hidden by the normal variability seen in adult CD4 counts [9]. CPH models are not appropriate for multiple failure-time points since the outcome terminates after the first event. Further, the assumption of the independence of outcomes is violated since events within an individual are correlated [54]. Corrections to such models for correlated failure time points have been implemented in the form of Andersen-Gill, Marginal Wei-Lin-Weissfeld or Prentice-Williams-Peterson methods [54,55]. If multiple episodes of immunologic failure are present, as defined by the WHO criteria, then the Andersen-Gill method would appear to be a good choice [55].
In selecting regression methods, considerations regarding covariate distributions and the mathematical assumptions regarding their relationship/s with the outcome are important. These assumptions can be tested a priori using the dataset at hand. Only 4 of the studies reviewed indicated that such tests had been used to confirm that the particular covariates fulfilled the model assumptions [15,23,32,41]. If the assumptions are violated it is not possible to estimate the effect of the covariates on the outcome with both precision and accuracy [56].
Six studies used GEEs to model the slope of the CD4 count response. Three defined the outcome as the change in CD4 count from baseline, i.e. from ART initiation [13,16,21] and the others used the change in CD4 count between each subsequent visit [22][23][24]. Of the 11 studies that used GLMEs, 2 used the increase from the baseline as outcome [32,33], one used change in CD4 count between subsequent visits [27] and 8 used absolute change in CD4 counts over time [14,25,26,[28][29][30][31]34]. Only one study used non-linear mixed effects regression [36]. Selecting either GEE-population averaged effects, or GLMM-individual averaged effects, is possible using tests of assumptions regarding the underlying mechanisms of CD4 count response [57]. CD4 counts vary due to both individual patient characteristics and laboratory procedures [58][59][60]. Given the individual effects, GLMEs may be preferable to GEEs in this context. Non-Linear Mixed Effect models (NLMEs) may also be used since they take into consideration mechanistic biological assumptions and both the underlying subject-specific longitudinal responses (CD4) and the variation of these across the study group over time [57].
Sarfo et al 2014 [35] modelled CD4 count response using a GLMM with a Poisson distribution. Baseline CD4 counts, being female, increasing ART duration and baseline WHO stage (stage 1 and stage 2) where associated with increasing CD4 counts, while initiating ART on efavirenz and zidovudine based regimens and higher baseline age were associated with decreasing CD4 counts. These results apparently support prior studies [2,61]. However, the incident rate ratios (IRR) in their final model were close to null. The Poisson distribution assumption may have biased the results towards the null and presumably explains their rounding off IRR to 3 decimal places of [35]. The Poisson model assumes that the probability of the occurrence of any two events p(x\y) is negligible, and the probability of the occurrence of an event p(x) is constant throughout the interval, Δt. In [35] the sampling frequency for CD4 count was 6 monthly, thus, the probability of having another CD4 count measurement was never negligible. Further, the probability of increasing CD4 counts throughout the sampling interval is variable due to adherence, opportunistic infections, and drug resistance [62].
There was also variation in approaches to adjustment for confounding between covariates. Confounding usually refers to a !10% change in the coefficient estimate of the main predictor after adjusting for the effect of a covariate [56]. It does not relate to the significance of the p-values for covariates in the model. Four studies did not report the criteria used to select covariates to be adjusted for in the multivariate models [39,40,47,50]. In others [36,38,41,46], covariates were excluded from the final model since they were not statistically significant. This practice may exacerbate confounding [63,64]. Directed Acyclic Graphs (DAGs) can be used as a non-statistical modeling strategy for multivariate analysis [65]. Such causal diagrams, which are based on clinical or biological assumptions, are useful for deciding on the minimal set of covariates to adjust for. Some studies [17,40,42,43,46] did not adjust for covariates, such as age and baseline CD4 count, even if appropriate data had been collected. Prior reviews by Pinzone et al 2012 [2] and Corbeau et al. 2011 [61] have shown that both baseline age and baseline CD4 count are associated with immunological response to ART.
Covariate scale transformations were reported to have been assessed in only five studies [14, 24,25,32,34]. Others, report a square root transformation of CD4 counts [14, 32,36]. Variable transformations are obviously important in meeting the distributional assumptions of the model/s [56]. Reda [32] investigated a wide range of variable transformations for all variables in their model, while Sudfeld et al 2012 [24] transformed only the main predictor-Vitamin D levels. Other studies employed polynomial transformations of time on treatment [14, 25,34] or regression splines on time [22][23][24]28]. Graphical inspection of the effect of covariate transformation are possible prior to modelling, while statistical tests such as Akaike's Information Criterion (AIC) and the Bayesian Information Criterion (BIC) are useful afterwards [66]. It is also possible to apply Martingale residuals [55] for CPHs. Caution is always required in variable transformation since, for example, categorizing continuous variables may result in residual confounding [67,68]. Further, the interpretation or translation of results into practice becomes problematic as it is no longer direct.
In terms of model validation, only 5 out of 34 studies provided goodness of fit metrics. These included the AIC [15], Hosmer-Lemeshow test [43], and the Log-likelihood ratio [31,32,45] goodness of fit tests. Other possible techniques include cross-validation, i.e. regressing the model on the training dataset to see if it still predicts the outcome, and graphical methods, i.e. analyzing whether model residuals are random by plotting predicted versus observed values. Without such validations there is a risk of overfitting to data [56]. Similarly, the dissemination of results also has a bearing on the comparability of models. Six studies reported only p-values without beta coefficients or confidence intervals [22][23][24][25]46,48] and two studies reported only model coefficients and p-values [34,46]. Ideally, both model coefficients and confidence intervals should be reported. Significant p-values continue to be commonly employed in modeling practice, but these do not indicate clinical significance nor the precision of parameter estimates [69].
Criticism of routine CD4 monitoring in ART has occurred due to the innate biological variation in these counts [9]. However, the value of such criticism seems questionable when it is presented in the absence of suggestions for alternatives, particularly given the fact that HIV is a disease which targets the immune system. Arguably, the limitation of immunological monitoring to only CD4, particularly in SSA, has been based more on considerations of public-health affordability than individual patient welfare. Alternative biomarkers, though considered as indirect immune markers [3], have existed for some time, including among others: Natural Killer (NK) cells, which secrete interferon activating macrophages, which in turn feed off infected and stressed cells and Plasmacytoid Dendritic Cells, which secret type-1 antiviral interferons [3]; β-defensins, which aid in the production of NK cells have also been associated with immunologic response [25,70]; and Co-stimulatory CD28 or co-inhibitory cytotoxic Tlymphocyte antigen 4 proteins, which are expressed by all T-cells in HIV infected people [2,3,71]. The possibility obviously exists to use a combination of CD4 and alternative biomarkers to provide a robust description of the immune system in ART.
This study has limitations. Publication bias may be present in view of the inclusion of only studies published in peer reviewed journals. While specified in the inclusion criteria, only statistical or empirically-derived models were reviewed. This excluded those originating in mechanistic biological theory but did include those expressly incorporating assumptions regarding biological causality. All data collected regarding the models was contingent on the information provided in each study, and based on the assumption that these models should be reproducible using other similar datasets. In comparing the frequency of variables across models, we used a threshold of !3 which may have excluded 'rare' covariates in SSA cohorts. Only a small number of studies analyzed covariates on comparably transformed or untransformed scales. This negated the possibility of a meta-analysis, i.e. direct quantitative comparisons, since the models adjusted for varying sets of covariates. This situation may be understandable in terms of the facts that certain studies aimed at elucidating particular treatment effects, and that authors are incentivized to publish unique results.
In conclusion, for purposes of comparing immunological, i.e. CD4 count, outcomes across cohorts in SSA, statistical models would benefit from the application of more uniform modelling techniques. The value of the historic models to public health in SSA is questionable since the modeling was apparently performed in the absence of a priori comparisons across studies. That is, since such efforts have produced results that are anecdotal to individual cohorts only. However, this study was able to define 'prior' knowledge, in the Bayesian sense. Qualitative and semi-quantitative, rather than quantitative and completely comparable effect sizes, for variables in models of immunological response to ART were defined. Such information has value in terms of prospective modeling efforts in the future.