Efficacy of Initial Antiretroviral Therapy for HIV-1 Infection in Adults: A Systematic Review and Meta-Analysis of 114 Studies with up to 144 Weeks' Follow-Up

Background A comprehensive assessment of initial HIV-1 treatment success may inform study design and treatment guidelines. Methods Group-based, systematic review and meta-analysis of initial antiretroviral therapy studies, in adults, of ≥48 weeks duration, reported through December 31, 2012. Size-weighted, intention-to-treat efficacy was calculated. Parameters of study design/eligibility, participant and treatment characteristics were abstracted. Multivariable, random effects, linear regression models with backwards, stepwise selection were then used to identify variables associated with efficacy. Outcome Measures Antiviral efficacy (undetectable plasma viral load) and premature cessation of therapy. Results 114 studies were included (216 treatment groups; 40,124 participants; mean CD4 count 248 cells/µL [SD 81]; mean HIV-1 plasma viral load log10 4.9 [SD 0.2]). Mean efficacy across all groups was 60% (SD 16) over a mean 82 weeks' follow-up (SD 38). Efficacy declined over time: 66% (SD 16) at 48 weeks, 60% (SD 16) at 96 weeks, 52% (SD 18) at 144 weeks. The most common reason for treatment cessation was participant decision (11%, SD 6.6). Efficacy was higher with ‘Preferred’ than ‘Alternative’ regimens (as defined by 2013 United States antiretroviral guidelines): 75% vs. 65%, respectively, difference 10%; 95%CI 7.6 to 15.4; p<0.001. In 98 groups (45%) reporting efficacy stratified by pre-treatment viral load (< or ≥100,000 copies/mL), efficacy was greater for the lower stratum (70% vs. 62%, respectively, difference 8.4%; 95%CI 6.0 to 10.9; p<0.001). This difference persisted within ‘Preferred’ regimens. Greatest efficacy was associated with use of tenofovir-emtricitabine (vs. other nucleoside analogue backbones) and integrase strand transfer inhibitors (vs. other third drug classes). 
Conclusion Initial antiretroviral treatments for HIV-1 to date appear to have suboptimal long-term efficacy, but are more effective when commenced at plasma viral loads <100,000 copies/mL. Rising viral load should be considered an indication for starting treatment. Integrase inhibitors offer a treatment advantage (vs. other third drug classes).


Introduction
Combination antiretroviral therapy (cART) for human immunodeficiency virus (HIV)-1 infection typically comprises a 'backbone' of two nucleoside analogue reverse transcriptase inhibitors (NRTI) and a third drug: either a non-nucleoside reverse transcriptase inhibitor (NNRTI), a protease inhibitor (PI) or an integrase strand-transfer inhibitor (INSTI).
The United States Department of Health and Human Services (DHHS) guidelines form a major basis for HIV public policy in resource-rich settings. As of February 2013, an NRTI backbone of tenofovir-emtricitabine/lamivudine, with either efavirenz (NNRTI), raltegravir (INSTI), or ritonavir-boosted atazanavir or darunavir (PI) comprises 'Preferred' initial therapy [1]. When to commence cART remains guided by the clinical stage and CD4 lymphocyte count; pre-treatment plasma HIV viral load was removed as an indication for starting cART in 2007. While such recommendations arise from serial evaluation of individual studies by expert bodies, a systematic review of outcomes across multiple studies may reveal characteristics associated with success/failure, and so inform drug development, future study design, treatment guidelines and ultimately patient care.
Of the available meta-analyses of initial cART, most focus upon specific comparisons, or earlier studies [2,3,4,5,6]. None has evaluated outcomes beyond 48 weeks or by the regimen type ('Preferred' vs. 'Alternative'). Much data (some of it unpublished) have been generated since the last broad-ranging analysis of initial cART efficacy [4], and an updated comprehensive assessment of initial cART efficacy and its associations is warranted.

Methods
This systematic review evaluated all prospective studies of initial cART in adults reported through December 31, 2012. The primary outcome measure was antiviral efficacy, defined as undetectable (study-defined) plasma HIV viral load reported on an intention-to-treat (study-defined) basis. Substitution of any initial drug was regarded as treatment failure. Secondary outcomes were efficacy at weeks 48, 96 and 144, change in efficacy post-week 48, and premature cessation of initial cART. Subgroup analyses of efficacy were performed within pre-treatment HIV viral load strata (≥ or <100,000 copies/mL plasma) and by use of a 'Preferred' or 'Alternative' regimen as per the February 2013 edition of the DHHS guidelines. Additionally, we aimed to identify characteristics associated with heterogeneity of summarised efficacy, and premature treatment cessation due to participant decision, adverse events or virological failure.

Study protocol and eligibility criteria
This study was conducted in accordance with the PRISMA Statement [7]. The protocol and analysis plan are available from the editors or authors upon request (Protocol S1).
This review aimed to include all studies of initial cART, subject to strict eligibility criteria to ensure data quality. Included studies: were conducted in consenting, treatment-naïve, HIV-1-infected adults; were a prospective cohort or randomised trial; reported efficacy data; and had a minimum of 48 weeks' follow-up. Comparative trials of initial cART were assessed only for the duration that the original randomisation was preserved.
We excluded studies of: retrospective or cross-sectional design; cART regimens categorised as not to be offered at any time in key treatment guidelines from 1996 through 2013 [1,8,9]; multiple/variable regimens within a single study arm; and directly-observed therapy. However, treatment groups with fixed, unspecified, dual-NRTI backbones and a common third drug were permitted [10]. Studies of novel, class-sparing regimens were considered for inclusion on an individual basis after review of the efficacy data (relative to standard triple-drug therapy based on a dual-NRTI backbone) by at least two authors. Boosting-dose ritonavir was not regarded as an antiretroviral drug (but was included in pill counts/dosing requirements). Apart from excluding any study presenting <48 weeks of efficacy data, eligibility criteria were the same as in our previous systematic review of pre-2008 studies [4].

Data sources and search strategy
The search period was January 1, 2008 to December 31, 2012. Electronic databases searched were: MEDLINE; Cochrane Central Register of Controlled Trials; United States National Institutes of Health clinical trials registry; and the International Standard Randomised Controlled Trial Numbers registry. For each of these, the search strategy was: '("drug") AND (HIV OR antiretroviral) AND (cohort OR randomised trial)', where "drug" was the generic or pre-approval code name of an antiretroviral drug. No language restriction was applied. Abstracts from the following key scientific meetings between 2008 and 2012 were also searched: Conference on Retroviruses and Opportunistic Infections; International AIDS Society; Interscience Conference on Antimicrobial Agents and Chemotherapy; and the International Congress on Drug Therapy in HIV. Product labels and medical reviews of antiretroviral drugs published by the United States Food and Drug Administration and the European Medicines Agency between January 1, 1996 and December 31, 2012 were reviewed manually.
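The search template described above can be expanded mechanically, one query per drug name. The following is a minimal sketch of that expansion only; the function name `build_query` and the short drug list are illustrative assumptions, not the authors' tooling or their full list of search terms.

```python
# Template taken from the search strategy described above; "drug" is replaced
# by a generic or pre-approval code name of an antiretroviral drug.
QUERY_TEMPLATE = '("{drug}") AND (HIV OR antiretroviral) AND (cohort OR randomised trial)'


def build_query(drug: str) -> str:
    """Return the search string for one antiretroviral drug name."""
    return QUERY_TEMPLATE.format(drug=drug.strip())


# Hypothetical, incomplete list of names, used only to illustrate the expansion.
EXAMPLE_DRUGS = ["efavirenz", "raltegravir", "tenofovir"]

queries = [build_query(name) for name in EXAMPLE_DRUGS]
```

Each resulting string would then be submitted to every database in turn; how that submission is done depends on the database interface and is not shown here.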
The above data were combined with all eligible 1995-2008 studies included in our earlier systematic review [4]. All were manually reviewed in duplicate by an author (FJL) for eligibility before being combined with results of the latest search. Discrepancies were discussed with a second author (AC).
Study synopses accessible on the following pharmaceutical company websites up to December 31, 2012 were reviewed: Abbott; Boehringer-Ingelheim; Bristol-Myers Squibb; Gilead Sciences; GlaxoSmithKline; and Roche. Other antiretroviral manufacturers did not have such websites.
The PRISMA flow diagram is depicted in Figure 1.

Data extraction
Study arms were regarded as individual groups. For every group, characteristics collected were: year of commencement; participant numbers; latest reporting format; study characteristics; eligibility criteria; participant/disease characteristics; treatment characteristics; efficacy; and rates of premature treatment cessation. Efficacy data were collected cumulatively through study weeks 48, 96 and 144.
Dosing requirements were assumed to be in accordance with the product label at the time of study initiation or subsequent approval, unless otherwise stated. Efficacy data reported using the 'time to loss of virological response' (TLOVR) algorithm were collected preferentially. The newer 'Snapshot' algorithm was not considered, due to the small numbers and reported similarity of results to TLOVR [11]. All plasma viral load assays were considered equivalent; when more than one threshold was reported (e.g. both <400 and <50 copies/mL plasma), the lowest was used.
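The two selection rules in this paragraph (prefer TLOVR, then use the lowest viral load threshold) can be expressed as a small function. This is a sketch under assumptions: the `EfficacyReport` record and `select_report` function are hypothetical names, and applying the TLOVR preference before the threshold rule is an assumed ordering that the text does not state explicitly.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EfficacyReport:
    algorithm: str           # e.g. "TLOVR" or another study-defined ITT method
    threshold: int           # viral load cut-off in copies/mL (e.g. 400 or 50)
    pct_undetectable: float  # efficacy (%) reported at that threshold


def select_report(reports: List[EfficacyReport]) -> EfficacyReport:
    """Prefer TLOVR results; among the candidates, use the lowest threshold."""
    if not reports:
        raise ValueError("no efficacy reports supplied")
    tlovr = [r for r in reports if r.algorithm.upper() == "TLOVR"]
    # Fall back to all reports when no TLOVR result is available (assumption).
    candidates = tlovr or reports
    return min(candidates, key=lambda r: r.threshold)
```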
Data were abstracted manually in triplicate by one author (FJL) using a standardised data collection form. Ethnicity was recorded as 'white', 'black' or 'other', defined as previously [4]. Cause of premature treatment cessation was categorised as one of: adverse event, participant decision (withdrawal, loss to follow-up), virological failure, and other. No imputation was made for missing data (marked as unavailable). Completed forms and the resulting electronic database were audited by a second author (AC) to ensure data quality.
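For illustration, the per-group record described above could be represented as follows. This is a hypothetical sketch, not the authors' data collection form: the class and field names are assumptions, and only a subset of the collected characteristics is shown. Missing values are kept as `None` rather than imputed, mirroring the rule stated above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class CessationCause(Enum):
    ADVERSE_EVENT = "adverse event"
    PARTICIPANT_DECISION = "participant decision"  # withdrawal, loss to follow-up
    VIROLOGICAL_FAILURE = "virological failure"
    OTHER = "other"


@dataclass
class TreatmentGroup:
    study_id: str
    year_commenced: int
    n_participants: int
    # Cumulative efficacy (%) keyed by study week (48, 96, 144); None = unavailable.
    efficacy_by_week: Dict[int, Optional[float]] = field(default_factory=dict)
    # Premature cessation (%) by cause; None = unavailable (no imputation).
    cessation_pct: Dict[CessationCause, Optional[float]] = field(default_factory=dict)

    def efficacy(self, week: int) -> Optional[float]:
        """Return efficacy at a study week, or None if it was not reported."""
        return self.efficacy_by_week.get(week)
```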

Summary measures and data synthesis
The study arm was the unit of analysis, with clustering by trial accounted for. Principal summary measures were the proportion of participants per group and the mean (for continuous variables, including proportions). For summary measures across study arms, outcomes were expressed as a mean percentage, weighted for group size. Standard error of the mean was used to adjust for group size differences and heterogeneity was quantified using the I² statistic [12]. Adjusted absolute difference in outcomes within categorical variables was expressed as a percentage coefficient. Differences between means were compared using Student's t-test.
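A size-weighted mean and the I² statistic can be computed from group-level data as sketched below. This is an illustrative calculation, not the authors' STATA code: it assumes inverse-variance weights based on the binomial variance of each proportion when computing Cochran's Q, and it ignores clustering by trial. The function names are hypothetical.

```python
from typing import List, Tuple

Group = Tuple[int, float]  # (participants, proportion with undetectable viral load)


def size_weighted_mean(groups: List[Group]) -> float:
    """Mean proportion across groups, weighted by group size."""
    total = sum(n for n, _ in groups)
    return sum(n * p for n, p in groups) / total


def i_squared(groups: List[Group]) -> float:
    """I^2 (%) derived from Cochran's Q with inverse-variance weights (assumed)."""
    # Binomial variance of each proportion is p(1 - p)/n; requires 0 < p < 1.
    weights = [n / (p * (1 - p)) for n, p in groups]
    pooled = sum(w * p for w, (_, p) in zip(weights, groups)) / sum(weights)
    q = sum(w * (p - pooled) ** 2 for w, (_, p) in zip(weights, groups))
    df = len(groups) - 1
    if q <= 0:
        return 0.0
    # I^2 = (Q - df) / Q, truncated at zero and expressed as a percentage.
    return max(0.0, (q - df) / q) * 100
```

With two hypothetical groups of 100 participants and efficacies of 50% and 70%, the size-weighted mean is 60% and I² is high, because the gap between the groups far exceeds what sampling error alone would produce.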
A multivariable, random-effects, linear regression approach was used to explore sources of heterogeneity in measures of efficacy and treatment cessation. Variables pre-selected for testing included all study, participant, disease and treatment characteristics as well as adverse events (Table 1). Year of study commencement was excluded from the primary predictor analysis, because of its likely relationship to other variables (drug potency/tolerability, pill/dose counts, availability of genotypic testing, emphasis on maximum pill adherence). Fixed NRTI backbones and third drug classes were compared, not individual drugs. This was done to retain analytical power and sensitivity across subgroup analyses. Furthermore, comparative prospective trials are powered for non-inferiority, and no clear superiority has hitherto been demonstrated for individual third drugs within a given class for initial cART, excepting boosted vs. unboosted PI. Unboosted PIs (including unboosted atazanavir) were considered here as a separate class. Backwards, stepwise selection was used: only those variables statistically significant (p≤0.05) in univariable analysis were further assessed in multivariable models. Additionally, where covariate data were missing for >20% of individuals within a group, that group was excluded for assessment of that covariate.
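The two filtering rules at the end of this paragraph can be sketched as follows. The sketch covers only the screening steps (the 20% missing-data rule and the univariable significance threshold); it does not fit the random-effects regression itself, and the function names and input formats are hypothetical.

```python
from typing import Dict, List, Optional


def groups_with_covariate(
    missing_fraction: Dict[str, Optional[float]], max_missing: float = 0.20
) -> List[str]:
    """Keep groups whose covariate is missing for at most 20% of individuals."""
    return [
        group
        for group, frac in missing_fraction.items()
        # None means the covariate was not reported at all for that group.
        if frac is not None and frac <= max_missing
    ]


def univariable_screen(p_values: Dict[str, float], alpha: float = 0.05) -> List[str]:
    """Return covariates with p <= alpha, which go forward to multivariable models."""
    return sorted(name for name, p in p_values.items() if p <= alpha)
```

For example, a covariate with a univariable p-value of exactly 0.05 is carried forward, while a group missing the covariate for 25% of its participants is dropped from the assessment of that covariate only.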
For stratified antiviral efficacy (pre-treatment viral load; regimen type) tests for interaction were conducted first. Only variables with statistically significant interactions with the stratified outcome were further interrogated using the aforementioned linear regression approach.
The following pre-planned sensitivity analyses were performed for the predictor linear regression models: (1) including only recent data by excluding studies commenced prior to 2005; (2) excluding cohorts and abstracts to correct for potential differences in study/report quality; and (3) including year of commencement as a variable. Four post-hoc analyses were performed: (1) descriptive efficacy/premature cessation data for groups using tenofovir-emtricitabine and abacavir-lamivudine, both overall and stratified by pre-treatment plasma viral load ≥ and <100,000 copies/mL; (2) a variant of the primary linear regression model for overall efficacy conducted after exclusion of all abacavir-lamivudine groups without HLA-B*5701 screening; (3) descriptive efficacy/premature cessation data and a linear regression model for overall efficacy when considering raltegravir separately from other INSTI drugs; and (4) descriptive efficacy/premature cessation data and a linear regression model for overall efficacy when considering unboosted atazanavir separately from other unboosted PIs (Tables S1-S6 in Appendix S1: Supplementary post-hoc stratified analyses).
All analyses were performed using STATA, Version 11 (StataCorp LP, College Station, Texas, USA). The meta suite of commands was used for combining data across trial arms.

Study and participant characteristics
Of 114 included studies, 97 (85%) were randomised trials and 17 (15%) prospective cohorts (Table 3), encompassing 216 treatment groups with 40,124 participants (median 112 participants/group; interquartile range 63 to 200). This represents 73 new groups (32 randomised trials, 3 cohorts, 17,057 participants) since our earlier review [4]. Participant and treatment characteristics are shown in Table 4; these were similar in terms of NRTI backbone and third drug class, demographics and disease stage for each analysis population (data not shown).
Overall efficacy: all studies
Mean overall efficacy was 60% (SD 16) after a mean follow-up of 82 weeks (SD 38) with greater efficacy in more recent studies (Table 5, Figure 2). Collected data were highly heterogeneous (I² = 96%).
Study phase, intention-to-treat analysis method, genotype/CD4 eligibility restrictions, NRTI backbone, third drug class, daily pill and dose counts were identified as the primary sources of efficacy heterogeneity on univariable analysis (Table 6). In the multivariable analysis, higher efficacy was associated with: NRTI backbone (favouring tenofovir-emtricitabine, p<0.001); third drug class (favouring INSTI, p<0.001); and studies using intention-to-treat algorithm (favouring 'missing equals failure', p<0.001). This model accounted for 56% of the variance (r²) in reported efficacy.
Type of NRTI backbone (favouring tenofovir-emtricitabine, p<0.001), third drug class (favouring INSTI, p<0.001) and efficacy analysis method (favouring 'missing equals failure', p = 0.001), eligibility by pre-treatment resistance genotype (favouring restriction, p = 0.007) and CD4 lymphocyte count (favouring no restriction, p = 0.001) were each associated on multivariable analysis with greater efficacy at week 48. At week 96, greater efficacy was associated with: phase 2 studies (vs. phase 3 or 4, p<0.001) and no eligibility restriction for CD4 count (p<0.001). The r² values were 66% and 39% for weeks 48 and 96, respectively. Due to insufficient data, a stable multivariable model could not be generated for efficacy through week 144.
For those studies reporting efficacy data through to week 96, a multivariable analysis for the decline in efficacy between weeks 48 and 96 was performed. Lesser decline was associated with phase 2 studies (p = 0.002), placebo use (p<0.001), and NRTI backbone (p = 0.002), but there was no significant difference between use of tenofovir-emtricitabine and abacavir-lamivudine.
The initial regimen type selected was found to significantly interact with the relationship of third drug class (p = 0.040), dosing requirements (p = 0.024), previous AIDS events (p = 0.003), and pre-treatment CD4 lymphocyte count (p = 0.049) on efficacy. On regression analysis (Table 7), the pre-treatment CD4 count was associated with efficacy within 'Preferred' regimens (coefficient ,
Stratification by pre-treatment viral load showed no significant interaction with the relationship of any pre-selected variable upon efficacy (p>0.05), and univariable-multivariable analysis was not pursued. Despite this lack of interaction, 'Preferred' regimens (February 2013 DHHS guidelines) maintained greater efficacy at both higher and lower viral load strata; efficacy within these strata was 73% (SD 12) and 82% (SD 12), respectively (difference 9.1%; 95%CI 4.3 to 14.0; p = 0.001).
On multivariable analysis adjusting for study, participant, disease and treatment characteristics and adverse events, industry-only sponsorship was associated with lower cessation rates due to participant decision, compared to industry-supported academic (coefficient −2.6%; 95%CI −4.8 to −0.5; p = 0.015) and academic-only sponsorships (coefficient −4.0%; 95%CI −6.3 to −1.6; p = 0.001). Similar associations existed between sponsorship and cessation due to adverse events. Phase 2 studies were associated with lower rates of cessation due to adverse events than phase 3 or 4 studies (coefficients −2.6% and −4.0%, respectively; p<0.015). Neither NRTI backbone nor third drug class was associated with cessation attributed either to adverse events or participant decision.

Other sensitivity analyses
The separate analyses conducted after excluding cohorts and abstracts, and then including the year of commencement as a variable both gave similar results to the primary analysis (data not shown).

Discussion
For studies initiated from 1995 through to 2010, the overall mean efficacy of initial cART was low (60% over 82 weeks), although it has risen substantially for more recent studies, suggesting it could rise further with studies commenced post-2010 (not available for this analysis before locking of the final database). Efficacy was higher (75% over 99 weeks) with current 'Preferred' cART regimens. There is ongoing loss of efficacy through the second and third years of initial cART, mainly because of participant decision or adverse events. Across all analyses, tenofovir-emtricitabine was more effective than abacavir-lamivudine, and INSTI more effective than NNRTI. As the sensitivity analyses demonstrated, this also applied to the more recent studies of initial cART. Participants with pre-treatment viral load <100,000 copies/mL plasma had significantly better antiviral outcomes on initial cART, a finding persisting within 'Preferred' regimens, and similar in magnitude to that favouring February 2013 DHHS 'Preferred' over 'Alternative' regimens.
Initial cART should aim to induce and maintain long-term virological control. For a disease that presently requires life-long treatment, this analysis demonstrates suboptimal efficacy and durability even with currently 'Preferred' initial regimens, corroborating similar findings of the Antiretroviral Therapy Cohort Collaboration [13]. In practice, individuals failing one regimen are likely to be switched to a different, effective combination. Therefore, the present data reflect success or failure of initial cART only, not the absolute failure of all treatment, which we have not examined. Nevertheless, our data show for the first time that almost 15% of participants failed cART between weeks 48 and 144 of therapy. If this trend were ongoing, the majority of initial regimens would fail within 10 years of commencement. As the pace of new antiretroviral development slows, treatment options may become limited as patients are progressively exposed to switching of cART regimens, even in resource-rich settings [14].
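The extrapolation in this paragraph can be checked with simple arithmetic on the reported mean efficacies (66% at week 48, 52% at week 144). The sketch below is not the authors' calculation: it assumes two alternative, simplistic trends (a constant absolute loss per week, and a constant relative loss per 96 weeks) and applies them to size-weighted group means rather than to individual participants.

```python
WEEK_48_EFFICACY = 66.0   # % (reported mean efficacy at week 48)
WEEK_144_EFFICACY = 52.0  # % (reported mean efficacy at week 144)
SPAN_WEEKS = 144 - 48


def linear_projection(week: float) -> float:
    """Assume the same absolute loss per week continues; floor at 0%."""
    slope = (WEEK_144_EFFICACY - WEEK_48_EFFICACY) / SPAN_WEEKS
    return max(0.0, WEEK_48_EFFICACY + slope * (week - 48))


def proportional_projection(week: float) -> float:
    """Assume the same relative loss per 96 weeks continues."""
    ratio = WEEK_144_EFFICACY / WEEK_48_EFFICACY
    return WEEK_48_EFFICACY * ratio ** ((week - 48) / SPAN_WEEKS)


TEN_YEARS = 520  # weeks
print(round(linear_projection(TEN_YEARS), 1), round(proportional_projection(TEN_YEARS), 1))
```

Under either assumption, projected efficacy at 10 years falls well below 50%, which is consistent with the statement that most initial regimens would fail within that period if the observed trend continued.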
Participant decision was the most common identifiable cause of failure, ahead of adverse events or virological failure. This was the case both for studies of 'Preferred' regimens, and more recent studies (post-2005). The underlying reasons for this decision, or whether participants re-started cART outside a study, were not routinely reported. Uncovering and systematically documenting reasons for premature cART cessation and subsequent outcomes will be necessary if participants are to be retained more successfully on treatment. Industry-sponsored studies were associated with lower rates of premature cessation due to participant decision or adverse events. It is possible that the cost-free medication and more frequent clinical follow-up commonly mandated within such studies may have attenuated cessation rates. Treatment cessation due to virological failure was consistently low. This was the case even in the earliest cART studies, possibly because the limited treatment options available at the outset of cART prevented drug switches even in the event of study-defined virological failure. Most treatment failure/cessation occurred early, within the first 48 weeks, with incremental declines of <10% per year at weeks 96 and 144. It may seem self-evident that the longer an individual remains on a single regimen the lower the likelihood of failure, but this may explain why at week 96, only study design characteristics were associated with efficacy, rather than treatment or disease characteristics. There were, however, considerably fewer study groups with data for 96 weeks of follow-up.
Unsurprisingly, of the intention-to-treat strategies used to calculate efficacy, the 'missing equals failure' algorithm was associated with greater treatment success, as it has the fewest imputations regarding failure. This raises the possibility that overall efficacy may have been even lower if other, more stringent algorithms had been used exclusively. However, the 'missing equals failure' approach arguably more closely reflects actual clinical practice.
The four key international guidelines use the CD4 lymphocyte count as a key marker for when to start cART, with none recommending pre-treatment viral load as an indication [8,9,15]. Viral load was removed from DHHS recommendations in 2007, because of data indicating that risk of AIDS or death in individuals receiving cART with pre-treatment CD4 counts ≥350 cells/µL was <2% regardless of viral load [16]. Our findings that: (1) the high-vs.-low viral load and 'Preferred'-vs.-'Alternative' regimen efficacy differences were similar (8.4% vs. 10%, respectively); and (2) 'Preferred' regimens had greater efficacy at pre-treatment viral loads <100,000 copies/mL, are therefore striking, since they suggest that the pre-treatment viral load exerts an independent effect upon efficacy, even as guidelines place a primary emphasis on regimen selection. An implication of this finding is that the presence or absence of pre-treatment viral loads ≥100,000 copies/mL may influence the efficacy ultimately reported in studies. In particular, it may help explain the lesser efficacy of abacavir-lamivudine compared to tenofovir-emtricitabine, a finding noted in the ACTG A5202 study [17].
In the existing pre-planned multivariable analyses, neither median viral load, nor the proportion of participants with a pre-treatment viral load ≥ or <100,000 copies/mL was significantly associated with efficacy in the overall multivariable analysis. Within the pre-planned subgroup analyses, there were also no significant interactions between: (1) the regimen type and pre-treatment viral load strata; or (2) the viral load-stratified efficacy and treatment characteristics. However, a post-hoc analysis of our data comparing tenofovir-emtricitabine with abacavir-lamivudine mirrors the gaps in efficacy seen between 'Preferred' and 'Alternative' regimens (Table S1 in Appendix S1: Supplementary post-hoc stratified analyses). At viral loads ≥100,000 copies/mL, overall efficacy with tenofovir-emtricitabine was higher than abacavir-lamivudine (71% [SD 10] vs. 59% [SD 12], respectively; difference 11%; 95%CI 3.6 to 17.7; p = 0.004), with a similar difference at viral loads <100,000 copies/mL (79% [SD 12] vs. 67% [SD 14], respectively; difference 14%; 95%CI 6.5 to 20.8; p<0.001). The mean follow-up periods were 83 (SD 37) and 74 (SD 38) weeks, respectively. The descriptive findings, both pre-planned and post-hoc, suggest that the overall superior efficacy of tenofovir-emtricitabine (and 'Preferred' regimens) is independent of the plasma viral load and further imply that for a given drug or combination, efficacy at lower viral loads is better than at higher viral loads. Not only does this validate current 'Preferred' regimens, it argues for future guidelines recommending cART initiation when the plasma viral load rises towards 100,000 copies/mL. Comparable results have been reported recently, albeit limited to 48 weeks' follow-up [18].
The ACTG A5202 study showed a higher risk of study-defined virological failure with abacavir-lamivudine for viral loads ≥100,000 copies/mL at interim analysis (resulting in the unblinding of that stratum), rather than all-cause, intention-to-treat failure. As we did not have premature cessation data stratified by pre-treatment viral load, the results are not directly comparable.
While viral load and regimen type did not significantly interact to influence efficacy, precluding multivariable analyses of these subgroups, it is worth noting that the high-low threshold of 100,000 copies/mL (log10 5.0) reported in studies is arbitrary. A meta-analysis of efficacy data from individual participants may reveal a clinically relevant association between gradations of viral load and long-term efficacy.
The superiority of tenofovir-emtricitabine over abacavir-lamivudine, although statistically significant on primary analysis, remains confounded by one major issue: that of abacavir-related hypersensitivity. The association between HLA-B*5701 and abacavir-related hypersensitivity, first reported in 2002, is well-described [19,20]. It would have been advantageous to have more efficacy data with pre-treatment HLA-B*5701 screening. However, due to limited availability, testing for HLA-B*5701 did not become the standard of care until its inclusion in DHHS guidelines from 2007 [21]. By that point, 24 of the 26 groups using abacavir in our review had already commenced, leaving only two studies (total 227 participants) which utilised HLA-B*5701 screening (efficacy 55% [SD 13] over 96 weeks).
Frequency of abacavir-related hypersensitivity is estimated at between 2% and 9%, with some ethnic variation [22]. A meta-analysis of 5,332 patients exposed to abacavir reported a mean incidence of 4% (range 3% to 6%) [23]. Hypersensitivity does contribute to the lesser efficacy of abacavir-lamivudine vs. tenofovir-emtricitabine in our primary analysis, but within the limitations of the source data its relative contribution to higher treatment failure cannot be quantified. Given the insoluble nature of the missing data, one approach to addressing this problem is to infer that the adjusted efficacy difference of 10% between tenofovir-emtricitabine and abacavir-lamivudine is partially due to hypersensitivity (assuming a mean incidence of 4%), with other factors responsible for the balance. For instance, amongst post-2005 studies, the approval of single-tablet, fixed-dose preparations is a possible reason for patients electing to switch away from abacavir mid-clinical trial. During the period of analysis (up to 2010), all such preparations contained tenofovir-emtricitabine as the NRTI backbone. The pairing of tenofovir with emtricitabine, rather than lamivudine, may also contribute to the efficacy difference. Although emtricitabine and lamivudine have closely related chemical structures [24,25], and are regarded as interchangeable by the DHHS and the World Health Organization [8,26], emtricitabine has a longer intracellular half-life than lamivudine [27,28]. The alternative approach is a separate, post-hoc, multivariable analysis excluding the 24 abacavir-lamivudine groups (5,289 participants) that did not use HLA-B*5701 screening. Such an analysis was performed (Table S2 in Appendix S1: Supplementary post-hoc stratified analyses); in this model, the comparative difference in efficacy (vs. tenofovir-emtricitabine) was non-significant (coefficient −14.4%; 95%CI −30.2 to 2.5; p = 0.175), although the large negative coefficient favoured tenofovir-emtricitabine. In one of the two included abacavir groups (from study CNA109586), hypersensitivity accounted for more premature cessation in the abacavir arm than the comparator tenofovir-emtricitabine arm (6% vs. <1%, respectively), despite HLA-B*5701 screening [29]. With only the two HLA-B*5701-screened groups included, this post-hoc analysis is underpowered to compare abacavir-lamivudine with other NRTI backbones. However, as all other groups were left unchanged from the original primary analysis, the other predictive associations (third drug class, intention-to-treat analysis method) were preserved.
Table 5. Efficacy and categories of premature treatment cessation.
Table 6. Characteristics* associated with efficacy of initial antiretroviral therapy: all groups.
The disadvantage of removing the majority of abacavir-lamivudine groups is that it also removes a large amount of data for the concomitant drug classes from our study. If further studies were then to be excluded on the basis either of currently non-recommended regimens or a non-standard practice, it would also necessitate the exclusion of older cART regimens (e.g. zidovudine-lamivudine), and similarly, the many earlier studies commenced before pre-treatment resistance genotyping became the recommended standard of care in 2006 [30]. With the removal of such a large number of groups from the analysis, the benefits of ecological analysis may be lost, and with it, several comparisons between NRTI backbones and third drug classes.
This systematic review only evaluated entire third drug classes. While this may limit the specificity of the comparisons, assessing individual drugs would result in a loss of analytical power. Our approach is supported by long-term prospective data. The adjusted efficacy difference between INSTI and NNRTI classes is consistent with the 10% lower efficacy of efavirenz vs. raltegravir (both 'Preferred' agents), as well as with recently-published five-year outcomes of the 004 and STARTMRK studies, and the short-term results of the SPRING-1 study comparing efavirenz with dolutegravir [37,38,39]. Examining pooled data can effectively portend long-term prospective results, displaying the value of systematic review.
The favourable efficacy of INSTI- over NNRTI-based initial cART has been noted elsewhere [31]. Our analysis supports this finding, but extends it to include a treatment advantage over boosted PI therapy, and updates earlier meta-analyses conducted
*Study phase, genotype/CD4 eligibility restrictions, pills and doses per day were significant on univariable analysis but not multivariable. Co-infection with hepatitis B or C were excluded from the multivariable analysis because >20% of groups were missing data.
†Study sponsorship, placebo use, country/region of recruitment, haemoglobin/viral load/liver function eligibility restrictions, race, risk factors for HIV infection, sex, previous AIDS events, pre-treatment viral load/CD4 count, dosing relative to food, serious adverse events, and clinical/laboratory adverse events of at least moderate severity were variables not significant on univariable analysis. Coefficient represents the adjusted percentage difference in outcomes relative to a unit increase in any variable (or relative to the reference variable within a category
appears that while a one-pill-per-day regimen is convenient, it is not necessarily superior. This remains an inference, as importantly, 44% of studies (weighted) used placebo controls, inflating the daily pill count. We did not analyse 'Preferred' and 'Alternative' cART regimens by guidelines other than of the DHHS (e.g. World Health Organisation, International Antiviral Society-USA, European AIDS Clinical Society). International ART guidelines have all been derived from the same pool of publicly reported data. Such duplicated analyses are likely to show results similar to the ones we obtained for DHHS-specified cART regimens, i.e. that use of a 'Preferred' cART regimen (however defined) resulted in higher efficacy than with 'Alternative' regimens. Similarly, despite our updated search being restricted to between 2008 and 2012, an additional sensitivity analysis for post-2008 studies was not performed. This would have effectively duplicated the 'Preferred'-vs.-'Alternative' regimen analysis, while arbitrarily excluding studies of currently 'Preferred' (and therefore relevant) regimens that commenced pre-2008.
There are several limitations to our study. Chief of these is that the base unit of analysis in our methodology was the treatment group. Hence ours was an ecological analysis, using mean aggregate data, rather than individual participant data. Also, other, more commonly reported meta-analyses of cART aim to combine similarly designed, randomised comparisons of efficacy/failure to achieve data homogeneity. Such analyses therefore have a different hypothesis, i.e. they evaluate the comparison, whereas a study such as ours describes overall efficacy and failure in each group exposed to the treatment drug(s). By including a wide variety of treatments and settings, our data set is more heterogeneous; a meta-regression approach permits exploration of potential sources of heterogeneity in the estimation of efficacy. This would not be possible with a 'typical' meta-analysis of similar studies. However, because only broad associations are identified, it is not possible to infer causality, and it would be incorrect to deduce the impact of a particular cART regimen on efficacy in participants of a certain age, gender, race, location or clinical status. Nevertheless, as standardised outcome measures (virological efficacy and failure) have been applied to a large weighting of participants, our analysis is best suited to representing the likely outcomes for populations with a similar profile, reporting associations that may inform public policy (such as the DHHS and other international antiretroviral guidelines), and identifying topics for future study. As most study groups were of predominantly white race, applicability to currently resource-limited settings may be limited. A multivariable approach was used to interrogate heterogeneity within the efficacy data. With this approach, bias due to missing data may allow some clinically relevant sources of heterogeneity to be missed whilst other, non-relevant sources might reach statistical significance.
The large number of groups examined reduces, but does not eliminate, the risk of such bias. The inability of our study to assess the effect of HLA-B*5701 screening on the relative efficacy of abacavir-lamivudine is a key example of this, and one which necessitated post-hoc analysis in an attempt at characterisation. The failure of some covariates to reach significance may also highlight which data are currently poorly reported in studies. We were able to source some unpublished data from pharmaceutical sponsors, but very little from academic sponsors. For each subgroup analysis, population numbers changed, so comparisons between subgroups can only be inferred. For most studies, randomisation was not stratified by pre-treatment viral load. It is likely that variables not reported by viral load strata (previous AIDS events, CD4 lymphocyte count, adherence, co-infection with viral hepatitis) can partly explain the differences seen between high and low viral load strata. This limitation does not affect other subgroup analyses. Finally, HIV viral load was used as the primary outcome measure, rather than a clinical endpoint.
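The group-level (ecological) character of the analysis described above can be illustrated with a minimal sketch. It assumes that "size-weighted" means each treatment group is weighted in proportion to its number of participants; the function name, the input layout and the example values are hypothetical and are not taken from the study data or the authors' code.

```python
from math import sqrt


def size_weighted_mean_sd(groups):
    """Return (mean, SD) of efficacy across treatment groups, weighted by group size.

    `groups` is a list of (n_participants, efficacy_percent) pairs.
    Assumption: weights are simply proportional to group size.
    """
    total_n = sum(n for n, _ in groups)
    mean = sum(n * eff for n, eff in groups) / total_n
    # Weighted population variance of group-level efficacy around the weighted mean.
    variance = sum(n * (eff - mean) ** 2 for n, eff in groups) / total_n
    return mean, sqrt(variance)


# Hypothetical treatment groups (illustrative values only, not study data).
example_groups = [(100, 80.0), (300, 60.0)]
mean_eff, sd_eff = size_weighted_mean_sd(example_groups)
```

Because the inputs are group means rather than individual outcomes, the result describes the pooled population and cannot be read as the expected response of any one participant.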
Our data identify pre-treatment HIV-1 viral load as a determinant of efficacy, which should prompt a re-examination of its place in treatment guidelines, and also suggest that HIV-infected adults should initiate cART for plasma viral loads rising towards 100,000 copies/mL. Whether this difference is driven by lesser antiviral potency at high viral loads is unknown because of missing data, and requires further investigation. One possible option would be a prospective study comparing triple- and quadruple-drug combinations as initial therapy in participants with high viral loads. The medium-term efficacy of initial cART alone remains unsatisfactory, and prospective studies require longer follow-up and better reporting of adverse events and reasons for participant-initiated treatment cessation. As an ecological analysis of pooled outcomes, our findings are associations rather than causes, best applied to populations with demographics similar to our source data. Future analyses performed using data from individual participants are needed to allow specific causality to be inferred for the trends identified in this study.

Supporting Information
Appendix S1 Supplementary post-hoc stratified analyses.