Conceived and designed the experiments: RDK VVW CJP SB HFG. Performed the experiments: MH JMW JB SY CC TK. Analyzed the data: RDK VVW TH. Contributed reagents/materials/analysis tools: TH CJP MH JMW JB SY CC TK. Wrote the paper: RDK VVW HFG SB.
Potential conflicts of interest: H.F.G. has been an adviser and/or consultant for GlaxoSmithKline (GSK), Abbott, Gilead, Merck Sharp & Dohme (MSD), Novartis, Boehringer Ingelheim, Roche, Tibotec, Janssen-Cilag and Bristol-Myers Squibb (BMS) and has received unrestricted research and educational grants from Roche, Abbott, BMS, GSK, Gilead, Pfizer, ViiV Healthcare, Tibotec, and MSD (all money went to institution). S.Y. has participated in advisory boards of BMS and Tibotec and has received travel grants from GSK and MSD. M.C. has received travel grants from Abbott, Boehringer Ingelheim, and Gilead. E.B. has been an advisor and/or consultant for Gilead and Abbott, has been a member of an advisory board of ViiV, Gilead, Tibotec, Pfizer, and MSD, and has received research grants from Gilead and Abbott as well as travel grants from BMS, Gilead, ViiV, MSD, Abbott, and Tibotec. P.L.V. has been a member of an advisory board of MSD, Tibotec, Gilead, and ViiV and has received payment for lectures from Gilead, Tibotec, and GSK. All other authors report no potential conflicts.
HIV-1 replicative capacity (RC) provides a measure of within-host fitness and is determined in the context of phenotypic drug resistance testing. However, it is unclear how these in vitro measurements relate to in vivo processes. Here we assess RCs in a clinical setting by combining a previously published machine-learning tool, which predicts RC values from partial pol sequences, with genotypic and clinical data from the Swiss HIV Cohort Study. The machine-learning tool is based on a training set consisting of 65000 RC measurements paired with their corresponding partial pol sequences. We find that predicted RC values (pRCs) correlate significantly with the virus load measured in 2073 infected but drug-naïve individuals. Furthermore, we find that, for 53 pairs of sequences, each pair sampled in the same infected individual, the pRC was significantly higher for the sequence sampled later in the infection, and that the increase in pRC was also significantly correlated with the increase in plasma viral load and with the length of the time interval between the sampling points. These findings indicate that selection within a patient favors the evolution of higher replicative capacities and that these in vitro fitness measures are indicative of in vivo HIV virus load.
Determining how well different genotypes of HIV can replicate within a patient is central to our understanding of the evolution of HIV. Such in vivo fitness is often approximated by in vitro measurements of viral replicative capacities. Here we use a machine-learning algorithm to predict in vitro replicative capacities from HIV nucleotide sequences and compare these predicted replicative capacities with clinical data from HIV-infected individuals. We find that predicted replicative capacity correlates significantly with the concentration of HIV RNA in the plasma of infected individuals (virus load). Furthermore, we show that the predicted replicative capacity increases in the course of an infection. Finally, we find that the temporal increase of replicative capacity correlates significantly with the temporal increase of virus load within a patient. These results indicate that (predicted) replicative capacity is a useful measure of viral fitness and suggest that virus genetics determines virus load, at least to some extent, via replicative capacity.
Measuring the fitness of HIV-1 is notoriously difficult. At the between-host level, fitness can be interpreted as the transmission potential, which is defined as the expected number of transmissions in the course of an infection
In a recent publication, some of the authors of this article described a computational method to predict RC values on the basis of viral amino-acid sequences
The Swiss HIV Cohort Study was approved by the individual local institutional review boards of all participating centers (
Fitness is measured as the log replicative capacity of HIV-derived amplicons [representing all of protease (PR) and most of reverse transcriptase (RT)] inserted into a constant backbone of a resistance test vector. The models are then trained to predict this fitness from the amino-acid sequence of the amplicons. Details on the experimental measurement of the RC values and on inferring the predictor have been published in
In essence, the predictor is based on fitting the data consisting of amino acid sequences s and the corresponding logRC values (
We assessed the RC predictor using two datasets collected from untreated, chronically infected patients. The restriction to chronic infection was introduced because HIV RNA levels are usually very high during acute HIV infection; it was enforced by discarding data points measured within the first 180 days after the first positive HIV test. The patients were enrolled in the Swiss HIV Cohort Study (SHCS), a longitudinal multicenter observational cohort study (
| Characteristic | N (%) | Regression coefficient [95% confidence interval] | P-value |
|---|---|---|---|
| Median [IQR] pRC | 0.62 [0.40 to 0.81] | | <0.001 |
| Sex | | | 0.02 |
| Male | 1685 (81.3%) | Reference | |
| Female | 388 (18.7%) | | |
| Median [IQR] age | 37 [31 to 43] | 0.02 [−0.02 to 0.06] | 0.242 |
| Mode of HIV acquisition | | | <0.001 |
| Heterosexual contacts | 483 (23.3%) | | |
| Homosexual contacts | 1144 (55.2%) | Reference | |
| Intravenous drug use | 446 (21.5%) | | |
| Ethnicity | | | 0.015 |
| White | 1925 (92.9%) | Reference | |
| Black | 24 (1.2%) | | |
| Hispanic | 68 (3.3%) | −0.10 [−0.27 to 0.07] | |
| Asian | 35 (1.7%) | −0.05 [−0.22 to 0.12] | |
| Other | 21 (1.0%) | 0.13 [−0.18 to 0.43] | |
| Sequence generating laboratory | | | 0.087 |
| A | 215 (10.4%) | Reference | |
| B | 420 (20.3%) | −0.05 [−0.19 to 0.09] | |
| C | 1438 (69.4%) | | |
| Median [IQR] year of sequence generation | 2008 [2006 to 2008] | | |
| Median [IQR] CD4 counts/microliter at time of sampling for genotyping | 298 [162 to 464] | not done | |
| CD4 count groups (by 25th percentiles) | | | <0.001 |
| 0 to 162 | 542 (25%) | Reference | |
| 163 to 298 | 542 (25%) | | |
| 299 to 464 | 543 (25%) | | |
| 465 to 1522 | 541 (25%) | | |
| Ever had CDC stage C event prior to genotyping | 206 (9.9%) | 0.10 [−0.01 to 0.21] | 0.085 |

Values are N (%) unless stated otherwise.
Regression coefficients printed in bold face are statistically significant at the 5% level.
Abbreviations: IQR, interquartile range.
Age: regression coefficient per 10 years increase.
Year of sequence generation: regression coefficient per year increase.
Median CD4 count: because of better regression fit, the final model included CD4 cell count as 4 categories.
Relationships between HIV RNA and pRC were modelled using univariable and multivariable linear regression. Model assumptions were verified by inspecting residual versus fitted plots and by checking for unequal variance across fitted values (heteroskedasticity) and for outliers. Because these diagnostics suggested the presence of heteroskedasticity, we performed “robust” versions of linear regressions, which estimate a weighted variance based on the Huber-White method.
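The robust regression described above can be illustrated with a minimal sketch. It assumes a single predictor, uses the HC1 small-sample scaling of the Huber-White sandwich estimator, and runs on synthetic data; the function name `ols_robust`, the simulated values, and the choice of HC1 are illustrative assumptions, not the analysis code or data of this study (which was run in Stata).

```python
import numpy as np

def ols_robust(x, y):
    """Ordinary least squares of y on x (with intercept), returning the
    coefficients and Huber-White (HC1) heteroskedasticity-robust standard errors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])  # design matrix: intercept, slope
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y                   # OLS point estimates
    resid = y - X @ beta
    # Sandwich estimator: (X'X)^-1 [X' diag(e_i^2) X] (X'X)^-1, scaled by n/(n-k)
    meat = X.T @ (X * resid[:, None] ** 2)
    cov = XtX_inv @ meat @ XtX_inv * n / (n - k)
    return beta, np.sqrt(np.diag(cov))

# Synthetic, hypothetical data: noise variance grows with |pRC| (heteroskedastic)
rng = np.random.default_rng(0)
prc = rng.normal(0.6, 0.3, size=500)
log_rna = 4.4 + 0.5 * prc + rng.normal(0.0, 1.0, size=500) * (0.5 + 0.5 * np.abs(prc))

beta, se = ols_robust(prc, log_rna)
print("intercept, slope:", beta)
print("robust standard errors:", se)
```

The point estimates are the same as in ordinary least squares; only the standard errors (and hence confidence intervals and p-values) change, which is why the robust version is a natural choice once heteroskedasticity is suspected.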
Statistical calculations were carried out with Stata 11.2 (Stata Corp., College Station, TX, USA). The level of significance was set at 0.05, and all p-values are two-sided.
Demographic and clinical characteristics of our study population are displayed in
In the plasma viral load dataset (2073 patients), RC predictions (pRC) ranged from −1.07 to 1.43 units (median [interquartile range] 0.62 [0.40; 0.81]), and the corresponding median [interquartile range] HIV RNA level was 4.7 log10 copies/mL [4.1; 5.2]. Using univariable linear regression analysis, we find a highly significant effect of the pRC value on virus load (F-test
For the longitudinal dataset, we find that the pRC value increases in the course of an infection. Among the 53 patients with two viral sequences available taken at least 6 months apart, the median [interquartile range] difference in pRC is 0.10 units [0.04; 0.25] and differs significantly from 0 (sign-rank test, p<0.001). Unadjusted linear regression estimates this increase in pRC at 0.020 units per year [95% confidence interval 0.006; 0.035] (
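The paired comparison of earlier and later pRC values can be sketched as follows. The sketch assumes that the sign-rank test refers to the Wilcoxon matched-pairs signed-rank test and uses the two-sided normal approximation with zero differences dropped; the function name `signed_rank_test` and the example differences are hypothetical and are not the 53 patient pairs of this study.

```python
import math

def signed_rank_test(diffs):
    """Wilcoxon signed-rank test on paired differences (two-sided, normal
    approximation). Returns (W_plus, z, p). Zero differences are dropped."""
    d = [x for x in diffs if x != 0]
    n = len(d)
    if n == 0:
        raise ValueError("all differences are zero")
    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)  # sum of positive ranks
    mean = n * (n + 1) / 4                               # E[W+] under H0
    var = sum(r * r for r in ranks) / 4                  # Var[W+] under H0, tie-aware
    z = (w_plus - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))                 # two-sided p-value
    return w_plus, z, p

# Hypothetical pRC differences (later sample minus earlier sample)
example_diffs = [0.10, 0.04, 0.25, -0.02, 0.15, 0.08, 0.30, -0.05, 0.12, 0.20]
w_plus, z, p = signed_rank_test(example_diffs)
print(f"W+ = {w_plus}, z = {z:.2f}, p = {p:.4f}")
```

A test on signed ranks uses only the ordering of the differences, so it does not require the pRC changes to be normally distributed.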
The above analyses were based on untreated patients sampled after the acute phase of the infection. We find similar results if we exclude patients who were sampled in the AIDS phase (defined as patients with at least one CDC stage C event, n = 206). In particular, we still find a highly significant (p<0.001) correlation between pRC and RNA load (slope: 1 unit increase in pRC is associated with a 0.54 increase [95% confidence interval 0.41; 0.66] in log10 HIV RNA) and a significant (p = 0.0058) increase of RC over time (increase in pRC at 0.020 units per year [95% confidence interval 0.006; 0.035]). Only the significance level of the correlation between the temporal change of pRC and the temporal change of RNA load changes from ‘significant’ (p = 0.04) to ‘trend’ (p = 0.058); however, the point estimates for the regression coefficient are very similar in both cases (0.9 [0.01; 1.79] vs. 0.84 [−0.03; 1.70]).
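Because HIV RNA is analyzed on a log10 scale, the slope quoted above can be read as a multiplicative change in virus load. As a worked conversion (simple arithmetic on the reported estimate and its confidence limits, not an additional result):

```latex
\Delta \log_{10}(\text{HIV RNA}) = 0.54 \ \text{per unit pRC}
\quad\Longrightarrow\quad
\frac{\text{RNA}(\text{pRC}+1)}{\text{RNA}(\text{pRC})} = 10^{0.54} \approx 3.5,
\qquad
\left[10^{0.41},\, 10^{0.66}\right] \approx \left[2.6,\, 4.6\right].
```

That is, a one-unit higher pRC corresponds to a roughly 3.5-fold higher plasma HIV RNA concentration in this subgroup.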
How do the pRCs analyzed here relate to previous findings? For example, the 6 sequences (in our dataset) carrying the lamivudine mutation M184V, which has a large negative fitness effect on the virus
Our findings thus support the notion that virus load is to a large extent controlled by virus genetics
Overall, our results show, on the basis of a computational predictor, firstly that in vitro replicative capacity increases in the course of infection, which is consistent with the interpretation that RC is a determinant of fitness at the within-host level, and secondly that RC is linked to virus load, which has been shown to be an in vivo determinant of viral fitness at an epidemiological level
We thank the patients participating in the SHCS for their commitment, all the study nurses and study physicians for their invaluable work, the data center for data management, all the resistance testing laboratories for their high-quality work, and SmartGene for providing an impeccable database service.
The members of the Swiss HIV Cohort Study are Barth J, Battegay M, Bernasconi E, Böni J, Bucher HC, Bürgisser P, Burton-Jeangros C, Calmy A, Cavassini M, Egger M, Elzi L, Fehr J, Flepp M, Francioli P (President of the SHCS), Furrer H (Chairman of the Clinical and Laboratory Committee), Fux CA, Gorgievski M, Günthard H (Chairman of the Scientific Board), Hasse B, Hirsch HH, Hirschel B, Hösli I, Kahlert C, Kaiser L, Keiser O, Kind C, Klimkait T, Kovari H, Ledergerber B, Martinetti G, Martinez de Tejada B, Müller N, Nadal D, Pantaleo G, Rauch A, Regenass S, Rickenbach M (Head of Data Center), Rudin C (Chairman of the Mother & Child Substudy), Schmid P, Schultze D, Schöni-Affolter F, Schüpbach J, Speck R, Taffé P, Telenti A, Trkola A, Vernazza P, von Wyl V, Weber R, Yerly S.