Polymorphisms and Mutational Covariation Associated with Death in a Prospective Cohort of HIV/AIDS Patients Receiving Long-Term ART in China

Background HIV drug resistance is associated with faster clinical progression of AIDS. However, the effect of significant polymorphisms and mutational covariation on mortality among HIV/AIDS patients receiving long-term antiretroviral therapy (ART), have rarely been studied. Methods In this prospective cohort study from December 2003 to December 2014, we present a new computational modelling approach based on bioinformatics-based models and several statistical methods to elucidate the molecular mechanisms involved in the acquisition of polymorphisms and mutations on death in HIV/AIDS patients receiving long-term ART in China. Results This study involved 654 ART-treated patients, who had been followed for 5473.4 person-years, a median of 9.8 years, and 178 died (25.2%, 3.3/100 person-years). The first regimens included AZT/d4T + NVP+ ddI (78.9%) or AZT/d4T + NVP+ 3TC (20.0%). We calculated an individual Ka/Ks value for each specific amino acid mutation. Result showed that 20 polymorphisms (E6D, Q18H, E35D, S37N, T39A, K43E, S68N, L74I, I93L, K103N, V106A, E169D, Y181C, G190A, Q197K, T200V, T200E, T215I, E224D and P225H) were strongly associated with AIDS related deaths. Among them, 7 polymorphisms (L74I, K103N, V106A, Y181C, G190A, T215I and P225H) were known to be drug resistance mutations, 7 polymorphisms (E6D, E35D, S37N, I93L, E169D, T200V and T200E were considered to be potential drug resistance mutations, and 6 polymorphisms (T39A, K43E, S68N, Q197K, T200V and E224D) were newly found to have an association with drug resistance mutations, which formed a complex network of relationships. Conclusions Some polymorphisms and mutational covariation may be the important influencing factors in the failure of treatment. Understanding these mechanisms is essential for the development of new therapies, designing optimal drug combinations, and determining effective clinical management of individual patients.

. NFATP is a community-based public health approach that has greatly reduced the mortality and morbidity caused by HIV infection [2][3][4][5]. However, many HIV-infected patients receiving antiretroviral therapy (ART) have experienced treatment failure, increasing the risk of HIV-related deaths. The emergence of drug resistance variants is the main obstacle to the effectiveness of ART. HIV can evolve drug resistance rapidly in response to new drug treatments, often, through a combination of multiple mutations [6][7][8]. For most of antiretroviral (ARV) drugs, a single resistance mutation does not result in maximal resistance [9]. HIV continually accumulates mutations in its genome, and some of these mutations allow HIV to become resistant to individual ARV drugs.
The analyses of databases containing HIV-1 sequences from drug-naive and ART-treated patients have revealed the emergence of these accessory mutations could be related to the evolutionary pathway leading to the selection of drug-resistant isolates [10][11][12]. The selection pressure-based method is a useful way to explore the critical sites of mutations. In addition, this method can effectively detect the actual drug-resistance and mutational covariation [13]. HIV-1 functional constraints are usually expected to limit the amino acid substitution rates, resulting in a higher conservation of functional sites on the rest of the protein surface. Jaccard similarity coefficient, an extremely specific measure of covariation [14], was used to determine the covariation between mutations.
The HIV-1 subtype B' (Thai-B) plays unique roles in the genesis of the HIV-1 epidemic in China [15]. It is indicated that HIV-1 subtype B' is a single founding strain responsible for HIV-1 outbreaks among former plasma donors (FPDs) in China. However, the drug resistance of HIV-1 subtype B' has rarely been reported. The evolution of resistance mutation under ART, especially its relation to long-term ART (a median of 9.8 years) effect needs further investigation.
This study sought to determine whether certain mutational patterns were more prevalent in patients dead from HIV/AIDS compared to those alive. Hypothesis testing and selection pressure-based method were performed to determine the polymorphisms and mutations between deceased and living patients.

Study Population
This observational cohort study enrolled patients in Henan Province and in Anhui Province, both in China, from December 2003 to December 2004. They were the first groups of patients to receive the free ART drugs in China. Patients who were more than 18 years old, started ART between 2003 and 2004, and were willing to provide informed consent eligible for enrolment.
Data on demographics and risk factors for HIV infection were collected at baseline. Patients were followed up every six months up to 31 May 2010, or to stopping ART, death or loss to follow-up, and then every 12 months until December 2014. The information on treatment and outcomes was collected at every follow-up visit, including last combination antiretroviral regimen, missed ARV drugs within a month, CD4 T cell count (per μL) and viral load (copies/ mL).
This observational cohort study was approved by the Institutional Review Board of the National Center for AIDS/STD Control and Prevention, Chinese Center for Disease Control and Prevention. Written informed consent was obtained from all of the subjects at the time of data collection.

Laboratory Tests
Blood samples were collected in ART clinics and processed in the CDC laboratories. The CD4 T cell count was measured within 24 hours after sampling by flow cytometry. The laboratory of NCAIDS (National Center for AIDS/STD Control and Prevention) in Beijing performed the tests for HIV viral load and sequences. Plasma HIV RNA was quantified with real-time NASBA (NucliSense Easy Q, bioMerieur, France) or COBAS (Roche Applied Science, Germany). HIVDR (HIV drug resistence) genotyping was carried out by an in-house PCR protocol for samples at baseline or follow-up with viral loads of more than 1000 copies/mL. The resulting fragment of the HIV pol gene (full protease amino acids from 1 to 99; part of the RT (reverse transcriptase) amino acids from 1 to 255) was amplified, purified, and bidirectionally sequenced using an ABI3100 sequencer (Applied Biosystems, Foster City, CA, U.S.A.) [16,17]. For individuals from whom multiple sequences were available, the last sequence of therapy was used in the analyses. Sample pol gene sequences were compared to a consensus sequence using HIVdb software (Stanford HIV Drug Resistance Database) to detect drug resistance mutations. We involved all drug resistance mutations that conferred low, intermediate or high-level resistance.

Data Processing and Statistical Analysis
Data were double-blinded, entered, and compared using EpiData software (EpiData 3.1 for Windows; The EpiData Association Odense, Denmark). Questionnaire and laboratory data were analyzed using Statistical Analysis System (SAS 9.3 for Windows; SAS Institute Inc., NC, U.S.A.). Mutation rates were computed and tested for significance based on chi-square and Fisher's exact tests. P-values < 0.05 were considered to be statistically significant, and all tests of significance were two-sided.

Phylogenetic Analyses
PhyML 3.0 was used to construct a maximum likelihood phylogenetic tree with all of the sequences obtained. Tree topologies were heuristically searched using the subtree pruning and regrafting procedure [18]. The branch significance was analyzed by bootstrap with 1000 replicates and inter-subject distances were calculated. The final tree was viewed using MEGA5.0 software and FigTree v1.3.1 (http://beast.bio.ed.ac.uk), as previously described [19].

Ka/Ks Computation
The selective pressure on a protein-coding gene was measured by comparing silent (synonymous) and replacement (nonsynonymous) substitution rates, often referred to as Ka/Ks (amino acid mutations over synonymous mutations) or dn/ds (nonsynonymous mutations over synonymous mutations) [20]. A higher replacement than silent rates provides unequivocal evidence for adaptive evolution driven by Darwinian selection. Since HIV has a high transition-to-transversion ratio, we calculated an individual Ka/Ks value for each specific amino acid mutation, instead of calculating Ka/Ks for an individual gene or codon [13]. Ka/Ks is calculated using the following formula developed by Li et al. [21]: where N a and N s are the numbers of non-synonymous mutations and synonymous mutations observed at the codon, respectively; n a,t is the number of possible transition mutations that will change the amino acid; n s,t is the number of possible transition mutations that are synonymous; n a,v and n s,v are the equivalent numbers for transversions; and f t and f v are the transition and transversion frequencies, respectively.
If an amino acid change is neutral, it will be fixed at the same rate as a synonymous mutation, with Ka/Ks = 1. If the amino acid change is deleterious, however, purifying selection will reduce its fixation rate so that Ka/Ks < 1. Ka/Ks values significantly lower than 1 are regarded as undergoing purifying selection and therefore, may have a functionally or structurally important role. When the amino acid change offers a selective advantage, it will be fixed at a higher rate than a synonymous mutation, with Ka/Ks > 1. Ka/Ks values significantly greater than 1 are indicative of positive Darwinian selection, suggesting adaptive evolution [20]. Ka/Ks values are calculated for amino acid sites 1 to 99 in PR and 1 to 255 in RT.
A LOD confidence score is calculated for a mutation to be under a positive selection pressure according to the following formula [13]: where N is the total number of mutations observed in the codon. If positive selection (Ka/ Ks > 1) has LOD scores of 2 or greater, then the positive selection is significant. An individual Ka/Ks value for each specific amino acid mutation was calculated. Then the association of these mutations with the drug resistance was investigated based on the criteria: (1) Ka/Ks > 1, LOD > 2; (2) Frequency of mutations in deceased patients was significantly larger than that in the living patients; (3) the non-synonymous mutations with low frequency (<1% deceased patients) were excluded.

Jaccard Similarity Coefficient
For a pair of mutations X and Y, Jaccard similarity coefficient is calculated as: where N XY represents the number of sequences containing mutation X and Y; N X0 represents the number of sequences containing X, but not Y; and N 0Y represents the sequences containing Y, but not X [14].
Jaccard similarity coefficient uses only those sequences in which at least one of a pair of mutations is present. To test whether the observed Jaccard similarity coefficients are statistically significant, the Fisher exact test is used (2,000 replicates used in the Monte Carlo test). Jaccard similarity coefficient is calculated as the mean Jaccard similarity coefficient after 2,000 random rearrangements of the X or Y vector (containing 0 or 1 for presence or absence of a mutation, respectively) [14]. If the OR (odds ratio) > 1, the mutation pair is deemed to be positively correlated, and it is considered to be negatively correlated if the OR < 1. The software, Cytoscape, is then used to construct the relationship between these mutations [22].

Study Population Characteristics
Of 691 study participants, 37 patients were excluded from this study, including 30 who stopped ART for a long time, and 7 patients whose deaths were not related with AIDS (3 patients committed suicide, 3 patients died in accidents and 2 patients died of a brain hemorrhage). This study involved 654 ART-treated patients, who had been followed for 5473.4 person-years, a median of 9.8 years, and 178 died (25.2%, 3.3/100 person-years) before the last follow-up in 2014. A majority of individuals were women (57.2%); young and middle-aged adults (25)(26)(27)(28)(29)(30)(31)(32)(33)(34)(35)(36)(37)(38)(39)(40)(41)(42)(43)(44) years old) at enrolment accounted for 74.8%; only 2.8% had received post-secondary school education; most of them finished secondary school or less; 84.1% married and living with partner. Blood transfusion was the major route of transmission, accounting for 93.6% of all cases. The first regimens included AZT/d4T + NVP+ ddI (78.9%) or AZT/d4T + NVP+ 3TC (20.0%). The last regimens included AZT/d4T + NVP+ ddI (17.7%) or AZT/d4T + NVP+ 3TC (25.4%) or TDF+LPV+3TC (52.5%). Most patients were good in medication adherence, and only 6.4% missed ARV drugs within a month mainly because of working outside. 15.8% of patients initiated ART with a CD4 T cell count of higher than 500 cells/mL, 19.7% with a CD4 T cell count of between 350 and 499 cells/mL, 29.4% with a CD4 T cell count of between 200 and 349 cells/mL, and only 35.2% had a CD4 T cell count of less than 200 cells/mL. All the patients in the study were identified to be infected by HIV-1 subtype B' as determined by Neighbor-joining genetic analysis of pol sequences of the viruses obtained from plasma samples of the HIV-1-infected patients using PCR technique (Table 1).

Hypothesis Testing and Selection Pressure Computation for Individual Mutations
Mutations in the RT and PR of HIV-1 viruses from 178 patients who had died of AIDS and the 476 who were still living were analyzed. We used chi-square and Fisher's exact tests (when sample sizes were small) to determine the critical mutations in the deceased group, and 20 death-associated polymorphisms were identified. These polymorphisms include E6D, Q18H,  E35D, S37N, T39A, K43E, S68N, L74I, I93L, K103N, V106A, E169D, Y181C, G190A, Q197K,  T200V, T200E, T215I, E224D and P225H. Notably, 7 (35%) mutations (L74I, K103N, V106A, Y181C, G190A, T215I and P225H) among those classified were known to be associated with drug resistance according to the HIVdb Program in the Stanford HIV Drug Resistance Database. Most drug resistance mutations (L74I, K103N, Y181C, and G190A) had a high frequency in two groups. They were all positive Darwin selection mutations (Ka/Ks > 1, LOD > 2), except for V106A and P225H. Others 13 (65%) (E6D, Q18H, E35D, S37N, T39A, K43E, S68N, I93L, E169D, Q197K, T200V, T200E and E224D) had not been previously reported to be drug resistance mutations. Among these newly found mutations, 7 mutations (E6D, E35D, S37N, I93L, E169D, T200V, and T200E) were considered to be potential drug resistance mutations (Table 2). Table 3 indicated the Jaccard similarity coefficients and conditional probabilities of the positively associated protease mutation pairs between death-associated polymorphisms and drug resistance mutations. Only a small number of mutations were found in the HIV protease, so this covariation analysis only considered mutations of the reverse transcriptase. 6 polymorphisms that were not previously identified as being associated with drug resistance per the Stanford database (T39A, K43E, S68N, Q197K, T200V and E224D) were found to have an association with drug resistance mutations. There was a correlation between T39A and three mutations (M41L, K43E, and M184V) known to confer drug resistance. A more complex relationship between these mutations can be seen in Fig 1, which is advantageous in understanding the covariation among mutations. Additionally, once this relationship is understood, it will help to reveal the underlying mutation mechanisms. The network was stratified into three layers based on the three types of mutations (death-associated polymorphisms, drug resistance mutations, and other mutations). Death-associated polymorphisms are displayed in the left layer (some of them are drug resistance mutations, while drug resistance mutations are displayed in the middle layer and other mutations in the right layer. In the network, drug resistance mutations with higher covariation (e.g., M41L, D67N, K101E, K103N, Y181C, M184V, L210W, T215F, and T215I) were more likely to influence the other mutations. For example, M41L and D67N had 16 (D218E, E44A, G190S, G196E, H221Y, I142V, K101E, L210W, L228R, L74V, M184V, R211K, T215F, V108I, V118I, and V179I) and nine (E44A, H208Y, H221Y, K70R, L210W, M184V, M41L, R211K, and V90I) target mutations, respectively.

Covariation Mutation and Network Graph Analysis
In our covariation analysis, four death-associated mutations (T39A, K43E, L74I, and Y181C) were significantly associated with M184V. Non-drug resistant but death-associated polymorphisms T39A may play a role in promoting drug resistance through M184V. The emergence of these accessory mutations could be related to the evolutionary pathway leading to drug resistance. In the deceased group, the covariation pairs between M184V and L74I were highly significant (p < 0.01, OR > 1). M41L was significantly associated with T215Y (p < 0.001; R > 1). L210W was significantly associated with M41L and T215Y (p < 0.001; R > 1). The rates of M41L in the deceased and survival groups were not significant. However, in the deceased group, M41L was significantly associated with the TAMs (thymidine analogue resistance mutations) D67N, L210W, and T215Y/F. This probably increased both drug resistance and adaptability. D67N was significantly associated with the other TAMs K70R, H208Y, L210W, and T215Y. V118I was significantly associated with M41L, L210W, and T215F in this cluster (p < 0.001; R > 1).

Discussion
The increasing evidences suggest that in addition to those currently known mutations, more and more unidentified mutations may also be involved in the development of drug resistance, which contribute to therapy failure, and the development of drug resistance may be more complex than the classical one-step model of significant resistance via a single mutation so far considered [23,24]. Some studies suggest that the various subtypes may respond differently to ARV drugs [25,26]. The frequency and pattern of mutations conferring resistance to these drugs differ among HIV-1 subtypes and can influence the outcome [27]. HIV-1 Subtype B' plays unique roles in the genesis of the HIV-1 epidemic in China. As a result, it is particularly important to understand the mutation changes in patients infected by HIV-1 subtype B' and their effect on dug-resistance. In this study, death-associated polymorphisms emerged after the viral failure (i.e., the inability to achieve or maintain viral suppression, or a viral load of more than 1,000 copies/ mL) and were found more than one year prior to the death of the patients. As such, these polymorphisms that were not reported to be involved in drug resistance might be considered drug resistance mutations or compensatory mutations. In this study, about 36 (5.5%) samples sequenced in the beginning, and they might not under enough selective therapy pressure, but no significant mutations were found in these samples.
Among 20 significant death-associated polymorphisms, 13 (65%) polymorphisms (E6D, Q18H, E35D, S37N, T39A, K43E, S68N, I93L, E169D, Q197K, T200V, T200E and E224D) had not been previously reported to be drug resistance mutations, but they were strongly associated with death. 7 (35%) mutations (E6D, E35D, S37N, I93L, E169D, T200V, and T200E) were considered to be potential drug resistance mutations in this study. 6 polymorphic mutations (Q18H, K43E, S68N, Q197K, T215I and E224D) were presented in strains isolated from the deceased patients, while they were completely absent in the strains isolated from the living patients, and they were all positive Darwin selection mutations (Ka/Ks > 1, LOD > 2), but LOD 2. Ka/Ks calculations successfully identified the mutations that were associated with drug resistance and AIDS-related death. However, many researchers had reported that drug resistance mutations displayed positive selection (Ka/Ks > 1) with LOD scores greater than 2 [12,13,28]. Therefore, these polymorphisms differed from drug resistance mutations in the traditional sense. Some studies had shown that accessory mutations at positions 39 (T39A), 43 (K43E) were more frequent in viral isolates from patients failing therapy than in naive individuals, and those mutations were previously identified as accessory mutations associated with the accumulation of TAMs (thymidine analogue resistance mutations) and with high levels of resistance to nucleoside analogues [23,[28][29][30]. These polymorphisms require further clinical data and related literature to demonstrate the nature of this difference. This study also found these polymorphisms had complex network relationships with previously reported drug resistance mutations. Stratified networks were utilized to display the mutation covariation and had proved useful in studying mutation variation during ART [12,28,31]. It had been reported that HIV can employ various combinations of mutations to resist drug treatments [32]. To determine mutational interactions between the newly identified and drug resistance mutations in RT of HIV-1 subtype B' strains, we used the Jaccard similarity coefficient and network graph to investigate the correlated mutations in the deceased group, and found that these newly identified mutations were likely to be related to drug resistance. Network showed that most mutations were connected together as a component; mutations of high frequency were more likely to influence the other mutations (Fig 1). The relationship among mutations in the networks could give clues to the combinatorial mutation patterns responsible for drug resistance within the network. For example, T39A and K43E had three (M41L, K43E, and M184V) and seven (D67N, L74V, K101E, L210W, M184V, T215Y, and E224D) target drug resistance mutations, respectively. The other two non-drug-resistance mutations (E6D and E169D) also had indirect relationship with drug resistance mutations (e.g., E6D-K70R-Q197K; E169D-V179I-G190A), indicating a possible association between these mutations and drug resistance. Several studies had reported that HIV-1 replication efficiency might correlate with disease progression [33][34][35]. Understanding these polymorphisms could help to assess the most at-risk populations to avoid undermining current treatment regimens, achieve the greatest impact for the most people, and ensure sustainability. Thus, further in vitro experiments are required to confirm whether the death-associated mutations are drug resistance mutations or compensatory mutations, as well as elucidate the role of these mutations in the development of drug resistance.
This study also had several limitations. 1) The sample size was not large enough and did not include all kinds of patients in China; and 2) covariance analysis of related protein sequences was known to be a few problematic [36]. Jaccard similarity coefficient treated all substitutions of amino acids equally, ignoring physicochemical preferences. It may be worth considering different essential covariance measures for further analysis, and conclusions should be made with caution. 3) Not only resistance mutations or polymorphisms, but other biases could interfere with the progression to death, such as age, physical condition, combination antiretroviral regimens, CD4 + cell count baseline, pathologies presented at the beginning of the follow-up and so on. Therefore, a more detailed analysis of stratification will be in further consideration; 4) A small number of mutations were found in HIV protease, so this method was not able to analyze the correlation between them.