Genetic and clinical predictors of CD4 lymphocyte recovery during suppressive antiretroviral therapy: Whole exome sequencing and antiretroviral therapy response phenotypes

Increase of peripheral blood CD4 lymphocyte counts is a key goal of combined antiretroviral therapy (cART); most, but not all, recipients respond adequately and promptly. A small number of studies have examined specific genetic factors associated with the extent of CD4 recovery. We report a genome-wide examination of factors that predict CD4 recovery in HIV-infected women. We identified women in in a cohort study who were on cART with viral load below 400 copies, and drew racially and ethnically matched samples of those with good CD4 response over 2 years or poor response. We analyzed the exomes of those women employing next generation sequencing for genes associated with CD4 recovery after controlling for non-genetic factors identified through forward stepwise selection as important. We studied 48 women with good CD4 recovery and 42 with poor CD4 recovery during virologically-suppressive cART. Stepwise logistic regression selected only age as a statistically significant (p<0.05) non-genetic predictor of response type (each additional year of age reduced the odds of good recovery by 11% (OR = 0.89, CI = 0.84–0.96, p = 0.0009). After adjustment for age and genomic estimates of race and ethnicity, 41 genes harbored variations associated with CD4 recovery group (p≤0.001); 5 of these have been previously reported to be associated with HIV infection, 4 genes would likely influence CD4 homeostasis, and 13 genes either had known functions or were members of product families that had functions for which interactions with HIV or effects on lymphocyte homeostasis were biologically plausible. Greater age was the strongest acquired factor that predicted poor CD4 cell recovery. Sequence variations spanning 41 genes were independently predictive of CD4 recovery. Many of these genes have functions that impact the cell cycle, apoptosis, lymphocyte migration, or have known interactions with HIV. These findings may help inform new hypotheses related to responses to HIV therapy and CD4 lymphocyte homeostasis.


Introduction
Numeric recovery of CD4 T cells (referred to as CD4 cells from here on) in peripheral blood is an important indicator of response to combination antiretroviral treatment (cART). The number of CD4 cells predicts the occurrence of HIV-associated opportunistic infections and cancers. [1][2][3][4][5] Suppression of HIV replication is a major determinant of immunologic recovery during cART. However, even when viremia is optimally controlled, 15-30% of cART recipients have very slow or minimal gains in circulating CD4 cells. [6][7][8]. Persons with poor CD4 cell recovery after cART-mediated suppression of HIV replication (so-called immunologic non-responders or INRs) may experience continued immune impairment and an increased risk of clinical complications [9][10][11].
The definition of INR after cART suppression varies in terms of time period and number of cells that is attained. Peripheral blood CD4 cell counts may increase for more than 5 years after initiation of suppressive therapy. [12] However, a CD4 cell gain of 50-150 cells/mm 3 in the first year of suppressive therapy is a common standard for a good response, as are subsequent increases of 50-100 cells/mm3 per year. A short term gain followed by a sustained increase defines a treatment response that is considered adequate in the current Guidelines for use of Antiretroviral Agents. [13] Other definitions of CD4 cell count responses to treatment are based on expansion from pre-treatment CD4 counts and/or functional recovery. The varied definitions of immunologic recovery influence the reported proportion of optimal and INR cases; studies applying the most stringent recovery definition tend to report a larger proportion of suboptimal responses.
Poor CD4 cell recovery has been associated with age and telomere length, male sex, hepatitis C co-infection, pre-treatment CD4 counts, pre-treatment HIV RNA viral load (VL), specifics of the cART regimen, and duration of infection prior to cART initiation.[7, 10,11,[14][15][16][17][18][19][20] Previous studies demonstrated that host genetics influence expansion of CD4 populations during cART. The AIDS Clinical Trials Group examined 137 single nucleotide polymorphisms (SNPs) among 17 genes and found that SNPs of TNFSF10, TNF, BCL2L11, IL15RA, and IL15 were associated with differential CD4 cell recovery at 12 months of cART virologic suppression. [21] IL7R gene polymorphisms were associated with rate of CD4 cell recovery in a large cohort of cART recipients, a finding consistent with the known role of IL-7 in establishing CD4 lymphocyte homeostasis. [22] CCR5 genotype/CCL3L1 copy number was associated with extent and rate of CD4 cell recovery in another cohort. [23] cART recipients who were homozygous for HLA-Bw4 were significantly more likely to be INRs than other individuals. [24,25] Mitochondrial DNA genotype also influences rate and extent of CD4 cell recovery among cART recipients. [26,27] The participants of these studies were either predominantly male or the proportion of females was not specified; determination of factors associated with CD4 lymphocyte recovery in females is important because sex dimorphisms are recognized for immune responses, circulating numbers of T cells, and intensity of HIV viremia. In the study of IL-7R haplotype 2 among Ugandan cART recipients [25]; female sex was associated with significantly longer time until reaching a threshold value of 500 cells/mm3, but this association was not independent of other factors such as pre-treatment CD4 cell count and age.
We report herein the associations of demographic and clinical characteristics and exome sequence variations with CD4 cell recovery of women (participants of the Women's Interagency HIV Study, WIHS), limited to women who experienced rapid expansion of peripheral blood CD4 cell count or women who had slow or minimal expansion of these cells while receiving cART with suppression of plasma VL to below 40 copies/ml.

Study population
WIHS is a longitudinal, multisite, observational cohort study of HIV infection among U.S. women. [28,29] Use of cART was defined according to expert HIV treatment guidelines. [13] Our analysis used a dichotomized measure of self-reported adherence of the proportion of doses of antiretroviral medications taken as being either �95% or <95% of what the provider instructed over the last six months. Peripheral blood CD4 counts and VL quantitation in plasma were determined via standard assays in laboratories that participated in the NIAID laboratory quality assurance program. The CD4 nadir is defined as the lowest point to which the CD4 count has dropped and is predictive of slower immune recovery and long-term morbidity. Nadir CD4 count was the lowest value measured by WIHS, or pre-WIHS ascertained via medical record review. Anti-Müllerian hormone (AMH) levels in plasma were measured using a commercially available ELISA with a lower limit of detection of 0.08 ng/ml. AMH is an indicator of ovarian follicular reserve, and falls below assay detection 3-5 years prior to menopause, a time that corresponds to the onset of consistent estradiol depletion. [30] Informed consent was provided by all participants via protocols approved by institutional review committees or boards (IRB) at each affiliated institution with consent to studies of host genetics was specifically obtained for women who contributed to this study.

Phenotype determination
Study visits during reported cART use were categorized according to whether VL was suppressed and according to whether CD4 counts demonstrated adequate gains over pretreatment and nadir values. Imputed values for CD4 counts or VL were applied to one or two visits with missing values for these measures if there was no change in cART status between at least two nearby visits with the measurements available. To mitigate volatility in CD4 counts, non-missing CD4 counts were smoothed using values from two or three nearby visits, when available. Detailed specification of the phenotyping algorithm is provided in online S1 Appendix, and the SAS program implementing it is in online S2 Appendix. These appendices include programming for phenotypes that were not used for this study.

Selection for this study
Initial selection for this study was completed using the phenotyping detailed in the preceding section, and was based on the initiation of cART, having plasma VL below assay detection (e.g., <80 copies) for �1 year, and having either the most rapid and durable gains in CD4 counts for at least 2 years or the poorest gains for at least one year during cART visits with viral suppression. Because race is associated with total leukocyte counts in peripheral blood, we matched the good and poor responder groups by race and ethnicity. Our goal was to select 50 women with rapid CD4 cell responses and 50 with slow CD4 cell responses, matched for race and ethnicity. Initial screening with the algorithm (online S1 appendix) identified 201 rapid responders and 78 slow responders. We reviewed longitudinal plots of viral load, CD4 cell count, and treatment status (examples in Fig 1) stratified by race (i.e., African-American, White) and Hispanic ethnicity, and identified 46 clearly slow responders. We identified 50 race-matched rapid responders with similarly clear and rapid CD4 cell expansion after initiation of suppressive cART.
Ninety-two of the 96 initially selected women had available samples for exome sequencing (examples provided in Fig 1). Two additional eliminations occurred following genetic data collection; one had an inadequate genomic material, and exome sequencing failed in the second, resulting in a final sample of 90 women.

Statistical analyses
We used forward stepwise logistic regression to select variables from a set of demographic, clinical, and treatment variables, for inclusion in a final model of acquired factors associated with good recovery. Predictors were required to have a p-value for association with CD4 response group of <0.05 in a multivariate regression. SAS version 9.4 was used for these analyses.
Whole exome sequencing. Banked peripheral blood mononuclear cells underwent genomic DNA extraction using the Puregene DNA Isolation System (Invitrogen, Carlsbad, CA), quantitated, and normalized to a concentration of 45 ng/mL. Approximately 6.75 micrograms of DNA was subjected to sequencing at an average depth of read of 50X. Sequencing was performed by Expression Analysis Inc. (Durham, NC). Prior to library construction and sequencing, DNA was evaluated for evidence of degradation by electrophoresis.
Exome sequencing was completed with 100 base-pair paired-end sequencing. A PhiX control sample was included on all flow cells. DNA fragments were amplified using Clonal Single Molecule Array technology and Sequencing-by-Synthesis using Reversible Terminator Chemistry. Sequence data was stored as FASTQ files for each specimen. Sequences were required to pass Illumina (San Diego, CA) purity filter reads and was accompanied by a per-base quality score, as defined by Illumina-scaled Phred metric.
Variant calling file generation procedure. Paired-end whole-exome reads were processed and aligned to the human genome. The alignments were used to call variants in order to create the Variant Calling Format (VCF) file using the Genome Analysis Toolkit from The Broad Institute (GATK) following best practices. [31,32] A diagram of the VCF file generation procedure is provided in S1 Fig. In brief, paired-end reads were clipped from adapters using fastq-mcf from the EA-utils package (https://code.google.com/p/ea-utils/). Paired-end libraries were evaluated for quality using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/ fastqc/). Paired-end reads were aligned to the reference genome assembly GRCh37/hg19. Alternate loci and patches were excluded. Alignments were performed with bwa-mem (http:// bio-bwa.sourceforge.net). [33] Variant calling was performed using GATK version 3.4-46 (https://www.broadinstitute.org/gatk).
Alignments were processed with samtools (http://www.htslib.org) [34] to remove PCR duplicates. Read Groups lines were defined in the BAM file using AddOrReplaceReadGroups in picard-tools (http://picard.sourceforge.net). Re-alignment was performed in regions around known indels with the RealignerTargetCreator and IndelRealigner in GATK using the gold standard indels from Mills, the 1000 Genomes Project (KGP), and the KGP phase 1 indels (GATK Resource Bundle, version 2.8, ftp.broadinstitute.org). Recalibration of the base quality scores was performed to assign an empirically accurate error model to the bases with BaseRecalibrator/PrintReads in GATK using the known variation from the gold standard indels from in the above listed sources and dbSNP 138. Variant calling was performed per sample with the GATK HaplotypeCaller in GCVF mode. All samples were called together to gain more confidence in weak calls with GenotypeGVCFs and dbSNP 138. Variants were recalibrated using VariantRecalibrator using known variation data from dbSNP 137, HapMap 3.3, and KGP Omni 2.5. A Variant Quality Score logarithm of odds was calculated for each variant using ApplyRecalibration (filer level = 99.0). Final annotations for variant filtration were performed with SelectVariants for approximate read depth (DP<8) and genotype quality (GQ<20) as previously suggested. [35] Only SNP variation was called. Genomic data for the WIHS is indexed in the database of Genotypes and Phenotypes (dbGaP); the accession number is phs001503.
Genetic association analyses. Allele and genotype frequencies were determined by gene counting. Population substructure in good and poor CD4 response groups was estimated by principal component analysis (PCA). The first two principal components (PCs) were sufficient to estimate the substructure of the cohort due primarily to genetic racial and ethnic differences, which were employed to control for the potential confound of ancestry-based substructure in the multiple regression analyses.
The approaches used for association analysis in host genomic studies typically focus on common polymorphisms but are poorly suited to detect the impact of rare polymorphisms and cannot estimate their combined effect or "variant burden". In order to leverage the availability of common and rare polymorphisms yielded from whole exome sequencing, two approaches to estimate the magnitude of variant burden with the CD4 recovery phenotype were considered: the Combined Multivariate and Collapsing (CMC) method [36] and the Kernel-Based Adaptive Collapsing (KBAC) method [37] Both methods collapse variants into a single covariate based on gene regions (i.e., a gene). CMC can detect the effects of both common and rare alleles, while KBAC detects effects of rare alleles only. Gene associations identified by  Fig 1A legend. WIHS Study Visit denotes visit number (2 visits per year). The vertical black line at visit 16 denotes the timepoint at which the beginning of the phenotypic window begins, which is followed by a period of rapid CD4 T cell gain (orange bracket). The following graphics were used to display relevant clinical, disease, and treatment information: dashed blue line denotes CD4 T cell count, solid green line denotes nadir CD4 T cell count, dashed black line denotes HIV viral load, HIV antiretroviral therapy (ART) status was denoted using � (no ART) or ▲ (combination ART (cART) visit, and HIV viremia status was denoted using � designated in blue font (HIV RNA below the assay detection limit) or ✖ designated in blue font (HIV RNA greater than or equal to the assay detection limit. The visit phenotype was denoted using the following symbols: U (not determined due to the lack of sufficient observation), designated in red font (long term off cART and viremic with stable CD4), l designated in green font (long term off cART and viremic with declining CD4), H (first cART visit), H designated in red font (on cART with viremia), or H designated in green font (on cART with virologic suppression). Fig 1B legend. WIHS Study Visit denotes visit number (2 visits per year). The vertical black line a visit 12 denotes the timepoint which is followed by a period of minimal CD4 T cell gain (orange bracket). The graphics were used to display relevant clinical, disease, and treatment information are the same as for Fig  1A, except that the visit phenotype had the following additional symbols: s designated in red font (short term off cART with viremia and declining CD4), H designated in green font (reinitiated cART visit), or h designated in green font (virologic suppression with minimal CD4 gain).
https://doi.org/10.1371/journal.pone.0219201.g001 both CMC and KBAC would suggest that both common and rare variants contribute to the observed association, while gene associations identified by CMC or KBAC alone, would suggest that the variant burden at said gene was due primarily to common or rare variants, respectively. Both unadjusted and adjusted (i.e., age, genomic estimates of population substructure) associations were calculated. For CMC and KBAC, gene regions were defined by RefSeq release 63. Statistical analyses were carried out using HelixTree software (Golden Helix, Bozeman, MT).
The CMC method groups variants according to the frequency of each polymorphism within a gene region to perform a multivariate test collapsed over the gene region to estimate the association with the phenotype. Both common (i.e., �5% minor allele frequency [MAF]) and rare (i.e., <5% MAF) SNPs are included in CMC analyses. In the current study we defined five potential groups, or bins, based on the frequency of the rare allele for each SNP as provided by KGP: <1%, 1%�x<2.5%, 2.5%�x<5%, 5%�x<10%, �10%. SNPs identified by whole exome sequencing in WIHS not present in the KGP reference database were assigned to the bin defined as <1% because the variant was assumed to be too rare to be detected in KGP.
The KBAC method collapses the genotypic information across all polymorphisms in a gene region into compound, multi-marker genotype without the need for binning. However, common SNPs (i.e., �5% MAF) must be excluded from analysis. The counts of the multi-marker genotypes are used to perform a multivariate test to estimate their association with the outcome. The KBAC test weights multi-marker genotypes based on risk for the outcome with increased risk genotypes assigned higher weights to better distinguish between causal and non-causal genotypes. Accordingly, the KBAC test is a one-sided test.
Initially, both CMC and KBAC regression analyses were performed at the level of the gene with all transcripts consolidated into a single region. In order to evaluate the possibility that associations may be narrowed to specific transcripts, the CMC and KBAC regression analyses were repeated for every transcript only for those gene regions that met a priori significance thresholds to minimize the added multiple testing penalty. For candidate gene analyses, a significance threshold of p<0.05 was selected a priori for the evaluation of previously identified candidate genes for CD4 recovery. Based on the exploratory nature of the exome analyses using both CMC and KBAC covariate-adjusted models, we report here the genetic predictor terms that had a gene-wise permutation-based p<0.001 by either CMC or KBAC after one million permutations, a threshold that is more appropriate when conducting gene-wise association studies. [38] The CMC analyses evaluated 18,987 genes while KBAC analyses evaluated 19,653 genes; genes associated with CD4 recovery using either method where retained.
Genes that were associated with CD4 recovery group were crossed referenced via the HIV: host protein interaction database (https://www.ncbi.nlm.nih.gov/genome/viruses/ retroviruses/hiv-1/interactions/). Both direct (i.e., direct host protein:viral protein) and indirect (HIV protein interaction with a host protein that interacts with the protein encoded for by an identified CD4 recovery gene) were considered.

Clinical factors
Ninety WIHS participants contributed to analyses reported here; 42 met criteria for poor CD4 recovery on virologically-suppressive cART and 48 women demonstrated good CD4 expansion on virologically-suppressive cART.  Table 1 summarizes the characteristics of the two CD4 response groups (white columns), and the associations estimated by univariate logistic regression (gray columns). Even though we limited analysis to women who had HIV RNA levels below assay threshold during the CD4 response time period, �95% adherence to the prescribed regimen was common. CD4 nadir was associated with CD4 response phenotype (OR = 2.0 per 100 cell/ml, 95% CI 1.34-3.1, p = 0.0008). However, CD4 count and the amount of increase above CD4 nadir during the response period are used in the CD4 response group definition, so the meaning of this association is questionable.
Age at the onset of the response-defining treatment period was strongly associated with CD4 response group (OR for a good response = 0.89 per year, 95% CI = 0.838-0.955, p = 0.0009). Being a tobacco smoker at start of the CD4 response phenotype was also associated with CD4 response group: OR for a good response = 0.413, 95% CI = 0.175-0.976, p = 0.044). Plasma AMH level was available for 87 women. Undetectable AMH was associated with poor response group: OR for a good response = 0.221, 95% CI = 0.087-0.564, p = 0.0016.
All women reported current or past use of nucleoside antiretroviral drugs, though prior use of ddI or d4T was associated with poor CD4 response (OR for a good response = 0.333, 95% CI = 0.116-0.955, p = 0.041). Self-reported adherence of >95% to the cART regimen was associated with a good CD4 response (OR = 3.300, 95% CI = 1.288-8.5, p = 0.013).
Most of these covariates are associated with each other. For current smokers age and years smoked will line up exactly, and even for former smokers there will likely be a relation. Because ovarian follicular reserve declines with age, AMH detection also declines with age. ddI and d4T were used earlier in the epidemic, and so will also be associated with age. To sort these out we constructed a multivariate model with forward stepwise logistic regression, with p<0.05 for inclusion. Nadir CD4 count was not included as a candidate predictor in the model because it is linked with criteria for group assignment. Age had the smallest p-value on its own, and none of the other variables had p<0.05 after controlling for age. The confidence intervals for many of those other predictors include substantial effects, so one should not conclude this finding to mean they don't matter. It just means we don't know much about how or if they matter. However, our decision rule generates a final model of non-genetic factors that includes only age. Nadir CD4 count was not included as a candidate predictor in the model because it is linked with criteria for group assignment. Table 2 details the 41 genes that had p�0.001 for predicting response group by CMC (a regression analysis approach that evaluates the genetic burden of both common and rare alleles) and/or KBAC (a regression analysis approach that evaluates the genetic burden of rare alleles), along with genes previously reported to be associated with HIV-related outcomes (i.e., candidate genes). The reported values are from regression analyses controlling for age (based on the multivariate analysis of non-genetic variables discussed immediately above) and the first two ancestry principal components that were included to account for any residual variability in population substructure due to race and/or ethnicity after matching on self-reported race and ethnicity. Five of these genes were previously reported to be associated with HIV-related outcomes, including HECW2, CLCN3, ALG2, SRP14, and CCL25. Another four genes have functions that would likely influence CD4 lymphocyte homeostasis; these were: SLC12A2, USP12, GMIP, KLC3. Eleven of the genes associated with CD4 response had unknown functions. The remaining 21 genes either had known functions or were members of product families that had functions for which interactions with HIV or effects on lymphocyte homeostasis were biologically plausible. None of the 8 candidate genes previously reported to be linked with HIV treatment responses were statistically significantly associated (p<0.05) with CD4 response group in this study. A detailed description of the estimates of association for both CMC and KBAC is provided in online S1 Table and the polymorphisms incorporated into each gene level burden regression analysis is provided in online S2 Table.

Discussion
Several predictors of CD4 cell response group among suppressed (by clinical assay) cART recipients were identified in this study. Of the non-genetic factors, only age was statistically significantly associated with CD4 cell response group in multivariable analysis. Greater age was associated with lower likelihood of a good CD4 recovery response, a finding that has been previously reported. [86,87] This is in keeping with the loss of immune function that is known to occur with aging Some studies have reported that females have higher circulating numbers of CD4 lymphocytes than males [88,89], but substantial sex differences in CD4 cells have not been consistently observed. [90] Some reports note that sex differences in CD4 cell counts is driven by higher counts among women less than 50 years of age. [89] We found that a biomarker of gonadal aging (ovarian follicular function) was associated with recovery group, but the association was not statistically significant when adjusted for chronological age.
Many of the genes identified in this study appear to participate in processes related to lymphocyte homeostasis and thymus activity. Some genes that are known to interact with HIV also were associated with CD4 lymphocyte response group. 0.00057 0.58511     [73], homing of lymphocytes to mucosa [74] and inflammation in colitis [75]. In the SIV model, low CCL25 is associated with lymphoid apoptosis. [76] 0 Glycine C-acetyltransferase AKA KBL The two approaches we employed to evaluate genetic burden are related, but provide distinct information. While each analysis estimates genetic burden, CMC evaluates both rare and common polymorphisms and identified 23 genes that suggests that the basis of these associations is due primarily to common and not rare variant burden, while KBAC evaluates rare polymorphisms and identified 17 genes that suggests that the basis of these associations is due primarily to rare variant burden. Only one gene was identified by both CMC and KBAC (i.e., CCL25), indicating that both rare and common variants of this gene may influence CD4 cell recovery pattern.
Further investigation of the 41 positional candidate genes revealed that 9 encoded for RNA products, while the remainder encoded for proteins. Twenty-five of the 32 protein-coding genes encoded for proteins that interacted with HIV proteins ( Table 2, last column). The 9 RNA-encoding genes have not been evaluated for their potential HIV protein interactions. However, the expression of microRNAs often differs by HIV status and disease characteristics.
In a post-hoc analysis, we identified publically available gene expression data collected in a sample of South Asian men (n = 10) and women (n = 2) who exhibited expected (n = 5) and poor (n = 7) CD4 T-cell recovery after cART initiation (Gene Expression Omnibus [GEO] expression set GSE77939) which we interrogated to determine if a subset of the 41 genes that were enriched for sequence variations associated with poor CD4 cell recovery also exhibited differences in gene expression. The definitions of expected and poor CD4 cell recovery were loosely similar to ours, with HIV-positive individuals required to take ART for at least one year who did (expected responders) and did not (poor responders) experience a CD4 lymphocyte count gain of more than 150 cells/μL in one year and a viral load <400 copies/mL). Additionally, nonresponders consistently had CD4 cell counts of less than 250 cells/uL, while expected responders showed a CD4 count that exceeded 250 cells/μL (personal communication, S. Aurora). Gene expression measures for 7 (i.e., CTAGE7P, MIR3662, MIR5010, MIR6769A, MIR6784, RP11-383C5.3) of the 41 candidate genes identified in our sample were not available. Of the 34 genes that we identified by whole exome sequencing for which gene expression data was also available, seven were differentially expressed between CD4 responder groups (i.e., ALG2, GIMP, GRHL2, LCE1D, SLC12A2, TMEM121, ZNF493; all p<0.05) based on GEO2R analysis. These gene expression findings provide complementary support that the identified candidate genes that differed in terms of enrichment of sequence variations by CD4 cell recovery group also show 0.84127 e 1 Adjusted for genomic estimates of race and ethnicity and age (years). HIV protein interactions with the gene of interests was categorized as unknown, direct (i.e., an HIV protein(s) interacts directly with the protein product encoded by the gene of interest), or indirect (i.e., the protein encoded by the gene of interest interacts with another host protein that in turns is known to interact with an HIV viral protein(s)). For direct HIV protein interactions, the HIV protein is denoted with a single letter designation (i.e., e = Env; g = Gag protein; n = Nef; p = Pol; t = Tat  differences in gene expression. Whereas the number of genes that were differentially expressed exceed that expected by chance (i.e., 7 genes were differentially expressed where approximately 2 would be expected by chance alone, p<0.00022), it is also possible that a subset of the other candidate genes may also be differentially expressed and associated with CD4 cell response to ART but in other tissues (e.g., thymus) or at different times (e.g., viral load set-point). Future studies are warranted that examine gene expression and sequence variation (e.g., expression-quantitative trait locus) analyses in relation to CD4 lymphocyte response to treatment. Further bioinformatic analyses to evaluate for a biological pattern(s) across the genes were identified using KEGG gene set enrichment analyses. One pathway, hsa00260 (glycine, serine, and threonine metabolism) featured two genes associated with CD4 recovery: CDCH and GCAT. Enrichment of genes in the hsa00260 pathway has been highlighted in several studies of HIV infection. [91,92] Because we identified HIV-related genes that were associated with CD4 cell recovery pattern, it is possible that ongoing HIV replication at very low levels might have been a contributor to CD4 cellular expansion. Thus, when specimens were available, we conducted ultrasensitive HIV RNA assays to detect HIV RNA quantities below the threshold of clinical assays in use at the time of this study. These ultra-sensitive assays utilized Hologic (San Diego, CA) using the Hologic Aptima1 HIV-1 Quant DX Assay on the Panther System with up to 9 replicates of each sample and can detect HIV RNA levels as low as 1 copy/ml. Thirty-eight of 43 good CD4 cell responders with available samples for testing had �1 replicate test that had detectable HIV RNA. All 10 of the available poor CD4 cell responder samples had detectable HIV RNA in �1 replicate. This finding indicates that persistent viral replication exists, even among individuals thought to have optimal viral suppression. However, despite the frequency of ongoing low-level replication, most cART recipients experience rapid expansion of CD4 lymphocyte populations. It is possible that individuals vary in terms of how on-going low-level viral replication impacts CD4 recovery. The fact that some of the variations in genes we found to be predictive of CD4 response group had known relationships with HIV also supports a continuing role of HIV replication in the setting of host differences in determining immunologic recovery.

Conclusions
This study confirms the finding that increasing age is an independent and strong predictor of poor CD4 recovery during cART. We found that age, in a group of mid-life women, was a better predictor of CD4 recovery than AMH, a biomarker of gonadal aging. This result suggests that loss of ovarian estrogen is not a leading determinant of CD4 recovery during cART. A major contribution of this study, which broadly searched for relationships between host genetics and CD4 cell recovery, is the identification of new genes that may influence immune reconstitution in treated HIV infection. These findings may lead to novel hypotheses and identification of new host factors which contribute to HIV pathogenesis and recovery.  Table. Detailed estimates of gene-wise associations using CMC and KBAC. The study design is listed in column A and is entitled "StudyDesign" which has one of two values: "Novel Loci" identified by exome sequencing and "Candidate Gene" because the given gene was specified a priori. The gene abbreviation is specified in column B. The chromosome location of the gene is listed in column C. The gene identifier is listed in column D. The gene name is defined in column E. The nucleotide start position for the gene is listed in column F. The nucleotide end position for the gene is listed in column G. The gene transcript identifier is listed in column H. The strand (i.e., "+", "-") encoding the gene is listed in column I. The total number of markers (SNPs) included in the CMC-based associated test is listed in column J. ) for non-adherence is listed in column AN. The adjusted estimate of association (beta and associated standard error [SE]) for adherence is listed in column AO. The adjusted estimate of association (beta and associated standard error [SE]), for the first bin ("0") is listed in column AP. The adjusted estimate of association (beta and associated standard error [SE]), for the second bin ("1") is listed in column AQ. The adjusted estimate of association (beta and associated standard error [SE]), for the third bin ("2") is listed in column AR. The adjusted estimate of association (beta and associated standard error [SE]), for the fourth bin ("3") is listed in column AS. The adjusted estimate of association (beta and associated standard error [SE]), for the fifth bin ("4") is listed in column AT. The total number of markers (SNPs) included in the KBAC-based associated test is listed in column AU. The number of multi-marker genotypes estimated for the KBAC test is listed in column AV. The unadjusted p-value for the KBAC-based test of association for each gene is provided in column AW. The KBAC test statistics (i.e., the KBAC score) for the unadjusted estimate of association is listed in column AX. The adjusted (i.e., genomic estimates of race and ethnicity [PC1, PC2], age) p-value for the KBAC-based test of association for each gene is provided in column AY. The KBAC test statistics (i.e., the KBAC score) for the PC1-, PC2-, and age-adjusted estimate of association is listed in column AZ. The adjusted (i.e., genomic estimates of race and ethnicity [PC1, PC2], age, self-reported non-adherence) p-value for the KBAC-based test of association for each gene is provided in column BA. The KBAC test statistics (i.e., the KBAC score) for the PC1-, PC2-, age-, non-adherence-adjusted estimate of association is listed in column BB. (XLSX)

S2 Table. Single nucleotide polymorphisms incorporated into each gene burden analysis.
The study design is listed in column A and is entitled "StudyDesign" which has one of two values: "Novel Loci" identified by exome sequencing and "Candidate Gene" because the given gene was specified a priori. The chromosome location of the gene is listed in column B. The gene abbreviation is specified in column C. The gene identifier is listed in column D. The nucleotide position is listed in column D. The RefSeq identifier (rsID) is provided, if available, in column E. The reference allele is provided in column F. The excess heterozygosity ratio statistic, R, is listed in column G. The p-value for Fisher's Exact test for the association of a given SNP with the CD4 Recovery Group outcome is listed in column H. The odds ratio (OR) and the 95% confidence interval (95% CI) of the association of a given SNP with the CD4 Recovery Group outcome is listed in column I. The sample size is listed in column J. The SNP call rate is listed in column K. The reference minor allele frequency (MAF) is provided in column L. The p-value for the Fisher's Exact Test for the deviation of a given SNP from the Hardy-Weinberg expectation is listed in column M. (XLSX) S1 Appendix. SAS programming. (PDF) S2 Appendix. Detailed specification of the phenotyping algorithm. (PDF) assays used in this study and Sheila Keating for her assistance in arranging this testing. We also thank Sunilkumar Arora for providing additional information on the gene expression data that he deposited in GEO. We particularly thank the participants of WIHS, whose diligence in attending study visits provides such valuable information. De-identified participant data (i.e., clinical, demographic, treatment, next generation DNA sequencing) is securely maintained by the WIHS and is available on request after securing an approved Concept Sheet (https://mwccs.org).
Data in this manuscript were collected by the Women's Interagency HIV Study (WIHS).