Regulatory Polymorphisms in the Cyclophilin A Gene, PPIA, Accelerate Progression to AIDS

Human cyclophilin A, or CypA, encoded by the gene peptidyl prolyl isomerase A (PPIA), is incorporated into the HIV type 1 (HIV-1) virion and promotes HIV-1 infectivity by facilitating virus uncoating. We examined the effect of single nucleotide polymorphisms (SNPs) and haplotypes within the PPIA gene on HIV-1 infection and disease progression in five HIV-1 longitudinal history cohorts. Kaplan-Meier survival statistics and Cox proportional hazards model were used to assess time to AIDS outcomes. Among eight SNPs tested, two promoter SNPs (SNP3 and SNP4) in perfect linkage disequilibrium were associated with more rapid CD4+ T-cell loss (relative hazard = 3.7, p = 0.003) in African Americans. Among European Americans, these alleles were also associated with a significant trend to more rapid progression to AIDS in a multi-point categorical analysis (p = 0.005). Both SNPs showed differential nuclear protein-binding efficiencies in a gel shift assay. In addition, one SNP (SNP5) located in the 5′ UTR previously shown to be associated with higher ex vivo HIV-1 replication was found to be more frequent in HIV-1-positive individuals than in those highly exposed uninfected individuals. These results implicate regulatory PPIA polymorphisms as a component of genetic susceptibility to HIV-1 infection or disease progression, affirming the important role of PPIA in HIV-1 pathogenesis.


Introduction
As an obligate intracellular parasite, HIV type 1 (HIV-1) utilizes host cell factors for its replication. Human cyclophilin A (CypA), also known as peptidyl prolyl isomerase A (PPIA), is a ubiquitous cytoplasmic protein (by convention, we refer to the protein as CypA and the gene as PPIA). CypA has long been known for its incorporation into HIV-1 virions and its important role in facilitating HIV-1 replication in host cells [1,2]. CypA is a member of the cyclophilin family, all members of which possess peptidyl-prolyl cis/trans isomerase activity. Peptidyl-prolyl cis/trans isomerases (PPIases) catalyze the cis/trans isomerization of prolyl peptide bonds and are believed to be involved in protein folding [3]. The incorporation of CypA into the HIV-1 virion is mediated through direct binding between a prolyl peptide bond located in a proline-rich loop between the fourth and fifth helices of the HIV-1 capsid and the active site of CypA [4,5]. Disruption of CypA incorporation, either by HIV-1 Gag mutations or by cyclosporine A, an immunosuppressive drug that prevents HIV-1 Gag binding to CypA, leads to an attenuation of HIV-1 infectivity [2,6]. Braaten and Luban found that HIV-1 replication was decreased in CypA-null human CD4+ T cells, in which the gene encoding CypA (PPIA) was deleted through homologous recombination [7]. CypA is therefore an important host factor that regulates HIV-1 replication.
Recently, the role of CypA in HIV-1 has gained even greater attention with the discovery of a fusion protein of CypA and TRIM5α, a host restriction factor against HIV-1 [8], which confers HIV-1 resistance in owl monkey [9-11]. Both TRIM5α and CypA recognize and act on the capsid of HIV-1, but apparently confer opposite effects. TRIM5α restricts HIV-1 by promoting premature disassembly of HIV-1 capsid [12], while CypA increases viral infectivity by facilitating proper uncoating. Although the interaction between CypA and TRIM5α is still unclear, it appears that the modulation of HIV-1 infectivity by CypA is independent of TRIM5α [11,13-15]. It has been postulated that binding of CypA to capsid protects HIV-1 from an unknown restriction factor in humans [15].
The study of the influence of human gene variation on susceptibility to HIV-1 infection and progression is an approach that may reveal in vivo host factor-HIV-1 interactions and their epidemiologic importance at the population level. With this approach we have identified several AIDS-modifying variants in the genes TRIM5 [16], APOBEC3G [17], and CUL5 [18], which encode human innate HIV-1 restriction factors or related proteins. In a recent study performed in the Swiss HIV Cohort Study, a variant of PPIA was implicated as a potential factor affecting HIV-1 disease progression [19]. As CypA is an essential human protein for completion of the HIV-1 life cycle, we have assessed the influence of genetic variation in the PPIA gene on HIV-1 infection and disease progression in five United States-based HIV-1 natural history cohorts.

Description of PPIA Variations
The PPIA gene is approximately 7 kb in length, consisting of five exons (Figure 1A). Using PPIA-specific primers we resequenced virtually the entire PPIA gene to screen for single nucleotide polymorphisms (SNPs) in 92 European Americans (EA) and 92 African Americans (AA). Seven polymorphisms were discovered: three in the putative promoter region, one in the 5′ UTR, one in the 3′ UTR, one in intron 4, and one in the second Alu repeat region upstream of the putative promoter. No SNPs were found in the coding regions. These SNPs and four additional SNPs available from the dbSNP database, located in regions not detected by our sequencing, were selected for genotyping in the AIDS cohorts (Figure 1 and Table 1). SNP1 genotypes deviated from the frequencies expected under Hardy-Weinberg equilibrium (p < 0.001), probably due to its location in the Alu repetitive element; thus, SNP1 was excluded from the analysis. SNP2 was also excluded from analysis due to its rarity. The genotypic frequencies of all other SNPs conform to Hardy-Weinberg expectations.
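The Hardy-Weinberg check used to exclude SNP1 can be illustrated with a minimal sketch. It assumes a one-degree-of-freedom chi-square goodness-of-fit test on genotype counts; the function name and the counts in the usage note are hypothetical and are not taken from the study data.

```python
import math


def hwe_chi_square(n_major_hom, n_het, n_minor_hom):
    """Chi-square test (1 df) of Hardy-Weinberg proportions for a biallelic SNP.

    Assumes the SNP is polymorphic (both alleles observed).
    Returns (chi-square statistic, p-value).
    """
    n = n_major_hom + n_het + n_minor_hom
    # Major-allele frequency estimated by gene counting.
    p = (2 * n_major_hom + n_het) / (2 * n)
    q = 1.0 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_major_hom, n_het, n_minor_hom)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

For example, hypothetical counts of 25, 50, and 25 match Hardy-Weinberg proportions exactly (statistic 0), whereas counts of 50, 0, and 50 show a complete lack of heterozygotes and give a very large statistic.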
Allele frequency distributions of PPIA SNPs differed in EA and AA with Fst values ranging from 0.03 to 0.40 (average 0.22). The difference is particularly pronounced for SNPs 7, 8,

Author Summary
Individual risk of acquiring HIV type 1 (HIV-1) infection and developing AIDS is not equal; some people are more prone to HIV/AIDS than others. Susceptibility to HIV-1/AIDS is likely determined by a combination of environmental, viral, and host genetic factors. Genetic variations in host cellular factors involved in HIV-1 cell entry, replication, and host defense have been found to affect susceptibility to HIV-1/AIDS. In this report, we focused on the gene PPIA that encodes cyclophilin A, a human cellular protein that is incorporated into the HIV-1 virion and promotes viral replication. We studied genetic variation in the PPIA gene in persons with different susceptibility levels to HIV-1 infection or different rates of disease progression. We found that individuals who possessed two functional variants in the promoter region of PPIA had a higher risk of CD4+ T-cell loss or progression to AIDS-defining diseases. We also observed that an additional variant occurred more frequently in HIV-1-infected individuals compared to HIV-1-exposed, but uninfected, individuals. These results suggest that genetic variation in PPIA may influence host susceptibility to HIV-1 infection or disease progression and that targeting PPIA might provide therapeutic benefit.

Linkage Disequilibrium and Haplotypes of PPIA SNPs
The extent of linkage disequilibrium (LD) among eight SNPs was assessed by calculating all pairwise D′ values, separately for AA and EA (Figure 2A and 2B). Strong LD was observed among almost all SNPs in AA (D′ range 0.87-1.0) and even stronger LD in EA (D′ range 0.97-1.0). SNP3 and SNP4 were in perfect LD in both populations (D′ = 1, r² = 1). As each of these two SNPs carries the same information content, SNP4 was selected as a proxy for SNP3 in the analysis. The eight SNPs in both population groups formed a single LD block as defined by the solid spine of LD method [20]. This provided a justification to use all eight SNPs spanning the entire region as one block for subsequent haplotype-based association analyses.
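The distinction between D′ and r² matters for the proxy argument above: only r² = 1 guarantees that two SNPs carry identical information. A minimal sketch of the two statistics for a pair of biallelic SNPs, computed from haplotype and allele frequencies, is given below; the function name and the example frequencies are hypothetical.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D', r^2) for two biallelic, polymorphic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and B (locus 2)
    p_a, p_b: frequencies of allele A and allele B, each strictly between 0 and 1
    """
    d = p_ab - p_a * p_b
    # The largest |D| attainable given the allele frequencies depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2
```

With hypothetical frequencies p_a = p_b = p_ab = 0.1, both statistics equal 1, which is the situation described for SNP3 and SNP4. With p_a = 0.1, p_b = 0.3, and p_ab = 0.1, D′ is still 1 but r² is only about 0.26, so one SNP would not be a complete proxy for the other.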
Among AA, SNP6 presented an intermediate level of LD (D′ ≈ 0.60) with the downstream SNPs (SNP8-11); the SNP6-SNP8 recombinant G-C haplotype had a frequency of 5%. Among EA, SNP5 had weak LD with SNP6-11 (D′ 0.47-0.71); the recombinant haplotype SNP5-SNP6 (G-G) had a frequency of 2.5%. This suggests an occurrence of recombination in the region between SNP5 and SNP8. The presence of multiple Alu elements in PPIA may have introduced the recombination event by promoting genomic instability [21].
Haplotype structure and maximum likelihood haplotype frequencies were estimated with the EM algorithm. There were, in total, eight haplotypes with a frequency >1% in either population (Figure 1B). Five and six haplotypes were present in EA and AA, respectively, accounting for >97% of total sampled chromosomes. Among these, only three common haplotypes (frequency >5%) were shared between AA and EA (hap1, 2, and 4). As for the individual SNPs, haplotype frequency distributions differed between AA and EA.
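The idea behind EM-based haplotype inference can be sketched on the smallest possible case, two biallelic SNPs, where only double heterozygotes have ambiguous phase. This is an illustrative simplification under stated assumptions, not the eight-SNP procedure used in the study; the function name and the genotype counts in the usage note are hypothetical.

```python
def em_two_snp_haplotypes(geno_counts, max_iter=500, tol=1e-12):
    """Estimate two-SNP haplotype frequencies from unphased genotype counts.

    geno_counts[i][j] is the number of individuals carrying i copies of the
    minor allele at SNP 1 and j copies at SNP 2 (i, j in 0..2).
    Returns a dict mapping (allele1, allele2) to frequency, with 1 = minor allele.
    """
    n_chrom = 2 * sum(sum(row) for row in geno_counts)
    # Haplotype counts from genotypes whose phase is unambiguous.
    fixed = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.0}
    for i in range(3):
        for j in range(3):
            if (i, j) == (1, 1):
                continue  # double heterozygotes are handled in the E-step
            alleles1 = [1] * i + [0] * (2 - i)
            alleles2 = [1] * j + [0] * (2 - j)
            for hap in zip(alleles1, alleles2):
                fixed[hap] += geno_counts[i][j]
    n_double_het = geno_counts[1][1]
    freqs = {hap: 0.25 for hap in fixed}  # uninformative starting values
    for _ in range(max_iter):
        # E-step: probability that a double heterozygote is in cis (11 / 00).
        cis = freqs[(0, 0)] * freqs[(1, 1)]
        trans = freqs[(0, 1)] * freqs[(1, 0)]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: re-estimate frequencies from expected haplotype counts.
        counts = dict(fixed)
        counts[(0, 0)] += n_double_het * w
        counts[(1, 1)] += n_double_het * w
        counts[(0, 1)] += n_double_het * (1 - w)
        counts[(1, 0)] += n_double_het * (1 - w)
        new_freqs = {hap: c / n_chrom for hap, c in counts.items()}
        delta = max(abs(new_freqs[hap] - freqs[hap]) for hap in new_freqs)
        freqs = new_freqs
        if delta < tol:
            break
    return freqs
```

With hypothetical counts of 81 double major homozygotes, 18 double heterozygotes, and 1 double minor homozygote, the estimate converges to a minor-minor haplotype frequency of 0.1 with no recombinant haplotypes, which is the pattern expected for two SNPs in perfect LD.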

Effect of PPIA SNPs and Haplotypes on Susceptibility to HIV-1 Infection
We compared the PPIA SNP allele and haplotype frequency distributions among three groups with increasing resistance to HIV-1 infection: seroconverters (SCs), seronegatives (SNs) belonging to an HIV-1 risk group, and those with documented high-risk exposures to HIV-1 who remain uninfected (HREU) (Table 2). The minor allele of SNP5 (G) was carried by 13.7% of SCs, 13.5% of SNs, and 9.5% of HREU (Mantel-Haenszel trend test, p = 0.21). The reduced carriage of SNP5G in HREU was significant when SCs were compared to HREU (odds ratio = 1.78, p = 0.02), suggesting that SNP5G allele carriers may have increased susceptibility to HIV-1 infection (Table 2). No distortion of frequency distribution between risk groups was observed for any other SNP in AA or EA (unpublished data).

PPIA SNPs and AIDS Progression
We analyzed AA and EA separately since the SNP frequencies differed between the two groups. The Cox proportional hazards model was employed to test the potential differential impact of SNPs on the rates of progression to CD4 < 200 or to AIDS clinical diseases (Table 3). To minimize the number of SNPs to be tested, we only tested the unique SNPs represented in each population, i.e., only one SNP was tested for those in perfect LD (r² = 1). Six and four SNPs were analyzed in AA and EA, respectively. Two SNPs in AA (SNPs 4 and 5) and three SNPs in EA (SNPs 4, 5, and 7) showed significant or near-significant effects (Table 3).

Effect of SNP4 on AIDS Progression
Among AA SCs, the minor allele of SNP4 (the G allele) was associated with accelerated loss of CD4+ T cells (relative hazard [RH] = 3.7, p = 0.003; Table 3). After stratifying by cohort in the Cox regression analysis, this association became slightly stronger (RH = 4.08, 95% CI = 1.71-9.70). Kaplan-Meier survival analysis presented a clear separation of curves stratified for the SNP4 C/C and C/G genotypes (G/G absent) on progression to CD4 < 200 in AA (p = 0.002, log-rank test, Figure 3). Notably, all SNP4 G carriers progressed to CD4 < 200 within 6 y. The effect on AIDS-1987 was in the same direction but short of significance (Table 3). Identical results were seen for SNP3 (unpublished data). Among EA SCs, a trend of accelerated progression to AIDS-1987 was also observed, though short of significance (RH = 1.34, p = 0.07, Table 3).
To affirm the observed genetic influence of SNP4, we performed a categorical analysis of HIV-1-infected people that added long-term surviving seroprevalents (SPs) (Figure 4). SPs who had remained AIDS-free for greater than 7.5 y for AA or 10 y for EA after study enrollment (the median time to AIDS for each) were included in this analysis. The frequency distributions of SNP4 in six discrete time intervals progressing to disease outcome after HIV-1 infection were tested for statistical trend using a Mantel-Haenszel test. As shown in Figure 4, the SNP4 G allele occurred more frequently among fast progressors in both AA and EA. With an increased sample size of 389 patients in AA, a significant trend toward rapid CD4 loss was still preserved (p = 0.04, Figure 4A). Among 970 EA patients, the trend toward rapid progression to AIDS was more pronounced (p = 0.004) (Figure 4B). Thus, the results from SC plus SP groups corroborate the findings from the SC-alone survival analyses, indicating that the accelerating effect of SNP4 is consistent in both AA and EA populations.
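A trend test of this kind can be sketched as follows, assuming it takes the common linear-by-linear form in which the statistic is (N - 1) times the squared correlation between carrier status and an integer score for the ordered time interval, referred to a chi-square distribution with one degree of freedom. The function name, the equally spaced scores, and the counts in the usage note are assumptions for illustration only.

```python
import math


def mh_trend_test(carriers, totals, scores=None):
    """Mantel-Haenszel (linear-by-linear) trend test for a 2 x k table.

    carriers[i]: number of allele carriers in ordered category i
    totals[i]:   number of individuals in ordered category i
    scores:      optional category scores (default 0, 1, ..., k-1)
    Returns (chi-square statistic with 1 df, p-value).
    """
    if scores is None:
        scores = list(range(len(totals)))
    n = sum(totals)
    sum_x = sum(s * t for s, t in zip(scores, totals))
    sum_xx = sum(s * s * t for s, t in zip(scores, totals))
    sum_y = sum(carriers)  # carrier status is coded 0/1, so sum(y) == sum(y^2)
    sum_xy = sum(s * c for s, c in zip(scores, carriers))
    cov = sum_xy - sum_x * sum_y / n
    var_x = sum_xx - sum_x ** 2 / n
    var_y = sum_y - sum_y ** 2 / n
    r_squared = cov ** 2 / (var_x * var_y)
    statistic = (n - 1) * r_squared
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value
```

With hypothetical carrier counts of 30, 20, and 10 out of 100 individuals in each of three successively slower progression categories, the statistic is about 12.46 and the p-value is below 0.001; equal carrier proportions in every category give a statistic of 0.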

PPIA SNP5 and AIDS Progression
In a Swiss Caucasian HIV-1 cohort, SNP5G, referred to as 1650A/G, was reported to be associated with rapid CD4 cell depletion and with a trend toward higher in vitro HIV-1 replication [19]. In our analysis of EA SCs, SNP5G showed a nonsignificant trend toward rapid progression to AIDS-1987 (RH = 1.26, p = 0.08), but had no effect on CD4 cell depletion, while an opposite trend was observed in AA (Table 3). In a categorical analysis of combined samples from SCs and SPs, SNP5G also had no impact on either outcome in either EA or AA (p > 0.23, unpublished data). Therefore, this study offers no convincing evidence for SNP5 association with disease progression in either EA or AA.

Effects of PPIA Haplotypes on AIDS Progression
In the haplotype analysis, we first used the Cox regression model to test the global null hypothesis of no association between haplotype frequency and AIDS progression, by comparing a model with all haplotypes to the covariates-only model. This revealed that the global distributions of haplotype frequencies were significantly or near-significantly different for the outcome of CD4 < 200 in AA (p = 0.01) and AIDS-1987 in EA (p = 0.08). The distortion of frequency distribution was mainly attributable to one haplotype (Hap7) (Table 4). Hap7, the only haplotype carrying SNP4G (Figure 1), was associated with accelerated progression to AIDS in both AA and EA (Table 4), consistent with the results from SNP analysis.

Interactions between TRIM5 and PPIA Variants
Because TRIM5α interferes with and CypA facilitates postentry uncoating of the HIV-1 capsid, it is possible that there are genetic interactions between the two genes. Potential interactions were tested only between variants in the two genes with functional plausibility or high strength of association.
Specifically, PPIA SNP4 was tested for its interaction with TRIM5-rs16934386 and TRIM5-R136Q, which was previously reported using the same patient population to be associated with HIV-1 infection but not progression [22]. In the Cox regression model analysis, we observed no obvious interaction for infection or progression (p > 0.15, unpublished data).

Electrophoretic Mobility-Shift Assay
As SNP3, 4, and 5 reside in the regulatory region of PPIA, we performed gel shift assays to assess whether these SNPs differentially bind to transcription factors (Figure 5). In this experiment, ~25-bp DNA probes containing either of the SNP alleles were incubated with nuclear extracts from Th1 cytokine-stimulated human T lymphocytes. Both SNP3 and SNP4 showed differential binding affinity to nuclear protein(s) with the same mobility (Figure 5A, top band). Averaged over two independent experiments, a 4-fold decrease and a 2.8-fold increase in band density were observed for the minor alleles of SNP3 (G) and SNP4 (G), respectively. The cold unlabeled oligonucleotides with minor alleles were unable to compete with wild-type oligonucleotides, suggesting that the binding is specific. These results indicate that these two SNPs within the promoter alter the binding affinity for certain transcriptional protein(s). In contrast, no obvious differential binding was observed for SNP5 (Figure 5A).
In silico analysis of the sequence at these three SNP sites using TESS software also predicted differential binding for SNP3 and 4, but not for SNP5. SNP3 and 4 are each located in one of the consensus-binding motifs of transcription factor SP1, as shown in Figure 5B. The local sequence with SNP3C is identical to the consensus sequence of the SP1 site, GGGGCC, whereas the SNP3 C>G change perturbs this motif. In contrast, the sequence with SNP4C has two mismatches compared to an alternative consensus sequence of the SP1 site, GAGGCGGGGC, whereas the SNP4 C>G change reduces the mismatches to one, predicting stronger binding. There is, therefore, a plausible biochemical basis for the differential binding affinity observed in the gel shift experiment. These results also suggest that SNP3 and SNP4 affect transcription, likely through their interactions with SP1. The sequence of the region of the human PPIA promoter containing SNP3 and SNP4 was compared to the corresponding sequence from other species (Figure 5C). SNP3 and SNP4 are both located in highly conserved regions. The C allele of SNP3 is present in both primates and rodents. The C allele of SNP4 occurs only in primates and the minor allele G occurs in rodents, suggesting a possible association with speciation. These interspecies data support the hypothesis that SNP3 and SNP4 changes may have functional consequences.

Discussion
The role of CypA in promoting HIV-1 replication has been well established through extensive in vitro experiments [1,2,6,7]. In this study, we have undertaken a systematic investigation of the association between genetic variations in the PPIA gene and susceptibility to HIV-1 infection and disease progression in HIV-1 natural history cohorts. We found that two promoter variants in perfect LD, SNP3 and SNP4, in PPIA, were associated with rapid HIV-1 disease progression. This effect was revealed for CD4 cell depletion or for AIDS progression, reflecting early or late stage of disease progression, in AA and EA, respectively. The racial difference in the strength or timing of the effects may be due to interactions with other sequence variations that are distributed differently in two populations or to immunologic and viral variables. We further found that both SNPs had differential binding efficiency to nuclear proteins in the gel shift experiment. These findings suggest that the functional promoter SNP3/SNP4 influences HIV-1 disease progression.
PPIA in humans encodes a 165-amino acid protein that is highly conserved across species, with 100% and 96% sequence identity to rhesus monkey and mouse, respectively. The nucleotide sequence of PPIA is 98% identical between human and chimpanzee. No variation in the coding sequence of human PPIA was found in this study or in the dbSNP database. This suggests that the differential genetic impact of PPIA SNPs and haplotypes on infection and progression is likely due to regulatory region variation affecting protein levels. The regulation and promoter function of human PPIA are still undefined. PPIA is highly expressed in almost all cell types and is considered a housekeeping gene. Sequence analysis revealed that the PPIA promoter contains a TATA box and two SP1 binding sites and is highly GC-rich (~70%) with overrepresented CpG dinucleotides [24]. In this study, we identified two SNPs, each located in one of two additional SP1 binding elements. SP1 is generally believed to function as a transactivating factor for many housekeeping genes [25]. These two SNP changes, SNP3G and SNP4G, always occurring together through LD, had opposite effects on nuclear protein binding. Assuming the predicted stronger SP1 binding effect of SNP4 is dominant over the weaker SP1 binding of SNP3, higher PPIA gene expression would be predicted. This would be consistent with the more rapid disease course observed in this study.
Based on our analyses of eight SNPs covering the entire gene region of PPIA and their haplotypes, we found that the PPIA disease-modifying effects were most likely afforded by promoter SNP3 or SNP4 and that the associations with the downstream SNPs 7-11 were largely due to tracking of these two SNPs through LD. In a previous report, Bleiber et al. studied the ex vivo HIV-1 infectivity in CD4+ T cells from healthy individuals carrying the SNP4 and SNP5 alleles (named PPIA-1604 and PPIA-1650, respectively, in their report), as well as their impact on the CD4 gradient (from CD4+ T-cell counts of 500/mm³ to 200/mm³) among SPs enrolled in the Swiss HIV cohort, largely comprised of Caucasians [19]. In the ex vivo experiment, both SNPs showed nonsignificant tendencies toward greater HIV-1 replication, with the effects of SNP5 being more pronounced. In the population study, both SNPs were associated with faster CD4+ T-cell depletion in a model also containing two other candidate genes (model 2), with the effect of PPIA SNP5 again being more pronounced. Thus, the detrimental effect of SNP4 observed in these two studies is largely consistent. In contrast, no evidence supporting a role of SNP5 in AIDS progression was found in our study, although the SNP5 G allele was carried significantly less often in HREU than in SC but was similarly distributed in SC and SN. Thus, the role of CypA polymorphism in HIV-1 infection remains inconclusive and warrants further investigation. Taken together, the association and functional results from these two studies point to a role of PPIA genetic variation in HIV-1/AIDS. In future studies, it may be informative to determine the mechanism of association between PPIA variants and host susceptibility to HIV-1 infection or disease progression in view of the interaction between CypA and the capsid domain of the Gag polyprotein.
Population-based genetic association studies provide a powerful approach to uncovering host genes that confer disease susceptibility or resistance. This approach has led to identification of various genetic factors that affect HIV-1/AIDS, and has provided unique insights into the host-HIV-1 interaction [26,27]. The successful identification of true genetic associations requires a large sample size (to minimize the type II error), replication in independent studies (to ward off type I error), and plausible functional evidence (to infer causal relationship). In this study, the plausible genetic-modifying role of PPIA is supported by replication in two independent ethnic groups and the demonstration that SNPs 3 and 4 are each associated with altered binding affinities to transcription factors. Moreover, the previous independent study also provides supportive association and functional evidence [19].
Our comprehensive analysis of nearly all variant alleles and their haplotypes in the PPIA gene is, in essence, equivalent to a gene-based approach [28]. A major problem facing genetic association studies is the difficulty in replication, particularly for single SNP associations. One reason for failure to replicate previous studies is potential differences in allele frequencies and LD structure across populations. Gene-based replication has recently been advocated as a gold standard for replication studies [28]. The entire gene with prior SNP association is considered the functional unit and is examined for association with effectively all genetic variation in the gene. Nonreplication due to population differences may be minimized as local allele frequencies and LD structure from study populations are considered with this approach [28]. Thus, our comprehensive survey of PPIA genetic variation and LD structure may facilitate future comparisons of replication studies using a gene-based approach.
In summary, through genetic epidemiological and functional approaches, we have identified two promoter SNPs in PPIA as potential genetic modifiers of HIV-1 disease progression. Our findings corroborate the notion that genetic variation of PPIA influences AIDS pathogenesis and provide in vivo evidence that CypA is a critical host protein in interaction with HIV-1. Manipulation of PPIA may be considered as a plausible option for anti-HIV-1 therapeutic development as previously explored [29].

Materials and Methods
Study participants. Study participants were enrolled in five United States-based natural history HIV/AIDS cohorts. AIDS Link to the Intravenous Experience (ALIVE) is a community-based cohort of intravenous injection drug users in Baltimore enrolled in 1988-1989 [30], consisting of 92% AA; Multicenter AIDS Cohort Study (MACS) is a longitudinal prospective cohort of men who have sex with men (MSM) from four United States cities: Chicago, Baltimore, Pittsburgh, and Los Angeles, enrolled in 1984-1985 [31], consisting of 83% EA and 10% AA; the San Francisco City Clinic Study (SFCC) is a cohort of MSM originally enrolled in a hepatitis B study in 1978-1980 [32], consisting of 96% EA; Hemophilia Growth and Development Study (HGDS) is a multicenter prospective study that enrolled children with hemophilia who were exposed to HIV-1 through blood products between 1982 and 1983 [33], consisting of 72% EA and 11% AA; the Multicenter Hemophilia Cohort Study (MHCS) is a prospective study that enrolled persons with hemophilia [34], consisting of 90% EA and 6% AA. The participant group is comprised of HIV-1 SCs (infected after study enrollment), SPs (infected at study enrollment), SNs, and HREU. The MACS, MHCS, SFCC, and ALIVE cohorts consist of both SCs and SPs among HIV-1-infected individuals. Due to the potential frailty bias (missing the most rapid progressors to AIDS and death) among SPs, only SCs from these four cohorts were used in the survival analysis. SPs were also included for allele frequency estimation, haplotype inference, and disease categorical analysis. The number of participants studied in each risk or disease category was as follows: SC The date of seroconversion after study enrollment was estimated as the midpoint between the last seronegative and first seropositive HIV-1 antibody test; only individuals with less than 2 y elapsed time between the two tests were included in the seroconverter progression analysis. 
The censoring date was the earliest of the date of the last recorded visit, or December 31, 1995 for the MACS, MHCS, HGDS, and SFCC or July 31, 1997 for the ALIVE cohort to avoid potential confounding by highly effective anti-retroviral therapy (HAART). The censoring date was extended in the ALIVE cohort because of delayed uptake of HAART in this group [30,35].
HIV-1-uninfected individuals were classified into two categories based on their documented exposure levels to HIV-1. HREU individuals were the 80 AA and 145 EA with documented high-risk exposure through sharing of injection equipment [36], anal receptive sex with multiple partners [37], or transfusions with Factor VIII clotting factor prior to 1984, when heat treatment was initiated [38]. SN individuals (n = 420 and 571, respectively, for AA and EA) are those enrolled in the cohorts who remained HIV-seronegative despite ongoing or prior risk activity.
The study protocols were approved by the Institutional Review Boards of participating institutions and informed consent was obtained from all study participants.
Identification of SNPs. Nucleotide polymorphisms were discovered in a panel of 92 EA and 92 AA, representing the extremes of the distribution for rapid and slow progression to AIDS and HREU. A nonisotopic RNA cleavage assay following PCR was employed to screen for polymorphisms [39]. PPIA has a high degree of homology to multiple processed pseudogenes that varies from 75% to 95% in the exon regions. PPIA also contains six copies of Alu repeats, each with a length of approximately 300 bp, located immediately upstream of the putative promoter region and in introns [24]. Sequence comparison and BLAST search were performed to select PPIA-specific PCR primers. Overlapping primers covered nearly the entire PPIA gene region including the putative promoter region, 5′ and 3′ UTRs, all five exons, as well as the Alu repeat regions. A part of intron 1 was not covered due to the high GC content. Primer sequences are presented in Table S1 and are numbered according to the GenBank DNA sequence X52851. Additional intronic SNPs were selected from NCBI dbSNP (http://www.ncbi.nlm.nih.gov/SNP) and HapMap databases (http://www.hapmap.org), by considering location, spacing, and an allele frequency of at least 10%. Haplotype tagging (ht)SNPs were given preference in the SNP selection. The ancestral allele state of the SNPs was based on a reference chimpanzee sequence.
Genotyping of SNPs. Genotyping was performed using PCR-restriction fragment length polymorphism (PCR-RFLP) assays or TaqMan assays. PCR-RFLP was carried out with 35 cycles of denaturing at 94 °C for 30 s, annealing at 60 °C for 30 s, and extension at 72 °C for 45 s. The PCR product was digested with the respective restriction enzymes (New England Biolabs, http://www.neb.com) overnight and then separated on 3% agarose gels. TaqMan assays were obtained from the Assay-by-Demand service of Applied Biosystems (http://www.appliedbiosystems.com). Genotyping primers and conditions are presented in Table S2. Eight water controls were included on each plate to monitor potential PCR contamination, and 10% of SC and HREU samples were genotyped twice. The genotypes obtained were free of water contamination and of inconsistencies between duplicates.
Statistical analysis. To assess the difference in allele frequency distribution in two populations, we performed a contingency χ² test for each marker to test the null hypothesis that the allele frequencies are the same in the two populations. Fst values were estimated by Weir and Cockerham's method [40].
Pairwise LD was quantified using the absolute value of D′. Absolute values of D′ range from 0 for independence to 1 for complete LD between the pairs of loci. LD plots were generated utilizing Haploview (http://www.broad.mit.edu/mpg/haploview) [20]. A triangular matrix of D′ values was used to demonstrate LD patterns within AA and EA. Haplotype blocks were defined either with a default algorithm based on confidence intervals of D′ [41] or with the solid spine of LD method [20], which creates blocks of SNPs that have contiguous pairwise D′ values greater than 0.8. With the latter method, the first and last SNPs in a block are in strong LD with all intermediate SNPs, but the intermediate SNPs are not necessarily in LD with each other [20]. Eight-SNP haplotype frequencies were inferred separately for each population by means of an expectation-maximization algorithm [42].
Association analyses were conducted using the statistical package SAS (version 9.0, SAS Institute, http://www.sas.com). EA and AA groups were analyzed separately because allele and haplotype frequencies were quite different between the two groups. Conformity to the genotype frequencies expected under Hardy-Weinberg equilibrium was examined for each SNP. The genetic effects of SNPs on HIV-1 infection susceptibility were assessed by comparing allelic and genotypic frequencies between HIV-1 HREU and HIV-1 SC participants using the chi-square or Fisher's exact test. Regardless of the exposure route, persons in the HREU or SN groups were at risk of HIV-1 infection based on their inclusion in an HIV-1 risk group; therefore, we combined participants across cohorts to achieve reasonable statistical power.
Kaplan-Meier survival statistics and the Cox proportional hazards model (Cox model) were used to assess the effects of SNPs and haplotypes on the rate of progression to AIDS. Two separate endpoints reflecting advancing AIDS pathogenesis were considered for SCs: (1) [43]. The significance of genotypic associations and RH was determined by unadjusted and adjusted Cox model regression analyses. For each SNP, we compared the minor allele genotypes to the most common genotype as a reference group. All p-values were two-tailed. Genetic factors previously shown to affect progression to AIDS were predetermined to be included as confounding covariates in the Cox model analysis: CCR5 Δ32, CCR2-64I, CCR5-P1, HLA-B*27, HLA-B*57, HLA-B*35Px group (including HLA-B*3502, B*3503, B*3504, and B*5301), and HLA Class I homozygosity for EA (reviewed in [26,27]); HLA-B*57 and HLA Class I homozygosity for AA. CCR2-64I, HLA-B*27, and HLA-B*35Px were not considered as covariates in AA due to no or weak effects in the AA participants, and CCR5 Δ32 was not considered due to its rarity in AA. Analyses were stratified by sex and by age at seroconversion: 0-20, >20-40, and >40 y [27]. Further stratification by cohort was also performed for the exploratory analyses. In the stratified Cox regression model, the overall log-likelihood is the sum over strata of the stratum-specific log partial likelihoods. As the same criteria for determining startpoints (seroconversion date) and endpoints, and similar sampling strategies and follow-up settings, were used across cohorts, we combined SCs from all cohorts for the survival analysis to increase the power. Although these cohorts differ in routes of HIV-1 transmission, no appreciable effect of mode of infection on AIDS progression has been found through re-analysis of more than ten thousand SCs from 38 studies around the world (including three used in this study) [44].
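The way stratification enters the Cox model can be made concrete with a minimal sketch for a single covariate: each stratum contributes its own log partial likelihood, with risk sets formed only within that stratum, and the contributions are summed for a shared coefficient. The sketch assumes Breslow-style handling of tied event times; the function names and data layout are hypothetical, and the study itself used SAS.

```python
import math


def cox_partial_loglik(beta, times, events, x):
    """Log partial likelihood of a one-covariate Cox model (Breslow ties).

    times:  follow-up time for each individual
    events: 1 if the endpoint was observed, 0 if censored
    x:      covariate value, e.g. 1 for carriers of a minor allele and 0 otherwise
    """
    loglik = 0.0
    for i in range(len(times)):
        if not events[i]:
            continue
        # Risk set: everyone still under observation at this event time.
        risk = sum(math.exp(beta * x[j])
                   for j in range(len(times)) if times[j] >= times[i])
        loglik += beta * x[i] - math.log(risk)
    return loglik


def stratified_partial_loglik(beta, strata):
    """Sum of stratum-specific log partial likelihoods for a shared beta.

    strata: iterable of (times, events, x) tuples, one per stratum
    (for example, one per combination of sex and age group).
    """
    return sum(cox_partial_loglik(beta, t, e, x) for t, e, x in strata)
```

Maximizing `stratified_partial_loglik` over `beta` would give the log relative hazard; the baseline hazard is allowed to differ between strata because individuals are only ever compared within their own stratum.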
To test the association of PPIA haplotypes and HIV-1 disease progression, we first performed a global test in the Cox regression model for each of two disease outcomes separately for AA and EA. The global null hypothesis is that the odds ratios of all haplotypes are equal between cases and controls. Likelihood ratio tests were used to compare a full model with all haplotypes and a base model with only covariates. When the significance of the global test exceeded a relaxed nominal level, p < 0.10, the associations of individual haplotypes were further tested.
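The global likelihood ratio comparison can be sketched as follows, assuming the usual construction in which twice the difference in maximized log-likelihoods is referred to a chi-square distribution whose degrees of freedom equal the number of haplotype terms added to the base model. The function names and the log-likelihood values in the usage note are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: finite series times exp(-x/2).
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus a finite series.
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def likelihood_ratio_test(loglik_base, loglik_full, df):
    """Compare nested models; df is the number of extra parameters in the full model."""
    statistic = 2.0 * (loglik_full - loglik_base)
    return statistic, chi2_sf(statistic, df)
```

For instance, with hypothetical log-likelihoods of -250.0 for the covariates-only model and -243.0 for a model adding five haplotype terms, the statistic is 14.0 and the p-value is about 0.016, which would pass the relaxed p < 0.10 screen described above.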
To assess the level of correction factor for the number of SNPs, assuming this was a discovery study, we applied spectral decomposition analysis [45]. This multiple testing correction method assesses the equivalent level of independent SNPs taking account of the extensive LD across PPIA. Based on spectral decomposition analysis of SNPs in this study, a corrected p-value of 0.01 would be equivalent to p = 0.05. As this study was a confirmation and extension study of markers with previous positive association, uncorrected p-values were reported.
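A rough sketch of how such a correction can work is shown below. It assumes the spectral decomposition method takes the widely used form in which the effective number of independent SNPs is derived from the variance of the eigenvalues of the pairwise SNP correlation matrix, followed by a Sidak-type adjustment; whether this matches the exact procedure of [45] is an assumption. Because the sum of squared eigenvalues of a symmetric matrix equals the sum of its squared entries, no explicit eigendecomposition is needed. The function names and the value of five effective tests in the usage note are hypothetical.

```python
def effective_number_of_snps(corr):
    """Effective number of independent SNPs from a pairwise correlation matrix.

    corr: square symmetric matrix (list of lists) of pairwise correlations r
    between SNP genotypes, with ones on the diagonal.
    """
    m = len(corr)
    # Eigenvalues of a correlation matrix sum to m (mean 1), and the sum of
    # their squares equals the sum of squared matrix entries.
    sum_sq = sum(corr[i][j] ** 2 for i in range(m) for j in range(m))
    var_eigen = (sum_sq - m) / (m - 1)  # sample variance of the eigenvalues
    return 1 + (m - 1) * (1 - var_eigen / m)


def sidak_threshold(alpha, m_eff):
    """Per-test significance threshold that keeps the overall level at alpha."""
    return 1 - (1 - alpha) ** (1 / m_eff)
```

Fully independent SNPs give an effective number equal to the number of SNPs, and SNPs in perfect LD give an effective number of 1. As an illustration only, about five effective tests would give a per-test threshold of roughly 0.0102 at an overall level of 0.05, which is of the same order as the 0.01 threshold stated above.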
Funding. This project has been funded in whole or in part with federal funds from the National Cancer Institute, National Institutes of Health, under contract N01-CO-12400. This research was supported in part by the Intramural Research Program of the National Institutes of Health, National Cancer Institute, Center for Cancer Research, and National Human Genome Research Institute.
The National Institute on Drug Abuse, National Institutes of Health, under grant DA-04334 provided funding for the ALIVE cohort. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the US Government.
Competing interests. The authors have declared that no competing interests exist.