Fig 1.
Mean average pairwise Hamming distance (APHD) of HIV-1 Env SGA/S sequences distinguishes between single and multiple founder viruses.
(A) A training set of SGA/S Env sequences derived from 127 previously published subjects with acute HIV-1 infection, illustrating a wide range of env diversity. The APHD is calculated using a sliding window of 120 bp with a step size of 21 bp. The mean APHD is plotted according to Fiebig stage, as defined by HIV-1 clinical laboratory test results. (B) A classifier based on logistic regression segregated the 127 subjects into single or multiple infections and correctly assigned 97% of subjects to their respective groups. Each point corresponds to an individual subject, with the number of subjects denoted on the x-axis in parentheses under each Fiebig stage.
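The windowed APHD calculation described in (A) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: it assumes the sequences are already aligned to a common length, counts every mismatching column (gaps included) as one difference, and reports raw distances per window without normalization. The function names are hypothetical; only the 120 bp window and 21 bp step come from the legend.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def aphd(seqs):
    """Average pairwise Hamming distance over all unordered sequence pairs."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def windowed_aphd(seqs, window=120, step=21):
    """APHD within each sliding window along the alignment.

    Window and step defaults follow the legend; treating the alignment
    as a set of full-length sequences is an assumption of this sketch.
    """
    length = len(seqs[0])
    return [
        aphd([s[start:start + window] for s in seqs])
        for start in range(0, length - window + 1, step)
    ]


def mean_aphd(seqs, window=120, step=21):
    """Mean of the per-window APHD values (0.0 if no full window fits)."""
    values = windowed_aphd(seqs, window, step)
    return sum(values) / len(values) if values else 0.0
```

A per-subject value from `mean_aphd` is the kind of single summary statistic that a logistic regression classifier, as in (B), could take as input.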
Fig 2.
Subject 571373 exhibits low env diversity by both 454 and SGA/S reflective of a single founder virus.
454 and SGA/S analysis of 3′ half sequences from subject 571373 (Fiebig stage II/III). (A) Heat map illustrating a small number of sites exhibiting low-level amino acid sequence diversity across the 3′ half of the HIV-1 genome as detected by 454 deep sequencing. Plotted is the percentage of amino acid diversity at each position, with the first amino acid of Vif located in the top left corner of the grid and the last amino acid of Nef located in the bottom right corner. Completely conserved residues are black and low-level variant residues (<10%) are dark blue. (B) The average pairwise Hamming distance calculated from 454 sequencing reads for the 3′ half of the genome is plotted, with the APHD of 0.063 (red line) and standard deviation (dotted black line) shown. The plot shows a relatively uniform population with random sites throughout the genome exhibiting low-level diversity. (C) SGA/Ss from subject 571373 covering the 3′ half of the HIV-1 genome display limited structure on a neighbor-joining (NJ) phylogenetic tree (left) and few nucleotide changes from the intrasubject consensus. The highlighter plots (right) compare sequences for each subject’s sequence set to an intrasubject consensus (uppermost sequence) and illustrate the pattern of nucleotide base mutations within sequences using short color-coded bars. (D) Hamming distance analysis of SGA/Ss showing infection by a single virus, with Hamming distance frequencies conforming precisely to model predictions of a single virus infection (red line). As further support for a single founder virus, the estimated time to a single most recent common ancestor (MRCA) of 36 days (23–49 days) overlapped with the estimated clinical duration of infection based on Fiebig stages (18–37 days).
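The comparison in (D) between observed Hamming distance frequencies and a single-founder expectation can be illustrated with a short sketch. The legend does not specify the model, so this example assumes a Poisson distribution with the same mean as the observed pairwise distances, a common simplification for a recently founded, homogeneous population. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pairwise_hamming(seqs):
    """Hamming distance for every unordered pair of aligned sequences."""
    return [sum(x != y for x, y in zip(a, b)) for a, b in combinations(seqs, 2)]


def observed_frequencies(distances):
    """Fraction of sequence pairs observed at each Hamming distance."""
    counts = Counter(distances)
    total = len(distances)
    return {d: counts[d] / total for d in sorted(counts)}


def poisson_pmf(k, lam):
    """Poisson probability of observing exactly k events at rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def compare_to_poisson(seqs):
    """Map each distance to (observed frequency, Poisson expectation).

    ASSUMPTION: the Poisson rate is set to the mean observed distance;
    the model used for the figure may differ.
    """
    distances = pairwise_hamming(seqs)
    lam = sum(distances) / len(distances)
    observed = observed_frequencies(distances)
    return {
        d: (observed.get(d, 0.0), poisson_pmf(d, lam))
        for d in range(max(distances) + 1)
    }
```

Under this assumption, a single-founder sample would show observed frequencies tracking the Poisson column closely, whereas a multi-founder sample would show additional modes at large distances.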
Fig 3.
Subject 654207 exhibits high env diversity by both 454 and SGA/S reflective of multiple founder viruses.
(A) Heat maps illustrating a number of sites exhibiting amino acid sequence diversity across the 3′ half of the genome as detected by 454 deep sequencing. Plotted is the percentage of amino acid diversity at each position, with the first amino acid of Vif located in the top left corner of the grid and the last amino acid of Nef located in the bottom right corner. Completely conserved residues are black, low-level variant residues (<10%) are dark blue, moderately variable residues (20%) are sky blue and highly variant residues (>40%) are green. (B) The average pairwise Hamming distance calculated from 454 sequencing reads for the 3′ half of the genome is plotted, with the APHD of 0.752 (red line) and standard deviation (dotted black line) shown. The plot shows a variable population with a high number of sites throughout the genome exhibiting high-level diversity. (C) SGA/Ss from subject 654207 covering the 3′ half of the HIV-1 genome display a phylogeny (left) revealing productive infection by at least four viruses with inter-lineage recombination. Founder virus lineages are color-coded and labeled variant 1–4. Recombinant sequences are shown by green symbols. The highlighter plots (right) compare sequences for each subject’s sequence set to an intrasubject consensus (uppermost sequence) and illustrate the pattern of nucleotide base mutations within sequences using short color-coded bars. (D) Hamming distance analysis of SGA/Ss showing infection by multiple viruses, with Hamming distance frequencies (mean Hamming distance of 35.38) not conforming to model predictions of a single virus infection. Splitting the variants into their respective sub-lineages, such as variants 1 and 4, yields Hamming distance frequencies that do conform to model predictions of a single virus infection (red line). Subject 654207 is viral RNA positive but Western blot negative (stage II/III of infection).
Fig 4.
Complexity of acute HIV-1 infection revealed using deep sequencing data and the APHD approach.
(A) Mean APHD of 74 newly deep-sequenced subjects with acute HIV-1 infection, illustrating a wide range of env diversity plotted according to Fiebig stage. Black circles depict the 6 samples in which SGA/S was also performed. (B) Classification of the 74 subjects into single vs. multiple founder viruses resulted in 63 subjects exhibiting a more homogeneous infection, suggestive of productive clinical infection originating from a single virus, and 11 subjects exhibiting distinctly higher diversity, indicative of heterogeneous infection by multiple founder viruses. Each point corresponds to an individual subject, with the number of subjects denoted on the x-axis in parentheses under each Fiebig stage.
Table 1.
Multiplicity of HIV-1 infection in HSX, MSM and IDU subjects.
Fig 5.
Selection bias intensified for HSX founder viruses compared to MSM founder viruses.
The transmission index of a sequence was calculated using logistic regression with model weights taken from [12]. Black lines represent the median transmission index for the two risk groups. The overall transmission index of HSX (red circles) founder viruses is significantly higher than that of MSM (blue circles) founder viruses (P = 0.00003, Mann-Whitney two-tailed test). The number of subjects in each category is denoted under each group.
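As a rough illustration of how a logistic regression score of this kind is evaluated, the sketch below applies the logistic function to a weighted sum of sequence features. The feature names, weights and intercept are hypothetical placeholders: the actual weights come from [12] and are not reproduced in this legend, and the published index may be reported on the log-odds scale instead of the probability scale.

```python
import math


def transmission_index(features, weights, intercept=0.0):
    """Logistic regression score for one sequence.

    features: mapping of feature name -> numeric value for the sequence.
    weights:  mapping of feature name -> model weight (HYPOTHETICAL here;
              the real values would be taken from the cited model).
    Returns a value in (0, 1).
    """
    score = intercept + sum(weights[name] * value for name, value in features.items())
    # Numerically stable logistic function for large positive or negative scores.
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)
    return e / (1.0 + e)
```

For example, with a hypothetical weight of 2.0 on a single binary feature that is present, `transmission_index({"site_x": 1}, {"site_x": 2.0})` gives a score above 0.5, while a sequence with no features scores exactly 0.5 when the intercept is zero.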
Table 2.
Previously described signature sites enriched in HSX founder viruses.
Table 3.
Signature sites identified between MSM and HSX founder viruses in Env using a phylogenetically corrected method.
Fig 6.
Mapping of signature sites on the three-dimensional structure of gp120 shows clustering around the CD4-binding site.
A ribbon representation of the crystal structure of the JRFL gp120 molecule (grey) bound to the CD4 molecule (green) (PDB ID: 2B4C). The CD4 binding site is highlighted in transparent green, while signature sites 283, 343, 362, 389, 429, 465 and 471 are depicted as red space-filling residues.