Figure 1.
Definitions of HIV-1 clade B nonamer sequence motifs.
The different sequence motifs of the aligned HIV-1 clade B isolates were identified as illustrated here with 20 sequences at a model nonamer position. The “Index” nonamer is the most prevalent sequence, present in 8 of the 20 isolates. The “Major” variant is the most common variant of the index (5/20). “Minor” variants are the other repeated sequences, each with an incidence less than, or occasionally equal to, that of the major variant. “Unique” variants are those represented by a single aligned sequence. “Nonatypes” (boxed) are the distinct variant sequences at a given nonamer position; in this example, one major, two minor, and three unique.
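The motif definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function name `classify_motifs`, the placeholder sequences, and the rule that a variant must occur more than once to count as major or minor are all assumptions made for the example.

```python
from collections import Counter

def classify_motifs(nonamers):
    """Group the sequences at one aligned nonamer position into diversity motifs.

    Assumption: a variant seen once is "unique"; among repeated variants,
    the most frequent is "major" and the rest are "minor".
    """
    ranked = Counter(nonamers).most_common()   # highest count first
    index = ranked[0]                          # most prevalent sequence
    variants = ranked[1:]                      # every distinct non-index sequence
    repeated = [(seq, n) for seq, n in variants if n > 1]
    unique = [(seq, n) for seq, n in variants if n == 1]
    return {
        "index": index,
        "major": repeated[0] if repeated else None,
        "minor": repeated[1:],
        "unique": unique,
        "nonatypes": len(variants),
    }

# Arbitrary placeholder nonamers reproducing the counts of the model position:
# index 8/20, major 5/20, two minor variants (2/20 each), three unique variants.
model_position = (
    ["ACDEFGHIK"] * 8
    + ["ACDEFGHIR"] * 5
    + ["ACDEYGHIK"] * 2
    + ["SCDEFGHIK"] * 2
    + ["ACDQFGHIK", "ACDEFGHLK", "TCDEFGHIR"]
)
motifs = classify_motifs(model_position)
```

With these placeholder sequences, `motifs["nonatypes"]` counts six distinct variant sequences, matching the one major, two minor, and three unique nonatypes of the model position.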
Figure 2.
HIV-1 clade B proteome nonamer sequence entropy and total variants.
A. Entropy (black) and incidence of the total variants of the index (red) were measured for each aligned nonamer (nine amino acids) position (1–9, 2–10, etc.) of the proteins. The entropy values indicate the level of variability at the corresponding nonamer positions, from zero at completely conserved sites (0% total variants incidence) to a maximum of about 9 at the most variable sites (∼98% total variants incidence). B. Relationship between entropy and the incidence of total variants for the proteome nonamer positions.
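As a rough illustration of the two quantities plotted here, the sketch below computes Shannon entropy and total variants incidence for the sequences at one nonamer position. The legend does not state the logarithm base or any sample-size correction, so base 2 and no correction are assumptions, as are the function names and the toy data.

```python
import math
from collections import Counter

def nonamer_entropy(nonamers):
    """Shannon entropy (base 2, assumed) of the nonamer sequences at one position."""
    total = len(nonamers)
    counts = Counter(nonamers)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())

def total_variants_incidence(nonamers):
    """Percentage of sequences that differ from the index (most prevalent) nonamer."""
    index_count = Counter(nonamers).most_common(1)[0][1]
    return 100.0 * (len(nonamers) - index_count) / len(nonamers)

# Toy positions (placeholder sequences, not data from the study).
conserved = ["ACDEFGHIK"] * 10
split = ["ACDEFGHIK"] * 5 + ["ACDEFGHIR"] * 5

conserved_entropy = nonamer_entropy(conserved)
split_entropy = nonamer_entropy(split)
conserved_variants = total_variants_incidence(conserved)
split_variants = total_variants_incidence(split)
```

A completely conserved position gives an entropy of zero and 0% total variants, while a position split evenly between two sequences gives 1 bit and 50% total variants.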
Table 1.
A sample of the quantitative analysis of HIV-1 clade B Env protein diversity∧.
Figure 3.
Distribution of conservation level of nonamer positions.
The nonamer positions of the proteome and the individual proteins were defined as highly conserved (black, index incidence ≥90%), mixed-variable (white, index incidence <90% and >20%), and highly diverse (grey, index incidence ≤20%). Highly conserved positions made up only 9% of the proteome nonamer positions, ranging from 0% for Vpr and Tat to 19% for Pol. Highly diverse positions comprised 14% of the proteome, ranging from 1% for Pol to 27% each for Env and Nef. Mixed-variable positions comprised 77% of the proteome, ranging from 69% for Env to 88% for Vpr.
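The three conservation classes follow directly from the index incidence thresholds given above. A minimal sketch, with a hypothetical function name and incidence expressed in percent:

```python
def conservation_level(index_incidence):
    """Classify a nonamer position by the incidence (%) of its index sequence."""
    if index_incidence >= 90:
        return "highly conserved"
    if index_incidence <= 20:
        return "highly diverse"
    return "mixed-variable"   # index incidence <90% and >20%

# Illustrative incidences only; these are not values from the study.
levels = [conservation_level(x) for x in (95, 90, 41, 20, 5)]
```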
Figure 4.
Dynamics of diversity motifs of HIV-1 clade B proteome.
A. Motif incidence in relation to total variants incidence: index sequence (orange), total variants (black), major variant (blue), minor variants (pink), unique variants (green), and nonatypes (yellow). B. Violin plot of the frequency distribution of the indicated proteome sequence motifs. The width of the plot (x-axis) represents the frequency distribution of a given incidence of the indicated motif. “x” represents the arithmetic mean incidence value.
Figure 5.
Dynamics of diversity motifs of HIV-1 clade B proteins.
The color key for each sequence motif is described in Figure 4.
Figure 6.
Frequency distribution violin plots of the diversity motif incidences of HIV-1 clade B proteins.
The legend for the violin plot is described in Figure 4.
Figure 7.
Comparison of HIV-1 clade B Nef concatenated index sequence with the HXB2 and C1P sequences.
The numbers before and after the concatenated index sequence represent amino acid positions of the comparison; the comparison is shown in blocks of 60 amino acids. Identity of the index amino acids with those of HXB2 (green) and/or C1P (blue) is represented by “.”; residues that differ are shown as the respective amino acid. Amino acid mutations of the aligned viruses that did not follow the concatenation of the index are shown in red. At these positions, the corresponding amino acids of the HXB2 and C1P sequences are shown in full, without “.” for identical residues. The green dashes represent amino acid deletions in HXB2.
Figure 8.
A. The Gag protein alignment region of amino acid (aa) positions 310 to 326 is shown for the first and last five sequences of the aligned viruses. The aa position 318 (coloured) is involved in index switching and contains two aa: the prevalent E, observed in 59% of the sequences, and its variant D, observed in 41%. B. The nine aligned overlapping nonamer positions (310–318, 311–319, etc.) represent the sliding windows of the alignment region in A. Each nonamer position is shown with the index sequence, two of the variant nonatype sequences, and the total number of remaining variant nonatypes with an incidence equal to or below 10%. The first nonamer position, 310 to 318, is shown with the index sequence (41%) containing aa E at the 9th position. The major variant (38%) contains the variant aa D at the corresponding position. A minor variant (11%) has a mutation at aa position 7 relative to the index. A total of 37 nonatypes had individual incidences less than or equal to 10%. The dominance of aa E in the index is maintained for the next few nonamer positions. At position 314 to 322, index switching is observed: the sequence with aa D is now the index (41%) and the one with E is the major variant (35%). This relationship continues until position 316–324, and the sequence ranks revert to the original state at position 317–325, where the sequence with aa E is again the index. At an index switching position, the index differs from the one expected on the basis of the preceding position. C. Concatenated index, formed by linking the overlapping index sequences of the nine nonamer positions. At position 318 (coloured), the aa D, which did not follow the concatenation of the index, is shown below the sequence.
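The sliding-window and index-switching behaviour described here can be reproduced on a toy alignment. The sketch below is illustrative only: it uses a window of 3 instead of 9 to keep the data small, the sequences are invented, and the concatenation rule (a per-column majority vote among the overlapping index sequences, with disagreeing columns flagged) is an assumption, since the legend does not specify how overlapping indexes are linked.

```python
from collections import Counter

def sliding_nonamers(aligned_seqs, k=9):
    """Return, for each window start, the k-mers of all aligned sequences."""
    length = len(aligned_seqs[0])
    return [[s[i:i + k] for s in aligned_seqs] for i in range(length - k + 1)]

def concatenate_index(aligned_seqs, k=9):
    """Link overlapping index sequences (assumed rule: per-column majority vote).

    Returns the index of each window, the concatenated sequence, and the
    columns where overlapping indexes disagree (index switching).
    """
    windows = sliding_nonamers(aligned_seqs, k)
    indexes = [Counter(w).most_common(1)[0][0] for w in windows]
    votes = [Counter() for _ in range(len(aligned_seqs[0]))]
    for start, index_seq in enumerate(indexes):
        for offset, aa in enumerate(index_seq):
            votes[start + offset][aa] += 1
    concatenated = "".join(v.most_common(1)[0][0] for v in votes)
    conflicts = {col: sorted(v) for col, v in enumerate(votes) if len(v) > 1}
    return indexes, concatenated, conflicts

# Toy alignment: column 2 (0-based) carries E in 6 sequences and D in 4.
toy_alignment = (
    ["AAEAA"] * 2 + ["AGEAA"] * 2 + ["AAEGA"] * 2
    + ["AADAA", "AADAG", "GADAA", "GADAG"]
)
indexes, concatenated, conflicts = concatenate_index(toy_alignment, k=3)
```

In this toy case the middle window has a D-containing index even though E is the more prevalent residue in that column, so the flagged column plays the role of position 318 in the figure.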