
Fig 1.

Experimentally identified biochemical domains of the viral proteins under study.

Only reasonably large domains (of 8 or more residues) are presented here; for an extended list, see S1 File. The first and last positions in the sequence are shown in bold for each protein.


Fig 2.

Inferring co-evolutionary structure using the RoCA method.

Illustration of the RoCA algorithm for a simple toy model involving two non-overlapping sectors of co-evolving residues. (A) Data pre-processing, which involves computing the mutational Pearson correlation matrix from a multiple sequence alignment. (B) (Top panel) A spectral analysis of the correlation matrix is performed to distinguish true correlations, encoded in the dominant spectral modes (shown here in red and blue), from those that appear to reflect statistical noise. The observed eigenvalue spectrum is reminiscent of that generally observed in spiked correlation models [18]: a bulk of small eigenvalues representing largely statistical noise, and a few large eigenvalues (referred to as spikes) representing the true underlying correlations. (Bottom panel) The dominant PCs are estimated using the proposed robust method in order to identify the co-evolutionary structure. This involves a data-driven thresholding step based on random matrix theory that separates the set of all correlated residues (those present in either sector) from statistical noise, followed by an iterative procedure that determines, from this set, the correlated residues associated with each PC. Based on the resulting PCs, the groups of co-evolving residues (sectors) are accurately identified. Note that, although the toy model is constructed with contiguous groups, sectors are not necessarily contiguous in the primary sequence. (C) Sectors, inferred using the robustly estimated PCs, are generally closely placed in the 3D structure.
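The pre-processing and spectral steps in panels (A) and (B, top) can be illustrated with a minimal sketch. This is not the RoCA implementation: it assumes a binary "mutant vs. consensus" encoding of the alignment and uses the standard Marchenko-Pastur upper edge as a simple stand-in for the random-matrix-theory threshold, whereas the robust PC estimation itself is described in Materials and methods. The function names `mutation_correlation` and `spiked_modes` and the toy alignment are hypothetical.

```python
import numpy as np

def mutation_correlation(msa):
    """Pearson correlation of mutation indicators from an alignment.

    msa: list of equal-length strings. Assumption: a residue counts as
    "mutated" when it differs from the column consensus (most common symbol).
    """
    arr = np.array([list(seq) for seq in msa])
    binary = np.zeros(arr.shape, dtype=float)
    for j in range(arr.shape[1]):
        symbols, counts = np.unique(arr[:, j], return_counts=True)
        consensus = symbols[np.argmax(counts)]
        binary[:, j] = arr[:, j] != consensus
    # Fully conserved columns have zero variance and are dropped.
    keep = binary.std(axis=0) > 0
    corr = np.atleast_2d(np.corrcoef(binary[:, keep], rowvar=False))
    return keep, corr

def spiked_modes(corr, n_sequences):
    """Split the spectrum into a noise bulk and candidate spikes.

    Assumption: eigenvalues above the Marchenko-Pastur upper edge
    (1 + sqrt(n_residues / n_sequences))**2 are treated as spikes.
    """
    n_residues = corr.shape[0]
    mp_edge = (1.0 + np.sqrt(n_residues / n_sequences)) ** 2
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]          # sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    n_spikes = int(np.sum(eigvals > mp_edge))
    return eigvals, eigvecs[:, :n_spikes], mp_edge

# Toy alignment: residues 0-4 and 5-9 form two co-evolving groups,
# residues 10-19 mutate independently ('A' = consensus, 'G' = mutant).
rng = np.random.default_rng(0)
n_seq, n_res = 500, 20
latent = rng.random((n_seq, 2)) < 0.3          # one hidden driver per group
mutated = rng.random((n_seq, n_res)) < 0.1     # independent background
for group, cols in enumerate([range(0, 5), range(5, 10)]):
    for j in cols:
        flip = rng.random(n_seq) < 0.05        # small per-residue noise
        mutated[:, j] = latent[:, group] ^ flip
toy_msa = ["".join("G" if m else "A" for m in row) for row in mutated]

keep, corr = mutation_correlation(toy_msa)
eigvals, top_pcs, mp_edge = spiked_modes(corr, n_seq)
```

In this toy setting the two leading eigenvectors concentrate their weight on residues 0-9, which mirrors the red and blue modes in the figure; real alignments are noisier, which is the motivation for the robust estimation step.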


Fig 3.

Co-evolutionary sectors revealed by RoCA for HIV Gag.

(A) Biplots of the robustly estimated PCs that are used to form RoCA sectors. The sector residues are represented by circles according to the specified color scheme, while overlapping residues (belonging to more than one sector) and independent (non-sector) residues are represented as gray and white circles, respectively. The heat map of the cleaned correlation matrix (Materials and methods), with rows and columns ordered according to the residues in the RoCA sectors, shows that the sectors are notably sparse and uncorrelated with each other. (B) Location of RoCA sector residues in the primary structure of HIV Gag. The sector residues are colored according to the specifications in (A), while the remaining residues are shown in gray. The vertical axis shows the negative log-frequency of mutation for each residue i. (C) Statistical independence of sectors, assessed using the normalized entropy deviation (NED) metric. The NED is a non-negative measure that is zero if two sectors are independent and takes larger values as the sectors become more dependent (see Materials and methods for details). For each possible pair of sectors, the inter-sector NED is very small and generally close to the randomized case, while being substantially lower than the maximum intra-sector NED of any individual sector in the considered pair, reflecting that the sectors are nearly independent. Corresponding results for the other three proteins are presented in S1 Fig.


Fig 4.

Individual associations of RoCA sectors with the biochemical domains of the studied viral proteins.

The sectors are colored according to the scheme in Fig 3A. The area of each bubble reflects the statistical significance of the associated result, measured as −1/log10 P, where P is the P-value computed using Fisher’s exact test. The black circles indicate the conventional threshold of statistical significance, P = 0.05; any P-value lower than that (bubble inside the black circle) is considered statistically significant. The star symbols indicate those RoCA sectors with unknown biochemical significance. Note that the involved structural interfaces were defined based on a contact distance of less than d = 7 Å between the alpha-carbon atoms. Qualitatively similar results are obtained for d = 8 Å or d = 9 Å (S2 Fig).
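The bubble encoding can be made concrete with a small sketch. The caption does not state whether the Fisher's exact test is one- or two-sided, so the one-sided (enrichment) version below is an assumption, as are the function names `fisher_enrichment_p` and `bubble_area` and the residue counts in the example.

```python
from math import comb, log10

def fisher_enrichment_p(n_total, n_sector, n_domain, n_overlap):
    """One-sided Fisher's exact test for sector/domain overlap.

    Returns P(X >= n_overlap), where X is the number of domain residues
    found in a random sector of the same size (hypergeometric tail).
    Assumption: one-sided enrichment; the source does not specify this.
    """
    max_overlap = min(n_sector, n_domain)
    denom = comb(n_total, n_sector)
    tail = sum(
        comb(n_domain, k) * comb(n_total - n_domain, n_sector - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / denom

def bubble_area(p_value):
    """Bubble area as defined in the caption: -1 / log10(P), for 0 < P < 1."""
    return -1.0 / log10(p_value)

# Hypothetical counts: a 500-residue protein, a 30-residue sector,
# a 20-residue domain, and 8 residues shared between them.
p = fisher_enrichment_p(n_total=500, n_sector=30, n_domain=20, n_overlap=8)
threshold_area = bubble_area(0.05)             # area of the black circle
is_significant = bubble_area(p) < threshold_area
```

Because −1/log10 P decreases as P decreases, smaller bubbles correspond to stronger associations, which is why significant results fall inside the black circle.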


Fig 5.

Sectors revealed by the PCA-based method [12].

(A) Bar plots showing the merging of multiple RoCA sectors into the sectors revealed by the method in [12] (only the first four are shown) for HIV Gag. The vertical axis of each plot shows the percentage of residues within the different RoCA sectors that fall into the prescribed sector. (B) Biplots of the HIV Gag PCs which are post-processed to form sectors in [12]. The sector residues are represented by circles according to the specified color scheme, while independent (non-sector) residues are represented as white circles. The PCs can be seen to be severely affected by statistical noise. Note the substantial overlap in the support (relevant entries) of the PCs; however, such overlap is not present in the formed sectors, as the method in [12] applies heuristic post-processing steps to enforce disjoint sectors. Corresponding results for the other three proteins are presented in S4 Fig. (C) Individual associations of sectors produced by the PCA-based method [12] with the biochemical domains of the studied viral proteins; compare with Fig 4. Only the sectors having a statistically significant association with any biochemical domain are presented. The sectors are colored according to the scheme in Fig 3A. The area of each bubble reflects the statistical significance of the associated result, measured as −1/log10 P, and the black circles indicate the conventional threshold of statistical significance, P = 0.05; any P-value lower than that is considered statistically significant. The P-values associated with non-significant associations (P > 0.05) are displayed inside the black circle. Similar results were obtained for the HIV Gag sectors previously reported using the related PCA method in [11] (S6 Fig).


Fig 6.

Details of the different biochemical domains of viral proteins associated with the respective inferred RoCA sectors.

Sectors are shown as diagonal blocks in the heat map of cleaned correlations with rows and columns re-ordered accordingly (Materials and methods); the heat map is restricted to sector residues only. For each sector, the crystal structures of the associated domains are depicted, when available. The protein chains are shown in gray (in the case of dimer structures, chains A and B are depicted in gray and cyan, respectively), the relevant domain residues present in the sector are represented as red spheres, and the remaining domain residues are shown as blue spheres. (A) For HIV Gag, sector 1 residues are associated with the membrane-binding domain of p17 (PDB ID 2LYA); sector 2 residues with the p24-SP1 interface (PDB ID 1U57); sector 3 residues with the intra-hexamer and intra-pentamer interfaces of p24 (PDB IDs 3GV2 and 3P05, respectively); sector 5 residues with the two zinc-finger structures of p7 (PDB ID 1MFS); and sector 6 residues with the inter-hexamer interface of p24 (PDB ID 2KOD). No crystal structure is available for the SP2-p6 interface residues that comprise sector 4. (B) For HIV Nef, relevant residues of the biochemical domains are shown on the dimer crystal structure (PDB ID 4U5W). Note that this crystal structure only includes residues 68-204 of Nef. Sector 1 residues are associated with the viral infectivity enhancement function; sector 3 residues with the HLA1 down-regulation function (note that four residues (62-65) of this biochemical domain cannot be shown in this crystal structure); and sector 4 residues with both the CD4 down-regulation function and the intra-dimer interface (important for Nef dimerization). (C) For HCV NS3-4A, sector 1 residues are associated with the interface between the NS3 and NS4A proteins (PDB ID 4B6E; this crystal structure only includes NS4A residues 21-36), which is important for (i) membrane association and assembly of a functional HCV replication complex, (ii) activation of the NS3 protease function, and (iii) NS5A hyper-phosphorylation; sector 3 residues with the motif critical for the enzymatic and helicase activities of NS3 (no blue sphere is visible as sector 3 comprises all the residues in this motif); and sector 4 residues with the helicase-helicase interface of the NS3 dimer (PDB ID 2F55).
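The block-diagonal layout of the heat maps in this figure follows from re-ordering the rows and columns of the correlation matrix so that residues of the same sector sit next to each other. A minimal sketch of that re-ordering is given here; it assumes disjoint sectors supplied as lists of residue indices, and the function name `reorder_by_sectors` and the toy matrix are hypothetical.

```python
import numpy as np

def reorder_by_sectors(corr, sectors):
    """Restrict a correlation matrix to sector residues and group them.

    corr: square (residues x residues) matrix.
    sectors: list of lists of residue indices (assumed disjoint here).
    Returns the re-ordered sub-matrix and the residue order used.
    """
    order = [i for sector in sectors for i in sector]
    return corr[np.ix_(order, order)], order

# Toy 5-residue example: residues 0 and 2 co-vary, residues 1 and 3
# co-vary, and residue 4 belongs to no sector.
corr = np.eye(5)
corr[0, 2] = corr[2, 0] = 0.8
corr[1, 3] = corr[3, 1] = 0.6
blocks, order = reorder_by_sectors(corr, [[0, 2], [1, 3]])
```

After re-ordering, each sector occupies one diagonal block, and the off-diagonal blocks hold the inter-sector correlations, which are close to zero when sectors are nearly independent.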


Fig 7.

Immunological significance of HIV Gag sectors obtained using RoCA.

Association of HIV Gag sectors inferred using RoCA with the epitope residues targeted by T cells of (A) HIV long-term non-progressors (LTNP) and (B) HIV rapid progressors (RP). The list of residues targeted by the T cells of LTNP and RP patients is provided in S1 Table. The vertical axis of each plot shows the statistical significance of the association, measured as −log10 P, where P is the P-value computed using Fisher’s exact test. The black dashed line corresponds to the conventional threshold of statistical significance, P = 0.05; any value above this line is considered statistically significant.
