Co-evolution networks of HIV/HCV are modular with direct association to structure and function
Fig 2
Inferring co-evolutionary structure using the RoCA method.
Illustration of the RoCA algorithm for a simple toy model involving two non-overlapping sectors of co-evolving residues. (A) Data pre-processing, in which the mutational Pearson correlation matrix is computed from a multiple sequence alignment. (B) (Top panel) A spectral analysis of the correlation matrix is performed to distinguish true correlations, encoded in the dominant spectral modes (shown here in red and blue), from those that appear to reflect statistical noise. The observed eigenvalue spectrum resembles that generally observed in spiked correlation models [18]: a bulk of small eigenvalues representing largely statistical noise, and a few large eigenvalues (referred to as spikes) representing the true underlying correlations. (Bottom panel) The dominant principal components (PCs) are estimated with the proposed robust method in order to identify the co-evolutionary structure. This involves a data-driven thresholding step, based on random matrix theory, that separates the set of all correlated residues (those belonging to either sector) from statistical noise, followed by an iterative procedure that determines which of these correlated residues are associated with each PC. Based on the resulting PCs, the groups of co-evolving residues (sectors) are accurately identified. Note that sectors are not necessarily contiguous in the primary sequence; contiguity is an assumption of this toy model construction only. (C) Residues in sectors inferred from the robustly estimated PCs are generally located close together in the 3D structure.
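The spectral step in panel (B) can be illustrated with a minimal sketch. This is not the RoCA method: it is plain PCA on a simulated toy alignment, with two simple stand-ins that are assumptions of this example only. The eigenvalue cutoff is the Marchenko-Pastur upper edge for uncorrelated data, and the loading cutoff of 1/sqrt(N) is a heuristic replacing RoCA's data-driven thresholding and iterative assignment. The function names `toy_msa` and `spectral_sectors`, and all parameter values, are hypothetical. The two toy sectors are given different sizes so that their eigenvalues are well separated.

```python
import numpy as np

def toy_msa(n_seq=2000, n_res=40, sectors=((0, 8), (12, 17)), p=0.2, seed=0):
    """Simulate a binary mutation matrix (1 = mutated) with co-evolving sectors.

    Hypothetical toy generator: residues outside the sectors mutate independently
    with probability p; residues inside a sector copy a shared latent variable
    70% of the time, which induces positive pairwise correlations.
    """
    rng = np.random.default_rng(seed)
    X = rng.random((n_seq, n_res)) < p
    for start, stop in sectors:
        latent = rng.random(n_seq) < p
        follow = rng.random((n_seq, stop - start)) < 0.7
        X[:, start:stop] = np.where(follow, latent[:, None], X[:, start:stop])
    return X.astype(float)

def spectral_sectors(X):
    """Return eigenvalues, the noise cutoff, and residue groups from the top PCs."""
    n_seq, n_res = X.shape
    # Pearson correlation between residues (columns).
    C = np.corrcoef(X, rowvar=False)
    # eigh returns ascending eigenvalues; reorder to descending.
    evals, evecs = np.linalg.eigh(C)
    evals, evecs = evals[::-1], evecs[:, ::-1]
    # Assumption: Marchenko-Pastur upper edge as the bulk/spike boundary.
    mp_edge = (1.0 + np.sqrt(n_res / n_seq)) ** 2
    n_spikes = int(np.sum(evals > mp_edge))
    if n_spikes == 0:
        return evals, mp_edge, []
    pcs = np.abs(evecs[:, :n_spikes])
    # Assumption: heuristic loading cutoff, the typical entry size of a
    # fully delocalized unit vector. This is NOT the RoCA threshold.
    load_cut = 1.0 / np.sqrt(n_res)
    strongest = np.argmax(pcs, axis=1)
    significant = np.max(pcs, axis=1) > load_cut
    groups = [np.flatnonzero(significant & (strongest == k)).tolist()
              for k in range(n_spikes)]
    return evals, mp_edge, groups

if __name__ == "__main__":
    evals, mp_edge, groups = spectral_sectors(toy_msa())
    print("largest eigenvalues:", np.round(evals[:4], 2))
    print("noise cutoff:", round(mp_edge, 3))
    print("inferred groups:", groups)
```

In this sketch each residue is assigned to the PC on which it loads most strongly, which works here only because the toy sectors have distinct eigenvalues; when spikes are close in magnitude, sample PCs can mix sectors, which is the kind of problem a robust estimation procedure is meant to address.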