On the effect of phylogenetic correlations in coevolution-based contact prediction in proteins
Fig 2
Eigenvalue spectra of the covariance matrices of the natural MSA and of Null models I and II.
We show the cumulative distribution of the unified eigenvalue spectra for the 60-protein dataset DS2, i.e., the fraction of eigenvalues larger than λ as a function of λ. The phylogeny-aware Null model II shows the same fat tail at large eigenvalues as the natural data, whereas the non-phylogenetic Null model I has a more compact support. The cutoff of the tail at large λ results from the inter-family variability of the largest eigenvalues among the 60 spectra; cf. Fig D in S1 Text for the 9 individual proteins in dataset DS1.
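The quantity plotted here, the fraction of eigenvalues larger than λ, can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names (`one_hot`, `covariance_eigenvalues`, `survival_fraction`) are hypothetical, the MSA is assumed to be integer-encoded and one-hot expanded, and the covariance is the plain empirical one, without any sequence reweighting or pseudocounts that the original analysis may have used. Spectra from several families are pooled by simple concatenation, which is one possible reading of "unified".

```python
import numpy as np

def one_hot(msa, q):
    """Expand an integer MSA of shape (M, L), entries in 0..q-1, to (M, L*q)."""
    M, L = msa.shape
    X = np.zeros((M, L * q))
    rows = np.repeat(np.arange(M), L)
    cols = (np.arange(L) * q + msa).ravel()  # column index of each (site, state)
    X[rows, cols] = 1.0
    return X

def covariance_eigenvalues(msa, q):
    """Eigenvalues of the empirical covariance of the one-hot encoded MSA."""
    X = one_hot(msa, q)
    C = np.cov(X, rowvar=False, bias=True)  # (L*q, L*q), normalized by M
    return np.linalg.eigvalsh(C)            # C is symmetric

def survival_fraction(eigenvalues, lam):
    """Fraction of eigenvalues strictly larger than each value in lam."""
    ev = np.sort(np.asarray(eigenvalues, dtype=float))
    lam = np.atleast_1d(np.asarray(lam, dtype=float))
    n_le = np.searchsorted(ev, lam, side="right")  # count of eigenvalues <= lam
    return 1.0 - n_le / ev.size

# Toy usage: random "alignments" stand in for real families (assumption).
rng = np.random.default_rng(0)
q = 4  # small alphabet for the toy example; 21 would be typical for proteins
families = [rng.integers(0, q, size=(50, 8)) for _ in range(3)]
pooled = np.concatenate([covariance_eigenvalues(m, q) for m in families])
lam_grid = np.logspace(-3, 1, 20)
frac = survival_fraction(pooled, lam_grid)
```

Plotting `frac` against `lam_grid` on logarithmic axes gives a curve of the kind shown in the figure. Because the pooled curve mixes spectra whose largest eigenvalues differ from family to family, its upper end is shaped by that variability rather than by any single spectrum.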