Impact of phylogeny on the inference of functional sectors from protein sequence data
Fig 2
Spectrum of the ICOD matrix and of its block diagonal approximation.
Left (resp. middle) panel: spectrum of the ICOD matrix computed on 2,000 (resp. 14,000) sequences generated independently at equilibrium, and of its block diagonal approximation. The spectrum of the inverse covariance matrix C⁻¹ is also shown as a reference. Right panel: spectrum of the analytical approximation of the ICOD matrix, and of its block diagonal approximation. Sequences of length L = 200 were sampled independently at equilibrium using the Hamiltonian in Eq 1 with
and τ* = 90. The vector of mutational effects comprises sector sites (the first 20 sites), with components sampled from a Gaussian distribution with mean 5 and variance 0.25, and non-sector sites (the remaining 180 sites), with components sampled from a Gaussian distribution with mean 0.5 and variance 0.25. The analytical approximation
(see S1 Appendix section 1) was computed from the values of κ and of the vector of mutational effects used for data generation.
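As an illustration of the sampling scheme described in this caption, the following minimal sketch draws a vector of mutational effects with the stated sizes, means, and variance. It is not the authors' code: the function name `sample_mutational_effects`, the seed, and the use of Python's standard `random` module are assumptions made for illustration. Note that the caption quotes a variance of 0.25, which corresponds to a standard deviation of 0.5.

```python
import random
import statistics

L = 200                # sequence length, from the caption
N_SECTOR = 20          # the first 20 sites form the sector, from the caption
VARIANCE = 0.25        # variance quoted in the caption
STD = VARIANCE ** 0.5  # random.gauss takes a standard deviation, here 0.5


def sample_mutational_effects(seed=0):
    """Hypothetical helper: draw one vector of mutational effects.

    Sector sites: Gaussian with mean 5, variance 0.25.
    Non-sector sites: Gaussian with mean 0.5, variance 0.25.
    The seed is an arbitrary choice made for reproducibility.
    """
    rng = random.Random(seed)
    sector = [rng.gauss(5.0, STD) for _ in range(N_SECTOR)]
    non_sector = [rng.gauss(0.5, STD) for _ in range(L - N_SECTOR)]
    return sector + non_sector


effects = sample_mutational_effects()
print(len(effects))
print(statistics.mean(effects[:N_SECTOR]))   # should be close to 5
print(statistics.mean(effects[N_SECTOR:]))   # should be close to 0.5
```

The large gap between the two means (5 versus 0.5) relative to the common standard deviation (0.5) is what makes sector sites stand out from non-sector sites in such synthetic data.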