On the effect of phylogenetic correlations in coevolution-based contact prediction in proteins

Fig 3

DCA scores derived from natural sequence data and from MSA generated by Null models I and II, for datasets DS2 (panel A) of large MSA, and DS3 (panel B) of smaller MSA.

For the protein families under study, we show histograms of the DCA coupling scores F^APC (APC-corrected Frobenius norm of the couplings, the standard output of plmDCA) for the natural MSA and for samples of Null models I and II. Here and in the following, histograms are normalized as probability distributions, i.e. to unit area under the curve. It becomes evident that phylogenetic effects create, at least for sufficiently deep MSA, larger couplings than expected from finite sample size alone. However, couplings derived from the natural MSA reach substantially larger values. The figures also include the positive predictive value (PPV, scale on the right of each panel) for plmDCA run on the natural MSA, i.e. the fraction of true contacts among all couplings with F^APC above a threshold θ, as a function of θ. We clearly see that almost all large couplings correctly predict contacts, while the PPV starts to drop once F^APC reaches values that are also attained through phylogenetic effects in Null model II. We find this to be true for all non-trivial contacts (sequence separation |i − j| > 4) as well as for long-distance contacts (|i − j| > 24).
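
The two quantities in this caption, the APC-corrected score and the PPV as a function of the threshold θ, can be illustrated with a minimal sketch. This is not the plmDCA code: the function names `apc_correct` and `ppv_curve` and the parameter `sep_gt` are hypothetical, the APC is written in its commonly used form (score minus the product of row and column means divided by the overall mean), and the boolean contact map is assumed to be supplied by the user from a reference structure.

```python
import numpy as np

def apc_correct(F):
    """Average product correction of a symmetric L x L score matrix F.

    Assumes the diagonal of F is zero and that means are taken over
    off-diagonal entries only (a common convention, assumed here).
    """
    L = F.shape[0]
    off_diag = ~np.eye(L, dtype=bool)
    row_mean = F.sum(axis=1) / (L - 1)      # mean of F[i, j] over j != i
    total_mean = F[off_diag].mean()         # mean over all pairs i != j
    F_apc = F - np.outer(row_mean, row_mean) / total_mean
    np.fill_diagonal(F_apc, 0.0)
    return F_apc

def ppv_curve(scores, contacts, thresholds, sep_gt=4):
    """PPV(theta): fraction of true contacts among pairs with score > theta.

    Only pairs with sequence separation |i - j| > sep_gt are considered.
    Returns NaN for thresholds above which no pair remains.
    """
    L = scores.shape[0]
    i, j = np.triu_indices(L, k=sep_gt + 1)  # upper triangle, j - i > sep_gt
    s = scores[i, j]
    c = contacts[i, j].astype(bool)
    ppv = []
    for theta in thresholds:
        selected = s > theta
        ppv.append(c[selected].mean() if selected.any() else np.nan)
    return np.array(ppv)
```

Calling `ppv_curve(apc_correct(F), contacts, thresholds, sep_gt=4)` would correspond to the non-trivial contacts mentioned above, and `sep_gt=24` to the long-distance ones.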

doi: https://doi.org/10.1371/journal.pcbi.1008957.g003