Accelerated Profile HMM Searches
Figure 3
MSV scores follow a predictable distribution.
A: Example MSV score distributions for a typical Pfam model, CNP1, on random i.i.d. sequences of varying lengths from 25 to 25,600, with the shortest, typical, and longest lengths highlighted as red, black, and blue lines, respectively. The predicted distribution, following the procedure of [31] including an edge correction on the slope λ, is shown in orange (though it is largely obscured by the data lines lying on top of it). B: Histogram of maximum likelihood λ values obtained from score distributions of 11,912 Pfam models, showing that most are tolerably close to the conjectured λ = log 2, albeit with more dispersion for default entropy-weighted models (black line) than for high relative entropy models without entropy-weighting (gray line). C: The observed fraction of nonhomologous sequences that pass the filter at a P-value of 0.02 should be 0.02. Histograms of the actual filter fraction for 11,912 different Pfam 24 models are shown, for a range of random sequence lengths from 25 to 25,600, for both default models (black lines) and high relative entropy models with no entropy weighting (gray lines).
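The expectation in panel C can be illustrated with a minimal sketch, assuming that the scores are in bits and follow a Gumbel distribution with slope log 2. This is not HMMER's code: the function names, the location parameter `mu = -8.0`, and the sample size are hypothetical choices for illustration only. The sketch draws scores from the assumed null distribution, converts each to a P-value, and counts the fraction at or below the 0.02 threshold.

```python
import math
import random

# Assumption: bit scores with a Gumbel slope of log 2.
LAMBDA = math.log(2)


def gumbel_pvalue(score, mu, lam=LAMBDA):
    """P(S >= score) under a Gumbel with location mu and slope lam."""
    # Survival function: 1 - exp(-exp(-lam * (score - mu))).
    return -math.expm1(-math.exp(-lam * (score - mu)))


def sample_gumbel(mu, lam, rng):
    """Draw one Gumbel score by inverting the CDF."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); random() returns values in [0, 1)
        u = rng.random()
    return mu - math.log(-math.log(u)) / lam


def filter_pass_fraction(n, mu, threshold=0.02, seed=0):
    """Fraction of n null scores whose P-value is <= threshold."""
    rng = random.Random(seed)
    passed = 0
    for _ in range(n):
        score = sample_gumbel(mu, LAMBDA, rng)
        if gumbel_pvalue(score, mu) <= threshold:
            passed += 1
    return passed / n


if __name__ == "__main__":
    # mu = -8.0 is a hypothetical location, not a fitted value.
    print(filter_pass_fraction(200_000, mu=-8.0))
```

If the assumed distribution matches the scores, the printed fraction should be close to 0.02. A model whose fitted slope deviates from log 2 would pass a different fraction, which is the kind of dispersion that the panel C histograms measure.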