A Horizontal Alignment Tool for Numerical Trend Discovery in Sequence Data: Application to Protein Hydropathy

doi:10.1371/journal.pcbi.1003247


Figure 2

Empirically determined probability model for protein hydropathy.

A. Inverse Chi-Squared model for the distribution of observed scores. Distributions of Equation 4 scores for HePCaT alignments of length L = 100, obtained with parameters W = 5 residues, GapMax = 4 residues, and C = 0.4. Pairs of random sequences were generated, their Kyte-Doolittle amino acid hydropathies were averaged over a 15-residue window, and the resulting profiles were optimally aligned using HePCaT, as described in the text. In each case the binned data were fit reasonably well by the Inverse Chi-Squared probability distribution function (PDF, Equation 5), as described in Methods and tabulated in Table 1. B. Analytical parameters to estimate statistical significance. The PDF parameters ν and σ² were observed to vary smoothly as a function of HePCaT alignment length, so that the parameters, and thus alignment significance, can be estimated analytically for arbitrary alignment length using Equations 6 and 7 and the parameters in Table 2. Discrete best-fit values of ν and σ² are given in Table 1. Equations for the displayed best-fit curves are as follows: y = 0.497609x (Hydropathy, ν), y = 0.160379 − 1.04167 ln(x + 38.9045) (Hydropathy, σ²).
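The preprocessing step described in panel A (random sequences converted to window-averaged Kyte-Doolittle hydropathy profiles) can be illustrated with a minimal Python sketch. It is not the HePCaT implementation. The function names `random_sequence` and `windowed_hydropathy` are hypothetical, and drawing the 20 amino acids uniformly is an assumption, since the caption does not say how the random sequences were composed. The alignment and scoring steps are not reproduced.

```python
import random

# Standard Kyte-Doolittle hydropathy values for the 20 amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = sorted(KYTE_DOOLITTLE)


def random_sequence(length, rng):
    """Draw a random protein sequence.

    Assumption: residues are sampled uniformly; the source does not
    state the composition model used for its random sequences.
    """
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def windowed_hydropathy(seq, window=15):
    """Return the mean Kyte-Doolittle hydropathy over each full window.

    The result has len(seq) - window + 1 values, one per window position.
    """
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    values = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    seq_a = random_sequence(200, rng)
    seq_b = random_sequence(200, rng)
    profile_a = windowed_hydropathy(seq_a, window=15)
    profile_b = windowed_hydropathy(seq_b, window=15)
    print(len(profile_a), len(profile_b))
```

Pairs of profiles such as `profile_a` and `profile_b` would then be passed to the aligner; repeating this over many random pairs yields the kind of score distribution that panel A fits.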

doi: https://doi.org/10.1371/journal.pcbi.1003247.g002