Probabilistic Interaction Network of Evidence Algorithm and its Application to Complete Labeling of Peak Lists from Protein NMR Spectroscopy

doi:10.1371/journal.pcbi.1000307

Figure 1.

Conventional stages in protein structure determination by NMR.

After the data have been collected, the challenging “front-end” process leads to sequence-specific amino acid labeling. The “back-end” process then leads to the three-dimensional structure.

Figure 2.

Conventional process of resonance assignments for a protein labeled with stable isotopes (¹³C and ¹⁵N).

Peaks observed in multidimensional spectra are matched to search for common frequencies. Some common frequencies identify atoms within a residue; others identify atoms in neighboring residues. The common visual aid in this process is a series of paired strip plots from complementary NMR experiments. Strips from CBCA(CO)NH (a and c) and HNCACB (b and d) experiments can be used here to assign the tripeptide Thr-Tyr-His. Starting with C^α (CA) and C^β (CB) frequencies assumed to belong to Thr⁶⁶ (strip a), a horizontal trace (line), arising from the common frequency of NH nuclei, is used to locate C^α and C^β of Tyr⁶⁷ in strip b. To continue the process, the same peaks are located in strip c and traced to strip d. In strip d, given the accepted tolerances across spectra (shown by boxes around the selected peaks), several alternative assignments are plausible for His⁶⁸. These additional peaks may be artifacts (false peaks) or peaks from other nuclei with similar frequencies. Depending on the starting point of the assignment process, the choice of experiments, the amount of conflicting information, or other factors, the number of alternative assignments can grow exponentially, rendering a computational solution intractable. This difficulty has proved to be a major drawback for NMR structure determination, particularly for larger proteins.
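
The tolerance-based matching described in this caption can be sketched in a few lines of Python. This is only an illustration, not PINE-NMR's code: the function names, the 0.3 ppm tolerance, and the chemical shift values are all hypothetical. It shows how several peaks falling inside the tolerance box of each reference frequency multiply the number of alternative assignments.

```python
def candidate_matches(ref_shifts, strip_peaks, tol=0.3):
    """Map each reference shift (ppm) to the strip peaks within +/- tol ppm.

    The tolerance value is an assumption chosen for illustration.
    """
    return {
        ref: [p for p in strip_peaks if abs(p - ref) <= tol]
        for ref in ref_shifts
    }


def count_alternatives(candidates):
    """Number of distinct ways to pick one candidate peak per reference shift."""
    total = 1
    for peaks in candidates.values():
        total *= len(peaks)
    return total


# Hypothetical CA and CB shifts carried over from the previous strip.
reference = [58.1, 38.9]
# Hypothetical peaks in the next strip, including extra (possibly false) peaks.
strip = [58.0, 58.3, 38.7, 39.1, 30.2]

matches = candidate_matches(reference, strip)
alternatives = count_alternatives(matches)
```

If every step along a chain of residues leaves k plausible candidates, a chain of n steps admits on the order of k**n alternatives, which is the exponential growth the caption refers to.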

Table 1.

Backbone and side chain assignment performance of PINE-NMR with NMR data from a representative group of twelve proteins.

Figure 3.

Illustration of the system of neighborhoods built around each data value in PINE.

Each input data point (S) is linked to a set of labels (L) with associated weights. Similarity measures and constraints are utilized to construct each neighborhood system or topology (as denoted by the arrows).

Figure 4.

Global network of relationships in PINE-NMR.

A set of probabilistic influence sub-networks is combined into a larger influence network. Iterative probabilistic inference on this complex network ensures globally consistent labeling.

Figure 5.

Spin system generation network in PINE-NMR.

The peaks from the most sensitive experiments in the data are used initially as reference peaks. Aligning the peaks along their common dimensions and registering them against the reference peaks makes it possible to define a common putative object called the spin system. Spin systems are then assembled to derive triplet spin systems.
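
A minimal sketch of the registration step described in this caption, under stated assumptions: reference peaks are taken to be (¹H, ¹⁵N) pairs, peaks from a three-dimensional experiment are (¹H, ¹⁵N, ¹³C) triples, and each 3D peak is attached to the closest reference peak within per-dimension tolerances. The class name, function name, tolerances, and shift values are hypothetical and are not taken from PINE-NMR.

```python
from dataclasses import dataclass, field


@dataclass
class SpinSystem:
    """A reference (H, N) peak plus the carbon shifts registered to it."""
    h: float
    n: float
    carbons: list = field(default_factory=list)


def build_spin_systems(reference_peaks, peaks_3d, tol_h=0.03, tol_n=0.3):
    """Attach each 3D peak to the nearest reference peak within tolerance.

    Distances are scaled by the tolerance in each dimension so that the
    H and N axes contribute comparably. Peaks that match no reference
    peak are returned separately rather than discarded.
    """
    systems = [SpinSystem(h, n) for h, n in reference_peaks]
    unmatched = []
    for h, n, c in peaks_3d:
        best, best_dist = None, None
        for system in systems:
            dh, dn = abs(h - system.h), abs(n - system.n)
            if dh <= tol_h and dn <= tol_n:
                dist = (dh / tol_h) ** 2 + (dn / tol_n) ** 2
                if best is None or dist < best_dist:
                    best, best_dist = system, dist
        if best is None:
            unmatched.append((h, n, c))
        else:
            best.carbons.append(c)
    return systems, unmatched


# Hypothetical reference peaks and 3D peaks (ppm).
refs = [(8.20, 120.5), (7.95, 118.0)]
peaks = [
    (8.21, 120.4, 58.0),
    (8.19, 120.6, 38.8),
    (7.96, 118.1, 62.3),
    (9.50, 130.0, 55.0),  # far from every reference peak
]
systems, unmatched = build_spin_systems(refs, peaks)
```

Keeping the unmatched peaks makes it possible to revisit them later, for example as candidate noise peaks or as evidence of a missing reference peak.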

Figure 6.

Graphical network for backbone chemical shift assignments.

Overlapping tripeptides (residue triplets) are evaluated. The weights on the edges are derived from amino acid typing, secondary structures, connectivity experiments, and possible outlier assignments. According to the statistical physics model described in the text, application of the belief propagation algorithm yields the marginal probabilities for backbone assignments.
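
To illustrate how belief propagation turns edge weights into marginal probabilities, the sketch below runs sum-product message passing on a simple chain of discrete variables and checks the result against brute-force enumeration. It is a toy model, not the graph used in PINE-NMR: the chain topology, the potentials, and all function names are assumptions made for illustration.

```python
from itertools import product


def chain_marginals(unary, pairwise):
    """Sum-product belief propagation on a chain.

    unary[i][s] is the weight of node i taking state s.
    pairwise[i][a][b] couples state a of node i with state b of node i + 1.
    Returns one normalized marginal distribution per node.
    """
    n = len(unary)
    # Forward messages: fwd[i][s] summarizes everything to the left of node i.
    fwd = [[1.0] * len(unary[0])]
    for i in range(1, n):
        prev = [fwd[i - 1][a] * unary[i - 1][a] for a in range(len(unary[i - 1]))]
        fwd.append([
            sum(prev[a] * pairwise[i - 1][a][b] for a in range(len(prev)))
            for b in range(len(unary[i]))
        ])
    # Backward messages: bwd[i][s] summarizes everything to the right of node i.
    bwd = [None] * n
    bwd[n - 1] = [1.0] * len(unary[n - 1])
    for i in range(n - 2, -1, -1):
        nxt = [bwd[i + 1][b] * unary[i + 1][b] for b in range(len(unary[i + 1]))]
        bwd[i] = [
            sum(pairwise[i][a][b] * nxt[b] for b in range(len(nxt)))
            for a in range(len(unary[i]))
        ]
    marginals = []
    for i in range(n):
        belief = [fwd[i][s] * unary[i][s] * bwd[i][s] for s in range(len(unary[i]))]
        z = sum(belief)
        marginals.append([b / z for b in belief])
    return marginals


def brute_force_marginals(unary, pairwise):
    """Reference marginals obtained by enumerating every joint configuration."""
    n = len(unary)
    sizes = [len(u) for u in unary]
    totals = [[0.0] * k for k in sizes]
    for states in product(*(range(k) for k in sizes)):
        weight = 1.0
        for i, s in enumerate(states):
            weight *= unary[i][s]
        for i in range(n - 1):
            weight *= pairwise[i][states[i]][states[i + 1]]
        for i, s in enumerate(states):
            totals[i][s] += weight
    z = sum(totals[0])
    return [[t / z for t in row] for row in totals]


# Hypothetical potentials: three positions, two candidate labels each.
unary = [[0.7, 0.3], [0.5, 0.5], [0.2, 0.8]]
pairwise = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.6, 0.4], [0.3, 0.7]],
]
bp = chain_marginals(unary, pairwise)
exact = brute_force_marginals(unary, pairwise)
```

On a chain (or any tree) the two results agree exactly up to rounding; on graphs with loops, belief propagation is generally an approximation, and enumeration becomes infeasible as the number of variables grows.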
