Figure 1.
Flowchart for Yeast Two-Hybrid Screens Indicates Systematic and Stochastic Sources of False Negatives and Stochastic Sources of False Positives
Figure 2.
Simplified Schematic Shows the Two-Hybrid Sampling Process
In this picture, true-positive interactions (black edges) are sampled uniformly with total probability 1 − α, and false-positive interactions (red edges) are sampled stochastically with total probability α. Sampling is with replacement, and multiple edges between a pair of vertices represent multiple observations of the same interaction. The example shows n = 12 edges sampled in the entire network, with w = 11 unique edges and s = 10 singleton edges, each observed exactly once. The total number of true-positive edges, k, and the number of false-positive edges within the sample, f, are hidden. The actual experimental data are more complicated, with individual values of n, w, and s reported for each protein used as a bait. The statistical method presented here provides estimates for k and f together with parameter estimates for α and the distribution Pr(k).
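The sampling picture in Figure 2 can be illustrated with a minimal simulation sketch. It assumes that α is the probability that a single draw is a false positive, that true partners are drawn uniformly from k candidates, and that false positives come from a large pool; the function names, the pool size `n_decoys`, and the choice to count f as distinct false-positive preys are assumptions made for illustration, not part of the published method.

```python
import random
from collections import Counter


def summarize(observations):
    """Return the observable counts n, w, s for a list of sampled edges."""
    counts = Counter(observations)
    n = len(observations)                           # total edges sampled
    w = len(counts)                                 # unique edges
    s = sum(1 for c in counts.values() if c == 1)   # edges seen exactly once
    return {"n": n, "w": w, "s": s}


def simulate_bait(k, n, alpha, n_decoys=10_000, seed=0):
    """Sample n preys with replacement for one bait.

    k        -- hidden number of true partners (known only in simulation)
    alpha    -- assumed probability that a draw is a false positive
    n_decoys -- hypothetical size of the false-positive pool
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        if rng.random() < alpha:
            draws.append(("fp", rng.randrange(n_decoys)))  # false positive
        else:
            draws.append(("tp", rng.randrange(k)))         # true positive
    result = summarize(draws)
    # f is hidden in real data; here it is counted as distinct
    # false-positive preys (an assumption about the counting convention).
    result["f"] = len({d for d in draws if d[0] == "fp"})
    return result


# The example in the caption: 12 sampled edges, one of them observed twice.
example = ["e0"] + [f"e{i}" for i in range(11)]
print(summarize(example))
print(simulate_bait(k=5, n=20, alpha=0.3))
```

Only n, w, and s are available from an actual screen; k and f appear here because the simulation generates them, which is what makes such a sketch useful for checking an estimator against known values.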
Figure 3.
Number of Unique Interactions (w) and Singleton Interactions (s) Calculated as a Function of the Number of Preys Examined for the Experimental Data (Points)
Extrapolations from half the data are provided for yeast, worm, and fly, using the TPL-MIXTURE model obtained for each.
Table 1.
Definitions of Symbols
Table 2.
Known Properties of the Experimental Datasets Are Total Number of Baits, N; Mean Number of Preys Sampled per Bait, n̄; Mean Number of Unique Preys, w̄; and Mean Number of Singleton Preys, s̄
Table 3.
Error Rates and Projections for Full Coverage Provided for Yeast (PL-MIXTURE), Worm (TPL-MIXTURE), and Fly (TPL-MIXTURE) Models
Table 4.
Promiscuous Domains
Table 5.
Chaste Domains
Table 6.
Correlation of False-Discovery Rates with Hydrophobicity Scales and Length
Table 7.
The False-Discovery Rate for a Bait Protein, f̂/n, Positively Correlated with the Estimated Number of True Interaction Partners That Are Observed, w − f̂, and the Total Number, k̂
Table 8.
Parameter Estimates for the True-Positive Rates for Avoiding Systematic Losses
Table 9.
Protein Interaction Count Predictions Provided from This Method, k̂, and from a Previous Method, k∩
Table 10.
True-Positive Rates Estimated from Literature Comparisons