Fig 1.
A.) Schematic of the phylogenetic latent variable model, with a continuous-time Markov chain describing the evolution of network edges and an error model mapping the unobserved edge states to continuous, observed data. Input features, including correlation, distance, and hypergeometric probability features, were extracted from several high-throughput datasets comprising over 16k mass spectrometry experiments. B.) Time-calibrated phylogeny of the species analyzed here.
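As a rough illustration of the edge-evolution component in panel A, the sketch below computes transition probabilities for a generic two-state (edge absent / edge present) continuous-time Markov chain. The parameterization is an assumption, not taken from the legend: a rate matrix Q = rate × [[-π1, π1], [π0, -π0]] with equilibrium frequencies π0 and π1. The function name and arguments are hypothetical.

```python
import math


def two_state_transition_probs(pi1, rate, t):
    """Transition matrix P(t) for a two-state CTMC (0 = no edge, 1 = edge).

    Assumed rate matrix (hypothetical parameterization):
        Q = rate * [[-pi1,  pi1],
                    [ pi0, -pi0]]
    whose stationary distribution is (pi0, pi1).
    """
    if not 0.0 < pi1 < 1.0:
        raise ValueError("pi1 must lie strictly between 0 and 1")
    if rate < 0.0 or t < 0.0:
        raise ValueError("rate and t must be non-negative")
    pi0 = 1.0 - pi1
    # The only non-zero eigenvalue of Q is -rate, so P(t) has a closed form.
    decay = math.exp(-rate * t)
    return [
        [pi0 + pi1 * decay, pi1 * (1.0 - decay)],  # starting without an edge
        [pi0 * (1.0 - decay), pi1 + pi0 * decay],  # starting with an edge
    ]


if __name__ == "__main__":
    # Short branch: states mostly persist. Long branch: rows approach (pi0, pi1).
    for branch_length in (0.1, 1.0, 10.0):
        print(branch_length, two_state_transition_probs(0.05, 1.0, branch_length))
```

Under this assumed parameterization, edge states on short branches are strongly correlated between ancestor and descendant, while on long branches the probability of an edge approaches the equilibrium frequency π1.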
Fig 2.
A.) The distribution of Pearson’s correlation coefficient averaged over 32 co-fractionation experiments on human cell lines. The distribution for gold-standard interacting human protein pairs is barely distinguishable from that for non-interactors. B.) Error model parameters were manipulated to replicate typical experimental noise. The equilibrium frequencies of the two states, π0 and π1, can be tuned to replicate class imbalance. The difference between the means, δ, was changed to replicate weak signal. The mixture weights for the positive error model, λ1 and λ2, set the false negative rate.
Fig 3.
A.) Performance on simulated training and test (hold-out) data when the model is trained by maximizing either the likelihood or the average precision score (APS). Fitting by APS outperforms fitting under likelihood. The APS is also used as a criterion for goodness of fit, and results include all simulation parameter combinations. B.) Performance as a function of the mixture weight λ1 (the false negative rate), δ (the distance between the means of positive and negative interactions), and the equilibrium frequency π1, which is the expected frequency of positive interactions and is therefore proportional to class imbalance. π1 is also the expected APS value from a random guess.
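The statement that π1 is the expected APS of a random guess can be checked numerically. The sketch below is a plain-Python illustration, not the analysis code behind the figure: it labels simulated pairs as positive with probability π1, assigns uninformative random scores, and computes the average precision directly. The function names, sample size, and seed are all hypothetical choices.

```python
import random


def average_precision(labels, scores):
    """Average precision: mean of precision@k over the ranks k of the positives."""
    order = sorted(range(len(labels)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precision_sum += hits / rank  # precision at this positive's rank
    if hits == 0:
        raise ValueError("at least one positive label is required")
    return precision_sum / hits


def random_guess_aps(pi1, n_pairs=20000, seed=0):
    """APS of uninformative scores when positives occur with frequency pi1."""
    rng = random.Random(seed)
    labels = [rng.random() < pi1 for _ in range(n_pairs)]
    scores = [rng.random() for _ in range(n_pairs)]  # scores carry no signal
    return average_precision(labels, scores)


if __name__ == "__main__":
    for pi1 in (0.01, 0.1, 0.5):
        print(pi1, random_guess_aps(pi1))
```

With enough simulated pairs, the printed APS should sit close to π1, which is why the random-guess baseline shifts with class imbalance.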
Fig 4.
A.) Performance on hold-out sets in four species, measured as precision-recall curves and the average precision score (APS). Three modeling conditions are plotted next to the raw features derived individually in each species from the highest performing (blue) dataset. This dataset was also used for all subsequent analyses. Note that not all features were collected for each species. The higher baseline in flies is due to a lower ratio of negatives to positives in the test data (see Methods), not better performance in that species; in general, the species cannot be directly compared to each other due to differences in the test sets. B.) Interactions between more widely conserved orthogroups, i.e. those shared across more taxa, are predicted with higher performance.
Table 1.
Experimental co-elution mass spectrometry data used for analyses, from Wan et al. 2015.
Fig 5.
Reconstructed orthogroup interaction network for the most recent common ancestor of planulozoans (cnidarians + bilaterians).
The model successfully reconstructs known soluble complexes and groups membrane proteins into a large clump. Edge widths are proportional to the z-score transformed score from the PLVM.
Fig 6.
Evolution of the Commander complex.
A.) Schematic model (Mallam & Marcotte 2017) and PLVM reconstruction of human Commander subunits. B.) Interactions between subunits of Commander that survived FDR correction at interior nodes of the tree.
Table 2.
Experimental affinity purification mass spectrometry data used for analyses.