Next-generation yeast-two-hybrid analysis with Y2H-SCORES identifies novel interactors of the MLA immune receptor

Protein-protein interaction networks are one of the most effective representations of cellular behavior. In order to build these models, high-throughput techniques are required. Next-generation interaction screening (NGIS) protocols that combine yeast two-hybrid (Y2H) with deep sequencing are promising approaches to generate interactome networks in any organism. However, challenges remain to mining reliable information from these screens and thus, limit its broader implementation. Here, we present a computational framework, designated Y2H-SCORES, for analyzing high-throughput Y2H screens. Y2H-SCORES considers key aspects of NGIS experimental design and important characteristics of the resulting data that distinguish it from RNA-seq expression datasets. Three quantitative ranking scores were implemented to identify interacting partners, comprising: 1) significant enrichment under selection for positive interactions, 2) degree of interaction specificity among multi-bait comparisons, and 3) selection of in-frame interactors. Using simulation and an empirical dataset, we provide a quantitative assessment to predict interacting partners under a wide range of experimental scenarios, facilitating independent confirmation by one-to-one bait-prey tests. Simulation of Y2H-NGIS enabled us to identify conditions that maximize detection of true interactors, which can be achieved with protocols such as prey library normalization, maintenance of larger culture volumes and replication of experimental treatments. Y2H-SCORES can be implemented in different yeast-based interaction screenings, with an equivalent or superior performance than existing methods. Proof-of-concept was demonstrated by discovery and validation of novel interactions between the barley nucleotide-binding leucine-rich repeat (NLR) immune receptor MLA6, and fourteen proteins, including those that function in signaling, transcriptional regulation, and intracellular trafficking.


Model
We used a Galton-Watson (GW) branching process to model yeast growth in each condition c ∈ {S, N }. In this presentation of the model, we drop the index c from the notation for simplicity. The rth replicate culture in the presence of bait i starts with M ir (0) = M 0 = 3.84 × 10 9 total yeast, and is grown for a potentially random number of T ir generations until the exponential growth phase ends. While the population size M ir (T ir ) at the end of the experiment will be about 7.5 × 10 10 , there is enough variation in this number that we do not consider it necessary to condition on its value. Let X ikr (t) be the number of yeast containing prey k at generation t. We assume X ikr (t) follows a simple Galton-Watson branching process, X ikr (t) = X ikr (t − 1) + δ ktr , where δ ktr ∼ Bin(X ikr (t − 1), e ik ) and e ik is the "fitness" of prey k in the given condition with bait i. We will generally assume each prey is experiencing differential growth rates e ik because of selection, but the model also applies to non-selection conditions, where we assume all yeast grow at the same rate e ik = e N .
By the independence of yeast during exponential growth, if the initial number of prey k is X ikr ( Given the true proportion q ik of prey k in the prey library, the initial number of prey k yeast, is a binomial random variable, where M 0 is the initial number of diploid yeast cells starting the culture. By the Laws of Total Expectation and Total Variance, we have the unconditional expectation and variance are At the end of the experiment (selection or non-selection), at generation T ir , we do not observe X ikr (T ir ) directly. Instead, we observe read counts from a Negative Binomial distribution with mean and variance where L ir ≈ Vir Mir(Tir) is a scaling factor (also called "size factor") accounting for sequencing depth V ir and the population size M ir (T ir ) at generation T ir . Parameter φ ik ≥ 0 is an overdispersion parameter that accounts for extra variation not already explained by the randomness in the initial prey count M ikr and the branching process. We treat diploid enrichment and the second round of selection as deterministic in the model ( Fig  5), but either may cause overdispersion relative to our stochastic growth model. Possible overdispersion is accommodated by using a NB observation model.
The assumption L ir = Vir Mir(Tir) is good if PCR amplification plays a negligible role in the sampling, meaning read depth is much smaller than the number of yeast cells sampled and there is no bias in the amplification. The unconditional mean and variance are therefore

Estimation of parameters
The proportion of true interactors in the library varies with bait, but is roughly Unif(0.0004, 0.001). This distribution is based on the data of Pashova et al. (2016) who confirmed 8 out of~15000 preys to be true interactors. Our experiments also showed a similar trend, having confirmed interactions between 1 and 25 in a~36000 prey population.
Observed size factors were computed asL icr = vicr Micr(Ticr) for the rth replicate of bait i under condition c. Here, v icr = K k=1 z ikcr (T icr ) is the observed coverage and M icr (T icr ) is the yeast population size at the end of the experiment. For this calculation, we assumed the rough estimate M icr (T icr ) = 7.5 × 10 10 , which is constant across baits, conditions and replicates.

No selection
Under no selection, each prey should replicate at the same rate regardless of bait, and we can drop the bait index i. Furthermore, we assume e N = 1 under these ideal conditions, so every yeast replicates at every generation. Because the initial population size M iN r (0) = M 0 = 3.84 × 10 9 is large and all yeast are actively replicating, the total population size M iN r (t) = M N (t) is not only independent of bait i, but also effectively deterministic and thus independent of replicate r, even if the individual number M ikN r (t) of some prey k is notably stochastic and not constant with bait i and replicate r. Since the experiment is stopped based on the total population size, the number of elapsed generations T iN r = T N is also constant across baits and replicates in the non-selection experiments. Under the branching process formulation, after t generations, the population size is expected to reach Given information about the initial population size M 0 and the final population size M N (T N ), we can estimate the number of elapsed generations as which we round to T N = 4 in practice.
Furthermore, we expect the proportion of prey k in the population q ikN r (t) at generation t to satisfy and the expected initial proportion E [q ikN r (0)] = q k is determined by the prey library, independent of replicate r, bait i and experimental condition (S or N ). We have observed z ikN r (T N ) reads of prey k in the presence of the rth replicate of bait i in the absence of selection. We can use the method of moments to estimateq the average proportion of prey k observed across all baits and replicates in the non-selection condition.
We can also use the method of moments to estimate overdispersion parameters φ kN for each prey. Estimating equation

Selection
By experimental design, we know the initial number of cells M iSr (0) = M 0 = 3.84 × 10 9 in the selection experiments, which do not vary by bait i or replicate r. Let q ikSr (0) be the initial proportion of prey k at generation t = 0 in the selection experiment with bait i. Since both non-selected and selected growth were initialized in the same way and non-selected growth does not change the prey proportions, we can assume the initial prey proportion q ikSr (0) =q k , whereq k were estimated from the no selection experiments. Then, the initial number of prey k yeast in the selection replicate r against bait i, is a binomial random variable.
Culture growth under selection depends on the bait and because it is so severely inhibited, becomes stochastic, so the time to saturation is observed to vary across baits and replicates. We estimated the number of generations T iSr = h iSr /g T per selection experiment assuming a constant generation time g T = h N /T N , calculated from the observed culture time h N = 18h of the non-selected samples, which was constant across experiments, and the observed culture times h iSr of selected samples, varying across baits and replicates.
By the end of the experiment, after T iSr generations of growth, the theory developed earlier yields unconditional mean and variance We used the method of moments to estimate e ik from the observed counts z ikSr (T iSr ) as which yieldsê ik as the root of equation where once again we make the assumption that M iSr (T iSr ) ≈ 7.5 × 10 10 at the end of the experiment, and thus is constant across baits and replicates. We then estimatedê ik using the Newton-Raphson method implemented by the uniroot function in R. To assure a positive estimate, we ultimately set As in the non-selection experiments, we can estimate the overdispersion parameters φ ikS aŝ

Fusion data
We expect some fraction u ikc of the Z ikcr (T icr ) observed reads to be fusion reads F ikcr . The factors determining u ikc include the placement of the PCR primers, the read length and coverage, the library and bait in selective conditions. It is difficult to model this process, but in a high throughput experiment, there are enough prey to model the distribution empirically. After verifying that the fusion read fraction was stable across baits and replicates in non-selection conditions, we simply estimatedû kN = . In selected conditions, the fusion read fraction was stable across replicates, but not baits, so we estimated . The resulting empirical distributions are shown in Figure S4.
If the number of in-frame fusion reads is given by Y ikcr , some proportion π ikc = Y ikcr F ikcr of the fusion reads will be in-frame. Again, we observed little variation across baits and replicates in non-selection conditions and little variation across replicates in selection conditions. For the non-selected condition, we computed the observed proportionsπ kN = 1 i,r 1 i,r y ikN r f ikN r for all prey with at least one fusion read. We used the densityπ kN to sample π kN during the simulation. Similarly, for the selected condition we calculated π ikS = 1 r 1 r y ikSr f ikSr and used it to sample π ikS .
For selection conditions, the in-frame proportion π ikS will further depend on the bait and the type of prey, true interactor, non-interactor, or auto-active/non-specific interactor. Since we do not know the true interactor status of each prey, we set true interactors π ikS ∼π ikS π ikS ∈π ikS :π ikS|0.95 < π ikS <π ikS , which matches the right peak of the observed in-frame proportions in selected conditions (Fig 2 and Fig  S4D). In the case of non-interactors and auto-active/non-specific interactors, we sampled from this density without restiction. S4 Fig shows these distributions depicting a frame preference during selection, which points to a higher efficiency of a specific prey frame during selection. In the case of true interactors we asume this frame should be in-frame to have a biological meaning.

Simulation algorithm
Our simulation relies on parameters estimated from real data, namely estimates In addition, the simulation takes several parameters as input: the proportion of auto-active/non-specific prey p s , the fitness threshold e t , the number of simulated baits I sim , the total number of prey n p , the initial selected population size and the initial non-selected population size M iSr (0) = M N (0) = M 0 , and the final population size M N (T N ), assumed to be approximately equal to the final selected population sizes M iSr (T iSr ). The number T N of generations elapsed in non-selected conditions is implied by these choices. True interactors are assumed to have fitness e ik ≥ e t , while non-interactors have e ik < e t . A proportion p s of auto-active/non-specific interactors in the sample, which autoactivate in the presence of any bait, or interact with multiple baits. Their e ik are selected from the top 10% ofê ik < e t in Q S .
Select n i ∼ Unif(0.0004, 0.001) · n p true interactors and n s ∼ p s (n p − n i ) auto-active/non-specific interactors from the non-interactors. Compute the 99th percentile e 0.99t of all fitness parametersê ik < e t . Compute In what follows, GW(M, e, t) is the Galton-Walton random branching process initialized with M particles, replicating at rate e for t generations. For the fusion count simulation we calculate the 95th percentile π ikS|0.95 of all the in-frame proportion for selected condition π ikS|0.95 <π ikS .
For each bait 1 ≤ j ≤ I sim : • For each replicate r, -select L jSr , L jN r from L with replacement, and select T jSr from T S with replacement.
• For each prey l in the non-selected condition: -If j = 1, this is the first prey, we select parameters that will be shared by subsequent preys: * Select (q l , φ lN ) from subset Q N with replacement unless we want to simulation low abundance interactors, in which case truncate q l above the minumum values sampled during the simulation ∼ 1 × 10 −8 if l is a true interactor, with φ lN = 0. * If we want to simulate high overdispersion then we sample φ lN from φ kN ∈ Q N :φ kN |0.9 <φ kN ⊂ Q N * Sample proportion of fusion reads u lN from U N . * Sample proportion of in-frame reads π lN from P N .
-For each replicate r: . * Simulate number of fusion reads F jlN r ∼ Bin (Z jlN r (T N ), u lN ). * Simulate number of in-frame fusion reads Y jlN r ∼ Bin (F jlN r , π lN ).
• For each prey l in the selected condition: -Sample proportion of fusion reads u jlS from U S .
-Else if it is a non-interactor and j = 1, * Select (e jl , φ jlS ) from subset ê ik ,φ ikS ∈ Q S :ê ik ≤ e t ⊂ Q S and set k to the prey k selected. * Sample π jlS from P S .
-Else if it is a non-interactor or a true interactor and j > 1, * Select (e jl , φ jlS ) from subset ê ik ,φ ikS ∈ Q S : k = k . * Sample π jlS from P S .