Fig 1.
A. In a phage display experiment, an initial library containing ∼10^5 variants, each in ∼10^6 copies (here illustrated with 3 variants in 2 copies), is incubated in the presence of DNA hairpins (in black) coupled to magnetic beads (in orange). Antibodies are selected in proportion to their binding probability. The input and output populations are sampled and sequenced to provide datasets of ∼10^5 sequences each. B. We selected the same initial library against four different combinations of ligands: two different DNA hairpins coupled to magnetic beads, presented either alone or in combination, and naked magnetic beads. We refer to these four combinations as “Black”, “Blue”, “Mix” and “Beads” complexes. For the Black, Blue, and Mix complexes, we made two successive rounds of selection. The 10 boxes at the tips of the arrows indicate the 10 sequencing datasets thus produced to feed our model, in addition to the sequencing dataset from the initial library.
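The selection step of panel A can be illustrated numerically. The sketch below is only a toy version of the statement that antibodies are selected in proportion to their binding probability; it is not the model trained in this work. The function name `select_round`, the three-variant library, and the binding probabilities are hypothetical, and the sketch assumes that amplification between rounds preserves relative frequencies and that sampling noise can be ignored.

```python
import numpy as np

def select_round(freq_in, binding_prob):
    """Expected variant frequencies after one round of selection.

    Each variant is retained in proportion to its binding probability,
    and the result is renormalized to sum to one.
    """
    freq_in = np.asarray(freq_in, dtype=float)
    binding_prob = np.asarray(binding_prob, dtype=float)
    weighted = freq_in * binding_prob
    return weighted / weighted.sum()

# Hypothetical library of 3 equally abundant variants (as in the cartoon).
library = np.array([1 / 3, 1 / 3, 1 / 3])
# Hypothetical binding probabilities; variant 0 binds best.
binding_prob = np.array([0.5, 0.25, 0.25])

round1 = select_round(library, binding_prob)
round2 = select_round(round1, binding_prob)
```

Under these assumptions, two successive rounds enrich the best binder further than a single round does, which is why sequencing both rounds gives additional information.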
Fig 2.
The model accurately predicts the evolution of sequence variant abundances in response to multiple selective pressures.
We considered different tasks of increasing difficulty, depending on the training set used: A. Model trained on the experiments with Black complexes, Blue complexes, and naked Beads, with predictions evaluated on the experiment with a mixture of the Black and Blue complexes; B. Model trained on experiments with a mixture of Black and Blue complexes, Blue complexes only, and naked Beads, with predictions evaluated on the experiment with Black complexes only; C. Model trained on experiments with Blue complexes only, with predictions evaluated on experiments with naked Beads; D. Model trained on experiments with a mixture of Black and Blue complexes and with naked Beads, with predictions evaluated on experiments with Black complexes only. The panels show scatter plots of the observed (x-axis) vs. predicted (y-axis) sequence frequencies, with the initial library abundances shown in gray for comparison. The Pearson correlations between the empirical enrichments and the model-predicted enrichments for each task are given in the legend and in Table B in S1 Text. In all cases, p-values (from Student’s t-test) are < 10^−90. See the SI Mathematical supplement for details about model training.
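The comparison between empirical and predicted enrichments can be sketched as follows. This is a minimal illustration under stated assumptions, not the evaluation code used for the figure: enrichment is taken here to be the ratio of output to input frequency, a pseudocount of 1 is added to avoid division by zero, and the counts and the "predicted" values are made up. The function names are hypothetical; only `numpy.corrcoef` is a library call.

```python
import numpy as np

def log_enrichment(counts_in, counts_out, pseudocount=1.0):
    """Log ratio of output to input frequencies for each variant."""
    c_in = np.asarray(counts_in, dtype=float) + pseudocount
    c_out = np.asarray(counts_out, dtype=float) + pseudocount
    f_in = c_in / c_in.sum()
    f_out = c_out / c_out.sum()
    return np.log(f_out / f_in)

def pearson(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

# Hypothetical read counts for 4 variants before and after selection.
counts_in = [100, 100, 100, 100]
counts_out = [400, 200, 50, 10]

observed = log_enrichment(counts_in, counts_out)
# Stand-in for model output: the observed values plus a small perturbation.
predicted = observed + np.array([0.1, -0.1, 0.05, -0.05])

r = pearson(observed, predicted)
```

Working with log enrichments rather than raw frequencies keeps the few highly enriched variants from dominating the correlation; whether the correlation is computed on a linear or logarithmic scale is a choice this sketch makes, not one stated in the caption.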
Fig 3.
Design and validation of antibodies with prescribed specificity.
A. Model-based energy plot where each sequence s is represented as a circle with coordinates given by its model energies for the two binding modes w1 and w2, with w1 representing the binding mode associated with the Black hairpin and w2 the binding mode associated with the Blue hairpin. Sequences predicted to be specific to the Blue hairpin, specific to the Black hairpin, or cross-specific to the two hairpins are highlighted in blue, black, and purple, respectively. We selected for experimental validation all the colored sequences that are not present in the training set. B. Experiment-based enrichment plot of the selected sequences where each sequence s is represented as a circle with coordinates (log ϵsBlack, log ϵsBlue), with ϵsBlack representing the enrichment against the Black complex and ϵsBlue the enrichment against the Blue complex. Sequences with high enrichment in one experiment and low enrichment in the other are ligand-specific, those with high enrichment in both are cross-specific, and those with low enrichment in both are non-binders (false positives). We assess the effectiveness of our computational approach by calculating the percentage of designed sequences falling within the correct region. Cross-specific designed antibodies achieve a 45% true positive rate, while Black-specific and Blue-specific binders yield lower percentages (19% and 8%, respectively). These results reflect the capacity of our approach to design antibodies with desired properties despite the challenges arising from the close similarity of the two ligands (percentages in parentheses indicate the total fraction of sequences in each quadrant; see also Fig I in S1 Text for the choice of the thresholds and Table C in S1 Text for p-values).
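The quadrant-based assessment of panel B can be sketched as follows. This is an illustrative sketch only: the thresholds (set to 0 here), the toy log-enrichment values, and the function names `classify` and `true_positive_rates` are hypothetical, and the actual thresholds are those described in Fig I in S1 Text.

```python
def classify(log_e_black, log_e_blue, thr_black=0.0, thr_blue=0.0):
    """Assign each sequence to a quadrant of the enrichment plot."""
    labels = []
    for e_black, e_blue in zip(log_e_black, log_e_blue):
        high_black = e_black >= thr_black
        high_blue = e_blue >= thr_blue
        if high_black and high_blue:
            labels.append("cross")
        elif high_black:
            labels.append("black")
        elif high_blue:
            labels.append("blue")
        else:
            labels.append("non-binder")
    return labels

def true_positive_rates(designed, observed):
    """Fraction of sequences designed for each class that land in it."""
    rates = {}
    for cls in sorted(set(designed)):
        hits = [o == cls for d, o in zip(designed, observed) if d == cls]
        rates[cls] = sum(hits) / len(hits)
    return rates

# Hypothetical designed classes and measured log enrichments.
designed = ["cross", "cross", "black", "blue", "blue"]
log_e_black = [2.0, 1.5, 2.2, -1.0, -0.5]
log_e_blue = [1.8, -0.5, -1.0, 1.2, -2.0]

observed = classify(log_e_black, log_e_blue)
rates = true_positive_rates(designed, observed)
```

In this toy example, a sequence designed as cross-specific that is enriched only against the Black complex counts as a miss for the cross-specific class, which is the sense in which the percentages in the caption are true positive rates per designed class.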