Funneling modulatory peptide design with generative models: Discovery and characterization of disruptors of calcineurin protein-protein interactions

doi:10.1371/journal.pcbi.1010874

Fig 1.

Overview of calcineurin-substrate complexes.

(a) Structure of calcineurin bound to representative SLiM-containing peptides (PDB codes: 2p6b [51], 5sve [52]). Calcineurin is shown in molecular surface representation and is colored by propensity to bind disordered regions (from white = low to dark red = high), based on ScanNet [47,48]. The catalytic site is colored in blue. Both the catalytic and the regulatory (circled in green) domains are shown. PxIxIT- and LxVP-containing peptides are shown in stick representation (magenta and yellow, respectively). (b) Sequence alignment of the PxIxIT short linear motifs that bind calcineurin.

Fig 2.

Overview of the protocol for design of peptide inhibitors of a target PPI.

The protein substrates of the target enzyme are first identified from previous experiments, together with their binding fragments. Additional interacting orthologs are identified by homology search, and the corresponding binding regions are extracted and aligned. A sequence generative model (SGM) is trained on this alignment and used to generate a library of candidate peptides. The candidates are screened for affinity by structural modeling and a high-throughput binding assay. The best candidates are selected for further low-throughput experimental characterization.

Fig 3.

Generative modeling of PxIxIT binding motifs.

(a) Schematic view of the generative approach. A “smooth” probability distribution over the whole sequence space is learnt from a limited number of samples. Unseen sequences with high probability are potential novel binders, whereas sequences in low-probability regions are likely non-functional. (b) Graphical depiction of the cRBM model, the parametric form chosen. The visible layer corresponds to the aligned sequence; each visible unit contributes a site-specific term g_i(s_i) to the log-likelihood. The hidden (representation) layer consists of unobserved units, each of which contributes an additional term to the log-likelihood, defined as a linear projection through a sparse tensor followed by a trainable, strictly convex non-linearity. (c,d) cRBM-predicted mutational landscapes for the NFATc2 and AKAP79 peptides. Red, white and blue entries correspond respectively to beneficial, neutral and deleterious mutations. (e) Comparison between cRBM-predicted mutational landscapes and deep mutational scans (DMS) of the change in binding affinity measured by Nguyen et al. Four DMS were performed, taking as wild type the PVIVIT, PKIVIT, NFATc2 and AKAP79 peptides. Spearman correlation coefficients are annotated. (f,g,h) Selected examples of sequence motifs learnt by the cRBM (f), together with their activity distributions (g) and top-activating sequences (h). Motif 1 is gene-specific, whereas motifs 2 and 3 are shared by multiple genes.
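The log-likelihood structure described in panel (b) and the mutational landscapes of panels (c,d) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the toy parameters, and the choice of log(2 cosh(x)) as the strictly convex non-linearity are hypothetical and are not taken from the paper, whose non-linearity is trainable. The score is unnormalized (the partition function is omitted), which does not affect differences between a mutant and its wild type.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def hidden_term(x):
    """Stand-in convex non-linearity: log(2*cosh(x)), computed stably."""
    return abs(x) + math.log1p(math.exp(-2.0 * abs(x)))

def log_score(seq, fields, weights):
    """Unnormalized log-likelihood: sum_i g_i(s_i) + sum_mu Gamma(I_mu(s)).

    fields[i]      : dict amino acid -> site term g_i(a); missing entries are 0
    weights[mu][i] : dict amino acid -> weight; missing entries are 0 (sparse)
    """
    total = sum(fields[i].get(a, 0.0) for i, a in enumerate(seq))
    for w_mu in weights:
        # Linear projection of the sequence onto hidden unit mu
        hidden_input = sum(w_mu[i].get(a, 0.0) for i, a in enumerate(seq))
        total += hidden_term(hidden_input)
    return total

def mutational_landscape(wild_type, fields, weights):
    """Score difference (mutant minus wild type) for every single substitution."""
    base = log_score(wild_type, fields, weights)
    landscape = {}
    for i, wt_aa in enumerate(wild_type):
        for aa in ALPHABET:
            if aa == wt_aa:
                continue
            mutant = wild_type[:i] + aa + wild_type[i + 1:]
            landscape[(i, aa)] = log_score(mutant, fields, weights) - base
    return landscape

# Hypothetical toy parameters for a 6-residue peptide (not fitted to any data)
toy_fields = [{"P": 2.0}, {}, {"I": 1.5}, {}, {"I": 1.5}, {"T": 1.0}]
toy_weights = [[{"P": 1.0}, {}, {"I": 1.0}, {}, {"I": 1.0}, {}]]
toy_landscape = mutational_landscape("PVIVIT", toy_fields, toy_weights)
```

In this sketch's convention, a positive entry means the substitution raises the model score relative to the wild type and a negative entry means it lowers it; positions that carry no field or weight in the toy parameters come out exactly neutral.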

Fig 4.

Medium-throughput filtering by structural modeling and microarray screening.

(a) Depiction of the structural modeling protocol: after alignment to the known PxIxIT binding site, an efficient flexible-backbone structure refinement algorithm is applied to estimate the docking energy. (b) Histogram of docking energy scores for the generated peptides and selected controls (lower is better; normalized to zero mean and unit variance). (c) Coefficients of the equivalent single-site model fitted by sparse linear regression, shown in weight logo representation. At each position, the height of each letter is proportional to the corresponding regression coefficient; residues with large negative coefficients (e.g. hydrophobic residues at the motif locations) contribute favorably to the docking score. Colors indicate physical properties (black = hydrophobic, red = negatively charged, etc.). (d) Per-gene distribution of docking scores across natural fragments (lower is better). The docking protocol qualitatively discriminates between obligate and transient interactions. (e) Overview of the microarray screening. Peptides are printed on the chip (two circles per peptide). After Cn is poured over the chip and the chip is washed, a fluorescent-tagged, Cn-targeting antibody is overlaid and an image is taken. Fluorescent spots indicate strong Cn binders. (f) Scatter plot of the sequence likelihood (normalized by length, higher is better) against fluorescence level (higher is better; see Methods for details of the data analysis).
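The two normalizations mentioned in panels (b) and (f) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the use of the population standard deviation (rather than the sample one) is an assumption, since the caption does not specify it.

```python
import statistics

def zscore(values):
    """Rescale scores to zero mean and unit variance (population std assumed)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("cannot normalize scores that are all identical")
    return [(v - mean) / std for v in values]

def per_residue_log_likelihood(log_likelihood, sequence):
    """Divide a sequence log-likelihood by its length so peptides of
    different lengths can be compared on the same scale."""
    return log_likelihood / len(sequence)
```

Because z-scoring is a monotonic rescaling, the "lower is better" ordering of the docking scores is preserved after normalization.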

Fig 5.

Fluorescence polarization (FP) competition assay of selected peptides for the binding of Cn to the PVIVIT peptide.

Variable concentrations of each selected peptide were incubated with Cn bound to the FITC-labeled PVIVIT peptide. Polarization levels were read, and the normalized values were fitted to a single-site model. The curves show the fraction of Cn bound to PVIVIT vs. the logarithm of the peptide concentration. Red: natural peptides; purple: designed peptides; black: PVIVIT.
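A single-site fit of this kind can be sketched as below. The exact functional form and fitting procedure are not given in the caption, so this sketch assumes a normalized competition curve with unit Hill slope, bound fraction = 1 / (1 + c / IC50), and estimates IC50 by a simple grid search in log space. The function names, grid bounds, and synthetic data are hypothetical.

```python
import math

def bound_fraction(conc, ic50):
    """Assumed single-site model: fraction of Cn still bound to the labeled
    peptide at competitor concentration conc (same units as ic50)."""
    return 1.0 / (1.0 + conc / ic50)

def fit_ic50(concs, fractions, log10_lo=-3.0, log10_hi=3.0, steps=6001):
    """Least-squares estimate of IC50 on a log10-spaced grid."""
    best_ic50, best_sse = None, math.inf
    for k in range(steps):
        log_ic50 = log10_lo + (log10_hi - log10_lo) * k / (steps - 1)
        ic50 = 10.0 ** log_ic50
        sse = sum((bound_fraction(c, ic50) - f) ** 2
                  for c, f in zip(concs, fractions))
        if sse < best_sse:
            best_ic50, best_sse = ic50, sse
    return best_ic50

# Synthetic, noise-free example with a known IC50 of 2.0 (arbitrary units)
example_concs = [0.01, 0.1, 1.0, 10.0, 100.0]
example_fractions = [bound_fraction(c, 2.0) for c in example_concs]
estimated_ic50 = fit_ic50(example_concs, example_fractions)
```

A grid search is used here only to keep the sketch dependency-free; with real, noisy data a nonlinear least-squares routine with fitted upper and lower plateaus would be the more usual choice.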

Table 1.

List of peptide sequences characterized by competitive fluorescence polarization assay.

Abbreviations: IC50: half-maximal inhibitory concentration; NB: no binding; low T: low-temperature sampling.