Fig 1.
Binder design pipeline and BinderFlow architecture.
(A) Schematic of the BinderFlow pipeline: 1. The user defines a hotspot on a surface of interest. 2. The target structure is trimmed to increase computational efficiency. 3. RFD.sh produces protein backbones of a specified length that are complementary in shape to the target. 4. align_filtering.py filters out suboptimal backbones that might be problematic for expression, such as long helices or isolated hairpins. 5. pMPNN.sh assigns an amino acid sequence to each backbone. 6. scoring.sh predicts the binder-target complex with AF2IG, collects the relevant AlphaFold2 scores and uses PyRosetta to measure a set of relevant interaction parameters. 7. The process is monitored in real time with the BFmonitor.py web-based tool. (B) Architecture of BinderFlow compared with a linear workflow. In a typical workflow, each step of the pipeline requires manual handling of input and output files, and manual inspection to remove obviously suboptimal candidate backbones. Moreover, each step runs as an independent job, hampering parallelisation. In contrast, BinderFlow distributes batches of end-to-end predictions as independent jobs, facilitating parallelisation of instances across GPUs and HPC cluster nodes. It includes automatic filtering of suboptimal backbones and real-time monitoring of the process, allowing a campaign to be stopped once a suitable number of hits has been achieved.
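The batch-parallel idea in panel B can be sketched in a few lines. This is a minimal illustration, not the actual BinderFlow code: the step logic inside `design_batch` is a hypothetical stand-in for RFdiffusion, filtering, ProteinMPNN and scoring, and the thread pool stands in for independent GPU/HPC jobs.

```python
# Minimal sketch of the batch-parallel architecture in panel B: each batch
# runs the full pipeline end to end as an independent job, so batches can be
# distributed across workers (in reality, GPUs or HPC cluster nodes).
# All step logic below is an invented stand-in for the real pipeline scripts.
from concurrent.futures import ThreadPoolExecutor

def design_batch(batch_id: int, n_designs: int) -> list[dict]:
    """Run one batch end to end: generate, filter, design and score (mocked)."""
    designs = []
    for i in range(n_designs):
        # stand-in for backbone generation + automatic filtering:
        if i % 4 == 0:  # pretend every 4th backbone fails the filter
            continue
        # stand-in for sequence design + AF2IG/PyRosetta scoring:
        designs.append({"id": f"b{batch_id}_d{i}",
                        "pae_interaction": 5.0 + (batch_id + i) % 10})
    return designs

def run_campaign(n_batches: int, batch_size: int, max_workers: int = 4) -> list[dict]:
    """Distribute batches as independent, parallel end-to-end jobs."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(design_batch, range(n_batches),
                           [batch_size] * n_batches)
    return [d for batch in results for d in batch]

scored = run_campaign(n_batches=3, batch_size=5)
```

Because every batch carries its own filtering and scoring, no step waits on a global synchronisation point, which is what enables the real-time monitoring and early stopping described above.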
Fig 2.
The BFmonitor web-based dashboard.
BFmonitor provides three tools to follow a design campaign in real time. (A) A scatter plot that updates in real time, showing any two of the variables calculated by scoring.sh. Each data point corresponds to an individual binder, and only those that fulfil all in silico criteria are coloured. (B) A radar plot enables pairwise comparison of all scores calculated for any two binders, providing visual information on how each metric compares between the two candidates. (C) An interactive 3D viewer allows the user to inspect the backbones of in silico hits (yellow cartoon) in the context of the target protein (grey surface) in real time.
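The colouring rule in panel A, together with the hit thresholds quoted later in the text (pae_interaction &lt; 10, plddt_binder &gt; 80), can be sketched as below. The dictionary layout and colour names are assumptions for illustration, not BFmonitor's internal representation.

```python
# Sketch of the hit-colouring rule in panel A: a design is coloured only if
# it satisfies every in silico criterion. Thresholds follow the typical
# cut-offs mentioned in the text; the data layout is a hypothetical example.
HIT_CRITERIA = {
    "pae_interaction": lambda v: v < 10,   # lower is better
    "plddt_binder":    lambda v: v > 80,   # higher is better
}

def is_hit(design: dict) -> bool:
    """True only if every criterion is fulfilled."""
    return all(check(design[key]) for key, check in HIT_CRITERIA.items())

def point_colour(design: dict) -> str:
    """Hits get a colour; all other points stay grey on the scatter plot."""
    return "tab:orange" if is_hit(design) else "lightgrey"

designs = [
    {"name": "d1", "pae_interaction": 7.2,  "plddt_binder": 88.0},  # passes both
    {"name": "d2", "pae_interaction": 14.5, "plddt_binder": 91.0},  # fails PAE
]
colours = [point_colour(d) for d in designs]
```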
Fig 3.
Alternative pipelines for binder refinement strategies: Partial Diffusion and Sequence Diversity.
(A) Comparison between the standard BinderFlow pipeline and proposed alternative strategies for binder refinement. The dashed arrow indicates skipping the backbone generation step. (B) Comparison of pae_interaction and plddt_binder score distributions resulting from the standard BinderFlow pipeline, Partial Diffusion and Sequence Diversity. The grey, dashed lines indicate the pae_interaction and plddt_binder scores of the design used to initialise the Partial Diffusion and Sequence Diversity runs, obtained from the standard BinderFlow pipeline. The black, dotted lines indicate typical thresholds for considering a binder a hit: pae_interaction &lt; 10 and plddt_binder &gt; 80. (C) Correlation between pae_interaction and Shape Complementarity scores for designs resulting from the Partial Diffusion run. (D) Correlation between pae_interaction and Shape Complementarity scores for designs resulting from the Sequence Diversity run. (E) Structural alignments between candidates selected for in silico refinement (yellow) and example hits from Partial Diffusion (purple) and Sequence Diversity (red). The target, PDL1, is shown as a grey cartoon or as a silhouette. The first row displays the initial structure of the candidates, while the second and third rows show backbone structural variation for each candidate after the respective refinement strategy. (F) Predictions of the input reference (yellow, Ref) and the top hit according to pae_interaction (red, Top), with PDL1 shown as an electrostatic surface. (G) Residue identity changes introduced during Sequence Diversity, shown on a sequence alignment of the interacting helix (residues 50–105) between Ref and Top. Two residue changes are highlighted in lime (S > A) and grey (A > R).
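The correlation analysis behind panels C and D reduces to a Pearson correlation between the two scores across a set of designs. The sketch below shows that calculation with invented toy values (the real score distributions are in the figure); the function and data names are illustrative.

```python
# Illustrative Pearson correlation between pae_interaction and Shape
# Complementarity, as plotted in panels C and D. The numbers below are
# invented toy data, chosen to show a clear negative trend: as
# pae_interaction worsens (rises), shape complementarity tends to drop.
import math

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

pae = [6.1, 7.8, 9.5, 12.0, 14.3]      # pae_interaction (lower is better)
sc = [0.72, 0.68, 0.66, 0.60, 0.55]    # Shape Complementarity (higher is better)
r = pearson(pae, sc)                   # strongly negative for these toy data
```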
Fig 4.
(A) Comparison of time spent per step using a linear binder design pipeline and the BinderFlow architecture. Left: normalised times and the proportion of the run length allocated to each step. Right: comparison of the total wall times required to obtain 24 hits. *Align & filter times are too short to be visible in the bar plot (S1 Fig). (B) Comparison of wall time per pipeline step between the linear approach (Lin) and BinderFlow (BF). The same campaigns were executed on RTX4090 and RTX2080 GPUs. The number of designs processed at each step is indicated on the x-axis. The “Aligning Filtering” and “PyRosetta” steps are performed only by BinderFlow.
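A back-of-the-envelope model makes the wall-time gap in panel A intuitive. All durations and counts below are invented placeholders (the measured times are in the figure); only the structure of the calculation is the point: a linear workflow serialises every step across all batches, while end-to-end batches put only the per-GPU share on the critical path.

```python
# Toy wall-time model for panel A (right). Step durations and batch counts
# are invented placeholders, not measured values from the paper.
import math

step_minutes = {"RFdiffusion": 30, "ProteinMPNN": 5, "AF2 scoring": 20}
n_batches, n_gpus = 8, 4

# Linear workflow: each step runs as one sequential job over all batches,
# with no overlap between steps or batches.
linear_wall = sum(t * n_batches for t in step_minutes.values())

# BinderFlow-style: each batch runs end to end independently, so batches
# spread across GPUs and only ceil(n_batches / n_gpus) rounds are serial.
batch_time = sum(step_minutes.values())
binderflow_wall = batch_time * math.ceil(n_batches / n_gpus)
```

With these placeholder numbers the linear run takes 440 minutes versus 110 for the batch-parallel run; the real speed-up depends on GPU counts, step timings and scheduler overheads, as the benchmarks in this figure show.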