RosettaAntibodyDesign (RAbD): A general framework for computational antibody design

A structural-bioinformatics-based computational methodology and framework have been developed for the design of antibodies to targets of interest. RosettaAntibodyDesign (RAbD) samples the diverse sequence, structure, and binding space of an antibody to an antigen in highly customizable protocols for the design of antibodies in a broad range of applications. The program samples antibody sequences and structures by grafting structures from a widely accepted set of the canonical clusters of CDRs (North et al., J. Mol. Biol., 406:228–256, 2011). It then performs sequence design according to amino acid sequence profiles of each cluster, and samples CDR backbones using a flexible-backbone design protocol incorporating cluster-based CDR constraints. Starting from an existing experimental or computationally modeled antigen-antibody structure, RAbD can be used to redesign a single CDR or multiple CDRs with loops of different length, conformation, and sequence. We rigorously benchmarked RAbD on a set of 60 diverse antibody–antigen complexes, using two design strategies—optimizing total Rosetta energy and optimizing interface energy alone. We utilized two novel metrics for measuring success in computational protein design. The design risk ratio (DRR) is equal to the frequency of recovery of native CDR lengths and clusters divided by the frequency of sampling of those features during the Monte Carlo design procedure. Ratios greater than 1.0 indicate that the design process is picking out the native more frequently than expected from their sampled rate. We achieved DRRs for the non-H3 CDRs of between 2.4 and 4.0. The antigen risk ratio (ARR) is the ratio of frequencies of the native amino acid types, CDR lengths, and clusters in the output decoys for simulations performed in the presence and absence of the antigen. For CDRs, we achieved cluster ARRs as high as 2.5 for L1 and 1.5 for H2. 
For sequence design simulations without CDR grafting, the overall recovery of the native amino acid types for residues that contact the antigen in the native structures was 72% in simulations performed in the presence of the antigen and 48% in simulations performed without the antigen, for an ARR of 1.5. For the non-contacting residues, the ARR was 1.08. This shows that the sequence profiles are able to maintain the amino acid types of these conserved, buried sites, while recovery of the exposed, contacting residues requires the presence of the antigen-antibody interface. We tested RAbD experimentally on both a lambda and a kappa antibody–antigen complex, successfully improving their affinities 10- to 50-fold by replacing individual CDRs of the native antibody with new CDR lengths and clusters.
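Both risk ratios defined above are simple quotients of two frequencies. The following minimal Python sketch reproduces the arithmetic for the contact-residue ARR quoted above; the function name `risk_ratio` is illustrative only and is not part of RAbD.

```python
def risk_ratio(observed_freq: float, reference_freq: float) -> float:
    """Ratio of two frequencies (hypothetical helper, not a RAbD function).

    DRR: observed = recovery frequency of the native feature,
         reference = frequency with which that feature was sampled.
    ARR: observed = frequency with the antigen present,
         reference = frequency with the antigen absent.
    """
    if reference_freq <= 0.0:
        raise ValueError("reference frequency must be positive")
    return observed_freq / reference_freq


# Contact-residue sequence recovery quoted in the text: 72% vs. 48%.
arr_contacts = risk_ratio(0.72, 0.48)
print(f"ARR for antigen-contacting residues: {arr_contacts:.2f}")
```

A ratio above 1.0 means the feature appears more often than in the reference condition.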


Rosetta Commands
For the benchmarking set, full commands and CDR Instruction File contents are given below. Note that some options are the default and do not need to be specified, but are given in the file explicitly anyway. To prepare the structures for design, they are relaxed with the Rosetta force field using the FastRelax protocol with these commands:

Dihedral, Epitope, and Paratope Constraints
Several constraint types are used by the Antibody Design framework to limit unproductive structural perturbations of the CDR regions and the relative orientation of the antibody-antigen interface while docking in the program. There are many constraint types with associated function types implemented in Rosetta. These constraints are evaluated via terms added to the Rosetta energy function. The Rosetta energy minimizer (which optimizes the conformation of the structure by finding the local energy minimum) can use these constraints to find optimal values that help to satisfy all the energy terms including the constraints. Within the Rosetta Antibody Design framework, the weight of these constraints can be set from command-line options. We can set parameters that govern whether these constraints are used throughout the protocol (where they also act as structural filters) or only in certain situations like minimization or docking (where they act only to guide the structure to an optimum conformation that satisfies the constraints).
The general antibody constraint movers that were implemented are the CDRDihedralConstraintMover, the ParatopeSiteConstraintMover, and the ParatopeEpitopeSiteConstraintMover. These Movers (a Mover applies some change to a Pose or structure [1]) can be fine-tuned for specific design strategies using a number of user-accessible options and RosettaScripts [2].
The CDRDihedralConstraintMover places Circular Harmonic constraints on each φ and ψ dihedral angle of a given CDR, as either cluster-specific or general-use constraints. The Circular Harmonic constraint has the form $f(x) = \left( \mathrm{NearestAngle}(x - x_0) / \sigma \right)^2$, where $x_0$ is the starting dihedral angle, $x$ is the changed dihedral angle, $\sigma$ is the standard deviation of $x$, and NearestAngle maps the angular difference onto its periodic equivalent closest to zero. These constraints are added to help Rosetta maintain a particular loop structure during any backbone optimization. Dihedral constraints are used instead of coordinate constraints (which try to keep each atom at a particular Cartesian coordinate) in order to allow more natural, hinge-like motion of the CDR loops. Users of the protocol who wish to design antibodies without these constraints can set the weight of the dihedral constraint to zero via a command-line option.
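As an illustration of how such a periodic penalty behaves, the following Python sketch evaluates a circular harmonic score for a single dihedral angle. It is a stand-alone approximation written for this text, not the Rosetta implementation; it works in degrees for readability, and the function names are hypothetical.

```python
def nearest_angle(delta_deg: float) -> float:
    """Wrap an angular difference into the interval [-180, 180) degrees."""
    return (delta_deg + 180.0) % 360.0 - 180.0


def circular_harmonic(x_deg: float, x0_deg: float, sd_deg: float) -> float:
    """Squared, periodicity-aware deviation of x from x0 in units of sd."""
    z = nearest_angle(x_deg - x0_deg) / sd_deg
    return z * z


# -175 and 175 degrees are only 10 degrees apart on the circle, so with
# sd = 10 degrees the penalty is one unit rather than (350 / 10) ** 2.
print(circular_harmonic(-175.0, 175.0, 10.0))
```

Because the difference is wrapped before squaring, a dihedral that crosses the ±180° boundary is not penalized as if it had moved almost a full turn.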
The cluster-specific constraints have the value of x0 and standard deviation for each backbone φ and ψ dihedral angle at the angle mean and standard deviation of the members of the cluster. These constraints are output by PyIgClassify using a high-quality set of non-redundant data. The default behavior of the CDRDihedralConstraintMover is to add these constraints for a particular CDR only if there are enough members in the cluster to have reliable data. If data are scarce, then general dihedral constraints are added, with means at the current angles and a standard deviation that was originally compiled by taking the mean of the standard deviations of all CDR clusters. These angles can be set via command-line options. By default, we use these general dihedral constraints for H3, since it does not cluster well.
For the cluster-specific constraints (and other places in the protocol), we generally filter out outliers in the data as described below. This can be turned off through the use of an option that will load a different set of constraints compiled with structures that are not filtered for outliers. This can be useful if using outliers elsewhere in the protocol.
SiteConstraints are a set of atom-pair constraints that evaluate whether a residue interacts with some other chain or region (roughly, whether it is or is not in a binding site). More specifically, a SiteConstraint on a particular residue consists of a set of distance constraints from the Cα atom of that residue to the Cα atoms of all other residues in a set, typically specific residues on another chain or chains. After each constraint is evaluated, only the constraint giving the lowest score is used as the SiteConstraint energy for that residue. These SiteConstraints use a Flat Harmonic function by default: $f(x) = 0$ if $|x - x_0| \le \mathrm{tol}$, and $f(x) = \left( (|x - x_0| - \mathrm{tol}) / \sigma \right)^2$ otherwise. The standard deviation $\sigma$ is set at 1 Å, while the tolerance is set at the interface distance of the protocol (8 Å default), which means that there is no penalty for the SiteConstraint except at distances greater than this distance.
The ParatopeSiteConstraintMover adds SiteConstraints between each CDR residue and the antigen. This helps to keep the CDR paratope at the interface during docking; without it, docking can use the whole of the antibody surface instead of just the paratope, and this can be seen in the resulting models. These paratope constraints are added automatically in the program, and the CDRs that make up the paratope can be controlled through an option.
The ParatopeEpitopeSiteConstraintMover adds SiteConstraints from the epitope residues to the paratope residues and from the paratope to the epitope. Target epitope residues can be specified via the command line or detected automatically using the set interface distance. These constraints are off by default, but if they are enabled, they are used instead of those of the ParatopeSiteConstraintMover and help to keep the paratope and the epitope in contact during design when the docking component of the algorithm is enabled.
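The "lowest score wins" behavior of a SiteConstraint can be sketched in a few lines of Python. This is an illustrative stand-alone version, not Rosetta code: the function names are hypothetical, and centering the flat harmonic at a distance of 0 Å (so that the tolerance acts as a simple distance cutoff) is an assumption based on the description above.

```python
import math


def flat_harmonic(x: float, x0: float = 0.0, sd: float = 1.0, tol: float = 8.0) -> float:
    """Zero within tol of x0, harmonic with width sd beyond that."""
    dev = abs(x - x0)
    if dev <= tol:
        return 0.0
    z = (dev - tol) / sd
    return z * z


def site_constraint_energy(ca_xyz, partner_ca_xyz, sd=1.0, tol=8.0):
    """Lowest flat-harmonic penalty over all Calpha-Calpha distances.

    ca_xyz: (x, y, z) of the constrained residue's Calpha atom.
    partner_ca_xyz: iterable of (x, y, z) Calpha coordinates on the partner.
    x0 = 0 is an assumption made for this sketch.
    """
    return min(
        flat_harmonic(math.dist(ca_xyz, partner), 0.0, sd, tol)
        for partner in partner_ca_xyz
    )


# Toy coordinates: the closest partner Calpha is 10 A away, i.e. 2 A
# beyond the 8 A tolerance, so the penalty is (2 / 1) ** 2.
print(site_constraint_energy((0.0, 0.0, 0.0), [(10.0, 0.0, 0.0), (20.0, 0.0, 0.0)]))
```

Taking the minimum means a residue is satisfied as soon as any one residue of the partner set is within the tolerance distance.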

Outlier Control
In our original clustering of the antibody CDR structures, an affinity propagation clustering technique was used on a carefully curated dataset of high-resolution structures with few outliers [3]. In order to match new CDR structures with a proper cluster from that original clustering, we use the dihedral angle metric originally used for the affinity propagation, but measure it against the centroid (representative structure) of every cluster of the same length. The cluster with the lowest dihedral distance is assigned as the cluster for that structure [4].
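The nearest-centroid assignment described above can be sketched as follows. The per-angle distance 2(1 − cos Δ) summed over the backbone dihedrals is used here as an assumed form of the dihedral metric, and the function names, cluster labels, and angles are made up for illustration; this is not the PyIgClassify or Rosetta code.

```python
import math


def dihedral_distance(angles_a, angles_b):
    """Sum of 2 * (1 - cos(delta)) over paired dihedral angles in degrees."""
    if len(angles_a) != len(angles_b):
        raise ValueError("angle lists must have the same length")
    return sum(
        2.0 * (1.0 - math.cos(math.radians(a - b)))
        for a, b in zip(angles_a, angles_b)
    )


def assign_cluster(cdr_angles, centroids):
    """Return (name, distance) of the closest same-length centroid.

    centroids: dict mapping a cluster name to its centroid's dihedral list.
    """
    candidates = {
        name: dihedral_distance(cdr_angles, angles)
        for name, angles in centroids.items()
        if len(angles) == len(cdr_angles)
    }
    if not candidates:
        raise ValueError("no centroid with a matching number of dihedrals")
    best = min(candidates, key=candidates.get)
    return best, candidates[best]


# Toy example: two invented centroids, each with two phi/psi pairs.
toy_centroids = {
    "toy-A": [-60.0, -45.0, -60.0, -45.0],
    "toy-B": [-120.0, 130.0, -120.0, 130.0],
}
print(assign_cluster([-65.0, -40.0, -58.0, -50.0], toy_centroids))
```

Only centroids with the same number of dihedrals are compared, which mirrors the restriction to clusters of the same CDR length.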
While this is useful to assign CDRs of known length to a particular cluster, many structures become outliers of the particular cluster and would have formed their own cluster if clustering was repeated (Kelow and Dunbrack, in preparation). To optimize our CDR profiles, constraints, and other aspects of the design program for an updated database, we needed to quantitatively define what would be considered an outlier.
We used both the dihedral distance metric and the RMSD of all backbone atoms to help define an outlier. In order to visualize the breadth of each cluster, we generated PyMOL sessions of each of the clusters using Python and PyRosetta [5] by aligning the CDRs to their cluster center, either using all backbone heavy atoms or using only the stem region (three framework residues on either side of the CDR loop). We also generated plots of dihedral distance versus RMSD and length versus RMSD for both alignment types and for each length and cluster; in these plots, high RMSD can be seen even at low dihedral distance, especially when only the stem was aligned. We then used these plots and the PyMOL visualizations for each CDR cluster to define two outlier definitions: one conservative and one liberal (used as the default). We calculate the RMSD for these definitions from the full CDR alignments, as the stem alignment can result in very high RMSD at low dihedral angle distances, attributable to hinge-like motions in the CDR:

Å) then CDR is Outlier
Outlier control is handled as an option in the Antibody Design framework, where each set of data used by the framework is first generated with and without outliers, using both the liberal and the conservative definition of an outlier. For smaller clusters or H3 (which clusters well only at lengths ≤ 9), outliers may be useful in the design search, and an option will switch all aspects of the framework to include outliers for sequence and dihedral constraint statistics as well as graft sets. By default, outliers are excluded for all CDRs except H3, for which they are used because it does not cluster well.

Antibody Feature Analysis
Three FeatureReporters were developed as a part of the Rosetta Feature Reporter framework [6][7][8] to aid in the modeling and design of antibody structures. Each of these can be used through the RosettaScript framework on a list of structures. The physical attributes reported are output to a relational database, such as SQLite3, across multiple tables for further analysis. These databases can easily be converted into CSV files or read by available packages in R and Python.
The CDRClusterFeature reporter assigns each CDR in an antibody to the closest North/Dunbrack cluster centroid using the same metric described in PyIgClassify, and reports information pertaining to the dihedral distances of the CDRs [4]. It is the primary FeatureReporter used in benchmarking length and cluster recovery. The database tables output by the CDRClusterFeatures are detailed in Table D in S1 Supporting Information.
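As an example of the conversion mentioned above, a table in an SQLite3 features database can be exported to CSV with the Python standard library alone. The sketch below is generic: the helper name `export_table_to_csv` is hypothetical, and the database path and table name are supplied by the user (the actual table names are those listed in the Supporting Information tables).

```python
import csv
import sqlite3
from contextlib import closing


def export_table_to_csv(db_path: str, table: str, csv_path: str) -> int:
    """Write every row of `table` to `csv_path`; return the number of data rows."""
    with closing(sqlite3.connect(db_path)) as conn:
        # Check the table name against the schema before interpolating it.
        known = {
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table'"
            )
        }
        if table not in known:
            raise ValueError(f"no table named {table!r} in {db_path}")
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        header = [col[0] for col in cursor.description]
        rows = cursor.fetchall()
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)   # column names first
        writer.writerows(rows)    # then one line per database row
    return len(rows)
```

Calling `export_table_to_csv("features.db", "some_table", "some_table.csv")` with a real database path and table name writes a header line followed by one line per row.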
The InterfaceFeature Reporter, detailed in Table E in S1 Supporting Information, analyzes protein-protein and protein-ligand interfaces, outputting a number of different tables and physical data. Much of the analysis is done through the Rosetta InterfaceAnalyzer [9,10], which we have updated. The InterfaceAnalyzer calculates differences in scoring (such as an estimate of the interface ΔG, the enthalpic component of the full binding free energy) by physically separating the interface components (such as antibody from antigen) and optimizing interface residue side chains in both the complexed and the separated conformations. The default interface distance is 6 Å.
Separate tables are output for the overall complex, the individual proteins in the complex, and the interface residues. The main data output by this Reporter are the estimated binding energy (ΔG) of the complex in Rosetta Energy Units (REU), the change in solvent accessible surface area upon binding (ΔSASA) using the Le Grand SASA calculation method [11], the Lawrence and Colman shape complementarity of the interface (sc_value) [12], the packing quality (packstat) [13], and the number of unsaturated hydrogen bonds in the complex [10].
We added alternative SASA radius sets to Rosetta. The previous standard radii, now defunct, are no longer the default and are instead available as the 'legacy' set. We implemented a variety of radius sets found in the literature and used in various structural modeling programs, which include hydrogen atoms either implicitly or explicitly. Once a particular radius set is selected, the SASA machinery treats hydrogen atoms implicitly or explicitly during the calculation, depending on the set.
The atomic radius set with implicit hydrogens is the one used by the program Naccess, a popular program used for the calculation of SASA [14]. This set was derived by Chothia in his seminal 1976 paper [15], while explicit hydrogen radius sets include the legacy radii, the Rosetta Lennard-Jones (LJ) radii (which are mostly the same as the LJ radii from the CHARMM molecular dynamics program [16]), and the radii used by the program reduce (a program for the placement of hydrogens onto molecular models and crystal structures) [17], originating from physical data obtained from Bondi [18] and Gavezzotti [19]. The reduce radius set is now the default in Rosetta.
We implemented the AntibodyFeature Reporter, a type of InterfaceFeature Reporter specific to antibody and antibody-antigen interfaces, which also outputs a number of additional metrics for antibodies and CDRs. Some of the main metrics include CDR, antibody, and paratope charge, ΔG and ΔSASA, H3 kink statistics, number of contacts, and packing angle statistics [20].
The packing angle is a measure of the relative orientation between the light and heavy antibody chains. It uses four conserved residues of each chain in the framework beta-sheets at the VL and VH interface and principal component analysis to define four centroid points and a dihedral angle that quantifies the orientation [21].
A full list of the metrics and tables output by the AntibodyFeature Reporter can be found in Table F in S1 Supporting Information. All tables output by the InterfaceFeature Reporter are also output by the AntibodyFeature Reporter for each of the following antibody interfaces, where A is the antigen: LH-A, L-H, L-A, and H-A.
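The final step of that calculation, a dihedral angle through four points, can be sketched in plain Python. The sketch covers only this last step: the selection of conserved residues and the principal component analysis that produce the four centroid points follow [21] and are not reproduced here, and the function name and toy coordinates are illustrative.

```python
import math


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four 3-D points."""
    def sub(a, b):
        return tuple(ai - bi for ai, bi in zip(a, b))

    def dot(a, b):
        return sum(ai * bi for ai, bi in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0 = sub(p0, p1)            # from the second point back to the first
    b1 = sub(p2, p1)            # central axis of the dihedral
    b2 = sub(p3, p2)
    norm = math.sqrt(dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Project the outer vectors onto the plane perpendicular to the axis.
    v = sub(b0, tuple(dot(b0, b1) * c for c in b1))
    w = sub(b2, tuple(dot(b2, b1) * c for c in b1))
    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))


# Toy points: the outer points lie on the same side of the axis (cis),
# so the angle is 0 degrees.
print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
```

Using `atan2` on the projected vectors gives a signed angle over the full −180° to 180° range without the numerical problems of an arccosine near 0° or 180°.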