RosettaSurf—A surface-centric computational design approach

Proteins are typically represented by discrete atomic coordinates, providing an accessible framework to describe different conformations. However, in some fields proteins are more accurately represented as near-continuous surfaces, as these are imprinted with geometric (shape) and chemical (electrostatics) features of the underlying protein structure. Protein surfaces depend on their chemical composition and ultimately determine protein function, acting as the interface that engages in interactions with other molecules. In the past, such representations were utilized to compare protein structures on global and local scales and have shed light on functional properties of proteins. Here we describe RosettaSurf, a surface-centric computational design protocol that focuses on the molecular surface shape and electrostatic properties as a means for protein engineering, offering a unique approach for the design of proteins and their functions. The RosettaSurf protocol combines the explicit optimization of molecular surface features with a global scoring function during the sequence design process, diverging from the typical design approaches that rely solely on an energy scoring function. With this computational approach, we attempt to address a fundamental problem in protein design related to the design of functional sites in proteins, even when structurally similar templates are absent in the characterized structural repertoire. Surface-centric design exploits the premise that molecular surfaces are, to a certain extent, independent of the underlying sequence and backbone configuration, meaning that different sequences in different proteins may present similar surfaces. We benchmarked RosettaSurf on various sequence recovery datasets and showcased its design capabilities by generating epitope mimics that were biochemically validated.
Overall, our results indicate that the explicit optimization of surface features may lead to new routes for the design of functional proteins.


Introduction
Proteins are key components in living cells, performing many functions that commonly rely on physical interactions between molecules. The molecular surface arising from the three-dimensional arrangement of the many atoms that compose a protein is a determinant of protein function and is therefore crucial for biological processes [1]. While discrete atomic-level protein representations have been invaluable for our understanding of protein function, near-continuous surface-based representations offer the opportunity to study protein structures using a different subset of features (e.g. electrostatic potentials, geometry).
In 1971, Lee and Richards introduced the concept of solvent-accessible surfaces, which in practice are generated by rolling a probe approximating a solvent molecule over the protein atoms [2]. The molecular surface, often denoted as solvent-excluded surface or Connolly surface [3-5], consists of the surface that can be directly contacted by the probe and the reentrant surface which smooths over gaps between the atoms that were not accessible to the probe sphere [4,6,7]. Numerical representations of surfaces have also been developed, ranging from dot surfaces to voxel representations and graphs [1,8-12]. These representations allow the mapping of molecular and geometric properties onto the generated surface, including physicochemical properties (such as electrostatics and hydrophobicity) and geometrical features (e.g. protrusions or cavities) [6,8,13-15].
Intuitively, the molecular surface forms the boundary between the protein and its surroundings, thus acting as the interface that engages in interactions with other molecules. The study of protein structures as near-continuous molecular surfaces is therefore important to understand structural and functional aspects of proteins, which may not be fully captured by a discrete atomic representation [13,16]. A widely studied category of features of molecular surfaces is their chemistry, in particular electrostatic potentials, and their implications for protein function. Most notably, such representations have been used to study various types of protein-substrate interactions [17,18]. Other important metrics used to study molecular surfaces are their shape-derived properties, commonly focusing on shape complementarity (S_C) or shape similarity (S_S). Shape complementarity is frequently used in the study of molecular recognition, e.g. protein-protein or protein-ligand interactions. In particular, protein-protein interactions (PPIs) have been extensively studied in terms of complementarity, showing that protein-protein interfaces are often highly complementary, both in shape and in charge [19,20]. On the other hand, shape similarity has been used to globally and locally compare biomolecules, aiding the functional annotation of proteins that show structural similarities but lack detectable sequence homology [11,21].
An important extension of such successful applications in analysis and prediction tasks using surface-based representations is that of protein design [6], where the objective is to guide the sequence search process to optimize specified surface features.
Here we present a new, surface-centric computational design strategy, termed RosettaSurf, that uses a description of the molecular surface shape and electrostatic properties as the objective function for scoring optimization in protein design. Working at the surface level of protein structures offers a unique approach for the design of proteins and their functions, as molecular surfaces are, to some extent, independent of the underlying sequence [11,14]. This means that different sequences in different proteins may present similar surfaces [11,14], a premise that drives our proposed methodological approach. In contrast, the large majority of computational design workflows entail a discrete atomistic description of the proteins, and sequence design is typically performed on the atom arrangements and interactions, i.e. by sampling different amino acid side chains and adjusting atoms in the protein structure to optimize a given energy function.
RosettaSurf is a protocol implemented in the Rosetta software package [33] and we assessed its performance with several benchmark tests, demonstrating the protocol's capabilities through the recovery of protein interfaces and its application to functional protein design.

RosettaSurf framework
The RosettaSurf protocol operates at the solvent-excluded surface level of a protein structure, and its core operation is the comparison between surfaces. To compare molecular surfaces, we defined a score that quantifies the similarity between two surfaces considering both shape and electrostatic features, and incorporated it within the Rosetta software package. The molecular surface is generated from the three-dimensional atomic coordinates of a protein [3,4] and stored as a discrete point cloud (Fig 1). Representation of the surface as a point cloud allows the featurization of the points with geometrical and chemical descriptors (Fig 1) and enables rapid calculations of the surface similarity score. We refer to the mutable surface as target while the reference surface used for comparisons is denoted as reference.
To describe the surface geometry, we developed a descriptor based on concepts introduced by Lawrence and Colman [19] that quantifies shape relationships of two surfaces relying on normal vector comparisons that are derived at each point of the surface (Fig 1). For all pairs of closest points of the target and reference surface the geometric similarity is evaluated. Averaging these individual similarity values yields an overall shape similarity (S_S) score that quantifies the similarity of the geometry of two given molecular surfaces. The S_S score ranges from 0 (no similarity) to 1 (identical shape).
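As a rough illustration of the closest-point, normal-vector comparison described above, the following Python sketch scores two small oriented point clouds. It is not the RosettaSurf implementation: the function name `shape_similarity`, the Gaussian distance weight `w`, and the clamping of negative dot products to zero are assumptions made here so that the toy score stays between 0 and 1; the actual definition is given in Materials and Methods.

```python
import numpy as np

def shape_similarity(target_pts, target_normals, ref_pts, ref_normals, w=0.5):
    """Toy shape-similarity score between two oriented point clouds.

    For every target point the closest reference point is found. Each pair
    contributes the dot product of the two unit normals, clamped to [0, 1]
    and damped by a Gaussian of the point-to-point distance. The mean over
    all pairs is returned. The weight w and the clamping are illustrative
    assumptions, not values taken from the paper.
    """
    tp = np.asarray(target_pts, dtype=float)
    rp = np.asarray(ref_pts, dtype=float)
    tn = np.asarray(target_normals, dtype=float)
    rn = np.asarray(ref_normals, dtype=float)
    tn = tn / np.linalg.norm(tn, axis=1, keepdims=True)
    rn = rn / np.linalg.norm(rn, axis=1, keepdims=True)

    # Pairwise squared distances (n_target x n_ref); adequate for small clouds.
    d2 = ((tp[:, None, :] - rp[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    nearest_d2 = d2[np.arange(len(tp)), nearest]

    # Normal agreement for each closest-point pair.
    dots = np.einsum("ij,ij->i", tn, rn[nearest])
    per_point = np.clip(dots, 0.0, 1.0) * np.exp(-w * nearest_d2)
    return float(per_point.mean())
```

With this toy definition, a surface compared against itself scores 1, and flipping the reference normals drives the score to 0, which mirrors how inverting normals switches between similarity and complementarity.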
Electrostatic similarity (E_S) of two surfaces is derived from comparing their electrostatic potentials, i.e. E_S is assessed by computing the correlations of the two electrostatic fields originating from the target and reference surface, respectively (Fig 1). Correlation coefficients are derived from this comparison that describe the similarity of the electrostatic fields. Values of E_S range from -1 (highly similar) to +1 (highly complementary).
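The correlation-based comparison can be sketched in a few lines of Python. The sketch assumes that the potentials of both surfaces have already been sampled at paired points and that the score is the negated Pearson coefficient, which is one way to obtain the stated range (-1 for highly similar, +1 for highly complementary); the function name and this sign convention are assumptions, and the exact definition is in Materials and Methods.

```python
import numpy as np

def electrostatic_similarity(phi_target, phi_ref):
    """Negated Pearson correlation of two potentials sampled at paired points.

    Identical potentials give -1 and sign-inverted potentials give +1,
    matching the range quoted in the text under the assumption stated above.
    """
    a = np.asarray(phi_target, dtype=float)
    b = np.asarray(phi_ref, dtype=float)
    if a.shape != b.shape:
        raise ValueError("potentials must be sampled at the same points")
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    if denom == 0.0:
        raise ValueError("potentials must not be constant")
    r = (a * b).sum() / denom  # plain Pearson correlation
    return float(-r)
```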
To accurately capture the relative contributions of both shape and electrostatics to the similarity of molecular surfaces, we combined both scores into a single surface similarity score (Surf_S). Since we were interested in assessing similarity of surface sites that are well maintained and relevant to functional protein design, we focused on interfaces of PPIs and performed logistic regression on a dataset of 2,660 protein complexes to optimize the individual weights of the Surf_S score for each component, shape and electrostatics.
The Surf_S score is defined as follows: The final Surf_S score combines both properties into a single score scaled from 0 to 1, where 1 represents highly similar surfaces. A detailed explanation of the computation of the individual scores and the resulting Surf_S score can be found in Materials and Methods. We note that the developed surface similarity score can be straightforwardly converted to surface complementarity (S_C) by inverting the normal vectors of the reference surface. Complementarity has typically been used to evaluate PPIs and was first demonstrated by Lawrence and Colman for molecular shapes [19].
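Purely to illustrate how a logistic model can map a shape term and an electrostatic term onto a single 0-to-1 score, the sketch below combines the two with made-up weights. The weights `W_SHAPE`, `W_ELEC` and `BIAS` are hypothetical placeholders, not the fitted values, and the functional form is an assumption; the definition used by RosettaSurf is the one given in Materials and Methods.

```python
import math

# Hypothetical weights for illustration only; the fitted values are not
# reproduced here and must be taken from Materials and Methods.
W_SHAPE = 4.0
W_ELEC = 2.0
BIAS = -3.0

def surf_similarity(s_s, e_s, w_shape=W_SHAPE, w_elec=W_ELEC, bias=BIAS):
    """Logistic combination of shape (0..1) and electrostatic (-1..+1) terms.

    e_s is negated because, in the convention of the text, -1 means
    highly similar electrostatics.
    """
    z = bias + w_shape * s_s + w_elec * (-e_s)
    return 1.0 / (1.0 + math.exp(-z))
```

Any monotone link would preserve the ranking of candidate surfaces; the logistic form simply keeps the output bounded between 0 and 1.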

Surface-centric protein design protocol
We developed the RosettaSurf design protocol as part of the RosettaScripts software environment [34]. It utilizes the described surface scoring function during the sequence sampling stage of the design process to bias the selection of amino acids and rotamers towards a desired surface configuration in terms of geometric and electrostatic properties (S1 Fig). Thereby, it is possible to design proteins for function without relying on protein grafting but by optimizing the molecular surface, for instance, to mimic a given active site.
With the RosettaSurf design protocol we explicitly optimize molecular surface features during the protein design process. In practice, RosettaSurf performs sequence design such that the mutable surface of the design scaffold (target) is optimized to closely match the surface features of a reference protein (reference) (S1 Fig). To efficiently explore the sequence space during the design process, Monte Carlo simulated annealing guides the optimization of rotamers, where substitutions of residues are scored based on the resulting surface and accepted if they pass the Monte Carlo criterion that is implemented as the Surf_S score.
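The acceptance logic of such a search can be sketched as a generic Metropolis simulated-annealing loop over residue identities. This is a toy stand-in, not the Rosetta packer: the names `anneal` and `toy_surface_score`, the geometric cooling schedule, and the use of sequence identity to a reference as a proxy for the surface score are all assumptions made for illustration.

```python
import math
import random

def anneal(score, n_positions, alphabet, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Maximize score(sequence) with Metropolis simulated annealing."""
    rng = random.Random(seed)
    current = [rng.choice(alphabet) for _ in range(n_positions)]
    current_score = score(current)
    best, best_score = list(current), current_score
    for step in range(steps):
        # Geometric cooling from t_start to t_end (an illustrative choice).
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        pos = rng.randrange(n_positions)
        trial = list(current)
        trial[pos] = rng.choice(alphabet)
        trial_score = score(trial)
        delta = trial_score - current_score
        # Metropolis criterion: always accept improvements, sometimes accept losses.
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, current_score = trial, trial_score
            if current_score > best_score:
                best, best_score = list(current), current_score
    return "".join(best), best_score

def toy_surface_score(reference):
    """Stand-in for a surface score: fraction of positions matching a reference."""
    def score(seq):
        return sum(a == b for a, b in zip(seq, reference)) / len(reference)
    return score
```

In the actual protocol each substitution would be evaluated by regenerating the surface and comparing it to the reference surface; here the score is only a placeholder with the same "higher is better" behavior.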
To reduce computational time spent on rotamer sampling in a combinatorial fashion on the overall surface, we implemented a single amino acid scanning surface-centric protein design approach (RosettaSurf-site). This protocol samples amino acids individually at each position of the design surface, selects the top three rotamers according to the surface and samples those combinatorially with other designable positions, reducing the combinatorial possibilities.
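The two-stage idea, scanning each position on its own and then enumerating combinations of a short list only, can be sketched as follows. The function `site_scan` and its callback signatures are hypothetical, and residue identities stand in for rotamers to keep the example small.

```python
from itertools import product

def site_scan(positions, candidates, single_score, combo_score, top_n=3):
    """Single-site scan followed by combinatorial search over the shortlists.

    positions:    list of position identifiers
    candidates:   list of residue choices tried at every position
    single_score: callable (position, residue) -> float, higher is better
    combo_score:  callable ({position: residue}) -> float, higher is better
    """
    shortlist = {}
    for pos in positions:
        ranked = sorted(candidates, key=lambda res: single_score(pos, res), reverse=True)
        shortlist[pos] = ranked[:top_n]

    best, best_score = None, float("-inf")
    # Enumerate only top_n ** len(positions) combinations.
    for combo in product(*(shortlist[p] for p in positions)):
        assignment = dict(zip(positions, combo))
        s = combo_score(assignment)
        if s > best_score:
            best, best_score = assignment, s
    return shortlist, best, best_score
```

With three choices kept per position, five designable positions require 3^5 = 243 combined evaluations instead of 19^5 (about 2.5 million) if all 19 residue types were combined exhaustively.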
In our benchmark studies we sought to compare the developed surface-centric design protocol, RosettaSurf, to state-of-the-art macromolecular design approaches implemented in Rosetta. The Rosetta energy function has been parametrized to evaluate and optimize the energy of many different aspects of molecular interactions (e.g. protein stability, protein-ligand, protein-protein and protein-nucleic acid interactions); it contains both statistical and physics-based terms and is calibrated using a discrete atomic representation of the molecules. This type of optimization has been particularly successful for the design of novel sequences that fold into defined protein structures. However, designing proteins that display defined motifs capable of performing biological functions has proven difficult. By focusing the design process on the areas where molecular interactions occur, the protein surfaces, novel approaches attempting to design function into proteins may represent new routes to address this problem.
Two types of metrics in surface comparison are considered: surface complementarity and surface similarity. We show how the surface similarity score captures features of individual residues and demonstrate the ability of the RosettaSurf protocol to recover amino acid sequences of native protein interfaces. Furthermore, we highlight the performance of RosettaSurf for the design of surface patches at protein interfaces.

Single amino acid recovery
To evaluate the accuracy of our design protocol, we performed a benchmark on the recovery of single amino acids in native protein interfaces. The amino acid of interest is substituted by each of the 19 considered amino acid identities, and the surfaces of the substituted amino acids are compared to the surface of the native rotamer (Fig 2A). Cysteine is excluded because it can form covalent disulfide bridges, which renders the resulting surface dependent on the interplay of two amino acids. These calculations are performed without the knowledge of the native amino acid surface and this information is only used for the assessment of similarity upon each substitution. For each amino acid type a dataset of 100 crystal structures of protein complexes forming transient interactions was compiled, resulting in a total of 1,900 complexes (for details see Materials and Methods). The energy computed by Rosetta, in Rosetta Energy Units (REU), serves as a baseline for comparison to the surface-centric design strategy. We evaluated the recovery of the different amino acids for mutations made in the bound and unbound state of the protein complex, respectively (Fig 2B). The recovery of an amino acid is deemed successful when the native amino acid has the best score among all 19 amino acids and is uniquely identifiable. In this benchmark we evaluated the performance of shape similarity, electrostatic similarity and both combined in a surface similarity score. The surface similarity measurement is highly accurate in identifying native amino acids in the bound state of the protein complexes, showing consistently higher sequence recovery rates than Rosetta energy (Fig 2B).
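The success criterion, in which the native residue must hold the best score and must not be tied with any other residue, can be written down directly. The helper names below are hypothetical; the `higher_is_better` switch is included because similarity scores are maximized whereas energies in REU are minimized.

```python
def native_recovered(scores, native, higher_is_better=True):
    """Return True only if the native residue is the unique best scorer.

    scores: dict mapping residue identity -> score for one benchmark case.
    """
    values = scores.values()
    best = max(values) if higher_is_better else min(values)
    winners = [res for res, s in scores.items() if s == best]
    return winners == [native]

def recovery_rate(cases, higher_is_better=True):
    """Fraction of (scores, native) cases in which the native is recovered."""
    hits = sum(native_recovered(s, n, higher_is_better) for s, n in cases)
    return hits / len(cases)
```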
Incorporating the electrostatic similarity term into the surface score generally results in a boost in recovery rates over shape similarity alone, in particular for amino acids that have
Fig 2. Single mutant discrimination using surface similarity score in protein-protein complexes. A) Surface similarity evaluation protocol for single amino acids. B) Recovery for all 19 considered amino acid types in bound (top) and unbound (bottom) complex states, evaluated with four different metrics: S_S (shape similarity), REU (Rosetta energy unit), E_S (electrostatic similarity), and Surf_S (surface similarity). C) Average surface similarity score when performing all-against-all amino acid comparison for bound (top) and unbound (bottom) complex states. The highest mean Surf_S score for every amino acid is highlighted.
The Rosetta energy function shows the best recovery rates for amino acids with unique features on their side-chains, i.e. glycine and proline. However, even for these cases, surface similarity outperforms the recovery by REU. A similar trend can be observed for the unbound complexes, although the general success of retrieving the native amino acid decreases for certain amino acids. This trend is most obvious for polar residues and large side chains (e.g. arginine, lysine, or methionine) that have access to a variety of different rotamer conformations as the lack of the binding partner allows a larger conformational space for different rotamers to be explored. For these types of amino acids, the sequence recovery rate is higher in the bound conformation as the presence of the binder restricts the possible rotamer space that can be explored, especially for highly exposed residues (S2A and S2B Fig). A similar degradation in performance can be observed for tyrosine, phenylalanine, and histidine in the unbound benchmark case. These amino acids share common surface signatures according to our score (Fig 2C), complicating their recovery which is amplified in the unbound benchmark cases.
Moreover, the surface similarity scores retrieved from an all-against-all amino acid comparison demonstrate the high accuracy by which most amino acids can be identified (Fig 2C). Here, we computed mean surface similarity values for all substitutions tested, i.e. for each native amino acid we computed the similarity of each of the 19 amino acids to the native one, averaged across the 100 protein complexes. The results show that each of the 19 amino acids is generally most similar to itself, demonstrating that the method can accurately distinguish between the different amino acid types. Notable exceptions are residues that share similar geometrical features, e.g. phenylalanine and histidine or tyrosine.
Close inspection of the considered surface of a single amino acid demonstrates the local precision of the surface similarity score (S3 Fig). Small off-rotamer deviations are captured on the point cloud and are specific for the mismatching surface area. These results indicate that the surface similarity score is sensitive to small differences in the comparison of two surface point clouds. Overall, these results clearly show that the developed surface similarity score can capture local differences in surfaces providing the basis for the evaluation of differences between full surface patches.

Protein interface sequence recovery
Having shown that the implemented surface similarity score is sufficiently accurate to recover individual amino acids, the following benchmark expands on recovering entire surface patches in protein interfaces. We evaluated the ability of our surface-based sequence design protocols (RosettaSurf and RosettaSurf-site) to recover natural protein interfaces as a gateway for the design of proteins endowed with biochemical function. By focusing on protein interfaces, the correct sequence does not solely depend on minimizing the Rosetta scoring function, but rather needs to represent the surface properties of the interface site. While such a design scenario has limited applicability for the de novo design of protein-protein interactions, it is important for applications in the domain of immunogen design for vaccine development, where the surface mimicry of known surfaces (neutralizing epitopes) is critical for the biological activity of this type of design [28,30-32].
In this benchmark we considered nine protein-protein complexes forming transient interactions (for details see Materials and Methods), grouped into three categories (low complementarity, high complementarity, antibody/antigen), and evaluated the performance by assessing sequence recovery (Fig 3A). We compared three different design approaches: FixBB, RosettaSurf, and RosettaSurf-site. All protocols operate on fixed backbones of the protein complexes that are presented in the native conformation that mediates the interaction. Consequently, the sequence recovery success will largely depend on side-chain placement in a given backbone. The surface-centric design approaches are compared to the standard FixBB Rosetta design protocol, using the ref2015 scoring function [35]. In the presented benchmarks, the Rosetta FixBB design protocol serves as baseline to assess the impact of surface-centric design on sequence recovery.
In a first step, the target protein's interface was stripped of its native sequence by mutating all interface residues to alanine. Second, the different design protocols were employed on the interface positions, with FixBB using the Rosetta energy function to select mutations while RosettaSurf was guided by our surface similarity scoring term (Fig 3A). RosettaSurf was therefore provided with the native protein surface that we aimed to mimic as reference surface, thus actively optimizing towards that given native surface during the design process. With this setup it is possible to evaluate the sensitivity of the Surf_S score to recover native amino acids compared to a known ground truth. This is in contrast to the single amino acid recovery benchmark, where the reference surface solely served as comparison to evaluate the similarity of the native surface and the surface resulting from the mutation without affecting the amino acid change itself.

PLOS COMPUTATIONAL BIOLOGY
All design protocols were employed for the bound and unbound states of the protein complexes, and results are reported by category as mean sequence recovery rate (Fig 3A). Overall, sequence recovery rates were higher for surface-centric design pipelines, while standard Rosetta sequence design showed lower recovery rates regardless of the presence of the binder (Fig 3B). For all studied complexes, RosettaSurf and RosettaSurf-site outperformed FixBB by 36-39 percentage points in the unbound and 30-31 percentage points in the bound state. As expected, sequence recovery was in general more successful in the presence of the binding partner due to the reduced number of possible side-chains and rotameric conformations. However, even in the presence of the binder, surface-centric design can be applied to improve results. FixBB demonstrated improved performance for the antibody/antigen test set; however, the surface-centric protocols still reached better results that were comparable to the other categories. Worth noting is the 100% sequence recovery success of RosettaSurf-site for the D8 protein-vv138 antibody complex. The FixBB protocol performed slightly better for low-complementarity (55%) than for high-complementarity complexes (48%), while RosettaSurf and RosettaSurf-site resulted in similar recovery rates of ~85% in both cases, with the notable exception of RosettaSurf-site for low-complementarity complexes. Here, RosettaSurf-site was able to recover 92% of the sequence and, for one complex of that category, the Enterotoxin G-T-cell receptor complex, even 100%.
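For reference, sequence recovery over a set of designable positions and the percentage-point difference between two protocols can be computed as in the short sketch below. The function name and the example sequences used with it are hypothetical and do not correspond to any complex in the benchmark.

```python
def sequence_recovery(native, designed, positions=None):
    """Percentage of designable positions at which the design matches the native.

    native, designed: equal-length sequences (strings of one-letter codes)
    positions:        optional iterable of 0-based indices restricting the
                      comparison to the designed interface positions
    """
    if len(native) != len(designed):
        raise ValueError("sequences must have equal length")
    idx = list(range(len(native))) if positions is None else list(positions)
    if not idx:
        raise ValueError("no positions to compare")
    matches = sum(native[i] == designed[i] for i in idx)
    return 100.0 * matches / len(idx)
```

A gap between protocols is then the plain difference of two such percentages, e.g. 75% versus 50% recovery corresponds to 25 percentage points.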
Furthermore, we more closely analyzed outlier decoys, i.e. structures that scored high in surface similarity but showed only little sequence recovery when designing with RosettaSurf. We defined low recovery as up to 50% and high similarity as a Surf_S score greater than or equal to 0.9992, as this value marks the lower end of structures with high sequence recovery (≥70%), leading to one candidate from the RSV epitope-scaffold-Motavizumab complex. When investigating which types of amino acids were common failures, we identified general geometric resemblance as the likely reason. Several amino acids are frequently replaced by structurally similar residues, e.g. isoleucine by valine and lysine by arginine.
As for sequence recovery without the binder, RosettaSurf and RosettaSurf-site performed better for complexes of low shape complementarity, with 81% and 80% recovery, respectively. However, sequence recovery obtained for high-complementarity interfaces reached 66% for RosettaSurf and 68% for RosettaSurf-site. For antibody/antigen complexes we observed 76% for RosettaSurf and 66% for RosettaSurf-site. All these performances were substantially higher than those of FixBB, which reached 25% and 45% for high-complementarity and antibody/antigen complexes, respectively.
In addition, we investigated possible reasons for structures with low sequence recovery but high surface similarity scores. We adjusted the selection criteria to structures with ≤30% sequence recovery and a Surf_S score ≥0.9996, observing similar results as for the bound benchmark. Four structures, all from the Colicin E9-IM9 complex, fulfill these criteria. Again, mainly amino acids with similar geometrical features were common mismatches between recovered and native amino acids, e.g. phenylalanine, tyrosine, and histidine, aspartic acid and asparagine, as well as valine and threonine.
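The outlier selection amounts to a two-threshold filter over the decoys. A minimal sketch, assuming each decoy is summarized as a (name, recovery in percent, Surf_S) tuple; the function name and the decoy records in the usage line are invented for illustration.

```python
def find_outliers(decoys, max_recovery, min_surf_s):
    """Names of decoys with low sequence recovery but high surface similarity.

    decoys: iterable of (name, recovery_percent, surf_s) tuples.
    """
    return [
        name
        for name, recovery, surf_s in decoys
        if recovery <= max_recovery and surf_s >= min_surf_s
    ]

# Thresholds quoted in the text: bound state (50%, 0.9992), unbound state (30%, 0.9996).
example = [("decoy_a", 25.0, 0.9997), ("decoy_b", 80.0, 0.9998), ("decoy_c", 30.0, 0.9990)]
unbound_outliers = find_outliers(example, max_recovery=30.0, min_surf_s=0.9996)
```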
Overall, similar performance of RosettaSurf and RosettaSurf-site showed that performing the combinatorial sampling of all possible rotamers simultaneously was not necessary. During the RosettaSurf-site pipeline amino acids were sampled individually at each design position and only a subset of the amino acids was used for combinatorial sampling based on the surface similarity score, resulting in similar recovery outcomes. Furthermore, despite small differences in the sequence recovery rate between the three selected categories of protein complexes, overall no strong correlation between the shape complementarity of the considered complexes and the success in sequence recovery could be observed. We considered groupings based on different metrics, namely total interface area, hydrophobic interface area, and number of recoverable positions. No correlation with sequence recovery was observed for these different properties (S4 Fig). Additionally, we grouped the complexes based on their proportion of amino acids with access to diverse rotamer conformations to the total number of residue positions considered during the recovery. A weak correlation could be observed between reduced recovery rate and higher numbers of flexible amino acids (S4 Fig), in line with the results from the single amino acid recovery benchmark (Fig 2B). Together, these results suggest that the proposed method is largely independent of the provided protein structure.
Our results have thus implications for the design of functional proteins as the high success of recovering natural protein interfaces may be promising for the design of proteins displaying defined surface properties, as is the case for immunogen design.

Computational SSM screening to improve protein binding
A common problem in protein design is the optimization of PPIs to generate high affinity protein binders. Experimental screening methods, e.g. site-saturation mutagenesis (SSM) and combinatorial libraries, provide insights into mutations that improve binding interactions but are time- and resource-consuming experiments. While in silico saturation mutagenesis has been used previously to identify stabilizing mutations or improve binding affinity, these approaches were based on energy functions to identify beneficial mutations [36-40]. However, to our knowledge, surface metrics have not been considered as the sole selection criterion and compared to experimentally determined mutations. Based on our observations in the sequence recovery benchmarks, we tested if RosettaSurf-site could be a fast and efficient computational screening alternative to experiment-based approaches. We studied the optimization of de novo designed interleukin-2/15 antagonists that bind specifically to the IL-2Rβγc receptor [41]. The computationally designed interleukins were optimized for binding to the interleukin receptor by performing SSM followed by combinatorial libraries based on the identified mutations. The described study not only reported sequence information but also included additional structural characterization of the resulting design, thus allowing a fair comparison to our computational method. This level of data completeness is rare in many other design endeavors that have used this type of optimization strategy [42,43].
We ran RosettaSurf-site similarly to the approach described above (see "Protein Interface Sequence Recovery"); however, the selection criterion was shape complementarity of the design and IL-2Rβγc interface rather than similarity, as shape complementarity has been shown to be a key feature of high affinity binding interactions [44]. RosettaSurf-site allows exhaustive computational sampling of amino acids similar to SSM experiments, screening all possible amino acid substitutions at the interface of the interleukin design. Our study is based on the reported crystal structure of the interleukin-2/15 design in complex with its cognate receptor (PDB: 6DG5). We selected interface positions that were tested during SSM and later included in the combinatorial library for our benchmark screening. Positions not interacting with the receptor, i.e. not within a Cβ distance of 7 Å, were excluded from the selection, resulting in a total of 18 positions that can be regarded as interface. From these 18 possible interface residues, four positions were also present in the experimental library. One additional position (residue 39) was included that lies at the boundary of the interface, within 9 Å of the receptor, and was tested in the experimental library, resulting in a total of five positions as design input for our benchmark. In our design protocol, the five selected residues were sampled with RosettaSurf-site and mutations were evaluated based on the shape complementarity to the receptor surface. The computational results were compared to the composition of the combinatorial library that was constructed based on the preceding SSM screening. We analyzed whether RosettaSurf-site was able to identify mutations that are present in the highest affinity binder as well as the performance of the Rosetta energy function to retrieve similar mutations.
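The distance-based selection of interface positions can be sketched with NumPy as follows. The function name, the input layout (a dictionary of Cβ coordinates for the design and an array of receptor Cβ coordinates), and the inclusive comparison at the cutoff are assumptions; reading the coordinates from the PDB entry is left out.

```python
import numpy as np

def interface_positions(design_cb, receptor_cb, cutoff=7.0):
    """Residue numbers of the design whose C-beta lies within cutoff (in Å)
    of any receptor C-beta.

    design_cb:   dict mapping residue number -> (x, y, z)
    receptor_cb: array-like of shape (m, 3)
    """
    rec = np.asarray(receptor_cb, dtype=float)
    selected = []
    for resnum, xyz in sorted(design_cb.items()):
        # Distance from this residue to every receptor C-beta.
        d = np.linalg.norm(rec - np.asarray(xyz, dtype=float), axis=1)
        if d.min() <= cutoff:
            selected.append(resnum)
    return selected
```

Calling the function a second time with `cutoff=9.0` would pick up boundary positions such as the one described for residue 39.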
After performing all possible mutations with RosettaSurf-site, the four top-ranking amino acids for each position were selected (Fig 4), in line with the maximum number of amino acids included in the combinatorial library.
RosettaSurf-site recovered four out of the five mutations present in the best design, with residue 39 being at the edge of the interface and contributing minimally to the protein-protein interaction, thus making it more challenging for this approach. Four additional residues included in the combinatorial library were recovered that were observed to improve binding in the SSM but were not present in the highest affinity binder. In contrast, selecting mutations based on Rosetta energy alone recovers only a single mutation observed in the best binder and two residues present in the combinatorial library. These results show the potential of surface-based design of single point mutants as a fast and promising approach to select candidates for combinatorial libraries to improve binding interactions without requiring preceding screening experiments like SSM.

Surface-centric design of a novel RSV site 0 immunogen
In recent years, protein design has shown promising results in the field of immunoengineering, allowing the computational design of epitope-focused immunogens that were shown to elicit functional antibody responses in mice and non-human primates [28,30,31,45]. To achieve an epitope-specific immune response, the epitope is transplanted from the viral antigen to an unrelated small protein scaffold, presenting the antigenic site in isolation. We applied the RosettaSurf protocol to the design of immunogens mimicking the antigenic site 0, an epitope on the F-glycoprotein of Respiratory Syncytial Virus (RSV) which consists of an irregular α-helix and a 10-residue long loop. Previous studies have shown that identifying scaffolds that can present such complex structural motifs is challenging, and specifically for site 0 they are currently not available in the known structure space. De novo design methods have been successfully used to build scaffolds from scratch [31,32], but the design process remains challenging and expertise in de novo protein design is critical. The RosettaSurf approach allows the rescue of protein scaffolds that only present partial structural matches to the epitope structure motif.
As a demonstration of the use of RosettaSurf to transplant the epitope site into an unrelated scaffold, a two-step strategy was employed: 1) side chain grafting of the α-helical segment onto a small, monomeric protein scaffold (Fig 5A); 2) RosettaSurf design of the remaining antigenic site, including the surface generated by the epitope loop and transition to the helical fragment (Fig 5A). The grafted helical segment serves as anchor point around which the surface can be optimized to mimic the complete antigenic site.
Fig 5. A) The surface mimicking designs are generated by grafting the side chains of the helix segment of the epitope onto the scaffold and surface-centric design is employed to optimize the loop region. Before and after design of the surface compared to native site 0. Blue areas indicate high similarity. B) Mimicry of surface geometry of WT scaffold, RSV_FixBB, and Surf_03 designs compared to native site 0. C) Representative SPR measurements of Surf_03 and RSV_FixBB against site 0-specific antibodies D25 and ADI14496. D) Binding profiles of Surf_03, a FixBB designed protein, and a helix-only design against a panel of site-specific antibodies with green indicating binding and red cells corresponding to non-binding. A knockout mutant of Surf_03 and the WT protein binding profiles are listed as reference. https://doi.org/10.1371/journal.pcbi.1009178.g005
We identified the NarX histidine kinase receptor (PDB: 3EZI) [46] as a promising scaffold. The small, monomeric protein can accommodate the epitope α-helix (Cα RMSD: 0.4 Å) and offers a sufficiently large surface area around the helix to be optimized with RosettaSurf. We generated 760 designs using the RosettaSurf pipeline, with the site 0 surface, as observed in complex with the site-specific antibody D25 [47], as the sequence optimization target. The best design was selected based on the highest surface similarity score and was further optimized with six mutations (Surf_03). First, five point mutations, which were not part of the interface, were introduced to resolve clashes between native scaffold and epitope residues in the α-helix as well as steric hindrance with residues introduced during surface-centric design. In addition, we mutated a polar residue in the binding pocket to a hydrophobic residue based on the sequence profile of the top 20 surface-designed decoys.
Two additional variants of the same protein scaffold were designed to test whether there were advantages in using RosettaSurf. The first design (RSV_helix) contained only the α-helix of site 0 and three point mutations that resolve clashes with the side chain grafted epitope helix, similar to Surf_03. The second design (RSV_FixBB) was generated with FixBB, starting with side-chain grafting of the epitope α-helix and allowing design at the same amino acid positions as for Surf_03. The RosettaSurf design (Surf_03) reaches a surface similarity score of 0.45 compared to the antigenic site in the native viral protein RSVF, while the native scaffold scores 0. The helix-grafted base design scores 0.06 and the RSV_FixBB design 0.04 in surface resemblance.
To test our predictions experimentally, the designs were expressed and purified, and the binding affinities were measured using a panel of monoclonal antibodies that engage the site 0 epitope (Fig 5B) [31,47,48]. In essence, this panel of monoclonal antibodies (mAbs) was used as conformational probes to assess the surface mimicry presented by the designs. We compared the binding profiles of the three designs, a negative control (Surf_03 KO) and the WT scaffold using surface plasmon resonance (S5 Fig). All three designs bound to the D25 and ADI19009 mAbs, indicating that these mAbs mostly rely on the helical segment of the epitope. Surf_03 was the only design recognized by two additional antibodies (ADI14496 and ADI18900), indicating that the higher surface mimicry achieved through RosettaSurf design improves the presentation of the antigenic site and promotes binding of additional site-specific antibodies. The designed immunogen demonstrates RosettaSurf's ability to sculpt protein surfaces with a high degree of accuracy, which could be used to introduce functional sites into protein scaffolds by optimizing the molecular surface.

Discussion
In this work we propose a surface-centric protein design approach and demonstrate its accuracy in several in silico benchmarking and experimental design tasks. The protocol can either be used to optimize surface similarity or surface complementarity depending on the design task at hand.
Our idea of leveraging the geometrical and chemical features of surfaces stems from observations that surfaces displaying similar patterns can have similar functional roles (e.g. in PPIs [1,11,14,49]). Technically, our framework is derived from earlier work by Lawrence and Colman, which introduced a fast and efficient way to evaluate shape complementarity, as well as work by McCoy et al., which addressed electrostatic complementarity in interfaces. Based on some of these principles, we further implemented a surface similarity score to evaluate both the shape and chemistry of surfaces. The resulting surface similarity score is implemented with a focus on surface-centric protein design pipelines, allowing easy interaction with and modification of the algorithm inside Rosetta design protocols.
We showed the high accuracy of the surface score in recovering single amino acid identities by their surface properties, and its benefit when used in conjunction with the Rosetta energy score. Further, we highlighted the use of the surface similarity score inside a Monte Carlo sampling approach and its performance in the sequence recovery of complete protein interfaces. The incorporation of surface similarity clearly increases sequence recovery rates relative to other scoring schemes.
This approach can be readily applied to the design of experimental combinatorial libraries to reduce library sizes. In addition, the high sequence recovery rates suggest that surface-centric design is especially suited to designing novel proteins that mimic surface patches of other proteins, as is the case for the design of novel immunogens for vaccine development [30–32].
We demonstrated that surface-centric design could be used to generate targeted libraries to optimize binding specificity of a previously reported IL-2 design [41]. With the ranking of all mutations, we were able to recover amino acid variants that were also obtained by experimental screening through a saturation mutagenesis library. This approach represents a straightforward computational screening method to detect mutations that modulate specificity and affinity in protein-protein interactions. Specifically, this strategy is a fast and accessible alternative to experimental screening techniques.
The provided benchmarks highlight the protocol's design capabilities when operating on static backbones. However, the modelling of protein flexibility remains a challenge for computational methods at large. Due to RosettaSurf's modular implementation within the Rosetta framework, we anticipate that the protocol may perform well when working with conformational flexibility but will be dependent on the quality of the conformational ensembles generated by upstream methods.
Finally, we applied the surface-centric design protocol to engineer novel immunogens to present the surface patch of an antigenic site present in the RSVF protein. We used a structurally complex epitope (site 0 from RSVF), that consists of two structural segments, as an example for structurally challenging sites that remain difficult for computational design approaches. The surface-centric designs show a broader binding profile across a panel of monoclonal antibodies as compared to other design approaches, suggesting that the surface presented is a closer mimic of the native antigenic site.
Possible applications for surface-centric design include the computational design of highly specific protein-protein interactions, with a focus on optimizing surface complementarity in both shape and chemistry. While the examples presented here work under the premise of a known protein structure to be optimized for binding, an important related problem is the design of a target surface in the absence of a known backbone template. To our knowledge, identifying backbones with compatible target surfaces remains an open research question; however, we envision two approaches that can readily be combined with RosettaSurf. One simplified approach could utilize a database of protein backbone scaffolds as initial docking candidates, which would subsequently be optimized with RosettaSurf to improve complementarity towards the target. A second potential approach is to apply a recently developed geometric deep learning framework called MaSIF [14]. Owing to its speed, MaSIF allows a given surface to be compared to all surfaces in the PDB for complementarity, thereby identifying potential design templates that could be further optimized with RosettaSurf.
In the scope of translational applications, surface similarity enables the computational design of proteins that may recapitulate precise features of target surfaces. In immunogen design, for example, it allows the design of immunogens mimicking an antigenic site of interest, which could ultimately be used as probes for antibody isolation or as vaccine antigens.
Ultimately, we introduce a new conceptual approach in computational protein design where the fine features of molecular surfaces are explicitly optimized. Through this route it will now be possible to explore the tantalizing hypothesis of the existence of a "surface degeneracy code", which in some cases may allow similar surface patches to be represented by very distinct sets of amino acids. Importantly, this capability could also enable the presentation of similar surface patches in proteins with completely different backbone architectures, which could in turn enable many endeavors in functional protein design that are thus far out of reach.

Surface similarity calculation
The described surface similarity (Surf S ) score is composed of shape and electrostatic similarity components. To evaluate surface shape similarity, two protein structures are first aligned and the closest surface points between target and reference are identified. Next, the normal vectors of the surface points are compared by first computing their dot product to obtain the enclosed angle, followed by distance-based scaling that penalizes points that are far apart (Fig 1). Each comparison yields a shape similarity score for the considered pair of points and the similarity of the entire surface results from computing the mean of all pair-wise point comparisons.
Since the identification of closest points depends on the starting surface, the surface scores differ depending on which structure is considered the reference and which the target, and thus the scores are computed in both directions. A robust surface similarity score is obtained by considering the similarity of the target surface compared to the reference surface and vice versa, effectively averaging the shape similarity to correct for differences during the selection of closest points.
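The shape comparison described above can be sketched as follows. This is a minimal illustration and not the RosettaSurf implementation: it assumes that both surfaces are already aligned and given as lists of points with unit-length normals, it uses a brute-force closest-point search, and it assumes a Gaussian distance penalty exp(-w d²) with a hypothetical weight `w`, since the exact scaling function is not specified in the text.

```python
import math

def directed_shape_similarity(points_a, normals_a, points_b, normals_b, w=0.5):
    """Mean normal agreement when mapping surface A onto surface B.

    Assumes aligned surfaces and unit-length normals. The Gaussian
    penalty exp(-w * d^2) and the weight w are illustrative assumptions.
    """
    total = 0.0
    for p, n in zip(points_a, normals_a):
        # Brute-force search for the closest point on surface B.
        j = min(range(len(points_b)), key=lambda k: math.dist(p, points_b[k]))
        d = math.dist(p, points_b[j])
        # Dot product of unit normals = cosine of the enclosed angle.
        dot = sum(x * y for x, y in zip(n, normals_b[j]))
        # Distance-based scaling penalizes point pairs that are far apart.
        total += dot * math.exp(-w * d * d)
    return total / len(points_a)

def shape_similarity(points_a, normals_a, points_b, normals_b, w=0.5):
    """Symmetric score: average of the A->B and B->A comparisons."""
    ab = directed_shape_similarity(points_a, normals_a, points_b, normals_b, w)
    ba = directed_shape_similarity(points_b, normals_b, points_a, normals_a, w)
    return 0.5 * (ab + ba)
```

Averaging the two directed scores makes the result independent of which surface is labelled target and which reference, mirroring the symmetrization described in the text.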
To quantify surface electrostatics, the electrostatic potentials are computed over the continuous electrostatic field and discrete charge values are assigned to every point of the surface using the APBS software [50] (Fig 1). The representation of the target and reference surface point clouds as vectors allows fast computation of Pearson's and Spearman's rank correlations as described by McCoy and colleagues [20]. The resulting correlation coefficients capture the similarity of the individual potential values as well as the overall trends in electrostatic similarity of the two surfaces [18,20].
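As an illustration of the correlation step, the sketch below computes Pearson's and Spearman's coefficients for two vectors of surface potentials. It assumes the potentials have already been paired point by point (for example via the closest-point matching used for shape), and the function names are hypothetical. How the two coefficients are merged into a single electrostatic score is not stated in the text, so both are simply returned.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def electrostatic_similarity(target_potentials, reference_potentials):
    """Return (Pearson, Spearman) for paired surface potentials."""
    return (pearson(target_potentials, reference_potentials),
            spearman(target_potentials, reference_potentials))
```

Pearson's coefficient is sensitive to the individual potential values, while Spearman's captures the overall trend, which matches the two aspects of similarity named in the text.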
We combined the shape and electrostatic similarity scores into a single Surf S score to facilitate an accurate description of molecular surfaces. To identify a set of optimal weights that describe the relative contributions of the geometric and biochemical features towards the overall properties of the molecular surface, we aimed to derive weights from native protein surfaces. As our goal was to ultimately apply the Surf S score to design novel protein binders, we focused on surfaces arising at the interface of PPIs.
The dataset of protein complexes was compiled from the PDBbind [51], the SAbDab [52], and the Affinity Database versions 1.0 and 2.0 [53] databases, containing crystal structures of transient PPIs. We used RosettaScripts to determine the frequency of each amino acid type across all interaction interfaces. For each amino acid type (excluding cysteines) a total of 140 complexes (19 x 140 = 2,660 complexes) containing the respective amino acid in the interface region were considered to allow for a balanced dataset with equal numbers of complexes for each amino acid type and no distinctions were made based on rotameric conformations. All possible point mutations to non-native amino acids were generated and their geometric and electrostatic similarity to the native surface measured, resulting in a dataset containing shape and electrostatic properties for each mutation. Logistic regression was applied to identify the optimal set of weights to combine shape and electrostatic features to optimize for the highest recovery rates of the native amino acids. In total, the dataset contained 2,660 true positive and 47,880 true negative (18 x 19 x 140 = 47,880) data points. The dataset was split into training and testing subsets, with the testing set containing 20% of the data. Logistic regression was performed using Python's scikit-learn library [54], with the model reaching an accuracy of 98%. The resulting model parameters, i.e. an intercept of -13.79986756 and coefficients 1 and 2 with -13.78594078 and 14.64347448 respectively, provide the weights to combine shape and electrostatic similarity measurements.
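The dataset arithmetic and one plausible way of applying the fitted parameters can be sketched as follows. The intercept and coefficients are the values reported above; everything else is an assumption. In particular, the text does not state which coefficient belongs to the shape feature and which to the electrostatic feature, nor that the predicted probability itself is used as the combined score, so the inputs are named generically as `feature_1` and `feature_2`.

```python
import math

# Dataset sizes as described in the text.
N_AMINO_ACID_TYPES = 19        # all types except cysteine
COMPLEXES_PER_TYPE = 140
positives = N_AMINO_ACID_TYPES * COMPLEXES_PER_TYPE   # native residues
negatives = 18 * positives                            # non-native mutations
test_size = (positives + negatives) // 5              # 20% held out

# Reported logistic regression parameters.
INTERCEPT = -13.79986756
COEF_1 = -13.78594078
COEF_2 = 14.64347448

def native_probability(feature_1, feature_2):
    """Logistic model output for one candidate residue.

    Assumption: feature_1 and feature_2 are the two similarity
    measurements in the order of the reported coefficients.
    """
    z = INTERCEPT + COEF_1 * feature_1 + COEF_2 * feature_2
    return 1.0 / (1.0 + math.exp(-z))
```

With this form, candidates can be ranked by the returned probability, and the highest-ranking amino acid at a position is taken as the predicted native identity.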

Benchmark datasets
The dataset of the single amino acid recovery benchmark consists of 1,900 protein complexes, with 100 complexes for each amino acid identity except cysteine. All structures used during the various benchmarks as well as the proteins used to generate the Surf S score were subjected to constrained energy minimization in their complexed state using Rosetta to adapt them to the Rosetta energy function and the resulting energy optimized decoy served as input for the subsequent benchmark analysis. The protein complexes were randomly selected from the same pool of transient PPIs used for the logistic regression analysis while ensuring that no duplicated complexes were used for the logistic regression training and the single amino acid recovery benchmark. We divided each of these protein complexes into target proteins, where the interface can be mutated, and binders, the proteins that serve as context for the target. The interface of the target protein is defined as residues that are within 7 Å Cβ distance of the binder and have the Cβ atom pointing towards the binder. This selection ensures that the amino acid side chains are part of the interface and the contribution of the residue to the binding interaction is not solely due to backbone interactions. The full target interface is converted to alanine, effectively removing any side chain memory of the native structure that would restrict the placement of new rotamers and introduce biases towards the native sequence. Cysteine residues are ignored as they can form chemical linkages in the form of disulfide bonds, thus generating a surface that is not attributable to an individual residue.
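The interface definition can be illustrated with a short sketch. It is not the Rosetta residue selector used in the study: the data layout and function name are hypothetical, and the "pointing towards the binder" test is assumed to mean that the Cα→Cβ vector has a positive dot product with the vector from Cα to the nearest binder Cβ. Residues without a Cβ atom (glycine) are not handled.

```python
import math

def interface_residues(target, binder_cb, cutoff=7.0):
    """Select target residues that form the interface.

    target:    dict mapping residue id -> (ca_xyz, cb_xyz)
    binder_cb: list of Cb coordinates of the binder
    cutoff:    maximum Cb-Cb distance in angstroms
    """
    selected = []
    for res_id, (ca, cb) in target.items():
        # Nearest binder Cb to this residue's Cb.
        nearest = min(binder_cb, key=lambda b: math.dist(cb, b))
        if math.dist(cb, nearest) > cutoff:
            continue
        # Orientation check (assumed criterion): the side chain direction
        # Ca->Cb must point into the half-space containing the binder atom.
        ca_to_cb = [b - a for a, b in zip(ca, cb)]
        ca_to_binder = [b - a for a, b in zip(ca, nearest)]
        if sum(u * v for u, v in zip(ca_to_cb, ca_to_binder)) > 0:
            selected.append(res_id)
    return selected
```

The orientation check is what excludes residues that are close to the binder but contact it only through their backbone.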
To test the performance of the surface scoring function we sought to perform a benchmark to evaluate sequence recovery at the interfaces of protein-protein interactions with varying shape complementarity. We assembled a diverse dataset consisting of nine protein complexes capturing different aspects of PPIs and grouped them into three different categories: 1) low-shape complementarity interactions which include Enterotoxin G-T-cell receptor complex (PDB: 3MC0) [55], Ribonuclease A in complex with its inhibitor (PDB: 1DFJ) [56] and domain 2 of VEGFR1 in complex with PIGF (PDB: 1RV6) [57]; 2) high-shape complementarity interactions which include the complex between Colicin E9 and IM9 (PDB: 1EMV) [58], Bovine beta-trypsin in complex with CMTI-I (PDB: 1PPE) [59] and PD-L1 in complex with a nanobody (PDB: 5JDS) [60]; 3) antigen-antibody interactions which include HIV-gp120 in complex with CD4-binding site antibody b13 (PDB: 3IDX) [61], RSV epitope-scaffold in complex with Motavizumab (PDB: 4JLR) [28] and the Vaccinia virus D8 protein in complex with the antibody vv138 (PDB: 6B9J) [62]. The shape complementarity of the complexes in the first two categories was assessed by the Rosetta shape-complementarity filter. Protein complexes with a shape complementarity score less than 0.65 were classified as low-complementarity. During the benchmark analysis we distinguish between bound and unbound proteins, where unbound proteins are obtained by removing the protein binder from the holo-crystal structure.
Data analysis was performed with the help of the rstoolbox Python library [63].

Computational design of proteins mimicking the antigenic site 0 in RSV
The structure of antigenic site 0 was extracted from the crystal structure of prefusion stabilized RSVF in complex with antibody D25 (PDB: 4JHW) [47]. The D25 epitope consists of an irregular α-helix (residues 196-209) and a 10-residue loop (residues 61-70). To identify putative scaffolds, we performed a structural search based on the irregular α-helix against 55,574 monomeric, helix-containing crystal structures from the Protein Data Bank (PDB, from September 2015) using Rosetta's MotifGraft algorithm [64]. Matches were filtered at a backbone RMSD threshold below 0.55 Å and less than ten atomic clashes at the interface, resulting in 13 scaffold candidates. After visual inspection, we selected the NarX histidine kinase receptor (PDB: 3EZI) [46] as scaffold, a 107-residue long, monomeric protein that aligned to the epitope helix with a Cα RMSD of 0.4 Å and provided additional surface area to mimic the entire antigenic site.
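The filtering step after the structural search amounts to two thresholds, which can be sketched as below. The `Match` record and its field names are hypothetical and do not correspond to MotifGraft output; only the thresholds (backbone RMSD below 0.55 Å, fewer than ten clashes) come from the text. Sorting the survivors by RMSD is added here purely for convenience.

```python
from dataclasses import dataclass

@dataclass
class Match:
    """Hypothetical record for one scaffold match."""
    scaffold_id: str
    backbone_rmsd: float   # angstroms, motif vs. scaffold segment
    clashes: int           # atomic clashes at the antibody interface

def filter_matches(matches, max_rmsd=0.55, max_clashes=10):
    """Keep matches below both thresholds, best RMSD first."""
    kept = [m for m in matches
            if m.backbone_rmsd < max_rmsd and m.clashes < max_clashes]
    return sorted(kept, key=lambda m: m.backbone_rmsd)
```

For example, a match with an RMSD of 0.50 Å and three clashes is retained, whereas a match with an RMSD of 0.30 Å and twelve clashes is discarded.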
In a first step, we transplanted the side chains of the epitope helix contacting the D25 antibody onto the selected scaffold using MotifGraft, followed by the introduction of three mutations on the protein scaffold that were not part of the interface to resolve steric clashes with the transferred side chains, resulting in the design RSV_helix. Subsequently, we performed surface-centric design on 19 residues using the RosettaSurf pipeline to increase surface mimicry of the site 0 around the epitope helix, resulting in 760 design decoys. Surface-centric design was performed in the presence of the D25 antibody and site 0 of RSVF served as a reference to which our designed surface was optimized. We selected the decoy with the highest surface mimicry score, named surf_01, and optimized the design in additional steps. To allow accurate display of the designed surface, we introduced two point mutations in the scaffold adjacent to the optimized surface patch to avoid steric hindrance (surf_02). Lastly, after comparing the similarity of surf_02 and the native antigenic site, we identified a lysine residue at position 18 in our design with suboptimal mimicry. Evaluating the sequence profile of the 20 best scoring decoys sorted by surface mimicry revealed a strong preference for leucine at this position and the residue was incorporated into a new design (Surf_03). Finally, we designed a version of RSV_helix that used Rosetta's fixed-backbone design (FixBB) to serve as comparison to our surface-optimized designs. We designed 1,000 decoys in the presence of D25 antibody, allowing mutations to occur at the same 19 residues as was the case for RosettaSurf design. All designs converged to an identical sequence which was selected as design RSV_FixBB. Based on Surf_03 we designed a knockout mutant (Surf_03_KO) with an N74Y mutation in the epitope helix.
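The sequence-profile analysis used to choose the substitution at position 18 can be sketched as follows. The input layout (a list of score and sequence pairs) and the function name are assumptions for illustration; the study's analysis was performed with its own tooling.

```python
from collections import Counter

def position_profile(decoys, position, top_n=20):
    """Amino acid frequencies at one position among the best decoys.

    decoys:   list of (surface_similarity_score, sequence) pairs
    position: 1-based sequence position
    top_n:    number of top-scoring decoys to include
    """
    # Higher surface similarity is better, so sort in descending order.
    best = sorted(decoys, key=lambda d: d[0], reverse=True)[:top_n]
    counts = Counter(seq[position - 1] for _, seq in best)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.most_common()}
```

The most frequent residue in the returned profile is the candidate substitution; a strong preference, such as the one reported for leucine, appears as a single dominant frequency.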

Protein expression and purification
Designs. Genes for all designs were purchased as DNA fragments from Twist Bioscience, and cloned into pET11 vectors, containing an N-terminal MBP-tag and His-tag as well as a TEV cleavage site, for bacterial expression. Plasmids were transformed into E. coli BL21 (DE3 pLysS) (Merck, #69451-3) and grown overnight in LB media at 37˚C. Pre-cultures were diluted 1:50 and grown to an OD600 of 0.6 in terrific broth (Condalab, #PRO1246.05) at 37˚C, and expression was induced by the addition of 1 mM isopropyl-β-D-thiogalactoside (IPTG). Cultures were harvested after 18-20 hours at 20˚C. Pellets were resuspended in lysis buffer (50 mM Tris, pH 7.5, 500 mM NaCl, 5% Glycerol, 1 mg/ml lysozyme, 1 mM PMSF, and 1 μg/ml DNase) and sonicated on ice for a total of 12 minutes, in intervals of 15 s sonication followed by 45 s pause. The lysates were cleared by centrifugation (48,384 g, 20 min) and purified using a His-Trap FF column on an Äkta pure system (GE Healthcare), followed by size exclusion on a HiLoad 16/600 Superdex 75 column (GE Healthcare) in phosphate-buffered saline (PBS). Protein concentrations were determined by measuring the absorbance at 280 nm on a Nanodrop (Thermo Scientific). The designed proteins were concentrated by centrifugation (Millipore, #UFC900324) to 1 mg/ml, snap frozen in liquid nitrogen, and stored at −80˚C.
Antibody variable fragments (Fabs). For Fab expression, heavy and light chain DNA sequences were purchased from Twist Bioscience and cloned separately into the pHLSec mammalian expression vector (Addgene, #99845) using AgeI and XhoI restriction sites. Expression plasmids were premixed in a 1:1 stoichiometric ratio, co-transfected into HEK293-F cells, and cultured in FreeStyle medium (Gibco, #12338018). Supernatants were harvested after 1 week by centrifugation and purified using a kappa-select column (GE Healthcare). Elution of bound proteins was conducted using 0.1 M glycine buffer (pH 2.7), and eluates were immediately neutralized by the addition of 1 M Tris ethylamine (pH 9), followed by buffer exchange to PBS (pH 7.4).

Binding affinity determination by surface plasmon resonance (SPR)
SPR measurements were performed on a Biacore 8K (GE Healthcare) with HBS-EP+ as running buffer (10 mM HEPES pH 7.4, 150 mM NaCl, 3 mM EDTA, 0.005% v/v Surfactant P20, GE Healthcare) at room temperature. Approximately 700 response units (RU) of Fabs were immobilized on a CM5 sensor chip (GE Healthcare) via amine coupling, and designed monomeric proteins were injected as analyte in two-fold serial dilutions. The flow rate was 30 μl/min with 120 s of contact time followed by 400 s dissociation time. After each injection, the surface was regenerated using 0.1 M glycine at pH 3.5. Data were fitted using a 1:1 Langmuir binding model within the Biacore 8K analysis software (GE Healthcare #29310604).
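For orientation, the 1:1 Langmuir model describes the response R by dR/dt = ka·C·(Rmax − R) − kd·R, with the equilibrium dissociation constant KD = kd/ka. The sketch below evaluates its closed-form solution for the injection scheme described above (120 s contact, then dissociation). It is an independent illustration, not the Biacore fitting routine, and all rate constants and concentrations used with it are hypothetical.

```python
import math

def langmuir_response(t, conc, ka, kd, rmax, contact=120.0):
    """Response (RU) at time t (s) for 1:1 binding.

    conc: analyte concentration (M); ka in 1/(M*s); kd in 1/s.
    Association applies for t <= contact, dissociation afterwards.
    """
    k_obs = ka * conc + kd                 # observed association rate
    r_eq = rmax * ka * conc / k_obs        # equilibrium response at conc
    if t <= contact:
        return r_eq * (1.0 - math.exp(-k_obs * t))
    r_end = r_eq * (1.0 - math.exp(-k_obs * contact))
    return r_end * math.exp(-kd * (t - contact))

def twofold_dilutions(top_conc, n):
    """Two-fold serial dilution series starting at top_conc."""
    return [top_conc / 2 ** i for i in range(n)]
```

Fitting ka, kd and Rmax jointly across the curves of a dilution series is what yields the reported affinities; the sketch only shows the forward model.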

S1 Fig. Schematic overview of the surface-centric design process.
A surface patch is selected on the target protein that will be subjected to mutations for improving surface features. A reference surface is specified and will be used during the design process to guide the introduced mutations. During sequence design, rotamers are sampled in the selected interfaces of the target protein and for each substitution the surface is compared to the reference surface. If mutations improve the surface score, the changes are accepted. During iterative sampling steps of the selected surface patch, the overall surface can be improved. (TIF)

S2 Fig. Recovery success of individual amino acid types.
Case study of the recovery of arginine in difficult and easily recoverable benchmark cases. A) Recovery of the exposed arginine residue is unsuccessful in the unbound test case as a non-native rotamer is placed in the structure. The addition of the binding partner limits the accessible rotameric space and allows successful recovery of the amino acid. B) Successful recovery of arginine independent of the presence or absence of the binder as the native rotamer conformation is less exposed. Overall side chain configurations placed in the bound benchmark cases are closer to the native rotamer (mean full-atom RMSD of ~0.4 Å) as compared to the unbound benchmark cases (mean full-atom RMSD of ~2.3 Å). When evaluating both rotamers in terms of surface similarity, the Surf S score can discriminate the local changes. The shape similarity score changes by 0.2 and the electrostatic similarity score by 0.5 units, resulting in an overall Surf S score of 0.989 when comparing both rotamers. The differences are specific for the altered region as shown for shape similarity (right) and outline the modified atom positions. (TIF)