Growing Glycans in Rosetta: Accurate de novo glycan modeling, density fitting, and rational sequon design

Carbohydrates and glycoproteins modulate key biological functions. However, experimental structure determination of sugar polymers is notoriously difficult. Computational approaches can aid in carbohydrate structure prediction, structure determination, and design. In this work, we developed a glycan-modeling algorithm, GlycanTreeModeler, that computationally builds glycans layer-by-layer, using adaptive kernel density estimates (KDE) of common glycan conformations derived from data in the Protein Data Bank (PDB) and from quantum mechanics (QM) calculations. GlycanTreeModeler was benchmarked on a test set of glycan structures of varying lengths, or “trees”. Structures predicted by GlycanTreeModeler agreed with native structures at high accuracy for both de novo modeling and experimental density-guided building. We employed these tools to design de novo glycan trees into a protein nanoparticle vaccine to shield regions of the scaffold from antibody recognition, and experimentally verified shielding. This work will inform glycoprotein model prediction, glycan masking, and further aid computational methods in experimental structure determination and refinement.


Introduction
Carbohydrates and glycoproteins are ubiquitous in biological organisms [1]. Viral glycoproteins such as the HIV envelope trimer, influenza hemagglutinin, and the SARS-CoV-2 spike employ N-linked glycosylation as an immune evasion strategy, taking advantage of the fact that host glycans on the surface of proteins are usually recognized as "self" by the adaptive immune system [2]. Yet HIV broadly neutralizing antibodies often target glycans as part of their epitopes [3] [4] [5]. Small carbohydrate residues attached to serine or threonine can act in signaling pathways akin to phosphorylation [6], while glycans on the constant region of antibodies act as mediators of effector function [7] [8]. Glycans can also improve stability [9] and solubility [10], reduce aggregation [11], and even improve biological drug targeting and vaccine design through glycan masking of off-target regions [12]. However, in the context of a series of protein nanoparticle immunogens, we recently discovered that glycan masking of the protein nanoparticle scaffold itself is unlikely to enhance antigen-specific antibody responses, especially when the displayed antigen is immunodominant over the nanoparticle scaffold [13]. We also recently showed that high-density and high-mannose glycans on protein nanoparticle surfaces increase lymph node trafficking and antibody responses against the nanoparticle in a density- and mannose-dependent manner [14]. Thus, optimizing the density and composition of glycans displayed on protein-based vaccines (on the antigen, the protein nanoparticle scaffold, or both) provides a framework for engineering glycan recognition to optimize vaccine efficacy.
The biosynthesis of glycoconjugates is complex. Carbohydrates can be covalently attached to certain amino acid residues, including serine, threonine, asparagine, and (rarely) tryptophan, forming glycoproteins. The attachment can be made to nitrogen, oxygen, or carbon atoms (known as N-, O-, or C-linked glycosylation, respectively), with each process involving a multitude of enzymes, sugar moieties, and resulting carbohydrate structures. These processes are stochastic, producing glycoproteins that are heterogeneous both in the occupancy of a glycan at the glycosylation site (macroheterogeneity) and in the chemical makeup of the N-, C-, or O-linked glycan (microheterogeneity) [15].
Funding: … from the Bill & Melinda Gates Foundation to WRS, by BMGF CAVD funding to the IAVI NAC Center to WRS, by Grant OPP1156262 from the Bill & Melinda Gates Foundation and a generous donation from the Open Philanthropy Project to NPK, NIH awards 1P01AI167966 and P50AI150464 to NPK, and by an NIAID training grant fellowship T32AI007244 to JAB. NIH award R01-GM127278 supported JWL and JJG. NIH/NIGMS R35 GM122517 (R. Dunbrack) supported MS. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: I have read the journal's policy and the authors of this manuscript have the following competing interests: Dr. JJG is an unpaid board member of the Rosetta Commons. Under institutional participation agreements between the University of Washington, acting on behalf of the Rosetta Commons, Johns Hopkins University may be entitled to a portion of revenue received on licensing Rosetta software including methods discussed/developed in this study. As a member of the Scientific Advisory Board, JJG has a financial interest in Cyrus Biotechnology. Cyrus Biotechnology distributes the Rosetta software, which may include methods developed in this study. These arrangements have been reviewed and approved by the Johns Hopkins University in accordance with its conflict-of-interest policies. WRS is an employee of Moderna, Inc., but his contributions to this work were all conducted prior to his employment at Moderna. JAB is an employee of Johnson and Johnson Innovative Medicine, Inc., but his contributions to this work were all conducted prior to his current employment.
The most common form of glycosylation observed in glycoprotein structures is N-linked glycosylation. Initiation of this process occurs during translation, by the protein oligosaccharyltransferase (OST), which recognizes a multi-residue consensus motif, or sequon, of NX(S/T) (where X is any residue except proline) and covalently attaches a lipid-linked core oligosaccharide to the asparagine residue through an N-glycosidic linkage [1]. This process is not deterministic (not every sequon results in attachment of a glycan), and certain amino acids in and around the sequon motif can affect its efficiency, resulting in higher or lower glycan occupancy at the site [16] [17].
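To make the NX(S/T) rule concrete, the following minimal Python sketch scans a protein sequence for candidate sequons. It encodes only the motif stated above (X may be any residue except proline) and says nothing about occupancy; the function name find_sequons is hypothetical and is not part of Rosetta.

```python
import re
from typing import List

# N, then any residue except proline, then S or T. The zero-width lookahead
# lets overlapping sequons (e.g. "NNST" contains NNS and NST) both be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")

def find_sequons(sequence: str) -> List[int]:
    """Return 1-based positions of asparagines that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]
```

For example, find_sequons("MKNGSAPNPTQNNSTL") reports positions 3, 12, and 13; the asparagine at position 8 is skipped because it is followed by a proline.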
Upon successful protein folding in the endoplasmic reticulum, the initial N-linked glycan is "trimmed down" by removal of several terminal glucosyl residues, while many sugar-processing enzymes in the Golgi apparatus can add or remove sugar residues from the nascent branched sugar (tree). The resulting chemical makeup of the glycan tree depends on which enzymes are available in the Golgi, which is heavily influenced by species, disease state [18], and developmental stage [19], as well as on the local structure, sequence, and environment of the glycosylation site [20]. In addition, a single glycosylation site can carry vastly different glycans [21], though this can be controlled to some extent through various bioengineering techniques [15] [22] [23].
Glycans are also conformationally flexible: they are highly hydrophilic, typically exposed on the surface of proteins, and have a large number of conformational degrees of freedom. However, as observed in molecular dynamics simulations and NMR experiments, glycan conformations can be influenced by their structural environment [24]. From the plethora of high-resolution crystallographic and cryo-EM studies, we also know that glycans can adopt stable conformations, with well-defined density observed for many of the glycan residues in each tree, especially toward the root of the glycan tree, even for some unrestrained glycans [25] [26]. Presumably, these low-energy, stable conformations are occupied at higher frequency in solution. In addition, a recent QM study of glycan torsional energies showed that the QM-derived conformational preferences of glycan torsions match well with glycan structures analyzed from the Protein Data Bank, indicating that conformational diversity is also influenced by the chemical makeup of each glycan structure [27].
Given the complex chemistry and conformational diversity involved, accurate modeling of glycans is currently a grand challenge in computational biology. Computational glycobiology tools and web applications have been developed for protein glycosylation [28], validation of carbohydrate structural chemistry [29], statistical analysis [30], and docking [31] [32]. Common methods in glycoprotein modeling typically involve molecular dynamics (MD) simulations [33] or manual placement and conformational adjustment of glycans into their density during structure determination [34]. Recently, a new method for automatically building glycan structures from sequence was described [35]; this method, the CHARMM-GUI Glycan Modeler, was benchmarked only up to the first and second sugar residues.
Here we describe a new glycan modeling algorithm built within the Rosetta software suite, a platform that incorporates state-of-the-art applications and modules for a variety of macromolecular modeling and design tasks [36]. Rosetta provides user interfaces for the creation of tailor-made protocols [37] [38] and includes a reliable knowledge-based energy function to evaluate models and designs [39]. We build on earlier work that enabled representing and evaluating carbohydrate structures within Rosetta [40] and loading, representing, and refining glycans from the Protein Data Bank [41]. We expand on this foundational work by adding new carbohydrate-specific sampling methods, an updated conformer database built with adaptive kernel density estimates, a new framework for general analysis in Rosetta (SimpleMetrics), and a new algorithm for accurately modeling complex carbohydrates, the GlycanTreeModeler.
We rigorously benchmark the new method on a set of diverse high-resolution crystal structures of glycans in their symmetric crystal environments, using the new SimpleMetrics analysis framework and a new application called rosetta_scripts_jd3, and we show that the GlycanTreeModeler is capable of recapitulating native glycan structures with high accuracy through both de novo and density-guided modeling [42]. We then combined our glycan modeling protocol with Rosetta sequence design of glycan sequons to engineer optimally placed new glycans onto a protein nanoparticle vaccine scaffold and evaluated changes in immune responses. We observed reduced antibody reactivity to the underlying protein surface in immunization experiments, demonstrating that glycans can be computationally engineered to tailor the immunogenicity of vaccines.

Benchmarking tools
To examine the performance of GlycanTreeModeler, we built a new benchmarking infrastructure in Rosetta. We developed the SimpleMetrics framework within the XML interface to Rosetta (RosettaScripts [37]), which allows robust analysis through more than 20 associated structural and energetic metrics, with data reporting at any step of a RosettaScripts protocol. To facilitate large-scale benchmarking, we developed a general application for parallel RosettaScripts computing, rosetta_scripts_jd3, which enables glycan calculations to be run in parallel on a high-performance computing cluster. This application can run multiple jobs within a single parallel run of Rosetta, each with its own configured glycan trees to model and its own associated input files. The SimpleMetrics framework and the rosetta_scripts_jd3 application are described in detail in S1 Text.

Glycan structure set
The Rosetta GlycanTreeModeler algorithm was benchmarked against a set of 25 unique N-linked glycan trees in their crystal arrangement, ranging from three to twelve residues, drawn from 19 unrelated glycoprotein structures solved at better than 2 Å resolution and totaling 139 sugar residues. Each glycan tree was checked for chemical and structural inconsistencies (such as incorrect isoform assignments, wrong linkages, or missing atoms) using the pdb-care webserver at glycosciences.de, which excluded many glycans from our initial list [29]. Note that some of the structures are likely substructures of larger glycans. Preparation and analysis of the structures are described in S1 Text.

De novo modeling
Using the protocol and scoring function identified during protocol optimization (see Methods), we benchmarked the set of 25 glycans described above. Across the benchmark dataset, the median RMSD of the glycan predictions to the native structures was 2.7 Å and the mean was 5 Å. For the first two residues of the glycan tree, the median was 1.28 Å and the mean was 2.17 Å. Of the 25 glycan trees, 20% (5/25) were predicted at < 1 Å accuracy and 72% (18/25) at < 5 Å accuracy (Figs 1 and 2). The largest glycan in our dataset, with twelve residues, was predicted at 2.5 Å. Full results for each glycan are listed in S1 Table. It is also useful to understand how well the algorithm predicts the internal structure of the glycans, because a single dihedral angle change at the root of the glycan can substantially change the overall position of the glycan relative to the protein. For each structure, the same lowest-energy models were therefore superimposed onto the native input glycan. The median superimposed RMSD was 1.1 Å, with a mean of 2.7 Å. Overall, 32% (8/25) of the predictions were < 1 Å RMSD, 64% (16/25) < 2.5 Å, and 92% (23/25) < 5 Å. The two RMSD measurements were generally correlated (S1 Fig).
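The two RMSD measurements above differ only in whether the model is first superimposed onto the native glycan. A minimal NumPy sketch, assuming the model and native heavy-atom coordinates are given as arrays with atoms in matching order, is shown below; Kabsch superposition is a standard choice used here for illustration, and the exact procedure used in this work is described in S1 Text.

```python
import numpy as np

def rmsd_in_place(model: np.ndarray, native: np.ndarray) -> float:
    """Heavy-atom RMSD with no superposition: measures placement relative to the protein."""
    diff = model - native
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

def rmsd_superimposed(model: np.ndarray, native: np.ndarray) -> float:
    """Heavy-atom RMSD after optimal rigid-body superposition (Kabsch algorithm)."""
    p = model - model.mean(axis=0)        # center both coordinate sets
    q = native - native.mean(axis=0)
    h = p.T @ q                            # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against an improper rotation (reflection)
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return rmsd_in_place(p @ rot.T, q)
```

A rigidly rotated and translated copy of a native glycan therefore scores near zero superimposed RMSD while its in-place RMSD stays large, which is exactly the distinction between internal structure and placement drawn above.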
In addition, most of the glycan benchmarks in our dataset had convergent score vs. RMSD (funnel) plots (S2 Fig). This funnel-like quality reflects the ability of the scoring function to discriminate near-native models from decoys and was quantified using the PNear metric [43], which estimates the Boltzmann-weighted probability of finding a system near its native state for a given near-native cutoff (lambda) (S1 Text). A PNear closer to 1.0 indicates a higher-quality funnel, with 1.0 the best possible. The worst-performing glycans in our benchmark set had poor score vs. RMSD funnels, indicating that the scoring function was not able to capture […]. This result is further reflected in the low PNear values of the funnel plots, which were below 0.01 at every lambda, showing that the current energy function is unable to score these types of interactions well. However, a score term that accurately represents glycan-aromatic CH-π interactions [44] may improve these results. Solvent is represented implicitly in most Rosetta applications, but we observed that half of the benchmark glycans have significant crystallographic waters in contact with the surrounding protein. To understand the effect of these waters, we modeled the worst- and best-performing glycans and then predicted explicit waters around the glycan for each output decoy using Rosetta-ECO [45], in order to score more native-like conformations that contain these bridging waters. However, decoy discrimination as measured by PNear was significantly worse for all lambda cutoffs (even for the best-performing glycans), indicating that even with explicit waters and sufficient near-native sampling, the Rosetta energy function was unable to use this information to accurately distinguish near-native decoys (S2 Table).
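The PNear metric [43] referred to above weights each decoy by a Boltzmann factor of its energy and by a Gaussian-like closeness to native. A minimal Python sketch of the commonly used form is given below; the default values of lam and kbt are placeholders, not the settings used in this work (see S1 Text).

```python
import numpy as np

def pnear(rmsd, energy, lam: float = 1.0, kbt: float = 1.0) -> float:
    """PNear = sum_i exp(-rmsd_i^2 / lam^2) * exp(-E_i / kbt) / sum_j exp(-E_j / kbt).

    rmsd:   per-decoy RMSD to native (Å)
    energy: per-decoy score (energy units)
    lam:    near-native RMSD scale (lambda, Å)
    kbt:    temperature factor, in the same units as energy
    """
    rmsd = np.asarray(rmsd, dtype=float)
    energy = np.asarray(energy, dtype=float)
    # Shift by the minimum energy so the exponentials cannot overflow;
    # the shift cancels between numerator and denominator.
    boltz = np.exp(-(energy - energy.min()) / kbt)
    near = np.exp(-(rmsd ** 2) / lam ** 2)
    return float((near * boltz).sum() / boltz.sum())
```

A funnel whose lowest-energy decoys lie close to native gives a value near 1, whereas a low-energy decoy far from native drives PNear toward 0, matching the behavior described for the best- and worst-performing glycans.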
In the benchmark set, the internal (superimposed) RMSDs are generally low compared with the overall RMSDs (84% < 3 Å), showing that the energy function, guided by the QM-derived sugar_bb energy term, can accurately predict the internal structure of many glycans, but that its scoring of glycan-protein interactions needs further improvement.

Density building
An increasing number of glycoprotein structures are being determined. To assist structure determination, many recent glycan modeling tools have focused on building and refining glycan structures using the experimental density, especially for structures with many resolved glycans, such as HIV Env. We tested the ability of the GlycanTreeModeler to build glycan structures using crystallographic density information to guide modeling and decoy discrimination through integrated density scoring [42]. The experiment was conducted in the same manner as de novo modeling: all backbone dihedral angles of the glycan to be modeled were first randomized for each output decoy, and all crystallographic waters were removed. For each of the 25 glycans, the lowest-energy model was used for assessment.
Without further refinement or any additional changes to the protocol, all glycans were modeled at sub-angstrom accuracy. The best glycan in the current benchmark, with six residues, was built at 0.08 Å RMSD to native (PDB 3gml, glycan at position 165A), while the worst, a five-residue glycan, was modeled at 0.88 Å RMSD (PDB 1gai, glycan at position 171A). For both of these glycans, the funnel plots were generally good, with PNear values of 0.99 and 0.46, respectively, at a lambda of 1.0 Å (Fig 3). For the 1gai 171A glycan, the last residue is twisted in the best model relative to the native and fits two of its oxygens into the weak residue density at a different angle than in the solved structure. This twist is clearly visible in the funnel plot, where the distribution of models below 1 Å is bimodal, indicating two primary close solutions to the electron density (Fig 3F).
Overall, the GlycanTreeModeler achieved a mean heavy-atom RMSD of 0.48 Å using all residues and 0.34 Å using only residues with acceptable fits to the density (133/139 total glycan residues, S1 Text). The corresponding median RMSDs were 0.31 Å and 0.28 Å, while the mean RMSD of the glycan root (first two sugar residues) was 0.23 Å. PNear values at a lambda of 1.0 Å were generally favorable, indicating high-quality funnels, with a mean of 0.86 and a median of 0.92 (Fig 4B). These results show that the GlycanTreeModeler can be effective for modeling known glycans into electron density, especially in combination with existing refinement methods [41].

Sugar coating protein surfaces
The addition of glycans to exposed protein surfaces can reduce B cell receptor access to underlying surface epitopes; this approach (called "glycan masking") has been used to reduce antibody responses against off-target epitopes of designed immunogens [12] [46] [47] [48]. Given the ability of the GlycanTreeModeler to predict the spatial arrangement of complex glycans, we used the algorithm in combination with the RosettaScripts SugarCoating methods for sequon design and computational glycosylation to iteratively design four N-linked glycans onto the outer surface of the I53-50A trimeric component of the I53-50 protein nanoparticle scaffold (Fig 5A; details of the design approach are described in Materials and Methods in S1 Text, and designed sequences and glycan positions are given in S4 Table). I53-50 was selected as a model immunogen because it is currently in clinical trials as the nanoparticle scaffold for SARS-CoV-2 [49] and RSV [50] vaccines.
When glycosylated I53-50A trimers and I53-50B pentamers were mixed in vitro at equimolar concentrations, the two components self-assembled into I53-50(gly) nanoparticles that display 240 glycans on the outer surface (Fig 5A and 5B). Biophysical characterization by negative stain transmission electron microscopy (nsTEM), dynamic light scattering (DLS), and size exclusion chromatography (SEC) confirmed the formation of monodisperse particles with the known I53-50 morphology (Fig 5B). SDS-PAGE analysis of the I53-50A(gly) trimer treated with PNGase F confirmed that the designed glycans were present on the protein (Fig 5B). Further in vitro characterization and antibody responses against these glycosylated I53-50A trimers have recently been described in other reports [13]. Mice were immunized three times with 5.57 μg of I53-50 or I53-50(gly) particles. Anti-I53-50A trimer serum antibody titers were significantly lower in mice immunized with I53-50(gly) particles than in mice immunized with I53-50 particles, whereas anti-I53-50A(gly) trimer titers were unchanged between the two groups (Figs 5C and S5). These data demonstrate that the methods presented here can be used for glycan masking through the design and analysis of potential sequon motifs and of the spatial arrangement of putative glycans on protein surfaces.

Discussion
The GlycanTreeModeler and associated tools allow modelers to accurately model glycans of interest through de novo and density-guided modeling. The algorithm and energy function were rigorously optimized and benchmarked with glycans of varying length and complexity, reaching a median de novo RMSD of 2.7 Å. In fact, even before full optimization and release, the GlycanSampler algorithm (previously the glycan_relax app) was used to model glycans on HIV [52], Hepatitis C [53], vaccine candidates [54] [55], and (with the final optimized version) SARS-CoV-2 [56], illustrating the general utility of the algorithm and its potential to inform chemical biology.
The modular nature of Rosetta and of the tools created for this work allows them to be used in a variety of complex modeling and design tasks. The GlycanTreeModeler was used with previously published density tools [42] to build glycans into their crystallographic or cryo-EM experimental density with sub-angstrom accuracy. However, while these results are encouraging, a truly automated solution for glycoprotein modeling must also sample glycan chemistries, branching, and kinematics simultaneously in order to build potential glycan residues into the density of unknown glycans. Knowledge of the range of glycoforms and occupancy occurring at a glycosylation site can be obtained through mass spectrometry techniques [21] [57], but because of chemical and structural heterogeneity at any single glycan site, modelers will typically need to build models for multiple glycoforms at a single site, especially for complex glycans. The tools presented here can sample and build multiple potential whole glycans at a site through the SimpleGlycosylateMover, but core Rosetta methods that also consider species- and cell-type-dependent glycan chemistries during GlycanTreeModeler runs, or end-to-end deep learning methods, would be welcome additions to the methods presented here.
By combining the tools through RosettaScripts, it becomes possible to computationally design glycan sequons at ideal positions on a protein, and then build and model multiple potential glycans at a variety of sites in a symmetric manner.This general workflow was used to sugarcoat a clinically relevant nanoparticle vaccine scaffold with N-linked glycans.In vitro and in vivo testing of this glycosylated scaffold showed a decrease in the humoral immune response to the glycan-masked surface.Sugar coating therapeutics using these methods could potentially reduce off-target effects of many preclinical biologics, especially with respect to immunogenicity.
Most glycans can sample a wide range of conformations in solution, as they are mostly polar, usually exposed to solvent, and have many conformational degrees of freedom. Thus, accurately predicting the lowest-energy states (and highest-occupancy conformations) of glycans is difficult. In addition, these glycans may be forced into higher-energy internal states through local and crystal contacts. While we can generalize that low-energy conformations found by the GlycanTreeModeler should be indicative of probable solution conformations, the GlycanTreeModeler was not benchmarked on an experimental ensemble of glycan structures. The few glycan ensembles determined by solution NMR [58] may approximate conformational ensembles in solution and could be the basis for future benchmarking and protocol and score function optimization. Even with this consideration, many of the benchmark glycans that were modeled accurately relative to their crystal structures are not constrained by monomer or crystal contacts and have few interactions with protein residues at their glycan root. Additionally, predictions of the internal (superimposed) RMSDs of all benchmarked glycans were generally favorable, with a median of 1.1 Å and a mean of 2.7 Å, indicating that the glycan root, subsequent torsional preferences, and intra-glycan interactions may be the determining structural factors for these isolated glycans.
Although the algorithm is capable of accurate de novo modeling of many glycans (especially at their base) and has been used for experimental glycan masking, there is certainly room for improvement. In nearly all of the benchmarks, the native structure is sampled adequately, but for a subset of structures the energy function is not able to select near-native structures. Because many native glycans in the benchmark set have water-mediated hydrogen bonds, we originally hypothesized that explicit water modeling might help the energy function discriminate near-native models. However, we found that implicit solvation actually led to better discrimination as measured by PNear. To improve the algorithm further, the Rosetta energy function will need to be optimized for glycan-protein interactions, specifically hydrogen bonding and solvation, and to include energy terms that better represent aromatic CH-π interactions [44]. Finally, the compute time of the algorithm grows as the number of glycans to model increases, which can be prohibitive for large, multimeric glycoproteins such as HIV Env.
In this work, optimization of both sampling and scoring was necessary to improve overall accuracy. A key component of the algorithm is the nature-inspired kinematics used during sampling, which proved to be an important determinant of overall accuracy. The kinematics were rigorously benchmarked here, although kinematics are not always considered or optimized in state-of-the-art classical modeling algorithms. This benchmarking was made possible by the SimpleMetrics framework and a new RosettaScripts application, both of which were created and used continuously throughout this work. In addition, we demonstrated the usability of these methods by glycan masking the trimeric subunit of a two-component self-assembling protein nanoparticle that is used as a scaffold for multivalent display of viral glycoprotein antigens. While glycan masking did not completely eliminate antibodies specific for the trimer, the experimental results provide proof of concept that glycan masking can significantly reduce antibody responses.
SimpleMetrics have now become a critical tool for general analysis in Rosetta and a way to export important information to external algorithms, such as a quantum annealer [59]. As core protocols in Rosetta continue to be optimized, and as deep learning becomes a more integral aspect of modeling and design, SimpleMetrics should allow robust analysis of new protocols, results, and Rosetta benchmarks, as it has for this work.
These results show that the GlycanTreeModeler is able to accurately predict glycan structures de novo, build them into known density, and be used in SugarCoating protein surfaces.In addition, the modular nature of the components allows them to be further developed for specific engineering tasks such as immunogenicity reduction or the optimization of developability characteristics such as half-life, solubility, and aggregation potential.

Methods
The Rosetta GlycanTreeModeler builds whole glycan "trees" through an algorithm that mimics the growth of natural trees.A primary difficulty in de novo glycan modeling is the correct prediction of the base of glycoconjugate structures.To increase the accuracy of the first few sugars of the tree, our algorithm begins modeling from the "root" (reducing end) of the glycan tree out to the branching "foliage".Monte Carlo optimization through sampling of glycan degrees of freedom (DOFs) is carried out through the new GlycanSampler, which includes routines for glycosidic torsion angle (backbone) sampling, structure minimization, hydroxyl and other side-chain optimization, and neighbor protein side-chain optimization.During the protocol, the total amount of sampling scales linearly with the number of glycan residues being modeled, ensuring even sampling regardless of the size or quantity of glycans being modeled.
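The Monte Carlo optimization described above can be illustrated with a minimal Metropolis sketch in Python. It is not the GlycanSampler: it perturbs one torsion at a time with uniform random moves, whereas the GlycanSampler also draws conformers, minimizes, and repacks side chains. The function name, energy callback, rounds_per_residue, and step size are all hypothetical; the only property carried over from the text is that the number of rounds scales linearly with the number of glycan residues.

```python
import math
import random
from typing import Callable, List, Tuple

def metropolis_sample(
    torsions: List[float],
    energy: Callable[[List[float]], float],
    n_residues: int,
    rounds_per_residue: int = 100,  # hypothetical; total rounds grow linearly with n_residues
    kt: float = 1.0,
    max_step: float = 30.0,         # hypothetical perturbation size in degrees
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Metropolis Monte Carlo over glycosidic torsions; returns the lowest-energy state seen."""
    rng = random.Random(seed)
    current = list(torsions)
    e_current = energy(current)
    best, e_best = list(current), e_current
    for _ in range(rounds_per_residue * n_residues):
        trial = list(current)
        i = rng.randrange(len(trial))  # pick one torsion to perturb
        # Perturb, then wrap the angle back into [-180, 180).
        trial[i] = (trial[i] + rng.uniform(-max_step, max_step) + 180.0) % 360.0 - 180.0
        e_trial = energy(trial)
        # Metropolis criterion: accept downhill moves; accept uphill moves with Boltzmann probability.
        if e_trial <= e_current or rng.random() < math.exp(-(e_trial - e_current) / kt):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best
```

Returning the lowest-energy state visited, rather than the final state, mirrors the text's statement that the lowest-energy model found during the Monte Carlo run is the one written out.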
The GlycanSampler optimizes glycosidic torsion angles using statistically favorable sets of phi, psi, and omega angles (conformers) and single torsions sampled from QM-derived probabilities originally used for energetic evaluation of glycosidic linkages [27] [31]. Conformer sets depend on each chemically distinct pair of saccharides making up a glycosidic bond, whereas single torsions depend on the anomeric chemistry of the linkage. We derived the conformers for this work through a new bioinformatic analysis of glycans in the PDB using adaptive kernel density estimates, similar to the approach used for the 2010 Dunbrack backbone-dependent rotamer library [60] (S1 Text).
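To show what an adaptive kernel density estimate does, the following sketch estimates the density of a single torsion angle with von Mises kernels whose width adapts to the local data density (Abramson's square-root rule), then picks density peaks as candidate conformer centers. This is an illustration under stated assumptions: the actual analysis (described in S1 Text) works on linked phi/psi/omega angles, and the parameter values (kappa, alpha, grid size, the 5% peak-height cutoff) are hypothetical.

```python
import numpy as np

def von_mises(theta, mu, kappa):
    """von Mises density on angles in radians (a circular analogue of a Gaussian)."""
    return np.exp(kappa * np.cos(theta - mu)) / (2.0 * np.pi * np.i0(kappa))

def adaptive_kde(samples_deg, kappa=20.0, alpha=0.5, grid_points=360):
    """Adaptive KDE for one torsion; returns grid angles (deg) and density (per radian)."""
    x = np.radians(np.asarray(samples_deg, dtype=float))
    k0 = np.full(x.shape, kappa)
    # 1) Fixed-bandwidth pilot density evaluated at every sample.
    pilot = von_mises(x[:, None], x[None, :], k0[None, :]).mean(axis=1)
    # 2) Local bandwidth factors: wider kernels where data are sparse.
    g = np.exp(np.log(pilot).mean())            # geometric mean of the pilot values
    lam = (pilot / g) ** (-alpha)
    kappa_i = np.clip(kappa / lam ** 2, 1e-3, 500.0)  # concentration ~ 1 / bandwidth^2
    # 3) Adaptive estimate on a regular periodic grid.
    grid = np.linspace(-np.pi, np.pi, grid_points, endpoint=False)
    dens = von_mises(grid[:, None], x[None, :], kappa_i[None, :]).mean(axis=1)
    return np.degrees(grid), dens

def density_peaks(grid_deg, dens, min_rel_height=0.05):
    """Local maxima on the periodic grid (candidate conformer centers), ignoring tiny bumps."""
    left, right = np.roll(dens, 1), np.roll(dens, -1)
    is_peak = (dens > left) & (dens > right) & (dens >= min_rel_height * dens.max())
    return grid_deg[is_peak]
```

Narrow kernels in well-populated regions preserve sharp conformer peaks, while wide kernels in sparse regions avoid spurious bumps from isolated observations, which is the motivation for an adaptive rather than fixed bandwidth.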
To optimize the conformations of glycan residues on different branches at the same time, the glycan tree is built layer-by-layer, with a layer defined as the residue distance to the root (Fig 6A).Once each new layer is built and optimized, all previous layers are then optimized further (Fig 6B).After all layers are built and optimized, a final optimization is conducted.The lowest energy model (decoy) found during this Monte Carlo algorithm is output at the end of the program as a PDB file.The lowest-energy structure of all the output decoys is used as the "best" model produced by the algorithm (S1 Text).
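The layer-by-layer order can be sketched by assigning each residue its distance from the root with a breadth-first search and then listing the build and sampling steps. The tree representation, the function names, and the step labels are hypothetical illustrations of the schedule described above, not Rosetta code.

```python
from collections import deque
from typing import Dict, List

def assign_layers(children: Dict[int, List[int]], root: int) -> Dict[int, List[int]]:
    """Group glycan residues by their residue distance from the root (root = layer 0)."""
    layers: Dict[int, List[int]] = {}
    queue = deque([(root, 0)])
    while queue:
        res, depth = queue.popleft()
        layers.setdefault(depth, []).append(res)
        for child in children.get(res, []):
            queue.append((child, depth + 1))
    return layers

def build_schedule(layers: Dict[int, List[int]]) -> List[str]:
    """Build and sample each new layer, then re-sample all built layers, then a final pass."""
    steps = []
    for depth in sorted(layers):
        steps.append(f"build+sample layer {depth}: {layers[depth]}")
        built = [r for d in sorted(layers) if d <= depth for r in layers[d]]
        steps.append(f"sample all built layers: {built}")
    steps.append("final optimization of the whole tree")
    return steps
```

For a five-residue core (GlcNAc 1, GlcNAc 2, Man 3, and two branching Man residues 4 and 5), assign_layers({1: [2], 2: [3], 3: [4, 5]}, root=1) places both branch mannoses in layer 3, so they are built and optimized together.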

Benchmarking protocol
Benchmarking was carried out with the SimpleMetrics framework developed for this work. A SimpleMetric takes a structure and returns a metric or set of metrics, which can then be written to an output scorefile at the end of a RosettaScripts run. SimpleMetric types were developed for textual, numeric, coupled, and per-residue data (S7 Table). These metrics enable calculation of RMSDs, solvent-accessible surface area (SASA), complex hydrogen-bonding networks, and other biophysical properties. They can also be used on the fly with Rosetta filters through the SimpleMetricFilter, and simple calculations on per-residue data can be performed with the ResidueSummaryMetric. Many of these metrics were used for benchmarking and analysis (S1 Text). Further, a new application, rosetta_scripts_jd3, was created to enable large-scale benchmarking of Rosetta protocols. This application executes different RosettaScripts protocols in parallel, with all resulting experiments tagged in the scorefile output. This allows an entire experimental benchmarking pipeline to be created, run, and analyzed through a single Rosetta execution. Python was used to load the resulting JSON scorefile for data analysis and figure creation using the numpy [61], pandas [62], and seaborn [63] libraries. All protocol components and their availability in RosettaScripts are listed in S8 Table.
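The scorefile analysis step can be sketched with the standard library alone. The sketch assumes a line-delimited JSON scorefile (one JSON object per decoy); the field names "experiment", "decoy", and "rmsd" are hypothetical placeholders for whatever tags and metrics a given protocol writes, and total_score is assumed to hold the energy.

```python
import json
from typing import Dict, Iterable, List

def load_scores(lines: Iterable[str]) -> List[dict]:
    """Parse a line-delimited JSON scorefile, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]

def best_by_experiment(records: List[dict], tag: str = "experiment",
                       score: str = "total_score") -> Dict[str, dict]:
    """Keep the lowest-energy decoy for each experiment tag."""
    best: Dict[str, dict] = {}
    for rec in records:
        key = rec[tag]
        if key not in best or rec[score] < best[key][score]:
            best[key] = rec
    return best
```

With pandas, the same selection is pd.read_json(path, lines=True) followed by df.loc[df.groupby("experiment")["total_score"].idxmin()], which is convenient when the results feed directly into seaborn plots.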
To assess the predictive capability of the GlycanTreeModeler algorithm, the dihedral angles of the glycans are randomized at the start of the algorithm and waters are removed. Models are compared with the crystal structures using the all-heavy-atom root mean square deviation (RMSD), with the lowest-energy model of all output decoys used for assessment (Fig 7). The RMSD is calculated over all glycan residues that have an acceptable fit to the density in the native model, as terminal residues of some glycans often cannot be observed in the density because of their higher flexibility. The methods used for the RMSD calculation are described in S1 Text.
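Restricting the RMSD to residues that fit the native density can be sketched as a residue mask applied before the usual heavy-atom RMSD. The per-residue fit score and the 0.6 cutoff are hypothetical stand-ins for the acceptance criterion described in S1 Text; coordinates are assumed to be stored per residue with atoms in matching order, and no superposition is applied.

```python
import numpy as np
from typing import Dict

def masked_glycan_rmsd(model: Dict[int, np.ndarray], native: Dict[int, np.ndarray],
                       fit: Dict[int, float], min_fit: float = 0.6) -> float:
    """Heavy-atom RMSD (no superposition) over residues whose native density fit >= min_fit.

    model/native: residue number -> (n_atoms, 3) heavy-atom coordinates in matching order.
    fit: residue number -> per-residue density-fit score (hypothetical metric and cutoff).
    """
    keep = sorted(r for r in native if fit[r] >= min_fit)
    if not keep:
        raise ValueError("no residues pass the density-fit cutoff")
    a = np.concatenate([model[r] for r in keep])
    b = np.concatenate([native[r] for r in keep])
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))
```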

Glycan masking
Glycan masking was carried out with two new RosettaScripts components: the CreateGlycanSequonMover, which designs typical and enhanced [64] [17] glycan sequons into a protein at a desired position, and the SimpleGlycosylateMover, which adds whole glycans, specified by IUPAC name, onto a protein. Glycans were then sampled with the GlycanTreeModeler through RosettaScripts at each potential glycan position individually. Low-energy, non-clashing models were used to select optimal positions for experimental validation, with sequon sequences designed for each position using the CreateGlycanSequonMover (S1 Text).
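The core idea of sequon design, placing an N-X-S/T motif at a chosen position, can be sketched on a plain sequence string. This is not the CreateGlycanSequonMover and does not design enhanced sequons; the default acceptor residue and the residue used to replace a proline at X are hypothetical choices made only for illustration.

```python
from typing import List, Tuple

def design_sequon(sequence: str, position: int, acceptor: str = "T",
                  proline_replacement: str = "A") -> Tuple[str, List[str]]:
    """Introduce an N-X-S/T sequon with the asparagine at a 1-based position.

    Returns the new sequence and point mutations written as e.g. "L3N".
    The acceptor default and proline replacement are hypothetical choices.
    """
    if acceptor not in ("S", "T"):
        raise ValueError("acceptor must be S or T")
    i = position - 1
    if i < 0 or i + 2 >= len(sequence):
        raise ValueError("sequon does not fit at this position")
    seq = list(sequence.upper())
    # X may be any residue except proline, so only a proline at X is changed.
    x_new = proline_replacement if seq[i + 1] == "P" else seq[i + 1]
    mutations = []
    for idx, new in ((i, "N"), (i + 1, x_new), (i + 2, acceptor)):
        if seq[idx] != new:
            mutations.append(f"{seq[idx]}{idx + 1}{new}")
            seq[idx] = new
    return "".join(seq), mutations
```

For example, design_sequon("MKLPGEV", 3) returns "MKNATEV" with mutations L3N, P4A, and G5T; in the actual workflow each candidate site would then be glycosylated and modeled to check for clashes before selection.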

Figures
Figures were created using matplotlib [68]. Glycans were visualized in PyMOL using the Azahar plugin [69], which was expanded for this work (https://github.com/BIOS-IMASL/Azahar/pull/17). The cartoonize command was generally run for figures (e.g., cartoonize A for chain A).

Fig 6. Glycan modeling diagram. a. Glycan tree built layer by layer. Numbers indicate the distance to the root of the glycan tree, which is the first residue. b. After a layer is built, glycan sampling is performed on the new layer and then on all layers before the next layer is built. c. Diagram showing the major components of the GlycanSampler. The GlycanSampler (GS) is a weighted random sampler: each DOF is sampled with a specific probability (S1 Text). https://doi.org/10.1371/journal.pcbi.1011895.g006

Example of glycan modeling of a single nine-residue glycan tree for a single decoy, with the layer size set to one.
(MOV)