Insights on cross-species transmission of SARS-CoV-2 from structural modeling

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is responsible for the ongoing global pandemic that has infected more than 31 million people in more than 180 countries worldwide. Like other coronaviruses, SARS-CoV-2 is thought to have been transmitted to humans from wild animals. Given the scale and widespread geographical distribution of the current pandemic and confirmed cases of cross-species transmission, the question of the extent to which this transmission is possible emerges, as well as what molecular features distinguish susceptible from non-susceptible animal species. Here, we investigated the structural properties of several ACE2 orthologs bound to the SARS-CoV-2 spike protein. We found that species known not to be susceptible to SARS-CoV-2 infection have non-conservative mutations in several ACE2 amino acid residues that disrupt key polar and charged contacts with the viral spike protein. Our models also allow us to predict affinity-enhancing mutations that could be used to design ACE2 variants for therapeutic purposes. Finally, our study provides a blueprint for modeling viral-host protein interactions and highlights several important considerations when designing these computational studies and analyzing their results.


Introduction
SARS-CoV-2, a novel betacoronavirus first identified in China in late 2019, is responsible for the ongoing global pandemic that has infected more than 31 million people worldwide and killed over 900,000 [1]. Comparative genomics studies suggest that SARS-CoV-2 was transmitted to humans from an animal host, most likely bats or pangolins [2]. Given the widespread human-to-human transmission across the globe, the question emerges whether humans can infect other animal species with SARS-CoV-2, namely domestic and farm animals. Characterizing molecular features necessary for infection is a first step towards identifying potential intermediate hosts that can act as reservoirs for the virus and has important global health, animal welfare, and ecological implications.
During the course of this pandemic, there have been several news reports of domestic, farm, and zoo animals testing positive for SARS-CoV-2 infection. Belgium [3] and New York [4] reported positive symptomatic cases in cats, The Netherlands reported infection of minks in farms [5], and the Bronx Zoo in New York reported infections in lions and tigers [6]. In all these cases, the vehicle of transmission appears to be an infected human owner or handler. More importantly, in the case of the mink farms in The Netherlands, there is evidence of human-to-animal-to-human transmission. In addition to these reported cases, several groups put forward both pre-prints and peer-reviewed studies on animal susceptibility to SARS-CoV-2 under controlled laboratory conditions [7][8][9], two of which are of particular interest. The first study showed that cats, civets, and ferrets are susceptible to infection; pigs, chickens, and ducks are not, while the results for dogs were inconclusive [7]. A second study, using human cells expressing recombinant SARS-CoV-2 receptor proteins showed that camels, cattle, cats, horses, sheep, and rabbit can be infected with the virus, but not chicken, ducks, guinea pigs, pigs, mice, and rats [8]. Together, these studies provide a dataset of confirmed susceptible and non-susceptible species that we can analyze to find molecular discriminants between the two groups. For simplicity, from here on we will refer to susceptible and non-susceptible species as SARS-CoV-2 pos and SARS-CoV-2 neg , respectively.
Like SARS-CoV-1 before, SARS-CoV-2 infection starts with the binding of the viral spike protein to the extracellular protease domain of angiotensin-converting enzyme 2 (ACE2) [10], a single-pass transmembrane protein expressed on the surface of a variety of tissues, including along the respiratory tract and the intestine. Several biophysical and structural studies identified helices α1 and α2, as well as a short loop between strands β3 and β4 in ACE2 as the interface for the viral spike protein [10][11][12][13]. These studies also identified key differences between the sequences of the receptor binding domains (RBD) of SARS-CoV-1 and SARS-CoV-2, which explain the stronger interaction of the latter with human ACE2. If binding to ACE2 is the first step in the infection cycle, we can reasonably assume that sequence variation across ACE2 orthologs can explain why only some animal species are susceptible to infection. In addition, combining structural and binding data with the natural diversity of ACE2 across species can help elucidate the key aspects that drive ACE2 interaction to viral RBDs and ultimately help guide the development of therapeutic molecules against SARS-CoV-2.
Unsurprisingly, given the rapid release of sequence and structural data on SARS-CoV-2 at the onset of the pandemic, we and several other groups contributed analyses of how sequence variation affects ACE2 binding to SARS-CoV-2 RBD [14][15][16][17][18][19]. Three recent works, specifically, focus on the effects of ACE2 variation on RBD binding. Damas et al. carried out a very comprehensive and multi-disciplinary computational analysis of ACE2 orthologs to identify species with the highest risk for SARS-CoV-2 infection [18]. In their work, meanwhile published in PNAS, the authors analyze the sequences of 410 vertebrate species and describe a set of 25 amino acids of ACE2 important for binding the viral RBD. In a similar study posted on biorxiv, Lam et al. used computational modeling to predict ΔΔG of mutations in 215 animal species, assess their risk for infection, and identify a number of locations on ACE2 that contribute to binding SARS-CoV-2 RBD [15]. Separately, in a single-author report also posted on biorxiv, Erik Procko used deep mutagenesis and selection experiments to systematically characterize the effect of single-point ACE2 mutants on the binding affinity to the viral RBD. From these results, the author designs several ACE2 variants that bind RBD with high affinity and have potential as therapeutics for COVID-19 [14]. Finally, while this manuscript was under revision, Alexander et al. deposited their work in biorxiv predicting the susceptibility of several animal species to infection by SARS-CoV-2 using computational modeling [20].
In this study, we aimed to leverage structural, binding, and sequence data to investigate how different ACE2 orthologs bind to SARS-CoV-2 RBD. We selected 28 animal species likely to encounter humans in a variety of residential, industrial, and commercial settings. For each of these species, we generated 3D models of ACE2 bound to RBD and refined these models using short molecular dynamic simulations. After refinement, we found that models of SARS--CoV-2 pos species generally have a lower (better) score than those of SARS-CoV-2 neg species. Following this positive result, we carried out a per-residue energy analysis that predicts both key locations in ACE2 that are consistently mutated across SARS-CoV-2 neg species, as well as possible mutations that likely enhance binding to the viral RBD. Collectively, our results provide a structural framework to understand why certain animal species are not susceptible to SARS-CoV-2 infection while, at the same time, providing a starting point for rational engineering of antiviral molecular therapeutics. Finally, our work also provides a blueprint for experts and non-experts alike to carry out future structural studies of viral-host protein interactions at high-resolution.

Sequence conservation of ACE2 orthologs
We analyzed the sequence conservation of ACE2 across our dataset (Table 1), with respect to the entire sequence (591 residues) and to the interface residues computed from a structure of ACE2 bound to RBD [12] (PDB ID: 6m17) (22 residues) (S1 Table). All orthologs are reasonably conserved, with global similarity values to the human ACE2 sequence (hACE2) ranging from 72% (goldfish) to 99.5% (chimpanzee) (S1 Fig). All species coarsely cluster in three classes consistent with evolutionary distance to humans: primates have the highest similarity values, followed by other mammals, birds and reptiles, and finally fish. Zooming in on the interface residues, we find more variation (Fig 1, left). Similarity values for this region range from 50% (crocodile) to 100% (all 3 primates) but, despite an overall correlation (Pearson R 2 of 0.68), they do not always match global similarities. Hedgehogs and sheep, for example, share 86.7% and 86.4% global similarity with hACE2, respectively, but 59% and 95.5% for the interface region. In other words, sheep share 21 out of 22 residues with hACE2 at the interface with RBD, while hedgehogs share 13. The horseshoe bat, one of the proposed animal reservoirs for SARS-CoV-2, shares 72.2% interface similarity with hACE2, a comparable value to the 77.3% of the SARS-CoV-2 neg mouse sequence. Altogether, these results prompt two observations. First, simply measuring sequence similarity, either globally or on the entire interface, is not sufficient to confidently predict SARS-CoV-2 susceptibility. Second, that the interface of the viral RBD is substantially plastic and able to bind to sufficiently different ACE2 orthologs.

Refinement of the hACE2:RBD complex
In order to validate the refinement protocol used in our analysis, we created and refined models of human ACE2 (hACE2) bound to SARS-CoV-2 RBD. We used the cryo-EM structure of full-length human ACE2 bound to the RBD, in the presence of the amino acid transporter B 0 AT1 (PDB ID: 6m17). Compared to a high-resolution crystal structure of the same complex [11] (PDB ID: 6m0j), the cryo-EM structure lacks several key contacts between our two proteins of interest, which we attribute to poor density for side-chain atoms at the interface region. Our refinement protocol restores the majority of these contacts (S1 Table), yielding an average HADDOCK score of -116.2 (arbitrary units, a.u.) for the 10 best models of the best cluster. See Materials and Methods for further details on the protocol. These scores fall within the range observed for a reference set of transient protein-protein interactions (N = 144, HADDOCK score = -124.9 ± 53.4) [21]. Upon visual inspection, the interfaces in our models are dominated by hydrogen bond interactions involving the ACE2 α1 helix and a small loop between strands β3 and β4. There is one single salt-bridge involving hACE2 D30 and RBD K417 consistently present in all our hACE2 models. These observations all agree with the published crystal structure. Further, the buried surface area of the refined models is also in agreement with published crystal structures (~1800 Å 2 ). As such, we are confident that our modeling and refinement protocol is robust enough to model all ACE2 orthologs. A similar but less conclusive trend is observed between the sequence similarities of amino acid residues interacting with the viral RBD (derived from PDB 6m17) (right). Collectively, these results suggest that SARS-CoV-2 neg species lack specific key ACE2 amino acid residues that form strong interactions with the viral spike protein, leading to impaired binding between the two proteins. Species are ordered in increasing order of HADDOCK score. Species for which SARS-CoV-2 susceptibility is unknown or assays were inconclusive are shown in gray. https://doi.org/10.1371/journal.pcbi.1008449.g001

PLOS COMPUTATIONAL BIOLOGY
Insights on cross-species transmission of SARS-CoV-2 from structural modeling.

Refinement of orthologous ACE2:RBD complexes
We modeled and refined complexes for all 28 ACE2 orthologs in our dataset (Table 1) using the same protocol as above. The representative models for each species (10 best models of the best cluster) are available for visualization and download at https://joaorodrigues.github.io/ ace2-animal-models/. The HADDOCK scores of all 29 ACE2 complexes (including hACE2) range from -137.5 (dog) to -93.2 (mouse), indicating substantial differences between these interfaces (Fig 1, right, and S2 Table). The average HADDOCK score is -116.7, very close to that of the human complex (-116.2). Overall, models of SARS-CoV-2 pos species have consistently lower (better) scores than those of SARS-CoV-2 neg species. Although it is well-known that docking scores do not quantitatively correlate with experimental binding affinities [22], these scores suggest that SARS-CoV-2 neg species lack one or more key ACE2 residues that contribute significantly to the interaction with RBD.
To understand what forces drive the interactions between ACE2 and SARS-CoV-2 RBD, we quantified the contribution of each component of the HADDOCK scoring function to the overall score (Fig 2). The HADDOCK score is a linear combination of van der Waals, electrostatics, and desolvation energy terms. In our models, electrostatics are the most discriminatory component (Pearson R 2 of 0.62), followed by desolvation (0.28), and lastly van der Waals (0.07). These correlations suggest that differences between the models of the different species originate primarily in polar and charged residues, in agreement with observations from experimental structures. In addition, the buried surface area of the models also correlates quite strongly with the HADDOCK score (Pearson R 2 of 0.65), which is unsurprising since larger interfaces tend to make more contacts. Most models bury between 1700 and 1850 Å 2 , in agreement with the crystal and cryo-EM structures, while the top-scoring species (dog and goldfish) bury nearly 2000 Å 2 and the lowest-scoring (mouse) bury only 1600 Å 2 . Finally, there is a weak correlation between the average HADDOCK score of the representative models and the sequence similarity of the ACE2 interface residues (Pearson R 2 of 0.16) (S2 Fig).

Structural and energetic differences between SARS-CoV-2 pos and SARS-CoV-2 neg species
To gain further insight on how ACE2 sequence variation across the different orthologs affects binding to SARS-CoV-2 RBD, we calculated HADDOCK scores for each interface residue in the refined models. This high-resolution analysis highlights several ACE2 amino acids with strong interaction energies that differ between SARS-CoV-2 pos and SARS-CoV-2 neg species (Fig 3 and S3 Fig).
The first and most relevant of these sites is amino acid 30, which in hACE2 (D30) interacts with RBD K417 to form the only intermolecular salt-bridge of the interface (Fig 4, top left). In all 12 SARS-CoV-2 pos species, this site is occupied by a negatively charged amino acid residue. In contrast, 4 out of 5 SARS-CoV-2 neg species have a hydrophobic or polar residue at this position, breaking the intermolecular salt-bridge (Fig 4, bottom left). The second site is amino acid 31, a lysine in hACE2, and in nearly all of the SARS-CoV-2 pos species, that interacts both with ACE2 E35 and RBD Q493 (Fig 4, top middle). The only exceptions are the civet and dromedary sequences, mutated to threonine and glutamate, respectively. In the case of the civet, our models show that T31 can still hydrogen bond with both E35 and RBD Q493. Dromedaries, on the other hand, share E31 with chickens, guinea pigs, and ducks, all SARS-CoV-2 neg species. However, in dromedary ACE2 the likely electrostatic repulsion between E31 and E35 is compensated by a lysine at position 76 (Q76 in hACE2) leading to the formation of an additional intramolecular salt-bridge that possibly stabilizes the fold of ACE2 and frees E35 to hydrogen bond with Q493 (90% of our models). Those three SARS-CoV-2 neg species have an additional charge-reversal mutation at position 35. In all our chicken and duck models, E31 is locked in an intramolecular salt-bridge with R35, weakening the intermolecular hydrogen bond with RBD Q493 (Fig 4, bottom middle). Finally, guinea pigs compensate K31E with E35K and remain able to hydrogen bond with RBD, while rats have a lysine at this position.
Besides these major discriminatory sites, we identified multiple other sites that are systematically mutated in SARS-CoV-2 neg species. The first of these sites is K353 (in hACE2), which For each species (row), the blocks (columns) represent amino acid residues within 5Å of the viral RBD in any of the species' best 10 models. The identity of the amino acid is shown in one-letter code. The colors represent the HADDOCK score of each residue, averaged over the 10 models: lower scores (dark blue) indicate more favorable interactions, while positive scores indicate steric clashes or electrostatic repulsion. The first row shows the median of the averages for each column. From this analysis, we predict that amino acid residues at positions 30, 31, and 353 contribute the most to the stability of the ACE2:RBD complex. In SARS-CoV-2 neg species (red labels), some of these residues are consistently mutated (30 and 31), which could explain their lower susceptibility to infection. S3 Fig shows the

PLOS COMPUTATIONAL BIOLOGY
Insights on cross-species transmission of SARS-CoV-2 from structural modeling.
is involved in an intramolecular salt-bridge with D38, and two hydrogen bonds with RBD G496 and G502 (Fig 4, bottom right). In rat and mouse ACE2, both SARS-CoV-2 neg species, this residue is mutated to a histidine, which weakens the interaction with D38, possibly leading to increased conformational dynamics of the β3-β4 loop, lower stability of the ACE2 interface fold, and consequently lower binding affinity. Then, Q42, conserved in most other species, hydrogen bonds with RBD Y449 in the majority of our models. In canary, chicken, hedgehog, duck, and crocodile ACE2 sequences, this amino acid is mutated to a glutamate, which introduces the possibility of an additional intramolecular salt-bridge with K68, in ACE2 helix α2. As we observe in some of our models, this intramolecular interaction prevents the formation of the intermolecular hydrogen bond. Finally, amino acid 83, a tyrosine in hACE2 and all other SARS-CoV-2 pos is mutated to phenylalanine in 4 out of 5 SARS-CoV-2 neg species: mouse, duck, rat, and chicken. The loss of the hydroxyl group excludes residue 83 from a ternary hydrogen-bonding network involving Q24 and RBD N487 that likely stabilizes the protein-protein interface. Without this hydrophilic terminal group, residue 83 might also prefer less solvent accessible conformations in the unbound state, burying between both α1 and α2 helices and thus being less available to interact with RBD F486.
To further validate some of these predictions, we built models of hACE2 with mutations D30A, D30N, K31E/E35R, and K353H. In parallel, we built models of mouse ACE2 with mutations N30D, H353K, and a "humanized" triple-mutant N30D/N31K/H353K. The scores of these models are shown in S3 Table. As expected, mutating D30 on hACE2 to either alanine (hydrophobic) or asparagine (polar) breaks the intermolecular salt-bridge with RBD K417 and worsens the HADDOCK scores of these models by 10% and 6%, respectively. On mouse ACE2, the N30D mutation improves the HADDOCK score of the models by 14%. Swapping the charges of positions 31 and 35 on hACE2 (K31E/E25R) also leads to a deterioration of the intermolecular energy (-2%), in particular of the electrostatics component (-20%), in agreement with our previous analysis. On the other hand, mutating position 353 in hACE2 (K353H) and mouse ACE2 (H353K) leads to unexpected score changes. In hACE2 K353, the overall HADDOCK score improves due to better desolvation energies. In mouse ACE2 H353K, the overall score worsens despite improved electrostatics, offset by marked decreases in desolvation energy (-40%). We note that evaluating the impact of mutations at this site is difficult because of assumptions made about the tautomeric states of histidine, which remain fixed during the modeling process. Finally, as expected, the "humanized" mouse ACE2 has a much-improved HADDOCK score (-113.7) very similar to wild-type hACE2 (-116.2).

Affinity-enhancing mutations from ACE2 orthologs
In addition to highlighting discriminatory mutations between SARS-CoV-2 pos and SARS--CoV-2 neg species, our models also allow us to search for mutations that could be used to generate variants of hACE2 with higher affinity towards the viral RBD. To this end, we calculated a modified HADDOCK score for each residue, including both intra-and intermolecular interactions, and then subtracted the score of the corresponding residue in the hACE2:RBD models (see Material and Methods for details) (Fig 5).
The resulting analysis highlights several single-point mutations that we predict could confer a higher affinity for RBD if engineered on hACE2. Some we can explain with simple biophysics following a careful inspection of the models (Fig 6). Q24E, observed in both the pangolin and horseshoe bat sequences, contributes to a stronger hydrogen bond network with partner RBD N487, and helps stabilize the α1 helix through interactions with the backbone of neighboring S21; A25V, observed only in the dog sequence, is buried between helices α1, α2, and α3, and contributes to a stronger packing with neighboring hydrophobic and aromatic residues (L29, Y83, V93, and L97); D30E stabilizes the intermolecular salt-bridge with RBD K417 due to the longer glutamate side-chain; H34Y enhances the hydrophobic interactions with neighboring L455 and the aliphatic chain of RBD N493; F72Y introduces possible hydrogen-bonding interactions between helices α1 and α2, while maintaining strong hydrophobic packing through the phenyl ring; L79H, observed only in the ferret ACE2 sequence, allows for intermolecular hydrogen bonds with the backbone carbonyl of RBD G485, in addition to stabilizing helix α2 and the packing of helices α1 and α2 through hydrogen bonds with residue 76 and aromatic stacking with F28; finally, A387E, observed in the ferret sequence, can interact with both R354 (G354 in hACE2) and, more importantly, RBD R408.
Other mutations observed in top-scoring species and predicted in our models to have stronger local interactions are dependent on additional mutations in neighboring residues. F40S, observed in bat and horse ACE2, forms a hydrogen bond with the hydroxyl group of Y390 (F390 in hACE2); Y41H, observed in bat, donkey, and horse ACE2, contributes to a polar network involving RBD residues, namely Q498, T500, and N501, as well as hydrogen bonds with the backbone of ACE2 L351, that might help stabilize the local fold of the β3-β4 loop; lastly, Q76E, Q76K, and T78K all stabilize helices α1 and α2 through interactions with adjacent residues, such as E31 (dromedary), E74 (pangolin), E75 (dog), or H79 (ferret).

Can structural modeling predict cross-species transmission of SARS-CoV-2?
Our detailed computational modeling of 29 vertebrate ACE2 orthologs bound to SARS-CoV-2 RBD discriminates between previously reported SARS-CoV-2 pos and SARS-CoV-2 neg species. Models of SARS-CoV-2 neg species-chicken, duck, mouse, and rat-have clearly higher (worse) HADDOCK scores than average (Fig 1), suggesting that these species' non-susceptibility to infection could stem from deficient RBD binding to ACE2. Despite this clear trend, our predictions are not entirely correct. Our modeling ranks guinea pig ACE2 (SARS-CoV-2 neg ) as a better receptor for SARS-CoV-2 RBD than for example, human, cat, horse, or rabbit ACE2 (all SARS-CoV-2 pos species), despite experiments showing that there is negligible binding between the two proteins [8]. Our ranking largely agrees with other published analyses on much larger datasets [15,18]. Notable differences are the predicted susceptibilities of ferrets and bats, highscorers in our dataset but characterized as low-risk by the conservation model of Damas et al. These small differences suggest that is a benefit to adding a structural dimension to these predictive models.
These results highlight the need to carefully interpret computational predictions, a concern also shared by Damas et al [18]. As we note earlier in the introduction, SARS-CoV-2 infection is a complex multi-step process [23]. Thus, while we can assume that impaired ACE2 binding likely decreases odds of infection, we cannot state that ACE2 binding is predictive of infection. ΔHADDOCK score of individual ACE2 interface residues compared to hACE2. For each species (row), the blocks (columns) represent amino acid residues within 7.5Å of the viral RBD in any of the species' best 10 models. The identity of the amino acid is shown in one-letter code. The colors represent the ΔHADDOCK score-including intramolecular interactions-of each residue, averaged over the 10 models, compared to the average of the corresponding hACE2 residue: negative scores (dark blue) indicate a stabilizing mutation. This analysis highlights several potential affinity-enhancing mutations, namely Q24E, A25V, D30E, H34Y, F40S, Y41H, F72Y, L79H, and A387E. We note that this analysis requires further visual inspection of the models to account for additional variations in ACE2 sequence that may skew the per-residue HADDOCK score. Refer to the main text for details. S4 Fig shows the  For instance, experiments with recombinant ACE2 show that the pig ortholog binds SARS--CoV-2 RBD and leads to entry of the virus in host cells [8], but tests in live animals returned negative results [7]. Our modeling protocol, like any structure prediction method, is limited by the accuracy of its sampling method (how many conformational states do we model) and of its scoring function (how do we pick the most native-like model). In addition, by basing our models on a single structure of the hACE2:RBD complex, we make assumptions about the bound state of the two proteins when it is known from structures of the full-length SARS-CoV-2 spike protein [24] and coarse-grained simulations [25] that there is considerable flexibility at the interface. As such, our computational models alone cannot be used to predict whether certain animal species are at risk of infection. With additional data, it might be possible to build a reliable predictor of susceptibility to infection, but as it stands, our ranking of species requires thorough experimental validation before drawing definitive conclusions. What our models do allow us to conclude, however, is that there are distinctive molecular features between SARS--CoV-2 pos and SARS-CoV-2 neg species. As the adage goes, 'all models are wrong, but some are useful.'

SARS-CoV-2 neg species lack important polar and charged ACE2 residues
Having established that our modeling protocol and scoring function generally distinguishes between SARS-CoV-2 pos and SARS-CoV-2 neg species, we took a closer look at the biophysical properties of the interface across all orthologs. We decomposed our scoring function in its individual energy terms (Fig 2) and found that SARS-CoV-2 neg models rank worse due to a substantial decrease in electrostatic energy and interface surface area. An exhaustive per- residue energy analysis of the 10 best models for each species (300 models in total), reveals a loss of hydrogen bonds and salt-bridges in models of most SARS-CoV-2 neg species. Collectively, our set of ACE2 residues predicted to be key for the interaction (D30, K31, H34, E35, Q42, and Y83) overlaps significantly with other computational predictions [15,18] and is largely in agreement with experimental mutagenesis data [14]. In contrast, our models do not attribute particular importance to M82, highlighted as a critical residue by Damas et al, nor to any of the glycosylated moieties attached to N53, N90, or N322, known to play a role in the interaction [26]. However, since our modeling of these post-translational modifications is based on the very short glycans observed in the template cryo-EM structure, we cannot discard possible interactions between these sugars and the viral RBD and direct the interested reader to ongoing all-atom molecular dynamics simulations of fully-glycosylated SARS-CoV-2 Spike and ACE2 [27,28].
In more detail, mouse, duck, rat, and chicken ACE2 sequences lack the only intermolecular salt-bridge observed between hACE2 and RBD due to mutations on position 30 (D30 in hACE2) (Fig 4, left). These predictions agree with published structures [11,12] and are supported by experimental work showing that mutants lacking a negatively-charged amino acid at this position are largely unable to bind SARS-CoV-2 RBD [14]. Other non-conservative mutations on ACE2 also contribute negatively to the interface scores. Our models suggest that the introduction of a negatively charged residue at position 31 is disruptive to binding, again in agreement with mutagenesis experiments [14], unless compensated by an additional mutation. In both chicken and duck ACE2, the compensatory mutation-E35R -nevertheless locks E31 in an intramolecular salt-bridge and prevents it from interacting with RBD (Fig 4, middle). Then, at the opposite end of the α1 helix, our models identify K353 as a strong contributor to interface stability that is mutated in both rat and mouse ACE2 (Fig 4, right). The long lysine side-chain stabilizes the interface region of ACE2 by forming an intramolecular salt-bridge with D38, but also contributes to hydrogen bonds with viral RBD, with G496 and G502. These results support other modeling work [17] that predicts that RBD mutants G496D bind worse to ACE2 because of the disruption of this intramolecular salt-bridge. Also, as shown by experiments, any mutation in this region decreases the ability of ACE2 to bind RBD [14], confirming our predictions and highlighting a second important interaction site lacked by rodents' ACE2. Our predictions identify other sequence differences between SARS-CoV-2 pos and SARS-CoV-2 neg species that impair intra-and intermolecular polar interactions. Position 83 is mutated from a tyrosine to a phenylalanine in all SARS-CoV-2 neg species except guinea pig, while position 42, a glutamine in all SARS-CoV-2 pos species, is mutated to a glutamate in chicken and duck ACE2. Introducing these mutations on hACE2 leads to impaired binding of RBD [14]. Given the highly polar nature of the interface (S1 and S2 Tables), it is then plausible that the accumulation of several mutations on key polar and charged residues, as observed in SARS--CoV-2 neg species, leads to a drastic reduction in binding affinity between the two proteins and is responsible for reduced susceptibility to infection.

Natural variants of ACE2 encode potential affinity-enhancing mutations for SARS-CoV-2 RBD
One of the many proposed antiviral agents against SARS-CoV-2 is recombinant soluble hACE2 [29], which acts as a decoy for the viral RBD and therefore reduces rates of infection. While deep mutagenesis experiments have been used to optimize protein-protein interfaces for therapeutic purposes [30], it is impractical to carry out an exhaustive search of the entire protein sequence space. Our models suggest several sites and variants that potentially enhance the affinity between hACE2 and RBD (Fig 5). These predictions fall in three broad categories: the stabilization of existing interactions, the introduction of novel interactions, and stabilization of the ACE2 interface region. We note, however, that our coverage of sequence space is limited to naturally occurring variants, and that natural selection imposes additional constraints on sequence variability besides RBD binding.
For the first category, the clearest affinity enhancer seems to be D30E, a variant observed in 6 of the 8 best scoring species (Fig 6, third panel) and shown in experiments to increase binding to RBD [8,14]. The longer side-chain of a glutamate residue can help strengthen and stabilize the intermolecular salt-bridge with RBD K417. The impact of such Asp-to-Glu mutations in modulating protein interactions has been reported previously for other systems [31]. Other mutations predicted to enhance the strength of existing interactions between ACE2 and RBD include Q24E (Fig 6, first panel) and F72Y, both validated by experiments [14]. The introduction of novel interactions is particularly interesting from a protein design perspective. Our models predict that placing a negative charge at position 387 might allow for a second intermolecular salt-bridge to form with RBD R408 (Fig 6, fourth panel). In our hACE2 models, RBD R408 points towards-but does not interact with-the glycan molecule bound to N90. It has been shown that removing this N-glycosylation motif increases RBD binding, while both A387D and A387E lead to mild increases in binding affinity in some cases [14]. As such, we propose that a double N90A/A387E mutant could have a synergistic effect on RBD affinity. Finally, it is known that interactions between rigid binders, with little to none conformational changes upon binding, have the highest affinities [21]. Indeed, this is a ground rule of many successful protein interface designs (e.g. [32]). Our per-residue energy analysis predicts that A25V stabilizes the packing of the α1 and α2 helices, which is an important nexus of RBD interactions (Fig 6, second panel).
Our models also predict that mutating H34Y increases RBD binding, possibly by introducing novel interactions with RBD via the terminal hydroxyl of the tyrosine side-chain. In addition, the large aromatic ring offers a hydrophobic partner for RBD L455. Our predictions for both H34V and H34S indicate that neither of these mutants is energetically favorable, likely because they retain only one of the two types of interactions (aromatic or polar). However, experiments show exactly the opposite behavior: H34S or H34V dramatically increase binding to RBD, while H34Y decreases it [14]. This result highlights the limitations of our models and stresses the need for experimental validation for all our predictions.

Can SARS-CoV-2 variants render non-susceptible species susceptible to infection?
Finally, while the focus of our analysis was the variation of ACE2 and its impact on binding to SARS-CoV-2 RBD, this is only half of this viral-host interaction story. Since others have carried out extensive analyses to assess whether variations in SARS-CoV-2 RBD lead to increased infectivity in humans [33], we asked instead if these variations could introduce compensatory mutations that rendered SARS-CoV-2 neg species, like mice or chicken, susceptible to infection. To this end, we carried out a cursory analysis of more than 100,000 SARS-CoV-2 RBD sequences obtained from infected patients and deposited in the GISAID database [34] (see Material and Methods for details). Out of 17 single-point mutants located at or near the interface and with a frequency over 0.01%, 4 (Y453F, L455F, Q498H, and N501Y) lead to improvements of more than 7% to the HADDOCK score, suggesting enhanced affinity of these RBD variants to mouse ACE2 (S4 Table). Mutants L455F, Q498H, and N501Y mostly improve the desolvation energy of the complex, while Y453F improves packing of hydrophobic residues in the central region of the interface and frees ACE2 Q34 to form a water-mediated hydrogen bond with RBD R403. Only one variant (F486L) leads to a consistently worse score (-12%) because of the loss of hydrophobic interactions with aromatics in ACE2 helices α1 and α2.
We do note, however, that we considered each mutation independently. Since viral strains often display multiple mutations in the spike protein, it is not unreasonable to posit that there could be a combination of such affinity-enhancing mutations that render animal species previously not susceptible to infection more likely to be infected. However, we believe that drawing such conclusions with a moderate degree of confidence requires a more complete analysis that is beyond the scope of this manuscript. Yet, it is an interesting question that we believe could be studied using protocols similar to ours.
In summary, our protocol combines structural, sequence, and binding data to create a structure-based framework to understand SARS-CoV-2 susceptibility across different animal species. Our models help rationalize the impact of naturally-occurring ACE2 mutations on SARS-CoV-2 RBD binding and explain why certain species are not susceptible to infection with the virus. Our predictions complement other computational analyses and can be another source of information when building predictors for susceptibility to infection. In addition, we propose possible affinity-enhancing mutants that can help guide engineering efforts for the development of ACE2-based antiviral therapeutics. Importantly, our protocol is fast and can easily be reproduced using freely-available tools and web servers. As such, it may serve as a blueprint for future modeling studies on protein interactions, including other viral-host systems, where data is available for a large number of homologues.
Finally, to prevent human-to-animal transmission and limit the risk of infecting domestic and wild animals, including possibly endangered species, we recommend following the World Organization for Animal Health guidelines: people infected with COVID-19 should limit contact with their pets, as well as with other animals (including humans).

Sequence alignment of ACE2 Orthologs
Sequences of ACE2 orthologs from 27 species were retrieved from NCBI using the human gene as a reference (Gene ID: 59272, updated on 20-Apr-2020) and the query term "ortholog_-gene_59272[group]". Other species, such as Rhinolophus sinicus, were manually included using custom queries. The sequences were aligned with MAFFT version 7 [35,36], using the alignment method FFT-NS-i (Standard). Some sequences were of lower quality and had undefined amino acids ('X'). When reasonable, i.e. when undefined positions were rare and not located at the interface between the two proteins, we converted these sites to glycine to allow modeling and avoid introducing artificial side-chain interactions. All species and the respective protein identifiers are listed in Table 1.

Definition of sequence similarity
All calculations were based on the alignments from MAFFT, restricted to the region used for modeling (residues 21-600). To calculate sequence similarity, we considered the following groups based on physico-chemical properties: charged-positive (Arg, Lys, His), charged-negative (Asp, Glu), aromatic (Phe, Tyr, Trp), polar (Ser, Thr, Asn, Gln), and apolar (Ala, Val, Ile, Met). Cys, Gly, and Pro residues were considered individual classes.

Modeling of ACE2 orthologs
The modeling of ACE2 orthologs was carried out using MODELLER 9.24 [37] and custom Python scripts (available here: https://github.com/joaorodrigues//ace2-animal-models/). We used the cryo-EM structure of the SARS-CoV-2 RBD bound to human ACE2 (PDB ID: 6M17) [12] as a template for all our subsequent models, including all glycans and the coordinates of RBD. We chose not to model the first 17 residues of ACE since they are predicted as a signal peptide in UniProt (accession Q9BYF1), which is cleaved after membrane insertion. In addition, we refrained from modeling residues 18-20 because these are either missing from publicly available structures or, when present, have poor stereochemistry. Finally, we did not include the transmembrane domains of ACE2 both to save computational resources and because residues 21-600 are known to be sufficient to bind to RBD. To avoid unwanted deviation from the initial cryo-EM structure, we restricted the optimization and refinement of the models to the coordinates of atoms of mutated or inserted residues. We used the fastest library schedule for model optimization and the very_fast schedule for model refinement. For each species, we generated 10 backbone or loop models and selected the one with the lowest normalized DOPE score as a representative. These final models were then processed to remove any sugar molecules in species where the respective asparagine residue had been mutated.

Refinement of ACE2:RBD complexes
The initial complex models were prepared for refinement using the pdb-tools suite [38]. Each chain was separated into a different PDB file (pdb_selchain) and standardized with TER and END statements (pdb_tidy). We used HADDOCK 2.4 [39] to carry out the refinement of the models. The protein molecules were parameterized using the standard force field in HAD-DOCK, while the sugars were parameterized using updated parameters for carbohydrates [40]. We used a modified version of the topology generation scripts to allow automatic detection of N-linked glycans and expand the range of the interface refinement (10 Å distance cutoff). Each initial homology model was refined through 50 independent short molecular dynamics simulations in explicit solvent (solvshell = True). These refined models were then clustered using the FCC algorithm [41] with default parameters and scored using the HADDOCK score, a linear combination of van der Waals, electrostatics, and desolvation. A lower HADDOCK score is better. The top 10 models of the top scoring cluster, ranked by its average HADDOCK score, were selected as representatives of the complex.

Analysis of interface contacts of ACE2:RBD complexes
We used the interfacea analysis library (version 0.1) (http://doi.org/10.5281/zenodo.3516439) to identify intermolecular contacts between hACE2 and RBD, specifically hydrogen bonds, salt bridges, and aromatic ring stacking. Hydrogen bonds were defined between any donor atom (nitrogen, oxygen, or sulfur bound to a hydrogen atom) within 2.5 Å of an acceptor atom (nitrogen, oxygen, or sulfur), if the donor-hydrogen-acceptor angle was between 120 and 180 degrees. Salt bridges were defined between two residues with a pair of cationic/anionic groups within 4 Å of each other. Finally, two aromatic residues were defined as stacking if the centers of mass of the aromatic groups were within 7.5 Å (pi-stacking) or 5 Å (t-stacking) and the angle between the planes of the rings was between 0 and 30 degrees (pi-stacking) or between 60 and 90 degrees (t-stacking). Additionally, for pi-stacking interactions, the projected centers of both rings must fall inside the other ring. For each modelled species, we took the 10 best models of the best cluster, judged by their HADDOCK score, and aggregated all their contacts together. Contacts present in at least 5 models were considered representative.

Per-residue decomposition of HADDOCK scores
We used a custom CNS [42] script to calculate the HADDOCK score of each residue at the interface between ACE2 and RBD. Briefly, the protocol was the following. For each model, since HADDOCK uses a united-atom force field, we first added missing hydrogen atoms and minimized their coordinates, keeping all other atoms fixed. We marked a residue of ACE2 as part of the interface if any of its atoms were within 5 Å of any atom of RBD, and vice-versa. We then calculated the electrostatics, van der Waals, and desolvation energies for each of these residues, considering only atoms belonging to the other protein chain. Note that this protocol does not account for intramolecular effects of mutations. Finally, we calculated the HAD-DOCK score per residue, using the default scoring function weights, and averaged per-residue values for the best 10 models of the best cluster of each species. For the calculation of combined intra-and intermolecular scores (Fig 6), we followed a similar protocol where the distance cutoff to define neighbors was increased to 7.5 Å and atoms from both chains were considered.

Analysis of SARS-CoV-2 RBD sequence variation
We queried the public database GISAID (Global Initiative on Sharing All Influenza Data) [34] on September 21 st , 2020 for protein sequences of the SARS-CoV-2 spike protein. The search returned a total of 104,979 sequences, 93,374 of which had more than 95% sequence coverage. These sequences were then aligned against a reference SARS-CoV-2 spike protein sequence (UniProt accession P0DTC2) using MAFFT v7.471 and the resulting alignment truncated to the region comprising the receptor-binding motif (residues 438-506) of the RBD. We then computed the frequency of mutations at each site in the alignment, ignoring undefined residues (X) and gaps (-). Mutations occurring with a frequency equal or above 0.01%, belonging to residues within 5A of any atom of ACE2 in the cryo-EM structure of the ACE2:RBD complex (PDB ID: 6m17), and leading to significant changes in the character or volume of the amino acid were modelled onto the complex of mouse ACE2 with RBD and refined, using the protocols described above. The scores for these variant models are listed in S4 Table. Supporting information  Table. HADDOCK scores and individual energy terms for each modeled ACE2:RBD complex. The values represent the average and standard deviation of the 10 best models (ranked by HADDOCK score) of each species. (DOCX) S3 Table. HADDOCK scores and individual energy terms for mutated human ACE2:RBD and mouse ACE2:RBD complexes. The values represent the average and standard deviation of the 10 best models (ranked by HADDOCK score) of each model. (DOCX) S4 Table. HADDOCK scores and individual energy terms for mouse ACE2 bound to select frequent single-point mutants of RBD (in >0.01% of sequences) derived from an analysis of 104,979 SARS-CoV-2 genomes. The values represent the average and standard deviation of the 10 best models (ranked by HADDOCK score) of each model. (DOCX)