Expanded ACE2 dependencies of diverse SARS-like coronavirus receptor binding domains

Viral spillover from animal reservoirs can trigger public health crises and cripple the world economy. Knowing which viruses are primed for zoonotic transmission can focus surveillance efforts and mitigation strategies for future pandemics. Successful engagement of receptor protein orthologs is necessary during cross-species transmission. The clade 1 sarbecoviruses including Severe Acute Respiratory Syndrome-related Coronavirus (SARS-CoV) and SARS-CoV-2 enter cells via engagement of angiotensin converting enzyme-2 (ACE2), while the receptor for clade 2 and clade 3 remains largely uncharacterized. We developed a mixed cell pseudotyped virus infection assay to determine whether various clades 2 and 3 sarbecovirus spike proteins can enter HEK 293T cells expressing human or Rhinolophus horseshoe bat ACE2 proteins. The receptor binding domains from BtKY72 and Khosta-2 used human ACE2 for entry, while BtKY72 and Khosta-1 exhibited widespread use of diverse rhinolophid ACE2s. A lysine at ACE2 position 31 appeared to be a major determinant of the inability of these RBDs to use a certain ACE2 sequence. The ACE2 protein from Rhinolophus alcyone engaged all known clade 3 and clade 1 receptor binding domains. We observed little use of Rhinolophus ACE2 orthologs by the clade 2 viruses, supporting the likely use of a separate, unknown receptor. Our results suggest that clade 3 sarbecoviruses from Africa and Europe use Rhinolophus ACE2 for entry, and their spike proteins appear primed to contribute to zoonosis under the right conditions.


Introduction
As shown by the ongoing Severe Acute Respiratory Syndrome-related Coronavirus 2 (SARS-CoV-2) pandemic, viral spillover from animal reservoirs can decimate public health systems and the global economy. The likelihoods of zoonotic spillovers are multifactorial, including both ecological and molecular factors. Human disruptions to world ecosystems are increasing the likelihood of future zoonotic events [1]. We still lack a clear understanding of the molecular factors that play key roles during zoonosis.
Molecular compatibility during viral entry is a key determinant of viral tropism and host switching [2][3][4][5][6]. The Betacoronavirus genus includes known zoonotic viruses of pandemic potential, including Middle East Respiratory Syndrome-related Coronavirus (MERS-CoV), SARS-CoV, and SARS-CoV-2. These viruses use the spike glycoprotein to catalyze entry into target cells upon binding to a compatible host cell receptor. Unlike MERS-CoV, which uses Dipeptidyl-peptidase 4 (DPP4) as the cell surface receptor [7], the lineage B viruses of the sarbecovirus subgenus SARS-CoV and SARS-CoV-2 utilize angiotensin converting enzyme-2 (ACE2) as the host cell entry receptor [8,9]. ACE2 binding by SARS-like CoVs is dictated by an independently folded domain of up to 223 residues in length, referred to as the receptor binding domain (RBD).
Multiple viral clades exist within the sarbecovirus subgenus, and the cell surface receptor dependencies of each clade are not well established [10,11]. Clade 1 sarbecoviruses including SARS-CoV and SARS-CoV-2 are known to utilize ACE2, while the receptors for clade 2 and clade 3 viruses are unknown [10,11]. The lack of observed ACE2-dependent enhancement to infection by clade 2 and clade 3 sarbecovirus spike proteins, such as YN2013 or BM48-31, can be explained in 3 ways: (1) these RBDs have weak but functionally relevant affinity for ACE2, below the limit of detection of commonly used assays; (2) these RBDs have affinity for certain orthologs of ACE2, but little or no affinity for human ACE2 or for any orthologs that have been tested so far; or (3) these RBDs primarily utilize an entry mechanism distinct from ACE2.
Here, we characterize the extent of ACE2 dependence across sarbecovirus clades. We utilized a single-copy HEK 293T genome modification platform to strongly overexpress multiple cell surface proteins proposed to serve as receptors for SARS-CoV-2, alongside the well-established receptor, ACE2 [12]. As the clade 2 and clade 3 sarbecoviruses were observed in samples collected from various Rhinolophus bats, we synthesized and expressed ACE2 orthologs from Rhinolophus ferrumequinum, Rhinolophus affinis, Rhinolophus alcyone, Rhinolophus landeri, Rhinolophus pearsonii, and various ACE2 alleles observed in Rhinolophus sinicus. We observed differing patterns of ACE2 ortholog usage by various clade 3 sarbecovirus RBDs during cell entry, including human ACE2-dependent entry by the BtKY72 and Khosta-2 RBDs. We observed little to no ACE2-dependent infection with RBDs from clade 2 sarbecoviruses, including with various ACE2 alleles from R. sinicus and R. pearsonii, the species from which these viruses were isolated. Thus, our study provides a new genetic approach for characterizing receptor utilization during viral entry and demonstrates that clade 3 sarbecoviruses likely utilize ACE2 as a cell entry receptor during infection.

Developing a robust genetic assay for viral entry
Knowing that sarbecovirus spike proteins may exhibit weak affinity for ACE2 proteins from mismatched hosts, we designed an assay for measuring biochemically weak but functionally important interactions promoting viral entry. We previously developed a Bxb1 recombinase-based transgenic expression system, wherein human ACE2 or its coding variants could be stably and precisely expressed by a Tet-inducible promoter already engineered into the cell genome, upon integration of a single promoterless plasmid [12] (Fig 1A). We found that the human ACE2 cDNA, when encoded behind a consensus Kozak sequence permitting frequent ribosomal translation of the mRNA, yielded high ACE2 cell surface abundance, roughly 10-fold greater than ACE2 protein observed in Vero-E6 cells, commonly used to propagate SARS-CoV or SARS-CoV-2 in cell culture [12].
We previously discovered that our pseudovirus infection system was more sensitive than traditional in vitro binding assays utilizing soluble proteins. For example, expression of ACE2 mutants K31D or K353D, which reduced binding to soluble monomeric SARS-CoV RBD in vitro [13], had little to no effect on SARS-CoV spike pseudovirus infection unless ACE2 translation was reduced 30-fold, suggesting that avidity effects conferred by high cell-surface ACE2 abundances can compensate for reductions to binding affinity. Thus, we focused on further developing a flexible pseudovirus infection assay capable of detecting weak but specific protein interactions enabling infection.
To increase throughput, we converted the traditional singleplex pseudovirus infection format into a duplex assay configuration. Traditionally, control and experimental cells are plated separately into different wells (Fig 1B, left). All wells are then exposed to the same volume of viral inoculum, and infectivity is quantitated by taking a ratio of the amount of infection present in the experimental wells divided by the amount of infection present in the control wells. While ensemble measurements such as luciferase activity require a traditional singleplex format, fluorescent reporters for infection, such as GFP positivity, are single-cell assays and easier to multiplex. Thus, we developed an approach wherein the experimental and control cells are marked by different fluorescent proteins, allowing the 2 cell types to be mixed together and infected by the same inoculum of GFP-reporter pseudovirus within the same well (Fig 1B, right). Instead of calculating the ratio of GFP positivity from 2 different wells, we take the ratio of GFP positivity in putative receptor-overexpressing iRFP670-positive cells to that in mCherry-positive control cells, all from a single well. When testing 2 nearly isogenic cell lines differing solely by their expression of a putative receptor transgene, this ratio quantifies the amount of receptor-dependent enhancement to infection that has occurred.
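The single-well ratio described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used in the study: the `WellCounts` layout, the function name, and the event counts are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class WellCounts:
    """Flow cytometry event counts from one mixed-cell well (hypothetical layout)."""
    control_total: int   # mCherry-positive control cells
    control_gfp: int     # mCherry-positive cells that are also GFP-positive
    receptor_total: int  # iRFP670-positive receptor-expressing cells
    receptor_gfp: int    # iRFP670-positive cells that are also GFP-positive


def receptor_dependence_ratio(well: WellCounts) -> float:
    """Return (GFP+ fraction in receptor cells) / (GFP+ fraction in control cells)."""
    if well.control_total <= 0 or well.receptor_total <= 0:
        raise ValueError("both cell populations must be present in the well")
    if well.control_gfp <= 0:
        raise ValueError("no infected control cells were sampled; ratio is undefined")
    # Cross-multiplied form of (receptor_gfp / receptor_total) / (control_gfp / control_total)
    return (well.receptor_gfp * well.control_total) / (well.receptor_total * well.control_gfp)


# Made-up counts: 0.1% of control cells and 5% of receptor cells are GFP-positive.
example_well = WellCounts(control_total=90_000, control_gfp=90,
                          receptor_total=10_000, receptor_gfp=500)
example_ratio = receptor_dependence_ratio(example_well)
print(f"receptor-dependent enhancement: {example_ratio:.1f}-fold")
```

A ratio near 1 would indicate no receptor dependence, as expected for a receptor-independent glycoprotein. The explicit error for zero infected control cells reflects that the background infection rate must be sampled for the ratio to be meaningful.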
To validate this approach, we created ACE2(dEcto)-negative control HEK 293T cells encoding human ACE2 lacking its entire ectodomain, and thus incapable of serving as a cell surface receptor for SARS-CoV or SARS-CoV-2 spike. We marked these cells with red nuclei using mCherry-fused histone H2A (Fig 1B). Notably, HEK 293T cells naturally express a low but detectable amount of endogenous ACE2 from the X chromosome [12], thus accounting for the low, background level of infection in the assay. We next created ACE2 HEK 293T cells encoding full-length human ACE2 and marked these cells with near-infrared fluorescent nuclei using iRFP670-fused histone H2A. These cells exhibit more than 100-fold more ACE2 protein than unmodified HEK 293T cells [12]. The 2 cell lines were mixed into the same well and exposed to GFP-encoding lentiviral particles coated with the ACE2-dependent envelope glycoprotein SARS-CoV spike (Fig 1C, left), or an ACE2-independent envelope glycoprotein such as vesicular stomatitis virus glycoprotein (VSV-G; Fig 1C, right), which uses LDLR as the viral entry receptor [14]. After 2 or more days, the entire well of cells can be analyzed with multicolor flow cytometry to simultaneously measure the infection rates in ACE2-expressing or control cells. We observed that ACE2-dependent viruses, such as those with SARS-CoV spike, exhibited preferential infection of the ACE2-expressing iRFP670-fluorescent cells, whereas pseudoviruses coated with VSV-G infected the mCherry and iRFP670 expressing cells equally (Fig 1D). We will hereafter refer to this as the duplex infection assay.
We next performed a systematic analysis of how the duplex infection assay performed when the 2 cells were mixed at different ratios and compared these results with data obtained using the traditional singleplex assay format. We observed the greatest ACE2-dependent infection when the ACE2-expressing cells were a tenth of the total cells in the well (Fig 1E), with the coefficient of variation similar to the traditional singleplex assay format. As the proportion of ACE2-expressing cells increased, the amount of ACE2-dependent SARS-CoV-2 spike-mediated infection decreased from approximately 52-fold at 10% ACE2-expressing cells to approximately 16-fold at 40% ACE2-expressing cells. There was a concomitant increase in the coefficient of variation, suggesting a loss of data precision, at least partially due to insufficient sampling of the background level of infection in the control cells. While we generally tried to keep the ACE2-expressing cells a minor fraction of the mixed cultures, some experiments were performed before we had characterized this phenomenon. We observed that the SARS-CoV [...] related to SARS-CoV capable of using human ACE2 for entry [25]. Similar surveillance efforts uncovered hundreds of related coronaviruses in bats, but the receptor usage of many of these viruses is unknown. Due to the increased sensitivity possible with pseudovirus assays [12], we focused our remaining studies toward assessing the ACE2 dependencies of diverse, uncharacterized SARS-like CoV spike proteins.
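As a rough illustration of the precision comparison above, the coefficient of variation (sample standard deviation divided by the mean) can be computed over replicate fold-enhancement values for each mixing ratio. The function name and the replicate values below are made up for the example and are not data from this study.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean of replicate measurements."""
    if len(values) < 2:
        raise ValueError("at least 2 replicates are needed")
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    return statistics.stdev(values) / mean


# Hypothetical replicate fold-enhancement values at two mixing ratios.
tight_replicates = [8.0, 10.0, 12.0]
noisy_replicates = [4.0, 10.0, 16.0]
cv_tight = coefficient_of_variation(tight_replicates)
cv_noisy = coefficient_of_variation(noisy_replicates)
print(f"CV (tight) = {cv_tight:.2f}, CV (noisy) = {cv_noisy:.2f}")
```

Because the statistic is normalized by the mean, it allows precision to be compared between conditions whose fold-enhancement values differ in magnitude.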
To further establish the specificity of the approach, we subjected the mixture of cells expressing either the human ACE2 cDNA or the ectodomain-deleted control ACE2 construct to a panel of viruses pseudotyped with a wide range of viral entry glycoproteins. We found that the glycoproteins for Ebolavirus, Marburgvirus, Lassa fever virus, lymphocytic choriomeningitis virus (LCMV), Junin virus, and MERS-CoV infected both ACE2 expressing and ACE2 null cells similarly (Fig 3A), consistent with the fact that none of these glycoproteins rely on ACE2 for infection [14,26-29]. In contrast, the spike protein from WIV1 exhibited clear human ACE2 dependence comparable to SARS-CoV and SARS-CoV-2 spike (Fig 3A). Thus, the ACE2 duplex pseudovirus infection assay can be used to query the dependencies of a wide range of viral glycoproteins with high specificity.
We next turned our attention to characterizing the RBDs from the spike proteins of novel sarbecoviruses observed in bats. The RBDs from the spike proteins of sequenced sarbecoviruses thus far divide into 3 major clades [10,11], although more are likely to be found and divided into additional clades [30]. All RBDs from clade 1 viruses tested so far have used ACE2 for entry, oftentimes exhibiting clear binding and utilization of human ACE2 [10], while the receptor dependencies of clade 2 and clade 3 viruses are unknown [10,11]. We compiled a list of clade 2 and clade 3 RBDs following manual curation of sarbecovirus spike proteins currently listed in the National Center for Biotechnology Information (NCBI) (S1 Table). Clade 3 RBDs were all roughly 219 to 222 residues in length and thus similar in length to clade 1 RBDs, which were between 222 and 223 residues. In contrast, clade 2 RBDs possess an internal deletion, resulting in RBDs of 204 or 205 residues in length. To contextualize the protein sequence differences between diverse sarbecovirus RBDs, we created a matrix of pairwise Hamming distances of the differences in amino acid sequence for each RBD, ordered by hierarchical clustering (Figs 3B and S2). The resulting clustering recreated the 3 established clades [10] and largely corresponded to evolutionary phylogenies [11].
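The pairwise comparison described above can be sketched as follows. This is a minimal illustration rather than the study's own code: the sequence strings are short made-up placeholders, and the sketch assumes the RBDs have already been aligned to a common length, with gap characters counted as differences. The hierarchical clustering used to order the matrix is omitted.

```python
from itertools import combinations
from typing import Dict, Tuple


def hamming_distance(seq_a: str, seq_b: str) -> int:
    """Count positions at which two equal-length aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def pairwise_distances(aligned: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Return the Hamming distance for every unordered pair of named sequences."""
    return {
        (name_a, name_b): hamming_distance(aligned[name_a], aligned[name_b])
        for name_a, name_b in combinations(aligned, 2)
    }


# Toy aligned fragments (NOT real RBD sequences); '-' marks an alignment gap.
toy_alignment = {
    "rbd_A": "NITNLCPF",
    "rbd_B": "NITNRCPF",
    "rbd_C": "NIT--CPY",
}
toy_distances = pairwise_distances(toy_alignment)
for pair, distance in toy_distances.items():
    print(pair, distance)
```

Counting gaps as mismatches means an internal deletion, like the one in clade 2 RBDs, adds to the distance from every sequence that retains those residues. Whether the published matrix treats gaps this way is an assumption of this sketch.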
Clade 1 and clade 3 virus RBDs exhibited similarities not shared with clade 2 RBDs. Clade 1 and clade 3 RBD sequences were more similar to each other than to clade 2 RBDs, while clade 2 RBDs were equally distant from both of the other clades (Fig 3B and 3C). Clade 1 RBDs also exhibited high intraclade variability, such as between the SARS-CoV and SARS-CoV-2 RBDs, which exhibit 60 amino acid differences (Fig 3B). Similar to clade 1 RBDs, the clade 3 RBDs exhibited high intraclade variability, with many pairwise combinations differing by 40 or more amino acids (Fig 3B and 3C). In contrast, the RBDs from clade 2 viruses yielded comparatively low intraclade variability, with the RBDs exhibiting 28 amino acid differences or fewer (Figs 3B and S2). In a previous study, none of the 21 clade 2 RBDs tested used human ACE2 [10]. Only 3 clade 3 RBDs (BM48-31, PRD-0038, and PDF-2386) were tested, with none exhibiting increased infection with human ACE2 [10,11]. In contrast, RaTG15, which possesses an RBD distinct from the other known sarbecoviruses and thus may constitute a separate clade [30], and the clade 1 viruses, have all been shown to use ACE2, either from humans or from other animals.
Fig 3. (A) Normalized human ACE2-dependent infection data from 5 or more replicate pseudotyped virus infections with diverse viral entry glycoproteins. (B) Hamming distance matrix between the pairs of RBD amino acid sequences. Numbers denote nonidentical amino acids between each pair. RBD lengths, virus names, clade groupings, and bat host species identified upon isolation are shown at the bottom. Names in brown were previously shown to bind human ACE2. The caret symbol denotes the closest bat species inferred through ACE2 protein sequencing from the sample. (C) Nonidentical residues between RBDs grouped by clade. (D) Models of RBD tertiary structure, with the RBD colored pink and ACE2 colored in cyan. SARS-CoV RBD sequence was used to make the model for clade 1
The dissimilarity of the clade 2 RBD sequences relative to the other clades likely results in an altered tertiary structure in the RBD surface typically known to bind ACE2. Homology modeling of the 3D structure of the YN2013 clade 2 RBD showed the approximately 15-residue deletion to cause a shortening of the receptor binding motif in a section often referred to as the "receptor binding ridge" [31] (Fig 3D), a disulfide-linked loop that makes contact with the N-terminal alpha helix in the ACE2 protein ectodomain (Fig 3F). This deletion removes the disulfide bond, as none of the clade 2 RBDs encode cysteines in that region (RBD residues 158 through 171), while all clade 1 or clade 3 viruses do (Fig 3E). Accordingly, there are only 7 cysteines encoded in clade 2 RBDs, while the clade 1 and clade 3 viruses have 9 (Fig 3B). In contrast, homology models of the BtKY72 clade 3 and SARS-CoV clade 1 RBDs showed an extended receptor binding ridge similar to the experimentally determined SARS-CoV-2 ACE2 cryo-EM costructure [31] (Fig 3D).
Coronavirus spike proteins are routinely greater than 1,250 amino acids in length, and chemically synthesizing each cDNA for functional analysis is prohibitively expensive. We instead took a chimeric spike approach, where we chemically synthesized the RBDs from various bat coronaviruses, and inserted the sequence in place of the RBD of SARS-CoV spike (Fig 3G) [10,11]. We tested a panel of 3 clade 2 RBDs: Rs4237 [32], Rp3 [33], and YN2013 [34], each found in horseshoe bats in Eastern or Southeastern Asia. We also initially tested a small panel of clade 3 RBDs: BM48-31 [35], BB9904, and BtKY72 [36], which were the only three clade 3 RBDs identified as of June 2020. BM48-31 and BB9904 were identified from Rhinolophus blasii and Rhinolophus euryale bat samples collected in Bulgaria, while BtKY72 was collected from an unidentified Rhinolophus bat in Kenya. As positive controls, we generated chimeric SARS-CoV spikes encoding the WIV1 or SARS-CoV-2 RBDs. The WIV1 and SARS-CoV-2 RBDs promoted strong ACE2-dependent pseudovirus entry, corresponding to 81-fold and 63-fold, respectively (Fig 3H, left). Of the clade 2 and clade 3 RBDs, only BtKY72 exhibited significant human ACE2-dependent entry, corresponding to an approximately 5-fold increase.
To validate our result with BtKY72, we explored this interaction in additional contexts. We first assessed whether we could enhance ACE2-dependent entry by coexpressing the cell surface protease TMPRSS2 (S3 Fig), which promotes spike-mediated viral entry at the cell surface [37][38][39], bypassing the need for endocytosis and proteolysis by endosomal cathepsins for activation [40]. The BtKY72 chimeric virus exhibited slightly increased infection in the presence of TMPRSS2 (S3C Fig). Furthermore, upon alignment of the BtKY72 RBD with the SARS-CoV, WIV1, and SARS-CoV-2 RBDs, we found a number of highly conserved positions at the purported ACE2 interface, including BtKY72 residues Y488 and T499 (corresponding to Y489 and T500 in SARS-CoV-2) (Fig 3F). To test whether these residues were involved in BtKY72 RBD interaction with human ACE2, we created Y488H or T499E mutant BtKY72 RBD chimeric spike pseudoviruses. Both single amino acid mutations abrogated ACE2-dependent entry (Fig 3H, right).

Multiple clade 3 sarbecoviruses use human and rhinolophid ACE2
Having observed human ACE2 utilization by the RBD from BtKY72, we undertook a more comprehensive experiment testing combinations of diverse RBDs with various horseshoe bat or human ACE2 proteins. During our previous experiments, 6 more clade 3 RBDs were identified. Three were identified through the USAID-PREDICT project, corresponding to PRD-0038, PDF-2370, and PDF-2386. Like BtKY72, these sequences were discovered in African bats of unknown species within the Rhinolophus genus [11]. The RBDs from PDF-2370 and PDF-2386 were identical, and all 3 RBDs clustered closely with BtKY72 (Fig 3B), so we did not initially incorporate them into our study. In contrast, 3 highly distinct RBDs were also discovered through ecological sequencing efforts: Khosta-1 and Khosta-2, observed in R. ferrumequinum and Rhinolophus hipposideros bats in Sochi, Russia [41], and RhGB01, observed in an R. hipposideros bat in the United Kingdom [42]. We thus created SARS-CoV chimeric spikes encoding these RBDs.
Based on our aforementioned result with BtKY72, we suspected that more clade 3 sarbecovirus RBDs are ACE2 dependent, but may only be compatible with ACE2 sequences encoded by their natural hosts, prompting us to synthesize additional ACE2 orthologs from Rhinolophus bats. ACE2 is known to be positively selected in bats, particularly at residues at the interface with SARS-CoV spike [43]. ACE2 is also highly polymorphic in R. sinicus bats [43,44]. Due to these variations, different horseshoe bat ACE2 sequences likely exhibit a range of compatibility with diverse RBD sequences. There are more than 106 Rhinolophus species known [45]. The majority of the variation between rhinolophid ACE2 sequences is found in the ACE2 ectodomain, including positions 24, 27, 31, and 34, which exhibited 4 or more different amino acids at each site (Fig 4A). These highly variable positions are along the face of an alpha helix in contact with SARS-like CoV RBDs, including their receptor binding ridge (Fig 4B), and were previously shown to be positively selected [43].
To quantitate the differences between the ACE2 sequences, we created another Hamming distance matrix comparing each pair of rhinolophid ACE2 protein sequences (S4 Fig). We then chose 6 orthologs, including 3 distantly related alleles from R. sinicus, for experimental characterization. We synthesized codon-optimized cDNA sequences encoding the first 600 ACE2 residues, corresponding to the majority of the ectodomain of each Rhinolophus ACE2, as this region should contain all of the sequence impacting RBD binding and viral entry. To minimize effort and cost, we ligated these sequences to a DNA sequence encoding the last 200 residues of R. ferrumequinum ACE2, including its transmembrane and cytoplasmic regions (Fig 4A). A distance matrix for our tested constructs showed that the rhinolophid ACE2 sequences differed by a minimum of 7 residues and a maximum of 56 (Fig 4C). Each chimeric ACE2 allele was stably recombined into HEK 293T landing pad cells, and expression was confirmed by immunoblot against the R. ferrumequinum cytoplasmic domain, shared by all of our chimeric rhinolophid ACE2 constructs (Fig 4D). All constructs yielded ACE2 proteins at roughly the expected size, with slight differences in electrophoretic mobility among samples, potentially due to differing numbers of N-glycosylation motifs. Unlike the rest of the samples, which yielded a doublet, the R. sinicus (472) allele yielded a single band in some immunoblot experiments (Fig 4D), although it also yielded a doublet in other replicate immunoblots. Imaging flow cytometry confirmed that much of the ACE2 protein trafficked to the cell surface (Fig 4E and 4F).
We performed the duplex pseudotyped virus infection assay as a matrix of combinations of chimeric RBD pseudotyped viruses and cells expressing human or Rhinolophus ACE2 proteins. Every chimeric RBD spike was expressed in producer cells and incorporated into pseudotyped viral particles (S5 Fig). The fractions of control mCherry+ or ACE2-expressing miRFP670+ cells were quantitated using flow cytometry and averaged across the replicates (Fig 5A). In the process of performing these experiments, we observed massive syncytia formation in a subset of samples (S6 Fig), including some syncytia that reached half a millimeter in width (Fig 5B). SARS-CoV-2 spike pseudotyped viral particles can induce cell-cell fusion [46], likely enhanced by the high amounts of ACE2 and TMPRSS2 coexpressed in our engineered cells. Since large syncytia are unlikely to be efficiently measured alongside normal sized cells in the flow cytometer, we also quantified the frequency of green fluorescence in mCherry+ or miRFP670+ cells using fluorescence microscopy (S6 Fig). Despite the reduced dynamic range of the microscopy assay, the results between the 2 measurements were largely consistent (Figs 5C and S7 and S8), providing additional confidence in the results. Regardless of the readout, the controls performed as expected, wherein VSV-G-coated pseudoviruses did not [...]
Fig 5. (D) Comparisons of the ACE2 dependencies of pseudotyped viruses harboring RBD-chimeric or full-length spikes for BtKY72. VSV-G is shown as a comparison, with the gray line denoting an ACE2 dependence ratio of 1. The horizontal dash symbol is the geometric mean of the replicate infection assay results, shown as semitransparent points. All constructs encode human TMPRSS2 C-terminally linked to ACE2 with a 2A translational stop-start sequence. The underlying data can be found in S2 Data, and the source code can be found at https://github.com/MatreyekLab/ACE2_dependence.
ACE2, angiotensin converting enzyme-2; GFP, green fluorescent protein; RBD, receptor binding domain; SARS-CoV, Severe Acute Respiratory Syndrome-related Coronavirus; SARS-CoV-2, Severe Acute Respiratory Syndrome-related Coronavirus 2; VSV, vesicular stomatitis virus; VSV-G, vesicular stomatitis virus glycoprotein.
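The figure legend above summarizes replicate ACE2 dependence ratios by their geometric mean. A minimal sketch of that summary is shown below, using made-up replicate values. The geometric mean suits fold-change data because it averages on a log scale, so a 2-fold increase and a 2-fold decrease cancel to a ratio of 1.

```python
import math


def geometric_mean(ratios):
    """Geometric mean of positive ratios, computed as exp(mean(log(ratio)))."""
    if not ratios:
        raise ValueError("at least 1 ratio is required")
    if any(r <= 0 for r in ratios):
        raise ValueError("ratios must be strictly positive")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical replicate ACE2 dependence ratios for one RBD and ACE2 pairing.
replicate_ratios = [10.0, 40.0, 160.0]
summary_ratio = geometric_mean(replicate_ratios)
print(f"geometric mean: {summary_ratio:.1f}-fold")
```

For comparison, the arithmetic mean of the same 3 values is 70, which is pulled upward by the largest replicate.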
The clade 1 RBDs showed broad utilization of ACE2 alleles (Fig 5A and 5C), typified by WIV1, which was clearly enhanced by all but 2 orthologs tested. This included strong enhancement by R. sinicus (215), consistent with the fact that WIV1 was isolated from R. sinicus bats of currently unknown allelic genotype. Also in agreement with previous reports, neither SARS-CoV nor SARS-CoV-2 RBDs utilized ACE2 from R. ferrumequinum or R. pearsonii [47], while both utilized ACE2 from R. affinis [48,49] and R. alcyone [9]. SARS-CoV RBD exhibited compatibility with ACE2 from R. sinicus, much like WIV1, consistent with this likely being similar to its initial source of zoonosis. In contrast, SARS-CoV-2 RBD exhibited low compatibility with the tested R. sinicus alleles but did exhibit strong compatibility with ACE2 from R. affinis. This is consistent with RaTG13, isolated from R. affinis bats, possessing one of the most similar RBD sequences to SARS-CoV-2 to date [50,51].
In contrast, clade 2 RBDs did not exhibit obvious signals of ACE2-dependent entry (Fig 5A). Only the Rs4237 RBD paired with R. landeri ACE2 yielded signal in both the flow cytometry and microscopy readouts (Fig 5A), although this effect was relatively minor. We also did not observe consistent signals with alleles belonging to R. sinicus, the bat species from which the viruses were originally isolated (Fig 5A, black boxes). As a whole, pseudoviruses generated with clade 2 RBD chimeric spike proteins exhibited an intermediate level of ACE2 dependence between the other sarbecovirus RBDs and VSV-G (Fig 5C, top), suggesting that there may still be slight ACE2 binding and utilization over background. Regardless, the lack of strong compatibility between the tested clade 2 RBDs and any of the Rhinolophus ACE2 alleles we tested suggests that these viruses use a different cell surface protein as a primary receptor and, at best, only use ACE2 in an auxiliary role.
Clade 3 RBDs exhibited several distinct patterns of ACE2 ortholog usage. Along with BtKY72, the Khosta-2 RBD exhibited enhanced entry in the presence of human ACE2. These 2 RBDs differed by 46 amino acids (Fig 3B), corresponding to approximately 80% amino acid identity, suggesting that they may be using a different set of amino acid side chain interactions to engage the ACE2 protein surface. Accordingly, their usage patterns of the tested Rhinolophus alleles were vastly different. Khosta-2 could only utilize ACE2 from R. alcyone, while BtKY72 exhibited the broadest utilization of ACE2 alleles in the panel. The BtKY72 pattern was largely similar to that of the clade 1 viruses, as BtKY72 exhibited the weakest entry with R. sinicus (472) and R. pearsonii ACE2s. However, BtKY72 could utilize the ACE2 from R. ferrumequinum while the clade 1 RBDs could not. Khosta-1 exhibited compatibility with R. ferrumequinum ACE2, consistent with its identification in R. ferrumequinum bats [41]. The remaining clade 3 RBDs showed limited ACE2 compatibility aside from the sequence from R. alcyone, which permitted infection by all clade 3 virus RBDs tested.
Our observation of human ACE2 dependence with the BtKY72 and Khosta-2 RBD chimeric spikes should be reflective of human ACE2 compatibility with the full-length spikes, but we sought to formally test this. We first attempted to recreate the full BtKY72 spike by piecewise addition of further BtKY72 spike sequence into the BtKY72 RBD chimeric spike. Unfortunately, we were unable to observe ACE2-dependent entry with any of the additional sequence swap points (S9 Fig), suggesting that the modularity of the RBD itself was critical for creating functional chimeric spike proteins. We thus synthesized and tested the full-length BtKY72 spike protein, where we observed both human and R. alcyone ACE2 dependence (Fig 5D). These data support the interpretation that the RBD compatibilities observed with the chimeric spikes are reflective of compatibilities with the corresponding full-length spikes, although the exact magnitude may slightly differ based on other factors like proteolytic processing requirements.

Interpretations of clade 3 sarbecovirus host range and tropism
Despite the various clade 3 sarbecovirus spike RBD and Rhinolophus ACE2 ortholog compatibilities observed in our pseudovirus infection assay, an important practical consideration is whether the virus is likely to encounter the corresponding bat in real life. For example, RhGB01, BB9904, and Khosta-2 were isolated from R. hipposideros or R. euryale bats in Europe (Fig 3B, bottom). Neither of these bats has been observed in sub-Saharan Africa, and they are thus unlikely to allow transmission of these viruses to R. alcyone bats despite their potential RBD-ACE2 molecular compatibility. In contrast, BM48-31 was found in R. blasii bats, which have been observed in Eastern and Southern Africa, and thus in closer proximity to the known range of R. alcyone bats (Fig 6A, top). Thus, while all 4 RBDs can use the ACE2 from R. alcyone, only BM48-31 was found within a host bat of potentially overlapping geographical range (Fig 6A, bottom) and is therefore more likely to infect those bats in real life.
Unlike the viruses isolated in Europe, BtKY72 and the related PDF-2370, PDF-2386, and PRD-0038 viruses were observed in Rhinolophus bats in Central and East Africa (Fig 6A). The exact species was not determined upon collection, although sequencing of the ACE2 gene from the PDF-2370 sample revealed that the host bat was likely R. ferrumequinum or a highly related species, only differing by 4 to 10 amino acids depending on the R. ferrumequinum ACE2 allele (S4 Fig). R. ferrumequinum bats are not thought to populate Africa, although the host bat ACE2 sequenced from the PDF-2370 sample would suggest that R. ferrumequinum, or a highly related bat, is present in Central Africa. The next most similar of the currently sequenced ACE2 orthologs belonged to R. alcyone and R. landeri, differing by 25 and 28 residues, respectively (S4 Fig). ACE2 sequences from R. ferrumequinum, R. alcyone, and R. landeri were all permissive for entry by BtKY72 RBD in our pseudovirus infection assay (Fig 5). Additionally, PRD-0038 infected human and R. alcyone ACE2-expressing cells to a degree similar to that of BtKY72 (Fig 6B). The sites of sampling for all 4 viruses overlapped with the known range of R. landeri bats, and the more western sites where PDF-2370, PDF-2386, and PRD-0038 were sampled were at the edge of the known range of R. alcyone bats (Fig 6A, bottom). Thus, the BtKY72-related viruses are likely able to enter cells of various bat species, including R. ferrumequinum bats, around the sites where they were first identified.
We also looked for ACE2 sequence patterns consistent with a history of sarbecovirus infection in these African horseshoe bats. The ACE2 proteins from R. alcyone and R. landeri are similar, differing only in 10 of the 805 total residues. Three of these differences are in the transmembrane domain and cytoplasmic tail, so our tested constructs only differed at 7 positions. Four of these differences were at positions 27, 31, 35, and 41, all on the same surface of the main ACE2 alpha-helix that contacts the CoV RBDs (Fig 6C). While BtKY72 and Khosta-1 can use ACE2 proteins from both R. alcyone and R. landeri, Khosta-2, BM48-31, and RhGB01 are only capable of using the sequence from R. alcyone (Fig 5A), suggesting that these coding differences are functionally important for restricting entry for a subset of clade 3 sarbecoviruses. No sarbecoviruses have been identified in R. alcyone or R. landeri bats so far, but these genomic signatures suggest that these bats have been under evolutionary pressure by ACE2-utilizing sarbecoviruses in Africa.
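Residue-level differences between 2 aligned orthologs, such as those described above, can be listed with a short helper. The sketch below is illustrative only: both strings are made-up placeholders rather than the R. alcyone or R. landeri sequences, and the `start` parameter simply assigns a residue number to the first character of the fragment.

```python
def differing_positions(seq_a, seq_b, start=1):
    """List (residue_number, residue_in_a, residue_in_b) where aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (number, a, b)
        for number, (a, b) in enumerate(zip(seq_a, seq_b), start=start)
        if a != b
    ]


# Hypothetical 18-residue fragments, numbered as if they spanned positions 24 to 41.
toy_ortholog_a = "QAKTFLDKFNHEAEDLFY"
toy_ortholog_b = "QAKIFLDDFNHKAEDLFH"
toy_differences = differing_positions(toy_ortholog_a, toy_ortholog_b, start=24)
for number, res_a, res_b in toy_differences:
    print(f"position {number}: {res_a} vs {res_b}")
```

With real sequences, the resulting list could then be compared against the structural interface to see which differences fall on the RBD-contacting helix.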

ACE2 lysine 31 impacts a subset of clade 3 RBD infections
Better understanding of the ways in which diverse sarbecoviruses have achieved compatibility with human ACE2 may help identify molecular barriers initially posed by ACE2 sequence differences that can be circumvented through RBD sequence adaptations from prior infection in various hosts. Within our set, the 4 RBDs capable of binding human ACE2 exhibited different patterns in Rhinolophus ACE2 ortholog usage. While they all recognize the same overall binding site on ACE2, they are likely presenting unique interfaces for the interaction and thus relying on a different set of side chains within the same human ACE2 protein sequence. Our panel of Rhinolophus ACE2 orthologs differs at many residues, so ortholog comparisons alone cannot delineate which specific amino acids play the most important roles in each interaction.
To gain amino acid-level resolution into this interaction, we tested a panel of human ACE2 mutants. We originally tested constructs that yield low amounts of overall ACE2 protein, as we previously saw that high ACE2 levels can mask the impacts of human ACE2 mutants during SARS-CoV or SARS-CoV-2 entry. These RBDs use the human ACE2 protein surface differently, as human ACE2 mutants Y41A and E37K reduced SARS-CoV spike-mediated entry without impacting SARS-CoV-2 [12]. We repeated the original singleplex experiments with the duplex infection assay (S10A
[Fig 6 caption] The backbone traces of ACE2 from 5 independent RBD interface structures are shown as a blue ribbon and gray semitransparent cartoon (pdb: 6m17, 7wpb, 7dqa, 7l7f, and 3sci), alongside the SARS-CoV-2 RBD (salmon, pdb: 6m17) and SARS-CoV RBD (green, pdb: 3sci). The orientations of the alpha- and beta-carbon bond for the 4 RBD interface residues differing between R. alcyone and R. landeri ACE2 orthologs are shown as gray sticks, with the corresponding side chains in the orthologs listed below. (D) ACE2-dependent infection measured by flow cytometry observed with RBD chimeric spike pseudoviruses in cells overexpressing mutants of human ACE2. All constructs encode human TMPRSS2 C-terminally linked to ACE2 with a 2A translational stop-start sequence. The base layer of the map was obtained from Natural Earth at https://www.naturalearthdata.com/downloads/110m-cultural-vectors/ and is in the public domain (https://www.naturalearthdata.com/about/terms-of-use/). The underlying data can be found in S2 Data, and the source code can be found at https://github.com/MatreyekLab/ACE2_dependence. ACE2, angiotensin converting enzyme-2; RBD, receptor binding domain; SARS-CoV, Severe Acute Respiratory Syndrome-related Coronavirus; SARS-CoV-2, Severe Acute Respiratory Syndrome-related Coronavirus 2; VSV, vesicular stomatitis virus; VSVG, vesicular stomatitis virus glycoprotein. https://doi.org/10.1371/journal.pbio.3001738.g006
Testing the same panel of mutants with the BtKY72 RBD was uninformative (S10A Fig), as this interaction is likely relatively weak and requires higher ACE2 amounts to promote infection. We thus recreated a smaller set of human ACE2 mutants in our high abundance ACE2 and TMPRSS2 coexpression construct, recombined them into HEK 293T landing pad cells, and exposed the cells to pseudoviruses with these RBDs (Figs 6D and S10C and S10D). Both flow cytometry and microscopy readouts correlated well (S10E Fig). Most mutants had little impact on pseudovirus infection with SARS-CoV and SARS-CoV-2 RBDs in this context, except for D355N and R357T, which measurably reduced infection by all 3 clade 1 RBDs (Fig 6D), consistent with previous results [12].
The clade 3 sarbecoviruses had distinct reliances on human ACE2 mutants, typified by differences in how they were impacted by the K31D mutant (Fig 6D). Infection by both clade 3 RBDs was hampered by the D355N and R357T mutants, showing that these contacts serve a common role across diverse RBD interfaces. The Khosta-2 RBD was unaffected by the Y41A and K353D mutants but was particularly reliant on K31, as the K31D mutant reduced infection as strongly as D355N and R357T. In contrast, BtKY72 was strongly inhibited by most of the human ACE2 mutants tested, including Y41A and K353D, with the sole exception being K31D, which drastically enhanced pseudovirus infection with the BtKY72 RBD (Fig 6D).
Despite the known signatures of evolutionary positive selection driving variation at ACE2 position 31 within bats (Fig 4A) [43], it is unclear which types of RBDs may be involved in this coevolutionary interplay. Clade 1 sarbecoviruses including SARS-CoV and SARS-CoV-2 are inhibited by the K31D mutant when ACE2 abundance is limiting (S10A Fig), suggesting that they either favor a Lys at this site or disfavor an Asp. Multiple costructures show that SARS-CoV-2 spike Q493 can form a hydrogen bond with K31 in human or pangolin ACE2 (S11 Fig) [31,52], although this was not observed in all structures.
To gain better insight into this molecular interaction, we aligned the homology models for each clade 3 RBD with the SARS-CoV and SARS-CoV-2 RBDs. As the SARS-CoV-2 Q493 position is at the end of a highly conserved beta-strand within the receptor binding motif, the models predicted the different amino acid side chains at this position to extend from the same site and in the same overall direction (Fig 7A). WIV1 and Khosta-2, which are both inhibited by the K31D human ACE2 mutant, encode either an Asn or Gln at this position and thus may form hydrogen bonds similar to those observed with SARS-CoV-2 (Fig 7B). BtKY72, which prefers the K31D human ACE2 mutant, encodes a Lys at this position (Fig 7B). This Lys side chain would lack the ability to hydrogen bond with K31, and their like charges may make close contacts energetically unfavorable.
We hypothesized that the pairwise identities of ACE2 residue 31 and the nearby residue on the RBD may determine compatibility for a subset of interactions. We tested this hypothesis by examining the ACE2 ortholog compatibility of BtKY72 and Khosta-1 (Fig 7C). Neither of these viruses could enter cells expressing rhinolophid ACE2 orthologs encoding Lys at this position, while both could enter cells expressing orthologs encoding Asp, Glu, or Asn. Khosta-1 was incapable of using human ACE2, which also encodes Lys at this position. The BtKY72 RBD could use human ACE2, thus defying this pattern, although its enhancement with the K31D mutant of human ACE2 supports our overall hypothesis (Fig 7C). The only other RBD with a Lys at this position was from BB9904. While this RBD was unable to support entry into cells expressing the K31 ACE2 orthologs from humans, R. sinicus (472) bats, and R. pearsonii bats, it also could not enter cells with other orthologs, so this result was less conclusive.
To further test our hypothesis that the RBD residue near ACE2 K31 can determine infection efficiency, we mutated the Lys at this position on BtKY72 and Khosta-1 to Glu or Asn (Fig 7D). While Asn only mildly increased infection, mutating the Lys to Glu on BtKY72 increased human ACE2-dependent infection nearly 10-fold. However, these mutations exhibited no significant effect on Khosta-1, suggesting this effect was somewhat RBD specific. Mutation of SARS-CoV N479 or SARS-CoV-2 Q493 to Lys also yielded no effect, although this is likely due to the high ACE2 expression and high avidity from pseudotyped virus infection experiments masking reductions to binding affinity with these already strong binders (Figs 6D and S10A) [12]. On the other hand, Khosta-2 is not adapted to human ACE2 and seemingly exhibits comparatively limited affinity for human ACE2. Mutation of Khosta-2 Q478 to Lys elicited a greater than 10-fold reduction to infection with human ACE2 (Fig 7D). This effect was not due to an overall disruption to the RBD, as all variants exhibited high levels of infection in cells overexpressing ACE2 from R. alcyone. Thus, RBDs encoding Lys at the position analogous to SARS-CoV-2 Q493 may generally be incompatible with ACE2 orthologs encoding K31, such as human ACE2. Conversely, mutation of this residue from Lys to Glu or Asn may be a step toward adaptation to the ACE2 proteins of humans and many other terrestrial mammals.
Altogether, our studies show that currently known sarbecoviruses have segregated into 2 groups of RBDs based on their ACE2 dependence or independence, with the more diverse ACE2-dependent viruses further segregating into sequence subgroups that each differentially utilize host ACE2 protein sequences during viral entry (Fig 8). The sequences within the ACE2-dependent RBDs can be highly divergent, but a constellation of pairwise interactions, such as those between ACE2 position 31 and the adjacent RBD residue often encoding Gln, Asn, Lys, or Ala, likely determine the patterns of ortholog-specific compatibilities that enable successful entry during potential zoonotic events.

Discussion
Here, we created a duplex pseudovirus infection assay for interrogating protein sequences capable of promoting viral infection. We subsequently harnessed this assay to test a matrix of 108 pairwise combinations for pseudovirus infection, with 12 different spike RBD sequences and 9 different ACE2 orthologs, to demonstrate that clade 3 sarbecoviruses consistently use various subsets of ACE2 alleles from a panel of horseshoe bats, with at least two clade 3 sarbecovirus spike RBDs also capable of using human ACE2. Our results also provide context for the importance of ACE2 residue 31, known to exhibit strong signatures of positive selection across bat species.
Our work was aided by the internally controlled, duplex nature of the infection assay format. This assay format recapitulated the effect size of the traditional singleplex format when the highly infectable cells were a minor fraction of the overall mixed cell population, and the magnitude of the effect was reduced by one-third when the cells were mixed equally. Testing the control and experimental samples in the same well allowed us to reduce the total number of samples in each experiment by up to 2-fold. Notably, in vivo infections involve complex cell mixtures, and while the duplex infection assay does not fully recreate such conditions, it may be an improved proxy over traditional singleplex measurements for obtaining infection measurements. Future advancements with multiplexed assessments of receptor sequences in pseudovirus infection assays will likely come through uniquely barcoding each transgenic construct, so that cells that are sensitive or resistant to infection can be separated and subsequently counted using high-throughput sequencing.
[Fig 8 caption] The Hamming distance matrix for sarbecovirus RBD protein sequences was used for a principal coordinate analysis. The first 2 principal coordinates are shown as X and Y axes, generating a scatter plot based on RBD sequence dissimilarity. Proposed delineation of ACE2-dependent and ACE2-independent RBDs based on differences in protein sequence are shown, with each clade given a different color. The underlying data can be found in S2 Data, and the source code can be found at https://github.com/MatreyekLab/ACE2_dependence. ACE2, angiotensin converting enzyme-2; RBD, receptor binding domain. https://doi.org/10.1371/journal.pbio.3001738.g008
Viral glycoproteins that facilitate entry at the cell surface can cause cell-cell fusion resulting in syncytia formation. Consistent with other studies [53], we saw rampant syncytia formation with many of the wild-type (WT) or chimeric sarbecovirus spike proteins we tested, particularly when ACE2 and TMPRSS2 were both overexpressed. This observation prompted us to develop an automated microscopy readout for the duplex infection assay, as this does not require disrupting syncytia prior to measurement. While the dynamic range of our microscopy readout was smaller than the flow cytometry assay, the overall patterns were highly correlated between measurement types across all experiments. Our fluorescent nuclei markers aided the image-based analysis pipeline, as nuclei are far more consistent in size and shape than the cell bodies of adherent cells.
Despite the additional proteins proposed to serve as alternative receptors for SARS-CoV-2 infection, our side-by-side comparison showed ACE2 to confer the vast majority of enhancement to infection, followed by L-SIGN. While we did not test DC-SIGN or SIGLEC1 [18,54], these factors will likely confer similar effects as L-SIGN as they likely share a common mechanism for enhancing viral attachment to target cells through glycan binding. We did not see enhanced entry from overexpression of CD147, NRP1, or NRP2. Multiple recent studies have also observed no effect through CD147 overexpression [16,17,54], casting major doubt on its importance during SARS-CoV-2 entry and infection. Similar to our work, another study also did not observe significant effects from NRP1 or NRP2 overexpression [54], so the roles of these proteins during SARS-CoV-2 infection remain unclear.
Our sequence-function analysis revealed clear protein sequence and feature differences separating the ACE2-dependent and ACE2-independent groups. The ACE2-independent group completely overlaps with the evolutionary defined clade 2 sarbecoviruses. These RBDs are highly related to each other but clearly distinct from the ACE2-dependent RBDs in amino acid length and sequence identity. In contrast to the structurally resolved clade 1 RBDs, including those from SARS-CoV and SARS-CoV-2, the ACE2-independent RBDs are approximately 15 residues shorter and are predicted to lack the disulfide bridged receptor binding ridge, particularly as the majority of ACE2-independent RBDs do not encode cysteines in this region of the interface. None of these RBDs conferred ACE2-dependent infection, even with sequences derived from bats of the same species as the ones they were isolated from.
In contrast, the ACE2-dependent RBDs included all known clade 1 and clade 3 viruses and the viruses within the recently discovered RaTG15 clade [30]. These RBDs were between 219 and 223 residues in length, possessed a pair of cysteines in the receptor binding ridge capable of forming a disulfide bridge, and exhibited species-specific utilization of at least 1 known rhinolophid ACE2 protein. Despite their shared feature of ACE2 utilization, these RBDs can still drastically vary in protein sequence, with diverse pairs of RBDs exhibiting amino acid differences at 50 to 75 positions (S1 Fig). While all known clade 1 and clade 3 sarbecovirus RBDs share these features, there are undoubtedly additional clades of sarbecovirus RBDs not yet observed, which may defy these patterns.
Two other preprints have recently observed similar findings with clade 3 sarbecovirus RBD utilization of ACE2. Starr and colleagues observed weak in vitro binding between BtKY72 RBD and human ACE2 [55], which could be enhanced with a K493Y/T498W double mutant in the RBD that increases its affinity. They were initially unable to observe pseudovirus infection with the full-length WT BtKY72 spike, although the double mutant spike yielded detectable infection [54]. Seifert and Letko observed that the Khosta-2 RBD could use human ACE2 during infection [56]. Both of these observations are consistent with our results, considering slight differences in assay sensitivities. By generating a larger set of rhinolophid ACE2 orthologs including R. ferrumequinum, R. alcyone, and R. landeri, we were also able to test sequences that were likely more similar to those in the bats that serve as their natural hosts, instrumental in seeing that all of the known sarbecoviruses with RBDs possessing 219 or more residues are ACE2 dependent. The precise genetic determinants of compatibility between spike RBD and host ACE2 sequences will become clearer once more Rhinolophus ACE2 sequences are sequenced and tested in functional assays like ours.
Despite the overall concordance in results from independent groups, some inconsistencies remain. For example, a previous study by Wells and colleagues concluded that the PRD-0038 and PDF-2370 / PDF-2386 RBDs, which they refer to as "Rwanda" and "Uganda" viruses, do not utilize human ACE2 [11]. These RBDs are highly similar to the clear human ACE2-compatible BtKY72 RBD, differing only in 3 or 4 amino acids across the approximately 222 residue RBD. We saw that PRD-0038, corresponding to the "Rwanda" virus, and BtKY72 RBD chimeric spikes both exhibited roughly 10-fold increases to infection in the presence of human ACE2, and almost 100-fold increases in the presence of R. alcyone ACE2. Importantly, this means that PRD-0038 can indeed use human ACE2 to a similar extent as BtKY72. This observation would seem to support our hypothesis that the high ACE2 expression and our duplex infection assay may indeed have higher sensitivity for weaker RBD interactions than various biochemical or bulk infection assays used in the past.
Even with the advantages of our assay and results, interpretations should be made with caution. Due to financial constraints, the diverse sarbecovirus spikes and Rhinolophus ACE2 alleles we tested were chimeric molecules, wherein the domains most critical for the virus-host interaction were swapped into existing scaffold protein sequences. As some chimeric molecules may not be fully stable, there is the possibility of false negatives in our dataset. For example, the strongest enhancement to pseudovirus infection conferred by R. sinicus (472) ACE2 cells was the 3.5-fold increase with WIV1 chimeric spike, and it is currently unclear whether this relatively poor enhancement is a true property of this allele or an artifact of altered protein conformation or subcellular localization.
While BtKY72 and Khosta-2 RBDs can utilize human ACE2 during entry, this does not mean that these viruses are currently capable of infecting humans. In both cases, the amounts of pseudovirus infection conferred by these RBDs were less than those conferred by the SARS-CoV, SARS-CoV-2, and WIV1 RBDs. These interactions may still be too inefficient to allow widespread entry and replication within humans. Furthermore, replication in vivo is multifactorial [57], and there are likely additional incompatibilities in immune antagonism and replication that may stifle a zoonotic event. For example, the Khosta viruses lack genes thought to antagonize the immune system, such as ORF8 [41]. While not sufficient, compatible interactions between viral entry proteins and host receptor proteins are likely necessary for zoonosis. Thus, these results demonstrate that there are sarbecoviruses that are at least partially primed to jump into humans and that surveillance efforts should be further extended outside of East Asia to other continents, including Africa and Eastern Europe.
The drastic differences in ACE2 and sarbecovirus RBD compatibility observed in our study highlight the importance of knowing the genotypes of both the virus and host. For example, R. alcyone was compatible with all clade 1 and 3 RBDs tested, while the highly related ortholog from R. landeri was only compatible with half of these viruses, and the next related sequence from R. ferrumequinum was only compatible with BtKY72 and Khosta-1. All 3 species may overlap in geographical range in Central Africa, and accurate identification of the host bat is needed to know which species can serve as reservoirs in vivo. There is a staggering amount of ACE2 allelic variation in horseshoe bats, including the 19 or more variants observed with R. sinicus, the 6 or more variants observed with R. affinis, and 3 or more variants observed with R. ferrumequinum. With some pairs of alleles in the same species differing by as much as 17 residues, there are likely drastically different viral susceptibilities and phenotypic heterogeneity within a population of the same species. Future efforts capable of simultaneously sequencing both the viral genome and host ACE2 coding sequence from a single sample will undoubtedly help uncover some of these complex relationships between virus and host.
Our results clarify the molecular interactions that likely underlie the evolutionary interplays between sarbecovirus RBDs and host ACE2 sequences. For example, ACE2 residue 31 was long known to be a site of evolutionary conflict due to its signature of positive selection in bats [43]. We found that the BtKY72 and Khosta-1 RBDs disfavor horseshoe bat ACE2 orthologs and alleles encoding K31. Human ACE2 also encodes K31. While BtKY72 was capable of using human ACE2, its infection was enhanced with a K31D mutant of human ACE2. Both RBDs encode Lys at the RBD residue that interacts with ACE2 position 31, suggesting a potential side chain incompatibility making the interaction less energetically favorable.
Consistent with our interpretation, Starr and colleagues found that the WT BtKY72 Lys residue was disfavored for human ACE2 interaction, as substitution to multiple other amino acids including Tyr, Gln, Phe, Ala, Val, Gly, and Cys improved binding [55]. In contrast, SARS-CoV and SARS-CoV-2 RBDs encode Asn or Gln at the analogous residue and disfavor the K31D mutant of human ACE2. Thus, an amino acid that is favored for one sarbecovirus RBD may be disfavored for another and vice versa. With the BtKY72 and Khosta-2 RBDs, the infectivity of the pseudovirus particles could be toggled by switching to or away from the disfavored Lys amino acid at the residue directly across from ACE2 position 31. A similar pattern of incompatibility was previously observed between N479 of SARS-CoV RBD or K479 from a related virus cSz02 from palm civets, and K31 of human or T31 of palm civet ACE2, thought to be important for transmission of SARS-CoV from palm civet intermediate hosts [6]. Analogous interplays likely exist between other key pairs of RBDs and ACE2 residues, and the collective actions of all of these interactions likely dictate ACE2 usage by a given RBD.
The current pandemic is a sobering reminder of the importance of understanding the molecular barriers that normally prevent zoonosis, especially when previous ecological and societal factors that also served as barriers continue to erode. Once identified, weakened barriers can be bolstered or surveilled as part of pandemic precaution. While most pressing with sarbecoviruses, these considerations apply to other viruses, including merbecoviruses, henipaviruses, or filoviruses. Multiplexable genetic assays will be instrumental in getting the large number of data points needed to understand the nuanced molecular coevolutionary relationships that exist across a diverse set of host-pathogen interactions.

Plasmid construction
Construction of the landing pad lentiviral vector construct, LLP-Int-BFP-IRES-iCasp9-Blast (Addgene plasmid #171588) was described previously [12]. All plasmids were produced using Gibson Assembly [58]. For the initial polymerase chain reaction, a total of 40 ng template plasmid DNA was mixed with forward and reverse primers, each at a final concentration of 0.333 μM, and amplified with Kapa HiFi HotStart ReadyMix polymerase. To create the chimeric SARS-CoV spike constructs with chimeric RBDs, 100 ng of gBlock DNA (Integrated DNA Technologies) encoding the codon-optimized RBD sequences were used as the starting template. To create the chimeric rhinolophid ACE2 molecules, 100 ng of eBlock DNA oligomers (Integrated DNA Technologies) were used as the template. Notably, the multipiece eBlock ligation strategy worked poorly and is not recommended for future molecular cloning.
All of the aforementioned DNA was amplified under the following conditions: 95˚C for 5 min; 98˚C for 20 s, 65˚C for 15 s, 72˚C for 8 min, repeated 7 or 8 times; 72˚C for 5 min. Twenty units (1 μL) of DpnI enzyme (New England BioLabs, R0176L) were added to each reaction, except for those produced from DNA oligomers, and incubated for 2 hours at 37˚C. A Zymo clean and concentrator kit (Zymo Research, D4003) was used to clean each reaction, and 1 μL of the final eluate was incubated with 1 μL 2× GeneArt Gibson Assembly MasterMix (ThermoFisher, A46629) for 30 to 60 minutes at 50˚C to complete the Gibson cloning reaction. The resulting recombinant plasmids were transformed via calcium heat shock into homemade chemically competent E. coli 10β cells (New England BioLabs, C3019I). Plasmid DNA was extracted using a GeneJET miniprep kit (ThermoFisher, K0503) and sequence-confirmed with Sanger sequencing on an Applied Biosystems 3730 Genetic Analyzer.

Recombination of landing pad cells
HEK 293T cells were used to generate the lenti-landing pad line derived from LLP-Int-BFP-IRES-iCasp9-Blast as previously described [10]. Landing pad cells expressing Bxb1 integrase with a nuclear localization signal to allow for transport into the nucleus were recombined in either 24-well or 6-well plates. In the 24-well plate, 120,000 cells were transfected with 254 ng of attB recombination plasmid mixed with 0.96 μL of Fugene 6 reagent in D10-dox media. In the 6-well plate, 600,000 cells were transfected with 1,200 ng of attB recombination plasmid mixed with 5 μL of Fugene 6 reagent in D10-dox media.
Upon attB-plasmid transfection, negative selection of nonrecombined landing pad cells was performed with the addition of 10 nM AP1903 (ApexBio, B4168) to activate iCasp9. Positive selection of recombined cells was achieved with the addition of 1 μg/mL puromycin (InvivoGen, ANTPR1). Recombined cells were maintained in D10-dox with 1 μg/mL puromycin to prevent transgene silencing.

BLAST searches and protein sequence alignments
The RBDs of SARS-CoV, WIV1, and SARS-CoV-2 spikes were used as initial query sequences for NCBI BLASTp searches. The resulting sarbecovirus spike protein sequences were obtained from NCBI, with the corresponding accession numbers listed in S1 Table. All RBD fragments were manually curated and aligned using Clustal Omega [59]. The spike RBD sequences for PDF-2370 and PDF-2386 were identical, so we collapsed these 2 entries into one and refer to this sequence only as PDF-2370 for simplicity. The aligned sequences were used as the input for a custom Python script that calculated amino acid identity at each position for any given pair of RBD sequences.
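The per-position identity calculation described above can be illustrated with a minimal sketch. This is not the authors' script: the function names, the choice to skip columns where both sequences carry a gap, and the toy sequences are all assumptions made for illustration.

```python
from itertools import combinations


def pairwise_identity(seq_a, seq_b, gap="-"):
    """Count identical positions between two aligned sequences.

    Returns (matches, compared). Columns where BOTH sequences have a gap
    are skipped (an assumption; other conventions exist).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    return matches, compared


def identity_table(alignment):
    """Fractional identity for every pair in a {name: aligned_sequence} dict."""
    table = {}
    for (name_a, seq_a), (name_b, seq_b) in combinations(alignment.items(), 2):
        matches, compared = pairwise_identity(seq_a, seq_b)
        table[(name_a, name_b)] = matches / compared if compared else 0.0
    return table


# Hypothetical aligned fragments, not real RBD sequences.
toy_alignment = {
    "rbd_a": "NITNLCPF-GE",
    "rbd_b": "NITNLCPFDGE",
    "rbd_c": "NVTKLCPF-GQ",
}
identities = identity_table(toy_alignment)
```

In practice the alignment would be read from the Clustal Omega output rather than typed in, and the number of differing positions is simply `compared - matches`.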
To perform a comprehensive search for clade 2 RBD spike sequences to gain a near-complete sampling of their sequence diversity, we first performed an NCBI BLASTp search using the YN2013 RBD amino acid sequence as the query. Clade 2 sequences were retained following a filtering step excluding hits longer than 210 amino acids, yielding a list of 112 likely "clade 2" accession numbers. The full "YN2013" spike amino acid sequence was then used to perform another search, and full-length spike sequences were retrieved and separated based on their existence or absence in the list of "clade 2" RBDs.
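The two-step search above amounts to a length filter followed by a set-membership split, which might be sketched as follows. The accession strings, the dictionary input format, and the function names are hypothetical; only the 210 amino acid cutoff comes from the text.

```python
MAX_CLADE2_RBD_LENGTH = 210  # cutoff stated in the text


def select_short_rbd_hits(rbd_hits, max_length=MAX_CLADE2_RBD_LENGTH):
    """Keep accessions whose RBD hit is not longer than max_length.

    rbd_hits: {accession: RBD amino acid sequence} from the first search.
    """
    return {acc for acc, seq in rbd_hits.items() if len(seq) <= max_length}


def partition_spikes(spike_accessions, clade2_accessions):
    """Split full-length spike hits by presence in the clade 2 list."""
    in_list = [acc for acc in spike_accessions if acc in clade2_accessions]
    not_in_list = [acc for acc in spike_accessions if acc not in clade2_accessions]
    return in_list, not_in_list


# Hypothetical accessions and sequence lengths, for illustration only.
toy_rbd_hits = {"ACC_1": "A" * 205, "ACC_2": "A" * 222, "ACC_3": "A" * 210}
clade2 = select_short_rbd_hits(toy_rbd_hits)
clade2_spikes, other_spikes = partition_spikes(["ACC_1", "ACC_2", "ACC_3", "ACC_4"], clade2)
```

Using a set for the clade 2 list keeps the membership test cheap when the second search returns many full-length spikes.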
To identify Rhinolophus bat ACE2 alleles, we performed an NCBI BLASTp search using human ACE2 as the query sequence but restricting results to the 58055 taxonomic ID. The sequences were aligned, and Hamming distance matrices were calculated, as described before for the sarbecovirus RBD sequences.
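A Hamming distance matrix of this kind, and the principal coordinate analysis that the Fig 8 legend describes for the RBD sequences, can be sketched with NumPy alone. This is a hedged illustration under stated assumptions: the sequences are toy strings of equal aligned length, gaps are treated as ordinary characters, and the classical (Torgerson) formulation of principal coordinate analysis is used, which may differ from the implementation in the project repository.

```python
import numpy as np


def hamming_matrix(aligned_seqs):
    """Pairwise count of differing positions between equal-length sequences."""
    chars = np.array([list(seq) for seq in aligned_seqs])
    # Broadcast (n, 1, L) against (1, n, L) and count mismatches along L.
    return (chars[:, None, :] != chars[None, :, :]).sum(axis=2).astype(float)


def pcoa(distances, n_axes=2):
    """Classical principal coordinate analysis of a symmetric distance matrix."""
    n = distances.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-center the squared distances to obtain a Gram matrix.
    gram = -0.5 * centering @ (distances ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_axes]
    # Negative eigenvalues can arise for non-Euclidean distances; clip them.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


toy_seqs = ["AAAA", "AAAT", "TTTT"]  # hypothetical aligned sequences
toy_distances = hamming_matrix(toy_seqs)
toy_coords = pcoa(toy_distances)  # one row per sequence, two coordinates
```

The first two columns of the returned array correspond to the X and Y axes of a scatter plot of sequence dissimilarity.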

Pseudotyped virus infection assays
All pseudotyped virus infection experiments were performed with lentiviral vectors. The lentiviral particles were produced by transfecting 1.5 million HEK 293T cells in a single well of a 6-well plate, using PEI-Max MW 40,000 (PolySciences, CAS Number: 49553-93-7) mixed with 600 ng of PsPax2 (Addgene # 12260), 600 ng of the lentiviral transfer vector pLenti_CMV-EGFP-2A-mNeonGreen (Addgene # 171599), and 600 ng of various viral envelope plasmids. The media was changed the next day, and the supernatant was collected over the next 72 hours. Upon each collection, the media was spun at 300 × g for 3 minutes, and the soluble fraction was retained. A list of viral envelope coding sequences used in this study is shown in S1 Table. Upon mixing of pseudovirus supernatants and target cell mixtures, the target cells were incubated for 48 or more hours prior to processing for flow cytometry or imaging by automated microscopy.

Detection of spike proteins in the pseudo-lentiviral particles produced from transfected 293T cells
A total of 8 million 293T cells were plated in 10 cm cell culture dishes 12 hours before transfection. The next day, 293T cells were transfected with lentiviral vectors for producing pseudolentiviral particles using PEI-Max MW 40,000 mixed with 8 μg of PsPax2, 8 μg of the lentiviral transfer vector pLenti-CMV-mNeonGreen-2A-HygroR, and 8 μg of various viral envelope plasmids. Approximately 12 hours following transfection, the cell medium was removed from the transfected plates and cells were gently washed with 1X PBS followed by addition of 10 mL D10 medium. Cell supernatants containing lentiviral particles were collected at 36, 60, and 84 hours following transfection and stored at 4˚C. A volume of 30 mL of each of the collected cell supernatants was centrifuged at 1,200 rpm for 5 minutes to allow the cell debris to settle at the bottom, and the clarified supernatants were transferred to fresh Falcon tubes. One volume of Lenti-X concentrator (Clontech, # 631231) was combined with 3 volumes of clarified supernatant and mixed with gentle inversion. Mixtures were incubated on ice for 2 hours followed by centrifugation at 1,500 × g for 45 minutes at 4˚C. Supernatants were discarded and the remaining off-white pellets were resuspended in 500 μL of RIPA buffer (Thermo Scientific, #89901) supplemented with 1X protease inhibitor (Thermo Scientific, #1862209). RIPA lysates were mixed with 4X SDS sample buffer and boiled for 10 minutes. Equal volumes of denatured lysates were separated on a 4% to 12% gradient SDS-PAGE polyacrylamide gel (Genscript, #M00653). Separated proteins on the polyacrylamide gel were transferred to a 0.2 μm PVDF membrane (Thermo Scientific, #88520) and immunoblotted using anti-Rhodopsin (1D4) (Abcam, #Ab5417) or anti-p24 antibody (Abcam, #63917). Western blot images were acquired using a GE Amersham Imager 600.

Flow cytometry and fluorescence microscopy
Cells were detached with 0.25% Trypsin with 2.21 mM EDTA (Corning, #25-053-CI) and resuspended in PBS containing 5% fetal bovine serum. Analytical flow cytometry was performed either with a ThermoFisher Attune NxT or a BD LSRII flow cytometer. For the Attune NxT, mTagBFP2 was excited with a 405-nm laser, and emitted light was collected after passing through a 440/50-nm bandpass filter. EGFP was excited with a 488-nm laser, and emitted light was collected after passing through a 530/30-nm bandpass filter. mCherry was excited with a 561-nm laser, and emitted light was collected after passing through a 620/15-nm bandpass filter. iRFP670 and miRFP670 were excited with a 638-nm laser, and emitted light was collected after passing through a 720/30-nm bandpass filter. For the BD LSRII, mTagBFP2 was excited with a 405-nm laser, and emitted light was collected after passing through a 440/40-nm bandpass filter. EGFP was excited with a 488-nm laser, and emitted light was collected after passing through a B525/50-nm bandpass filter. mCherry was excited with a 561-nm laser, and emitted light was collected after passing through a 610/20-nm bandpass filter. iRFP670 and miRFP670 were excited with a 640-nm laser, and emitted light was collected after passing through a 710/40-nm bandpass filter. Before analysis of fluorescence, live, single cells were gated using FSC-A and SSC-A (for live cells) and FSC-A and FSC-H (for single cells).
Fluorescent images were captured on a Nikon Ti-2E fluorescent microscope, outfitted with a SOLA SM II 365 light engine (Lumencor), a CFI Plan Apochromat DM Lambda 20X objective or a NIKON Plan Fluor 4X objective, GFP (#96392), Texas Red (#96395), or Cy5 (#96396) filter sets, and imaged with a DS-QI2 monochrome CMOS camera. The images were captured with an automated image acquisition workflow, which performed autofocus on each well of a 96-well plate. All exposure times for each fluorescent channel were kept constant between wells and replicate experiments. The captured TIFF files were analyzed with a custom Python script utilizing the numpy, scipy, cv2, skimage, and PIL packages. This script, entitled "Overlap_ratio_calculation.py", can be found in the project GitHub repository (https://github.com/MatreyekLab/ACE2_dependence). The image shown in Fig 5B was processed in NIS-Elements imaging software (Nikon), with intensity minimums and maximums autoscaled to show the locations and relative sizes of the red or near-infrared nuclei and highlight the distribution of GFP within the large syncytial cell.
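The text does not spell out how the image-based readout is computed, so the following is only one plausible sketch of an overlap-based measurement and should not be read as the contents of the authors' script. It assumes that each cell population is marked by a nuclear fluorophore in its own channel, that fixed intensity thresholds define the nuclear and GFP masks, and that the readout is the ratio of GFP-overlapping nuclear area between the two populations. The function name, thresholds, and toy arrays are hypothetical.

```python
import numpy as np


def overlap_fraction(nuclei, gfp, nuclei_threshold, gfp_threshold):
    """Fraction of nuclear-marker pixels that are also GFP positive."""
    nuclei_mask = np.asarray(nuclei) > nuclei_threshold
    gfp_mask = np.asarray(gfp) > gfp_threshold
    n_nuclear = nuclei_mask.sum()
    if n_nuclear == 0:
        return float("nan")  # no nuclei of this population in the field
    return float((nuclei_mask & gfp_mask).sum() / n_nuclear)


# Toy 4 x 4 "images": one channel per nuclear marker, plus the GFP channel.
red = np.array([[9, 9, 0, 0], [9, 9, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]])
nir = np.array([[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 9, 9], [0, 0, 9, 9]])
gfp = np.array([[9, 9, 0, 0], [9, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 9]])

red_fraction = overlap_fraction(red, gfp, nuclei_threshold=5, gfp_threshold=5)
nir_fraction = overlap_fraction(nir, gfp, nuclei_threshold=5, gfp_threshold=5)
overlap_ratio = red_fraction / nir_fraction
```

Because the measurement is made on nuclear area rather than on segmented cell bodies, it would not require disrupting syncytia before analysis.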

Immunofluorescence staining of ACE2 expressing cells for imaging flow cytometry
Human or Bat ACE2 expression constructs were recombined in LLP-Int-BFP-IRES-iCasp9-Blast HEK 293T cells, and cells were selected and maintained in D10-dox with 1 μg/mL puromycin as described above. Cells were detached with 0.25% Trypsin (Corning, #25-053-CI), resuspended in D10 medium, and centrifuged at 300 × g for 3 minutes. The supernatants were removed, and the cell pellets were resuspended and incubated with PBS + 5% FBS for 30 minutes at 4˚C, followed by centrifugation at 300 × g for 3 minutes and removal of the supernatant. The cells were fixed with 50 μL of 1X TF fix/perm buffer (BD Pharmingen 51-9008100, 51-9008101) for 45 minutes at 4˚C. After fixation, the cells were washed twice with 100 μL of 1X TF perm/wash buffer (BD Pharmingen 51-9008102). The cells were incubated with 50 μL of a 1:50 dilution of ACE2 antibody (Abcam, #Ab15348) prepared in 1X TF perm/wash buffer for 45 minutes at 4˚C. After primary antibody incubation, the cells were washed twice with 100 μL of 1X TF perm/wash buffer, followed by incubation with 50 μL of a 1:300 dilution of Alexa Fluor 488 (Invitrogen, #A11008) secondary antibody prepared in 1X TF perm/wash buffer for 30 minutes at 4˚C. After the incubation with secondary antibody, the cells were washed twice with 100 μL of 1X TF perm/wash buffer, and cell pellets were resuspended in 100 μL of PBS + 5% FBS. The samples were acquired on an Amnis ImageStream X flow cytometer.
The images were imported into Amnis IDEAS software (Version 6.2). The events were gated for highly focused cells with miRFP670 values above the background distribution. The resulting 80 × 80 pixel Tag Image File Format images were converted into Portable Network Graphics image files using the gdal_translate function from the GDAL translator library (Version 3.5.0). The resulting PNG files were analyzed with a custom Python script utilizing numpy and skimage, where the pixels of every channel of every image were converted into a data table of normalized pixel intensities and Euclidean distance to the image center. The resulting tab-separated value datafiles were imported into R, and the average normalized pixel intensities were plotted, grouped by distance in microns to the cell center. This script, entitled "Radial_intensity.py", can also be found in the project GitHub repository.
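The radial intensity calculation described above can be sketched as follows. This is not the "Radial_intensity.py" script itself, only a minimal illustration under stated assumptions: the image center is taken as the geometric center of the pixel grid, normalization is min-max scaling within each image, and the pixel size `microns_per_pixel` is a placeholder value. The function names are hypothetical.

```python
import numpy as np


def radial_intensity_table(image, microns_per_pixel=1.0):
    """Return one (distance, normalized intensity) row per pixel of one channel."""
    image = np.asarray(image, dtype=float)
    n_rows, n_cols = image.shape
    # Assumption: the cell is centered, so the image center is the grid center
    # (39.5, 39.5 for an 80 x 80 image).
    center_y, center_x = (n_rows - 1) / 2.0, (n_cols - 1) / 2.0
    yy, xx = np.indices(image.shape)
    distance = np.hypot(yy - center_y, xx - center_x) * microns_per_pixel
    # Assumption: min-max normalization within the image.
    span = image.max() - image.min()
    if span > 0:
        normalized = (image - image.min()) / span
    else:
        normalized = np.zeros_like(image)
    return np.column_stack([distance.ravel(), normalized.ravel()])


def mean_intensity_by_distance(table, bin_width=1.0):
    """Average normalized intensity within distance bins of the given width."""
    bins = np.floor(table[:, 0] / bin_width).astype(int)
    sums = np.bincount(bins, weights=table[:, 1])
    counts = np.bincount(bins)
    occupied = counts > 0
    return np.flatnonzero(occupied) * bin_width, sums[occupied] / counts[occupied]


# Synthetic 80 x 80 image with a bright center, standing in for one channel.
yy, xx = np.indices((80, 80))
demo = np.exp(-((yy - 39.5) ** 2 + (xx - 39.5) ** 2) / (2 * 10.0 ** 2))
table = radial_intensity_table(demo, microns_per_pixel=0.5)  # placeholder pixel size
distances, means = mean_intensity_by_distance(table, bin_width=1.0)
```

For the synthetic image, the mean intensity falls with distance from the center. A signal concentrated at the cell surface would instead be expected to peak at a nonzero radius.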

Data analysis and statistics
Data analysis was performed using version 1.4.1717 of RStudio, with the exception of flow cytometry data, which was first analyzed using version 10.8.0 of FlowJo. An R Markdown file containing code capable of fully reproducing the analyses can be found at the Matreyek Lab GitHub repository (https://github.com/MatreyekLab/ACE2_dependence). The analysis utilized the tidyverse [60], ggrepel, ggbeeswarm, sf [61], and ggfortify [62] packages. Statistical significance was determined using 2-sided t tests, and multiple test corrections were performed using the Benjamini-Hochberg procedure. The principal component analysis was performed by using the cmdscale classical (metric) multidimensional scaling function in ggfortify on the RBD Hamming distance matrix shown in S1 Fig. To calculate ACE2-dependent infection by flow cytometry, the acquired single cells were subsequently gated into mCherry+/iRFP670− or mCherry−/iRFP670+ subpopulations using FlowJo. The percentage of GFP-positive cells in each subpopulation was calculated and exported as individual columns of a comma-separated value data file. These values were copied into the experiment sample sheet listing the date, sample name, cell line used, pseudotyped virus used, and pseudotyped virus inoculum, and imported into RStudio for subsequent analysis. There, the percent GFP value for the mCherry−/iRFP670+ subpopulation was divided by the percent GFP value for the mCherry+/iRFP670− subpopulation to obtain a ratio. The geometric mean of the ratios from multiple replicate experiments was used to derive the ACE2-dependent infection metric used throughout our work. The table of pseudotyped virus infection values analyzed in this work can be found in S1 Data. The datasets used to generate each of the figures can be found in S2 Data. Both datasets are also available within the analysis script at the aforementioned GitHub repository.
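The ratio, geometric mean, and multiple test correction described above can be illustrated with a short sketch. The published analysis was performed in R, so the Python below is only a minimal restatement of the arithmetic. The function names and the replicate values are hypothetical.

```python
import math


def infection_ratio(pct_gfp_irfp670_pos, pct_gfp_mcherry_pos):
    """Percent GFP+ in mCherry-/iRFP670+ cells over percent GFP+ in mCherry+/iRFP670- cells."""
    if pct_gfp_mcherry_pos <= 0:
        raise ValueError("ratio is undefined when the denominator is not positive")
    return pct_gfp_irfp670_pos / pct_gfp_mcherry_pos


def geometric_mean(values):
    """Geometric mean of positive values, computed on the log scale."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical replicate experiments: (% GFP in iRFP670+ cells, % GFP in mCherry+ cells).
replicates = [(40.0, 2.0), (30.0, 3.0), (50.0, 2.5)]
ratios = [infection_ratio(a, b) for a, b in replicates]
ace2_dependent_infection = geometric_mean(ratios)
adjusted_p = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Because the metric is a ratio, averaging on the log scale treats a 2-fold increase and a 2-fold decrease symmetrically, which is the usual reason to prefer a geometric over an arithmetic mean here.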

Modeling of Bat ACE2 three-dimensional structures
The HHpred web server was used to perform homology alignment of various Bat ACE2 sequences with human ACE2 structures (PDB: 6m17 and 6m18) [63,64]. A structural model was then built with the MODELLER web server [65], and the ACE2 models were each aligned to ACE2 in PDB: 6m17 using the default alignment settings in PyMOL. The HHpred web server was used to perform homology alignment of the YN2013 RBD sequence with the SARS-CoV RBD (PDB: 7lm9). The HHpred web server was also used to align the RBD of the BtKY72 spike protein to the spike protein from SARS-CoV-2 (PDB: 7eam). A structural model was then built with the MODELLER web server [65], and the RBD models were each aligned to the SARS-CoV-2 RBD in PDB: 6m17 using the default alignment settings in PyMOL. The same pipeline was used to generate a model of the SARS-CoV-2 RBD, to estimate the amount of error introduced by the modeling process.

Visualizing the global ranges of various Rhinolophus bat species
The ranges of R. affinis [66], R. alcyone [67], R. blasii [68], R. euryale [69], R. ferrumequinum [70], R. hipposideros [68], R. landeri [67], and R. sinicus [71] were downloaded as shape files from The IUCN Red List of Threatened Species 2020. The shape files were imported into R and displayed in RStudio using the "sf" package [61]. The base layer of the map (version 5.1.1) was obtained from Natural Earth at https://www.naturalearthdata.com/downloads/110m-cultural-vectors/ and is in the public domain (https://www.naturalearthdata.com/about/terms-of-use/).

The schematic on the right shows the chimeric swap points that were tested, with light gray denoting SARS-CoV sequence and dark gray denoting BtKY72 sequence. The underlying data can be found in S2 Data, and the source code can be found at https://github.com/MatreyekLab/ACE2_dependence. ACE2, angiotensin converting enzyme-2; RBD, receptor binding domain; SARS-CoV, Severe Acute Respiratory Syndrome-related Coronavirus.

[...] and an X-ray diffraction structure (PDB: 7dhx, right) between SARS-CoV-2 RBD and either human (left) or pangolin (right) ACE2. The side chain residues for ACE2 Lys31 and SARS-CoV-2 RBD Gln493 are shown as stick representations, with nitrogen atoms colored blue and oxygen atoms colored red. ACE2 is colored cyan, and SARS-CoV-2 RBD is colored salmon. The predicted hydrogen bond is shown as black dashes. ACE2, angiotensin converting enzyme-2; RBD, receptor binding domain; SARS-CoV-2, Severe Acute Respiratory Syndrome-related Coronavirus 2. (TIF)

S1 Table. Table of (XLSX)

S1 Raw Images. PDF displaying the raw unmodified image files as well as size marker overlays for each western blot. (PDF)