Structural analysis of viral ExoN domains reveals polyphyletic hijacking events

Nidoviruses and arenaviruses are the only known RNA viruses encoding a 3’-5’ exonuclease domain (ExoN). The proofreading activity of the ExoN domain has played a key role in the growth of nidoviral genomes, while in arenaviruses this domain partakes in the suppression of the host innate immune signaling. Sequence and structural homology analyses suggest that these proteins have been hijacked from cellular hosts many times. Analysis of the available nidoviral ExoN sequences reveals a high conservation level comparable to that of the viral RNA-dependent RNA polymerases (RdRp), which are the most conserved viral proteins. Two highly preserved zinc fingers are present in all nidoviral exonucleases, while in the arenaviral protein only one zinc finger can be identified. This is in sharp contrast with the reported lack of zinc fingers in cellular ExoNs, and opens the possibility of therapeutic strategies in the struggle against COVID-19.


Introduction
As of today, the coronavirus SARS-CoV-2 pandemic has affected more than 100 million people worldwide, causing millions of deaths as well as a severe health, social, and economic crisis [1]. The Coronaviridae family is part of the order Nidovirales, which includes enveloped, non-segmented, positive-sense single-stranded RNA (+ssRNA) viruses that infect a wide variety of animal hosts, including humans [2][3][4][5][6][7].
The largest known non-segmented RNA viral genomes are found in nidoviruses. The upper limit is held by the Planarian secretory-cell nidovirus (PSCNV), with a 41.1 kb genome [13,14]. These unusually long linear RNA genomes are explained in part by the proofreading activity of their 3'-5' exonuclease (ExoN) domain [12,15], which hydrolyzes phosphodiester bonds to remove nucleotides misincorporated at the 3' end during replication, acting on both ssRNA and dsRNA [16,17]. At present, only eight of the fourteen families of the order Nidovirales recognized by the International Committee on Taxonomy of Viruses (Coronaviridae, Tobaniviridae, Roniviridae, Medioniviridae, Euroniviridae, Mesoniviridae, Abyssoviridae, and Mononiviridae), all of which have genomes of 20 kb or larger (Table 1), are known to encode the proofreading ExoN domain.
The coronaviral nsp14 protein is composed of two functional domains: the N-terminal ExoN domain and the C-terminal N7-methyltransferase (N7-MTase) domain, which methylates the 5' cap of the viral RNA and thereby protects it from degradation [18]. The importance of the ExoN domain has been corroborated experimentally. In ExoN-knockout mutants of the betacoronaviruses Mouse hepatitis virus (MHV) and SARS-CoV, replication fidelity is strongly diminished, conferring a "mutator phenotype" that remains viable in cell culture [19,20]. Moreover, inactivation of the ExoN of HCoV-229E (an alphacoronavirus), MERS-CoV, and SARS-CoV-2 (betacoronaviruses) severely impairs replication and prevents the recovery of infectious viral progeny [16,21].
Based on sequence comparisons, Snijder et al. (2003) demonstrated the evolutionary relationship between the coronaviral ExoN domain and the cellular DNA-proofreading enzymes of the DEDD (DnaQ-like) family of exonucleases [22]. Several features are shared between the coronaviral ExoN and the DnaQ-like family of exonucleases, including the well-conserved DEDD motif.
A noteworthy feature of the coronaviral nsp14 is that both of its domains contain zinc fingers (ZFs): two in the ExoN and one in the N7-MTase [18,29]. ZFs can be described as a group of stable scaffolds whose structure is maintained by a bound zinc ion. They vary in sequence and structure, which reflects how the Zn2+ ion is coordinated by cysteine and/or histidine residues and the way in which the ZF interacts with other molecules [30][31][32]. Typically, ZFs act as interaction modules that bind a variety of molecules, including nucleic acids, proteins, lipids, and small compounds [32,33]. The distinctive ZFs of the SARS-CoV ExoN domain appear to play a key role in the structural stability of the enzyme [18]. According to the zinc-site classification of Andreini et al. (2011), SARS-CoV ExoN ZF1 is a shuffled zinc ribbon and ZF2 is of the C2H2 type. Although ZFs are found in numerous eukaryotic proteins as well as in the bacterial Ros/MucR protein family [30], they have not been identified in known cellular and dsDNA viral exonucleases.
To the best of our knowledge, this is the first evolutionary analysis of RNA viral exonucleases that includes their cellular counterparts, and it demonstrates the polyphyletic viral hijacking of the cellular ExoN gene. Because RNA-based entities mutate much faster than DNA-based ones, sequence similarity between viral and cellular ExoNs is largely eroded; we therefore built phylogenies based on tertiary structure, which is more conserved than sequence, and in these trees several independent viral ExoN hijacking events can be recognized. Phylogenetic analysis of the available nidoviral ExoN sequences reveals a level of sequence conservation similar to that of the viral RNA-dependent RNA polymerases (RdRp). This conservation, together with the central role of the ExoN in the viral cycle, suggests that the SARS-CoV-2 ExoN should be considered a therapeutic target in the struggle against the COVID-19 pandemic. Since cellular ExoNs lack ZFs, our findings suggest that the conserved ZFs of the nidoviral ExoN domain may be regarded as therapeutic targets for the control of coronaviral infections, and that they are relevant to understanding the early evolution of this viral order.

Structural comparisons and structure-based tree construction
A search for structural homologs of the SARS-CoV ExoN domain (nsp14-nsp10 complex, chain B, residues 1-287; PDB ID: 5C8U) was carried out against the PDB using the DALI server [34]. Thirty-three non-redundant crystallographic structures with a Z-score > 4 were collected from the PDB [35]. To construct the structure-based evolutionary tree, we performed pairwise comparisons between the selected structures with the PDBeFold online server [36]. From each pairwise comparison, we collected the root-mean-square deviation (RMSD) and the number of superimposed residues to calculate the Structural Alignment Score (SAS) [
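A minimal sketch of how pairwise superposition results could be turned into a structure-based tree. It assumes the widely used definition SAS = 100 × RMSD / N_aligned (lower is more similar). This section does not say which clustering method was applied to the SAS matrix, so UPGMA is used here only as an illustrative stand-in; the structure labels and (RMSD, aligned residues) values are hypothetical.

```python
from itertools import combinations


def sas(rmsd: float, n_aligned: int) -> float:
    """Structural Alignment Score (assumed definition); lower means more similar."""
    return 100.0 * rmsd / n_aligned


def upgma(names, dist):
    """Return a rooted UPGMA tree in Newick format.

    `dist` is a symmetric dict of dicts: dist[a][b] is the distance between
    leaves a and b. Each branch length is half the merge distance minus the
    height of the subtree being joined.
    """
    # cluster key -> (newick string, number of leaves, height of its root)
    clusters = {n: (n, 1, 0.0) for n in names}
    d = {a: {b: dist[a][b] for b in names if b != a} for a in names}
    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[p[0]][p[1]])
        height = d[a][b] / 2.0
        newick_a, size_a, height_a = clusters.pop(a)
        newick_b, size_b, height_b = clusters.pop(b)
        merged = f"({newick_a}:{height - height_a:.2f},{newick_b}:{height - height_b:.2f})"
        # Distance to the merged cluster: leaf-count-weighted average.
        d[merged] = {}
        for c in clusters:
            d[merged][c] = d[c][merged] = (d[a][c] * size_a + d[b][c] * size_b) / (size_a + size_b)
        clusters[merged] = (merged, size_a + size_b, height)
    (newick, _, _), = clusters.values()
    return newick + ";"


# Hypothetical superposition results: (RMSD in angstroms, aligned residues).
pairs = {("A", "B"): (1.5, 150), ("A", "C"): (3.0, 100), ("B", "C"): (3.0, 100)}
names = ["A", "B", "C"]
dist = {n: {} for n in names}
for (x, y), (rmsd, n_aln) in pairs.items():
    dist[x][y] = dist[y][x] = sas(rmsd, n_aln)
print(upgma(names, dist))  # prints (C:1.50,(A:0.50,B:0.50):1.00);
```

Dividing RMSD by the number of aligned residues penalizes superpositions that look tight only because few residues were matched, which is why SAS rather than raw RMSD is a convenient tree distance across structures of different sizes.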

Retrieval of ExoN and RdRp sequences
Nidovirales polyprotein 1ab sequences were manually obtained from the NCBI RefSeq database. Each sequence was searched with a local alignment (Smith-Waterman, default parameters), using the SARS-CoV-2 ExoN domain (YP_009725309.1) and nsp12 (RdRp) sequences as queries. ExoN and RdRp homologs were also retrieved from the viral and cellular NCBI RefSeq database (December 2020) by Smith-Waterman local alignment (default parameters), using SARS-CoV-2 nsp14 (YP_009725309.1) and nsp12 (PDB 6M71) as queries, respectively.
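For readers unfamiliar with the method, a minimal Smith-Waterman local alignment is sketched below. It uses a simple match/mismatch score and a linear gap penalty chosen only for illustration; these are assumptions, not the default parameters used in the study (protein alignment tools typically use a substitution matrix such as BLOSUM62 with affine gap penalties). The peptide strings are made-up toy inputs.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at zero.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("GGDEDHKK", "TTDEDHAA"))  # (8, 'DEDH', 'DEDH')
```

The zero floor is what makes the alignment local: poorly matching flanks are simply not included, which suits the extraction of a single domain (ExoN or RdRp) from a long polyprotein.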

Multiple sequence alignment and phylogeny estimation of ExoN and RdRp
A multiple sequence alignment was built for each of the two proteins with the PROMALS3D server [39], using the tertiary structures of the SARS-CoV ExoN domain (PDB 5C8U [18]) and of SARS-CoV-2 nsp12 (PDB 6M71) as structural references. For the ExoN sequences, the best-fit model under the Akaike information criterion, calculated with ModelFinder [40], was LG+F+R5, and a phylogeny was inferred by maximum likelihood with 100 non-parametric bootstrap replicates in IQ-TREE 2 [41]. The RdRp alignment was first trimmed with trimAl [43] using the automated1 heuristic; the best-fit model under the Akaike information criterion, calculated with ModelFinder [40], was LG+F+R6, and a phylogeny was inferred by maximum likelihood with 100 non-parametric bootstrap replicates in IQ-TREE 2 [41]. In both trees, branches with bootstrap support below 50% were collapsed using TreeCollapserCL4 [42]. The phylogenies were edited and visualized with FigTree (http://tree.bio.ed.ac.uk/software/figtree/) [44].
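Collapsing a poorly supported branch turns the corresponding internal node into a multifurcation: its children are attached directly to its parent. The sketch below illustrates that operation on a tiny tree held as nested Python objects. It is not TreeCollapserCL4's code; the `Node` class, the example tree, and its support values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str = ""
    support: Optional[float] = None  # bootstrap support (%) for internal nodes
    children: List["Node"] = field(default_factory=list)


def collapse(node: Node, threshold: float = 50.0) -> Node:
    """Collapse internal branches with support below `threshold`, in place."""
    new_children = []
    for child in node.children:
        collapse(child, threshold)  # handle deeper weak branches first
        is_internal = bool(child.children)
        if is_internal and child.support is not None and child.support < threshold:
            # Drop the weak branch: its children attach to this node.
            new_children.extend(child.children)
        else:
            new_children.append(child)
    node.children = new_children
    return node


def to_newick(node: Node) -> str:
    """Newick string without branch lengths; support written after ')'."""
    if not node.children:
        return node.name
    inner = ",".join(to_newick(c) for c in node.children)
    label = "" if node.support is None else f"{node.support:g}"
    return f"({inner}){label}"


root = Node(children=[
    Node(support=95, children=[Node("A"), Node("B")]),
    Node(support=40, children=[Node("C"), Node("D")]),
])
print(to_newick(collapse(root)) + ";")  # ((A,B)95,C,D);
```

Collapsing does not change which taxa are present; it only removes resolution the bootstrap replicates do not support, so the remaining clades are the ones worth interpreting.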

ExoN tertiary structure-based phylogenetic tree
As expected, the multiple alignment of viral and cellular ExoN sequences led to an inconclusive tree, reflecting the high degree of divergence within this diverse family, which includes proteins encoded by RNA viruses, DNA phages, and cells. Since protein tertiary structure is more conserved than amino acid sequence [45,46], we built a phylogenetic tree based on spatial superpositions of the available tertiary structures of DnaQ-like exonucleases. Our results confirm the monophyletic origin of all these exonucleases. As shown in Fig 1, the ExoN domains are not randomly distributed: DEDDh and DEDDy exonucleases fall into clearly distinct clades. Exonucleases have a wide array of functions, including RNA and DNA repair, proofreading, and immune activity, and the fact that different functions are found within the same clades highlights their functional versatility and lack of absolute substrate specificity. The SARS-CoV ExoN is found within a branch encompassing eukaryotic and prokaryotic DEDDh ExoNs with multiple functions, including proofreading, cytoplasmic nucleic acid degradation, and RNA maturation, processing, and binding. The arenaviral ExoNs, on the other hand, are located in a different branch of DEDDh ExoNs, whose sister branch includes the MAEL enzymes, an atypical group of nucleases lacking the characteristic DEDDh/y catalytic pentad (see below). Finally, the dsDNA phage ExoNs are grouped with other cellular DEDDy ExoNs, all of which take part in DNA proofreading. This tree strongly suggests that these viral exonucleases, although monophyletic with their cellular homologs, were acquired in three clearly independent hijacking events and subsequently evolved diverse functions.

Evolutionary analysis of the nidoviral ExoN domain
All the available data indicate that the nidoviral acquisition of the ExoN domain occurred prior to the diversification of these viruses and may have played a key role in their evolutionary success. As shown in Fig 2 and S1 Fig, all nidoviral ExoNs possess two conserved ZFs except that of the PSCNV, in which only ZF1 can be identified, while the residues that could correspond to a second ZF cannot be confidently assigned (Fig 2). In fact, the PSCNV has several major genomic and molecular differences from other nidoviruses, which might explain the absence of ZF2 (Table 2). In the Nidovirales ExoN phylogenetic tree (Fig 3), each family forms its own clade, with high bootstrap values toward the tips. Only the families Roniviridae and Euroniviridae are clustered with high bootstrap values closer to the root of the tree. The Aplysia californica nidovirus (APNV) stems as a sister group to the Coronaviridae family, whereas the PSCNV is located at the root of the tree, forming its own clade. The RdRp phylogenetic tree (Fig 3) exhibits a topology similar to that of the ExoN tree, with high bootstrap values from the root to the tips. Most of the viral families form their own clade; however, some families are
Table 2
Feature | Other nidoviruses | PSCNV
Genome organization | | A single ORF overlapping other small ORFs in distinct reading frames.
Additional genes | Lack ribonuclease T2, ankyrin, and fibronectin type II genes. | Contains genes for a ribonuclease T2 homolog, ankyrin, and two type II fibronectins.
3CLpro catalytic residues | Cysteine as the catalytic nucleophile. | Ser-His-Asp catalytic triad.
3CLpro substrate-binding pocket | Contains a histidine. | Contains a valine.
RdRp motif C | Ser residue in the nidovirus-specific SDD signature. | Ser replaced by a Gly residue in this signature (GDD).
NiRAN domain | All seven invariant residues are retained. | Only six of the seven invariant residues are conserved.
https://doi.org/10.1371/journal.pone.0246981.t002

In this tree, the APNV and the PSCNV stem as independent clades. Overall, the RdRp and ExoN phylogenies have very similar topologies, with the different viral families consistently grouped. The high level of sequence identity between the SARS-CoV and SARS-CoV-2 ExoNs (95%) is similar to that observed when the proteins of the RdRp complex (nsp8, nsp9, and nsp12) are compared (96% identity) [47]. The close similarity between the topology of the ExoN tree and that of the highly conserved RdRp suggests that, even in RNA-based entities like nidoviruses, proofreading plays a key role in maintaining genome integrity and stability.
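Percent identity between two aligned sequences is the fraction of compared positions that carry the same residue. Conventions differ on the denominator (full alignment length, gap-free columns, or length of the shorter sequence), so the sketch below, which compares only gap-free columns, is one assumed convention and not necessarily the one used in [47]. The aligned fragments are hypothetical.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aln_a, aln_b):
        if x == "-" or y == "-":
            continue  # gap columns are excluded from the denominator
        compared += 1
        identical += int(x == y)
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


# Hypothetical aligned fragments: 6 identical residues out of 7 gap-free columns.
print(round(percent_identity("MKT-DEDH", "MKSADEDH"), 1))  # 85.7
```

Because the denominator choice can shift the value by several points for gapped alignments, identity figures such as 95% and 96% are most comparable when computed with the same convention.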

ZFs are present in RNA viral ExoNs, but not in cellular and phage enzymes
A detailed analysis of the structures of the DnaQ-like family of exonucleases showed that the only two cellular exonucleases with ZF domains are "cell death-related nuclease 4" [48] and "target of Egr1" [49]. The multiple sequence alignment of viral and cellular DnaQ-like exonucleases shows that ZFs are absent from both the DNA phage and the cellular ExoN domains (S2 Fig). In contrast, RNA viral exonucleases such as the SARS-CoV nsp14 ExoN and the Lassa virus NP exonuclease contain two and one ZFs, respectively (Fig 4A and 4B) [18,50]. Interestingly, the peculiar Maelstrom, which has the typical ExoN β1-β2-β3-αA-β4-αB-β5-αC catalytic core topology but lacks the DEDD sequence motif, possesses a distinct, characteristic ZF (ECHC) in its exonuclease domain, which is also found in the arenaviral ExoN (Fig 4C). Overall, the differences among viral ExoNs and their location in different branches of our structure-based tree support the idea of independent viral hijacking events.
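Deciding whether a sequence retains a zinc finger in a multiple alignment such as S2 Fig amounts to reading the residues at the alignment columns that carry the Zn2+ ligands in a reference protein. The sketch below assumes the alignment is given as equal-length gapped strings and that the ligand columns are already known; the sequences, column indices, and the CCCH/C ligand pattern are hypothetical, chosen only to show the logic.

```python
def ligand_residues(aligned_seq: str, columns) -> str:
    """Residues found at the given 0-based alignment columns."""
    return "".join(aligned_seq[c] for c in columns)


def retains_zf(aligned_seq: str, columns, allowed) -> bool:
    """True if each ligand column holds one of its allowed residues.

    `allowed` has one string per column, e.g. "HC" accepts His or Cys,
    which expresses a pattern such as CCCH/C.
    """
    if len(columns) != len(allowed):
        raise ValueError("one allowed-residue string per column is required")
    return all(aligned_seq[c] in ok for c, ok in zip(columns, allowed))


alignment = {  # hypothetical aligned fragments
    "viral_ExoN_1": "AC--CRCDLHGK",
    "viral_ExoN_2": "AC--CRCDLCGK",
    "cellular_ExoN": "AS--ARQDLSGY",
}
zf_columns = [1, 4, 6, 9]            # hypothetical ligand columns
zf_pattern = ["C", "C", "C", "HC"]   # CCCH/C
for name, seq in alignment.items():
    print(name, ligand_residues(seq, zf_columns), retains_zf(seq, zf_columns, zf_pattern))
```

A gap or a non-coordinating residue at any ligand column is enough to mark the finger as absent, which mirrors how a conserved ZF appears, or fails to appear, column by column in an alignment figure.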

The viral ExoN domains
Apart from nidoviruses, the only other known RNA viruses endowed with a 3'-5' ExoN domain are the arenaviruses, which are enveloped, segmented, negative-stranded RNA viruses of the family Arenaviridae (order Bunyavirales). However, it has been suggested that the arenaviral ExoN is involved in suppressing innate immune signaling rather than in proofreading [52], which is consistent with the small size of arenaviral genomes (10.5 kb), in contrast with the role of the ExoN in the genome size increase of coronaviruses. Although the SARS-CoV (Coronaviridae) and Lassa virus (Arenaviridae) ExoNs are homologous and share features of the DnaQ-like family of exonucleases, structural differences between the two proteins support the idea of independent acquisition events, which in the case of the arenaviruses led to a different function. Several differences can be noted (Fig 4A and 4B). First, the Lassa ExoN has a basic loop motif (K516, K517, K518, and R519) in a projecting "arm" located to the left of the active site that interacts with the non-substrate strand; this motif is absent in the SARS-CoV ExoN. Second, the SARS-CoV ExoN has an additional cofactor binding site (the nsp10 binding site) that is absent in the Lassa ExoN. Third, the SARS-CoV ExoN is structurally connected to an N7-MTase domain, whereas the Lassa virus ExoN is linked to the NP core domain. Finally, the Lassa virus ExoN has one zinc finger with an as yet undescribed role, whereas the SARS-CoV ExoN has two zinc fingers, involved in structural stability (ZF1) and possibly in enzymatic activity (ZF2) [18,27,50].

PLOS ONE
As shown in Fig 1, several dsDNA phages have a proofreading ExoN domain that shares with DnaQ-like exonucleases the well-conserved DEDD motif, the 3'-5' exonucleolytic degradation of DNA, and the β1-β2-β3-αA-β4-αB-β5-αC conserved catalytic core topology [23][24][25][26][28]. In these DNA phages, the ExoN domain is covalently linked to the polymerase, as in Escherichia coli DNA polymerase I, suggesting that both the polymerase and the ExoN were acquired in a single hijacking event by the ancestor of these phages. This points to a hijacking event independent of those that gave rise to the arenaviral and nidoviral ExoNs, which is consistent with the distinct location of the phage ExoNs in the structure-based tree (Fig 1). This is further supported by the fact that the phage ExoNs belong to the DEDDy subgroup rather than to the DEDDh subgroup [53], in which the SARS-CoV and arenaviral ExoNs are located.

The emergence of zinc fingers in the ExoN domain of nidoviruses
The groundbreaking discovery of catalytic RNAs in the early 1980s gave considerable credibility to the proposal that the first living entities used RNA both as genetic material and as catalysts, a hypothetical stage called the RNA world [54]. Indeed, the catalytic, regulatory, and structural properties of RNA molecules and ribonucleotides, combined with their ubiquity in cellular processes, suggest that they played a key role in early evolution and perhaps in the origin of life itself [55,56]. Today, the only known RNA-based biological entities are the wide array of RNA viruses and viroids. Although RNA viruses may provide insights into the structure and evolution of early cellular genomes prior to the emergence of DNA, it is unlikely that they are direct descendants of primitive RNA-based life forms [57]. The ability of viruses to cross taxonomic barriers and infect new species is well established, but all the available evidence indicates that nidoviruses are restricted to animals, suggesting that the order Nidovirales originated late in the history of the biosphere. Since animals appeared around 750 million years ago, the available data suggest that the nidoviral hijacking of the ExoN domain took place during the late Proterozoic.
The lack of ZFs in cellular and phage ExoNs (S2 Fig) stands in sharp contrast with the nidoviral enzyme, in which ZFs appear to play an essential role in structural stability and perhaps also in catalysis [18]. Experimental analyses reinforce this conclusion. Mutations targeting ExoN zinc finger 1 (ZF1) of Mouse hepatitis virus (C206A and C209A), Transmissible gastroenteritis virus (C210H), and MERS-CoV (C210H) have been shown to affect genome replication [20,21,58]. Additionally, White bream virus ZF1 mutants (C6101A, C6104A, C6122A, and C6125A) were found to lack nucleolytic activity, and SARS-CoV ExoN ZF1 mutants could not be expressed as soluble proteins [18,59]. Furthermore, ExoN ZF2 mutations (C261A and H264R) abolished enzymatic activity in SARS-CoV and abrogated replication in MERS-CoV [18,21]. The conservation of ZFs across nidoviruses (Fig 2), together with these mutagenesis studies, indicates that they play a key structural role in viral ExoN function. Sequence conservation and ZF traits suggest a monophyletic acquisition of the ExoN domain in the ancestral nidovirus population prior to its split into several families.
As mentioned above, with the exception of the PSCNV, all known nidoviruses with genomes larger than 20 kb possess two ZFs. The PSCNV is an interesting case that warrants further experimental analysis. On the one hand, it is the RNA virus with the largest linear genome characterized to date, 41.1 kilobases long [14], which is remarkable for RNA-based biological entities with such elevated mutation rates [60]. On the other hand, the PSCNV ExoN seems to lack one of the two ZFs, namely ZF2, which has been shown to be essential for ExoN function and replication, at least in two betacoronaviruses [18,21]. These observations raise questions about the nature of this virus, which may not be a typical nidovirus, and about the mechanism by which this unusually large RNA virus maintains its genome integrity and stability.
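Why a 41.1 kb RNA genome is remarkable can be made concrete with a simple worked calculation. It assumes independent copying errors at a uniform per-nucleotide rate μ per genome copy; the two values of μ below are illustrative assumptions (a high, RNA-virus-like rate and a tenfold lower one standing in for the effect of proofreading), not measurements from this study or from [60].

```latex
\begin{aligned}
\mathbb{E}[\text{errors per genome copy}] &= \mu L, \\
P(\text{error-free copy}) &= (1-\mu)^{L} \approx e^{-\mu L}, \\
L = 41\,100,\ \mu = 10^{-4}: \quad \mu L &= 4.11, \quad e^{-4.11} \approx 0.016, \\
L = 41\,100,\ \mu = 10^{-5}: \quad \mu L &= 0.411, \quad e^{-0.411} \approx 0.66.
\end{aligned}
```

Under these assumptions, a tenfold lower error rate raises the fraction of error-free copies of a PSCNV-sized genome from roughly 2% to roughly 66%, which is the intuition behind linking proofreading to the maintenance of very large RNA genomes.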

The possible cellular origin of the MAEL domain
Maelstrom (MAEL) is a conserved endoribonuclease present in metazoans and protists that is involved in regulating endogenous genetic elements such as retrotransposons [51,61,62]. Mutation assays that reduce activity among MAEL orthologs indicate that MAEL is involved in ssRNA binding rather than in catalysis. In particular, MAEL has been described as an RNA-binding protein that interacts with Piwi-interacting RNAs (piRNAs), protecting the genome from transposons by repressing them in animal gonads [51,63]. MAEL appears to be highly similar in structure to the arenaviral NP ExoN but lacks the DEDD sequence motif (Figs 1 and 4C). However, evolutionary studies by Zhang et al. [61] showed that the MAEL domain in protists such as Entamoeba histolytica, Trypanosoma brucei, and Leishmania braziliensis retains the DEDD motif. Chen et al. [64] have reported a MAEL domain in the amoeba that has both sequence motifs (DEDD and an ECHC MAEL tetrad) and potential exoribonuclease activity. The structural similarity between the arenaviral exonuclease and MAEL supports a possible cellular origin of the arenaviral exonuclease domain. As suggested by Sato and Siomi [63], MAEL may have evolved from a DEDDh exonuclease to an ECHC tetrad simply by switching the catalytic residues. Thus, the arenaviruses could have hijacked the enzyme before the emergence of its RNA-binding moonlighting function and the loss of its exoribonuclease activity. A mechanism of protein evolution in nucleases proposed by Ballou et al. [65], following Jeffery's proposal [66], could explain this evolutionary transition, reinforcing the hypothesis of an ultimately cellular origin.

Conclusions
The results presented here suggest that ExoN genes have been hijacked by viruses at least three times: once by DNA phages (RB69, ϕ29, T4, T7) and, independently, by arenaviruses and by nidoviruses, both of which are RNA viruses. The acquisition of ExoN led to a major increase in the size of nidoviral genomes, which are the largest known RNA viral genomes. The PSCNV is an odd case, and further studies may shed light on its evolutionary history. The case of the cellular MAEL enzyme is, on the other hand, remarkable: its exonuclease-like fold changed function during evolution from DNA editing to RNA binding, losing its catalytic activity in the process.
As shown in Fig 1, ExoN activity is rather nonspecific and includes both RNA and DNA substrates. It is reasonable to assume that the early evolution of exonucleases represented a critical step in enhancing the coding capacity of primitive RNA genomes. Evolving exonuclease activity may not be so difficult: after all, it involves a simple hydrolytic reaction that breaks a phosphodiester bond in a genetic polymer already strained by a mismatched base pair. Thus, the postulated transition from small, fragmented RNA genomes to much larger cellular DNA genomes would have been facilitated by the lack of absolute substrate specificity of ExoN [67].
The RNA proofreading activity of the ExoN domain in nidoviruses and the immune evasion function of the arenaviral ExoN highlight the versatility of these enzymes, in which a few structural changes can lead to a novel function. The conservation of the residues that form the ZFs and coordinate the metal ions shows the importance of these motifs for the structural stabilization of exonucleases in RNA-based entities such as arenaviruses and nidoviruses. Finally, the presence and importance of ZFs in the RNA viral exonucleases analyzed in our work open the possibility of developing antiviral therapies based on zinc-chelating agents.
Supporting information
S1 Fig. Sequence conservation among nidoviral ExoNs. In the SARS-CoV ExoN structure, the Exo I (DE), Exo II (D/E), and Exo III (D) conserved sequence motifs are highlighted in green. Zinc-binding motif 1 (ZF1, CCCH/C) and zinc-binding motif 2 (ZF2, HCHC) are highlighted in red. Zn2+ ions are depicted as dark grey spheres and Mg2+ as a yellow sphere. In the logo, Exo I, Exo II, and Exo III are indicated with green arrows, and ZF1 and ZF2 with red arrows. The logo was made with WebLogo 3 (http://weblogo.threeplusone.com/).
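The conservation displayed in a sequence logo such as S1 Fig can be quantified column by column. A minimal sketch, assuming the simple per-column information content R = log2(20) - H used for protein logos, where H is the Shannon entropy of the observed residue frequencies; unlike WebLogo 3, it applies no small-sample correction and simply ignores gaps. The alignment columns are hypothetical.

```python
import math
from collections import Counter


def column_information(column: str, alphabet_size: int = 20) -> float:
    """Information content (bits) of one alignment column, gaps ignored."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    n = len(residues)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return math.log2(alphabet_size) - entropy


# Hypothetical columns: an invariant Cys (as at a ZF ligand) vs a variable site.
conserved = "CCCCCCCC"
variable = "ASTGNKEQ"
print(round(column_information(conserved), 2), round(column_information(variable), 2))  # 4.32 1.32
```

An invariant column reaches the maximum of log2(20), about 4.32 bits, which is why fully conserved zinc-ligand and catalytic positions appear as the tallest stacks in a protein logo.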