Structural Analysis of Monomeric RNA-Dependent Polymerases: Evolutionary and Therapeutic Implications

The crystal structures of monomeric RNA-dependent RNA polymerases and reverse transcriptases of more than 20 different viruses are available in the Protein Data Bank. They all share the characteristic right-hand shape of DNA- and RNA polymerases formed by the fingers, palm and thumb subdomains, and, in many cases, “fingertips” that extend from the fingers towards the thumb subdomain, giving the viral enzyme a closed right-hand appearance. Six conserved structural motifs that contain key residues for the proper functioning of the enzyme have been identified in all these RNA-dependent polymerases. These enzymes share a two divalent metal-ion mechanism of polymerization in which two conserved aspartate residues coordinate the interactions with the metal ions to catalyze the nucleotidyl transfer reaction. The recent availability of crystal structures of polymerases of the Orthomyxoviridae and Bunyaviridae families allowed us to make pairwise comparisons of the tertiary structures of polymerases belonging to the four main RNA viral groups, which has led to a phylogenetic tree in which single-stranded negative RNA viral polymerases have been included for the first time. This has also allowed us to use a homology-based structural prediction approach to develop a general three-dimensional model of the Ebola virus RNA-dependent RNA polymerase. Our model includes several of the conserved structural motifs and residues described in other viral RNA-dependent RNA polymerases that define the catalytic and highly conserved palm subdomain, as well as portions of the fingers and thumb subdomains. The results presented here help to understand the current use and apparent success of antivirals, i.e. Brincidofovir, Lamivudine and Favipiravir, originally aimed at other types of polymerases, to counteract the Ebola virus infection.


Introduction
Due to their role in replication, transcription, and reverse transcription in the case of reverse-transcribing viruses, RNA-dependent RNA polymerases (RdRp) and reverse transcriptases (RT) are key enzymes in the viral biological cycle. Following the crystallization of the poliovirus RdRp by Hansen et al. [1], crystals of over 20 distinct viral RNA polymerases have been obtained. They belong to single-stranded positive RNA (ss(+)RNA) viruses of the families Flaviviridae, Picornaviridae, Caliciviridae and Leviviridae; single-stranded negative RNA (ss(-)RNA) viruses of the families Orthomyxoviridae and Bunyaviridae; double-stranded RNA (dsRNA) viruses of the families Reoviridae, Cystoviridae and Birnaviridae; and reverse-transcribing viruses of the family Retroviridae. They are all part of the superfamily of DNA- and RNA polymerases, which are characterized by a right-hand architecture with three functional subdomains, i.e. fingers, palm and thumb, and by a two-metal-ion mechanism of action in which two aspartic acid residues located in the palm subdomain interact with two divalent metal ions to achieve the nucleophilic attack that incorporates the incoming ribonucleotide into the RNA chain [2,3]. The primary structure of RdRps and RTs is characterized by the sequence fingers-palm-fingers-palm-thumb, and in the tertiary structures of the former there are several extensions from the fingers, named "fingertips", that extend towards the thumb subdomain, giving the appearance of a closed right-hand shape (Fig 1), in contrast with the U-shaped form of RTs and of DNA-dependent DNA polymerases of the families A, B and Y (Fig 1) [4]. The palm subdomain is the catalytic subdomain and is by far the most conserved region of all monomeric viral RNA polymerases. 
It is formed by a β-sheet with three to six β-strands that lie above two helices, and has the conserved catalytic aspartic acid residues that coordinate the two metal ions necessary for the phosphoryl transfer reaction (Fig 1E) [5,6]. It has been hypothesized that the palm subdomain is the oldest domain of these enzymes, and that it may be a relic of an RNA/protein world that existed prior to the evolution of cellular DNA genes [7][8][9]. The fingers subdomain is a mixed α/β structure that plays a key role in the interactions with the template strand and the incoming nucleotide. The thumb subdomain is a highly variable subdomain with a predominantly helical structure located opposite the fingers that forms non-specific interactions with the primer strand.
Six conserved structural motifs (motifs A-F) have been identified in the tertiary structures of RdRps and RTs [3][4][5][10,11]. With the exception of motif F [12], which is located in the fingers subdomain, motifs A-E are all located in the palm subdomain (Fig 1E). Additional structural motifs and functional regions have been identified in some polymerase subgroups, such as motifs G [13,14] and H of ss(+) and dsRNA viruses [10], or motifs G and H of ss(-)RNA viruses [15]. All these motifs and functional regions have been shown to participate in the most critical steps for the correct recognition and incorporation of ribonucleotides [1,3,5,10,11] and are described below (Table 1).
The structural motifs of monomeric viral RNA-dependent RNA polymerases and reverse transcriptases
Motif A. Motif A is located within the palm subdomain and is formed by a β-strand followed by a helical structure or a loop that continues to the fingers subdomain. At the C-terminus of the β-strand, this motif contains one of the invariant catalytic aspartic acid residues present in DNA- and RNA polymerases (Fig 1E).
Single-stranded positive RNA viruses have a characteristic conserved sequence, -DX4D-, in which the first aspartate corresponds to the catalytic residue. Structural and thermodynamic evidence shows that the second aspartate in motif A plays a key role in the discrimination of NTP over dNTP by forming a hydrogen bond with the ribose 2' OH moiety [16]. The polymerases of ss(-)RNA viruses and reverse transcriptases lack the C-terminal aspartic acid. The polymerase structures of influenza A and B and the LaCrosse viruses have one conserved lysine, three amino acids downstream of the catalytic aspartic acid, which has been shown to be part of the NTP entrance tunnel [17].
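As an illustration of the motif A signature described above, the following minimal sketch scans a protein sequence for candidate D-X4-D hexapeptides. It is not part of the analysis reported here; the function name `find_dx4d` and the toy sequence are hypothetical, and a pattern hit is only a candidate, since assigning motif A also requires the structural context (the β-strand of the palm subdomain).

```python
import re

# Motif A signature of ss(+)RNA viral polymerases: D, any four residues, D.
# The pattern sits inside a lookahead so overlapping candidates are all found.
DX4D = re.compile(r"(?=(D.{4}D))")


def find_dx4d(sequence):
    """Return (1-based position, hexapeptide) for every D-X4-D candidate."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in DX4D.finditer(seq)]


# Hypothetical toy fragment, not a real polymerase sequence.
toy = "MKLDYSAFDGTLDAAAADQ"
hits = find_dx4d(toy)
for position, peptide in hits:
    print(position, peptide)
```

In this reading, the first aspartate of each hit would be the candidate catalytic residue and the second the candidate 2' OH-sensing residue.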
Instead of the second aspartate of motif A, reverse transcriptases have an amino acid with a bulky side chain such as phenylalanine, known as "the steric gate", which helps to discriminate between deoxyribonucleotides and ribonucleotides, avoiding the incorporation of the latter by the steric interference of the side chain of this residue with the 2' OH group of the ribonucleotide [18,19]. Apart from the pair of conserved residues within motif A, residues with aromatic rings four amino acids downstream of the catalytic aspartate are conserved in both positive- and negative single-stranded RNA viruses, which suggests that they play an important role either in the structural stability of the protein due to the hydrophobic nature of the side chains, or in nucleotide binding due to their position within the active site.
Fig 1. (E) The highly conserved palm subdomain showing the position of the conserved catalytic aspartic acid residues (edited from 3DLK). The color code for the subdomains is fingers subdomain, yellow; palm subdomain, green; thumb subdomain, red; fingertips, orange. The conserved structural motifs in the palm subdomain are colored as follows: motif A, red; motif B, dark blue; motif C, green; motif D, magenta; motif E, cyan.
Motif B. Motif B is located in the transition between the fingers and the palm subdomains. Its structure consists of a loop that connects a β-strand of the fingers and the N-terminal helix of the palm subdomain. The residues of this motif have been shown to participate in binding the template and the incoming nucleotide [20]. The N-terminal loop of motif B is a dynamic structure that participates in template binding. Within this loop, right before the start of the α-helix, both RdRps and RTs have a strictly conserved glycine preceded by a serine in both ss(+) and dsRNA viruses, and a glutamine in retroviruses [10,21]. 
In the case of influenza A, this glycine is located in a methionine-rich region, but in the arena- and bunyaviruses it is preceded by a glutamine [15,17]. This conserved glycine has been shown to serve as a pivot point that allows the loop to change its conformation [20], and mutating this residue completely abolishes the polymerase function [22].
Single-stranded positive- and dsRNA viruses have a conserved threonine in the N-terminal part of the α-helix that faces the active site of the polymerase. In the next helix turn, also facing the active site, ss(+)RNA viruses have a conserved asparagine, which has been shown to aid in rNTP selection by correctly positioning the catalytic aspartate in motif A [20]. The structures of segmented ss(-)RNA viruses have two dyads of conserved residues apart from the previously mentioned glycine. One amino acid after this glycine, a phenylalanine and an asparagine are conserved. As mentioned above, this last residue is conserved in ss(+)RNA viruses. Two amino acids downstream there are two conserved hydroxylic residues, either serine or threonine [15,17,23].
Motif C. Motif C follows motif B in the palm subdomain, and is formed by a β-strand-loop-β-strand structure. The second catalytic aspartic acid residue conserved in both DNA- and RNA polymerases that coordinates the interactions with the metal ions is located within the loop. Motif C is one of the most conserved regions in viral RNA polymerases; all the loops of ss(+), segmented ss(-), ds- and reverse-transcribing RNA viruses have two aspartates preceded by a glycine in ss(+) and dsRNA viruses, by a serine in segmented ss(-)RNA viruses, and by a methionine in RT viruses. Both aspartates coordinate the interactions with the metal ions [3]. Previous work in other viral polymerases has shown that any mutation of the first aspartic acid residue results in a complete loss of RNA polymerase activity, whereas mutations of the second aspartate diminish the polymerase activity or modify the metal cofactor requirements, but do not inactivate the enzyme [24,25]. Viral groups such as the dsRNA family Birnaviridae, which infects animals, or the ss(-)RNA order Mononegavirales, have an asparagine instead of the second aspartic acid in motif C. It has been shown that mutating this residue reduces the enzyme's activity [14]. Moreover, this substitution enables viral RNA polymerases to use manganese instead of magnesium as cofactor [14]. When the asparagine is replaced by an aspartate, the mutated enzyme replicates the viral RNA more efficiently than the wild type. It was suggested that this less efficient polymerase slows the replication kinetics of birnaviruses, which could favor the virus spread [14].
Motif D. Motif D follows motif C within the palm subdomain. It is formed by an α-helix and a flexible loop adjacent to the palm's β-sheet. It is a highly dynamic structure that changes its conformation when the correct nucleotide is bound. 
It also serves as a structural scaffold for the palm domain, and has been involved in the protonation of the pyrophosphate leaving group after the nucleotidyl transfer reaction [26,27]. Located at the N-terminal end of motif D's loop, there is one glycine, which has been shown to be conserved in many RdRps from ss(+), ss(-), dsRNA viruses and retroviruses [15,17,21]. It has been argued that this conserved glycine serves as a hinge for this structure that might play a key role in its conformational changes [21].
The second widely conserved residue in motif D is a lysine. It has been proposed that this amino acid, by acting as the general acid that protonates the pyrophosphate leaving group, contributes to the rate of nucleotide addition [28]. It is also part of the NTP entrance tunnel in ss(-)RNA viruses [15,17]. Even though this lysine is located in a similar position in the available RdRp crystal structures, it must be underlined that the distance in the primary structure between this conserved residue and the conserved glycine varies among the different viral families.
Motif E. Motif E is a unique structural feature of RdRp and RTs, which has been called "the primer grip". It is a β-hairpin that is located facing the palm subdomain's β-sheet at the junction with the thumb subdomain. As its name implies, this motif has been shown to act in the correct positioning of the 3' OH end of the primer [29].
The level of primary sequence conservation around motif E seems to be lower in viral RdRps, despite the fact that all the crystal structures exhibit the characteristic β-hairpin located in the same position [10,21]. One of the most conserved features in viral RNA polymerases is the presence of an aromatic amino acid at the N-terminal moiety of the loop facing motif C. Single-stranded positive RNA viruses of the families Picornaviridae and Caliciviridae, as well as the dsRNA viruses of the family Reoviridae, exhibit a similar arrangement. The former have a leucine and a basic residue, either lysine or arginine, after the aromatic amino acid, while the latter have a glycine and a lysine. There is evidence that basic residues located in the loop of the β-hairpin of Picornaviridae interact with the primer [30]. In the case of the Flaviviridae, the aromatic amino acid is followed by a cysteine and a serine, while in the Orthomyxoviridae and the Bunyaviridae, it is followed by either threonine or valine and a serine. In the Qβ phage replicase the aromatic amino acid is absent, since in this case a serine is followed by a cysteine and a glycine.
Viruses with a protein-priming mechanism such as the Picornaviridae have larger template-binding channels, in which two basic residues of motif E with long side chains protruding towards the active site easily fit [31]. In contrast, viruses that have de novo initiation, such as the Flaviviridae [32,33], the bacteriophage ϕ6 [34], and ss(-)RNA viruses [3,17,23,35,36], have more elaborate thumb subdomains and a structure named "the priming loop", a β-hairpin that protrudes towards the active site, creating a platform for priming and reducing the space for large side chains in motif E.
At the C-terminus of the β-hairpin, ss(-)RNA viruses have one conserved glycine. Hass et al. [37] have shown that this conserved residue is required by the polymerases of ss(-)RNA viruses for transcription, but not for genome replication. The position of this glycine in the three-dimensional structures suggests that it might work as a hinge for the thumb domain to move [15,17,23].
Motif F. Besides the conserved structural motifs A to E, which practically define the palm subdomain, one additional conserved motif named motif F has been identified in the fingers subdomain of all the crystallized RNA viral polymerases.
This motif extends from the fingers subdomain towards the thumb subdomain as part of the fingertips, directly on top of the palm subdomain active site. This long structure has numerous basic residues that interact with the negatively charged phosphate backbone of the incoming nucleotide, and has been shown to be part of the NTP entrance tunnel in ss(-)RNA viruses [3,32,38]. The only conserved residue in all the motif F structures is an arginine located near its C-terminus.
Single-stranded positive and double-stranded RNA viruses polymerase motif G. Gorbalenya et al. [13] and Pan et al. [14] identified the so-called motif G, an additional conserved structural motif found in many ss(+) and dsRNA viruses and located in the fingers subdomain approximately 120 amino acids upstream of motif A's catalytic aspartic acid. The consensus sequence of the motif is S-X-G, and it forms a loop that is part of the template entrance tunnel.
Single-stranded positive, double-stranded RNA polymerases and reverse transcriptases motif H. Cerny et al. [10] recently proposed the presence of an additional conserved structural motif located in the thumb subdomain of ss(+), dsRNA, and RT viruses, which they named motif H. It is formed by a helix-turn-helix structure, but there is not a single strictly conserved amino acid within the motif. This motif has been identified based solely on multiple sequence alignments, and its actual function has not been described [10].
Segmented single-stranded negative RNA viruses structural motifs G and H. Apart from the conserved structural motifs A-F, Gerlach et al. [15] identified two additional motifs G and H in the segmented ss(-)RNA viral polymerases. These motifs are located in different positions and have functions different from those of the previously named motifs G and H of ss(+) and dsRNA viruses [10,21]. Gerlach's motif G is found in the C-terminal region of the influenza virus PA subunit, and in the N-terminal half of the LaCrosse virus L protein, facing the active site, and has the sequence RKLL and RYMI, respectively. It has been proposed that the conserved arginine could interact with the priming NTP. The proposed motif H is found, sequence-wise, in the region between motifs A and B, and is located in the fingers subdomain, "on top" of motif B. It has one conserved lysine which has been proposed to stabilize motif B [15].
Single-stranded positive RNA viral conserved functional regions. Three recently proposed functional regions seem to be conserved in RdRps of ss(+)RNA viruses [2]. Two of them are located in the fingers subdomain and interact with the template RNA strand, while the third functional region is located in the thumb subdomain and binds the nascent RNA strand [2].
Given the availability of crystal structures of polymerases from the four major groups of RNA viruses, i.e., ss(+), ss(-), ds, and reverse-transcribing, we present in this paper a phylogenetic tree built from comparisons of the tertiary structures of RdRps and RTs, which might help us understand the evolutionary relationships among these enzymes. Prompted by the lack of a crystal of the Ebola virus (EBOV) [39] polymerase, we have used these data to build a homology-based three-dimensional model of the EBOV RdRp domain and, by extension, of all the other viruses that belong to the Mononegavirales order, which include viruses associated with important human pathologies such as measles, rabies, human respiratory syncytial virus, and the Ebola hemorrhagic fever, among many others. The recent availability of tertiary structures of ss(-)RNA viral polymerases [15,17,23,40] allowed an evaluation of our approach. As expected, our results demonstrate that the EBOV RdRp shares a homologous catalytic palm subdomain and other functionally important motifs with the other viral RdRps described thus far. As argued here, the evolutionary conservation of three-dimensional features common to all the monomeric polymerases analyzed here helps explain the recent reports of the successful use against the EBOV infection of antivirals originally targeted to inhibit other types of polymerases, such as RTs and DNA-dependent DNA polymerases.

Structural comparisons and dendrogram construction
Pairwise structural comparisons between the different RdRps and RTs mentioned above were performed with the Secondary Structure Matching (SSM) program [41] included in the PDBe web server. In the case of reverse transcriptases, both the connection and the RNase H domains were deleted for the comparisons. All the crystals have a resolution of 3 Å or better. The results of each set of comparisons allowed the construction of a matrix that included the number of residues in each of the structures, the root-mean-square deviation (RMSD), and the number of aligned residues.
A geometric distance measure was then estimated for each of the comparisons using the Structural Alignment Score [42], which is calculated according to the following formula: (RMSD x 100)/number of aligned residues.
The program FITCH, included within the PHYLIP package, was used to transform the geometric distance into an evolutionary distance and FigTree (http://tree.bio.ed.ac.uk/software/figtree/) was used to visualize the resulting tree.
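The distance calculation described above can be sketched as follows. This is a minimal illustration rather than the scripts used in this work: the structure labels and RMSD values are hypothetical, and the output layout assumes the classic PHYLIP square distance-matrix format (taxon count on the first line, names padded to ten characters), which should be checked against the PHYLIP documentation before use with FITCH.

```python
def structural_alignment_score(rmsd, n_aligned):
    """Structural Alignment Score: (RMSD x 100) / number of aligned residues."""
    if n_aligned <= 0:
        raise ValueError("number of aligned residues must be positive")
    return rmsd * 100.0 / n_aligned


def phylip_distance_matrix(names, pairwise):
    """Format a symmetric square distance matrix as PHYLIP-style text.

    `pairwise` maps frozenset({a, b}) -> (rmsd, n_aligned) for each pair.
    """
    lines = [f"{len(names):5d}"]
    for a in names:
        row = []
        for b in names:
            if a == b:
                row.append(0.0)  # a structure is at distance zero from itself
            else:
                rmsd, n_aligned = pairwise[frozenset((a, b))]
                row.append(structural_alignment_score(rmsd, n_aligned))
        # Assumed layout: name truncated/padded to 10 characters, then distances.
        lines.append(f"{a[:10]:<10}" + "".join(f"{d:10.4f}" for d in row))
    return "\n".join(lines)


# Hypothetical labels and values, for illustration only.
names = ["polA", "polB", "polC"]
pairwise = {
    frozenset(("polA", "polB")): (2.0, 400),
    frozenset(("polA", "polC")): (3.0, 300),
    frozenset(("polB", "polC")): (2.5, 250),
}
matrix_text = phylip_distance_matrix(names, pairwise)
print(matrix_text)
```

Because the score divides by the number of aligned residues, a pair that superimposes over a long stretch is penalized less than a pair with the same RMSD over a short stretch.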
Ebola virus L protein study: remote homology detection and three-dimensional structure modeling
The search for homologs of the EBOV polymerase and the three-dimensional structure modeling of the EBOV RdRp domain were performed using the PHYRE web server version 2.0 [43]. One Zaire Ebola virus L protein sequence (Sierra Leone, Makona-G3686.1; AIE11922) from the current outbreak was downloaded from NCBI's Viral Genome Resource (http://www.ncbi.nlm.nih.gov/genome/viruses/). The sequence was edited according to the information provided by the Conserved Domain Database [44], leaving only the fragment of the sequence which corresponds to the entry "Mononegavirales RNA dependent RNA polymerase" (CDD 250248). This edited fragment of the protein was used as the PHYRE version 2.0 server query sequence.
The three-dimensional model and its images were edited with Chimera 1.8 [45].
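The trimming of the query sequence to a single annotated domain, as described above, can be sketched as follows. This is a minimal illustration only: the helper names, the toy FASTA record and the coordinates are hypothetical, and the actual domain boundaries reported by the Conserved Domain Database are not reproduced here.

```python
def parse_fasta(text):
    """Return (header, sequence) for the first record of a FASTA string."""
    header, chunks = None, []
    for line in text.strip().splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                break  # only the first record is read
            header = line[1:]
        elif line:
            chunks.append(line)
    if header is None:
        raise ValueError("no FASTA header found")
    return header, "".join(chunks)


def extract_fragment(sequence, start, end):
    """Slice a sequence using 1-based, inclusive domain coordinates."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("coordinates fall outside the sequence")
    return sequence[start - 1:end]


# Hypothetical toy record and coordinates, not the EBOV L protein.
record = ">toy_L_protein\nMAAAKKKDDD\nGGGSSS\n"
header, full_sequence = parse_fasta(record)
fragment = extract_fragment(full_sequence, 5, 10)
print(f">{header}_5-10\n{fragment}")
```

Keeping the coordinates in the new header makes it possible to map residues of the resulting model back to positions in the full-length protein.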
Mononegavirales L protein secondary structure-based multiple sequence alignment
Secondary structure-based multiple sequence alignments were built using the PROMALS3D web server [46]. For each EBOV species, one L protein sequence was randomly chosen with the exception of the Zaire EBOV, for which two sequences were chosen, one from the 1976 Yambuku-Mayinga outbreak and another one from the current outbreak.

Results and Discussion
Tertiary structure-based phylogeny of RdRps and RTs
RNA-dependent RNA polymerases have been used as evolutionary markers because of their presence in all RNA viruses and their several conserved regions and number of amino acids [10,38][47][48][49]. However, the high level of primary structure divergence among the different groups of RNA viruses has hindered their usefulness as a tool for obtaining insights into deep phylogenies and the evolutionary relationships between the different viral families.
A relatively recent and important alternative to the construction of primary structure-based phylogenies is the construction of evolutionary trees based on the comparison of tertiary structures. This method has proven to be particularly useful when trying to assess the evolutionary relationship between homologous proteins with high levels of sequence divergence [50]. The unrooted phylogeny we have constructed using this approach is shown in Fig 2 and is part of this trend. It exhibits several well-defined branches, each of them clustering one or two viral families. One of the branches groups the ss(+)RNA viruses of the families Picornaviridae and Caliciviridae. Double-stranded RNA viruses are not grouped into a single branch. There are two branches with only dsRNA viruses close to one another in the tree, one of which corresponds to the Birnaviridae family and the other to the Cystoviridae family. The fact that they are not grouped in one single clade might be due to the presence of a circular permutation in the Birnaviridae polymerase that alters the topology of the palm subdomain. A major clade groups dsRNA viruses of the family Reoviridae and ss(+) viruses of the family Leviviridae, with the latter diverging from the dsRNA branch. Another one clusters the ss(+)RNA viruses of the family Flaviviridae. In the tree shown in Fig 2, one branch groups ss(-)RNA viruses, i.e., the LaCrosse virus and the Orthomyxoviridae family polymerases. The VSV L protein structure is not included in this tree because its resolution (3.8 Å) does not meet the 3 Å threshold. As shown in Fig 2, the longest and most distant branch groups together the RTs, with the eukaryotic telomerase stemming close to the root of this clade. Phylogenetic trees of ss(-)RNA viruses using the most conserved regions of the RdRp [38,49] show that segmented negative-stranded RNA viruses are phylogenetically related and are grouped in a separate clade from the Mononegavirales order. 
The latter are a monophyletic group, and each family (Paramyxoviridae, Rhabdoviridae and Bornaviridae) has its own branch, with the exception of the family Filoviridae, which stems from the Pneumovirinae subfamily within the Paramyxoviridae node.
The monophyly of ss(-)RNA viral polymerases is supported by additional biochemical and structural data. Transcription and replication are similar in all ss(-)RNA viruses. First the mRNA, which is modified in a later stage, is synthesized using the genomic RNA as template. A complementary positive strand is also formed, which is used as the template for the synthesis of the genomic negative-sense RNA, which is then packed inside the virions. Other shared features of negative-strand RNA viruses are the fact that the polymerase template consists of a ribonucleoprotein complex in which the viral nucleoprotein is bound to the genomic RNA [4], and that the polymerases use a de novo initiation mechanism [15,17,23,35,36]. The mRNA of ss(-)RNA viruses must be capped to be recognized by the cellular protein synthesis machinery. Non-segmented RNA viruses have within the L protein a capping domain located bordering one of the faces of the RdRp domain forming the template channel. No structural homologs are known for this domain, and the capping mechanism is different from that of eukaryotic cells and segmented ss(-)RNA viruses. It works by the attack of a guanosine nucleotide on a histidine residue covalently bound to the 5' end of the RNA, i.e., it is said to have GDP polyribonucleotidyl transferase activity [40]. On the other hand, segmented RNA viruses carry a protein, which can be part of the polymerase protein (Arenaviridae and Bunyaviridae) or can be synthesized as a multi-domain protein (Orthomyxoviridae), that "steals" the caps from cellular mRNAs, which are then used as transcription primers. This process has been named cap-snatching [51,52].
The ss(-)RNA viral polymerases are big complexes endowed with many different functional domains. In the case of the Orthomyxoviridae family, the polymerase is a heterotrimeric complex formed by the proteins PA, PB1 and PB2. The PB1 protein contains the RdRp domain [17,23]. In the case of the LaCrosse orthobunyavirus [15], the polymerase complex is part of the L protein, a 2250 amino acid-long protein that includes at least three distinct functional domains: endonuclease, PA-C like, and RdRp domains [15]. Despite the different coding strategies and a lack of primary sequence homology, structural comparisons between the Orthomyxoviridae and the Bunyaviridae polymerases show a linear correlation between the complexes and homologous functional domains, i.e. PA and the N-terminus of the L protein; PB1 and the central region of the L protein; and PB2 and the C-terminus of the L protein.
Electron microscopy (EM) studies of the VSV polymerase have revealed certain differences in the overall structure and organization of the polymerases of non-segmented ss(-)RNA viruses [40] in comparison with those of segmented ss(-)RNA viruses [15,17,23]. The Mononegavirales L protein consists of five domains, i.e., the RdRp, capping, connector, methyltransferase and C-terminal domains. It has a number "6" shape, in which the bottom part is formed by the RdRp and the capping domain, and the top is formed by the connector, methyltransferase and C-terminal domains [40].
The different location of the capping enzymes (N-terminal in segmented ss(-)RNA viruses vs C-terminal in the Mononegavirales), the differences in the quaternary structure of the polymerases, and the distinct capping mechanisms indicate a ss(-) RdRp ancestor and later accretion events during which the complementary functional domains were independently acquired.
Different attempts to construct evolutionary trees of viral RdRps and RTs based on structural comparisons, including this work [10,53], yield similar but not identical results. This can be easily understood as the outcome of the different methodological approaches developed by Cerny et al. [10], Mönttinen et al. [53], and ourselves. Cerny et al. [10] used a dual method. First they made structural multiple alignments to improve a primary sequence-based alignment, followed by the construction of a matrix with the structural "phenotypic" features of viral RdRps and RTs, and then they combined the two results to construct an unrooted evolutionary tree. On the other hand, Mönttinen et al. [53] made automated comparisons of the available DNA- and RNA polymerase structures in order to construct a normalized geometrical distance matrix, which was in turn converted to an "evolutionary" distance matrix, which was then used to build a phylogenetic tree. For this last step, both Mönttinen et al. [53] and our group (Fig 2) used the FITCH algorithm, since other algorithms such as KITSCH assume that all the species included in the analysis are contemporary and that there is a molecular clock.
While the results of Cerny et al. [10], Mönttinen et al. [53] and our group share many similarities and exhibit some differences, it is equally significant that none of them is consistent with the Baltimore classification of RNA viruses [54]. In all three reports, polymerases from the ss(+) and dsRNA viruses are interspersed, and in the trees of Mönttinen et al. [53] and ours (Fig 2), there is one branch that includes polymerases of both Leviviridae and Reoviridae. The Qβ-phage, a ss(+)RNA virus of the family Leviviridae, is distant from all other ss(+)RNA viruses in the three phylogenies, forming one independent branch in the tree of Cerny et al. [10], and diverging from the Reoviridae family in the other two trees (Fig 2) [53]. In all three cases, the most distant and longest branch corresponds to reverse transcriptases, both viral and cellular. The recent availability of ss(-)RNA viral polymerases [15,17,23,40] allowed us to include them in the phylogenetic tree (Fig 2). Although the structure of the mononegaviral VSV polymerase has been recently reported [40], we have not included it in our tree due to its low resolution (3.8 Å).
A theoretical 3D structure of the Ebola virus RNA polymerase: a model for non-segmented single-stranded RNA viral polymerases
Due to the biomedical relevance of ss(-)RNA viruses and the lack of structural information on their RdRps, several attempts had been made to model the polymerase domain based on the homology with other RNA viruses whose polymerases had already been crystallized. A predicted model of an Arenavirus RdRp domain [38] built using the hepatitis C virus polymerase as a reference showed a remarkably similar structure to the rest of the RdRps and included the entire palm subdomain, fragments of the fingers subdomain, and the structural elements inserted between the palm, as well as the N-terminal region of the thumb subdomain. Later work by Hass et al. [37] demonstrated that the predicted structural model of the Arenavirus polymerase was correct, that several of the conserved residues located within the conserved structural motifs A-F are relevant for the proper function of the enzyme, and that mutations to most of these residues completely abolish the polymerase's catalytic reaction.
The current epidemic of the EBOV hemorrhagic fever is by far the biggest outbreak of this disease since its discovery in 1976. The high mortality rates ranging from 40 to almost 90% [55][56][57], combined with the lack of approved vaccines and effective treatments against the virus, have pushed the biomedical community in the search towards a better understanding of this pathogen.
EBOV is part of the family Filoviridae which, together with the families Rhabdoviridae, Paramyxoviridae and Bornaviridae, forms the Mononegavirales order. The members of this highly diverse group of viruses all have a linear, monopartite, negative strand RNA genome, share transcription and replication strategies, and exhibit a conserved arrangement of at least five genes that encode for nucleoprotein, phosphoprotein (VP35 in EBOV), matrix protein, glycoprotein and L protein [58]. The filoviruses have two additional proteins located between the glycoprotein and the L-protein, VP30, which is an essential cofactor for the Filoviridae mRNA synthesis [59], and VP24, which participates in nucleocapsid formation [60], viral assembly and budding [61] as well as in viral evasion from the host immune system [62].
The L protein of Ebola virus is a multifunctional protein about 2210 amino acids long, with a molecular weight of approximately 250 kDa, engaged in viral transcription, genome replication, mRNA capping, mRNA methylation and polyadenylation [63,64]. Poch et al. [65] identified six blocks with a high degree of conservation in the entire L protein of the five species of Mononegavirales available 25 years ago, and concluded that each block could be performing a particular function of the L protein. The work of Liang et al. [40] has shown that conserved blocks I-III are located in the RdRp domain; blocks IV and V are found within the capping domain; and conserved block VI is part of the methyltransferase domain.
The description of the VSV L protein structure at 3.8 Å resolution [40] has provided new insights into the overall three-dimensional arrangement of the Mononegavirales polymerase (vide supra), including several traits conserved with the previously characterized RdRps and RTs. Prompted by the lack of an atomic-resolution tertiary structure of the EBOV polymerase, we have developed a three-dimensional model of the RdRp domain of the EBOV L protein by using the PHYRE 2.0 web server and, with the addition of a Mononegavirales L protein secondary structure-based multiple sequence alignment, have identified conserved residues within the enzyme that might help in the design of specific drugs that could counteract the EBOV epidemic.
The best match yielded by the PHYRE 2.0 server to our EBOV polymerase sequence corresponds to the bat influenza A virus polymerase (PDB code 4WSB, chain B) [17]. Only 253 residues of the EBOV polymerase could be aligned, with a confidence level of 91.7%, and the identity between the fragments of the two proteins is 12% (S1 Fig). Although the structure of the ss(-)RNA LaCrosse virus polymerase is now available [15], the results of using it as a template for the alignment and model prediction were not encouraging. The sequence coverage was larger than in other alignments (358 residues aligned) and the identity was 13%, but the predicted model only ranked 15th, with a confidence level of 33.9%.
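The identity and coverage figures quoted above are simple ratios. As a minimal sketch (not PHYRE's actual implementation), the Python below computes percent identity over the gap-free columns of a pairwise alignment and expresses an aligned fragment as a fraction of the full-length L protein. The function names and the toy sequences are hypothetical; the 2210-residue length is the approximate L protein length given earlier in the text, and it may differ from the query length PHYRE uses for its own coverage figure.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where both sequences have a residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns without a gap ('-') in either sequence.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def coverage(aligned_residues: int, total_length: int) -> float:
    """Aligned residues as a percentage of a reference sequence length."""
    return 100.0 * aligned_residues / total_length


# Hypothetical toy alignment, for illustration only (not real EBOV data).
toy_query = "MGDNQ-LKT"
toy_template = "SGDDQALRT"
print(percent_identity(toy_query, toy_template))

# Aligned fragment sizes from the text, relative to the ~2210-residue L protein.
L_PROTEIN_LENGTH = 2210  # approximate length stated in the text
print(round(coverage(253, L_PROTEIN_LENGTH), 1))  # influenza template
print(round(coverage(358, L_PROTEIN_LENGTH), 1))  # LaCrosse template
```

Under these assumptions both templates cover only a small part of the L protein, in line with the model being restricted to a fragment of the RdRp domain.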
Our predicted three-dimensional model of the EBOV polymerase allowed the identification of the fingers, palm and thumb subdomains, arranged in the same sequential order as in DNA- and RNA polymerases, i.e., fingers-palm-fingers-palm-thumb. This is consistent with the recently reported structure of the VSV L protein [40]. The fingers include residues 417-439 and 489-563, the palm subdomain is formed by residues 440-488 and 563-666, and the thumb subdomain includes residues 667-704 (Fig 3A). Adding the secondary structure-based multiple sequence alignment, we were able to identify motifs A to F in our model, which are six of the conserved motifs in the RdRps and in the RTs crystallized so far [1,10,66]. We could not identify the recently proposed motifs G and H of segmented ss(-)RNA viruses [15], which are not part of the active site. The fragment of our predicted three-dimensional EBOV polymerase model is in excellent agreement with the recently published structure of the VSV L protein [40], in which the conserved palm subdomain with structural motifs A to F are observed.
The EBOV polymerase model presented here shows that motif A is formed by a β-strand followed by a ten-amino-acid loop (Fig 3B). Aspartic acid 483 of the EBOV L protein's motif A is conserved in all the Mononegavirales. This residue matches the catalytic amino acid and is located in the same position as in all the other viral RNA polymerases. It is followed by leucine, glutamate, lysine, tyrosine and asparagine. The lysine residue is conserved in both segmented and non-segmented ss(-)RNA viruses, and is part of the NTP entrance tunnel [17]. In spite of its high level of conservation, Hass et al. [37] showed that the polymerase maintained a normal level of proficiency when this residue was substituted with alanine. The lysine is followed by a residue with an aromatic ring (S2 Fig), which is conserved in both ss(+) and ss(-)RNA viral polymerases.
Nine amino acids after the catalytic aspartate, there is one strictly conserved arginine in the Mononegavirales order (S2 Fig). This residue is located in the fingers subdomain, relatively far from the active site, and might be involved in interactions with some of the other functional domains of the L protein, or with the proteins required for the transcription and replication processes. Motif A is found nested within conserved block III of Poch et al. [65].
Fig 3. Only the residues that could be aligned with a 90% confidence or higher are drawn. The color code is the same as in Fig 1; (B) Conserved structural motifs in the Ebola virus polymerase. The motifs are colored as in Fig 1: motif A, red; motif B, dark blue; motif C, green; motif D, magenta; motif E, cyan. The image has been amplified for a better view of the active site; (C) Secondary structure prediction of the Ebola virus RNA-dependent RNA polymerase. Only the fragment that could be confidently aligned according to the PHYRE results is shown. The color lines under the sequence match the three structural subdomains and are the same as in Fig 3A. The color frames surrounding the sequence match the conserved structural motifs depicted in Fig 1. Residues involved in metal ion coordination are highlighted in red; conserved residues involved in template-primer interactions are highlighted in blue; conserved residues likely participating in structural stability and motion are highlighted in yellow. doi:10.1371/journal.pone.0139001.g003
The EBOV polymerase model's motif B is formed by a loop followed by a long α-helix. The sequence of the loop and the N-terminus of the helix is GGIEGLQQKLWT. According to its position in the model, the third glycine corresponds to the conserved glycine of RdRps and RTs (vide supra). The position in our model of glutamine 564, lysine 565, tryptophan 567 and threonine 568 suggests that they might be involved in the interactions with the incoming nucleotide. These five residues are highly conserved in the Mononegavirales order with a few exceptions (S2 Fig). Their high level of conservation suggests that their interactions are required for the proper functioning of the enzyme, and their position in the EBOV RNA polymerase three-dimensional structure presented here, compared with the known viral RdRp crystal structures, hints that these conserved residues could be involved in the selection of ribonucleotides over dNTPs.
The predicted motif C has the characteristic structure β-strand-loop-β-strand, and its loop has one aspartate residue within the sequence MGDNQ that matches the second strictly conserved amino acid (Asp593) (Fig 3C). The model presented shows that the aspartate and the asparagine are in position to interact with the metal ions and complete the nucleotidyl transfer reaction. The Mononegavirales polymerase sequence has the tetrad GDNQ conserved amongst all its families, with the sole exception of the genus Novirhabdovirus, in which the glutamine has been substituted by a valine (S2 Fig). Directed mutagenesis in viruses belonging to other families of the Mononegavirales, i.e. Rhabdoviridae and Paramyxoviridae, has shown that mutations to the aspartate or the asparagine of motif C completely abolish the enzymatic activity [67][68][69]. The strict functional dependence on the asparagine of motif C has to be associated with differences in the active site architecture, which will have to be unraveled once the EBOV polymerase crystal is available.
The EBOV polymerase model presented here predicts that motif D is formed by an α-helix followed by a long loop (Fig 3A and 3B). The helical structure has a predominance of hydrophobic residues, which is consistent with its role as a structural scaffold, while the loop is formed by the sequence GIFLKPDET. The Mononegavirales secondary structure-based multiple alignment shows the presence of hydrophobic residues in the helical structure followed by a loop with the consensus sequence G-(L/H/I)-X-(L/I)-K-X2-E-T (S2 Fig). Glycine (Gly635), located at the N-terminal end of motif D's loop (Fig 3), corresponds to the glycine that has been shown to be conserved in many RdRps from ss(+), ss(-), dsRNA viruses and retroviruses [15,17,21]. Lysine 639, which is conserved in the Mononegavirales (Fig 3 and S2 Fig), may be the general acid identified in other RdRps that protonates the pyrophosphate leaving group. Finally, glutamic acid 642, which is also highly conserved in the Mononegavirales order (S2 Fig), might be participating in the interactions with the incoming nucleotide.
The EBOV polymerase motif E has the characteristic β-hairpin structure and the sequence FIYFGKKQYL (Fig 3C). In the Mononegavirales its conservation level is low compared with the rest of the motifs, with only the triad of residues (F/Y/M)-(G/S/N)-K exhibiting conservation levels above 60% (S2 Fig). As noted above, motif E has a conserved residue with an aromatic ring, which in the case of the EBOV model could be tyrosine 651 or phenylalanine 652. Although Poch et al. [65] identified within block III a conserved region that was named motif D, the structural alignment and the three-dimensional predicted model presented here show that it corresponds to structural motif E and the first residues of the thumb subdomain (vide infra) (Fig 3C and S2 Fig).
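The consensus patterns quoted above for motifs C, D and E can be written as regular expressions and checked against the EBOV fragments given in the text. This is only an illustrative sketch: the dictionary names are hypothetical, the patterns are transcribed from the consensus strings in the text rather than derived from the S2 Fig alignment, and a match on such short patterns says nothing about structural context.

```python
import re

# Consensus patterns transcribed from the text ("X" = any residue).
MOTIF_PATTERNS = {
    # GDNQ tetrad; the text notes Novirhabdovirus carries V instead of Q,
    # which this strict pattern deliberately does not accept.
    "C": re.compile(r"GDNQ"),
    # G-(L/H/I)-X-(L/I)-K-X2-E-T
    "D": re.compile(r"G[LHI].[LI]K.{2}ET"),
    # (F/Y/M)-(G/S/N)-K triad
    "E": re.compile(r"[FYM][GSN]K"),
}

# EBOV L protein fragments quoted in the text for each motif.
EBOV_FRAGMENTS = {
    "C": "MGDNQ",
    "D": "GIFLKPDET",
    "E": "FIYFGKKQYL",
}


def find_motif(motif: str, sequence: str):
    """Return (0-based start, matched text) of the first hit, or None."""
    hit = MOTIF_PATTERNS[motif].search(sequence)
    return (hit.start(), hit.group()) if hit else None


for name, fragment in EBOV_FRAGMENTS.items():
    print(name, find_motif(name, fragment))
```

In this sketch each quoted EBOV fragment contains its consensus pattern, with the motif D loop matching the nine-residue consensus over its full length.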
Our EBOV polymerase model lacks motif F. Nevertheless, analysis of the PHYRE 2.0 alignments with other polymerases, such as encephalomyocarditis virus 3Dpol and Sapporo virus RdRp, and of their three-dimensional structures allowed the identification of a region approximately 70 amino acids upstream of motif A that could correspond to motif F (Fig 3 and S2 Fig). The sequence of this region, FSLKEKELNVGRTFGK, has four basic residues. Three of these four basic residues, as well as phenylalanine, are conserved in the Mononegavirales polymerase (S2 Fig). Poch et al. [65] identified conserved block II and proposed that, due to the presence of several basic residues, it might be an RNA binding domain. Our work suggests that motif F corresponds to conserved block II of Poch et al. [65].
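The count of basic residues in the putative motif F region can be verified directly from the quoted sequence. The short Python sketch below does this; the function name is hypothetical, and treating only lysine and arginine as basic (leaving out histidine) is an assumption made here, which does not affect this sequence because it contains no histidine.

```python
# Residues treated as basic in this sketch (assumption: K and R only).
BASIC_RESIDUES = frozenset("KR")

# Putative motif F region of the EBOV L protein, as quoted in the text.
MOTIF_F_REGION = "FSLKEKELNVGRTFGK"


def basic_positions(sequence: str):
    """Return (1-based position within the fragment, residue) pairs."""
    return [(i, aa) for i, aa in enumerate(sequence, start=1)
            if aa in BASIC_RESIDUES]


hits = basic_positions(MOTIF_F_REGION)
print(len(hits), hits)
```

Positions are relative to the 16-residue fragment, not to the L protein numbering, which the text does not give for this region.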
The last residues of the EBOV polymerase that could be confidently aligned with other polymerases match the N-terminal helices of the thumb subdomain. Two helices were identified. The first one, which is closer to the active site, has two basic residues, one lysine and one arginine (Lys 668 and Arg 672), which are conserved in all the Mononegavirales (S2 Fig) and could be interacting with the primer strand in the polymerase active site. The second helix, which may have a structural stabilization role, is mainly hydrophobic and exhibits a higher variability without any conserved residue. Even though this predicted region is helical, the connectivity between the predicted helices does not match the motif H proposed by Cerny et al. [10].
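The subdomain boundaries reported for the model can be encoded as a small lookup table, which makes it easy to check where the residues discussed above fall. This is a sketch with hypothetical names; the ranges are copied from the text as given, including residue 563, which is listed in both the fingers and the palm ranges, so the function returns every subdomain whose range contains the residue.

```python
from typing import List

# Residue ranges of the EBOV RdRp model as stated in the text (inclusive),
# in sequence order: fingers-palm-fingers-palm-thumb.
SUBDOMAIN_RANGES = [
    ("fingers", 417, 439),
    ("palm", 440, 488),
    ("fingers", 489, 563),
    ("palm", 563, 666),  # 563 also closes the preceding fingers range
    ("thumb", 667, 704),
]


def subdomains_of(residue: int) -> List[str]:
    """Return all subdomains whose stated range contains the residue."""
    return [name for name, start, end in SUBDOMAIN_RANGES
            if start <= residue <= end]


# Residues named in the text.
for label, number in [("Asp483", 483), ("Asp593", 593), ("Lys639", 639),
                      ("Lys668", 668), ("Arg672", 672)]:
    print(label, subdomains_of(number))
```

With these ranges, the two catalytic aspartates and Lys639 fall in the palm, while Lys668 and Arg672 fall in the thumb, as described in the text.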
The EBOV polymerase model presented here lacks the priming loop present in RNA viral polymerases that use a de novo initiation mechanism, because the modelling technique we have used only exhibits the enzyme fragment with good confidence levels, and does not include the additional L protein functional domains. This is a strong limitation of our model, since in the ss(-) mononegaviral VSV polymerase structure recently published [40] the priming loop can be seen protruding from the cap domain into the active site.
We have included our EBOV polymerase model in the RdRps and RTs structure-based phylogenetic tree. As shown in

RNA-dependent RNA polymerases as therapeutic targets
An overwhelming majority of the recent emergent human epidemics are caused by RNA viruses [70], and in spite of the major advances achieved in the treatment of hepatitis C virus [2] and human immunodeficiency virus infections [71], as of today there is no specific drug designed to counteract several of these highly pathogenic diseases [72], including the EBOV infection with its high mortality rate. The fact that RNA-dependent RNA polymerization is an essential process for the viral cycle makes it a very attractive target for the development of antiviral drugs [20]. Indeed, most of the antivirals currently approved are drugs aimed at inhibiting the activity of this crucial enzyme [71], including Brincidofovir (CMX-001), Lamivudine and Favipiravir (T-705), which are being tested against EBOV and have proven to have antiviral activity in vitro or in vivo [73][74][75] by inhibiting the polymerase activity. Brincidofovir is a nucleoside phosphonate analog that inhibits DNA chain extension and has proven to be effective against the double-stranded DNA viruses of the families Adenoviridae, Poxviridae and Herpesviridae [76][77][78], and is already in the late stages of trials as an antiviral for the aforementioned pathogens [79,80].
Lamivudine is a nucleoside-analog reverse-transcriptase inhibitor (NRTI) that also acts as a chain terminator due to its lack of a 3'-hydroxyl end [81]. It has been used for many years as an antiretroviral drug in the treatment of chronic hepatitis B infection, although due to the resistance rates it is no longer a first-line drug [82]. It has also been employed against human immunodeficiency virus infections, usually as part of a multi-drug combination treatment [83]. During the current EBOV outbreak, it was reported that treatment with lamivudine early in the infection resulted in the cure of 13 out of 15 patients [84].
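The chain-termination logic described above can be illustrated with a deliberately simple toy model: extension proceeds only while the last incorporated nucleotide offers a 3'-hydroxyl for the next phosphodiester bond. The class and function names are hypothetical, and the sketch ignores base pairing, kinetics and competition with natural substrates; it only encodes the rule that an analog lacking a 3'-hydroxyl ends the chain.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Nucleotide:
    base: str
    has_3_hydroxyl: bool = True  # False models a chain-terminating analog


def extend(primer: List[Nucleotide], incoming: List[Nucleotide]) -> List[Nucleotide]:
    """Append incoming nucleotides until the 3' end can no longer be extended."""
    chain = list(primer)
    for nucleotide in incoming:
        # The next bond requires a 3'-hydroxyl on the current 3' end.
        if chain and not chain[-1].has_3_hydroxyl:
            break
        chain.append(nucleotide)
    return chain


primer = [Nucleotide("G")]
incoming = [Nucleotide("A"), Nucleotide("C", has_3_hydroxyl=False), Nucleotide("U")]
product = extend(primer, incoming)
print("".join(n.base for n in product))
```

In this toy run the analog is incorporated but nothing can be added after it, so the final nucleotide of the incoming pool is never used.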
Nucleotide/nucleoside analogues are drugs aimed at the active site of RNA- and DNA polymerases that compete with the natural substrates for incorporation into the nascent nucleic acid strands, and may act either as chain terminators or as mutagenic agents [93,94]. Protein crystal structures have played a key role in the development of new drugs, since they allow the visualization of the interactions that take place inside and between proteins, which in turn helps to unravel the atomic interactions that occur between an enzyme and its substrates. They have also been useful in determining which point mutations generate resistance to certain drugs [2,95-97].
No polymerase is endowed with absolute template- or substrate specificity [27,98-100], and the available crystal structures of complexes of DNA- and RNA polymerases with nucleotides or nucleotide analogues all exhibit similar binding mechanisms [30,101-105]. The incoming nucleotide has several interactions with key residues within the active site in order to be correctly positioned for the nucleophilic attack. The triphosphate moiety of the incoming nucleotide interacts with the strictly conserved aspartic acid residues of the palm domain's motifs A and C which are, in turn, interacting with two divalent metal ions. The coordination of this moiety is completed by interactions with basic residues of the fingers domain's motif F, which is present in both RdRps and RTs, but absent in DNA-dependent DNA polymerases. Instead, the analogous region in these enzymes is an α-helix, named helix O, which has several basic residues that point towards the active site and coordinate the triphosphate region of the incoming nucleotides [101]. The sugar moiety of the nucleotide interacts with residues in motif A and, in the case of RdRps, with residues in motifs A and B. As mentioned above, residues in motif A play a key role in the discrimination of the correct substrate. DNA-dependent DNA polymerases and RT have a residue with a bulky side chain such as glutamate, tyrosine or phenylalanine that serves as a steric barrier that prevents the incorporation of ribonucleotides into the nucleotide binding pocket [18]. The selection of ribonucleotides in RdRps is determined by the interactions of the 2' OH moiety with the second conserved aspartate in motif A and the conserved asparagine in motif B of ss(+) and dsRNA viruses. Finally, most of the interactions of the incoming nucleotide base moiety are made with the template and the primer bases.
The completion of the abovementioned interactions is a major determinant of the intrinsic fidelity of these enzymes. In many DNA polymerases fidelity is further enhanced by the presence of an exonuclease domain, which is clearly a later evolutionary addition [7][8][9].
A structural superposition of the EBOV RdRp with a foot-and-mouth disease virus polymerase bound to a template-primer RNA and ribavirin triphosphate (PDB code 2E9R) [30] was drawn with Chimera 1.8 using the palm subdomain as reference (Fig 5). The image shows that the EBOV RdRp central crevice has enough space to hold a dsRNA, and the interactions between the RNA and the protein are analogous to those previously observed in RdRps (Fig 5). The primer might be bound to residues located in the thumb subdomain and residues within the palm domain's motif E, while the template might be coordinated by residues located in the fingers subdomain. Moreover, the ribavirin triphosphate is located in the predicted active site, in which residues from the palm subdomain's motif B loop and the helix N-terminal region could be interacting with the base- and sugar moieties, and residues from motif A may be forming bonds with the sugar- and triphosphate moieties (S4 Fig). Even though previous work showed that ribavirin had no antiviral effect on animal models [106], our predicted model shows that this drug could fit into the EBOV polymerase NTP binding site, and that most of the interactions of NTPs with the protein could be formed. This supports the work of Morin et al. [36], which demonstrated that ribavirin has in vitro activity against the Mononegavirales RdRp.
Crystal structures of ss(-)RNA viral polymerases with NTPs at the active site are required in order to elucidate the exact mechanism of nucleotide binding in these enzymes. However, the high level of conservation of the palm subdomain, together with the similarity of the interactions within the active site in DNA- and RNA polymerases, helps to explain why some nucleotide analogues have broad antiviral spectra. Examples are favipiravir, which has proven to have antiviral activity against different types of RNA viruses, including ss(-) and ss(+) RNA viruses, and brincidofovir, which has antiviral activity against both DNA- and RNA viruses. Our predicted EBOV polymerase model indicates that this RdRp shares the same basic architecture and mechanism of action, including the structural motifs and some of the residues that participate in nucleotide binding. Therefore, drugs aimed at the active site of different types of polymerases (Fig 6), such as those mentioned above, might also interfere with the functionality of the EBOV RdRp, albeit with less specificity.

Conclusions
RNA-dependent RNA polymerases are encoded by RNA viruses from different families with a variety of genome organization and replication strategies. All known viral RNA polymerases are homologous monomeric enzymes. Whether this is true or not for all RNA viruses remains to be proven. The availability of more than twenty distinct viral RNA polymerase crystals of different RNA viruses and retroviruses reveals the characteristic right hand architecture typical of the superfamily of DNA- and RNA polymerases, with fingers, palm and thumb functional subdomains [4]. The palm subdomain is the catalytic subdomain, and is by far the most conserved region of single subunit DNA- and RNA polymerases [7,9].
Fig 5. Ebola virus RNA-dependent RNA polymerase predicted model bound to a template-primer RNA and ribavirin triphosphate. The figure is based on the structural superposition with foot-and-mouth disease virus polymerase using the palm subdomain as reference. The active site has been slightly amplified to allow a better visualization. The EBOV polymerase predicted model is in grey, the template strand is in yellow, the primer strand is in green and the ribavirin triphosphate is colored according to Chimera's elements palette. doi:10.1371/journal.pone.0139001.g005
Although attempts to group RNA viruses based on polymerase sequence data have been criticized [48], the pioneering analysis of animal and plant viruses by Kamer and Argos [107], Poch et al. [65] and others [108][109][110] led to the identification of shared conserved motifs and the recognition of the evolutionary relationships between ss(+), ss(-), dsRNA and retroviruses based on RdRp and RT sequences. In this work we have constructed a tertiary structure-based phylogeny that includes viral RdRps and RTs, as well as a eukaryotic telomerase (Fig 2). Our phylogeny exhibits an overall topology similar to those reported by Mönttinen et al. [53] and Cerny et al. [10], although the three trees are based on different assumptions. It has been argued that viral polymerases are not good phylogenetic markers [48]. However, the robustness of three-dimensional structure-based phylogenies is supported by the consistency of the results reported by Cerny et al. [10], Mönttinen et al. [53] and ourselves, in which the same basic topology and branch distributions are observed. These results indicate that three-dimensional structure-based phylogenies are an important alternative to the primary structure-based phylogenetic trees of RNA-based genetic systems. It is interesting to note that none of the trees is consistent with the Baltimore classification of RNA viruses [54], suggesting the polyphyly of changes in template organization, especially of double-stranded RNA genomes, because of their enhanced chemical stability.
We have also proposed here a three dimensional model of the EBOV RdRp (Fig 3) using homology-based structural prediction of the available amino acid sequences of the Ebola L protein based on the highly conserved and widely distributed motifs characteristic of the polymerase palm domain. These conserved motifs play a critical role in nucleotidyl transfer reaction, ribonucleotide binding, and in the conformational changes of the enzyme. Our predicted fragment of the EBOV polymerase is in agreement with the recently reported structure of the VSV polymerase [40]. The approach we have developed is comparable to the calculation of a model of an Arenavirus RNA polymerase using the hepatitis C viral RNA polymerase reported by Vieth et al [38]. Using as a scaffold the recently reported crystal structure of the bat influenza A viral polymerase, we have developed an in silico model of the spatial distribution of a 253-amino acid residue data set of the Ebola virus RNA polymerase with a 92% certainty.
A multiple alignment based on secondary structure prediction of the negative single-stranded RNA viruses of the Mononegavirales order (S2 Fig) allowed us to identify several conserved residues, not only in this group of viruses but also in ss(+), ss(-), dsRNA viruses and reverse transcribing viruses. As summarized in this work, our model includes the A-E conserved structural motifs described in other viral RdRps that define the highly conserved right-hand catalytic palm subdomain, as well as portions of the fingers and thumb subdomains. The conserved structural similarity of the EBOV polymerase palm subdomain with the viral and cellular DNA polymerases proposed here is consistent with the hypothesis that it is one of the oldest identifiable structural domains present in extant viruses and cells [7][8][9]. The monophyletic origin of all the monomeric polymerases analyzed here has important implications for our understanding of the origin and evolution of mobile genetic elements.
The crystal structures of complexes of DNA- and RNA polymerases with nucleotides or nucleotide analogues show that very similar binding mechanisms are involved [55,101-105]. The incoming nucleotide has several interactions with key residues within the active site in order to be correctly positioned for the nucleophilic attack. The work presented here helps to understand the current use and apparent success of antivirals, i.e. Brincidofovir, Lamivudine and Favipiravir, originally aimed at other types of polymerases, to attack the Ebola virus infection. The strong conservation of the EBOV polymerase functional sites discussed here on the basis of its three-dimensional structure explains the action of these replication inhibitors originally designed for DNA and distinct RNA viruses, and may assist in the search of new therapeutic agents against these subcellular pathogens.
The residues predicted to form helical structures are in red; the residues predicted to form β structures are in blue. Only the residues that could be matched to the bat influenza A virus in the Ebola virus polymerase are shown (vide supra). The lines below the multiple alignment match the polymerases subdomains: fingers subdomain, yellow; palm subdomain, green; thumb subdomain, red. The colored frames correspond to the RNA-dependent RNA polymerases conserved structural motifs: motif A, red; motif B, blue; motif C, green; motif D, magenta; motif E, cyan; motif F, orange. The catalytic aspartic acid residues are highlighted in red. The residues with a high degree of conservation in the Mononegavirales are highlighted in yellow.