Phylogenetic Analysis of the Endoribonuclease Dicer Family

Dicers are ribonuclease III family proteins that process dsRNA and are involved in the post-transcriptional regulation of gene expression. Dicers are conserved from basal to higher metazoans and contain a number of functional domains that interact with dsRNA. The completed genome sequences of over 34 invertebrate species allowed us to systematically investigate Dicer genes across a diverse range of phyla. The majority of invertebrate Dicers clearly fell into the Dicer1 or Dicer2 subfamilies. Most nematodes possessed only one Dicer gene, a member of the Dicer1 subfamily, whereas both Dicer1 and Dicer2 genes were present in all platyhelminths surveyed. Analysis of the key domains showed that the 5′ pocket was conserved across members of the Dicer1 subfamily, with the exception of the nematode Bursaphelenchus xylophilus. Interestingly, Nematostella vectensis DicerB, which grouped into the Dicer2 subfamily, harbored a 5′ pocket, a feature commonly present in Dicer1. Similarly, the 3′ pocket was conserved in all Dicer proteins except Schmidtea mediterranea Dicer2 and Trichoplax adhaerens DicerA. Loss of catalytic residues in the RNase III domain was noted in platyhelminths and cnidarians, and the ‘ball-and-socket’ junction between the two RNase III domains of platyhelminth Dicers differed from the canonical junction, suggesting the possibility of different conformations. These data suggest that Dicers may have duplicated and diversified independently and evolved various functions in invertebrates.


Introduction
Small regulatory RNA pathways are highly conserved mechanisms present in most eukaryotic organisms and play an important role in post-transcriptional gene regulation. MicroRNAs (miRNAs) and short interfering RNAs (siRNAs) regulate genes mainly through translational repression or degradation of cytoplasmic mRNAs by an RNA-induced silencing complex (RISC). The miRNA and siRNA pathways share a common RNase III processing enzyme, Dicer, which together with other proteins constitutes RISC for post-transcriptional gene silencing [1]. Dicer recognizes a hairpin (in pre-miRNA) or long double-stranded RNA (dsRNA) and processes it into miRNA-miRNA* duplexes or siRNA duplexes [2]. These small RNA duplexes are converted to a single-stranded form and loaded onto Argonaute (AGO), a key component of RISC, through a process coordinated by Dicer and other RNA-binding proteins [3]. The small RNAs then target specific mRNA sequences, leading to their cleavage or translational repression [4].
Dicer proteins are present in many eukaryotic organisms, including plants, fungi and metazoans [5,6]. Vertebrates and most nematodes have only one Dicer gene (Dicer1), whereas insects and flatworms possess two (Dicer1 and Dicer2). Dicers typically contain several functional domains: an N-terminal DEAD box, an RNA helicase domain, a Dicer dimerization domain, a Piwi-Argonaute-Zwille (PAZ) domain, two RNase III domains and a dsRNA-binding domain [7,8]. The crystal structure of Dicer from Giardia intestinalis revealed that the PAZ domain binds the 3′ terminus of dsRNA [9]. After the 3′ end binds to the PAZ domain, pre-miRNAs or dsRNAs are cleaved by the two RNase III domains, which form a single dsRNA-processing center through intramolecular dimerization [10]. In Dicer1, binding of the PAZ domain to the 3′ terminus of pre-miRNA is crucial for orienting the RNase III domains for cleavage; however, recent publications have revealed that recognition of the 5′ terminus of pre-miRNAs is also important for mature miRNA synthesis [11,12].
Previous studies have focused on Dicers of plants and model organisms, and little is known about Dicers of other invertebrates. The recent availability of genome sequences of over 34 invertebrate species from 10 phyla, including 1 choanoflagellate, 2 cnidarians, 1 placozoan, 2 annelids, 1 mollusc, 7 platyhelminths, 7 nematodes, 10 arthropods, 1 echinoderm and 3 chordates, has allowed us to perform an extensive phylogenetic analysis of Dicers.

Acquisition of sequences
For some well-annotated genomes, Dicer sequences were retrieved directly from the databases. In addition, BLASTP and TBLASTN searches were performed against each species' database using Drosophila melanogaster Dicers (NP_524453 and NP_523778), Caenorhabditis elegans Dicer (NP_498761) or Schistosoma mansoni Dicers (Smp_169750.1 and Smp_033600) as queries. An E-value cutoff of 1e-10 was used in the BLAST searches, and hits were filtered to keep only those with at least 25% identity to the query sequence. Protein functional domains were identified using the Pfam and SMART databases [13,14]. The species names, abbreviations and accession numbers are provided in Table 1.
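The hit-filtering step described above can be sketched in Python. This is a minimal illustration, not the authors' pipeline: it assumes the searches were exported in BLAST+ tabular format (`-outfmt 6`, whose standard 12 columns put percent identity third and the E-value eleventh), reads the E-value cutoff as 1 × 10⁻¹⁰ and applies the 25% identity threshold to the percent-identity column. The function names and the subject IDs in the example rows are hypothetical.

```python
"""Filter BLAST hits by E-value and percent identity (illustrative sketch).

Assumes BLAST+ tabular output (-outfmt 6) with the standard 12 columns:
query, subject, pident, length, mismatch, gapopen,
qstart, qend, sstart, send, evalue, bitscore
"""
from typing import Iterable, List, NamedTuple

EVALUE_CUTOFF = 1e-10   # E-value threshold (assumed reading of the text)
MIN_IDENTITY = 25.0     # minimum percent identity to the query


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float
    evalue: float


def parse_outfmt6(lines: Iterable[str]) -> List[Hit]:
    """Parse tab-separated BLAST rows, skipping blank and comment lines."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        hits.append(Hit(cols[0], cols[1], float(cols[2]), float(cols[10])))
    return hits


def filter_hits(hits: List[Hit],
                evalue_cutoff: float = EVALUE_CUTOFF,
                min_identity: float = MIN_IDENTITY) -> List[Hit]:
    """Keep hits with E-value <= cutoff and identity >= min_identity."""
    return [h for h in hits
            if h.evalue <= evalue_cutoff and h.pident >= min_identity]


if __name__ == "__main__":
    # Made-up rows; subject IDs are hypothetical placeholders.
    rows = [
        "NP_524453\tsubjA\t31.2\t1800\t900\t40\t1\t1800\t5\t1790\t2e-150\t520",
        "NP_524453\tsubjB\t22.0\t400\t280\t12\t10\t400\t3\t390\t3e-12\t60",
        "NP_524453\tsubjC\t40.5\t120\t60\t2\t1\t120\t1\t120\t5e-06\t45",
    ]
    for h in filter_hits(parse_outfmt6(rows)):
        print(h.subject)  # only subjA passes both thresholds
```

Here subjB is dropped for low identity and subjC for a weak E-value; whether the boundary values themselves count as passing (`<=` and `>=`) is an assumption.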

Sequence alignment and phylogenetic analysis
The data sets contained a total of 58 sequences from 34 species, ranging in length from 565 to 2769 amino acids (Text S1). The amino acid sequences of Dicer were aligned with MUSCLE [15] using default parameters and manually optimized in Jalview 2.8 [16]. The alignments were then processed with Gblocks v0.91b [17] for phylogenetic reconstruction, with gap positions allowed in up to half of the sequences. ProtTest 3.2 was used to select an appropriate amino acid substitution model for tree building [18]. A maximum-likelihood tree was constructed with PhyML 3.0 [19]. Clade support was assessed using the SH-like approximate likelihood ratio test, the Bayes likelihood test and bootstrap proportions (500 replicates).
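Two of the steps above can be illustrated with a short Python sketch under simplifying assumptions. `drop_gappy_columns` mimics only the gap criterion (a column is kept when at most half of the sequences have a gap there); Gblocks itself also applies conservation, flank and block-length rules, so this is not a substitute for it. `bootstrap_replicate` draws one nonparametric bootstrap pseudo-replicate by sampling alignment columns with replacement; in a bootstrap analysis a tree is inferred from each replicate, and a clade's support is the proportion of replicate trees that contain it. Both function names and the toy alignment are hypothetical.

```python
import random
from typing import Dict


def drop_gappy_columns(aln: Dict[str, str],
                       max_gap_fraction: float = 0.5) -> Dict[str, str]:
    """Keep columns where at most max_gap_fraction of sequences have a gap.

    Simplified stand-in for the gap criterion only; not Gblocks.
    """
    names = list(aln)
    length = len(aln[names[0]])
    if any(len(seq) != length for seq in aln.values()):
        raise ValueError("sequences must be aligned (equal length)")
    keep = [
        i for i in range(length)
        if sum(aln[n][i] == "-" for n in names) / len(names) <= max_gap_fraction
    ]
    return {n: "".join(aln[n][i] for i in keep) for n in names}


def bootstrap_replicate(aln: Dict[str, str],
                        rng: random.Random) -> Dict[str, str]:
    """Build one pseudo-replicate by resampling columns with replacement."""
    names = list(aln)
    length = len(aln[names[0]])
    cols = [rng.randrange(length) for _ in range(length)]
    return {n: "".join(aln[n][i] for i in cols) for n in names}


if __name__ == "__main__":
    toy = {"a": "MK-LV", "b": "MKALV", "c": "M--LV", "d": "MK--V"}  # hypothetical
    trimmed = drop_gappy_columns(toy)
    print(trimmed)  # {'a': 'MKLV', 'b': 'MKLV', 'c': 'M-LV', 'd': 'MK-V'}
    rng = random.Random(42)  # fixed seed for reproducible replicates
    replicates = [bootstrap_replicate(trimmed, rng) for _ in range(500)]
    print(len(replicates))  # 500 pseudo-replicate alignments
```

The third toy column is removed because three of four sequences have a gap there. Tree inference on each replicate would be done by the phylogenetics program (PhyML in the text), not by this code.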

Identification and distribution of Dicer genes across invertebrates
The final data sets contained 58 Dicer gene sequences from two cnidarians, one placozoan, two annelids, one mollusc, seven platyhelminths, seven nematodes, ten arthropods, one echinoderm and three chordates (Table 1). No Dicer homologues were identified in the choanoflagellate Monosiga brevicollis. Our genomic database searches revealed that the two annelids, the mollusc, the echinoderm and the three chordates investigated each possessed only one Dicer1 gene. Each nematode had only one Dicer1 gene, except Trichinella spiralis, which possessed both Dicer1 and Dicer2 genes. Platyhelminths and arthropods possessed two Dicer genes in their genomes, with the exceptions of Daphnia pulex (three genes), Pediculus humanus corporis (one gene) and Echinococcus multilocularis (three genes).

Phylogenetic analysis of Dicers
As shown in the maximum-likelihood tree (Fig. 1), invertebrate Dicers grouped into two lineages: the Dicer1 subfamily and the Dicer2 subfamily. Almost all of the arthropods and platyhelminths surveyed possessed one member of each subfamily, whereas the annelids, the mollusc, most nematodes, the echinoderm and the chordates investigated had only one Dicer gene, belonging to the Dicer1 subfamily. The placozoan Trichoplax adhaerens had the most Dicer genes of the species investigated; however, all five were classed into the Dicer2 subfamily. The two cnidarians, N. vectensis and Hydra magnipapillata, each had only one Dicer2 gene but possessed other Dicer genes that fell outside the two subfamilies.

Organization of functional domains of the Dicer family
We identified the functional domains using the Pfam database and confirmed each inferred domain using the SMART database. As shown in Fig. 1, Dicers showed considerable variability in domain organization [10]. Notably, Taenia solium Dicer2 possessed only one RNase III domain. We also observed loss of the DEAD domain, which contains two RecA-like domains as a catalytic core and can regulate various processes involving RNA [20], in Dicer1 of molluscs, annelids, platyhelminths and most arthropods.
The PAZ domain is an RNA-binding module found in PPD proteins (PAZ and Piwi domain proteins) and Dicer orthologs; it anchors the 2-nucleotide 3′ overhang of dsRNA in its highly conserved binding pocket [10]. Searching annotated domains in the Pfam and SMART databases, we did not find a PAZ domain in Dicer2 of the platyhelminths Schmidtea mediterranea, Hymenolepis microstoma, T. solium, Echinococcus granulosus and E. multilocularis, the placozoan T. adhaerens or the nematode T. spiralis. There are two possibilities: the sequences are too divergent to be recognized, or the PAZ domain was lost during evolution. We therefore aligned the key amino acid residues of the PAZ domain in the Dicers of these species. As shown in Fig. 2a, most of the key PAZ residues were conserved in the Dicer2 sequences except those of S. mediterranea and T. adhaerens, indicating that the PAZ domain is absent in these two species.
Recent studies have revealed that human Dicer anchors not only the 3′ end but also the 5′ end of pre-miRNA, and that 5′-end recognition by Dicer is important for the precise and efficient biogenesis of miRNAs [12]. A previous study revealed that the 5′-binding residues (Arg778, Arg780 and Arg811 within the N-terminal extension of the PAZ domain, and Arg996 and Arg1003 within the PAZ domain) were conserved across invertebrate Dicer1 and absent in other Dicers [12]. However, we found that these five key residues were present in N. vectensis DicerB, which was classed into the Dicer2 subfamily (Fig. 2b).
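Checking whether specific residues, such as the five 5′-pocket arginines numbered by human Dicer, are conserved in another Dicer amounts to mapping reference residue numbers onto alignment columns. The Python sketch below shows one way to do this for a pair of rows taken from a multiple alignment; the function names and the toy sequences are hypothetical, and the check considers only residue identity at each aligned position.

```python
from typing import Dict, Optional

# 5'-pocket arginines, numbered by human Dicer, as listed in the text.
FIVE_PRIME_POCKET: Dict[int, str] = {778: "R", 780: "R", 811: "R", 996: "R", 1003: "R"}


def residue_to_column(gapped_ref: str, residue_number: int, ref_start: int = 1) -> int:
    """Return the 0-based alignment column holding a reference residue number."""
    count = ref_start - 1
    for col, aa in enumerate(gapped_ref):
        if aa != "-":
            count += 1
            if count == residue_number:
                return col
    raise ValueError(f"residue {residue_number} not covered by the aligned reference")


def residues_at_sites(gapped_ref: str, gapped_query: str,
                      sites: Dict[int, str], ref_start: int = 1) -> Dict[int, Optional[str]]:
    """Map each reference site to the query residue in the same column (None for a gap)."""
    if len(gapped_ref) != len(gapped_query):
        raise ValueError("aligned rows must have equal length")
    found: Dict[int, Optional[str]] = {}
    for pos in sites:
        aa = gapped_query[residue_to_column(gapped_ref, pos, ref_start)]
        found[pos] = None if aa == "-" else aa
    return found


def all_conserved(found: Dict[int, Optional[str]], sites: Dict[int, str]) -> bool:
    """True if every site carries the expected reference residue."""
    return all(found[pos] == expected for pos, expected in sites.items())


if __name__ == "__main__":
    # Toy fragment whose first residue is human residue 776 (sequences are made up).
    ref = "QAR-GR"
    query = "QSRKGK"
    sites = {778: "R", 780: "R"}
    found = residues_at_sites(ref, query, sites, ref_start=776)
    print(found, all_conserved(found, sites))  # {778: 'R', 780: 'K'} False
```

In practice the gapped rows would come from a full-length alignment such as the MUSCLE alignment described in the Methods; `ref_start` lets the sketch work on a fragment whose first residue is not residue 1. The same mapping applies to the RNase III catalytic residues and ‘ball’/‘socket’ positions discussed below.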
After dsRNA binds to the 5′ and 3′ pockets, the two RNase III domains of Dicer cleave the target molecule. An alignment of Dicer RNase III domains showed that the catalytic core was highly conserved in most invertebrates (Fig. 3). However, Schistosoma mansoni Dicer1, S. mediterranea Dicer1, T. solium Dicer2 and E. multilocularis Dicer RNC3.1 showed variations in this key region (Fig. 3). Compared with those of platyhelminths, the RNase III domains of cnidarians appeared more divergent, and most of the key residues were altered in H. magnipapillata DicerC (Fig. 3), suggesting a possible loss of dsRNA cleavage ability. During cleavage of target molecules, the two RNase III domains of Dicer form a tight dimer with a hydrophobic subunit interface [9]. A crystal structure showed that two Aquifex aeolicus RNase III proteins, each possessing only one RNase III domain, formed such a tight dimer. A total of 128 hydrophobic interactions (<4.0 Å) were found between the two molecules, whereas only 20 hydrogen bonds/salt bridges existed at the dimer interface. In the dimer, two identical “ball-and-socket” junctions were formed, one at each end of the interface. The ‘ball’ was the hydrophobic side chain of Phe41, and the ‘socket’ was a cavity formed by the side chains of Val52, Val56, Leu67, Ser68 and Lys71 [9]. Subsequent studies showed that Met1317 in the RNase IIIa domain of human Dicer occupies the position of the ‘ball’ residue, and that the corresponding ‘socket’ residues, Thr1717, Tyr1721, Leu1732, Thr1733 and Arg1736, are located in the RNase IIIb domain [21]. Interestingly, we found that the ‘ball’ residue in the RNase IIIa domain of platyhelminth Dicer1 was replaced by threonine, a hydrophilic amino acid, whereas the corresponding ‘socket’ residues were still conserved (Fig. 4).
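The hydrophobic-versus-hydrophilic contrast at the ‘ball’ position can be made concrete with a hydropathy scale. The sketch below uses the Kyte-Doolittle scale, which is our choice for illustration (the text does not name a scale), and labels a residue hydrophobic when its index is positive. It compares the ‘ball’ residues named above (Phe41 in A. aeolicus RNase III and Met1317 in human Dicer) with the threonine found at that position in platyhelminth Dicer1; the `classify` helper is hypothetical.

```python
# Kyte-Doolittle hydropathy index (positive values = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def classify(aa: str) -> str:
    """Label a one-letter residue as hydrophobic (index > 0) or hydrophilic."""
    return "hydrophobic" if KYTE_DOOLITTLE[aa.upper()] > 0 else "hydrophilic"


if __name__ == "__main__":
    ball_residues = {
        "A. aeolicus RNase III Phe41": "F",
        "Human Dicer RNase IIIa Met1317": "M",
        "Platyhelminth Dicer1 ball position": "T",
    }
    for label, aa in ball_residues.items():
        print(f"{label}: {aa} ({KYTE_DOOLITTLE[aa]:+.1f}, {classify(aa)})")
```

A sign cut-off is crude (glycine, at -0.4, counts as hydrophilic here), so this only illustrates the direction of the Met/Phe-to-Thr change; it does not replace the structural analysis.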

Discussion
In our study, the number of Dicer genomic loci varied from one in several invertebrates to five in T. adhaerens. Invertebrate Dicers were clearly classed into two subfamilies, Dicer1 and Dicer2, except for several cnidarian Dicers. Our results support a model of Dicer evolution in which Dicers may have duplicated independently in different eukaryotic lineages. Interestingly, Dicer2 of E. multilocularis may have undergone duplication after speciation. Mature miRNAs have been identified in all the invertebrates investigated in our study with the exceptions of the choanoflagellate M. brevicollis and the placozoan T. adhaerens [22]. Similarly, we failed to find genes encoding Dicer or other RISC proteins in the genome of M. brevicollis, one of the closest known relatives of metazoans. However, T. adhaerens, one of the simplest known metazoans, possessed five Dicer genes, all of which belonged to the Dicer2 subfamily. These Dicers may constitute an antiviral defense mechanism, as placozoans are exposed to a high viral load [23].
Both cnidarians possessed multiple Dicers, but only N. vectensis DicerB and H. magnipapillata DicerB were classed into the Dicer2 subfamily, while the others fell outside the two recognized subfamilies. Recent analyses have revealed that cnidarians express species-specific miRNAs and share few miRNA families with Bilateria [24,25]. These distinct cnidarian Dicers may provide clues to the biogenesis of species-specific miRNAs.
Recognition and cleavage of dsRNA by Dicer is a core step in the miRNA and siRNA pathways. The 3′ pocket of Dicer binds the 3′ end of dsRNA. The absence of key residues of this pocket in S. mediterranea Dicer2 and T. adhaerens DicerA could abolish this binding ability. However, these two Dicer2 proteins may still function with the help of other RNA-binding proteins, such as Drosha [26]. The 5′ pocket is positioned close to the 3′ pocket on the same surface of Dicer1, and its binding residues are conserved [12]. Interestingly, we found most of the key binding residues of the 5′ pocket, previously found only in Dicer1, in N. vectensis DicerB, which belongs to the Dicer2 lineage. This suggests that N. vectensis DicerB may also retain Dicer1-like activities.
Figure 4. The residues of the “ball-and-socket” junction. The ‘ball’ residue within the RNase IIIa domain is marked by a black asterisk, and the ‘socket’ residues within the RNase IIIb domain are marked by red asterisks. Gaps are shown as question marks (?). doi:10.1371/journal.pone.0095350.g004
After dsRNA recognition by the conserved pockets, the dsRNA is cleaved by the two RNase III domains [27]. Loss of the catalytic aspartate residue in E. multilocularis Dicer RNC3.1 could reduce its catalytic activity. However, Dicer RNC3.2, a paralogue of Dicer RNC3.1, possessed all the key residues and may therefore compensate for the reduced activity of Dicer RNC3.1. Dimerization of the RNase III domains creates a catalytic valley that can accommodate a dsRNA substrate. The two “ball-and-socket” junctions may be partly responsible for the accurate positioning of the catalytic residues in the valley. A