Assessing Whether Alpha-Tubulin Sequences Are Suitable for Phylogenetic Reconstruction of Ciliophora with Insights into Its Evolution in Euplotids

The current understanding of ciliate phylogeny is mainly based on analyses of a single gene, the small subunit ribosomal RNA (SSU-rDNA). However, phylogenetic trees based on single gene sequence are not reliable estimators of species trees, and SSU-rDNA genealogies are not useful for resolution of some branches within Ciliophora. Since congruence between multiple loci is the best tool to determine evolutionary history, we assessed the usefulness of alpha-tubulin gene, a protein-coding gene that is frequently sequenced, for ciliate phylogeny. Here, we generate alpha-tubulin gene sequences of 12 genera and 30 species within the order Euplotida, one of the most frequently encountered ciliate clades with numerous apparently cosmopolitan species, as well as four genera within its putative sister order Discocephalida. Analyses of the resulting data reveal that: 1) the alpha-tubulin gene is suitable phylogenetic marker for euplotids at the family level, since both nucleotide and amino acid phylogenies recover all monophyletic euplotid families as defined by both morphological criteria and SSU-rDNA trees; however, alpha-tubulin gene is not a good marker for defining species, order and subclass; 2) for seven out of nine euplotid species for which paralogs are detected, gene duplication appears recent as paralogs are monophyletic; 3) the order Euplotida is non-monophyletic, and the family Uronychiidae with sequences from four genera, is non-monophyletic; and 4) there is more genetic diversity within the family Euplotidae than is evident from dargyrome (geometrical pattern of dorsal “silverline system” in ciliates) patterns, habit and SSU-rDNA phylogeny, which indicates the urgent need for taxonomic revision in this area.


Introduction
Current studies on the relationships within the phylum Ciliophora are almost exclusively based on SSU-rDNA phylogenies [1][2][3][4][5][6][7]. These single gene analyses provided resolution for a number of important questions on the phylogenetic relationships within this group, but there are problems. The overall picture emerging from these studies confirmed the monophyly of most classes defined by morphological criteria; however, relationships among these classes vary with different taxon sampling (for example [3,[8][9][10]). Moreover, while some previous investigations based on SSU-rDNA alone resolved assignments of some taxa with ambiguous morphological classification (for example [4,11,12]), relationships within some orders/families containing a large number of taxa remain problematic [6,13].
In recent years, other gene markers, including LSU-rDNA gene, ITS region, tubulins, phosphoglycerate kinase, actin, DNA Polymerase a, Hsp 70, etc., have been used to reconstruct ciliate phylogenies [14][15][16][17][18][19][20][21][22]. Traditionally, protein gene markers are considered more suitable alternatives to SSU-rDNA than LSU-rDNA gene and ITS region for two reasons. First, protein markers are less sensitive to differences in compositional bias, which can lead to artifacts in tree construction [23][24][25]. In addition, sequences of multiple unlinked loci have different histories, as opposed to linked SSU-rDNA, LSU-rDNA, ITS regions, and are necessary to estimate species trees [26]. However, protein-coding genes can possess paralogs that might bias phylogenetic trees [27,28]. Among these protein-coding genes, alpha-tubulin is one of the mostly used gene makers for ciliated phylogeny [8,11,13,15,20,21,29], and its duplication in ciliates has been previously studied only with sparse taxon sampling [15,30]. Therefore, alpha-tubulin is a promising candidate for testing whether protein-coding genes are suitable for phylogeny construction of ciliates.
Previous alpha-tubulin phylogenies showed that most classes could be well distinguished with high support [8,15,20], while subclasses appeared to be non-monophyletic [13,21,29]. In our recent study [13], we characterized alpha-tubulin gene from 15 genera covering all families of the order Urostylida, but we were unable to determine if alpha-tubulin gene is suitable for classification of lower level taxa since urostylid families are not well defined morphologically [31,32] and their monophyly is rejected by both SSU-rDNA and alpha-tubulin phylogenies. Therefore, a group with well-defined families or genera is needed to test the ability of alpha-tubulin to resolve phylogeny for lower level taxa. The order Euplotida, one of the most frequently encountered ciliate clades with numerous putatively cosmopolitan species [33][34][35][36][37], is a good choice because most morphological families within this order are recovered robustly in SSU-rDNA analyses [38][39][40].
Here, we increase sampling of alpha-tubulin gene sequences from 30 taxa within the order Euplotida, including multiple morphospecies from four out of five families, as well as four species of its putative sister order Discocephalida. Our main aims are to: 1) assess the suitability of alpha-tubulin for circumscribing lower level taxa; 2) estimate phylogenetic relationships within the order Euplotida using two-gene combined, SSU-rDNA and alphatubulin trees; and 3) characterize patterns of molecular evolution among euplotid alpha-tubulin paralogs.

Different Species with Same Amino Acid Sequences
Although species contained diverse alpha-tubulin sequences, we found that some euplotid species (e.g. Uronychia multicirrus and U. sinica; Euplotes sp.-GZJJM2009121510, Euplotoides parawoodruffi and Euplotopsis sp.-GZJJM2009121508; Euplotopsis encysticus and Euplotes cf. antarcticus) share identical amino acid sequences, revealing the high level of functional constraint on this protein (Fig. S1). Alpha-tubulin gene sequences of U. multicirrus and U. sinica are different from each other at 56 sites, all of which are third codon positions (Fig. S1). Similarly, 69 residues, 3 first codon positions and 66 third codon positions, are different between alpha-tubulin gene sequences of Euplotopsis encysticus and Euplotes cf. antarcticus (Fig. S1). There are totally 124 polymorphic nucleotide sites between Euplotes sp.-GZJJM2009121510, Euplotoides parawoodruffi and Euplotopsis sp.-GZJJM2009121508. And among these sites, 11 are in first codon positions, only one is in second codon position, and the remaining 112 sites are in third codon positions (Fig. S1).

Intraspecific Variation
Multiple clones have been sequenced from ten species and two populations of the morphospecies Diophrys parappendiculata and Euplotes sinicus are sampled (Table 1). No paralogs are detected in two populations of D. parappendiculata. Only one sequence was found in E. sinicus population II, while two paralogs are present in population I, with the first being identical to the sequence of population I. ( Table 2).
Relationships among species in the family Euplotidae do not always corresponding to dargyrome patterns or natural habitats. Within the order Discocephalida, two families (viz. Pseudoamphisiellidae and Discocephalidae) are included. The Pseudoamphisiellidae (Pseudoamphisiella and Leptoamphisiella) form a monophyletic group as do the Discocephalidae (Discocephalus and Prodiscocephalus), indicating the monophyly of these two families. However, sister relationship between these two families is never recovered (Figs. 2, S2, S3).  2) and Atub_aa (Fig. S2), respectively; however, they are always in Aspidisca-clade.
Nodes of two-gene combined tree (Fig. 3) are better supported than SSU-rDNA tree (Fig. S4). There are 43 and 41 supported nodes (Bootstrap values .50%) in two-gene combined tree (Fig. 3) and SSU-rDNA tree (Fig. S4), respectively. Among them, support values of 28 nodes for these two trees are more than 90%, but more nodes are fully supported in two-gene combined tree (13 for combined tree vs. 10 for SSU-rDNA tree). Moreover, the two-gene combined tree (Fig. 3) posses more monophyletic taxa as predicted by morphology (e.g. spirotrichean subclasses and euplotid families, genera) than other trees (Table 3).

Discussion
Is Alpha-Tubulin a Suitable Marker for Inferring Ciliate Phylogeny?
The topologies of trees inferred from of ciliate proteins may be confounded by the many paralogs present in these lineages [14,18] However, in the present investigation, alpha-tubulin gene paralogs of euplotid species are not very divergent from one another, and only paralogs of two of the nine species are non-monophyletic (Figs. 2, S2,  S3). Similarly, with samples of five species from three ciliate classes, Israel et al. [15] also found that alpha-tubulin gene paralogs in any given taxon appear to be most closely related to each other or to a sequence from a congener than to others These data indicate that only recent paralogs of alpha-tubulin are retained, and thus gene duplication may not pose a substantial problem in defining ciliate clades [30]. However, alpha-tubulin is not always a good marker for studying relationships at the level of species or below given the high level of amino acid conservation among sequences (Fig. S1). Moreover, it is possible that a combination of gene duplication followed by concerted evolution and differential extinction of some alpha-tubulin paralogs has obscured the evolutionary history in some part of the ciliate tree [15].
The best way to evaluate the quality of one gene marker for tree construction is to look for its congruence with species tree inferred by morphology [14] and by other gene markers. We follow the criterion as a modified one given by Budin and Philippe [14], which is to assess the recovery of the monophyletic groups unquestionably supported by both morphology and SSU-rDNA trees. The monophyly of the family Euplotidae recovered by SSU-rDNA trees is consistently reconstructed in our three alpha-tubulin trees. In recent study, only three out of eight genera (Moneuplotes, Gastrocirrhus and Aspidisca), with alpha-tubulin gene sequences from several species, are monophyletic (Figs. 2, S2, S3). And for the other five genera, only species within Uronychia appear to be monophyletic. Same situation occurs in SSU-rDNA analyses [38][39][40][43][44][45]. Therefore, according to the important criterion of accepted monophyletic groups, the reliability of alpha-tubulin is comparable to that of SSU-rDNA at the genus and family levels.
In the present investigation, only the subclass Hypotrichia, which contains only four genera, is monophyletic in three alphatubulin gene trees (Figs. 2, S2, S3). However, with more samples of alpha-tubulin gene from the Hypotrichia, monophyly of Hypotrichia was rejected by previous investigation [13]. For the other four subclasses for which we have sufficient taxon sampling, Oligotrichia is monophyletic (Figure 2, Atub_n74; Figure S2, Atub_aa), and others are not monophyletic. Therefore, alphatubulin might not be a good gene marker for examine relationships among high level taxa.

Phylogenetic Relationships within the Order Euplotida
Multi-gene analyses are proving useful as a means of placing some taxa within phylogenetic trees where morphological evidence and single gene analyses have not been successful [53][54][55][56]. Our results also show that two-gene combined tree is better than single gene trees for most clades (Table 3). However, relationship among four euplotid genera was not resolved by any of our analyses, including two-gene combined tree (Figs. 2, 3, S2, S3, S4). Inclusion of more taxa, especially species within the family Certesiidae, coupled with more genes are likely necessary to resolve sister relationships among euplotid families.
All six genera of the family Uronychiidae have been sequenced in the present study, revealing that this family is not monophyletic (Figs. 2, 3, S2, S3, S4). This result is consistent with some previous investigations inferred from SSU-rDNA sequences [45,57,58], though in some other SSU-rDNA trees this family appears monophyletic [38][39][40]44]. Therefore, it is too early to infer whether this family should be further defined before more gene information is available. Similarly, the Diophrys-complex contains five genera (Diophrys, Diophryopsis, Paradiophrys, Heterodiophrys and Apodiophrys) but due to variable positions in different trees, it is difficult to infer their related relationships. However, similar to previous SSU-rDNA investigations [38][39][40]44], the genus Diophrys seems to be non-monophyletic.
The family Euplotidae is composed of Euplotes-complex, and was divided into several genera or groups based on different morphological characters [34,59,60] or SSU-rDNA trees [38,41,42]. However, these classifications are not consistent with one another. For example, based on cortical structure, endosymbionts, morphometric data, morphogenetic patterns, and ecology, Euplotes-complex was separated into four genera (i.e. Euplotes, Euplotopsis, Euplotoides and Moneuplotes) by Borror and Hill [34]. Previous SSU-rDNA trees [38,39,41,42,44] and our analyses based on SSU-rDNA plus the two-gene combined trees demonstrate the monophyly of Moneuplotes and Euplotoides, but reject the monophyly of the other two genera (Figs. 3, S4). Similarly, the three species groups (i.e., single-, double-, or multiple-dargyrome types) defined according to dargyrome patterns (dorsal silverline system) by Gates and Curds [60] are not always monophyletic in molecular phylogenetic trees (our investigation [38,42]), indicating the presence of more complexity within this group than is evident from dargyrome patterns. Moreover, the three well resolved clades (Clade I-III) repeatedly shown in previous SSU-rDNA trees [38,39,41,42] are not always present in our trees (Figs. 2, 3, S2, S3, S4) nor are they supported by morphological characters, which indicates that these well resolved clades in SSU-rDNA gene trees may not capture valid taxonomic relationships. Finally, clades within Euplotidae are inconsistent with respect to morphology and

Evolutionary Patterns in Duplicated Alpha-Tubulin
Among seven euplotid species for which paralogs are detected, duplicated alpha-tubulin genes of all taxa show some changes in the amino acid sequence following duplication ( Table 2). Compared to those of Paramecium tetraurelia, which has much longer macronuclear chromosomes, there are bigger amino acid distances between paralogs of euplotid species. This elevated level of sequence divergence is similar to patterns in proteins from other ciliates with gene-sized macronuclear chromosomes (Israel et al. 2002, Zufall et al. 2006, and supports the hypothesis that genome processing is associated with increased protein diversification as proposed by previous investigations [30,61,62].

Materials and Methods
No specific permits were required for the described field studies. All locations are not privately-owned or protected in any way, and none endangered or protected species was involved.

Collection and Identification of Ciliates
We isolated genomic DNAs from 34 morphospecies samples ( Fig. 1) from China. Exact collection localities, sample information and GenBank accession numbers of sequenced alpha-tubulin genes are listed in Table 1. All isolates were identified by the methods of Shen et al. [37]. Terminology and systematic classification used in the current paper follow Lynn [31]. The term dargyrome used in the present paper is here defined as in previous reference [38], and refers to the overall geometrical pattern of the dorsal argyrome or ''silverline system'' in some euplotid ciliates. This pattern consists of net-or web-like structure revealed by silver impregnation methods, which is of great taxonomic importance at generic or specific level [59].

Extraction and Sequencing of DNA
Genomic DNA was extracted according to methods described in Yi et al. [63]. All DNA samples are extracted from one to several cells of one population, except for that there are two DNA samples for Diophrys parappendiculata and Euplotes sinicus, which are from two populations, respectively ( Table 1). The PCR amplifications of the alpha-tubulin genes were performed using a TaKaRa ExTaq DNA Polymerase Kit (TaKaRa Biomedicals, Japan). Primers used for partial alpha-tubulin gene amplification were Tub-1 (59-AAG GCT CTC TTG GCGTAC AT-39) and the reverse primer Tub-2 (59-TGATGC CTT CAA CAC CTT CTT-39) [11]. PCR conditions were: 5 min initial denaturation (95uC), followed by 35 cycles of 1 min at 95uC, 1 min at 56uC and 1.5 min at 72uC, with a final extension of 15 min (72uC). The amplicons were directly sequenced using the same primers. However, if paralogs were detected in a sample, then it was purified using the TIANgel Midi Purification Kit and inserted into a pUCm-T vector. Two to nine clones were selected and sequenced by Invitrogen (Shanghai, China). Though it is impossible to detect all paralogs of investigated species due to interpretation of direct sequencing and limited clone samples, these sequences provide an estimate of paralog diversity.

Data Analyses
Sequence divergence between paralogs of ciliates is not clear. In the present investigation, we follows criterion of previous study [15], which defines sequences that diverge by more than 2% as paralogs, considering sequences errors produced by repeated PCRs and cloning [64]. Under this approach, recent paralogs may be confounded with allelic diversity and some paralogs may be missed, but this should not substantially bias our interpretations.
Five data sets were included in phylogenetic analyses: (1) Atub_n74: alpha-tubulin nucleotide sequences including first two codon positions (74 sequences in total); (2) Atub_aa: alpha-tubulin amino acid (70 sequences in total); (3) Atub-SSU: two-gene combined dataset including all euplotid species available (the paralog with shortest branch length is selected for alpha-tubulin) and other spirotrichean species of Dataset Atub_n74 except for Discocephalus ehrenbergi and Histriculus histrio for SSU-rDNA, and D. rotatorius and H. cavicola for alpha-tubulin (52 sequences in total); (4) SSU: SSU-rDNA sequences including all taxa in Dataset Atub-SSU (52 sequences in total); (5)Atub_n52: alpha-tubulin nucleotide sequences with first two codon positions including all taxa in Dataset Atub-SSU (52 sequences in total). For phylogenetic analyses, 27 sequences of alpha-tubulin genes from GenBank were used in addition to ones newly sequenced in the present study. The sequences were aligned using the ClustalW implemented in BIOEDIT 7.0.0 [65], and further modified manually using BIOEDIT. Final alignments used for subsequent phylogenetic analyses included 710 positions (Atub_n74), 355 positions (Atub_aa), 2,303 positions (Atub-SSU) and 1,593 positions (SSU), respectively. GTR + I + C was the best fitted model for nucleotide dataset (Atub_n74) selected by AIC as implemented in MrModeltest v2 [66], and Blosum62+I+G was the best one for amino acid dataset (Atub_aa) selected by AIC as implemented in ProtTest 1.4 [67]. Maximum likelihood analyses, and 1,000 bootstrap replicates, were conducted using RaxML-HPC v7.2.7 [68]. A Bayesian inference (BI) analysis was performed with MrBayes 3.1.2 [69] using the GTR+I+G model selected by MrModeltest 2 [66] under the AIC criterion. Markov chain Monte Carlo (MCMC) simulations were run with two sets of four chains using the default settings: chain length 1,500,000 generations, with trees sampled every 100 generations. The first 3,000 trees were discarded as burn-in. The remaining trees were used to generate a consensus tree and to calculate the posterior probabilities (PP) of all branches using a majority-rule consensus approach. Phylogenetic trees were visualized with TreeView v1.6.6 [70] and MEGA 4 [71].Congruence of different data partitions (in this case genes) was tested with both the incongruence length difference (ILD) test [72] and Shimodaira-Hasegawa (S-H) test [73] as implemented in PAUP*4.0b 10. PAUP* 4.0b 10 was used to generate constraint trees, and resulting trees were compared with unconstrained ML tree using the approximately unbiased (AU) test [74] as implemented in CONSEL package [75].