Parallel Loss of Plastid Introns and Their Maturase in the Genus Cuscuta

Plastid genome content and arrangement are highly conserved across most land plants and their closest relatives, streptophyte algae, with nearly all plastid introns having invaded the genome in their common ancestor at least 450 million years ago. One such intron, within the transfer RNA trnK-UUU, contains a large open reading frame that encodes a presumed intron maturase, matK. This gene is missing from the plastid genomes of two species in the parasitic plant genus Cuscuta but is found in all other published land plant and streptophyte algal plastid genomes, including that of the nonphotosynthetic angiosperm Epifagus virginiana and two other species of Cuscuta. By examining matK and plastid intron distribution in Cuscuta, we add support to the hypothesis that its normal role is in splicing seven of the eight group IIA introns in the genome. We also analyze matK nucleotide sequences from Cuscuta species and relatives that retain matK to test whether changes in selective pressure in the maturase are associated with intron deletion. Stepwise loss of most group IIA introns from the plastid genome results in substantial change in selective pressure within the hypothetical RNA-binding domain of matK in both Cuscuta and Epifagus, either through evolution from a generalist to a specialist intron splicer or due to loss of a particular intron responsible for most of the constraint on the binding region. The possibility of intron-specific specialization in the X-domain is implicated by evidence of positive selection on the lineage leading to C. nitida in association with the loss of six of seven introns putatively spliced by matK. Moreover, transfer RNA gene deletion facilitated by parasitism combined with an unusually high rate of intron loss from remaining functional plastid genes created a unique circumstance on the lineage leading to Cuscuta subgenus Grammica that allowed elimination of matK in the most species-rich lineage of Cuscuta.


Introduction
Introns within organellar genes have unique features. Unlike those in eukaryotic nuclear genes, they do not rely on spliceosomes for excision from RNA transcripts, and unlike the similarly structured self-splicing introns of prokaryotes, they typically require other trans-acting factors for efficient splicing in vivo [1,2]. Land plant plastid genomes usually contain between 17 and 20 of these introns, all of which are classified as group II based on putative folding structure except for a single group I intron within the transfer RNA gene trnL-UAA [3]. Plastid group II introns are further subdivided structurally into two classes, group IIA and group IIB. All of these group II introns, with the exception of the second of two group II introns within clpP, seemingly trace their origin to a shared common ancestor of charophycean algae and all land plants [4].
Only one transcribed open reading frame has been identified within any plastid intron, the presumed intron maturase matK, consistently found within trnK-UUU. Although matK has been shown to be an essential factor for the splicing of the trnK intron within which it is contained [5], its involvement in the splicing of other plastid introns is poorly understood [6]. The plastid genome of the nonphotosynthetic, parasitic angiosperm Epifagus encodes only four proteins not involved in transcription or translation and lacks a functional trnK gene [7]. However, the trnK pseudogene retains a complete open reading frame for matK which is evolving under selective constraint, indicating matK is essential for other functions beyond splicing the trnK intron in that species [8]. A parallel pattern of trnK loss with retention of matK is seen in the photosynthetic streptophyte alga Zygnema circumcarinatum [4]. Various studies have shown that without translation of plastid-encoded proteins, seven group IIA introns in the plastid genome remain in an unspliced transcript form, whereas group IIB introns are largely unaffected and have been shown in maize to primarily rely upon a nuclear-encoded factor, crs2, for splicing [3,9,10]. An eighth group IIA intron, clpP intron 2, is present in the chloroplast genomes of most land plants but was not examined in those studies because it is not present in grasses. Excision of the only group I intron, trnL-UAA, is unaffected by any of these factors, as is splicing of the second of two group IIB introns found within ycf3 [10]. Reliance of seven group IIA introns upon a plastid-encoded factor for splicing indicates a role for matK in splicing introns other than the trnK intron within which it resides.
Like Epifagus (Orobanchaceae), members of the genus Cuscuta (Convolvulaceae) are parasitic plants that have undergone substantial gene loss from their plastid genomes [11]. However, at least some members of the genus retain a largely intact plastid genome and contain chlorophyllous tissues [12], albeit in a localized form less crucial to the parasites' survival than in fully autotrophic plants [13]. Losses of three group IIA introns from the plastid genomes of various Cuscuta species were reported more than a decade ago [14,15], and the presence of the intron in the 3′ portion of the trans-spliced rps12 gene was shown to be polymorphic in the genus [16]. More recently, the sequencing of four complete plastid genomes from the genus Cuscuta, two from subgenus Monogyna and two from subgenus Grammica, has shown that intron content differs greatly between the two subgenera; specifically, both matK and all group IIA introns except the second intron of clpP are lost from the plastid genomes of the closely related members of subgenus Grammica [11,17]. Intron 2 of clpP, acquired in the common ancestor of land plants millions of years after matK and the other seven plastid group IIA introns [4], was shown to be properly transcribed and translated in Cuscuta gronovii in the absence of plastid matK [17].
In this study we sampled across the taxonomic range of Cuscuta to determine the distribution of matK and plastid introns in the genus. For taxa that still contain matK, we tested whether significant changes in selective constraint occurred on branches where introns were lost. Finally, we conducted similar branch×site tests on a sample of equal size from the variously parasitic family Orobanchaceae, in which loss of most plastid introns is known to have occurred at least in Epifagus.

Results
Using PCR assays that gave clear positive or negative results based on band size, we surveyed for the presence of matK at the trnK-UUU locus along with all known group IIA introns and three group IIB introns (one in trnG-UCC and two within ycf3) from a variety of Cuscuta species representing all three currently recognized subgenera (Table 1). In cases of tRNA introns, we used sequence reads to confirm presence or absence of the gene and intron, as tRNA exons are generally shorter than 40 nucleotides in length.
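The band-size logic behind these assays can be illustrated with a short sketch. Suppose a primer pair is anchored in the flanking exons. The amplicon is then longer by roughly the intron length when the intron is present, so each band can be scored against two expected sizes. The function name, the tolerance and the example sizes below are hypothetical. They are not the primer pairs of Table 5.

```python
def call_presence(observed_bp: int, size_with: int, size_without: int,
                  tolerance_bp: int = 100) -> str:
    """Score one PCR band as 'present', 'absent', or 'ambiguous' (hypothetical helper).

    size_with / size_without are the expected amplicon lengths (bp) with and
    without the intron (or matK); tolerance_bp is an assumed sizing error on the gel.
    """
    d_with = abs(observed_bp - size_with)
    d_without = abs(observed_bp - size_without)
    # A clear call requires the band to sit near one expectation and away from the other.
    if d_with <= tolerance_bp and d_without > tolerance_bp:
        return "present"
    if d_without <= tolerance_bp and d_with > tolerance_bp:
        return "absent"
    return "ambiguous"
```

Suppose the expectations are 1,300 bp (intron present) and 600 bp (intron absent). A band of 1,280 bp would then be scored "present" and one of 620 bp "absent". A band near the midpoint stays "ambiguous". Such a result would need confirmation by sequencing, which the text applies to tRNA introns.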
Although the trnK gene itself is absent in all Cuscuta species, all sampled members of subgenus Monogyna and subgenus Cuscuta retain an open reading frame for matK, paralleling the condition in Epifagus and Zygnema. However, all sampled members of subgenus Grammica, which contains the majority of Cuscuta species, have lost matK from the plastid genome. As predicted under the hypothesis that matK is necessary for splicing all seven group IIA introns shown to be unspliced in grass plastid translational mutants [3,9,10], loss of matK in Cuscuta correlates perfectly with the loss of all of those group IIA introns from the plastid genome. Representatives of subgenus Grammica still possess the group IIA intron 2 of clpP, five group IIB introns, and the trnL-UAA group I intron within otherwise normal genes, corroborating prior results that resident plastid matK is not necessary for splicing these introns [3,5,9,10,17]. All sampled Cuscuta species that still possess matK also possess at least four group IIA introns, with the exception of Cuscuta nitida, which retains only the 3′ rps12 intron and intron 2 of clpP (Table 1, Fig. 1A). The matK open reading frame was partially or fully sequenced for five species in Cuscuta subgenus Monogyna and three species from subgenus Cuscuta. It was also sequenced for four species from Convolvulaceae (Morning Glory Family), the otherwise autotrophic family within which Cuscuta is nested. Using outgroup sequences from available plastid genomes, we constructed a well-supported phylogeny that agrees fully with published relationships within Convolvulaceae and Cuscuta [18,19] (Fig. 1A). Ipomoea (tribe Convolvuleae) was strongly supported as sister to Cuscuta, although alternative hypotheses at this node could not be rejected in a previous study [20]. Because our taxon sampling outside Cuscuta is sparse, we conservatively collapsed this node into a polytomy for analyses of selective constraint.
We were especially interested in changes in selective constraint within domain X, the portion of matK identified as the putative RNA-binding domain [21]. We first constrained the ratio of nonsynonymous to synonymous substitution rates (dN/dS = ω) to a single value across sites and across the phylogeny. Under this model, domain X was evolving under stronger purifying selection (ω = 0.21) than the remainder of the gene (ω = 0.392; Table 2, model M0). All sampled species also contained an amino acid consensus motif within domain X (SX3-6TLAXKXK) that is conserved across land plants and charophytes [22]. This further suggests that matK remains functional in all Cuscuta species that still possess it.
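A motif check of the kind described above can be sketched with a regular expression. X stands for any residue, and X3-6 stands for three to six arbitrary residues. The motif string is taken from the text. The pattern, the function name and the test peptide are our own illustrations (hypothetical), not a Cuscuta sequence, and the sketch assumes an ungapped, one-letter amino acid sequence.

```python
import re

# S, then 3-6 arbitrary residues, then T-L-A, any residue, K, any residue, K.
DOMAIN_X_MOTIF = re.compile(r"S[A-Z]{3,6}TLA[A-Z]K[A-Z]K")


def find_domain_x_motif(protein: str):
    """Return (0-based start, matched text) of the first motif hit, or None.

    Assumes an ungapped one-letter protein sequence; gaps or stop symbols
    inside the motif would prevent a match.
    """
    match = DOMAIN_X_MOTIF.search(protein.upper())
    return (match.start(), match.group()) if match else None
```

For example, `find_domain_x_motif("MNRSAAAGTLAHKAKW")` returns `(3, "SAAAGTLAHKAK")`.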
Table 2. Shifting patterns of selection on matK. Columns: Model (parameters); ω.

Significant variation in selective constraint across sites within domain X was detected by comparing nested models with a single dN/dS ratio (M0) against models with two or three rate-ratio classes (M3; Table 2, line 2). We used fitmodel [23] to test whether the pattern of among-site variation in selective constraint varied across the tree, perhaps in association with the loss of specific group II introns (Fig. 1). A likelihood ratio test (LRT) compared a model that allowed switching among rate-ratio classes across the tree with the M3 model, which allows among-site variation in dN/dS but no switching across the tree. The switching model was not significantly better supported (p = 0.32; Fig. 1).

We explored this further by focusing on the branch leading to C. nitida. That lineage has lost all introns thought to be spliced by matK (see above) except the one in the 3′ portion of rps12. We used branch×sites models implemented in codeml [24,25,26] to test an a priori hypothesis. The branch leading to C. nitida was specified as the foreground branch, and the remaining branches formed the background. The hypothesis was that the pattern of among-site variation in constraint on the foreground branch differs from that on the background branches. This approach is analogous to the switching test implemented in fitmodel, except that here we have an a priori hypothesis that switches among rate-ratio classes are concentrated on a single branch. The alternative models for the branch×sites tests of Yang, Nielsen and colleagues (implemented in codeml) have fewer parameters than the unconstrained switching model in fitmodel. This approach may therefore have more statistical power when there is a well-defined hypothesis about the branch on which switching occurred. Indeed, the branch×sites model fit the domain X data significantly better than the rates-across-sites model (M3) when the C. nitida branch was specified as the foreground (Table 2, line 3; Table 3, test 1). By contrast, for residues outside domain X, the likelihood of the discrete branch×sites model was not significantly different from that of the rates-across-sites model (M3) (Table 2, line 11).
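The nested-model comparisons above rest on the likelihood ratio test. The statistic 2ΔlnL is compared with a χ² distribution whose degrees of freedom equal the difference in the number of free parameters. The dependency-free sketch below is ours. It is not part of fitmodel or codeml, and the function names are hypothetical. The optional `boundary_mixture` flag applies a 50:50 mixture of a point mass at zero and χ²1. This mixture is often used when a single parameter is fixed at the boundary of its range, as with ω2 = 1 in the stringent branch×sites test. The plain χ²1 p-value is the more conservative choice.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: start from df = 1, then add the increment for each step df -> df + 2.
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)  # increment from df = 1 to 3
    for k in range(3, df + 1, 2):
        total += term
        term *= x / k  # ratio between successive increments
    return total


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int,
                          boundary_mixture: bool = False):
    """Return (2*delta lnL, p-value) for nested models (hypothetical helper)."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    p = chi2_sf(stat, df)
    if boundary_mixture:
        if df != 1:
            raise ValueError("the 50:50 mixture here assumes one boundary parameter")
        # Mixture of a point mass at 0 and chi2(1): halve the tail for stat > 0.
        p = 0.5 * p if stat > 0 else 1.0
    return stat, p
```

For example, suppose the alternative model gains 1.92 log-likelihood units with one extra parameter. Then 2ΔlnL ≈ 3.84 and p ≈ 0.05, or about 0.025 under the mixture null.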
Returning to analyses of domain X, two branch×sites tests were designed specifically to detect evidence of adaptive evolution [25,26]. The first test uses the "nearly neutral" model (M1a) as the null. In M1a, codons evolve either under purifying selection (0 < ω0 < 1) or neutrally (ω1 = 1). The alternative is a positive-selection branch×sites model that adds a third rate-ratio class (ω2 > 1) for a fraction of sites on the foreground branch. The second, more stringent test compares a branch×sites null model with ω2 = 1 on the foreground branch against the positive-selection branch×sites model (i.e., ω2 > 1 on the foreground branch). In addition, codeml [27] provides Bayes empirical Bayes (BEB) estimates of the posterior probability that each site on the foreground branch is evolving under positive selection (ω2 > 1). The likelihood of the branch×sites positive-selection model was significantly better than that of the nearly neutral model. Adaptive evolution on the branch leading to C. nitida was strongly supported for one site, position 16 in the domain X alignment (Table 2, line 7; Table 3, test 2). However, we were unable to reject the more stringent null model (ω2 = 1; Table 2, line 8). In summary, these results indicate that loss of three of the final four group IIA introns that matK has been implicated in splicing has resulted in relaxed or even positive selection on some codons within domain X in C. nitida.
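The site-class structure of the branch×sites alternative and the stringent null can be made concrete with a short sketch. It follows the widely used branch-site "model A" layout, in which ω1 is fixed at 1. Whether PAML v3.15, as used here, parameterizes the tests identically is an assumption. The class name is hypothetical, and the example parameter values are illustrative, not estimates from Table 2. Classes 2a and 2b are sites that are conserved or neutral on background branches but shift to ω2 on the foreground branch. Setting ω2 = 1 gives the stringent null.

```python
from dataclasses import dataclass


@dataclass
class BranchSiteModelA:
    """Site classes of a branch-site 'model A'-style parameterization (sketch)."""
    p0: float      # proportion of sites with 0 < omega0 < 1 on all branches
    p1: float      # proportion of sites with omega1 = 1 on all branches
    omega0: float  # conserved rate ratio
    omega2: float  # foreground rate ratio for classes 2a/2b (1.0 under the stringent null)

    def site_classes(self):
        """Return rows of (class, proportion, background omega, foreground omega)."""
        if not 0.0 < self.omega0 < 1.0:
            raise ValueError("omega0 must lie in (0, 1)")
        if self.omega2 < 1.0:
            raise ValueError("omega2 must be >= 1")
        rest = 1.0 - self.p0 - self.p1  # combined proportion of classes 2a and 2b
        if self.p0 < 0 or self.p1 < 0 or self.p0 + self.p1 == 0 or rest < 0:
            raise ValueError("need p0, p1 >= 0, p0 + p1 > 0 and p0 + p1 <= 1")
        share0 = self.p0 / (self.p0 + self.p1)  # split 2a:2b in the ratio p0:p1
        return [
            ("0", self.p0, self.omega0, self.omega0),
            ("1", self.p1, 1.0, 1.0),
            ("2a", rest * share0, self.omega0, self.omega2),   # conserved in background
            ("2b", rest * (1.0 - share0), 1.0, self.omega2),  # neutral in background
        ]
```

With p0 = 0.6, p1 = 0.2, ω0 = 0.1 and ω2 = 5.0, classes 2a and 2b receive proportions 0.15 and 0.05. Only these two classes change ω on the foreground branch.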
In Epifagus, one of only two remaining plastid group IIA introns putatively spliced by matK is the same 3′ rps12 intron retained in C. nitida. The second is an intron in rpl2 that is found neither in any Cuscuta species nor in their autotrophic relatives in Convolvulaceae [28]. Because Epifagus retains only one more intron than Cuscuta nitida, we used codeml LRTs to test whether matK may also be evolving under positive selection in Orobanchaceae, the predominantly parasitic family containing Epifagus. Although intron distribution among members of Orobanchaceae is poorly known, we gathered matK data from a range of species available in GenBank that likely differ from Epifagus in plastid gene and intron content. Orobanche fasciculata, like Epifagus, is nonphotosynthetic but is known to retain a possibly functional copy of rbcL [29]. This gene encodes the large subunit of Rubisco, the carbon-fixing enzyme of the Calvin cycle. A parasite that retains the ability to photosynthesize (Castilleja linariifolia) and a fully autotrophic sister group to the parasites (Lindenbergia philippinensis) were also included, and the same outgroups were used as in the Convolvulaceae tests. The phylogeny obtained for these species (Fig. 1B) was congruent with published relationships. The branch joining the two nonphotosynthetic Orobanchaceae sensu stricto taxa, Orobanche fasciculata and Epifagus, has relatively low support in our tree. However, this relationship is consistently supported in all other systematic work on Orobanchaceae to date [30,31,32]. As in the Cuscuta/Convolvulaceae analysis, global dN/dS (ω in M0) was lower for domain X than for the rest of the gene (0.28 vs. 0.47). Branch×sites models with the Epifagus lineage set as the foreground fit significantly better than the rates-across-sites models (M1a and M3; Table 2, line 18).
As in the Cuscuta/Convolvulaceae analysis, positive selection was implicated when the nearly neutral model was set as the null, but not when the more stringent null model (ω2 = 1) was imposed (Table 2, lines 23 and 24; Table 3, test 4). The two analyses differed in one respect, however. For Epifagus, both fitmodel and codeml identified shifting levels of constraint across branches for some sites outside domain X. They also found strong evidence for positive selection on the branch leading to Epifagus (Table 2, lines 31 and 32).

Discussion
In the evolutionary history of Cuscuta, the previously conserved RNA-binding domain of matK underwent a dramatic change in selective pressure after the loss of three of the remaining four group IIA introns that matK is thought to help splice. In Cuscuta nitida, the RNA-binding domain is evolving under less constraint than in other Cuscuta species and in outgroups where multiple group IIA introns spliced by matK are still present. One possibility is that domain X was released from the constraint of remaining a generalist group IIA intron-binding domain on the branch leading to Cuscuta nitida. matK may then have specialized to bind and splice the 3′ rps12 intron. Alternatively, one of the three introns lost on the branch to Cuscuta nitida may be particularly important in maintaining constraint on domain X. The branch×sites analyses suggest, but do not establish, adaptive evolution in domain X on the Cuscuta nitida lineage. Although the maximum likelihood estimates of ω2 were >4.0 for some codons on the Cuscuta nitida lineage, we could not reject the hypothesis that these sites are evolving neutrally (ω2 = 1.0; Table 2, lines 7 and 8). This may reflect insufficient statistical power.
Epifagus, which retains two group IIA introns linked to matK splicing in its plastid genome, also shows a dramatic change in the selective constraint of domain X relative to related taxa. The three introns lost on the branch leading to Cuscuta nitida are those of trnI-GAU, trnA-UGC and atpF. If one of them is primarily responsible for the constraint on domain X across streptophytes, that intron may also have been lost on the branch leading to Epifagus. As in the Cuscuta analysis, the maximum likelihood estimates of ω2 were >4.0 for some codons in domain X. However, we could not reject the hypothesis that these sites are evolving neutrally (ω2 = 1.0; Table 2, lines 23 and 24). Interestingly, we were able to reject neutral evolution for some sites in the amino-terminal region, outside domain X, on the branch leading to Epifagus (Table 2, line 32). The sites showing significant signal for positive selection (Table 3, test 6) are moderately conserved in the Pfam alignment for the matK amino-terminal region (positions 224 and 277 in the complete alignment of pfam01824). No function has yet been hypothesized for this portion of the matK protein.
Loss of tRNA genes is a common phenomenon in the plastid genomes of parasitic plants [33,34,35], and Epifagus has also lost the group IIA-containing atpF gene along with all other photosynthetic and chlororespiratory genes [7]. However, there are no cases of intron loss from functional genes in Epifagus. Although sampled members of subgenus Grammica parallel Epifagus in losing all group IIA intron-containing tRNAs, atpF and rps12 remain under purifying selection in Cuscuta despite precise intron losses from these genes. Intron 2 of clpP, a group IIA intron not linked to matK splicing, was uniquely lost by Cuscuta epilinum (Table 1); that species still retains clpP intron 1, a group IIB intron. However, the group IIB introns in ycf3 are also precisely lost from subgenus Grammica and Cuscuta nitida (Table 1), indicating a mechanism for intron loss that is not limited to group IIA introns. Intron losses from intact plastid genes are not unprecedented in land plants [14,36,37], but they are sporadic and rare. Such losses are much more frequent in conjugating charophycean algae, perhaps due to higher rates of homologous recombination or levels of reverse transcriptase activity [4]. Independent loss of six introns from five different functional genes in Cuscuta suggests this lineage is much more prone to purge introns from its plastid genomes than other land plants, although the mechanism for this increased rate of intron loss is unclear. Because the rpl2 intron was lost before the evolution of parasitism in Cuscuta [28], the high rate of intron loss from otherwise intact genes in Cuscuta may or may not be related to its parasitic habit.
Loss of matK from the plastid genome of Cuscuta was possible only because of a unique combination of two factors: tRNA loss related to heterotrophy, and a predisposition for plastid intron loss that is otherwise unknown in land plants. This special situation provides an opportunity to test the prediction that matK is required for splicing most group IIA introns. On the same prediction, it is not required for the evolutionarily distinct group IIA intron 2 of clpP, the group IIB introns, or the group I intron in trnL-UAA. All group II introns other than intron 2 of clpP invaded the chloroplast genome at least 450 million years ago. Since then, matK has acted as both a cis- and a trans-acting splicing factor for group IIA introns in the plastid genome. Any plastid genome that retains one of these group IIA introns in a gene necessary for survival must also retain a functional copy of matK. Loss of matK from functional plastid genomes is therefore expected to be rare, or perhaps even nonexistent, in land plants other than Cuscuta. Parallel changes in matK associated with intron loss in two independent lineages of parasitic plants suggest that reduced generalist splicing requirements may lead the protein to undergo adaptive changes that specialize it for its remaining splicing functions. Alternatively, one of the three introns lost on the branch to Cuscuta nitida, and possibly also on the branch to Epifagus, may be primarily responsible for the high constraint on the RNA-binding domain of matK. These and other parasitic lineages have evolved as natural plastid gene and intron knockout mutants. Investigating them will further our understanding of the coevolution of organellar introns and maturases.

Materials and Methods
Complete plastid genome sequences of Cuscuta obtusiflora, Cuscuta exaltata, and Ipomoea purpurea served four purposes. They were used to design primers for this study, to assess the presence of non-group IIA introns within Cuscuta, and to rule out gene transpositions in cases of PCR-detected intron and matK loss. They were also used to verify that only the expected loci were present for the genes examined in this study. GenBank accession numbers and voucher numbers for sequences used in this study are shown in Table 4.
Primer combinations for assaying intron or matK presence were chosen so that band sizes were easy to interpret on 1% agarose gels stained with ethidium bromide. PCRs for matK and plastid introns were conducted using a combination of published [30,38,39,40] and newly designed primers (Table 5). Most sequencing was performed on a Beckman-Coulter CEQ8000 system according to the manufacturer's protocol. The remaining sequences were generated by the Pennsylvania State University Nucleic Acids Facility on an ABI 3730XL.
Separate matK phylogenies were estimated for the Cuscuta/Convolvulaceae and Epifagus/Orobanchaceae analyses. Maximum likelihood (ML) trees were estimated in PAUP* 4.0b10 [41] using GTR + gamma models with parameters estimated from the data. The ML trees were used in molecular evolutionary analyses to test for changes in constraint on the lineages leading to Cuscuta nitida and Epifagus. Likelihood ratio tests were applied to compare a series of nested models:

- equal constraint (M0; dN/dS = ω);
- variation in ω across sites (M3, M2a, and M1a);
- distinct patterns of variation across sites on foreground and background branches (branch×sites models).

Model parameters and likelihood values (Tables 2 and 3) were estimated using codeml within the PAML package v3.15 ([27]; http://abacus.gene.ucl.ac.uk/software/paml.html). Foreground branches were specified as those leading to Cuscuta nitida or Epifagus in separate analyses. Sites with Bayes empirical Bayes posterior probabilities >0.95 for ω2 > 1.0 were identified in codeml [25].
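When many nested codeml runs are compared, the log-likelihoods and parameter counts can be collected from the output files with a short script. The sketch below assumes that each run's main output contains a likelihood line of the form `lnL(ntime: 11  np: 17):  -3034.012345  +0.000000`. The exact format may differ between PAML versions, which is an assumption. The helper names are hypothetical. The sketch returns the LRT statistic and degrees of freedom, which are then compared against χ² critical values.

```python
import re

# Assumed codeml likelihood line, e.g. "lnL(ntime: 11  np: 17):  -3034.012345  +0.000000"
LNL_LINE = re.compile(r"lnL\(ntime:\s*(\d+)\s+np:\s*(\d+)\):\s+(-?\d+\.\d+)")


def read_lnl(mlc_text: str):
    """Return (number of free parameters, lnL) from the first lnL line found."""
    match = LNL_LINE.search(mlc_text)
    if match is None:
        raise ValueError("no lnL line found in codeml output")
    return int(match.group(2)), float(match.group(3))


def compare_models(null_text: str, alt_text: str):
    """Return (2 * delta lnL, degrees of freedom) for a nested null/alternative pair."""
    np_null, lnl_null = read_lnl(null_text)
    np_alt, lnl_alt = read_lnl(alt_text)
    df = np_alt - np_null
    if df < 1:
        raise ValueError("alternative model must have more free parameters")
    return 2.0 * (lnl_alt - lnl_null), df
```

In practice, `null_text` and `alt_text` would be the contents of the two codeml output files, read with `open(path).read()`.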