Mechanisms of Intron Loss and Gain in the Fission Yeast Schizosaccharomyces

The fission yeast, Schizosaccharomyces pombe, is an important model species with a low intron density. Previous studies showed extensive intron losses during its evolution. To test the models of intron loss and gain in fission yeasts, we conducted a comparative genomic analysis in four Schizosaccharomyces species. Both intronization and de-intronization were observed, although both were at a low frequency. A de-intronization event was caused by a degenerative mutation in the branch site. Four cases of imprecise intron losses were identified, indicating that genomic deletion is not a negligible mechanism of intron loss. Most intron losses were precise deletions of introns, and were significantly biased to the 3′ sides of genes. Adjacent introns tended to be lost simultaneously. These observations indicated that the main force shaping the exon-intron structures of fission yeasts was precise intron losses mediated by reverse transcriptase. We found two cases of intron gains caused by tandem genomic duplication, but failed to identify the mechanisms for the majority of the intron gain events observed. In addition, we found that intron-lost and intron-gained genes had certain similar features, such as similar Gene Ontology categories and expression levels.

Three models of intron loss have been proposed: the reverse transcription (RT) model, the genomic deletion model, and the model of non-homologous end joining (NHEJ) repair of double strand breaks [7][8][9]. In recent years, many attempts have been made to test these models by phylogenetic analysis of the presence and absence of introns among orthologous genes. Among the three models, the RT model has been widely tested. It was initially proposed to explain the 59-biased distribution of the limited number of introns in the budding yeast Saccharomyces cerevisiae [13]. In this model, the 39 side of a mature mRNA is more successfully reverse-transcribed because of the occasional dissociation of reverse transcriptase from template mRNAs. Recombination of the partial cDNA products with genomic DNA causes exact loss of an intron or introns from the gene. Its first prediction is that introns are preferentially lost from the 39 sides of genes. This pattern has been observed in Dictyostelium discoideum, Schizosaccharomyces pombe, Mycosphaerella, Cryptococcus, Caenorhabditis elegans, Anopheles, and mammals [14][15][16][17][18][19], but not in Fusarium graminearum, Magnaporthe grisea or rice [20,21]. In Neurospora crassa and Aspergillus, the biased position of intron loss has been observed in a comparative analysis among distantly related species [18], but not in analyses of closely related species [20,22]. In both Arabidopsis and Drosophila, conflicting results have been reported [10,[23][24][25][26].
A modified version of the RT model is that reverse transcription is primed by the polyA tail itself and, therefore, the preferential loss of which introns depends on the specific secondary structure of the mRNA molecules [27,28]. It could explain the intron losses that were not biased to the 39 sides of genes. However, this modified version has not received further support [10,22]. The second prediction of the RT model is that adjacent introns tend to be lost simultaneously. This prediction has been confirmed in Cryptococcus, Fusarium, Aspergillus, Drosophila and mammals [10,15,16,19,22], but not in Caenorhabditis, Plasmodium or Arabidopsis [23,29,30]. In addition, homologous recombination between cDNA and genomic DNA would produce an exact intron loss. By contrast, in the genomic deletion model, introns are lost individually and often imprecisely by unequal exchange of alleles. In most previous studies, especially those focused on distantly related species, filtration of unreliable alignments artificially excluded all possible cases of imprecise intron loss. Up to now, only a few cases of imprecise intron loss have been detected in pufferfish, Drosophila, and Muridae [10,31,32]. Recently, it was proposed that introns could be lost either precisely or imprecisely during the NHEJ repair of double strand DNA breaks [9]. By analyzing the frequency of micro-homology between splice sites of lost introns, researchers observed evidence for the NHEJ model in Arabidopsis [23], but not in Drosophila [10]. This micro-homology would degrade gradually during evolution; therefore, this signal could only be detected in closely related organisms.
Compared with intron loss, the possible mechanisms underlying intron gain are much more diverse. At least six mechanisms have been proposed for intron gain: intron transposition, transposon insertion, group II intron insertion, tandem genomic duplication, intron transfer and NHEJ-mediated intron gain events [11,33]. The NHEJ-mediated intron gain, which is similar to the NHEJ model of intron loss, has been frequently observed in Daphnia and Aspergillus [22,34]. The intron transposition model suggests that an intron is reverse-spliced into a different position of its own mRNA or another mRNA. Subsequently, recombination of the cDNA reverse-transcribed from the mRNA with genomic DNA would create a new intron [33]. Therefore, the newly gained intron would have high sequence similarity with an intron in the same gene or an unrelated gene. Evidence for this model has been found in Oikopleura and Mycosphaerella [14,35]. Insertions of transposable elements containing splicing signals may create new introns. The test of this model requires similarity between gained introns and transposons. A few such cases have been found in Drosophila and Arabidopsis [10,23]. In Cladosporium and Dothistroma, introner-like elements (ILEs) were found to contribute to new introns [36,37]. However, further study is required to show whether the ILEs were inserted similarly to transposons or via reverse splicing. New introns may also arise among tandem genomic repeats containing cryptic splice sites. This tandem genomic duplication model has also gained some support [19,38]. Finally, introns may be transferred between paralogs or genes with highly similar segments by gene conversion. A few cases of intron gain by this model have been observed in Chironomus, Aspergillus and Mycosphaerella [14,22,39]. Studies on intron gain were extensively reviewed in [11].
A broad definition of intron loss and gain could include deintronization and intronization, which are conversions of intron sequences to exon sequences, and vice versa, by mutations [40][41][42]. Intronization and/or de-intronization events have been found in Cryptococcus, Fusarium, Caenorhabditis, mammals and Populus [15,40,41,43,44]. However, detection of intronization/de-intronization events depends heavily on the quality of gene annotation. Caution must still be taken when relevant transcriptome data are limited [22].
The fission yeast, Schizosaccharomyces pombe, has a low intron density [2,45]. Its high quality gene annotations and high coverage of transcriptomes make it a good candidate for the study of intron evolution. Previous studies indicated that it has experienced extensive intron losses during evolution [17,18,46,47]. However, all the studies were comparisons of S. pombe with distantly related organisms, such as vertebrates and plants. Most of the evolutionary traces of intron loss and gain could not be retained over such a long evolutionary history. For this reason, we compared four closely related species; Schizosaccharomyces cryophilus, Schizosaccharomyces octosporus, S. pombe and Schizosaccharomyces japonicus with six outgroup fungus species (Figure 1). Certain novel findings obtained in Schizosaccharomyces enriched our understanding of the mechanisms of intron evolution.

Genomes and Gene Annotations
We downloaded the genome sequences and gene annotations of four fission yeast species (S. cryophilus, S. octosporus, S. pombe and S. japonicus) from the Broad Institute (http://www.broadinstitute.org/ science/data, September 08, 2012). Data for Saitoella complicata, Aspergillus niger, Nectria haematococca, Sporobolomyces roseus, Cryptococcus neoformans and Phycomyces blakesleeanus were obtained from JGI (http://genome.jgi.doe.gov/, September 08, 2012). Saitoella complicata was selected because it is closely related to the Schizosaccharomyces genus and the other five outgroup species were selected because of their relative abundance of introns. The phylogenetic relationship between the Schizosaccharomyces species and the outgroup species is shown in Figure 1.
Genes with obvious annotation errors, such as those having coding sequences with non-multiples of three nucleotides or those conflicting with their protein sequences, were discarded. If a gene had multiple transcript isoforms, the longest mRNA was retained for analysis. As alternative splicing events in fungi are rare compared with plants and animals [47,48], this selection, even if inaccurate, would not affect our final results significantly.

Detection of Orthologs
First, the best reciprocal BLAST was used to search for orthologous protein-coding genes across the four Schizosaccharomyces taxa, with thresholds of E,10 210 and identity $0.25. Then, all the possible orthologs were imported into OrthoCluster 2.0 [49] to generate synteny blocks among the four organisms. The minimum orthologous gene number and the maximum mismatched gene number in each block were set to 3. Only orthologous gene pairs that were located in synteny blocks were retained to avoid retrogenes or processed pseudogenes being mistaken as true orthologs. In total, 2,963 1:1:1:1 orthologous genes were found, among which 2,108 intron-containing groups were used for further analysis.

Identification of Unique Intron Positions
Each group of orthologous proteins was aligned using MUS-CLE 3.8 [50] and intron positions were mapped onto the alignments. Position candidates for intron loss and gain were filtered using the following criteria: a) Adjacent intron positions in different taxa that were less than five amino acids in distance were excluded as they might represent intron sliding events; b) Introns near large gaps (longer than five amino acids), which might represent possible intronization or de-intronization events, were manually checked and analyzed separately ( Figure 2); c) If the identity of 15 amino acids neighboring an intron position on each side was less than 0.30, the first quartile of all orthologous protein sequence identities, it was discarded because of poor alignment. This resulted in 1,775 conserved intron positions (Table S1) and 808 unique intron positions.
Six outgroup fungus species were used to distinguish intron losses and intron gains among the unique intron positions. First, the outgroup orthologous proteins were identified and aligned with the intron-containing Schizosaccharomyces proteins using the same method mentioned above. Second, for each candidate intron position, the presence and absence of introns in each species were marked as 0 (lacks an intron), 1 (has an intron) or ? (no orthologous regions). This data list was imported into the Dollop (Dollo and Polymorphism Parsimony) program in the PHYLIP 3.6 [51] package to detect intron loss and gain events that happened in the taxa and nodes within the Schizosaccharomyces group. Finally, considering the limitation of Dollop (Text S1) and the empirically low occurrence of intron gains, we identified an intron gain event only when the identification was supported by $4 outgroup branches.

Certification of Target Intron Positions Using Transcriptome Data
In order to exclude the possibility that the unique intron positions, including intron losses/gains and intronizations/deintronizations, were gene structure annotation artifacts instead of real intron changes, it is necessary to use transcriptome data to identify related gene structures. The RNAseq/Inchworm/PASA assembly sequences of the four fission yeast species were downloaded from the Broad Institute (http://www.broadinstitute.org/ annotation/genome/schizosaccharomyces_group/MultiDownloads. html, October 02, 2012). The transcripts were mapped onto the corresponding genomic sequences using BLAT 34 [52]. A target intron position supported by $1 RNA assembly was regarded as a transcript-certified intron position. If a target position lacked related transcripts, we searched the related gene in the Feature Search page of Broad Institute (http://www.broadinstitute.org/annotation/ genome/schizosaccharomyces_group/FeatureSearch.html) to get additional transcripts that are only available on the webpages.

Results and Discussion
Among the 2,108 intron-containing orthologs across the four Schizosaccharomyces species, we found 1,775 conserved intron positions and 808 unique intron positions. By consulting the orthologous genes in the six outgroup fungal species, we identified   (Tables S2-S3), as well as 156 putative cases of intronization and de-intronization.

Intronization and De-intronization Are Rare in Fission Yeasts
Intronization or de-intronization events are characterized by unique introns neighboring large gaps of exons where the other exon parts remain well aligned ( Figure 2). In this study, 156 putative cases of intronization and de-intronization events were detected. When transcriptome data were not available, changes of splicing signals might be used as evidence of intronization and deintronization. Among the 156 possible cases, we found that changes of splicing signals occurred in 63 cases. By contrast, only two cases were supported by transcriptome data: one case of intronization and one case of de-intronization (Table 1, Figure 3). Surprisingly, changes in splicing signals were observed in the intronization case, but not in the de-intronization case. In addition to these two cases, there were 12 putative cases of intronization/ de-intronization that were neither supported nor disproved by transcriptome data. Most of the cases, even those having changes of splicing signals, were disproved by transcriptome data (Table 1). A conclusion can be drawn here is that the identification of intronization and de-intronization depends heavily on the accuracy of genome annotation. The putative intronizations and de-intronizations in some previous studies, such as those in Aspergillus [22], are likely to be mostly annotation errors.

Mutations in Splice Sites and Branch Sites Lead to Intronization and De-intronization
In gene SPBC29A10.02 of S. pombe, an intronization event occurred by the conversion of a coding segment into a new intron. At the two ends of this segment, two point mutations (C to G and A to G) created the two splicing sites ( Figure 3A).
In the case of de-intronization, the fourth intron of gene SPBC29A10.14 was conserved among S. octosporus, S. pombe, and S. japonicus. However, the orthologous sequence of the intron had changed into an exonic segment in S. cryophilus ( Figure 3B). No mutations were detected in 59 or 39 splice sites. Using the ICAT program and Weblogo 3.3 [53,54], we identified the consensus sequence (YTRAY) of intron branch sites in Schizosaccharomyces species ( Figure 3C). Furthermore, we found an A to T mutation in the branch site of gene SPOG_00055 in S. cryophilus ( Figure 3D). This point mutation probably caused the de-intronization by inactivating the branch site.
Multiple signals are required for efficient splicing [55]; therefore, conversion of an exonic segment into an intron requires multiple constructive mutations. The point mutations we observed in gene SPBC29A10.02 merely represent the end steps. By contrast, mutations in any of the essential signals of splicing (e.g., the 59 and 39 splicing signals, the branch sites, and the exonic splicing enhancers) could produce alternative products [55] or even cause de-intronization. Unless cryptic splicing signals are very common, de-intronization is expected to have a higher frequency than intronization. However, de-intronization introduces an insertion and possibly premature stop codons into mRNA, which might be deleterious and be selected against. Therefore, the observed frequency of de-intronization should be much lower than the actual frequency. Previous studies observed a higher frequency of intronization than de-intronization [40,41].

Most Intron Loss and Gain Positions Were Supported by Transcriptome Data
The identification of intron loss or gain events were initially based on the gene structure annotations of the four fission yeast species. However, some of these annotations might be erroneous, which could be seen from the fact that most putative intronization or de-intronization events were annotation errors. For intron losses and gains, gene annotation errors could also lead to false-positive results. If an exonic segment was mis-annotated as an intron in orthologous genes, a simple deletion of it would lead to a falsepositive case of intron loss. Similarly, a simple insertion of exonic sequence mis-annotated as an intron would lead to a false-positive case of intron gain. Therefore, the transcriptome data were also required to support the target intron loss or gain events.
In our datasets, we found that most intron loss and gain positions were supported by transcriptome data (Tables S2-S3). In only three out of the 677 putative cases of intron loss, the extant orthologous introns were lacking related transcriptome data. Therefore we are not sure whether they are really introns. Among the intron-absent genes, 14 were not covered by any transcripts. These genes might be inactivated after losing their introns or be uncovered simply because of low-coverage transcriptomes. For accuracy, we excluded these 17 cases and thus retained 660 cases of intron loss in the further analyses (Table 2). On the other hand, all 62 putative cases of intron gain were found to be successfully spliced out from pre-mRNAs (Table 2). In contrast with intronization or de-intronization events, most of the intron loss and gain events were supported by transcriptomes.
Based on the divergence time between taxa and the number of considered introns in each species and nodes, the rate of intron loss was calculated (Table 2). We found that the variation of intron densities within the Schizosaccharomyces group was negatively correlated with the intron loss rates ( Figure S1). It seems that intron loss was the main force that shaped the gene structures in fission yeasts.

Evidence for the Genomic Deletion Model of Intron Loss
Although the genomic deletion model of intron loss is widely cited [7,8], very few supporting cases have been revealed [10,31,32]. In our dataset of intron loss, most cases are exact losses of entire introns. Fortunately, we detected four cases of imprecise intron deletions (Figure 4). In the first case, gene SJAG_01807 lost an intron together with 3 nt of the downstream exon ( Figure 4A). In the second case, loss of an intron from gene SPBC17A3.05c was also accompanied by loss of 3 nt from the downstream exon. Meanwhile, we found that one of its orthologous genes, SJAG_01160, also lost the same intron, but this was a precise loss ( Figure 4B). In the third case, an imprecise intron loss occurred in gene SJAG_05247 when 6 nt of the intron were left on the downstream exon ( Figure 4C). In the last case, the orthologous gene pair SPOG_00241 and SOCG_04299 both lost an intron at the same position, but left 3 nt of the intron on the upstream exon ( Figure 4D). We hypothesize that this represents one intron loss event that occurred in the common ancestor of S. cryophilus and S. octosporus.
None of these imprecise intron losses caused any frameshifts in the coding sequences. It is very likely a result of negative selection. Imprecise intron losses that caused frameshifts might also have occurred, but have been eliminated. In addition, some cases of imprecise intron losses that did not cause frameshifts might also have been eliminated because of indels of amino acid sequences, especially when the indels were very long. Genomic deletion may also produce exact intron loss (at a low frequency); therefore, we suggest that the genomic deletions that have actually occurred in evolution should be more frequent than the imprecise intron deletions we observed.
In addition, the frequency of genomic deletion might also been underestimated for methodological reasons. In previous studies [10,31,32] and the present one, imprecise intron losses all involved leaving or moving very short exon sequences. All the studies focused on intron sites in well-aligned sequence regions; therefore, deletions of large regions would definitely be filtered out and the remaining cases observed might reflect only part of the genomic deletion events that have actually occurred. For this reason, we searched all the remaining unique intron sites that were discarded because of low alignment identity. Two possible cases were found ( Figure S2). Unfortunately, we could not find orthologous genes in the outgroup species, and thus failed to distinguish between intron loss and intron gain for these two cases. Further evidence is required to determine whether they represent imprecise intron losses or imprecise intron gains, an unknown phenomenon with no previous reports.

Intron Loss Events Are Mainly Caused by Reverse Transcription
Similarly to previous studies in other taxa [15,16,19], three predictions of the classical RT model of intron loss have been confirmed in fission yeasts. First, most cases (656 among 660) of the intron losses are precise intron deletions. Second, adjacent introns tend to be lost simultaneously. We observed 38 groups of losses of adjacent introns: six in the ancestor of S. cryophilus and S. octosporus, 17 in S. pombe, seven in the ancestor of S. cryophilus, S. octosporus and S. pombe and eight in S. japonicus. Referring to the method of Roy and Gilbert [17], we calculated the probability distribution of the loss of adjacent introns with the assumption of independent loss of each intron ( Figure 5). The probabilities that exceed the number of observed lost intron pairs were low enough (0.043,6.8610 27 ) to deny the null hypothesis. Furthermore, we observed a preferential loss of introns at the 39 side of genes. As shown in Table 3, the 39-biased intron loss is significant in every species and node of the yeasts studied.
Meanwhile, we also tested the NHEJ-mediated model of intron loss by surveying the micro-homology between 59 and 39 splice sites of lost introns [9]. The frequency of direct repeats around lost introns was not significantly higher than that around conserved introns (P.0.10 in all species and nodes). Thus, the NHEJ-mediated model was not supported by the intron losses in fission yeast.

Evidence for Tandem Genomic Duplications Leading to Intron Gain
Intron gains are generally observed at a much lower frequency than intron losses. Even for the limited number of intron gains, definite source sequences could not be found for most new introns [22,34]. This delayed the interpretation of the mechanisms of intron gain and raised doubts concerning the reliability of the identified intron gain events [22]. Similarly, we did not find the source sequences of most of the new introns identified in this study. Among the 62 cases of intron gains, definite source sequences were revealed for only two new introns. We found that the similarity can be extended to the neighboring exons. They were both similar to nearby exons and seemed to result from tandem genomic duplications. Gene SPOG_01682 gained its intron at position 576-0 (after the 576th amino acid, phase 0) and its ortholog SOCG_00815 gained its intron at another position 612-0. Both of the introns comprised tandem repeats ( Figure 6A). SPOG_01682 had a spliced EST, SCY_iw_8826, which spliced out a longer segment than the annotated intron ( Figure 6B). Thus, the true in vivo situation of this intron requires further investigation.   Only introns that were not filtered out by our criteria were counted, including conserved introns and unique introns. Therefore, they are smaller than the total numbers of annotated introns. c The divergence time between each species and the nodes were obtained from [47]. Data are shown in millions of years. d Introns lost or gained at the ancestor nodes were counted only once.     SOCG_00815 also had a spliced EST, SO_iw_16530 and it mapped with the intron correctly, although it only covered a small part of the 59-side exon ( Figure 6B). In addition, the 24-nt repeat units around these two gained introns were not totally identical ( Figure 6C), which reduces the possibility of them arising through sequence assembly errors. It seemed that the proto splice sites (AGGC) in the repeat units led to the occurrence of new introns.
We also attempted to test the NHEJ model of intron gain by surveying the frequency of direct repeats near intron-exon boundaries [11,34]. Among the newly gained introns, 29.2% have direct repeats near boundaries. Comparatively, we found that 25.1% of conserved introns also have direct repeats near their boundaries. A Pearson square test showed that the difference is not significant (P = 0.338). Therefore, the NHEJ model of intron gain was not supported in fission yeasts.

Some Genes are More Likely to Lose and Gain Introns
In our dataset, 58 genes lost multiple introns and five gained multiple introns. It seems that intron number variations are not evenly distributed among different genes, but biased to certain special ones. We tested whether intron-lost (IL) genes and introngained (IG) genes are clustered in certain special features. BiNGO 2.44 [56] was used to classify and characterize the analyzed genes into Gene Ontology (GO) categories, which were based on the S. pombe GO annotation from the Gene Ontology website (http:// www.geneontology.org). Compared with the whole gene sets, IL genes were more likely to participate in metabolism, molecular transportation and enzyme activity regulation (Table 4). Interestingly, IG genes were also clustered in similar GO categories, although the sample size was much smaller (Table 4). We also showed that IL genes and IG genes both had significantly higher expression levels than other genes (Table S4). These results implied that intron loss and gain in fission yeast might share some similar mechanisms.
As discussed above, intron losses were mainly mediated by reverse transcriptase. In the current models of intron gain, only the intron transposition model shares a similar mechanism (i.e. a requirement for reverse transcription) with intron loss. However, the intron transposition model requires similarities between newly gained introns and other extant introns, which was not observed in our study. As the selective constraint is much lower for introns than for coding regions, accumulated mutations in gained intron sequences might have resulted in them being significantly divergent from the source sequences. It is also possible that some of the identified intron gains might be intron losses in other taxa, considering the high intron loss rates in fission yeasts. Falsepositive results of intron gains would cause the IG genes share some false-positive similarities with IL genes.

Conclusions
In fission yeasts, we found a higher frequency of intron loss than intron gain. Although a moderate number of putative intronization and de-intronization events were observed, most cases were filtered out using transcriptome data. Careful examination of the confident cases of intronization and de-intronization revealed them to be caused by mutations in splice sites and branch sites. Although at a low frequency, imprecise intron deletions were observed, supporting the genomic deletion model of intron loss. The characteristics of most intron losses are consistent with the RT model. Similar to previous studies [22,34], the source sequences of most newly gained introns identified in this study were not found. A hypergeometric test was used and only GO categories with P values lower than 0.05 were shown. *GO categories that were both present in intron-lost and intron-gained genes. The null hypothesis is that the intron-lost and intron-gained genes were randomly clustered among the GO categories. However, three out of four GO categories of intron-gained genes were shared by intron-lost genes, which implied that intron loss and gain in fission yeast might share some similar mechanisms. doi:10.1371/journal.pone.0061683.t004 Some of the source sequences might have been lost during evolution, or became significantly dissimilar. It is also possible that some of the intron gains might actually be intron losses in other species. In spite of this, evidence for the tandem genomic duplication model was supported by two cases of intron gains. We also found that intron loss rates are not uniform. Some genes, like those participating in metabolism, molecular transportation and enzyme activity regulation, are more likely to lose their introns.

Supporting Information
Table S1 Conserved introns across the four fission yeast species.
(XLS)    Text S1 The limitation of the Dollop program. (DOC)