Scanning of Transposable Elements and Analyzing Expression of Transposase Genes of Sweet Potato [Ipomoea batatas]

Background Transposable elements (TEs) are the most abundant genomic components in eukaryotes and affect the genome by their replications and movements to generate genetic plasticity. Sweet potato performs asexual reproduction generally and the TEs may be an important genetic factor for genome reorganization. Complete identification of TEs is essential for the study of genome evolution. However, the TEs of sweet potato are still poorly understood because of its complex hexaploid genome and difficulty in genome sequencing. The recent availability of the sweet potato transcriptome databases provides an opportunity for discovering and characterizing the expressed TEs. Methodology/Principal Findings We first established the integrated-transcriptome database by de novo assembling four published sweet potato transcriptome databases from three cultivars in China. Using sequence-similarity search and analysis, a total of 1,405 TEs including 883 retrotransposons and 522 DNA transposons were predicted and categorized. Depending on mapping sets of RNA-Seq raw short reads to the predicted TEs, we compared the quantities, classifications and expression activities of TEs inter- and intra-cultivars. Moreover, the differential expressions of TEs in seven tissues of Xushu 18 cultivar were analyzed by using Illumina digital gene expression (DGE) tag profiling. It was found that 417 TEs were expressed in one or more tissues and 107 in all seven tissues. Furthermore, the copy number of 11 transposase genes was determined to be 1–3 copies in the genome of sweet potato by Real-time PCR-based absolute quantification. Conclusions/Significance Our result provides a new method for TE searching on species with transcriptome sequences while lacking genome information. The searching, identification and expression analysis of TEs will provide useful TE information in sweet potato, which are valuable for the further studies of TE-mediated gene mutation and optimization in asexual reproduction. It contributes to elucidating the roles of TEs in genome evolution.


Introduction
Sweet potato [Ipomoea batatas] is the world's seventh largest food crop cultivated worldwide due to its high yield, wide adaptability and strong resistance.It is grown on about 9 million hectares in the world, yielding 140 million tons per year, and over 97% of the world output of sweet potato is produced from developing countries.The cultivated area and yield of sweet potato in China, about 6.6 million hectares and 100 million tons, account for 70% and 85% of total area and yield of the world, respectively [1,2].Sweet potato belongs to the Convolvulaceae family, Ipomoea genus, Batatas section.It is the only hexaploid (2n = 6x = 90) plant in this section with a huge genome (2,200 to 3,000 Mbp) [3][4][5] and complicated genetic structure.Many questions like genome sequencing and genetic evolution mechanism are still unresolved so far.In general, the organisms adapt to the changing environment through favorable mutation and chromosome recombination in the process of sexual reproduction.Since sweet potato mostly performs asexual reproduction, how does it reorganize its genetic substance in the process of evolution, while lacking of gametic recombination?
Transposable elements (TEs) are one of the important genetic factors for genome reorganization.They are the most abundant genomic components in eukaryotes, even accounting for more than 50% of the entire genome [6][7][8][9], especially in some large cereal genomes such as maize (85%), wheat (80%) and barley (84%) [10][11][12][13].The TEs affect the genome as mutagenic agents by replicating and translocating to generate plasticity, producing structural changes in single gene or overall genome followed by altered spatial and temporal patterns of gene expression and, ultimately, gene function [14].Although mutations may be harmful, and could lead to different diseases and even death of the individuals, they are the basis of biological evolution for the species.Thus, mutations generate diversity that may provide adaptive advantages to the changing environments, being further selected as a result of the natural selection [15].The role of TEs in evolution was proposed by Barbara McClintock in the 1980s, since then progress had been made to understand the significance of TEs in genome evolution through the comprehensive study of the structure and function of TEs in different organisms [16].For example, it has been reported that Helitrons, a kind of DNA-TEs in Zea mays, could capture and move gene fragments to an extent that around 20% genes in maize genome were found located differently between two maize lines [17].And in Arabidopsis thaliana, Helitrons could proliferate themselves in the genome after acquiring the exon fragments [18].Since many of the exon fragments captured by TEs are expressed, the TE-mediated exon shuffling might lead to the appearance of novel genes.A series of studies have been carried on the mutational capacity of TEs, their ability to regulate genetic systems, and their sensitivity to environmental stress.These studies have demonstrated that the TEs could not only shape the structure and function of the genomes, but also generate genetic polymorphisms favoring population adaptation, which plays a major role in genome evolution [16].However, the TEs are still poorly understood in sweet potato as its hexaploidy genome makes genetic manipulation and genome sequencing very challenging.
In this study, we searched and identified TEs in sweet potato on the basis of the integrated-transcriptome database, which provides a large amount of expressed TE homologues.Such a large number of TEs, which were scarcely identified in sweet potato before, represent the collection of TEs with the largest number and the most complex classification in sweet potato by far.The searching, identification and expression analysis of TEs provides useful resources and information of TEs in sweet potato, which may be valuable for the study of TE-mediated gene mutation and optimization in asexual reproduction.Our result provides a new method for TE searching on species with transcriptome sequences while lacking genome information.Such amounts of TEs found in sweet potato are important data resources and material bases for studying the TE functions further.It also contributes to the elucidation of the roles of TEs in genome evolution.

Integrated-transcriptome Database of Sweet Potato
There were totally four transcriptome databases of sweet potato which had been established from primary sequencing data of three cultivars in China.Two of them were established by our laboratory from the vegetative organ [19] and flowers [20] of Xushu 18 cultivar (XS 18) in 2012.The other two came from the fibrous and tuberous root of Guangshu 87 cultivar (GS 87) in 2010 [21] and the tuberous root of a purple sweet potato Jingshu 6 cultivar (JS 6) in 2012 [22], respectively.The characteristics and details of the four transcriptome databases are listed in Table 1.
Sweet potato integrated-transcriptome database was established by integrating the above four databases.All the raw reads from these databases were combined and de novo assembled [23][24][25][26].The resulted integrated-transcriptome database comprised of totally 279,473 transcripts, 118,309 of which were .200nt in length, and the longest one was 13,067 nt.For annotation of the assembled transcripts, sequence-similarity search was conducted against the NCBI non-redundant (Nr) protein database through Basic Local Alignment Search Tool (BLAST) alignment [27].The transcripts longer than 200 nt were submitted for annotation through Blast2GO, and 60,976 transcripts were annotated.The integrated and the other four transcriptome databases were used for searching TEs and for analyzing the quantities, distributions and expression levels of TEs inter-and intra-cultivars, respectively.
Among the 883 retro-TEs, there were 247 with LTR sequences which could be classified into Ty1/Copia and Ty3/Gypsy superfamily, 507 without LTR and 129 unclassified.In terms of the 522 TEs belonging to class II, 501 of them could be classified into 6 superfamilies, including Tc1/Mariner, hAT, Mutator, PIFharbinger, CACTA and Helitron, and 21 were unclassified.The unclassified TEs showed sequence similarity lower than the threshold value set in BLAST alignment with the known TEs and thus couldn't be classified.However, they can still be further classified if the TE-specific conserved sequences analysis was carried out.For example, the conserved sequence CTAG and its preceding 18 bp palindromic sequence was suggested to be able to produce a hairpin loop to capture gene fragments by an unknown mechanism possibly associated with their rolling circle (RC) replication process [31].Depending on this characteristic, the initially unclassified Arabidopsis thaliana TE Basho was lately grouped into Helitron superfamily.This kind of TE-specific conserved sequence was also existed in the 39 terminus of 4 unclassified sweet potato TEs.Therefore, the detailed analysis of the complete sequences and some experiment validations will be helpful to identified unclassified TEs.It is noteworthy that few of the 1,405 TEs were collected with full length in plant repeat databases like PlantTribes or Repbase [32,33].All the TEs identified in sweet potato are summarized in Table 2.
The 1,405 TEs were submitted to GenBank for similarity alignment using BLASTn program and the result showed that the TEs in sweet potato shared deep homologies with those from at least 28 species of higher plants.The plant which got the most BLAST hits in number was Vitis vinifera (20.6%), followed by Arabidopsis thaliana (15.0%),Oryza sativa (13.7%),Ipomopea trifida (10.2%), etc (Figure 1).In addition, the plant which got the highest identity to sweet potato TEs was Ipomopea trifida, with sequencesimilarity distribution ranging from 58% to 100%, followed by Vitis vinifera (50%,98%), Glycine max (46%,95%), Populus trichocarpa (43%,90%), Arabidopsis thaliana (39%,89%), Zea mays (47%,87%), Oryza sativa (43%,84%), etc.The high homologies of TEs between sweet potato and other plants revealed that these TEs were widely distributed in the vast majority of the genomes of higher plants, and thus provided further evidence for the evolutionary relationships between sweet potato and other dicotyledonous plants.

Differences in the Number of TEs Inter-and Intra-cultivars
To compare the number of the expressed TEs in three different sweet potato cultivars and different tissues of XS 18, the reads of the four primary databases were mapped to the 1,405 TEs.Firstly, the two databases from the same XS 18 cultivar were combined, then the number of TEs among three different sweet potato cultivars (XS 18, GS 87 and JS 6) can be determined.Secondly, the two databases established from the same XS 18 cultivar, corresponding to the vegetative and reproductive organs of XS 18, were used to analyze TE expression difference and specificity between these two kinds of tissues in XS 18.Finally, the vegetative transcriptome database of XS 18 was used to analyze TE expression difference and specificity among seven different vegetative organs and tissues, including YL (young leaves), ML (mature leaves), stem, FR (fibrous roots), ITR (initial tuberous roots), ETR (expanding tuberous roots) and HTR (harvest tuberous roots).Among 1,405 TEs expressed in sweet potato, there were 1,209 TEs identified in XS 18, 994 TEs in GS 87 roots and only 5 TEs in JS 6 roots.However, except for the five TEs found in JS6 which were also expressed in other two cultivars, we found that the expressions of some TEs were restricted to one cultivar.The pairwise alignments of TEs showed that 412 TEs were specifically expressed in XS 18 (called XS18-specifically expressed TEs, XS18_SETEs) and 197 TEs in GS 87 (called GS87-specifically expressed TEs, GS87_SETEs).In the XS 18 cultivar, there were 1,030 TEs expressed in vegetative organs and 832 in reproductive organs.Similarly, there were 157 TEs expressed specifically in the vegetative organs (called vegetative organ-specifically expressed TEs, VO_SETEs) while only 124 in the reproductive organs (called reproductive organ-specifically expressed TEs, RO_-SETEs) (Figure 2).The TE numbers in each of the seven vegetative organs and tissues were determined by digital gene expression (DGE) tag profiling [19].Among 328,383 distinct clean tags generated in the DGE tag profiling, only 689 tags could be mapped to 417 TEs expressed in these seven tissues, implying that the expressions of other 988 TEs lacking the recognition bases CATG could not be detected through this method.Figure 3 demonstrated that the average number of expressed TEs in one tissue was 232.The tissue which had the lowest number of expressed TEs was YL (192 TEs), accounting for 46.04% of all the detected TEs.The highest one was FR (270 TEs), accounting for 64.75%.In addition, there were 101 TEs expressed in only one tissue, accounting for 24.22% and 109 TEs expressed in all the seven tissues, accounting for 26.14%.These two kinds of TEs were more than half of all the detected TEs and the rest TEs expressed in 2-6 tissues were relatively less (,50%).Moreover, among the tissue-specifically expressed TEs, the number expressed in FR was 2-3 times more than that in other six tissues.

Differences in the Type of TEs Inter-and Intra-cultivars
In accordance with the above results, the types of the expressed TEs were analyzed further.The TEs expressed in each cultivar were classified into superfamilies and the results are shown in Table 3.The superfamily possessing the most number of expressed TEs in both XS 18 and GS 87 was Helitron, with 145 and 140 TEs, respectively, followed by superfamily hAT (123 and 116 TEs).Even so, there were some differences between them.In XS 18, the most abundant superfamily of retro-TEs was the Ty1/ Copia, while in GS 87, Ty3/Gypsy had a slight advantage in numbers of the expressed TEs.As to the DNA-TEs, the superfamilies CACTA and PIF contained more TEs in GS 87 than in XS 18.Taken into consideration the total numbers of TEs in the two cultivars, the importance of these two superfamilies to GS87 was more obvious.
The TE types between the vegetative and reproductive organs of XS 18 were almost the same, and the Helitron was the most abundant superfamily in both organs.The situation was the same when the TE types in seven tissues of vegetative organs in XS 18 were analyzed.The superfamily with the most expressed TEs was Helitron and the least was Ty3/Gypsy.The expressed TEs in superfamily hAT and Mutator were relatively less in the vegetative organs of XS 18 (Table 4).

Differences in the Expressing Activities of TEs Inter-and Intra-cultivars
By mapping all the reads of each cultivar transcriptome database to the identified TEs in the integrated-transcriptome database, we could calculate the relative expression level of every TE in each database and analyze the differences in the expressing activities at three levels.RPKM (reads per kilobases per million reads) was used as the standardized unit, which normalized the gene length and sequencing depth to make the expression levels of genes in different transcriptomes comparable.The comparisons among cultivars demonstrated that some TEs had stable expression levels but others showed significant differences.The major characteristics were as follow: All of the 5 TEs in JS 6 cultivar showed high expression levels, ranging from 30,000,190,000 RPKM, much higher than their expression levels in the other two cultivars (about 2000 RPKM).As to the TEs expressed in XS 18 and GS 87, the expression levels of TEs ranged from 0.42,70,000 RPKM and 5.59,30,000 RPKM, respectively, indicating that the expression levels of various TEs varied greatly in different cultivars.
Meanwhile, a large number of TEs were differentially expressed in XS 18 and GS 87.For example, Ib_DTC_34571 belonging to CACTA superfamily, expressed highly in GS 87 (2759.25 RPKM) but lowly in XS 18 (0.42 RPKM).Oppositely, Ib_RN_13038, non-LTR retro-TE, had highly expression level in XS 18 (9280.42RPKM) but lowly expressed in GS 87 (32.82 RPKM).Except for that, we also found some TEs have relatively stable expression levels in two cultivars.For example, Ib_DTM_2831 belonging to Mutator superfamily, had high and similar expression level in XS 18 (4479.14RPKM) and GS 87 (4181.12RPKM).
In addition, the TE expression levels between the vegetative and reproductive organs of XS 18 were compared and some TEs expressed specifically in flowers were found.For example, Ib_RU_704, unclassified retro-TE, had a high expression in flower (3243.17RPKM) but a low expression in vegetative organs (40.09RPKM).And Ib_RLC_116580, retro-TE belonging to LTR-Copia superfamily, was expressed highly in flower (3837.77RPKM), but not expressed in vegetative organs.Therefore, Ib_RLC_116580 could be defined as the flower-specially expressed TE.

DGE Analysis of TEs in Vegetative Tissues of Xushu18
The DGE tag profiling was used to analyze TE expression levels among vegetative tissues of XS 18.We found lots of typical TEs differentially expressed in different vegetative tissues.For example, Ib_DU_31235, a unclassified TE, was expressed highly in ML (260.71TPM, tags per million reads), but its expressions in other tissues were around 30 TPM; Ib_RN_25697, a non-LTR retro-TE, was expressed highly in stem with 63.41 TPM but lowly in other tissues (around 2 TPM), even not expressed in YL and HTR; Ib_RN_6012, a non-LTR retro-TE, was expressed lowly in YL (13.13 TPM), but highly in other 6 tissues (60.66-128.63TPM) (Figure 4).In addition, some TEs were stably expressed in 7 tissues.For example, Ib_DTH_65331, a TE belonging to hAT superfamily, was expressed highly in seven tissues with TPM value higher than 100; Ib_DTC_11049, a TE belonging to CACTA superfamily, was expressed lowly in all tissues with TPM value around 1.
To analyze differential expression patterns of TEs among seven tissues, we pair-wisely compared them and obtained 21 pairs of comparisons.There were numerous TEs showing differential expression (DETEs) and specific expression (SETEs).Among 417 TEs expressed in vegetative organs, the number of DETEs between each two tissues, including the up-regulated and the down-regulated TE, ranged from 7 to 46 and the average number was 27 (Figure 5).The largest difference was observed between ETR and HTR, and there were 25 TE transcripts up-regulated and 21 down-regulated.The smallest difference occurred between FR and ITR, in which only 7 DETEs were identified.In addition, a large number of SETEs between each two tissues were also detected.In the 21 pair-wisely comparisons, the number of SETEs    which expressed in only one of the two compared tissues ranged from 108 to 166, with an average number of 137 (Figure 6).The SETE patterns among tissues revealed that the largest difference was shown between YL and FR.There were 45 TEs expressed in YL but not in FR and 121 TEs expressed in FR but not in YL, oppositely.The smallest difference occurred between YL and ETR, in which 63 and 45 TEs were specifically expressed in YL and ETR, respectively.

Evolutionary Analysis of Transposase Genes
Sixteen TEs in sweet potato belonging to superfamily hAT, Mutator and PIF, respectively, and showing high homologies with transposase genes in other higher plants, were chosen for further studies of transposase genes.Bioinformatics analyses on these 16 TEs, involving gene length, predicted protein molecular weight, isoelectric points, etc are shown in Table S2.The coding sequences of 16 transposase genes were cloned and sequenced.The sequence similarity between the gene-clone sequences using Sanger sequencing and the assembled transcripts using Illumina sequencing were all higher than 95%, indicating that the predicted TE sequences were reliable.The comparisons between predicted and sequenced TEs are shown in Table 5.
The 16 sequenced transposase genes were then used for evolutionary analyses.The three phylogenetic trees of TEs in three superfamilies (Figure 7, a-c) showed that there were close relationships between TEs in sweet potato and dicotyledons like Vitis vinifera, Glycine max and Populus trichocarpa.The TE superfamily showing the highest sequence similarity with that in Vitis vinifera was Mutator, such as Ib_DTM_1770, Ib_DTM_4635, etc.On the other hand, we analyzed the evolutionary relationships of 16 TEs with each other.The phylogenetic tree diagram (Figure 7d) demonstrated that the evolutionary relationships of TE in Mutator superfamily were very close, with the bootstrap of 100 between certain TEs.However, there appeared to be a clear differentiation within this family, since bootstrap 84 presented at the boundary of the two subfamilies of Mutator (Mutator-FAR and Mutator-PB1).Therefore, these Mutator-TEs could be classified in more detail.Noteworthily, Ib_DU_2831 was originally not classified, but since it was highly homologous with Ib_DTM_3260, it could be classified into Mutator superfamily.Similarly, the unclassified TE Ib_DU_1235 could be classified into hAT superfamily because of its closest genetic relationships with Ib_DTH_1962.However, from the phylogenetic tree diagram we could see that the PIF superfamily member Ib_DTP_11286 showed low homology with the same superfamily member Ib_DTP_9943, even lower than the hAT superfamily member Ib_DTH_1962.This might indicate that these two superfamilies were relatively close in sequence similarity and even in component structure.The alignment of 1,405 TEs with each other by using BLASTn was employed to analyze the genetic relationships among TEs.The results showed that 78 of 162 unclassified TEs had their relatives with similarity more than 78%, so that the evolutionary relationship analysis could be used to identify some unclassified TEs.

Identification of Transposase-gene Copy Number
Real-time polymerase chain reaction (PCR)-based methodology for the determination of transposase-gene copy number was introduced and demonstrated [34][35][36].Absolute quantification determines the exact copy number of transposase gene by relating the Ct value to a standard curve.According to the standard curves of 11 transposase genes and one single-copy S8e gene [37], the copy number of each transposase gene in genomic DNA per microliter was tested, as well as that of S8e.The transposase-gene copy number in sweet potato genome can be determined as the copy ratio of transposase gene to S8e in one sample.
The standard curves for Ib_DTM_FAR14362 and S8e gene, each ranging from 10 6 to 10 9 copies per microliter are shown in Figure 8 (the standard curves for other ten transposase genes are shown in Figure S1).The Ct values of Ib_DTM _FAR14362 in each dilution ranged from 16 to 21, while the ranges of Ct values in S8e gene were 20-28.Figure 8 shows that both curves were highly linear (R2.0.99) in the range tested by the duplicate reactions and the slopes of the standard curves were 23.84 and 23.52, respectively.From the slope of each curve, PCR amplification efficiency (E) with 0.82 and 0.92 were calculated in the investigated range, respectively.The results of absolute quantification and the calculated transposase gene copy number are shown in Table 6.It was found that the copy number was around 1 to 3.These 11 transposase genes belonged to low copy number genes, in which there were one gene with 3 copies and 6 genes with 2 copies in genome, and there were 4 genes belonging to single-copy genes.In addition, the expression levels of these 11 transposase genes in XS 18 were also calculated.They all had relatively high expression ranging from 939.33-14372.48RPKM.As listed in Table 6, there were not any proportional relationships between gene copy number in genome and their expression levels in transcriptome.

A New Perspective for TEs Scanning
Transposable elements may be important motors of genetic variability, they account for the majority of genome (as in maize), and have the ability to generate genetic polymorphisms favoring population adaptation [38].Researchers have been working to identify different TE types, in order to elucidate the molecular mechanism of TE transposition.The most widely used method to search for TE in the genome is to design element-specific primers for gene cloning by recognizing the sub-terminal conserved sequences of TEs in each family [28].As for distinguishing different types of the TEs in model organisms, it was in accordance with their transposition chemistry, that is, by the enzymes that catalyze the DNA-strand cleavage and transfer steps necessary for their movement, such as DDE transposases in IS sequences and members of Tc/Mariner superfamily [39].However, these methods, which detect TEs one-by-one, not only require significant labor, being costly and time-consuming, but also may miss some TE or generate false-positive clones because of genetic variations or nonspecific cloning.The availability of genome sequencing results of some model or non-model plants made genome-widely TEs prediction practical and feasible.Therefore, a few databases of plant TEs like SoyTEdb in Glycine max [14], as well as some integrated repetitive-element databases of eukaryotes like Repbase Update and TIGR [33,40], were established.Therefore, the methods of TE identification in known genomes were based on the sequence alignment against the fully characterized elements in some TE databases.Along with more and more transcriptomes being sequenced, a method depending on transcriptome sequencing and gene annotation was developed, to search and identify TEs in plants without prior genome information.The advantage of this novel approach is that a global overview of potentially active TEs can be obtained.In contrast, the classic strategies required the TE to be analyzed on a one-by-one basis, being firstly identified at the DNA level, which allowed further transcription and transposition studies.For example, the 276 expressed TEs were identified in Saccharum officinarum, by searching from 260,781 transcriptome sequences in the sugarcane expressed-sequence tag (SUCEST) database [41].In this paper, an improved method was used in TE searching in sweet potato, and 1,405 expressed TEs were identified after redundancy reduction.Noteworthy, the quantities and abundance of TEs searched from sweet potato were more than those searched from sugarcane.This may come from the more sensitive searching method and more abundant integrated-transcriptome database.Three different ways were used for TE searching in this study.The integratedtranscriptome database was built by re-assembling all the raw sequencing reads from four independent transcriptome databases of sweet potato representing three cultivars.It comprised plenty of transcripts with longer length and higher integrity and provided the possibilities to compare the differences of expressed TEs intraor inter-cultivars.Some TEs were randomly selected for cloning and sequencing and the results showed that the similarities of the cloned and assembled sequences were higher than 95%, implying that the assembled TEs are considerably reliable.
Since there were few TEs in sweet potato reported before, the large numbers of TEs obtained in this study will provide an abundant resource for further studies like cloning and functional identification of interested TEs.The intensive study of their transpositions will contribute to elucidating the role of TEs in genetic variation in the process of asexual reproduction and the evolution of sweet potato.

Prediction of the Size of TEs in the Sweet Potato Genome
Depending on transcriptome sequencing and gene annotation, 1,405 TEs were predicted from the sweet potato integratedtranscriptome through different searching methods.Even though these transcribed TE sequences could not completely reflect all the TEs existed in sweet potato genome, they represented the active part in these cultivars and tissues.Owing to the availability of the genome and transcriptome sequences of Oryza sativa, the TEs of rice can be predicted from the annotation information of genome and transcriptome, respectively.TEs in Oryza sativa genome could be acquired from the annotation information of the Oryza sativa genome from the web of the Rice Genome Annotation Project, http://rice.plantbiology.msu.edu/index.shtml[42].TEs in the Oryza sativa transcriptome in same cultivar were obtained by employing the same method used in sweet potato (the Oryza sativa transcriptome were assembled ourselves and are not published).As a result, there are 17,272 TEs found in the genome and 2,021 TEs in the transcriptome,indicating that the former is about 8.6-fold more than the later and most of the TEs in the genome were silent.As to the 17,272 TEs found in rice genome, most of them could be classified into retro-TEs (12,143) and DNA-TEs (3,968).In terms of the 2,021 TEs predicted from transcriptome, there are 1,290 retro-TEs and 747 DNA-TEs.The statistics above implied that there were about 10% genomic retro-TEs and 19% genomic DNA-TEs expressed.There also existed difference of the expressed members among TE superfamilies.For example, the TE members in Ty3/Mariner superfamily identified in rice genome was twice more than that in transcriptome, meaning that less than half members were expressed.While for Non-LTR subfamily, the difference was more than one hundred times higher and the expressed TEs in this superfamily were few.Conceivably, some subfamilies, such as TNP and SNF2, were not detected to express in transcriptome.Noteworthy, for some superfamilies such as hAT and Mutator, the TE members identified in transcriptome were more than those identified in genome (85 versus 27 for hAT, and 167 versus 65 for Mutator).The TE number of these two certain superfamilies in transcriptome was 2.5-3 fold higher than that in genome, the reasons may come from the transcriptional regulation or alternative splicing, and may be associated with the discontinuity in transcriptome splicing.Even so, it is certain that the numbers and types of TEs in the genome are more than the expressed TEs in transcriptome in Oryza sativa.Although the ratios of TE numbers and types predicted from the genomes to those from transcriptomes showed differences among various model plant species, to a certain extent, they could be used to predict the TE numbers and types in the genome of non-model plant species on the basis of transcriptome database.At this point we can speculate that the number of TEs in sweet potato genome might be 15,000 or more.

Diversity of Expressed TEs among Sweet Potato and other Plant Species
As for the integrated-transcriptome database of sweet potato, 3 superfamlies of retro-TEs and 7 superfamilies of DNA-TEs were identified and characterized.The distributions of TE superfamilies in transcriptomes of sweet potato and other species were compared, and some similarities among them were found.For example, Gypsy-TEs, a superfamily of LTR-retro-TEs, account for minority of retro-TEs in Oryza sativa (285 Gypsy-TEs in 1283 retro-TEs, about 22.2%) and the sugarcane expressed sequence tag (SUCEST) database (19 gypsy-TEs in 128 retro-TEs, about 14.8%) [41].This is consistent with what we found in sweet potato (95 expressed Gypsy-TEs in 883 retro-TEs,10.8%).It was reported that these long terminal repeat (LTR) retro-TEs have the ability to trigger TE expression by their cis-regulatory elements in 59 LTR.These regulatory sequences are similar to the well-characterized motifs required for the activation of stress-responsive gene expression [43].Therefore, these TE activations caused by environmental changes could eventually result in mutagenesis in the genome, which may help the organism adapt to new environmental conditions.These TEs also played a key role in translating changes in the external environment into changes at the genomic level.Indeed, Gypsy-TEs were found to respond directly to some specific stress situations [44], but the members of this superfamily in sweet potato and other plant species was few.This is probably due to the fine transcriptional control which makes difficulty for the expression of Gypsy-TEs under normal conditions [45].
On the other hand, the expressed DNA-TEs belonging to superfamily hAT (115 TEs) and Mutator (113 TEs) accounted for 43.7% of all the DNA-TEs in sweet potato, which were similar with the distribution in other higher plants.For example, in Arabidopsis thaliana, the most abundant DNA-TEs in its genome were Mutator-like element (MULE), which reached 108 TEs, accounting for 17.33% among total 623 DNA-TEs [46].It suggested that these TEs may play an important role in genome restructure [47,48].A process called transduplication is a potentially rich source of novel coding sequences, reflecting that the activities of these TEs have a substantial impact on the evolution of new genes in plants, by their capacity to capture and mobilize genes or fragments [49].Just as three thousand Pack-MULEs in rice, they have mobilized fragments of more than 1,000 genes.Many of these gene fragments are likely to be nonfunctional pseudogenes.However, 42% of these retro-genes have recruited new exons to become chimeric genes and show some degree of function through expression [45].
However, the Non-LTR elements in retro-TEs were generally abundant in sweet potato, which is inconsistent with that in Oryza sativa.The reason why the Non-LTR elements were the most expressed retro-TEs in sweet potato will only be determined after a detailed analysis of their complete sequences.

TE Distribution Differences and Environmental Adaptation
The major TE superfamilies reported in other plants are also present in three cultivars of sweet potato, but the family members and their expression levels differ enormously among cultivars.The TE number identified in Jingshu6 cultivar, a purple sweet potato, was extremely low, probably reflecting this cultivar may has distant genetic relationships with other two cultivars.The reasons for such big differences among cultivars need further analysis.Although the total TE numbers identified in Xushu18 and Guangshu87 were almost equal, the types and expression levels of TEs in the two cultivars were obviously different (Table 3).Specifically expressed TEs were existed in both cultivars, and differentially expressed TEs showed varied expression levels between these cultivars.To some TEs expressed in two cultivars, their expression level varied, even to the extent of hundred folds.In addition, the differentially and specifically expressed TEs were also found between the vegetative organs and reproductive organs of XS 18.There were almost 500 TEs expressed specifically within the vegetative organs while ,200 TEs in the reproductive organs.This suggested that the expression of a TE could be changed due to the cultivars, tissues and organs.
The reason for the differential TEs expression level among cultivars may come from the environmental influences on cultivars, such as temperature and rainfall [43].JS 6 is grown up in north china with dry climate and low rainfall, while XS 18 and GS 87 are cultivated in the south of Yangtze River with humid and rainy climate.We can speculate that the growth environment conditions of sweet potato cause the difference in the number of expressed TEs and their expression level.So the TEinduced adaptive mutations suggested a widespread role of TEs in environmental adaptation.As reported, TEs played an important role in the responsive capacity of their hosts in the face of environmental challenges [44].For example, TEs might directly regulate the function of individual genes, provide a mechanism for rapidly acquiring new genetic material and disseminate regulatory elements that can lead to the creation of stress-inducible regulatory networks [50].And stress-activated TEs might generate the raw diversity that species require to survive among different stressful environments.Rather than being redundant, the presence of many TEs and different expression pattern among cultivars are required to overcome the challenges imposed by different environmental conditions [43].

Highly Expressed TE Superfamily and Genome Evolution
The result from TEs identification revealed a surprising amount of expressed TE homologues.It provided an enormous source of variability that can be used to create novel genes or modify genetic functions for genome evolution.The DGE profiling analyses demonstrated that the hAT superfamily had obvious advantages in TE expression levels, because in some highly expressed superfamilies, the expression level of TEs in Mutator was 3.40 TPM in average (the maximum value was 17.27 and the minimum value was 0.07 with 38 members), in CACTA was 1.79 TPM (the maximum was 17.16 and the minimum was 0.08 with 27 members), but in hAT it reached 42.2 TPM (the maximum was 211 and the minimum was 0.32 with 56 members).Particularly, a member (Ib_DTH_29406) of hAT superfamily had the highest average expression level in all 7 tissues of Xushu18.The high transcription activities of TEs in hAT superfamily in sweet potato may be related to their functional specificity.It was reported before that the hAT TEs were a diverse and ancient transposon superfamily which had insertion specificities, suggesting that they may be the most frequent contributors to genome evolution [51].From the earliest discovered TEs like Ac in Zea mays, Tam in Antirrhinum majus, hobo in Drosophila melanogaster to the domesticated TEs like DAYSLEEPER in Arabidopsis thaliana which has been exapted for new function rather than transposition [52][53], these hATs has been verified to have high activity in the process of plant adaptative evolution.In addition to simply altering the gene structure, hATs insertions can lead to some positive or negative regulatory functions.For example, Ac in Zea mays tends to transpose into the 59 ends of plant genes and the promoter and enhancer elements within these TEs could potentially alter gene expression [54].No matter what the gross effects on the overall architecture of genomes caused by hAT are, or the broad range of changes in gene expression and function, from subtle quantitative effects to the rewiring of regulatory networks and the evolution of entirely new genes, it is suggesting that hAT-induced mutations has played a key part in adaptive evolution over longer periods of time in plants [55].Therefore, whether the high expression of hAT TEs is related to the genome evolution in sweet potato should be identified in the future studies.

High Expression Level and Low Genome Copy Number of TEs
TEs have played an important role in determining the size and structure of a complex plant genome.Every aspect of TE life cycle has the potential for genome alteration and somaclonal variation, such as increasing gene copy number and genome size, mobilization and rearrangement of gene fragments and epigenetic silencing of genes, horizontal gene transferring and chromosomal rearrangements [56].It is crucial that TE functional characteristics are essential for explaining the dynamics and evolution of plant genomes.How the TEs acted on the various aspects of chromosome structure and evolution depended on the numbers and predominant types of TEs that expressed [57].According to the results described above, TEs belonging to hAT and mutator superfamilies were dominant at expression level in sweet potato, but their copy number was low in the genome.The reasons for the low copy of individual transposase gene were numerous, one of them is that the total amount of TEs in Mutator and hAT is large, the copy number of each certain TE in genome should be restricted.Thus, it is not surprising that plants devote considerable resources to TE control [45].Table 6.Identification of transposase gene copy number of Sweet potato using real-time PCR-based absolute quantification.However, the TEs in these two superfamilies generally displayed high expression and this is possibly associated with the high transposition effectiveness of the individual TE.The highly expressed TEs were predominantly active with highly mutagenic ability.And the high activity of these TEs may be useful not only for their own transposition effect, but also for providing the active components like transposase for other defective TEs with incomplete structure, such as the SINEs with huge amount and high copy number in genome.Mutation is the ultimate source of genetic variation and TEs are also likely to play a relevant role in adaptation because of their ability to generate mutations of great variety and magnitude, and their capacity to be responsive and susceptible to environmental changes [43].From this perspective, these highly expressed but low copy number TEs could be more important than those TEs which have higher copy number and ordinary expression in the genome alteration and somatic mutation.As reported before, Mutator TEs in Zea mays is by far the most mutagenic plant transposon, causing new mutations at up to a hundred times the spontaneous rate [47].The high transposition frequency and the tendency to insert into low copy sequences for such transposon have made it the primary means by which genes are mutagenized in maize (Zea mays L.) [48].

Plant Material
The sweet potato transcriptome databases used in this study were established from three cultivars in China.These cultivars were different in phenotype, planted area, main uses and so on.XS 18 is the leading cultivar in the Yangtze River Basin of China in terms of annual hectareage and total production with widely growth habit.It has green cordate and slightly toothed leaves, and elliptic roots with red skin and white flesh with purple rings in some places.It is mainly used for starch processing and the production of ethanol [58].The second cultivar is GS 87, a local cultivar planted mostly in South China.Its vines are short, the lobed leaves are green at all stages of growth, the tuberous roots have red skin and orange flesh with good eating quality [59].The third one is JS 6, an important food cultivar which is mainly planted in the north of China.It has spreading growth habit, green and long vines, triangular slightly lobed green leaves, and the roots are spinning and purple outside and inside [60].
Stem cuts of XS 18 were planted in May, 2011, and grown under natural light and temperature in experimental field of college of life sicences in Sichuan University, Chengdu, Sichuan Province of China.The college gave permission to conduct the study on this site and the field studies did not involve endangered or protected species.Samples from young leaves were used for PCR, RT-PCR and real-time quantitative PCR in this study.All tissue samples collected were snap-frozen immediately in nitrogen and stored at 280uC until further processing.

Sweet Potato Transcriptome Reads Data and Databases
The reads used for assembling the sweet potato integrated transcriptome were from four transcriptome databases.The first one was the transcriptome database of vegetative organs of cultivar XS 18, including roots, stems and leaves [19].The reads sequences were obtained in NCBI Short Read Archive (SRA, http://www.ncbi.nlm.nih.gov/Traces/sra)under accession number SRA043582, and the transcriptome sequences were in http:// cfgbi.scu.edu.cn/SweetPotato/index.php.The second one was the transcriptome database of flowers of cultivar XS 18 and the reads sequences were in SRA under accession number SRA043584 [20].The other two transcriptome databases were established from fibrous and tuberous roots of GS 87 [21] and tuberous roots of JS 6 [22], and the reads sequences were obtained all in SRA with accession number SRA022988 and SRX090758, respectively.

Extraction of RNA and Genomic DNA
Total RNAs were extracted using the Trizol Reagent (Invitrogen, USA), and treated with DNase I (Fermentas, USA) according to the manufacturer's instructions.RNA quality and purity were assessed with OD 230/260 ratio.Total cDNAs were synthesized from RNA with Moloney murine leukemia virus (M-MLV) reverse transcriptase (Invitrogen, CA, USA) using oligo (dT) as primer following the manufacturer's instructions.Genomic DNA extraction was performed using the CTAB method [61].Agarose gel electrophoresis was used to show the integrity of the DNA, while spectrophotometry was employed to display the concentration and cleanliness.

Searching for the TEs
The TEs searching was conducted in three independent screenings.Firstly, keyword searching from the annotated transcripts was used.The transcripts (.200 nt) in the sweet potato integrated-transcriptome database were submitted for annotation through BLAST using Blast2GO software v2.4.4 (http://www.blast2go.com/b2ghome)[27].For BLASTX against the NR database, the threshold was set to E-value#10 25 .Based on the annotation information, TEs were retrieved from the database by keyword searching.One kind of keywords included transposon-related words, such as ''transposon'', ''retrotransposon'', ''transposase'', ''retrotransposase'', ''transposable element'' etc.The other kind of keywords was based on the order names and superfamily names of eukaryotic TEs, like ''Non-LTR'', ''Copia'', ''Gypsy'', ''Mutator'', ''Harbinger'' etc [6].It is noteworthy that Blast2GO may be failed to give accurate annotated information when the firstly hit gene was annotated as hypothetical protein even the other hits were annotated as transposase gene.Therefore, the second method based on sequence similarity was used as a supplementary.The CDS and full-length sequences of fully characterized transposase genes of the higher plants were downloaded from GenBank (mainly from Arabidopsis thaliana, Vitis vinifera, Ricinus communis, Glycine max, Sorghum bicolor, Oryza sativa, Populus trichocarpa, Zea mays, etc.), and compared with transcripts in sweet potato integrated-transcriptome database using BLASTn program.Due to the enormous data and to avoid spurious matches, a very stringent expectation cut-off value (e-10 or better) was used.Thirdly, TEs were searched depending on the homology of the sub-terminal conserved sequence of TE families as reported in the literatures before [28].All the searching results were compared pair-wisely to remove redundant, and detailed manual evaluation was further conducted to exclude the non-TEs like reverse transcription virus etc.The ORF predictions of TEs were carried out by using Galaxy (http://main.g2.bx.psu.edu/)[29,30].

Cloning and Sequencing of Transposase Genes
In PCR and RT-PCR reactions, primers for amplification were designed according to assembled transcripts using Primer Premier 5.0 (PREMIER Biosoft International, CA, USA) and synthesized by GENEWIZ, Inc. (http://www.genewiz.com.cn).Primer sequences are shown in Table S1.The transposase genes were amplified using KOD FX DNA polymerase (TOYOBO, Japan), under the cycling conditions as 94uC for 5 min, followed by 35 cycles consisting of 94uC for 45 s, 60uC for 45 s, and 72uC for 1 min, and a final extension cycle of 72uC for 15 min.The PCR products were fractionated and recovered in a 1% agarose gel, then ligated to 50ng vector pMD-18T (TIANGEN BIOTECH, Beijing, China) using T4 DNA ligase (TAKARA BIO, Japan).Recombinant plasmids were transformed into Escherichia coli JM109 competent cells [62] and clones were picked for validation through colony PCR, plasmid electrophoresis and restriction enzyme digestion (Fermentas, USA).The positive plasmids were sequenced at BGI-Shenzhen, Shenzhen, China (http://www.genomics.cn).

Evolutionary Analysis of Transposase Gene Sequence
Transposase gene sequences of sweet potato were aligned with those of other higher plants in NCBI NR database on the bases of sequence similarity.The transposase genes from various higher plants showing similarity above 80% were screened and their sequences were downloaded from NCBI (mainly from Vitis vinifera, Populus trichocarpa, Glycine max, Zea mays, Sorghum bicolor, Arabidopsis thaliana, Ricinus communis, Oryza sativa, etc.).Using software CluxtalX (2.0) and MEGA4, we determined the evolutionary distance depending on the homologous differences and drew the evolutionary diagram.

Analysis of TEs Expression in Inter-and Intra-cultivars
Expression analyses of TEs among sweet potato cultivars were carried out using the util/alignReads.plscript in Trinity software.In order to obtain the expression level of TEs in different cultivars, we aligned the reads in each of four databases to the Trinityassembled transcripts of the integrated database.As for the expression analyses of TEs in Xushu18 vegetative organ, we used the DGE tag profiling [19].According to all the tags generated from seven sequenced DGE libraries of sweet potato, including YL (young leaves), ML (mature leaves), stem, FR (fibrous roots), ITR (initial tuberous roots), ETR (expanding tuberous roots) and HTR (harvest tuberous roots), we searched CATG with the downstream 17 base pairs in the assembled TEs and the resulted 21 base pairs tags became the new expression tags related to TEs.These tags were mapped to the distinct clean tags in DGE tag profiling of transcriptome and the resulted TEs expression tags were aligned to the TE sequences using Bowtie available at the Galaxy website to detect the expression level of TEs.The edgeR package (Empirical analysis of DGE in R) was used for differential and specifical expression analysis of TEs in DGE tag profiling [63].We normalized tag distribution for gene expression level in each library to make an effective library size and extracted differentially expressed genes (DEGs) with p value 0.05 and log2 fold-change 1.And we compared libraries pair-wise and used hypergeometric test to identify differentially expressed TEs (DETEs) and specifically expressed TEs (SETEs).

Identification of TE Gene Copy Number
Real-time PCR-based absolute quantification was used to identify the copy number of transposase genes in sweet potato genome.The PCR standards of transposase genes and S8e gene was amplified by conventional PCR from sweet potato genomic DNA, respectively, purified using the E.Z.N.A TM Gel Extraction Kit (OMEGA, USA).The purified fragment was cloned into the TA-cloning site of a vector pMD-18T (TIANGEN BIOTECH, Beijing, China).The positive recombinant plasmids were purified using the QIAprep Spin Miniprep Kit (Qiagen) and linearized by restriction enzyme digestion.A dilution series is prepared from the cloned plasmids to provide a measure of absolute standard abundance to generate a standard curve.Linearized plasmid was quantified using a spectrophotometer and copy number was calculated for all standards by the following formula [34][35][36]:

|DNA concentrations g=ml Number of bases pairs|660 daltons
The SYBR Green based real-time PCR primer sets were designed using Beacon Designer 3.0 (Premier Biosoft International, CA).Primers stocks were prepared at 100 mM in TE (10mM Tris, pH 8.0, 1mM EDTA), and working solutions were diluted to 10 mM.All real-time PCR runs were performed in duplicate, and each reaction mixture was prepared using SYBR Premix Ex Taq kit (Takara).PCR amplifications were carried out in a total volume of 20ml, containing 6.4 ml PCR-grade water, 0.8 ml of each primer, 10 ml 26SYBR Premix Ex Taq, and 2.0 ml appropriately diluted template DNA.It is necessary for the unknown target genes to be diluted to a point where the resulting PCR signal follows within the linear range of the standard curve, which must be determined empirically.At least 5-fold dilution may often minimizes potentially interfering substances that inhibit PCR amplification.To minimize pipetting errors and achieve better reproducibility, a master mix of the common components should be prepared and aliquoted into each well of the sample plate.
The thermal cycling protocol was as follows: initial denaturation for 10 min at 95uC followed by 40 cycles of 5 s at 95uC, 5 s at 60uC, and 5 s at 72uC.The fluorescence signal was measured at the end of each extension step at 72uC.After the amplification, a melting peak analysis with a temperature gradient of 0.1uC per second from 60 to 95uC was performed to confirm that only the specific products were amplified.Finally, the samples were cooled down to 40uC for 30s.The baseline and amplification parameters as Ct values in each dilution were analyzed through fluorescence data automatically.The Ct values were plotted against the logarithm of their initial standard copy number.Each standard curve was generated by a linear regression of the plotted points.Real-time PCR amplifications were carried on 11 transposase genes and one single copy gene S8e simultaneously from sweet potato genomic DNA with the standard dilutions in a run.Based on each standard curve, the absolute copy number of 11 unknown transposase genes from genomic DNA per ml had been derived.And then, the transposase gene copy number in sweet potato genome was calculated by dividing the copy number of transposase gene by that of S8e gene.The copy ratio of transposase gene to single-copy S8e gene equals to the transposase gene copy number.These procedures were optimized for 96-well format using a Bio-Rad IQ detection system which use fluorescein as an internal passive reference dye for normalization of well-towell optical variation.

Figure 1 .
Figure 1.Top-Hit species distribution.1,405 BLASTX-hit TE sequences were calculated.More than 50% of the identified TEs have the highest homology with Vitis vinifera, Arabidopsis thaliana, Oryza sativa and Ipomoea tricolor.Less than 5% of the top matches hit sweet potato itself due to the limited number of the sweet potato protein sequences available in the NCBI database.doi:10.1371/journal.pone.0090895.g001

Figure 2 .Figure 3 .
Figure 2. Expressed TE number in different cultivars of sweet potato.TEs were compared pair-wisely to detect the expressed TE number in different cultivars of sweet potato.Each circle represented the TE number expressed in a certain cultivar or tissue, and the cross part meant the coexpressed TE number in both two cultivars.Abbreviations GS87-rTE meant TEs in roots of cultivar Guangshu87, XS18-fTE meant TEs in flowers of cultivar Xushu18 and XS18-vTE meant TEs in vegetative organs of cultivar Xushu18.This overlapping circle diagram was made by auto CAD 2004 software.doi:10.1371/journal.pone.0090895.g002 XS18-vTE: TEs in vegetative organs of cultivar Xushu18.XS18-fTE: TEs in flowers of cultivar Xushu18.GS87-rTE: TEs in roots of cultivar Guangshu87.JS6-rTE: TEs in roots of cultivar Jingshu6.doi:10.1371/journal.pone.0090895.t003

Figure 4 .
Figure 4. Representatives of differentially expressed TEs in different tissues and organs of Xushu18.(a) Differentially expressed TE Ib_DU_31235, which expressed extremely higher in ML than in other tissues.(b) Specially expressed TE Ib_RN_25697, which could not express in all tissues.(c) Differentially expressed TE Ib_RN_6012, which expressed extremely lower in YL than in other tissues.Abbreviations TPM means tags per million reads, the unit of gene expression level.doi:10.1371/journal.pone.0090895.g004

Figure 5 .
Figure 5. Differentially expressed TEs (DETEs) in seven tissues.Seven libraries were compared pair-wisely to detect the differentially expressed TEs between each two libraries respectively by edgeR.21 pairs of comparison were implemented and Up-(black) and down-regulated (red) DETEs were quantified.doi:10.1371/journal.pone.0090895.g005

Figure 6 .
Figure 6.Specifically expressed TEs (SETEs) in seven tissues.Seven libraries were compared pair-wisely to detect the specifically expressed TEs between each two libraries respectively.21 pairs of comparison were implemented and the SETEs between each two samples were quantified in red and black histograms (top vs. bottom).doi:10.1371/journal.pone.0090895.g006

Figure 7 .
Figure 7.The evolutionary tree diagram of 16 transposase genes in the sweet potato.The evolutionary tree diagrams were drawn depended on the gene sequences similarity between the sweet potato and other higher plants.Each species name with an alphanumeric number means the homologous sequences in different species of each TE in sweet potato with the similarity above 80% (labeled with same digital label).The 16 sweet potato TEs were marked in red boxes.(a-c) the phylogenetic relationships of TEs in three superfamilies named Mutator, PIF-harbinger and hAT.(d) the evolutionary relationships of the 16 TEs with each other.doi:10.1371/journal.pone.0090895.g007

Figure 8 .
Figure 8.The SYBR Green I based real-time PCR for absolute quantification.Amplification graph and standard curve constructed using serially diluted DNA plasmid standards of (A) S8e gene and (B) transposase gene Ib_DTM _FAR14362, each ranging from 10 6 to 10 9 copies per microliter.The baseline and amplification parameters as Ct values in each dilution were analyzed through fluorescence data automatically.doi:10.1371/journal.pone.0090895.g008

Table 1 .
Characteristics and details of four primary transcriptome databases.

Table 2 .
Transposable elements identified and collected in the integrated-transcriptome database of sweet potato.

Table 3 .
Classifications of the TEs in different sweet potato cultivars.

Table 4 .
Statistics of the digital gene expression (DGE) tag profiling of TEs in the main superfamily.

Table 5 .
Comparison of predicted and measured transposase genes from the genome of Sweet potato.