Genome-Wide Analysis of the “Cut-and-Paste” Transposons of Grapevine

Background The grapevine is a widely cultivated crop and a high number of different varieties have been selected since its domestication in the Neolithic period. Although sexual crossing has been a major driver of grapevine evolution, its vegetative propagation enhanced the impact of somatic mutations and has been important for grapevine diversity. Transposable elements are known to be major contributors to genome variability and, in particular, to somatic mutations. Thus, transposable elements have probably played a major role in grapevine domestication and evolution. The recent publication of the complete grapevine genome opens the possibility for an in deep analysis of its transposon content. Principal Findings We present here a detailed analysis of the “cut-and-paste” class II transposons present in the genome of grapevine. We characterized 1160 potentially complete grapevine transposons as well as 2086 defective copies. We report on the structure of each element, their potentiality to encode a functional transposase, and the existence of matching ESTs that could suggest their transcription. Conclusions Our results show that these elements have transduplicated and amplified cellular sequences and some of them have been domesticated and probably fulfill cellular functions. In addition, we provide evidences that the mobility of these elements has contributed to the genomic variability of this species.


Introduction
The grapevine (Vitis vinifera L.) is a widely cultivated crop that has accompanied the development of human culture since its domestication in the Neolithic period (c. 8500-4000 BC). Cultivated grapevine (Vitis vinifera spp. sativa) is supposed to have been domesticated from wild grapevine populations (Vitis vinifera spp. sylvestris Gmelin) in the Near East, from where its culture expanded through Europe [1], although recent results suggest that different domestication events took place in both East and West Europe [2,3]. The domestication of grapevine has undergone a selection for traits important for its cultivation and usage (e.g. vigor, hermaphrodite flowers, berry content and size, cluster structure). Although sexual crossing has been a major driver of grapevine evolution, its vegetative propagation enhanced the impact of somatic mutations and has been important for grapevine diversity. Clonal selection of superior individuals identified by growers has led to many clones with different phenotypes while maintaining the same cultivar [4]. Some of these mutations exist and are maintained in a chimeric state affecting only single cell layers [5], the phenotype of the plant being the result of the combination in different cells of two different genotypes.
Transposable elements (TEs) are known to be major contributors to genome variability and, in particular, to somatic mutations. Plant genomes contain high albeit variable amounts of TEs that account for 15-80% of their genome. Most plant TEs are activated in somatic cells by different biotic and abiotic stresses including wounding, and they are usually silent in germinal cells, which limits their mutagenic capacity and their ability to colonize plant genomes (e.g. [6]). The propagation of grapevine includes layering (in the native habitats), cutting of dormant and green shoots, grafting and sometimes tissue culture steps. This practice enhances the impacts of somatic mutations and possibly increases the chance of TEs to transpose and multiplicate. Thus, TEs could have been a major force creating the variability used for grapevine breeding from its domestication to present times. Indeed, the skin color in white grapes, a highly desired trait for grape berry and wine quality, has been shown to be the consequence of a retrotransposon insertion in the promoter of a Myb-related gene that regulates anthocyanin biosynthesis [7]. This mutation is present in most white grape varieties [8,9].
Transposable elements are usually classified in two major groups based on their structure and transposition mechanism: Retrotransposons or class I elements, which transpose by an RNA intermediate, and class II or DNA transposons, which use an intermediate of DNA. Up to now, in addition to Gret1, the element responsible for the grape color phenotype, two other retrotransposons have been characterized in grapevine [10,11]. On the contrary, although there is a handful of sequences of grapevine class II elements deposited in the Repbase database (www.girinst. org) up to now no DNA transposon has been characterized in detail in this plant.
Recently, two articles describing the Vitis genome have been published [12,13] and shotgun sequences of grapevine genome have been made available opening the possibility for a genomewide bioinformatical analysis. We present here a global and detailed analysis of the ''cut-and-paste'' class II transposons present in the genome of Vitis vinifera L. We characterized 1160 potentially complete grapevine transposons as well as 2086 defective copies. Our results show that these elements have transduplicated and amplified cellular sequences and some of them have probably been domesticated (i.e. have lost their ability to transpose and fulfill cellular functions, as a conventional cellular gene). In addition, we provide evidences of recent mobility of some of these elements showing the high mutagenic capacity of grapevine transposons and their capacity to induce genomic variability in this species.

Results and Discussion
The ''cut-and-paste'' transposon landscape in Vitis vinifera Most class II transposons excise from the donor site as doublestranded DNA which is reinserted elsewhere in the genome by a mechanism usually known as ''cut-and-paste'' transposition. The only class II elements that transpose by a different mechanism are Helitrons and related elements, that transpose by rolling-circle replication, Mavericks, whose transposition mechanism is not yet known [14], and the bacterial IS200/605 family of insertion sequences that transpose as a single stranded transposon circle [15,16]. ''Cut-and-paste'' class II transposons typically contain terminal inverted repeats (TIRs) and encode a transposase that catalyses their mobilization. The sequence and structure of the transposase together with the sequence of the TIRs recognized by this protein and the characteristics of the flanking target site duplication generated by the transposase upon inserting the element has been used to classify class II elements in ten different superfamilies: CACTA, hAT, Merlin, Mutator, P element, PIF, piggyBac, Tc1/Mariner, Transib and Banshee [14,17,18]. In plants, only elements belonging to the CACTA, hAT, Mutator, PIF, and Tc1/ Mariner superfamilies have been described to date [14].
We searched the grapevine genome sequence for the presence of class II transposons of the five superfamilies by means of blastx searches of the shotgun sequences made publicly available by Velasco et al. [13] and using the sequences made available later by Jaillon et al. [12] for confirmation (see Materials and Methods section for details). We have not been able to detect any grapevine sequence that could represent a Tc1-Mariner element. Although few sequences with very limited similarity (below the threshold set) to these elements exist, they probably represent old defective elements and were not included in this analysis. We found representatives of the other superfamilies of elements: CACTA, hAT, Mutator, PIF. We have characterized a total of 1160 potentially complete DNA transposons, as well as 2086 defective elements, which altogether represent 1.98% of the Vitis genome ( Table 1).
The two recent reports on the draft sequence of the genome of Vitis vinifera spp. sativa contain a general analysis giving an overview of the transposon content in this genome [12,13]. Both reports predict higher copy numbers of DNA-transposon-related sequences (6,344 and 9,562 respectively) compared to our results, but with substantially lower transposon content in terms of genome fraction (0.43% and 1.6% respectively). The reported mean length of the described copies is low (0.3 Kb/element and 0.9 Kb/element respectively), possibly because the characterized sequences are limited to the well conserved coding regions of TEs and thus miss most of the transposon sequences which are non-coding. We have performed a stringent search and have characterized these elements in their full sequence (up to the TIRs when present) omitting only TEs deleted copies representing less than 20% of the length of the complete TE representative for each family. Employing these parameters for analysis is crucial to research the structure and possible mobility of TEs, and analyze their capacity to transduplicate sequences or become domesticated. Our analysis shows the mean TE length of 3.3 Kb/element, which is more than three times bigger when compared with previous reports.
In order to get insight on the evolutionary dynamics of class II TEs in grapevine we conducted a detailed TE analysis: For each superfamily we have compared the protein sequence of the putative transposase of all elements containing a transposase conserved region characteristic of this superfamily (see Methods for details). Maximum likelihood trees were generated from protein sequence alignments which allowed us to define different families for each transposon superfamily. We have analyzed the presence of STOP codons and frameshifts in the potential ORFs as well as the existence of ESTs in the grapevine databases that could suggest transcription of transposases and possible transpositional activity. Defective elements were identified for each family by blastn analyses using representatives of complete TEs as queries.
hAT is the most prevalent superfamily of transposons in grapevine We have found 1459 hAT-related elements in the grapevine genome, which makes hATs as the most prevalent ''cut-and-paste'' transposon family in grapevine in terms of copy number ( Table 1). The phylogenetic analysis of these elements showed that they can be grouped in different families ( Figure 1 and Table 2  S1). Most of these families include a high copy number of both potentially complete and defective elements. Single copy elements were found as well. These elements possibly represent domesticated transposases and are discussed in a separate chapter (see below). The hAT elements belonging to the high copy number families contain TIRs of 8-23 bp, with sequences similar to that of typical hATs [19], and are flanked by TSDs of 8 bp, as expected for elements of this superfamily [19]. The hAT superfamily is relatively ancient and is widespread in eukaryote genomes [19]. Thus, the high variability of grapevine hATs, and the high proportion of defective elements is not unexpected. However, our results show that some grapevine hAT families contain potentially complete elements with the capacity to encode a transposase ( Table 2), suggesting that some hATs could have maintained the capacity to transpose. This is the case of Hatvine-1, Hatvine-2, Hatvine-7, Hatvine-9 and Hatvine-10 families that contain a high number of potentially complete elements with intact ORFs and match to transcripts in the grapevine EST collections ( Table 2).
CACTA is the less active superfamily of transposons in grapevine CACTA elements are the most abundant class II elements in Brassica oleracea [20] and also seem to be highly abundant in Triticum [21] while they are much less abundant in Arabidopsis [20] where they have been found almost exclusively in pericentromeric regions [22]. In grapevine we have found only 364 CACTA elements, one third of which are potentially complete (Table 3 and Dataset S2). However, as grapevine CACTAs are very long (ranging from 10 to 25 Kb) these elements account for a significant fraction of the grapevine genome (0.34%). The high diversity of the CACTA superfamily in grapevine, which can be divided in at least nine different families, and the low number of elements having an intact transposase-encoding ORF, suggests that grapevine CACTA are relatively old elements, and most of them are probably defective. Moreover, grapevine databases contain a low number of EST sequences corresponding to the CACTA elements described here, suggesting that most of them are probably silent at present. Of the nine CACTA families only Cactavine-2, Cactavine-5 and Cactavine-13 seem to have retained the capacity to be transcribed (Table 3). Interestingly these subfamilies are phylogenetically related and may have arisen recently during grapevine evolution ( Figure 2).

Grapevine contains elements of the three major MULE families MuDR, Jittery and Hop
The Mutator superfamily (named after the Mutator (Mu) element in maize [23]) is a highly abundant and diverse superfamily of class II elements in plants [24]. Elements belonging to the Mutator superfamily are generally called Mutator-like elements (MULEs). They are the most abundant transposons in many plant genomes such as Arabidopsis thaliana [20], Lotus japonicus [25] and Oryza sativa [26,27]. While most autonomous MULEs encode a protein similar to the MURA transposase of the MuDR transposon (the autonomous version of the maize Mu element), two other families of MULEs distantly related to MuDR have been recently reported in plants. The Jittery family described in maize [28] and shown later to be present also in other plants [25] and a family related to the fungal Hop element [29] which in plants has so far only been found in legumes [25]. As the three subfamilies are only distantly related we have performed an independent search for MuDR-like elements and for elements related to the Jittery and Hop subfamilies. A high  Table 2. List of hAT-related families of transposons characterized in Vitis vinifera.  Table 3. List of CACTA-related families of transposons characterized in Vitis vinifera.  (Table 4 and Dataset S3). We have characterized a total of 1172 MULEs belonging to high copy number families, 30% probably corresponding to full-length elements ( Figure 3 and Table 4). Most MuDR-like elements belonging to the high copy number families lack an intact transposase-encoding ORF and very few of them are represented in the grapevine EST collections ( Table 4), suggesting that they are old elements that mostly have lost the capacity to transpose. The Mutavine-1 and Mutavine-17 families could be exceptions as judged by the number of ESTs corresponding to these elements found in the grapevine databases and the existence of several elements with conserved transposase ORFs (Table 4). We have only been able to find the TSDs for a subset of MULEs, probably because of the older age of grapevine MULE insertions. However when present the TSD are always of 9 nt which is typical for MULEs in other plant genomes. Typically, MULEs have long TIRs, although a fraction of them do not [30,31]. 40% of the MULEs reported here (Mutavine-5, Mutavine-6, Mutavine-11, Mutavine-13, Mutavine-14 and Mutavine-17 families) do not contain TIRs, which is similar to what has been reported for Arabidopsis where one third of the MULEs are devoid of TIRs [30,31]. Some of these MULE families are relatively old, and the absence of recognizable TIRs could simply be due to the effect of mutations. Nevertheless in some cases, like for the Mutavine-6 family, clear 9 nt-long TSDs were found, suggesting that these elements were mobilized in spite of their absence of TIRs, confirming the evidence found in Arabidopsis that non-TIR MULEs could be mobile [31]. It is interesting to note that the grapevine non-TIR MULE families do not form a monophyletic branch in a transposase-based tree ( Figure 3A), suggesting a different phyloge-netic history of the transposase-encoding sequences and the TIRs. This stresses the enormous variability of MULEs and their particular evolutionary dynamics [24].
In addition to the MuDR-like MULEs, we have found two multi-copy families of the MULEs phylogenetically related to Jittery-like elements and one multi-copy family, Hopvine-1, phylogenetically related to Hop, ( Figure 3B). While Jittery elements have been found to be present in various plant genomes, up to now Hop-like transposons were found only in fungi and in legumes, and it has been proposed that they may have arisen during the emergence of the legume family through an ancient horizontal transfer event between fungus and legume ancestor [25]. Our results show that the Hop family of MULEs is more widely distributed in plants than previously thought and suggest that if these elements have been introduced into plants by fungal infections, these would have occurred several times in the evolution and would affected different plant genera. Alternatively, Hop elements may be an old family in plants that has been lost in most genomes except in legumes and some other species like Vitis vinifera. The fact that none of the 9 copies of Hopvine-1 contains an uninterrupted ORF potentially coding for a transposase and that we have not detected any corresponding EST in the grapevine databases suggest that these elements are relatively old and have lost their capacity to be expressed and to transpose. On the contrary, the two Jittery-like families here characterized Jitvine-1 and Jitvine-2, are expressed and could have maintained their capacity to transpose. Both families (particularly Jitvine-1) contain elements potentially coding for a transposase and the grapevine databases contain several ESTs that could correspond to these elements (Table 4).   Grapevine contains potentially active PIF but not Pong elements We have found a total of 236 PIF/Pong-related sequences in the grapevine genome. Pong elements have been shown to have undergone recent amplification in Arabidopsis and to a higher extend in Brassica oleracea whereas PIF elements have not been significantly amplified in both genomes [20]. The opposite was found in the genome of grapevine: PIF elements have attained a moderate copy number while no Pong element has been maintained in this genome (Figure 4). The analysis of the 236 grapevine PIFs shows that 93 of these elements are potentially complete, 24 of which have intact ORFs (Table 1 and 5; Dataset S4), which is the highest proportion of intact ORFs among all superfamilies analyzed in our study and strongly indicates that PIF elements have amplified recently during grapevine evolution. The phylogenetic analysis show that the grapevine PIFs group into four families and do not plot together to the families previously defined in other plant genomes [32] (Figure 4). This confirms a recent grapevine specific amplification of PIF elements. Moreover, these elements have conserved TIRs and TSDs (mostly TAA or TTA trinucleotides), have maintained the capacity to code for a transposase as well as the second ORF usually found in PIF elements and known as ORF1 or PIFp2 [32][33][34] (Table 5) and the grapevine database contains a relevant number of ESTs corresponding to PIF elements, especially from the Pifvine-3 and Pifvine-4 families ( Table 5) confirming that these elements are transcribed and potentially active. Transduplicated cellular gene fragments are present in all superfamilies of Vitis class II elements Transposons can capture host genome sequences and mobilize and amplify them together with their own sequences in a process known as transduplication. Although most of these captured gene fragments seem to be non-functional pseudogenes [31], it has been recently reported that in some cases transduplicated exons could be incorporated into host transcripts by alternative splicing giving rise to new host proteins [35]. Even having lost their coding capacity, transduplicated sequences may undergo transcription and have a regulatory function [31].
MULEs have been shown to frequently capture gene fragments and form Pack-MULEs [36]. MULEs containing transduplicated gene fragments have been reported in Arabidopsis [31,37], Lotus japonicus [25], melon [38], and rice, were they reach a very high copy number [26,36]. A particular case is the Arabidopsis KAONASHI-MULE (KI-MULE), a non-TIR MULE found in high copy number that contains a cystein protease domain of 200 amino acids found in ubiquitin-like protein-specific protease (ULP) [31]. In KI-MULEs, the ULP protease domain is found in the reverse orientation with respect to the mudrA gene. However, examples of ULP-containing MULEs in both direct and reverse orientation have been described also in melon and rice [38]. In addition, the ULP domain in melon can be found in TIR-MULEs and in the distantly related Jittery-like MULEs [38]. Our results show that several MULE families identified in grapevine contain sequences with high similarity to ULP genes downstream of the TPase encoding ORF. The ULP coding sequence is found in both orientations in both TIR-MULEs and non-TIR MULEs (Table 4). In addition to MuDR-like MULEs, some Jittery-like families of grapevine MULEs also contain ULP coding sequences downstream of the transposase ORF (Figure 3). The MULE families containing ULP sequences did not form a monophyletic group (Figures 2A and 2B). In fact, the ULP sequences are found in distantly related elements (MuDR-like and Jittery-like), being  absent in other closely related families, and their presence does not correlate either with the presence or the absence of TIRs, suggesting that ULP transduplication by MULEs is a frequent phenomenon that has occurred independently several times during plant genome evolution. Alternatively, ULP sequences may be frequently lost from MULEs. In addition to MULEs, CACTA elements have also shown to transduplicate cellular genes [39,40], although up to know none has been reported to contain an ULP transduplicated domain. We have found ULP domains in five CACTA families (Cactavine-2, Cactavine-3, Cactavine-4, Cactavine-5 and Cactavine-13). We have searched in NCBI for proteins containing the same conserved domain structures as the CACTA-ULP found in grapevine and found several proteins from rice that have the Tnp2 and the ULP1 domains. Therefore it appears that CACTA-ULPs are common in plants (although perhaps not equally abundant or functional in all genomes since we did not find any similar proteins in Arabidopsis or Medicago which are genetically closer to Vitis than rice is). This also suggests a special ''affinity'' of the ULP domain to transposons in general.
ULP transduplication is only one example of transduplication. Other genic or non-genic sequences could be ''captured'' by TEs. For example, in the Mutators we have found a family containing intronic and exonic sequences of a putative cellulose synthase gene ( Figure 5). In the CACTAs, two copies of the Cactavine-5 family contain part of the coding sequence and the 39 untranslated region of a gene encoding for an unknown protein that contains a pentatricopeptide repeat (PPR) domain ( Figure 5). This sequence, located downstream of the transposase encoding ORF is found in opposite orientation, and in case of being transcribed from the transposon promoter, would give rise to a transcript antisense to PPR genes with potential regulatory functions.
Although transduplication has only been reported for MULEs and CACTA elements in plants, the fact that some of the PIF and hAT elements here described are unusually long has prompted us to analyze whether these elements contain transduplicated sequences as well. We have analyzed the elements of the Pifvine-3 family because they very frequently contain a long 59 region (up to 3.5 kb) that do not correspond to the canonical ORF1 nor transposase coding regions characteristic for these elements. The analysis of these sequences showed that in most cases they share high sequence identity to grapevine genome sequences (including exons and introns) ( Figure 5). These transduplications are shared in some cases by multiple copies suggesting that they do not inactivate the transposition of PIF elements. Elements of the hATfamily Hatvine-6 share a similar transposase coding sequence and the TIRs, but the rest of the sequence is often unique, or it is shared by only few elements. Analysis of the variable region of Hatvine-6 elements revealed that these sequences often share high sequence identity to genic (introns and exons) as well as non-genic grapevine sequences ( Figure 5).
Our results show that transduplications are common in grapevine TEs of all superfamilies. We suggest that most plant TEs share this ability as well. Because of their complicated structures and the difficulties to assemble an automated pipeline for their detection, transduplication events are not routinely reported in TE analyses. Thorough analyses, such as the one presented here, are needed to correctly characterize TEs and describe phenomena like the transduplication of cellular sequences.

MULE and hAT domesticated transposons
Transposons can lose their ability to transpose and be a source of cellular genes in a process known as domestication. Transposases are specific DNA-binding proteins that catalyze DNA cleavage and strand transfer reactions needed for transposition. Both the DNA binding and the catalytic activity of transposases can be domesticated to give rise to cellular genes [41]. Examples of plant domesticated transposases are the Arabidopsis transcription factors FAR1 and FHY3, derived from MULE transposases [42,43] or DAYSLEEPER, a gene essential for Arabidopsis development which probably encodes a transcription factor derived from a hAT transposase [44]. Other domesticated transposons of unknown function are the MUSTANG and the Gary elements, the former originated from MULE and the later from hAT transposons [45,46]. Domesticated transposons are not able to transpose, and for this reason they are in general present as single-copy genes and do not contain TIRs or TSDs.
Five hAT-like sequences found in our search are present in single copy and lack TIRs and TSDs: Vinesleeper-1, Vinesleeper-2, Hatvine-4, Hatvine-5 and Hatvine-8. The Vinesleeper-1 and Vinesleeper-2 elements are phylogenetically closely related to the Arabidopsis DAYSLEE-PER ( Figure 1) and one of them could be its grapevine orthologue. All 4 ESTs corresponding to Vinesleeper-1 derive from flower tissues and most of the 11 ESTs corresponding to Vinesleeper-2 are obtained from different tissues of different developmental stages (Table S1) which suggest a pattern of expression for both genes compatible with a developmentally related function similar to that of DAYSLEEPER from Arabidopsis [44]. The fact that the grapevine genome contains two potential orthologues for DAYSLEEPER suggests that this gene has been duplicated during grapevine evolution and, because of different numbers and origins of corresponding ESTs, the two genes might have diverged to fulfill specialized functions. The other putative domesticated hAT-like transposases Hatvine-4, Hatvine-5, and Hatvine-8 are not phylogenetically related to DAYSLEEPER nor the previously characterized Gary element [46]. Hatvine-8 has a non-functional and partially deleted TPase gene which did not allow its alignment and phylogenetical analysis with other members of the hAT superfamily, while Hatvine-4 seems to lack a start codon in its ORF. However, Hatvine-5 has an intact ORF which matches to transcripts deriving from berry tissue (Table S1) that could be compatible with this element being a domesticated transposase with a function in fruit-related processes.
We have also found MULE-related sequences as candidates for domesticated transposases because of their presence in single copy and lack of TIRs or TSDs (Table 4). These elements belong to the MuDR, Jittery and Hop families. The MuDR-like elements are phylogenetically closely related to the MUSTANG elements previously described in Arabidopsis and sugarcane [45,47] ( Figure 3A) and could be the grapevine orthologues of these genes. We have found grapevine ESTs accumulating in different organs and parts of the plant matching to most of these elements (Table 4 and Table S1) which suggests a pattern of expression similar to that of the Arabidopsis and sugarcane MUSTANGs [45,47]. Five single copy elements belonging to the Jittery family (named Jithouse) have been identified ( Figure 3B and Table 4) to potentially encode for proteins containing the three domains found in FAR1/FHY3-domesticated transposases (N-terminal C2H2type zinc-chelating motif of the WRKY-GCM1 family, a central putative core transposase domain and a C-terminal SWIM motif [43]). A recent report has identified 4 out of 5 elements described here as FRS3-related FAR1/FHY3 genes [43]. Although the sequence of Jithouse-4 was not included in that report, its phylogenetical relationship to the other four elements ( Figure 3B) suggests that this is also a FAR1/FHY3-related domesticated transposase. Finally, we found one potential domesticated transposase of the Hop family, the Hopvine-2 element present in a single copy and lacks TIRs and TSDs flanking the coding region. The corresponding EST matching to its ORF suggests that Hopvine-2 be a transposase-related functional gene.
Although the number of ESTs present in grapevine databases is limited for extended expression pattern studies of each putative domesticated element identified, we think the specific nature of these elements could be confirmed. TEs are induced under stress situations, while domesticated transposons lack such a biased expression, most domesticated transposases playing a role in developmentally related processes. 22% of the ESTs corresponding to the putative domesticated transposases here described belong to EST collections obtained from stressed material, which is almost exactly the percentage of the stress-related EST collections in the total grapevine EST databases (23%). Contrastingly, 77% of the ESTs corresponding to potentially mobile transposons are obtained from stressed material which is significantly more than expected (x 2 test, pvalue,0.0001). This difference in expression confirms the classification as true transposons and domesticated transposases made here based on molecular characteristics.

Insertion polymorphisms of grapevine cut-and-paste transposons revealed by PCR
The results presented here show that a high number of grapevine transposons have maintained the capacity to encode a transposase and are expressed under particular situations, suggesting that they may have retained the capacity to transpose. In order to get more information on the possible mobility of these elements, we looked for insertion polymorphisms of eleven of these elements among seven grapevine cultivars. We have also included in this analysis four putative domesticated elements which are supposed to have lost their ability to transpose. The presence of a given element at a particular location in the genome was revealed by a PCR amplification using a primer complementary to the internal region of the TE and a primer designed in the flanking region. To check for the absence of a given element at a particular location we performed PCR amplifications with two primers complementary to the regions flanking the element at both sides (see Materials and Methods for details). Some randomly chosen bands were sequenced to confirm the nature of the amplification products.
None of the four putative domesticated transposases analyzed showed insertion polymorphisms ( Figure 6, bottom panel). Taking into account the high heterozygosity of grapevine this result suggests that domesticated transposons fulfill important cellular roles and have been under strong selective pressure for their maintenance. On the contrary, all but one (Hatvine-7.1) transposon insertions analyzed are polymorphic (8 examples are shown in Figure 6, top and middle panels). This could suggest that most transposon insertions are not under strong selective pressure and are randomly distributed among cultivars. Alternatively, this result may also indicate that some of these insertions are recent and have not had time to become fixed. In particular, Pifvine-2 insertions could be relatively recent (possibly after the domestication of grapevine), as only two out of seven cultivars contain the insertion at this particular locus ( Figure 6). In some cases we obtained multiple bands, or products with unexpected sizes. The sequence of the unexpectedly small bands of the Pifvine-2 empty sites (for samples 4 and 6) and the unusually bigger band of the Pifvine-3 empty site (sample 5) revealed sequence polymorphisms  (6) and cabernet Carbon (7). ''+'' indicate the insertion at a given locus, while ''2'' indicate an empty site. Arrows indicate the expected size of the band. Numbers are grapevine cultivars (in the same order as given in Table S2). doi:10.1371/journal.pone.0003107.g006 unrelated to the transposition of the elements here reported. In the case of Pifvine-2 we found a 154 bp-long deletion present 216 bp downstream of the target site, while in the case of Pifvine-3 there is an insertion of a putative SINE element (155 bp-long with 13 bplong TSDs) 22 bp after the target site.
This results thus show that a high proportion of grapevine ''cutand-paste'' transposons have recently transposed during grapevine evolution, accompanying its domestication and breeding processare polymorphic and contribute to the high variability of grapevine genome.

Conclusions
We have performed a detailed analysis of the ''cut-and-paste'' transposons of Vitis vinifera L, and found that this genome contains elements belonging to four of the five superfamilies of elements described in plants, hAT, CACTA, Mutator and PIF. hAT and Mutator superfamilies are the most prevalent in grapevine, while CACTA is probably the superfamily that has had the less activity in the recent grapevine genome evolution. The presence of TSDs, intact ORFs and high number of corresponding ESTs, as well as the high frequency of insertion polymorphisms among different grapevine cultivars show that these elements have transposed recently during grapevine evolution and suggests that some of them may have retained the capacity to transpose. On the contrary, the genome of grapevine also contains an important number of domesticated transposases belonging to different superfamilies that have lost the ability to transpose and probably fulfill cellular functions. Additionally, we found that transduplication of gene fragments is not restricted only to MULEs and CACTAs but can occur in other superfamilies as well. Our results show that, as in most complex genomes, TEs have made an important contribution to grapevine genome evolution and variation today.

Transposon mining
We performed our analyses using the whole genome shotgun sequences of the grapevine genome made available at NCBI by Velasco et al. in January 2007 [13]. Sequences from Jaillon et al. [12] were made available at NCBI after we had started with our analyses and were used as confirmation references. As a first approach to characterize grapevine class II ''copy-and-paste'' transposons we used a homology-based strategy to look for sequences with similarities with known transposases. We retrieved protein sequences of plants from NCBI (in May 2007) using keywords as ''transposase'' or class II superfamily names like ''Mutator'', ''MUDRA'', ''CACTA'', ''hAT'' etc. We grouped the retrieved transposase sequences into belonging superfamilies and performed a blastx search [48] with the grapevine genome shotguns as queries. We considered all shotguns having an evalue lower than 1610 250 for their best TPase hit. These shotguns were manually checked and the putative TPase was analyzed. TPase genes were characterized by blastx of the shotgun of interest to the whole NCBI protein database. In this way, similarities with nonannotated proteins could be determined as well. As both [12] and [13] performed computational gene predictions, the NCBI contains a significant number of predicted (but not annotated) Vitis proteins which were useful to precisely determine the borders of putative TPase for each TE family analyzed. The TPase regions with several kb of flanking sequence were blasted against the whole Vitis shotgun database to determine the full length or the borders of the element. TIRs were manually looked for, or by using the FastPCR software (Kalendar 2006, www.biocenter.helsinki.fi/bi/programs/fastpcr. htm). By blasting the putative full length element to the Vitis whole genome shotgun database we could also find non-autonomous or deleted elements of the same family which have lost the TPase gene. To quantify all sequences belonging to the same family we used a full length element as query and considered all fragments with at least 80% identity and having at least 20% of the query length. We used the rule of .80% sequence similarity to group elements into the same family.

Phylogeny of the TEs
Each TE superfamily was phylogenetically analyzed to determine the number and relationships of the families and to compare them to some known elements from other plants. We aligned amino-acid sequences of conserved TPase regions using ClustalW algorithm [49] implemented in the BioEdit software [50]. PHYML software [51] was used to build phylogenies using maximum likelihood with the JTT model of evolution, four substitution rate categories, fixed proportion of invariable sites and non parametric bootstrap analysis of 100 replicates.
For the hAT superfamily we used a 39 aa-long region as in [52]. For comparison with -Vitis elements -we included the following hAT TEs in the phylogenetical tree: AC9 (accession No X05424), Bg (accession No X56877), Tag1 (accession No AAC25101), Tam3 (accession No X55078). We also included the domesticated TPases DAYSLEEPER [44] and r-gary1 [46]. The multiple alignments are given in Dataset S5.
For the Mutator superfamily we used amino-acid fragments homologous to MURA between positions 468 and 640 as in Saccaro et al., [47] . For comparison with Vitis elements we included MURA, TE165, OsMUG1, SCMUG263, SCMUG228, AtMUG1, AtMUG05a, AtMUG05b, AtMUG03b, AtMUG03c [47] and MuDR2_OS [53]. The multiple alignments are given in Dataset S7. Comparison between MuDR-like and Jittery/Hop-like elements was possible only by comparing the amino-acid fragments homologous to Jittery TPase (accession No AAF66982) between positions 217 and 343 and Hop (accession No AAP31248.1) between positions 203 and 331. The only MuDRlike elements form Vitis that could be aligned with Jittery and Hop were Mutavine-1, 12 and 14 as well as MUGvine-5. The multiple alignments are given in Dataset S8.
For the PIF superfamily we used amino-acid fragments as described in Figure 1 in Zhang et al. [32] . For comparison with Vitis elements we included Os_Pong and Os_Ping and representatives from each PIF cluster from the Figure 3 in Zhang et al. [32]: HvBF628721 for cluster A1, ShAY362818 for cluster A2, AtAC007123 for cluster A3, LjAP004528 for cluster A4, Zm_PIF for cluster A5, BoBH561775 for cluster B, BoBH485472 for cluster C and ZmAF072725 for cluster D. In addition we included Harbinger [54]. The multiple alignments are given in Dataset S9.
All trees were visualized using MEGA version 3.1. [56] Submission to Repbase Reports For some families having true full length individual copies (with TSDs and/or TIRs and the coding region) consensus sequences were created and submitted to Repbase Reports (http://www. girinst.org/repbase/). Names were changed according to the new Repbase nomenclature (Tables 1-5).

Plant material
A list of samples and their source is given in Table S2. DNA from all samples was extracted using E.Z.N.A. SP Plant DNA Mini Kit (Omega Bio-tek).

PCR analysis
Primers were designed using FastPCR software (Kalendar 2006, www.biocenter.helsinki.fi/bi/programs/fastpcr.htm). Each primer was blasted against the whole Vitis genomic database to check for specificity. The list of primers is given in Table S3. PCRs were done in 20 ml reaction volumes using approximately 30 ng of template DNA, 0.5 ml of each primer (10 pmol/ml), and TaKaRa Ex Taq in the following conditions: 94 uC?2 min 21 +406(94 uC?25 s 21 , 59 uC?45 s 21 , 72 uC?1 min 21 )+72 uC?5 min 21 . PCR products were run in 1.2% agarose gels with EtBr in a 16 TAE buffer and visualized under UV light.