Retrotransposons with long terminal repeats (LTR-RTs) are widespread mobile elements in eukaryotic genomes. We obtained a total of 81 partial LTR-RT sequences from lentil corresponding to internal retrotransposon components and LTRs. Sequences were obtained by PCR from genomic DNA. Approximately 37% of the LTR-RT internal sequences presented premature stop codons, indicating that these elements must be non-autonomous. LTR sequences were obtained using the iPBS technique, which amplifies sequences between LTR-RTs. A total of 193 retrotransposon-derived genetic markers, mainly iPBS, were used to obtain a genetic linkage map from 94 F7 recombinant inbred lines derived from the cross between the cultivar Lupa and the wild ancestor L. culinaris subsp. orientalis. The genetic map included 136 markers located in eight linkage groups. Clusters of tightly linked retrotransposon-derived markers were detected in linkage groups LG1, LG2, and LG6, denoting a non-random genomic distribution. Phylogenetic analyses identified the LTR-RT families to which the internal and LTR sequences belong. Ty3-gypsy elements were more frequent than Ty1-copia, mainly due to the high frequency of Ogre elements in lentil, as also occurs in other species of the tribe Vicieae. LTR and internal sequences were used to analyze in silico their distribution among the contigs of the lentil draft genome. Up to 8.8% of the lentil contigs contained at least one sequence similar to an LTR-RT. A statistical analysis suggested a non-random distribution of these elements within the lentil genome. In most cases (between 97% and 72%, depending on the LTR-RT type) none of the internal sequences was detected between a pair of LTR sequences, suggesting that defective and non-autonomous LTR-RTs are very frequent in lentil.
The results indicate that LTR-RTs are abundant and widespread throughout the lentil genome and that they are a suitable source of genetic markers for further genetic analyses.
Citation: Rey-Baños R, Sáenz de Miera LE, García P, Pérez de la Vega M (2017) Obtaining retrotransposon sequences, analysis of their genomic distribution and use of retrotransposon-derived genetic markers in lentil (Lens culinaris Medik.). PLoS ONE 12(4): e0176728. https://doi.org/10.1371/journal.pone.0176728
Editor: Ruslan Kalendar, University of Helsinki, FINLAND
Received: November 15, 2016; Accepted: April 14, 2017; Published: April 27, 2017
Copyright: © 2017 Rey-Baños et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting Information files.
Funding: The authors acknowledge financial assistance from the Spanish Ministerio de Ciencia y Tecnología (grant AGL2009-07853/AGR), the Spanish Ministerio de Economía y Competitividad (grant AGL2013-44714-R, co-financed with FEDER funds) and the Junta de Castilla y León to Research Groups (GR113). The Ph.D. work of RRB was supported by a personal grant from the Universidad de León, Spain.
Competing interests: The authors have declared that no competing interests exist.
Lentil (Lens culinaris Medik. subsp. culinaris) is one of the earliest domesticated plant species in the Fertile Crescent. It is a diploid (2n = 14), self-pollinated, annual cool-season grain legume normally grown in temperate semi-arid regions, usually in rotation with cereals. It plays an important role in human nutrition and in soil improvement, helping to replenish soil nitrogen levels. The crop is now widely cultivated throughout Western Asia, Northern Africa, the Indian subcontinent, Australia and North America [1–2]. This species is included in the tribe Fabeae or Vicieae, which also includes the genera Lathyrus, Pisum, Vavilovia and Vicia [3–4].
Transposable elements (TEs) are DNA sequences that can insert into new chromosomal locations and often make duplicate copies of themselves in the process. TEs are the single largest component of most eukaryotic genomes, although active elements comprise only a small minority of the genomic TE complement in most multicellular organisms. Eukaryotic TEs are divided into two classes according to whether their transposition intermediate is RNA (class 1, or retrotransposons) or DNA (class 2). Class 1 elements are classified into two groups depending upon the presence or absence of long terminal repeats (LTRs): LTR retrotransposons and non-LTR retrotransposons [5–6] (see Fig 1). Retrotransposons (RTs) are the most abundant and widespread class of eukaryotic TEs and are widely distributed along plant genomes [6–7]. The percentage of the plant genome represented by RTs in assembled genomes was found to range from 7.0% in Populus trichocarpa to 75.6% in Zea mays. In the legume model species Medicago truncatula, RT coverage was 26%, and in chickpea (Cicer arietinum) it amounted to 49%, while in Vicieae species RTs comprised up to 81% of the nuclear genome.
The main characteristics of autonomous and non-autonomous elements are represented. LTR-retrotransposons have long terminal repeats (LTRs) in direct orientation. Autonomous elements contain at least two genes, called gag and pol. The gag gene encodes a capsid-like protein and the pol gene encodes a polyprotein that is responsible for protease (PR), reverse transcriptase (RT), RNase H (RH) and integrase (INT) activities. PBS, primer binding site; PPT, polypurine tract. Non-autonomous elements, such as large retrotransposon derivatives (LARDs) and terminal repeat retrotransposons in miniature (TRIMs), lack most or all coding sequence. Non-LTR retrotransposons are divided into long interspersed nuclear elements (LINEs) and short interspersed nuclear elements (SINEs). LINE coding regions include a gag-like protein (ORF), an endonuclease (EN) and reverse transcriptase (RT). Both LINEs and SINEs usually terminate in a poly(A) sequence. Thick lines below the elements indicate the sequences amplified in lentil in this work; the first letter c in the nomenclature indicates that the sequence was identified as a copia element and g as a gypsy element. Drawings are not to scale.
LTR retrotransposons (LTR-RTs) are further classified into the Ty1-copia and Ty3-gypsy families, which differ from each other in both their degree of sequence similarity and the order of their encoded gene products, and into two groups almost exclusive to plants, LARDs (LArge Retrotransposon Derivatives) and TRIMs (Terminal-repeat Retrotransposons In Miniature) [10–11]. LARDs are approximately 4.4 kb long but contain no open reading frames encoding typical retrotransposon proteins, while TRIMs have terminal direct repeat sequences that encompass an internal domain of 100–300 bp (Fig 1).
Among the genomes of species belonging to the tribe Vicieae, LTR-RT elements are predominant, reaching approximately 140,000 copies per genome in pea (Pisum sativum); Ty3-gypsy elements are less diverse and have accumulated to a higher copy number compared to Ty1-copia elements. This is in part due to a large proportion of Ogre-like retrotransposons, included in the Ty3-gypsy elements, which alone can make up more than 50% of the genome in some species of this tribe. Ogre elements are exceptionally large (reaching up to 25 kb) and possess several specific features. The Ty1-copia group elements are somewhat less abundant, with only Maximus/SIRE elements reaching the abundance of some of the Ty3-gypsy lineages, and LARD and TRIM elements are present in even lower amounts [12–14].
The relatively high copy number of retrotransposons, coupled with their high mobility within the genome, has generated a relatively high number of polymorphisms. Thus, retrotransposon sequences have been used to develop several types of genetic markers such as sequence-specific amplified polymorphisms (SSAP), also known as transposon display, inter-retrotransposon amplified polymorphisms (IRAP), retrotransposon-microsatellite amplified polymorphisms (REMAP), retrotransposon-based insertion polymorphisms (RBIP), inter-primer binding site (iPBS) and others [15–18]. These markers have been used for phylogenetic studies and have been included in numerous genetic maps of several crop plant species, having proved their utility in the molecular dissection of plant genomes, genetics and breeding. In the Vicieae species they have been used in Lens diversity and phylogenetic studies as well as in other genera [22–28].
The aims of this work were to identify some of the LTR-RT elements present in lentil (Lens culinaris Medik.) by partial element sequencing and to examine their distribution throughout the lentil draft genome, as well as to generate retrotransposon-derived markers in this species for their inclusion in genetic maps. The resulting markers can be useful in future genetic studies and in lentil breeding by marker-assisted selection, and the LTR-RT sequences can also aid in the assembly of the lentil draft genome.
Material and methods
The material used for the genetic mapping was a set of 94 F7 recombinant inbred lines (RILs) derived from the cross between the cultivar Lupa and the wild lentil ancestor L. culinaris subsp. orientalis (Boiss.) Ponert (Spanish Germplasm Bank accession BG16880). The DNA sequences of the LTR and the internal RT components were derived from the cultivar Lupa.
DNA was extracted from leaves of two- to three-week-old seedlings with the DNeasy Plant Mini Kit (Qiagen) following the manufacturer’s instructions. DNA quantification was carried out with a NanoDrop ND-1000 (Thermo Fisher). Putative retrotransposon sequences were amplified from genomic DNA by PCR, the results were analyzed on 1–2% agarose gels, and the selected bands were isolated, cloned and sequenced using the Sanger method and capillary electrophoresis (MegaBACE 500, Amersham Biosciences). The PCR, agarose gel electrophoresis, cloning and sequencing techniques used were standard methods. Reverse transcriptase sequences were amplified using the degenerate primers designed for Ty1-copia elements, for Ty3-gypsy elements, or designed from a lentil Ty1-copia retrotransposon, Tnana (S1 Table).
The iPBS markers were amplified using 12 primers previously described (S1 Table). The inter-primer binding site technique uses a single primer (sometimes two) to amplify the sequences enclosed between the PBS sites of two nearby LTR-RTs displaying inverted orientations. Additional markers were obtained with a primer described by Hamwieh et al. and two microsatellite primers, (AC)10 and (GT)10, in combination with primers derived from lentil retrotransposons (S1 Table). Three additional partial sequences corresponding to the RNase H were obtained using previously described primers. Lentil genomic DNA was digested with MseI; MseI adapters were then added and linked to the corresponding primers used for the PCR reactions (S1 Table). The GenBank accession numbers of the lentil sequences are KX871706 for Tnana and KX889312 to KX889392 for the rest.
Genetic maps were obtained with the software packages MapMaker v 3.0b, CarthaGene 1.3.beta and MapChart v 2.2. Evolutionary analyses were conducted in MEGA7. Dendrograms were obtained using the neighbor-joining algorithm, with the Tamura 3-parameter distance and a gamma distribution to account for substitution rate differences among sites.
Some of the lentil retrotransposon sequences, internal or from the LTRs, were used to carry out an in silico search for homologous sequences in the current lentil draft genome v0.8. The search was carried out using BLASTn v 2.3.0, with the “outformat” option that reports a single line per query. Since the sequences used as in silico “probes” were derived from different retroelement parts, two or more of them could identify the same genomic element by BLAST. To prevent different “probes” from returning repeated hits of the same element, the following procedure was carried out: 1) within each contig, all of the sequences identified were aligned on a single strand; 2) within each contig, the sequences were ordered considering the “start” and the “end” sites; 3) two or more hits were considered part of the same retroelement if they overlapped or if the distance between the consecutive “end” and “start” sites was no longer than a set distance; for the non-overlapping sequences, four distance categories were considered: 10, 1,000, 10,000 or 50,000 bp. Statistical analyses were performed with the R software.
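The three-step procedure for collapsing redundant hits can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script used in this work: the function name `merge_hits` and the `(contig, start, end)` tuple layout are hypothetical, and the toy coordinates are invented.

```python
from collections import defaultdict

def merge_hits(hits, max_gap):
    """Group BLAST hits into putative elements, contig by contig.

    hits    : iterable of (contig, start, end) tuples (hypothetical layout);
              start may exceed end for minus-strand hits.
    max_gap : largest distance in bp between consecutive hits that are still
              counted as one element (10, 1000, 10000 or 50000 in the text).
    Returns a dict mapping contig -> list of merged (start, end) intervals.
    """
    by_contig = defaultdict(list)
    for contig, start, end in hits:
        # Step 1: place every hit on a single strand.
        by_contig[contig].append((min(start, end), max(start, end)))

    merged = {}
    for contig, intervals in by_contig.items():
        # Step 2: order the hits by their start and end sites.
        intervals.sort()
        elements = [list(intervals[0])]
        for lo, hi in intervals[1:]:
            # Step 3: an overlap, or a gap no longer than max_gap, joins hits.
            if lo - elements[-1][1] <= max_gap:
                elements[-1][1] = max(elements[-1][1], hi)
            else:
                elements.append([lo, hi])
        merged[contig] = [tuple(e) for e in elements]
    return merged

# Invented toy hits: two overlapping hits and one distant minus-strand hit.
toy_hits = [("c1", 100, 200), ("c1", 150, 300), ("c1", 5000, 4000), ("c2", 10, 50)]
elements_10bp = merge_hits(toy_hits, 10)
elements_10kb = merge_hits(toy_hits, 10000)
```

With the 10 bp threshold the distant hit on `c1` is kept as a separate element, whereas the 10,000 bp threshold merges all three `c1` hits into one, which is why the element counts decrease as the distance category grows.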
Genetic markers and mapping
The iPBS markers were used in the genetic linkage analysis and to sequence and identify some retrotransposon LTRs. Primers complementary to the retrotransposon primer binding site (PBS), located close to the 5’ LTR sequence, were used to amplify the LTRs and the spacer region between retrotransposons (RT). All primers and primer-pair combinations were tested (PBS1 to PBS12, S1 Table); all of them except PBS11 generated polymorphic markers between the two parents. In total, 741 bands were scored, of which 233 iPBS bands were polymorphic between L. c. culinaris and L. c. orientalis (amplicon sizes ranged from 400 to 2,000 bp). Additional genetic markers were obtained using primers designed from the lentil sequences obtained in this work, belonging to Angela (LTR1) and to SIRE1/Maximus (LTR3) of the Ty1-copia family, and to Peabody (LTR4) and to TRIM type Cassandra (LTR2) of the Ty3-gypsy family. Likewise, markers were obtained with primers flanking lentil microsatellites (SSR), together with primers for SSR sequences (S1 Table). Clear polymorphic bands were only observed for the primer combinations LTR1-AC, LTR4-GT and LTR2-SSR66R.
While the iPBS technique yielded a relatively high number of markers (a total of 233 polymorphic markers, representing approximately 3.5 markers per primer combination), the number of markers obtained from the other primer combinations was low. One hundred and eighty-four iPBS, six REMAP and three markers derived from the SSR66R primer showed a 1:1 segregation in the F7 RIL population analyzed. A total of 131 iPBS, four REMAP and one SSR66 marker were placed in a recombination genetic map encompassing eight linkage groups (Fig 2). Three linkage groups holding only two markers were not considered in the analysis. The linkage groups ranged from 328.2 cM to 37.3 cM, including 32 to four markers, respectively. Distances between consecutive markers varied from 0.5 to 38.5 cM, with an average of 11.2 cM. As can be seen in the box-plot included in Fig 2, most distances were lower than 15 cM and fell within the first three quartiles (median of 9.6 cM), and distances higher than 30 cM could be considered outliers. If markers positioned at distances close to 30 cM or higher had not been taken into account, some markers located near the ends of linkage groups would have been considered independent. Clusters of tightly linked retrotransposon-derived markers were detected in groups LG1, LG2, and LG6 (Fig 2).
The markers from the parent L. culinaris Lupa are indicated in red, that is, these bands were observed in the parent Lupa but were absent in the other parent, and vice versa for markers in black. Linkage groups are numbered from LG1 to LG8. A LOD score of 4 was used. Markers preceded by a P are iPBSs, by R are REMAPs and S indicates the SSR markers included. Partial distances in cM are indicated to the left of LGs while the total LG distance is displayed at the bottom. The insert to the right corresponds to the boxplot distribution of the distance in cM between consecutive markers.
LTRs and internal retrotransposon sequences
Internal parts of the retrotransposons were also amplified using degenerate and non-degenerate primers (S1 Table). Using the primers described for Ty1-copia elements, two PCR bands were cloned and sequenced. The smaller band of ~280 bp yielded 21 different sequences derived from 23 clones (Table 1 and S1 Fig). The second band of ~450 bp yielded six different sequences out of 12 clones, which after sequencing were identified as Ty3-gypsy elements. All the sequences corresponded to reverse transcriptase. The primers for Ty3-gypsy elements yielded five discrete bands, ranging from ~850 bp to 2,500 bp. From 55 clones, 22 different sequences were identified, corresponding to reverse transcriptase, and some of them also overlapped with the beginning of the downstream RNase H. Further analyses included the Ty3-gypsy sequences obtained from the Ty1-copia primers in the Ty3-gypsy data set (Table 1 and S2 Fig). The positions of the cloned DNA fragments in the RTs are indicated in Fig 1.
All sequences were aligned and then compared with Medicago truncatula LTR retrotransposons; those with high similarity (e-value < 10^-20) were included in the subsequent analysis. Twenty-one sequences belonged to the copia family and all of them shared the conserved amino acid motif SLYGLKQA characteristic of the copia elements [31, 45] (S1 Fig). A neighbor-joining dendrogram was constructed in which the copia sequences were distributed among six sequence groups (Fig 3A) whose phylogenetic lineages were identified following the classification by Piednöel et al. as Tnt1/Angela/Tont1, and SIRE1/Maximus. The 28 gypsy sequences were clustered in three groups (Fig 3B) classified as Tat/Ogre, Athila and Tekay/DEL/Peabody. Sequences of Group 3 shared several conserved amino acid motifs: LRID, DLRSGY, FG, and FIDD; while sequences in Group 1 shared variants of these sequence patterns: PHID, DGFSGY, FG, and YVDD (S2 Fig). The sequences considered in this analysis amplified from the gypsy and copia retrotransposons are listed in Table 1 and their corresponding positions within retroelements are indicated in Fig 1.
Trees show the relationships between lentil sequences and Medicago truncatula (Mtr) sequences. A, Ty1-copia sequences; B, Ty3-gypsy sequences. Lentil sequences are within boxes indicating the different groups (Gr) to which they belong; groups were related to the M. truncatula clades as described by Piednöel et al. and the M. truncatula sequence numbers are as in Wang and Liu. Red color denotes the presence of premature stop codons in the reading frames.
The three primer combinations (the reverse primer was common to all) derived from the lentil copia element named Tnana amplified between two and five bands per combination, allowing a total of 23 different PCR products to be sequenced (Table 1). After the alignment and phylogenetic analyses, these sequences were identified as part of the two major groups of plant retrotransposons. Fifteen come from the pol gene of Ty1-copia sequences, similar to the Mtr38.1 sequence of M. truncatula included in the Tos17 group (Fig 3A), and eight were partial sequences of the RNase H of Ty3-gypsy elements, similar to the Mtr59.19 sequence included in the Tat group (Fig 3B). In addition to the presence of premature stop codons, all these sequences showed changes in the described conserved active sites of the different enzymatic activities of the polyprotein, thus indicating that they must correspond to non-autonomous elements.
Two of the three partial sequences of the RNase H, encompassing from the end of the RNase H to the LTR (segment ce in Fig 1), were included in the copia family while the third (segment gc in Fig 1) was included in the gypsy family.
A total of 28 sequences out of the 75 retrieved that corresponded to the internal segment included premature stop codons in their ORFs. These codons were generated by point mutations or, more frequently, by frameshift mutations due to indels, generally deletions of several tens of nucleotides. These clones are marked with an asterisk in Fig 3.
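A premature stop codon in a partial coding sequence can be detected by scanning the sequence codon by codon in the expected reading frame. The following Python sketch illustrates the idea; the function name `premature_stops` and the short example sequences are hypothetical, and in practice the reading frame would be taken from the alignment with the reference protein.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def premature_stops(seq, frame=0):
    """Return 0-based nucleotide positions of in-frame stop codons.

    seq   : DNA string of a partial internal coding sequence, where no
            stop codon is expected if the ORF is intact.
    frame : reading-frame offset (0, 1 or 2).
    """
    seq = seq.upper()
    positions = []
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            positions.append(i)
    return positions

# Invented examples: an intact fragment and one carrying a point mutation
# (AAG -> TAG) that creates a stop codon at position 6.
intact = "ATGAAAAAGGGC"
mutated = "ATGAAATAGGGC"
stops_intact = premature_stops(intact)
stops_mutated = premature_stops(mutated)

# Fraction of internal sequences with premature stops reported in the text.
fraction_with_stops = 28 / 75
```

An indel whose length is not a multiple of three shifts the frame of everything downstream, so the same scan applied to the shifted frame typically finds stop codons soon after the indel.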
Several iPBS markers were partially sequenced and those that had a high similarity with different retrotransposon families were further sequenced in order to obtain their complete sequences. Finally, six sequence types were selected for a further analysis of their distribution throughout the Lens genome. A complete LTR with similarity to an Angela lentil sequence was amplified using the PBS7 primer; three LTRs of the non-autonomous TRIM Cassandra were amplified using PBS1, PBS2 or PBS2-PBS3; two complete LTRs of the Ty1-copia family SIRE1/Maximus similar to the gmw2-109b11-re-3 element of Glycine max were obtained from primers PBS1-PBS3 and PBS3-PBS5; three LTRs of a SIRE1-13 element (Ty1-copia) were retrieved from PBS1, PBS7 and PBS2-PBS3; three sequences similar to the LTR of Peabody elements (family Ty3-gypsy) were recovered from PBS1, PBS3 and PBS1-PBS2; and finally, part of the LTR of an Ogre element (family Ty3-gypsy) was derived using PBS1-PBS7 (Table 2 and localization in Fig 1).
Likewise, the lentil LTR sequence described by Smykal et al.  was amplified using the primers described by these authors. We obtained a similar Angela LTR sequence, although it differed mainly in short insertion-deletions (Table 2).
The sequences listed in Tables 1 and 2 were used as “probes” in an in silico BLASTn search against the lentil draft genome v 0.8. This draft of the L. culinaris genome includes 490,452 contigs with an average length of 5,673 bp, ranging from 64 bp to 605,900 bp. Only sequences with an e-value lower than 3 × 10^-4 were considered. The results obtained with the different “probe” sequences or groups of them are summarized in Table 3. Depending upon the distance between two consecutive hits, they were either considered part of the same element or included in two different elements; accordingly, the number of putative retrotransposons identified ranged from 31,216 to 24,633 (Table 3) when the internal parts of retrotransposons were used as “probes”. When the LTRs were used, the range was from 51,027 to 30,968. Thus, the LTRs identified a higher number of elements, even though there are two LTRs per element and fewer LTR sequences than internal sequences were used as probes. Likewise, the LTR data revealed that the SIRE lineage was the most abundant class among the Ty1-copia elements, representing 74% to 81% of the copia elements depending on the distance considered between hits. Ogre elements were the most abundant among the Ty3-gypsy elements, although in a lower percentage (57% to 65% of the gypsy sequences), followed by Peabody elements. Ogre retrotransposons are large elements, so the number of different Ogre elements identified in the lentil contigs could be close to the figure displayed in the fourth column of Table 3, which considers 50,000 bp distances between consecutive hits.
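The e-value filter can be illustrated with a short Python sketch. It assumes, purely for illustration, that the BLASTn results were saved in the standard 12-column tabular format (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); the exact output format used in this work is not specified beyond one line per record, and the function name `filter_blast_hits` and the example lines are hypothetical.

```python
def filter_blast_hits(lines, max_evalue=3e-4):
    """Keep tabular BLAST hits whose e-value is below max_evalue.

    lines : iterable of tab-separated records in the assumed 12-column
            layout; blank lines and '#' comment lines are skipped.
    Returns a list of (contig, subject_start, subject_end) tuples.
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        evalue = float(fields[10])          # column 11: e-value
        if evalue < max_evalue:
            # columns 2, 9 and 10: subject id, subject start, subject end
            hits.append((fields[1], int(fields[8]), int(fields[9])))
    return hits

# Invented records: one strong hit and one weak hit on the minus strand.
example = [
    "# assumed tabular BLAST output",
    "probe1\tcontig_1\t98.0\t200\t4\t0\t1\t200\t500\t699\t1e-50\t300",
    "probe2\tcontig_1\t80.0\t50\t10\t0\t1\t50\t900\t851\t0.01\t40",
]
kept = filter_blast_hits(example)
```

Only the first record passes the threshold; the second is discarded because its e-value (0.01) is above 3 × 10^-4.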
A further analysis was carried out in order to estimate the frequency of the putative incomplete RTs. The most frequent RT elements in lentil were used: Angela, SIRE, Peabody and Ogre. We searched for two LTRs of the same RT family within a distance of 10,000 bp, and then for the presence of an internal sequence between the LTRs. An internal sequence was identified in only approximately 3% of the Angela and SIRE (62 out of 1,767, and 59 out of 1,957) Ty1-copia elements, 28% (834/2,994) of the Peabody and 10% of the Ogre Ty3-gypsy elements (for Ogre a 50,000 bp distance was considered, since complete Ogre elements are larger than 10,000 bp).
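The search for LTR pairs enclosing an internal sequence can be sketched as follows. This is a simplified illustration, not the procedure actually implemented: it assumes that only consecutive LTR hits of the same family on a contig are paired, and the function name `paired_ltrs_with_internal` and the toy coordinates are hypothetical.

```python
def paired_ltrs_with_internal(ltr_hits, internal_hits, max_span):
    """Count LTR pairs and how many of them enclose an internal hit.

    ltr_hits      : (start, end) hits of one LTR family on one contig.
    internal_hits : (start, end) hits of internal sequences on that contig.
    max_span      : maximum distance in bp allowed between the two LTRs
                    (10,000 bp in the text; 50,000 bp for Ogre).
    Returns (n_pairs, n_pairs_with_internal).
    """
    ltrs = sorted((min(s, e), max(s, e)) for s, e in ltr_hits)
    internals = [(min(s, e), max(s, e)) for s, e in internal_hits]
    n_pairs = n_with_internal = 0
    # Assumption: consecutive LTR hits are candidate 5' and 3' LTRs.
    for (s1, e1), (s2, e2) in zip(ltrs, ltrs[1:]):
        if 0 < s2 - e1 <= max_span:
            n_pairs += 1
            # An internal hit must lie entirely between the two LTRs.
            if any(e1 < s and e < s2 for s, e in internals):
                n_with_internal += 1
    return n_pairs, n_with_internal

# Invented toy contig with three LTR hits and one internal hit.
toy_ltrs = [(100, 400), (5000, 5300), (40000, 40300)]
toy_internal = [(2000, 2500)]
result_10kb = paired_ltrs_with_internal(toy_ltrs, toy_internal, 10000)
result_50kb = paired_ltrs_with_internal(toy_ltrs, toy_internal, 50000)

# Percentages corresponding to the counts reported in the text.
pct_angela = round(100 * 62 / 1767, 1)
pct_sire = round(100 * 59 / 1957, 1)
pct_peabody = round(100 * 834 / 2994, 1)
```

Raising `max_span` to 50,000 bp accepts the second, more distant LTR pair in the toy contig, which mirrors the reason a larger distance was used for the very long Ogre elements.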
The distribution of retrotransposon hit density versus the logarithm of the contig length is shown in Fig 4. A total of 43,109 contigs (< 10%) contained one or more sequences similar to the “probes”, and most of them are large contigs (on average 42,770 bp). Only 141 of the largest contigs (mean size 176,200 bp) contained 10 or more sequences spaced by more than 10 bp. The distribution of the contigs with 0 to 18 LTR-RTs is depicted in Fig 5, with the box-plots indicating the average contig length and quartiles of each class. These results suggest that retrotransposons could be distributed randomly and that their number is directly related to contig length; thus the number included in each contig could possibly fit a Poisson distribution.
The figure shows the distributions of contig lengths; blue and red lines indicate contigs with at least one lentil LTR-RT sequence or with 10 or more lentil LTR-RT sequences, respectively. The black line indicates the distribution of all contigs of the lentil draft genome v0.8. n = number of contigs.
Numbers at the bottom indicate the number of hits per contig while those on top indicate the number of contigs in each class. The first distribution was obtained when two hits were considered different if they were separated by at least 10 bp, the second distribution when hits were separated by at least 10,000 bp.
To test these hypotheses, a Poisson regression analysis was carried out using a generalized linear model (glm). The natural logarithm of the number of elements in each contig was the dependent variable while the contig length in bp was the independent variable. The output models were similar for the four between-hit distances tested, as can be seen in S2 Table. The distance used to consider two consecutive hits as belonging to different elements had only a minor effect on the coefficients. In all instances, results did not fit a Poisson distribution since the deviation of the residuals did not follow a normal distribution (S2 Table). Contigs showing a high deviation from the expected value consisted of some large contigs with a low number of elements. Deviations from the expected values could be due to the high number of contigs without hits, as well as to differences in contig length; therefore the fit of the contig set with sizes ranging from 12,800 to 60,480 bp was tested (this set covers the interquartile range of intermediate sizes and excludes most of the contigs lacking hits). Again, data did not fit a Poisson distribution for any of the between-hit distances. For instance, for a distance of more than 1,000 bp between hits, an excess of contigs with no retroelements or with four or more was observed, while a clear deficit in the classes with one or two elements was detected (Fig 6).
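The comparison between observed and expected class sizes can be illustrated with a much simpler calculation than the regression itself. The Python sketch below fits a single Poisson mean to the per-contig counts and ignores contig length, so it is not equivalent to the glm analysis; the function name `poisson_expected` and the toy counts are invented for illustration.

```python
import math

def poisson_expected(counts_per_contig, max_class):
    """Observed and Poisson-expected numbers of contigs per class.

    counts_per_contig : number of elements found in each contig.
    max_class         : the last class pools 'max_class or more' elements.
    Returns (observed, expected), two lists of length max_class + 1.
    """
    n = len(counts_per_contig)
    lam = sum(counts_per_contig) / n   # maximum-likelihood estimate of the mean
    observed = [0] * (max_class + 1)
    for k in counts_per_contig:
        observed[min(k, max_class)] += 1
    pmf = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(max_class)]
    expected = [n * p for p in pmf]
    expected.append(n * (1.0 - sum(pmf)))   # pooled upper tail
    return observed, expected

# Invented toy data: many empty contigs and a few contigs with many elements.
toy_counts = [0] * 6 + [1, 2, 5, 5]
observed, expected = poisson_expected(toy_counts, max_class=4)
```

In this toy data set the empty class and the pooled upper class are over-represented and the one-element class is under-represented relative to the Poisson expectation, the same qualitative pattern described for the lentil contigs.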
The continuous line indicates the theoretical distribution and bars the real number of contigs within each class. A between-hit distance of > 10,000 bp was considered.
The physical maps of the possible retroelement locations corresponding to the seven contigs accounting for the highest number of hits are shown in Fig 7. Only six possible complete LTR-RTs were detected: two large Ogre-like elements (included in the gypsy family) in contigs 11687 and 341955, and four Peabody-like elements (also included in gypsy), one each in contigs 82312 and 300413 and two in contig 24787. The Ogre-like sequence of contig 11687 exhibited an internal hit for a reverse transcriptase of group 1e displayed in Fig 3B, while the sequence of contig 341955 probably constitutes a chimeric non-functional sequence since it included several internal hits of LTRs derived from other elements such as Peabody (gypsy) and Angela (copia) plus a reverse transcriptase (gypsy Group 3b). The four Peabody-like elements detected showed an internal hit of the gypsy reverse transcriptase belonging to Group 3b.
Contig size (above) and contig name (below) are indicated. Arrows indicate sequence orientations. Blue boxes indicate putative RTs flanked by two LTRs; red boxes indicate the presence of a reverse transcriptase sequence between LTRs. LTRs are named according to their lineage (Table 2) and the internal sequences according to the nomenclature used in Table 1.
In this study, several retrotransposon-based markers, such as iPBS and REMAP, were tested for their utility in genetic mapping in lentil. The iPBS technique proved to be especially useful because it yielded a high number of Mendelian markers. Likewise, it provided a simple method to obtain LTR retrotransposon (LTR-RT) sequences in order to gain knowledge on their type and distribution in plant genomes. Since their introduction, iPBS markers have been used extensively in plants for fingerprinting and genetic diversity analyses [21–23, 50]; nonetheless, their direct use in genetic mapping remains scarce. To the best of our knowledge, this study is the first example of the direct use of iPBS markers in genetic mapping.
Considering the lentil parental lines used in this study, the high number of iPBS bands amplified in each reaction (an average of 9.75) indicates that LTR-RTs are abundant and widespread throughout the lentil genome, particularly considering that iPBS PCR fragments are only produced when two LTR-RTs are located relatively close to each other and display inverted orientations. The high iPBS marker polymorphism detected in lentil agrees with previous results in pea. Upon comparing the number of polymorphic iPBS loci (presence vs. absence of an amplified fragment) between the cultivated lentil and its wild ancestor, cultivar Lupa generated a slightly higher number (106) than the wild L. c. orientalis (87), although the difference was non-significant (χ2 = 1.87, 1 d.f., 0.2 > P > 0.1).
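The χ2 value quoted above corresponds to a goodness-of-fit test of the two counts against an equal (1:1) expectation. A minimal Python sketch of this calculation is given below; the function name `chi_square_equal` is hypothetical, and the P value is obtained from the identity that, for 1 d.f., the upper-tail probability of χ2 equals erfc(sqrt(χ2/2)).

```python
import math

def chi_square_equal(a, b):
    """Chi-square test (1 d.f.) of two counts against a 1:1 expectation.

    Returns (chi2, p), where p is the upper-tail probability.
    """
    expected = (a + b) / 2
    chi2 = ((a - expected) ** 2 + (b - expected) ** 2) / expected
    # For 1 d.f. the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Polymorphic iPBS bands present in Lupa (106) and in L. c. orientalis (87).
chi2, p = chi_square_equal(106, 87)
```

For these counts the statistic rounds to 1.87 and the probability falls between 0.1 and 0.2, in agreement with the values reported in the text.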
Segregating iPBS and REMAP markers were arranged in eight linkage groups, one more than the haploid chromosome number of Lens. The marker density within each linkage group was uneven, showing in general a tighter clustering within the central regions, possibly marking genomic regions favorable for LTR-RT accumulation. Although wide variation exists depending upon the specific LTR-RT element, in general LTR-RTs in other species tend to concentrate in gene-poor regions. However, in Medicago and Lotus, LTR-RT-rich regions can be as large as entire chromosome arms. Analyzing the origin and distribution of the PCR product alleles, it can be observed (Fig 2) that in general these alleles derived equally from both parents and were distributed at random; yet in some linkage regions the alleles of one parent were clustered, such as in linkage groups 1 and 6, which exhibit clusters of Lupa alleles. These regions might represent zones where LTR-RT amplifications preferentially occurred, or where LTR-RTs were inadvertently selected by linkage drag, in the cultivated lentil during the domestication process.
Sequencing of iPBS products and other retrotransposon-based markers yielded a total of 81 sequences. The phylogenetic analysis disclosed that Lens contains all of the main LTR-RT families and phylogenetic lineages described in legumes. According to our results, the lineage Tat/Ogre would be the most abundant in lentil, although the observed frequency could be biased by the PCR primers used and is likely to be an underestimation, since the current assembled lentil genome still lacks a large part comprising essentially repeated sequences. However, the prevalence of Ogre elements is in accordance with previous data from species phylogenetically close to lentil [9, 14].
Up to 37% of the lentil LTR-RT internal sequences analyzed in this study presented premature stop codons (PSC) that originated from nucleotide substitutions, from frameshift mutations, or from both mechanisms. The affected translatable sequences corresponded mainly to reverse transcriptase, since most of the analyzed sequences corresponded to this enzyme, but premature stop codons were also observed in other sequences. Since only a relatively small part of the total translatable fraction was sequenced, more PSC could possibly be present. Plant genome analyses in general tend to reveal a large number of LTR retrotransposons that contain stop codons interrupting their ORFs, that lack one or more coding domains, or both. Therefore, as in other plant species, a high proportion of the LTR-RTs would consist of non-autonomous elements lacking the essential enzymatic functions to enable genomic movement. However, most of these inactive or non-autonomous elements are likely to be able to retrotranspose using the trans-factors synthesized by autonomous elements present elsewhere in the genome, thus generating large changes in genome size [54–57]. The subsequent in silico search for homologous sequences in the lentil draft genome yielded a lower number of hits when the more numerous internal transposon sequences (e.g., reverse transcriptase, RNase H) were used compared to the LTR sequences, suggesting that many lentil retrotransposons contain internal deletions. Thus, they could be included in the non-autonomous set.
When the distance considered between consecutive LTR hits increased from 10 up to 10,000 bp, the theoretical size of whole or nearly whole retrotransposons, the number of identified elements should have been reduced by approximately 50%, because the two LTRs of each complete element would then be counted as a single element. However, the observed reduction was 15.9% for the copia elements and 22.9% for the gypsy elements. Thus, the higher number of elements identified by the LTR hits compared to the internal sequence hits, plus the frequent absence of internal sequences between LTRs, indicate a significant presence in lentil of defective non-autonomous elements, solo-LTRs and probably TRIM elements. Macas et al. already indicated that the number of LTRs in species of the tribe Vicieae (Fabeae) was higher than that of the corresponding whole LTR-RTs; the estimated proportion of incomplete elements amounted to 41.2%, and 37.5% within the gypsy Ogre elements (the most frequent retroelements).
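The effect of the distance threshold on the element count can be sketched as follows. This is an illustrative Python example under stated assumptions, not the script used in this study: hits are assumed to be available as `(contig, start, end)` tuples, the distance between consecutive hits is taken as the gap between the end of one hit and the start of the next, and the names `count_elements` and `reduction` are hypothetical.

```python
from collections import defaultdict

def count_elements(hits, max_gap):
    """Count putative elements after merging nearby LTR hits.

    Consecutive hits on the same contig separated by less than `max_gap` bp
    are assigned to the same element.
    """
    by_contig = defaultdict(list)
    for contig, start, end in hits:
        by_contig[contig].append((start, end))

    n_elements = 0
    for intervals in by_contig.values():
        intervals.sort()
        prev_end = None
        for start, end in intervals:
            if prev_end is None or start - prev_end >= max_gap:
                n_elements += 1          # hit opens a new element
                prev_end = end
            else:
                prev_end = max(prev_end, end)  # hit joins the current element
    return n_elements

def reduction(n_small_gap, n_large_gap):
    """Relative decrease in element count when the threshold is raised."""
    return 1 - n_large_gap / n_small_gap

# Toy data (hypothetical): one LTR pair 5 kb apart plus two isolated LTRs.
hits = [("c1", 100, 400), ("c1", 5400, 5700),
        ("c1", 50000, 50300), ("c2", 10, 310)]
n10 = count_elements(hits, 10)
n10k = count_elements(hits, 10_000)
print(n10, n10k, reduction(n10, n10k))
```

If every hit belonged to a complete element with two LTRs, the reduction would be 0.5; values well below that, as in the toy data, point to many unpaired LTRs.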
In lentil, the Ty1-copia, Ty3-gypsy and TRIM family frequencies are, respectively, 17.6%, 78.7% and 0.03% of the LTR-RTs. The different lineage estimations obtained by our in silico search agree with these figures. When each of the LTR sequences obtained was analyzed separately, and two consecutive hits separated by less than 10,000 bp were considered to belong to the same retrotransposon, the percentages obtained were 25.7% (10,898 hits) Ty1-copia, 72.0% (30,559) Ty3-gypsy and 2.3% (998) TRIM sequences (Table 3). Our data also agree with previous results for the within-family lineages, especially for the major ones: Tat/Ogre (59.8% of the Ty3-gypsy vs. 66.0% in Macas et al., 2015), Tekay/DEL/Peabody (40.2% of the Ty3-gypsy vs. 26%), SIRE1/Maximus (79.8% of the Ty1-copia vs. 93%).
Retrotransposons are generally widely distributed along plant genomes and can be randomly distributed, as in the case of maize, although there are reports of preferential distribution of at least some types of retrotransposons, such as TRIMs [59–60], and of retrotransposon-derived markers in plant genomes [7, 61–62]. The lentil elements analyzed were widely distributed, as suggested by the percentage (8.8%) of the lentil draft genome contigs that contained at least one element. If LTR-RTs were distributed at random in the genome, the numbers of contigs containing zero, one, two, etc. LTR-RTs would be expected to fit a Poisson distribution. However, our data did not support this hypothesis, even when a correction for contig size was applied. For instance, there are more large contigs without LTR-RTs than expected (Fig 6). Furthermore, the genetic map generated with the iPBS markers (Fig 2) exhibited regions of different marker densities. These results suggest that the distribution of these elements in lentil is non-random.
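The Poisson expectation invoked above can be made concrete with a short Python sketch. It is a simplified stand-in, not the analysis reported here (which used a generalized linear model with a contig-size correction, see S2 Table): it ignores contig size, estimates the Poisson mean from the data, and compares observed and expected class frequencies with a chi-square statistic. The function names and the toy counts are hypothetical.

```python
import math
from collections import Counter

def observed_classes(counts, max_k):
    """Number of contigs with 0, 1, ..., max_k-1 elements, plus a pooled tail (>= max_k)."""
    tally = Counter(min(k, max_k) for k in counts)
    return [tally.get(k, 0) for k in range(max_k + 1)]

def poisson_expected(counts, max_k):
    """Poisson mean and expected class sizes, with the same pooled tail."""
    n = len(counts)
    lam = sum(counts) / n
    expected = [n * math.exp(-lam) * lam ** k / math.factorial(k)
                for k in range(max_k)]
    expected.append(n - sum(expected))  # all contigs with >= max_k elements
    return lam, expected

def chi_square(observed, expected):
    """Pearson chi-square statistic over classes with non-zero expectation."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

# Toy data (hypothetical): elements concentrated in a few contigs.
counts = [0] * 90 + [10] * 10
obs = observed_classes(counts, 3)
lam, exp = poisson_expected(counts, 3)
print(lam, obs, [round(e, 1) for e in exp], chi_square(obs, exp))
```

In the toy data the empty class is far larger than the Poisson expectation, the same kind of departure described in the text for large contigs without LTR-RTs.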
To sum up, the present data indicate that LTR-RTs are a suitable source of genetic markers in lentil and are useful for genetic analysis and map construction. These markers can be useful for future genetic analyses, marker-assisted breeding, and phylogenetic studies. The results also indicate that a high proportion of the lentil retroelements have lost their autonomous transposition ability through point mutations, deletions, or both. In fact, many of them seem to be defective elements with internal deletions. Likewise, although they seem to be widely distributed throughout the lentil genome, their distribution is not completely random, with LTR-RTs more densely clustered in some regions of the Lens genome.
S1 Fig. Partial amino acid sequences of the reverse transcriptase encoded by the lentil Ty1-copia elements.
Asterisks denote premature stop codons and question marks the absence of one or two nucleotides in the corresponding reading frame. Conserved motifs are boxed. Lentil sequences are denoted by “Cop” followed by a number (see Table 1); Mtr denotes sequences of Medicago truncatula numbered according to Wang and Liu, indicating the clade in which they are included according to Piednöel et al.
S2 Fig. Partial amino acid sequences of the reverse transcriptase encoded by the lentil Ty3-gypsy elements.
Lentil sequences are denoted by “Gyp” followed by a number. See heading of Supplementary Fig 1 for additional legends.
S3 Fig. Partial amino acid sequences of the reverse transcriptase and RNaseH encoded by Ty1-copia elements obtained with Tnana-derived primers.
RNaseH indicates the starting point of this protein. See heading of Supplementary Fig 1 for legends.
S4 Fig. Partial amino acid sequences of the RNaseH encoded by Ty3-gypsy elements obtained with Tnana-derived primers.
See heading of Supplementary Fig 1 for legends.
S2 Table. Coefficients (a and b) according to the general linear model (glm) testing a Poisson distribution, and values of the D statistics followed by probability according to the Kolmogorov-Smirnov test of residuals adjusting to a normal distribution.
- Conceptualization: MPV LSM.
- Data curation: RRB LSM PG.
- Formal analysis: RRB LSM PG.
- Funding acquisition: MPV.
- Investigation: RRB LSM.
- Methodology: RRB LSM PG.
- Project administration: MPV LSM.
- Software: LSM.
- Supervision: MPV LSM.
- Visualization: LSM RRB PG.
- Writing – original draft: MPV.
- Writing – review & editing: MPV LSM PG.
- 1. Fratini R, Pérez de la Vega M. Genetics of economic traits in lentil: seed traits and adaptation to climatic variations. Grain Legumes. 2011; 57:18–20.
- 2. Pérez de la Vega M, Fratini R, Muehlbauer FJ. Lentil (Lens culinaris Medik.). In: Pérez de la Vega M, Torres AM, Cubero JI, editors. Genetics, Genomics and Breeding of Cool Season Grain Legumes. CRC Press; 2011. pp. 98–150.
- 3. Mikic A, Smykal P, Kenicer G, Vishnyakova M, Sarukhanyan N, Akopian JA, et al. Beauty will save the world, but will the world save beauty? The case of the highly endangered Vavilovia formosa (Stev.) Fed. Planta. 2014; 240:1139–1146. pmid:25086615
- 4. Schaefer H, Hechenleitner P, Santos-Guerra A, Menezes de Sequeira M, Pennington RT, Kenicer G, et al. Systematics, biogeography, and character evolution of the legume tribe Fabeae with special focus on the middle-Atlantic island lineages. BMC Evol Biol. 2012; 12:250. pmid:23267563
- 5. Feschotte C, Jiang N, Wessler SR. Plant transposable element: where Genetics meets Genomics. Nat Rev Genet. 2002; 3:329–341. pmid:11988759
- 6. Kumar A, Bennetzen JL. Plant retrotransposons. Ann Rev Genet. 1999; 33:479–532. pmid:10690416
- 7. Waugh R, McLean K, Flavell J, Pearce SR, Kumar A, Thomas BBT, et al. Genetic distribution of Bare–1-like retrotransposable elements in the barley genome revealed by sequence-specific amplification polymorphisms (S-SAP). Mol Gen Genet. 1996; 253:687–694.
- 8. Ragupathy R, You FM, Cloutier S. Arguments for standardizing transposable element annotation in plant genomes. Trends Plant Sci. 2013; 18:367–376. pmid:23618952
- 9. Macas J, Novák P, Pellicer J, Čížková J, Koblížková A, Neumann P, et al. In depth characterization of repetitive DNA in 23 plant genomes reveals sources of genome size variation in the legume tribe Fabeae. PLoS One. 2015; 10(11):e0143424. pmid:26606051
- 10. Kalendar R, Vicient CM, Peleg O, Anamthawat-Jonsson K, Bolshoy A, Schulman AH. Large retrotransposon derivatives: abundant, conserved but nonautonomous retroelements of barley and related genomes. Genetics. 2004; 166:1437–1450. pmid:15082561
- 11. Witte CP, Le QH, Bureau T, Kumar A. Terminal-repeat retrotransposons in miniature (TRIM) are involved in restructuring plant genomes. Proc Natl Acad Sci USA. 2001; 98:13778–13783. pmid:11717436
- 12. Hill P, Burford D, Martin DMA, Flavell AJ. Retrotransposon populations of Vicia species with varying genome size. Mol Genet Genomics. 2005; 273:371–381. pmid:15891910
- 13. Macas J, Neumann P, Navrátilová A. Repetitive DNA in the pea (Pisum sativum L.) genome: comprehensive characterization using 454 sequencing and comparison to soybean and Medicago truncatula. BMC Genomics. 2007; 8:427. pmid:18031571
- 14. Macas J, Neumann P. Ogre elements—A distinct group of plant Ty3/gypsy-like retrotransposons. Gene. 2007; 390:108–116. pmid:17052864
- 15. Flavell AJ, Knox MR, Pearce SR, Ellis THN. Retrotransposon based insertion polymorphisms (RBIP) for high throughput marker analysis. Plant J. 1998; 16:643–650. pmid:10036780
- 16. Kalendar R, Schulman AH. IRAP and REMAP for retrotransposon-based genotyping and fingerprinting. Nature Protoc. 2007; 1:2478–2484.
- 17. Kalendar R, Antoniu K, Smykal P, Schulman AH. iPBS: a universal method for DNA fingerprinting and retrotransposon isolation. Theor Appl Genet. 2010; 121:1419–1430. pmid:20623102
- 18. Monden Y, Yamaguchi K, Tahara M. Application of iPBS in high-throughput sequencing for the development of retrotransposon-based molecular markers. Curr Plant Biol. 2014; 1:40–44.
- 19. Kalendar R, Flavell AJ, Ellis THN, Sjakste T, Moisy C, Schulman AH. Analysis of plant diversity with retrotransposon-based molecular markers. Heredity. 2011; 106:520–530. pmid:20683483
- 20. Roy NS, Choi J-Y, Lee S-l, Kim N-S. Marker utility of transposable elements for plant genetics, breeding, and ecology: a review. Genes Genom. 2015; 37:141–151.
- 21. Baloch FS, Derya M, Andeden EE, Alsaleh A, Cömertpay G, Kilian B, et al. Inter-primer binding site retrotransposon and inter-simple sequence repeat diversity among wild Lens species. Biochem Syst Ecol. 2015; 58:162–168.
- 22. Andeden EE, Baloch FS, Derya M, Ozkan H. IPBS-Retrotransposon-based genetic diversity and relationship among wild annual Cicer species. J Plant Biochem Biotechnol. 2013; 22:453–466.
- 23. Baloch FS, Alsaleh A, Sáenz de Miera LE, Hatipoğlu R, Çiftçi V, Karaköy T, Yildiz M, Ozkan H. DNA based iPBS-retrotransposon markers for investigating the population structure of pea (Pisum sativum) germplasm from Turkey. Biochem Syst Ecol. 2015; 61:244–252.
- 24. Ellis THN, Poyser SJ, Knox MR, Vershinin AV, Ambrose MJ. Polymorphism of insertion sites of Ty1-copia class retrotransposons and its use for linkage and diversity analysis in pea. Mol Gen Genet. 1998; 260:9–19. pmid:9829823
- 25. Jing R, Vershinin A, Grzebyta J, Shaw P, Smykal P, Marshall D, et al. The genetic diversity and evolution of field pea (Pisum) studied by high throughput retrotransposon based insertion polymorphism (RBIP) marker analysis. BMC Evol Biol. 2010; 10:44. pmid:20156342
- 26. Sanz A Martín, Gonzalez S Gilsanz, Syed NH, Suso MJ, Saldaña C Caminero, Flavell AJ. Genetic diversity analysis in Vicia species using retrotransposon-based SSAP markers. Mol Genet Genomics. 2007; 278:433–441. pmid:17576596
- 27. Pearce SR, Knox M, Ellis THN, Flavell AJ, Kumar A. Pea Ty1-copia group retrotransposons: transpositional activity and use as markers to study genetic diversity in Pisum. Mol Gen Genet. 2000; 263:898–907. pmid:10954074
- 28. Smykal P. Development of an efficient retrotransposon-based fingerprinting method for rapid pea variety identification. J Appl Genet. 2006; 47:221–30. pmid:16877800
- 29. Bett K, Ramsay L, Chan C, Sharpe AG, Cook DR, Penmetsa RV, et al. Lentil 1.0 and Beyond. Plant & Animal Genome Conference XXIV, 2016. W222. Available from: https://pag.confex.com/pag/xxiv/webprogram/Paper19117.html
- 30. Rey MR. Desarrollo de marcadores genéticos a partir de secuencias de retrotransposones del genoma de lenteja, Lens culinaris Medik. Ph.D. Dissertation, Universidad de León. 2016. Available from: http://buleria.unileon.es/xmlui/bitstream/handle/10612/5570/Tesis%20Marta%20Rita%20Rey%20Ba%C3%B1os.PDF?sequence=1
- 31. Hirochika F, Hirochika R. Ty1-copia group retrotransposons as ubiquitous components of plant genomes. Jpn J Genet. 1993; 68:35–46. pmid:8391285
- 32. Friedsen N, Brandes A, Heslop-Harrison JS. Diversity, origin, and distribution of retrotransposons (gypsy and copia) in conifers. Mol Biol Evol. 2001; 18:1176–1188. pmid:11420359
- 33. Llorente A. Análisis de genes de leguminas en el género Lens. Ph.D. Dissertation, Univ. de León, Spain. 2001.
- 34. Hamwieh A, Udupa SM, Choumane W, Sarker A, Dreyer F, Jung C, et al. A genetic linkage map of Lens sp. based on microsatellite and AFLP markers and the localization of fusarium vascular wilt resistance. Theor Appl Genet. 2005; 110:669–677. pmid:15650814
- 35. Pearce SR, Stuart-Rogers C, Knox M, Kumar A, Ellis THN, Flavell AJ. Rapid isolation of plant Ty1-copia group retrotransposon LTR sequences for molecular marker studies. Plant J. 1999; 19:711–717. pmid:10571856
- 36. Lander ES, Green P, Abrahamson J, Barlow A, Daly MJ, Lincoln SE, et al. MAPMAKER: an interactive computer package for constructing primary genetic linkage maps of experimental and natural populations. Genomics. 1987; 1:174–181. pmid:3692487
- 37. Givry S, Bouchez M, Chabrier P, Milan D, Schiex T. CarthaGene: multipopulation integrated genetic and radiation hybrid mapping. Bioinformatics. 2005; 21:1703–1704. pmid:15598829
- 38. Voorrips RE. MapChart: Software for the graphical presentation of linkage maps and QTLs. J Hered. 2002; 93:77–78. pmid:12011185
- 39. Kumar S, Stecher G, Tamura K. MEGA7: Molecular Evolutionary Genetics Analysis version 7.0 for bigger datasets. Mol Biol Evol. 2016.
- 40. Saitou N, Nei M. The neighbor-joining method: A new method for reconstructing phylogenetic trees. Mol Biol Evol. 1987; 4:406–425. pmid:3447015
- 41. Tamura K. Estimation of the number of nucleotide substitutions when there are strong transition-transversion and G + C-content biases. Mol Biol Evol. 1992; 9:678–687. pmid:1630306
- 42. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, et al. BLAST+: architecture and applications. BMC Bioinformatics. 2009; 10:421. pmid:20003500
- 43. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. 2014. Available from: URL http://www.R-project.org/.
- 44. Wang H, Liu J-S. LTR retrotransposon landscape in Medicago truncatula: more rapid removal than in rice. BMC Genomics. 2008; 9:308.
- 45. Flavell AJ, Smith JB, Kumar A. Extreme heterogeneity of Ty1-copia group retrotransposons in plants. Mol Gen Genet. 1992; 231:233–242. pmid:1370976
- 46. Piednöel M, Carrete-Vega G, Renner SS. Characterization of the LTR retrotransposon repertoire of a plant clade of six diploid and one tetraploid species. Plant J. 2013; 75:699–709. pmid:23663083
- 47. Smykal P, Kalendar R, Ford R, Macas J, Griga M. Evolutionary conserved lineage of Angela-family retrotransposons as a genome-wide microsatellite repeat dispersal agent. Heredity. 2009; 103:157–167. pmid:19384338
- 48. Wawrzynski A, Ashfield T, Chen NW, Mammadov J, Nguyen A, Podicheti R, et al. Replication of nonautonomous retroelements in soybean appears to be both recent and common. Plant Physiol. 2008; 148:1760–1771. pmid:18952860
- 49. Pearce SR. SIRE-1, a putative plant retrovirus is closely related to a legume Ty1-copia retrotransposon family. Cell Mol Biol Lett. 2007; 12:120–126. pmid:17119871
- 50. Guo DL, Guo MX, Hou XG, Zhang GH. Molecular diversity analysis of grape varieties based on iPBS markers. Biochem Syst Ecol. 2014; 52:27–32.
- 51. Zhao M, Ma J. Co-evolution of plant LTR-retrotransposons and their host genomes. Protein Cell. 2013; 4:493–501. pmid:23794032
- 52. Bertioli DJ, Moretzsohn MC, Madsen LH, Sandal N, Leal-Bertioli SCM, Guimarães PM, et al. An analysis of synteny of Arachis with Lotus and Medicago sheds new light on the structure, stability and evolution of legume genomes. BMC Genomics. 2009; 10:45. pmid:19166586
- 53. Schulman AH. Retrotransposon replication in plants. Curr Opin Virol. 2013; 3:604–614. pmid:24035277
- 54. Flavell AJ, Dunbar E, Anderson R, Pearce SR, Hartley R, Kumar A. Ty1-copia group retrotransposons are ubiquitous and heterogeneous in higher plants. Nucl Acids Res. 1992; 20:3639–3644. pmid:1379359
- 55. Sabot F, Schulman AH. Parasitism and the retrotransposon life cycle in plants: a hitchhiker’s guide to the genome. Heredity. 2006; 97:381–388. pmid:16985508
- 56. Tanskanen JA, Sabot F, Vicient C, Schulman AH. Life without GAG: The BARE-2 retrotransposon as a parasite's parasite. Gene. 2007; 390:166–174. pmid:17107763
- 57. Estep MC, DeBarry JD, Bennetzen JL. The dynamics of LTR retrotransposon accumulation across 25 million years of panicoid grass evolution. Heredity. 2013; 110:194–204. pmid:23321774
- 58. Meyers BC, Tingey SV, Morgante M. Abundance, distribution, and transcriptional activity of repetitive elements in the maize genome. Genome Res. 2001; 11:1660–1676. pmid:11591643
- 59. Gao D, Li Y, Kim KD, Abernathy B, Jackson SA. Landscape and evolutionary dynamics of terminal repeat retrotransposons in miniature in plant genomes. Genome Biol. 2016; 17:7. pmid:26781660
- 60. Langdon T, Seago C, Mende M, Leggett M, Thomas H, Forster JW, et al. Retrotransposon evolution in diverse plant genomes. Genetics. 2000; 156:313–325. pmid:10978295
- 61. Bernet GP, Asíns MJ. Identification and genomic distribution of gypsy like retrotransposons in Citrus and Poncirus. Theor Appl Genet. 2003; 108:121–130. pmid:12937896
- 62. Peterson-Burch B, Nettleton D, Voytas DF. Genomic neighborhoods for Arabidopsis retrotransposons: a role for targeted integration in the distribution of the Metaviridae. Genome Biol. 2004; 5:R78. pmid:15461796