Obtaining retrotransposon sequences, analysis of their genomic distribution and use of retrotransposon-derived genetic markers in lentil (Lens culinaris Medik.)

  • Rita Rey-Baños, 
  • Luis E. Sáenz de Miera, 
  • Pedro García, 
  • Marcelino Pérez de la Vega


Retrotransposons with long terminal repeats (LTR-RTs) are widespread mobile elements in eukaryotic genomes. We obtained a total of 81 partial LTR-RT sequences from lentil corresponding to internal retrotransposon components and LTRs. Sequences were obtained by PCR from genomic DNA. Approximately 37% of the LTR-RT internal sequences presented premature stop codons, indicating that these elements must be non-autonomous. LTR sequences were obtained using the iPBS technique, which amplifies sequences between LTR-RTs. A total of 193 retrotransposon-derived genetic markers, mainly iPBS, were used to obtain a genetic linkage map from 94 F7 recombinant inbred lines derived from the cross between the cultivar Lupa and the wild ancestor L. culinaris subsp. orientalis. The genetic map included 136 markers located in eight linkage groups. Clusters of tightly linked retrotransposon-derived markers were detected in linkage groups LG1, LG2, and LG6, denoting a non-random genomic distribution. Phylogenetic analyses identified the LTR-RT families to which the internal and LTR sequences belong. Ty3-gypsy elements were more frequent than Ty1-copia, mainly due to the high frequency of Ogre elements in lentil, as also occurs in other species of the tribe Vicieae. LTR and internal sequences were used to analyze in silico their distribution among the contigs of the lentil draft genome. Up to 8.8% of the lentil contigs showed at least one sequence similar to an LTR-RT. A statistical analysis suggested a non-random distribution of these elements within the lentil genome. In most cases (between 72% and 97%, depending on the LTR-RT type) none of the internal sequences was detected between a pair of LTR sequences, suggesting that defective and non-autonomous LTR-RTs are very frequent in lentil.
The results support the view that LTR-RTs are abundant and widespread throughout the lentil genome and that they are a suitable source of genetic markers for further genetic analyses.


Lentil (Lens culinaris Medik. subsp. culinaris) is one of the earliest domesticated plant species in the Fertile Crescent. It is a diploid (2n = 14), self-pollinated annual cool-season grain legume normally grown in temperate semi-arid regions, usually in rotation with cereals. It plays an important role in human nutrition and in soil improvement by helping to replenish soil nitrogen levels. The crop is now widely cultivated throughout Western Asia, Northern Africa, the Indian subcontinent, Australia and North America [1, 2]. This species is included in the tribe Fabeae or Vicieae, which also includes the genera Lathyrus, Pisum, Vavilovia and Vicia [3, 4].

Transposable elements (TEs) are DNA sequences that can insert into new chromosomal locations and often make duplicate copies of themselves in the process. TEs are the single largest component of most eukaryotic genomes, although active elements comprise only a small minority of the genomic TE complement in most multicellular organisms. Eukaryotic TEs are divided into two classes according to whether their transposition intermediate is RNA (class 1, or retrotransposons) or DNA (class 2). Class 1 elements are classified into two groups depending upon the presence or absence of long terminal repeats (LTRs): LTR retrotransposons and non-LTR retrotransposons [5, 6] (see Fig 1). Retrotransposons (RTs) are the most abundant and widespread class of eukaryotic TEs and are widely distributed along plant genomes [6, 7]. The percentage of the plant genome represented by RTs in assembled genomes was found to range from 7.0% in Populus trichocarpa to 75.6% in Zea mays. In the legume model species Medicago truncatula RT coverage corresponded to 26% and in chickpea (Cicer arietinum) it amounted to 49% [8], while in the Vicieae species RTs comprised up to 81% of the nuclear genome [9].

Fig 1. Schematic representation of representative LTR retrotransposons.

The main characteristics of autonomous and non-autonomous elements are represented. LTR retrotransposons have long terminal repeats (LTRs) in direct orientation. Autonomous elements contain at least two genes, called gag and pol. The gag gene encodes a capsid-like protein and the pol gene encodes a polyprotein that is responsible for protease (PR), reverse transcriptase (RT), RNase H (RH) and integrase (INT) activities. PBS, primer binding site; PPT, polypurine tract. Non-autonomous elements, such as large retrotransposon derivatives (LARDs) and terminal-repeat retrotransposons in miniature (TRIMs), lack most or all coding sequence. Non-LTR retrotransposons are divided into long interspersed nuclear elements (LINEs) and short interspersed nuclear elements (SINEs). LINE coding regions include a gag-like protein (ORF), an endonuclease (EN) and reverse transcriptase (RT). Both LINEs and SINEs usually terminate in a poly(A) sequence [5]. Thick lines below the elements indicate the sequences amplified in lentil in this work; in the nomenclature, a first letter c indicates that the sequence was identified as a copia element and g as a gypsy element. Drawings are not to scale.

LTR retrotransposons (LTR-RTs) are further classified into the Ty1-copia and Ty3-gypsy families, which differ from each other in both their degree of sequence similarity and the order of their encoded gene products [6], and into two groups almost exclusive to plants, LARDs (LArge Retrotransposon Derivatives) and TRIMs (Terminal-repeat Retrotransposons In Miniature) [10, 11]. LARDs are approximately 4.4 kb long but contain no open reading frames encoding typical retrotransposon proteins [10], while TRIMs have terminal direct repeat sequences that encompass an internal domain of 100–300 bp [11] (Fig 1).

Among the genomes of species belonging to the tribe Vicieae, LTR-RT elements are predominant, reaching approximately 140,000 copies per genome in pea (Pisum sativum); Ty3-gypsy elements are less diverse and have accumulated to a higher copy number than Ty1-copia elements. This is in part due to a large proportion of Ogre-like retrotransposons, included among the Ty3-gypsy elements, which alone can make up more than 50% of the genome in some species of this tribe. Ogre elements are exceptionally large (reaching up to 25 kb) and possess several specific features. The Ty1-copia group elements are somewhat less abundant, with only Maximus/SIRE elements reaching the abundance of some of the Ty3-gypsy lineages, and LARD and TRIM elements are present in even lower amounts [12–14].

The relatively high copy number of retrotransposons, coupled with their high genome mobility, has generated a relatively high number of polymorphisms. Thus, retrotransposon sequences have been used to develop several types of genetic markers such as sequence-specific amplified polymorphisms (SSAP), also known as transposon display, inter-retrotransposon amplified polymorphisms (IRAP), retrotransposon-microsatellite amplified polymorphisms (REMAP), retrotransposon-based insertion polymorphisms (RBIP), inter-primer binding site (iPBS) and others [15–18]. These markers have been used for phylogenetic studies and have been included in numerous genetic maps of several crop plant species [19], having proved their utility in the molecular dissection of plant genomes, genetics and breeding [20]. In the Vicieae species they have been used in Lens diversity and phylogenetic studies [21] as well as in other genera [22–28].

The aims of this work were to identify some of the LTR-RT elements present in lentil (Lens culinaris Medik.) by partial element sequencing and to examine their distribution throughout the lentil draft genome [29], as well as to generate retrotransposon-derived markers in this species for their inclusion in genetic maps. The resulting markers can be useful in future genetic studies and in lentil breeding by marker-assisted selection, while the LTR-RT sequences can aid in the assembly of the lentil draft genome.

Material and methods

Plant material

The material used for the genetic mapping was a set of 94 F7 recombinant inbred lines (RILs) derived from the cross between the cultivar Lupa and the wild lentil ancestor L. culinaris subsp. orientalis (Boiss.) Ponert (Spanish Germplasm Bank accession BG16880). The DNA sequences of the LTR and the internal RT components were derived from the cultivar Lupa.

Marker analysis

DNA was extracted from leaves of two- to three-week-old seedlings with the DNeasy Plant Mini Kit (Qiagen) following the manufacturer’s instructions. DNA quantification was carried out with a NanoDrop ND-1000 (Thermo Fisher). Putative retrotransposon sequences were amplified from genomic DNA by PCR, the results were analyzed on 1–2% agarose gels, and the selected bands were isolated, cloned and sequenced using the Sanger method and capillary electrophoresis (MegaBACE 500, Amersham Biosciences). The PCR, agarose gel electrophoresis, cloning and sequencing techniques followed standard methods [30]. Reverse transcriptase sequences were amplified using the degenerate primers designed for Ty1-copia [31] and for Ty3-gypsy [32] elements, or primers designed from the lentil Ty1-copia retrotransposon Tnana [33] (S1 Table).

The iPBS markers were amplified using 12 previously described primers [17] (S1 Table). The inter-primer binding site technique uses a single primer (sometimes two) to amplify the sequences enclosed between the PBS sites of two nearby LTR-RTs in inverted orientations. Additional markers were obtained with a primer described by Hamwieh et al. [34] and two microsatellite primers, (AC)10 and (GT)10, in combination with primers derived from lentil retrotransposons (S1 Table). Three additional partial sequences corresponding to the RNase H were obtained using previously described primers [35]. Lentil genomic DNA was digested with MseI, and MseI adapters were then ligated to serve as targets for the corresponding primers used in the PCR reactions (S1 Table). The GenBank accession numbers of the lentil sequences are KX871706 for Tnana and KX889312 to KX889392 for the rest.

Genetic mapping

Genetic maps were obtained with the software packages MapMaker v 3.0b [36], CarthaGene 1.3.beta [37] and MapChart v 2.2 [38]. Evolutionary analyses were conducted in MEGA7 [39]. Dendrograms were obtained using the neighbor-joining algorithm [40] with the Tamura 3-parameter distance [41] and a gamma distribution to account for substitution rate differences among sites.

Genomic analysis

Some of the lentil retrotransposon sequences, internal or from the LTRs, were used to carry out an in silico search for homologous sequences in the current lentil draft genome v0.8 [29]. The search was carried out using BLASTn v 2.3.0 [42] with the “outformat” option that returns a single line per query. Since the sequences used as in silico “probes” were derived from different retroelement parts, two or more of them could identify the same genomic element by BLAST. To prevent different “probes” from returning repeated hits of the same element, the following procedure was carried out: 1) within each contig, all of the sequences identified were aligned on a single strand; 2) within each contig, the sequences were ordered by their “start” and “end” sites; 3) two or more hits were considered part of the same retroelement if they overlapped or if the distance between consecutive “end” and “start” sites was not longer than a given distance; for the non-overlapping sequences four distance categories were considered: 10, 1,000, 10,000 or 50,000 bp. Statistical analyses were performed with the R software [43].
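The three-step procedure for collapsing redundant hits can be illustrated with a minimal Python sketch. It is not the authors' script: the function name merge_hits, the (contig, start, end) tuple layout and the example coordinates are all hypothetical, and the only rule taken from the text is that hits are merged when they overlap or when the gap between consecutive “end” and “start” sites does not exceed the chosen distance.

```python
from collections import defaultdict

def merge_hits(hits, max_gap):
    """Group BLAST hits into putative elements (illustrative sketch).

    hits: iterable of (contig, start, end); start may exceed end for
          hits reported on the minus strand.
    max_gap: largest gap in bp between one hit's end and the next hit's
             start for both to be counted as the same element.
    Returns a dict mapping contig -> list of merged (start, end) tuples.
    """
    by_contig = defaultdict(list)
    for contig, start, end in hits:
        # Step 1: express every hit on a single strand.
        by_contig[contig].append((min(start, end), max(start, end)))

    elements = {}
    for contig, intervals in by_contig.items():
        # Step 2: order hits by start site, then end site.
        intervals.sort()
        merged = [list(intervals[0])]
        for lo, hi in intervals[1:]:
            last = merged[-1]
            # Step 3: overlapping or close enough -> same element.
            if lo - last[1] <= max_gap:
                last[1] = max(last[1], hi)
            else:
                merged.append([lo, hi])
        elements[contig] = [tuple(iv) for iv in merged]
    return elements

# Hypothetical coordinates, chosen only to show the effect of the threshold.
example_hits = [
    ("contig_A", 100, 400),
    ("contig_A", 900, 350),      # minus-strand hit overlapping the first
    ("contig_A", 1500, 1800),    # 600 bp after the previous end
    ("contig_A", 20000, 20300),  # 18,200 bp after the previous end
    ("contig_B", 50, 250),
]
for gap in (10, 1000, 10000, 50000):
    total = sum(len(v) for v in merge_hits(example_hits, gap).values())
    print(gap, total)
```

With these made-up coordinates the number of putative elements falls from 4 (10 bp) to 3 (1,000 and 10,000 bp) and 2 (50,000 bp), which mirrors how the element counts depend on the distance category.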


Genetic markers and mapping

The iPBS markers were used in the genetic linkage analysis and to sequence and identify some retrotransposon LTRs. Primers complementary to the retrotransposon primer binding site (PBS), located close to the 5’ LTR sequence, were used to amplify the LTRs and the spacer region between retrotransposons (RT). All primers and primer-pair combinations were tested (PBS1 to PBS12, S1 Table); all of them except PBS11 generated polymorphic markers between the two parents. In total 741 bands were scored, of which 233 iPBS bands were polymorphic between L. c. culinaris and L. c. orientalis (amplicon sizes ranged from 400 to 2,000 bp). Additional genetic markers were obtained using primers designed from the lentil sequences obtained in this work, belonging to Angela (LTR1) and to SIRE1/Maximus (LTR3) of the Ty1-copia family, and to Peabody (LTR4) and to TRIM type Cassandra (LTR2) of the Ty3-gypsy family. Likewise, markers were obtained with primers flanking lentil microsatellites (SSRs) and with primers for SSR sequences (S1 Table). Clear polymorphic bands were only observed for the primer combinations LTR1-AC, LTR4-GT and LTR2-SSR66R.

While the iPBS technique yielded a relatively high number of markers (a total of 233 polymorphic markers, representing approximately 3.5 markers per primer combination), the number of markers obtained from the other primer combinations was low. One hundred and eighty-four iPBS, six REMAP and three markers derived from the SSR66R primer showed 1:1 segregation in the F7 RIL population analyzed. A total of 131 iPBS, four REMAP and one SSR66 marker were placed in a genetic linkage map encompassing eight linkage groups (Fig 2). Three linkage groups holding only two markers each were not considered in the analysis. The linkage groups ranged from 328.2 cM to 37.3 cM and included from 32 to four markers, respectively. Distances between consecutive markers varied from 0.5 to 38.5 cM, with an average of 11.2 cM. As can be seen in the box-plot included in Fig 2, most distances were lower than 15 cM and fell within the first three quartiles (median of 9.6 cM), while distances higher than 30 cM could be considered outliers. If markers positioned at distances near 30 cM or higher had not been taken into account, some markers located near the ends of linkage groups would have been considered independent. Clusters of tightly linked retrotransposon-derived markers were detected in groups LG1, LG2, and LG6 (Fig 2).

Fig 2. Genetic map obtained with markers showing Mendelian segregations.

The markers from the parent L. culinaris Lupa are indicated in red, that is, these bands were observed in the parent Lupa but were absent in the other parent, and vice versa for markers in black. Linkage groups are numbered from LG1 to LG8. A LOD score of 4 was used. Markers preceded by a P are iPBSs, by R are REMAPs, and S indicates the SSR markers included. Partial distances in cM are indicated to the left of the LGs while the total LG distance is displayed at the bottom. The insert to the right corresponds to the boxplot distribution of the distance in cM between consecutive markers.

LTRs and internal retrotransposon sequences

Internal parts of the retrotransposons were also amplified using degenerate and non-degenerate primers (S1 Table). Using the primers described for Ty1-copia elements [31], two PCR bands were cloned and sequenced. The smaller band of ~ 280 bp yielded 21 different sequences derived from 23 clones (Table 1 and S1 Fig). The second band of ~ 450 bp yielded six different sequences out of 12 clones, which after sequencing were identified as Ty3-gypsy elements. All the sequences corresponded to reverse transcriptase. The primers for Ty3-gypsy elements [32] yielded five discrete bands, ranging from ~ 850 bp to 2,500 bp. From 55 clones, 22 different sequences were identified, corresponding to reverse transcriptase; some of them also overlapped with the beginning of the downstream RNase H. In further analyses, the Ty3-gypsy sequences obtained with the Ty1-copia primers were included in the Ty3-gypsy data set (Table 1 and S2 Fig). The positions of the cloned DNA fragments in the RTs are indicated in Fig 1.

Table 1. Partial retrotransposon sequences obtained from lentil cultivar Lupa.

All sequences were aligned and then compared with Medicago truncatula LTR retrotransposons [44]; those with a high similarity (e-value < 10^-20) were included in the subsequent analysis. Twenty-one sequences belonged to the copia family and all of them shared the conserved amino acid motif SLYGLKQA characteristic of the copia elements [31, 45] (S1 Fig). A neighbor-joining dendrogram was constructed in which the copia sequences were distributed among six sequence groups (Fig 3A) whose phylogenetic lineages were identified following the classification by Piednöel et al. [46] as Tnt1/Angela/Tont1 and SIRE1/Maximus. The 28 gypsy sequences were clustered in three groups (Fig 3B) classified as Tat/Ogre, Athila and Tekay/DEL/Peabody. Sequences of Group 3 shared several conserved amino acid motifs: LRID, DLRSGY, FG, and FIDD; while sequences in Group 1 shared variants of these sequence patterns: PHID, DGFSGY, FG, and YVDD (S2 Fig). The sequences amplified from the gypsy and copia retrotransposons and considered in this analysis are listed in Table 1 and their corresponding positions within retroelements are indicated in Fig 1.

Fig 3. Phylogenetic trees of reverse transcriptase sequences.

Trees show the relationships between lentil sequences and Medicago truncatula (Mtr) sequences. A, Ty1-copia sequences; B, Ty3-gypsy sequences. Lentil sequences are within boxes indicating the different sequence groups (Gr) to which they belong; groups were related to the M. truncatula clades as described by Piednöel et al. [46] and the M. truncatula sequence numbers are as in Wang and Liu [44]. Red color denotes the presence of premature stop codons in the reading frames.

The three primer combinations (the reverse primer was common to all) derived from the lentil copia element named Tnana amplified between two and five bands per combination, allowing a total of 23 different PCR products to be sequenced (Table 1). After the alignment and phylogenetic analyses, these sequences were identified as part of the two major groups of plant retrotransposons. Fifteen came from the pol gene of Ty1-copia sequences, similar to the Mtr38.1 sequence of M. truncatula included in the Tos17 group (Fig 3A), and eight were partial sequences of the Ty3-gypsy RNase H, similar to the Mtr59.19 sequence included in the Tat group (Fig 3B). In addition to the presence of premature stop codons, all these sequences showed changes in the described conserved active sites of the different enzymatic activities of the polyprotein, thus indicating that they must correspond to non-autonomous elements.

Two of the three partial sequences of the RNase H, encompassing from the end of the RNase H to the LTR (segment ce in Fig 1), were included in the copia family while the third (segment gc in Fig 1) was included in the gypsy family.

A total of 28 sequences out of the 75 retrieved that corresponded to the internal segment included premature stop codons in their ORFs. These codons were generated by point mutations or, more frequently, by frameshift mutations due to indels, generally deletions of several tens of nucleotides. These clones are marked with an asterisk in Fig 3.

Several iPBS markers were partially sequenced and those that had a high similarity with different retrotransposon families were further sequenced in order to obtain their complete sequences. Finally, six sequence types were selected for further analysis of their distribution throughout the Lens genome. A complete LTR with similarity to an Angela lentil sequence [47] was amplified using the PBS7 primer; three LTRs of the non-autonomous TRIM Cassandra were amplified using PBS1, PBS2 or PBS2-PBS3; two complete LTRs of the Ty1-copia family SIRE1/Maximus similar to the gmw2-109b11-re-3 element of Glycine max [48] were obtained from primers PBS1-PBS3 and PBS3-PBS5; three LTRs of a SIRE1-13 element (Ty1-copia) [49] were retrieved from PBS1, PBS7 and PBS2-PBS3; three sequences similar to the LTR of Peabody elements (family Ty3-gypsy) were recovered from PBS1, PBS3 and PBS1-PBS2; and finally, part of the LTR of an Ogre element (family Ty3-gypsy) was derived using PBS1-PBS7 (Table 2 and localization in Fig 1).

Table 2. Retrotransposon long terminal repeat sequences amplified from lentil cultivar Lupa.

Likewise, the lentil LTR sequence described by Smykal et al. [47] was amplified using the primers described by these authors. We obtained a similar Angela LTR sequence, although it differed mainly in short insertion-deletions (Table 2).

Genomic distribution

The sequences listed in Tables 1 and 2 were used as “probes” in an in silico BLASTn search against the lentil draft genome v 0.8 [29]. This draft of the L. culinaris genome includes 490,452 contigs with an average length of 5,673 bp, ranging from 64 bp to 605,900 bp. Only sequences with an e-value lower than 3 × 10^-4 were considered. The results obtained with the different “probe” sequences or groups of them are summarized in Table 3. Depending upon the distance between them, two consecutive hits were either considered part of the same element or assigned to two different elements; the number of putative retrotransposons identified ranged from 31,216 to 24,633 (Table 3) when the internal parts of retrotransposons were used as “probes”. When the LTRs were used, the range was from 51,027 to 30,968. Thus, the LTRs identified a higher number of elements, even though each element carries two LTRs and fewer LTR sequences than internal sequences were used as probes. Likewise, the LTR data revealed that the SIRE lineage was the most abundant class among the Ty1-copia elements, representing 74% to 81% of the copia elements depending on the distance considered between hits. Ogre elements were the most abundant among the Ty3-gypsy elements, although at a lower percentage (57% to 65% of the gypsy sequences), followed by Peabody elements. Ogre retrotransposons are large elements [14], so the number of different Ogre elements identified in the lentil contigs could be close to the figure displayed in the fourth column of Table 3, which considers 50,000 bp distances between consecutive hits.

Table 3. Number of in silico hits of lentil retrotransposon sequence types in relation to distances between consecutive hits.

A further analysis was carried out in order to estimate the frequency of putative incomplete RTs. The most frequent RT elements in lentil were used: Angela, SIRE, Peabody and Ogre. We searched for two LTRs of the same RT family within a distance of 10,000 bp, and then for the presence of an internal sequence between the two LTRs. An internal sequence was identified in only approximately 3% of the Angela and SIRE (62 out of 1,767, and 59 out of 1,957) Ty1-copia elements, 28% (834/2,994) of the Peabody and 10% of the Ogre Ty3-gypsy elements (for Ogre a 50,000 bp distance was considered, since complete Ogre elements are larger than 10,000 bp).
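The search for LTR pairs enclosing an internal sequence can be sketched as follows. This is a hedged illustration rather than the procedure actually run: the function name ltr_pairs_with_internal and the example coordinates are hypothetical, and two details are assumptions of the sketch, namely that the distance is measured from the end of the left LTR to the start of the right LTR and that each LTR hit is used in at most one pair.

```python
def ltr_pairs_with_internal(ltr_hits, internal_hits, max_span=10_000):
    """Pair neighbouring LTR hits and flag pairs enclosing an internal hit.

    ltr_hits: (start, end) hits of ONE LTR family on ONE contig, start <= end.
    internal_hits: (start, end) hits of internal-sequence probes on that contig.
    max_span: largest allowed gap in bp between the two LTRs of a pair
              (10,000 bp in the text; 50,000 bp for Ogre).
    Returns a list of (left_ltr, right_ltr, has_internal) tuples.
    """
    ltrs = sorted(ltr_hits)
    pairs = []
    i = 0
    while i + 1 < len(ltrs):
        left, right = ltrs[i], ltrs[i + 1]
        gap = right[0] - left[1]
        if 0 < gap <= max_span:
            # An internal hit counts only if it lies wholly between the LTRs.
            has_internal = any(
                left[1] < s and e < right[0] for s, e in internal_hits
            )
            pairs.append((left, right, has_internal))
            i += 2  # assumption: an LTR belongs to at most one pair
        else:
            i += 1
    return pairs

# Hypothetical coordinates on a single contig.
ltrs = [(100, 600), (5000, 5500), (30000, 30500), (33000, 33500), (90000, 90500)]
internal = [(2000, 2800), (70000, 70500)]

pairs = ltr_pairs_with_internal(ltrs, internal)
with_internal = sum(1 for _, _, flag in pairs if flag)
print(len(pairs), with_internal)
```

In this toy contig two LTR pairs are found and only one encloses an internal hit; the fifth LTR has no partner within 10,000 bp and would be counted as unpaired.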

The density distribution of retrotransposon hits versus the logarithm of the contig length is shown in Fig 4. A total of 43,109 contigs (< 10%) contained one or more sequences similar to the “probes”, and most of them are large contigs (on average 42,770 bp). Only 141 of the largest contigs (mean size 176,200 bp) contained 10 or more sequences spaced by more than 10 bp. The distribution of the contigs with 0 to 18 LTR-RTs is depicted in Fig 5, with the box-plots indicating the average contig length and quartiles of each class. These results suggest that retrotransposons would be distributed randomly and that their number is directly related to contig length; thus the number included in each contig could possibly fit a Poisson distribution.

Fig 4. Contig length (bp) density distribution.

The figure shows the distributions of contig lengths; blue and red lines indicate contigs with at least one lentil LTR-RT sequence or with 10 or more lentil LTR-RT sequences, respectively. The black line indicates the distribution of all lentil contigs, V0.8 genome. n = number of contigs.

Fig 5. Boxplot distributions of contig length according to the number of “hits” generated by the lentil LTR-RT sequences.

Numbers at the bottom indicate the number of hits per contig while those on top indicate the number of contigs in each class. The first distribution was obtained when two hits were considered different if they were separated by at least 10 bp, the second distribution when hits were separated by at least 10,000 bp.

To test these hypotheses a Poisson regression analysis was carried out using the generalized linear model (glm). The natural logarithm of the number of elements in each contig was the dependent variable while the contig length in bp was the independent variable. The output models were similar for the four between-hit distances tested, as can be seen in S2 Table. The distance used to consider two consecutive hits as belonging to different elements had only a minor effect on the coefficients. In all instances, results did not fit a Poisson distribution since the deviation of the residuals did not follow a normal distribution (S2 Table). Contigs showing a high deviation from the expected value consisted of some large contigs with a low number of elements. Deviations from the expected values could be due to the high number of contigs without hits, as well as to differences in contig length; therefore the fit was also tested on the set of contigs with sizes ranging from 12,800 to 60,480 bp (this set includes the intermediate size interquartile range and excludes most of the contigs lacking hits). Again, data did not fit a Poisson distribution for any of the between-hit distances. For instance, for a distance of more than 1,000 bp between hits an excess of contigs with no retroelements or else with four or more was observed, while a clear deficit in the classes with one or two elements was detected (Fig 6).
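The logic of comparing observed class counts with Poisson expectations can be shown with a simplified Python sketch. It is not the glm fitted in R: it ignores contig length as a covariate and uses a single rate estimated from the sample mean, and the per-contig counts are invented to mimic the pattern described (excess of empty and element-rich contigs, deficit of contigs with one or two elements). All function names are hypothetical.

```python
import math

def observed_classes(counts, max_class):
    """Number of contigs with 0, 1, ..., max_class-1 and >= max_class elements."""
    obs = [0] * (max_class + 1)
    for c in counts:
        obs[min(c, max_class)] += 1
    return obs

def poisson_expected(counts, max_class):
    """Expected contigs per class under a Poisson model with the sample mean as rate."""
    n = len(counts)
    lam = sum(counts) / n
    probs = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(max_class)]
    probs.append(1.0 - sum(probs))  # pooled upper tail
    return lam, [n * p for p in probs]

def chi_square(observed, expected):
    """Pearson chi-square statistic over the classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Invented element counts per contig (NOT data from the lentil genome).
counts = [0] * 60 + [1] * 10 + [2] * 5 + [4] * 15 + [6] * 10

obs = observed_classes(counts, max_class=4)
lam, exp = poisson_expected(counts, max_class=4)
print("rate:", lam)
for k, (o, e) in enumerate(zip(obs, exp)):
    label = f">={k}" if k == 4 else str(k)
    print(label, o, round(e, 1))
print("chi-square:", round(chi_square(obs, exp), 1))
```

With these invented counts the zero class and the pooled class of four or more are over-represented and classes one and two are under-represented relative to the Poisson expectation, which is the qualitative departure reported for the lentil contigs.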

Fig 6. Goodness of fit testing a Poisson distribution of the LTR-RT number according to the square root of the number of contigs.

The continuous line indicates the theoretical distribution and bars the real number of contigs within each class. A between-hit distance of > 10,000 bp was considered.

The physical maps of the possible retroelement locations in the seven contigs with the highest number of hits are shown in Fig 7. Only six possible complete LTR-RTs were detected: two large Ogre-like elements (included in the gypsy family) in contigs 11687 and 341955, and four Peabody-like elements (also included in gypsy), one each in contigs 82312 and 300413 and two in contig 24787. The Ogre-like sequence of contig 11687 exhibited an internal hit for a reverse transcriptase of group 1e displayed in Fig 3B, while the sequence of contig 341955 probably constitutes a chimeric non-functional sequence since it included several internal hits of LTRs derived from other elements such as Peabody (gypsy) and Angela (copia) plus a reverse transcriptase (gypsy Group 3b). The four Peabody-like elements detected showed an internal hit of the gypsy reverse transcriptase belonging to Group 3b.

Fig 7. Diagram of the contigs in the lentil genome containing the highest number of LTR-RT sequences.

Contig size (above) and contig name (below) are indicated. Arrows indicate sequence orientations. Blue boxes indicate putative RTs flanked by two LTRs; red boxes indicate the presence of a reverse transcriptase sequence between LTRs. LTRs are named according to their lineage (Table 2) and the internal sequences according to the nomenclature used in Table 1.


In this study, several retrotransposon-based markers, such as iPBS and REMAP, were tested for their utility in genetic mapping in lentil. The iPBS technique [17] proved to be especially useful because it yielded a high number of Mendelian markers. Likewise, it provided a simple method to obtain LTR retrotransposon (LTR-RT) sequences in order to gain knowledge of their type and distribution in plant genomes. Since their introduction, iPBS markers have been used extensively in plants for fingerprinting and genetic diversity analyses [21–23, 50]; nonetheless, their direct use in genetic mapping remains scarce. To the best of our knowledge, this study is the first example of direct iPBS marker use in genetic mapping.

Considering the lentil parental lines used in this study, the high number of iPBS bands amplified in each reaction (an average of 9.75) indicates that LTR-RTs are abundant and widespread throughout the lentil genome, particularly considering that iPBS PCR fragments are only produced when two LTR-RTs are located relatively close to each other and in inverted orientations. The high iPBS marker polymorphism detected in lentil agrees with previous results in pea [23]. Upon comparing the number of polymorphic iPBS loci (presence vs. absence of an amplified fragment) between the cultivated lentil and its wild ancestor, cultivar Lupa generated a slightly higher number (106) than the wild L. c. orientalis (87), although the difference was non-significant (χ2 = 1.87, 1 d.f., 0.2 > P > 0.1).
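The reported χ2 value can be reproduced with a short calculation, assuming (this is not stated explicitly in the text) that the null hypothesis is an equal split of the 193 polymorphic loci between the two parents. The function name below is hypothetical; the P value uses the identity that, for one degree of freedom, the upper-tail probability equals erfc(sqrt(χ2 / 2)).

```python
import math

def chi_square_equal_split(a, b):
    """Chi-square test (1 d.f.) of two counts against a 1:1 expectation."""
    expected = (a + b) / 2
    chi2 = (a - expected) ** 2 / expected + (b - expected) ** 2 / expected
    # Upper-tail probability of a chi-square variable with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# 106 loci with the band present in Lupa, 87 with the band present in the wild parent.
chi2, p_value = chi_square_equal_split(106, 87)
print(round(chi2, 2))       # 1.87
print(0.1 < p_value < 0.2)  # True
```

The expected count under this assumption is 96.5 per parent, each count deviates by 9.5, and 2 × 9.5² / 96.5 gives the 1.87 quoted above.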

Segregating iPBS and REMAP markers were arranged in eight linkage groups, one more than the haploid chromosome number of Lens. The marker density within each linkage group was uneven, in general showing tighter clustering within the central regions, possibly marking genomic regions favorable for LTR-RT accumulation. Although a wide variation exists depending upon the specific LTR-RT element, in general LTR-RTs in other species tend to concentrate in gene-poor regions [51]. However, in Medicago and Lotus, LTR-RT rich regions can be as large as entire chromosome arms [52]. Analyzing the origin and distribution of the PCR product alleles, it can be observed (Fig 2) that in general these alleles derived equally from both parents and were distributed at random; yet in some linkage regions the alleles of one parent were clustered, as in linkage groups 1 and 6, which exhibit clusters of Lupa alleles. These regions might represent zones where LTR-RT amplifications preferentially occurred in the cultivated lentil during the domestication process, or where LTR-RTs were inadvertently selected by linkage drag.

Sequencing of iPBS products and other retrotransposon-based markers yielded a total of 81 sequences. The phylogenetic analysis disclosed that Lens contains all of the main LTR-RT families and phylogenetic lineages described in legumes [9]. According to our results, the lineage Tat/Ogre would be the most abundant in lentil, although the observed frequency could be biased by the PCR primers used and is likely to be an underestimate, since the current assembled lentil genome still lacks a large part comprising essentially repeated sequences. However, the prevalence of Ogre is in accordance with previous data from species phylogenetically close to lentil [9, 14].

Up to 37% of the lentil LTR-RT internal sequences analyzed in this study presented premature stop codons (PSCs) originating from nucleotide substitutions, frameshift mutations, or both. The affected translatable sequences corresponded mainly to reverse transcriptase, since most of the analyzed sequences corresponded to this enzyme, but premature stop codons were also observed in other sequences. Since only a relatively small part of the total translatable fraction was sequenced, more PSCs could be present. Plant genome analyses in general tend to reveal a large number of LTR retrotransposons that contain stop codons interrupting their ORFs, lack one or more coding domains, or both [53]. Therefore, as in other plant species, a high proportion of the LTR-RTs would consist of non-autonomous elements lacking the essential enzymatic functions that enable genomic movement. However, most of these inactive or non-autonomous elements are likely to be able to retrotranspose using the trans-factors synthesized by autonomous elements present elsewhere in the genome, thus generating large changes in genome size [54–57]. The subsequent in silico search for homologous sequences in the lentil draft genome yielded a lower number of hits when the more numerous internal retrotransposon sequences (i.e., reverse transcriptase, RNase H, etc.) were used than when the LTR sequences were used, suggesting that many lentil retrotransposons contain internal deletions. Thus, they could be included in the non-autonomous set.

When the distance considered between consecutive LTR hits increased from 10 up to 10,000 bp, the theoretical size of whole or nearly whole retrotransposons, the number of identified elements should have been reduced by approximately 50%, because the two LTRs of each complete element would then be counted as a single element. However, the observed reduction was 15.9% for the copia elements and 22.9% for the gypsy elements. Thus, the higher number of elements identified by the LTR hits compared to the internal sequence hits, together with the frequent absence of internal sequences between LTRs, indicates a significant presence in lentil of defective non-autonomous elements, solo-LTRs and probably TRIM elements. Macas et al. [9] already indicated that the number of LTRs in species of the tribe Vicieae (Fabeae) was higher than that of the corresponding whole LTR-RTs; the estimated proportion of incomplete elements amounted to 41.2%, and to 37.5% within the gypsy Ogre elements (the most frequent retroelements).
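The counting rule behind this comparison can be sketched as follows. This is a minimal illustration and not the script used in this study: the function name `count_elements`, the representation of a hit as a (contig, position) pair and the toy coordinates are hypothetical. Consecutive hits on the same contig that are closer than the chosen distance are merged into one putative element.

```python
from collections import defaultdict


def count_elements(hits, max_gap):
    """Count putative elements after merging consecutive hits closer than max_gap.

    hits: iterable of (contig_id, position) pairs, position in bp (hypothetical format).
    """
    by_contig = defaultdict(list)
    for contig, position in hits:
        by_contig[contig].append(position)

    total = 0
    for positions in by_contig.values():
        positions.sort()
        total += 1  # the first hit on a contig always opens an element
        for previous, current in zip(positions, positions[1:]):
            if current - previous >= max_gap:
                total += 1  # too far from the previous hit: a new element
    return total


# Invented toy hits, not lentil data.
toy_hits = [("c1", 100), ("c1", 5100), ("c1", 40000), ("c2", 200), ("c2", 209)]

n_short = count_elements(toy_hits, max_gap=10)
n_long = count_elements(toy_hits, max_gap=10000)
print(n_short, n_long, f"reduction: {100 * (n_short - n_long) / n_short:.1f}%")
```

If every hit belonged to a complete element with two LTRs less than 10,000 bp apart, the count at the larger distance would be half of the count at the smaller one; a smaller reduction points to unpaired LTRs.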

In lentil, the Ty1-copia, Ty3-gypsy and TRIM family frequencies are 17.6%, 78.7% and 0.03% of the LTR-RTs, respectively [9]. The lineage estimates obtained by our in silico search agree with these figures. When each of the LTR sequences obtained was analyzed separately, and two consecutive hits separated by less than 10,000 bp were considered to belong to the same retrotransposon, the percentages obtained were 25.7% (10,898 hits) Ty1-copia, 72.0% (30,559) Ty3-gypsy and 2.3% (998) TRIM sequences (Table 3). Our data also agree with previous results [9] for the lineages within each family, especially for the major ones: Tat/Ogre (59.8% of the Ty3-gypsy vs. 66.0% in Macas et al. [9]), Tekay/DEL/Peabody (40.2% of the Ty3-gypsy vs. 26%), and SIRE1/Maximus (79.8% of the Ty1-copia vs. 93%).

Retrotransposons are generally widely distributed along plant genomes [7] and can be randomly distributed, as in the case of maize [58], although there are reports of preferential distribution of at least some types of retrotransposons, such as TRIMs [59, 60], and of retrotransposon-derived markers in plant genomes [7, 61, 62]. The lentil elements analyzed were widely distributed, as suggested by the percentage (8.8%) of the lentil draft genome contigs that contained at least one element. If LTR-RTs were distributed at random in the genome, the numbers of contigs containing zero, one, two, etc. LTR-RTs would be expected to fit a Poisson distribution. However, our data did not support this hypothesis, even when a correction for contig size was considered. For instance, there are more large contigs without LTR-RTs than expected (Fig 6). Furthermore, the genetic map generated with the iPBS markers (Fig 2) exhibited regions of different marker densities. These results suggest that the distribution of these elements in lentil is non-random.
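The logic of the Poisson comparison can be illustrated with a simplified Python sketch. It is not the analysis reported here, which used a general linear model and a Kolmogorov-Smirnov test of the residuals (S2 Table); it ignores contig size and uses the variance-to-mean ratio as a rough indicator instead. The function names and the toy counts are assumptions made for the example.

```python
import math


def poisson_expected(counts, max_k):
    """Expected number of contigs with k = 0..max_k elements under a Poisson model."""
    n = len(counts)
    lam = sum(counts) / n  # the Poisson mean is estimated by the sample mean
    return [n * math.exp(-lam) * lam ** k / math.factorial(k)
            for k in range(max_k + 1)]


def dispersion_index(counts):
    """Variance-to-mean ratio: about 1 under a Poisson model, above 1 if clustered."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return variance / mean


# Invented toy data: 90 contigs without elements and 10 contigs with 5 each.
toy_counts = [0] * 90 + [5] * 10
expected = poisson_expected(toy_counts, max_k=5)
observed_zero = toy_counts.count(0)

print(f"contigs with zero elements: observed {observed_zero}, "
      f"expected {expected[0]:.1f}")
print(f"dispersion index: {dispersion_index(toy_counts):.2f}")
```

In the toy data the empty contigs are more numerous than the Poisson expectation and the dispersion index is well above 1, the pattern expected when elements are clustered rather than scattered at random.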


To sum up, the present data indicate that LTR-RTs are a suitable source of genetic markers in lentil and are useful in genetic analysis and map construction. These markers can be useful for future genetic analyses, marker-assisted breeding, and phylogenetic studies. The results also indicate that a high proportion of the lentil retroelements have lost their autonomous transposition ability, through point mutations, deletions, or both. In fact, many of them seem to be defective elements with internal deletions. Likewise, although LTR-RTs seem to be widely distributed throughout the lentil genome, their distribution is not completely random, with elements more densely clustered in some regions of the Lens genome.

Supporting information

S1 Fig. Partial amino acid sequences of the reverse transcriptase encoded by the lentil Ty1-copia elements.

Asterisks denote premature stop codons, and question marks denote the absence of one or two nucleotides in the corresponding reading frame. Conserved motifs are located inside boxes. Lentil sequences are denoted by “Cop” followed by a number (see Table 1). Mtr denotes sequences of Medicago truncatula, numbered according to Wang and Liu [44], with an indication of the clade in which they are included according to Piednoël et al. [46].


S2 Fig. Partial amino acid sequences of the reverse transcriptase encoded by the lentil Ty3-gypsy elements.

Lentil sequences are denoted by “Gyp” followed by a number. See the legend of S1 Fig for additional details.


S3 Fig. Partial amino acid sequences of the reverse transcriptase and RNaseH encoded by Ty1-copia elements obtained with Tnana-derived primers.

RNaseH indicates the starting point of this protein. See the legend of S1 Fig for additional details.


S4 Fig. Partial amino acid sequences of the RNaseH encoded by Ty3-gypsy elements obtained with Tnana-derived primers.

See the legend of S1 Fig for additional details.


S2 Table. Coefficients (a and b) of the general linear model (glm) testing a Poisson distribution, and values of the D statistic followed by its probability according to the Kolmogorov-Smirnov test of the fit of the residuals to a normal distribution.


Author Contributions

  1. Conceptualization: MPV LSM.
  2. Data curation: RRB LSM PG.
  3. Formal analysis: RRB LSM PG.
  4. Funding acquisition: MPV.
  5. Investigation: RRB LSM.
  6. Methodology: RRB LSM PG.
  7. Project administration: MPV LSM.
  8. Software: LSM.
  9. Supervision: MPV LSM.
  10. Visualization: LSM RRB PG.
  11. Writing – original draft: MPV.
  12. Writing – review & editing: MPV LSM PG.


References

  1. Fratini R, Pérez de la Vega M. Genetics of economic traits in lentil: seed traits and adaptation to climatic variations. Grain Legumes. 2011; 57:18–20.
  2. Pérez de la Vega M, Fratini R, Muehlbauer FJ. Lentil (Lens culinaris Medik.). In: Pérez de la Vega M, Torres AM, Cubero JI, editors. Genetics, Genomics and Breeding of Cool Season Grain Legumes. CRC Press; 2011. pp. 98–150.
  3. Mikic A, Smykal P, Kenicer G, Vishnyakova M, Sarukhanyan N, Akopian JA, et al. Beauty will save the world, but will the world save beauty? The case of the highly endangered Vavilovia formosa (Stev.) Fed. Planta. 2014; 240:1139–1146. pmid:25086615
  4. Schaefer H, Hechenleitner P, Santos-Guerra A, Menezes de Sequeira M, Pennington RT, Kenicer G, et al. Systematics, biogeography, and character evolution of the legume tribe Fabeae with special focus on the middle-Atlantic island lineages. BMC Evol Biol. 2012; 12:250. pmid:23267563
  5. Feschotte C, Jiang N, Wessler SR. Plant transposable element: where Genetics meets Genomics. Nat Rev Genet. 2002; 3:329–341. pmid:11988759
  6. Kumar A, Bennetzen JL. Plant retrotransposons. Ann Rev Genet. 1999; 33:479–532. pmid:10690416
  7. Waugh R, McLean K, Flavell J, Pearce SR, Kumar A, Thomas BBT, et al. Genetic distribution of Bare–1-like retrotransposable elements in the barley genome revealed by sequence-specific amplification polymorphisms (S-SAP). Mol Gen Genet. 1996; 253:687–694.
  8. Ragupathy R, You FM, Cloutier S. Arguments for standardizing transposable element annotation in plant genomes. Trends Plant Sci. 2013; 18:367–376. pmid:23618952
  9. Macas J, Novák P, Pellicer J, Čížková J, Koblížková A, Neumann P, et al. In depth characterization of repetitive DNA in 23 plant genomes reveals sources of genome size variation in the legume tribe Fabeae. PLoS One. 2015; 10(11):e0143424. pmid:26606051
  10. Kalendar R, Vicient CM, Peleg O, Anamthawat-Jonsson K, Bolshoy A, Schulman DAH. Large retrotransposon derivatives: abundant, conserved but nonautonomous retroelements of barley and related genomes. Genetics. 2004; 166:1437–1450. pmid:15082561
  11. Witte CP, Le QH, Bureau T, Kumar DA. Terminal-repeat retrotransposons in miniature (TRIM) are involved in restructuring plant genomes. Proc Natl Acad Sci USA. 2001; 98:13778–13783. pmid:11717436
  12. Hill P, Burford D, Martin DMA, Flavell AJ. Retrotransposon populations of Vicia species with varying genome size. Mol Genet Genomics. 2005; 273:371–381. pmid:15891910
  13. Macas J, Neumann P, Navrátilová A. Repetitive DNA in the pea (Pisum sativum L.) genome: comprehensive characterization using 454 sequencing and comparison to soybean and Medicago truncatula. BMC Genomics. 2007; 8:427. pmid:18031571
  14. Macas J, Neumann P. Ogre elements—A distinct group of plant Ty3/gypsy-like retrotransposons. Gene. 2007; 390:108–116. pmid:17052864
  15. Flavell AJ, Knox MR, Pearce SR, Ellis THN. Retrotransposon based insertion polymorphisms (RBIP) for high throughput marker analysis. Plant J. 1998; 16:643–650. pmid:10036780
  16. Kalendar R, Schulman AH. IRAP and REMAP for retrotransposon-based genotyping and fingerprinting. Nature Protoc. 2007; 1:2478–2484.
  17. Kalendar R, Antoniu K, Smykal P, Schulman AH. iPBS: a universal method for DNA fingerprinting and retrotransposon isolation. Theor Appl Genet. 2010; 121:1419–1430. pmid:20623102
  18. Monden Y, Yamaguchi K, Tahara M. Application of iPBS in high-throughput sequencing for the development of retrotransposon-based molecular markers. Curr Plant Biol. 2014; 1:40–44.
  19. Kalendar R, Flavell AJ, Ellis THN, Sjakste T, Moisy C, Schulman AH. Analysis of plant diversity with retrotransposon-based molecular markers. Heredity. 2011; 106:520–530. pmid:20683483
  20. Roy NS, Choi J-Y, Lee S-l, Kim N-S. Marker utility of transposable elements for plant genetics, breeding, and ecology: a review. Genes Genom. 2015; 37:141–151.
  21. Baloch FS, Derya M, Andeden EE, Alsaleh A, Cömertpay G, Kilian B, et al. Inter-primer binding site retrotransposon and inter-simple sequence repeat diversity among wild Lens species. Biochem Syst Ecol. 2015; 58:162–168.
  22. Andeden EE, Baloch FS, Derya M, Ozkan H. IPBS-Retrotransposon-based genetic diversity and relationship among wild annual Cicer species. J Plant Biochem Biotechnol. 2013; 22:453–466.
  23. Baloch FS, Alsaleh A, Sáenz de Miera LE, Hatipoğlu R, Çiftçi V, Karakdy T, Yildiz M, Ozkan H. DNA based iPBS-retrotransposon markers for investigating the population structure of pea (Pisum sativum) germplasm from Turkey. Biochem Syst Ecol. 2015; 61:244–252.
  24. Ellis THN, Poyser SJ, Knox MR, Vershinin AV, Ambrose MJ. Polymorphism of insertion sites of Ty1-copia class retrotransposons and its use for linkage and diversity analysis in pea. Mol Gen Genet. 1998; 260:9–19. pmid:9829823
  25. Jing R, Vershinin A, Grzebyta J, Shaw P, Smykal P, Marshall D, et al. The genetic diversity and evolution of field pea (Pisum) studied by high throughput retrotransposon based insertion polymorphism (RBIP) marker analysis. BMC Evol Biol. 2010; 10:44. pmid:20156342
  26. Sanz A Martín, Gonzalez S Gilsanz, Syed NH, Suso MJ, Saldaña C Caminero, Flavell AJ. Genetic diversity analysis in Vicia species using retrotransposon-based SSAP markers. Mol Genet Genomics. 2007; 278:433–441. pmid:17576596
  27. Pearce SR, Knox M, Ellis THN, Flavell AJ, Kumar A. Pea Ty1-copia group retrotransposons: transpositional activity and use as markers to study genetic diversity in Pisum. Mol Gen Genet. 2000; 263:898–907. pmid:10954074
  28. Smykal P. Development of an efficient retrotransposon-based fingerprinting method for rapid pea variety identification. J Appl Genet. 2006; 47:221–30. pmid:16877800
  29. Bett K, Ramsay L, Chan C, Sharpe AG, Cook DR, Penmetsa RV, et al. Lentil 1.0 and Beyond. Plant & Animal Genome Conference XXIV, 2016. W222. Available from:
  30. Rey, MR. Desarrollo de marcadores genéticos a partir de secuencias de retrotransposones del genoma de lenteja, Lens culinaris Medik. Ph.D. Dissertation, Universidad de León. 2016. Available from:
  31. Hirochika F, Hirochika R. Ty1-copia group retrotransposons as ubiquitous components of plant genomes. Jpn J Genet. 1993; 68:35–46. pmid:8391285
  32. Friesen N, Brandes A, Heslop-Harrison JS. Diversity, origin, and distribution of retrotransposons (gypsy and copia) in conifers. Mol Biol Evol. 2001; 18:1176–1188. pmid:11420359
  33. Llorente A. Análisis de genes de leguminas en el género Lens. Ph.D. Dissertation, Univ. de León, Spain. 2001.
  34. Hamwieh A, Udupa SM, Choumane W, Sarker A, Dreyer F, Jung C, et al. A genetic linkage map of Lens sp. based on microsatellite and AFLP markers and the localization of fusarium vascular wilt resistance. Theor Appl Genet. 2005; 110:669–677. pmid:15650814
  35. Pearce SR, Stuart-Rogers C, Knox M, Kumar A, Ellis THN, Flavell AJ. Rapid isolation of plant Ty1-copia group retrotransposon LTR sequences for molecular marker studies. Plant J. 1999; 19:711–717. pmid:10571856
  36. Lander ES, Green P, Abrahamson J, Barlow A, Daly MJ, Lincoln SE, et al. MAPMAKER: an interactive computer package for constructing primary genetic linkage maps of experimental and natural populations. Genomics. 1987; 1:174–181. pmid:3692487
  37. Givry S, Bouchez M, Chabrier P, Milan D, Schiex T. CarthaGene: multipopulation integrated genetic and radiation hybrid mapping. Bioinformatics. 2005; 21:1703–1704. pmid:15598829
  38. Voorrips RE. MapChart: Software for the graphical presentation of linkage maps and QTLs. J Hered. 2002; 93:77–78. pmid:12011185
  39. Kumar S, Stecher G, Tamura K. MEGA7: Molecular Evolutionary Genetics Analysis version 7.0 for bigger datasets. Mol Biol Evol. 2016.
  40. Saitou N, Nei M. The neighbor-joining method: A new method for reconstructing phylogenetic trees. Mol Biol Evol. 1987; 4:406–425. pmid:3447015
  41. Tamura K. Estimation of the number of nucleotide substitutions when there are strong transition-transversion and G + C-content biases. Mol Biol Evol. 1992; 9:678–687. pmid:1630306
  42. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, et al. BLAST+: architecture and applications. BMC Bioinformatics. 2009; 10:421. pmid:20003500
  43. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. 2014. Available from: URL
  44. Wang H, Liu J-S. LTR retrotransposon landscape in Medicago truncatula: more rapid removal than in rice. BMC Genomics. 2008; 9:308.
  45. Flavell AJ, Smith JB, Kumar A. Extreme heterogeneity of Ty1-copia group retrotransposons in plants. Mol Gen Genet. 1992; 231:233–242. pmid:1370976
  46. Piednoël M, Carrete-Vega G, Renner SS. Characterization of the LTR retrotransposon repertoire of a plant clade of six diploid and one tetraploid species. Plant J. 2013; 75:699–709. pmid:23663083
  47. Smykal P, Kalendar R, Ford R, Macas J, Griga M. Evolutionary conserved lineage of Angela-family retrotransposons as a genome-wide microsatellite repeat dispersal agent. Heredity. 2009; 103:157–167. pmid:19384338
  48. Wawrzynski A, Ashfield T, Chen NW, Mammadov J, Nguyen A, Podicheti R, et al. Replication of nonautonomous retroelements in soybean appears to be both recent and common. Plant Physiol. 2008; 148:1760–1771. pmid:18952860
  49. Pearce SR. SIRE-1, a putative plant retrovirus is closely related to a legume Ty1-copia retrotransposon family. Cell Mol Biol Lett. 2007; 12:120–126. pmid:17119871
  50. Guo DL, Guo MX, Hou XG, Zhang GH. Molecular diversity analysis of grape varieties based on iPBS markers. Biochem Syst Ecol. 2014; 52:27–32.
  51. Zhao M, Ma J. Co-evolution of plant LTR-retrotransposons and their host genomes. Protein Cell. 2013; 4:493–501. pmid:23794032
  52. Bertioli DJ, Moretzsohn MC, Madsen LH, Sandal N, Leal-Bertioli SCM, Guimarães PM, et al. An analysis of synteny of Arachis with Lotus and Medicago sheds new light on the structure, stability and evolution of legume genomes. BMC Genomics. 2009; 10:45. pmid:19166586
  53. Schulman AH. Retrotransposon replication in plants. Curr Opin Virol. 2013; 3:604–614. pmid:24035277
  54. Flavell AJ, Dunbar E, Anderson R, Pearce SR, Hartley R, Kumar A. Ty1-copia group retrotransposons are ubiquitous and heterogeneous in higher plants. Nucl Acids Res. 1992; 20:3639–3644. pmid:1379359
  55. Sabot F, Schulman AH. Parasitism and the retrotransposon life cycle in plants: a hitchhiker’s guide to the genome. Heredity. 2006; 97:381–388. pmid:16985508
  56. Tanskanen JA, Sabot F, Vicient C, Schulman AH. Life without GAG: The BARE-2 retrotransposon as a parasite's parasite. Gene. 2007; 390:166–174. pmid:17107763
  57. Estep MC, DeBarry JD, Bennetzen JL. The dynamics of LTR retrotransposon accumulation across 25 million years of panicoid grass evolution. Heredity. 2013; 110:194–204. pmid:23321774
  58. Meyers BC, Tingey SV, Morgante M. Abundance, distribution, and transcriptional activity of repetitive elements in the maize genome. Genome Res. 2001; 11:1660–1676. pmid:11591643
  59. Gao D, Li Y, Kim KD, Abernathy B, Jackson SA. Landscape and evolutionary dynamics of terminal repeat retrotransposons in miniature in plant genomes. Genome Biol. 2016; 17:7. pmid:26781660
  60. Langdon T, Seago C, Mende M, Leggett M, Thomas H, Forster JW, et al. Retrotransposon evolution in diverse plant genomes. Genetics. 2000; 156:313–325. pmid:10978295
  61. Bernet GP, Asíns MJ. Identification and genomic distribution of gypsy like retrotransposons in Citrus and Poncirus. Theor Appl Genet. 2003; 108:121–130. pmid:12937896
  62. Peterson-Burch B, Nettleton D, Voytas DF. Genomic neighborhoods for Arabidopsis retrotransposons: a role for targeted integration in the distribution of the Metaviridae. Genome Biol. 2004; 5:R78. pmid:15461796