Mapping of important taxonomic and productivity traits using genic and non-genic transposable element markers in peanut (Arachis hypogaea L.)

A mapping population of recombinant inbred lines (RILs) derived from TMV 2 and its mutant, TMV 2-NLM was employed for mapping important taxonomic and productivity traits using genic and non-genic transposable element markers in peanut. Single nucleotide polymorphism and copy number variation using RAD-Sequencing data indicated very limited polymorphism between TMV 2 and TMV 2-NLM. But phenotypically they differed significantly for many taxonomic and productivity traits. Also, the RIL population showed significant variation for a few additional agronomic traits. A genetic linkage map of 1,205.66 cM was constructed using 91 genic and non-genic Arachis hypogaea transposable element (AhTE) markers. Using single marker analysis and QTL analysis, the markers with high phenotypic variance explained (PVE) were identified for branching pattern (32.3%), number of primary and secondary branches (19.9% and 28.4%, respectively), protein content (26.4%), days to 50% flowering (22.0%), content of oleic acid (15.1%), test weight (13.6%) and pod width (12.0%). Three genic markers (AhTE0357, AhTE0391, AhTE0025) with Arachis hypogaea miniature inverted-repeat transposable element (AhMITE1) activity in the genes Araip.TG1BL (B02 chromosome), Aradu.7N61X (A09 chromosome) and Aradu.7065G (A07 chromosome), respectively showed strong linkage with these taxonomic, productivity and quality traits. Since TMV 2 and TMV 2-NLM differed subtly at DNA level, the background noise in detecting the marker-trait associations was minimum; therefore, the markers identified in this study for the taxonomic and productivity traits may be significant and useful in peanut molecular breeding.

Introduction using the gene prediction data from the diploid genomes (available at https://peanutbase.org) as described earlier [11]. TMV 2 and TMV 2-NLM were also checked for the differential activity of the AhMITE1 at various sites [9,12] in the genome. The polymorphic sites were searched for functional annotation.

Evaluation of the RIL mapping population
The mapping population with 432 RILs derived from TMV 2 and its EMS-derived mutant, TMV 2-NLM [4] was developed by single seed descent method at University of Agricultural Sciences (UAS), Dharwad, India [5]. The F 14 seeds of these RILs were obtained from the Department of Genetics and Plant Breeding, UAS, Dharwad, India.
Phenotyping of the RILs. The RIL population along with the parents was grown at IABT Garden of the Department of Biotechnology, UAS, Dharwad, India during the rainy seasons of 2014 and 2015 in a randomized block design with two replications. Each replication consisted of 2 rows of 1.5 mt length with a spacing of 30 cm × 10 cm. Five representative plants were selected randomly from each RIL to record the taxonomic and productivity traits. Reactions to late leaf spot (LLS) and rust were assessed by subjecting the RILs to field screening following spreader row technique [13] in which the disease spreader plants [TMV 2 and Mutant 28-2 [14]] were planted at regular interval of 10 rows, and the disease epiphytotic condition was created using the inoculums. Disease scoring for both LLS and rust was done at 70, 80 and 90 days after sowing (DAS) according to modified 9-point scale [15]. Observations on taxonomic and morphological traits (branching pattern, growth habit, plant height, leaflet length, leaflet width, leaflet shape, leaflet colour), and productivity traits (number of pods per plant, pod yield per plant, pod yield, test weight, shelling percentage, pod length, pod width, pod size, pod constriction, pod reticulation, kernel colour, seed shape, seeds per pod) were recorded as per the groundnut descriptor [16]. Number of primary branches (NPB) borne on main axis, and the number of secondary branches (NSB) borne on primary branches were recorded. Sound mature kernel weight (%) was calculated as weight of well-developed kernels from a unit weight of kernels. Nutritional parameters contents of protein, oil, arachidic acid, behenic acid, eicosanoic acid, lignoseric acid, linoleic acid, oleic acid and palmitic acid were analyzed by near infrared spectroscopy (NIRS) using FOSS NIR System, 6500 Composite (FOSS Analytical A/S, Denmark). Chlorophyll content was measured in terms of SPAD chlorophyll meter reading (SCMR) with the help of SPAD meter on 37 DAS. Seed dormancy test was conducted by subjecting the dried seeds for germination after 15 days of harvesting, and observing for the number of seeds germinated on each day for 14 consecutive days. Percentages of seeds germinated were used to record the level of seed dormancy using the standard scores [17].

Genotyping of the RILs
Total genomic DNA from the RILs was extracted from young leaves using modified cetyl trimethyl ammonium bromide (CTAB) method [18]. The RILs were genotyped with the AhTE markers, which were polymorphic between TMV 2 and TMV 2-NLM (S1 Table). The PCR was carried out in a reaction volume of 10 μl with 50 ng of template DNA, 5 pmol of each primer, 10X of Taq polymerase buffer [500 mM KCl, 100 mM Tris-HCI (pH 8.5], 2.0 mM of MgCl 2 , 0.25 mM of dNTPs and 0.15 U of Taq polymerase. PCR was performed in 96-wellplates using Veriti 96-Well Thermal Cycler (Applied Biosystem) with the temperature profile of 95˚C for 5 min and 35 cycles of 95˚C for 1 min, 53˚C for 1 min and 72˚C for 1.30 min, and 72˚C for 8 min for final extension. The PCR products were analyzed by loading them on 1.8% agarose gel and electrophoresing in 1X TAE at 80 V for 2 h using Bio-Rad gel electrophoresis unit. The amplicons were visualized using ethidium bromide staining method. Specific PCR product was identified for each marker [9,12]

Single marker analysis (SMA)
Single marker analysis was performed to find out the association between the AhTE markers and the traits observed in this study by calculating F statistic and simple linear regression coefficient [20] using WinQTL Cartographer version 2.5 [21]. Those significant and major markers showing >10% R 2 were analyzed for their position in the genome and functional annotation using the gene prediction data from the diploid genomes (available at https://peanutbase.org).

Linkage map construction
Linkage analysis was performed with JoinMap 4.0 [19]. The "Locus genotype frequency" function was applied to calculate chi-square values for each marker to test for the expected 1:1 segregation. Markers were placed onto linkage groups with the "LOD groupings" and "Create groups for mapping" command using the Kosambi map function [22]. Calculation parameters were set for a minimum LOD threshold of 3.0, and the marker order in groups was established by "Calculate Map" command. After developing the framework genetic map, the unmapped markers were placed onto different linkage groups. The linkage map was drawn using the software MapChart 2.2 [23].

QTL analysis
The QTL mapping was performed for the phenotypic data collected during the two seasons; the rainy seasons of 2014 and 2015 and the linkage map using Windows QTL Cartographer version 2.5 [21] to detect QTL regions. Composite interval mapping (CIM) with 1,000 permutations, 1.0 cM scanning interval between markers and putative QTL, and a window size of 10.0 cM was used for QTL mapping.

Results
Field evaluation of TMV 2 and its mutant, TMV 2-NLM during the rainy seasons of 2014 and 2015 showed main stem flowering, sequential branching pattern and erect growth habit and for TMV 2, and absence of main stem flowering, alternate branching pattern and semi-spreading growth habit for TMV 2-NLM (Table 1 and Fig 1). TMV 2 had wide elliptical leaflets, while TMV 2-NLM had narrow, linear and lanceolate leaflets. Significant differences were also observed between TMV 2 and TMV 2-NLM for number of primary branches, number of secondary branches, pod yield per plant, test weight, shelling percentage, sound mature kernel weight, arachidic acid, behenic acid, eicosenoic acid, lignoseric acid, linoleic acid, oleic acid, palmitic acid and seed dormancy. Thus, TMV 2 and its primary mutant TMV 2-NLM differed for several taxonomic, agronomic, productivity and nutritional traits.
An attempt was made to check the genetic differences between TMV 2 and TMV 2-NLM. ddRAD-Sequencing of TMV 2 and TMV 2-NLM with~4X coverage could detect a total of 31 SNPs across 7 chromosomes (Table 2). Of them, only three SNPs were found from A genome, while remaining 28 SNPs originated from B genome. Twenty-nine SNPs were genic and only two were non-genic with respect to their location. Of the genic SNPs, a large number (17) was present in the introns. Of the six SNPs present in the exonic region, four were non- synonymous and two were synonymous. It was interesting to note that a few genes accumulated more SNPs upon EMS mutagenesis. Araip.NZ9YG gene on chromosome B04 coding for a protein with "F-box protein interaction domain" carried 14 SNPs, while Araip.X5KQ1 gene on chromosome B01 coding for probable sugar phosphate-protein carried four SNPs. In total, nine genes showed sequence alterations due to SNPs with or without possible functional alterations. An effort was made to check the copy number variations (CNVs) between TMV 2 and TMV 2-NLM. A total of 1,200 genomic regions showed significant CNVs (Fig 2), however, only five regions showed a change (increase or decrease) by at least five-fold of log 2 . TMV 2 and TMV 2-NLM were also checked for the differential activity of AhMITE1 over 369 genomic sites using the AhTE markers. Polymorphism was observed at 105 (28.4%) sites between TMV 2 and TMV 2-NLM. Of them, 57 sites represented genic regions, while 48 belonged to non-genic regions. Sequence analysis indicated that out of the 57 genic sites, the transpositional activity of the AhMITE1 was found in the exons at six sites. In 16 genic sites, the transposon activity was restricted to intronic regions. Six genes had transposition site at untranslated regions (UTRs). Fifteen genes had transposition site at upstream regions (23-949 bp) and 14 genes had transposon activity at downstream regions (8-934 bp). Overall, the phenotypic and genotypic data revealed very limited genotypic polymorphism between TMV 2 and TMV 2-NLM, despite significant phenotypic differences.
In order to map the genomic regions governing taxonomic and productivity traits, the mapping population of TMV 2 × TMV 2-NLM was employed. One hundred and five AhTE markers showing polymorphism between TMV 2 and TMV 2-NLM were used for genotyping the 432 RILs. A linkage map of 1,205.66 cM was constructed with 91 markers on 20 linkage groups (LGs) (S2 and S3 Tables). The length of LGs ranged from 4.59 cM (A04) to 164.12 cM (B09). The number of markers mapped on the linkage groups ranged from 2 (A02a, A04, A05a, A07 and B10) to 11 (A09 and B09). The overall inter-marker distance was 18.13 cM.
RILs were field-evaluated during the rainy seasons of 2014 and 2015. They differed significantly for most of the taxonomic, agronomic, productivity and nutritional traits (S4 Table). High PCV and GCV were observed for LLS score at 80 and 90 DAS, number of pods per plant, pod yield per plant, number of secondary branches and seed dormancy, for which the parents also differed significantly (S5 Table). However, the RILs also showed considerable variability for resistance to late leaf spot and rust, pod and seed features (pod constriction, pod reticulation and seed shape) and some fatty acids (behenic and eicosenoic acid), though the parents did not differ significantly.
Majority of the traits showed normal distribution as tested by skewness and kurtosis (S6 Table and Fig 3). Correlation analysis indicated a positive and significant association of days to 50% flowering with the number of primary and secondary branches (S7 Table). But days to 50% flowering recorded negative and significant correlation with productivity and oil quality traits. Test weight showed positive and significant association with pod yield and content of oleic acid palmitic acid and O/L ratio. Test weight was also positively and significantly correlated with pod width. It was also observed that the sequential type of branching habit resulted in higher pod yield.
Single marker analysis (SMA) was performed to find any association between the AhTE markers and the traits. A total of 41 and 43 traits in 2014 and 2015, respectively were subjected for SMA (S8 and S9 Tables). Three markers (AhTE0357, AhTE0391 and AhTE0523) showed significant association and high R 2 with one or more traits during 2014 and/or 2015 (Table 3). AhTE0357 showed the highest R 2 (32.8%) for the branching pattern (sequential/alternate). AhTE0357 also showed strong association with days to 50% flowering and number of primary and secondary branches. AhTE0391 showed strong association with the contents of protein, oil, oleic acid, linoleic acid and palmitic acid. Apart from these three markers, AhTE0025 showed high R 2 with test weight and pod width.
Analysis of the genomic position of these four markers showed that AhTE0357, AhTE0391 and AhTE0025 were genic. The transposition site of AhMITE1 at AhTE0357 locus was located at 79 bp downstream of the gene Araip.TG1BL. The marker locus corresponding to AhTE0391 coincided with the gene Aradu.7N61X. The AhMITE1 transposition site was present at 2,129 bp of the second intronic region of Aradu.7N61X (Fig 4). At AhTE0025 locus, AhMITE1 was inserted in the intronic region of Aradu.7065G. But, the marker AhTE0523 showed several significant hits upon BLAST, therefore the exact position could not be decided. Function prediction revealed that Aradu.7N61X codes for alpha-glucosidase, while Aradu.7065G and Araip. TG1BL code for aldo/keto reductase family oxidoreductase and unknown protein (galactose oxidase/kelch, beta-propeller), respectively. QTL analysis was attempted using the linkage map and the phenotypic data for 31 traits in 2014, and 33 traits in 2015 (S10 and S11 Tables). The QTL map showed seven major (PVE more than 10%) QTL regions for 22 traits across the years (Table 4 and Fig 5). The number of traits governed by these QTL ranged from one (AhTE0074-AhTE0200 and AhTE0005-AhTE0148) to nine (AhTE0391-AhTE0572).
QTL flanked by AhTE0357-AhTE0050 showed high PVE for days to 50% flowering, number of primary and secondary branches (Table 4). QTL between AhTE0003-AhTE0332 showed PVE of 26.0% and 26.4% for protein content during 2014 and 2015, respectively. QTL flanked by AhTE0025-AhTE0296 showed a PVE of 13.6% and 11.6% for test weight and pod width, respectively during 2015. Similarly, QTL at AhTE0391-AhTE0572 governed oil content, linoleic acid, oleic acid palmitic acid and O/L ratio. An effort was made to find out the gene content in the QTL region AhTE0391-AhTE0572. A total of 256 genes were predicted in the region homeologous to AhTE0391-AhTE0572 on B09 chromosome. Identifying and dissecting these genes for other QTL regions to find out the candidate gene(s) requires additional experimental evidence.
TMV 2 contributed the favourable allele at AhTE0003-AhTE0332 for protein content. The allele contributed by TMV 2 at AhTE0357-AhTE0050 resulted in reduced days to 50% flowering and number of primary branches. TMV 2-NLM contributed favourable allele at AhTE0391-AhTE0572 for oleic acid content. For test weight, TMV 2-NLM contributed the favourable allele at AhTE0025-AhTE0296.
An attempt was made to select the superior RILs for variety development and commercial release. High heritability and genetic advance over mean (GAM) were observed for LLS score at 90 DAS, days to 50% flowering, test weight, pod width, oil and protein content, oleic acid, linoleic acid and SCMR. Fifteen RILs were marginally superior for pod yield/plant over the best parent, TMV 2 (Table 5). However, only one RIL [2-19(f)] was significantly superior over TMV 2 for pod yield. The RIL 2-45(a) was superior over TMV 2 for pod yield, days to 50% flowering (early), test weight, sound mature kernel weight, protein content, oil content, oleic acid, O/L ratio and LLS and rust resistance. Another RIL 1-11(b) was superior over TMV 2 for pod yield, seed dormancy (moderate), days to 50% flowering (early), test weight, sound mature kernel weight, oleic acid, and O/L ratio. RILs 2-14(a)(ii), 2-29(g), 2-33(a) and 2-23(c) were superior over TMV 2 for pod yield, oil content, oleic acid, test weight, sound mature kernel weight and days to 50% flowering (early). Of these 15 RILs, four foliar disease resistant lines

Discussion
TMV 2, a popular and elite variety of peanut, and its EMS-derived mutant TMV 2-NLM were used to identify the genomic regions contributing to the important taxonomic and productivity traits in peanut by QTL mapping using a large number (432) of their recombinant inbred lines (RILs). TMV 2 and TMV 2-NLM differed significantly for the growth habit, number of branches (primary and secondary), productivity traits and quantity and quality of the oil. The RIL population showed considerably high variability not only for those traits for which the parents differed, but also for a few other traits (resistance to LLS and rust, pod and kernel features), thus allowing QTL detection for several traits. RAD-Sequencing, which is a reduced representation of genome sequencing method evaluating~1% of the genome, revealed genetic differences in terms of SNPs and copy number variations. The density of SNP was about 1 per Mb, which was comparable to the density found in tomato mutants [11]. Of the total 31 SNPs, only four were non-synonymous, indicating very few functional differences between TMV 2 and TMV 2-NLM. Likewise, significant copy number variations were only five within 1% of the genome. Though SNPs and CNVs can have significant effect on phenotypes through expression variation [27-29], TMV 2 and TMV 2-NLM did not differ significantly since they showed very few differences at DNA sequence. TMV 2 and TMV 2-NLM also showed the differential transpositional activity of AhMITE1 at 105 loci out of 369 loci tested in this study. Further, it was found that AhMITE1 activity was restricted to exons of only six genes. Thus, TMV 2 and TMV 2-NLM could constitute ideal parents of a RIL population which can be employed for mapping the traits by reducing the background noise.
Using 91 AhTE markers, a partial linkage map of 1,205.66 cM was constructed in a new RIL mapping population (TMV 2 × TMV 2-NLM) in peanut. The marker order was grossly comparable to the maps for SKF2 (Satonoka × Kintoki) and NYF2 (YI-0311 × Nakateyutaka),  which were also constructed with TE markers [12]. The markers mapped on A01, A02, A04, A05, B01, B03, B04 and B07 could be confirmed with BLAST. But other markers could not be confirmed since they differed partly (differing for homeologous chromosomes) or completely. Further, a QTL map was constructed using this linkage map along with the phenotypic data on taxonomic and productivity traits over two seasons (rainy seasons of 2014 and 2015 The markers flanking the QTL were also found to be strongly associated with the same traits when checked with single marker analysis. The marker AhTE0357 with AhMITE1 insertion polymorphism at 79 bp downstream of the gene Araip.TG1BL on chromosome B02, could differentiate the genotypes with sequential and alternate branching pattern to an extent of 32.8%. AhTE0357 was also associated with days to 50% flowering and number of primary and secondary branches. But selection for early maturing genotypes using AhTE0357 was not expected to bring any yield advantage since days to 50% flowering recorded negative and significant correlation with productivity and oil quality traits. AhTE0391, apart from being associated with the contents of oil, oleic acid, linoleic acid and palmitic acid, also showed association with protein content. The marker locus corresponding to AhTE0391 coincided with the gene Aradu.7N61X, which codes for alpha-glucosidase. In germinating grains of barley, HvAGL97 codes for α-glucosidase, which is a major endosperm enzyme. It catalyzes conversion of maltose to glucose, but is not required for starch degradation [32,33]. Test weight and pod width were positively and significantly correlated, and both showed strong association with AhTE0025, which corresponded to the gene Aradu.7065G on A07 chromosome coding for aldo/keto reductase family oxidoreductase. In Digitalis purpurea, aldo/keto reductase is known take part in the biosynthesis of cardiac glycosides [34]. In the previous studies, QTL regions for test weight were mapped at a distance of 1. The genomic resources (QTL regions/genes/markers) developed in this study help in generating the genetic resources to breed for improved peanut varieties. But the RILs used in this study directly provide an opportunity to select the superior lines for productivity traits. RILs 2-14(a)(ii), 2-29(g), 2-33(a) and 2-23(c) were superior since they recorded 7-15%, 35-50% and 1-4% increase over TMV 2 for pod yield, test weight and oil content, respectively. They also had 12-21% O/L higher oleate than TMV 2. They matured one to two days earlier than TMV 2. These RILs are being multiplied and evaluated in detail for variety development and commercial release.
The linkage map of TMV 2 × TMV 2-NLM, QTL regions and markers identified in this study for taxonomic, agronomic and productivity traits will be of immense use for fine-mapping and identifying the candidate genes. The loci of TE markers both at genic and non-genic regions are known to influence the gene expression [37], thereby favoring gene discovery for the taxonomic and productivity traits for use in peanut improvement. Further, the superior RILs identified in this study form the novel genotypes, which would be useful in peanut breeding for productivity and nutritional traits.

Conclusions
QTL mapping using 432 recombinant inbred lines derived from TMV 2 and its mutant, TMV 2-NLM could identify the major QTL for days to 50% flowering, number of primary and secondary branches, test weight, pod width, protein and oil content, oleic acid, linoleic acid, palmitic acid, stearic acid, and behenic acid. Based on the three markers (AhTE0357, AhTE0391, AhTE0025) detecting AhMITE1 insertion polymorphism, three genes Araip. TG1BL (B02 chromosome), Aradu.7N61X (A09 chromosome) and Aradu.7065G (A07 chromosome), respectively were identified for their association with taxonomic, productivity and quality traits.  Table. Single marker analysis for the disease resistance, productivity, agronomic, taxonomic, physiological and nutritional traits in rainy season-2015.