Construction of a high-density genetic map and QTL analysis for yield, yield components and agronomic traits in chickpea (Cicer arietinum L.)

Unravelling the genetic architecture underlying yield components and agronomic traits is important for enhancing crop productivity. Here, a recombinant inbred line (RIL) population, developed from ICC 4958 and DCP 92–3 cross, was used for constructing linkage map and QTL mapping analysis. The RIL population was genotyped using a high-throughput Axiom®CicerSNP array, which enabled the development of a high-density genetic map consisting of 3,818 SNP markers and spanning a distance of 1064.14 cM. Analysis of phenotyping data for yield, yield components and agronomic traits measured across three years together with genetic mapping data led to the identification of 10 major-effect QTLs and six minor-effect QTLs explaining up to 59.70% phenotypic variance. The major-effect QTLs identified for 100-seed weight, and plant height possessed key genes, such as C3HC4 RING finger protein, pentatricopeptide repeat (PPR) protein, sugar transporter, leucine zipper protein and NADH dehydrogenase, amongst others. The gene ontology studies highlighted the role of these genes in regulating seed weight and plant height in crop plants. The identified genomic regions for yield, yield components, and agronomic traits, and the closely linked markers will help advance genetics research and breeding programs in chickpea.


Introduction
Chickpea (Cicer arietinum L.) is a major dietary grain legume cultivated widely in South Asia and the Middle East, with a genome size of~740 Mb [1]. Globally, about 14.25 million metric tonnes of chickpea is produced annually on an area of 13.72 million hectares, and India with approximately 70% of the global production is the largest chickpea producer [2]. Chickpea is highly valued for its intrinsic potential for symbiotic nitrogen fixation and its dietary proteins, vitamins and essential minerals [3]. Despite its capability to produce 3.5-4.0 tonnes per hectare under optimum growing conditions, the global average chickpea productivity is approximately 1 tonne per hectare [4].
To ensure global food security and fulfil nutrient deficiency for 2 billion people worldwide in the coming decades [5], it is essential to increase chickpea production by developing high yielding varieties with consumer preferred traits. Genetic enhancement in yield components of chickpea is required to address the issues of low productivity. To develop improved high yielding chickpea varieties, it is crucial to understand the genetic basis of yield, yield components and associated agronomic traits [6]. Several attempts have been made to understand the genetic mechanisms associated with yield and its component traits in chickpea, and multiple QTLs have been identified for these traits. For instance, QTLs associated with plant height [7], flowering time [8], seed size/100-seed weight [9,10], double poddedness [11], Fusarium wilt [12][13][14], Ascochyta blight [15,16], Helicoverpa armigera resistance component traits [17], plant vigour [18], drought tolerance [4,19], salinity tolerance [20-22] and heat tolerance [23] have been mapped in chickpea. Even though a large number of QTLs have been identified for multiple traits, very few have been deployed for marker-assisted selection (MAS) because QTLs for complex traits tend to have small effects and strong environmental influence. Further, they also need to be validated before being incorporated into MAS programs for chickpea genetic improvement.
Recent breakthroughs in genome sequencing technologies have reduced the cost of sequencing by several folds during the last decade [5]. This reduction in sequencing/genotyping cost has enabled the development of cost-effective low-to high-density genotyping platforms, resulting in the advancement of genomic resources for chickpea [24]. For instance, a high-throughput genotyping platform, 'Axiom 1 CicerSNP array', has facilitated high resolution genetic mapping by constructing high-density genetic maps to support genetics and breeding programs in chickpea [4,20]. Such high-throughput genotyping platforms are very useful for trait mapping through genome-wide association mapping as well as for fine mapping of QTLs/trait in several crop species [25,26].
The present study focused on constructing a high-density genetic map and identifying QTLs for yield, yield components and agronomic traits using a recombinant inbred line (RIL) population derived from ICC 4958 and DCP 92-3 cross. A dense genetic map developed in this study will help in the genetic dissection of other traits that differ between the two parental genotypes. Furthermore, potential genes underlying the marker intervals of the QTLs identified for 100-seed weight and plant height were retrieved. The QTLs and candidate genes detected in this study hold enormous potential to advance genetics research and breeding applications in chickpea.

Phenotypic variation in the ICC 4958 × DCP 92-3 RIL population
The parental lines and RILs were analysed for their phenotypic performance across three years. Descriptive statistics, along with analysis of variance for six traits evaluated in the RIL population (ICC 4958 × DCP 92-3) across three years, have been provided in S1 Table. The mean squares and genetic parameters of the RILs showed highly significant (p<0.01 or p<0.001) variation for all the studied traits across three years (S1 Table). The highest coefficient of variation (%CV) was recorded for the number of primary branches (PB), followed by the number of secondary branches (SB) and the least for plant height (PLHT) (S1 Table). The magnitude of phenotypic coefficient of variation (PCV) was found to be higher than their corresponding environmental coefficient of variation (ECV) for yield per plant (YPP), 100-seed weight (100SW), pods per plant (PPP), and PLHT. The frequency distribution of the phenotypic data for the traits under study across three years indicated a marked variability and quantitative nature of the traits (S1-S3 Figs). Furthermore, the phenotypic trait value of many RILs surpassed their parental trait value in both directions for all the traits, except 100-seed weight, indicating the presence of transgressive segregation in this mapping population (S1 Table,

Correlation among traits
To scrutinize the relationships between yield, yield components and agronomic traits evaluated across three years, a principal component analysis (PCA) was performed. For the experiment conducted during the Rabi season of 2015-16, the first two components explained~59% of the total variability (S4 Fig). Here, YPP was closely related to PPP and PLHT. Similarly, 100SW and PB were found to be closely associated with PLHT and SB, respectively. For 2016-17 and 2017-18, the first two components explained about 60% and 71% of the total variation, respectively (S4 Fig). For both years, plant height favoured an increase in 100SW, while SB was closely related to YPP. Variation in the number of PB was tightly correlated with PPP. Furthermore, Pearson correlation analysis was performed to identify trait correlations within and between experimental trials ( A combined correlation analysis for all traits across three years was also undertaken to detect trait correlations ( Table 1). The results indicated that, except for 100SW, all other traits displayed a moderate to low degree of correlation (ranging from r = ± 0.01 to r = ± 0.41) between years.

High-density genetic map
Out of the 50,590 SNPs available on the Axiom 1 CicerSNP array, a total of 17,173 SNPs were found polymorphic between two parents and displayed segregation within the mapping population. A genetic map was constructed using 3,818 SNPs that covered a distance of 1064.14 cM (Table 2), with a cumulative average distance of 0.30 cM between adjacent markers. The number of SNP markers mapped across eight linkage groups ranged from 126 (CaLG08) to 995 (CaLG04), spanning from 20.4 cM (CaLG08) to 230.4 cM (CaLG04) with a cumulative mean of 133.02 cM ( Table 2). A large variation in marker density was observed across the linkage groups. The highest marker density of 6.16 SNPs/cM was observed on CaLG08, while CaLG03 displayed the lowest marker density of 1.80 SNPs/cM (Table 2).

QTL mapping for yield, yield components and agronomic traits
A total of 16 QTLs were identified for all studied traits, except for SB. These include 9 QTLs for yield and yield component traits (2 QTLs for YPP, 6 QTLs for 100SW, and 1 QTL for PPP) and 7 QTLs for agronomic traits (5 QTLs for PLHT, and 2 QTLs for PB) across three years. QTLs that contributed phenotypic variation explained (PVE) � 10% were considered as major-effect QTLs, while QTLs with <10% PVE were considered as minor-effect QTLs. Based on these premises, a total of 10 major-effect QTLs were identified for five traits, viz., 2 QTLs for YPP, 3 QTLs for 100SW, 1 QTL for PPP, 3 QTLs for PLHT, and 1 QTL for PB (Fig 1). In addition, three minor-effect QTLs were identified for 100SW and three QTLs for agronomic traits (2 QTLs for PLHT, and 1 QTL for PB) across three years (Table 3).

PLOS ONE
Furthermore, if a QTL for a particular trait was identified for more than one year, it was considered a consistent QTL, as described previously [19]. Accordingly, one consistent QTL for PLHT (qPLHT4.1) was identified on CaLG04 (Table 3). When a particular marker was found to flank more than one QTL, that particular region was considered a single genomic region. The sequences and physical positions of the markers flanking the identified QTLs are provided in the S2 Table. QTLs for yield and yield component traits. Yield per plant (YPP). Two major QTLs, one each on CaLG04 (qYPP4.1) and CaLG01 (qYPP1.1), explaining 10.1% PVE (2015-16) and 36.2% (2016-17), respectively, were identified (Table 3).

Analysis of epistatic interactions
The combinations of two-and three-loci interactions were scrutinized for all the studied traits across three years. Epistatic QTLs (E-QTLs) were identified only for 100SW evaluated during the years 2015-16 (S3 Table), and 2016-17 (S4 Table).

Candidate genes underlying the QTLs detected for 100-seed weight and plant height
The genes underlying the marker intervals of 6 QTLs for 100SW and 4 unique QTLs for PLHT were retrieved, and a total of 1,476 genes were detected (S5 and S6 Tables). The 6 QTLs detected for 100SW were found to harbour 417 genes (S5 Table). Based on the gene ontology (GO) annotation, out of 417 genes, 11 key genes were prioritized based on their role in controlling seed parameters in chickpea and other crops. A total of 1059 genes were found underlying the four QTL regions detected for PLHT. Out of these 1059 genes, 13 putative genes (two on CaLG01, five on CaLG04 and six on CaLG05) were prioritized based on their role in regulating plant height in crop plants (S6 Table). These included type-II homeodomain-leucine zipper protein (Ca_21404), and basic-leucine zipper transcription factor I (Ca_19454) on CaLG01; ethylene-responsive transcription factor ERF110-like isoform (Ca_04370), protein IQ-DOMAIN 14-like (Ca_04493), ethylene responsive transcription factor ERF027-like (Ca_04503), cytochrome P450 704C1-like isoform (Ca_04534), and C2H2-like zinc finger protein (Ca_04451) on CaLG04; probable basic-leucine transcription factor I (Ca_18221), NADH ubiquinone oxidoreductase (Ca_20087), NADH dehydrogenase [ubiquinone] 1-alpha subcomplex assembly factor 2 (Ca_20086), cytochrome c biogenesis C (Ca_20073), cytochrome c biogenesis C (Ca_20070), and NAD(P)H-quinone oxidoreductase subunit N (Ca_08896) on CaLG05.  [4,26,33,34]. Detection of polymorphism in chickpea has been a major constraint due to the limited genetic diversity [15,35]. Such low levels of genetic diversity in the cultivated gene pool entailed the need for targeting the inter-specific polymorphisms between wild and cultivated chickpea accessions [36,37]. Recent advances in genome-based capabilities have enabled the development of high-throughput approaches for genotyping, allowing the detection of desirable alleles and multiple QTLs having the potential to affect desired responses. Therefore, the current study aimed to utilize the high-density genetic map of ICC 4958 × DCP 92-3 RIL population for detecting main-effect QTLs for yield, yield components, and agronomic traits; and aimed to identify potential candidate genes for 100SW and PLHT. Detailed analysis of phenotyping data revealed a substantial genotypic variation among the RILs for all studied traits across three years. In the present study, the extent of genotypic variances was more than their corresponding environmental variances for PLHT, PPP, 100SW and YPP, indicating a greater contribution of the genotypic component to the total variation in these traits [19, [38][39][40]. All traits, except 100SW, displayed transgressive segregation in both directions, suggesting that both parental lines contributed favorable alleles for these traits. For 100SW, transgressive segregation, mostly in a negative direction, was observed for multiple RILs across all three years, which might be because of unwanted linkages between desirable and undesirable alleles contributed by parental lines. Combined correlation analysis results displayed a high degree of correlation for 100SW between years, which is predicted to be due to the high heritability of this trait observed in chickpea [19]. Furthermore, PLHT displayed a moderate correlation between 2015-16 and 2016-17, and between 2016-17 and 2017-18. However, all remaining traits displayed a low degree of correlation between years, which may be due to the significant influence of the genotypes' environment. Significant genotypic and environmental differences and a low correlation observed for traits between years might be the reasons for the absence of any consistent QTL detected for the evaluated traits.
In the present study, the genotyping of RILs using a high-throughput Axiom 1 CicerSNP array facilitated the construction of a dense genetic map. The genetic map comprising of 3,818 SNPs, with an average inter-marker distance of 0.30 cM and an average density of 3.91 SNPs/ cM, is one of the most saturated maps developed for chickpea, which is superior to some of the previously reported inter-specific maps [9,36,41,42]. Furthermore, a total of 16 QTLs were mapped for five traits, with PVE ranging from 6.5% to 59.70%. Here, two major-effect QTLs (qYPP4.1 and qYPP1.1) for yield per plant were identified for 2015-16 and 2016-17, with PVE ranging from 10.10 to 36.20%. In contrast to earlier studies [19, 43,44], the major-effect QTL qYPP1.1 explained the highest phenotypic variance of 36.20% and may hold potential for deployment in chickpea breeding efforts by marker-assisted breeding approach. Despite high PVE values obtained for yield per plant and the number of primary branches QTLs, only a marginal additive effect was observed for these traits across different years. This suggested the absence of any significant difference in trait values between parents. The high phenotypic variation explained by QTLs for YPP and PB may be due to the sample size of the mapping population (N = 161), density of markers used in the linkage map, and QTL mapping software. Recent QTL mapping studies suggest epistasis to be a crucial genetic component underlying complex quantitative traits such as 100-seed weight, plant height, yield and components [19]. Importantly, the use of epistatic effects in marker-assisted selection holds the potential to achieve a higher genetic gain in breeding programs. In the present study, two-loci and threeloci epistatic QTL interactions were identified for 100-seed weight across two years. Inclusion of these epistatic effects in sophisticated biological models will provide an opportunity to optimize long-term selection response and a comprehensive understanding of the genetic base underlying improvement of 100-seed weight. Moreover, deployment of the identified epistatic QTLs will enable marker-assisted selection to bear a longer persistence response and may lead to a considerable increase in genetic gain.
Yield trait is complex and governed by several components such as pod weight, haulm weight, harvest index, seed weight, pod to flower ratio, etc., and QTLs associated with these traits are favourable targets for selection. Furthermore, plant height represents a crucial factor for machine harvest because the losses incurred during machine harvest is more for semi-erect genotypes (about 20%) and less in tall and erect genotypes (2.6-5.0%) [45]. Given the shortage in the workforce and to address drudgery among farmers, there is a huge demand for tall and erect varieties suitable for machine harvesting. A total of 6 QTLs for 100SW were identified across three years, including two QTLs each on CaLG06 and CaLG07, and one QTL each on CaLG03 and CaLG04, explaining phenotypic variation ranging from 6.50% to 16.60%. Many studies have reported QTLs for 100SW in different genetic backgrounds of chickpea in the last few years [10,[46][47][48]. For example, the QTLs for 100SW were identified on CaLG01 and CaLG04 and accounted for 37% of the phenotypic variance across two environments [10]. QTLs for seed weight have also been reported on CaLG02 and CaLG05 in different genetic backgrounds [9,43,[48][49][50][51][52]. Many QTLs identified in the present study were found in close proximity or overlapped earlier reported QTLs. For instance, q100SW4.1 on CaLG04 was detected in close proximity of two QTLs consistent across years and locations reported earlier [19, 53], but in a different mapping population. However, higher PVE and LOD scores of the QTLs identified in the same region for previous studies compared to the present study could be due to the difference in the RIL population used and genotype × environmental interaction effects. The other two major-effect QTLs, including q100SW6.1 and q100SW6.2 on CaLG06, reported in this study are novel QTLs. Similarly, for plant height, three major-effect and two minor-effect QTLs with PVE ranging from 8.1% to 18.5% were identified. When flanking markers of the QTLs identified for plant height in the present study were compared with the markers for plant height QTLs identified in previous studies [7,19,40,54,55], it was observed that all the QTLs identified in the present study were novel and did not exhibit any similarity with previously reported QTLs. Hence, novelty and population-specific characteristics of the identified QTLs governing plant height was observed in the present study.
Identifying candidate genes for a particular trait is the first important step to understand the molecular mechanisms underlying the trait of interest. Integration of genomics-based knowledge with conventional breeding efforts can decipher molecular mechanisms underlying traits of interest. With the availability of the chickpea genome sequence [1], it is now possible to identify genes governing traits like seed weight and plant height [26]. Genomic regions harboring QTLs for 100SW and PLHT were selected to identify putative candidate genes controlling these traits. Based on these premises, a total of 417 genes were found underlying 6 QTLs detected for the 100SW, and several genes within this set were shown to play a significant role in seed development in previous studies. For instance, a gene encoding C3HC4-type RING finger protein (Ca_19553) was shown to control plant growth and fruit development in Nicotiana benthamiana [56]. Genes including serine kinase (Ca_04479) and pentatricopeptide repeat containing proteins (PPR) (Ca_06269, Ca_04422, Ca_04472) have been shown to play a major role in seed development in different crops [57][58][59]. A sugar transporter protein encoding gene (Ca_23961) was also displayed to play a key role in the accumulation of seed reserve and transport in wheat [60], Arabidopsis [61], and fava bean [62]. Furthermore, tubbylike F-box protein 8 (Ca_04384), transcription factor bHLH-118 (Ca_04526), RING-H2 finger protein (Ca_19553), and ethylene-responsive factor ERF (Ca_04503) displayed high expression in seed tissue, thereby predicting their role in regulating seed development in Arabidopsis [63] and chickpea [46].
We also predicted the possible candidate genes underlying four unique QTLs for PLHT. The potential gene underlying the qPLHT1.1 genomic region includes a leucine zipper protein (Ca_21404). Moreover, the most promising genes underlying qPLHT5.1 genomic region on CaLG05, include NADH dehydrogenase (Ca_20086), and cytochrome-c biogenesis (Ca_20070, Ca_20073). Integrated genomic approaches elucidated the role of these genes in regulating plant height. These genes are known to play a key role in the tricarboxylic acid (TCA) cycle and electron transport chain (ETC) for regulating respiration and mitochondrial organization in crop plants [64][65][66][67]. Another gene family underlying the qPLHT5.1 genomic region on CaLG05 includes the bZIP transcription factor genes, which regulate plant morphology using gibberellins (GAs) and GA homeostasis. It was shown that down-regulation of genes involved in GA biosynthesis inhibits cell elongation and growth of stem internodes and results in dwarf phenotype in monocot and dicot crop plants [68]. GA response modulators such as dwarf 8 (d8), semi-dwarf (Sd1), reduced height (Rht) and gibberellin insensitive (GAI) that regulate plant height have been identified and validated in several crops, including wheat, maize, rice and tobacco [68][69][70][71]. The interaction and similar expression characteristics of these genes in a regulatory pathway underlying both mitochondrial respiration and GA biosynthesis are necessary for maintaining the growth and development of organs, including plant height [72,73].

Development of RIL population
A RIL population was developed by crossing ICC 4958 (large seeded and drought tolerant donor parent) with DCP 92-3 (small seeded and drought susceptible) chickpea genotype at ICAR-IIPR. The segregants displayed variations in traits such as yield, 100-seed weight and plant height. This mapping population containing 166 RILs (F 8 ) was developed at ICAR-Indian Institute of Pulses Research (IIPR), Kanpur, India. The population was advanced by the single seed descent (SSD) method to develop recombinant inbred lines.  3 m), and seed to seed distance of 10 cm was maintained in a row. The field was prepared for sowing by applying diammonium phosphate (18% N and 46% P 2 O 5 ). During the crop season, rainfall ranged from 196 to 230 mm, and surface irrigation was applied during the vegetative stage as and when required. The experiment was conducted in augmented design. The traits that were evaluated during the experiment include yield per plant (YPP), 100-seed weight (100SW), pods per plant (PPP), plant height (PLHT), number of primary branches (PB), and number of secondary branches (SB). Three plants were randomly selected from each plot, and the phenotypic data for the traits mentioned above was recorded. The mean value of the data recorded on three plants was computed and used for further analysis.

Genotyping of RILs
Genomic DNA was extracted from parents and RILs using a modified CTAB method as described previously [74]. In brief, young leaves of 20-25 days old plants were used for DNA isolation, and DNA quality was tested and quantified using NanoDrop 8000 spectrophotometer (Thermo Scientific, USA). DNA concentration was normalized to a minimum of 40 ng/μL. Based on the presence of high-quality total genomic DNA, 161 RILs were genotyped using the Axiom 1 CicerSNP array containing 50,590 SNPs distributed across eight linkage groups of chickpea as described earlier [4].

Genetic map construction and QTL analysis
The genotyping data obtained from the Axiom 1 CicerSNP array was utilised for the construction of a genetic map. SNPs indicating the presence of contrasting alleles between two parents were targeted. For the 17,173 polymorphic SNPs thus obtained, a chi-square test was carried out with a null hypothesis that two alleles at a given locus segregate in a 1:1 ratio for a RIL population. SNP markers that showed substantial deviation from the 1:1 ratio, and high missing data were not used for further analyses. A high-density genetic map for ICC 4958 × DCP 92-3 RIL population was constructed using JoinMap v4.1 [75]. The logarithm of odds (LOD) score for the test of linkages between marker pairs was set to 3.0, and the markers that were attributed to a linkage group at a LOD grouping threshold of 3.0 were utilized. The maximum-likelihood mapping algorithm and Kosambi mapping function were used for constructing the genetic map. In order to map QTLs associated with yield, yield components and agronomic traits, the field phenotyping data for 161 RILs collected across three years was used. QTL mapping was performed using Windows QTL Cartographer software version 2.5. Here, composite interval mapping using a genome-wide LOD threshold of 3.0 was performed at p<0.05 significance, as described previously [47, 76,77]. This led to the mapping of main-effect and minoreffect QTLs associated with the traits evaluated in this mapping population.

Mining of candidate genes
To retrieve candidate genes underlying the QTL intervals of 100-seed weight and plant height, the QTL flanking markers were selected and used for similarity search against chickpea reference genome assembly (CaGAv1.0) [1] using blastn (with the parameter "-task blastn-short"). The flanking markers were then used to identify genes (between left and right markers) using bedtools (v2.17.0) against the corresponding chickpea genome, and 1,476 genes in the QTL regions were retrieved. Furthermore, the identified genes were functionally annotated using blastx (E-value cutoff 1E-05) against the NCBI nr database, followed by GO annotation using Blast2GO (v5.2).

Statistical analyses
Phenotyping data measured across three years were analysed individually to estimate the best linear unbiased predictors (BLUPs) for each trait using the REML option in the PROC MIXED procedure of SAS v9.0 (SAS Institute Inc., NC, USA). The performance of a genotype for augmented randomized complete block design was modelled as: where Y ij is the phenotype of the i th genotype in the j th block, μ is the overall mean, ß i is the block effect which was considered as random, c j is the effect of the check in j th block which was considered as fixed, α i is the random effect of the i th genotype, and ε ij is the residual considered as a random effect. The phenotyping data collected across three years were subjected to ANOVA for RCBD using the 'augmentedRCBD' package in R [78]. The phenotypic variation observed for each of the six traits was evaluated using the formulae described previously [79,80]: where � X indicates the grand mean for each trait. A Pearson correlation analysis and principal component analysis was computed using a custom R script. The frequency distribution of the yield, yield components, and agronomic traits within the RIL mapping population was analysed and plotted with the 'UsingR' package in R. The analysis of epistatic interactions between the interacting QTLs was conducted using Genotype Matrix Mapping (GMM) software (version 2.1; http://www.kazusa.or.jp/GMM) [81]. For estimating the combinations of two-loci and three-loci interactions using the GMM algorithm, the maximum length of locus combinations was set to 2 and 3, respectively. Moreover, the minimum number of corresponding samples was set to 1, and the default option of 'automatic' was used for defining the search range (d). Here, the syntax 'AA' refers to ICC 4958 alleles, while the syntax 'BB' refers to DCP 92-3 alleles.

Conclusion
In the present study, analysis of the genotyping data generated on ICC 4958 × DCP 92-3 RIL mapping population using Axiom 1 CicerSNP genotyping array facilitated the construction of a high-density genetic linkage map. Analysis of the phenotyping data for six traits evaluated across three years along with the genotyping data led to the identification of 16 major-and minor-effect QTLs for five traits, explaining up to 59.7% PVE. Genes underlying the identified QTL regions for 100SW and PLHT were reported to play a key role in regulating seed attributes, plant height, and plant growth and development. However, further fine-mapping and experimental validation of these genes is needed to precisely determine the candidate gene(s) underlying the QTLs identified. Nonetheless, the high-density linkage map, major-effect QTLs and genomic regions identified in this study hold huge potential to be deployed in chickpea improvement programs by the marker-assisted breeding approach to develop high yielding chickpea varieties.
Supporting information S1