Genetic Dissection of Drought and Heat Tolerance in Chickpea through Genome-Wide and Candidate Gene-Based Association Mapping Approaches

To understand the genetic basis of tolerance to drought and heat stresses in chickpea, a comprehensive association mapping approach has been undertaken. Phenotypic data were generated on the reference set (300 accessions, including 211 mini-core collection accessions) for drought tolerance related root traits, heat tolerance, yield and yield component traits from 1–7 seasons and 1–3 locations in India (Patancheru, Kanpur, Bangalore) and three locations in Africa (Nairobi, Egerton in Kenya and Debre Zeit in Ethiopia). Diversity Array Technology (DArT) markers equally distributed across chickpea genome were used to determine population structure and three sub-populations were identified using admixture model in STRUCTURE. The pairwise linkage disequilibrium (LD) estimated using the squared-allele frequency correlations (r2; when r2<0.20) was found to decay rapidly with the genetic distance of 5 cM. For establishing marker-trait associations (MTAs), both genome-wide and candidate gene-sequencing based association mapping approaches were conducted using 1,872 markers (1,072 DArTs, 651 single nucleotide polymorphisms [SNPs], 113 gene-based SNPs and 36 simple sequence repeats [SSRs]) and phenotyping data mentioned above employing mixed linear model (MLM) analysis with optimum compression with P3D method and kinship matrix. As a result, 312 significant MTAs were identified and a maximum number of MTAs (70) was identified for 100-seed weight. A total of 18 SNPs from 5 genes (ERECTA, 11 SNPs; ASR, 4 SNPs; DREB, 1 SNP; CAP2 promoter, 1 SNP and AMDH, 1SNP) were significantly associated with different traits. This study provides significant MTAs for drought and heat tolerance in chickpea that can be used, after validation, in molecular breeding for developing superior varieties with enhanced drought and heat tolerance.


Introduction
Chickpea (Cicer arietinum L) is the second most important grain legume cultivated mostly on residual soil moisture in the arid and semi-arid regions of the world. It is a diploid member of family Leguminosae with basic chromosome number eight and genome size 738 Mb [1]. Globally it is cultivated on over 13.2 Mha with an annual production of 11.6 million metric tons [2]. Chickpea is a rich source of proteins, essential amino acids and vitamins such as riboflavin, niacin, thiamin, folate and the vitamin A precursor bcarotene [3]. Based on seed size and color chickpea is grouped into two market classes namely desi and kabuli. Globally about 80% of total production is contributed by desi cultivars. Further, among chickpea growing countries India alone contributes to 70% of the world's total production [2]. Other major chickpea producing countries include Pakistan, Turkey, Australia, Myanmar, Ethiopia, Iran, Mexico, Canada and the USA. Although the chickpea production potential is high, it is not fully realized owing to several abiotic and biotic stresses. Among abiotic stresses that affect the chickpea production, drought and heat are considered as major constraints. Annually, 40-50% reduction in yield has been reported worldwide as a result of terminal drought in chickpea [4]. Further, the damage due to drought is compounded by heat stress in the warmer Mediterranean regions and regions like South Asia where temperature increases towards flowering [5] and it is difficult to differentiate between the damage caused by the individual stresses. Nevertheless as a result of drought stress, the growing season may be shortened affecting yield components, i.e., total biomass, pod number, seed number, seed weight and quality and yield plant 21 [6]. Flowering and seed set are the most critical growth stages affected by drought in chickpea. Drastic reductions in chickpea seed yields were observed when plants were exposed to high (35uC) temperatures at flowering and pod development stages [7]. Heat stress also adversely affects pollen viability, fertilization and seed development leading to a reduced harvest index. The identification of genomic regions associated with the drought and heat tolerance would enable breeders to develop improved cultivars with increased drought and heat tolerance using molecular breeding.
Large scale genomic resources are essential for understanding the genetics of complex abiotic stresses like drought, heat tolerance etc. In the case of chickpea, during last five years .3,000 microsatellites or simple sequence repeats (SSRs) [8][9][10], Diversity Array Technology (DArT) arrays [10] and single nucleotide polymorphism (SNP) [11] markers were developed. Further these marker resources were used for linkage map construction [9,10] as well as trait mapping. Majority of trait mapping studies were focused on biotic stresses [12]. Recently sequencing of desi and kabuli chickpea genomes has been completed [1,13] and a genomewide physical map (http://probes.pw.usda.gov:8080/chickpea/) was developed [14]. Despite the availability of large scale genomic resources, most of the studies to understand the genetics of complex traits were limited to quantitative trait loci (QTL) mapping studies. Moreover, the family based QTL studies were mostly limited to biotic stresses like Fusarium wilt, Ascochyta blight and Botrytis gray mold [12]. Very few studies were conducted to understand the genetics of drought tolerance [15] and salt tolerance [16] in chickpea.
Despite the continuous efforts to enhance the productivity of chickpea, climate changes during past two decades had tremendous influence on the production and productivity [17]. Global warming, coupled with increased temperatures in arid and semiarid regions has necessitated development of crop varieties that can sustain and yield high in harsh climatic conditions by virtue of being resilient to warmer temperatures. Further, the quantitative inheritance of drought and heat; their interaction with environment have been posing challenges to our understanding of the genetic basis of these traits. Although conventional breeding has substantial impact in marginal chickpea growing environments [18], future genetic gains will require a more systematic use of physiological and genetic approaches, facilitated by the rapid increase in genome knowledge and understanding. Thus the knowledge generated through advances in genomics during past two decades, have enormous potential in enhancing the tolerance to these stresses.
During last two decades, molecular markers have provided greater insights into complex traits in several crop species and the research endeavors in crop improvement shifted from quantitative to molecular genetics with emphasis on QTL identification and adoption of marker-assisted selection (MAS). However, only modest results have been witnessed due to several factors including absence of tight linkage between makers and QTLs, nonavailability of mapping populations, and substantial time needed to develop such populations. Further, the QTL mapping approaches cannot make use of the huge variation present across the germplasm available in the genebanks. In addition, the resolution achieved through linkage mapping based on bi-parental mapping population is low compared to population linkage disequilibrium (LD) based association mapping. The genomewide association study (GWAS) makes it possible to simultaneously screen a very large number of accessions for genetic variation underlying diverse complex traits. In fact, the association studies in other crop species especially in cereals [19][20][21] have revealed that the linkage based QTL analyses can be complemented by LD based association studies. Association mapping studies in legumes are limited to soybean [22], Medicago [23] and common bean [24]. Most of these association studies either have deployed GWAS or candidate gene-sequencing approach. In some recent studies, however, the combined approach of GWAS and candidate-gene sequencing has been shown as more powerful approach than the individual approach [25]. However, to date there is no report on association studies in the case of chickpea. Recently a diverse set of 300 accessions, called as 'reference set' [26] that included 'mini core collection' [27] has been used to analyze sequence diversity for 10 drought responsive genes [28].
In the present study, both genome-wide and candidate genesequencing based association mapping approaches were employed to understand genetics of the two most important complex abiotic stresses ''drought'' and ''heat'' and establish significant markertrait associations (MTAs). For this the chickpea reference set/minicore collection was genotyped for 1,813 marker loci (SSRs, DArTs and SNPs) and phenotyped at three locations in Africa (Nairobi and Egerton in Kenya; and Debre Zeit in Ethiopia) and three locations in India (Patancheru, Kanpur and Bangalore). Besides significant MTAs, the present study also provides an in-depth understanding of the genetic diversity, population structure in the reference set and linkage disequilibrium (LD) in genome of chickpea.

Genome-wide marker profiling
We report the first comprehensive characterization of the chickpea reference set using 16,046 markers (35 SSRs,15,360 DArT features and 651 SNPs) in the present study. A total of 917 alleles were detected by 35 SSR markers with an average of 26.2 alleles/marker locus and the polymorphism information content (PIC) values ranged from 0.48-0.96 (Table S4). Further, major allele frequency and the overall mean heterozygosity was 0.21 and 0.04 respectively. The DArT markers developed by Thudi et al. [10] were used for genotyping the reference set, as a result 1,156 DArT loci were found polymorphic with PIC values in the range of 0.01-0.38 (Table S5). Among 651 SNPs genotyped using KASPar assays, 381 SNPs were polymorphic and the PIC values ranged from 0.007-0.375 (Table S6).
Allele mining in candidate drought responsive genes A total of 10 candidate drought responsive genes (Table S7) were amplified on the reference set/mini-core collection. Of these, 5 genes (Abscisic acid stress and ripening, ASR; CAP2 gene; ERECTA; sucrose synthase, SUSY; Sucrose phosphate synthase, SPS) were amplified on the reference set. The number of genotypes for which good quality sequences were obtained varied from 79 (ERECTA fragment obtained from 7f-5r primer pairs) to 236 genotypes (SPS gene) out of 300 genotypes attempted [28]. The longest sequence obtained was for SUSY gene which was about 1,600 bp and the shortest sequence was obtained for SPS gene with 312 bp. Varying number of SNPs were identified in each gene using DIVEST tool. SNPs were inspected manually for possible sequencing errors and SNPs having clear peaks were considered as true SNPs. For instance, highest number of SNPs (34) was obtained for ASR gene while no SNPs were found in the case of CAP2 gene. In total, 33 SNPs were identified in case of ERECTA gene, of which 13 SNPs (9 transitions and 4 transversions) were in ERECTA (7f-5r) gene fragment and 20 SNPs (10 transitions and 10 transversions) in case of ERECTA (8f-8r) gene fragment. Only 1 indel was observed in case of ERECTA (7f-5r) gene fragment. One indel and 3 SNPs were observed in case of SPS gene sequence (Table S7). In addition to above mentioned 5 drought responsive genes, 5 abiotic stress responsive genes (AKIN-SNF1 related protein kinase, AKIN; Aminoaldehyde dehydrogenase gene, AMADH; Dehydrin, DHN; Dehydration responsive element binding protein, DREB and Myb transcription factor, Myb) were used for allele mining across mini-core collection ( Table  S7). Out of 211 genotypes screened, number of genotypes yielding high quality sequence varied from 191 (DREB) to 209 (AMADH). Highest number of SNPs (14) was obtained for DREB gene (8 transitions and 6 transversions). Apart from SNPs, 23 indels were also detected. AKIN gene was found to be most conserved with just 2 SNPs (transitions) and 2 indels. A total of 13 SNPs were identified (6 transitions and 7 transversions) in case of AMADH with 3 indels among 209 high quality sequences, while in case of DHN gene 7 SNPs (5 transitions and 2 transversions) were identified among 198 sequence analyzed. For Myb gene only 6 SNPs (1 transition and 5 transversions) were reported with 2 indels across Myb sequences. Average PIC value of SNPs ranged from 0 (CAP2 gene) to 0.43 (CAP2 promoter) (Table S7).

Population structure and genetic relationships
In order to assess the population structure and number of subpopulations, 85 evenly distributed DArT loci on chickpea genome [10] were used on the reference set employing STRUCTURE 2.3.4 [29]. Based on the maximum likelihood and delta K (DK) values, three sub-populations (Group I, Group II and Group III) were determined in the reference set (Fig 1). However, different K values are possible ( Fig S1); nevertheless, these do not qualitatively affect the results. As the K value is increasing the allelic admixture among the sub-populations is more clearly demarcated. Using a membership probability threshold of 0.60, 109 accessions were assigned to sub-population 1; 154 accessions to sub-population 2; 26 accessions to sub-population 3 and 11 accessions were retained in the admixed group (Table S1).
To understand the genetic diversity in the reference set neighbor joining (NJ) trees were constructed using allelic data from 35 SSRs, 1,156 DArT loci and 651 SNP markers independently as well as all markers combined together, which revealed three major clusters (Fig. S1). However, the grouping of the accessions in each cluster for different marker systems was not similar. This can be attributed to the nature of markers used and the genomic regions sampled by these marker systems. Neverthe-less, the SNP markers could more clearly demarcate the accessions into smaller sub-groups compared to SSRs and DArT loci ( Fig S1). This indicates SNP markers have more potential application for plant genome analysis compared to other markers. However, none of the three marker systems could group the association panel into distinct groups based on either the market class or the biological status of the germplasm set analyzed. This indicates that genetic variation in the reference set is very high and it is an ideal germplasm panel for association studies. The wild genotypes were grouped in third cluster and these are distributed away from the cultivated genotypes in the cluster indicating that the wild species genotypes are more distinct (Fig S1).

Linkage disequilibrium (LD) decay
A consensus genetic map was constructed using the two interspecific genetic maps developed by [10] and [11] based on mapping population ICC 49586 PI 489777. A total of 1,358 markers (706 DArT loci, 622 SNPs and 30 SSRs) mapped on to the consensus map were used for estimating the LD decay. Estimation of LD decay is essential to determine the number of markers required for association mapping of complex traits like drought and heat tolerance in chickpea. In general, the genetic distance at which r 2 decays to 0.1-0.2 is considered to be the extent of LD in a species [30]. In the present study, the pairwise LD estimated using the squared-allele frequency correlations (r 2 ; when r 2 ,0.20) was found to decay rapidly with the genetic distance of 5 cM (Fig 2), when r 2 ,0.1 LD decay was found to decay at a genetic distance of 20 cM. The results suggest that, LD decay over short distances will facilitate fine mapping of QTL, while LD decay over longer distances will facilitate initial association of trait data with haplotypes in chromosome regions. Further, researchers can use the LD map as a reference to find target QTL and genes for positional cloning [31].

Genome-wide and candidate gene-sequencing based association
In order to reduce the number of false positive associations, both population structure and relative kinship information were employed. Mixed linear model (MLM) with optimum compression and P3D in TASSEL 2.01 software was employed for establishing MTAs. Further, to eliminate the false positives Bonferroni correction was used and as a result, a total of 312 highly significant MTAs were identified for 25 agronomically important traits (Table 1). Phenotypic variance explained (PVE) for these MTAs ranged from low (4.14%) to very high (96.55%) and thus MTAs detected with high PVE for desired traits can be improved through molecular breeding.
Drought tolerance root traits. Drought is the major limiting factor to crop production and especially chickpea experiences various kinds of drought stresses depending on the timing and intensity of the water stress relative to the reproductive stage of the crop. Terminal drought alone has been leading to more than 50% yield losses in chickpea. In the context of receding soil moisture, the breeding strategies should focus to enhance the maximum utilization of available soil moisture efficiently; hence breeding efforts should focus on improving root traits that enhance the efficient extraction of soil moisture. A total of 8 root traits were phenotyped for three seasons at Patancheru for gaining insights into the root system (Table S2). Association analysis identified 15 markers significantly associated with 5 root traits (Root dry weight, RDW; root length density, RLD; root surface area, RSA; root volume, RV and rooting depth, RDp) with PVE ranging from 8.25-22.41%. Among them, 7 markers showed significant association with single trait (RDp) and 2 markers (NCPGR7, DR_237) showed associations with more than one trait (Table 1). Hence, these two markers are believed to be associated with colocalized/pleiotropic QTLs. The co-localization of specific genes/ QTLs could be a better way to understand the molecular basis of drought tolerance or of traits related to drought response. The presence of several co-localized/pleiotropic QTLs verified the complex quantitative nature of drought tolerance in chickpea and allowed the identification of some important genomic regions for traits related to drought tolerance. The markers associated with more than one trait may be efficiently utilized in improvement of more than one trait simultaneously through marker assisted selection (MAS). Till date there are no reports of association studies in the case of chickpea, however the association studies in other crop species especially in cereals such as maize [32], barley [33], sorghum [34] and wheat [35] have revealed that the linkage based QTL analyses can be complemented by LD based association studies.
Phenolological traits. Three phenological traits days to 50% flowering (DF; refers to the day when more than 50% of the plants in a plot initiated flowering), days to maturity (DM) and flowering days (FD; refers to the number of days from the start of flowering to cessation of flowering) were phenotyped for 7-11 seasons in 1-5 locations (Kanpur, Patancheru, Debre Zeit, Egerton and Nairobi). The variation in crop duration is known to affect the seed yield under both drought and heat stresses [5]. Among three traits, 5 significant MTAs were identified for DM and 2 significant MTAs for FD. CaSTMS2 marker associated with FD explained highest phenotypic variation (96.55%; Table 1). Among 4 markers (TA14, CKaM1056, cpPb-489599, ASR391(C/T) and ASR447(C/T)), associated with DM, TA14 explained maximum phenotypic variation (79.31%). Further, among phenological traits high heritability was observed in case of DF followed by DM and FD (Table S3), indicating the possibility of selecting phenotypes required for various target environments. Early maturity is one of the crop adoption strategies to escape drought and heat stresses [36]. Markers that are associated with DM and have significant negative effect on the trait will serve in selection of genotypes with early maturity, thus enhance drought tolerance. Conventionally, late sowing is recommended to overcome heat stress [36]; however, the unpredictable climatic condition may lead to yield losses significantly in case of late planting. Hence, identification of markers associated with DF and DM can be deployed for developing lines that escape drought.  (Table 1). Among 26 markers (9 SNPs, 6 SSRs, 6 DArT loci and 5 gene-based SNPs) that were associated with 100SDW, TA71 locus explained maximum phenotypic variation (88.34%). Further, among 5 gene-based SNP markers [AM_192, ASR_192_290, Ca_Cap2promo, CAP2-promo98(C/G) and DR_237], CAP2promo98(C/G) explained maximum phenotypic variation (36.95%). The heritability of 100SDW was more than 0.9 across all environments and locations, except for Kanpur (0.671) under heat stress environment (Table  S3) Transpiration efficiency related traits. SPAD chlorophyll meter reading (SCMR) and d 13 C are the transpiration efficiency related traits phenotyped in the present study. However, 22 significant MTAs were identified for d 13 C explaining phenotypic variation ranging from 7.81-34.77%. The d 13 C is considered as an indirect measure of transpiration efficiency and MTA identified in the present study can be used for improving transpiration efficiency. The heritability values ranges from 0.65 to 0.71 for d 13 C, indicating that this can be an ideal surrogate to breed for transpiration efficiency in chickpea.
Interestingly, the MTAs reported in the present study for d 13 C (Fig 3a) and 100SDW (Fig 3b) were falling in the ''QTL-hotspot'' region reported on CaLG04 of intra-specific genetic map ( [37] ;  Fig 3c), that reemphasizes the significance of QTLs detected that can be deployed in molecular breeding for trait improvement.

Candidate gene associations
A number of genes involved in plant drought responses and tolerance have been identified in model crops [38]. The candidate genes for drought tolerance defined by sequence variants and the phenotypic data on drought and heat tolerance related traits obtained on the reference set was employed to find the association between trait and genes. A total of 113 candidate gene-based SNPs identified in 10 candidate genes were used for association analysis as result, 18 SNPs from 5 genes (ERECTA, 11 SNPs; ASR, 4 SNPs; AMADH, 1 SNP; CAP2 promoter, 1 SNP and DREB, 1 SNP), were significantly associated with different traits. Interestingly all 11 SNPs in ERECTA gene were significantly associated with 100SDW. Furthermore, the sole SNP found in case of CAP2 promoter was detected be associated with 100SDW. AMADH192 gene-based SNP marker is significantly associated with 100SDW, explaining 25.71% phenotypic variation. The role of the gene AMADH, in response to stress caused by mechanical damage was evaluated by Petřivalský et al. [39]. This gene is also expected to play a role in physiological processes related to polyamine degradation, converting 4-aminobutanal to GABA [38]. Protective role of AMADH in response to abiotic stress is also confirmed during the present study. The AMADH gene is found to be associated with four stress related root traits, deciphering the importance of this gene in abiotic stress response.
The gene-based SNP markers with significant MTAs were used to identify the coordinates on the chickpea genome and the amino acid changes due the SNPs were identified ( Table 2). Among 11 SNPs in ERECTA7f fragment, 2 SNPs (ERECTA7f_33 and ERECTA7f_682) were found in the CDS regions, however, none of the SNPs had any effect on the change in amino acid. While in case of ASR gene all three SNPs that had significant associations were in the CDS region. Two SNPs ASR_209 and ASR_261 were associated with d13C altered the amino acid they code for. In the case of ASR_209, asparagine is changed to glutamic acid while in the case of ASR_261 valine is changed to lysine. Interestingly, asparagine synthase has been reported to be negatively correlated with most performance parameters under drought stress in case of rice [40] and the presence of glutamic acid has been reported to increase the drought tolerance. Further, enhanced lysine content has been demonstrated to possess enhanced drought tolerance in maize (http://www.echocommunity.org/resource/resmgr/ a_to_z/azch3gra.htm). In addition lysine increases the chlorophyll content in leaves and thus enhances drought tolerance. Among the candidate genes used in the present study, genes like ERECTA and DREB were reported in the ''QTL-hotspot'' region on CaLG04 of chickpea [14]. In addition ERECTA gene was also reported to enhance transpiration efficiency [41] and water use efficiency and transpiration efficiency in Arabidopsis [42] and DREB has been reported to play important role in abiotic stress response in Medicago [43]. Hence, the availability of the physical map [14], consensus genetic map [15] and the MTAs established in the present study can be effectively deployed for cloning of these important genes for enhancing the drought tolerance in chickpea.

Marker-trait associations for molecular breeding
The ultimate aim of breeding is to obtain higher yields under stress conditions. In the present study, a total of 159 MTAs were identified with PVE .25% for 4 important traits. A total of 38 significant allele effects for these 8 traits were identified associated with 9 markers showing significant impact on these traits while 9 markers were found to be associated with multiple traits (Table 3). All these associated markers and identified genotypes with favorable alleles can be deployed after validation for improving above mentioned traits through molecular breeding.

Conclusion
To understand the genetic basis of tolerance to drought and heat stresses in chickpea, a comprehensive association mapping approach has been undertaken. As a result, 312 MTAs were identified and maximum number of MTAs (70) was identified for 100-seed weight. In the present study, the pairwise LD estimated using the squared-allele frequency correlations (r 2 ; when r 2 ,0.20) was found to decay rapidly with the genetic distance of 5 cM (Fig 2), when r 2 ,0.1 LD decay was found to decay at a genetic distance of 20 cM. Among 113 gene-based SNPs, 6 SNPs in ASR, 3 SNPs each in DHN and DREB were also found to have significant associations with traits like 100-seed weight, d 13 C, plant height, root dry weight, pods per plant and yield under stressed conditions. This study provides significant MTAs for drought and heat tolerance in chickpea that can be used, after validation, in molecular breeding for developing superior varieties with enhanced drought and heat tolerance.  A chickpea reference set, comprising of 300 accessions (including 267 landraces, 13 advanced lines and cultivars, 7 wild Cicer accessions, and 13 accessions with unknown biological status) defined based on molecular characterization of global composite collection [26], captured 1,315 (78%) of the 1,683 composite collection alleles was employed in the present study (Table S1). The mini-core collection of chickpea, is a subset of reference set, comprises of 211 accessions [27].

DNA isolation and quantification
The DNA was isolated from the tender leaf tissues of 15 day old seedlings as per Cuc and colleagues [44]. The quality and quantity of DNA was checked on 0.8% agarose gel using l-DNA standard. The DNA was normalized to 5 ng/ ml for further use.

Genotyping of reference set
In addition to the genotyping data for 35 SSR markers generated previously [26], 15,360 DArT loci and 651 SNP markers using KASPar assays were genotyped on the reference set.

DArT genotyping
The reference set was genotyped with the 15,360-clone DArT arrays developed by Thudi et al. [10]. In brief, genomic representations for genotyping were prepared by the complexity reduction method described by Yang and colleagues [45]. Briefly, ca. 100 ng of DNA of reference set were digested with restriction enzymes PstI and HaeIII (New England Biolabs, USA) and the PstI adapter was simultaneously ligated. One ml of restriction/ligation reaction served as a template in a 50 ml amplification reaction with PstI+0 primer. Adaptor and primer sequences and cycling conditions are given in the earlier study [46]. Arrays were hybridized with fluorescently labeled targets from all genotypes used for the array development [45,46]. After overnight hybridization at 62uC, the slides were washed and scanned with a Tecan LS300 confocal laser scanner (Grödig, Salzburg, Austria). Individual samples were processed identically to samples for marker discovery and with similar marker quality thresholds in DArTsoft analysis [10].

SNP genotyping
Based on high PIC values and distribution of SNPs on the chickpea genome, 651 KASPar assays for the targeted SNPs were selected from [11] for genotyping at LGC Genomics, UK. Details on principle and procedure of the assays are available at http:// www.kbioscience.co.uk/reagents/KASP_manual.pdf and http:// www.kbioscience.co.uk/download/KASP.swf. SNPViewer was employed for SNP calling.

Candidate gene selection, sequencing and SNP identification
The candidate drought responsive genes were genotyped either on reference set or on the mini-core collection (Table S7). The candidate genes like abscisic acid stress and ripening gene (ASR), drought responsive element binding protein (DREB) gene, ERECTA, sucrose synthase (SuSy), sucrose phosphate synthase (SPS), dehydrin (DHN), aminoaldehyde dehydrogenase (AMADH), AKIN (SNF1 related protein kinase), MYB transcription factor, responsible for drought tolerance were identified based on prior information of involvement of the genes in drought tolerance mechanism in other crop species. In order to identify the above mentioned candidate genes in chickpea, the sequences were downloaded either from chickpea or from related legume species like Medicago, which is phylogenetically related to chickpea.
The candidate genes were amplified on the reference/mini-core collection. The PCR amplicons were purified using Exonuclease I and 1 U of shrimp alkaline phosphatase (SAP) per 5 ml of PCR product. The Exo/SAP added PCR products were products were incubated for 45 min at 37uC followed by denaturing at 80uC for 15 min in the thermal cycler for deactivating unused Exonuclease enzyme. The Exo/SAP treated amplicons were mixed with 1 ml of BigDye Terminator V3.1 (Applied Biosystems, California, USA), 2 ml of 56 sequencing dilution buffer and 3.2 mM of primer (forward and reverse separately) and the volume was made to 10 ml. The sequencing PCR profile included an initial denaturation of 96uC for 10 sec, 50uC for 5 sec, and 60uC for 4 min. Before sequencing, the PCR products were treated with 2.5 ml of 125 mM EDTA and 25 ml of absolute ethanol and incubated for 15 min at room temperature to precipitate the DNA. The plate containing the PCR product was centrifuged at 4000 rpm for 30 min at 4uC. The Ethanol/EDTA mix was poured off by inverting the plate, without losing the pellet. To each well, 60 ml of 70% ethanol was added and again spun at 4000 rpm for 20 min at 4uC. The ethanol was poured off as earlier. The plate was airdried and 10 ml of HiDi formamide (Applied Biosystems, California, USA) was added and the products were denatured (94uC for 5 min, then immediately cooled to 4uC for 5 min) and sequenced using an ABI3700 automated sequencer (Applied Biosystems, California, USA). The large-scale sequencing of candidate genes across 300 genotypes of reference set was carried out at MACROGEN, Korea using BigDye terminator cycle sequencing chemistry.
The gene sequences were subjected to BLAST against chickpea reference genome assembly [1] and the genome coordinates of these genes were determined. Using the genome coordinates for each genes, the features and the amino acid changes due to the SNPs were determined using SNPeff tool (http://snpeff. sourceforge.net/).
Phenotyping of reference set/mini-core collection. In the present study, the reference set/mini-core collection was phenotyped for 34 traits in 5 locations (Patancheru, Kanpur, Nairobi, Debre Zeit and Egerton) in three countries (India, Ethiopia and Kenya), in 1-5 environments (cylinder culture, CC; rainfed, RF; irrigated, IR; normal and heat stress environments) in 2-3 replications. The detailed information on the traits phenotyped, locations and seasons are provided in Table S2.
In addition transpiration efficiency related traits like delta carbon ratio (d 13 C), and SPAD chlorophyll meter reading (during 2004-05 and 2005-06) specific leaf area were also phenotyped (Table S2).

Heat tolerance phenotyping
A set of 280 accessions from reference set were phenotyped during the post-rainy 2009-10 in two sowing dates (normal and late sowing) on a Vertisol at Patancheru and in an Inceptisol (sandy loam) at Kanpur. The alpha lattice design was adopted and evaluated in three replications at both the locations.

Statistical analysis
For each trait, data was analyzed using analysis of variance (ANOVA) by using SAS General Linear Model (GLM) procedure [48] considering all effects as fixed. Along with ANOVA results, least square means (LSM; genotype as fixed effect), standard error of differences (SED), least significant difference (LSD) and descriptive statics like coefficient of variation (CV) and grand mean (GM) were calculated. Considering genotype as random, best linear unbiased predictors (BLUPs) were estimated by using SAS MIXED procedure [48]. Correlation coefficients among different traits were calculated by Karl Pearson's method using SAS CORR procedure [48]. Considered genotype as random in the statistical model and variance components were estimated for each effect, which are used for calculating heritability.

Population structure
A set of 85 DArT loci uniformly distributed across the chickpea genome [10] were used to understand the genetic structure and number of sub-populations in the reference set employing STRUCTURE version 2.3.1 [29] (http://pritch.bsd.uchicago. edu/structure.html) was employed. For this, the number of sub-populations (K) was presumed as 1 to 15, and each was repeated two times. For each run, burn-in and iterations were set to 1,00,000 and 2,00,000 respectively, and admixture and correlated allele frequencies was used. The run with maximum likelihood was used to assign individual genotypes into sub-population.

Association analysis
The SNP, DArT and SSR markers data set from the reference set was used to generate a matrix of similarity between each pair of genotypes in the study (the K matrix) using the program TASSEL. The Q and K matrices were used to correct for the effects of population substructure in the association panel which can cause false positive associations. Using the Q and K matrices as a covariate, markers were tested for association with each phenotype using the program TASSEL (http://www.maizegenetics.net). A Mixed Linear Model (MLM) analysis with optimum compression with P3D method [52] was used in regression, to allow for multiple testing effects. A mixed model approach implemented in EMMA 22 was used to correct the confounding of population structure.