Population Structure, Genetic Diversity and Molecular Marker-Trait Association Analysis for High Temperature Stress Tolerance in Rice

Rice exhibits enormous genetic diversity, population structure and molecular marker-traits associated with abiotic stress tolerance to high temperature stress. A set of breeding lines and landraces representing 240 germplasm lines were studied. Based on spikelet fertility percent under high temperature, tolerant genotypes were broadly classified into four classes. Genetic diversity indicated a moderate level of genetic base of the population for the trait studied. Wright’s F statistic estimates showed a deviation of Hardy-Weinberg expectation in the population. The analysis of molecular variance revealed 25 percent variation between population, 61 percent among individuals and 14 percent within individuals in the set. The STRUCTURE analysis categorized the entire population into three sub-populations and suggested that most of the landraces in each sub-population had a common primary ancestor with few admix individuals. The composition of materials in the panel showed the presence of many QTLs representing the entire genome for the expression of tolerance. The strongly associated marker RM547 tagged with spikelet fertility under stress and the markers like RM228, RM205, RM247, RM242, INDEL3 and RM314 indirectly controlling the high temperature stress tolerance were detected through both mixed linear model and general linear model TASSEL analysis. These markers can be deployed as a resource for marker-assisted breeding program of high temperature stress tolerance.


Introduction
Rice is the staple food of nearly 3.5 billion consumers with a total paddy production of 715 million tons in the world [1]. The rapid phase of urbanization and industrialization erodes the environment by increasing the level of greenhouse gas leading to global warming. Thus, the high temperature stress has become a major disastrous factor that impedes dry season rice productivity in the countries of sub-tropical region. The tropical and subtropical countries, such as India, Pakistan, Bangladesh, China, Thailand, Sudan, and some other African countries are obviously encountering yield loss due to high temperature stress [2,3]. Earlier report indicated that 3 million ha of rice was damaged during 2003 accruing a loss of 5.18 million tons of rice in China due to the prevalence of continuous high temperature (>38°C) for more than 20 days [4][5][6]. Heat stress can be considered as a serious threat to sustaining rice production in the most productive regions of tropical and sub-tropical Asia. The global average temperature has increased by 0.6°C over the past 100 years and is projected to increase at a rapid rate in future [7]. The average increase is expected to be 0.5-2.8°C by the end of the 21 st century [8,9]. Studies on the effect of high temperature stress on grain yield revealed a decrease of 10% rice grain yield by a rise of 1°C temperature [10]. Among different growth stages of rice, the flowering stage is the most sensitive stage to high temperature [11,12]. The temperature beyond 35°C during flowering stage adversely affects high pollen and spikelet fertility.However, high temperature hasmore deleterious effect on just before or during anthesis [12]. Further, it affects anther dehiscence, pollination, and pollengermination, thereby leading to spikeletsterility and yield loss [13]. Exposure of rice spikelet during anthesis even for less than an hourat 33.7°C may result in spikelet sterility [14].
The heat tolerant genotypes dehisce anther seamlessly under high temperature than the susceptible ones [12,15,16,17]. Therefore, the estimation ofspikelet fertility (SF) can be used as a screening tool for tolerance to high temperature stress during the flowering stage [18]. Genotypes endowed to withstand high temperature during anthesis result in high spikelet fertility which can be of great asset to develop high temperature stress tolerant lines by associating stress phenotypes with the chromosomal regions of the tolerant genotypes. Genotypic variation in high temperature stress tolerance during flowering stage is reported in indica and japonica sub-species [13,14,17,18,19,20]. Nagina 22 (N22), a known drought tolerant genotype of India, is highly tolerant to high temperature stress [21]. The genetic variation in high temperature stress tolerance is the pre-requisite for high temperature tolerant breeding program. Molecular markers, especially SSR or Simple Sequence Length Polymorphism (SSLP) markers have been used for estimating rice genetic diversity [22][23][24][25][26][27][28][29][30]. For selection of suitable diverse parental lines possessing temperature tolerance, the genetic diversity pattern should be estimated by using the trait linked SSR markers that would confirm the suitability of the material to be used in the breeding program. The genetic diversity and structure of the population are imperative for association mapping and molecular breeding program for the trait of concern. For a perfect association analysis, the population should not show unauthentic association or unequal relatedness within the population [14]. The population structure (Q) with relative kinship (K) matrices is used to correct the linkage disequilibrium (LD) due to population structure and familial relatedness for mixed-model association mapping [31]. Thus, marker-based kinship estimate is considered appropriate for association mapping approaches that have shown to perform better than other alternatives.
The association between DNA haplotypes and a particular phenotype may allow the assessment of genetic potential of the donors and breeding lines that are to be used in the breeding program. Several reports on QTL mapping for heat tolerance at flowering stages have been conducted using bi-parental mapping populations [21,[32][33][34][35][36]. This type of mapping is much costlier and only a few alleles can be identified over a long research timescale with low resolution [37][38][39][40]. These limitations are now reduced with the use of association mapping or linkage disequilibrium (LD) mapping. The main aim of LD mapping is to estimate the correlations between genotypes and phenotypes in a sample of individuals of a disequilibrium population. The association mapping for various traits is reported in many crops, like maize [41], wheat [42,43], barley [44], oilseed rape [45], Brassica rapa [46], soybean [47], G. hirsutum [48] and rice [49][50][51][52].Therefore, the objective of the present study is to estimate the diversity, population structure and to identify associated markers with high temperature stress tolerance traits in rice.

Materials and Methods
Plant materials and field screening of genotypes for high temperature stress tolerance A total of 240 germplasm accessions, short listed on the basis of spikelet fertility, shot duration and photo-insensitivity traits during previous years were utilized for marker-trait association study(S1 Table). Thesematerials were mostly indigenous with some exotic landraces and improved cultivars obtained from National Rice Research Institute (NRRI) gene bank. Four staggered sowings at an interval of 7 days was carried out starting in mid-January, 2014 enable coincide flowering with concomitant high temperature stress during April-May. Twenty-five days old seedlings were transplanted in a staggered manner with 7 days interval following an augmented block design comprising of four check varieties (Dular, Chandan, N22 and Satyakrishna) with three rows of 4 m length each at a spacing of 20 x 15cm between hills. During flowering period, observations on daily average maximum and minimum temperature were collected from the meteorology observatory of the institute. Spikelet fertility was recorded for all the staggered plantings to infer effect of high temperature stress on magnitude of tolerance.

Screening for high temperature stress tolerance under control and normal condition
The seeds of 60 genotypes including N22 (tolerant check) and Satyakrishna (susceptible check) were sown in plastic trays containing soil and were grown in a net house under normal condition. Twenty-one days old seedlings were uprooted and transplanted into uniform plastic pots filled with field soil and replicated thrice. One set of materials was exposed to high temperature stress in the growth chamber whereas the other set was maintained in the green house chamber to compare their relative performance under two treatment conditions. The first three heading panicles were marked and the potted plants were moved to growth chamber for high temperature treatment. The temperature in the growth chamber was set to simulate the daily ambient temperature ( Table 1). The pots were brought back to the green house when all the spikelets on the three marked panicles completed flowering. At physiological maturity stage, the number of filled spikelets and chaffy spikelets were recorded. The mean spikelet fertility (percentage of filled spikelet) of the marked panicles was used as an index to evaluate the heat tolerance of the genotypes. Eleven agro-morphological traits viz., days to 50% flowering, plant height (cm), flag leaf length (cm), flag leaf breadth (cm), panicles/plant, panicle length (cm), spikelet fertility and sterility (%) under normal and stress condition, panicle exsertion under normal and stress condition were recorded. IRRI Standard Evaluation System to score panicle exsertion was followed. Score of 1 for well exserted; 3-moderately well exserted; 5-just exserted; 7-partially exserted and 9-enclosed were used for panicle exsertion trait. Principal Component Analysis (PCA) was performed to estimate Euclidean distance between two genotypes in the multivariate space and identification of genotypes to desired environment [53,54]. The analyses were performed using the Windostat 7.5 software (Indostat Services, Hyderabad, India.).

DNA Isolation
Leaves were sampled from 15 days old seedling to extract genomic DNA for molecular screening of high temperature tolerance amongst the genotypes. Total genomic DNA was extracted after crushing in liquid nitrogen in microfuge tubes using CTAB extraction buffer (100mM Tris-HCl pH 8, 20mM EDTA pH 8, 1.3M NaCl, 2% CTAB) and chloroform-Isoamyl alcohol extraction followed by RNAase treatment and ethanol precipitation [55]. Agarose gel electrophoresis was used to estimate DNA concentration and each sample was then diluted to approximately 30ng/μL.

Genetic Diversity and population structure
Data were scored on the basis of presence or absence of the alleles for each genotype-primer combination and entered into a binary data matrix as discrete variables. The number of alleles, allele frequency, gene diversity, heterozygosis and polymorphic information index (PIC) were estimated using the program PowerMarker Ver3.25 [64]. The data were analysed for possible population structure by using model based approach with STRUCTURE 2.3.4 software [65]. The project was run with 150,000 burn-in followed by 150,000 Markov Chain Monte Carlo (MCMC) replication with model parameter set of 'possibility of admixture and allele frequency correlated'. Each K value was run for 10 times with K value varying from 1 to 10. The mean estimate of the log posterior probability of the data L (K) was plotted against the given K value in order to obtain optimum K value. Exact number of sub-population was identified using the maximal value of L (K). The model choice criterion to detect the most probable value of K was ΔK, an ad hoc quantity related to the second-order change of the log probability of data with respect to the number of clusters inferred by STRUCTURE [66]. The ΔK value as function of K estimated using Structure Harvester [67] has also shown a clear peak at the optimal K value. An unweighted neighbor joining un-rooted tree was constructed using the calculated dissimilarity index by using NEI coefficient [68] with bootstrap value of 1000 by using DAR-win5 software [69]. distance matrix. The presence of molecular variance within and between the population structure was assessed through Analysis of molecular variance (AMOVA) by using GenAlEx 6.5 software [70]. F statistics including deviations from Hardy-Weinberg expectation across the whole population (F IT ), deviation from Hardy-Weinberg expectation within a population (F IS ) and correlation of alleles between subpopulation (F ST ) was calculated. The hypothesis of the association of SSR markers with high temperature tolerance was tested using a general linear model (GLM) and mixed linear model (MLM) in the program TASSEL 5 [71].

Phenotypic Screening of rice genotypes under field situation
A total of 240 upland and early duration rice germplasm lines were staggered grown during 2014 dry season at National Rice Research Institute (NRRI), Cuttack. These genotypes were exposed to the maximum temperature to coincide with anthesis during the hottest days of the year at the Institute site ( Fig 1A). In the field evaluation study, 59 germplasm lines were selected from 240 genotypes based on spikelet fertility (SF) during the hottest period (April-May) of 2014 dry season ( Fig 1B). The selected lines included 24 International Rice Research Institute tolerant lines and N22 (positive check) in it based on very high spikelet fertility (>60%) than the susceptible genotype Satyakrishna (Fig 1). The spikelet fertility of tested genotypes ranged from 7.6% (AC11209) to 88.6% (N22), while susceptible lines showed an average spikelet fertility of 15.3% (Fig 1; S1 Table).

Phenotypic Screening of rice genotypes under control facility
The genotypes identified as tolerant in terms of high spikelet fertility in ex situ were screened again along with susceptible and tolerant checks under control facility (RGA-Cum-Phytotron) at NRRI, Cuttack following the protocol of Ye et al. [57] ( Table 1). The phenotyping results exhibited a wide variation for percentage of spikelet fertility between 0.74 (Satyakrishna) and 64.29% (N22) under high temperature stress, whereas these genotypes showed very high fertility (>80%) in normal green house conditions (Table 3). Fourteen genotypes were observed to be sensitive to high temperature stress exhibiting a spikelet fertility range of 0.7-10%whilenormalunderex situ. Twenty genotypes were weakly tolerant under high temperature stress with spikelet fertility of >10-20%, while all of them exhibited more than 80% spikelet fertility under normal condition. Eleven genotypes were moderately tolerant to the stress with spikelet fertility of >20-30%.Under temperature stress situation, seven genotypes behaved as tolerant type showing spikelet fertility of >30-40%. Eight genotypes namely CR3621-6-1-3-1-1, CR3820-2-1-5-1-2, CR3813-4-4-4-2-2, CR3820-4-5-3-1-3, AC.39890, AC.39973, AC.39790 and N22 showed spikelet fertility of >40% under high temperature stress and categorized as highly tolerant while these lines exhibited>80% under normal situation. Among all the genotypes studied, N22 alone was found to be very highly tolerant to high temperature stress showing >60% spikelet fertility while >80% fertility was observed under normal situation. The panicle exsertion varied with genotypes and ranged from well exserted (score1) to partially exserted (score7). Germplasm lines showed score 1 exhibiting well exserted panicle. Genotypes such as Satyakrishna and HHZ5-SAL10-DT3-Y2 scored more than 5 were observed to be susceptible to temperature stress.

Genotype-by-trait biplot analysis
The phenotyping data of 60 germplasm lines for 11 agro-morphologic traits were utilized to generate genotype-by-trait biplot graph (Fig 2) for analysis of the genotypes with first two principal components. The first principal component explained 64.68% of variation, while 2nd component exhibited 16.51% of the total variability. Among the 11 agro-morphologic traits studied, spikelet fertility % under normal condition contributed maximum towards diversity, followed by spikelet sterility under stress situation (Fig 2). The 1 st (top left) and 4 th (bottom left) quadrant contained 18 genotypes, which were tolerant to high temperature stress and  (Table 4).

Cluster analysis
The discrimination ability of the 20 closely linked markers for high temperature tolerance was determined by clustering the genotypes and by constructing the tree on the basis of amplification pattern with the markers. The unrooted tree was constructed by using unweighted-neighbour joining method (Fig 3). These polymorphic markers were distributed in 10 chromosomes and distinguished the genotypes into different groups. Four major and six minor clusters originated from the central point. The genotypes could be grouped into separate clusters exhibiting correspondent percentage of spikelet fertility (Fig 3A).  Population structure The population panel for high temperature stress tolerance was analyzed by employing STRUCTURE software utilizing the Bayesian clustering approach that divided the population into three sub-populations (Figs 4 and 5). The assumed values of probable sub-populations (K) were ascertained by choosing higher ΔK value, an ad hoc quantity related to the second order change of the log probability of data with respect to the number of clusters inferred by Structure [66]. As per the Evano table output, the K = 3 was observed to be the best due to high ΔK peak value of 55.8 among the assumed K (Fig 4B). The sub-population 1 (SP1) contained 15 pure and 9 admixture genotypes, accommodating weakly tolerant and susceptible genotypes to high temperature stress. In the structure analysis, pure and admixture categorization was made with 18 pure and 7 admixture types from a total of 25 germplasm lines in sub-population2 representing mainly weakly and moderately tolerant types (Figs 4 and 5; Table 5

Analysis of molecular variance (AMOVA)
The three populations generated from structure analysis were analyzed for genetic variation among and within the clusters using AMOVA (Table 6). From the analysis, 25% of variation was observed between population, 61% among individuals and 14% within individuals observed in the population. Wright's F statistic was estimated to determine deviation of Hardy-Weinberg expectation in the population. The F IS for all the 20 marker loci was 0.815, while F IT was 0.862 across the clusters. Pairwise F ST values showed significant differentiation among all the pairs of sub-populations ranging from 0.192 to 0.296 suggesting that all the three  Table 3.

Association of marker alleles with phenotypic traits
The marker-trait associations for all the traits were calculated using GLM and MLM (Q+K) model of TASSLE5 software (S2 Table). The squared allele frequency correlation (r 2 ) values ranged from 0.066 to 0.288 with an average of 0.125 by using GLM, whereas the average reduced to 0.016 by using MLM analysis. Among total of 220 comparisons, 10 were found to be significant at p< 0.01 and 36 were significant at p < 0.05 by using GLM, whereas MLM analysis showed only 10 comparisons to be significant at p < 0.05 ( Table 7). The markers showing significant associations were within 20cM range with an exception of few markers like RM547 and RM336. However, the marker RM7365 reported to be tightly linked to high temperature stress tolerance with 0.8cM position in earlier bi-parental mapping did not show significant association in the present study. The markers located more than 100cM distance did not show association at p <0.01, as in the case of RM235 and RM234 located on chromosome 12 and 7, respectively were not significant at r 2 > 0.10 with p< 0.05. Considering the GLM statistics, RM314, RM 3586, RM249, RM225, RM336 and RM547 could be associated with spikelet fertility percentage, with higher F value and lower P value indicating a positive association with high temperature tolerance. Further, to make the association robust by considering the kinship value, MLM analysis exhibited a strong marker-trait association with phenotypic variance of 7.42% with RM314 and 8.85% with RM547 for trait spikelet fertility under normal and stress condition, respectively. It was further evidenced from TASSEL analysis that the marker RM228 associated with panicle emergence; RM242 and INDEL3 with panicle length; RM 247 with flag leaf width; RM205, RM547 with flag leaf length; INDEL3 with days to 50% flowering which was also evident from cluster analysis. INDEL3 associated with days to 50% flowering and panicle length, whereas RM547 showed strong association with flag leaf length, spikelet fertility and sterility under stress condition. The QQ plot showed significant association of markers for spikelet fertility/ sterility under stress, panicle emergence under stress, panicle length and plant height (Fig 6).

Discussion
High temperature stress is posing as an important yield limiting abiotic factor in dry season rice of many Asiatic countries. It is causing significant yield loss in case of delayed dry season rice cultivation in sub-tropical countries due to changing climate. Hence, genetic diversity for this trait is the pre-requisite step for developing high yielding temperature stress tolerant rice varieties suitable for dry season cultivation. In our field screening along with concurrent controlled screening methods, we could identify eight highly tolerant genotypes to high temperature stress with >40% spikelet fertility under stress condition. The genotypes were classified into four classes basing on spikelet fertility. Clustering based on molecular markers and genotype-trait biplot analysis showed almost similar pattern of grouping to that of the spikelet fertility phenotyping group (Figs 2 and 3). The tolerant genotypes were grouped into many subgroups, which might be due to the presence of different stress tolerant gene(s)/QTL(s) and their possible combinations in the landraces and breeding lines. Hence, the population structure estimation in the germplasm set is most important. Earlier studies also indicated screening and identification of heat tolerant genotypes in rice [14,17,20,72,73]. N22, an upland cultivar of India was observed to be most tolerant genotype showing high spikelet fertility under field and control screening facility. Similar results were also obtained in earlier screening for high temperature stress tolerance [21]. The principal component analysis placed the tolerant genotypes in 1 st and 4 th quadrant of the biplot on the basis of eleven agro-morphologic traits, where highly tolerant genotypes were highlighted in a circle (Fig 2). The 60 genotypes were classified into tolerant and non-tolerant with most of the tolerant types in the left side of center. Hence, there is a variation for the trait with respect to temperature tolerance. The unrooted tree, discriminated the highly tolerant and Various classes of genotypes obtained based on temperature tolerance linked markers indicated the presence of many genes/QTLs in the studied genotypes. In our study, 20 high temperature stress tolerance linked molecular markers produced 63 alleles with average gene diversity of 0.3751 and PIC value of 0.2998. The low to moderate value of alleles, gene diversity and PIC might be due to the use of limited numbers of linked markers for a single phenotypic trait with the exclusion of spurious bands from the genotyping analysis. As the diversity estimation is for multi-locus alleles conferring diversity with respect to a high temperature tolerance trait, these parameters indicate higher genetic diversity in the population for the trait. Similar results were observed in earlier studies in rice diversity using single trait linked markers [74,75]. In a study of 40 accessions from Pakistan [75] indicated slightly lower genetic diversity with average PIC values of 0.38 with an average of 2.75 alleles per locus. Our results on genetic diversity of high temperature stress tolerance are in conformity with the earlier studies indicating moderate diversity parameters estimate [74][75][76][77][78][79]. However, several earlier reports also suggested very rich values of the genetic diversity for agro-morphologic traits in various rice populations [50,80,81]. The existence of linkage disequilibrium between a molecular marker and an unknown locus regulating a particular phenotype can further lead to establishment of association of the specific haplotype/ marker alleles with that particular phenotype in a population. The associated marker alleles can be used in molecular breeding program for improving other genotypes lacking the particular trait. The marker should be tightly linked to the locus controlling the particular phenotype for detection of marker-trait association. In our study, marker-trait association was much more precise as the materials showed different classes of phenotypes for high temperature stress tolerance trait of rice. Structured population and relatedness characteristics Population Structure and Marker-Trait Association for High Temperature Stress Tolerance in Rice among the set of genotypes suggested that the population structure and kinship in conducting population based association mapping in rice germplasm should be performed. The phenotypic evaluation clearly differentiated the study materials into different groups, indicating their heterogeneity towards high temperature stress tolerance (Figs 2-4).This favoured the presence of linkage disequilibrium and increased the chance of detecting marker-trait association. Similar results showing the potential value of germplasm in heterogeneous collections had detected marker-phenotypic trait association [35,50,64,77,82,83,84].
High temperature tolerance in rice has gained much attention due to increasing temperature stress, particularly in dry season rice cultivation of sub-tropical countries. The application of association mapping for this trait will be helpful in deciding the molecular markers to be used in pyramiding the multiple QTLs responsible for the trait. Therefore, diverse populations are needed for association mapping of the trait. STRUCTURE has grouped the population into three sub-populations, indicating the suitability of the selected population for phenotype-trait association study. Further, output of the program detected a lower value of alpha (α = 0.139) suggesting most of the landraces in each sub-population had a common primary ancestor with few admix individuals. The inferred ancestry indicated that many landraces possess a partial allele from an ancestry, which might probably be due to complex intercrossing among germplasm from diverse background experiencing strong selection pressure for the trait. Similar reason has also been reported by earlier workers [50,85].
High r 2 values were detected for markers RM547, RM228, RM247, RM205, RM indel3, RM242 and RM314 by both GLM and MLM TASSEL analysis for spikelet fertility and other phenotypic traits of tolerant accessions. This suggests that, the observed marker-trait association resulted from a possibly multiple introgressions from the landraces possessing different QTLs which ultimately showed a higher level of tolerance. Of the 220 comparisons, only 10 were significant with r 2 > 0.10 at p< 0.01. Thirty six comparisons were significant at p<0.05 ( Table 7). The markers showing significant associations fall within 20cM range with an exception of the few markers like RM547 and RM336. Similar results were also earlier described by Zhao et al. [50]. Besides spikelet fertility (stress), RM 547 was detected to be strongly associated with flag leaf length. This indicated that one marker could show association with other regions of the rice genome also. However, the genetic and physical distance between a marker and the genes controlling quantitative traits might be quite important for detecting association. These linked marker positions and other quantitative trait might also show a pleiotropic effect or may be located very closely. Similar multiple quantitative traits association with molecular markers have been detected and reported earlier [82,86]. One marker allele namely RM547 showed strong association with spikelet fertility detected by both GLM and MLM analysis can be used for selecting high temperature stress tolerant lines in rice (Table 7). In the present marker-trait association study, a strong association was detected by MLM and GLM model of TASSEL analysis for marker RM314 with spikelet fertility (normal); RM228 for panicle emergence; RM205 and RM547 for flag leaf length; RM247 for flag leaf width; RM 242 and INDEL3 for panicle length and INDEL3 for days to 50% flowering. Earlier reports using various mapping populations have reported many high temperature tolerant QTLs on chromosome 6,8,9,10 and 12 [33,57,59,60,62,63]. Also, our principal component study revealed higher estimates for these phenotypic traits in the tolerant lines present in the encircled area of the biplot analysis (Fig 2). Hence, we may use these six associated markers in marker-assisted breeding for improvement of these traits by which indirectly tolerance for high temperature will increase. Thus, the strongly associated marker RM547 with spikelet fertility under stress and the markers indirectly controlling the high temperature stress tolerance like RM228, RM205, RM247, RM242, INDEL3 and RM314 can be used in marker-assisted high temperature stress tolerance breeding program.

Conclusion
The phenotypic screening study for high temperature stress tolerance has classified the set of genotypes into four classes. A moderate level of genetic base of the population could be inferred from the genetic diversity analysis. The population panel for the trait has been categorized into three sub-populations by STRUCTURE software. It suggested that most of the landraces in each sub-population had a common primary ancestor with few admix individuals. A set of landraces and breeding lines from a large collection of germplasm materials showed the advantage of many QTLs in the set of materials representing in almost the whole of the genome for the trait. Application of association mapping will increase the utility of more number of markers detected from this study in future molecular breeding programmes aimed at improving this trait. This type of mapping can be useful and a suitable and dependable alternative for linkage mapping, making marker-assisted selection exercise more effective.