Genome-wide association analysis for heat tolerance at flowering detected a large set of genes involved in adaptation to thermal and other stresses

Fertilization sensitivity to heat in rice is a major issue within climate change scenarios in the tropics. A panel of 167 indica landraces and improved varieties was phenotyped for spikelet sterility (SPKST) under 38°C during anthesis and for several secondary traits potentially affecting panicle micro-climate and thus the fertilization process. The panel was genotyped with an average density of one marker per 29 kb using genotyping by sequencing. Genome-wide association analyses (GWAS) were conducted using three methods based on single marker regression, haplotype regression and simultaneous fitting of all markers, respectively. Fourteen loci significantly associated with SPKST under at least two GWAS methods were detected. A large number of associations was also detected for the secondary traits. Analysis of co-localization of SPKST associated loci with QTLs detected in progenies of bi-parental crosses reported in the literature allowed to narrow -down the position of eight of those QTLs, including the most documented one, qHTSF4.1. Gene families underlying loci associated with SPKST corresponded to functions ranging from sensing abiotic stresses and regulating plant response, such as wall-associated kinases and heat shock proteins, to cell division and gametophyte development. Analysis of diversity at the vicinity of loci associated with SPKST within the rice three thousand genomes, revealed widespread distribution of the favourable alleles across O. sativa genetic groups. However, few accessions assembled the favourable alleles at all loci. Effective donors included the heat tolerant variety N22 and some Indian and Taiwanese varieties. These results provide a basis for breeding for heat tolerance during anthesis and for functional validation of major loci governing this trait.


Introduction
Spikelet sterility in the rice crop is becoming a burning issue in the context of global warming since it may occur as soon as air temperature reaches 33˚C at time of anthesis [1] even if it is a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 stability indices, sugar content increases. Proteins with the highest molecular subunits (more than 66 kDa) are disintegrated into low molecular mass polypeptides (below 43 kDa), leading to the loss of pollen viability [25].
Large genetic variation exists in the spikelet sensitivity to high temperature damage. Accessions tolerant to high temperature have been identified in all major rice genetic groups [15,26,27,28,29]. Among them, N22, an Indian landrace belonging to the aus genetic group, is one of the most heat-tolerant varieties. For instance when submitted to 38˚C for 6 hours during anthesis, N22 maintained a spikelet fertility of 71% while the fertility was of 48% for the moderately tolerant indica variety IR64 and of only 18% for the tropical japonica variety Moroberekan [29]. High heat tolerance during anthesis was also reported for Giza178, a japonica variety from Egypt [30].
During the last decade several studies aiming at dissecting the genetic bases of spikelet sensitivity to high temperature have been reported. Using a bulk-segregant analysis approach in a F 2 population of 279 individuals derived from cross between the heat-tolerant cultivar 996 and the sensitive cultivar 4628, Zhang et al [31] identified two SSR markers (RM3735 on chromosome 4 and RM3586 on chromosome 3) significantly associated with heat tolerance, accounting for 17% and 3% of the total variation, respectively. Later on, using linkage mapping approach in recombinant inbred lines derived from the same cross, Xiao et al [32] detected two QTLs affecting spikelet fertility under high temperature on chromosome 4 (RM5687-RM471) and chromosome 6 (RM190-RM225). They explained 15.1% and 9.3% of the total phenotypic variation, respectively. Jagadish et al [23] identified eight QTLs associated with spikelet fertility under heat on six different chromosomes within a population of 181 recombinant inbred lines derived from a tolerant (Bala) and a susceptible (Azucena) parent. The most significant heat-responsive QTL mapped on chromosome 1 (38.35 Mb) and explained 18.1% of the phenotypic variation. Ye et al [33], using BC1F1 and BC2F2 populations derived from a cross between N22 and IR64 varieties, detected two putative QTL, located on chromosome 1 (qHTSF1.1) and chromosome 4 (qHTSF4.1), explaining 12.6% and 17.6% of the total variation, respectively. More recently, Ye et al [30] mapped four QTLs (qHTSF1.2, qHTSF2.1, qHTSF3.1 and qHTSF4.1) in a F2 population of IR64/Giza178, and two other QTLs (qHTSF6.1 and qHTSF11.2) in a F2 population of Milyang23/Giza178. The QTL on chromosome 4 (qHTSF4.1) colocalised with the QTL on chromosome 4 previously identified in the IR64/N22 population. However, the confidence intervals of those QTLs, at best between 0.8 and 1.8 Mb [30], are still too large to search for candidate genes.
At the other end of the spectrum of genetic analyses, using proteomic approach Jagadish et al [34] reported much higher accumulation of small heat shock proteins (sHSP) in N22 variety compared with the heat-sensitive variety Moroberekan, and the moderately heat-tolerant variety IR64, when submitted to high temperature during anthesis. Likewise, semi-quantitative RT-PCR based gene expression analysis, targeting heat shock proteins (Hsps) and heat shock transcription factors (Hsfs), identified N22 as the variety with the highest level of expression for most of the Hsps and Hsfs genes tested [35]. Several other studies have analyzed differential gene expression in rice subjected to high temperature [36,37,38] but they did not focus on the reproductive stage and the resulting spikelet sterility.
Genome-wide association studies (GWAS) has emerged as a tool to resolve complex trait variation down to the sequence level by exploiting historical and evolutionary recombination events at the population level [39,40]. As an alternative to linkage analysis, association mapping offers three advantages, (i) increased mapping resolution, (ii) reduced research time, and (iii) greater allele number [41]. The approach was already successfully applied in rice to dissect the genetic bases of complex traits such as plant development and agronomic performance [42], root development [43], rice grain concentration in metalloids and micronutrients [44], drought tolerance [45], grain shape and other grain related traits [46]. The simplest form of GWAS is the marker-by-marker analysis, but methodology for more complex models have recently been developed [47].
To date, GWAS approach has not been used for the exploration of loci controlling tolerance to high temperature in rice. Here, we report application of three GWAS methods to detect QTLs for spikelet fertility under high temperatures during the anthesis process, and for several other phenotypic traits, in a diversity panel of 167 indica accessions genotyped with 13,162 SNPs. To minimize any bias when measuring spikelet fertility, only the top half of the panicles that had effectively experienced heat at time of anthesis were sampled and analyzed. Co-localizations with previously identified QTLs and with candidate genes were analyzed with the aim of connecting those scales of analysis and providing a unified picture of the genetic bases of spikelet tolerance to high temperature.

Plant material
The plant material was composed of 167 accessions extracted from a diversity panel of 201 traditional and improved indica accessions plus six aus accessions including the heat tolerant variety N22 (S1 Table). These accessions were chosen so as to cover the largest geographical origins and products of the major international lowland rice breeding programs. Seeds of the material were obtained either from Centre de Ressources Biologiques Tropicales de Montpellier (http://golo.cirad.fr/FR/) or from AfricaRice, CIAT or IRRI gene banks. The accessions were seed increased during two generations in Cirad Montpellier greenhouse, using single seed descent to make sure they were homogeneous.

Phenotyping
Experimental setup. The experiment was conducted in a greenhouse and in controlled growth facilities of the International Rice Research Institute (IRRI), Los Baños, Laguna, Philippines (14˚11 0 N, 121˚15 0 E), from July to November of 2009. Seeds of the 167 accessions were germinated in Petri dishes in a germination chamber at 32˚C. After 2 days, when the coleoptiles reached 2-3 mm length, five pre-germinated seeds of each line were sown in a 6.5 l PVC pot containing 7.0 kg soil with 11.1 g organic C kg -1 , and 1.4 g total N kg -1 , with a pH of 6.5 and particle size distribution as 33% clay, 38% silt, and 29% sand. Fertilizers were applied in each pot through 0.16 g P as solophos, 0.16 g KCl and 0.06 g ZnSO 4 at sowing, and 0.16 g N as urea at 10, 41 and 56 days after sowing. Thinning was done to two seedlings per pot one week after sowing and to one seedling two weeks after sowing. Pots were kept flooded from right after thinning until maturity.
Pots were arranged within three separated blocks, each block containing two replicates of each of the 167 lines. The first block, B1, corresponded to the T1 treatment (control), the second one, B2, to the T2 treatment (heat at flowering), and the third one, B3, to an extra set for morphological measurements. All three blocks were grown in a greenhouse without specific climate control until few days before panicle initiation (35 days after sowing) with a constant distance between plants of 30 cm (density of 44 pl m -2 ). At that time, the three blocks were moved to one highly controlled greenhouse compartment each (soil dimensions of 9.7 m by 4.4 m) equipped with a continuous ventilation facility, and grown under 22/ 28˚C night/day thermal conditions. The plants of the three blocks were maintained under these conditions until maturity, except plants of B2 that were subjected to 38˚C during 6 days at time of flowering, and plants of B3 that were used for destructive and non-destructive measurements at flowering. In the case of B2, the plants were grown from 8 am to 2 pm (time range that includes the whole period of the day when anthesis occurs) during 6 consecutive days, in an outdoor controlled growth facility (soil dimensions of 1.3 m by 1.3 m) equipped with a continuous ventilation system that maintained a temperature of 38˚C in the vicinity of the rice panicles. This treatment started one day after anthesis was first observed on the main tiller.
Plant measurements. Accession-dependent traits (plant and panicle architectural patterns, heading date) were measured in the B3 block. Heading date was reported as the date when the panicle of the main tiller appears on top of the stem (DTHD). At flowering, defined as the time when anthesis of 50% of the panicle has occurred, plant height (PTHT) was measured from the soil surface to the tip of the highest leaf when stretched; the number of live tillers (TINB) and tillers bearing a panicle (PNNB) were counted. Plant samples were separated into three entities, green leaf blades, dead leaf blades, and stems plus panicles. One leaf was considered as green if more than half of its surface was still green. The total leaf area (LFAR) of the green blades was measured with a leaf area meter (LI-3100C Area Meter, Li-Cor, Lincoln, NE, USA). Dry matter of each entity, green leaf blades (GLFDW), dead leaf blades (DLFDW), and stems plus panicles (ST+PNDW), was determined by weighing the material after drying during 72 h at 70˚C. Shoot dry weight (SHDW) was calculated as the sum of GLFDW, DLFDW, and ST+PNDW. Specific leaf area (SLA) was calculated as the leaf area divided by the corresponding leaf dry weight. Panicle maximal width (PNDM) and position compared with the tip of the highest leaf (PNPOS) was measured for the panicle of each main tiller. Angles of the flag leaf of the main tiller (FLFAG) and the leaf immediately below it (LFAG) were measured with reference to the vertical. Filled and unfilled grains were separated at a flow rate of 4m 3 /s using a Seedburo blower (KL-1205, Seedburo, Chicago, IL, USA) and counted. Spikelet sterility under controlled condition (SPKSTc) was computed as the ratio of the number of unfilled spikelets above the total number of spikelets.
Treatment-dependent traits were measured from each of the other two blocks. The number of green leaves of the main tiller of each entry in each treatment was counted at flowering, and at 16, and 21 days after flowering. The rate of leaf senescence 16 and 21 days after flowering was then calculated as the ratio of the reduction in green leaf number between flowering and 16 and 21 days later, respectively, above the green leaf number at flowering (LFSNS 16, LFSNS 21). Panicle length (PNLT) was measured at maturity from the neck of the panicle, and the rate of panicle exertion (PNEX) at flowering was estimated as the ratio of the visible panicle length at flowering above panicle length at maturity. Flowering duration (HDLT) was calculated as the difference between the day when the last tiller of the plant finished flowering and the day when the first tiller of the plant started to flower. To address spikelet sterility measurement, each single panicle subjected to T2 treatment, for which all the spikelets were exposed to heat at anthesis (considering the six consecutive days of exposure), was tagged with respect to the day when anthesis started. All corresponding panicles that emerged on the same day in the T1 treatment were also tagged. At harvest, only the tagged panicles from both T1 and T2 treatments (that varied from 4 to 9 panicles depending on the genotype) were sampled and cut into two equal pieces (top and bottom parts). Filled and unfilled grains considering the top half of the panicles only (to avoid including unfilled grains because of lack of carbohydrate supply) were separated at a flow rate of 4m 3 /s and counted. Spikelet sterility under heat stress (SPKST) was computed as the ratio of the number of unfilled spikelets above the total number of spikelets of the top half of the panicles that fully flowered during the 6-day period within T2 treatment. The 2 ArcSin Square root transformation was applied to SPKST data expressed as percentage, for association analysis.

Genotyping
Genotyping was conducted at Diversity Arrays Technology Pty Ltd. (Australia) using a method of genotyping by sequencing (GBS) previously described by Courtois et al [43]. Briefly, a combination of PstI/TaqI restriction enzymes was used to reduce the genome complexity. The sequences were trimmed at 69 bp (5 bp of the restriction fragment plus 64 bases, with a minimum quality score of 10). An analytical pipeline developed by DArT P/L was used to produce DArT tables (corresponding to the presence/absence of any given sequence) and SNP tables (corresponding to single nucleotide polymorphisms within the 69 bp sequences). The position of each marker on the rice genome was determined by aligning the sequences to the Os-Nipponbare-Reference-IRGSP-1.0 pseudomolecule assembly [48]. Sequences that did not align on the Nipponbare sequence or aligned in several positions were discarded from the initial dataset, as well as markers with more than 20% missing data. For each marker the corresponding RAP-DB annotation (http://rapdb.dna.affrc.go.jp) was retrieved and, when it corresponded to a gene, the position regarding gene attributes (intron, exon, 3' or 5' UTR) was determined.
To constitute the matrix that was used for GWAS, markers with frequency of minor allele (MAF) below 2.5% were discarded, and the few heterozygous loci were replaced by missing data. Then missing data were imputed using Beagle v3.3.2 [49]. Finally, when the distance between two consecutive markers was below 1 kb, the marker with the lowest MAF was discarded.

Analysis of phenotypic data
The standard score of phenotypic data for 20 traits and 167 accessions was submitted to principal component analysis (PCA) and to factorial discriminant analysis (FDA) using XLSTAT package. In the later analysis, the membership in subpopulations identified by Structure analysis on genotypic data (see below) was used as categorical variables.

Analysis of population structure
To analyze population structure, a sub-set of 825 SNP markers with less than 3.5% missing data, MAF>5%, and distant from one another of at least 100 kb was selected from the initial matrix. The population structure of the panel of 201 accessions was analyzed using the model-based approach of Structure v3.2. [50] which was run with the following parameters: haploid data, possibility of admixture, correlated frequencies for a number of subpopulations (K) varying from 1 to 10 and 10 runs per K value. To determine the number of subpopulations, both Evano's and DAPC methods were used as described in [43]. Once the likely number of subpopulations determined, each accession was assigned to one of those subpopulations, if the proportion of its inferred ancestry derived from a subpopulation was above 2/3 or 67%. Otherwise the accession was considered as admixed. A distance-based clustering method was also implemented: an unweighted NJ tree based on a simple matching matrix was constructed using DarWin v6 [51]. The subpopulation assignments driven from structure analysis were projected on this tree.

Linkage disequilibrium
The speed of decay of linkage disequilibrium (LD) in the panel was estimated by computing r 2 between pairs of markers on a chromosome basis using Tassel 5.2 software [52], and then averaging the results by classes of distance using XLSTAT.

Association mapping
First, a single marker regression based association analysis (Sm-GWAS) was performed for all phenotypic traits under a Mixed Linear Model (MLM) where markers and population structure (Q matrix) effects were considered as fixed and the kinship effect (K matrix) was considered as random. The MLM was run under the exact method option of Tassel 5.2 software [53], where the additive genetic and residual variance components-the random factors of the mixed model-are re-estimated for each SNP. For each SNP tested, Tassel 5.2 computed a p-value, the log likelihoods of the null and alternative models, and the fixed-effect weight of the SNP with its standard error. The threshold to declare significant an association was set at a probability level of 1e-05.
Second, in order to ascertain the results of the single regression based association analysis eight phenotypic traits chosen for the importance of their discrimination power within the panel, were submitted to association analysis using two other methods, namely haplotype based GWAS and simultaneous fitting of all markers.
Haplotype based GWAS (Hap-GWAS) consists in testing the effect of haplotypes in sliding windows of SNPs (instead of individual SNP) across the genome for their association with phenotype. It is based on hypothesis that marker haplotypes are in greater LD with the QTL alleles than single markers [54]. If this is true, then the r 2 between the QTL and the haplotypes will increase, thereby increasing the power of the experiment [47]. The mixed model we used to test for the presence of a QTL at a given position was Y = X β + Z h h + Z u u + Ɛ where β is a fixed effect, which is the overall mean, and X = 1 n is a vector of ones. Vector u represents the random polygenic effects due to relationships among individuals. Vector h represents the random effects of haplotypes and is a function of the number of observed haplotypes at each tested position, which is defined by the center of the sliding window. Haplotype effects are treated as random as it is likely that many haplotypes will have low frequencies due to the small sample size (i.e. 167). The identity by state, between the marker alleles of haplotypes, was used as the allele identity predictor at the QTL. The size of the windows was equal to 6 consecutive markers, whatever the distance between those markers. The restricted maximum likelihood ratio test (RLRT) was used to decide for the existence of association for each tested position. Hap-GWAS was implemented using a script developed by Jacquin et al [54].
GWAS based on simultaneous fitting of all markers involves fitting the models that have been proposed for genomic prediction [55]. The model we chose was the Bayesian lasso [56], which assumes that SNP effects are random and are derived from a double exponential or Laplace distribution. This distribution induces a type of shrinkage of estimates that is size-ofeffect dependent, i.e. there is a possibility for SNPs to have a null, a moderate or a large effect. The Bayesian lasso model based GWAS (BL-GWAS) was implemented using BGLR statistical package [57]. BGLR output included the individual marker effects, the estimated posterior mean and standard deviation of the marker effects, the estimated posterior mean of genomic values of N accessions and statistics related to deviance information criterion DIC. To decide for the level of marker effects to be considered as significantly different from zero, we performed a permutation test. For each trait, 1000 permutations were performed and the significance threshold was set as to correspond to less than 0.005%. Moreover, to further ascertain loci with high effect, a robustness test was performed for PTHT and SPKST traits as follows: first, 100 samples were constructed by randomly drawing 75% of the accessions of the panel; then, samples were reanalyzed using the BL-GWAS that was used for the raw data to construct the distribution of the test statistics, i.e. mean and standard deviation of the effects of the 13,160 SNP markers over the 100 simulations. The threshold to declare significant an association was set at the false positive probability level of P< 0.05.

Analysis of co-localization of significant SNP with known QTLs
To analyze co-localization of significant SNPs detected by GWAS with QTLs reported in literature using progenies of biparental crosses, we used the QTL database developed by Courtois et al [58] and the Gramene QTL database (http://www.gramene.org/). In the case of SPKST, the database used was the one proposed by Ye et al [30], specifically focusing on SPKST resulting from high temperature during anthesis, plus the QTLs those authors have reported in their own paper. To shift from genetic maps to physical maps, the positions of the QTLs were determined by the physical position of the most significant markers linked to the QTLs. To determine the physical positions of the markers, Gramene data or BLAST results based on Gramene sequence information were used.

Marker distribution and linkage disequilibrium
The genotypic data set resulting from data cleaning pipeline and imputation, was composed of 13,160 markers (6493 DArT and 6667 SNP markers), which corresponded to an average density of one marker per 29 kb. There were 63 gaps of more than 250kb and nine gaps of more than 500 kb on chromosomes 2, 4, 6, 7, 8 and 11, among which two gaps of approximately 1 Mb (S2 Table). The median MAF varied from 12% (chr 7) to 20% (chr 11) and the overall median MAF was of 16% (S2 Table).
The r 2 estimate of LD in each of the 12 chromosomes averaged 0.51 for marker pairs whose distance was below 25 kb. The LD reached half its initial value at distance of 125-150 kb between markers and went below 0.1 at distances above 600 kb (S3 Table). There was very little variation between chromosomes for LD features. Given these LD features, the average density of one marker per 29 kb seemed amply sufficient for whole-genome association mapping.

Panel structure
Based on Structure results, the most likely number of subpopulations was four (data not shown). The assignments of the 201 accessions to one of the four subpopulations, based on the proportion of their inferred ancestry from each subpopulation, are represented on the NJ tree (Fig 1). Subpopulation SP1 (76 accessions) regrouped mostly traditional lowland indica varieties from all over the world with a sub-organization on the tree based on geographic origins. Subpopulation SP2 (71 accessions) regrouped improved indica varieties mainly from IRRI and AfricaRice breeding programs for lowland cultivation, and a few improved indica for upland cultivation. Subpopulation SP3 (19 accessions) was composed of traditional lowland varieties originating from Madagascar belonging to the special group that is found at medium elevation [59]. Subpopulation SP4 (6 accessions) regrouped aus varieties from the Indian subcontinent. The remaining 29 accessions were admixed. Among the 167 accessions of the panel, that were phenotyped in the present study, 55 belonged to subpopulation SP1, 66 to SP2, 18 to SP3, 4 to SP4 and 24 were admixed (S1 Fig).

Phenotypic diversity
A large variability was observed within the panel for each of the 20 phenotypic traits considered (S2 Fig). The first axis of a PCA performed with these traits including the 167 accessions accounted for 27.8% of the total variation and opposed DTHD, LFSNS and traits related to biomass (LFDW, DLFDW, SHDW, PNDW) to SLA and HDLT (Fig 2). Traits most contributing to the second axis (accounting for 13.6% of total variation) were mainly related to plant architecture: PTHT, TINB, LFAR, FLFAG, LFAG, PNEX and PNDM. SPKST had a rather small contribution to axes 1, 2 and 3 of the PCA, much higher contribution to axes 4, 5, 6 (about 11%) and axis 7 (52%). However, the axis 7 accounted for 4.5% of total variation, only.
The FDA using the membership in subpopulations defined by the structure analysis as categorical variables showed that those subpopulations differed significantly for the combinations of the 20 phenotypic traits considered (S3 Fig). The first axis of the FDA, accounting for 77.7% of total variation, opposed subpopulation SP1, composed of landraces with high plant stature and biomass, long duration, and exhibiting rapid senescence under high temperature treatment, to SP2 composed of improved varieties with shorter stature, smaller biomass and shorter duration. The second axis, accounting for 11.4% of total variation, opposed SP3 and SP4 that had contrasted SPKST, FLFAG, LFAG, LFAR, PNLT and PNNB.
Compared to other traits, SPKST, due to severe heat treatment during the process of anthesis, had a minor discriminating power among the four subpopulations (Fig 3). Indeed, while the mean spikelet sterility under control treatment (SPKSTc) averaged 24% with standard deviation of 13%, the mean SPKST under heat stress averaged 92% with standard deviation of 9%. Individual SPKST under control treatment and heat treatment are going in the same direction, but the overall relationship between SPKSTc and SPKST was very loose, r 2 = 0.029. The lowest SPKST were observed in N22, an aus variety from India, and Peh Kuh, an indica landrace from Taiwan. Some improved accessions such as B6144-MR-6-0-0 (from Indonesia), IR 22 and IR2344-P1PB-9-3-2B from the Philippines, and Andy 11 from Mali, exhibited medium tolerance.

Association analyses
Association mapping using single-marker-based regression. Results of association analysis for all phenotypic traits submitted to Sm-GWAS are presented in S4 Table. Results for the eight most discriminant traits among our diversity panel are presented in Table 1. At least one significant association was detected for all of the 20 phenotypic traits except PNLT and PNPOS. The number of SNP significantly associated with those 18 traits varied from one (for PNNB) to 54 (for LFAG). A total of 183 significant associations (p-value < 1e-05) were detected, of which two with p-value <1e-09, three with 1e-09<p-value <1e-08, nine with 1e-08<p-value <1e-07, and 19 with 1e-07<p-value <1e-06. The number of independent loci (i.e. cluster of SNPs with distance between two consecutive significant SNPs below 200kb, corresponding to an average r 2 >0.2) associated with each trait varied from one (PNEX, TINB, PNNB, DTHD and SHDW) to 15 (for SPKST) and reached a total of 95 (S4 Table). For instance, for PTHT and SPKST, the total number of significant SNPs was five and 27, respectively, and the number of independent loci was two and 15, respectively. These loci were composed of 1-5 SNPs not always adjacent, with p-value ranging between 1e-05 and 1e-09. The contribution of individual significant SNPs to the total variance of the trait considered (marker R2) varied from 3% (one of the SNPs associated to LFDW) to 25% (one of the SNPs associated to SPKST). Among the 183 significant SNPs, 43% had marker R2>10%. Among the 27 SNPs significantly associated with SPKST, 12 had marker R2>10%. The amplitude of allelic effects at individual significant SNPs was also high for almost all traits considered. For instance, it varied from -13 to +24 cm for PTHT and from -29% to 26% for SPKST. The average MAF of the significant SNPs was 0.18 but varied highly according to traits (from 0.04 for FLFAG to 0.41 for TINB). The average MAF was 0.36 for PTHT and only 0.09 for SPKST.
Association mapping using haplotype approach. As expected, the haplotype approach detected a much higher number of individual haplotypes associated with each of the eight phenotypic traits considered ( Table 2). The total number of significant haplotypes was 883 for the eight traits considered, against 127 SNP loci with Sm-GWAS. The number of significant individual haplotypes was 46 for PTHT and 160 for SPKST against five and 27 detected with Sm-GWAS. The number of independent loci (i.e. cluster of haplotypes with distance between two consecutive haplotypes below 200kb) associated to each trait was also much higher. It varied from 11 for PTHT and SHDW to 52 for SPKST. The mean size of the independent loci varied from 35 to 462 kb (average of 200 kb) for PTHT and from 18 to 925 kb (average of 254 kb) for SPKST. Mean MAF of SNPs composing individual significant haplotypes varied much less among the eight traits (from 0.18 for SLA to 0.27 for SHDW) than the one observed for Sm-GWAS. The mean MAF for SNPs composing each of 11 independent loci associated with  Association mapping using Bayesian lasso model. In spite of a very stringent significance threshold based on the results of permutation tests, simultaneous fitting of all markers using Bayesian lasso model detected a large number of SNPs with effects significantly different from zero. These 460 SNPs represented a total of 222 independent loci for the eight phenotypic traits considered ( Table 2). The model detected 75 SNPs with significant effect corresponding to 29 independent loci for PLHT (S5 Table) and 86 SNPs with significant effect corresponding to 27 independent loci for SPKST (S6 Table). The robustness test reduced the number of SNPs with high effect to 35, which corresponded to 14 independent loci, for PTHT (S5 Table) and to 23, which corresponded to 12 independent loci, for SPKST (S6 Table). Mean MAF among the 460 significant loci was much higher than in the case of the Sm-GWAS and varied from 0.30 for SPKST to 0.38 for TINB. Congruence of results between the three association analysis methods. The number of significant independent loci common to the three GWAS methods was trait-dependent. For instance, while in the case of PTHT, the two loci detected by Sm-GWAS were also detected by Hap-GWAS and by BL-GWAS, in the case of SPKST, among the 15 significant loci detected under Sm-GWAS, only 10 were detected under Hap-GWAS and 6 under BL-GWAS. And among these loci, only four were common to the three GWAS methods. Interestingly, as shown in the case of PTHT, each of the five individual SNPs detected under Sm-GWAS Table 2. Co-localization of the loci significantly associated with one of the eight phenotypic traits as detected by three GWAS methods.

Main features PTHT LFAG TINB SLA DTHD SHDW LFSNS21 SPKST Total Mean
Sm-GWAS Number of significant SNPs (P-value < 1e-05) 5  SNPs detected by Sm-GWAS and BL-GWAS belongs to a cluster of 5-10 SNPs forming haplotypes detected as significantly associated with PTHT by the Hap-GWAS method. The proportions of independent loci under Sm-GWA also detected by Hap-GWAS or BL-GWAS were, respectively, 7% and 2% for LFAG, 0% and 0% for TINB, 0% and 62% for SLA, 100% and 100% for DTHD, and for SHDW, 12% and 29% for LFSNS21, and 52% and 37% for SPKST. Overall, among the 56 independent loci under Sm-GWAS, 25 were common with the 193 significant independent loci mapped with Hap-GWAS, and 14 were common with the 222 significant independent loci mapped with BL-GWAS. The number of significant loci common to Hap-GWAS and BL-GWAS was 25. Finally, only 12 significant independent loci were common to the three GWAS methods, of which two for PTHT and four for SPKST.

Characteristics of loci associated to target traits
Co-localization with QTLs reported in the literature. We inventoried a total of 380 QTLs detected in progenies of biparental crosses for our eight target traits (S7 Table). The proportion of these QTLs co-localizing with at least one significant independent locus was 74% for Sm-GWAS, 23% for Hap-GWAS and 20% for BL-GWAS. The co-localization dropped below 5% for loci common to two or three methods.
The chromosomic localization of the 54 QTLs inventoried for SPKST under high temperature and the localization of significant independent loci detected with each of the three GWAS methods is summarized in Fig 4. Among the 15 significant loci associated with SPKST under Sm-GWAS, nine (60%) colocalised with at least one QTL detected in progenies of biparental crosses. The proportion was 33% for the 52 independent loci under Hap-GWAS, 66% for the 27 independent loci under BL-GWAS, and 42% for the 12 independent loci under BL-GWAS after robustness test. Specially, significant independent loci detected by at least two GWAS methods colocalised with the most consistent QTL mapped on chromosome 4 for SPKST under high temperature in the progenies of both IR64/N22 and IR64/Giza178 crosses. Likewise, significant loci colocalised also with several other QTLs mapped in the progenies of these crosses (Fig 4; S6 Table).
Genomic localization. To examine the genomic localization of the independent loci, we identified the positions of the most significant SNPs by MSU database search and gene annotation. Among a total of 471 such SNPs detected by the three GWAS methods for the eight target traits, 51% were located in intergenic regions, 24% in introns, 11% in exons with synonymous coding effects, 7% in exons with non-synonymous coding effects, 6% in UTR-3 regions and 0.2% in stop-gained sites (data not shown). These proportions were similar to those observed among all the 13,160 SNPs used for GWAS. Likewise, little variations were observed for these proportions among traits and among GWAS methods (data not shown). Thus, we limited further analysis of localization to the cases of PTHT and SPKST and to loci common to at least two GWAS methods. As our diversity panel included semi-dwarf accessions, we used the PTHT case to check the power of our data and GWAS methods to detect the semi-dwarfing loci Sd1, OsGA20ox2, (Loc_Os01g08838, Chr.1: 38,382,385-38,385,469).
Genomic localization of loci associated with PTHT. The two independent loci associated with PTHT under two or three GWAS methods were both located downstream from the semi-dwarfing gene Sd1. The closest locus, represented by SNP S01_38434998, was located 52.6 kb downstream of Sd1. The marker affected PTHP by 15.8 cm, and had a R2 of 0.08. The second locus, SNP S01_38999211, was located 617 kb downstream of Sd1. The marker affected PTHP by 25.3 cm, and had a R2 of 0.13. Despite a distance of 564 kb, linkage disequilibrium between the two SNP loci was high (r 2 = 0.5), and alleles at the two loci formed three haplotypes corresponding to significantly different PTHT (p<0.0001). The three haplotypes also significantly distinguished the four sub-populations identified within the panel: while the haplotype AG (mean PTHT = 145 cm) and AT (mean PTHT = 99 cm) were present exclusively in sub-populations SP1 (traditional lowland varieties) and SP3 (improved lowland varieties), respectively, the haplotype GG (mean PTHT = 141 cm) was shared by subpopulations SP1, SP2 and SP4. Thus, (i) the power of our data and GWAS methods in detecting loci with rather high effects was confirmed despite the superposition of the panel structure and the distribution of the target traits within the panel, and (ii) the rule we used to define "independent loci" (distance of 200 kb between adjacent SNP, corresponding in average to a r 2 > 0.2) did not apply in the vicinity of Sd1 locus subject to high selection pressure within the rice breeding programs.
Genomic localization of loci associated with SPKST. Among the 22 SNPs significantly associated with SPKST under two or three GWAS methods, 13 were located in annotated genes and 7 in intergenic regions. Among the annotated genes, at least 7 were described as involved in plant response to abiotic stresses or to gametophyte development ( Table 3). Survey of genome annotation within a surrounding interval of 100 kb (50 kb downstream and 50 kb upstream), each of the 22 significant SNPs led to the identification of at least one gene with product involved in (i) plant response to abiotic stresses and transcription regulation such as heat shock proteins (SHP), wall associated kinase (WAK), wall receptor-like protein kinase (WRKY), F-box domain containing protein, serine carboxypeptidase homologue, (ii) malefemale interactions in plant sexual reproduction, gametophyte development, and senescence such as DEFL peptides, F-box, and the seed maturation protein PM23, (iii) cell division such as the SNF2 family N-terminal domain containing protein, the TBC domain containing protein, regulator of chromosome condensation, (iv) osmotic adjustment such as the cation/H + exchanger OsCHX15 (Table 3). For instance, in the case of the significant SNPs colocalizing with the major QTLs mapped in IR64/N22 and IR64/Giza178 crosses, genome survey led to a F-box/LRR-repeat protein and OsMADS21 (involved in flower development) for qHTSF1.1, to OsWAK43 and OsWAK44 (involved in gametophyte development) for qHTSF4.1, and to OsFBX187 and OsIAA20 for qHTSF6.1.
Haplotypes associated with SPKST in our panel and within the O. sativa major groups. Haplotype construction with the genotype of our 167 accessions at the 22 SNPs' loci associated with SPKST under at least 2 GWAS methods led to two major haplotypes with significantly different spikelet sterility (80% and 94%, p <0.001). The two most tolerant To explore the genetic diversity at the vicinity of loci associated with SPKST within the O. sativa major groups, we extracted genotypic data from the IRIC database (http://oryzasnp.org/ iric-portal) that hosts the data of 3 000 rice genomes: 3KRG project. Haplotype construction within the accessions of the 3KRG simultaneously using genotypes at the 22 significant SNPs or the 15 corresponding independent loci, did not yield simple and readable patterns. In contrast, haplotypes at the vicinity of individual significant loci were much more informative. For instance, the 19 SNPs extracted from the interval of 100 bp surrounding the significant SNP D04_17876533, that colocalised with qHTSF4.1 on chromosome 4, formed 5 major haplotypes. And while four of these haplotypes were related chiefly to membership to genetic group (indica, japonica, aus, or aromatic), the fifth one assembled accessions from indica and japonica groups, and almost all aus accessions. This haplotype included heat tolerant accessions common to our diversity panel such as N 22, Taichung native-1, IR22 and B6144 (S5 Fig). Likewise, the 510 SNPs available in Loc_OS04g29960 (OsWAK43), with which D04_17876533 colocalised, formed 5 major haplotypes. One of these haplotypes clustered a large number of accessions from different genetic groups and hosted the heat tolerant accessions common to our diversity panel (S5 Fig). Haplotype analysis at the vicinity of the other SPKST-associated loci led to similar patterns.

Discussion
The aim of this work was to explore the phenotypic diversity for rice spikelet sensitivity to high temperature, and the associated allelic variations, and to draw a unified picture of the genetic bases of this important trait in the context of climate change, by connecting QTLs previously mapped in bi-parental crosses to the underlying genes. Given the recent development of new GWAS methodologies, we also explored the added value of two methods that resort to more complex models than the classical single marker regression to detect SNP loci associated with phenotypic variation. During our phenotyping experiment for spikelet sensitivity to high temperature, several other traits have also been measured. Some of these traits were related to the plant ability to reduce panicle temperature through transpiration, via either large leaf area (LFAR, SLA, GLFDW, DLFDW, LFSNS 16, LFSNS 21, SLA) or appropriate architecture (PTHT, PNPOS, FLFAG, LFAG), others related to plant potential performance (PNNB, TINB, PNLT, PNEX, SHDW, ST+PNDW), and one related to the duration during which the plant may be susceptible to heat (HDLT). Although we could not find any direct relationship between these traits and SPKST, we included them in our GWAS work with the aim of (i) further ascertaining the suitability of our diversity panel for GWAS analysis and (ii) enriching existing QTL data bases for those traits.
The temperature of 38˚C applied during 6 consecutive days at the time of flowering was rather severe, leading to high SPKST for a large number of accessions. However, this did not change the ranking of the most tolerant and most susceptible accessions as reported in literature [2].
Our genotypic data of 13,160 SNP provided a reasonably good coverage of genome (average density of one marker per 29 kb) given the average extent of LD of 125-150 kb. Indeed, we may have failed to detect loci associated with our target traits in only nine chromosomic segments free of markers over more than 500 kb.
The 167 accessions of our diversity panel were structured into four subpopulations, the main structuring element being (i) landrace status (subpopulation SP1) versus modern variety (SP2) and the associated tall or semi-dwarf plant stature, (ii) adaptation to specific ecology and cropping system, such as high elevation areas (SP3) and rainfed lowland cropping in low elevation areas (SP4). Given the well-known risk of false-positive associations in such structured panels [60], we systematically included in the association mapping models a fixed effect with design matrix Q to account for population structure. Under the three GWAS methods we utilized, the detection of the loci associated with plant height variations resulting from the presence or absence of the semi-dwarfing gene Sd1 on the long arm of chromosome 1, highlighted the effectiveness of the Q matrix in balancing (i) the risks of false positive tests inherent to population structure and (ii) the risk of false negative tests due to the superposition of the distribution of the phenotypic variability for plant height with the panel structure.
An important issue for GWAS is the minimal size of the panel that provides enough power to detect associations of a given size. The power of the association analysis necessary to detect a QTL by testing the marker effect depends on the LD between markers, the proportion of total phenotypic variance explained by the QTL, the size of the population, the allele frequency of the rare allele at this marker and the significance level α set by the experimenter [47]. In the case of Sm-GWAS, the issue of the significance threshold is reflected into the dilemma of balance between (i) the very stringent correction for the number of markers tested using Bonferroni correction to obtain experiment wise p-value and (ii) the fact that tests on the same chromosome are not independent, as close markers are often in linkage disequilibrium with each other as well as with the QTL. We opted for the common practice adopted in the literature of using a p-value of 1e-05, and then tried to confirm the detected association using Hap-GWAS and BL-GWAS methods. However, the implementation of these methods also faces the question of significance threshold for haplotype or individual marker effect to be declared significant. We resorted to permutation test in the case of BL-GWAS, but despite a reasonably high number (1000) of permutations the stringency was low, leading to a large number of significant loci. By contrast, the robustness test proved to be more efficient though the experimenter still had to determine a threshold.
For each of the eight phenotypic traits considered, Hap-GWAS and BL-GWAS detected a much higher number of significant SNPs (seven times and four times more, respectively) and independent loci (3.5 and four times, respectively) than Sm-GWS. In the case of Hap-GWAS, these higher numbers were expected as marker haplotypes can be in higher LD with a QTL than individual markers. And this led to declare as significant SNPs with very low individual contribution to the trait total variance. For instance, the mean marker R2 of the 160 SNPs belonging to haplotypes significantly associated with SPKST was 2.9% against 11.4% for the 27 SNPs detected by Sm-GWAS. A less expected phenomenon was the fact that all significant SNPs detected by Sm-GWAS were not detected by hap-GWAS. For instance, in the case of SNPs associated with SPKST, only 14 out of the 27 were detected by hap-GWAS. This is probably due to the fact that most of the 27 significant SNPs under Sm-GWAS had rather small MAF and the frequency of haplotypes constructed with these SNPs was still smaller. Moreover, in some cases, the distances between the 6 consecutive SNPs serving to haplotype construction were high, leading to low LD between markers composing the haplotype. For instance, in the case of SPKST, while the average MAF among the 14 SNPs detected by both Sm-GWAS and Hap-GWAS was 17%, it was only 11% in the case of the remaining 13 SNPs. Likewise, while the length of the chromosomic segment covered by the haplotype window of 6 SNPs surrounding each of the 14 significant SNPs under the 2 GWAS methods was 100 kb in average, it was 216 kb for the 13 SNPs not detected by Hap-GWAS.
In the case of BL-GWAS, the prior assumptions on the distribution of possible SNP effects (double exponential or Laplace prior) we used proved, afterwards, not to be the most adapted to our data and thus not stringent enough. Robustness test applied to PTHT and SPKST allowed refining the threshold of significance for the SNPs with the greatest contribution to the observed phenotypic variation. The number of SNPs with p <0.05 was 35 (12 independent loci) and 29 (16 independent loci) for PTHT and SPKST, respectively. As expected, under BL-GWAS, SNPs with smaller R2 were declared significant. For both PTHT and SPKST, the mean marker effect was approximately 5% under BL-GWAS against 11% under Sm-GWAS. The significant SNPs included all those detected by Sm-GWAS, for PTHT, and only 10 of the 27 SNPs for SPKST. The major difference between the 10 common and the 17 non-common SNPs was the MAF, with a marker effect of 18% in average for the former and 5% for the latter. Similarly, while the average MAF for the 27 SNPs significantly associated with SPKST under Sm-GWAS was only 10%, it was 29% for the 35 SNPs significant under BL-GWAS. Thus the two methods differed strongly for their sensitivity to MAF.
The conclusions one can draw from the comparison of results obtained from the three GWAS methods are their differential sensitivity to (i) the genetic architecture of the phenotypic trait and its distribution within the diversity panel, (ii) the characteristics of the genotypic data in terms of regularity of markers distribution along the genome and of equilibrium between alleles at individual SNP loci. Thus, confronting the results of several GWAS methods on a given dataset helps guarding against the effects of their sensitivity to specific features of the data. However, this solution is only a first step. The most reliable solution against those risks remains validation in an independent GWAS experiment and population.
We have detected 14 independent loci significantly associated with SPKST under at least 2 GWAS methods. Favorable alleles for some of these loci were present in some accessions of three subpopulations (SP1, SP2, and SP4) of our diversity panel, but only two accessions, one from SP1 (indica landrace) and one from SP4 (aus) assembled all favorable alleles. Haplotypes analysis at the vicinity of individual significant loci within the accessions of the 3KRG confirmed the fact that favorable alleles for heat tolerance during anthesis are not confined to a few accessions but are rather widespread across different genetic groups of O. sativa. However the number of accessions bearing the favorable allele for spikelet fertility despite high temperature at all loci is probably limited. Indeed, within the 3KRG panel, construction of haplotypes using simultaneously the genotypes at all significant loci did not yield a simple and readable pattern as it was the case for individual loci and for our own diversity panel.
Among the 14 independent loci associated with SPKST, eight colocalised with QTLs reported in the literature for tolerance to high temperature during reproductive stage. Our GWAS experiment allowed (i) to narrow-down the position of these eight QTLs detected in bi-parental crosses and (ii) to detect six new QTLs. Among the QTLs reported in the literature, qHTSF4.1 was the most consistent as it was detected across different genetic backgrounds, including the progenies of crosses between the two tolerant varieties N22 and Giza178 with the moderately tolerant variety IR64 [30]. The interval for this QTL reported by these authors was 63.5-80.5 cM, approximately corresponding to the physical interval of 15-19 Mb [61]. Our results narrowed down this interval to LOC_Os04g29960, with a significant non-synonymous SNP (D04_17876533). This locus codes for a wall-associated kinase, OsWAK43, and stands at 11.6 kb of a second WAK locus, OsWAK44. The WAKs are a sub-family of receptor-like kinases associated with the cell wall. They have been suggested as sensors of the extracellular environment and triggers of intracellular signals [62]. In A. thaliana, disruption of WAK function compromises leaf cell expansion and root cell elongation [63]. In rice, several WAKs participate positively and negatively in basal defense against rice blast fungus [64]. Their involvement in response to abiotic stress is also proposed, with the production of reactive oxygen species and of downward redox signaling, which have a vital function in plant growth, development, and stress response [65]. Some WAKs are also involved in rice gametophyte development [66].
Gene families underlying the 14 independent loci associated with SPKST corresponded to functions ranging from sensing abiotic stresses and regulating plant response to those stresses, to cell division and gametophyte development. This is consistent with the differential morphophysiological and biochemical responses of rice varieties tolerant or susceptible to heat stress during anthesis. The differential responses apply notably to (i) the degree of anther dehiscence and the length of pollen tube, (ii) the pollen moisture content and the rate of pollen survival, and (iii) the pollen membrane stability indices, the pollen total protein concentration, the ratio of protein with high and low molecular mass, and the concentration of heat shock proteins [23,25]. The first set of differential responses results from the expression of genes involved in cell expansion such as WAK [65]. The second set results from upregulation of genes involved in osmotic adjustment such as CHX as mature pollen desiccates and then rehydrates at germination [67,68]. The third set results from upregulation of molecular chaperones such as HSP, and HEAT repeat domains, that recognize and bind substrate proteins that are in an unstable state [69,70], J-proteins that can partner with HSP [71] and F-box proteins that are critical for the controlled degradation of cellular proteins [72]. Of course, all these differential responses are also related to the expression of genes involved in sensing and signaling high temperature such as the WAK family, and of transcription factors regulating cell and plant response to high temperature such as the SNF2 [73] and the WRKY [74] families.
Given the diversity of the biological processes and associated genes involved in rice response to high temperature during the anthesis process, transfer of tolerance from donors such as N22 and Giza178 to elite materials has little chance to be achieved through conventional and/or marker-assisted backcross breeding. Other approaches such as marker-assisted recurrent selection in bi-parental crosses [75] or, even better, genomic selection in dedicated synthetic population [76] need to be considered.