Association Mapping for Important Agronomic Traits in Core Collection of Rice (Oryza sativa L.) with SSR Markers

Mining elite genes within rice landraces is important for the improvement of cultivated rice. Association mapping for 12 agronomic traits was carried out using a core collection of rice consisting of 150 landraces (Panel 1) genotyped with 274 simple sequence repeat (SSR) markers, and the mapping results were further verified using a Chinese national rice micro-core collection (Panel 2) and a collection from a global molecular breeding program (Panel 3). Our results showed that (1) 76 significant (P<0.05) trait-marker associations were detected using a mixed linear model (MLM) within Panel 1 in two years, of which 32% coincided with previously mapped QTLs, and 11 significant associations explained more than 10% of the genetic variation; (2) a total of seven of these trait-marker associations were verified within Panel 2 and Panel 3 when using a general linear model (GLM) and the 55 SSR markers underlying the 76 significant trait-marker associations, whereas no significant trait-marker association was shared among the three panels when using the MLM; and (3) several desirable alleles of the loci that showed significant trait-marker associations were identified. This research provides important information for further mining these elite genes within rice landraces and using them in rice breeding.


Introduction
As a staple cereal crop, rice (Oryza sativa L.) feeds more than 50% of the world's population [1] and is one of the most important components of the human diet in many regions of the world. Thus, genetic improvement of rice for yield is important to meet the food demand of a growing global population. Rice landraces have greater genetic diversity than elite (commercial) cultivars and represent an intermediate stage in domestication between wild rice and elite cultivars [2]. They are therefore easier to use in rice breeding than wild rice while still retaining most of the diversity in rice germplasm resources. Mining elite genes within the germplasm of rice landraces is thus important for the improvement of cultivated rice.
Linkage mapping and association mapping based on linkage disequilibrium (LD) are the two main methods for locating genes or QTLs. The major limitations of linkage mapping are that only two alleles at any given locus can be studied in bi-parental crosses and that mapping resolution is low [3], whereas association mapping promises to overcome these limitations [4]. Moreover, association mapping identifies QTLs by examining trait-marker associations and enables researchers to use modern genetic technologies to exploit natural diversity and locate valuable genes in the genome [5].
Association mapping has been widely used in plant research since it was first reported in maize [6,7]. In recent years, it has been applied in Arabidopsis, maize, barley, durum wheat, spring wheat, sorghum, sugarcane, sugar beet, soybean, grape, forest tree species and forage grasses [8] as well as rice [9,10,11,12]. For example, association mapping was performed with 60 simple sequence repeat (SSR) markers and 114 restriction fragment length polymorphism (RFLP) markers for 12 agronomic traits within 218 inbred lines of rice originating from the United States of America (USA) and Asia [13]. Association mapping was also performed for five agronomic traits in a population of 103 cultivars using 123 SSRs [14], and for grain shape using a collection of 293 accessions of Asian cultivated rice [15]. Starch quality traits were mapped using both candidate gene-based association mapping and genome-wide association study (GWAS) strategies [16]. More than 3.6 million SNPs were detected by sequencing 517 rice landraces and used in a GWAS for 14 agronomic traits [17]. However, to our knowledge, association mapping with a large number of SSR markers has seldom been performed. Moreover, no earlier study performed association mapping in one population and verified the mapping results in other populations.
The choice of appropriate germplasm to maximize the number of historical recombinations and mutation events (and thus reduce LD) within and around the gene of interest is critical for the success of association analysis [18]. One of the methods to obtain most of the phenotypes is to construct a core collection. A core collection is a subset chosen to represent most genetic diversity of an initial collection with a minimum of redundancies [19,20,21].
Core collections give users access to useful samples of small size while still retaining most of the genetic variability contained within the gene pool of a specific crop [22]. The construction of core collections has been widely applied in rice as well as in other crops. Thus, a core collection might be an ideal population for association mapping. Some rice core collections have been used as association mapping populations in previous studies [23,24]. However, the mapping populations in those studies were two subsets of 547 and 203 accessions chosen randomly from the United States Department of Agriculture (USDA) rice core collection of 1790 rice entries, and random subsets cannot effectively maintain the genetic diversity of the original collection. Moreover, the number of SSR markers used for genotyping in those studies was low (72 and 155). As far as we know, no earlier research on association mapping based on a core collection of rice landraces is available.
Population structure may cause false positives in association mapping. To overcome this problem, a mixed model approach was proposed for association mapping, which takes both population structure (Q) and kinship (K) into account to reduce false positives [25]. In recent years, comparisons of different statistical models, e.g. Q, Q+K and P+K, have been conducted for Arabidopsis [26], sweet sorghum [27], maize [28] and rice [23]. However, false positives might not be entirely avoided by these models. To guard against them, significant associations identified within one population should be verified in another population [29].
In our previous studies, a rice core collection (Ting's rice core collection) consisting of 150 accessions of rice landraces was constructed from 2262 accessions of rice landraces of Ting's collection, based on 15 quantitative traits and 34 qualitative traits and using an optimal sampling strategy [30]. Moreover, the population structure and LD of the rice core collection were examined in detail [31]. In this study, association mapping was performed for 12 agronomic traits in Ting's core collection genotyped with 274 SSR markers. The significant trait-marker associations identified in this population were then verified within a Chinese national rice micro-core collection and a collection from a global molecular breeding program. The study aimed to (1) perform association mapping for 12 important agronomic traits in Ting's core collection and verify some of the mapping results in two other collections, (2) compare the effectiveness of different statistical models and different significance thresholds for association mapping, and (3) identify desirable alleles of the loci that showed significant trait-marker associations for rice breeding.

Plant material
Three rice collections, i.e. Ting's core collection (Panel 1), the Chinese national micro-core collection (Panel 2), and a collection from the core collection of a global molecular breeding program (Panel 3), were used in this study. The original collection underlying Panel 1 was assembled by the researcher Ying Ting during 1920-1964 from all over China as well as from Korea, Japan, the Philippines, Brazil, Celebes, Java, Oceania, and Vietnam, and comprises 7128 rice landraces [32]. The core collection (Panel 1) with 150 accessions was constructed from 2262 of the 7128 accessions based on a strategy of stepwise clustering and preferred sampling on adjusted Euclidean distances and the weighted pair-group average method using integrated qualitative and quantitative traits [30]. Panel 2 with 197 accessions was provided by China Agricultural University, and Panel 3 with 122 accessions was provided by the International Rice Research Institute (IRRI). The information for each variety is shown in Table S1 in File S1.

Phenotyping
All three panels were cultivated at the farm of South China Agricultural University, Guangzhou (23°16′N, 113°8′E), during the late season (July-November) for two consecutive years (2008 and 2009). A randomized complete block design with three replications was used during each season. The spacing between rows and between plants was set to 20 and 16.5 cm, respectively. Thirty plants of each variety were grown in three rows with 10 plants per row. For each block, the five plants in the middle of the second row of each variety were selected so that the marginal effect was avoided. Twelve agronomic traits were investigated for these plants. Heading date (HD) was recorded as the number of days from sowing to the time when 30% of the individuals of a variety had started flowering. Plant height (PH), panicle length (PL), grain length (GL), grain width (GW), flag leaf length (FLL), and flag leaf width (FLW) were measured in centimeters. Seed set rate (SS, %) was the number of filled grains expressed as a percentage of the total grains per plant. For 1000-grain weight (1000GW), 100 grains were weighed in grams with three replicates and the average was multiplied by 10. For grain length (GL) and width (GW), ten grains were randomly selected and measured with a digital vernier caliper.
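The two derived traits described above reduce to simple arithmetic. The following is a minimal sketch of those calculations; the function names and the example values are hypothetical and are not taken from the study.

```python
from statistics import mean


def thousand_grain_weight(hundred_grain_weights_g):
    """Mean weight of the 100-grain replicates (g), scaled to 1000 grains."""
    return mean(hundred_grain_weights_g) * 10


def seed_set_rate(filled_grains, total_grains):
    """Filled grains as a percentage of all grains on a plant."""
    if total_grains <= 0:
        raise ValueError("total_grains must be positive")
    return 100 * filled_grains / total_grains


# Illustrative values only (not measurements from the study)
print(thousand_grain_weight([2.4, 2.5, 2.6]))
print(seed_set_rate(filled_grains=80, total_grains=100))  # 80.0
```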
Genotyping
A total of 274 SSR markers evenly distributed across the 12 chromosomes of rice were selected to genotype all varieties in Panel 1 (Table S2 in File S1). Of these markers, 23, 25, 24, 22, 21, 22, 21, 25, 23, 24, 23, and 21 were mapped to chromosomes 1 to 12, respectively. The average distance between loci on chromosomes 1 to 12 was 7.5 cM, 8.2 cM, 9.4 cM, 7.4 cM, 7.1 cM, 6.3 cM, 5.8 cM, 5.4 cM, 5.2 cM, 4.7 cM, 5.6 cM and 5.3 cM, respectively. Markers with the prefix RM are summarized in [33,34,35,36] and those with the prefix PSM are summarized in [37]. DNA was extracted using a modified SDS method [38]. The volume of the polymerase chain reaction (PCR) was 10 μl. The PCR program was as follows: 94°C for 5 min, followed by 29 cycles of 94°C for 1 min, 55°C for 1 min and 72°C for 1 min, with a final extension of 5 min at 72°C. PCR products were separated by size using 6% polyacrylamide gel electrophoresis and detected by silver staining [39]. A standard marker (100-600 bp, produced by Shanghai Biocolor BioScience & Technology Company) was added to each gel as a control during the gel run. The sizes of the PCR products were detected by the BIO Imagine System with the software Genetools from SynGene and were manually rechecked twice [31]. The length of each allele was scored by comparison with the bands of the standard marker.
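As a quick consistency check on the marker distribution reported above, the per-chromosome counts can be summed. This sketch only restates numbers given in the text; the variable names are illustrative.

```python
# Number of SSR markers mapped to chromosomes 1 to 12, as listed in the text
markers_per_chromosome = [23, 25, 24, 22, 21, 22, 21, 25, 23, 24, 23, 21]

total_markers = sum(markers_per_chromosome)
print(total_markers)  # 274
```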

Data analysis
Means and standard deviations (SD) for the 12 traits were calculated using Excel. The percentage of phenotypic variation explained by population structure was calculated using a General Linear Model (GLM) with the software SPSS 17.0 for Windows (SPSS Inc. Chicago, IL, USA). The broad-sense heritability ($H^2$) was calculated as $H^2 = \sigma_g^2 / (\sigma_g^2 + \sigma_e^2)$, where $\sigma_g^2$ is the genetic variance and $\sigma_e^2$ is the environmental variance. Both variance components were calculated using the software QGA Station 1.0 (Zhu Jun, Zhejiang University, China). Correlation coefficients between traits were calculated using SPSS.
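The heritability ratio defined above can be evaluated directly once the two variance components are available. The sketch below assumes the components have already been estimated (the study obtained them from QGA Station); the function name and the example values are hypothetical.

```python
def broad_sense_heritability(var_g: float, var_e: float) -> float:
    """Return H^2 = var_g / (var_g + var_e)."""
    if var_g < 0 or var_e < 0:
        raise ValueError("variance components must be non-negative")
    total = var_g + var_e
    if total == 0:
        raise ValueError("total variance is zero")
    return var_g / total


# Illustrative variance components only (not estimates from the study)
h2 = broad_sense_heritability(var_g=3.0, var_e=1.0)
print(f"H^2 = {h2:.2f}")  # H^2 = 0.75
```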
Polymorphism information content (PIC), which measures the extent of polymorphism of a marker, was calculated using the program POWERMARKER V3.25. The software Structure V2.3.1 was used to infer population structure and obtain Q matrices [40,41]. A range of genetic clusters from K = 1 to 15 was examined with the admixture model, with five replicate runs for each K. Each run used a burn-in period of 100,000 steps followed by 100,000 Markov chain Monte Carlo replicates. Because the distribution of L(K) did not show a clear cutoff point for the true K, the ad hoc measure ΔK was used to determine the number of subgroups. The run with the maximum likelihood was used to assign the varieties to subgroups based on maximum membership probability. A Q-matrix was obtained from the membership probabilities of each variety. Our previous study indicated that there were two distinct subgroups in Panel 1, which corresponded to the germplasm types of indica and japonica rice [31]. The Q-matrix was used for further association mapping. The Loiselle algorithm was chosen for calculating the kinship matrix (K) with the software SPAGeDi [42]. Rare alleles with a frequency of less than 10% in the population were treated as missing data in the association analysis. Quantile-quantile plots of observed against expected $-\log_{10}(P)$ were generated using SAS version 9.0 (SAS Institute 2002), where observed P values were obtained from association mapping and expected P values were derived under the assumption of no association between marker and trait.
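The ΔK criterion mentioned above is commonly computed from the replicate log-likelihoods L(K) as the absolute second-order difference of the mean L(K) divided by the standard deviation of L(K) across replicates. The sketch below follows that common formulation, which is an assumption here because the text does not spell out the exact computation; the function name and the log-likelihood values are hypothetical.

```python
from statistics import mean, stdev


def delta_k(lnp):
    """Compute a Delta K value for every interior K.

    lnp maps K -> list of log-likelihoods L(K) from replicate runs.
    """
    result = {}
    for k in sorted(lnp):
        if k - 1 not in lnp or k + 1 not in lnp:
            continue  # Delta K is undefined at the ends of the K range
        sd = stdev(lnp[k])
        if sd == 0:
            continue  # skip K values whose replicates are identical
        second_diff = abs(mean(lnp[k + 1]) - 2 * mean(lnp[k]) + mean(lnp[k - 1]))
        result[k] = second_diff / sd
    return result


# Made-up log-likelihoods for K = 1..4 with three replicates each
example = {
    1: [-5000.0, -5002.0, -4998.0],
    2: [-4200.0, -4201.0, -4199.0],
    3: [-4150.0, -4160.0, -4140.0],
    4: [-4120.0, -4135.0, -4105.0],
}
scores = delta_k(example)
best_k = max(scores, key=scores.get)
print(best_k)  # 2
```

With these made-up inputs the largest ΔK falls at K = 2, which is how a two-subgroup structure would typically be read off such a curve.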
Association analysis was performed using the software TASSEL (www.maizegenetics.net/tassel). For the mixed linear model (MLM) method, both the K and Q matrices were incorporated, whereas for the GLM method, only population structure information (Q-matrix) was used as a covariate. The significance of associations between loci and traits was determined by the P values (P<0.05) calculated by the statistical models, and the phenotypic variance explained by the significant loci was calculated through analysis of variance (ANOVA). Since the MLM method performs better in controlling spurious associations than the GLM method [43], we first ranked the significant (P<0.05) associations from the MLM and then compared the significance of these markers (P<0.05) in the permutation-based GLM association tests. For this comparison, we calculated and used two further significance thresholds, i.e. the minimum Bayes factor (BF) and the Bonferroni threshold, in addition to the P value. BF was calculated using the formula $BF = -e \cdot P \cdot \ln(P)$ [44,45]. The Bonferroni threshold [46] was 1/274 = 0.00365, where 274 is the number of association tests for each trait in this study. Duncan's multiple comparison test was implemented in SPSS to compare the performance of agronomic traits associated with different alleles of the significant trait-marker associations.
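The two additional thresholds can be reproduced with a few lines of arithmetic. This is a minimal sketch of the formulas as stated in the text; the function name is hypothetical, the restriction to P below 1/e reflects the range in which this bound is normally applied, and no BF cutoff is assumed because the text does not give one.

```python
import math


def minimum_bayes_factor(p: float) -> float:
    """Minimum Bayes factor bound -e * p * ln(p), for 0 < p < 1/e."""
    if not 0 < p < 1 / math.e:
        raise ValueError("p must satisfy 0 < p < 1/e")
    return -math.e * p * math.log(p)


N_TESTS = 274  # association tests per trait, as stated in the text
bonferroni_threshold = 1 / N_TESTS  # threshold as defined in the text

print(round(bonferroni_threshold, 5))  # 0.00365
print(minimum_bayes_factor(0.05))      # roughly 0.41
```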

Phenotypic variation
The rice landraces in Panel 1 showed a wide range of phenotypic variation in the 12 agronomic traits (Table 1). Heading date, plant height, 1000-grain weight, flag leaf length, flag leaf length/width, and panicle number per plant showed similar distributions in both years (Figures S1-S6 in File S1). On average, about 12.4% of the phenotypic variation was explained by population structure. The broad-sense heritability ranged from 74.8% (1000GW) to 99.8% (GW) for these traits. (Table 2). Extremely significant (P<0.01) negative correlations were found in both years between HD and 1000GW, GW and GL/GW, FLW and FLL/FLW, and FLW and PN.

Relative kinship among individuals in the three panels
In Panel 1, about 55% of pairwise kinship estimates were zero and only 4.73% of pairwise kinship coefficients were larger than 0.5, indicating that these varieties were largely unrelated (Figure 1). In Panel 2 and Panel 3, 55.9% and 60.4% of pairwise kinship coefficients were larger than 0.5, respectively (Figure S7 in File S1), indicating that the varieties in these panels were related to some degree.
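Summaries of this kind can be derived from a kinship matrix by counting pairwise (off-diagonal) values. The sketch below is an illustration only: the function name and the toy matrix are hypothetical, and truncating negative estimates to zero is an assumption about how zero-valued pairs arise rather than something stated in the text.

```python
def kinship_summary(kinship):
    """Return the fractions of pairwise kinship values equal to 0 and above 0.5."""
    n = len(kinship)
    # Use each unordered pair once; negative estimates are set to zero (assumption)
    pairs = [max(kinship[i][j], 0.0) for i in range(n) for j in range(i + 1, n)]
    total = len(pairs)
    if total == 0:
        raise ValueError("need at least two individuals")
    return {
        "zero": sum(v == 0 for v in pairs) / total,
        "above_half": sum(v > 0.5 for v in pairs) / total,
    }


# Toy symmetric matrix for three individuals (not data from the study)
toy = [
    [1.0, -0.1, 0.6],
    [-0.1, 1.0, 0.2],
    [0.6, 0.2, 1.0],
]
summary = kinship_summary(toy)
print(summary)
```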
The effect of controlling type I error using MLM
Observed versus expected P values for each trait-marker association were plotted to assess the control of type I errors. The observed P values followed the expected uniform distribution for all traits, and the distributions were similar in the two years (Figures 2 and 3). Because deviations from the expectation indicate that a statistical analysis may produce spurious associations [28], our results indicated that false positives were well controlled by the MLM method in this study.
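The coordinates of such a quantile-quantile plot can be generated as follows. This is a minimal sketch that assumes the expected P value for the i-th smallest of n tests is i/(n+1), one common convention for a uniform null distribution; the text does not state which convention was used in SAS, and the function name is hypothetical.

```python
import math


def qq_points(p_values):
    """Pair expected and observed -log10(P), ordered from the smallest P."""
    observed = sorted(p_values)
    n = len(observed)
    points = []
    for rank, p in enumerate(observed, start=1):
        expected_p = rank / (n + 1)  # expected order statistic of Uniform(0, 1)
        points.append((-math.log10(expected_p), -math.log10(p)))
    return points


# Illustrative P values only (not results from the study)
for expected, observed in qq_points([0.5, 0.1, 0.9]):
    print(f"{expected:.3f}\t{observed:.3f}")
```

Points lying close to the diagonal indicate that the observed P values behave like the null expectation, which is the pattern described for the MLM results.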
Trait-marker associations
A total of 152 trait-marker associations for the 12 agronomic traits were significant (P<0.05) in both 2008 and 2009 using the GLM, and 15 (~10%) of these 152 associations had been detected in previous studies (Table 3). Furthermore, 184 and 217 significant (P<0.05) trait-marker associations were identified using the MLM in 2008 and 2009, respectively. Among them, 76 trait-marker associations were significant (P<0.05) in both 2008 and 2009. The number of significant loci associated with each agronomic trait in both years ranged from 0 (seed set rate) to 13 (plant height). Moreover, 24 (~32%) of the 76 trait-marker associations were in the same or similar genomic regions where QTLs were detected in previous studies (http://www.gramene.org/), and the other 52 trait-marker associations were new associations that had not been identified previously.
Eleven of the 76 trait-marker associations explained 10% or more of the total variation (R²) in both 2008 and 2009, i.e. HD (PSM184), PH (RM530, RM590), PL (PSM184), GL/GW (RM447), FLL (RM287), FLW (RM235), 1000GW (RM7, RM538 and RM206), and PN (RM311) (Table 4). When BF and the Bonferroni threshold were used as significance thresholds, 15 and 3 of the 76 trait-marker associations remained significant, respectively. The three trait-marker associations that passed the Bonferroni threshold were also significant when using BF as the significance threshold. Furthermore, 59 of the 76 trait-marker associations were also significant when using the GLM in both years.
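Selecting associations that are significant in both years amounts to intersecting the per-year sets of significant trait-marker pairs. The sketch below illustrates that step under the P<0.05 criterion; the function name, the marker labels and the P values are placeholders, not results from the study.

```python
def consistent_associations(p_by_year, alpha=0.05):
    """Return the (trait, marker) pairs with P < alpha in every year.

    p_by_year maps year -> {(trait, marker): P value}.
    """
    per_year = [
        {pair for pair, p in p_values.items() if p < alpha}
        for p_values in p_by_year.values()
    ]
    return set.intersection(*per_year) if per_year else set()


# Placeholder P values for three hypothetical trait-marker pairs
p_by_year = {
    2008: {("PH", "M1"): 0.01, ("PH", "M2"): 0.20, ("HD", "M3"): 0.03},
    2009: {("PH", "M1"): 0.04, ("PH", "M2"): 0.01, ("HD", "M3"): 0.06},
}
shared = consistent_associations(p_by_year)
print(shared)  # {('PH', 'M1')}
```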

Impact of allele frequency on the power to detect a QTL
We further investigated the relationship between the P values of significant trait-marker associations and the PIC values of related markers. For all trait-marker associations, only 3.5% of markers had a PIC value lower than 0.2 (Figure 4). Most of the markers which showed significant associations with related traits had a PIC value larger than 0.2, which meant that these markers showed a higher power to detect a QTL.
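For reference, PIC is conventionally computed from allele frequencies as one minus the sum of squared frequencies, minus the sum over allele pairs of twice the product of their squared frequencies. The sketch below implements that standard definition as an illustration; it is not the POWERMARKER code, and the function name and allele labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pic(alleles):
    """Polymorphism information content from a list of observed alleles."""
    counts = Counter(alleles)
    n = sum(counts.values())
    freqs = [c / n for c in counts.values()]
    homozygosity = sum(p * p for p in freqs)
    # Correction term: sum over allele pairs of 2 * p_i^2 * p_j^2
    correction = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1 - homozygosity - correction


# Two equally frequent alleles give the biallelic maximum
print(pic(["a", "b"]))  # 0.375
```

A marker with one dominant allele yields a PIC close to zero, which is why low-PIC markers carry little power to detect an association.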

Verification of association mapping results in Panel 2 and Panel 3
Because some SSR markers showed more than one significant association, the 76 significant trait-marker associations in Panel 1 involved fewer than 76 markers, i.e. 55 SSR markers in this study. All 55 SSR markers were further used to genotype Panel 2 and Panel 3. Based on these genotyping data, the population structure of both Panel 2 and Panel 3 indicated two distinct subgroups (Figure S8 in File S1).
Association analysis was performed within the two panels using both the MLM and GLM approaches with the 55 SSR markers. A total of 20 and 31 significant trait-marker associations were detected using the MLM within Panel 2 and Panel 3, respectively. Seven significant trait-marker associations detected in Panel 1 using the MLM were also detected in Panel 2 or Panel 3 using the GLM. However, no trait-marker association was shared among the three panels when using the MLM (Table 5). In Panel 2, RM219 [47], RM469 [48] and RM204 [49] showed significant associations with plant height, as also reported in previous studies. Among them, the association of marker RM469 with plant height had the highest R² (10.08%). Similarly, in Panel 3, the association of marker RM590 with plant height had the highest R² (39.96%). RM339, which showed a significant association with heading days, was also reported in a previous study [50] (Table 6).

Performance of traits relevant to different alleles of significant loci
Seven markers, i.e. PSM184, RM447, RM469, RM235, RM206, RM311, and RM277, were selected for analysis of trait performance associated with different alleles of significant loci, based on their high explained percentage of genetic variation and their support by several significance thresholds (Table 4). For PSM184, the individuals carrying the allele 222 bp (the size of the PCR product for the SSR marker, the same below) had a significantly (P<0.01) lower plant height and panicle length than those carrying the other two alleles, 205 bp and 215 bp (Table 7). For RM447, the individuals carrying the allele 109 bp had a significantly (P<0.01) higher grain width and a significantly (P<0.01) lower grain length/width ratio than those carrying the other two alleles, 100 bp and 117 bp. For RM469, the individuals carrying the allele 94 bp had a significantly (P<0.01) lower flag leaf length than those carrying the other two alleles, 83 bp and 88 bp. For RM206, the individuals carrying the allele 162 bp had a significantly (P<0.01) higher 1000-grain weight than those carrying the other four alleles, 123 bp, 125 bp, 130 bp and 143 bp. For RM311, the individuals carrying the allele 143 bp, 143 bp and 153 bp showed a significantly (P<0.05) higher panicle number per plant than those carrying the other two alleles, 147 bp and 157 bp. For RM235, the individuals carrying the allele 108 bp showed a significantly (P<0.05) higher flag leaf width than those carrying the alleles 115 bp, 117 bp, 121 bp and 123 bp, whereas the individuals carrying the allele 123 bp had a significantly (P<0.05) lower flag leaf width
Table 1. Descriptive statistics, percentage of phenotypic variation explained by population structure (R²), and heritability in broad sense (h²) for 12 agronomic traits in Panel 1.

Discussion
Comparison of different mapping populations for association mapping
An appropriate population with maximized phenotypic variation is critical for the success of an association analysis [18,51]. Rice landraces represent an intermediate stage in domestication between wild rice and elite cultivars [2]; they possess high genetic diversity and many exotic genes and thereby provide useful germplasm resources for rice breeding. Moreover, association mapping based on a core collection of rice landraces helps to capture as much phenotypic variation as possible.
China is well known as one of the centers of origin of cultivated rice, with abundant rice genetic resources. Between 1920 and 1964, Professor Ying Ting collected more than 7128 accessions of rice landraces from all over China as well as from some other countries that grow rice as a major crop. The collection is one of the earliest collections of rice germplasm resources and was therefore named Ting's rice germplasm collection [30]. Our previous results based on the core collection derived from it indicated that (1) the percentage of SSR loci pairs in significant (P<0.05) LD was 46.8%; (2) LD decayed rapidly to the threshold, i.e. the 95% quantile of r² between unlinked loci pairs, at 1.03 cM in the entire collection; and (3) there were many LD blocks. These results indicated that Panel 1 was an appropriate population for association mapping. Therefore, our association mapping was based on Panel 1.
The populations used in previous association analyses in rice included populations from the USDA core collection [14,16,24], landraces [16,17], elite cultivars [16], and a mini-core collection [23]. The mapping populations in the studies of Agrama et al. [14,24,52] and Li et al. [23] were subsets chosen randomly from the USDA core collection, which consisted of 92, 547 and 203 accessions, respectively. Moreover, the number of SSR markers was 123, 72 and 155, which is rather low for association mapping. In the study of Zhao et al. [11], 416 rice accessions including only two landraces were randomly selected and only 100 SSR markers were used.
Our results indicated that there is a wide range of phenotypic variation for the 12 agronomic traits in Panel 1. For heading days, flag leaf length, flag leaf width, grain length, grain width, grain length/width and panicle length, there was less phenotypic variation than described by Jin et al. [16], while for plant height and 1000-grain weight, more phenotypic variation was found than reported by Jin et al. [16]. Comparison with the results of Li et al. [23] indicated that less phenotypic variation was found in this study for heading days, 1000-grain weight and panicle length, while more was found for plant height, panicle number per plant and seed set rate. More phenotypic variation was found than reported by Agrama et al. [14] for grain length, grain width and 1000-grain weight.
Choice of statistical models and statistical parameters to control type I error
Two frequently used models for association analysis (i.e. MLM and GLM) are implemented in the software TASSEL [17,23,28]. In this study, we used the MLM (Q+K) [25], which accounts for population structure and kinship, to minimize spurious associations. For comparison, the GLM was also used. In our study, 137 (~90%) trait-marker associations were possibly new loci when using the GLM, whereas 52 (~68%) trait-marker associations were possibly new loci when using the MLM. The proportion of possibly new significant loci detected using the GLM was thus much higher than that detected using the MLM.
However, the new significant loci might be false positives because the GLM did not account for kinship. Furthermore, the significance threshold (P value) must be set carefully in association mapping. A smaller P value threshold might miss more minor QTLs, while a larger P value threshold might yield more false positive QTLs. To interpret the MLM-derived significant associations reliably, we also used minimum BF estimation [44] for the MLM association results. Minimum BF estimates derived from the P values of the MLM approach may help to understand the overall impact of the associations [45]. We also used a Bonferroni threshold for identifying the associations derived from the MLM analysis. These statistical parameters have been used successfully in association mapping of cotton [8]. Our results indicated that three significant trait-marker associations (i.e. plant height-RM530, grain length-RM156 and grain width-RM276) simultaneously met all three thresholds (i.e. P<0.05, minimum BF, and Bonferroni) and should be emphasized in future studies. Moreover, molecular markers can be used to calculate the relative kinship between pairs of individuals in a study, which provides useful information for quantitative inheritance studies. Relative kinship reflects the approximate identity between two given individuals over the average probability of identity between two random individuals [25]. Our results indicated that most varieties in Ting's core collection had no or only a weak relationship with each other, which might be because these varieties were chosen from a diverse rice cultivating region covering all of China, East Asia, and Southeast Asia. The quantile-quantile plots indicated that the MLM (Q+K) performed well in association mapping of the 12 agronomic traits and could correct false positive trait-marker associations (Figures 2 and 3).

Association analysis within Ting's core collection
Using Ting's rice core collection genotyped with 274 SSR markers, we performed association mapping for 12 agronomic traits with two years of data using the MLM and GLM models implemented in TASSEL. In this study, most (~80%) of the significant associations found using the MLM approach were also supported by the GLM approach in both years. The percentage of associations identical to previously reported QTLs was about 32%, which was higher than that in the study of Li et al. [23] but lower than that in the study of Agrama et al. [14]. The 76 significant trait-marker associations detected in both years are potential markers for effective marker-assisted selection programs in rice. Moreover, the 52 of the 76 significant associations that were not detected in previous studies might represent new loci. For instance, the trait-marker associations for heading days with PSM184, plant height with RM590, grain length/width with RM447, flag leaf length with RM287, flag leaf width with RM235, 1000-grain weight with RM538, and 1000-grain weight with RM206 explained more than 10% of the genetic variation in both 2008 and 2009. For heading days, two of the four significant trait-marker associations were identical to previously reported QTLs, i.e. RM341 and RM339, reported by Mei et al. [48] and Kunihiro et al. [50], respectively. Moreover, RM339 was also significantly associated with heading days in Panel 2 and 3. For plant height, ten of 13 significant trait-marker associations were identical to previously reported QTLs, i.e. RM530 in the study of Mei et al. [53], RM138 in the study of Fang et al. [51], PSM130 in the study of Cao et al. [54], RM469 (which also showed a significant association in Panel 2 and 3) and PSM184 in the study of Mei et al. [48], RM204 (which also showed a significant association in Panel 2 and 3) and RM225 in the study of Yang et al. [49], RM219 (which also showed a significant association in Panel 2 and 3) in the study of Xiao et al. [47], and RM21 and RM147 in the study of Lanceras et al. [55]. For panicle length, the two significant trait-marker associations were also identical to previously reported QTLs, i.e. RM228 and PSM184 in the studies of Mei et al. [53] and Jiang et al. [56], respectively. For grain length, three of ten significant trait-marker associations were identical to previously reported QTLs, i.e. RM127 in the study of Tan et al. [57], PSM158 in the study of Xing et al. [58], and PSM171 in the study of Yoshida et al. [59]. For grain length/width, two of nine significant trait-marker associations were identical to previously reported QTLs, i.e. RM276 and RM557 reported by Tan et al. [57]. For flag leaf width, one of nine significant trait-marker associations was identical to a previously reported QTL, i.e. RM571 in the study of Mei et al. [48]. For 1000-grain weight, three of eight significant trait-marker associations were identical to previously reported QTLs, i.e. RM7 in the study of Hittalmani et al. [60], RM239 in the study of Gao et al. [61], and RM206 in the study of Cho et al. (this reference cannot be found, but the QTL ID can be found on the GRAMENE website). For panicle number per plant, the only significant trait-marker association was also identical to a previously reported QTL, i.e. RM311 in the study of Kobayashi et al. [62].
Table 5. Summary of trait-marker associations within the three Panels.

Verification association mapping results within Panel 2 and Panel 3
It is worthwhile to further verify the significant associations identified within one population in a different population [29]. In this study, 55 SSR markers for the 76 trait-marker associations identified in Panel 1 were used to genotype two other populations, i.e. Panel 2 and Panel 3, and an association mapping was performed using both MLM and GLM approaches. When using the GLM approach, seven significant trait-marker associations were identical within Panel 1 and Panel 2 or Panel 3. Moreover, three of the seven identical significant trait-marker associations in the two panels were reported by previous studies. Although the GLM would bring more false positive results than the MLM when it was used alone, however, some significant trait-marker associations were first detected Panel 1 in our research and Table 6. The same trait-marker associations in Panel 2 and 3 using GLM model compared with those in Panel 1. proved by several statistical thresholds as well as by previous mapping results. After that, we used the GLM to verify our mapping results in Panel 2 and 3. Therefore, it makes sense for verification of association mapping results by the fact that some common trait-marker associations were detected by the GLM approach.
We observed no QTLs common to all three panels with the GLM approach. The reasons might be (1) the different compositions and origins of the varieties in the three panels: Panel 1 consists only of original rice landraces from China and some other rice-growing countries, collected during 1920-1964 before the emergence of hybrid rice; Panel 2 consists of rice landraces as well as modern rice cultivars and maintainer lines used in hybrid rice breeding in China; and Panel 3 is a worldwide collection consisting of modern rice cultivars, including cytoplasmic sterile lines and maintainer lines, and some landraces; and (2) the different allelic frequencies that might exist in three panels of such different compositions and origins. These explanations were supported by our observations that (1) the frequency of some alleles differed among the three panels and some alleles existed in only one panel (Table S3 in File S1), and (2) in another of our experiments, some alleles associated with aluminum tolerance differed among germplasm types (data not shown).
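The allele-frequency comparison described above can be illustrated with a minimal sketch. The function names, the marker data, and the allele sizes below are hypothetical and are not taken from Table S3; the sketch only shows how per-panel allele frequencies and panel-specific alleles could be tabulated for one SSR marker.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return {allele: frequency} for one marker in one panel.

    genotypes: list of allele calls (e.g. fragment sizes in bp);
    None marks a missing call and is ignored.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return {}
    counts = Counter(called)
    return {allele: n / len(called) for allele, n in counts.items()}


def private_alleles(freqs_by_panel):
    """Return, for each panel, the sorted alleles found in no other panel."""
    private = {}
    for panel, freqs in freqs_by_panel.items():
        others = set()
        for other, other_freqs in freqs_by_panel.items():
            if other != panel:
                others.update(other_freqs)
        private[panel] = sorted(set(freqs) - others)
    return private


# Hypothetical allele calls for a single SSR marker (illustrative only).
toy_calls = {
    "Panel 1": [150, 150, 154, 158, None],
    "Panel 2": [150, 154, 154, 154],
    "Panel 3": [150, 150, 150, 162],
}

toy_freqs = {panel: allele_frequencies(calls) for panel, calls in toy_calls.items()}
toy_private = private_alleles(toy_freqs)

for panel in toy_calls:
    print(panel, toy_freqs[panel], "private:", toy_private[panel])
```

In this toy example the 150 bp allele is present in all three panels at different frequencies, while the 158 bp and 162 bp alleles each occur in a single panel, which is the kind of pattern that could prevent an association from being reproduced across panels.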
When using the MLM approach, no identical significant trait-marker associations were found among the three panels. Previous studies on linkage mapping and association mapping also found that different mapping populations detected different QTL regions [14,48,63,64,65]. Possible reasons are that (1) far fewer SSR markers were used in Panel 2 and Panel 3 (55 SSRs) than in Panel 1 (274 SSRs); (2) the 55 SSR markers are associated with the relevant traits and are therefore not randomly distributed across the genome, which might reduce the accuracy of the population structure and kinship estimates; (3) the relative kinship calculated from the 274 SSRs in Panel 1 was quite different from that calculated from the 55 SSRs in Panel 2 and 3: in Panel 1 only 4.73% of pairwise kinship coefficients were larger than 0.5 and most were zero, whereas 55.9% and 60.4% of pairwise kinship coefficients in Panel 2 and 3, respectively, were larger than 0.5 (Figure S8 in File S1); and (4) the degree of association might be reduced in the MLM compared to the GLM [50], which means that, with far fewer SSR markers, weak associations that are significant in the GLM might not be significant in the MLM. As verification experiments were rarely performed in previous association studies, an efficient approach to verification, and to checking repeatability across different association mapping populations, is needed in the future.
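The kinship summary in point (3) can be sketched as follows. This is a minimal illustration that assumes a symmetric matrix of pairwise relative kinship coefficients has already been estimated by other software; the function name and the toy matrix are hypothetical and do not reproduce the values in Figure S8.

```python
def fraction_above(kinship, threshold=0.5):
    """Fraction of distinct accession pairs whose kinship exceeds threshold.

    kinship: square, symmetric list of lists; the diagonal (self-kinship)
    is ignored and each pair (i, j) with i < j is counted once.
    """
    n = len(kinship)
    pairs = [kinship[i][j] for i in range(n) for j in range(i + 1, n)]
    if not pairs:
        return 0.0
    return sum(1 for k in pairs if k > threshold) / len(pairs)


# Hypothetical kinship matrix for four accessions (illustrative only).
toy_kinship = [
    [1.0, 0.0, 0.6, 0.1],
    [0.0, 1.0, 0.0, 0.7],
    [0.6, 0.0, 1.0, 0.0],
    [0.1, 0.7, 0.0, 1.0],
]

share_high = fraction_above(toy_kinship, threshold=0.5)
print(f"{share_high:.1%} of pairwise kinship coefficients exceed 0.5")
```

A panel in which most pairs exceed 0.5 under this summary would yield a kinship matrix that absorbs much of the phenotypic variance in the MLM, which is consistent with the weaker associations described in point (4).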

Prospects for association mapping based on core collections
Association mapping has become a promising approach to mine elite genes within germplasm populations compared to traditional linkage mapping. Association mapping based on a core collection would help to capture as much phenotypic variation as possible. Compared to a natural population or a breeding population with a broad genetic basis, the LD level in a core collection might be low due to its diverse origin. Therefore, more markers might be required for association mapping. However, due to the quick LD decay, fine mapping using association analysis might be possible with a core collection. As quick, automated, economical genotyping technologies (such as genotyping by sequencing) have been developed, genotyping large germplasm resources with high-density markers and performing GWAS in such mapping populations have become possible. Because such associations could be further applied in rice breeding through molecular marker-assisted selection, the current strategy is a promising way to make use of the elite genes in diverse germplasm resources.

Supporting Information
File S1 Table S1, Accessions, variety names, origin, germplasm types of 150 rice varieties in Panel 1. Table S2, Summary statistics of the 274 SSR markers used in this study. Table S3, Allele frequency of the 55 significant markers in three panels. Figure S1, Frequency distribution of heading days, plant height, seed set rate and panicle length in Panel 1 in 2008. The height of black bar represents the number of varieties in different range of traits. Figure S2, Frequency distribution of grain length, grain width, grain length/width and 1000 grain weight in Panel 1 in 2008. The height of black bar represents the number of varieties in different range of traits. Figure