Genome-Wide Association Study of Grain Appearance and Milling Quality in a Worldwide Collection of Indica Rice Germplasm

Grain appearance quality and milling quality are the main determinants of market value of rice. Breeding for improved grain quality is a major objective of rice breeding worldwide. Identification of genes/QTL controlling quality traits is the prerequisite for increasing breeding efficiency through marker-assisted selection. Here, we reported a genome-wide association study in indica rice to identify QTL associated with 10 appearance and milling quality related traits, including grain length, grain width, grain length to width ratio, grain thickness, thousand grain weight, degree of endosperm chalkiness, percentage of grains with chalkiness, brown rice rate, milled rice rate and head milled rice rate. A diversity panel consisting of 272 indica accessions collected worldwide was evaluated in four locations including Hangzhou, Jingzhou, Sanya and Shenzhen representing indica rice production environments in China and genotyped using genotyping-by-sequencing and Diversity Arrays Technology based on next-generation sequencing technique called DArTseq™. A wide range of variation was observed for all traits in all environments. A total of 16 different association analysis models were compared to determine the best model for each trait-environment combination. Association mapping based on 18,824 high quality markers yielded 38 QTL for the 10 traits. Five of the detected QTL corresponded to known genes or fine mapped QTL. Among the 33 novel QTL identified, qDEC1.1 (qGLWR1.1), qBRR2.2 (qGL2.1), qTGW2.1 (qGL2.2), qGW11.1 (qMRR11.1) and qGL7.1 affected multiple traits with relatively large effects and/or were detected in multiple environments. The research provided an insight of the genetic architecture of rice grain quality and important information for mining genes/QTL with large effects within indica accessions for rice breeding.


Introduction
Rice (Oryza sativa L.), a staple cereal crop, feeds more than half of the world's population. Improvement of rice yield and grain quality is the major objective of rice breeding worldwide. Grain quality primarily includes grain appearance, milling, eating and cooking and nutrition qualities. Grain appearance quality mainly includes grain shape and chalkiness. Grain shape, described by grain length (GL), grain width (GW), grain thickness (GT) and grain length to width ratio (GLWR), contributes to grain weight and yield and has a great impact on the market values of rice grain products. A short and round rice grain is generally preferred by consumers in Northern China, Korea and Japan, whereas consumers in Southern China, the USA, and South and Southeast Asian countries prefer a long and slender rice grain [1]. Chalkiness, according to its location within the endosperm, can be divided into white belly, white core and white back in rice grain [2]. Chalky grains, filled with loosely packed, round and large compound of starch granules, are more prone to breakage during milling and reduce head milled rice rate (HMRR). Furthermore, when chalky grains are cooked, cracks occur easily and reduce the palatability of the cooked rice [3]. Degree of endosperm chalkiness (DEC) and percentage of grains with chalkiness (PGWC) are commonly used to measure grain chalkiness. Milling quality determines the final yield and the broken kernel rate of the milled rice, which is of concern for consumers and farmers. Milling quality is measured by brown rice rate (BRR), milled rice rate (MRR) and HMRR. Brown rice is the de-hulled rice with the palea and lemma removed that can be used for cooking and eating. Milled rice is the result of brown rice after removing all of the bran, which consists of aleurone and pericarp, and germ or embryo. Head milled rice is kernel longer than or equal to 3/4 full length of a kernel. Among the above-mentioned three milling quality parameters, HMRR is the most important and greatly affects market value. HMRR depends on varietal characteristics, production factors, and harvesting, drying and milling processes.
However, QTL mapping using a single bi-parental population has a few important limitations including the need to create a population segregating for the target trait, the ability to assess only two alleles per locus, and the limited number of meiosis. Only a limited number of QTL can be identified, since only QTL for which the two parents differ can be detected. Given the limited scope of each study, mapping the same trait in different bi-parental populations may yield different QTL. More than two alleles are likely to segregate per locus, and the directions of QTL effects may vary depending on the genetic background because of epistasis, pleiotropy and QTL-by-environment interaction (QEI) [28].
An alternative QTL mapping approach is to conduct association analysis in a large germplasm collection, known as association mapping. Association mapping is based on linkage disequilibrium (LD) or the non-independence of alleles in a population. Association mapping studies using sparse SSR markers has been proven to be successful in identifying marker-trait associations (MTA) in rice [29][30][31][32]. With the development of high-throughput sequencing and SNP chip techniques, genome-wide association study (GWAS) using high density markers becomes more and more popular in rice genetics. The first GWAS in rice was conducted by Huang et al. [33] using 517 rice landraces genotyped by sallow sequencing. Among the QTL identified two and five were for GW and GL, respectively [33]. Applying the same method to a large panel of 950 worldwide rice accessions, Huang et al. [34] identified two, four and two QTL for GW, GL and thousand grain weight (TGW), respectively. Based on genotyping 44,100 SNP across 413 diverse accessions from 82 countries, Zhao et al. [35] identified eight,10 and 15 QTL for GW, GL and GLWR, respectively. Begum et al. [36] identified seven, 10 and 11 QTL for GW, GL and GLWR, respectively, using advanced lines from IRRI's irrigated breeding program genotyped with 71,710 SNP generated by Genotyping-by-sequencing (GBS). Similarly, using 533 O. sativa landrace and elite accessions genotyped with 4,358,600 SNP derived from GBS, Yang et al. [37] identified 10, 11, 19 and 21 QTL for GLWR, TGW, GW and GL, respectively. These studies clearly demonstrated that GWAS offers higher resolution for QTL mapping and can be used as a powerful complementary strategy to the classical linkage mapping using bi-parental segregation populations.
Many of the reported association studies in rice included grain traits, such as GL, GW, GLWR, GT and TGW. However, rice chalkiness and milling quality traits were less studied. The objective of this study is to identify markers associated with 10 grain appearance quality and milling quality traits using an indica collection of 272 accessions. The panel was phenotyped in four locations representing major indica rice production environments in China. GBS and Diversity Arrays Technology (DArT) based on next-generation sequencing technique called DArTseq™ was used to generate genome-wide markers. The association panel was characterized for population structure using three different methods and for LD pattern by estimating basal LD and LD decay in the whole population and subpopulations. Sixteen association mapping models were tested to choose the best model for each of the trait-environment combinations. A number of known QTL and novel QTL were identified.

Association mapping panel
A total of 272 indica rice accessions from 31 countries or regions (S1 Table) were used in this study. More accessions came from China (39), Philippines (36), Madagascar (29), India (28), Senegal (24), Sri Lanka (24) and Bangladesh (15). For other countries or regions the number of accessions was fewer than eight.

Phenotypic evaluation
Field trials were conducted using a randomized complete block design with two replicates in four environments including Hangzhou (HZ), Jingzhou (JZ), Sanya (SY) and Shenzhen (SZ) in China. The sowing and transplanting dates varied for the testing environments to fit into the local planting season. In all environments, the plot size was three rows of 10 plants planted at a spacing of 20cm x 25cm. Local farmers' management practices were followed. At maturity, eight plants in the middle row were harvested. The institutions including Yangtze University, Zhejiang Academy of Agricultural Sciences, Shenzhen Institute of Breeding & Innovation permitted the field trials were conducted in their experimental fields.
The nature dried seeds stored in room temperature for three months were used to measure quality traits. GL (mm), GW (mm), GLWR, GT (mm), TGW (g), BRR (%), MRR (%) and HMRR (%) were measured according to the National Rice Grain Quality Assessment Standard of China (GB/T17891-1999). The grain chalkiness characteristics were measured using a JMWT12 Rice Appearance Quality Detector (Dong Fu Jiu Heng, Beijing). PGWC (%) was the percentage of head milled grains with chalkiness. DEC (%) was calculated as the product of PGWC and chalkiness size (the area of chalkiness divided by the area of whole grain). All measurements were conducted for two independent samples.

Phenotypic analysis
Due to various reasons not all accessions had phenotypic data in all the four testing environments. The final population size for each of the trait-environment combinations varied greatly (Table 1). Phenotypic analysis was conducted using linear mixed models to properly handle the unbalance data. For single-site analysis, accession (genotype) was regarded as fixed effect and replicate as random effect. The best linear unbiased estimates (BLUE) of accessions were obtained. For multi-site analysis, all effects including accession (genotype), environment and replicate within environment were regarded as random to estimate variance components. The best linear unbiased predictions (BLUP) for each of the genotype-environment combinations were predicted. All analyses were conducted using the PBTools package of R [38] developed by IRRI (bbi.irri.org). Phenotypic correlations were computed from the BLUE using the "rcorr" function implemented in the R package Hmisc [39]. Narrow-sense heritability (h 2 ) based on genotypic means was computed using the estimated variance components as V G / (V G + V GEI /s +V e /sr). Where, V G , V GEI , V e are the variance of genotype, genotype by environment interaction (GEI) and residual error, respectively, s is the number of environments and r is the number of replicates.

Genotyping
Genomic DNA was extracted for each accession from the leaf tissues of a single plant using the MATAB method described in Risterucci et al. [40] and then diluted to 100 ng/μl. Genotyping was conducted at Diversity Arrays Technology Pty Ltd. (DArT P/L), Australia, using a method of DArTseq™. The method involves genome complexity reduction using PstI/TaqI restriction enzymes followed by Illumina short-read sequencing. PstI specific adapters tagged with 96 different barcodes to encode a plate of DNA samples were ligated to the restriction fragments. The resulting products were amplified and checked for quality. The 96 samples were pooled and run in a single lane on an Illumina Hiseq2000 instrument. The PstI adapters included a sequencing primer so that the tags generated were always read from the PstI sites. As detailed by Courtois et al. [41], the resulting sequences were filtered and split into their respective target datasets, and the barcode sequences were trimmed. The sequences were trimmed at 69 bp (5 bp of the restriction site plus 64 bases with a minimum Q score of 10). A proprietary analytical pipeline developed by DArT P/L was used to produce DArT score tables and SNP tables. The remaining 69 bp sequences were aligned to the Os-Nipponbare-Reference-IRGSP-1.0 [42] pseudomolecule assembly using Bowtie v0.12 [43] with a maximum of three mismatches to recover the position of the restriction site for the DArT markers and the position of the polymorphism(s) within the 69 bp sequences for the SNPs. For the DArT markers, the position given is that of the second base of the 6 base PstI restriction site (5'-C|TGCAG-3') because the mutated base is unknown and can be any of the six. The same sequences were then aligned to the pseudomolecules using BLAST (e-value <1.0 e-20) to assess whether additional sequences could be positioned. The sequences that had only one hit on the pseudomolecules or had more than one hit but with a difference of at least 1.0 e-5 between the first and the second hits were retained for further analyses. When the marker position fell within a Michigan State University-annotated gene (http://rice.plantbiology.msu.edu/), the feature was determined (intron, exon, 3' or 5' UTR), and the name and function of the gene were retrieved. Call rates were measured for all markers, and markers with call rates below 80% were discarded. The allele frequency of the remaining markers was then calculated, and markers for which the minor allele had a frequency below 2.5% were also discarded.

Imputation of missing genotypic data
Markers for which the minor allele frequency (MAF) is below 2.5% were discarded before imputing missing data. Missing data were estimated using Beagle v4.0 [44]. Beagle uses a localized haplotype cluster model. It is a special class of directed acyclic graph which empirically models haplotype frequencies on a local scale and therefore adapts to local structure in the data. The model determines a hidden Markov model that can be used to find the most likely haplotype pair for each individual, given the genotype data for that individual and the graphical haplotype frequency model. The method works iteratively using an expectation maximization type approach. The imputed missing data, probabilities of missing genotypes and inferred haplotypes are calculated from the model that is fitted at the final iteration.

Analysis of population structure
Different statistical methods were employed to infer the number of subgroups in the panel. First, a model based Bayesian clustering analysis method implemented in STRUCTURE software version 2.3.4 [45] was used. The program was run with the following parameters: k, the number of groups in the panel varying from 1 to 9; 10 runs per k value; for each run, 10,000 burnin iterations followed by 10,000 MCMC (Markov Chain Monte Carlo) iterations. The optimal number of groups (k) was determined by Δk following the method described in Evanno et al. [46]. STRUCTURE analysis was conducted for different numbers of markers evenly distributed on the whole genome. Very similar results were obtained when the number of markers was more than 1000. Second, a multi-dimensional scaling and cluster analysis method implemented in Awclust package [47] was used. Third, Tdiscriminant analysis of principal components (DAPC) method implemented in adegenet package [48] was used. The "find.cluster" function was used to identify clusters (k) and the optimal k value was determined according to the Bayesian Information Criterion (BIC), then the "dapc" function was used to check the classification quality.

Kinship coefficient
Three methods were used to calculate kinship coefficient matrix (K). (1) Pairwise_IBS: the distance between two accessions is computed as the proportion of SNP which are different. The IBS is computed by subtracting all values of the distance matrix from 2 and then scaling the resultant matrix so that the minimum value is 0 and the maximum value is 2. It is implemented in TASSEL 5.2.6 [49]. (2) The scaled_IBS: the IBS was scaled to have the mean diagonal element equal to 1+F, where F is the inbreeding coefficient of the current population. It is the default method of the TASSEL 5.2.6 [49]. (3) The VanRaden method: the realized relatedness between individuals is computed according to ZZ'/(2∑p i (1-p i )) where Z = W-P, W is the marker matrix, P contains the allele frequencies multiplied by 2, p i is the allele frequency of marker i, and the sum is over all loci [50]. It is the default method of GAPIT [51].
Linkage disequilibrium LD was measured by squared allele frequency correlations (r 2 ) values between the pairs of markers using TASSEL 5.2.6 [49]. Only markers with MAF greater than 10% were used. Marker pairs were discretized into bins of 10 kb and the median r 2 value was used as the estimate of r 2 of a bin. The estimated r 2 values were formulated as a function of physical distances between markers. A power law curve (y = ax k ) was fitted to determine the physical position (x) corresponding to a given r 2 value (y) [52]. The following scheme was used to estimate the background LD beyond which LD was assumed to be caused by genetic linkage: One marker per chromosome was randomly sampled and the maximum of all the pairwise r 2 values was taken.
The median of the maximum r 2 values of 10,000 samples was taken as the basal r 2 value. The physical distance at which r 2 reached to the basal value was determined as the LD decay distance of the population. All analysis was done for the whole population and the three subpopulations inferred by STRUCTURE.

Association analysis
All association models under the unified model for association mapping [53] can be described by considering how the two factors, the population structure (Q) and genetic relatedness between genotypes (K), are considered. We used four options for handling Q and four options for handling K to create 16 models. The four options for Q were: no Q, Q3 derived from STRUCTURE, Q7 derived from STRUCTURE and C7 derived from adegenet. The four options for K were: no K, K computed as pairwise_IBS (Kp scaled_IBS (Ks) and VanRaden method (Kv). All analyses were conducted using TASSEL 5.2.6 [49]. For models without K, known as general linear model (GLM) 1,000 permutations were used. For models with K, known as mixed liner model (MLM), the compressed mixed linear model approach [53,54] and P3D [54] algorithms were adopted to reduce computing time. The best model for each trait-environment combination was chosen using the mean square difference (MSD) between observed and expected p-values of all marker loci, as suggested by Stich et al. [55]. MSD was a measure for the deviation of the observed p-values from the uniform distribution. Model with a smaller MSD is more appropriate. The critical p-value for declaring significant MTA was set to 0.0001.

Basic statistics of markers
A total of 22,266 polymorphic markers were detected in the panel including 9,340 SNP and 12,926 DArT markers. By removing markers with MAF less than 5%, 18,824 high quality markers (7,885 SNP and 10,939 DArT) were used in association analysis. The number of markers per chromosome ranged from 891 on chromosome 10 to 2,361 on chromosome 1. The size of chromosome varied from 22.8Mb for chromosome 9 to 43.2 Mb for chromosome 1. The whole genome size was 371.7 Mb and the average distance between neighboring markers (marker spacing) was 20.2 kb. The average marker spacing ranged from 16.3 kb for chromosome 11 to 25.9 kb for chromosome 10 (S2 Table). About 70.4% of the neighboring markers distances were below the average value and 97.4% were less than 100 Kb. There were eight gaps devoid of markers larger than 500 kb on chromosomes 1 (D01_26116212-S01_26770440), 2 (D02_13852683-D02_15083642), 4 (D04_8765494-D04_9302397 and D04_16774867-D04_17319136), 6 (D06_18215124-D06_18803206), 7 (D07_11700243-S07_12785983 and S07_13836518-D07_14505070) and 11 (D11_11988263-S11_12639397). More than half of the markers (55.7%) had MAF lower than 0.20 (Fig 1).

Phenotypic variation and trait correlation
In general, the panel revealed a wide range variation for all the evaluated traits (Fig 2). Most of the traits appeared to be normally distributed, but some trait-environment combinations showed skewed distributions. For instance, the distributions of DEC in all environments, HMRR and PGWC in HZ and BRR in JZ were seriously skewed (Fig 2). Variance components derived from across-site analysis were given in Table 2. For appearance quality traits, the V G was much larger than V GEI , indicating that GEI was negligible. For BRR, MRR and HMRR, the V GEI was much larger than V G , suggesting that GEI was very important. However, the V e for HMRR was very large, indicating phenotyping precision was low. The h 2 for the appearance quality traits were high ranging from 0.82 for PGWC to 0.98 for GL and GLWR, while were low for the milling quality traits ranging from 0.19 for MRR to 0.27 for HMRR. The phenotypic correlations between traits were similar in the four testing environments. Low correlations were found between grain appearance quality traits and milling quality traits in all environments. GL showed a moderate and negative correlation with GW and a low and negative correlation with GT. Correlation between GW and GT was moderate and positive. TGW positively correlated with GL, GW and GT. High correlation was found between PGWC and DEC in all environments with coefficient ranging from 0.90 in SZ to 0.94 in HZ. The correlation between TGW and GT was moderate in all environments ranging from 0.46 in SY to 0.57 in HZ. Negative and moderate correlation was found between PGWC and HMRR. The correlation between PGWC and DED was strong and positive. The correlations between the three milling quality traits were positive ranging from low to moderate (Fig 3).

Population structure
According to the value of Δk, there were three groups in the current panel (S1 Hierarchical cluster analysis using Awclust suggested that the best number of clusters was also three based on Gap stastics (S1 Fig). The classification of accessions into groups was similar to that derived from STRUCTURE. Fifteen accessions of the group Q3-2 of STRUCTURE were assigned to the C3-3 of Awclust (S3 Fig).
The "find.clusters" function of adegenet showed a clear decrease of BIC until k = 7 clusters, after which BIC increased (S1 Fig). In this case, the elbow in the curve also matched the smallest BIC, and clearly indicated that seven clusters should be retained. Roughly speaking, the three groups detected by STRUCTURE were divided into two, two and three clusters by adegenet (S3 Fig). If we also chose seven groups for STRUCTURE analysis the Q3-1, Q3-2 and Q3-3 were further divided into two, two and four groups (S3 Fig). There were some differences in the assignments of accessions between k = 7 in STRUCTURE and adegent. The biggest difference was that the C7-4 of adegenet was distributed over five of the seven groups of STRUC-

Whole genome patterns of LD decay
The decay of LD along physical distances was computed for both the whole population (272 accessions) and three subpopulations inferred by STRUCTURE k = 3. The determination coefficients (R 2 ) of regression models were 0.87, 0.93, 0.88 and 0.95 for the whole population, Q3-1, Q3-2 and Q3-3, respectively (S3 Table). The maximum r 2 was observed in the 0-10 kb marker interval for the whole population and subpopulations. The maximum r 2 was lower in the whole population (0.65) than in the subpopulations. Q3-2 had the largest maximum r 2 than Q3-1 (0.76) and Q3-3 (0.71). The basal r 2 was 0.11, 0.16, 0.16 and 0.19 for the whole population, Q3-1, Q3-2 and Q3-3, respectively (S3 Table). The LD decay distance was 150 kb for the whole population, which was longer (slower) than those for Q3-1 (110 kb) and Q3-3 (80 kb) but shorter (faster) than that for Q3-2 (240 kb) (Fig 4 and S3 Table).

Association analysis
Comparison of models. A total of 16 models for detecting associations were compared to choose the best model for each of the trait-environment combinations using MSD. The complete MSD results were given in S4 Table. Fig 5 gave the quantile-quantile plots for the 16 models for GL measured in HZ to illustrate the relative differences between models. The best model varied with traits and environments (S4 Table). For all the trait-environment combinations except BRR in SY, the naïve model was always inferior to other models (S4 Table), indicating complex population structure and genetic relatedness between accessions present in our panel affected the detection of MTA. According to the average MSD value across all trait-environment combinations, the Q models (Q3, Q7 and C7)that corrected for population structure only, were inferior to the K models (Kp, Ks and Kv) that corrected for the genetic relationships between accessions and the QK models (Q3Kp, Q3Ks, Q3Kv, Q7Kp, Q7Ks, Q7Kv, C7Kp, C7Ks and C7Kv) that corrected for both of population structure and genetic relationship (S4 Table). The QK model was the best model for most of the trait-environment combinations. The Q7 model was slightly better than the other two Q models. The Kp model was better than the Ks model and Kv model. The C7Kp model was better than the other QK models. For the models with the same Q but different K, the Q-Kp and Q-Ks models were similar and were slightly better than the Q-Kv model. The models with the same K but different Q had very similar MSD values. In the following sections only the MTA identified using the best model for each of the trait-environment combinations were presented and discussed.  Marker-trait associations. A total of 74 markers on all 12 chromosomes were found to be significantly associated with the 10 measured traits. By delineating significant markers with LD higher than 0.2 into a single QTL, the 74 markers were grouped into 38 QTL (Table 3).
Six QTL were detected for HMRR. They were on chromosomes 3, 4, 5, 6, 8 and 12. qHMRR3.1 (D03_31372694 and D03_31380514) was identified in SZ with a R 2 of 9.5%. qHMRR4.1 (S04_28018950) was identified in HZ with a R 2 of 14.9%. qHMRR5.1 (S05_12426396) was identified in SZ with a R 2 of 7.9%. qHMRR6.1 (D06_2712642) was identified in HZ with a R 2 of 11.0%. qHMRR8.1 (S08_2638229) was identified in SZ with a R 2 of 8.9%. qHMRR12.1 (S012_26692818 and S012_26695253) was identified in SZ with a R 2 of 9.5% ( Table 3). Some of the QTL for different traits were in the same chromosomal regions and were regarded as the same QTL when the number of QTL was counted across traits. They were qDEC1.1 and qGLWR1.

Phenotypic variation and trait correlation
Great variation was observed for all traits in all testing environments in the present study, suggesting that the association panel can be effectively used for GWAS of various quality traits. Grain appearance quality traits, particularly grain size and weight traits had high heritability and showed only small GEI ( Table 2), suggesting that breeding for a wide range of growing environments could be feasible and effective. Grain chalkiness and milling quality traits had low heritability and large contribution of GEI and as a result will be more difficult to improve through conventional selection breeding.
The relationship between grain quality traits in rice is complex. In the present study, positive correlations were observed between TGW and GL, GW and GT, which was consistent with previous studies [2,26,56]. The coefficients were moderate, which might be the result of negative correlations between GL and GW and GT. The correlation between GL and GW were -0.54, -0.64, -0.50 and -0.55 in HZ, JZ, SY and SZ, respectively. However, a weak and positive (0.11 in SY) or negative (-0.11, -0.07 and -0.09 in JZ, HZ and SZ, respectively) correlation was observed between TGW and GLWR, which was different with the result of Xu et al. [57]. They found TGW and GLWR had a moderate and positive correlation (0.44). Consistent with previous reports, BRR was negatively correlated with GL but positively correlated with GW, even though the correlation coefficients were small. In the process of milling, the long and slender rice grain is easier to break than short and round one. Chalkiness is also an important factor influencing rice milling traits especially HMRR. Chalky grains are also more prone to breakage at chalky area. We found HMRR was negatively correlated with PGWC and DEC, which was consistent with the study of Zheng et al. [58].

LD pattern
The basal r 2 was 0.11 and estimated LD decay distance was 150 kb for the whole population (S3 Table). The LD decay distance ranged from 80 kb to 240 kb in three subpopulations. This was in agreement with previous finding that cultivated rice has a long range LD from close to 100 kb to over 200 kb [59,60]. Compared with the whole population, LD dropped much faster in Q3-1 and Q3-3 but more slowly in Q3-2. The effect of population structure on LD pattern has been widely reported [52,[60][61][62]. Varying patterns of LD decay in different subpopulations were likely reflecting their breeding histories and the origins of accessions in subpopulations [63] and may influence the QTL mapping resolution of the panel. Given that our average marker density (one marker per 20.2 kb) was higher than the r 2 decay, we expected to have reasonable resolution and power to identify common variants of large effect associated with traits of interest.

Model comparison
It is well-known that the results of association analysis using different models would be very different. However, there is no generally acceptable best model. It is likely that the best model differs among populations, traits and growing environments. The most appropriate model should be chosen by comparing different models. In the present study, MSD proposed by Stich et al. [55] was used to determine the best model for each of the 40 trait-environment combinations. The results showed that the naïve model and the Q models were almost always inferior to the K and QK models (S4 Table). The average MSD value across all trait-environment combinations of the naïve model was much higher compared to those of the Q models, indicating introduction of population structure could tremendously reduce false positives. Q derived from different numbers of subpopulations (k) also leaded to different results. According to MSD, increasing k from three to seven resulted in better control of false positives (S4 Table). When population structure was detected by different methods, the resultant Q and as a result the association results were different. For instance, the Q7 model, which used the Bayesian clustering analysis method implemented in STRUCTURE, was better than the C7 model, which used the Q from the DAPC method (S4 Table). It was clear from our results that the three Q models were rarely the best model and the K and QK models were better (S4 Table). This was consistent with most of the previous studies that showed that GLM method (Q model) was inferior to MLM method (K and QK models). A recent example in rice was the study ofCourtois et al. [41]. They performed association study for 16 root depth and associated traits in a panel of japonica accessions and chose the best model by comparing the likelihoods of each model using the BIC [64]. Their results showed that the QK model was the best model for 13 measured traits, the K model was the best for the other 3 traits, and the Q model was always inferior to the other two models.
For the models with the same Q but different K, the Q-Kp and Q-Ks models were similar and were slightly better than the Q-Kv model (S4 Table). The models with the same K but different Q had very similar MSD values, indicating that once the genetic relatedness between genotypes was well accounted, the differences between methods for inferring Q became less important. Therefore, when both Q and K were considered simultaneously, K played a more important role than Q and different K should be tried to determine the best model.
The best model differed for the same trait measured in different environments and different traits measured in the same environment (S4 Table). For traits with low heritability, the best model changed greatly across environments. For instance, BRR (h 2 = 0.27), MRR (h 2 = 0. 19) and HMRR (h 2 = 0.23), there was no common best model in the four environments. However, for traits with high heritability the same best model tended to be found across environments. For instance, Q7Kv was the best model for GL (h 2 = 0.98) in HZ, JZ and SY and C7Kv was the best model for GW (h 2 = 0.97) in HZ and SZ and Kv was the best model for GW in JZ and SY.
In the present study, we selected the model with the smallest MSD value as the best model. It was observed that models with very similar MSD values had very similar association results, indicating that MSD is a simple but effective method for choosing the appropriate model(s) in GWAS.

The effects of environment and QEI
In the present study, the total number of identified QTL varied greatly across environments, indicating that testing environment had important influence on GWAS. Out of the 38 identified QTL only one (qGLWR5.1 (qGT5.1)), four (qGL2.2, qGL3.1, qGL7.1 and qGW5.1) and four QTL (qGLWR1.1, qGL11.1, qPGWC5.1 (qDEC5.2) and qGLWR3.2) were detected in four, three and two environments, respectively (Table 3). Furthermore, even for qGLWR5.1, which was detected in all the four environments, the R 2 varied from 7.4% to 14.1% across environments. These results may be partially attributed to QEI. For BRR, MRR and HMRR that showed significant GEI, all QTL were detected in only one of the four environments (Table 3). However, the big differences in population size across environments may also be partially responsible for the observed differences between environments. For most of the traits the population size was much larger in SY and SZ than in HZ and JZ (Table 1).

GWAS is effective
In the present study, five out of the 38 identified QTL were adjacent to previously known genes or fine mapped QTL, suggesting that association mapping was an effective way to map QTL for rice grain quality traits. On chromosome 5, qGL5.1, qGW5.1, qGLWR5.1, qGT5.1, qTGW5.2, qPGWC5.1 and qDEC5.2 were located in genomic region (S05_5368086, S05_5368151 and S05_5369527) close to the cloned gene qSW5 affecting GW and TGW [18]. qSW5 was in 5362625bp -5370506bp in the Os-Nipponbare-Reference-IRGSP-1.0 according to the complete cds sequence of qSW5 in indica. The qSW5 functioned as negative regulator of grain width and weight and a deletion in qSW5 resulted in a significant increase in sink size owing to an increase in cell number in the outer glume of the rice flower and finally increased the grain width and weight [18]. The qSW5 was also reported to regulate GL, GLWR and GT [65]. However, the effects of qSW5 on PGWC and DEC have not been reported before. Our results suggested that qSW5 could be used to increase GL and reduce DEC and PGWC simultaneously. The qGL3.1 (S03_16663793, S03_16731182, S03_16748937, S03_16762099, S03_16858510, S03_16996600 and S03_17000111) and qGLWR3.2 (S03_16858510) associated with GL and GLWR, were close to GS3 (16729228-16735641) affecting grain shape. GS3 functions as a negative regulator for grain size which is a major QTL for GL and TGW and a minor QTL for GW and GT [16]. The qGL7.2 (S07_22844850) and qTGW7.2 (S07_22684516) were located in the region harboring the previously fine mapped QTL affecting GL, GW and GLWR, qGRL7.1 (22127.4Kb-24526.7Kb) [6]. The qGLWR7.1 (S07_25383179) was located in the region of Srs1 (25381698-25389547) affecting grain size [23]. The srs1 results in small and round seed due to the reduction in both cell length and cell number in the longitudinal direction, and the elongation of the cells in the lateral direction of the lemma. The qGL11.1 (S11_2576141) and qGLWR11.1 (S11_2576141) were close to CycT1;3 (2730143-2736649) whose down regulation results in a shorter grain.

The number of QTL identified is small
Compared to the previous GWAS studies in rice the number of QTLs detected was small in the present study. Many of the known genes affecting traits measured were not detected in the present study. The most likely reason is the small population size. Most of the association panels had more than 400 accessions while the present panel only had 230 or fewer accessions. The effect of population size on GWAS is well known. It is particularly important for detecting loci with low MAF to use a large panel. Because the panel size was small markers with MAF <0.05 were filleted out before association analysis. The trait-affecting genes were missed out if markers in strong LD with them had low MAF in our panel. For example, the gene PGL2 (LOC_Os02g51320) was involved in controlling grain length and weight of rice through interaction with a typical bHLH protein APG. Marker S02_31422564 very close to this gene (1.4 kb) was removed before association analysis since its MAF was only 0.02. The big differences between environments in population size could also explain why most of the QTL (53%) were detected in SZ. Nevertheless, loci with large effects were also detected for some of the traits with small population size. For instance, the population size of GW in JZ is only 132, but qGW5.1 was identified with a high R 2 of 16.2%. Therefore, provided that the MAF is high enough, major loci would still be detected even if the population size was small. Another possible reason is that the present panel consists of only indica accessions while most of the previous studies used panels with a few subspecicies/ecotypes. It is expected that genes that determine the major trait differences between subspecies/ecotypes could not be detected in our study but might be detectable in other studies. We also expected that more false positives were present when a panel with very diverse accessions was used. The use of a single subspecies allows controlling population structure at finer scale and as results reduces false positives. The critical significant threshold value used to declare significant association also influenced the analysis results. For instance, the qGW5.1 was found to be significant in JZ, SY and SZ but not in HZ. By inspecting the p values in HZ, it could be seen that the p values of S05_5368086, S05_5368151 and S05_5369527, were 0.00065, 0.00035 and 0.00014, respectively, which were only slightly higher than the applied critical significant threshold 10 −4 . It might be worth pointing out that the number of markers used in the present study was smaller than or similar to the reported GWAS in rice. However, the marker density was sufficient for most of the genomic regions based on LD decay pattern. Furthermore, the eight gaps devoid of markers below critical LD did not harbor any known genes affecting the studied traits. Therefore, the relatively low marker density was unlikely a major contributor to the small number of detected QTL. Nevertheless, the marker density might be too low in some chromosomal regions where LD decays more rapidly or breaks down due to recombination events [41]. For instance, the GW2 (LOC_Os02g14720) controlling grain width and weight, was in the marker interval of S02_8069581-S01_8181973. However, we could not identify any markers associated with GW or TGW in this region, since LD was very low (about 0.02). Additional markers in such regions are clearly needed.

Conclusions
Wide ranges of genetic variations were present in the indica association panel for all 10 measured grain quality traits. The accessions could serve as sources of novel genes and alleles for improving rice appearance quality and milling quality traits. GWAS conducted using proper statistical models is a powerful method for elucidating the genetic basis of rice quality traits. Although it is generally true that MLM is better than the naïve model and GLM (Q) model for GWAS, the most appropriate model varies across traits and environments and must be chosen by comparing different models. MSD was found to be a simple but effective method for choosing the appropriate model in GWAS. Based on the selected best model for each of the traitenvironment combinations, 38 QTL were identified for 10 rice grain appearance quality and milling quality traits including GL, GW, GLWR, GT, TGW, PGWC, DEC, BRR, MRR and HMRR. Eight genomic regions harbored QTL for more than one trait. Five QTL were found to concur with previously cloned genes or fine mapped QTL. Thirty-three were novel. Five of the novel QTL including qDEC1.1 (qGLWR1.1), qBRR2.2 (qGL2.1), qTGW2.1 (qGL2.2), qGW11.1 (qMRR11.1) and qGL7.1 affected multiple traits with relatively large effects and/or were detected in multiple environments. Our results provided an insight into the genetic architecture of rice grain appearance quality and milling quality traits. The novel QTL are important candidates for future functional characterization studies.