Grain appearance quality and milling quality are the main determinants of market value of rice. Breeding for improved grain quality is a major objective of rice breeding worldwide. Identification of genes/QTL controlling quality traits is the prerequisite for increasing breeding efficiency through marker-assisted selection. Here, we reported a genome-wide association study in indica rice to identify QTL associated with 10 appearance and milling quality related traits, including grain length, grain width, grain length to width ratio, grain thickness, thousand grain weight, degree of endosperm chalkiness, percentage of grains with chalkiness, brown rice rate, milled rice rate and head milled rice rate. A diversity panel consisting of 272 indica accessions collected worldwide was evaluated in four locations including Hangzhou, Jingzhou, Sanya and Shenzhen representing indica rice production environments in China and genotyped using genotyping-by-sequencing and Diversity Arrays Technology based on next-generation sequencing technique called DArTseq™. A wide range of variation was observed for all traits in all environments. A total of 16 different association analysis models were compared to determine the best model for each trait-environment combination. Association mapping based on 18,824 high quality markers yielded 38 QTL for the 10 traits. Five of the detected QTL corresponded to known genes or fine mapped QTL. Among the 33 novel QTL identified, qDEC1.1 (qGLWR1.1), qBRR2.2 (qGL2.1), qTGW2.1 (qGL2.2), qGW11.1 (qMRR11.1) and qGL7.1 affected multiple traits with relatively large effects and/or were detected in multiple environments. The research provided an insight of the genetic architecture of rice grain quality and important information for mining genes/QTL with large effects within indica accessions for rice breeding.
Citation: Qiu X, Pang Y, Yuan Z, Xing D, Xu J, Dingkuhn M, et al. (2015) Genome-Wide Association Study of Grain Appearance and Milling Quality in a Worldwide Collection of Indica Rice Germplasm. PLoS ONE 10(12): e0145577. doi:10.1371/journal.pone.0145577
Editor: Rattan Singh Yadav, Aberystwyth University, UNITED KINGDOM
Received: September 24, 2015; Accepted: December 4, 2015; Published: December 29, 2015
Copyright: © 2015 Qiu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited
Data Availability: All relevant data are within the paper and its Supporting Information files.
Funding: This research was granted by Global Rice Science Partnership (GRiSP), National 863 High Technology Research and Development Program from the Ministry of Science and Technology of China (2014AA10A601), Shenzhen Peacock Plan, the Natural Science Foundation of China (31461143014, 31261140369), and Hubei Collaborative Innovation Center for Grain Industry.
Competing interests: The authors have declared that no competing interests exist.
Rice (Oryza sativa L.), a staple cereal crop, feeds more than half of the world’s population. Improvement of rice yield and grain quality is the major objective of rice breeding worldwide. Grain quality primarily includes grain appearance, milling, eating and cooking and nutrition qualities. Grain appearance quality mainly includes grain shape and chalkiness. Grain shape, described by grain length (GL), grain width (GW), grain thickness (GT) and grain length to width ratio (GLWR), contributes to grain weight and yield and has a great impact on the market values of rice grain products. A short and round rice grain is generally preferred by consumers in Northern China, Korea and Japan, whereas consumers in Southern China, the USA, and South and Southeast Asian countries prefer a long and slender rice grain . Chalkiness, according to its location within the endosperm, can be divided into white belly, white core and white back in rice grain . Chalky grains, filled with loosely packed, round and large compound of starch granules, are more prone to breakage during milling and reduce head milled rice rate (HMRR). Furthermore, when chalky grains are cooked, cracks occur easily and reduce the palatability of the cooked rice . Degree of endosperm chalkiness (DEC) and percentage of grains with chalkiness (PGWC) are commonly used to measure grain chalkiness. Milling quality determines the final yield and the broken kernel rate of the milled rice, which is of concern for consumers and farmers. Milling quality is measured by brown rice rate (BRR), milled rice rate (MRR) and HMRR. Brown rice is the de-hulled rice with the palea and lemma removed that can be used for cooking and eating. Milled rice is the result of brown rice after removing all of the bran, which consists of aleurone and pericarp, and germ or embryo. Head milled rice is kernel longer than or equal to 3/4 full length of a kernel. Among the above-mentioned three milling quality parameters, HMRR is the most important and greatly affects market value. HMRR depends on varietal characteristics, production factors, and harvesting, drying and milling processes.
Most of the rice quality determining traits are quantitatively inherited, controlled by multiple genes/QTL  and affected by growing environment . As for other complex traits, QTL mapping using bi-parental populations derived from parental lines of contrast performance has been widely used to identify QTL for various rice quality traits. Many genes/QTL for rice grain appearance quality and milling quality traits have been reported in the last decades. The GW1-1 and GW1-2 , qGRL1.1 , GS2 , GW3 and GW6 , qGL-4b , qPGWC-7  qGL-7 , qGRL7.1 , gw8.1 , gw9.1 , tgw11  have been fine mapped. The GW2 , GS3 , qGL-3 , qSW5 , GS5 , Chalk5 , TGW6 , GW6a , SRS1 , GL7/GW7 [24, 25], GW8  and CycT1;3  have been cloned. The usefulness of some of the well characterized genes/QTL was proven in an indica population of diverse breeding lines .
However, QTL mapping using a single bi-parental population has a few important limitations including the need to create a population segregating for the target trait, the ability to assess only two alleles per locus, and the limited number of meiosis. Only a limited number of QTL can be identified, since only QTL for which the two parents differ can be detected. Given the limited scope of each study, mapping the same trait in different bi-parental populations may yield different QTL. More than two alleles are likely to segregate per locus, and the directions of QTL effects may vary depending on the genetic background because of epistasis, pleiotropy and QTL- by- environment interaction (QEI) .
An alternative QTL mapping approach is to conduct association analysis in a large germplasm collection, known as association mapping. Association mapping is based on linkage disequilibrium (LD) or the non-independence of alleles in a population. Association mapping studies using sparse SSR markers has been proven to be successful in identifying marker-trait associations (MTA) in rice [29–32]. With the development of high-throughput sequencing and SNP chip techniques, genome-wide association study (GWAS) using high density markers becomes more and more popular in rice genetics. The first GWAS in rice was conducted by Huang et al.  using 517 rice landraces genotyped by sallow sequencing. Among the QTL identified two and five were for GW and GL, respectively . Applying the same method to a large panel of 950 worldwide rice accessions, Huang et al.  identified two, four and two QTL for GW, GL and thousand grain weight (TGW), respectively. Based on genotyping 44,100 SNP across 413 diverse accessions from 82 countries, Zhao et al.  identified eight,10 and 15 QTL for GW, GL and GLWR, respectively. Begum et al.  identified seven, 10 and 11 QTL for GW, GL and GLWR, respectively, using advanced lines from IRRI’s irrigated breeding program genotyped with 71,710 SNP generated by Genotyping-by-sequencing (GBS). Similarly, using 533 O. sativa landrace and elite accessions genotyped with 4,358,600 SNP derived from GBS, Yang et al.  identified 10, 11, 19 and 21 QTL for GLWR, TGW, GW and GL, respectively. These studies clearly demonstrated that GWAS offers higher resolution for QTL mapping and can be used as a powerful complementary strategy to the classical linkage mapping using bi-parental segregation populations.
Many of the reported association studies in rice included grain traits, such as GL, GW, GLWR, GT and TGW. However, rice chalkiness and milling quality traits were less studied. The objective of this study is to identify markers associated with 10 grain appearance quality and milling quality traits using an indica collection of 272 accessions. The panel was phenotyped in four locations representing major indica rice production environments in China. GBS and Diversity Arrays Technology (DArT) based on next-generation sequencing technique called DArTseq™ was used to generate genome-wide markers. The association panel was characterized for population structure using three different methods and for LD pattern by estimating basal LD and LD decay in the whole population and subpopulations. Sixteen association mapping models were tested to choose the best model for each of the trait-environment combinations. A number of known QTL and novel QTL were identified.
Materials and Methods
Association mapping panel
A total of 272 indica rice accessions from 31 countries or regions (S1 Table) were used in this study. More accessions came from China (39), Philippines (36), Madagascar (29), India (28), Senegal (24), Sri Lanka (24) and Bangladesh (15). For other countries or regions the number of accessions was fewer than eight.
Field trials were conducted using a randomized complete block design with two replicates in four environments including Hangzhou (HZ), Jingzhou (JZ), Sanya (SY) and Shenzhen (SZ) in China. The sowing and transplanting dates varied for the testing environments to fit into the local planting season. In all environments, the plot size was three rows of 10 plants planted at a spacing of 20cm x 25cm. Local farmers’ management practices were followed. At maturity, eight plants in the middle row were harvested. The institutions including Yangtze University, Zhejiang Academy of Agricultural Sciences, Shenzhen Institute of Breeding & Innovation permitted the field trials were conducted in their experimental fields.
The nature dried seeds stored in room temperature for three months were used to measure quality traits. GL (mm), GW (mm), GLWR, GT (mm), TGW (g), BRR (%), MRR (%) and HMRR (%) were measured according to the National Rice Grain Quality Assessment Standard of China (GB/T17891-1999). The grain chalkiness characteristics were measured using a JMWT12 Rice Appearance Quality Detector (Dong Fu Jiu Heng, Beijing). PGWC (%) was the percentage of head milled grains with chalkiness. DEC (%) was calculated as the product of PGWC and chalkiness size (the area of chalkiness divided by the area of whole grain). All measurements were conducted for two independent samples.
Due to various reasons not all accessions had phenotypic data in all the four testing environments. The final population size for each of the trait-environment combinations varied greatly (Table 1). Phenotypic analysis was conducted using linear mixed models to properly handle the unbalance data. For single-site analysis, accession (genotype) was regarded as fixed effect and replicate as random effect. The best linear unbiased estimates (BLUE) of accessions were obtained. For multi-site analysis, all effects including accession (genotype), environment and replicate within environment were regarded as random to estimate variance components. The best linear unbiased predictions (BLUP) for each of the genotype-environment combinations were predicted. All analyses were conducted using the PBTools package of R  developed by IRRI (bbi.irri.org). Phenotypic correlations were computed from the BLUE using the “rcorr” function implemented in the R package Hmisc . Narrow-sense heritability (h2) based on genotypic means was computed using the estimated variance components as VG / (VG + VGEI /s +Ve /sr). Where, VG, VGEI, Ve are the variance of genotype, genotype by environment interaction (GEI) and residual error, respectively, s is the number of environments and r is the number of replicates.
Genomic DNA was extracted for each accession from the leaf tissues of a single plant using the MATAB method described in Risterucci et al.  and then diluted to 100 ng/μl. Genotyping was conducted at Diversity Arrays Technology Pty Ltd. (DArT P/L), Australia, using a method of DArTseq™. The method involves genome complexity reduction using PstI/TaqI restriction enzymes followed by Illumina short-read sequencing. PstI specific adapters tagged with 96 different barcodes to encode a plate of DNA samples were ligated to the restriction fragments. The resulting products were amplified and checked for quality. The 96 samples were pooled and run in a single lane on an Illumina Hiseq2000 instrument. The PstI adapters included a sequencing primer so that the tags generated were always read from the PstI sites. As detailed by Courtois et al. , the resulting sequences were filtered and split into their respective target datasets, and the barcode sequences were trimmed. The sequences were trimmed at 69 bp (5 bp of the restriction site plus 64 bases with a minimum Q score of 10). A proprietary analytical pipeline developed by DArT P/L was used to produce DArT score tables and SNP tables. The remaining 69 bp sequences were aligned to the Os-Nipponbare-Reference-IRGSP-1.0  pseudomolecule assembly using Bowtie v0.12  with a maximum of three mismatches to recover the position of the restriction site for the DArT markers and the position of the polymorphism(s) within the 69 bp sequences for the SNPs. For the DArT markers, the position given is that of the second base of the 6 base PstI restriction site (5'-C|TGCAG-3') because the mutated base is unknown and can be any of the six. The same sequences were then aligned to the pseudomolecules using BLAST (e-value <1.0 e-20) to assess whether additional sequences could be positioned. The sequences that had only one hit on the pseudomolecules or had more than one hit but with a difference of at least 1.0 e-5 between the first and the second hits were retained for further analyses. When the marker position fell within a Michigan State University-annotated gene (http://rice.plantbiology.msu.edu/), the feature was determined (intron, exon, 3' or 5' UTR), and the name and function of the gene were retrieved. Call rates were measured for all markers, and markers with call rates below 80% were discarded. The allele frequency of the remaining markers was then calculated, and markers for which the minor allele had a frequency below 2.5% were also discarded.
Imputation of missing genotypic data
Markers for which the minor allele frequency (MAF) is below 2.5% were discarded before imputing missing data. Missing data were estimated using Beagle v4.0 . Beagle uses a localized haplotype cluster model. It is a special class of directed acyclic graph which empirically models haplotype frequencies on a local scale and therefore adapts to local structure in the data. The model determines a hidden Markov model that can be used to find the most likely haplotype pair for each individual, given the genotype data for that individual and the graphical haplotype frequency model. The method works iteratively using an expectation maximization type approach. The imputed missing data, probabilities of missing genotypes and inferred haplotypes are calculated from the model that is fitted at the final iteration.
Analysis of population structure
Different statistical methods were employed to infer the number of subgroups in the panel. First, a model based Bayesian clustering analysis method implemented in STRUCTURE software version 2.3.4  was used. The program was run with the following parameters: k, the number of groups in the panel varying from 1 to 9; 10 runs per k value; for each run, 10,000 burnin iterations followed by 10,000 MCMC (Markov Chain Monte Carlo) iterations. The optimal number of groups (k) was determined by Δk following the method described in Evanno et al. . STRUCTURE analysis was conducted for different numbers of markers evenly distributed on the whole genome. Very similar results were obtained when the number of markers was more than 1000. Second, a multi-dimensional scaling and cluster analysis method implemented in Awclust package  was used. Third, Tdiscriminant analysis of principal components (DAPC) method implemented in adegenet package  was used. The “find.cluster” function was used to identify clusters (k) and the optimal k value was determined according to the Bayesian Information Criterion (BIC), then the “dapc” function was used to check the classification quality.
Three methods were used to calculate kinship coefficient matrix (K). (1) Pairwise_IBS: the distance between two accessions is computed as the proportion of SNP which are different. The IBS is computed by subtracting all values of the distance matrix from 2 and then scaling the resultant matrix so that the minimum value is 0 and the maximum value is 2. It is implemented in TASSEL 5.2.6 . (2) The scaled_IBS: the IBS was scaled to have the mean diagonal element equal to 1+F, where F is the inbreeding coefficient of the current population. It is the default method of the TASSEL 5.2.6 . (3) The VanRaden method: the realized relatedness between individuals is computed according to ZZ'/(2∑pi(1-pi)) where Z = W-P, W is the marker matrix, P contains the allele frequencies multiplied by 2, pi is the allele frequency of marker i, and the sum is over all loci . It is the default method of GAPIT .
LD was measured by squared allele frequency correlations (r2) values between the pairs of markers using TASSEL 5.2.6 . Only markers with MAF greater than 10% were used. Marker pairs were discretized into bins of 10 kb and the median r2 value was used as the estimate of r2 of a bin. The estimated r2 values were formulated as a function of physical distances between markers. A power law curve (y = axk) was fitted to determine the physical position (x) corresponding to a given r2 value (y) . The following scheme was used to estimate the background LD beyond which LD was assumed to be caused by genetic linkage: One marker per chromosome was randomly sampled and the maximum of all the pairwise r2 values was taken. The median of the maximum r2 values of 10,000 samples was taken as the basal r2 value. The physical distance at which r2 reached to the basal value was determined as the LD decay distance of the population. All analysis was done for the whole population and the three subpopulations inferred by STRUCTURE.
All association models under the unified model for association mapping  can be described by considering how the two factors, the population structure (Q) and genetic relatedness between genotypes (K), are considered. We used four options for handling Q and four options for handling K to create 16 models. The four options for Q were: no Q, Q3 derived from STRUCTURE, Q7 derived from STRUCTURE and C7 derived from adegenet. The four options for K were: no K, K computed as pairwise_IBS (Kp scaled_IBS (Ks) and VanRaden method (Kv). All analyses were conducted using TASSEL 5.2.6 . For models without K, known as general linear model (GLM) 1,000 permutations were used. For models with K, known as mixed liner model (MLM), the compressed mixed linear model approach [53, 54] and P3D  algorithms were adopted to reduce computing time. The best model for each trait-environment combination was chosen using the mean square difference (MSD) between observed and expected p-values of all marker loci, as suggested by Stich et al. . MSD was a measure for the deviation of the observed p-values from the uniform distribution. Model with a smaller MSD is more appropriate. The critical p-value for declaring significant MTA was set to 0.0001.
Basic statistics of markers
A total of 22,266 polymorphic markers were detected in the panel including 9,340 SNP and 12,926 DArT markers. By removing markers with MAF less than 5%, 18,824 high quality markers (7,885 SNP and 10,939 DArT) were used in association analysis. The number of markers per chromosome ranged from 891 on chromosome 10 to 2,361 on chromosome 1. The size of chromosome varied from 22.8Mb for chromosome 9 to 43.2 Mb for chromosome 1. The whole genome size was 371.7 Mb and the average distance between neighboring markers (marker spacing) was 20.2 kb. The average marker spacing ranged from 16.3 kb for chromosome 11 to 25.9 kb for chromosome 10 (S2 Table). About 70.4% of the neighboring markers distances were below the average value and 97.4% were less than 100 Kb. There were eight gaps devoid of markers larger than 500 kb on chromosomes 1 (D01_26116212—S01_26770440), 2 (D02_13852683—D02_15083642), 4 (D04_8765494—D04_9302397 and D04_16774867—D04_17319136), 6 (D06_18215124—D06_18803206), 7 (D07_11700243—S07_12785983 and S07_13836518—D07_14505070) and 11 (D11_11988263—S11_12639397). More than half of the markers (55.7%) had MAF lower than 0.20 (Fig 1).
Phenotypic variation and trait correlation
In general, the panel revealed a wide range variation for all the evaluated traits (Fig 2). Most of the traits appeared to be normally distributed, but some trait-environment combinations showed skewed distributions. For instance, the distributions of DEC in all environments, HMRR and PGWC in HZ and BRR in JZ were seriously skewed (Fig 2). Variance components derived from across-site analysis were given in Table 2. For appearance quality traits, the VG was much larger than VGEI, indicating that GEI was negligible. For BRR, MRR and HMRR, the VGEI was much larger than VG, suggesting that GEI was very important. However, the Ve for HMRR was very large, indicating phenotyping precision was low. The h2 for the appearance quality traits were high ranging from 0.82 for PGWC to 0.98 for GL and GLWR, while were low for the milling quality traits ranging from 0.19 for MRR to 0.27 for HMRR.
HZ: Hangzhou. JZ: Jingzhou. SY: Sanya. SZ: Shenzhen. GL: Grain Length. GW: Grain Width. GLWR: Grain Length and Width Ratio. GT: Grain Thickness. TGW: Thousand Grain Weight. BRR: Brown Rice Rate. MRR: Milled Rice Rate. HMRR: Head Milled Rice Rate. PGWC: Percentage of Grains With Chalkiness. DEC: Degree of Endosperm chalkiness.
The phenotypic correlations between traits were similar in the four testing environments. Low correlations were found between grain appearance quality traits and milling quality traits in all environments. GL showed a moderate and negative correlation with GW and a low and negative correlation with GT. Correlation between GW and GT was moderate and positive. TGW positively correlated with GL, GW and GT. High correlation was found between PGWC and DEC in all environments with coefficient ranging from 0.90 in SZ to 0.94 in HZ. The correlation between TGW and GT was moderate in all environments ranging from 0.46 in SY to 0.57 in HZ. Negative and moderate correlation was found between PGWC and HMRR. The correlation between PGWC and DED was strong and positive. The correlations between the three milling quality traits were positive ranging from low to moderate (Fig 3).
According to the value of Δk, there were three groups in the current panel (S1 Fig). Group-I consisted of 88 accessions from 21 countries with 36 from China (36). Group-II had 96 accessions from 19 countries with 30 and 23 from Philippines and Senegal. Group-III had 88 accessions from 13 countries with more accessions from Madagascar (28), India (20) and Sri Lanka (19) (S2 Fig).
Hierarchical cluster analysis using Awclust suggested that the best number of clusters was also three based on Gap stastics (S1 Fig). The classification of accessions into groups was similar to that derived from STRUCTURE. Fifteen accessions of the group Q3-2 of STRUCTURE were assigned to the C3-3 of Awclust (S3 Fig).
The “find.clusters” function of adegenet showed a clear decrease of BIC until k = 7 clusters, after which BIC increased (S1 Fig). In this case, the elbow in the curve also matched the smallest BIC, and clearly indicated that seven clusters should be retained. Roughly speaking, the three groups detected by STRUCTURE were divided into two, two and three clusters by adegenet (S3 Fig). If we also chose seven groups for STRUCTURE analysis the Q3-1, Q3-2 and Q3-3 were further divided into two, two and four groups (S3 Fig). There were some differences in the assignments of accessions between k = 7 in STRUCTURE and adegent. The biggest difference was that the C7-4 of adegenet was distributed over five of the seven groups of STRUCTURE (S3 Fig).
Whole genome patterns of LD decay
The decay of LD along physical distances was computed for both the whole population (272 accessions) and three subpopulations inferred by STRUCTURE k = 3. The determination coefficients (R2) of regression models were 0.87, 0.93, 0.88 and 0.95 for the whole population, Q3-1, Q3-2 and Q3-3, respectively (S3 Table). The maximum r2 was observed in the 0–10 kb marker interval for the whole population and subpopulations. The maximum r2 was lower in the whole population (0.65) than in the subpopulations. Q3-2 had the largest maximum r2 than Q3-1 (0.76) and Q3-3 (0.71). The basal r2 was 0.11, 0.16, 0.16 and 0.19 for the whole population, Q3-1, Q3-2 and Q3-3, respectively (S3 Table). The LD decay distance was 150 kb for the whole population, which was longer (slower) than those for Q3-1 (110 kb) and Q3-3 (80 kb) but shorter (faster) than that for Q3-2 (240 kb) (Fig 4 and S3 Table).
Comparison of models.
A total of 16 models for detecting associations were compared to choose the best model for each of the trait-environment combinations using MSD. The complete MSD results were given in S4 Table. Fig 5 gave the quantile-quantile plots for the 16 models for GL measured in HZ to illustrate the relative differences between models. The best model varied with traits and environments (S4 Table). For all the trait-environment combinations except BRR in SY, the naïve model was always inferior to other models (S4 Table), indicating complex population structure and genetic relatedness between accessions present in our panel affected the detection of MTA. According to the average MSD value across all trait-environment combinations, the Q models (Q3, Q7 and C7)that corrected for population structure only, were inferior to the K models (Kp, Ks and Kv) that corrected for the genetic relationships between accessions and the QK models (Q3Kp, Q3Ks, Q3Kv, Q7Kp, Q7Ks, Q7Kv, C7Kp, C7Ks and C7Kv) that corrected for both of population structure and genetic relationship (S4 Table). The QK model was the best model for most of the trait-environment combinations. The Q7 model was slightly better than the other two Q models. The Kp model was better than the Ks model and Kv model. The C7Kp model was better than the other QK models. For the models with the same Q but different K, the Q-Kp and Q-Ks models were similar and were slightly better than the Q-Kv model. The models with the same K but different Q had very similar MSD values. In the following sections only the MTA identified using the best model for each of the trait-environment combinations were presented and discussed.
A total of 74 markers on all 12 chromosomes were found to be significantly associated with the 10 measured traits. By delineating significant markers with LD higher than 0.2 into a single QTL, the 74 markers were grouped into 38 QTL (Table 3).
For GL, eight QTL were identified on chromosomes 2, 3, 5, 7, 9 and 11. qGL2.1 (D02_25608957) was detected in SY and explained 7.5% of the phenotypic variation (R2 = 7.5%). qGL2.2 (S02_305731433, S02_30664746 and S02_30664811) was detected in HZ, SY and SZ with R2 being 8.1%, 7.4% and 7.0%, respectively. qGL3.1 (S03_16663793, S03_16731182, S03_16748937, S03_16762099, S03_16858510, S03_16996600 and S03_17000111) was identified in HZ, SY and SZ with R2 being 8.0%, 10.2% and 10.5%, respectively. qGL5.1 (S05_5368086, S05_5368151 and S05_5369527) was detected in SY with a R2 of 7.5%. The qGL7.1 (S07_22019132, S07_22087092 and S07_22099651) was detected in HZ, SY and SZ and explained 7.8%, 7.7% and 6.4% of the phenotypic variance. qGL7.2 (S07_22844850) was detected in SY with a R2 of 7.5%. qGL9.1 (S09_10774239) was detected in JZ and explained 11.7% of the phenotypic variance. qGL11.1 (S011_2576141) was detected in JZ and SY with R2 being 12.0% and 9.6%, respectively (Table 3).
For GW, five QTL on chromosomes 4, 5, 7, 10 and 11 were identified. qGW4.1 (S04_7167748) was detected in SZ with a R2 of 6.8%. qGW5.1 (S05_5368086, S05_5368151 and S05_5369527) was detected in JZ, SY and SZ with R2 being 16.2%, 10.0% and 12.8%, respectively. qGW7.1 (S07_19590107 and S07_19592078) was detected in JZ with a R2 of 13.6%. qGW10.1 (S010_14195109 and S010_14262980) was detected in SZ with a R2 of 6.6%. qGW11.1 (D011_7124485) was detected in SZ with a R2 of 7.0% (Table 3).
For GLWR, six QTL were identified on chromosomes 1, 3, 5, 7 and 11. qGLWR1.1 (S01_5811755) was detected in HZ and SZ with R2 being 7.4% and 5.8%, respectively. qGLWR3.1 (D03_14077712) was detected in SZ with a R2 of 5.8%. qGLWR3.2 (S03_16858510) was detected in SY and SZ with R2 being 6.3% and 6.4%, respectively. qGLWR5.1 (S05_5368086, S05_5368151 and S05_5369527) was detected in HZ, JZ, SY and SZ with R2 being 7.4%, 14.1%, 11.4% and 7.6%, respectively. qGLWR7.1 (S07_25383179) was detected in SY with a R2 of 6.6%. qGLWR11.1 (S011_2576141) was detected in JZ and explained 6.6% of the phenotypic variance (Table 3).
For GT, four QTL were found on chromosome 1, 4 and 5. qGT1.1 (D01_2405901) was detected in HZ with a R2 of 9.4%. qGT4.1 (S04_29442031 and S04_29495556) was detected in SZ with a R2 of 7.0%. qGT5.1 (S05_5368086, S05_5368151 and S05_5369527) was detected in HZ, JZ, SY and SZ with R2 being 13.1%, 8.6%, 8.5% and 15.8%, respectively. qGT5.2 (S05_18307877, S05_18314309 and S05_18321742) was detected in SZ with a R2 of 6.7% (Table 3).
For TGW, five QTL were found on chromosome 2, 5 and 7. qTGW2.1 (S02_30688426) was detected in SZ with a R2 of 7.7%. qTGW5.1 (S05_4808850, S05_4856967, S05_4880400 and S05_4889259) was detected in HZ with a R2 of 10.1%. qTGW5.2 (S05_5368086 and S05_5369527) was detected in SZ with a R2 of 8.9%. qTGW7.1 (S07_21628162) was detected in HZ with a R2 of 10.3%. qTGW7.2 (S07_22684516) was detected in SZ with a R2 of 7.7% (Table 3).
For PGWC, two QTL on chromosomes 5 and 6 were identified. qPGWC5.1 (S05_5368086, S05_5368151 and S05_5369527) was detected in JZ and SZ with R2 being 10.9% and 17.1%, respectively. qPGWC6.1 (S06_6372444) was detected in SZ with a R2 of 10.8% (Table 3).
For DEC, five QTL were identified on chromosomes 1, 5 and 9. qDEC1.1 (S01_5811836 and S01_5972775) was detected in SZ with a R2 of 7.4%. qDEC5.1 (S05_4180511) was detected in HZ with a R2 of 10.3%. qDEC5.2 (S05_5368086, S05_5368151 and S05_5369527) was detected in SY and SZ with R2 being 8.7% and 8.3%, respectively. qDEC5.3 (S05_23647280) was detected in SZ with a R2 of 7.9%. qDEC9.1 (S09_16262697) was detected in HZ with a R2 of 11.1% (Table 3).
For BRR, five QTL located on chromosomes 2, 3, 4 and 7 were detected. qBRR2.1 (D02_8060607 and D02_8075892) was identified in SZ with a R2 of 8.5%. qBRR2.2 (D02_25652984) was identified in SZ with a R2 of 15.6%. qBRR3.1 (S03_22356459) was identified in HZ with a R2 of 10.6%. qBRR4.1 (S04_32002470) was identified in HZ with a R2 of 14.5% (Table 3).
For MRR, five QTL located on chromosomes 1, 2, 3, 7 and 11 were detected. qMRR1.1 (S01_41957924, S01_41957989 and S01_41970208) was identified in HZ with a R2 of 6.6%. qMRR2.1 (D02_26022390, D02_26022455, D02_26163208, D02_26265084 and D02_26312559) was identified in JZ with a R2 of 22.8%. qMRR3.1 (S03_4636350) was identified in SZ with a R2 of 7.3%. qMRR7.1 (S07_5075055 and S07_5081323) was identified in JZ with a R2 of12.8%. qMRR11.1 (D11_6895700) was identified in SY with a R2 of 7.8% (Table 3).
Six QTL were detected for HMRR. They were on chromosomes 3, 4, 5, 6, 8 and 12. qHMRR3.1 (D03_31372694 and D03_31380514) was identified in SZ with a R2 of 9.5%. qHMRR4.1 (S04_28018950) was identified in HZ with a R2 of 14.9%. qHMRR5.1 (S05_12426396) was identified in SZ with a R2 of 7.9%. qHMRR6.1 (D06_2712642) was identified in HZ with a R2 of 11.0%. qHMRR8.1 (S08_2638229) was identified in SZ with a R2 of 8.9%. qHMRR12.1 (S012_26692818 and S012_26695253) was identified in SZ with a R2 of 9.5% (Table 3).
Some of the QTL for different traits were in the same chromosomal regions and were regarded as the same QTL when the number of QTL was counted across traits. They were qDEC1.1 and qGLWR1.1 on chromosome 1; qBRR2.2 and qGL2.1 on chromosome 2; qGL2.2 and qTGW2.1 on chromosome 2; qGLWR3.2 and qGL3.1 on chromosome 3; qGL5.1, qGW5.1, qGLWR5.1, qGT5.1, qTGW5.2, qPGWC5.1 and qDEC5.2 on chromosome 5; qGL7.2 and qTGW7.2 on chromosome 7; qGW11.1 and qMRR11.1 on chromosome 11; and qGLWR11.1 and qGL11.1 on chromosome 11.
Phenotypic variation and trait correlation
Great variation was observed for all traits in all testing environments in the present study, suggesting that the association panel can be effectively used for GWAS of various quality traits. Grain appearance quality traits, particularly grain size and weight traits had high heritability and showed only small GEI (Table 2), suggesting that breeding for a wide range of growing environments could be feasible and effective. Grain chalkiness and milling quality traits had low heritability and large contribution of GEI and as a result will be more difficult to improve through conventional selection breeding.
The relationship between grain quality traits in rice is complex. In the present study, positive correlations were observed between TGW and GL, GW and GT, which was consistent with previous studies [2, 26, 56]. The coefficients were moderate, which might be the result of negative correlations between GL and GW and GT. The correlation between GL and GW were -0.54, -0.64, -0.50 and -0.55 in HZ, JZ, SY and SZ, respectively. However, a weak and positive (0.11 in SY) or negative (-0.11, -0.07 and -0.09 in JZ, HZ and SZ, respectively) correlation was observed between TGW and GLWR, which was different with the result of Xu et al. . They found TGW and GLWR had a moderate and positive correlation (0.44). Consistent with previous reports, BRR was negatively correlated with GL but positively correlated with GW, even though the correlation coefficients were small. In the process of milling, the long and slender rice grain is easier to break than short and round one. Chalkiness is also an important factor influencing rice milling traits especially HMRR. Chalky grains are also more prone to breakage at chalky area. We found HMRR was negatively correlated with PGWC and DEC, which was consistent with the study of Zheng et al. .
The basal r2 was 0.11 and estimated LD decay distance was 150 kb for the whole population (S3 Table). The LD decay distance ranged from 80 kb to 240 kb in three subpopulations. This was in agreement with previous finding that cultivated rice has a long range LD from close to 100 kb to over 200 kb [59, 60]. Compared with the whole population, LD dropped much faster in Q3-1 and Q3-3 but more slowly in Q3-2. The effect of population structure on LD pattern has been widely reported [52, 60–62]. Varying patterns of LD decay in different subpopulations were likely reflecting their breeding histories and the origins of accessions in subpopulations  and may influence the QTL mapping resolution of the panel. Given that our average marker density (one marker per 20.2 kb) was higher than the r2 decay, we expected to have reasonable resolution and power to identify common variants of large effect associated with traits of interest.
It is well-known that the results of association analysis using different models would be very different. However, there is no generally acceptable best model. It is likely that the best model differs among populations, traits and growing environments. The most appropriate model should be chosen by comparing different models. In the present study, MSD proposed by Stich et al.  was used to determine the best model for each of the 40 trait-environment combinations. The results showed that the naïve model and the Q models were almost always inferior to the K and QK models (S4 Table). The average MSD value across all trait-environment combinations of the naïve model was much higher compared to those of the Q models, indicating introduction of population structure could tremendously reduce false positives. Q derived from different numbers of subpopulations (k) also leaded to different results. According to MSD, increasing k from three to seven resulted in better control of false positives (S4 Table). When population structure was detected by different methods, the resultant Q and as a result the association results were different. For instance, the Q7 model, which used the Bayesian clustering analysis method implemented in STRUCTURE, was better than the C7 model, which used the Q from the DAPC method (S4 Table). It was clear from our results that the three Q models were rarely the best model and the K and QK models were better (S4 Table). This was consistent with most of the previous studies that showed that GLM method (Q model) was inferior to MLM method (K and QK models). A recent example in rice was the study ofCourtois et al. . They performed association study for 16 root depth and associated traits in a panel of japonica accessions and chose the best model by comparing the likelihoods of each model using the BIC . Their results showed that the QK model was the best model for 13 measured traits, the K model was the best for the other 3 traits, and the Q model was always inferior to the other two models.
For the models with the same Q but different K, the Q-Kp and Q-Ks models were similar and were slightly better than the Q-Kv model (S4 Table). The models with the same K but different Q had very similar MSD values, indicating that once the genetic relatedness between genotypes was well accounted, the differences between methods for inferring Q became less important. Therefore, when both Q and K were considered simultaneously, K played a more important role than Q and different K should be tried to determine the best model.
The best model differed for the same trait measured in different environments and different traits measured in the same environment (S4 Table). For traits with low heritability, the best model changed greatly across environments. For instance, BRR (h2 = 0.27), MRR (h2 = 0.19) and HMRR (h2 = 0.23), there was no common best model in the four environments. However, for traits with high heritability the same best model tended to be found across environments. For instance, Q7Kv was the best model for GL (h2 = 0.98) in HZ, JZ and SY and C7Kv was the best model for GW (h2 = 0.97) in HZ and SZ and Kv was the best model for GW in JZ and SY.
In the present study, we selected the model with the smallest MSD value as the best model. It was observed that models with very similar MSD values had very similar association results, indicating that MSD is a simple but effective method for choosing the appropriate model(s) in GWAS.
The effects of environment and QEI
In the present study, the total number of identified QTL varied greatly across environments, indicating that testing environment had important influence on GWAS. Out of the 38 identified QTL only one (qGLWR5.1 (qGT5.1)), four (qGL2.2, qGL3.1, qGL7.1 and qGW5.1) and four QTL (qGLWR1.1, qGL11.1, qPGWC5.1 (qDEC5.2) and qGLWR3.2) were detected in four, three and two environments, respectively (Table 3). Furthermore, even for qGLWR5.1, which was detected in all the four environments, the R2 varied from 7.4% to 14.1% across environments. These results may be partially attributed to QEI. For BRR, MRR and HMRR that showed significant GEI, all QTL were detected in only one of the four environments (Table 3). However, the big differences in population size across environments may also be partially responsible for the observed differences between environments. For most of the traits the population size was much larger in SY and SZ than in HZ and JZ (Table 1).
GWAS is effective
In the present study, five out of the 38 identified QTL were adjacent to previously known genes or fine mapped QTL, suggesting that association mapping was an effective way to map QTL for rice grain quality traits. On chromosome 5, qGL5.1, qGW5.1, qGLWR5.1, qGT5.1, qTGW5.2, qPGWC5.1 and qDEC5.2 were located in genomic region (S05_5368086, S05_5368151 and S05_5369527) close to the cloned gene qSW5 affecting GW and TGW . qSW5 was in 5362625bp - 5370506bp in the Os-Nipponbare-Reference- IRGSP-1.0 according to the complete cds sequence of qSW5 in indica. The qSW5 functioned as negative regulator of grain width and weight and a deletion in qSW5 resulted in a significant increase in sink size owing to an increase in cell number in the outer glume of the rice flower and finally increased the grain width and weight . The qSW5 was also reported to regulate GL, GLWR and GT . However, the effects of qSW5 on PGWC and DEC have not been reported before. Our results suggested that qSW5 could be used to increase GL and reduce DEC and PGWC simultaneously. The qGL3.1 (S03_16663793, S03_16731182, S03_16748937, S03_16762099, S03_16858510, S03_16996600 and S03_17000111) and qGLWR3.2 (S03_16858510) associated with GL and GLWR, were close to GS3 (16729228–16735641) affecting grain shape. GS3 functions as a negative regulator for grain size which is a major QTL for GL and TGW and a minor QTL for GW and GT . The qGL7.2 (S07_22844850) and qTGW7.2 (S07_22684516) were located in the region harboring the previously fine mapped QTL affecting GL, GW and GLWR, qGRL7.1 (22127.4Kb—24526.7Kb) . The qGLWR7.1 (S07_25383179) was located in the region of Srs1 (25381698–25389547) affecting grain size . The srs1 results in small and round seed due to the reduction in both cell length and cell number in the longitudinal direction, and the elongation of the cells in the lateral direction of the lemma. The qGL11.1 (S11_2576141) and qGLWR11.1 (S11_2576141) were close to CycT1;3 (2730143–2736649) whose down regulation results in a shorter grain.
Thirty-three of the identified QTL were located on chromosomal regions where no QTL for the measured traits have been reported and can be regarded as novel QTL. Five of the QTL including qDEC1.1 (qGLWR1.1), qBRR2.2 (qGL2.1), qTGW2.1 (qGL2.2), qGW11.1 (qMRR11.1) and qGL7.1 affected multiple traits with relatively large effects and/or were detected in multiple environments.
The number of QTL identified is small
Compared to the previous GWAS studies in rice the number of QTLs detected was small in the present study. Many of the known genes affecting traits measured were not detected in the present study. The most likely reason is the small population size. Most of the association panels had more than 400 accessions while the present panel only had 230 or fewer accessions. The effect of population size on GWAS is well known. It is particularly important for detecting loci with low MAF to use a large panel. Because the panel size was small markers with MAF <0.05 were filleted out before association analysis. The trait-affecting genes were missed out if markers in strong LD with them had low MAF in our panel. For example, the gene PGL2 (LOC_Os02g51320) was involved in controlling grain length and weight of rice through interaction with a typical bHLH protein APG. Marker S02_31422564 very close to this gene (1.4 kb) was removed before association analysis since its MAF was only 0.02. The big differences between environments in population size could also explain why most of the QTL (53%) were detected in SZ. Nevertheless, loci with large effects were also detected for some of the traits with small population size. For instance, the population size of GW in JZ is only 132, but qGW5.1 was identified with a high R2 of 16.2%. Therefore, provided that the MAF is high enough, major loci would still be detected even if the population size was small. Another possible reason is that the present panel consists of only indica accessions while most of the previous studies used panels with a few subspecicies/ecotypes. It is expected that genes that determine the major trait differences between subspecies/ecotypes could not be detected in our study but might be detectable in other studies. We also expected that more false positives were present when a panel with very diverse accessions was used. The use of a single subspecies allows controlling population structure at finer scale and as results reduces false positives. The critical significant threshold value used to declare significant association also influenced the analysis results. For instance, the qGW5.1 was found to be significant in JZ, SY and SZ but not in HZ. By inspecting the p values in HZ, it could be seen that the p values of S05_5368086, S05_5368151 and S05_5369527, were 0.00065, 0.00035 and 0.00014, respectively, which were only slightly higher than the applied critical significant threshold 10−4. It might be worth pointing out that the number of markers used in the present study was smaller than or similar to the reported GWAS in rice. However, the marker density was sufficient for most of the genomic regions based on LD decay pattern. Furthermore, the eight gaps devoid of markers below critical LD did not harbor any known genes affecting the studied traits. Therefore, the relatively low marker density was unlikely a major contributor to the small number of detected QTL. Nevertheless, the marker density might be too low in some chromosomal regions where LD decays more rapidly or breaks down due to recombination events . For instance, the GW2 (LOC_Os02g14720) controlling grain width and weight, was in the marker interval of S02_8069581—S01_8181973. However, we could not identify any markers associated with GW or TGW in this region, since LD was very low (about 0.02). Additional markers in such regions are clearly needed.
Wide ranges of genetic variations were present in the indica association panel for all 10 measured grain quality traits. The accessions could serve as sources of novel genes and alleles for improving rice appearance quality and milling quality traits. GWAS conducted using proper statistical models is a powerful method for elucidating the genetic basis of rice quality traits. Although it is generally true that MLM is better than the naïve model and GLM (Q) model for GWAS, the most appropriate model varies across traits and environments and must be chosen by comparing different models. MSD was found to be a simple but effective method for choosing the appropriate model in GWAS. Based on the selected best model for each of the trait-environment combinations, 38 QTL were identified for 10 rice grain appearance quality and milling quality traits including GL, GW, GLWR, GT, TGW, PGWC, DEC, BRR, MRR and HMRR. Eight genomic regions harbored QTL for more than one trait. Five QTL were found to concur with previously cloned genes or fine mapped QTL. Thirty-three were novel. Five of the novel QTL including qDEC1.1 (qGLWR1.1), qBRR2.2 (qGL2.1), qTGW2.1 (qGL2.2), qGW11.1 (qMRR11.1) and qGL7.1 affected multiple traits with relatively large effects and/or were detected in multiple environments. Our results provided an insight into the genetic architecture of rice grain appearance quality and milling quality traits. The novel QTL are important candidates for future functional characterization studies.
S1 Fig. Population structure analysis of 272 indica accessions.
(a) Log probability change (Δk) as function of number of subpopulations (STRUCTURE). (b) Gap statistics as function of number of clusters (Awclust). (c) Value of BIC versus number of clusters (adegenet).
S2 Fig. Population sub-structuring of 272 indica accessions based on 1083 SNP.
S3 Fig. Comparison of accession grouping of different statistical methods.
(a) STRUCTURE k = 3 and Awclust k = 3. (b) STRUCTURE k = 3 and adegenet k = 7. (c) STRUCTURE k = 3 and k = 7. (d) STRUCTURE k = 7 and adegenet k = 7.
S1 Table. Origin of 272 indica accessions of the association panel.
S2 Table. Distributions of markers on chromosomes.
S3 Table. Basic characteristics of LD pattern in the whole panel and three subpopulations.
S4 Table. Comparison of 16 association models based the mean squared difference (MSD) between the observed and expected p values for 40 traits (10)-environment (4) combinations.
This research was granted by Global Rice Science Partnership (GRiSP), National 863 High Technology Research and Development Program from the Ministry of Science and Technology of China (2014AA10A601), Shenzhen Peacock Plan, the Natural Science Foundation of China (31461143014, 31261140369), and Hubei Collaborative Innovation Center for Grain Industry.
Conceived and designed the experiments: JX GY. Performed the experiments: XQ ZY. Analyzed the data: YP GY. Contributed reagents/materials/analysis tools: MD DX ZL. Wrote the paper: YP GY.
- 1. Unnevehr L, Duff B, Juliano BO. Consumer Demand for Rice Grain Quality: Terminal Report of IDRC Projects, National Grain Quality (Asia), and International Grain Quality Economics (Asia): Int. Rice Res. Inst.; 1992.
- 2. Tan YF, Xing YZ, Li JX, Yu SB, Xu CG, Zhang Q. Genetic bases of appearance quality of rice grains in Shanyou 63, an elite rice hybrid. TAG Theoretical and applied genetics Theoretische und angewandte Genetik. 2000;101(5–6):823–9. doi: 10.1007/s001220051549.
- 3. Cheng FM, Zhong LJ, Wang F, Zhang GP. Differences in cooking and eating properties between chalky and translucent parts in rice grains. Food Chemistry. 2005;90(1–2):39–46. http://dx.doi.org/10.1016/j.foodchem.2004.03.018.
- 4. Zhao X, Zhou L, Ponce K, Ye G. The Usefulness of Known Genes/Qtls for Grain Quality Traits in an Indica Population of Diverse Breeding Lines Tested using Association Analysis. Rice. 2015;8(1). doi: 10.1186/s12284-015-0064-3.
- 5. Yu S, Yang C, Fan Y, Zhuang J, Li X. Genetic dissection of a thousand-grain weight quantitative trait locus on rice chromosome 1. Chinese Science Bulletin. 2008;53(15):2326–32. doi: 10.1007/s11434-008-0281-x.
- 6. Singh R, Singh A, Sharma T, Singh A, Singh N. Fine mapping of grain length QTLs on chromosomes 1 and 7 in Basmati rice (Oryza sativa L.). Journal of Plant Biochemistry and Biotechnology. 2012;21(2):157–66. doi: 10.1007/s13562-011-0080-3.
- 7. Zhang W, Sun P, He Q, Shu F, Wang J, Deng H. Fine mapping of GS2, a dominant gene for big grain rice. The Crop Journal. 2013;1(2):160–5. doi: 10.1016/j.cj.2013.10.003.
- 8. Guo L, Ma L, Jiang H, Zeng D, Hu J, Wu L, et al. Genetic Analysis and Fine Mapping of Two Genes for Grain Shape and Weight in Rice. Journal of integrative plant biology. 2009;51(1):45–51. doi: 10.1111/j.1744-7909.2008.00793.x. pmid:19166493
- 9. Kato T, Segami S, Toriyama M, Kono I, Ando T, Yano M, et al. Detection of QTLs for grain length from large grain rice (Oryza sativa L.). Breeding Science. 2011;61(3):269–74. doi: 10.1270/jsbbs.61.269.
- 10. Zhou L, Chen L, Jiang L, Zhang W, Liu L, Liu X, et al. Fine mapping of the grain chalkiness QTL qPGWC-7 in rice (Oryza sativa L.). TAG Theoretical and applied genetics Theoretische und angewandte Genetik. 2009;118(3):581–90. doi: 10.1007/s00122-008-0922-0.
- 11. Bai X, Luo L, Yan W, Kovi MR, Zhan W, Xing Y. Genetic dissection of rice grain shape using a recombinant inbred line population derived from two contrasting parents and fine mapping a pleiotropic quantitative trait locus qGL7. BMC genetics. 2010;11:16. doi: 10.1186/1471-2156-11-16 pmid:20184774; PubMed Central PMCID: PMC2846863.
- 12. Xie X, Song MH, Jin F, Ahn SN, Suh JP, Hwang HG, et al. Fine mapping of a grain weight quantitative trait locus on rice chromosome 8 using near-isogenic lines derived from a cross between Oryza sativa and Oryza rufipogon. TAG Theoretical and applied genetics Theoretische und angewandte Genetik. 2006;113(5):885–94. doi: 10.1007/s00122-006-0348-5 pmid:16850315.
- 13. Xie X, Jin F, Song MH, Suh JP, Hwang HG, Kim YG, et al. Fine mapping of a yield-enhancing QTL cluster associated with transgressive variation in an Oryza sativa x O. rufipogon cross. TAG Theoretical and applied genetics Theoretische und angewandte Genetik. 2008;116(5):613–22. doi: 10.1007/s00122-007-0695-x pmid:18092146.
- 14. Oh J-M, Balkunde S, Yang P, Yoon D-B, Ahn S-N. Fine mapping of grain weight QTL, tgw11 using near isogenic lines from a cross between Oryza sativa and O. grandiglumis. Genes & Genomics. 2011;33(3):259–65. doi: 10.1007/s13258-011-0038-9.
- 15. Song XJ, Huang W, Shi M, Zhu MZ, Lin HX. A QTL for rice grain width and weight encodes a previously unknown RING-type E3 ubiquitin ligase. Nature genetics. 2007;39(5):623–30. doi: 10.1038/ng2014 pmid:17417637.
- 16. Mao H, Sun S, Yao J, Wang C, Yu S, Xu C, et al. Linking differential domain functions of the GS3 protein to natural variation of grain size in rice. Proceedings of the National Academy of Sciences of the United States of America. 2010;107(45):19579–84. doi: 10.1073/pnas.1014419107 pmid:20974950; PubMed Central PMCID: PMC2984220.
- 17. Zhang X, Wang J, Huang J, Lan H, Wang C, Yin C, et al. Rare allele of OsPPKL1 associated with grain length causes extra-large grain and a significant yield increase in rice. Proceedings of the National Academy of Sciences of the United States of America. 2012;109(52):21534–9. doi: 10.1073/pnas.1219776110. pmid:23236132
- 18. Shomura A, Izawa T, Ebana K, Ebitani T, Kanegae H, Konishi S, et al. Deletion in a gene associated with grain size increased yields during rice domestication. Nature genetics. 2008;40(8):1023–8. doi: 10.1038/ng.169 pmid:18604208.
- 19. Li Y, Fan C, Xing Y, Jiang Y, Luo L, Sun L, et al. Natural variation in GS5 plays an important role in regulating grain size and yield in rice. Nature genetics. 2011;43(12):1266–9. doi: 10.1038/ng.977 pmid:22019783.
- 20. Li Y, Fan C, Xing Y, Yun P, Luo L, Yan B, et al. Chalk5 encodes a vacuolar H+-translocating pyrophosphatase influencing grain chalkiness in rice. Nature genetics. 2014;46(4):398–404. doi: 10.1038/ng.2923. http://www.nature.com/ng/journal/v46/n4/abs/ng.2923.html#supplementary-information. pmid:24633159
- 21. Ishimaru K, Hirotsu N, Madoka Y, Murakami N, Hara N, Onodera H, et al. Loss of function of the IAA-glucose hydrolase gene TGW6 enhances rice grain weight and increases yield. Nature genetics. 2013;45(6):707–11. doi: 10.1038/ng.2612 pmid:23583977.
- 22. Song XJ, Kuroha T, Ayano M, Furuta T, Nagai K, Komeda N, et al. Rare allele of a previously unidentified histone H14 acetyltransferase enhandes grain weitht, yield, and plalnt biomass in rice. Proceedings of the National Academy of Sciences of the United States of America. 2014;112:76–81. doi: 10.1073/pnas.1421127112. pmid:25535376
- 23. Abe Y, Mieda K, Ando T, Kono I, Yano M, Kitano H, et al. The SMALL AND ROUND SEED1 (SRS1/DEP2) gene is involved in the regulation of seed size in rice. Genes Genet Syst. 2010;85:327–39. pmid:21317545
- 24. Wang Y, Xiong G, Hu J, Jiang L, Yu H, Xu J, et al. Copy number variation at the GL7 locus contributes to grain size diversity in rice. Nature genetics. 2015;47(8):944–8. doi: 10.1038/ng.3346. http://www.nature.com/ng/journal/v47/n8/abs/ng.3346.html#supplementary-information. pmid:26147619
- 25. Wang S, Li S, Liu Q, Wu K, Zhang J, Wang S, et al. The OsSPL16-GW7 regulatory module determines grain shape and simultaneously improves rice yield and grain quality. Nature genetics. 2015;47(8):949–54. doi: 10.1038/ng.3352 pmid:26147620.
- 26. Wang S, Wu K, Yuan Q, Liu X, Liu Z, Lin X, et al. Control of grain size, shape and quality by OsSPL16 in rice. Nature genetics. 2012;44(8):950–4. http://www.nature.com/ng/journal/v44/n8/abs/ng.2327.html#supplementary-information. doi: 10.1038/ng.2327. pmid:22729225
- 27. Qi P, Lin Y-S, Song X-J, Shen J-B, Huang W, Shan J-X, et al. The novel quantitative trait locus GL3.1 controls rice grain size and yield by regulating Cyclin-T1;3. Cell research. 2012;22(12):1666–80. http://www.nature.com/cr/journal/v22/n12/suppinfo/cr2012151s1.html. doi: 10.1038/cr.2012.151. pmid:23147796
- 28. Guo LB, Ye GY. Use of major quantitative trait loci to improve grain yield of rice. Rice Science. 2014;21(2):65–82. doi: 10.1016/S1672-6308(13)60174-2.
- 29. Zhang P, Liu X, Tong H, Lu Y, Li J. Association mapping for important agronomic traits in core collection of rice (Oryza sativa L.) with SSR markers. PloS one. 2014;9(10):e111508. doi: 10.1371/journal.pone.0111508 pmid:25360796; PubMed Central PMCID: PMC4216065.
- 30. Jiangbo Z. Association analysis of important agronomic traits in japonica rice germplasm. African Journal of Biotechnology. 2012;11(12). doi: 10.5897/ajb11.1912.
- 31. Borba TCdO, Brondani RPV, Breseghello F, Coelho ASG, Mendonça JA, Rangel PHN, et al. Association mapping for yield and grain quality traits in rice (Oryza sativa L.). Genetics and Molecular Biology. 2010;33(3):515–24. doi: 10.1590/S1415-47572010005000065. pmid:21637426
- 32. Agrama HA, Eizenga GC, Yan W. Association mapping of yield and its components in rice cultivars. Molecular Breeding. 2007;19(4):341–56. doi: 10.1007/s11032-006-9066-6.
- 33. Huang X, Wei X, Sang T, Zhao Q, Feng Q, Zhao Y, et al. Genome-wide association studies of 14 agronomic traits in rice landraces. Nature genetics. 2010;42(11):961–7. doi: 10.1038/ng.695 pmid:20972439.
- 34. Huang X, Zhao Y, Wei X, Li C, Wang A, Zhao Q, et al. Genome-wide association study of flowering time and grain yield traits in a worldwide collection of rice germplasm. Nature genetics. 2012;44(1):32–9. doi: 10.1038/ng.1018 pmid:22138690.
- 35. Zhao K, Tung CW, Eizenga GC, Wright MH, Ali ML, Price AH, et al. Genome-wide association mapping reveals a rich genetic architecture of complex traits in Oryza sativa. Nature communications. 2011;2:467. doi: 10.1038/ncomms1467 pmid:21915109; PubMed Central PMCID: PMC3195253.
- 36. Begum H, Spindel JE, Lalusin A, Borromeo T, Gregorio G, Hernandez J, et al. Genome-wide association mapping for yield and other agronomic traits in an elite breeding population of tropical rice (Oryza sativa). PloS one. 2015;10(3):e0119873. doi: 10.1371/journal.pone.0119873 pmid:25785447; PubMed Central PMCID: PMC4364887.
- 37. Yang W, Guo Z, Huang C, Duan L, Chen G, Jiang N, et al. Combining high-throughput phenotyping and genome-wide association studies to reveal natural genetic variation in rice. Nature communications. 2014;5:5087. doi: 10.1038/ncomms6087 pmid:25295980; PubMed Central PMCID: PMC4214417.
- 38. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. 2015.
- 39. Jr FEH. Hmisc: Harrell Miscellaneous. R package version 3.16–0. 2015.
- 40. Risterucci AM, Grivet L, N’Goran JAK, Pieretti I, Flament MH, Lanaud C. A high-density linkage map of Theobroma cacao L. TAG Theoretical and applied genetics Theoretische und angewandte Genetik. 2000;101(5–6):948–55. doi: 10.1007/s001220051566.
- 41. Courtois B, Audebert A, Dardou A, Roques S, Ghneim-Herrera T, Droc G, et al. Genome-wide association mapping of root traits in a japonica rice panel. PloS one. 2013;8(11):e78037. doi: 10.1371/journal.pone.0078037 pmid:24223758; PubMed Central PMCID: PMC3818351.
- 42. Kawahara Y, dlB M., Hamilton JP, Kanamori H, McCombie WR, Ouyang S, et al. Improvement of the Oryza sativa Nipponbare reference genome using next generation sequence and optical map data. Rice. 2013;6:1–10.
- 43. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome biology. 2009;10(3):R25. doi: 10.1186/gb-2009-10-3-r25 pmid:19261174; PubMed Central PMCID: PMC2690996.
- 44. Browning SR, Browning BL. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. American journal of human genetics. 2007;81(5):1084–97. doi: 10.1086/521987 pmid:17924348; PubMed Central PMCID: PMC2265661.
- 45. Pritchard JK, Stephens M, Donnelly P. Inference of Population Structure Using Multilocus Genotype Data. Genetics. 2000;155(2):945–59. pmid:10835412
- 46. Evanno G, Regnaut S, Goudet J. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Molecular ecology. 2005;14(8):2611–20. doi: 10.1111/j.1365-294X.2005.02553.x pmid:15969739.
- 47. Gao X, Starmer J. Human population structure detection via multilocus genotype clustering. BMC genetics. 2007;8:34. doi: 10.1186/1471-2156-8-34 pmid:17592628; PubMed Central PMCID: PMC1934381.
- 48. Jombart T, Ahmed I. adegenet 1.3–1: new tools for the analysis of genome-wide SNP data. 2011.
- 49. Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics. 2007;23(19):2633–5. doi: 10.1093/bioinformatics/btm308 pmid:17586829.
- 50. VanRaden PM. Efficient methods to compute genomic predictions. Journal of dairy science. 2008;91(11):4414–23. doi: 10.3168/jds.2007-0980 pmid:18946147.
- 51. Lipka AE, Tian F, Wang Q, Peiffer J, Li M, Bradbury PJ, et al. GAPIT: genome association and prediction integrated tool. Bioinformatics. 2012;28(18):2397–9. doi: 10.1093/bioinformatics/bts444 pmid:22796960.
- 52. Phung NT, Mai CD, Mournet P, Frouin J, Droc G, Ta NK, et al. Characterization of a panel of Vietnamese rice varieties using DArT and SNP markers for association mapping purposes. BMC plant biology. 2014;14:371. doi: 10.1186/s12870-014-0371-7 pmid:25524444; PubMed Central PMCID: PMC4279583.
- 53. Yu J, Pressoir G, Briggs WH, Vroh Bi I, Yamasaki M, Doebley JF, et al. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nature genetics. 2006;38(2):203–8. doi: 10.1038/ng1702 pmid:16380716.
- 54. Zhang Z, Ersoz E, Lai CQ, Todhunter RJ, Tiwari HK, Gore MA, et al. Mixed linear model approach adapted for genome-wide association studies. Nature genetics. 2010;42(4):355–60. doi: 10.1038/ng.546 pmid:20208535; PubMed Central PMCID: PMC2931336.
- 55. Stich B, Mohring J, Piepho HP, Heckenberger M, Buckler ES, Melchinger AE. Comparison of mixed-model approaches for association mapping. Genetics. 2008;178(3):1745–54. doi: 10.1534/genetics.107.079707 pmid:18245847; PubMed Central PMCID: PMC2278052.
- 56. Huang R, Jiang L, Zheng J, Wang T, Wang H, Huang Y, et al. Genetic bases of rice grain shape: so many genes, so little known. Trends in plant science. 2013;18(4):218–26. doi: 10.1016/j.tplants.2012.11.001 pmid:23218902.
- 57. Xu F, Sun X, Chen Y, Huang Y, Tong C, Bao J. Rapid identification of major QTLs associated with rice grain weight and their utilization. PloS one. 2015;10(3):e0122206. doi: 10.1371/journal.pone.0122206 pmid:25815721; PubMed Central PMCID: PMC4376791.
- 58. Zheng TQ, Xu JL, Li ZK, Zhai HQ, Wan JM. Genomic regions associated with milling quality and grain shape identified in a set of random introgression lines of rice (Oryza sativa L.). Plant Breeding. 2007;126(2):158–63. doi: 10.1111/j.1439-0523.2007.01357.x.
- 59. McNally KL, Childs KL, Bohnert R, Davidson RM, Zhao K, Ulat VJ, et al. Genomewide SNP variation reveals relationships among landraces and modern varieties of rice. Proceedings of the National Academy of Sciences of the United States of America. 2009;106(30):12273–8. doi: 10.1073/pnas.0900992106 pmid:19597147; PubMed Central PMCID: PMC2718348.
- 60. Mather KA, Caicedo AL, Polato NR, Olsen KM, McCouch S, Purugganan MD. The extent of linkage disequilibrium in rice (Oryza sativa L.). Genetics. 2007;177(4):2223–32. doi: 10.1534/genetics.107.079616 pmid:17947413; PubMed Central PMCID: PMC2219496.
- 61. Pasam RK, Sharma R, Malosetti M, van Eeuwijk FA, Haseneyer G, Kilian B, et al. Genome-wide association studies for agronomical traits in a world wide spring barley collection. BMC plant biology. 2012;12:16. doi: 10.1186/1471-2229-12-16 pmid:22284310; PubMed Central PMCID: PMC3349577.
- 62. Jin L, Lu Y, Xiao P, Sun M, Corke H, Bao J. Genetic diversity and population structure of a diverse set of rice germplasm for association mapping. TAG Theoretical and applied genetics Theoretische und angewandte Genetik. 2010;121(3):475–87. doi: 10.1007/s00122-010-1324-7 pmid:20364375.
- 63. Flint-Garcia SA, Thornsberry JM, Buckler ESt. Structure of linkage disequilibrium in plants. Annual review of plant biology. 2003;54:357–74. doi: 10.1146/annurev.arplant.54.031902.134907 pmid:14502995.
- 64. Schwarz G. Estimating the Dimension of a Model. 1978:461–4. doi: 10.1214/aos/1176344136.
- 65. Yan S, Zou G, Li S, Wang H, Liu H, Zhai G, et al. Seed size is determined by the combinations of the genes controlling different seed characteristics in rice. TAG Theoretical and applied genetics Theoretische und angewandte Genetik. 2011;123(7):1173–81. doi: 10.1007/s00122-011-1657-x pmid:21805338.