Improvement of Rice Biomass Yield through QTL-Based Selection

Biomass yield of rice (Oryza sativa L.) is an important breeding target, yet it is not easy to improve because the trait is complex and phenotyping is laborious. Using progeny derived from a cross between two high-yielding Japanese cultivars, we evaluated whether quantitative trait locus (QTL)-based selection can improve biomass yield. As a measure of biomass yield, we used plant weight (aboveground parts only), which included grain weight and stem and leaf weight. We measured these and related traits in recombinant inbred lines. Phenotypic values for these traits showed a continuous distribution with transgressive segregation, suggesting that selection can affect plant weight in the progeny. Four significant QTLs were mapped for plant weight, three for grain weight, and five for stem and leaf weight (at α = 0.05); some of them overlapped. Multiple regression analysis showed that about 43% of the phenotypic variance of plant weight was significantly explained (P < 0.0001) by six of the QTLs. From F2 plants derived from the same parental cross as the recombinant inbred lines, we divergently selected lines that carried alleles with positive or negative additive effects at these QTLs, and performed successive selfing. In the resulting F6 lines and parents, plant weight significantly differed among the genotypes (at α = 0.05). These results demonstrate that QTL-based selection is effective in improving rice biomass yield.


Introduction
By 2050, we will need to feed two billion more people than at present [1,2]. How can we accomplish that task? Rice (Oryza sativa L.) is the staple food for more than half of the world's population, particularly in Asia (Ricepedia, http://ricepedia.org/rice-as-food/the-global-staplerice-consumers). In some Asian countries such as South Korea and Japan, rice is used for multiple purposes, including flour, livestock feed (including whole-crop silage), biofuel, and conservation of paddy fields. For these reasons, further improvement of rice yield potential, including biomass, is a major challenge for breeders and geneticists. To our knowledge, however, little is known about the genetic architecture of rice biomass yield, except for the semidwarfing 1 gene [3,4]. Similar to animal quantitative traits [5], rice biomass yield (including grain yield) is governed by quantitative trait loci (QTLs) with small additive effects and greatly varies among individuals in variable and nonuniform environments. For yield and other traits under this type of genetic control (many QTLs with small effects), phenotypic values are continuously distributed in segregating populations and do not show simple Mendelian inheritance [6,7].
More than 20 years ago, it was suggested that marker-assisted selection (MAS) would allow integration of molecular genetics and conventional phenotype-based selection [8]. Indeed, MAS is now widely used in rice breeding for improvement of traits, such as disease and insect resistance, for which the phenotypes can be associated with the genotypes of DNA markers [9,10]. Recent advances in rice genomics are improving our understanding of the evolution and function of the rice genome and are facilitating rice improvement [11][12][13][14]; maximizing the use of such genomic data is necessary to continue increasing rice biomass yield.
Using single-nucleotide polymorphisms detected by high-throughput resequencing, we are currently assessing the effectiveness of genomics-assisted selection approaches such as QTL analysis, association study, genomic estimation of breeding value, and haplotype analysis based on pedigree, in the improvement of rice biomass yield [15,16]. In the present study, we first analyzed QTLs for biomass yield and related traits using recombinant inbred lines (RILs) derived from a cross between the high-yielding cultivars 'Tachisugata' (TS) and 'Hokuriku 193' (H193), which are used mainly for livestock feeding. Both cultivars were produced by inter-subspecific crosses between O. sativa ssp. japonica and indica cultivars and not only produce high grain yield but also have large stems and leaves [16][17][18]. Using a large F 2 population, we then divergently selected lines that carried alleles with positive or negative additive effects at several of the detected QTLs. Finally, we compared the biomass yield of the selected lines to evaluate the effectiveness of QTL-based selection in improving biomass yield.

Plant materials and experimental design
The experimental design is depicted in Fig 1. Self-progeny derived from a cross between TS and H193 were used. For QTL mapping, a set of F 5 individuals (N = 191) was genotyped and then selfed to produce RILs. The trait value of a genotyped individual was estimated as the mean value of the resulting F 6 family (i.e., F 5:6 design). For phenotypic evaluation, RILs, parents, and their F 1 progeny were grown in two replications (three rows of 14 individuals each per plot) in summer in a paddy field (Tsukubamirai, Japan). Seeds were sown in late April and 30-day-old seedlings were transplanted in the field with a spacing of 15 cm between plants within each row and 30 cm between rows. To avoid border effects, only the middle 10 individuals in the central row of each plot were analyzed.
To evaluate the effect of QTL-based selection on biomass yield, we raised more F 2 individuals (N = 468) of the same TS × H193 cross and selected lines that carried alleles with positive or negative additive effects at detected QTLs (Fig 1). Selected lines were selfed by single-seed descent up to the F 6 generation (Fig 1). Each plot had four rows (14 individuals per row), and only the middle individuals (10 per row) in the central two rows of each plot were analyzed.

Genotyping
Genomic DNA was extracted from 2-month-old seedlings by the cetyltrimethylammonium bromide method [19]. From a previously published single nucleotide polymorphism (SNP) data set [16,20]; Q-TARO database, http://qtaro.abr.affrc.go.jp/, a subset of 192 SNPs polymorphic between TS and H193 was selected and used for genotyping. In this subset, 175 SNPs were available for QTL analysis (S1 File). For QTL-based selection, a subset of 384 SNPs was used for genotyping, of which 344 SNPs were available in the F 2 population [21]. In the F 6 generation, QTL genotypes of the selected F 2 -derived lines were confirmed by determining the genotypes of the nearest SNP markers by direct sequencing of PCR products (S1 Table). Genotyping was performed on an Illumina GoldenGate BeadArray platform (Illumina Inc., San Diego, CA, USA).

Phenotyping
Three biomass traits were evaluated: plant weight (aboveground parts only, PW, g), grain weight (GW, g), and stem and leaf weight (SLW, g). For each trait, 10 mature plants were bulked, dried for 48 h at 80°C, and weighed.
The following 11 morphological and physiological traits were also evaluated: harvest index (HI, %), culm length (CL, cm), panicle length (PL, cm), flag leaf length (FLL, cm), panicle number (PN), spikelet number per panicle (SN), 1000-grain weight (1000GW, g), spikelet fertility (SF, %), chlorophyll content (SPAD value), days to heading (DTH), and nonstructural carbohydrate content (NSC, %). HI was calculated as GW/PW × 100. For CL, PL, FLL, PN, SN, SF, and SPAD, phenotypic values of 10 plants per RIL were used to calculate means. For 1000GW, first, 100-grain weight was measured with four replications and then the mean was transformed to 1000-grain weight. DTH was scored as days from germination to heading (when 5 of the 10 plants had headed). For QTL and other analyses, means of two replications for each trait were used. NSC was measured according to [22]. A five-plant bulk (from another one of the three rows described above) was harvested at the yellow-ripening stage (approximately 30 days after heading) and was used to determine NSC.

QTL analysis
A linkage map was constructed using RILs (N = 191). Linkage order and genetic distances of 175 marker loci were calculated with MAPMAKER/Exp 3.0 [23]. Residual heterozygotes were considered as missing data.
QTL analyses were performed using composite interval mapping as implemented in WinQTL Cartographer version 2.5 [24] with a significance level of α = 0.05. QTLs were added using forward and backward regression with the standard model (model 6) for up to five control markers. A window size of 10 cM was used with a walk speed of 2 cM. Significant LOD scores were assigned for each trait following permutation tests with 1000 replicates [25]. Box-Cox (for GW, HI, CL, FLL, 1000GW, and DTH) or arcsine (for SF) transformations were conducted for phenotypic data where normality was rejected by the Shapiro-Wilk test (S1 Fig) [26].

Statistical analysis
Phenotypic correlations among the biomass-related traits were evaluated using Pearson's correlation. The significance of each correlation was determined using t test with control of the false discovery rate for multiple tests [27]. Principal component analysis on the correlation matrix of line means for the 11 traits were also conducted.
To determine the total variation of PW explained by the significant QTLs for PW, GW, and SLW, multiple regression analysis of the first two principal components (PC1 and PC2) was performed. The minimum corrected Akaike information criterion was used to choose the best model. Path diagrams were generated according to Sokal and Rohlf [26].
To test whether QTL × QTL interaction was involved in the phenotypic variation of each trait in RILs, we performed two-way analysis of variance for all traits and PCs, using the genotypes of the SNPs nearest to the detected QTLs. Significance levels were corrected on the basis of the false discovery rate (α = 0.05) for multiple testing according to the number of interaction tests [27]. Significance of the difference for each trait value between genotypes was determined by the Tukey-Kramer HSD test.
All statistical analyses were performed using JMP 9 software (SAS Institute Inc., Cary, NC, USA

Phenotypic variations of traits
Phenotypic variation of 14 traits in the parents, their F 1 progeny, and RILs are summarized in Table 1. For PW, GW, and SLW (traits directly associated with biomass yield), there were no significant differences between the means of the parents. However, F 1 plants showed larger values of these traits than their parents, suggesting that at least some QTLs underlying these traits are different between the parents. In contrast, the differences in CL, PL, PN, SN, 1000GW, SF, DTH, and NSC between the parents were significant. Transgressive segregation was evident in RILs for all evaluated traits, supporting the hypothesis that different alleles at the QTLs contribute to the traits in the two parents (Table 1,

Phenotypic relationships between traits
To investigate the relationships between the 14 traits, Pearson's correlation coefficients between them were calculated for RILs (Fig 2A, S2 Table). All traits except SPAD were significantly correlated with at least one biomass trait (PW, GW, or SLW). Positive correlations between CL, PL, and FLL were marked (r > 0.5). SPAD was positively correlated with DTH and NSC and negatively correlated with CL, FLL, and SF.
It was not unexpected that significant correlations between PW and GW (r = 0.546, P < 0.0001) and between PW and SLW (r = 0.805, P < 0.0001) were observed, because PW is the sum of GW and SLW. Of note is the closer relationship between PW and SLW than between PW and GW (S2 Table), which was also confirmed by comparison of standardized partial regression coefficients from multiple repression (PW = 0.594 × GW + 0.841 × SLW-2.67E -17, R 2 = 0.999, P < 0.0001, Fig 2B). No significant correlation was found between GW and SLW (Fig 2A, S2 Table).
We also performed principal component analysis to evaluate phenotypic integration among traits other than PW, GW, and SLW ( Fig 2C). PC1 and PC2 accounted for 26.2% and 23.2% of the variation for RILs, respectively. PC1 was a vector for CL, PL, FLL, PN, and SN, whereas PC2 was a vector for HI, SF, DTH, and NSC.

QTL clusters
When one-LOD confidence intervals of three or more QTLs overlapped, we considered these QTLs to be a cluster. Seven QTL clusters were found in RILs on Chrs 1-3, 6, and 10, with two clusters on each Chr 2 and 3 (Fig 3). The presence of QTL clusters is supported by the observation that almost all clusters contained a QTL for a principal component (i.e., an integrated trait); the only exception was a cluster on Chr 3 (near ah03002520, at 35.1 Mb).

Tests for QTL × QTL interactions
No significant epistatic interaction was detected between QTLs for the 13 traits, including PW, GW, and SLW, suggesting additive relationships between these QTLs. Only one pair of QTLs for CL, consisting of the QTL on Chr 1 (aa01010927) and the one on Chr 6 (aa06001097), showed a significant epistatic interaction (P = 0.0067, Fig 3).

Candidate genes
Since QTL mapping was performed using only small numbers of RILs and markers (both <200), it was difficult to determine the relationships between detected QTLs and known genes. Still, the relationships between traits and QTL positions suggest some candidate genes. QTLs for CL, PL, and FLL (and possibly for PW, SN, SPAD, and DTH) in a cluster detected on Chr 1, in which the LOD peaks were detected between aa01010927 and aa01010816 (the nearest markers), may correspond to the semi-dwarf 1 (sd1) locus at 40.1 Mb ( Table 2, Fig 3) [3,4].
Comparison of the sizes of PCR products followed by partial sequencing of the gene revealed that TS carries a functional allele, whereas H193 carries a nonfunctional allele with a critical deletion in the gene, consistently with the direction of the additive effects of the respective alleles (S2 Fig).
The QTL for 1000GW on Chr 2 may correspond to the GW2 locus at 8.2 Mb [28] and that for DTH on Chr 10 to the Early heading date 1 locus at 17.6 Mb [29]; Table 2, Fig 3).

Combined effect of detected QTLs on PW
To assess the combined effect of detected QTLs on PW in RILs, we performed multiple regression analysis. First, eight QTLs for PW (on Chrs 1-3), GW (on Chr 3 and 10), and SLW (two on Chrs 1 and one on Chr 5), and two QTLs for PC1 (on Chrs 2 and 6) were chosen. In total, 10 SNP markers nearest to the LOD peak of each QTL were analyzed (red arrows in Fig 3). Six of the markers, which were selected as variables in the best-fitting model chosen using the corrected Akaike information criterion (filled red arrows in Fig 3, Fig 4A), accounted for 43.3% of the phenotypic variation for PW (P < 0.0001). Indeed, lines selected from RILs on the basis of the allele types at the six QTLs were distributed according to their genotypes ( Fig 4B); the mean PW values of the selected lines (1036.0 ± 91.9 g for alleles with a positive effect and 810.5 g for alleles with a negative effect) and their parents (852.2 ± 42.1 g for TS and 970.9 ± 154.3 g for H193) were also distinguishable from one another, as expected (Fig 4C).

Evaluation of QTL-based selection for PW
From F 2 plants, we divergently selected lines that carried alleles with positive and negative additive effects at the six QTLs and compared PW between F 6 progeny of the selected lines and their parents (S3 Fig, Fig 5). It should be noted that we did not select one of the QTLs (aa10000954) in the negatively selected lines (Fig 5).
Significant divergence was observed between the mean PW values of the positively (2169.0 ± 125.0 g) and negatively (1563.6 ± 108.2 g) selected lines; moreover, the mean value of positively selected lines was significantly higher than those of TS (1617.9 ± 133.3 g) and H193 (1916.1 ± 87.3 g) (Fig 5).
Divergent selection affected SLW, CL, PL, SN, and DTH in the same manner as it affected PW. However, it showed adverse effects on HI, PN, 1000GW, and SF, in that the values of negatively selected lines were higher than those of positively selected lines (S4 Fig). GW was not affected by divergent selection. Thus, the improvement of PW (Fig 5) appeared to result from the increase of SLW rather than that of GW (S4 Fig).

Discussion
In this study, we showed that classical QTL analysis has a high ability to map phenotypes to genotypes, and that QTL-based selection is effective in increasing rice PW, even when two high-yielding cultivars are used to develop RILs.
We found significant phenotypic correlations between traits. Multiple traits were significantly correlated with one or more biomass traits (PW, GW, and SLW; Fig 2A, S2 Table). Our QTL analysis revealed the presence of QTL clusters; that is, colocalization of QTLs for multiple  QTL-Based Selection for Rice Biomass traits (Fig 3). QTL clusters are generally thought to result from pleiotropy of a single QTL or from tightly linked QTLs for multiple traits, either of which cases would result in genetic correlation [30,31]. Thus, the phenotypic correlations we found may, at least in part, result from genetic correlations. The presence of a QTL for the target trait will be more reliable if multiple traits are analyzed; moreover, such QTLs can be reliable candidate markers for selection aimed at improving the target trait.
In contrast, because of genetic correlation, it seems likely that a tradeoff between traits may result in hidden genetic variation in PW. Both GW and SLW are component traits of PW, and HI and NSC play important roles in determining GW and SLW ( [32] for HI; [22] for NSC). Our analysis of phenotypic variation showed positive correlations between GW and HI and between SLW and NSC, but negative correlations between GW and NSC, HI and SLW, and HI and NSC (Fig 2A, S2 Table). Such phenotypic relationships may be accounted for, in part, by the presence of QTL clusters. For example, in the QTL cluster on Chr 3 (aa03002208 as a representative marker), H193 alleles had a positive additive effect for the QTLs for GW and HI, but TS alleles had a positive additive effect for the QTLs for SLW and NSC (Table 2, Fig 3); thus, this QTL region may not account for PW variation in the multiple regression ( Fig 4A). It should be noted that these results imply that GW and SLW are potentially interconnected through other traits such as HI and NSC, despite the absence of a significant correlation between GW and SLW in RILs, suggesting additive contributions of these traits to PW (Fig 2A).
By enabling measurements of bulks of individuals (or families) with replication, the use of biparental RILs was helpful in biomass yield phenotyping, which must detect QTLs with small effects. Thus, QTL-based selection will be dependable for improving rice biomass yield if biparental crosses are used.
In conventional rice breeding programs, crosses between a large number of varieties or candidate lines are conducted and the progeny are subjected to selection. Considering that multiple alleles are introduced and that breeders are interested in alleles associated with desirable phenotypes in diverse varieties, genome-wide association studies (GWAS) may be more suitable for QTL mapping than QTL analysis. GWAS have succeeded in detecting QTLs for agriculturally important traits in rice [16,33,34]. However, one should keep in mind that GWAS have some weak points such as the detection of false-positive or negative QTLs in the presence of population structure and low power to detect rare alleles in mapping populations because the detection method depends on allele frequency [35,36]. The development of well-designed mapping populations known as nested association mapping (NAM) populations and multiparent advanced generation intercross (MAGIC) populations, which segregate for multiple alleles, is expected to solve these problems [37,38]. As with GWAS, haplotype analysis based on pedigree, which employs mapping populations with multiple alleles, can help us to find haplotype blocks or genomic regions selected by breeders [15,20].
It has been recognized that because there is no one-size-fits-all approach in QTL mapping that is applicable to all mapping populations, the most suitable approach should be applied in each study [39]. If possible, the QTL effect should be validated by a combination of different approaches; for example, biparental QTL analysis after GWAS or haplotype analysis will provide additional information useful for the improvement of the target trait [39].
Another approach for genomics-assisted selection, genomic selection (GS), is based on genomic breeding values. Initially proposed for livestock breeding, GS has been recently extensively evaluated for crop improvement [40,41]. Genomic breeding values are calculated as the sum of the effects of genetic markers across the entire genome of individuals in a reference (training) population; this approach potentially captures all QTLs that contribute to phenotypic variation in a trait, even if they have minor effects. More recently, GS has yielded promising results in rice, although the phenotypic performance of selected lines has not yet been validated [42][43][44].
Finally, the connection between genotypes and phenotypes is the most important key in all genomics-assisted selection approaches. Whereas current advances in rice genomic research have facilitated genotyping at the whole-genome level, phenotyping of complex traits, such as biomass yield will remain a bottleneck, because it is laborious and field management is difficult. As Poland [45] has pointed out, in field-based research, collaboration between genomics and conventional breeding programs, which always excel in phenotyping of multiple traits (such as biotic and abiotic stress tolerance and grain quality) and in multiple environments (locations and years), is essential to further accelerate genotype-to-phenotype mapping of agriculturally important traits such as biomass yield. Electrophoretic profile of PCR products. PCR analyses were performed using two pairs of primers: sd1_1F (CAGACAGCTCGCCCTGCA) and sd1_1R (CTGTTGCTTCGAAGCAGAGG) (the present study), and sd1-del-1U (ACGGGTTCTTCCAGGTGTC) and sd1-del-1L (CTGCTGTCCGCGAAGAACTC) [4]. M, marker. (C) Schematic representation of a deletion in the 'Hokuriku 193' allele. Sequence analyses were performed by direct sequencing of the products of nested PCR (PCR products that were amplified using a pair of sd1_1F/sd1_1R primers were used as templates for PCR using a pair of sd1_del_1U/sd1_del_1R primers).  Table. The Pearson's correlation coefficients (above the diagonal) between traits and their corresponding P values (below the diagonal).