Multifamily QTL analysis and comprehensive design of genotypes for high-quality soft wheat

Milling properties and flour color are essential selection criteria in soft wheat breeding. However, high phenotypic screening costs restrict selection to relatively few breeding lines in late generations. To achieve marker-based selection of these traits in early generations, we performed genetic dissection of quality traits using three doubled haploid populations that shared the high-quality soft wheat variety Kitahonami as the paternal parent. An amplicon sequencing approach allowed effective construction of well-saturated linkage maps of the populations. Marker-based heritability estimates revealed that target quality traits had relatively high values, indicating the possibility of selection in early generations. Taking advantage of Chinese Spring reference sequences, joint linkage maps of the three populations were generated. Based on the maps, multifamily quantitative trait locus (QTL) analysis revealed a total of 86 QTLs for ten traits investigated. In terms of target quality traits, 12 QTLs were detected for flour yield, and 12 were detected for flour redness (a* value). Among these QTLs, six for flour yield and nine for flour a* were segregating in more than two populations. Some relationships among traits were explained by QTL collocations on chromosomes, especially group 7 chromosomes. Ten different ideotypes with various combinations of favorable alleles for the flour yield and flour a* QTLs were generated. Phenotypes of derivatives from these ideotypes were predicted to design ideal genotypes for high-quality wheat. Simulations revealed the possibility of breeding varieties with better quality than Kitahonami.


Introduction
Wheat (Triticum aestivum L.) is a global food crop and is consumed mainly in the form of baked products. Since high end-use quality is essential for the economic value of wheat and Recently, the first reference genome sequences of Chinese Spring (CS) wheat were released [27]. The information provides us with a precise comparison of QTL positions among mapping populations. Furthermore, taking advantage of next-generation sequencing technology, cost-effective and robust strategies involving amplicon sequencing of multiplex samples were established for hexaploid wheat [28,29]. These methods enable us to construct linkage maps and to perform marker-assisted selection (MAS) for a moderate number of markers in a short period of time. Since the method is flexible for selecting markers, it is possible to construct several linkage maps with as many common markers as possible very efficiently.
In this study, we used three doubled haploid populations derived from crosses between three breeding lines and a common variety, Kitahonami, which is a leading variety in the Hokkaido region of Japan. The objective of this study was to construct linkage maps of the three populations using amplicon sequencing and to determine whether common QTLs could be detected in the three mapping populations. Furthermore, we estimated the probability of segregation of each QTL in each population by a Bayesian method for jointly analyzing three populations. Based on the results, we constructed a comprehensive breeding design of genotypes that show superior quality by combining favorable QTLs from four parental varieties.

Plant materials
Four soft winter wheat varieties, Kitahonami, Kinuhime, Shunyou and Tohoku224, were used in this study. Kitahonami was released in the Hokkaido prefecture of Japan in 2006 and has become a leading variety in Japan. The variety shows superior milling properties and high noodle-making quality. Tohoku224, which was later released as Yukiharuka, is adapted to the northeastern region of Japan, while Kinuhime and Shunyou are adapted to the central region of Japan. The last three varieties show relatively poor milling properties compared to Kitahonami. We developed three F 1 -derived doubled haploid mapping populations for genetic analysis: Kinuhime/Kitahonami (KK), Shunyou/Kitahonami (SK) and Tohoku224/Kitahonami (TK). Each population consisted of 188 lines. The four parental varieties were grown with two replicates under field conditions in Morioka, Iwate, Japan (39.7˚N, 141.1˚E), during four successive cropping seasons from 2010/2011 to 2013/2014. The plot size was 1.5 m x 0.7 m, and each plot consisted of 25 plants separated from each other by 12 cm. The three mapping populations were grown under the same plot size as the parental varieties but without replication. Each doubled haploid line (DHL) was subjected to a field trial for at least two cropping seasons.

Trait evaluations
The heading dates of the accessions were recorded in all experimental plots. To compare data among cropping seasons, heading dates were converted into days to heading (DH) from May 1st of each year. Grain samples harvested from field trials were subjected to quality analysis. For each field plot, 100 grams of clean grain was tempered to a 14.5% moisture content and milled using a Quadrumat Junior instrument (C.W. Brabender Instrument Inc., South Hackensack, NJ). Milling was performed at a speed of 21.4 g/min and with a silk 72GG filter attached as a reel sieve. During the milling process, flour was divided into faster and slower halves in the flour drawer of the equipment. The faster and slower flours were called "A flour" and "B flour", respectively. Flour yield (FlYd) was calculated as the percentage of total flour weight (A flour + B flour) relative to sample weight (A flour + B flour + bran). Flour efficiency (FlEf) was obtained by the percentage of A flour weight relative to total flour weight (A flour + B flour). The A flour was subjected to an additional sieving step using a stainless steel 212 μm testing sieve (Tokyo Screen Co. Ltd., Japan). Sieved A flour was subjected to evaluation of color values using a ZE-6000 meter (Nihon Denshioku, Japan) based on 3-dimensional color values with the following rating scale: L � value (FlL) for whiteness (100: white, 0: black), a � value (Fla) for red-green chromaticity (+60: red, -60: green), and b � value (Flb) for yellowness (+60: yellow, -60: blue). A six-gram flour sample was combined with 10 ml of distilled water to form a paste, which was then mixed well without bubbling. Flour paste was poured into a Petri dish for ZE-6000 analysis, and the three color parameters were measured with the illuminant: C and angle: 2˚settings. In addition, particle size distribution was measured using a HELOS Particle Size Analyzer (Sympatec GmbH, Clausthal-Zellerfeld, Germany) as follows: flour particle specific area (Sm) and median size (x50) were used as representative parameters of particle characteristics. Flour protein content (FPC) was measured using near-infrared spectroscopy with an Infratec 1241 instrument (FOSS, Hilleroed, Denmark) and adjusted to a 13.5% moisture content. Flour ash content (Fash) was determined by combustion at 600˚C for 4 hours. All traits, except for the milling properties, were measured twice, and arithmetic means were used as the trait values for each sample.

Genotyping and linkage map construction
DNA was extracted from ground leaf tissue using a PI-50α automated DNA extraction system (Kurabo, Japan). Publicly available simple sequence repeat (SSR) markers (GrainGenes 3.0, https://wheat.pw.usda.gov/GG3/) were used for screening polymorphisms among the four parental varieties using capillary electrophoresis with a QIAxcel system (QIAGEN, Hilden, Germany), and polymorphic SSR markers were used to genotype DHLs with the same equipment. To increase the number of genetic markers, approximately 3,000 originally developed amplicon sequencing markers were used (S1 Table). Based on the physical positions and genome specificity of the markers, we selected 500-600 markers for genotyping in each population. Genotyping via amplicon sequencing was performed following the protocol described in Ishikawa et al. [29]. We also used established diagnostic markers, such as Wx-A1, Wx-B1, MFT (Mother of FT), Vp1-A1, Ppd-A1, Ppd-D1, Psy-B1, Vrn-A1 and Vrn-D1 (reviewed in Liu et al. [30]). Linkage maps of the three populations were separately constructed with MapDisto v1.3.5 [31]. Linkage groups were identified using a minimum logarithm of odds (LOD) score of 4 and a maximum recombination fraction of 0.30. Recombination fractions were converted into centimorgan (cM) map distances using the Kosambi mapping function. Taking advantage of International Wheat Genome Sequencing Consortium (IWGSC) CS reference sequences [27], common linkage maps of the three populations were constructed. We selected 860 single nucleotide polymorphism (SNP) markers, and the physical positions of the markers were estimated by a BLAST search of marker sequences against the reference sequence. Based on the positions of the markers, the genetic distances were recalculated while considering all DHLs.

Statistical analysis and Bayesian QTL mapping
All statistical analyses were performed using the R platform [32]. Analysis of variance and linear models were implemented with the base package of R. Significance test of correlation coefficients were performed by the 'cor.mtest' function in the 'corrplot' package. P-values of correlation coefficients were obtained via multiple comparison tests for all pairs of traits. Marker-based heritability estimation was performed using the 'heritability' package. To construct a relatedness matrix among DHLs, we used the 'kin' function in the 'synbreed' package with the 'realized' method. For each trait, the phenotypic value of each DHL was corrected by removing the effects of cropping season, and the phenotypic values were then averaged over the replicates of each DHL. This modified phenotypic value was used for QTL analysis. Three DHL populations were separately and jointly analyzed with a Bayesian QTL mapping method using the model described by Hayashi and Iwata [26]. In short, the model assumed biallelic QTLs with an allele, Q, derived from Kitahonami and an alternative allele, q, and was written as follows: where y ij was the modified phenotypic value of the jth DHL in the ith population; μ i was the mean of the ith population; N was the number of QTLs fitted in the model; u ijl was a variable indicating the QTL genotype of the jth DHL in the ith population, taking a value of 1 and -1 for QTL genotype QQ and qq, respectively; a l was the effect of the lth QTL; s il was a variable indicating the segregation of the lth QTL in the ith population, with values of 1 and 0 indicating QTL segregation and no QTL segregation in the ith population, respectively; and e ij was the residual, which followed N(0,σ e 2 ), with σ e 2 being the residual variance. These model parameters, including N, were estimated through Bayesian model fitting, where their posterior distributions were constructed with a Markov chain Monte Carlo (MCMC) procedure. Specifically, for QTL detection, a total linkage map (whole genome) was divided into equal intervals of 1 cM, and a putative QTL with some effect was located in a randomly selected interval to assess whether or not the QTL located in that interval was fitted in the model. The posterior probability of QTL existence was calculated for each interval as the relative frequency of the MCMC samples where a QTL located in the interval was fitted in the model of all MCMC samples. Accordingly, the number of QTLs, N, was also inferred with Bayesian estimation following Sillanpaa and Arjas [33]. For more details on the Bayesian procedure applied to model (1), see Hayashi and Iwata [26]. As a result of this Bayesian estimation, QTL information was obtained for each interval, including the posterior probability of QTL existence and the magnitude of the QTL effect, which provided evidence of the presence of a QTL in that interval. A QTL was declared significant on a chromosome when the QTL intensity, defined as the sum of posterior probabilities of QTL existence over all intervals on the chromosome [26,33], exceeded the predetermined threshold. The threshold value for QTL intensity corresponding to a genome-wide 5% significance level was determined with 100 repetitions of permutation tests, where the maximum QTL intensity over all chromosomes was obtained by analyzing permutated phenotypic data every repetition and the fifth-highest QTL intensity value among the 100 repetitions was adopted as a threshold. When the QTL intensity of a chromosome exceeded twice the threshold, two different QTLs were assumed to exist on the chromosome, where the boundary between the two QTL regions was delimited such that the sum of the posterior probabilities over intervals in each QTL region exceeded the threshold. For each significant QTL, the estimated position and effect were calculated as the weighted averages of QTL positions (intervals) and QTL effects in a QTL region, using the posterior probability of each interval as a weight. Moreover, the posterior probability of QTL segregation in each DHL population was calculated as the relative frequency of s il taking a value of 1 among all MCMC samples for each l (l = 1, 2, 3).

Construction of the prediction model and cross-validation
To investigate the predictability of trait values from genotypes of DHLs, six-fold cross-validation was performed in the following manner: DHLs of the three populations were randomly partitioned into six equally sized groups, each including 94 DHLs. Five of the groups were used as a training set to construct the prediction model. The model was applied to predict trait values of the remaining group (validation set). This process was then repeated six times, with each of the six groups used exactly once as the validation set. The prediction model was constructed in the same manner as adopted in QTL analysis, where the posterior probability of QTL existence and the estimate of the QTL effect were obtained in each 1 cM interval in the common linkage map with Bayesian analysis as described above. To predict a trait value of an individual in the validation set, the genotype was imputed in all intervals over whole-genome region for the individual based on the marker genotype, and subsequently, the prediction model including the posterior probability of QTL existence and estimated QTL effect in each interval was applied to the genotype. The predicted trait value was obtained byŷ whereŷ i was the predicted trait value of the ith individual in the validation set;m was the estimate of the intercept of the model (2); p l was the posterior probability of QTL existence in the lth interval; u il was a covariate indicating the genotype of the ith individual in the lth interval, with values of 1 and -1 corresponding to the genotypes QQ and qq, respectively, with Q being an allele derived from Kitahonami and q being an alternative allele from other cultivars; andâ l was the estimated effect of QTLs located in the lth interval, with m being the number of intervals.

Simulation of ideotypes
Here, a QTL region was redefined as a continuous chromosomal region consisting of the 1 cM intervals described above with posterior probabilities of QTL existence greater than 0.01 and the intervals with posterior probabilities less than 0.01 but surrounded by intervals with posterior probabilities greater than 0.01. Based on the FlYd and Fla QTL regions defined in such a manner, ten ideotypes with favorable genotypes for FlYd and Fla values, which were obtained for the population of DHLs derived from the same cross combinations employed here, were considered for simulation. The first ideotype (Ideotype 1) harbored favorable alleles in all target regions and was thus regarded as an optimal genotype of FlYd and Fla, while the other nine ideotypes (Ideotype 2-Ideotype 10) were genotypes with gradually loosened restraints on Fla QTLs in ascending order based on the proportions of variance explained by the QTL regions relative to total Fla variance such that the degree of accumulation of favorable alleles in Fla QTLs decreased in the order of Ideotype 1 to Ideotype 10. Different genotypes were included in an ideotype due to variable genotypes in chromosomal regions other than the fixed QTL regions. We generated 500 genotypes for each ideotype by randomly allocating the Kitahonami-type allele and alternative allele to unconstrained regions depending on the genotype in the adjacent regions, considering the recombination fraction. Generation of random genotypes was conducted by an original Fortran program. Using the results of the above Bayesian QTL analysis of DHLs, which included information on QTLs in each 1-cM-interval chromosomal region, phenotypic values of FlYd, FlL, Fla, Flb and FPC were predicted for a total of 5,000 genotypes derived from the 10 ideotypes.

Linkage map construction
Combining SSR, amplicon sequence and diagnostic markers, the numbers of markers genotyped were 561, 686 and 691 for the KK, SK and TK populations, respectively. With the criterion of a maximum recombination fraction of 0.30, genetic maps consisting of 35, 30 and 35 linkage groups were obtained for the KK, SK and TK populations, respectively (Table 1). These maps consisted of 499 loci (555 markers) for KK, 573 (685) for SK and 598 (683) for TK with total lengths of 3,920.5, 3,493.2 and 4,718.0 cM, respectively. In all three populations, the cumulative length of chromosomes was greatest in the D genome. The number of loci per chromosome ranged from 17 (chromosomes 3A of KK, 6D of KK and 6B of SK) to 44 (5D of TK) (26.5 on average), and the average distance between loci per chromosome ranged from 3.7 cM (3B of SK) to 12.1 cM (3D of TK) (7.3 cM on average). Using primer sequences, we estimated the physical positions of amplicon sequence markers by a homology search against the CS RefSeq v1 genome sequence [27]. The physical positions of the markers revealed that the cumulative sizes of these maps reached 13

Heritability estimates of traits
For each population, the number of observations of each trait in each environment are shown in S2 Table. The total number of observations per trait varied from 1,020 to 2,018 because Fash

PLOS ONE
was not observed in all four environments and some DHLs could not be measured due to an insufficient number of harvested grains. All phenotypic data are shown in S3 Table and summarized in S4 Table. Analysis of variance of each population showed that DHLs and environments had significant effects on all traits, except that environment did not affect x50 in the SK population (S5 Table). We calculated marker-based estimates of heritability for the nine traits that were observed in all four environments. The relatedness matrix of DHLs was obtained based on genotypes of 305 common markers across the three populations. The heritability of milling properties such as FlYd and FlEf was approximately 0.65, which was slightly lower than that of DH (Table 2). Among the flour color parameters, Fla showed a heritability similar to that of milling properties (FlYd and FlEf), and Flb displayed high heritability comparable to that of DH. On the other hand, FlL had the lowest heritability among the traits investigated. In this study, FPC, x50 and Sm showed moderate heritability.

Distribution of trait values
Phenotypic values of parents and DHLs were obtained by least square means across the four environments (S6 Table). The DH of Kitahonami (34.5) was approximately 10 days later than that of the three other parents (Fig 1A). The values for the DHLs were distributed among the parental values and showed transgressive segregation in all three populations. The phenotypic distributions of the three populations were almost overlapping. For FlYd, Kitahonami (68.8) showed the highest value, followed by Shunyou (62.1), Tohoku224 (61.6) and Kinuhime (59.3) (Fig 1B). Consistent with the parental values, the values of the SK population were higher than those of the other populations, followed by the TK and KK populations. Lines with transgressive segregation were found in the SK population, which showed higher values than the Kitahonami population. The relationship between the parental values and distribution of values in the DHL populations for FlEf was similar to that for FlYd (Fig 1C). Among the flour color parameters, the FlL values of the four parental varieties were almost the same, and the distributions of FlL values among the three populations overlapped (Fig 1D).  (Fig 1E and 1F). The distribution of values for each population included the average value of the parents as the mode, and significant transgressive segregation was observed in both directions. Distributions of the other four traits are described in S1 Fig.

Relationships between traits
Using all DHLs, the relationships between traits were calculated (Fig 2A). In this analysis, no clear relationships were observed between DH and quality traits. A positive correlation between the milling properties FlYd and FlEf was observed (r = 0.62), and FlEf showed a strong positive correlation with x50 (0.75) and a negative correlation with Sm (-0.64). For flour color parameters, FlL was negatively correlated with Fash (-0.51). A strong negative correlation and moderate positive correlation were observed between Fla and Flb (-0.80) and between Fla and FPC (0.57), respectively. Correlations between the traits described above were also observed within the populations (Fig 2B-2D). In addition, significant correlations were observed between flour color parameters and particle size traits (x50 and Sm) in the KK and TK populations. Therefore, in the two populations, DHLs with higher Sm values had lower Fla values.

Single-population QTL analysis
Thresholds of QTL intensity at the 5% level are shown in S7 Table. With single-population analysis of ten traits, a total of 30, 44 and 28 QTLs were detected in the KK, SK and TK populations, respectively (S8 Table). Since selection for high FlYd and low Fla values led to the development of Kitahonami, these two traits have been the focus of soft wheat breeding programs in Japan. Therefore, we mainly describe the results of these two traits in the text. For FlYd, three, six and six QTLs were found in the KK, SK and TK populations, respectively (

Multifamily QTL analysis using CS reference sequences
Based on the estimated physical positions of 860 selected markers, we constructed common genetic maps of the three populations and recalculated linkage distances (S1 File). Multifamily QTL analysis using the common maps revealed a total of 86 QTLs for ten traits. QTL intensities along chromosomes and detected QTLs are listed in S9 Table and S10 Table,

PLOS ONE
QFla.m.tarc-7D segregated in all three populations, and the direction of the effect varied between the QTLs. Notably, QTL clusters were observed on group 7 chromosomes (Fig 3,

Construction of ideal genotypes and prediction of trait values
Based on the results of multifamily analysis, we selected an ideal genotype (Ideotype 1) that pyramided favorable alleles in terms of 21 regions containing FlYd and Fla QTLs (Table 6). Among possible derivatives of the ideotype, individuals with higher FlYd and lower Fla values than those measured for Kitahonami were observed, which indicated the possibility of breeding varieties with higher quality than Kitahonami (Fig 4A). However, the derivatives tended to show high Flb and low FPC values due to strong negative and moderate positive correlations between Fla and Flb and between Fla and FPC, respectively (Fig 2). Since Flb and FPC largely affect the end-use quality of flour, nine other ideotypes (Ideotypes 2-10) that differed in the number of fixed Fla QTL alleles were generated (Table 6). Scatter diagrams of the five traits indicated that variation in Flb and FPC increased as the number of fixed Fla QTLs decreased (Fig 4B, S2 Fig). The derivatives from Ideotypes 8-10 included individuals with higher FlYd and lower Fla values than those observed in Kitahonami and similar Flb and FPC values compared to those observed in this variety. Genotypes of derivatives from the ten ideotypes and their predicted phenotypes are given in S2 and S3 File, respectively.

Discussion
Recently, high-density SNP arrays and genotyping-by-sequencing (GBS) have been widely used for genetic analyses [34][35][36][37][38]. However, when using a specific crossed population, a SNP array is not an efficient way to develop genetic maps because a large number of probes do not show any polymorphism between parents. Since GBS relies on randomly distributed polymorphic sites flanking restriction enzyme recognition sites, the tags of GBS tend to show distribution bias across samples and the genome. Therefore, GBS is not suitable for genome-wide surveys using biparental populations. In this study, we used a target amplicon sequencing approach to construct genetic maps of three DHL populations. Because the DHL populations shared Kitahonami as the paternal parent, the use of common polymorphic sites as markers to compare maps among populations was quite beneficial. Taking advantage of the amplicon sequencing approach, well-saturated genetic maps were effectively constructed by selecting suitable marker sets for the target population.
Performing quality evaluations at a late stage often results in ostensibly promising wheat lines with high yield and resistance to diseases that cannot be released due to poor end-use quality traits, such as weak performance in milling and processing properties. Because the evaluation of quality traits is a labor-intensive and time-consuming step, it is difficult to perform

PLOS ONE
genetic analysis of these traits using a large segregating population. In addition, due to unexpected environmental conditions, missing data caused by insufficient amounts of harvested seeds for milling often occur. In this study, we did not evaluate a complete set of DHLs during the four-year trial. However, with data for commonly used varieties and lines, we calculated least square means across environments, which could then be used as genetic estimates for each line. These values showed reasonable distribution patterns that were predicted by the parental values (Fig 1). Furthermore, heritability estimates of the target traits, FlYd and Fla, were relatively high, indicating that these traits were mainly governed by genetic factors ( Table 2). These results indicate that selection based on the detected QTLs at an early stage of

PLOS ONE
breeding has a great impact on improving these two traits. Since least square means across environments enabled us to detect reliable QTLs, historical data collected in breeding programs, which often include missing values, could be used for genetic analysis with the same statistical procedure. QTLs detected by multifamily analysis tended to show higher QTL intensities than those detected by single-population analysis ( Table 3, Table 4). For FlYd, single-population analysis revealed three, six and six QTLs in KK, SK and TK, respectively, while 12 QTLs were detected by multifamily analysis. Based on the physical positions of flanking markers, QTLs detected by single-population analysis were included in those detected by multifamily analysis, except for QFlyd.tk.tarc-2B. Because the intensity of this QTL (0.382) barely exceeded the 5% threshold value (0.372), further research is necessary to confirm its existence. Three QTLs were detected only by multifamily analysis. Among them, QFlyd.m.tarc-3A and QFlyd.m.tarc-6D showed relatively low posterior probabilities compared to those of other QTLs, while QFlyd.m.tarc-5D showed a high posterior probability, indicating that the QTL was reliable. For Fla, QTLs detected by single-population analysis were also found by multifamily analysis, except for QFla.sk.tarc-1B. Multifamily analysis revealed six QTLs that could not be found by single-population analysis. Among these QTLs, QFla.m.tarc-6A, QFla.m.tarc-6B and QFla.m.tarc-7B showed quite high posterior probabilities. In this study, for both FlYd and Fla, the average posterior probabilities from multifamily analysis were higher than those from single-population analysis, meaning that multifamily analysis improved the power to detect QTLs by increasing the size of the population. These results support the effectiveness of multifamily analyses in detecting actual QTLs described in Hayashi and Iwata [26].
Genome-wide QTL analyses of quality traits in soft wheat revealed flour yield and flour color QTLs on almost all chromosomes [5,7,23,[39][40][41]. By association mapping studies, many MTAs have been identified for wheat quality parameters. For example, 15, 28, 25 and 32 MTAs for flour color L � , a � , and b � and yellow pigment content, respectively, were detected [25]. Recently, using soft winter wheat from the eastern region of the United States, significant MTAs for flour yield were found on chromosomes 1B, 2A and 2B [42]. Our previous association mapping approach revealed approximately 20 QTLs related to flour yield that accumulated during the process of breeding [22]. Although these genetic factors are useful for the wheat research community, it is difficult to apply the information in actual breeding selection. Generally, breeding is a process of pyramiding favorable alleles from various materials into an elite genotype, such as a leading variety. Therefore, it is important to obtain allelism information for QTLs detected among breeding materials. A nested association mapping (NAM) population developed by crossing one elite variety with various germplasms can be used to dissect genetic factors for agronomically important traits [43][44][45]. A NAM population can also reveal the genetic architecture of genome-wide recombination rate variation, which would be useful information for improving the efficiency of gene pyramiding [46]. Therefore, the DHLs used in this study are suitable materials for genetic analysis as well as breeding materials for pyramiding favorable alleles of quality traits.
The predictabilities of five traits were evaluated by six-fold cross-validation ( Table 5). The prediction accuracies of the two target traits, FlYd and Fla, were approximately 0.5, and the highest values exceeded 0.6, indicating that prediction of these traits from genotypes was possible. However, the prediction accuracies of these traits varied among cross-validation sets. To determine the appropriate selection criteria of a training set, further research is necessary to investigate the relationship between genotypes of the training set and selection candidates. Ranks based on prediction accuracies corresponded to those based on heritabilities (Table 2). Because heritability is defined as the portion of phenotypic variance that can be explained by genotype, it is reasonable for a trait with high heritability to show a high prediction accuracy based on genotype. As expected, the low-heritability trait, FlL, was not predictable (0.144). This result was consistent with the result of our QTL analysis, which revealed only one FlL QTL, with a contribution of 0.061.
In a breeding program, it is important to maintain a balance between antagonistic traits [47,48]. Among the traits investigated in this study, the two target traits had clear selection criteria: higher flour yield was better, and lower flour a � was better. On the other hand, other traits such as DH, Flb and FPC had optimal values corresponding to those for target cultivation areas and end-use products. Multitrait QTL analysis reveals whether or not correlated traits are governed by the same genetic factors. In this study, we detected collocation of QTLs on group 7 chromosomes. According to the IWGSC functional annotations (https://wheaturgi.versailles.inra.fr/Seq-Repository/Annotations) and expression profiles (http://www. wheat-expression.com/) [49,50], annotated high-confidence genes around the QTL clusters were investigated (S4 File). Average transcripts per million (tpm) values from 166 studies were used to assess the expression levels of these genes in grains. Based on human-readable descriptions, phytoene synthase on chromosome 7B (TraesCS7B02G482000) is one of the causal genes for flour color parameters [15]. Among the genes highly expressed in grain tissues, genes related to carbohydrate metabolism, such as "glucose-1-phosphate adenylyltransferase" (TraesCS7A02G287400), "pyrophosphate-fructose 6-phosphate 1-phosphotransferase subunit beta" (TraesCS7A02G198200), "1,4-alpha-glucan branching enzyme" (TraesCS7A02G251400, TraesCS7B02G472400 and TraesCS7B02G472500), "starch synthase" (TraesCS7A02G189000 and TraesCS7D02G117800) and "debranching enzyme 1" (TraesCS7D02G133100) are candidate genes in the QTLs because carbohydrates are a dominant component of grains and are known to affect its processing quality. Transcription factors such as Zinc finger, MADS-box, Myb and NAC are also possible candidates because they are involved in various seed developmental processes. On chromosome 7A, a gene annotated as "Flowering locus T/ Terminal flower 1-like protein" (TraesCS7A02G229400) showed high expression levels in grains. Many studies have demonstrated that the proteins encoded by Flowering locus T-like (FT-like) genes act not only as major mobile flowering signals in flowering plants but also participate in the regulation of diverse developmental processes [51]. Therefore, pleiotropic effects of the 7A QTLs may be due to this multifunctionality of FT-like genes. Further studies delimiting QTL intervals are necessary to identify causal genes in the clusters. Although it is unclear whether or not the causal genes of these QTLs are the same, breeding selection of these QTL regions should be performed carefully while considering the balance of traits. A simulation study of the ideotypes showed the possibility of breeding lines with higher flour yield than Kitahonami. This possibility was realized by excluding QFlyd.m. tarc-3A, Qflyd.m.tarc-5A and Qflyd.m.tarc-7D, for which the Kitahonami alleles had negative effects. For flour color, the design of ideal genotypes was not simple because Fla was correlated with Flb and FPC. A lower Fla, which indicates high noodle-making quality, inevitably leads to a high Flb and low FPC. Since the Flb and FPC of flour greatly affect its end-use quality, it is important to optimize these values. Simulation revealed that fine-tuning of genotypes would be essential to obtain genotypes with ideal phenotypic values.
Bayesian QTL analyses of single populations and multiple families followed by simulations revealed the possibility of breeding varieties superior to Kitahoanmi. Trait predictability is an attractive technology with the potential to significantly improve breeding efficiency. Genomic predictions using a high-density genotyping platform were performed for quality traits of durum wheat [21,52]. Recently, the genomic predictabilities of 35 key traits were reported, which demonstrated the potential of genomic selection for wheat end-use quality [53]. In this report, the prediction accuracy and standard deviation of flour yield across four seasons were 0.56 ± 0.04 and 0.45 ± 0.09 when cross-validation was performed within material categories (panels) and across panels, respectively. These values were comparable to our prediction accuracies, even though the number of markers used in this study was small. In regular breeding programs, due to high costs, it is still difficult to routinely obtain high-density genotypic data from a large number of selection candidates. The cost-effective amplicon sequencing method described herein would be a suitable platform for selecting several dozens of trait-associated chromosomal regions simultaneously. Furthermore, to increase the chance pyramiding QTLs by crossing, the growth acceleration method called 'speed breeding' [54] may be an effective tool for establishing the ideal genotypes presented here. Therefore, genome-wide target selection combined with speed breeding is a next-generation breeding strategy for high-quality wheat.