Genetic Analysis and QTL Mapping of Seed Coat Color in Sesame (Sesamum indicum L.)

Seed coat color is an important agronomic trait in sesame, as it is associated with seed biochemical properties, antioxidant content and activity, and even disease resistance. Here, using a high-density linkage map, we analyzed genetic segregation and quantitative trait loci (QTL) for sesame seed coat color in six generations (P1, P2, F1, BC1, BC2 and F2). Results showed that two major genes with additive-dominant-epistatic effects and polygenes with additive-dominant-epistatic effects were responsible for controlling the seed coat color trait. Average heritability of the major genes in the BC1, BC2 and F2 populations was 89.30%, 24.00%, and 91.11%, respectively, while the heritability of polygenes was low in the BC1 (5.43%), BC2 (0.00%) and F2 (0.89%) populations. A high-density map was constructed using 724 polymorphic markers: 653 SSR, AFLP and RSAMPL loci were anchored in 14 linkage groups (LG) spanning a total of 1,216.00 cM. The average length of each LG was 86.86 cM and the marker density was 1.86 cM per marker interval. Four QTLs for seed coat color, QTL1-1, QTL11-1, QTL11-2 and QTL13-1, whose heritability ranged from 59.33% to 69.89%, were detected in F3 populations using the CIM and MCIM methods. Alleles at all QTLs from the black-seeded parent tended to darken the seed coat color. Results from QTL mapping and classical genetic analysis among the P1, P2, F1, BC1, BC2 and F2 populations were largely consistent. This first QTL analysis and high-density genetic linkage map for sesame provide a good foundation for further research on sesame genetics and molecular marker-assisted selection (MAS).


Introduction
Sesame (Sesamum indicum L., 2n = 26), a member of the Pedaliaceae family, is one of the oldest and most important oilseed crops known to man due to its high oil content and quality [1]. Sesame seed is also rich in proteins, vitamins, niacin, minerals and lignans and is popularly used as a food and medicine [2][3]. Seed coat color is an important agronomic trait in sesame. The natural color of mature sesame seeds varies from black, intermediate colors (e.g., gray, brown, golden, yellow and light white) to white. Compared with white seeds, black sesame seeds usually have higher ash and carbohydrate content, but lower protein, oil, and moisture ratios [4]. In East Asia, the products of black sesame seeds attract greater acceptance. Seed coat color in sesame seems to be associated with seed biochemical properties, antioxidant content and activity and even the level of disease resistance among sesame accessions, in addition to being a marker of evolution within the Sesamum genus [2][3][4][5][6][7].
Significant attention has long been paid to the inheritance of sesame seed coat color, and many reports have noted the complex nature of the trait. Nohara et al. (1933) crossed white-seeded and black-seeded sesame accessions, obtained an F2 ratio of 9:3:3:1 for black, dark brown, pale brown, and white seed coat colors, respectively, and concluded that the seed coat color trait is regulated by two genes [8]. In other crosses between black and dirty white sesame seed types, black was invariably dominant in the F1 generation, and an F2 segregation ratio of 9 black: 3 grey: 4 dirty white was obtained; these results also suggested that seed coat color is regulated by two genes [9]. Gutierrez et al. (1994) found that black is the dominant testa color and that light brown is partially dominant over white. They concluded that coat color is controlled by two independent genes with complementary effects and complete dominance at each locus [10]. However, examination of the F2 generation in black × light brown and black × white crosses revealed that one gene had complete dominance and supplemented the effects of other genes controlling basic testa colors [10]. Baydar and Turgut (2000) reported that epistatic segregation (9:4:3 and 9:3:4 ratios) determines sesame seed coat color [11]. In addition, a recent analysis of crosses between nine sesame accessions from Nigeria also demonstrated that seed color has a complex genetic basis, with accessions of the same seed coat color possibly having different genotypes [12].
Genetic segregation analysis over multiple generations and mapping of quantitative trait loci (QTL) are the main approaches taken to clarify the genetic basis of quantitative traits [13][14][15][16]. To date, there are no reports on QTL mapping for sesame traits due to the lack of a high-density linkage map. Recent progress in sesame genetic mapping and the development of molecular markers has laid an important foundation for studies on the genetics and QTL analysis of important sesame traits [17][18].
The aims of this study were to (1) comprehensively analyze segregation over multiple generations (P1, P2, F1, F2, BC1 and BC2) of a cross between black and white sesame accessions, and to explain the inheritance of the sesame seed color trait, (2) construct a high-density genetic linkage map with a mapping population of 260 F2 and F3 progenies, and (3) locate the first sesame QTL for seed coat color on the linkage map.

Plant Materials
The two S. indicum germplasm samples used, COI1134 (white seeded, P1) and RXBS (black seeded, P2), were accessions from the sesame germplasm resources collection at Henan Sesame Research Center, Henan Academy of Agricultural Sciences (HAAS). To investigate segregation, three replicates of the P1, P2, F1, BC1, BC2 and F2 populations were grown at Yuanyang experimental station, HAAS, in 2009 (Figure 1). To construct a genetic linkage map and locate QTLs, an F2 population of 260 lines was grown at Yuanyang experimental station and the corresponding 260 F3 families were grown at both Yuanyang and Pingyu experimental stations in 2010. Young leaf tissues from parents and F2 plants were harvested, immersed in liquid nitrogen and stored at −70 °C before DNA extraction.

Phenotypic Evaluation
Five plants from each F3 family were selected randomly and their seed was used to represent the phenotype of individual F2 plants. As seeds matured, three capsules from the middle of the main capsule stem per plant were collected for phenotypic evaluation of each of the six generations. Seeds from each generation in each of the three replications were photographed using a digital Nikon camera in a darkroom. The RGB (red, green and blue) values of each picture were recorded for each of the three replications using the color capture tool in Adobe Photoshop. The average RGB value of each sample was used for statistical analysis.
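The averaging step described above can be illustrated with a minimal sketch. It assumes that each sample is represented by a handful of (R, G, B) readings and that the single score per sample is the mean of the three channel means; the text does not state exactly how the channels were combined, and the function names and pixel values below are hypothetical.

```python
from statistics import mean

def mean_rgb(pixels):
    """Return the per-channel mean (R, G, B) of an iterable of (r, g, b) tuples."""
    pixels = list(pixels)
    if not pixels:
        raise ValueError("no pixels sampled")
    return tuple(mean(channel) for channel in zip(*pixels))

def color_score(pixels):
    """Collapse a sample to one number: the mean of its three channel means.

    Assumption: the paper's "average RGB value" is read here as this
    channel-averaged score; lower values correspond to darker seed.
    """
    return mean(mean_rgb(pixels))

# Hypothetical readings for a dark-seeded and a light-seeded sample.
dark = [(30, 28, 25), (34, 30, 29), (26, 27, 24)]
light = [(140, 135, 128), (146, 139, 131), (138, 134, 127)]

print(color_score(dark), color_score(light))
```

On a 0-255 scale a black-seeded sample gives a low score and a white-seeded sample a high one, which matches the direction of the ranges reported for the parental lines.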

Segregation Analysis
Genetic analysis of the six populations was performed according to mixed major gene plus polygene genetic models [19][20][21][22][23]. The 24 genetic models could be divided into five model groups, i.e., inheritance controlled by one pair of major genes (A), two pairs of major genes (B), polygenes (C), one pair of major genes plus polygenes (D) and two pairs of major genes plus polygenes (E) (Table S1). The distribution parameters for seed coat color in each population were estimated using the iterated expectation and conditional maximization (IECM) method. The best-fitting genetic model was determined according to Akaike's information criterion (AIC), a likelihood-ratio test and a goodness-of-fit test [19][20][21]. The genetic effects of major genes and polygenes were estimated using the least squares method [21][22] based on the distribution parameters of each component in the optimal model.
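As an illustration of the AIC step only, the sketch below ranks candidate models by AIC = 2k − 2 ln L, where k is the number of free parameters and ln L the maximized log-likelihood. The log-likelihoods and parameter counts are invented for the example and are not values from this study; the likelihood-ratio and goodness-of-fit tests are not shown.

```python
def aic(log_likelihood, n_params):
    """Akaike's information criterion: 2k - 2 ln L (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood

def rank_by_aic(fits):
    """Sort models by AIC.

    fits maps a model name to (maximized log-likelihood, number of free
    parameters). Returns a list of (name, AIC) pairs, best model first.
    """
    scored = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    return sorted(scored.items(), key=lambda item: item[1])

# Hypothetical fits; names follow the model codes used in the text.
fits = {
    "B-1": (-2510.0, 10),
    "E-0": (-2490.0, 18),
    "E-1": (-2498.0, 15),
}

for name, score in rank_by_aic(fits):
    print(name, score)
```

Models with the smallest AIC values would then be carried forward as candidates for the fitness tests.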

DNA Isolation and DNA Marker Analysis
Total genomic DNA of the parents and 260 F2 lines was extracted from 300 mg of young leaf tissue using the CTAB method [24]. A total of 32 amplified fragment length polymorphism (AFLP) and 298 simple-sequence repeat (SSR) primers used in previous studies [17][18] were combined to generate thousands of AFLP and random selective amplification of microsatellite polymorphic loci (RSAMPL) primer pairs for screening polymorphic markers [17] (Table S2). DNA amplification and electrophoresis were performed according to Wei et al. [17].

Linkage Map Construction
A total of 724 polymorphic primer pairs, including 49 SSR, 52 AFLP, and 623 RSAMPL primer pairs, were used for linkage mapping. A high-density linkage map was constructed with a total of 653 polymorphic loci using JoinMap ver. 3.0 [25]. A chi-square test was used to determine whether genotypic frequencies at each locus deviated from the expected segregation ratios of 1:2:1 (or 3:1). All linkage groups (LG) were determined with an LOD score cut-off of ≥6.0. The expected length of the genome was estimated using the methods described by Fishman et al. [26] and Postlethwait et al. [27].
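The segregation test can be sketched as a chi-square goodness-of-fit calculation. The genotype counts below are hypothetical, and the 0.05 significance level (critical values 3.841 for 1 degree of freedom and 5.991 for 2) is an assumption, since the level used for this test is not stated here.

```python
def chi_square(observed, ratio):
    """Chi-square goodness-of-fit statistic for counts against an expected ratio."""
    total = sum(observed)
    ratio_sum = sum(ratio)
    expected = [total * r / ratio_sum for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Standard chi-square critical values at alpha = 0.05 (assumed level).
CRITICAL_0_05 = {1: 3.841, 2: 5.991}

def is_distorted(observed, ratio):
    """True if the locus deviates significantly from the expected ratio."""
    df = len(observed) - 1
    return chi_square(observed, ratio) > CRITICAL_0_05[df]

# Hypothetical counts for 260 F2 plants.
codominant = (62, 135, 63)   # scored as P1 type : heterozygote : P2 type
dominant = (170, 90)         # scored as band present : band absent

print(chi_square(codominant, (1, 2, 1)), is_distorted(codominant, (1, 2, 1)))
print(chi_square(dominant, (3, 1)), is_distorted(dominant, (3, 1)))
```

Co-dominant markers are compared with 1:2:1 and dominant markers with 3:1; a locus exceeding the critical value would be counted as showing distorted segregation.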

QTL Detection
QTLs regulating seed coat color were analyzed using phenotypic data from the F2 and F3 generations, respectively. Composite interval mapping (CIM) [28] and mixed linear composite interval mapping (MCIM) [29] were both used for QTL detection. In CIM, WinQTLCart 2.5 software (http://statgen.ncsu.edu/qtlcart/WQTLCart.htm) was run using Model 6 with four parameters for forward and backward stepwise regression, a 10 cM window size, 5 control markers and a 1 cM step size [30]. The threshold was determined by 1,000 permutations. In MCIM, QTLnetworks 2.0 software (http://www.webtopicture.com/qtl/qtl-network.html) was run with genome scan parameters of a 10 cM testing window, 1 cM walk speed and 10 cM filtration window. Whether or not two adjacent test peaks represented independent QTLs was determined in this process. Critical F-values were calculated using a permutation test. QTL effects were estimated using Markov chain Monte Carlo (MCMC) [31].
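The idea behind a permutation-derived threshold can be sketched as follows. This is not the CIM procedure of WinQTLCart: for brevity it substitutes a simple single-marker statistic (the squared correlation between genotype code and phenotype) for the LOD score, and all function names and data are hypothetical. Phenotypes are shuffled relative to genotypes, the genome-wide maximum statistic is recorded for each shuffle, and the (1 − alpha) quantile of those maxima is taken as the threshold.

```python
import random

def marker_stat(genotypes, phenotypes):
    """Squared correlation between genotype codes (0/1/2) and phenotype values."""
    n = len(phenotypes)
    mg = sum(genotypes) / n
    mp = sum(phenotypes) / n
    sgp = sum((g - mg) * (p - mp) for g, p in zip(genotypes, phenotypes))
    sgg = sum((g - mg) ** 2 for g in genotypes)
    spp = sum((p - mp) ** 2 for p in phenotypes)
    if sgg == 0 or spp == 0:
        return 0.0
    return sgp * sgp / (sgg * spp)

def permutation_threshold(markers, phenotypes, n_perm=1000, alpha=0.05, seed=0):
    """Empirical genome-wide threshold from shuffled phenotypes."""
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    shuffled = list(phenotypes)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)          # break any genotype-phenotype link
        maxima.append(max(marker_stat(m, shuffled) for m in markers))
    maxima.sort()
    index = min(n_perm - 1, int(round((1 - alpha) * n_perm)))
    return maxima[index]

# Hypothetical data: 90 plants, one causal marker and one unrelated marker.
causal = [i % 3 for i in range(90)]
other = [(i // 3) % 3 for i in range(90)]
phenotype = [20 + 50 * g for g in causal]

threshold = permutation_threshold([causal, other], phenotype, n_perm=200)
print(threshold, marker_stat(causal, phenotype))
```

A marker whose observed statistic exceeds the threshold would be declared significant at the chosen genome-wide level.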

Seed Coat Color Phenotype
Field investigation showed that the phenotype of COI1134 (P1) was white seeded, while RXBS (P2), the F1 hybrid and the BC2 population were generally black. Seed color in the BC1 and F2 populations varied from black through intermediate colors to white (Figure S1). The RGB values of the six populations were consistent with their phenotypes (Table 1), and those of the P1 and P2 populations ranged from 120 to 150 and from 20 to 50, respectively. The ranges of RGB values for the F1 and BC2 populations were similar to that of P2. The RGB values of the BC1 and F2 populations varied continuously over 20-150 and 20-140, respectively. The distribution of seed color in both backcross families (BC1 and BC2) shifted towards the recurrent parents.

Genetic Model Analysis
To determine a genetic model for the seed color trait, segregation analysis was performed in the six populations (with three replications) using a mixed genetic model (major genes + polygenes) (Table 2). The B-1, B-2, E-0, E-1 and E-2 models, which had the smaller AIC values, were selected as candidate models for further analysis. Fitness tests, including U1², U2², U3², Smirnov and Kolmogorov tests, were carried out. Results indicated that the number of significant parameters in the five models varied from 0 to 13 (Table S3). The E-0 model, with the fewest significant parameters (0), was selected as the optimal genetic model for seed color trait analysis. In this model, seed coat color is controlled by two major genes and polygenes with additive-dominant-epistatic effects (Table S1).

Genetic Effect Analysis
To investigate the genetic effects of the two major genes, the first-order genetic parameters were estimated using the least squares method (Table 3). Additive effects of the major genes (a, b) controlling seed color were −0.30 (da) and −5.09 (db), respectively, while the dominant effects were −35.94 (ha) and −9.35 (hb), showing a downward trend. The additive effect of gene a was lower than that of gene b (|da| < |db|), and the dominant effect of gene a was higher than its own additive effect (|ha/da| > 1). In contrast, the dominant effect of gene b was lower than that of a (|ha| > |hb|) and its own additive effect (|hb/db| < 1). The additive by additive (i) and additive by dominant (jab) effects were low, with average values of 2.97 and 7.76, respectively. The dominant by dominant (l) and dominant by additive (jba) effects were high, with average values of 17.87 and −18.72, respectively, while the dominant by additive (jba) effect acted to decrease the seed color value (−18.72).
Estimates of the second-order genetic parameters are shown in Table 4. Average heritability values for the major genes in the BC1, BC2, and F2 populations were 89.30%, 24.00%, and 91.11%, respectively. Effects of polygenes were minor in the BC1 (5.43%) and F2 (0.89%) populations, and absent in the BC2 population.
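The heritability figures above can be combined in a short worked example. It assumes, as a simplification not stated in the text, that major-gene, polygene and environmental components add up to 100% of the phenotypic variance, so that the remainder after the two genetic components is attributed to environment and error.

```python
# Reported average heritabilities (%): (major genes, polygenes).
heritability = {
    "BC1": (89.30, 5.43),
    "BC2": (24.00, 0.00),
    "F2": (91.11, 0.89),
}

def partition(major, poly):
    """Return (total genetic %, residual %) under an additive-components assumption."""
    genetic = major + poly
    return genetic, 100.0 - genetic

for population, (major, poly) in heritability.items():
    genetic, residual = partition(major, poly)
    print(f"{population}: genetic {genetic:.2f}%, residual {residual:.2f}%")
```

Under this assumption the genetic share is 94.73% in BC1 and 92.00% in F2, leaving less than 10% unexplained, whereas in BC2 only 24.00% is attributed to genetic factors.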

Linkage Map Construction
In order to construct a high-density linkage map, a subset of 724 polymorphic loci (49 EST-SSR, 52 AFLP, and 623 RSAMPL)
To further confirm these findings, we performed an ANOVA using the GLM procedure with color values as the dependent variable and markers as class variables (Table 7). Results indicated that the Hs1125R/E11-300 and Y2017F/M11-400 markers, located at 18.6 and 32.2 cM on LG11, respectively, showed high R-square values (from 0.126 to 0.198) in both environments, whereas the Hs1152F/E14-300 marker, located at 23.4 cM on LG11, showed lower R-square values (0.077 and 0.033) in the two environments.
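The R-square reported for a single marker in a one-way ANOVA can be sketched as the between-class sum of squares divided by the total sum of squares. The genotype classes and color values below are hypothetical, and this plain-Python version only illustrates the quantity; it does not reproduce the GLM procedure used for Table 7.

```python
from collections import defaultdict

def marker_r_square(genotypes, values):
    """Share of phenotypic variation explained by marker class (SS_between / SS_total)."""
    groups = defaultdict(list)
    for genotype, value in zip(genotypes, values):
        groups[genotype].append(value)
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2 for vs in groups.values()
    )
    return ss_between / ss_total if ss_total else 0.0

# Hypothetical marker classes (A = P1 type, H = heterozygote, B = P2 type)
# and hypothetical color values for six plants.
genotypes = ["A", "A", "H", "H", "B", "B"]
values = [130, 120, 80, 90, 40, 30]

print(round(marker_r_square(genotypes, values), 4))
```

A marker tightly linked to a QTL separates the classes and gives a higher R-square than a marker with little association to the trait.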
In addition, we also investigated environmental effects on the seed color genotypes using the F3 datasets from the two environments (i.e., Yuanyang and Pingyu) and the MCIM method. Results indicated that the genotype × environment effect was not significant for seed coat color (data not shown). In conclusion, we were able to assign four QTLs for seed coat color to linkage groups LG1, LG11 and LG13. Alleles at all QTLs in the black-seeded parent (RXBS, P2) increased the tendency toward darker seed coat color.

Discussion
Seed coat color in sesame is an important agronomic trait as it is associated with biochemical functions involved in protein and oil metabolism, antioxidant activity, and disease resistance [2][3][4][5][6]. Recent reports suggest that the seed coat color trait is a more suitable trait for estimating sesame evolution than geographic origin [7], since the direction of evolution in sesame was from wild species to black cultivars and then white cultivars [7,32]. While exploring the genetic basis and identifying QTLs for seed coat color, we constructed a new sesame genetic map with 653 loci using an intraspecific cross between white and black seeded accessions.

Genetic Analysis
In order to improve the precision of genetic analyses for quantitative traits, the use of a segregating population with more than 100 individuals is suggested or even required [19,22]. More than one generation with several replications is also encouraged [23]. We therefore performed the segregation analysis on seed coat color using a large experimental group (more than 150 individuals from each segregating population) with three replications. Six populations (P1, P2, F1, BC1, BC2 and F2) and F3 families grown in two environments were used in this study.
The genetic effects and heritability of the gene(s) revealed that seed color is a complex quantitative trait in sesame: it is regulated by two major genes and polygenes with additive-dominant-epistatic effects (E-0 model). Seed coat color in sesame is primarily controlled by hereditary factors. More than 90.0% of the phenotypic variation in the BC1 and F2 populations is controlled by two major genes and polygenes, with minimal influence from environmental factors (<10.0%). Major genes in the BC2 population controlled 24.0% of the phenotypic variation. The same phenomenon of hereditary variation in different generations and populations has also been documented in tomato, cucumber, maize and other crops [33][34][35][36]. Further functional genomic studies are required to clarify the molecular mechanism controlling seed coat color.

Figure 2. Distribution of QTLs for the sesame seed color trait on our sesame high-density genetic linkage map. 653 marker loci were distributed across the 14 linkage groups of our high-density genetic linkage map at an LOD threshold of 6.0. The cM distance of markers is shown on the left side of each LG. The name, amplicon length (bp) and band number of each marker are shown on the right side of each linkage group. An LOD peak value of >2.5 was considered to indicate a significant QTL interval. The four QTLs identified using two programs are designated as follows: winQTL-1, winQTL-2 and winQTL-3 represent the QTL loci from the F2 population, F3 families (Yuanyang) and F3 families (Pingyu), respectively, using the winQTLCart program. QTLnet-1, QTLnet-2 and QTLnet-3 represent the QTL loci from the F2 population, F3 families (Yuanyang) and F3 families (Pingyu), respectively, using the QTLnetworks program. doi:10.1371/journal.pone.0063898.g002

Genetic Linkage Map
Using the same cross as was used for the construction of the first linkage map, we herein constructed a high-density linkage map in sesame using 724 PCR-based DNA markers. Compared with the first map (data shown in brackets below) [17], this new map presents improved features, e.g., (1) a more stringent criterion of LOD ≥6.0 (LOD ≥4.0) was used for map construction; (2) a larger F2 segregating population of 260 (96) individuals was used; (3) the map is comparatively saturated with 653 (220) markers in 14 (30) linkage groups; (4) the average marker distance is 1.86 cM (4.93 cM); and (5) the estimated sesame genome is 1,380.94 cM (1,232.53 cM) and genome coverage is 88.06% (76.00%). We believe that more polymorphic genic SSR and SNP markers, as well as EST-SSR, AFLP and RSAMPL markers, will be validated and used for higher-density sesame linkage map construction in the near future [18] (Zhang H. et al., unpublished data).
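The summary statistics quoted for the new map follow from simple arithmetic on the reported totals, as the short check below shows. One assumption is made: the 1.86 cM figure is reproduced by dividing the total map length by the number of loci (653), which appears to be how the density was calculated; dividing by the number of marker intervals would give a slightly larger value.

```python
TOTAL_MAP_CM = 1216.00      # total length of the 14 linkage groups
N_LOCI = 653                # mapped marker loci
N_LINKAGE_GROUPS = 14
EST_GENOME_CM = 1380.94     # estimated genome length

mean_lg_length = TOTAL_MAP_CM / N_LINKAGE_GROUPS
marker_density = TOTAL_MAP_CM / N_LOCI          # assumed definition, see above
coverage_pct = 100 * TOTAL_MAP_CM / EST_GENOME_CM

print(f"{mean_lg_length:.2f} cM per LG")        # 86.86 cM per LG
print(f"{marker_density:.2f} cM per marker")    # 1.86 cM per marker
print(f"{coverage_pct:.2f}% genome coverage")   # 88.06% genome coverage
```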
In this study, 79 loci with distorted segregation were detected in 12 out of the 14 linkage groups and the degree of clustering of marker loci showed marked variation in the new sesame map. A similar deviation from Mendelian segregation ratios was also observed in our previous study [17]. Many factors, including technical artifacts in genotyping [37][38][39][40][41], chromosomal rearrangements [39][40] and markers from transposable elements [42], may have contributed to this effect. Varying degrees of clustering for AFLP markers have been reported in different crops including rice [15], barley [43], maize [44], ryegrass [45], tomato [46], potato [47], and Eucalyptus globulus and E. tereticornis [48]. Differences in the level or location of DNA polymorphisms, rates of recombination, copy number variation of specific genomic sequences or sampling errors are the main factors influencing marker distribution [47][48].

QTLs and Genes for Seed Coat Color
In this study, we identified four stable QTLs for seed coat color in sesame and estimated their gene effects. In the F2 population, QTL1-1 and QTL13-1 played major roles, explaining 39.95% and 30.56% of the phenotypic variation (Table 5), and having additive effects (h²) of 30.03% and 21.02% (Table 6), respectively. QTL11-1 and QTL11-2 are regarded as polygenes due to their comparatively lower contributions. This result is consistent with the results from classical genetic analysis (Table S1). The seed color trait is relatively stable and is not affected by environmental factors [49][50][51]. In other crops, the number of genes controlling the seed coat color trait is variable; for example, the seed color trait is regulated by a single gene in flax [52], watermelon [53], and lettuce [54], while two independent loci were found in lentil [55] and biennial white sweet clover [56], and at least three genes are involved in controlling the trait in pea [57] and capsicum [58].
Using the winQTLCart program and F3 population data, QTL11-1 and QTL11-2 contributed 20.61% and 24.02%, respectively, of the phenotypic variation in Yuanyang (Table 5), making a similar contribution to that of QTL1-1 and QTL13-1. Furthermore, a similar situation was also observed in Pingyu when data were analyzed using the QTLnetworks program (Tables 5 and 6). We thus suggest that QTL11-1 and QTL11-2 may play major roles and have comparable effects to QTL1-1 and QTL13-1.
It is noteworthy that QTL11-1 and QTL11-2 in LG11 are quite close to each other. Whether these QTLs are independent or two parts of one larger QTL is a question worth consideration. He et al. (2001) reported that independence between two QTLs is based on heritability, marker density and sample size [59]. In F2 or F3 populations, if the heritability of a QTL is 10%, the marker density is 15 cM and the sample size is 300, the likelihood of detecting two adjacent QTLs would be 80% [59]. In our study, results showed that the two QTLs in LG11 had a heritability greater than 10% (Tables 5, 6), a QTL distance of greater than 20 cM (Figure S2) and a marker density in LG11 of greater than 15 cM. We therefore concluded that there are two QTLs in the LG11 region. To further confirm this hypothesis, we performed an ANOVA using the GLM procedure with color values as the dependent variable and markers as class variables (Table 7). Results obtained also supported the existence of two QTLs in the LG11 region.
Several genes related to seed coat color have been cloned from A. thaliana, Brassica and Glycine max using fine-mapping, T-DNA insertion mutation and homology-based cloning strategies [60][61][62]. Due to the association between seed coat color and important biochemical functions [2][3][4][5][6][7]63], we will continue to perform gene cloning and functional research on seed coat color traits in our ongoing Sesame Genome Project (www.sesamum.org) [64].

Conclusion
We have assembled a high-density linkage map of sesame with 653 marker loci in 14 LGs. We have shown that seed coat color is controlled by two major genes with additive-dominant-epistatic effects plus polygenes with additive-dominant-epistatic effects, and detected four QTLs, QTL1-1, QTL11-1, QTL11-2 and QTL13-1, which are distributed in three linkage groups. Our results for segregation analyses and QTL detection for sesame seed coat color were consistent. Location of genes controlling seed color on the linkage map should be useful for gene isolation and functional genomics research. This first QTL mapping study in sesame provides a foundation for further genetics and molecular marker-assisted selection (MAS) breeding research.

Supporting Information
Figure S1 Seed coat color variation in six populations. In this figure, seed color images (a-n) represent the corresponding RGB values for the 14 grades (20-150) (Table 1).
Table S1 Genetic models for P1, P2, F1, BC1, BC2 and F2 population analysis. The genetic models are cited from Gai et al. [20] and Zhang et al. [23] and were divided into five model groups, i.e., inheritance controlled by one major gene, two major genes, polygenes, one major gene plus polygenes and two major genes plus polygenes. (DOC)
Table S2 AFLP and SSR primers used for linkage map construction. 32 AFLP and 298 SSR primer pair combinations were used to screen for polymorphic primer pairs in genetic linkage map construction. The 50 AFLP primer pairs anchored onto the map were screened from combinations of the 32 AFLP primers, while the 30 SSR primer pairs anchored onto the map were obtained from previous research [17][18], and the 573 RSAMPL primer pairs anchored onto the map were screened from combinations of the 32 AFLP and 298 SSR primers. (DOC)
Table S3 Fitness tests of five candidate genetic models for seed coat color analysis. The number of significant parameters, correlated with the adaptation level of the models, varied from 0-13. The E-0 model with the least number of significant parameters (0) in the three replications was selected from the five candidate models as the optimal model and was used for seed coat color analysis. *indicates significance at p = 0.05.