Fig 1.
Biosynthesis and regulation of flavonoids.
PAL = phenylalanine ammonia lyase; C4H = trans-cinnamate 4-monooxygenase; 4CL = 4-coumarate CoA ligase; CHS = chalcone synthase; CHI = chalcone isomerase; F3H = flavanone 3-hydroxylase; F3’H = flavanone 3’-hydroxylase; DFR = dihydroflavonol-4-reductase; ANS = anthocyanidin synthase; ANR = anthocyanidin reductase; GST = glutathione-S-transferase; MATE = multidrug and toxic efflux transporter; LAR = leucoanthocyanidin reductase; TT = transparent testa; TTG = transparent testa glabra.
Fig 2.
Distribution and comparison of antioxidant contents in Sorghum bicolor and Sorghum bicolor × Sorghum halepense recombinant inbred lines.
The open dot and the number inside each rectangle (interquartile range) represent, respectively, the position of the mean and the mean value of the trait of interest. The horizontal bar inside each rectangle represents the median of the trait of interest. Within the same trait, means with different letters are significantly different at the 5% level according to Tukey's HSD (honestly significant difference) test. Refer to the text for the description of the traits.
Fig 3.
Pairwise correlation among significant marker loci (A-G) and the relationships between the four antioxidant traits (H). The filled-in areas of the circles (A-G) show the absolute values of the corresponding correlation coefficients. The scale on the right-hand side is colored from red (negative correlation) to blue (positive correlation), with the intensity of color scaled 0–100% in proportion to the magnitude of the correlation. Scatter plots (H) with regression lines showing the relationships between the traits are in the lower corner. Correlations between the traits are in the upper corner. Histograms of the mean concentrations of each trait are on the center diagonal. Refer to the text for the description of the traits.
Table 1.
Broad-sense heritability of the antioxidants measured in the sorghum panel.
Fig 4.
Sorghum panel genomic relationship matrix.
Heatmap displaying relationships among the 114 sorghum genotypes used in the GWAS. Pink and green colors identify, respectively, the Sorghum bicolor and S. bicolor × S. halepense populations. The white diagonal represents the perfect relationship of each genotype with itself; sections of warmer colors on the diagonal represent excess heterozygosity; the symmetric off-diagonal elements represent the relationship for pairs of genotypes. The blocks of light colors on the diagonal show clusters of closely related genotypes. The adjoining dendrogram illustrates the kinship groups identified in the sorghum panel.
Fig 5.
Genotypes are coded as -1, 1, and 0 for, respectively, homozygotes for the reference allele, homozygotes for the alternative allele, and heterozygotes.
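The coding described in this caption can be illustrated with a minimal sketch. It assumes biallelic calls written in VCF-style notation ("0/0", "0/1", "1/1"), which is an assumption about the input format and is not stated in the caption; the function name `encode_genotype` is hypothetical.

```python
from typing import Optional


def encode_genotype(call: str) -> Optional[int]:
    """Map a biallelic call to the -1/0/1 coding used in the caption.

    Assumed input: VCF-style strings such as "0/0", "0/1", "1/1"
    (phased "0|1" is also accepted). This format is an illustration only.
    """
    alleles = call.replace("|", "/").split("/")
    if len(alleles) != 2 or any(a not in ("0", "1") for a in alleles):
        return None  # missing or non-biallelic call
    alt_copies = sum(1 for a in alleles if a == "1")
    # 0 alternative copies -> -1 (reference homozygote)
    # 1 alternative copy   ->  0 (heterozygote)
    # 2 alternative copies ->  1 (alternative homozygote)
    return alt_copies - 1


calls = ["0/0", "0/1", "1/1", "./."]
coded = [encode_genotype(c) for c in calls]
print(coded)
```

Subtracting one from the alternative-allele count keeps the coding symmetric around the heterozygote, which is why the heterozygote maps to 0.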
Fig 6.
Alternative (Alt) and reference (Ref) allele frequencies and polymorphic information content (PIC) for the entire panel and the subpopulations.
The open dot inside each rectangle and the number inside or outside each rectangle (interquartile range) represent, respectively, the position of the mean and the mean value of the metric of interest. The horizontal bar inside each rectangle represents the median value. Means with the same letter are not significantly different at the 5% level according to Tukey's HSD (honestly significant difference) test.
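For readers unfamiliar with these metrics, the sketch below shows one common way to obtain the alternative allele frequency and the PIC of a single biallelic marker. It assumes genotypes coded -1/0/1 (reference homozygote, heterozygote, alternative homozygote) and the classical PIC definition, PIC = 1 - (p² + q²) - 2p²q²; the caption does not state which PIC formula was used, so this choice is an assumption, and both function names are hypothetical.

```python
from typing import Sequence


def alt_allele_frequency(coded: Sequence[int]) -> float:
    """Alternative allele frequency from -1/0/1 coded diploid genotypes."""
    if not coded:
        raise ValueError("at least one genotype is required")
    # Each genotype carries (code + 1) copies of the alternative allele.
    alt_copies = sum(g + 1 for g in coded)
    return alt_copies / (2 * len(coded))


def pic_biallelic(alt_freq: float) -> float:
    """PIC of a biallelic marker (classical definition, assumed here)."""
    if not 0.0 <= alt_freq <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    p, q = alt_freq, 1.0 - alt_freq
    return 1.0 - (p * p + q * q) - 2.0 * p * p * q * q


genotypes = [-1, 0, 1, 1]  # made-up example data, not from the study
freq = alt_allele_frequency(genotypes)
print(freq, pic_biallelic(freq))
```

Under this definition the PIC of a biallelic marker peaks at 0.375 when both alleles are equally frequent and falls to 0 for a monomorphic marker.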
Fig 7.
Quantile-Quantile (QQ) plot of observed against expected probability values (P-values) from the genome-wide association analysis.
TAN, TAC, FEN, and FLA denote, respectively, condensed tannins, total antioxidants, phenols, and flavonoids. Blue circles correspond to the P-values derived from the principal components + kinship model. The red line indicates the expected P-value distribution under the assumption (null hypothesis) that the P-values follow a uniform [0,1] distribution. The dotted lines show the 95% confidence interval for the QQ-plot under the null hypothesis of no association between the SNP and the trait. -log10(P) is the negative base 10 logarithm of the P-values (probability of type-I error made in GWAS hypotheses testing). Refer to the text for the description of the SUPER and FarmCPU algorithms.
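The coordinates of a QQ plot of this kind can be sketched as follows. The example assumes the common plotting-position convention in which the expected P-value for the i-th smallest of n observed P-values is (i - 0.5)/n under the uniform null; the caption does not say which convention the plotting software used, and the function name `qq_points` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def qq_points(p_values: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (expected, observed) -log10(P) pairs for a QQ plot.

    Assumes all P-values are strictly positive and that expected
    quantiles follow the (rank - 0.5) / n convention (an assumption).
    """
    n = len(p_values)
    points = []
    for rank, p in enumerate(sorted(p_values), start=1):
        expected = (rank - 0.5) / n  # uniform [0, 1] null quantile
        points.append((-math.log10(expected), -math.log10(p)))
    return points


# Made-up P-values for illustration only.
pts = qq_points([0.5, 0.05])
for expected, observed in pts:
    print(round(expected, 3), round(observed, 3))
```

Points lying on the line where observed equals expected are consistent with the null hypothesis, while points rising above it at the upper end suggest association signals.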
Fig 8.
Manhattan plots showing several loci strongly associated with the antioxidant traits.
Each circle in the scatter plot represents a SNP, with the X-axis showing genomic location. Numbers 1 to 10 on the X-axis represent the ten Sorghum bicolor chromosomes. The Y-axis shows the association level: -log10(P) is the negative base 10 logarithm of the P-values (probability of type-I error made in GWAS hypotheses testing). The solid horizontal line represents the genome-wide significance threshold as explained in the text. Regions with -log10(P) values above the threshold are candidate loci, as listed in Table 2. Each plot shows the output of an algorithm for a specific target trait in the form “algorithm.trait”. TAN, TAC, FEN, and FLA denote, respectively, condensed tannins, total antioxidants, phenols, and flavonoids. Refer to the text for the description of the SUPER and FarmCPU algorithms.
Table 2.
GWAS results and descriptive statistics of the significant marker loci.
Fig 9.
Functional markers, genes, coding sequences, transcripts, and the amino acid sequences of the gene products of interest.
The position (in base pairs) of the functional marker is indicated by a vertical bar. For each chromosome, from top to bottom: numbers represent the physical map (distance) in base pairs; the solid line represents the region of the chromosome of interest as identified by the unique NCBI identification number (ID) for the reference Sorghum bicolor NCBIv3 primary assembly; dashed horizontal lines are exons; the horizontal green arrowed bar represents the gene locus (and ID) of interest derived by automated computational analysis using the Gnomon eukaryotic gene prediction algorithm; the horizontal dashed red arrowed bar represents the messenger ribonucleic acid (along with ID) corresponding to the gene of interest; the horizontal dashed yellow arrowed bar represents the coding sequence (CDS, along with ID) of the gene of interest. The direction of the arrows indicates the DNA and RNA 5′-to-3′ direction. The translation gives the sequence, in one-letter code, of the amino acids making up the gene product of interest.