Genome-wide association mapping of total antioxidant capacity, phenols, tannins, and flavonoids in a panel of Sorghum bicolor and S. bicolor × S. halepense populations using multi-locus models

Sorghum is widely used for producing food, feed, and biofuel, and it is increasingly grown to produce grains rich in health-promoting antioxidants. The conventional use of grain color as a proxy to indirectly select against or for antioxidants polyphenols in sorghum grain was hampered by the lack of consistency between grain color and the expected antioxidants concentration. Marker-assisted selection built upon significant loci identified through linkage disequilibrium studies showed interesting potential in several plant breeding and animal husbandry programs, and can be used in sorghum breeding for consumer-tailored antioxidant production. The purpose of this work was therefore to conduct genome-wide association study of sorghum grain antioxidants using single nucleotide polymorphisms in a novel diversity panel of Sorghum bicolor landraces and S. bicolor × S. halepense recombinant inbred lines. The recombinant inbred lines outperformed landraces for antioxidant production and contributed novel polymorphism. Antioxidant traits were highly correlated and showed very high broad-sense heritability. The genome-wide association analysis uncovered 96 associations 55 of which were major quantitative trait loci (QTLs) explaining 15 to 31% of the observed antioxidants variability. Eight major QTLs localized in novel chromosomal regions. Twenty-four pleiotropic major effect markers and two novel functional markers (Chr9_1550093, Chr10_50169631) were discovered. A novel pleiotropic major effect marker (Chr1_61095994) explained the highest proportion (R2 = 27–31%) of the variance observed in most traits evaluated in this work, and was in linkage disequilibrium with a hotspot of 19 putative glutathione S-transferase genes conjugating anthocyanins into vacuoles. On chromosome four, a hotspot region was observed involving major effect markers linked with putative MYB-bHLH-WD40 complex genes involved in the biosynthesis of the polyphenol class of flavonoids. The findings in this work are expected to help the scientific community particularly involved in marker assisted breeding for the development of sorghum cultivars with consumer-tailored antioxidants concentration.

Introduction Sorghum (Sorghum bicolor (L.) Moench) is one of the world's most important crops grown for food, feed, and biofuel [1]. Sorghum was traditionally a staple food for hundreds of millions of people in Africa and Asia, but it is becoming more popular worldwide, including in developed countries, for its uses in food industry to satisfy a rise in demand for specialty grains, especially those that are gluten-free and rich in health-promoting compounds [2]. Moreover, the importance of sorghum in agricultural ecosystems is expected to keep rising because this crop presents high resilience to the predicted food security threatening scenarios related to climate change such as drought and heat [3], it is adapted under lower and higher latitudes [4], and it outperforms several crop species in terms of productivity and energy balance [1].
Like in other cultivated species, sorghum genetic improvement for production and quality traits including antioxidants concentration in grains, can benefit from controlled hybridizations with wild relatives [5]. Breeding efforts are being increasingly committed worldwide to introgress perenniality from wild species into domesticated sorghum with the aim of increasing crop sustainability as well as food security [6,7]. Sorghum halepense (L.) Pers. [6,[8][9][10] and S. propinquum (Kunth) Hitchc [11,12] are frequently used to transfer perenniality into Sorghum bicolor, but S. halepense confers a more robust perenniality [13], and hence it was used as wild donor of this trait in this study. Sorghum halepense, commonly called 'Johnsongrass' is a natural allotetraploid (2n = 40) most likely originated by the spontaneous hybridization between S. bicolor (2n = 20) and S. propinquum (2n = 20) followed by chromosome doubling [14]. It can be hybridized with either induced tetraploids or cytoplasmic-genetic male sterile diploids of cultivated sorghum, originating in both cases mainly tetraploid progenies [9,15], although diploid descendants can also occur [16,17].
In 2014 we initiated a breeding program aimed at developing perennial grain sorghum for use in specialty foods. Under the human nutrition perspectives, sorghum grain shows important health-promoting properties and it is a rich source of antioxidants such as polyphenols, carotenoids, as well as micro-and macronutrients [18,19] all of which constitute key ingredients in several human healthy diets. Indeed, sorghum grain exhibits the highest values of total antioxidant capacity among several cereals including wheat, rice, oats, barley and maize [2,20].
The antioxidant properties of sorghum grain are mostly dependent on the contents of phenols (polyphenols) which in turn are classified in phenolic acids (benzoic or cinnamic acid derivatives) and flavonoids. These compounds in sorghum grains are found in the seed pericarp (external coat) and testa (the layer between pericarp and endosperm). Polyphenols are a large class of phytonutrients that are beneficial to human health [21][22][23]. They have the potential to protect against several chronic diseases including cancer, type 2 diabetes, heart disease, and other living cell damaging oxidative reactions produced by free radicals, through antioxidant reactions [24]. In sorghum grain, the flavonoids are mostly represented by condensed tannins (proanthocyanidins) and anthocyanins [2,20]. The condensed tannins (hereafter referred to as tannins) are high-molecular-weight polymers of catechins and epicatechins produced through the flavonoid pathway and are the only type of tannins that have been found in sorghum [25]. In sorghum seeds, they play a role in seed dormancy [26], protect against grain mold [27] and bird and insect predation [28]. In addition, due to tannin's ability to bind proteins and minerals, tannin-containing sorghum is reported to decrease feed efficiency in some animal species, and this property can help fight obesity and several other degenerative afflictions in humans [29]. It is worth noticing that tannins are found in grains, such as sorghum with a pigmented testa layer, some finger millets, and barley, but not in major cereal crops, such as rice, wheat, and maize [20]. Tannin content in sorghum grains is generally much higher than in other cultivated fruits, nuts, and grains, and both tannin and non-tannin types occur in nature and in cultivated sorghums, but wild sorghums are mostly tannin types [20]. Given that one of the objectives of breeding has long been to select against the tannins in the caryopsis [30], particularly because of its astringency and its negative effect in animal feeding, the persistent residual tannin trait in cultivated sorghum needs to be explained. As suggested by Wu et al. [30], it is believed that natural selection retained a certain tannin content in domesticated sorghum as these compounds conferred sorghum resistance to frequent grain molds and bird damages. As the human interest shifts towards maintaining or increasing these healthy compounds in grain, it can be expected that fortifying sorghum grain with increased contents in condensed tannins is going to be one of the major objectives in several sorghum breeding programs.
Conventionally, grain color was used to indirectly select against or for polyphenols in sorghum [31]. A comprehensive account on sorghum grain color and the genetics and physiology of phenols can be found in Rhodes et al. [31]. According to these authors [31] and as described below, a number of classical loci reported to govern grain color and testa, govern also the presence or absence of polyphenols in sorghum. For instance, genotypes with dominant alleles at the B loci (B1-and B2-) show pigmented testa and proanthocyanidins in the testa layer. Dominant alleles at the spreader (S) and at the B1 and B2 loci, produce proanthocyanidins in the pericarp and in the testa layer, and can result in a brown-colored grain. The pericarp base color is red, yellow, or white, and is controlled by the R and Y loci. Interestingly, the spreader, intensifier (I) and mesocarp thickness (Z) loci can modify the base pericarp color, leading to colors ranging from brilliant white to black with several shades of red, yellow, pink, orange and brown [31]. Endosperm color in sorghum shows a range of color from white to yellow; recent studies showed that yellow endosperm in sorghum is correlated with carotenoids, which are mainly lutein, zeaxantin and β-carotene [32]. It can be inferred that these colored phenotypes confounded conventional breeders in their effort to control polyphenols in sorghum grains, and might have contributed, in addition to the natural selection described above [30], to the persistence of polyphenols in grains of cultivated sorghum crop.
Marker-assisted selection (MAS) built upon significant loci identified through linkage and, particularly, linkage disequilibrium (genome-wide association studies, GWAS) studies showed interesting potential in several plant breeding and animal husbandry programs, and can be used in sorghum breeding programs for consumer-tailored antioxidant production [31]. The existing linkage and linkage disequilibrium studies aimed at dissecting the genetic basis of antioxidants in sorghum have relied on cultivated S. bicolor populations and resulted in several quantitative trait loci (QTLs) most of which with small effects on antioxidants levels [31][32][33][34][35][36][37][38], which makes MAS less effective for these compounds. Specifically, QTLs for grain color, antioxidant capacity, polyphenols, tannins, anthocyanin, proanthocyanins and 3-deoxyantocyanidins contents have been reported and are recorded in the Sorghum QTL Atlas platform (http://aussorgm.org.au/sorghum-qtl-atlas/) [39]. While the genes underpinning most QTL effects on antioxidant activity have yet to be clearly identified, two major genes, Yellow seed1 and Tannin1 were proved to be involved in the biosynthesis of polyphenols (Fig 1) in sorghum grain [30,40]. They encode for a MYB (Yellow seed1) and a WD40 (Tannin1) transcription factors, the latter having homology to Arabidopsis transparent testa glabra 1 (TTG1). The genomic positions of Yellow seed1 and Tannin1 (on chromosomes 1 and 4, respectively) and their allelic polymorphisms were confirmed to be correlated with tannins concentration in sorghum grains [30,40]. Many other genes have been associated to tannin variation and grain antioxidant activity, including two homologs of Arabidopsis thaliana transparent testa (TT) genes TT10 and TT4 [31,37].
This work aimed at conducting genome-wide study of sorghum grain antioxidants using single nucleotide polymorphism (SNP) markers in a novel diversity panel consisting of a combination of Sorghum bicolor landraces and a progeny derived from S. bicolor × S. halepense (SBxSH) hybridizations. To the best of our knowledge, it is the first time such a panel is used in association analyses. In virtue of intrinsic genetic properties of the plant materials used in this work, it was hypothesized that the landraces and, particularly, the S. halepense genome would constitute a good source of antioxidants and novel useful marker loci for the association studies in sorghum grain, all of which can help discover causal genetic factors underpinning antioxidant activity. In SBxSH combinations, the wild genome of S. halepense represents an Genome-wide association mapping of total antioxidant capacity, phenols, tannins, and flavonoids in sorghum PLOS ONE | https://doi.org/10.1371/journal.pone.0225979 December 5, 2019 untapped reservoir and a good source of genetic diversity that can be used in conventional and molecular breeding. On the other hand, the S. bicolor genotypes were derived from African and Asian landraces, and are expected to harbor a high level of genetic diversity for breeding purposes inasmuch as Africa and Asia represent, respectively, the primary and secondary sorghum centers of diversity [8].

Plant materials
Sorghum materials used in this work consisted of 114 genotypes, of which 95 and 19 were, respectively, Sorghum bicolor (SB) and a mixture of recombinant inbred lines derived from S. bicolor × S. halepense (SBxSH) controlled hybridizations at different levels of filial progeny. These populations were described in previous works [1,8]. Sorghum biclor genotypes were annual sorghums selections derived from landraces from Africa and Asia. The SBxSH lines were selections from annual/perennial (A/P) crosses and A/P backcrosses to annual recurrent parents (A � 2/P; BC1), perennial/perennial (P/P) and annual/perennial//perennial (A/P//P) crosses. The annual parents were induced tetraploids (2n = 40), standard diploid (2n = 20), genetic male-sterile, and cytoplasmic-genetic male-sterile inbred S. bicolor lines. Perennial parents consisted of a S. halepense plant and tetraploid lines obtained by hybridizing induced tetraploid sorghum plants with S. halepense. Open-field trials for these sorghum populations were run in 2014 and 2015 in the CREA-CI's (CREA Research Center for Cereal and Industrial Crops) experimental station of Anzola (Bologna, Italy), using augmented randomized complete block design with 6 controls (checks) and 6 blocks [41]. Seeds were harvested at physiological maturity as indicated by the appearance of the black layer (hilum) in the caryopses at the base of the panicle.

Total phenolic, tannins, flavonoids and antioxidant capacity determination
Total phenols, tannins, flavonoids and antioxidant capacity (TAC) were analytically determined as described in our previous works [2,42]. Briefly, a 10 g sample from each genotype was ground using a 0.5 mmm sieve Cyclotec Udy Mill (UDY Corporation), moisture determined in an oven overnight at 105˚C, and antioxidants and TAC analyzed in duplicate using 100 mg of each sample. For the phenolic compounds the absorbance of samples was measured at 750 nm and expressed as gallic acid equivalents (g GAE kg -1 dry mass basis). For condensed tannins and total flavonoids assays, the absorbances were measured at 500 nm and 510 nm, respectively, and expressed as μg CE (catechin equivalents) g -1 dry mass basis. Total antioxidant content was determined using the 2,20-azino-bis/3-ethylbenzthiazo-line-6-sulphonic acid (ABTS) assay and expressed as mmol TE (Trolox equivalents) kg -1 dry mass basis.

DNA extraction
Sorghum seeds (5-20 per sample) were sown in peat, watered, and were treated with a fungicide (Ortiva, Syngenta, 1ml/L) and an insecticide (Confidor, Bayer, 0.75 ml/L) to protect young plantlets from pathogens and pests. For part of samples grown in winter, seeds were treated with a seed-coating fungicide (Celest, Syngenta, 4 ml/L in water) and allowed to germinate on wet filter paper within petri dishes laid in the incubator Venticell 111 (MMM group) at 25˚C for 4-6 days. 1-3 healthy plantlets (nearly 10-30 cm tall) or 3-5 germinated seeds were collected for each sample and DNA was extracted using the GeneJET Plant Genomic DNA Purification Kit (ThermoFisher Scientific), following manufacturer's instructions. DNA concentration and purity were evaluated by a Tecan Infinite M200Pro spectrophotometer (Tecan Group Ltd., Switzerland), while DNA integrity was checked through 1% agarose gel electrophoresis containing 10 μl/L GelRed (Biotium) as fluorescent dye. For each DNA sample, an aliquot of 60 μl at a concentration � 10 ng/μl was used for downstream analyses.

Whole-genome genotyping-by-sequensing SNP genotyping
The panel of 114 sorghum lines evaluated in this work for antioxidants production was genotyped using a genotyping-by-sequencing (GBS) strategy. The sorghum panel was split in two groups as part of larger sets of 184 and 196 individuals that were genotyped separately as batch 1 and batch 2. The methylation sensitive restriction enzyme ApeKI was used for library preparation, and GBS was carried out on an Illumina HiSeq X Ten platform by BGI Hong Kong Company Limited. The sequencing reads were aligned to the sorghum reference genome (Sor-ghum_bicolor NCBIv3) to enable variants discovery. The two batches yielded two respective matrices of 933,020 and 919,485 markers, and were delivered as separate VCF files which were subsequently merged into a single matrix using VCFtools (https://vcftools.github.io/index. html) [43] resulting in a total of 1,252,091 loci. Marker quality control criteria were then applied to the merged dataset considering only samples having phenotypic and marker data. The filters were implemented in VCF tools to restrict the dataset to high quality standards including biallelic SNPs only, minor allele frequency (MAF) � 0.05, site quality or the Phredscaled probability that reference/alternative alleles polymorphism exists at a given site given the sequencing data Q�40 (i.e., �99.99% base call accuracy), and missing genotypes (NA) � 20%. The final working matrix consisting of 61,976 high-quality SNPs was used in this work for genome-wide association analyses.

Genome-wide association assessment
GWAS was conducted using the statistical genetics package Genome Association and Prediction Integrated Tool (GAPIT) [44]. Since population and family genetic structures can interfere with GWAS bringing about spurious candidate marker loci [45,46], we aimed at detecting and visualizing the existing genetic structure using the principal component analysis and the genomic relationship matrix [47]. The genomic relationship matrix and the top three principal components that explained most of the observed phenotypic variance in the trait of interest, were used to control for population and family structure. Two multi-locus GWAS algorithms, FarmCPU [48] and SUPER [49], were used to identify significant QTLs for four sorghum grain antioxidant properties. The Fixed and random model Circulating Probability Unification (FarmCPU) method is currently a commonly used approach, that improves statistical power compared to the existing GWAS methods.
FarmCPU effectively eliminates confounding between test markers and kinship, controls false positives as well as false negatives, dividing Multiple Loci Linear Mixed Model (MLMM) into a Fixed Effect Model (FEM) and a Random Effect Model (REM) and using them iteratively [48]. The settlement of mixed linear models under progressively exclusive relationship (SUPER) is a powerful method that was developed to solve the computational issues of the mixed linear models (MLM), and the preconditional requirements (number of SNPs be less than the number of individuals in order to derive a rank-reduced relationship) of the FaST linear mixed models (FaST-LMM) method [49]. The GWAS model procedures were implemented by solving the below linear mixed model equation [49]: where y is a vector of observed phenotypes; β is an unknown vector containing fixed effects, including the genetic markers, population structure (Q), and the intercept; u is an unknown vector of random additive genetic effects from multiple background QTLs in the individuals; X and Z are the known design matrices; ε is the unobserved vector of residuals. The population structure (Q) is accounted for using principal component. The u and ε vectors are assumed to be normally distributed with a null mean and a variance of where s 2 g is the additive genetic variance, and K the kinship through which the information about the relationships among individuals is conveyed. K is used as the variancecovariance matrix between the individuals. Homogeneous variance is assumed for the residual effects namely, R ¼ s 2 e I, where s 2 e is the residual variance and I the identity matrix and unknown residual variance This mixed model equation was adapted to perform GWAS using the full model, accounting for kinship as well as population structure confounding effects.
The significantly associated QTLs were determined by the P-value less than 0.01/m, m being the number of markers [50]; no multiple test correction was required for the multi-locus methods implemented in this work because all markers were fitted to a single model and all effects were estimated and tested simultaneously. Fitness of the GWAS models for all traits was evaluated using Quantile-Quantile (Q-Q) plots of the observed vs. expected-log10(p) values which should follow a uniform distribution under the null hypothesis [46]. To characterize novel QTLs (novel markers) and QTLs overlapping with those previously identified, the position of significant markers was compared to the confidence interval of known QTLs for polyphenol-related traits (3-deoxyanthocyanidins, anthocyanin level, proanthocyanidins, polyphenol content, tannin content, antioxidant activity, grain color) retrieved from the Sorghum QTL atlas [39].
To get functional insights into candidate markers, we searched the vicinity of the GWAS significant markers for antioxidant-relevant 'a priori' and 'a posteriori' candidate genes using gene annotation and ontology information in Phytozome (https://phytozome.jgi.doe.gov/) [51]. The extent of the region flanking (upstream and downstream from the SNP position) significant markers and within which candidate genes were identified, was determined by analyzing the genome-wide LD decay which was 500 Kb in this work at the cut-off r 2 = 0.1. All genes involved in polyphenols metabolic pathway were considered 'a priori' candidates, including those genes that were reported in previous works [31][32][33][34][35][36][37][38]. Genes showing homology to the Peroxidases, Laccase and Catechol oxidases families were also included in a priori candidates as they are involved in the oxidation of flavonoids [52]. Regardless of their annotation and putative function, all genes containing a significant marker within their sequence were considered 'a posteriori' candidates.

Statistical analysis
Statistical inferences for the antioxidant traits and the diagrams produced in this work were performed using appropriate routines called from the R, a statistical computing language and environment [53]. The broad sense heritability was estimated as the ratio of genotypic variance to total phenotypic variance, using variance components estimated by fitting appropriate linear mixed effects model to the data and considering genotypes and replications as random effects [54] as briefly axplained below. Variance components and trait broad-sense heritability (repeatability, hereafter referred to as heritability) were estimated by fitting the linear mixed model equation y ij = μ+g i +e ij for i = 1,. . ...,s genotypes, j = 1,. . .,n i replicates (year) for genotype i, y ij is the response variable for genotype i in replicate (year) j; it was assumed g i � Nð0; s 2 g Þ and e i � Nð0; s 2 e Þ. Yearly adjusted means were used and the model was fitted with restricted maximum likelihood using the R package lme4. The heritability was derived through the formula s 2 g =ðs 2 g þ s 2 e =n r Þ where s 2 g ; s 2 e and n r are the genetic (genotypic) variance, residual variance, and the number of replications (years), respectively. The genomic relationship matrix relationship matrix was computed as suggested by VanRaden [47] Results

Population structure, genotypic and allelic properties
The population structure was analyzed using principal component and genomic relationship (hereafter referred to as G matrix) approaches. The two methods produced similar results, and therefore we used the genomic relationship matrix to better visualize the details of the structure. The heatmap of the G matrix (Fig 4) distinguished between SB and SBxSH subpopulations. In the SB subpopulation, two major groups were observed made up of 36 (just below SBxSH) and 61 (bottom) individuals. The diagonal elements of the G matrix were equal to 1 among SB subpopulation, while these elements were > 1.5 in the SBxSH subpopulation, underscoring that SBxSH was definitely a different population with higher rate of heterozygotes relative to the S. bicolor cluster [55]. The frequency distribution of the marker genotypes used in this work are summarized in the histogram in Fig 5. The pattern of the heterozygosity distribution was favorable for conducting GWAS as the homozygotes for the reference allele were the most frequent followed by the homozygotes for the alternative allele, while the heterozygotes were rare. The heterozygosity rate of an individual is the proportion of heterozygous (carrying two different alleles of a specific SNP) genotypes. In GWAS, high levels of heterozygosity within an individual might be an indication of low sample quality whereas low levels of heterozygosity may be due to inbreeding [56]. The distribution of the reference and alternative alleles, and the polymorphism information content (PIC) were presented in Fig 6. The reference allele was negatively skewed in the entire panel and in the SB and SBxSH subpopulations, with the mean smaller than the median, while the alternative allele and the PIC were skewed right, with the mean greater than the median. The statistical dispersion as indicated by the  extent of whiskers (Fig 6) was greater in allelic frequencies than in PIC. The reference allele was more frequent than the alternative allele as expected, but the frequency of the alternative allele was higher in SBxSH than in SB, while the frequency of the reference allele was higher in  SB than in SBxSH. Overall, the PIC averaged 0.17 and ranged from 0.02 to 0.5, with SBxSH subpopulation displaying higher (0.23 vs. 0.08) mean PIC value than SB subpopulation.

Genome-wide association and functional analytics
To investigate the linkage disequilibrium that existed between genetic variants and the antioxidant traits, i.e. total antioxidant activity (TAC), phenolics (FEN), flavonoids (FLA) and condensed tannins (TAN) in the sorghum panel, a whole genome association study was conducted using a genome-wide set of high quality SNP markers and two multi-locus algorithms: SUPER and FarmCPU. Fitness of different GWAS models for all traits was evaluated using Quantile-Quantile (Q-Q) plots (Fig 7) of the observed vs. expected -log10(p) values which should follow a uniform distribution under the null hypothesis. The Q-Q plot showed the -log10(p) values relatively inflated using SUPER, with low scores following the null hypothesis line better when FarmCPU was used. The complete list of significant markers obtained from GWAS analysis is reported in Table 2, whereas the Manhattan plot mapping several strongly associated antioxidant loci is presented in Fig 8. A total of 57 significant SNP loci were detected by the two implemented algorithms across all the traits evaluated: 16 and 44 were identified respectively by FarmCPU and SUPER algorithms, with three markers, Chr1_61095994, Chr4_60363744 and Chr4_61616880, being identified by both methods. Specifically, Chr1_61095994 was significantly associated (hereinafter referred to as associated) with FEN and TAC using FarmCPU, and with TAN using SUPER; Chr4_60363744 was associated with FEN and FLA using SUPER, and with TAC using FarmCPU, while Chr4_61616880 was associated with FEN, FLA, using SUPER, and with FLA using FarmCPU. Several cases of pleiotropy were observed in which single marker loci were associated with several traits. Combining unique associations and pleiotropic cases, SUPER and FarmCPU uncovered 79 and 17 associations, respectively, of which 17, 26, 31, and 5 for FEN, FLA, TAC, and TAN using SUPER, and 5, 6, 3, and 3 for FEN, FLA, TAC, and TAN using FarmCPU. Chromosomes 5 and 8 did not show any associations. Chromosome 1 showed associations with all traits except FLA, chromosomes 2 and 10 showed, each, associations for FLA and TAN, chromosomes 3 and 9 for FLA, chromosome 4 was prolific showing most associations for all traits, whereas chromosomes 6 and 7 showed associations for FEN and FLA. Most of the identified markers fell within or in proximity of genomic regions previously reported to harbor QTLs for polyphenol-related traits levels [31][32][33][34][35][36][37][38][39].

Functional and major effects markers
Setting the R 2 threshold at 15% resulted in 55 associations from 30 significant markers, involving 14 pleiotropic cases in which single markers were associated with 2 to 3 traits. All of these markers had positive effects on respective traits except for two markers on chromosome 9 (Chr9_1550093 and Chr9_48196807) that showed negative effects on flavonoids. These markers that explained at least 15% (R 2 ranged from 15 to 31.2%) of the phenotypic variance were therefore considered as major effects markers as suggested by Liu et al. [57], and were highlighted in boldface in Table 2. On the other hand, 2 (Chr9_1550093, Chr10_50169631) (Fig 9) and 30 of the 57 significant markers fell within 'a priori' and 'a posteriori' genes, respectively. One (Chr9_1550093) and twelve (Chr2_13905455, Chr4_60363744, Chr4_60405036, Chr4_60405075, Chr4_61616880, Chr4_63531227, Chr4_63901177, Chr4_64019027, Chr7_5827884, Chr7_58057317, Chr7_62284152, Chr9_48196807), respectively, of those within 'a priori' and 'a posteriori' genes were major effects markers (Fig 9, S2 Table). The two markers on chromosome 9 (Chr9_1550093) and 10 (Chr10_50169631) that fell within genes whose putative functions are related to antioxidant activity were therefore considered as functional markers. Functional markers and markers with major effects are the most interesting as far as this work was concerned because they can be usefully targeted for marker-assisted selection schemes in breeding programs. The functional marker on chromosome 10 (Chr10_ 50169631) and 8 major effect markers (Chr9_1550093, Chr9_48196807, Chr10_9810260, Chr7_5827884, Chr7_58057317, Chr1_61095994, Chr2_13905455, Chr4_48609207) on chromosomes 1, 2, 4, 7, 9 and 10 were novel markers and identified novel QTLs not previously described in other works reported in sorghum QTL Atlas [39].

Pleiotropic marker loci
Twenty-four cases of pleiotropic markers being associated with 2 to 3 traits were observed using both FarmCPU and SUPER methods. Markers showing pleiotropic effect on three traits

Classes of candidate genes in LD with major effects markers
The genomic regions harboring major effects markers were scanned for the presence of possible candidate genes considering 500 Kb upstream and downstream from the SNP position as suggested by the LD decay analysis. The complete list of candidate genes in proximity of major effects markers is reported in S1 Table. A total of 61 candidate genes were identified and classified in appropriate categories based on their putative antioxidant functions: biosynthesis, regulation, transport and oxidation. Most of the genes (26) found belonged to the transport class followed by those (21) involved in the regulation, biosynthesis (9), and in the oxidation (5).
Only two markers (Chr4_48609207 and Chr4_63531227) were not found in the vicinity of candidate genes within the 500 Kb LD limit. The nearest candidate genes for these markers were observed by enlarging the distance to 809 and 729 Kb from the respective positions of the markers. The biggest class of candidate genes was comprised of 20 genes similar to Glutathione-S-transferase (GST). Nineteen of these genes were on Chr1, and 1 on Chr9. In Zea mays, GST, also called bronze2 (bz2), encodes for a GST protein transporting anthocyanins into the vacuole [58]. In the regulation class were found genes similar to MYB transcription factor (13 genes), WD40 genes (5 genes) and 1 gene similar to Basic helix-loop-helix (BHLH). These three genes together form a protein complex (MBW) that have been studied to be involved in the flavonoid biosynthesis [59]. We came across 3 putative genes similar to peroxidase which encodes for a protein known to catalyze the oxidation of phenolic substrates; peroxidases are also able to produce ROS through the hydroxylic cycle [52]. Two genes were annotated as similar to a Multidrug resistance protein (MRP), one on Chr1 and one on Chr10. In Zea mays a MRP gene was shown to be involved in the transport of anthocyanins [60].

Total antioxidant capacity (TAC)
For total antioxidant capacity (TAC) both FarmCPU and SUPER methods found significant markers on chromosomes 1 and 4, all of which, except Chr4_63852069, displayed positive   Table 2. Each plot shows the output of an algorithm for a specific target trait in the form "algorithm.trait". TAN, TAC, FEN, FLA, respectively, condensed tannins, total antioxidants, phenols, and flavonoids. Refer to text for the description of the SUPER and FarmCPU algorithms.
On Chr4 were found a region dense with significant SNPs located in different positions from 60.1 to 64 Mb. The first part of the region delimited by markers Chr4_60124975 and Chr4_60363744, harbors two genes similar to a WD40 (Sobic.004G256500.1 and Sobic.004G257400.1). The following part of the region is between Chr4_60805525 and Chr4_61635551 markers, where are located 6 putative genes: 3 genes involved in the regulation, two of which are similar to MYB (Sobic.004G267000.2 and Sobic.004G270600.1), another gene similar to Basic helix-loop-helix (BHLH) transcription factor (Sobic.004G270900.3), 1 gene similar to Leucoanthocyanin reductase (LAR) (Sobic.004G267800.1) that is involved in the biosynthetic pathway of polyphenols, 1 gene similar to 9-cis-epoxycarotenoid dioxygenase / beta carotene dioxygenase (Sobic.004G268500.1) that encodes for a protein that cleaves the reaction of beta-carotene at the central bound into two molecules of retinal, modulating beta-carotenoid dietary function [61], and 1 gene (Sobic.004G270800.3) similar to WD40. In the proximity of the two markers Chr4_63901177 and Chr4_64019027 (at about 200 Kb) were found another two genes (Sobic.004G303400.1 and Sobic.004G303600.1) similar to MYB belonging to the regulation category.

Total condensed tannins (TAN)
The GWAS analysis for total condensed tannins produced a different output using the two methods (Table 2). SUPER identified 5 significant markers located on chromosomes 1, 4 and 7, all having a positive effect on the trait. In particular, three markers (Chr1_61095994, Chr7_62284152, Chr7_62396856) were polymorphic only in the SB population. FarmCPU method, on the other hand, yielded 3 significant SNPs on chromosomes 2, 3 and 10, two (Chr2_13905455 and Chr10_9810260) and one (Chr3_57670375) with positive and negative effect, respectively. Interestingly, the markers on chromosomes 2 and 3 were polymorphic only in SBxSH population (Table 2).
Six major effects markers (R 2 = 16-25%) were identified on chromosomes 1 (Chr1_ 61095994), 2 (Chr2_13905455), 4 (Chr4_63531227), 7 (Chr7_62284152, Chr7_62396856) and 10 (Chr10_9810260). Chr1_61095994 showed major effect also on TAC and the relative functional analysis was described above. Chr2_13905455 mapped 332 Kb from Sobic.002G115700.1 similar to Putative chalcone synthase, CHS, homolog to a TT4 gene in Arabidopsis thaliana (AT5G13930). We couldn't find a candidate gene in LD with Chr4_63531227 but, scanning the genome farther away (729 Kb) from this SNP, we came across a gene annotated as similar to Znfinger transcription factor. This gene is homologous to TT1 in A. thaliana (AT1G34790) [31,37], and interestingly, it is located in the beginning of the metabolic pathway of tannins. Chr7_ 62284152 mapped 374 Kb from Sobic.007G186200.1 gene, annotated as similar to a putative anthocyanin-related membrane protein 1 that encodes for a protein involved in the transport of anthocyanins. Chr7_62396856 was found in LD with 3 putative genes Sobic.007G192300.1, Sobic.007G193300.1, and Sobic.007G193900.1 which are similar, respectively, to peroxidase, MADS-box transcription factor homolog to TT16 in A. thaliana (AT5G23260), and WD40 repeat protein encoding genes. Finally, on Chr10 marker Chr10_9810260 is located close to a gene (Sobic.010G106601.1) encoding for a MYB DNA-binding protein.

Total phenolics (FEN)
SUPER and FarmCPU identified, respectively, 17 and 5 SNPs associated to total phenolics (FEN), covering chromosomes 1, 4, 6 and 7 and all having a positive effect on the trait ( Table 2). Out of these 22 significant markers, 16 (15 with SUPER and 1 with FarmCPU) are located in the region between 59 and 64Mbp on chromosome 4. FarmCPU, on the other hand, identified an association involving chromosome 1 (marker Chr1_61095994) and accounting for the highest amount (R 2 = 31.20%) of the FEN variance observed in the panel evaluated in this work. One significant marker on chromosome 1 (Chr1_20707841) showed polymorphism only in the SBxSH subgroup, as the alternative allele was not detected in the SB population ( Table 2).

Discussion
Sorghum breeding programs rely heavily on phenotypic selection to control the contents of polyphenols in the grains. The use of grain color as a proxy to polyphenols concentration was nonetheless complicated by the necessity to have a variety of information including pericarp thickness, pigmented testa, spreader genes, and endosperm appearance, which are correlated with the production of sorghums with increased phenols and antioxidant activity levels [62][63][64][65][66]. Using marker assisted breeding can simplify and expedite breeding for antioxidant activity. In this work, we used genome-wide association study (GWAS) to discover useful variants in linkage disequilibrium with genetic factors controlling antioxidant traits in grain sorghum. GWAS is an important alternative for mapping quantitative traits, and it is implemented extensively in crop improvement programs.
The broad-sense heritability of the traits (FEN, FLA, TAC, TAN) evaluated in this work was very high (Table 1), indicating that the phenotypic variations of the measured antioxidant metrics are mainly affected by genetic factors, and therefore this panel can be used for genetic linkage disequilibrium assessments. On the other hand, the pairwise correlation among the four antioxidant traits (Fig 2) was positively very high implying that the proportion of variance shared by the traits evaluated was mostly due to genetic causes. As a corollary to this statement, the observed phenotypic correlation can be considered as a measure of existing overlap between the sets of genetic influences on respective antioxidant measurements but, cannot be considered as a measure of the absolute magnitudes of those influences. On the other hand, it can be inferred that the observed high correlation between antioxidant components and the TAC is an indication that the antioxidant components evaluated in this work were major source of antioxidant activity in sorghum grain.
A perfect correlation between traits implies that genetic influences on the traits of interest are identical, and this situation can be explained by the existence of either pleiotropy (causal overlap), linkage disequilibrium, or ascertainment bias as related to sampling bias. Although we cannot totally ignore the possibility of ascertainment bias in virtue of the small size of the sample used in this work, we nonetheless witnessed extensive cases of markers with pleiotropic effects on the phenotype (Table 2), and cases of blocks of markers displaying high pairwise correlation (Fig 3A-3G), meaning that these two phenomena effectively explained the observed correlation among traits. With regard to the size of the sample used in this study, since the SBxSH lines and S. bicolor landraces evaluated have a long history spanning, respectively, more than 6 and 20 years, they have accumulated cycles of meiotic and recombination events that qualify the sample size used herein for GWAS investigations [9,14,67]. In addition, the population size did not show major data quality concerns as demonstrated by the uncovered associations that were significant and consistent with previous reports.
The Q-Q plot was used in this work to assess the fitness of the GWAS models and showed the -log10(p) values relatively inflated using the SUPER approach (Fig 7) The same Q-Q pattern was reflected in recent works [68]. Since the SUPER method was ranked among the most statistically powerful algorithms, and given that the family and population structures were properly controlled in this work using kinship (G matrix) and principal components, it can be inferred that the light offset observed in lower SUPER scores were probably caused by the high sensibility and resolution of this method, that in particular, resulted in more significantly linked markers on chromosome 4 (Fig 8). The two GWAS approaches implemented in this work uncovered different sets of significant markers, with only three markers identified in common. Commonly identified markers are particularly interesting but, as Xu et al. [50] pointed out, different methods can detect different markers due for instance to differing sensitivity to minor allele frequency x effect sizes combinations.
Among the previously discovered antioxidant genes, it is worth mentioning that Tannin1 which was found at 262 Kb from the nearest marker Chr4_62050204 did show major effect, and this is agreement with Rhodes et al. [31,37]. Yellow seed1 was located on chromosome1 but was not included in the QTLs reported herein as it was at 3.6 Mb far away from the nearest Chr1_71984009 marker. On the other hand, marker Chr7_58057317 on chromosome 7 showed major effect on FLA and mapped at 75 and 83 Kb from Sobic.007G149000.1 and Sobic.007G148900.1 genes both found to be similar to Flavone 3'-hydroxylase (F3'H) and homologous to TT7 in A. thaliana (AT5G07990), a gene directly involved in flavonoid biosynthetic process [69].
The GWAS investigations reported herein were based on phenotypic data collected from two open field trials conducted over two years. Two-year trials are a threshold standard in agricultural research. Several studies (e.g., [31,37,50]) similar to ours reported GWAS findings based on one-year trial. Sorghum antioxidants concentration is a highly heritable trait, and indeed heritability in this work was above 0.9, meaning that the environmental noise for these qualitative traits is negligible and could not significantly affect the statistical inferences on which we based our findings.

Novel major effect SNPs, functional SNPs, gene hotspots
Previous GWAS works reported molecular markers most of which with small effects on antioxidants levels [31][32][33][34][35][36][37][38]. In the present study, 55 associations from 30 major effects (R 2 � 15%) markers were reported, among which 14 pleiotropic cases were observed involving 2 to 3 traits. These major effects SNPs are expected to effectively boost genetic gain of the traits of interest per unit of time and cost. The negative and positive effects markers can allow breeders to conduct divergent breeding for antioxidant levels. On the other hand, 8 novel major effects markers were discovered in this work and can be used for new causal gene discovery. Two functional novel markers were identified at the proximal and distal sides, respectively, of chromosomes 9 (Chr9_1550093) and 10 (Chr10_50169631). These novel markers characterize novel quantitative traits loci and can be directly used in new breeding applications to increase sorghum antioxidants concentrations quantitatively.
On chromosome 1, marker Chr1_61095994 was found significant for FEN and TAC using FarmCPU method, and for TAN using SUPER, and can be considered as of major interest in sorghum breeding for antioxidant concentration. Indeed, this marker showed high R 2 (25%) for TAN, and the highest R 2 values of 27.1% and 31.2%, respectively, for TAC and FEN. Furthermore, this marker is located in proximity of a chromosome 1 hotspot of 19 genes that are all similar to GST gene which, in Zea mays, is also called bronze2 (bz2) and encodes for a protein conjugating anthocyanins into the vacuole [58]. Another chromosomal hotspot region is represented by the interval 60.1 to 64 Mb on chromosome 4 which is dense with significant SNPs located in different positions from 60.1 to 64 Mb. This region can be a good target for marker assisted breeding because it harbors genes similar to WD40 (Sobic.004G256500.1, Sobic.004G257400.1, and Sobic.004G270800.3), MYB (Sobic.004G267000.2 and Sobic. 004G270600.1), Basic helix-loop-helix transcription factor (Sobic.004G270900.3), Leucoanthocyanin reductase (Sobic.004G267800.1), 9-cis-epoxycarotenoid dioxygenase / beta carotene dioxygenase (Sobic.004G268500.1). These genes are involved in the regulation, biosynthesis, and oxidation of the polyphenols.

Novel contribution of S. halepense genome to antioxidant variability
Although Sorghum halepense is crossed to domesticated sorghum with the main target of introducing perenniality, our findings showed that its gene pool harbors many alleles useful for improving several other traits [10] including antioxidant properties of the grain. As expected [20,30], SBxSH lines outperformed S. bicolor genotypes for all the four antioxidant metrics evaluated in this work (Fig 2). As wild sorghums generally show a higher antioxidant content compared to domesticated ones [20], S. halepense alleles can contribute significantly to this category of traits, especially those, such as tannins, which have been selected against during domestication.
In our study we found significant differences between allele frequencies when the two subgroups SB and SBxSH were compared. On average, the alternative allele frequency was higher in SBxSH than in SB (Fig 6). This evidence might to some extent reflect the narrowing of the genetic base and the consequent reduction of variability associated with sorghum domestication; however, two considerations must be taken into account. First, allele frequencies are computed considering a 1:1 ref:alt allele ratio at heterozygous loci, as is the case for diploid genotypes; however, S. bicolor × S. halepense hybrid lines are expected to be tetraploid. Therefore, possible allele ratios of 3:1 and 1:3 can introduce a bias in such calculation for heterozygous genotypes. Unfortunately, these alleles ratios could not be evaluated due to the low coverage, as it would require a number of reads/locus high enough to support a reliable determination of allele dosage; this latter aspect is however considered out of the scope of this work. The second relevant aspect to consider is the alignment of reads from tetraploid individuals to the reference genome sequence of a diploid inbred line. This implies the alignment of homeologs to the same locus, which results in an overestimation of heterozygosity. Homeologs in S. halepense are expected to descend from orthologs in the genomes of S. bicolor and S. propinquum, being these two species considered its ancestors. The fate of these homeologs in S. bicolor x S. halepense hybrid lines after several generations is very difficult to predict, given the different possibilities of chromosome pairing at meiosis [70]; however, some of the homeolog chromosome pairs can be maintained and contribute to increasing the genetic variability of hybrid lines.
The panel of S. bicolor × S. halepense hybrid lines analyzed in this study is indeed small, yet some significant associations were detected. Among the 57 markers that were found to be associated to one or more of the analyzed traits, the majority (72%) were polymorphic in both SB and SBxSH subgroups (Table 2), although allele frequencies were in several cases different, consistently with what observed on the entire dataset of markers (Fig 6). 13 significant markers were polymorphic only in SB, most likely due to the small number of S. bicolor lines used as parents in SBxSH hybridization, which could retain only a portion of the genetic variability available in domesticated sorghum. Finally, 3 significant markers (Chr1_20707841, Chr2_13905455 and Chr3_57670375) were polymorphic only in SBxSH lines, highlighting the contribution of the S. halepense genome to GWAS in spite of the small size of the SBxSH group. Not surprisingly, two of these markers (on chromosomes 2 and 3) resulted significantly associated to tannins, whose content is expected to be lower in domesticated sorghum due to human selection traditionally oriented against this trait. Noteworthy, the marker Chr2_13905455 was among the major effect loci (R 2 20.75) and registered the highest positive effect of the entire dataset (+5710, Table 2). Its position is consistent with other studies reporting this region on chromosome 2 as associated with grain color [31,32], and the candidate gene Sobic.002G117500.1 (similar to an UDP-flavonoid glucosyl transferase and to A. thaliana TT15), placed nearly 600Kbp far from marker Chr2_13905455, was proposed to underlie its effect on phenotype [31]. In our study, however, a closer candidate, Sobic.002G115700.1, was found at only 330 Kbp from the marker; this gene shows homology to a chalcone synthase and A. thaliana TT4 (S1 Table) involved in the biosynthesis of polyphenols [31,32]. Clearly our results suggest that S. halepense provided a new allele at this locus with a remarkably high effect on condensed tannins (proanthocyanidins) in sorghum grain. The SBxSH lines evaluated in this work can therefore be considered as a good source of antioxidants variability in addition to the possibility of using them to breed for perennial grain sorghum fortified with condensed tannins. The introgression of S. halepense into S. bicolor to develop novel varieties was effectively accomplished in biomass sorghum breeding programs [8]. Nonetheless, the use of S. halepense for improving sorghum grain nutritional quality is expected to be challenging particularly due to the necessity to eliminate negative wild-related traits such as loose panicle, tiny seeds, and seed chattering. Moreover, it is reasonable to hypothesize that using a larger SBxSH population could lead to the identification of new alleles and even new, additional loci associated to these traits.
The purpose of this work was to conduct GWAS of sorghum grain antioxidants using SNPs in a novel diversity panel of Sorghum bicolor landraces and S. bicolor × S. halepense recombinant inbred lines. To the best of our knowledge, it is the first time such a panel is used in association analysis. Sorghum halepense contributed novel polymorphism, and sorghum recombinant inbred lines derived from Sorghum bicolor x Sorghum halepense controlled hybridizations outperformed Sorghum bicolor landraces for antioxidant production. This highlights the importance of S. halepense genome not only as a source of perenniality but also as a donor of genetic factors for antioxidant production. Antioxidant traits were perfectly highly correlated and showed very high broad-sense heritability. The genome-wide association analysis conducted in this work uncovered several major effect and novel QTLs explaning higher proportion of the variability that existed in the antioxidant traits than in previous works. These QTLs can be directly used in marker assisted breeding and/or validated using different approaches including linkage mapping before tools like breeder's chip can be produced for large-scale uses in breeding programs. The GWAS results presented herein and experimental designs used in this work can be implemented in antioxidants genetic investigations and in breeding programs to qualitatively and quantitatively improve the antioxidant production for different purposes including the manufacture of health-promoting and specialty foods.
Supporting information S1 Table. Annotated genes harboring major effect markers (R 2 � 15%). Highlighted in green are genes annotated from Rhodes et al. 2014Rhodes et al. ,2017, in orange genes annotated as similar to Peroxidase, in yellow new annotations from sorghum genome in Atlas. In the first three columns start and stop position on the sorghum genome and transcript name, followed by the nearest marker name and the distance of the gene from the nearest marker, then a column where are shown the GWAS methods and target traits for which the linked SNP was significant, the last column shows the category of the genes. (DOCX) S2 Table. 'A posteriori' genes harboring major effect marker (R 2 � 15%). (DOCX)