Genome-Wide Association Study Reveals the Genetic Basis of Stalk Cell Wall Components in Maize

Lignin, cellulose and hemicellulose are the three main components of the plant cell wall and can impact stalk quality by affecting cell wall structure and strength. In this study, we evaluated the lignin (LIG), cellulose (CEL) and hemicellulose (HC) contents in maize using an association mapping panel that included 368 inbred lines in seven environments. A genome-wide association study using approximately 0.56 million SNPs with a minor allele frequency of 0.05 identified 22, 18 and 24 loci significantly associated with LIG, CEL and HC at P < 1.0×10−4, respectively. The allelic variation of each significant association contributed 4 to 7% of the phenotypic variation. Candidate genes identified by GWAS mainly encode enzymes involved in cell wall metabolism, transcription factors, protein kinase and protein related to other biological processes. Among the association signals, six candidate genes had pleiotropic effects on lignin and cellulose content. These results provide valuable information for better understanding the genetic basis of stalk cell wall components in maize.


Introduction
Maize (Zea mays L.) is one of the three most important staple crops, providing protein, lipids and vitamins for billions of people around the world. It, along with silage maize, also serves as an important energy resource for ruminant animal. Although the energy value of forage plants is lower and more variable [1], their stover is highly useful in animal husbandry. Thus, improving the feeding value of forage crops is key target of silage maize breeding.
Plant cell walls make a large contribution to forage utilization [2], whereas, the limited digestion of fiber in the rumen makes the feeding value of forage lower than grain [1,3]. The cell wall of maize plants consists mainly of cellulose, hemicellulose and lignin. The fiber and lignin content, is negatively correlated with cell wall digestibility [4,5]. On the other hand, plant cell wall components are also related with resistance to lodging, pest (such as corn border), disease and abiotic stress [6][7][8][9][10]. In breeding programs, selecting for high stalk strength and resistance to corn border causes an increase in the cell wall components [11,12]. Selection for cell wall components can also affect stover digestibility [13]. To understand the genetic correlation between these traits, several linkage analysis studies were performed to detect quantitative trait loci (QTL) for cell wall components, digestibility traits and stalk strength [14][15][16][17][18][19][20][21][22][23][24][25][26][27][28]. Underlying these QTL, a number of candidate genes were found to be involved in cellulose and lignin biosynthesis, which can help us better understand genetic architecture of cell wall components.
QTL mapping in bi-parent populations identified many genomic regions related to cell wall components, although the low density of the linkage maps and few recombination events in the mapping population limited the mapping resolution and led to a large confidence interval for each QTL. In recent years, genome-wide association studies become a powerful tool for dissecting the genetic basis of complex traits and identifying favorable alleles or haplotypes for target traits [29]. Compared with linkage analysis, association mapping requires less time to develop a mapping population [30], and evaluates more alleles in a diverse population simultaneously [31]. Additionally, based on linkage disequilibrium in natural populations, association mapping uses abundant historical recombination to improve the resolution of the identified QTL. In maize, the LD (linkage disequilibrium) decays rapidly [29], and association analysis with high density markers covering the whole genome can accurately narrow down the association confidence interval into a small genomic region even to the gene level. Up until now, GWAS have been widely used on maize to identify significant associations related to grain quality traits [32][33][34][35][36], agronomic traits [37][38][39], yield traits [40], disease resistance [41][42][43][44][45][46][47] and stress tolerance [48][49][50][51]. Nevertheless, as far as we are concerned, no GWAS was undertaken to detect the associations related with cell wall components in maize. Whereas, GWAS for cell wall related traits were performed in other plants and obtained many potential candidate genes. Associations for cellulose and (1,3;1,4)-β-glucan content were scanned in two barley association panels, and identified a set of CELLULOSE SYNTHASE-LIKE (Csl) genes and genes co-expressed with the CELLULOSE SYNTHASE A gene family [52,53]. In Populus, a GWAS study was conducted to scan 29,233 high quality SNPs in a population of 334 Populus trichocarpa individuals, and found 141 significant SNPs associate with 16 wood characteristics traits [54]. These results provide unique insight into the genetic basis of cell wall related traits.
In this study, we performed GWAS on a set of 368 inbred lines to analyze cell wall components. The objective s of this study were (1) to evaluate variations in stalk cell wall components in the maize natural population; (2) to identify significant SNPs or loci related to the cell wall components and (3) to dissect the genetic basis of cell wall component traits in maize stalks.

Plant materials and filed experiments
The association mapping panel consisted of 368 diverse inbred lines (AM368), including resources from the International Maize and Wheat Improvement Center (CIMMTY), China and the USA. The lines from CIMMTY consisted mainly of tropical or sub-tropical germplasm. Detailed information about AM368 was described in a previous study [34]. These inbred lines were planted in Hainan, Yunnan in 2010; in Hainan, Henan and Yunnan in 2011; and in Hainan and Yunnan in 2012. A randomized block design was constructed at all locations without replication. Each line was planted in a single row (2.5 m in length) of 11 plants at a density of 60,000 plants/ha. Adjacent rows were spaced 0.67 m apart.
forced air oven and air-dried for 10-14 days. Dried stalk samples were ground with a mill and sieved with the 0.1mm mesh. LIG, CEL and HC were detected by near-infrared reflectance spectroscopy (NIRS). Before measuring, stalk samples were dried at 45°C for 48 h to exclude any moisture. Samples were scanned through a near-infrared reflectance spectrophotometer (VECTOR22/N; BURKER Optik, Ettlingen, Germany). The amounts of LIG, CEL and HC were determined using NIRS prediction equations developed for maize plant. The content of each of these three components in maize was expressed in percent of dry matter. Modified partial least squares implemented in OPUS 6.0 Bruker software was used for fitting calibration equations [55]. The coefficients of determination for cross-validation (R 2 CV ) and external validation (R 2 Val ) were 90.5% and 92.7% for LIG, 94.0% and 96.7% for CEL, and 89.7% and 91.2% for HC.

Genotyping
Genotyping of the association mapping panel consisted of two sets, including the MaizeSNP50 BeadChip containing 56,110 SNPs and 1.03 million high quality SNPs detected by RNA sequencing [34,56]. By combining these two sets of genotypes and removing the duplicate SNPs, a total of 559,285 high quality SNPs with minor allele frequencies (MAF) larger than 0.05 were used in this study.

Phenotypic data analyses
The GLM procedure in SAS9.3 (SAS Institute) was performed to dissect the variance of phenotypes in different environments. The model used for analysis of variance was: y = μ + e l + f i + ε li , where μ is the grand mean of the target trait, e l is the environmental effect of the "l"th environment, f i represents the genetic effect of the "i"th line, and ε li is denoted as the residual error. The broad-sense heritability was calculated as h 2 ¼ s 2 g =ð s 2 g þ s 2 ε =eÞ, where s 2 g represents the genetic variance, s 2 ε is the residual error variance item, e is the number of environments. A 95% confidence interval of h 2 was calculated following the method by Knapp et al. [57].
To eliminate the effect of environment variation, we fitted a mixed linear model to calculate the best linear unbiased prediction (BLUP) values for each trait in each line:y i = μ + g l + e i + ε i . In this equation, y i represents the phenotype of the "i"th line, μ is the grand mean value of the target trait in all environments, g i is denoted as genetic effect, e i is the environmental effect, and ε i is the random error. BLUP estimation was obtained by using the MIXED procedure (PROC MIXED) in SAS9.3 (SAS Institute), which should be denoted as the sum of the grand mean and the genetic effect of each line. The BLUP values of each line were used as phenotype values for association mapping.

Genome-wide association analysis and gene annotation
GWAS were performed on cell wall component traits by fitting population structure (Q) and relative kinship (K) in a mixed linear model (MLM), which was implemented in TASSEL 4.1.26 [58]. Population structure and kinship matrices were estimated in a previous study [34]. With the Bonferroni correction threshold for GWAS, we found only one SNP (chr2. S_1360791) associated to HC and none significant association for CEL and LIG. Therefore, a less strict criterion of P < 1×10 −4 was chosen to determine significant SNPs for the three traits. All candidate genes were annotated according to the information available in MaizeSequence (http://ensembl.gramene.org/Zea_mays/Info/Index) and the MaizeGDB database (http://www. maizegdb.org/gbrowse).

Phenotypic variation
With the association mapping population including 368 diverse inbred lines, the extent of the phenotypic variations was estimated for all three cell wall components, LIG, CEL and HC. These traits showed significant diversity with a normal distribution (Fig 1), with mean values of 8.43±0.67% (LIG), 30.26±2.81% (CEL) and 25.47±1.08% (HC) ( Table 1). The correlation among each trait revealed that HC had a weak correlation with both LIG and CEL (r = 0.24 and 0.26, P < 0.01), while CEL was positively correlated with LIG (r = 0.85, P < 0.01). Based on phenotypic values measured in seven environments, the analysis of variance revealed significant differences caused by the genotype effect (P < 0.01) for each trait (Table 1). We observed that all the three traits possess moderate to high heritability (ranging from 0.68-0.83), suggesting that variations of cell wall components are affected mainly by genetic factors.

Genome wide association mapping
To dissect the genetic basis of natural variations in three cell wall components, we performed a GWAS study by fitting a mixed linear model with population structure and familial relatedness. As shown in Manhattan plots (Fig 2), the Bonferroni correction of P < 1.8×10 −6 (1/N, N represents the number of markers used in GWAS) was too strict for detecting associations related to the three traits, and therefore we chose P < 1×10 −4 as the threshold for the present study. The Q-Q plot (S2 Fig) for GWAS based on BLUP value of each trait revealed that the  associations were well controlled for population structure. 22 unique loci were identified associated with LIG, which were distributed on all chromosomes except chromosome 3 (Fig 2a,  Table 2). 64 SNPs covering 18 loci were found significant for CEL, and 1/3 of these loci were located on chromosome 4 (Fig 2b, Table 2). 50 SNPs were significantly associated with HC ( Fig  2c), which contains 24 unique loci located on all chromosomes except 4 and 7. The phenotypic variation denoted by each locus of each trait ranged from 4%-7%.

Candidate genes co-localized with associated SNPs
Based on the available B73 reference genome information, candidate genes containing leading SNPs for each trait were identified. Excluding 5 of 64 leading SNPs located within inter-genic region, 21, 18 and 20 genes were found to be associated with LIG, CEL and HC, respectively (Table 2), and 6 genes were associated with both CEL and LIG. Beside several genes encoding for uncharacterized proteins, the majority of the predicted candidate genes encoded transcription factors, protein kinases, and enzymes involved in cell wall metabolism. The rest of candidate genes encode proteins with specific domains that could not be directly correlated with cell wall components. Remarkably, we found a set of genes involved in the cell wall component biosynthesis pathway or directly related to secondary cell wall modifications. The SNP on chromosome 6 (chr6. S_155653406) significantly associated with LIG was found located within the gene model GRMZM2G140817 (Fig 3a), which encodes a coumarate 3-hydroxylase (C3H) and plays a vital role in the lignin biosynthesis pathway [59]. This protein is one of three cytochrome P450 enzymes which catalyze hydroxylation reactions in the lignin pathway. Another significant SNP located on chromosome 6 at position 164,498,311 is contained in the gene region of GRMZM2G031200 that encodes a secondary wall-associated NAC domain protein (Fig 3b), which is a transcription factor involved in the regulation of secondary cell wall biosynthesis [60]. The most associated SNP, chr6.S_164498311, has a C/T variant and results in an arginine (R) to tryptophan (W) amino acid change on the third exon of GRMZM2G031200. In addition, the most significant association signal for LIG was identified on chromosome 4 at position 145,921,053 (Chr4_145921053) (Fig 3d). Due to the linkage disequilibrium, the adjacent 42 continuous SNPs of the leading SNP showed a strong association with LIG (Fig 3d). These SNPs were located within gene model GRMZM2G133444, which encodes an uncharacterized protein that has not been correlated with the cell wall component biosynthesis pathway. Interestingly, this candidate gene was also found associated with CEL (Fig 3c), but with a different leading SNP from Chr4_145921053.

Phenotypic variation and heritability
Cellulose, hemicellulose and lignin are three organic compounds of the plant cell walls. These aromatic polymers and polysaccharides bond together and provide the basic skeleton for the secondary cell wall. Lignin content, structure and cross-linking between cell wall components has a significant impact on cell wall digestibility [1]. Several studies about the association between lignin pathway genes and cell wall digestibility were performed in association mapping populations with sample size of less than 50 [61][62][63][64]. In this study, 1.2-1.6 fold variations of cell wall components were detected in a larger sample size of association mapping panel which consisted of 368 inbred lines across whole world. According to the information of population structure in previous study [65], phenotypic variation of LIG and CEL between sub-groups was compared and obvious differences were identified (S1 Fig). LIG and CEL levels in the stiff stalk (SS) sub-group was higher than those in the non-Stiff stalk (NSS) and mixed group (Mixed) significantly (p < 0.05), and relatively higher than the Tropical and Sub-tropical group (TST), although it was not statistically significant. These results are not surprising given that SS group lines experienced stalk quality selection which may have caused the increase of the cell wall components.
The analysis of variance revealed that both environment and genetic effects contributed significantly to the phenotypic variation. To avoid the influence of the environment, a mixed linear model was fitted to calculate the BLUP value of each trait. The heritablities of the cell wall components in this study ranged from 0.68 to 0.83,which were as well as in the previous linkage studies that ranged from 0.51 to 0.92 [17,20,66,67].These results indicate that these traits are suitable for dissecting the genetic architecture using GWAS.

Genetic architecture of cell wall components
In previous linkage studies, numerous QTL for the cell wall components and the digestibility trait were detected in diverse populations [14][15][16][17][18][19][20][21][22][23][24][25][26][27]. These QTL were reviewed and projected on a consensus map using meta-analysis [68]. All of the identified cell wall components-related QTL covered 77% of the maize genome, and 42 meta-QTL for cell wall components were identified using meta-analysis that contained 26% of the maize genome. These results suggest that the genetic basis for the cell wall components is complicated and controlled by a large number of genome regions. In this study, we performed GWAS on three cell wall components with more than 550, 000 SNPs across the entire genome. Population structure and kinship matrices were accounted to reduce the spurious association. According to the Quantile-Quantile plot of the association analysis (S2 Fig), false positive associations were well controlled with the Q+K model. However, the Bonferroni correction threshold (P = 1/N, N represents the number of markers used in GWAS) was too stringent for the present study. A less stringent cutoff of 1×10 −4 was applied for significant association detection. And we found 82, 64 and 50 associated SNPs, covering 22, 18 and 24 unique loci, that were significantly associated with LIG, CEL and HC, respectively. Each SNP can explain a small portion of the phenotypic variation, which also reveals that cell wall components are controlled by numerous minor-effect genes or QTL.

Candidate genes for cell wall components
Based on the association signals associated with the target traits, we found a number of candidate genes for the cell wall components. A SNP (chr6.S_155653406) associated with LIG was located within the first intron of GRMZM2G140817 and didn't caused any change in alternative splicing. GRMZM2G140817 encodes coumarate 3-hydroxylase (C3H), a cytochrome P450 dependent monooxygenase. This enzyme catalyzes the hydroxylation reaction of the aromatic ring in guaiacyl (G) and syringyl (S) monolignol synthesis [59]. In Arabidopsis, reduced epidermal fluorescence (ref8) mutant deficient in C3H showed a reduction in lignin content and increased accumulation of p-hydroxyphenyl (H) monolignol [69]. Downregulation of C3H in alfalfa plants also caused a severe decrease in total lignin and an increase in the H monomer, which demonstrated that low C3H activity is correlated with the inhibitory effect of transformation from p-coumaroyl CoA to caffeoyl shikimate [70]. Recently, a ZmC3H1 knock-down study revealed a moderate increase in H monomers and cell wall degradation changes, along with expression reduction of ZmC3H1. The author concluded that the moderate effect of ZmC3H1, when compared to the corresponding C3H in Arabidopsis and alfalfa, was caused by the compensation effect of ZmC3H2 [71]. All these evidence described above suggest that ZmC3H2 may be a candidate gene related with lignin content as we found in present study. In addition, another candidate gene, GRMZM2G031200, which co-localized with a significant SNP (chr6.S_164498311) associated with LIG, was also found located on chromosome 6. This gene encodes an SND2/SND3-like transcription factor that belongs to the secondary wall-associated NAC domain protein (SND) family, and SND2 and SND3 have been shown to induce secondary wall biosynthesis genes in Arabidopsis [60]. Furthermore, GRMZM2G140817 and GRMZM2G031200 were identified localized within the QTL interval in bin 6.06 which was identified related with lignin content in F288 ×F271 RIL population [27,72]. These results provide more supportive evidence that these two candidate genes influence lignin content.
Among the candidate genes associated with CEL, none was specifically involved in the cellulose biosynthesis pathway. On chromosome 7, a candidate gene (GRMZM2G042627) associated with CEL that encodes a kinase associated protein phosphatase, which was also shown to components for one chromosome. (Middle) A 0.5-Mb region on each side of the leading SNP (the SNP with the lowest P value), which is denoted by a purple diamond. The position of the leading SNP is indicated with grey dashed line. The color of the remaining SNPs reflects r 2 values with the most significantly associated SNP. Dashed horizontal lines depict the significance threshold (1×10 −4 ). (Bottom) Gene structure according to the information from the B73 genome sequence in the GRAMENE database (http://ensembl.gramene.org/ Zea_mays/Info/Index).
doi:10.1371/journal.pone.0158906.g003 be related to resistance to the Mediterranean corn borer (MCB, Sesamia nonagrioides L.) in another maize association panel [51]. Considering the impact of cell wall composition on maize resistance to pests (including Mediterranean corn borer) [8], co-localization of associations related to cellulose content and MCB may be the result of the pleiotropy of this stress response regulation gene.
The candidate gene (GRMZM2G017186) which encodes a beta-glucosidase and is located on chromosome 1, was found to be associated with HC. This enzyme is involved in cell wall degradation and catalyzes the degradation reaction of cellulose. In vitro, cellulase mixed with beta-glucosidase has been demonstrated having cross activity of xylanase and β-xylosidase [73]. Since no evidence exists in terms of the overlapping activity of cellulase and hemicellulase in vivo, the association between GRMZM2G017186 and hemicellulose content needs to be validated with further studies.
The association between maize cell wall digestibility and candidate gene polymorphisms was reported for ten key enzyme genes involved in lignin biosynthesis [61][62][63][64]. Several polymorphisms on the PAL gene were associated with neutral detergent fiber and in vitro digestibility of organic matter (IVDOM); however, the results of the association analysis were influenced by population structure [63]. ZmC3H1 and F5H were also found to be associated with forage quality traits, while the associations of these two genes were not significant in multiple tests [64]. Among the remaining genes, only the genetic variations in 4CL1, CCoAOMT2 and ZmPox3 were found to be significantly associated with cell wall related traits [61,62,64]. In the present study, among the association candidate genes other than ZmC3H2 identified by GWAS, none was found involved in the lignin biosynthesis process. Taking previous results into account, it appears that the lignin content is mainly controlled by functional variations in several genes which play key roles in the metabolic pathway.

Co-localization of associations for the different cell wall components
In addition to the candidate genes mentioned above, we found six common associated genes for cellulose and lignin. These genes were annotated encoding a kinase associated protein phosphatase (GRMZM2G042627), an SBP transcription factor (GRMZM2G106798), an NAD(P)binding Rossmann-fold superfamily protein (GRMZM2G148355), a RING membrane-anchor protein (GRMZM2G169994) and two uncharacterized proteins (GRMZM2G036996 and GRMZM2G133444). The kinase-associated protein phosphatase gene, GRMZM2G042627, which has also been found co-localized with MCB related associations [51], was reported to regulate responses to biotic stress [74]. Other genes didn't have direct correlation with cell wall components. In light of the strong correlation between CEL and LIG detected in this study (Table 1) and no obvious intersection between two metabolic pathways, the co-localization of the association signals needs further evaluation.

Conclusions
In the present study, we dissected the genetic architecture of cell wall components using GWAS, and found that cell wall components are controlled by many minor-effect genes. Candidate genes for each component were annotated, most of which encode transcription factors, protein kinases, enzymes involved in cell wall biosynthesis and proteins involved in other biology process. Underlying the significant associations, we found several potential candidate genes with some evidence support, which included a C3H gene involved in lignin pathway and a NAC domain transcription factor related with secondary cell wall modification. With these results, we now have better understanding of the genetic basis of the composition of stalk's cell wall, and can improve these traits through genome-wide selection. However, additional validation, such as candidate gene-based association analysis, fragment introgression and transgenic methods, is necessary to verify these associations.