Nucleotide Diversity of Maize ZmBT1 Gene and Association with Starch Physicochemical Properties

The cereal Brittle1 protein has been shown to be involved in ADP-Glc transport into endosperm plastids, and plays a vital role in the biosynthesis of starch. In this study, the genomic sequences of the ZmBT1 gene in 80 elite maize inbred lines were obtained, and the nucleotide polymorphisms and haplotype diversity were analyzed. A total of 30 variants, including 22 SNPs and 8 indels, were detected in the full sequence of this gene. Among these polymorphic sites, 9 SNPs and 2 indels were located in the coding region. The polymorphisms in the CDS classified the maize ZmBT1 gene into 6 haplotypes, which encode 6 different ZmBT1 proteins. Neutrality tests suggested a decrease in population size and/or balancing selection at the maize ZmBT1 locus. To detect associations between sequence variations of this gene and starch physicochemical properties, 7 pasting and 4 gelatinization traits of starch were measured for the tested inbred lines using a rapid visco analyzer (RVA) and a differential scanning calorimeter (DSC), respectively. The association analysis revealed that an indel in the coding region was significantly associated with the phenotypic variation of starch gelatinization enthalpy.


Introduction
Starch or amylum is a carbohydrate consisting of a large number of glucose units joined by glycosidic bonds. Starch-rich crops are the main source of dietary energy for the world's population. Plant species are believed to share an evolutionarily conserved pathway of starch biosynthesis, starting from carbon dioxide fixation, followed by transitory starch degradation, sucrose synthesis, and starch synthesis in the storage organs [1]. Four classes of enzymes are involved in the starch biosynthesis pathway: ADP-glucose pyrophosphorylase (AGPase), starch synthase (SS), starch branching enzyme (SBE), and starch debranching enzyme (DBE) [2,3]. Among them, AGPase catalyzes the first committed and rate-limiting step in this pathway, and plays a vital role in the biosynthesis of starch [4]. In cereal endosperms, ADP-glucose (ADP-Glc), the main precursor for starch synthesis, is synthesized by AGPase in the cytosol and must subsequently be imported into the storage plastids [4]. Because of the importance of cereals in the production of storage starches for human diet and industrial use, the ADP-Glc transporter has attracted much attention as a key component of the starch biosynthesis pathway [5].
One of the ADP-Glc transporters clearly demonstrated in cereals is the protein Brittle1. Brittle1 proteins are plant nucleotide transporters belonging to the mitochondrial carrier family (MCF) [6]. The proteins in the MCF transport nucleotides, amino acids, inorganic ions, fatty acids, keto acids and cofactors across the mitochondrial membrane [7]. Physiological research on the maize Brittle1 mutant has revealed that ZmBT1 (Zea mays Brittle1 protein) is involved in ADP-Glc transport into endosperm plastids, and plays a critical role in the biosynthesis of starch [4,8,9]. Maize endosperm carrying the bt1 mutation is severely reduced in starch content, which results in kernels with a collapsed, angular appearance at maturity [4]. Amyloplasts isolated from young kernels with bt1 mutant endosperm were only 25% as active in ADP-Glc uptake and conversion to starch as amyloplasts from normal and mutant maize endosperms, suggesting that ZmBT1 is involved in the transport of ADP-Glc into maize endosperm plastids [8,10]. Research in other cereals has also revealed that homologs of the BT1 protein are able to transport ADP-Glc. For example, the barley lys5 mutant, which is defective in a homolog of BT1 (HvNST1), shows a reduced capacity for ADP-Glc uptake by isolated endosperm amyloplasts [5,11]. However, the BT1 proteins in dicots, such as AtBT1 in Arabidopsis and StBT1 in Solanum tuberosum, do not transport ADP-Glc, but instead transport AMP, ADP and ATP in a unidirectional mode [12].
Maize (Zea mays L.) is one of the most important cereals grown in the world. It provides staple food to many populations and is a major nutrient source for animal feed. In addition, owing to its distinctive characteristics, such as low pasting temperature and a slow tendency to retrograde, maize starch is an important raw material for industrial food production. The pasting properties of maize starch strongly affect processing behavior, flavor characteristics and storage stability. Recently, the RVA profile of starch paste viscosity has been widely employed to evaluate the quality of cereal crops, because this method requires only a small sample size and the procedure is easy to perform [13,14]. Starch gelatinization, one of the most important and distinctive properties of starch, refers to the disruption of granular structure that causes starch molecules to dissolve in water [15]. The gelatinization properties of starch are among the most important indexes in many food processing operations, including cooking, baking and extruding starch-based foods [16]. Although the physiological role of the maize ZmBT1 gene in starch biosynthesis has been established, the effect of this gene on the formation of maize starch pasting and gelatinization properties remains unknown. Moreover, no association analysis between the nucleotide polymorphisms of the maize ZmBT1 gene and the physicochemical properties of maize starch has been reported. In this work, we analyzed the nucleotide polymorphism of the maize ZmBT1 locus, and investigated the association between the sequence polymorphisms of the maize ZmBT1 gene and selected starch pasting and gelatinization properties.

Materials and Methods
Plant materials and sequencing of the maize ZmBT1 gene
A total of 80 elite maize inbred lines were used in this study. These representative inbred lines included temperate germplasm from 5 heterotic groups, as well as tropical and waxy germplasm (Table 1). They represent most of the genetic diversity available to breeding and research programs in China. In addition, some germplasm introduced from other countries was also included in this study. The inbred lines were grown in two-row plots in a randomized block design with two replications in a natural environment during 2012 in Sanya, Hainan province. The mature seeds of each inbred line were harvested in bulk for phenotypic data analysis.
Young leaves were collected at the four-leaf stage for each accession and stored at −80 °C until genomic DNA extraction. Genomic DNA was extracted from the frozen young leaves of the 80 inbred lines using a modified CTAB (cetyl trimethyl ammonium bromide) method [17]. The ZmBT1 gene in the tested inbred lines was sequenced using target sequence capture sequencing technology on the NimbleGen platform [18] by BGI Life Tech Co., Ltd. The genomic sequence and position of the maize ZmBT1 gene (GRMZM2G144081) in the inbred line B73 were used as the reference for target sequence capture.

Measurement of maize starch pasting and gelatinization properties
The pasting properties were measured using a rapid visco analyser (RVA) (Model No. RVA-3D, Newport Scientific, Sydney, Australia). A total of 3 g of starch from each inbred line was dispersed in 25 ml of distilled water in the viscometer test canister. The sequential temperature profile for a 12.5 min test was as follows: (1) incubate at 50 °C for 1.0 min; (2) increase to 95 °C; (3) hold at 95 °C for 2.5 min; (4) cool down to 50 °C; and (5) hold at 50 °C for 1.4 min. The viscosity was evaluated using a constant paddle rotation of 160 rpm. Viscosity values were recorded in centipoise (cP).
The gelatinization properties of maize starches were analyzed using a differential scanning calorimeter DSC 200F3 Maia (Netzsch, Germany). Starch samples (5 mg, dry starch basis) were precisely weighed into the sample pans, mixed with distilled water (10 µl), and sealed. The heating rate was 10 °C per min over the temperature range of 20-100 °C. The gelatinization properties were recorded with the thermal analysis data station of the DSC.

Sequence analysis
Multiple sequence alignment of the maize ZmBT1 gene was performed using Clustal X and was further edited manually. The software DnaSP 5.0 [19,20] was used to analyze nucleotide polymorphism and allelic diversity. Two parameters of nucleotide diversity, π and θ, were estimated, where π is the average number of nucleotide differences per site between any two DNA sequences, and θ is derived from the total number of segregating sites and corrected for sample size. Tajima's D [21] and Fu and Li's D* and F* [22] statistical tests were used to test for departures from neutral evolution within the selected population and each defined region. The minimum number of recombination events [23] in the history of the ZmBT1 gene among the tested inbred lines was estimated. The linkage disequilibrium (LD) between any two polymorphic sites was estimated using TASSEL v3.0 [24]. In addition, the decay of LD with physical distance in the ZmBT1 gene was evaluated by regression analysis (PROC NLIN and REG in SAS software). The regression models used in this study included linear, loglinear, exponential, power and Remington's models [25].
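As a rough illustration of the LD analysis described above, the following Python sketch computes r² between two biallelic sites and fits a linear decay of r² against physical distance. It is not the TASSEL or SAS implementation; the function names (`r_squared`, `linear_fit`, `distance_at`) and the toy data are assumptions introduced here. Sites are treated as phased haplotypes, a simplification that suits homozygous inbred lines.

```python
def r_squared(site_a, site_b):
    """r^2 between two biallelic sites scored on the same haplotypes.

    Each argument is a sequence of alleles, one per inbred line
    (hypothetical input format, not taken from the paper).
    """
    n = len(site_a)
    a0, b0 = site_a[0], site_b[0]  # arbitrary reference allele at each site
    p_a = sum(x == a0 for x in site_a) / n
    p_b = sum(y == b0 for y in site_b) / n
    p_ab = sum(x == a0 and y == b0 for x, y in zip(site_a, site_b)) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else float("nan")


def linear_fit(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def distance_at(a, b, target):
    """Distance at which the fitted line a + b * x reaches `target`."""
    return (target - a) / b


if __name__ == "__main__":
    # Toy haplotypes for 8 lines at 4 sites (positions in bp are made up).
    sites = {100: "AAAATTTT", 400: "CCCCGGGG", 900: "AAATTTTA", 1500: "CGCGCGCG"}
    positions = sorted(sites)
    dists, r2s = [], []
    for i, p in enumerate(positions):
        for q in positions[i + 1:]:
            dists.append(q - p)
            r2s.append(r_squared(sites[p], sites[q]))
    a, b = linear_fit(dists, r2s)
    print("intercept", a, "slope", b)
```

Under a linear model, `distance_at(a, b, 0.1)` gives the distance at which the predicted r² falls to 0.1; the other models listed above would need a nonlinear fit instead.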

Population structure and association analysis
Population structure is a major source of bias leading to false-positive associations. To alleviate the effect of population structure, all inbred lines were genotyped with an SNP chip containing 3,072 random SNP markers evenly covering the maize genome. These SNP markers were selected from the 49,585 SNP markers used by recently reported chips [26]. SNP genotyping was performed via the GoldenGate assay at the National Maize Improvement Centre of China, China Agricultural University. The population structure was evaluated with these SNP markers, and the resulting Q-values were obtained from the STRUCTURE program [27]. Five independent runs were performed for each number of populations (k) from 2 to 8, with the burn-in length and the number of MCMC (Markov Chain Monte Carlo) replications both set to 100,000, under a model of admixture and correlated allele frequencies. The k value was determined from LnP(D) in the STRUCTURE output and an ad hoc statistic Δk based on the rate of change in LnP(D) between successive k. Tests of association between the sequence polymorphisms with minor allele frequency (MAF) ≥ 0.05 and the starch pasting and gelatinization properties in the tested population were performed using the general linear model (GLM) in TASSEL v3.0 [24].
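The Δk statistic mentioned above can be sketched as follows. This is a minimal illustration, assuming Δk is the absolute second difference of the mean LnP(D) across replicate runs divided by the standard deviation of LnP(D) at that k. The function name `delta_k` and the LnP(D) values are hypothetical and do not come from this study.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Compute an Evanno-style delta-k for each interior k.

    lnp_by_k maps k to a list of LnP(D) values from replicate runs.
    The smallest and largest k have no second difference and are skipped.
    """
    means = {k: mean(v) for k, v in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        sd = stdev(lnp_by_k[k])
        result[k] = second_diff / sd if sd > 0 else float("inf")
    return result


if __name__ == "__main__":
    # Made-up LnP(D) values for three replicate runs at each k.
    runs = {
        2: [-1000, -1002, -998],
        3: [-900, -901, -899],
        4: [-880, -882, -878],
        5: [-870, -873, -867],
    }
    dk = delta_k(runs)
    print(dk, "best k:", max(dk, key=dk.get))
```

The k with the largest Δk is usually taken as the most likely number of subpopulations, alongside inspection of the LnP(D) curve itself.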

Nucleotide diversity and selection of the maize ZmBT1 gene
The position and nucleotide sequence of the maize ZmBT1 gene in inbred line B73, whose genome has been fully sequenced, were used as the reference to capture the sequences of this gene in 80 inbred lines. Sequence polymorphisms were detected among the 80 maize inbred lines across 2,442 bp of sequence, which covers a 520 bp 5′ upstream promoter region, a 624 bp exon1 region, a 131 bp intron1 region, a 162 bp exon2 region, a 128 bp intron2 region, a 534 bp exon3 region and a 337 bp 3′-UTR region. Nucleotide substitutions and indels at the ZmBT1 locus were identified, and the results are summarized in Table S1 and Table 2. From the genomic sequences of the 80 maize inbred lines, a total of 30 variants were identified, including 22 SNP sites and 8 indels. Among the SNP sites, only one was a singleton variable site, while the other 21 were parsimony-informative sites (Tables 2 and S1). In addition, 2 indels were singleton variations, while the other 6 indels were parsimony-informative. Across all 80 inbred lines, the overall nucleotide diversity (π) of the ZmBT1 locus was 0.00351. However, the polymorphic sites were unevenly distributed among the 7 defined regions of the maize ZmBT1 locus. There was no nucleotide substitution in intron1, exon2 or intron2. In addition, no indel was found in exon2, while all the other regions possessed at least one indel. Tajima's D is a widely used test to identify sequences that do not fit the neutral theory model at equilibrium between mutation and genetic drift. In this analysis, the estimates of Tajima's D in the promoter and exon1 regions were both significantly greater than 0 at the 0.01 level. In addition, the Tajima's D statistic for the entire region of the maize ZmBT1 gene was also significantly greater than zero.
Furthermore, when all three exons were combined, the estimate of Tajima's D was 3.23599, which was also statistically significant at the 0.01 level. These results revealed low levels of both low- and high-frequency polymorphisms at the maize ZmBT1 locus, that is, an excess of intermediate-frequency variants, and indicated a decrease in population size and/or balancing selection. In addition, the estimates of Fu and Li's F* for both the coding region (2.3416) and the entire region were significant for the ZmBT1 gene, also suggesting balancing selection on this gene.
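To make the statistics reported above more concrete, here is a hedged Python sketch of how π, Watterson's θ and Tajima's D can be computed from an alignment, following the standard formulas of Tajima (1989). It is not the DnaSP code. The function name `diversity_stats` is an assumption, and the sketch assumes a gap-free alignment of equal-length sequences, so indels are not handled specially.

```python
from math import sqrt


def diversity_stats(seqs):
    """Return (pi, theta_w, tajima_d) for aligned, equal-length sequences.

    pi and theta_w are per-site values; tajima_d is NaN when there are
    no segregating sites.
    """
    n, length = len(seqs), len(seqs[0])
    n_pairs = n * (n - 1) / 2
    seg_sites = 0
    k = 0.0  # mean number of pairwise differences over the whole alignment
    for column in zip(*seqs):
        counts = {}
        for base in column:
            counts[base] = counts.get(base, 0) + 1
        if len(counts) > 1:
            seg_sites += 1
            identical_pairs = sum(c * (c - 1) / 2 for c in counts.values())
            k += (n_pairs - identical_pairs) / n_pairs
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    pi = k / length
    theta_w = seg_sites / (a1 * length)
    if seg_sites == 0:
        return pi, theta_w, float("nan")
    # Variance constants from Tajima (1989).
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    tajima_d = (k - seg_sites / a1) / sqrt(variance)
    return pi, theta_w, tajima_d


if __name__ == "__main__":
    toy = ["AAAA", "AAAT", "AATT", "ATTT"]  # made-up alignment
    print(diversity_stats(toy))
```

A positive D arises when π exceeds θ, which happens when variants sit at intermediate frequencies; an alignment dominated by singletons gives a negative D.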

Haplotype diversity of the maize ZmBT1 gene
Based on the full-length sequence of the ZmBT1 gene in the 80 tested maize inbred lines, a total of 11 haplotypes were detected, with a haplotype diversity (Hd) of 0.7734 (Table 3). The tested inbred lines were unevenly distributed among these haplotypes. Among the haplotypes identified in this analysis, 6 contained only one inbred line. The most frequent haplotype was Hap_1, which contained 29 inbred lines. In addition, the four most frequent haplotypes, Hap_1 to Hap_4, contained 90% of the tested inbred lines.
In the coding region of the maize ZmBT1 gene, a total of 9 SNPs were detected. In addition, 2 indels were identified in the coding region. When the coding sequences were used to assess haplotype diversity, a total of 6 haplotypes were identified for these 80 inbred lines (Table 3), with an Hd of 0.7440. Each of the haplotypes defined by the coding sequences of the ZmBT1 gene contained at least 2 inbred lines. The most frequent CDS haplotype was CDS_Hap_1, which contained 32 inbred lines.
Among the SNPs detected in the coding region, only one was nonsynonymous and could cause an amino acid replacement. In addition, the 2 indels could also alter the ZmBT1 protein. Both indels covered 3 or 6 nucleotides, and therefore do not cause a frame shift during translation. Indel7 had three forms: no deletion, a 3-nucleotide deletion and a 6-nucleotide deletion. When the CDS was translated into amino acid sequences, 6 types of ZmBT1 protein sequences were found to be encoded by these inbred lines (Fig. 1). Three evolutionarily conserved mitochondrial carrier protein domains (Mito_carr, PF00153) were detected in the maize ZmBT1 protein using Pfam. However, none of the four amino acid variants caused by the two indels and the one nonsynonymous SNP was located in these three domains.
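Haplotype diversity is commonly defined, following Nei, as Hd = n/(n − 1) × (1 − Σ p_i²), where p_i is the frequency of the i-th haplotype among n sequences. The sketch below applies that formula to a list of haplotype labels. The function name `haplotype_diversity` and the example counts are illustrative assumptions and are not the counts in Table 3.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's haplotype (gene) diversity for a list of haplotype labels.

    Labels can be any hashable values, for example full sequences or
    names such as "Hap_1"; identical labels are the same haplotype.
    """
    counts = Counter(haplotypes)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("at least two sequences are required")
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


if __name__ == "__main__":
    # Hypothetical sample: 5 lines of one haplotype, 3 of another, 2 singletons.
    sample = ["Hap_1"] * 5 + ["Hap_2"] * 3 + ["Hap_3", "Hap_4"]
    print(round(haplotype_diversity(sample), 4))
```

Hd is 0 when every line carries the same haplotype and 1 when every line carries a different one, so values near 0.75 indicate that a few common haplotypes dominate alongside several rare ones.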

Linkage disequilibrium and recombination events
Linkage disequilibrium was investigated between pairwise segregating sites in order to predict the expected resolution and marker density needed for candidate-gene association mapping. In this analysis, all the SNPs identified in the maize ZmBT1 gene were used and LD was measured by r². More than half of the pairs of polymorphic sites in the maize ZmBT1 gene showed significant LD (130 of the 231 tested pairs were significant at P < 0.0001). To test the decay of LD with increasing physical distance, several regression equations, including linear, loglinear, exponential, power and Remington's models, were estimated. The linear regression model was selected to fit the data, because this model had the highest coefficient of determination. Our result revealed that LD decayed rapidly with increasing physical distance. The predicted value of r² declined to 0.1 within 2184 bp at the ZmBT1 locus (Fig. 2).
Table 2. Summary of parameters for the analysis of nucleotide polymorphisms of the maize gene ZmBT1.
Diversity and Association Analysis of the Maize ZmBT1 Gene. PLOS ONE | www.plosone.org
The polymorphic sites in the entire ZmBT1 locus were used to detect evidence of recombination. The patterns of polymorphism identified in the inbred lines surveyed in this study indicated a history of recombination at the ZmBT1 locus, which contributed to the haplotype diversity and the decay of LD. However, only one recombination event was detected by the algorithm of Hudson and Kaplan for the minimum number of recombination events, and this event was located between sites 979 and 1793 bp.
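The minimum number of recombination events is based on the four-gamete test: if all four allele combinations are observed at a pair of biallelic sites, at least one recombination must have occurred between them, assuming no recurrent mutation. The sketch below is a simplified illustration rather than the DnaSP implementation. It counts the largest set of non-overlapping incompatible intervals with a greedy scan, which is one common way to express the Hudson and Kaplan bound. The function names and toy data are assumptions.

```python
from itertools import combinations


def four_gametes(site_a, site_b):
    """True if all four allele combinations occur at two biallelic sites."""
    return len(set(zip(site_a, site_b))) == 4


def min_recombination_events(sites):
    """Lower bound on recombination events for sites ordered by position.

    `sites` is a list of allele strings, one string per site, each string
    holding one allele per haplotype (hypothetical input format).
    """
    intervals = [
        (i, j)
        for i, j in combinations(range(len(sites)), 2)
        if four_gametes(sites[i], sites[j])
    ]
    # Greedy interval scheduling: take intervals by increasing right end,
    # keeping those that start at or after the end of the last one kept.
    intervals.sort(key=lambda ij: ij[1])
    count, last_end = 0, 0
    for i, j in intervals:
        if i >= last_end:
            count += 1
            last_end = j
    return count


if __name__ == "__main__":
    toy_sites = ["AATT", "ATAT", "ATAT"]  # made-up data for 4 haplotypes
    print(min_recombination_events(toy_sites))
```

Because the statistic is only a lower bound, a value of one, as reported here between sites 979 and 1793 bp, does not rule out additional undetected recombination events.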

The phenotypic variations and association analysis
Pasting properties of the maize starches measured by RVA, including PV, TV, BD, FV, SB, PT and PTP, are summarized in Table 4. The gelatinization temperatures (onset, To; peak, Tp; and conclusion, Tc) and enthalpy of gelatinization (ΔH) of maize starches from different inbred lines, measured using DSC, are also presented. Significant differences in all the pasting and gelatinization properties among the maize inbred lines were observed through one-way ANOVA. These results suggest that the 80 inbred lines used in this study are representative in terms of maize quality and are suitable for association analysis.
To explore the relationships among the 11 starch pasting and gelatinization properties, pairwise correlation analysis was performed, and the Pearson correlation coefficients (r) between any two parameters were obtained (Table 5). Significant correlations were found between most pairs of pasting parameters; only 5 pairwise correlations, PT/PV, PT/TV, PT/FV, PT/SB and SB/PTP, did not reach significance. Among the 6 pairwise correlations for gelatinization properties, only the pairs ΔH/To and ΔH/Tc showed no significance. In addition, the correlations between the pasting and gelatinization properties were also investigated, and the results revealed that ΔH showed significant correlations with PV, TV, BD, PT and PTP, Tc with PV and BD, and Tp with BD. These results suggested that potentially different genetic mechanisms were responsible for these starch viscosity properties.
A GLM association analysis that controlled for the effects of population structure was used to identify associations between the 11 starch pasting and gelatinization properties and the genotype variants in the maize ZmBT1 gene. All nucleotide polymorphisms, including SNPs and indels, with a minor allele frequency of more than 0.05 were considered in the genotype-phenotype association analysis of this gene. Only one variant (indel7 in exon2) in the maize ZmBT1 gene showed a significant association with ΔH, while none of the other variants was associated with starch pasting and gelatinization properties. Indel7 causes a deletion of glutamic acid (E) in 27 inbreds or a deletion of glutamic and aspartic acids (ED) in 3 inbreds. Because the latter had a frequency lower than 0.05 in the tested population, it was not used in the association analysis. According to the association analysis, indel7 explained 9.26% of the phenotypic variation in starch ΔH. The mean ΔH was 6.091 with a standard deviation of 1.087 for the lines carrying the deletion of glutamic acid (E) in the protein product, and this value was significantly lower than that of lines without the deletion (6.449 ± 0.766) based on an independent-samples t test (P < 0.05).
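The proportion of phenotypic variation explained by a marker after accounting for population structure can be illustrated with a small least-squares sketch. This is not TASSEL's code, and the exact definition TASSEL uses is not stated in the text. Here the marker R² is assumed to be the reduction in residual sum of squares when the marker is added to a model that already contains an intercept and the Q covariates, divided by the total sum of squares. The function name `marker_r2` and the data are hypothetical.

```python
import numpy as np


def marker_r2(y, marker, q):
    """Return (marker R^2, F statistic) for one biallelic marker.

    y      : phenotype values, one per line
    marker : 0/1 genotype codes, one per line
    q      : population-structure covariates (n x c), for example the
             STRUCTURE Q columns with one column dropped
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    q = np.asarray(q, dtype=float).reshape(n, -1)
    reduced = np.column_stack([np.ones(n), q])
    full = np.column_stack([reduced, np.asarray(marker, dtype=float)])

    def sse(design):
        # Residual sum of squares of the least-squares fit.
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ beta
        return float(resid @ resid)

    sse_reduced, sse_full = sse(reduced), sse(full)
    sst = float(((y - y.mean()) ** 2).sum())
    df_resid = n - np.linalg.matrix_rank(full)
    r2 = (sse_reduced - sse_full) / sst
    f_stat = (sse_reduced - sse_full) / (sse_full / df_resid)  # 1 marker df
    return r2, f_stat


if __name__ == "__main__":
    # Made-up example with 8 lines and a single Q covariate.
    y = [5, 6, 5, 6, 7, 8, 7, 8]
    marker = [0, 0, 0, 0, 1, 1, 1, 1]
    q = [0.1, 0.9, 0.2, 0.8, 0.1, 0.9, 0.2, 0.8]
    print(marker_r2(y, marker, q))
```

Comparing the full and reduced models, rather than regressing the trait on the marker alone, is what keeps differences between subpopulations from being attributed to the marker.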

Discussion
Abundant genetic variation enables plant breeders to create novel gene combinations and select crop varieties better suited to the needs of diverse agricultural systems. The analysis of genetic diversity in crop functional genes is critical for understanding the genetic background of phenotypic variation, and in turn provides great help for crop improvement [28,29]. In this study, 30 variants, including 22 SNPs and 8 indels, were identified in the full-length sequence of the maize ZmBT1 gene. Among these SNPs, 9 were found in the coding region, one of which was nonsynonymous while the others were synonymous. In addition, there were two indels in the coding region of this gene. The nonsynonymous SNP and the indels in the coding region would change the protein product. The SNP sites and indels in the coding region also classified the tested inbred lines into 6 haplotypes, which encode 6 differing ZmBT1 proteins. However, a lower frequency of variants was found in the intron regions of this gene. In particular, no SNP was identified in the two introns of this gene. This may be because the intron regions of this gene are much shorter than the coding region.
In this study, significantly positive statistics were obtained for promoter, coding and entire regions of the maize ZmBT1 gene through Tajima's neutrality tests. Thus, a decrease in population size and/or balancing selection was suggested for this gene.
Balancing selection refers to a number of selective processes by which multiple alleles are actively maintained in the gene pool of a population at frequencies above that of gene mutation. Balancing selection usually occurs when the heterozygotes for the alleles under consideration have a higher adaptive value than the homozygotes [30]. Thus, potentially high heterozygosity at the ZmBT1 locus in the tested population is suggested.
LD is the non-random association between allelic polymorphisms at two loci. It has been suggested that recombination and selection are the main determinants of LD [31]. Maize is an outcrossing crop with extensive morphological variation, genetic diversity and a high effective frequency of recombination [32]. Recent research has revealed a rapid breakdown of LD in diverse sets of maize germplasm [25,33]. In this analysis, we found that the decay of LD at the maize ZmBT1 locus was slower than the expected value. This may be the result of a low frequency of recombination in this gene, because only one recombination event was detected in the ZmBT1 gene.
The gelatinization temperature and enthalpy of maize starch play an important role in grain quality. The enthalpy of gelatinization gives an overall measure of crystallinity and may be indicative of the loss of molecular order within the granule [34]. Previously, some genes in the starch biosynthesis pathway were found to affect the phenotypic variation of starch gelatinization properties. Using quantitative trait loci (QTL) mapping, Tan et al. demonstrated that the Wx gene and two loci including starch-branching enzyme (SBE) genes in rice controlled the starch gelatinization properties [34]. In association analyses, the sequence variations of the rice genes wx, SSI, and SSII-3 were found to be associated with the gelatinization properties To, Tp and Tc in waxy rice. In addition, the enthalpy of gelatinization (ΔH) of rice starch is controlled by wx and SSII-3 [35]. The cereal protein BT1 is involved in ADP-Glc transport into endosperm plastids, and plays a vital role in the biosynthesis of starch. In this study, we showed that the maize ZmBT1 gene possesses abundant nucleotide polymorphism. Association analysis with pasting and gelatinization properties further revealed that an indel in the coding region of this gene was associated with the gelatinization enthalpy (ΔH) of maize starch. Although ΔH was correlated with other pasting and gelatinization properties, no association was found between the polymorphic sites and these traits. In addition, the formation of starch pasting and gelatinization properties is a complex process, and all these properties are quantitative traits influenced by multiple genes. Thus, these results need further verification, because only one gene of the starch biosynthesis pathway, ZmBT1, was examined.