A Gene-Oriented Haplotype Comparison Reveals Recently Selected Genomic Regions in Temperate and Tropical Maize Germplasm

  • Cheng He,

    Contributed equally to this work with: Cheng He, Junjie Fu

    Affiliations College of Agriculture and Biotechnology, China Agricultural University, Beijing, China, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China

  • Junjie Fu,

    Contributed equally to this work with: Cheng He, Junjie Fu

    Affiliation Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China

  • Jie Zhang,

    Current address: Molecular Genetics Key Laboratory of China Tobacco, Guizhou Academy of Tobacco Science, Guiyang, China

    Affiliation College of Agriculture and Biotechnology, China Agricultural University, Beijing, China

  • Yongxiang Li,

    Affiliation Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China

  • Jun Zheng,

    Affiliation Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China

  • Hongwei Zhang,

    Affiliation Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China

  • Xiaohong Yang,

    Affiliation College of Agriculture and Biotechnology, China Agricultural University, Beijing, China

  • Jianhua Wang,

    wangguoying@caas.cn (GW); wangjh63@cau.edu.cn (JW)

    Affiliation College of Agriculture and Biotechnology, China Agricultural University, Beijing, China

  • Guoying Wang

    wangguoying@caas.cn (GW); wangjh63@cau.edu.cn (JW)

    Affiliation Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China

Abstract

The extensive genetic variation present in maize (Zea mays) germplasm makes it possible to detect signatures of positive artificial selection that occurred during temperate and tropical maize improvement. Here we report an analysis of 532,815 polymorphisms from a maize association panel consisting of 368 diverse temperate and tropical inbred lines. We developed a gene-oriented approach adapting exonic polymorphisms to identify recently selected alleles by comparing haplotypes across the maize genome. This analysis revealed evidence of selection for more than 1100 genomic regions during recent improvement, and included regulatory genes and key genes with visible mutant phenotypes. We find that selected candidate target genes in temperate maize are enriched in biosynthetic processes, and further examination of these candidates highlights two cases, sucrose flux and oil storage, in which multiple genes in a common pathway can be cooperatively selected. Finally, based on available parallel gene expression data, we hypothesize that some genes were selected for regulatory variations, resulting in altered gene expression.

Introduction

Following its domestication in southwestern Mexico ~9,000 years ago [1], maize has undergone further selection for local adaptation and through modern breeding, which has had significant effects on the genetic organization of maize [2]. Many studies have explored the dispersal of maize in both temperate and tropical regions [3], but few have examined genome-wide changes in the genetic constitution using genomic approaches [1,4,5]. Although genomic approaches can be applied to ancient DNA to explore evolution [6], most studies have used an alternative strategy—detecting evolutionary footprints by scanning the genomes with molecular markers in populations to reveal clues to past changes [7]. Under natural or artificial selection, the advantageous alleles and linked polymorphisms are found at a higher frequency, causing a reduction in genetic polymorphism, which has shaped adaptation and phenotypes.

Scans for selection in a population have largely been based on searches for distortions in the allele frequency spectrum or haplotypes under assumptions of neutrality [8]. The first genome-wide scans for selection, taking advantage of differentiation across populations, focused on average FST over multilocus windows [9], an approach that has been applied to compare maize lines at various stages in their breeding history [10]. The cross population composite likelihood ratio (XP-CLR) test is another approach used to search for historical selection through population comparisons of the allele frequency spectrum [11,12]. Using this method, evidence for selection across the genome during maize domestication and improvement was evaluated in teosinte versus maize landraces, and landraces versus improved lines, respectively [4]. This same method was also used to reveal complex adaptation in maize worldwide by comparing tropical and temperate lines [13]. The XP-CLR approach is designed to detect “ancient” signature bottlenecks from several thousand years ago, while the cross population extended haplotype homozygosity (XP-EHH) test, a haplotype-based approach, is designed to detect recently fixed or high frequency alleles in selective sweeps through population comparisons [14]. The XP-EHH test is insensitive to background selection and provides a less confounded approach for the systematic detection of positive selection in the genome [15].

Based on the genome-wide resequencing and analysis of wild, landrace, and improved maize lines, the genomic regions most affected by selection during domestication and improvement have been reported [4]. As an alternative to genomic resequencing, RNA sequencing technology, which defines both sequence and expression divergence [16], provides a cost-effective way to explore even larger groups of inbred lines [13]. In addition, information about differential expression also provides an opportunity to examine the global consequences of gene expression in relation to selection pressure [17]. Here, we use 532,815 common single nucleotide polymorphisms (SNPs) from our previous RNA sequencing study of 368 diverse maize lines to detect genomic regions affected by recent improvement by comparing diverse temperate and tropical inbred lines. Recent or ongoing positive selection of genomic regions was detected using a more powerful test based on linkage disequilibrium (LD) and haplotypes [14], which should yield promising candidates for the molecular breeding of maize.

Materials and Methods

Plant materials and RNA sequencing

A collection of 368 maize inbred lines was studied for the detection of selective sweeps. This diverse group of maize lines, a subset of an association panel, comprised 190 temperate lines and 178 tropical lines, classified on the basis of modeled genetic components (S1 Table), available pedigree information, and environmental adaptations [18]. Because this two-group classification includes much more diverse temperate and tropical lines, it reduces the false positives that arise from the founder effect when fewer lines are used; any conflict in grouping will only decrease statistical power. Sequencing RNA extracted from immature seeds 15 days after pollination (DAP) from the 368 inbred lines generated 25.8 billion reads. On average, 70.3% of the reads mapped to the B73 reference genome (AGPv2) were located in annotated genes (filtered gene set, release 5b60). Among genes with mapped reads, at least 71.6% had >50% of their coding region covered by reads. In total, 1.02 million high-quality SNPs and 28,769 genes were detected. Because many of the lines used in this study are nearly homozygous, we do not expect >2 alleles at any given locus within an inbred line. A subset of the high-quality SNPs was validated using the MaizeSNP50 BeadChip, the Sequenom MassArray iPLEX platform, and PCR resequencing, with a concordance rate >96% [16]. For the purpose of selection scans, SNPs with a minor allele frequency (MAF) of <5% were filtered out.
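The MAF filter described above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the data layout (one allele call per inbred line, `None` for a missing call) and the function names are assumptions made for illustration only.

```python
from collections import Counter


def minor_allele_frequency(calls):
    """Return the minor allele frequency of one biallelic SNP.

    `calls` holds one allele call per inbred line (for example "A" or "G").
    Missing calls are assumed to be encoded as None and are ignored.
    """
    counts = Counter(c for c in calls if c is not None)
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0  # uncalled or monomorphic site
    return min(counts.values()) / total


def filter_by_maf(snps, threshold=0.05):
    """Keep SNPs whose MAF is at least `threshold` (drops MAF < 5% by default)."""
    return {
        name: calls
        for name, calls in snps.items()
        if minor_allele_frequency(calls) >= threshold
    }
```

Treating each nearly homozygous inbred line as contributing a single allele call is a simplification; heterozygous calls would need an explicit rule.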

Genome-wide scan for evidence of selection

Extended Haplotype Homozygosity (EHH) is defined as the probability that two randomly chosen chromosomes are homozygous at all SNP loci within a given interval. The Cross Population Extended Haplotype Homozygosity (XP-EHH) test can detect alleles that have increased in frequency to the point of fixation or near-fixation in one of two populations based on their haplotype differences. XP-EHH has power over a very narrow time scale of positive selection. In our study, we performed a genome scan using the XP-EHH test between temperate and tropical maize lines. We estimated the genetic positions of the SNPs from a generalized additive model, which was fitted to a set of genetic markers with both known genetic positions and physical positions (http://ftp.maizegdb.org/MaizeGDB/FTP/B73_RefGen_v2_dumps/AGPv2_markers-7184.xlsx). Each gene region, including its 10-kb upstream and downstream regions, was taken as a unit to calculate the XP-EHH score of its SNPs. To estimate the genome-wide false discovery rate (FDR) in the selection scan, we divided the 368 maize lines into two ‘artificially identical’ populations, such that each pair of inbred lines split between the two populations is nearly identical in its population genetic components (S2 Table). We calculated XP-EHH scores for all SNP markers in this population comparison using the same method as that used between temperate and tropical lines. At a given significance level, the FDR was calculated as P(p0 > z)/P(p1 > z), where p0 is the XP-EHH score from the permuted populations, p1 is the XP-EHH score from the real populations, z is the cutoff of XP-EHH scores at the given level in the real populations, P(p0 > z) is the fraction of XP-EHH scores >z in the permuted populations, and P(p1 > z) is the corresponding fraction in the real populations.
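Two quantities in this paragraph lend themselves to a short illustration: the haplotype homozygosity that underlies EHH, and the empirical FDR ratio P(p0 > z)/P(p1 > z). The sketch below is an assumption-laden illustration, not the software used for the scan: `ehh` computes homozygosity over one fixed window only (the full XP-EHH statistic additionally integrates EHH over genetic distance and compares the two populations), and both function names are hypothetical.

```python
from collections import Counter
from math import comb


def ehh(haplotypes):
    """Probability that two randomly drawn haplotypes are identical
    at every SNP in the window.

    `haplotypes` is a list of equal-length allele strings, one per line.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(tuple(h) for h in haplotypes)
    # Identical pairs within each haplotype class, over all possible pairs.
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


def empirical_fdr(null_scores, real_scores, z):
    """FDR at cutoff z, computed as P(p0 > z) / P(p1 > z).

    `null_scores` are scores from the 'artificially identical' comparison,
    `real_scores` are scores from the temperate versus tropical comparison.
    """
    frac_null = sum(s > z for s in null_scores) / len(null_scores)
    frac_real = sum(s > z for s in real_scores) / len(real_scores)
    if frac_real == 0:
        raise ValueError("no real scores exceed the cutoff")
    return frac_null / frac_real
```

For example, four haplotypes of which three are identical give an EHH of 3/6 = 0.5, because three of the six possible pairs match at every site.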

Identifying selected genomic regions

To find the genomic regions affected by selected alleles, we used a ‘Location and Extension’ method to identify selected regions from the empirical top SNPs. These SNPs were divided into two groups based on their potential selection in temperate or tropical maize lines. First, in each group, genes with at least two SNPs in the top 5% of XP-EHH scores were identified. Second, because the average distance between two adjacent genes containing SNP variations is approximately 100 kb, each region was iteratively extended by an upstream or downstream 100-kb region containing less significant SNPs (in the top 10% of XP-EHH scores; S6 Fig). The flanking significant SNPs define the boundaries of the selected region. After location and extension, the intensity of each region is represented by the highest score in the extended region. The regions including the top 1% SNPs were then identified as candidate selected regions, and the gene containing the SNP with the highest XP-EHH score in each region was defined as the candidate gene under selection.
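One possible reading of the ‘Location and Extension’ procedure is sketched below. It is an interpretation under stated assumptions, not the authors' implementation: SNPs are given as `(position, gene, score)` tuples for one chromosome and one group, a higher score means a stronger signal for that group, the three cutoffs are precomputed, and overlapping regions are merged. The function name and data layout are hypothetical.

```python
def locate_and_extend(snps, cut5, cut10, cut1, step=100_000):
    """Return a list of (start, end, candidate_gene, top_score) regions.

    snps  : list of (position, gene, score) on one chromosome
    cut5  : score cutoff for the top 5% (seeds a region)
    cut10 : score cutoff for the top 10% (extends a region)
    cut1  : score cutoff for the top 1% (region is kept as a candidate)
    step  : extension distance in bp (100 kb in the text)
    """
    snps = sorted(snps)

    # Location: genes with at least two SNPs at or above the top-5% cutoff.
    hits = {}
    for pos, gene, score in snps:
        if score >= cut5:
            hits.setdefault(gene, []).append(pos)
    seeds = sorted((min(p), max(p)) for p in hits.values() if len(p) >= 2)

    # Extension: grow each seed while a top-10% SNP lies within `step` bp.
    sig10 = [pos for pos, _, score in snps if score >= cut10]
    regions = []
    for start, end in seeds:
        changed = True
        while changed:
            changed = False
            up = [p for p in sig10 if start - step <= p < start]
            down = [p for p in sig10 if end < p <= end + step]
            if up:
                start, changed = min(up), True
            if down:
                end, changed = max(down), True
        if regions and start <= regions[-1][1]:
            # Merge with the previous region when the two overlap.
            prev_start, prev_end = regions[-1]
            regions[-1] = (min(prev_start, start), max(prev_end, end))
        else:
            regions.append((start, end))

    # Candidate regions must contain a top-1% SNP; report the top-scoring gene.
    results = []
    for start, end in regions:
        inside = [(score, gene) for pos, gene, score in snps if start <= pos <= end]
        top_score, top_gene = max(inside)
        if top_score >= cut1:
            results.append((start, end, top_gene, top_score))
    return results
```

The flanking top-10% SNPs set the region boundaries, which matches the statement that the flanking significant SNPs define the selected region.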

Phylogenetic analysis of candidate genes

Neighbor-joining trees were constructed for the candidate genes with more than 40 SNPs in the gene region using MEGA7 (http://www.megasoftware.net/) with the parameters Test of Phylogeny = Bootstrap method (500), Model = p-distance, Gaps = Pairwise deletion. Trees were traversed using a custom R script to identify the common nodes of the temperate or tropical lineages, and the ratios of tropical lines co-segregating with the temperate group and of temperate lines co-segregating with the tropical group were calculated. To avoid false positives, candidate genes with fewer than 50 lines in the temperate or tropical group were excluded.

Population genetic analyses

Population genetic summary statistics (π, FST, Tajima’s D) were calculated for each gene coding region. We chose the SNPs located in the first transcripts of the maize filtered gene sets (B73 AGPv3, release 5b60) for analyses to reduce the confounders caused by different gene isoforms. Gene-based π and Tajima's D calculations were performed with VariScan (version 2.0.3), in which RunMode = 12 and the target region interval was set according to the transcript length. Gene-based FST was also used to measure the genetic differences between maize tropical and temperate subpopulations. Weir and Cockerham's estimator of FST was calculated by VCFtools (version 0.1.14) with the SNPs located in the first transcript of each gene. For whole-genome level calculations of π and FST, all the RNA-seq SNPs were included, with the same methods used as for the gene-based analyses.
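For readers unfamiliar with these statistics, the sketch below computes π and Tajima's D from a small haplotype matrix using the standard textbook formulas. It is not VariScan and does not reproduce its run modes; the input encoding (equal-length strings of '0' and '1', one per inbred line, biallelic sites only) and the function names are assumptions for illustration.

```python
from math import comb, sqrt


def nucleotide_diversity(haplotypes, length=None):
    """Average number of pairwise differences between haplotypes.

    If `length` (bp of the region) is given, the per-site value is returned.
    """
    n = len(haplotypes)
    pairs = comb(n, 2)
    pi = 0.0
    for site in zip(*haplotypes):
        k = site.count("1")
        pi += k * (n - k) / pairs  # pairs that differ at this site
    return pi / length if length else pi


def segregating_sites(haplotypes):
    """Number of sites at which more than one allele is observed."""
    return sum(1 for site in zip(*haplotypes) if len(set(site)) > 1)


def tajimas_d(haplotypes):
    """Tajima's D: scaled difference between pi and Watterson's estimator."""
    n = len(haplotypes)
    s = segregating_sites(haplotypes)
    if n < 4 or s == 0:
        raise ValueError("need at least four haplotypes and one segregating site")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (nucleotide_diversity(haplotypes) - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))
```

An excess of rare variants, as expected after a selective sweep, pushes D below zero, which is the direction of the signal discussed in the Results.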

Annotation and enrichment analysis of maize genes

Gene annotation of protein kinases, transcription factors, and the ubiquitin-proteasome system followed the ProFITS [19] database for maize. Flowering time genes were collected from a study of ZmCCT [20], and the key maize genes were selected based on another study of maize gene evolution [21]. In order to assess the potential biological functions of the genes that underwent selection, the gene ontology (GO) enrichment was analyzed using the web toolkit agriGO [11]. When five or more mapped genes were grouped into each GO term, hypergeometric distributions were applied to test the significance against a background of the maize reference genome.
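The hypergeometric enrichment test mentioned above can be sketched with the Python standard library alone. This is an illustration of the statistic, not agriGO's implementation: the function names, the dictionary layout, and the reading of the five-gene rule as a minimum number of candidate genes per term are assumptions, and every annotated gene is assumed to belong to the background set.

```python
from math import comb


def hypergeom_enrichment_p(background, annotated, drawn, hits):
    """Upper-tail probability P(X >= hits) for a hypergeometric variable.

    background : number of genes in the reference set
    annotated  : number of background genes carrying the term
    drawn      : number of candidate genes tested
    hits       : number of candidate genes carrying the term
    """
    upper = min(annotated, drawn)
    favourable = sum(
        comb(annotated, i) * comb(background - annotated, drawn - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(background, drawn)


def enrich_terms(term_genes, candidates, background_size, min_genes=5):
    """Return {term: p-value} for terms with at least `min_genes` candidate hits."""
    candidates = set(candidates)
    results = {}
    for term, genes in term_genes.items():
        genes = set(genes)
        hits = candidates & genes
        if len(hits) < min_genes:
            continue  # too few mapped genes to test this term
        results[term] = hypergeom_enrichment_p(
            background_size, len(genes), len(candidates), len(hits)
        )
    return results
```

In practice the resulting p-values would still need a multiple-testing correction across terms, which this sketch leaves out.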

Analysis of differential gene expression

To quantify gene expression, the RNA-seq reads were mapped to the maize genes (filtered gene set, release 5b60), and reads per kilobase per million mapped reads (RPKM) was calculated for each gene [22] as a representation of gene expression. Genes expressed in at least 30 inbred lines in both the temperate and tropical populations were used for differential expression analysis. The log10 of the average expression of a gene in each population was calculated, and a t-test was performed to identify whether the gene was differentially expressed at a significance level of 0.05.
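The RPKM normalization and the two-group comparison can be illustrated as follows. This is a sketch under assumptions rather than the analysis code: the text does not state which t-test variant was used, so a Welch (unequal-variance) t statistic is shown as one reasonable choice, only the statistic is computed (a p-value would require a t distribution, for example from a statistics package), and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def rpkm(gene_reads, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads."""
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)


def welch_t(group_a, group_b):
    """Welch's t statistic for two samples of (log-transformed) expression."""
    na, nb = len(group_a), len(group_b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two values")
    standard_error = sqrt(variance(group_a) / na + variance(group_b) / nb)
    return (mean(group_a) - mean(group_b)) / standard_error
```

For example, 500 reads on a 2-kb gene in a library of 10 million mapped reads gives an RPKM of 25.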

Results

More selected genomic regions are found in temperate maize

To identify specific regions of the maize genome affected by selection during temperate or tropical maize improvement, we used a haplotype-based approach (XP-EHH test) to scan for selection signals based on population comparisons of the diverse maize inbred lines. The temperate lines retain 98.5% of the nucleotide diversity present in the group of maize inbred lines, while the tropical lines retain 97.0%. We applied a location-extension strategy to group the selection signals into the swept regions (S1 Fig). We focused our analyses on the regions that include the SNPs at the top 1% cutoff of XP-EHH scores.

In total, we identified 730 and 421 candidate selected regions that are specific to temperate maize and tropical maize, respectively, using an outlier approach (Fig 1A and 1B, S3 Table). We further estimated the genome-wide false discovery rates (FDRs) of these selected regions. At the top 1% cutoff, the FDR was 0.14 for selected regions in temperate maize and 0.33 for selected regions in tropical maize. Compared with tropical selected regions, the temperate selected regions are larger and contain more genes. On average, selected regions in temperate maize contained 5.3 genes with a size of 95.2 kb covering 3.4% of the genome, while the selected regions in tropical maize contained 3.2 genes with a size of 71.1 kb covering only 1.5% of the genome (Table 1, Fig 1C and 1D). In addition, the values of Tajima’s D, a measure of deviation of the allele frequency distribution from that expected under the standard neutral model, were lower in temperate maize than in tropical maize. Among values of Tajima’s D below -1, the average was -1.32 in temperate maize and -1.18 in tropical maize. These results indicate that temperate maize lines experienced more intensive selection than did the tropical lines, which is consistent with the fact that temperate maize has a longer history of modern intensive breeding than does tropical maize.

Table 1. Genomic regions selected in tropical and temperate maize lines.

https://doi.org/10.1371/journal.pone.0169806.t001

Fig 1. Genome-wide analysis of selected genomic regions in maize.

Genome-wide XP-EHH scores for temperate (A) and tropical (B) selected regions. The cutoff lines represent an empirical cutoff for the top 1%. Distributions of region size (C) and gene counts (D) within selected regions in tropical and temperate maize lines.

https://doi.org/10.1371/journal.pone.0169806.g001

Regulatory genes are enriched in the selected regions

Previous studies have shown that genes of interest in crop domestication and improvement are likely to be regulatory genes [23]. In this study, the top five candidate genes under selection in temperate maize (mybr84, tcptf31, bhlh173, CYCD1, ca5p2) encode transcription factors (S3 Table). We also used three super classes of genes in regulatory and signaling pathways [19] to reveal functional overrepresentation of genes under artificial selection. Compared to all maize genes, only kinases were significantly enriched in the candidate temperate genes at a significance level of 0.01 (Table 2). We also found that both transcription factor genes and ubiquitin-related genes in the selected regions show significant enrichment in some specific families (S4 Table), such as bZIP, C3H, Ringfinger (hypergeometric-test, P-value <0.05). Our finding that regulatory genes are enriched under selection suggests that these genes have potentially important roles in characterizing temperate or tropical maize.

Table 2. The top 1% of genes in the selected genomic regions in temperate and tropical maize.

https://doi.org/10.1371/journal.pone.0169806.t002

Gene expression is a target of recent maize selection

Gene expression is one of the most important features that reflects the function of a gene, and expression is a target of natural or artificial selection [24,25]. We used available gene expression data from immature ears as an indicator of relative expression across maize lines and compared differences in gene expression in selected genomic regions between temperate and tropical maize lines (Table 3). Candidate selected genes from both temperate and tropical maize showed a greater relative difference in expression than the genome background at a significance level of P-value <0.05. In particular, we found that nearly half of the candidate genes had significant changes in expression in selected genomic regions in temperate maize (S3 Table). Moreover, 20 temperate candidate genes and 13 tropical candidate genes showed a large and significant difference between the two groups (fold change ≥2, significance level of P-value <0.05), indicating that at least some genes in selected regions are under selection for regulatory alleles resulting in altered gene expression.

Table 3. Genes showing differences in expression in the maize genome and selected genomic regions.

https://doi.org/10.1371/journal.pone.0169806.t003

Artificial selection of key maize genes

Modern maize breeding has placed strong emphasis on the selection of existing variants, due to pressure to increase grain yield and intense phenotypic selection of inbred lines [3], which together have left their marks on the maize genome. To understand the influence of recent artificial selection, we focused on key maize genes that have been identified by visible mutant phenotypes [21]. We found that 13 candidate selected genes in temperate maize are key genes (S3 Table, Table 4), and the percentage of selected genes among key genes is significantly higher than the percentage of selected genes in a filtered gene set (hypergeometric-test, P-value < 0.05). This reflects the more intensive selection imposed on these functionally important genes in temperate maize. Functional analysis of 14 strongly selected key genes (the top 0.5%) in selected regions (Table 4) showed their potential roles in various biological processes, such as carbon metabolism (e.g., sus1, incw1), and kernel protein biosynthesis (dzs18; Table 4). Using gene expression data from RNA sequencing (see Methods), we also found that 9 of 14 strongly selected genes showed altered expression.

Table 4. Selected key genes in genomic regions selected during improvement in temperate maize.

https://doi.org/10.1371/journal.pone.0169806.t004

Sucrose flux is strongly selected in temperate maize

Using gene ontology (GO) enrichment analysis, the selected genes in temperate maize were predicted to be involved in various biosynthetic processes (S2 Fig). Sucrose catabolism, a primary biosynthetic process (Fig 2A), plays a central role in carbon partitioning and biomass accumulation, and can only be catalyzed by two types of enzymes, sucrose synthase (SUS) and invertase (INV). An acid invertase gene (incw1) and the sus1 gene were strongly selected only in temperate maize (Fig 2C, Table 4), suggesting that regulation of the sucrose pathway is the target of temperate selection. This is further supported by the fact that two genes, GRMZM2G109383 (pgm2) and GRMZM2G140614 (Fig 2A), encoding enzymes that act directly downstream, hexokinase (HXK) and hexose phosphate isomerase (HPI), were also under strong selection in temperate maize (S5 Table). In addition, we found two strongly selected neutral/alkaline invertase genes (top 0.5%), GRMZM2G084694 and GRMZM2G115451, which are orthologous to AT1G5650 (α-A/N-Inv) and AT1G72000 (β-A/N-Inv) in Arabidopsis, and differ in structure from the acid invertases [26]. These three selected invertases likely function in different subcellular locations, suggesting a precise coordination of cellular sucrose flux during temperate maize improvement. Directed changes in the expression of these invertase genes in temperate maize suggest selection of their cis-acting regulatory regions (Fig 2B).

Fig 2. Selected genes in the sucrose catabolism pathway.

(A) Diagram showing the sucrose catabolism pathway with six genes selected during maize improvement. (B) Expression levels of six selected sucrose catabolism genes in tropical and temperate maize lines. The double stars represent highly significant expression differences between tropical and temperate lines. (C) XP-EHH scores and haplotypes of SNPs in one selected gene (GRMZM152908; sus1).

https://doi.org/10.1371/journal.pone.0169806.g002

Oil storage controlled by selection of multiple genes

Fatty acids, as a highly reduced form of carbon, are one of the primary storage compounds in maize kernels, providing nutrients for subsequent germination and early development. Triacylglycerols (TAGs) are a major storage lipid (Fig 3, Table 5). Three genes (GRMZM2G083195, GRMZM2G079109 and GRMZM2G061885) that encode the enzymes in the TAG biosynthesis pathway (GPAT, LPAAT, and PDAT), were strongly selected in temperate maize lines, suggesting precise control of oil storage. Among these genes, GPAT showed an association with lipid content. DGAT1-2 is one of the most significant loci associated with oil content [27], and it drives the final acylation to generate TAGs, depending on acyl CoA; DGAT1-2 was not detected as a selected gene in temperate maize. However, a PDAT gene, which belongs to an acyl CoA-independent pathway for the production of TAGs, was found to be under strong selection. Moreover, two additional genes (GRMZM2G152105 and GRMZM2G003501) encoding fatty acid elongation enzymes were also under selection in temperate maize. Promisingly, an orthologue (GRMZM2G167576) of LEC1, which regulates the enzyme-coding genes of fatty acid biosynthesis at the transcriptional level [28] is strongly selected (top 0.5%). This implies that not only the enzyme-coding gene regions but also the regulatory gene regions were under selection to affect oil storage.

Table 5. Candidate selected genes involved in oil biosynthesis in selected genomic regions in temperate maize.

https://doi.org/10.1371/journal.pone.0169806.t005

Fig 3. Selected genes in the triacylglycerol (TAG) biosynthesis pathway.

Three genes (GRMZM2G083195, GRMZM2G079109, and GRMZM2G061885) that encode enzymes in the TAG biosynthesis pathway (GPAT, LPAAT, and PDAT) were strongly selected in temperate maize. The asterisks indicate candidate genes identified in a previous GWAS study of maize oil [27].

https://doi.org/10.1371/journal.pone.0169806.g003

Some flowering genes were newly selected during recent improvement

Previous research suggests that improvement and adaptation in maize may not have been a sequential and discrete process, but was rather an overlapping process [13]. Thus, we focused on the selected genes involved in flowering, one of the most important gene sets for adaptation, by examining two hundred flowering candidate genes from a previous study [20]. We detected six candidate selected genes and 24 swept genes in the selected regions that may be involved in flowering (S3 Table). Most of these genes are located on chromosomes 2, 5, and 8 and overlapped with two known flowering QTLs [20] (S3 Fig). We identified two candidate selected maize genes with homology to genes related to the CO/FT module (S4 Fig). Both of these genes, one homologous to the CO repressor COL9 and the other homologous to the FT repressor TOE1, were found in the genomic regions under selection in tropical maize. These findings indicate that the regulators of the CO/FT module were likely to be selected in maize for improvement of yield and yield stability through fine-tuning of the flowering process.

Discussion

Using gene-oriented haplotype comparisons, and taking advantage of the differentiation between tropical and temperate maize germplasm, we performed comprehensive analyses of alleles that underwent selection during recent improvement or modern breeding after the geographical dispersal of maize using SNPs detected in RNA sequencing. Our main finding of interest was that the selected candidate genes in temperate maize were enriched for biosynthetic processes, which indicated that recent intensive improvement efforts targeted downstream genes to modify trait performance without pleiotropic effects. Analysis of these candidate genes highlights three cases where multiple genes in a common pathway can be cooperatively selected, which revealed the traits under improvement. This is consistent with the coordinated consequences of natural selection in humans [14]. With available gene expression data, we can hypothesize that artificial positive selection pressure partly resulted in altered expression among the selected genes.

There are two modern approaches to detect the characteristic signals of positive selection: site frequency spectrum (SFS) and extended haplotype homozygosity (EHH). Because EHH signatures can detect positive selection in fewer than 1200 generations [29], the EHH-based method should have more power to detect the signals of selective sweeps from recent artificial selection and maize breeding. Here, we chose a cross population EHH (XP-EHH) method, taking advantage of population comparisons between temperate and tropical maize populations. Considering that the SNP set we used originated from transcribed regions of the genome, we subsequently designed a gene-oriented approach to locate and extend the genomic regions that had experienced a selective sweep, which should be applicable to other genome-wide selection studies using cost-effective RNA sequencing. It is important to note that genomic regions identified using “outlier” approaches are not all expected to be targets of selection. SNPs in the tails of the empirical distribution can be false positives caused by historical demographic events [14]. We made every attempt to avoid these false positives by using data from much more diverse lines that are derived from multiple founders (S1 Table) and by assessing the FDR of the detected signatures. To further identify stronger candidate genes, we also calculated the ratio of temperate or tropical lines that co-segregated with the incorrect group by constructing phylogenetic trees of these genes. In this way, we obtained the co-segregation ratios of 324 temperate and 100 tropical candidate genes and categorized them into different classes (S6 Table). Among them, 141 temperate candidate genes and 15 tropical candidate genes showed co-segregation ratios below 0.20 (S3 Table). Taken together with the XP-EHH scores, a candidate gene that has both a high XP-EHH score and a low co-segregation ratio should be a stronger and more reliable candidate for selection in our study.

A previous report on genome-wide selection scans aimed to identify loci with an overall reduction of diversity for domestication or improvement [4]. Instead, it was the objective of our study to identify temperate- or tropical-specific selection, and the loci with an overall diversity reduction should be excluded. Only a small percentage, 8.1% and 7.1%, respectively, of temperate and tropical selected genomic regions overlapped with the aforementioned regions that were selected for improvement, indicating that most of the loci detected in our study are specific for selection in either temperate or tropical maize. Compared with all selected regions detected, the overlapping regions showed a larger range (S5 Fig). Among the overlapping regions, 35 have exactly the same candidate genes. These regions showed a diversity reduction in all 35 temperate or tropical lines included in a previous report [4], but had a high level of diversity in the tropical maize lines used in our study. These cases may be due to a founder effect from using a smaller set of lines and a relatively short time span of improvement, during which mutation has not acted to produce new variations. Compared with another study from 2016 [12], 5 temperate selected regions overlapped with the domestication regions identified in that study (S3 Table), indicating that a number of domestication loci could be under further selection in maize improvement.

A recently published study on differentiation between tropical and temperate maize, with emphasis on adaptation, used XP-CLR to detect “ancient” signals during adaptation of maize [13], while our study used XP-EHH to focus on recent maize improvement that changed the genome constitution. Only 42 of 730 temperate and 21 of 421 tropical selected regions identified by our XP-EHH test overlapped with results of the Liu et al. study, suggesting that there were significant differences between the selection signatures detected by these two studies. This is consistent with the fact that outlier-based genome scans utilizing either SFS or haplotype approaches to identify genomic regions as targets of positive selection had inconsistent overlap [30].

Only a small number of the newly identified candidate genes have been studied with regard to their function in maize; one example is the temperate candidate gene sus1, which is known to affect carbon partitioning and biomass accumulation [31]. Nucleotide diversity in the sus1 genic region is lower in temperate lines (π = 0.00477) than in tropical lines (π = 0.00737). A fraction of candidate genes have stronger selection signals than these previously identified genes that underlie morphological and physiological changes. For example, the temperate candidate gene GRMZM2G167576, an orthologue of LEC1, shows strong evidence of positive selection (XP-EHH score = -1.569), and its low Tajima’s D value (Tajima’s D = -0.437) also indicates selection. Because ~10% of the selected regions overlap with association signals of kernel weight traits (data not shown), these candidate genes, together with the swept genes, should prove useful in dissecting existing quantitative trait loci (QTLs).

The majority of maize production takes place in temperate regions of the world. However, tropical maize lines include stress-related genes that are absent in temperate maize. The selected genes identified in tropical maize, which reflect their importance for adaptation to tropical environments, provide a valuable gene pool for temperate maize improvement in the face of ongoing climate change. Tropical maize lines generally are more drought tolerant than temperate maize lines [32]. The drought-related genes dhn1 [33] and olc1 [34] have been strongly selected in tropical maize, suggesting potential roles for their selected alleles in drought tolerance in tropical maize. These selected alleles collectively constitute the genetic basis of drought tolerance in tropical maize.

The genomic landscape opens up new avenues for exploration of global DNA divergence in response to selection. RNA sequencing, a cost-effective sequencing technology, can detect sequence variation in transcribed regions and provides additional gene expression data in the studied lines. This can be integrated into the study of gene performance and differentiation under adaptation and selection during breeding. In combination with novel approaches to QTL mapping, these genomic strategies will facilitate the development of new cultivars in the face of a changing climate.

Supporting Information

S1 Fig. Computational pipeline used to identify selected regions of the maize genome in this study.

https://doi.org/10.1371/journal.pone.0169806.s001

(PDF)

S2 Fig. GO analysis of candidate genes in the top 5% of selected genomic regions in temperate maize lines.

(A) GO analysis of candidate genes in temperate selected regions. (B) Enrichment analysis of GO annotations of candidate genes in selected genomic regions in temperate lines.

https://doi.org/10.1371/journal.pone.0169806.s002

(PDF)

S3 Fig. Selected candidate and selective sweep flowering-time genes in the maize genome.

Distribution of flowering-time genes on the ten maize chromosomes. Vertical black hash marks at the bottom of each box represent the 730 and 421 selected genomic regions identified in temperate and tropical maize lines on the chromosomes based on their physical location on the maize AGPv2 reference genome. Green boxes and the letter “c” on the physical maps represent centromeres. Blue dots represent flowering-time genes in temperate selected regions and red dots represent flowering-time genes in tropical selected regions. The chromosomal positions and lengths of flowering-time QTLs are indicated by the orange boxes.

https://doi.org/10.1371/journal.pone.0169806.s003

(PDF)

S4 Fig. Selected candidate genes in the CO/FT module.

Genes shown in red were selected in tropical maize lines.

https://doi.org/10.1371/journal.pone.0169806.s004

(PDF)

S5 Fig. Size distribution of selected genomic regions identified in this study and in that of Hufford et al. [4].

https://doi.org/10.1371/journal.pone.0169806.s005

(PDF)

S6 Fig. Graphical representation of the method used to identify and extend selected genomic regions.

https://doi.org/10.1371/journal.pone.0169806.s006

(PDF)

S1 Table. Temperate and tropical maize inbred lines.

aSS, Stiff Stalk. bNSS, Non-Stiff Stalk. cTST, Tropic and Subtropic. The population structure was estimated according to the methods in Fu et al. 2013 [16].

https://doi.org/10.1371/journal.pone.0169806.s007

(XLSX)

S2 Table. Two ‘artificially identical’ populations for false discovery rate (FDR) estimation.

aSS, Stiff Stalk. bNSS, Non-Stiff Stalk. cTST, Tropic and Subtropic.

https://doi.org/10.1371/journal.pone.0169806.s008

(XLSX)

S3 Table. Selected genomic regions in temperate and tropical maize lines.

https://doi.org/10.1371/journal.pone.0169806.s009

(XLSX)

S4 Table. Significant enrichment of candidate selected genes in transcription factor families and ubiquitin pathways.

aThe number of functional genes in the whole maize genome. bP-value of the hypergeometric-test.

https://doi.org/10.1371/journal.pone.0169806.s010

(DOC)

S5 Table. Genes located in selected genomic regions.

https://doi.org/10.1371/journal.pone.0169806.s011

(XLSX)

S6 Table. Summary of co-segregation ratios for candidate genes.

https://doi.org/10.1371/journal.pone.0169806.s012

(XLSX)

Author Contributions

  1. Conceptualization: GW JF CH YL.
  2. Formal analysis: CH JF JZ XY.
  3. Funding acquisition: JF GW JW.
  4. Investigation: XY.
  5. Methodology: CH JF YL.
  6. Project administration: CH JF GW.
  7. Resources: XY JW JZ HZ.
  8. Supervision: GW JW JF.
  9. Visualization: CH.
  10. Writing – original draft: CH JF.
  11. Writing – review & editing: GW JF CH YL.

References

  1. Matsuoka Y, Vigouroux Y, Goodman MM, Sanchez J, Buckler E, et al. (2002) A single domestication for maize shown by multilocus microsatellite genotyping. Proc Natl Acad Sci U S A. 99: 6080–6084. pmid:11983901
  2. Liu K, Goodman M, Muse S, Smith JS, Buckler E, et al. (2003) Genetic structure and diversity among maize inbred lines as inferred from DNA microsatellites. Genetics. 165: 2117–2128. pmid:14704191
  3. Wallace J, Larsson S, Buckler E (2014) Entering the second century of maize quantitative genetics. Heredity. 112: 30–38. pmid:23462502
  4. Hufford MB, Xu X, Van Heerwaarden J, Pyhäjärvi T, Chia J-M, et al. (2012) Comparative population genomics of maize domestication and improvement. Nat Genet. 44: 808–811. pmid:22660546
  5. van Heerwaarden J, Hufford MB, Ross-Ibarra J (2012) Historical genomics of North American maize. Proc Natl Acad Sci U S A. 109: 12420–12425. pmid:22802642
  6. Jaenicke-Despres V, Buckler ES, Smith BD, Gilbert MTP, Cooper A, et al. (2003) Early allelic selection in maize as revealed by ancient DNA. Science. 302: 1206–1208. pmid:14615538
  7. Gepts P (2014) The contribution of genetic and genomic approaches to plant domestication studies. Curr Opin Plant Biol. 18: 51–59. pmid:24631844
  8. Crisci JL, Poh Y-P, Bean A, Simkin A, Jensen JD (2012) Recent progress in polymorphism-based population genetic inference. J Hered. 103: 287–296. pmid:22246406
  9. Weir BS, Cardon LR, Anderson AD, Nielsen DM, Hill WG (2005) Measures of human population structure show heterogeneity among genomic regions. Genome Res. 15: 1468–1476. pmid:16251456
  10. Jiao Y, Zhao H, Ren L, Song W, Zeng B, et al. (2012) Genome-wide genetic changes during modern breeding of maize. Nat Genet. 44: 812–815. pmid:22660547
  11. Chen H, Patterson N, Reich D (2010) Population differentiation as a test for selective sweeps. Genome Res. 20: 393–402. pmid:20086244
  12. Schneider KL, Xie Z, Wolfgruber TK, Presting GG (2016) Inbreeding drives maize centromere evolution. Proc Natl Acad Sci U S A. 113: E987–E996. pmid:26858403
  13. Liu H, Wang X, Warburton ML, Wen W, Jin M, et al. (2015) Genomic, transcriptomic, and phenomic variation reveals the complex adaptation of modern maize breeding. Mol Plant. 8: 871–884. pmid:25620769
  14. Sabeti PC, Varilly P, Fry B, Lohmueller J, Hostetter E, et al. (2007) Genome-wide detection and characterization of positive selection in human populations. Nature. 449: 913–918. pmid:17943131
  15. Enard D, Messer PW, Petrov DA (2014) Genome-wide signals of positive selection in human evolution. Genome Res. 24: 885–895. pmid:24619126
  16. Fu J, Cheng Y, Linghu J, Yang X, Kang L, et al. (2013) RNA sequencing reveals the complex regulatory network in the maize kernel. Nat Commun. 4: 2832. pmid:24343161
  17. Koenig D, Jiménez-Gómez JM, Kimura S, Fulop D, Chitwood DH, et al. (2013) Comparative transcriptomics reveals patterns of selection in domesticated and wild tomato. Proc Natl Acad Sci U S A. 110: E2655–E2662. pmid:23803858
  18. Yang Q, Li Z, Li W, Ku L, Wang C, et al. (2013) CACTA-like transposable element in ZmCCT attenuated photoperiod sensitivity and accelerated the postdomestication spread of maize. Proc Natl Acad Sci U S A. 110: 16969–16974. pmid:24089449
  19. Ling Y, Du Z, Zhang Z, Su Z (2010) ProFITS of maize: a database of protein families involved in the transduction of signalling in the maize genome. BMC Genomics. 11: 580. pmid:20955618
  20. Hung H-Y, Shannon LM, Tian F, Bradbury PJ, Chen C, et al. (2012) ZmCCT and the genetic basis of day-length adaptation underlying the postdomestication spread of maize. Proc Natl Acad Sci U S A. 109: E1913–E1921. pmid:22711828
  21. Schnable JC, Freeling M (2011) Genes identified by visible mutant phenotypes show increased bias toward one of two subgenomes of maize. PLoS One. 6: e17855. pmid:21423772
  22. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 5: 621–628. pmid:18516045
  23. Doebley JF, Gaut BS, Smith BD (2006) The molecular genetics of crop domestication. Cell. 127: 1309–1321. pmid:17190597
  24. House MA, Griswold CK, Lukens LN (2014) Evidence for selection on gene expression in cultivated rice (Oryza sativa). Mol Biol Evol. 31: 1514–1525. pmid:24659814
  25. Kudaravalli S, Veyrieras J-B, Stranger BE, Dermitzakis ET, Pritchard JK (2009) Gene expression levels are a target of recent natural selection in the human genome. Mol Biol Evol. 26: 649–658. pmid:19091723
  26. Xiang L, Le Roy K, Bolouri-Moghaddam M-R, Vanhaecke M, Lammens W, et al. (2011) Exploring the neutral invertase–oxidative stress defence connection in Arabidopsis thaliana. J Exp Bot. 62: 3849–3862. pmid:21441406
  27. Li H, Peng Z, Yang X, Wang W, Fu J, et al. (2013) Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat Genet. 45: 43–50. pmid:23242369
  28. Mu J, Tan H, Zheng Q, Fu F, Liang Y, et al. (2008) LEAFY COTYLEDON1 is a key regulator of fatty acid biosynthesis in Arabidopsis. Plant Physiol. 148: 1042–1054. pmid:18689444
  29. Oleksyk TK, Smith MW, O'Brien SJ (2010) Genome-wide scans for footprints of natural selection. Philos Trans R Soc Lond B Biol Sci. 365: 185–205. pmid:20008396
  30. Enard D, Depaulis F, Crollius HR (2010) Human and non-human primate genomes share hotspots of positive selection. PLoS Genet. 6: e1000840. pmid:20140238
  31. Zheng Y, Anderson S, Zhang Y, Garavito RM (2011) The structure of sucrose synthase-1 from Arabidopsis thaliana and its functional implications. J Biol Chem. 286: 36108–36118. pmid:21865170
  32. Liu S, Wang X, Wang H, Xin H, Yang X, et al. (2013) Genome-wide analysis of ZmDREB genes and their association with natural variation in drought tolerance at seedling stage of Zea mays L. PLoS Genet. 9: e1003790. pmid:24086146
  33. Close TJ (1997) Dehydrins: a commonalty in the response of plants to dehydration and low temperature. Physiol Plant. 100: 291–296.
  34. Javelle M, Vernoud V, Depège-Fargeix N, Arnould C, Oursel D, et al. (2010) Overexpression of the epidermis-specific homeodomain-leucine zipper IV transcription factor Outer Cell Layer1 in maize identifies target genes involved in lipid metabolism and cuticle biosynthesis. Plant Physiol. 154: 273–286. pmid:20605912