Recent reports have shown than many identically named genetic lines used in research around the world actually contain large amounts of uncharacterized genetic variation as a result of cross contamination of stocks, unintentional crossing, residual heterozygosity within original stocks, or de novo mutation. 27 public, large scale, RNA-seq datasets from 20 independent research groups around the world were used to assess variation within the maize (Zea mays ssp. mays) inbred B73, a four decade old variety which served as the reference genotype for the original maize genome sequencing project and is widely used in genetic, genomic, and phenotypic research. Several clearly distinct clades were identified among putatively B73 samples. A number of these clades were defined by the presence of clearly defined genomic blocks containing a haplotype which did not match the published B73 reference genome. The overall proportion of the maize genotype where multiple distinct haplotypes were observed across different research groups was approximately 2.3%. In some cases the relationship among B73 samples generated by different research groups recapitulated mentor/mentee relationships within the maize genetics community.
Citation: Liang Z, Schnable JC (2016) RNA-Seq Based Analysis of Population Structure within the Maize Inbred B73. PLoS ONE 11(6): e0157942. https://doi.org/10.1371/journal.pone.0157942
Editor: Tsuyoshi Koide, National Institute of Genetics, JAPAN
Received: April 13, 2016; Accepted: June 7, 2016; Published: June 27, 2016
Copyright: © 2016 Liang, Schnable. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All RNA-seq datasets files are available from the NCBI database (accession number(s) SRR3372478, SRR3371876, SRR651051, SRR1819621,SRR404150, SRR514100, SRR940300, SRR395191, SRR395192, SRR395194, SRR39520, SRR445245, SRR039505, SRR039506, SRR755252, SRR762349, SRR762350, SRR762351, SRR764626, SRR764627, SRR1656746, SRR1567899, SRR504480, SRR1587038, SRR1231518, SRR1272115, SRR640263, SRR520998, SRR520999, SRR536834, SRR999052, SRR248565, SRR491307, SRR1522119, SRR910231, SRR924107, SRR1030995).
Funding: This project is supported by start-up funds from University of Nebraska-Lincoln to JCS. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
A great deal of biological research depends on reference genotypes that allow researchers around the world on work with material that is genetically identical or nearly identical. For many decades, assessing whether samples labeled as coming from genetically identical sources truly were identical was a costly, time consuming, and often inconclusive process  . However, recent advances in genotyping and sequencing technology have revealed a number of cases where sample names and sequence information significantly different stories. One study of human cell cultures found that 18% of cell lines were either contaminated or something entirely different from what they were labeled as  with the widely used HeLa cell line being one of the most frequent offenders . Among plants, a recent resequencing study of arabidopsis demonstrated that a line believed to carry a mutation for the ABP1 gene in an otherwise Col-0 background actually contained a wide range of other nonsense and missense mutations as well as a large region on chromosome 3 which came from a different arabidopsis accession . In soybean (Glycine max), segregating variation covering ~3.1% of the soybean genome assembly was observed between two sources of the reference genotype used in the construction of the soybean reference genome . Resequencing of multiple plants from a single batch of Columbia-0 seed in arabidopsis identified multiple haplotypes present in areas that summed up to ~20% of the total reference genome . The problem of contaminated or mislabeled samples is a very real one in plant biology, and can invalidate the results of experiments in which substantial time and resources have been invested .
Here we set out to quantify how severely these issues of divergence among samples labeled as belonging to the same genetic background impact maize (Zea mays), a preeminent model for plant genetics over the past century. Unlike soybean and arabidopsis, maize is a naturally outcrossing species, so reference genotypes must be maintained by manually controlled self-pollination in each generation. Previous studies using small sets of individually scored markers have identified genetic variation between different sources of the same maize inbred . This study focuses specifically on the maize reference genotype B73, which was developed in Iowa and first registered in 1972 , widely used in commercial hybrid seed production across the United States for much of the 1970s and 1980s  and is represented in the parentage of many elite lines even today . B73 has also been widely used by plant biologists conducting basic genetic research in maize, and was employed in the sequencing and assembly of the first maize reference genome .
Materials and Methods
A search of NCBI’s sequence read archive identified 25 Illumina RNA-seq data sets deposited by 19 independent research group in three countries (Table 1). Two additional RNA-seq data sets were constructed from B73 seed requested from Iowa State and the USDA’s Germplasm Resources Information Network (Control 1 and Control 2 respectively). For these two samples RNA was extracted from 10-day old B73 seedlings grown at the University of Nebraska-Lincoln (Table 1). In four cases where the total amount of data per run was limited (USA 6, USA 8, USA 9 and USA 17), data from multiple sequencing runs labeled as coming from the same sample were grouped together for analysis. In one case, SRR514100, the total quantity of data was excessive, so only 1/10th of the total data set was employed.
Alignment and initial SNP calling
Low quality sequences were removed using Trimmomatic-0.33 with settings LEADING:3, TRAILING:3, SLIDINGWINDOW:4:15, MINLEN:36 . Trimmed reads were aligned to the repeat masked version of the maize reference genome (version B73 RefGen v3)  downloaded from Ensemble (ftp://ftp.ensemblgenomes.org/pub/plants/release-22/fasta/zeamays/dna/) using GSNAP in version 2014-12-29 (with parameters -N 1,-n 2,-Q) . Output files were converted from SAM to BAM format, sorted, and indexed using SAMtools . SNPs were called in parallel along ten chromosomes of the maize version 3 using SAMtools mpileup (-I -F 0.01) and bcftools call (-mv -Vindels -Ob).
SNP list generation
The view function of Bcftools was combined with in-house Python scripts to extract the content of bcf files and classify SNPs based on the number of reference and non-references alleles on every screened SNP locus. In detail, if the total number of reads covering a particular SNP in a particular sample was below 5, then the site was treated as missing data. When 99% reads on the locus of a sample were from the non-reference allele the sample was coded as homozygous non-reference allele. The same criteria were used for calling a site as homozygous reference allele. When the reads containing reference and non-reference alleles totaled more than 90% of all reads and each allele was represented by more than 20% of aligned reads the site was coded as heterozygous. If two or more alleles were present at >1% of aligned reads but the above criteria were not satisfied, the site was also coded as missing data. To reduce the prevalence of false SNPs resulting from the alignment of reads from multiple paralogous loci to a single position in the reference genome, sites which were scored as heterozygous in more than 20% of all genotyped individuals were discarded. In total, 13,360 SNPs were used in downstream analysis. For each of these SNPs, the impact of the SNP on gene function was estimated using SnpEff v4.1 and SnpEff databases (AGPv3.26) .
Population structure analysis
The distribution of the three possible genotypes (homozygous reference allele, homozygous non-referenece allele and heterozygous allele) over each of the ten chromosomes of maize was visualized using matplotlib. PhyML 3.0  was used to construct a phylogenetic tree with 100 bootstrap replicates, and 13,360 SNPs in total of 27 data sets. The maximum parsimony tree was constructed using Phylip-3.696  and the full set of 13,360 SNPs with missing data imputed by LinkImpute .
Expression bias test
Individual FPKM (Frequency per kilobase of exon per million reads) value for each gene in each data set was calculated using Cufflinks v2.2.1 . Expression values were averaged across all China and USA South samples (excluded USA 12 sample that contained a unique introgressed region) separately. Only genes with average FPKM values >= 10 in both groups were retained for testing expression bias. The remaining genes were sorted into two groups: genes located in the 7 chromosome intervals where USA South and China showed different haplotypes and genes outside these intervals. The median gene expression value on behalf of each group was used to be compared.
Origins of haplotype blocks
The origins of haplotype blocks observed in some B73 accessions but not in the published reference genome were investigated using data from diverse maize lines in the HapMap2 project . In order to make comparisons to these data, alignments and SNP calling were performed a second time as above using B73 RefGen v2. All of samples in China or USA North clade were combined to generate a consensus sets of SNP calls with reduced missing data. In examining region c2r2, sample USA 12 was used individually in addition to the combined China and USA North sequences (S1 Fig). In the analysis of region c5r2 (S1 Fig), USA 10, USA 14 and USA 15 were combined to generate a consensus set of SNP calls for the UC-Berkeley clade. The resulting SNP sets were employed for phylogenetic analysis as described above, with the alteration that the an approximate likelihood ratio test (aLRT) method with SH-like was employed. The resulting trees were visualized using FigTree v1.4.2 (http://tree.bio.ed.ac.uk/software/figtree/).
Relationship among accessions labeled as B73
After alignment, SNP calling, and filtering (see Methods), a total of 13,360 high confidence segregating SNPs were identified among the 27 RNA-seq samples labeled as B73 employed in this study, substantially lower than the ~64,000 high quality SNPs identified by RNA-seq in a population segregating for a single non-B73 haplotype . Phylogenetic analysis identified three distinct clades of samples separated by long branches with 100% bootstrap support (Fig 1; S2 Fig). One clade consisted entirely of Chinese samples, one clade of samples from US research groups from Minnesota and Wisconsin, and the final clade encompassed the majority samples from US research groups as well as all German samples and the published reference genome for B73. We designated these clades “China”, “USA North”, and “USA South” respectively. Notably, the USA North clade is paraphyletic with respect to the China clade, suggesting B73 samples in China are likely derived from this group while both German samples are clearly part of the USA South Clade.
(A) Distance-scaled branch lengths; (B) Unscaled tree. Only bootstrap values greater than or equal to 60 are displayed.
The USA South clade was somewhat arbitrarily divided into three subclades with at least 60% bootstrap support, as well as a number of singleton lineages (USA 1, USA 13, USA 19). Two of these clades contained control samples generated for this study, one from B73 seed requested through the USDA Germplasm Resource Network, and one from B73 seed requested from Iowa State. The subclade containing the known USDA B73 sample also contained the B73 reference genome sequence, consistent with the reported seed source for the B73 used in the construction of the reference genome. The final subclade did not contain any control samples. However, it was notable that four of the six samples placed in this clade originated in research groups whose PIs had conducted either PhD or Postdoctoral training with Michael Freeling at UC-Berkeley, and none of the samples outside of this clade originated in research groups linked to UC-Berkeley. Based on these, we designated the final USA South subclade “UC-Berkeley”. This accessions has also been described as “Freeling B73” . In addition, the three major clades were also recovered in a parallel analysis using a tree generated using maximum parsimony, however the three subclades within USA South subclades were not fully recovered with identical membership (S3B Fig). The consistency index (CI) and retention index (RI) for this tree was 0.825 and 0.861 respectively. Gene flow can product significant amounts of apparent homoplasy when constructing trees from multiple accessions of the same species. Therefore, these values were relatively higher than expected.
Genomic distribution of within-B73 polymorphisms
The polymorphic SNPs identified in this study could originate from one of several sources including de novo mutations or the introgression of non-B73 haplotypes in one or more lineages. SNPs originating from de novo mutations would be expected to show a distribution approximating that of gene density across the maize chromosomes. SNPs resulting from introgression of other haplotypes into B73 should be tightly clustered.
When the positions of the SNPs identified in this study were plotted it became clear that 55.3% SNPs identified in this study fall within a small number of dense genomic blocks on chromosomes 2, 4, 5, and 6 (Fig 2). The distribution of non-reference-genome-like haplotype blocks is consistent with the clade relationships identified above. The USA North clade can be defined by a large block of SNPs on chromosome 2, and smaller blocks on chromosomes 2 and 5, all of which are shared with the China B73 clade. In addition to the blocks shared with the USA North B73 clade, samples from the China B73 clade all share a number of additional non-reference-genome-like blocks on chromosomes 2, 4, and 6. There are no non-reference-genome-like blocks shared by all members of the USA South clade, however a single non-reference-genome-like block on chromosome 5 is shared by the UC-Berkeley subclade of USA South. This block appears to share one breakpoint but not both with a block present in the USA North and China samples. Based on the location of this block, it is likely the same divergent haplotype region identified between the B73 reference genome and the B73 sample used to construct HapMap1 . The large block non-reference-genome-like block like SNPs observed only on chromosome 2 on USA 12 can likely be explained by the unique origin of this sample from wild type siblings of knotted1 mutants backcrossed into B73 . The remaining USA South samples, including the USDA GRIN, Iowa State, and German samples do not contain any obvious SNP blocks.
Functional impact of within-B73 polymorphism
Because the data used here came entirely from RNA-seq studies, our ability to detect SNPs was limited to genes which were consistently expressed at high enough levels to provide coverage of target regions. A total of 25,644 genes were expressed at levels >10 FPKM when at least one of data sets analyzed in this study. Of these genes, 633 (2.5%) fell within regions with non-reference-genome-like SNP blocks in one or more B73 clades. Using SnpEff, we identified 10 cases where SNPs produced “high impact” change such as the gain or loss of a stop code or the alteration of a splice donor or splice acceptor site and 396 cases which produced missense mutations which altered protein sequence. Only three genes with reported mutant phenotypes (whp1, mop1, and gol1) were in these regions, which only constituted at 2.7% of 112 classical identified maize genes with reported mutant phenotypes . However, it must be noted that this is likely an underestimate of the true number of changes, nonsense mediated decay may reduce or eliminate the expression of alleles of genes containing high impact SNPs, reducing the chances these SNPs will be detected from RNA-seq data.
Impact of within-B73 polymorpism on estimated gene expression
Overall, limited correlation was observed between gene expression level and detected SNP density. The correlation coefficient r between SNP density (number of snps per 1000 bases of exon sequence) and median gene expression across all analyzed datasets was 0.018 and 0.211 for genes outside and inside of block regions respectively (S4 Fig). A previous study found that alignment rate for RNA-seq data from non-B73 genotypes to the B73 reference genome is approximately 13% lower than the alignment rate of RNA-seq data generated from B73 plants . To test whether the introgression of non-reference genome like blocks created a bias towards lower estimated expression of genes in those blocks, for each gene within a block, the the median gene expression value observed across all datasets containing the block was compared to the median gene expression value across datasets where the same genomic region matched the reference genome. The comparison of global patterns across large populations of genes controls for experiment specific changes in the regulation of individual genes. Genes within introgressed regions showed a 5.6% reduction on expression relative to a control set of genes outside introgressed regions in this comparison between B73 USA South and B73 China (see Methods). This reduction was approximately half as large as would be predicted if the reduced alignment rate of data from non-B73 samples resulted solely from the increased difficulty of aligning reads containing SNPs to the reference genome. Potentially, the other half of the reduced alignment rate for non-B73 samples is the result of reads originating from transcripts of lineage specific genes, as previously suggested .
Origins of polymorphic regions in B73 accessions
A total of 7 chromosome intervals (referred to here as c2r1, c2r2, c4r1, c5r1, c5r2, c6r1 and c6r2) containing non-reference genome haplotypes were identified in two or more samples (Table 2; S1 Fig). SNP calls were extracted from individual non-reference-genome-like blocks using the previous version of the maize reference genome (B73 RefGen v2) and compared to genotype calls generated from 103 diverse inbreds resequenced by the Maize HapMap2 project . One example, c2r1 is shown in Fig 3A. The non-reference genome haplotype present in this block for the Chinese samples clusters very closely with W22 (Fig 4), an older inbred developed in Wisconsin which has also been widely used in the maize genetics research community. Analysis of the other six large haplotype blocks produced longer branch lengths relative to the accessions represented in the Maize HapMap2 dataset (Table 2). However, in each case the haplotypes generated from each clade containing a non-reference-genome-like block clustered together, confirming that these regions did not result from parallel introgressions covering the same regions of the genome. Consensus SNP calls from the UC-Berkeley, USA North, and China B73 samples all clustered together with the HapMap2 B73 accession, but not with the B73 reference genome sequence (S8 Fig) which suggests that the source of B73 seed used for HapMap2—like HapMap 1 —likely belonged to the UC-Berkeley subclade. Constraining the c2r2 region to only cover that portion of the genome which contained a block of SNPs in the USA North clade, the China clade and sample USA 12 revealed that USA North and China clustered together while USA 12 was placed at a different location on the tree (S5 Fig). Interestingly, the only separation case of B73 RefGen and B73 HapMap2 in the origin tree of c5r2 indicated B73 seed in HapMap2 came from the UC-Berkeley sub-clade (S8 Fig). In addition, for two cases, c2r1 and c5r2, we validated our haplotype assignments using an orthogonal analytical method, kmeans analysis. SNP data was first imputed using Linkimpute , and then grouped into two clusters using kmeans function in R with k = 2 (S1 Table). For c2r1, the analysis was entirely consistent with the results presented above with samples classified as China placed in one cluster with W22 and samples classified as USA North and South placed in the other. For c5r2, as expected all samples classified as China, USA North, and UC-Berkeley subclade were placed in a cluster with the B73 sample resequenced by the Hapmap2 project. In addition, one sample classified as USA South (USA 19) was placed in this cluster. Manual reexamination determined that USA 19 was heterozygous from the c5r2 SNP block (Fig 2).
(A) Haplotype region c2r1 on Chromosome 2; (B) Haplotype region c2r2 on Chromosome 2; (C) Haplotype region c5r2 on Chromosome 5. Non-reference-like homozygous genotypes are indicated in blue and heterozygous genotypes in red. Named haplotype regions are those between the green bars.
The maize community has long speculated that significant differences exist among B73 from different sources. Several previous studies have confirmed that genetic variation exists between different sources of the same maize inbreds   , yet due to constraints of cost and seed avaliability these comparisons were genrally able to compare only a small subset of potential seed sources for a given inbred. The avaliability of previously published RNA-seq data sets from a large number of independent research groups has made it possible to conduct a broad survey of the diversity among B73 accessions. No cases of samples which were labeled as originated from B73 but were clearly not B73 based on SNP data were identified in this study. Despite a 40+ generation reproductive history distributed across at least three continents, this analysis shows that 97.7% of the gene space of the maize genome is represented by a single consistent haplotype across all B73 accessions represented here. This compares favorably to approximately 20% of the genome showing multiple haplotypes in a single seed batch of the reference genotype of arabidopsis Columbia-0 . One potential explanation is that maize geneticists, always aware of the significant risk of pollen contamination, have had to be alert for signs of hybrid vigor or unexpected phenotypes when propogating inbred lines .
In soybean, the published reference genome was found to consist of a mosaic of sequences observed in two separate sources of the reference variety and likely is not representative of the haplotype present in any individual plant . In maize, a number of samples classified into the USDA GRIN subclade (Fig 1) are largely consistent with the reference genome suggesting that the maize reference genome sequence likely is representative of a specific plant.
The interspersed SNPs distributed over ten chromosomes of maize may result from de novo mutations, segregation of heterozygous loci in the original B73 founder accession , or false positive SNP calling errors. However, the majority of polymorphisms identified among B73 samples in this study primarily fell into a small number of dense non-reference-genome-like blocks, consistent with introgression of non-B73 germplasm into a B73 background. It is important to note that the B73 reference genome was sequenced relatively recently compared to the total age of the B73 accession. Therefore, it is not possible to infer whether a given non-reference-genome-like block originated from introgression into the line in which the non-reference-genome SNPs are observed or introgression into the B73 lineage which was ultimately employed in the creation of the B73 reference genome. However, in either case the relatively small size of these non-reference genome like blocks suggests multiple generations of backcrossing to the original B73 line, which would not be consistent with an origin as unrecognized pollen contamination.
Instead we propose a model based on the results from Sample USA 12. USA 12 consists of homozygous wild-type plants selected from family segregating for the Knotted1 . Therefore the block on chromosome 2 (~1% of the total maize genome) likely represents residual sequence from the knotted1 mutant donor parent line and is consistent with at least 5 generations of backcrossing (expected contribution of the donor parent = ~1.56%). Similar accidental fixations of unlinked regions may have occurred during the intentional introgression of other traits into a B73 background, such as disease resistance genes .
The monophyletic placement of Chinese B73 datasets suggests that the B73 seed available in China likely originated from a single transfer from the USA, apparently of seed belonging to the USA North clade and is an indicator of current tight controls on seed import/export which limit the ease with which seed change be exchanged between collaborators in China and the United States. Samples from Germany did not consistently form a monophyletic group. The concordance of academic lineages and genomic relationships in the UC Berkeley subclade acts as a notable positive control. More extensive sampling of B73 samples from many labs which employ this genotype in maize genetics research but have not, to date, published RNA-seq datasets may identify further B73 clades and subclades and additional cases where specific genomic variations have dispersed across the country as graduate students and postdocs leave a given lab for faculty positions of their own.
The existence of genomic variation among samples labeled as belonging to the same accession creates barriers to reproducibility, one of the core requirements of the scientific method . In this study no examples of sample mislabeling were identified, however the possibility of ascertainment bias, with samples mislabeled as B73 being identified prior to publication must be aknowledged. A number of non-reference-genome-like blocks were identified in B73 samples originating from some sources. These blocks were shown to contain missense and nonsense mutations and measurably lower estimated expression values for genes in these regions. The identification of the relationships among different variants of B73 and the genomic locatons of non-reference-genome-like regions will allow these differences to be controlled for future studies. With the rapid rise of sequencing-based assays such as RNA-seq, the strategy employed here may be a good one to apply in any case where one or more reference genotypes are widely employed in research across institutions, countries, and continents.
S1 Fig. The 7 named haplotype blocks and SNP distribution pattern across all 10 chromosomes of maize for each of the 27 data sets.
(A) The maximum likelihood phylogenetic tree of 27 data sets by imputed SNP set; (B) One most parsimonious phylogenetic tree of 27 data sets by imputed SNP set.
S4 Fig. The scatterplot comparing correlation between SNP density and median gene expression for genes inside and outside of identified block regions.
S5 Fig. The phylogenetic tree of c2r2 haplotype region compared to corresponding lines in HapmapV2.
S6 Fig. The phylogenetic tree of c4r1 haplotype region compared to corresponding lines in HapmapV2.
S7 Fig. The phylogenetic tree of c5r1 haplotype region compared to corresponding lines in HapmapV2.
S8 Fig. The phylogenetic tree of c5r2 haplotype region compared to corresponding lines in HapmapV2.
S9 Fig. The phylogenetic tree of c6r1 haplotype region compared to corresponding lines in HapmapV2.
S10 Fig. The phylogenetic tree of c6r2 haplotype region compared to corresponding lines in HapmapV2.
The authors thank Yang Zhang for help in generating the B73 control RNA-seq datasets.
Conceived and designed the experiments: JCS ZL. Performed the experiments: ZL. Analyzed the data: ZL. Wrote the paper: JCS ZL.
- 1. Gethi JG, Labate JA, Lamkey KR, Smith ME, Kresovich S. SSR variation in important US maize inbred lines. Crop Science. 2002;42(3):951–957.
- 2. Olmos SE, Lorea R, Eyhérabide GH. Genetic variability within accessions of the B73 maize inbred line. Maydica. 2014;59:298–305.
- 3. MacLeod RA, Dirks WG, Matsuo Y, Kaufmann M, Milch H, Drexler HG. Widespread intraspecies cross-contamination of human tumor cell lines arising at source. International Journal of Cancer. 1999;83(4):555–563. pmid:10508494
- 4. Lucey BP, Nelson-Rees WA, Hutchins GM. Henrietta Lacks, HeLa cells, and cell culture contamination. Archives of Pathology & Laboratory Medicine. 2009;133(9):1463–1467.
- 5. Enders TA, Oh S, Yang Z, Montgomery BL, Strader LC. Genome Sequencing of Arabidopsis abp1-5 Reveals Second-Site Mutations That May Affect Phenotypes. The Plant Cell. 2015;27(7):1820–1826. pmid:26106149
- 6. Haun WJ, Hyten DL, Xu WW, Gerhardt DJ, Albert TJ, Richmond T, et al. The composition and origins of genomic variation among individuals of the soybean reference cultivar Williams 82. Plant Physiology. 2011;155(2):645–655. pmid:21115807
- 7. Shao MR, Shedge V, Kundariya H, Lehle FR, Mackenzie SA. Ws-2 Introgression in a Proportion of Arabidopsis thaliana Col-0 Stock Seed Produces Specific Phenotypes and Highlights the Importance of Routine Genetic Verification. The Plant Cell. 2016;28(3):603–605. pmid:26979070
- 8. Weigel D, Bergelson J, Buckler ES, Ecker JR, Nordborg M. A proposal regarding best practices for validating the identity of genetic stocks and the effects of genetic variants. The Plant Cell. 2016;28(3):606–609. pmid:26956491
- 9. Russell W. Registration of B70 and B73 Parental Lines of Maize1 (Reg. Nos. PL16 and PL17). Crop Science. 1972;12(5):721–721.
- 10. Darrah L, Zuber M. 1985 United States farm maize germplasm base and commercial breeding strategies. Crop Science. 1986;26(6):1109–1113.
- 11. Mikel MA. Genetic diversity and improvement of contemporary proprietary North American dent corn. Crop Science. 2008;48(5):1686–1695.
- 12. Schnable PS, Ware D, Fulton RS, Stein JC, Wei F, Pasternak S, et al. The B73 maize genome: complexity, diversity, and dynamics. Science. 2009;326(5956):1112–1115. pmid:19965430
- 13. Eichten SR, Briskine R, Song J, Li Q, Swanson-Wagner R, Hermanson PJ, et al. Epigenetic and genetic influences on DNA methylation variation in maize populations. The Plant Cell Online. 2013;25(8):2783–2797.
- 14. Makarevitch I, Waters AJ, West PT, Stitzer M, Hirsch CN, Ross-Ibarra J, et al. Transposable elements contribute to activation of maize genes in response to abiotic stress. PLoS Genetics. 2015;11(1):e1004915. pmid:25569788
- 15. Sekhon RS, Briskine R, Hirsch CN, Myers CL, Springer NM, Buell CR, et al. Maize gene atlas developed by RNA sequencing and comparative evaluation of transcriptomes based on RNA sequencing and microarrays. PLoS One. 2013;8(4):e61005. pmid:23637782
- 16. Martin JA, Johnson NV, Gross SM, Schnable J, Meng X, Wang M, et al. A near complete snapshot of the Zea mays seedling transcriptome revealed from ultra-deep sequencing. Scientific Reports. 2014;4.
- 17. Stelpflug SC, Sekhon RS, Vaillancourt B, Hirsch CN, Buell CR, de Leon N, et al. An Expanded Maize Gene Expression Atlas based on RNA Sequencing and its Use to Explore Root Development. The Plant Genome. 2015.
- 18. Paschold A, Jia Y, Marcon C, Lund S, Larson NB, Yeh CT, et al. Complementation contributes to transcriptome complexity in maize (Zea mays L.) hybrids relative to their inbred parents. Genome Research. 2012;22(12):2445–2454. pmid:23086286
- 19. Li P, Ponnala L, Gandotra N, Wang L, Si Y, Tausta SL, et al. The developmental dynamics of the maize leaf transcriptome. Nature Genetics. 2010;42(12):1060–1067. pmid:21037569
- 20. Wang L, Czedik-Eysenberg A, Mertz RA, Si Y, Tohge T, Nunes-Nesi A, et al. Comparative analyses of C4 and C3 photosynthesis in developing leaves of maize and rice. Nature Biotechnology. 2014;32(11):1158–1165. pmid:25306245
- 21. Campbell MT, Proctor CA, Dou Y, Schmitz AJ, Phansak P, Kruger GR, et al. Genetic and Molecular Characterization of Submergence Response Identifies Subtol6 as a Major Submergence Tolerance Locus in Maize. PloS One. 2015;10(3):e0120385. pmid:25806518
- 22. Yi G, Neelakandan AK, Gontarek BC, Vollbrecht E, Becraft PW. The naked endosperm Genes Encode Duplicate INDETERMINATE Domain Transcription Factors Required for Maize Endosperm Cell Patterning and Differentiation. Plant Physiology. 2015;167(2):443–456. pmid:25552497
- 23. Bolduc N, Yilmaz A, Mejia-Guerra MK, Morohashi K, O’Connor D, Grotewold E, et al. Unraveling the KNOTTED1 regulatory network in maize meristems. Genes & Development. 2012;26(15):1685–1690.
- 24. Lemmon ZH, Bukowski R, Sun Q, Doebley JF. The role of cis regulatory evolution in maize domestication. PLoS Genetics. 2014;10(11):e1004745. pmid:25375861
- 25. Frank MH, Edwards MB, Schultz ER, McKain MR, Fei Z, Sørensen I, et al. Dissecting the molecular signatures of apical cell-type shoot meristems from two ancient land plant lineages. New Phytologist. 2015;207(3):893–904. pmid:25900772
- 26. Thatcher SR, Zhou W, Leonard A, Wang BB, Beatty M, Zastrow-Hayes G, et al. Genome-wide analysis of alternative splicing in Zea mays: landscape and genetic regulation. The Plant Cell. 2014;26(9):3472–3487. pmid:25248552
- 27. He G, Chen B, Wang X, Li X, Li J, He H, et al. Conservation and divergence of transcriptomic and epigenomic variation in maize hybrids. Genome Biology. 2013;14(6):R57. pmid:23758703
- 28. Regulski M, Lu Z, Kendall J, Donoghue MT, Reinders J, Llaca V, et al. The maize methylome influences mRNA splice sites and reveals widespread paramutation-like switches guided by small RNA. Genome Research. 2013;23(10):1651–1662. pmid:23739895
- 29. Kakumanu A, Ambavaram MM, Klumas C, Krishnan A, Batlang U, Myers E, et al. Effects of drought on gene expression in maize reproductive and leaf meristem tissue revealed by RNA-Seq. Plant Physiology. 2012;160(2):846–867. pmid:22837360
- 30. Eveland AL, Goldshmidt A, Pautler M, Morohashi K, Liseron-Monfils C, Lewis MW, et al. Regulatory modules controlling maize inflorescence architecture. Genome Research. 2014;24(3):431–443. pmid:24307553
- 31. Chettoor AM, Givan SA, Cole RA, Coker CT, Unger-Wallace E, Vejlupkova Z, et al. Discovery of novel transcripts and gametophytic functions via RNA-seq analysis of maize gametophytic transcriptomes. Genome Biology. 2014;15(414):10–1186.
- 32. Zhang M, Xie S, Dong X, Zhao X, Zeng B, Chen J, et al. Genome-wide high resolution parental-specific DNA and histone methylation maps uncover patterns of imprinting regulation in maize. Genome Research. 2014;24(1):167–176. pmid:24131563
- 33. Chen J, Zeng B, Zhang M, Xie S, Wang G, Hauck A, et al. Dynamic transcriptome landscape of maize embryo and endosperm development. Plant Physiology. 2014;166(1):252–264. pmid:25037214
- 34. Fu J, Cheng Y, Linghu J, Yang X, Kang L, Zhang Z, et al. RNA sequencing reveals the complex regulatory network in the maize kernel. Nature Communications. 2013;4.
- 35. Urbany C, Benke A, Marsian J, Huettel B, Reinhardt R, Stich B. Ups and downs of a transcriptional landscape shape iron deficiency associated chlorosis of the maize inbreds B73 and Mo17. BMC Plant Biology. 2013;13(1):213. pmid:24330725
- 36. Opitz N, Paschold A, Marcon C, Malik WA, Lanz C, Piepho HP, et al. Transcriptomic complexity in young maize primary roots in response to low water potentials. BMC Genomics. 2014;15(1):741. pmid:25174417
- 37. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;btu170. pmid:24695404
- 38. Wu TD, Nacu S. Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics. 2010;26(7):873–881. pmid:20147302
- 39. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25(16):2078–2079. pmid:19505943
- 40. Cingolani P, Platts A, Coon M, Nguyen T, Wang L, Land SJ, et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly. 2012;6(2):80–92. pmid:22728672
- 41. Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Systematic Biology. 2010;59(3):307–321. pmid:20525638
- 42. Felsenstein J. PHYLIP (Phylogeny Inference Package) version 3.6 [computer program]. Available: http://evolution.genetics.washington.edu/phylip.html. Accessed 31 January 2006.
- 43. Money D, Gardner K, Migicovsky Z, Schwaninger H, Zhong GY, Myles S. LinkImpute: Fast and Accurate Genotype Imputation for Nonmodel Organisms. G3: Genes| Genomes| Genetics. 2015;5(11):2383–2390. pmid:26377960
- 44. Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nature Protocols. 2012;7(3):562–578. pmid:22383036
- 45. Chia JM, Song C, Bradbury PJ, Costich D, de Leon N, Doebley J, et al. Maize HapMap2 identifies extant variation from a genome in flux. Nature Genetics. 2012;44(7):803–807. pmid:22660545
- 46. Liu S, Yeh CT, Tang HM, Nettleton D, Schnable PS. Gene mapping via bulked segregant RNA-Seq (BSR-Seq). PLoS One. 2012;7(5):e36406. pmid:22586469
- 47. Johnston R, Wang M, Sun Q, Sylvester AW, Hake S, Scanlon MJ. Transcriptomic analyses indicate that maize ligule development recapitulates gene expression patterns that occur during lateral organ initiation. The Plant Cell. 2014;26(12):4718–4732. pmid:25516601
- 48. Gore MA, Chia JM, Elshire RJ, Sun Q, Ersoz ES, Hurwitz BL, et al. A first-generation haplotype map of maize. Science. 2009;326(5956):1115–1117. pmid:19965431
- 49. Vollbrecht E, Reiser L, Hake S. Shoot meristem size is dependent on inbred background and presence of the maize homeobox gene, knotted1. Development. 2000;127(14):3161–3172. pmid:10862752
- 50. Schnable JC, Freeling M. Genes identified by visible mutant phenotypes show increased bias toward one of two subgenomes of maize. PloS One. 2011;6(3):e17855. pmid:21423772
- 51. Hirsch CN, Foerster JM, Johnson JM, Sekhon RS, Muttoni G, Vaillancourt B, et al. Insights into the maize pan-genome and pan-transcriptome. The Plant Cell Online. 2014;26(1):121–135.
- 52. Romay MC, Millard MJ, Glaubitz JC, Peiffer JA, Swarts KL, Casstevens TM, et al. Comprehensive genotyping of the USA national maize inbred seed bank. Genome Biology. 2013;14(6):R55. pmid:23759205
- 53. Lipps P, Pratt R, Hakiza J. Interaction of Ht and partial resistance to Exserohilum turcicum in maize. Plant Disease. 1997;81(3):277–282.