Single-Kernel Ionomic Profiles Are Highly Heritable Indicators of Genetic and Environmental Influences on Elemental Accumulation in Maize Grain (Zea mays)

The ionome, or elemental profile, of a maize kernel can be viewed in at least two distinct ways. First, the collection of elements within the kernel are food and feed for people and animals. Second, the ionome of the kernel represents a developmental end point that can summarize the life history of a plant, combining genetic programs and environmental interactions. We assert that single-kernel-based phenotyping of the ionome is an effective method of analysis, as it represents a reasonable compromise between precision, efficiency, and power. Here, we evaluate potential pitfalls of this sampling strategy using several field-grown maize sample sets. We demonstrate that there is enough genetically determined diversity in accumulation of many of the elements assayed to overcome potential artifacts. Further, we demonstrate that environmental signals are detectable through their influence on the kernel ionome. We conclude that using single kernels as the sampling unit is a valid approach for understanding genetic and environmental effects on the maize kernel ionome.


Introduction
Increasing agricultural sustainability requires improvements in nutrient use efficiency while decreasing fertilizer inputs. These requirements exist as the majority of arable soils have limitations associated with them [1]. Only 16% of crop lands are ''without constraint,'' and most of these constraints are related to elements found in inadequate or excessive amounts [1]. The range of soil elemental concentrations optimal for productive growth of crops is much smaller than that of the wild plants they may have displaced. This is likely due to human selection of crop plants for yield under optimal agricultural conditions and not for adaptive mineral nutrient efficiency on poor soil. Non-optimal concentrations of many elements limit the productivity of crops or necessitate significant inputs to maintain productivity. Major elemental limitations include excessive Na [2]; insufficient N [3], P [4], and K [5]; acid soil syndrome, which causes Al, Mn, and Fe toxicity and Mo, Ca, and P deficiency [6]; and Fe deficiency in alkaline soils [7]. Due to low soil fertility and the effects of poverty (e.g., inability to buy fertilizer), crop yields in most of Africa are less than one-fifth of U.S. yields [8]. In order to meet future food needs, we will need to increase yields while increasing the sustainability of agricultural systems in both developed and developing countries. In order to develop crops that can grow in diverse soils with less fertilizer, we require a deeper understanding of the genes that allow plants to adapt to different soil environments [9].
The elemental composition of a cell, tissue, or organism is referred to as the ionome [10]. The ionome can be profiled using high-throughput, high-accuracy analytical chemistry such as inductively coupled plasma-mass spectrometry (ICP-MS), which can measure the concentrations of 20 elements over 5 logs in ,2 minutes per sample. To apply this systems biology phenotyping platform most efficiently, the best tissue for estimating the ionome of a crop plant must be used [9,11,12]. We assert that mature seeds are the ideal tissue when resources are limited, as mature seeds represent a well-defined developmental end point that summarizes the life history and genetic composition of a particular individual. Seeds are also highly stable and are easy to store, transport, and handle. Furthermore, seeds are feedstocks for people, animals, and industrial processes such that the seed ionome alone is of high value and represents an excellent proxy for a whole plant.
In an ideal world, a survey of tissues could be used to track the ionome through developmental time. However, genetic and environmental determinants make this approach difficult to implement on large populations of field-grown plants, as diverse varieties may progress through development at different rates that may be more or less influenced by daily weather or other environmental factors. Compromises are required to ensure the success of a particular research program, especially one that aims to identify genes that are effective over a range of environments rather than emphasizing a single one. We propose that the analysis of single seeds is the most efficient use of resources to characterize a highly relevant ionome for field-grown crops. Intact seeds are at reduced risk for contamination or preparative artifacts due to sample grinding and are an easily automated sample unit, and the overall reduced cost of preparation and analysis make this the best compromise of efficiency, relevance, and precision. This scheme is not without obvious potential problems, however, not the least of which is heterogeneity between seeds produced by the same plant or related plants within an experimental plot that may shape the estimation of the ionome through single-seed-based observation.
In the present study, we test the premise of confounding heterogeneity to better understand the sources of variance that contribute to the seed ionome using maize kernels. We assert that single-seed-based analysis is a reasonable strategy for phenotypic analysis, especially when resources are limited and considerations of the number of test environments are balanced against accuracy within any single environment. The ionomic profiling workflow described in this study for maize kernels takes advantage of automation for sample handling, weighing, and liquid dispensing to reduce operator time, effort, and overall cost. This optimization allows 576 kernels to be analyzed from start to finish in 3 days. We demonstrate the utility of our workflow as an effective means of collecting ionomic data relevant to increasing agricultural sustainability.

Experimental Design and Analysis
The ionomics pipeline starts with arraying the single kernels in 48 well plates that are then loaded onto the custom-built weighing robot. Each kernel is weighed and deposited into a glass digestion tube. The samples are digested down to the elemental components using heat and acid. The digested material is analyzed by ICP-MS to quantitatively measure the concentrations of 20 elements with high precision. Both internal and external standards are used to correct for instrument drift during and between experiments.
The 26 parents of the maize nested association mapping (NAM) panel (hereafter, the NAM Founders) [13] were grown in a randomized complete block design, in 2010 (six blocks) and 2011 (five blocks), at the Purdue Agronomy Center for Research Education. Kernels from the middle of 1-2 cobs per plot were removed from the cob by hand and analyzed using the ionomics pipeline. For each plot (i.e., each genotype within a block), four to six seeds were analyzed. Outliers within each plot were removed using a conservative cutoff of 15 median absolute deviations from the plot median (derived from [14]) on an element by element basis. To be even more conservative, we used the mean of all the samples in each plot and omitted the one popcorn and two sweet corn entries in the NAM Founders in order to prevent artifacts from low seed weights from biasing the analysis. To analyze the sources of variance, a simple linear model with line and year effects was used to demonstrate that all 20 elements had a significant effect of genotype (p,0.01 with a Bonferroni correction; Table 1). Most elements also had significant effects of year as well as line x year interactions. There were four potentially problematic elements characterized by low heritability and or low correlation between seasons: B, Na, Al, and Se. This result was not unexpected, as all are in low abundance in the kernel and have several analytical interferences (see discussion). Narrow sense heritability estimates within a single season were quite high, with 17 and 19 elements having heritabilities .0.5 in each year respectively. Heritability estimates decreased when both years were considered in a single model, likely due to the significant line x year interactions. Twelve elements had a two-year heritability estimate above 0.5. For 16 elements there was a statistically significant (p,0.01) correlation between the two years (see examples in Figure 1).
In these ionomic studies, we wished to emphasize the potential to detect QTLs in plants grown across a larger number of different environments. This requires a compromise between population size, number of field locations, and sampling depth for any one accession. To estimate if a single kernel per cob would be a feasible approach to analyzing large populations across multiple environments, we created 100 datasets by randomly sampling one of the four to six analyzed samples from each plot, a process known as jackknifing. Since the outlier removal that we performed before the first analysis was based on the distribution of samples in a given plot, we used the original (without correction for outliers) dataset for this simulation. The distribution of jackknifed heritabilities at the 5% value over all data (i.e., 95% of the single-seed dataset heritabilities were greater than these values) was lower than those calculated for means, but 7 elements had heritability .0.45 and 11 were .0.35 (Table 1). We found this a reasonable compromise, in order to detect potential QTLs from a larger number of environments at different locations rather than focusing all of our attention on a single location.

Potential Confounding effects
To evaluate potential confounding effects that would limit our ability to detect genetic, environment, or gene by environment effects, we examined several potential sources of variance.
Outcrossing. Open pollinated ears are likely to have some kernels fertilized by pollen from nearby rows. If the paternal genotype has a significant effect on the kernel ionome, this genetic contamination may confound the genetic signal from the maternal plant. While we did not specifically test for this effect in these experiments, the random block design of the NAM parent experiment should have randomized the genotypes of the contaminating pollen and therefore contributed to the unexplained variance detected in that experiment. The heritabilities observed in that experiment (Table 1) suggest that this potential contamination will not prevent the detection of genetic effects, at least at the plot spacing used in these experiments (76 cm).
Cob location. Given the differences in seed filling along the ear, elemental accumulation gradients could exist from the base to the tip of the ear. To determine the prevalence and magnitude of these gradients, we analyzed eight kernels from the base, middle, and tip of three independent ears from plants grown at four different agricultural research stations in Missouri, Texas, and Iowa. ANOVA indicated a significant effect (p,0.01 with Bonferroni correction) of position within the cob for seven elements (Table 2 and Figure 2). However, the location of the experimental fields accounted for much more of the variance in all elements except Na, S, Ca, and Zn. Additionally, the dynamic range of elemental accumulation (defined by the ratio of maximum accumulation for a cob location to minimum accumulation for a cob location) was well below the dynamic range of the observations from the two years of NAM Founders in Indiana. For  example, while the highest and lowest cob locations differed by an average of 49% for Cu, the highest Cu accumulating line was over 500% higher than the lowest line in both 2010 and 2011. These data suggest that the variation due to cob location will not prevent identification of genetic environmental effects for most elements. Seed composition. Elements are not distributed uniformly throughout the seeds of grain crops: some elements are concentrated in the embryo (e.g., Fe and Zn), while others (Ca and S) have substantial accumulation in the maternal tissues of the pericarp and endosperm [15][16][17][18]. Large differences in amounts of these compartments, such as differences in the organic composition of the seed, could change the total elemental composition of the kernel. The sugary locus is a key step in starch biosynthesis, with mutations at sugary creating the sweet corn many people eat as a fresh vegetable. Furthermore, mutant kernels also have higher protein levels than wild-type siblings [19]. As sugary kernels mature and dehydrate, they shrink, losing more stored carbon than Su+ seeds, and can potentially lose water-soluble ions [20]. To determine the effect of sugary on the elemental composition, we identified seven RI lines from the dent x sweet, B73 x IL14H NAM subpopulation where sugary was still segregating. We separated the seeds from single plots based on the kernel morphology (collapsed su/su seeds from plump Su/+ seeds) and analyzed their elemental content (Table 3 and Figure 3). For nine of the elements, there was a significant effect of the sugary locus, and for all but Fe and As, a significant effect of line could also be detected. We also analyzed a single seed per entry from this population. With this limited sampling, we were able to identify 31 QTLs (Table 4), including 8 strong QTLs with LOD scores .5 (highest 95% permutation threshold for any element was 2.8). We identified QTLs for P, K, Ni, and Cu that localized with sugary on Chr 4. The presence of sugary did not, however, prevent us from identifying an additional 3, 2, and 1 QTLs for Cu, Ni, and P, respectively, demonstrating that the gross change in kernel composition did not obscure all other genetic signals.

Discussion
A defined developmental stage that is stable at room temperature, the mature kernel has many inherent advantages as a sample tissue. Single kernels are discrete, relatively small samples that can be easily manipulated by technicians or robots. Singlekernel-based analyses also avoid mechanical disruption (e.g., grinding or milling) that are labor intensive and/or introduce the possibility of contamination from the tools used. With these experiments, we have demonstrated that single-seed measurements of elemental profiles can be used to study the genetic and environmental effects on seed elemental accumulation.   Developing the ionomics pipeline around single-kernel sampling has enabled the processing of 1728 samples per week at the USDA-ARS/Danforth Center profiling facility. Multiple kernels can be profiled with less cost and effort than would be required to pool, disrupt and homogenize, subsample, digest, and run as a single composite sample. This provides several advantages from an analytical and statistical perspective. With multiple measurements from a given packet, outliers resulting from the analytical process or machine error can be discarded, as we did here. The measurements can be averaged to a single value or included as nested factors in a statistical model.
We investigated three potential confounding effects: outcrossing, relative position of the seed within the ear, and the composition of the organic components of the seeds. While we were able to detect statistically significant effects for the latter two factors for subsets of elements, neither one of them was large enough to obscure the signal from genetic or environmental factors.
Since we did not explicitly test for outcrossing, instead conducting an experiment, where it could occur, that tested whether the ionomic traits were still heritable, we cannot estimate directly the magnitude of the effects in this study. However, the high heritabilities observed in these experiments suggest that the effects are small. Of the range of processes that can contribute to the seed ionome, root uptake, partitioning, shoot leaf transport, leaf partitioning, leaf remobilization, leaf seed transport, and seed loading are all exclusively maternal, while embryo uptake has contributions from both maternal and paternal alleles. When viewed from this perspective, it seems reasonable that paternal effects are a minor contribution to the ionome. However, we cannot exclude the possibility that there are loci that will have a strong paternal effect.
The high heritability observed in jackknifed datasets of a single seed per plot from the NAM Founders, without the benefit of outlier removal, demonstrates that the genetic variation for many  elemental traits in maize is large enough that most confounding factors will not interfere with analysis. However, if the labor and resources are available, controlled pollination experiments and harvesting only the center of the cob will reduce the variability from pollination and cob location. Analysis of the organic composition of the kernels through NIR spectroscopy or other methods could allow for differences in protein, oil, and starch content to be included as cofactors in statistical models. The appropriate number of seeds for analysis will depend on the experimental design and expected frequencies of functionally distinct alleles. For quantitative genetic experiments such as recombinant inbred populations with several hundred lines (or thousands. in the case of the NAM), where a given allele may occur over a hundred times, a single seed per line may suffice to identify significant QTLs. From a technical standpoint, more observations will decrease the associated error but be biologically unnecessary to find common alleles, and may reduce the number of lines examined if you have fixed capacity for analysis. In those cases where rare alleles have large phenotypic effects, such as in an association mapping population or when a new mutation is being tested, more seeds will be required to increase the confidence of detection [21]. While we were able to identify significant QTLs with a single seed in the B73 x IL14H population, our standard practice when analyzing a single RIL population is to profile two to four seeds per plot. When analyzing extremely large populations such as the Maize NAM, which has .6000 plots in the standard experimental design, profiling one or two seeds from most lines allows for the analysis of the population in more locations [13,22]. This compromise in technical accuracy of the ionomic phenotype versus increasing the number of environments is a highly reasonable one to make, given that resources are almost always limiting. There are, of course several drawbacks and limitations to the approach we have taken. By opting for high throughput while analyzing 20 elements simultaneously, we do lose some sensitivity compared to lower throughput efforts focused on fewer elements. Examples of this include using standard glass (sodium borosilicate) tubes for chemical digestion, which increases the background levels of Na and B; not using a dynamic reaction cell on the ICP-MS, which leads to isobaric interferences for Se, As, and Al; and the use of a single sample dilution factor to measure elements with concentrations ranging from several hundred parts per million to low parts per billion.
This speed/number of elements/accuracy trade-off, combined with the low abundance of some of the elements in maize kernels, leads to several elements being at or below the level of detection in many or most of the samples analyzed. We have considered omitting these samples from the analysis pipeline entirely, but currently include them for occasions where a given mutant line or diverse accession accumulates sufficient qualities to exceed the detection threshold, such as for two of the NAM parents for Cd ( Figure 1).
The single-seed analysis approach has the potential to create two major artifacts. One is that differences in weights can cause perceived differences in accumulation in elements at or below detection limits when the signal from the instrument is normalized to weight. The second is that some elements may be restricted to a specific subcompartment of the seed, such as the embryo, so alterations in the other compartments may not affect the amount of the element, even though they would affect the weight. However, single-seed analysis, with the weight recorded and included in every dataset, allows for the dataset to be analyzed from an amount per seed perspective, which would alleviate these issues (although it may create artifacts in itself). We have opted to present the data as a concentration in the seed because, for most elements, there appears to be a correlation with weight and total elemental concentration. But depending on the particular question, a hybrid approach may be more appropriate. To ensure that these approaches are possible for other scientists, our data, including kernel weight, are all available from www.ionomicshub.org through the data exchange portal.

Conclusions
Here we have shown that single-seed-based ionomics is a robust and information-rich approach to identifying genetic and environmental determinants of the elemental profile of maize kernels. The limitations and potential artifacts are not large enough to prevent detection of statistically significant effects and represent a reasonable compromise to increase the scope and efficiency of investigation over absolute technical accuracy.  Table 5.

Determination of Elemental Concentration by ICP-MS Analysis
The analysis methods used are almost identical to those described for soybeans in Ziegler et al. [23]. As the precise methods of the pipeline are essential and integral to this manuscript, we have included identical or near-identical descriptions in this section. Identical passages are denoted by quotation marks.
Sample preparation and digestion. The seeds for the B73 x IL14H and B73 cob location experiments were weighed by hand. For the ''Seeds were sorted into 48-well tissue culture plates, one seed per well. A weight for each individual seed was determined using a custom built weighing robot. The weighing robot holds six 48-well plates and maneuvers each well of the plates over a hole which opens onto a 3-place balance. After recording the weight, each seed was deposited using pressurized air into a 166110 mm borosilicate glass test tube for digestion. The weighing robot can automatically weigh 288 seeds in approximately 1.5 hours with little user intervention.'' ''Seeds were digested in 2.5 mL concentrated nitric acid (AR Select Grade, VWR) with internal standard added (20 ppb In, BDH Aristar Plus). Seeds were soaked at room temperature overnight, then heated to 105uC for two hours. After cooling, the samples were diluted to 10 mL using ultrapure 18.2 MV water (UPW) from a Milli-Q system (Millipore). Samples were stirred with a custom-built stirring rod assembly, which uses plastic stirring rods to stir 60 test tubes at a time. Between uses, the stirring rod assembly was soaked in a 10% HNO 3 solution. A second dilution of 0.9 mL of the 1st dilution and 4.1 mL UPW was prepared in a second set of test tubes. After stirring, 1.2 mL of the second dilution was loaded into 96 well autosampler trays.'' ICP-MS Analysis. For the NAM parent experiments, elemental ''concentrations of B, Na, Mg, Al, P, S, K, Ca, Mn, Fe, Co, Ni, Cu, Zn, As, Se, Rb, Mo, and Cd were measured using an Elan 6000 DRC-e mass spectrometer (Perkin-Elmer SCIEX) connected to a PFA microflow nebulizer (Elemental Scientific) and Apex HF desolvator (Elemental Scientific). Samples were introduced using a SC-FAST sample introduction system and SC4-DX autosampler (Elemental Scientific) that holds six 96-well trays (576 samples). '' The other experiments were run without the FAST sample introduction system and with a standard Apex desolvator (Elemental Scientific).
''All elements were measured with DRC collision mode off. Before each run, the lens voltage and nebulizer gas flow rate of the ICP-MS were optimized for maximum Indium signal intensity (.25,000 counts per second) while also maintaining low CeO+/ Ce+ (,0.008) and Ba++/Ba+ (,0.1) ratios. This ensures a strong signal while also reducing the interferences caused by polyatomic and double-charged species. Before each run a calibration curve was obtained by analyzing six dilutions of a multi-element stock solution made from a mixture of single-element stock standards (Ultra Scientific).'' In addition for the NAM parent experiment, ''to correct for machine drift both during a single run and between runs, a control solution was run every tenth sample. The control solution is a bulk mixture of the remaining sample from the second dilution. Using bulked samples ensured that our controls were perfectly matrix matched and contained the same elemental concentrations as our samples, so that any drift due to the sample matrix would be reflected in drift in our controls. The same control mixture was used for every ICP-MS run in the project so that run-to-run variation could be corrected. A run of 576 samples took approximately 33 hours with no user intervention. The time required for cleaning of the instrument and sample tubes as well as the digestions and transfers necessary to set up the run limit the throughput to three 576 sample runs per week.''

Drift
Correction and Analytical Outlier Removal. ''Because our internal standard (IS) is added to our digesting acid, we are able to correct for losses due to differential sample evaporation, human error during the dilution process, and any sample introduction variability. So, if the final observed IS concentration is lower or higher than the starting IS concentration, all analyte concentrations are corrected equally for the percent difference between the observed IS concentration and the known starting IS concentration. IS correction is handled automatically by the PerkinElmer Elan 6000 software. Additionally, drift was corrected using the values of the controls run every tenth sample using a method similar to Haugen [24]. In short, drift was corrected by calculating the percentage of concentration change between two controls. This percentage change was assumed to have occurred linearly during the sequence of ten samples run between the two controls. So, for instance, the first sample run after the first control was corrected for 1/10 th of the drift seen between the two controls. Finally, because responses from the machine may be different between runs, we also corrected for drift between runs. This was performed by calculating a correction factor from the control concentrations in this run and a reference control used for all the runs. After drift correction, samples were corrected for the dilution factor and normalized to the seed weights.'' ''While biological outliers are of great interest to our analysis, analytical outliers (e.g. from contamination, spurious isobaric and polyatomic interferences, or poor sample uptake) introduce noise and could lead to a higher rate of incorrectly chosen potential mutants. Outlier removal was implemented using the algorithm described in Davies and Gather [14]. To remove outliers while ensuring that we weren't removing biologically significant data points we removed data points on a per element basis from seeds whose reported elemental concentration was greater than a conservatively set 15 median absolute deviations from the median concentration of that element for that line.'' Computational Analysis. All calculations were performed using a combination of custom Perl and R scripts. The following R packages were used in the data analysis: reshape [25], ggplot. Standard Perl and R functions were used for drift correction, statistical analysis and creation of the figures. Scripts used in this analysis are available from http://www.ionomicshub.org, in the data exchange portal.
QTL analysis. To create the marker map, we pulled from the core set of 1100 NAM markers all the markers that were segregating in the B73 x IL14H population and removed highly correlated (r 2 .0.91) markers. We then computed a map using the Emap function in QTL Cartographer and verified co-linearity with the full map using all 1100 markers. We performed composite interval mapping (CIM) using QTL Cartographer version 1.17f [26], with CIM [27,28] model 6, a walk speed of 2 cM, a window of 5 cM, using the forward and backward regression method. To determine threshold values, the permutation method was used [29] with 1000 permutations per element per population.