Population structure analysis and identification of genomic regions under selection associated with low-nitrogen tolerance in tropical maize lines

Increasing low nitrogen (N) tolerance in maize is an important goal for food security and agricultural sustainability. To analyze the population structure of tropical maize lines and identify genomic regions associated with low-N tolerance, a set of 64 inbred lines was evaluated under low-N and optimal-N conditions. The Low-N Agronomic Efficiency index (LNAE) of each line was calculated. The maize lines were genotyped using 417,112 SNP markers. Grouping based on the LNAE values classified the lines into two phenotypic groups: the first comprised genotypes with high LNAE (named the H_LNAE group), while the second comprised genotypes with low LNAE (named the L_LNAE group). The H_LNAE and L_LNAE groups had mean LNAE values of 3,304 and 1,644, respectively. The population structure analysis revealed a weak relationship between genetic and phenotypic diversity. Pairs of lines were identified that combined high LNAE with high genetic distance from each other. A set of 29 SNP markers exhibited a significant difference in allelic frequencies (Fst > 0.2) between the H_LNAE and L_LNAE groups. The Pearson’s correlation between LNAE and the number of favorable alleles in this set of SNPs was 0.69. These SNPs could be useful for marker-assisted selection for low-N tolerance in maize breeding programs. The results of this study could help maize breeders identify accessions to be used in the development of low-N tolerant cultivars.


Introduction
High levels of nitrogen (N) fertilization are required to achieve high maize (Zea mays L.) grain yield. However, the maize crop uses only about 30-40% of the N available in the soil, while the remainder is lost through surface runoff, denitrification, volatilization, and microbial consumption [1]. N fertilization is expensive and also harmful to the environment because of its negative effect on water quality. In this context, the development of nutrient-efficient maize cultivars, which yield more under low fertilization, is a way to reduce fertilizer usage [2,3]. Maize tolerance to low availability of N in the soil is a complex trait influenced by genetic and environmental factors, including soil nitrogen availability, nitrogen uptake, assimilation, transportation, and remobilization [4]. Breeding programs use several indices related to low-N stress tolerance that are based on quantitative traits [3,5,6,7,8]. For example, Low-N Agronomic Efficiency (LNAE) is an index that accounts for both the absolute grain yield at low-N and the ratio between grain yield under low-N and optimal-N conditions [8]. Therefore, LNAE is an advantageous trait for maize breeding programs aimed at developing cultivars with high yield even at low-N availability.
Information on the genomic diversity and population structure of tropical maize germplasm can accelerate genetic gains in maize breeding programs [9,10]. Previous studies showed that maize gene pools present a clear population structure between temperate and tropical/subtropical pools. While the tropical maize germplasm has greater genetic diversity, the temperate germplasm presents more pronounced heterotic patterns [11][12][13][14]. However, information on genetic relationships [14] and on genetic variation under low-N stress [8] is still lacking for tropical maize germplasm.
Crosses between maize lines from different heterotic groups produce higher-yielding hybrids than crosses between lines from the same heterotic group. Therefore, the genetic diversity of germplasm is routinely assessed using different morphological and molecular markers. For this purpose, molecular markers are more beneficial because they are not affected by environmental factors or by the plants' developmental stage. Consequently, DNA markers have become an indispensable tool for characterizing genetic resources and providing breeders with more detailed information to assist in selecting diverse parents [9].
With the development of next-generation DNA sequencing methods, single nucleotide polymorphism (SNP) markers have become a useful tool for understanding genetic relationships and population structure of different species [15], including maize [9,10,14]. In addition, genotyping populations with high-density SNPs distributed throughout the maize genome enables the identification of genomic regions that are under directional selection among populations and associated with traits of interest [15,16]. SNP markers associated with traits of interest could benefit the marker-assisted selection pipelines of maize breeding programs.
The present study aimed i) at investigating the population structure and genomic diversity of selected tropical maize lines with different levels of tolerance to low-N stress, and ii) at identifying SNP markers under directional selection between groups of lines phenotypically contrasting in low-N tolerance.

Plant material and field experiments
A diversity panel consisting of 64 Brazilian tropical maize inbred lines with genetic variability for N-stress tolerance was evaluated. Most of these lines originated from the maize breeding program of the Universidade Federal de Viçosa (Brazil) [17,18]. The maize lines were planted in two experimental fields, side by side, one with optimal-N availability (IN) and the other with low-N availability (LN). An 8 × 8 square lattice design with two replications was established in each field. The experiment was carried out at three sites (each combination of loca- The amount of N applied was determined by the expected yield using a given N dosage. Therefore, the IN dose corresponded to 100% expected yield and the LN dose to 50% of IN yield [20].

Traits evaluated
We measured the grain yield (GY, kg ha$^{-1}$) of each plot and adjusted the yield to 13% grain moisture. The adjusted means were obtained for each inbred line under each nitrogen level and environment, using the following model:
$$y = X\beta + Zu + \varepsilon$$
where $y$ is the grain yield vector; $\beta$ is the vector of fixed effects of the common constant, the replicate, and the genotype; $u$ is the vector of random effects of blocks, with $u \sim (0, I\sigma_b^2)$; $\varepsilon$ is the error vector, with $\varepsilon \sim (0, I\sigma^2)$; and $X$ and $Z$ are the incidence matrices of the fixed and random effects, respectively.
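The text states that plot yields were adjusted to 13% grain moisture but does not give the formula. The sketch below uses the standard dry-matter correction, which is assumed here rather than taken from the paper; the function name and the example plot values are hypothetical.

```python
def adjust_to_standard_moisture(grain_yield, moisture_pct, standard_pct=13.0):
    """Rescale a plot yield (kg/ha) to a standard grain moisture content.

    Assumed formula (not stated in the paper):
    adjusted = yield * (100 - measured moisture) / (100 - standard moisture)
    """
    if not 0.0 <= moisture_pct < 100.0:
        raise ValueError("moisture_pct must be in [0, 100)")
    # Dry matter harvested from the plot, in kg/ha.
    dry_matter = grain_yield * (100.0 - moisture_pct) / 100.0
    # Re-express the same dry matter at the standard moisture content.
    return dry_matter * 100.0 / (100.0 - standard_pct)


# Hypothetical plot: 5,000 kg/ha harvested at 18% grain moisture.
adjusted = adjust_to_standard_moisture(5000.0, 18.0)
print(round(adjusted, 1))
```

A plot already at 13% moisture is returned unchanged, which is a quick sanity check on the correction.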
Based on the adjusted means, we estimated the Low-N Agronomic Efficiency index (LNAE) for each inbred line in each environment. The LNAE index, introduced by [8], was calculated as follows: where $GY_{IN_{ij}}$ and $GY_{LN_{ij}}$ are the same as defined earlier.
Analysis of variance was performed for the LNAE index using the following model:
$$y = X\beta + \varepsilon$$
where $y$ is the LNAE vector; $\beta$ is the vector of fixed effects of the common constant, the environment, and the inbred line; $\varepsilon$ is the error vector, with $\varepsilon \sim (0, I\sigma^2)$; and $X$ is the incidence matrix of the fixed effects. If the effect of inbred line was significant, we performed the Scott-Knott test using the R package 'ScottKnott' [21].

Genotyping and SNPs quality control
Fresh leaf samples were collected from four-week-old seedlings of each maize line and stored in a deep freezer at -80 °C. Total genomic DNA was extracted using the Qiagen DNeasy Plant Mini Kit (Promega™), according to its instruction manual. The inbred maize lines were genotyped using the Affymetrix® Axiom® Maize Genotyping Array, containing 616,201 SNP markers [22]. Quality control of the SNPs was performed based on call rate (90%) and minor allele frequency (MAF, 5%) using TASSEL 5.0 [23]. The quality control process resulted in 417,112 high-quality SNPs. From the filtered SNPs, 12,050 were selected by LD pruning using Plink [24], in order to remove neighboring markers with LD higher than 0.13, which was the average LD of the population [16].
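The call-rate and MAF filters described above can be sketched as follows. This is a minimal illustration in plain Python, not the TASSEL implementation: the genotype coding (allele counts 0, 1, 2, with `None` for a missing call), the inclusive thresholds, and the toy SNP names are all assumptions.

```python
def passes_qc(genotypes, min_call_rate=0.90, min_maf=0.05):
    """Return True if one SNP passes the call-rate and MAF filters.

    genotypes: per-line counts of one allele (0, 1 or 2), None if missing.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    call_rate = len(called) / len(genotypes)
    # Frequency of the counted allele among called genotypes.
    allele_freq = sum(called) / (2 * len(called))
    maf = min(allele_freq, 1.0 - allele_freq)
    return call_rate >= min_call_rate and maf >= min_maf


# Toy data for ten lines (hypothetical SNP names and genotypes).
snps = {
    "snp_a": [0, 2, 2, 0, 2, 0, 0, 2, 2, 0],        # fully called, MAF 0.5
    "snp_b": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],        # monomorphic, MAF 0
    "snp_c": [0, 2, None, None, 2, 0, 0, 2, 2, 0],  # call rate 0.8
}
kept = [name for name, g in snps.items() if passes_qc(g)]
print(kept)  # ['snp_a']
```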

Population structure analysis
Population structure was analyzed using the Bayesian method implemented in STRUCTURE 2.3.4 [25], assuming an admixture model and independence between loci. For each K value (ranging from 1 to 10), ten replicate runs were performed, each with an initial burn-in period of $1 \times 10^5$ iterations followed by $1 \times 10^5$ Markov chain Monte Carlo iterations. The STRUCTURE results were summarized using the pophelper R package [26]. The number of groups was estimated using Evanno's ΔK method [27]. The replicate runs for each K were submitted to CLUMPP [28] to account for label switching. To assess and visualize the genetic relationships among the maize lines, we performed principal coordinate analysis (PCoA) on the genetic distance matrix, with data standardization, using the R package vegan [29].
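As an illustration of the ΔK criterion, the sketch below computes ΔK from replicate log-likelihoods, Ln P(D), for each K. It assumes the common formulation in which the second difference of the run means is divided by the standard deviation across runs at K; the function name and the toy likelihood values are hypothetical and are not results from this study.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Compute Delta K for every K that has both neighbours K-1 and K+1.

    lnp_by_k: dict mapping K -> list of Ln P(D) values from replicate runs.
    """
    avg = {k: mean(values) for k, values in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in avg or k + 1 not in avg:
            continue  # Delta K is undefined at the smallest and largest K
        second_diff = abs(avg[k + 1] - 2 * avg[k] + avg[k - 1])
        sd = stdev(lnp_by_k[k])
        if sd > 0:
            delta[k] = second_diff / sd
    return delta


# Toy Ln P(D) values: three replicate runs for each K from 1 to 5.
runs = {
    1: [-1000, -1002, -998],
    2: [-900, -904, -896],
    3: [-700, -702, -698],
    4: [-690, -695, -685],
    5: [-685, -690, -680],
}
delta_k = evanno_delta_k(runs)
best_k = max(delta_k, key=delta_k.get)
print(best_k, delta_k)
```

Because ΔK needs both neighbouring values of K, it cannot be evaluated at the smallest or largest K tested.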
The genetic distance among the maize lines was calculated as 1 - IBS (identity by state) using TASSEL 5.0 [23]. IBS is defined as the probability that alleles sampled at random from two individuals at the same locus are the same.
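The definition above translates directly into a short calculation. The sketch below is an illustration of that definition, not TASSEL's code: genotypes are assumed to be coded as counts of one allele (0, 1, 2), missing calls as `None`, and loci with a missing call in either line are skipped.

```python
def ibs_distance(line_1, line_2):
    """Return 1 - IBS for two lines genotyped at the same SNPs."""
    total, n_loci = 0.0, 0
    for a, b in zip(line_1, line_2):
        if a is None or b is None:
            continue  # skip loci with a missing call
        p, q = a / 2.0, b / 2.0  # within-line frequency of the counted allele
        # Probability that one random allele from each line is identical.
        total += p * q + (1.0 - p) * (1.0 - q)
        n_loci += 1
    if n_loci == 0:
        raise ValueError("no loci called in both lines")
    return 1.0 - total / n_loci


# Two hypothetical inbred lines that differ at one of five loci.
line_a = [0, 2, 2, 0, 2]
line_b = [0, 2, 0, 0, 2]
print(round(ibs_distance(line_a, line_b), 2))  # 0.2
```

For fully homozygous inbred lines this reduces to the proportion of loci at which the two lines carry different alleles.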

Detection of SNPs under selection between contrasting phenotypic groups
To identify SNPs with significant allele frequency differences between contrasting phenotypic groups for LNAE, we calculated the fixation index (Fst) for each SNP [30] using the SNiPlay [31] software. Fst values range from 0 to 1, with 0 representing no allele differentiation and 1 representing complete allele differentiation between two populations. SNPs with Fst > 0.2 were considered to be under directional selection. We also used the Bayesian approach of the BAYESCAN software [32]. BAYESCAN was run with a burn-in of 50,000, a thinning interval of 30, a sample size of 5,000, 50 pilot runs of length 5,000, and a false discovery rate (FDR) threshold of 0.05.
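To make the Fst scale concrete, the sketch below computes a simple heterozygosity-based Fst, (H_T - H_S) / H_T, for one biallelic SNP and two groups. This is a swapped-in textbook estimator chosen for brevity; it is not necessarily the estimator of [30] used by SNiPlay, so its values will generally differ from those reported in this study. The group sizes (29 and 35 lines) come from the text, while the allele frequencies are hypothetical.

```python
def simple_fst(p1, n1, p2, n2):
    """Heterozygosity-based Fst for one biallelic SNP and two groups.

    p1, p2: frequency of the same allele in each group.
    n1, n2: number of lines in each group (used as weights).
    """
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    if h_total == 0.0:
        return 0.0  # SNP is monomorphic across both groups
    h_within = (n1 * 2.0 * p1 * (1.0 - p1) + n2 * 2.0 * p2 * (1.0 - p2)) / (n1 + n2)
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies in the two phenotypic groups.
fst = simple_fst(0.6, 29, 0.1, 35)
print(round(fst, 3), fst > 0.2)
```

Equal frequencies in both groups give 0, and alternative fixed alleles give 1, matching the interpretation of the two ends of the scale.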

Genetic variability for low-N efficiency
Based on the Scott-Knott test, the maize lines were classified into two phenotypic groups, one formed by 29 lines with high LNAE values (named the H_LNAE group) and another comprising 35 lines with low LNAE values (named the L_LNAE group). The distribution of the LNAE values in the H_LNAE and L_LNAE groups is shown in Fig 1. The LNAE values of the lines in the H_LNAE group ranged from 2,491 to 4,626, with an average and standard deviation of 3,304 and 589, respectively. The LNAE values of the lines in the L_LNAE group ranged from 521 to 2,448, with an average and standard  Table, while the phenotypic values for yield under low-N and optimal-N of each maize line in each site are available in S2 Table.

Population structure and genetic distances using SNP markers
Principal coordinate analysis based on 12,050 SNP markers (Fig 2) showed no clear genetic differentiation between the H_LNAE and L_LNAE groups. A likely explanation for the lack of relationship between phenotypic and genetic differences is that the PCoA used all 12,050 SNPs, many of which may be positioned in genomic regions that do not affect LNAE.
Population structure was analyzed with the Bayesian approach implemented in the STRUCTURE software (Fig 3). Based on Evanno's criterion [27], the upper levels of subdivision of the population were K = 7 and K = 3 (Fig 3A). Considering K = 3 (Fig 3B), the L_LNAE group presented a considerably higher assignment (Fig 3C) to the Q1 (red) genetic group (0.37) than the H_LNAE group (0.25). This indicated that, although there was no clear genetic structure between the H_LNAE and L_LNAE groups, there were SNPs with significant differences in allelic frequencies between these groups.
The genetic distances among the 64 maize lines analyzed (S3 Table) ranged from 0.03 (between lines L033 and L039) to 0.41 (between lines L051 and L036). Line L051 showed a relatively high genetic distance from all other lines analyzed (from 0.35 to 0.41). Considering only the lines with high LNAE (H_LNAE group), the genetic distances ranged from 0.05 (between L029 and L030) to 0.40 (between line L051 and lines L016, L012B, L037, L008, L012, L038, L022, L016B, L023, and L056). The two lines with the highest LNAE values (L032 and L014) had a genetic distance of 0.24, indicating that crosses between them would be promising for generating transgressive hybrids. On the other hand, a cross between lines L029 and L030 is not very promising because of the high genetic similarity between them.
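The reasoning above, which looks for pairs of lines that combine high LNAE with high genetic distance, can be sketched as a simple filter. The two distances marked "from the text" are the ones reported above; every LNAE value, the remaining distances, and the distance cutoff of 0.2 are hypothetical and serve only to make the example run. The LNAE cutoff of 2,491 is the lower bound of the H_LNAE group.

```python
from itertools import combinations


def rank_candidate_crosses(lnae, distance, min_lnae, min_distance):
    """List pairs of high-LNAE lines that are genetically distant.

    lnae: dict line -> LNAE value.
    distance: dict frozenset({line_a, line_b}) -> genetic distance (1 - IBS).
    Returns (line_a, line_b, distance) tuples, most distant pair first.
    """
    selected = sorted(line for line, value in lnae.items() if value >= min_lnae)
    pairs = []
    for a, b in combinations(selected, 2):
        d = distance[frozenset((a, b))]
        if d >= min_distance:
            pairs.append((a, b, d))
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


# Hypothetical LNAE values for four lines of the H_LNAE group.
lnae = {"L032": 4600, "L014": 4500, "L029": 3000, "L030": 2900}
distance = {
    frozenset(("L032", "L014")): 0.24,  # from the text
    frozenset(("L029", "L030")): 0.05,  # from the text
    frozenset(("L014", "L029")): 0.22,  # hypothetical
    frozenset(("L014", "L030")): 0.21,  # hypothetical
    frozenset(("L032", "L029")): 0.18,  # hypothetical
    frozenset(("L032", "L030")): 0.19,  # hypothetical
}
crosses = rank_candidate_crosses(lnae, distance, min_lnae=2491, min_distance=0.2)
for a, b, d in crosses:
    print(a, b, d)
```

With these inputs the L014 x L032 pair ranks first and the L029 x L030 pair is excluded.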

Genomic regions under selection between H_LNAE and L_LNAE groups
To identify SNPs under directional selection between the H_LNAE and L_LNAE groups, we used two approaches, the first one based on Fst [30] and the second one using the BAYESCAN software [32]. Table 1 shows the markers, their positions, and their Fst values between H_LNAE and L_LNAE. The SNP Affx_90855476, located on chromosome 9, presented the highest Fst of all SNPs (0.37). For this SNP, the favorable allele for high LNAE (G) had a frequency of 0.43 in the H_LNAE group, while in the L_LNAE group the frequency of this allele was only 0.03 (Table 2).
We also used the method implemented in the BAYESCAN software to identify SNPs under directional selection between the H_LNAE and L_LNAE groups. With this approach, four SNPs were identified as being under selection between the H_LNAE and L_LNAE groups (Fig 5 and Table 2).
Considering the 29 SNPs under selection between the H_LNAE and L_LNAE groups, the Pearson's correlation between the number of favorable alleles (NFA) and the LNAE of the maize lines was 0.68 ($p = 6.34 \times 10^{-10}$). The $R^2$ coefficient of the linear regression of LNAE as a function of NFA was 0.46 (Fig 6). S4 Table shows the allelic profile of the maize lines for the 29 SNPs and the number of LNAE favorable alleles of each line.
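The two quantities in this paragraph are linked: for a simple linear regression with one predictor, $R^2$ equals the squared Pearson correlation, so $0.68^2 \approx 0.46$ agrees with the reported values. The sketch below shows how NFA could be counted and correlated with LNAE. The genotype coding as two-letter strings, the three example SNPs, and all line names and values are hypothetical.

```python
from math import sqrt


def count_favorable(genotypes, favorable):
    """Count favorable alleles over SNPs; genotypes are strings such as 'AG'."""
    return sum(g.count(allele) for g, allele in zip(genotypes, favorable))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical favorable alleles at three SNPs and genotypes of four lines.
favorable = ["G", "A", "T"]
profiles = {
    "line1": ["GG", "AA", "TT"],
    "line2": ["GG", "AC", "CC"],
    "line3": ["AA", "CC", "CC"],
    "line4": ["AG", "AC", "TT"],
}
lnae = {"line1": 4100, "line2": 3000, "line3": 1500, "line4": 3200}

lines = sorted(profiles)
nfa = [count_favorable(profiles[name], favorable) for name in lines]
r = pearson_r(nfa, [lnae[name] for name in lines])
print(nfa, round(r, 2), round(r ** 2, 2))
```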

Discussion
Maize cultivars tolerant to low N in the soil can achieve high grain yield with less N application. Applying smaller quantities of N to the maize crop is economically beneficial because it reduces production costs, and it is also environmentally favorable. In this study, we used the Low-N Agronomic Efficiency (LNAE) index to measure the tolerance of 64 tropical maize lines to low-N availability in the soil. The LNAE index calculation considers both the absolute GY at low-N availability and the ratio between GY at low-N and optimal-N [3,8]. Therefore, LNAE is an important index in maize breeding programs aimed at developing cultivars that are high-yielding even under low-N conditions.
Genetic variability is essential for success in maize breeding programs. Accordingly, genetic variability related to N utilization has been investigated in maize, wheat, rice, and spring barley [2,33,34,35,36]. In the present study, the tropical maize lines analyzed showed high diversity for low-N tolerance compared with previously reported studies [11,37]. Thus, the high-LNAE inbred lines identified in this study may be an important resource for developing low-N tolerant cultivars.
A clear understanding of genetic relationships among low-N tolerant inbred lines offers an opportunity to identify genotypes that could be used in parental combinations to develop maize hybrids that are simultaneously high-yielding and low-N tolerant. The genetic distance matrix (S3 Table) revealed wide genetic variability among the maize lines analyzed, with few pairs of lines showing low genetic distances. This suggests that most of the tropical maize lines evaluated in this study are unique and that each of them has the potential to contribute new alleles to breeding programs in tropical regions. It was possible to identify pairs of lines that possessed high LNAE and were genetically distinct from each other. For example, the genetic distance between the two lines with the highest LNAE values (L032 and L014) was relatively high (0.24), indicating that a cross between them would be promising and could result in transgressive hybrids. In addition, knowledge of genetic distances between pairs of maize lines can be useful for assigning lines to heterotic groups, selecting parental lines, and estimating the loss of genetic diversity during conservation or selection [9,10,12,38,39].
As a result of the development of next-generation DNA sequencing methods, SNP markers have become essential for understanding genetic relationships in different species of agronomic interest, including maize [10]. Additionally, genotyping populations with high-density SNPs distributed throughout the genome has enabled the identification of SNPs associated with target traits, which has broadened the knowledge about the genetic architecture of various traits of interest for maize breeding [16,40,41].
In practice, SNP markers associated with traits can be beneficial when applied to marker-assisted selection in breeding programs. However, for large-scale implementation in breeding programs, the cost of genotyping large populations with dense SNP panels may still be a limiting factor [42][43][44]. Thus, selecting reduced sets of SNPs is important to reduce costs. Of the 417,112 polymorphic SNP markers initially identified among the 64 tropical maize lines analyzed in the present study, we selected 12,050 SNPs based on linkage disequilibrium. These 12,050 SNPs were sufficient to estimate the population structure of the lines efficiently. Together with the LNAE data, they allowed the identification of low-N tolerant lines with high genetic distances from each other, which can be used to maximize heterosis in future crosses for hybrid production.
To generate an even smaller set of SNPs for use in marker-assisted selection for LNAE, we selected 29 SNPs with Fst values higher than 0.2 between the two phenotypically contrasting groups of maize lines (H_LNAE and L_LNAE groups). We observed that four of these SNPs (Affx_90387785, Affx_91199376, Affx_90855476, and Affx_91232032) also showed evidence of having undergone directional selection according to the analysis carried out using the BAYESCAN software [32]. BAYESCAN is based on the multinomial-Dirichlet model. As a Bayesian method, BAYESCAN incorporates the uncertainty in allele frequencies that arises from small sample sizes. The false discovery rate threshold adopted in this analysis was stringent (0.05). Therefore, the number of SNPs under directional selection identified using this approach was lower than with the Fst-based approach [30].
Considering the small set of 29 SNPs under selection between the H_LNAE and L_LNAE groups, the Pearson's correlation between the number of favorable alleles for high LNAE and the LNAE index of the maize lines was 0.68. These SNP markers can be useful in marker-assisted selection for low-N tolerance. In addition, the results can help establish breeding programs to improve the tolerance of maize to stress caused by low availability of N in the soil.