Genome-wide association study and genetic diversity analysis on nitrogen use efficiency in a Central European winter wheat (Triticum aestivum L.) collection

To satisfy future demands, the increase of wheat (Triticum aestivum L.) yield is inevitable. Simultaneously, maintaining high crop productivity and efficient use of nutrients, especially nitrogen use efficiency (NUE), are essential for sustainable agriculture. NUE and its components are inherently complex and highly influenced by environmental factors, nitrogen management practices and genotypic variation. Therefore, a better understanding of their genetic basis and regulation is fundamental. To investigate NUE-related traits and their genetic and environmental regulation, field trials were evaluated in a Central European wheat collection of 93 cultivars at two nitrogen input levels across three seasons. This elite germplasm collection was genotyped on DArTseq® genotypic platform to identify loci affecting N-related complex agronomic traits. To conduct robust genome-wide association mapping, the genetic diversity, population structure and linkage disequilibrium were examined. Population structure was investigated by various methods and two subpopulations were identified. Their separation is based on the breeding history of the cultivars, while analysis of linkage disequilibrium suggested that selective pressures had acted on genomic regions bearing loci with remarkable agronomic importance. Besides NUE, genetic basis for variation in agronomic traits indirectly affecting NUE and its components, moreover genetic loci underlying response to nitrogen fertilisation were also determined. Altogether, 183 marker-trait associations (MTA) were identified spreading over almost the entire genome. We found that most of the MTAs were environmental-dependent. The present study identified several associated markers in those genomic regions where previous reports had found genes or quantitative trait loci influencing the same traits, while most of the MTAs revealed new genomic regions. Our data provides an overview of the allele composition of bread wheat varieties anchored to DArTseq® markers, which will facilitate the understanding of the genetic basis of NUE and agronomically important traits.

Introduction moderate effect will take us closer to the understanding of the final picture, the complexity of NUE [13]. These minor QTLs are more likely are specific QTLs representing genetic variation specific for the species or variety and the environmental conditions [13]. To discover many of these specific QTLs it is inevitable to investigate NUE in different environments studying diverse genetic material.
Numerous studies have established that significant genetic variation exists for NUE-related traits in wheat. Components of NUE were investigated in bread wheat under five N levels, and depending on N the supply, a 24-42% difference was fund in NUtE between the elite wheat varieties [15]. Another study revealed significant differences (69-88%) for N remobilisation efficiency and N translocation efficiency (90-93%) among five wheat genotypes [16]. These results indicate that genetic potential indeed exists to improve NUE in wheat.
Genetic diversity is a fundamental aspect of crop improvement; therefore, the effective utilisation of genetic resources in breeding programs is essentialy as long as this diversity integrates positive and profitable genes [17]. I has been concluded [13] that QTL data obtained from wide crosses, in spite of the fact that they have generated relatively robust QTL information dataset, may not be relevant in developing modern cereal crops. Diverse elite genetic materials, which are well-adapted and commercially relevant for specific interests are more useful to identify allelic variation for superior NUE. In this way, knowledge gained from scientific studies can provide valuable results in plant breeding.
Many genes or QTLs influencing NUE have been successfully mapped in wheat under various N supplies. Most of the studies utilised bi-parental populations to identify the genetic basis of NUE and its associated traits [7,[18][19][20].
Association mapping exploits ancestral recombination events that occurred throughout the evolutionary history of the populations and takes into account all the segregating alleles present in the population. In contrast, due to the restricted number of meiotic events, the genetic resolution of the segregation based conventional bi-parental maps often remain insufficient. Moreover, only the alleles from the parental genotypes are scrutinised in these 'classical' biparental analyses. So, compared to the capabilities of bi-parental mapping association analysis is a more efficient approach to dissect complex quantitative traits [21]. Concerning that this method is relatively new in cereal genetics, only a few genome-wide association studies (GWAS) have been published in wheat for NUE on diverse genetic materials under various different conditions so far [17, [22][23][24][25].
In order to identify chromosomal regions involved in the determination of NUE and its related traits, a genome-wide association study was carried out in a European winter wheat collection under different conditions. The Marker-Trait-Associations (MTA) detected in commercially relevant genetic materials will facilitate the isolation of agronomically important genes, especially those related to NUE.

Plant materials and experimental design
A set of 93 bread wheat (Triticum aestivum L.) varieties (S1 Table) were phenotyped in Martonvásár, Hungary, at the MTA ATK (Centre for Agricultural Research, Hungarian Academy of Sciences), in six environments (3 year × 2 N levels), during three successive cropping seasons (2012-2013, 2013-2014 and 2014-2015). The examined cultivars represent an elite germplasm collection grown mainly in Hungary and in Central Europe; however, some old (e.g. 'Bezostaja-1', 'Bánkúti 1201') or non-continental (e.g. 'Nudakota') varieties were also involved. Cultivars were selected according to their involvement and importance in the Hungarian wheat breeding.
In three consecutive years adjacent fields belonging to the Agricultural Institute of MTA ATK (47˚18´N, 18˚48´E, 105 MASL) were used. Each cultivar was sown between 2 and 21 October, in a split-plot design, in three replicates, at two N levels: (1) no nitrogen supply-considered as extensive management (referred as N0) and (2) intensive management, whereby 120 kg N per hectare (referred as N120) was applied. In the N0 treatment, only the naturally occurring nitrogen was available in the soil, while in the case of N120, N was top-dressed at growth stage (GS) 21-24 [26]. In 2014 and 2015 the fertiliser treatment was applied on 7 and 17 March, respectively. In 2013, the spring was cold and wet, so the N fertiliser could be allocated to the field only on 17 April (at tillering stage too, . In 2013, ammonium nitrate (34% N), while in 2014 and 2015, calcium ammonium nitrate (27% N) was allocated. The size of each plot was 3×1.44 m consisting of 12 rows, containing 500 viable seeds/m 2 . Sowing date and seed rate were applied according to the Hungarian practice [27][28]. N treatment was considered as main plots and varieties as sub-plots. P 2 O 5 and K 2 O fertilisation and plant protection were applied according to standard agricultural practice and no growth regulators were used. Crops were combine-harvested at state of grain maturity in the period of 8-21 July. Each spring, soil samples were collected before fertilisation from two depths (0-0.3 m and 0.3-0.6 m) and mineral N (ammonium + nitrate) contents (N mineral ) and main soil properties were determined in accredited laboratory according to NAT-1-1093/2001 (NÉBIH NTAI, Velence, Hungary). Soil type at each location was chernozem, but they differed in their naturally available N content. The average N mineral content (mean value of six soil samples in each year) was 21, 494 and 78 kgha -1 in 2013, 2014 and 2015, respectively. Weather data (i.e. daily precipitation and mean temperature values) were also recorded in the Martonvásár region. Further information about soil parameters and meteorological data can be obtained from [29].

Phenotypic evaluation
The entire collection was evaluated for 16 agronomically important or N input-related traits at two N input levels (N0 and N120). The investigated traits were heading date (HD), plant height (PH), GY, thousand grain weight (TGW), weight of straw-referred to as straw yield (SY), spike number per meter (SN), grain number per spike (GN), harvest index (HI), N harvest index (NHI), GPC, NUE, NUpE, NUtE, amount of N taken up by the whole plant (NUp full ), amount of the N harvested in grain (NUp grain ), grain N accumulation efficiency (GNACE). GY, HD and PH were scored on a per plot basis, while GN, SN, TGW and SY were evaluated from above-ground biomass samples collected from a representative row of each plot, whose length was one meter. HD was assigned when 50% of the spikes within a given plot had fully emerged (GS 59) [26] and expressed in days from sowing, while PH was measured in cm at maturity. The one meter samples were dried for 48 h at 70˚C and SY was determined and transformed in kgha -1 . SN and GN were determined on per meter basis. GN was determined using Contador seed counter machine (Pfeuffer GmbH, Kitzingen, Germany) and TGW was also calculated. The number of fully developed spikes was calculated from the one meter sample from each plot and expressed in spikes per m. GY was assigned when plots were combineharvested at plant maturity and calculated in kg hectare -1 as SY. HI was determined as the ratio of GY and above-ground biomass (both of them expressed in kgha -1 ), which is the sum of GY and SY. The straw and grain samples were milled and N content was determined by Dumas method [30] using Rapid NIII nitrogen analyser (Elementar Analysensysteme GmbH., Hanau, Germany). GPC in w/w% was calculated as grain N concentration multiplied by a coefficient of 5.8. The amount of the N harvested in grain and in straw was calculated by multiplying the N concentration obtained from Rapid NIII analysis with the amount of GY and SY, respectively, and expressed in kg hectare -1 . NUE, NUtE and NUpE were defined according to [8] as follows: NUE is the GY divided by the sum of naturally available N content in the soil and the amount of N allocated with fertiliser in kg hecatare -1 (N soil ). NUtE is the GY divided by NUp full . NUp full is the sum of NUp grain and N harvested in straw, which was expressed in kg hectare -1 . NUpE is the NUp full divided by N soil . NHI is the ratio of NUp grain to NUp full , which is a measure of N translocation efficiency. GNACE is NUp grain divided by the N soil , which serves as a measure of the overall efficiency.
Analysis of variance for all traits in each of the three consecutive years was performed using General Linear Model (GLM) procedure with SPSS 22.0 software for Windows. Correlations between phenotypic traits were also calculated through SPSS 22.0 in each cropping year and treatment.

Genotyping and marker selection
Genomic DNA was extracted by the Qiagen DNeasy Plant Mini Kit according to the manufacturer's instructions and sent to the commercial service provider of DArT marker, Triticarte Pty. Ltd. (Canberra, Australia; http://diversityarray.com/). Genotyping was performed with the wheat DArTseq1 platform and generated two types of marker data: (1) genotyping by sequencing (GBS) markers are SNP markers obtained by sequencing the fragments derived from genome complexity reduction and subsequent SNP calling, and (2) silicoDArT markers referring to the presence or absence of restriction fragment in the genomic representation. Sili-coDArT is analogous to microarray DArT but extracted in silico from the sequences obtained from the genomic representation used for GBS genotyping. Genomic sequences of fragments from both types of markers were also available. A detailed description of the platform used to genotype the collection can be found in [31]. As a result of genotyping, the initial dataset contained 12,293 polymorphic codominant SNP and 13,160 dominant silicoDArT markers. This marker set was filtered on the basis of individual marker-related statistics, provided by the Triticarte Pty. Ltd. The minimum threshold value for call-rate and reproducibility was 95%. Since it was assumed that all genotypes are homozygous, DArTseq1 markers showing heterozygous calls were indicated as missing and markers with >5% heterozygous alleles were excluded. Additionally, markers with minor allele frequency lower than 0.1 were also removed from the analysis. After filtering marker data, a total of 4,201 polymorphic DArTseq1 markers (2,700 silicoDArTs and 1,501 SNPs) were obtained and used for population structure analysis. However, since only 3,290 of these polymorphic markers had been previously mapped (data obtained from Triticarte Pty. Ltd), the unmapped markers were not included in linkage disequilibrium (LD) and marker trait associations analysis. A reduced marker set consisting of 300 unlinked markers were also used to complement and verify the population structure analysis. They were obtained from the filtered set by excluding markers, which were localised closer than 5 cM to any other marker.

Analysis of population structure
To determine the underlying population structure in the Hungarian wheat collection, different methodologies were used and compared. Firstly, the Bayesian algorithm implemented in the software package Structure. v2.3.4 [32] was used to estimate the number of hypothetical subpopulations (K) and to assign cultivars to them. Structure runs were performed with two marker set (4,201 markers and a reduced set with 300 markers) applying the admixture model with correlated allele frequencies. A burn-in of 100,000 iteration followed by 100,000 Markov Chain Monte Carlo iterations were set for accurate parameter estimates. The value of K was evaluated from 1 to 8, with 5 iterations for each K value. The most likely number for subpopulations was estimated by following the ΔK method, described by [33]. Each cultivar was assigned to one subpopulation based on the membership probability. Principal Coordinate Analysis (PCoA) was also used as an alternative way of visualizing the genetic stratification within the collection, by means of the software package PAST v.3.12 [34]. Additionally, the phylogenetic relationship among the cultivars was estimated using PAST software using the neighbour joining method. Bootstrap values for 1,000 replicates are indicated. Analysis of molecular variance (AMOVA) and genome wide estimation of population differentiation using Wright's F-statistics (F ST ) among subpopulations were performed with software package Arlequin 3.5.2.2 [35] to investigate levels of genetic variation revealed by Structure analysis.

Linkage disequilibrium analysis
Genome-wide LD analysis was performed using the LD function in the software TASSEL 5.0 [36]. Intra-chromosomal LD was estimated for the entire population. The analysis comprised the pairwise estimated squared allele-frequency correlation (r 2 ), normalized coefficient of linkage disequilibrium (D-prime or D') and the significance of each pair of loci. The locus were considered to be in significant LD when p<0.05. To estimate the LD decay, significant r 2 -values were plotted against the genetic distance between the marker-pairs and a second-degree smoothed Loess curve was fitted using SPSS 22.0. [37]. The interception of the Loess curve and background LD (critical r 2 ) was considered as an estimate of LD decay. The critical r 2 -value was determined by root transforming the unlinked r 2 -values and taking the 95 th percentile of the distribution as the threshold beyond which LD is likely caused by genetic linkage [23,38]. According to [39], marker pairs with a distance above 50 cM were considered as unlinked. LD decay was estimated separately in all chromosomes, for the three genomes (A, B and D) in the entire population.

Marker-trait association analysis
Genome-wide association analysis was performed using TASSEL 5.0 software [36] for each measured and calculated trait and for genotype response to N fertilisation too. Response to N fertilisation (RN) was estimated as the ratio between N120 and N0 for GY, SN, GN, GPC, NHI, NUPfull, NUpgrain and NUtE. In the association analysis each cultivar was represented as the phenotypic mean of the three plots in each season and the seasons were analysed separately. Four different statistical models were adopted to calculate P-values for MTAs to avoid spurious associations: (1) general linear model (GLM) with Q-matrix as correction for population structure; (2) GLM with Principal component analysis (PCA) as correction for population structure; (3) mixed linear model (MLM) with Kinship-matrix (K-matrix) as correction for population structure and (4) MLM with Q-matrix and K-matrix as correction for population structure [32;40]. PCA and K-matrix were calculated with TASSEL 5.0 considering recommendations of [41]. MLM was run without compression and selecting the option "population parameters previously determined". The critical threshold for assessing the significance of MTAs were calculated by false discovery rate separately for each trait in each year [42]. An MTA was defined significant if the calculated q-value passed the FDR threshold for a given trait in all of the four models. When a trait from the same environment (i.e. year and treatment) was found to be associated with more than one marker, that MTA was considered, which possessed its highest effect size within a 5 cM region.
Moreover quantile-quantile plots of -log 10 P-values in S3 Fig illustrate the relationship between the observed and the expected dataset based on the MLM+K+Q model. Appropriate control for population structure and relatedness can be seen, since empirical distribution do not deviate from expectations.

Genotypic data
Genotyping the Hungarian wheat collection with DArTseq1 platform resulted in a final dataset comprising 4,201 quality-filtered, polymorphic DArTseq1 markers of which 3,290 were placed on the genetic map. Among the mapped markers, 1,631 were located on the B genome, 1,210 on the A genome and only 449 on the D genome. Their chromosomal distribution can be found in S1 Fig. The total map length covered 5,880 cM. The average distance between markers was 1.79 cM, 1.22 cM and 5.77 cM in the A, B and D genome, respectively. Neither A nor B genome had gaps larger than 50 cM. However, the largest gap on chromosomes 3D, 4D and 5D were 60.42 cM, 74.21 cM and 64.55 cM, respectively. In summary, D genome presented the largest and the highest number of gaps among the genomes.

Analysis of population structure
To analyse the genetic diversity within the collection, the relatedness of the cultivars was firstly investigated with the Bayesian approach implemented in Structure software [32]. Evidence of significant population structure was provided by the analysis based on the method of [33]. The maximum ΔK value occurred at K = 2, so two subpopulations (referred as Sp1 and Sp2) were identified. The Q matrix (membership probability estimates) was extracted from Structure runs and each cultivar was assigned to one of the two subpopulations based on a membership probability >0. 51. The Sp1 contains the majority of the cultivars (79 cultivars), while Sp2 consisted of the remaining 14 cultivars. The separation of the two subpopulations reflects the breeding history of the cultivars. Since the algorithms in the Structure software assume independent loci, measured on randomly sampled unrelated individuals, another Bayesian clustering was carried out with a reduced marker set. The analysis of these 300 unlinked markers also led to a very similar result, two subpopulations were identified. PCoA was used as an alternative way of analysing and visualising population structure. The first two principal coordinates explained 15.2% and 11.2% of the molecular variance, and the separation into two subpopulations was confirmed by this independent analysis (Fig 1). The DArTseq1 marker-based phylogenetic structure of the wheat collection with the two subpopulations, Sp1 and Sp2 is presented in S2 Fig. The distribution of molecular variation among and within the two subpopulations was estimated by AMOVA, which revealed 23% (p 0.001) of total genetic variation partitioned among subpopulations, whereas 77% of the variation was maintained within subpopulations. Additionally, the subpopulations were compared on the basis of the phenotypic data collected and significant differences were found in five out of six (3 year × 2 N level) environments with respect to GPC, NUpE, NUp grain and NUp full , and in 4 cases with respect to GNACE, HI, GN, HD, NUE, GY (S2 Table).

Linkage disequilibrium
LD was calculated for the entire population by pairwise marker r 2 for each chromosome. The number of intra-chromosomal pairs, the number of significant marker pairs, critical r 2 , mean r 2 -values and the distributions of LDs are detailed in Table 1. Furthermore, LD analysis revealed differences between chromosomes (S3 Table). In the entire population 118,398 (35.2%) intra-chromosomal pairs showed a significant level of LD ( Table 1). Analysis of the LD in the different wheat genomes revealed, that the highest number of the significant pairs and also the pairs that are in perfect LD (i.e., where r 2 = 1 and D' = 1) was found in B genome. Only 10,543 pairs showed significant LD in the D genome (compared to 29,405 in the A and  Table 1. Overview of intra-chromosomal LD in the whole genome and in the three genomes in the entire population.

Marker pairs total
No. and rate a of significant (p<0.05) marker pairs (%) Mean r 2 of all marker pairs critical r 2 78,450 in the B genome) because of the lower marker density. On the other hand, the rates of significant and linked marker pairs were much higher on the D genome than on the A or B genome. The comparison of the mean r 2 -values showed that the D genome had a higher mean r 2 -value (0.221) with respect to the B (0.102) and A (0.062) genome. The highest mean r 2 -values occurred on 2D and 1B chromosomes, while the lowest ones on 3A, 7A, 3B and 7B chromosomes (S3 Table). The critical r 2 -value was quite similar in the three genomes, ranging from 0.2137 to 0.4371, with the maximum and minimum values on chromosomes 1B and 4D, respectively. The LD decay in the three genomes, including the Loess curve, is illustrated on Fig 2. The LD decay, estimated as the intercept of the Loess curve at the line of critical r 2 -value, was 9 cM for the whole genome and 9 cM and 30.5 cM for the B and D genomes, respectively. In the A genome the Loess curve did not intercept the critical line. The marker pairs in total LD had a mean distance of 8.44 cM in the entire population.

Characterisation of the phenotypic traits
Phenotypic variation: The phenotypic performance of the entire population and the subpopulations across the six environments for the 15 investigated traits are shown in S2A Table. Large phenotypic differences were observed in the population for all traits. Grain yield ranged from 2.33 t ha -1 (in 2012-2013 at N0) to 6.65 t ha -1 (in 2013-2014 at N120). Strong environmental effect was observed, causing remarkable variation of phenotypic traits between seasons. For most of the traits greater differences have been found between seasons than between N treatments. However, N fertilisation had significant effect on most of the studied traits (S2A Table). The significant difference of TGW, NUtE, NHI and HI varied between seasons, while N fertilisation did not have significant effect on HD. The largest differences between N0 and N120 were detected for GY, NUp grain , and NUp full (S2A Table). Other traits (except HD) also showed significant but moderate changes in response to N level. For example, HI changed only by 10%, while NUp grain changed by 36% from the 3 years' average. The values of NUE, NUpE and GNACE greatly varied between seasons, probably because these traits strongly depend on the theoretically available N in the soil, which was 21  Table), but none of the traits showed significant difference in all environment. Consequently, the phenotypic difference between the two subpopulations was highly influenced by environmental effects. In the last season (2014-2015), when the plant nutrient availability was moderate, Sp2 significantly differed from Sp1 for most of the investigated traits.
Analysis of variance components: The ANOVA results of the different traits are presented in S4 Table. The highest proportion of the variance was explained by environmental effect for most of the traits except for TGW and HI. In their cases the genotypic factor was more relevant. These results also indicated that the environment (including the effect of different N mineral ) was the most important factor in explaining the overall phenotypic variance, however, the input level and the genotype effect also had significant role in it. The genotype effect explained the second highest proportion of the phenotypic variance for most of the traits (9 traits). Genotype effect was highly significant for 14 traits, but not for NUpE and GNACE. Separate analysis of the years revealed that genotype had no significant effect on NUpE and GNACE in 2012-2013, when the lowest level of N mineral was observed. The input level effect was significant for all traits except for TGW and HD (S4 Table); moreover, it explained the second highest proportion of the phenotypic variance for NUE, GNACE, NUpE, NUp full and NUp grain . Among the two-way interactions, genotype × environment interaction explained the highest proportion of the variance for most of the traits and it was significant for 11 traits. Additionally, very strong input level × environment interaction (at p>0.001 level) was found for 10 traits and moderate effect for NUtE and SY (at p>0.05), while no significant input level × environment interaction was found for NUp grain , PH, SN and NUp full (S4 Table).
Correlations between traits: Correlation analysis was performed separately for all environments (three cropping seasons, two N fertilisation levels) for the most relevant traits, and several significant correlations were found at p<0.01 level ( Table 2). NUE was calculated as GY divided by available N soil ; therefore, the correlation analysis of NUE and GY with any other trait in the six environments separately shows the same result, hence only NUE was visualised in the analysis. Higher correlation coefficient was observed between NUE (GY) and NUpE than between NUE (GY) and NUtE in all environments. Additionally, GNACE correlated very strongly with both NUE (GY) and NUpE. GPC showed a negative correlation with NUE (GY) and NUtE, except in the season of 2012-2013 when GPC was not significant with NUE (GY). NHI correlated positively with NUE (GY) and negatively with GPC in most environments. The correlation between NUE (GY) and GN was stronger than the correlation between NUE (GY) and SN, except in the season of 2013-2014. TGW was not correlated with NUE (GY), but TGW showed slightly negative correlation with SN and GN in most of the environments.

Genome-wide mapping of agronomic and N-related traits
Genome-wide association analysis was performed for 11 investigated traits (GY, NUE, NUpE, NUtE, NUp full , NUp grain , GPC, NHI, GNACE, GN, SN) in all environments (three cropping seasons, two N fertilisation levels) and for the response to N fertilisation for 8 selected traits in order to identify genomic regions involved in the response to N fertilisation. Altogether, 183 MTAs were identified for 130 DArTseq1 markers. Most chromosomes were involved in determining at least one trait except chromosome 1D and 3D. All the details of the significant MTAs defined above are given in Table 3, while their chromosomal location is presented in

Characterisation of the effects of different environments, fertilisation and genotypes
The analysis of variance revealed that the environmental effect was the main factor controlling most of the studied traits (S4 Table). Strong interaction with the environment poses major challenge to developing N-related traits [43]. "Environmental effect" in the actual experiment indicates not just the considerable effect of the weather conditions but it also interprets the differences in the soil N supply. Although, the soil type at each location was chernozem, the average N mineral content significantly differed between years: it was 21, 494 and 78 kgha -1 in 2013, 2014 and 2015, respectively. Not surprisingly, the environmental effect (including the effect of different N mineral ) had greater influence and caused larger differences for most of the traits than the input level itself (120 kgha -1 N fertiliser) (S4 Table; S2 Table). We found that the input level was not significant for HD; moreover, in the case of HI and TGW, the "genotype effect" explained the highest proportion of the phenotypic variance (S4 Table). These traits are known to be highly influenced by genetic factors: the highly heritable traits are mainly controlled by the genotype, and the high heritability of these traits is well documented [44][45]. Genotype explained the second highest proportion of the phenotypic variance for most of the traits (GY, NUtE, NHI, HD, PH, SN, GN and GPC) which indicates existing genetic variability in the population to improve these traits.
The effect of the input level was significant for most of the traits except TGW, HD (S4 Table), and it explained the second highest proportion of the phenotypic variance for NUE, GNACE, NUpE, NUp full and NUp grain . NUE; GNACE and NUpE values were influenced by the amount of N available for plant. Unfortunately, the accurate determination of the amount   of N available for plant is very difficult because it is very expensive, time-consuming and labour-intensive to accurately determine N losses and immobilisation through the whole season. Moreover, it has been proved that all the applied N is never available for the plant [10]. Therefore, while interpreting these traits, we have to account for N losses; otherwise the result  can be misleading. For example, the decreasing efficiency at the higher N supply was not necessarily due to decreased plant efficiency but it was mostly because of the greater losses from the system. N fertilisation significantly increased GY and its components (GN and SN) in all seasons (S2 Table). HI was significantly increased by the N treatments in 2012-2013 and 2014-2015. Wheat crops generally respond to the increased N rates with higher HI until an optimum N supply is reached. Then the excessive N application leads to a decline in HI [46]. Since we did not find such decline in population level, it can be presumed that the N fertilisation was not excessive for most of the cultivars.
As an overall consideration, it can be concluded that the "genotype × environment" interaction caused important contribution to variation for most of the traits in our study, indicating that genotypes reacted differently to varying environmental conditions.

Relationship between the parameters
Correlation coefficients between NUE and its components revealed that NUpE explained much more of the variation than NUtE in all seasons ( Table 2). Our results are consistent with previous studies which declared that contributions of NUpE and NUtE to the overall NUE depends mostly on the level of N supply. When the available N is rare, the ability to absorb N is more important, which is mostly related to root characteristics. On the contrary, at high N levels, variation in NUE is mainly due to differences in NUtE. Accordingly, when N uptake is not limited by N soil or climatic conditions, NUtE and HI become more important to produce higher yield.
In our current study, GPC showed negative correlation with NUE, GY and NUtE, while GPC showed mostly no significant correlation with NHI ( Table 2). The inverse relationship between GPC and NUtE in winter wheat is well known. If GY is increased at fixed NUpE, it can cause a decline of GPC, unless NHI is increased [15]. In practice, unfortunately, the simultaneous increase of NHI and GY is very difficult. It was highlighted by [15] that a better postanthesis-N-remobilization is important to increase NHI, nevertheless a high GY is remarkably depending on canopy photosynthesis, which requires a maintained minimal leaf N concentration. The early transfer of photosynthetic N from leaves and stem would lead to a reduced photosynthesis and, as a consequence, to a reduced GY. Therefore, optimisation of timing and spatial patterns of senescence and N remobilisation is desired to increase yield while maintaining GPC.
The analysis of variance revealed that in the years 2012-2013 and 2014-2015 the variation in GN explained most of the variations in GY followed by SN, while in 2013-2014 an opposite tendency was observed. Correlations of NUE (GY) with GN and SN over cropping seasons confirmed the previously mentioned results ( Table 2). It was described, that SN is determined earlier in the growing season than GN and TGW [47]. This period precedes the double ridge stage and it ends at booting stage. In the year 2013-2014 the high N soil and the good soil moisture content increased the spike numbers by a greater production of tillers initially and/or by a higher percentage of tillers that survived to form spikes at maturity. Since we have not monitored tillering, it is not possible to establish which one affected the spike number more effectively. The year 2012-2013 was characterised by low N mineral with severe drought after tillering and probably this complex effect led to the decreased spike number. In the last cropping season (2014-2015) the rainy winter and the mild spring stimulated the vegetative development and the forthcoming spike formation.
GN is determined firstly by the number of spikelets and then by the number of florets that produce grain. Therefore, N deficiency can affect GN during a long period lasting from the double ridge stage to the anthesis. In the year 2013-2014, not just the SN but the GN was also high, indicating a high N mineral and an ideal soil moisture condition. In the 2014-2015 growing season the SN was high, however, low TGW and GN were measured since a drought period began in the spring, which greatly reduced the number of grains per spike.
TGW explained only a minor variation in GY. It showed a slightly negative correlation with SN and GN in most of the seasons as reported by other authors too [46]. It is not surprising, since TGW is mainly dependent on carbohydrate availability, rather than N assimilate supply [16]. SN and GN are determined earlier than TGW. If no sufficient assimilation occurs in the late growing season, the high total grain number (determined earlier in the season) may lead to assimilate partitioning, thus the TGW may decrease.

Distribution of the markers among genomes, population structure and linkage disequilibrium
Among the polymorphic markers, 37% were located on the A, 50% on the B and only 13% on the D genome. A very similar distribution of the polymorphic DArT and DArTseq1 markers were observed in bread wheat collections [38,44,48]. Previous studies reached the conclusion, that larger proportion of genetic diversity is originated from the tetraploid ancestor of hexaploid wheat, much more than from Aegilops tauschii, the source of the D genome [49]. Other studies concluded that high marker density is needed to capture the existing variation for gene localisation on the D genome [17]. In our current work the 3,290 polymorphic mapped markers were sufficient to explore the whole genome with the exception of some chromosomes on the D genome.
Two subpopulations (Sp1 and Sp2) were identified with the Bayesian approach implemented in Structure [32] and confirmed by PCoA. The majority of the 79 cultivar belonged to Sp1, while Sp2 incorporated the remaining 14 cultivars. This structure is consistent with the breeding history of the cultivars. Cultivars in Sp1 mainly originated from Central-Europe and Hungary, while the cultivars in Sp2 descended from Mironovskaya and/or Mv Magdaléna varieties, these cultivars in Sp2 were marketed between 1996 and 2009 by Elitmag Ltd. Martonvásár, Hungary. Separate analysis of chromosomes suggested that different selective pressures may act on chromosomes 1B and 2D between the two subpopulation (S3 Table), which may be related to the presence loci related with important agronomic traits for wheat breeding [50]. High number of pairs with complete LD and high number of linked pairs on chromosome 1B may be related with the 1BL.1RS wheat-rye translocation, which has been introduced in the areas of the former Soviet Union as a source of resistance against foliar fungal diseases through the widely used Russian varieties Bezostaya-1, Aurora and Kavkaz [51][52][53]. Large number of wheat varieties has been developed from crosses involving these varieties. Moreover, these high-yielding, early-maturing, semi-dwarf wheat varieties also carry the Reduced height 8 (Rht8) dwarfing alleles on chromosome 2D [54]. In Hungary the two main breeding programs introduced different sources of resistance against foliar fungal diseases. In Martonvásár (Elitmag Ltd.,) the above mentioned Russian cultivars were frequently used; therefore, the 1BL.1RS wheat-rye translocation played a dominant role. Their progeny cultivar Mv Magdaléna was used for a long period to integrate the 1BL.1RS translocation into the novel varieties [52][53]. On the other hand, in the second main Hungarian breeding company (Cereal Research Nonprofit Ltd., Szeged, Hungary) the Sr36/Pm6 gene cluster was applied at high frequency. The high mean r 2 and high number of linked marker pairs we observed on chromosome 2D might be the result of Rht8 locus alleles. Rht8 dwarfing alleles on chromosome 2D is widely used in Central-Europe to reduce final plant height [38].
Moreover, under Hungarian environmental conditions (a temperate continental environment), the PPD-D1 gene, located on chromosome 2D, has the largest genotypic effect on heading date [55]. Timing of flowering is an important component of adaptation in continental climate to improve grain filling before the onset of desiccating summer conditions. In a world-wide wheat collection of 683 genotypes [55] investigated varieties from entire Europe, including genotypes from our collection. In this study allele frequencies for vernalisation response (VRN-A1, VRN-B1, VRN-D1) and for photoperiod sensitivity (PPD-B1, PPD-D1) were investigated. It was determined that the two most frequent allele combinations among the Central European genotypes carry the recessive (winter) allele for all three VRN1 genes with the sensitive allele type of the PPD-B1 gene, while they differed in PPD-D1. Therefore, the photoperiod-sensitive allele, PPD-D1 must also contribute the observed difference in LD on chromosome 2D between two subpopulations in our experiment.
We studied different aspects of LD in our panel. An average LD decay distance for our total population was found to be 9 cM. Numerous studies in wheat have reported LD decay distances; although, those reported LD decay distances vary greatly between studies. A wheat collection from Northern Europe was characterised by [44] on wheat DArTseq1 platform and the average intra-chromosomal LD decay was calculated separately for SNP markers and sili-coDArT markers. The average LD decay of the whole genome was observed to be at 20.59 and 9.51 cM for silicoDArTs and SNPs, respectively. Recently, an LD decay distance of 23 cM was observed for 94 European bread wheat genotypes using 1,849 genome-wide distributed DArT markers [38]. In another DArT marker-based study, 251 winter wheat lines displayed a narrower, 9.9 cM LD decay for the whole genome [56]. These differences may be caused by different genetic variability and dissimilar population size between the populations. In line with previous results [38] we found the highest LD on the D genome.

Marker trait associations
Here we describe the genetic characterisation of NUE and several agronomically relevant traits. Moreover we also report the identified genetic loci involved in the response to input level under different N fertiliser regimes. In general, B genome showed the highest number of MTAs (93) and it was followed by the A genome (75), while only 15 MTAs were found on the D genome. Among the homologous chromosomes, group 2, 5 and 7 contained the most significant associations, especially on 1B and 7A chromosomes (25 and 19 MTAs, respectively). Particularly, chromosome 1B proved to be highly rich in MTAs, since this chromosome contains the 1RS.1BL translocation, which transfers a number of genes of agronomic importance. The positive and negative impact of the 1RS.1BL translocation on agronomically important traits is well studied [25,57].
Different numbers of MTAs were detected between cropping seasons in our experiment. Considering, that the soil N content and the weather parameters (amount and timing of precipitation, timing and duration of drought, hot periods) were quite different in every year, the three experimental seasons can be considered as three different environments. We found that most of the MTAs were environmental-dependent and also that most of the MTAs were detected in one environment only. These environment-specific MTAs reflect regions associated with adaptation to specific environmental conditions. Twenty-eight markers were associated with multiple traits (S5 Table). This situation occurs when high LD exists between loci for different traits or when the pleiotropy is the main cause of the genetic correlation between traits. Identifying multi-trait MTA can help to understand the physiological control of multiple traits. Despite this environmental variability, 4 DArTseq1 markers were significant for the same trait in two different environments (detailed in Table 3). Three marker (clone ID 3021826, 2257090 and 1125985) were associated in both treatments in the same year for the same trait (GY, NUE and SN) indicating that they were not (or just partially) influenced by N fertilisation, while only one marker (clone ID 1121033) was significant for the same trait (GPC) in two different years. These genomic regions are the best candidates for more extensive studies because they may be involved in constitutive processes for agronomically important traits.
However in the previous works, the genes encoding the key enzymes of N metabolism and the QTLs determining their activities had been investigated mostly with SSR, SNP markers, hence it was not possible to co-localise and compare our results by these studies. Therefore, it would be useful integration of DArTseq1 markers with other marker systems to identify next-generation sequencing markers, which are physically linked to the known major genes responsible for adaptation to environmental conditions and also those genes which are involved in the carbon and N metabolism. This would facilitate the screening of allelic variation in the breeding populations and help investigate the influence of the above mentioned major genes on agronomically important traits.

MTAs for grain yield and its components
Complex traits like GY are regulated by a number of metabolic networks and many traits have a downstream effect on crop yield. Nine chromosomes were involved in the control GY and its most important components (GN, SN). Only few studies used GBS or DArT markers to investigate yield components, therefore it is difficult to align and compare our results by the earlier studies using different marker systems. Identified chromosomal regions affecting grain yield and its components are good candidates to be used in breeding programs to improve the performance of wheat varieties. 12 loci were identified that are involved in the variation of GY, and several of these MTAs correspond to QTLs already reported in other studies, using different wheat mapping populations. A QTL (QYld-1B) for GY was identified by [58] under drought conditions on chromosome 1B. In the same region an MTA (clone ID 3021826) was observed in our study. In a study by [59] two QTLs (QGY.ndsu.2A.1 and QGY.ndsu.2A.2) were identified for GY which correspond to two MTAs (clone ID 1081450 and 1111004) we located on the same position on chromosome 2A. Two QTLs (wPt-7901 and wPt-6687) were observed by [25] for GY on chromosome 2A, where two MTAs (clone ID 2322279 and 2289171) were identified for main yield components (GN and SN) in the present study. Another MTA associated with GY was identified in the present study on chromosome 3B, close to the significant wPt-4209 marker for GY described by [25]. In the present study MTA for GY (clone ID 1073864) was observed on 7AL; in the same region a DArT marker (wPt-6495) was identified earlier by [25]. Our study also introduces loci affecting GY and its components that have never been discovered previously.
SN is the most easily identifiable yield component. On 11 chromosomes, 17 markers were significantly associated with SN. With a single exception on chromosome 5B, only trait-specific MTAs were observed. Some of these MTAs hit genomic regions that were previously reported to affect SN. A significant DArT marker (wPt-9277) for SN identified by [23] in close proximity to MTA (clone ID 2289171), observed in the present study on chromosome 2A. An MTA (clone ID 1113641) for SN is co-localised to the same region as a QTL (QNS.ndsu.7A) and the significant wPt-5153 marker was observed on chromosome 7A by [23] and by [59]. Moreover, MTA associated with SN (clone ID 1137959) was identified in the present study on chromosome 4A, close to the significant wPt-0817 marker [23].
Grain number per spike is an important yield component that has been studied extensively in the past. However, the identified QTLs were not consistent across treatments and cropping seasons, suggesting that the environment had considerable effect on the expression of the genes associated with GN. 15 markers were found to be significantly associated with GN and spread over 10 chromosomes. A QTL (on wPt-9793 marker) for GN was identified [23] in close proximity to an MTA (clone ID 2322279) observed in the present study on chromosome 2A. A QTL (interval between wPt-8604 and wPt-5175) for grains per square meter was identified on chromosome 5B by [60]. In our study MTAs affecting GN were mapped in this region. An another MTA (clone ID 1285290) associated with GN was identified in the present study on chromosome 5B, which coincide with the significant marker wPt-6726 identified earlier by [23].

MTAs for NUE and N-related traits
Each step contributing to NUE, i.e. the uptake, translocation, assimilation, and remobilization of N, is controlled by multiple interacting genetic and environmental factors. Most of the MTAs for NUE and its components were environment-specific in our study. We identified 12 significant DArTseq1 markers for NUE on chromosomes 1A, 1B, 2B, 3A, 3B, 4B, 6A and 7A, across all treatments and cropping seasons. The identified MTAs will facilitate the exploration of the possible structural or regulatory genes controlling NUE under different, specific environmental conditions. Previous studies revealed the importance of NUpE in NUE improvement [11,13]. In our study efforts were made to investigate not just NUpE but the total amount of N accumulated in the above ground biomass (NUp full ).
Genomic regions identified for NUp full by [7], for N uptake by [18] and [19] and for NUpE and NUtE by [61] overlap with regions identified in our study, suggesting that some related regions may be significant. We found 11 markers significantly associated with NUpE and spread over 6 chromosomes (1B, 2B, 3A, 5A, 5B, 7A). Usually these markers were also significant for NUp full , NUp grain and GNACE N-related traits. Likewise in previous studies, chromosome 1B was found to be rich in (5) MTAs for NUpE. In our study 12 NUp full MTAs were presented on chromosomes 1A, 1B, 2B, 3A, 3B, 5A, 5B and 7A. Significant markers for NUp full were related to more than one trait in most cases; besides NUp full, they were also significant for NUpE and GNACE. 8 MTAs occurred in the extensive management system, while under intensive management only 4 MTAs were found.
N utilisation is the other important component of NUE, depending on both N assimilation and remobilisation [22]. In our study 10 MTAs were spread along on 7 chromosomes for NUtE. Most markers were trait-specific, while the remaining markers were significant with GPC. This might to be expected, since an increase of NUtE could reduce grain protein concentration. In our study 8 markers were identified for NHI as significant; altogether, 7 chromosomes were involved. Previous studies identified several QTLs for GPC in winter wheat [23,47,[62][63][64]. In our study a total of 16 MTAs for GPC were detected across treatments and cropping seasons and the significant markers were spread along 10 chromosomes. Most of these trait-specific MTAs occurred in the vicinity of markers significant for NUtE or yield-related traits. Our findings may help to maintain or improve GPC under various N-conditions.
In our study differences in the ability to increase the harvested grain N content was also investigated (NUp grain ). Moreover, the efficiency of the plant to take up, translocate and incorporate N into grain protein was also examined: GNACE measures the overall efficiency with which plants extract N from the soil and accumulate it in the grain [10]. 15 markers were found to be significantly associated with GNACE and NUp grain , spreading over 8 and 9 chromosomes. In most cases these two traits showed significant association on the same markers. Furthermore, usually the same markers were also significant for N uptake. The wPt-1554 marker was in LD with chloroplastic/plastidic glutamine synthetase (GS2) gene on chromosome 2D [22], and in close proximity to this location several MTAs associated with GNACE and NUp grain were identified in our study.

MTAs for responsiveness to N fertilisation
Several significant QTLs have been detected in wheat under low N stress and normal N growth conditions which helped to partially understand the genetic control of NUE and related traits [7,20], although only few studies [18][19] characterised the responses of the agronomic traits to N fertilisation. A novel method, an association genetic study [22] used 3 computed parameters to characterize the response to N input level (for GY and grain number per m 2 ): the difference and the ratio between performance at 2 N input levels and the slope of joint regression.
The present study provides an opportunity for the identification of MTAs induced by response to N fertilisation (RN). For such a characterisation, 8 traits (GY, SN, GN, GPC, NHI, NUp full , NUp grain and NUtE) were expressed as the ratio between N120 and N0. The genomic regions involved in RN included 38 MTAs spread over 14 chromosomes ( Table 3). Most of the identified RN MTAs do not overlap with regions identified for agronomically important traits in our study. Only 2 RN MTAs showed significant association with markers also linked to MTA for agronomic parameters identified in this study. Response to nitrogen level assessed for GN (GN_NR) showed significant association with 9 loci, for NUtE_NR with 6 loci; for GPC_NR, NUp full _NR and SN_NR with 5 loci, for NUp grain _NR with 4 loci and for GY_NR and NHI_NR with 2 loci. A large number of RN MTAs were detected on chromosome 7A, 7B and 3B. A DArT marker (wPt-8330) was found [22] to be involved in the response to N level on chromosome 2D where two MTAs (clone ID 991221) were identified for NUp full _RN and NUp grain _RN in the present study. In the same study two significant DArT markers (wPt-1250 and wPt-5153) were identified for response to N level on chromosome 5B and 7A; in this position MTAs for GPC_RN and NHI_RN (clone ID 1106769 and 1060475) were found in our study.

Conclusions
Here is presented a GWAS analysing NUE with its components and related agronomically relevant traits in a panel of commercially relevant Central European winter wheat cultivars genotyped on DArTseq1 platform. In our current study the marker density was sufficient to identify MTAs almost genome, wide with the exception of some chromosomes of the D genome. So it was concluded that a higher polymorphic marker density is needed on the D genome to successfully reveal the genetic variation in all the homologous chromosomes. Since DArTseq1 is a relatively new approach in cereal genetics, it was a challenge to compare our results with numerous earlier studies using different markers systems. Novel works, based on this marker system and the appearance of comparative maps will definitely facilitate such effort. In our study chromosomal regions involved in the regulation of N metabolism were localised on many chromosomes, many of them coinciding with major genes or QTLs detected in previous studies, indicating the precision of the association study. Furthermore, many DArTseq1 markers were identified being significantly associated in regions where neither genes nor QTL had previously been reported for these traits.
The soil N content and the weather parameters were quite different in every year, therefore strong environmental effect and G × E interaction were observed for most of the examined traits. As a consequence most of the identified MTAs were environmental-dependent. These environment-specific MTAs may reflect regions associated with adaptation to specific environmental conditions. Since our cultivar collection is well-adapted to the Central European region. On the other hand, MTAs identified in the present study may be considered as first step for the understanding of the genetic basis of NUE and agronomically important traits. The determination of those genes by candidate gene identification efforts, which are anchored by markers identified by GWAS may help to explain such complex traits as nitrogen use efficiency at molecular level.
Supporting information S1 Table. Wheat varieties investigated in the recent study. (DOCX) S2 Table. The average phenotypic values of the 15 traits measured in the whole panel and in the two subpopulations (Sp1 and Sp2), at two input levels (N0: 0 kg N hectare-1 and N120: 120 kg N hectare-1). Significant differences between the two treatments (A) and between the two subpopulations in the given traits (B) are indicated with asterisks. (DOCX) S3 Table. Overview of intra-chromosomal LD on each chromosomes in the population.