Association mapping for general combining ability with yield, plant height and ear height using F1 population in maize

General combining ability (GCA) is an important index in the breeding of maize inbred lines. To identify the genetic loci underlying GCA and associated agronomic traits, an association analysis with 195 SSRs was performed on the phenotypic traits of 240 F1 hybrids derived from 120 elite inbred lines, representing current maize breeding resources, crossed with 2 testers (Zheng58 and Chang7-2) and grown at two locations in 2018. The 20 association loci detected for grain yield (GY), plant height (PH), ear height (EH) and GCA of the three traits at the two locations explained 7.31%-9.29% of the phenotypic variation. Among the 20 association loci, 9 (7.31%-9.04%) were associated with GY, 4 (7.22%-8.91%) with GCA of GY, 1 (7.56%) with PH, and 3 (7.53%-8.96%) with EH. In addition, 3 loci (9.14%-9.29%) were associated with GCA of PH, whereas no locus was identified for GCA of EH. Comparing the association loci detected in Baoding and Handan, one locus (7.69% and 8.11%) was identified in both environments and one locus (7.52% and 7.82%) was identified for both yield and GCA of yield. The identification of GY-, PH-, EH- and GCA-related association loci could therefore not only provide references for high-yield breeding of maize, but also help clarify the relationships among GY, agronomic traits and GCA.


Introduction
Maize (Zea mays L.) is a major food crop, feed and industrial raw material, and one of the most widely cultivated crops worldwide [1]. High and stable yield has therefore been a primary long-term target of maize breeding. As a complex quantitative trait controlled by multiple genes [2,3], yield has low heritability and is easily influenced by non-genetic factors, whereas the component traits of yield have high stability and heritability [4]. Robinson [...]
[...] lines included a core collection of maize germplasm resources of China, derived lines and lines selected from American hybrids. Among them, the core collection resources largely covered the current genetic background of maize breeding and carried abundant genetic variation. In addition, these inbred lines cover the heterotic groups of the core inbred lines in China. Chang7-2 and Zheng58 belong to the Tang SPT and Reid groups, respectively, and are the parents of Zhengdan958, which had the largest planting area, 4.0 × 10^6 hectares, accounting for 20% of maize production [25]. The 120 inbred lines of the association panel were crossed with the 2 testers in an NC II design at the Hainan maize breeding experiment station in the winter of 2017.
The 240 F1 crosses were grown in a randomized block design with two replications. Each material was planted in a two-row plot with 0.60 m between rows, 4.0 m row length and 0.25 m between plants, at a population density of 65,000 plants per hectare, at the Experimental Station of Hebei Agricultural University in Baoding (115.48˚E, 38.85˚N) and the Experimental Station of the Agricultural Academy of Handan (114.47˚E, 36.60˚N) in the summer of 2018. All plots were surrounded by guard rows and were managed according to local maize cultivation practice throughout the growth period. Drought stress did not occur during the growing season in either year.
PH (cm) and EH (cm) were measured in each plot after anthesis and silking on 10 consecutive plants, the third to the seventh plant of each row. PH was taken as the distance from the base of the plant to the top of the first tassel branch, and EH as the distance from the base of the plant to the node bearing the primary ear [26]. All ears, except the first and the last ears of each line, were harvested after the materials reached maturity. After the ears had dried naturally to a moisture content below 18%, ten typical ears were selected for the analysis of yield traits.

Phenotype analysis
Phenotype data (GY, PH and EH) were processed and GCA was calculated following the method developed by Wen et al. [27]. Descriptive statistics and the analysis of combining ability effects were carried out using SPSS Statistics v21.0. The statistical model used to calculate GCA was $g_i = \bar{y}_{i.} - \bar{y}_{..}$, where $g_i$ is the GCA effect of inbred line $i$; $\bar{y}_{i.}$ is the mean of the crosses between inbred line $i$ and the two testers; and $\bar{y}_{..}$ is the mean over all crosses in the mating design.
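The GCA model above can be illustrated with a minimal Python sketch. This is not the SPSS procedure used in the study; the function name `gca_effects`, the input layout and the plant-height values are hypothetical and chosen only to show the arithmetic.

```python
from statistics import mean


def gca_effects(f1_means):
    """Return the GCA effect g_i = mean(y_i.) - mean(y_..) for each line.

    f1_means maps (inbred_line, tester) -> trait mean of that F1 cross.
    """
    # Grand mean over every cross in the mating design.
    grand_mean = mean(f1_means.values())
    # Collect the crosses of each inbred line across testers.
    by_line = {}
    for (line, _tester), value in f1_means.items():
        by_line.setdefault(line, []).append(value)
    return {line: mean(vals) - grand_mean for line, vals in by_line.items()}


# Hypothetical plant heights (cm) for two lines crossed with both testers.
example = {
    ("LineA", "Zheng58"): 200.0, ("LineA", "Chang7-2"): 240.0,
    ("LineB", "Zheng58"): 220.0, ("LineB", "Chang7-2"): 260.0,
}
gca = gca_effects(example)
```

With a balanced design such as this one, the GCA effects sum to zero across lines, which is a quick sanity check on the calculation.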

SSR genotyping
Seedlings of the 120 inbred lines and 2 test lines were grown in the greenhouse, and DNA was extracted at the four-leaf stage using the CTAB method and diluted to 50 ng/μL. The purity and concentration of the DNA were assessed by optical density (OD260/OD280), and DNA quality was checked by 0.8% agarose gel electrophoresis. A total of 432 SSR primer pairs were obtained from MaizeGDB (http://www.maizegdb.org/) and chosen to ensure an even distribution across the chromosomes. From these, 195 SSR primer pairs with high polymorphism and clear bands were selected by screening 10 inbred lines chosen at random from the 120 inbred lines. The 195 SSR primer pairs were distributed across the whole maize genome. The genotypes of the 120 inbred lines and 2 test lines were identified by PCR amplification and electrophoresis using the method of Xu et al. [28]. The SSR amplification reactions were carried out in 96-well microtiter plates using a Bio-Rad Thermal Cycler T100 (USA). At each mobility position, banding patterns were recorded as 0 (no band), 1 (a bright band) or 9 (missing). All records were used to build a genotype database.
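As a minimal sketch of how the 0/1/9 scoring scheme might be handled downstream, the Python snippet below computes the frequency of a band among scored lines and the proportion of missing records. The helper names and the example scores are hypothetical; only the coding (0 = no band, 1 = band, 9 = missing) comes from the text.

```python
MISSING = 9  # code used for a missing record


def band_frequency(scores):
    """Frequency of band presence among non-missing records (None if all missing)."""
    for s in scores:
        if s not in (0, 1, MISSING):
            raise ValueError(f"unexpected score: {s!r}")
    called = [s for s in scores if s != MISSING]
    if not called:
        return None
    return sum(called) / len(called)


def missing_rate(scores):
    """Proportion of records coded as missing."""
    return sum(1 for s in scores if s == MISSING) / len(scores)


# Hypothetical scores for one band position across five inbred lines.
scores = [1, 0, 1, 9, 1]
freq = band_frequency(scores)
miss = missing_rate(scores)
```

Excluding the missing code before computing frequencies avoids treating 9 as a numeric allele value.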

Population structure analysis
A set of 195 SSRs distributed on the 10 chromosomes was used to identify the structure of the 120 inbred lines with Structure v2.3.4 [29]. Structure, which uses a Bayesian clustering method, was run three times for each K (set from 2 to 10) to estimate the genetic components. Both the length of the burn-in period and the number of MCMC replications were set to 500,000 in each run. According to the maximum likelihood criterion, the most suitable K value was determined, taking the genealogy of the inbred lines into account, using the ΔK method of Evanno [30]. A genetic component (Q value) of 50% was used as the dividing line to determine which population each inbred line belonged to. Lines with membership probabilities greater than 50% were placed into the corresponding groups, while those with membership probabilities lower than 50% were allotted to a "mixed" group [31].
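The two decision rules in this section can be sketched in Python. The ΔK calculation below assumes the commonly used form |mean L(K+1) − 2·mean L(K) + mean L(K−1)| / sd(L(K)), computed from the Ln P(D) values of replicate runs; this is an assumption about how ΔK was obtained, not a statement of the exact procedure used here. The function names and all numbers are hypothetical.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp):
    """lnp maps K -> list of Ln P(D) values from replicate Structure runs."""
    delta = {}
    for k in sorted(lnp):
        if k - 1 not in lnp or k + 1 not in lnp:
            continue  # Delta K is undefined at the ends of the K range
        second_diff = abs(mean(lnp[k + 1]) - 2 * mean(lnp[k]) + mean(lnp[k - 1]))
        sd = stdev(lnp[k])
        if sd > 0:
            delta[k] = second_diff / sd
    return delta


def assign_group(q_row, names, cutoff=0.5):
    """Assign a line to the group with Q > cutoff, otherwise to 'mixed'."""
    best = max(range(len(q_row)), key=q_row.__getitem__)
    return names[best] if q_row[best] > cutoff else "mixed"


# Hypothetical Ln P(D) values for three replicate runs at K = 2, 3, 4.
lnp = {2: [-1000.0, -1002.0, -998.0],
       3: [-900.0, -901.0, -899.0],
       4: [-890.0, -892.0, -888.0]}
delta_k = evanno_delta_k(lnp)

groups = ["Reid", "PB", "Lancaster"]
g1 = assign_group([0.70, 0.20, 0.10], groups)
g2 = assign_group([0.40, 0.35, 0.25], groups)
```

A membership of exactly 50% is not covered by the rule in the text; the sketch places such a line in the "mixed" group, which is a design choice of the example.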

Genetic diversity analysis
The genotype database of the 120 inbred lines was used to calculate the gene diversity index, the number of alleles per locus, and the polymorphism information content (PIC), the most widely used parameter for evaluating genetic diversity, using PowerMarker v3.25 [32]. In the underlying statistical models, H is the gene diversity of a given locus; n is the number of materials; p ij is the frequency of the j th allele at the i th locus; and p i and p j are the frequencies of the i th and the j th alleles.
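For orientation, the sketch below computes gene diversity and PIC for a single locus from allele frequencies, assuming the standard textbook definitions H = 1 − Σ p_i² and PIC = 1 − Σ p_i² − Σ_{i<j} 2 p_i² p_j². PowerMarker may apply sample-size or inbreeding corrections that are not reproduced here, so the values are illustrative only; the function names and allele calls are hypothetical.

```python
from collections import Counter


def allele_frequencies(alleles):
    """Allele frequencies at one locus from a list of allele calls."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def gene_diversity(freqs):
    """H = 1 - sum of squared allele frequencies (uncorrected)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return 1.0 - homozygosity - cross


# Hypothetical locus with two equally frequent alleles.
freqs = allele_frequencies(["a", "a", "b", "b"])
h_value = gene_diversity(freqs)
pic_value = pic(freqs)
```

Under these definitions PIC never exceeds H at a locus, which is in line with the average values of 0.7016 and 0.6593 quoted in the Discussion.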

Linkage Disequilibrium (LD)
LD in the 240 F1 was evaluated for SSR marker pairs using the parameters r2 (squared allele-frequency correlation) and D' (linkage disequilibrium coefficient) in TASSEL V3.0 with a sliding window size of 50. A pair of loci was regarded as being in linkage disequilibrium when P<0.01.
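To make the two LD parameters concrete, the sketch below computes r2 and D' for a pair of biallelic loci from haplotype and allele frequencies. This is a simplification: SSRs are multi-allelic, and the way TASSEL combines alleles is not reproduced here. The function name and the input frequencies are hypothetical.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (r2, D') for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    for p in (p_a, p_b):
        if not 0.0 < p < 1.0:
            raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    # D' scales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return r2, d_prime


# Hypothetical frequencies: D = 0.40 - 0.25 = 0.15.
r2, d_prime = ld_stats(p_ab=0.40, p_a=0.50, p_b=0.50)
```

The example shows why the two statistics can differ substantially: D' depends only on how close D is to its maximum, whereas r2 also reflects the allele frequencies.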

Association analysis
The genome-wide association analysis was performed in TASSEL V3.0 [33] using a mixed linear model (MLM). The 195 genome-wide SSRs and the 3 trait phenotypes were analyzed together with the population structure matrix (Q matrix) and the kinship matrix (K matrix), which served as covariates to reduce spurious associations [34], to detect marker loci associated with the target traits.
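For readers unfamiliar with the Q + K approach, the mixed linear model is commonly written in the following general form. The notation is an assumption made for illustration (it is not given in the text), and the factor 2 on the kinship matrix is one common convention rather than a confirmed setting of this analysis.

```latex
\begin{aligned}
\mathbf{y} &= X\boldsymbol{\beta} + Q\mathbf{v} + Z\mathbf{u} + \mathbf{e},\\
\mathbf{u} &\sim N\!\left(\mathbf{0},\, 2K\sigma_g^{2}\right), \qquad
\mathbf{e} \sim N\!\left(\mathbf{0},\, I\sigma_e^{2}\right)
\end{aligned}
```

Here $\mathbf{y}$ is the vector of phenotypes, $\boldsymbol{\beta}$ the fixed marker effects, $\mathbf{v}$ the fixed effects of the subpopulation memberships in $Q$, $\mathbf{u}$ the random polygenic effects whose covariance is proportional to the kinship matrix $K$, and $\mathbf{e}$ the residual.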

Phenotype analysis
Trait phenotypes and combining ability were analyzed in this study, and the results are shown in Table 1. PH of the F1 ranged from 172.80 cm to 267.80 cm with the tester Zheng58 and from 185.90 cm to 322.80 cm with Chang7-2. In addition, EH of the F1 ranged from 64.60 cm to 121.20 cm with Zheng58 and from 92.60 cm to 165.60 cm with Chang7-2. As shown in the table, for a given tester the same trait showed a similar trend but differed significantly. Furthermore, the differences in the traits between Baoding and Handan indicated that GY, PH and EH might be susceptible to the environment, which means that environmental variables should be taken seriously in breeding programs. GCA of yield ranged from -0.63 to 0.65 in Baoding and from -0.79 to 1.11 in Handan. GCA of PH and EH ranged from -31.38 to 62.77 and from -18.41 to 33.49, respectively. Moreover, both of the highest GCA values were obtained from material 78599. The analysis of variance for GY is listed in Table 2. As shown in Table 2, the effects of environment and genotype were significant at the 0.01 level, indicating the important roles of both genotype and environment.

Population structure analysis
The results of the population structure analysis are shown in Tables 3 and 4. The K value and ΔK were evaluated (Fig 1A). The peak value of ΔK was observed at K = 5. Therefore, the research population was divided into 5 subpopulations, namely Lancaster, PB, Tang SPT, Reid and Lvda Red Cob (LRC) (Table 4, Fig 1B). Accordingly, the structure of the research population was not complex and was satisfactory for applying the population structure analysis.

Kinship analysis
Kinship among the 120 inbred lines was analyzed with the 195 SSR markers using SPAGeDi-1.3d. The results showed that 58% of the population had a relative kinship value (K) of 0, indicating no relationship, and only 6% of the materials showed high kinship (K>0.5) (Fig 2).

Genetic diversity analysis
Genetic diversity among the 120 maize inbred lines was analyzed with the 195 genome-wide SSRs using PowerMarker v3.25. A total of 1,478 allelic variations were obtained, and 2 to 26 alleles were detected per marker, with an average of 7.58. In this experiment, the genetic diversity ranged from 0.2950 (umc1794) to 0.9235 (bnlg1863) with an

Marker loci associated with GY, PH and EH
An association analysis between the 195 SSRs and GY, PH and EH was performed in our research. The -log10(p) value was taken as the measuring parameter, and a marker locus was considered significantly associated with a trait if -log10(p) > 2.5.
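The significance threshold can be restated on the p-value scale with a short Python sketch; -log10(p) > 2.5 is equivalent to p < 10^-2.5, roughly 0.0032. The function name and the test p-values are hypothetical.

```python
import math

THRESHOLD = 2.5  # -log10(p) cutoff stated in the text


def is_significant(p_value, threshold=THRESHOLD):
    """True if -log10(p) exceeds the threshold."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must be in (0, 1]")
    return -math.log10(p_value) > threshold


# Equivalent cutoff on the p-value scale (about 0.00316).
p_cutoff = 10 ** -THRESHOLD

hit = is_significant(0.001)   # -log10(p) = 3.0
miss = is_significant(0.01)   # -log10(p) = 2.0
```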

Discussion
GY, PH and EH are complex quantitative traits. As an effective method, association analysis can accelerate germplasm improvement and the study of complex quantitative traits. Population structure, kinship, genetic diversity and LD can all influence the accuracy of association analysis [36].

Population structure and kinship analysis
Xie et al. [37] showed that population structure influences the extent of LD and the results of association analysis, and divided Chinese maize germplasm into 6 groups. In the current study, the research population of 120 inbred lines was divided into 5 subpopulations at the selected K (Table 4, Fig 1), among which Reid was the largest with 30 materials, accounting for 25% of the materials, and PB was the second largest with 22 materials. All 5 groups contained core germplasm resources of China, indicating that the materials were suitable for association analysis. Complex relative kinship can increase the level of LD and the number of spurious associations. In the current study, 58% of the inbred lines in the experimental population had a relative kinship value of 0, meaning that more than half of the lines were unrelated, which would reduce the false positive rate.

Genetic diversity and PIC analysis
Genetic diversity is an essential factor for improving germplasm and studying complex quantitative traits [38]. In this study, the 120 inbred lines carried 1,478 allelic variations, with an average genetic diversity of 0.7016 and an average PIC of 0.6593. Hao et al. [39] reported an average PIC of 0.69 for 71 loci in modern Chinese wheat. Previous research has shown that the size of the sliding window is one factor affecting the estimated level of genetic diversity, with a larger sliding window leading to higher genetic diversity [40].
Higher genetic diversity and PIC support reliable association analysis. Xie et al. [37] showed that PIC was correlated with the LD level: loci with PIC>0.7 had a higher LD level than loci with PIC<0.5.

Linkage disequilibrium analysis
LD is the basis of association analysis. A successful association analysis relies on the possibility of detecting LD between the marker and the alleles underlying the phenotypic traits associated with the marker [41]. In our research, the pairwise markers had an average r2 of 0.041 and an average D' of 0.480. Wang et al. [42] studied 145 SSR markers, and the result showed that 63.89% of the markers had an LD range of 18.75%-40.28%. Overall, the research population had a feasible LD level.

Association loci for yield, PH, EH and the combining ability
GY, one of the most important targets in maize, has improved greatly through the use of heterosis. As a vital parameter for phenotypic traits, combining ability is being taken more and more seriously by breeders. Xiang et al. [43] found that GCA was more important than specific combining ability (SCA), suggesting that selecting for GCA in rice might be more effective. Moreover, QTL analysis and association mapping provide powerful tools for the genetic improvement of cereals, and the many loci for agronomic traits mapped on chromosomes offer an important theoretical foundation for marker-assisted selection (MAS) breeding. In this experiment, association analysis was used to detect GY-, PH-, EH- and GCA-related loci. A total of 20 association loci were identified, of which 7 were associated with GCA; no locus was detected for GCA of EH. The genetic diversity and PIC of all loci except umc1794 (bin9.05) were over 0.5, and the phenotypic contribution rates of these loci were between 7.31% and 9.29%, which means that all of them were minor-effect loci. The 20 loci were compared with those of previous studies. Cai et al. [44] used a set of 218 recombinant inbred lines (RILs) to evaluate PH, EH, PH/EH ratio, GY and grain yield components. In this study, one locus, umc1710 (bin7.04), was consistent with the results of Cai et al. Furthermore, only one association locus for GY, mmc0241 (bin6.05), was detected in both Baoding and Handan, in the Zheng58 test population, indicating that this locus might be stably expressed and little influenced by the environment, whereas no association locus was detected in the Chang7-2 cross population. The reason might be that the effect of mmc0241 was masked by the genetic background of Chang7-2.
Combining the genotypes of inbred lines with the phenotypes of their F1 to detect elite alleles that are unaffected by the hybrid genetic background is a valuable research direction, and might uncover loci carried by inbred lines that are not expressed owing to masking by allelic interactions. Similarly, Wang et al. [45] detected a locus, qGY6b, mapped on bin6.05, and the heterotic locus hlGY6b was also mapped on bin6.05 using a CSSL population. umc1061 (bin10.06), detected for yield in Baoding, was also identified for GCA of yield, which suggests that this locus might be free from interference by the genetic background and could be expressed stably in both inbred lines and F1. Locus umc0284 (bin5.05) was likewise associated with GY in the Zheng58 test group in Baoding. Ding et al. [46] reported a QTL linked with test weight in bin5.05 using an F2:3 population, and in this region Silverio et al. [47] detected a QTL linked with endosperm hardness using an F2 population. Starch is an essential component of yield, and a QTL near starch biosynthetic genes was mapped in bin9.03 [48]. In this study, bnlg430, associated with GY in Handan, was also mapped in bin9.03. This region might be a hotspot for yield. However, no gene product related to this locus was found in MaizeGDB. Overall, these results would provide important references for breeding programs that make use of combining ability.

Conclusions
In this study, an F1 population consisting of 240 hybrids was used for association mapping, and 20 association loci for grain yield (GY), plant height (PH), ear height (EH) and GCA of the three traits were detected. These results would provide important references for high-yield breeding of maize and for the application of combining ability.