Effect of F1 and F2 generations on genetic variability and working steps of doubled haploid production in maize

For doubled haploid (DH) production in maize, the F1 generation has been the most frequently used for haploid induction because it simplifies the process. However, using the F2 generation could be a good alternative to increase genetic variability owing to the additional recombination in meiosis. Our goals were to compare the effect of F1 and F2 generations on DH production in tropical germplasm, evaluating the R1-navajo expression in seeds, the working steps of the methodology, and the genetic variability of the DH lines obtained. Source germplasm in F1 and F2 generations was crossed with the tropicalized haploid inducer LI-ESALQ. After harvest, the haploid induction rate (HIR), diploid seed rate (DSR), and inhibition seed rate (ISR) were calculated for both induction crosses using the total number of seeds obtained. To study the effectiveness of the DH working steps in each generation, the percentage per se and the relative percentage were calculated. In addition, SNP markers were obtained for genetic variability studies. The values of HIR, ISR, and DSR were 1.23%, 23.48%, and 75.21% for F1 and 1.78%, 15.82%, and 82.38% for F2, respectively. The effectiveness of the DH working steps showed the same percentage per se (0.4%) for F1 and F2, while the relative percentage was 27.2% for F1 and 22.4% for F2. Estimates of population parameters in DH lines from F1 were higher than those from F2. Furthermore, population structure and kinship analyses showed that one additional generation was not sufficient to create new genotype subgroups. Additionally, the relative efficiency of the response to selection in the F1 was 31.88% higher than in the F2 because of the number of cycles needed to obtain the DH lines. Our results showed that in tropical maize, the use of the F1 generation is recommended because of its superior balance between time and genetic variability.


Introduction
Developing doubled haploid (DH) lines in maize has become a common practice in public and private institutions worldwide because of the time saved in plant breeding programs. The rapid development of DH lines provides more reliable selection than lines obtained through consecutive self-pollination because a DH line has its whole genome duplicated, so all of its loci are homozygous. In summary, DH methodology includes the following steps: 1) induction of maternal haploids by crossing an inducer line with a donor genotype, 2) identification of haploids at the seed or seedling stage, 3) chromosome doubling of the selected putative haploids, 4) self-pollination of the D0 plants to obtain D1 lines [1], and 5) multiplication of D1 lines to be introduced into the breeding program. DH production can start from a haploid induction cross with F1, F2, or synthetic populations. Currently, breeding programs prefer to use the F1 generation as the base population for haploid induction [2,3], while haploid induction from the F2 generation has been little discussed in the scientific community.
Among the advantages of using the F1 generation in haploid induction, the possibility of maintaining favorable combinations from the parental lines and the time saved in this process can be highlighted. However, the constant use of the F1 generation over selection cycles could result in a decreased response to selection due to a lower recombination rate in the DH lines compared to maize lines obtained from the recombinant population [4,5]. In contrast, the use of the F2 generation for haploid induction requires one more cycle in the breeding process, which could increase the genetic variability through the additional recombination [6]. Each generation and synthetic population used in the DH process has advantages and disadvantages in maize breeding. Thus, the choice among them for haploid induction mainly depends on the aim of the breeding program and not on the performance of the DH lines [7]. However, it is essential to discuss this trade-off between the choice of F1 or F2 generation and genetic variability in the haploid induction approach, especially in tropical maize. At present, studies addressing this question are restricted to temperate maize germplasm or computational simulations [6,7].
After the crosses between the haploid inducer and the source population, the next step is the identification of haploid seeds or seedlings. There are different methodologies to separate haploids from diploids in maize, such as the R1-navajo (R1-nj) marker [8], oil content of seeds [9], flow cytometry [10], differences in early seedling traits [11], the red root marker [12], and stomata length [13,14]. Usually, haploid seeds are selected based on the anthocyanin pigmentation in the embryo controlled by R1-nj because this methodology is easy and cheap, and seeds are classified before the artificial doubling. This phenotypic marker, however, has variable expression depending on the source germplasm used as a donor [15], mainly when inhibitor genes are present in the tropical germplasm [16]. Consequently, false positives are commonly found in the haploid samples selected by R1-nj. In tropical maize, the choice of generation influences R1-nj expression. When F1 or F2 populations used for haploid induction carry an inhibitor gene in their genome, kernels will segregate for the R1-nj phenotype. In turn, haploid kernels may not be efficiently identified, and a half to three-fourths of the haploids could potentially be lost [17]. In this case, donor genotypes that carry inhibitor genes can have limited use in DH technology. Chaikam et al. [16] analyzed the effectiveness of R1-nj anthocyanin in haploid induction from different tropical lines and showed that the anthocyanin phenotype could be completely suppressed or poorly expressed in some germplasm, making it impossible or inefficient to identify haploids at the seed stage. Although tropical source germplasm influences the expression of R1-nj anthocyanin in induced seeds, the effects of inhibitor genes in the induction of F1 and F2 (F1/F2) generations of this genetic background have not yet been studied, especially considering commercial genotypes.
Moreover, a comparison of maize haploid inductions in F1/F2 generations for DH working steps and effectiveness by step has not been studied in detail. This knowledge could help breeders to identify the critical steps in DH production and drive improvement of tropical haploid inducers, as well as direct the logistics planning necessary for each phase of the methodology.
In this context, the goal was to compare the effect of F1/F2 generations on DH production in tropical germplasm, evaluating (i) the R1-nj expression in seeds, (ii) the practical steps used in the methodology, and (iii) the genetic variability estimates of the DH lines obtained.

Plant material
Five commercial single-cross hybrids were selected to represent Brazilian germplasm marketed by private companies (Table 1). Currently, the maize crop in Brazil is represented mostly by hybrid cultivars (88.32%) [18]. In order to study different generations in DH methodology, F1 hybrids were selfed to produce F2 populations in the summer cycle of October 2013 to February 2014 at Luiz de Queiroz College of Agriculture-ESALQ/USP, Piracicaba, Brazil.

Haploid induction in a tropical climate
In order to obtain haploid seeds, induction crosses were performed using the tropical inducer LI-ESALQ as pollinator of F1/F2 generations from different source germplasm. This inducer is derived from a cross of two inducer lines (W23 and Stock6) with a maize hybrid adapted to tropical conditions, and it carries the R1-nj marker responsible for anthocyanin expression in the endosperm and embryo [8].
Induction crosses were made in the summer cycle of October 2014 to February 2015 at the University of São Paulo/ESALQ in Piracicaba, Brazil. To prevent contamination by foreign pollen, induction crosses were carried out in an isolated field area. A randomized complete block design was used with three replicates. Seeds were planted in 7.0 m rows with a spacing of 0.85 m between rows. To provide ideal conditions for pollination, the tropical inducer LI-ESALQ was planted in rows interspersed with source germplasm rows on three different dates: on the same day that the germplasm sources were planted, and five and ten days later. At flowering, female lines were detasseled every day, from the beginning until the last day of flowering, to enable natural pollination by the inducer.

Seed selection based on anthocyanin expression
Based on the expression of R1-nj marker [8], seeds were separated and grouped into three categories: 1) putative haploids: seeds with a white embryo and purple endosperm; 2) diploids: seeds with purple embryo and endosperm; and 3) inhibitors: seeds with a total absence of purple coloring.
The total numbers of kernels classified as putative haploids, diploids, and inhibitors based on this morphological marker were used in the statistical analyses.
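As a minimal sketch of how the three seed categories translate into rates, the function below divides each category count by the total number of seeds classified. The function name and the example counts are hypothetical and are not data from this study.

```python
def seed_rates(n_haploid, n_diploid, n_inhibited):
    """Return (HIR, DSR, ISR) as percentages of all classified seeds."""
    total = n_haploid + n_diploid + n_inhibited
    if total == 0:
        raise ValueError("no seeds were classified")
    # Each rate is the share of one R1-nj category in the total seed count.
    return tuple(100.0 * n / total for n in (n_haploid, n_diploid, n_inhibited))


# Hypothetical counts for one plot (illustration only, not study data).
hir, dsr, isr = seed_rates(n_haploid=15, n_diploid=800, n_inhibited=185)
print(f"HIR={hir:.2f}%  DSR={dsr:.2f}%  ISR={isr:.2f}%")
```

By construction the three rates sum to 100%, which is a quick sanity check on any table of classified seeds.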

Germination and artificial chromosome doubling
After separating seeds, putative haploids were germinated and kept at a controlled temperature of 25˚C for 72 hours. Vigorous seedlings (a typical diploid phenotype) were considered false positives and discarded [19]. For artificial chromosome doubling, seedlings were treated with a solution of 0.06% colchicine and 0.75% dimethyl sulfoxide (DMSO) for 12 hours [20] (in this study we modified the percentage of DMSO used) and kept in the dark at ambient temperature (mean of 23.7˚C). After the 12 hours of the doubling process, seedlings were rinsed in water for 40 minutes and then transferred to plastic cups containing substrate. Seedlings were irrigated twice a day and kept for seven days in a greenhouse located at the Genetics Department of ESALQ/USP.
Effectiveness of the DH working steps was verified by the overall number of germinated and duplicated seeds and the number of surviving seedlings.

Field experiment after chromosome doubling
After chromosome doubling and seven days in the greenhouse, young plants were transplanted to the field at the experimental area of the Genetics Department of ESALQ/USP. No experimental design was used in this step. One month after transplanting, false positives were discarded based on their phenotype. Haploid plants are considered less vigorous, with narrower and more erect leaves than hybrids [21]. Thus, vigorous plants with a thick, anthocyanin-colored stalk and a highly branched tassel were removed. Subsequently, only D0 plants remained in the field, which allowed the estimation of the false discovery rate.
The false discovery rate (FDR) is the proportion of diploid plants present in the group selected as haploid, that is, the probability that a plant classified as haploid is in fact a false positive. It was estimated by the following equation, according to Melchinger et al. [9]:

$$FDR = \frac{FP}{FP + TP}$$

where, in this study, FP (false positive) was the number of diploid plants identified in the field by roguing and TP (true positive) was the number of haploid plants that remained in the field.
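The FDR described here can be illustrated with a short sketch. It assumes the usual definition FP / (FP + TP), which matches the description of FDR as the proportion of diploid plants in the group selected as haploid; the function name and counts are hypothetical.

```python
def false_discovery_rate(false_positives, true_positives):
    """Proportion of plants selected as haploid that turned out to be diploid."""
    selected = false_positives + true_positives
    if selected == 0:
        raise ValueError("no plants were selected as putative haploids")
    return false_positives / selected


# Hypothetical field counts after roguing (illustration only, not study data).
fdr = false_discovery_rate(false_positives=52, true_positives=48)
print(f"FDR = {100 * fdr:.1f}%")
```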
At the flowering stage, D0 plants were artificially self-pollinated to obtain D1 lines. In this step, the total number of D0 plants and the number of D0 plants that were self-pollinated were counted.
Finally, at the end of the D0 plant cycle, ears were harvested and selected according to size, number of seeds, and R1-nj expression.

Analyses of phenotypic data
The counts of seeds classified as putative haploids, diploids, and inhibitors were used for statistical analyses. From these counts, we estimated the haploid induction rate (HIR), inhibition seed rate (ISR), and diploid seed rate (DSR). Because the response variable was categorical, consisting of the counts obtained in each seed category (HIR, ISR, and DSR), a generalized linear mixed model with a multinomial distribution and logit link was used, with diploid as the reference category. This model allowed the probabilities of the different seed rates to be predicted for the germplasm sources and generations used:

$$y_{ktj} = \mu + S_k + G_t + (SG)_{kt} + B_j + \varepsilon_{ktj}$$

where $y_{ktj}$ is the HIR, ISR, or DSR in source germplasm k, generation t and block j, $\mu$ is the overall mean value, $S_k$ is the fixed effect of source germplasm k, $G_t$ is the fixed effect of generation t, $(SG)_{kt}$ is the fixed effect of the germplasm source × generation interaction, $B_j$ is the random effect of block j, and $\varepsilon_{ktj}$ is the random effect of experimental error. The logit function used is expressed by:

$$\eta_{kti} = \log\left(\frac{\pi_{kti}}{1 - \pi_{kti}}\right)$$

where $\pi_{kti}$ is the probability of haploid, diploid, or inhibited seeds in generation t and source germplasm k in the i-th observation unit (total amount of seeds). These analyses were carried out using the GLIMMIX procedure of SAS software (SAS UNIVERSITY EDITION, 2018). Mean values of HIR, ISR, and DSR were compared between generations and among genotypes within each generation by t-test with the R graphics package in R 3.5.0 (R Development Core Team, 2018). The phenotypic information used in this study can be found in (doi:10.17632/98t8nxgw5s.2).
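To make the role of the reference category concrete, the sketch below converts linear predictors into category probabilities using a baseline-category (generalized) logit with diploid as the reference. This is an assumption about how the multinomial model is parameterized, it ignores the random block effect, and the function names and numeric values are hypothetical.

```python
import math


def linear_predictor(mu, source_effect, generation_effect, interaction_effect):
    """Fixed-effects part of the model for one non-reference seed category."""
    return mu + source_effect + generation_effect + interaction_effect


def category_probabilities(eta_haploid, eta_inhibited):
    """Inverse baseline-category logit; diploid is the reference (eta = 0)."""
    exp_h = math.exp(eta_haploid)
    exp_i = math.exp(eta_inhibited)
    denom = 1.0 + exp_h + exp_i
    return {
        "haploid": exp_h / denom,
        "inhibited": exp_i / denom,
        "diploid": 1.0 / denom,
    }


# Hypothetical predictors chosen so the probabilities are 1.5%, 18.5% and 80%.
eta_h = linear_predictor(math.log(0.015 / 0.80), 0.0, 0.0, 0.0)
eta_i = linear_predictor(math.log(0.185 / 0.80), 0.0, 0.0, 0.0)
probs = category_probabilities(eta_h, eta_i)
print({k: round(v, 4) for k, v in probs.items()})
```

Each predictor is a log-odds of one category against diploid, so effects of source germplasm or generation shift the haploid and inhibited probabilities relative to the diploid share.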

Effectiveness of the working steps in obtaining doubled haploid lines
Each working step of DH production was used to analyze the effectiveness of this methodology, considering the percentage per se and the relative percentage (Table 2).
The number of units present (seeds, seedlings, and plants, as well as D1 seeds) in each working step was used as proposed by Melchinger et al. [22]. Thus, the effectiveness of each practical step was obtained by:

$$\% = \frac{E_n}{E_1} \times 100$$

the percentage per se of each working step ($E_n$) per the initial number of putative haploids ($E_1$), and

$$\%_R = \frac{E_n}{E_{n-1}} \times 100$$

the relative percentage of each working step ($E_n$) per the previous step ($E_{n-1}$).
Percentages per se (%) refer to the steps after germination of putative haploids. In contrast, relative percentages ($\%_R$) correspond to the percentage of biological material in one step relative to the previous step. At this step of the study, no experimental design was used; hence, the F1 and F2 generations were compared qualitatively by the (%) and ($\%_R$) calculations.
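The two measures can be sketched as follows, taking the percentage per se as each step's count over the initial number of putative haploids and the relative percentage as each step's count over the previous step. The step names and counts are hypothetical and serve only to show the calculation.

```python
def step_percentages(steps):
    """steps: ordered list of (name, units); the first entry is the putative haploids (E1).

    Returns a list of (name, percentage_per_se, relative_percentage).
    """
    initial = steps[0][1]
    rows = []
    previous = initial
    for name, units in steps:
        per_se = 100.0 * units / initial      # E_n / E_1
        relative = 100.0 * units / previous   # E_n / E_(n-1)
        rows.append((name, per_se, relative))
        previous = units
    return rows


# Hypothetical counts along the DH pipeline (illustration only, not study data).
example = [
    ("putative haploids", 1000),
    ("seedlings doubled", 460),
    ("plants in greenhouse", 370),
    ("D1 lines", 4),
]
rows = step_percentages(example)
for name, per_se, relative in rows:
    print(f"{name:22s} {per_se:6.1f}%  {relative:6.1f}%")
```

The relative percentage points to the step where most material is lost, while the percentage per se shows the cumulative yield from the initial seeds.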

Genotyping and quality control
Only leaf samples of D0 plants obtained from F1/F2 generations were collected to study the genetic variability through molecular markers, because the number of D1 lines obtained was not sufficient for genotypic analyses. A total of 95 lines from the F1 generation and 78 from the F2 generation, derived from the five commercial genotypes, were used. Samples were collected after the flowering stage, since fertility is a prime indicator of DH plants [23], while haploid plants remain sterile. D0 plants were randomly chosen to represent all individuals in the population, among the self-pollinated plants that did not present disease symptoms. Samples were genotyped with 7,430 single nucleotide polymorphism (SNP) markers. This step was carried out by DuPont-Pioneer through the Illumina GoldenGate platform. Genotypic data were prepared for genetic variability, population structure, and kinship analyses. Markers that had more than 5% missing data or a minor allele frequency (MAF) below 5% [7] were excluded. Additionally, all heterozygous loci that remained in the data were treated as missing values, because residual heterozygosity can result from chimerism in D0 cells after artificial chromosome doubling during DH production in maize [13,14]. Then, all the missing values in the genotypic matrix were imputed [24].
Quality control, conversion of SNP markers into numerical coding, and imputation of missing values were carried out with the raw.data function of the snpReady package [24] in R 3.5.0 (R Development Core Team, 2018). The genotypic data file used in this study is available in (doi:10.17632/98t8nxgw5s.2).
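The filtering logic described above can be sketched in plain Python. This is not the snpReady implementation; it only mirrors the stated thresholds (more than 5% missing data, MAF below 5%) and the masking of heterozygous calls, and it leaves out imputation. Genotypes are assumed to be coded 0/1/2 with None for missing calls, and all names and data are hypothetical.

```python
def marker_stats(calls):
    """Return (missing_fraction, maf) for one marker coded 0/1/2, None = missing."""
    called = [g for g in calls if g is not None]
    missing = 1.0 - len(called) / len(calls)
    if not called:
        return missing, 0.0
    freq = sum(called) / (2.0 * len(called))  # frequency of the counted allele
    return missing, min(freq, 1.0 - freq)


def quality_control(genotypes, max_missing=0.05, min_maf=0.05):
    """genotypes: list of individuals, each a list of marker calls.

    Returns the indices of retained markers and the filtered matrix with
    heterozygous calls (1) set to missing (None).
    """
    keep = []
    for j in range(len(genotypes[0])):
        missing, maf = marker_stats([ind[j] for ind in genotypes])
        if missing <= max_missing and maf >= min_maf:
            keep.append(j)
    cleaned = [[None if ind[j] == 1 else ind[j] for j in keep] for ind in genotypes]
    return keep, cleaned


# Tiny hypothetical matrix: 4 individuals x 3 markers (illustration only).
toy = [
    [0, 2, 0],
    [2, 2, None],
    [0, 2, 2],
    [1, 2, 0],
]
keep, cleaned = quality_control(toy)
print(keep, cleaned)
```

In the toy matrix the second marker is monomorphic and the third has too many missing calls, so only the first marker is kept and its single heterozygous call is masked.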

Analyses of genome variation
Our aim in these analyses was to verify whether the additional recombination of the F2 generation in haploid induction could modify the variability enough to allow the formation of new groups. Population parameters of DH lines derived from F1/F2 generations were estimated for each SNP by group (generations) and subgroup (source germplasm) through the popgen function of the snpReady package in R 3.5.0 (R Development Core Team, 2018), namely:
1. Minor allele frequency, $MAF = \frac{2R + H_e}{2(D + H_e + R)}$, where $H_e$ is the total heterozygous loci for the SNP evaluated, $D$ is the total homozygous loci for allele i, and $R$ is the total homozygous loci for allele j, the allele of minor frequency.
2. Polymorphism information content, $PIC = 1 - (p_i^2 + p_j^2) - 2 p_i^2 p_j^2$, where $p_i$ and $p_j$ are the frequencies of the i-th and j-th alleles for the SNP evaluated.
3. Nei's gene diversity [25], $D_{nn'} = H_{nn'} - \frac{H_n + H_{n'}}{2}$, where $D_{nn'}$ measures diversity between the n-th and the n'-th subpopulation, $H_n$ is the estimate of heterozygosity in the n-th locus, and $H_{n'}$ is the heterozygosity in the n'-th locus.
4. Estimation of the potential genetic variance ($E_{VG}$), calculated in this study as the sum of the additive and dominance variance portions due to the allele frequencies, with $V_G = 2pq + 4p^2q^2$. These measures are presented as a proxy for genetic variance since the additive and dominance effects of the loci were not used.
5. Inbreeding effective population size ($N_e$), obtained from the inbreeding coefficient, which was estimated from the diagonal of the kinship matrix as diag(K) − 1, where K is the kinship matrix of the individuals that compose the subpopulation.
6. Response to selection, $RS = i \, r \, E_{VG}$, where $i$ is the selection intensity fixed at 0.1, $r$ is the selective accuracy at 0.5, and $E_{VG}$ is the estimation of the potential genetic variance. Later, the estimates obtained were used to quantify the relative efficiency of the response to selection ($E_{RS\%}$), computed from RS and T, where RS is the response to selection of the F1/F2 generations and T is the number of cycles used to obtain DH lines. In terms of T value, four cycles were considered in the F1 generation, and five cycles in the F2 generation.
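The per-SNP quantities in this list can be sketched from genotype counts. The MAF and PIC expressions below are the standard biallelic forms implied by the symbol definitions, and the relative-efficiency function is only an assumed form (percent advantage in response per cycle, RS/T), since the text states that T enters the calculation but the exact expression is not shown here. All function names and numbers are hypothetical.

```python
def snp_parameters(n_major_hom, n_het, n_minor_hom):
    """Per-SNP MAF, PIC and V_G from genotype counts (D, He, R in the text)."""
    n = n_major_hom + n_het + n_minor_hom
    q = (2 * n_minor_hom + n_het) / (2 * n)  # minor allele frequency
    p = 1.0 - q
    pic = 1.0 - (p**2 + q**2) - 2.0 * p**2 * q**2
    vg = 2.0 * p * q + 4.0 * p**2 * q**2     # additive + dominance portions
    return {"maf": q, "pic": pic, "vg": vg}


def response_to_selection(evg, intensity=0.1, accuracy=0.5):
    """RS = i * r * E_VG with the fixed values quoted in the text."""
    return intensity * accuracy * evg


def relative_efficiency(rs_a, cycles_a, rs_b, cycles_b):
    """ASSUMED form: percent advantage of A over B in response per cycle."""
    return ((rs_a / cycles_a) / (rs_b / cycles_b) - 1.0) * 100.0


# Hypothetical SNP with 60 major homozygotes, 0 heterozygotes, 40 minor homozygotes.
params = snp_parameters(60, 0, 40)
# With equal RS, four cycles versus five cycles alone give a 25% advantage.
gain = relative_efficiency(1.0, 4, 1.0, 5)
print(params, gain)
```

Under this assumed form, part of any advantage of the F1 comes purely from the shorter cycle count, independent of differences in RS.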

Population structure and relationship
To study how the additional recombination in the F2 affects doubled haploids obtained from the F1 and F2 generations, population structure and kinship analyses were performed. Principal component analysis (PCA) was carried out to study population structure through the pcaMethods R package [26]. Moreover, to evaluate the relationship among the DH lines, an additive genomic kinship matrix was constructed by the method of Yang et al. [27] through the snpReady R package [24] in R 3.5.0 (R Development Core Team, 2018).
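For readers unfamiliar with the kinship estimator, the following is a minimal pure-Python sketch of the unified additive relationship described by Yang et al., assuming genotypes coded 0/1/2 with no missing values. It is not the snpReady code, and the function name and toy data are hypothetical.

```python
def yang_kinship(genotypes):
    """Additive genomic relationship matrix (Yang et al. form), genotypes coded 0/1/2."""
    n_ind = len(genotypes)
    n_snp = len(genotypes[0])
    # Allele frequency of the counted allele at each marker.
    freqs = [sum(ind[i] for ind in genotypes) / (2.0 * n_ind) for i in range(n_snp)]
    used = [i for i, p in enumerate(freqs) if 0.0 < p < 1.0]  # skip monomorphic SNPs
    if not used:
        raise ValueError("no polymorphic markers")
    kin = [[0.0] * n_ind for _ in range(n_ind)]
    for j in range(n_ind):
        for k in range(j, n_ind):
            total = 0.0
            for i in used:
                p = freqs[i]
                het = 2.0 * p * (1.0 - p)
                xj, xk = genotypes[j][i], genotypes[k][i]
                if j == k:
                    total += (xj * xj - (1.0 + 2.0 * p) * xj + 2.0 * p * p) / het
                else:
                    total += (xj - 2.0 * p) * (xk - 2.0 * p) / het
            value = total / len(used)
            if j == k:
                value += 1.0
            kin[j][k] = kin[k][j] = value
    return kin


# Two fully homozygous, opposite lines at one marker (illustration only).
K = yang_kinship([[0], [2]])
inbreeding = [K[j][j] - 1.0 for j in range(len(K))]
print(K, inbreeding)
```

In this toy case the diagonal minus one equals 1 for both lines, the value expected for completely homozygous DH lines.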

Haploid induction and R1-navajo marker expression
In the multinomial analysis of HIR, DSR, and ISR, significant differences (p < 0.05) were observed among source germplasm, between generations, and for the source germplasm × generation interaction. A total of 415,979 seeds were obtained, of which 1.51% were selected as putative haploids, 78.82% were selected as diploid seeds, and 19.65% showed inhibited marker expression (Table 3). HIR of source germplasm ranged from 0.77% to 3.76%, and the highest values were observed for genotype 30F53H in both generations. ISR ranged from 4.60% to 57.75%, and the genotype 2B587PW had the highest inhibition in the generations studied. DSR ranged from 41.47% to 93.88%, and the genotype 2B587PW had the lowest rates, while genotypes 30F53H, STATUS VIPTERA, and DKB390 had the highest rates.

Table 3. Number of total seeds ($T_S$), diploid ($T_D$), haploid ($T_H$), and inhibited ($T_I$) seeds from source germplasm and generation ($F_n$) used in multinomial analyses. Mean values of haploid induction rate (HIR), inhibition seed rate (ISR), and diploid seed rate (DSR) are also presented.

The F2 generation showed higher HIR (1.78%) and DSR (82.38%) than the F1 generation (1.23% and 75.21%, respectively). A lower ISR was observed in the F2 generation (15.82%) than in the F1 generation (23.48%) (Table 3). Moreover, R1-nj expression was more evident in F2 than in F1, indicating that the inhibitory genes present in the commercial hybrids were heterozygous [16]. Thus, with the additional recombination in the F2 generation, heterozygous genes segregated and enabled R1-nj expression in the seeds (Fig 1). Comparing the source germplasm used in this work, ears of genotype 2B587PW had the least purplish color, consistent with the results for ISR.

Analyses of the false discovery rate (FDR) showed a mean value of 52.12% (Table 4). Among source germplasm, FDR varied from 21.59% to 63.82% in the F1 generation and from 40% to 66.77% in the F2.

Percentage per se and relative percentage on working steps
For the effectiveness of the working steps, the initial number of seeds varied among source germplasm and generations, and all of them were used in this study. After germination of putative haploids and discarding of vigorous seedlings, 46.4% of total seedlings were used in chromosome doubling in both generations (Table 5). The percentage per se of plants in the greenhouse was 36.6%. In the field, the percentage per se was 31.1% before roguing and 14.8% after roguing. D0 flowering occurred about 45 days after germination. In this step, all the plants that had a fertile tassel and compatible stigma were self-pollinated (5.9%). At the end of the maize cycle, harvested ears showed a percentage per se of 1.6%, and ears selected as D1 lines showed a percentage per se of 0.4%. In summary, from 6,048 putative haploid seeds, 27 D1 lines were obtained when the LI-ESALQ inducer was used.
The percentage per se of false positives in the field was 47.8% after roguing. This may be due to the inducer characteristics, such as low HIR and a high proportion of false positives due to R1-nj expression. More biological material was observed in the F1 generation than in the F2 generation. However, the amount of biological material did not affect the DH methodology.
The chromosome doubling step had about the same percentage per se in F1/F2 (47% and 48%, respectively). After this step, until the self-pollination of D0 plants, the F1 generation presented higher values than the F2 generation. The harvested ears step showed higher values of percentage per se and relative percentage in F2. Even though the F1/F2 generations varied across the working steps, the rate of D1 lines obtained was the same (0.4%), indicating that generation did not affect the proportion of DH lines obtained (Table 5).
Percentage per se in the working steps varies among source germplasm, indicating that the success of the methodology depends on the genetic background.

Quality control in the SNP marker data
A total of 7,430 SNP markers were used in the genotyping of the DH lines, of which 1,826 were eliminated by quality control. Hence, 173 individuals and 5,604 markers were used in the analyses of genetic variability and population structure.

Genetic variability and response to selection
Analyses of genetic variability were performed at the level of F1/F2 generation and of subgroup (source germplasm) (Table 6, S1 Table). Inbreeding effective population size ($N_e$) was higher in DH lines derived from F1 (47.56), as was the estimation of the potential genetic variance ($E_{VG}$) (2809.67). Mean values of genetic diversity ($D_G$), polymorphic information content (PIC), minor allele frequency (MAF), and response to selection (RS) were also higher among DH lines derived from the F1 generation than among those derived from the F2. Finally, the relative efficiency of the response to selection ($E_{RS\%}$) in DH lines of the F1 generation was 31.88% higher than that of the F2.

Population structure and genetic relationship
PCA grouped the source germplasm into four groups: the first with the genotype 2B587PW, the second with the genotypes 30F53H and DKB390, the third with the genotype STATUS VIPTERA, and the fourth with the genotype BM820 (Fig 2). There was no separation of subgroups due to the F1 and F2 generations within each source germplasm, indicating that the additional recombination in the DH lines from the F2 generation was not sufficient to create new subgroups. The five source germplasm and their F1/F2 generations were also clustered based on the genomic kinship matrix (Fig 3). The results were consistent with the PCA, and also indicated that one additional recombination in the F2 generation was not sufficient to separate subgroups within a population.

Table 6. Population parameter estimates of DH lines obtained from five source germplasm and generations (F1 and F2). Number of individuals (N˚), inbreeding effective population size ($N_e$), estimation of the potential genetic variance ($E_{VG}$), Nei's genetic diversity ($D_G$), polymorphic information content (PIC), minor allele frequency (MAF), coefficient of inbreeding ($F_i$), and response to selection (RS). In parentheses are the maximum and minimum values.

Discussion
Inhibitor genes present in tropical germplasm can hamper haploid selection in maize DH production because anthocyanin inhibition can be total or partial, depending on the allelic form in the source population used as a donor [14,25]. These mutant genes, known as C1-I, C2-Idf, and in-1D, act on the anthocyanin pathway, preventing its expression in seeds [28,29]. When dominant inhibitors such as C1-I are present, inhibition in seeds is total, and selection of haploids by seed color is not possible [16]. When the R1-nj locus segregates, it is possible to identify some haploids, but not all of them, due to the absence of purple coloration in the endosperm. In this study, inhibitor genes affected R1-nj expression, since HIR, ISR, and DSR varied among source germplasm and generations (Fig 1 and Table 3). The F2 generation showed higher HIR and DSR and lower ISR than the F1, which means that the inhibition alleles underwent additional recombination in the F2 generation, causing these alleles to segregate. In Fig 1, it is possible to compare the anthocyanin variation among and within ears from the F1 and F2 generations. FDR analyses showed that a larger number of seeds selected by the R1-nj marker does not mean more success in selecting true haploids, because of the false positives among the seeds selected as haploid (Table 4). Genotype 30F53H showed the highest HIR in both generations, but its FDR was 65.29%, indicating that selecting many putative haploid seeds by R1-nj does not imply selecting more true haploids. In contrast, genotype 2B587PW had the highest ISR, indicating that its genetic background inhibits anthocyanin expression in a higher percentage of the seeds. These results show the importance of the choice of generation and source germplasm, as well as of the haploid inducer line, for haploid induction in tropical maize.
The LI-ESALQ inducer line used in this work had an FDR that ranged from 21.59% to 66.77% (Table 4). Misclassification rates associated with R1-nj can be quite substantial (on the order of 30%) depending on the source germplasm used in the induction crosses [12,15], while the haploid rate depends exclusively on the haploid inducer. This means that, when a haploid inducer has a higher HIR, as do temperate inducer lines with 8-15% [11,15], haploid seeds will make up a larger share of the seeds obtained in the induction crosses. In other words, a smaller number of induced seeds would be needed to perform the DH methodology. In our study, the overall mean of HIR was 1.5%, which is lower than that of temperate inducers. Consequently, we needed 415,979 seeds to obtain enough haploids to conduct the experiment (Table 3). Improving the HIR of the LI-ESALQ inducer line could be an alternative to reduce the cost and time spent on induction crosses and seed selection. In this sense, a specific breeding program could increase the HIR in this genotype, because artificial selection could exert pressure on the sed1 locus responsible for haploid induction [30].
Based on the HIR, ISR, and DSR values, the best generation to induce haploids in tropical maize would be the F2, given that the segregation of inhibitory alleles would enable greater haploid selection and a lower loss of inhibited seeds. However, selecting the F2 generation only for this purpose may not be efficient, since obtaining DH lines then requires one more cycle. In addition, the HIR difference between F1 (1.23%) and F2 (1.78%) was 0.55 percentage points, a value too low to justify the time and resources of an additional cycle for haploid induction from F2. Furthermore, the F2 generation also exhibited lower values of $N_e$, $E_{VG}$, $D_G$, MAF, PIC, and RS than the F1 (Table 6). Because of the additional recombination in F2, DH populations from F2 were expected to have higher population parameter estimates than those from F1, which was not observed. DH lines derived from different germplasm sources formed well-delimited groups in the kinship analysis, which represent the different maize germplasm of the private companies used in this work. In both the population structure and kinship analyses, results showed that the additional recombination in the F2 generation was not sufficient to create genetic variability in DH lines. Moreover, the $E_{RS\%}$ was 31.88% greater in DH lines of the F1 generation due to the shorter time required when compared with the F2 generation. According to Sleper et al. [7], the decision of inducing haploids in the F1 or F2 generation needs to consider factors other than the performance of the resulting DH lines. In this study, the F1/F2 generation and the amount of biological material did not affect the efficiency in obtaining DH lines (Table 5). The working steps approach can help the breeder to optimize the number of seeds in the induction, the space in the field and greenhouse, and the money needed for the laboratory activities.
The loss of biological material observed in the methodology occurred because of the number of false positives and some stress factors. Colchicine treatment was the initial factor, followed by the transfer of seedlings to the field and finally the roguing of false positives. The overall mean of FDR (52.12%) was close to the mean percentage of false positives rogued in the field (47.8% in F1 and 49.9% in F2). In addition, high temperatures and rains during flowering reduced D0 fertilization, which indicates the importance of the choice of environment.
Moreover, source germplasm interfered with the DH methodology through the number of false positives and the sensitivity to colchicine and environmental factors.
The results presented in this study about working steps in DH production enable maize breeders to estimate seed quantities in the first step (induction crossing) and to plan the field areas that will be used. Considering the results shown in each working step, we present below some estimates of the number of seeds that should be induced with the tropical inducer line LI-ESALQ to obtain 100 DH lines. The total number of DH lines obtained in each generation was used to perform the estimation: 16 in the F1 generation and 11 in the F2 generation. In order to obtain 100 DH lines from the F1 generation, maize breeders should have approximately 22,368 putative haploid seeds. These would be selected by the R1-nj marker from approximately 1.7 million induced seeds. Thus, considering an average of 520 grains per ear, the haploid induction crosses would require about 3,345 donor plants in the field. Conversely, to obtain 100 DH lines from the F2 generation, maize breeders should have approximately 22,445 putative haploid seeds, which would be selected from approximately 1.2 million induced seeds. Here, considering an average of 372 grains per ear, the haploid induction crosses would require about 3,362 donor plants in the field. Comparing the F1/F2 generations, the number of putative haploid seeds (22,368 and 22,445, respectively) and the number of donor plants in the field (3,345 and 3,362, respectively) do not differ enough to justify the use of F2 over F1. Haploid induction in the F2 generation requires less biological material than in the F1 (approximately 500,000 fewer induced seeds). However, one additional cycle is also required. The working step numbers shown in this study were obtained in a single environment. Considering the DH methodology and maize plant breeding, different environments can be introduced in future studies about DH line performance or the development of new inducer lines.
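The planning arithmetic above can be checked with a short sketch. It assumes one harvested ear per donor plant, so that induced seeds equal donor plants times grains per ear; the function name is hypothetical, and the inputs are the figures quoted in the text.

```python
def induced_seeds(donor_plants, grains_per_ear):
    """Induced seeds expected from a field, ASSUMING one ear per donor plant."""
    return donor_plants * grains_per_ear


# Figures quoted in the text for obtaining 100 DH lines.
f1_seeds = induced_seeds(donor_plants=3345, grains_per_ear=520)
f2_seeds = induced_seeds(donor_plants=3362, grains_per_ear=372)
difference = f1_seeds - f2_seeds

# Putative haploid seeds needed per DH line, from the quoted totals.
f1_per_line = 22368 / 100
f2_per_line = 22445 / 100

print(f"F1: {f1_seeds:,} seeds, F2: {f2_seeds:,} seeds, difference: {difference:,}")
print(f"putative haploids per DH line: F1 {f1_per_line:.1f}, F2 {f2_per_line:.1f}")
```

Under this assumption the products come to roughly 1.7 and 1.2 million induced seeds, a difference of about half a million, while the putative haploids needed per DH line are nearly identical in the two generations.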
The results of our study showed that haploid induction should continue to be performed in the F1 generation, while F2 should be reserved for specific objectives of the breeding program. For example, if a maize hybrid has favorable genetic diversity for a doubled haploid program but shows high ISR after haploid induction crosses, using its F2 generation could be a good choice. Another advantage of the F2 generation is the possibility of selecting among segregating plants before haploid induction. However, the continued use of the F1 generation in haploid induction is recommended because it avoids the laborious process of one more self-pollination to obtain F2 donor plants, saving time and resources. Additionally, nearly 90% of the donor population genome can be inherited by the haploid individuals, enabling the use of the parents' potential in the next breeding cycles [7]. In some countries, such as Brazil, commercial hybrids can be used as donor sources, which facilitates access to the elite germplasm already present in the hybrid seed market [3]. One limitation of this study is the small size of the population used in DH production. However, commercial Brazilian maize germplasm has satisfactory genetic diversity, with the most substantial variability found between companies [18], and we used seeds from five different companies. For public institutions or small breeding programs that lack source germplasm to start a DH methodology, haploid induction in F1 hybrids is an alternative for accelerating research and obtaining lines. Introducing exotic germplasm into a breeding program can be expensive and slow, depending on the seed importation laws of the country. For private seed companies that already have established heterotic groups, haploid induction in the F1 generation allows new inbred lines to be obtained with 90% of the genome preserved.
In this sense, seed companies can obtain hybrids that are more productive than those on the commercial market. However, the use of commercial hybrids for inducing haploids must not infringe the laws that protect cultivars in the countries in which they are used [3]. In addition, in some countries the commercialization of transgenic maize seeds is allowed, which means that haploid induction from transgenic hybrids can produce transgenic DH lines. In this situation, it is essential to understand the applicable laws and the royalties that must be paid to the owners.

Conclusions
The present work showed that the additional round of recombination in F2 DH lines was not sufficient to create new groups in the population structure and kinship analyses, or to increase the population parameter estimates when compared with F1. Further, in the analysis of the effectiveness of the working steps, the F1 and F2 generations showed the same percentage of D1 ears harvested, indicating that one more generation did not affect the number of DH lines obtained. Therefore, we recommend the use of the F1 generation in doubled haploid production from tropical source germplasm, owing to its balance between time and genetic variability.

Supporting information
S1 Table. Estimation of population parameters of DH lines obtained from each evaluated population and generation. Generation (Fn), number of individuals (N˚), inbreeding effective population size (Ne), estimation of the potential genetic variance (EVG), Nei's genetic diversity (DG), polymorphic information content (PIC), minor allele frequency (MAF), coefficient of inbreeding (Fi). In parentheses are the maximum and minimum values. (DOCX)