Heterosis and combining ability in cytoplasmic male sterile and doubled haploid based Brassica oleracea progenies and prediction of heterosis using microsatellites

In Brassica oleracea, heterosis is the most efficient tool providing impetus to the hybrid vegetable industry. In this context, we present the first report on identifying superior heterotic crosses for yield and commercial traits in cauliflower involving cytoplasmic male sterile (CMS) and doubled haploid (DH) lines as parents. We studied the suitability of genomic-SSR and EST-SSR based genetic distance (GD) and agronomic trait based phenotypic distance (PD) for predicting heterosis in F1 hybrids using CMS and DH based parents. A total of 120 F1 hybrids derived from 20 Ogura based CMS lines and 6 DH based testers were evaluated for 16 agronomic traits along with the 26 parental lines and 4 commercial standard checks. The genomic-SSR and EST-SSR based genetic structure analysis grouped the 26 parental lines into 4 distinct clusters. The CMS lines Ogu118-6A, Ogu33A and Ogu34-1A were good general combiners for developing early maturity hybrids. The SCA effects were significantly associated with heterosis, suggesting non-additive gene effects for the heterotic response of hybrids. The less-than-unity values of σ²A/D and σ²gca/σ²sca indicated the predominance of non-additive gene action in the expression of the studied traits. The correlation analysis of genetic distance with heterosis for commercial traits suggested that microsatellite based genetic distance estimates can be helpful in heterosis prediction to some extent.


Introduction
In the plant kingdom, the family Brassicaceae holds great agronomic, scientific and economic significance and comprises more than 372 genera and 4060 species [1]. Brassica oleracea (CC, 2n = 18) constitutes a diverse group of economically and nutritionally important morphotypes known as cole vegetables (kale, kohlrabi, cabbage, cauliflower, broccoli, brussels sprout) [2]. The Brassica vegetables are also termed 'super-food' as they are a vital source of secondary metabolites, antioxidants, vitamins and minerals [3,4,5,6]. Among the cultivated B. oleracea morphotypes, cauliflower (B. oleracea var. botrytis L.) is an important vegetable crop grown worldwide. Great efforts have been made to improve the productivity and quality of this crop because of its huge economic value and quality attributes [7]. The replacement of open-pollinated varieties with F1 hybrids has become more pronounced in cole vegetables because of their high uniformity, better quality and tolerance to various biotic and abiotic stresses [5,8]. Two genetic mechanisms, namely sporophytic self-incompatibility (SI) and cytoplasmic male sterility (CMS), have been used widely in hybrid breeding programmes of B. oleracea [5,8,9,10,11,12]. However, frequent breakdown of self-incompatibility has been reported in Brassica vegetables due to the high temperature sensitivity of S-alleles. Thus, SI lines are not always stable and result in 'sibbed' seed in the hybrid population [11]. Moreover, maintenance of S-allele lines is time-consuming and expensive. In snowball cauliflower, the SI system is very poor or not present at all [11,13]. Under these circumstances, CMS provides a better alternative for heterosis breeding in cole crops [5,8,14].
Heterosis, or hybrid vigor, is manifested as superior performance of F1 hybrids as compared to the parents [15,16,17]. Heterosis is a highly complex phenomenon, and different hypotheses have been suggested to explain its genetic basis [15,16,18,19,20]. Further, the application of genomics tools has suggested a role of epigenetic regulation in explaining the heterosis phenomenon across crops [16,17,19,20]. Proper selection of inbreds and identification of superior heterotic combinations is crucial for exploiting heterosis in crop improvement. The traditional approaches of quantitative genetics, such as diallel, generation mean and line × tester analysis, together with estimation of genetic components revealing various gene effects, are effective in unraveling the genetic basis of heterosis [5,15,21,22]. The measures of both general combining ability (GCA) and specific combining ability (SCA) are necessary for selection of parental lines to develop heterotic combinations [23]. Estimation of GCA provides information on breeding value and additive genetic variance, while SCA is associated with non-additive effects (dominance effects, additive × dominant and dominant × dominant interactions). Among different biometrical approaches, line × tester analysis is very efficient for estimating GCA effects of lines and testers and SCA effects of cross combinations, and for revealing information about the nature of gene action [8,21,24]. The extent of heterosis has been reported to vary with the mode of reproduction, genetic distance of parents, traits under investigation, developmental stage of the plant and prevailing environment [23,25,26,27,28,29]. The pair-wise parental GD has been suggested as a good indicator of per se hybrid performance and for recognition of heterotic groups [23,25,26,27,28,29].
Different approaches are available to determine genetic distance, based on morphological traits, horticultural data, biochemical characteristics and DNA marker based genotypic data [30,31]. The SSR (simple sequence repeat) markers have been regarded as the markers of choice owing to their co-dominant inheritance, whole-genome coverage, abundance and high reproducibility [27,32]. However, contradictory results have been reported with respect to the relationship between genetic distance and heterosis in different crops [15,24,25,26,27,28,29,30,33,34]. According to Cress [35], parental genetic distance is essential but not sufficient to assure significant heterosis. It was also suggested that better forecasting of heterosis is possible only when genetic distance is less than a definite threshold level [36]. The association of genetic distance and heterosis also depends upon the germplasm, the population structure under investigation and the methods of calculation [29,37]. Parents with lesser genetic distance can also display a high level of heterosis, as when closely related ecotypes of Arabidopsis are used to develop hybrids [16,38]. Contrasting results are reported about the relationship between phenotypic distance (PD), genetic distance (GD), heterosis and specific combining ability (SCA) in different crops [30,34,39,40]. Teklewold and Becker [39] reported a significant positive association of PD with mid-parent heterosis (MPH), general combining ability (GCA) and hybrid performance for seed yield in Ethiopian mustard (Brassica carinata), while Hale et al. [26] found no correlation between PD and heterosis in broccoli.
The development of homozygous inbred lines is a tedious and time-consuming process in B. oleracea, as these crops are highly heterozygous in nature [41]. Availability of completely homozygous inbred lines is a prerequisite for hybrid development. Development of inbreds through conventional approaches requires several generations of selfing, and high inbreeding depression makes the process more complicated. On the other hand, the development of doubled haploids (DHs) makes it possible to produce a large number of completely homozygous lines within a very short period of time. DH technology has been used widely in B. oleracea in different genetic and genomic research, such as QTL mapping, construction of high density genetic linkage maps and genome sequencing [41-51]. Our group has developed a large number of DH lines in Brassica vegetables through isolated microspore culture [41,52,53]. The improved CMS lines of cauliflower have also been developed indigenously through protoplast fusion followed by recurrent backcrossing [9,54]. To date, only limited information is available regarding combining ability, gene action and heterosis using CMS systems and DH lines to improve yield and quality traits in B. oleracea. Moreover, no report is available in cauliflower regarding the association of GD with heterosis and combining ability for important traits. Several contrasting results have been reported in different crops in this context, as heterosis is a complex biological phenomenon [16,17,19,20]. Hence, the present investigation was conducted with the objectives (i) to identify heterotic combinations using CMS and DH lines and to analyze combining ability, nature of gene action and heritability for important traits, and (ii) to determine the correlation of microsatellite based genetic distance and morphological trait based phenotypic distance with heterosis and combining ability.
The present investigation is the first report of heterosis and combining ability study based on CMS and DH lines in cauliflower.

Plant materials
The field experiment was carried out at Baragram Experimental Farm of ICAR-Indian Agricultural Research Institute (IARI), Regional Station, Katrain, Kullu Valley, Himachal Pradesh, India. The experimental farm is located at 32.12N latitude and 77.13E longitude, at an altitude of 1,560 m above mean sea level. The basic plant material comprised 20 genetically diverse Ogura CMS lines developed after more than nine generations of backcrossing (Table 1). These CMS lines were used as female parents in the breeding programme. Six DH based inbred lines of snowball cauliflower with abundant pollen production, developed through isolated microspore culture (IMC), were used as testers (Table 1).

Experimental design
All the recommended package of practices suggested for raising a cauliflower crop at IARI-Regional Station, Katrain, were followed to grow a healthy crop [55]. The plot size was 3.0 × 3.0 m² with an inter- and intra-row spacing of 45 cm. The 20 CMS lines of cauliflower were crossed with 6 DH male fertile testers to generate 120 testcross progenies by following the line × tester mating design [21]. To avoid any natural pollination, CMS lines were grown under a net house. The fully opened flowers of CMS lines were pollinated with freshly dehisced pollen from DH testers. The 120 testcross progenies along with their 26 parental lines were evaluated for various morphological, horticultural and yield-related traits along with 4 CMS based commercial hybrids (HVCF-18, HVCF-29, HVCF-16 from Acsen HyVeg and Pahuja from Pahuja Seeds) as standard checks. A 10 × 15 alpha lattice experimental design with three replications was used for conducting this study.
For recording the data on various agronomic traits, five well-established plants were randomly selected in each plot/block/replication.

DNA extraction and PCR amplification
The parental CMS lines and DH testers were grown in pro-trays under glasshouse conditions in a soilless mixture of cocopeat, perlite and vermiculite in the ratio of 3:1:1. The fully expanding leaves (100mg) from 25-30 days old seedlings were used for genomic DNA isolation using

Genetic analysis
Among the 350 microsatellite markers, 87 polymorphic genomic-SSR and EST-SSR loci depicting genetic diversity (S1 Table) were used for cluster analysis, principal component analysis (PCA) and clustering through neighbor-joining (NJ) UPGMA method using DARwin software version 6.0.017 [60]. The reliability of the NJ dendrogram was tested with 1000 bootstrap replicates. For the allelic diversity analysis, the observed number of alleles (N) per locus, observed heterozygosity (Ho), expected heterozygosity (He) and polymorphism information content (PIC) were computed through the software CERVUS version 3.0 [61]. The PIC for each locus using CERVUS 3.0 was calculated according to the formula PIC = 1 − Σ Pi², where Pi represents the frequency of the ith allele at a locus for the genotypes under study [62].
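As an illustration of the PIC expression given above, the following minimal Python sketch computes 1 − Σ Pi² for a single locus from a list of allele calls. The function names and the example allele sizes are hypothetical and are not taken from the study; only the form of the formula stated in the text is implemented, and this is not a reimplementation of CERVUS.

```python
from collections import Counter


def allele_frequencies(alleles):
    """Return the relative frequency of each allele observed at one locus."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}


def pic_from_alleles(alleles):
    """PIC in the form stated in the text: 1 minus the sum of squared allele frequencies."""
    freqs = allele_frequencies(alleles)
    return 1.0 - sum(p * p for p in freqs.values())


# Hypothetical allele sizes (bp) pooled over genotypes at one SSR locus.
example_alleles = [150, 150, 152, 152, 154, 154, 154, 154]
pic_value = pic_from_alleles(example_alleles)
print(f"PIC = {pic_value:.3f}")
```

With frequencies of 0.25, 0.25 and 0.50 for the three hypothetical alleles, the sum of squares is 0.375 and the sketch returns 0.625.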
The genetic structure of the parental population of the testcross progenies was studied with the Bayesian model-based clustering approach implemented in STRUCTURE version 2.3.4 software [63] to assign individuals to k clusters and sub-clusters. For the estimation of the proportion of ancestral contribution in each parental line, all simulations were performed with the parameter settings "admixture model" and "correlated allele frequencies". The algorithm was run with a burn-in period of 10,000 followed by 100,000 Markov Chain Monte Carlo (MCMC) repetitions, and the plausible range of putative k values was kept from k = 1 to k = 10, run independently with 15 iterations for each k. The optimum value of k for determining the most likely number of sub-populations was predicted according to the DeltaK (ΔK) method [64] with the help of the web-based STRUCTURE HARVESTER version v0.6.94 [65].
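The ΔK criterion mentioned above can be sketched in a few lines of Python. The version below assumes the commonly described form of the statistic, ΔK = |L(K+1) − 2L(K) + L(K−1)| / sd[L(K)], where L(K) is the mean log-likelihood over replicate runs at a given K. The function name and the log-likelihood values are hypothetical, and the sketch is not the code of STRUCTURE HARVESTER.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Compute ΔK for every K that has both neighbours (K-1 and K+1).

    lnp_by_k maps K to the list of log-likelihood values from replicate runs.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    sds = {k: stdev(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        has_neighbours = (k - 1) in means and (k + 1) in means
        if has_neighbours and sds[k] > 0:
            second_order = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            result[k] = second_order / sds[k]
    return result


# Hypothetical log-likelihoods for three replicate runs per K.
runs = {
    2: [-1000, -1002, -998],
    3: [-900, -901, -899],
    4: [-800, -801, -799],
    5: [-790, -795, -785],
    6: [-785, -790, -780],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

In this made-up example the likelihood rises steeply up to K = 4 and flattens afterwards, so ΔK peaks at K = 4.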

Statistical analysis
The agronomic data recorded for each parent, the 120 F1 hybrids and the 4 commercial checks in alpha lattice design were subjected to analysis of variance (ANOVA) using the GLM procedure of SAS (statistical analysis system) software version 9.4 [66]. The line × tester statistical analysis of GCA, SCA, heterosis, heritability, variance and mean performance was performed as per Kempthorne [21] through SAS version 9.4. The significance of GCA and SCA effects was tested at 5%, 1% and 0.1% probability through the F test. Heterosis estimates for different traits were computed as per Xie et al. [67] based on the formulae: MPH% (mid-parent heterosis) = [(F1 − MP)/MP] × 100 and BPH% (better-parent heterosis) = [(F1 − BP)/BP] × 100, where MP is the mid-parent and BP the better-parent performance; significance was tested at p < 0.05, p < 0.01 and p < 0.001 through the F test. The narrow-sense heritability (h²ns = VA/VP; VP = VG + VE) estimates were categorized into three classes, viz. high (> 30%), medium (10-30%) and low (< 10%) [68]. The genetic advance (GA) was calculated as GA = H²b × phenotypic standard deviation × K, where K is 2.06, the standardized selection differential constant at 5% selection intensity [69]. The parental lines and testers were clustered into different groups based on sixteen agronomic traits using R software [70]. They were grouped through principal component analysis (PCA) to estimate the explained variance in the first two axes. Pooled data from five randomly selected plants of each genotype per plot per block per replication for all the sixteen morphological and commercial traits were taken for statistical analysis.
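The heterosis, heritability and genetic advance expressions above translate directly into code. The following Python sketch is illustrative only: the function names, the `higher_is_better` switch for traits where a lower value is desirable (such as earliness or core length), and all numeric inputs are assumptions, not part of the SAS workflow used in the study.

```python
def mid_parent_heterosis(f1, p1, p2):
    """MPH% = [(F1 - MP) / MP] * 100, with MP the mean of the two parents."""
    mp = (p1 + p2) / 2
    return (f1 - mp) / mp * 100


def better_parent_heterosis(f1, p1, p2, higher_is_better=True):
    """BPH% = [(F1 - BP) / BP] * 100; BP is the better of the two parents."""
    bp = max(p1, p2) if higher_is_better else min(p1, p2)
    return (f1 - bp) / bp * 100


def narrow_sense_heritability(v_a, v_g, v_e):
    """h2(ns) = VA / VP, with VP = VG + VE."""
    return v_a / (v_g + v_e)


def classify_heritability(h2_percent):
    """Classes used in the text: high (> 30%), medium (10-30%), low (< 10%)."""
    if h2_percent > 30:
        return "high"
    if h2_percent >= 10:
        return "medium"
    return "low"


def genetic_advance(heritability, phenotypic_sd, k=2.06):
    """GA = heritability * phenotypic SD * K (K = 2.06 at 5% selection intensity)."""
    return heritability * phenotypic_sd * k


# Hypothetical curd weights (g) for one cross and its two parents.
mph = mid_parent_heterosis(f1=600, p1=400, p2=500)
bph = better_parent_heterosis(f1=600, p1=400, p2=500)
h2 = narrow_sense_heritability(v_a=2.0, v_g=5.0, v_e=5.0)
ga = genetic_advance(h2, phenotypic_sd=10.0)
print(round(mph, 2), round(bph, 2), classify_heritability(h2 * 100), round(ga, 2))
```

For the hypothetical values, the mid-parent is 450 g, so MPH is 33.33% and BPH is 20%; a heritability of 0.20 falls in the medium class.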
The Euclidean distance (ED), hereafter referred to as phenotypic distance (PD), was calculated based on sixteen phenotypic traits using R software [70]. The simple matching dissimilarity coefficient (hereafter referred to as genetic distance, GD) was computed from the g-SSR and EST-SSR data using DARwin software version 6.0.017. The association among genetic distances, heterosis and combining ability was computed as Pearson's product-moment correlation coefficient (r) using R version 3.5.1 in RStudio 1.1.456 [70], with significance tested at p < 0.05 and p < 0.01. The correlation plot displaying the correlation among distances, heterosis and combining ability was produced with the corrplot package in RStudio [71].
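The three quantities named above can be sketched in plain Python. This is a simplified illustration under stated assumptions: trait vectors are taken as already scaled by the caller, the simple matching dissimilarity is computed as the share of mismatching marker scores between two parents (DARwin's handling of allelic data may differ in detail), and all function names and data are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length trait vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def simple_matching_dissimilarity(a, b):
    """Proportion of marker scores that differ between two genotypes."""
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data for illustration only.
pd_example = euclidean_distance([0.0, 0.0], [3.0, 4.0])
gd_example = simple_matching_dissimilarity([1, 0, 1, 1], [1, 1, 0, 1])
r_example = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
print(pd_example, gd_example, r_example)
```

In practice one distance value per line × tester pair would be paired with that cross's heterosis or SCA estimate, and the correlation computed over all crosses.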

Analysis of variance
The mean square estimates for different vegetative and commercial traits revealed significant differences among treatments for all the characters except curd diameter, core length and curd size index at 0.01% probability (S2 Table). Similarly, block effects were significant for all the traits except days to 50% curd initiation, curd diameter, core length and curd size index at the probability of 0.01% (S2 Table). The coefficient of determination (R²) indicated a high percentage of explained variability (>70%) for all the traits except curd length, curd diameter, core length and curd size index. The higher R² value also suggests a higher significance of the model. The line × tester analysis of variance (ANOVA) for combining ability revealed highly significant differences among the treatments and parents for all the studied traits (Table 2). The mean squares of the lines were significant for all the traits, while the mean squares for testers were significant for all except curd length (Table 2). Significant differences were also observed with respect to lines versus testers for all the traits except leaf width, curd length, curd diameter, core length, curd size index and harvest index, while the mean squares of parents versus crosses were significant for all the traits except number of leaves (Table 2). The variance analysis for combining ability also revealed highly significant differences among the 120 testcross progenies for all the 16 traits at 0.1% probability, while no significant differences were found among the three replications for any trait except leaf width, suggesting the presence of inherent variability among all the crosses (Table 2). The line × tester interaction effects were also significant for all 16 phenotypic traits.

Genetic components of variance
The estimates of genetic components of variance, revealing the nature of gene action, heritability, genetic advance and degree of dominance, are presented in Table 3. The GCA variance (σ²gca) for both lines and testers was lower than the SCA variance (σ²sca) for all the studied traits except days to 50% curd maturity. The dominance variance (σ²D) was greater than the additive component of variance (σ²A) for all the traits except days to 50% curd maturity. The greater-than-unity value of the degree of dominance indicated the dominant nature of all the traits except days to 50% curd maturity. The ratio of additive to dominance variance (σ²A/D) coupled with the predictability ratio (σ²gca/σ²sca) was less than unity for all the traits, suggesting a preponderance of non-additive gene action. The estimation of heritability is associated with selection efficiency. We observed the lowest estimate of narrow-sense heritability (h²ns) for curd length (3.44%), and the highest h²ns value was recorded for days to 50% curd maturity (49.21%). Generally, moderate h²ns estimates were found for the majority of the traits except CL, CD, CSI and HI. Higher estimates of genetic advance (GA) at 5% selection intensity were observed for GPW, MCW, NCW and LSI, while lower estimates of GA were recorded for all the remaining traits.

Combining ability effects
The estimates of combining ability are effective for early generation selection of inbred lines and for identifying heterotic crosses. The GCA estimates of parental lines and testers are summarized in Table 4 and Table 5. The GCA estimates revealed that the CMS lines Ogu118-6A, Ogu33A, Ogu34-1A and Ogu33-1A had significantly high GCA in the desirable direction for traits related to earliness. The CMS lines Ogu307-33A, Ogu119-1A, Ogu125-8A and tester DH-53-10 also showed significantly high GCA for days to 50% curd maturity in the desirable direction. For core length, significantly high GCA in the desirable negative direction was observed in CMS lines Ogu122-5A, Ogu118-6A, Ogu1A, Ogu13-85-6A, Ogu1-6A, Ogu122-1A and tester DH-53-10. Among the 20 female parents, 6 and 9 CMS lines exhibited high GCA for plant height and gross plant weight, respectively. Similarly, 6 and 9 CMS lines were found to be good combiners for marketable curd weight and net curd weight, respectively. Among the 20 female parents, 8 CMS lines for plant height, 9 CMS lines for gross plant weight, 6 CMS lines for marketable curd weight and 8 CMS lines for net curd weight exhibited significantly high GCA in the desirable direction. The results pertaining to SCA effects of the 120 cross combinations are presented in S3 Table. Among the 120 hybrids, 9 and 28 crosses showed significantly negative SCA effects for the earliness traits CI and CM, respectively (S3 Table).
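To make the GCA and SCA estimates concrete, the sketch below computes them from a table of cross means in the usual line × tester fashion: the GCA of a line (or tester) is the deviation of its mean over all its crosses from the grand mean, and the SCA of a cross is what remains after removing both parental means. The function name and the 2 × 2 example are hypothetical; significance testing, which the study carried out in SAS, is not covered.

```python
def combining_ability(cross_means):
    """Estimate GCA and SCA effects from a lines-by-testers table of cross means.

    cross_means[i][j] is the mean performance of the cross line i x tester j.
    Returns (gca_lines, gca_testers, sca).
    """
    n_lines = len(cross_means)
    n_testers = len(cross_means[0])
    grand_mean = sum(sum(row) for row in cross_means) / (n_lines * n_testers)

    line_means = [sum(row) / n_testers for row in cross_means]
    tester_means = [
        sum(cross_means[i][j] for i in range(n_lines)) / n_lines
        for j in range(n_testers)
    ]

    gca_lines = [m - grand_mean for m in line_means]
    gca_testers = [m - grand_mean for m in tester_means]
    sca = [
        [
            cross_means[i][j] - line_means[i] - tester_means[j] + grand_mean
            for j in range(n_testers)
        ]
        for i in range(n_lines)
    ]
    return gca_lines, gca_testers, sca


# Hypothetical cross means for 2 lines x 2 testers.
example = [[10.0, 14.0], [12.0, 20.0]]
gca_lines, gca_testers, sca = combining_ability(example)
print(gca_lines, gca_testers, sca)
```

For traits such as earliness or core length, where a smaller value is preferred, a negative effect is the desirable one.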

Phenotypic characterization
The mean performance of the parental CMS and DH lines along with standard checks is presented in S4 Table and S1A-S1H Fig. On (Fig 1B). On the basis of HCA, the 26 parental lines were classified into 3 major clusters with varying extent of divergence within internal sub-clusters. The DH lines DH-53-10 and DH-18-8-1 were distantly placed from the rest of DH testers in two different major clusters.

Microsatellites based polymorphism, allelic diversity and genetic distances
Overall, 511 alleles were amplified through 87 SSRs (Table 6), with a mean of 5.87 alleles per locus. The allele numbers per locus ranged from 2 (1 primer BoSF1640) to 10 (1  (Table 6). Further, the PCA and neighbour-joining (NJ) cluster analysis based on molecular data for 87 loci revealed distinct clusters and sub-clusters of parental CMS and DH lines based on their phylogeny (Fig 1C). The PCA revealed that the first two major coordinate axes (PC1 and PC2) explained 61.41% of the total existing variation among CMS and DH lines. The dendrogram revealed 3 main clusters of parental lines with internal sub-clusters showing varying degrees of diversity. The DH testers remained in 2 different sub-clusters of a single main cluster. The CMS lines Ogu2A and OguKt-9-2A were placed distantly from the rest of the CMS lines. The CMS lines Ogu33-1A and Ogu125-8A were in close affinity with the DH lines. The Euclidean distance (PD) between lines and testers was computed from 16 phenotypic traits (S5 Table), and GD was calculated from molecular data based on the 87 microsatellite markers (genomic-SSR and EST-SSRs) used for assessment of genetic diversity between parents (S5 Table). The PD ranged from 2.07 for the cross L16 × T6 (Ogu12A × DH-53-10) to 8.27 for the cross L5 × T1 (Ogu309-2A × DH-18-8-1), with a mean of 5.52. The GD ranged from 0.44 for the cross L20 × T1 (Ogu33-1A × DH-18-8-1) to 0.98 for the cross L4 × T5 (Ogu307-33A × DH-53-9), with an average GD of 0.83.

Genetic structure analysis
The result of analysis by 'Structure Harvester version v0.6.94' revealed that the second-order likelihood statistic, ΔK, peaked at k = 4 (Fig 2A to 2C); hence, the optimal k value should be 4. This indicated that the 26 parental CMS and DH inbred lines could be grouped into 4 genetic sub-clusters (Fig 2C). All the DH testers were placed in cluster III, along with 2 CMS lines, Ogu125-8A and Ogu33-1A, although there was minor admixture in DH-53-10 and Ogu125-8A from the genotypes of cluster I and cluster II, respectively. The other CMS lines were placed in separate clusters. The 20 parental CMS lines were grouped into 4 sub-clusters, with the majority placed in cluster I. There was admixture from cluster IV in the cluster I (Ogu1A, Ogu13-85-6A) and cluster II (Ogu122-1A) genotypes. Similarly, there was minor admixture from cluster I and cluster II in cluster IV genotypes (Ogu1-6A and Ogu22-1A). Thus, there were four distinct sub-clusters, with minor gene flow among some genotypes of the respective clusters.

Analysis of heterosis
The heterotic response of all the 120 testcross progenies varied in magnitude, and highly significant heterosis was observed for all the 16 traits in both directions. The top ten cross combinations based on significant mid-parent heterosis in the desirable direction, along with their better-parent heterosis and SCA effects, are presented in S6 and S7 Tables. The cross combinations Ogu34-1A×DH-53-1 and Ogu33A×DH-53-6 showed significantly high MPH in the desirable direction for days to 50% curd initiation and days to 50% curd maturity (S6 Table). For CI, the testers DH-53-1 and DH-53-9 were each involved in 4 of the top 10 crosses. For CM, the CMS line Ogu33A was involved as female parent in 6 of the top 10 hybrids for earliness. Ogu33A was also involved as female parent in one of the top 10 cross combinations related to CI. This line had the significantly highest GCA for earliness among all the CMS lines used in the study. For plant height, among the top 10 heterotic crosses, the cross combination Ogu118-6A×DH-53-9 exhibited the highest significant positive heterosis over mid-parent, followed by OguKt-2-6A×DH-53-9 and Ogu34-1A×DH-53-9. The highest significant positive heterosis for GPW over mid-parent was observed in the cross Ogu118-6A×DH-53-10, followed by Ogu126-1A×DH-53-1 and Ogu307-33A×DH-18-8-3. The top ten crosses having significant positive MPH for 8 commercial traits are presented in S7 Table. The hybrid Ogu122-1A×DH-53-6 exhibited the significantly highest MPH for CoL in the desirable negative direction, followed by Ogu1A×DH-53-6 and Ogu1A×DH-53-10. The CMS line Ogu1A was involved in 3 crosses as female parent among the top 4 crosses with respect to core length, and it had the significantly highest GCA in the desirable negative direction for core length. Thus, Ogu1A could be used as a parent for developing hybrids with a short core.
[Fig 1, caption fragment: ... among parental genotypes based on morphological traits (C) UPGMA cluster analysis illustrating the genetic relationships among parental genotypes based on g-SSR and EST-SSR analysis. https://doi.org/10.1371/journal.pone.0210772.g001]

Association of genetic distances, heterosis and combining ability
The Pearson correlation coefficients of genetic distances with heterosis and combining ability, and of combining ability with heterosis, for ten commercial traits are presented in Table 7. The GD and PD exhibited no significant correlation with SCA for any of the traits (Table 7 and S2B Fig, S2C Fig). SCA showed a significantly positive correlation with MPH and BPH for all the traits at P ≤ 0.01 (Table 7, Fig 3B and 3C). No significant association of GD with MPH and BPH was observed with respect to days to 50% CM, leaf length, curd length and core length (Table 7, Fig 3A). For the commercial traits, viz. plant height, gross plant weight, net curd weight, leaf width, curd diameter and total marketable yield, a significant correlation was observed between GD and MPH in the desirable direction for the respective traits (Table 7, Fig 3A). The highest magnitude of correlation of GD with MPH and BPH in the desirable direction was observed for LW. However, PD exhibited no significant correlation with heterosis for the majority of traits (Table 7, S2A Fig). A significant correlation of phenotypic distance with mid-parent heterosis was observed only for leaf length. PD exhibited a significant correlation in the undesirable direction with core length. However, no significant association was observed between parental genetic distances based on phenotypic traits (PD) and molecular data (GD) (r = -0.04) (Fig 3D).
Table 7. Pearson correlation coefficients among parental genetic distance (GD, PD), combining ability and heterosis in cauliflower for ten morphological and commercial traits.

Genetic components of variance and combining ability
The analysis of variance depicted highly significant differences among all the treatments for all the 16 agronomic traits, indicating considerable genetic differences among parents and their testcross progenies. The success of any crop breeding programme relies on the genetic variation present in the germplasm. High genetic divergence among genotypes was reported by Garg and Lal [72] and Verma and Kalia [73] for yield and related traits in early maturity Indian cauliflower, and for antioxidant traits in snowball cauliflower [5]. All the studied traits were found to be under the genetic control of both additive and non-additive gene effects, as revealed by the significant mean squares of lines, testers and line × tester interactions. The results are in agreement with Singh et al. [5] for antioxidant traits in cauliflower and Verma and Kalia [73] for days to 50% CM, leaf area, PH, MCW, NCW, curd compactness, GPW and HI in cauliflower using SI inbred lines. The analysis of genetic components of variance indicated the importance of SCA in developing heterotic crosses, as revealed by the higher value of σ²sca than σ²gca of lines and testers for the majority of traits. As the response to natural and artificial selection relies on additive genetic variance, the narrow-sense heritability (h²ns) holds great promise in plant breeding, as it provides the basis for precise selection of genotypes based on the part of phenotypic variance that is due to additive genetic components [74]. In this study, a low to intermediate level of h²ns was observed for the majority of vegetative and commercial traits, suggesting non-additive genetic control of these traits, which might be due to large epistatic effects. We also observed moderate estimates of h²ns for antioxidant traits in cauliflower in our previous study [5]. The results are also in agreement with Xie et al. [67] for mineral content in Chinese cabbage. Thus, early generation selection for these vegetative and commercial traits would be difficult due to dominance effects in the expression of phenotypic variance, and hence selection must be practiced in later generations. Combining ability analysis has been successfully utilized in crop breeding for evaluating parental performance and understanding the dynamics of genes involved in trait expression. The parental GCA estimates in the desirable direction also indicate the potential of parents in generating promising breeding populations.
[Fig 3, caption fragment: Scatter plot depicting no correlation between genetic distance based on molecular data and phenotypic distance. GD is on X-axis and PD on Y-axis. https://doi.org/10.1371/journal.pone.0210772.g003]
In the present investigation, the significantly high GCA effects of parental lines in the desirable direction for the respective vegetative and commercial traits are due to the predominance of additive genetic effects and additive × additive interactions [5,10]. This reflects a desirable gene flow from parents to progeny at high frequency, and the parental lines exhibiting significantly high GCA for the respective traits in the desirable direction can be utilized to accumulate favorable alleles via recombination and selection [5,9,10,75,76]. Further, our results revealed that none of the parents was a good general combiner for all the studied vegetative and commercial traits. Similar findings were also reported in self-incompatible (SI) and CMS lines in cauliflower for yield and quality traits [5,9,73]. The findings indicate the requirement of multiple breeding programmes in suitable mating designs for the development of productive cultivars with the accumulation of positive alleles. On the other hand, the parental lines depicting GCA in opposite directions for the respective traits can be utilized to generate desirable mapping populations to study the genetics of those traits [76]. The SCA, which reflects the loci having non-additive and epistatic gene effects, can be utilized to determine specific heterotic crosses for the trait of interest. The inconsistent association of GCA and SCA of the respective crosses for the respective traits is an indication of complex interaction of genes for quantitative traits [29]. The majority of testcross progenies manifesting significantly high SCA in the desirable direction had at least one parent reflecting poor GCA effects (poor × good general combiner or good × poor general combiner). This may be attributed to the good combiner parent contributing favorable additive effects and the poor combiner parent displaying epistatic effects [5,77].
A few crosses with significantly high SCA in the desirable direction for various traits had both parents with good GCA. Such results suggested the role of cumulative effects of additive × additive interaction of positive alleles [5,77]. Conversely, some of the crosses had poor SCA effects for the respective traits, despite involving parents with significant GCA. This might be ascribed to the absence of any interaction among the positive alleles of genes. Thus, our results have clearly indicated that breeders must pay attention to both GCA and SCA in the selection of elite parents for the development of heterotic hybrids. Further, recombination breeding and random mating in conjunction with selection among segregates (recurrent selection), synthetics and composites may be exploited to harness the utility of both additive and non-additive gene effects in cauliflower [73]. The high SCA effects are not always correlated with significantly high heterosis and concurrently, the heterotic crosses exhibiting high mid parent and better parent heterosis.

Cluster analysis, allelic diversity, and genetic structure
Morphological and molecular diversity is vital in selecting desirable parents for hybrid breeding. Identifying heterotic groups and analyzing the existing genetic variation in CMS lines is a preliminary requisite for the efficient use of elite CMS lines in heterosis breeding. The study of genetic diversity at the morphological and molecular levels has been regarded as a potential tool for identifying promising parental lines for developing heterotic hybrids in B. oleracea [8,78-80]. Based upon PCA and HCA of the 26 parental lines for 16 phenotypic traits, it was evident that the parental lines had sufficient genetic variation. PCA and NJ clustering based on molecular data give more informative results and are useful in crop improvement programmes [8]. The female parents with significantly high GCA in the desirable direction for traits related to earliness and high yield could be useful in developing early-maturing and high-yielding hybrids. Thus, information on morphological and genetic diversity along with GCA could be useful in selecting desirable CMS lines as female parents for the development of cultivars with desirable traits.
We also observed a high allele number, with 511 alleles overall across 87 genomic-SSR and EST-SSR loci in the 26 parental CMS and DH lines, averaging 5.87 alleles per locus. Allelic diversity could be due to variation in the germplasm used in the study, differences in marker detection methods, the number of markers from transcribed regions of the genome, etc. [78,79,81,82]. Varying allele numbers have been reported by various workers in Brassica crops with different sets of germplasm and molecular markers [78,79]. The PIC is used in genetic studies as a measure of the informativeness of a marker locus for linkage analysis, and it categorizes markers as highly informative (PIC ≥ 0.5), reasonably informative (0.25 < PIC < 0.5) and slightly informative (PIC < 0.25) [78,83]. In the present study, the PIC of the 87 polymorphic loci ranged from 0.24 to 0.80, which classified the 87 loci (g-SSR and EST-SSRs) as slightly informative (1 primer, cnu107), reasonably informative (12 primers) and highly informative (74 primers). The mean PIC of 0.63 in the present investigation, based on 87 genomic-SSR and EST-SSRs, was higher than the mean PIC of 0.316 observed for 165 cauliflower inbred lines by Zhu et al. [84] and the 0.60 recorded for 57 genotypes of Brassica oleracea comprising 51 cultivars of cauliflower by Zhao et al. [85]. The DH lines developed through microspore culture were highly divergent from most of the CMS lines. Therefore, these DH lines will be instrumental in developing heterotic combinations in snowball cauliflower, where the genetic base is very narrow. Moreover, the divergence of the DH lines points to the role of DH technology in creating more diversity in different plant species. Minor admixture from the other clusters was observed in every cluster, indicating gene flow among the parental lines of different groups.
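As an illustration of the PIC classes described above, the following sketch computes PIC for one locus from allele counts and assigns it to a class. It assumes the widely used formula of Botstein et al., PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j². The text here cites [78,83] for PIC but does not restate the formula, so its use is an assumption. The function names and allele counts are hypothetical.

```python
from itertools import combinations

def pic(allele_counts):
    """Polymorphism information content of one locus (Botstein-type formula, assumed)."""
    total = sum(allele_counts)
    p = [c / total for c in allele_counts]           # allele frequencies
    homozygosity = sum(x * x for x in p)             # sum of p_i^2
    # correction term: sum over allele pairs of 2 * p_i^2 * p_j^2
    pair_term = sum(2 * a * a * b * b for a, b in combinations(p, 2))
    return 1.0 - homozygosity - pair_term

def classify_pic(value):
    """Classes as given in the text; a value of exactly 0.25 is placed in the
    lowest class here, which is an arbitrary choice made for this sketch."""
    if value >= 0.5:
        return "highly informative"
    if value > 0.25:
        return "reasonably informative"
    return "slightly informative"

# Hypothetical allele counts at three loci (illustrative only)
for counts in ([13, 13], [7, 7, 6, 6], [25, 1]):
    value = pic(counts)
    print(counts, round(value, 3), classify_pic(value))
```

A locus with many alleles at balanced frequencies approaches the upper end of the scale, whereas a locus dominated by a single allele falls into the slightly informative class.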

Association of genetic distances and combining ability with heterosis
Numerous studies in different crops have shown the utility of genetic distances in predicting heterotic crosses [23,25-29], assuming a positive correlation between genetic distance and heterosis [86]. However, the correlation between GD and heterosis is not absolute, and a significantly high level of heterosis may result when parents with low, intermediate or high genetic distance are used. Genetic distances based on both phenotypic and genotypic data are used to study the genetic variation among genotypes or parental inbred lines. Contrasting reports are available regarding the correlations among genetic distances, heterosis and combining ability. In the present study, no correlation was observed between the two distance measures, based on morphological data (PD) and molecular data (GD). This is contrary to the findings of Gupta et al. [30], who reported a significant positive correlation between GD and PD (r = 0.2, P < 0.001) in pearl millet. The absence of correlation between the two distance measures might be because morphological traits showing continuous variation are largely influenced by environment and polygenic inheritance, besides linkage disequilibrium [30,87-89]. Neither distance measure displayed a significant correlation with SCA for any trait, suggesting that genetic distances might not be effective in predicting SCA effects. Su et al. [29] also reported no significant association between genetic distances and SCA in chrysanthemum. However, Tian et al. [34] and Lariepe et al. [28] reported a significant correlation between total GD and SCA for the length of the terminal raceme in rapeseed and for grain yield and plant height in maize, respectively. Thus, the association of GD with SCA is not absolute. Further, our results suggested that SCA effects had a strong and significant positive correlation with MPH and BPH for all the studied traits. Similar findings were reported by Zhang et al. [90], Su et al. [29] and Tian et al.
[34] in barley, chrysanthemum and rapeseed, respectively, indicating non-additive gene effects for heterosis. GD and PD differed in their ability to predict MPH and BPH for different traits. Neither GD nor PD displayed any significant correlation with MPH and BPH for days to 50% CM and CL. GD also exhibited no significant correlation with heterosis for LL and CoL. Similarly, PD showed no significant association with heterosis for the majority of traits, except LL. However, GD was significantly correlated in the desirable direction with MPH and BPH for important commercial traits, viz. PH, GPW, NCW, LW, CD and TMY. These results are in line with the theory proposed by Falconer and Mackay [86]. In general, GD had a greater magnitude of correlation with heterosis than PD for all the traits under study. The variability in correlation coefficients between heterosis for the respective traits and genetic distances may reflect the number of alleles controlling trait expression [26]. Similar findings were reported in maize by Wegary et al. [89], who highlighted the importance of GD over PD for predicting hybrid performance. They also reported a significant correlation of GD with heterosis for grain yield, plant height and ear height, and of morphological distance with heterosis for certain traits, in quality protein maize. Jarosz [33] reported a significant association of GD (based on RAPD and AFLP markers) with heterosis for total and marketable yield in carrot. On the other hand, Tian et al. [34] and Su et al. [29] reported no significant correlation of PD and GD with MPH and BPH for any trait in rapeseed and chrysanthemum, respectively. Our results are also contrary to the findings of Geleta et al. [91] and Kawamura et al. [27] in pepper and Chinese cabbage, respectively, which suggested no utility of GD in predicting heterosis. Meanwhile, Krishnamurthy et al.
[92] suggested selecting parents with intermediate divergence based on AFLP markers to obtain a greater number of heterotic hybrids for yield using CMS lines in hot pepper. In the cole group of vegetables, we found only a single report describing interrelationships between genetic distances and heterosis, in broccoli (B. oleracea var. italica L.) by Hale et al. [26] using a DH-based population. They observed a significantly negative correlation between total GD (based on SRAP, AFLP and SSR markers) and heterosis for all the traits, suggesting a reduction in heterosis with increasing genetic distance. Thus, our study is the first comprehensive report on the interrelationships between GD (based on SSR and EST-SSRs) and heterosis for commercial traits in snowball cauliflower, suggesting a significant correlation in the desirable direction for important yield-related traits. Hence, based on this study, we recommend the application of genomic-SSR and EST-SSR based genetic distances in predicting heterosis for yield and commercial traits. The non-significant or poor correlation between GD and heterosis for certain traits might be due to a lack of linkage between the alleles responsible for expression of a particular trait and the molecular markers used for estimating GD, inadequate coverage of the genome, and epistasis. Besides, a lack of correlation may also be due to the use of DNA markers from non-expressed regions of the genome having no interaction with commercial traits and heterosis [26,40,62,93]. Molecular marker-based GD would be more predictive of heterosis when there are strong dominance effects among hybrids, high heritability, and linkage between the molecular markers and QTLs for the traits of interest [26,40,62,93].
Hence, from our findings, it is evident that the value of genetic distances in predicting heterosis depends upon the method used to calculate the genetic distances, the type of molecular markers, genome coverage, the region of the genome sampled, the crop, the breeding system, the traits under consideration, the type of germplasm and the environmental conditions. The high correlation between GD and heterosis for important traits may also be because completely homozygous DH lines were used as male parents. The presence of minor heterozygosity in conventionally developed inbreds may also have resulted in the poor correlation between GD and heterosis in most of the earlier studies.
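The relationship examined in this section can be sketched numerically. The example below assumes the standard definitions MPH = (F1 - MP)/MP × 100 and BPH = (F1 - BP)/BP × 100, and a plain Pearson correlation between parental GD and heterosis across crosses. The function names, the `higher_is_better` switch (for traits such as days to maturity, where the earlier parent is the better one) and all values are hypothetical and serve only as illustration.

```python
import math

def mid_parent_heterosis(f1, p1, p2):
    """MPH (%) relative to the mean of the two parents."""
    mp = (p1 + p2) / 2
    return (f1 - mp) / mp * 100

def better_parent_heterosis(f1, p1, p2, higher_is_better=True):
    """BPH (%) relative to the better parent; for earliness-type traits the
    better parent is the one with the lower value (assumption of this sketch)."""
    bp = max(p1, p2) if higher_is_better else min(p1, p2)
    return (f1 - bp) / bp * 100

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical crosses: (GD between parents, F1 mean, parent 1 mean, parent 2 mean)
crosses = [(0.30, 11.0, 9.0, 11.0),
           (0.45, 13.0, 10.0, 12.0),
           (0.60, 15.0, 9.5, 11.5),
           (0.75, 14.0, 10.0, 10.0)]
gd = [c[0] for c in crosses]
mph = [mid_parent_heterosis(f1, p1, p2) for _, f1, p1, p2 in crosses]
print("MPH (%):", [round(v, 1) for v in mph])
print("r(GD, MPH):", round(pearson_r(gd, mph), 3))
```

A coefficient computed this way describes only the linear association across the sampled crosses. As noted above, its magnitude depends on the markers, the germplasm and the trait considered.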

Conclusions
The current study is the first of its kind to determine heterotic groups based on combining ability for morphological, yield and commercial traits using Ogura cytoplasm-based CMS lines and DH-based testers. We also presented the first comprehensive report on the association of genomic-SSR and EST-SSR based GD and morphological trait-based PD with heterosis involving CMS and DH parental lines in snowball cauliflower. Analysis of variance of the parents and their testcross progenies revealed the presence of significant genetic variability. The present investigation also emphasizes the relevance of both GCA and SCA in selecting elite parents for the improvement of yield and commercial traits, and in choosing appropriate breeding strategies for crop genetic improvement and for developing high-yielding hybrids, synthetics and composites in cauliflower. The high and significant correlation of SCA with heterosis suggested a role for non-additive gene effects in heterosis. It was also evident that the development of DH lines could broaden the genetic base of a crop by creating more diversity in the existing population. The findings of our study further suggested that genetic distances based on genomic-SSRs and EST-SSRs can be used as a predictor of heterosis for commercial traits in CMS and DH based F1 cauliflower hybrids. The contrasting results obtained in earlier studies regarding the efficacy of genetic distances in predicting heterosis invite further investigation with different and larger sets of molecular markers covering the entire genome, and with different sets of parental germplasm, in multiple standard environments.

Table. Estimates of Mean Squares and R2 for vegetative and commercial traits in Alpha Lattice Design. * = significant at 5% probability, ** = significant at 1% probability, *** = significant at 0.1%, **** = significant at 0.01% probability through F test, Rep = Replication, Blk = Block, Trt = Treatment, Rep (Blk) Adj = Rep (Blk) Adjustable, Days to 50% CI = Days to 50% curd initiation, Days to 50% CM = Days to 50% curd maturity, PH = Plant height, GPW = Gross plant weight, MCW = marketable curd weight, NCW = net curd weight, LL = leaf length, LW = leaf width, NoL = No of leaves, CL = curd length, CD = curd diameter, CoL = core length, CSI = curd size index, LSI = leaf size index, HI = harvest index, TMY = total marketable yield, R2 = coefficient of determination. (DOCX)

S3 Table. Estimates of SCA effects of 120 test cross progenies for yield and horticultural traits. * = significant at 5% probability, ** = significant at 1% probability, *** = significant at 0.1%, **** = significant at 0.01% probability through F test, CD = critical difference, Days to 50% CI = Days to 50% curd initiation, Days to 50% CM = Days to 50% curd maturity, PH = Plant height, GPW = Gross plant weight, MCW = marketable curd weight, NCW = net curd weight, LL = leaf length, LW = leaf width, NoL = No of leaves, CL = curd length, CD = curd diameter, CoL = core length, CSI = curd size index, LSI = leaf size index, HI = harvest index, TMY = total marketable yield.

Table. MPH of top ten crosses along with their BPH, mean performance and SCA effects (value in parenthesis) for 8 vegetative traits. * = significant at 5% probability, ** = significant at 1% probability, *** = significant at 0.1%, **** = significant at 0.01% probability through F test, MPH: Mid parent heterosis, BPH: better parent heterosis, SCA: specific combining ability (value in parenthesis). (DOCX)

S7 Table. MPH of top ten crosses along with their better parent heterosis and SCA effects (value in parenthesis) for 8 commercial traits. * = significant at 5% probability, ** = significant at 1% probability, *** = significant at 0.1%, **** = significant at 0.01% probability through F test, MPH: Mid parent heterosis, BPH: better parent heterosis, SCA: specific combining ability (value in parenthesis). (DOCX)