Brazilian germplasm of winter squash (Cucurbita moschata D.) displays vast genetic variability, allowing identification of promising genotypes for agro-morphological traits

Winter squash fruits (Cucurbita moschata D.) are among the best sources of vitamin A precursors and are sources of bioactive components such as phenolic compounds and flavonoids. Approximately 70% of C. moschata seed oil consists of unsaturated fatty acids, with high levels of monounsaturated fatty acids and components such as vitamin E and carotenoids, which represents a promising nutritional aspect in the production of this vegetable. C. moschata germplasm expresses high genetic variability, especially in Brazil. We assessed 91 C. moschata accessions from different regions of Brazil, maintained at the Federal University of Viçosa (UFV) Vegetable Germplasm Bank, to identify early-flowering accessions with high levels of carotenoids in the fruit pulp and high yields of seed and seed oil. The accessions showed high variability in the number and mass of seeds per fruit, number of accumulated degree-days for flowering, total carotenoid content, and fruit productivity, which allowed selection for considerable gains in these characteristics. Analysis of the correlations between these characteristics provided information that will assist selection to improve this crop. Cluster analysis resulted in the formation of 16 groups, confirming the variability of the accessions. Per se analysis identified accessions BGH-6749, BGH-5639, and BGH-219 as those with the earliest flowering. Accessions BGH-5455A and BGH-5598A had the highest carotenoid content, with averages greater than 170.00 μg g⁻¹ of fresh mass. With a seed oil productivity of 0.13 t ha⁻¹, accessions BGH-5485A, BGH-4610A, and BGH-5472A were the most promising for seed oil production. The last two of these accessions also had high seed productivity, averaging 0.58 and 0.54 t ha⁻¹, respectively.
This study confirms the high potential of this germplasm for use in breeding for promotion of earlier flowering and increase in total content of fruit pulp carotenoids and in seed and seed oil productivity.

Introduction
The assessment of germplasm makes it possible to estimate the magnitude of genetic and statistical parameters of characteristics of interest, which provides information on the nature of the variability observed for these traits and elucidates which characteristics, or groups of characteristics, contribute most to germplasm variability. Such an assessment also makes it possible to estimate the associations between the characteristics evaluated. Together, this information is essential for optimising the use and management of plant germplasm.
The UFV Vegetable Germplasm Bank (BGH-UFV) maintains more than 350 accessions of C. moschata, constituting one of the largest collections of this species in Brazil [34]. This bank continually characterises and evaluates this germplasm [35], work that has identified sources of resistance to important phytopathogens [36] and supported improvements in production [21] and in the nutritional aspects of fruits and seed oil [10,37]. The potential of this germplasm as a source of genes for improving this crop, together with the possibility of elucidating the genetic mechanisms underlying important production parameters, justifies continued studies on its assessment and use.
This study therefore aimed to: a) agro-morphologically assess some of the C. moschata accessions maintained by BGH-UFV, b) analyse the genetic relationships of these agro-morphological characteristics, and c) analyse their agro-morphological variability, with a view to identifying earlier-flowering genotypes, genotypes with high total levels of carotenoids in the fruit pulp, and those with high potential for seed and seed oil productivity.

Origin of germplasm and preparation of seedlings
In this study, we assessed 95 genotypes, comprising 91 accessions of C. moschata maintained in the BGH-UFV, and four control genotypes (Fig 1). The controls comprised the commercial hybrids Tetsukabuto and Jabras, and the cultivars Jacarezinho and Maranhão, all widely cultivated and commercialised in Brazil. The accessions came from different regions of Brazil [35], and consisted, for the most part, of landraces collected from family-based farmers, who commonly select the genotypes and conserve their seeds.
Seedlings were produced in 72-cell expanded-polystyrene trays containing commercial substrate. Transplanting and crop management followed local recommendations for the cultivation of pumpkins [38].

Experiment location and experimental design
The experiment was carried out from January to July 2016 at "Horta Velha" (20°45'14" S, 42°52'53" W; 648.74 m altitude), an experimental unit of the Agronomy Department of the Federal University of Viçosa, Viçosa-MG, Brazil.
The experiment was arranged in Federer's augmented block design [39], with five replications of each control. The four controls, also called common treatments, were randomly allocated within each of the five blocks, whereas the 91 accessions, called regular treatments, were unreplicated and randomly distributed among the blocks. A spacing of 3 × 3 m between plants and rows resulted in a stand of 1,111 plants ha⁻¹. Each plot consisted of five plants, and all assessments were carried out on the three central plants. Fruit and seed characteristics were evaluated on three fruits per plant.
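The layout described above can be sketched as follows. This is a minimal illustration, not the randomisation used in the study: the function names and accession IDs are hypothetical, and splitting the unreplicated accessions as evenly as possible among blocks is our assumption (the text states only that they were randomly distributed). The stand density follows from the 3 × 3 m spacing as 10,000 m² divided by 9 m² per plant.

```python
import random


def augmented_layout(accessions, controls, n_blocks, seed=None):
    """Hypothetical sketch of a Federer augmented block layout.

    Every control appears once in every block; each accession appears once
    in the whole trial. Plot order is randomised within each block.
    """
    rng = random.Random(seed)
    shuffled = list(accessions)
    rng.shuffle(shuffled)
    # Assumption: spread the unreplicated accessions as evenly as possible.
    per_block = [shuffled[b::n_blocks] for b in range(n_blocks)]
    layout = {}
    for b, block_accessions in enumerate(per_block, start=1):
        plots = list(controls) + block_accessions
        rng.shuffle(plots)  # randomise plot positions within the block
        layout[b] = plots
    return layout


def plants_per_hectare(row_spacing_m, plant_spacing_m):
    """Stand density for a rectangular spacing (10,000 m2 per hectare)."""
    return 10_000 / (row_spacing_m * plant_spacing_m)


controls = ["Tetsukabuto", "Jabras", "Jacarezinho", "Maranhão"]
accessions = [f"acc{i:02d}" for i in range(1, 92)]  # 91 placeholder IDs
layout = augmented_layout(accessions, controls, n_blocks=5, seed=2016)
print({b: len(plots) for b, plots in layout.items()})
print(round(plants_per_hectare(3, 3)))  # 1111
```

With 91 accessions and five blocks, this split gives one block of 19 accessions and four of 18, each plus the four controls.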

Assessments of agro-morphological aspects, total carotenoid content of fruit pulp, and seed and seed oil yields
For the assessment of multi-categorical characteristics, we adopted the morphological descriptors suggested by Bioversity International and the European Cooperative Programme for Plant Genetic Resources (ECPGR), plus some additional descriptors.
These descriptors comprised agro-morphological characteristics of plants, fruits, and seeds (S1 Table). The assessment also included agronomic characteristics, the total content of fruit pulp carotenoids, productivity of seeds, and seed oil productivity (Table 1).
Table 1. Descriptors involving agronomic aspects of plants, fruits and seeds, used in the assessment of the C. moschata germplasm maintained by BGH-UFV.
Reproductive phase: accumulated degree-days for flowering (DDF).

The total carotenoid (TC) and lutein (L) contents of fruit pulp were estimated from colorimetric parameters. For this, the colour of the fruit pulp was characterised with a manual tristimulus colorimeter (Colour Reader CR-10, Konica Minolta) using parameters related to luminosity and to the contribution of red (a) and yellow (b). The pulp was characterised on one fruit from each of the three central plants of the plot, at four different parts of the fruit (the part facing the sun, the part facing the soil, the part by the peduncle, and the floral insertion part). The value of each parameter was the average obtained from the pulp of the fruits harvested from the plot's central plants. In these estimates, TC corresponds to the total content of fruit pulp carotenoids (μg g⁻¹ of fresh pulp mass), and L corresponds to the lutein content of fruit pulp (μg g⁻¹ of fresh pulp mass).
The seed oil was extracted by cold pressing with a 30-ton-capacity press, adapted as necessary for this purpose. The seeds were first dried in a forced-air circulation oven for 72 hours at 23 °C. To standardise the process, 50 g seed samples were weighed from each accession, and each sample was pressed for approximately 10 minutes.
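The arithmetic linking the pressed samples to seed oil content (SOC) and seed oil productivity (SOP) is not spelled out here. The sketch below assumes that SOC is the mass of pressed oil as a percentage of the 50 g sample and that SOP is seed productivity (PS) multiplied by SOC; both definitions, the function names, and the 11 g oil value are illustrative assumptions rather than the study's procedure.

```python
def seed_oil_content(oil_mass_g, seed_mass_g=50.0):
    """Assumed SOC (%): pressed oil mass relative to the seed sample mass."""
    if seed_mass_g <= 0:
        raise ValueError("seed mass must be positive")
    return 100.0 * oil_mass_g / seed_mass_g


def seed_oil_productivity(seed_productivity_t_ha, soc_percent):
    """Assumed SOP (t/ha): seed productivity scaled by the oil fraction."""
    return seed_productivity_t_ha * soc_percent / 100.0


soc = seed_oil_content(11.0)            # 22.0 % from a 50 g sample
sop = seed_oil_productivity(0.58, soc)  # about 0.128 t/ha
print(soc, round(sop, 3))
```

Because cold pressing does not recover all of the oil, a value computed this way would describe pressed-oil yield rather than total lipid content.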

Estimation of genotypic values, components of variance and genetic-statistical parameters
Phenotypic data were analysed using restricted maximum likelihood (REML) and best linear unbiased prediction (BLUP) procedures, carried out in the R program with the "lme4" package [41]. Variance components were estimated by REML, while the genotypic values of accessions (BLUPs) and controls (BLUEs) were obtained from the solutions of the mixed model. All estimates were based on the following model:
$$\mathbf{y} = \mathbf{W}\mathbf{b} + \mathbf{X}\mathbf{a} + \mathbf{Z}\mathbf{t} + \mathbf{e}$$
in which y corresponds to the phenotypic data vector; b corresponds to the vector of block effects, assumed to be random; a corresponds to the vector of accession effects, assumed to be random; t corresponds to the vector of control effects, assumed to be fixed; and e corresponds to the error vector.
W, X and Z are the incidence matrices relating b, a, and t, respectively, to the data vector y.
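To make the roles of BLUPs and BLUEs concrete, the sketch below solves Henderson's mixed model equations for the model described above, with random blocks, random accessions, and fixed controls. It is not the lme4 fit used in the study: the variance components are supplied rather than estimated by REML, the data are simulated, the helper names are hypothetical, and, as is common when analysing augmented designs, we assume the fixed part also contains one shared level for all accessions so that their BLUPs are deviations from the accession mean.

```python
import random


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def mme_blup(records, controls, var_b, var_a, var_e):
    """Henderson's mixed model equations for y = Wb + Xa + Zt + e.

    records: (block, entry, y) tuples; entries not in `controls` are accessions.
    Returns BLUEs (fixed levels), block BLUPs and accession BLUPs.
    """
    blocks = sorted({blk for blk, _, _ in records})
    accs = sorted({e for _, e, _ in records if e not in controls})
    fixed = list(controls) + ["accessions"]  # assumed shared accession level
    p, qb, qa = len(fixed), len(blocks), len(accs)
    n = p + qb + qa
    C = [[0.0] * n for _ in range(n)]
    rhs = [0.0] * n
    for blk, entry, y in records:
        cols = [fixed.index(entry if entry in controls else "accessions"),
                p + blocks.index(blk)]
        if entry not in controls:
            cols.append(p + qb + accs.index(entry))
        for i in cols:  # accumulate T'T and T'y one observation at a time
            rhs[i] += y
            for j in cols:
                C[i][j] += 1.0
    for k in range(qb):  # shrinkage terms var_e / var for the random effects
        C[p + k][p + k] += var_e / var_b
    for k in range(qa):
        C[p + qb + k][p + qb + k] += var_e / var_a
    sol = solve(C, rhs)
    return (dict(zip(fixed, sol[:p])),
            dict(zip(blocks, sol[p:p + qb])),
            dict(zip(accs, sol[p + qb:])))


rng = random.Random(42)
controls = ["Tetsukabuto", "Jabras", "Jacarezinho", "Maranhão"]
accessions = [f"acc{i:02d}" for i in range(12)]
records = []
for blk in range(1, 4):  # 3 simulated blocks, 4 unreplicated accessions each
    records += [(blk, c, 10.0 + blk + rng.gauss(0, 1)) for c in controls]
    records += [(blk, a, 12.0 + blk + rng.gauss(0, 2))
                for a in accessions[(blk - 1) * 4: blk * 4]]
blues, block_blups, acc_blups = mme_blup(records, controls, 1.0, 4.0, 2.0)
# Genotypic value of an accession = shared accession BLUE + its BLUP.
genotypic = {a: blues["accessions"] + u for a, u in acc_blups.items()}
```

A useful sanity check of this formulation is that the accession BLUPs, and likewise the block BLUPs, sum to zero.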

Correlation analysis
This analysis was based on the matrix of genetic correlations, obtained from the following estimator:
$$r_{g(xy)} = \frac{\mathrm{Cov}_g(x, y)}{\sqrt{\sigma^2_g(x)\,\sigma^2_g(y)}}$$
in which Cov_g(x, y) corresponds to the genetic covariance between two variables X and Y, and σ²_g(x) and σ²_g(y) correspond to the genetic variances of X and Y, respectively. The correlations were analysed using a correlation network, which allows all pairwise relationships between the variables under study to be visualised simultaneously and distinguishes both the direction and the magnitude of the correlations. Direction is denoted by colour: dark green lines connect positively correlated variables, and red lines connect negatively correlated variables. Magnitude is denoted by line thickness: the thicker the line, the stronger the correlation. The significance of the correlations was assessed using Mantel's Z test at 1% and 5% probability. The correlation analysis was performed with the Genes program [43].
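The estimator and the network encoding can be illustrated together: each genetic correlation is the genetic covariance divided by the square root of the product of the two genetic variances, and each edge is coloured by sign and sized by magnitude. The sketch below uses a hypothetical genotypic covariance matrix and hypothetical function names; it does not reproduce the Genes program or the Mantel test.

```python
import math


def genetic_correlation(cov_g, var_g_x, var_g_y):
    """r_g = Cov_g(x, y) / sqrt(var_g(x) * var_g(y))."""
    return cov_g / math.sqrt(var_g_x * var_g_y)


def correlation_edges(names, gcov, min_abs=0.0):
    """Turn a genotypic covariance matrix into network edges.

    Each edge carries the correlation, a colour for its direction (dark green
    for positive, red for negative) and a width proportional to its magnitude.
    """
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = genetic_correlation(gcov[i][j], gcov[i][i], gcov[j][j])
            if abs(r) >= min_abs:
                colour = "darkgreen" if r >= 0 else "red"
                edges.append((names[i], names[j], r, colour, abs(r)))
    return edges


# Hypothetical genotypic (co)variances, chosen only for illustration.
names = ["PF", "MF", "SOC"]
gcov = [[4.0, 2.44, -0.3],
        [2.44, 4.0, 0.1],
        [-0.3, 0.1, 1.0]]
for a, b, r, colour, width in correlation_edges(names, gcov):
    print(f"{a}-{b}: r = {r:.2f}, {colour}, width {width:.2f}")
```

The `min_abs` argument mimics the common practice of hiding weak edges to keep the network readable.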

Analysis of variability and clustering
The analysis of variability used both quantitative and multi-categorical information. For quantitative data, the distance matrix between genotypes was obtained from the BLUPs of the accessions and the BLUEs of the controls; the genetic distances were based on the negative average Euclidean distance, with data standardisation.
The matrix was obtained with negDistMat, a function of the APCluster package [44] implemented in the R program, version 3.5.1 [45]. The distance d(x, y) between any two accessions x = (x_1, ..., x_v) and y = (y_1, ..., y_v) was estimated from the following equation:
$$d(x, y) = -\frac{1}{v}\sqrt{\sum_{j=1}^{v}\left(x_j - y_j\right)^2}$$
in which v corresponds to the number of quantitative descriptors evaluated.
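The negative average Euclidean distance can be sketched directly as d(x, y) = -(1/v)·√Σ(x_j - y_j)², computed on standardised descriptors. This implements the formula rather than calling APCluster's negDistMat, so details such as standardising each descriptor to mean 0 and sample standard deviation 1 are our assumptions, and the function names are hypothetical.

```python
import math


def standardise(rows):
    """Scale each descriptor (column) to mean 0 and sample SD 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (len(c) - 1))
           for c, m in zip(cols, means)]
    return [[(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, sds)]
            for row in rows]


def neg_avg_euclidean(rows):
    """d(x, y) = -(1/v) * sqrt(sum_j (x_j - y_j)^2) on standardised data."""
    z = standardise(rows)
    v = len(z[0])  # number of quantitative descriptors
    return [[-math.sqrt(sum((a - b) ** 2 for a, b in zip(zi, zj))) / v
             for zj in z] for zi in z]


# Hypothetical genotypic values: three genotypes, two descriptors.
D = neg_avg_euclidean([[0.0, 0.0], [1.0, 1.0], [0.5, 0.2]])
```

Values are zero on the diagonal and negative elsewhere, so larger (less negative) entries mean more similar genotypes, which is the orientation affinity propagation expects.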
The distance matrix for the qualitative data was obtained using the arithmetic complement of the simple coincidence (matching) index. The variability analysis was performed on a single distance matrix, obtained by summing the quantitative and qualitative distance matrices; before summing, both matrices were standardised and given equal weights. The variability analysis was performed using the affinity propagation method [46], and the grouping was repeated in 100 independent runs to assess its consistency.
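The combination step can be sketched as follows. The complement of the simple matching index is one minus the proportion of descriptors on which two accessions agree. The text does not say how the matrices were standardised, so dividing each by its largest entry is our assumption, as are the function names and inputs; the quantitative matrix is supplied here as positive distances (the absolute values of the negative distances), and the sum is negated so that it can serve as a similarity matrix.

```python
def simple_matching_distance(a, b):
    """Complement of the simple matching (coincidence) index."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 1.0 - matches / len(a)


def rescale(D):
    """Assumed standardisation: divide by the largest entry (range 0-1)."""
    top = max(max(row) for row in D)
    return [[d / top if top > 0 else 0.0 for d in row] for row in D]


def combined_similarity(D_quant, qualitative_rows):
    """Equal-weight sum of standardised quantitative and qualitative distances,
    negated so that larger values mean more similar accessions."""
    n = len(qualitative_rows)
    D_qual = [[simple_matching_distance(qualitative_rows[i], qualitative_rows[j])
               for j in range(n)] for i in range(n)]
    Dq, Dc = rescale(D_quant), rescale(D_qual)
    return [[-(0.5 * Dq[i][j] + 0.5 * Dc[i][j]) for j in range(n)]
            for i in range(n)]


# Hypothetical inputs: positive quantitative distances and categorical scores.
D_quant = [[0.0, 2.0, 4.0], [2.0, 0.0, 3.0], [4.0, 3.0, 0.0]]
qual = [["round", "orange", "smooth"],
        ["round", "cream", "smooth"],
        ["flat", "cream", "warty"]]
S = combined_similarity(D_quant, qual)
```

Giving each matrix a weight of 0.5 rather than 1 only rescales the result and does not change the ordering of similarities.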
Affinity propagation begins by identifying, within a set of components, samples that will act as the centres (exemplars) of the set. The method simultaneously considers all components as potential exemplars, i.e. as nodes in an interconnected network. Messages are then exchanged between components along the network until a good set of exemplars and their corresponding groups emerges. The messages exchanged are of two kinds: "responsibility", r(i, k), and "availability", a(i, k). The responsibility reflects the accumulated evidence of how well suited point k is to serve as the exemplar for point i, considering all other potential exemplars for that point. The availability, in turn, reflects the accumulated evidence of how appropriate it would be for point i to choose point k as its exemplar, considering the other points for which k can be an exemplar [46]. In the present study, the availabilities were initialised to zero.
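The message passing described above can be written compactly. The sketch below is a plain-Python version of the standard affinity propagation updates introduced by Frey and Dueck, with availabilities initialised to zero. The damping of 0.5, the fixed number of iterations in place of a convergence check, the negative squared Euclidean similarity, and the use of the median similarity as the preference (the self-similarity that controls how many exemplars emerge) are our assumptions; the study itself used the APCluster package.

```python
import statistics


def similarity_matrix(points):
    """Negative squared Euclidean similarity; diagonal = median preference."""
    n = len(points)
    S = [[-sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
          for j in range(n)] for i in range(n)]
    pref = statistics.median(S[i][j] for i in range(n) for j in range(n) if i != j)
    for i in range(n):
        S[i][i] = pref
    return S


def affinity_propagation(S, damping=0.5, max_iter=200):
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k), start at zero
    for _ in range(max_iter):
        # r(i,k) <- s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            first = max(range(n), key=vals.__getitem__)
            second = max(vals[k] for k in range(n) if k != first)
            for k in range(n):
                competitor = second if k == first else vals[first]
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - competitor)
        # a(k,k) <- sum_{i' != k} max(0, r(i',k))
        # a(i,k) <- min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        return [], [None] * n
    # Each non-exemplar joins the exemplar it is most similar to.
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
              for i in range(n)]
    return exemplars, labels


# Two hypothetical, well-separated groups of three genotypes each.
points = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (5.0, 5.0), (5.2, 5.0), (5.0, 5.2)]
exemplars, labels = affinity_propagation(similarity_matrix(points))
```

Unlike k-means, the number of groups is not fixed in advance; lowering the preference yields fewer exemplars, and raising it yields more.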
A principal component analysis was performed to identify the contribution of each trait to the clustering of the genotypes. This analysis considered both quantitative and multi-categorical traits, following the methodology of [47], and was implemented using the FactoMineR package [48].

Identification of promising accession groups and per se identification of accessions
To facilitate the identification of promising groups of accessions for each characteristic, the means of the genotypic values of the groups obtained in the variability analysis were themselves grouped using Tocher's method. The most promising accessions per se for each trait were identified by ranking their genotypic effects and estimating the corresponding genetic gain and new predicted mean; the top 15% were considered the most promising accessions.
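The per se ranking can be sketched as follows. We assume, as is common in BLUP-based ranking, that the gain reported at each rank is the mean genotypic effect of all accessions selected up to that rank, and that the new predicted mean is the general mean plus that gain; the second relation matches the figures reported for accumulated degree-days for flowering (606.64 - 355.55 = 251.09). The function name and the effect values are hypothetical.

```python
def rank_selection(effects, mu, top_fraction=0.15, lower_is_better=False):
    """Rank accessions by genotypic effect and report, for each selected one,
    the gain (mean effect of all accessions selected up to that rank) and
    the new predicted mean (mu + gain)."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1],
                    reverse=not lower_is_better)
    n_sel = max(1, round(top_fraction * len(ranked)))
    rows, running = [], 0.0
    for rank, (name, effect) in enumerate(ranked[:n_sel], start=1):
        running += effect
        gain = running / rank
        rows.append((rank, name, effect, gain, mu + gain))
    return rows


# Hypothetical DDF effects for 20 accessions; lower DDF means earlier flowering.
values = [-355.55, -300.0, -250.0] + [10.0 * k for k in range(17)]
effects = {f"acc{i:02d}": e for i, e in enumerate(values)}
rows = rank_selection(effects, mu=606.64, lower_is_better=True)
for rank, name, effect, gain, new_mean in rows:
    print(rank, name, effect, round(gain, 2), round(new_mean, 2))
```

Setting `lower_is_better=True` handles traits such as DDF, for which negative gains are desirable; for productivity or carotenoid content, the default ordering selects the largest effects.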

Results
Variance components and genetic-statistical parameters of the agronomic aspects, total content of fruit pulp carotenoids, and the characteristics of seeds and seed oil
Estimates of the variance components and the genetic-statistical parameters are presented in Table 2. The estimates of genotypic variance were highest for number of seeds per fruit (NSF) and mass of seeds per fruit (MSF), followed by accumulated degree-days for flowering (DDF) and total content of fruit pulp carotenoids (TC). Among these, only the genotypic variance of DDF was not significant. The estimates of variance associated with the block effect were low for all characteristics (Table 2).
For mass of seeds per fruit (MSF), number of seeds per fruit (NSF), total content of fruit pulp carotenoids (TC), and accumulated degree-days for flowering (DDF), most of the phenotypic variance was attributable to genotypic variance, with residual variance contributing less for most of the characteristics (Table 2).
As Table 2 also shows, most of the characteristics had high selection accuracy (A). Heritability estimates were 0.525, 0.495, and 0.774 for accumulated degree-days for flowering (DDF), productivity of fruits (PF), and total content of fruit pulp carotenoids (TC), respectively, while productivity of seeds (PS) had a heritability of 0.481 and seed oil productivity (SOP) of 0.291. Heritability was high (>0.50) for most of the characteristics and very high for seed characteristics such as mass of seeds per fruit (MSF), ratio of seed to fruit mass (MS/F), and number of seeds per fruit (NSF), and for fruit characteristics such as total content of fruit pulp carotenoids (TC) and average mass of fruit (MF) (Table 2).
The high estimates of genotypic variance and heritability indicated that considerable selection gains could be obtained for most of the characteristics (Table 2). For accumulated degree-days for flowering (DDF), the gain was -92.947, i.e. a reduction in the thermal time to flowering. Gains of 7.817 t ha⁻¹ for productivity of fruits (PF) and 20.426 μg g⁻¹ of fresh pulp mass for total content of fruit pulp carotenoids (TC) were also possible, while the potential gains for productivity of seeds (PS) and seed oil productivity (SOP) were 0.187 and 0.072 t ha⁻¹, respectively (Table 2).
The phenotypic range between accessions for accumulated degree-days for flowering (DDF) was 120.0 to 820.4 (average 606.642) (Table 2). The range for productivity of fruits (PF) was 0.7 to 44.6 t ha⁻¹ (average 12.946 t ha⁻¹), that for total content of fruit pulp carotenoids (TC) was 43.4 to 187.2 μg g⁻¹ of fresh pulp mass (average 65.763 μg g⁻¹), and that for productivity of seeds (PS) was 0.01 to 0.9 t ha⁻¹ (average 0.269 t ha⁻¹). The phenotypic range between accessions for seed oil productivity (SOP) was 0.004 to 0.40 t ha⁻¹ (average 0.050 t ha⁻¹) (Table 2).
The greatest ranges between accessions for the coefficients of genotypic variation (CVg%) were for mass of fruit (MF) and seed oil content (SOC), while for the coefficient of phenotypic variation (CVp%), the greatest ranges were for seed oil productivity (SOP) and accumulated degree-days for flowering (DDF). The estimates of the residual coefficient of variation ranged from 7.502 for total content of fruit pulp carotenoids (TC) to 71.582 for SOP (Table 2).

Table 2. Estimates of variance components and genetic-statistical parameters of agronomic aspects, total content of fruit pulp carotenoids, and yields of seeds and seed oil.

PLOS ONE

Genotypic correlations
A genotypic correlation network of the agronomic aspects, including the total content of fruit pulp carotenoids, and of the characteristics of seeds and seed oil is shown in Fig 2, which reveals cohesive groups involving some of the fruit characteristics and some of the seed characteristics. Fruit productivity (PF) was also cohesive with other characteristics of this group, such as average mass of fruits (MF), diameter of internal cavity of fruit (DIC), height of fruit (HF), diameter of fruit (DF), and thickness of fruit peel (TFP). As the colour and thickness of the lines show, this set of variables had high positive correlations. The highest correlations in this group were those of PF with MF (0.61) and of PF with DIC (0.54), both significant (p<0.01). Productivity of fruits (PF) and number of fruits per plant (NFP) showed a correlation of 0.39, and each showed high correlations with productivity of seeds (PS), 0.74 and 0.51, respectively, all of which were significant (p<0.001) (Fig 2). Accumulated degree-days for flowering (DDF) had low correlations with the other characteristics. Seed oil content (SOC) had negative, low-magnitude correlations with soluble solids of fruit pulp (SS) and resistance of fruit pulp to penetration (RP) (Fig 2).
There was cohesion between the group of variables involving seed productivity and variables such as the ratio of seed to fruit mass (MS/F), number of seeds per fruit (NSF), and mass of seeds per fruit (MSF). This set of variables had positive, high-magnitude correlations, the highest being that of productivity of seeds (PS) with MS/F (0.56, p<0.01). The group involving the mass of one hundred seeds (MOHS) and characteristics such as seed width (SW), seed thickness (ST), and seed length (SL) was also cohesive. This group had positive correlations, the highest being that of MOHS with SW (0.62, p<0.01) (Fig 2).

Genetic variability and clustering
Cluster analysis, based on the agro-morphological aspects, the total content of fruit pulp carotenoids, and the characteristics related to the yields of seed and seed oil of the germplasm, placed the accessions into 16 groups (Table 3).
Based on the clustering pattern, high variability was observed between the accessions. About 17% of the genotypes were in Group 11, together with the control Jabras. Group 1, the second largest, contained 13.18% of the accessions and two controls, Jacarezinho and Maranhão. Groups 5 and 14, with 10 and 11 accessions, respectively, were the next largest. The genotypes were not distributed evenly among the remaining groups, and some groups contained only one genotype (Table 3).
The heatmap of the clustering showed low similarity between the groups formed, as denoted by the predominance of yellow and orange (Fig 3). Visual analysis of this clustering also shows homogeneity of the distances between groups, denoted by the uniformity of the colouring. The fruit morphology representative of some of the groups obtained by clustering is shown in Fig 4. The principal component analysis (PCA) results concern the first 15 principal components, which together explained 55.56% of the total variation observed between the genotypes (Fig 5). Component 1, which explained 7.35% of the total variation, received the greatest contribution from quantitative variables, mainly average mass of fruits (MF), diameter of fruits (DF), and diameter of internal cavity of fruit (DIC). Component 2 received a greater contribution from the multi-categorical traits and explained 5.93% of the total variation (Fig 5). The PCA results for the fifteen principal components and the relative contribution of traits to each component are provided in S2 Table.

Identification of promising clusters and per se identification of promising genotypes
In order to facilitate the visualisation of clusters with the most desirable characteristics, a grouping of means of clusters was performed by the Tocher method (Table 4).

The lowest mean for accumulated degree-days for flowering (DDF) occurred in Group 16, which contained only the control Tetsukabuto, although most groups had intermediate averages for this characteristic (Table 4). The group with the highest mean for productivity of fruits (PF) was Group 4, formed by accessions BGH-1927, BGH-4681A, and BGH-5653. This group also had one of the highest averages for mass of fruits (MF) and an intermediate average for number of fruits per plant (NFP). For total content of fruit pulp carotenoids (TC), the highest average occurred in Group 7, formed by accessions BGH-5455A and BGH-5598A. Groups 1 and 6 had the highest averages for productivity of seeds (PS) and seed oil productivity (SOP), and Group 1 was also one of the largest groups (Table 4).
The identification per se of the most promising accessions for each trait, based on their respective genotypic effects, is shown in Tables 5 and 6. Also in these tables are the estimates, for each accession, of their genetic gains and the new predicted average for each trait.
The selected accessions had averages for accumulated degree-days for flowering (DDF) much lower than the general average of the accessions (606.64) and the average of the controls (526.41), with new predicted averages ranging from 474.39 to 251.09 and genetic gains from -132.25 to -355.55. Notably, accessions BGH-6749, BGH-5639, and BGH-2191 were the most promising for DDF (Table 5).
For productivity of fruits (PF), the selected accessions had higher averages than the general average of the accessions (12.95 t ha⁻¹) and the average of the controls (11.85 t ha⁻¹), with new predicted averages ranging from 15.49 to 29.27 t ha⁻¹. For total content of fruit pulp carotenoids (TC), the selected accessions also had much higher averages than the general average of the accessions (65.76 μg g⁻¹ of fresh pulp mass) and that of the controls (65.58 μg g⁻¹ of fresh pulp mass). The new predicted averages for this characteristic among the selected accessions ranged from 72.34 to 179.46 μg g⁻¹ of fresh pulp mass, and the most promising accessions for this characteristic were BGH-5455A and BGH-5598A (Table 5).
Table 3. Clustering of the C. moschata germplasm assessed in this study and maintained by BGH-UFV, based on agro-morphological aspects, the total content of fruit pulp carotenoids, and the yields of seeds and seed oil.
The identification per se of the most promising accessions for productivity of seeds (PS), seed oil content (SOC) and seed oil productivity (SOP), together with their respective genetic gains and new predicted averages for these characteristics is shown in Table 6.
For productivity of seeds (PS), the new predicted averages among the selected accessions ranged from 0.33 to 0.58 t ha⁻¹ and the genetic gains from 0.06 to 0.31 t ha⁻¹. Notably, accessions BGH-4610A, BGH-5485A, and BGH-6590 were the most promising for this characteristic (Table 6). The selected accessions differed little in seed oil content (SOC); however, their average was higher than that of the controls (16.73%). Finally, for seed oil productivity (SOP), the new predicted averages ranged from 0.12 to 0.13 t ha⁻¹ and the genetic gains from 0.07 to 0.08 t ha⁻¹. Accessions BGH-5485A, BGH-4610A, and BGH-5472A were the most promising for this characteristic (Table 6).

Discussion
Variance components and genetic-statistical parameters of the agronomic aspects, total content of fruit pulp carotenoids, and the characteristics of seeds and seed oil
As with other species, the usefulness of C. moschata germplasm conserved in banks depends on the level and quality of the information associated with it [30,31,32,33,49]. The samples of C. moschata maintained by BGH-UFV constitute one of the largest collections of this species in Brazil [34]. Studies assessing this germplasm have identified accessions with characteristics that are crucial for this crop, such as resistance to phytopathogens, and have supported its genetic improvement in terms of production and the nutritional aspects of its fruits and seed oil [10,21,36,37]. Although BGH-UFV maintains more than 350 accessions of Cucurbita spp. [35], part of this germplasm has not yet been assessed, which underlines the importance of continuing these studies.
Most C. moschata germplasm expresses vigorous growth and an indeterminate growth habit [50], and C. moschata plants commonly occupy a large area of cultivated land, making it difficult to assess this germplasm phenotypically in experimental designs such as randomised complete blocks. The main limitation of evaluating C. moschata germplasm in randomised blocks is the difficulty of ensuring satisfactory homogeneity throughout the experimental area. In addition, the germplasm seed samples kept in banks are in most cases small, making it impossible to replicate accessions throughout the experimental area to assess quantitative characteristics. In view of this, we evaluated part of the C. moschata germplasm maintained at BGH-UFV using Federer's augmented block design [39]. Federer describes all aspects of this design in detail and, according to him, the design circumvents the limitations mentioned above and can be adopted even when the propagating material is insufficient to establish more than one plot and when the number of samples to be evaluated is very large.
The present study describes the evaluation of one of the largest volumes of C. moschata germplasm. The high estimates of genotypic variance for characteristics related to seed production observed in this study corroborate those reported by [51], who also observed higher estimates of genotypic variance for the number of seeds per fruit and for flowering characteristics, as well as a greater contribution of genotypic variance to the phenotypic variance of these characteristics. Additionally, most of the characteristics assessed in this study gave high estimates of heritability (>0.50) according to the classification of [52], especially seed characteristics such as mass of seeds per fruit (MSF), ratio of seed to fruit mass (MS/F), and number of seeds per fruit (NSF), as well as fruit characteristics such as total content of fruit pulp carotenoids (TC) and mass of fruit (MF). High estimates of heritability point to a stronger correspondence between phenotype and genotype [53], indicating that most of the variability observed for these characteristics resulted from genotypic effects.
The high estimates of genotypic variance may be associated with the quantitative nature of these characteristics, which are likely influenced by a large number of genes [54]. Most of the germplasm evaluated in this study came from family-based farmers, who do not select either for seed characteristics or for earlier flowering. As already mentioned, the exchange of seeds between farmers and natural hybridisation between populations of C. moschata have increased the variability of this species, even for characteristics that are commonly under selection, such as fruit productivity.
Considerable predicted gains were obtained for most of the characteristics, considering the overall average of accessions. This result was associated with the high estimates of genotypic variance and heritability observed for most of the characteristics ( Table 2).
Table 4. Grouping of means of the genotypic values of the groups obtained in the analysis of variability for agro-morphological aspects, the total content of fruit pulp carotenoids, and productivities of seed and seed oil.
The average ratio between the coefficient of genotypic variation and the residual coefficient of variation was close to one for most of the characteristics. Although the residual coefficients of variation were high for most characteristics, they generally tended to be lower than the corresponding coefficients of genotypic variation, which demonstrates that most of the variability expressed by the germplasm was due to genetic factors (Table 2).
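The link between the coefficient ratio and heritability can be made explicit under common textbook definitions, assumed here rather than taken from Table 2: CVg% = 100·σg/mean, CVe% = 100·σe/mean, and plot-level heritability h² = σ²g/(σ²g + σ²e). Under these definitions h² = ratio²/(1 + ratio²), so a CVg/CVe ratio close to one corresponds to a heritability close to 0.5. The function name and values below are illustrative.

```python
import math


def genetic_parameters(var_g, var_e, mean):
    """Textbook plot-level parameters (assumed definitions, not Table 2's)."""
    h2 = var_g / (var_g + var_e)
    cv_g = 100.0 * math.sqrt(var_g) / mean
    cv_e = 100.0 * math.sqrt(var_e) / mean
    return {"h2": h2, "CVg%": cv_g, "CVe%": cv_e, "CVg/CVe": cv_g / cv_e}


for var_g in (0.5, 1.0, 2.0):  # residual variance fixed at 1.0
    p = genetic_parameters(var_g, 1.0, 10.0)
    print(f"CVg/CVe = {p['CVg/CVe']:.2f} -> h2 = {p['h2']:.2f}")
```

Because the trait mean cancels in the ratio, CVg/CVe depends only on σ²g/σ²e, which makes it comparable across traits measured in different units.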

Genetic correlation network
Analysis of correlations between characteristics has been widely used in plant breeding, where a large number of characteristics must often be considered simultaneously [55,56]. This analysis is often used to assist in indirect selection for certain characteristics [55,57]. However, as highlighted by [58], when indirect selection for a primary characteristic is practised through a secondary one, the heritability of the secondary characteristic must be greater than that of the primary one for selection to be efficient. Because average mass of fruits (MF) had a very high heritability, greater than that of productivity of fruits (PF) (0.495), and was highly correlated with PF (0.61), selecting genotypes with higher MF seems to be a promising alternative for obtaining higher fruit productivity in C. moschata. However, when selecting genotypes for higher fruit productivity in C. moschata, aspects crucial to their acceptability in the consumer market, such as the shape and size of fruits, must be considered. Currently, important pumpkin consumption centres such as the state of Minas Gerais and most of the southeast region of Brazil demand smaller fruits, and most of the consumption in these regions consists of fruits from hybrid cultivars such as Jabras and Tetsukabuto, which have a globular shape and weigh 2 to 3 kg [14]. In the north and northeast regions of Brazil, on the other hand, larger fruits, which are commonly sold in slices, are more acceptable. The prevention of waste and the ease of transport are determining aspects for the acceptability of fruit shapes, and the search for greater productivity in C. moschata must therefore also consider these aspects, balancing them against characteristics such as the number of fruits per plant (NFP), height of fruit (HF), and diameter of fruit (DF).
Based on the correlations obtained in this study, simultaneously considering a higher number of fruits per plant (NFP), higher productivity of fruits (PF), and a higher ratio of seed to fruit mass (MS/F) seems to be a promising alternative for obtaining higher seed productivity (PS) in C. moschata. The heritability estimates obtained for these characteristics (>0.42) suggest that reasonable gains are feasible with selection for each of them (Table 2). Thus, in addition to greater PF and NFP, the selection of genotypes with higher PS should also prioritise greater translocation of photoassimilates to seed production, as indicated by a higher ratio of seed to fruit mass (MS/F).
Table 5. Estimates of the genotypic effects, genetic gain and new predicted averages for the accumulated degree-days for flowering (DDF), fruit productivity (PF) and total content of fruit pulp carotenoids (TC), for the top 15% most promising accessions and the controls.
Despite its applicability, correlation analysis has some limitations, and, as warned by [59], the quantification and interpretation of correlation coefficients between two or more characteristics can lead to errors in the selection process. According to these authors, this occurs because a high correlation between two characteristics may result from the effect of one or more other characteristics. It is therefore recommended that analysis of the association between a primary and a secondary characteristic be accompanied by information on the direct and indirect effects of the secondary variables on the primary one [60], an approach known as path analysis [59].

Table 6. Estimates of the genotypic effects, genetic gain and new predicted averages for the productivity of seeds (PS), seed oil content (SOC), and seed oil productivity (SOP), for the top 15% most promising accessions and the controls.
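Path analysis decomposes each correlation between an explanatory trait and a response trait into a direct effect (a standardised partial regression coefficient) and indirect effects acting through the other traits. A minimal numpy sketch follows, assuming a correlation matrix among the explanatory traits is already available. The matrix values are invented for illustration, and the labels NFP, MF and MS/F are used only as examples, since no path analysis results are reported in this section.

```python
import numpy as np

# Hypothetical correlations among three explanatory traits (e.g. NFP, MF, MS/F)
# and between each of them and a response trait (e.g. PS). Illustrative only.
R_xx = np.array([
    [1.00, -0.30, 0.10],
    [-0.30, 1.00, 0.20],
    [0.10, 0.20, 1.00],
])
r_xy = np.array([0.55, 0.40, 0.60])


def path_analysis(R_xx: np.ndarray, r_xy: np.ndarray):
    """Return direct effects, the indirect-effect matrix and the residual effect.

    Direct effects p solve R_xx @ p = r_xy. The indirect effect of trait i
    through trait j is R_xx[i, j] * p[j], so the direct effect of trait i plus
    its row of indirect effects reproduces the correlation r_xy[i].
    """
    p = np.linalg.solve(R_xx, r_xy)
    indirect = R_xx * p[np.newaxis, :]   # element (i, j) = r_ij * p_j
    np.fill_diagonal(indirect, 0.0)      # diagonal entries would be the direct effects
    r2 = float(p @ r_xy)                 # coefficient of determination of the path model
    residual = np.sqrt(max(0.0, 1.0 - r2))
    return p, indirect, residual


p, indirect, residual = path_analysis(R_xx, r_xy)
for name, direct, ind in zip(["NFP", "MF", "MS/F"], p, indirect.sum(axis=1)):
    print(f"{name}: direct = {direct:+.3f}, total indirect = {ind:+.3f}")
print(f"residual effect = {residual:.3f}")
```

With genotypic correlation matrices that are close to singular, direct effects can become unstable, which is one reason the correlation matrix is usually checked for multicollinearity before this decomposition is interpreted.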

Despite these limitations, correlation analysis has proven to be quite useful in plant breeding, mainly in indirect selection for one or more main characteristics that have low heritability or are difficult to assess. This indirect selection is based on secondary characteristics with greater heritability or ease of assessment, which can provide faster genetic gains than direct selection. In fact, correlation analysis has assisted in indirect selection for root characteristics [61], for productivity in different crops [62,63,64], and for nutritional aspects and fruit quality [65,66]. Correlation analysis can also be very useful in the characterisation and management of plant germplasm, as it may optimise the choice and number of descriptors used in this process.
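The heritability requirement for indirect selection can be made explicit with the standard correlated-response expression of quantitative genetics: with equal selection intensity, the response in a primary trait Y obtained by selecting on a secondary trait X, relative to direct selection on Y, is r_g·h_X/h_Y, where h is the square root of heritability and r_g is the genetic correlation. The Python sketch below evaluates this ratio; the function name and all numbers are hypothetical and are not estimates from this study.

```python
import math


def indirect_selection_efficiency(r_g: float, h2_secondary: float, h2_primary: float) -> float:
    """Correlated response in the primary trait (selecting on the secondary trait)
    divided by the direct response (selecting on the primary trait), assuming
    equal selection intensity:  |r_g| * h_X / h_Y, with h = sqrt(h2).

    A negative r_g only means selecting the secondary trait in the opposite
    direction, so the magnitude is returned.
    """
    if not (0 < h2_secondary <= 1 and 0 < h2_primary <= 1):
        raise ValueError("heritabilities must lie in (0, 1]")
    if not -1 <= r_g <= 1:
        raise ValueError("genetic correlation must lie in [-1, 1]")
    return abs(r_g) * math.sqrt(h2_secondary) / math.sqrt(h2_primary)


# Hypothetical values: secondary trait h2 = 0.80, primary trait h2 = 0.40.
for r_g in (0.9, 0.6):
    eff = indirect_selection_efficiency(r_g, h2_secondary=0.80, h2_primary=0.40)
    print(f"r_g = {r_g}: relative efficiency = {eff:.2f}")  # > 1 favours indirect selection
```

With r_g = 0.6 the ratio falls below 1 even though the secondary trait is twice as heritable, so under this expression a more heritable secondary trait pays off only when it is also strongly genetically correlated with the primary trait.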

Genetic variability and clustering
The analysis of variability provides important assistance in the initial phase of plant breeding programmes and in the management of plant germplasm. In the first case, it allows accessions to be allocated to groups, guiding crosses. C. moschata is allogamous, and analysing the variability of its germplasm can guide crosses between more divergent genotypes, thereby aiding the exploitation of hybrid vigour [67,68]. Variability analysis also allows the identification of duplicates in germplasm collections [69,70,71], that is, pairs or groups of accessions with high similarity. In fact, it is estimated that fewer than 30% of the accessions maintained in collections worldwide are distinct, which hinders the maintenance of these collections [29]. Therefore, in addition to optimising the use of germplasm, variability analysis reduces maintenance costs by reducing the volume of material to be kept [72].
The accessions of C. moschata assessed in this study displayed high genetic variability in their agro-morphological characteristics, the total content of fruit pulp carotenoids (TC), and the productivity of seeds (PS) and seed oil (SOP), resulting in the formation of 16 clusters (Table 3). The allocation of Jacarezinho and Maranhão to the same group (Group 1) reflects the consistency of the clustering, since these two cultivars have similar characteristics.
Clustering did not reflect a smaller genetic distance between accessions from the same state or geographic region of Brazil. Group 11, for example, included accessions from different states and regions, and the preponderance of accessions from Minas Gerais (MG) and São Paulo (SP) in this group was probably only a result of the greater number of accessions from these states. This trend was repeated for other groups with higher numbers of accessions, such as 1, 5 and 14. A study assessing C. moschata accessions from different regions of Brazil maintained at BGH-UFV [73] also found no smaller genetic distance between accessions from the same state or region.
It is notable that the two hybrids used as controls, Jabras and Tetsukabuto, were allocated to different groups. Although they have similar fruit shape and size, the groups to which they were allocated differed in most characteristics (Table 4), and their different genotypic values for most characteristics (Tables 5 and 6) justify their allocation to different groups. Tetsukabuto, an interspecific hybrid between C. moschata and C. maxima [74], formed the group with the lowest genotypic average for accumulated degree-days for flowering (DDF) and had genotypic averages quite different from those of the other groups for the seed and seed oil characteristics (Table 4), justifying its separation from the other genotypes.
The predominance of yellow in the heatmap of the hierarchical clustering indicates low similarity between the clusters formed (Fig 3). The uniform yellow colouring of the between-group genetic distances in Fig 3 also indicates that these distances are homogeneous.
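This section mentions hierarchical clustering but does not state the distance measure or linkage method behind the 16 groups. Purely as an illustration of how accessions described by both quantitative and multi-categorical traits can be grouped, the sketch below assumes Gower's distance and average linkage (UPGMA), cut at a chosen number of groups. The function names and the five accessions are hypothetical.

```python
from itertools import combinations


def gower_distance(a, b, is_categorical, ranges):
    """Gower distance between two accessions described by mixed traits.

    Quantitative traits contribute |a - b| / range; categorical traits
    contribute 0 if the classes match and 1 otherwise. The mean over traits
    lies in [0, 1].
    """
    total = 0.0
    for x, y, cat, rng in zip(a, b, is_categorical, ranges):
        if cat:
            total += 0.0 if x == y else 1.0
        elif rng > 0:
            total += abs(x - y) / rng
    return total / len(a)


def upgma_groups(data, is_categorical, k):
    """Average-linkage (UPGMA) agglomeration until k groups remain."""
    # Range of each quantitative trait across all accessions (None for categorical traits).
    ranges = [
        None if cat else max(row[j] for row in data) - min(row[j] for row in data)
        for j, cat in enumerate(is_categorical)
    ]
    dist = {}
    for i, j in combinations(range(len(data)), 2):
        dist[i, j] = dist[j, i] = gower_distance(data[i], data[j], is_categorical, ranges)

    groups = [[i] for i in range(len(data))]
    while len(groups) > k:
        best = None
        for (ia, ga), (ib, gb) in combinations(enumerate(groups), 2):
            # UPGMA linkage: mean distance over all between-group pairs of accessions.
            link = sum(dist[i, j] for i in ga for j in gb) / (len(ga) * len(gb))
            if best is None or link < best[0]:
                best = (link, ia, ib)
        _, ia, ib = best
        groups[ia] = groups[ia] + groups[ib]
        del groups[ib]  # ia < ib, so index ia is unaffected by the deletion
    return groups


# Hypothetical accessions: fruit mass (kg), seeds per fruit, pulp colour class.
accessions = [
    [2.5, 300, "orange"],
    [2.7, 320, "orange"],
    [6.0, 500, "yellow"],
    [5.8, 470, "yellow"],
    [4.0, 150, "cream"],
]
groups = upgma_groups(accessions, is_categorical=[False, False, True], k=3)
print(groups)  # [[0, 1], [2, 3], [4]]
```

The pairwise distance matrix built here is also what a distance heatmap such as Fig 3 would display, with the number of groups chosen by a cut-off criterion rather than fixed in advance.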
The variability revealed by the clustering of the accessions corroborates the high estimates of genetic variances and heritabilities for most of the agronomic characteristics, the total content of fruit pulp carotenoids (TC), and seed characteristics such as mass of seeds per fruit (MSF), ratio of seed to fruit mass (MS/F), and number of seeds per fruit (NSF) (Table 2). This agrees with other studies of variability in this crop in Brazil [19,21].
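Heritability estimates such as those in Table 2 are derived from variance components. One common definition, broad-sense heritability on an accession-mean basis, is sketched below. The formula choice, the number of replicates and the variance values are assumptions for illustration, since this section does not state how the estimates in Table 2 were computed.

```python
def mean_basis_heritability(var_g: float, var_e: float, n_reps: int) -> float:
    """Broad-sense heritability on an accession-mean basis.

    h2 = var_g / (var_g + var_e / n_reps), where var_g is the genotypic
    variance, var_e the residual variance and n_reps the number of replicates.
    """
    if var_g < 0 or var_e < 0 or n_reps < 1:
        raise ValueError("variances must be non-negative and n_reps >= 1")
    phenotypic_mean_var = var_g + var_e / n_reps
    return var_g / phenotypic_mean_var if phenotypic_mean_var > 0 else 0.0


# Hypothetical variance components for one trait; more replicates shrink the
# residual share of the variance of accession means and raise h2.
for reps in (1, 2, 4):
    print(reps, round(mean_basis_heritability(var_g=4.0, var_e=6.0, n_reps=reps), 3))
```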
The larger contributions of the average mass of fruits (MF), diameter of fruit (DF), diameter of internal cavity of fruit (DIC), mass of seeds per fruit (MSF), and number of seeds per fruit (NSF) to component 1 suggest greater variability for these characteristics and a larger role in genotype discrimination (Fig 5). This result seems to be related to the estimates of genotypic variance, since MSF and NSF also corresponded to the characteristics with the greatest genotypic variances (Table 2). The larger contribution of variables such as the amount of trichomes (AT), leaf recess (LR) and amount of trichomes in the petiole (ATP) to component 2 shows the importance of multi-categorical characteristics in discriminating the studied germplasm.
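Trait contributions to components 1 and 2 (Fig 5) can be illustrated for quantitative traits with a principal component analysis on the trait correlation matrix, where a trait's contribution to a component is often taken as its squared eigenvector loading. Because Fig 5 also involves multi-categorical traits (AT, LR, ATP), the study may have used a method for mixed data; the numpy sketch below covers only quantitative traits and uses simulated values.

```python
import numpy as np


def pca_contributions(X: np.ndarray):
    """PCA on the correlation matrix of quantitative traits (columns of X).

    Returns the proportion of variance explained by each component and the
    percentage contribution of each trait to each component, computed as
    100 * (eigenvector loading)**2; each column of the result sums to 100.
    """
    R = np.corrcoef(X, rowvar=False)       # equivalent to PCA on standardised traits
    eigvals, eigvecs = np.linalg.eigh(R)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]      # reorder from largest to smallest
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()
    contributions = 100.0 * eigvecs**2
    return explained, contributions


# Simulated data: 30 accessions x 4 traits, with traits 0 and 1 correlated.
rng = np.random.default_rng(seed=1)
X = rng.normal(size=(30, 4))
X[:, 1] += 2.0 * X[:, 0]
explained, contributions = pca_contributions(X)
print("variance explained:", np.round(explained, 3))
print("contributions to PC1 (%):", np.round(contributions[:, 0], 1))
```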

Identification of promising groups of genotypes
In C. moschata, the identification of promising groups of genotypes can guide crosses aimed at exploiting hybrid vigour and at developing segregating populations for characteristics of interest [75,76].
As shown in Table 4, Group 1 expressed a high genotypic average for total content of fruit pulp carotenoids (TC) and the highest averages for productivity of seeds (PS) and seed oil content (SOC), reflecting the high number of promising accessions for these characteristics in this group. However, the negative correlations between SOC and characteristics related to fruit pulp quality in C. moschata, such as content of soluble solids (SS) and resistance of fruit pulp to penetration, might hinder simultaneous gains for these characteristics. This can be managed by conducting separate breeding subprogrammes, one aiming to improve seed oil production and another to improve fruit production and quality.
The highest average for total content of fruit pulp carotenoids (TC) occurred in Group 7, formed by the accessions BGH-5455A and BGH-5598A (Table 4). These accessions were also identified as the most promising for TC in the per se identification, with new predicted averages greater than 170 μg g⁻¹ of fresh pulp mass (Table 5). These values are much higher than those reported in previous studies [4,37,77]. Among these, a study characterising 55 accessions of C. moschata, also maintained by BGH-UFV, reported averages for total fruit pulp carotenoid content no greater than 118.70 μg g⁻¹ of fresh pulp mass [37]. On the other hand, averages of up to 404.98 μg g⁻¹ of fresh pulp mass have been reported in evaluations of C. moschata germplasm from northeast Brazil [1,72]. The differences in total fruit pulp carotenoid content between the present and previous studies might be mainly associated with the genetic makeup of the germplasm evaluated in each study. According to [72], in northeast Brazil there is a preference for winter squash fruits with a more orange pulp, a characteristic associated with higher carotenoid levels, which is consistent with the results obtained for this characteristic in studies evaluating C. moschata germplasm from this region.
Studies with C. moschata commonly involve the analysis of fruit pulp carotenoids and generally report high levels of these components [1,4,78,79]. One of these studies identified about 19 different carotenoids in the carotenogenic profile of the fruit pulp [1], with β- and α-carotene constituting the largest proportion of the total carotenoid content in this species. In fact, this vegetable has been considered one of the best sources of carotenoids such as β-carotene, with levels above those found in other important carotenogenic vegetables, such as carrots [80].
The main biological functions of components such as α- and β-carotene are their pronounced provitamin A activity [81,82] and a range of bioactive functions, especially antioxidant activity [83,84]. In addition to these bioactive functions, C. moschata combines characteristics that are fundamental for biofortification programmes, such as high production potential and profitability, high efficiency in reducing micronutrient deficiencies in humans, and good acceptance by producers and consumers in the regions where it is grown [8]. C. moschata has therefore been used strategically in programmes targeting biofortification with vitamin A precursors, among them the Brazilian Biofortification Programme (BioFORT), led by the Brazilian Agricultural Research Corporation (Embrapa) [9].
The main interest in assessing the productivity of seeds (PS) and seed oil productivity (SOP) in C. moschata lies in the high potential of its seed oil for food purposes. Governments and health experts are interested in encouraging the consumption of unsaturated rather than saturated fatty acids, based on the consensus that this reduces the risk of cardiovascular diseases [85,86,87]. This vegetable not only has a high oil content, with the lipid fraction reaching up to 49% of seed composition [88], but the lipid profile of this oil also consists of more than 70% unsaturated fatty acids, with a preponderance of fatty acids such as linoleic acid (C18:2, Δ9,12) and oleic acid (C18:1, Δ9).
C. moschata seed oil is also rich in bioactive components such as vitamin E and carotenoids [13], which have important antioxidant activity and also protect the oil against oxidation. Despite this, most of the seeds from C. moschata production in Brazil are still discarded when the fruits are consumed. Their use therefore represents a way of supplementing diets as well as increasing the income of farmers who produce this vegetable.
Group 16, consisting solely of the control Tetsukabuto, displayed the lowest average for accumulated degree-days for flowering (DDF), indicating that this genotype has the earliest flowering (Table 4). As Table 4 also shows, most groups had intermediate averages for DDF. C. moschata plants normally have very long internodes, and this, coupled with the vigorous growth of the species, limits its cultivation, since plants with longer internodes require much larger cultivation areas. The interest in assessing precocity in C. moschata is based on the possible relationship of this characteristic with aspects such as determinate growth habit. According to [89], the Bu gene, identified as responsible for the formation of shorter internodes in pumpkins, is also linked to earlier flowering in this species. In a study evaluating hybrids and segregating winter squash populations for oil production and plant size reduction [50], the cultivars Piramoita and Tronco Verde, which have determinate growth habits, required the fewest days to female flowering. Greater precocity is an important characteristic for most crops, especially vegetables, as it optimises the use of cultivation areas, reduces the exposure of the crop to adverse abiotic and biotic factors, and reduces management costs.
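Accumulated degree-days (DDF) express time to flowering in thermal rather than calendar units. This section does not give the calculation method, base temperature or starting date, so the sketch below assumes the simple averaging method, in which each day contributes max(0, (Tmax + Tmin)/2 - Tbase), summed from a starting day (for example, sowing) to flowering. The base temperature of 10 °C and the temperatures are hypothetical.

```python
def accumulated_degree_days(t_max, t_min, t_base=10.0):
    """Accumulate daily degree-days with the simple averaging method.

    Each day contributes max(0, (Tmax + Tmin) / 2 - t_base). The default
    t_base of 10 deg C is a hypothetical value, not one taken from the study.
    """
    if len(t_max) != len(t_min):
        raise ValueError("t_max and t_min must have the same length")
    return sum(max(0.0, (hi + lo) / 2.0 - t_base) for hi, lo in zip(t_max, t_min))


# Hypothetical daily temperatures (deg C) from the starting day to first flowering.
t_max = [30.0, 28.0, 25.0, 18.0]
t_min = [18.0, 16.0, 14.0, 8.0]
print(accumulated_degree_days(t_max, t_min))  # 14 + 12 + 9.5 + 3 = 38.5
```

Expressing precocity in degree-days rather than days makes accessions evaluated under different temperature regimes more comparable, because cool days add little to the total.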
Given the low correlation observed between accumulated degree-days for flowering (DDF) and the other characteristics, it is unlikely that accessions combining early flowering with other important characteristics in C. moschata will be identified. Therefore, the initial identification of early-flowering accessions, followed by incorporation of this trait into germplasm that is promising for other characteristics, seems appropriate in C. moschata breeding.
Group 4, formed by BGH-1927, BGH-4681A and BGH-5653, had the highest average for productivity of fruits (PF) (Table 4). It also had one of the highest averages for mass of fruits (MF) and an intermediate average for number of fruits per plant (NFP), corroborating the estimates of the correlations between these characteristics and productivity of fruits (Fig 2). The accessions BGH-4681A and BGH-5653 were also identified as among the most promising for PF in the per se identification, with averages above 20 t ha⁻¹ (Table 5). These averages were much higher than the world average, estimated at 13.4 t ha⁻¹ [6].
Although the cultivation of C. moschata is primarily intended for fruit production, as already mentioned, the selection of genotypes for greater fruit productivity in this crop must also consider aspects crucial to fruit acceptability, such as shape and size. In general, winter squash production must currently prioritise the adoption of cultivars with smaller fruits. In addition to obtaining fruits of greater mass, greater productivity in C. moschata can also be achieved by developing cultivars with a higher number of fruits per plant (NFP), given the correlation observed between productivity of fruits (PF) and NFP (Fig 2).
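The trade-off between fruit size and fruit number can be made concrete with the identity linking them to area productivity: PF (t ha⁻¹) = NFP × MF (kg) × plants per hectare / 1000, assuming a uniform stand with no plant losses. The plant spacing below is hypothetical, since this section does not report the spacing used.

```python
def plants_per_ha(row_spacing_m: float, plant_spacing_m: float) -> float:
    """Plant density for a rectangular layout (1 ha = 10,000 m2)."""
    return 10_000.0 / (row_spacing_m * plant_spacing_m)


def fruit_productivity_t_ha(fruits_per_plant: float, mean_fruit_mass_kg: float,
                            density_plants_ha: float) -> float:
    """PF (t/ha) = NFP x MF (kg) x plants per ha / 1000, assuming no stand losses."""
    return fruits_per_plant * mean_fruit_mass_kg * density_plants_ha / 1000.0


density = plants_per_ha(row_spacing_m=2.5, plant_spacing_m=2.0)  # hypothetical layout: 2000 plants/ha
few_large = fruit_productivity_t_ha(2, 5.0, density)    # two 5-kg fruits per plant
many_small = fruit_productivity_t_ha(4, 2.5, density)   # four 2.5-kg fruits per plant
print(few_large, many_small)  # 20.0 20.0
```

Under these assumptions, doubling the number of fruits per plant while halving their mass leaves productivity unchanged, which is why a higher NFP can compensate for the smaller fruits demanded by the southeast Brazilian market.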

Per se identification of promising accessions
Per se identification of promising accessions can guide selection for a specific trait, allowing the identification of promising accessions for the development of superior inbred lines and/or open-pollinated cultivars. In fact, a brief survey of the Brazilian National Cultivar Register (RNC) indicates that most of the 182 C. moschata cultivars currently registered are open-pollinated [74]. This survey also found a considerable number of intra- and interspecific hybrids, confirming the feasibility of applying inbreeding in certain stages of C. moschata breeding.
The selected accessions displayed averages for accumulated degree-days for flowering (DDF) much lower than the general averages of the accessions and the controls. Notably, the accessions BGH-6749, BGH-5639, and BGH-219 expressed the lowest new predicted averages for DDF, making them the earliest-flowering accessions (Table 5). Regarding productivity of fruits (PF), the most promising accessions were BGH-4453, BGH-5653, BGH-5544A, BGH-4681A, BGH-5224A, and BGH-6587A, which expressed gains above 8 t ha⁻¹ and new predicted averages for PF above 20 t ha⁻¹ (Table 5). It should be highlighted that the accession BGH-5544A also expressed high averages for productivity of seeds (PS) and seed oil productivity (SOP), corroborating the correlations of these characteristics with productivity of fruits (Fig 2).
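Tables 5 and 6 report genotypic effects, genetic gains and new predicted averages for the top 15% of accessions. One common way such rankings are produced from mixed-model (BLUP) genotypic effects is sketched below: accessions are ranked, the gain at each rank is the mean effect of the accessions selected so far, and the new predicted average is the overall mean plus that gain. This procedure, the upward rounding of the 15% fraction, the accession labels and all values are assumptions for illustration. For DDF, lower values are better, so the ranking is ascending and the gains are negative.

```python
import math


def rank_with_gains(effects, overall_mean, fraction=0.15, lower_is_better=False):
    """Rank accessions by genotypic effect and return, for the top `fraction`,
    tuples of (accession, effect, cumulative gain, new predicted average).

    The gain at rank i is the mean effect of the i best accessions; the new
    predicted average is overall_mean + gain. The number selected is rounded up.
    """
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=not lower_is_better)
    n_sel = max(1, math.ceil(round(fraction * len(ranked), 6)))
    rows, running = [], 0.0
    for i, (name, g) in enumerate(ranked[:n_sel], start=1):
        running += g
        gain = running / i
        rows.append((name, g, gain, overall_mean + gain))
    return rows


# Hypothetical DDF genotypic effects (degree-days) for 10 accessions; overall mean = 600.
ddf_effects = {"A1": -45.0, "A2": 12.0, "A3": -60.0, "A4": 5.0, "A5": 30.0,
               "A6": -10.0, "A7": 22.0, "A8": -3.0, "A9": 18.0, "A10": 31.0}
for row in rank_with_gains(ddf_effects, overall_mean=600.0, lower_is_better=True):
    print(row)
```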
The most promising accessions for total content of fruit pulp carotenoids (TC) were BGH-5455A and BGH-5598A (Table 5). These accessions expressed gains and new predicted averages for TC higher than 108.03 and 173.80 μg g⁻¹ of fresh pulp mass, respectively, much higher than those of the controls. For the seed and seed oil characteristics, the accessions BGH-4610A, BGH-5485A, and BGH-6590 were the most promising for productivity of seeds (PS) (Table 6). These accessions expressed gains and new predicted averages for PS of up to 0.31 and 0.58 t ha⁻¹, respectively. The most promising accessions for seed oil productivity (SOP) were BGH-5485A, BGH-4610A, and BGH-5472A, which had new predicted averages for SOP of 0.13 t ha⁻¹. It is worth highlighting that two of these accessions, BGH-5485A and BGH-4610A, were also among the most promising for PS, corroborating the strong correlation between productivity of seeds and seed oil productivity (Fig 2).
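This section does not spell out how SOP was derived. A natural definition, assumed here, is SOP = PS × SOC / 100, with SOC expressed as a percentage of seed mass. Under that definition, a strong PS and SOP correlation is expected whenever oil content varies little among accessions relative to seed yield, as the hypothetical values below illustrate.

```python
def seed_oil_productivity(ps_t_ha: float, soc_percent: float) -> float:
    """SOP (t/ha) = seed productivity (t/ha) x seed oil content (%) / 100."""
    return ps_t_ha * soc_percent / 100.0


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical accessions whose oil content varies little relative to seed yield.
ps = [0.20, 0.30, 0.40, 0.50, 0.58]
soc = [24.0, 22.0, 25.0, 23.0, 24.0]
sop = [seed_oil_productivity(p, s) for p, s in zip(ps, soc)]
print([round(v, 3) for v in sop], round(pearson(ps, sop), 3))
```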

Conclusions
The accessions of C. moschata assessed in this study expressed high genetic variability for agro-morphological characteristics, for seed production traits such as the number and mass of seeds per fruit, for accumulated degree-days for flowering, for total content of fruit pulp carotenoids, and for productivity of fruits, which allowed considerable gains to be predicted from selection for each of these characteristics.
The network of genetic correlations showed that higher fruit productivity in C. moschata might be achieved through selection for aspects considered crucial in the production of this crop, such as a higher number of fruits per plant and greater height and diameter of fruit. It also showed that greater seed productivity might be achieved through selection for a higher ratio of seed to fruit mass and a higher number and mass of seeds per fruit; this information will assist in selection for higher productivity of fruit, seed and seed oil.
The clustering analysis resulted in 16 groups, with low similarity between the groups, which corroborates the variability of these accessions.
Comparison of cluster averages, together with per se identification, allowed the most promising groups and accessions to be recognised for each characteristic, an approach that will guide the use of these accessions in breeding programmes.
Per se analysis identified the accessions BGH-6749, BGH-5639, and BGH-219 as those with the lowest averages for accumulated degree-days for flowering, highlighting them as the earliest-flowering accessions. The most promising accessions for productivity of fruits were BGH-4453, BGH-5653, BGH-5544A, BGH-4681A, BGH-5224A, and BGH-6587A, with new predicted averages greater than 20 t ha⁻¹. The accessions with the highest averages for total content of fruit pulp carotenoids were BGH-5455A and BGH-5598A, with averages greater than 170.00 μg g⁻¹ of fresh pulp mass. The accessions BGH-5485A, BGH-4610A, and BGH-5472A were the most promising for seed oil productivity, and the first two of these also had the highest averages for productivity of seeds. The accessions of C. moschata assessed in this study are a promising source for the genetic improvement of characteristics such as early flowering, total content of fruit pulp carotenoids, and productivity of seeds and seed oil.
Supporting information
S1 Table. Multi-categorical descriptors used in the assessment of the C. moschata germplasm maintained by BGH-UFV.