Genome-wide analysis highlights genetic admixture in exotic germplasm resources of Eucalyptus and unexpected ancestral genomic composition of interspecific hybrids

  • Danyllo Amaral de Oliveira,

    Roles Data curation, Formal analysis, Investigation, Writing – original draft, Writing – review & editing

    Affiliation Departamento de Biologia, Universidade Federal de Lavras, Lavras, MG, Brazil

  • Paulo Henrique Muller da Silva,

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Methodology, Project administration, Resources

    Affiliation Instituto de Pesquisas e Estudos Florestais, Piracicaba, SP, Brazil

  • Evandro Novaes,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Supervision, Writing – review & editing

    Affiliation Departamento de Biologia, Universidade Federal de Lavras, Lavras, MG, Brazil

  • Dario Grattapaglia

    Roles Conceptualization, Data curation, Funding acquisition, Resources, Validation, Writing – original draft, Writing – review & editing

    dario.grattapaglia@embrapa.br

    Affiliation Plant Genetics Laboratory, EMBRAPA Genetic Resources and Biotechnology, Brasilia, DF, Brazil

Abstract

Eucalyptus is an economically important genus comprising more than 890 species in different subgenera and sections. Approximately twenty species of subgenus Symphyomyrtus account for 95% of the world’s planted eucalypts. Discrimination of closely related eucalypt taxa is challenging, consistent with their recent phylogenetic divergence and occasional hybridization in nature. Admixture, misclassification or mislabeling of Eucalyptus germplasm resources maintained as exotics have been suggested, although no reports are available. Moreover, hybrids with increased productivity and trait complementarity are planted worldwide, but little is known about their actual genomic ancestry. In this study we examined a set of 440 trees of 16 different Eucalyptus species and 44 interspecific hybrids of multi-species origin conserved in germplasm banks in Brazil. We used genome-wide SNP data to evaluate the agreement between the alleged phylogenetic classification of species and provenances, as registered in their historical records, and their observed genetic clustering derived from SNP data. Genetic structure analyses correctly assigned each of the 16 species to a different cluster, although the PCA positioning of E. longirostrata was inconsistent with its current taxonomy. Admixture was present in material of closely related species derived from local germplasm banks, indicating unintended hybridization following germplasm introduction. Provenances could be discriminated for some species, indicating that SNP-based discrimination was directly proportional to geographical distance, consistent with an isolation-by-distance model. SNP-based genomic ancestry analysis showed that the majority of the hybrids displayed realized genomic compositions deviating from those expected from their pedigree records, consistent with admixture in their parents and pervasive genome-wide directional selection toward the fast-growing E. grandis genome. SNP data in support of tree breeding provide precise germplasm identity verification, and allow breeders to objectively recognize the actual ancestral origin of superior hybrids to more realistically guide the program toward the development of the desired genetic combinations.

Introduction

Eucalyptus is a highly diverse genus of tree species that dominates the forest landscape across large parts of Australia and neighboring islands. Besides its keystone ecological role in native forests, the genus includes the most widely planted commercial hardwood tree species in tropical and subtropical regions of the world [1, 2]. Fast growth and wide adaptability to a broad diversity of tropical and subtropical environments, combined with multipurpose wood properties for energy, solid wood products, pulp and paper, have secured the superior position of the eucalypts in current world forestry [3].

Being particularly speciose, the genus Eucalyptus has received significant attention regarding its phylogenetic organization. The classic taxonomy of eucalypts, with more than 700 species [4], has now been expanded to include over 890 species [5], but taxonomic issues remain as species delimitations are still being actively investigated. The genus was originally subdivided into 13 subgenera, but two of them, Angophora and Corymbia, were recently recognized as separate genera based on molecular evidence [6, 7]. The largest subgenus, Symphyomyrtus, includes 470 species organized in 15 sections, which are mostly well separated using molecular marker data [8, 9]. Nevertheless, some discrepancies exist between DNA marker and morphology-based classifications at the lower taxonomic level, consistent with recent divergence of taxa, character convergence and occasional hybridization in natural populations, generating hybrid swarms and zones of intergradation between species of the same section in the wild [10–13].

Most of the economically important eucalypt species belong to subgenus Symphyomyrtus and, within it, to three specific sections: Exsertaria, Latoangulatae and Maidenaria [14]. These sections include approximately twenty species that account for more than 95% of the world’s planted eucalypts. Among those, Eucalyptus grandis, E. urophylla (sect. Latoangulatae), E. camaldulensis (sect. Exsertaria) and E. globulus (sect. Maidenaria) make up more than 80% of the planted areas [15]. The sexual compatibility among species within and across these three main sections has been an important driver of breeding programs, especially in tropical and subtropical countries, where large extensions of forests are planted with hybrid material [10, 12, 16, 17]. Hybrid breeding coupled to clonal propagation has allowed the aggregation and exploitation of important characteristics from different species and provenances in highly productive hybrid clones [18, 19]. In Brazil, an estimated 80% of eucalypt plantations are established with first- or second-generation hybrids involving mainly E. urophylla and E. grandis [17, 20]. The remaining 20% are mostly pure species material of these two species, or hybrids with E. camaldulensis and E. pellita for greater drought tolerance, and to a much lesser extent with E. globulus to improve wood quality for cellulose. Additionally, E. dunnii and E. benthamii are used as pure species or in hybrid combinations in areas subject to frost [21].

A number of studies in the last fifteen years have addressed, and largely met, the challenge of resolving lower-level, within-section taxonomy in Eucalyptus using different genome-wide DNA marker data [8, 9, 22–24]. However, issues remain for species in section Latoangulatae, for example, due to their intermediate nature when compared to more densely clustered taxa in other sections [8]. Admixture of Eucalyptus species in their native range has been reported [11, 25], reflecting the phylogenetic fluidity that still exists in some taxa. Misclassification, mislabeling and hybridization of eucalypt germplasm resources maintained as exotics in different countries have also been suggested, but reports are only anecdotal. Little or no data exist on such incidences, or on the overall current status of gene banks in countries where eucalypts make up a large proportion of planted forests.

Genome-wide studies looking at large numbers of eucalypt species have used DArT markers, genotyped originally with probe arrays [26] and, more recently, by genome complexity reduction with restriction enzyme digestion followed by high-throughput sequencing [27]. This DNA assay has been valuable as it provides simultaneous discovery and genotyping of Single Nucleotide Polymorphisms (SNP) within and across species, facilitating genus-wide phylogenetic studies. However, some challenges remain for this SNP genotyping method due to variable sequencing coverage and irregular sampling of loci causing variable genotype reproducibility and ultimately limited data portability across studies in highly heterozygous genomes such as those of the eucalypts [28–30]. The development of Eucalyptus multispecies SNP arrays based on industry-level “gold standard” technology has provided a worldwide usable platform allowing seamless and precise data exchange across studies [31].

Although species discrimination using DNA data is largely settled, less attention has been devoted to looking at provenance variation within species. This is particularly important for breeding programs that take advantage of matching distinctive provenance characteristics to specific sites in exotic environments, or aim at deliberately exploiting provenance and species complementarity by building specific genomic compositions by interspecific hybridization [2, 12, 18]. Likewise, few studies have examined the possibility of using DNA data to describe the actual genomic ancestral composition of hybrids, including those derived from more than two parental species. Knowledge of the actual genomic composition of complex hybrids of distinctive performance would allow directing more deliberate selection strategies in hybrid breeding programs. Earlier studies using microsatellite markers indicated that provenances of E. grandis could be distinguished but not for E. urophylla and E. camaldulensis, and some hybrid clones could be assigned to their most likely ancestral species, although with incomplete resolution [32]. Using SNP data, preliminary analyses have shown that provenances within species could be distinguished for E. grandis and E. urophylla [31, 33] but not for E. camaldulensis, consistent with the latter being more prone to hybridization or a remnant of an ancient widespread taxon [8].

The current eucalypt SNP arrays have been used to estimate recombination rates and carry out dense linkage mapping [34], build relationship matrices for genomic selection in several species (reviewed in [35]), and understand the consequences of artificial selection [36]. No studies to date, however, have evaluated their ability to characterize germplasm material in gene banks. Questions frequently arise regarding the verification of the alleged species classification, the possibility of discriminating provenances and the determination of the genomic composition of hybrid clones of unknown or uncertain origin derived from successive generations of interspecific recombination. In this study we examined a large set of germplasm accessions including 440 Eucalyptus trees of 16 species and 44 interspecific hybrids currently conserved or used in Brazil. We used genome-wide SNP data to evaluate the agreement between the alleged phylogenetic classification of species and provenances, as registered in their historical records, and their observed genetic clustering obtained from genomic data, agnostic to any prior phylogenetic information. We focused on the main planted species of Symphyomyrtus given their outstanding relevance in terms of germplasm use and conservation. Additionally, we used SNP data to examine the actual genomic makeup of hybrids derived from interspecific crosses involving two or more species, and to compare it with the expected composition based on the recorded ancestral species.

Material and methods

Plant material

The study involved a germplasm set of 440 trees belonging to 16 Eucalyptus species of five sections of subgenus Symphyomyrtus and 44 interspecific hybrid clones (Table 1). These trees are conserved in species/provenance/progeny trials and clonal banks at the Anhembi Experimental Research Station of the Institute for Forestry Research (IPEF) in Brazil (22.7897° S, 48.1280° W), or in the gene banks of some associated forest-based companies. For six species (E. grandis, E. longirostrata, E. pellita, E. robusta, E. saligna and E. tereticornis), samples were analyzed for more than one provenance. The original locations of the species and provenances were plotted with the R package tmap (Fig 1) on a base map of Australia taken from a world country boundaries shapefile, publicly available under a Creative Commons Attribution 4.0 International Public License (https://datacatalog.worldbank.org/search/dataset/0038272/World-Bank-Official-Boundaries). Plant material included: (1) individual trees sampled in species/provenance trials established with original seeds collected in Australia for which the CSIRO (Commonwealth Scientific and Industrial Research Organisation) seedlot number is known, and (2) individual trees collected in Brazilian germplasm banks at least one generation removed from the original introductions, maintained by IPEF or by associated forestry companies (Suzano, Klabin, Vallourec and CMPC Celulose Riograndense), sometimes with unknown provenance origin (Table 1). In addition, 44 interspecific hybrid clones obtained by controlled interspecific crosses of two or more species were studied to compare their SNP-realized versus pedigree-expected genomic composition. These hybrids were grouped into five classes (Hybrids 1 to 5) according to the Symphyomyrtus sections of the component Eucalyptus species registered in their pedigrees (Table 2).

Fig 1. Eucalyptus species for which provenances were studied, plotted on their respective geographic locations on a publicly available basemap reprinted under a CC BY 4.0 license with permission from The World Bank (https://datacatalog.worldbank.org/search/dataset/0038272/World-Bank-Official-Boundaries).

https://doi.org/10.1371/journal.pone.0289536.g001

Table 1. Section, provenances, source and number of individual trees sampled for each of the 16 Eucalyptus species included in the study.

https://doi.org/10.1371/journal.pone.0289536.t001

Table 2. List of the 44 Eucalyptus hybrids studied, obtained by controlled interspecific crosses of two or more species, classified into five groups according to the sections of Symphyomyrtus involved in the cross (Hybrids 1 to Hybrids 5).

https://doi.org/10.1371/journal.pone.0289536.t002

SNP genotyping and filtering

Total genomic DNA was extracted with an optimized Sorbitol/CTAB protocol [37]. DNA samples were sent to ThermoFisher (Santa Clara, CA) for SNP genotyping with the 72K Eucalyptus Axiom Array developed for Eucalyptus and Corymbia species (https://www.thermofisher.com/order/catalog/product/551134; Grattapaglia D. and Silva-Junior O.B., unpublished). This Axiom Array is a second-generation Eucalyptus SNP platform with 68,055 SNPs specific to the Eucalyptus genome, 28,177 of them shared with the previously developed Infinium EUChip60K [31], and 4,147 SNPs specific to the genome of its sister genus Corymbia, which were not used in this study.

SNPs with more than 10% missing data or with minor allele frequency (MAF) below 5% were removed using PLINK v1.9 [38] with parameters --maf 0.05 --geno 0.1. A total of 48,645 SNPs passed these filtering thresholds. The dataset was further pruned of 21,347 SNPs that were in linkage disequilibrium (LD) with other markers, to remove redundant information and avoid regions of the genome with a disproportionate influence on the results that could distort the representation of genome-wide structure [39]. LD pruning was performed using the PLINK parameter --indep-pairwise 50 5 0.3. With the retained 27,298 SNPs, the rate of per-individual missing data was below 10% for all samples, except for one sample of E. grandis from Coffs Harbor. This sample had 64.9% missing genotypes and was removed from further analyses. Ultimately, genetic analyses were performed with a dataset of 27,298 SNPs genotyped in 484 individual trees.
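The two per-SNP filters described above can be illustrated with a minimal sketch. It is not the PLINK implementation: it assumes a genotype matrix of alternate-allele counts (0, 1, 2) with NaN for missing calls, and the function name `filter_snps` is hypothetical. It keeps SNPs with a missing rate of at most 10% and a MAF of at least 5%, mirroring the thresholds stated in the text.

```python
import numpy as np

def filter_snps(genotypes, max_missing=0.10, min_maf=0.05):
    """Return a boolean mask of SNPs (columns) that pass both filters.

    genotypes: array of shape (n_samples, n_snps) with alternate-allele
    counts 0, 1 or 2 and np.nan for a missing call (assumed encoding).
    """
    g = np.asarray(genotypes, dtype=float)
    n_called = (~np.isnan(g)).sum(axis=0)
    missing_rate = 1.0 - n_called / g.shape[0]
    # Allele frequency among called genotypes (two alleles per diploid call).
    alt_count = np.nansum(g, axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        alt_freq = alt_count / (2.0 * n_called)
    maf = np.minimum(alt_freq, 1.0 - alt_freq)
    # Comparisons with NaN are False, so SNPs with no calls are dropped.
    return (missing_rate <= max_missing) & (maf >= min_maf)
```

Applied to the full marker set, `genotypes[:, filter_snps(genotypes)]` would retain only the passing SNPs. LD pruning is a separate, window-based step and is not covered by this sketch.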

Statistical and population genetics analyses

Basic population genetics parameters were estimated, such as the average minor allele frequency (MAF), observed (Ho) and expected heterozygosity (He). Analyses were performed in R (R Core Team 2020) using packages adegenet v2.1.3 [40] and hierfstat v 0.5–7 [41]. The data was input into R in FSTAT format after transformation with PGDSpider v2.1.1.5 [42].
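For readers unfamiliar with these parameters, the sketch below shows how Ho and He can be computed per population from a genotype matrix. It is a simplified illustration, not the hierfstat procedure: it uses the plain 2pq expectation for biallelic SNPs without any sample-size correction, and it assumes genotypes coded as alternate-allele counts with NaN for missing calls. The function name `heterozygosity` is hypothetical.

```python
import numpy as np

def heterozygosity(genotypes):
    """Return (Ho, He) averaged over SNPs for one population.

    genotypes: (n_samples, n_snps) alternate-allele counts 0/1/2,
    np.nan for missing calls (assumed encoding).
    """
    g = np.asarray(genotypes, dtype=float)
    n_called = (~np.isnan(g)).sum(axis=0)
    keep = n_called > 0                      # ignore SNPs with no calls
    g, n_called = g[:, keep], n_called[keep]
    # Observed heterozygosity: share of called genotypes that are heterozygous.
    ho = (g == 1).sum(axis=0) / n_called
    # Expected heterozygosity under Hardy-Weinberg proportions: 2pq.
    p = np.nansum(g, axis=0) / (2.0 * n_called)
    he = 2.0 * p * (1.0 - p)
    return float(ho.mean()), float(he.mean())
```

An excess of Ho over He in a sample, as expected for F1 hybrids between species with alternative fixed alleles, would show up directly in the two returned values.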

fastSTRUCTURE v1.0 [43] was run with the 27,298 SNPs to infer population structure for the 484 individual trees. Analyses were performed with the number of clusters K varying from 2 to 30 and option --seed=100. The input was the binary version (BED) of the PED file from PLINK. The most likely model was selected using the supervised estimators of [44] implemented in the StructureSelector [45] web server (https://lmme.ac.cn/StructureSelector/). Cluster assignment for each of the samples was visualized with barplots in R, using packages pophelper v2.3.0 [46] and gridExtra v2.3 [47]. Additional fastSTRUCTURE analyses were carried out separately for individual eucalypt sections and species to assess the resolution for provenance differentiation within taxa.
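The scan over K can be sketched as a small helper that builds one fastSTRUCTURE command line per value of K. This is an illustration only: the file prefixes are hypothetical, the interpreter name `python` and the location of `structure.py` are assumptions about the local installation, and the authors' actual invocation is not given in the text beyond the K range and the seed.

```python
def build_faststructure_commands(bed_prefix, out_prefix, k_min=2, k_max=30, seed=100):
    """Build one fastSTRUCTURE command (as an argument list) per value of K.

    bed_prefix: path prefix of the PLINK .bed/.bim/.fam files (hypothetical).
    out_prefix: path prefix for the output files (hypothetical).
    """
    commands = []
    for k in range(k_min, k_max + 1):
        commands.append([
            "python", "structure.py",   # assumed interpreter and script path
            "-K", str(k),
            f"--input={bed_prefix}",
            f"--output={out_prefix}",
            "--format=bed",
            f"--seed={seed}",
        ])
    return commands
```

Each argument list could then be passed to `subprocess.run(cmd, check=True)`. Fixing the seed across runs keeps the scan reproducible.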

Genomic composition of the hybrids was initially obtained from the unsupervised inference provided by fastSTRUCTURE, and compared with the recorded pedigree information. Specifically, for fastSTRUCTURE annotation, we used the meanQ file, which provided the probabilities of each sample belonging to each of the clusters found. Subsequently, a supervised analysis was carried out using ADMIXTURE, a software tool for model-based estimation of ancestry in unrelated individuals [48]. For this analysis, samples defined as being from pure species with ~99% probability in the initial fastSTRUCTURE analysis were used as reference populations to infer the genomic composition of the hybrids. A simple matching genetic distance among individual trees was also estimated, and groups were visualized with a principal component analysis (PCA) on the genetic distance matrix, where distances among trees were represented in a Cartesian graph with PC1 and PC2. These analyses were performed in R using packages adegenet [40], ape v5.4 [49] and pegas v0.14 [50]. PCA biplots were visualized using the ggplot2 v3.3.2 package [51].
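The distance-based ordination can be sketched as follows. Two assumptions are made that the text does not confirm: the simple matching distance is taken as the fraction of jointly called loci at which two individuals carry different genotypes, and the PCA on a distance matrix is treated as classical multidimensional scaling (principal coordinates analysis). The function names are hypothetical and the pairwise loop is written for clarity, not speed.

```python
import numpy as np

def simple_matching_distance(genotypes):
    """Pairwise fraction of jointly called loci with differing genotypes."""
    g = np.asarray(genotypes, dtype=float)
    n = g.shape[0]
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            both = ~np.isnan(g[i]) & ~np.isnan(g[j])
            # Pairs with no jointly called locus keep distance 0
            # (an arbitrary choice of this sketch).
            if both.any():
                d[i, j] = d[j, i] = np.mean(g[i, both] != g[j, both])
    return d

def pcoa(dist, n_axes=2):
    """Classical multidimensional scaling of a distance matrix."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ d2 @ centering      # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_axes]  # largest eigenvalues first
    vals = np.clip(eigvals[order], 0.0, None)   # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(vals)
```

The two columns returned by `pcoa(simple_matching_distance(genotypes))` play the role of PC1 and PC2 in a scatter plot such as Fig 5.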

Results

SNP diversity across species

After filtering and LD pruning, the final SNP dataset of 27,298 SNPs (S1 File) had a very low percentage of missing data (<3%) for all germplasm sets (species, provenances and hybrids), corroborating the good performance of the multi-species SNP array for population genomics and molecular breeding across eucalypt taxa. The percentage of polymorphic loci per population ranged from 29.7% for E. deglupta to over 93% for E. urophylla and the hybrids involving crosses between the distant sections Latoangulatae and Maidenaria (Table 3). Overall, there was no significant difference in the proportion of polymorphic SNPs among the different eucalypt sections (ANOVA F-value = 1.14, p-value = 0.37). The average MAF was similar across taxa, between 0.10 and 0.15 for most of them, but E. urophylla, E. grandis, E. camaldulensis and the hybrids had a slightly higher average MAF.

Table 3. Summary of the proportion of polymorphic SNPs (Minor Allele Frequency; MAF > 0.05) and their average MAF for the 27,298 filtered and LD-pruned SNPs, and genetic diversity parameters (observed (Ho) and expected (He) heterozygosity) for each species and hybrid (see Table 2) germplasm source.

https://doi.org/10.1371/journal.pone.0289536.t003

Population structure analysis

StructureSelector analysis of the fastSTRUCTURE results indicated the most likely model with K = 18 taxonomic clusters (S2 File). This model correctly assigned each of the 16 species to a different cluster (Fig 2; S3 File). In the case of E. saligna, some individuals were additionally separated according to provenance, and the hybrids were assembled in a separate, highly admixed cluster (Fig 2). Admixture at the individual level was seen in allegedly pure species trees. Some E. camaldulensis individuals were classified as being admixed with E. tereticornis, some E. urophylla individuals as admixed with E. grandis, and a few additional individuals that were expected to be pure also appeared admixed (Fig 2). At the higher taxonomic level of sections within subgenus Symphyomyrtus, models with smaller numbers of clusters easily separated eucalypt sections. For example, at K = 2, section Maidenaria detached from the rest. With K = 3, Latoangulatae and Maidenaria split, with occasional admixture seen in individuals of some species. With K = 4, species of Exsertaria separated from the other sections. Surprisingly, however, E. longirostrata, which is classified in section Latoangulatae, clustered together with E. deglupta and E. argophloia, which belong to two other sections (S4 File).
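A minimal sketch of how admixed individuals could be flagged from a membership matrix such as the fastSTRUCTURE meanQ output. It assumes one row per sample and one column per cluster, and it reuses the ~99% membership threshold that the Methods mention for defining pure reference samples. The function name is hypothetical and this is not the authors' script.

```python
import numpy as np

def classify_by_membership(q, pure_threshold=0.99):
    """Assign each sample to its main cluster and flag it as pure or admixed.

    q: (n_samples, K) matrix of cluster membership proportions.
    Returns (top_cluster, is_pure) as arrays of length n_samples.
    """
    q = np.asarray(q, dtype=float)
    top_cluster = q.argmax(axis=1)             # cluster with largest membership
    is_pure = q.max(axis=1) >= pure_threshold  # below threshold = admixed
    return top_cluster, is_pure
```

With a whitespace-delimited membership file, `q = np.loadtxt(path)` would produce a suitable input, where `path` is a placeholder for the K = 18 output file. Samples with `is_pure` equal to False within a nominally pure species are the candidates for unintended hybridization discussed in the text.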

Fig 2. Population structure analysis of the 440 trees of 16 Eucalyptus species and 44 hybrids classified according to the section of their component species involved (Hyb 1 to 5).

https://doi.org/10.1371/journal.pone.0289536.g002

The SNP dataset could not differentiate most provenances within species when all individuals were analyzed together, except for E. saligna from Kroombit Tops (Fig 2). This provenance was assigned to a separate group from the Helidon and Richmond Range provenances, which in turn clustered together. The provenances of the other species (E. tereticornis, E. grandis, E. longirostrata, E. pellita and E. robusta) could not be discriminated even at higher values of K (S5 File). Only when species were analyzed individually did fastSTRUCTURE modeling resolve some of the provenances. This was the case for the two E. grandis provenances from Atherton and Coffs Harbor and for E. robusta from Brisbane and Byfield. Somewhat separate clustering was also seen for E. longirostrata from Starkvale and E. tereticornis from Mount Garnet, although some individuals either displayed admixture or were not clustered accordingly (Fig 3). Lastly, provenances of some species clearly could not be distinguished. This occurred with E. pellita, with E. longirostrata from Coominglah and Goodger, and with E. tereticornis from Mitchell Road (Oaky Creek) and Mareeba.

Fig 3. Population structure analysis of each Eucalyptus species separately for which more than one provenance was studied.

https://doi.org/10.1371/journal.pone.0289536.g003

Determination of ancestral species composition of hybrids

The ancestral genomic compositions of hybrids estimated with both fastSTRUCTURE and ADMIXTURE were compared to their respective pedigree-expected compositions (Fig 4; S6 File). The supervised analysis carried out using ADMIXTURE resulted, in general, in genomic compositions similar to those obtained with fastSTRUCTURE, although some differences were seen, for example in Hybrids 1, where the fastSTRUCTURE model indicated the unexpected presence of E. tereticornis genome. Overall, there were only ten out of the 44 hybrids for which the SNP-based composition closely matched the pedigree-expected one. This happened for hybrids Hyb-31, Hyb-32, Hyb-33, Hyb-34, Hyb-35, Hyb-36, Hyb-38, Hyb-39, Hyb-40 and Hyb-41, almost all of them simple F1 hybrids. For all other hybrids, small to large deviations were observed.

Fig 4. Analysis of the ancestral species’ genomic composition of the 44 interspecific hybrids studied.

The genomic proportions estimated by unsupervised inference with fastSTRUCTURE (top panel) and by a supervised model with species data as reference with ADMIXTURE (bottom panel), were compared with the expected composition from pedigree information (middle panel). The species were categorized into sections according to Brooker’s (2000) classification.

https://doi.org/10.1371/journal.pone.0289536.g004

For a considerable number of hybrids, species other than those recorded in the pedigree were observed in their composition (Fig 4). For example, hybrids Hyb-1 through Hyb-11 in the Hybrids 1 group were expected to be F1s of E. urophylla and E. camaldulensis. However, eight of them showed variable amounts of E. grandis genome in the ADMIXTURE analysis, while the fastSTRUCTURE model suggested the presence of E. tereticornis genome more frequently than that of E. camaldulensis. The unexpected presence of E. grandis genome was again seen in several hybrids in the Hybrids 2 group (e.g., Hyb-17, Hyb-18, Hyb-19, Hyb-26). Furthermore, in this group of hybrids, none or a considerably smaller than expected proportion of the genome was detected as coming from the recorded Maidenaria species involved in the crosses, namely E. dunnii and E. globulus. E. dunnii genome was not detected in six of the 14 hybrids and E. globulus genome in seven of the 12 where it should have been observed (Fig 4). For example, in hybrids Hyb-13, Hyb-17, Hyb-18, Hyb-19, Hyb-21 and Hyb-26, expected to be F1 hybrids of Latoangulatae species (E. urophylla, E. grandis or E. saligna) with Maidenaria species (E. dunnii or E. globulus), the SNP data showed little or no sign of the two temperate species’ genomes and an unexpected or larger than expected proportion of the genome of E. grandis. Finally, there were cases where the presumed genomic composition was completely different from the SNP-estimated one. For example, hybrid Hyb-42 was expected to be an E. dunnii x E. globulus hybrid, when in fact it involved mainly species of Latoangulatae with E. camaldulensis, suggesting mislabeling.
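The comparison between pedigree-expected and SNP-estimated composition can be made concrete with a short sketch. It assumes that each parent contributes half of the genome at every generation and that a pedigree is written as a nested (mother, father) pair of species names. The deviation measure, half the sum of absolute differences, is an illustrative choice and not one stated by the authors. All function names are hypothetical.

```python
def expected_composition(pedigree):
    """Expected genome fractions per species from a nested pedigree.

    A pedigree is either a species name (str) or a (mother, father) tuple
    whose elements are themselves pedigrees.
    """
    if isinstance(pedigree, str):
        return {pedigree: 1.0}
    fractions = {}
    for parent in pedigree:
        # Each parent is assumed to contribute half of the genome.
        for species, frac in expected_composition(parent).items():
            fractions[species] = fractions.get(species, 0.0) + 0.5 * frac
    return fractions

def composition_deviation(expected, observed):
    """Half the summed absolute difference: 0 = identical, 1 = no overlap."""
    species = set(expected) | set(observed)
    return 0.5 * sum(
        abs(expected.get(s, 0.0) - observed.get(s, 0.0)) for s in species
    )
```

For instance, a three-way hybrid recorded as E. urophylla x (E. grandis x E. camaldulensis) is expected to carry 50%, 25% and 25% of the respective genomes. If its estimated ancestry were 50% E. urophylla and 50% E. grandis, the deviation would be 0.25.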

Genetic distances among species, provenances and hybrids

Overall, the PCA plot based on the genetic distance matrix positioned the different species and sections as expected, clustering phylogenetically closer species of the same section (Fig 5). A clear exception, however, was seen for E. longirostrata, taxonomically classified in section Latoangulatae. The PCA placed it away from Latoangulatae and closer to E. argophloia and E. deglupta. These two species belong to two different sections but they clustered together, considerably separated from all other species. In most cases, the PCA analysis had no resolution to discriminate provenances within species. In line with the fastSTRUCTURE results, exceptions were the Kroombit Tops provenance of E. saligna, and the two provenances each of E. robusta and E. grandis that were separated in the PCA.

Fig 5. PCA scatter plot of the 484 Eucalyptus individuals in the first two principal components.

Samples colored by species, provenances and hybrids were grouped in their respective classes according to the taxonomic sections involved in the cross. The ellipses depict the 95% confidence interval for the distribution of each species or hybrid group.

https://doi.org/10.1371/journal.pone.0289536.g005

Discussion

Genome-wide eucalypt species SNPs diversity

Consistent with the initial validation data provided alongside the EUChip60K development [31], our results corroborate that the current SNP array platforms offer effective power to carry out genetic diversity analysis of the main eucalypt species planted worldwide. Within species, the proportion of polymorphic SNPs showed some variation, although for the vast majority over 40% of the SNPs were informative and the average MAF was generally above 0.13, despite the relatively limited sample sizes analyzed (Table 3). Higher proportions of polymorphic SNPs, from 68% up to 93%, and higher average MAF were observed for E. grandis, E. camaldulensis and E. urophylla. These results may be explained in part by the somewhat larger sample sizes analyzed. In the case of E. urophylla, the alleged mixture of provenances might have contributed to the higher diversity. A second explanation for the higher SNP diversity in these three species involves potential admixture due to unintended interspecific hybridization. These three species are widely used to generate interspecific hybrids in Brazil, and the structure analysis results indicated admixture in the E. urophylla and E. camaldulensis trees (see below).

A third possible explanation for the higher SNP diversity observed in E. grandis, E. camaldulensis and E. urophylla is some ascertainment bias derived from the panels used in the initial SNP discovery for the development of the EUChip60K. Although SNP discovery was carried out on sequence data for 240 trees of 12 species, a proportionally larger amount of sequence data was obtained for these three species when compared to the others [31]. Large proportions of informative SNPs (58–60%) were also seen for species of Maidenaria, consistent with the fact that E. globulus was also an important target of sequence production during SNP discovery. Larger proportions of polymorphic SNPs and higher average MAF were also observed in the different hybrids. This was expected, given the transmission to the hybrid of alternative SNP alleles fixed in each parental species.

Except for E. urophylla, E. grandis, E. camaldulensis and the hybrids, the results suggest that the rate of SNP polymorphism might depend more on the level of genetic diversity captured in the specific sample of individuals than on the particular species analyzed. This in turn indicates that the SNP set used delivers largely equivalent numbers of polymorphic SNPs between any pair of taxa within the main sections of subgenus Symphyomyrtus. It also indicates good potential for the selection of ancestry-informative SNP sets [52] that appear in substantially different frequencies between species, provenances or populations in this phylogenetic group. Expanding the number of species and provenances, and specifically selecting ancestry-informative SNPs at the species and provenance levels, would constitute an obvious follow-up to this study.

SNPs recover the expected species structure but admixture is present

Genome-wide SNP data provided the necessary resolution to check and validate the phylogenetic classification of germplasm sources of the eucalypt species sampled in this study. The most likely model for the SNP dataset found K = 18 clusters, allowing clear-cut discrimination of the five sections and the 16 species sampled of subgenus Symphyomyrtus, while reliably indicating the admixed composition of hybrids (Fig 2). This result substantiates what a number of previous phylogenetic studies have shown using different types of DNA marker data such as ribosomal ITS, chloroplast DNA, microsatellites and DArT (reviewed in [2]), and more recent studies that further expanded the sampling of taxa and individuals within taxa [8]. The evolutionary history ‘written’ in the genome of these Symphyomyrtus species is generally consistent with their current phylogenetic organization within this subgenus.

Unlike several previous reports that examined germplasm sampled exclusively in its center of origin, our study included material conserved in exotic conditions from variable sources (Table 1). In general, for the species’ germplasm that came directly from original sources in Australia, the genetic structure splits were clear-cut. For species that included material from unknown origins or collected in germplasm banks established in Brazil, occasional admixture was seen. The E. camaldulensis trees showed admixture with E. tereticornis, E. viminalis individuals showed admixture with E. dunnii, and E. urophylla sampled from multiple provenances established in Brazil displayed significant admixture with E. grandis (Fig 2). For some of these germplasm sources our data indicate that accidental hybridization might have taken place once the germplasm was introduced in Brazil. In the exotic habitat, under different ecoclimatic conditions, reproductive barriers between eucalypt species such as geographic distance and flowering phenology, which keep species apart in their natural range, are relaxed or even broken, facilitating hybridization [53]. The paradigmatic example is the famous eucalypt hybrid swarm of the Rio Claro Arboretum, established upon the introduction of Eucalyptus species in Brazil in 1904 [53, 54]. Several species were planted side by side, and seeds collected from that germplasm generated very heterogeneous plantation forests, where some hybrids of unknown origin and outstanding performance were selected and are still planted or used in breeding programs today [17, 18]. The results of our study point to the development of ancestry-informative SNPs that should allow reconstructing and understanding the recombination history of these hybrids.

The E. viminalis germplasm sample also showed evidence of admixture with E. dunnii and E. globulus at k = 18. This sample of trees came from an advanced-generation germplasm source established in Brazil but of unknown origin in Australia. Hybridization between these temperate species of section Maidenaria once introduced in Brazil cannot be ruled out, although it is less likely than for the previously mentioned species of Exsertaria and Latoangulatae, since Maidenaria species flower and produce seed less conspicuously in the tropics [53]. When a model with k = 20 was tested, E. viminalis individuals clearly split (S5 File) with no evidence of hybrid constitution. This result highlights the long-standing challenge with admixture modeling, whereby selecting the most likely number of clusters K is a difficult problem to automate in a robust way [39].
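The sensitivity to K described above can be made concrete with a minimal sketch, loosely in the spirit of the estimators of Puechmaille [44] referred to in S2 File: for a given Q matrix, count how many clusters are actually "occupied" by at least one predefined group (species or provenance). The function name, the 0.5 threshold and the toy membership values are illustrative assumptions, not the settings or data of this study, and the published estimators additionally summarize across replicate runs.

```python
from statistics import mean


def occupied_clusters(q_rows, groups, threshold=0.5):
    """Count clusters in which at least one group has mean membership > threshold.

    q_rows: per-individual membership vectors (each summing to about 1).
    groups: one group label (e.g. species or provenance) per individual.
    threshold: hypothetical cutoff for saying a group "belongs" to a cluster.
    """
    n_clusters = len(q_rows[0])
    by_group = {}
    for label, row in zip(groups, q_rows):
        by_group.setdefault(label, []).append(row)

    occupied = set()
    for rows in by_group.values():
        for k in range(n_clusters):
            if mean(r[k] for r in rows) > threshold:
                occupied.add(k)
    return len(occupied)


# Toy (made-up) Q matrix for K = 3: two groups, the third cluster is never dominant.
toy_q = [
    [0.90, 0.05, 0.05],
    [0.80, 0.10, 0.10],
    [0.10, 0.70, 0.20],
    [0.20, 0.60, 0.20],
]
toy_groups = ["sp1", "sp1", "sp2", "sp2"]
n_occupied = occupied_clusters(toy_q, toy_groups)
print(n_occupied)
```

Repeating such a count over the Q matrices obtained for each K is one simple way to check whether the extra clusters that appear at higher K correspond to real groups or only to residual admixture signal.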

The graphical projection of the different species and hybrids in the PCA was generally consistent with the phylogenetic expectations (Fig 5). Complementing the structure analysis, the PCA provided additional information regarding the genetic distance among the different taxa. E. deglupta and E. argophloia were placed at a considerable distance from the main section of Symphyomyrtus. The fact that they clustered together was, however, unexpected, since they are classified in distinct sections. These two species are currently part of Symphyomyrtus [55]; while no contention exists regarding the classification of E. argophloia, E. deglupta was originally classified in subgenus Minutifructus [4]. The three main sections of interest in the subgenus were clearly separated and contained the expected species, with the exception of E. longirostrata, which clustered away from its section Latoangulatae and far from Exsertaria as well.

Samples of E. longirostrata have been examined in the most extended molecular phylogenetic study of terminal taxa of sections Maidenaria, Exsertaria and Latoangulatae to date [8]. That study produced a phylogeny that largely matched the morphological treatment of sections, although sections Exsertaria and Latoangulatae were shown to be polyphyletic. Several inconsistencies between the morphological classification and the molecular phylogeny were described, and a number of taxa in Latoangulatae were deemed polyphyletic at the species level. A polyphyletic group is one that shows mixed evolutionary origin, descended from more than one ancestor, with taxa sharing homoplasies, typically explained as a result of convergent evolution, complicating the correct taxonomical classification [56]. E. longirostrata was itself deemed polyphyletic, classified within series Lepidotae-Fimbriatae and clustered into Latoangulatae IV, a clade considerably distant from Latoangulatae II where E. grandis, E. pellita, E. robusta and the section type species E. saligna belong. Furthermore, those authors suggest that all Latoangulatae species other than those in Latoangulatae II would be better placed in other taxonomic sections to reflect the phylogeny revealed in their study. The most recent classification of the eucalypts [14, 55] however, classified E. longirostrata into a different section, Pumilio. In our study, the sharp split of E. longirostrata from Latoangulatae and Exsertaria (Fig 5), provides further molecular evidence for this most recent taxonomic classification placing the species in a separate section.

Provenance discrimination is strongly dependent on geographical distance

With the exception of one provenance of E. saligna, no Eucalyptus provenance could be discriminated when all 484 samples were analyzed together (Fig 2). When species were analyzed separately, provenances could be discriminated for some species but not for others (Figs 3, 5). Looking at the geographical position of the sampled provenances (Fig 1), a pattern emerged suggesting that SNP-based discrimination was strongly dependent on geographical distance. The two provenances of E. grandis (Atherton and Coffs Harbor), separated in the structure and PCA analyses, are located more than 2,000 km apart. The same was observed for provenances Byfield and Brisbane of E. robusta, ~700 km from each other, and for the E. saligna Kroombit Tops provenance, located >700 km from the other two E. saligna provenances. All other provenances that were loosely or otherwise not discriminated are located less than 200–300 km apart. These results are consistent with an isolation-by-distance (IBD) model of population structure for the provenances sampled for these species. Under this model, the genetic similarity between populations decreases approximately exponentially as the geographic distance between them increases, because of the limiting effect of geographic distance on rates of gene flow [57].
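The isolation-by-distance reasoning above can be sketched numerically: compute great-circle distances between sampling sites and correlate them with pairwise genetic distances. The site names, coordinates and genetic distance values below are invented for illustration and are not data from this study. Because pairwise distances are not independent, a formal analysis would use a permutation approach such as a Mantel test rather than a plain correlation.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical sites (latitude, longitude) and made-up pairwise genetic distances.
sites = {
    "site_A": (-17.0, 145.0),
    "site_B": (-23.0, 150.0),
    "site_C": (-30.0, 153.0),
}
genetic = {
    ("site_A", "site_B"): 0.05,
    ("site_A", "site_C"): 0.12,
    ("site_B", "site_C"): 0.06,
}

pairs = list(combinations(sorted(sites), 2))
geo_km = [haversine_km(*sites[a], *sites[b]) for a, b in pairs]
gen = [genetic[(a, b)] for a, b in pairs]
r = pearson(geo_km, gen)
print(round(r, 2))
```

A strongly positive correlation between the two distance vectors is what an IBD pattern would look like; with only a few provenances per species, as here, such an estimate is necessarily imprecise.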

A number of studies in Eucalyptus have looked at the prevalence of genetic structure between populations located at various geographic distances. These studies have generally shown that an IBD model fits the observed data well, with genetic distances between provenances strongly positively correlated with geographic distances [24, 58, 59]. A recent landscape study based on very dense DNA data obtained by whole genome sequencing in E. albens and E. sideroxylon also found strong support for IBD in both species [60]. Taken together, our results and those of others indicate that clearcut distinction of Eucalyptus germplasm sources with regard to provenance variation might not be straightforward even with a dense panel of SNPs, unless provenances are geographically distant or provenance-informative SNP markers are specifically identified and used. As a result, what breeders may call different provenances could in effect be members of the same continuous population despite several kilometers of physical distance, if gene flow is ubiquitous. It must be mentioned, however, that our study suffered from limited and somewhat uneven sampling of provenances, which might have contributed to a greater difficulty in distinguishing some of them. It has been shown that subpopulations with reduced sampling tend to be merged together in genetic structure analyses, and uneven sampling may lead to downward-biased estimates of the true number of subpopulations [44]. Larger sample sizes for the provenances studied should allow better estimation of allele frequencies and possibly selection of ancestry informative, provenance-specific SNPs for greater discrimination power.

Genomic composition of hybrids indicates directional selection toward tropical genomes

Our genome-wide data showed that the majority of the hybrids studied (35 out of 44) displayed genomic composition deviating from the expected one based on pedigree information (Fig 4). This result is important in view of the long-standing and widespread adoption of deliberate breeding strategies toward the selection of elite hybrid clones with specific anticipated genomic composition, especially in tropical countries (reviewed in [12]). It also highlights one more important application of dense, high-quality array-based SNP data in support of breeding programs. SNP data not only provide precise germplasm identity verification but, more importantly, allow the breeder to objectively recognize the actual ancestral origin of superior hybrids in order to discard unwanted hybrid combinations or to more realistically guide the breeding program toward the development of the desired genetic material.

For the sample of hybrids studied in this work, the lack of adherence between the expected genomic composition and the actual one suggests at least two hypotheses. The first is mislabeling during controlled crosses, which is likely the case for hybrids Hyb-13, Hyb-14 and Hyb-42. The second and most probable hypothesis is pervasive genetic admixture of the parents involved in the original interspecific cross. Given the frequently unknown introduction history, followed by local intermating in Brazil in the last 120 years, as discussed previously, there is a considerable possibility that the presumed parents were themselves misclassified. Moreover, because hybrids tend to be produced by crossing good performing parents in the breeding program, it is quite possible that some of the parents used were themselves hybrids, distorting the expected composition of the resulting hybrid offspring. Species within the same sections of Symphyomyrtus that display overlapping morphological features and easily hybridize would be more prone to such occurrences. Clearcut examples were six supposed F1 hybrids that in principle did not involve E. grandis, but where the SNP data revealed its presence (Hyb-13, Hyb-17, Hyb-18, Hyb-19, Hyb-26, Hyb-42). Likewise, several F1 hybrids of E. urophylla with E. camaldulensis (Hybrids 1 group) showed variable amounts of E. grandis genome in their composition, and the presence of E. tereticornis genome more frequently than that of E. camaldulensis (Fig 4). Admixture of E. grandis genome into the E. urophylla parents and difficulties in morphologically discriminating E. camaldulensis germplasm from E. tereticornis could readily explain these results.
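The kind of comparison discussed above, pedigree-expected versus SNP-estimated ancestry, can be sketched as a small audit routine. This is only an illustration: the function name, the tolerance of 0.10, the presence cutoff of 0.05 and the ancestry values are hypothetical and are not the criteria or results used in this study.

```python
def audit_hybrid(expected, observed, tol=0.10, min_presence=0.05):
    """Compare pedigree-expected and SNP-estimated ancestry proportions.

    expected, observed: dicts mapping species name -> proportion (each summing to about 1).
    tol: hypothetical tolerance on the absolute per-species difference.
    min_presence: hypothetical cutoff below which an ancestry estimate is treated as noise.
    """
    species = sorted(set(expected) | set(observed))
    # Positive difference: more of that species' genome than the pedigree predicts.
    diffs = {s: observed.get(s, 0.0) - expected.get(s, 0.0) for s in species}
    # Species absent from the recorded pedigree but detected in the SNP-based estimate.
    unexpected = [
        s for s in species
        if expected.get(s, 0.0) == 0.0 and observed.get(s, 0.0) >= min_presence
    ]
    deviates = any(abs(d) > tol for d in diffs.values())
    return {"diffs": diffs, "unexpected": unexpected, "deviates": deviates}


# Made-up example: a presumed F1 of E. urophylla x E. camaldulensis.
expected = {"E. urophylla": 0.5, "E. camaldulensis": 0.5}
observed = {
    "E. urophylla": 0.40,
    "E. grandis": 0.15,
    "E. tereticornis": 0.35,
    "E. camaldulensis": 0.10,
}
report = audit_hybrid(expected, observed)
print(report["unexpected"], report["deviates"])
```

Keeping the thresholds as explicit parameters makes it clear that what counts as a "deviation" is a choice the breeder has to make, for instance in light of the uncertainty of the ancestry estimates.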

Besides the presence of E. grandis as an unexpected species in the genomically realized pedigree, larger than expected proportions of E. grandis genome were also seen for all hybrids where this species was involved. Fourteen hybrids derived from advanced generation recombinant intercrosses involved one or both hybrid parents with three or more species represented, E. grandis being one of them (Hyb-14, Hyb-15, Hyb-16, Hyb-20, Hyb-22 through Hyb-25, Hyb-27 through Hyb-29, Hyb-32, Hyb-43 and Hyb-44) (Table 2). The pedigree-expected proportions were estimated based on the final presumed participation of each single species in the pedigree, assuming balanced Mendelian inheritance and recombination rates in the previous hybrid generations with no selection. For all these 14 hybrids, however, the SNP data showed a consistently higher proportion of E. grandis genome in the hybrid composition. Aside from unintended admixture in the original parents, the ubiquitous unexpected presence or higher than anticipated proportion of E. grandis genome in the vast majority of hybrids strongly suggests genome-wide directional selection for this species’ genome throughout the breeding history of these complex hybrid clones. This should not be surprising given that volume growth is the main breeding target, and that E. grandis is well known for its fast growth [53]. Our data therefore not only corroborate the pivotal role of E. grandis in hybrid breeding, but also show that its actual participation is considerably larger than expected and frequently unintended. Moreover, our data also demonstrate that in hybrids between species of Latoangulatae and Exsertaria with species of Maidenaria (Hybrids 2 group), the actual participation of the latter, such as E. globulus, E. dunnii and E. benthamii, in the final hybrid’s genome composition is less than expected, consistent with strong selection against the less adapted temperate genomes in tropical environments.
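The pedigree-expected proportions mentioned above follow from simple halving at each generation. A minimal sketch of that calculation is given below, assuming each parent contributes exactly half of the genome and no selection; the nested-tuple pedigree encoding, the function name and the example cross are hypothetical and serve only to illustrate the arithmetic.

```python
from fractions import Fraction


def expected_composition(pedigree):
    """Return pedigree-expected species proportions.

    pedigree is either a species name (str) for a pure-species parent,
    or a 2-tuple (mother, father) whose elements are themselves pedigrees.
    Each parent is assumed to contribute exactly half of the genome.
    """
    if isinstance(pedigree, str):
        return {pedigree: Fraction(1)}
    mother, father = pedigree
    result = {}
    for parent in (mother, father):
        for species, share in expected_composition(parent).items():
            result[species] = result.get(species, Fraction(0)) + share / 2
    return result


# Hypothetical three-species cross: (E. urophylla x E. grandis) x E. globulus.
cross = (("E. urophylla", "E. grandis"), "E. globulus")
comp = expected_composition(cross)
for species, share in sorted(comp.items()):
    print(species, float(share))
```

These values are expectations only. Beyond the F1, segregation and recombination cause individual trees to vary around them even without selection, so a consistent shift in one direction across many hybrids is more informative than a deviation in any single clone.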

Concluding remarks

In conclusion, we have shown that the current Eucalyptus multi-species SNP array platform provides a valuable tool to look at within-taxa variation in Symphyomyrtus, to investigate population structure and to track the genomic ancestry of individual clones. As the current “gold standard” in the high-throughput SNP genotyping industry, SNP arrays provide full data portability across studies carried out at different times. This represents a crucial advantage for the construction of legacy SNP databases for multiple Eucalyptus species and populations when compared to reduced representation genotyping by sequencing methods. SNP array data portability across studies allows effortless data consolidation across time for comparative studies and meta-analyses, which should be valuable for resolving taxonomic issues that still persist in the eucalypts. We are aware, however, that for eucalypt species phylogenetically distant from subgenus Symphyomyrtus, the current SNP array will not provide equivalent numbers of informative SNPs due to a higher genomic divergence [31].

We have also shown that while species classification is well resolved at the genome-wide level, provenance discrimination is not always so. It depends essentially on geographical distance, consistent with an isolation-by-distance model, and is likely affected by sample size. Further studies with larger sample sizes and the identification of provenance-specific SNPs are warranted. Finally, our results are novel in that they objectively show, based on SNP data, that unplanned genetic admixture should not be a surprise in exotic germplasm sources, not only in Brazil but likely in other countries, especially among phylogenetically closer species that easily hybridize in exotic environments. Moreover, the genomic ancestral composition of control-crossed hybrids in Brazil indicated that strong selection takes place in favor of tropical genomes and more specifically that of E. grandis. SNP-based auditing of hybrids’ genomic composition could be introduced as a standard practice in hybrid breeding programs to more truthfully guide the program toward the development of the desired genetic material.

Supporting information

S1 File. SNP genotype data.

Complete dataset for the filtered 27,298 SNPs obtained with the 72k Eucalyptus Axiom Array for the 484 individuals studied.

https://doi.org/10.1371/journal.pone.0289536.s001

(CSV)

S2 File. Supervised estimators of k clusters.

Results of the four supervised estimators of Puechmaille (2016) to detect the number of clusters, as implemented in the web server StructureSelector (Li and Liu 2018), indicating that the germplasm set is most likely structured in 18 clusters after modelling with a variable number of k from 2 to 30 using fastSTRUCTURE.

https://doi.org/10.1371/journal.pone.0289536.s002

(DOCX)

S3 File. Output fastSTRUCTURE at k = 18.

Output of the meanQ values of the fastSTRUCTURE analysis of the 484 individuals studied with K = 18.

https://doi.org/10.1371/journal.pone.0289536.s003

(XLSX)

S4 File. Structure analysis plot at k = 2 to 4.

Population structure analyses of the Eucalyptus species and hybrids clustered with variable numbers of clusters (K) from 2 to 4 separating the Eucalyptus sections (Maidenaria, Latoangulatae and Exsertaria), while displaying admixture in species of section Latoangulatae.

https://doi.org/10.1371/journal.pone.0289536.s004

(DOCX)

S5 File. Structure analysis plot at k = 20 to 30.

Population structure analyses of the Eucalyptus species and hybrids clustered with variable numbers of clusters (K) from 20 to 30, beyond the most likely model with K = 18.

https://doi.org/10.1371/journal.pone.0289536.s005

(DOCX)

S6 File. Output fastSTRUCTURE & ADMIXTURE of hybrids’ composition.

Output of the meanQ values of the unsupervised fastSTRUCTURE analysis (sheet A) and supervised ADMIXTURE analysis (sheet B) of the ancestral genomic composition of the 44 hybrids.

https://doi.org/10.1371/journal.pone.0289536.s006

(XLSX)

Acknowledgments

We would like to thank the IPEF cooperative tree breeding program (PCMF) affiliated companies, highlighting CMPC, Klabin, Suzano and Vallourec for providing materials for the study. Special thanks to the Experimental Stations of Forestry Sciences at ESALQ/USP for maintaining a large genetic collection of eucalypts in partnership with IPEF (current agreement: 1013868) and to prof. Alexandre S. Coelho for his assistance with computational facilities.

References

  1. Myburg AA, Potts BM, Marques CM, Kirst M, Gion JM, Grattapaglia D, et al. Eucalyptus. In: C K, editor. Genome mapping and molecular breeding in plants Vol. 7: Forest trees. New York, NY, USA: Springer; 2007. p. 115–60.
  2. Grattapaglia D, Vaillancourt R, Shepherd M, Thumma B, Foley W, Külheim C, et al. Progress in Myrtaceae genetics and genomics: Eucalyptus as the pivotal genus. Tree Genet Genomes. 2012;3:463–508.
  3. Iglesias-Trabado G, Wilstermann D. Eucalyptus universalis. Global cultivated eucalypt forests map June 2009. Version 1.0.2: www.gitforestry.; 2009 [accessed 21 April 2023].
  4. Brooker MIH. A new classification of the genus Eucalyptus L’Her. (Myrtaceae). Australian Systematic Botany. 2000;13(1):79–148.
  5. Slee A, Brooker M, Duffy S, West J. EUCLID: Eucalypts of Australia. Collingwood, Australia: CSIRO Publishing; 2006.
  6. Steane DA, Nicolle D, Vaillancourt RE, Potts BM. Higher-level relationships among the eucalypts are resolved by ITS-sequence data. Australian Systematic Botany. 2002;15:49–62.
  7. Bayly MJ, Rigault P, Spokevicius A, Ladiges PY, Ades PK, Anderson C, et al. Chloroplast genome analysis of Australian eucalypts–Eucalyptus, Corymbia, Angophora, Allosyncarpia and Stockwellia (Myrtaceae). Molecular Phylogenetics and Evolution. 2013;69(3):704–16. pmid:23876290
  8. Jones RC, Nicolle D, Steane DA, Vaillancourt RE, Potts BM. High density, genome-wide markers and intra-specific replication yield an unprecedented phylogenetic reconstruction of a globally significant, speciose lineage of Eucalyptus. Molecular Phylogenetics and Evolution. 2016;105:63–85. pmid:27530705
  9. Steane DA, Nicolle D, Sansaloni CP, Petroli CD, Carling J, Kilian A, et al. Population genetic analysis and phylogeny reconstruction in Eucalyptus (Myrtaceae) using high-throughput, genome-wide genotyping. Mol Phylogenet Evol. 2011;59(1):206–24. pmid:21310251
  10. Griffin AR, Burgess IP, Wolf L. Patterns of natural and manipulated hybridisation in the genus Eucalyptus L’Herit. – a review. Australian Journal of Botany. 1988;36:41–66.
  11. Potts BM, Barbour RC, Hingston AB, Vaillancourt RE. Turner Review No. 6: Genetic pollution of native eucalypt gene pools – identifying the risks. Australian Journal of Botany. 2003;51(1):1–25. ISI:000181019700001.
  12. Potts BM, Dungey HS. Hybridisation of Eucalyptus: Key issues for breeders and geneticists. New Forest. 2004;27:115–38.
  13. Butcher PA, McDonald MW, Bell JC. Congruence between environmental parameters, morphology and genetic structure in Australia’s most widely distributed eucalypt, Eucalyptus camaldulensis. Tree Genet Genomes. 2009;5(1):189–210.
  14. Nicolle D, Jones RC. A revised classification for the predominantly eastern Australian Eucalyptus subgenus Symphyomyrtus sections Maidenaria, Exsertaria, Latoangulatae and related smaller sections (Myrtaceae). Telopea. 2018;21:129–45. https://doi.org/10.7751/telopea12571.
  15. Harwood C. New introductions – doing it right. Proceedings of the Conference "Developing a Eucalypt Resource for New Zealand"; 2011; Blenheim, New Zealand.
  16. dos Santos GA, Nunes ACP, de Resende MDV, Silva LD, Higa A, de Assis TF. An index combining volume and Pilodyn penetration to study stability and adaptability of Eucalyptus multi-species hybrids in Rio Grande do Sul, Brazil. Australian Forestry. 2016;79(4):248–55.
  17. Rezende GDSP, Resende MDV, Assis TF. Eucalyptus Breeding for Clonal Forestry. In: Fenning T, editor. Challenges and Opportunities for the World’s Forests in the 21st Century. Dordrecht: Springer Netherlands; 2014. p. 393–424.
  18. Grattapaglia D, Kirst M. Eucalyptus applied genomics: from gene sequences to breeding tools. New Phytologist. 2008;179(4):911–29. ISI:000258266200005. pmid:18537893
  19. Assis T. Hybrids and mini-cutting: a powerful combination that has revolutionized the Eucalyptus clonal forestry. BMC Proceedings. 2011;5(Suppl 7):I18.
  20. Lima BM, Cappa EP, Silva-Junior OB, Garcia C, Mansfield SD, Grattapaglia D. Quantitative genetic parameters for growth and wood properties in Eucalyptus “urograndis” hybrid using near-infrared phenotyping and genome-wide SNP-based relationships. PLoS One. 2019;14(6):e0218747. pmid:31233563
  21. Paludeto JGZ, Grattapaglia D, Estopa RA, Tambarussi EV. Genomic relationship-based genetic parameters and prospects of genomic selection for growth and wood quality traits in Eucalyptus benthamii. Tree Genet Genomes. 2021;17(4):20. WOS:000679453800001.
  22. McKinnon GE, Vaillancourt RE, Steane DA, Potts BM. An AFLP marker approach to lower-level systematics in Eucalyptus (Myrtaceae). American Journal of Botany. 2008;95(3):368–80. pmid:21632361
  23. Hudson CJ, Freeman JS, Myburg AA, Potts BM, Vaillancourt RE. Genomic patterns of species diversity and divergence in Eucalyptus. New Phytologist. 2015;206(4):1378–90. pmid:25678438
  24. Rutherford S, Rossetto M, Bragg JG, McPherson H, Benson D, Bonser SP, et al. Speciation in the presence of gene flow: population genomics of closely related and diverging Eucalyptus species. Heredity. 2018;121(2):126–41. pmid:29632325
  25. von Takach Dukai B, Jack C, Borevitz J, Lindenmayer DB, Banks SC. Pervasive admixture between eucalypt species has consequences for conservation and assisted migration. Evolutionary Applications. 2019;12(4):845–60. pmid:30976314
  26. Sansaloni CP, Petroli CD, Carling J, Hudson CJ, Steane DA, Myburg AA, et al. A high-density Diversity Arrays Technology (DArT) microarray for genome-wide genotyping in Eucalyptus. Plant Methods. 2010;6:16. pmid:20587069
  27. Sansaloni C, Petroli C, Jaccoud D, Carling J, Detering F, Grattapaglia D, et al. Diversity Arrays Technology (DArT) and next-generation sequencing combined: genome-wide, high throughput, highly informative genotyping for molecular breeding of Eucalyptus. BMC Proceedings. 2011;5(Suppl 7):P54.
  28. Myles S. Improving fruit and wine: what does genomics have to offer? Trends in Genetics. 2013;29(4):190–6. pmid:23428114
  29. Gautier M, Gharbi K, Cezard T, Foucaud J, Kerdelhué C, Pudlo P, et al. The effect of RAD allele dropout on the estimation of genetic variation within and between populations. Molecular Ecology. 2013;22(11):3165–78. pmid:23110526
  30. Lowry DB, Hoban S, Kelley JL, Lotterhos KE, Reed LK, Antolin MF, et al. Breaking RAD: an evaluation of the utility of restriction site‐associated DNA sequencing for genome scans of adaptation. Molecular Ecology Resources. 2016;17(2):142–52. pmid:27860289
  31. Silva-Junior OB, Faria DA, Grattapaglia D. A flexible multi-species genome-wide 60K SNP chip developed from pooled resequencing 240 Eucalyptus tree genomes across 12 species. New Phytologist. 2015;206(4):1527–40. pmid:25684350
  32. Faria DA, Mamani EMC, Pappas GJ, Grattapaglia D. Genotyping systems for Eucalyptus based on tetra-, penta-, and hexanucleotide repeat EST microsatellites and their use for individual fingerprinting and assignment tests. Tree Genet Genomes. 2011;7(1):63–77. ISI:000286462800006.
  33. Correia L, Faria D, Grattapaglia D. Comparative assessment of SNPs and microsatellites for fingerprinting, parentage and assignment testing in species of Eucalyptus. BMC Proceedings. 2011;5(Suppl 7):P41.
  34. Silva-Junior OB, Grattapaglia D. Genome-wide patterns of recombination, linkage disequilibrium and nucleotide diversity from pooled resequencing and single nucleotide polymorphism genotyping unlock the evolutionary history of Eucalyptus grandis. New Phytologist. 2015;208:830–45. pmid:26079595
  35. Grattapaglia D. Twelve Years into Genomic Selection in Forest Trees: Climbing the Slope of Enlightenment of Marker Assisted Tree Breeding. Forests [Internet]. 2022; 13(10).
  36. Mostert-O’Neill MM, Reynolds SM, Acosta JJ, Lee DJ, Borevitz JO, Myburg AA. Genomic evidence of introgression and adaptation in a model subtropical tree species, Eucalyptus grandis. Molecular Ecology. 2021;30(3):625–38. pmid:32881106
  37. Inglis PW, Pappas MdCR, Resende LV, Grattapaglia D. Fast and inexpensive protocols for consistent extraction of high quality DNA and RNA from challenging plant and fungal samples for high-throughput SNP genotyping and sequencing applications. PLoS One. 2018;13(10):e0206085. pmid:30335843
  38. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559–75. pmid:17701901
  39. Liu C-C, Shringarpure S, Lange K, Novembre J. Exploring Population Structure with Admixture Models and Principal Component Analysis. In: Dutheil JY, editor. Statistical Population Genomics. New York, NY: Springer US; 2020. p. 67–86.
  40. Jombart T. adegenet: a R package for the multivariate analysis of genetic markers. Bioinformatics. 2008;24(11):1403–5. pmid:18397895
  41. Goudet J. hierfstat, a package for r to compute and test hierarchical F-statistics. Molecular Ecology Notes. 2005;5(1):184–6. https://doi.org/10.1111/j.1471-8286.2004.00828.x.
  42. Lischer HEL, Excoffier L. PGDSpider: an automated data conversion tool for connecting population genetics and genomics programs. Bioinformatics. 2012;28(2):298–9. pmid:22110245
  43. Raj A, Stephens M, Pritchard JK. fastSTRUCTURE: Variational Inference of Population Structure in Large SNP Data Sets. Genetics. 2014;197(2):573–89. pmid:24700103
  44. Puechmaille SJ. The program structure does not reliably recover the correct population structure when sampling is uneven: subsampling and new estimators alleviate the problem. Molecular Ecology Resources. 2016;16(3):608–27. pmid:26856252
  45. Li Y-L, Liu J-X. StructureSelector: A web-based software to select and visualize the optimal number of clusters using multiple methods. Molecular Ecology Resources. 2018;18(1):176–7. pmid:28921901
  46. Francis RM. pophelper: an R package and web app to analyse and visualize population structure. Molecular Ecology Resources. 2017;17(1):27–32. https://doi.org/10.1111/1755-0998.12509.
  47. Auguie B. gridExtra. Miscellaneous Functions for "Grid" Graphics. 2017.
  48. Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Research. 2009;19(9):1655–64. pmid:19648217
  49. Paradis E, Schliep K. ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics. 2019;35(3):526–8. pmid:30016406
  50. Paradis E. pegas: an R package for population genetics with an integrated–modular approach. Bioinformatics. 2010;26(3):419–20. pmid:20080509
  51. Wickham H. ggplot2. Elegant Graphics for Data Analysis. Cham, Switzerland: Springer; 2016. 260 p.
  52. Phillips C, Salas A, Sánchez JJ, Fondevila M, Gómez-Tato A, Álvarez-Dios J, et al. Inferring ancestral origin using a single multiplex assay of ancestry-informative marker SNPs. Forensic Science International: Genetics. 2007;1(3):273–80. pmid:19083773
  53. Eldridge K, Davidson J, Harwood C, van Wyk G. Eucalypt domestication and breeding. Oxford: Clarendon Press; 1993. 288 p.
  54. Brune A, Zobel BJ. Genetic base populations, gene pools and breeding populations for Eucalyptus in Brazil. Silvae Genet. 1981;30(4–5):146–9.
  55. Nicolle D. Classification of the eucalypts (Angophora, Corymbia and Eucalyptus) Version 6. 2022. Available from: http://www.dn.com.au/Classification-Of-The-Eucalypts.pdf.
  56. Beentje H, Williamson J. The Kew Plant Glossary: An Illustrated Dictionary of Plant Terms: Kew; 2010.
  57. Kimura M, Weiss GH. The stepping stone model of population structure and the decrease of genetic correlation with distance. Genetics. 1964;49(4):561–76. pmid:17248204
  58. Jones TH, Vaillancourt RE, Potts BM. Detection and visualization of spatial genetic structure in continuous Eucalyptus globulus forest. Molecular Ecology. 2007;16(4):697–707. pmid:17284205
  59. Steane DA, Potts BM, McLean E, Collins L, Prober SM, Stock WD, et al. Genome-wide scans reveal cryptic population structure in a dry-adapted eucalypt. Tree Genetics & Genomes. 2015;11(3):33.
  60. Murray KD, Janes JK, Jones A, Bothwell HM, Andrew RL, Borevitz JO. Landscape drivers of genomic diversity and divergence in woodland Eucalyptus. Molecular Ecology. 2019;28(24):5232–47. pmid:31647597