Chloroplast variation is incongruent with classification of the Australian bloodwood eucalypts (genus Corymbia, family Myrtaceae)

Previous molecular phylogenetic analyses have resolved the Australian bloodwood eucalypt genus Corymbia (~100 species) as either monophyletic or paraphyletic with respect to Angophora (9–10 species). Here we assess relationships of Corymbia and Angophora using a large dataset of chloroplast DNA sequences (121,016 base pairs; from 90 accessions representing 55 Corymbia and 8 Angophora species, plus 33 accessions of related genera), skimmed from high throughput sequencing of genomic DNA, and compare results with new analyses of nuclear ITS sequences (119 accessions) from previous studies. Maximum likelihood and maximum parsimony analyses of cpDNA resolve well supported trees with most nodes having >95% bootstrap support. These trees strongly reject monophyly of Corymbia, its two subgenera (Corymbia and Blakella), most taxonomic sections (Abbreviatae, Maculatae, Naviculares, Septentrionales), and several species. ITS trees weakly indicate paraphyly of Corymbia (bootstrap support <50% for maximum likelihood, and 71% for parsimony), but are highly incongruent with the cpDNA analyses, in that they support monophyly of both subgenera and some taxonomic sections of Corymbia. The striking incongruence between cpDNA trees and both morphological taxonomy and ITS trees is attributed largely to chloroplast introgression between taxa, because of geographic sharing of chloroplast clades across taxonomic groups. Such introgression has been widely inferred in studies of the related genus Eucalyptus. This is the first report of its likely prevalence in Corymbia and Angophora, but this is consistent with previous morphological inferences of hybridisation between species. 
Our findings (based on continent-wide sampling) highlight a need for more focussed studies to assess the extent of hybridisation and introgression in the evolutionary history of these genera, and that critical testing of the classification of Corymbia and Angophora requires additional sequence data from nuclear genomes.

The taxonomic splitting of Corymbia from Eucalyptus was contentious (e.g. [4,23]), with the key motivation for the separation of Corymbia being that the bloodwoods, on the basis of both morphological analyses [2,24] and early molecular analyses [25], were more closely related to Angophora than to Eucalyptus. That relationship has been unequivocally supported by all subsequent molecular phylogenetic analyses of the group (e.g. [6,26,27,28,29,30,31]), and is supported by some morphological characters, including patterns of leaf venation, features of trichomes, and the presence of oil ducts in the pith of branches [2,24,32]. There are, nonetheless, clear differences in some macro-morphological features between the two groups that have led to the longstanding treatment of Angophora as a separate genus from the bloodwoods (whether placed in Eucalyptus or treated as Corymbia) by almost all authors (e.g., [5,8,9,18,33,34,35]) since Angophora was first described in 1797 [36]. The most notable differences between the groups are in the flowers, which in Angophora have free sepals and petals, in contrast to the calyptrate/operculate perianth of Corymbia (Fig 2). Despite such morphological differences, molecular phylogenetic analyses have presented conflicting signals regarding monophyly of Corymbia [37], with some resolving the genus as monophyletic (e.g. [6,27,38,39]), while others resolve it as paraphyletic, with Angophora nested within it [28,29,30,31,40].
Most phylogenetic analyses assessing the relationships of bloodwoods to other eucalypts have employed few DNA markers generated by conventional Sanger sequencing methods (e.g. [6,26,27,28,29,30,31]). High-Throughput Sequencing (HTS) methods, which can generate larger volumes of sequence data, are only just beginning to be used in eucalypt studies (e.g. [41]). Partly as a result of the small size of most molecular datasets, some key relationships have typically been poorly supported, including that of the bloodwoods to Angophora. For example, Maximum Parsimony (MP) bootstrap support values indicating paraphyly of Corymbia have generally been in the range of 51-93% for clades showing Angophora nested in Corymbia [28,30,40], and those for a monophyletic Corymbia have ranged from 78 to 100% [6,27,38,39]. An exception is a recent study that used analyses of whole chloroplast (cp) genomes [41], which showed strong support for Corymbia as paraphyletic with respect to Angophora (parsimony bootstrap support and Bayesian posterior probability both 100%), with subg. Blakella being more closely related to Angophora than to subg. Corymbia. However, that study, despite using a large amount of sequence data, included only one sample each of the red bloodwoods, yellow bloodwoods, ghost gums and spotted gums, and it was unclear whether the result was an artefact of sparse taxon sampling.
Fig 1. [...] [2] and following the classification of [6]). Colour coding of groups matches that used in other figures, i.e., taxonomic sections of Corymbia sensu Parra-Osorio et al. [6]. https://doi.org/10.1371/journal.pone.0195034.g001
In this study, we assess relationships of bloodwood eucalypts, including those among species, series, sections and subgenera of Corymbia, and those of Corymbia to Angophora. We expand on the sampling of Bayly et al. [41] to include the largest sample of species of Angophora and Corymbia in any molecular study using HTS data to date (Table 1). We specifically address questions relating to the evolutionary history of this large and ecologically and economically important group and test the current taxonomic classification, especially at the ranks of genus and subgenus. We analyse two datasets separately, one of chloroplast genome-derived sequences and one combining sequences of nuclear ribosomal internal transcribed spacer (ITS) regions from previous studies, using Maximum Likelihood (ML) and MP methods, to assess phylogenetic signal from both nuclear and chloroplast markers. Specifically, these methods are used to test the hypotheses that 1) Corymbia is monophyletic and 2) the currently recognised subgenera [6] are monophyletic.

Taxon sampling for chloroplast DNA study
Samples and sequences of chloroplast DNA used in this study are listed in Table 1. Names of species and infraspecific taxa generally follow the Australian Plant Census [1], and authorities are only given in the text for species names not listed in Table 1. Taxonomic works used to identify samples were [2,8,18]. Classification of subgenera and sections follows [6]; classification of series follows the informal classification of [2], where consistent with the higher-level groupings. For Corymbia, our sampling included 55 of the 97 species accepted by the Council of Heads of Australasian Herbaria (CHAH) [1], including all of the subgenera and sections recognised in the classification of [6], and with 19 species represented by at least two accessions. Sampling for Angophora included eight of the ten species recognised by CHAH [1], four of which were represented by at least two accessions. We also included sequences for 31 species of Eucalyptus and outgroups Allosyncarpia and Stockwellia from [41].
In total, the analysis included 123 accessions, of which 84 were newly sequenced for this study.

DNA isolation from silica dried leaves
For the cpDNA study, total genomic DNA (gDNA) was extracted from ca. 80 mg of recently collected leaf tissue (no older than one year) using a modified CTAB DNA extraction protocol [44,45]. It was difficult to extract suitable DNA from older silica-dried collections, probably due to chemical DNA degradation in this plant group rich in secondary metabolites. The CTAB lysis buffer (2% w/v cetyltrimethylammonium bromide (CTAB), 2% w/v polyvinylpyrrolidone 40,000 (PVP-40), 1.4 M NaCl, 20 mM EDTA, 100 mM Tris-HCl pH 8.0) was modified by addition of 0.6% v/v each of 2-mercaptoethanol, RNase A, and proteinase K per sample. Further modifications to the CTAB extraction protocol included a sucrose/Tris/EDTA (STE) wash (8% w/v sucrose, 1 M Tris-HCl pH 7.0, 0.5 M EDTA) before lysis using 1 mL of STE per 80 mg of ground plant tissue [46]. The STE solution was discarded after centrifugation at 5,000 rpm for 10 min, and the pellet suspended in 700 μL of preheated (65 °C) CTAB lysis buffer. After adding 110 μL bovine serum albumin (BSA)/NaCl (1:10, 4% BSA:5 M NaCl) to each sample, samples were left to incubate for ca. 16 h at 60 °C. Two 2/3 volume chloroform extractions were done, centrifuging for 10 min at 14,800 rpm for the first and then 8 min at the same speed for the second. DNA was precipitated with 2/3 volume of 100% isopropanol (room temperature). After 30-60 min incubation at room temperature, the DNA was centrifuged into a pellet at 14,800 rpm for 15 min and washed twice with 70% ethanol after discarding the isopropanol. DNA was resuspended in 100 μL TE pH 8.0 (10 mM Tris-HCl:1 mM EDTA pH 8.0) after leaving the pellet to dry overnight to allow all of the ethanol to evaporate. DNA quantity and quality were checked with Nanodrop 2000 (NanoDrop Products) and Qubit 2.0 fluorometer (Invitrogen) instruments and visualised by electrophoresis (1.5% agarose gel) with ethidium bromide.

DNA library construction and sequencing
This section details a relatively cost-effective library preparation protocol at ca. AUD 35 per sample using no proprietary kits. All reagents are from New England BioLabs (NEB) if not stated otherwise. Immediately before sonication, a DNA aliquot was washed with ethanol/sodium acetate (5.5:1, 100% ethanol:2.4 M NaAc) at a 1:4.7 DNA:wash solution volume, and then centrifuged at 14,800 rpm for 10 min. After discarding the wash solution, the resulting pellet was washed with 70% ethanol and resuspended in 100 μL 1 M Tris-HCl pH 8.0. DNA was quantified with a Qubit 2.0 (Invitrogen), and an aliquot of 3 μg of gDNA per sample was brought to 115 μL with ultrapure H2O. The DNA was sonicated for 50 sec with an S220 Focused-ultrasonicator (Covaris) set to 6-8 °C, 120 W peak incident power, 200 cycles per burst, and on duty cycle 5%, aiming for 800 bp mean fragment size. The sonicated samples (100 μL each) were cleaned using Serapure SPRI beads [47] at a 0.6:1.0 beads:sample ratio to remove short fragments (<300 bp) by incubating this mixture for 20 min at room temperature, immobilising beads on a 96S super magnet plate (Alpaqua) for 15 min, discarding the supernatant and washing with 170 μL 80% ethanol, and then leaving the magnet-trapped beads to air dry for 2 min.
Blunt ends were produced on the fragmented DNA with the NEBNext End Repair Module, by eluting the DNA from the magnet-trapped beads and incubating with 2.0 μL 10× reaction buffer, 0.4 μL enzyme mix, and 17.6 μL ultrapure H2O per sample at 20 °C for 60 min. Samples were again purified using the Serapure SPRI beads by adding 50 μL PEG:NaCl (20% PEG w/v:5 M NaCl) and 50 μL 100% isopropanol to each sample, incubating this for 15 min at room temperature, and washing with 80% ethanol as in the above steps. Then dA tails were attached to the fragments using 0. [...] [47] was run to 20 cycles on a BioRad CFX q-PCR machine. Settings for the q-PCR were 30 sec at 98 °C for denaturation and 20 cycles of 98 °C for 10 sec, 67 °C for 30 sec, and 72 °C for 30 sec. Once the sample-appropriate cycle number was determined from this q-PCR, a 40 μL reaction, including sample-specific PE primers with indexing barcodes to allow pooling of multiple samples per sequencing run, was run using the same settings as before.
Samples were pooled and quality checked either with an Agilent Bioanalyser DNA1000 chip system (Agilent) for HiSeq 1500 (Illumina) sequencing or a 2200 Tape Station using the D1000 kit (Agilent) and Qubit 3.0 (Invitrogen) for sequencing on a NextSeq 500 machine (Illumina). A 250 cycle (2 × 125 paired end reads) kit (Illumina) was used for the former or a 300 cycle (2 × 150 paired end reads) kit (Illumina) for the latter sequencer.

Sequence trimming, quality control, read mapping and chloroplast sequence assembly
Base calling and quality filtering were done with Illumina pipeline software (v.1.7 or later) and samples were pre-processed with custom scripts at the Walter and Eliza Hall Institute of Medical Research (WEHI) sequencing facility.
The new sequences were assessed, trimmed and assembled with CLC Genomics Workbench v. 9.5.1 and 9.5.2 (Qiagen) and the CLC Workflow is available as supplementary material (S1 File). Paired-end reads were paired and reads shorter than 15 or longer than 1000 bp were discarded. Reads below PHRED score 20 were also discarded. The fraction of low quality bases that were allowed in a read was 5%.
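The filtering criteria above can be illustrated with a minimal Python sketch. This is not the CLC Genomics Workbench workflow (which is provided in S1 File); it is one plausible reading of the stated thresholds, in which a read is kept if its length lies between 15 and 1000 bp and no more than 5% of its bases fall below PHRED 20. The function names and the Phred+33 quality encoding are assumptions made for illustration.

```python
from typing import List

MIN_LEN = 15              # reads shorter than this are discarded (from the text)
MAX_LEN = 1000            # reads longer than this are discarded (from the text)
MIN_PHRED = 20            # quality cut-off (from the text)
MAX_LOW_FRACTION = 0.05   # allowed fraction of low-quality bases (from the text)


def phred_scores(quality_string: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string; Phred+33 encoding is an assumption."""
    return [ord(ch) - offset for ch in quality_string]


def keep_read(sequence: str, quality_string: str) -> bool:
    """Return True if a read passes the length and quality filters.

    Hypothetical helper: the exact way CLC combines these settings
    may differ from this interpretation.
    """
    if not MIN_LEN <= len(sequence) <= MAX_LEN:
        return False
    scores = phred_scores(quality_string)
    low_quality = sum(1 for q in scores if q < MIN_PHRED)
    return low_quality / len(scores) <= MAX_LOW_FRACTION
```

For example, a 20 bp read with one base below PHRED 20 (5% of bases) would be retained under this reading, whereas the same read with two such bases (10%) would be discarded.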
The quality-filtered and paired reads were mapped against a reference chloroplast genome (Eucalyptus globulus, GenBank accession: NC_008115.1). Sequence coverage was generally sufficient to unambiguously assemble most of the chloroplast genome for each sample, but all genomes included some regions with low coverage of mapped reads. A consensus sequence of the mapped assembly was created by removing regions with low coverage and inserting ambiguity codes for bases with more than one possible nucleotide using a threshold of 50% and 'maximum number of ambiguous nucleotides allowed after trimming = 2'.
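The consensus step can be sketched as follows. This is an illustrative simplification, not the CLC algorithm: each alignment position is represented as a string of the bases from mapped reads, positions below a hypothetical minimum coverage (min_coverage, a value not given in the text) are dropped, and an IUPAC ambiguity code is written when more than one base reaches the threshold share of reads. Treating the reported 50% threshold as a per-base share of reads is an assumption.

```python
from collections import Counter
from typing import Iterable, Optional

# IUPAC nucleotide codes keyed by the set of bases they represent
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def call_base(column: str, min_coverage: int = 10,
              threshold: float = 0.5) -> Optional[str]:
    """Call one consensus base from the read bases at a position.

    Returns None for low-coverage positions (to be removed).
    min_coverage is a hypothetical parameter; threshold semantics are assumed.
    """
    bases = [b for b in column.upper() if b in "ACGT"]
    if len(bases) < min_coverage:
        return None
    counts = Counter(bases)
    kept = {b for b, n in counts.items() if n / len(bases) >= threshold}
    if not kept:
        # No base reaches the threshold: fall back to the most frequent base(s)
        top = max(counts.values())
        kept = {b for b, n in counts.items() if n == top}
    return IUPAC[frozenset(kept)]


def consensus(columns: Iterable[str], **kwargs) -> str:
    """Build a consensus string, skipping low-coverage positions."""
    calls = (call_base(col, **kwargs) for col in columns)
    return "".join(c for c in calls if c is not None)
```

Under these assumptions, a position covered by five A and five G reads would be written as R, and a position covered by only three reads would be omitted from the consensus.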

Sequence alignment and phylogenetic analyses of chloroplast DNA
All newly generated chloroplast sequences included in the study were aligned with MAFFT v.7.299b [48] and the fast and progressive method FFT-NS-2, suitable for large alignments. One inverted repeat region (IRa) was excluded from the alignment. The alignment was viewed with SeaView v.4.6 [49] or Mesquite v.3.10 [50] and subsequently processed with GBLOCKS v.0.91b [51] using default parameters, which stringently trims alignments allowing no gaps. Hence, the final dataset only included regions with sequence coverage for all samples, and all indels were removed.
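The effect of excluding gapped positions can be illustrated with a short Python sketch. It is not a reimplementation of GBLOCKS, which also evaluates conservation of flanking positions and minimum block lengths; it shows only the "no gaps allowed" criterion, under which a column is kept only if every sample has a nucleotide there. The function name and the gap characters are illustrative assumptions.

```python
from typing import Dict


def drop_gapped_columns(alignment: Dict[str, str],
                        gap_chars: str = "-?") -> Dict[str, str]:
    """Keep only alignment columns in which no sequence has a gap.

    Simplified illustration of gap-free trimming; not the GBLOCKS algorithm.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    keep = [i for i in range(length)
            if all(alignment[name][i] not in gap_chars for name in names)]
    return {name: "".join(alignment[name][i] for i in keep) for name in names}


# Toy alignment with hypothetical sample names
toy = {"sample1": "AC-GT", "sample2": "ACGGT", "sample3": "A-GGT"}
trimmed = drop_gapped_columns(toy)
removed_fraction = 1 - len(trimmed["sample1"]) / len(toy["sample1"])
```

In the toy alignment, two of five columns contain a gap in at least one sample and are removed, so every retained position has data for all samples.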
We used jModelTest v.2.1.10 [52,53] and the AIC and BIC criteria to estimate the model of nucleotide substitution that best fits the chloroplast data. The maximum likelihood analysis was done with Standard RAxML v.8.2.8 [54] with 1000 rapid bootstrap inferences and a thorough ML search under the GAMMA model of rate heterogeneity. The maximum parsimony analysis was done with PAUP* v.4.0a151 [55] with the following settings: all characters were treated as unordered and of equal weight. Heuristic searches employed tree-bisection-reconnection branch swapping and 1000 replicates of random stepwise additions. The number of bootstrap replicates was 1000 with one tree held at each step. Trees were viewed and exported for rendering in FigTree v.1.4.3 [56].
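The general idea behind the bootstrap support values can be sketched in Python. This is the generic nonparametric bootstrap (resampling alignment columns with replacement and counting how often a clade recurs), not RAxML's rapid bootstrap algorithm or the PAUP* implementation; tree inference itself is omitted, and the function names are hypothetical.

```python
import random
from typing import Dict, FrozenSet, List, Set


def bootstrap_alignment(alignment: Dict[str, str],
                        rng: random.Random) -> Dict[str, str]:
    """Resample alignment columns with replacement (one pseudoreplicate)."""
    names = list(alignment)
    length = len(alignment[names[0]])
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][i] for i in columns)
            for name in names}


def clade_support(clade: FrozenSet[str],
                  replicate_clades: List[Set[FrozenSet[str]]]) -> float:
    """Percentage of bootstrap replicate trees that contain the clade.

    Each element of replicate_clades is the set of clades (as tip sets)
    found in the tree inferred from one pseudoreplicate.
    """
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)
```

In practice a tree would be inferred from each pseudoreplicate, and clade_support would then be applied to the clades of interest across the 1000 replicate trees.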

Analysis of nuclear ribosomal DNA
For comparison with the cpDNA phylogeny, we combined sequences of the ITS regions of nuclear ribosomal DNA (nrDNA) from previous studies [6,26,29,30,31,57,58] for phylogenetic analyses. Separate analyses of these nrDNA sequences have not been presented in previous studies, with most including only a small number of Corymbia samples or, in the case of the largest study to date [31], also combining a subset of these sequences with cpDNA markers in analyses of a concatenated dataset. Our dataset included 66 accessions of Corymbia (representing all taxonomic sections), 15 of Angophora (9 of 10 species), 31 of Eucalyptus (the same species as in the cpDNA dataset, representing major lineages), two of Stockwellia, one of Eucalyptopsis, three of Allosyncarpia, and one of Arillastrum (used as outgroup). Partial sequences, or those identified as spacers associated with pseudogenes using established criteria [59,60,61,62], were excluded from analyses. We used existing nrDNA sequences for analyses, rather than assembling novel sequences from our current genomic data for Corymbia, because of the presence of substantial within-genome variation in our samples (in line with previous reports [38,59,60]), and associated difficulties in separating and assembling sequences of the various paralogues/alleles, which is a challenging task worthy of separate investigation and discussion. The nrITS sequences from GenBank were aligned using Geneious v.9.1.7 [63]. Model testing, ML and MP analyses were conducted as outlined above with the addition that gaps present in the nrDNA alignment were treated as missing data in the MP analysis.

Analysis of chloroplast DNA
GBLOCKS eliminated 37% of the MAFFT alignment, resulting in 121,016 characters and 10,847 distinct alignment patterns in the final alignment (see supplementary material S2 File). Both AIC and BIC from jModelTest indicated the General Time Reversible model using gamma and invariant sites (GTR+I+G) as the best fit. Final ML optimization likelihood was -254343.104231. The maximum parsimony analysis had 3771 parsimony informative characters, 4116 variable but uninformative characters, and resulted in 12 trees with length = 10413, consistency index = 0.81, retention index = 0.96. Topologies of the ML and MP trees were similar (Fig 3), and most nodes had 85-100% bootstrap support (BS) for both ML and MP analyses, with the backbone, in particular, well supported.
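The character categories reported above can be made concrete with a small Python sketch. It uses the standard definitions (a parsimony informative character has at least two states that each occur in at least two taxa; a variable but uninformative character has more than one state but does not meet that condition) and counts distinct alignment patterns as unique columns. Ignoring ambiguity codes when classifying a site, and the function names, are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List


def alignment_columns(alignment: Dict[str, str]) -> List[str]:
    """Return the alignment as a list of column strings."""
    return ["".join(col) for col in zip(*alignment.values())]


def classify_site(column: str) -> str:
    """Classify a column as constant, uninformative or informative."""
    counts = Counter(b for b in column.upper() if b in "ACGT")
    if len(counts) < 2:
        return "constant"
    shared_states = sum(1 for n in counts.values() if n >= 2)
    return "informative" if shared_states >= 2 else "uninformative"


def summarise(alignment: Dict[str, str]) -> Dict[str, int]:
    """Count characters, distinct column patterns and site categories."""
    cols = alignment_columns(alignment)
    tally = Counter(classify_site(c) for c in cols)
    return {
        "characters": len(cols),
        "distinct_patterns": len(set(cols)),
        "informative": tally["informative"],
        "uninformative": tally["uninformative"],
        "constant": tally["constant"],
    }


# Toy alignment with hypothetical taxon names
toy = {"t1": "AAAAA", "t2": "AAGAC", "t3": "ACGAG", "t4": "ACGTT"}
summary = summarise(toy)
```

By the same definitions, the counts reported for the chloroplast dataset imply 3771 + 4116 = 7887 variable characters among the 121,016 aligned positions, i.e. about 6.5% of sites.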
Rooting the trees with Allosyncarpia and Stockwellia recovered relationships of a monophyletic Eucalyptus as sister to Angophora + Corymbia. Eucalyptus is composed of three clades corresponding to 'Eudesmids' (subg. Eudesmia, represented by E. erythrocorys) subtending a well-supported clade of 'Monocalypts' sensu [41] [...] Fig 3). Furthermore, subgenera Blakella and Corymbia, most non-monotypic sections (Abbreviatae, Maculatae, Naviculares, Septentrionales), and several species including more than one accession here are not monophyletic. In addition to Angophora, clade A includes a basal grade of a few species of red bloodwoods from southern Australia that do not group with all other red bloodwoods in clade B. The red bloodwoods in clade A include C. gummifera (monotypic sect. Corymbia) from south-eastern Australia and C. calophylla, C. ficifolia, and C. haematoxylon that correspond to sect. Calophyllae from south-western Western Australia (Fig 1). Clade A also includes the yellow bloodwoods (sect. Naviculares), spotted gums (sect. Maculatae), cadagi or C. torelliana (monotypic sect. Torellianae), and two ghost gums (sect. Abbreviatae). Although the latter all form a clade, spotted gum and yellow bloodwood species are interdigitated and taxonomic sections based on morphology do not form groups here. The ghost gums sensu Parra-Osorio et al. [6] are polyphyletic, because clade B also includes two separate clades of sect. Abbreviatae. In addition, clade B contains most of the red bloodwood species (sect. Septentrionales), in which the ghost gums are embedded.
Of 23 species of Angophora and Corymbia represented in the dataset by two or more samples, only four species (A. melanoxylon, C. eximia, C. grandifolia, and C. gummifera) are resolved as monophyletic, whereas most are indicated as paraphyletic or polyphyletic, and two species (C. trachyphloia and C. aparrerinja) have accessions split between the two major ingroup clades (clades A and B). This widespread incongruence between morphological and chloroplast data likely points to a complex evolutionary history in this group. In conclusion, neither of the hypotheses tested, 1) that Corymbia is monophyletic and 2) that the currently recognised subgenera of Corymbia are monophyletic, is supported by the chloroplast data.
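What it means for a species or group to be "resolved as monophyletic" can be shown with a minimal Python sketch. A rooted tree is written as nested tuples of tip labels, and a group is monophyletic if some node has exactly that set of tips as its descendants. The tree and tip names below are hypothetical and do not reproduce any part of Fig 3.

```python
from typing import FrozenSet, Iterable, List, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree) -> List[FrozenSet[str]]:
    """Return the tip set of every node of a rooted tree (tips included)."""
    found: List[FrozenSet[str]] = []

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            tips = frozenset([node])
        else:
            tips = frozenset().union(*(visit(child) for child in node))
        found.append(tips)
        return tips

    visit(tree)
    return found


def is_monophyletic(tree: Tree, group: Iterable[str]) -> bool:
    """True if the group's tips are exactly the descendants of one node."""
    return frozenset(group) in clades(tree)


# Hypothetical example: species "sp1" has two accessions that do not
# form a clade, because "sp2_a" is nested between them.
toy_tree = ((("sp1_a", "sp2_a"), "sp1_b"), ("sp3_a", "sp3_b"))
sp1_ok = is_monophyletic(toy_tree, ["sp1_a", "sp1_b"])
sp3_ok = is_monophyletic(toy_tree, ["sp3_a", "sp3_b"])
```

In this toy tree the two accessions of "sp3" form a clade, whereas those of "sp1" do not, which is the pattern reported here for most species with multiple accessions.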
Some phylogenetic signal in the cpDNA data is geographic, and Fig 4 (for clade A) and Fig 5 (for clade B) illustrate the proximity of accession localities that form subclades within the major two ingroup clades. For example, Fig 4B shows the geographic proximity of accessions included in clade A1 (Fig 3), which is composed of members of sections Maculatae and Naviculares, and Fig 5 shows geographic groups within clade B that each include a mix of species from different taxonomic series (see Table 1) within sections Septentrionales and Abbreviatae.

Analysis of nuclear ribosomal DNA
The nrDNA dataset included 663 aligned bases and 337 distinct alignment patterns (see supplementary material S3 File). AIC from jModelTest indicated a General Time Reversible model using gamma distribution of rates and a proportion of invariant sites (GTR+I+G), used for analysis here, and BIC indicated GTR+G as the best fit. Final ML optimization likelihood was -4709.258464. The MP analysis included 170 parsimony informative characters and the alignment had 90 variable but uninformative characters. Maximum parsimony analysis resulted in 1610 trees with length = 644, consistency index = 0.55, and retention index = 0.90. Topologies of the ML and MP trees were similar, and the ML tree is shown here, with MP bootstrap support values mapped onto it (Fig 6).
Analyses strongly supported the monophyly of Eucalyptus and of Corymbia + Angophora (both with ML/MP BS of 100%). Relationships within Eucalyptus were similar to those in the cpDNA tree, in that they resolved the main 'Monocalypt' and 'Symphyomyrt' clades, subtended by subg. Eudesmia (E. erythrocorys), although the position of monotypic subg. Acerosae (E. curtisii) was not resolved with support.
Corymbia was resolved as paraphyletic with respect to Angophora, and that relationship received weak to moderate support (BS of <50% for ML and 71% for MP). Within Corymbia, there was support for the monophyly of subg. Corymbia (ML/MP BS of 74/75%) and subg. Blakella (BS 99/98%). Monophyly was also supported for Corymbia sections Maculatae (BS 75/83%) and Naviculares (BS 90/86%). Relationships of other Corymbia sections represented by more than one species were generally poorly supported, i.e.: sect. Septentrionales was resolved as paraphyletic, but with <50% ML or MP BS; sect. Calophyllae was resolved as paraphyletic, on account of placement of C. gummifera (monotypic sect. Corymbia) with two samples of C. ficifolia with weak support (BS 60/58%); sect. Abbreviatae was resolved as paraphyletic with respect to monotypic sect. Torellianae, but only in the ML tree and with <50% bootstrap support. In conclusion, support was mixed for the hypotheses being tested here, 1) that Corymbia is monophyletic and 2) that the currently recognised subgenera of Corymbia are monophyletic. For hypothesis 1), the data are largely equivocal, there being only weak support for the nesting of Angophora in Corymbia (<50% BS for ML and 71% for MP); for hypothesis 2), the current subgeneric classification of Corymbia was moderately to strongly supported.

Discussion
The relationships within Eucalyptus generally confirm those of previous HTS studies [41] and therefore, our results for relationships among Eucalyptus will not be discussed further as our focus here is on Corymbia and Angophora.

Why are cpDNA relationships incongruent with nrDNA relationships and infrageneric classification of Corymbia?
A key result of the current study is that cpDNA relationships in Corymbia (Fig 3) are largely incongruent with the current circumscriptions of subgenera and sections, and with relationships inferred based on nrDNA (Fig 6). Such incongruence could occur if: A) current infrageneric groups are poorly defined and in need of taxonomic revision; B) the nrDNA gene tree does not accurately reflect phylogenetic relationships, e.g., as a result of mixing of orthologous and paralogous copies of this multi-copy cistron [60,64,65]; C) the cpDNA gene tree does not accurately reflect the phylogenetic relationships of taxa, e.g., as a result of processes such as incomplete lineage sorting [66,67,68] or chloroplast capture resulting from hybridisation and introgression [69,70].
We infer that the observed incongruence is consistent with the last explanation and, in particular, points to historical hybridisation and cpDNA introgression between lineages, as outlined below.
Fig 3. [...] Labelling to the right of the tree indicates the outgroup ('OG'), major groups of Eucalyptus, including subg. Eudesmia ('Eudesmids'), the symphyomyrt clade ('Symph'; including subgenera Alveolata, Cruciformes, Minutifructus and Symphyomyrtus), and the monocalypt clade ('Mono'; including subgenera Acerosae, Eucalyptus and Idiogenes), as well as subclades of Corymbia clustering by geographic proximity (A1-A3 and B1-B6) referred to in the text and Fig 4 (for clade A) and Fig 5 (for clade B). Species names of species represented by multiple accessions are followed by collection number for newly generated sequences or GenBank accession number (NC) for data generated for previous studies. Bootstrap support values are shown as percent maximum likelihood/maximum parsimony (MP mapped onto ML tree) with weighted edges indicating 100% support for both ML and MP. Support values <50% are omitted or dashed when the alternate analysis method had ≥50% support. https://doi.org/10.1371/journal.pone.0195034.g003
In conflict with the cpDNA gene tree, evidence for the monophyly of major infrageneric groups in Corymbia (subgenera and sections) comes from a general concordance between phylogenies based on nrDNA sequences (e.g., [6,29,30]) and morphologically defined infrageneric groups. For instance, the analysis of nrDNA presented here (Fig 6), based on ITS sequences, supports monophyly of the two subgenera (subg. Corymbia and subg. Blakella), the yellow bloodwoods (sect. Naviculares) and the spotted gums (sect. Maculatae), and it does not strongly contradict the monophyly of the ghost gums (sect. Abbreviatae). Most of these groups have historically been recognised on morphological grounds, although at varying taxonomic levels (e.g., [2,3,4]). Such concordance, from independent data sources, provides support for the notion that, on the whole, these taxa represent phylogenetic groups. Thus, it is striking that, among the molecular phylogenetic studies of Corymbia, it is only those including chloroplast data that show strongly supported nodes in conflict with the recognition of these groups (current study and [31,40]). Understanding the reasons for this conflict is central to gaining insight into the evolutionary history of Corymbia, and to properly testing its classification.
Chloroplast capture and incomplete lineage sorting are two processes commonly inferred to account for incongruence in plants between chloroplast DNA relationships and nuclear DNA phylogenies/morphological taxonomy. The relative importance of these processes can be difficult to infer or disentangle [71], but some clues can come from knowledge of the reproductive biology of the plants and of geographic patterns of DNA sequence variation. In terms of reproductive biology, a capacity to hybridise and interbreed is a necessary pre-requisite for the transfer of chloroplasts between lineages. In terms of geography, introgression necessarily occurs at particular locations, and can lead to geographic clustering in the sharing of related chloroplast sequences between species [71]. In contrast, such geographic clustering might not be expected in cases where incongruence with taxonomy results from incomplete lineage sorting of chloroplast genomes (e.g. [72]).
Although incomplete lineage sorting cannot be excluded as an explanation for aspects of cpDNA relationships in Corymbia, it seems likely from the reproductive biology of these trees (and that of other eucalypt genera), together with geographic patterns of cpDNA variation, that the observed patterns are largely consistent with a history of hybridisation and introgression. In terms of reproductive compatibility, pre-zygotic barriers to reproduction have been reported between some Corymbia species [73], but all members of Corymbia investigated so far have the same chromosome number (2n = 22; [74,75]). Both morphological variation and experimental crosses [2,73,76,77] provide evidence of substantial potential for hybridisation between species classified in different series, sections, and subgenera (summarised in Fig 7). Given this capacity for hybridisation across infrageneric groups, it seems likely that the taxonomic incongruence of cpDNA relationships in Corymbia could reflect similar processes to those seen in the better studied Eucalyptus. In Eucalyptus, such incongruence is clearly evident, with chloroplast variation commonly reflecting geography, rather than taxonomy, and widespread regional introgression of cpDNA between species, series, and sections is regularly inferred [71,78,79,80,81,82,83,84,85].
Geographic patterns in Corymbia observed here (Figs 4 and 5) provide support for a history of cpDNA introgression between species, in the form of geographically clustered cpDNA clades shared among taxa that would be considered distinct lineages based on morphological or nrDNA evidence. Such patterns are consistent with cpDNA introgression between species across a range of morphological infrageneric groups.
Some of the clearest geographic patterns in our dataset relate to geographic cpDNA clades shared among species classified, on the basis of morphology [2], in different taxonomic series (Table 1) within the same section. For example, within sect. Septentrionales three geographic clades in north Queensland each contain a mixture of species placed in different taxonomic series, i.e., clade B1 (containing members of ser. Arenariae, Polycarpae, and Dichromophloiae; Fig 3), clade B2 (containing members of ser. Rhodopes and Abergianae), and clade B3 (containing members of ser. Trachyphloiae and Polycarpae). Likewise, clade B5 (including members of ser. Dichromophloiae and ser. Ferrugineae) is geographically clustered in the Mid-West region of Western Australia. Apart from the monotypic ser. Abergianae, series with species falling in clades B1, B2, B3, and B5 are also distributed across other clades in the phylogeny and hence not monophyletic (e.g., C. dichromophloia, C. hamersleyana, and C. trachyphloia). Similar patterns are also seen in clades of sect. Abbreviatae, where clade B4 includes intermixed representatives of series Polysciadae and Papuanae, and clade B6 includes intermixed members of ser. Grandifoliae and Asperae. Interestingly, in clade B4, the morphologically distinctive species C. polysciada (ser. Polysciadae) is strongly supported as polyphyletic, with the two accessions each having cpDNA haplotypes more closely related to those of closely occurring species from ser. Papuanae (compare Fig 3 and inset on Fig 5B). Although each of these morphologically defined series present in clades B1-B6 cannot be assumed a priori to represent monophyletic groups (e.g., there is no nuclear genetic evidence to support their monophyly), the presence of geographically distinct chloroplast clades found across morphologically distinctive taxa suggests a history of local chloroplast introgression between lineages, much as seen in Eucalyptus.
Geographic patterns in cpDNA clades shared across different taxonomic sections or subgenera are less clear than those among series, but geographic links can still be discerned. For instance, clade A1 (Fig 3), although ranging from north Queensland to southern New South Wales in eastern Australia, has a cluster of samples from south-east Queensland including one of C. citriodora (sect. Maculatae) that groups in the cpDNA gene tree as sister to a clade including a sample of C. leichhardtii (sect. Naviculares) that occurs nearby (Fig 4B). Similarly, clade A2, ranging from northern New South Wales to mid-east Queensland (Fig 4C), includes, in reasonably close geographic proximity, members of sect. Maculatae, together with C. torelliana of monotypic sect. Torellianae. However, the representative of sect. Naviculares in this clade is geographically more distant, and the representative of sect. Abbreviatae could not be mapped because the sample is of unknown provenance. At least the geographic and cpDNA association of C. torelliana and the three samples of sect. Maculatae is consistent with known capacity for these groups to interbreed [73,76]. Again, these groups (although not these species) have previously been reported to hybridise ([77]; Fig 7).
A striking feature of the cpDNA tree is placement of two clades of ghost gums, subg. Blakella sect. Abbreviatae (clades B4 and B6; Fig 3), within the large clade B that otherwise includes almost all samples of the red bloodwood group subg. Corymbia sect. Septentrionales. This contrasts with phylogenetic analyses of nrDNA sequences (Fig 6 and [6,25,27,29,30]), in which sect. Abbreviatae groups with other members of subg. Blakella. The geographic clustering of clades B4 and B6, and their distribution in areas where members of sect. Septentrionales are common, especially in the Northern Territory, is consistent with these clades reflecting historical chloroplast transfer from red bloodwoods to ghost gums, e.g., potentially two distinct events of chloroplast transfer, in each case with a red bloodwood as the initial maternal parent.
It is worth noting that taxon sampling for the current study was designed primarily to sample across the major taxonomic groups of Corymbia, to assess their relationships on the basis of chloroplast sequences, and it was not designed specifically to assess geographic patterns of chloroplast variation. As such, there are substantial geographic distances between many samples, and the spread of samples is quite unbalanced, e.g., with only one sample of sect. Abbreviatae from eastern Australia (C. tessellaris, of unknown wild provenance), and none from the north-west of Western Australia. The inferences here of historical chloroplast introgression between major taxonomic groups, especially subgenera and some sections, are consistent with the observed patterns of cpDNA variation, and knowledge of bloodwood reproductive biology, but remain speculative. Our study provides insight into the potential importance of this process in the evolutionary history of Corymbia, but more detailed studies using fine-scale geographic sampling, including multiple replicates of species, are necessary both to properly test for the presence of chloroplast introgression, and to more fully appreciate its significance and any accompanying patterns of nuclear gene flow.

Chloroplast DNA relationships in Angophora
The genus Angophora was strongly supported as monophyletic, as found in all molecular phylogenetic studies of eucalypts that have sampled two or more Angophora species (e.g. [6,27,29,30,31]). Most previous studies have included only a small number of exemplars from Angophora, and thus neither the species limits, which differ between some treatments [1,18,86,87], nor the proposed infrageneric classification [87] have been critically tested by molecular data. Our cpDNA study included multiple accessions of four species, three of which were strongly indicated as paraphyletic or polyphyletic (Fig 3); the one species that was resolved as monophyletic, A. melanoxylon, was represented by only two samples from the same geographic area, near St George in south-east Queensland. The sample set here is small (15 samples from 8 species), and was not collected to assess geographic variation within or between taxa, but it seems reasonable that taxonomic incongruence with cpDNA variation in Angophora, as in Eucalyptus and Corymbia (see above), reflects, at least in part, a history of cpDNA introgression between species. Consistent with this is the resolution of A. subvelutina as paraphyletic (in particular, with one sample shown as sister to a closely co-occurring sample of A. bakeri), and the polyphyly of A. floribunda, in which the northernmost sample of known provenance (TMS14-33; Table 1) falls in a clade of other samples from nearby areas, and is well separated in the phylogeny from the southernmost sample (MJB 2471). An influence of chloroplast introgression on this gene tree would be consistent with the observation of Leach [86] that hybridisation between species of Angophora "... has been observed in virtually all combinations that are geographically or ecologically conceivable". As with Corymbia, fine-scale studies of cpDNA variation could be used to test for both the presence and extent of cpDNA introgression amongst Angophora species.
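
The distinction drawn above between species resolved as monophyletic and those resolved as non-monophyletic can be made concrete with a minimal sketch. It assumes a rooted tree written as nested Python tuples with tip labels as strings; the tip labels (X_1, Y_north, and so on) and the topology are hypothetical and are not taken from Fig 3. The function names tip_set and is_monophyletic are likewise illustrative and do not describe the software used in this study.

```python
def tip_set(node):
    """Return the set of tip labels under a node (a tip is a plain string)."""
    if isinstance(node, str):
        return {node}
    tips = set()
    for child in node:
        tips |= tip_set(child)
    return tips


def is_monophyletic(tree, group):
    """True if some clade of the rooted tree contains exactly the tips in `group`."""
    group = set(group)
    if tip_set(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Hypothetical rooted topology: two accessions each of three species X, Y, Z.
toy = ((("X_1", "X_2"), "Y_north"), (("Y_south", "Z_1"), "Z_2"))

for species, tips in {
    "X": {"X_1", "X_2"},
    "Y": {"Y_north", "Y_south"},
    "Z": {"Z_1", "Z_2"},
}.items():
    print(species, "monophyletic" if is_monophyletic(toy, tips) else "not monophyletic")
```

In this toy tree only species X forms a clade. The accessions of Y fall in separate clades, each closer to another species, and those of Z have an accession of Y nested among them. The check reports only whether a group forms a clade; separating paraphyly from polyphyly, or judging support for the result, requires the full tree and its bootstrap values.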

Implications for genus-level taxonomy
A primary aim of this study was to use HTS chloroplast data, from a broad sample of infrageneric groups, to test the monophyly of the bloodwood genus Corymbia as currently circumscribed. The inferred chloroplast relationships strongly support nesting of Angophora in Corymbia, thus making Corymbia paraphyletic (Fig 3), as also shown, with less support, in previous cpDNA studies using either more limited taxon sampling or more limited sampling of the chloroplast genome [26,28,31,40,41]. However, given the clear incongruence between the cpDNA gene tree and taxonomic boundaries that are otherwise supported by both morphological characters and analyses of nrDNA, and given the likely influence of historical cpDNA introgression between lineages (discussed above), the cpDNA data, on their own, do not provide a sound basis for assessing generic limits in this group. The close cpDNA relationship of Angophora to some groups of Corymbia is unlikely to be directly attributable to cpDNA introgression (at least not recent introgression), because Corymbia-Angophora hybrids have not been reported (e.g. [2,73,77]), despite common co-occurrence of species [2] and attempts at artificial crosses (e.g. [73]).
To make sound taxonomic decisions, especially regarding the limits of genera and subgenera, among the bloodwoods and their relatives, better knowledge of relationships based on nuclear DNA sequences is essential. Analyses of nuclear DNA datasets have so far been limited to the ITS, ETS and 5S regions of nrDNA [6,25,27,29,30,38] and a small number of microsatellite markers [39], and have given mixed support for the monophyly/paraphyly of Corymbia. The nrDNA analyses here (Fig 6), for instance, using only ITS data, show 71% BS for the nesting of Angophora in Corymbia in the MP analysis but <50% support in the ML analysis, leaving open the possibility that Angophora might be sister to a monophyletic Corymbia. More thorough assessment of relationships will require analysis of more substantial datasets, for which there are now good prospects using HTS methods [37].
Even if Angophora proves to be nested in Corymbia based on nuclear data, placing them together in one genus might not be the best taxonomic solution for this group. Instead, raising one or more of the infrageneric groups of Corymbia to genus rank might be a better solution for recognising monophyletic, morphologically diagnosable (less heterogeneous) groups and minimising taxonomic upheaval (number of name changes). Adopting such a solution would first require clear understanding of the relationships of the bloodwood lineages to each other and to Angophora, informed by nuclear data, as well as the chloroplast data presented here. In the interim, and in the absence of other strongly contradictory evidence, we support continued recognition of both Angophora and Corymbia and the infrageneric groups of Corymbia as currently defined [6]. This is because there is support for most of these groups based on nrDNA data and morphology, and because name changes that are not soundly based, or might subsequently need revision in the face of stronger evidence, would cause major taxonomic instability in these economically important groups.

Acknowledgments
[...] Angophora species. We are grateful to Neil Gibson (WA) for collecting Corymbia during the ABRS Katjarra Region Bushblitz. In particular, we thank Dean Nicolle for access to the impressive living collection of eucalypts at Currency Creek Arboretum. We gratefully acknowledge Alex Johnson and Julien Bonneau for access to equipment (The University of Melbourne, Plant Nutrition Lab), Liz Milla (Walter and Eliza Hall Institute of Medical Research) for HTS data pre-processing, and Heroen Verbruggen (The University of Melbourne, Algae Lab) and Wake Forest University for computational resources. Will Neal and Harvey Orel assisted with the mounting and databasing of voucher specimens.
Plant collecting permits were provided by the Victorian Department of Sustainability and Environment, New South Wales National Parks and Wildlife Service, and Northern Territory Parks and Wildlife Commission.