Using Phylogenetic and Coalescent Methods to Understand the Species Diversity in the Cladia aggregata Complex (Ascomycota, Lecanorales)

The Cladia aggregata complex is one of the phenotypically most variable groups in lichenized fungi, making species determination difficult and resulting in different classifications accepting between one to eight species. Multi-locus DNA sequence data provide an avenue to test species delimitation scenarios using genealogical and coalescent methods, employing gene and species trees. Here we tested species delimitation in the complex using molecular data of four loci (nuITS and IGS rDNA, protein-coding GAPDH and Mcm-7), including 474 newly generated sequences. Using a combination of ML and Bayesian gene tree topologies, species tree inferences, coalescent-based species delimitation, and examination of phenotypic variation we assessed the circumscription of lineages. We propose that results from our analyses support a 12 species delimitation scenario, suggesting that there is a high level of species diversity in the complex. Morphological and chemical characters often do not characterize lineages but show some degree of plasticity within at least some of the clades. However, clades can often be characterized by a combination of several phenotypical characters. In contrast to the amount of homoplasy in the morphological characters, the data set exhibits some geographical patterns with putative species having distribution patterns, such as austral, Australasian or being endemic to Australia, New Zealand or Tasmania.


Introduction
Ever since Darwin's seminal study on the origin of species was published, the issue of delimiting species as the fundamental taxonomic unit has fascinated evolutionary biologists [1][2][3]. Understanding the delimitation of species is crucial for numerous fields of biological and ecological sciences and is important for conservation issues. However, the main challenge is finding and using appropriate character sets and analytical methods to recognize species. This is especially true for organisms with relatively simple morphologies, such as fungi where it is often difficult to find sufficient defining characters, or the choice of which might be viewed as arbitrary. Traditionally, species circumscriptions in lichen-forming fungi are based on phenotypical characters, such as thallus and ascomatal morphology and anatomy, or chemical characters, such as presence of extrolites (secondary metabolites). Environmental factors have been shown to influence phenotypic characters in various groups of lichens [4][5][6]. Also a remarkable amount of morphological disparity within clades has been demonstrated in several clades of lichenized fungi [7][8][9][10][11][12][13] calling the use of these characters to delimit taxa in question. Furthermore, it has repeatedly been shown that the traditional species delimitation underestimates the diversity of these fungi with numerous cryptic lineages discovered under currently accepted species in various unrelated families .
Here we focus on a notoriously difficult group of lichenized fungi, the Cladia aggregata complex. Previous classifications have accepted between one [40] and eight species [5] based on different interpretations of morphological and chemical diversity in the group. The genus is especially diverse in Tasmania with a number of chemo-and morphotypes only known from this island [5]. The group occurs in all vegetation types in Tasmania, including coastal heathland, wet peatlands, sclerophyll forests, rainforests, and alpine vegetation. Within Tasmania, much of the chemical diversity is confined to the south-west of the island. This area is rich in Tasmanian endemics and austral species with restricted geographical distributions [41]. Hence, from a conservational point of view, it is important to understand whether the bewildering morphological and chemical variation is due to a single, highly polymorphic species, or whether these represent distinct clades with restricted distributions.
The genus Cladia belongs to Cladoniaceae (Lecanorales, Ascomycota), which currently includes 16 genera with over 400 accepted species [42]. Most genera in this family have a dimorphic thallus with a crustose or foliose primary thallus and a vertical secondary thallus that bears fruiting bodies [43][44][45]. A few genera in the family also have foliose thalli [46][47][48][49][50]. Cladia spp. have a crustose primary thallus and a fruticose, secondary thallus, often referred to as pseudopodetium. In addition to the species with dimorphic thallus, our previous studies demonstrated that foliose taxa earlier classified in Heterodea and the crustose Ramalinora glaucolivida also belong here [7,8,51]. Currently, 14 species are accepted in Cladia [5, 51,52]. In addition to a remarkable morphological variation, there is an unusually high chemical diversity as well. Seven major chemosyndromes have been detected in the group [5,52]. Some of these chemotypes have been accepted as species [5,53], while other authors regard them as chemical races of a polymorphic species [40,54,55]. In previous phylogenetic studies [7,8], we found Cladia aggregata to be nonmonophyletic, although the taxon sampling in these studies was insufficient to address the species delimitation in the group. These findings and the fact that there is no consensus on the delimitation of species in the group prompted this study.
To address the species delimitation in the C. aggregata complex, we have generated a data set using four loci: internal transcribed spacer of nuclear ribosomal DNA (ITS), intergenic spacer of nuclear ribosomal DNA (IGS), and the protein-coding genes nuclear glyceraldehyde-3-phosphate dehydrogenase (GAPDH) and DNA replication licensing factor Mcm7 (Mcm7). The molecular data were used to perform gene trees in a maximum likelihood (ML) and Bayesian (B/MCMC) framework and coalescent-based species trees. We employed a combination of methods to address species delimitation in the Cladia aggregata complex, using gene tree estimation from single-locus and concatenated data sets, and species tree estimations. As a conservative first estimate we used a genealogical species recognition approach [56] in which presence of clades in the majority of single-locus genealogies is taken as evidence that these represent distinct lineages [57][58][59]. Since recently derived species will remain undiscovered by this method due to incomplete lineage sorting [60,61], we additionally used coalescent-based approaches. The general mixed Yule coalescent (GMYC) method developed by Pons et al. [62] and Monaghan et al. [63] allows, without prior expectations, locating of nodes that define the transitions between intraspecific (tokogenetic) and interspecific relationships. This method uses branch lengths differences to identify nodes that circumscribe species. A number of studies have employed this coalescent approach to successfully delineate species in various groups of organisms [63][64][65][66][67][68][69][70][71]. To choose among different species delimitation scenarios supported by the genealogical and GMYC methods, we employed an approach that utilizes a species tree estimation method to inform species delimitation decisions by a likelihood ratio test that measures the fit of gene trees within a given species tree. Finally, we reexamined morphological and chemical characters of the sequenced samples to study their phylogenetic structure in order to identify characters to distinguish these lineages.
Specifically, we addressed the following research questions: 1) does the Cladia aggregata complex represent a single, phenotypically variable species or multiple species-level lineages?, 2) is the phenotypical variation randomly distributed within the complex or correlated to phylogenetic structure, and 3) is there a correlation of geographical distribution and phylogenetic structure?.

Phylogenetic analyses
Four hundred and eighty-six DNA sequences of 126 representative samples of four loci were assembled, including 474 newly generated sequences (table S1). The specimens included the seven chemosyndromes in the Cladia aggregata complex and four of the currently described species in addition to C. aggregata, viz. C. deformis, C. dumicola, C. inflata, and C. moniliformis. We have not been able to get fresh material of C. mutabilis and C. oreophila and have not been able to obtain sequences from herbarium material of these species from HO. The latter species is a rare taxon that is only known from a few localities in remote mountain ranges of south-western Tasmania, whereas C. mutabilis is restricted to high rainfall peatlands in south-western Tasmania. We included only samples for which sequences for at least three of the four loci were generated in the analysis. Table 1 includes summary statistics for the four single-locus data sets. The single locus ML trees are shown in the figure S1. Since no strongly supported conflicts between the four single-locus ML phylogenetic trees were detected, a combined data set was analyzed. In the B/MCMC analysis of the combined data set, the likelihood parameters in the sample had a mean likelihood of LnL = 210,728.85 (60.354), while the ML tree had a likelihood of LnL = 29,763.28.
The phylogenetic estimates of the ML and B/MCMC analyses were congruent; hence only the ML tree ( Fig. 1) is shown here. The Cladia aggregata s. lat. samples fall into several clades (Fig. 1). The clades discussed below are those identified by the GMYC analyses as discussed below. Some clades consist of strongly supported subclades. There are some associations between clades and chemotypes as well as geographical origins. Six clades (3, 4, 5, 6, 8 & 9) are recognized by both methods of GMYC, whereas three clades (1, 2 & 7) are not identical in the single-and multiplethresholds GMYC modes. Of the nine clades obtained from the single-threshold, three have restricted distributional ranges (clades 6, 8 & 9). Clades 6 and 9 are restricted to Tasmania. Clade 8 includes specimens from mainland Australia and Tasmania. Five clades correspond with specific chemotypes (clades 2, 4, 6, 8 & 9). Clades 2, 4 and 8 include only members containing the barbatic acid chemosyndrome. Clade 9 includes samples with the fumarprotocetraric acid chemosyndrome. Clade 6 includes specimens with the homosekikaic acid chemosyndrome. The multiple-threshold mode of GMYC recognized 12 clades. Five clades corresponded to their geographical origins (1b, 2a, 6, 8 & 9). Clades 1b, 6 and 9 were restricted to Tasmania. Clade 2a included specimens from New Zealand, whereas clade 8 was composed of specimens from mainland Australia and Tasmania. Seven clades contained specimens with a specific chemosyndrome (Clades 2a, 2b, 4, 6, 7a, 8 & 9). Clades 2a, 2b, 4, 7a and 8 contained only members with the barbatic acid chemosyndrome, while the homosekikaic and fumarprotocetraric acids chemosyndromes were found in clades 6 and 9, respectively. Clade 1 includes samples from three currently accepted species, viz. C. aggregata, C. inflata and C. deformis. This clade has an austral distribution, occurring in Australia, New Zealand and southern South America (southern part of Chile). Two samples of C. inflata are clustering together and are nested within this clade, whereas C. deformis forms sister subclades and is clustering with C. aggregata. Five chemotypes are found in this clade, including the atranorin plus stictic, barbatic, fumarprotocetraric, fumarprotocetraric plus stictic and stictic acid chemosyndromes. However, the multiplethreshold mode recognized two distinct clades (1a & 1b) within this clade. Clade 2 consisted entirely of the members currently placed in C. aggregata containing the barbatic acid chemosyndrome and occurring in Australasia (including New Zealand). The multiplethreshold mode separated two different groups (2a & 2b). Clade 2a included only samples from New Zealand. Clade 3 included C. aggregata from Australasia and Southeast Asia (Penang). This clade comprised species containing the barbatic acid and stictic acid chemosyndromes. Clade 4 included the samples of the C. aggregata from Southeast Asia (Thailand and Malaysia), India and Neotropics. This clade comprised species containing the barbatic acid chemosyndrome. Clade 5 included C. aggregata samples from Australasia and La Réunion, which possess either the barbatic acid or homosekikaic acid chemosyndrome. Two morphologically distinct samples of C. moniliformis form clade 6 and contain the homosekikaic acid chemosyndrome. So far, C. moniliformis is only known from Tasmania. Clade 7 comprises specimens from Tasmania containing either the barbatic acid or caperatic acid chemosyndrome. While the majority of barbatic acid containing samples in this clade form one subclade, the other subclade included only two barbatic acid containing samples, which are intermixed with the caperatic acid containing samples (C. dumicola). Interestingly, the multiple-threshold mode placed all C. dumicola samples into one putative species (7a & 7b). Clade 7a contained only the C. aggregata samples with barbatic acid, while clade 7b included mainly specimens of C. dumicola. Clade 8 contained specimens collected in Australia. The specimens in this clade contained the barbatic acid chemosyndrome. Tasmanian collections containing the fumarprotocetraric acid chemosyndrome formed clade 9.

Species Delimitation based on the General Mixed Yule Coalescent (GMYC)
The GMYC model estimates the species delimitation from DNA surveys by identifying independently evolving lineages as a transition from intraspecific (coalescent mode) to interspecific (speciation branching patterns) relationships on a phylogenetic tree. The most recent common ancestral node at the transition point is interpreted as distinguishing species. 58 haplotypes were found in the combined data set. Identical haplotypes and the two outgroup samples (C. schizopora) were removed for the analysis. A maximum likelihood tree obtained from a RAxML search using the combined data set was used for the analyses. The GMYC analyses revealed different results for the single-and multiplethreshold modes.
In the LTT plot ( Fig. 2A) an increase in branching rates has been found at the tips of the linearized trees. In the singlethreshold analysis the GMYC model was preferred over the null model of uniform (coalescent) branching rates (logL GMYC = 2475.368 vs. logL 0 = 2463.811; 2DL = 23.113, p,0.001). Confidence intervals for the estimated number of species ranged from 9 to 11. The model fitted the switch in the branching pattern at 20.202281, leading to an estimate of 9 putative species (Fig. 3).
In the multiple-threshold analysis four switches were identified (Fig. 2B). The GMYC model was preferred over the null model of uniform (coalescent) branching rates (logL GMYC = 2478.457 vs. logL 0 = 2463.811; 2DL = 29.291, p,0.001). Confidence intervals for the estimated number of species ranged from 8 to 13. The model fitted the switch in the branching pattern at 20.2023, Figure 1. Maximum Likelihood tree depicting relationships within the Cladia aggregata complex on the basis of a concatenated data set including nuITS, IGS, protein-coding GAPDH and protein -coding Mcm7 sequences and estimated using partitioned ML analysis. Bootstrap support equal or above 70% is shown (ML) and posterior probabilities equal or above 0.95 are indicated as bold branches. The purple (single threshold) and green (multiple thresholds) shades represented species recognized by GMYC method. doi:10.1371/journal.pone.0052245.g001  20.1379, 20.0970, and 20.0602, leading to an estimate of 12 putative species and five single, unclassified individuals (Fig. 4).
The likelihood values of the single and multi-threshold analyses were not significantly different (p = 0.722), suggesting that the more complex multi-threshold analysis did not yield a significant improvement of the results. The RaxML tree obtained from a concatenated data set was used to illustrate the delimitation of putative species recognized by the single-and multiple-threshold methods of the GMYC approach (Fig. 1).

Evaluation of putative species in a genealogical framework
We used a genealogical concordance approach that takes the presence of clades in single-locus genealogies as evidence that these represent distinct lineages as discussed above. The presence of putative species identified by GMYC and their ML bootstrap support values are shown in Table 2 for the single and multiple threshold methods, respectively. Putative species 2 and 6-9 were present in all single-locus analyses, while the putative species 1 and 5 were present in three of the four single-locus phylogenies (Table 2a). Putative species 3 was present in half of the single-locus analyses, while putative species 4 only formed a monophyletic group in the GAPDH analyses. For the clades that were identified additionally in the multi-threshold GMYC analysis, putative species 1a, 2b, 7a, and 7b were present in all single-locus analyses (Tab. 2b), while putative species 1b and 2a were present in all but one single-locus tree.

Species tree analyses
*Beast analyses resulted in a 9-species scenario species tree where clades 1-5 formed a well-supported clade and the sistergroup relationship of this clade and clade 7 received significant support, while other relationships in the tree lacked support (Fig. S2A). The 12-species scenario *Beast tree had a similar topology (Fig. S2B), in which clades 1-5 formed a well-supported monophyletic group. Also the sister-group relationships of clades 2a and b and 7a and b each received support.
We used the protocol by Carstens and Dewey [72] to generate species trees from a given set of gene trees under different species delimitation scenarios. Using the same set of gene trees, we compared scenarios that included a 9-species, 12-species scenario, and a 1-species scenario, as explained under Material and Methods. An information-theoretic approach that accommodates for number of parameters strongly supported a 12-species scenario (Table 3) over the set of other species delimitation scenarios. The 1-species classification scenario received the lowest likelihood scores.

Association of phenotypical characters with putative species
Morphological and chemical characters and distributional ranges found in the samples belonging to the 9, resp. 12 putative species identified using the two methods of GMYC are listed in Table 4, table S2 and illustrated in Fig. 5. We calculated the homoplasy indices (consistency index CI and retention index RI) for 27 phenotypical characters using the ML tree of the combined data set as reference. A number of characters showed high levels of homoplasy, as indicated by their relatively low CI and/or RI values. Some characters, such as branching of fertile pseudopo-detia, matt vs. glossy surface, and length of conidia, which characterize C. moniliformis show no homoplasy, since they represent autapomorphies of that species. The results of the contingency table tests for the 9, resp. 12 putative species are listed in Table 4. Characters referring to fertile pseudopodetia were excluded from this analysis, since the sampling number was too low (due to numerous sterile specimens of some clades). We also excluded the three autapomorphic characters of C. moniliformis described above. Significant associations of characters with clades in the 9-species scenario were found in 17 characters, including morphology of the sterile pseudopodetia (eight characters) and for nine chemical characters. For the additional four species identified by GMYC using the multi-threshold method, significant associations were found for six chemical characters. Some putative species are characterized by their morphology in combination with their secondary chemistry, while others cannot be distinguished from others based on morphology or chemistry alone. Table 5 lists the character states found in the putative species of characters that were found to have phylogenetic signal in the contingency table tests, and additionally characters of the fertile pseudopodetia when found. In addition the known geographical distribution of each putative species is indicated.

Discussion
This study is another example for the hidden diversity in lichenforming fungi challenging the phenotypically-based species circumscription in these organisms, which underestimates the true species diversity [38]. We also found considerable spatial structure in the data sets with no haplotype occurring in more than one geographic region. This is remarkable for lichenized fungi, where most studies revealed wide distributions of single haplotypes [26,[73][74][75] and less geographic structure. Several putative species have restricted geographical distributions (Table 5), such as endemic to mainland Australia (putative species 1b, 8), Tasmania (putative species 6, 7b, 9), or New Zealand (putative species 2a) or have Australasian or austral distributions (putative species 1a, 2b, 3, 7a), while only two putative species have broad intercontinental distributions. These include putative species 4 with samples from the Neotropics, India and Southeast Asia. Additional sampling, however, is required to test whether the Asian material, which forms a monophyletic and well-supported clade (Fig. 1) is indeed conspecific with the Neotropical specimens, which are paraphyletic. Further, putative species 4 is not supported, indicating uncertainty in the circumscription of this species. Putative species 5 also lacks support and the two samples from Reunion form a wellsupported clade which is a sister-group to the Australasian samples, which also form a well-supported monophyletic clade (Fig. 1). To address the question whether putative species 5 consists of more than one species will require further studies with an extended sampling from Africa.
The Cladia aggregata complex is a controversial group with no obvious correlation of morphological and chemical variation. Some morphological and/or chemical variants have previously been recognized as distinct species. These include Cladia inflata with inflated pseudopodetia [76] and C. moniliformis with inflated pseudopodetia with bulbous segments [53], while others were predominantly recognized based on their chemistries [5]. Another morphologically deviating population was described as C. globosa from the Neotropics, but could not be included in this study since no fresh collections were available [52]. None of the recently described taxa have been tested using molecular data so far. Here we have most of the species currently recognized included in the molecular study, except C. globosa, C. mutabilis and C. oreophila. We have been able to include material of most chemotypes described in the C. aggregata complex.
The previously described species Cladia deformis, C. inflata and C. dumicola [5, 76] were not supported in their current circumscription by our analyses. In fact, the previously recognized species in the C. aggregata group, except C. inflata and C. moniliformis (Fig. 1), did not form distinct clades. The first previously recognized species, Cladia inflata formed a strongly supported clade which was nested within putative species 1a (Fig. 3). However, since it forms a sister-group to the remaining specimens of putative species 1a, additional sampling is necessary to identify the taxonomic status of this species. The samples in this clade were all collected in Australia, New Zealand and southern South America. Cladia deformis clustered within putative species 1b and nested among four different chemotypes of C. aggregata (Fig. 1). Cladia dumicola, which contains caperatic acid, was intermixed with C. aggregata containing the barbatic acid chemosyndrome (Fig. 3) forming putative species 7b.
The use of secondary metabolic characters for delimiting species in lichenized fungi has been supported by some phylogenetic studies (e.g. [30,31,[77][78][79][80], while other studies did not find a correlation between chemotypes and lineages found in molecular phylogenetic analyses [81][82][83][84][85][86]. There is a phylogenetic pattern in the distribution of secondary metabolites among putative species within the Cladia aggregata complex. However, it appears that chemical characters cannot consistently be used to characterize species-level lineages. For example, the barbatic acid chemosyndrome occurred in almost all clades (Fig. 3). Hence, the presence of chemosyndromes alone does not indicate the affiliation of a sample to a lineage in these fungi. Individuals within each of the chemotypes were morphologically variable and individuals in different chemotypes often shared each of morphological traits (Fig. 1). This could be due to the fact that some chemical and morphological characteristics, such as branching pattern or surface colors are homoplasious. The absence of a correlation of some phenotypical characters and phylogenetic placement may be due to different reasons. It seems obvious that they share ancestral polymorphisms and might retain some features of their ancestor's morphology. Moreover, our understanding on the effect of ecological conditions on morphological and chemical variation is still poor. Our study adds another example of our lack of knowledge in understanding adaptive values of phenotypical characters.
The GMYC method has previously been applied to species delimitation in various groups of organisms, such as insects [63], skinks [87], amphibians [88], rodents [71] and rotifers [89]. The single-threshold model identified fewer putative species (9) than the multiple-threshold model (12) in our analysis. Since the likelihood values of the single-and multi-threshold analysis were not significantly different, we employed two additional approaches: 1) a genealogical concordance approach comparing the presence of putative species in single-locus analyses and 2) a coalescent-based approach that uses a species tree estimation method to choose among species delimitation scenarios. Genealogical concordance among different loci provides strong evidence that clades represent reproductively isolated lineages [57][58][59]. However, different loci may give conflicting or ambiguous results due to various evolutionary processes associated with speciation (e.g., incomplete lineage sorting). Thus, using multi-locus sequence data and multiple empirical methods allow us to establish more robust species boundaries [38]. The four additional species identified by the multiple threshold method of GMYC were present in the majority of single-locus analyses and hence support that our data set includes 12 putative species. The coalescentbased species tree method significantly preferred the 12-species scenario. Consequently, we propose that the sampled specimens of the Cladia aggregata complex belong to 12 distinct species.
While some clades, such as putative species 6 (C. moniliformis), are characterized by several characters, other clades can be phenotypically characterized by combination of characters only or lack obvious phenotypical differences to other putative species. Our morphometric analyses demonstrated that several morphological and chemical characters were associated with clades. Since these characters can only be used in combination and to some extent to characterize clades, molecular data are necessary in most cases to unequivocally assign specimens to putative species. Despite the limitations in circumscribing species using molecular data, the effective use of genetic data appears to be essential to appropriately and practically identify natural groups in some phenotypically cryptic lichen-forming fungal lineages [33,37,80,[90][91][92][93]. Our reexamination also identified morphological and chemical characters that characterize, at least in combination, some of the putative species as also found in other groups of lichenized fungi [25,26,36,90,92,94]. Under the general lineage species concept [95,96], more independent properties associated with putative species boundaries are associated with a higher degree of corroboration, resulting in a truly integrative approach to species discovery. Hence the association of phenotypical characters with at least some of the putative species gives us confidence that the

Taxon sampling
The taxon sampling included most morpho-and chemotypes known from the Cladia aggregata group and two samples of C. schizopora as outgroup based on previous molecular studies [7,8]. In total over 150 samples were studied, but here only 126 samples are included for which we obtained sequences of at least three of the four loci studied. In the current classification these belong to 6 species. Loci sequenced were nuITS rDNA, nuIGS rDNA and the protein-coding GAPDH and Mcm7 genes. Specimens and sequences used for the molecular analyses are listed in table S1.
Thermal cycling parameters were: initial denaturation for 5 min at 94uC, followed by 35 cycles of 30 s -1 min at 95uC, 30-45 s at 52-55uC, 1-1.30 min at 72uC, and a final elongation for 10 min at 72uC. The cycle sequencing conditions was as described previously [8].
Sequence alignments were done separately for each data set using BioEdit [103]. Ambiguous regions in the nuITS alignment were removed manually before analysis.

Phylogenetic analyses
To test for potential conflict, maximum likelihood bootstrap analyses were performed on each individual data set with the same settings as in the concatenated analysis and with models as shown in Table 1, and 75% bootstrap consensus trees were examined for conflict [107]. Since no conflicts (i.e. well supported differences in the topology) were found, multi-gene data sets were analyzed under Bayesian approach (B/MCMC) and maximum likelihood (ML).
A Bayesian analysis was performed using MrBayes 3.1.2 [108] with substitution models as shown in Table 1. The data set were partitioned into eight parts including ITS, IGS and the three codon positions of each of the protein-coding GAPDH and Mcm7. Each partition was allowed to have own parameter values [109]. No molecular clock was assumed. Heating of chains was set to 0.2. Posterior probabilities were approximated by sampling trees using a variant of Markov Chain Monte Carlo (MCMC) method. Number of generations was 10 million. To avoid autocorrelation, only every 1000th tree was sampled. The first 40,000 generations were discarded as burn in. We plotted the log-likelihood scores of sample points against generation time using TRACER v1.4.1 (Tracer website. Available: http://tree.bio.ed.ac.uk/software/ tracer/. Accessed 2012 Nov 19) to ensure that stationarity was achieved after the first 40,000 generations by checking whether the log-likelihood values of the sample points reached a stable equilibrium value [108]. Additionally, we used AWTY [110] to compare split frequencies in the different runs and to plot cumulative split frequencies to ensure that stationarity was reached. Of the remaining 19,920 trees (9960 from each of the parallel runs) a majority rule consensus tree with average branch lengths was calculated using the sumt option of MrBayes. Posterior probabilities were obtained for each clade.
A maximum likelihood (ML) analysis was performed for each locus and the combined data set in RAxML 7.2.6 [111] using the GTRGAMMA model with 25 rate parameter categories. Support was then estimated by performing 1000 bootstrap pseudoreplicates [112]. Only clades with bootstrap support equal or above 70% under ML and posterior probabilities $0.95 in the Bayesian analysis were considered as strongly supported. Phylogenetic trees were visualized using the program Treeview [113].

General Mixed Yule Coalescent (GMYC) Species Delimitation
We used the DNA-based approach by Pons et al. [62] to delimit species. This method aims at detecting shifts in branching rates between intra-and interspecific relationships. Within a likelihood framework it uses chronograms to compare two models: a) a null model under which the whole sample derives from a single population obeying a coalescent process and b) an alternative general mixed Yule coalescent (GMYC) model. The latter combines equations that separately describe branching patterns within and among distinct lineages. The latter follow a Yule model including speciation and extinction, whereas intraspecific relationships follow a coalescent process. A likelihood ratio test (LRT) is used to evaluate whether the null model can be significantly rejected. If the GMYC model fits the data significantly better than the null model, the threshold T allows to estimating the number of species present in the data set.
The ML tree obtained from a RAxML search using the combined data set was used for the analysis. The two outgroup samples (C. schizopora) were excluded from the data set using the drop.tip command in ape [114]. A chronogram was calculated from the ML tree using the penalized likelihood method [115] as implemented in the chronopl command in ape. The GMYC method requires a fully dichotomous chronogram and thus we used multdivtime [116] to convert our chronogram into a fully dichotomous chronogram with internal branches of length zero, where appropriate. This modified chronogram was then analyzed using the gmyc package in SPLITS in R (version 2.10, www.cran. r-project.org), using the single and multiple threshold methods. After optimization, we plotted the lineage through time (LTT) plot [117] with the threshold indicated and a chronogram that had the putative species indicated. Finally, we used the summary command to summarize the output statistics, including the results of the LRT and the indication of the numbers of clusters and entities.

Species Tree analyses
Using *Beast [118], we generated species trees for two species delimitation scenarios: the 9-species scenario and the 12-species scenario, as described below. For these analyses we used all loci sampled that were also used in the concatenated analyses described above. Sixty million generations with a burn-in of 20 million were run and trees combined using the program LogCombiner in the Beast software package [119].
We applied a coalescent-based approach to test alternative hypotheses of species delimitation. We used the program STEM [120] following the protocol outlined in Carstens and Dewey [72] to estimate likelihood scores of alternative species delimitation scenarios among species trees. We performed a set of analyses assuming a classification including all samples in the complex in 1 species (1-species scenario), distinction of nine clades as species as suggested by the single threshold method of GMYC (9-species scenario), and accepting 12 clades as distinct species, as suggested by the multiple threshold method of GMYC (12-species scenario). Species tree analyses were performed with maximum likelihood gene trees estimated for each locus separately RAxML v7.2.7 [111,121]. STEM requires fully resolved gene trees, thus polytomies were resolved using MULTI2DI in the ape package [122]. ML scores were evaluated using likelihood-ratio tests (LRTs) to assess statistical significance after correcting for multiple comparisons with a Bonferroni correction.

Association of phenotypical characters with putative species
We used Mesquite [123] to calculate the homoplasy indices (consistency index CI and retention index RI) for 27 phenotypical characters using the ML tree of the combined data set as reference. Contingency table tests were done using the tools available on the in-silico website (Available: http://in-silico.net/ statistics/. Accessed 2012 Nov 19). For this analysis, autapomorphic characters of C. moniliformis were excluded and also traits of fertile pseudopodetia, since the sampling number was too low for those characters.