Genetic Diversity of the Cryptococcus Species Complex Suggests that Cryptococcus gattii Deserves to Have Varieties

The Cryptococcus species complex contains two sibling taxa, Cryptococcus neoformans and Cryptococcus gattii. Both species are basidiomycetous yeasts and major pathogens of humans and other mammals. Genotyping methods have identified major haploid molecular types of C. neoformans (VNI, VNII, VNB and VNIV) and of C. gattii (VGI, VGII, VGIII and VGIV). To investigate the phylogenetic relationships among these haploid genotypes, we selected 73 strains from 2000 globally collected isolates investigated in our previous typing studies, representing each of these genotypes and carried out multigene sequence analyses using four genetically unlinked nuclear loci, ACT1, IDE, PLB1 and URA5. The separate or combined sequence analyses of all four loci revealed seven clades with significant support for each molecular type. However, three strains of each species revealed some incongruence between the original molecular type and the sequence-based type obtained here. The topology of the individual gene trees was identical for each clade of C. neoformans but incongruent for the clades of C. gattii indicating recent recombination events within C. gattii. There was strong evidence of recombination in the global VGII population. Both parsimony and likelihood analyses supported three major clades of C. neoformans (VNI/VNB, VNII and VNIV) and four major clades of C. gattii (VGI, VGII, VGIII and VGIV). The sequence variation between VGI, VGIII and VGIV was similar to that between VNI/VNB and VNII. MATa was for the first time identified for VGIV. The VNIV and VGII clades are basal to the C. neoformans or the C. gattii clade, respectively. Divergence times among the seven haploid monophyletic lineages in the Cryptococcus species complex were estimated by applying the hypothesis of the molecular clock. The genetic variation found among all of these haploid monophyletic lineages indicates that they warrant varietal status.


Introduction
The Cryptococcus species complex includes two basidiomycetous encapsulated yeast species, Cryptococcus neoformans, an opportunistic pathogen, and Cryptococcus gattii, a primary pathogen. Both species are the most common fungal agents of infection of the central nervous system [1]. Two varieties of C. neoformans are recognized, C. neoformans var. grubii (serotype A), which is found worldwide, and C. neoformans var. neoformans (serotype D), which occurs mainly in Europe and South America [1,2,3]. C. gattii [4], was previously known as C. neoformans var. gattii (serotype B and C), and thought to be restricted to tropical and subtropical zones [2,5] until a recent outbreak of cryptococcosis occurred on Vancouver Island, Canada, which has expanded the range of this yeast to temperate regions [6].
Here, we investigated the phylogenetic relationships of the species and varieties of the Cryptococcus species complex to test the hypothesis that each haploid molecular type is monophyletic. We excluded the molecular type VNIII, which contains hybrid individuals of serotype AD. The phylogenetic analysis is based on comparing patterns of sequence variation across four unlinked genes involved in housekeeping, production of secreted enzymes and virulence: actin (ACT1) [17], orotate-phosphoribosyl transferase (URA5) [3], phospholipase B (PLB1) [18] and the gene encoding a 110-kDa neutral metalloendopeptidase (IDE) involved in degradation of insulin in humans and mammals [19]. The first three genes were chosen because they are polymorphic among various groups of the Cryptococcus species complex [3,9,10,17]. IDE is novel for this investigation. This gene was chosen because it is polymorphic between both Cryptococcus species [19]. In addition, this gene is conserved across a wide range of organisms [20,21,22], which makes it a candidate locus to be used for studying interspecies phylogenetic relationships [23]. The herein presented multigene sequencing data revealed seven major haploid lineages within the Cryptococcus species complex and provide further evidence to consider these major molecular types as individual varieties, if not species.

Strains
From a collection of 2000 cryptococcal strains previously analyzed, we selected ten strains each of the haploid molecular types VNI, VNII, VNIV, VGI, VGIII and VGIV, and 13 strains of VGII, representing six continents [9,15,24,25,26,Meyer et al. unpublished data] (Table 1 and 2). Only South America contained strains of every molecular type. In addition, four VNB strains, which until now had only been reported from Africa [15], were used to represent this new molecular type.

Mating type analysis
Primers specific for the MFa and MFa pheromone confirmed that MATa was predominant (66 out of 73 strains). Seven strains possessed the MATa allele: 2 VNIV, 1 VGI, 2 VGIII and 2 VGIV strains (Table 1). Non-specific amplicons were produced by MATa strains of VGIII and VGIV (data not shown). Therefore, bands of a length corresponding to the MATa amplicon were sequenced, and BlastN searches revealed 98% sequence similarity with the mating pheromone a 2 (MFa2) gene of the C. gattii strain E566 (GenBank accession No. AY710429). The two VGIII strains CN043 and LA 622 as well as the two VGIV strains LA 390 and LA 392 were accordingly designated as mating type a (GenBank Accession No. EU408654 -EU408657) ( Table 1 and 2). To our knowledge, this is the first report of the MATa mating type in strains of the molecular type VGIV. To confirm this finding we used primers specific for the SXI1a and SXI2a genes and obtained the same result as above ( Figure 1).

Mating
Mating of the two VGIV strains, LA 390 and LA 392, with the MATa reference strain of C. gattii, serotype C, NIH312 [27] and the crg1a mutant derivatives MATa supermater tester strain, JF101 [28] produced dikaryotic hyphae with fused clamp connections, basidia and bacilli-shaped basidiospores ( Figure 2). However, the strains LA 390 and LA 392 failed to mate with the MATa reference tester strains B4546 [29] and JF109 [28] (data not shown). Therefore, both strains, LA 390 and LA 392, were confirmed to possess the MATa mating type allele. No haploid fruiting was observed when the samples were incubated alone on V8 juice agar.

Phylogenetic analyses of individual and combined loci
Maximum parsimony and Bayesian methods were used to analyze phylogenetic relationships among the 73 cryptococcal strains selected, using four independent genetic loci, ACT1, URA5, PLB1 and IDE.
The four loci of the intron-excluded and intron-included datasets contained a total of 564 and 834 parsimoniously informative characters, respectively. A hypervariable region containing poly-T was found in the intron of the PLB1 gene of some VGII strains. This region was excluded from the analysis due to sequencing problems. A heuristic search of the ACT1 gene sequences resulted in two maximum parsimony trees (Length 351, CI 0.855, RI 0.957); the URA5 gene produced 1152 maximum parsimony trees (Length 208, CI 0.897, RI 0.973); the PLB1 gene produced six maximum parsimony trees (Length 395, CI 0.922, RI 0.994); and the IDE gene produced 24 maximum parsimony trees (Length 113, CI 0.912, RI 0.993). Character information and substitution models of each locus are presented in Table 3. Bayesian analyses revealed topologies very similar or identical to those obtained using maximum parsimony. The topologies of each gene in both intron-excluded and intron-included datasets were identical or very similar (data not shown).
To focus on the relationships among species and subgroups, and to avoid detecting incongruence among recombining strains within the lineages, a subset of strains was used for the analysis of combined data from four loci. This subset included a single representative strain from each of the major lineages identified by the single locus analyses (see Material and Methods for the strain numbers). Prior to the analysis, the genes were tested using incongruence length difference/partition homogeneity test (ILD/ PHT), which revealed no significant incongruence among the loci, when a conservative threshold of P,0.0001 was used. Because ILD test is prone to type I errors of incorrectly rejecting the null hypothesis of congruence among the datasets, a conservative threshold of P,0.0001 is recommended for interpreting the results of this test [30]. In our case, the P value of the ILD test of the combined dataset was 0.002, indicating that the null hypothesis of congruence cannot be rejected [30].
A heuristic search of the combined loci found 110 maximum parsimony trees (Length 1089, CI 0.864, RI 0.984) without introns and 1023 maximum parsimony trees (Length 1706, CI 0.866, RI 0.984) with introns. Bayesian analyses of the combined dataset revealed that topologies were very similar or identical to those obtained using maximum parsimony. Bayesian analyses of the data, including both exons and introns as well as excluding the introns generated phylograms with identical topologies and comparable statistical support. Both analyses strongly support the monophyly of C. neoformans and C. gattii ( Figure 3). As expected, Table 1. List of strains used in this study, including general strain information, serotype (ST), mating type (MAT), molecular type (MT) and the allele assignment for the four genes used in the multigene analysis. the VNI and VNII molecular types, which represent C. neoformans var. grubii have a sibling relationship with the VNIV molecular type, representing C. neoformans var. neoformans. In addition VNIV is basal to VNI and VNII ( Figure 3). Clades representing molecular types VGI, VGIII and VGIV are more closely related to one another than to the sibling VGII clade, which is positioned basal to them ( Figure 3). the VGIII clade. For convenience, these groups will be referred to as the ''VNII-1 group'' and the ''VGIV-1 group'', respectively ( Figure 3). For the C. neoformans taxa, the topology of the individual loci never conflicted with the phylogram of the combined loci. The analysis of the two C. neoformans var. grubii clades (VNI and VNII) confirmed their sibling relationship with the C. neoformans var. neoformans clade (VNIV) ( Figure 4). Conversely, the topologies of the individual loci of C. gattii conflicted with the phylogram of the combined loci. On the PLB1, URA5 and IDE phylogenies, the VGI, VGIII and VGIV clades are clustered together and formed a sibling group with VGII. However, the phylogram generated from ACT1 has a different topology in which, VGII strains are subdivided into two subclades. The deep branches of the ACT1 phylogram did not achieve strong statistical support in the C. gattii clade, unlike the combined data set. However, both analyses generated high statistical support for the major clades, with the exception of the VGI and VGII clades of the URA5 phylogram ( Figure 4).
The VNII-1 strains formed a monophyletic group, which was positioned between the VNI and VNII clades. Although Figure 3 and 4 suggest that the VNI-1 clade is more closely related to VNI, Figure 5 indicates that it has partial VNII sequence characteristics. The URA5 sequence analysis of the VNII-1 isolates revealed that these isolates are missing the recognition site for Sau96I at position 160, which is a specific, identifying marker of VNI isolates ( Figure 5), and the overall sequence similarity of the combined four loci showed that those isolates were more similar to VNI (99.41%) than VNII (98.91%) (Table 4).
Similarly, strains from the VGIV-1 group did not consistently form a monophyletic relationship with any of the C. gattii molecular type clades and were not supported by either analysis, except for the PLB1 gene, whose topology was similar to that of the combined loci (Figure 3 and 4). However, the sequence analysis of the four genes revealed VGIII strains that contained integrated parts of VGIV sequences, which resemble the intermediate position of the VNII-1 strains of C. neoformans ( Figure 5). The URA5 sequence analysis of the VGIV-1 isolates revealed that those isolates share the recognition site for Sau96I with VGIV at position 542, which lead to their identification as VGIV ( Figure 5), but the sequence similarity varied specifically for each of the four loci we investigated. The overall similarity to VGIII was 98.87% and to VGIV, 97.06%, resulting in an exchange of the position of those isolates in the individual gene trees (Table 4).

Recombination and clonality
To determine the extent of clonality and recombination in populations of different molecular types, we used three different tests of linkage disequilibrium, (i) the incongruence length difference/partition homogeneity test (ILD/PHT), (ii) two measures of index of association (I A and rBarD), and (iii) the phylogenetic incompatibility test. Since clonal reproduction can mask the effect of recombination, we prepared two different databases: one included all strains of each molecular type and the other included the clone corrected data of each molecular type from which identical genotypes were removed.
The ILD/PHT test showed similar results for each dataset, including and excluding individual genotypes (clone corrected). With the exception of VGII, the null hypothesis of clonality was not rejected. VGII showed incongruence in the phylogenies that allowed the rejection of clonality (P,0.0001).
In the second test we used two indexes of association measures, I A and rBarD. These tests are expected to be zero if populations are freely recombining and greater than zero if there is association between alleles (clonality). In the I A and rBarD tests including all isolates, the null hypothesis of recombination was rejected in all molecular types with the exception of VNI, which had the lowest value (Table 5). However, when the clonally corrected dataset was used, the I A and rBarD showed clear evidence of recombination in the molecular types VNI, VGII and VGIII (Table 5) and   confirmed the predominance of clonal reproduction among the other molecular types (VNII, VNIV, VGI and VGIV). The third analysis, the phylogenetic incompatibility test only rejected the null hypothesis of random mating for the inclusive datasets of VGIII and VGIV (Table 5).

Genetic variation of each locus
Overall, the ACT1 gene was the most conserved locus ( Table 6). Analysis of the sequence variation of each molecular type revealed that sequences of the VNIV and VGII clades were the most variable in C. neoformans and C. gattii, respectively (Table 6). In contrast to the overall comparable genetic variation of the Cryptococcus species complex, the IDE locus was exceptionally conserved among individual molecular types ( Table 6). The overall genetic diversities of C. neoformans and C. gattii were similar (P,0.376), but the clades of C. gattii were more diverse than those of C. neoformans ( Table 7). The genetic diversity between clades was higher for C. gattii (VGI-VGIV) than C. neoformans var. grubii (VNI and VNII) (P,0.001) and C. neoformans var. neoformans (VNIV) (P,0.038) ( Table 7).  Table 3. Phylogenetic characters of the intron-excluded and intron-included data sets.

Evolutionary Divergence
Maximum-likelihood estimations with and without a molecular clock demonstrated that every gene did not differ significantly from molecular clock expectations (P = 0.99). Comparing the two species, the C. gattii lineages (12.5 million years ago) evolved later than the major C. neoformans lineages, C. neoformans var. grubii and C. neoformans var. neoformans (24.5 million years ago), suggesting more recent recombination events (Table 8 and Figure 6). However, the most recent speciation event took place around 4.7 million years ago splitting the two monophyletic lineages within C. neoformans var. grubii, VNI and VNII (Table 8 and Figure 6).

VNII-1 group is identical to the previously identified VNB molecular type according to parsimony and Bayesian analysis
To determine whether VNII-1 isolates cluster with molecular type VNB, we performed a phylogenetic analysis with representative isolates of each molecular type including sequences of VNB isolates previously identified (bt1, bt31, bt131, bt22) [15]. The ILD test showed congruence for all pairs of the genes except IGS1 when pairing the data with URA5 and GPD1 (P,0.0001). However, since the IGS1 was used to describe the VNB molecular type [15], we included IGS1 in the combined dataset. A heuristic search of the ACT1 gene sequences found five maximum parsimony trees (Length 796, CI 0.884, RI 0.928). The tree topologies from the Bayesian analysis were similar to that found in the parsimony analysis. Despite the lack of significant support (,75 for parsimony bootstrap and ,95 for Bayesian posterior probability), the VNII-1 strains were related to the VNB strains ( Figure 7). Strain M27053 was closely related to strain bt31 with 100% support from the Bayesian posterior probability and 72% support from maximum parsimony bootstrap. The clade containing strain M27053 was significantly supported with a Bayesian posterior probability of 95% (Figure 7). The sequences were deposited in GenBank under the following accession numbers:

Discussion
The multigene phylogeny based on the ACT1, URA5, PLB1 and IDE sequences provides an additional support for the currently accepted two species concept for the pathogenic Cryptococcus species complex (C. neoformans and C. gattii). The seven clades in the phylogenetic trees reflect the haploid molecular types recognized previously by M13 fingerprint, AFLP and RFLP analyses [8,9]. For both species the obtained phylogenetic trees correspond to these molecular types and are highly supported by both maximum parsimony and Bayesian analyses. Clinical and environmental isolates clustered together in the respective major molecular types   as previously determined in an analysis of six genetic loci (ITS1/2, IGS1, CNLAC1, RPB1, RPB2 and TEF1) [14]. Here we found that the sequence diversity was comparable among strains of C. neoformans and C. gattii strains. Within the sample of C. neoformans, strains of C. neoformans var. grubii (VNI, VNII-1 = VNB and VNII) were more than twice as variable as strains of C. neoformans var. neoformans, confirming data reported in previous studies using different genetic loci and different sets of strains [8,13]. MLST analysis of C. neoformans strains from sub-Saharan Africa had also revealed extensive genetic diversity among C. neoformans var. grubii strains [31]. Based on the four loci analyzed, C. neoformans and C. gattii share an equivalent level of molecular variation. Extensive surveys of the Cryptococcus species complex in South America have revealed extensive genetic diversity among strains of C. gattii, as well as the coexistence of both mating types in nature and evidence of recombination [9,26,32]. These points, taken together, suggest that the evolutionary origin of C. gattii may be in South America.
Occasionally, molecular typing based on M13 fingerprinting, AFLP and RFLP analysis is misleading. We obtained conflicting results when comparing strains that were typed using PCR-based (PCR fingerprinting, RFLP and AFLP) analyses and sequencebased methods. For example, strains of ''VNII-1'' were genotyped as VNII according to the URA5 RFLP analysis but following multigene sequence analysis, they clustered with the VNI clade. Similarly, ''VGIV-1'' strains were identified as VGIV by URA5 RFLP analysis but clustered with the VGIII clade on multigene sequence analysis. Moreover, URA5-RFLP analysis of the VNII-1 and VGIV-1 clades gave different molecular types from those obtained by PLB1-RFLP analysis (data not shown). These discrepancies may be attributable to insufficient markers, genetic drift or recombination events ( Figure 5). Similar to results presented by others [11,12], our data indicate a lack of geographic concordance with the phylogeny, which suggest recent global dispersal of the Cryptococcus species complex. In addition, the apparent bias of MATa over MATa, as reported in previous surveys [33,34], would limit the ability of the yeast to undergo sexual reproduction and recombination, which would retard speciation. Thus, clonal populations have been documented in several cryptococcal habitats, including Australia [35,36], Thailand [37] and Canada [6] where the MATa is scarce.
To investigate the question of recombination within the Cryptococcus species complex, we analyzed the congruence or incongruence of individual gene genealogies (ILD/PHT test) [38,39] and the linkage disequilibrium (I A , rBarD and phylogenetic incompatibility) [38,40] to infer the extent of clonality. Though clonal, asexual reproduction is the main reproductive mode in the Cryptococcus species complex, the existence of recombination and putative sexual reproduction has been demonstrated in several natural populations [31,35,41,42]. In this investigation, the VGII clade was the only molecular type displaying unequivocal evidence of recombination in all the tests. Recent studies have provided evidence of sexual reproduction among strains of VGII on a global scale [43] and in local populations of Australia [35,44]. Frequent recombination in the global population of VGII could explain the capacity of this genotype to expand its ecological range, as has been seen in the case of the Vancouver Island outbreak [6]. In contrast, the global populations of VGI, VGIV and VNII seem to be predominantly clonal. Previous analysis of AFLP and MLST data also found limited evidence for recombination for these molecular types [15,35]. However, recent studies of selected populations found evidence for genetic re-assortment among environmental strains of VGI [45] and from veterinary strains of VNI [46]. For VNI and VGIII, we observed evidence of linkage equilibrium, but the ILD/ PHT analysis data showed the presence of clonality. Among global isolates, our results reflect clonality for most molecular types within the Cryptococcus species complex except VGII, but they do not reject the possibility of sexual reproduction within any molecular type if sampled within a circumscribed population [e.g. 47]. The ILD/PHT P value for each gene combined individually (P values ranged from 0.010 for ACT:PLB1 to 0.835 for URA5:IDE) and all genes combined (P = 0.002) showed no significant incongruence using a cut off point of 0.0001 [30], indicating a low degree or absence of recombination between the major molecular types, when only strains representative of the main lineages were used.  The multigene parsimony and Bayesian trees of the ACT1 gene of C. gattii showed some incongruences in the topologies between the molecular types, indicating that more recent recombinational events may have occurred. This observation is consistent with the estimation that the clades of C. gattii diverged prior to those of C. neoformans, as indicated in Figure 6. This divergence time is approximately in agreement with previous studies, which used a different set of genetic loci [11], but closer to the recent estimations based on the whole genome, which estimated a split between the two species at 80 million years [48]. The time since divergence between each molecular type clade of C. gattii (11.7 million years between VGIV and VGI/VGIII and 8.5 million years between VGI and VGIII) is much more than between VNI and VNII (4.7 million years ago) ( Figure 6). In addition, the divergence between VGII and VGI/VGIII/VGIV estimated to be 12.5 million years is in a similar range as the previously obtained divergence between the two varieties of C. neoformans, C. neoformans var. grubii and C. neoformans var. neoformans [11]. Hence, changing each clade to at least varietal status should be considered.
The VNB clade is recognized as a unique and highly diverse subpopulation of haploid isolates found in southern Africa [15]. Since the current study identified a special clade for the VNII-1 strains in the phylogenetic trees, which was in a similar position as the previously described VNB strains [15,49], we conducted a preliminary phylogenetic analysis of the VNII-1 strains using four genetic loci, PLB1, URA5, GPD1 and IGS1, and included a representative strain from each VNB sub-group: bt1 (VNB-A), bt22 (VNB-B), bt131 (VNB) and bt31 (VNB-C) [15]. The sequence analysis showed that the VNII-1 strains are closely related to the VNB clade (Figure 7). One of the strains, bt31, was almost identical to the strain M27053, which originated from South Africa. The other two isolates, Hamden C3-1 and LA511, which originated from South America (Brazil and Colombia, respectively), had related sequences, but without significant support from both analyses. However, these results suggest that VNB strains may not be unique to southern Africa. Although the clade containing strains Hamden C3-1 and LA511 did not receive significant support here, a previous study showed that strain Hamden C3-1 clustered with a VNB strain [14]. Considering their sequence similarities (Figure 5), VNB and VNII-1 strains may represent a link between the VNI and VNII clades. Whether VNB merits status as a distinct molecular type remains an open question, which needs to be addressed specifically in further studies with more VNB strains to be collected from Africa, South America and elsewhere. Bovers et al., 2008 have established a link, using MLST and AFLP analysis, between VNB isolates and their AFLP group 1A, where their AFLP group 1B clusters with isolates of the molecular type VNII [14]. URA5-RFLP analysis is not able to make this differentiation, where both AFLP groups, AFLP1A and AFLP1B, are associated with VNII, which includes the VNII-1/VNB isolates. Perhaps more importantly, DNA sequence markers are more reliable and discriminatory than AFLP genotypes. However, this report has also demonstrated that additional markers as well as additional strains are needed to resolve the number of legitimate clades of both C. neoformans and C. gattii.
A group of strains (VGIV-1) with an unusual clustering was also found in the C. gattii clade. The VGIV-1 strains clustered between VGIII and VGIV. To date, all the strains of VGIV-1 have been isolated in South America. If this geographic origin is substantiated, they perhaps represent a recombining population. Similar to VNB strains in southern Africa [31], there is a large population of fertile MATa strains in South America [26,50].
Two of those unusual clustering C. gattii strains (LA 390 and LA 392, both from Mexico) are the first VGIV strains identified as being MATa. The mating type of the two strains has been identified by two independent mating type specific PCR, which amplified only the MFa pheromone gene and the SXI2a gene from both strains. The mating type was confirmed by mating experiments with the C. gattii, serotype C, MATa reference tester strain NIH312 [27] as well as with the MATa supermater strain JF101 [28] (Figure 2).
This study provided further information on the genetic diversity among members of the Cryptococcus species complex showing that it consists of at least seven monophyletic lineages, excluding the AD hybrid (VNIII) strains. The present study highly supports the two species concept recognized currently. The genetic variation found between the major molecular types of C. neoformans var. grubii is in the same range as the genetic variation found between C. gattii types. There are three monophyletic phylogenetic clades within the species C. neoformans two closely related lineages (VNI and VNII) observed in C. neoformans var. grubii and a basal lineage for C. neoformans var. neoformans (VNIV). The four molecular types of C. gattii (VGI, VGII, VGIII and VGIV) are highly supported by both maximum parsimony and Bayesian analyses. A similar topology was obtained previously investigating the intergenic spacer regions [12] and PRP8 inteins [13], with high support (over 80%) from parsimony and Neighbor Joining analyses, respectively. The molecular type VGII forms the basal clade within the C. gattii branch of the phylogenetic tree and is phylogenetically more distant than the other three sibling clades, VGI, VGIII and VGIV. This result correlates with the finding of a specific mating characteristic for the VGII Vancouver Island outbreak isolates, which produced a basidium on stunted filaments close to the surface of the colonies when mated with other VGII strains [28]. The same morphology was also obtained by matings between other pairs of VGII isolates from different parts of the world [51]. This study revealed a level of genetic variability among the different molecular types/monophyletic lineages, which are comparable to, or in fact slightly greater than that found for C. neoformans var. grubii and C. neoformans var. neoformans.
The molecular clock analysis of the genetic diversity among the seven haploid major molecular lineages support the phylogenetic species concept, in which strains that form consistently the same monophyletic groups should be considered to be independent species [52,53]. Especially the molecular types within C. gattii should be considered as varieties. However, the occasional hybrid strains that form between either of the two currently recognized species ( [16], Trilles and Meyer pers. communication) or between the different monophyletic groups [54,55] question the biological species concept [56], which defines a species as consisting of members that are able to mate and undergo sexual recombination. Therefore the question: whether there are more than two species within the Cryptococcus species complex or whether the molecular types/monophyletic lineages within each species deserve varietal status, remains open. A combination between the six recently published loci [14] and the four reported here would be desirable together with further morphological and mating studies to draw final conclusions.

Studied strains
Seventy three strains composed of at least ten representative strains of each haploid molecular type of the Cryptococcus species complex from different parts of the world, and the outgroup species Cryptococcus albidus were retrieved from the Molecular Mycology Research Laboratory culture collection at Westmead Hospital, University of Sydney, Westmead, NSW, Australia ( Table 1). The four VNB strains were retrieved from the Culture Collection of the Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, NC, USA. The hybrid molecular type, VNIII (serotype AD), was excluded from the study due to ambiguity of phylogenetic Figure 7. Genealogy of the combined dataset (PLB1, URA5, GPD1, IGS1) showing that the VNII-1 group isolates cluster with the VNB isolates previously described [15]. doi:10.1371/journal.pone.0005862.g007 resolution in terms of variation in numbers of copies of each locus, and deserves to be studied separately [42,49]. The strains were grown on Saboraud Dextose Agar (2% glucose, 2% peptone and 2% agar) for 72 hr before extracting DNA.

DNA extraction
DNA extractions were performed by the liquid nitrogen grinding method as previously described [9]. Genomic DNA of the other outgroup Filobasidiella depauperata was kindly provided by Ferry Hagen (CBS -Fungal Biodiversity Centre, Utrecht, the Netherlands).

Serotyping
The serotypes of the studied isolates were determined by the slide agglutination test using the Crypto Check Kit according to the manufactures instructions (Iatron Labs. Tokyo, Japan).

Identification of the molecular type
URA5-RFLP analysis using the enzymes HhaI and Sau96I was performed initially to verify the molecular type of each studied strain as previously described [9].

Determination of the mating type by PCR
Two pairs of mating-type specific primers were used. MFaU and MFaL primers identified the a-mating type (MATa) [57]. MFa2U and MFa2L recognized the a-mating type (MATa) [28] ( Table 9). The MATa strains of VGIII and VGIV were confirmed by sequencing the mating type specific amplicons and comparing them with the appropriate homologous sequences in GenBank uisng BlastN searches. MATa of VGIV were additionally verified using a second set of mating type specific primers, SXI1aF and SXI1aR identifying MATa and SXI2aF and SXI2aR recognizing MATa [44] ( Table 9). The used PCR conditions have been published previously [28,44,57].

Mating
The fertility of MATa VGIV strains was investigated by mating them with the following MATa and MATa reference tester strains: C. gattii clinical, serotype C, strains NIH312 (MATa) [27] and B4546 (MATa) [29], and crg1a mutant derivatives (''supermater'' tester strains), JF101 (MATa) and JF109 (MATa) [28]. The MATa VGIV strains (LA390 and LA392) reported herein were cocultured with each reference tester strain. Cells from each mating partner were mixed on V8 juice agar (5% [vol/vol], 3 mM KH 2 PO 4 , 4% [wt/vol] agar, adjusted to pH 5 [58]) and incubated in darkness for up to four weeks. Sexual reproduction was confirmed by the presence of chains of basidiospores and fused clamp connections. All strains were individually incubated on V8 juice agar as described above to differentiate haploid fruiting from mating.

Chromosomal location of the gene loci studied
Four independent genetic loci were used in the multigene sequence analysis. The chromosomal location for each of those genes was determined via BlastN searches against the genome of the serotype D strains JEC21 [59] in GenBank. The ACT1 locus is located on chromosome 1, the PLB1 locus is located on chromosome 13, the URA5 locus is located on chromosome 7 and the IDE locus is located on chromosome 12.

Gene amplification and sequencing
The ACT1 [17], URA5 [3], PLB1 [18] and IDE [19,22] genes were amplified from each strain using specific primers (see Table 9). Each PCR contained 50 ng of genomic DNA, 50 ng of each primers, 0.2 mM dNTP, 3 mM MgCl 2 and 0.5 unit of either AmpliTaqH (Applied biosystems, CA, USA), BioTaqH (Bioline, NSW, Australia) or ExTaqH (Takara, Shiga, Japan) DNA polymerase with 1X of compatible buffer in a total volume of 50 ml. PCR conditions for the ACT1 gene amplification were as follows: 3 min of initial denaturation at 94uC, followed by 35 cycles of 45 seconds at 94uC, 45 seconds at 56uC, 1 min at 72uC and terminating with 7 min of final extension at 72uC. Amplification conditions of the other genes were similar to the protocol for ACT1, but the annealing temperatures and extension times differed, as follows: URA5 62uC and 2 min; PLB1 for strains of VNI, VNII and VNIV 60uC and 2 min; PLB1 for strains of VGI, VGII, VGIII and VGIV 58uC and 2 min; and IDE 62uC and 1 min. ACT1 of F. depauperata was amplified directly using the primers CNACT1 and CNACT1R whereas and the ACT1 of C. albidus was amplified using the primers ACT1CAF1 and ACT1CAR1 ( Table 9). URA5 of F. depauperata was amplified using the primers URA5DF and SJ101 (Table 9). To resolve nonspecific bands generated by several URA5 amplifications of F. depauperata we cloned the amplicons using the pGEMH-T Easy Vector System I according to the manufacturer's protocol (PromegaH, NSW, Australia). PCR amplicons and the URA5 plasmid were purified and sequenced by the ABI Big dye Terminator method (Macrogen Inc., Korea) using the amplification and additional internal primers (Table 9). Sequences were assembled using either the program Sequencher version 4.6 (Gene Codes, MI, USA) or Bioedit version 7.0.5.3 [60]. Intron and exon positions were determined by aligning the sequences with the reference sequences of each gene as follows: ACT1 (GenBank Accession No. U10867), URA5 (GenBank Accession No. AF032436), PLB1 (GenBank Accession No. AF223383), IDE (GenBank Accession No. XM568105). Sequences including and excluding introns were aligned with Bioedit version 7.0.5.3 [60] using Clustal W [61]. Sequences were deposited in Genbank (see below). Combined datasets were created by a combination of the ACT1, URA5, PLB1 and IDE sequences. Sequences from the closet siblings of the pathogenic Cryptococcus species complex: F. depauperata and C. albidus were used as outgroups [1,62].

Phylogenetic analyses
The phylogenetic relationships within the Cryptococcus species complex were inferred by parsimony and likelihood methods. Parsimony analysis was conducted with the program PAUP* 4.0b10, using the heuristic search option [63]. For the maximum parsimony analysis starting trees were obtained by stepwise addition with 100 random sequence additions. Tree bisectionreconnection (TBR) was used for branch-swapping. Maximum parsimony phylograms were generated for each of the four loci, as well as the combined datasets, which included every locus of all the isolates. Bootstrap analysis using 500 heuristic replicates was used to estimate support for the clades of each locus with MaxTree set to 100. Gaps in the sequences were treated as missing data and all characters were equally weighted.
Bayesian phylogenetic analysis was used to estimate the probability of the taxonomic structure given the data at each individual locus, as well as the combined data from all four loci using MrBayes 3.1.2 [64]. First, the model of nucleotide substitution that best fit the data of each locus was determined by a likelihood ratio test using the program PAUP* 4.0b10. The likelihood score files, created in PAUP, were then analyzed with the program ModelTest 3.7 on the ModelTest server 1.0 (http:// darwin.uvigo.es/software/modeltest_server.html) using the Akaike information criterion (AIC) [65]. In the analysis of the combined loci, parameter estimates of each locus were unlinked, allowing independent substitution models for each locus. Two analyses were performed simultaneously and used to calculate the posterior probabilities, as estimated from uniform priors, of the clades of each locus and the combined loci. Each analysis included four simultaneous and incrementally heated Markov chains; each replicate used default heating values. Markov chains were initiated from a random tree and were run for 1,000,000 generations. Samples were taken every 100 th generation. Standard deviations of split frequencies of the two runs were monitored until they converged. The last 5,000 samples of each analysis containing the standard deviations $0.02 were used to generate consensus trees using PAUP under the 50% majority rule. The clade posterior probabilities of each clade and the overall topology of each replicate were compared to verify that each consensus tree converged on a similar phylogeny.

Sequence similarity determination
The sequence similarity matrix among the major haploid molecular type clades was created by the program MEGA version 4.1 [66] based on the intron-included dataset. P-distances were generated and converted to similarity percentage using Microsoft Excel program. Standard errors were computed using 500 bootstrap replicates.

Combinability assessment
Prior to combined analyses the combinability of the data were explored using incongruence length difference/partition homogeneity test (ILD/PHT) [67,68] implanted in PAUP. To avoid detecting incongruence that is expected within lineages (see below), IDL/PHT was restricted to datasets containing only single reference strains representing the major haploid molecular types and the major lineages obtained by the phylogenetic analyses of the Cryptococcus species complex (strains: ATCC90112, WM148,  M27053, WM626, HamdenC3-1, RV58146, WM629, JEC21,  WM179, MCS022, WM178, RB52, WM175, LA644, WM779,  M27055, LA390 and LA568). The ILD/PHT analysis used only informative characters and entailed a random stepwise-addition maximum parsimony heuristic searches with 10000 replicates (TBR; maxtrees = 500). Since a significance threshold of 0.05 forces the ILD/PHT computation to be too conservative [30], the null hypothesis of congruence was rejected only if P,0.0001.

Recombination assessment
To determine the extent of recombination, if any, within each molecular type, we implemented three complementary tests: the ILD/PHT test, the Index of Association (I A ) and the phylogenetic incompatibility.
In the first test, we used the ILD/PHT with the objective to find signals of recombination in our dataset. In the absence of recombination, all the loci should provide the same phylogenetic result and therefore all the gene trees should be compatible and alleles at different regions should be associated. In contrast, in a recombining population each locus is expected to have a unique evolutionary history resulting in a different phylogeny for each locus thus this phylogenies are incongruent and incompatible. This test was conducted using PAUP including all the isolates under the same conditions explained previously.
In the second test we used the I A , that is the most common measure of multi-locus linkage. The I A uses an estimate of multilocus association within a haploid population through assessment of the number of loci that are different when individuals are compared. The expected difference between a recombinant population and a clonal one is that a clonal population will have a skewed distribution with a statistically significant excess of extreme distances, whereas a recombinant population will have a genetic distance with a normal distribution [47,69]. In view of the fact that the number of loci analyzed can influence the expected value of the I A we complemented these analysis using the statistic rBarD that is a modification of I A , which removes this dependency on number of loci. For I A and rBarD analysis, we used the program MultiLocus 1.3 [70] (http://www.agapow.net/software/ multilocus/) using 1000 artificially recombining datasets. The null hypothesis of recombination is rejected if the I A or rBarD of the observed dataset is significantly (P,0.05) different than the range of values produced from the artificially recombined datasets.
In the third test, we calculated the proportion of pairwise loci that were phylogenetically incompatible. Two loci are compatible if all the observed genotypes are explainable by mutation rather than recombination or homoplasy [71]. For example, if there are two loci with two alleles each, the two loci are incompatible if all four possible genotypes can be found in the population, showing a possible genetic exchange between strains. This test was performed using the program Multilocus 1.3 using 1000 artificially recombining datasets. P values of less than 0.05 indicate that the hypothesis of random recombination should be rejected.

Genetic variability analysis
Genetic variations of each locus were determined by using the program PAUP* 4.0b10. The variation among all sequences, as well as the parsimoniously informative characters, was calculated from each reference molecular type strain and from all cryptococcal strains.
Numbers of polymorphic characters of the combined loci from the intron-excluded dataset of each clade were compared using the Chi-Square test. The statistical analysis was performed with the program SPSS 15.0.0 (LEAD Technologies, Inc., IL, USA) and significance was defined when P value,0.05.

Estimation of genetic divergence among lineages
Genetic divergence among each lineages was estimated as previously describe [11] with modification. A maximum-likelihood tree for each gene was created based on the selected evolutionary models with and without molecular clock and the Likelihood Ratio Test statistic was calculated. Chi-square test was performed to estimate whether the four genes evolved according to the molecular clock model (P value.0.05). Distances according to the selected evolutionary model for each gene were calculated from the maximum-likelihood tree [63]. Estimates of the time of divergence and hybridization assumed the consensus mutation rate of 2610 29 per nucleotide per year for protein coding genes [11,72,73]. The distance was calculated between the most recent nodes of the hybrid to the tip of the phylogeny. Weighted average and standard deviation of the distance were calculated using the program SPSS 15.0.0 (LEAD Technologies, Inc., IL, USA). Among the four genes, the estimates of age show that the earliest hybridization event in each isolate is the maximum age of hybridization. The 95% confidence intervals were calculated using the program Excel (Microsoft Corporation, USA).

Investigation of the relationship of VNII-1 group and the VNB molecular type
A recent study in Botswana has proposed a new molecular type, VNB for the Cryptococcus species complex [15]. We determined whether our VNII-1 isolates are in fact resided in this new molecular type. Parsimony and Bayesian analysis were performed as described above, using a combined dataset of PLB1, URA5, GPD1, and IGS1 with representatives of each molecular type including: VNI: WM148, H99, WM721, RV59369; VNII: WM626, JG02; VNII-1: M27053, HamdenC3-1, LA511; VNB: bt1, bt31, bt131, bt22; VNIV: JEC21 and the standard molecular type strains of C. gattii. Bayesian analysis was performed based on HKY evolutionary model with the gamma distribution according to the program ModelTest 3.7 using the combined dataset obtained as described above. The combinability as tested as described above. Sequences of PLB1, GPD1 and IGS1 of the VNB strains were obtained from the online database (http://www.mlst. net/). Gene sequences of strain H99 were obtained from the genome project database (http://www.broad.mit.edu/) as well as GPD1 and IGS1 sequences of JEC21 (http://www.ncbi.nlm.nih. gov/). URA5 of the VNB strains were amplified and sequenced as described above. GPD1 and IGS1 of the other strains were amplified using the primers GPD1F/GPD1R [68] and IGSF/ IGSR [15] under the following conditions: 3 min of initial denaturation at 94uC, followed by 35 cycles of 45 seconds at 94uC, 45 seconds at 63uC (GPD1) or 60uC (IGS1), 1 min at 72uC and lastly, 7 min of final extension at 72uC.