Insights into the Life Cycle of Yeasts from the CTG Clade Revealed by the Analysis of the Millerozyma (Pichia) farinosa Species Complex

Among ascomycetous yeasts, the CTG clade is so-called because its constituent species translate CTG as serine instead of leucine. Though the biology of certain pathogenic species such as Candida albicans has been much studied, little is known about the life cycles of non-pathogen species of the CTG clade. Taking advantage of the recently obtained sequence of the biotechnological Millerozyma (Pichiasorbitophila) farinosa strain CBS 7064, we used MLST to better define phylogenic relationships between most of the Millerozyma farinosa strains available in public collections. This led to the constitution of four phylogenetic clades diverging from 8% to 15% at the DNA level and possibly constituting a species complex (M. farinosa) and to the proposal of two new species:Millerozyma miso sp. nov. CBS 2004T ( = CLIB 1230T) and Candida pseudofarinosa sp. nov.NCYC 386T( = CLIB 1231T).Further analysis showed that M. farinosa isolates exist as haploid and inter-clade hybrids. Despite the sequence divergence between the clades, secondary contacts after reproductive isolation were evidenced, as revealed by both introgression and mitochondria transfer between clades. We also showed that the inter-clade hybrids do sporulate to generate mainly viable vegetative diploid spores that are not the result of meiosis, and very rarely aneuploid spores possibly through the loss of heterozygosity during sporulation. Taken together, these results show that in this part of the CTG clade, non-Mendelian genetic exchanges occur at high rates through hybridization between divergent strainsfrom distinct clades and subsequent massive loss of heterozygosity. This combination of mechanisms could constitute an alternative sexuality leading to an unsuspected biodiversity.


Introduction
The life cycle of Saccharomyces species is well known. They propagate as haploid or diploids according to the environment, mate, sporulate and undergo meiosis to yield haploid spores [1]. Saccharomyces species are also known for forming inter-specific hybrids under the pressure of industrial fermentation conditions [2]. This genus is nevertheless far from being representative of the ascomycetous yeasts. Some Candida sp. pathogens belonging to the CTG clade were shown to have various life cycles. Candida albicans is a diploid that does mate and shows parasexuality, but does not perform sporulation or meiosis [3,4]. Clavispora lusitaniae, on the other hand, can mate and sporulate. Its progeny is complex and comprises haploids resulting from meiosis, diploids and aneuploids [5]. Candida tropicalis was also recently shown to display parasexuality [6], suggesting that yeast pathogens are sexual species.
However, little is known about non-pathogenic species of this clade like the most studied Debaryomyces hansenii. This species is thought to have a rather clonal reproduction through fusion between mother and daughter cells followed by meiosis, whilemating between individual cells israrely observed [7]. Another CTG species, Pichia farinosa (Lindner) E.C. Hansen var. farinosa (teleomorph of Candida cacaoi) is a ubiquitous, halotolerant yeast found mainly in food (alcoholic beverages like beer and sake, soy sauce, miso, mash of rice vinegar etc.) but alsoon various substrates (cows with mastitis, sputum, giraffedung, laboratory media and highly concentrated sorbitol solutions) [8]. A reassessment of the phylogeny of Q9 enzyme-producing species changed the name of P. farinosa to Millerozyma farinosa [9]. Increased numbers of M. farinosa and other yeasts, mainly Candida albicans, were found in patients undergoing radiotherapy for cancer of the mouth [10] and an M. farinosa strain was found as an oral cavity colonizer [11]. Two further cases of human infections associated to M. farinosa were also described [12,13].
In 1980, a related species, Pichia sorbitophila, was described [14]. The only member of this species, CBS 7064, is able to survive extremely high extracellular concentrations of salts (e.g., saturated solution of KCl) and other osmolytes (70% glucitol), although it has not been classified as halophilic or osmophilic [15,16]. Both M. farinosa and P. sorbitophila can produce high yields of glycerol and xylitol, which makes them valuable in biotechnology. The placementof some strains in the M. farinosa species is a matter of debate [8]. However, the main controversy has concerned the taxonomy of M. farinosaand P. sorbitophila. On the basis of DNA relatedness, Debaryomyces halotolerans, P. sorbitophila, and Pichia petrophilum were considered to be conspecific with M. farinosa [17], whereas physiological properties and sequence analysis indicated they should be classified as separate species,in agreement with the divergent sequences of their respective mitochondrial DNAs [18,19]. M. farinosa CBS 7064 carries two divergent URA3 genes, which were also different from that of the M. farinosa type strain [20]. The presence of the two divergent URA3 gene copies and the higher number of chromosomes in pulse field gel electrophoresis was in agreement with CBS 7064 being diploid.
From therecently sequenced CBS 7064 genome [21], it appears thatthis strain is a recent hybrid between two distant parents: the two copies of their homeologous chromosomes present more than 15% divergence at the DNA level. The parental contributorsremain unknown, but are certainly distinct from the M. farinosa type strain CBS 185 T . CBS 7064 contains sevenpairs of chromosome. In three ofthese both parents contribute one chromosome; in two the chromosomes contain regions that are identical and regions that are diverged;two consist of identical chromosomes, indicative of massive loss of heterozygosity (LOH) [21]. Our analysis of the respective contribution of each parent to the hybrid reveals the major genetic changes that occurredduring the time that separated the last common ancestor of both parents and the hybridization event. These include one reciprocal translocation between two chromosomes, a number of gene losses and unequal mitochondrial DNA insertions (NUMT) in the nuclear genomes of the parents.
In order to define the parental contributors of CBS 7064, to resolve the taxonomy of the M. farinosa species and to gain insight into the life cycle of CTG species, we took advantage of the available complete genome sequence of the CBS 7064 strain [21] to analyzealmost all the M. farinosa strains available in public collections. In this paperwe use MLST to constitute phylogenetic groups. We demonstrate reproductive isolation of these groups. We show that M. farinosa is a species complex made of distinct taxa, and in which isolates exist as haploid organisms and as hybrids resulting from crosses between strains from these taxa. We also show that the hybrids can sporulate to generate viable diploid spores that are not the result of meiosis. In this species complex, an alternative sexuality allows the formation of hybrids subjected to LOH and of rare aneuploids.

Structuring the Millerozyma farinosa populations in five clades
In a previous study, we have observed that Debaryomyces hansenii, a closely related species to M. farinose [22], consisted of distinct clades with haploid and diploid strains, the latter being hybrids from diverged strains belonging to these distinct clades [23]. We also demonstrated the efficacy of housekeeping gene intron sequence analysis in distinguishing between clades of D. hansenii. The most informative intron sequences were those of the ACT1 and RPL33 genes. In order to analyze the M. farinosa species, we therefore amplified and sequenced the introns of these two genes in 41 M. farinosa strains (Table 1). In a number of cases, PCR products were a mixture of two species which needed to be cloned before sequencing. In these cases, sequences were arbitrarily given the prefix A and B. We compared the obtained sequences and constructed gene trees using the neighbor-joining algorithm. Identical topologies were obtained when the trees were constructed with maximum-likelihood program (data not shown). The ACT1 gene intron tree is shown in Figure 1 and the RPL33 gene intron tree is shown in Figure S1. This analysis, both with ACT1 and RPL33, separated the sequences into the same five clades. Some strains displayed two different ACT1 alleles and one RPL33 allele or vice-versa, consistent with LOH. Using flow cytometry, we confirmed that the strains presenting ACT1 and/orRPL33 heterozygous alleles had a 2n DNA content. Table 2 gives the estimated genome size of all the strains presenting some heterozygosityfor ACT 1 and/or RPL33 together with ahaploid strain from each clade as a control. For CBS 185 T and the haploidstrain of each clade, the estimated size of the genome varied from 9.2 to 10.8 Mb, similar to that of some species in theSaccharomycetaceae [24], but smaller than that of the closely related species D. hansenii (13.2 Mb) [25,26]. The estimated values for the strains presenting sequence heterozygosityvaried between 18 and 22.9 Mb, approximatelytwice the value of CBS 185 T . An example in Figure S2 shows a shift of the fluorescence intensity peaks in the strains CBS 7064 and CBS 5730 compared to CBS 185 T and CBS 2001. These observations strongly suggest that the strains displaying heterozygosity result from crosses between strains belonging to different clades, as already observed for D. hansenii [23].
The first clade (ascribed the number 1) was defined by the sequence of the ACT1 alleles of 10 strains including the type strain CBS 185 T . As mentioned above, the results from the analysis of the RPL33 alleles gave essentially the same clades and are not further discussed here. Clade 2 contained 18sequencesincluding one of theACT1 alleles of CBS 7064. Sequence divergence in the ACT1 intron between these two clades was 8.5% (Table 3). We note that the five strains MUCL 38866, MUCL 38867, MUCL 38869, MUCL 38870 and MUCL 38871 all belong to clade 2 and have identical intronsequences. Since they were all isolated from the same substrate in Indonesia, they may well constitute asingle clone. The clade 3 is comprised of 5 isolates. Only one of these strains, CBS 2007, carried a unique copy of ACT1 and of RPL33; the four other strains, CBS 2006, CBS 4073, CBS 4710 and VTT 86125, each carried two alleles of ACT1, one grouping with clade 3 and the other with clade 2. The average sequence divergence in the ACT1 intronbetween clade 1 and clade 3 was 11.5% (Table 3). The fourth clade contains only one sequence, being that of the second ACT1 allele of M. farinosa CBS 7064, withasequencehighly divergent (13.7%) from that of clade 1 (Table 3). Finally, clade 5 consisted of the sequences of four strains that showed 14.5% divergence fromclade 1 ACT1 intron sequences and an even higher sequence divergence -above 15% with that of the other clades. Because intronic sequences could not be obtained for anumber of strains, weused coding sequences to place these strainsinto the defined clades (Table 1 and Table S2).
As seen above, sequence divergence between the clades varied from 8.5 to 19.5% (Table 3), indicating that these clades are clearly distinct. We defined a typical strain for each clade as being, among haploids, the first strain described (see the ACT1 gene intron tree in Figure 1). The topology of the RPL33 gene intron tree ( Figure S1) was very slightly different from that of the ACT1 gene tree. Comparison of the two gene trees indicated that the clades were composed of the same strains, suggesting the absence of marker exchange between clades, ACT1 and RPL33 being located on different chromosomes ( [21]; our unpublished data). Interestingly, one of the two RPL33 alleles of strain CBS 8045 (the defining and unique strain of clade 4) was different from all sequenced RPL33 alleles; the other allele belongs to clade 2 as does the unique ACT1 allele of this hybrid. Since CBS 7064 was homozygous for clade 2 RPL33 allele ( [21]; our data), no ''clade 4 RPL33 sequence'' was available. To know whether CBS 8045 was also, like CBS 7064, a clade 2/clade 4 hybrid and effectively carried clade 4 sequences, three markers AB-r, CD-l and IJ-r(see Table S3 and Figure S3) known to existin two heterozygous copies in the genome of CBS 7064 were sequenced in CBS 8045.For each locus we found identical sequences in CBS 7064 and in CBS 8045, indicating that CBS 8045 was indeed another strain carrying both clade 2 and clade 4 sequences. Interestingly, both strains had been isolated as contaminants from highly concentrated sorbitol solutions [27].
Finally, two strains, CECT 10348 and NCYC 386 T , previously assigned to the M. farinosa species did not fall in any of our five clades. CECT 10348 was unrelated to our test group and belonged in fact on the basis of rDNA D1/D2 sequence comparison to a species of the recently described genus Sugiyamaella [28]. The strain NCYC 386 T , though not within the M. farinosa complex was the closest on the basis of the ACT1 coding sequence. Nevertheless, its ACT1 intron does not show any similarity with that of any strain from the five clades of M. farinosa (see later).
complex distribution of alleles. The strain CECT 10348 was not related to M. farinosa, but rather to the type strain NRRL YB-2798T of Sugiyamaella sp. [  Origin of the CBS 7064chromosomes in relation with the M. farinosa clades Using two housekeeping genes ACT1 and RPL33, we were able to define five distinct clades in the M. farinosa species (see above). This MLST analysisshowed unambiguously that strains belonging to clade 2 and clade 4 contributed to the hybrid CBS 7064. In the description of the genome of CBS 7064, a distinction was made between the two parental contributing genomes (designated Pc and Pe?) on the basis of the unequal distribution of NUMT insertions and on the G+C percentage of each subgenome [21].The CBS 7064 genome contains seven pairs of chromosomes: three are heterozygous, two are homozygous and two are heterozygous along a proportion of their length, indicative of a massive loss of heterozygosity ( [21], see Figure S3).Having defined the parental clades of CBS 7064 will allow us to determine which part of the CBS 7064 genome was contributed by which clade.In addition, it was shown that synteny was conserved between the two contributors of CBS 7064, except for a single reciprocal translocation between two chromosomes of one the CBS 7064 contributor [21]. Using the genome sequence of CBS 7064, it was therefore interesting to perform a wider MLST analysis on the typical, haploid strains of the four clades (CBS 185 T , 2001, 2007 and 2004 T from clades 1, 2, 3 and 5, respectively), using 16 markers located at the extremities of each of the chromosomes of CBS 7064 to define the origin of each of the chromosomes of CBS 7064 and to better define each of M. farinosa clades.
MLST analysis was obtained for one gene on each extremity of the seven chromosome pairs and on each side of the translocation breakpoints between chromosomes E/F and I/J ( Figure S3 and Table S3) i.e.a clade 2 strain is a parental contributor of CBS 7064,whileneither clade 1, clade 3 nor clade 5 strains contributed toCBS 7064.Interestingly, none of the strains tested gave 100% identity with the right end markers of chromosomes A and B of CBS 7064 ( Figure S3), indicating that the right arm of chromosome A, which is homozygous to the right arm of chromosome B due to LOH [21], corresponds to an as-yet undescribed taxon. Here, we were able to confirm the distinction foundbetween the Pc and the Pesubgenomes [21]and to establish unequivocally the source of the parental subgenomes; a clade 2 strain is the contributor of the Pc subgenome, whereas the Pesubgenome comes from an as-yet undescribed taxon, which has no haploid representative in our M. farinose collection and which we have characterized as clade 4 ( Figure 1).
In addition, pair-wise comparisons of the marker sequences in the four typical strains gave identities of 78 to 96%, further supporting the existence of distinct clades within M. farinosa based on the 16 chromosomal markers (Table S4). The existence of four divergent sequences for each marker of the typical haploid strains CBS 185 T , 2001, 2007 and 2004 T also indicated thatneither genetic exchange due to hybridization followed by LOH, nor introgression of the tested markers between strains had occurred at the chromosome extremities in representatives of clades 1, 2, 3 and 5.

Sporulation of the hybrids and characterization of progeny
As some isolates of M. farinosa were found to be inter-clade hybrids, it was interesting to examine their ability to sporulate and if they did, the content of the resulting spores. In the genus Saccharomycesinter-specific hybrids have been shown to sporulate but to produce unviable spores. In the CTG clade few studies exist on sporulation of pathogen species [5] and none on non-pathogen species, except for CBS 7064, which was described as a sporulating isolate [14].
All the M. farinosa inter-clade hybridstrains and a number of haploids listed in Table 1 were allowed to sporulate.All the hybrids sporulated well,exceptfor CBS 4073andhybrid CBS 5949 which did not grow on the sporulation medium used. On the other hand, some of the haploid strains showed little sporulation, if at all; in particular, the typical strains of clade 2 and 3, CBS 2001 and CBS 2007 respectively, did not sporulate, unlike the type strain CBS 185 T .  The progeny of the clade 2/clade 4 hybrid CBS7064 and of the clade 2/clade 3 hybrid CBS 2006 were analyzed.Only four-spore asci were considered ( Figure S4A and S4B). Asci were microdissected and spores were germinated. A total of 52 spores of CBS 7064 out of 163 grew on YPD (32% spore viability). This result seemed surprisingly high for progeny coming from a hybrid of highly diverged parents(15.3% between the two CBS 7064 subgenomes) [21]. Indeed, in Saccharomyces, the hybrids with over 10% divergence do not produce viable spores [29]. Using a heterozygous NUMT insertion as marker, we showed that the 52 viable spores had all inherited both parental alleles, suggesting that meiosis had not taken place. Flow cytometry confirmed that 10 of these spores chosen at random had a 2n DNA content (data not shown).
Sporulation of strain CBS 2006 yielded 132 spores, out of which 103 grew on YPD (78% spore viability). A total of 70 spores were analyzedfurther. As the NUMT content of CBS 2006 is not known, we used SNP in genomic markers to analyze the progeny. We amplified and sequenced the AB-r and CD-l markers (which are locatedin CBS 7064 on the right endof the A/B chromosome pair and onthe left endof the C/D chromosome pair, respectively; Figure S3, Table S3). Two spores (designated G6 and 28) carried only one type of CD-l allele, that of clade 3. The rest of the spores carried both clade 2 and clade 3 CD-l alleles. All the spores carried both of the AB-r alleles. This suggested that all the spores except G6 and 28 were diploid, which was confirmed by flow cytometry on 16 spores. Interestingly, the DNA content of spores G6 and 28 were estimated at24.960.9 Mb and 13.160.2 Mb respectively, the parental CBS 2006 genome size amounting to 21.961.5 Mb (Table 2). This suggests a diploid state (,2.3n)for spore G6 and a haploid or aneuploid state (,1.23n)for spore 28.
The spores G6 and 28 were further analyzed using theset of 16 markers located on each extremityof all chromosome pairs. The results are summarized in Table 4. Despite a slightly higher number of heterozygous markers than CBS 7064 (9/13 vs. 7/13), the hybrid CBS 2006 harbored also some level of LOH. When compared to its parent CBS 2006, spore G6 contained the same four heterozygous regions at the left and right ends of chromosome pair A/B, left end of E/F and right end of K/L, but homozygous markers were found in the place of the five other heterozygous regions of CBS 2006 (Table 4). Taken together, these results indicate that this spore is diploid and has undergone LOH, possibly during sporulation. On the other hand, spore 28 only displayed a single heterozygous region on the left end of chromosome pair A/B. This spore has a close to 1.23n DNA content with an estimated genome size of 13.1 Mb, which is slightly over the values between 9.2 and 10.8 Mb measured for haploid M. farinosa strains. Therefore, spore 28 is probably aneuploid, because it contains both clade 2 and clade 3 AB-r markersbut is homozygous for the other markers and it may have undergone meiosis ( Table 4).

Analysis of mtDNA origin and NUMT insertions reveals secondary contacts after reproductive isolation
Within M. farinosa, we have described distinct clades whose sequence divergence is as high as that described for distinct species, as in the genus Saccharomycesfor instance. In addition, we have evidenced inter-clade hybridsthat sometimes yield aneuploids. This previously unobserved situation mimics horizontal gene transfer (HGT)and it was interesting to examine potential genetic exchanges in haploid strains. Mitochondria constitute good markers to follow genetic exchanges.
To address thequestion of the inheritance of mtDNA in the M.farinosa inter-clade hybrids, an MLST analysis of the COX2 and COX3 genes was first performed using sequence of the part of these two genes in the typical strains of each clade and in all the inter-clade hybrids. A schematic representation of the relationship between the mtDNA sequences in relation with the different clades is shown in Figure 2A. Clade 3 CBS 2007 and clade 5CBS 2004 T as expectedcarrieddivergentmtDNA alleles, while surprisinglystrains from clade 1 and clade 2 carried the same mtDNA alleles. All the tested hybrids,except CBS 7064 (clade 2/clade 4) and CBS 2167 (clade 3/clade 5), carried either the mtDNA alleles of clade 2 or that of clade 3, consistent with the idea that the hybridswere derived from crosses betweenthe clades. Our results show that, like in Saccharomyces [30]inter-specific hybrids [31] or Metschnikowia inter-specific hybrids [32], mtDNA can be transmitted by either of the parents in the M. farinosa species complex. This is in contrast with several reportson Saccharomycesfermentative hybrids involving S. cerevisiaeshowingthat mtDNA from the noncerevisiaeparental contributor wasspecifically favored [33].
Interestingly, the mtDNA carried CBS 7064 did not correspond to any of the alleles of clades 1, 2, 3 and 5. Comparison of the mitochondrial genomes of CBS 185 T and CBS 7064 had already shown that they differed [18,19]. From our results, we can deduce that mtDNA of CBS 7064 was transmitted by the parental strain for which we do not yet have a haploid representative strain.
The fact that both clade 1 CBS 185 T and clade 2 CBS 2001 typical strains carried the same mtDNA alleles prompted us to analyze the distal COX2 and COX3 geneson the mitochondrial genome of all haploid strains from clade 1 and 2 in order to try to identify the missing mtDNA type. As represented in Figure 2B, all the stains carried the same mtDNA alleles, except for clade 1 strain CBS 2020, which displayed threeSNP in COX2(0.5%) and five SNP in COX3(0.8%), demonstratingthat CBS 2020 carried another mtDNA type. These results suggest that mating events between clade 1 and clade 2 ancestors have occurred associated with unidirectional transfer of mtDNA. Unlike in Metschnikowia [32], we did not observe any recombination between the mtDNA from different clades. NUMTinsertions can be found in most yeast genomes [34].Using NUMT insertions that we have identified in the genome of CBS 7064 [21] as markers, we could go further into the analysis of the relationships between the M. farinosa strains and confirm the existence of genetic exchanges between clades 1 and 2. By monitoring the presence /absence of NUMT in PCR products, we analyzed nine heterozygous and two homozygous NUMT loci present in the CBS 7064 genome in the three haploid strains CBS 185 T , (clade 1), CBS 2001 (clade 2) and CBS 2007 (clade 3) as well as and two hybrids CBS 5730 (clade 2/clade 3) and CBS 8045 (clade 2/clade 4). Most insertions were absent in the typical strains from clades 1, 2 and 3, consistent with their presence on chromosomes E, J and M that are characteristic of the clade 4 parent of CBS 7064 (Table 5). Conversely, the sole NUMT insertion of chromosome I of CBS 7064, contributed by clade 2 parent, was found in the clade 2 strain CBS 2001 and also in clade 1 strain CBS 185 T .
During the analysis of a NUMT locus on chromosome E (NUMT s9) [21], we discovered that the sequence of the heterozygous allele on chromosome I of CBS 7064 was identical to that of clade 1 CBS 185 T and displayed 9% divergence with the sequence of the parentCBS 2001(clade 2), which contributed the rest of the chromosome. Moreover,the clade 2/clade 4 hybrid CBS 8045 (see above) also displayed the clade 1 sequence for this locus. These observations suggest that some genetic exchange has occurred between clades 1 and 2 after reproductive isolation, in agreement with the presence of an identical mtDNA in most strains of clade 1 and clade 2.

Phylogeny of the M. farinosa complex in the CTG clade and taxonomic considerations
The M. farinosaspecies was shown to be made of related, but distant clades, and it was therefore necessary to establish a phylogenetic analysis of the species and clades constituting the genus Millerozyma based on a reliable multigenic analysis. Kurtzman et al [35]has circumscribed the genus Pichia, which now contains the type strain of the genus, Pichia membranifaciens, and various species from other previously described genera such as Issatchenkia. Kurtzman and Suzuki [9] have renamed the genus to which P. farinosa belongs as Millerozyma. We generated a phylogeny with the typical strain of each of the four clades, the clade 4 (the Pe subgenome) parent of CBS 7064, NCYC 386 T ,and the type strains of the previously decribed Millerozymaspecies using the RPB1, RPB2 and ACT1 exon2 sequences ( Figure 3). All of the M. farinosa typical strains formed a monophyletic group with NCYC 386 T having a basal position. We propose that the four closely related clades 1 to 4 that we have defined in this study constitute the species M. farinosa. NCYC 386 T and CBS 2004 T clearly defined two distinct taxa; the high sequence divergence that these two strains display from the other strains of the complex leads us to propose that they can be considered as a new species. We therefore propose to reinstate the species Zygopichia miso (type strain CBS 2004 T = CLIB 1230 T ) previously used for CBS 2004 T by Mogi in 1942 (but which had not been described) under the name Millerozyma miso and we would like to define a new species Candida pseudofarinosa (type strain NCYC 386 T = CLIB 1231 T ).
Typus stirps NCYC 386 T ( = CLIB 1231 T ) Description of Candida pseudofarinosa Mallet, Weiss, Jacques &Casaregolasp. nov After 6 days at 28uC on YPD, cells are elongated and mycelial pseudohyphae are formed ( Figure 4C). Budding is multipolar. After two days on YPD agar the streak is butyrous, wrinkled, moderately raised and white-colored. Pseudomycelium was observed on culture corn meal agar. Formation of ascospores was not observed.
The type strain, NCYC 386 T ( = CLIB 1231 T ), was isolated in Japan and was deposited at National Collection Yeast Culture by IFO collection during December 1953.

Discussion
Species whose taxonomy is debatable may reflect the possible existence of a species complex containing cryptic species. By evaluating the sequence divergence in regions little subjected to selective pressure, we have shown that M. farinosa was composed of at least four diverged taxa. In addition, two taxa that presented high sequence divergence were considered as new species: Candida pseudofarinosa and Millerozyma miso. We have previously shown that another CTG species, D. hansenii, whose taxonomy has been discussed over the years [36,37] is a species complex with at least three distinct taxa [23]. Similarly, a third CTG species, Candida parapsilosis was also shown to be a complex made of three species [38].
Since the discovery of the inter-specific hybrid lager yeast S. pastorianus strains [39], a large number of hybrids have been detected in food [40] and in fermentation environments such as cider making [41,42], wine making [43], and brewing [44]. As in the Saccharomyces, in which hybrids provide new technological potentialitiessuch as specific fermentations [45], hybrids of the CTG clade may form in constrained environmentsand generate new (and beneficial) properties to the cells. In this context, it isperhaps relevant that CBS 7064 and CBS 8045, the sole clade 2/ clade 4 hybrids, were both isolated in very high sugar concentrations; the association of genetic material from these two clades may increase resistance to osmotic pressure. However, our test in the laboratory for the growth in high salt concentration (4 M NaCl) did not differentiate these two strains from the other M. farinosa isolates (data not shown), in contrast to what was described by [15]; thus new phenotypes associated with hybrid formation in M. farinosa remains to be found.
In the M. farinosa hybrids, sequence divergence in the ACT1 intron varied between 8.5 and 14.5%. The overall average genomic divergence between the contributors of CBS 7064 is 15.3% [21], consistent with the value of 15% for the introns of ACT1 and RPL33. Introns are therefore good indicators of overall sequence divergence in yeast genomes, and are superior to sequences like the rDNA D1/D2 [8] or the ACT1coding gene [46].
In the M. farinosahybrids, the overall divergence may be reduced by LOH. Indeed, massive LOH was observed in M. farinosa CBS 7064 to the point where some of the chromosome pairs became entirely homozygous [21]. We confirmed that LOH also occurred at high frequency in other M. farinosa hybrids.The distribution of markers in the diploid spore G6showed that the genetic information from one of the parents became predominant. This processseems to vary according to the strain analyzed, as seen from the comparison betweenCBS 7064and spore G6 of CBS 2006. In our study of LOH in a large number of diploids inD. hansenii,we observed a tendency, which favored only one parental contributor [47]. The present study demonstrates that M. farinosa shares common featureswith other described CTG species such as C. albicans [48], Candidadubliniensis [49], and D. hansenii, for the existence of hybrids and LOH. In addition, natural populations of both haploids and hybrids exist in M. farinosa and D. hansenii.
In the case of hybrids, LOH would certainly alleviate the heterozygosity-driven inhibition of meiotic recombination. Liti et al [29] have observed a threshold of sequence divergence (around 5%), above which the yield of viable spores in S. cerevisiaeis greatly diminished. Sequence divergence(as low as 3.6% in populations of S. paradoxus) was shown to drastically lower spore viability [50]. The sequence divergence between the M. farinosa clades (from 8.5 to 14%) is well above these thresholds and would almost certainly prevent recombination at meiosis if meiosis occurred. Consistent with this observation, our study has shown that there is no shuffling of alleles between analyzed strains using two markers. This suggests that M. farinosa has mainly a clonal propagation. Hence, it is possible that the concept of species defined by fecundity as described by Liti et al [29] for the Saccharomyces does not apply to yeasts of the CTG clade.
Although most of the M. farinosa hybrids did sporulate, one can predict that they do not undergo meiosis. In Zygosaccharomyces bailii, a species that is closer to the Saccharomyces and does not belong to the CTG clade, a lack of meiotic spores was observed. This was explained by the absence of nuclear fusion after mating [51]. In our study, none of the 52 spores of CBS 7064 and none of the 70 spores of CBS 2006 underwent meiosis, with the possible exception of one spore from CBS 2006 (spore 28). The reason why these yeasts undergo sporulation, but do not perform meiosis is unknown. This is reminiscent of C. lusitaniae, whose progeny from crosses was complexand contained 70% haploid spores, 21% diploid spores with the rest being aneuploid [5]. Obviously, the rarity of haploid spores in M. farinosa may be due to the important sequence divergence between the parents, which was not the case in C. lusitaniae. On the other hand, all the spores from D. hanseniiinter-clade hybrids are also diploid (our unpublished observation), though in this case the sequence divergence was lower (at most, 3%) than that of M. farinosa clades. This observationsuggests that sequence divergence may not be the only reason for the lack of meiosis in these species. Alternatively, in these species, a 3% sequence divergence may still be too high to allow meiosis to occur. It has been proposed that errors in meiosis such as meiosis I non-disjunction or precocious sister segregationor a possibleabsence of synaptonemal complex could lead to aneuploid and diploid sporesin C. lusitaniae [5]. Indeed, sporulation of inter-specific hybrids generates aneuploids in Saccharomyces [52], because the reduction of meiotic recombination causes a nondisjunction of chromosomes [53]. Therefore, the lack of recombination at meiosis couldexplain the death of aneuploids in M. farinosa.
After sporulation, M. farinosa could undergo LOH leading to a reduced heterozygosity allowing meiosis in the resulting organism. The study of the spore G6 confirmed that LOH could be massive in M. farinosa, as seen in M. farinosa CBS 7064. As in C. albicansin which stress affects LOH [54], stress caused by sporulation could increase LOH in M. farinosa. Figure 5 summarizes the various situations with regard to meiosis and sexual reproduction in four studied species from the CTG clade. The production of aneuploids by hybrids may be a common feature in the CTG clade, since aneuploids were also described in the species complex C. parapsilosis [55]. Interestingly, these authors also observed genetic exchanges in this species that are similar to that which we describe between M. farinosaclade 1 and 2.
The progeny of some of the CTG yeast species can therefore be complex, contributing to the diversity in this clade. M. farinosa is a mix of haploid clones belonging to different clades, hybrids resulting from mating between these clades and subjected to LOH and aneuploids having variable DNA content resulting from the sporulation of the hybrids through abortive meiosis or possible other mechanisms. The haploid or close to haploid spores resulting from the hybrids therefore display a non-Mendelian marker inheritance. For instance, these haploids may be the ancestor of strains of M. farinosa clade 1 or clade 2, which have exchanged material, as deduced from the transmission of mtDNA and from the introgression of unknown extent that we have evidenced from the NUMT analysis. Together, these results indicate that hybridization followedbyLOH is an alternative sexuality for these very divergent strains, although arare occurrence of meiosis isnot excluded. In conclusion, M. farinosa constitutes a species complex made of diverged cryptic species, whose genomes are affected by secondary contacts and LOH. We hypothesizethat this situation could be the case for a number of species of the CTG clade. This ensemble of observations invites reflection on thespecies concept in this clade.

PCR amplification
The sequences of the nucleotide primersused in this study are listed in Table S1. Primers weredesigned with Primer3 (http:// fokker.wi.mit.edu/primer3) on the alignments of genes from complete sequence genomes (http://www.genolevures.org/). Some of the primers were already described [23].Genomic DNA was extracted as previously described [23].

NUMT analysis
Primers were designed on the complete sequence of M. (P. sorbitophila) farinosa CBS7064 around each NUMT locus. For the nine heterozygous loci (with the insertion being present on only one parental allele), we chose primer sequences that were identical on both alleles. Two pairs of primers were designed for each of the two homozygous NUMT loci to increase the chance to amplify these regions in all strains.
Cloning PCR products were purified using Nucleospin extract II purification kit (Macherey-Nagel, Germany). They were cloned intoa pGemT Easy (Promega, France) using T4 DNA ligase (Promega). DH10B E. coli cells were transformed by electroporation with pGemT easy recombinant plasmids. Cells were selected for their ability to grow on 50 mg/ml ampicillin. Positive recombinant clones were chosen after PCR screening for the presence of insert by restriction analysis. Inserts were then sequenced using universal primers (T7 and SP6) or the primers used for the PCR amplification. DNA sequence determination and phylogenetic analysis PCR fragments were sequenced on both strands by Cogenics (Takeley, U.K.) and Eurofins MWG Operon (Ebersberg, Germany). Sequences were assembled with the phred/phrap/consed package. Sequences were analysed with various programs including BLAST implemented at NCBI (http://www.ncbi.nlm.nih. gov). Sequence alignments were generated by using ClustalX [56], MAFFT [57] or Muscle [58] and were manually adjusted with Genedoc (http://www.psc.edu/biomed/genedoc). Phylogenetic trees were reconstructed with Neighbor-Joining program implemented in ClustalX and in MAFFT or with Phyml [59]. Phylogenetic trees were visualized with NJplot [60].

Flow cytometry
Flow cytometry was performed as described in [47].

Nucleotide sequence and Mycobank accession numbers
The GenBank/EMBL/DDBJ accession numbers for sequences described in this article are listed in Table S2.The Mycobank (http://www.mycobank.org) accession numbers for Millerozyma miso sp. nov. and Candida pseudofarinosasp. nov. are MB 563405 and MB 563406, respectively.  Figure S3 Schematic representation of the localization of the studied markers on the CBS 7064 chromosomes. The represen-tation of the CBS 7064 chromosomes is taken from [21]. Letters on the right indicate the chromosome name. The markers selected for PCR amplification and sequencing are represented by their gene names (Genolevures nomenclature). Simple name used in this study areindicated in red.Orange:clade 2 (Pc subgenome). Green:divergent sequence from any of the sequences of clades 1, 2, 3 or 5, i.e. clade 4 (Pe subgenome). (TIF)