Fig 1.
Identification of novel isolates of C. tropicalis.
(A) Genome variation among C. tropicalis isolates. Variants were identified using the Genome Analysis Toolkit HaplotypeCaller and filtered based on genotype quality (GQ) scores and read depth (DP). Variants for all 77 isolates are shown according to variant type. Isolates are labelled on the X-axis by strain ID. One isolate (C. tropicalis ct20) has mostly homozygous variants, and six isolates have very high levels of heterozygous variants. (B) Six isolates of C. tropicalis are highly divergent. Variants were called as in (A). For heterozygous SNPs, a single allele was randomly chosen using RRHS [93], and for homozygous SNPs, the alternate (non-reference) allele was chosen by default. This process was repeated 100 times and 100 SNP trees were drawn with RAxML using the GTRGAMMA model [94]. The best-scoring maximum likelihood tree was chosen as a reference tree and the remaining 99 trees were used as pseudo-bootstrap trees to generate a supertree. Pseudo-bootstrap values are shown as branch labels. The six divergent isolates (Cluster B) are labelled according to their country of origin (see panel C). (C) SNP phylogeny of isolates from Cluster A indicates that clade structure is not associated with geography. The phylogeny of Cluster A is shown in detail. Pseudo-bootstrap values are shown as branch labels. Isolates are labelled according to their country of origin, and environmental isolates are indicated with an asterisk. The reference strain, C. tropicalis MYA-3404, is labelled. Five putative clades are highlighted with colored bubbles. These clades are supported by principal component analysis (PCA) (S3 Fig). A sixth group was also identified by PCA, encompassing the remainder of isolates in the tree (S3 Fig).
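The repeated allele sampling described in panel (B) can be illustrated with a minimal sketch. This is not the RRHS tool itself. The tuple layout, function names and seed are hypothetical and serve only to show the idea: one randomly chosen allele per heterozygous SNP, the alternate allele per homozygous SNP, repeated 100 times to give 100 pseudo-haploid sequences for tree building.

```python
import random


def sample_haplotype(genotypes, rng):
    """Return one allele per SNP site as a string.

    genotypes: list of (ref, alt, is_heterozygous) tuples.
    This layout is a hypothetical simplification of a filtered VCF.
    """
    alleles = []
    for ref, alt, is_het in genotypes:
        if is_het:
            # Heterozygous site: pick one of the two alleles at random.
            alleles.append(rng.choice((ref, alt)))
        else:
            # Homozygous variant site: take the alternate allele.
            alleles.append(alt)
    return "".join(alleles)


def repeated_samples(genotypes, n_repeats=100, seed=0):
    """Draw n_repeats independent pseudo-haploid sequences."""
    rng = random.Random(seed)  # fixed seed only for reproducibility of the sketch
    return [sample_haplotype(genotypes, rng) for _ in range(n_repeats)]


# Toy input: two heterozygous sites flanking one homozygous site.
genotypes = [("A", "G", True), ("C", "T", False), ("G", "A", True)]
samples = repeated_samples(genotypes)
```

Each of the 100 strings would then become one isolate's row in one SNP alignment, with the homozygous site constant across repeats and the heterozygous sites varying.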
Fig 2.
Novel C. tropicalis isolates result from hybridization.
(A) Analysis of k-mer distribution profiles reveals hybrid genomes. K-mer analysis of sequencing readsets was performed with the k-mer Analysis Toolkit (KAT [82]). For each of four divergent isolates, the number of distinct k-mers of length 27 bases (27-mers) is displayed on the Y-axis and k-mer multiplicity (depth of coverage) is displayed on the X-axis. K-mers that are present in the reference genome are shown in red, and k-mers that are absent from the reference genome are shown in black. There are two distinct peaks of k-mer coverage at approximately 50X and 100X. This pattern implies that most of the genomes are heterozygous (k-mers at 50X coverage) with few homozygous regions (k-mers at 100X coverage). Approximately half of the heterozygous k-mers in the readsets are not represented in the reference sequence. This pattern has been observed in hybrid isolates from other yeast species [25]. (B) Analysis of phased variants identifies two distinct haplotypes in divergent isolates of C. tropicalis. Variants were phased using HapCUT2 [44] into blocks covering 10–13 Mb of the genome. For each phased block, percentage difference from the reference strain in each haplotype was calculated as the number of variants divided by the length of the block. For 84–87% of the blocks, one haplotype is <0.3% different to the reference sequence and one haplotype is >1% different to the reference sequence. All phased blocks for each of the six hybrid isolates are shown as haplotype pairs, with the member of the pair more similar to the reference (haplotype A) shown in blue and the member of the pair less similar to the reference shown in orange (haplotype B) or purple (haplotype C). Percentage difference to the reference sequence is displayed on the Y-axis and position in the genome (chromosome, position (bp)) is displayed on the X-axis.
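The per-block divergence calculation in panel (B) can be sketched as follows. The dictionary keys and function names are hypothetical. The arithmetic follows the legend (number of variants divided by block length, expressed as a percentage), and the 0.3% and 1% thresholds are those stated above.

```python
def percent_difference(n_variants, block_length_bp):
    """Percentage difference from the reference for one haplotype of a block."""
    if block_length_bp <= 0:
        raise ValueError("block length must be positive")
    return 100.0 * n_variants / block_length_bp


def classify_block(block):
    """Order the two haplotypes of a phased block by divergence.

    block: dict with hypothetical keys 'block_length_bp',
    'variants_hap1' and 'variants_hap2'.
    """
    d1 = percent_difference(block["variants_hap1"], block["block_length_bp"])
    d2 = percent_difference(block["variants_hap2"], block["block_length_bp"])
    closer, farther = sorted((d1, d2))
    return {
        "closer_pct": closer,    # candidate haplotype A
        "farther_pct": farther,  # candidate haplotype B or C
        # Pattern reported for 84-87% of blocks in the legend.
        "fits_pattern": closer < 0.3 and farther > 1.0,
    }


# Toy block: 10 kb long, 150 variants on one haplotype and 12 on the other.
result = classify_block(
    {"block_length_bp": 10_000, "variants_hap1": 150, "variants_hap2": 12}
)
```

In this toy block the two haplotypes differ from the reference by 0.12% and 1.5%, so the block fits the pattern of one reference-like and one divergent haplotype.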
Table 1.
Results of haplotype phasing.
Fig 3.
Loss of heterozygosity in C. tropicalis isolates.
(A) Hybrid and non-hybrid isolates differ in the extent of LOH across the genome. The eight largest scaffolds in the reference genome are displayed horizontally from left to right and labelled from 1 to 8. LOH blocks are shown in pink and heterozygous (“HET”) blocks are shown in green. Centromere positions are indicated with “C”, telomere positions are indicated with “T” and the rDNA locus is indicated with “R”. Isolates are labelled on the left-hand side. The re-sequenced reference strain C. tropicalis MYA-3404 (labelled as “Ref”) is shown as a representative of the non-hybrid (AA) isolates. The genomes of the AA isolates consist mostly of LOH blocks. The AA isolate C. tropicalis ct20 has undergone extensive LOH, covering >99% of the genome. In contrast, in the AB/AC isolates, the majority of the genome consists of heterozygous blocks. (B) LOH is limited to short tracts of the genome in hybrid isolates. The histograms show the frequency of LOH blocks of different lengths in the six hybrid isolates and in two AA (non-hybrid) isolates: the re-sequenced reference strain C. tropicalis MYA-3404 (labelled as “Ref”) and C. tropicalis ct20. Frequency is shown on a log scale on the Y-axis and length in kilobases (kb) is shown on the X-axis, with a bin width of 1000 bp. The average length of LOH blocks in the hybrid isolates ranges from 286 to 416 bp. A similar pattern is observed in all six hybrid isolates, i.e. a predominance of short LOH blocks, with very few long tracts of LOH. In the non-hybrid isolates (e.g. C. tropicalis MYA-3404), LOH blocks are generally longer. C. tropicalis ct20 has the longest average LOH block length (~10 kb).
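The binning behind the histograms in panel (B) can be sketched in a few lines. The function names and the toy block lengths are hypothetical and are not data from the study. Only the 1000 bp bin width comes from the legend.

```python
from collections import Counter


def bin_block_lengths(lengths_bp, bin_width_bp=1000):
    """Count LOH blocks per length bin; keys are the lower edge of each bin in bp."""
    counts = Counter((length // bin_width_bp) * bin_width_bp for length in lengths_bp)
    return dict(sorted(counts.items()))


def mean_length(lengths_bp):
    """Average LOH block length in bp."""
    return sum(lengths_bp) / len(lengths_bp)


# Toy block lengths in bp (illustrative only).
lengths = [120, 300, 450, 980, 1500, 12000]
bins = bin_block_lengths(lengths)
average = mean_length(lengths)
```

Plotting the counts on a log-scaled Y-axis, as in the figure, keeps the rare long blocks visible next to the much more frequent short ones.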
Fig 4.
Disrupting BAT22 prevents growth of C. tropicalis on branched-chain amino acids as a sole nitrogen source.
(A) Phenotype analysis of C. tropicalis isolates. Growth of C. tropicalis ct04 is shown on solid media. Strains were grown in 2x2 arrays; two biological replicates (top and bottom rows), with two technical replicates each (left and right columns), of each strain were tested. C. tropicalis ct04 replicates are outlined with red boxes. C. tropicalis ct04 cannot utilize valine or isoleucine as a sole nitrogen source and also exhibits a growth defect on solid media with 2% starch or 2% sodium acetate as the sole carbon source, or on solid media without a carbon source provided. (B) Editing of BAT22. Plasmid pCT-tRNA-BAT22 was generated to edit the wild type sequence of BAT22 (CTRG_06204) using CRISPR-Cas9. The sequences of the reference C. tropicalis BAT22 (CtBAT22 (wt)), BAT22 from C. tropicalis ct04 (CtBAT22 (ct04)) and edited BAT22 (CtBAT22*) are shown. The guide sequence is highlighted with a black box, the PAM sequence is shown in bold, and the Cas9 cut site is indicated with red scissors. C. tropicalis isolates ct44, ct09 and ct53 were transformed with pCT-tRNA-BAT22 and a repair template (RT_BAT22_2bpDel_SNP) generated by overlapping PCR using RT_BAT22_2bpDel_SNP-TOP/BOT oligonucleotides. The repair template contains two 60 bp homology arms and deletes two bases in BAT22, resulting in the same frameshift observed in C. tropicalis ct04. (C) Edited strains have defects in branched-chain amino acid metabolism. Five-fold serial dilutions of C. tropicalis ct04, ct09 (wt; bat22*), ct44 (wt; bat22*) and ct53 (wt; bat22*) are shown under the same conditions tested in (A). The edited strains cannot use valine or isoleucine as sole nitrogen sources.