Whole-Genome Comparison Reveals Novel Genetic Elements That Characterize the Genome of Industrial Strains of Saccharomyces cerevisiae

doi:10.1371/journal.pgen.1001287

Table 1.

Strains sequenced in this study.

Figure 1.

Chromosomal aneuploidy determined by whole-genome sequencing coverage.

Sequencing coverage was determined for each contig using a 1001 bp sliding window with a 100 bp step, and plotted in chromosomal order (black circles). Regions of copy number variation were scored as differing from the median coverage for that strain by more than either 1.25-fold (yellow lines; approximating three or five copies in a tetraploid genome) or 1.5-fold (red lines; one or three copies in a diploid genome). Strains are shaded according to their industry (wine, red; ale, blue).
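
The windowing and scoring described in this caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, per-base depth is assumed to be available as a plain list, and "1.25-fold" and "1.5-fold" are interpreted as deviations of 0.25 and 0.5 from a coverage ratio of 1, which is what the copy numbers quoted in the caption imply (3/4 or 5/4, and 1/2 or 3/2).

```python
from statistics import median


def sliding_window_coverage(depth, window=1001, step=100):
    """Return (start, mean_depth) for each full window along one contig."""
    windows = []
    for start in range(0, len(depth) - window + 1, step):
        chunk = depth[start:start + window]
        windows.append((start, sum(chunk) / window))
    return windows


def classify_windows(windows, mild=0.25, strong=0.5):
    """Label each window by its deviation from the median window coverage.

    Assumption: a window is "mild" when its ratio to the median differs
    from 1 by at least `mild`, and "strong" when it differs by at least
    `strong`. The caption does not spell out this rule.
    """
    med = median(mean for _, mean in windows)
    labels = []
    for start, mean in windows:
        ratio = mean / med
        deviation = abs(ratio - 1.0)
        if deviation >= strong:
            label = "strong"
        elif deviation >= mild:
            label = "mild"
        else:
            label = "normal"
        labels.append((start, ratio, label))
    return labels


# Toy contig: uniform depth of 10 with a duplicated block at 4000-6000.
depth = [10] * 4000 + [20] * 2000 + [10] * 4000
calls = classify_windows(sliding_window_coverage(depth))
```

In practice the median would be taken across all contigs of a strain rather than a single contig, as the caption states; the single-contig version here only keeps the example short.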

Table 2.

Heterozygosity in industrial S. cerevisiae strains.

Figure 2.

Nucleotide variation in S. cerevisiae.

(A) InDels associated with tandem repeats. Histogram showing the proportion of tandem repeats of various sizes (repeat size indicated on the x-axis) present on chrXVI that were either conserved in repeat length (blue) or contained strain-specific InDels (yellow). The total number of repeat loci present in each class is listed above the histogram. (B) An example of a strain- and allele-specific InDel in a tandem repeat in the promoter region of YPL088W.

Figure 3.

Nucleotide relationships between S. cerevisiae strains.

(A) A neighbor-joining tree representing the genetic distance between strains, as calculated from the total SNP diversity present in whole-genome alignments. (B) A neighbor-joining tree representing the genetic distance between the strains presented in part A and representative strains from several S. cerevisiae geographical populations [12]. Industrial strains are color-coded according to their primary industry (wine/European, including RM11-1a, pink; ale, blue; bioethanol, green; sake, yellow). Strains that are predicted to contain the heterogeneous five-gene cluster are labeled in bold.
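
As an illustration of the kind of input a neighbor-joining tree is built from, the sketch below computes a pairwise distance matrix from aligned sequences. It uses a simple p-distance (the proportion of compared sites that differ), which is an assumption; the caption does not state which distance measure was used. The function names and toy sequences are hypothetical, and heterozygous sites are not modeled.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites, skipping gaps and ambiguous bases."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            if x != y:
                diffs += 1
    return diffs / compared if compared else float("nan")


def distance_matrix(aligned):
    """Symmetric {(name_a, name_b): distance} for a dict of aligned sequences."""
    names = list(aligned)
    dist = {(name, name): 0.0 for name in names}
    for a, b in combinations(names, 2):
        d = p_distance(aligned[a], aligned[b])
        dist[(a, b)] = d
        dist[(b, a)] = d
    return dist


# Toy alignment; real input would be whole-genome alignments.
toy = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "AC-TTCGA"}
matrix = distance_matrix(toy)
```

A matrix like this would then be passed to a neighbor-joining implementation to produce the tree itself.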

Figure 4.

Novel genes found in industrial strains.

(A) A 45 kb strain-specific region in AWRI796 that is predicted to encode at least 21 ORFs (full ORF sequences are listed in Dataset S12). ORFs with homology to AADs are highlighted in yellow. The extreme 5′ and 3′ ends of this cluster are homologous to a repetitive region present in the sub-telomeric regions of chrXIII, XV and XVI (dark blue boxes). Black dots within ORFs represent potential frameshifts in the sequence of these regions. (B) ClustalW dendrogram produced by aligning AAD proteins from S288c, AWRI796 and the top five matches to the highly divergent AWRI796 proteins AAD(i) and AAD(ii). (C) The region in the brewing strains FostersO and FostersB that contains RTM1 [22] and the conserved hypothetical ORFs; this region is also found in the human pathogen YJM789 [8].

Figure 5.

A divergent cluster of genes with a possible circular intermediate.

(A) The location and orientation of the gene cluster throughout the genomes of the industrial yeasts. Upper-case Roman numerals refer to standard S. cerevisiae chromosomes (unk, location unknown), with individual loci labeled with lower-case Roman numerals. (B) Nucleotide conservation of the five-gene clusters. An alignment of the nucleotide sequence of all eleven clusters is shown below a schematic depiction of the five predicted ORFs present in this nucleotide sequence (A, zinc-cluster transcription factor; B, cell-surface flocculin; C, nicotinic acid permease; D, 5-oxo-L-prolinase; E, C₆ transcription factor). In order to produce contiguous alignments, the sequence of each cluster was manually split to begin with the start codon of ORF A, with the position of each break indicated. Conserved bases are shaded blue (light blue for ORF sequences). Insertions are highlighted in red and substitutions in green. (C) Differences in gene order within individual clusters. Each of the five genes is represented by a filled circle (labeled as in part B), with the systematic names of the ORFs that border each insertion listed in open squares (Z.b, this cluster is present in Z. bailii (accession number FN295481.1); Ty, transposon sequence; TEL, sub-telomeric repeat (COS) sequence). Colored arrows bordering each cluster indicate the strain(s) in which this insertion is present. (D) Each of the nine cluster locations and orders can be resolved through the use of a circular intermediate that integrates into the genome via breakage at the locations indicated by each colored triangle. (E) Conservation of genomic sequences flanking individual cluster insertion events. Nucleotide alignments are shown for the 50 bp directly adjacent to either side of the five chromosomally mapped insertion events (shaded yellow when conserved), in addition to the first and last 50 bp of each cluster (shaded as in part B). Insertions are shaded in red and substitutions in green, with both additionally highlighted by asterisks. Sequences used for the alignment are (from top to bottom) S288c, JAY291, RM11-1a, EC1118, AWRI1631, QA23 allele A, QA23 allele B, AWRI796 allele A, AWRI796 allele B, Vin13 allele A, Vin13 allele B, VL3 allele A, VL3 allele B, Fosters B allele A, Fosters B allele B, Fosters O allele A, Fosters O allele B. Nucleotide coordinates for the bases directly flanking the insertion are relative to the S288c genome.
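
The circular-intermediate idea in part D can be illustrated with a short sketch: if the five genes sit on a circle, breaking the circle at different points and integrating in either orientation yields different linear gene orders. The code below enumerates those orders and checks whether an observed order is one of them. It is a simplification under stated assumptions: the function names are hypothetical, the circular order A-B-C-D-E is taken from the gene labels in part B, and breaks are assumed to fall between genes, which the caption does not specify.

```python
def linearizations(circle):
    """All linear orders obtained by cutting a circle once, in both orientations."""
    genes = list(circle)
    forms = set()
    for k in range(len(genes)):
        rotated = tuple(genes[k:] + genes[:k])
        forms.add(rotated)          # integrated in the forward orientation
        forms.add(rotated[::-1])    # integrated in the reverse orientation
    return forms


def consistent_with_circle(observed, circle="ABCDE"):
    """True if the observed linear gene order could arise from the circle."""
    return tuple(observed) in linearizations(circle)


# "CDEAB" is the circle cut between B and C; "ACBDE" cannot be produced
# by a single break of the assumed circle.
examples = {order: consistent_with_circle(order) for order in ("CDEAB", "EDCBA", "ACBDE")}
```

Under these assumptions a five-gene circle gives ten possible linear orders, so observing only rotations or reversed rotations of one order across insertion sites is what a single circular intermediate would predict.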
