Exploring the potential of comparative de novo transcriptomics to classify Saccharomyces brewing yeasts

In this work the potential of comparative transcriptomics was explored of Saccharomyces (S.) cerevisiae and S. pastorianus for their discrimination. This way an alternative should be demonstrated to comparative genomics, which can be difficult as a result of their aneuoploid genomes composed of mosaics of the parental genomes. Strains were selected according to their application in beer brewing, i.e. top and bottom fermenting yeasts. Comparative transcriptomics was performed for four strains each of commercially available S. cerevisiae (top fermenting) and Saccharomyces pastorianus (bottom fermenting) brewing yeasts grown at two different temperatures to mid-exponential growth phase. A non-reference based approach was chosen in the form of alignment against a de novo assembled brewery-associated pan transcriptome to exclude bias introduced by manual selection of reference genomes. The result is an analysis workflow for self-contained comparative transcriptomics of Saccharomyces yeasts including, but not limited to, the analysis of core and accessory gene expression, functional analysis and metabolic classification. The functionality of this workflow is demonstrated along the principal differentiation of accessory transcriptomes of S. cerevisiae versus S. pastorianus strains. Hence, this work provides a concept enabling studies under different brewing conditions.


Brewing and yeasts
The brewing of beer is a chemically complex and lastly highly controlled biotechnological process. According to the German brewers' association there are more than 40 malt varieties, 250 different hops and over 400 yeast strains to choose from, not to mention the staggering differences in water quality. On the contrary, most of the beers result from fermentation with a very limited number of yeast strains, e.g. in Germany mostly four strains with one single strain accounting for approximately 65% are used. Thus, the unexploited combinations and possibilities to use different yeasts strains for the development of new beers are virtually endless [1].
Saccharomyces brewing yeasts can be categorised into the two species Saccharomyces (S.) cerevisiae and S. pastorianus. S. cerevisiae strains are fermented at elevated temperatures to genome for the study of species without a fully sequenced genome, since it directly accesses the base sequence [15]. To the best of our knowledge no investigative effort has been made to compare the gene expression profiles of top and bottom fermenting brewing yeasts, neither employing microarray technology, nor RNA sequencing.

Aim of the study
In this study the potential of comparative transcriptomics was explored of S. cerevisiae and S. pastorianus for their discrimination, and to avoid difficulties in the comparison of their aneuoploid genomes composed of mosaics of the parental genomes. The use of these yeasts in typical brewing processes employs different temperatures (15 or 20˚C). Still, for comparative transcriptomics targeted at differences useful for yeast classification, the transcriptional response to sub-optimal temperature for the respective yeast strain must be determined and possibly excluded. Therefore, comparative transcriptomics was performed in parallel at 15˚C and 20˚C for all cultures of four yeast strains each belonging to the species S. cerevisiae and S. pastorianus. Statistical analyses, alignment of pre-processed reads to a de novo assembled brewery associated pan transcriptome and functional annotation were performed to identify the core, pan and accessory transcriptome.
In order to accomplish that we aimed to establish a flexible workflow to perform self-contained transcriptomics and as a proof-of-concept evaluate the following hypothesis: If ale yeasts are genetically more diverse than lager yeasts [12] and lager yeasts can be phenotypically placed into two distinctly different groups [4,5], then this inherent diversity should be visible through transcriptomic analysis.

Yeast strains and culture media
The S. cerevisiae strains used in this study were: TUM 68, the most widely used commercial wheat beer strain in Germany; TUM 177, used for brewing Kölsch beer; TUM 211, an English ale strain, and TUM 511, an American ale strain, which is phylogenetically related to wine yeast strains [16]. The S. pastorianus strains used in this study were: TUM 34/70, one of the most widely used lager strains in Europe and representative of the group II Frohberg type S. pastorianus yeasts, as well as TUM 66/70, another Frohberg yeast; CBS 1503, the S. monacencis type strain and a group I Saaz type yeast, as well as CBS 1513, the S. carlsbergensis type strain and another Saaz type yeast. The strains were partly available from the in-house strain collection and partly kindly provided by the Forschungszentrum Weihenstephan für Brau-und Lebensmittelqualität (BLQ, Freising, Germany). The yeast strains were prepared as glycerol stocks (12% (w/v)) at the beginning of the project and stored at -80˚C as strains (isogenic strains in brackets) TMW 3.250 (TUM 86), TMW 3.256 (TUM 177), TMW 3.261 (TUM 211), TMW 3.673 (TUM 511), TMW 3.275 (TUM 34/70), TMW 3.285 (TUM 66/70), TMW 3.287 (CBS 1503) and TMW 3.681 (CBS 1513). Cultures were grown at 30˚C on YPD agar containing 2% glucose (w/v), 1% peptone (w/v), 5% yeast extract (w/v) and 1,5% agar (w/v) and propagated in liquid YPD (2% glucose (w/v), 1% peptone (w/v), 5% yeast extract (w/v)). For propagation 50 mL YPD were inoculated with a single colony from the second of two subsequent YPD agar plates and grown aerobically for three days at 30˚C and 180 rpm.

Growth and sample preparation
All experiments were conducted in biological triplicates. The growth experiments were performed in 250 mL Erlenmeyer flasks containing 50 mL YPD and yeast cultures pitched at cell densities of 3 × 10 5 cells mL -1 . Cells were grown at 15˚C and 20˚C, up to mid-exponential growth phase, which was determined in a previous experiment for all strains and both temperatures. These temperatures were chosen as the most frequently applied process temperatures for bottom and top fermented beers, respectively. Cells were then counted using a Thoma chamber (depth = 0.1 mm, volume = 0.0025 mm 2 ) and an aliquot of 3 × 10 8 cells was treated with RNAlater (Invitrogen, NN Bleiswijk, Netherlands), flash frozen in liquid nitrogen and stored at -80˚C until RNA isolation. Total RNA was extracted from the frozen cells through mechanical disruption with acid washed glass beads and a bead mill (FastPrep-24, MP Biomedicals, Irvine, CA, USA) and high quality RNA was isolated using the RNeasy Kit from Qiagen (Qiagen, Hilden, Germany). The quality and the quantity of the isolated RNA were checked by using Nanodrop 1000 spectrometer (Peqlab Biotechnologie GmbH, Erlangen, Germany) and isolated RNA was stored at -20˚C until shipment. During shipment the samples were cooled on dry ice.

Sequencing
RNA was submitted to a commercial provider for library construction and sequencing. Illumina HiSeq sequencing with a read length of 2 × 150 bp (paired-end reads) was carried out by GATC Biotech (Konstanz, Germany).

In silico analyses
A tailored bioinformatics workflow was developed to process and analyse the paired-end Illumina sequences. For that several established tools and algorithms were employed. The general workflow, which is detailed below, included: 1. Processing of raw read data of every single strain 2. Compilation of a de novo meta transcriptome using SPAdes [17] 3. Normalisation of read data 4. Alignment, assembly and calculation of differential expression using the Tuxedo package [18]: a. Alignment of pre-processed reads to the transcriptome using tophat2 with Bowtie2 as alignment engine b. Assembly of accepted hits for each replicate into transcripts using cufflinks and cuffmerge c. Calculation of differential expression using cuffdiff 5. Functional annotation a. Extraction of sequences (of differentially expressed genes) b. BLAST of gene sequences against MIPS Functional Catalogue (provided as supplementary material-S1 File).

Processing of transcriptomic raw data
Trimming and lengthsorting of reads was performed using the paired-end reads-aware trimming algorithm SolexaQA [19] and resulted in high quality reads in FASTQ format.
The resulting preprocessed data sets are deposited in the European Nucleotide Archive of the EMBL-EBI. The accession number of the whole project is PRJEB33088. The sample accession numbers are ERS3526866-913.

Compilation of a de novo meta transcriptome
A de novo pan transcriptome was compiled using rnaSPAdes with all parameters set to their default values [17].

Normalisation of read data
The calculation of the normalisation factors was performed using the R package DESeq with R software (DESeq version 1.30.0, R software version 3.4.3 "Kite-Eating Tree", http://www.rproject.org) [20]. Samtools (version 0.1.19_44428cd., http://www.htslib.org) and the function bamCompare from the software package deepTools (version 2.5.3, https://deeptools. readthedocs.io) were used to apply the normalisation factors to the gene counts stored in the alignment files. Differential gene expression was calculated based on the normalised gene counts (see above).

Alignment, assembly and calculation of differential expression using the Tuxedo package
Alignment to the de novo assembled pan transcriptome, transcriptome assembly and calculation of the differential gene expression was done using the Tuxedo protocol [18].
For alignment and subsequent transcriptome analysis a non-genomic-reference based approach was chosen, this means that each set of raw reads was aligned to the de novo assembled pan transcriptome and transcripts were assembled accordingly.
The trimmed and pre-processed paired-end reads were aligned to the pan transcriptome using TopHat2 version 2.1.1 [21] with Bowtie2 version 2.3.2 as its read-alignment engine [22] with all parameters set to their default values. Mapping statistics can be found in Table 1: Read alignment statistics. The read alignments were normalised (see below) and further assembled into transcripts using Cufflinks v2.2.1 [23] and merged into a merged transcriptome using Cuffmerge v1.0.0. Cuffdiff v2.2.1 was used to calculate the differential gene expression and graphic output was generated using the R packages CummeRbund version 2.18.0 [24] with R software version 3.4.0 ("You Stupid Darkness", http://www-r-project.org), ggplot2 version 2.2.1 [25] and factoextra version 1.0.5.

Functional annotation
Sequences of differentially expressed genes were extracted from the Cuffdiff output, translated to protein sequences and blasted against the MIPS Functional Catalogue [26,27] (provided as supplemental data-S1 File).

Statistical analysis
All experiments were performed in triplicates. Statistical analyses were performed using R software version 3.4.3 "Kite-Eating Tree". Principal component analysis was performed using the DESeq package for R (version 1.30.0) after compiling the data in Perseus version 1.6.0.7.
Principal component analysis with a false discovery rate (FDR) of 0.05 was conducted in Perseus version 1.6.0.7.

Functional linking to metabolic traits
As the aim of this study was to principally differentiate top from bottom fermenting yeasts as a proof of concept for the functionality of the workflow, and regulatory effects resulting from different growth temperatures should be excluded. In this context, a gene was considered as expressed in a strain if the mean values of all three replicates at both temperatures were greater than zero. This approach excludes the temperature bias and therefore regulatory effects from the evaluation. Gene expression is given in FPKM values calculated by the Tophat2 algorithm. The overlapping and strain specific gene expression of all four S. cerevisiae and S. pastorianus strains, respectively, was determined through depiction with Venn diagrams using Venny [28]. The genes mutually expressed in all four strains of each species were again compared via Venn diagram to assess the core transcriptome of all eight investigated brewing yeast strains. For metabolic inspection the genes were then sorted into functional categories according to the MIPS Functional Catalogue [26,27].

Results and discussion
This work was conducted to demonstrate the potential of comparative transcriptomics to delineate brewing yeast types despite their complex chromosomal settings. While the data should have limited significance with respect to their behaviour or metabolism in the fermentative state of the brewing process, they instead should demonstrate the differentiative potential of the self-contained de novo transcriptomics approach to differentiate these yeasts independently from brewing experiences. The results of the principal component analysis of all transcripts are shown in Fig 1. The S. pastorianus yeasts form three distinct clusters, whereby the two Frohberg yeasts form one cluster and the two Saaz yeasts diverge from that cluster and from one another into opposing directions. The cultivation temperatures used in this study had no effect on the expression profile of all four strains. This is to be seen differently from an exposure to cold shock upon yeast storage, which resulted in distinct responses in e.g. S. pastorianus [29]. The S. cerevisiae strains form a single cluster with the exception of the American ale yeast TUM511, which also is the only one showing significant temperature dependent differences in the gene expression profile.
S. pastorianus yeasts are reported to possess limited genetic diversity and therefore a limited influence on the final flavour profile of bottom fermented beers [30], especially in contrast to the enormous genetically encoded aroma producing ability of top fermenting S. cerevisiae yeasts [31][32][33]. This is not replicated in the clustering based on the expression profiles and may imply that parts of highly homologous genome regions are not (differentially) expressed under the conditions chosen for our experiment.
All expressed transcripts of one species, that could be annotated with a gene name according to the MIPS Functional Catalogue, were assessed using Venn diagrams [28]. This meant 786 transcripts expressed in S. cerevisiae yeasts and 1179 transcripts expressed in S. pastorianus yeasts. As shown in Fig 2 of the 786 transcripts expressed in S. cerevisiae, 627 genes are common to all four strains. This corresponds to 79.8% of the transcripts. In S. pastorianus only 542 of the 1179 transcripts are common to all four strains, which corresponds to 46% (Fig 3). As indicated by the respective numbers in these figures the strain specific in-group diversities were low. Generally, the (unexpectedly) higher number of transcripts found in the S. pastorianus group may be attributed to the presence of orthologues from two species (S. cerevisiae and S. eubayanus) in its genome. This shows that the inherit diversity in ale and lager yeasts is detectable through transcriptomic analysis, in which Ale yeasts presented themselves as less diverse compared to lager yeasts.
For S. cerevisiae none of the other transcript groups in the Venn diagram exceeded 6%, with the transcripts unique in the English ale yeast strain TMW 3.361 forming the biggest group with 44 transcripts (5.6%). The wheat beer yeast strain TMW 3.250 has only nine (1.1%) unique transcripts, the Kölsch yeast strain TMW 3.265 has only 15 (1.9%) unique genes and the American ale yeast strain TMW 3.673 28 (3.6%). This is shown in Fig 2. Fig 3 shows that two bigger groups are found within the S. pastorianus yeasts: the three strains TMW 3.275, TMW 3.285 and TMW 3.287 have 313 genes in common, this corresponds to 26.5%, whereas the three strains TMW 3.275, TMW 3.285 and TMW 3.681 share 197 genes, which corresponds to 16.7%. However, the number of transcribed genes for S. pastorianus, which are unique to one strain, is small: 14 (1.2%) and 6 (0.5%) for the two Frohberg yeasts TMW 3.275 and TMW 3.285, respectively, 16 (1.4%) for the Saaz yeast TMW 3.287 and 24 (2%) for the S. carlsbergensis type strain TMW 3.681.
In Fig 4 the comparison of the 627 transcripts common in all four S. cerevisiae strains and of the 542 transcripts common in all four S. pastorianus strains is shown. All eight Saccharomyces brewery related strains share 414 transcribed genes, which corresponds to 54.8%. The remaining transcripts are split into 213 (28.2%) for S. cerevisiae and 128 (17%) for S. pastorianus. These transcripts were classified according to their MIPS Functional Categories, and the ones common to all eight strains are summarised in Table 2. The genes exclusively expressed in either all four S. cerevisiae or all four S. pastorianus strains are shown in Tables 3, 4 and 5. Categories high in transcript numbers for all eight strains as well as genes exclusively expressed in either S. cerevisiae or S. pastorianus strains belong to categories reflecting transcription and translation, such as protein binding, ribosomal proteins, transcriptional control or translation, or categories related to metabolic activities, such as phosphate metabolism, C-compound and carbohydrate metabolism, and lipid, fatty acid and isoprenoid metabolism. This shows that S.   . cerevisiae (Fig 2) and 542 genes expressed by all four S. pastorianus (Fig 3) strains, respectively. The Venn diagram was generated using Venny [28]. https://doi.org/10.1371/journal.pone.0238924.g004