Abstract
The fission (Schizosaccharomyces pombe) and budding (Saccharomyces cerevisiae) yeasts have served as excellent models for many seminal discoveries in eukaryotic biology. In these organisms, genes are deleted or tagged easily by transforming cells with PCR-generated DNA inserts flanked by short (50–100 bp) regions of gene homology. These PCR reactions use specially designed long primers, which, in addition to the priming sites, carry homology for gene targeting. Primer design follows a fixed method but is tedious and time-consuming, especially when done for a large number of genes. To automate this process, we developed the Python-based Genome Retrieval Script (GRS), an easily customizable open-source script for genome analysis. Using GRS, we created PRIMED, the complete PRIMEr Database for deleting and C-terminal tagging genes in the main S. pombe and five of the most commonly used S. cerevisiae strains. Because of the importance of noncoding RNAs (ncRNAs) in many biological processes, we also included the deletion primer set for these features in each genome. PRIMED are accurate and comprehensive and are provided as downloadable Excel files, removing the need for future primer design, especially for large-scale functional analyses. Furthermore, the open-source GRS can be used broadly to retrieve genome information from custom or other annotated genomes, thus providing a suitable platform for building other genomic tools by the yeast or other research communities.
Citation: Cummings MT, Joh RI, Motamedi M (2015) PRIMED: PRIMEr Database for Deleting and Tagging All Fission and Budding Yeast Genes Developed Using the Open-Source Genome Retrieval Script (GRS). PLoS ONE 10(2): e0116657. https://doi.org/10.1371/journal.pone.0116657
Academic Editor: Robertus A. M. de Bruin, University College London, UNITED KINGDOM
Received: October 6, 2014; Accepted: December 8, 2014; Published: February 2, 2015
Copyright: © 2015 Cummings et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting Information files. The computer code, databases and the sequence files used to generate the databases are available through Figshare at the following URLs: Primed and GRS: http://figshare.com/articles/PRIMED_and_GRS/1265054, GRS: http://dx.doi.org/10.6084/m9.figshare.1265012, PRIMED: http://dx.doi.org/10.6084/m9.figshare.1265013 http://dx.doi.org/10.6084/m9.figshare.1265014 http://dx.doi.org/10.6084/m9.figshare.1265015 http://dx.doi.org/10.6084/m9.figshare.1265016 http://dx.doi.org/10.6084/m9.figshare.1265017 http://dx.doi.org/10.6084/m9.figshare.1265018.
Funding: MTC is the 2014 Alvan T. and Viola D. Fuller American Cancer Society Junior Research Fellow. This work was supported by an NCI Proton Beam Grant (C06 CA059267) and a V Scholar Award (http://www.jimmyv.org/) to MM. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
The use of budding (Saccharomyces cerevisiae) and fission (Schizosaccharomyces pombe) yeasts has enabled many groundbreaking discoveries in eukaryotic biology. This is in large part because in these organisms genes can be deleted or tagged easily by transforming cells with PCR-generated DNA fragments containing a heterologous insert flanked by short (50–100 bp) regions of gene homology (Fig. 1) [1, 2]. To generate these DNA fragments, a series of modular vectors carrying different selectable markers and common protein tags are used as PCR templates [3, 4]. The terminal homologies flanking the DNA inserts are provided by designing long PCR primers, which, in addition to carrying the priming sites for these vectors, carry the homology required for proper integration of these DNA constructs within the target gene. By varying the DNA insert and terminal homologies, perfect gene deletions and in-frame epitope fusions can be constructed easily and quickly (Fig. 1). These techniques can also be co-opted to analyze the function of noncoding RNAs (ncRNAs) or non-transcribed regulatory elements in these organisms.
Fig. 1. (A) To delete a gene, PCR-generated DNA fragments containing a selectable marker (DrugR) flanked by target gene homology (green block, upstream of the start codon (ATG), and brown block, downstream of the stop codon) are used to transform yeast cells. Recombination between the DNA fragment and the genomic locus deletes the gene and replaces it with the selectable marker. Terminal homologies used in transformation are embedded within specially designed long primers (shown as green and brown arrows), which are used to amplify the transforming DNA fragments from a series of previously described vectors [3, 4]. Primer design involves selecting the correct regions of homology (length = N base pairs) relative to the target gene. For the forward (Fwd) primer, ATG plus N-3 bp upstream of the start codon is selected. For the reverse (Rev) primer, the stop codon and N-3 bp downstream of the gene are selected. For non-coding genes or other features, N bp upstream and downstream of the feature are selected for the Fwd and Rev primers, respectively. (B) To add a tag to the C-terminus of a CDS, a Fwd primer (green arrow) containing N bp directly upstream of the stop codon and a Rev primer (depicted in brown as described in A) are used to amplify the fragment shown in this figure by PCR. Homologous recombination integrates the tag (orange) in-frame with the ORF, resulting in a fusion gene. The tag carries its own translation termination codon (shown as two asterisks on top of one another).
Primer design involves selecting the exact DNA sequence needed for precise integration of the insert relative to the target gene (Fig. 1). This process requires access to sequence information and is often tedious, slow and error-prone, especially when performed for several genes. Currently two websites offer automated primer design for tagging or deleting coding sequences (CDS) for the fission [5] and budding [6] yeasts. These websites are limited in scope in that primers can be obtained for only one gene at a time, deletion primers for noncoding RNAs (ncRNAs) are not provided, and in the case of S. cerevisiae, primers for only one lab strain (S288C) can be obtained. Also, these programs do not provide the flexibility to input custom or the most updated versions of the yeast genomes. Here we present PRIMED, the complete PRIMEr Database for deleting (CDSs and ncRNAs) and C-terminal tagging (CDSs) the main S. pombe and five of the most commonly used S. cerevisiae strains.
To generate PRIMED, we developed the Python-based Genome Retrieval Script (GRS), an easily customizable open-source code for retrieving sequence information from annotated yeast or other genomes. PRIMED are accurate, comprehensive and are provided as downloadable Excel files, removing the need for future primer design, especially for large-scale functional analyses. Because of the compact nature of the yeast genomes, we also highlight instances in which deletion of a gene disrupts a part of a neighboring CDS or ncRNA, potentially complicating downstream analyses. Furthermore, the open source GRS can be used to retrieve genome information from other annotated genomes (for example, generating primer databases for other yeast species), thus providing a suitable platform for developing other genomic scripts by the yeast or other research communities. Overall, PRIMED and GRS are useful resources and tools, respectively.
Materials and Methods
Implementation
The databases were generated by a script written in Python 3.3.1. The main advantage of using Python is its large native library. Primers for deleting and C-terminal tagging CDSs and deleting ncRNAs were generated by implementing the following steps:
- Calling input files: We used the reference genome and annotation files from pombase.org [7] (S. pombe), the Saccharomyces Genome Database [8] (S. cerevisiae S288C) and the Saccharomyces Genome Resequencing Project [9] (RM11–1A, SK1, W303, and Y55 S. cerevisiae strains) as inputs. Inputs are 1) the whole genome sequence file (fa or equivalent format) and 2) the corresponding annotation file (gff3 or equivalent format). All genomic databases used in this study are listed in Table 1. The program begins by opening its input files as read-only.
- Extracting feature information from input files: GRS first extracts chromosome-level information from the sequence file (S1 Appendix and Fig. A in S1 Appendix). Next, it extracts the information of the desired genomic feature (for example CDS, ncRNA, or 3’ UTR) from the annotation file (S1 Appendix and Fig. B in S1 Appendix). For each feature, the program stores the chromosome number, start and end coordinates and transcriptional directionality. Finally, using these coordinates, GRS extracts the sequence of the target feature plus N base pairs (bp) of additional sequence added to its 5’ and 3’ ends. The optimal homology length (N) for HR-mediated integration differs among yeasts: 80 bp for S. pombe [10] and 50 bp for S. cerevisiae [11]. If the gene lies at the end of a chromosome, GRS fills the neighboring sequence that falls outside of the chromosome with repeated N’s. In S. cerevisiae, we treated pseudogenes and transposable elements as genes.
- Designing forward deletion primer (Fig. 1A): As the name suggests, this primer is designed for deleting a feature, for example a gene. For CDSs, this primer is the start codon (ATG) plus N-3 bp of sequence upstream of the start codon. For ncRNAs or other genomic features, this primer is N bp upstream of the start coordinate. This primer is provided in two versions: 1) the region of homology to the genome only, and 2) the region of homology plus the extra 20 bp (CGG ATC CCC GGG TTA ATT AA) sequence for amplification from a pFA6a-based vector [3, 4]. The former can be added to a customized 20+ bp sequence for amplification from a user-specific vector construct or PCR amplicon. The latter hybridizes to the multi-cloning site (MCS) on pFA6a-based vectors, which flanks the heterologous DNA insert [3, 4]. Some recent reports have shown that the 3’ UTRs of genes can play an important role in regulating gene expression [12]. For deleting 3’ UTRs, a transcriptional terminator must be provided for the gene, otherwise aberrant termination could disrupt gene function. Therefore, a different priming site is used (21 bp, GCG AAT TTC TTA TGA TTT ATG), which amplifies the adh1 terminator found on pFA6a-based vectors [3, 4].
- Designing forward C-terminal tagging primer (Fig. 1B): This primer is used for C-terminal tagging of CDSs and is designed by extracting N bp upstream of the stop codon. We also provide two versions of this primer, with and without the extra 20 bp (CGG ATC CCC GGG TTA ATT AA) sequence for amplification from a pFA6a-series vector [3, 4].
- Designing Reverse primer (Fig. 1A and 1B): This primer is used for both deleting and C-terminal tagging of genes. For CDSs, this primer is the reverse complement of the stop codon plus N-3 bp of sequence downstream of the stop. For ncRNAs or other genomic features, this primer is the reverse complement of N bp downstream of the end coordinate. These primers use GAA TTC GAG CTC GTT TAA AC, the 20bp constant priming site used to amplify DNA from the pFA6a-series vectors.
- Creating the number and list of overlapping genes/ncRNAs: The average distance among genes and ncRNAs is short in yeast [12], so creating a deletion can result in the partial loss of a neighboring open reading frame (ORF). In PRIMED, we note instances in which the deletion of a feature impacts the integrity of a neighboring gene, coding sequence (CDS) or ncRNA (S1-S6 Tables, columns M-R).
- Primer database: All primers for each strain are saved as a text file, which we later compiled as an Excel file (Fig. 2). They are presented in two versions, with or without the constant priming sequences found on the pFA6a-based vectors. Each primer output file provides the basic information of the given feature, including start/stop coordinates, chromosome identity and direction of transcription. We also provide the name of any neighboring CDS or ncRNA that may be disrupted by the deletion.
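The extraction step in the list above can be sketched as follows. This is a minimal illustration and not the GRS code itself: the function names `parse_gff3_features` and `extract_with_flanks` are hypothetical, and the sketch assumes the standard nine-column, tab-delimited GFF3 layout with 1-based inclusive coordinates and a chromosome already loaded as a Python string.

```python
def parse_gff3_features(lines, feature_type):
    """Collect (chromosome, start, end, strand, attributes) for one feature type.

    Assumes the standard nine tab-separated GFF3 columns with 1-based,
    inclusive coordinates; comment lines start with '#'.
    """
    features = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        features.append((cols[0], int(cols[3]), int(cols[4]), cols[6], cols[8]))
    return features


def extract_with_flanks(chrom_seq, start, end, n):
    """Return the feature plus n bp on each side, read on the forward strand.

    Positions that fall outside the chromosome are filled with 'N',
    mirroring the padding behaviour described in the text.
    """
    left = start - 1 - n      # 0-based index of the first flanking base
    right = end + n           # 0-based exclusive end of the 3' flank
    pad_left = "N" * max(0, -left)
    pad_right = "N" * max(0, right - len(chrom_seq))
    core = chrom_seq[max(0, left):min(len(chrom_seq), right)]
    return pad_left + core + pad_right
```

In a GFF3 file, a gene with introns is typically annotated as several CDS rows; this sketch returns each row separately and does not merge them into one gene-level interval, which a full implementation would need to do.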
In all databases, we provide the systematic and common names, chromosome number, start/end coordinates, transcription strand and forward and reverse primer sequences for each feature under consideration. For the deletion databases, the name(s) and total number of overlapping ORFs that are disrupted by the gene deletion are also indicated.
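The overlap report can be illustrated with a simple interval test. The paper does not state the exact criterion GRS applies, so this sketch assumes that a neighbor counts as disrupted when its annotated coordinates intersect the coordinates of the deleted feature on the same chromosome; the function name `find_overlaps` and the tuple layout are hypothetical.

```python
def find_overlaps(target, features):
    """Return the names of features whose coordinates intersect the target.

    `target` and every item of `features` are (name, chromosome, start, end)
    tuples with 1-based inclusive coordinates. The target itself and
    features on other chromosomes are ignored.
    """
    name, chrom, start, end = target
    hits = []
    for other_name, other_chrom, other_start, other_end in features:
        if other_name == name or other_chrom != chrom:
            continue
        # Two closed intervals intersect when each starts before the other ends.
        if other_start <= end and other_end >= start:
            hits.append(other_name)
    return hits
```

The length of the returned list gives the total number of overlapping features, and the list itself gives their names, which are the two pieces of information described for the deletion databases.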
In summary, GRS was used to generate long primers for all strains and features listed in Table 1 (S1-S6 Tables).
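The primer rules described in the steps above can be sketched in a few lines. This is an illustration under stated assumptions rather than the GRS implementation: `design_primers` and `reverse_complement` are hypothetical names, and the input `region` is assumed to be the feature read 5’ to 3’ on its own strand with N bp of flanking sequence on each side. The two priming-site constants are the 20 bp sequences quoted in the text.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

FWD_SITE = "CGGATCCCCGGGTTAATTAA"   # forward priming site quoted in the text
REV_SITE = "GAATTCGAGCTCGTTTAAAC"   # reverse priming site quoted in the text


def reverse_complement(seq):
    """Reverse complement of a DNA string (N is kept as N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_primers(region, n, is_cds=True):
    """Return (forward deletion, forward tagging, reverse) homology arms.

    For a CDS the arms include the start or stop codon plus n-3 bp of
    flanking sequence; for other features they are n bp of pure flank.
    The tagging arm is None for non-coding features.
    """
    feature_start = n                  # index of the first feature base
    feature_end = len(region) - n      # exclusive index of the last feature base
    inside = 3 if is_cds else 0        # codon bases kept inside the arm
    fwd_del = region[feature_start - (n - inside):feature_start + inside]
    fwd_tag = region[feature_end - 3 - n:feature_end - 3] if is_cds else None
    rev = reverse_complement(
        region[feature_end - inside:feature_end + (n - inside)]
    )
    return fwd_del, fwd_tag, rev
```

For a feature on the minus strand, the extracted region would first be reverse-complemented so that it reads 5’ to 3’ on the transcribed strand. The vector-ready version of a primer is then the homology arm followed by the priming site, for example `fwd_del + FWD_SITE`, so that the priming site sits at the 3’ end of the oligonucleotide.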
The Python script GRS is customizable with different input parameters:
>python primer.py INPUT1 INPUT2 INPUT3
INPUT1 is an integer from 0 to 5 which specifies the input genome (0 = pombe, 1 = cerevisiae S288C, 2 = cerevisiae RM11–1A, 3 = cerevisiae SK1, 4 = cerevisiae W303 and 5 = cerevisiae Y55). Other input files can be used with our program with minor modifications. Therefore, GRS gives the user the flexibility to use custom (e.g. the most updated versions of the fission and budding yeast genomes) or other annotated (e.g. C. albicans) genomes for analysis.
INPUT2 determines the type of genomic feature to be analyzed by GRS (1 = CDS, 2 = ncRNA, 3 = 3’UTR (only for pombe), and 4 = tRNA). GRS scans the provided annotation files (gff3) to extract the genome coordinates and sequence for the specified feature.
INPUT3 determines the desired length of neighboring sequence (N in bp), extracted from the sequence files based on the feature coordinates. We set N to 80 for pombe and 50 for cerevisiae, but the user can change it easily depending on the application. For example, the primers for pombe CDS with 80 bp homology (>python primer.py 0 1 80), and cerevisiae ncRNA with 50 bp homology (>python primer.py 1 2 50) can be generated (S1 Appendix). The resulting output text files are Excel-importable.
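The three positional inputs could be validated along the following lines. The paper does not show how GRS reads its arguments, so this is a hypothetical sketch using the standard `argparse` module; the dictionaries and the names `parse_args`, `GENOMES` and `FEATURES` are illustrative, and only the numeric codes are taken from the text.

```python
import argparse

# Numeric codes as listed in the text; the labels are for display only.
GENOMES = {
    0: "S. pombe",
    1: "S. cerevisiae S288C",
    2: "S. cerevisiae RM11-1A",
    3: "S. cerevisiae SK1",
    4: "S. cerevisiae W303",
    5: "S. cerevisiae Y55",
}
FEATURES = {1: "CDS", 2: "ncRNA", 3: "3'UTR", 4: "tRNA"}


def parse_args(argv):
    """Parse INPUT1 (genome), INPUT2 (feature type) and INPUT3 (N in bp)."""
    parser = argparse.ArgumentParser(prog="primer.py")
    parser.add_argument("genome", type=int, choices=sorted(GENOMES))
    parser.add_argument("feature", type=int, choices=sorted(FEATURES))
    parser.add_argument("homology", type=int)
    args = parser.parse_args(argv)
    if args.feature == 3 and args.genome != 0:
        parser.error("3'UTR is only available for S. pombe (genome 0)")
    if args.homology < 3:
        parser.error("homology length must be at least 3 bp")
    return args
```

With this sketch, `parse_args(["0", "1", "80"])` corresponds to the pombe CDS example in the text, while an unsupported combination such as a 3’UTR request for a cerevisiae genome exits with an error message instead of producing an empty database.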
The script generates three files for a given analysis (Fig. C in S1 Appendix).
File 1 (the header): In this file, genome coordinates, number of features under analysis and structure of the other output files are provided.
File 2 (the primer database): In this file, primers for deleting or C-terminal tagging each gene/feature are provided in addition to chromosome number, start and end coordinates and direction of transcription for each feature.
File 3 (a check file): In this file, 5’ and 3’ end regions and the entire sequence of the feature are provided for easy verification with the available genome browser tools (S1 Appendix and S1-S6 Tables). We checked our results via nucleotide BLAST for accuracy [14].
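An Excel-importable primer file of the kind described above can be produced as tab-delimited text. The sketch below is an assumption about one reasonable layout, not the actual GRS output format: the column names in `COLUMNS` and the function name `write_primer_table` are hypothetical.

```python
import csv

# Hypothetical column layout; the real GRS files may order or name these differently.
COLUMNS = [
    "systematic_name",
    "chromosome",
    "start",
    "end",
    "strand",
    "forward_primer",
    "reverse_primer",
]


def write_primer_table(handle, records):
    """Write one header row and one tab-delimited row per feature."""
    writer = csv.DictWriter(
        handle, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
```

When writing to disk, the file would be opened with `open(path, "w", newline="")` so that the `csv` module controls the line endings; the resulting text file can then be opened in Excel as tab-delimited data.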
Results
The Databases
PRIMED was constructed for the main S. pombe (972 / ATCC 24843) and five of the most commonly used S. cerevisiae strains (S288C, RM11–1A, SK1, W303, Y55) (S1-S6 Tables, respectively). For each genome, primers are provided in a separate Excel file. Each Excel file is organized into four sheets, named CDS, ncRNA, 3’UTR (pombe only) and tRNA, indicating the genomic feature for which primers were designed. Systematic and common names, chromosome number, genome coordinates, forward and reverse primers, transcriptional directionality and potential overlap with neighboring ORFs are provided for each feature. Individual features can be found easily by entering the systematic or common name (for CDSs) in the “Search” feature in Excel. All six databases are provided as downloadable Excel files.
Conclusions
Here we present PRIMED, the complete PRIMEr Database for deleting and C-terminal tagging the main S. pombe and five of the most commonly used S. cerevisiae strains. These downloadable Excel files are accurate and comprehensive, and also include deletion primers for all ncRNAs. To create PRIMED, we developed the open-source, Python-based Genome Retrieval Script (GRS). GRS uses whole genome and annotation files to extract coordinate and sequence information for any annotated feature. It allows users to extract sequence information from neighboring chromosomal regions at customizable lengths. Slight modifications to GRS can expand its application to custom or other annotated genomes and enable beginners and advanced users to perform a variety of genomic analyses easily. We believe that GRS can act as a suitable platform for the development of other genomic tools by the scientific community. Overall, PRIMED and GRS are valuable resources for the yeast research community: they remove the need for future primer design and save considerable time in large-scale deletion or tagging studies.
Website
GRS is provided in two downloadable files (S1 File, Genome Retrieval Script (GRS) code, and S2 File, Genome Retrieval Script (GRS) Read Me file) in this paper. PRIMED are provided in S1-S6 Tables. GRS, PRIMED and genome sequence files used to generate PRIMED are available for download at www.massgeneral.org/motamedilab.
Supporting Information
S1 Appendix. Supporting Information text and figures.
Fig. A, Extracting name and length of a chromosome from genome sequence files. Red bold characters denote extracted information. For S. pombe, chromosome number and length are extracted from the sequence file, whereas for S. cerevisiae, chromosome length is calculated by counting the number of characters in the chromosome sequence. Fig. B, Extracting information about a genomic feature from annotation files. The script scans through the annotation file by “type”, and then extracts chromosome number, start/end coordinates, directionality and the systematic name for each feature. Fig. C, A sample of the output files generated using GRS. The output files shown above are for the S. pombe CDS database. (A) Header files contain genome coordinates, total number of features under analysis in the genome and structure of the other output files. (B) Primer files show all forward and reverse primers for deleting or tagging genes. The deletion primer databases also show if deleting the gene of interest disrupts neighboring ORFs. (C) The check file shows the 5’ and 3’ end regions of a feature for easy verification. Fig. D, Extracting other genomic features using GRS. Upper panel is a screenshot of the Read Me file. The script can generate sequence information for all CDSs, ncRNAs, 3’UTRs and tRNAs and the desired sequence length from neighboring regions. In addition, with minor modification, it can handle other genomic features or can be adapted to analyze another annotated yeast genome. Lower panel is a screenshot of the Python script for GRS. It shows an example of the comments provided with the GRS code. These comments explain how the code can be modified to analyze custom genomes or other genomic features in an annotated genome.
https://doi.org/10.1371/journal.pone.0116657.s001
(DOCX)
S2 File. Genome Retrieval Script (GRS) Read Me file.
https://doi.org/10.1371/journal.pone.0116657.s003
(TXT)
S1 Table. PRIMED for S. pombe (972 / ATCC 24843).
https://doi.org/10.1371/journal.pone.0116657.s004
(XLSX)
Acknowledgments
We thank Martin Aryee, I Calvo, I Hill, J Khanduja, and A Shukla for the critical reading of the manuscript.
Author Contributions
Conceived and designed the experiments: MM. Performed the experiments: MTC RIJ. Analyzed the data: MTC RIJ MM. Contributed reagents/materials/analysis tools: MTC RIJ. Wrote the paper: MTC RIJ MM. Designed the software (GRS) used to generate the primer databases: MTC RIJ. Produced and tested the databases: MTC RIJ.
References
- 1. Rothstein R (1991) Targeting, disruption, replacement, and allele rescue: Integrative DNA transformation in yeast. Methods in Enzymol 194: 281–301.
- 2. Sabatinos SA, Forsburg SL (2010) Molecular Genetics of Schizosaccharomyces pombe. Methods in Enzymol 470: 759–795.
- 3. Bähler J, Wu JQ, Longtine MS, Shah NG, McKenzie A, et al. (1998) Heterologous modules for efficient and versatile PCR-based gene targeting in Schizosaccharomyces pombe. Yeast 14: 943–951. pmid:9717240
- 4. Longtine MS, McKenzie A, Demarini DJ, Shah NG, Wach A, et al. (1998) Additional modules for versatile and economical PCR-based gene deletion and modification in Saccharomyces cerevisiae. Yeast 14: 953–961. pmid:9717241
- 5. Penkett CJ, Birtle ZE, Bähler J (2006) Simplified primer design for PCR-based gene targeting and microarray primer database: two web tools for fission yeast. Yeast 23: 921–928. pmid:17072893
- 6. Cherry JM, Hong EL, Amundsen C, Balakrishnan R, Binkley G, et al. (2012) Saccharomyces Genome Database: the genomics resource of budding yeast. Nucl Acids Res 40: D700–5. pmid:22110037
- 7. Wood V, Gwilliam R, Rajandream M-A, Lyne M, Lyne R, et al. (2002) The genome sequence of Schizosaccharomyces pombe. Nature 415: 871–880. pmid:11859360
- 8. Engel SR, Dietrich FS, Fisk DG, Binkley G, Balakrishnan R, et al. (2014) The reference genome sequence of Saccharomyces cerevisiae: then and now. G3 4: 389–398. pmid:24374639
- 9. Durbin R, Louis E (2008) Saccharomyces genome resequencing http://www.sanger.ac.uk/research/projects/genomeinformatics/sgrp.html Accessed 1 August 2014.
- 10. Kaur R, Ingavale SS, Bachhawat AK (1997) PCR-mediated direct gene disruption in Schizosaccharomyces pombe. Nucl Acids Res 25: 1080–1081. pmid:9023122
- 11. Lafontaine D, Tollervey D (1996) One-step PCR mediated strategy for the construction of conditionally expressed and epitope tagged yeast proteins. Nucl Acids Res 24: 3469–3471. pmid:8811105
- 12. Yu R, Jih G, Iglesias N, Moazed D (2014) Determinants of heterochromatic siRNA biogenesis and function. Mol Cell 53: 262–276. pmid:24374313
- 13. Wood V, Harris MA, McDowall MD, Rutherford K, Vaughan BW, et al. (2012) PomBase: a comprehensive online resource for fission yeast. Nucl Acids Res 40: D695–9.
- 14. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215: 403–410. pmid:2231712