Epigenetic regulation consists of a multitude of different modifications that determine active and inactive states of chromatin. Conditions such as cell differentiation or exposure to environmental stress require concerted changes in gene expression. To interpret epigenomics data, a spectrum of different interconnected datasets is needed, ranging from the genome sequence and positions of histones, together with their modifications and variants, to the transcriptional output of genomic regions. Here we present a tool, Podbat (Positioning database and analysis tool), that incorporates data from various sources and allows detailed dissection of the entire range of chromatin modifications simultaneously. Podbat can be used to analyze, visualize, store and share epigenomics data. Among other functions, Podbat allows data-driven determination of genome regions of differential protein occupancy or RNA expression using Hidden Markov Models. Comparisons between datasets are facilitated to enable the study of the comprehensive chromatin modification system simultaneously, irrespective of data-generating technique. Any organism with a sequenced genome can be accommodated. We exemplify the power of Podbat by reanalyzing all to-date published genome-wide data for the histone variant H2A.Z in fission yeast together with other histone marks and also phenotypic response data from several sources. This meta-analysis led to the unexpected finding of H2A.Z incorporation in the coding regions of genes encoding proteins involved in the regulation of meiosis and genotoxic stress responses. This incorporation was partly independent of the H2A.Z-incorporating remodeller Swr1. We verified an Swr1-independent role for H2A.Z following genotoxic stress in vivo. Podbat is open source software freely downloadable from www.podbat.org, distributed under the GNU LGPL license. 
User manuals, test data and instructions are available at the website, as well as a repository for third party–developed plug-in modules. Podbat requires Java version 1.6 or higher.
Citation: Sadeghi L, Bonilla C, Strålfors A, Ekwall K, Svensson JP (2011) Podbat: A Novel Genomic Tool Reveals Swr1-Independent H2A.Z Incorporation at Gene Coding Sequences through Epigenetic Meta-Analysis. PLoS Comput Biol 7(8): e1002163. doi:10.1371/journal.pcbi.1002163
Editor: Andreas Prlić, University of California San Diego, United States of America
Received: March 28, 2011; Accepted: May 26, 2011; Published: August 25, 2011
Copyright: © 2011 Sadeghi et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was supported by grants from the Swedish Cancer Society, the Göran Gustafsson foundation for research in natural sciences and medicine to KE, and from the Swedish research council to KE and JPS. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Epigenetic regulation consists of a multitude of interconnected modifications that determine the active and inactive states of chromatin. Chromatin is modified in a variety of ways, such as post-translational histone modifications and reorganization of histone variants, but also DNA methylation and control of the DNA helical torsion. Pivotal cellular changes, such as those that occur during cellular differentiation, cell cycle arrest or programmed cell death induced by genotoxic stress, require prompt and accurate changes in gene expression patterns. The co-dependency of chromatin modifications makes it desirable to analyze the entire range of modifications simultaneously and, in addition, to relate it to transcriptional levels. In fission yeast, Schizosaccharomyces pombe, chromatin regulation has been widely studied using epigenomics techniques. However, as studies have used different platforms, it is challenging to combine datasets without losing the details of the original data. The large data volume from genomic experiments makes the analysis cumbersome even for model organisms with relatively small genomes. Several tools are available to visualize the data, and commercial software (Partek Inc., St Louis, MO; DNAnexus Inc., Palo Alto, CA; GeneSpring GX, Agilent Technologies Inc., Santa Clara, CA; Avadis NGS, Strand Scientific Intelligence, San Francisco, CA) is useful for analysis but usually cannot easily be extended to address researcher-generated questions. In particular, few algorithms can identify regions where proteins are bound to DNA in a multi-sample dataset. In combination, this generally results in a need for informatics support and ad hoc script implementation to explore the experimental data.
We set out to develop an integrated computational tool, Podbat (Positioning database and analysis tool), for use on epigenomics datasets. Podbat is freely available and open source, focuses on scientific clarity and user-friendliness, and implements robust and cutting-edge algorithms. We exemplify the use of Podbat by reanalyzing all to-date published genome-wide occupancy datasets for the histone variant H2A.Z in S. pombe in WT and swr1Δ cells, together with transcriptional data after different perturbations (Table 1). To exemplify the possibility of cross-species comparisons, datasets from the related budding yeast, Saccharomyces cerevisiae, are included in the analysis. H2A.Z is encoded by the pht1+ gene in S. pombe and is a replacement variant of the core histone H2A. It is known that H2A.Z is incorporated into nucleosomes close to promoters by the chromatin remodeler Swr1, marking genes poised for activation. Recently, however, H2A.Z has been identified as playing a role in transcription elongation and also in Swr1-independent functions following genotoxic insult. The Podbat analysis presented here led to unexpected findings regarding H2A.Z-mediated gene regulation.
Design and Implementation
In order to interpret epigenomics data, Podbat follows a simple flowchart (Figure 1) and implements a flexible genome browser at its core. The main view of the user interface (Figure S1) consists of three core panels: 1) the ‘genome panel’, which displays the chromosomes and the selected regions together with the associated data in a visual manner; 2) the ‘selected element panel’, which displays a spreadsheet with the elements (genes, siRNA, promoter regions or any other element of interest) together with associated data in numerical or text format; 3) the ‘control panel’, where the user can preprocess data, define and apply filters and create sets of selected genetic elements.
The controller of the core component is the hub of the application. The core handles input and output from data files and independent modules. Text files with genomic data are imported into Podbat.
Podbat was optimized using the yeast genomes of S. cerevisiae and S. pombe. Both genomes are relatively compact, 12 Mb and 13 Mb respectively. However, any organism with a sequenced genome can be uploaded. Genomes can easily be imported and updated, as Podbat connects directly to Ensembl, facilitating the use of the latest available annotations. Once the genome is set, experimental data is loaded and superimposed. Data from different sources may be visualized and analyzed simultaneously. Files can be uploaded in standard formats used for sequencing and microarray experiments. Once the data is loaded, regions can be identified based on the input data (Figure 2A). Two algorithms are available for this purpose: i) Hidden Markov Models (HMM) and ii) sliding windows. The HMM parameters can be set ad hoc or estimated using the Baum-Welch algorithm. Automated estimation of the parameters leads to robust classification results. As chromatin immunoprecipitation (ChIP) experiments generate DNA fragments of considerable length (typically >100 bp), regions defined in different samples will be variable by nature. By consecutive determination of regions in all samples, regions can be merged to detect biologically relevant fragments. When analyzing RNA data, this allows the detection of all transcribed regions, even if not previously described or in genomes where annotations are not available. The regions can be quantified (reflecting the expression of genes or binding to genetic elements), filtered and compared between experiments. The output is provided in a spreadsheet format for further analysis.
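The HMM-based region identification can be illustrated with a minimal sketch. The Python code below is not Podbat's implementation (Podbat is written in Java). It assumes a two-state HMM (background and enriched) with Gaussian emissions whose parameters are set by hand rather than estimated with Baum-Welch, decodes the most likely state path with the Viterbi algorithm, and keeps enriched runs longer than a length threshold. The function names, the `stay_prob` parameter and all example values are hypothetical.

```python
import math


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution with the given mean and sd."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi_two_state(signal, means, sds, stay_prob=0.9):
    """Most likely state path (0 = background, 1 = enriched).

    Assumed model: symmetric transitions (stay_prob to remain in a state),
    uniform initial distribution, Gaussian emissions per state.
    """
    log_stay = math.log(stay_prob)
    log_switch = math.log(1.0 - stay_prob)
    # score[s]: best log-probability of any path ending in state s
    score = [math.log(0.5) + gaussian_logpdf(signal[0], means[s], sds[s])
             for s in range(2)]
    backpointers = []
    for x in signal[1:]:
        new_score, pointers = [], []
        for s in range(2):
            candidates = [score[p] + (log_stay if p == s else log_switch)
                          for p in range(2)]
            best_prev = 0 if candidates[0] >= candidates[1] else 1
            pointers.append(best_prev)
            new_score.append(candidates[best_prev]
                             + gaussian_logpdf(x, means[s], sds[s]))
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def call_regions(positions, path, min_length=100):
    """Collapse runs of state 1 into (start, end) regions longer than min_length bp."""
    regions = []
    start, end = None, None
    for pos, state in zip(positions, path):
        if state == 1:
            if start is None:
                start = pos
            end = pos
        elif start is not None:
            if end - start > min_length:
                regions.append((start, end))
            start = None
    if start is not None and end - start > min_length:
        regions.append((start, end))
    return regions


# Hypothetical probes every 50 bp; one enriched block and one isolated spike
positions = list(range(0, 800, 50))
signal = [0.0, 0.1, -0.1, 0.0, 0.1, 0.9, 1.0, 1.1,
          0.8, 1.0, 0.9, 0.1, 0.0, 1.0, 0.0, -0.1]
path = viterbi_two_state(signal, means=(0.0, 1.0), sds=(0.3, 0.3))
print(call_regions(positions, path, min_length=100))  # [(250, 500)]
```

In this toy example the isolated high probe at 650 bp does not yield a region because it fails the length threshold, which mirrors the role of the region length filter described in the text. Merging regions across samples and Baum-Welch estimation are left out of the sketch.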
A) Regions (red squares) with high levels of H2A.Z are identified using Hidden Markov Models. Many of the promoter regions show some H2A.Z occupancy. The top panel displays the genomic overview; the bottom panel shows a zoomed-in fraction of the genome. ORFs are displayed as black boxes. B) The genome average shows that most H2A.Z is found at the promoter region of genes, and this is dependent on Swr1. The data is aligned at the Translation Start Site (TSS). C) The active mark H3K4me coincides with H2A.Z, but the repressive mark H3K9me does not. Error bars represent the 95% confidence interval.
Additionally, Podbat allows data alignment of gene sets by start or end position (Figure 2B–C). Using this functionality, one can visualize, e.g., protein binding within a region surrounding a fixed position, such as the start codon. To analyze different sets of genes, tests of enrichment can be performed between gene sets. This is a powerful method for finding correlations between and within datasets. Gene sets can be defined manually from previous experiments or from public databases such as GO (Gene Ontology) or KEGG (Kyoto Encyclopedia of Genes and Genomes).
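As an illustration of the alignment described in this section, the following Python sketch averages a signal at fixed offsets around an anchor position (for example the start codon) across a gene set and reports a 95% confidence half-width under a normal approximation. It is not Podbat code, and how Podbat computes its confidence intervals is not stated here. The data layout (a dictionary from position to signal value on a single chromosome), the strand encoding (+1 or -1) and the function name are assumptions made for the example.

```python
from math import sqrt
from statistics import mean, stdev


def aligned_profile(signal, anchors, offsets):
    """Average signal around anchor positions.

    signal  : dict mapping genomic position -> value (hypothetical layout)
    anchors : list of (position, strand) with strand +1 or -1
    offsets : offsets in bp relative to the anchor, in gene orientation
    Returns {offset: (mean, 95% CI half-width)} for offsets with >= 2 values.
    """
    profile = {}
    for offset in offsets:
        values = []
        for position, strand in anchors:
            # On the minus strand, "downstream" means decreasing coordinates
            value = signal.get(position + strand * offset)
            if value is not None:
                values.append(value)
        if len(values) >= 2:
            half_width = 1.96 * stdev(values) / sqrt(len(values))
            profile[offset] = (mean(values), half_width)
    return profile


# Hypothetical data: two genes on opposite strands, probes every 50 bp
signal = {100: 0.0, 150: 1.0, 200: 0.5, 400: 0.3, 450: 0.8, 500: 0.2}
anchors = [(150, +1), (450, -1)]
for offset, (avg, ci) in aligned_profile(signal, anchors, [-50, 0, 50]).items():
    print(offset, round(avg, 3), round(ci, 3))
```

Orienting the offsets by strand means that negative offsets always point towards the promoter side of the anchor, so genes on both strands contribute to the same part of the averaged profile.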
Data Storing and Sharing
In addition to saving sessions, the user has the option to store and share datasets by exporting the data into a MySQL database. The database connects seamlessly with the Podbat user interface. For unpublished data, users can create a local database that is accessible only for data sharing within a smaller community. To share data publicly, a central database can be accessed. Through public or private central data storage, the software facilitates meta-analysis and the incorporation of data from previous experiments. Several datasets for histone modifications and RNA transcription in various mutants of S. pombe are currently centrally uploaded and available for comparison.
Podbat functionality can be added by third party developed plug-in modules. These modules may extend the methods of the application core and tailor analysis. Development guidelines and Podbat application programming interface (API) are available at http://www.podbat.org. Validated and tested plug-in modules may be accommodated on this webpage.
Swr1 Rearranges Nucleosomes and Incorporates H2A.Z as an Active Mark
S. pombe and S. cerevisiae H2A.Z occupancy data from different sources and four experimental platforms was imported into Podbat (Table 1). We show here the results from one of the S. pombe datasets, but the other data is consistent with these results (Supplemental material). As expected, most genes had an Swr1-dependent H2A.Z peak near the first transcribed nucleosome (Figure 2A,B). The average value representing the enrichment of H2A.Z binding for these regions was 0.76±0.25 (mean±s.d.), which was reduced to 0.04±0.17 in the swr1Δ strain. The genome-wide enrichment of H2A.Z in WT was 0.06±0.45 at ORFs. Using HMM, we identified 157 genomic regions of H2A.Z occupancy longer than 100 bp and with at least a 2.5-fold increase over background (Table S1). HMM parameters were estimated by the Baum-Welch algorithm. The H2A.Z mark coincides with methylation of histone H3 lysine 4 (H3K4me), a mark of active chromatin, but does not correlate with the repressive mark of methylated lysine 9 (H3K9me) (Figure 2C). The regions of enriched H2A.Z occupancy overlap with 214 genes (Table S2). The Gene Ontology annotations of these genes showed enrichment for two broadly defined categories: cell cycle maintenance (both mitotic and meiotic) and DNA damage response. Genes involved in transport, mitochondrial function and structure, and metabolism were also overrepresented among the H2A.Z-bound regions (Table 2 and Table S3).
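Overrepresentation of an annotation category among a set of genes is commonly assessed with a hypergeometric test, the distribution that is also named for the overlap P-values in Figure S4. The following Python sketch computes the upper-tail probability of observing at least a given overlap between two gene sets drawn from the same genome. Whether Podbat's enrichment test takes exactly this form is an assumption; the function name and the toy numbers are hypothetical and are not taken from the datasets analyzed here.

```python
from math import comb


def hypergeom_enrichment_pvalue(n_genome, n_set_a, n_set_b, n_overlap):
    """P(X >= n_overlap) for X ~ Hypergeometric.

    n_genome : total number of genes considered
    n_set_a  : size of the first gene set (e.g. an annotation category)
    n_set_b  : size of the second gene set (e.g. bound genes)
    n_overlap: observed number of genes shared by both sets
    """
    total = comb(n_genome, n_set_b)
    upper = min(n_set_a, n_set_b)
    # Sum the probabilities of every overlap at least as large as observed
    tail = sum(comb(n_set_a, k) * comb(n_genome - n_set_a, n_set_b - k)
               for k in range(n_overlap, upper + 1))
    return tail / total


# Toy example: 10 genes, sets of 4 and 3 genes, 2 genes in common
print(round(hypergeom_enrichment_pvalue(10, 4, 3, 2), 4))  # 0.3333
```

Exact integer binomial coefficients are used so that the tail sum does not lose precision before the final division; for genome-scale inputs the result may still underflow to 0.0 in floating point when the P-value is extremely small.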
During the Cell Cycle, the H2A.Z Distribution Is Modified by Swr1
To further examine the genome-wide H2A.Z occupancy, we consulted all published genome-wide data on cell cycle and stress response in S. pombe (Table 1). The data from previously identified gene sets were aligned at the Translation Start Sites (TSS). We noted that most gene sets deviated minimally from the average gene profile, i.e. H2A.Z occupancy was found in the promoter regions in an Swr1-dependent fashion. Interestingly, cell cycle-specific gene sets revealed that the genes up-regulated in S and early G2 phase of the cell cycle were depleted of H2A.Z in promoter regions and within the coding sequence (Figure 3A). This effect was not related to differential expression of the genes (data not shown). Among meiotic genes, one specific gene set deviated: genes induced late after nitrogen starvation, before diploid cells enter meiosis. These genes had a pronounced Swr1-dependent H2A.Z occupancy in both promoters and gene coding sequences. This gene set is closely related to the set of genes under transcriptional control by Ste11p (Figure 3C), a transcription factor critical for regulation of mating-type specific genes during meiosis. The subset of mating-type specific Ste11p-controlled genes showed an even more pronounced H2A.Z binding (Figure 3B).
A) Genes that are up-regulated in S and (early) G2 show H2A.Z depletion within the coding sequence (error bars not shown for clarity). B) Genes induced late after nitrogen starvation show a distribution of H2A.Z in gene coding regions. This effect is even more pronounced in the Ste11p-induced mating type-specific genes. Error bars represent the 95% confidence interval. C) Venn diagram showing the overlap between genes induced after nitrogen starvation and genes under Ste11p control, including the subset which is mating type-specific (m.t.-s.).
Swr1-Independent H2A.Z Incorporation Relevant to Genotoxic Stress Response
The coding regions of stress-responsive genes were associated with a slight but significant (p<10^−28) H2A.Z occupancy in cells lacking Swr1 (data not shown). Also, analysis of H2A.Z deposition in S. cerevisiae revealed significant overlap between gene coding sequences bound by the H2A.Z homolog and genes differentially transcribed after environmental stress (Figure S4). Dissection of the stress response in S. pombe revealed that Swr1-independent H2A.Z binding at gene coding regions was evident after genotoxic stress (Figure 4A). Genes induced after exposure to methyl methanesulfonate (MMS) and cisplatin showed similar patterns. Recently, it was shown in S. cerevisiae that the roles of the Swr1-containing complex and H2A.Z become decoupled after treatment with MMS. As cells are damaged, H2A.Z becomes deacetylated. We investigated the H2A.Z occupancy profile of genes that are affected by changes in acetylation of H2A.Z. A large proportion of these genes are MMS-induced (Figure 4B). We found that genes that are up-regulated upon deletion of the acetylation residues (H2A.Z ΔN) follow the same pattern and have a strong Swr1-independent H2A.Z occupancy in gene coding regions (Figure 4A).
A) Genes induced by MMS have an Swr1-independent incorporation of H2A.Z in their gene coding regions. This is also true for genes that are up-regulated by deletion of the acetylatable N-terminal of H2A.Z (ΔN). Error bars represent the 95% confidence interval. B) Venn diagram showing the overlap between the different gene sets. C) Cells lacking H2A.Z (pht1Δ) have a slight sensitivity to high levels of MMS-induced genotoxic stress. Mutant and WT cells were diluted 1∶10, plated on YES with different doses of MMS (0%, 0.003%, 0.006%). D) Colony size determination reveals an SWR1-independent effect of deleting H2A.Z (pht1Δ) after treatment with MMS. Asterisk (*) marks a significant genetic interaction (p<0.001). At least 20 colonies were measured each in 5 independent experiments.
To determine whether the Swr1-independent H2A.Z functions are relevant for the cellular phenotype, we tested the sensitivity of mutant S. pombe cells to different conditions. Cells devoid of H2A.Z, Swr1 or both were exposed to MMS (Figure 4C). At first glance, pht1Δ cells (lacking H2A.Z) appear to be sensitive, but even at high doses many small colonies can be observed. Reintroduction of the pht1+ gene rescues the phenotype. The swr1Δ and the pht1Δswr1Δ cells show the same sensitivity as WT. More importantly, determination of colony size under basal conditions and after MMS exposure (Figure 4D) reveals a suppressive genetic interaction between pht1+ and swr1+ (with a score +2.0, p<0.001, calculated as in ). However, as in the related yeast S. cerevisiae, this interaction is reduced in the MMS-treated condition (score +1.4, p>0.05), as expected from our computational analysis, confirming an Swr1-independent effect of H2A.Z after genotoxic damage.
In summary, the results presented here confirm an Swr1-independent pattern of H2A.Z distribution, relevant for the response to DNA damage.
Availability and Future Directions
Podbat is open source software implemented as a desktop application in Java and can be freely and anonymously downloaded from www.podbat.org or as supplemental material accompanying this paper (Software S1). Podbat is distributed under the GNU LGPL license. User manuals, test data and instructions are available at the website, as well as a repository for third party developed plug-in modules. Podbat is tested and runs on any platform that supports Java version 1.6 and higher. Podbat is an ongoing project and new modules are continuously being developed. Data used to generate results presented in this article are available as supplementary information.
Screenshot of Podbat with 5 datasets loaded. Four genes on chromosome II are highlighted in red. The signal for protein binding, RNA expression or histone modification has been quantified.
H2A.Z bound genes of different gene sets from the dataset GSM432595 of Zofall et al. are aligned at the Translation Start Site (TSS). H2A.Z bound genes were identified as overlapping with HMM-determined regions of increased H2A.Z signal (parameters as for the other datasets, except that Region length threshold = 20 instead of 100, since this array has more sparsely spaced probes). 94 regions were identified, overlapping with 438 genes. A) Genes differentially expressed during different stages of the cell cycle. B) Genes induced late after nitrogen starvation and mating-type specifically (m.t.-s.) regulated genes induced through Ste11p.
H2A.Z bound genes of different gene sets from the partial-genome dataset GSM432576 of Zofall et al. are aligned at the Translation Start Site (TSS). Only approximately 2 Mb of the genome (15%), consisting of chromosome II and three telomeric sequences (1L, 2L, 2R), is covered in this array. 994/5857 ORFs are within this region. H2A.Z bound genes were identified as overlapping with HMM-determined regions of increased H2A.Z signal (parameters as for the other datasets). 27 regions were identified, overlapping with 113 genes. A) Genes differentially expressed during different stages of the cell cycle. B) Genes induced late after nitrogen starvation and mating-type specifically (m.t.-s.) regulated genes induced through Ste11p. Note the small number of investigated genes in these gene sets.
Venn diagrams showing the overlap between genes where the coding regions are bound by H2A.Z homolog in S. cerevisiae (levels deviating 2-fold from genome average) and genes differentially regulated after environmental stress. P-values for overlaps are calculated by the hypergeometric distribution.
H2A.Z bound regions.
H2A.Z bound genes.
Complete list of GO terms enriched in the H2A.Z bound genes from .
GO terms enriched in the GSM432595 data  of H2A.Z bound genes.
The authors would like to thank members of the lab for software testing and helpful suggestions, and Jenna Persson for critically reading and commenting on the manuscript.
Conceived and designed the experiments: KE JPS. Performed the experiments: LS CB AS JPS. Analyzed the data: LS JPS. Contributed reagents/materials/analysis tools: AS. Wrote the paper: KE JPS.
- 1. Durand-Dubief M, Svensson JP, Persson J, Ekwall K (2011) Topoisomerases, chromatin and transcription termination. Transcription 2: 66–70.
- 2. Durand-Dubief M, Persson J, Norman U, Hartsuiker E, Ekwall K (2010) Topoisomerase I regulates open chromatin and controls gene expression in vivo. EMBO Journal 29: 2126–2134.
- 3. Sinha I, Wiren M, Ekwall K (2006) Genome-wide patterns of histone modifications in fission yeast. Chromosome Research 14: 95–105.
- 4. Integrative Genomics Viewer. http://www.broadinstitute.org/igv.
- 5. Karolchik D, Baertsch R, Diekhans M, Furey TS, Hinrichs A, et al. (2003) The UCSC Genome Browser Database. Nucleic Acids Research 31: 51–54.
- 6. Nicol JW, Helt GA, Blanchard SG, Raja A, Loraine AE (2009) The Integrated Genome Browser: free software for distribution and exploration of genome-scale datasets. Bioinformatics 25: 2730–2731.
- 7. Buchanan L, Durand-Dubief M, Roguev A, Sakalar C, Wilhelm B, et al. (2009) The Schizosaccharomyces pombe JmjC-Protein, Msc1, Prevents H2A.Z Localization in Centromeric and Subtelomeric Chromatin Domains. PLoS Genetics 5:
- 8. Halley JE, Kaplan T, Wang AY, Kobor MS, Rine J (2010) Roles for H2A.Z and Its Acetylation in GAL1 Transcription and Gene Induction, but Not GAL1-Transcriptional Memory. PLoS Biology 8:
- 9. Kim HS, Vanoosthuyse V, Fillingham J, Roguev A, Watt S, et al. (2009) An acetylated form of histone H2A.Z regulates chromosome architecture in Schizosaccharomyces pombe. Nature Structural & Molecular Biology 16: 1286–U1107.
- 10. Zofall M, Fischer T, Zhang K, Zhou M, Cui BW, et al. (2009) Histone H2A.Z cooperates with RNAi and heterochromatin factors to suppress antisense RNAs. Nature 461: 419–U120.
- 11. Chen DR, Toone WM, Mata J, Lyne R, Burns G, et al. (2003) Global transcriptional responses of fission yeast to environmental stress. Molecular Biology of the Cell 14: 214–229.
- 12. Mata J, Lyne R, Burns G, Bahler J (2002) The transcriptional program of meiosis and sporulation in fission yeast. Nature Genetics 32: 143–147.
- 13. Gatti L, Chen D, Beretta GL, Rustici G, Carenini N, et al. (2004) Global gene expression of fission yeast in response to cisplatin. Cellular and Molecular Life Sciences 61: 2253–2263.
- 14. Mata J, Bahler J (2006) Global roles of Ste11p, cell type, and pheromone in the control of gene expression during early sexual differentiation in fission yeast. Proceedings of the National Academy of Sciences of the United States of America 103: 15517–15522.
- 15. Oliva A, Rosebrock A, Ferrezuelo F, Pyne S, Chen HY, et al. (2005) The cell cycle-regulated genes of Schizosaccharomyces pombe. PLoS Biology 3: 1239–1260.
- 16. Peng X, Karuturi RKM, Miller LD, Lin K, Jia YH, et al. (2005) Identification of cell cycle-regulated genes in fission yeast. Molecular Biology of the Cell 16: 1026–1042.
- 17. Dutrow N, Nix DA, Holt D, Milash B, Dalley B, et al. (2008) Dynamic transcriptome of Schizosaccharomyces pombe shown by RNA-DNA hybrid mapping. Nature Genetics 40: 977–986.
- 18. Murai T, Nakase Y, Fukuda K, Chikashige Y, Tsutsumi C, et al. (2009) Distinctive Responses to Nitrogen Starvation in the Dominant Active Mutants of the Fission Yeast Rheb GTPase. Genetics 183: 517–527.
- 19. Rustici G, Mata J, Kivinen K, Lio P, Penkett CJ, et al. (2004) Periodic gene expression program of the fission yeast cell cycle. Nature Genetics 36: 809–817.
- 20. Gasch AP, Spellman PT, Kao CM, Carmel-Harel O, Eisen MB, et al. (2000) Genomic expression programs in the response of yeast cells to environmental changes. Molecular Biology of the Cell 11: 4241–4257.
- 21. Millar CB, Xu F, Zhang KL, Grunstein M (2006) Acetylation of H2AZ Lys 14 is associated with genome-wide gene activity in yeast. Genes & Development 20: 711–722.
- 22. Carr AM, Dorrington SM, Hindley J, Phear GA, Aves SJ, et al. (1994) Analysis of a Histone H2A Variant from Fission Yeast - Evidence for a Role in Chromosome Stability. Molecular & General Genetics 245: 628–635.
- 23. Mizuguchi G, Shen XT, Landry J, Wu WH, Sen S, et al. (2004) ATP-Driven exchange of histone H2AZ variant catalyzed by SWR1 chromatin remodeling complex. Science 303: 343–348.
- 24. Raisner RM, Hartley PD, Meneghini MD, Bao MZ, Liu CL, et al. (2005) Histone variant H2A.Z marks the 5′ ends of both active and inactive genes in euchromatin. Cell 123: 233–248.
- 25. Persson J, Ekwall K (2010) Chd1 remodelers maintain open chromatin and regulate the epigenetics of differentiation. Experimental Cell Research 316: 1316–1323.
- 26. Santisteban MS, Hang M, Smith MM (2011) Histone variant H2A.Z and RNA polymerase II transcription elongation. Mol Cell Biol: Feb 28. [Epub ahead of print].
- 27. Weber CM, Henikoff JG, Henikoff S (2010) H2A.Z nucleosomes enriched over active genes are homotypic. Nature Structural & Molecular Biology 17: 1500–U1136.
- 28. Bandyopadhyay S, Mehta M, Kuo D, Sung MK, Chuang R, et al. (2010) Rewiring of Genetic Networks in Response to DNA Damage. Science 330: 1385–1389.
- 29. Baum LE, Petrie T, Soules G, Weiss N (1970) A Maximization Technique Occurring in Statistical Analysis of Probabilistic Functions of Markov Chains. Annals of Mathematical Statistics 41: 164–&.
- 30. Humburg P, Bulger D, Stone G (2008) Parameter estimation for robust HMM analysis of ChIP-chip data. BMC Bioinformatics 9:
- 31. Sugimoto A, Iino Y, Maeda T, Watanabe Y, Yamamoto M (1991) Schizosaccharomyces pombe ste11+ Encodes a Transcription Factor with an HMG Motif That Is a Critical Regulator of Sexual Development. Genes & Development 5: 1990–1999.
- 32. Schuldiner M, Collins SR, Thompson NJ, Denic V, Bhamidipati A, et al. (2005) Exploration of the function and organization of the yeast early secretory pathway through an epistatic miniarray profile. Cell 123: 507–519.