ChIP-Chip Designs to Interrogate the Genome of Xenopus Embryos for Transcription Factor Binding and Epigenetic Regulation

Background
Chromatin immunoprecipitation combined with genome tile path microarrays or deep sequencing can be used to study genome-wide epigenetic profiles and the transcription factor binding repertoire. Although well studied in a variety of cell lines, these genome-wide profiles have so far been little explored in vertebrate embryos.
Principal Findings
Here we report on two genome tile path ChIP-chip designs for interrogating the Xenopus tropicalis genome. In particular, a whole-genome microarray design was used to identify active promoters by their close association with histone H3 lysine 4 trimethylation. A second microarray design features these experimentally derived promoter regions in addition to the currently annotated 5′ ends of genes. Binding of TBP, a key transcription initiation factor, confirms that these regions represent promoters.
Conclusions
A whole-genome and a promoter tile path microarray design were developed. Both designs can be used to study epigenetic phenomena and transcription factor binding in developing Xenopus embryos.


Introduction
Chromatin immunoprecipitation combined with either microarray hybridization (ChIP-chip) or sequencing (ChIP-seq) allows the genomic association of DNA-binding proteins to be determined and epigenetic regulation to be analyzed [1,2,3]. Active transcription coincides with epigenetic features such as histone H3 lysine 4 trimethylation (H3K4me3) and acetylation of lysine 9 of histone H3 (H3K9ac) at the 5′ end of transcriptionally active genes [4,5,6]. Little is known about epigenetic marks in developing embryos as assessed by ChIP profiling. H3K4me3 ChIP-chip in zebrafish embryos identified actively transcribed embryonic genes [7], and H3K4me3 was also detected at promoter regions in Drosophila embryos [8]. ChIP sequencing of H3K4me3 and of H3K27me3, a mark associated with inactive genes, was used to improve 5′ gene annotation in Xenopus and to analyze spatial regulation of gene expression during gastrulation [9]. Both marks appear only after the onset of transcription at the mid-blastula transition, with a hierarchy in deposition in which H3K4me3 precedes H3K27me3.
The challenge in the near future is to elucidate the epigenetic transitions and transcription networks that underlie early vertebrate development, an analysis that will be facilitated by genome-wide binding site analyses using either ChIP-chip or ChIP-seq. Genome tiling arrays are most cost-efficient if a dedicated design covering the relevant regulatory regions is used. Such a design can be based on annotation or, alternatively, on experimental data for histone modifications such as H3K4me3 that identify these regulatory regions. Here we present results obtained using a five-array whole-genome X. tropicalis tiling design and a promoter microarray design based on chromatin decorated with H3K4me3. The data show that both microarray designs are suitable for studies of histone modifications and transcription factor binding events in early Xenopus embryogenesis.

Genome-Wide ChIP-Chip Using X. tropicalis Gastrula Embryos
To explore epigenetic features of gastrula-stage Xenopus tropicalis embryos, chromatin was harvested (Nieuwkoop-Faber stage 11–12) and immunoprecipitated using H3K4me3 antibodies. The purified H3K4me3 ChIP DNA was amplified using a T7 amplification protocol [10] and hybridized to a newly designed NimbleGen whole-genome tile path set of five microarrays covering the complete X. tropicalis genome (Joint Genome Institute v4.1). Each microarray consists of 2.1M isothermal (76°C) probes of 50–75 bp in length, spaced by an average of 100 bp in the genome, with the exclusion of repeat-masked regions (Figure 1A). The H3K4me3 signal was present at the 5′ ends of genes, as shown for a typical euchromatic genomic region covering the genes ppil2, ypel1 and mapk1 (Figure 1B). To generate a single promoter microarray design (Figure 1C), all putative H3K4me3-positive regions (27,019 regions; see Materials and Methods for details) were selected, together with the 5′ ends of all 27,916 Joint Genome Institute FilteredModels genes (−2 kb/+2 kb). In addition, approximately 1,000 regions were included based on clusters of 5′ ends of ESTs (−2 kb/+2 kb). In total, the promoter microarray design contains 1,951,339 probes, comprising 46,302 contiguous sequences with a median length of ~4 kb. Finally, to measure genomic background ChIP-chip signals for the purpose of peak detection, the complete sequences of several scaffolds were included.
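The construction of design regions described above can be sketched as follows: each 5′ end is extended 2 kb up- and downstream, and overlapping windows on the same scaffold are merged into contiguous sequences whose median length can then be reported. This is a minimal sketch, assuming 0-based half-open coordinates and (scaffold, position) input tuples; the function names are hypothetical and not part of the NimbleGen design pipeline.

```python
from statistics import median

def tss_windows(tss_list, flank=2000):
    """Return (scaffold, start, end) windows of -flank/+flank around each 5' end.

    tss_list holds (scaffold, position) tuples. The window is symmetric,
    so the strand of the gene does not change the interval.
    """
    return [(scaf, max(0, pos - flank), pos + flank) for scaf, pos in tss_list]

def merge_intervals(intervals):
    """Merge overlapping or touching half-open intervals on the same scaffold."""
    merged = []
    for scaf, start, end in sorted(intervals):
        if merged and merged[-1][0] == scaf and start <= merged[-1][2]:
            last = merged[-1]
            merged[-1] = (scaf, last[1], max(last[2], end))
        else:
            merged.append((scaf, start, end))
    return merged

# Hypothetical example: two nearby 5' ends on one scaffold and one on another.
tss = [("scaffold_1", 10000), ("scaffold_1", 12500), ("scaffold_2", 500)]
contigs = merge_intervals(tss_windows(tss))
median_length = median(end - start for _, start, end in contigs)
```

In the same way, the putative H3K4me3 regions and EST-based start sites could be pooled with the gene 5′ ends before a single merge step, which is what yields a set of non-redundant contiguous sequences.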

Promoter Microarray H3K4me3 ChIP-Chip
Two new H3K4me3 gastrula ChIP-chip samples were generated, amplified using the T7 method and hybridized to the new promoter microarray design. The two biological replicates are closely similar in the location and shape of the enriched regions, as shown for part of scaffold 1 (Figure 2A). We determined the correlation between the genome-wide ChIP-chip and the promoter ChIP-chip experiments (Figure 2B). The mean signal per peak of the genome-wide experiment correlated with that of the two experiments on the promoter microarray design (r = 0.72 and r = 0.70, respectively; p < 2.2 × 10⁻¹⁶). The correlation between the two promoter microarray experiments was r = 0.94 (p < 2.2 × 10⁻¹⁶). This shows that ChIP-chip experiments based on the genome-wide tiling design or the promoter microarray design are highly reproducible and that the biological variance between different batches of embryos is relatively low.
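The per-peak comparison between replicates can be sketched as follows: the probe log2(ChIP/input) values inside each peak are averaged, and the resulting per-peak vectors of two replicates are compared with Pearson's correlation coefficient. This is a minimal sketch using only the Python standard library; it assumes each peak contains at least one probe, and the function names and input layout are hypothetical.

```python
import math
from statistics import mean

def per_peak_means(probe_log2, peaks):
    """Mean log2(ChIP/input) of the probes inside each peak on one scaffold.

    probe_log2: list of (position, log2_ratio) tuples, in any order.
    peaks: list of (start, end) half-open intervals on the same scaffold.
    """
    return [mean(v for p, v in probe_log2 if start <= p < end) for start, end in peaks]

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

This computes r only; if SciPy is available, `scipy.stats.pearsonr(x, y)` returns both the coefficient and a p-value.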

Analysis of H3K4me3 Triplicates
We determined the profile of the H3K4me3 signal over the 5′ ends of X. tropicalis experimentally validated genes (Xtev [9]). As expected, H3K4me3 is predominantly found at the transcription start site (TSS; Figure 3A). The distribution of the H3K4me3 mark over Joint Genome Institute FilteredModels genes shows a similar profile (Figure S1). We determined the number of H3K4me3-enriched regions using TileMap [11]; in total, 10,179 H3K4me3-positive regions were detected. To validate this peak set, randomly selected targets were tested by ChIP-qPCR using three new biological ChIP samples (Figure 3B). Using two negative loci on scaffold 1 (scaffold_1:6458583-6458633) and scaffold 919 (scaffold_919:126357-126407) to determine the genomic background, 16 out of 17 regions were enriched more than 2.5-fold (Figure S2; FDR < 0.06). The H3K4me3 peak set shows a high degree of overlap with the 5′ ends of Xtev genes (within 1 kb of the TSS; p < 10⁻²⁵; Figure 3C). In total, 89% of the H3K4me3-enriched regions (9,031 of 10,179) are within 1 kb of genes. A list of all H3K4me3-enriched regions and associated Xtev, JGI FilteredModels and RefSeq genes is supplied as a supplemental table (Table S1). The peaks of the genome-wide ChIP-chip were compared to ChIP-seq of H3K4me3 in gastrula-stage embryos [9]: 88% of the peaks determined by ChIP-chip are also detected by ChIP-seq (p < 10⁻²⁵; Figure 3D). These results show that H3K4me3 ChIP-chip experiments are highly reproducible and that both experimental platforms identify an almost identical collection of H3K4me3-enriched genomic sequences. Moreover, integrating ChIP and EST data allows 'orphan' H3K4me3 peaks to be linked to gene models on different genomic scaffolds in cases where promoter and coding regions were placed on different sequence contigs during genome assembly (schematic overview, Figure 3E). In such cases, the 5′ and 3′ exons of a single gene can be located on different scaffolds.
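Counting peaks that lie within 1 kb of a TSS, as in the 9,031 of 10,179 figure above, can be sketched with a binary search over sorted TSS positions per scaffold. A peak counts as a hit if some TSS falls in the interval [start − 1 kb, end + 1 kb]. This is a minimal sketch; the input layout and the function name are hypothetical, and the choice of TSS set (Xtev, JGI FilteredModels or RefSeq) is left to the caller.

```python
from bisect import bisect_left

def count_near_tss(peaks, tss_by_scaffold, max_dist=1000):
    """Count peaks lying within max_dist bp of any TSS.

    peaks: list of (scaffold, start, end) tuples.
    tss_by_scaffold: dict mapping scaffold -> sorted list of TSS positions.
    """
    hits = 0
    for scaf, start, end in peaks:
        tss = tss_by_scaffold.get(scaf, [])
        # First TSS at or beyond start - max_dist; it is a hit if it is
        # also no further than end + max_dist.
        i = bisect_left(tss, start - max_dist)
        if i < len(tss) and tss[i] <= end + max_dist:
            hits += 1
    return hits

# Percentage reported above: 9,031 of 10,179 peaks.
percent_near_genes = round(100 * 9031 / 10179)
```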
To test this, RT-PCR primers that align to different scaffolds were designed on the basis of the EST information and used to amplify gastrula X. tropicalis cDNA. Four randomly chosen examples were validated by this approach as single transcription units that had been assembled onto different scaffolds (Figure 3F). Importantly, genuine 5′ ends of genes can be identified by their enrichment for H3K4me3, even if the 5′ end and the gene body are located on different scaffolds. In total, we detected H3K4me3 enrichment together with EST annotation linking two scaffolds for 991 loci (Table S2). We also tested H3K9ac enrichment for a limited number of genomic regions and found that 34 out of 43 (79%) H3K4me3 regions are also enriched for H3K9ac (data not shown). These results show that the genome-wide ChIP-chip design is a useful tool to interrogate gene regulatory regions in Xenopus.

TBP-Enriched Regions
To assess the utility of our promoter microarray design for transcription factor studies, we examined TATA-binding protein (TBP), a key initiation factor known to bind promoter regions. T7-amplified TBP ChIP and input DNA were hybridized to the promoter microarray. Many TBP-enriched regions colocalize with H3K4me3-enriched regions, as seen for example for tubulin alpha 1c (Figure 4A, left panel). TBP is also found at H3K4me3-positive regions that lack gene annotation; for example, the promoter region of an open reading frame of unknown function was bound by TBP in the presence of H3K4me3 (Figure 4A, right panel). We determined the correlation between H3K4me3 and TBP signals at H3K4me3 peak regions to examine the relation between this epigenetic mark and transcription initiation factor binding (Figure 4B). Although the correlation is weaker than that observed for the H3K4me3 replicates, many H3K4me3-positive loci are bound by TBP and the correlation is highly significant (r = 0.40, p < 2.2 × 10⁻¹⁶).

Discussion
Our results present ChIP microarray designs for X. tropicalis. So far, gene regulatory networks and epigenetic regulation during vertebrate embryogenesis have not been studied extensively using genome-wide binding analysis. By contrast, many epigenetic profiles have been established for human cells in culture [12], providing a solid conceptual framework for analyzing epigenetic regulation in other systems. The presence of H3K4me3 at the 5′ end of transcribed genes is consistent across many experimental systems [5,7,8,9,13,14,15,16,17,18].
The sets of H3K4me3-enriched regions detected in Xenopus gastrula embryos by ChIP-chip and by ChIP-seq [9] are comparable (88% overlap). For Xenopus, a promoter microarray design based on the current Joint Genome Institute gene annotation would miss valuable information on TSSs, since a large fraction of H3K4me3-enriched regions (40%; 4,116 out of 10,179) do not overlap with current JGI gene annotation within 1 kb. An experimental approach to promoter microarray design, based on H3K4me3-enriched regions, is therefore preferable. Although many promoters featured on this microarray design are expected to be active in multiple tissues and stages of development, the most comprehensive results will be obtained with this design when analyzing promoter binding events in gastrula-stage embryos.
The biological variation of the H3K4me3 modification is low in Xenopus gastrula embryos. We also performed TBP ChIP-chip and found TBP peaks overlapping with H3K4me3 peaks, as expected for active promoters. The correlation between H3K4me3 and TBP, though highly significant, is not as strong as that observed between biological replicates of H3K4me3 enrichment. However, TBP paralogs have been identified that are required for transcription initiation during embryogenesis [19,20,21], and some promoters decorated with H3K4me3 do not recruit TBP. It will be worthwhile to study the dynamics of TBP-related factors during early Xenopus development.
This work describes microarray designs that are made available to the community, thereby facilitating future ChIP-chip studies of Xenopus embryogenesis. The promoter microarray design for X. tropicalis can be used to analyze promoter binding, and the five-microarray set can be used for genome-wide analyses of DNA binding or epigenetic regulation. These types of genomic studies will enhance our understanding of transcriptional networks and developmental pathways and identify novel target genes important for vertebrate development.

Chromatin Immunoprecipitation
Animal work was conducted according to relevant national and international guidelines and following approval by the institutional review board for animal experimentation (Dierexperimenten commissie). Xenopus tropicalis embryos were obtained by natural mating after human chorionic gonadotropin injection, dejellied in 2% cysteine and collected at the indicated stage. Chromatin harvesting from 300 gastrula X. tropicalis embryos and ChIP (15 embryo equivalents per ChIP) were performed as described [20] with two minor modifications: 12.5 µl of Prot A/G beads (Santa Cruz) were used, and proteinase K was omitted from the buffer during reversal of the crosslinking. We used α-H3K4me3 (Abcam), α-H3K9ac (Upstate) and α-TBP (SL33) antibodies.

RNA Isolation and cDNA Synthesis
Total RNA was isolated from 20 gastrula X. tropicalis embryos using TRIzol reagent (Invitrogen). Five µg of RNA was subjected to reverse transcription in the presence (cDNA) or absence (RT−) of SuperScript III (Invitrogen).

Quantitative PCR
5 µl of ChIP material (0.375 embryo equivalents), or 10-fold diluted cDNA or RT− control, was used for quantitative PCR. PCR reactions were performed on a MyIQ single-color real-time PCR detection system (BioRad) using iQ SYBR Green Supermix (BioRad). Primer sequences are available in Table S3.
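The ChIP-qPCR validation reported in the Results (16 of 17 regions enriched more than 2.5-fold over two negative loci, FDR < 0.06) can be sketched as follows. The exact quantification used is not stated here, so this sketch assumes that relative template amounts are derived from Ct values with perfect doubling per cycle, that background is the mean of the two negative loci, and that the FDR is estimated as the fraction of tested regions failing the 2.5-fold cutoff. All function names are hypothetical.

```python
from statistics import mean

def relative_quantity(ct, amplification=2.0):
    """Relative template amount from a Ct value (assumes a fixed per-cycle amplification factor)."""
    return amplification ** (-ct)

def fold_enrichment(target_ct, negative_cts):
    """Target signal divided by the mean signal of the negative (background) loci."""
    background = mean(relative_quantity(ct) for ct in negative_cts)
    return relative_quantity(target_ct) / background

def validation_summary(target_cts, negative_cts, threshold=2.5):
    """Return (number of enriched regions, fraction of regions failing the cutoff)."""
    folds = [fold_enrichment(ct, negative_cts) for ct in target_cts]
    enriched = sum(f > threshold for f in folds)
    # Fraction of tested peaks that did not validate, used as an empirical FDR estimate.
    return enriched, 1 - enriched / len(folds)
```

With 16 validated regions out of 17, the failure fraction is 1/17 ≈ 0.059, which matches the reported FDR < 0.06.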

T7 Amplification
The T7 amplification procedure was essentially as described [10]. In short, after removal of the 5′ phosphate groups, the DNA fragments were enzymatically given a T-tail, followed by T7 promoter incorporation. In vitro transcription coupled to first- and second-strand synthesis was performed to obtain dsDNA.

ChIP-Chip
ChIP-chip samples were prepared according to the manufacturer's protocol (Roche NimbleGen). In short, DNA from 4 ChIP reactions (60 embryo equivalents) was pooled and amplified using the T7 strategy. DNA purified from de-crosslinked chromatin (i.e. input, genomic control) was also amplified. Each labeling reaction required 4 µg of DNA (post-amplification). ChIP samples were labeled with Cy3 and genomic control samples with Cy5. Samples were hybridized to the NimbleGen ChIP-chip microarrays by Research and Development at Roche NimbleGen (Madison, WI, USA).

Promoter Microarray Design
H3K4me3-positive regions to include on the promoter microarray design were selected using relaxed criteria, to include as many putative positive regions as possible while staying within the practical size limit of one HD2 NimbleGen microarray (~2.1 million probes). A relatively simple procedure was chosen in order to retrieve an inclusive peak set; this peak selection was used only to design the promoter microarray, not for any further comparisons. Peak detection was performed using a sliding window operation, with three different thresholds for three window lengths (3, 4 or 5 consecutive probes). These different windows were used to select relatively small, high peaks as well as somewhat broader, lower peaks. As a control, all probes were randomized per microarray, and sliding window detection was performed on the randomized probes. On both real and randomized data, sliding window detection was repeated with increasing thresholds. The definitive threshold was chosen such that the number of peaks in the randomized set divided by the number of peaks in the real data, the theoretical FDR, was <10%. All overlapping positive regions were merged, and each region was extended by 2 kb both up- and downstream. In total this resulted in 27,019 putative promoter regions, the same order of magnitude as the number of annotated genes (27,916 JGI FilteredModels genes). The design was extended by the addition of several (partially overlapping) datasets. The 5′ ends of all JGI FilteredModels genes were added, as well as a set of putative transcription start sites based on EST evidence (SJvH, W. Akhtar, RCA and GJCV, manuscript in preparation), both extended by 2 kb up- and downstream. Finally, the complete tile paths of scaffold_1, scaffold_73, scaffold_130, scaffold_185, scaffold_432, scaffold_654 and scaffold_2758 were included. These regions can be used to assess genomic background in peak calling applications.
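The sliding-window selection with a randomized control can be sketched for one window length as follows. The text does not specify the window statistic, so this sketch assumes a window is positive when the mean log2 ratio of its probes exceeds the threshold; probes are assumed to be ordered by genomic position within one microarray, and the shuffle with a fixed seed stands in for the per-array randomization. Function names are hypothetical.

```python
import random
from statistics import mean

def call_windows(values, window, threshold):
    """Merged (start_idx, end_idx) runs of probes covered by positive windows.

    A window of `window` consecutive probes is positive if its mean log2
    ratio exceeds `threshold`. Indices are half-open probe indices.
    """
    regions = []
    for i in range(len(values) - window + 1):
        if mean(values[i:i + window]) > threshold:
            if regions and i <= regions[-1][1]:
                regions[-1] = (regions[-1][0], i + window)  # extend current region
            else:
                regions.append((i, i + window))
    return regions

def choose_threshold(values, window, thresholds, max_fdr=0.10, seed=0):
    """Lowest threshold whose theoretical FDR (random peaks / real peaks) is below max_fdr."""
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)  # randomized probe order as a null model
    for t in sorted(thresholds):
        real = len(call_windows(values, window, t))
        rand = len(call_windows(shuffled, window, t))
        if real and rand / real < max_fdr:
            return t, rand / real
    return None
```

Running `choose_threshold` separately for window lengths of 3, 4 and 5 probes would yield one threshold per window length, mirroring the three thresholds described above.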
The complete design consists of 1,951,339 probes, which fits on a single 2.1M NimbleGen microarray, covering a total of 46,302 contiguous sequences with a median length of ~4 kb.

Peak Detection
To call H3K4me3 peaks based on the three biological replicates (one genome-wide and two on the promoter microarray design), TileMap [11] was chosen. In contrast to the selection of positive regions for the promoter microarray design, where an inclusive set was preferred, the aim here was to provide a set of high-confidence H3K4me3 peaks. Because TileMap is based on a statistical model that incorporates the replicate data, it is a more robust approach than a simple sliding window procedure. Before peak detection, all probes matching the X. tropicalis genome (JGI 4.1) more than once were removed. The following TileMap parameters were set: method HMM, posterior probability 0.01, expected hybridization length 10. All other parameters were left at their default settings. All resulting single-probe peaks were removed, and all peaks within 1.5 kb of each other were merged. The TBP peaks (9,558) were called using TileMap with default parameters.
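The filtering steps around TileMap can be sketched as follows: probes with more than one genomic match are dropped before peak calling, and afterwards single-probe peaks are discarded and peaks separated by at most 1.5 kb are merged. This is a minimal sketch of the pre- and post-processing only (TileMap itself is not reimplemented); the tuple layouts and function names are hypothetical.

```python
def filter_multimapping(probes, match_counts):
    """Remove probes that match the genome more than once.

    probes: list of probe IDs; match_counts: dict probe ID -> number of genome matches.
    """
    return [p for p in probes if match_counts[p] <= 1]

def postprocess_peaks(peaks, max_gap=1500):
    """Drop single-probe peaks, then merge peaks separated by at most max_gap bp.

    peaks: list of (scaffold, start, end, n_probes) tuples.
    Returns merged (scaffold, start, end) tuples sorted by position.
    """
    kept = sorted((scaf, start, end) for scaf, start, end, n in peaks if n > 1)
    merged = []
    for scaf, start, end in kept:
        if merged and merged[-1][0] == scaf and start - merged[-1][2] <= max_gap:
            merged[-1] = (scaf, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((scaf, start, end))
    return merged
```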
The files containing the H3K4me3 peaks, the TBP peaks, the number of matches of each probe to X. tropicalis JGI 4.1, and the TileMap parameter files are available as supplementary information through GEO Series GSE19413.

Orphan H3K4me3 Linkage
All mapped X. tropicalis ESTs were downloaded from the UCSC Genome Browser (xenTro2, . ESTs were filtered based on two criteria: 1) mapping to exactly two scaffolds and 2) an H3K4me3 peak within 1 kb of the EST on one of the scaffolds. All unique combinations of two scaffolds supported by more than one filtered EST were kept; these are summarized in Table S2.
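The two EST filtering criteria and the support requirement can be sketched as follows. This sketch assumes each EST is given as a list of (scaffold, start, end) alignment blocks and peaks as (start, end) intervals per scaffold; the data layout and function name are hypothetical and do not reflect the UCSC download format.

```python
from collections import Counter

def link_scaffolds(est_alignments, peaks, max_dist=1000):
    """Scaffold pairs supported by more than one filtered EST.

    est_alignments: dict EST ID -> list of (scaffold, start, end) blocks.
    peaks: dict scaffold -> list of (start, end) H3K4me3 peaks.
    """
    def near_peak(scaf, start, end):
        # True if the block lies within max_dist bp of any peak on that scaffold.
        return any(ps - max_dist <= end and start <= pe + max_dist
                   for ps, pe in peaks.get(scaf, []))

    support = Counter()
    for blocks in est_alignments.values():
        scaffolds = {block[0] for block in blocks}
        if len(scaffolds) != 2:  # criterion 1: exactly two scaffolds
            continue
        if not any(near_peak(*block) for block in blocks):  # criterion 2: peak within 1 kb
            continue
        support[tuple(sorted(scaffolds))] += 1
    # Keep only scaffold pairs with evidence from more than one EST.
    return {pair: n for pair, n in support.items() if n > 1}
```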

Comparisons between ChIP-Chip Replicates and between ChIP-Chip and ChIP-Seq Peaks
To calculate the correlation of the genome-wide replicate with the two replicates on the promoter microarray, the mean per-peak log2 ChIP/input signal was calculated per replicate for all 10,179 peaks. The correlation coefficient between these per-peak replicate signals was calculated as Pearson's correlation. To calculate the overlap between H3K4me3 peaks determined by ChIP-chip and by ChIP-seq, the previously determined H3K4me3 ChIP-seq peaks (GSM352202_H3K4me3_enriched.bed, available through GEO Series GSE14025) [http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE14025] were intersected with the peaks determined by ChIP-chip in this study. All ChIP-seq peaks overlapping by at least 1 bp were counted as detected by both methods.
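The 1 bp overlap criterion can be sketched as follows, assuming the ChIP-seq peak file follows the standard BED convention (0-based, half-open coordinates in the first three columns), under which two intervals share at least 1 bp when each starts before the other ends. The parsing is deliberately minimal and the function names are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict

def read_bed(lines):
    """Parse BED lines into a dict scaffold -> sorted list of (start, end)."""
    by_scaffold = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end = line.split()[:3]
        by_scaffold[chrom].append((int(start), int(end)))
    for intervals in by_scaffold.values():
        intervals.sort()
    return by_scaffold

def fraction_overlapping(query, reference):
    """Fraction of query intervals sharing >= 1 bp with any reference interval."""
    total = hits = 0
    for scaf, intervals in query.items():
        ref = reference.get(scaf, [])
        for start, end in intervals:
            total += 1
            # Only reference intervals starting before `end` can overlap.
            candidates = ref[:bisect_left(ref, (end,))]
            if any(ref_end > start for _, ref_end in candidates):
                hits += 1
    return hits / total if total else 0.0
```

Here the ChIP-chip peaks are used as the query, matching the way the 88% figure is phrased in the Results.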

Data Availability and Supplemental Data
The data and the microarray designs used to generate the data have been deposited in NCBI's Gene Expression Omnibus [22] and are accessible through GEO Series accession number GSE19413 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE19413). Improved versions (v2) of the X. tropicalis genome-wide microarray design and the promoter microarray design are available at the authors' web site (http://www.ncmls.nl/gertjanveenstra). These updated designs include extra control probes (random controls and probes corresponding to Arabidopsis BAC clones F19K16, accession AC011717, and F24B22, accession AL132957). The Arabidopsis sequences can be used as spike-in controls. In addition, all scaffolds of the whole-genome tiling microarray design were randomly distributed over the 5 microarrays of the array set to prevent any single microarray from featuring all the small scaffolds. The small (high-number) scaffolds contain significantly fewer genes, and distributing them evenly over the 5-array set may prevent signal normalization issues. Arrays featuring these designs can be purchased from Roche NimbleGen.
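A random distribution of scaffolds over the five arrays can be sketched as follows. The text states only that scaffolds were distributed randomly; the balancing rule used here (visit scaffolds in a shuffled order and assign each to the array with the fewest probes so far) is our assumption, added so that no array exceeds its share of probes. The function name and input layout are hypothetical.

```python
import heapq
import random

def distribute_scaffolds(probe_counts, n_arrays=5, seed=0):
    """Assign scaffolds to arrays in random order, each to the least-filled array.

    probe_counts: dict scaffold -> number of probes.
    Returns a list of n_arrays lists of scaffold names.
    """
    order = sorted(probe_counts)
    random.Random(seed).shuffle(order)  # random order mixes small and large scaffolds
    heap = [(0, i) for i in range(n_arrays)]  # (probes assigned so far, array index)
    arrays = [[] for _ in range(n_arrays)]
    for scaf in order:
        load, i = heapq.heappop(heap)
        arrays[i].append(scaf)
        heapq.heappush(heap, (load + probe_counts[scaf], i))
    return arrays
```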