Perm-seq: Mapping Protein-DNA Interactions in Segmental Duplication and Highly Repetitive Regions of Genomes with Prior-Enhanced Read Mapping

doi:10.1371/journal.pcbi.1004491

Fig 1.

Discriminative power of DNase-seq for mapping locations of multi-reads.

(a) Log base two ratios of DNase-seq versus ChIP-seq read counts in the local neighbourhoods of the two mapping locations of each multi-read in the Gata2 ChIP-seq dataset (Huvec). The vertical and horizontal lines depict boundaries where the log base two ratios equal 0.5. The proportions of read pairs in the four categories are 23% (ChIP: ChIP, or both ChIP and DNase, discriminates), 44% (Neither: neither ChIP nor DNase discriminates), 26% (Only DNase: only DNase discriminates), and 7% (Opposite: ChIP and DNase log base 2 ratios have different signs). (b) Classification of multi-reads with two mapping locations into four groups (ChIP, Neither, Only DNase, and Opposite) based on their local DNase-seq and ChIP-seq read counts. (c) Overall summary of the Perm-seq pipeline.
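
The four-way grouping in panels (a) and (b) can be illustrated with a minimal Python sketch. It is not the authors' code. Several details are assumptions made for illustration only: the function name `classify_multiread`, the pseudocount that avoids division by zero, reading the 0.5 boundary as an absolute-value threshold on each log ratio, and treating a read as "Opposite" only when both signals exceed the threshold with different signs.

```python
import math

def classify_multiread(chip_a, chip_b, dnase_a, dnase_b,
                       threshold=0.5, pseudocount=1.0):
    """Assign a two-location multi-read to one of the four caption categories.

    chip_a/chip_b and dnase_a/dnase_b are local read counts around mapping
    locations A and B. The pseudocount and the absolute-value use of the
    threshold are assumptions, not details taken from the caption.
    """
    chip_lr = math.log2((chip_a + pseudocount) / (chip_b + pseudocount))
    dnase_lr = math.log2((dnase_a + pseudocount) / (dnase_b + pseudocount))

    chip_discriminates = abs(chip_lr) > threshold
    dnase_discriminates = abs(dnase_lr) > threshold

    # Assumed precedence: conflicting, individually decisive signals first.
    if chip_discriminates and dnase_discriminates and chip_lr * dnase_lr < 0:
        return "Opposite"
    if chip_discriminates:
        return "ChIP"          # ChIP alone, or ChIP and DNase in agreement
    if dnase_discriminates:
        return "Only DNase"
    return "Neither"
```

Under these assumptions, the "Only DNase" group is the set of multi-reads for which the ChIP signal alone cannot choose between the two locations but the DNase-seq signal can.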


Fig 2.

Comparison of uni-read, CSEM, and Perm-seq analysis.

(a) Comparison of Ctcf optimal peak lists from uni-read, CSEM, and Perm-seq analyses. Numbers in parentheses denote comparisons of the optimal peak lists with the relaxed peak lists. For example, there are 1320 peaks identified by the Perm-seq and CSEM analyses and missed by the uni-read analysis. 664 of these peaks are still missed by the uni-read analysis even when the Perm-seq and CSEM optimal peak lists are compared with the uni-read relaxed lists. (b) Circos plots of CSEM (left) and Perm-seq (right) read allocation for reads mapping to four segmental duplication regions with coordinates chr1:143,880,003–143,978,943, chr1:206,072,707–206,171,611, and chr1:143,880,003–144,005,301, chr1:120,872,119–249,250,621. (c) Percentages of Perm-seq specific and CSEM specific peaks with the most significant motifs identified from the de novo sequence analysis of the intersection peaks, i.e., peaks common to uni-read, CSEM, and Perm-seq analysis. (d) Comparison of the Ctcf peak sets from GM12878 between Perm-seq, CSEM, Gibbs-based [2], and Lonut [4]. x.vs.Perm-seq denotes optimal peaks of method “x” not identified by Perm-seq. Similarly, Perm-seq.vs.x denotes optimal peaks of Perm-seq not identified by method “x”. (e) Annotation of the K562 peaks with respect to segmental duplications. Categories are: Prom.Dup: peaks that are in promoter regions (± 2500 bps of TSS) of RefSeq genes that reside in segmental duplications; Prom: peaks that are in promoter regions (excludes peaks in Prom.Dup); Genic.Dup: peaks that are within [-10000 bps of TSS, +1000 bps of TES] of RefSeq genes that are in segmental duplications (excludes peaks in Prom.Dup); Genic: peaks that are within [-10000 bps of TSS, +1000 bps of TES] of RefSeq genes (excludes peaks in Genic.Dup, Prom.Dup); Dup: peaks that are in segmental duplications (excludes Prom.Dup and Genic.Dup); None: peaks that do not fall into any of the other defined categories.
(f) Genes are ordered by RNA-seq transcripts per million (TPM) values. Genes with a Common Pol2 peak in their promoters are depicted in green, whereas genes with only Perm-seq-only peaks are depicted in blue.
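
The peak categories of panel (e) amount to a precedence rule, sketched below in Python. This is an illustrative reading of the caption, not the authors' implementation. The `Gene` record, its `in_segdup` flag, the use of a single peak position (for example the summit), the strand-aware interpretation of the genic window, and the ordering Prom.Dup, Prom, Genic.Dup, Genic, Dup, None are all assumptions introduced here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gene:
    chrom: str
    tss: int          # transcription start site
    tes: int          # transcription end site
    strand: str       # "+" or "-"
    in_segdup: bool   # hypothetical flag: gene resides in a segmental duplication

def in_promoter(gene, chrom, pos, flank=2500):
    """Peak position lies within +/- flank bps of the TSS."""
    return chrom == gene.chrom and abs(pos - gene.tss) <= flank

def in_genic(gene, chrom, pos, upstream=10000, downstream=1000):
    """Peak position lies within [-upstream of TSS, +downstream of TES]."""
    if chrom != gene.chrom:
        return False
    if gene.strand == "+":
        return gene.tss - upstream <= pos <= gene.tes + downstream
    return gene.tes - downstream <= pos <= gene.tss + upstream

def annotate_peak(chrom, pos, genes, peak_in_segdup):
    """Return the first matching category under the assumed precedence."""
    prom = [g for g in genes if in_promoter(g, chrom, pos)]
    genic = [g for g in genes if in_genic(g, chrom, pos)]
    if any(g.in_segdup for g in prom):
        return "Prom.Dup"
    if prom:
        return "Prom"
    if any(g.in_segdup for g in genic):
        return "Genic.Dup"
    if genic:
        return "Genic"
    if peak_in_segdup:
        return "Dup"
    return "None"
```

Evaluating the categories in a fixed order is one way to honour the "excludes" clauses in the caption, so that each peak receives exactly one label.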


Fig 3.

Comparison of uni-read and Perm-seq analysis.

(a) Distribution of expression levels of genes with (i) Common peaks; (ii) Common and Perm-seq-only peaks; (iii) Perm-seq-only peaks. (b) Percentage of genes in segmental duplications with (i) Common peaks; (ii) Common and Perm-seq-only peaks; (iii) Perm-seq-only peaks. (c) Histone modification profiles within ± 2000 bps of the summits of the Common, Perm-seq-only, and Perm-seq-exclusive Bcl3 peaks in GM12878 and K562 cells, respectively. (d) Percentages of Perm-seq-exclusive and Common peak sets with the M1 motifs from the de novo sequence analysis of the top 500 Common peaks. Common peak set percentages are replaced by the percentages of occurrences in the subset of the Common peak sets (red circles) that is matched to Perm-seq-exclusive peaks in terms of ChIP signal.


Fig 4.

Conservation analysis of the common and Perm-seq-exclusive peak sets.

(a) Empirical cumulative distribution functions (CDFs) for average phyloP scores of Usf1 binding sites from the Common and Perm-seq-exclusive Usf1 peaks in GM12878 and K562 cells, respectively. Positive scores indicate conservation and negative scores indicate acceleration. (b) Mean position-specific phyloP scores of the Usf1 binding sites for Common and Perm-seq-exclusive Usf1 peaks in GM12878 and K562, respectively. Shaded areas denote ± one standard error of the mean profile. (c) Pearson correlations between the mean position-specific phyloP scores of the binding sites from the Common and Perm-seq-exclusive peak sets. Purple and red circles indicate that correlations are not significantly different from zero only in GM12878 and in both GM12878 and K562 cells, respectively. (d) Mean position-specific phyloP scores of the Zbtb33 binding sites from the Common and Perm-seq-exclusive Zbtb33 peaks in GM12878 and K562 cells, respectively.


Fig 5.

Perm-seq analysis of GM12878 Ctcf and Sin3a ChIP-seq datasets with DNase-seq and Histone ChIP-seq priors.

(a) Heatmap of normalized DNase-seq and Histone ChIP-seq read counts for the [-1000 bps, +1000 bps] window anchored at GM12878 Ctcf peak summits depicted by the vertical dashed lines. Perm-seq-specific (DNase+Histone): Perm-seq optimal peaks using DNase-seq and Histone ChIP-seq in prior construction. These peaks are not identified when using only DNase-seq for prior construction or with CSEM. Perm-seq-specific (DNase): Perm-seq optimal peaks using only DNase-seq in prior construction. These peaks are not identified when using DNase-seq and Histone ChIP-seq for prior construction or with CSEM. (b) Heatmap of normalized DNase-seq and H2a.z ChIP-seq read counts for the [-500 bps, +500 bps] window anchored at GM12878 Sin3a peak summits. Perm-seq-specific (DNase): Perm-seq optimal peaks using DNase-seq in prior construction. These peaks are not identified when using only H2a.z ChIP-seq for prior construction or with CSEM. Perm-seq-specific (H2a.z): Perm-seq optimal peaks using only H2a.z ChIP-seq in prior construction. These peaks are not identified when using DNase-seq for prior construction or with CSEM. Next best mapping positions of Perm-seq-specific (DNase) peaks denote the regions where the multi-reads of the peaks map with the next best total allocation scores, i.e., ranked second compared to the allocation scores at the actual peaks.
