GOTHiC, a probabilistic model to resolve complex biases and to identify real interactions in Hi-C data

doi:10.1371/journal.pone.0174744

Fig 1.

Schematic overview of the binomial model.

After crosslinking and digesting the chromatin, the DNA is ligated, resulting in three types of ligation products. To detect real interactions, we first filter out self-ligations. With the remaining paired reads, we then calculate the relative coverage across the genome to estimate the random interaction probability. Finally, we apply the binomial test to distinguish between random and real interactions.
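
The three steps in this caption (filtering, relative coverage, binomial test) can be illustrated with a minimal Python sketch. It is not the GOTHiC implementation. The function names `binom_sf` and `score_interactions` and the fragment labels `f1` to `f4` are hypothetical. Two points are assumptions of the sketch rather than statements from the caption: self-ligations are approximated as read pairs whose two ends fall in the same fragment, and the random interaction probability of a pair is taken as twice the product of the two relative coverages.

```python
from collections import Counter
from math import exp, lgamma, log


def binom_sf(k, n, p):
    """Upper tail P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    if k <= 0 or p >= 1.0:
        return 1.0
    if p <= 0.0:
        return 0.0
    total = 0.0
    for i in range(k, n + 1):
        # log of C(n, i) * p^i * (1 - p)^(n - i), computed via lgamma
        log_term = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                    + i * log(p) + (n - i) * log(1.0 - p))
        total += exp(log_term)
    return min(total, 1.0)


def score_interactions(read_pairs):
    """Return {(a, b): (observed, p_random, p_value)} for each fragment pair."""
    # Step 1 (assumption): treat same-fragment pairs as self-ligations.
    pairs = [pair for pair in read_pairs if pair[0] != pair[1]]
    n = len(pairs)

    # Step 2: relative coverage = share of all read ends in each fragment.
    ends = Counter(fragment for pair in pairs for fragment in pair)
    rel_cov = {fragment: count / (2 * n) for fragment, count in ends.items()}

    # Step 3: binomial test of the observed count against random ligation.
    observed = Counter(tuple(sorted(pair)) for pair in pairs)
    results = {}
    for (a, b), k in observed.items():
        p_random = 2 * rel_cov[a] * rel_cov[b]  # assumed form, see lead-in
        results[(a, b)] = (k, p_random, binom_sf(k, n, p_random))
    return results


# Toy data: one strongly supported pair, a few background pairs,
# and three same-fragment pairs that the filter removes.
toy_pairs = ([("f1", "f2")] * 8 + [("f1", "f3")] + [("f2", "f3")]
             + [("f3", "f4")] * 2 + [("f1", "f1")] * 3)
results = score_interactions(toy_pairs)
for pair, (k, p_random, p_value) in sorted(results.items()):
    print(pair, k, round(p_random, 4), p_value)
```

In this toy set the pair `f1`/`f2` gets a much smaller p-value than `f1`/`f3`, because its eight reads exceed what the coverage of the two fragments would predict. A real analysis would also need a multiple-testing correction across all tested pairs.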

Fig 2.

GOTHiC applied to mouse fetal liver Hi-C experiments.

(A) On the left, distributions of the relative coverage and the GC content percentage; on the right, the mappability score and the number of fragments per 1 Mb (y-axis) across mouse Chromosome 10 (x-axis in Mb) (GC content and mappability scores are as in [8]). Pearson correlations are calculated against the relative coverage. (B-C) Contact maps of mouse Chromosome 10 containing raw read counts (interactions with at least 3 reads) and binomial significances, respectively, resulting from the classic Hi-C experiment (left panel) and the random ligation experiment (right panel) in fetal liver cells. The intensity of the signal is summarized by the gradient above each contact map. Significant interactions are colored with a red gradient in C. Arrows pinpoint a region of high coverage and its impact on the observed number of interactions (B, right panel). The coverage is shown on the left side of each contact map. (D) The top panel shows the distribution of the observed/expected log ratio of significant (red) and non-significant (blue) interactions in the fetal liver cell sample. The bottom panel shows a Q-Q plot of the observed and expected p-values in the random ligation data set (red) and a simulated random data set (grey). (E) Influence of the relative coverage on the distribution of interaction significance. GOTHiC interaction ranking in the Hi-C (upper panel) and random ligation (lower panel) samples. The ranked lists were divided into quartiles; the first quartile corresponds to the top-ranked interactions. Significant interactions are shown in red.

Fig 3.

GOTHiC applied to human lymphoblastoid Hi-C experiments.

(A-B) Contact maps of human Chromosome 3 containing raw read counts (interactions with at least 3 reads) and binomial significances, respectively, resulting from the HindIII Hi-C experiment (left panel) and the NcoI Hi-C experiment (right panel). The intensity of the signal is summarized by the gradient above each contact map. Significant interactions are colored with a red gradient in B. The coverage is shown on the left side of each contact map. (C) Venn diagrams representing the overlap between the interactions with the highest raw read counts and the significant interactions detected in the HindIII (orange percentage) and NcoI (blue percentage) samples. (D) Correlation between the ranks of the significant interactions common to the HindIII (x-axis) and NcoI (y-axis) samples (69,505 interactions). Spearman’s correlations are indicated above the plot.
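
The comparison in panel D (rank agreement over the interactions that are significant in both samples) can be sketched in Python as follows. This is an illustration under assumptions, not the authors' code: the helper names, the dictionary layout (fragment pair mapped to a significance value) and the toy values are all hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from smallest (rank 1) upward, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_on_common(sig_a, sig_b):
    """Spearman correlation over the interactions present in both inputs."""
    common = sorted(set(sig_a) & set(sig_b))
    ranks_a = average_ranks([sig_a[key] for key in common])
    ranks_b = average_ranks([sig_b[key] for key in common])
    # Spearman's rho is the Pearson correlation of the two rank vectors.
    return len(common), pearson(ranks_a, ranks_b)


# Hypothetical significance values for two digests of the same cells.
hindiii = {("x", "y"): 1e-9, ("x", "z"): 1e-6, ("y", "z"): 1e-4, ("w", "z"): 1e-3}
ncoi = {("x", "y"): 1e-6, ("x", "z"): 1e-8, ("y", "z"): 1e-3, ("u", "v"): 1e-5}
n_common, rho = spearman_on_common(hindiii, ncoi)
print(n_common, rho)
```

Restricting the correlation to the shared interactions, as the caption describes, means it measures agreement in ordering only. How many interactions are shared is the separate question addressed by the Venn diagrams in panel C.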

Fig 4.

Comparison of the mouse fetal liver Hi-C data after processing by hiclib, hicpipe and GOTHiC.

(A-B) Contact maps of mouse Chromosome 10 containing the relative probability computed by hiclib and the observed/expected log ratio obtained with hicpipe, respectively, resulting from the Hi-C experiment (left panel) and the random ligation experiment (right panel) in fetal liver. The intensity of the signal is summarized by the gradient above each contact map. (C) Influence of the relative coverage on the distribution of the number of observed interactions (top panel) and on the hiclib and hicpipe interaction rankings (middle and bottom panels) in the Hi-C (left) and random ligation (right) samples. The ranked lists were divided into quartiles; the first quartile corresponds to the top-ranked interactions. The distribution of the number of reads per interaction is represented in the top panel with green box plots (the corresponding y-axis is placed on the right of the plot). (D) Recovery of the 80,085 highest-ranked interactions in hicpipe and hiclib by GOTHiC.
