Fig 1.
The input of MrTADFinder is an intra-chromosomal contact map W. A null model E is obtained from W. Given a particular resolution γ; the chromosome is partitioned probabilistically in a way such that the objective function Q is maximized. The optimization is performed by a modification Louvain algorithm shown on the right. The algorithm is stochastic because the updating order of nodes is random. A boundary score is defined after multiple trials for all adjacent bins. Adjacent bins that are robustly assigned to two different TADs form a consensus boundary. The output of MrTADFinder is a set of consensus domains bound by the consensus domains.
Fig 2.
Identification of TADs in multiple resolutions.
A) A part of the contact map of the chromosome 10 in hES cell. The greenish triangles below represent TADs called by MrTADFinder in three different resolutions. The TADs called agree well visually with the contact map. The blue triangles and red triangles represent TADs called in human ES cells and human IMR90 cells respectively as reported in [8]. B) The size of TADs called in different resolutions. The median TADs size decreases from 3 Mbp to 300 kbp as the resolution increases from 0.75 to 3.5. C) The number of TADs increases as the resolution increases. When γ = 2.25, there are about 2600 TADs in hES cells with a median size of roughly 1Mb. The median size goes down to 300kb when the resolution increases to 3.5. The number of TADs identified in [8] is marked by the arrow. Comparing TADs called by MrTADFinder with TADs called in [8]. Two algorithms agree the most in a particular resolution (γ ≈ 2.875).
Fig 3.
Boundary signatures of histone modifications in different resolutions.
A) Histone modifications near the TAD boundary regions obtained in various resolutions. The peak density is obtained by counting the number of peaks in every 40kb bin, and normalized by a null model in which peaks are randomly distributed. B) Different histone marks show different levels of enrichment near TAD boundaries at different resolutions. Despite a general decreasing trend, the signal of certain marks likes H3K27me3 remains flat until a very high resolution.
Fig 4.
A) Distribution of house-keeping genes and tissue-specific genes near TAD boundaries at different resolutions. House-keeping genes are more enriched near TAD boundaries as compared to tissue-specific genes. B) House-keeping genes and tissue-specific genes show different levels of enrichment near TAD boundaries at different resolutions. Tissue-specific genes show a general decreasing trend, whereas the number of house-keeping genes remains flat until a high resolution.
Fig 5.
Transcription factors binding in different resolutions.
A) Enrichment of HOT (high-occupancy target) and XOT (extreme-occupancy target) regions near TAD boundaries in hES cell. Boundaries are identified by MrTADFinder at a resolution γ = 2.75. The y-axis is normalized by a null model that peaks are randomly distributed in along the chromosome. B) A logistic regression model to classify real TAD boundaries and random boundaries based on the binding pattern of 60 TFs. The most influential factors responsible for TAD boundaries formation at different resolutions are listed. Factors with a positive coefficient have a direct effect on border establishment or maintenance, whereas factors like MYC has a negative effect. The factors are sorted by corresponding P-values and only the significant factors are displayed.
Fig 6.
The number of promoter-enhancer linkages connecting the endpoints of domains in different resolutions.
As the resolution increases, the increase in the number of boundaries can capture a higher number of potential interactions. The blue curve shows the increase for an ensemble of randomly reshuffled TADs. The number of promoter-enhancer linkages connecting the endpoints of real domains is higher than the random counterparts.
Fig 7.
Mutational burdens across TAD boundaries.
The 3 clusters of boundary regions exhibit distinct patterns in terms of mutational burden. For blue and red clusters, the area marks the first and the third quartiles. For the green cluster, only the mean values at different positions are shown for clarity. The inset shows the average Repli-seq signal for the red and blue clusters.
Fig 8.
Enrichment of CTCF peaks near TAD boundaries at two different resolutions.
The blue line shows the same analysis using TADs reported in [8].