Cooltools: Enabling high-resolution Hi-C analysis in Python

doi:10.1371/journal.pcbi.1012067

Table 1.

Overview of software suites for C-data analysis.

Fig 1.

Overview of cooltools functionality.

Open2C provides a modular ecosystem of software libraries for Hi-C analysis (highlighted with gray boxes). Pairtools [58] takes in paired-end sequence alignments and extracts contact pairs in the 4DN .pairs format. cooler [8] bins these contact pairs and stores the resulting sparse matrices in .cool and .mcool formats. The Nextflow pipeline distiller [59] converts sequencing reads from FASTQ files directly into binned and normalized cooler files, integrating read alignment with pairtools and cooler. The library introduced in this paper, cooltools (in bold), provides methods to quantify and extract features from high-resolution contact maps stored by cooler.

Fig 2.

Expected and contact frequency versus distance.

a. Observed contact map for HFF Micro-C for chr2 and chr17 at 1Mb. Chromosomal arms p and q are depicted as light and dark grey rectangles respectively. Note the wide unmappable centromeric regions (white rows and columns) between chromosomal arms. Accounting for these regions is a key aspect of calculating an expected map. b. Expected map for three classes of regions: intra-chromosomal intra-arm, intra-chromosomal inter-arm, and inter-chromosomal. Regions for expected are specified using genomic views, where individual regions are chromosomal arms. Note that intra-chromosomal expected has a strongly decreasing contact frequency with genomic distance, whereas inter-chromosomal expected appears flat. c. Average contact frequency versus genomic separation, or P(s), for intra-arm interactions (blue, orange) and for inter-arm interactions (green), calculated from contact maps at 10kb. P(s) curves are matched by region and color with arrows on the middle heatmap.
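
The idea of an expected profile that skips unmappable bins can be sketched in a few lines of NumPy. This is a minimal illustration on a toy matrix, not the cooltools implementation: the function name `expected_by_distance`, the `valid` mask, and the toy data are all assumptions made for the example.

```python
import numpy as np

def expected_by_distance(matrix, valid):
    """Mean contact frequency at each genomic separation (diagonal offset).

    Only pixels whose row bin and column bin are both valid are averaged,
    so unmappable rows/columns do not drag the average down.
    """
    n = matrix.shape[0]
    pair_ok = np.outer(valid, valid)          # True where both bins are mappable
    expected = np.full(n, np.nan)
    for s in range(n):
        ok = np.diagonal(pair_ok, offset=s)
        if ok.any():
            expected[s] = np.diagonal(matrix, offset=s)[ok].mean()
    return expected

# Toy single-region map: contact frequency decays as 1 / (1 + separation).
n = 6
idx = np.arange(n)
toy = 1.0 / (1.0 + np.abs(idx[:, None] - idx[None, :]))

# Pretend bin 2 is unmappable: its row and column are empty.
valid = np.ones(n, dtype=bool)
valid[2] = False
toy[2, :] = 0.0
toy[:, 2] = 0.0

ps = expected_by_distance(toy, valid)
print(ps)
```

Because the empty row and column are masked rather than averaged in, the recovered profile matches the decay used to build the toy map. Restricting the same calculation to one region at a time (for example one chromosomal arm) would give per-region curves like those in panel c.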

Fig 3.

Compartments and eigenvectors.

a. To obtain cis compartment profiles, observed maps are first divided by expected. b. Observed/expected maps are decomposed into a sum of eigenvectors and associated eigenvalues. c. Illustration of eigenvector phasing. In mammalian Hi-C maps, the first eigenvector typically, but not always, corresponds to the compartment signal. Since eigenvectors are determined only up to a sign, their orientations are random. To obtain consistent results, the final cis compartment profile (right) is obtained as the eigenvector most correlated with a phasing track (here, GC content), and oriented to have a positive correlation.
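
The phasing step described in panel c can be illustrated with plain NumPy. The sketch below is not the cooltools code: the function name `phased_compartment_profile`, the choice to subtract 1 from observed/expected before decomposition, and the toy data are assumptions. The toy map is built so that the leading eigenvector is not the compartment signal, which is the case phasing is meant to handle.

```python
import numpy as np

def phased_compartment_profile(obs_over_exp, phasing_track, n_eigs=2):
    """Pick the eigenvector most correlated with a phasing track and orient it.

    Returns the oriented eigenvector and its rank among the leading eigenvectors.
    """
    centered = obs_over_exp - 1.0                 # assumption: neutral pixels become 0
    eigvals, eigvecs = np.linalg.eigh(centered)   # symmetric eigendecomposition
    order = np.argsort(-np.abs(eigvals))[:n_eigs] # leading eigenvectors by |eigenvalue|
    corrs = np.array(
        [np.corrcoef(eigvecs[:, k], phasing_track)[0, 1] for k in order]
    )
    best = int(np.argmax(np.abs(corrs)))
    # Flip the sign so the profile correlates positively with the track.
    profile = eigvecs[:, order[best]] * np.sign(corrs[best])
    return profile, best

# Toy data: c is the compartment pattern, d is an unrelated, stronger pattern.
c = np.array([1, 1, 1, -1, -1, -1, 1, -1], dtype=float)
d = np.array([1, -1, 1, 1, -1, 1, -1, -1], dtype=float)
oe = 1.0 + 0.5 * np.outer(c, c) + 0.8 * np.outer(d, d)
gc = 0.4 + 0.05 * c                               # hypothetical GC-content track

profile, rank = phased_compartment_profile(oe, gc)
print(rank, np.round(profile, 3))
```

Here the compartment pattern is recovered from the second eigenvector, and the sign flip guarantees that GC-rich bins receive positive values regardless of the orientation returned by the solver.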

Fig 4.

Pairwise class averaging and saddle plots.

a. Compartment profile for a 15 Mb region of chr2, where more negative values correspond to B regions and more positive values to A regions. b. Digitized compartment profile, quantized into 5 classes by percentile. The lowest class is highlighted with a thicker line. c. Observed/expected map with pairs of B regions highlighted. d. Saddle plot for the 5 digitized classes. The highlighted regions in the observed/expected map contribute to the top-left pixel, boxed in grey.
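
Pairwise class averaging can be sketched as follows, assuming a dense observed/expected matrix and a per-bin profile. The function name `saddle`, the decision to keep the main diagonal, and the toy data are assumptions for illustration rather than the library's behavior.

```python
import numpy as np

def saddle(obs_over_exp, profile, n_classes=5):
    """Digitize a profile by percentile and average O/E over pairs of classes."""
    # Interior percentile edges, e.g. 20/40/60/80 for 5 classes.
    edges = np.percentile(profile, np.linspace(0, 100, n_classes + 1)[1:-1])
    classes = np.digitize(profile, edges)         # integer class 0 .. n_classes-1
    out = np.full((n_classes, n_classes), np.nan)
    for a in range(n_classes):
        for b in range(n_classes):
            block = obs_over_exp[np.ix_(classes == a, classes == b)]
            if block.size:
                out[a, b] = block.mean()
    return classes, out

# Toy data: bins with the same sign of the profile are enriched for contacts.
toy_profile = np.linspace(-1.0, 1.0, 10)
toy_oe = np.exp(np.outer(toy_profile, toy_profile))

classes, S = saddle(toy_oe, toy_profile)
print(classes)
print(np.round(S, 2))
```

In this toy case the top-left entry (lowest class against lowest class, the B-B pairs) and the bottom-right entry are above 1, while the off-diagonal corners are below 1, which is the characteristic saddle shape.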

Fig 5.

Insulation and boundaries.

a. Diamond insulation is calculated as the sum in a sliding window (gray) across the genome, shown here for HFF Micro-C data in a region of chr2 at 10kb resolution (chr2:10900000–11650000). b. The resulting insulation profile is shown in black. Local minima are indicated with dots. Positions of strong boundaries are shown as orange dots, and filtered weak boundaries as blue dots. The two-sided gray arrow shows the boundary strength of the strong boundary at chr2:1146000–1147000, calculated relative to the maximum insulation achieved before a more prominent minimum in either genomic direction. Here, strength is relative to the prominent minimum at chr2:11130000–11140000, and maximum insulation is indicated with a dashed gray line.
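
A compact sketch of the diamond window and the boundary-strength rule from the caption is given below. It is a simplified illustration on a toy matrix: the function names, the log2 normalization by the mean score, and the window placement are assumptions of this example, not the cooltools implementation.

```python
import numpy as np

def diamond_insulation(matrix, window):
    """log2 of the diamond-window sum at each bin, relative to the mean sum."""
    n = matrix.shape[0]
    score = np.full(n, np.nan)
    for i in range(window, n - window + 1):
        # Contacts between the `window` bins upstream and downstream of bin i.
        score[i] = matrix[i - window:i, i:i + window].sum()
    return np.log2(score / np.nanmean(score))

def local_minima(profile):
    """Indices that are strictly lower than both neighbors (NaNs never qualify)."""
    return [i for i in range(1, len(profile) - 1)
            if profile[i] < profile[i - 1] and profile[i] < profile[i + 1]]

def boundary_strength(profile, i):
    """Height to the highest insulation reached before a deeper minimum, per side."""
    def highest_until_lower(indices):
        top = profile[i]
        for j in indices:
            if np.isnan(profile[j]) or profile[j] < profile[i]:
                break
            top = max(top, profile[j])
        return top
    left = highest_until_lower(range(i - 1, -1, -1))
    right = highest_until_lower(range(i + 1, len(profile)))
    return min(left, right) - profile[i]

# Toy map: two 6-bin domains with weak contacts between them.
toy = np.full((12, 12), 0.1)
toy[:6, :6] = 1.0
toy[6:, 6:] = 1.0

ins = diamond_insulation(toy, window=2)
minima = local_minima(ins)
strengths = [boundary_strength(ins, i) for i in minima]
print(minima, strengths)
```

The single minimum falls at the junction between the two domains. Filtering weak boundaries then amounts to thresholding the strength values, with the threshold left as a user choice in this sketch.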

Fig 6.

Dots.

a. Dot calls from a region of chromosome 17, highlighted by squares on the upper triangular portion of the map. Squares show the size of the region scanned by convolutional kernels. b. Illustration of the convolutional kernels used for dot calling around one example, from left to right: ‘donut’, ‘top’, ‘bottom’, ‘lowerleft’. Local enrichment at the center pixel is calculated relative to the shaded regions in each kernel.
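
To make the kernel idea concrete, the sketch below builds two boolean masks and compares a center pixel with the mean of the masked neighborhood. The mask geometry (outer half-width `w`, inner half-width `p`, center row and column excluded from the donut), the function names, and the plain ratio used as enrichment are assumptions of this example; a real dot caller would also account for the expected signal and for statistical significance.

```python
import numpy as np

def donut_mask(w, p):
    """Square of half-width w, minus the inner square of half-width p and the central cross."""
    size = 2 * w + 1
    mask = np.ones((size, size), dtype=bool)
    mask[w - p:w + p + 1, w - p:w + p + 1] = False   # exclude pixels near the center
    mask[w, :] = False                                # exclude the center row
    mask[:, w] = False                                # exclude the center column
    return mask

def lowerleft_mask(w, p):
    """Lower-left quadrant of the same square, minus the inner square."""
    size = 2 * w + 1
    mask = np.zeros((size, size), dtype=bool)
    mask[w + 1:, :w] = True
    mask[w - p:w + p + 1, w - p:w + p + 1] = False
    return mask

def local_enrichment(matrix, i, j, mask):
    """Ratio of the pixel (i, j) to the mean of the masked pixels around it."""
    w = mask.shape[0] // 2
    window = matrix[i - w:i + w + 1, j - w:j + w + 1]
    return matrix[i, j] / window[mask].mean()

# Toy map: flat background with one 4-fold enriched pixel.
toy = np.ones((11, 11))
toy[5, 5] = 4.0

donut = donut_mask(w=3, p=1)
lowerleft = lowerleft_mask(w=3, p=1)
print(local_enrichment(toy, 5, 5, donut), local_enrichment(toy, 5, 5, lowerleft))
```

Requiring enrichment against several differently shaped neighborhoods helps separate focal dots from pixels that are merely bright because they sit on a stripe or inside a dense domain.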

Fig 7.

Pileups and average snippets.

a. Snippets, or regions around called dots, are extracted from the genome-wide map. b. Set of extracted snippets. c. Average pileup for dots, created by averaging the set of snippets.
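
The three panels map onto a short procedure: cut a fixed-size window around each feature, stack the windows, and average them. The sketch below assumes a dense matrix and features given as (row, column) bin indices; the function name `pileup`, the `flank` parameter, and the rule of skipping windows that run off the matrix edge are assumptions for illustration.

```python
import numpy as np

def pileup(matrix, anchors, flank):
    """Stack square snippets centered on each (row, col) anchor and average them."""
    n = matrix.shape[0]
    snippets = []
    for i, j in anchors:
        # Skip anchors whose window would extend past the matrix edge.
        if min(i, j) - flank < 0 or max(i, j) + flank >= n:
            continue
        snippets.append(matrix[i - flank:i + flank + 1, j - flank:j + flank + 1])
    stack = np.stack(snippets)
    return stack, np.nanmean(stack, axis=0)

# Toy map: flat background with two bright pixels standing in for called dots.
toy = np.ones((20, 20))
toy[3, 10] = 5.0
toy[5, 15] = 5.0

# The third anchor is too close to the edge and is dropped.
stack, average = pileup(toy, [(3, 10), (5, 15), (0, 19)], flank=2)
print(stack.shape)
print(average)
```

Averaging with `np.nanmean` lets masked (NaN) pixels in individual snippets be ignored rather than propagated into the pileup.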
