Fig 1.
The workflow of scHiCTools includes five steps: (1) reading input single-cell data in .txt, .hic, or .cool format, generating the summary plots of the cells, and screening cells based on their contact number and contact distance profile, (2) smoothing the scHi-C contact maps using linear convolution, random walk, or network enhancing, (3) calculating the pairwise similarity between cells using fastHiCRep, InnerProduct, or Selfish, (4) embedding or clustering the cells in a low-dimensional Euclidean space using dimension reduction methods, and (5) visualizing the two-dimensional or three-dimensional embedding in a scatter plot.
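As an illustration of the smoothing step (2), the following is a minimal sketch of linear convolution on a single contact map, assuming it amounts to a uniform averaging filter over a square window. The function name `linear_convolution`, the `half_width` parameter, and the edge handling (averaging only over in-bounds bins) are assumptions made for this example and are not taken from the scHiCTools API.

```python
def linear_convolution(matrix, half_width=1):
    """Smooth a square contact map with a uniform (mean) filter.

    Each entry is replaced by the average of the entries in a
    (2 * half_width + 1)-wide window centred on it. At the matrix
    border the average is taken over in-bounds bins only (an
    assumption of this sketch).
    """
    n = len(matrix)
    smoothed = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = 0.0
            count = 0
            for di in range(-half_width, half_width + 1):
                for dj in range(-half_width, half_width + 1):
                    r, c = i + di, j + dj
                    if 0 <= r < n and 0 <= c < n:
                        total += matrix[r][c]
                        count += 1
            smoothed[i][j] = total / count
    return smoothed


# Toy 3 x 3 contact map with a single non-zero bin.
toy_map = [
    [0, 0, 0],
    [0, 9, 0],
    [0, 0, 0],
]
smoothed_map = linear_convolution(toy_map, half_width=1)
```

In this toy case the single contact count is spread over neighbouring bins, which is the intended effect of smoothing on sparse single-cell maps.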
Fig 2.
Summary plots for examining the quality of input scHi-C data.
(a) A histogram of contact numbers in the individual cells. (b) A scatter plot showing the percentage of short-range contacts (<2 Mb) versus the percentage of contacts at the mitotic band (2–12 Mb) in individual cells.
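The two quantities plotted in panel (b) can be sketched as follows. This is a hypothetical helper, not the scHiCTools implementation: the function name, the contact tuple layout `(chrom1, pos1, chrom2, pos2)`, and the choice to use intra-chromosomal contacts as the denominator are all assumptions made for illustration.

```python
def contact_distance_profile(contacts, short_max=2_000_000, mitotic_max=12_000_000):
    """Return (% short-range, % mitotic-band) contacts for one cell.

    contacts: iterable of (chrom1, pos1, chrom2, pos2) with positions in bp.
    Percentages are relative to intra-chromosomal contacts (assumption).
    """
    short = mitotic = total = 0
    for chrom1, pos1, chrom2, pos2 in contacts:
        if chrom1 != chrom2:
            continue  # genomic distance is only defined within a chromosome
        total += 1
        distance = abs(pos2 - pos1)
        if distance < short_max:
            short += 1      # short-range: < 2 Mb
        elif distance <= mitotic_max:
            mitotic += 1    # mitotic band: 2 to 12 Mb
    if total == 0:
        return 0.0, 0.0
    return 100.0 * short / total, 100.0 * mitotic / total


# Toy cell with five contacts.
cell = [
    ("chr1", 1_000_000, "chr1", 1_500_000),   # 0.5 Mb, short-range
    ("chr1", 1_000_000, "chr1", 6_000_000),   # 5 Mb, mitotic band
    ("chr1", 1_000_000, "chr1", 30_000_000),  # 29 Mb, neither
    ("chr1", 1_000_000, "chr2", 1_000_000),   # inter-chromosomal, skipped
    ("chr2", 4_000_000, "chr2", 4_100_000),   # 0.1 Mb, short-range
]
short_pct, mitotic_pct = contact_distance_profile(cell)
```

Plotting `short_pct` against `mitotic_pct` for every cell would give a scatter plot of the kind described in panel (b).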
Fig 3.
Two-dimensional scatter plots of the embedding from the three methods that calculate the similarity between contact matrices, including InnerProduct, fastHiCRep and Selfish.
(Dataset: Nagano et al., 2017). (a) Two-dimensional projection from InnerProduct and MDS shows a clear circular pattern along the four stages of the cell cycle. (b) Two-dimensional projection using fastHiCRep and MDS does not show clear separation between the four stages of the cell cycle. (c) Two-dimensional projection using Selfish and MDS does not show clear separation between the four stages of the cell cycle. (d) Evaluation of the three embedding methods in a cell-cycle phasing task by ACROC. The ACROC values from InnerProduct, fastHiCRep and Selfish are 0.904, 0.858 and 0.642, respectively.
Fig 4.
Two-dimensional embedding using different dimension reduction methods.
(Dataset: Nagano et al., 2017). (a) Two-dimensional projection from InnerProduct/PHATE shows a circular pattern similar to the MDS projection (ACROC: 0.920). (b) Two-dimensional projection from InnerProduct/t-SNE shows a circular pattern (ACROC: 0.901).
Table 1.
Average run time (in seconds) of different methods as the number of cells varies.
(Run time is averaged over 10 replicate experiments, performed on an Intel Xeon W-2175 CPU with a frequency of 2.50 GHz).
Fig 5.
Average ACROC measures from InnerProduct without any smoothing, InnerProduct with random walk smoothing, InnerProduct with linear convolution, and InnerProduct with network enhancing.
(a) When the dataset was sparsified with the first sparsification method, ACROC measures from the four approaches decreased as the down-sampling rate increased. (b) When the dataset was sparsified with the second sparsification method, ACROC measures from the four approaches decreased as the dropout rate increased, but at a high dropout rate (0.7), linear convolution’s ACROC remained high, whereas the other methods’ ACROC dropped below 0.9.
Table 2.
The normalized mutual information (NMI), adjusted Rand index (ARI), and run time of three clustering approaches.
(Data: 750 embryo cells at five differentiation stages, including 1-cell, 2-cell, 4-cell, 8-cell and 64-cell stages, Collombet et al., 2020).
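For reference, the ARI reported in the table can be computed from a pair of label vectors as in the minimal sketch below, which follows the standard contingency-table definition. The function name and the toy stage labels are illustrative only and are not taken from the paper's data or code.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two partitions of the same items."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare; treat as perfect agreement
    # Pairs of items placed together in both, in the reference, and in the prediction.
    sum_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)  # agreement expected by chance
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both partitions
    return (sum_both - expected) / (max_index - expected)


# Toy example: four cells, two stages; cluster ids are arbitrary.
stages = ["1-cell", "1-cell", "2-cell", "2-cell"]
perfect = adjusted_rand_index(stages, [1, 1, 0, 0])
mixed = adjusted_rand_index(stages, [0, 1, 0, 1])
```

A value of 1 indicates that the clustering reproduces the stage labels up to renaming of clusters, while values near 0 (or below) indicate agreement no better than chance.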