HAL-X: Scalable hierarchical clustering for rapid and tunable single-cell analysis

doi:10.1371/journal.pcbi.1010349

Fig 3

Overview of the hierarchical agglomerative learning approach.

(a) For high-dimensional inputs, a dimensionality reduction step (e.g. UMAP [31], t-SNE) is required in order to obtain reliable density estimates. (b) In low-dimensional spaces, density maps can be computed easily. Initial clusters are selected as the neighborhoods of the density modes (see [17]). (c) A k-nearest-neighbor graph is constructed, with similarity measured by the out-of-sample accuracy of classifiers trained in the original high-dimensional feature space: each node represents an individual cluster, and each edge carries a weight given by the accuracy of the corresponding classifier. (d) Nodes are successively merged in pairs following the procedure until a desired out-of-sample accuracy is reached. The end result consists of an interpretable hierarchical classifier and robust clustering assignments. The classifier can be used to predict the labels of new data and potentially to identify outliers.

doi: https://doi.org/10.1371/journal.pcbi.1010349.g003
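
Panels (c) and (d) of the caption can be illustrated with a minimal sketch. It is not the HAL-X implementation and rests on several assumptions that do not come from the source: the initial clusters are taken as given (panels (a) and (b) are skipped), a nearest-centroid classifier stands in for the classifiers trained in the original feature space, all cluster pairs are scored rather than only k nearest neighbors, and the least separable pair is merged first. The names `pairwise_accuracy`, `agglomerate`, and `target_accuracy` are hypothetical.

```python
import random
from itertools import combinations


def centroid(points):
    """Coordinate-wise mean of a list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pairwise_accuracy(a, b):
    """Held-out accuracy of a nearest-centroid classifier separating a from b.

    Assumption: even-indexed points are used for training and odd-indexed
    points for testing, as a stand-in for a proper out-of-sample estimate.
    """
    ca, cb = centroid(a[0::2]), centroid(b[0::2])
    test = [(p, 0) for p in a[1::2]] + [(p, 1) for p in b[1::2]]
    correct = sum(
        (0 if sq_dist(p, ca) <= sq_dist(p, cb) else 1) == label
        for p, label in test
    )
    return correct / len(test)


def agglomerate(initial_clusters, target_accuracy):
    """Merge the least separable pair until every pair reaches target_accuracy.

    Returns the surviving clusters (id -> points) and the merge history as
    tuples (left_id, right_id, merged_id, accuracy_before_merge).
    """
    clusters = {i: list(pts) for i, pts in enumerate(initial_clusters)}
    merges = []
    next_id = len(clusters)
    while len(clusters) > 1:
        # Edge weights: classifier accuracy for each pair of clusters.
        edges = {
            (i, j): pairwise_accuracy(clusters[i], clusters[j])
            for i, j in combinations(sorted(clusters), 2)
        }
        (i, j), acc = min(edges.items(), key=lambda kv: kv[1])
        if acc >= target_accuracy:
            break  # all remaining clusters are separable enough
        clusters[next_id] = clusters.pop(i) + clusters.pop(j)
        merges.append((i, j, next_id, acc))
        next_id += 1
    return clusters, merges


def blob(center, n=200, dim=5):
    """Synthetic cluster: n Gaussian points around (center, ..., center)."""
    return [[random.gauss(center, 1.0) for _ in range(dim)] for _ in range(n)]


random.seed(0)
# Two strongly overlapping clusters and one well-separated cluster.
initial = [blob(0.0), blob(0.3), blob(10.0)]
final, merges = agglomerate(initial, target_accuracy=0.9)
print(len(final), "clusters remain after", len(merges), "merge(s)")
```

In this toy setting the two overlapping clusters cannot be told apart reliably, so they are merged, while the distant cluster stays separate. Raising or lowering `target_accuracy` changes how aggressively clusters are merged, which is the tunable trade-off the caption describes; the merge history plays the role of the hierarchy.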