clusterExperiment and RSEC: A Bioconductor package and framework for clustering of single-cell and other large gene expression datasets

doi:10.1371/journal.pcbi.1006378

Fig 3

Comparison of methods and tuning parameter choices using clusterMany and plotClusters, demonstrated on the olfactory epithelium dataset.

The figure illustrates how clusterExperiment can be used to compare clustering methods and tuning parameter choices, with the function clusterMany running the clustering procedures and the function plotClusters visualizing the results. (a) shows the clustering results of PAM for different choices of K, the number of clusters. (b) shows the clustering results for different between-sample distance measures. ‘Euclidean’ refers to the standard Euclidean distance; ‘Pearson Corr.’ and ‘Spearman’s Rho’ refer to a correlation-based distance, d(i, j) = (1 − ρ(i, j))/2, where ρ(i, j) is, respectively, the standard Pearson correlation coefficient or the robust Spearman rank correlation coefficient between samples i and j. (c) shows the clustering results for different choices of clustering algorithm. Each method is shown with the “best” choice of K, as determined by the maximum average silhouette width; “NN” refers to a user-defined, nearest-neighbor clustering (see Section Data used in the Manuscript in S1 Text). Also shown is the result of applying the consensus and merging steps of the RSEC workflow to this set of clusterings. The clusterings in (a) and (c) were run with the top 50 PCA dimensions as input. The clusterings in (b) compare different between-sample distance measures and were therefore run directly on the gene expression measures, after filtering to the 1,000 most variable genes as determined by the median absolute deviation (MAD), a robust alternative to the variance.
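
As a rough illustration of two quantities named in this caption, the sketch below computes the correlation-based distance d(i, j) = (1 − ρ(i, j))/2 with either a Pearson or a Spearman ρ, and ranks genes by MAD to keep the most variable ones. It is plain Python on a made-up toy matrix, not the clusterExperiment implementation; the function names, the toy values, and the choice of an unscaled MAD are all assumptions made for the example.

```python
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlation_distance(x, y, rho=pearson):
    """d(i, j) = (1 - rho(i, j)) / 2, which lies between 0 and 1."""
    return 0.5 * (1.0 - rho(x, y))


def mad(values):
    """Unscaled median absolute deviation from the median (assumption:
    no consistency constant is applied; it would not change the ranking)."""
    m = median(values)
    return median([abs(v - m) for v in values])


def top_variable_genes(expr, n_top):
    """Indices of the n_top rows (genes) of expr with the largest MAD."""
    by_mad = sorted(range(len(expr)), key=lambda g: mad(expr[g]), reverse=True)
    return sorted(by_mad[:n_top])


# Hypothetical toy matrix: rows are genes, columns are samples.
expr = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 5.0, 5.0, 5.1],
    [2.0, 2.0, 3.0, 3.0],
    [0.0, 10.0, 0.0, 10.0],
]

keep = top_variable_genes(expr, 2)
filtered = [expr[g] for g in keep]
# Between-sample distances compare columns of the filtered matrix.
samples = [list(col) for col in zip(*filtered)]
d_pearson = correlation_distance(samples[0], samples[1])
d_spearman = correlation_distance(samples[0], samples[1], rho=spearman)
```

Dividing 1 − ρ by 2 maps perfectly correlated samples to distance 0 and perfectly anti-correlated samples to distance 1. Because Spearman's ρ depends only on ranks, it gives distance 0 for any strictly increasing relationship, linear or not.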

doi: https://doi.org/10.1371/journal.pcbi.1006378.g003