mbkmeans: Fast clustering for single cell data using mini-batch k-means
Fig 4
The speed and memory usage of the on-disk mbkmeans implementation depend on the structure of the on-disk file.
Performance evaluation (y-axis) in terms of (A) maximum memory (RAM) used (GB) and (B) elapsed time (minutes), each repeated 10 times, for datasets of increasing size (x-axis) with N = 75,000, 150,000, 300,000, 500,000, 750,000, and 1,000,000 observations, using our desktop configuration. Results for an HDF5 file indexed by gene are shown in blue, by cell in red, as a single chunk in purple, and with the default indexing in green. The single-chunk configuration could be run only for the smallest dataset size (N = 75,000). For mbkmeans, we used k = 15 and a batch size of b = 500 observations.