Fig 1.
A flow-chart of the ARK method.
Fig 2.
Results for the random K-means clustering on the simulated data.
Mean VD error at the genus level as a function of the number of clusters. Note that ARK reduces the error of each underlying method.
Fig 3.
Results for the random K-means clustering on the simulated data.
Mean increase in execution time, expressed as a factor relative to running SEK or Quikr without ARK, as a function of the number of clusters. The dashed line has slope 1.
Fig 4.
Comparison of the underlying algorithms with and without ARK.
Results are for random K-means clustering on the simulated data with the number of clusters fixed at 75. Shown is the mean VD error at the genus level. Results for RDP’s NBC are included for comparison (compare with Fig 2(b) of [3]).
Fig 5.
Comparison of the underlying algorithms with and without ARK.
Results are for random K-means clustering on the simulated data with the number of clusters fixed at 75. Boxplot of the execution times for the individual simulated samples. Mean execution times for Quikr and ARK Quikr were 1.75 seconds and 4.71 minutes, respectively; for SEK and ARK SEK they were 21.26 seconds and 19.21 minutes, respectively. Mean execution time for RDP’s NBC was 38.19 minutes.
Fig 6.
Total execution time for each method on the 28 samples of real biological data.
Fig 7.
PCoA plots using the Jensen-Shannon divergence for RDP’s NBC.
Fig 8.
PCoA plots using the Jensen-Shannon divergence for ARK SEK.
Fig 9.
ARK Quikr PCoA plots (using the Jensen-Shannon divergence) on the real biological data.
Samples are labeled by body site. Note that the samples cluster by body site.
Fig 10.
ARK Quikr PCoA plots (using the Jensen-Shannon divergence) on the real biological data.
Samples are labeled by variable region. Note that the samples cluster by variable region.
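
The PCoA plots in Figs 7 to 10 rest on pairwise Jensen-Shannon divergences between per-sample taxonomic profiles. The following is a minimal sketch of such a computation, not the pipeline used to produce the figures. It assumes that each sample is a vector of relative abundances, that the square root of the divergence is used as the distance (an assumption made here because it is a metric), and that classical (Torgerson) scaling is an acceptable stand-in for the PCoA implementation actually used. The function names and the toy profiles are hypothetical.

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence (base-2 logarithm) between two profiles."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()              # normalise to probability vectors
    q = q / q.sum()
    m = 0.5 * (p + q)            # mixture distribution

    def kl(a, b):
        mask = a > 0             # terms with a == 0 contribute nothing
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def js_distance_matrix(profiles):
    """Pairwise sqrt(JSD) distances (assumed choice of distance)."""
    n = len(profiles)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            jsd = max(js_divergence(profiles[i], profiles[j]), 0.0)
            dist[i, j] = dist[j, i] = np.sqrt(jsd)
    return dist

def pcoa(dist, n_axes=2):
    """Classical PCoA: eigendecomposition of the double-centred matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centre = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centre @ (d ** 2) @ centre
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:n_axes]    # largest eigenvalues first
    vals = np.clip(vals[order], 0.0, None)     # drop negative eigenvalues
    return vecs[:, order] * np.sqrt(vals)

# Hypothetical genus-level profiles for three samples (illustrative only).
toy_profiles = [
    [0.70, 0.20, 0.10],
    [0.65, 0.25, 0.10],
    [0.10, 0.20, 0.70],
]
coords = pcoa(js_distance_matrix(toy_profiles), n_axes=2)
```

Each row of `coords` gives one sample's position on the first two principal coordinates; colouring the points by body site or by variable region would give plots of the kind described in Figs 9 and 10.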