CLUSTOM-CLOUD: In-Memory Data Grid-Based Software for Clustering 16S rRNA Sequence Data in the Cloud Environment
The figure summarizes the workflow of distributed processing in CLUSTOM-CLOUD. (A) The number of all possible sequence pairs that need to be compared for distance calculation is represented as a right-angled triangle; n represents the total number of sequences. (B) A chunk-size based on system granularity is determined to distribute only a fixed number of sequence pairs (shown here with 2 K) to each cluster node. (C) Each task (e.g., Ti) is assigned to nodes from top to bottom and left to right. (D) Each node takes and processes tasks in the order of task priority. (E) The assigned task (Ti) is divided into smaller sub-tasks (tj) and processed in parallel using multi-threads (wk) depending on the number of threads on the cluster node.