Browse Subject Areas

Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here.

< Back to Article

CLUSTOM-CLOUD: In-Memory Data Grid-Based Software for Clustering 16S rRNA Sequence Data in the Cloud Environment

Fig 2

Schematic diagram of clustering workflow.

16S rRNA sequences in FASTA format are provided as input. Each input file, already checked for low-quality and chimera errors, is pre-processed by the removal of duplicates and transformation of k-mer into numeric values. A fixed number of sequence pairs are distributed to each cluster node for k-mer (initial) and NW (refinement) distance calculation. Processed results are merged upon completion of each unit task. Clusters are determined based on criteria described previously [11] and in text. Output files are created and data are cleared from memory.

Fig 2