CLUSTOM-CLOUD: In-Memory Data Grid-Based Software for Clustering 16S rRNA Sequence Data in the Cloud Environment
16S rRNA sequences in FASTA format are provided as input. Each input file, already screened for low-quality and chimeric sequences, is pre-processed by removing duplicate sequences and converting k-mers into numeric values. A fixed number of sequence pairs is distributed to each cluster node for k-mer (initial) and Needleman-Wunsch (NW, refinement) distance calculation. Processed results are merged upon completion of each unit task. Clusters are determined based on criteria described previously and in the text. Output files are created, and data are cleared from memory.
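The pre-processing and initial distance steps of this workflow can be illustrated with a minimal Python sketch. It is not the CLUSTOM-CLOUD implementation: the function names, the base-4 integer encoding of k-mers, the shared-k-mer distance formula, and the batch size are all assumptions chosen for illustration, and the NW refinement and the data-grid distribution layer are omitted.

```python
from collections import Counter
from itertools import combinations

# Assumed 2-bit encoding of nucleotides; the source does not specify one.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def dedupe(seqs):
    """Drop exact duplicate sequences, keeping first-seen order."""
    return list(dict.fromkeys(seqs))


def kmer_codes(seq, k):
    """Encode each k-mer as a base-4 integer, skipping k-mers with ambiguous bases."""
    codes = []
    for i in range(len(seq) - k + 1):
        code = 0
        for base in seq[i:i + k]:
            if base not in BASE_CODE:
                break  # ambiguous base (e.g. N): discard this k-mer
            code = code * 4 + BASE_CODE[base]
        else:
            codes.append(code)
    return codes


def kmer_distance(a, b, k):
    """Assumed distance: 1 minus the fraction of shared k-mers (multiset overlap)."""
    ca, cb = Counter(kmer_codes(a, k)), Counter(kmer_codes(b, k))
    shared = sum((ca & cb).values())  # multiset intersection size
    denom = min(len(a), len(b)) - k + 1
    return 1.0 - shared / denom if denom > 0 else 1.0


def pair_batches(n, batch_size):
    """Yield fixed-size batches of index pairs (i, j) with i < j, one batch per unit task."""
    batch = []
    for pair in combinations(range(n), 2):
        batch.append(pair)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch
```

In a distributed setting, each batch from `pair_batches` would be handed to one node, and the per-batch distance results merged as each unit task completes. A real pipeline would then pass candidate pairs below some k-mer distance threshold to NW alignment for refinement.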