Recentrifuge: Robust comparative analysis and contamination removal for metagenomics
Fig 4. Outline of the Recentrifuge package with its ecosystem and main data flows.
Recentrifuge (rcf) accepts output files from diverse taxonomic classifiers, such as Centrifuge [7], LMAT [21], CLARK [39], CLARK-S [40], and Kraken [41], among others, enabling robust taxonomic analysis for metagenomics. Recentrifuge also supports the LMAT plasmid assignment system [15]. In addition, Recentrifuge can write its results in different text field formats, enabling further longitudinal (time or space) series analysis, for example with Dynomics (in development). The NCBI Taxonomy dump databases [44] are easily retrieved using Retaxdump. The Rextract utility extracts a subset of reads of interest from single-end or paired-end FASTQ input files; the extracted reads can be used in any downstream application, such as genome assembly and visualization. Remock easily creates mock Centrifuge samples, which are useful both for code validation and for including previously known contaminants. Retest is the script in charge of testing the other components of the package (denoted by dashed lines). Dotted lines indicate software and procedures outside Recentrifuge.
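At its core, the Rextract step in this data flow amounts to filtering FASTQ records by read ID while keeping paired mates together. The Python sketch below is illustrative only: it is not Rextract's code or command-line interface, and the function names (`read_fastq`, `read_id`, `extract_reads`, `format_fastq`) are hypothetical. It assumes that the set of wanted read IDs is already known (for example, taken from the classifier output for a taxon of interest), that paired-end data come as one file per mate, and that mate IDs differ at most by a trailing `/1` or `/2`.

```python
from typing import Iterable, Iterator, List, Optional, Set, Tuple

# A FASTQ record as its four lines: header, sequence, separator ('+'), qualities.
FastqRecord = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records from any iterable of lines (e.g. an open file)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        rest = [next(it, None) for _ in range(3)]
        if None in rest:
            raise ValueError(f"Truncated FASTQ record: {header!r}")
        seq, plus, qual = (line.rstrip("\r\n") for line in rest)
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"Malformed FASTQ record: {header!r}")
        yield header, seq, plus, qual


def read_id(header: str) -> str:
    """Read ID = first token after '@', without a trailing /1 or /2 mate suffix."""
    token = header[1:].split()[0]
    return token[:-2] if token.endswith(("/1", "/2")) else token


def extract_reads(
    wanted: Set[str],
    mate1: Iterable[str],
    mate2: Optional[Iterable[str]] = None,
) -> Tuple[List[FastqRecord], List[FastqRecord]]:
    """Keep the records (or read pairs) whose ID is in `wanted`.

    Returns (kept_mate1, kept_mate2); kept_mate2 is empty for single-end input.
    """
    kept1: List[FastqRecord] = []
    kept2: List[FastqRecord] = []
    if mate2 is None:
        kept1 = [rec for rec in read_fastq(mate1) if read_id(rec[0]) in wanted]
        return kept1, kept2
    # Paired-end: walk both files in lockstep so mates stay together.
    # Note: zip() stops at the shorter file; a real tool should also check lengths.
    for rec1, rec2 in zip(read_fastq(mate1), read_fastq(mate2)):
        rid = read_id(rec1[0])
        if rid != read_id(rec2[0]):
            raise ValueError(f"Mate IDs out of sync: {rid!r} vs {read_id(rec2[0])!r}")
        if rid in wanted:
            kept1.append(rec1)
            kept2.append(rec2)
    return kept1, kept2


def format_fastq(records: Iterable[FastqRecord]) -> str:
    """Serialize records back to FASTQ text for downstream tools."""
    return "".join("\n".join(rec) + "\n" for rec in records)
```

Because `read_fastq` accepts any iterable of lines, open file handles can be passed directly, e.g. `extract_reads(ids, open("R1.fastq"), open("R2.fastq"))`, where the file names are placeholders. Walking both mate files in lockstep keeps the output pairs synchronized, which is what paired-end assemblers expect.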