Centriflaken: An automated data analysis pipeline for assembly and in silico analyses of foodborne pathogens from metagenomic samples

doi:10.1371/journal.pone.0329425

Fig 1.

A brief overview of the centriflaken pipeline.

The pipeline takes a UNIX path to FASTQ files as input. Multiple read files belonging to the same sample are merged, and reads shorter than 4000 bp are excluded. Reads belonging to a taxon of interest are then extracted using Centrifuge, and the extracted reads are assembled using Flye. Kraken2 is then used to extract contigs binned as the taxon of interest, and these contigs are used for downstream analyses such as subtyping, virulotyping, and AMR gene finding. The outputs of all downstream processes are summarized in a report generated with MultiQC.
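The first step in the caption (merging the read files of one sample and dropping reads under 4000 bp) can be illustrated with a minimal Python sketch. It is not the centriflaken implementation, which this caption does not describe at the code level. The function names `parse_fastq` and `merge_and_filter` are hypothetical, and the sketch assumes well-formed, uncompressed, four-line FASTQ records held in memory.

```python
from typing import Iterable, Iterator, List, Tuple

# Length cutoff stated in the figure caption: reads shorter than this are excluded.
MIN_READ_LENGTH = 4000

FastqRecord = Tuple[str, str, str]  # (header, sequence, quality)


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield (header, sequence, quality) from well-formed 4-line FASTQ records.

    Assumption: records are not wrapped across lines and the input fits in memory.
    """
    stripped: List[str] = [ln.rstrip("\n") for ln in lines if ln.strip()]
    for i in range(0, len(stripped) - 3, 4):
        header, seq, _plus, qual = stripped[i:i + 4]
        yield header, seq, qual


def merge_and_filter(
    files: Iterable[Iterable[str]], min_length: int = MIN_READ_LENGTH
) -> Iterator[FastqRecord]:
    """Merge the reads of all files for one sample, keeping reads >= min_length bp."""
    for lines in files:
        for header, seq, qual in parse_fastq(lines):
            if len(seq) >= min_length:
                yield header, seq, qual


if __name__ == "__main__":
    # Two hypothetical read files from the same sample.
    file_a = ["@read1\n", "A" * 4000 + "\n", "+\n", "I" * 4000 + "\n"]
    file_b = ["@read2\n", "C" * 3999 + "\n", "+\n", "I" * 3999 + "\n"]
    for header, seq, _ in merge_and_filter([file_a, file_b]):
        print(header, len(seq))
```

In this sketch a read of exactly 4000 bp is kept, because the caption excludes only reads whose length is less than 4000 bp.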

Table 1.

In silico virulence factor analysis of metagenome-assembled genomes (MAGs) obtained using centriflaken for the Maguire et al. (2021) dataset, showing the agreement between the results obtained by both workflows (manual vs. centriflaken).

Table 2.

Comparative virulotyping results obtained by both workflows (manual vs. centriflaken) for the same ONT data generated by Maguire et al. (2021) [24].

Fig 2.

Comparative analysis of EDL933 genome assemblies obtained by centriflaken during the initial validation with known samples.

This figure compares the genome of the EDL933 strain used in Maguire et al. (2021) with assemblies obtained using centriflaken for the same samples at various EDL933 spiking levels. The analysis demonstrates the recovery of the E. coli O157:H7 metagenome-assembled genome (MAG) in either completely closed or fragmented form (see Table 1). Each horizontal track represents EDL933-matching contigs extracted from a sample. Homologous segments across genomes are shown in the same color and connected. Sequence coordinates in base pairs are shown on the respective scales. The reference genome, labeled EDL933_2 and marked with a star (*), serves as the basis for comparison.

Table 3.

qPCR results for stx and wzy genes for the 21 enriched samples.

Table 4.

Sequencing statistics, serotypes, and stx and eae genes identified per sample.

Fig 3.

Workflow of the complete precision metagenomic analysis.

The figure shows the sequential steps from sample collection and overnight culturing to DNA extraction, qPCR, library preparation, Oxford Nanopore sequencing, and data analysis using the centriflaken pipeline, ending with the generation of the final report. This overview summarizes the rapid, end-to-end process used for the detection and characterization of STEC in agricultural water samples.

Table 5.

STEC virulotyping of 21 enriched agricultural water samples by centriflaken.
