Simulation of 69 microbial communities indicates sequencing depth and false positives are major drivers of bias in prokaryotic metagenome-assembled genome recovery
Fig 5
First, species were selected and their sequences retrieved from the National Center for Biotechnology Information (NCBI). Next, community profiles were generated based on species abundance, taxonomic distribution and sequencing depth. A metagenome was simulated for each community profile using MetaSim [42]. A quality check was performed to remove adapters and short reads. Reads were then assembled into scaffolds with IDBA-UD [43], followed by post-assembly quality checks. For genome recovery, three pipelines were used: DAS Tool (DT) [19], Multi-metagenome (MM) [20] and the pipeline used to recover more than 8000 metagenome-assembled genomes (MAGs) (8K) [12]. Completeness and contamination of the MAGs were assessed using CheckM [44]. Finally, the MAGs were classified taxonomically.
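The community-profile step can be illustrated with a minimal sketch. This is not the authors' procedure and does not use MetaSim's interface. The function name `community_profile`, the log-normal abundance model, the `sigma` parameter and the example genome lengths are all assumptions made for illustration. The sketch only shows how a fixed sequencing depth (total reads) might be divided among species according to relative abundance and genome length.

```python
import random


def community_profile(genome_lengths, total_reads, sigma=1.0, seed=0):
    """Split a fixed number of reads among species (hypothetical helper).

    genome_lengths: dict mapping species name to genome length in bp
    total_reads:    sequencing depth, expressed as total number of reads
    sigma:          spread of the assumed log-normal abundance distribution
    """
    rng = random.Random(seed)
    # Assumption: relative cell abundances follow a log-normal distribution.
    abundance = {sp: rng.lognormvariate(0.0, sigma) for sp in genome_lengths}
    # Expected reads scale with abundance times genome length.
    weights = {sp: abundance[sp] * genome_lengths[sp] for sp in genome_lengths}
    total_weight = sum(weights.values())
    counts = {sp: int(total_reads * w / total_weight) for sp, w in weights.items()}
    # Give reads lost to rounding down to the species with the largest weight.
    top = max(weights, key=weights.get)
    counts[top] += total_reads - sum(counts.values())
    return counts


# Illustrative genome lengths only; not taken from the study.
example = {"species_a": 4_000_000, "species_b": 2_000_000, "species_c": 5_000_000}
profile = community_profile(example, total_reads=1_000_000)
```

Changing `total_reads` while keeping the same seed gives the same community at a different sequencing depth, which is the kind of comparison the workflow describes.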