Fig 1.
Overview of the AIEdit polishing pipeline.
In the pre-processing stage, ntCard and ntStat construct Bloom filters (orange cylinders) containing high-confidence k-mers and spaced seeds from the sequencing reads (upper-left grey box). These Bloom filters are used to identify unsupported k-mers in the assembly and form the input to the error pattern model. Black (white) cells in the inputs to the error pattern model represent spaced seed “care” (“do not care”) positions; patterns and hits are passed to the pink and cyan GRUs, respectively. The final hidden states of these two GRUs are concatenated and passed to the linear layer (dark blue), which predicts the error pattern that guides base-level corrections. Finally, a single round of ntEdit is applied as a post-processing step to correct potentially missed errors, reusing the k-mer Bloom filter. Input and output files are shown in grey boxes.
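The error pattern model described above (two GRU encoders over the pattern and hit sequences, whose final hidden states are concatenated and fed to a linear layer) can be sketched as follows. This is a minimal NumPy illustration of the dataflow only: the hidden size, seed length, number of error-pattern classes, and gate conventions are illustrative assumptions, not AIEdit's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def make_params(d_in, d_h):
    # Random weights for the update (z), reset (r), and candidate (h) gates.
    return ({g: rng.normal(scale=0.1, size=(d_h, d_in)) for g in "zrh"},
            {g: rng.normal(scale=0.1, size=(d_h, d_h)) for g in "zrh"},
            {g: np.zeros(d_h) for g in "zrh"})

def gru_encode(seq, params):
    """Run a standard GRU over seq and return the final hidden state."""
    W, U, b = params
    h = np.zeros(W["z"].shape[0])
    for x in seq:
        z = sigmoid(W["z"] @ x + U["z"] @ h + b["z"])      # update gate
        r = sigmoid(W["r"] @ x + U["r"] @ h + b["r"])      # reset gate
        cand = np.tanh(W["h"] @ x + U["h"] @ (r * h) + b["h"])
        h = (1.0 - z) * h + z * cand
    return h

# Hypothetical sizes, for illustration only.
SEED_LEN, D_H, N_CLASSES = 18, 32, 5

# Spaced-seed "care"/"do not care" patterns and Bloom filter hit indicators.
patterns = [rng.integers(0, 2, 4).astype(float) for _ in range(SEED_LEN)]
hits = [rng.integers(0, 2, 1).astype(float) for _ in range(SEED_LEN)]

# Encode each input stream with its own GRU, then concatenate final states.
h_cat = np.concatenate([gru_encode(patterns, make_params(4, D_H)),
                        gru_encode(hits, make_params(1, D_H))])

# Linear layer over the concatenated states predicts the error pattern class.
W_out = rng.normal(scale=0.1, size=(N_CLASSES, 2 * D_H))
logits = W_out @ h_cat
probs = np.exp(logits - logits.max())
probs /= probs.sum()
predicted_pattern = int(probs.argmax())
```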
Fig 2.
Polishing accuracy and computational performance of AIEdit compared to other tools.
Polishing accuracy is reported as the average number of mismatches and indels per 100 kbp. The numbers of mismatches and indels in the unpolished assemblies are shown as grey circles, and their QV scores are shown as grey horizontal lines. For simulated short-read datasets (a, b, and c), AIEdit (dark cross), ntEdit alone (blue cross), and POLCA (green cross) achieved similar error correction rates; their results are shown on a logarithmic scale to highlight minor differences. The run times for the simulated long-read experiments (d, e, and f) are also on a logarithmic scale due to the high run times of POLCA and Medaka (orange cross, when available) compared to the other tools. All other plots are on a linear scale. “Baseline” (grey cross) refers to the unpolished assembly. Run time and peak memory usage include the computational resources allocated for each tool’s entire pipeline, as reported by /usr/bin/time -pv. For experimental long-read datasets (g, h, and i), k-mer QV scores of the polished assemblies are calculated by Merqury.
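The two accuracy measures in the legend are directly related: errors per 100 kbp is a scaled per-base error rate, and QV is the same rate on the Phred scale (QV = -10 log10 of the per-base error probability). A minimal sketch of this bookkeeping, with made-up example counts:

```python
import math

def errors_per_100kbp(n_errors, n_bases):
    """Average mismatches + indels per 100 kbp of assembly sequence."""
    return n_errors / n_bases * 100_000

def qv(n_errors, n_bases):
    """Phred-scaled quality: QV = -10 * log10(per-base error rate)."""
    return -10.0 * math.log10(n_errors / n_bases)

# Hypothetical example: 100 remaining errors in a 100 kbp assembly gives a
# per-base error rate of 1e-3, i.e. roughly QV 30; 10x fewer errors adds
# 10 QV points.
rate_a = errors_per_100kbp(100, 100_000)   # 100 errors per 100 kbp
qv_a = qv(100, 100_000)                    # about QV 30
qv_b = qv(10, 100_000)                     # about QV 40
```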