AIEdit: Alignment-free genome assembly polisher trained on spaced seed match patterns
Fig 2
Polishing accuracy and computational performance of AIEdit compared to other tools.
Polishing accuracy is reported as the average number of mismatches and indels per 100 kbp. The numbers of mismatches and indels in the unpolished assemblies are shown as grey circles, and their QV scores are shown as grey horizontal lines. For the simulated short-read datasets (a, b, and c), AIEdit (dark cross), ntEdit alone (blue cross), and POLCA (green cross) achieved similar error correction rates, so their results are shown on a logarithmic scale to highlight minor differences. The run times for the simulated long-read experiments (d, e, and f) are also shown on a logarithmic scale because POLCA and Medaka (orange cross, where available) have much higher run times than the other tools. All other plots use a linear scale. “Baseline” (grey cross) refers to the unpolished assembly. Run time and peak memory usage include the computational resources allocated to each tool’s entire pipeline, as reported by /usr/bin/time -pv. For the experimental long-read datasets (g, h, and i), k-mer QV scores of the polished assemblies are calculated by Merqury.
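The caption reports accuracy both as errors per 100 kbp and as QV scores. The sketch below relates the two, assuming the standard Phred scale (QV = -10 log10 of the per-base error rate) and assuming each mismatch or indel counts as one erroneous base. The caption does not say exactly how the grey-line QVs were derived. The second function follows the k-mer QV formula described for Merqury, where assembly k-mers absent from the read set are treated as errors; the function names and the example inputs are hypothetical and should be checked against the Merqury documentation.

```python
import math


def qv_from_errors_per_100kbp(errors_per_100kbp: float) -> float:
    """Phred-scaled QV, assuming each mismatch or indel counts as one erroneous base."""
    if errors_per_100kbp <= 0:
        return math.inf  # no observed errors: QV is unbounded
    error_rate = errors_per_100kbp / 100_000  # per-base error probability
    return -10 * math.log10(error_rate)


def kmer_qv(asm_only_kmers: int, total_asm_kmers: int, k: int) -> float:
    """k-mer QV in the style of Merqury (hypothetical helper, not Merqury's code)."""
    if asm_only_kmers == 0:
        return math.inf  # every assembly k-mer is supported by the reads
    # Fraction of assembly k-mers that are not found in the read set.
    kmer_error = asm_only_kmers / total_asm_kmers
    # A k-mer is correct only if all k of its bases are correct, so
    # (1 - base_error) ** k == 1 - kmer_error; solve for base_error.
    base_error = 1 - (1 - kmer_error) ** (1 / k)
    return -10 * math.log10(base_error)


# Hypothetical inputs, not values taken from Fig 2.
print(f"QV at 10 errors per 100 kbp: {qv_from_errors_per_100kbp(10):.1f}")
print(f"k-mer QV (100 of 1,000,000 k-mers unsupported, k=21): {kmer_qv(100, 1_000_000, 21):.1f}")
```

Under this convention, every tenfold drop in errors per 100 kbp raises the QV by 10, which is why small differences in error counts between polishers can still correspond to visible QV gaps.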