Integrating Hi-C links with assembly graphs for chromosome-scale assembly
Table 2
Scaffold and correctness statistics for NA12878 assemblies scaffolded with different Hi-C libraries.
“True links” is an idealized case where the Hi-C links have been filtered in advance. The NG50 of human reference GRCh38 is 145 Mbp. The ratio between NG50 and NGA50 represents how many erroneous joins affect large scaffolds in the assembly. The bigger the difference between these values, the more aggressive the scaffolding was at the expense of accuracy. Longest chunk represents the longest error-free portion of the scaffolds. We observed that the 3D-DNA mis-assembly detection was overly aggressive in some cases, and so we ran some assemblies both with and without this feature. For the Illumina assembly as an input, 3D-DNA w correction did not finish within two weeks and is omitted. An evaluation of a previously published [20] 3D-DNA assembly from short-read contigs is included in S3 Table but did not exceed SALSA2’s NGA50.