Variant Callers for Next-Generation Sequencing Data: A Comparison Study

doi:10.1371/journal.pone.0075619

Figure 1.

Unified steps of the pipelines.

Blue rounded rectangles represent the reads, blue rectangles represent the mapping-QC procedures, and red callouts indicate the tools. The dashed curved arrow represents a reduced version of the pipeline that skips the mapping-QC steps.

Figure 2.

Boxplots of validation measures for the pipelines.

a. Number of SNPs. b. Transition/transversion (Ti/Tv) ratio. c. Number of indels. d. True positive calls. e. False positive calls plus error genotypes. f. Re-discovery rate (positive predictive value). g. Sensitivities. h. Specificities. All panels are shown per pipeline. The green bars extend to the first quartiles, the red bars to the medians, and the blue bars to the third quartiles; the error bar caps show the ranges. SAMt, glfS and glfM stand for SAMtools, glfSingle and glfMultiples, respectively. “_S” and “_M” denote the single-sample and multiple-sample calling strategies. R and F denote raw and filtered variants.
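
The measures in panels b and f to h can be illustrated with a short Python sketch. It assumes the standard textbook definitions (PPV = TP / (TP + FP), sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), and Ti/Tv as the count of A<->G and C<->T substitutions divided by the count of all other substitutions); the study's exact counting rules, for example how error genotypes are handled, may differ. The function names and the input layout are hypothetical and chosen only for this illustration.

```python
# Illustrative sketch only: standard definitions, hypothetical function names.

# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def _ratio(numerator, denominator):
    """Divide safely, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def validation_metrics(tp, fp, fn, tn):
    """Return PPV, sensitivity and specificity from confusion counts."""
    return {
        "ppv": _ratio(tp, tp + fp),          # positive predictive value
        "sensitivity": _ratio(tp, tp + fn),  # true positive rate
        "specificity": _ratio(tn, tn + fp),  # true negative rate
    }


def ti_tv_ratio(snps):
    """Return the Ti/Tv ratio for an iterable of (ref, alt) base pairs."""
    ti = tv = 0
    for ref, alt in snps:
        pair = (ref.upper(), alt.upper())
        if pair[0] == pair[1]:
            continue  # not a substitution; ignore
        if pair in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    return _ratio(ti, tv)


if __name__ == "__main__":
    # Made-up counts, not values from the study.
    print(validation_metrics(tp=90, fp=10, fn=30, tn=870))
    print(ti_tv_ratio([("A", "G"), ("C", "T"), ("A", "C")]))
```

The counts in the usage lines are invented for demonstration and do not come from the study.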

Figure 3.

Shared variants by single-sample pipelines and their validation.

a. Average pairwise overlap between filtered variants called by SAMtools (blue), GATK (red), glfSingle (olive green) and Atlas2 (purple). b and c. Boxplots of sensitivities and specificities for shared variants. P13 stands for variants shared by pipelines 1 (SAMtools) and 3 (GATK); P135 stands for variants shared by pipelines 1, 3 and 5 (glfSingle); and so on.
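
As a rough illustration of how shared call sets such as P13 or P135 might be derived, the Python sketch below treats each pipeline's filtered calls as a set of (chromosome, position, ref, alt) tuples. The caption does not state how the overlap is normalized, so the sketch assumes intersection size divided by union size (the Jaccard index); the function names and the toy variants are hypothetical.

```python
# Illustrative sketch only: hypothetical names, toy data, and an assumed
# normalization (intersection over union) for the pairwise overlap.
from itertools import combinations


def pairwise_overlap(call_sets):
    """Return {(name_a, name_b): |A & B| / |A | B|} for every pair of callers."""
    overlaps = {}
    for (name_a, set_a), (name_b, set_b) in combinations(call_sets.items(), 2):
        union = set_a | set_b
        overlaps[(name_a, name_b)] = (
            len(set_a & set_b) / len(union) if union else float("nan")
        )
    return overlaps


def shared_variants(call_sets, names):
    """Return the variants called by every pipeline listed in `names`."""
    return set.intersection(*(call_sets[name] for name in names))


if __name__ == "__main__":
    # Toy variants keyed as (chromosome, position, ref, alt).
    v1, v2, v3 = ("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 50, "G", "A")
    v4, v5 = ("2", 75, "T", "C"), ("3", 10, "A", "C")
    calls = {
        "SAMtools": {v1, v2, v3},
        "GATK": {v2, v3, v4},
        "glfSingle": {v3, v4, v5},
    }
    print(pairwise_overlap(calls))
    # Analogue of a three-pipeline shared set such as P135.
    print(shared_variants(calls, ["SAMtools", "GATK", "glfSingle"]))
```

Keying variants on the full (chromosome, position, ref, alt) tuple means that two callers reporting different alternate alleles at the same site are not counted as shared; matching on position alone would be a looser alternative.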

Figure 4.

Positive predictive value and sensitivity of callers for WGS data simulated at different coverage settings.

The SAMtools, GATK and glfSingle labels represent the sensitivities for SNPs; Stindel represents the sensitivity for indels called by SAMtools. a. Positive predictive value. b. Sensitivity.