Highly Sensitive and Specific Detection of Rare Variants in Mixed Viral Populations from Massively Parallel Sequence Data
Figure 4
Phase information increased sensitivity, and base quality scores increased specificity.
We compared V-Phaser to alternative versions of V-Phaser with specific components disabled. In the No Phase version, V-Phaser called variants without phase information. In the Uniform Errors version, V-Phaser estimated uniform error rates within homopolymer and nonhomopolymer regions without regard to assigned base qualities. In the No Filtering version, V-Phaser did not filter out low-quality bases. (A) Phase information increased sensitivity. The version without phase information attained a sensitivity of 90%, whereas all other versions used phase information and attained a sensitivity of 97% or more. We calculated sensitivity as the percentage of known variants correctly identified. Data are from the WNV mixed-population control dataset. (B) Individual base quality scores increased specificity. Among loci with mismatches, the Uniform Errors version attained only 91% specificity, whereas all other versions incorporated base quality scores in their probability model and attained a specificity of 97% or more. We calculated specificity as the percentage of loci in the control sample correctly identified as having no variants, among loci that had at least one candidate variant. Data are from the infectious clone (HIV NL4-3) control dataset.
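The two metrics defined in this caption can be sketched in a few lines of Python. This is a minimal illustration of the stated definitions only, not V-Phaser's own code: the function names, the (position, base) representation of a variant, and the toy inputs are all assumptions made for the example.

```python
def sensitivity(known_variants, called_variants):
    """Percentage of known variants that were correctly identified.

    Variants are represented here as hashable items such as (position, base);
    this representation is an assumption for illustration.
    """
    known = set(known_variants)
    if not known:
        raise ValueError("at least one known variant is required")
    return 100.0 * len(known & set(called_variants)) / len(known)


def specificity(candidate_loci, called_loci):
    """Percentage of candidate loci in a variant-free control with no call.

    candidate_loci: loci with at least one candidate variant (a mismatch).
    called_loci: loci at which a variant was called; in a clonal control
    every such call is a false positive.
    """
    candidates = set(candidate_loci)
    if not candidates:
        raise ValueError("at least one candidate locus is required")
    false_positives = candidates & set(called_loci)
    return 100.0 * (len(candidates) - len(false_positives)) / len(candidates)


# Toy inputs (hypothetical, not data from the paper).
known = {(100, "A"), (250, "G"), (400, "T"), (512, "C")}
called = {(100, "A"), (250, "G"), (400, "T"), (900, "G")}
print(sensitivity(known, called))          # 3 of 4 known variants found

control_candidates = range(1, 11)          # 10 loci with mismatches
control_calls = {3}                        # one false-positive call
print(specificity(control_candidates, control_calls))
```

Restricting the specificity denominator to loci with at least one candidate variant, as the caption does, keeps the many mismatch-free loci from inflating the result.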