Figure 1.
Schematic overview of the FDR calculation method.
A Concept of the FDR calculation, which relies on the availability of a normal tissue replication experiment. B Examples of single nucleotide variations found: a somatic mutation found in all three B16 samples (left), a non-somatic mutation found in all B16 and black6 samples (middle), and a mutation found in only one black6 sample (right); this last variation would raise the FDR for all somatic mutations of comparable or worse quality. C Process to generate FDRs for a set of somatic mutations and visualize the results. The FDR distribution is visualized as an average estimated ROC curve, with the grey bars giving the 95% confidence interval for the mean in both dimensions at uniformly sampled positions. The mean was obtained from the distribution of estimated ROC curves of the FDRs for all 18 possible combinations of reference data sets (see text).
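The idea in panels A and B can be illustrated with a minimal sketch. It assumes that every call carries a single scalar quality score (higher is better) and that each call from a normal-versus-normal (black6 vs. black6) comparison is a false positive by construction; the function name and the example scores are hypothetical, and the scoring actually used is the one described in the text.

```python
from bisect import bisect_left


def estimate_fdr(somatic_scores, same_vs_same_scores):
    """Estimate an FDR for each somatic call (illustrative sketch only).

    For a call with quality q the estimate is
        (# normal-vs-normal calls with quality >= q)
        / (# somatic calls with quality >= q),
    capped at 1.0. Assumes one scalar quality per call, higher = better.
    """
    somatic_sorted = sorted(somatic_scores)
    false_sorted = sorted(same_vs_same_scores)
    fdrs = []
    for q in somatic_scores:
        # Calls at least as good as q in each set.
        n_false = len(false_sorted) - bisect_left(false_sorted, q)
        n_called = len(somatic_sorted) - bisect_left(somatic_sorted, q)
        # n_called >= 1 because q itself is one of the somatic scores.
        fdrs.append(min(1.0, n_false / n_called))
    return fdrs


# Hypothetical scores: one false call (quality 35) from the normal replicate.
example = estimate_fdr([50, 40, 30, 20], [35])
```

In this example the single normal-versus-normal call leaves the two somatic calls of better quality at an estimated FDR of 0 and raises the estimate for the two calls of comparable or worse quality, which mirrors the right-hand case in panel B.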
Figure 2.
Overview of the process for finding somatic mutations in B16.
Numbers for the individual steps are given as an example for one B16 sample, compared to one black6 sample. “Exons” refers to the exon coordinates defined by all protein coding RefSeq transcripts.
Figure 3.
Selection of mutations for validation.
A The Venn diagram shows the numbers of somatic variations in protein coding exons found by one, two, or all three software tools. The numbers were calculated after the recommended filtering procedures (see Methods section) and represent the consensus of all three samples. B List of unfiltered somatic mutations found in the consensus of all three samples, sorted by FDR (low to high from top to bottom). Each row represents a predicted mutation, and the programs that predicted it are indicated. C For each mutation an FDR can be calculated, which is used to prioritize mutations for the validation experiment.
Table 1.
Ten validated mutations with the lowest FDRs, selected out of a set of 2396 exonic variations which were found in duplicate in two B16 samples.
Figure 4.
Genome browser screenshot for triplicate black6 and B16 samples and associated Sanger sequencing traces for the black6 and B16 DNA for both a true positive (chr4:155261079) and a false positive (chr4:151534480) mutation call.
Both mutations are predicted by GATK, SomaticSNiPer and SAMtools. The mean coverage is 54 (true positive) and 10 (false positive), respectively. Only four reads are shown for visual clarity. The red box marks the sample in which the three mutation callers wrongly detected an SNV.
Figure 5.
Results of mutation validation experiments.
A ROC curve for all 131 mutations with a successful validation (either positive or negative). 1-FDR was used as the probability of a mutation being a true call. B Fraction of variations found for a given FDR cutoff in a set of 2396 exonic variations which were found in duplicate in two B16 samples (see also Table 1), plotted separately for all variants in the dataset and for the 50 validated high confidence mutations. For visual clarity, only FDR values from 0 to 0.1 are shown.
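As a rough illustration of how a curve like the one in panel A can be built, the sketch below ranks validated calls by 1-FDR (lowest FDR first) and accumulates true and false positive rates. The function names and the example values are hypothetical and are not the validation data; ties in FDR are treated as a single step.

```python
from itertools import groupby


def roc_points(fdrs, validated_true):
    """Return (false positive rate, true positive rate) points.

    Calls are ranked by 1 - FDR, i.e. lowest FDR first. `validated_true`
    holds True for confirmed calls and False for refuted ones.
    """
    n_pos = sum(1 for v in validated_true if v)
    n_neg = len(validated_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one confirmed and one refuted call")
    ranked = sorted(zip(fdrs, validated_true), key=lambda pair: pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Calls sharing the same FDR move the curve in one step.
    for _, group in groupby(ranked, key=lambda pair: pair[0]):
        for _, is_true in group:
            if is_true:
                tp += 1
            else:
                fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical validation outcome for four calls.
pts = roc_points([0.01, 0.02, 0.05, 0.5], [True, True, False, True])
area = auc(pts)
```

The curve always starts at (0, 0) and ends at (1, 1); how early it rises reflects how well a low FDR predicts a successful validation.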
Figure 6.
Comparison of different experimental settings and analysis procedures.
A Estimated ROC curves for the comparison of the three different software tools (duplicates, 38× coverage). B Estimated ROC curves for the comparison of different average sequencing depths (SAMtools, no replication). 38× denotes the coverage obtained by the experiment, while the other coverages were obtained by downsampling these data. C Estimated ROC curves visualizing the effect of experiment replication (38× coverage, SAMtools). D Estimated ROC curves for different sequencing protocols (SAMtools, no replication). The curves were calculated using the results of the 2×100 nt library. (Note: A complete display of the results can be found in Supplementary Figures S2 and S3 in Text S1. Unscaled versions of the plots are shown in Supplementary Figure S8 in Text S1, giving an impression of the individual set sizes.)