Statistical Guidance for Experimental Design and Data Analysis of Mutation Detection in Rare Monogenic Mendelian Diseases by Exome Sequencing

doi:10.1371/journal.pone.0031358

Table 1.

Experimental design and disease factors of the causative gene relevant to the statistical power of exome sequencing for RMMDs.

Figure 1.

The calculated power of exome sequencing for rare monogenic Mendelian diseases for various parameter combinations.

Figure 2.

Genes underlying highly heterogeneous diseases can be identified by sequencing a moderately sized sample.

The calculated power for varying degrees of genetic heterogeneity (R), ranging from 0.01 to 1, is shown. Upper panel: power of Tr for detecting a recessive gene; Middle panel: power difference Tr-Ta for detecting a recessive gene; Lower panel: power of Td for detecting a dominant gene. Other parameters are fixed to the default values: number of mutations m = 300; total number of genes M = 20,000; sensitivity of detecting mutations Ps = 0.8; and the mutation probability equals the genome-wide average w = 1. See Tables S2, S3 and S4 for denser sampling of R values. Note that power does not always increase monotonically with sample size (zigzag line patterns). Because the test statistic has a discrete distribution, the significance-level cutoff changes in discrete steps as the sample size grows; at some sample sizes the actual test size is therefore much smaller than the nominal 0.05 (see Table S1), and power drops.
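
The zigzag pattern is a general property of exact tests on discrete statistics. As a minimal sketch, the Python below uses a generic one-sided exact binomial test, not the paper's Tr, Ta or Td statistics; the per-sample carrier probabilities P0 and P1 and the function names are hypothetical and chosen only for illustration. For each sample size it finds the smallest critical count k whose upper-tail probability under the null does not exceed 0.05, then reports the actual size and the power at that cutoff.

```python
from math import comb


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def exact_test(n, p0, p1, alpha=0.05):
    """One-sided exact binomial test of p0 against an alternative p1 > p0.

    Returns (critical count k, actual size under p0, power under p1), where k
    is the smallest count whose upper tail under p0 does not exceed alpha.
    k = n + 1 (an empty tail) always qualifies, so next() never fails.
    """
    k = next(k for k in range(n + 2) if binom_sf(k, n, p0) <= alpha)
    return k, binom_sf(k, n, p0), binom_sf(k, n, p1)


# Hypothetical per-sample carrier probabilities under the null and alternative.
P0, P1 = 0.1, 0.3

for n in range(5, 16):
    k, size, power = exact_test(n, P0, P1)
    print(f"n={n:2d}  k={k}  size={size:.4f}  power={power:.3f}")
```

With these illustrative values, going from n = 8 to n = 9 raises the critical count from 3 to 4, so the actual size falls from about 0.038 to about 0.008 and the power drops from about 0.45 to about 0.27.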

Figure 3.

High sensitivity of detecting mutations is required to achieve a useful power.

The power for varying degrees of sensitivity of mutation detection, ranging from 0.1 to 1, is shown. Other parameters are fixed to the default values: number of mutations m = 300; total number of genes M = 20,000; genetic heterogeneity R = 0.05; and the mutation probability equals the genome-wide average w = 1. See Tables S5 and S6 for denser coverage of mutation-detection sensitivities.

Figure 4.

Strict filtering of false positives has limited impact on recessive diseases but dramatically reduces the power of detecting dominant disease genes.

The power for varying degrees of filtering efficiency, ranging from 5 to 500, is shown. Upper panel: power of Tr for recessive data; Lower panel: power of Td for dominant data. Other parameters are fixed to the default values: genetic heterogeneity R = 0.05; total number of genes M = 20,000; sensitivity of detecting mutations Ps = 0.8; and the mutation probability equals the genome-wide average w = 1. See Tables S7 and S8 for denser coverage of filtering efficiencies.

Figure 5.

Power is low for long genes.

The power for varying degrees of relative mutation probability, ranging from 0.1 to 10 times the genome average, is shown. Upper panel: power of Tr for recessive data; Lower panel: power of Td for dominant data. Other parameters are fixed to the default values: number of mutations m = 300; genetic heterogeneity R = 0.05; total number of genes M = 20,000; and sensitivity of detecting mutations Ps = 0.8. See Tables S9 and S10 for denser coverage of relative mutation probabilities.
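
To see why a larger relative mutation probability w lowers power, here is a hedged sketch under a deliberately simplified model that is an assumption, not the paper's method: each of the m mutations in an exome lands in a given gene independently with probability w/M; a patient carries a detected causal mutation with probability R·Ps (reading R as the fraction of patients explained by this gene); and a one-sided exact binomial test on the number of mutated patients is run at a Bonferroni level alpha/M. The function names and the Bonferroni correction are hypothetical choices for illustration; the defaults follow the caption (m = 300, M = 20,000, R = 0.05, Ps = 0.8).

```python
from math import comb


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def background_rate(w, m=300, M=20_000):
    """Assumed chance that a gene with relative mutation probability w
    receives at least one of m background mutations spread over M genes."""
    return 1 - (1 - w / M) ** m


def dominant_power(n, w, R=0.05, Ps=0.8, m=300, M=20_000, alpha=0.05):
    """Power of a one-sided exact binomial test on the number of patients
    (out of n) with a mutation in the gene, at the Bonferroni level alpha/M.
    Hypothetical simplified model, not the paper's Td statistic."""
    p0 = background_rate(w, m, M)
    # P(detected causal mutation) + P(otherwise) * background rate
    p1 = R * Ps + (1 - R * Ps) * p0
    level = alpha / M
    # Smallest critical count with null upper tail <= level (k = n + 1 always works).
    k = next(k for k in range(n + 2) if binom_sf(k, n, p0) <= level)
    return binom_sf(k, n, p1)


for w in (0.1, 1, 10):
    print(f"w={w:>4}: p0={background_rate(w):.4f}  power(n=100)={dominant_power(100, w):.3f}")
```

Under these assumptions, raising w inflates the background rate p0, so the causal signal becomes harder to separate from chance hits and the power at n = 100 falls as w grows from 0.1 to 10.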