Random-effects meta-analysis of effect sizes as a unified framework for gene set analysis

doi:10.1371/journal.pcbi.1010278

Fig 1.

Power on simulated data with large effect sizes.

A) LFC distributions in the inset (red) and outset (blue) for an example simulation. The variance of the inset distribution was allowed to change, leading to simulations with different levels of enrichment for DE genes in the inset. The dashed lines show the location of the threshold (fold-change of 1.5) used to define differential expression for methods that apply such a threshold. B, C) Comparison of power of different methods as a function of the level of enrichment in the inset for simulations consisting of 20 (B) or 40 (C) samples. The x-axis indicates the relative enrichment in the inset of the proportion of the LFC distribution beyond the upper or lower thresholds.
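The x-axis quantity in panels B and C can be illustrated with a short calculation. The sketch below is not the authors' simulation code: it assumes zero-mean normal LFC distributions on the log2 scale, and it assumes that "relative enrichment" is the ratio of tail proportions between inset and outset. The function names and the standard deviations in the usage example are hypothetical.

```python
import math


def tail_proportion(threshold, sd, mean=0.0):
    """Proportion of a normal LFC distribution lying beyond +/- threshold.

    Assumption: LFCs are normally distributed (not stated in the caption).
    """
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

    upper = 1.0 - cdf(threshold)   # mass above the upper threshold
    lower = cdf(-threshold)        # mass below the lower threshold
    return upper + lower


def relative_enrichment(inset_sd, outset_sd, fold_change=1.5):
    """Ratio of inset to outset tail proportions (assumed definition)."""
    threshold = math.log2(fold_change)  # fold-change of 1.5 on the log2 scale
    return tail_proportion(threshold, inset_sd) / tail_proportion(threshold, outset_sd)


if __name__ == "__main__":
    # Hypothetical standard deviations: a wider inset gives enrichment above 1.
    print(relative_enrichment(inset_sd=0.8, outset_sd=0.5))
```

Under these assumptions, widening the inset distribution while holding the outset fixed moves more of the inset mass past the dashed thresholds, which is what drives the enrichment level on the x-axis.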

Fig 2.

Power on simulated data with small effect sizes.

A) LFC distributions in the inset (red) and outset (blue) for an example simulation. As in Fig 1, the variance of the inset distribution was allowed to change, leading to simulations with different levels of enrichment for DE genes in the inset. The dashed lines show the location of the threshold (in this case a fold-change of 1.1) used to define differential expression. B, C) Sample sizes of 20 (B) or 40 (C) were simulated. The x-axis indicates the relative enrichment in the inset of the proportion of the LFC distribution beyond the upper or lower thresholds.

Fig 3.

Incorporating uncertainty boosts power.

Power obtained with and without taking account of the standard error of the gene-level effect size estimates is shown in blue and red, respectively, for simulations based on 20 (A) or 40 (B) samples. The parameter space of the simulations was the same as in Fig 1.
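One way to see why the standard errors matter is the basic random-effects decomposition: the variance of the observed LFCs is the variance of the true effects plus the average sampling variance. The sketch below uses a simple method-of-moments estimate to show this. It is an illustration only and is not the MREMA estimator; the function name and the numbers in the usage example are hypothetical.

```python
import statistics


def true_effect_variance(lfc, se, use_se=True):
    """Estimate the variance of the true gene-level effects.

    With use_se=False the observed variance is returned as-is, which
    treats sampling noise as if it were real biological spread.
    With use_se=True the mean squared standard error is subtracted
    (method of moments), truncated at zero.
    """
    observed = statistics.pvariance(lfc)
    if not use_se:
        return observed
    sampling = statistics.fmean(s * s for s in se)
    return max(0.0, observed - sampling)


if __name__ == "__main__":
    # Hypothetical gene-level estimates and their standard errors.
    lfc = [-1.0, 0.0, 1.0]
    se = [0.5, 0.5, 0.5]
    print(true_effect_variance(lfc, se, use_se=False))
    print(true_effect_variance(lfc, se, use_se=True))
```

Ignoring the standard errors inflates the apparent spread of effects in both the inset and the outset, which blurs the difference between them.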

Fig 4.

Enrichment for cancer-associated gene sets in TCGA data.

Boxplots of log-transformed nominal p-values across 15 cancer types are shown in the left-hand column (lower values indicate more significant results). The ranks of the same cancer-associated gene sets are shown in the right-hand column (lower values indicate higher rank). Each row shows the results obtained for the gene set named above the row. A fold-change threshold of 1.5 was used for the MREMA tests.

Fig 5.

Ranking of disease-associated gene sets in corresponding gene expression experiments.

The proportion of times the disease-associated gene set was included in the x top-ranked gene sets as a function of x. The MREMA tests had higher values of this proportion than established methods when only the most highly ranked gene sets were considered (i.e. low values of x, corresponding to the left-hand side of the plot). The 1DF test maintained this advantage for longer than the other two tests, with GSEA matching the performance of the 1DF test by the right-hand side of the plot. A fold-change threshold of 1.5 was used for the MREMA tests. The position of the disease-associated sets in the full ranking is available in supplementary data (S1 Table).
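The plotted quantity can be computed directly from the rank of the disease-associated gene set in each experiment. The sketch below assumes one rank per experiment, with rank 1 being the top-ranked gene set; the function name and the example ranks are hypothetical and are not taken from S1 Table.

```python
def top_x_proportion(ranks, x):
    """Proportion of experiments in which the target gene set is ranked
    within the x top-ranked gene sets (rank 1 = most significant)."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    hits = sum(1 for r in ranks if r <= x)
    return hits / len(ranks)


if __name__ == "__main__":
    # Hypothetical ranks of the disease-associated set in four experiments.
    ranks = [1, 5, 12, 40]
    for x in (1, 10, 50):
        print(x, top_x_proportion(ranks, x))
```

Evaluating this function over a range of x gives a non-decreasing curve of the kind shown in the figure, and a method that ranks the target set near the top more often rises earlier on the left-hand side.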
