Fig 1.
Per-gene methylation measures.
(a) The mean methylation level over a specific genomic region is calculated separately for the TSS200 (promoter) and gene body genomic regions. The blue curve indicates the new position of the red curve after an additive global shift in methylation level, which might be due to technological or other experimental factors, and the difference between the horizontal red and blue lines (mean levels) illustrates the effect of this shift on the mean methylation level. (b) The intra-gene methylation variability (IGV) is calculated from the variation around the mean methylation level, i.e., from the dashed vertical lines, and is similarly calculated separately for the TSS200 and gene body genomic regions. The vertical green lines are changed very little compared to the vertical red lines, illustrating that such a global additive shift in mean methylation level has much less effect on IGV, which is therefore referred to as a ‘self-calibrating measure’.
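The self-calibrating property described in (b) can be illustrated with a minimal Python sketch. It assumes IGV is taken as the sample variance of the methylation levels within a region (the caption does not specify the exact estimator), and the beta values and the size of the shift are made up for illustration.

```python
from statistics import mean, variance

# Hypothetical methylation (beta) values for the CpGs of one genomic region.
region = [0.12, 0.35, 0.20, 0.48, 0.27]

# Hypothetical additive global shift, e.g. from a technical batch effect.
shift = 0.10
shifted = [b + shift for b in region]

# The mean moves by the full shift, whereas the variance around the mean does not.
mean_change = mean(shifted) - mean(region)
igv_change = variance(shifted) - variance(region)

print(f"change in mean methylation: {mean_change:.3f}")
print(f"change in IGV (variance):   {igv_change:.3e}")
```

In this idealised case the shift is exactly additive, so the variance is unchanged up to rounding. Real shifts are unlikely to be perfectly additive, which is consistent with the caption describing "much less effect" rather than none.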
Fig 2.
(a) Methodology overview for comparison of the four per-gene methylation measures. (b) Results of this comparison. (c) Methodology overview for calculation of ovarian cancer IGV prognostic score.
Fig 3.
IGV OC prognostic signature validation.
(a), (c) and (e): Comparison of survival curves of groups defined by the IGV prognostic score, in: (a) the main OC data set, (c) the Mayo Clinic OC validation set, (e) the uterine cancer TCGA validation set. The groups are divided by the median IGV prognostic score derived in the main OC DNAm data-set. The hazard ratio (HR) is displayed with 95% CI in brackets, with corresponding p-value calculated by univariate Cox regression. (b), (d) and (f): Multivariate Cox regression comparing the same groups defined by the IGV prognostic score.
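The median split and univariate hazard ratio described for the survival-curve panels can be sketched as follows. This is an illustrative Python example, not the authors' code: the scores, follow-up times and event indicators are invented, the function name `cox_binary_hr` is hypothetical, and the fit is a bare Newton-Raphson maximisation of the Cox partial likelihood for a single 0/1 covariate, assuming no tied event times. It does not compute the confidence interval or p-value.

```python
import math
from statistics import median

def cox_binary_hr(times, events, group, iterations=25):
    """Hazard ratio for a 0/1 covariate from the Cox partial likelihood (no tied times)."""
    beta = 0.0
    for _ in range(iterations):
        grad, info = 0.0, 0.0
        for t_i, d_i, x_i in zip(times, events, group):
            if not d_i:
                continue  # censored patients contribute only through risk sets
            at_risk = [x for t, x in zip(times, group) if t >= t_i]
            n1 = sum(at_risk)            # at-risk patients in the high-score group
            n0 = len(at_risk) - n1       # at-risk patients in the low-score group
            p = n1 * math.exp(beta) / (n1 * math.exp(beta) + n0)
            grad += x_i - p              # score contribution
            info += p * (1.0 - p)        # observed information contribution
        if info == 0.0:
            break
        beta += grad / info              # Newton-Raphson step
    return math.exp(beta)

# Hypothetical prognostic scores, follow-up times and event flags (1 = death, 0 = censored).
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
times = [10, 12, 3, 15, 2, 4, 5, 11]
events = [1, 1, 1, 0, 1, 1, 1, 1]

cutoff = median(scores)                              # median of the training scores
group = [1 if s > cutoff else 0 for s in scores]     # 1 = high-score group
hr = cox_binary_hr(times, events, group)
print(f"cutoff = {cutoff:.2f}, HR (high vs low) = {hr:.2f}")
```

Following the caption, the cutoff would be derived once in the main data set and then reused unchanged when assigning groups in a validation set. In practice a dedicated survival-analysis package would be used, since it also handles ties, confidence intervals and multivariate models.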
Fig 4.
Comparison of IGV with Intra-Tumour Heterogeneity.
(a) Cross-sample variability of methylation (intra-tumour heterogeneity) and IGV are calculated in different and complementary directions. The heatmap displays the methylation profile of a single gene (horizontal axis), across multiple samples (vertical axis). (b)-(e) A characteristic pattern of high cross-sample variability (intra-tumour heterogeneity) when IGV is low, and vice versa, is consistently observed across different studies: (b) main OC data-set, (c) endometrial cancer intra-tumour heterogeneity data-set, (d) prostate cancer intra-tumour heterogeneity data-set, (e) BRCA basal data-set. (f)-(h) The overlap of genes in each region of (b) with genes in equivalent regions of (c)-(e) is highly significant. In (c) and (d), each line relates to samples from a single patient, and is a best-fit curve equivalent to that shown in (b) and (e). In (b), odds-ratios and p-values at the top of the plot show enrichment of the genes of each cluster on either side of the median IGV of the prognostic signature. Abbreviations: ITH (intra-tumour heterogeneity), OC (ovarian carcinoma), BRCA (breast invasive carcinoma).
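The "different and complementary directions" in (a) can be made concrete with a small Python sketch. The matrix values are invented, and both measures are taken here as plain sample variances averaged over rows or columns, which is an assumption rather than the authors' stated estimator.

```python
from statistics import mean, variance

# Hypothetical methylation matrix for one gene:
# rows are samples (vertical axis), columns are CpGs along the gene (horizontal axis).
gene = [
    [0.10, 0.80, 0.15, 0.85],   # sample 1
    [0.12, 0.78, 0.18, 0.82],   # sample 2
    [0.09, 0.81, 0.14, 0.86],   # sample 3
]

# IGV: variance along each row (across CpGs, within one sample), averaged over samples.
igv = mean(variance(row) for row in gene)

# Cross-sample variability: variance down each column (across samples, per CpG),
# averaged over CpGs.
cross_sample = mean(variance(col) for col in zip(*gene))

print(f"IGV = {igv:.4f}, cross-sample variability = {cross_sample:.4f}")
```

In this toy gene the pattern along the gene is strongly variable but nearly identical in every sample, so the row-wise measure is high and the column-wise measure is low.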
Fig 5.
Heterogeneity and the effects of cell mixing on the 450K array.
The 450K array provides methylation measurements from a mixed sample of multiple cells. (a) An example of a methylation pattern which is highly variable, in a similar way across cells. This leads to low cross-sample heterogeneity and high IGV, as in cluster hyper 2. (b) An example of a methylation pattern which is highly variable, but in a heterogeneous way across cells. This leads to high cross-sample heterogeneity; however, the net effect of averaging the methylation profiles across the mixed sample of many cells is a measurement with low IGV, as in cluster hypo 1. (c) A measure of CpG-CpG methylation variability, calculated as the mean derivative, or the mean absolute difference in methylation level between adjacent CpGs. (d) The variability of the mean-derivative measure across samples quantifies the heterogeneity of the CpG-CpG methylation variability. Cluster hyper 2 is low according to (d), and hence corresponds to a pattern such as (a). Cluster hypo 1 is high according to (d), and hence corresponds to a pattern such as (b).
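The averaging effect in (a) and (b) and the mean-derivative measure in (c) and (d) can be sketched in Python as below. All profiles are invented, the helper names (`mean_derivative`, `bulk_profile`, `derivative_heterogeneity`) are hypothetical, and the use of sample variance for IGV and for the across-sample variability in (d) is an assumption.

```python
from statistics import mean, variance

def mean_derivative(profile):
    """Mean absolute difference in methylation between adjacent CpGs, as in (c)."""
    return mean(abs(b - a) for a, b in zip(profile, profile[1:]))

def bulk_profile(cells):
    """Array-like measurement: per-CpG methylation averaged over the mixed cells."""
    return [mean(cpg) for cpg in zip(*cells)]

def derivative_heterogeneity(samples):
    """Variability of the mean-derivative measure across samples, as in (d)."""
    return variance([mean_derivative(s) for s in samples])

# (a)-like case: every cell shares the same highly variable pattern.
similar_cells = [[0.0, 1.0, 0.0, 1.0, 0.0, 1.0]] * 4
# (b)-like case: each cell is highly variable, but the patterns differ between cells.
mixed_cells = [[0.0, 1.0, 0.0, 1.0, 0.0, 1.0], [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]] * 2

igv_similar = variance(bulk_profile(similar_cells))   # pattern survives averaging
igv_mixed = variance(bulk_profile(mixed_cells))       # pattern cancels out

# Hypothetical bulk profiles from several samples for two genes.
consistent_samples = [[0.1, 0.9, 0.1, 0.9], [0.1, 0.9, 0.2, 0.9], [0.2, 0.9, 0.1, 0.9]]
heterogeneous_samples = [[0.1, 0.9, 0.1, 0.9], [0.5, 0.5, 0.5, 0.5], [0.3, 0.7, 0.4, 0.6]]

print(igv_similar, igv_mixed)
print(derivative_heterogeneity(consistent_samples),
      derivative_heterogeneity(heterogeneous_samples))
```

The alternating cell patterns are an extreme choice made so that the cancellation under averaging is complete; with real cells the reduction in IGV would be partial.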
Fig 6.
Transcription Factor Binding and Expression Correlation with IGV.
(a) False discovery rate adjusted p-values and odds-ratios (OR) show enrichment of binding of specific transcription factors (TFs) to the gene body regions of the genes of each cluster. TFs for which binding is significantly over- or under-enriched (Fisher’s exact test, FDR q < 0.05) are coloured green and red, respectively. (b) TFs which show significantly more positive correlation with IGV of the genes they bind to, compared to the genes they do not bind to. (c) TFs which show significantly more negative correlation with IGV of the genes they bind to, compared to the genes they do not bind to. (d) TFs which are significant according to (a) and either (b) or (c); TFs with known relevance are indicated with a reference to the relevant study. The lack of enrichment of TF binding to the genes of cluster hypo 2 is a reflection of the small number (19) of genes in this cluster.
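The enrichment test in (a) can be sketched in Python as a two-sided Fisher's exact test on a 2x2 table per TF, followed by a false discovery rate adjustment. The caption does not name the adjustment procedure, so Benjamini-Hochberg is an assumption here; the TF names and counts are invented, and the function names are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value and sample odds ratio for [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    observed = prob(a)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    p = sum(q for q in map(prob, range(lo, hi + 1)) if q <= observed * (1 + 1e-9))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return min(p, 1.0), odds_ratio

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted, running_min = [0.0] * m, 1.0
    for rank_from_top, i in enumerate(order):
        rank = m - rank_from_top                  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical counts per TF: (cluster & bound, cluster & unbound,
#                              other & bound,   other & unbound)
tables = {"TF_A": (30, 20, 100, 850), "TF_B": (5, 45, 95, 855), "TF_C": (1, 49, 200, 750)}

results = {tf: fisher_exact_two_sided(*counts) for tf, counts in tables.items()}
qvalues = benjamini_hochberg([p for p, _ in results.values()])

labels = {}
for (tf, (p, odds)), q in zip(results.items(), qvalues):
    # Colour convention from the caption: green = over-enriched, red = under-enriched.
    labels[tf] = "n.s." if q >= 0.05 else ("green" if odds > 1 else "red")
    print(tf, f"OR={odds:.2f}", f"q={q:.3g}", labels[tf])
```

A cluster with very few genes yields tables with small counts in the first row, which limits how small the exact p-value can be; this matches the caption's remark about the 19 genes of cluster hypo 2.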
Table 1.
Data-sets analysed.
Table 2.
Patient cohort details of the main DNA methylation data-sets analysed.
Fig 7.
Probability density distribution of the probabilities of a gene being included in a fitted model.
The plot shows a kernel-smoothed empirical estimate of the probability density distribution of the number of genes included in each model, f, over the 8281 significant gene body methylation variance model fits, with corresponding probability of a gene being included in a model pb = f/m, where m is the number of genes with gene body methylation variance information available.
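The relationship pb = f/m and the kernel smoothing can be sketched in Python as follows. The caption does not state the kernel or bandwidth, so a Gaussian kernel with a normal-reference rule-of-thumb bandwidth is assumed; the values of f and m below are invented and the function name `gaussian_kde` is hypothetical.

```python
import math
from statistics import stdev

def gaussian_kde(values, bandwidth=None):
    """Return a function evaluating a Gaussian kernel density estimate of `values`."""
    n = len(values)
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumed choice, not from the caption).
        bandwidth = 1.06 * stdev(values) * n ** (-1 / 5)
    norm = n * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values) / norm

    return density

m = 12000                                    # hypothetical number of genes available
f_values = [3, 5, 4, 8, 6, 5, 7, 4, 5, 6]    # hypothetical genes included per model

# Probability of a gene being included in each fitted model.
pb_values = [f / m for f in f_values]

# Smoothed density of f; the same curve can be labelled by pb on a rescaled axis.
density_f = gaussian_kde(f_values)
for f in range(2, 10):
    print(f, f"pb={f / m:.2e}", f"density={density_f(f):.3f}")
```

Because pb is a fixed rescaling of f, one smoothed curve can carry both axes, which is how the caption describes the plot.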