Table 1.
Clinical characteristics of patients analysed in WES 1: non-HCC liver injury samples.
Fig 1.
Bioinformatics analysis pipeline.
Each resultant data file is indicated by a slanted rectangle and each process by a right-angled rectangle. Our pipeline contains 3 stages: alignment and calibration; variant calling and filtering; and variant annotation and filtration of putative germline mutations.
Table 2.
Summary data of publicly available WES datasets used in this study.
Fig 2.
Absolute number of exonic variants and mutation subtypes in 1000G, liver injury, cirrhosis and HCC.
The exonic variants in each of the 5 datasets were enumerated (A and B) and then subdivided into 5 groups (missense, frameshift ins/del, stop-gain/-loss and non-frameshift ins/del) (C and D, expressed as a percentage of all somatic exonic mutations). 1000G and WES 1 (A and C) contain unpaired samples, while WES 2–4 (B and D) are composed of paired tumour and non-tumour samples taken from the same individual. Data are expressed as median (interquartile range). * p<0.05, ** p<0.01, *** p<0.001 and **** p<0.0001, Mann-Whitney U test (1000G and WES 1) or Wilcoxon matched-pairs signed-rank test (WES 2–4). NC-non-cirrhosis; C-cirrhosis; NT-non-tumour; T-tumour.
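The summary convention used in this legend (median with interquartile range, and asterisks for p-value thresholds) can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, the quartile method is an assumption (the legend does not state which one was used), and the p-values themselves would come from the Mann-Whitney U or Wilcoxon tests, which are not reimplemented here.

```python
import statistics


def median_iqr(values):
    """Return (median, (Q1, Q3)) for a list of per-sample variant counts.

    Assumption: quartiles use the 'inclusive' method; the source does not
    specify which quartile definition was applied.
    """
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q2, (q1, q3)


def significance_stars(p):
    """Map a p-value to the asterisk notation given in the legend."""
    if p < 0.0001:
        return "****"
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""  # no asterisk when p >= 0.05


# Toy counts, not data from the study.
toy_counts = [1, 2, 3, 4, 5]
summary = median_iqr(toy_counts)
print(summary, significance_stars(0.003))
```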
Fig 3.
DPMs in HCC and surrounding non-tumour tissue.
Variants were classified based on the predicted effect on the amino acid sequence (A). Total benign missense variants (B and D) and DPMs (C and E) in the datasets 1000G and WES 1–4 are shown as a percentage of all somatic exonic mutations. Significantly more DPMs (but not benign missense SNVs) were detected in tumour compared to paired non-tumour tissue (* p<0.05, ** p<0.01, *** p<0.001 and **** p<0.0001, Wilcoxon matched-pairs signed-rank test). Lines link matched non-tumour and tumour tissue samples. NC-non-cirrhosis; C-cirrhosis; NT-non-tumour; T-tumour.
Table 3.
Summary statistics for normalised DPMs between datasets.
Fig 4.
Driver mutations in non-tumour tissue.
Patient samples were separated based on the number of mutations in putative driver genes (x-axis, defined as the 20 top recurrently mutated genes in HCC according to the COSMIC database, listed in S3 Table) and analysed for the number of benign missense SNVs (A and C) and DPMs (B and D). A significant correlation between DPMs and putative driver mutations (p<0.0001, Spearman rank correlation test) was observed in non-tumour tissue of WES 2. No significant correlation was seen in HCC tissues (p>0.05, Spearman rank correlation test).
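For readers unfamiliar with the Spearman rank correlation named in this legend, the coefficient can be sketched as the Pearson correlation of the ranked values, with tied values sharing their average rank. The sketch below is illustrative only: the function names and the toy numbers are hypothetical, it computes the coefficient but not the p-value, and it assumes both inputs contain at least two distinct values.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman rank correlation coefficient of two equal-length lists."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy values, not data from the study.
driver_mutations = [0, 0, 1, 2, 3]
dpm_counts = [10, 12, 15, 20, 30]
rho = spearman_rho(driver_mutations, dpm_counts)
print(round(rho, 3))
```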
Fig 5.
Frequency distribution of DPMs.
A frequency distribution of the genes containing DPMs in 1000G and WES 1 (A), WES 2 (B), WES 3 (C), and WES 4 (D) shows that most are unique to a given patient. Each gene containing a DPM was grouped based on the number of patients in which that gene contained a DPM (x-axis).
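The grouping described in this legend can be sketched as a two-step count: first the number of patients in which each gene carries a DPM, then the number of genes falling into each patient-count bin. This is a minimal sketch under stated assumptions; the function name, the input layout (patient identifier mapped to gene symbols) and the placeholder gene names are hypothetical and not taken from the study.

```python
from collections import Counter


def dpm_gene_frequency(patient_genes):
    """Return {number of patients: number of genes with a DPM in that many patients}.

    patient_genes maps a patient identifier to the gene symbols that contain
    at least one DPM in that patient (hypothetical input layout).
    """
    patients_per_gene = Counter()
    for genes in patient_genes.values():
        # A gene counts once per patient, however many DPMs it carries.
        patients_per_gene.update(set(genes))
    bins = Counter(patients_per_gene.values())
    return dict(sorted(bins.items()))


# Placeholder data, not results from the study.
toy_patients = {
    "P1": ["GENE_A", "GENE_B", "GENE_C"],
    "P2": ["GENE_A", "GENE_D"],
    "P3": ["GENE_E", "GENE_E"],
}
distribution = dpm_gene_frequency(toy_patients)
print(distribution)
```

In this toy input, four genes are mutated in a single patient and one gene is shared by two patients, which mirrors the shape of distribution the legend describes (most genes unique to one patient).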
Fig 6.
Hypothetical model of HCC progression.
HCC progression is presented here as multiple waves of driver sweeps within hepatocyte subclones. The equilibrium between DPM accumulation and negative selection on the hepatocyte subclones is shown in the top row. A schematic model of the liver (with each circle representing a hepatocyte and the colour gradient representing the DPM load within each hepatocyte) is shown in the centre row. The average DPM load for the tissue is depicted in the bottom row.