Proteogenomic analysis of pancreatic cancer subtypes

Pancreatic cancer remains a significant public health problem with an ever-rising incidence of disease. Cancers of the pancreas are characterised by various molecular aberrations, including changes in the proteomics and genomics landscape of the tumour cells. Therefore, there is a need to identify the proteomic landscape of pancreatic cancer and the specific genomic and molecular alterations associated with disease subtypes. Here, we carry out an integrative bioinformatics analysis of The Cancer Genome Atlas dataset, including proteomics and whole-exome sequencing data collected from pancreatic cancer patients. We apply unsupervised clustering on the proteomics dataset to reveal the two distinct subtypes of pancreatic cancer. Using functional and pathway analysis based on the proteomics data, we demonstrate the different molecular processes and signalling aberrations of the pancreatic cancer subtypes. In addition, we explore the clinical characteristics of these subtypes to show differences in disease outcome. Using datasets of mutations and copy number alterations, we show that various signalling pathways previously associated with pancreatic cancer are altered among both subtypes of pancreatic tumours, including the Wnt pathway, Notch pathway and PI3K-mTOR pathways. Altogether, we reveal the proteogenomic landscape of pancreatic cancer subtypes and the altered molecular processes that can be leveraged to devise more effective treatments.

Studies of many cancers, including those of the breast, ovary and colon, have shown that the transcriptome is poorly correlated to the proteome, with only 10%-20% of the variation in a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 protein level explained by mRNA transcription levels [25,26]. Recent studies have conducted a proteomic analysis of pancreatic cancer [27][28][29][30][31][32][33]; however, they have not linked the proteomic subtypes with alterations in various signalling pathways to the specific cancer gene mutations and/or the disease outcomes of the afflicted patients.

Proteomics subtypes of pancreatic cancer
We applied unsupervised K-means clustering with a squared Euclidean metric to the proteomics data of the pancreatic cancer samples available in the TCGA to identify two consistent clusters of patients tumours (Fig 1A) [34]. One cluster comprised 67 tumour samples that we named as subtype-1, and the other cluster comprised 34 tumour samples which we named subtype-2. Furthermore, we evaluated the reproducibility of our two-cluster classification of pancreatic tumours using a supervised machine learning classification approach. Here, we used a Kernel naïve Bayes method [35] to show that we could accurately predict tumour subtypes with an accuracy of 98% ( Fig 1B) and area under the curve of 0.97 ( Fig 1C). Furthermore, we compared the current proteomics subtypes of pancreatic cancer to the expression subtypes previously described in the literature [23]. Here, we found that the subtypes-1 and subtype-2 tumours were not consistently classified into any previously described expression subtypes of pancreatic cancer (Fig 1D).

Clinical characteristics of the proteomic subtypes of pancreatic cancer
We found a similar distribution of tumours of various grades (Fig 2A) and the patients living at the end of follow up ( Fig 2B) for each of the two pancreatic cancer subtypes. Furthermore, we found similar distributions in the age, gender, and the diagnosis of diabetes between the patients afflicted with the two subtypes of pancreatic cancer (Fig 2C).
We found that of the patients afflicted with subtype-1 tumours, 38% were disease-free at the end of the follow-up, whereas 62% of the patients had a disease that either progressed or recurred, with only 19% of these patients surviving by the end of the follow-up period. For patients afflicted with subtype-2 tumours, 31% were disease-free at the end of the follow-up, whereas 57% of the patients had a disease that either progressed or recurred, with only 17% of them surviving by the end of the follow-up periods ( Fig 2D).
However, after comparing the median disease-free survival (DFS) period, we found that the duration was shorter for patients afflicted with subtype-2 tumours (DFS = 13.5 months) than for patients with subtype-1 tumours (DFS = 17.1 months; Fig 1E). However, we found no statistically significant difference (Kaplan-Meier test; p = 0.089) in the duration of the DFS between the pancreatic cancer subtypes. Likewise, the overall survival (OS) periods for the patients with subtype-2 tumours (OS = 18.7 months) were not significantly different (Kaplan-Meier test; p = 0.157) to those of subtype-1 tumours (OS = 19.7 months; Fig 1F) [36].

Altered signalling pathways and molecular processes distinguish disease subtypes
Next, we compared the expression levels of various proteins between the two disease subtypes. We found that the subtype-1 tumours expressed significantly higher levels of several proteins, including mTOR, E-Cadherin, and Raf-pS338, compared to the subtype-2 tumours (S1 File). Conversely, the subtype-2 tumours expressed significantly higher levels of several proteins, including Stathmin, Mre11 and MAP2K1, than the subtype-1 tumours (S1 File).
We applied Gene Set Enrichment Analysis (GSEA) [37] to extract knowledge of the KEGG pathways and Gene Ontology (GO) Molecular Function terms that are enriched for in the subtypes-1 tumours compared to the subtype-2 tumours. We found that the two KEGG pathways most significantly enriched for in subtype-1 tumours were those involving thermogenesis and mTOR signalling ( Fig 3A). Furthermore, within the mTOR signalling pathway, we found several oncogenes, including mTOR, AKT1 and AKT2 and tumour suppressor genes that were
Among the GO molecular processes, we found that the subtype-1 tumours were enriched for, among others, the GO terms associated Purine Nucleotide Triphosphate Binding, ATP Binding and among others (Fig 3B, also see S2 File).

PLOS ONE
Overall, our findings revealed the distinct molecular mechanisms by which the development and progression of subtype-1 and subtype-2 may occur. For example, the KEGG pathway "mTOR signalling" forms a network whose nodes are significantly upregulated in subtype-1 tumours and are previously linked to oncogenesis, but in this case, we suggest that this is likely only correct in subtype-1 tumours and not in subtype-2 tumours (Fig 3C). We also identified proteins involved in the GO term "Purine Ribonucleoside Triphosphate Binding", a molecular

c)
A network of the genes that encompass the KEGG pathways "mTOR signalling pathway" that we found significantly enriched in subtype-1 tumours compared to subtype-2 tumours. The nodes are coloured using the degree of statistical significance for each protein (negative logarithm of the p-values) between subtype-1 and subtype-2 tumours. (d) A network of the genes that encompass the GO-term molecular function "Purine Ribonucleoside Triphosphate Binding" that we found significantly enriched in the subtype-1 compared to the subtype-2 tumours. The nodes are coloured based on the degree of statistical significance for each protein (negative logarithm of the p-values) between subtype-1 and subtype-2 tumours with redder colours indicating a higher level of statistical significance. https://doi.org/10.1371/journal.pone.0257084.g003

PLOS ONE
process that may only play an essential role in the oncogenesis of the subtype-1 tumours (Fig 3D).

The mutational landscape of proteomics subtypes of pancreatic cancer
We evaluated the extent of gene mutations and copy number variations (which we collectively refer to as gene alterations) in pancreatic cancer. Focusing only on the Consensus Cancer Genes [41], we found no significant difference in the gene alteration spectrum between the two pancreatic cancer subtypes (Fig 4A and S2 File). Across the disease subtypes, we revealed that, as previously reported by others, the most frequently altered genes were KRAS (altered in 91% of all samples), TP53 (71%), CDKN2A (42%) and SMAD4 (36%); Fig 4A [ 2,16,42].
Interestingly, we observed that the mutations in SMAD4 and those in CDKN2A tended towards being mutually exclusive (co-occurrence odd ratio = -0.137, p = 0.522; Fig 4B). These findings show that the different gene alterations (either copy number alterations or mutations) in SMAD4 and CDKN2A drive pancreatic cells towards the malignant phenotype by perturbating the signalling pathways in which either SMAD4 or CDKN2A participate (Fig 4C and 4D).
We found that in pancreatic tumours, genomic alterations are common within genes that are members of various well-known cell signalling pathways. Among the pathways with gene alterations were the Receptor tyrosine kinase-Ras pathway (altered in 75% of all tumours; S1 Fig), Wnt pathway (altered in 29% of all tumours; Fig 5A), the PI3K-mTOR pathway (23%; Fig  5B) and Transforming Growth Factor Beta Signalling Pathway (48%; Fig 5C). These cell signalling pathways have been reported altered in various cancers, including those of the lungs, skin, and breast, where they have also been shown to promote oncogenesis [22, [43][44][45]. Accordingly, we suggest that these signalling pathways may play essential roles in pancreatic cancer and may present inflexion points for targeted therapies to cure pancreatic cancer.

Discussion
We conducted an integrated analysis of proteomics, clinical outcomes, mutations and copy number alterations of pancreatic cancer. Using machine learning, we showed that pancreatic tumours could be classified into two distinct subtypes. Patients afflicted with these disease subtypes show relatively similar demographics, suggesting that the onset of these cancer subtypes are not associated with the clinical parameters, such as age, gender and diabetes. Furthermore, we found that the disease outcomes are similar between the disease subtypes.
Among the subtype-1 tumours, we found significant enrichment for the mTOR signalling pathway (Fig 3C). Recent studies show that particular subtypes of pancreatic tumours respond well to anticancer agents that target the mTOR signalling pathway [46,47]. Therefore, we expect that the subtype-1 are possibly more responsive to drugs that target the mTOR than will the subtype-2 tumours.
Our results revealed that the mutations and copy number alteration were similar among the proteomic subtypes of pancreatic cancer. This finding may indicate that the primary genomic drivers of pancreatic oncogenes are similar among disease subtypes. The observed widespread alterations in the KRAS oncogene and TP53 tumour suppressor genes further confirm this ( Fig 4A). As others have suggested, mutations in these genes likely perturb signalling through both the MAPK pathway and p53 and cell cycle pathways [48,49]. Other gene alterations, such as those we found in the CDKN2A and SMAD4 genes (Fig 4B), which we have shown to be mutually exclusive, should exert selective pressure that transforms the normal and pre-malignant cells [50][51][52]. Many other gene alterations in distinctive genes that participate in the same signalling pathway, such as the Notch pathway, PI3K-mTOR pathway (S1 Fig) and metastasis pathway, likely come in later and contribute toward the progression of the disease [50,53]. Recently, oncogenesis theory has been extended beyond the accumulation of genetic mutations [54,55] to include the disruption of epigenetic regulatory mechanisms and variations in miRNA expression [55][56][57][58][59][60]. Unlike gene mutations, we currently lack adequate tools to identify the driver alterations in epigenome and miRNAs [57,58,[61][62][63]. Accordingly, we suggest that differences in the proteome (and probably the methylome, miRNA and transcriptomes) are the likely drivers of variance between the pancreatic cancer subtypes defined here and previously [10,16,17,23].
Altogether, we have revealed the clinical and molecular characteristics of two distinct subtypes of pancreatic cancer. We have further shown that one subtype exhibits hyperactivation

PLOS ONE
of various pathways, including mTOR signalling, for which proteins of these pathways present a variety of potential disease subtype-specific biomarkers and drug targets.

Methods
We obtained the TCGA project [64] datasets of 186 pancreatic cancer patients obtained from cBioPortal (http://www.cbioportal.org) [65]. We only returned and analysed 124 pancreatic cancer patients' samples profiled using reverse-phase protein array-based (RPPA) proteomics data. Furthermore, we also utilised DNA copy number alterations and mutation data and comprehensively de-identified clinical and sample information.

Proteomics classification of pancreatic cancer
We applied unsupervised machine learning methods to classify the pancreatic tumour samples based on the protein expression levels measured using RPPA. To evaluate the optimal number of clusters, we used the Calinski-Harabasz clustering evaluation criterion, which showed that

PLOS ONE
the optimal number of clusters is two (S2A Fig) [66]. Then we applied unsupervised K-means over 1000 iterations with the squared Euclidean distance metric and chose the clustering solution with the highest average Silhouette score to define the proteomic data to identify disease subtypes [67]. Next, we visualised the clustering of the tumours; we reduced the dimensions of the proteomics measured data using Principal Component Analysis [68,69]. Finally, we plotted the first two dimensions of the principal components with points coloured based on the Kmeans clustering group assignment (Fig 1A). To evaluate the coherence of our K-mean clustering solution, we applied a supervised Kernel Naïve Bayes algorithm and evaluated the model's performance using 10-fold cross-validation. Here, we used Bayesian optimisation to select the optimal machine learning hyperparameters for the Naïve Bayes algorithm (S2B Fig).

Functional enrichment analyses
We downloaded the 2019 KEGG database and 2021 GO molecular enrichment. Then for each gene set within each database, we modified the gene sets by returning only the genes present in our proteomic dataset, thus limiting the gene background to genes only present in the proteomic data. Finally, we used Gene set enrichment analysis (GSEA) [37] to determine the KEGG pathway and GO molecular functions that are enriched for in subtype-1 tumours compared to subtype-2 tumours (see S2 File).
Next, we retrieved known protein-protein interactions from the University of California Santa Cruz Super pathway, the Kinase Enrichment Analysis, and Chromatin Immunoprecipitation Enrichment Analysis [70][71][72]. Then, we used these interactions to connect proteins that are members of the GO molecular function terms "Purine Ribonucleoside Triphosphate Binding" and the KEGG term mTOR signalling pathway. Then, we used yEd to visualise the overall connectivity of two resulting networks (Fig 3C and 3D).

Analysis and mutations and copy number alterations
We evaluated the scope of gene alterations in the pancreatic subtypes using the mutation (single nucleotide polymorphisms and indels) data and copy number alterations data. First, we combined these two gene alteration data. Then we returned only genes that are associated with human cancers using information from the Sanger Consensus Cancer Gene Database [41]. Furthermore, the oncogenes and tumour suppressor genes in the gene alteration datasets were annotated using information from the UniProt Knowledgebase, the TSGene database, and the ONGene database [41,[73][74][75]. Finally, we compared gene alterations between the disease subtypes using the Chi-square test. Also, we plotted the spectrum of genomic alterations for the fourteen most altered genes in the samples using a custom function (Fig 4A). To assess which signalling pathways, we used the PathwayMapper software [76].

Survival analysis
The Kaplan-Meier method was used to estimate overall survival and the duration of progression-free survival between the current subtype-1 and the subtype-2 of pancreatic cancer [36]. Furthermore, the Kaplan-Meier method was applied to assess the difference in the overall survival of patients afflicted with previously described pancreatic cancer subtypes taken from the supplementary files of the publication by The Cancer Genome Atlas Network [13].

Statistical analyses
We used MATLAB version 2020a to perform all the analyses presented here. Fisher's exact test was used to test for associations between categorical variables. The Welch test and the Wilcoxon rank-sum tests were used to compare differences in the tumour subtypes for the continuous variables among the various categories. We considered comparison as statistically significant when p-values are < 0.05 for single comparisons and when the Benjamini-Hochberg adjusted p-values are < 0.05 for multiple comparisons.