Integrated meta-analysis of colorectal cancer public proteomic datasets for biomarker discovery and validation

doi:10.1371/journal.pcbi.1011828

Table 1.

List of the proteomics datasets used in this study.


Table 2.

Clinicopathological features of colorectal adenoma and adenocarcinoma patients included in the proteomic datasets.


Fig 1.

Scheme of the project design and the data reanalysis pipeline.

Workflow for the selection, curation, reanalysis and integration of public proteomic datasets containing CRC samples.


Fig 2.

Colorectal cancer (CRC) solid samples.

(A) Number of canonical proteins identified across different solid tumor samples. (B) Range of normalised iBAQ protein abundances across different samples. (C) Number of canonical proteins identified across different datasets. (D) Range of normalised iBAQ protein abundances across different datasets. (E) Number of canonical proteins identified in one, two or all three of the solid sample subgroups (mucosa, adenoma and tumor). The numbers in parentheses indicate the number of samples.
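The captions refer to "normalised iBAQ" abundances and, in later figures, to expression in ppb. As a minimal sketch, assuming the normalisation is a parts-per-billion scaling in which each protein's iBAQ value is divided by the sample total and multiplied by one billion (the captions do not spell this out), the calculation could look as follows. The function name `to_ppb` and the intensity values are hypothetical.

```python
def to_ppb(ibaq):
    """Scale one sample's raw iBAQ values so that they sum to one billion.

    Assumption: ppb = iBAQ / sum(iBAQ in the sample) * 1e9.
    """
    total = sum(ibaq.values())
    if total <= 0:
        raise ValueError("sample has no positive iBAQ signal")
    return {protein: value / total * 1e9 for protein, value in ibaq.items()}


# Hypothetical raw iBAQ intensities for one sample (not data from the study).
sample = {"PRDX1": 6.0e7, "PPIA": 3.0e7, "CD14": 1.0e7}
ppb = to_ppb(sample)
```

Scaling each sample to the same total is what makes abundances comparable across samples and datasets with different overall signal.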


Fig 3.

Colorectal cancer secreted samples.

(A) Number of canonical proteins identified across different secreted samples. (B) Range of normalised iBAQ protein abundances across different samples. (C) Number of canonical proteins identified across different datasets. (D) Range of normalised iBAQ protein abundances across different datasets. (E) Number of canonical proteins identified in one, two, three or all four of the secreted sample subgroups. The numbers in parentheses indicate the number of samples.


Fig 4.

Analysis of GO enrichment and concordance between solid and secreted samples.

(A) Summary plot of the most altered GO terms in the “biological process” category in solid samples. (B) Enrichment in the “Secretory granule lumen” (GO:0034774) category of upregulated and downregulated proteins when comparing mucosa, adenoma and tumor samples. (C) Venn diagram of the canonical proteins detected in solid and secreted samples. (D) Boxplot showing the correlation between solid samples and each of the subgroups of secreted samples. (E) Significance of enrichment in GO Biological Process categories of altered proteins (tumor vs normal mucosa) in the different secreted subgroups.


Fig 5.

Correlation between transcriptomics and proteomics analysis.

(A) Volcano plot of the proteomics data. The fold change (tumor/mucosa) is shown. Dots are labelled according to the transcriptomic fold change. (B) Scatter plot of significantly altered canonical proteins. Pearson’s coefficient is shown. (C) Histogram of protein expression levels (ranked bins) and (D) Cellular Component analysis of proteins or genes significantly altered between normal mucosa and tumor in proteomics and/or transcriptomics. The most relevant categories were selected. (E) Correlation of hazard ratios obtained using proteomics data and RNA-seq data from CPTAC. (F) Correlation of hazard ratios obtained using proteomics data (CPTAC) and RNA-seq data (TCGA). Only proteins significantly associated with prognosis are shown. The Pearson correlation coefficient (r) is indicated. (G) Proportion of canonical proteins significantly associated with survival according to proteomics (CPTAC) and/or transcriptomics (TCGA) data. Significance was calculated by Cox regression analysis. Pie chart of all significantly associated mRNAs (TCGA) and proteins (CPTAC).


Fig 6.

Validation at the proteomic level of the experimentally-based signature SEC6.

(A) Histogram of the expression of the SEC6 proteins detected in more than 50% of the tumor samples in the CPTAC dataset (iBAQ ranked bins are used). (B) Kaplan–Meier analysis of high- and low-expression groups among stage II and III patients. P values were obtained by log-rank test. (C) Kaplan–Meier analysis of high- and low-expression patients. Mean protein expression of CD109, LTBP1, NPC2 and PSAP was used for classification. P values were obtained by log-rank test. (D) Distribution of SEC6 proteins across blood-derived samples.
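Panel C classifies patients by the mean expression of four signature proteins before the Kaplan–Meier comparison. The sketch below illustrates only that grouping step. It assumes a median cutoff on the per-patient mean, which the caption does not state, and uses hypothetical patient identifiers, marker names (`M1`, `M2`) and values.

```python
from statistics import mean, median


def split_by_signature(expression, markers):
    """Label each patient 'high' or 'low' by mean expression of the markers."""
    scores = {
        patient: mean(values[m] for m in markers)
        for patient, values in expression.items()
    }
    # Assumed cutoff: the median score. The caption does not state one.
    cutoff = median(scores.values())
    return {p: ("high" if s > cutoff else "low") for p, s in scores.items()}


# Hypothetical patients and marker names, for illustration only.
expression = {
    "pt1": {"M1": 10.0, "M2": 30.0},
    "pt2": {"M1": 50.0, "M2": 70.0},
    "pt3": {"M1": 5.0, "M2": 15.0},
    "pt4": {"M1": 80.0, "M2": 100.0},
}
groups = split_by_signature(expression, ["M1", "M2"])
```

The resulting labels would then feed a survival comparison such as the log-rank test named in the caption, which is not implemented here.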


Fig 7.

Identification of blood-detectable prognostic biomarkers.

(A) Flow chart of the sequential selection of prognostic biomarkers. (B) Kaplan–Meier analysis of high- and low-expression patients. P values were obtained by log-rank test. (C) Distribution of protein expression (ppb) according to the proteomic subtype classification [59]. (D) Distribution of protein expression (ppb) according to the CMS classifier. (E) Distribution of CD14, MRC2, PPIA, PRDX1 and TXNDC5 across blood-derived samples. (F) Distribution of PRDX1, CD14 and PPIA across interstitial fluid from tumor and mucosa according to the PXD005693 dataset. (G) Distribution of PPIA and PRDX1 across EC vesicles from tumor and adjacent tissue according to the JPST000867 dataset.


Fig 8.

Functional analysis of the identified biomarkers.

(A) STRING protein-protein interaction network of the five biomarkers. Proteins with at least a medium-confidence interaction (score > 0.4) and a significant correlation (p < 0.01, Pearson correlation) with the corresponding biomarker were selected. Protein expression values (ppb) from the meta-analysis of solid samples were used to calculate the correlations. (B) Functional enrichment (Biological Process) of the networks according to STRING.
