The Proteomics of Colorectal Cancer: Identification of a Protein Signature Associated with Prognosis

Colorectal cancer is one of the commonest types of cancer and there is a requirement for the identification of prognostic biomarkers. In this study protein expression profiles have been established for colorectal cancer and normal colonic mucosa by proteomics using a combination of two dimensional gel electrophoresis with fresh frozen sections of paired Dukes B colorectal cancer and normal colorectal mucosa (n = 28), gel image analysis and high performance liquid chromatography–tandem mass spectrometry. Hierarchical cluster analysis and principal components analysis showed that the protein expression profiles of colorectal cancer and normal colonic mucosa clustered into distinct patterns of protein expression. Forty-five proteins were identified as showing at least 1.5 times increased expression in colorectal cancer and the identity of these proteins was confirmed by liquid chromatography–tandem mass spectrometry. Fifteen proteins that showed increased expression were validated by immunohistochemistry using a well characterised colorectal cancer tissue microarray containing 515 primary colorectal cancer, 224 lymph node metastasis and 50 normal colonic mucosal samples. The proteins that showed the greatest degree of overexpression in primary colorectal cancer compared with normal colonic mucosa were heat shock protein 60 (p<0.001), S100A9 (p<0.001) and translationally controlled tumour protein (p<0.001). Analysis of proteins individually identified 14-3-3β as a prognostic biomarker (χ2 = 6.218, p = 0.013, HR = 0.639, 95%CI 0.448–0.913). Hierarchical cluster analysis identified distinct phenotypes associated with survival, and a two-protein signature consisting of 14-3-3β and aldehyde dehydrogenase 1 was identified as showing prognostic significance (χ2 = 7.306, p = 0.007, HR = 0.504, 95%CI 0.303–0.838) and remained independently prognostic (p = 0.01, HR = 0.416, 95%CI 0.208–0.829) in a multivariate model.


Introduction
In the western world colorectal cancer (CRC) is the third most common type of cancer and the second most common cause of cancer death [1]. Worldwide one million people each year will develop CRC and the incidence of this tumour is increasing [1]. Most cases of CRC are sporadic resulting from the accumulation of somatic genetic aberrations and are associated with a variety of environmental risk factors [1,2]. The remaining proportion of cases involve a familial genetic component. Numerous genetic aberrations accumulate including the inactivation of the adenomatous polyposis coli tumour suppressor gene and activation of oncogenes such as K-ras, deletion of chromosome 18q and amplification of 20q [1,3]. Cumulatively these genetic changes afford the tumour anti-apoptotic, pro-angiogenic and proliferative properties. Recently it has been accepted that CRC is a genetically heterogeneous disease and two distinct pathways of carcinogenesis have been identified. Of sporadic CRC, 85% results from chromosomal instability and the remaining 15% from microsatellite instability [3]. Rather than occurring as a linear multistep process, colorectal carcinogenesis is more likely to be the result of the complex interplay between multiple mutational pathways. This may partly explain the clinical heterogeneity of this disease and the great difference seen in outcome between individual patients [2]. This emphasises the clear requirement to have refined methods of classifying and categorising colorectal cancer by identifying and validating appropriate biomarkers.
Molecular biomarkers can be categorised by their ability to aid prevention, promote early detection, establish prognosis and predict the response of patients to specific therapies [4,5]. The discovery of biomarkers will also aid in the understanding of the biological mechanisms underlying disease development and progression. Whilst genomics, including epigenomics and transcriptomics, has been influential in biomarker discovery, studying genes and gene expression does not accurately reflect the amount of protein expressed in the cell. Additionally proteins undergo many post-translational modifications which can affect their activation, interactions and function within a cell. Proteomics, which is the global study of proteins, has a key role in the potential identification of tumour associated biomarkers [6,7]. The relationship between individual tumour biomarkers and colorectal cancer has been extensively investigated and studies have included biomarkers representing genes and proteins involved in many aspects of tumour development and progression including tumour invasion and metastasis, cell cycle regulation, growth factors and apoptosis associated proteins [5,8–26].
In this study we have used comparative proteomic analysis (two dimensional gel electrophoresis, image analysis of gels and mass spectrometry) to identify proteins which are over-expressed in colorectal cancer compared with morphologically normal colorectal mucosa. Overexpressed proteins have been validated by immunohistochemistry using a large well characterised set of colorectal cancers and a protein signature associated with prognosis has been identified.

Methods
Two dimensional (2D) gel electrophoresis
2D gel electrophoresis was performed using matched pairs of fresh frozen Dukes' B colon cancer and morphologically normal colonic mucosa (caecum and ascending colon, n = 15 and sigmoid colon, n = 13) as previously described [20–22,27]. All cases were selected from the Aberdeen colorectal tumour bank and the clinicopathological details of the samples used for proteomics are noted in Table 1. None of the patients had received chemotherapy or radiotherapy prior to surgery. On collection, both tumour and normal colorectal mucosa were dissected from colorectal cancer excision specimens within 30 minutes of surgical removal, immediately frozen in liquid nitrogen and stored at −80°C prior to analysis.
Frozen sections (20 microns thickness, n = 30) of each sample were cut and solubilised in lysis buffer [27]. One section (10 microns thickness) from each sample was stained with haematoxylin and eosin and the morphological diagnosis confirmed by light microscopic examination. Following solubilisation, the samples were centrifuged to remove insoluble cellular debris and treated with DNAse. 2D gel electrophoresis was performed in duplicate for each sample using 13 cm pI3-10 non-linear immobilon strips (GE Health Care, Little Chalfont, UK) with proteins being separated according to charge (300 V, 6 min; 3500 V, 90 min; 3500 V, 300 min), and subsequently molecular weight (100 V, 25 mA per gel for 60 min). Following completion of the electrophoresis, gels were stained with Coomassie blue to visualise protein spots.

Gel imaging and analysis
The gels were then scanned to produce 256 grey scale 24 bit images which were saved as TIFF files. The imaged gels were analysed using Progenesis SameSpots software (Non-Linear Dynamics, Newcastle-upon-Tyne, UK). All gel images were imported into Progenesis SameSpots for analysis. Image quality assessment was also done using the SameSpots software to ensure all images were in the correct format for analysis and had no other problems that could interfere with subsequent image analysis. All gels were initially automatically aligned onto one reference gel using the analysis software, then manually aligned to ensure proper alignment of all gels, allowing all spots to be detected, normalised and matched on all gels. Artefacts (e.g., dust particles or streaks detected as protein spots) were removed by manual editing. Reference image gels were created following gel alignment using the analysis software. Once aligned, gels were automatically analysed using the Progenesis SameSpots software. Gels were separated into 2 groups as either tumour or normal gels. Protein expression levels were then determined for each spot based on mean spot volume, and differences in protein expression between tumour and normal gels were assessed by ANOVA. Spots with p≤0.05 were selected for inclusion in the results. Multivariate analysis was also done using Progenesis SameSpots and both correlation analysis and principal components analysis were performed on the imaged gels. Correlation analysis was performed on log normalised spot expression levels to group spots together according to similarities in their expression profiles. Principal components analysis used spot expression levels across all gels to separate the gels according to expression variation, allowing a graphical representation of the multidimensional data, clustered into the two groups, tumour and normal. A final report showing all analysed spots on the gel together with ANOVA values, ranks and expression profiles for each spot based on the average normalised volume for the groups was then produced.
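As an illustration of the per-spot comparison described above, the following minimal Python sketch computes the tumour to normal fold change and a one-way ANOVA F statistic from normalised spot volumes. It is not the Progenesis SameSpots implementation, whose internal calculations are not described here; the function names and the example volumes are hypothetical, and converting F to a p value would additionally require the F distribution from a statistics library.

```python
from statistics import mean


def fold_change(tumour, normal):
    """Ratio of mean normalised spot volume in tumour gels to normal gels."""
    return mean(tumour) / mean(normal)


def anova_f(*groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical normalised volumes of one spot across three gels per group.
normal_volumes = [1.0, 2.0, 3.0]
tumour_volumes = [2.0, 3.0, 4.0]

fc = fold_change(tumour_volumes, normal_volumes)
f_stat = anova_f(tumour_volumes, normal_volumes)
```

In this toy example the spot would meet a 1.5 fold threshold, but whether it also met p≤0.05 would depend on the F distribution with the corresponding degrees of freedom.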

Liquid chromatography-tandem mass spectrometry
Following 2D gel electrophoresis and image analysis of the gels the protein spots of interest (those spots which were significantly increased in the tumour samples) were excised from the gels and proteins identified by liquid chromatography-tandem mass spectrometry.
Peptide solutions were analysed using an HCTultra PTM Discovery System (Bruker Daltonics Ltd., Coventry, UK) coupled to an UltiMate 3000 LC System (Dionex (UK) Ltd., Camberley, Surrey, UK). Peptides were separated on a monolithic capillary column (200 µm internal diameter × 5 cm in length; Dionex). Eluent A was 3% acetonitrile in water containing 0.05% formic acid, eluent B 80% acetonitrile in water containing 0.04% formic acid, with a gradient of 3%-45% B in 12 minutes at a flow rate of 2.5 µL/min. Peptide fragment mass spectra were acquired in data-dependent AutoMS(2) mode with a scan range of 300-1500 m/z, 3 averages, and up to 3 precursor ions selected from the MS scan (100-2200 m/z). Precursors were actively excluded within a 1.0 min window, and all singly charged ions were excluded.
Peptide peaks were detected and deconvoluted automatically using data analysis software (Bruker). Mass lists in the form of Mascot generic files were created automatically and used as the input for Mascot MS/MS Ions searches of the NCBInr database using the Matrix Science web server (www.matrixscience.com). The default search parameters used were: enzyme = trypsin; maximum missed cleavages = 1; fixed modifications = carbamidomethyl (C); variable modifications = oxidation (M); peptide tolerance ±1.5 Da; MS/MS tolerance ±0.5 Da; peptide charge = 2+ and 3+; and instrument = ESI-TRAP.
Both two dimensional gel electrophoresis and mass spectrometry were carried out by the University of Aberdeen Proteome facility (www.abdn.ac.uk/ims/proteomics/).
All cases were selected from the Aberdeen colorectal tumour bank. In total, tumour samples from 515 patients were involved in this study. In each case a diagnosis of primary colorectal cancer had been made and the patients had undergone elective surgery for primary colorectal cancer in Aberdeen between 1994 and 2007. Ninety-nine tumours were from the period 1994-1998, 199 tumours were from 1999-2003 and 217 tumours were from the period 2004-2007. None of the patients had received any pre-operative chemotherapy or radiotherapy. The data for the patients and their tumours included in this study are detailed in supporting information Table S1. The mean lymph node yield for all tumours in this study was 13.4 lymph nodes per tumour and for node negative tumours the mean lymph node yield was 14.4 (lymph node yield refers to the total number of lymph nodes retrieved from each colorectal cancer resection specimen). Survival information was available for all patients and at the time of censoring patient outcome data there had been 237 (46%) deaths (all cause mortality). The mean patient survival was 114 months (95% CI 105-122 months). The colorectal cancer excision specimens were received fresh, opened above and when appropriate below the tumour, washed in cold water and then fixed in 10% neutral buffered formalin for at least 48 hours at room temperature, and representative blocks were embedded in wax. Sections were stained with haematoxylin and eosin for histopathological diagnosis and the tumours were reported according to The Royal College of Pathologists guidelines which incorporate guidance from TNM5 of the TNM staging system.
A colorectal cancer tissue microarray was constructed containing normal colon mucosa (n = 50), primary (n = 515) and metastatic colorectal cancer (n = 224). The metastases were all from tumour involved lymph nodes of the Dukes C cases. Each normal mucosal sample was acquired at least 10 cm distant from the tumour as previously described [28,29]. All the cases were reviewed and areas of tissue to be sampled were first identified and marked on the appropriate haematoxylin and eosin stained slide by an expert consultant gastro-intestinal pathologist (GIM). Two 1 mm cores were taken from these areas of the corresponding wax embedded block using a Beecher Instruments tissue microarrayer (Sun Prairie, WI, USA) and placed in a recipient paraffin block. Following transfer, the recipient array block was heated to 37°C, and a glass slide was used to carefully press down the cores to ensure they were all at the same level within the recipient wax block.

Immunohistochemistry
Immunohistochemistry for each antibody (Table 2) was performed with the biotin free Dako EnVision™ system (Dako, Ely, UK) using a Dako autostainer (Dako) as previously described [28,30,31]. Sections of the tissue microarray were dewaxed in xylene, rehydrated in alcohol and an antigen retrieval step performed. This step consisted of microwaving the sections fully immersed in 10 mM citrate buffer at pH 6.0 for 20 minutes in an 800 W microwave oven operated at full power. The sections were then allowed to cool to room temperature. The primary antibody, appropriately diluted (Table 2) in antibody diluent (Dako), was applied for 60 minutes at room temperature, followed by washing with buffer (Dako) and subsequent peroxidase blocking for 5 minutes (Dako). This was followed by a single 2 minute buffer wash after which pre-diluted peroxidase-polymer labelled goat anti-mouse/rabbit secondary antibody (EnVision™, Dako) was applied for 30 minutes at room temperature, followed by further washing with buffer to remove unbound antibody. Sites of peroxidase activity were then demonstrated with diaminobenzidine as the chromogen applied for three successive 5 minute periods. Finally sections were washed in water, lightly counterstained with haematoxylin, dehydrated and mounted. Omitting the primary antibody from the immunohistochemical procedure and replacing it either with antibody diluent or non-immune rabbit serum as appropriate acted as negative controls. Positive controls were tissues known to express the individual protein.
Figure 2. Hierarchical cluster and principal components analyses of 2D gels. Representative hierarchical cluster analysis (A) and principal components analysis (B) of normal and tumour gels. Both statistical methods show that the protein expression profiles determined by 2D gel electrophoresis and gel image analysis are distinct in tumour samples compared with normal samples. The figures presented in figure 2 are "screen captures" of the output of analysis by the Progenesis SameSpots software. Both figure 2A and figure 2B represent the results of the same experiment of one case (i.e. one pair of normal gels N1 and N2 and one pair of tumour gels T1 and T2). The lower panel in each part of the figure shows the standardised expression profile and represents proteins of distinct expression plotted vertically with lines "connecting" the corresponding proteins in each gel. The coloured spots represent interactive spots placed by the software for the user to access each data set and are positioned arbitrarily by the software on the screen. doi:10.1371/journal.pone.0027718.g002
The sections were evaluated by light microscopic examination and the intensity of immunostaining in each core was assessed independently by two investigators (DO'D and GIM) using a scoring system previously described for the assessment of protein expression in tumour microarrays [28–31]. The intensity of immunostaining in each core was scored as negative, weak, moderate or strong. The subcellular localisation (either nuclear or cytoplasmic) of the immunostaining was also assessed. Variation in immunostaining between cores of each case was not identified. Any discrepancies in the assessment of the tissue cores between the two observers were resolved by simultaneous microscopic re-evaluation.

Assessment of microsatellite instability status
Microsatellite instability (MSI) status was assessed by immunohistochemistry using antibodies to MLH1 and MSH2 (Table 2) as described previously [30].

Statistics
Statistical analysis of the immunohistochemical data, including the Mann-Whitney U test, Wilcoxon signed rank test, chi-squared test, hierarchical cluster analysis, Kaplan-Meier survival analysis, log-rank test and Cox multi-variate analysis (variables entered as categorical variables) including the calculation of hazard ratios and 95% CIs, was performed using PASW v18.0.2 for Windows XP™ (SPSS UK, Ltd, Woking, UK). The log rank test was used to determine survival differences between individual groups. A probability value of p≤0.05 was regarded as significant. To explore the influence of different cut-off points in relation to survival the immunohistochemical scores for each marker were dichotomized. The groups that were analysed were negative versus any positive staining, negative and weak staining versus moderate and strong staining, and negative, weak and moderate staining versus strong staining. Hierarchical cluster analysis was carried out using the furthest neighbour method with the square Euclidean distance as the cluster measure and cluster analysis was performed without any transformation of the data or imputation of missing values [18,19].
Informed consent was obtained from participants who provided fresh samples of tissue for the proteomics component of the study. The research ethics committee waived the requirement for written consent for the retrospective tissue samples included in the colorectal cancer tissue microarray.

Results
Proteomics
In total more than 1200 individual protein spots were resolved following separation by 2D gel electrophoresis and image analysis in normal colonic mucosa and colon tumours (Figure 1). Hierarchical cluster analysis and principal components analysis showed the separation of the proteins into two distinct groups, normal and tumour (Figure 2). The study included both proximal and distal colon tumours and neither cluster nor principal components analysis showed any difference in protein expression profiles between tumour and normal mucosa in these anatomical locations. Proteins showing greater than or equal to 1.5 fold increased expression in tumour samples are summarised in supporting information Table S2. The identity of these proteins was mostly confirmed by liquid chromatography-tandem mass spectrometry. For each protein multiple peptides with a high statistical probability (p<0.05) of matches to the relevant protein were analysed to confirm identity. Details of mass spectrometric identification of proteins are shown in supporting information Table S3.

Immunohistochemistry
Fifteen proteins were selected for immunohistochemical validation. The criteria for the selection of the proteins included the degree of overexpression in colorectal cancer, exclusion of known structural (e.g. actin) and serum (e.g. haemoglobin) proteins and the availability of suitable validated antibodies which were effective on formalin fixed wax embedded tissue. All the proteins showed tumour cell staining and all except nucleophosmin (NPM1) showed cytoplasmic staining (Figure 3). NPM1 showed exclusively nuclear staining while glyceraldehyde 3 phosphate dehydrogenase (GAPDH) showed both nuclear and cytoplasmic staining and these two sub-cellular localisations have been assessed separately for this protein. S100A9 showed both variable tumour cell staining (S100A9t) and variable stromal cell staining (S100A9s) and these two cellular localisations of this protein have been evaluated separately. The proteins that most frequently showed strong tumour cell immunoreactivity in primary colorectal cancer were NPM1 (99.6%), major vault protein (MVP, 81.1%) and prohibitin (PHB, 75.6%) while in lymph node metastasis those proteins which showed the most frequent strong tumour cell immunoreactivity were NPM1 (95.8%), MVP (74.5%) and heat shock protein 60 (HSP60, 63.9%) (Figure 4). In normal colon the proteins that showed the highest frequency of strong epithelial cell immunoreactivity were NPM1 (99.6%), isocitrate dehydrogenase 1 (IDH1, 93%) and lactate dehydrogenase B (LDHB, 82.1%) (Figure 4).
Table 3 footnote. Evaluation of normal colonic epithelium versus primary tumour samples for immunoreactivity (Mann-Whitney U test, ↑ = increased in tumour, ↓ = decreased in tumour, - = no change between tumour and normal) and evaluation of primary Dukes C colorectal tumour samples and their corresponding metastasis samples for immunoreactivity (Wilcoxon signed rank sum test, ↑ = increased in lymph node metastasis, ↓ = decreased in lymph node metastasis, - = no change between primary and metastatic tumour). doi:10.1371/journal.pone.0027718.t003
The proteins that showed the greatest degree of overexpression in primary colorectal cancer compared with normal colonic mucosa were HSP60 (p<0.001), S100A9 (p<0.001) and translationally controlled tumour protein (TCTP, p<0.001, Table 3). For Dukes C cancers no proteins showed increased immunoreactivity in the lymph node metastasis compared with the corresponding primary colorectal cancers, and the proteins that showed the greatest decrease in expression in lymph node metastasis were PHB (p = 0.002), peroxiredoxin (PRDX1, p = 0.003) and HSP60 (p = 0.005, Table 3). The relationship of protein expression with individual Dukes stages is shown in Table 4 and Figure 5.
Hierarchical cluster analysis and identification of prognostic protein signature
Hierarchical cluster analysis was also used as an exploratory statistical tool to examine the overall relationship of marker expression with outcome and, based on this, to identify a protein signature associated with prognosis. A range of cluster solutions (number of clusters) was investigated to determine the optimum number of clusters that produced groups with different outcomes. Clustering the data into ten clusters was identified as the optimum number of clusters for analysis in relation to the most prognostically significant groups (supporting information Table S4, Figure 6H and Figure 7). These 10 clusters were then combined into two prognostic groups: a good prognosis group (cluster 1) and a poor prognosis group (cluster groups 2-10) (Figure 6I). The good prognosis group (mean survival = 157 months, 95% CI 135-177 months, n = 39, number of deaths = 8) had a significantly better survival (χ2 = 8.144, p = 0.004, HR = 0.373, 95% CI 0.179-0.757) than the poor prognosis group (mean survival = 106 months, 95%CI 1-2-119 months, n = 392, number of deaths = 183).
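The survival comparison between two prognostic groups can be illustrated with a minimal Python sketch of the two-group log-rank test. This is an illustration under stated assumptions rather than the PASW procedure used in the study: the function names and the toy follow-up data are hypothetical, events are coded 1 for death and 0 for censoring, and the p value uses the chi-square distribution with one degree of freedom.

```python
import math


def logrank_chi2(times_a, events_a, times_b, events_b):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Numbers at risk just before time t in each group.
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)
        # Deaths at time t in group A and in both groups.
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        d = sum(1 for ti, e, _ in data if ti == t and e)
        n = n_a + n_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    return (observed_a - expected_a) ** 2 / variance


def chi2_1df_p(chi2):
    """Upper-tail p value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical follow-up times (months); every case here ends in an event.
toy_chi2 = logrank_chi2([2, 4], [1, 1], [1, 3], [1, 1])
toy_p = chi2_1df_p(toy_chi2)
```

Applying `chi2_1df_p` to the statistic reported above for the good versus poor prognosis groups (8.144) gives a value that rounds to the reported p = 0.004.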
Further analysis of the data based on the distribution of proteins in these cluster groups identified a two protein signature of 14-3-3β and aldehyde dehydrogenase 1 (ALDH1) that showed greater prognostic significance (χ2 = 7.306, p = 0.007, HR = 0.504, 95%CI 0.303-0.838) than 14-3-3β alone (Figure 6J). Those tumours that were both 14-3-3β and ALDH1 negative had a better prognosis than tumours showing either 14-3-3β or ALDH1 positivity. For patients with 14

Discussion
This study has performed a comprehensive proteomic analysis and immunohistochemical validation of protein expression in a large well characterised series of colorectal cancers (n = 515). The overexpression of individual proteins in colorectal cancer has been established and a two protein signature associated with prognosis identified.
There have been a number of proteomic studies performed on colorectal cancer. A range of proteomics technologies have been utilised although the predominant technologies have been 2D gel electrophoresis combined with mass spectrometry, which are both robust and well established technologies [32–42]. In most of those proteomics studies only a small number (often fewer than 10) of tissue samples have been included and some of the tissue samples included in these studies have little or no clinico-pathological information, possibly as a consequence of the samples having been procured from a third party tissue or tumour bank. In the absence of the clinico-pathological information interpretation of the proteomic studies is more difficult. Similarly, when a validation component has been incorporated into those studies it has often been limited by the number of samples included in this part of individual investigations [41].
Proteomics showed that the most significantly overexpressed protein in colorectal cancer was the beta sub-unit of 14-3-3. The 14-3-3 proteins are phosphoserine/phosphothreonine binding proteins composed of seven subunits which can both homo- and heterodimerise [43–45]. These proteins are involved in the regulation of multiple cellular signalling pathways including cell cycle regulation, apoptosis, metabolism, transcription and protein trafficking, many of them in a phosphorylation dependent manner. They are known to interact with pathways involved in tumourigenesis, e.g. the ras/raf and AKT/mTOR pathways [45]. Other proteins that were shown to be overexpressed in CRC included the metabolic enzymes (enolase 1 (ENO1), GAPDH, IDH1 and LDHB) involved in pathways of glucose metabolism. Some of these proteins have previously been noted to have increased expression in CRC by proteomics [41], which highlights the increased/altered glucose metabolism occurring in tumours [46].
The selection of proteins to be validated by immunohistochemistry was based on the degree of overexpression identified by the proteomic studies with the exclusion of structural and serum proteins and the availability of well characterised antibodies already shown to be effective on formalin fixed wax embedded tissue. The presence of 14-3-3β in colorectal cancer samples was confirmed by immunohistochemistry with a cytoplasmic location in tumour cells although its overexpression was not substantiated by immunohistochemistry. However, the comparative evaluation of proteins is based on different technologies. 2D gel electrophoresis and image analysis identify and compare average spot volumes in a gel while immunohistochemistry identifies the cellular/subcellular location of the protein combined with a semiquantitative assessment of the intensity of immunoreactivity of the individual protein.
Two methods were used to explore the relationship of protein expression with clinico-pathological factors and outcome. Each marker was assessed independently as a discrete variable in univariate survival analysis while hierarchical cluster analysis was performed to explore the overall relationship of marker expression, clinicopathological factors and survival to provide a more detailed understanding of that relationship.
The relation of individual proteins with survival in univariate analysis was explored in the data set using different cut-off points to dichotomize the data. The most robust cut-off point would appear to be the division between absence and presence of immunoreactivity when considered on the likelihood of reproducibility. On that basis 14-3-3β was associated with prognosis, with absent 14-3-3β being associated with a better prognosis. The use of other cut-off points highlights other potential markers (IDH1, LDHB, MVP, PHB and TCTP); however, those cut-off points, i.e. a division between weak and moderate staining or a division between moderate and strong staining, are potentially much less robust in practice than a cut-off between negative and positive.
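The three cut-off points can be made concrete with a short Python sketch that dichotomises the four-level intensity score. The names `SCORES`, `CUT_OFFS` and `dichotomise` are hypothetical and chosen only for illustration; the sketch assumes each core has already been assigned one of the four intensity categories.

```python
# Ordered immunostaining intensity categories, lowest to highest.
SCORES = ("negative", "weak", "moderate", "strong")

# Each cut-off is the lowest intensity level assigned to the "high" group.
CUT_OFFS = {
    "negative v positive": 1,
    "negative/weak v moderate/strong": 2,
    "negative/weak/moderate v strong": 3,
}


def dichotomise(score, cut_off):
    """Return True if the score falls in the higher group for this cut-off."""
    return SCORES.index(score) >= CUT_OFFS[cut_off]


# A weakly stained core is "positive" at the first cut-off only.
weak_by_cut_off = {name: dichotomise("weak", name) for name in CUT_OFFS}
```

A weakly stained core changes group depending on the cut-off chosen, which is one way to see why the negative versus positive division is likely to be the most reproducible.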
Hierarchical cluster analysis, which has been widely applied to gene expression data sets but rarely to immunohistochemical data [18,19], identified multiple clusters and, based on cluster membership, the combination of two proteins, namely 14-3-3β and ALDH1, was identified as prognostically significant. It is interesting to note that ALDH1 has been proposed as a stem cell marker and has recently been suggested to be a marker of colon cancer stem cells [47].
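For readers unfamiliar with the clustering method, the following Python sketch shows agglomerative clustering with furthest neighbour (complete) linkage and squared Euclidean distance, the combination named in the Statistics section. It is a minimal illustration and not the PASW routine: the function names and the example score profiles are hypothetical, intensity is assumed to be coded 0 (negative) to 3 (strong), and every case is assumed to have a score for every marker.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance between two marker score profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def furthest_neighbour_clusters(profiles, n_clusters):
    """Agglomerative clustering with furthest neighbour (complete) linkage.

    Returns a list of clusters, each a sorted list of case indices.
    """
    clusters = [[i] for i in range(len(profiles))]

    def linkage(c1, c2):
        # Complete linkage: distance between the two most dissimilar members.
        return max(squared_euclidean(profiles[i], profiles[j])
                   for i in c1 for j in c2)

    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        # Merge the closest pair of clusters and drop the absorbed one.
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical cases scored for three markers.
cases = [(0, 0, 1), (0, 1, 1), (3, 3, 2), (3, 2, 3), (1, 0, 0)]
groups = furthest_neighbour_clusters(cases, n_clusters=2)
```

With these toy profiles the three weakly stained cases and the two strongly stained cases separate into two clusters; in a study setting the resulting cluster membership would then be related to survival.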
The colorectal cancer tissue microarray was also specifically designed to include lymph node metastasis from those primary tumours with lymph node metastasis. This is a particular strength of the design of this tumour microarray and allowed a direct comparison of the phenotype of primary tumours and their synchronous lymph node metastasis. This is important, for example, as treatment in the adjuvant setting is targeted at metastatic disease and it is an assumption that the phenotype of primary tumours necessarily reflects the phenotype of secondary tumours [48,49]. Expression in the metastasis is likely to be influenced by the microenvironmental setting in which the metastasis develops [47]. Most of the proteins examined showed decreased expression in the metastasis compared with their corresponding primary tumours, indicating that further deregulation of protein expression is occurring in the lymph node metastasis. Most notably 14-3-3β, ALDH1 and PHB showed significant decreases in expression in lymph node metastasis compared with primary tumours, providing evidence for further dysregulation of protein expression in metastasis [47].
In summary this study has performed a comprehensive proteomics analysis of colorectal cancer and identified proteins that are overexpressed in colorectal cancer. Validation has been performed using immunohistochemistry and a two-protein signature associated with prognosis identified.

Figure 1. Tumour and normal 2D gels. Representative reference 2D gels of normal colon (A) and colon tumour (C). These are the annotated reference gels created by the Progenesis SameSpots gel image analysis software for the analysis of individual gels. The number of each spot is assigned by the image analysis software. For easier visualisation of individual spots representative non-annotated 2D gels of normal colon and colon tumour are shown in panels B and D respectively. The proteins which were validated by immunohistochemistry have been identified in D. doi:10.1371/journal.pone.0027718.g001

Figure 4. Frequency expression of individual proteins in normal colon, colorectal cancer and metastatic colorectal cancer. Frequency of expression as evaluated immunohistochemically of individual proteins in A. normal colon, B. primary colorectal cancer and C. lymph node metastasis. doi:10.1371/journal.pone.0027718.g004

Figure 6. Survival curves of marker proteins. The relationship of individual proteins evaluated by immunohistochemistry with survival with different cut-off points. A. 14-3-3β (positive v negative immunoreactivity), B. PHB (positive v negative immunoreactivity), C. IDH1 (negative/weak immunoreactivity v moderate/strong immunoreactivity), D. LDHB (negative/weak immunoreactivity v moderate/strong immunoreactivity), E. TCTP (negative/weak immunoreactivity v moderate/strong immunoreactivity), F. IDH1 (negative/weak/moderate immunoreactivity v strong immunoreactivity), G. MVP (negative/weak/moderate immunoreactivity v strong immunoreactivity), H. survival in each of 10 clusters identified by hierarchical cluster analysis (each cluster is numerically identified and corresponds to the clusters that are identified in the cluster analysis panel of Figure 7), I. survival in 2 clusters (cluster 1 and clusters 2-10 combined) and J. two protein signature of 14-3-3β and ALDH1 showing that double negative tumours have a significantly better outcome. doi:10.1371/journal.pone.0027718.g006

Figure 7. Hierarchical cluster analysis of immunohistochemical marker proteins. Graphical representation of the immunohistochemistry marker data is shown in the middle panel. The right hand panel shows the results of the hierarchical cluster analysis presented as a dendrogram with 10 individual clusters identified. The left hand panel shows an expanded segment of the graphical representation. Proteins are represented in columns and cases in rows. doi:10.1371/journal.pone.0027718.g007

Table 1. Clinico-pathological details of tumour samples used for proteomic analysis.
Twenty-two of the tumours used for the proteomics studies are also represented in the colorectal cancer TMA used for protein validation. doi:10.1371/journal.pone.0027718.t001

Table 3. Comparison of protein expression in normal colonic mucosa, primary colorectal cancer and lymph node metastasis.

Table 4. The relationship of protein expression in Dukes A, Dukes B and Dukes C colorectal cancers (Mann-Whitney U test). doi:10.1371/journal.pone.0027718.t004

Table 6. The relationship of individual protein expression with survival (log rank test) using different cut-off points for the immunohistochemical data.