Conceived and designed the experiments: TG. Performed the experiments: TG. Analyzed the data: TG JC. Contributed reagents/materials/analysis tools: JC. Wrote the paper: TG MM.
The authors have declared that no competing interests exist.
During the transformation process, cells accumulate DNA aberrations, including mutations, translocations, amplifications, and deletions. Despite numerous studies, the overall effect of amplifications and deletions on the end point of gene expression, the level of proteins, is generally unknown. Here we use large-scale, high-resolution proteomics combined with gene copy number analysis to investigate globally to what extent these genomic changes have a proteomic output and therefore the ability to affect cellular transformation. We accurately measure expression levels of 6,735 proteins and directly compare them to the gene copy number. We find that the average effect of these alterations on protein expression is only a few percent. Nevertheless, using a novel algorithm, we determine the combined impact that many of these regional chromosomal aberrations have at the protein level. We show that proteins encoded by amplified oncogenes are often overexpressed, while adjacent amplified genes, which presumably do not promote growth and survival, are attenuated. Furthermore, regulation of biological processes and molecular complexes is independent of general copy number changes. By connecting primary genome alterations to their proteomic consequences, this approach helps to interpret the data from large-scale cancer genomics efforts.
In the course of cancer development, cells lose regulation of the cell cycle and quality control of DNA replication. As a result, many genomic alterations accumulate, among them amplifications and deletions of chromosomal regions of varying sizes. Oncogenes that drive transformation often reside in amplified regions, while tumor suppressors are often deleted; yet for thousands of genes the effect of altering gene copy number is unknown. Since only genomic alterations that ultimately affect protein levels can have functional importance, a global proteomic approach that directly measures such changes is desirable. Here, we examined the output of chromosomal alterations at the protein level in a system-wide manner. We analyzed global protein expression in cancer cells compared to normal cells using mass-spectrometry–based quantitative proteomics and quantified a large part of the expressed proteome. We compared the protein data to genomic data and matched changes in gene copy number to changes in protein expression level for each gene. Overall, gene copy number changes explain only a few percent of the observed protein expression changes. Knowing when genomic and proteomic changes correlate may help to better understand regulatory mechanisms in tumor development.
Chromosomal aberrations are a hallmark of cancer cells. During transformation, cells lose cell-cycle control and fidelity of DNA replication, causing multiple changes in DNA copy numbers.
To better understand the general output of chromosomal changes, the protein level therefore has to be examined globally. Such knowledge can be crucial, as it can suggest novel potential drivers of transformation and, as already shown in specific cases in the past, help determine treatment modalities and prognosis.
To study the effects of genomic alterations on the protein level, we performed quantitative proteomic analysis of two aneuploid breast cancer cell lines and normal diploid cells. We SILAC-labeled the MCF7 breast cancer cell line with heavy lysine and arginine to serve as an internal standard for quantification. The lysate of the labeled cells was combined with the lysate of normal mammary epithelial cells (HMEC) or of one of two breast cancer cell lines: HCC2218, derived from a patient with Stage III ductal carcinoma, and HCC1143, derived from a patient with Stage II ductal carcinoma (
For proteomic analysis, lysates of each of the non-labeled cells (HMEC, HCC1143 and HCC2218) were mixed with lysate of SILAC-labeled MCF7 cells. Proteins were digested with trypsin and analyzed by LC-MS using high-resolution mass spectrometry. For genomic analysis, genomic DNA was isolated from HMEC, HCC1143 and HCC2218 cells and hybridized to SNP arrays.
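Because every non-labeled sample is measured against the same heavy MCF7 standard, a cancer-versus-HMEC ratio can in principle be obtained as a ratio of ratios, with the standard cancelling out. The sketch below assumes that this is how HCC/HMEC ratios are formed from the light/heavy (L/H) SILAC ratios; the function names and the example ratios are hypothetical, not values from this study.

```python
import math


def ratio_vs_control(sample_lh: float, control_lh: float) -> float:
    """Cancer/HMEC ratio from two L/H ratios measured against the same heavy MCF7 standard.

    (sample / MCF7) / (HMEC / MCF7) = sample / HMEC, so the standard cancels out.
    """
    return sample_lh / control_lh


def log2_ratio_vs_control(sample_lh: float, control_lh: float) -> float:
    """Same quantity on a log2 scale, where the division becomes a subtraction."""
    return math.log2(sample_lh) - math.log2(control_lh)


# Hypothetical L/H ratios for a single protein (illustrative only)
hcc2218_lh = 4.0  # HCC2218 (light) vs. MCF7 (heavy)
hmec_lh = 2.0     # HMEC (light) vs. MCF7 (heavy)

print(ratio_vs_control(hcc2218_lh, hmec_lh))       # 2.0
print(log2_ratio_vs_control(hcc2218_lh, hmec_lh))  # 1.0
```

In this design only MCF7 needs to be metabolically labeled; the cell lines being compared are measured in their light form.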
For the analysis of chromosomal aberrations, we mapped the copy number changes in the genomes of HCC2218, HCC1143 and HMEC with SNP arrays (Affymetrix Genome-Wide Human SNP Array 6.0;
A density plot of gene copy number in HMEC indicates that these cells are diploid and can therefore serve as a normal control (
(A) Density plot of the Affymetrix smooth signal in the HMEC control cells. A small peak at zero, caused by probes for the Y chromosome, which is absent in this female cell line, was removed. (B,C) Scatter plots of gene copy number in HCC2218 cells (B) or HCC1143 cells (C), normalized to the copy number of genes in HMEC, vs. the ratio of the proteins in HCC2218 or HCC1143 cells relative to HMEC. The rectangle in the upper left part of (B) encloses genes with increased gene copy number compared to control cells but decreased protein expression. The rectangle in the lower right contains single-copy genes with increased protein expression compared to control cells.
The plots of gene copy number vs. protein level show that gene copy numbers are distributed around integer values corresponding to 0, 1, 2, 3 and 4 gene copies (
Many chromosomal changes can be inferred from mRNA data
After genome profiling, the correlation between the calculated change in protein amounts at each genome position and the corresponding change in gene copy number was greatly increased (0.64 and 0.59 for HCC2218 and HCC1143, respectively). We plotted the calculated proteomic values against their chromosomal location to visualize amplifications and deletions along the chromosomes (
Protein ratios were averaged according to their localization using the genome profiling algorithm (
To examine whether our predictions were correct despite the low correlation between genome and proteome, we performed a similar alignment of the genomic data. We plotted the smoothed SNP array data (normalized to the control cells) directly according to genomic location. Although not all aberrations had a detectable proteomic output, remarkably, at each of the predicted locations we indeed found a matching change in the SNP array data (
While the correlation between gene copy number and protein level was very low, it was still possible that the altered genes globally affect specific pathways and processes and thereby confer a growth advantage on the aneuploid cells. We comprehensively analyzed each process to determine to what extent it is regulated at the protein level or at the genomic level. We developed a two-dimensional annotation distribution analysis tool (see
Scatter plot of normalized annotation changes on the genome level against the protein level. Calculation of significance is detailed in the
Our two-dimensional annotation analysis further highlighted a number of protein complexes, such as the proteasome, ribosome, spliceosome and NADH dehydrogenase complex. We found that the subunits of these complexes always maintain equal protein ratios, despite variation in the gene copy numbers of the subunits (
(A) Scatter plot of global ratio distribution of genes vs. proteins in HCC2218. The core 20S proteasome components are highlighted in red. (B) Scatter plot with the 26S proteasome highlighted in red. (C) Stacked plot of protein, gene and mRNA levels of 14 proteasomal subunits, normalized to the levels in HMEC.
We showed above that cellular processes and molecular machines do not follow gene dosage changes. However, as primary events in transformation, amplification or deletion of key regulatory genes may impact the functionality of the whole process. Indeed, oncogenes and tumor suppressors are often amplified or deleted in the genome.
We zoomed in on the small amplicons encompassing ERBB2, CCND1 and AKT1 to examine the effects of these amplifications on the expression levels of adjacent genes (
Zoom-in on the small amplicons surrounding ERBB2 in HCC2218 cells (A), CCND1 in HCC1143 cells (B) and AKT1 in HCC1143 cells (C). Fold changes in gene copy number compared to HMEC are marked with red rectangles; fold changes in protein level are marked with blue diamonds.
The amplicon surrounding the CCND1 gene includes five genes, of which we quantified four (
The amplicon surrounding AKT1, an oncoprotein that mediates cell growth and survival
Extrapolating from the proteins with a known role in the etiology of cancer, we created a list of potential novel regulators of transformation. We listed the overexpressed proteins encoded by amplified genes in HCC2218 and in HCC1143 cells (
We conclude that with high coverage of the proteome and high quantification accuracy, multiple chromosomal aberrations can be predicted directly from the proteomic data. Furthermore, proteomics can determine which genes in an amplified region are expressed at all and which change at the endpoint of the gene expression cascade, the level of the proteins. As expected, the expression of some oncogenes and tumor suppressors is affected by gene copy number. However, our data clearly show that in the majority of cases there is no direct correspondence between a gene copy number change and the corresponding protein change. We suggest that proteomics is a useful complement to the widely employed gene copy number analysis: it can determine whether genome amplifications or deletions have a downstream effect on protein levels, a precondition for any potential impact on the transformation process.
Human mammary epithelial cells (HMEC) were obtained from Lonza and cultured in mammary epithelial cell growth medium (ECACC, Health Protection Agency). HCC1143 and HCC2218 cells were obtained from the American Type Culture Collection (ATCC) and grown in RPMI containing 10% FBS. MCF7 cells were obtained from the German Collection of Microorganisms and Cell Cultures (DSMZ). MCF7 cells were SILAC-labeled by culturing them in DMEM in which natural lysine and arginine were replaced by the heavy isotope-labeled amino acids L-[¹³C₆,¹⁵N₄]-arginine (Arg10) and L-[¹³C₆,¹⁵N₂]-lysine (Lys8). Labeled amino acids were purchased from Cambridge Isotope Laboratories, Inc., USA. The medium was supplemented with 10% dialyzed serum. Cells were cultured for approximately 8 doublings in SILAC medium to reach complete labeling. For proteomic analysis, each cell line was analyzed in biological triplicates. The first two replicates were lysed with modified RIPA buffer (50 mM Tris-HCl pH 7.4, 150 mM NaCl, 1 mM EDTA, 1% NP40, 0.25% sodium deoxycholate and protease inhibitors) at 4°C. Following lysis, lysates were centrifuged at 14,000 rpm at 4°C. Proteins were then precipitated overnight with acetone and resuspended in urea/thiourea buffer (6 M urea, 2 M thiourea). Cells of the third replicate were lysed with a buffer containing 4% SDS, 100 mM Tris-HCl pH 7.6 and 100 mM DTT. Lysates were incubated at 95°C for 5 min and then briefly sonicated.
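The isotope composition of the labeled amino acids fixes the mass shift of each heavy residue, and the number of doublings bounds the amount of leftover unlabeled protein. The sketch below works out both, assuming standard monoisotopic masses for ¹²C/¹³C and ¹⁴N/¹⁵N and a simple dilution-only model of labeling (no protein turnover and no recycling of light amino acids). The helper names are hypothetical.

```python
# Monoisotopic mass differences between heavy and light isotopes (Da)
DELTA_13C = 13.0033548 - 12.0         # ~1.00335 Da per 13C atom
DELTA_15N = 15.0001089 - 14.0030740   # ~0.99703 Da per 15N atom


def silac_mass_shift(n_13c: int, n_15n: int) -> float:
    """Mass shift (Da) of an amino acid carrying n_13c heavy carbons and n_15n heavy nitrogens."""
    return n_13c * DELTA_13C + n_15n * DELTA_15N


def residual_light_fraction(doublings: float) -> float:
    """Fraction of pre-existing (light) protein left after growth in heavy medium.

    Dilution-only model: each doubling halves the share of old, unlabeled protein.
    """
    return 0.5 ** doublings


lys8 = silac_mass_shift(6, 2)   # L-[13C6,15N2]-lysine
arg10 = silac_mass_shift(6, 4)  # L-[13C6,15N4]-arginine
print(f"Lys8 shift: {lys8:.4f} Da, Arg10 shift: {arg10:.4f} Da")
print(f"Light fraction after 8 doublings: {residual_light_fraction(8):.2%}")
```

This gives shifts of about 8.0142 Da and 10.0083 Da, and under the dilution model roughly 0.4% of unlabeled protein after 8 doublings. Protein degradation during culture would tend to lower that residue further, while recycling of light amino acids would raise it.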
Genomic DNA was isolated from the cells using the QIAamp DNA Blood Maxi Kit. DNA was hybridized to the Affymetrix Genome-Wide Human SNP Array 6.0 according to the manufacturer's instructions. SNP array analysis was done in the Microarray DNA facility at the Max Planck Institute of Molecular Cell Biology and Genetics, Dresden. Raw files were analyzed with the “Copy Number and LOH analysis” algorithm of the Affymetrix Genotyping Console. We used the default settings, with HapMap270 as the reference and the default quality assessment and regional GC correction configuration. The ‘SmoothSignal’ column from the Affymetrix software output was used directly for the genome profile in
Each of the non-labeled samples (HMEC, HCC1143 or HCC2218) was mixed at a 1∶1 ratio with labeled MCF7 lysate. Two methods were used for tryptic digestion. In-solution digestion was used for the first two replicates, in which cells were lysed with RIPA buffer. Filter Aided Sample Preparation (FASP)
mRNA was isolated from HMEC, HCC1143 and HCC2218 cells using the PrepEase RNA Spin Kit (USB). Two micrograms of each mRNA were reverse-transcribed using the First Strand cDNA Synthesis Kit (Fermentas) with oligo-dT primers. For real-time PCR, we used iQ SYBR Green Supermix (Bio-Rad) on a C1000 Thermal Cycler (Bio-Rad). The protocol comprised 40 cycles of amplification with an annealing and elongation temperature of 54°C or 58°C. Primers for GAPDH were used for normalization. The primers are listed below (5′-3′):
PSMA1:for
PSMA2:for
PSMA3:for
PSMA4:for
PSMA5:for
PSMA6:for
PSMA7:for
PSMB1:for
PSMB2:for
PSMB3:for
PSMB4:for
PSMB5:for
PSMB6:for
PSMB7:for
GAPDH:for
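The text states only that GAPDH was used for normalization. One common way to turn Ct values into mRNA levels relative to HMEC, as plotted for the proteasomal subunits, is the comparative Ct (2^−ΔΔCt) method. The sketch below assumes that method and an amplification efficiency of 100% (a factor of 2 per cycle), neither of which is specified here; the function names and Ct values are hypothetical.

```python
def delta_delta_ct(ct_target: float, ct_gapdh: float,
                   ct_target_ref: float, ct_gapdh_ref: float) -> float:
    """delta-delta Ct = (Ct_target - Ct_GAPDH) in the sample minus the same difference in the reference."""
    return (ct_target - ct_gapdh) - (ct_target_ref - ct_gapdh_ref)


def relative_expression(ct_target: float, ct_gapdh: float,
                        ct_target_ref: float, ct_gapdh_ref: float,
                        efficiency: float = 2.0) -> float:
    """Fold change vs. the reference sample (e.g. HMEC), assuming equal primer efficiencies."""
    return efficiency ** -delta_delta_ct(ct_target, ct_gapdh, ct_target_ref, ct_gapdh_ref)


# Hypothetical Ct values for one proteasome subunit
fold = relative_expression(ct_target=22.0, ct_gapdh=16.0,       # cancer cell line
                           ct_target_ref=24.0, ct_gapdh_ref=17.0)  # HMEC reference
print(fold)  # 2.0
```

If primer efficiencies were measured and found to differ from 100%, the `efficiency` argument, or an efficiency-corrected model, would be the place to account for it.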
Peptides were separated according to their isoelectric point using an Agilent 3100 OFFGEL fractionator (Agilent, G3100AA) as described previously.
Peptides were separated by reverse-phase chromatography on an in-house made 15 cm column (inner diameter 75 µm, 3 µm ReproSil-Pur C18-AQ media) using a nanoflow HPLC system (Proxeon Biosystems). The HPLC was coupled online via a nanoelectrospray ion source (Proxeon Biosystems) to an LTQ-Orbitrap mass spectrometer (Thermo Fisher Scientific). Peptides were loaded onto the column in buffer A (0.5% acetic acid) at a flow rate of 500 nl/min and eluted with a 90 min linear gradient at a flow rate of 250 nl/min. After the linear gradient, the column was washed with 90% buffer B and re-equilibrated with buffer A. Mass spectra were acquired in positive ion mode, applying a data-dependent automatic switch between survey scan and tandem mass spectrum (MS/MS) acquisition. Samples were analyzed with a ‘top 5’ method, acquiring one Orbitrap survey scan in the mass range of m/z 300–2000 followed by MS/MS of the five most intense ions in the LTQ. The target value in the Orbitrap was 1,000,000 ions for the survey scan at a resolution of 60,000 at m/z 400, using lock masses for recalibration
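The ‘top 5’ data-dependent scheme can be pictured as a selection step that runs after every survey scan. The sketch below is a simplified model that ranks peaks by intensity within the m/z 300–2000 window. Real instrument methods typically add rules such as charge-state screening and dynamic exclusion, which are not described here. It also computes the peak width implied by a resolution of 60,000 at m/z 400. All names and peak values are hypothetical.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity) from one survey scan


def select_precursors(survey: List[Peak], top_n: int = 5,
                      mz_min: float = 300.0, mz_max: float = 2000.0) -> List[float]:
    """Return the m/z values of the top_n most intense peaks within the survey range."""
    in_range = [peak for peak in survey if mz_min <= peak[0] <= mz_max]
    in_range.sort(key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _ in in_range[:top_n]]


def peak_width_fwhm(mz: float, resolution: float) -> float:
    """Full width at half maximum implied by resolving power R = (m/z) / FWHM."""
    return mz / resolution


# Hypothetical survey scan; the peaks at m/z 250.1 and 2100.0 fall outside the range
survey = [(250.1, 9e6), (445.2, 5e6), (512.3, 8e6), (601.8, 1e6),
          (733.4, 7e6), (890.0, 3e6), (1200.5, 6e6), (2100.0, 9e6)]
print(select_precursors(survey))      # [512.3, 733.4, 1200.5, 445.2, 890.0]
print(peak_width_fwhm(400.0, 60000))  # ~0.0067 (m/z units)
```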
Raw MS files from the LTQ-Orbitrap were analyzed by MaxQuant
The algorithm is applied to the log ratios of relative protein levels between a cancer cell line and a normal cell line. Chromosomal locations are assigned to proteins according to the Ensembl annotation included in UniProt. On each chromosome, the sequentially ordered proteins are checked for significant regional deviations of their normalized log ratios from zero. For that purpose, windows encompassing various numbers of adjacent proteins are moved along the chromosome, and the deviation of the window mean from zero is tested with a one-sample t-test. Window sizes range from 3 proteins to the whole chromosome, increasing in steps of a factor of √2. Each log p-value was transformed, in a window-length-dependent way, into a posterior error probability by applying Bayes' rule to two-dimensional histograms. To correct for multiple hypothesis testing, a false discovery rate of 2% was applied, estimated by permutation on the basis of 10 randomized genomes. The final amplification or deletion profile is then calculated from the window medians of all windows in which the average value differs significantly from zero: at each position, all intersecting significant windows are considered, and among their medians the value that deviates most from zero is taken. This is the value of the amplification/deletion profile reported at that position. To obtain copy numbers, these values have to be exponentiated (using the base of the logarithm applied to the ratios) and multiplied by two. Protein ratios and the corresponding gene copy number changes are given in
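The sliding-window procedure above can be sketched in a few lines. This is a simplified illustration, not the published implementation. It assumes log2 ratios. In place of the posterior error probabilities and the permutation-based 2% FDR, it uses a fixed, arbitrary cutoff on the absolute one-sample t statistic (`t_cutoff`). Positions covered by no significant window are set to 0 (no change), which is also an assumption. Function names and the example data are hypothetical.

```python
import math
from statistics import mean, median, stdev
from typing import List


def window_sizes(n: int, smallest: int = 3) -> List[int]:
    """Window lengths from `smallest` proteins up to the whole chromosome, growing by factors of sqrt(2)."""
    sizes, w = [], float(smallest)
    while round(w) < n:
        sizes.append(round(w))
        w *= math.sqrt(2)
    sizes.append(n)  # the whole chromosome is always the largest window
    return sorted(set(sizes))


def abs_t(values: List[float]) -> float:
    """Absolute one-sample t statistic for the null hypothesis 'mean log ratio = 0'."""
    if len(values) < 2:
        return 0.0
    m, s = mean(values), stdev(values)
    if s == 0:
        return math.inf if m != 0 else 0.0
    return abs(m) / (s / math.sqrt(len(values)))


def genome_profile(log_ratios: List[float], t_cutoff: float = 4.0) -> List[float]:
    """Profile for proteins sorted by chromosomal position.

    At each position, keep the median of the intersecting significant window
    that deviates most from zero (simplified significance: |t| >= t_cutoff).
    """
    n = len(log_ratios)
    profile = [0.0] * n
    for w in window_sizes(n):
        for start in range(n - w + 1):
            window = log_ratios[start:start + w]
            if abs_t(window) < t_cutoff:
                continue  # window mean not distinguishable from zero
            value = median(window)
            for i in range(start, start + w):
                if abs(value) > abs(profile[i]):
                    profile[i] = value
    return profile


def to_copy_number(profile: List[float], base: float = 2.0) -> List[float]:
    """Exponentiate with the base of the log ratios and multiply by two (diploid reference)."""
    return [2 * base ** v for v in profile]


# Hypothetical log2 protein ratios along one chromosome: a 4-protein gain in the middle
ratios = [0.02, -0.01, 0.03, 0.0, 1.0, 0.95, 1.05, 1.0, -0.02, 0.01, 0.0, -0.03]
profile = genome_profile(ratios)
print(profile)                  # 1.0 at positions 4-7, 0.0 elsewhere
print(to_copy_number(profile))  # 4.0 at positions 4-7, 2.0 elsewhere
```

Taking window medians rather than means keeps a single outlying protein from dominating the reported value. Scanning many window lengths lets both short amplicons and whole-arm changes reach significance.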
Categorical annotation is supplied in the form of Gene Ontology (GO) biological process (BP), molecular function (MF) and cellular component (CC) terms, as well as participation in a KEGG pathway and membership in a protein complex as defined by CORUM. The chromosome of the corresponding gene was considered as an additional protein annotation. For each annotation term, proteins are separated into two groups, one containing the proteins annotated with this term and the other containing the complement. A two-dimensional two-sample test then determines whether the two-dimensional means of the two protein populations differ significantly. Here, the two numerical dimensions are the log protein ratio and the log copy number ratio, but the algorithm would apply to other data types as well. The specific test we use is a two-dimensional version of the non-parametric Mann-Whitney test. Multiple hypothesis testing is controlled using a Benjamini-Hochberg false discovery rate threshold of 5%. For significant categories, a two-dimensional difference score is calculated by determining, in each dimension, the average rank of the proteins belonging to the category. This average rank is then rescaled to the interval between −1 and 1. A value of 1 in one of the dimensions means that all members of the category have the largest values in this dimension, a value of −1 that they all have the smallest values, and a value of 0 that the ranks of the category members are distributed in the same way as those of the background proteins, with no bias towards larger or smaller values.
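The score described above can be sketched as follows. One rescaling with the stated properties is s = 2(R_in − R_out)/n. Here R_in and R_out are the mean ranks inside and outside the category, and n is the number of proteins. This equals a linear rescaling of the category's mean rank, so that 1 means all members rank highest, −1 lowest, and 0 matches the background. The sketch assumes this form, computes it separately for each of the two dimensions, and includes a standard Benjamini-Hochberg step. It does not implement the two-dimensional Mann-Whitney test itself, which is assumed to supply the p-values. Names and data are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def category_score(values: Sequence[float], in_category: Sequence[bool]) -> float:
    """Score in [-1, 1]: 2 * (mean rank inside - mean rank outside) / n."""
    ranks = average_ranks(values)
    inside = [r for r, member in zip(ranks, in_category) if member]
    outside = [r for r, member in zip(ranks, in_category) if not member]
    return 2 * (sum(inside) / len(inside) - sum(outside) / len(outside)) / len(values)


def benjamini_hochberg(p_values: Sequence[float], fdr: float = 0.05) -> List[bool]:
    """Flag the p-values that pass a Benjamini-Hochberg FDR threshold."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0  # largest rank k with p_(k) <= k / m * fdr
    for k, i in enumerate(order, start=1):
        if p_values[i] <= k / m * fdr:
            cutoff = k
    passed = [False] * m
    for k, i in enumerate(order, start=1):
        passed[i] = k <= cutoff
    return passed


# Hypothetical data: six proteins, the last two annotated with the term of interest
log_protein = [0.1, -0.3, 0.0, 0.2, 1.2, 0.9]
log_copy = [0.1, -0.1, 0.05, -0.02, 0.02, 0.0]
member = [False, False, False, False, True, True]
score_2d = (category_score(log_protein, member), category_score(log_copy, member))
print(score_2d)  # (1.0, 0.0): top-ranked in protein ratio, unbiased in copy number
print(benjamini_hochberg([0.001, 0.04, 0.03, 0.2]))  # [True, False, False, False]
```

A category scoring near (1, 0), as in this toy example, would correspond to a term that is upregulated at the protein level without a matching copy number bias.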
Oxidative phosphorylation changes. Gene copy number changes (A) and proteomic changes (B) in oxidative phosphorylation proteins in HCC2218 cells. Oxidative phosphorylation proteins identified in HCC2218 were selected, and a network was constructed using the STRING database
(2.93 MB EPS)
Protein complexes. Distribution of genes versus proteins. (A) Scatter plots of gene and protein distribution in HCC1143. Highlighted in red are the 26S or 20S proteasomal subunits. (B) Ratio distribution of genes and proteins in HCC1143 and HCC2218, with complexes highlighted in red.
(5.84 MB EPS)
Changes of functional categories on the proteome and genome level.
(0.13 MB PDF)
Cancer-associated genes that have a change in gene copy number.
(0.08 MB PDF)
Amplified and deleted genes and matching changing proteins.
(0.16 MB XLS)
Complete protein table.
(2.39 MB XLS)
Peptide table.
(9.21 MB XLSX)
Merged table of protein ratios and gene copy number.
(1.82 MB XLS)
Genome profiled protein ratios.
(1.25 MB XLS)
We thank colleagues at the Department for Proteomics and Signal Transduction for assistance and fruitful discussion.