A Unique Four-Hub Protein Cluster Associates to Glioblastoma Progression

Gliomas are the most frequent brain tumors. Among them, glioblastomas are malignant and largely resistant to available treatments. Histopathology is the gold standard for classification and grading of brain tumors. However, brain tumor heterogeneity is remarkable and histopathology procedures for glioma classification remain unsatisfactory for predicting disease course as well as response to treatment. Proteins that tightly associate with cancer differentiation and progression, can bear important prognostic information. Here, we describe the identification of protein clusters differentially expressed in high-grade versus low-grade gliomas. Tissue samples from 25 high-grade tumors, 10 low-grade tumors and 5 normal brain cortices were analyzed by 2D-PAGE and proteomic profiling by mass spectrometry. This led to identify 48 differentially expressed protein markers between tumors and normal samples. Protein clustering by multivariate analyses (PCA and PLS-DA) provided discrimination between pathological samples to an unprecedented extent, and revealed a unique network of deranged proteins. We discovered a novel glioblastoma control module centered on four major network hubs: Huntingtin, HNF4α, c-Myc and 14-3-3ζ. Immunohistochemistry, western blotting and unbiased proteome-wide meta-analysis revealed altered expression of this glioblastoma control module in human glioma samples as compared with normal controls. Moreover, the four-hub network was found to cross-talk with both p53 and EGFR pathways. In summary, the findings of this study indicate the existence of a unifying signaling module controlling glioblastoma pathogenesis and malignant progression, and suggest novel targets for development of diagnostic and therapeutic procedures.


Introduction
In 2013, more than 23,000 individuals were expected to be diagnosed with primary tumors of brain and central nervous system and more than 14,000 deaths were expected in the US alone [1]. The World Health Organization defines pilocytic (Grade I) and diffuse (Grade II) astrocytomas as low-grade brain tumors; anaplastic astrocytomas (Grade III) and glioblastomas (Grade IV; also designated as glioblastoma multiforme, GBM) are high-grade malignant tumors [2,3]. With an annual incidence of 2-3 per 100,000 in Europe and US, GBM is the most frequent and aggressive form of brain tumor (60-70% of total malignant gliomas), and is essentially incurable [3,4]. GBM consists of poorly differentiated, highly invasive neoplastic astrocytes; histopathological features include cellular polymorphism, nuclear atypia, mitotic activity, vascular thrombosis, microvascular proliferation and necrosis [5]. Regional heterogeneity of GBM frequently causes diagnostic discrepancies ($20% of cases). Moreover, a high percentage of gliomas, such as mixed oligoastrocytomas and lower-grade gliomas, remain difficult to categorize reproducibly due to considerable histological overlap. These factors can compromise choice as well as effectiveness of therapeutic options [6]. Histopathologic diagnosis can be further compromised when only small biopsies are available. Additional molecular markers are thus urgently needed to efficiently discriminate among patients with distinct outcomes.
Loss of PTEN, amplification of EGFR and alterations of TP53, PDGFRA and CDKN2A/P16 are frequently found to be associated with GBM pathogenesis [5,7]. Primary GBMs develop de novo after a short clinical history and without evidence of precursor lesions, whereas ''secondary'' GBMs arise from preexisting diffuse or anaplastic astrocytomas. The signaling pathways responsible for development and growth of primary versus secondary GBM appeared as profoundly diverse, suggesting these two types of GBM to be different disease entities. Rather diverse genetic signatures were further proposed in the attempt of explaining GBM pathogenesis and heterogeneity [5,[8][9][10][11][12]. However, the actual impact of genetic signatures for GBM diagnosis and prognosis remains to be defined.
Genomic and transcriptomic data have provided key insight in GBM pathophysiology [11][12][13][14]. Corresponding insight into GBM proteomics [15] has not yet been achieved. Proteomic analysis of low-and high-grade tumors has tried to fill this gap [16][17][18][19][20], but analysis of specific protein markers has largely failed to provide a comprehensive view of GBM pathology.
In this study, we set to identify significantly modulated protein clusters that may bear functional impact and robustly explain distinct, relevant GBM pathology components. Proteomic analysis of human high-grade tumors, low-grade tumors and control tissue samples from normal brain cortex was thus systematically intersected through multivariate statistical procedures (principal component analysis, PCA and partial least square-discriminant analysis, PLS-DA). Using this approach, we were able to identify protein clusters discriminating tumors from normal tissues as well as high-grade from low-grade gliomas. Connectivity network analysis then allowed to discover a GBM control module that encompassed four major signaling hubs centered on Huntingtin, HNF4a, 14-3-3f and c-Myc. This proteomic signature was shown to underlie p53 and EGFR signaling, as an interconnected network. The GBM control module is candidate to be used as diagnostic biomarker and as target for therapeutic intervention. It may also help drafting a unifying model for glioblastoma appearance.

Patients and tissue specimens
Bioptic samples from low-grade and high-grade glioma patients were frozen in liquid nitrogen and stored at 280uC at the Section of Pathology ''M. Malpighi'' of the Bellaria Hospital, University of Bologna, between 1990 and 2002. Corresponding formalin-fixed paraffin embedded (FFPE) samples were stained with haematoxylin-eosin for routine histological diagnosis. The protocol of this study was approved by the board of the Ministry of the University and Research (''Novel technologies for glioblastoma assessment'', FISR Neurobiotechnologies, Grant N 481). Informed consent was previously obtained as indicated in Marucci et al. [21].
A total of 10 low-grade glial tumors (4 oligodendrogliomas, OL, 4 pilocytic astrocytomas, PA and 2 fibrillary astrocytoma, FA) and 25 GBMs were collected. All samples were re-staged and graded by expert pathologists according to 2007 WHO central nervous system tumor classification [2]. Control samples were tissues from five normal cortices from different brain regions, as obtained at autopsy from individuals deceased from diseases not involving the brain.

Reagents and chemicals
All reagents and chemicals are purchased from Sigma-Adrich (St. Louis, MO, USA), Bio-Rad Laboratories (Hercules, CA, USA) and GE Healthcare (Little Chalfont, UK).

Brain tumor lysates
Frozen brain tumor specimens were thawed on ice and resuspended in 2-DE lysis buffer (8 M urea, 40 mM Tris base, 65 mM DTT). Tumor lysates were briefly sonicated in Eppendorf (Hamburg, Germany) tubes with three 10-sec bursts, in 4% CHAPS. The lysates were centrifuged for 15 min at 12000 rpm to remove cell debris. Lysate supernatants were then processed for 2D PAGE analysis (Two dimensional polyacrylamide gel electrophoresis).
The SDS-PAGE (Sodium dodecyl sulfate polyacrylamide gel electrophoresis) dimension was run in a vertical gradient acrylamide/PDA (9-16% T/2.6% C) slab gel. Sodium thiosulfate was used as an additive to reduce background in the silver staining. A constant current of 40 mA/gel was applied [23]. Gels were removed from glass plates, washed in deionized water for 5 minutes, and stained with ammoniacal silver as described by [22]. Preparative gels were stained with the Protea silver stain kit compatible with mass spectrometry analysis (Protea Bioscience, Morgantown, WV, USA).

Image analysis
The GS-700 Densitometer Gel Doc (Bio-Rad Laboratories, Hercules, CA, USA) was used as scanning device. Protein spots were detected using ImageMaster 2-D Platinum software, version 6.0 (GE Healthcare, Little Chalfont, UK). Spot borders were visually inspected and misidentification caused by confluent spots, artifacts and low signal to noise ratio, were manually corrected. Parameters like ''saliency'' (a measure of spot curvature) and ''min area'' (lowest area threshold under which spots are considered artefacts) were used to identify 'true' and 'false' protein spots. Manual contour drawing was then applied in all cases of suboptimal spot auto-detection. This procedure was validated by assessing total spot numbers and spot volume ratios before and after background subtraction.
In order to optimize quantitative analysis of protein spots, the volume of each candidate spot was normalized using 4 surrounding landmark spots localized in areas closeby to the spot of interest, i.e. within corresponding background grey and with analogous signal staining exposure. Landmarks ratios were used for first-level normalization. Sums of landmark signals were then used for target spot normalized quantification. Using these criteria, spot detection and quantification were obtained, which minimized intensity variations among the gels, as assessed by Mann-Whitney test (p, 0.05). Spots with more than 50% of data missing were not included in subsequent analytical steps.
Expression data of each identified spot were plotted into a frequency histogram to highlight main differences between analyzed sets, to visualize subtypes within the same histopathological group and to assess valued distribution shapes. The Shapiro-Wilk test was utilized to assess for normal (Gaussian) distributions.

Mass spectrometry analysis
After tryptic in gel-digestion, overnight at 37uC [24], the differentially expressed spots were excised from preparative gels and analyzed by mass spectrometry (MS) to identify amino acid sequences, using a Bruker Ultraflex III (Bruker, Bremen, Germany) operating in reflectron mode. This instrument was equipped with a Nd:YAG smartbeam laser to acquire positive-ion MALDI mass spectra over a mass range of m/z 800-4000. Spectral processing and peak list generation were implemented by Bruker flexAnalysis software (version 3.3, Bruker Daltonics) for MS and MS/MS spectra. For each protein spot, the most intense precursor ion signals in each MS spectrum were analyzed by MS/ MS fragmentation in LIFT mode. a-cyano-4-hydroxycinnamic acid was used as matrix. Spot identifications were performed by querying the Mascot database. Trypsin cut, carbamidomethyl (C) as fixed, oxidation (M) as variable, a maximum of one missed cleavage allowed, were imposed as modifications in the search parameters. Peptide tolerance and MS/MS tolerance were set at 250 ppm and 0.5 Da respectively.
Column-wise normalization was applied to provide Gaussianlike distributions [25,29]. Analyses were then performed on autoscaled data (mean-centered and divided by the standard deviation of each variable) [30]. A diagnostic plot was utilized to represent normalization procedures for normal distribution assessments [29]. As examples, value intensities for e.g. APOA1, PRDX2, ALDOC, CRYAB_b, TTHY are $3fold higher than others (e.g. NDUS1, QCR1, NFM, ACTB), thus inducing a skewed distribution. After autoscaling normalization, box plots have nearly same mean, standard deviation, and their distribution better matches a Gaussian curve (Figure S1B right) (Kernel density plot right-bottom) [31]. PCA was used as an unsupervised method in order to find the directions of maximum covariance among our protein spots without referring to class labels (tissue samples). This allowed to visualize differences among samples, to detect clustering and pick-up outliers.
PCA condenses datasets to obtain optimal dimensions that best capture signal covariance. However, it fails providing working hypotheses for some causal relations among data subsets [32]. Hence, we went on performing histopathology classificationguided PLS-DA. As many supervised classification algorithms tend to overfit the data [26,33], PLS-DA model validation was performed as previously described [34]. Briefly, to define the optimal number of PCs (principal components), ''7-fold crossvalidation'' (CV) was applied [35,36]. Using CV, the predictive power of the model was verified. Two parameters were calculated for evaluating the models: R 2 (goodness of fit) and Q 2 (goodness of prediction). A model with Q 2 .0.5 was considered good, Q 2 . 0.9 excellent [37,38]. As cross-validation only assesses the predictive power without a statistical validation, the performance of PLS-DA models was also validated by a permutation test (200 times).
To help interpreting results from PLS-DA, we considered the variable importance in the projection scores (VIP score) and regression coefficients (CoeffCS). This allowed to evaluate protein influence (including prediction performance) on the model and identify the best descriptors of the differences among the three groups. The VIP score is a weighted sum of squares of the PLS loading weights taking into account the amount of explained Y-variation in each dimension [25,26,39]. Since the average of squared VIP scores equals 1, the ''greater than 1'' rule is generally used as a criterion to identify the most significant variables [37,39]. PLS-DA CoeffCS express the relation between the Y variables (classes) and all the terms in the model and are used for interpreting the influence of the X variables (proteins) on Y. VIP and CoeffCS values are cumulatively calculated from all extracted PLS components. The coefficients express how strongly the Y groups are correlated to the systematic part of each X variable considering all three components.

Gene ontology, networks and functional analyses
Gene Ontology (GO) analysis was performed using PANTHER 7.2 software (www.pantherdb.org/). The signaling hubs and connectivity networks were obtained using Ingenuity Pathway analysis (IPA, Ingenuity Systems, www.ingenuity.com) and STRING 9.1 (string-db.org) package.

Western blotting
Expression levels of randomly selected proteins were analyzed by Western blotting in order to validate the dataset identified by proteomic analysis. Blots were incubated with the following primary antibodies (Santa Cruz, Santa Cruz, CA): LDH-B (Q-21) sc-133731 rabbit polyclonal; SOD-1 (V-17) sc-34015 goat polyclonal; APOA-I (FL-267) sc-30089 rabbit polyclonal; Aldolase C Antibody (N-14) sc-12065 goat polyclonal and PRXII (9A1) sc-59660 mouse monoclonal. Signal intensities of the bands were quantified with Image JA 1.46b, using a Kodak greyscale standards power curve (www.kodak.com) as reference.
Band intensity values were normalized versus red Ponceau signals of transferred proteins on Western blot filters. Normalized densitometry values between proteomic gel spots and Western blot bands were correlated with the Spearman's rank correlation analysis, to obtain rho coefficients and corresponding p values.

Immunohistochemistry data bank meta-analysis
Publicly available databases [40,41], containing high-resolution IHC (Immunohistochemistry) images extending proteome-wide were analyzed for patterns of expression of GBM-driving engines. The Human Protein Atlas (v. 12, www.proteinatlas.org) provides spatial distribution and expression data from 16621 proteins/ 21984 antibodies in different normal human tissues and different cancer types. The expression profiles of hub proteins were generated for antibody staining parameters, intensity, and fraction of positive cells in control versus glioma arrays. EGFR and p53 IHC staining were used as internal benchmark for performance assessment and quantification standards. gels from normal brain, low-grade and high-grade gliomas. Proteins from normal and tumor brain tissues were processed as described in materials and methods. Protein spots were visualized by ammoniacal silver staining. (B) 3-D representation of protein volume spots and comparison between automatic and manual procedures for definition of protein spots and edges. Manual strategy is time-consuming but allows reducing the number of misidentifications and improving the quantification of the same protein markers between different gels. (C) Normalization of density values of differentially expressed spots (green) by using surrounding landmarks (red). (D) High-grade histotype tumors (green) showing a bimodal distribution of triosephosphate isomerase (TPIS1) expression values. In low-grade tumors (blue) and control samples (red) the values appear to follow a typical Gaussian (normal) distribution. doi:10.1371/journal.pone.0103030.g001

Proteomic analysis
Twenty-five high-grade GBMs, 10 low-grade gliomas and 5 tissue samples from normal brain cortex were analyzed by 2D PAGE ( Figure 1A). Clinical data from brain tumor patients are summarized in Table 1 [21].
To improve quantification accuracy and allow robust statistical analysis of 2D gel data [42], we optimized image processing, spot detection and signal quantification procedures by applying operator-guided background subtraction and spot contour optimization ( Figure 1B and S1A). These procedures allowed to obtain 49615% increase of the detected signal (spot volume upon spot contour optimization) (Table S1), and better spot detection (+1662%), as compared with the basic/automatic procedure. Most spot variation was not detectable across all samples; therefore, protein spots were marked as ''differential'' if expression changes could be observed in at least one tumor sample versus all control samples. The statistical distributions were found to be notnormal and often bimodal for most protein spots ( Figure 1D). Forty-eight protein markers were identified by MS (Table 2, Figure S2, Supplemental Material S1), and their expression levels were quantified upon the image optimization procedures described above ( Figure S3A and Table S2A,B). For enhancing quantification robustness, the density value of each candidate spot was normalized versus 4 surrounding landmarks ( Figure 1C, Table S2A) as described in Materials and Methods section. This allowed to obtain robust quantification, through compensating for local staining dishomogeneities.

Spearman's correlation analysis
Negative or positive correlations were globally revealed by Spearman's correlation analysis ( Figure 2B). Highest positive correlations ( Figure 2C, red) were found to occur between HBA and HBD; CN37 and TBA1B; CN37 and LDHB; ALDOC and ESTD; CN37 and DEST; CN37 and RAN; ALDOC and NFM; TBA1B and DEST. Highest negative correlations ( Figure 2C, blue) were observed between UCHL1 and HBD, and between UCHL1 and HCD2 (Table S3).

PCA analysis
The proteomic matrix was processed by scaling protein expression values in order to reduce potential systematic bias and make the variables comparable in magnitude to each other [25,27], as indicated. The data scaling results and normalization procedures, are summarized graphically in the Figure S1B. The horizontal box plots represent the distributions of individual variables, the bottom curves show the global data distribution based on kernel density estimation ( Figure S1B).
Then, we went on to utilize PCA as an unsupervised multivariate method for analyzing the dataset and identifying the best discriminators among sample classes. PCA score plots were generated ( Figure 3A), where each axis represented a PC identifying linear combinations of the most tightly interconnected proteins/signaling networks [32,43]. Samples with similar protein expression profiles/PC scores clustered together with striking fitness. PC1 (score vector t1) was found to discriminate controls from tumors; PC2 (score vector t2) separated low-grade from highgrade gliomas ( Figure 3A, left). Seven major discriminators between control and tumor samples were found: expression levels of APOA1, PRDX3_a and CLIC1 were higher in tumors than in normal brain cortex, whereas significantly lower levels of NFM, CN37, NDUS1 and MDHC were found in tumors as compared with normal tissues (component PC1, Figure 3A, right). Thirteen major discriminators between low-and high-grade tumor samples were found: expression levels of HCD2, HBA and HBD were strongly up-regulated in high-grade gliomas, whereas CRYAB_b, IPYR, TPIS, PEA15, PSD13, GFAP, PHP14, 6PGL, KCRB, IDH3A had higher expression in lowgrade than high-grade tumors (component PC2, Figure 3A, right). PCA analysis also revealed that UCHL1 have a high discriminating power of this marker when control/low-grade samples were compared with high-grade tumors. Global sets of protein marker with higher ( Figure S3B,C) and lower discriminating power ( Figure S3D) were identified.

PLS-DA analysis
To verify the strength of the unsupervised PCA analysis, and to further build on it, we analyzed the dataset on the basis of known classes (controls vs high-grade tumors vs low-grade tumors) using a supervised PLS-DA method [37,38,44,45]. This model was found to have strong goodness of fit (cumulative R 2 Y = 0.890) and prediction power (cumulative Q 2 = 0.813) (Figures 4 A,B). The separation between normal brain tissues, low-grade and highgrade gliomas yielded a staggering clear discrimination ( Figure 3B). Most significant separations were explained by a three-component model, where principal components PC1, PC2 and PC3 represented 15.1%, 13.7% and 10% of the total variance in the protein spot-matrix, respectively ( Figure 4). Permutation tests were carried out in order to validate the PLS-DA model [37]: as shown in Figure S4A, the original model was found to have higher R 2 and Q 2 values than the permuted models, and negative Q 2 values were obtained for all three permuted groups tested.
A PLS-DA loading plot was generated in order to find major discriminants between the groups analyzed ( Figures 3C). Eleven major discriminators between control and tumor samples were identified: APOA1, CLIC1 and PRDX3_a were over-expressed, whereas NFM, CN37, NDUS1, MDHC, ALDOC, STMN1, PEBP1 and DDAH1 were down-regulated in tumors as compared with control samples. Twelve major discriminators between lowand high-grade gliomas were identified: HCD2, HBA and HBD were up-regulated in high-grade gliomas, whereas expression levels of CRYAB_b, IPYR, TPIS, PEA15, PSD13, GFAP, IDH3A, 6PGL and PHP14 were found to be higher in low-grade than in high-grade tumors. The regression coefficients calculated for PLS-DA outcomes confirmed the power of the identified clusters in distinguishing among sample classes ( Figure S4B). Next, we calculated the VIP score for each protein in our dataset. Out of 48 variables analyzed as potential predictors, the following 18 descriptors were found to significantly contribute to the classification model (VIP score $1): HBD, UCHL1, HBA, STMN1, HCD2, ALDOC, NFM, IPYR, NDUS1, MDHC, DDAH1, PSD13, APOA1, 6PGL, PEBP1, TTHY, ACTY, IDH3A ( Figure 4D).
In order to improve the discriminating power of the identified protein markers, we went on to intersect PCA and PLS-DA loading plots and find the shared proteins/best discriminators for each condition ( Figure 3C). HBA, HBD and HCD2 positively correlated with high-grade gliomas; GFAP, PHP14, 6PGL, PSD13, PEA15, TPIS, CRYAB_b, IPYR and IDH3A correlated with low-grade tumors; on the other hand, NFM, CN37, NDUS1 and MDHC negatively correlated with tumor samples. APOA1, PRDX3_a and CLIC1 were the best discriminators between tumors and negative controls.

Validation of proteomic profiles
The proteomic landscape of human gliomas was then validated by protein immunoblotting, quantifying the expression levels of randomly-selected spots. Protein markers (APO-A1, SOD1, PRDXII, LDHB, ALDOC) were randomly selected (,10%) among the 48 differentially expressed proteins and analyzed. Western blot chemiluminescence images were acquired at subsaturation levels and quantified with ImageJ. Silver staining density quantified as above and Western blot signals were then subjected to Spearman's correlation analysis.  The selected performance measure Q 2 shows the three-component model performs as the best one. R 2 X: portion of the variation of X explained by specified PC; R 2 X(cum) Cumulative explained portion of X set variation; Eigenvalue: number of variables (K) times R 2 X; R 2 Y: portion of the Y set variation modeled by the PC; R 2 Y(cum): cumulative modeled variation of Y set; Q 2 : overall crossvalidated R 2 for the specific PC; Limit: threshold cross-validation for the specific PC; Q 2 (cum): cumulative Q 2 up to the specified component, is a model predictive power according to cross validation. Unlike R 2 X(cum), Q 2 (cum) is not additive. (C) 3-D score plot. The samples are represented in the 3-D score plot, the first three components (PC1, PC2, PC3) are reported accounting 15.1%, 13.7% and 10% of total variation respectively. (D) Proteins able to discriminate between controls, low-grade tumors and high-grade tumors, ordered by VIP score. VIP scores $1 are significant (above the red line) and indicate important X variables (proteins) that predict Y responses (tissue samples). doi:10.1371/journal.pone.0103030.g004 Western blotting data, densitometry, normalization procedure details, scatter plots and elliptic confidence intervals.

Pathway analysis
The discriminating performance of the protein clusters identified by unsupervised analysis, and the tight correspondence between PCA and PLS-DA clusters suggested deep, intersected biological relevance. Hence, we went on to explore the existence of a ''GBM control module'' connecting the discriminating protein clusters through physical and functional interactions. In order to reveal these cross-talks we carried out bioinformatic network detection followed by data meta-analysis. The top-score network (Fisher's exact test: p = 1x10 2104 ) was found to contain 46 out of the 48 proteins identified by proteomics ( Figure 5 and Table  S5A). Most members were found to play key roles in neurological diseases (28), genetic disorders (34), skeletal and muscle diseases (22), and cancer (25). The five top score pathways of this network included proteins involved in mitochondrial function (HCD2, PRDX3, NDUS1, PRDX5 and QCR1), pentose phosphate pathway (6PGL, ALDOC, TKT); glycolysis/gluconeogenesis (HCD2, TPIS, LDHB and ALDOC), inositol metabolism (TPIS and ALDOC) and oxidative phosphorylation (NDUS1, ATP5H, QCR1 and IPYR).
We then went on to assess the relevance of the GBM control module versus known molecular pathways involved in GBM biogenesis, via IPA and STRING platforms algorithms (Tables S5B). To our surprise, the four hubs we had identified were found to converge on both major players of glioma development and progression: epidermal growth factor receptor (EGFR) and p53. Bridging proteins between network hubs and EGFR/p53 were as follows: UCHL1, TPI1 and SH3BGRL to EGFR; ACTB, CRYAB, STMN1, NME1, Tubulin, GFAP, UBE2N, PPA1 and UCHL1 to p53 (Table S5B).
Contrary to a wide-held belief, proteins and mRNA levels correlate poorly in most cellular systems [46][47][48], differential protein/mRNA stability playing a major role in this discordant scenario [49,50]. Nevertheless, identification of transcription factor-driven differential gene expression landscapes provides insight into tumor-driving gene networks [51]. Hence, we went on to identify upstream transcription factors potentially involved in a coordinated regulation of proteins taking part to the GBM control module. Using stringent criteria for the analysis (P values ,0.005, interactions $5), we discovered that 9 transcription factors (HTT, MYC, HNF4A, TP53, ESRRA, NFE2L2, PPARGC1A, MYCN, ESR1) interacted with 33 out of 48 differentially expressed proteins. Importantly, these transcription factors were also found to interact with/regulate expression of 18 of the best discriminators identified by PCA and PLS-DA (Table  S5C). Of major relevance, all transcription factors identified as central hubs (Huntingtin, c-Myc, HNF4a) of the GBM control module, together with p53, stood-up as major drivers of the expression of the vast majority of the components of the module.

Validation of hub expression in tumors
A prediction of our model was that the four hubs of the GBM control module should be broadly expressed. Hence, we assessed their expression in human glioma samples by protein immunoblotting. 14-3-3f, HNF4a and Huntingtin were widely expressed in glioma samples. Expression of HNF4a was higher in tumors samples than controls; Huntingtin and c-Myc were found to be overexpressed in high-grade gliomas ( Figure 6). Moreover, we analyzed the expression of the four hubs in tissue samples by IHC (Figure 7 and S5). In high-grade gliomas 14-3-3f and HNF4a were strongly expressed in nuclear and cytoplasmatic compartments ( Figure 7A, Figure 7C). On the other hand, specific nuclear accumulation was observed in low-grade samples ( Figure 7B, Figure 7D). c-Myc was specifically expressed in the nucleus of the glioma samples, with a trend toward an upregulation from low-to high-grade tumors( Figure 7G, Figure 7H).

Four hubs expression meta-analysis
The findings above suggested broad expression of the four hubs of the GBM control module. We verified this prediction by performing a proteome-wide profiling of IHC expression patterns (Human Protein Atlas; www.proteinatlas.org) for Huntingtin, HNF4a, 14-3-3f and c-Myc. HNF4a level in normal glial cell showed low levels of staining, with moderate intensity in ,25% of the cells. In analyzed gliomas 5/10 had a corresponding expression profile as compared to controls; 2/10 presented an increase of expression (medium staining, moderate intensity and percent reactive cells of 75-25%), 3/10 presented lower expression (,25% of cells) compared with control samples. Hungtintin level in normal glial cell showed low expression (low staining, moderate intensity and percentage ,25%) in IHC array stained with mouse mAb. In glioma tissue arrays 0/12 have the same expression profiles compared to controls. Strikingly, 12/12 presented a global increase of expression or a substantial increase of positive cells. A second IHC array set (12 samples) stained with rabbit polyclonal antibody was analyzed confirming this evidence. Consistent with our findings, c-Myc expression in normal glial cells was not detectable by IHC. Rabbit polyclonal antibody targeting the Cterminal portion of c-Myc led to positive staining on 4/11 tumors samples. The mouse mAb gave positive staining in 11/12 astrocytoma samples. 14-3-3f presented strong staining levels in normal glial cell (high staining, strong intensity, percentages between 75%-25%). Three out of 10 array samples presented the same staining patterns as controls. The remaining 7/10 samples showed a prevalence of positive cells of .75%. p53 and EGFR expression patterns were analyzed as internal benchmarks of the robustness of analysis and were shown to possess expected expression profiles and prevalence of expression findings (Table  S7).
Huntingtin, whose mutations are responsible for the neurodegenerative disorders of Huntington's disease, is found in neurites and at synapses, has anti-apoptotic functions and is neuroprotective in brain cells exposed to apoptotic stimuli, such as serum deprivation, mitochondrial toxins or death-inducing genes [52]. Notably, pathogenic Huntingtin affects the expression, redox state, disulfide bonding of antioxidant proteins identified here, among them SODC, and PRDX2, together with PRDXI [53], thus supporting a shared functional link. Taken together, our findings provide first evidence of function of Huntingtin in brain tumors, thus paving novel avenues of investigation on GBM pathophysiology.
HNF4a is a modulator of cell proliferation [54][55][56] through the cell cycle inhibitor p21 [57] and the transmembrane glycoprotein Trop-2 [51]. A tight interplay/feedback loop occurs between HNF4a and c-Myc. Both HNF4a and c-Myc proteins compete for control of the P21/CDKN1A gene transcription [57], and deletion of HNF4A in hepatocellular carcinoma cells results in significant up-regulation of c-Myc and enhanced cell proliferation rates [58]. Essentially no evidence for expression and function of HNF4a in brain tumors was available before this study, again opening novel avenues for investigation on GBM pathophysiology.
Deregulation of MYC is a frequent driver of cancer [59]. c-Myc has been reported to bind a large number of genes [60] and regulates cell proliferation by affecting cell-cycle checkpoint genes, CDK inhibitors and cyclins [61]. c-Myc also plays a major role in regulating metabolic genes required for energy production [62,63] and ribosomal biogenesis. mTORC2 controls glycolytic metabolism by regulating c-Myc cellular levels and ultimately determines overall survival of GBM patients [64]. This evidence is consistent with our GO analysis showing that major targets of our analysis, such as ALDOC, TPIS, IPYR, 6PGL, HCD2, IDH3A, NDUS1, MDHC are involved in metabolism (e.g. glycolysis, inositol metabolism and oxidative phosphorylation) and cancer-related metabolic reprogramming, including the Warburg effect [65][66][67][68][69].
Glioma development is frequently associated with mutations of the isocitrate dehydrogenase IDH1 and IDH2 genes [76,77], whereas mutations of IDH3 have never been observed in GBM [78]. Our analysis discovered IDH3A quantitative variations in low-grade samples versus high-grade tumors. This provided support for a novel model of interference of IDH proteins in GBM progression, through differential expression of a wild-type protein. Of interest, IDH3A was found to be a specific target for p53-dependent phosphorylation [79], further supporting the functional relevance of the GBM control module.
Remarkably, three of the four hubs of the GBM control module (Huntingtin, HNF4a, c-Myc) are transcription factors. Transcription factor network analysis then highlighted all three of them as major regulators of the expression of most proteins of the GBM control module, supporting a joint driving role in GBM development. A functional role of the newly discovered four-hub control module in GBM appearance and progression further required vast, coordinate expression in tumor cases.
Strikingly, analysis of the transcription factors steering the GBM control module led to the discovery that these tightly interrelate with p53. Notably, p53 can regulate Huntingtin's expression at the transcriptional level [80], thus suggesting a cooperation of these signaling pathways not only in neurological diseases, but also in development and progression of brain tumors. p53 plays a critical role as modulator of the HNF4a/c-Myc feedback loop, since it binds c-Myc [81] as well as HNF4a [82], and inhibits the activity of HNF4a via recruitment of histone deacetylase [82]. Additionally, ATM-dependent activation of p53 involves dephosphorylation and association with 14-3-3 [83].
GBM type II are linked to mutations of TP53, whereas GBM type I are thought to be driven by EGFR amplification/ disregulation [5,7]. Notably, overexpression of Huntingtin interacting protein 1 (HIP1) has been shown to correlate with increased EGFR levels [84]. HIP1 physically associates with EGFR and maintains its levels in brain tumors [84]. It was recently reported that EGFR induces expression of the oncogenic miRNA miR-7 through a Ras/ERK/Myc pathway, and that c-Myc binds to the miR-7 promoter, enhancing its activity [85]. The 14-3-3f protein was reported to directly bind EGFR upon stimulation with EGF [86]. Recently, downregulation of PEPB1/RKIP (Raf kinase inhibitor protein) was shown to be associated with poor outcome and malignant progression [87]. PEPB1 inhibits RTKs signaling blocking Raf/MEK/ERK cascade. We found PEPB1/RKIP downregulated in low-grade and high-grade tumors, extending previous indications [20]. Taken together, our findings indicate deep intertwining of the GBM control module also with EGFR signaling pathway.
In summary, using multitiered proteomic profiling, we discovered previously unidentified hubs (centered on Huntingtin, HNF4a, c-Myc and 14-3-3f; these then attract p53 and EGFR) as major signaling drivers in the pathogenesis of brain tumors. Our findings thus support the unexpected existence of a unique GBM control module, helping providing a much needed unifying model for GBM appearance and progression. Future studies should be undertaken to validate a diagnostic/prognostic role of the GBM control module. This may also provide better tools for classification and clinical evaluation of GBM, for more effective procedures for tumor diagnosis, prognosis and patients cure.  Table S1 Inter-gel spot volume quantification variance analysis. Three replica gels of a reference protein lysate from high-grade brain tumor were utilized for inter-gel variance analysis. Ten randomly chosen spot volumes were quantified using Image Master Platinum 6.0; basic/automated procedure was compared with operator-guided contour drawing. The bar graph shows the gain-of-signal (expressed as percent of variation) detected using operator-guided contour drawing versus basic/ automated analysis. P = 0.037 (comparison among means in manual vs basic/automated procedures). SD, standard deviation. a : values from basic-automated quantification procedure. b : values from operator-guided quantification procedure. c : values from operator-guided quantification procedure have been normalized on values from basic-automated procedure, and expressed as percent of variation. (XLSX) Table S2 Table S2A: Differentially expressed proteins in tumor samples versus normal brain. 2D-gels (5 Control samples, 10 Low-Grade and 25 High-Grade tumors) were processed and quantified as described. Density values from differentially-expressed protein spots were determined. Density normalization on local landmarking was performed as described in the main text.    Table S4 Validation of proteomic target proteins by immunoblotting analysis. Immunoblot analysis (second column) versus silver normalized density values (first column). Five proteins (,10%) were randomly selected among the 48 differentially expressed proteins and analyzed in tumor samples. Density values from blots were quantified as described in Materials and Methods, as normalized on red Ponceau signal. Silver staining density and Western blot signals were subjected to Spearman's correlation analysis; correlation coefficients (rho) and p-values are reported. Scatter plots for the two variables with confidence ellipses were generated. Representative blots from APOA1, SOD1, LDHB, PRDXII (upper and lower bands) ALDOC are shown. (XLSX) Table S5 Table S5A: Ingenuity Pathway Analysis. The significance values for canonical pathways and other biological functions were calculated using the right-tailed Fisher's exact test by comparing the number of user-specified proteins that participate in a given function or pathway, relative to the total number of occurrences of these proteins in all pathway or functional annotations stored in the Ingenuity pathway knowledge base (IPKB). a : The degree of interaction between differentially expressed markers was compared with that expected by chance. A p-value = 1610 2104 was computed by a hypergeometric test. Table S5B: Supervised pathway analysis. Interaction of EGFR (A) and p53 (B) with network proteins, as determined by IPA analysis. (C) Pathway analysis, as performed by STRING 9.1, of the four major hubs (HTT, HNF4A, Myc, YWHAZ) cross-interacting with p53 and EGFR. Table S5C: Transcription factor pathway analysis. Transcription Factor Analysis, as performed by IPA Upstream Regulator Analysis Tool. Using stringent cut-offs for interaction significance (p value ,0.005); a threshold value for interaction with $5 target proteins was applied. Nine transcription factors (HTT, MYC, HNF4A, TP53, ESRRA, NFE2L2, PPARGC1A, MYCN, ESR1) were shown to modulate 33 out of 48 differentially expressed proteins. Color codes correspond to those of discriminating proteins by PCA and PLS-DA analysis ( Figure 4). Proteins that positively correlate with controls are in red; with low-grade tumors are in blue, with both high-grade and low-grade are in magenta. Correlation of UCHL1 with lowgrade/control group is in yellow. (XLSX) Table S6 Table S6A: Discriminator proteins from PCA/PLS-DA clusters mapping on network hubs -HTT. Table S6B: Discriminator proteins from PCA/PLS-DA clusters mapping on network hubs -HNF4A. Table S6C: Discriminator proteins from PCA/PLS-DA clusters mapping on network hubs -Myc. Table  S6D: Discriminator proteins from PCA/PLS-DA clusters mapping on network hubs -YWHAZ (14-3-3 zeta protein). Protein members from the four discriminator clusters, as defined by PCA and PLS-DA analysis, were mapped on the IPA network (highlighted in blue). (XLSX)  Table S7B: Immunohistochemistry proteome profiling meta-analysi -HNF4A. Representative examples of IHC-stained sections of glioma samples. Specific staining for the HNF4a protein is identifiable as brown spots. Nuclei are counterstained with hematoxylin (in blue). Table  S7C: Immunohistochemistry proteome profiling meta-analysis -c-Myc. Representative examples of IHC-stained sections of glioma samples. Specific staining for the c-Myc protein is identifiable as brown spots. Nuclei are counterstained with hematoxylin (in blue). Table S7D: Immunohistochemistry proteome profiling metaanalysis -YWHAZ (14-3-3f protein). Representative examples of IHC-stained sections of glioma samples. Specific staining for YWHAZ (14-3-3f protein) is identifiable as brown spots. Nuclei are counterstained with hematoxylin (in blue). Table S7E: Immunohistochemistry proteome profiling meta-analysis -EGFR. Representative examples of IHC-stained sections of glioma samples. Specific staining for the EGFR protein is identifiable as brown spots. Nuclei are counterstained with hematoxylin (in blue). Table S7F: Immunohistochemistry proteome profiling metaanalysis -p53. Representative examples of IHC-stained sections of glioma samples. Specific staining for the p53 protein is identifiable as brown spots. Nuclei are counterstained with hematoxylin (in blue).