Cancer Missense Mutations Alter Binding Properties of Proteins and Their Interaction Networks

Many studies have shown that missense mutations might play an important role in carcinogenesis. However, the extent to which cancer mutations might affect biomolecular interactions remains unclear. Here, we map glioblastoma missense mutations on the human protein interactome, model the structures of affected protein complexes and decipher the effect of mutations on protein-protein, protein-nucleic acid and protein-ion binding interfaces. Although some missense mutations over-stabilize protein complexes, we found that the overall effect of mutations is destabilizing, mostly affecting the electrostatic component of binding energy. We also showed that mutations on interfaces resulted in more drastic changes of amino acid physico-chemical properties than mutations occurring outside the interfaces. Analysis of glioblastoma mutations on interfaces allowed us to stratify cancer-related interactions, identify potential driver genes, and propose two dozen additional cancer biomarkers, including those specific to functions of the nervous system. Such an analysis also offered insight into the molecular mechanism of the phenotypic outcomes of mutations, including effects on complex stability, activity, binding and turnover rate. As a result of mutated protein and gene network analysis, we observed that interactions of proteins with mutations mapped on interfaces had higher bottleneck properties compared to interactions with mutations elsewhere on the protein or unaffected interactions. Such observations suggest that genes with mutations directly affecting protein binding properties are preferably located in central network positions and may influence critical nodes and edges in signal transduction networks.


Introduction
Most cancers are characterized by genomic instability which is considered to be one of the important factors driving tumor development [1]. These genetic perturbations potentially lead to abnormal oncogene activation and/or tumor suppressor gene inactivation. According to the concept of ''oncogene addiction'', cancer cells depend on the activity of a single or a few oncogenes for their proliferation and survival [2]. Altered activity of oncogenes and tumor suppressors may be caused by gene amplifications, enhanced or decreased transcription or translation. At the same time, missense mutations might also play a very important role in carcinogenesis [3]. While contributing significantly to tumorigenesis, majority of mutations are considered neutral (i.e. ''passenger'' mutations), and only a few are under positive selection in cancer cells (i.e. ''driver'' mutations) [3,4]. Various methods have been applied to predict the deleterious effects of mutations [5,6], to find positively selected mutants and to distinguish driver from passenger mutations [7,8]. However, their predictive power remains limited, largely depends on the level of evolutionary conservation [9] and the background mutation rate which is difficult to determine for each sample [10]. Moreover, recent results suggest that a large majority of single nucleotide variations predicted to be functionally important are rare (with minor allele frequency less than 0.5%) [11], making such rare disease-associated variants difficult to detect.
Many signaling networks are deregulated in cancer and involve a dense network of protein-protein interactions. Therefore, the characterization of cancer-related protein interaction networks is essential for our understanding of the molecular mechanisms of carcinogenesis. Recently, new strategies were proposed to identify key network modules and driver oncogenes by combining copy number variations, missense mutations and mapping potential oncogenic driver genes onto high-throughput protein-protein interaction networks [12,13,14]. As a result of these studies, novel cancer-related genes and functionally-related gene modules targeted by driver cancer mutations were identified [13,14,15]. Moreover, proteins recognize and bind their specific targets in a highly regular manner and the specificity of these interactions is largely determined by structural and physico-chemical properties of binding interfaces. Recently, structural complexes of diseaseand cancer-related proteins were analyzed [16,17,18,19], showing that disease-related protein complexes have distinct binding properties; in particular, they contain multiple interface patches, enabling interactions with many other proteins [16], and mutations on different patches might have caused pleiotropic disease effects [20]. In addition, many disease mutations are located on protein-protein interfaces [21,22,23], a tendency that is especially pronounced for cancer missense mutations [20]. Such observations generally emphasize the importance of studying the effects of cancer mutations on protein interactions and on their binding interfaces in particular.
Many oncogenes, tumor suppressors and their mutations have been identified as key players in cancer signaling events. However, only a few have been found in different types of cancer simultaneously. Such heterogeneity complicates the identification of key players that provide selective advantages to tumor cells. In our study we utilized a set of mutations derived from glioblastoma patients, allowing us to narrow down the heterogeneity of phenotypic response to better understand genotype-phenotype relationships. Glioblastoma is the most malignant form of brain tumors according to WHO classification [24]. Recently, The Cancer Genome Atlas (TCGA) and other projects provided mutation data of glioblastoma patients on a large scale [25,26]. Eight potential driver genes were identified in glioblastomas, and mapping mutated genes on biochemical pathways indicated several prevalent pathways that contained mutated driver genes [25,26,27]. Specifically, genome alterations that were found in several key pathways were observed to be mutually exclusive to each pathway, pointing to the sufficient selective advantage of these few alterations for cancer cells [25].
Recently, we mapped the human protein interactome using structural complexes which allowed us to decipher the effect of glioblastoma missense mutations on protein-protein, proteinnucleic acid, protein-ion binding interfaces and phosphorylation sites in this study ( Figure 1). Here, we show that mutations on binding interfaces result in more drastic changes of amino acid physico-chemical properties than mutations that cannot be mapped on interfaces. Moreover, we found that mutations on protein-protein interfaces have overall destabilizing effects and mostly affect the electrostatic component of binding energy as well as the topology of protein-protein interaction networks. Importantly, we identify possible driver mutations and genes, some of which are specific to nervous system functioning. We complement our findings by suggesting the molecular mechanisms of the phenotypic effect of mutations.

Cancer Mutations might Affect Phosphorylation Sites
Many proteins that play an important role in cancer may also participate in signaling pathways, typically mediating signals through phosphorylation events. Previously, somatic cancer mutations were shown to potentially cause gain or loss of phosphorylation sites [29]. Therefore, we hypothesized that glioblastoma mutations may also affect phosphorylation sites, potentially disrupting the flow of signals through the loss of sites. We collected 2,825 phosphorylation sites from the PhosphoSite-Plus [30], Phospho.ELM [31] and PHOSIDA [32] databases which were further verified by GPS software [33]. While 94 mutation sites in Ser/Thr/Tyr residues could be potentially phosphorylated, we found that 6 out of 94 sites significantly overlapped with phosphorylation sites (Fisher exact test pvalue = 0.028, Table S1 in File S1). Indeed, phosphorylation may be accompanied by the changes in local site environment or global conformation, lead to protein activation or inactivation and modulate the strength of protein or DNA interactions [34]. Therefore, mutation of a phosphorylation site may result in the loss of these important functional properties, as exemplified by the loss of phosphorylation site Ser 313 in P53 that regulates binding to DNA.

Effect of Glioblastoma Mutations on Protein Binding
We integrated mutated genes in a structurally inferred protein interaction network and estimated the effect of these mutations on such a network. Specifically, we constructed mutant structural models (see Methods) and calculated the differences of binding energies that were caused by the corresponding amino acid substitutions. We found a negative average binding energy difference of DDDG = 22.54 kcal/mol, pointing to an overall destabilizing effect of mutations on protein-protein complexes in glioblastomas ( Figure 2A, Table 1). Furthermore, the electrostatic component of binding energy was shifted towards negative values compared to zero (p-value = 0.007) and compared to the van-der-Waals component (p-value = 0.0013). Meanwhile, the van-der-Waals component itself did not show an overall de-or overstabilizing effect. While several applications have been developed to predict the effect of mutations on protein stability, we compared our results to FoldX, allowing us to observe a significant, although not very high, correlation between the DDDG values of both approaches ( Figure S1B in File S1, Pearson's r P = 0.440.77, pvalue ,0.01). These differences may arise from the fact that FoldX uses an empirical potential calibrated on the set of experimental changes of unfolding energy in the presence of mutations. Furthermore, FoldX is not explicitly trained on disease mutations and binding energy changes and does not account for the mutation induced conformational changes of the protein backbone.
In general, substitutions with amino acids that have similar physico-chemical properties may not drastically alter the stability of a single protein or a complex. We calculated physico-chemical distances between wild-type and substituted residues and compared it with the binding energy difference for all protein complexes and their models (see Methods). The physico-chemical distance was defined as the Euclidean distance using ten different physico-chemical properties of amino acids [28]. As indicated by its corresponding DDDG, the effect of substitutions was statistically significantly correlated with the physico-chemical distance ( Figure  S1B in File S1, Pearson r P = 20.50, p-value = 0.015). Specifically, large distances corresponded to large negative DDDG and vice versa, suggesting that substitutions of amino acids with very different properties are usually destabilizing. In turn, small changes in amino acid properties may result in additional stabilization of complexes. All the data about physico-chemical distances and effects on binding energy are available at ftp://ftp.ncbi.nih.gov/ pub/panch/GBM/.
Such results also prompted us to estimate the potential amplitude of the effect of mutations even if structures or models were unavailable. Therefore, we calculated physico-chemical distances for all 695 mutations from 598 genes. We observed that distributions of physico-chemical distances that referred to amino acid substitutions on all types of interfaces, and protein-protein interfaces in particular, had significantly larger distances compared to non-interface regions ( Figure 2B, p-value = 0.011, Wilcoxon test). For example, we observed that the first peak in Figure   around 0.5 mostly referred to substitutions of aliphatic residues into each other or aliphatic into polar residues with Val-.Met being the most frequent. The substitutions of arginine and cysteine were among the most frequent, had physico-chemical distances of about 141.5 and corresponded to the second peak of the distribution in Figure 2B. In addition, we found that mutations often affected arginine on binding interfaces. Arginine has unique binding properties originating from strong stabilization of its protonated form due to its high pKa. Furthermore, Arg forms salt bridges, strong cation-p interactions and is enriched in binding hot spots [35,36,37].

Interface Analysis Complements Machine-learning Methods and Helps to Decipher Molecular Mechanisms
Several machine-learning methods were recently developed to predict the phenotypic effect of disease mutations on proteins and were successfully used for monogenic diseases [5,6]. Most of these methods utilize evolutionary conservation, residue mutability and accessible surface area as their main predictive features. We predicted the effect for 581 glioblastoma mutations using PolyPhen2, performing quite well in comparison to other prediction methods [38]. Our results showed that PolyPhen predicted 69% of all mutations on interfaces as ''probably damaging'' (Table S2 in File S1, tables at ftp://ftp.ncbi.nih.gov/ pub/panch/GBM/). Such an agreement is noteworthy, given that our protocol is not trained on a known set of disease mutations while methods like PolyPhen do not use interface features in their training. Interestingly, we also found a limited but still significant correlation between the largest absolute value of the energetic effect of mutations on protein binding DDDG (as obtained by our approach) and the corresponding PolyPhen2 score (Spearman rank correlation, r S = 0.5, p-value = 0.03). Since 23% of all mutations that PolyPhen predicted as ''probably damaging'' were located on interfaces, our approach may suggest the possible mechanism of their damaging effect through their impact on protein interactions.
As previously noted, many machine learning methods assessing the effects of mutations erroneously predict a 'benign' effect when the mutation occurs in an evolutionarily non-conserved position or is solvent accessible [9]. Yet, when a protein complex is formed, a mutation which was previously solvent accessible in a monomer (and possibly non-conserved) may be buried on the binding interface and very damaging for protein interactions. We show that 18% of interface mutations were predicted by Polyphen2 as ''benign''; we analyze their possible driver effect and mechanism of action in the next two sections. Therefore, the necessity to develop approaches which complement machine learning methods with more detailed biophysical analyses is evident and should be the subject of future endeavors.

Mechanisms of Effects of Mutations on Protein-protein Interactions
Here, we analyze the effect of mutations on protein-protein complexes and suggest the underlying mechanisms which include inactivation of wild-type enzymatic activity, destabilization of a functional multimeric complex and alteration of the protein turnover rate. All analyzed mutations were predicted to be benign by PolyPhen2. The first case represents the IDH1 R132H mutation potentially inactivating the wild type conversion of isocitrate to a-ketoglutarate (a-KG) and/or resulting in a neoenzymatic activity and production of D-2-hydroxyglutarate [39]. Since IDH1 mutations are heterozygous we first analyzed the heterodimer containing one mutated and one wild type chain. Specifically, we found that heterodimers in the inactive state of IDH1 (PDB code 1T09) were considerably stabilized by 8.6 kcal/ mol. In addition, we performed calculations for a double mutant where both chains contained the R132H mutation, and showed that its inactive dimer is further stabilized by 11.3 kcal/mol. Our results are consistent with former studies suggesting that IDH1 heterodimers are stable with a considerably lowered isocitrate dehydrogenase activity while R132H:R132H homodimers were almost completely inactive [40]. In accordance with other experimental studies, we suggest that such inactive dimer overstabilization might prevent the conformational cooperative movements of dimer subunits required to form the active state [41].
Neuroligins (NLs) are transmembrane proteins on the postsynaptic cell surface and serve as receptors for neurexins that are synaptic cell adhesion proteins on the presynaptic cell surface. Since the formation of proper synapses is crucial for normal brain function we investigated the model of neuroligin2 (NLGN2) based protein-protein interactions, we mapped interaction partners to their corresponding human proteins (step 4a), allowing us to find 160 protein interactions between 150 genes with mutations affecting their interaction interfaces. In step 4b, we compared the structures of the unperturbed wild-type protein and the mutated protein by performing energy minimization calculations and determining binding energy differences. doi:10.1371/journal.pone.0066273.g001 on the neuroligin-1/neurexin-1 beta complex [42] (PDB code 3BIW, 75% identity between NLGN2 and structural template). Earlier it was determined that the synaptogenic activity strongly depended on the formation of stable neuroligin-1 multimers [43]. We observed that glioblastoma mutation E577K was located on the dimer interface of two neuroligin monomers and contributed to a destabilization of this dimer by 1.2 kcal/mol [43]. The third example represents Rad52 playing a critical role in DNA double-strand-break repair. This protein is characterized by a very rapid turnover that is tightly regulated in the cell. Specifically, we observed that mutation R46K was located on the multimeric interface in the model of RAD52 N-terminal half of the protein (pdb code 1KN0) and considerably over-stabilized each dimer in the undecameric complex by 9 kcal/mol. Such a result might suggest that this mutation may considerably affect the Rad52 turnover rate. Indeed, it was previously shown that some mutants extend the half-life of Rad52 and dysregulate their turnover in a cell [44].

Mechanisms of Effects of Mutations on Protein-nucleic Acid and Protein-ion Interfaces
Other than protein-protein interactions, cancer mutations may affect other types of protein interactions as well. Altogether we found 16 and 13 mutations mapped to protein-ion and proteinnucleic acid binding interfaces, respectively. Table 1 shows representative examples with mutations located on binding interfaces and lists candidates for cancer biomarkers. As indicated in Table 1, mutations on five genes that correspond to protein-DNA or protein-ion interactions (BCL11A, ZIK1, ZNF497, ZNF339, and TP53) are located within C2H2-type zinc finger motifs. The zinc ion is essential for the stabilization of the local structure required for DNA binding. The disruption of Zn ion coordination may potentially lead to deregulation of corresponding proteins. Specifically, we found a C62Y substitution in LIM homeobox transcription factor 1 alpha (LMX1A), an important factor for the development of the nervous system. This transcription factor harbors two LIM zinc-binding domains, and the C62Y substitution occurs at one of the zinc-binding cysteine residues in the structure of its homolog LMO-2 and leads to decreased Znbinding by 3.2 kcal/mol (see tables on ftp site) ( Figure 3A). Indeed, a recent study suggested that LMX1A might play a tumor suppressive role and may be targeted for therapeutic intervention in human [45].
Paired box protein PAX9 is another example originating from the transcription factor Pax family that regulates the expression of target genes involved in proliferation, stem-cell self-renewal, resistance to apoptosis and cell migration. PAX9 expression is associated with favorable outcome in several cancers although its role in tumorigenesis is not well understood [46]. We studied the substitution R26W which, according to the crystal structure of its homolog PAX6 (75% identical to PAX9), directly interacts with the DNA molecule [47] (Figure 3B). Although the substitution with tryptophan may have rather drastic consequences for the maintenance of the networks of electrostatic interactions between arginine and DNA phosphates, we did not find considerable differences in protein-DNA binding affinity (DDDG = -0.05 kcal/ mol) although the mutation destabilizes an overall complex by 2.15 kcal/mol.
Cancer mutations may also directly affect enzymatic activity. Being involved in proliferation, differentiation and apoptosis pathways, mitogen-activated protein kinase 9 (MAPK9) blocks the ubiquitination of tumor suppressor p53 leading to an increase of suppressor stability. Similar to other phosphate transferring enzymes, MAPK9 uses magnesium as a cofactor for phosphorylation. We studied G35R substitution in MAPK9 and hypothesized that it might disrupt its tumor suppressor properties. The crystal structure of its homolog MAPK10 (with 85% sequence identity to MAPK9) shows that Gly35 is located at the edge of the ATP binding pocket and participates in an ATP-binding loop [48]. According to FoldX calculations the substitution of glycine into positively charged arginine compromises magnesium cation binding by 1.38 kcal/mol, supporting the deregulation of MAPK9 kinase activity and cancer cell development ( Figure 3C).

Properties of Mutated Interaction Network
Topological network analysis facilitates the interpretation of interaction data and may allow the inference of cellular functions from the underlying proteins [49]. By mapping mutations and corresponding substitutions on protein-protein interfaces using our IBIS structural inference approach [50,51], we identified 160 protein-protein interactions between 150 proteins with mutations that were located directly on binding interfaces (''mutant interactions'', MI). Furthermore, we embedded these interactions in a web of 4,073 interactions between 2,928 human proteins where each interaction was obtained by high-throughput methods as well as confirmed by the IBIS structural inference approach. In such a 'confirmed interaction network' we considered interactions that involved a protein with a mutation anywhere in a protein, allowing us to collect 444 ''all mutant interactions'' (AI). Therefore, the set of MI interactions is a subset of the AI set. To determine the role that ''MI'' and ''AI'' interactions play in a large human interaction network we determined the topological characteristics of such affected interaction networks. While we observed that the confirmed interaction network breaks into many connected components, we carried out our topological investigations on the largest connected component of 1,960 interactions ( Figure 4A). As a measure of clustering around a given interaction, we defined the edge clustering coefficient [52]. Assuming that MI interactions play a critical role in the flow of biological information in an interaction network, we hypothesized that such interactions may not be necessarily clustered but tend to bridge clustered areas. Indeed, we observed that MI and AI interactions generally tend to be placed in less clustered areas compared to the remaining unaffected interactions in Figure 4B (Wilcoxon test, p-value = 5610 28 ). Notably, we also observed a significant shift to lower clustering of MI interactions compared to AI interactions (pvalue = 0.01). A measure of an interaction's centrality in a network is its edge betweenness centrality. Specifically, edge betweenness centrality determines the number of shortest paths through a given edge, therefore corresponding to potential ''bottlenecks''. In the inset of Figure 4B, we show that interactions between proteins with mutations on binding interfaces (MI) had a significantly higher betweenness centrality than interactions involving non-mutant proteins (p-value = 0.005). We also found that such interactions had significantly higher betweenness than interactions between mutant proteins where mutation did not necessarily affect the binding interface (AI) (p-value = 0.01).

Conclusions
Many studies have shown that missense mutations might play a very important role in causing different diseases. However, causal variants and phenotypic effects of these mutations are very difficult to predict, especially for polygenic diseases [53]. Although mutations of monogenic diseases might prefer the core of the protein [54], cancer related mutations exhibit quite a different pattern. Specifically, such mutations are less likely to occur in the protein core and prefer binding interfaces [20,23]. Nevertheless the extent to which mutations might affect biomolecular interactions remains largely unknown. With this goal in mind we addressed the molecular mechanism of carcinogenic effects of glioblastoma mutations. The actual placement of cancer mutations on binding interfaces allowed us to stratify cancer-related interactions and potential driver genes and address the ways such mutations may affect binding and the underlying protein interaction network's topology.
First, we found that overall missense mutations had a significantly destabilizing effect on protein-protein interactions although some mutations over-stabilized protein complexes. This effect was mostly driven by the electrostatic component of binding energy and such observations are consistent with previous investigations, focusing on the effects of OMIM mutations on protein complexes [22]. Indeed, the charge complementarity may determine specific binding while its disruption can be accompanied by the loss of specific interactions. The contribution of a given charged pair of amino acids to the electrostatic component of binding energy depends on the balance of two large terms: desolvation penalty and electrostatic pairwise interactions. While the desolvation penalty of a group mostly depends on its net charge, the pairwise electrostatic interaction energy is also sensitive to the geometry of the side chains. Previous results indicated that electrostatic interactions on protein-protein binding interfaces are almost always favorable [55]. Therefore, amino acid substitutions might result in dramatic changes of the magnitude of the favorable pairwise electrostatic interactions, while having little impact on the desolvation penalty [56].
Furthermore, we calculated changes in physico-chemical properties between wild-type and substituted residues and found that amino acid substitutions on binding interfaces were associated with significantly larger physico-chemical distances compared to non-interface mutations. The most drastic amino acid changes were observed for mutations on protein-protein binding interfaces. Such observations point to the possible damaging effect of many glioblastoma mutations on interfaces. We expect that such mutations are unlikely passenger mutations but rather may drive the disease phenotype. Such an argument appears plausible given that genetic alterations in cancer generally affect signaling pathways, compromising many protein-protein interaction events. Specifically, we detected a group of potential cancer biomarkers, some of them specific to nervous system development, which Furthermore, we indicated all interactions that involved a mutated protein (AI). (B) Calculating edge clustering of MI and AI interactions in the largest component, we observed that interactions affected by a mutation generally tend to appear in less clustered areas. Compared to the remaining interactions, such differences were significant for both MI and AI interactions (p-value = 5610 28 , Wilcoxon test). Comparing MI and AI interactions, we observed a significant shift of MI interactions toward lower clustering (p-value = 0.01). In the inset, we determined edge betweenness of MI and AI interactions as a measure of their centrality in the network. Compared to the remaining interactions, we found that differences between both sets of interactions affected by mutations were statistically significant (p-value = 5610 23 ). Furthermore, MI showed significantly lower betweenness than AI interactions (p-value = 0.01). doi:10.1371/journal.pone.0066273.g004 might be deregulated by mutations affecting protein-protein, protein-nucleic acid and protein-ion binding.
Since many factors influence the topology of protein interaction networks and the role of a given protein in cancer development, a debate about the topological properties of disease and cancerrelated networks has emerged. Although the vast majority of disease genes were found to be nonessential and did not encode hub proteins, cancer related genes, however, suggested an opposite trend [16,57,58]. High betweenness of proteins that translates into being a ''bottleneck'' indicated another important characteristic that distinguished essential and non-essential genes especially in regulatory networks [59]. Hub-bottlenecks, for example, defined as frequently interacting proteins connecting different clustered areas, were indicated as good predictors of gene essentiality [16].
Here we found that interactions involving proteins with mutations generally tend to occur in less clustered areas and are characterized by higher edge betweenness compared to the remaining unaffected interactions from the same network. Importantly, we also show that our stratification of cancer-relevant interactions had a significant impact by focusing on glioblastoma mutations observed directly on binding interfaces (MI). In particular, MI interactions were placed in areas of lower clustering and higher edge betweenness compared to AI interactions that did not account for the mutated position with respect to protein binding. Still, AI interactions had higher bottleneck properties compared to the remaining, unaffected interactions.
Indeed, although oncogenes were reported to have a certain tendency to cluster into a small number of modules and pathways [60,61], other reports did not confirm such characteristics [16]. Here, we show that genes with mutations affecting their binding interfaces were preferably located in central network positions which might influence critical nodes/edges in signal transduction networks mediated by protein-protein interactions. Our observation is consistent with two previous studies, indicating that proteins extensively involved in signal transduction activity, actively sending and receiving signals, are more frequently mutated in cancer [60].

Constructing Human Interactome and Mapping Mutations on Binding Interfaces
We used our recently developed framework to map the human interactome [18]. Selecting the longest protein isoforms of a human query sequence, we retrieved their protein interaction partners and binding sites using the IBIS server (http://www.ncbi. nlm.nih.gov/Structure/ibis/ibis.cgi) [50,51]. IBIS predicts protein interaction partners and provides the locations of their binding sites on a query protein using a set of homologous structural complexes as evidence of an interaction. Along with different types of protein interaction partners (protein, ion, DNA, RNA, peptide, and small molecule), IBIS ensures the biological relevance of binding sites. Utilizing structural complexes, IBIS collects experimentally 'observed' protein interactions if a protein has a certain number of residues that 'contact' its partner. Two residues are considered to be in contact if any of the heavy-atom interatomic distances is less than 6 Å for protein-protein (4 Å for protein-nucleic acid and 3 Å for protein-ion) interaction partners. Such a group of residues that is in contact to an interaction partner is called a ''binding site''. Demanding that the sequence similarity between the query protein and a homologous structural complex is high enough (see Supporting Information for details), homologous complexes were subsequently grouped according to their binding site similarity. In the case of protein-protein interactions we mapped interaction partners from complexes of other organisms to their most similar human proteins that had more than 80% sequence identity and 80% protein sequence coverage. As a result of this procedure we obtained 54,861 protein-protein interactions between 9,265 human proteins, a network we refer to as 'structurally inferred'. In addition, we used IBIS to determine protein-nucleic acids and protein-ion interactions, procedures that did not require additional mapping of the interactions partners to human proteins.
We also pooled 61,240 interactions between 11,446 proteins from high-throughput experiments found in Reactome [62], MINT [63] and HPRD [64]. We confirmed these interactions by using structurally inferred interactions, allowing us to obtain 4,073 interactions between 2,928 human proteins, a network that we refer to as the ''confirmed interaction network''. We compiled missense mutations from genes in glioblastoma multiform patients from two previous studies [25,26] that identified mutations in the tumor sequences that were not present in the reference sequences of each gene. Mutations present in the normal control samples and in single nucleotide polymorphism (SNP) databases were then removed from further analyses. We additionally verified interaction interfaces in complexes using the PISA algorithm. Using chemical thermodynamics, PISA computes a set of macromolecular assemblies that are expected to be stable in solution and are assumed to represent the biological form of a protein in the cell [65]. In the end, we mapped 695 missense mutations from 598 genes on different types of proteins and protein binding interfaces as shown in Figure 1. Altogether 97 mutations from 68 genes were mapped on protein-protein interfaces affecting 160 protein-protein interactions, whereas 16 and 13 mutations were found on proteinion and protein-nucleic acid interfaces respectively (ftp://ftp.ncbi. nih.gov/pub/panch/GBM/). Among those mutations mapped on interfaces, 33% mutations were observed in more than one cancer sample according to the COSMIC database [66] compared to 15% of mutations which could not be mapped on interfaces.

Modeling of Protein Complexes
Based on templates identified in IBIS we built 3D structural models of protein-protein complexes using a homology modeling approach. In particular, we applied the Profix and Nest programs provided by the Jackal package [67] to fix missing atoms or residues of the templates and to build homology models of complexes based on their sequence alignments. The models were submitted to the TINKER package for energy minimization utilizing the Limited Memory BFGS Quasi-Newton Optimization algorithm [68]. Energy minimization was performed with the Amber98 force field. The convergence criterion was set to the root mean-squared (RMS) gradient per atom = 0.01. The Scap program [69] as provided by the Jackal package was applied to computationally generate the corresponding mutant structures using the minimized wild-type models, and the mutations were introduced by side-chain replacements. The wild-type and mutant structures were minimized again using TINKER to assure that both structures were subjected to the same refinement protocol after introducing the mutation. It was previously shown that stateof-the-art algorithms can always build high-quality models for proteins with sequence identity higher than 35,40% to the homologous protein structures [70]. Checking the quality of the models, we removed models with large van-der-Waals clashes based on calculated energies and selected only those models that were based on protein-protein structural complexes with more than 40% identity to both the query human protein and its interaction partner which resulted in 21 high quality models of protein-protein complexes. Protein-DNA and protein-ion com-plexes were modeled using Modeller [71] ''automodel'' function. Mutations were introduced using FoldX ''buildmodel'' module. DNA molecules were treated as non-flexible during the modeling processes.

Binding Energy Calculation
Binding energy was calculated using the rigid body approach described in previous studies [22,72,73]. We assumed that the internal mechanical energies of the (un-)bound monomers remain unchanged so that the energy terms of the unfolded state can be excluded in the binding energy calculation. The total potential energy and its two components, van der Waals energy and electrostatic energy, for protein-protein complexes were computed with the ANALYZE program as provided by the TINKER package [68]. The electrostatic component of total potential energy was defined as the sum of charge-charge interaction energy and continuum solvation energy obtained from the TINKER results. The binding energy was defined as the difference between the potential energy of the dimer and the individual monomers. where DDG(binding: WT) and DDG(binding: MU) were the binding energies of the wild-type (WT) complex and the mutant complex (MU) calculated using equation (1). The difference of total potential energy (DDDG tot ), van der Waals energy (DDDG vdW ) and electrostatic energy (DDDG el ) were calculated and analyzed separately. A negative value of DDDG binding indicated that the mutation decreased the binding affinity, destabilizing the complex. In turn, a positive value of DDDG binding suggested that the mutation stabilized the complex. We also used the ''AnalyseComplex'' function of the FoldX program [74] which calculates the stability of protein complexes using an empirical force field to estimate the effect of mutations on binding. The change in binding energy was approximated by impairing the selected targets, determining the stability of the separated molecules and then subtracting the sum of the individual energies from the global energy of the complex. Binding energy differences for protein-DNA interactions were calculated similarly.
In these cases DG(folding: complex) refers to the stability of the protein-DNA complex and DG(folding: A) refers to the stability of the monomer without DNA. Binding energy of protein-ion interactions were calculated by ''MetalBindingEnergy'' function of FoldX. The best five models (with the highest Modeller molpdf score) were analyzed and the mean values of DDDG calculated. All protein-DNA and protein-ion models were preprocessed by the RepairPDB function of FoldX before any energy calculation to optimize side chain conformation.

Topological Measures of Interaction Networks
As a global measure of a protein-protein interaction's centrality, we calculated edge betweenness, reflecting an edge's appearance in shortest paths through the whole network. In particular, we defined edge betweenness centrality of an interaction v as c B (v)~P

Statistical Analysis
To perform statistical tests we used Splus and R packages. To compare mean values of two distributions (or compare the mean value to zero), we used the Wilcoxon rank test and reported onesided p-values. To test the independence between the rows and columns in a contingency table, we used Fisher's exact test and reported two-sided p-values. We also calculated Pearson and Spearman correlation coefficients to test the null hypothesis about the independence of two variables and reported two-sided pvalues.

Supporting Information
File S1 This file contains Table S1 (Phosphorylation of mutated sites), Table S2 (Prediction of the effect of mutations on proteinprotein and non-interface regions using PolyPhen2), Figure S1 (The change of binding energy) and Text S1 (Mapping of complete interactomes using structural complexes, methodology).