Graph-theoretical model of global human interactome reveals enhanced long-range communicability in cancer networks

Malignant transformation is known to involve substantial rearrangement of the molecular genetic landscape of the cell. A common approach to analysis of these alterations is a reductionist one and consists of finding a compact set of differentially expressed genes or associated signaling pathways. However, due to intrinsic tumor heterogeneity and tissue specificity, biomarkers defined by a small number of genes/pathways exhibit substantial variability. As an alternative to compact differential signatures, global features of genetic cell machinery are conceivable. Global network descriptors suggested in previous works are, however, known to potentially be biased by overrepresentation of interactions between frequently studied genes-proteins. Here, we construct a cellular network of 74538 directional and differential gene expression weighted protein-protein and gene regulatory interactions, and perform graph-theoretical analysis of global human interactome using a novel, degree-independent feature—the normalized total communicability (NTC). We apply this framework to assess differences in total information flow between different cancer (BRCA/COAD/GBM) and non-cancer interactomes. Our experimental results reveal that different cancer interactomes are characterized by significant enhancement of long-range NTC, which arises from circulation of information flow within robustly organized gene subnetworks. Although enhancement of NTC emerges in different cancer types from different genomic profiles, we identified a subset of 90 common genes that are related to elevated NTC in all studied tumors. Our ontological analysis shows that these genes are associated with enhanced cell division, DNA replication, stress response, and other cellular functions and processes typically upregulated in cancer. We conclude that enhancement of long-range NTC manifested in the correlated activity of genes whose tight coordination is required for survival and proliferation of all tumor cells, and, thus, can be seen as a graph-theoretical equivalent to some hallmarks of cancer. The computational framework for differential network analysis presented herein is of potential interest for a wide range of network perturbation problems given by single or multiple gene-protein activation-inhibition.

Introduction on statistical outliers, absolute values of gene scores are subsequently substituted by a uniform pattern of average gene expression ranging in λ i 2 [−6.5, 6.5]. This transformation has the effect that genes with the same rank in a λ i -sorted list become equal weights in different cancer and non-cancer samples: l 1 > l 2 > :: 0 :: > l NÀ 1 > l N ; Sorted lists of rank-normalized gene weights for all tumor/norm, norm/tumor (i.e., reversely weighted tumor/norm lists) as well as randomized data are in S2 Table. Network topology compilation Network topology is compiled on the basis of directed pairwise protein-protein and gene regulatory interactions by integration of open-source data provided with STRING (string-db.org), MSigDB (software.broadinstitute.org/gsea/msigdb) and PATHWAYCOMMONS (www.pathwaycommons.org). The complete list of 74538 directed pairwise interactions is in S3 Table. Network communicability: plausibility considerations First, we want to define a plausible measure for quantification of total information flow (communicability) along a single linear pathway. This should be done in such a way that the absence or malfunction of one single pathway link (i.e., network edge) results in interruption or significant impairment of the entire pathway communicability, see Fig 1(a). This intuitively comprehensible constraint is considered by a measure that is defined as a product of all edge weights E i , i.e., where E i ! 0 are positive numbers whose values indicate working (conducting) or non-working (non-conducting) state of the i-th pathway link. In the case of unweighted networks, conducting and non-conducting states of pathway links are described by binary weights of network edges E i = 1 and E i = 0. In weighted networks, weights of network edges are positive floating-point numbers E i > 0. If two network nodes are connected by multiple pathways, total information flow should not critically depend on the state of a single pathway link or even one single pathway, see Fig 1(b). Consequently, total communicability between each two network nodes can be defined by a sum of all single pathway communicabilities: Another plausible requirement on the network communicability measure is that intensity of information flow should decline with increasing distance from the source. Consequently, Eq (3) can be extended to where ω(N j ) > 1 denotes a pathway length dependent weighting factor.

Normalized total communicability
In graph theory, the total number of walks of the length n joining nodes of an arbitrary complex network is calculated as the n-power A n of the graph-representing, sparse adjacency matrix A, see Fig 2. In fact, one can show that Eq (3) is formally identical to the A n compilation rule from the entries of A. In turn, the weighted version of our plausibly derived communicability measure (Eq (4)) naturally emerges within the concept of the adjacency matrix exponential e A . Following [40,41], the total communicability C ij (n) between a pair of network nodes (i, j) joined by all possible walks of the maximum length n is calculated as the exponential of  the adjacency matrix A ij computed up to the n-power term of the e A ij series expansion: where I is the identity matrix. In simple terms, C ij (n) represents a n!-weighted sum of all walks (pathways) of the lengths 1, 2, 3‥n joining a pair of network nodes with indices (i, j). In this study, the matrix exponential is calculated for n 7 using sparse matrix multiplication algorithms as available with the CSPARSE package [42]. Thereby, computational costs for iterative compilation of a C ij (n = 7) matrix with 7018 diagonal elements on a Intel Core i5-4590 powered PC amount roughly one hour. Unweighted adjacency matrices indicate existence A ij = 1 of connections between each two nodes (i, j), but they do not consider intensity of their interactions. In order to account for Example of network communicability calculus using adjacency matrices. Given initial adjacency matrices of directional weighted (W) and unweighted (A) pairwise interactions between network nodes, all weighted and unweighted walks of the length n > 1 are calculated from the n-power matrices W n and A n , respectively. Note that the shortest possible loop in a directed graph has the length n = 3. Consequently, diagonal entries of W 3 and A 3 contain non-zero values, while the n = 1, 2 power matrices are hollow.
doi:10.1371/journal.pone.0170953.g002 biologically relevant differences in strength of network interconnections, weighted adjacency matrices are required. Here, we make strength of network edges dependent on differential gene expression. The basic idea consists of constructing a matrix of differential gene expression weighted interactions which entries are positive numbers W ij ! 0 that have the following interpretation W ij < 1 interaction between ði; jÞ is downregulated; Since interactions between neighbor nodes are defined on network edges, we want to define a mapping function which maps differential gene expression of each two neighbor nodes on their interlinking edge. Furthermore, this mapping should consider that communicability of a single network link is critically dependent on intensities of interlinked nodes. The above requirements are met by the following mapping function which is further termed as the minimum metric (shortly, min-metric): In addition, we introduce an alternative average metric (shortly, avg-metric) that allows to account for a global trend of gene regulation in the entire pathway. In analogy to Eq (5), the matrix of differential gene expression weighted communicability G ij is defined as Finally, in order to avoid artificial overweighting of well-studied genes with high number of of interacting neighbors, we introduce the normalized total communicability (NTC) matrix D ij (n) where non-zero entries are defined by the ratio of differential gene expression weighted to unweighted communicability: Consequently, entries of D ij (n) matrices indicate relative changes in total information flow between each the two network nodes (i, j) joined by pathways of the maximum length n independently on the total number of these pathways.

Results
Starting from 74538 directed interactions between 7018 network nodes, NTC matrices of multistep pathways are computed iteratively as described above (Eqs (5)-(10)). Complete lists of weighted and unweighted pairwise interactions (i.e., 1st order adjacency matrices) for tumor/ norm, norm/tumor and 'random expression' samples are in S3 Table. With increasing pathway lengths, communicability matrices become densely populated. As shown in Fig 3, the occupancy of communicability matrices (i.e., the ratio of non-zero matrix entries to the dimension of the fully occupied matrix 7018 2 ) displays a particularly rapid increase from 0.15% to 53% at n = 4 and saturates around 70%. This means that the majority of network nodes are interconnected via n ! 4 distant pathways.
Differences between cancer and non-cancer interactomes are first studied using the more restrictive min-metric (Eq (7)). For this purpose, seven NTC matrices D ij (n = 1 − 7) are computed for BRCA/COAD/GBM cancer interactomes. To analyze dependency of NTC on gene scoring (i.e., gene expression), complementary 'non-cancer' NTC matrices are assembled by resorting the gene lists in reverse or random orders, i.e., Reverse : l i $ l NÀ i ; where 1 α 6 ¼ β N are two unequal random indices of differentially expressed genes in sorted BRCA/COAD/GBM gene lists.
To assess global differences between cancer and non-cancer interactomes, the average of all D ij (n) entries as well as the fraction of enhanced communicability r(n) as a function of the maximum pathway length (n = 1 − 7) are computed: where (k, l) and (i, j) denote indices of elevated (i.e., D kl (n) > 1) and all NTC entries, respectively. Fig 4 shows plots of " DðnÞ and r(n) for cancer and non-cancer networks. Remarkably, the average NTC in the range of n ! 4 distant pathways exhibits a persistent increase only in cancer interactomes. In contrast, average NTC of all non-cancer networks declines with an increasing pathway length. Difference between cancer and non-cancer NTC is also visible in the fraction of elevated NTC as a function of the maximum pathway length r(n), which shows  more rapid growth in cancer than in the reference non-cancer networks. Similar patterns of elevated long-range NTC in cancer interactomes are also observed when using the avg-metric (Eq (8)). To compare NTC of cancer and non-cancer networks simultaneously in both metrics, diagonal (θ) and cumulative off-diagonal (ξ) communicabilities in min-and avg-metric are computed as follows To determine whether elevated communicability of different cancer interactomes arises from enhancement of common genes, the impact of simulated gene inhibition on the above statistical features of NTC matrices (Eqs (12)-(13)) is simulated. For this purpose, a cut set of common BRCA/COAD/GBM genes with high NTC in both min-and avg-metric is computed using an iterative procedure, as shown in Fig 6. Starting with the initial set of 530 common BRCA/COAD/GBM genes, the smallest subset of 90 genes is identified whose simulated inhibition is sufficient to decrease the difference between NTC of cancer and non-cancer interactomes, see  Table. In addition to common genes, there are cancer-type specific genes with high NTC that appear to group in separate clusters in min-avg diagrams. 76 COAD specific genes, indicated in Fig 5 with green labels, build a prominent cluster with particularly high NTC values in avg-metric. These 76 genes are enriched in Hedgehog, Hippo and Wnt pathways which are known to promote Epitelial-to-Mesenchymal Transition (EMT) and metastatic cell transformation [43], see Fig 8 and S5 Table. Since our above simulations indicate a high level of robustness of cancer networks with respect to inhibition of a relatively small number of genes, we are interested in assessing the inhibitory effects of other prominent gene signatures. For this purpose, we examine a list of 33 genes with a high differential entropy in bladder cancer highlighted in [31]. Remarkably, 12 of 33 bladder cancer genes from [31] are also present in our list of 90 common BRCA/COAD/ GBM genes. According to the hypergeometric test HGT(7018, 90, 33, 12) = 2.6e-15, this overlap is statistically significant, see S6 Table. However, simulated inhibition of these 33 genes turned out to not be sufficient for the suppression of elevated NTC. Fig 9 shows the results of simulated inhibition 33 bladder cancer genes from [31] and 90 common BRCA/COAD/GBM genes identified in this work.

Discussion
Network-based approaches to mining omics data are increasingly popular. However, consistent modeling of biological networks remains a challenging task and requires the consideration of numerous factors whose impact on simulation results is still controversially debated in the literature. These factors include the role of a particular network topology, directionality and size as well as the choice of appropriate gene proximity metrics and numerical scores. In this work, we focused on the construction and evaluation of novel descriptors for measurement of network information flow. We let other issues remain widely unaddressed, assuming that simulation results obtained with different network topologies should be, in general, convergent.
To account for the potential bias of degree-based network features, we introduced a degreeindependent measure of information flow-the normalized total communicability (NTC). NTC relies on a well-known concept of network topology characterization by means of the npower adjacency matrices, whose entries indicate the total number of unweighted walks of the maximum length n between each two network nodes. In our approach, adjacency matrices of unweighted network topology are used for normalization of differential gene expression weighted walks. Consequently, NTC does not explicitly depend on node degrees, but rather serves as an integrative measure of up-or downregulation of all pathways of the maximum length n joining each two network nodes.
Similar to other works, we use public databases on pairwise protein-protein and gene regulatory interactions to compile the basic network topology. However, here we rely on a subset of directed interactions that naturally restrict the emergence of loops to n ! 3 network steps. Our simulation results show that elevated long-range communicability of different cancer networks is largely caused by circulation of information flow within compact subnetworks of tightly interlinked genes. In networks with non-directed interactions, this feature might be missing.
Our simulations indicate a high level of robustness of cancer networks with respect to inhibition of a low number of genes-proteins. Simulated inhibition of a few dozen genes, including a hit list of 33 bladder cancer genes from [31], was not sufficient to suppress elevation of long- range NTC. Despite the fact that elevated NTC arises in different cancer interactomes from heterogeneous gene expression profiles, we identified a subset of 90 common BRCA/COAD/ GBM genes whose simulated inhibition is capable of reducing differences between NTC of cancer and non-cancer interactomes. These genes turn out to be associated with cancer-related ontological categories, including enhanced cell division, DNA replication, elevated energy demand and cellular stress response. We conclude that enhanced NTC reflects correlated Graph-theoretical model reveals enhanced long-range communicability in cancer networks activity of genes whose coordination is required for maintenance of sustained proliferation and replication of all tumor cells. In other words, elevated long-range NTC represents a graphtheoretical hallmark of cancer networks. Under the assumption of gradual elevation of NTC in course of cancer development, an abnormal increase of NTC can serve as an early marker of malignant cell transformation. Further investigations of normally proliferating and cancer cells at different stages of disease development are required to prove this assumption and to define reliable diagnostic measures.
While focusing on construction of a feasible graph-theoretical formalism for gene expression weighted network modeling, this work does not go into the discussion of biological mechanisms of cancer network rewiring and regulation. Different biological processes on single gene, chromosome and whole genome level including gene mutations, changes in gene copy number, chromotripsis are known to accompany malignant cell transformation [44]. Consideration of this layer of information will be an important subject for future research.
Finally, our graph-theoretical framework is of potential interest for a broad spectrum of network perturbation problems such as single or multiple gene-protein activation, inhibition or malfunction due to the impact of mutations or interactions with pharmaceutic drugs. Graph-theoretical model reveals enhanced long-range communicability in cancer networks  Table. Inhibition of 33 high-score targets from [31] moderately decreases global communicability features of cancer networks. However, it is obviously not sufficient to suppress their elevation with increasing pathway length. In contrast, simulated inhibition of our 90 targets results in decreasing the difference between NTC features of cancer and non-cancer interactomes.  Table. Overlap between 90 common BRCA/COAD/GBM genes with high NTC (see S4  Table) and 33 genes with the high differential entropy in bladder cancer from [