Figures
Abstract
Uncovering acquired drug resistance mechanisms has garnered considerable attention as drug resistance leads to treatment failure and death in patients with cancer. Although several bioinformatics studies developed various computational methodologies to uncover the drug resistance mechanisms in cancer chemotherapy, most studies were based on individual or differential gene expression analysis. However the single gene-based analysis is not enough, because perturbations in complex molecular networks are involved in anti-cancer drug resistance mechanisms. The main goal of this study is to reveal crucial molecular interplay that plays key roles in mechanism underlying acquired gastric cancer drug resistance. To uncover the mechanism and molecular characteristics of drug resistance, we propose a novel computational strategy that identified the differentially regulated gene networks. Our method measures dissimilarity of networks based on the eigenvalues of the Laplacian matrix. Especially, our strategy determined the networks’ eigenstructure based on sparse eigen loadings, thus, the only crucial features to describe the graph structure are involved in the eigenanalysis without noise disturbance. We incorporated the network biology knowledge into eigenanalysis based on the network-constrained regularization. Therefore, we can achieve a biologically reliable interpretation of the differentially regulated gene network identification. Monte Carlo simulations show the outstanding performances of the proposed methodology for differentially regulated gene network identification. We applied our strategy to gastric cancer drug-resistant-specific molecular interplays and related markers. The identified drug resistance markers are verified through the literature. Our results suggest that the suppression and/or induction of COL4A1, PXDN and TGFBI and their molecular interplays enriched in the Extracellular-related pathways may provide crucial clues to enhance the chemosensitivity of gastric cancer. The developed strategy will be a useful tool to identify phenotype-specific molecular characteristics that can provide essential clues to uncover the complex cancer mechanism.
Citation: Park H, Miyano S (2024) Sparse spectral graph analysis and its application to gastric cancer drug resistance-specific molecular interplays identification. PLoS ONE 19(7): e0305386. https://doi.org/10.1371/journal.pone.0305386
Editor: Maria Ines Fariello Rico, Universidad de la Republica Uruguay: Facultad de Ingeniería, URUGUAY
Received: September 13, 2023; Accepted: May 28, 2024; Published: July 5, 2024
Copyright: © 2024 Park, Miyano. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The third-party datasets used in this study are publicly available from Sanger Genomics of Drug Sensitivity in Cancer (GDSC) dataset (https://www.cancerrxgene.org). The Expression levels (i.e., RMA normalised expression data for cell-lines) are available here (https://www.cancerrxgene.org/gdsc1000/GDSC1000_WebResources/Home.html). The drug sensitivities (i.e., log of IC50 values: GDSC1-dataset) are also available here (https://www.cancerrxgene.org/downloads/bulk_download).
Funding: The work of Heewon Park was supported by NRF (RS-2023-00276559). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Gastric cancer is a leading cause of cancer-related deaths worldwide. Chemotherapy is the standard gastric cancer treatment. However, chemotherapy is not always effective because cancer cells acquire resistance to targeted drugs. Several studies have been conducted to uncover drug resistance mechanisms in cancer chemotherapy in various research fields, including medicine, health sciences, and pharmaceutical sciences. Although many bioinformatics studies attempted to uncover the drug resistance mechanism and related molecular characteristics, most studies were based on individual or differential gene expression (DEG) analysis [1, 2]. However, the complex mechanisms underlying drug resistance in cancer cells cannot be understood by perturbation in a single gene because abnormalities in complex molecular networks cause the mechanisms involved in disease. Therefore, the single gene-based and DEG analysis is insufficient to uncover drug resistance mechanisms and molecular characteristics in cancer cells.
The main aim of this study is to disclose the crucial molecular interplay that is instrumental in the mechanism driving acquired drug resistance in gastric cancer. In order to effectively identify the crucial molecular interplay, we consider drug resistance-specific molecular characteristics identification based on differential gene network (ssDGN) analysis. Recently, several methods have been proposed to differential gene network analysis by comparing graph spectra [1, 3, 4]. The studies identified differential networks based on eigenvalues of adjacency, Laplacian, and normalized Laplacian matrices. However, the eigenanalysis in these studies is based on fully dense loadings and thus suffers from the following drawbacks: “difficulty in interpretation of the eigenanalysis results, because eigenvalues are estimated as linear combinations of all variables” and “the results may be disturbed by noise features included in datasets”. In order to effectively perform for differentially regulated gene network identification, we develop a novel computational strategy for sparse eigenanalysis of graph spectra called Net-sEVD. Our strategy incorporates sparsity into eigen loading estimation based on L1-type regulari zation. Thus, only crucial features to describe the graph structure are involved in the eigenanalysis. Furthermore, the sparse loadings lead to effective interpretation of the eigenanalysis results. To achieve a biologically reliable results, we also incorporated knowledge of network biology that “the genes linked in the networks may have similar biological functions” and “the hub genes linked with many genes are crucial markers to understand biological mechanisms” into the eigen loading estimation [5]. Consequently, our strategy encourages coefficient similarity of the linked genes in the eigenloadings estimation, which leads to the simultaneous selection of the linked genes. Our analysis imposes a relatively small penalty on the hub genes. Thus, hub genes and their neighboring genes have relatively large coefficients and are easily selected. In summary, the spectral graph analysis is based on crucial sub-networks consisting of the hub genes and their neighboring genes. Therefore, we can perform a biologically reliable interpretation of the spectral graph analysis results.
Monte Carlo simulations were conducted to demonstrate the proposed method’s effectiveness. The ssDGN method showed an outstanding performance in identifying differentially regulated gene networks in various scenarios of gene network structures. We applied ssDGN to identify differentially regulated gene networks between gastric cancer-drug-sensitive and -resistant cell lines based on the Sanger Genomics of Drug Sensitivity in Cancer (GDSC) dataset. We then identified gastric cancer drug resistance-specific molecular interplays and related markers. Most of the identified drug resistance markers have strong evidence as biomarkers for gastric cancer and its chemotherapy. Our results with Gene Ontology (GO) term pathway analysis uncovered that “extracellular matrix” (ECM)-related pathways are involved in drug resistance mechanisms, and the COL4A1, PXDN, and TGFBI genes act as drug resistance markers. We suggest that suppression and/or induction of COL4A1, PXDN and TGFBI and their molecular interplays enriched in the extracellular-related pathways may provide crucial clues to enhance gastric cancer’s chemosensitivity.
The remainder of this paper is organized as follows: The methods section introduces the proposed method for sparse spectral graph analysis and ssDGN identification. We discuss simulation studies in the Monte Carlo simulations section. Finally, we describe the results of identifying gastric cancer drug resistance-specific gene networks. Conclusions are provided in the Discussion section.
Methods
We considered a gene regulatory network that is represented by a weighted undirected graph G = (V, E, W), where V = {1, …, p} is the set of vertices corresponding to p genes, E ∈ V × V (i.e., E = (i ∼ j)) is the edge set, and W = (wij), (i, j) ∈ E is the edge weight. The adjacency matrix A(G) is defined as
(1)
The degree of a vertex i is defined as di = ∑j∼i wij, and the degree matrix D is given as the following diagonal matrix
(2)
The Laplacian matrix is given as L0 = D − A(G) and the normalized Laplacian matrix L = D−1/2L0D−1/2 is defined as [3, 5],
(3)
The gene regulatory network can be described by A(G), L0 and L.
Existing method to measure similarity of graph
Preliminaries.
In this section, we discuss the similar matrix and their eigenvalues. The similar matrix is defined as follows (Definition 4.1 of [6]),
Definition 1 The p × p matrices MY and MN are said to be similar matrices if a nonsingular matrix V exists, such that (4)
We then consider the following property for the p × p matrix (Theorem 3.2 (d) of [6]).
Theorem 1 Let MY be an p × p matrix. Then, the eigenvalues of VMYV−1 are the same as the eigenvalues of MY, if V is a nonsigular p × p matrix.
The definition of similar matrix and theorem imply that similar matrices MY and MN have identical eigenvalues.
Graph similarity measure based on Laplacian matrices.
The eigenanalysis has been used to describe network structures, and many methods have been developed to measure the networks’ similarity/dissimilarity based on the eigenvalues of adjacency, Laplacian-, and normalized Laplacian matrices [1, 3, 7]. The weighted graphs GY and GN for phenotypes Y and N are represented by p × p matrices MY and MN (e.g., precision matrix, Laplacian matrix, adjacency matrix, etc.), respectively. Suppose MY and MN are similar matrices. It follows from Theorem 1 that the matrices MY and MN have identical eigenvalues We suppose rank(A(GY)) = rank(A(GY)) = q ≤ p. Wills and Meyer [3] introduced the following spectral distance to measure the dissimilarity of two graphs
(5)
where q is the number of eigenvalues for best rank approximation of A(GY) and A(GN), and λ(A(GY))r and λ(A(GN))r are rth eigenvalues of adjacency matrices of A(GY) and A(GN), respectively.
Although various graph spectra-based strategies have been proposed to measure network similarity, the existing methods are based on fully dense loadings. However, the fully dense loadings lead to the following drawbacks
- The eigenvalues are estimated as linear combinations of all variables, leading to difficulty interpreting the eigenanalysis results.
- The eigenanalysis can be disturbed by noisy features included in datasets.
To settle on the issues, we developed a novel strategy for sparse eigenanalysis to capture gene network structure.
Network constrained sparse spectral graph analysis of the graph laplacian matrices
The spectral (eigenvalue) decomposition of the normalized Laplacian matrix L with rank q is defined as
(6)
where V = [v1, v2, …, vq] and Λ = diag(λ1, λ2, …, λq) with non-negative eigenvalues λ1 ≥ λ2 ≥ … ≥ λq ≥ 0. The optimization problem for estimating eigen vectors V in (5) can also be interpreted by the following principal component analysis,
(7)
where
.
Various methodologies have been developed to estimate sparse loadings. Zou et al. [8] proposed the following sparse PCA
(8)
where γ1, γ2 > 0 are regularization parameters for βr, r = 1, …, q, A = [α1, α2, …, αq], B = [β1, β2, …, βq]. Guo et al. [9] developed the novel sparse PCA based on fused loadings,
(9)
where ρs,t is the sample correlation between sth and tth features in L and sign(⋅) is the sign function. Croux et al. [10] proposed a robust version of the sparse PCA based on the strong measure of variance. A methodology for sparse loading estimation was developed by incorporating sparsity into singular value decomposition [11]. However, the existing methods were developed purely from mathematical and algorithmic points without considering biological knowledge. Thus, biological interpretation of the results cannot be effectively performed.
To effectively and interpretably perform differentially regulated gene network identification, we proposed a novel strategy for sparse eigenanalysis of the normalized Laplacian matrix L, called Net-sEVD, based on network-constrained regularization [5, 12–14],
(10)
where s ∼ t indicates the set of gene pairs which are connected to s in the gene network. In our strategy, the computed Laplacian matrix L in (3) is used to input of the optimization problem in (10) and the eigenloadings B = [β1, β2, …, βq] are regularized by the normalized Laplacian matrix L. The second term of the objective function of our strategy is a quadratic Laplacian penalty, called a Dirichlet energy, that is used for quantifying the smoothness of the graph, and leads to identify important features by measuring their smoothness on the graph structure [15, 16] In our strategy, the quadratic Laplacian penalty used to induce a smooth solution of the parameters βr between neighboring vertices over the gene network. Furthermore, the scaling of the coefficients βr,s respect to the degree (i.e.,
) allows the hub gene with more connections to have larger coefficients βr,s. The use of the quadratic Laplacian penalty enables us to incorporate the following knowledge of network biology into the eigen loading estimation [5].
- The genes linked in the networks may have similar biological functions
Using the network-constrained penalty, we encouraged the linked genes’ similarity of coefficients in the eigenloadings estimation. The penalty term enabled us to locally smooth the network and encourage the simultaneous selection of related features, the large value of wst (i.e., strong edge between sth gene and tth gene) lead to a large amount of penalty to difference of βr,s and βr,t. It implies that the last term of our method leads to connected genes with strong edges have similar βr,s and βr,t (r = 1, …, q) in loading matrix estimation. - The hub genes linked with several genes play critical roles in gene regulation and biological processes [17].
The hub genes are crucial markers for understanding biological mechanisms. Our strategy imposed a relatively small penalty on the hub genes by re-scaling the coefficients with the square root of the degrees ds. Thus, the hub genes and their neighboring genes have relatively large coefficients and are easily selected.
In short, our strategy’s eigenvalue estimation is based on crucial sub-networks consisting of the hub genes and their neighboring genes, which leads to a biologically reliable interpretation of the spectral graph analysis results and crucial markers identification.
Algorithm of Net-sEVD.
To implement Net-sEVD, we adapted the iterative algorithm of the sparse PCA [8]
- Step 1. Initialize
at
by setting it to the eigenvalue decomposition of L.
- Step 2. For given
, the sparse eigenloadings are estimated by solving the following optimization problem for r = 1, …, q,
where
Update
.
- Step 3 For fixed
, compute SVD of
[8]. Update
and
.
- Step 4 Repeat Steps 2–3 until convergence.
- Step 5 Normalize
as
for r = 1, 2, …, q, and update V = [v1, …, vq].
- Step 6 Compute Λs = VTLV
Differential gene regulatory networks identification based on Net-sEVD: ssDGN
Suppose the normalized Laplacian matrices LY and LN that describe the estimated gene networks for p genes, where the networks are estimated by expression levels of phenotypes Y and N. By using the Net-sEVD, we first estimate eigenvalues λs(LY) and λs(LN) of p × p matrices LY and LN, respectively. We then measured the dissimilarity of gene regulatory networks based on the following distance ds(GY, GN) of eigenvalues in line with [3]
(11)
where LY and LN are the normalized Laplacian matrices of the graphs GY and GN, and λs(⋅)r is the rth eigenvalue of the normalized Laplacian matrix.
To identify differentially regulated gene networks, we considered the permutation p-value of ds(GY, GN). We draw permutation cell lines of phenotypes Y and N, respectively, and then estimate the permutation gene networks of p genes for the phenotypes Y and N, i.e., GYpm and GNpm. The normalized Laplacian matrices and
are computed. The eigenvalues of the permutation matrices are estimated as
and
. We then computed the permutation p-value of the distance as follows,
(12)
where T is the number of permutations, I(⋅) is an indicator function, ds(GYpm, GNpm) is the distance between eigenvalues of the normalized Laplacian matrices for permutation gene networks GYpm and GNpm, and T is several permutations. The small permutation p-value indicates that networks GY and GN show significantly different gene regulatory structures between two phenotypes, Y and N. The differentially regulated gene network identification was performed by the permutation p-value with significance level α.
Regularization parameter selection
The results of the proposed Net-sEVD heavily relies on the regularization parameters γ1 and γ2. Let and
be the estimates of A and B. We consider the following criterion to select the tuning parameters,
(13)
Then, we select the regularization parameters that minimize (13).
Monte Carlo simulations
We illustrated the performance of the proposed ssDGN based on Monte Carlo simulations. We supposed two phenotypes, A and B, with sample sizes nA = nB = 100. We considered five common and five A-specific sub-networks, where each sub-network comprises ten genes.
In scenario 1, we generated five precision matrices for each common sub-network (i.e., , nw = 1,.., 5) and A-specific networks (i.e.,
, nw = 1,.., 5) from “random” graph structure using huge.generator an R package Huge previously described [18]. The expression levels of each of the ten genes in every five common networks were generated from
, where the number of cell lines is n = 200. For the A-specific networks, the expression levels XA of each of the ten genes comprising each sub-network are generated from
, where the sample size of phenotype A is nA = 100. We then generated the expression levels XB of phenotype B from N(050, Σ50), where Σ50 is the diagonal matrix whose diagonal is σ2. Scenarios 2, 3, and 4 are similar to scenario 1 except that the precision matrices
for nw = 1,.., 5 and
for nw = 1, …, 5 have “cluster,” “scale-free,” and “hub” graph structures for scenarios 2, 3, and 4, respectively. We use significance level α = 0.05 and extract the subnetwork corresponding p.value <0.05. Fig 1 shows the graph structures of the common sub-networks and A-specific sub-networks.
In the differentially regulated gene network identification, we first estimate gene regulatory networks GA and GB based on the generated expression levels of genes for phenotypes A and B (XA and XB), respectively. In the simulation study, we compute the correlation matrices of expression levels of genes XA and XB to describe gene network (i.e., correlation network) for phenotypes A and B.
(14)
where Cov(xi, xj) is a covariance between expression levels of the ith and jth genes, and
is a standard deviation of the expression levels of the ith genes. The absolute values of correlation coefficients are used as edge weights WA and WB,
(15)
We then compute the normalized Laplacian matrices LA and LB in (3). By using the proposed Net-sEVD, we compute eigenvalues of the normalized Laplacian matrices LA and LB and then extract phenotype specific sub-network (i.e., the subject of the proposed method Net-sEVD are the normalized Laplacian matrices LA and LB). We evaluated ssDGN compared with similarity measures based on eigenvalues estimated by fully dense loading (i.e., ordinary EVD: oEVD), sparse loadings (sPCA) [8], sparse loadings (sPCA) without L2 norm (sPCAwoL2), fused sparse loadings (sfPCA) [9] and sign adjusted quadratic Laplacian penalty
(ssDGNs), where Ls = STLS with
. The permutation p-values were computed based on 500 permutation samples, and the simulations were repeated 50 times. To evaluate the performance of differential gene network identification, we compare accuracy, F-measure, true negative rate (TNR), recall, and precision. Tables 1–3 show the average of the results in 50 replications for σ2 = 1, 3 and 5, respectively, where the bold values indicate the best performance among the methods. As shown in Tables 1–3, the proposed ssDGN effectively identified differential regulatory networks for scenarios 1, 3 and 4 in overall. The methods showed similar performance in the viewpoint of Recall (i.e., true positive rate), while our strategy outperformed true negative identification given in the column “TNR”. It can be seen through Tables 1–3 that the proposed ssDGN provides effective performance for differentially regulated gene network identification, overall. We expected that our strategy will be a useful tool to identify phenotype-specific molecular inteplays that can decribed crucial characteristics of cancer mechanism.
Gastric cancer drug resistance specific gene networks identification
We applied ssDGN to identify differentially regulated gene networks between gastric cancer drug-sensitive and -resistant cell lines based on the publicly available Sanger Genomics of Drug Sensitivity in Cancer (GDSC) dataset from the Cancer Genome Project. We considered four anti-cancer drugs, doxorubicin, mitomycin-c, 5-FU, and docetaxel, approved by the Food and Drug Administration (FDA), for gastric cancer. We used the log of IC50 values as the drug sensitivity and the expression levels of 10% of the genes with the highest variance in all cell lines, i.e., 404 genes are considered as candidate regulator and target genes. The cell lines were matched by the expression levels and the drug sensitivity, where 948, 855, 891, and 948 cell lines for doxorubicin, mitomycin-c, 5-FU, and docetaxel were matched, respectively. We then defined drug-sensitive and -resistant cell lines based on 1st (D1st) and 3rd (D3rd) quartiles of drug sensitivity (DS) of each drug, i.e., drug-sensitive cell lines: DSST < D1st and drug-resistant cell lines: DSRS > D3rd.
In GDSC data analysis, the linear regression model to represent gene regulatory system is given as,
(16)
where
, δj = (δj1, …, δj,j−1, δj,j+1, …, δjp)T is the regression coefficient that represents the effect of p candidate regulator genes xi on jth target gene xij and ϵij is a random error vector for the jth target gene. For each jth target gene, we consider the remaining p = 403 genes as candidate regulator genes. The following lasso is used to select true regulator genes and estimate the gene networks. [19],
(17)
where and λ > 0 is the regularization parameters of δj. The weight of edges W = wij is computed based on the effect of the ith gene to jth gene (i.e., δij) and the jth gene to ith gene (i.e., δji) as follows,
(18)
Then, the normalized Laplacian matrix L is computed by the weight of edges W.
In order to identify the differentially regulated gene network, We first estimate drug sensitive and resistance -specific gene regulatory networks NWST and NWRS based on DSST and DSRS. From the two gene networks, we extracted each 1% edges with the largest absolute value of edge size (i.e., and
). For the sub networks consisting of the extracted edges, we identified differentially regulated gene networks between drug-sensitive and -resistant cell lines, where only sub-networks with nodes greater than five were considered. For doxorubicin, mitomycin-c, 5-FU, and docetaxel, 13, 13, 14, and 12 sub-networks were considered for differentially regulated gene network identification. We extracted sub-networks with a p-value smaller than 0.01, and 1, 1, and 2 networks were identified for doxorubicin, 5-FU, and docetaxel, respectively. There was no differentially regulated gene network for mitomycin-c. We then matched the identified 1, 1, and 3 sub networks with
and
, where the matched networks can be considered as drug sensitive and resistance -specific gene regulatory structures.
Fig 2 shows the identified drug sensitive and resistance -specific gene networks, where color of edges indicates drugs, i.e., red: 5-FU, blue: doxorubicin, green: docetaxel. As shown in Fig 2, drug sensitive and resistance cell lines show considerably different molecular interplays, where drug resistance cell lines have relatively active and dense gene regulatory system. It implies that our strategy specify drug response -specific gene regulatory structure. We also perform drug sensitive and resistance networks estimation based on the ordinary sparse PCA (sPCA), because sPCA also show effective performance in simulation studies. The sPCA provide relatively dense networks compared with our method, i.e., 26 (13) and 100 (78) edges are identified by ssDGN and sPCA, respectively, in drug resistance (sensitive) cell lines. It can be seen that all edges identified by our method in drug resistance and sensitive cell lines are also identified by sPCA (see S1 File, tab: sPCA). Although some of edges identified by sPCA are not extracted by our method, the gene networks constructed by our approaches can be considered as crucial molecular interplays for uncovering gastric cancer drug resistance mechanism.
Gastric cancer drug sensitive (left) and resistance (right) -specific gene networks, where the red, blue and green indicate that the edges are extracted from 5-FU, doxorubicin and docetaxel-resistance specific gene networks, respectively.
In order to uncover drug resistance mechanisms of gastric cancer, we focus on the identified drug resistance-specific gene networks. Table 4 shows the identified gastric cancer drug resistance markers ASCL1, NOL4, BTK, ELAVL4, LAPTM5, SCG2 that have more than three edges in the drug resistance-specific networks and their evidences, where the columns “Gastric cancer” and “Gastric cancer drugs” indicate literature related to the mechanism of gastric cancer and anti-cancer drugs uncovered by previous studies, respectively. We also perform gastric cancer resistance marker identification by using sPCA. The sPCA identify the following genes, ASCL1(13), EPS8L3(11), LGALS4(9), HSPA1A(8), NOL4(8), ELAVL4(6), IFIT3(4), SCG2(4), TRIM31(4), AGR2(3), AKR1B10(3), AKR1B10P1(3), AKR1C1(3), BTK(3), C1R(3), C1S(3), IFIT1(3), IFIT2(3), KCNMB4(3), LAPTM5(3), RTN1(3), TMSB15B(3), where the numbers in parentheses indicate the number of edges. All gastric resistance markers identified our method in Table 4 are also identified by the sPCA. It implies that the genes can be considered as crucial markers that are strongly supported by data-driven approaches.
Gastric cancer
ASCL1 expression levels have been considered characteristic of gastric cancer, and ASCL1 overexpression was identified as a signature of gastric tumor [21]. ASCL1 was identified as a differentially expressed gene in most gastric cancer subtypes [20]. High expression levels of SCG2 were also identified in subtype gastric cancer [34, 35]. BTK mutation was amplified in gastric cancer, which plays a critical role in anti-cancer drug resistance [28, 29]. BTK can be a therapeutic gastric cancer target since BTK expression knockdown inhibits the gastric cancer cell growth [24]. The ELAVL4 and LAPTM5 genes were also identified as differentially expressed genes in gastric cancer [31, 32]. Furthermore, SCG2 is reportedly crucial to study therapeutic strategies to treat gastric tumors [34].
Chemotherapy of gastric cancer
Targeted ASLC1 delivery of siRNA and doxorubicin could be a potential strategy for effective neuroendocrine tumor treatment [23]. Ovarian cancer stem-like cells are highly resistant to cisplatin which is attributable to the BTK signaling overexpression [24]. CTN06, a novel BTK and ETK dual inhibitor reportedly re-sensitized cell lines to docetaxel [25]. Docetaxel substantially increased IRF3, BTK, and DDX41 expression levels [26]. BTK inhibitors reportedly synergize with 5-FU to treat drug-resistant TP53-null colon cancers, and using BTK inhibitors in combination with 5-FU is a novel therapeutic approach for colorectal cancer [27]. Additionally, doxorubicin treatment for EBNA2-positive DLBCL cells is effectively complemented with a NF-κB or BTK inhibitor [30]. LAPTM5 was identified as a novel marker to enhance the resistance to docetaxel in breast cancer cells [33]. SCG2 was identified as a prognostic marker for chemotherapy in colorectal cancer by 5-FU [36].
As shown in Table 4, the drug resistance markers identified by our strategy have strong evidence as markers for gastric cancer and chemotherapy. The drug resistance-specific molecular interplays in the right side of Fig 2 may provide crucial clues to uncover the acquired drug resistance mechanisms of gastric cancer. Therefore, suppressing marker activities and their interplays can provide crucial clues to enhance the sensitivity to gastric cancer drugs.
To identify biological processes involved in the identified drug resistance (sensitive) markers, we performed gene enrichment analysis using the bioinformatics tool Database for Annotation, Visualization and Integrated Discovery (DAVID) [37]. DAVID is a web server for functional enrichment analysis and functional annotation of gene lists. GO term annotations are performed on these lists of genes with category “Molecular Function (MF)”, “Cellular Component (CC)” and “Biological Processes (BP)” based on species and databases [38]. We use the 26 (14) identified differentially regulated network in drug resistance (sensitive) cell lines as input for GO term pathway analysis.
Fig 3 shows the significant pathways selected with p.value < 0.05, -log(p.value) and genes comprising each pathway. As shown in Fig 3, the linked genes are enriched in a pathway, e.g., gene COL4A1 and PXDN are involved in “Extracellular matrix structural constituent”, “Extracellular matrix organization” and “Extracellular matrix” pathways. The S100A16 and S100A16 are involved in “Calcium-dependent protein binding” and “Plasma membrane” pathways. Furthermore, the genes consisting the subnetwork in bottom of drug sensitive specific cell lines, i.e., TNFRSF12A, S100A16, AHNAK2, MYOF, S100A13 and TM4SF1 are enriched in ‘Plasma membrane” pathway. The result implies that the genes linked in the networks have similar biological functions, and incorporation of the knowledge may provide biologically reliable results. GO term pathway analysis revealed that the gastric drug resistance markers are significantly enriched in the “Extracellular” associated pathway, especially the markers in “Extracellular matrix” (ECM) related pathways. The definition of the enriched “Extracellular” associated pathways is given in “THE GENE ONTOLOGY RESOURCE” (http://geneontology.org/) as follows
- Extracellular matrix:
A structure lying external to one or more cells providing structural support and biochemical or biomechanical cues for cells or tissues. - Extracellular region:
The space external to the outer structure of a cell. This region refers to space outside the plasma membrane for cells without external protective or encapsulating structures. This term covers the host cell environment outside an intracellular parasite. - Extracellular matrix organization:
A process carried out at the cellular level resulting in the assembly, arrangement of constituent parts, or disassembly of an extracellular matrix. - Extracellular matrix structural constituent:
The action of a molecule that contributes to the structural integrity of the extracellular matrix. - Extracellular space:
The part of a multicellular organism outside the cell, usually outside the plasma membranes and occupied by fluid.
Increased ECM stiffness has been considered a key hallmark of cancer because the ECM plays a crucial role in accelerating tumor progression, reducing the cure ratio, and promoting chemotherapy resistance of cancer cell lines [39]. The ECM can hinder chemotherapy effects by acting as a physical barrier, impeding drug access to the tumor [40]. In breast cancer cell lines, increasing matrix stiffness leads to drug resistance of cancer cell lines by impeding drug delivery ability and reducing drug sensitivity [39]. Depleted ECM components were demonstrated as essential features relating to 5-FU beneficial responses in patients with gastric cancer [41]. Enriched ECM components have been recognized as one of the primary reasons causing anti-cancer drug resistance as stiffened ECM impeded the drugs and immune cell infiltration into the tumor [41]. ECM-mediated chemoresistance is caused by a decrease in cell proliferation Keeratichamroen et al. [42], and proteins confer cell adhesion-mediated drug resistance [43]. Gene expression analysis revealed that ECM protein genes COL3A1, COL5A2, and COL15A1 are significantly upregulated in drug-resistant ovarian cancer cells [44].
We consider COL4A1, PXDN, and TGFBI key markers of the “Extracellular” associated pathways.
- COL4A1
COL4A1 has been considered a crucial marker to drive trastuzumab resistance in gastric cancer by performing protein/gene interactions and biological process annotation analyses [45]. COL4A1 shows overexpression in gastric cancer tissues and is upregulated in trastuzumab-resistant gastric cancer cells. - PXDN
The high PXDN expression level has been considered a signature of drug resistance in cancer cells, and it was suggested that patients with a high PXDN expression are more likely to develop drug resistance [46]. - TGFBI
TGFBI and its pathway are well-known drug resistance markers. The loss of TGFBI is reportedly sufficient to induce chemotherapy resistance by paclitaxel [47]. Furthermore, TGFBI knockdown leads to SKOV-3 cells being resistant to paclitaxel, and TGFBI was considered a marker for paclitaxel-resistant ovarian cancer cell line [48]. GFBI hypermethylation was reportedly considerably associated with trastuzumab resistance in HER2+ breast cancer patients [49].
These findings imply that uncovering the COL4A1, PXDN, and TGFBI mechanisms consisting of the ECM-related pathways are crucial to understanding mechanisms underlying acquired gastric cancer drug resistance. We also perform GOterm analysis for genes out of the identified networks given in Fig 2. The bottom (gray) of Fig 3 shows the most enriched 10 pathways of the genes. As shown in Fig 3, “Nucleoplasm”, “Cytosol”, and “Protein binding” are the most enriched pathways for the genes not in the drug sensitive and resistance networks. All pathways for the genes are given in S1 File. It can be seen that the genes in the network of drug sensitive and resistance cell lines are involved in the specific pathways, not in general pathway (bottom of Fig 3).
Although we did not incorporate any biomedical knowledge into the ssDGN identification, our data-driven analysis provided biologically reliable and valuable results to uncover mechanisms involved in gastric cancer drug resistance. Our GO term pathway analysis results strongly suggest that suppression and/or induction of COL4A1, PXDN and TGFBI and their molecular interplays involved in the extracellular-related pathways may provide crucial clues to enhance the chemosensitivity of gastric cancer.
Our strategy can be extended for differential gene network analysis based on various Laplacian regularizations [50–53]. We also performed gastric cancer drug resistance specific gene networks identification by using the sign adjusted quadratic Laplacian penalty , where Ls = STLS with
. The sign adjusted quadratic Laplacian penalty was developed to incorporate the situation that neighboring genes have opposite signs of the coefficients. The method with the sign adjusted quadratic Laplacian penalty identified more subnetworks than our strategy, i.e., doxorubicin: 3 subnetworks, mitomycin-c: 2 subnetworks, 5-FU: 5 subnetworks, docetaxel: 5 subnetworks. The networks of gastric cancer drug resistance cell lines show relatively dense network (i.e., 87 edges) than its of sensitive cell lines (i.e., 58 edges) that is similar to extracted networks by the ssDGN and sPCA. The following genes are drug resistance markers identified by the method based on the sign adjusted quadratic Laplacian penalty, ASCL1 (13), EPS8L3 (11), LGALS4 (9), HSPA1A (8), NOL4 (8), ELAVL4 (6), IFIT3 (4), SCG2 (4), TRIM31 (4), AKR1B10 (3), AKR1B10P1 (3), AKR1C1 (3), BTK (3), C1R (3), C1S (3), HLA-DRB5 (3), IFIT1 (3), IFIT2 (3), LAPTM5 (3), POU2AF1 (3), RTN1 (3), TMSB15B (3), where the numbers in parentheses indicate the number of edges. The all genes identified as the gastric drug resistance markers by the ssDGN, i.e., ASCL1, NOL4, BTK, ELAVL4, LAPTM5, SCG2, are also identified by this method. All edges identified by the method based on the sign adjusted quadratic Laplacian penalty are given in the S1 File (tab: signAD).
Discussion
In order to uncover crucial molecular interplay that plays key roles in the mechanism of gastic cancer drug resistance, we developed a novel strategy for differentially regulated gene network identification based on sparse spectral graph analysis. The proposed ssDGN estimates the eigenvalue of graph Laplacian matrices based on sparse eigen loading. Thus, we can effectively perform differential gene network identification without disturbance of noise features. The sparse eigenanalysis also leads to interpretable results; for example, we can identify crucial features to describe graph structure. In this study, we considered computational efficiency and biological interpretability and incorporated network biology knowledge into the eigenanalysis of the graph structure. Therefore, our strategy can provide biologically reliable and interpretable results.
We performed Monte Carlo simulations to illustrate the proposed strategy. The simulation results showed that the proposed ssDGN provides outstanding differential gene network identification performance. We applied our strategy to identify differentially regulated gene networks between gastric cancer drug-sensitive and -resistant cell lines. For the gastric cancer drugs, doxorubicin, Mitomycin-C, 5-FU, and Docetaxel, we estimated each drug’s sensitivity- and resistance-specific gene network and then identified the differentially regulated sub-networks between two phenotypes. The identified gastric drug resistance-specific molecular interplays and markers have strong evidence for mechanisms related to gastric cancer and its chemotherapy. The GO term pathway analysis results show that ECM-associated pathways and their members, COL4A1, PXDN, and TGFBI, are involved in gastric cancer’s acquired drug resistance mechanism. We suggest that suppression and/or induction of COL4A1, PXDN, and TGFBI, and their molecular interplays involved in the ECM-related pathways may provide crucial clues to enhance the chemosensitivity of gastric cancer.
Although the ssDGN shows effective results for differentially regulated gene network identification, our method cannot distinguish the difference between the following two networks G1 = {g1, g2 − g3} and G2 = {g1 − g2, g3} where the edge strength of g1-g2 and g2-g3 are same, because the eigenvalue of two normalized Laplacian matrices are same. This point is obvious limitation of our method. Further work remains to be done towards developing a method that can incorporate not only structure but also coordinates of network.
Supporting information
S2 File. R script for Net-sEVD is available at https://drive.google.com/file/d/1UjiZ8Ex7Fv2bA3hbQz6Y9YeRFyEsnuZC/view?usp=sharing.
https://doi.org/10.1371/journal.pone.0305386.s002
(TXT)
Acknowledgments
This research used computational resources of the Super Computer System, Human Genome Center, Institute of Medical Science, University of Tokyo.
References
- 1.
Lutzeyerm JF, Walden AT (2017) Comparing Graph Spectra of Adjacency and Laplacian Matrices. Preprint at https://arxiv.org/abs/1712.03769.
- 2. Yu D, Lim J, Wang X, Liang F, Xiao G (2017) Enhanced construction of gene regulatory networks using hub gene information. BMC Bioinformatics. 18, 186. pmid:28335719
- 3. Wills P, Meyer FG (2020) Metrics for graph comparison: A practitioner’s guide. PLoS One. 15(2):e0228728. pmid:32050004
- 4.
Mieghem P (2010) Graph Spectra for Complex Networks. Cambridge University Press.
- 5. Li C, Li H (2008) Network-constrained regularization and variable selection for analysis of genomic data. Bioinformatics. 24(9): 1175–1182. pmid:18310618
- 6.
James RS (2016) Matrix Analysis for Statistics, 3rd Edition. Wiley, Network.
- 7. Gera R, Alonso L, Crawford B, House J, Mendez-Bermudez JA, et al. (2018) Identifying network structure similarity using spectral graph theory. Appl Netw Sci 3(1):2. pmid:30839726
- 8. Zou H, Hastie T, Tibshirani R (2006) Sparse principal component analysis. J Comp Grap Stat 15: 265–286
- 9. Guo FJ, James G, Levina E, Michailidis G, Zhu J (2010) Principal component analysis with sparse fused loadings. J Comp Grap Stat 19: 930–946. pmid:25878487
- 10. Croux C, Filzmoser P, Fritz H (2018) Robust Sparse Principal Component Analysis. Technometrics 55: 202–214.
- 11. Shen H, Huang JZ (2008) Sparse principal component analysis via regularized low rank matrix approximation. J Mult Anal 99(6): 1015–1034.
- 12.
Xia Z, Zhou X, Chen W, Chang C (2011) A Graph-based Elastic Net for Variable Selection and Module Identification for Genomic Data Analysis. IEEE Inter Conf Bioinf Biomed (BIBM): 357-362.
- 13. Iuliano A, Occhipinti A, Angelini C, De Feis I, Lió P (2016) Cancer Markers Selection Using Network-Based Cox Regression: A Methodological and Computational Practice. Front Physiol. 7:208. pmid:27378931
- 14. Avey S, Mohanty S, Wilson J, Zapata H, Joshi SR et al. (2017) Multiple network-constrained regressions expand insights into influenza vaccination responses. Bioinformatics. 33(14):i208–i216. pmid:28881994
- 15. Kumar M, Sharma A, Kumar S (2023) A Unified Framework for Optimization-Based Graph Coarsening. J Mach Learn Res 24:1–50.
- 16.
Xu L, Chen L, Wang R, Nie F, Li X (2023) Joint Feature and Differentiable k-NN Graph Learning using Dirichlet Energy. arXiv:2305.12396.
- 17. Yu T, Xia Q, Gong T, Wang J, Zhong D (2020) Molecular mechanism of acquired drug resistance in the EGFR-TKI resistant cell line HCC827-TR. Thorac Cancer 11(5): 1129–1138. pmid:32163227
- 18. He H, Cao S, Zhang JG, Shen H, Wang YP, Deng HW (2019) A Statistical Test for Differential Network Analysis Based on Inference of Gaussian Graphical Model. Sci Rep 9(1):10863. pmid:31350445
- 19. Tibshirani R (1996) Regression shrinkage and selection via the lasso. J Roy Stat Soc Ser B 58: 267–288.
- 20. Wang CY, Shahi P, Huang JT, Phan NN, Sun Z, et al. (2017) Systematic analysis of the achaete-scute complex-like gene signature in clinical cancer patients. Mol Clin Oncol 6(1): 7–18. pmid:28123722
- 21. Nakakura EK, Sriuranpong VR, Kunnimalaiyaan M, Hsiao EC, Schuebel KE, et al. (2005) Regulation of neuroendocrine differentiation in gastrointestinal carcinoid tumor cells by notch signaling. J Clin Endocrinol Metab 90(7): 4350–4356. pmid:15870121
- 22. Xiao Y, Jaskula-Sztul R, Javadi A, Xu W, Eide J, et al. (2012) Co-delivery of doxorubicin and siRNA using octreotide-conjugated gold nanorods for targeted neuroendocrine cancer therapy. Nanoscale 4(22):7185–7193. pmid:23070403
- 23. Babu A, Munshi A, Ramesh R (2017) Combinatorial therapeutic approaches with RNAi and anti-cancer drugs using nanodrug delivery systems. Drug Dev Ind Pharm 43(9): 1391–1401. pmid:28523942
- 24. Wang JD, Chen XY, Ji KW, Tao F (2016) Targeting Btk with ibrutinib inhibit gastric carcinoma cells growth. Am J Transl Res 8(7): 3003–3012. pmid:27508020
- 25. Chandrasekar T, Yang JC, Gao AC, Evans CP (2015) Mechanisms of resistance in castration-resistant prostate cancer (CRPC). Transl Androl Urol 4(3): 365–380. pmid:26814148
- 26. Salimi S, Rezaei M, Mousavi Z, Atabakhshian R, PourIran R, et al. (2020) Docetaxel Enhances the Expression of STING Protein in PC3 Cells, and cGAMP Attenuates this Effect. Nov Biom 8:4: 196–204.
- 27. Lavitrano M, Ianzano L, Bonomo S, Cialdella A, Cerrito MG, et al. (2020) BTK inhibitors synergise with 5-FU to treat drug-resistant TP53-null colon cancers. J Pathol 250(2): 134–147. pmid:31518438
- 28. Berry JD, Boese DJ, Law DK, Zollinger WD, Tsang RS (2005) Molecular analysis of monoclonal antibodies to group variant capsular polysaccharide of Neisseria meningitidis: recurrent heavy chains and alternative light chain partners. Mol Immunol 2(3): 335–344. pmid:15589322
- 29. Zhu H, Sha Y, Miao Y, Qin S, Wu W, Xia Y, et al. (2022) Novel BTK Mutations Conferring Resistance to the Second-Generation, Irreversible BTK Inhibitor Orelabrutinib (ICP-022). Blood 140 (Supplement 1): 4117–4118.
- 30. Kim JH, Kim WS, Hong JY, Ryu KJ, Kim SJ, et al. (2017) Epstein-Barr virus EBNA2 directs doxorubicin resistance of B cell lymphoma through CCL3 and CCL4-mediated activation of NF-κB and Btk. Oncotarget 8(3): 5361–5370. pmid:28036258
- 31. Huang YK, Yu JC (2015) Circulating microRNAs and long non-coding RNAs in gastric cancer diagnosis: An update and review. World J Gastroenterol 21(34): 9863–9886. pmid:26379393
- 32. Yang W, Niu L, Zhao X, Duan L, Wang X, et al. (2022) Pyroptosis impacts the prognosis and treatment response in gastric cancer via immune system modulation. Am J Cancer Res 12(4): 1511–1534. pmid:35530274
- 33. Meng Q, Zhou L, Liang H, Hu A, Zhou H, et al. (2022) Spine+specific downregulation of LAPTM5 expression promotes the progression and spinal metastasis of estrogen receptor-positive breast cancer by activating glutamine-dependent mTOR signaling. Int J Oncol 60(4):47. pmid:35294039
- 34. Ihler F, Vetter EV, Pan J, Kammerer R, Debey-Pascher S, et al. (2012) Expression of a neuroendocrine gene signature in gastric tumor cells from CEA 424-SV40 large T antigen-transgenic mice depends on SV40 large T antigen. PLoS One 7(1):e29846. pmid:22253802
- 35. Bae GE, Kang SH, Kim JS, Kim SH, Kim KH, et al. (2022) Characterization of Poorly Cohesive and Signet Ring Cell Carcinomas and Identification of PTPRM as a Diagnostic Marker. Cancers (Basel) 14(10):2502. pmid:35626106
- 36. Weng S, Liu Z, Ren X, Xu H, Ge X, et al. (2022) SCG2: A Prognostic Marker That Pinpoints Chemotherapy and Immunotherapy in Colorectal Cancer. Front Immunol 13:873871. pmid:35844556
- 37. Dennis G, Sherman BT, Hosack DA, Yang J, Gao W, et al. (2003) DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol 4:P3. pmid:12734009
- 38. Brionne A, Juanchich A, Hennequet-Antier C. ViSEAGO: a Bioconductor package for clustering biological functions using Gene Ontology and semantic similarity. BioData Min 12:16. pmid:31406507
- 39. Kuermanbayi S, Yang Y, Zhao Y, Li Y, Wang L, et al. (2022) In situ monitoring of functional activity of extracellular matrix stiffness-dependent multidrug resistance protein 1 using scanning electrochemical microscopy. Chem Sci. 13(35): 10349–10360. pmid:36277620
- 40. Galindo-Pumarino C, Collado M, Castillo ME, Barquin J, Romio E, et al. (2022) SNAI1-expressing fibroblasts and derived-extracellular matrix as mediators of drug resistance in colorectal cancer patients. Toxicol Appl Pharmacol 450:116171. pmid:35878797
- 41. Dong S, Zhang S, Zhao P, Lin G, Ma X, et al. (2022) A combined analysis of bulk and single-cell sequencing data reveals that depleted extracellular matrix and enhanced immune processes co-contribute to fluorouracil beneficial responses in gastric cancer. Front Immunol 13:999551. pmid:36189263
- 42. Keeratichamroen S, Lirdprapamongkol K, Svasti J. (2018) Mechanism of ECM-induced dormancy and chemoresistance in A549 human lung carcinoma cells. Oncol Rep 39(4): 1765–1774. pmid:29436672
- 43. Yu Q, Xiao W, Sun S, Sohrabi A, Liang J, et al. (2021) Extracellular Matrix Proteins Confer Cell Adhesion-Mediated Drug Resistance Through Integrin αv in Glioblastoma Cells. Front Cell Dev Biol 9, 616580. pmid:33834020
- 44. Gao-Feng X, Ren X (2016) Function of cancer cell-derived extracellular matrix in tumor progression. J Cancer Metastasis Treat 2: 357–364.
- 45. Huang R, Gu W, Sun B, Gao L (2018) Identification of COL4A1 as a potential gene conferring trastuzumab resistance in gastric cancer based on bioinformatics analysis. Mol Med Rep 17(5): 6387–6396. pmid:29512712
- 46. Zhou X, Sun Q, Xu C, Zhou Z, Chen X, et al. (2022) A systematic pan-cancer analysis of PXDN as a potential target for clinical diagnosis and treatment. Front Oncol 12, 952849. pmid:35982948
- 47. Ahmed AA, Mills AD, Ibrahim AEK, Temple J, Blenkiron C, et al. (2007) The extracellular matrix protein TGFBI induces microtubule stabilization and sensitizes ovarian cancers to paclitaxel. Cancer Cell 12(6): 514–527. pmid:18068629
- 48. Hutchinson E (2008) Destabilizing influence. Nature Reviews Cancer 8, 77.
- 49. Palomeras S, Diaz-Lagares A, Vinas G, Setien F, Ferreira HJ, et al. (2019) Epigenetic silencing of TGFBI confers resistance to trastuzumab in human breast cancer. Breast Cancer Res 21(1), 79. pmid:31277676
- 50. Sun H, Lin W, Feng R, Li H (2014) Network-regularized high-dimensional cox regression for analysis of genomic data. Stat Sin 24(3):1433–1459. pmid:26316678
- 51. Bianchi D, Buccini A, Donatelli M, Randazzo AE (2022) Graph laplacian for image deblurring. Elect Trans Numer Anal 55: 169–186.
- 52. Huang J, Ma S, Li H, Zhang CH. The Sparse Laplacian Shrinkage Estimator for High-Dimensional Regression. Ann Stat. 2011;39(4):2021–2046. pmid:22102764
- 53. Tian X, Wang X, Chen J. Network-constrained group lasso for high-dimensional multinomial classification with application to cancer subtype prediction. Cancer Inform. 2015 Jan 12;13(Suppl 6):25–33. pmid:25635165