Browse Subject Areas

Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here.

  • Loading metrics

Pathway-Based Evaluation in Early Onset Colorectal Cancer Suggests Focal Adhesion and Immunosuppression along with Epithelial-Mesenchymal Transition

Pathway-Based Evaluation in Early Onset Colorectal Cancer Suggests Focal Adhesion and Immunosuppression along with Epithelial-Mesenchymal Transition

  • Seungyoon Nam, 
  • Taesung Park


Colorectal cancer (CRC) has one of the highest incidences among all cancers. The majority of CRCs are sporadic cancers that occur in individuals without family histories of CRC or inherited mutations. Unfortunately, whole-genome expression studies of sporadic CRCs are limited. A recent study used microarray techniques to identify a predictor gene set indicative of susceptibility to early-onset CRC. However, the molecular mechanisms of the predictor gene set were not fully investigated in the previous study. To understand the functional roles of the predictor gene set, in the present study we applied a subpathway-based statistical model to the microarray data from the previous study and identified mechanisms that are reasonably associated with the predictor gene set. Interestingly, significant subpathways belonging to 2 KEGG pathways (focal adhesion; natural killer cell-mediated cytotoxicity) were found to be involved in the early-onset CRC patients. We also showed that the 2 pathways were functionally involved in the predictor gene set using a text-mining technique. Entry of a single member of the predictor gene set triggered a focal adhesion pathway, which confers anti-apoptosis in the early-onset CRC patients. Furthermore, intensive inspection of the predictor gene set in terms of the 2 pathways suggested that some entries of the predictor gene set were implicated in immunosuppression along with epithelial-mesenchymal transition (EMT) in the early-onset CRC patients. In addition, we compared our subpathway-based statistical model with a gene set-based statistical model, MIT Gene Set Enrichment Analysis (GSEA). Our method showed better performance than GSEA in the sense that our method was more consistent with a well-known cancer-related pathway set. Thus, the biological suggestion generated by our subpathway-based approach seems quite reasonable and warrants a further experimental study on early-onset CRC in terms of dedifferentiation or differentiation, which is underscored in EMT and immunosuppression.


Familial adenomatous polyposis (FAP) and hereditary nonpolyposis colorectal cancer (HNPCC) are autosomal dominant diseases that result from inherited genetic mutations in adenomatous polyposis coli (APC) and mismatch repair genes [1]. However, these diseases account for only 25% of the total number of colorectal cases in the United States in 2010 [2]. The remaining 75% of cancers are reportedly sporadic colorectal cancers (CRCs) without family histories [2] (, for which the mechanism is still not clear [3].

Hong et al. [3] identified 7 highly upregulated genes (CYR61, EGR1, FOSB, FOS, VIP, UCHL1, KRT24) in early onset sporadic CRC patients that were used as a predictor gene set assessed with a microarray technique. For their experiments, normal-appearing mucosa adjacent to tumor was obtained from the CRC patients and normal mucosa was obtained from healthy controls. They also provided a discussion on signaling pathways (MAP kinase (MAPK) signaling, NFAT-immune signaling, hypoxia signaling, insulin signaling, PI3K-AKT signaling, Wnt signaling, G protein-coupled receptor (GPCR) signaling).

In the present study, we further explored the microarray dataset in order to add a potential upstream regulator of some of the enumerated signaling pathways in the early-onset CRC patients assessed in the Hong et al. study [3]. Specifically, we performed advanced statistical analysis to enhance the molecular understanding of the predictor gene set using text-mining and significant subpathways related to the early-onset CRC cases.

Our approach involves public text-mining [4] using a new statistical model that handles regulation (e.g., inhibition, activation) among biological entries, and performs a permutation test for subpathway identification of a given pathway. We first identified statistically significant subpathways related to the early onset CRCs from KEGG pathways [5] with the model, and subsequently used text-mining [4] to confirm literature associations among the predictor gene set and some representative significant subpathways.

Our proposed model suggests that early-onset CRC is involved in subcomponents of the focal adhesion pathway and the natural killer (NK) cell-mediated cytotoxicity pathway. The NK cell-mediated cytotoxicity pathway in particular hints at the presence of immune cells in the early-onset CRC patients, which implies paracrine communication between immune cells (e.g., NK cells, T cells, NK T cells) and various other cells [6]. In addition, our result indicates that the previously reported signaling pathways (Wnt, PI3K-AKT, MAPK) [3] are likely cascaded through their upstream focal adhesion kinase (FAK), [7] which belongs to the focal adhesion pathway. Therefore, FAK [7] may be a valuable therapeutic target candidate for the early-onset CRC predictor gene set diagnosis. Furthermore, our text-mining analysis of the 2 pathways along with the predictor gene set implied that some elements of the predictor gene set are involved in cell survival and epithelial-mesenchymal transition (EMT) [8], [9], [10] through the focal adhesion pathway and immunosuppression [8], [10], [11].



The main concept of our statistical model was to pinpoint statistically significant subpathways whose expression (e.g., microarray) agreed with the regulation information (e.g., activation, inhibition) (Figure 1A) in the KEGG pathway database. Our approach is briefly described here.

Figure 1. Rules and schematic diagram for the statistical model.

A. Rules for matching an edge of two adjacent entities in KEGG pathways with their gene expression changes. Given an edge, gene 1 is called a source node of which the edge goes out, and gene 2 a sink node of which the edge comes in. B. Schematic diagram of the statistical model. Given a subpathway, the longest segment (well-defined subpathway) from the leaf node was identified. A statistic S for the well-defined subpathway was calculated. The null distribution of S was obtained via 1,000,000 sample label permutations and the p-value for the observed S was finally calculated (see Materials and Methods for details). Red ovals are up-regulated in the cancer patients, and green ones down-regulated.

The non-metabolic KEGG pathways were reduced into linear subpathways, as described in the Materials and Methods (Figure 2). In this study, the term “linear subpathway” is used equivalent to “subpathway”. We then selected well-defined subpathways in which the gene expression agreed with the regulation information under the set rules (Figure 1A) as candidates for measuring their statistical significance (see Materials and Methods). A statistic S for each well-defined subpathway was calculated and its significance evaluated by computing the empirical p-value via sample label permutations (Figure 1B).

A total of 90 KEGG pathways were broken down into over 130 million extensive linear subpathways that considered all multiple gene assignments. Among these subpathways, 4,644 well-defined subpathways were identified and their significance evaluated via permutation tests. Subsequently, text-mining association analysis was performed for the selected significant well-defined subpathways; further discussion on their functional roles is provided hereafter.

Significant well-defined subpathways

We performed multiple comparison tests by controlling the false discovery rate (FDR) [12]. The FDR q-values were computed using the p-values by performing 1 million permutation tests (Figure S1). The p-value that corresponded to an FDR of 5% was 0.01386, which yielded 1,289 significant well-defined subpathways. Since we opted not to provide detailed biological interpretation of all of these significant well-defined subpathways, we focused on examination of the top 30% of these well-defined subpathways to provide a more detailed biological description.

The majority of the selected subpathways we discuss belong to 6 KEGG pathways (Figure S2): Focal adhesion (KEGG hsa04510), Pathways in cancer (KEGG hsa05200), NK cell-mediated cytotoxicity (KEGG hsa04650), MAPK signaling pathway (KEGG hsa04010), Wnt signaling pathway (KEGG hsa04310), and Neutrophin signaling pathway (KEGG hsa04722). For the functional discussion and visualization, we mapped functionally interesting well-defined subpathways (Table S1) of the 6 KEGG pathways into KEGG pathway diagrams (Figures 3 and 4; Figures S3, S4, S5, and S6). In particular, we focused on 3 pathways (Focal adhesion, NK cell-mediated cytotoxicity, Pathways in cancer) that had not been explicitly mentioned in the previous Hong et al. study [3]. The gene entries of the well-defined subpathways included in the functional discussion and visualization of the 3 pathways are summarized in Table 1.

Figure 3. Mapping of the entries of the well-defined subpathways into the focal adhesion pathway.

If the fold-change of the cancer patient group over the healthy control group is greater than one the gene is red, otherwise green. See Table 1 and Table S1 for detailed information.

Table 1. The gene entries of the well-defined subpathways used for visualizing the three pathways diagrams (Figures 3 and 4, Figure S3).

Validation of the significant well-defined subpathways

We validated the entries in Table 1 by using an independent MedLine text-mining tool [4], PubGene. The purpose of this was to confirm whether the literature supported direct co-occurrences between the term “colorectal cancer” and the entries in Table 1. We found that 79% of the entries in Table 1 had direct interactions in the PubGene analysis (Table S2). Thus, we concluded that our model results provided a reasonable agreement with the literature examined.

Pathways in cancer (hsa05200)

The KEGG pathway hsa05200 (pathways in cancer, Figure S3) is self-evident. Growth factor signaling, Wnt signaling, and MAPK signaling, which are located in the left part of Figure S3, were activated in the samples from CRC patients. The signals are common driving forces during carcinogenesis [8], [13]. Apparently normal mucosa in the CRC patients has an intrinsic potential for further transformation.

Focal adhesion pathway (hsa04510)

Figure 3 shows the focal adhesion pathway. This result indicates that the bottom part of the pathway is highly involved with the CRC patients, and FAK (PTK2, Table 1) is not only a sink node from its upstream receptors but also a source node toward its downstream signaling transductions (Wnt, PI3K-AKT/PKB, and MAPK signals) for survival. PTEN (Table 1) [14], a tumor suppressor and antagonizer of the PI3K-AKT/PKB signaling pathway, was downregulated in the focal adhesion pathway in the analysis of the CRC patients' samples compared with that of the healthy controls' samples.

Hong et al. [3] suggested that the Wnt signaling pathway is involved in CRC patients. Our result regarding the focal adhesion pathway (Figure 3) supports the view that GSK-3β (GSK3B, Table 1) regulated by PI3K-AKT/PKB signaling of FAK downstream was downregulated in the CRC patients, and also that β-catenin (CTNNB1, Table 1) was highly expressed by downregulation of the Wnt signaling inhibitor GSK-3β in the CRC patients. Upon looking further into the information in Figure S5, we determined that gene expression of various activators and inhibitors related to Wnt signaling activation is consistent with the regulation flows. Another pathway, MAPK signaling (Figure S4) that was upregulated in the CRC patients is also located downstream of FAK (Figure 3).

Since the 3 activated signals (Wnt, PI3K-AKT/PKB, and MAPK signals) are located downstream of FAK belonging to the focal adhesion pathway, FAK [7], [15] may be a therapeutic target for the early-onset CRC predictor gene set diagnosis. Furthermore, because the crucial roles of Wnt, PI3K-AKT/PKB, and MAPK signaling shed light on EMT [9], there has been gradually increasing importance placed on FAK.

Interestingly, CYR61, which was included in the predictor gene set, is a ligand of ITGB5 (αVβ5 integrin, denoted as ITGB in Figure 3), according to the KEGG BRITE database [5]. Figure 3 shows that CYR61 is one of the far upstream cues that triggers FAK, implying that FAK subsequently activates 3 signals: Wnt, PI3K-AKT/PKB, and MAPK signaling. Recently, Wnt, PI3K-AKT/PKB, and MAPK signals were shown to be involved in EMT [9], and apparently normal mucosa in CRC patients could undergo phenotypical transformation by these 3 signals via the CYR61-FAK axis. In other words, some cells in the normal-appearing mucosa in the CRC patients may be close to atypical cells by utilizing EMT. We will describe the evidence of EMT in terms of gene expression level and explore any possible association between the predictor gene set and EMT in the Discussion.

Another finding relevant to the downstream region of the focal adhesion pathway is that the anti-apoptosis protein c-IAP (BIRC3, Table 1) [16], which is a negative regulator of caspases (e.g., CASP3, CASP8, CASP9), was also upregulated in the CRC patients. Thus, we further examined the various downstream c-IAPs (cellular inhibitor of apoptosis) in the focal adhesion pathway, where c-IAPs along with survivin (BIRC5) are also important anti-apoptotic proteins. Of note, it has also been shown that the c-IAPs and survivin also inhibit downstream caspases of both extrinsic and intrinsic apoptotic pathways [16]. We found that the majority of c-IAPs were upregulated in the CRC patients (Table 2). Thus, the focal adhesion pathway may confer inhibition of caspase activity upon the tumorigenesis of potential atypical cells in apparently normal mucosa.

Table 2. The gene expressions of c-IAPs, survivin (BIRC5) and XIAP (BIRC4).

NK cell-mediated cytotoxicity pathway (hsa04650)

Our statistical analysis indicated significant agreement between the gene expression of the CRC patients and part of the immune pathway (hsa04650, Figure 4), which implies the presence of other immune cells as well as NK cells in the CRC patients' specimens.

FAS in the target cells of NK cells and its ligand (FASLG), which is produced by NK cells, were highly expressed in the CRC patients' samples. High FASLG expression in the CRC patients complies with previous clinical observations [6], [17] in which high FASLG expression was correlated with high incidences of metastases and poor survival in colorectal carcinoma patients and in other carcinoma patients.

In the apparently normal mucosa of the CRC patients, various target cells including potentially atypical cells may survive from FASLG-FAS death receptor signaling by escaping either extrinsic or intrinsic apoptotic signaling. In fact, the apoptotic signals were inhibited in the CRC patients because the c-IAPs [16] that inhibited caspases were upregulated in the CRC patients in terms of gene expression (Tables 1 and 2). Another possibility is that FASLG upregulation by target cells, including potentially atypical cells, might initiate fratricide and suicide among the immune cells with FAS beneficial for transformation of potentially atypical cells.

However, the existence of high interferon-gamma (IFN-γ) expression secreted by NK cells or immune cells in CRC patients remains controversial because NK cell cytotoxicity is traditionally believed to control immunosurveillance over cancer and atypical cells. Recently, a significant relationship between anti-tumor immunity and survival of cancer cells has been reported [6], [18]. Furthermore, IFN-γ is known to be involved in immunosurveillance against cancer cells, in multiple phenotypic effects on somatic cells (e.g., cell cycle progression, proliferation, cell differentiation, transformation), and in cancer cell escape [11], [18], [19], [20]. Thus, in the Discussion, we describe other roles of IFN-γ, especially in terms of the way cancer cells or potentially atypical cells in CRC patients could adjust the local immune system via immunosuppression in order to escape from immunosurveillance.

Association among focal adhesion, NK cell-mediated cytotoxicity, and the early-onset CRC predictor gene set

As mentioned in the text above, Hong et al. [3] reported that early-onset susceptibility was attributed to the upregulated gene set called the “predictor gene set” in CRC patients that consists of CYR61, EGR1, FOSB, FOS, VIP, UCHL1, and KRT24. We inspected the associations among the genes listed in Table 1 and the predictor gene set with the text-mining tool, PubGene [4] ( (Figure S7). The input in the tool consisted of the predictor gene set, focal adhesion (FAK, ITGB5), and NK cell-mediated cytotoxicity (INFG, FAS, FASLG). Figure 5 shows an association network for the input genes in CRC. We already mentioned that β-catenin (CTNNB1, Wnt pathway in Figure 5) was regulated by FAK in focal adhesion. The predictor gene set, focal adhesion, and NK cell-mediated cytotoxicity were highly associated with each other in CRC.

Figure 5. The association network of the susceptible gene set from Hong et al. and several representative genes from Table 1.

The three green boxes represent the gene set (CYR61, FOS, FOSB, UCHL1, VIP, EGR1, KRT24), NK cell mediated cytotoxicity (IFNG, FAS, FASLG), and Focal adhesion (PTK2) from left to right. The options used in the network are described in Figure S7. The pale blue-filled boxes represent Mesh ( Diseases terms for the genes. It is noted that ITGB5 associations did not appear in the PubGene result.

Comparison of our method with Gene Set Enrichment Analysis (GSEA) of the Hong et al. dataset

We compared the KEGG pathways containing significant well-defined subpathways identified by our method with those KEGG pathways obtained from the GSEA JAVA web start program (default options with 5,000 permutations). In our method, the significance level (p-value) was 0.05 for the cutoff of the well-defined subpathways. The same p-value of 0.05 was used for the GSEA method. Our method reported 1,966 significant well-defined subpathways that corresponded to 78 KEGG pathways. The GSEA program reported 2 broad types of significant pathway lists: 10 activated pathways and 30 repressed pathways in the CRC patients. The number of overlapping pathways between the 2 methods was 6, which is not surprising when considering the differences between 2 methods. Nevertheless, it is interesting that the 2 methods identified 6 common cancer-associated pathways.

To compare the 78 pathways identified by our method with the 40 pathways identified by GSEA, we used the cancer-related pathways reported by Vogelstein et al. [13] as a gold standard. That is, we inspected which method provided more pathways consistent with the cancer-related pathways identified by Vogelstein et al. The cancer-related pathways from the Vogelstein et al. study were manually mapped to their corresponding KEGG pathways because KEGG pathway identifiers corresponding to the cancer-related pathways were not mentioned explicitly in the study. We then inspected the overlapping pathways between the Vogelstein cancer-related KEGG pathways and those identified by the 2 methods. As shown in Table S3, our method provided more consistent results with the cancer-related pathways identified in Vogelstein et al. than did the GSEA method. Further details on this section are described in the Appendix S1.

Comparison between the pathway substructure of the Hong et al. dataset and that of the other dataset

To determine how closely the pathway substructure of the Hong dataset overlaps with that of an additional colorectal dataset, we searched for an additional colorectal dataset from Gene Expression Omnibus (GEO). Although there are several datasets for CRC, it seems no datasets are available relating to a comparison of early-onset colorectal cancer patients with healthy controls, as is carried out in the Hong et al. study. Fortunately, we found the dataset GSE4183 [21], which compares various colorectal diseases (colorectal carcinoma, colorectal adenoma, inflammatory bowel diseases) with normal controls in a more general setting ( From the GSE4183 dataset, we obtained normal, healthy controls (n = 8) and colorectal carcinomas (n = 15). The GSE4183 dataset was analyzed with our method, which revealed 3,669 well-defined subpathways (identified from ∼130 million subpathways) upon determining their significance based on 100,000 sample permutation tests. Furthermore, the comparison between the GSE4183 dataset and the dataset from Hong et al. (GSE4107) showed that 250 well-defined subpathways overlapped between the 2 datasets. To determine how well these 2 results coincided with each other, we also performed Fisher's exact test based on the randomization model. The p-value from the hypergeometric distribution was less than 2.2e-16, implying that the 2 results coincided well with each other. Thus, we conclude that our finding relevant to Hong's pathway substructure is well supported by our finding from the other independent dataset. Further details on this section are described in the Appendix S1.


Our novel analysis suggests the following results: 1) The subsets from focal adhesion, pathways in cancer, and NK cell-mediated cytotoxicity are highly involved in early-onset sporadic CRC patients; and 2) Surprisingly, the text-mining analysis suggested that the molecular function of the predictor gene set for early-onset sporadic CRCs is associated with focal adhesion and NK cell-mediated cytotoxicity. In the text below, we discuss the potential molecular mechanisms of this association in terms of immunosuppression and EMT.


The recent literature [6], [11], [18], [19], [22], [23], [24] has set up a conceptual framework in which interactions between tumor and immunity are thought to help a number of cancer cells escape from immune-raiding by undergoing the following 3 phases in a linear or mixed manner: elimination (immunosurveillance), equilibrium (tumor dormancy), and escape (immunosuppression). In particular, cancer cell escape by immunosuppression [6], [8], [11], [19], [25], [26] has been extensively studied, and 2 types of immunosuppressive cells are thought to negatively regulate anti-tumor immune response: regulatory T cells (TReg) and myeloid-derived suppressor cells (MDSCs) [8], [11], [27], [28]. We discuss below the roles of IFN-γ, other cytokines, and the predictor gene set in terms of these 2 types of immunosuppressive cells in early-onset CRC patients.

It has been demonstrated previously that IFN-γ can induce activation and expansion of MDSCs in colon cancer [28], and that activated MDSCs not only inhibit effector T cell activity/proliferation but also induce immunosuppressive CD4+CD25+Foxp3+ TReg cells from CD4+CD25 T cells [6], [11], [28]. TReg cells, which also express CTLA-4, PD-1, and PD-L1 on their cell surfaces, positively regulate immunosuppressive cytokines interleukin (IL)-10 and tumor growth factor-beta (TGF-β), which can also induce TReg differentiation [6], [11], [29]. Since TReg cells are found in tumor infiltrating lymphocytes (TILs) in various cancers [6], [11], [29], the apparently normal mucosa in the CRC patients might have TILs present with immunosuppressive activity. ARG1 is also a key metabolic enzyme for MDSCs to negatively regulate lymphocyte functions by consuming or sequestering the amino acid arginine that is critical for T cell function. Thus, we inspected the gene expression levels of the examined genes (CD4, CD25, FOXP3, TGF-β, IFN-γ, IL-10, CTLA-4, PD-1, PD-L1, ARG1) in the CRC patients and confirmed they were all upregulated in the cancer patients (Table 3).

Table 3. The genes involved in immunosuppressive MDSCs and TReg cells in terms of immunosuppression.

To provide gene expression-level evidence of the presence of MDSCs in CRC patients, we inspected (directly or indirectly) several MDSC surface markers: CD11c (ITGAX), CD11b (ITGAM), CD33, CD34, and CD15 [6]. Of note, we examined FUT4 and FUT9 instead of CD15 because CD15 is not a protein but an antigen synthesized by FUT4 and FUT9 [30]. We found that all of the markers except FUT4 were upregulated in the cancer patients (Table 3).

It has been shown previously that cancer cells expressing PD-L1 on their surface secrete immunosuppressive cytokines Galectin-1, IL-6, IL-10, and TGF-β, which can inhibit cytotoxic CD8+ T cells [11]. Furthermore, the cancer cells producing higher levels of indoleamine 2,3-dioxygenase (IDO) can prevent invasion of NK cells and effector T cells by depleting tryptophan essential for T cell function [6], [11]. In the present study, we confirmed a higher expression level of the examined genes (for PD-L1, Galectin-1, IL-6, IL-10, IDO, and TGF-β) in the cancer patients (Table 3).

The gene expression analysis shown in Table 3 suggests that immunosuppressive activity is highly observed in apparently normal mucosa. This finding could provide additional information about a “field change” [31], which refers to proliferation and anti-apoptotic activity in the apparently normal mucosa adjacent to tumor. In other words, anti-apoptosis of the field change could also benefit from immunosuppression by escaping the immune-raid.

To look further for functional refinement of the predictor gene set in terms of immunosuppression, we fed the predictor gene set into PubGene [4] with the MeSH ( term “immunosuppression” (descriptor ID: D007165). The result (data not shown) obtained at the time of manuscript preparation indicated that 4 genes (EGR1, FOS, UCHL1, and VIP) have an association with immunosuppression according to the literature. Based on a review by Ganea et al., which was suggested by PubGene, VIP (a well-known immunoregulatory neuropeptide) inhibits the secretion of proinflammatory cytokines and induces TReg cells [32]. Other recent studies also support the immunosuppressive roles of VIP [33], [34] because VIP relieves collagen-induced arthritis and sarcoidosis by inducing CD4+CD25+Foxp3+ TReg cells from CD4+CD25 T cells. VIP is also involved in immune privileges in the eye by inhibiting T lymphocyte activation and proliferation [35]. Thus, high VIP expression in CRC patients may pinpoint another major immunoregulatory cytokine in our analysis.


We also inspected the expression level of EMT-related genes [9], [36], including matrix proteases, invasion molecules, epithelial/mesenchymal markers, and E-cadherin repressors. We found that the majority of them were upregulated in the cancer patients (Table 4). Therefore, the EMT [9] process can take place in the cancer patients, at least in terms of gene expression. This finding is unexpected in that atypical or precancerous cells could exist even in the normal appearing mucosa by cell morphology changes (e.g., EMT).

To explore the potential roles of the predictor gene set, we input the predictor gene set into PubGene [4] with the MeSH term “Epithelial-Mesenchymal Transition” (descriptor ID: D058750). At the time of the manuscript preparation, 3 genes (EGR1, FOS, CYR61) out of the predictor gene set were found in the literature to have an association with EMT.

In particular, we paid attention to the gene CYR61 because CYR61 is a ligand that can trigger a focal adhesion pathway. Monnier et al. [37] demonstrated that CYR61-αVβ5 integrin-induced metastasis was involved in the tumor bed effect after radiotherapy upon utilizing HCT116 CRC cell derivatives in hypoxic conditions. Additional recent studies on CYR61-driven development of cell motility in pancreatic ductal adenocarcinoma and in gastric epithelial cells [38], [39] indicate that CYR61 is one of the key molecules for EMT that could confer metastatic ability and cell motility to a primary tumor. Thus, CYR61 may be one of the driving molecules for enhancing EMT-related pathways (Wnt and PI3K/AKT signals) [9], [40] in early-onset CRC patients via the CYR61-FAK axis (Figure 3).

Another interesting finding we made upon examining the relationship between EMT and the predictor gene set was VIP, which was recently reported to induce EMT with the stimulation of matrix proteases matrix metalloproteinase (MMP)-2 and MMP-9 in prostate tumorigenesis [41]. We found that gene expression of these 2 proteases was indeed upregulated in the cancer patients (Table 4).

Cytokines commonly involved in both EMT and immunosuppression

Because we found VIP is a cytokine involved in both EMT and immunosuppression, our finding implies paracrine signaling between immune cells and various target cells is involved in both processes. We also found an additional cytokine involved in the 2 processes, in that immunosuppressive TGF-β (TGFB1; Table S5) [22], [29], [36], [42] is a well-known EMT inducer [9], [43]. Indeed, we found that the majority of TGF-βs and their receptors were upregulated in the CRC patients.


Our gene expression data analysis suggests that at least 2 entries (VIP, CYR61) of the predictor gene set are functionally involved in phenotypical EMT induction by focal adhesion downstream (Wnt, PI3K/AKT, MAPK) and immunosuppression (Figure 6). The involvement of EMT in the apparently normal mucosa of the CRC patients suggests that a subpopulation of cells in the mucosa have experienced intrinsic transformation toward atypical or cancerous phenotypes. Furthermore, potential atypical cells may survive against immune cells by utilizing immunosuppressive cytokines (e.g., VIP, TGF-β). Promotion of such an intrinsic survival environment in the apparently normal mucosa is closely aligned with the clinical observation of a field change [31], which refers to proliferation and anti-apoptotic activity in apparently normal mucosa adjacent to tumor. In the process of immunosuppression, the immunosuppressive cytokines VIP (a member of the predictor gene set) and TGF-β may be highly involved in the dynamics between potential atypical cells and immune cells via paracrine signaling (Figure 6). Our study suggests the co-occurrence of EMT and immunosuppression [36] even in normal-appearing mucosa in early-onset CRC patients. Finally, our biological suggestion needs to be validated experimentally in future studies on early-onset CRC in terms of dedifferentiation or differentiation, which is underscored in EMT and immunosuppression.

Figure 6. Summary of functional roles of the predictor gene set in terms of EMT and immunosuppression.

The two elements (VIP, CYR61) of the predictor gene set can adjust the local immune system and induce malignant phenotype transformation via EMT.

Materials and Methods


Gene expression data for early-onset CRCs were downloaded from NCBI GEO (; the dataset identifier is GSE4107 [3]. This dataset consists of data for 12 CRC patients and 10 healthy controls. Normal-appearing mucosa adjacent to tumor had been obtained from the CRC patients and normal mucosa obtained from the healthy controls. The predictor gene set was derived from comparison between the normal-appearing mucosa from the cancer patients and the normal mucosa from the controls. Of note, the patients did not have FAP or HNPCC. We obtained prior regulation information from KEGG [5].

Decomposition of the KEGG pathways into linear subpathways

For simplicity, all the pathways of interest were divided into linear subpathways by modifying the CPAN Paths::Graph library (∼cavasquez/Paths-Graph/ (Figure 2). The linear subpathway is a sequence of linearly connected gene entities from root node to leaf node. The root nodes are generally membrane receptors, their ligands, and so on. The leaf nodes are usually transcription factors and signaling initiators toward the other pathways. We extracted as many linear subpathways as possible, considering multiple gene assignments of each node.

Rules for gene expression and edge information of KEGG

Our goal was to identify subpathways in which gene expression agreed with prior regulation information (e.g., activation, inhibition) in KEGG pathways. The gene regulation among the entries in KEGG pathways was considered to be prior knowledge. Edge types in KEGG represent regulations between the connected entities. We simplified the edges into only 2 types: activation and repression. We also assumed rules for matching an edge type of 2 adjacent entities to their gene expression changes (Figure 1A) [44]. Given a subpathway, we identified the longest consecutive segment beginning from its leaf node; the segment had to satisfy the assumed rules. The segment is referred to as a “well-defined subpathway” in terms of gene expression data and prior knowledge (Figure 1B). Further mathematical representation is also described below detailing how we obtained the well-defined subpathway.

Given a subpathway with the number of nodes (genes) p, the leaf node was set to G1 and the root node to Gp. The node Gi had its binary representation (bi) of a fold-change (fi) for cancer over control that was obtained from gene expression data. If fi>1, then bi was +1, otherwise it was −1. The prior edge type ei between the source node Gi+1 and the sink node Gi was either +1 (activation) or −1 (repression) (Figures 1B and 7). The expression ei×bi×bi+1 should have been equal to 1 if the expression matched with the regulations under the rule. Again, edge information ei was derived from the prior knowledge from KEGG, and fi and bi were derived from gene expression data. In summary, the number of nodes (n) of a well-defined subpathway was defined as follows:The function I(·) represents the indicator function and the function R(·) is a penalizing term that prevents probing progression from the leaf node to the root node when prior edge information and expression data did not agree with the rules. Figure 7 shows an example of identification of the well-defined subpathway using the previous equation, given a subpathway. It is noted that mathematical notation is also used throughout the manuscript.

Figure 7. The example of determination of the well-defined subpathway from a subpathway.

Statistical analysis

In this section, we describe a statistical model to determine significance for the well-defined subpathways. We conceptually divided the well-defined subpathway into 3 components referred to as node score (Fnode), edge score (Fedge), and score for the number of nodes (Fnum), ultimately in order to define a total score (Ftotal) of the well-defined subpathway. We designed Ftotal such that the more differentially expressed genes agreed with the rule (Figure 1A) in the well-defined subpathway, the less Ftotal was equal to. Fnode represents the differential expression of the entries, Fedge represents the regulation among the adjacent entries, and Fnum represents length normalization. Fnum, a normalization factor for n (the number of nodes in the well-defined subpathway), is necessary because the longer n was, the less Ftotal was when different well-defined subpathways were compared. In the text below, we describe the biological rationale and mathematical representation for the scores.

Before providing further explanation, we must define the terms source node and sink node. That is, given edge en−i, the upstream node Gn−(i−1) is called a source node, whereas the downstream node Gn−i is called a sink node (Figure 1A, B).

Typical expression analysis schemes focus only on highly differentially expressed genes under a certain cutoff (e.g., p-value), but it is important to consider that signaling proteins of an activated or repressed pathway involved in phenotype differences might not be changed drastically at the expression-level [45]. In other words, employing strict cutoff usage in gene expression data involves difficulties in uncovering signal cascading flows because some entries within the signal cascading flows could be missed under that cutoff. In contrast, Fnode does not filter out low differential expression with an arbitrary condition because the p-values of all the entries within the well-defined subpathway are considered.where n is the number of nodes in the well-defined subpathway and p(·) is a p-value of a gene in a two-sample t-test between the cancer and control groups. Therefore, the node score contains both high and low differential expressions without a strict cutoff.

Fedge reflects edge information (e.g., activation, inhibition) between 2 adjacent entries and derives from a joint distribution of activities of a source node and its corresponding sink node. The basic idea of Fedge is that, given a source node activity of edge en−i, its corresponding sink node activity is expected to be highly dys-regulated, which indicates a rare event. Therefore, Fedge follows, by nature, the first-order Markov chain property in which a current event depends only on its predecessor because we assume that Gn−i is only regulated by its direct upstream source node Gn−(i−1) of edge en−i.

In Figure 1A, we used fold-changes to determine whether the edge information complied with gene expression. The activities around the edge were thus measured from fold-changes in cancer over control, and the activity was simply transformed into a log2 scale for better normality. In other words, log2-transformed fold-changes of the 2 adjacent nodes were used as a measure of edge information, and the joint distribution of a source node and its sink node was calculated.

According to the first-order Markov chain property, we can simply define Fedge as the products of conditional probabilities log2(fn−i)|log2(fn−(i−1)) from i = 1 to n−1 multiplied by a prior Pr(log2(fn)). The conditional probabilities can be derived from the joint distribution of the pair (log2(fold-change of the source node), log2(fold-change of the sink node)), which is assumed to be a bivariate normal distribution.where n is the number of nodes in the well-defined subpathway.

To determine the joint distribution of the pair (log2(fold-change of the source node), log2 (fold-change of the sink node)), we extracted all the edges from the KEGG XML files (KGML files) and obtained the source nodes and their corresponding sink nodes from the edges. The log2-transformed fold-changes (e.g., a pair (log2fn−(i−1), log2fni)) of the cancer group over the control group for the pair source node and sink node were obtained from the microarrays.Prior probability (e.g., Pr(log2fn)) of the log2-transformed fold-change of a gene was assumed to be a univariate normal distribution, which is obtained from the log2-transformed fold-changes of all the entries belonging to the KEGG pathways.

Fnum was derived from a random graph model. We used an R statistical package library “igraph” [46] to make 1,000 simulated random graphs with the number of nodes set to 200 and the number of edges to 100. We reduced the random graphs into all linearly connected paths from roots to leaves, and the distribution of the number of edges for the paths was calculated (Figure S8). Fnum was borrowed from the distribution of the number of edges from the random graphs. Subsequently, n (the number of connected nodes in the well-defined subpathway) was considered as equal to 1 plus the number of edges in the well-defined subpathway.

Finally, we defined Ftotal as follows:For computational simplicity, we used its minus natural logarithm of Ftotal as a statistic instead:


We obtained the null distribution of S by generating 1,000,000 permuted samples. Each permuted sample was generated by shuffling the sample labels in the microarrays. For each permutation, we applied the rules to the original subpathways in order to redefine their well-defined subpathways, and then performed the same procedures discussed above. The p-value was obtained empirically by computing Pr(Ss), where s was the observed value in the original gene expression data.

We also provide detailed information of all 4,644 well-defined subpathways in Table S4 (xls format), including their p-values, FDR q-values, and regulation information.

Supporting Information

Figure S1.

Statistic S, p-value and multiple comparison correction. A. S versus −log10(p-value) in the 4,644 well-defined subpathways. The x-axis represents −1og10(p-value) and the y-axis S. B. −log10(p-value) versus FDR q-value. The x-axis is FDR q-value and the y axis −log10(p-value). The FDR q-values as well as p-values were summarized in Table S4 (see the sixth and eighth columns in Table S4).


Figure S2.

KEGG pathways containing the top 30% well-defined subpathways. The x-axis represents the number of the significant well-defined subpathways corresponding to the KEGG pathway.


Figure S3.

Pathways in cancer (KEGG hsa05200). Red boxes are activated in the CRC patients over the healthy controls. Green boxes are down-regulated in the CRC patients.


Figure S4.

MAPK signaling pathway (KEGG hsa04010). Red boxes are activated in the CRC patients over the healthy controls. Green boxes are down-regulated in the CRC patients.


Figure S5.

Wnt signaling pathway (KEGG hsa04310). Red boxes are activated in the CRC patients over the healthy controls. Green boxes are down-regulated in the CRC patients.


Figure S6.

Neutrophin signaling pathway (KEGG hsa04722). Red boxes are activated in the CRC patients over the healthy controls. Green boxes are down-regulated in the CRC patients.


Figure S7.

The input item options used in Figure 5. The item “Gene/Protein” in the PubGene input webpage is CYR61, FOS, FOSB, UCHL1, VIP, EGR1, KRT24, PTK2, ITGB5, IFNG, FAS, and FASLG. The item “Biological term” in the webpage is colorectal cancer.


Figure S8.

Distribution of the number of edges in the linearly connected paths based on the 1,000 simulated random graphs. The x-axis represents the number of the edges, and the y-axis probability.


Table S1.

The numeric identifiers of the well-defined subpathways used for the functional discussion and visualization of the six KEGG pathways. The number indicates column “No.” in Table S4 (xls format). Readers see all the information of significance, regulation flow, fold-change and so on from Table S4.


Table S2.

We fed the entries in Table 1 into PubGene in order to validate literature-based associations between our result and the term “colorectal cancer”. The listed genes have no direct co-occurrence with the term “colorectal cancer” according to PubGene. The majority (79%) of the entries in Table 1 have publication-based evidences. It is noted that CYR61 and FASLG in Table 1 were not included in the PubGene validation analysis because the two genes were not reported in our statistical analysis.


Table S3.

Comparison with our method and GSEA. We set the Vogelstein cancer-related pathways [13] (first column) as a gold standard. We inspected overlap between the gold standard and each method result. As a result, our method performed better than GSEA. The second column represents KEGG pathways corresponding to the first column. (O: overlap, X: no overlap)


Table S4.

Detailed information of all the 4,644 well-defined subpathways. No.: numeric identifier for the well-defined subpathway, KEGG: its corresponding KEGG pathway identifier, Title: KEGG pathway name, WellDefinedSubpathwayWithFoldChange: signaling flow of the well-defined subpathway with fold-change of the cancer patients over the healthy control, NumNodes: the number of entries, P-value: nominal p-value, S: our statistic, FDR (q-value): adjusted p-value, −log10(P-value): minus logarithm of p-value with base 10.


Table S5.

The expressions of TGF-βs and their receptors were summarized. Majority of the genes were up-regulated in the cancer except TGFBR1.


Appendix S1.

The additional analysis for GSEA comparison and independent dataset validation.



SN thanks Drs. Sanghyuk Lee at Ewha Womans Univ. and Beom Gyu Choi at National Cancer Center, and M.D. So Youn Jung at National Cancer Center for helpful discussion. Authors also thank anonymous reviewers for their insightful critiques.

Author Contributions

Conceived and designed the experiments: SN TP. Performed the experiments: SN. Analyzed the data: SN. Contributed reagents/materials/analysis tools: SN. Wrote the paper: SN TP.


  1. 1. Markowitz SD, Bertagnolli MM (2009) Molecular origins of cancer: Molecular basis of colorectal cancer. N Engl J Med 361: 2449–2460.
  2. 2. American Cancer Society (2010) Cancer Facts & Figures 2010. Atlanta: American Cancer Society.
  3. 3. Hong Y, Ho KS, Eu KW, Cheah PY (2007) A susceptibility gene set for early onset colorectal cancer that integrates diverse signaling pathways: implication for tumorigenesis. Clin Cancer Res 13: 1107–1114.
  4. 4. Jenssen TK, Laegreid A, Komorowski J, Hovig E (2001) A literature network of human genes for high-throughput analysis of gene expression. Nat Genet 28: 21–28.
  5. 5. Kanehisa M, Goto S, Furumichi M, Tanabe M, Hirakawa M (2010) KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res 38: D355–360.
  6. 6. Du C, Wang Y (2011) The immunoregulatory mechanisms of carcinoma for its survival and development. J Exp Clin Cancer Res 30: 12.
  7. 7. Parsons JT, Slack-Davis J, Tilghman R, Roberts WG (2008) Focal adhesion kinase: targeting adhesion signaling pathways for therapeutic intervention. Clin Cancer Res 14: 627–632.
  8. 8. Hanahan D, Weinberg RA (2011) Hallmarks of cancer: the next generation. Cell 144: 646–674.
  9. 9. Polyak K, Weinberg RA (2009) Transitions between epithelial and mesenchymal states: acquisition of malignant and stem cell traits. Nat Rev Cancer 9: 265–273.
  10. 10. Reiman JM, Knutson KL, Radisky DC (2010) Immune promotion of epithelial-mesenchymal transition and generation of breast cancer stem cells. Cancer Res 70: 3005–3008.
  11. 11. Schreiber RD, Old LJ, Smyth MJ (2011) Cancer immunoediting: integrating immunity's roles in cancer suppression and promotion. Science 331: 1565–1570.
  12. 12. Benjamini Y, Hochberg Y (1995) Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society Series B (Methodological) 57: 289–300.
  13. 13. Vogelstein B, Kinzler KW (2004) Cancer genes and the pathways they control. Nat Med 10: 789–799.
  14. 14. Sansal I, Sellers WR (2004) The biology and clinical relevance of the PTEN tumor suppressor pathway. J Clin Oncol 22: 2954–2963.
  15. 15. Yu HG, Nam JO, Miller NL, Tanjoni I, Walsh C, et al. (2011) p190RhoGEF (Rgnef) promotes colon carcinoma tumor progression via interaction with focal adhesion kinase. Cancer Res 71: 360–370.
  16. 16. Pai SI, Lin YY, Macaes B, Meneshian A, Hung CF, et al. (2006) Prospects of RNA interference therapy for cancer. Gene Ther 13: 464–477.
  17. 17. Shimoyama M, Kanda T, Liu L, Koyama Y, Suda T, et al. (2001) Expression of Fas ligand is an early event in colorectal carcinogenesis. J Surg Oncol 76: 63–68; discussion 69.
  18. 18. Rocken M (2010) Early tumor dissemination, but late metastasis: insights into tumor dormancy. J Clin Invest 120: 1800–1803.
  19. 19. Dunn GP, Koebel CM, Schreiber RD (2006) Interferons, immunity and cancer immunoediting. Nat Rev Immunol 6: 836–848.
  20. 20. Muller-Hermelink N, Braumuller H, Pichler B, Wieder T, Mailhammer R, et al. (2008) TNFR1 signaling and IFN-gamma signaling determine whether T cells induce tumor dormancy or promote multistage carcinogenesis. Cancer Cell 13: 507–518.
  21. 21. Gyorffy B, Molnar B, Lage H, Szallasi Z, Eklund AC (2009) Evaluation of microarray preprocessing algorithms based on concordance with RT-PCR in clinical samples. PLoS One 4: e5645.
  22. 22. Kim IY, Kim MM, Kim SJ (2005) Transforming growth factor-beta : biology and clinical relevance. J Biochem Mol Biol 38: 1–8.
  23. 23. Reiman JM, Kmieciak M, Manjili MH, Knutson KL (2007) Tumor immunoediting and immunosculpting pathways to cancer progression. Semin Cancer Biol 17: 275–287.
  24. 24. Dunn GP, Old LJ, Schreiber RD (2004) The immunobiology of cancer immunosurveillance and immunoediting. Immunity 21: 137–148.
  25. 25. Oble DA, Loewe R, Yu P, Mihm MC Jr (2009) Focus on TILs: prognostic significance of tumor infiltrating lymphocytes in human melanoma. Cancer Immun 9: 3.
  26. 26. Bach JF (2003) Regulatory T cells under scrutiny. Nat Rev Immunol 3: 189–198.
  27. 27. Murdoch C, Muthana M, Coffelt SB, Lewis CE (2008) The role of myeloid cells in the promotion of tumour angiogenesis. Nat Rev Cancer 8: 618–631.
  28. 28. Gabrilovich DI, Nagaraj S (2009) Myeloid-derived suppressor cells as regulators of the immune system. Nat Rev Immunol 9: 162–174.
  29. 29. Li MO, Wan YY, Sanjabi S, Robertson AK, Flavell RA (2006) Transforming growth factor-beta regulation of immune responses. Annu Rev Immunol 24: 99–146.
  30. 30. Nakayama F, Nishihara S, Iwasaki H, Kudo T, Okubo R, et al. (2001) CD15 expression in mature granulocytes is determined by alpha 1,3-fucosyltransferase IX, but in promyelocytes and monocytes by alpha 1,3-fucosyltransferase IV. J Biol Chem 276: 16100–16106.
  31. 31. Badvie S, Hanna-Morris A, Andreyev HJ, Cohen P, Saini S, et al. (2006) A “field change” of inhibited apoptosis occurs in colorectal mucosa adjacent to colorectal adenocarcinoma. J Clin Pathol 59: 942–946.
  32. 32. Ganea D, Gonzalez-Rey E, Delgado M (2006) A novel mechanism for immunosuppression: from neuropeptides to regulatory T cells. J Neuroimmune Pharmacol 1: 400–409.
  33. 33. Prasse A, Zissel G, Lutzen N, Schupp J, Schmiedlin R, et al. (2010) Inhaled vasoactive intestinal peptide exerts immunoregulatory effects in sarcoidosis. Am J Respir Crit Care Med 182: 540–548.
  34. 34. Gonzalez-Rey E, Fernandez-Martin A, Chorny A, Delgado M (2006) Vasoactive intestinal peptide induces CD4+,CD25+ T regulatory cells with therapeutic effect in collagen-induced arthritis. Arthritis Rheum 54: 864–876.
  35. 35. Niederkorn JY (2009) Immune escape mechanisms of intraocular tumors. Prog Retin Eye Res 28: 329–347.
  36. 36. Knutson KL, Lu H, Stone B, Reiman JM, Behrens MD, et al. (2006) Immunoediting of cancers may lead to epithelial to mesenchymal transition. J Immunol 177: 1526–1533.
  37. 37. Monnier Y, Farmer P, Bieler G, Imaizumi N, Sengstag T, et al. (2008) CYR61 and alphaVbeta5 integrin cooperate to promote invasion and metastasis of tumors growing in preirradiated stroma. Cancer Res 68: 7323–7331.
  38. 38. Chai J, Norng M, Modak C, Reavis KM, Mouazzen W, et al. (2010) CCN1 induces a reversible epithelial-mesenchymal transition in gastric epithelial cells. Lab Invest 90: 1140–1151.
  39. 39. Haque I, Mehta S, Majumder M, Dhar K, De A, et al. (2011) Cyr61/CCN1 signaling is critical for epithelial-mesenchymal transition and stemness and promotes pancreatic carcinogenesis. Mol Cancer 10: 8.
  40. 40. Jiang BH, Liu LZ (2008) PI3K/PTEN signaling in tumorigenesis and angiogenesis. Biochim Biophys Acta 1784: 150–158.
  41. 41. Fernandez-Martinez AB, Bajo AM, Isabel Arenas M, Sanchez-Chapado M, Prieto JC, et al. (2010) Vasoactive intestinal peptide (VIP) induces malignant transformation of the human prostate epithelial cell line RWPE-1. Cancer Lett 299: 11–21.
  42. 42. Fu S, Zhang N, Yopp AC, Chen D, Mao M, et al. (2004) TGF-beta induces Foxp3+ T-regulatory cells from CD4+ CD25− precursors. Am J Transplant 4: 1614–1627.
  43. 43. Bates RC, Mercurio AM (2003) Tumor necrosis factor-alpha stimulates the epithelial-to-mesenchymal transition of human colonic organoids. Mol Biol Cell 14: 1790–1800.
  44. 44. Tarca AL, Draghici S, Khatri P, Hassan SS, Mittal P, et al. (2009) A novel signaling pathway impact analysis. Bioinformatics 25: 75–82.
  45. 45. Kotelnikova E, Ivanikova N, Kalinin A, Yuryev A, Daraselia N (2010) Atlas of signaling for interpretation of microarray experiments. PLoS One 5: e9256.
  46. 46. Csardi G, Nepusz T (2006) The igraph software package for complex network research. Inter Journal Complex Systems 1695.