Computational analysis for identification of the extracellular matrix molecules involved in endometrial cancer progression.

Recurrence and poorly differentiated (grade 3 and above) and atypical cell type endometrial cancer (EC) have poor prognosis outcome. The mechanisms and characteristics of recurrence and distal metastasis of EC remain unclear. The extracellular matrix (ECM) of the reproductive tract in women undergoes extensive structural remodelling changes every month. Altered ECMs surrounding cells were believed to play crucial roles in a cancer progression. To decipher the associations between ECM and EC development, we generated a PAN-ECM Data list of 1516 genes including ECM molecules (ECMs), synthetic and degradation enzymes for ECMs, ECM receptors, and soluble molecules that regulate ECM and used RNA-Seq data from The Cancer Genome Atlas (TCGA) for the studies. The alterations of PAN-ECM genes by comparing the RNA-Seq expressions profiles of EC samples which have been grouped as tumorigenesis and metastasis group based on their pathological grading were identified. Differential analyses including functional enrichment, co-expression network, and molecular network analysis were carried out to identify the specific PAN-ECM genes that may involve in the progression of EC. Eight hundred and thirty-one and 241 PAN-ECM genes were significantly involved in tumorigenesis (p-value <1.571e-15) and metastasis (p-value <2.2e-16), respectively, whereas 140 genes were in the intersection of tumorigenesis and metastasis. Interestingly, 92 of the 140 intersecting PAN-ECM genes showed contrasting fold changes between the tumorigenesis and metastasis datasets. Enrichment analysis for the contrast PAN-ECM genes indicated pathways such as GP6 signaling, ILK signaling, and interleukin (IL)-8 signaling pathways were activated in metastasis but inhibited in tumorigenesis. The significantly activated ECM and ECM associated genes in GP6 signaling, ILK signaling, and interleukin (IL)-8 signaling pathways may play crucial roles in metastasis of EC. Our study provides a better understanding of the etiology and the progression of EC.

Introduction ECM plays a critical role in the female endometrium, being involved in major remodelling changes every month. The ECM of the reproductive tract in women undergoes extensive structural remodelling for decidualization, implantation, and renewal of the endometrial layer each month, while abnormal ECM changes can contribute to processes such as endometriosis, infertility, and cancer development and metastasis [23]. The invasive property is considered as cancer cells intrude into surrounding normal tissues and metastasize. Alterations of ECM, ECM-associated molecules, and the local ECM microenvironment that can promote tumor progression are illustrated in Fig 1. Some of the ECM molecules such as aggrecan, nidogen, collagen type VIII chain α1 and type XI chain α2, were overexpressed in stage III EC [24], matrix metalloproteinase (MMP)-9, an ECM degradation enzyme, might play a vital role in uterine cancer progression [25]. Previous studies by Grabarek B et.al [26] also shows normal human dermal fibroblast cells along with JAK/STAT signaling pathways regulates the numerous miR's expression under the influence of adalimumab therapy, indicating ECM molecules can be targeted for treatment effectiveness. However, the regulatory mechanisms of ECM and ECM-associated molecules in EC are still unclear.
The ECM and the ECM associated molecules involved in the microenvironment surrounding the cells seem to play important roles in tumor development; however, a comprehensive analysis of the regulatory mechanisms of ECM during EC development is still lacking. In this study, we attempted to establish a PAN-ECM Data List by collecting ECM and ECM-related genes through a literature survey and incorporation of related databases. Subsequently, we retrieved the gene expression profiles of EC samples at different stages (normal, early, and late) from The Cancer Genome Atlas-Uterine Corpus Endometrial Carcinoma (TCGA-UCEC) data, and used different bioinformatic analysis tools including DESeq2, FunRich, Ingenuity Pathway Analysis (IPA), and Weighted correlation network analysis (WGCNA) to identify and verify the key ECMs and their associates that may participate EC progression. A flow chart and an overview of the computational analysis used in this study was shown in Fig 2.
Therefore, we herein focused on developing a complete dataset for ECM research and called it the PAN-ECM Data List. The construction pipeline is depicted in Fig 2A. ECM-associated genes were collected from a literature search using such keywords as cell adhesion molecules, extracellular matrix protein, proteoglycan, collagen, glycoprotein, ECM regulator, ECM micro-environment associated, proteoglycans and secreted factors, etc. Additionally, ECM genes in MatrisomeDB were also integrated into the PAN-ECM Data List, and their functions were annotated from functional annotation databases including Gencard.

EC data collection and analysis of differentially expressed genes (DEGs)
Inclusion and exclusion criteria for UCEC sample and the classification of DEGs. The RNA-Seq level 3-gene expression data for normal and solid tumors from UCEC patients were collected from TCGA [36] (Broad GDAC Firehose database: http://firebrowse.org/?cohort= UCEC). In total, 35 normal samples, and 476 tumor samples were used for this study. Based on FIGO staging for endometrial carcinoma, the tumor samples at stage IA was classified as early-stage samples, samples obtained from patients at stage II and above were classified as late-stage samples. In this study, 159 samples were late-stage and 317 samples were early-stage samples. Based on the description of the pathology in FIGO staging, EC tumors at stage IB has � 50% invasion of the myometrium. To avoid the interferences, the data from the samples at stage IB patient was excluded in this study.
Both raw read counts and normalized fragments per kilobase of transcripts per million mapped reads (FPKM) values (FPKM-UQ) at the gene level were used for further analysis.
Clinical XML files were also downloaded to retrieve stage and related clinical information, which was used to recognize tumorigenesis and metastasis samples, as explained in FIGO guidelines [21,22]. Tumorigenesis DEGs were identified by comparison the DEGs from earlystage (stage IA) to normal samples, and metastatic DEGs were identified by comparing the DEGs from late-stage (stage II or above) to early-stage EC samples (stage IA).
Statistical analysis. Expression data were pre-processed and analyzed using DESeq2, an R package [37], and log2 fold change cutoffs of >1.5 and adjusted p values of <0.01 were used to determine significant DEGs. With thousands of genes tested in an RNA-Seq, multiple comparison adjustments were necessary. The Bonferroni method was applied for filtering DEGs; this controls the mean number of false positives, that can be used for multiplicity adjustment [38]. Samples at different stages were taken for a comparative analysis to identify tumorigenesisand metastasis-associated DEGs during EC development, as illustrated in Fig 2B. A Chisquared test was applied to test whether ECM genes were significantly associated with EC tumorigenesis or metastasis.

Analysis of ECM regulatory mechanisms during EC development
To identify potential functions and regulatory roles of the ECM during EC development, as shown in Fig 2C, three different network analytical approaches were applied: a canonical pathway network analysis, molecular interaction network analysis, and co-expression network analysis. The canonical pathway network analysis was performed by using an Ingenuity Pathway Analysis (IPA) [39], that uses the popular activation z-score analytical method. It was proposed by Kraèmer et al. in 2014 [39], and it measures activation states, (either increased or decreased) of pathways affected by DEGs. We used a statistical approach to define a quantitative z-score, which determines whether a biological function has significantly more "increased" predictions than "decreased" predictions (z-score >0) or vice versa (z-score <0). In general practice, an absolute z-score of >2 or <-2 may be significant. A molecular interaction network analysis was performed using FunRich [40] to generate networks based on predicted or experimentally validated interactions, such as protein-protein interactions and gene-gene regulation to identify critical ECM-involved networks or critical ECM hub genes. The coexpression network analysis was performed using WGCNA, an R package [41], for EC samples. The Blockwi-seModules function was applied to construct gene coexpression hierarchical clustering, and dynamic tree cut methods were used to identify co-expression networks.

PAN-ECM Data List
Based on a literature survey, a list of ECM and ECM-associated molecules including 1516 ECM-related genes, called as PAN-ECM Data List was generated (S1 Table). The PAN-ECM genes were then classified into 15 categories according to their biological functions and localizations: cell surface receptors, cytoskeleton, cell adhesion molecules, collagens, ECM receptor cofactors, ECM receptors, ECM, ECM-associated regulatory factors, ECM synthetic/degradation enzymes, ECM-affiliated proteins, ECM glycoproteins, ECM regulators, ECM microenvironment-associated, proteoglycans, and secreted factors. Numbers of genes in different categories of the PAN-ECM Data List are given in

Tumorigenesis and metastasis DEG analysis
Total of 20,531 DEGs from EC patients was retrieved from TCGA. Background distribution of 1516 PAN-ECM genes in a total of 20,531 DEGs is 7.38%. Based on FIGO staging for endometrial carcinoma development, we investigated the expression level of the DEGs and classified the DEGs from normal endometrium as the normal group, the DEGs from EC samples at stage IA as an early-stage group, DEGs from EC samples at stage II and the above as late-stage group. By comparing the expression level, 7880 DEGs were considered to be tumorigenesisassociated (early vs. normal) and 1677 DEGs were metastasis-associated (late vs. early), respectively. In those tumorigeneses-and metastasis-associated DEGs, 831 (S2 Table) and 241 (S3 Table) DEGs were PAN-ECM genes, respectively. Interestingly, the 831 tumorigenesis-associated PAN-ECM genes of 7880 DEGs (10.54%, Chi-squared p<1.571e-15) and the 241 metastasis-associated PAN-ECM genes of 1677 DEGs (14.37%, Chi-square p<2.2e-16) were significant, as compared to the background distribution of 7.38% in the total of 20,531 DEGs. Detailed information about fold change values in correlation with tumorigenesis and metastasis and the list of 140 intersecting PAN-ECM DEGs were shown in Table 1. These PAN-ECM genes were separated into four different categories according to their fold changes as up/up (37 genes), up/down (35 genes), down/up (57 genes), and down/down (11 genes) in tumorigenesis and metastasis, respectively. It indicates that 48 PAN-ECM genes were either up/up or down/down throughout all stages during EC development, whereas 92 PAN-ECM genes were expressed in contrasting direction of fold change (up/down or down/up) at tumorigenesis and metastasis stages.

Pathway analysis of tumorigenesis-and metastasis-related PAN-ECM genes
IPA was applied for pathway analysis for the identified EC-associated PAN-ECM genes ( Fig  4A). Numerous signaling pathways associated with the140 intersecting PAN-ECM DEGs were identified ( Fig 4B). The identified pathways included GP6 signaling, IL-8 signaling, ILK  signaling, neuroinflammation signaling pathway, colorectal cancer metastasis signaling, leukocyte extravasation signaling, and osteoarthritis pathway. Interestingly, several signalling pathways were expressed in opposite fold changes at tumorigenesis and metastasis stages. GP6 signaling pathway was inhibited with the lowest z-score (-0.832) in tumorigenesis but activated in metastasis with the highest z-score (3.605). IL-8 signaling (z-score = -1), ILK signaling (z-score = -1), and neuroinflammation signaling pathways (z-score = -0.447) were also inhibited at tumorigenesis, but activated in metastasis. Moreover, pathways such as colorectal cancer metastasis signaling (zscore = 1.133 in tumorigenesis and 1.889 in metastasis), leukocyte extravasation signaling (z-score = 1 and 1), and osteoarthritis (z-scores = 0.447 and 1.341) were activated at both tumorigenesis and metastasis stages.
Pathway analyses were also carried out for the identified tumorigenesis and metastasis-associated PAN-ECM genes, detailed canonical pathways generated from 831 tumorigenesis-and 241 metastasis-associated PAN-ECM genes were shown in S1 Fig. Notably, GP6 signaling pathway had the highest z-score (4.14) in the metastasis-associated PAN-ECM genes and the lowest z-score (-1.414) in the tumorigenesis-associated PAN-ECM genes, that resembles the results from the analyses for the intersecting PAN-ECM genes. The PAN-ECM genes involved in the GP6 signaling pathway were listed in Fig 5. The PAN-ECM genes including COL4A4, COL6A6, ITGB3, PI3K, and αIIbβ3 in GP6 signaling pathway were upregulated in metastasis and downregulated in tumorigenesis.

Functional enrichment analysis of tumorigenesis-and metastasisassociated DEGs
To depict the possible roles of the intersecting PAN-ECM genes in tumorigenesis and metastasis of EC, the disease-related and bio functions of the DEGs from IPA were enumerated in

PLOS ONE
Computational analysis for the ECM involved endometrial cancer progression Table 2. Interestingly, the most correlated bio functions, cell movement, colony formation, cell proliferation, cell migration, adhesion of immune cells, and invasion of cells were activated with higher z-scores (>2) in metastasis and inhibited in tumorigenesis with lower z-scores (<-2). The bio functions, such as cell death and survival, cell-to-cell signaling, proliferation of epithelial cells, cellular growth, and cell cycle mitogenesis, were activated in both metastasis and tumorigenesis of EC. It suggested that the differential activation and inhibition of the bio functions of the PAN-ECM genes may lead to EC progression.

Network analysis of intersecting PAN-ECM genes between tumorigenesis and metastasis DEGs
To decipher the possible regulatory mechanisms of the ECM involved microenvironment during EC development, a molecular interaction network was generated by applying the 140 intersecting PAN-ECM genes using FunRich analysis. The generated network is shown in Fig 6. Each network consists of connected PAN-ECM genes with a hub gene. As indicated, hub Activated pathways are coloured orange (z-score >2) and inhibited pathways are coloured blue (z-score <-2). The absolute z-score is higher, the color is more intensive. https://doi.org/10.1371/journal.pone.0231594.t002 genes with the most intensive connections were FN1, ITGB3, ERBB2, and BMPR1B. In addition, the hub genes with the contrasting fold change in tumorigenesis and metastasis included 100A7, FBLN1, SERPINAS, TNFSF14, BMPR18, and GDF5, which were upregulated in tumorigenesis and downregulated in metastasis, and FSTL3, ADAMTS1. COL4A4, HGF, MAG, FN1, ITGB3, NCAM1, AGTR1, and DNM3, which were downregulated and upregulated in tumorigenesis and metastasis, respectively.

Gene coexpression network analysis
The gene coexpression network was constructed using the retrieved 20,531 genes from EC tumor samples using a systematic unsupervised method, WGCNA. As shown in Fig 7A, numerous modules were generated, whereas four modules consisting of the highest proportions of PAN-ECM genes, ranging 11%~52%, were selected for further analyses Fig 7B. Significantly involved pathways among the modules are enumerated in Table 3.
As described previously, high grade and re-occurred EC have poor prognosis outcome. Therefore, we attempted to dig the possible mechanisms and the characteristics of the ECM microenvironment during high-grade EC at metastasis stage. The pathway analyses for metastasis-associated PAN-ECM genes listed in Table 3 showed that module 1 and 2 possessed the highest proportions (52% and 30% respectively) of PAN-ECM genes, where those PAN-ECM genes are involved in three pathways, GP6 signaling, dendritic cell maturation, and osteoarthritis pathways, suggesting that those PAN-ECM genes and their involved pathways may play crucial roles during EC metastasis.
In addition, colorectal cancer metastasis signalling, IL-6 signaling, and IL-8 signaling also were also predicted to be activated during EC metastasis, whereas RhoGDI signaling pathway was predicted to be inhibited in metastatic EC. The molecular network of coexpression module 1 containing the highest proportion of PAN-ECM genes including COL1A1, BGN, NID1, ADAM12, COL1A2, INHBA, THBS1, SPARC, FBN1, and CSPG4 was shown in Fig 7C. The typical metastasis-associated hub genes were identified.

Discussion
In this study, PAN-ECM Data List was generated, that covers not only ECMs, but also synthetic and degradation enzyme for ECMs and ECM associated molecules and the regulators, also present in the extracellular microenvironment surrounding cells. The PAN-ECM Data List has a total of 1516 ECMs and ECM-associated genes. Comparison to the Matrisome Project [28], there were 489 additional genes that contribute in 9 categories including cell surface receptor, cytoskeleton, cell adhesion molecules, ECM receptor cofactors, ECM receptors, ECM membrane-associated, ECM-associated regulatory factors, ECM synthetic/degradation enzymes, and ECM microenvironment associated. Some of the genes were reported to be involved in many critical biological processes, for instance, integrin subunit alpha 2b (ITGA) and intercellular adhesion molecule 1 (ICAM1) genes, which is an ECM receptor and a molecule that can bind to integrin, respectively, both play crucial roles in tumor development [42]. In addition, the cell surface receptor category includes transforming growth factor-beta receptor genes (TGFBR1 and TGFBR3), and that the elevated TGFBR levels could induce collagen expression in metastatic breast cancer cells [43]. Several genes in the cell adhesion molecule category, such as CD44 and CADM1, were involved in cancer cell migration and the regulation of ECM adhesion [44,45]  Interestingly, several PAN-ECM genes involved signaling pathways, GP6 signaling pathway, ILK signaling pathway, and IL-8 signaling pathway were significantly inhibited in tumorigenesis and activated in metastasis of EC, suggesting that those specific PAN-ECM genes may participate in EC metastasis and progression. To our knowledge, PAN-ECM gene involved GP6 signaling in association with EC progression has not been reported. GP6, a glycoprotein, is mainly expressed in platelets. GP6 signaling pathway was primarily thought to participate in platelets and their precursor megakaryocyte activation [54]. Aggregation of platelets may shield circulating tumor cells from immunosurveillance and to promote tumor progression and metastasis [55]. Another report suggested that GP6 served as the primary signaling receptor for collagen, which induces platelet activation and thrombus formation [54]. Defect of GP6 could cause malfunction of platelet reactivity to collagens in blood and lung cancer metastasis [56,57]. An elevated level of serum platelets lowered the chemotherapy response rate and increased the risk of recurrence in EC [58]. Our current study identified several collagen genes that were upregulated in metastatic EC, suggesting that the upregulated collagens may activate GP6 signaling and protect the EC cells from immunosurveillance and promote tumor metastasis.
ILK signaling pathway was shown to represent an essential kinase activity and also regulated the decidualization of endometrial stromal cells [59]. It also played a role in hormonedependent cancer progression and regulated a variety of cellular functions including cell survival, migration, and angiogenesis, its gain or loss of function resulted in oncogenic transformation and progression of invasive and metastatic phenotypes [60]. IL-8 signaling belongs to CXC chemokine family. Many studies confirmed high levels of IL-8 in HER2-enriched and basal-like (ER-) primary breast cancer, whereas patients with a low level of IL8 had better prognosis [61].
The hub gene they interact with the other PAN-ECM genes may play crucial roles in regulating the EC development and the progression. From the molecular network analysis (Fig 6), we identified several hub genes, such as FN1, ITGB3, and BMPR1B, interacted with the other ECM molecules intensively. These genes were associated with many critical biological functions related to cancer cell metastasis. FN1 was previously reported to be associated with adverse or poor prognosis of breast and ovarian cancer [62]. Increased expression of ITGB3 could promote and control the metastasis of breast cancer and EC [63,64]. BMPR1B was correlated with cell proliferation, suggesting that it may play a role in protecting against endometriosis progression [65]. A previous study by Frida et al.'s team explained the transformation of normal cells to malignant and metastatic tumor cells. VCAN, an ECM gene was initially downregulated but upregulated in metastasizing cells in the prostate and colorectal cancers [66]. Our results showed that the metastatic EC associated ECM genes were highly expressed at a late stage and might participate in EC progression.
From the coexpression network analysis identified several modules possessing higher proportions of ECM genes. Interestingly, the GP6 signaling pathway along with the dendritic cell maturation and osteoarthritis pathways were activated in modules 1 and 2 and in metastatic DEGs (Table 3). Notably, GP6 signaling was identified not only by functional enrichment analysis but by coexpression network analysis. PAN-ECM genes, such as COL1A1, COL1A2, BGN, NID1, ADAM12, INHBA, THBS1, SPARC, and CSPG4, were identified to act as hub genes. Increased expressions of COL1A1 and COL1A2 genes were associated with lower survival of ovarian cancer patients [67]. BGN gene could enhance migration and invasion of EC [68]. NID1 gene was found to be a new therapeutic target biomarker in breast cancer and EC invasion [69]. ADAM12 was a tumor marker and has negative prognostic value for overall survival in ovarian cancer [70]. Strong colocalization of INHBA was highly associated with malignant endometrial tissues [71]. THBS1 expression might associate with the survival of EC patients [72]. Overexpression of SPARC gene could promote migration of EC [73]. Increased expression of CSPG4 gene was correlated with disease reappearance in patients with breast cancer [74].
Increased matrix stiffness has a profound effect on tumor development and progression [75]. Upregulation of collagen and integrins is known to participate in stiffness [76,77]. Increased matrix stiffness resulted in increases in cell outgrowth and sprouting, which were correlated with cancer metastasis [78]. Collagen expressions, mainly types I, III, and VI, have been studied in the past few years, and higher expressions of those ECM genes were positively correlated with matrix stiffness [27, [79][80][81]. In this study, many collagen-associated genes, such as COL4A4, COL19A1, and COL6A6 (Table 1), showed decreased expressions at early stages and increased expressions at the late stage of EC. These findings suggested that those ECM genes may participate in EC progression.

Conclusions
In this work, we compiled a comprehensive PAN-ECM Data List including ECM and ECMassociated genes. The ECM genes including collagens and integrins and the signaling pathway, GP6, ILK signaling, and IL-8 signaling pathways, were highly correlated with metastatic EC. The identified PAN-ECM genes and their participated signaling pathways may lead to the development of novel biomarkers and new therapeutic strategies for the re-occurred, poor prognosis of high-grade EC.