The Genome-Wide Analysis of Carcinoembryonic Antigen Signaling by Colorectal Cancer Cells Using RNA Sequencing

Carcinoembryonic antigen (CEA, CEACAM5, CD66) is a promoter of metastasis in epithelial cancers and is widely used as a prognostic clinical marker of metastasis. The aim of this study was to identify the network of genes associated with CEA-induced colorectal cancer liver metastasis. We compared the genome-wide transcriptomic profiles of CEA-positive (MIP101 clone 8) and CEA-negative (MIP101) colorectal cancer cell lines with different metastatic potential in vivo. The CEA-producing cells displayed quantitative changes in the expression level of 100 genes (over-expressed or down-regulated). Changes in a randomly selected subset of these genes were confirmed by quantitative RT-PCR. The KEGG pathway analysis identified 4 significantly enriched pathways: cytokine-cytokine receptor interaction, MAPK signaling pathway, TGF-beta signaling pathway and pyrimidine metabolism. Our results suggest that CEA production by colorectal cancer cells triggers colorectal cancer progression by inducing the epithelial-mesenchymal transition, increasing tumor cell invasiveness into the surrounding tissues and suppressing stress and apoptotic signaling. The novel gene expression distinctions establish the relationships between the existing cancer markers and implicate new potential biomarkers for colorectal cancer hepatic metastasis.


Introduction
Intestinal cancers rank fourth in cancer incidence in the world. Despite improvements in early diagnostics, 20% of primary colorectal cancer diagnoses reveal remote metastases. Available therapies still offer a poor prognosis, and these patients have a five-year survival rate of less than 10% [1]. The serum CEA test is recommended by the American Society of Clinical Oncology [2] and by the European Group on Tumor Markers [3] as a prognostic and postoperative marker for metastases and as an aid in the management of cancer patients. The clinical efficacy of CEA screening has been demonstrated in the follow-up management of patients with colorectal, breast, lung, prostate, pancreatic and ovarian carcinoma [4].
CEA is a large glycoprotein (~180 kD) and a member of the carcinoembryonic antigen gene family, which belongs to the immunoglobulin (Ig) gene superfamily and comprises an exceptionally diverse array of highly glycosylated glycoproteins (http://www.carcinoembryonic-antigen.de/index.html) [5]. CEACAM genes are expressed in multiple cell types including epithelial, endothelial and immune cells such as leukocytes and dendritic cells. CEACAM molecules are generally inserted into the cell membrane via a transmembrane domain or physically linked to the membrane via a glycosyl-phosphatidylinositol anchor [5]. Regulation of intercellular adhesion is a major function of CEA [6], and CEA can establish and maintain tissue architecture and function in the colon. The tumorigenic effects of CEA include inhibiting cell differentiation, blocking cell polarization, distorting tissue architecture and inhibiting anoikis (cell death due to the loss of cell-cell contacts) [7,8]. Nonetheless, the molecular mechanism of CEA-related metastasis is not well understood.
To study the influence of CEA on metastasis, we used 2 human colorectal cancer-derived MIP101 cell lines of the same origin but with different metastatic potential [9]. The original poorly differentiated, poorly metastatic MIP101 cell line does not produce CEA. The derivative MIP101 clone 8 was genetically modified by transfection with a construct containing the full-length CEA gene and a G418 antibiotic resistance marker, and was selected for CEA expression.
Here we measured transcriptome differences induced by CEA production in colorectal cancer cells with differing levels of CEA production and metastatic potential. The RNA sequencing technology (RNAseq) allows for the comparison of RNA produced by different cell lines, estimation of the level of gene expression, and identification of changes in gene splicing and in the signaling pathways that are involved in response to CEA over-expression.

RNA Isolation and Sequencing
Total RNA was extracted from human colon carcinoma cells using Trizol reagent according to the manufacturer's protocol (Invitrogen Life Technologies, CA) and depleted of ribosomal RNA. Poly(A) RNA was isolated from 3 mg of total RNA using Sera-Mag oligo (dT) spheres (Thermo Scientific, Lafayette, CA, USA).
Libraries for sequencing were obtained using the TruSeq kit, universal adapter sequences, and specific PCR primers recommended by Illumina (Illumina, San Diego, CA, USA). The mRNA sequencing libraries were prepared and sequenced using an Illumina HiSeq2000 instrument at the Broad Institute, Boston, USA. More than 50 M reads with a read length of 76 bp were produced for each library.
Clean reads were mapped to the GRCh37 human reference genome using TopHat (v2.0.9) as part of the Tuxedo pipeline [11]. Expression levels for the genes and their isoforms were calculated by Cufflinks as FPKM values (fragments per kilobase of exon per million fragments mapped). Differential expression between samples was calculated with the edgeR package [12], and p-values were adjusted using the FDR (false discovery rate) control method. The edgeR package is based on a quantile-adjusted conditional maximum likelihood estimator for the dispersion parameter of the negative binomial distribution, which was developed for the very small sample sizes typical of gene expression studies with few observations (e.g. 2 libraries for a given experimental condition) [12]. Genes were considered differentially expressed between two conditions if they showed a log2 fold-change greater than 2 and an FDR-adjusted p-value <0.05. We identified relevant biological processes and molecular pathways using the Gene Set Enrichment Analysis (GSEA) tool (http://www.broadinstitute.org/gsea).
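The quantities used in this section can be illustrated with a minimal Python sketch. It is not the Cufflinks or edgeR implementation: it only shows the FPKM definition given above, a Benjamini-Hochberg adjustment (assumed here as the FDR control method), and the stated cut-offs applied to per-gene results. The function names, the use of the absolute log2 fold-change to capture down-regulated genes, and the toy input values are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def fpkm(fragments: int, exon_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of exon per million fragments mapped."""
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (fragments per million)
    return fragments * 1e9 / (exon_length_bp * total_mapped_fragments)


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return FDR-adjusted p-values in the input order (assumed BH procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values stay monotone in the rank of the raw p-value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_de_genes(
    results: Dict[str, Tuple[float, float]],
    lfc_cutoff: float = 2.0,
    fdr_cutoff: float = 0.05,
) -> List[str]:
    """results maps gene -> (log2 fold-change, raw p-value).

    A gene is kept when |log2 fold-change| > lfc_cutoff and the adjusted
    p-value < fdr_cutoff. Using the absolute value is an assumption so
    that down-regulated genes are retained as well.
    """
    genes = list(results)
    fdr = benjamini_hochberg([results[g][1] for g in genes])
    return [
        g for g, q in zip(genes, fdr)
        if abs(results[g][0]) > lfc_cutoff and q < fdr_cutoff
    ]


if __name__ == "__main__":
    # Hypothetical values, not taken from the study.
    toy = {
        "GENE_A": (3.1, 0.0001),
        "GENE_B": (-2.5, 0.0002),
        "GENE_C": (1.2, 0.0001),
        "GENE_D": (4.0, 0.5),
    }
    print(fpkm(500, 2000, 50_000_000))
    print(select_de_genes(toy))
```

In an actual analysis the fold-changes and p-values would come from the edgeR output rather than from a hand-written table; the sketch only makes the filtering rule explicit.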

Validation of RNA-sequencing data by qRT-PCR
Using the Primer 3 program (http://www.ncbi.nlm.nih.gov/tools/primer-blast/), PCR primers were designed to be suitable for all significant isoforms of each differentially expressed gene; forward and reverse primers were located in different exons of the corresponding gene. The sequences of the selected primers are shown in the S3 Table. The expression levels of the CEA and GAPDH genes were used as controls for PCR analysis. Each qRT-PCR reaction was performed in triplicate.

Global Gene Expression Profiling by RNA sequencing
To analyze the transcriptome changes associated with CEA up-regulation, we compared global transcript levels in the CEA-up-regulated (MIP101 clone 8) and CEA-negative (MIP101) cell lines using the TopHat pipeline [10,11]. One hundred transcripts of known genes were significantly different in expression. Thirty-two genes were up-regulated and 68 were down-regulated in CEA-producing cells, with a fold change in expression >2 and FDR < 0.05 (S1 Table). The top 30 up-regulated and down-regulated genes with an FDR < 0.0001 are summarized in Fig 1.

Molecular Pathways and Biological Processes Analysis
To identify putative biological processes affected by changes in gene expression, we performed Gene Ontology enrichment analysis using the Gene Set Enrichment Analysis (GSEA) tool (http://www.broadinstitute.org/gsea). Forty-nine biological processes were functionally enriched, including cell proliferation and apoptosis in response to stress and other cancer-related processes. The top 10 identified biological processes are summarized in Table 1 and the complete list is presented in the S2 Table. Molecular pathways regulated by CEA over-expression were identified by pathway analysis using the Kyoto Encyclopedia of Genes and Genomes (KEGG). In total, we identified 4 significantly overrepresented pathways: cytokine-cytokine receptor interaction, MAPK signaling pathway, TGF-beta signaling pathway and pyrimidine metabolism (see Table 2). Phenotypic changes in the CEA-over-expressing MIP101 clone 8 cells may be caused by alterations in the expression levels of genes associated with cancer progression and metastasis. Table 3 summarizes phenotype-related genes that were selected from the list of genes differentially expressed after CEA over-expression.
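The notion of an "overrepresented" pathway can be made concrete with a hypergeometric test, sketched below in Python. The exact statistic used in the analysis above is not stated, so this generic over-representation test is an assumption for illustration; the function name and all numbers in the usage example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(
    universe_size: int, pathway_size: int, de_size: int, overlap: int
) -> float:
    """Probability of seeing at least `overlap` pathway genes among
    `de_size` genes drawn without replacement from `universe_size` genes,
    of which `pathway_size` belong to the pathway."""
    max_overlap = min(pathway_size, de_size)
    total = comb(universe_size, de_size)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, de_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical example: 20,000 annotated genes, a 250-gene pathway,
    # 100 differentially expressed genes, 8 of them in the pathway.
    p = hypergeom_enrichment_p(20_000, 250, 100, 8)
    print(f"enrichment p-value: {p:.3g}")
```

When many pathways or GO terms are tested, the resulting p-values would normally be corrected for multiple testing before calling a pathway significantly enriched.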

Validation of RNA-seq data by qRT-PCR
We performed qRT-PCR to validate the RNA-seq results and bioinformatics methods. Three up-regulated genes (DSP, PCDH1 and WFS1) and two down-regulated genes (GADD45A and KLF11) were randomly selected from the list of differentially expressed genes.
CEA gene expression was used as a positive control and the GAPDH gene was used as an endogenous control. As demonstrated in Fig 3, the up-regulation and down-regulation of all selected genes were in accordance with the RNA-seq analysis results. The results of classic PCR analysis are presented in S1 Fig.
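For readers unfamiliar with how an endogenous control such as GAPDH enters a qRT-PCR comparison, the following Python sketch shows the widely used 2^(-ΔΔCt) relative quantification. The quantification method is not specified above, so its use here is an assumption, as are the function name, the assumption of roughly 100% amplification efficiency, and the Ct values in the example.

```python
from statistics import mean
from typing import Sequence


def relative_expression(
    ct_target_test: Sequence[float],
    ct_ref_test: Sequence[float],
    ct_target_ctrl: Sequence[float],
    ct_ref_ctrl: Sequence[float],
) -> float:
    """Fold change of a target gene in the test sample relative to the
    control sample, normalized to a reference gene (2^-ddCt).

    Each argument holds the replicate Ct values of one reaction set.
    Assumes near-100% amplification efficiency for both primer pairs.
    """
    delta_test = mean(ct_target_test) - mean(ct_ref_test)
    delta_ctrl = mean(ct_target_ctrl) - mean(ct_ref_ctrl)
    return 2 ** -(delta_test - delta_ctrl)


if __name__ == "__main__":
    # Hypothetical triplicate Ct values, not measurements from this study.
    fold = relative_expression(
        ct_target_test=[22.1, 21.9, 22.0],   # target gene, CEA-producing cells
        ct_ref_test=[18.0, 18.1, 17.9],      # reference gene, CEA-producing cells
        ct_target_ctrl=[25.0, 25.1, 24.9],   # target gene, parental cells
        ct_ref_ctrl=[18.0, 17.9, 18.1],      # reference gene, parental cells
    )
    print(f"fold change: {fold:.2f}")
```

A fold change above 1 corresponds to up-regulation in the test sample and a value below 1 to down-regulation, which is the direction that can be compared with the RNA-seq result.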

Discussion
The steady growth of cancer incidence, increasing survival ages, and high mortality from cancer metastasis require the development of more specific and effective methods for the diagnosis and prevention of metastasis. Metastasis is the end product of a multi-step, cell-biological process termed the invasion-metastasis cascade, which involves dissemination of cancer cells to anatomically distant organs and their subsequent adaptation to a foreign tissue microenvironment. Each of these events is driven by the acquisition of genetic and epigenetic alterations within the tumor cells and by multiple interactions between the metastatic population and the microenvironment [13]. Metastasis is a highly specific process. Colorectal cancer cells frequently develop metastases to the liver and lungs, whereas prostate and breast cancers more frequently target the bones as their site for metastasis [13]. Although CEA plays a key role in the formation of hepatic metastasis from colorectal cancer, the precise mechanism of CEA-induced metastasis is not yet well understood [14]. We discovered and cloned the gene for the CEA receptor from liver macrophages (Kupffer cells) [15]. It has been shown that the CEA/CEAR interaction on the surface of Kupffer cells stimulates IL1a, IL6, IL10 and TNF-alpha cytokine production and influences cell adhesion molecules in the liver, which enhances the survival and homing of the metastatic cancer cells that trigger metastasis in the liver [16].
In this study we carried out an in-depth analysis of the effect of CEA on colorectal cancer cells. New information was obtained on the gene expression, biological processes and signaling pathways involved in the response to CEA. The transcriptomic profiling revealed changes in the expression level of 100 genes (Fig 1), including VEGFA, CXCL5, TNFSF15 and PCDH1.
Previously, we have shown the influence of CEA synthesized by cancer cells on the formation and function of E-cadherin adhesion junction complexes [8]. These complexes are among the main contacts between epithelial cells. CEA disrupts the interactions between the proteins that form these complexes: E-cadherin, alpha- and beta-catenins, and p120-catenin. CEA production also changed the splicing variants of p120-catenin and triggered the nuclear expression of beta-catenin [8]. P120-catenin is a master regulator of cadherin stability and a modulator of Rho GTPase activity. Although p120-catenin stabilizes E-cadherin at the plasma membrane and promotes cell-cell adhesion, it can also promote cell motility and invasion [17].
Using Gene Ontology (GO) enrichment analysis, we discovered 49 GO biological processes that were functionally enriched in response to CEA, including cell proliferation, apoptosis, response to stress, transcription and nucleic acid metabolism (Table 1).
The CEA receptor protein (CEAR) has been described as heterogeneous nuclear RNA-binding protein M (HNRNPM1-4) [18]. This protein belongs to a large family of heterogeneous nuclear RNA-binding proteins (HNRNPs A-U), also called "histones of RNA" [18]. It is an abundant nuclear shuttling protein that has at least 4 protein isoforms [18], but only 2 transcripts have been experimentally validated: the full-length protein (isoform 1) and the short form (isoform 2) that has a deletion [19]. We identified isoform 2 as the CEA-binding protein on the surface of Kupffer cells [20].
HNRNPM is a multifunctional protein that is involved in mRNA processing [21] and splicing [22]. In human cells, CEAR/HNRNPM plays a role in regulating FGFR2 alternative splicing and can affect the splicing of several other genes [23].
Post-transcriptional modifications can modulate HNRNPM activity by altering its localization, its RNA binding specificity, and its interaction with other cellular factors [19]. Generally, HNRNPM has a diffuse nuclear distribution and remains bound to the mRNA as it is transported through nuclear pores. Using isoform-specific antibodies, we have shown that in macrophages and colorectal cancer cells the full-length HNRNPM protein (isoform 1) is localized in the nucleus, whereas the short isoform 2 appears in the cytoplasm and on the surface of the cells [20].
In support of our data, new evidence has emerged regarding CEA/CEAR signaling pathway activation in endothelial cells [24]. Endothelial cells do not synthesize CEA. However, soluble CEA produced by cancer cells binds to the CEAR receptor on endothelial cells. Importantly, down-regulation of HNRNPM/CEAR, and not of VEGF, in endothelial cells suppresses tumor angiogenesis [24]. Clinical research has also shown that up-regulation of HNRNPM/CEAR protein is associated with aggressive types of colon [25] and breast cancers [22]. Together, these results elucidate a novel function for CEA/CEAR signaling in tumor angiogenesis and metastasis.
Using RNA sequencing, we performed KEGG analysis and identified 4 pathways newly found to be significantly enriched upon CEA over-expression: cytokine-cytokine receptor interaction, MAPK signaling pathway, TGF-beta signaling pathway and pyrimidine metabolism (see Table 2).
The transforming growth factor-β (TGF-β) signaling pathway is involved in the control of multiple cancer-related biological processes, including cell proliferation, differentiation, migration, and apoptosis. TGF-β and CEA signaling also induce the epithelial-mesenchymal transition that is essential for metastasis in tumor cells. Recently it has been reported that TGF-β may signal through HNRNPM/CEAR [26]. The authors suggested that NF-kappaB could be a common point between these pathways [26]. Our data also show that the MAPK pathway is deregulated by CEA over-expression. The MAPK signaling pathway is closely related to the TGF-β signaling pathway.
In this study, large-scale RNA sequencing analysis has provided a wealth of information on the biologically relevant systems that are involved in the response to CEA. We identified key signaling pathways and proteins that are involved in CEA-induced colorectal cancer progression. This work represents a major step toward understanding the mechanism of colorectal cancer hepatic metastasis.
A successful design of novel therapeutic regimens to target CEA-producing cancers is only possible if the network-based pathways are correctly identified. Uncovering the underlying molecular mechanisms of CEA/CEAR signaling in angiogenesis and metastasis will lead to the development of new therapeutic possibilities and agents to suppress CEA expression and to prevent metastasis.