CHST4 might promote the malignancy of cholangiocarcinoma

Background Cholangiocarcinoma (CCA) is an aggressive cancer with high mortality, and no effective therapeutic target has yet been identified. Surgical resection is the main treatment, but only about one-third of patients with CCA are eligible for surgery, which contributes to the poor prognosis. It is therefore necessary to search for new therapeutic targets or core genes involved in the onset and progression of CCA.
Aim We used bioinformatics tools and several public biomedical databases to identify core genes of CCA, with the goal of finding an effective therapeutic target.
Methods First, three differentially expressed genes (DEGs) were identified with GEPIA, and CHST4 was selected as the core gene after examining their tissue distribution and expression. Genes correlated with CHST4 in CCA were then identified with UALCAN to construct a gene expression profile. A protein-protein interaction (PPI) network was built with the Search Tool for the Retrieval of Interacting Networks Genes (STRING), and core genes were screened with Cytoscape. Functional enrichment analyses were carried out, and the expression of CHST4 in human tissues and tumors was examined. Finally, a CCA animal model was established for qPCR and staining validation.
Results Three DEGs, CHST4, MBOAT4 and RP11-525K10.3, were obtained. All three were overexpressed in CCA samples compared with normal samples, and CHST4 showed the largest fold change and the most marked difference in expression. CHST4 was therefore selected as the core gene. In the PPI network, CHST4 had the highest degree of connectivity, suggesting a close association with CCA. Functional enrichment analysis showed that the PPI network genes were mainly enriched in cellular component (CC) terms, including Golgi lumen, extracellular space and extracellular region. CHST4 was expressed with high specificity in the bile duct, and its expression in CCA differed significantly from that in normal tissues. Overexpression of CHST4 was further verified in a rat model of thioacetamide (TAA)-induced CCA. Quantitative PCR (qPCR) showed that CHST4 was significantly overexpressed in tumor tissues, supporting its role as a core gene of CCA.
Conclusion CHST4 is overexpressed in CCA and merits further study as a target for intervention in CCA.


Introduction
Cholangiocarcinoma (CCA) is a malignant tumor originating from the biliary ducts [1,2]. It can be divided into intrahepatic cholangiocarcinoma (iCCA) and extrahepatic cholangiocarcinoma (eCCA) according to its anatomical location [3]. CCA accounts for about 3%-5% of all malignant tumors of the gastrointestinal system, and its incidence is rising [4,5].
There are many risk factors for CCA, such as bile duct cysts, chronic HBV and HCV infections, primary sclerosing cholangitis and liver cirrhosis [6,7]. CCA is an aggressive tumor with a poor prognosis and is one of the deadliest cancers [8]. Chemotherapy has limited efficacy, and average survival is less than one year. Surgical resection is the main route to a potential cure for CCA [9]. However, most patients with advanced disease cannot be treated surgically, and only about one-third of CCA patients are eligible for surgery [10]. The main factors behind the poor prognosis of CCA are late diagnosis and limited treatment options [11]. Therefore, there is still an urgent need to develop innovative treatments for CCA.
Gene expression is the process by which the information in a gene is used to synthesize a functional gene product. By comparing gene expression between tumor tissue and normal tissue, it is possible to find the genes that matter most in the growth and progression of cancer. Differentially expressed genes (DEGs) have been used to discover novel prognostic biomarkers in a variety of cancers. In a 2021 study, Long et al. analyzed the gene expression profiles and clinical features of 36 patients with CCA to identify DEGs. They identified 78 hub genes associated with tumor differentiation, 17 of which were considered prognostic biomarkers for CCA patients [12]. DEGs in CCA are therefore regarded as potential molecular therapeutic targets.
CHST4 is expressed in the pancreas, fallopian tube, gallbladder and liver according to the Human Protein Atlas [13]. CHST4 is a member of the GlcNAc6ST gene family. It catalyzes the transfer of sulfate to position 6 of nonreducing GlcNAc residues. The resulting 6-sulfated GlcNAc residues function as L-selectin ligands in mucin-associated glycans. These L-selectin ligands are abundant in high endothelial venules (HEVs), where they direct lymphocyte homing [14].
Previous studies have identified a role for this gene in hepatocellular carcinoma. In addition, mounting evidence indicates that the expression level of CHST4 is closely linked to the differentiation, development and progression of various malignant cancers. A study by Zhang et al. implies that CHST4 may recruit immune cells, including dendritic cells, CD4+ T cells, neutrophils and macrophages, into the tumor microenvironment, which prevents hepatitis B virus-related hepatocellular carcinoma (HBV-HCC) tumors from progressing [15]. In another study of the gene expression profiles of human intrahepatic cholangiocarcinoma, Jinawath et al. reported an abnormal increase in the expression level of CHST4, suggesting that CHST4 plays an important role in CCA [16].
However, the link between this gene and CCA has not been explored in depth [17], and the precise function and prognostic value of CHST4 in CCA remain unclear. In this study, we sought to identify effective therapeutic targets and prognostic markers for patients with CCA. CHST4 was identified as a potential therapeutic target gene, and follow-up experiments were carried out.

Processing of differentially expressed genes
The Gene Expression Profiling Interactive Analysis (GEPIA) tool was developed by Zhang's Lab of Peking University (http://gepia.cancer-pku.cn/index.html). It is an online tool for analyzing RNA sequencing expression data from 9,736 tumor samples and 8,587 normal samples, and it can be used to identify differentially expressed genes (DEGs). In our study, we used GEPIA to identify DEGs between normal and CCA samples. We filtered the DEGs according to the percentage cutoff and |log2 FC|. Genes with |log2 FC| ≥ 3.0 and a percentage cutoff of 0.9 were defined as DEGs. We then examined the gene expression profile and tissue distribution of the DEGs.
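To make the fold-change criterion concrete, the following is a minimal Python sketch of a |log2 FC| filter. It assumes that log2 FC is computed as the difference between the median log2(TPM + 1) values of tumor and normal samples; that definition, the function names and the toy expression values are illustrative assumptions, not data or code from this study. The percentage cutoff is not modeled, because its exact definition is not given in the text.

```python
import math
from statistics import median

LOG2FC_CUTOFF = 3.0  # |log2 FC| threshold stated in the text


def log2_fold_change(tumor_tpm, normal_tpm):
    """Median log2(TPM + 1) in tumor minus median log2(TPM + 1) in normal.

    Assumed definition for illustration only.
    """
    tumor = median(math.log2(x + 1) for x in tumor_tpm)
    normal = median(math.log2(x + 1) for x in normal_tpm)
    return tumor - normal


def select_degs(expression, cutoff=LOG2FC_CUTOFF):
    """Return {gene: log2FC} for genes whose |log2FC| meets the cutoff."""
    selected = {}
    for gene, (tumor_tpm, normal_tpm) in expression.items():
        fc = log2_fold_change(tumor_tpm, normal_tpm)
        if abs(fc) >= cutoff:
            selected[gene] = fc
    return selected


# Hypothetical TPM values (tumor samples, normal samples); not study data.
toy_expression = {
    "GENE_A": ([63, 127, 255], [1, 3, 7]),    # strongly up in tumor
    "GENE_B": ([7, 15, 31], [3, 7, 15]),      # small change, filtered out
    "GENE_C": ([0, 1, 3], [15, 31, 63]),      # strongly down in tumor
}

degs = select_degs(toy_expression)
print(degs)
```

In this toy example GENE_A and GENE_C pass the filter and GENE_B does not. Taking the absolute value keeps both up-regulated and down-regulated genes, which matches the |log2 FC| notation used in the text.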

Correlated genes of CHST4
UALCAN is a comprehensive, interactive web resource built with PERL-CGI, CSS and JavaScript, with high-quality graphics, that allows researchers to collect information and data about genes of interest. We used UALCAN to find genes correlated with CHST4 in CCA. The Pearson correlation coefficient was calculated, and the genes most strongly associated with CHST4 were screened. Gene expression in CCA by sample type was also obtained from UALCAN.
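As an illustration of the correlation step, here is a minimal sketch of how a Pearson correlation coefficient can be computed between the expression of CHST4 and that of other genes, and how candidates can then be ranked. UALCAN performs this calculation internally; the function names, gene labels and expression values below are hypothetical and are not taken from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def rank_correlated(target, candidates):
    """Return (gene, r) pairs sorted from most positive to most negative r."""
    scores = [(gene, pearson_r(target, values)) for gene, values in candidates.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical expression values across five samples; not study data.
chst4 = [1.0, 2.0, 3.0, 4.0, 5.0]
others = {
    "GENE_P": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises with CHST4
    "GENE_N": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls as CHST4 rises
    "GENE_U": [3.0, 1.0, 4.0, 1.0, 5.0],   # weakly related
}

ranking = rank_correlated(chst4, others)
for gene, r in ranking:
    print(f"{gene}: r = {r:.3f}")
```

The top of the ranking gives the most positively correlated genes and the bottom gives the most negatively correlated ones, so both lists can be read from a single sort.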

PPI network establishment and central genes identification
The Search Tool for the Retrieval of Interacting Networks Genes (STRING) database (http://string-db.org/) was used to build the protein-protein interaction (PPI) network. We set the minimum required interaction score for PPI pairs to 0.4. To explore possible interactions, we entered CHST4, the core gene identified above, into the STRING database. We then used Cytoscape (www.cytoscape.org/) to adjust and refine the PPI network. We also calculated the connectivity degree of each node, so that protein nodes with more connections received higher scores. The scores were calculated with the cytoHubba plugin of Cytoscape. Finally, we identified the three genes with the highest connectivity degree as central genes.
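The scoring idea can be sketched in a few lines of Python. This is only an illustration of degree-based ranking after applying a minimum interaction score, not the cytoHubba implementation; the node names, edge scores and function name are hypothetical.

```python
from collections import Counter

MIN_SCORE = 0.4  # minimum interaction score stated in the text


def rank_by_degree(edges, min_score=MIN_SCORE, top_n=3):
    """Count connections per node over edges that meet the score threshold.

    Returns the top_n (node, degree) pairs, highest degree first;
    ties are broken alphabetically so the result is deterministic.
    """
    degree = Counter()
    for node_a, node_b, score in edges:
        if score >= min_score:
            degree[node_a] += 1
            degree[node_b] += 1
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical interaction pairs (node, node, combined score); not study data.
toy_edges = [
    ("A", "B", 0.90),
    ("A", "C", 0.70),
    ("A", "D", 0.50),
    ("B", "C", 0.60),
    ("B", "D", 0.30),  # below the threshold, ignored
    ("D", "E", 0.45),
]

hub_nodes = rank_by_degree(toy_edges)
print(hub_nodes)
```

Here node A has three retained connections and ranks first, followed by B and C with two each. cytoHubba offers several other topological scores as well; degree is used here because it is the measure named in the text.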

GO and KEGG pathway analysis
Gene Ontology (GO) analysis describes gene function in three categories: biological process (BP), cellular component (CC) and molecular function (MF). The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a widely used database resource for understanding biological systems and their high-level functions from molecular-level information, especially large-scale datasets generated by genome sequencing. We used the Database for Annotation, Visualization and Integrated Discovery (DAVID) v6.8 (https://david.ncifcrf.gov/), an online bioinformatics resource that provides functional annotation tools for genes and proteins, to perform GO and KEGG pathway enrichment analyses of the PPI network genes. We also used the WEB-based Gene SeT AnaLysis Toolkit (WebGestalt), which supports three enrichment methods, Over-Representation Analysis (ORA), Gene Set Enrichment Analysis (GSEA) and Network Topology-based Analysis (NTA), to repeat the GO and KEGG pathway enrichment analyses of the PPI network genes.

Expression of CHST4 in human tissues and tumors
We used the UALCAN online tool to compare the expression of CHST4 in various tumors and normal samples. The Human Protein Atlas (HPA) is a Swedish-based program that aims to map all human proteins by integrating different omics technologies, such as mass spectrometry-based proteomics, transcriptomics, antibody-based imaging and systems biology (https://www.proteinatlas.org). We obtained CHST4 expression data in cells and tissues from HPA.

Experimental protocol
We used 108 male Sprague-Dawley (SD) rats (290-310 g) and divided them into two groups: the experimental group (n = 54) and the control group (n = 54). The animals were housed under a 12-hour light cycle (07:00-19:00) at an ambient temperature of 21 ± 1 degrees Celsius, with free access to food and water. Rats in the experimental group received 300 mg/L thioacetamide (TAA) in their drinking water until they were sacrificed [17]. We monitored the rats for signs of pain during the study and, when necessary, used buprenorphine to lessen their distress. We harvested six animals per week (three from the experimental group and three from the control group) to observe the effect of TAA and build the cholangiocarcinoma animal model. The rats were weighed weekly to track changes in body weight.

Collection and hematoxylin-eosin staining of the liver
We used isoflurane inhalation to deeply anesthetize the animals before surgery. The rats were placed in a gas-tight anesthesia chamber and exposed to 2% isoflurane (1-1.5 minimum alveolar concentration) in O2 at 4 L/min for 10 minutes. Once the animals were unconscious and showed little response to tail pinch, we removed them from the chamber. The rats were allowed to breathe room air spontaneously, and heating pads were used to keep their rectal temperature at 37 degrees Celsius. After midline laparotomy, we explored and carefully examined all liver lobes to determine the presence of CCA. Midline laparotomy and hepatectomy result in the death of anesthetized rats with less distress. We fixed the liver samples in 10% formalin, embedded them in paraffin and cut sections at 4 μm thickness. The sections were then stained with hematoxylin-eosin and Masson staining. Images of CCA cells were captured on an Olympus VS-120 virtual digital slide scanning system with a 40x objective. We used the automatic scanning mode to scan all slides (1-5 slides) on the stage. Multi-point intelligent focusing generated focus maps that compensated for differences in slide flatness. The resulting digital images were used for observation and analysis.

Extraction of RNA and real-time quantitative polymerase chain reaction
We extracted total RNA with TRIzol reagent (Invitrogen), strictly following the manufacturer's instructions. We used the SuperScript II Reverse Transcriptase Kit (Invitrogen) to reverse transcribe RNA from CCA samples and normal samples into cDNA. Each reaction contained 10 pM of each primer and 1 μl of reverse transcription product, added according to the manufacturer's instructions, in a total volume of 25 μl. Relative CHST4 expression was calculated as 2^(−ΔCt), where ΔCt = Ct of CHST4 minus Ct of glyceraldehyde-3-phosphate dehydrogenase (GAPDH).
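The relative-expression formula above can be written as a short worked example. The Ct values below are hypothetical and chosen only to make the arithmetic easy to follow; the function name is likewise illustrative.

```python
def relative_expression(ct_target, ct_reference):
    """Relative expression as 2^(-dCt), with dCt = Ct(target) - Ct(reference)."""
    delta_ct = ct_target - ct_reference
    return 2 ** (-delta_ct)


# Hypothetical Ct values; not study data.
# Target = CHST4, reference = GAPDH.
normal_expr = relative_expression(ct_target=24.0, ct_reference=20.0)  # dCt = 4
tumor_expr = relative_expression(ct_target=22.0, ct_reference=20.0)   # dCt = 2

# Ratio of tumor to normal relative expression.
fold_difference = tumor_expr / normal_expr

print(normal_expr, tumor_expr, fold_difference)
```

Because Ct is on a log2 scale, a target that crosses the threshold two cycles earlier in tumor than in normal tissue (with the same reference Ct) corresponds to a fourfold higher relative expression.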

Ethics approval
The study protocol was approved by the Ethics Committee of the School of Basic Medical Sciences of Shandong University (ECSBMSSDU2021-2-82). All procedures were performed in accordance with the Declaration of Helsinki and the ARRIVE guidelines. Informed consent was not applicable.

Identification of DEGs
Based on the percentage cutoff and |log2 FC|, three differentially expressed genes (DEGs) were obtained (Table 1). All three were overexpressed in tumor tissues compared with normal tissues, and CHST4 showed the largest fold change. The gene expression profile and tissue distribution of the DEGs were then obtained from GEPIA (Fig 1). In the gene expression profiles, the difference in expression between tumor and normal samples was most significant for CHST4. In terms of tissue distribution, CHST4 had the highest expression in the gallbladder, followed by the pancreas, while expression in other parts of the body was relatively low, indicating very high tissue specificity. The other two DEGs showed different distribution patterns. MBOAT4 is mainly expressed in the blood, with low expression in other parts of the body. RP11-525K10.3 is highly expressed in normal testis and markedly reduced in tumor tissue. Therefore, CHST4 was selected as the core gene for subsequent analyses.

Correlated genes of CHST4 and gene expression profile
Genes correlated with CHST4 were identified from the Pearson correlation coefficient, calculated from the expression values (TPM) of CHST4 and of other genes in CCA. Genes with extremely low expression (median TPM < 0.5) were filtered out. Table 2 shows the top 4 genes positively correlated with CHST4 in CCA and the top 3 genes negatively correlated with CHST4 in CCA. The expression pattern of the genes correlated with CHST4 is shown in Fig 2. The expression correlations between CHST4 and the top 4 positively correlated and top 3 negatively correlated genes in CCA are shown in Fig 3. The expression differences of CHST4, CLIP2, ELOVL7, PDGFD, MANEAL, CDKAL1, SFPQ and GUSBL2 were all statistically significant (Fig 4).

PPI network construction and hub genes identification
To build the protein-protein interaction (PPI) network, we used the Search Tool for the Retrieval of Interacting Networks Genes (STRING). We used Cytoscape 3.8.2 to refine the network (S1 Data). The PPI network contained 11 nodes and 46 edges, giving an average node degree of 8.36 (Fig 5). The connectivity of each node was then ranked, and the top 3 genes were identified (Table 3).
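The reported average node degree follows directly from the node and edge counts: in an undirected network each edge contributes to the degree of two nodes, so the average degree is 2E/N. The short check below uses only the counts given in the text; the function name is illustrative.

```python
def average_node_degree(n_nodes, n_edges):
    """Average degree of an undirected network: each edge touches two nodes."""
    return 2 * n_edges / n_nodes


# Counts reported in the text for the PPI network.
N_NODES = 11
N_EDGES = 46

avg_degree = average_node_degree(N_NODES, N_EDGES)

# Maximum number of edges among 11 nodes, for comparison.
max_edges = N_NODES * (N_NODES - 1) // 2

print(round(avg_degree, 2), max_edges)
```

With 46 of a possible 55 edges present, the network is close to fully connected, which is worth keeping in mind when interpreting degree-based rankings in such a small network.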

Functional enrichment analysis of PPI networks genes
Functional analyses of the PPI network genes were conducted with the DAVID tools (Table 4). We considered terms with gene counts ≥ 4 and FDR < 0.05 to be statistically significant. In the GO analysis, the genes were principally enriched in CC terms, including Golgi lumen, extracellular space and extracellular region. In the BP category, the genes were enriched in O-glycan processing. In the MF category, the genes were enriched in extracellular matrix constituent, lubricant activity. The KEGG analysis showed no significant pathway enrichment. Similar results were obtained with the WebGestalt tool for KEGG and GO enrichment analysis (Fig 6).

CHST4 expression in normal tissues and various tumors
The expression of CHST4 in normal cells and normal tissues was examined with HPA (Fig 7). CHST4 appears to be expressed with high specificity in the bile duct. The expression of CHST4 in tumor and normal samples from the TCGA database, obtained from UALCAN, is shown in Fig 7; the expression of CHST4 in cholangiocarcinoma was significantly different from that in normal bile ducts.

Establishment of cholangiocarcinoma animal model
A rat model of cholangiocarcinoma was established by giving the animals drinking water containing TAA. During the feeding period, the difference in body weight between the control group and the experimental group became increasingly obvious over time. In normal liver, few bile ducts were found around the blood vessels. After 8 weeks of TAA water, slight bile duct hyperplasia and fatty degeneration of liver cells were noticed. After 10 weeks, a low level of bile duct hyperplasia and fatty degeneration of liver cells was observed. After 12 weeks, bile duct hyperplasia and fatty degeneration of hepatocytes were obvious. After 14 weeks, there was extensive proliferation of bile duct cells. After 15 weeks, the proliferation of bile duct cells was obvious and the deformation of hepatocytes was more pronounced. After 16 weeks, fibrosis appeared in the liver and a number of rats developed CCA. After 17 weeks, almost all rats had developed tumors and liver cells had become necrotic (Fig 8). After 18 weeks, all rats had developed tumors. Fig 9 shows that there was a small amount of fibroplasia at the fifth week and a large amount of fibroplasia at the 16th week. The CCA samples were taken from Sprague-Dawley (SD) rats given thioacetamide (TAA) water for 18 weeks. Real-time quantitative polymerase chain reaction showed that both CHST4 and CK19 were highly expressed in tumor tissues with statistical significance, supporting the role of CHST4 as the core gene of CCA (Fig 10, S2 Data).

Discussion
Bile duct cancer, also known as cholangiocarcinoma (CCA), originates from the bile duct cells [18]. CCA is the second most common primary liver tumor after hepatocellular carcinoma and one of the deadliest malignant tumors in humans [19]. It is characterized by late diagnosis, poor prognosis and a low response to existing treatments [20]. At the same time, the incidence of CCA has increased significantly, especially in Western countries. CCA is often painless but lethal, so it is usually diagnosed at an advanced stage, when surgery is no longer an option [21]. Surgical resection is the only effective treatment option, but it is suitable for only 20% of cases [22]. Patients who are not suitable for surgery receive palliative treatment with gemcitabine combined with cisplatin (GEMCIS). The high recurrence rate after tumor resection and resistance to chemotherapy lead to an extremely poor prognosis for CCA. Novel and effective therapeutic targets for CCA are therefore needed [23]. CHST4 (carbohydrate sulfotransferase 4, GlcNAc6ST2), a member of the CHST protein family, catalyzes the transfer of sulfate. Its sulfated products act as L-selectin ligands, which are present in high endothelial venules (HEVs) [24,25]. CHST4 thus contributes to lymphocyte homing [26,27]. Previous reports have indicated that CHST4 is preferentially expressed in hepatocellular carcinoma (HCC) and that it may serve as a prognostic marker for HCC patient survival, with higher expression predicting shorter time-to-relapse (TTR) and overall survival (OS). In HBV-HCC, the promoter of CHST4 was found to be hypermethylated [15]. Subsequent research showed that CHST4 is highly expressed in various solid cancers. CHST4 is highly expressed in mucinous adenocarcinomas, in which core 1- and core 2-based glycans were modified by CHST4 through the addition of sulfate, but not in nonmalignant tissues [28].
In the serum of patients with uterine cervical cancer and with uterine corpus cancer, CHST4 gave higher positive rates than cancer antigen 125 (CA125), while in ovarian cancer it gave lower positive rates than CA125 [29]. CHST4 may therefore also be a good marker for early-stage uterine cervical and corpus cancer. CHST4 is also implicated in tumors through the ectopic expression of MECA-79, a novel diagnostic marker [30]. Altered expression of CHST4 has been found in glioma, where its mutation and amplification are correlated with poor prognosis [31]. Moreover, in Opisthorchis viverrini-associated ICC, the expression of genes involved in xenobiotic metabolism, including CHST4, is elevated [16]. Collectively, these findings suggest that CHST4 is an enzyme associated with different tumors and may be useful as a biomarker for a variety of cancers.
In this study, we tried to identify potential therapeutic targets and prognostic indicators for patients diagnosed with CCA. We identified differentially expressed genes (DEGs) between tumor and normal tissue on the Gene Expression Profiling Interactive Analysis (GEPIA) website. Then, the protein-protein interaction (PPI) network was constructed with the Search Tool for the Retrieval of Interacting Networks Genes (STRING) database, and the top three central genes (CHST4, MUC16 and MUC5AC) were screened with Cytoscape (3.8.2). MUC16, also called CA125, has been reported as a predictive marker related to surgical resection of CCA [32]. Serum MUC5AC concentration is a potential diagnostic marker of CCA and might be used as a surrogate marker in its diagnosis [33]. MUC5AC is closely linked to the development of cancers, especially hepatocellular carcinoma and cholangiocarcinoma [34].
We performed Gene Ontology functional analysis and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analysis with the WEB-based Gene SeT AnaLysis Toolkit (WebGestalt) and DAVID. We then built an animal model of cholangiocarcinoma and used quantitative real-time PCR (qPCR) to test whether the gene had potential therapeutic value. In conclusion, this study explored potential therapeutic targets of CCA through bioinformatics techniques and a cholangiocarcinoma animal model. The identified gene, CHST4, may be useful for the research and development of new clinical drugs to address the low survival rate of CCA patients.