Development and validation of a lipogenic genes panel for diagnosis and recurrence of colorectal cancer

Background & aim: Accumulating evidence indicates that elevated lipid metabolism is an essential step in colorectal cancer (CRC) development, and analysis of the key lipogenic mediators may help identify new, clinically useful prognostic gene signatures.
Methods: The expression of 61 lipogenic genes was compared between CRC tumors and matched adjacent normal tissues in a training set (n = 257) with the Mann-Whitney U test. Cox's proportional hazards model and the Kaplan–Meier method were used to identify a lipogenic-biomarker signature associated with CRC prognosis. The signature was then confirmed in two independent validation groups: a set of 223 CRC samples and an additional set of 203 colon adenocarcinoma (COAD) profiles retrieved from The Cancer Genome Atlas (TCGA).
Results: Five genes, ACOT8, ACSL5, FASN, HMGCS2, and SCD1, were significantly upregulated in CRC tumors. Using a cutoff value of 0.493, samples were classified as high risk or low risk. The AUCs of the panel for discriminating all, early (stages I-II), and advanced CRC (stages III-IV) were 0.8922, 0.8446, and 0.9162 in the training set; 0.8800, 0.8205, and 0.7351 in validation set I; and 0.9071, 0.8946, and 0.9107 in validation set II, respectively. There was an inverse correlation between a high predicted panel score and OS of CRC patients in the training set (HR (95% CI): 0.1096 (0.07089–0.1694), P < 0.001), validation set I (HR (95% CI): 0.3350 (0.2116–0.5304), P < 0.001), and validation set II (HR (95% CI): 0.1568 (0.1090–0.2257), P < 0.001).
Conclusion: Our study showed that the ACOT8/ACSL5/FASN/HMGCS2/SCD1 gene panel had better prognostic performance than validated clinical risk scales and is applicable for the early detection of CRC and tumor recurrence.


Introduction
According to global statistics, colorectal cancer (CRC) is currently the second most common cancer in women and the third most common in men [1]. About one million new CRC cases are diagnosed annually, and nearly half of these patients die within five years. The highest CRC incidence rates have been reported in developed countries, including Australia, the United States, and Canada [2]. Although CRC prevalence in Iran is not high and is mostly reported in middle-aged people, recent investigations indicate a growing trend of CRC in the younger population [3-5].
CRC mortality could be avoided if the cancer is diagnosed at an early stage. Therefore, tumor staging is an essential step in CRC management. Treatment of advanced CRC with high-grade dysplasia and invasive lesions often fails [6,7]. Moreover, about one-third of stage II CRC patients relapse within five years after tumor resection and die of metastasis [8]. Consequently, several investigations have sought novel biomarkers to improve the prediction of CRC progression [5,8].
Accumulating evidence indicates that enhanced lipid metabolism plays a crucial role in cancer development. Because of their high proliferative activity, cancer cells tend to synthesize the lipids they need de novo, which requires abnormal levels of lipogenic enzymes and signaling factors [9,10]. Thus, differences between the expression patterns of lipid-metabolic mediators in healthy and tumor cells could be considered a critical hallmark [11]. The current investigation was therefore designed to develop a diagnostic panel based on the dysregulation of lipogenic genes for the early detection of CRC and tumor recurrence.

Ethical approval
This study was conducted in accordance with the ethical principles of the World Medical Association's Declaration of Helsinki and was approved by the Medical Ethical Committee of RIGLD, Tehran, Iran (ethical code number: IR.RIGLD.1397-947). Written informed consent was obtained from all study subjects, or from their parent/legal guardian for participants under 18 years old. All methods were carried out in accordance with relevant guidelines.

Identification of lipid-metabolic related genes
The core list of lipid-metabolism-related genes was retrieved using DisGeNET, a Cytoscape plugin and bioinformatics platform that integrates gene-disease association data for various human disorders [12]. The following data sources were queried: (1) Online Mendelian Inheritance in Man (OMIM), (2) the Genetic Association Database (GAD) [13], (3) the Mouse Genome Database (MGD) [14], (4) the Comparative Toxicogenomics Database (CTD) [15], (5) PubMed, and (6) UniProt. Genes were ranked according to the number of sources, organism type, and the number of supporting publications.

… the variables between the target groups. The Mann-Whitney U test was used to compare gene expression levels. Spearman's test was used to analyze the relationship between the differential expression of the target RNAs and clinicopathologic characteristics. Receiver operating characteristic (ROC) curves were drawn to evaluate the value of the selected genes for CRC diagnosis. Cox's proportional hazards model was used for univariate and multivariate survival analyses, and the goodness of fit of the multivariate logistic regression models was assessed with the Hosmer-Lemeshow test. The Kaplan-Meier method was used to estimate patients' 5-year overall survival (OS). All data are presented as mean ± standard deviation (SD), and P < 0.05 was considered statistically significant.
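The ROC analysis and the Mann-Whitney U test described above are closely related: the AUC of a single marker equals the Mann-Whitney U statistic divided by the number of tumor-normal pairs. The Python sketch below illustrates this relationship using average ranks for ties. It is a hypothetical illustration with made-up expression values, not the software used in this study.

```python
from __future__ import annotations


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last position of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def mann_whitney_u(tumor: list[float], normal: list[float]) -> float:
    """U for the tumor group: number of (tumor, normal) pairs with tumor > normal, ties = 0.5."""
    ranks = average_ranks(tumor + normal)
    n_tumor = len(tumor)
    return sum(ranks[:n_tumor]) - n_tumor * (n_tumor + 1) / 2


def auc(tumor: list[float], normal: list[float]) -> float:
    """Area under the ROC curve when a higher value indicates tumor."""
    return mann_whitney_u(tumor, normal) / (len(tumor) * len(normal))


# Hypothetical expression values for one gene.
tumor_expr = [3.1, 2.8, 4.0, 2.2]
normal_expr = [1.0, 2.5, 1.7]
print(mann_whitney_u(tumor_expr, normal_expr))  # 11.0
print(round(auc(tumor_expr, normal_expr), 4))   # 0.9167
```

Working on ranks rather than on all pairs keeps the computation at O(n log n), which matters little for a few hundred samples but mirrors how the U statistic is usually computed.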

Patient characteristics
The study group consisted of 480 CRC tumor specimens with matched adjacent normal tissues, from 272 men and 208 women. Of these, 237 patients (49.37%) had early CRC (TNM stage I-II), and 243 patients (50.63%) had advanced CRC (TNM stage III-IV). Patients were divided into a training set (257 CRC tumors and matched normal samples) and validation set I (223 CRC tumors and matched normal samples). In addition, an independent validation set II of 253 TCGA-COAD profiles, comprising 203 patients (113 early and 90 advanced CRC) and 50 healthy individuals, was used for panel analysis. These target sets were statistically different with respect to the clinical variables. Additional details are shown in Table 1.

Analysis of lipogenic gene expression in CRC samples
To identify genes significantly dysregulated in CRC tissues, the expression of 61 genes involved in lipid metabolism was assessed by real-time PCR in 257 CRC samples and their matched normal samples in the training set. Using a fold change greater than 2 as the criterion, five upregulated genes were identified: ACSL5 (2.54-fold, P < 0.01), ACOT8 (3.27-fold, P < 0.01), FASN (3.19-fold, P < 0.01), SCD1 (2.36-fold, P < 0.01), and HMGCS2 (3.38-fold, P < 0.05). These lipogenic genes are involved in lipid transport and fatty acid (FA) activation, as well as in cellular signaling. Additional details are provided in S1 Table.

Establishment of the lipogenic gene panel
Table 2 shows the diagnostic performance of ACSL5, ACOT8, FASN, HMGCS2, and SCD1 as individual biomarkers for discriminating CRC tumors from normal tissue. All five genes were good predictors (AUC > 0.7), and ACSL5 achieved an AUC of 0.8131 (0.7506-0.8755).
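This section does not state how the fold changes were derived from the real-time PCR data. One common approach is the Livak 2^-ΔΔCt method; the Python sketch below assumes that method, a hypothetical reference (housekeeping) gene, and made-up Ct values, and then applies the fold change > 2 selection criterion. The Mann-Whitney U P value that accompanies each gene in the study is not computed here.

```python
from __future__ import annotations
from statistics import mean


def delta_ct(target_ct: list[float], reference_ct: list[float]) -> list[float]:
    """Per-sample delta Ct: target-gene Ct normalized to the (assumed) reference gene."""
    return [t - r for t, r in zip(target_ct, reference_ct)]


def fold_change_ddct(tumor_target, tumor_ref, normal_target, normal_ref) -> float:
    """2^-ddCt fold change of tumor relative to normal (assumed Livak method)."""
    ddct = mean(delta_ct(tumor_target, tumor_ref)) - mean(delta_ct(normal_target, normal_ref))
    return 2 ** (-ddct)


def select_upregulated(fold_changes: dict[str, float], threshold: float = 2.0) -> list[str]:
    """Genes whose fold change exceeds the threshold (> 2 in this study)."""
    return sorted(gene for gene, fc in fold_changes.items() if fc > threshold)


# Hypothetical Ct values: a lower Ct means higher expression.
fc_gene_x = fold_change_ddct([24.0, 25.0], [18.0, 18.0], [26.0, 27.0], [18.0, 18.0])
print(fc_gene_x)  # 4.0
print(select_upregulated({"GENE_X": fc_gene_x, "GENE_Y": 1.4}))  # ['GENE_X']
```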
To combine all five genes (ACOT8, ACSL5, FASN, HMGCS2, and SCD1) into a single risk score, we used a previously developed regression-based strategy for multiple biomarkers [21]. In summary, gene expression levels were log2-transformed to reduce variation between the values of each gene and then used to generate logistic regression coefficients. The risk score for each sample was calculated as the sum of per-gene scores, each obtained by multiplying the expression level of a gene by its corresponding coefficient (Risk score = ∑ logistic regression coefficient of gene Mi × expression level of gene Mi). Subjects were then divided into two groups using the median risk score as the cutoff.
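The risk-score construction described above can be sketched in Python as follows. The gene names, coefficients, and expression values are hypothetical placeholders (the fitted coefficients are given in Table 3), and samples whose score equals the median are assigned to the low-risk group, a tie-handling choice the text does not specify.

```python
from __future__ import annotations
import math
from statistics import median


def risk_score(expression: dict[str, float], coefficients: dict[str, float]) -> float:
    """Sum over genes of coefficient x log2(expression level)."""
    return sum(beta * math.log2(expression[gene]) for gene, beta in coefficients.items())


def split_at_median(scores: list[float]) -> list[str]:
    """Label scores above the median 'high' and all others 'low'."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


# Hypothetical coefficients and raw expression levels for two genes.
coefs = {"GENE_A": 0.5, "GENE_B": 1.0}
print(risk_score({"GENE_A": 4.0, "GENE_B": 8.0}, coefs))  # 4.0
print(split_at_median([1.0, 3.0, 2.0, 5.0]))  # ['low', 'high', 'low', 'high']
```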
To calculate the predicted probability of CRC detection by the five-gene lipogenic panel, a stepwise logistic regression model was built using the 257 CRC tumor specimens and matched adjacent normal tissues in the training set. The predicted probability of a cancer diagnosis was estimated from the logit model based on the five selected lipogenic genes (Table 3), and the equation logit(P) = 0.429 + 0.659 × ACOT8 + 0.084 × ACSL5 + 0.029 × FASN + 0.201 × HMGCS2 + 0.119 × SCD1 was used to create the ROC curve. Using 0.493 as the optimal cutoff value, the training set samples were divided into high-risk and low-risk groups for colon cancer. The combination of the measurements of these genes into a single risk score based on … (Fig 1A).
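Converting the reported logit equation into a predicted probability and applying the 0.493 cutoff can be sketched as below. The sketch assumes that the cutoff applies to the predicted probability P rather than to the logit, that the inputs are on the same (log2-transformed) scale used to fit the model, and that a probability equal to the cutoff counts as high risk; none of these details is stated explicitly in the text.

```python
import math

# Coefficients of the five-gene logit model as printed in the text (Table 3).
INTERCEPT = 0.429
COEFFICIENTS = {"ACOT8": 0.659, "ACSL5": 0.084, "FASN": 0.029, "HMGCS2": 0.201, "SCD1": 0.119}
CUTOFF = 0.493  # assumed to be a cutoff on the predicted probability P


def predicted_probability(expression: dict) -> float:
    """Inverse logit of the linear predictor: P = 1 / (1 + exp(-logit(P)))."""
    logit = INTERCEPT + sum(beta * expression[gene] for gene, beta in COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-logit))


def classify(expression: dict) -> str:
    """Assign the sample to the high-risk group when P reaches the cutoff."""
    return "high-risk" if predicted_probability(expression) >= CUTOFF else "low-risk"


# Hypothetical (log2-scale) expression values for one sample.
sample = {"ACOT8": -1.0, "ACSL5": 0.0, "FASN": 0.0, "HMGCS2": 0.0, "SCD1": 0.0}
print(round(predicted_probability(sample), 3))  # 0.443
print(classify(sample))  # low-risk
```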
To further examine the diagnostic performance of the panel, an independent group of 253 TCGA-COAD profiles (203 CRCs and 50 normal controls) was used as validation set II. The corresponding AUC for all CRCs (TNM stages I-IV) compared with the healthy group was 0.9071 (95% CI: 0.8719-0.9423; sensitivity: 80.69% and specificity:
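Sensitivity and specificity at a given cutoff follow from the confusion matrix of predicted versus true labels. The text calls 0.493 the optimal cutoff without naming the criterion; the Python sketch below assumes Youden's index (J = sensitivity + specificity - 1), a common choice, and uses made-up scores.

```python
from __future__ import annotations


def sensitivity_specificity(scores: list[float], labels: list[int], cutoff: float) -> tuple[float, float]:
    """labels: 1 = CRC, 0 = normal; a score >= cutoff is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def youden_cutoff(scores: list[float], labels: list[int]) -> tuple[float, float]:
    """Return (cutoff, J) maximizing Youden's J over the observed scores (first maximum wins)."""
    best_cutoff, best_j = scores[0], -1.0
    for c in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, labels, c)
        j = sens + spec - 1
        if j > best_j:
            best_cutoff, best_j = c, j
    return best_cutoff, best_j


# Hypothetical predicted probabilities and true labels.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
print(sensitivity_specificity(scores, labels, 0.5))
print(youden_cutoff(scores, labels))
```

Scanning only the observed scores is sufficient, because sensitivity and specificity change only when the cutoff crosses an observed value.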

Prognostic performance of lipogenic genes in CRC
Cox's proportional hazards model was applied to assess the prognostic value of lipogenic genes in CRC tumors, and clinical and histopathological features were also included in both univariate and multivariate analyses. In the univariate analysis, TNM stage and lymph node metastasis had the strongest prognostic value across the three independent target groups (Table 4). In addition to these clinical variables, age over 70 years was included as a non-modifiable risk factor in the multivariate analysis.
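For a single covariate, such as a high/low panel group coded 1/0, the hazard ratio from Cox's model can be obtained by maximizing the partial likelihood. The Python sketch below does this with Newton-Raphson and Breslow's handling of tied event times, and reports a Wald 95% CI as exp(β ± 1.96·SE). It is a hypothetical illustration with made-up data: it omits the multivariate adjustment for age, TNM stage, and lymph node metastasis and is not the statistical software used in the study.

```python
from __future__ import annotations
import math


def score_and_information(beta: float, times: list[float], events: list[int], x: list[float]) -> tuple[float, float]:
    """Score U(beta) and observed information I(beta) of the Cox partial log-likelihood."""
    u = info = 0.0
    for t_i, d_i, x_i in zip(times, events, x):
        if not d_i:
            continue  # censored subjects contribute only through the risk sets
        # Risk set: everyone still under observation at t_i (Breslow handling of ties).
        weights = [(math.exp(beta * x_j), x_j) for t_j, x_j in zip(times, x) if t_j >= t_i]
        s0 = sum(w for w, _ in weights)
        s1 = sum(w * xj for w, xj in weights)
        s2 = sum(w * xj * xj for w, xj in weights)
        u += x_i - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return u, info


def fit_cox_univariate(times, events, x, tol: float = 1e-10, max_iter: int = 50):
    """Return (beta, se, hazard ratio, (lower, upper) Wald 95% CI)."""
    beta = 0.0
    for _ in range(max_iter):
        u, info = score_and_information(beta, times, events, x)
        if info <= 0:
            raise ValueError("zero information: no events or no covariate variation")
        step = u / info  # Newton-Raphson update for a concave log-likelihood
        beta += step
        if abs(step) < tol:
            break
    _, info = score_and_information(beta, times, events, x)
    se = 1.0 / math.sqrt(info)
    ci = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))
    return beta, se, math.exp(beta), ci


# Hypothetical follow-up (months), death indicator, and panel group (1 = high).
months = [5, 9, 14, 20, 31, 40]
died = [1, 1, 1, 1, 1, 1]
high_group = [1, 1, 0, 1, 0, 0]
beta, se, hr, ci = fit_cox_univariate(months, died, high_group)
print(f"HR = {hr:.2f} (95% CI: {ci[0]:.2f}-{ci[1]:.2f})")
```

With perfectly separated groups (every high-group death preceding every low-group death), the partial-likelihood maximum lies at infinity and the iteration would not converge; real analyses usually rely on established packages that report this case.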
PLOS ONE

The prognostic value of the lipogenic gene panel was examined both unadjusted and adjusted for the clinical variables (age, TNM stage, and lymph node metastasis) in the multivariate analysis. As shown in Table 5, … (Table 5). Kaplan-Meier survival analysis was carried out to estimate the 5-year OS of CRC patients according to the expression of individual genes (Fig 4) and of the lipogenic gene panel (Fig 5). The median follow-up and 5-year OS of patients in the target groups were as follows: training set, 72 months and 87.4%; validation set I, 75 months and 89.1%; … (Fig 4L-4O). These data indicate that abnormal levels of ACOT8, ACSL5, FASN, HMGCS2, and SCD1, as independent prognostic factors, are correlated with worse clinical outcome in CRC patients.
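The 5-year OS values above are read from Kaplan-Meier curves. As an illustration, the product-limit estimator can be written in a few lines of Python. The follow-up times below are hypothetical, and subjects censored at an event time are kept in that time's risk set, which is the usual convention.

```python
from __future__ import annotations


def kaplan_meier(times: list[float], events: list[int]) -> list[tuple[float, float]]:
    """Product-limit estimate: (time, S(t)) at each distinct event time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        # Group all subjects (deaths and censored) that share this time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= leaving
    return curve


def survival_at(curve: list[tuple[float, float]], t: float) -> float:
    """Step-function lookup: S(t) from the last event time <= t."""
    s = 1.0
    for event_time, value in curve:
        if event_time > t:
            break
        s = value
    return s


# Hypothetical follow-up in months; 1 = death, 0 = censored.
months = [12, 24, 30, 48, 70]
status = [1, 0, 1, 1, 0]
curve = kaplan_meier(months, status)
print(round(survival_at(curve, 60), 4))  # 5-year OS estimate: 0.2667
```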

Discussion
Currently, CRC is staged histopathologically using the TNM classification system. However, because TNM staging lacks predictive accuracy, especially in early CRC, the time window for an appropriate therapeutic strategy may be lost. Thus, besides improving outcome prediction for early CRC, identifying new biomarkers with higher sensitivity and specificity will pave the way for choosing the best treatment at lower cost and risk for CRC patients.
The prognostic value of various gene expression signatures has been investigated in CRC during the last decade. For example, the 12-gene Oncotype DX assay, covering genes involved in cell cycle control, stromal response, and genotoxic stress, indicated a significant association between the recurrence score (RS), mismatch repair (MMR) deficiency, and tumor recurrence in over 1,700 stage II CRC patients [22]. Accordingly, RS was reported as an independent predictor of CRC recurrence, particularly for cases with 3 MMR-I tumors [22]. These data were in line with an earlier report of 1,436 stage II CRC patients in which RS was significantly correlated with the risk of recurrence beyond traditional clinical features [23]. The 18-gene ColoPrint assay classified stage I-III CRC tumors (n = 188) into low- and high-risk groups [24]; the panel identified low-risk patients with a significantly higher 5-year relapse-free survival rate than the remaining patients (87.6% vs. 67.2%, P < 0.05). Profiling of a 23-gene ColoGuideEx panel in Dukes' B CRC patients (n = 72) yielded an OS accuracy of 78% (sensitivity: 72%; specificity: 83%) and a statistically significant difference in disease-free time (P < 0.0001) between patients predicted to relapse and those predicted to remain disease-free [25]. In addition, Agesen et al. validated a ColoGuideEx panel of 13 genes for predicting tumor relapse in patients with stage II CRC [26]. In another study, a 32,000-cDNA microarray analysis was performed to identify molecular markers for accurate CRC staging [27]. The authors optimized a 43-gene set that predicted 3-year OS better than Dukes' staging (P < 0.05) and reported that their molecular staging classifier was more accurate than traditional clinical staging [27]. Although these reports are promising, the proposed gene panels consist of intracellular signaling mediators of minor biological significance, which might affect the interpretation of the results. Therefore, focusing on central biological processes, such as cellular metabolism, that have a high impact on cancer initiation and progression may yield potential biomarkers for CRC screening as well as new therapeutic strategies and targets.
Dysregulation of cellular signaling pathways is one of the important hallmarks of cancer. Given that abnormal activity of energy metabolism pathways, such as lipid metabolism, plays a distinctive role in cancer development, their expression and activity status have attracted researchers' interest for screening and therapeutic interventions. In line with previous studies, we examined the putative correlation between lipogenic gene signatures and outcome prediction in early CRC patients. Analysis of three independent cohorts indicated significant upregulation of ACOT8, ACSL5, FASN, HMGCS2, and SCD1 as the key dysregulated metabolic factors in the study population (Fig 6). Acyl-CoA thioesterase 8 (ACOT8) is a peroxisomal lipolysis-related enzyme that hydrolyzes fatty acyl-CoA into free fatty acids (FFAs) and coenzyme A (CoA) for β-oxidation. A potential role of ACOT8 in cancer development has been suggested by reports of its overexpression in hepatocellular carcinoma and ovarian cancer cells [28,29]. However, the possible prognostic role of ACOT8 has only been investigated in lymph node metastasis of lung adenocarcinoma, where ACOT8 upregulation was associated with poorer prognosis of lung cancer patients [30]. ACSL5 is a member of the long-chain acyl-CoA synthetase (ACSL) family that, unlike other ACSL members, promotes either β-oxidation [31] or triacylglycerol storage [32], depending on its cellular location. ACSL5 dysregulation has previously been reported in bladder cancer [33], breast cancer [33,34], glioma [35], glioblastoma [36], and pancreatic ductal adenocarcinoma [37]. However, the expression status of ACSL5 in CRC tumors remains unclear. While some studies reported that ACSL5 downregulation is associated with tumor development [33,38] or early tumor recurrence [39], other investigations indicated that ACSL5 overexpression plays a key role in colon cancer cell aggressiveness [40,41].
FASN, which catalyzes palmitate synthesis through the condensation of malonyl-CoA and acetyl-CoA, is perhaps the most studied member of our panel in oncology. RNAi-mediated downregulation of FASN significantly reduced lipid metabolism and triglyceride (TG) storage in LNCaP cells, which derive from a lymph node metastatic lesion of human prostatic adenocarcinoma [42]. Because tumor cell survival largely depends on FASN-mediated de novo FA synthesis, targeting FASN has been suggested as a suitable therapeutic strategy for human cancers [43]. Another panel gene with a rate-limiting role in lipid metabolism was mitochondrial 3-hydroxy-3-methylglutaryl-CoA synthase (HMGCS2). Since cancer cells can use ketogenesis as an alternative energy source, constitutive expression of HMGCS2, which catalyzes the rate-limiting step of this pathway, is essential for tumor development [11,44]. HMGCS2 may also promote the metastasis of CRC and oral cancer cells, independently of its ketogenic enzymatic activity, via activation of the HMGCS2/PPARα/Src axis [45]. The last member of our panel was stearoyl-CoA desaturase 1 (SCD1), a key enzyme downstream of FASN that converts palmitate into monounsaturated FAs by catalyzing desaturation at the Δ9 position [46]. SCD1 expression has been reported to increase following activation of the PI3K-Akt-mTOR pathway in cancer cells [47], and SCD1 has therefore been investigated as a therapeutic target in a variety of human cancers, including colon [48,49], endometrial [50], glioblastoma [51], lung [52], and renal cell carcinoma [53].
Our study highlights the impact of ACOT8, ACSL5, FASN, HMGCS2, and SCD1 on lipid metabolism, along with their distinctive roles in cancer initiation and progression. To our knowledge, the combined expression signature of these genes has not previously been investigated in human cancers, and our study is the first to report this expression profile in tumors. Besides the diagnostic and prognostic value of the panel for early CRC, our investigation provides evidence that CRC tumors may benefit from ACOT8/ACSL5/FASN/HMGCS2/SCD1 axis activation to meet their structural and energetic demands while avoiding common lipotoxic effects, such as the overproduction of endogenous ceramide previously reported in CRC cells upon SCD1 inhibition [48].
To gain comprehensive insight, this investigation analyzed possible dysregulation across a wide range of lipogenic genes in CRC tumors. The large sample size (n = 683) allowed us to examine early and advanced CRC samples together for better visualization and to determine the sensitivity and specificity of the selected biomarkers. However, this study has some limitations. First, including all four tumor stages made the sample collection process non-randomized and non-blinded. Second, because samples were collected over 6 years, not all patients had been tested for cholesterol level and body mass index (BMI), data that could have strengthened our conclusions.

Conclusion
Taken together, our panel demonstrates better prognostic performance for the screening of early CRC and tumor recurrence than the validated clinical risk scales of the American Society of Clinical Oncology (ASCO). However, further investigations are needed to elucidate the mechanisms involved in ACOT8/ACSL5/FASN/HMGCS2/SCD1 axis activation.
Supporting information
S1 Table. Expression analysis of 61 lipogenic genes between CRC tumors and matched adjacent normal tissues in the training set (n = 257).