Mining, Validation, and Clinical Significance of Colorectal Cancer (CRC)-Associated lncRNAs

Background Colorectal cancer (CRC) is one of the deadliest tumours, but its pathogenesis remains unclear. The involvement of differentially expressed long non-coding RNAs (lncRNAs) in CRC tumorigenesis makes them suitable tumour biomarkers. Methods/Findings Here, we screened 150 cases of CRC and 85 cases of paracancerous tissues in the GEO database for differentially expressed lncRNAs. The levels of lncRNA candidates in 84 CRC and paracancerous tissue samples were validated by qRT-PCR and their clinical significance was analyzed. We identified 15 lncRNAs with differential expression in CRC tumours; among them, AK098081 was significantly up-regulated, whereas AK025209, BC040303, BC037331, AK026659, and CR749831 were down-regulated in CRC. In a receiver operating characteristic curve analysis, the area under the curve for the six lncRNAs was 0.914. High expression of AK098081 and low expression of BC040303, CR749831, and BC037331 indicated poor CRC differentiation. CRC patients with lymph node metastasis had lower expression of BC037331. In addition, the group with high AK098081 expression presented significantly lower overall survival and disease-free survival rates than the low-expression group, confirming AK098081 as an independent risk factor for CRC patients. Conclusion/Significance In conclusion, we have identified multiple CRC-associated lncRNAs from microarray expression profiles that can serve as novel biomarkers for the diagnosis and prognosis of CRC.


Introduction
Colorectal cancer (CRC) is the third most common malignant cancer, causing more than 693,900 deaths worldwide each year [1]. In China, CRC is the fourth most common malignant tumour, with a progressively increasing incidence [2]. With recent advances of comprehensive CRC treatment, more than 90% of patients with early-stage CRC can now be cured. However, in most CRC patients, the tumour has already developed to an advanced stage by the time it is diagnosed, thus severely reducing the five-year survival rate [3]. The identification of new molecular regulatory mechanisms in CRC tumorigenesis and progression could result in novel therapeutic targets for the diagnosis and treatment of CRC.
In the post-genomic era, transcriptomics studies have gradually gained importance. More than 98% of genome transcripts are non-coding RNAs (ncRNAs) that do not encode for proteins. Among them, long non-coding RNAs (lncRNAs) are ncRNAs of more than 200 nucleotides. LncRNAs can regulate physiological and pathological processes at epigenetic, transcriptional, and post-transcriptional levels [4]. Numerous studies have shown that lncRNAs are involved in cell proliferation, differentiation, apoptosis, metastasis, and many other processes. Given their significant role in tumorigenesis and development, they have become of interest in cancer research. Liu et al.reported that over-expression of NF-κB-interacting lncRNA-NKILA suppressed breast cancer metastasis [5]. Zhang et al. found that overexpression of lncRNA-TUG1 promoted the growth of gastric cancer cells and was associated with poor prognosis of this disease [6]. In addition, lncRNA-FEZF1-AS1 has been found to mediate CRC tumorigenesis and progression [7], whereas lncRNA-CASC11 appears to be upregulated in CRC, where it promotes proliferation and metastasis [8]. Considering that lncRNAs closely correlate with the clinical characteristics of cancer patients, they can be used as diagnostic and prognostic tumour indicators [9,10].
The screening of relevant lncRNAs in tissue samples is mainly based on lncRNA microarrays [11]. To date, the amount of open-access lncRNA microarrays for CRC samples is limited. Over the past 20 years, differentially expressed genes in CRC have been detected using microarray expression profiles. These data are now deposited in public databases such as the Gene Expression Omnibus (GEO). Recent studies have shown that these data sets encompass a large number of specific probes, which are now considered as lncRNAs [12]. In this study, we systematically searched the GEO database for differentially expressed lncRNAs in CRC and paracancerous tissues. The levels of lncRNAs in CRC and paracancerous tissues were validated by quantitative real-time polymerase chain reaction (qRT-PCR) and correlated with the clinicopathological features of CRC.

Mining of lncRNA data from microarrays
We first searched the GEO database (http://www.ncbi.nlm.nih.gov/gds/) for all HGU133-PLUS2.0 microarray data related to CRC. Data were revised to exclude instances of less than 10 cancerous and paracancerous samples per group. IQRray_score [13] and presence/absence call [14] were used to ensure the overall quality of microarray data and exclude any abnormal specimens. A platform called GATExplorer [15] was used to analyze microarray data at the nucleotide level in a genomic context and distinguish lncRNA candidates.The standards for lncRNAs in GATExplorer were:(i) there was experimental (i.e. EST, cDNA, RT-PCR and/or northern blot) evidence to support their existence as RNAs; (ii) they did not contain a significant ORF (i.e. <100 amino acids); (iii) they were not annotated as rRNAs, tRNAs, snoRNA, miRNA and spliceosomal RNAs; and (iv) they were mammalian [16]. Mircoarray-meta analysis [17] was applied to identify differentially expressed lncRNAs in the cancerous and paracancerous tissues.

Tissue specimens
Pathologically confirmed tissue specimens from 84 patients who had not received radiation or chemotherapy before surgery were collected at The First Affiliated Hospital of Wenzhou Medical University between February 2004 and December 2008. The specimens were obtained from surgically removed cancerous and corresponding paracancerous normal tissues, and stored in liquid nitrogen with long-term follow-up records. Paracancerous normal tissues means the normal mucosa tissue witch is more than 5cm away from the edge of cancer. The study was approved by the Research Ethics Committee of The First Affiliated Hospital of Wenzhou Medical University. Written informed consent was obtained from all patients. Patient information can be found in S1 Table. The overall survival time was calculated from the date of surgery to death. Disease-free survival time was calculated from the date of diagnosis to the date of surgery and recurrence; if recurrence was not diagnosed, then the date of death or the last followup was used as a reference.

5'/3'rapid amplification of cDNA ends (RACE)
Total RNA was extracted from frozen specimens using TRIzol (Life Technologies, Carlsbad, CA, USA) according to the manufacturer's instructions. RNA concentration was determined using a UV-visible spectrophotometer (NanoDrop ND-1000). Oligo dT-3' Adaptor primers from the 3'-Full Race Core set (TaKaRa, Shiga, Japan) were used for reverse transcription of total RNA into cDNA. To obtain the lncRNA 3'-end fragment, nested PCR was performed using a set of nested primers, consisting of a specific primer pair (S2 Table) and the 3'RACE-Out and 3'RACE-Inner primers from the kit. Total RNA was dephosphorylated and "decapitated" as instructed in the 5'-Full RACE Kit (TaKaRa), followed by ligation of the 5' Adaptor primer and reverse transcription. To obtain the lncRNA 5'-end fragment, a nested PCR was performed using a set of nested primers, consisting of a specific primer pair (S2 Table) and 5'RACE-Out and 5'RACE-Inner primers from the kit. All fragments obtained were sent to the Beijing Genomics Institute (Beijing, China) for sequencing after ligation in the pMD18-T vector (TaKaRa).

Quantitative real-time PCR assay
Total RNA (1μg) was used as template for reverse transcription using the ReverTraAce qPCR RT kit (Toyobo, Tokyo, Japan). LncRNA was quantified by RT-PCR using RNA-direct SYBR Green Real-time PCR Master Mix (Toyobo) and specific primers. Reactions were performed on a CFX96 Touch™ Real-Time PCR Detection System (Bio-Rad, Hercules, CA, USA) under the following conditions: initial step of 5 min at 95°C, followed by 40 amplification cycles of 15 s at 95°C, annealing for 32 s at a custom temperature, and a final step of 32 s at 72°C. Primers were obtained from the Beijing Genomics Institute. Primer sequences and annealing temperatures can be found in S2 Table. Each experiment was performed in triplicate.
construct the logistic regression model of lncRNA for prediction of CRC. Survival curves were analyzed using the Kaplan-Meier method and statistical significance was determined using the log-rank test. The significance of survival variables was analyzed by the Cox multivariate proportional hazards model. P<0.05 was considered statistically significant.
Expression of the lncRNA candidates in tissue specimens of CRC The 11 lncRNAs selected by the above-mentioned experiments were further analyzed by qRT-PCR in 84 CRC and corresponding paracancerous tissue samples. The expression of lncRNAs was normalized to that of glyceraldehyde 3-phosphate dehydrogenase (GAPDH; 2 -44Ct ). The melting curve and agarose gel electrophoresis of GAPDH and each lncRNA were presented in S3 Fig. AK098081 was significantly up-regulated in CRC compared with paracancerous tissues (P<0.05); AK025209, BC040303, BC037331, AK026659, and CR749831 were significantly down-regulated in CRC (P<0.05); whereas EU294757 and AK095500 showed no significant difference in cancerous and paracancerous normal tissues (P = 0.095 and P = 0.966, respectively) (Fig 2).

Diagnostic value of six lncRNAs as biomarkers of colorectal cancer
To determine whether AK098081, AK025209, BC040303, BC037331, AK026659, and CR749831 can be used as biological indicators for diagnosis of CRC, receiver operating characteristic (ROC) curve analysis was performed based on the expression level of lncRNAs in CRC and corresponding paracancerous normal tissues (represented as 2 -4Ct ). As shown in Fig 3, (Fig 3).
Correlation between lncRNA levels in cancer tissues and clinicopathological characteristics of patients with colorectal cancer Next, we analyzed the correlation between expression of the six lncRNAs (represented as 2 -44Ct ) and CRC patients' sex, age, tumour size, degree of differentiation, T stage, N stage, TNM stage, and serum carcinoembryonic antigen (CEA) levels. AK098081 was more highly expressed in poorly and moderately differentiated CRCs than in well-differentiated tumours (P = 0.032). The AUC of its ROC curve was 0.721 (95% CI = 0.517-0.924, P = 0.045, S6 Fig) with a cutoff value of 1.09, 78.8% sensitivity, 77.8% specificity, and Yonden index of 0.566. The low expression of BC040303 (P = 0.040), CR749831 (P = 0.022), and BC037331 (P = 0.037) indicated that the tumour was poorly differentiated. Moreover, BC037331 was poorly expressed in CRC patients with lymph node metastasis (P = 0.016), and its expression was negatively correlated with serum CEA levels in CRC patients (P = 0.035) ( Table 4). All clinicopathological data was presented in S3 Table. Association between lncRNA expression and patient survival The patients were divided into high-and low-expression groups based on the mean expression level of lncRNAs. LncRNAs with clinicopathological significance (AK098081, BC037331, BC040303 and CR749831) were subjected to overall survival and disease-free survival analyses. A Kaplan-Meier survival analysis showed that the group with high AK098081 expression group had significantly lower overall survival and disease-free survival rates than the lowexpression group (P = 0.004 and P = 0.030, respectively, log-rank test) (Fig 4). In contrast, there was no difference in overall survival and disease-free survival rates between the highexpression and low-expression groups for BC037331, BC040303, and CR749831 (S5 Fig). To determine the prognostic capacity of AK098081 in CRC patients, we applied the Cox multivariate proportional hazards model (Table 5). Patient age, degree of differentiation of the tumour, TNM stage, serum CEA levels and AK098081 expression level were closely associated with overall survival rate (P = 0.014, P = 0.000, P = 0.001, and P = 0.000, respectively). Thus, AK098081 expression can be considered an independent risk factor for CRC patients (HR = 1.896, 95% CI = 1.393-2.579, P = 0.000).

Discussion
LncRNAs are ncRNAs with a length of more than 200 nucleotides. They are involved at different levels in the regulation of physiological and pathological processes [21]. Moreover, Approximately 18% of lncRNAs are associated with tumours, whereas the proportion of tumourassociated coding genes is only 9% [22]. Hence, lncRNAs may play an important role in tumorigenesis and tumour progression [23]. Numerous studies have confirmed that lncRNAs are involved in the tumorigenesis and development of CRC via multiple pathways, such as DNA methylation and histone modifications [24,25]. These findings have posed the question of how to accurately screen for lncRNAs of interest within the vast amount of data already available. Compared to the traditional approach that integrated lncRNA microarrays with expression profiling, a new screening strategy enables the cost-effective integration of multiple microarrays with genome sequencing data [26]. Many of specific probes from previous expression  profiling studies have been identified as putative lncRNAs [27]. Here, we searched the GEO database for all CRC expression profiling data. Five microarrays were selected, which included CRC cases, colorectal adenoma cases, and normal tissues. CRC commonly arises from normal colorectal tissue that develops into adenomas [28]. Of 142 differentially expressed lncRNAs, 15 candidates were selected. These 15 lncRNA candidates were aligned to known lncRNAs in the NONCODE and LNCipedia databases, and 11 were identified as known lncRNAs. Interestingly, RACE experiments revealed that AK096164 lncRNA was a new transcript of LNC-EIF2C2-1. It is widely known that different lncRNA transcripts can be functionally related yet different from each other [29]. For instance, both CCAT-1 and CCAT-2 lncRNAs promote the proliferation of CRC, but CCAT-1 is an essential factor for chromosome looping at the MYC locus, whereas CCAT-2 promotes chromosomal instability [30,31].
Quantitative RT-PCR showed that some lncRNAs (AK098081) were significantly up-regulated, whereas others (AK025209, BC040303, BC037331, AK026659, and CR749831) were down-regulated in CRC. The AUC of the ROC curves for all of these lncRNAs was 0.65-0.86, with AK026659 having the highest value, 0.859. To further improve diagnostic efficiency, the data of multiple lncRNAs were innovatively integrated to construct a logistic regression model. The AUC of the CRC-predicting model was 0.914, with 86.5% sensitivity and 88.9% specificity, indicating it had a better diagnostic efficiency than detection of serum CEA in patients [32]. Several recent studies have shown that lncRNAs are closely associated with clinicopathological features and prognosis of patients with tumours. Yunlong et al. reported that NEAT1 was highly expressed in CRC compared with paracancerous tissues. Moreover, patients with high NEAT1 expression had a poor differentiation and a shorter tumour-free survivaltime [33]. Expression of lncRNA-HOTAIR in lymph node metastatic foci of CRC was reported to be higher than that in primary tumours, indicating a poor prognosis [34]. Similarly, lncRNA-ABH-D11-AS1 was found to be closely associated with tumour size and serum CEA in patients with gastric cancer [35]. In this study, the expression levels of AK098081, BC040303, CR749831, and BC037331 were associated with the degree of tumour differentiation (P = 0.032, P = 0.040, P = 0.022, and P = 0.037, respectively), whereas expression of BC037331 was negatively correlated with lymph node metastasis (P = 0.016) and serum CEA level in patients (P = 0.035). These  results suggest that expression levels of AK098081, BC037331, BC040303, and CR749831 are associated with poor prognosis in CRC patients. We also found that overall survival and diseasefree survival rates were lower in the group with over-expression of AK098081 than in the lowexpression group, confirming that AK098081 is an independent risk factor for CRC patients. Therefore, the lncRNAs identified in this study can be used as new biomarkers to predict and even improve the prognosis of CRC patients.
In summary, we have applied an innovative data mining approach to rapidly screen for differentially expressed lncRNAs in CRC using published gene expression profiling microarrays. We have also integrated multiple lncRNA expression data to construct a predictive model for CRC. This method enabled us to analyse the clinicopathological significance of lncRNAs and their effects on the prognosis of CRC patients. The present study has successfully demonstrated that the identified lncRNAs are potential diagnostic and even therapeutic targets in CRC.