Abstract
Regular screening for colorectal cancer (CRC) is critical for early detection and long-term survival. Despite the current screening options available and advancements in therapies, there will be around 53,000 CRC-related deaths this year. There is great interest in non-invasive alternatives such as plasma cell-free RNA (cfRNA) for diagnostic, prognostic, and predictive applications. In the current study, our aim was to identify and validate potential cfRNA candidates to improve early CRC diagnosis. In phase 1 (n = 49; 25 controls, 24 cancers), discovery total RNA sequencing was performed. Select exons underwent validation in phase 2 (n = 73; 35 controls, 29 cancers, 9 adenomas) using targeted capture sequencing (n = 10,371 probes). In phase 3 (n = 57; 30 controls, 27 cancers), RT-qPCR was performed on previously identified candidates (n = 99). There were 895 exons that were differentially expressed (325 upregulated, 570 downregulated) among cancers versus controls. In phases 2 and 3, fewer markers were validated than expected in independent sets of patients, most of which were from previously published literature (FGA, FGB, GPR107, CDH3, and RPL23AP7). In summary, we optimized laboratory processes and data analysis strategies which can serve as a methodological framework for future plasma RNA studies beyond just the scope of CRC detection. Additionally, further exploration is needed in order to determine if the few cfRNA candidates identified in this study have clinical utility for early CRC detection. Over time, advancements in technologies, data analysis, and RNA preservation methods at time of collection may improve the biological and technical reproducibility of cfRNA biomarkers and enhance the feasibility of RNA-based liquid biopsies.
Citation: Northrop-Albrecht EJ, Wu CW, Berger CK, Taylor WR, Foote PH, Doering KA, et al. (2024) An investigation of plasma cell-free RNA for the detection of colorectal cancer: From transcriptome marker selection to targeted validation. PLoS ONE 19(8): e0308711. https://doi.org/10.1371/journal.pone.0308711
Editor: Alvaro Galli, CNR, ITALY
Received: March 26, 2024; Accepted: July 29, 2024; Published: August 15, 2024
Copyright: © 2024 Northrop-Albrecht et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Most data are presented within the paper and its Supporting Information files. However, Mayo Clinic and Exact Sciences have a Sponsored Research Contract and Intellectual Property Development Agreement, the terms of which preclude us from disseminating raw sequencing data publicly. Data and materials restrictions have been put in place by the Mayo Clinic Institutional Review Board and Legal Contracts Administration, the contact for which is Randall Jones (jones.randall@mayo.edu). Additional data restrictions may be in place from the study sponsor, Exact Sciences, who supported this work under a Sponsored Research Contract, and with whom Mayo Clinic has an Intellectual Property Development Agreement. In addition to required approval from the Mayo Clinic Institutional Review Board (507-266-4000), a Data Use Agreement (DUA) will be put in place by Mayo Clinic Legal Contracting Administration with any academic group or scientists before any transfer of data. Investigators receiving the data will be required to abide by the conditions of these agreements.
Funding: This work was supported by grant number R37 CA214679 (JBK) from the National Institutes of Health. Sequencing costs and reagents were provided under a sponsored research agreement between Mayo Clinic and Exact Sciences. This publication also used shared core resources of the Mayo Clinic Comprehensive Cancer Center, supported by Grant Number P30 CA015083 from the National Cancer Institute. Its contents are solely the responsibility of the authors and do not necessarily represent the official view of the National Institutes of Health. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: I have read the journal’s policy and the authors of this manuscript have the following competing interests: Ms Berger, Messrs Taylor & Mahoney, and Dr. Kisiel are inventors of Mayo Clinic intellectual property licensed to Exact Sciences (Madison, WI) and may receive royalties, paid to Mayo Clinic. John Kisiel receives funding from a sponsored research agreement between Mayo Clinic and Exact Sciences. Because of the terms of the Sponsored Research Contract and Intellectual Property Development Agreement between Mayo Clinic and Exact Sciences, we have restrictions on sharing raw sequencing data publicly.
Introduction
Colorectal cancer (CRC) is the third most common cause of cancer related death in both men and women in the United States. This year, approximately 153,000 people will be diagnosed with CRC, and around 53,000 will die from the disease [1]. Stage of diagnosis is the most important predictor of survival, with a 91% five-year relative survival for localized disease, and 14% for distant disease [1]. Regular screening can increase early CRC detection. Colonoscopy is the most widely used screening method and has been reported to reduce mortality from CRC [2]. However, overall patient adherence to recommended screening guidelines is inadequate due to absence of insurance, time constraints, household financial burden, bowel preparation, lack of awareness, and fear of procedural risks and pain [3]. Stool based non-invasive testing alternatives include guaiac-based fecal occult blood test (gFOBT), fecal immunochemical test (FIT), and the multi-target stool DNA test (mt-sDNA) [4, 5]. The mt-sDNA test has improved sensitivity for CRCs throughout the colon and high risk precancers compared to fecal blood tests, but sensitivity for the detection of advanced precancerous lesions (43%) has room for improvement [6, 7]. Despite the different screening options available, nearly one third of the screening eligible population is not up to date on current recommendations. The development of blood-based tests would likely further adherence to CRC screening guidelines and revolutionize patient care.
There is a heightened interest in circulating nucleic acids in biofluids as a non-invasive source for cancer biomarkers. Circulating cell-free RNA (cfRNA) is released from both cancerous and non-cancerous cells into the blood by apoptosis, necrosis, or active secretion [8]. Circulating RNA has advantages over circulating DNA and protein. RNA has multiple copies that can provide insights into cellular states and regulatory processes, and it can be quantified in a highly specific and sensitive manner. Cell-free RNA in plasma has been studied in a variety of cancers as a biological marker for early disease diagnosis, survival, surveillance, and recurrence [9, 10]. In 2006, Wong and others were the first to establish that beta-catenin mRNA in plasma can serve as a potential marker for CRC [11]. Since then, several studies have investigated cfRNA expression for colorectal precancerous lesions and cancers, but they were often limited to a single gene expression technique (RT-qPCR, ddPCR, RNA sequencing, or targeted sequencing), and few if any marker validation steps were performed to confirm their markers’ diagnostic power [12].
While cfRNA is a powerful analyte for cancer research, it is highly unstable and degraded due to high RNase activity in blood unless protected inside extracellular vesicles. Adding to the difficulty, Larson and others (2021) recently reported that ribosomal RNA and mitochondrial RNA make up approximately 97% of cfRNA in plasma, while mRNA was around only 2% [13]. Additionally, technical and biological reproducibility is a concern with cfRNA biomarker discovery [14]. Differences in sample handling/processing methods, data analysis, and external factors such as sex, age, and lifestyle can impact cfRNA expression [15].
In the current study, we aimed to perform a complete investigation into the use of cfRNA in plasma as a source for CRC biomarkers. We not only developed protocols for RNA extraction, library preparation, targeted capture, and reverse transcription-quantitative polymerase chain reaction (RT-qPCR), but also identified a few potential CRC biomarker candidates that need to be further explored. These same optimized procedures may also be useful and applied in other fields of research. Additionally, we uncovered challenges associated with cfRNA data processing and analysis and offer alternative suggestions for future plasma RNA applications.
Materials and methods
Study synopsis
This work was approved by the Mayo Clinic Institutional Review Board (IRB 18–008752). The patient samples were obtained from another principal investigator, and under the current study there was a waiver of consent. Minnesota authorization was checked prior to obtaining clinical information. The three experiment phases are summarized in Fig 1. The first discovery phase was a global approach for identifying differentially expressed RNA markers in plasma from healthy control and cancer patients. During this phase we were also interested in determining the stability of cfRNA from two different collection tube additive types. Next targeted capture sequencing was performed in independent samples to test select markers identified during phase 1. Finally, top performing markers from phases 1 and 2 along with markers identified through literature review were tested on an independent cohort of samples using RT-qPCR.
Summary of total number of patients within each sample group for the different phases of the experiment.
Sample selection criteria and processing
Ethylenediaminetetraacetic acid (EDTA) preserved plasma and buffy coat samples were processed and stored according to standardized institutional protocols in the central repository of the Mayo Clinic Biospecimens Accession and Processing laboratory. For a subset of samples in phase 1, whole blood was collected into LBgard® (Biomatrica, San Diego CA) tubes that contained a proprietary stabilizer and processed following the manufacturer’s instructions. Within each phase, control, precancerous lesion, and cancer patients were balanced for age (quartiles), sex, and storage time (median). Cancers included newly diagnosed, referred and regional CRC patients prior to receipt of treatment, enrolled into a research biobank between 2010 and 2022. Controls were recruited from a 7-county regional population and had no history of disease as determined by negative colonoscopy or multi-target stool DNA test. Additionally, for all phases we excluded patients with a history of previous cancer or cancer within 3 years of diagnosis and transplant. Specifically, for phase 2, we also excluded patients with rheumatoid arthritis, lupus, cirrhosis, acute/chronic pancreatitis, pancreatic cysts, or a cancer diagnosis within three years of their blood draw. Patients in the phase 2 study with advanced precursor lesions were equally balanced between adenomatous lesions (tubular adenoma ≥ 1 cm or any villous component, or high-grade dysplasia) and sessile serrated lesions > 0.5 cm. Statisticians and clinical coordinators had access to detailed clinical information on all the samples during the study (S1 File). All samples for each phase were randomized, and the laboratory personnel were blinded to sample group information.
Phase 1-Discovery RNA sequencing and data analysis
Total RNA was extracted from plasma (500 µl) using Qiagen’s miRNeasy Serum/Plasma Advanced Kit (Germantown, MD) following the manufacturer’s instructions (n = 84). RNA concentration was quantified using Quant-it RiboGreen (Invitrogen, Waltham MA). Library preparation was performed using the SMARTer Stranded Total RNA-seq Kit-Pico input kit (Takara Bio, San Jose CA); the fragmentation step was skipped. Roche’s KAPA library quantification kit (Indianapolis, IN) and the Agilent Bioanalyzer (Agilent Technologies, Santa Clara, CA) were used to confirm the quality of the libraries. Paired-end sequencing (8 samples per lane) was performed on an Illumina HiSeq 4000 platform at Mayo Clinic’s Genomics Core.
Adapter sequences in raw FASTQ files were trimmed using Trim Galore (https://github.com/FelixKrueger/TrimGalore), and the trimmed sequences were then run through the Mayo Analysis Pipeline MAPS-seq (v3) for RNA sequencing [16]. Reads were aligned to the hg38 genome using STAR [17]. The quality of the reads was checked using RSeQC [18]. Differential gene expression analysis was performed using DESeq2 in R (4.1.2) [19]. Exons were considered differentially expressed if the false discovery rate (FDR) was ≤ 0.05 and the fold change was ≥ 2. Ingenuity pathway analysis was used to explore functions and pathways associated with the differentially expressed exons [20].
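The differential expression call described above amounts to a two-part filter on each exon. The following is a minimal Python sketch, not the authors' code (the study used DESeq2 in R). It assumes that each exon comes with an adjusted P value and a log2 fold change, and that "fold change ≥ 2" is applied in both directions (|log2 fold change| ≥ 1), since both up- and downregulated exons are reported. The function names and example values are hypothetical.

```python
import math

def is_differentially_expressed(padj, log2_fc, fdr_cutoff=0.05, fc_cutoff=2.0):
    """Apply the FDR and fold-change thresholds to one exon.

    Assumption: the fold-change cutoff is symmetric, so a two-fold
    decrease qualifies as well as a two-fold increase.
    """
    # Exons without a usable adjusted P value are never called.
    if padj is None or math.isnan(padj):
        return False
    return padj <= fdr_cutoff and abs(log2_fc) >= math.log2(fc_cutoff)

def direction(log2_fc):
    """Label the direction of change in cancers relative to controls."""
    return "up" if log2_fc > 0 else "down"

# Made-up values for illustration only (not study data).
example_exons = [
    {"exon": "exon_1", "padj": 0.01, "log2_fc": 1.5},
    {"exon": "exon_2", "padj": 0.03, "log2_fc": -1.2},
    {"exon": "exon_3", "padj": 0.20, "log2_fc": 3.0},
]
calls = {
    e["exon"]: direction(e["log2_fc"])
    for e in example_exons
    if is_differentially_expressed(e["padj"], e["log2_fc"])
}
print(calls)
```

In this sketch, exon_3 is excluded despite its large fold change because it fails the FDR threshold.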
Phase 2- Marker selection, targeted capture sequencing, and data analysis
Total RNA was extracted from EDTA buffered plasma (2 ml) using Qiagen’s QIAamp Circulating Nucleic Acid Kit with DNase treatment and RNA cleanup (Germantown, MD). RNA quantity and quality were assessed using Quant-it RiboGreen (Waltham, MA) and the Agilent Bioanalyzer (Santa Clara, CA). Ribosomal depletion was performed using the NEBNext rRNA Depletion kit v2 Human/Mouse/Rat (New England Biolabs, Ipswich MA). The RNA was then converted into double stranded cDNA using the NEBNext Ultra II RNA First Strand Synthesis with no fragmentation step and NEBNext Ultra II Non-Directional RNA Second Strand Synthesis Modules (Ipswich, MA). Instead of using beads in step two of the second module, the double stranded cDNA was purified using the Zymo Oligo Clean & Concentrator (Zymo Research, Irvine CA). Library preparation was performed using Roche’s KAPA HyperPrep Kit and KAPA Dual-Indexed Adaptor Kit (Indianapolis, IN). Adaptors were diluted to 850 nM and incubated for 1 hour. Libraries underwent 19 amplification cycles. For cleanup steps, a higher bead ratio (1.2x) was used to avoid loss of small fragments. Libraries were assessed using Roche’s KAPA Library Quantification kit (Indianapolis, IN) and the Agilent Bioanalyzer (Santa Clara, CA).
There were 135 genes selected for further validation in phase 2 based on the following criteria: FDR, fold change, AUC, and GTEx/TCGA data. Many of these genes were previously identified to play a role in cancer development. For normalization purposes, 13 control genes were also selected for probe design. IDT created a custom discovery probe pool with 2x tiling that covered all exons of the genes of interest. The pool originally consisted of 10,660 probes, but 289 were removed because the algorithm revealed that they may be present at high levels throughout the genome, leaving 10,371 probes (S2 File). Each library (500 ng) was used as input for the hybrid capture reaction using IDT’s Hybridization and Wash Kit (Newark NJ). Each sample underwent 14 cycles of post-capture PCR.
Samples were then run on an Illumina NovaSeq SP flow cell with paired-end sequencing at Mayo Clinic’s Genomics Core. The data were processed using MAPS-seq (v3), the same as the discovery dataset [16]. Eight samples with an unexpectedly low number of reads were excluded from the final analysis. Differential expression analyses were conducted using DESeq2 in R (4.1.2) [19]. Exons/genes were considered differentially expressed if the FDR was ≤ 0.05 and the fold change was ≥ 2.
Phase 3- Marker selection, plasma RT-qPCR validation, and reference gene testing
For phase 3A, total RNA was extracted from plasma (4 ml) using the QIAamp Circulating Nucleic Acid Kit with DNase treatment and RNA cleanup. Reverse transcription was performed using Applied Biosystems High-Capacity RNA to cDNA kit (Waltham, MA). Samples underwent 14 preamplification cycles using TaqMan Preamp Master Mix (Waltham, MA). Then qPCR was performed in duplicate using TaqMan Fast Advanced Master Mix (Waltham, MA) on a Roche LightCycler 480. A no template control was present on each plate to ensure there was no background contamination. TaqMan assays (n = 99) were selected based on candidates identified from discovery (phase 1), targeted sequencing (phase 2), TCGA, and published literature (S3 File). The reference gene with the lowest variability across samples was used as a normalizer for targeted genes of interest (delta Ct).
RNA inputs into cDNA synthesis were not the same among plasma samples because of the variation and low yields across patients. Therefore, buffy coats were also used to determine the stability of 15 reference genes. For phase 3B, total RNA was extracted from 250 µl of buffy coat using Qiagen’s miRNeasy Mini Kit with DNase treatment and RNA cleanup (Germantown, MD). Total RNA (430 ng) was reverse transcribed using Applied Biosystems High-Capacity RNA to cDNA kit (Waltham, MA). The cDNA was diluted to a total volume of 90 µl. Then qPCR was performed in duplicate using TaqMan Fast Advanced Master Mix on a Roche LightCycler 480 (Indianapolis, IN) following the manufacturer’s recommended PCR conditions. A no template control was present on each plate to ensure there was no background contamination. Standard deviations of Ct values were calculated among cancer and control patients. The most stable assay across patients was identified and applied to the plasma data for normalization.
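The reference gene selection described above (lowest standard deviation of Ct values across patients) can be illustrated with a short Python sketch. This is an illustration under stated assumptions rather than the authors' procedure: it uses the sample standard deviation, pools all patients into one list per gene, and uses placeholder gene names and made-up Ct values.

```python
from statistics import stdev

def most_stable_reference(ct_by_gene):
    """Return (gene, sd) for the candidate with the smallest SD of Ct values.

    ct_by_gene maps a gene name to a list of Ct values, one per patient.
    Genes with fewer than two measurements are skipped because a sample
    SD cannot be computed for them.
    """
    sds = {gene: stdev(cts) for gene, cts in ct_by_gene.items() if len(cts) >= 2}
    gene = min(sds, key=sds.get)
    return gene, sds[gene]

# Made-up Ct values for illustration only (not study data).
example_ct = {
    "GENE_A": [24.1, 24.3, 24.0, 24.2],
    "GENE_B": [22.0, 25.5, 23.1, 26.4],
}
best_gene, best_sd = most_stable_reference(example_ct)
print(best_gene, round(best_sd, 3))
```

Ranking by standard deviation alone is the simplest stability criterion. Because plasma RNA inputs varied between patients, a low Ct spread in plasma may partly reflect input differences, which is the stated reason buffy coat was tested as well.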
Statistical methods
Sample sizes for RNA sequencing were based on the work of Hart and others (2013) [21]. The calculation of sample size requires an estimate of biological variation (BV). From their results on tissue experiments, they found that the median BV was 0.63 and the 90th percentile was approximately 0.9. To be conservative, we used the 90th percentile estimate. Assuming an average read depth of 20 reads per subject for each transcript, detecting a fold change of 2 or higher with 80% power at a two-sided significance level of 0.05 required a minimum of 19 patients per group. Larger group sizes or lower BV for an individual transcript would lower the detectable fold-change given the same power and two-sided significance level.
For phase 1, unsupervised clustering was performed using all expressed genes, with 1 − Pearson correlation as the distance metric and the ward.D2 agglomeration method in R. For phase 2, Kruskal–Wallis (non-parametric) tests were used to compare two or more groups for the RNA yield data. These data were analyzed and plotted in GraphPad Prism 9.5.1 (Carlsbad CA). For phase 3, differential expression between groups (cancers and controls or different cancer stages) was assessed using an ANOVA test with TukeyHSD for contrasts in R (v4.1.2). Relative fold change was calculated by the 2^(−ΔΔCt) method.
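As an illustration of the relative fold change calculation, the sketch below works through the 2^(−ΔΔCt) arithmetic in Python. It assumes that ΔCt is the target Ct minus the reference gene Ct for each sample, and that ΔΔCt is the mean ΔCt of the cancer group minus the mean ΔCt of the control group. The function names and Ct values are hypothetical and are not taken from the study.

```python
from statistics import mean

def delta_ct(target_ct, reference_ct):
    """Normalize a target Ct to the reference gene Ct for one sample."""
    return target_ct - reference_ct

def relative_fold_change(case_delta_cts, control_delta_cts):
    """Compute 2^(-ddCt), where ddCt = mean case dCt - mean control dCt."""
    ddct = mean(case_delta_cts) - mean(control_delta_cts)
    return 2.0 ** (-ddct)

# Made-up Ct values for illustration only (not study data).
# Each pair is (target Ct, reference Ct) for one sample.
cancer_samples = [(28.0, 20.0), (27.5, 19.5)]
control_samples = [(29.0, 20.0), (29.5, 20.5)]

cancer_dcts = [delta_ct(t, r) for t, r in cancer_samples]    # [8.0, 8.0]
control_dcts = [delta_ct(t, r) for t, r in control_samples]  # [9.0, 9.0]

fold_change = relative_fold_change(cancer_dcts, control_dcts)
print(fold_change)  # ddCt = -1, so the target is 2-fold higher in cancers
```

A lower Ct means more template, so a negative ΔΔCt corresponds to higher expression in the cancer group. The calculation also makes clear why the choice of reference gene matters: every ΔCt shifts with the reference Ct.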
Results
Identification of plasma cfRNA candidates by total RNA sequencing (phase 1)
A total of 84 plasma samples underwent total RNA sequencing in order to identify potential cfRNA targets that distinguish cancer from control patients. The proprietary reagent in LBgard tubes (n = 35) appeared to be insufficient at stabilizing cfRNA because these samples consistently had poor quality metrics (low mapping rates and low number of reads mapped to annotated genes) and were therefore removed from the final analysis (Fig 2). Additionally, there were 7 EDTA buffered plasma samples removed due to poor quality. The poor results observed in some of these samples were likely due to insufficient starting material prior to library preparation. For the samples that passed QC metrics [n = 42: 23 controls, 19 cancers (stages I: 2, II: 8, III: 7, IV: 2)] there was an average of approximately 65.4M reads per sample, with a range of 5.2M to 83.1M (S4 File). On average, 81% of reads mapped to the human genome with a range of 50% to 89% (S4 File). Unsupervised clustering of expressed genes did not show clear separation between cancers and controls or between different stages of cancer (stage labels are shown below the dendrogram, with 0 denoting controls), although in certain sub-clusters cancers and controls were separated (Fig 3). Since the cell-free RNA data were highly fragmented (uneven gene body coverage, shorter fragments generally less than 100 bp, and low transcript integrity number [most TINs < 20, which represents percentage of transcripts with uniform read coverage]), we opted to conduct differential exon analysis, which could help localize the changed region in a gene. There were 895 cfRNA exons that were differentially expressed among cancer and control patients (S4 File). There were 570 exons (264 genes) downregulated and 325 exons (189 genes) upregulated in the cancer group.
Samples from LBgard tubes consistently had poor quality metrics (low mapping rates and low number of reads mapped to annotated genes) compared to EDTA preserved samples.
There was no clear separation between cancers and controls and between different stages of cancer (0 for controls) although in certain sub-clusters cancers and controls were separated.
Pathway analysis for differentially expressed genes (phase 1)
Ingenuity Pathway Analysis (IPA) was used to explore the functions of differentially expressed genes among cancer and control patients (S5 File). Within the top ten ingenuity canonical pathways were: protein kinase A signaling, molecular mechanisms of cancer, thrombin signaling, integrin link kinase signaling, alpha adrenergic signaling, and IL-8 signaling, all of which have been reported to impact carcinogenesis. The top three categories for the diseases and disorders analysis were cancer, gastrointestinal disease, and organismal injury and abnormalities. Top molecular and cellular functions were cellular assembly and organization, cellular function and maintenance, cellular movement, molecular transport, and cellular development.
Plasma RNA composition (phase 2)
For phase 2, plasma RNA yield was compared among sample groups (n = 73). The median RNA yield was 20.06 ng, 14.15 ng, and 17.32 ng in cancer, control, and adenoma plasma samples, respectively (Fig 4A). There was a tendency for cancer patients to have increased cfRNA yields compared to control patients (P = 0.06). There was no significant difference in total RNA yield among the different stages of cancer (P > 0.44; Fig 4B). For most plasma samples, the RNA integrity number (RIN) was between 2.0 and 2.5.
(a) Plasma RNA yield (ng) for the different sample groups. There was a tendency for cancer patients to have increased cfRNA yields, Kruskal-Wallis test (Mean ± SEM) (b) Plasma RNA yield (ng) by cancer stage. There was no difference in total RNA yield among different stages of cancer, Kruskal-Wallis test (Mean ± SEM).
Validation of cfRNA candidates using targeted capture sequencing (phase 2)
Through extensive troubleshooting we developed a protocol for ribosomal RNA (rRNA) depletion, double stranded cDNA generation, library preparation, and hybrid capture. There were 73 EDTA plasma samples that were processed for targeted sequencing. Eight samples failed library prep and were therefore removed prior to data analysis of the remaining 30 controls, 9 adenomas, and 25 cancers (stages I: 5, II: 8, III: 10, IV: 2). There was an average of approximately 13.2M reads per sample, with a range of 9.1M to 19.6M (S6 File). On average, 96% of reads mapped to the human genome with a range of 84% to 98% (S6 File). The average percentage of reads that mapped to targets was 72% with a range of 31% to 83% (S6 File).
For the adenoma versus control patient comparison, there were 13 differentially expressed exons (7 downregulated, 6 upregulated; S6 File). For the cancer versus control comparison, there were 224 differentially expressed exons (85 downregulated, 138 upregulated; S6 File). For the cancer versus adenoma comparison, there were 12 differentially expressed exons (1 downregulated, 11 upregulated; S6 File). When comparing cancer versus control on a gene level, there were 35 differentially expressed genes, all of which were downregulated in cancer patients (S6 File). Overall, there were no differentially expressed exons/genes that were validated from the discovery phase among cancer and control groups. Furthermore, the principal component analysis (PCA) plot for all exons revealed a lack of clustering among the three sample groups (Fig 5).
There was a lack of clustering observed among sample groups (AD/Cancer: red, adenoma: blue, and CTR/control: green).
Validation of cfRNA candidates using RT-qPCR (phase 3)
As a final experiment, we wanted to further determine if we could validate the cfRNA candidates identified from phase 1, phase 2, TCGA, and previously published literature using an alternative quantification method, RT-qPCR (S3 File). Patient demographics for this phase are presented in Table 1. Given the low RNA yield in plasma and the number of markers, preamplification was necessary. For plasma samples [n = 57; 30 controls, 27 cancers (stages II: 13, III: 6, IV: 8)], RPS12 had the lowest standard deviation in controls (S1 Fig). For buffy coat samples [n = 58; 30 controls, 28 cancers (stages II: 13, III: 6, IV: 9)], CASC3 had the lowest standard deviation (S2 Fig).
For the first analysis we used RPS12 as the normalizing gene and calculated the delta Ct value for each marker of interest. A simple nonparametric t test was performed on the delta delta Ct values. The expression of most markers was increased in plasma samples from cancer patients compared to control patients; however, only six previous targets were statistically significant and validated (P ≤ 0.05; Table 2, S7 File). Of the differentially expressed TaqMan assays in phase 3, five (FGB [n = 2], RPL23AP7 [n = 2], GSDMA) were selected based on literature search and one (GPR107) was previously identified in the current study. Differential expression among the different cancer stages revealed eight differentially expressed TaqMan assays between stage IV patients and control patients (S7 File). Of these differentially expressed TaqMan assays, five (FGA, FGB [n = 2], CFAP58 [n = 2]) were selected based on literature search, two (PERP, GPR107) were selected from the current dataset, and one (CDH3) was selected from TCGA analysis. The second analysis used CASC3 as the normalizing gene and identified only two TaqMan assays (RPL8, TMSB10) that were classified as statistically significant (S8 File); both were downregulated in cancer patients.
Discussion
The unstable nature of RNA makes it a difficult cancer biomarker class to study, especially for cfRNA in plasma since high RNase activity leads to poor RNA quality and quantity. Circulating mRNAs were first reported in cancer patients in the 1990s; however, there are no cell-free RNA tests that are FDA approved to date [22–24]. Previous research has examined the use of cell-free mRNA markers in colorectal adenomas and/or cancers, but prior to the work presented here, a comprehensive investigation performing discovery followed by multiple validation steps had not been conducted [25–27].
Our initial total RNA sequencing experiment revealed hundreds of possible RNA targets, 325 exons of which were upregulated in colon cancer patients compared to controls. We further narrowed down the list of the strongest candidates (n = 135 genes) to pursue validation in an independent cohort. The second experiment used targeted capture sequencing for validation, and of the 35 differentially expressed genes detected, all were downregulated in the colon cancer group compared to controls, in contradistinction to the findings of our initial discovery experiment. Similarly, the control genes we selected based on the initial discovery phase experiment were not stable across conditions. Overall, none of the selected markers identified among cancer and controls in phase 1 were validated in the phase 2 study.
Additionally, patients with precancerous lesions were added to phase 2. Current non-invasive screening options have less sensitivity for detecting adenomas and sessile serrated lesions than for cancers. The early detection and removal of precancerous polyps remains critical to interrupt the adenoma-carcinoma sequence and prevent the development and spread of CRC [28]. In the current dataset, there were only a few differentially expressed targets when comparing adenoma patients to either control or cancer patients, none of which appeared to have strong potential as biomarkers. The limited ability to detect advanced precursor lesions from blood plasma liquid biopsies has also been noted for cell-free DNA [29].
In the third experiment, we used an alternative approach, RT-qPCR, to attempt to validate markers identified from phase 1 (discovery total RNA sequencing), phase 2 (targeted capture sequencing), literature, and GTEx/TCGA analyses. Overall, we identified six transcripts that were upregulated and validated among cancer and control patients. These markers were selected from previously published data (FGB, RPL23AP7, GSDMA; [30]) and phase 2 of the current dataset (GPR107). When performing pairwise comparisons among the different stages of cancer, stage IV patients had increased expression of FGA, FGB, PERP, GPR107, and CDH3 compared to controls. This is likely due to an increase in tumor burden and release of more nucleic acids into circulation [31, 32]. Additionally, we determined that results differed based on the reference gene that was used to normalize the data. An alternative approach would be to use droplet digital PCR (ddPCR). This recently developed technology has been shown to have higher precision and reproducibility compared to conventional qPCR [33].
Specifically, CDH3 functions in the regulation of cell adhesion and is likely involved in the progression and metastasis of colorectal cancer. CDH3 mRNA was upregulated in colon adenocarcinoma tissue compared to normal colon tissue [34, 35]. Additionally, based on TCGA data, a cutoff was established, and patients with high CDH3 expression had a better prognosis than those with low/medium expression [35]. Fibrinogen is a protein that is encoded by three different genes (FGA, FGB, FGG), two of which were upregulated in plasma samples from cancer patients in the current study. It regulates the expression of genes involved in cell cycle regulation and metabolism by activating the focal adhesion kinase pathway, leading to the destabilization of p53, which promotes tumor growth and limits senescence [36]. At the protein level, higher levels of fibrinogen in plasma have been strongly associated with poor prognosis and lower survival rate in many cancers including colorectal [37, 38].
There are multiple factors that may have contributed to the lack of marker validation in the current study. Because of limited plasma volumes and RNA yield/quality, the three phases comprised different patients, so the validation was totally independent. There were also differences in the volume of plasma used across experiments (phase 1: 500 µl, phase 2: 2 ml, phase 3: 4 ml) due to changes in RNA extraction kit protocols and plasma availability. Furthermore, since the experiments were carried out across several years, there were changes in technologies, RNA extraction, and library preparation kits used. For the initial discovery total RNA sequencing experiment, the sample size was smaller than expected due to more poor-quality samples than anticipated, and the differential expression data were not statistically strong, which may have led to inferior marker identification for downstream validation steps. Additionally, deeper sequencing may have revealed additional promising candidates. Finally, there may be a lack of universal RNA markers across patients. Given the highly fragmented nature of RNA in plasma, a possible alternative would be to focus on RNA within extracellular vesicles due to their protection from enzymatic degradation [39]. Another alternative for studying early colorectal cancer precursors would be to focus on the abundant release of RNA by luminal exfoliation into stool directly, because it occurs much earlier than vascular invasion into the blood [40].
Cell-free RNA biomarkers have low reproducibility caused by technical and biological variability, lack of standardized methods, and insufficient sample sizes [41, 42]. We encountered similar issues in the current investigation and validated only a small number of potential cfRNA markers for CRC detection. It may be beneficial to further explore these cfRNA candidates (FGA, FGB, GPR107, CDH3, RP23AP7). Additionally, our rigorous protocol optimization and data analyses may be applied to other areas of research beyond the scope of CRC detection. We hypothesize that, over time, improvements in technologies, data analysis, and RNA preservation methods at the time of blood collection may improve the feasibility of an RNA-based liquid biopsy.
Supporting information
S2 File. Phase 2: Targeted capture sequencing IDT probes.
https://doi.org/10.1371/journal.pone.0308711.s002
(XLSX)
S4 File. Phase 1: RNA sequencing metrics and differential gene expression analysis.
https://doi.org/10.1371/journal.pone.0308711.s004
(XLSX)
S5 File. Phase 1: Ingenuity pathway analysis.
https://doi.org/10.1371/journal.pone.0308711.s005
(XLSX)
S6 File. Phase 2: Targeted capture sequencing metrics and differential gene expression analysis.
https://doi.org/10.1371/journal.pone.0308711.s006
(XLSX)
S7 File. Phase 3: RT-qPCR data analysis results (RPS12 reference gene).
https://doi.org/10.1371/journal.pone.0308711.s007
(XLSX)
S8 File. Phase 3: RT-qPCR data analysis results (CASC3 reference gene).
https://doi.org/10.1371/journal.pone.0308711.s008
(XLSX)
S1 Fig. Plasma mean Ct values for each reference gene for cancers and controls (phase 3A).
Among all reference genes, RPS12 and CASC3 had the lowest standard deviation, while ACTB and APP had the largest standard deviation.
https://doi.org/10.1371/journal.pone.0308711.s009
(TIF)
S2 Fig. Buffy coat mean Ct values for each reference gene for cancers and controls (phase 3B).
Among all reference genes, CASC3 had the lowest standard deviation, while UBB had the largest standard deviation.
https://doi.org/10.1371/journal.pone.0308711.s010
(TIF)
References
- 1. Siegel RL, Miller KD, Wagle NS, Jemal A. Cancer statistics, 2023. CA Cancer J Clin. 2023; 73(1):17–48. pmid:36633525
- 2. Kahi CJ, Imperiale TF, Juliar BE, Rex DK. Effect of screening colonoscopy on colorectal cancer incidence and mortality. Clin Gastroenterol Hepatol. 2009; 7(7):770–5. pmid:19268269
- 3. Jones RM, Devers KJ, Kuzel AJ, Woolf SH. Patient-reported barriers to colorectal cancer screening: a mixed-methods analysis. Am J Prev Med. 2010; 38(5):508–16. pmid:20409499
- 4. Tibble J, Sigthorsson G, Foster R, Sherwood R, Fagerhol M, Bjarnason I. Faecal calprotectin and faecal occult blood tests in the diagnosis of colorectal carcinoma and adenoma. Gut. 2001; 49(3):402–8. pmid:11511563
- 5. Heigh RI, Yab TC, Taylor WR, Hussain FT, Smyrk TC, Mahoney DW, et al. Detection of colorectal serrated polyps by stool DNA testing: comparison with fecal immunochemical testing for occult blood (FIT). PLoS One. 2014; 9(1):e85659. pmid:24465639
- 6. Imperiale TF, Ransohoff DF, Itzkowitz SH, Levin TR, Lavin P, Lidgard GP, et al. Multitarget stool DNA testing for colorectal-cancer screening. N Engl J Med. 2014; 370(14):1287–97. pmid:24645800
- 7. Imperiale TF, Porter K, Zella J, Gagrat ZD, Olson MC, Statz S, et al. Next-generation multitarget stool DNA test for colorectal cancer screening. N Engl J Med. 2024; 390(11):984–93. pmid:38477986
- 8. Rahat B, Ali T, Sapehia D, Mahajan A, Kaur J. Circulating cell-free nucleic acids as epigenetic biomarkers in precision medicine. Front Genet. 2020; 11:844. pmid:32849827
- 9. De Rubis G, Krishnan SR, Bebawy M. Liquid biopsies in cancer diagnosis, monitoring, and prognosis. Trends Pharmacol Sci. 2019; 40(3):172–86. pmid:30736982
- 10. Xi X, Li T, Huang Y, Sun J, Zhu Y, Yang Y, et al. RNA biomarkers: frontier of precision medicine for cancer. Non-coding RNA. 2017; 3(1):9. pmid:29657281
- 11. Wong SC, Lo SF, Cheung MT, Ng KO, Tse CW, Lai BS, et al. Quantification of plasma β-catenin mRNA in colorectal cancer and adenoma patients. Clin Cancer Res. 2004; 10(5):1613–7.
- 12. Kan CM, Pei XM, Yeung MH, Jin N, Ng SS, Tsang HF, et al. Exploring the Role of Circulating Cell-Free RNA in the Development of Colorectal Cancer. Int J Mol Sci. 2023; 24(13):11026. pmid:37446204
- 13. Larson MH, Pan W, Kim HJ, Mauntz RE, Stuart SM, Pimentel M, et al. A comprehensive characterization of the cell-free transcriptome reveals tissue-and subtype-specific biomarkers for cancer detection. Nat Commun. 2021; 12(1):2357. pmid:33883548
- 14. Geeurickx E, Hendrix A. Targets, pitfalls and reference materials for liquid biopsy tests in cancer diagnostics. Mol Aspects Med. 2020; 72:100828. pmid:31711714
- 15. Rounge TB, Umu SU, Keller A, Meese E, Ursin G, Tretli S, et al. Circulating small non-coding RNAs associated with age, sex, smoking, body mass and physical activity. Sci Rep. 2018; 8(1):1–3.
- 16. Kalari KR, Nair AA, Bhavsar JD, O’Brien DR, Davila JI, Bockol MA, et al. MAP-RSeq: Mayo analysis pipeline for RNA sequencing. BMC Bioinform. 2014; 15:1–1. pmid:24972667
- 17. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013; 29(1):15–21. pmid:23104886
- 18. Wang L, Wang S, Li W. RSeQC: quality control of RNA-seq experiments. Bioinformatics. 2012; 28(16):2184–5. pmid:22743226
- 19. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014; 15(12):1–21. pmid:25516281
- 20. Krämer A, Green J, Pollard J Jr, Tugendreich S. Causal analysis approaches in ingenuity pathway analysis. Bioinformatics. 2014; 30(4):523–30. pmid:24336805
- 21. Hart SN, Therneau TM, Zhang Y, Poland GA, Kocher JP. Calculating sample size estimates for RNA sequencing data. J Comput Biol. 2013; 20(12):970–8. pmid:23961961
- 22. Funaki NO, Tanaka J, Kasamatsu T, Ohshio G, Hosotani R, Okino T, et al. Identification of carcinoembryonic antigen mRNA in circulating peripheral blood of pancreatic carcinoma and gastric carcinoma patients. Life Sci. 1996; 59(25–26):2187–99. pmid:8950323
- 23. Lo KW, Lo YD, Leung SF, Tsang YS, Chan LY, Johnson PJ, et al. Analysis of cell-free Epstein-Barr virus-associated RNA in the plasma of patients with nasopharyngeal carcinoma. Clin Chem. 1999; 45(8):1292–4. pmid:10430801
- 24. Kopreski MS, Benko FA, Kwak LW, Gocke CD. Detection of tumor messenger RNA in the serum of patients with malignant melanoma. Clin Cancer Res. 1999; 5(8):1961–5. pmid:10473072
- 25. Xue VW, Cheung MT, Chan PT, Luk LL, Lee VH, Au TC, et al. Non-invasive potential circulating mRNA markers for colorectal adenoma using targeted sequencing. Sci Rep. 2019; 9(1):12943. pmid:31506480
- 26. Jin N, Kan CM, Pei XM, Cheung WL, Ng SS, Wong HT, et al. Cell-free circulating tumor RNAs in plasma as the potential prognostic biomarkers in colorectal cancer. Front Oncol. 2023; 13:1134445. pmid:37091184
- 27. Tsang HF, Pei XM, Wong YK, Wong SC. Plasma Circulating mRNA Profile for the Non-Invasive Diagnosis of Colorectal Cancer Using NanoString Technologies. Int J Mol Sci. 2024; 25(5):3012. pmid:38474258
- 28. Simon K. Colorectal cancer development and advances in screening. Clin Interv Aging. 2016; 967–76. pmid:27486317
- 29. Chung D, Gray DM, Greenson J, Gupta S, Eagle C, Hu S, et al. Clinical validation of a cell-free DNA blood-based test for colorectal cancer screening in an average risk population [abstract]. J Gastroenterol. 2023; 164(6):S–1573.
- 30. Chen S, Jin Y, Wang S, Xing S, Wu Y, Tao Y, et al. Cancer type classification using plasma cell-free RNAs derived from human and microbes. eLife. 2022; 11:e75181. pmid:35816095
- 31. Myint NN, Verma AM, Fernandez-Garcia D, Sarmah P, Tarpey PS, Al-Aqbi SS, et al. Circulating tumor DNA in patients with colorectal adenomas: assessment of detectability and genetic heterogeneity. Cell Death Dis. 2018; 9(9):894. pmid:30166531
- 32. Albrecht LJ, Höwner A, Griewank K, Lueong SS, von Neuhoff N, Horn PA, et al. Circulating cell‐free messenger RNA enables non‐invasive pan‐tumour monitoring of melanoma therapy independent of the mutational genotype. Clin Transl Med. 2022; 12(11):e1090. pmid:36320118
- 33. Hindson CM, Chevillet JR, Briggs HA, Gallichotte EN, Ruf IK, Hindson BJ, et al. Absolute quantification by droplet digital PCR versus analog real-time PCR. Nat Methods. 2013; 10(10):1003–5. pmid:23995387
- 34. Kumara HS, Bellini GA, Caballero OL, Herath SA, Su T, Ahmed A, et al. P-Cadherin (CDH3) is overexpressed in colorectal tumors and has potential as a serum marker for colorectal cancer monitoring. Oncoscience. 2017; 4(9–10):139. pmid:29142905
- 35. Xu Y, Zhao J, Dai X, Xie Y, Dong M. High expression of CDH3 predicts a good prognosis for colon adenocarcinoma patients. Exp Ther Med. 2019; 18(1):841–7. pmid:31281458
- 36. Sharma BK, Mureb D, Murab S, Rosenfeldt L, Francisco B, Cantrell R, et al. Fibrinogen activates focal adhesion kinase (FAK) promoting colorectal adenocarcinoma growth. J Thromb Haemost. 2021; 19(10):2480–94. pmid:34192410
- 37. Pedrazzani C, Mantovani G, Salvagno GL, Baldiotti E, Ruzzenente A, Iacono C, et al. Elevated fibrinogen plasma level is not an independent predictor of poor prognosis in a large cohort of Western patients undergoing surgery for colorectal cancer. World J Gastroenterol. 2016; 22(45):9994. pmid:28018106
- 38. Sun ZQ, Han XN, Wang HJ, Tang Y, Zhao ZL, Qu YL, et al. Prognostic significance of preoperative fibrinogen in patients with colon cancer. World J Gastroenterol. 2014; 20(26):8583. pmid:25024612
- 39. D’Souza-Schorey C, Clancy JW. Tumor-derived microvesicles: shedding light on novel microenvironment modulators and prospective cancer biomarkers. Genes Dev. 2012; 26(12):1287–99. pmid:22713869
- 40. Ahlquist DA, Taylor WR, Mahoney DW, Zou H, Domanico M, Thibodeau SN, et al. The stool DNA test is more accurate than the plasma septin 9 test in detecting colorectal neoplasia. Clin Gastroenterol Hepatol. 2012; 10(3):272–7. pmid:22019796
- 41. Cabús L, Lagarde J, Curado J, Lizano E, Pérez-Boza J. Current challenges and best practices for cell-free long RNA biomarker discovery. Biomark Res. 2022; 10(1):1–0.
- 42. Wagner JT, Kim HJ, Johnson-Camacho KC, Kelley T, Newell LF, Spellman PT, et al. Diurnal stability of cell-free DNA and cell-free RNA in human plasma samples. Sci Rep. 2020; 10(1):16456. pmid:33020547