
A genome-wide SNP-SNP interaction analysis exploring novel interacting loci associated with the risk of recurrence in colorectal cancer

  • Aaron A. Curtis,

    Roles Formal analysis, Investigation, Project administration, Writing – original draft, Writing – review & editing

    Affiliation Division of Biomedical Sciences, Faculty of Medicine, Memorial University, St. John’s, NL, Canada

  • Yajun Yu,

    Roles Investigation, Writing – original draft, Writing – review & editing

    Affiliation Institute of Cardiovascular Research, Southwest Medical University, Luzhou, Sichuan, China

  • Megan Carey,

    Roles Data curation, Writing – review & editing

    Affiliation Division of Biomedical Sciences, Faculty of Medicine, Memorial University, St. John’s, NL, Canada

  • Yildiz E. Yilmaz,

    Roles Investigation, Writing – review & editing

    Affiliation Department of Mathematics and Statistics, Faculty of Science, Memorial University, St. John’s, NL, Canada

  • Sevtap Savas

    Roles Conceptualization, Data curation, Funding acquisition, Investigation, Project administration, Supervision, Writing – original draft, Writing – review & editing

    savas@mun.ca

    Affiliations Division of Biomedical Sciences, Faculty of Medicine, Memorial University, St. John’s, NL, Canada, Division of Population Health and Applied Health Sciences, Faculty of Medicine, Memorial University, St. John’s, NL, Canada, Discipline of Oncology, Faculty of Medicine, Memorial University, St. John’s, NL, Canada

Abstract

Background

Genetic factors can influence and predict patient outcomes. The association of interactions of germline SNPs with patient outcomes is an understudied area of prognostic research. In this study, we applied the first genome-wide SNP-SNP interaction analysis in relation to colorectal cancer outcomes.

Objectives

Our objective was to explore interacting SNP loci at the genome-wide level that predict the risk of local or distant recurrence (RMFS) in a cohort of stage I-III colorectal cancer patients from the Canadian province of Newfoundland and Labrador.

Methods

The patient cohort consisted of 430 unrelated Caucasian patients. Genetic and medical data was collected previously and the genetic data consisted of a total of 384,415 genotyped SNPs. The PLINK epistasis function was utilized to examine pairwise SNP interactions. Select interactions were assessed by multivariable Cox-regression models, adjusting for established clinical covariates. Genomic regions identified were explored for additional interactions. Published databases were utilized to retrieve biological information about the loci identified.

Results

After Bonferroni correction for multiple testing, no interaction remained significant. We present the top 20 interactions. The interaction p-values ranged from p = 1.37E-8 to p = 2.14E-9 in this set. Interactions were also tested by multivariable Cox regression models including established clinical covariates. Many of the SNPs were intronic and some of them were functional (e.g., expression quantitative trait loci). Analysis of the other SNPs in the same genomic regions as the interacting SNPs led to the identification of three additional interaction models.

Conclusions

We present the results of the first genome-wide SNP-SNP interaction analysis in colorectal cancer outcomes. While no SNP-SNP interaction remained significant after correction for multiple testing, our methodology emphasizes the additional knowledge that can be obtained using interaction analyses while studying prognostic markers.

Introduction

Colorectal cancer is one of the most common cancers globally [1–3]. As in the case of many complex diseases, in colorectal cancer there has been a great interest in identifying prognostic biomarkers (including genetic variants such as Single Nucleotide Polymorphisms [SNPs]) [4]. The past decade has witnessed tremendous progress in our fundamental understanding of biology, health, and disease, as well as in technology, which has allowed researchers to conduct larger and more challenging studies, such as genome-wide association studies (GWASs).

While the GWAS, which often examines individual associations of SNPs with an outcome, is a widely applied method [5–10], it is important to move beyond it. This is because GWAS-identified variants usually explain only a small fraction of the phenotypic variation among patients, suggesting that there are genetic mechanisms or variations still waiting to be discovered (i.e., “missing heritability” [11–14]). Genetic interactions are one of the possible mechanisms that can help elucidate the missing heritability. In simple terms, an interaction (sometimes also referred to as epistasis) occurs when the effect of a variant on a phenotype depends on another variant, where its effect may increase (synergistic interaction) or decrease (antagonistic interaction) in the presence of the other locus [11–17]. As GWASs only examine the effect of one SNP at a time, they cannot identify these interactive effects.

A number of methods have been developed to examine interactions [14,18–29] and further development in this area is expected, including further artificial intelligence-based methods [29]. Since interaction analyses are advanced when compared to one-SNP-at-a-time type analyses, examining interactions can provide unique, novel, and exciting information that can support current studies and further progress prognostic research. Interactions, however, are not well studied in colorectal cancer prognosis. One of the reasons for this is the computational load required to examine interactions and a lack of availability of tools that can handle the large-scale data that is common in genome-wide settings. A few interaction studies in colorectal cancer outcomes, some led by our lab, have been published [30–35]; while these studies opened new ways to examine prognostic biomarkers, they were restricted to candidate genes and/or pathways, missing the opportunity to provide a more comprehensive landscape of the interactions at the genome-wide level.

In the presented study, our aim was to explore genome-wide SNP-SNP interactions that may be associated with the 5-year (local or distant) recurrence risk in colorectal cancer. Our findings underline the potential utility of examining SNP-SNP interactions in biomarker research, can inform future studies, and can inspire other groups to integrate interaction analyses into their studies.

Methods

Ethics statement

This study has been approved by the provincial Health Research Ethics Board (HREB #2018.051; #2009.106). As this study was a secondary use of data study, HREB waived the requirement for patient consent. For this study, authors did not require access to information that could identify individual participants during or after data collection. However, previously Megan Carey had access to patient identifiers while updating the outcome data (HREB # 15.006).

Patient cohort, SNP genotyping and clinical data

The patient clinical and genetic data used in this study were collected by the Newfoundland Familial Colorectal Cancer Registry (NFCCR) investigators between January 1, 1999 and January 24, 2018. The data used in this study was accessed between October 22, 2022 and July 9, 2024. In brief, the NFCCR recruited around 750 patients diagnosed with colorectal cancer between January 1999–December 2003 [36]. Genome-wide SNP genotype data was previously obtained using the Illumina® Omni1-Quad human SNP genotyping platform at a genotyping facility (Centrillion Biosciences, USA) [7]. DNA extracted from non-tumor (e.g., blood) samples was used in genotyping reactions. Patients whose genotype data failed the standard quality control (QC) measures as well as those patients who were non-Caucasian and 1st-, 2nd-, or 3rd-degree relatives were removed from the patient cohort, leaving 505 patients with genome-wide SNP genotype data (consisting of 729,737 SNPs) [7]. Clinical and demographic data was collected using a variety of resources over time, including medical charts, patient questionnaires, and data from local tumor registries [36–38]. The last follow-up date was January 24, 2018 [38].

PLINK uses logistic regression in its epistasis analysis. Therefore, we processed the outcome data to fit this method. The clinical endpoint used for this study was 5-year recurrence or metastasis free survival (RMFS) status, where events were defined as recurrence or metastasis at or before 5 years after diagnosis of colorectal cancer. Initially, patients who did not experience recurrence or metastasis and whose last follow-up time was before 5 years were removed from the dataset, as their survival at 5 years was unknown (censored patients; n = 11). Stage IV patients were also removed from the data as our study focuses on the risk of recurrence or metastasis and stage I, II, and III patients are susceptible to these outcomes (i.e., stage IV patients would already have metastases at the time of diagnosis). Patients with missing covariate data were also excluded. After these data processing steps, there remained 430 stage I-III patients in the cohort for the PLINK interaction analysis. Baseline features for this patient cohort are shown in Table 1.
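The conversion of time-to-event data into a binary 5-year status can be sketched as follows. This is a minimal illustration rather than the code used in the study; the function name and arguments are hypothetical, and treating a patient whose event occurred after 5 years as event-free at 5 years is an assumption based on the event definition above.

```python
from typing import Optional


def five_year_rmfs_status(had_event: bool, years: float,
                          horizon: float = 5.0) -> Optional[int]:
    """Binary 5-year RMFS status for one patient (hypothetical helper).

    had_event: True if recurrence or metastasis was recorded.
    years:     time from diagnosis to the event, or to last follow-up
               if no event was recorded.
    Returns 1 (event at or before the horizon), 0 (event-free at the
    horizon), or None (status at the horizon unknown; patient excluded
    from the logistic-regression-based analysis).
    """
    if had_event and years <= horizon:
        return 1
    if years >= horizon:
        # Either followed event-free to the horizon, or (assumption)
        # the event occurred after the horizon.
        return 0
    # No event recorded and follow-up ended before the horizon.
    return None


patients = [(True, 2.0), (False, 6.0), (False, 3.0)]
print([five_year_rmfs_status(e, t) for e, t in patients])
```

Patients for whom the function returns None correspond to the censored patients who were removed before the PLINK analysis.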

Table 1. Baseline characteristics of the patient cohort included in the PLINK epistasis analysis.

https://doi.org/10.1371/journal.pone.0321967.t001

SNP genotype data extraction and quality control measures

All genotype data processing prior to analysis was performed using PLINK 1.9 [39]. Genotypes of SNPs were extracted for the patient cohort using PLINK with the following standard parameters: Missing genotypes = 0, Minor Allele Frequency (MAF) ≥ 0.05, Hardy-Weinberg Equilibrium (HWE) p-value > 0.0001. Genotype data was restricted to autosomal chromosomes, as association analysis of genotypes of the variants on sex chromosomes requires special approaches in cohorts including both men and women, such as ours [41].
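To make the three filters concrete, the sketch below applies them to the genotype counts of a single SNP. It is illustrative only: the function names are hypothetical, and the Hardy-Weinberg check uses a 1-degree-of-freedom chi-square approximation as a stand-in, whereas PLINK applies an exact test, so p-values near the cutoff may differ.

```python
import math


def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """MAF from genotype counts (AA, AB, BB)."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    freq_a = (2 * n_aa + n_ab) / total_alleles
    return min(freq_a, 1.0 - freq_a)


def hwe_chi_square_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Approximate HWE p-value (chi-square, 1 df); not PLINK's exact test."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(n_aa: int, n_ab: int, n_bb: int, n_missing: int,
              maf_min: float = 0.05, hwe_min: float = 1e-4) -> bool:
    """Apply the three filters described above to one SNP."""
    if n_missing > 0 or (n_aa + n_ab + n_bb) == 0:
        return False
    if minor_allele_frequency(n_aa, n_ab, n_bb) < maf_min:
        return False  # also guards against monomorphic SNPs
    return hwe_chi_square_p(n_aa, n_ab, n_bb) > hwe_min


print(passes_qc(25, 50, 25, n_missing=0))  # counts in exact HWE proportions
```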

The SNP dataset, once extracted, was then subject to pruning. Linkage Disequilibrium (LD)-based pruning was performed using the PLINK commands --indep-pairwise 50 5 0.8 and --extract [42]. This removed one SNP of each pair of SNPs with an r² greater than 0.8, within a window of 50 SNPs, moving the window by 5 SNPs each step. As a result, 384,415 SNPs were included in the final data set.
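The windowed pruning logic can be sketched as below. This is a simplified stand-in for PLINK's procedure, not a reimplementation: r² is taken as the squared Pearson correlation of additive genotype dosages (0, 1, 2), and, as an assumption made for brevity, the later SNP of each high-LD pair is always the one removed. All names are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def genotype_r2(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two genotype dosage vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov * cov / (var_x * var_y)


def ld_prune(genotypes: List[Sequence[float]], window: int = 50,
             step: int = 5, r2_max: float = 0.8) -> List[int]:
    """Return indices of SNPs kept after greedy sliding-window pruning.

    genotypes holds one dosage vector per SNP, in chromosomal order.
    """
    removed = set()
    n = len(genotypes)
    start = 0
    while start < n:
        idx = [i for i in range(start, min(start + window, n))
               if i not in removed]
        for pos, i in enumerate(idx):
            if i in removed:
                continue
            for j in idx[pos + 1:]:
                if j not in removed and \
                        genotype_r2(genotypes[i], genotypes[j]) > r2_max:
                    removed.add(j)  # simplification: drop the later SNP
        start += step
    return [i for i in range(n) if i not in removed]


toy = [[0, 1, 2, 1, 0, 2], [0, 1, 2, 1, 0, 2], [2, 0, 1, 0, 2, 1]]
print(ld_prune(toy))
```

In the toy data the first two SNPs are identical (r² = 1), so only one of them is retained.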

PLINK epistasis function

Epistasis analysis was performed using PLINK 1.9 [39] and the --epistasis command. With this command, PLINK performs logistic regression analysis for every pair of SNPs in the dataset. PLINK was chosen for this analysis due to its speed, robustness, memory efficiency, and simplicity. The output of the --epistasis command was retrieved and stored in two files (an epi.cc and an epi.cc.summary file). As a result of this epistasis analysis, 73,887,253,905 interactions were tested.
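The number of tests follows directly from the number of SNPs, since every unordered pair is tested once. The short calculation below reproduces it and derives the corresponding Bonferroni threshold, assuming a family-wise alpha of 0.05 (the alpha level is an assumption; it is not stated explicitly here).

```python
import math

n_snps = 384_415                      # SNPs remaining after LD pruning
n_pairs = math.comb(n_snps, 2)        # unordered SNP pairs tested
alpha = 0.05                          # assumed family-wise error rate
bonferroni_threshold = alpha / n_pairs

smallest_reported_p = 2.14e-9         # lowest interaction p-value reported

print(n_pairs)
print(f"{bonferroni_threshold:.2e}")
print(smallest_reported_p < bonferroni_threshold)
```

With roughly 7.4 × 10¹⁰ tests, the threshold is on the order of 10⁻¹³, which is why even the smallest interaction p-value in the top 20 does not survive correction.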

Statistical analyses

After Bonferroni correction for multiple testing, none of the interaction p-values remained significant. We report the 20 SNP pairs identified in the epi.cc output file with the lowest interaction p-values (“top 20 SNP pairs”). Genotype data was extracted for these pairs using an additive genetic model with the PLINK command --recodeA.

We constructed univariable Cox regression models for the top SNPs as well as two-SNP Cox regression models (with interaction term) adjusting for clinical covariates of location (rectum vs. colon), stage (II vs I and III vs I), adjuvant chemotherapy (yes vs no), and baseline radiotherapy (yes vs no) [38]. Time to recurrence data was utilized in Cox regression analyses. During these latter analyses, 11 patients who were censored, and hence removed from the PLINK analysis, were added back to the dataset in order to limit bias. Statistical analyses were done using SPSS (IBM; version 29.0.0.0 (241) for Windows). Bonferroni-adjusted significance thresholds are p < 0.00128 and p < 0.0025 for univariable and multivariable Cox regression analyses, respectively.
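The two thresholds are consistent with dividing 0.05 by 39 tests (the number of unique SNPs) and by 20 tests (the number of SNP pairs); these divisors are an inference from the reported values rather than a statement in the text. The sketch below reproduces that arithmetic and shows how an interaction term enters the log-hazard of a two-SNP Cox model under additive coding. The coefficient values are hypothetical and chosen only for illustration.

```python
def bonferroni(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


univariable_threshold = bonferroni(0.05, 39)    # one test per unique SNP (inferred)
multivariable_threshold = bonferroni(0.05, 20)  # one test per SNP pair (inferred)


def snp_log_hazard(g1: int, g2: int, b1: float, b2: float, b12: float) -> float:
    """SNP part of the Cox linear predictor, relative to g1 = g2 = 0.

    g1, g2 are minor-allele counts (0, 1, 2); b12 is the coefficient of
    the product term g1 * g2, i.e. the interaction being tested.
    Clinical covariates would add further terms to this sum.
    """
    return b1 * g1 + b2 * g2 + b12 * g1 * g2


print(round(univariable_threshold, 5), multivariable_threshold)
# Hypothetical coefficients: weak main effects, stronger interaction.
print(snp_log_hazard(1, 2, b1=0.1, b2=0.2, b12=0.5))
```

When b12 is zero the two SNPs contribute independently; a non-zero b12 means the effect of one genotype changes with the other, which is what the interaction p-value assesses.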

Analysis of the interactions in identified genomic regions

As LD-pruning was performed, which removes one SNP of each high-LD pair from the dataset, only a subset of SNPs in our dataset were directly included in our initial PLINK epistasis analysis. In order to address the possibility that interactions involving the pruned SNPs may have stronger associations with RMFS than the interactions identified in our initial analysis, a subsequent analysis was performed. SNPs which were in LD with those identified in the top 20 interactions of our initial epistasis analysis (PLINK r2 inter-chr function) were used to form a new dataset, along with the 39 unique SNPs from the top 20 interactions (one SNP appeared in two interactions). The PLINK epistasis function was again run using this new dataset. There were 30 models found with a p-value ≤ 1.37E-8 (the lowest p-value identified in the initial PLINK interaction analysis). These models included the original 20 interactions, leaving 10 new models for us to consider. In the latter models, each model included one SNP in common with an original interaction set and a new SNP (that is in high-LD with the second SNP from the original interaction). Among those, only three new alternative interactions had a PLINK interaction p-value lower than the original interaction which it corresponded to – these are noted in this manuscript. For each of these alternative interactions, 2-SNP (with interaction term) Cox regression analyses adjusting for covariates were performed as described above.

Bioinformatics tools and databases

In order to explore the biological knowledge available about the identified loci, we utilized published resources including RegulomeDB [43], GTEx [44], and PUBMED. RegulomeDB ranks variants based on functionality using a scoring system (rank scores vary between 1–7), where with decreasing rank score, the evidence for functionality increases. For example, a ranking score of 1 means the variant is at least an expression quantitative trait/chromatin accessibility quantitative trait locus (eQTL/caQTL) and a transcription factor binding site (https://regulomedb.org/regulome-help/), and hence, is more likely to have functional consequences compared to a variant with a higher ranking score. GTEx, on the other hand, includes expression and eQTL data for normal tissues. Since GTEx does not include data from rectum, here we present the GTEx data extracted for sigmoid and transverse colon only. We note that since none of the interactions remained significant after correction for multiple testing (see Results), the results of these analyses should rather be taken as explorative in terms of how they relate to colorectal cancer outcomes, but not definitive.

Results

The flow-chart of the study workflow is shown in Fig 1. After applying the Bonferroni correction, none of the interactions identified by PLINK remained significant. The top 20 pairs of SNPs with the lowest interaction p-values identified by PLINK are shown in Table 2. Except for two cases, interacting SNPs were located on different chromosomes. One SNP (rs742257 on chromosome 1) was identified in two interacting SNP sets.

Table 2. PLINK epistasis analysis results for the top 20 interactions (sorted by p-value).

https://doi.org/10.1371/journal.pone.0321967.t002

Fig 1. The study workflow.

LD: Linkage Disequilibrium; RMFS: Recurrence-Metastasis Free Survival; SNP: Single Nucleotide Polymorphism.

https://doi.org/10.1371/journal.pone.0321967.g001

We performed univariable Cox regression (for each SNP; Table 3) and multivariable Cox regression analyses (for each SNP pair; Table 4) to examine the SNPs’ relationships with the outcome. No SNP was individually associated with RMFS in the univariable Cox regression analysis (Bonferroni-adjusted significance threshold: p < 0.00128). This is expected, as interacting SNPs are not expected to be associated with the outcome individually. In the multivariable models, all interactions had p-values varying between 1.04E-5 and 3.48E-10 (Bonferroni-corrected significance threshold: p < 0.0025).

Table 3. Univariable Cox regression analysis results for the SNPs in the top 20 list.

https://doi.org/10.1371/journal.pone.0321967.t003

Table 4. Multivariable Cox regression analysis results for the SNPs and interactions identified in this study.

https://doi.org/10.1371/journal.pone.0321967.t004

We next examined whether there were interactions with smaller p-values that may have been missed because of SNP pruning, within the genomic regions captured by the identified SNPs. As a result, three new interacting SNP pairs were found (also called “alternative” sets or models in this manuscript). In each of these new models, one of the SNPs was shared between the original and the alternative model: rs4678497-rs7212295 as an alternative to rs4678497-rs12601535; rs6757680-kgp11888590 as an alternative to rs6757680-rs9305669; and rs4485715-rs914491 as an alternative to rs9855001-rs914491. In multivariable Cox regression analyses, two of these interactions had a p-value lower than the original interaction (Table 4).

S1 Table summarizes the MAFs, genetic locations, associated genes, RegulomeDB rankings, and eQTL information for the SNPs. Many of the SNPs were intronic SNPs and two SNPs were located in coding regions (rs12601535, a synonymous substitution located in MYO18A, and rs9671369, a missense variant in the SYNE3 gene). Twenty SNPs identified in this study were highly likely to be functional (based on RegulomeDB [43] rank of 1a-1f; S1 Table). Interestingly, in some cases both of the interacting SNPs were likely functional (Table 2).

Based on the GTEx [44] data, four of the SNPs in the list were eQTLs in sigmoid and/or transverse colon tissues (eQTLs are variants that are associated with the expression levels of certain genes [43]): rs2247213 (an eQTL for LINC01352, HLX-AS1, and RP11-295M18.6), rs6056615 (RSPO4), rs12436380 (TRIM9), and rs35579818 (TMEM80, HRAS, and RNH1). Additionally, rs9984518, identified in one of the alternative SNP sets, was found to be an eQTL for WRB in transverse colon. No entry was returned when the PUBMED and dbCPCO [45] databases were searched with the IDs of the SNPs identified in this study. Lastly, a literature search showed that many of the genes identified in this study were previously linked to colorectal cancer development, progression, or outcomes (see Discussion).

Discussion

Colorectal cancer is a common disease worldwide [1–3] and in Canada [46]. New prognostic markers can help predict prognosis, improve understanding of the biological reasons for variable prognosis among patients, and aid in the development of better therapeutic agents and prognostic tools. In this study, by following a robust and easy-to-apply approach, we aimed to explore the SNP-SNP interactions that are associated with the local or distant recurrence-free survival times in a cohort of colorectal cancer patients from Canada.

Prognostic GWASs are widely performed studies. While they are quite popular, their analytical abilities are restricted to analysis of the relationship of individual variants to an outcome measure. This may potentially lead to missing at least a part of the “heritability” component of prognostic variability, which can be explained by interacting loci [11–14]. Previously, our group [30,31] and others [33–35] have seen this possibility and applied interaction analyses at candidate gene or pathway settings in colorectal cancer. A genome-wide interaction analysis, as we have done here, however, has not been done in colorectal cancer before. Our analysis explored candidate SNP-SNP interactions in relation to recurrence in colorectal cancer. While further studies are needed to validate these findings and the accuracy of the models, and assess the potential short-term and long-term effects of the SNP interactions, we invite all researchers with genetic and RMFS data to test our findings in their datasets.

There may also be a timely opportunity to investigate or confirm the biological features of the SNPs reported in this manuscript, and their relation to disease recurrence in colorectal cancer, using experimental approaches. Note that many of the SNPs highlighted in our study were estimated to affect biological functions/gene regulation. In some cases, both SNPs in the interaction set were predicted to be functional (e.g., based on the RegulomeDB scores or GTEx eQTL information). These SNPs are likely “lead SNPs” [47] that can be prioritized for further studies. In addition, a literature search showed that some of the genes in which the SNPs identified herein were located (e.g., LAMB3 [48,49]; SYNE3 [50]; HRAS [51,52]; DIAPH3 [53]; NLRP3 [54,55]; PEBP4 [56]; GABRG3 [57]; LGALS8 [58]; OPCML [59]; MYO18A [60]; HAUS6 [61,62]; HLX [63,64]) and some of the genes that were linked to eQTLs identified herein (e.g., HRAS [51,52]) were previously linked to colorectal cancer. Altogether, this information strengthens the potential biological connections of some of the SNPs identified in this study with colorectal cancer.

The key strengths and limitations of this study can be summarized as follows. PLINK [39] is a fast and robust tool and it handled our large dataset reasonably quickly (less than a day). However, the PLINK epistasis function currently cannot adjust for covariates and can examine two-SNP interactions only. Therefore, PLINK misses higher-order interactions (3-SNP and higher) and requires follow-up analyses with multivariable modeling to examine whether interactions are independent of established clinical covariates. There is also a need for robust tools that can handle time-to-event analyses while examining large numbers of interactions. The cohort for this study has a long follow-up time, but it included only Caucasian patients, and as such, results may not be applicable to other ethnicities. X-linked and Y-linked SNPs were excluded from the analysis; therefore, additional information can be gained by examining the variants on these chromosomes in future studies. The SNP dataset included common SNPs with a MAF of at least 5%; hence, interactions involving rare variants also remain to be examined. Our cohort size was small and none of the p-values provided by PLINK remained significant after the Bonferroni correction. Therefore, further research in bigger cohorts with comparable characteristics (e.g., ancestry, treatment features) is required to test the results of this hypothesis-generating study before any of these interactions can be included in a prognostic model in the clinic. Lastly, this is the first time such an extensive interaction analysis has been performed in colorectal cancer, and as such, our results encourage new ways to examine prognostic biomarkers beyond the traditional GWAS approach.

Conclusions

Interaction analyses can identify the collective relations of multiple variables to a phenotype. While conducting genome-wide interaction studies is challenging, a few computational methods have been developed that can be utilized for such purposes. Here we present such a study performed using PLINK, the first genome-wide SNP-SNP interaction analysis in colorectal cancer outcomes. Our study brings new depth to prognostic research, can inform future studies, and is expected to inspire other groups to integrate interaction analyses in their prognostic studies. Further studies building on our findings can also advance prognostic research, identify new interactions, and help address the missing heritability in colorectal cancer prognosis.

Supporting information

S1 Table. Information about the SNPs identified in this study.

Chr: Chromosome; eQTL: expression Quantitative Trait Locus; MAF: Minor Allele Frequency; SNP: Single Nucleotide Polymorphism. * This SNP shares the same location as kgp5016729. ** This SNP shares the same location as kgp11888590. μAlternative interacting SNP identified through analysis of the genomic regions where the original top 20 interacting SNPs were located. £Reference genome: hg19.

https://doi.org/10.1371/journal.pone.0321967.s001

(PDF)

Acknowledgments

We are indebted to the patients recruited to the NFCCR. We thank the NFCCR staff, investigators, and NL Tumor Registry staff for helping collect the data used for this study. SS is a senior scientist of the Beatrice Hunter Cancer Research Institute (BHCRI).

References

  1. 1. Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2018;68(6):394–424. pmid:30207593
  2. 2. Arnold M, Rutherford MJ, Bardot A, Ferlay J, Andersson TML, Myklebust TÅ. Progress in cancer survival, mortality, and incidence in seven high-income countries 1995–2014 (ICBP SURVMARK-2): a population-based study. Lancet Onc. 2019;20(11):1493–505.
  3. 3. Ferlay J, Soerjomataram I, Dikshit R, Eser S, Mathers C, Rebelo M, et al. Cancer incidence and mortality worldwide: sources, methods and major patterns in GLOBOCAN 2012. Int J Cancer. 2015;136(5):E359-86. pmid:25220842
  4. 4. Savas S, Liu G. Genetic variations as cancer prognostic markers: review and update. Hum Mutat. 2009;30(10):1369–77.
  5. 5. Labadie JD, Savas S, Harrison TA, Banbury B, Huang Y, Buchanan DD, et al. Genome-wide association study identifies tumor anatomical site-specific risk variants for colorectal cancer survival. Sci Rep. 2022;12(1):127. pmid:34996992
  6. 6. Yu Y, Werdyani S, Carey M, Parfrey P, Yilmaz YE, Savas S. A comprehensive analysis of SNPs and CNVs identifies novel markers associated with disease outcomes in colorectal cancer. Mol Oncol. 2021;15(12):3329–47. pmid:34309201
  7. 7. Xu W, Xu J, Shestopaloff K, Dicks E, Green J, Parfrey P, et al. A genome wide association study on Newfoundland colorectal cancer patients’ survival outcomes. Biomark Res. 2015;3:6. pmid:25866641
  8. 8. Pander J, van Huis-Tanja L, Böhringer S, van der Straaten T, Gelderblom H, Punt C, et al. Genome Wide Association Study for Predictors of Progression Free Survival in Patients on Capecitabine, Oxaliplatin, Bevacizumab and Cetuximab in First-Line Therapy of Metastatic Colorectal Cancer. PLoS One. 2015;10(7):e0131091. pmid:26222057
  9. 9. Penney ME, Parfrey PS, Savas S, Yilmaz YE. A genome-wide association study identifies single nucleotide polymorphisms associated with time-to-metastasis in colorectal cancer. BMC Cancer. 2019;19(1):133. pmid:30738427
  10. 10. Phipps AI, Passarelli MN, Chan AT, Harrison TA, Jeon J, Hutter CM, et al. Common genetic variation and survival after colorectal cancer diagnosis: a genome-wide analysis. Carcinogenesis. 2016;37(1):87–95.
  11. 11. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, et al. Finding the missing heritability of complex diseases. Nature. 2009;461(7265):747–53. pmid:19812666
  12. 12. Zuk O, Hechter E, Sunyaev SR, Lander ES. The mystery of missing heritability: genetic interactions create phantom heritability. Proc National Acad Sciences of the United States of America. 2012;109(4):1193–8.
  13. 13. Domingo J, Baeza-Centurion P, Lehner B. The causes and consequences of genetic interactions (epistasis). Annu Rev Genom Hum Genet. 2019;20(1):433–60.
  14. 14. Uppu S, Krishna A, Gopalan RP. A Review on Methods for Detecting SNP Interactions in High-Dimensional Genomic Data. IEEE/ACM Trans Comput Biol Bioinform. 2018;15(2):599–612. pmid:28060710
  15. 15. Song YS, Wang F, Slatkin M. General epistatic models of the risk of complex diseases. Genetics. 2010;186(4):1467–73. pmid:20855570
  16. 16. Hemani G, Knott S, Haley C. An evolutionary perspective on epistasis and the missing heritability. PLoS Genet. 2013;9(2):e1003295. pmid:23509438
  17. 17. Diaz-Gallo L-M, Brynedal B, Westerlind H, Sandberg R, Ramsköld D. Understanding interactions between risk factors, and assessing the utility of the additive and multiplicative models through simulations. PLoS One. 2021;16(4):e0250282. pmid:33901204
  18. 18. Chen G-B, Xu Y, Xu H-M, Li MD, Zhu J, Lou X-Y. Practical and theoretical considerations in study design for detecting gene-gene interactions using MDR and GMDR approaches. PLoS One. 2011;6(2):e16981. pmid:21386969
  19. 19. Bochdanovits Z, Sondervan D, Perillous S, van Beijsterveldt T, Boomsma D, Heutink P. Genome-wide prediction of functional gene-gene interactions inferred from patterns of genetic differentiation in mice and men. PLoS One. 2008;3(2):e1593. pmid:18270580
  20. 20. Lou X-Y, Chen G-B, Yan L, Ma JZ, Zhu J, Elston RC, et al. A generalized combinatorial approach for detecting gene-by-gene and gene-by-environment interactions with application to nicotine dependence. Am J Hum Genet. 2007;80(6):1125–37. pmid:17503330
  21. 21. Motsinger AA, Ritchie MD. Multifactor dimensionality reduction: An analysis strategy for modelling and detecting gene - gene interactions in human genetics and pharmacogenomics studies. Hum Genomics. 2006;2(5):318.
  22. 22. Motsinger AA, Ritchie MD, Dobrin SE. Clinical applications of whole-genome association studies: future applications at the bedside. Expert Rev Mol Diagn. 2006;6(4):551–65. pmid:16824029
  23. 23. Ritchie MD, Hahn LW, Roodi N, Bailey LR, Dupont WD, Parl FF. Multifactor-Dimensionality Reduction Reveals High-Order Interactions Among Estrogen-Metabolism Genes in Sporadic Breast Cancer. Am J Hum Genet. 2001;69(1):138–47.
  24. 24. Greener JG, Kandathil SM, Moffat L, Jones DT. A guide to machine learning for biologists. Nat Rev Mol Cell Biol. 2022;23(1):40–55. pmid:34518686
  25. 25. Al-Rajab M, Lu J, Xu Q. Examining applying high performance genetic data feature selection and classification algorithms for colon cancer diagnosis. Comput Methods Programs Biomed. 2017;146:11–24. pmid:28688481
  26. 26. Lunetta KL, Hayward LB, Segal J, Van Eerdewegh P. Screening large-scale association study data: exploiting interactions using random forests. BMC Genet. 2004;5:32. pmid:15588316
  27. 27. Lee S, Kwon MS, Oh JM, Park T. Gene-gene interaction analysis for the survival phenotype based on the Cox model. Bioinformatics. 2012 Sep 15;28(18):i582–8.
  28. Shibata M, Terada A, Kawaguchi T, Kamatani Y, Okada D, Nagashima K, et al. Identification of epistatic SNP combinations in rheumatoid arthritis using LAMPLINK and Japanese cohorts. J Hum Genet. 2024;69(10):541–7. pmid:39014190
  29. Musolf AM, Holzinger ER, Malley JD, Bailey-Wilson JE. What makes a good prediction? Feature importance and beginning to open the black box of machine learning in genetics. Hum Genet. 2022;141(9):1515–28. pmid:34862561
  30. Curtis A, Yu Y, Carey M, Parfrey P, Yilmaz YE, Savas S. Examining SNP-SNP interactions and risk of clinical outcomes in colorectal cancer using multifactor dimensionality reduction based methods. Front Genet. 2022;13:902217. pmid:35991579
  31. Curtis AA, Yu Y, Carey M, Parfrey P, Yilmaz YE, Savas S. Multifactor dimensionality reduction method identifies novel SNP interactions in the WNT protein interaction networks that are associated with recurrence risk in colorectal cancer. Front Oncol. 2023;13:1122229. pmid:36998434
  32. Afzal S, Gusella M, Jensen SA, Vainer B, Vogel U, Andersen JT, et al. The association of polymorphisms in 5-fluorouracil metabolism genes with outcome in adjuvant treatment of colorectal cancer. Pharmacogenomics. 2011;12(9):1257–67. pmid:21919605
  33. Pander J, Wessels JAM, Gelderblom H, van der Straaten T, Punt CJA, Guchelaar HJ. Pharmacogenetic interaction analysis for the efficacy of systemic treatment in metastatic colorectal cancer. Ann Oncol. 2011;22(5).
  34. Sarac SB, Rasmussen CH, Afzal S, Thirstrup S, Jensen SA, Colding-Jørgensen M, et al. Data-driven assessment of the association of polymorphisms in 5-Fluorouracil metabolism genes with outcome in adjuvant treatment of colorectal cancer. Basic Clin Pharmacol Toxicol. 2012;111(3):189–97. pmid:22448752
  35. Hu X, Qin W, Li S, He M, Wang Y, Guan S. Polymorphisms in DNA repair pathway genes and ABCG2 gene in advanced colorectal cancer: correlation with tumor characteristics and clinical outcome in oxaliplatin-based chemotherapy. Cancer Manag Res. 2018;11:285–97.
  36. Woods MO, Younghusband HB, Parfrey PS, Gallinger S, McLaughlin J, Dicks E. The genetic basis of colorectal cancer in a population-based incident cohort with a high rate of familial disease. Gut. 2010;59(10):1369–77.
  37. Negandhi AA, Hyde A, Dicks E, Pollett W, Younghusband BH, Parfrey P, et al. MTHFR Glu429Ala and ERCC5 His46His polymorphisms are associated with prognosis in colorectal cancer patients: analysis of two independent cohorts from Newfoundland. PLoS One. 2013;8(4):e61469. pmid:23626689
  38. Yu Y, Carey M, Pollett W, Green J, Dicks E, Parfrey P, et al. The long-term survival characteristics of a cohort of colorectal cancer patients and baseline variables associated with survival outcomes with or without time-varying effects. BMC Med. 2019;17(1):150. pmid:31352904
  39. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559–75. pmid:17701901
  40. Schemper M, Smith TL. A note on quantifying follow-up in studies of failure time. Control Clin Trials. 1996;17(4):343–6. pmid:8889347
  41. Xu W, Hao M. A unified partial likelihood approach for X-chromosome association on time-to-event outcomes. Genet Epidemiol. 2018;42(1):80–94.
  42. Génin E, Schumacher M, Roujeau J-C, Naldi L, Liss Y, Kazma R, et al. Genome-wide association study of Stevens-Johnson Syndrome and Toxic Epidermal Necrolysis in Europe. Orphanet J Rare Dis. 2011;6:52. pmid:21801394
  43. Boyle AP, Hong EL, Hariharan M, Cheng Y, Schaub MA, Kasowski M, et al. Annotation of functional variation in personal genomes using RegulomeDB. Genome Res. 2012;22(9):1790–7. pmid:22955989
  44. GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat Genet. 2013;45(6):580–5. pmid:23715323
  45. Savas S, Younghusband HB. dbCPCO: a database of genetic markers tested for their predictive and prognostic value in colorectal cancer. Hum Mutat. 2010;31(8):901–7. pmid:20506273
  46. Canadian Cancer Society’s Advisory Committee on Cancer Statistics. Canadian Cancer Statistics 2017. Canadian Cancer Society; Toronto, ON: 2017. Available at: cancer.ca/Canadian-Cancer-Statistics-2017-EN.pdf
  47. Schaub MA, Boyle AP, Kundaje A, Batzoglou S, Snyder M. Linking disease associations with regulatory information in the human genome. Genome Res. 2012;22(9):1748–59. pmid:22955986
  48. Zhu Z, Song J, Guo Y, Huang Z, Chen X, Dang X, et al. LAMB3 promotes tumour progression through the AKT-FOXO3/4 axis and is transcriptionally regulated by the BRD2/acetylated ELK4 complex in colorectal cancer. Oncogene. 2020;39(24):4666–80. pmid:32398865
  49. Qin H, Zhang H, Li H, Xu Q, Sun W, Zhang S, et al. Prognostic risk analysis related to radioresistance genes in colorectal cancer. Front Oncol. 2023;12:1100481. pmid:36741692
  50. Wang X-C, Yue X, Zhang R-X, Liu T-Y, Pan Z-Z, Yang M-J, et al. Genome-wide RNAi Screening Identifies RFC4 as a Factor That Mediates Radioresistance in Colorectal Cancer by Facilitating Nonhomologous End Joining Repair. Clin Cancer Res. 2019;25(14):4567–79. pmid:30979744
  51. Feng J, Hua F, Shuo R, Chongfeng G, Huimian X, Nakajima T. Upregulation of non-mutated H-ras and its upstream and downstream signaling proteins in colorectal cancer. Oncol Rep. 2001;8(6):1409–13.
  52. Boidot R, Chevrier S, Julie V, Ladoire S, Ghiringhelli F. HRAS G13D, a new mutation implicated in the resistance to anti-EGFR therapies in colorectal cancer, a case report. Int J Colorectal Dis. 2016;31(6):1245–6. pmid:26561417
  53. Huang R, Wu C, Wen J, Yu J, Zhu H, Yu J, et al. DIAPH3 is a prognostic biomarker and inhibit colorectal cancer progression through maintaining EGFR degradation. Cancer Med. 2022;11(23):4688–702. pmid:35538918
  54. Wang B, Li H, Wang X, Zhu X. The association of aberrant expression of NLRP3 and p-S6K1 in colorectal cancer. Pathol Res Pract. 2020;216(1):152737.
  55. Shi F, Wei B, Lan T, Xiao Y, Quan X, Chen J, et al. Low NLRP3 expression predicts a better prognosis of colorectal cancer. Biosci Rep. 2021;41(4):BSR20210280. pmid:33821998
  56. Liu H, Kong Q, Li B, He Y, Li P, Jia B. Expression of PEBP4 protein correlates with the invasion and metastasis of colorectal cancer. Tumor Biol. 2012;33(1):267–73.
  57. Yan L, Gong Y-Z, Shao M-N, Ruan G-T, Xie H-L, Liao X-W, et al. Distinct diagnostic and prognostic values of γ-aminobutyric acid type A receptor family genes in patients with colon adenocarcinoma. Oncol Lett. 2020;20(1):275–91. pmid:32565954
  58. Nagy N, Bronckart Y, Camby I, Legendre H, Lahm H, Kaltner H. Galectin-8 expression decreases in cancer compared with normal and dysplastic human colon tissue and acts significantly on human colon cancer cell migration as a suppressor. Gut. 2002;50(3):392–401.
  59. Li C, Tang L, Zhao L, Li L, Xiao Q, Luo X, et al. OPCML is frequently methylated in human colorectal cancer and its restored expression reverses EMT via downregulation of smad signaling. Am J Cancer Res. 2015;5(5):1635–48. pmid:26175934
  60. Lin P-C, Yeh Y-M, Lin B-W, Lin S-C, Chan R-H, Chen P-C, et al. Intratumor Heterogeneity of MYO18A and FBXW7 Variants Impact the Clinical Outcome of Stage III Colorectal Cancer. Front Oncol. 2020;10:588557. pmid:33194745
  61. Zhu Q, Huang X, Yu S, Shou L, Zhang R, Xie H, et al. Identification of genes modified by N6-methyladenosine in patients with colorectal cancer recurrence. Front Genet. 2022;13:1043297. pmid:36324506
  62. Shen A, Liu L, Huang Y, Shen Z, Wu M, Chen X, et al. Down-Regulating HAUS6 Suppresses Cell Proliferation by Activating the p53/p21 Pathway in Colorectal Cancer. Front Cell Dev Biol. 2022;9:772077. pmid:35096810
  63. Hollington P, Neufing P, Kalionis B, Waring P, Bentel J, Wattchow D. Expression and localization of homeodomain proteins DLX4, HB9 and HB24 in malignant and benign human colorectal tissues. Anticancer Res. 2004;24(2B):955–62.
  64. Chen S, Zhang L, Wang K, Huo J, Zhang S, Zhang X. The Potential Dual Role of H2.0-like Homeobox in the Tumorgenesis and Development of Colorectal Cancer and Its Prognostic Value. Can J Gastroenterol Hepatol. 2023;2023:5521544. pmid:37719132