Abstract
Geography and geospatial data science hold the potential to make unique contributions to the reduction of the burden of cancer on society. Here we use colorectal cancer (CRC) as an example to show how spatial insights into CRC risk factors and priority areas for screening may be obtained to achieve geographically targeted screening. We obtained data from the UK Biobank and divided the participants into the older (50 ≤ age < 70) and young (age < 50) adult groups. The data consist of 2,080 CRC cases and 8,062 controls. We used a case-control study design and geographically weighted logistic regression (GWLR) to explore spatial variations in risk levels of significant factors at a fine geographic resolution. Analysis results reveal that, among all significant risk factors, polygenic risk score (PRS) is the most important risk factor for both age groups. Findings suggest that the top priority screening areas for older adults, using PRS as the sole risk factor, are between Sheffield, Birmingham, Cardiff, Bristol, and west of Greater London. For young adults, the top priority areas are between the south of Glasgow and Edinburgh and northwest of Greater London. Furthermore, the approach used in this study holds promise for developing more effective targeted cancer screening.
Citation: Yang M, Narasimhan VM, Zhan FB (2025) Spatial insights into colorectal cancer risk factors and priority areas for screening in the United Kingdom based on data from the UK Biobank. PLoS One 20(7): e0328778. https://doi.org/10.1371/journal.pone.0328778
Editor: Momiao Xiong, University of Texas School of Public Health, UNITED STATES OF AMERICA
Received: November 4, 2024; Accepted: July 4, 2025; Published: July 23, 2025
Copyright: © 2025 Yang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The data underlying this study are available from the UK Biobank. Due to restrictions outlined in the material transfer agreement with the UK Biobank, the data cannot be shared publicly. However, researchers may apply for access to the data directly through the UK Biobank’s application process at https://www.ukbiobank.ac.uk/enable-your-research/apply-for-access. Access is granted to qualified researchers following institutional approval and adherence to the UK Biobank’s data use policies.
Funding: FBZ was partly supported by the Cancer Prevention and Research Institute of Texas (CPRIT). VMN was supported by a grant from the Allen Discovery Center program, a Paul G. Allen Frontiers Group advised program of the Paul G. Allen Family Foundation, and a Good Systems for Ethical AI grant from the University of Texas at Austin. Compute resources were supported by a Director’s Discretionary Award from the Texas Advanced Computing Cluster. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. There was no additional external funding received for this study.
Competing interests: The authors have declared that no competing interests exist.
Introduction
The societal burden of cancer is enormous and growing. According to estimates from the International Agency for Research on Cancer, there were 20 million new cancer diagnoses and 9.7 million deaths worldwide in 2022 alone [1]. It is now well recognized that nearly half of cancers are preventable [2]. One effective way of preventing some cancers is screening. We argue here that geography and geospatial data science can make unique contributions to the war against cancer by providing insights into priority geographic areas and populations for targeted screening [3]. Colorectal cancer (CRC) is one of the three major cancers worldwide [1]. In this study, we use data from the UK Biobank to show how we can gain insights into the risk factors and priority areas that can be used to achieve geographically targeted CRC screening.
CRC is mostly a preventable disease, but it remains the second leading cause of cancer-related deaths, with more than 900,000 deaths each year worldwide [4]. In the United Kingdom (UK), approximately 42,900 new cases are diagnosed annually [5–7], and the CRC incidence rate among young adults has increased rapidly in recent decades [8]. The incidence and mortality rates of CRC vary widely across countries, including the UK [4]. CRC screening is effective in reducing the burden of the disease [9]. An important step in developing effective CRC screening programs is the identification of risk factors associated with CRC and the delineation of priority geographic areas for targeted screening [3,10].
Previous studies explored whether CRC-associated risk factors can explain the geographic variations in CRC incidence [11,12]. Oyeyemi et al. [12] found that lifestyle-related CRC risk factors, such as smoking, education, and fruit and vegetable intake, are associated with CRC but these researchers did not explore the spatial variation of CRC incidence. Dagne [11] used Bayesian spatial models to identify clustering and hotspots of high CRC incidence rates in several Florida counties. However, this study did not examine the spatial variation of CRC-related risk factors.
Geographically weighted logistic regression (GWLR) models have been developed to explore the spatial variation of the occurrence of some diseases and possible association between these diseases and various factors. Some examples of these studies include examination of the drivers of leptospirosis [13], lung cancer risks [14], chronic obstructive pulmonary disease risk factors [15], and the presence or absence of larvae [16]. However, GWLR has hardly been used in the analysis of CRC to explore the spatial variations of its risk factors.
While genetic factors are recognized as playing an important role in CRC risk, they are rarely incorporated with other factors in spatial analyses reported in the literature. This study addresses this research gap and aims to provide spatial insights into risk factors associated with CRC and to identify priority areas for CRC screening, using genetic and other factors from the UK Biobank. In this discussion, spatial insights are geographic patterns derived from location-based observations about CRC cases in the UK and the factors that are associated with these patterns. We used a case-control study design and GWLR to identify significant risk factors associated with CRC and to explore the spatial variations in their risk levels across the UK. The overall approach and findings will contribute to the literature in targeted cancer screening and hence help reduce the burden of colorectal cancer and other cancers.
Materials and methods
Data source
In this study, we utilized data from the UK Biobank, consisting of participants from England, Scotland, and Wales aged 40–70 years, collected between 2006 and 2010. The variables included in the analysis were polygenic risk score (PRS), current employment status, sex, age, alcohol intake frequency, body mass index (BMI), current tobacco smoking status, index of multiple deprivation (IMD), average total household income for each individual, number of vehicles in household, maternal smoking around birth, and education level.
Study setting and participants
This case-control study involved 10,142 White British participants who were not genetically related beyond the second degree and had no family history of CRC. Cases were selected based on ICD-10 codes of C18.0-C18.9, C19, C20, and C26.0. Controls were selected from participants in the UK Biobank who had no cancer diagnosis, were within a 5-year age range of matching CRC cases, and were living in the same output area (OA) as the CRC cases. OAs are small geographic areas created by aggregating postcode areas. The final dataset included 2,080 CRC cases and 8,062 controls.
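The control-selection rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the record fields (`pid`, `oa`, `age`, `has_cancer`), the toy records, and the reading of "within a 5-year age range" as an absolute age difference of at most 5 years are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    pid: int          # hypothetical participant identifier
    oa: str           # output area code (toy values below)
    age: int          # age in years
    has_cancer: bool  # any cancer diagnosis on record


def eligible_controls(case, pool, max_age_gap=5):
    """Return cancer-free participants who live in the same output area as
    the case and whose age differs from the case's by at most max_age_gap
    years (an assumed reading of the matching rule)."""
    return [
        p for p in pool
        if p.pid != case.pid
        and not p.has_cancer
        and p.oa == case.oa
        and abs(p.age - case.age) <= max_age_gap
    ]


case = Participant(1, "OA_A", 62, True)
pool = [
    Participant(2, "OA_A", 60, False),  # same OA, 2 years apart: eligible
    Participant(3, "OA_A", 50, False),  # same OA, 12 years apart: excluded
    Participant(4, "OA_B", 61, False),  # different OA: excluded
    Participant(5, "OA_A", 63, True),   # has a cancer diagnosis: excluded
]
matched = eligible_controls(case, pool)
print([p.pid for p in matched])
```

In a real workflow the pool would come from the UK Biobank tables, and a sampling step would pick a fixed number of controls per case; neither is specified in the text, so both are left out here.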
We organized the cases and controls into two datasets using age 50 as the cutoff, based on prior literature [8,17]. Dataset 1 included 9,287 participants in the older group (aged 50 years or older), with 1,919 cases and 7,368 controls. Dataset 2 consisted of 855 participants in the younger group (under 50 years), with 161 cases and 694 controls. We used two datasets because the incidence of early-onset colorectal cancer (EO-CRC), defined as CRC diagnosed before age 50, has been rising at a concerning rate in recent decades, a trend without a known etiology [8,17,18]. We therefore wanted to examine risk factors and top priority areas for CRC screening for people younger than 50 using data from the UK Biobank. Fig 1 illustrates the geographic distribution of cases and controls for the older (Fig 1A) and the younger group (Fig 1B). Fig 2 shows the population density across the UK.
Fig 1. Geographic distribution of CRC cases and controls. (A) The older group (50 ≤ age < 70). (B) The younger group (age < 50). Shown on the maps are the densities of CRC cases and controls at the local authority level. (Note: Country and local authority boundary data were obtained from UK Census data, available through the UK Data Service: https://ukdataservice.ac.uk/help/data-types/census-data/).
Fig 2. Population density across the UK. (Note: UK boundary data were obtained from UK Census data, available through the UK Data Service: https://ukdataservice.ac.uk/help/data-types/census-data/).
Ethics approval was not required for this analysis because the UK Biobank data is accessible to all researchers. The data was fully de-identified before we accessed it. We did not have access to any information that could identify individual participants during or after data collection.
Polygenic risk score calculation
We calculated the PRS for each participant using the scoring function in PLINK 2.0 software [19], based on the 140 single nucleotide polymorphisms (SNPs) previously identified as risk variants for CRC among individuals of European ancestry [20]. The list of risk SNPs and their corresponding effect sizes on the risk of CRC can be found in Thomas et al. [20]. We used PRS in the analysis because it is widely used to explore the impact of genetic risk on CRC [20,21]. Genetic data in the UK Biobank were imputed using the Haplotype Reference Consortium panel. Directly genotyped SNPs were encoded as 0, 1, or 2, representing the number of risk allele copies, while imputed SNPs were represented as dosage values, reflecting the expected number of risk allele copies. For each CRC case and control, we began by extracting all relevant risk SNPs from the imputed genotype dataset. We then calculated the PRS as the sum of risk alleles for the respective variants, using imputed dosages for imputed SNPs and 0, 1, or 2 copies of the risk allele for genotyped SNPs. Following the categorization used by Jia et al. [21], we categorized the PRSs into a high PRS group (the highest 5% of PRSs) and a low PRS group (the remaining PRSs).
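The scoring and grouping steps described above can be illustrated with a short Python sketch. It is not the PLINK 2.0 workflow used in the study: the function names, the optional `weights` argument (a placeholder for per-SNP effect sizes), the tie-handling rule, and the toy dosages are all assumptions made for illustration.

```python
def polygenic_risk_score(dosages, weights=None):
    """Sum risk-allele dosages across SNPs. Each dosage is 0, 1, or 2 for a
    genotyped SNP, or an expected count in [0, 2] for an imputed SNP.
    weights is a hypothetical hook for per-SNP effect sizes; by default
    every SNP counts equally."""
    if weights is None:
        weights = [1.0] * len(dosages)
    return sum(w * d for w, d in zip(weights, dosages))


def label_high_prs(scores, top_percent=5):
    """Return one boolean per participant, True for those in the highest
    top_percent of scores. Ties at the cutoff are broken by input order,
    which is an arbitrary choice made for this sketch."""
    n_high = max(1, len(scores) * top_percent // 100)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    high = set(ranked[:n_high])
    return [i in high for i in range(len(scores))]


# Toy data: 20 participants, 3 SNPs each (one imputed, two genotyped).
toy_dosages = [[i % 3, 0.1 * i, 2 - (i % 2)] for i in range(20)]
scores = [polygenic_risk_score(d) for d in toy_dosages]
high_flags = label_high_prs(scores)
print(sum(high_flags), "participant(s) in the high PRS group")
```

With 20 toy participants, the top 5% corresponds to a single participant; with the study's sample sizes the same rule would flag several hundred.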
Regression models
In this study, we conducted GWLR analysis to explore the spatial variations of CRC risk associated with different factors in both the older and young adult groups. GWLR can be used to examine the spatial variation of the relationship between dependent and independent variables at the individual level and at different spatial scales. We used the GWmodel package [22] in R for the GWLR analysis and ArcGIS Pro to map the GWLR results. Statistical significance was defined as a two-tailed p-value less than 0.05. The formula for the GWLR model is shown in Equation 1:
$$y_i = \beta_{i0} + \sum_{k=1}^{m} \beta_{ik} x_{ik} + \varepsilon_i \qquad (1)$$
Where $y_i$ is the dependent variable at location $i$; $x_{ik}$ is the value of the $k$th independent variable at location $i$; $m$ is the number of independent variables; $\beta_{i0}$ is the intercept parameter at location $i$; $\beta_{ik}$ is the local regression coefficient for the $k$th independent variable at location $i$; and $\varepsilon_i$ is the random error at location $i$.
The model captures the local relationships at each regression point 𝑖, with the corresponding set of regression coefficients estimated using a weighted least squares method. This estimation is represented in matrix form in Equation 2.
$$\hat{\beta}_i = \left(X^{T} W_i X\right)^{-1} X^{T} W_i \, y \qquad (2)$$
Where $X$ is the matrix of the independent variables with a column of 1s for the intercept; $y$ is the dependent variable vector; $\hat{\beta}_i = (\beta_{i0}, \ldots, \beta_{im})^{T}$ is the vector of $m + 1$ local regression coefficients; and $W_i$ is the diagonal matrix denoting the geographical weighting of each observation for regression point $i$ at location $(u_i, v_i)$. The weight is determined by the bi-square kernel function (Equation 3), a distance-decay weighting kernel:
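$$w_{ij} = \begin{cases} \left(1 - \left(d_{ij}/b\right)^{2}\right)^{2} & \text{if } |d_{ij}| < b \\ 0 & \text{otherwise} \end{cases} \qquad (3)$$
Where $w_{ij}$ is the $j$th element of the diagonal of the matrix of geographical weights $W_i$, $d_{ij}$ is the distance between observations $i$ and $j$, and $b$ is the bandwidth. The optimal bandwidth was selected based on the smallest corrected Akaike Information Criterion (AICc) value using the bw.ggwr function.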
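To make the local estimation step concrete, the following Python sketch computes bi-square weights and then a single set of local coefficients by weighted least squares, following the matrix form described for Equation 2. It is a simplified illustration and not the GWmodel implementation: the function names and toy data are assumptions, the bandwidth is fixed by hand rather than chosen by AICc, and a logistic model would generally need an iterative fit that is not shown here.

```python
import math


def bisquare_weights(distances, bandwidth):
    """Bi-square kernel: (1 - (d/b)^2)^2 when d < b, otherwise 0."""
    return [(1.0 - (d / bandwidth) ** 2) ** 2 if d < bandwidth else 0.0
            for d in distances]


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def local_coefficients(point, coords, X, y, bandwidth):
    """Weighted least squares at one regression point:
    beta_i = (X^T W_i X)^(-1) X^T W_i y, with W_i from the bi-square kernel."""
    w = bisquare_weights([math.dist(point, c) for c in coords], bandwidth)
    n, p = len(X), len(X[0])
    xtwx = [[sum(w[j] * X[j][r] * X[j][c] for j in range(n)) for c in range(p)]
            for r in range(p)]
    xtwy = [sum(w[j] * X[j][r] * y[j] for j in range(n)) for r in range(p)]
    return solve_linear_system(xtwx, xtwy)


# Toy data: y = 1 + 2x exactly; the last point lies outside the bandwidth.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0)]
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
y = [1.0, 3.0, 5.0, 7.0, 9.0]
beta = local_coefficients((1, 0), coords, X, y, bandwidth=3.0)
print(beta)
```

Repeating `local_coefficients` at every observation location yields one coefficient vector per participant, which is what allows the coefficients (and the odds ratios derived from them) to be mapped.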
Results
Table 1 lists the variables used in this study and their corresponding characteristics. For the older group, BMI values range from 15.27 to 54.52, with a median of 26.67. IMD values range from 0.82 to 81.07, with a median of 10.49. For the younger group, BMI values range from 15.84 to 53.57, with a median of 26.04. IMD values range from 1.51 to 80.29, with a median of 13.50. The older group had a significantly higher proportion of participants with daily alcohol intake (26.4% vs 17.4%) and household incomes below poverty level (20.1% vs 7.6%) compared to the younger group (Table 1).
Table 2 shows the GWLR results: the variables with a significant odds ratio (OR), the ranges of the values of these variables, and the proportion of participants with significant results for each of these variables in both datasets. The proportion of significant results was calculated by dividing the number of participants with statistically significant odds ratios for the risk factors (p < 0.05) by the total number of participants included in the analysis. In the older group, PRS and sex are significant CRC risk factors for all participants, with median ORs of 2.94 and 1.44, respectively. Employment is a significant CRC risk factor for 9,286 (99.9%) participants, with a median OR of 1.66. Age, BMI, and alcohol intake frequency are significant CRC risk factors for 8,579 (92.4%), 6,373 (68.6%), and 2,017 (21.7%) participants, with median ORs of 1.02, 1.01, and 1.09, respectively. In the younger group, PRS is a significant risk factor associated with CRC for all participants (median OR: 4.07). Smoking is significantly associated with CRC for 104 (12.2%) participants in the younger group, with a median OR of 0.54.
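The proportion calculation described above is simple arithmetic, and a short Python sketch shows how the reported percentages follow from the counts and the group sizes (9,287 older and 855 younger participants). The function names are hypothetical. The conversion of a logistic regression coefficient to an odds ratio by exponentiation is the standard relationship, included here as an assumption about how the ORs were derived.

```python
import math


def significant_share(n_significant, n_total):
    """Percentage of participants with a significant local odds ratio."""
    return 100.0 * n_significant / n_total


def odds_ratio(coefficient):
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(coefficient)


OLDER_TOTAL = 9287    # participants in Dataset 1
YOUNGER_TOTAL = 855   # participants in Dataset 2

shares = {
    "age (older)": round(significant_share(8579, OLDER_TOTAL), 1),
    "BMI (older)": round(significant_share(6373, OLDER_TOTAL), 1),
    "alcohol (older)": round(significant_share(2017, OLDER_TOTAL), 1),
    "smoking (younger)": round(significant_share(104, YOUNGER_TOTAL), 1),
}
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

A coefficient of zero maps to an OR of 1 (no association), so ORs below 1, such as the one reported for smoking in the younger group, correspond to negative local coefficients.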
Spatial insights into CRC risk factors for the older group
Spatial variations of different significant risk factors among participants in the older group (Dataset 1) are illustrated using choropleth maps (Fig 3). To generate the choropleth maps, we used, for each risk factor, the median odds ratio among participants in each local authority area. Among all significant risk factors, PRS stands out as the risk factor with the highest odds ratios (Fig 3A). Other significant risk factors include employment status, sex, alcohol intake, age, and BMI.
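The aggregation behind the choropleth maps, taking the median of participant-level odds ratios within each local authority, can be sketched as follows. The area codes and OR values are made up for illustration, and the function name is hypothetical; the mapping itself was done in ArcGIS Pro and is not reproduced here.

```python
from collections import defaultdict
from statistics import median


def median_or_by_area(records):
    """Group (area_code, odds_ratio) pairs by area and return the median
    odds ratio for each area."""
    grouped = defaultdict(list)
    for area, odds_ratio in records:
        grouped[area].append(odds_ratio)
    return {area: median(values) for area, values in grouped.items()}


# Toy participant-level local odds ratios for one risk factor.
records = [
    ("LA_1", 2.8), ("LA_1", 3.1), ("LA_1", 3.4),
    ("LA_2", 2.0), ("LA_2", 2.2), ("LA_2", 2.4),
]
area_medians = median_or_by_area(records)
print(area_medians)
```

The resulting dictionary has one value per local authority, which can then be joined to the boundary data and binned into map classes.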
Fig 3. Spatial variations of significant CRC risk factors among participants in the older group. (A) PRS. (B) Employment status. (C) Sex. (D) Alcohol intake frequency. (E) Age. (F) BMI. Shown on the maps are the median odds ratios in local authority areas. (Note: Local authority boundary data were obtained from UK Census data, available through the UK Data Service: https://ukdataservice.ac.uk/help/data-types/census-data/).
There are obvious spatial variations of the level of risk for each of the significant risk factors, most noticeably PRS. Areas with the highest risk level of PRS (OR>3.0) are in the southern urban areas of the UK, including Birmingham, Cardiff, Bristol, and areas around Greater London (Fig 3A). Areas with the second highest risk level (2.0<OR<3.0) are in the central portion of the UK, including regions like Glasgow, Edinburgh, Newcastle, Durham, Manchester, Liverpool, Leeds, and Sheffield (Fig 3A).
Areas with elevated risk (1.0<OR<2.0) for employment status and sex are similar (Fig 3B and 3C). These areas are in the major urban areas of the UK, containing regions around Glasgow, Edinburgh, Newcastle, Durham, Manchester, Liverpool, Leeds, Sheffield, Birmingham, Cardiff, Bristol, and areas around Greater London (Fig 3B and 3C). Concerning alcohol, areas with elevated risk (1.0<OR<2.0) are in the central northern parts of the UK in regions around Glasgow, Edinburgh, Newcastle, and Durham (Fig 3D).
For age, areas with elevated risk (1.0<OR<2.0) are in central and southwestern parts of the UK, including areas south of Edinburgh and areas around Newcastle, Durham, Manchester, Liverpool, Leeds, Sheffield, Birmingham, Cardiff, Bristol, and the west part of Greater London (Fig 3E). Regarding BMI, areas with elevated risk (1.0<OR<2.0) are concentrated in the southwestern parts of the UK, including areas between Birmingham, Cardiff, Bristol, and the west part of Greater London (Fig 3F).
Spatial insights into CRC risk factors for the younger group
The only significant factor associated with elevated CRC risk for participants in the younger group (Dataset 2) is PRS (Fig 4A). Areas with the highest risk (OR>4.0) include urban areas between Glasgow and Edinburgh in the north, urban areas in the central parts of the UK, and areas northwest of Greater London in the south. Areas with the second highest risk (3.0<OR<4.0) are in the region between Cardiff, Bristol, and the west parts of Greater London (Fig 4A). Surprisingly, smoking is negatively associated with the risk of developing CRC among participants in the younger group who lived in areas between Bristol and west of Greater London, with significant odds ratios varying between 0.50 and 0.51 (Fig 4B).
Fig 4. Spatial variations of significant CRC-associated factors among participants in the younger group. (A) PRS. (B) Smoking. Shown on the maps are the median odds ratios in local authority areas. (Note: Local authority boundary data were obtained from UK Census data, available through the UK Data Service: https://ukdataservice.ac.uk/help/data-types/census-data/).
Overall, for participants in both the older group and the younger group, PRS is the most prominent significant risk factor among those who lived in urban areas from Glasgow and Edinburgh in the north through Cardiff, Bristol, and Greater London in the south.
Discussion and conclusion
In this study, we aimed to examine the geographical variations of CRC risk factors for the older (50 ≤ age < 70) and young (age < 50) adult groups in the UK based on data from the UK Biobank. Our findings revealed that: 1) polygenic risk score (PRS) is the most prominent risk factor among participants in both the older and young adult groups, 2) the spatial patterns of the level of risk associated with PRS are different between the two age groups, 3) the levels of risk associated with other risk factors are less pronounced than that of PRS, based on the values of the odds ratios associated with these factors, and 4) CRC risk factors exhibit spatial variations across different regions in the UK. In addition, the overall approach used in this study provides a general framework for generating spatial insights that can be used to aid the deployment of geographically targeted cancer screening efforts.
Findings from this study suggest that: 1) PRS is potentially the most important and reliable factor for targeted screening in reducing the burden of CRC for people across all age groups in the UK, and 2) geographic variations may exist among people of different age groups. It is important to keep these findings in mind when developing screening programs to prioritize screening of people of different age groups in different geographic areas. In the UK, top priority areas for people aged 50 to 70 are areas between Birmingham, Cardiff, Bristol, and west of Greater London. For people younger than 50, the top priority areas are in the central part of the UK between south of Glasgow and Edinburgh and northwest of Greater London.
Previous studies have widely used PRS to explore the impact of genetic risk on CRC [20,21,23–27]. PRS is essential to select screening intervals after negative findings from a colonoscopy [25]. To help define the start ages for CRC screening, Guo et al. [25] developed a polygenic risk scoring system using 90 SNPs that affect people of European descent at risk of CRC. The results indicate that people with a low and a medium PRS can have their recommended 10-year screening interval prolonged, while there is no need to shorten the screening interval for people with a high PRS.
In addition, PRS plays a crucial role in predicting and identifying high-risk individuals for CRC [20,21], as well as exploring the association between PRS and CRC risks at different stages [23,26]. Jia et al. [21] generated PRS using genome-wide association studies (GWAS) variants for eight common cancers, including CRC, based on genetic data from the UK Biobank with 400,812 participants of European descent, and found that individuals among the highest 5% of the PRS had two to threefold elevated risk for CRC. Furthermore, in the association analysis between CRC and PRS, CRC-associated common genetic variants were found to be more strongly associated with early-onset cancer than late-onset cancer [23]. These findings support the results of our study, which identified PRS as a prominent risk factor for CRC in both age groups. However, previous studies did not incorporate spatial analysis. Differing from previous studies, our study examined how to use PRS as a risk factor for prioritizing geographic areas for targeted screening. The ancestral differences between North and South Britain, with their different genetic diversities, may help explain the geographic variations in PRS risk found in this study [28].
Studies reported in the literature have shown that multiple other factors are associated with the development of CRC. Patient-level factors, such as age and gender, have been extensively studied in relation to changes in CRC incidence rates [8,29–31]. It has been found that age is associated with increased risk of CRC [30] and a greater level of CRC emergency diagnosis [29]. Males have a higher CRC risk than females [30,31]. Other published studies suggest that lifestyle factors, such as obesity (BMI > 30) [4], high alcohol intake [32], and cigarette smoking [33], are risk factors for the development of many types of cancer, including CRC. Individuals who were unemployed were less likely to participate in CRC screening than those who were employed [34]. These findings align with the results of our study. The spatial disparities identified in our study reflect differences in exposure to these risk factors [35].
This study is the first to analyze the spatial variation of CRC risk factors in the UK using data from the UK Biobank. The use of maps to display the geographic variations of each significant CRC risk factor provides a visually impactful method of conveying GWLR model results to healthcare providers. Moreover, the visualization of GWLR results facilitates the identification of the most critical drivers of CRC at different locations for targeted screening. A limitation in this study is that the participants were concentrated in urban regions in the UK. Our interpretation of the results has primarily focused on these urban regions, potentially overlooking the implications for rural areas. To gain a more comprehensive understanding of the broader impact of CRC risk factors, it is essential to conduct additional studies with an increased number of participants from rural regions in the UK. It should be noted that the sample size for people younger than 50 in this current study was small. This may lead to less generalizable results for this demographic. Despite the small sample size, the results of our study align with previous findings that PRS is more strongly associated with early-onset cancer than with late-onset cancer [23]. Additional research involving a larger sample size is needed to enhance the robustness and generalizability of the findings.
Future studies should further explore the concentration of significant ORs for smoking in the Greater London region. In contrast to previous studies [12,36], participants with a current smoking status had a lower risk of CRC in the Greater London region in this study. As noted by Dimou et al. [36], spending more time smoking over a patient’s lifetime was positively associated with CRC, while no association with CRC risk was found among patients who no longer smoked. Participants who were former smokers and smoked for ≥10 years or current smokers who have smoked for ≥10 years had a higher CRC risk compared to those who never smoked [12]. However, the varied smoking histories among current smokers and the unknown duration of smoking may be confounding factors in this study.
Furthermore, future studies should investigate the impact of population stratification on the regional variation of PRS for CRC in the UK. The impact of PRS on complex traits is known to be susceptible to population stratification [28]. The GWAS used to derive the 140 SNPs for the PRS used in this current study did not adequately control for population stratification. Additionally, ancestral differences between North and South Britain further compound this issue [28]. Therefore, uncontrolled population stratification in GWAS represents a potential confounding factor in our study. Future investigations replicating the study in diverse cohorts will help validate our findings and ensure that the results are generalizable across different population groups.
In conclusion, we used colorectal cancer (CRC) as an example and data from the UK Biobank to show how we can gain insights that may be used to prioritize CRC screening. While additional studies are needed to confirm if PRS is indeed a risk factor that can be used to facilitate CRC screening across different populations, this study has demonstrated that the overall approach holds potential to provide insights into risk factors and geographic areas to prioritize screening of CRC and other cancers.
Acknowledgments
The research reported in this article is part of Mei Yang’s dissertation completed at Texas State University. Our gratitude goes to the participants and staff of the UK Biobank for their dedication and valuable contribution to this research.
References
- 1. The International Agency for Research on Cancer. Cancer today 2024. Available from: https://gco.iarc.fr/today/en. Accessed 2024 July 16.
- 2. Emmons KM, Colditz GA. Realizing the potential of cancer prevention - the role of implementation science. N Engl J Med. 2017;376(10):986–90. pmid:28273020
- 3. Zhan FB, Morshed N, Kluz N, Candelaria B, Baykal-Caglar E, Khurshid A, et al. Spatial insights for understanding colorectal cancer screening in disproportionately affected populations, Central Texas, 2019. Prev Chronic Dis. 2021;18:E20. pmid:33661726
- 4. World Health Organization. Colorectal cancer 2023. Available from: https://www.who.int/news-room/fact-sheets/detail/colorectal-cancer. Accessed 2023 May 24.
- 5. Cowling TE, Cromwell DA, Bellot A, Sharples LD, van der Meulen J. Logistic regression and machine learning predicted patient mortality from large sets of diagnosis codes comparably. J Clin Epidemiol. 2021;133:43–52. pmid:33359319
- 6. Cancer Research UK. Bowel cancer statistics 2023. Available from: https://www.cancerresearchuk.org/health-professional/cancer-statistics/statistics-by-cancer-type/bowel-cancer. Accessed 2023 October 9.
- 7. Young C, Wood HM, Seshadri RA, Van Nang P, Vaccaro C, Melendez LC, et al. The colorectal cancer-associated faecal microbiome of developing countries resembles that of developed countries. Genome Med. 2021;13(1):27. pmid:33593386
- 8. Exarchakou A, Donaldson LJ, Girardi F, Coleman MP. Colorectal cancer incidence among young adults in England: trends by anatomical sub-site and deprivation. PLoS One. 2019;14(12):e0225547. pmid:31805076
- 9. Gupta S, Coronado GD, Argenbright K, Brenner AT, Castañeda SF, Dominitz JA, et al. Mailed fecal immunochemical test outreach for colorectal cancer screening: Summary of a Centers for Disease Control and Prevention-sponsored Summit. CA Cancer J Clin. 2020;70(4):283–98. pmid:32583884
- 10. Pignone MP, Crutchfield TM, Brown PM, Hawley ST, Laping JL, Lewis CL, et al. Using a discrete choice experiment to inform the design of programs to promote colon cancer screening for vulnerable populations in North Carolina. BMC Health Serv Res. 2014;14:611. pmid:25433801
- 11. Dagne GA. Geographic variation and association of risk factors with incidence of colorectal cancer at small-area level. Cancer Causes Control. 2022;33(9):1155–60. pmid:35870048
- 12. Oyeyemi SO, Braaten T, Botteri E, Berstad P, Borch KB. Exploring geographical differences in the incidence of colorectal cancer in the Norwegian Women and Cancer Study: a population-based prospective study. Clin Epidemiol. 2019;11:669–82. pmid:31496822
- 13. Mayfield HJ, Lowry JH, Watson CH, Kama M, Nilles EJ, Lau CL. Use of geographically weighted logistic regression to quantify spatial variation in the environmental and sociodemographic drivers of leptospirosis in Fiji: a modelling study. Lancet Planet Health. 2018;2(5):e223–32. pmid:29709286
- 14. Shao Y, Wang Y, Yu H, Zhang Y, Xiang F, Yang Y, et al. Geographical variation in lung cancer risk associated with road traffics in Jiading District, Shanghai. Sci Total Environ. 2019;652:729–35. pmid:30380480
- 15. Peng Q, Zhang N, Yu H, Shao Y, Ji Y, Jin Y, et al. Geographical variation of COPD mortality and related risk factors in Jiading District, Shanghai. Front Public Health. 2021;9:627312. pmid:33614588
- 16. Imran M, Hamid Y, Mazher A, Ahmad SR. Geo-spatially modelling dengue epidemics in urban cities: a case study of Lahore, Pakistan. Geocarto International. 2019;36(2):197–211.
- 17. Siegel RL, Medhanie GA, Fedewa SA, Jemal A. State variation in early-onset colorectal cancer in the United States, 1995-2015. J Natl Cancer Inst. 2019;111(10):1104–6. pmid:31141602
- 18. Abboud Y, Fraser M, Qureshi I, Srivastava S, Abboud I, Richter B, et al. Geographical variations in early onset colorectal cancer in the United States between 2001 and 2020. Cancers (Basel). 2024;16(9):1765. pmid:38730717
- 19. Choi SW, Mak TS-H, O’Reilly PF. Tutorial: a guide to performing polygenic risk score analyses. Nat Protoc. 2020;15(9):2759–72. pmid:32709988
- 20. Thomas M, Sakoda LC, Hoffmeister M, Rosenthal EA, Lee JK, van Duijnhoven FJB, et al. Genome-wide modeling of polygenic risk score in colorectal cancer risk. Am J Hum Genet. 2020;107(3):432–44. pmid:32758450
- 21. Jia G, Lu Y, Wen W, Long J, Liu Y, Tao R, et al. Evaluating the utility of polygenic risk scores in identifying high-risk individuals for eight common cancers. JNCI Cancer Spectr. 2020;4(3):pkaa021. pmid:32596635
- 22. Gollini I, Lu B, Charlton M, Brunsdon C, Harris P. GWmodel: An R Package for exploring spatial heterogeneity using geographically weighted models. J Stat Soft. 2015;63(17).
- 23. Archambault AN, Su Y-R, Jeon J, Thomas M, Lin Y, Conti DV, et al. Cumulative burden of colorectal cancer-associated genetic variants is more strongly associated with early-onset vs late-onset cancer. Gastroenterology. 2020;158(5):1274-1286.e12. pmid:31866242
- 24. Chen X, Guo F, Hoffmeister M, Chang-Claude J, Brenner H. Non-steroidal anti-inflammatory drugs, polygenic risk score and colorectal cancer risk. Aliment Pharmacol Ther. 2021;54(2):167–75. pmid:34114659
- 25. Guo F, Weigl K, Carr PR, Heisser T, Jansen L, Knebel P, et al. Use of polygenic risk scores to select screening intervals after negative findings from colonoscopy. Clin Gastroenterol Hepatol. 2020;18(12):2742–2751.e7. pmid:32376506
- 26. Jenkins MA, Buchanan DD, Lai J, Makalic E, Dite GS, Win AK, et al. Assessment of a polygenic risk score for colorectal cancer to predict risk of lynch syndrome colorectal cancer. JNCI Cancer Spectr. 2021;5(2):pkab022. pmid:33928216
- 27. Yang M, Narasimhan VM, Zhan FB. High polygenic risk score is a risk factor associated with colorectal cancer based on data from the UK Biobank. PLoS One. 2023;18(11):e0295155. pmid:38032963
- 28. Sohail M, Maier RM, Ganna A, Bloemendal A, Martin AR, Turchin MC, et al. Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies. Elife. 2019;8:e39702. pmid:30895926
- 29. Bright CJ, Gildea C, Lai J, Elliss-Brookes L, Lyratzopoulos G. Does geodemographic segmentation explain differences in route of cancer diagnosis above and beyond person-level sociodemographic variables?. J Public Health (Oxf). 2021;43(4):797–805. pmid:32785586
- 30. Chung RY-N, Tsoi KKF, Kyaw MH, Lui AR, Lai FTT, Sung JJ-Y. A population-based age-period-cohort study of colorectal cancer incidence comparing Asia against the West. Cancer Epidemiol. 2019;59:29–36. pmid:30660075
- 31. Mosquera I, Mendizabal N, Martín U, Bacigalupe A, Aldasoro E, Portillo I, et al. Inequalities in participation in colorectal cancer screening programmes: a systematic review. Eur J Public Health. 2020;30(3):416–25. pmid:32361732
- 32. Rawla P, Sunkara T, Barsouk A. Epidemiology of colorectal cancer: incidence, mortality, survival, and risk factors. Prz Gastroenterol. 2019;14(2):89–103. doi: https://doi.org/10.5114/pg.2018.81072 pmid:31616522
- 33. Thélin C, Sikka S. Epidemiology of colorectal cancer - incidence, lifetime risk factors statistics and temporal trends. London: IntechOpen Limited; 2015. doi: https://doi.org/10.5772/61945
- 34. Song EY, Swanson J, Patel A, MacDonald M, Aponte A, Ayoubi N, et al. Colorectal cancer risk factors and screening among the uninsured of Tampa Bay: a free clinic study. Prev Chronic Dis. 2021;18:E16. pmid:33630731
- 35. Roshandel G, Ghasemi-Kebria F, Malekzadeh R. Colorectal cancer: epidemiology, risk factors, and prevention. Cancers (Basel). 2024;16(8):1530. pmid:38672612
- 36. Dimou N, Yarmolinsky J, Bouras E, Tsilidis KK, Martin RM, Lewis SJ, et al. Causal effects of lifetime smoking on breast and colorectal cancer risk: mendelian randomization study. Cancer Epidemiol Biomarkers Prev. 2021;30(5):953–64. pmid:33653810