
Cancer systems epidemiology: Overcoming misconceptions and integrating systems approaches into cancer research

Summary points

  • While traditional epidemiological approaches have helped generate important insights about cancer prevention and treatment, they have important limitations and alone cannot bridge the gaps that continue to exist in cancer research and knowledge.
  • One shortcoming is the failure to fully account for and characterize the complexity of various systems (e.g., biological, behavioral, social, environmental, and economic) that can lead to cancer and are affected by cancer.
  • Systems approaches can help researchers, clinicians, and other decision makers better understand complex systems and address these systems at many levels, ranging from the cellular to the societal scale.
  • Systems mapping can shed light on otherwise hidden mental models, and dynamic modeling can enable virtual experimentation—the systematic exploration of counterfactual scenarios not observable in the real world.
  • We present and discuss 14 common misconceptions that will need to be overcome in order for systems epidemiology to realize its potential role in cancer prevention and control.
  • Examples of systems approaches applied to cancer-related research topics are given to illustrate the utility of systems approaches to transform cancer epidemiology to cancer systems epidemiology.

Background and significance

Many traditional epidemiological methods are regression-based and attempt to find associations between certain risk factors (e.g., smoking, diet, physical activity) and disease outcomes (e.g., cancer). While these methods help identify factors to explore further, they are not equipped to uncover the complex systems and processes that underlie cancer. That is, they are not designed to examine the complex mechanisms and interactions among multiple independent variables (e.g., biological, behavioral, social, economic) that play out over time to affect health. Gaining a more complete understanding of these complex systems requires a new approach. Thus, there is a need for more systems science approaches (e.g., systems epidemiology), which can help untangle the complexity in systems [1,2]. This article is part of the PLOS Collection “Cancer Systems Epidemiology Insights and Future Opportunities,” which covers many of the topics discussed in the National Cancer Institute (NCI) Workshop to Facilitate Cancer Systems Epidemiology Research and exemplifies the opportunities and uses of systems epidemiology approaches in cancer research [3]. Here, we describe systems science approaches and how they can be utilized in cancer research, present common misconceptions that must be overcome for systems epidemiology to realize its potential for cancer epidemiology, and describe how greater use of systems epidemiology can transform cancer research.

Traditional cancer epidemiology top-down approaches have helped identify important associations

Epidemiology has been defined as “the study of the distribution and determinants of health-related states or events in specified populations, and the application of this study to the control of health problems” [4]. Traditional epidemiological methods tend to be more top-down approaches, in which predefined designs and analytical approaches are applied to datasets. This paradigm often starts with sets of data on a disease in a specific population, tries to determine associations between risk factors and diseases, and then draws inferences from the associations. This approach includes descriptive statistics of the datasets and domain-limited statistical methods (e.g., linear regression, logistic regression, and survival analysis) to identify potential associations and trends. Such methods are useful for evaluating the extent to which a variety of exposures are associated with health outcomes of interest.

Inferential statistics have helped generate important insights about cancer prevention and treatment. For example, exploring the causal role that cigarette smoking has in increasing lung cancer risk helped to develop modern statistical approaches for inference in chronic disease epidemiology and has greatly reduced cancer burden [5]. While such correlations and associations certainly do not prove cause-and-effect, they can suggest that a factor may be involved in the causal pathway of cancer, whether it is a direct cause or a sign that something else is happening. Despite their value, traditional approaches have limitations and alone cannot bridge the gaps that continue to exist in cancer research and knowledge.

Systems epidemiology bottom-up approaches can help better understand complex mechanisms

While traditional methods help show possible associations, systems epidemiology methods are more bottom-up, aiming to rebuild the systems of interest and untangle the actual mechanisms and causal pathways involved. Such pathways may be complex, nonlinear, and dynamic, potentially spanning multiple levels, scales, and sectors. Systems epidemiology methods attempt to represent complex systems in somewhat simplified forms, distilling them to their essential elements and processes, stripping away the noise and making the system easier to understand. They can also allow for virtual testing of different circumstances, interventions, and policies that may not be possible or practical in the real-life system.

One common set of systems approaches is systems maps/diagrams, which visually represent the components of the system and their relationships with each other. These can show how different people’s conceptualizations or mental models of the system are similar or different and can then help identify a more comprehensive representation of the system. They can also serve as blueprints to develop subsequent systems models.

A systems map becomes a systems model when one adds quantitative representations (e.g., mathematical equations) of the relationships and processes that link the different components in the system. Once these equations are established, data are used to populate, calibrate, and validate the model. Thus, the model begins with the understanding/conceptualization of the system and not necessarily with a particular dataset. These equations can represent a situation at a particular point in time or simulate what happens over time, making it a dynamic simulation model. The equations can use specific values for a deterministic model or incorporate variability and uncertainty, making it stochastic.
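The distinction between deterministic and stochastic dynamic models can be illustrated with a minimal sketch. The two-compartment structure (an at-risk population feeding a pool of cumulative cases), the function names, and every parameter value below are hypothetical and chosen only for illustration; they are not taken from any model cited in this article.

```python
import random


def simulate_deterministic(at_risk, annual_rate, years):
    """Dynamic model with fixed values: each year a constant fraction
    of the at-risk population becomes new cases."""
    cases = 0.0
    trajectory = []
    for _ in range(years):
        new_cases = annual_rate * at_risk
        at_risk -= new_cases
        cases += new_cases
        trajectory.append(cases)
    return trajectory


def simulate_stochastic(at_risk, annual_rate, years, rng):
    """Same structure, but each at-risk person becomes a case by chance,
    so repeated runs with different seeds give different trajectories."""
    cases = 0
    trajectory = []
    for _ in range(years):
        new_cases = sum(1 for _ in range(at_risk) if rng.random() < annual_rate)
        at_risk -= new_cases
        cases += new_cases
        trajectory.append(cases)
    return trajectory


# Hypothetical inputs: 10,000 people at risk, 0.2% annual incidence, 20 years.
deterministic = simulate_deterministic(at_risk=10_000, annual_rate=0.002, years=20)
stochastic = simulate_stochastic(
    at_risk=10_000, annual_rate=0.002, years=20, rng=random.Random(1)
)
```

Running the stochastic version many times with different seeds would give a distribution of outcomes around the deterministic trajectory, which is one simple way to express uncertainty in model results.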

Since systems models aim to recreate the system, they are quite different from traditional statistical models that try to identify associations and trends and potentially extrapolate them (Ip and colleagues describe additional differences [6]). The latter starts with the data and then identifies patterns or trends in the data according to statistical properties subject to pre-identified assumptions. Systems models are also different from other computer-driven approaches that start first with the data, such as machine learning categorization and automated feature selection, and then try to find associations and trends in the data.

Population of a systems model consists of establishing values for each parameter in each equation. Once the model is populated, calibration entails adjusting the values so that the model fits the right constraints and assumptions. Model validation determines how well the model represents what it is supposed to represent [7]. This includes face validity (experts evaluate model structure, data sources, assumptions, and results), criterion validity (how well the model can recreate real-world datasets), and convergence/divergence validity (how similar are model results to other ways of calculating such results when they should be similar and how different are they when they should differ) [7]. A key aspect of systems modeling is performing sensitivity analyses, which explore the effects of varying key model parameters. Sensitivity analyses can help reveal the major drivers or key relationships that explain observed outcomes.
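As a rough illustration of calibration and sensitivity analysis, the sketch below fits a single rate parameter to a short series of target values by grid search and then varies the fitted value by 20% in each direction. The model structure, the "observed" numbers, and the candidate grid are all invented for this example; real calibration typically involves many parameters and more formal fitting and validation procedures [7].

```python
def cumulative_cases(annual_rate, at_risk=10_000.0, years=5):
    """Hypothetical model: cumulative cases when a fixed fraction of the
    at-risk population becomes new cases each year."""
    cases = 0.0
    trajectory = []
    for _ in range(years):
        new_cases = annual_rate * at_risk
        at_risk -= new_cases
        cases += new_cases
        trajectory.append(cases)
    return trajectory


def sum_squared_error(model_values, target_values):
    """Distance between model output and the calibration targets."""
    return sum((m - t) ** 2 for m, t in zip(model_values, target_values))


# Made-up calibration targets (cumulative cases in years 1 to 5).
observed = [31, 59, 92, 118, 151]

# Calibration: choose the candidate rate whose output best matches the targets.
candidate_rates = [i * 0.0005 for i in range(2, 13)]  # 0.001 to 0.006
best_rate = min(
    candidate_rates,
    key=lambda rate: sum_squared_error(cumulative_cases(rate), observed),
)

# One-way sensitivity analysis: vary the calibrated rate by +/-20%.
sensitivity = {
    label: cumulative_cases(best_rate * factor)[-1]
    for label, factor in [("low", 0.8), ("base", 1.0), ("high", 1.2)]
}
```

Comparing the `low`, `base`, and `high` entries in `sensitivity` shows how strongly the five-year outcome depends on this one parameter; repeating the exercise for each parameter in a larger model helps identify the major drivers.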

A systems model can serve as a virtual laboratory to test different possibilities. Such virtual experimentation has advantages over real-world experimentation, which can take considerably more time, effort, and resources and may not even be ethical or feasible. For example, running simulations can allow you to go back in time to see what could have happened or go forward in time to see what may happen. Conducting virtual experimentation first can guide the design of real-world experiments so that these are done much more effectively and efficiently. Dynamic models can also be reverse-engineered to estimate the resources and the amount and type of intervention(s) necessary to achieve a desired outcome in a specified timeframe [8], and can be used to evaluate the utility of existing interventions compared to a counterfactual in which they were not implemented. By representing the actual processes in a system, simulation experiments can reveal potential unintended consequences of a policy or intervention.
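The sketch below shows, under strong simplifying assumptions, what a virtual experiment and a reverse-engineered target might look like. It uses a hypothetical one-equation smoking prevalence model with made-up initiation and quit rates; it is not the simulation model described in [8], and the numbers carry no empirical meaning.

```python
def prevalence_after(years, quit_rate, start=0.20, initiation_rate=0.01):
    """Hypothetical model: each year a fraction of smokers quit and a
    fraction of non-smokers start. Returns prevalence after `years`."""
    prevalence = start
    for _ in range(years):
        prevalence = (
            prevalence
            - quit_rate * prevalence
            + initiation_rate * (1 - prevalence)
        )
    return prevalence


def required_quit_rate(target, years, low=0.04, high=0.5):
    """Reverse-engineering by bisection: the smallest quit rate (within
    the search interval) that reaches the target prevalence in time."""
    for _ in range(60):
        mid = (low + high) / 2
        if prevalence_after(years, mid) <= target:
            high = mid
        else:
            low = mid
    return high


# Virtual experiment: status quo versus a counterfactual with doubled quitting.
baseline = prevalence_after(years=10, quit_rate=0.04)
intervention = prevalence_after(years=10, quit_rate=0.08)

# Reverse-engineering: what quit rate would bring prevalence to 15% in 10 years?
needed = required_quit_rate(target=0.15, years=10)
```

The same pattern extends to richer models: the scenario comparison stays a pair of function calls, and the search for a required intervention level stays a loop around the model, however detailed the model itself becomes.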

Ideally, systems mapping and modeling should proceed in an iterative manner as illustrated by Fig 1. One does not need to come up with a perfect representation of the system at the beginning. Instead, the initial systems map and model can be a rough approximation that in turn can identify the knowledge and data gaps to then guide study designs and data collection. Once such studies and data collection yield more insights and data, the systems map and model can be updated accordingly, leading to more cycles of further refining both the systems map and model as well as the studies, data collection, and insights. This iterative process can help move toward better understanding of the system.

Growing use of systems epidemiology methods in health research

There are already examples of how researchers and decision makers (e.g., stakeholders, regulators, program managers, organizational leaders, government leaders) have used such systems methods at different stages for research and decision-making, from conceptualization to development to the real world (as Fig 2 illustrates) [9–16]. For example, computational models have helped demonstrate relationships and effects that traditional methods may have missed [17], such as how smoking cessation treatment policies resulted in the largest reductions in smoking prevalence, followed by cigarette tax increases, smoke-free air laws, and educational policies, but that implementing all of these in combination yielded a significantly larger reduction than any single one alone [10]. As another example, models have helped decision makers better understand the impact and cost of physical inactivity among youth and revealed how the type and intensity of physical activity could significantly affect the resulting clinical outcomes and costs [13]. Computational models have also helped show how distributing vaccines to lower income neighborhoods first during a pandemic could be more beneficial to society [15] and how cooperation among healthcare facilities in a region can lead to better overall control of an antibiotic-resistant pathogen [16].

Fig 2. Systems modeling and approaches can and do occur at different points along the research path from idea inception to policy implementation.

Systems epidemiology applications in cancer research

There are examples of systems methods assisting with cancer-related research. Initiatives such as the NCI’s Cancer Systems Biology Consortium [18] and Integrative Cancer Biology Program [19] have generated new insights. For example, systems approaches have made substantial progress in characterizing the genetics of cancer and the contributions of individual intracellular pathways involved in tumor initiation and progression [20]. They have also provided a better understanding of the drivers of tumor growth and cancer development and progression [21–23] and have helped identify possible novel combination therapies for cancer treatment [24].

In the field of cancer epidemiology, systems approaches have been used to inform policy making, for instance by helping guide recommendations for cancer screening. For example, collaborative systems modeling has been used to inform the US Preventive Services Task Force’s (USPSTF) breast cancer screening recommendations [25]. Within the Cancer Intervention and Surveillance Modeling Network (CISNET), 6 independently developed models evaluated mammography screening strategies in the U.S. population and helped inform decisions being made about various screening strategies [26]. Data used in CISNET models included age-specific breast cancer incidence, digital mammography performance characteristics, ER/HER2-specific treatment effects, and average and comorbidity-specific non-breast cancer causes of death, among others. Outputs of these models include reduction in mortality, breast cancer deaths averted, life-years and quality-adjusted life years (QALYs) gained, and the number of screenings, false-positive screens, benign biopsies, and overdiagnosed cases (that is, cases that would not have been clinically detected in the absence of screening due to lack of progression or death). The results of these systems models showed that biennial screening strategies are the most efficient and that digital mammography screening of average-risk women aged 40 to 50 years modestly lowers mortality and extends the length of life [26]. These CISNET systems models have continued to inform policy making and to investigate emerging issues in breast cancer control including legislation about the risks of undergoing mammography on long-term breast cancer outcomes, impact of comorbidities on when screening should stop and on overdiagnosis, and the costs and benefits of transitioning to digital screening [25].
Further, CISNET models of other common cancers have shed light on the relevance of exposures such as smoking intervention in reducing lung cancer burden [27] and colorectal screening for reducing colon cancer development [28].

Additionally, whole-genome tumor sequence and biological annotation datasets have been rapidly increasing in size, number, and content. With this growth, there is a need for a systems epidemiology approach to integrate functionality across databases, methods, and analyses. An example is the development and application of software that uses a systems approach to manage, annotate, and analyze cancer mutations (using tumor data across dozens of studies and tissue types) [29]. This approach uses information from multiple annotation sources to differentiate tumor mutations that are drivers from those that are passengers; the drivers are then retained in a novel panel for sequencing in cell-free DNA [30]. By incorporating multiple levels of information (whole-genome sequence data), this approach outperformed conventional sequencing panel methods (e.g., based on frequency of observed mutations) in an application to prostate cancer [30,31].

Barriers to greater use of systems epidemiology in cancer research

However, existing efforts have only scratched the surface of what systems epidemiology can do for cancer prevention and treatment. Use of systems epidemiology approaches has been limited by common misconceptions, such as those listed in Table 1, as well as lack of training, lack of funding, lack of awareness, and institutional and professional inertia. Few universities offer systems epidemiology training programs. Systems epidemiology requires crossing many traditional disciplines that are often siloed off from each other, such as those of programmers, modelers, epidemiologists, clinicians, and policy makers. Many funding mechanisms and scientific review processes still focus on more established traditional approaches [32]. Change in general can take time. Of course, the extent to which systems epidemiology can be used depends on how well the different mechanisms involved in cancer biology and epidemiology are elucidated, how well the maps and models can represent these mechanisms, and how much the scientific community accepts such representations. These are far from insurmountable challenges and, in fact, can grow more and more achievable with time.

Table 1. Common misconceptions about systems maps and models.

Greater use of systems epidemiology can transform cancer research

Our society is at an inflection point where there are now more data from different, disparate, and wide-ranging sources and there is wide availability of analytic tools with greater computational resources and power. We are no longer limited to analyzing a single dataset or study population at a time. Systems epidemiology methods can help link different, disparate data and make better use of data that may have been viewed as imperfect in the past. While research in the past several decades has resulted in more effective prevention and treatment measures, there are a number of areas where progress has stalled. For example, some cancers (e.g., skin, liver) have been on the rise and others (e.g., pancreatic, liver, esophageal) continue to have poor cure rates [33]. This suggests that the causal pathways may be more complex than realized and that key factors and processes are not being addressed. With a greater understanding of the capabilities of systems epidemiology and a greater investment of resources, it is our hope that systems epidemiology may be able to better elucidate these causal pathways and lead to more effective prevention and treatment measures. Moreover, significant disparities exist in cancer risk, diagnoses, and outcomes, and the course of cancer can vary substantially among different people and populations. Therefore, one-size-fits-all approaches that are driven by standard designs and analytical approaches may not work adequately. Systems epidemiology can help identify and develop more tailored approaches and move toward precision medicine for cancer prevention and treatment. In addition, even when cancer treatments are effective, they can have risks and side effects. Systems epidemiology can help elucidate what may be leading to these risks and side effects and help develop better treatments.
Finally, with a greater recognition of the many existing constraints, systems epidemiology can help decision makers such as clinicians, public health officials, policy makers, and third-party payers prioritize initiatives, save time, effort, and money, and better allocate limited resources among different cancer research, prevention, and treatment options.


Systems epidemiology methods, such as mapping and dynamic simulation modeling, are designed to gain understanding of complex phenomena through simplified representation and virtual experimentation. These methods have proven to be valuable complements to other methods of inquiry in other health domains but have been underutilized to date in cancer epidemiology. With increasing availability of high-performance computers and sophisticated analytical tools, systems epidemiology has enormous potential to expand research on cancer prevention, treatment, and control by helping untangle the complexities in cancer epidemiology. The benefits of using systems epidemiology include a better understanding of a problem’s impact over time, identification of leverage points for intervening in the system, and trade-offs and consequences of policy decisions. When adopting systems epidemiology methods, researchers should be aware of the common misconceptions that need to be overcome. Systems epidemiology can transform cancer research by helping identify and develop more tailored approaches to move toward precision medicine for cancer prevention and treatment.


This manuscript is based on the presentations by the authors at the Workshop to Facilitate Cancer Systems Epidemiology Research from February 29 to March 1, 2019 in Bethesda, Maryland.

The authors of this manuscript are solely responsible for its content. Statements in the manuscript do not necessarily represent the official views of, or imply endorsement by, the National Institutes of Health, the Agency for Healthcare Research and Quality, or the US Department of Health and Human Services.


  1. Meadows DH. Thinking in systems: A primer. Chelsea Green Publishing; 2008.
  2. Mabry PL. Making sense of the data explosion: the promise of systems science. Am J Prev Med. 2011;40(5):S159–S61. pmid:21521590
  3. Barajas R, Hair B, Lai G, Rotunno M, Shams-White MM, Gillanders EM, et al. Facilitating cancer systems epidemiology research. PLoS ONE. 2021;16(12):e0255328. Epub 2022/01/01. pmid:34972102; PubMed Central PMCID: PMC8719747.
  4. Last J. A dictionary of epidemiology. Oxford University Press; 2001.
  5. Glass TA, Goodman SN, Hernán MA, Samet JM. Causal inference in public health. Annu Rev Public Health. 2013;34:61–75. pmid:23297653
  6. Ip EH, Rahmandad H, Shoham DA, Hammond R, Huang TT, Wang Y, et al. Reconciling statistical and systems science approaches to public health. Health Educ Behav. 2013;40(1 Suppl):123S–31S. Epub 2013/10/23. pmid:24084395; PubMed Central PMCID: PMC5105232.
  7. Eddy DM, Hollingworth W, Caro JJ, Tsevat J, McDonald KM, Wong JB, et al. Model transparency and validation: a report of the ISPOR-SMDM Modeling Good Research Practices Task Force-7. Value Health. 2012;15(6):843–50. Epub 2012/09/25. pmid:22999134.
  8. Levy DT, Mabry PL, Graham AL, Orleans CT, Abrams DB. Exploring scenarios to dramatically reduce smoking prevalence: a simulation model of the three-part cessation process. Am J Public Health. 2010;100(7):1253–9. pmid:20466969
  9. Feirman SP, Donaldson E, Glasser AM, Pearson JL, Niaura R, Rose SW, et al. Mathematical Modeling in Tobacco Control Research: Initial Results From a Systematic Review. Nicotine Tob Res. 2016;18(3):229–42. Epub 2015/05/16. pmid:25977409.
  10. Levy DT, Mabry PL, Graham AL, Orleans CT, Abrams DB. Reaching healthy people 2010 by 2013: a SimSmoke simulation. Am J Prev Med. 2010;38(3):S373–S81. pmid:20176310
  11. U.S. Food and Drug Administration. U of M TCORS: Center for the Assessment of the Public Health Impact of Tobacco Regulations 2018. Available from:
  12. Mabry PL, Bures RM. Systems science for obesity-related research questions: an introduction to the theme issue. Am J Public Health. 2014;104(7):1157–9. pmid:24832429
  13. Lee BY, Adam A, Zenkov E, Hertenstein D, Ferguson MC, Wang PI, et al. Modeling the economic and health impact of increasing children’s physical activity in the United States. Health Aff. 2017;36(5):902–8. pmid:28461358
  14. National Collaborative on Childhood Obesity Research. NCCOR Envision. Available from:
  15. Lee BY, Brown ST, Bailey RR, Zimmerman RK, Potter MA, McGlone SM, et al. The benefits to all of ensuring equal and timely access to influenza vaccines in poor communities. Health Aff (Millwood). 2011;30(6):1141–50. Epub 2011/06/10. pmid:21653968; PubMed Central PMCID: PMC3385997.
  16. Lee BY, Bartsch SM, Wong KF, Yilmaz SL, Avery TR, Singh A, et al. Simulation shows hospitals that cooperate on infection control obtain better results than hospitals acting alone. Health Aff. 2012;31(10):2295–303. pmid:23048111
  17. Willem L, Verelst F, Bilcke J, Hens N, Beutels P. Lessons from a decade of individual-based models for infectious disease transmission: a systematic review (2006–2015). BMC Infect Dis. 2017;17(1):612. Epub 2017/09/13. pmid:28893198; PubMed Central PMCID: PMC5594572.
  18. Cancer Systems Biology Consortium (CSBC) [cited 2022 Jan 31]. Available from:
  19. Koch Institute for Integrative Cancer Research at MIT. Available from:
  20. Archer TC, Fertig EJ, Gosline SJ, Hafner M, Hughes SK, Joughin BA, et al. Systems Approaches to Cancer Biology. Cancer Res. 2016;76(23):6774–7. Epub 2016/11/20. pmid:27864348; PubMed Central PMCID: PMC5135591.
  21. Miller HA, Lowengrub J, Frieboes HB. Modeling of Tumor Growth with Input from Patient-Specific Metabolomic Data. Ann Biomed Eng. 2022. Epub 2022/01/28. pmid:35083584.
  22. Noble R, Burri D, Le Sueur C, Lemant J, Viossat Y, Kather JN, et al. Spatial structure governs the mode of tumour evolution. Nat Ecol Evol. 2021. Epub 2021/12/25. pmid:34949822.
  23. Mohammad Mirzaei N, Su S, Sofia D, Hegarty M, Abdel-Rahman MH, Asadpoure A, et al. A Mathematical Model of Breast Tumor Progression Based on Immune Infiltration. J Pers Med. 2021;11(10). Epub 2021/10/24. pmid:34683171; PubMed Central PMCID: PMC8540934.
  24. Schmucker R, Farina G, Faeder J, Frohlich F, Saglam AS, Sandholm T. Combination treatment optimization using a pan-cancer pathway model. PLoS Comput Biol. 2021;17(12):e1009689. Epub 2021/12/29. pmid:34962919; PubMed Central PMCID: PMC8747684.
  25. Alagoz O, Berry DA, de Koning HJ, Feuer EJ, Lee SJ, Plevritis SK, et al. Introduction to the Cancer Intervention and Surveillance Modeling Network (CISNET) Breast Cancer Models. Med Decis Making. 2018;38(1_suppl):3S–8S. Epub 2018/03/20. pmid:29554472; PubMed Central PMCID: PMC5862043.
  26. Mandelblatt JS, Stout NK, Schechter CB, van den Broek JJ, Miglioretti DL, Krapcho M, et al. Collaborative Modeling of the Benefits and Harms Associated With Different U.S. Breast Cancer Screening Strategies. Ann Intern Med. 2016;164(4):215–25. Epub 2016/01/13. pmid:26756606; PubMed Central PMCID: PMC5079106.
  27. Jeon J, Meza R, Krapcho M, Clarke LD, Byrne J, Levy DT. Chapter 5: Actual and counterfactual smoking prevalence rates in the U.S. population via microsimulation. Risk Anal. 2012;32 Suppl 1:S51–68. Epub 2012/08/29. pmid:22882892; PubMed Central PMCID: PMC3478148.
  28. Zauber AG, Lansdorp-Vogelaar I, Knudsen AB, Wilschut J, van Ballegooijen M, Kuntz KM. Evaluating Test Strategies for Colorectal Cancer Screening-Age to Begin, Age to Stop, and Timing of Screening Intervals: A Decision Analysis of Colorectal Cancer Screening for the US Preventive Services Task Force from the Cancer Intervention and Surveillance Modeling Network (CISNET). U.S. Preventive Services Task Force Evidence Syntheses, formerly Systematic Evidence Reviews. Rockville (MD) 2009.
  29. Cario CL, Witte JS. Orchid: a novel management, annotation and machine learning framework for analyzing cancer mutations. Bioinformatics. 2018;34(6):936–42. Epub 2017/11/07. pmid:29106441; PubMed Central PMCID: PMC5860353.
  30. Cario CL, Chen E, Leong L, Emami NC, Lopez K, Tenggara I, et al. A machine learning approach to optimizing cell-free DNA sequencing panels: with an application to prostate cancer. BMC Cancer. 2020;20(1):820. Epub 2020/08/30. pmid:32859160; PubMed Central PMCID: PMC7456018.
  31. Chen E, Cario CL, Leong L, Lopez K, Marquez CP, Chu C, et al. Cell-free DNA concentration and fragment size as a biomarker for prostate cancer. Sci Rep. 2021;11(1):5040. Epub 2021/03/05. pmid:33658587; PubMed Central PMCID: PMC7930042.
  32. Shams-White MM, Barajas R, Jensen RE, Rotunno M, Dueck H, Ginexi EM, et al. Systems epidemiology and cancer: A review of the National Institutes of Health extramural grant portfolio 2013–2018. PLoS ONE. 2021;16(4):e0250061. Epub 2021/04/16. pmid:33857240; PubMed Central PMCID: PMC8049352.
  33. American Cancer Society. Cancer Facts & Figures 2019. Atlanta, GA: American Cancer Society; 2019.
  34. Bartsch SM, O’Shea KJ, Lee BY. The Clinical and Economic Burden of Norovirus Gastroenteritis in the United States. J Infect Dis. 2020;222(11):1910–9. pmid:32671397
  35. Lee BY, Alfaro-Murillo JA, Parpia AS, Asti L, Wedlock PT, Hotez PJ, et al. The potential economic burden of Zika in the continental United States. PLoS Negl Trop Dis. 2017;11(4). WOS:000402256700057. pmid:28448488