Is mammography screening beneficial: An individual-based stochastic model for breast cancer incidence and mortality

The benefits of mammography screening have been controversial, with conflicting findings from various studies. We hypothesize that unmeasured heterogeneity in tumor aggressiveness underlies these conflicting results. Based on published data from the Canadian National Breast Screening Study (CNBSS), we develop and parameterize an individual-based mechanistic model for breast cancer incidence and mortality that tracks five stages of breast cancer progression and incorporates the effects of age on breast cancer incidence and all-cause mortality. The model accurately reproduces the reported outcomes of the CNBSS. By varying parameters, we predict that the benefits of mammography depend on the effectiveness of cancer treatment and tumor aggressiveness. In particular, patients with the most rapidly growing or potentially largest tumors have the highest benefit and least harm from the screening, with only a relatively small effect of age. However, the model predicts that confining mammography to populations with a high risk of acquiring breast cancer increases the screening benefit only slightly compared with the full population.

This is a PLOS Computational Biology Methods paper.

Introduction
Breast cancer is one of three most commonly diagnosed cancers in women, making up 30% of all cancer cases in women in the United States in 2018 [1]. Screening mammography was introduced to detect small and more treatable tumors before they cause symptoms. Several trials, such as the Health Insurance Trial [2], the Edinburgh randomised trial [3,4], the Canadian National Breast Screening Study (CNBSS) [5] and the Swedish Two-Country Trial [6,7], have quantified the benefits of screening mammography. The Swedish study and many others reported that breast cancer mortality was significantly reduced due to screening mammography [6,8], while the CNBSS found no benefits [5,9]. In addition, Welch et al. also found no benefit in their analysis of the SEER data [10]. These contradictory conclusions have spurred intense debate over the benefits of screening mammography. The wide implementation of screening mammography has led to an increased rate of small tumor detection and a decreased rate of large tumor detection over the last decades [10]. The primary cost of screening is overdiagnosis of small benign or unaggressive tumors that would have remained asymptomatic during a patient's lifetime, turning a healthy individual into a patient, and requiring follow-up tests and treatments with deleterious side effects including death [11]. Overdiagnosis also results in unwanted economic and psychological burdens. To address the controversy, the WISDOM study based on a woman's individual risk was initiated in the United States in 2016 [12].
Studies based on statistical or stochastic models [13][14][15] have quantified the influence of various factors such as age, screening frequency and adherence behavior on the benefits and harmful effects of mammography screening based on different data sources or trials other than the CNBSS. Most of transition probabilities in these models were held constant or age-dependent, and thus did not include the effects of tumor heterogeneity across patients. Using the CNBSS, several analyses ( [16,17] and references therein) have estimated screening sensitivities, transition probabilities and sojourn time distributions. As far as we know, no study has developed a mathematical model to quantify the benefits and harm of screening that explicitly takes tumor heterogeneity into consideration. In this work, we propose a mechanistic model focusing on differences among individuals that provides a mathematical tool to gain insight into breast cancer progression. This model includes all possible transitions of breast cancer before and after detection from cancer incidence and detection through progression, treatment and mortality. Our central focus is on unmeasured heterogeneity, which we include through variation in the aggressiveness of tumor growth and maximum tumor size. By including unaggressive cancers (tumors with small aggressiveness of tumor growth and/or maximum tumor size), we are able to model the role of unmeasured heterogeneity in incidence levels, detected tumor sizes, and long-term outcomes to address the balance between costs and benefits of screening. The benefits are measured as the increase in 25-year survival. The costs are the increase in overdiagnosis quantified in two ways, through the difference in the number of patients diagnosed [18], and through the number who would have died due to other causes if treatment were relatively ineffective.
The proposed model is designed to first reproduce the cancer incidence and mortality in the CNBSS with a minimum of parameter fitting to the data itself [5]. By varying key model parameters, we simulate different scenarios of tumor aggressiveness and cancer treatment effectiveness to quantify their effect on the benefit and harm of mammography screening in a population.
The paper first presents the model framework and describes how parameters were estimated from the literature and the CNBSS. Because of the focus on unmeasured differences in underlying cancers, we term this the Breast Cancer Heterogeneity Aggressiveness Model (BCHAM). Using BCHAM, we experiment with the effectiveness of treatment and the underlying mean and variance of tumor aggressiveness to identify when mammography should provide the greatest benefit and the least harm.

The CNBSS
The CNBSS has been described in detail [5,19], and we here summarize its key features ( Fig  1). The CNBSS was designed to investigate the benefits of mammography screening in women aged 40-59. The patients were followed up for up to 25 years (22 years on average). A population of 89, 835 healthy women aged 40-59 was randomly assigned to mammography (five annual mammography screens) and control (no mammography). Women in the mammography arm received both annual mammography and physical examination for the first 5 years of follow-up. In the control arm, women aged 40-49 received only a single physical examination at enrollment, and women aged 50-59 received annual physical examination for the first 5 years of follow-up. Participants were considered eligible if they were in good health, had no mammography in the previous 12 months, and had no history of breast cancer. The number of detected breast cancers, breast cancer mortality and all-cause mortality was recorded during the follow-up period.

BCHAM: An individual-based stochastic model
We use a five-compartment model to track the number of women at each cancer stage via the probabilities and rates of transition between consecutive stages (Fig 2). Let a denote the age of a woman at enrollment and t the time since the beginning of follow-up. The rate of cancer incidence, c a (t), is a bell-shaped function based on the report of Canadian Cancer Registry and Health Statistics Division [20]. The rate of non-breast cancer mortality is captured by an exponential function h a (t) obtained from the 1991 Canadian statistics reported in [21].
Several models of tumor growth have been used in the literature [22,23]. We model tumor diameter at time t with initial diameter d ini at initial time t ini , d(t, t ini , d ini , k), in Table 1, with a Gompertz model of human breast cancer growth [22]. The key parameter k is the tumor aggressiveness constant. Fig 3A illustrates the effects of tumor aggressiveness k and maximum tumor size d max on the Gompertz growth. Detection sensitivities of a tumor are modeled by sigmoid functions [24]. S m (d), S p (d) and S s (d) denote the tumor detection sensitivities of mammography, physical screening and self-examination respectively during the follow-up period. To capture undetected cancers entering the study, we let S b (d) be the tumor detection sensitivity of self-examination before the beginning of the study to eliminate candidates with noticeable tumors. Mathematical formulas of these functions together with their parameters are provided comprehensively in Table 1. Let t d represent the time when a tumor is detected (the time when a patient moves from Stage 2 to 3). Suppose that breast cancers originate from a single cell of the diameter d 0 at time t 0 , the time when a woman moves from Stage 1 to 2. Let d de = d(t d , t 0 , d 0 , k) be the size of a tumor at detection time t d > t 0 . The hazard of cancer mortality depends on the size at detection, tumor aggressiveness and time since detection according to h(d de , k, t − t d ) in Table 1, which follows a two-parameter Weibull distribution based on the probability of cancer mortality [25]. This function includes a parameter α that captures the effectiveness of treatment. The effects of α and detected tumor size d de on the hazard rate of cancer mortality are depicted in Fig 3B. All parameter values are presented in Tables 1 and 2. At enrollment, the participants can be in either the healthy or the undetected cancer compartment. As time passes, they can transfer between stages (Fig 2). Here we assume that once an individual gets diagnosed, they will be labeled as detected cancer for the rest of their life regardless of survival status, which results in no individuals moving from Stage 3 to Stage 1. Furthermore, it should be noted that tumor aggressiveness k, d max and age are incorporated as individualized factors. We simulate a population of N 0 individuals. The total women in the i-th compartment at time t is

Parameter calibration based on the CNBSS
We simulate the model using the CNBSS [5] to calibrate model parameters that cannot be estimated independently from the literature. In the CNBSS, the large number of breast cancers detected during the first year of follow-up (253 and 170 diagnosed cancers in the mammography and control arms respectively ( Table 1 in [5])) suggests that at enrollment the participants may have been either in the healthy or undetected cancer compartment. To capture this, we   year −1 Rate of other cause mortality, h a (t) [21] 0.208 [19,28] 1.5, 6.33, 18.5, 20 mm Mean of log(k) adapted from [22], 128) is a uniform distribution on the interval [1,128]. (2) A polynomial approximation of c a (t) was used in programming to speed up model simulations.
https://doi.org/10.1371/journal.pcbi.1008036.t001 began each simulated patient as healthy 6 years prior to the study initiation, with 6 years as a sufficient time period for previously originating cancers to be diagnosed before the first year of the study. In accordance with the study design, we assume that cancers can be diagnosed only by self physical examination with the detection sensitivity S b (d) before the study. Patients who get diagnosed, die of breast cancer or of other causes before the beginning of the study are not included in the simulated population. A time step of 1 day is chosen for numerical simulation.
Due to the nature of discretization, possibilities of two events occurring during a period of a time step are encompassed in our numerical simulations. In particular, Algorithm I in [29] was used to speed up the simulation of Stage 1 and also eliminate the simultaneous occurrence of undetected cancer and non-breast cancer death events. Because implementation of Algorithm I requires the integration of the cancer incidence rate c a (t), we used a polynomial approximation of c a (t). Moreover, we considered all possible cases including the concurrence of detected cancer and non-breast cancer death events when simulating Stage 2. At any time during the follow-up, if the death event occurs, the simulation is terminated. Let X i be the simulated outcomes, the number of detected cancers and the number of cancer deaths, and Y i be the corresponding recorded observations from the study. For the jth realization of our model, S j is the sum of squared deviations, i.e. S j = ∑ i (X i − Y i ) 2 . The three unknown parameters (detection sensitivity constant b b2 , scale factor in cancer incidence γ and cancer mortality hazard parameter Z) are chosen to minimize the expected value of S. Model calibration is carried out using the data only from the control arm. Then we simulate the model with the estimated parameters over both arms to reproduce the outcomes of the CNBSS.

Statistics and calculation of overdiagnosis
To quantify the survival and overdiagnosis, we simulated each patient in the BCHAM 100 times, 50 times in the mammography and 50 in the control arm, with identical parameters and time of onset of cancer. We used Cox proportional hazards (the coxph function in R [30]) to evaluate the effect of mammography arm on survival from the time of acquiring cancer, thus avoiding the effects of lead time bias [18]. To illustrate the effects, we conduct these regressions on data broken up into sextiles of aggressiveness k and maximum tumor diameter d max , and by the time of cancer acquisition before, during or after the study. We term these 108 groups as study subcohorts. To estimate confidence limits, we bootstrap the simulated patients by sampling with replacement.
We quantify the number of diagnoses by computing the number of patients diagnosed in each arm and comparing in each subcohort with a χ 2 test. To test for overdiagnosis itself, we compute the probability of death from other causes before death from cancer using the hazards h and h a from Table 1 and integrating as in [31]. To minimize confounding the benefits of treatment that sufficiently delay cancer-induced mortality to allow death from other causes, we lower the treatment effectiveness parameter in the cancer mortality hazard h to its minimum value α = 2.5.

Fit with CNBSS results
The outcomes of the CNBSS were used to calibrate our model parameters. Only three unknown parameters (b b2 , γ and Z) were estimated using the data from the control arm. The model was then simulated with the fitted parameters over both arms to reproduce the outcomes of the CNBSS, which fall in the range of our model projections (more than half within the first and third quartiles, Figs 4 and 5 and Table 3). Those simulation results include the number of detected cancers, deaths from cancer, deaths from other causes, and the distributions of age at diagnosis in both arms. With this minimal calibration, the BCHAM is still able to capture well the outcomes of the CNBSS.

Benefit and harm of mammography screening
The accurate fit of the model to the data under current conditions motivates testing how various model parameters affect the balance between benefit and harm. We first quantify the benefit of increasing the parameter α that describes the effectiveness of treatment (Fig 6). Value of screening is estimated as the percent increase in survival after 25 years of follow-up. As can be seen, when cancer treatment is highly effective, mammography screening provides only a minimal benefit.
To compare the benefit and harm as a function of age, tumor aggressiveness k, and maximum tumor diameter d max , we simulate identical populations in both arms. In the simulation, participants receive an annual mammography and physical examinations in the mammography arm, or only an annual physical examination in the control for the first five years of follow-up. The simulation of each patient was repeated 50 times to reduce variance due to individual variation. Higher values of k and d max strongly increase survival benefit from mammography screening, and decrease overdiagnosis since a tumor with large k and d max is more likely (if not surely) to be malignant and a patient with this kind of tumor is less likely to be overdiagnosed (Fig 7A). The effects of age are much weaker, with slightly improved benefits in the middle age groups (women between the ages of 44 and 56). For patients with unaggressive tumors (small values of k and/or d max ), mammography provides little benefit and the highest harm. Our CNBSS-calibrated model estimates that the maximum benefit of mammography screening is about 1.2% increase in 25-year survival shown in Fig 7A. This insignificant quantity is consistent with the empirical data of the CNBSS which has shown no screening benefit. The magnitudes of the benefit and harm strongly depend on the study to which the model parameters are fitted.
We also illustrate overdiagnosis and survivorship as functions of whether patients first acquired their cancer before, during, or after the study, and of the aggressiveness (k) and the maximum tumor diameter d max of their tumor. For overdiagnosis (Fig 8A and 8B), we compare the difference in the number of patients per thousand diagnosed in the mammography and control arms (top number) with the difference in the number of patients per thousand who would have died of other causes with less effective treatment (α = 2.5, bottom number). For survivorship (Fig 8C and 8D), we compare the difference in number of deaths per thousand in the mammography and control arms (top number) with the hazard ratio of death due to inclusion in the mammography arm (bottom number).
For patients who acquired an undetected tumor before the beginning of the study, those with ultimately small tumors are highly overdiagnosed and experience reduced survival due to the effects of the treatment. The greatest survival benefits accrue to patients with largest and most aggressive tumors. Patients who acquire a tumor during the five years of the study show a similar but weaker pattern of overdiagnosis, and no strong survival cost of overtreatment of rapidly-growing but ultimately small tumors. Patients who acquired tumors after the conclusion of the study show no effect of mammography as expected. These observations suggest that overdiagnosis mainly occurs at the first screening [32].
To capture a high-risk population, such as women with germline BRCA1 and BRCA2 mutations, family breast cancer history, hormone therapy and smoking history, we assume an increase of breast cancer incidence by a factor of 5 [33] over the baseline case. The main observations remain largely unchanged. In a high risk population, mammography screening is slightly more beneficial for younger women, illustrated by a slight shift in the age effect in Fig  7B compared       https://doi.org/10.1371/journal.pcbi.1008036.g008

Discussion
We have developed an individual-based mechanistic model of breast cancer incidence and mortality in a population based on the Canadian National Breast Screening Study (CNBSS). All but three of the parameters could be estimated independently from the literature or taken from the CNBSS report, with the remaining ones calibrated to the outcomes of the CNBSS. The model includes three forms of heterogeneity: tumor aggressiveness describing the growth rate, maximum tumor size and age. The treatment response is personalized to a certain extent through dependence of the hazard rate of cancer mortality on tumor aggressiveness and detected tumor size. We did not include more specific differences in response to therapy, such as through pre-existing resistance, due to lack of sufficient information.
The model accurately matches the cancer incidence and survival in the CNBSS (see the Results section). We then use the model to quantify the benefit and harm of mammography screening, with benefit measured as the increase in 25-year survival, and harm as the increase in overdiagnosis. The benefit of screening decreases almost to zero with highly effective treatment. In general, patients with the most rapidly growing or potentially largest tumors have the highest benefit and least harm from mammography screening, with only a relatively small effect of age.
We measured overdiagnosis in two ways, through the difference in the number of patients diagnosed (excess incidence [18]), and through the number who would have died of other causes if treatment were relatively ineffective. The goal of treatment is, of course, to ensure that all patients have the chance to die of something else, and thus comparing the number of deaths with relatively effective treatment confounds true overdiagnosis with successful treatment. An alternative defines overdiagnosis as cancers that would not have presented clinically during the patient's lifetime [15,31] which is most appropriate for modeling studies that optimize timing and type of testing.
Unlike age or other known risk factors, it is difficult in practice to predict specific tumor characteristics in an individual patient before recommending screening. In addition to improving mammography technology, increasing the net benefit of screening may require pretreatment tests that can identify women at the greatest risk of the highly aggressive cancers.
The CNBSS has been criticized because participants were volunteers [34] and thus possibly at higher risk than the general population. However, because participants were then randomized, this selection of volunteers should only increase the effect size of screening, but not create a change in direction.
The developed model has several limitations. The parameters from the literature come from a variety of sources and studies that might not apply across all populations. The remaining three are based on a single study, and future work will test how effectively it can reproduce the outcome of other clinical trials, especially the Swedish trial [6,7], by modifying few or no parameter values. Calibrating this model to the Swedish trial, we hope to comprehend which factors contribute to the contradicting outcomes of these two studies. In addition, our modeling of treatment is quite simplified, without taking into account recent improvements or different treatments for different breast cancer types.
Our model brings a new quantitative tool to bear on the controversy over the use of mammography screening. This CNBSS-based model suggests, in line with recent trials, that the benefits are sufficiently small and the harm sufficiently large to make screening of dubious value except in patients destined to have highly aggressive cancers, who of course are difficult if not impossible to identify in advance.