Developing and validating an individualized breast cancer risk prediction model for women attending breast cancer screening

Background Several studies have proposed personalized strategies based on women’s individual breast cancer risk to improve the effectiveness of breast cancer screening. We designed and internally validated an individualized risk prediction model for women eligible for mammography screening. Methods Retrospective cohort study of 121,969 women aged 50 to 69 years, screened at the long-standing population-based screening program in Spain between 1995 and 2015 and followed up until 2017. We used partly conditional Cox proportional hazards regression to estimate the adjusted hazard ratios (aHR) and individual risks for age, family history of breast cancer, previous benign breast disease, and previous mammographic features. We internally validated our model with the expected-to-observed ratio and the area under the receiver operating characteristic curve. Results During a mean follow-up of 7.5 years, 2,058 women were diagnosed with breast cancer. All three risk factors were strongly associated with breast cancer risk, with the highest risk being found among women with a family history of breast cancer (aHR: 1.67), proliferative benign breast disease (aHR: 3.02) and previous calcifications (aHR: 2.52). The model was well calibrated overall (expected-to-observed ratio ranging from 0.99 at 2 years to 1.02 at 20 years) but slightly overestimated the risk in women with proliferative benign breast disease. The area under the receiver operating characteristic curve ranged from 58.7% to 64.7%, depending on the time horizon selected. Conclusions We developed a risk prediction model to estimate the short- and long-term risk of breast cancer in women eligible for mammography screening using information routinely reported at screening participation. The model could help guide individualized screening strategies aimed at improving the risk-benefit balance of mammography screening programs.


Introduction
There is ongoing debate on the benefits and harms of breast cancer screening [1][2][3]. To improve this balance, current evidence supports personalized screening [4,5]. Modeling studies have shown that modifying the screening interval, screening modality, or age range of the target population based on women's individual risk yielded greater benefit than conventional standard strategies [5][6][7]. Several risk models have been designed to estimate women's individual breast cancer risk based on their personal characteristics [8][9][10][11][12][13][14][15]. However, most of these models have not been specifically developed to estimate the risk of women targeted for breast cancer screening in order to offer them personalized strategies.
A recent consensus statement of the European Conference on Personalized Early Detection and Prevention of Breast Cancer (ENVISION) [16] stated the need to develop breast cancer risk prediction models based on data from large screening cohorts and including risk factors easily obtainable at screening participation, such as previous mammographic features and prior benign breast disease.
To date, only one model has specifically aimed to predict women's individual risk in order to personalize breast cancer screening strategies [17]. Although highly valuable, the model was based on short-term risk estimates and did not account for relevant characteristics of prospective studies such as internal time-dependent covariates. This model only estimates the two-year risk, which could lead to bias, as one of the aims of breast cancer screening personalization is to identify women at lower risk in order to extend their screening interval to three or four years. Therefore, if new breast cancer risk models are developed to analyze the possibilities offered by personalized screening strategies, it would be useful to estimate the biennial risk of each woman, in other words, to obtain estimates not only at 2 years but every two years (2, 4, 6, 8, ... up to 20 years, which is the total time a woman is screened). This will help to better understand the different possible screening strategies and will make it possible to observe differences in the validation of the model estimates across time horizons. There is therefore a need for breast cancer risk prediction models with short- and long-term risk estimates, based on data from large screening cohorts. These new risk models should include a limited and feasible number of variables for the proposed objective, for example, detailed information on the type of previous benign breast disease or previous mammographic characteristics, which existing risk models tend not to use.
We aimed to design and validate an individualized risk prediction model to estimate the biennial risk of breast cancer in women eligible for mammography screening by using data from the long-standing population-based screening program in Spain.

Setting and study population
Breast cancer screening in Spain started in 1990 in a single setting and expanded until it became nationwide in 2006. This program follows the recommendations of the European Guidelines for Quality Assurance in Breast Cancer Screening and Diagnosis [18]. Women aged 50 to 69 years are invited to biennial screening mammography by written letter. Screening mammograms are interpreted according to the Breast Imaging Reporting and Data System (BI-RADS) scale by trained breast radiologists [19]. Women with an abnormal mammographic feature are recalled for further assessments to confirm or rule out malignancy. Women without a breast cancer diagnosis are invited again for routine screening at 2 years. Overall, breast cancer screening in Spain has a recall rate of 43.0, a detection rate of 4.0, and an interval cancer rate of 1.1 per 1,000 mammographic examinations [20]. The positive predictive value is 9.8% for recalls and 38.9% for recalls involving invasive procedures. Overall, 16.8% of all screen-detected cancers are ductal carcinoma in situ (DCIS). More details of breast cancer screening in Spain are described elsewhere [21].
We analyzed data from two centers forming part of the Spanish breast cancer screening program in the Metropolitan Area of Barcelona. These centers routinely gather information on family history of breast cancer, previous benign breast disease (BBD), and previous mammographic features. The centers collect information on screening mammography examinations, recalls, further assessments, and diagnostic results in their defined catchment areas. The cohort included all 123,251 women screened at least once between 1995 and 2015 and followed up until December 2017. We excluded 758 women diagnosed with breast cancer at the first screen, 210 women with missing information on family history, 213 women with missing information on previous BBD, and 101 women with missing information for both family history and previous BBD. The study population for the analysis consisted of 121,969 women who underwent 437,540 screening mammograms during the study period.
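The cohort flow described above can be checked with a few lines of arithmetic. The sketch below uses only the counts quoted in the text; the variable names are illustrative, and the mean number of mammograms per woman is a derived figure that the paper does not report.

```python
# Counts quoted in the text for the cohort flow.
screened = 123_251
exclusions = {
    "breast cancer diagnosed at first screen": 758,
    "missing family history": 210,
    "missing previous BBD": 213,
    "missing both family history and previous BBD": 101,
}
mammograms = 437_540

# Women remaining after all exclusions.
analysed = screened - sum(exclusions.values())

# Derived figure (not reported in the paper): average screens per woman.
mean_screens = mammograms / analysed

print(f"Women analysed: {analysed:,}")
print(f"Mean screening mammograms per woman: {mean_screens:.2f}")
```

The exclusions reconcile exactly with the 121,969 women reported as the study population.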

Definition of study variables
Information on family history and history of prior breast biopsies was self-reported and collected from face-to-face interviews conducted by trained professionals at the time of mammography screening. This information was consistently collected over the 20-year study period. A family history of breast cancer was defined as having at least one first-degree relative with a history of breast cancer.
Breast biopsy results were classified by a community pathologist at each center using SNOMED codes [22]. Pathological diagnoses were grouped following the benign breast disease classification proposed by Dupont and Page [23][24][25] into non-proliferative and proliferative disease. Proliferative lesions with and without atypia were combined into a single category due to the small number of subsequent breast cancer cases among those with a proliferative lesion with atypia. If a woman reported having had a biopsy before the start of screening but no pathology results were available, she was classified as having a prior biopsy with unknown diagnosis.
A community radiologist routinely reported on mammographic features found at mammography screening interpretation. We classified as mammographic features any mass, calcification, asymmetry or architectural distortion reported by radiologists at mammographic interpretation. Findings were assigned to the category of multiple mammographic features if more than one of the previous mammographic features had been reported simultaneously at screening interpretation.
We included both invasive breast cancers and DCIS for the analysis.

Model design
We built the risk prediction model using a random sample of 60% of the study population (estimation subcohort). The remaining 40% was used for an internal validation (validation subcohort). We estimated the age-adjusted hazard ratios (aHR) and the 95% confidence intervals (95%CI) for the breast cancer incidence for each category of family history, previous BBD, and previous mammographic features with the estimation subcohort. Age was included in the model as a continuous variable. We used partly conditional Cox proportional hazards regression, an extension of the standard Cox model, to incorporate changes in these risk factors over time. Robust standard errors were used to estimate 95% confidence intervals using the Huber sandwich estimator [26]. A woman diagnosed with cancer contributed women-years at risk from the date of her first mammogram to the date of diagnosis. Since we can identify all interval cancers, a woman without a cancer diagnosis at the end of her follow-up contributed women-years at risk from the first mammogram to the last mammogram plus 2 years of follow-up.
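The 60/40 split and the women-years rule described above can be sketched as follows. This is a minimal illustration and not the authors' R code: the function names, the fixed random seed, and the 365.25-day year are assumptions, and the end of follow-up in December 2017 is not modeled.

```python
import random
from datetime import date

DAYS_PER_YEAR = 365.25  # assumed conversion from days to years
EXTRA_FOLLOW_UP_YEARS = 2  # non-cases are followed 2 years past the last mammogram


def years_at_risk(first_mammogram, last_mammogram, diagnosis=None):
    """Women-years contributed by one woman.

    Cases: first mammogram to diagnosis.
    Non-cases: first mammogram to last mammogram, plus 2 years.
    """
    if diagnosis is not None:
        return (diagnosis - first_mammogram).days / DAYS_PER_YEAR
    span = (last_mammogram - first_mammogram).days / DAYS_PER_YEAR
    return span + EXTRA_FOLLOW_UP_YEARS


def split_cohort(ids, estimation_fraction=0.6, seed=0):
    """Randomly split identifiers into estimation and validation subcohorts."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * estimation_fraction)
    return shuffled[:cut], shuffled[cut:]


# Toy usage with hypothetical dates and identifiers.
case_years = years_at_risk(date(2000, 1, 1), date(2002, 1, 1), date(2004, 1, 1))
noncase_years = years_at_risk(date(2000, 1, 1), date(2004, 1, 1))
estimation, validation = split_cohort(range(100))
print(case_years, noncase_years, len(estimation), len(validation))
```

Splitting on women rather than on individual mammograms keeps all examinations of the same woman in one subcohort, which is what a split of the study population implies.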
We tested whether family history, previous BBD, and previous mammographic features interacted among themselves or with age. The interaction terms were not significant and were therefore not included in the model. The proportional hazards assumption was assessed by plotting the log-minus-log of the survivor function against log time for each predictor variable. The proportional hazards assumption appeared to be reasonable for all predictors.

Model validation
We calculated the absolute breast cancer risk estimates for each 2-year interval over the 20-year lifespan covered by screening (ages 50 to 69 years) for each individual in the validation subcohort. As proposed by Zheng and Heagerty, we used a general hazard function to predict the absolute risk of breast cancer diagnosis based on length of follow-up, prediction time, and women's risk profile [27].
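To make the link between hazard ratios and absolute risk concrete, the sketch below applies the standard proportional hazards relation, risk(t) = 1 - exp(-H0(t) x relative hazard), with the relative hazard taken as the product of the aHRs because no interaction terms were retained. It is a deliberate simplification of the partly conditional approach of Zheng and Heagerty: the baseline cumulative hazards are hypothetical placeholders, age is omitted because its coefficient is not given in the text, and only the three aHRs quoted in the abstract are used.

```python
import math

# Adjusted hazard ratios quoted in the abstract; absent factors count as 1.
AHR = {
    "family_history": 1.67,
    "proliferative_bbd": 3.02,
    "calcifications": 2.52,
}


def relative_hazard(profile):
    """Multiply the aHRs of the risk factors present (no interaction terms)."""
    rh = 1.0
    for factor in profile:
        rh *= AHR[factor]
    return rh


def absolute_risk(baseline_cum_hazard, profile):
    """Cumulative risk by a time horizon: 1 - exp(-H0(t) * relative hazard)."""
    return 1.0 - math.exp(-baseline_cum_hazard * relative_hazard(profile))


# HYPOTHETICAL baseline cumulative hazards by horizon in years (not from the paper).
H0 = {2: 0.004, 10: 0.018, 20: 0.040}

for horizon, h0 in H0.items():
    low = absolute_risk(h0, [])
    high = absolute_risk(h0, ["family_history", "calcifications"])
    print(f"{horizon:>2}-year risk: no risk factors {low:.4f}, "
          f"family history + calcifications {high:.4f}")
```

Because the baseline values are invented, the printed risks only illustrate how a risk profile scales the baseline; they are not estimates from the model.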
We conducted an internal validation of the model to evaluate its predictive performance by assessing its calibration and discrimination. To assess calibration, we calculated the ratio between the expected breast cancer rate in the validation subcohort versus the observed rate in the estimation subcohort. To account for censoring, the observed rate was estimated using the Kaplan-Meier estimator. The expected breast cancer rate was calculated as the average of the risk estimates in the validation subcohort. The expected breast cancer rate in a specific risk group was calculated as the average of the risk estimates for each woman in that risk group of the validation subcohort. The expected-to-observed (E/O) ratio assessed whether the number of women predicted to develop breast cancer from the model matched the actual number of breast cancers diagnosed in the validation subcohort. An E/O ratio of 1.0 indicates perfect calibration. We calculated the E/O ratio 95% confidence intervals (95% CI) using the formula of the standardized mortality ratio proposed by Breslow and Day [28]. The discriminatory accuracy of our model was assessed by estimating the area under the receiver operating characteristic curve (AUC) for each 2-year interval based on the predicted risks for each woman and whether she developed breast cancer during the time interval or not [29]. The predicted risks were calculated using the model coefficient estimates at the baseline mammogram for those women in the validation cohort who have been followed for a time greater than or equal to the time horizon being estimated. The AUC measured the ability of the model to discriminate between women who will develop breast cancer and those who will not. We calculated the 95% CI using the approach proposed by Hanley and McNeil [30].
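The calibration step can be sketched as follows: the expected rate is the mean predicted risk, and the observed rate is one minus the Kaplan-Meier survival at the chosen horizon. The function names and toy data are hypothetical. The confidence interval uses a generic log-scale approximation, E/O x exp(±1.96/√O) with O the number of observed cases, which is an assumption standing in for the Breslow and Day formula cited above and may differ from it.

```python
import math


def kaplan_meier_risk(times, events, horizon):
    """Observed cumulative incidence by `horizon`: 1 - Kaplan-Meier survival.

    times: follow-up time per woman; events: 1 if diagnosed, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        diagnosed = leaving = 0
        # Process every woman whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            diagnosed += data[i][1]
            leaving += 1
            i += 1
        survival *= 1.0 - diagnosed / at_risk
        at_risk -= leaving
    return 1.0 - survival


def expected_to_observed(predicted_risks, times, events, horizon):
    """E/O ratio: mean predicted risk over Kaplan-Meier observed risk."""
    expected = sum(predicted_risks) / len(predicted_risks)
    return expected / kaplan_meier_risk(times, events, horizon)


def eo_confidence_interval(eo, n_observed_cases, z=1.96):
    """ASSUMED log-scale approximation; not necessarily the cited formula."""
    half_width = z / math.sqrt(n_observed_cases)
    return eo * math.exp(-half_width), eo * math.exp(half_width)


# Toy data: five women, two diagnosed, three censored (hypothetical).
times = [1, 2, 3, 4, 5]
events = [1, 0, 1, 0, 0]
predicted = [0.5, 0.4, 0.5, 0.4, 0.5]
eo = expected_to_observed(predicted, times, events, horizon=5)
print(eo, eo_confidence_interval(eo, n_observed_cases=sum(events)))
```

An E/O ratio above 1 means the model predicts more cancers than were observed, and a ratio below 1 means it predicts fewer.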
Statistical tests were two-sided and all p-values <0.05 were considered statistically significant. All analyses were performed using the statistical software R version 3.4.3 (Development Core Team, 2014).
The study was approved by the Clinical Research Ethics Committee of Hospital del Mar Medical Research Institute (2015/6189/I). The review boards of the institutions providing data granted approval for data analyses. This is an entirely registry-based study that used anonymized retrospective data and hence there was no requirement for written informed consent.
The authors declare that they have no conflicts of interest.
Overall calibration of the model was accurate across all 2-year time horizons. The E/O ratio ranged from 0.99 at 2 years to 1.02 at 20 years and was never significantly different from 1 (Table 3). The AUC was lowest at the 4-year risk estimate (AUC, 58.7%; 95%CI: 55.9%-61.5%) and highest at the 18-year risk estimate (AUC, 64.7%; 95%CI: 62.5%-66.9%) and was significantly higher than 50% for all the time horizons.
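AUC values with confidence intervals of the kind reported above can be sketched as follows. The AUC is computed as the Mann-Whitney probability that a woman who developed breast cancer received a higher predicted risk than a woman who did not. The interval uses the widely quoted Hanley and McNeil (1982) standard-error formula; whether this is the exact variant the authors applied is an assumption, and the inputs below are hypothetical.

```python
import math


def auc_mann_whitney(case_risks, noncase_risks):
    """Share of case/non-case pairs where the case has the higher predicted risk.

    Ties count as one half. The pairwise loop is quadratic, so it suits
    small illustrations rather than a full screening cohort.
    """
    wins = 0.0
    for c in case_risks:
        for n in noncase_risks:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(case_risks) * len(noncase_risks))


def hanley_mcneil_ci(auc, n_cases, n_noncases, z=1.96):
    """Normal-approximation CI from the Hanley-McNeil standard error."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_cases - 1) * (q1 - auc ** 2)
        + (n_noncases - 1) * (q2 - auc ** 2)
    ) / (n_cases * n_noncases)
    se = math.sqrt(variance)
    return auc - z * se, auc + z * se


# Hypothetical predicted risks for two cases and three non-cases.
auc = auc_mann_whitney([0.03, 0.06], [0.02, 0.04, 0.07])
print(auc, hanley_mcneil_ci(auc, n_cases=2, n_noncases=3))
```

An interval that excludes 0.5 corresponds to the statement that discrimination is significantly better than chance.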
Estimates for the 10-year time horizon showed that the model slightly overestimated breast cancer rates in women with masses (E/O ratio, 1. The distributions of the absolute cumulative risk estimates at the 2-, 10- and 20-year time horizons are shown in Fig 1. The 10-year risk was between 1.5% and 2% in 60% of the women and was higher than 2% in 35%. The 20-year risk was lower than 3% in only 4% of the women, between 5% and 7% in 17% of the women, and was higher than 7% in approximately 9% of the women.

Discussion
We used individual-level data from a large cohort of women regularly screened in Spain to design and validate a risk prediction model to estimate the biennial risk of breast cancer in

PLOS ONE
Developing a breast cancer risk prediction model for women attending breast cancer screening

[31,32]. However, as noted in the consensus statement of the last European Conference on Risk-Stratified Prevention and Early Detection of Breast Cancer, there is a need for risk models specifically designed for women eligible for breast cancer screening, based on data from large screening cohorts [16]. A previous model was designed to estimate the risk of breast cancer in women eligible for mammography screening [17]. The model used the Karma cohort from Sweden and included information on mammographic features. That study focused solely on estimating the short-term risk of breast cancer up to the next mammographic examination. In addition, it used a case-control design to establish risk factors, which may bias the estimates of the short-term association with breast cancer risk. Our model adds to the breast cancer risk prediction models currently available and can be used to help guide personalized screening strategies by employing information easily obtained at screening participation. Our model also provides an estimate of a woman's risk of breast cancer at 2-yearly intervals.
Our model was further developed by adding the effect of mammographic features, such as masses, calcifications, asymmetries, and architectural distortions. Previous studies have shown that mammographic features increase the subsequent risk of breast cancer [33]. In our model, the strongest influence on risk was conferred by calcifications. The biology behind calcifications is not well established. It has been suggested that mammary cells may acquire some mesenchymal characteristics, being able to contribute to the production of breast calcifications as a sign of carcinogenic transformation [34].
The role of BBD as a risk factor for breast cancer is well established [9,33,35]. However, its inclusion in breast cancer risk prediction models is rare, mainly because available information on BBD in large cohorts of women is uncommon. Only one previous risk model included different estimates for the different categories of the Dupont and Page BBD pathological classification [23][24][25]. The Breast Cancer Surveillance Consortium model was updated to include BBD, which led to only minimal improvement in discrimination [9]. This lack of significant improvement could be due to the absence of pathology results for most women who reported breast biopsies prior to their first screening round, as was also the case in our study. However, the addition of BBD to the model markedly increased the proportion of women identified as being at high risk for invasive breast cancer.
We assessed the internal validity of the model by means of its calibration and discriminatory accuracy. To perform internal validation, we split our cohort into two sets: the estimation subcohort, used to develop the model, and the validation subcohort, used for its internal validation. This technique, known as split validation, is common for this type of model [9], but cross-validation or bootstrapping could also have been performed [36,37]. The model showed accurate calibration, neither overestimating nor underestimating the overall risk across the different time horizons. Table 4 shows the calibration of the 10-year estimates from the model in risk factor subgroups. We also calculated the E/O ratio in risk factor subgroups for each of the proposed time horizons, but we show only the 10-year estimates, since presenting all of them could be confusing. The 10-year estimates offer a good balance between the number of events observed (at the earliest time horizons, some subcategories have few observed events) and the number of women observed (at the latest time horizons, some women have been lost to follow-up, as the mean follow-up time is 7.5 years). Nonetheless, the model overestimated the risk for women with a proliferative BBD, due to the small number of cases in this subgroup.
The model showed modest discrimination with a maximum AUC of 64.7%. Discriminatory accuracy in breast cancer risk prediction models is usually low because a substantial proportion of cases are diagnosed in women with no known risk factors, and the AUC of the different models varies between 60% and 70% [14]. This is clearly in contrast with prediction models for other diseases, such as cardiovascular disease, which achieve good discrimination [38,39]. However, the model presented in this paper performed as well as other models that include many other risk factors that were not available in this study. Since one of the reasons why existing risk models have not been implemented for personalized screening is that it is difficult to collect all of the necessary risk factors in practice, a simpler model like the one we present could be useful. We tested other approaches to validate our model, such as the AUC estimation proposed by Li et al. [40]. This estimation uses weights to calculate the contribution to the estimates of those women without a breast cancer diagnosis who were censored before reaching the time horizon. However, this approach produced no substantial differences in our validation.
A major strength of our model is that we used individual-level data from more than 120,000 women participating in a large, well-established, population-based screening program in Spain from 1995 to 2015, with a mean follow-up of more than 7.5 years and a maximum of 20 years. The program has a participation rate of 67% and a re-attendance rate of 91.2% [19].
This study also has some limitations. First, a major weakness is the lack of information on breast density, which was not systematically collected as part of screening data in the participating centers. Previous models estimating individual breast cancer risk have shown that the addition of breast density improved the discriminatory power of the models [9,17,41,42]. Dense breasts confer women a higher risk of breast cancer and are also associated with a higher risk of false-positive results, masking, and interval cancers [43]. In addition, we had no information on common genetic variants, which have been added to other breast cancer risk prediction models [44,45]. However, the discriminatory accuracy of those models was scarcely improved by the inclusion of information on single nucleotide polymorphisms (SNPs). The absence of both variables may make the model useful for institutions where these risk factors are not available.
Second, the number of breast cancer cases among women with a proliferative BBD was small, which reduced our ability to accurately predict the expected number of cases across risk factor subgroups. Nevertheless, the overall calibration of the model across the time horizons assessed was highly accurate. Also, as a consequence of the small number of subsequent breast cancer cases among women with a proliferative BBD with atypia, we merged proliferative BBD with and without atypia into a single category, which might make the model less usable in practice. Third, our model was based on a large set of representative data from the Breast Cancer Screening Program in Spain, which provides good generalizability. However, external validation of the results is needed to verify the predictive performance of our risk model.
Another limitation might be the reason for censoring. Over 52% of women in the cohort had their last mammogram in the last two years of the study follow-up and 17% of women had their last mammogram at ages 68 or 69 years. Most of the remaining 31% are women who did not participate in the 2014-2015 round or who have changed health areas and thus are not in our study population. The screening program does not have an exhaustive record of which women die and, therefore, we cannot differentiate them from non-participating women.
Finally, we were unable to analyze the association between the laterality of the BBD and the subsequent risk of breast cancer. In a previous analysis, we found that 40% of incident breast cancer cases in women with BBD were contralateral to the prior BBD, suggesting that a large proportion of benign lesions may be risk markers rather than precursors of subsequent cancer [46].

Conclusions
We designed and internally validated a risk prediction model to estimate the short- and long-term risk of breast cancer in women eligible for mammography screening based on their age, family history, previous benign breast disease, and previous mammographic features. The model showed good calibration and modest discriminatory power, and could be improved by adding further variables such as breast density and polygenic risk scores. The model can be used biennially to predict a woman's breast cancer risk during her screening lifespan (age 50 to 69 years) using information easily obtained at screening participation. Risk prediction models specifically designed for women eligible for breast cancer screening are key to guiding individualized screening strategies aiming to improve the risk-benefit balance of mammography screening programs.