Spatial and temporal modeling of breast cancer mortality in Kansas: An R-INLA approach

  • Stephanie Colwell,

    Roles Data curation, Formal analysis, Investigation, Methodology, Validation, Writing – original draft

    Affiliation Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas City, Kansas, United States of America

  • Prabhakar Chalise,

    Roles Conceptualization, Data curation, Formal analysis, Methodology, Project administration, Supervision, Writing – original draft

    Affiliations Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas City, Kansas, United States of America, The University of Kansas Cancer Center, University of Kansas Medical Center, Kansas City, Kansas, United States of America

  • Byron Gajewski,

    Roles Conceptualization, Formal analysis, Methodology, Writing – original draft

    Affiliations Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas City, Kansas, United States of America, The University of Kansas Cancer Center, University of Kansas Medical Center, Kansas City, Kansas, United States of America

  • Isuru Ratnayake,

    Roles Conceptualization, Data curation, Formal analysis, Methodology, Resources, Supervision, Writing – original draft

    Affiliations Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas City, Kansas, United States of America, The University of Kansas Cancer Center, University of Kansas Medical Center, Kansas City, Kansas, United States of America

  • Dinesh Pal Mudaranthakam

    Roles Conceptualization, Formal analysis, Investigation, Methodology, Project administration, Supervision, Writing – original draft

    dmudaranthakam@kumc.edu

    Affiliations Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas City, Kansas, United States of America, The University of Kansas Cancer Center, University of Kansas Medical Center, Kansas City, Kansas, United States of America

Abstract

Introduction

Based on Breast Cancer Statistics, 2025, breast cancer is a leading cause of death among women in the United States. Geographic disparities and associated risk factors influence breast cancer mortality over time and across spatial areas within the state of Kansas.

Objective

This study investigates the spatial and temporal distribution of breast cancer mortality in Kansas, analyzing associations with socioeconomic, healthcare, and behavioral characteristics while accounting for geographic heterogeneity and temporality.

Methods

Using data from 105 counties within Kansas, breast cancer mortality was modeled using known count distributions. Within these model frameworks, two approaches to spatial units were implemented: using county-level units and creating spatial clusters of counties. These models incorporated both spatially structured and unstructured effects with different correlation structures. Key socioeconomic, healthcare, and behavioral factors were analyzed. Model performance was evaluated using the Deviance Information Criterion (DIC), Widely Applicable Information Criterion (WAIC), and Marginal Log Likelihood.

Results

The Poisson BYM2 model provided the best fit for the county analysis (DIC = 1305.02, WAIC = 1308.40) and the spatial cluster analysis (DIC = 2435.90, WAIC = 2420.70). The percent of females who binge drink alcohol was significant in the county analysis. In contrast, the average percent of females who binge drink alcohol, the average percent of females who smoke tobacco, the average percentage of females with diabetes, and the average percent of females were significant in the spatial cluster analysis. The relative risk of breast cancer mortality did not change significantly over time in the county analysis, but it did in the cluster analysis.

Conclusions

Spatial and temporal models provide valuable insights into the risk of breast cancer mortality in Kansas, within the county analysis and the spatial cluster analysis. Public health officials should focus on providing resources and equitable healthcare in high-risk counties and high-risk spatial clusters through targeted interventions to improve access to healthcare and breast cancer outcomes.

Introduction

In the United States of America, breast cancer is one of the leading causes of death among women [1]. Although breast cancer mortality rates in the United States have been decreasing annually, the rate of this decrease has slowed in recent years [2]. Despite the steady decline, breast cancer still poses a significant threat among women. According to the Kansas Department of Health and Environment, female breast cancer is the top cancer diagnosis among females in Kansas between 2017 and 2021 and was recorded as the second-highest cancer death among females in Kansas between 2019 and 2023 [3]. Within the State of Kansas, the age-adjusted incidence rate of female breast cancer was 135.6 per 100,000 between 2017 and 2021 [3], and the age-adjusted female breast cancer mortality rate in Kansas was 19.4 deaths per 100,000 females between 2019 and 2023 [3]. The age-adjusted incidence rate of female breast cancer was 130.8 per 100,000 women between 2018 and 2022, with an age-adjusted mortality rate of 19.2 per 100,000 annually in the United States between 2019 and 2023 [4].

When attempting to determine breast cancer mortality rates, certain risk factors need to be considered. Numerous studies have highlighted various risk factors for female breast cancer, including screening practices [5], smoking [6], diabetes [7], obesity [7,8], physical inactivity [9], race [9], educational attainment [10], poverty [10], and environmental influences like air pollution [10–12]. Other potential risk factors, such as age at menarche, age at menopause, breastfeeding, parity, and the percentage of the population born abroad, could also be relevant. However, obtaining comprehensive data on these factors for all counties in the contiguous United States may be highly challenging or even infeasible [13]. Because key risk factors for breast cancer mortality cannot always be captured directly at the county level, researchers can use spatial methods to adjust models for these inherent differences between counties. According to the 2020 Census [14], 59 of Kansas’s 105 counties, or 56%, were classified as rural, with an urban index of less than 0.40. Individuals living in rural areas are more likely to experience hardships in healthcare, such as higher travel costs and a greater need to access care due to limited resources in their area [15]. These burdens can negatively affect breast cancer outcomes. By accounting for rurality, public health efforts can better tailor interventions, allocate resources more equitably, and develop policies that improve access to timely and high-quality care.

Spatial and temporal modeling of breast cancer mortality, especially within small spatial areas, is not without its challenges. Cancer registries and state health agencies typically apply specific criteria to the release of data for small geographic areas. A standard practice within small area estimation is to report counts only when they exceed 10 [16]. Small area estimation of breast cancer mortality, such as within counties in Kansas, can present challenges when the number of breast cancer deaths in certain counties is very low, often fewer than 10. These small counts not only lead to statistical instability of spatial and temporal models but also raise serious concerns about patient privacy and confidentiality, as individuals could potentially be identified, especially in sparsely populated areas [16]. To address this issue, spatial clustering techniques can be applied to group neighboring counties with similar characteristics, thereby forming spatial clusters that each contain more than 10 breast cancer deaths. This approach can improve the reliability of statistical estimates while also protecting individual privacy by aggregating data to a level that maintains confidentiality.

A similar Bayesian spatial–temporal modeling framework using R-INLA was previously applied by Deblina Khana et al. (2018) to estimate mortality rates across U.S. counties at the national level. In contrast, the present study focuses on the University of Kansas Cancer Center catchment area, with the goal of producing locally actionable small-area estimates to inform regional cancer control and resource allocation.

In this study, we estimate county-level spatiotemporal variation in breast cancer mortality using Bayesian hierarchical models fitted via Integrated Nested Laplace Approximation (INLA). To accommodate overdispersion, excess zeros, and spatial–temporal dependence, we compare Poisson (Po), generalized Poisson (GP), negative binomial (NB), and their zero-inflated counterparts (ZIP, ZINB) under Besag-York-Mollié (BYM) and Besag-York-Mollié 2 (BYM2) structures for space, and Random Walk order 1 (RW1) and order 2 (RW2) structures for time, including county-specific effects and spatial clustering. Models adjust for socioeconomic, healthcare access, and lifestyle covariates, and we generate maps of relative risk over space and time. We select the specification that best balances fit and interpretability using DIC, WAIC, and marginal likelihood, to characterize how mortality risk evolves geographically and temporally after accounting for known risk factors. By incorporating spatial mapping, we identify hotspots, defined as areas with persistently elevated risk based on Bayesian exceedance probability mapping. These hotspots give healthcare providers and policy makers actionable locations for targeted resource allocation and funding to help reduce breast cancer mortality.

Methods

Data sources

This study used data on breast cancer deaths in 105 counties in the state of Kansas over a four-year period from 2018 to 2021. The data were obtained from the Kansas Information for Communities (KIC) health information portal [17]. This data uses breast cancer deaths from malignant neoplasms of the breast, explicitly between 2018 and 2021. In addition to mortality counts, the study incorporates various demographic, healthcare, and behavioral characteristics. These covariates were obtained from the Kansas Health Matters portal [18]. The county level covariates that are incorporated are percentage of female [18], number of primary care physicians per 100,000 population [18], percentage of females who binge drink [18], percentage of females who smoke cigarettes [18], percentage of females over age 20 that are obese, percentage of females over age 20 that have been diagnosed with diabetes [18], and the community assets and relative rurality (CARR) index [19]. This is the latest curated data available and accessible for mortality counts. Data collection details for each covariate are described below.

The percentage of females is defined as the percentage of the population that is female. This covariate was captured through the U.S. Census Bureau Population and Housing Unit Estimates. The number of primary care physicians is defined as the primary care provider rate per 100,000 population [18]. Primary care providers included any practicing physician who specialized in general practice medicine, family medicine, internal medicine, or pediatrics. This covariate was captured through County Health Rankings, which is maintained by the Conduent Healthy Communities Institute [18]. The percentage of females who binge drink is defined as the percentage of female adults who reported binge drinking at least once during the 30 days prior to the survey. Female binge drinking is defined as drinking four or more drinks on one occasion. This covariate was captured by CDC – PLACES and is maintained by Conduent Healthy Communities Institute [18]. The percentage of females who smoke is defined as the percentage of female adults who currently smoke cigarettes. This covariate was captured by CDC-PLACES and is maintained by Conduent Healthy Communities Institute [18]. The percentage of females who are obese is defined as the percentage of female adults aged 20 and older who are obese according to the Body Mass Index (BMI). The BMI is calculated by taking the females weight and dividing it by their height squared in metric units. A BMI greater than or equal to 30 is considered to be obese. This covariate was captured by the Centers for Disease Control and Prevention and maintained by Conduent Healthy Communities Institute [18]. The percentage of females over 20 with diabetes is defined as the percentage of females aged 20 and older who have ever been diagnosed with diabetes. Females who were diagnosed with diabetes only during the course of their pregnancy were not included. 
This covariate was captured by the Centers for Disease Control and Prevention and maintained by Conduent Healthy Communities Institute [18]. The community assets and relative rurality (CARR) index is defined as a continuous, multi-dimensional measure of rurality based on the concept of sustainable development that integrates measures of environmental, social, and economic resources. The CARR index varies between 0 and 1 where values closer to 0 represent urban areas and values closer to 1 represent rural areas [19]. The CARR values in Kansas ranged from 0.178 to 0.740, reflecting meaningful variation in rurality across counties. However, the majority of counties were clustered within a moderate range (median = 0.396; IQR: 0.315–0.566), suggesting that extreme rurality differences are less common. All data were accessed for research purposes on July 25th, 2025, and authors did not have access to information that could identify individual participants during or after data collection.

Expected counts were calculated using internal (crude) standardization rather than age‑standardized reference populations due to limited availability and instability of age‑specific county‑year data, particularly in counties with small counts. Accordingly, estimated relative risks reflect comparative spatial patterns rather than fully age‑adjusted mortality rates.

In the following section, we discuss the spatial-temporal Bernardinelli model that is applied to the breast cancer mortality count data and covariates described in this section. Although the data are breast cancer mortality counts, the model below uses the population of each county as an offset in order to model the relative risk of breast cancer mortality adjusted for the differing population sizes within each county in Kansas.

Spatial-temporal Bernardinelli model

This paper implements the Bernardinelli spatial-temporal model. Let $i = 1, \dots, n$ denote the spatial units and $t = 1, \dots, T$ denote the temporal units. The Bernardinelli [20] model assumes that the observed number of cases, $Y_{it}$, in spatial unit $i$ at temporal unit $t$ follows a Poisson distribution:

$$Y_{it} \sim \text{Poisson}(\mu_{it})$$

where the Poisson mean (i.e., $\mu_{it}$) is defined as:

$$\mu_{it} = E_{it}\,\theta_{it}$$

Taking the logarithm of the mean yields the log-linear formulation:

$$\log(\mu_{it}) = \log(E_{it}) + \log(\theta_{it})$$

where $\log(E_{it})$ is included as an offset term to account for differences in population size across spatial units and temporal units.

Here, $E_{it}$ denotes the expected number of cases, and $\theta_{it}$ represents the relative risk for spatial unit $i$ at temporal unit $t$. The expected counts were calculated using internal standardization based on the overall crude mortality rate across all spatial units at each temporal unit:

$$E_{it} = P_{it}\,\frac{\sum_{k=1}^{n} Y_{kt}}{\sum_{k=1}^{n} P_{kt}}$$

where $P_{it}$ is the population at risk in spatial unit $i$ at temporal unit $t$. Thus, the relative risk $\theta_{it}$ represents the ratio of the risk in spatial unit $i$ at temporal unit $t$ to the overall average risk across all spatial units at the same temporal unit, as defined by the internally standardized reference rate used to compute $E_{it}$. Values greater than 1 (i.e., $\theta_{it} > 1$) indicate higher-than-average risk, while values less than 1 (i.e., $\theta_{it} < 1$) indicate lower-than-average risk. Equivalently, $\theta_{it} = 1$ implies that the risk in that spatial unit and time period is equal to the overall average risk at that time.
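The internal standardization described above can be illustrated with a minimal Python sketch. The function names and the three-county toy data are hypothetical and are not taken from the study; the sketch only shows how expected counts and crude (unsmoothed) observed-to-expected ratios would be computed for a single year.

```python
def expected_counts(deaths, population):
    """Internally standardized expected counts for one time period.

    deaths, population: lists with one entry per spatial unit.
    """
    total_deaths = sum(deaths)
    total_pop = sum(population)
    if total_pop <= 0:
        raise ValueError("total population must be positive")
    # Overall crude rate across all units in this time period.
    crude_rate = total_deaths / total_pop
    return [p * crude_rate for p in population]


def crude_relative_risk(deaths, expected):
    """Unsmoothed observed-to-expected ratio, one value per spatial unit."""
    return [y / e for y, e in zip(deaths, expected)]


# Hypothetical toy data: three counties in a single year.
deaths = [2, 10, 8]
population = [10_000, 40_000, 50_000]
expected = expected_counts(deaths, population)
ratios = crude_relative_risk(deaths, expected)
```

By construction the expected counts sum to the observed total, so the ratios are centered on the overall average risk for that year. The model-based relative risks reported in the paper additionally smooth these ratios through the spatial and temporal random effects.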

The relative risk is modeled on the log scale as:

$$\log(\theta_{it}) = \alpha + u_i + v_i + (\beta + \delta_i)\,t \tag{1}$$

where $\alpha$ denotes the intercept, $u_i$ and $v_i$ represent the structured and unstructured spatial random effects, $\beta$ is a global linear trend effect, and $\delta_i$ represents spatial deviations from the global temporal trend. Area-specific covariates can be incorporated by replacing the intercept parameter $\alpha$ in equation (1) with $\mathbf{z}_{it}^{\top}\boldsymbol{\gamma}$, where $\mathbf{z}_{it}$ denotes the covariates for spatial unit $i$ at time $t$ and $\boldsymbol{\gamma}$ represents the regression coefficients. The reported relative risks are obtained from the fitted spatio-temporal model and are adjusted for the included covariates as well as the spatial and temporal random effects.
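As a small illustration of the log-linear predictor, the Python sketch below evaluates the relative risk and the implied Poisson mean for one area and year. All parameter values are hypothetical, and coding the years as t = 1, ..., 4 is an assumption made only for this example.

```python
import math


def relative_risk(alpha, u_i, v_i, beta, delta_i, t):
    """Relative risk implied by a Bernardinelli-type linear predictor."""
    log_rr = alpha + u_i + v_i + (beta + delta_i) * t
    return math.exp(log_rr)


def poisson_mean(expected, rr):
    """Poisson mean: expected count times relative risk."""
    return expected * rr


# Hypothetical values for a single county in its second study year.
theta = relative_risk(alpha=-0.1, u_i=0.2, v_i=0.05, beta=0.02, delta_i=-0.01, t=2)
mu = poisson_mean(expected=4.0, rr=theta)
```

Because time enters linearly, the ratio of the relative risk in consecutive years for a given area is the exponential of the sum of the global trend and that area's deviation from it.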

Temporal specification

Across all spatial–temporal models, $t$ denotes the temporal index (year). Because the study period includes only four years (2018–2021), temporal patterns are inherently weakly identifiable, and inference regarding long-term trends is constrained. Accordingly, a parsimonious temporal specification was adopted in which time enters linearly through $(\beta + \delta_i)\,t$ under the Bernardinelli formulation. The coefficient $\beta$ captures the overall year-to-year effect, while $\delta_i$ allows for area-specific deviations from the global temporal trend. Weakly informative Gaussian priors were assigned to the global temporal coefficient $\beta$ to stabilize estimation.

Differences observed between county-level and cluster-level analyses reflect aggregation effects, as clustering increases case counts and reduces variability, making short-term differences more detectable. Temporal findings should therefore be interpreted cautiously.

Spatial correlation structures

Several spatial dependence structures are incorporated through alternative prior specifications for the structured spatial effect $u_i$ in equation (1). In this study, spatial dependence is modeled by using the Besag-York-Mollié (BYM), Besag-York-Mollié 2 (BYM2), Random Walk of Order 1 (RW1), and Random Walk of Order 2 (RW2) models.

The BYM model incorporates an Intrinsic Conditional Autoregressive (ICAR) model for spatial dependence and a random effect for non-spatial heterogeneity. The ICAR model assumes that $u_i$ follows a Conditional Autoregressive (CAR) model such that:

$$u_i \mid \mathbf{u}_{-i} \sim N\!\left(\bar{u}_{\mathcal{N}_i},\ \frac{\sigma_u^2}{m_i}\right) \tag{2}$$

where $\bar{u}_{\mathcal{N}_i} = m_i^{-1}\sum_{j \in \mathcal{N}_i} u_j$. In (2), $\mathcal{N}_i$ represents the set of neighbors of spatial unit $i$ and $m_i$ represents the number of neighbors of spatial unit $i$. The ICAR model smooths the spatial effect according to a neighborhood structure in which two areas are considered neighbors if they share a common boundary. The unstructured spatial random effects are assumed independent:

$$v_i \sim N(0, \sigma_v^2) \tag{3}$$

Under the BYM specification, the spatial–temporal linear predictor in equation (1) remains unchanged, with spatial dependence entering through the prior distributions of $u_i$ and $v_i$.
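To make the ICAR smoothing concrete, the following Python sketch builds the ICAR structure matrix (number of neighbors on the diagonal, minus one for each neighboring pair) and evaluates the conditional mean and variance of one area's structured effect given its neighbors. The four-area adjacency map, the effect values, and the variance are hypothetical, and the function names are illustrative rather than part of any library.

```python
def icar_structure_matrix(neighbors):
    """Structure matrix Q = D - W for an ICAR prior.

    neighbors: dict mapping each unit index 0..n-1 to a set of neighbor indices.
    """
    n = len(neighbors)
    q = [[0.0] * n for _ in range(n)]
    for i, nbrs in neighbors.items():
        q[i][i] = float(len(nbrs))  # diagonal: number of neighbors
        for j in nbrs:
            if i not in neighbors[j]:
                raise ValueError("adjacency must be symmetric")
            q[i][j] = -1.0  # off-diagonal: -1 for each neighboring pair
    return q


def icar_conditional(u, neighbors, i, sigma2):
    """Conditional mean and variance of u[i] given its neighbors."""
    nbrs = neighbors[i]
    mean = sum(u[j] for j in nbrs) / len(nbrs)
    variance = sigma2 / len(nbrs)
    return mean, variance


# Hypothetical map with four areas; area 1 borders areas 0, 2, and 3.
neighbors = {0: {1}, 1: {0, 2, 3}, 2: {1, 3}, 3: {1, 2}}
Q = icar_structure_matrix(neighbors)
mean_1, var_1 = icar_conditional([0.3, 0.0, -0.1, 0.4], neighbors, 1, sigma2=0.6)
```

Each row of the structure matrix sums to zero, which is why the ICAR prior is improper and is usually combined with a sum-to-zero constraint. Areas with more neighbors receive a smaller conditional variance and are therefore smoothed more strongly toward the local mean.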

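The BYM2 model is a reparametrized version of the BYM model [21], introducing the mixing parameter $\phi$ and the precision parameter $\tau$. The combined spatial effect is written as:

$$b_i = \frac{1}{\sqrt{\tau}}\left(\sqrt{1-\phi}\,v_i^{*} + \sqrt{\phi}\,u_i^{*}\right)$$

where $u_i^{*}$ is the scaled structured component, $v_i^{*}$ is the standardized unstructured component, and $b_i$ takes the place of $u_i + v_i$ in equation (1). This formulation improves identifiability and stabilizes the partitioning of spatial variance [22].

The key difference between the BYM and BYM2 models is in how they structure and scale the random effects $u_i$ and $v_i$. The BYM2 model provides a more consistent decomposition of spatial variance, which facilitates Markov Chain Monte Carlo (MCMC) convergence.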
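The role of the mixing and precision parameters in BYM2 can be sketched in a few lines of Python. The sketch assumes that the structured component has already been scaled and that the unstructured component is standardized, as in the usual BYM2 construction. The input values are hypothetical.

```python
import math


def bym2_effect(u_star, v_star, phi, tau):
    """Combine scaled structured and unstructured effects, BYM2-style.

    phi: mixing parameter in [0, 1] (share of variance that is spatially structured).
    tau: overall precision (> 0) of the combined effect.
    """
    if not 0.0 <= phi <= 1.0:
        raise ValueError("phi must lie in [0, 1]")
    if tau <= 0.0:
        raise ValueError("tau must be positive")
    scale = 1.0 / math.sqrt(tau)
    return [
        scale * (math.sqrt(1.0 - phi) * v + math.sqrt(phi) * u)
        for u, v in zip(u_star, v_star)
    ]


# Hypothetical scaled effects for two areas.
u_star = [1.0, -1.0]
v_star = [0.5, 0.2]
all_structured = bym2_effect(u_star, v_star, phi=1.0, tau=4.0)
all_unstructured = bym2_effect(u_star, v_star, phi=0.0, tau=4.0)
```

At a mixing value of 1 the combined effect is purely spatially structured, and at 0 it is purely unstructured noise. Intermediate values split a single overall variance between the two, which is what makes the two components easier to separate than in the original BYM form.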

The RW1 prior assumes:

$$u_i - u_{i-1} \sim N(0, \sigma_u^2) \tag{4}$$

Here, $i-1$ refers to the preceding county in the defined geographic ordering. This specification captures smooth large-scale spatial gradients rather than adjacency-based local dependence.

The Random Walk of Order 2 prior penalizes second-order differences,

$$u_i - 2u_{i-1} + u_{i-2} \sim N(0, \sigma_u^2) \tag{5}$$

encouraging a smoother quadratic spatial trend across the ordered counties.
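The difference penalties behind the two random-walk priors can be illustrated with a short Python sketch. It computes the sum of squared first-order or second-order differences of an ordered sequence, which is the quantity (up to the precision) that the corresponding Gaussian prior penalizes. The example sequences are hypothetical.

```python
def difference_penalty(values, order):
    """Sum of squared order-th differences of an ordered sequence."""
    diffs = list(values)
    for _ in range(order):
        # Replace the sequence by its successive differences.
        diffs = [b - a for a, b in zip(diffs, diffs[1:])]
    return sum(d * d for d in diffs)


flat = [1.0, 1.0, 1.0, 1.0]
linear = [0.0, 1.0, 2.0, 3.0]
curved = [0.0, 1.0, 4.0, 9.0]

rw1_flat = difference_penalty(flat, order=1)
rw1_linear = difference_penalty(linear, order=1)
rw2_linear = difference_penalty(linear, order=2)
rw2_curved = difference_penalty(curved, order=2)
```

A first-order penalty is zero only for a constant sequence, whereas a second-order penalty is also zero for any linear trend and only grows once the sequence bends. This is the sense in which the second-order prior yields smoother fitted effects across the ordered units.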

The BYM and BYM2 frameworks decompose spatial variation into structured dependence and unstructured random heterogeneity, with BYM2 employing a reparameterization that improves identifiability and stabilizes variance partitioning across spatial components. The RW1 and RW2 specifications provide alternative spatial smoothing structures by penalizing first- and second-order differences (equations 4 and 5), respectively, across ordered spatial units. Within the INLA framework, weakly informative priors were specified for hyperparameters to ensure model stability while minimizing undue prior influence on posterior estimates. These assumptions guide how each model captures spatial dependence and uncertainty, providing a clear rationale for comparing model specifications.

The spatial-temporal framework described in equation (1) can be adjusted to incorporate different count distributions such as Zero-Inflated Poisson, Negative Binomial, Zero-Inflated Negative Binomial, and Generalized Poisson.

In the following section, we will discuss the integrated nested Laplace approximation method for Bayesian estimation of the Bayesian spatial-temporal hierarchical models discussed in this section.

Integrated Nested Laplace Approximation (INLA)

The Integrated Nested Laplace Approximation (INLA) is an approximate Bayesian inference method that provides a computationally efficient alternative to Markov Chain Monte Carlo (MCMC) for latent Gaussian models [23]. INLA is specifically designed for models that can be expressed as latent Gaussian Markov random fields (GMRFs), a class that includes spatial, temporal, and spatial–temporal hierarchical models. Latent Gaussian models amenable to INLA can be expressed through three hierarchical layers: the likelihood, the latent Gaussian field, and the hyperparameters. The general formulation is:

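$$\mathbf{y} \mid \mathbf{x}, \boldsymbol{\psi} \sim \prod_{k} \pi(y_k \mid x_k, \boldsymbol{\psi}) \tag{6}$$

$$\mathbf{x} \mid \boldsymbol{\psi} \sim N\!\left(\mathbf{0},\ \mathbf{Q}(\boldsymbol{\psi})^{-1}\right) \tag{7}$$

$$\boldsymbol{\psi} \sim \pi(\boldsymbol{\psi}) \tag{8}$$

where $\mathbf{y}$ denotes the observed data, $\mathbf{x}$ denotes the latent Gaussian field, and $\boldsymbol{\psi}$ denotes the vector of hyperparameters. The matrix $\mathbf{Q}(\boldsymbol{\psi})$ is the precision (inverse covariance) matrix governing the Gaussian prior of the latent field.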

In the context of the Bernardinelli spatial–temporal model described in equations (3)–(5), the latent Gaussian field $\mathbf{x}$ corresponds to the full linear predictor for all spatial–temporal units. Specifically,

$$\eta_{it} = \log(\theta_{it}) = \alpha + u_i + v_i + (\beta + \delta_i)\,t$$

so that the spatial random effects ($u_i$, $v_i$), temporal components, and space–time interaction terms form elements of the latent field. The hyperparameters $\boldsymbol{\psi}$ include all precision parameters governing the spatial ICAR effects, unstructured spatial effects, temporal random walk components, and any mixing parameters such as those in the BYM2 specification. Thus, the Bernardinelli model in equations (1)–(3) is a specific instance of the general latent Gaussian model class estimated using INLA.

Although spatial dependence is incorporated in the model, it enters through the prior distribution of the latent Gaussian field (equation 7) rather than through the likelihood. Conditional on the latent field $\mathbf{x}$ and hyperparameters $\boldsymbol{\psi}$, the observations are assumed independent. Therefore, the likelihood retains the product form given in equation 6, and the spatial dependence is encoded in the precision matrix $\mathbf{Q}(\boldsymbol{\psi})$.

The relative risk $\theta_{it}$ is a deterministic transformation of the linear predictor and defined as:

$$\theta_{it} = \exp(\eta_{it})$$

and therefore does not receive an independent prior. Its posterior distribution is induced by the posterior distribution of the latent field and hyperparameters.

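The joint posterior distribution of the latent Gaussian model is denoted as:

$$\pi(\mathbf{x}, \boldsymbol{\psi} \mid \mathbf{y}) \propto \pi(\boldsymbol{\psi})\,\pi(\mathbf{x} \mid \boldsymbol{\psi}) \prod_{k} \pi(y_k \mid x_k, \boldsymbol{\psi})$$

Here, $\pi(\mathbf{x} \mid \boldsymbol{\psi})$ corresponds directly to the Gaussian prior specified in equation (7), and $\pi(\boldsymbol{\psi})$ is defined in equation (8).

Inference proceeds by integrating over the latent field and hyperparameters to obtain marginal posterior distributions. In particular, hyperparameter marginals are obtained by integrating out $\mathbf{x}$,

$$\pi(\boldsymbol{\psi} \mid \mathbf{y}) = \int \pi(\mathbf{x}, \boldsymbol{\psi} \mid \mathbf{y})\, d\mathbf{x}$$

while marginal posterior distributions for components of the latent field are obtained by integrating over the hyperparameters,

$$\pi(x_k \mid \mathbf{y}) = \int \pi(x_k \mid \boldsymbol{\psi}, \mathbf{y})\,\pi(\boldsymbol{\psi} \mid \mathbf{y})\, d\boldsymbol{\psi}$$

These integrals are approximated numerically using nested Laplace approximations within the INLA framework [23,24]. Because the dimension of the latent field can be very large while the number of hyperparameters is relatively small (typically 2–5, and not exceeding 20), this structured approximation enables efficient computation of marginal posterior distributions. The INLA algorithm is implemented using the R-INLA package [23].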

Hotspot definition

In this study, hotspot areas were identified using exceedance probabilities, defined as the posterior probability that the area-specific relative risk (RR) exceeds a prespecified threshold $c$, i.e., $P(\theta_{it} > c \mid \mathbf{y})$. Exceedance probability mapping is commonly used in Bayesian disease mapping to highlight areas with unusually elevated risk while accounting for the posterior uncertainty [25].

For the primary hotspot definition, this study used $c = 1.5$, a moderate elevation above the baseline risk, and an area was classified as a hotspot if its exceedance probability $P(\theta_{it} > 1.5 \mid \mathbf{y})$ was above a prespecified probability cutoff. A relative-risk threshold of 1.5 has been used in prior exceedance mapping to represent a practically meaningful elevation in risk beyond background variability [26].

To evaluate robustness to threshold choice, a sensitivity analysis was conducted by repeating hotspot identification under alternative criteria, with probability >0.80 (more stringent) and with probability >0.70 (less stringent). These alternatives were selected to examine whether hotspot locations remain consistent when requiring stronger risk elevation and/or greater posterior certainty.
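The exceedance-probability rule can be sketched in Python under the assumption that posterior draws of an area's relative risk are available. In R-INLA the same quantity is typically obtained from the posterior marginal rather than from draws, for example with `inla.pmarginal`. The draws, the function names, and the probability cutoffs used below are hypothetical and are not the study's values.

```python
def exceedance_probability(samples, threshold):
    """Share of posterior draws of the relative risk that exceed the threshold."""
    if not samples:
        raise ValueError("need at least one posterior draw")
    return sum(1 for s in samples if s > threshold) / len(samples)


def is_hotspot(samples, rr_threshold, prob_cutoff):
    """Flag an area when its exceedance probability is above the cutoff."""
    return exceedance_probability(samples, rr_threshold) > prob_cutoff


# Hypothetical posterior draws of relative risk for one area.
draws = [1.2, 1.6, 1.7, 1.9, 2.1, 1.4, 1.8, 1.55, 1.65, 2.3]
prob = exceedance_probability(draws, threshold=1.5)
flag_lenient = is_hotspot(draws, rr_threshold=1.5, prob_cutoff=0.70)
flag_strict = is_hotspot(draws, rr_threshold=1.5, prob_cutoff=0.80)
```

In this toy case eight of ten draws exceed 1.5, so the area is flagged under the more lenient cutoff but not under the stricter one. A sensitivity analysis of the kind described above checks how often such borderline flags change across threshold choices.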

In the following section, we will discuss the Spatial Kluster Analysis by Tree Edge Removal (SKATER) algorithm that will be implemented to generate spatial clusters of counties within the state of Kansas. This algorithm is used to generate new spatial units that can be used in small area estimation analysis to ensure patient privacy and confidentiality.

Spatial Kluster analysis by Tree Edge Removal (SKATER)

Spatial clusters were generated using the SKATER algorithm [27], an established regionalization method for partitioning spatial units into contiguous and internally homogeneous clusters. The implementation in this study does not introduce methodological novelty; rather, SKATER was applied to address instability arising from small-area counts. The algorithm operates on a spatial connectivity graph (Fig 1), where counties are represented as vertices and edges represent spatial adjacency. Edges were defined using first-order queen contiguity, meaning that counties sharing either a common boundary or a common vertex were considered neighbors.

Fig 1. Connectivity graph used in the SKATER algorithm.

https://doi.org/10.1371/journal.pone.0347607.g001

While adjacency defines the neighborhood structure, clustering requires distinguishing between more and less similar neighboring counties. Therefore, edges were assigned weights based on multivariate dissimilarity between standardized county-level covariates using Euclidean distance [24]. The existence of an edge alone indicates spatial contiguity but does not capture similarity in demographic, behavioral, or healthcare characteristics; thus, adjacency by itself is insufficient for identifying meaningful regional partitions.

SKATER constructs a minimum spanning tree (MST) from the weighted connectivity graph and iteratively removes edges with high dissimilarity to partition the graph into spatially contiguous clusters that minimize within-cluster variance [27–29]. The MST ensures that all counties remain connected prior to partitioning and provides a systematic framework for identifying spatially contiguous and internally homogeneous clusters. During the clustering procedure, a constraint can be imposed on a specific attribute, such as the population within a spatial cluster or the number of disease cases within a spatial cluster. The SKATER algorithm was implemented as described above in the spatial cluster analysis.
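The tree-based idea can be illustrated with a simplified Python sketch. It builds an MST over adjacent units weighted by Euclidean distance between covariate vectors, then cuts the heaviest tree edges to form contiguous groups. This heaviest-edge rule is a stand-in chosen for brevity: the actual SKATER procedure selects the cut that most reduces within-cluster variance and can enforce minimum-size constraints, neither of which is implemented here. The data and function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two covariate vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def minimum_spanning_tree(n, edges):
    """Kruskal's algorithm. edges: list of (weight, i, j). Returns tree edges."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for w, i, j in sorted(edges):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((w, i, j))
    return tree


def cut_tree(n, tree, n_clusters):
    """Drop the (n_clusters - 1) heaviest tree edges and label the components."""
    kept = sorted(tree)[: len(tree) - (n_clusters - 1)]
    adj = {i: [] for i in range(n)}
    for _, i, j in kept:
        adj[i].append(j)
        adj[j].append(i)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first walk over one connected component
            node = stack.pop()
            for nb in adj[node]:
                if labels[nb] == -1:
                    labels[nb] = current
                    stack.append(nb)
        current += 1
    return labels


# Hypothetical example: five units in a row, each with two standardized covariates.
covariates = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.1), (2.0, 2.1), (2.1, 2.0)]
adjacent_pairs = [(0, 1), (1, 2), (2, 3), (3, 4)]
edges = [(euclidean(covariates[i], covariates[j]), i, j) for i, j in adjacent_pairs]
tree = minimum_spanning_tree(len(covariates), edges)
labels = cut_tree(len(covariates), tree, n_clusters=2)
```

Because cuts are made only on tree edges between adjacent units, every resulting group is spatially contiguous. In this toy case the cut falls between the third and fourth units, where the covariate profiles differ most.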

Ethics approval

This study was reviewed and approved by the University of Kansas Medical Center Institutional Review Board and was determined to be non-human subjects research.

Results

Descriptive statistics

Table 1 shows the range of values for breast cancer mortality in Kansas from 2018–2021, along with socio-demographic and behavioral characteristics that are associated with breast cancer mortality. Noticeable between-county variation in breast cancer mortality was observed each year, with counts ranging from 0–88 in 2018, 0–67 in 2019, 0–68 in 2020, and 0–67 in 2021. The average county-level count was roughly constant across years: 3.55 in 2018, 3.90 in 2019, 3.28 in 2020, and 3.59 in 2021. Because a few counties had extreme values, many counties had counts below the mean. Among the socio-demographic and behavioral characteristics that are known to increase the risk of cancer, female smoking and diabetes prevalence showed slight decreasing patterns from 2018–2021, whereas binge drinking and obesity demonstrated modest increases over time. The Bayesian-estimated annual changes were small (95% CIs: −1.31 to 0.61; −0.09 to 0.96; −0.56 to 1.38; and −0.24 to 0.08 percentage points per year), and all 95% credible intervals included zero, indicating no statistically significant temporal trends during the study period. Female (%) represents the proportion of the county population that is female based on census data.

Table 1. County-Level Summary Statistics for Breast Cancer Mortality Cases and Their Associated Variables in Kansas between 2018-2021.

https://doi.org/10.1371/journal.pone.0347607.t001

Fig 2 presents descriptive histograms of county-level breast cancer mortality counts from 2018 to 2021, illustrating temporal variation in disease burden. Most counties report fewer than 10 breast cancer deaths each year, with most counties reporting zero. Few counties report more than 20 breast cancer deaths in a year; the largest count was 88, in 2018.

Fig 2. Variation in Breast Cancer Mortality Across Kansas Counties (2018-2021).

https://doi.org/10.1371/journal.pone.0347607.g002

Kansas County model comparisons

Table 2 summarizes Bayesian model comparison metrics used to evaluate competing specifications, including the Deviance Information Criterion (DIC), the Widely Applicable Information Criterion (WAIC), and the marginal log-likelihood. Lower DIC and WAIC values indicate improved expected out-of-sample fit after accounting for model complexity. Several candidate models demonstrated comparable fit, with relatively small differences in DIC and WAIC across the leading specifications. The Poisson BYM2 model was used as the primary model for inference because it yielded the most favorable combination of fit and parsimony among the leading specifications (DIC = 1305.02; WAIC = 1308.40), while acknowledging that several alternative specifications provided comparable fit.

Table 2. Model Fit Statistics for Spatial Temporal Models using R-INLA.

https://doi.org/10.1371/journal.pone.0347607.t002

It can be seen that the DIC and WAIC values for Poisson, Generalized Poisson, and Negative Binomial distributions with the BYM and BYM2 spatial correlation structures are relatively close, suggesting that there is no clear, overwhelming superior model in terms of fit and complexity balance. In Bayesian model comparison, small differences in DIC are generally considered negligible and may indicate that the models perform similarly with respect to the observed data. Taking this into account, the final model was a spatiotemporal Poisson model with a BYM2 spatial structure—not only due to its lowest DIC and WAIC and highest marginal log-likelihood, but also for its parsimony, interpretability, and computational efficiency relative to more complex alternatives. Given the large number of zero counts, we also evaluated zero-inflated Poisson and zero-inflated negative binomial alternatives; however, these models did not show a clear improvement in fit compared with the leading non–zero-inflated specifications (Table 2), suggesting that spatial smoothing and alternative dispersion structures captured most of the observed variability.
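For readers unfamiliar with how WAIC trades fit against complexity, the Python sketch below computes it from a matrix of pointwise log-likelihood values, one row per posterior draw and one column per observation. R-INLA reports DIC and WAIC internally when requested through `control.compute`, so this sketch is only a conceptual illustration. The input values and the function name are hypothetical.

```python
import math
from statistics import variance


def waic(log_lik):
    """WAIC from pointwise log-likelihoods.

    log_lik[s][k]: log-likelihood of observation k under posterior draw s.
    Returns (waic_value, effective_number_of_parameters).
    """
    n_draws = len(log_lik)
    n_obs = len(log_lik[0])
    lppd = 0.0    # log pointwise predictive density
    p_waic = 0.0  # complexity penalty
    for k in range(n_obs):
        column = [log_lik[s][k] for s in range(n_draws)]
        shift = max(column)  # log-sum-exp shift for numerical stability
        lppd += shift + math.log(sum(math.exp(v - shift) for v in column) / n_draws)
        p_waic += variance(column)  # sample variance across draws
    return -2.0 * (lppd - p_waic), p_waic


# Hypothetical log-likelihoods: two draws, two observations, no posterior spread.
stable = [[-1.0, -2.0], [-1.0, -2.0]]
waic_stable, p_stable = waic(stable)

# Same average fit on the log scale, but with spread across draws.
spread = [[-0.5, -2.5], [-1.5, -1.5]]
waic_spread, p_spread = waic(spread)
```

When the log-likelihood does not vary across posterior draws, the penalty is zero and WAIC reduces to minus twice the log predictive density. Posterior spread raises the penalty, which is how more flexible models are discouraged unless they improve fit.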

Kansas County final model

Table 3 presents the posterior estimates of fixed effects under the BYM2 spatial correlation structure assuming a Poisson distribution. Percentage female binge drinking was not associated with increased risk in this population (95% CI: −0.146, −0.006). Although the estimated coefficient is counterintuitively negative, its magnitude is small and the upper bound of the credible interval is very close to zero. The direction of this association contrasts with established individual-level evidence linking alcohol consumption to increased breast cancer risk. The observed minimal negative association at the ecological level likely reflects aggregation effects, collinearity with rurality and related behavioral covariates, or residual spatial confounding rather than a protective effect. Pairwise correlations among covariates are presented in Supplementary S4 Fig, where moderate correlations are observed among several behavioral and rurality-related predictors, supporting the possibility of multicollinearity within the multivariable model. All covariates were retained in the model because they represent key factors related to breast cancer risk, behavioral characteristics, and healthcare access examined in this study. The other covariates (rurality index, percent female, percent female with diabetes, percent female with obesity, primary care physicians (PCP), and percent female who smoke) had 95% credible intervals that included zero but were retained because of their established relevance in breast cancer epidemiology. The absence of statistical significance for several established covariates does not imply model inadequacy; it may reflect spatial smoothing, aggregation effects inherent to county-level data, and shared variance among correlated predictors.
The estimated global time effect (trend) in the model was small (95% CI: −0.041, 0.080) and its 95% credible interval included zero, indicating no statistically significant linear year-to-year trend in breast cancer mortality risk at the county level.
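The interval bounds above are on the log scale of the Poisson model. A minimal sketch of converting them to the rate-ratio scale follows; it assumes a log link and that each coefficient refers to a one-unit change in the covariate as it was scaled in the model, and the function name is hypothetical.

```python
import math


def log_interval_to_rate_ratio(lower, upper):
    """Exponentiate a log-scale credible interval to the rate-ratio scale."""
    return math.exp(lower), math.exp(upper)


# Interval bounds as reported in the text (log scale).
binge_rr = log_interval_to_rate_ratio(-0.146, -0.006)
trend_rr = log_interval_to_rate_ratio(-0.041, 0.080)

print("binge drinking rate-ratio interval: %.3f to %.3f" % binge_rr)
print("time trend rate-ratio interval:     %.3f to %.3f" % trend_rr)
```

On this scale the binge drinking interval runs from roughly 0.86 to 0.99 and the trend interval from roughly 0.96 to 1.08, which makes the small size of both effects easier to see.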

Table 3. Posterior Estimates and 95% Credible Intervals of Fixed Effects using the BYM2 correlation structure with Poisson Distribution.

https://doi.org/10.1371/journal.pone.0347607.t003

Fig 3 presents the estimated relative risk of breast cancer mortality for all Kansas counties between 2018 and 2021 (Fig 3a), as well as the 95% credible intervals (Fig 3b, 3c). In Fig 3a, each panel represents the relative risk estimates for breast cancer mortality, quantifying whether the breast cancer mortality risk in a county is higher (RR > 1) or lower (RR < 1) than the average risk in Kansas during that year. Relative risk values that are greater than 1 are shown in red and relative risk values that are less than 1 are shown in blue. The relative risk patterns appear broadly similar across years, with no marked visual shifts in the spatial distribution of elevated or reduced risk. Fig 3b shows the lower bound of the 95% credible interval for the relative risk of breast cancer mortality and Fig 3c shows the upper bound. Table 4 presents the five counties with the strongest evidence that the relative risk differs from 1 (null value RR = 1), defined as counties whose posterior 95% credible interval for the RR excluded 1. In 2018, 11 counties had 95% credible intervals that were entirely above one, and two counties had 95% credible intervals that were entirely below one. Table 4 shows that, in 2018, Chautauqua, Cowley, Greenwood, Woodson, and Kingman counties had a higher relative risk of breast cancer mortality than the Kansas average, and Wyandotte and Douglas counties had a lower relative risk than the Kansas average. In 2019, Chautauqua, Woodson, Greenwood, Ottawa, and Labette counties had a higher relative risk of breast cancer mortality, and Wyandotte, Douglas, and Johnson counties had a lower relative risk of breast cancer mortality than the Kansas average.
In 2020, Chautauqua, Greenwood, Cowley, Woodson, and Kingman counties had a higher relative risk of breast cancer mortality, and Wyandotte, Douglas, Riley, and Sedgwick counties had a lower relative risk of breast cancer mortality than the Kansas average. In 2021, Chautauqua, Greenwood, Woodson, Marion, and Labette counties had a higher relative risk of breast cancer mortality, and Wyandotte, Riley, and Douglas counties had a lower relative risk of breast cancer mortality than the Kansas average. The counties that had the highest relative risk of breast cancer mortality are counties with a more rural demographic, while the counties that had the lowest relative risk of breast cancer mortality are counties with a more urban demographic. The rurality index (CARR) values for the counties with high relative risk of breast cancer mortality are between 0.32 and 0.39 and the rurality index (CARR) values for the counties with low relative risk of breast cancer mortality are between 0.17 and 0.26.

Table 4. Top 5 Counties with high or low relative risk compared to the average relative risk in Kansas.

https://doi.org/10.1371/journal.pone.0347607.t004

Fig 3. Relative Risk and 95% Credible Interval of Breast Cancer Mortality for Kansas Counties between 2018 and 2021.

https://doi.org/10.1371/journal.pone.0347607.g003

To determine hotspots, or counties that have an unusual elevation in relative risk of breast cancer mortality, we calculated for each county the probability that the relative risk estimate is greater than a given threshold, called the exceedance probability. These probabilities are useful in assessing unusual elevation in breast cancer mortality risk. In this study, RR > 1.5 was used to represent a moderate elevation in risk (at least 50% higher than the statewide average for that year). Counties were classified as hotspots when the exceedance probability exceeded 0.75, indicating high posterior support for elevated risk while avoiding labeling areas with substantial posterior uncertainty as hotspots. Fig 4 shows the exceedance probabilities within Kansas counties from 2018 to 2021. Sensitivity analyses were conducted using alternative hotspot definitions (RR > 2.0 with probability > 0.80, and RR > 1.5 with probability > 0.70); results are reported in Supplementary Material S5 File.
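The hotspot rule (exceedance probability for RR > 1.5 greater than 0.75) can be sketched as follows. This is an illustrative Monte Carlo approximation that assumes posterior draws of the relative risk are available for each county; the function names and the sample values are hypothetical and are not taken from the study, which may have computed these probabilities directly from the posterior marginals.

```python
def exceedance_probability(rr_samples, threshold=1.5):
    """Share of posterior draws whose relative risk exceeds the threshold."""
    if not rr_samples:
        raise ValueError("rr_samples must not be empty")
    return sum(1 for rr in rr_samples if rr > threshold) / len(rr_samples)


def is_hotspot(rr_samples, threshold=1.5, min_probability=0.75):
    """Apply the rule P(RR > threshold) > min_probability."""
    return exceedance_probability(rr_samples, threshold) > min_probability


# Hypothetical posterior draws for two counties (illustrative values only).
county_a = [1.2, 1.6, 1.7, 1.8, 2.0, 1.4, 1.9, 2.1]
county_b = [1.6, 1.7, 1.8, 2.0, 2.2, 1.9, 2.1, 1.4]

for name, draws in (("A", county_a), ("B", county_b)):
    print(name, exceedance_probability(draws), is_hotspot(draws))

# Alternative definitions used in the sensitivity analysis.
strict = is_hotspot(county_b, threshold=2.0, min_probability=0.80)
relaxed = is_hotspot(county_a, threshold=1.5, min_probability=0.70)
```

Because the rule uses a strict inequality, a county whose exceedance probability equals the cutoff exactly is not flagged, so the hypothetical county A is a hotspot only under the relaxed 0.70 cutoff.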

Fig 4. Map of Exceedance Probabilities in Kansas Counties (>1.5).

https://doi.org/10.1371/journal.pone.0347607.g004

This map provides evidence of excess risk of breast cancer mortality within individual counties from 2018 to 2021. Counties with probabilities close to 1 are very likely to have relative risk that exceeds 1.5, counties with probabilities close to 0 are very unlikely to have relative risk that exceeds 1.5, and counties with probabilities around 0.50 have the highest uncertainty and correspond to having relative risk below or above 1.5 with equal probability. Fig 4 can be used to determine hotspot areas by identifying counties that have large exceedance probabilities. The larger the exceedance probability the darker red the county and the smaller the exceedance probability the more yellow the county. A county would be considered a hotspot for breast cancer mortality if the exceedance probability is greater than 0.75. Table 5 gives the counties that are classified as hotspots. In 2018 and 2019, there were 7 counties identified as hotspots for elevated relative risk of breast cancer mortality. The number of hotspots decreased to one county identified as a hotspot in 2020, and then the number of hotspots increased to 3 in 2021. The hotspot locations did not remain constant over time, with some counties exhibiting elevated exceedance probabilities in certain years and attenuating in others. This temporal variability suggests that geographic risk is dynamic and may reflect shifts in demographic composition, healthcare access, or behavioral risk factors such as smoking and alcohol use. These findings highlight the importance of continued spatial surveillance rather than reliance on static geographic risk classifications. The counties identified as hotspots are known to have a more rural demographic and population.

Table 5. Kansas Hotspot Counties Identified by Marginal Exceedance Probability.

https://doi.org/10.1371/journal.pone.0347607.t005

Sensitivity analyses showed that the spatial pattern of hotspot identification was generally consistent under alternative exceedance thresholds, with fewer hotspots detected under the more stringent criterion (RR > 2.0; probability >0.80) and slightly more hotspots under the less stringent criterion (RR > 1.5, probability >0.70). Detailed county-level sensitivity hotspot listings and maps are provided in the Supplementary Materials S5 File.

A major challenge in this analysis was that many counties reported small counts of breast cancer mortality. Small-area counts can significantly affect spatial modeling by introducing high variability and instability in estimates. For example, many counties reported zero breast cancer mortality cases in a given year, while others reported substantially higher counts (e.g., up to 88 cases in 2018), resulting in highly skewed distributions and variance exceeding the mean. To address this instability and reduce the influence of extreme small-area fluctuations, a spatial cluster analysis was implemented.

Cluster model comparisons

To address the small breast cancer mortality counts reported by many counties in the county-level analysis, we implemented a spatial clustering approach to generate larger contiguous spatial units that would eliminate the small-count issue. We used a constrained spatial regionalization algorithm known as SKATER to group counties into new contiguous spatial clusters. Spatial contiguity was defined using first-order queen adjacency, whereby counties sharing either a common boundary or vertex were considered neighbors. Clustering was implemented in R using the skater function within the spdep package. We generated spatial clusters using only the 2018 county data and assumed these spatial clusters would remain constant across all time points to preserve spatial coherence and avoid introducing additional temporal clustering variability. Temporally aggregating data across multiple years prior to clustering could mask small-area heterogeneity and dilute the small-count structure that motivated the clustering approach.

The clustering variables were the percent of women who are obese, the percent of women who have diabetes, the percent of females, the percent of females who smoke, the percent of females who binge drink alcohol, the number of primary care physicians, and the rurality index (CARR). These variables were used to determine the cost associated with each edge in the spatial neighbor list. All covariates were standardized into z-scores, with a mean of 0 and a standard deviation of 1. Dissimilarity between each county and its neighboring counties was computed from these standardized variables to generate contiguous spatial clusters. Euclidean distance was used to measure dissimilarity within the SKATER algorithm. Each spatial cluster was required to include at least 10 breast cancer mortality cases to further reduce disclosure risk and ensure confidentiality in reporting. To ensure that all spatial clusters had at least 10 breast cancer mortality cases in all years, the 2018 clusters were constrained to include more than 20 cases.
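The standardization and edge-cost step described above can be sketched as follows. This is a simplified illustration, not the spdep implementation used in the study: the county values, covariate names, and adjacency pairs are hypothetical, and the use of the sample standard deviation for the z-scores is an assumption.

```python
import math
import statistics


def standardize(columns):
    """Convert each covariate column to z-scores (mean 0, sample SD 1)."""
    z = {}
    for name, values in columns.items():
        mean = statistics.mean(values)
        sd = statistics.stdev(values)  # assumption: sample SD (n - 1)
        z[name] = [(v - mean) / sd for v in values]
    return z


def edge_costs(z, neighbours):
    """Euclidean distance between the z-score vectors of each neighbour pair."""
    costs = {}
    for i, j in neighbours:
        costs[(i, j)] = math.sqrt(sum((col[i] - col[j]) ** 2 for col in z.values()))
    return costs


# Hypothetical values for four counties and two of the covariates.
covariates = {
    "pct_obese": [30.0, 32.0, 38.0, 40.0],
    "rurality": [0.20, 0.22, 0.35, 0.37],
}
# Hypothetical queen-adjacency pairs, given as county indices.
neighbours = [(0, 1), (1, 2), (2, 3)]

z_scores = standardize(covariates)
costs = edge_costs(z_scores, neighbours)
for edge, cost in costs.items():
    print(edge, round(cost, 3))
```

In this toy example the edge between counties 1 and 2 is the most costly, so a tree-pruning regionalization such as SKATER would tend to cut there first, separating the two similar pairs into different contiguous clusters.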

Fig 5 shows an elbow curve of the within-cluster sums of squares for different cluster values. When deciding the optimal number of clusters, the elbow, or the point at which the within-cluster sums of squares begin to flatten, identifies the point at which additional clusters yield diminishing reductions in within-cluster variability. In this case, the rate of decrease in within-cluster sums of squares becomes progressively smaller after six clusters, indicating diminishing returns in additional cluster partitioning. Therefore, six was selected as the optimal number of spatial clusters, balancing within-cluster homogeneity and between-cluster separation while satisfying the minimum case constraint. Fig 6 shows the new spatial clusters generated from 105 counties in Kansas. A complete listing of county-to-cluster assignments is provided in Supplementary S3 Table to ensure reproducibility. The six spatial clusters were named based on their location within the state. The six spatial clusters are the Northwest, Southwest, Central, Northeast, Southeast, and Metro. These new spatial clusters will be used within the spatial-temporal modeling discussed in the county analysis.
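The quantity plotted in an elbow curve, the within-cluster sum of squares, can be computed as in the sketch below. The points and cluster labels are hypothetical standardized covariate vectors chosen for illustration, and the function name is not from the study's code.

```python
def within_cluster_ss(points, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        dims = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dims))
    return total


# Hypothetical standardized covariate vectors for four areas.
points = [(-1.0, -1.0), (-0.8, -1.2), (1.0, 0.9), (0.8, 1.3)]

wcss_one = within_cluster_ss(points, [0, 0, 0, 0])  # everything in one cluster
wcss_two = within_cluster_ss(points, [0, 0, 1, 1])  # two similar pairs

print(round(wcss_one, 2), round(wcss_two, 2))
```

Evaluating this quantity for partitions with an increasing number of clusters and plotting the results gives the curve whose flattening point is read as the elbow.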

Fig 6. New Spatial Units Identified Through Cluster Analysis using SKATER.

https://doi.org/10.1371/journal.pone.0347607.g006

Table 6 presents a summary of key Bayesian fit indices used to evaluate and compare the performance of the estimated models for the cluster analysis. This table includes the Deviance Information Criterion (DIC), the Widely Applicable Information Criterion (WAIC), and the Marginal Log Likelihood. These criteria help assess model quality by balancing model fit and complexity; lower values of DIC and WAIC indicate better-fitting models. Based on the model fit indices in Table 6, several candidate specifications provided comparable fit, with only modest differences in DIC and WAIC across the leading models. The Poisson BYM2 specification achieved the lowest DIC and WAIC among the Poisson-based cluster models (DIC = 2435.90; WAIC = 2420.70). We therefore selected this model as the primary cluster-level specification for inference due to its favorable fit, parsimony, and interpretability, while acknowledging that competing models demonstrated similar performance.

Table 6. Model Fit Statistics for Spatial Temporal Cluster Models using R-INLA.

https://doi.org/10.1371/journal.pone.0347607.t006

It can be seen that the DIC and WAIC values for all probability distributions and spatial correlation structures are relatively close, suggesting that there is no clear, superior model in terms of fit and complexity balance. In Bayesian model comparison, slight differences in DIC are considered negligible and may indicate that the models perform similarly with respect to the observed data. Taking this into account, the final model was chosen to be the cluster spatial-temporal model, assuming a Poisson distribution with the BYM2 spatial correlation structure due to its simplicity.

Cluster final model analysis

Table 7 presents the posterior estimates of fixed effects under the BYM2 spatial correlation structure assuming a Poisson distribution. In Table 7, average percent of females who binge drink alcohol (95% CI: −0.91, −0.28), average percent of females who smoke tobacco (95% CI: −0.28, −0.05), average percentage of females with diabetes (95% CI: −0.58, −0.02), and average percent female (95% CI: 0.93, 1.57) were found to be statistically significant. Although the remaining covariates, average rurality index and average percentage of females who are obese, were not statistically significant, they were kept in the model because they are known risk factors for breast cancer mortality. The cluster-level model suggested short-term year-to-year differences in relative risk over the study period.

Table 7. Posterior Estimates and 95% Credible Intervals of Fixed Effects using the BYM2 correlation structure with Poisson Distribution.

https://doi.org/10.1371/journal.pone.0347607.t007

Differences in statistical significance between the county-level and cluster-level analyses likely reflect aggregation effects and variance reduction. By combining counties into larger spatial units, total case counts increase and variability decreases, improving statistical stability and power to detect associations. However, aggregation may also introduce ecological bias; therefore, cluster-level findings should be interpreted within the ecological framework of the study.
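The variance-reduction argument can be illustrated with a simple worked relation. It assumes a plain Poisson count $Y$ with mean $\mu$ and ignores the spatial and temporal random effects in the fitted models; the example values of $\mu$ are illustrative only.

```latex
\begin{align}
\operatorname{Var}(Y) = \mu
\quad\Longrightarrow\quad
\mathrm{CV}(Y) = \frac{\sqrt{\operatorname{Var}(Y)}}{\operatorname{E}(Y)} = \frac{1}{\sqrt{\mu}},
\qquad
\mathrm{CV}\big|_{\mu = 4} = 0.5,
\quad
\mathrm{CV}\big|_{\mu = 100} = 0.1 .
\end{align}
```

Under this assumption, a unit expecting 4 deaths has a relative variability of 50%, whereas an aggregated unit expecting 100 deaths has a relative variability of 10%, which is why pooled clusters yield more stable estimates.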

Fig 7 presents the estimated relative risk of breast cancer mortality for all six spatial clusters between 2018 and 2021 (Fig 7a), as well as the 95% credible intervals (Fig 7b, 7c). In Fig 7a, each panel represents the relative risk estimates for breast cancer mortality, quantifying whether the breast cancer mortality risk in a cluster is higher (RR > 1) or lower (RR < 1) than the average risk in Kansas during that year. Relative risk values that are greater than 1 are shown in red and relative risk values that are less than 1 are shown in blue. The relative risk of breast cancer mortality in the Northwest cluster stays constant from 2018 to 2019 and then declines, while the relative risk in the Southwest cluster increases from 2018 to 2021. The relative risk in the Central and Southeast clusters fluctuates from 2018 to 2021, while the relative risk in the Northeast and Metro clusters stays constant over the same period. Fig 7b shows the lower bound of the 95% credible interval for the relative risk of breast cancer mortality, and Fig 7c shows the upper bound. Table 8 presents spatial clusters whose posterior 95% credible intervals for the RR excluded 1. Table 8 shows that the Metro and Northeast spatial clusters had a lower relative risk of breast cancer mortality than the Kansas average from 2018 to 2021. The Southwest and Central clusters had a higher relative risk of breast cancer mortality than the Kansas average from 2018 to 2021.
The Southeast cluster had a higher relative risk of breast cancer mortality than the Kansas average from 2018 to 2019 and in 2021, and the Northwest cluster had a higher relative risk of breast cancer mortality than the Kansas average from 2018 to 2020. The spatial clusters that had the highest relative risk of breast cancer mortality are spatial clusters with a more rural demographic, while the spatial clusters that had the lowest relative risk of breast cancer mortality are spatial clusters with a more urban demographic. The average rurality index (CARR) values for the spatial clusters with high relative risk of breast cancer mortality are between 0.31 and 0.56 and the average rurality index (CARR) values for the spatial clusters with low relative risk of breast cancer mortality are between 0.22 and 0.26.

Table 8. Relative Risk and 95% CI for Kansas Spatial Clusters.

https://doi.org/10.1371/journal.pone.0347607.t008

Fig 7. Relative Risk and 95% Credible Interval of Breast Cancer Mortality for Kansas Clusters between 2018 and 2021.

https://doi.org/10.1371/journal.pone.0347607.g007

In order to determine hotspots, or clusters that have unusual elevation in relative risk of breast cancer mortality, we can calculate the probability of the relative risk estimates being greater than a given threshold, called exceedance probabilities, for each cluster. These probabilities are useful in assessing unusual elevation in breast cancer mortality risk. In this study, hotspots were defined using the same exceedance probability rule as in the county analysis: RR > 1.5 to represent moderate excess risk and exceedance probability>0.75 to indicate high posterior support. Sensitivity analyses using alternative thresholds (RR > 2.0, probability>0.80; and RR > 1.5, probability>0.70) are provided in Supplementary Material S5 File. Fig 8 shows the exceedance probabilities within Kansas clusters from 2018 to 2021.

Fig 8. Map of Exceedance Probabilities in Kansas Clusters.

https://doi.org/10.1371/journal.pone.0347607.g008

This map provides evidence of excess risk of breast cancer mortality within clusters from 2018 to 2021. Clusters with probabilities close to 1 are very likely to have relative risk that exceeds 1.5, clusters with probabilities close to 0 are very unlikely to have relative risk that exceeds 1.5, and clusters with probabilities around 0.50 have the highest uncertainty and correspond to having relative risk below or above 1.5 with equal probability. Fig 8 can be used to determine hotspot areas by identifying clusters that have large exceedance probabilities. The larger the exceedance probability the darker red the cluster and the smaller the exceedance probability the more yellow the cluster. A cluster would be considered a hotspot for breast cancer mortality if the exceedance probability is greater than 0.75. Table 9 gives the clusters that are classified as hotspots. In 2018, the Northwest cluster was identified as a hotspot for elevated relative risk of breast cancer mortality. In 2019, the Northwest and Central clusters were identified as hotspots. In 2020, the Southwest cluster was identified as a hotspot, and in 2021 the Central and Southwest clusters were identified as hotspots. The clusters identified as hotspots are known to have a more rural demographic and population.

Table 9. Kansas Hotspot Clusters Identified by Marginal Exceedance Probability.

https://doi.org/10.1371/journal.pone.0347607.t009

Under alternative exceedance thresholds (RR > 2.0, probability >0.80; RR > 1.5, probability >0.70), hotspot identification showed similar geographic patterns, with stricter thresholds yielding a smaller subset of hotspots and relaxed thresholds yielding modest expansion. Full results are provided in Supplementary Material S5 File.

Discussion

There are limited studies that have examined the direct and indirect risk factors for breast cancer mortality while simultaneously accounting for spatial and temporal variations. In this study, we examined socioeconomic, healthcare, and behavioral characteristics of breast cancer mortality across different spatial units in the state of Kansas. We implemented spatial and temporal modeling across county spatial units and within spatial clusters of counties generated by the SKATER spatial clustering algorithm. The spatial clustering algorithm was implemented to generate spatial clusters of counties due to a large number of counties reporting breast cancer mortality cases less than 10. We wanted to demonstrate a technique for generating spatial units that preserve patient privacy and confidentiality while enabling spatial and temporal modeling.

In the county-level analysis, percent female binge drinking was statistically significant but exhibited a negative association with breast cancer mortality after adjustment for spatial structure and other covariates. Although extensive individual-level evidence demonstrates that alcohol consumption increases breast cancer risk [30–35], the negative association observed in our ecological model likely reflects aggregation effects, urban–rural structure, correlated behavioral covariates, or residual spatial confounding rather than a protective effect. This discrepancy highlights the distinction between individual-level risk and ecological associations, and the findings should not be interpreted causally.

The Poisson BYM2 model identified several counties in the southeast region of the state with elevated relative risk of breast cancer mortality, corresponding primarily to more rural areas. Although we report results from the Poisson BYM2 model due to its parsimony and interpretability, model comparison results indicated that alternative count specifications (including negative binomial, generalized Poisson, and zero-inflated models) demonstrated similar fit. This suggests that the main conclusions are not driven by a single distributional assumption.

The counties that had the highest relative risk of breast cancer mortality are counties with a more rural demographic, while the counties that had the lowest relative risk of breast cancer mortality are counties with a more urban demographic. The rurality index (CARR) values for the counties with high relative risk of breast cancer mortality are between 0.32 and 0.39, and the rurality index (CARR) values for the counties with low relative risk of breast cancer mortality are between 0.17 and 0.26. This aligns with prior evidence that rural residents experience delayed diagnoses, lower screening rates, and reduced access to oncology services, resulting in poorer outcomes [36]. Although obesity and diabetes are biologically linked to breast cancer progression [37], we did not find them to be significant predictors in the county-level analysis. Certain counties were identified as hotspots for high relative risk of breast cancer mortality through the use of marginal exceedance probabilities, with the specific counties detailed in the Results section. The counties identified as hotspots were more likely to be rural. The number of counties classified as hotspots varied across the study period but showed an overall decline relative to the earlier years.

In the spatial cluster analysis, findings likely reflect broader demographic and behavioral patterns at the population level rather than direct individual effects. Predictors that were not statistically significant may have limited independent signal after adjustment for correlated covariates or reduced variability across counties. Although clustering increases statistical stability by combining counties and increasing case counts, the analysis remains ecological and does not allow for individual-level causal conclusions. The Poisson BYM2 model for the spatial cluster analysis identified clusters with a high relative risk of breast cancer mortality and clusters with fluctuating relative risk over time. The spatial clusters that had the highest relative risk of breast cancer mortality are spatial clusters with a more rural demographic, while the spatial clusters that had the lowest relative risk of breast cancer mortality are spatial clusters with a more urban demographic. The average rurality index (CARR) values for the spatial clusters with high relative risk of breast cancer mortality are between 0.31 and 0.56 and the average rurality index (CARR) values for the spatial clusters with low relative risk of breast cancer mortality are between 0.22 and 0.26. We identified certain clusters as hotspots for high relative risk of breast cancer mortality through the use of marginal exceedance probabilities. The clusters identified as hotspots were more likely to have a rural demographic.

Temporal fluctuations in relative risk at the county level may reflect demographic shifts, variability in access to screening and treatment services, and statistical instability in sparsely populated areas, even after Bayesian smoothing. Persistent hotspot patterns likely represent structural and healthcare access disparities shared across neighboring counties. These findings are ecological and should be interpreted as hypothesis-generating rather than causal.

The temporal dimension includes a limited number of annual observations, which constrains the identifiability of sustained temporal trends. Accordingly, the temporal component should be interpreted as capturing short-term year-to-year variation rather than long-term change. Differences between the spatial-unit and cluster analyses likely reflect differences in statistical stability and aggregation; clustering increases case counts and reduces variance, which can make short-term temporal differences more detectable.

Overall, this study demonstrates that spatial and temporal modeling provides great insight into the trend of breast cancer mortality by capturing the complex interplay among spatial units and known risk factors. While further research is needed in order to incorporate specific prevention strategies, our findings provide valuable insights into potential hotspots, areas of high risk, and temporal trends of breast cancer mortality in Kansas.

Limitations

This study has several limitations that should be considered when interpreting the findings. First, the retrospective observational design inherently limits causal inference, as it relies on existing data that may be subject to confounding and bias. Additionally, the use of aggregated county-level covariates introduces the risk of ecological fallacy, whereby associations observed at the group level may not hold at the individual level. This analysis did not account for a comprehensive set of known risk factors for breast cancer mortality, such as genetic predispositions and other biological or lifestyle-related variables, because such data could not be captured at the county level. The omission of these covariates may influence the interpretation of the findings. Furthermore, counties with small numbers of breast cancer mortality cases may produce unstable estimates and reduced statistical power, even with Bayesian smoothing. Spatial clusters were identified using baseline data and assumed to remain constant over time, which may not fully capture potential shifts in spatial configuration. Finally, the limited number of temporal observations constrains the ability to distinguish between random fluctuation and systematic temporal change; therefore, temporal findings should be interpreted cautiously and not as evidence of sustained long-term trends.

Because this analysis relies on aggregated spatial units and Bayesian spatial smoothing, the estimated relative risks represent stabilized population‑level patterns rather than individual‑level risk. Spatial smoothing and clustering improve estimate stability in areas with small counts but may attenuate extreme values or mask localized heterogeneity. As a result, identified areas of elevated risk should be interpreted as regions warranting further investigation rather than definitive evidence of localized causal risk.

Conclusions

This study demonstrates the value of spatial-temporal modeling and cluster analysis in analyzing the relative risk of breast cancer mortality across Kansas. Within the county and spatial cluster analyses, significant geographic variation was identified, emphasizing the importance of incorporating both structured and unstructured heterogeneity. Among all models considered, the Poisson BYM2 model provided the best fit for the county analysis and the spatial cluster analysis. The findings show the importance of addressing socioeconomic, healthcare, and behavioral characteristics that are known risk factors for breast cancer mortality. Public health officials and researchers should implement strategies that prioritize allocating resources to high-risk counties and spatial clusters in order to improve breast cancer mortality outcomes. Implementing spatial and temporal models as described in this paper can assist public health researchers, community members, and policymakers in assessing local breast cancer mortality risks to inform targeted interventions.

Future work will expand this framework by incorporating longer time series to better characterize sustained temporal trends and by exploring finer spatial resolutions or multilevel spatial models, subject to data availability and confidentiality constraints. Extending these methods to multi‑state or regional datasets may further enhance understanding of geographic disparities in breast cancer mortality.

Supporting information

S1 File. Descriptive statistics of all covariates within each county in Kansas over 2018–2021: Mean (Standard Deviation).

https://doi.org/10.1371/journal.pone.0347607.s001

(DOCX)

S2 Fig. Variation in Average Demographics Across Kansas Counties (2018–2021).

https://doi.org/10.1371/journal.pone.0347607.s002

(DOCX)

S5 File. Sensitivity analysis for hotspot thresholds.

https://doi.org/10.1371/journal.pone.0347607.s005

(DOCX)

References

  1. 1. American Cancer Society. Cancer facts & Figs 2023. 2025.
  2. 2. Amin RW, Fritsch BA, Retzloff JE. Spatial clusters of breast cancer mortality and incidence in the contiguous USA: 2000–2014. Journal of General Internal Medicine. 2019;34(3):412–9.
  3. 3. Kansas Department of Health and Environment. Cancer data dashboard. 2025,. https://www.kdhe.ks.gov/2239/Data
  4. 4. National Institutes of Health. Cancer stat facts: Female breast cancer. 2023. https://seer.cancer.gov/statfacts/html/breast.html
  5. 5. National Institutes of Health. Cancer stat facts: Female breast cancer. 2022. https://seer.cancer.gov/statfacts/html/breast.html
  6. 6. Terry PD, Rohan TE. Cigarette smoking and the risk of breast cancer in women: a review of the literature. Cancer Epidemiol Biomarkers Prev. 2002;11(10 Pt 1):953–71. pmid:12376493
  7. 7. Danaei G, Vander Hoorn S, Lopez AD, Murray CJL, Ezzati M, Comparative Risk Assessment collaborating group (Cancers). Causes of cancer in the world: comparative risk assessment of nine behavioural and environmental risk factors. Lancet. 2005;366(9499):1784–93. pmid:16298215
  8. 8. Calle EE, Rodriguez C, Walker-Thurmond K, Thun MJ. Overweight, obesity, and mortality from cancer in a prospectively studied cohort of U.S. adults. N Engl J Med. 2003;348(17):1625–38. pmid:12711737
  9. 9. Thune I, Brenn T, Lund E, Gaard M. Physical activity and the risk of breast cancer. N Engl J Med. 1997;336(18):1269–75. pmid:9113929
  10. 10. Albano JD, Ward E, Jemal A, Anderson R, Cokkinides VE, Murray T, et al. Cancer mortality in the United States by education level and race. J Natl Cancer Inst. 2007;99(18):1384–94. pmid:17848670
  11. 11. Crouse DL, Goldberg MS, Ross NA, Chen H, Labrèche F. Postmenopausal breast cancer is associated with exposure to traffic-related air pollution in Montreal, Canada: a case-control study. Environ Health Perspect. 2010;118(11):1578–83. pmid:20923746
  12. 12. Freeman HP. Poverty, culture, and social injustice: determinants of cancer disparities. CA Cancer J Clin. 2004;54(2):72–7. pmid:15061597
  13. 13. Hystad P, Villeneuve PJ, Goldberg MS, Crouse DL, Johnson K, Canadian Cancer Registries Epidemiology Research Group. Exposure to traffic-related air pollution and the risk of developing breast cancer among women in eight Canadian provinces: a case-control study. Environ Int. 2015;74:240–8. pmid:25454241
  14. 14. United States Census Bureau. QuickFacts: Kansas. 2020. https://www.census.gov/quickfacts/KS
  15. 15. Joseph Sheehan T, DeChello LM, Kulldorff M, Gregorio DI, Gershman S, Mroszczyk M. The geographic distribution of breast cancer incidence in Massachusetts 1988 to 1997, adjusted for covariates. Int J Health Geogr. 2004;3(1):17. pmid:15291960
  16. 16. Waller LA, Gotway CA. Applied spatial statistics for public health data. Hoboken, N.J: John Wiley & Sons; 2004.
  17. 17. Kansas Department of Health and Environment. Kansas information for communities (KIC). 2025. https://kic.kdheks.gov/index.html
  18. 18. Kansas Health Matters. Female breast cancer rate. 2025. https://www.kansashealthmatters.org/
  19. 19. Nelson KS, Nguyen TD. Community assets and relative rurality index: A multi-dimensional measure of rurality. Journal of Rural Studies. 2023;97:322–33.
  20. 20. Bernardinelli L, Clayton D, Pascutto C, Montomoli C, Ghislandi M, Songini M. Bayesian analysis of space-time variation in disease risk. Stat Med. 1995;14(21–22):2433–43. pmid:8711279
  21. 21. Asmarian N, Ayatollahi SMT, Sharafi Z, Zare N. Bayesian Spatial Joint Model for Disease Mapping of Zero-Inflated Data with R-INLA: A Simulation Study and an Application to Male Breast Cancer in Iran. Int J Environ Res Public Health. 2019;16(22):4460. pmid:31766251
  22. 22. Simpson D, Rue H, Riebler A, Martins TG, Sørbye SH. Penalising Model Component Complexity: A Principled, Practical Approach to Constructing Priors. Statist Sci. 2017;32(1).
  23. 23. Rue H, Martino S, Chopin N. Approximate Bayesian Inference for Latent Gaussian models by using Integrated Nested Laplace Approximations. Journal of the Royal Statistical Society Series B: Statistical Methodology. 2009;71(2):319–92.
  24. 24. Martins TG, Simpson D, Lindgren F, Rue H. Bayesian computing with INLA: New features. Computational Statistics & Data Analysis. 2013;67:68–83.
  25. 25. Moraga P. Spatial Statistics for Data Science: Theory and Practice with R. Chapman & Hall/CRC; 2023.
  26. 26. Neyens T, Faes C, Vranckx M, Pepermans K, Hens N, Van Damme P, et al. Can COVID-19 symptoms as reported in a large-scale online survey be used to optimise spatial predictions of COVID-19 incidence risk in Belgium? Spat Spatiotemporal Epidemiol. 2020;35:100379. pmid:33138946
  27. 27. AssunÇão RM, Neves MC, Câmara G, Da Costa Freitas C. Efficient regionalization techniques for socio‐economic geographical units using minimum spanning trees. International Journal of Geographical Information Science. 2006;20(7):797–811.
  28. 28. Aho AV, Hopcroft JE, Ullman JD. Data structures and algorithms. Addison-Wesley; 1983.
  29. 29. Jungnickel D. Graphs, networks and algorithms. Springer; 1999.
  30. 30. Dumitrescu RG, Shields PG. The etiology of alcohol-induced breast cancer. Alcohol. 2005;35(3):213–25. pmid:16054983
  31. 31. Fernandez SV. Estrogen, alcohol consumption, and breast cancer. Alcohol Clin Exp Res. 2011;35(3):389–91. pmid:22132831
  32. 32. McDonald JA, Goyal A, Terry MB. Alcohol Intake and Breast Cancer Risk: Weighing the Overall Evidence. Curr Breast Cancer Rep. 2013;5(3):10.1007/s12609-013-0114-z. pmid:24265860
  33. 33. Oyesanmi O, Snyder D, Sullivan N, Reston J, Treadwell J, Schoelles KM. Alcohol consumption and cancer risk: understanding possible causal mechanisms for breast and colorectal cancers. Evid Rep Technol Assess (Full Rep). 2010;(197):1–151. pmid:23126574
  34. 34. Seitz HK, Pelucchi C, Bagnardi V, La Vecchia C. Epidemiology and pathophysiology of alcohol and breast cancer: Update 2012. Alcohol Alcohol. 2012;47(3):204–12. pmid:22459019
  35. 35. Singletary KW, Gapstur SM. Alcohol and breast cancer: review of epidemiologic and experimental evidence and potential mechanisms. JAMA. 2001;286(17):2143–51. pmid:11694156
  36. 36. Singh GK, Williams SD, Siahpush M, Mulhollen A. Socioeconomic, Rural-Urban, and Racial Inequalities in US Cancer Mortality: Part I-All Cancers and Lung Cancer and Part II-Colorectal, Prostate, Breast, and Cervical Cancers. J Cancer Epidemiol. 2011;2011:107497. pmid:22496688
  37. 37. Kang C, LeRoith D, Gallagher EJ. Diabetes, Obesity, and Breast Cancer. Endocrinology. 2018;159(11):3801–12. pmid:30215698