Abstract
This study, focusing on the assessment of obesity prevalence trends in public health management, proposes an improved Transformer model that integrates temporal embeddings with spatially constrained feature dependencies rather than purely geographic adjacency. Using state-level data from the CDC BRFSS, the method first performs joint temporal–health encoding (JTH) of obesity prevalence time series and health indicators. It then incorporates temporal decay and a learnable spatial constraint matrix (STA) into the attention mechanism, while employing dual-branch consistency training to enhance stability and generalization. We conducted comparative and ablation experiments on ten states, including Alaska and Alabama, and carried out independent validation on unseen states such as Guam and Idaho. The results show that the proposed approach outperforms representative models including MLP, LSTM, 1D-CNN, Mamba, iTransformer, and TimeMixer across metrics such as MAE, RMSE, sMAPE, R², and MASE. Ablation experiments further demonstrate that JTH and STA contribute complementary improvements to model performance, while independent validation confirmed that the R² values for all held-out states exceeded 0.84. In addition, SHAP analysis was employed to illustrate the contributions and dependencies of key features, providing interpretable evidence to support evidence-based resource allocation in obesity prevention and control.
Citation: Tan W, Geng B, Bai X (2025) Spatiotemporal prediction of obesity rates and model interpretability analysis from a public health perspective. PLoS One 20(11): e0335908. https://doi.org/10.1371/journal.pone.0335908
Editor: Guangyin Jin, National University of Defense Technology, CHINA
Received: September 6, 2025; Accepted: October 19, 2025; Published: November 13, 2025
Copyright: © 2025 Tan et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The data used in this study are obtained from the publicly available Behavioral Risk Factor Surveillance System (BRFSS), a public resource released by the Division of Nutrition, Physical Activity, and Obesity (DNPAO) under the U.S. Centers for Disease Control and Prevention (CDC). These data are aggregated at the state level and are fully accessible to the public, containing no personally identifiable information. Researchers and the public can download the original datasets, codebooks, and metadata from the CDC’s official website. More information can be found at the following links: BRFSS Annual Data and Documentation: https://www.cdc.gov/brfss/annual_data/annual_data.htm. BRFSS Data and Documentation Overview: https://www.cdc.gov/brfss/data_documentation/index.htm. CDC BRFSS Dataset on Data.gov: https://catalog.data.gov/dataset/cdc-behavioral-risk-factor-surveillance-system-brfss.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
1 Introduction
In recent years, public health issues have received increasing global attention, with the high prevalence of obesity emerging as a major challenge threatening human health and social development [1,2]. Reports from the World Health Organization indicate that obesity is not only highly associated with cardiovascular disease, diabetes, and various cancers, but also places long-term pressure on public health systems [3–5]. Therefore, accurately predicting the dynamic trends of obesity prevalence and uncovering its potential driving factors are of great importance for designing targeted interventions and public policies. In regions with significant socioeconomic disparities and complex environmental factors, spatiotemporal modeling and interpretability analysis provide new data-driven approaches for public health decision-making [6]. Table 1 shows the obesity rate trends in different regions.
However, existing studies still face numerous challenges in modeling and applying public health data. First, traditional statistical methods often struggle to effectively capture the nonlinear spatiotemporal evolution of obesity prevalence, resulting in limited predictive accuracy. Second, while certain deep learning methods can improve prediction performance to some extent, they often lack sufficient interpretability, making it difficult to answer the question of “which factors are most critical to obesity changes across different regions and time periods.” Furthermore, public health data typically exhibit multi-source heterogeneity, missing values, and imbalances, further complicating model generalization and real-world applicability. These challenges limit the credibility and practical value of predictive models in public health decision-making.
To address these issues, this paper proposes a predictive framework that integrates temporal modeling with spatial constraints, while explicitly incorporating interpretability mechanisms into the model architecture. Unlike prior interpretable spatiotemporal deep learning approaches such as GNN-based models or hybrid RNN–Transformer variants, our method introduces a unified design that combines temporal embeddings, feature-level spatial constraints, and consistency training. Specifically, we design a modeling module based on temporal embeddings and joint encoding of health indicators to enhance the ability to capture complex dynamic patterns. At the same time, by embedding feature-constrained spatial relations rather than purely geographic adjacency into the attention mechanism, the model can better characterize spatiotemporal dependencies among regions. In the prediction stage, we adopt a multi-branch consistency training strategy to improve model stability and generalization, and employ SHAP-based methods to achieve feature-level interpretability analysis. Through this integrated approach, the framework provides both accurate forecasting of obesity prevalence and transparent, interpretable evidence to support public health decision-making.
The main contributions of this paper are summarized as follows:
(1) From a public health perspective, we propose a spatiotemporal modeling framework for obesity prediction that integrates temporal embeddings with feature-level spatial constraints;
(2) We incorporate interpretability mechanisms into the model architecture, leveraging feature importance analysis and visualization to identify the key driving factors of obesity prevalence;
(3) We develop a multi-branch consistency training strategy that enhances model stability and generalization in cross-regional and cross-temporal predictions;
(4) We conduct extensive experiments on multiple public health datasets, demonstrating that the proposed method outperforms existing baselines in both predictive accuracy and interpretability, thus providing practical references for public health interventions and policymaking.
2 Related work
2.1 Spatiotemporal modeling and public health data analysis
The importance of spatiotemporal modeling in public health research has become increasingly prominent. A growing body of literature has employed spatial statistical models, time series methods, and deep learning approaches to analyze the spatiotemporal dynamics of obesity and related chronic diseases. Gao et al. [12] modeled and projected childhood obesity trends in 191 countries from 1975 to 2030, revealing long-term cross-national patterns. Guo et al. [13] utilized multiple waves of Chinese nutrition and health survey data, applying Bayesian models with environmental features to predict childhood and adolescent obesity. Similarly, Tong et al. [14] used data from 2016–2020 on Chinese adolescents and found clear regional clustering patterns of obesity, while Azanaw et al. [15] demonstrated that socioeconomic factors significantly explained spatiotemporal distributional differences across years in their study of Ethiopian urban women. Collectively, these studies highlight the importance of incorporating spatial and temporal dimensions, providing a solid empirical foundation for understanding the dynamic characteristics of obesity prevalence.
In terms of methodological development, scholars have sought to integrate predictive models with multi-source data to enhance the interpretability and predictive capacity of public health research. For example, Shiri et al. [16] employed a Bayesian spatiotemporal model to forecast obesity prevalence in Iran through 2040, uncovering long-term effects of gender differences and regional characteristics. Grimaccia and Rota [17] analyzed the spatiotemporal dynamics of obesity in Italian regions and stressed the need for regional interventions. At the same time, advanced modeling techniques have been gradually introduced into public health contexts. The EpiGNN model [18], leveraging graph neural networks, revealed regional transmission mechanisms and provided insights into spatial dependence predictions for chronic diseases. Çolak [19] applied multiple time series models to predict the long-term obesity trends among older adults in the United States, whereas Dahu et al. [20] combined satellite remote sensing imagery with deep convolutional networks to achieve high-precision predictions of obesity prevalence in Missouri. Furthermore, Rota et al. [21] proposed a spatiotemporal dynamic modeling framework based on Bayesian Beta regression, offering a novel tool to explore regional differences and future trajectories of obesity prevalence. These studies not only demonstrate the application potential of diverse methodologies in public health but also provide feasible pathways for prediction and decision support in the prevention and control of obesity and other chronic diseases.
2.2 Model interpretability and health decision support
In public health research, model interpretability plays a critical role in ensuring the scientific validity of results and their value for decision support. Recent studies have widely applied explainable artificial intelligence (XAI) methods to uncover the underlying factors in obesity prediction models. Allen [22] developed an interpretable machine learning model for county-level obesity rates in the United States, demonstrating the contributions of socioeconomic and environmental variables. Görmez et al. [23] combined lifestyle and dietary habit data with ensemble XAI-based machine learning methods to achieve transparent prediction of obesity levels. Du et al. [24] designed a visualization-based risk prediction system, translating model outputs into intuitive health management tools. Khater et al. [25] further revealed the complex mechanisms of lifestyle factors influencing obesity. Lin et al. [26] proposed an interpretable obesity risk prediction model for overweight populations, providing valuable references for individual interventions. Meanwhile, scholars have also explored general interpretable classification models [27], as well as explainable prediction approaches based on electronic health records (EHR) [28], underscoring the close connection between data-driven methods and clinical applications.
In the context of health decision support, interpretability not only enhances model transparency but also strengthens its practical utility in clinical and policy settings. Cho et al. [29] applied interpretable models to predict postoperative hospital stay, verifying their feasibility in clinical resource management. Amarasinghe et al. [30] emphasized the potential and research gaps of explainable machine learning in public policy applications. Gupta et al. [31] proposed an obesity prediction framework that integrates deep learning with interpretable elements based on EHR data, while Deva et al. [32] highlighted the limitations of “non-identifiability” in constraining model interpretability from an epidemiological perspective. Kumar et al. [33] introduced concept-driven self-explainable neural networks for ICU mortality prediction, demonstrating that it is possible to balance high performance with interpretability in complex medical scenarios. Overall, these studies underscore the critical role of XAI technologies in health prediction, resource allocation, and policy-making, providing more reliable scientific evidence for obesity prevention and public health management.
3 Method
3.1 Ethics statement
This study only uses state-level aggregated data (BRFSS dataset) publicly released by the U.S. Centers for Disease Control and Prevention (CDC), and does not involve any identifiable personal information or individual privacy. Therefore, this study does not require additional ethical approval.
3.2 Overall model architecture
This study proposes a model based on an improved Transformer architecture for spatiotemporal prediction of obesity rates within a single state. The overall framework takes time series data as the core input, where positional encoding and feature embedding of health survey indicators are jointly modeled to effectively capture dynamic changes across different years and the associations among public health factors. On this basis, spatiotemporal constraints are incorporated into the attention mechanism, enabling the model to more precisely focus on key features driving the temporal evolution of obesity rates, thereby enhancing both predictive accuracy and interpretability. The design highlights two main innovations: first, the integration of temporal embeddings with health indicators strengthens the model’s ability to represent long-term trends; second, the modified attention mechanism improves spatiotemporal dependency modeling and provides greater interpretability by uncovering the critical driving factors behind obesity rate variations. The overall model architecture is shown in Fig 1.
The model takes temporal embeddings and indicator embeddings as inputs, and leverages the improved spatiotemporal constrained attention module to capture long-term dependencies and dynamic variations. The final output layer not only maintains predictive accuracy but also enhances interpretability, providing effective support for obesity rate prediction in public health scenarios.
3.3 Joint modeling of time series embedding and health indicators (JTH)
In the overall modeling process, we treat the public health data sequences related to obesity rates as time series signals, denoted as $X = \{x_1, x_2, \dots, x_T\}$. To capture their temporal dynamics more concisely, we define an embedding function $\phi(\cdot)$ that projects each raw input into a d-dimensional latent space:
$$e_t = \phi(x_t), \quad e_t \in \mathbb{R}^{d},$$
where $e_t$ is the representation at time t. This embedding allows the model to represent long-term trends and short-term fluctuations within a unified feature space. In this study, $\phi(\cdot)$ combines the raw input value with a sinusoidal positional encoding of the time index t, rather than a learnable embedding. The sinusoidal encoding enables the model to capture periodicity and relative temporal information effectively, while ensuring that embeddings for unseen time steps can be consistently extrapolated.
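The embedding step can be illustrated with a minimal NumPy sketch. The text only states that the raw value is combined with a sinusoidal encoding of the time index, so the linear projection `W`, the additive combination, and the names `sinusoidal_encoding` and `embed` are assumptions made for illustration, and the input data are random placeholders.

```python
import numpy as np

def sinusoidal_encoding(num_steps: int, d: int) -> np.ndarray:
    """Standard sinusoidal positional encoding of shape (num_steps, d); d must be even."""
    positions = np.arange(num_steps)[:, None]                   # (T, 1)
    freqs = np.exp(-np.log(10000.0) * np.arange(0, d, 2) / d)   # (d/2,)
    pe = np.zeros((num_steps, d))
    pe[:, 0::2] = np.sin(positions * freqs)   # even dimensions
    pe[:, 1::2] = np.cos(positions * freqs)   # odd dimensions
    return pe

def embed(x: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Project raw inputs x of shape (T, F) to (T, d) and add the encoding.

    Assumption: the raw values and the time encoding are combined additively.
    """
    num_steps, d = x.shape[0], W.shape[1]
    return x @ W + sinusoidal_encoding(num_steps, d)

rng = np.random.default_rng(0)
x = rng.normal(size=(12, 5))          # 12 yearly observations, 5 placeholder indicators
W = rng.normal(size=(5, 16)) * 0.1    # hypothetical projection to d = 16
E = embed(x, W)
print(E.shape)
```

Because the encoding is a fixed function of the time index, it can be evaluated for years beyond the training range without retraining, which is the extrapolation property the paragraph above refers to.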
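On this basis, we introduce a multi-head attention mechanism to jointly model the embeddings and obtain global dependencies across time steps. Specifically, let $Q = EW^{Q}$, $K = EW^{K}$, and $V = EW^{V}$, where $E = [e_1, \dots, e_T]^{\top}$ stacks the time-step embeddings $e_t$ and $q_t$, $k_j$, $v_j$ denote the corresponding rows. Then for any time step t, its dependency on other time steps can be expressed as
$$\alpha_{t,j} = \frac{\exp\left(q_t k_j^{\top} / \sqrt{d}\right)}{\sum_{j'=1}^{T} \exp\left(q_t k_{j'}^{\top} / \sqrt{d}\right)}, \qquad z_t = \sum_{j=1}^{T} \alpha_{t,j}\, v_j,$$
where $\alpha_{t,j}$ denotes the attention weight and $z_t$ is the aggregated context representation. In this way, the model can explicitly emphasize the most critical time segments for prediction, thereby avoiding the gradient vanishing problem when modeling long-term dependencies. The attention mechanism architecture is shown in Fig 2.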
This component constructs interactive attention between time series embeddings and health indicators, effectively capturing dependencies across different time segments, thereby enhancing the model’s representational capacity and interpretability in obesity rate prediction.
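For readers who want to trace the attention computation, the following is a single-head sketch of standard scaled dot-product self-attention over time steps. It is a simplification: the model described here uses multiple heads, and the projection matrices `W_q`, `W_k`, `W_v` and the random inputs below are placeholders rather than trained parameters.

```python
import numpy as np

def softmax(scores: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=axis, keepdims=True)

def self_attention(E, W_q, W_k, W_v):
    """Single-head scaled dot-product attention over T time steps.

    E: (T, d) embeddings. Returns the (T, T) attention weights and
    the (T, d_v) context vectors.
    """
    Q, K, V = E @ W_q, E @ W_k, E @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[1])   # pairwise similarity of time steps
    alpha = softmax(scores, axis=-1)          # each row sums to 1
    Z = alpha @ V                             # weighted sum of values
    return alpha, Z

rng = np.random.default_rng(1)
E = rng.normal(size=(10, 16))                 # 10 time steps, d = 16 (placeholder data)
W_q, W_k, W_v = (rng.normal(size=(16, 16)) * 0.1 for _ in range(3))
alpha, Z = self_attention(E, W_q, W_k, W_v)
print(alpha.shape, Z.shape)
```

Row t of `alpha` shows how strongly time step t attends to every other step, which is the quantity one would inspect to see which time segments the model emphasizes.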
To further stabilize the prediction results and enhance interpretability, we introduce a temporal residual connection after context modeling, namely
$$\tilde{z}_t = z_t + e_t,$$
where $z_t$ is the context representation and $e_t$ is the embedding at time step t. This ensures that the original temporal information is not lost during deep propagation, effectively preserving sensitivity to short-term dynamics. Meanwhile, the residual term provides an additional channel for interpretability analysis, enabling us to distinguish the respective contributions of the original trend and the global dependencies in the prediction process. The model architecture is shown in Fig 3.
The framework captures long-term trends via convolution-based aggregation and short-term fluctuations via MLP-based encoding, providing complementary temporal representations.
Finally, the joint representations of all time steps are aggregated to obtain the overall prediction vector. The specific form is
$$\hat{y}_{T+1} = g\left(\tilde{z}_1, \tilde{z}_2, \dots, \tilde{z}_T\right),$$
where $\tilde{z}_t$ is the joint representation at time step t, $\hat{y}_{T+1}$ denotes the predicted obesity rate at the next time step, and $g(\cdot)$ is the prediction mapping function. Through the joint modeling of time series embeddings and health indicators, the model can simultaneously capture long-term trends and critical changes, ensuring stable predictive performance while enhancing interpretability, thereby making it highly valuable for public health research.
3.4 Attention mechanism with spatiotemporal constraints (STA)
On the basis of joint modeling, to further improve prediction accuracy and stability, we introduce a spatiotemporal constraint mechanism into the attention computation. Let the time series length be T and the set of spatial locations be $\mathcal{S}$, then the representation for any time step t and location $s \in \mathcal{S}$ can be written as
$$u_{t,s} \in \mathbb{R}^{d}, \quad t = 1, \dots, T, \; s \in \mathcal{S},$$
where $u_{t,s}$ denotes the embedding representation of location s at time t. In this study, the notion of "spatial" does not correspond to explicit geographic or inter-state adjacency. Instead, it refers to feature-level positions within the embedding space, where spatiotemporal constraints are imposed to regulate correlations among different feature dimensions across time. In other words, the spatial dimension here is an abstraction of learned feature interactions, rather than a physical or population-based neighborhood structure. This setting enables the model not only to capture dependencies along the temporal dimension, but also to model potential correlations across spatial positions. The model architecture is shown in Fig 4.
Temporal decay and correlation-based spatial weights are integrated into the attention computation, enabling the model to capture both local temporal dependencies and cross-feature correlations for improved prediction.
To encode spatiotemporal constraints within the attention mechanism, we first define a time-dependent decay function $\delta(t, \tau)$ that decreases with the gap $|t - \tau|$ between time steps $t$ and $\tau$, where $\sigma$ is a temporal scale hyperparameter that controls the decay rate between different time steps. This mechanism ensures that the model remains sensitive to neighboring time points while modeling long-term dependencies.
Similarly, in the spatial dimension we introduce a correlation-based matrix $A$ and define the spatial constraint function from its entries $A_{s,r}$, where $A_{s,r}$ is computed from the correlation coefficient between feature position s and feature position r in the embedding space, rather than from geographic adjacency. This correlation-based spatial constraint guides the attention mechanism to focus aggregation on feature dimensions that exhibit stronger statistical associations or contextual relevance.
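Based on the above definitions, the spatiotemporally constrained attention weight $\alpha_{(t,s),(\tau,r)}$ between position $(t, s)$ and position $(\tau, r)$ is computed from the similarity $\mathrm{sim}(u_{t,s}, u_{\tau,r})$, modulated by the temporal decay and the spatial constraint, where $\mathrm{sim}(\cdot, \cdot)$ denotes a similarity function. This formulation explicitly integrates temporal decay and the correlation-based spatial constraint into the attention computation, enabling the model to simultaneously focus on critical time segments and feature positions during inference.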
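One possible way to combine the three ingredients (similarity, temporal decay, correlation-based spatial weights) is sketched below in NumPy. Every functional form here is an assumption chosen for illustration and should not be read as the paper's exact formulation: the decay is taken to be exponential in the time gap with scale `scale`, the spatial weight is the absolute Pearson correlation between feature positions, the similarity is a scaled dot product, and the three terms are multiplied and then normalized.

```python
import numpy as np

def constrained_attention(U: np.ndarray, scale: float = 2.0):
    """Illustrative spatiotemporally constrained attention.

    U: (T, S, d) embeddings for T time steps and S feature positions (S >= 2).
    Returns attention weights of shape (T*S, T*S) and context of shape (T, S, d).
    """
    T, S, d = U.shape
    flat = U.reshape(T * S, d)                    # row index = t * S + s
    t_idx = np.repeat(np.arange(T), S)            # time step of each row
    s_idx = np.tile(np.arange(S), T)              # feature position of each row

    sim = flat @ flat.T / np.sqrt(d)              # assumed similarity: scaled dot product
    # Assumed decay: exponential in the time gap, controlled by `scale`.
    decay = np.exp(-np.abs(t_idx[:, None] - t_idx[None, :]) / scale)
    # Assumed spatial weights: |Pearson correlation| between feature positions.
    A = np.abs(np.corrcoef(U.transpose(1, 0, 2).reshape(S, T * d)))
    spatial = A[s_idx[:, None], s_idx[None, :]]

    # Multiply the three terms, then normalize each row to sum to 1.
    w = np.exp(sim - sim.max(axis=1, keepdims=True)) * decay * spatial
    alpha = w / w.sum(axis=1, keepdims=True)
    context = (alpha @ flat).reshape(T, S, d)
    return alpha, context

rng = np.random.default_rng(2)
U = rng.normal(size=(4, 3, 8))                    # placeholder embeddings
alpha, context = constrained_attention(U)
print(alpha.shape, context.shape)
```

With a very small `scale`, the decay term suppresses all cross-time pairs, so each position attends only within its own time step; larger values let attention reach further back in time.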
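After obtaining the constrained attention weights, we aggregate the context representations as
$$c_{t,s} = \sum_{\tau=1}^{T} \sum_{r \in \mathcal{S}} \alpha_{(t,s),(\tau,r)}\, u_{\tau,r},$$
where $\alpha_{(t,s),(\tau,r)}$ is the constrained attention weight between position $(t, s)$ and position $(\tau, r)$. Furthermore, to enhance the model’s ability to capture global dynamics, we perform weighted aggregation of the temporal context vectors across all spatial positions:
$$c_t = \sum_{s \in \mathcal{S}} \beta_s\, c_{t,s},$$
where $\beta_s$ denotes a learnable spatial weight coefficient.
Finally, the overall representation of the sequence is obtained through temporal aggregation:
$$h = \mathrm{Agg}\left(c_1, c_2, \dots, c_T\right),$$
and the final result is produced by the prediction function:
$$\hat{y}_{T+1} = g(h).$$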
By incorporating spatiotemporal constraints into the attention mechanism, the model not only captures long-term dependencies while avoiding over-smoothing, but also maintains sensitivity to critical time segments. At the same time, the spatial constraint ensures that the prediction results align more closely with the propagation logic observed in real-world public health processes. Overall, this design improves predictive accuracy while also revealing the spatiotemporal driving factors behind obesity rate variations, thereby enhancing interpretability.
Algorithm 1. Prediction process of spatiotemporal constrained attention.
3.5 Training objectives
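During the training phase, we jointly consider the two outputs obtained from the time series embedding and the spatiotemporal attention constraint. Let $h^{\mathrm{temp}}$ denote the aggregated representation from the time series embedding, and obtain the prediction result through $\hat{y}^{\mathrm{temp}}_{T+1} = g_{\mathrm{temp}}(h^{\mathrm{temp}})$; meanwhile, let $h^{\mathrm{st}}$ denote the aggregated representation from the spatiotemporally constrained attention, and obtain another prediction value through $\hat{y}^{\mathrm{st}}_{T+1} = g_{\mathrm{st}}(h^{\mathrm{st}})$. Here, $\hat{y}^{\mathrm{temp}}_{T+1}$ is regarded as the temporal branch, focusing on sequential embeddings, while $\hat{y}^{\mathrm{st}}_{T+1}$ serves as the spatiotemporal branch, emphasizing correlations constrained by attention. Both branches are trained simultaneously, and their outputs are optimized against the ground-truth label $y_{T+1}$.
Specifically, we adopt the Mean Squared Error (MSE) as the optimization objective, defined as
$$\mathcal{L}_{\mathrm{MSE}} = \frac{1}{N} \sum_{i=1}^{N} \left(y^{(i)}_{T+1} - \hat{y}^{(i)}_{T+1}\right)^{2},$$
where $N$ is the number of training samples and $\hat{y}_{T+1}$ jointly comes from $\hat{y}^{\mathrm{temp}}_{T+1}$ and $\hat{y}^{\mathrm{st}}_{T+1}$. The final objective function can be expressed as
$$\mathcal{L} = \lambda_1 \mathcal{L}_{\mathrm{temp}} + \lambda_2 \mathcal{L}_{\mathrm{st}},$$
where $\mathcal{L}_{\mathrm{temp}}$ and $\mathcal{L}_{\mathrm{st}}$ correspond to the temporal and spatiotemporal branches, respectively, and $\lambda_1, \lambda_2$ are weighting coefficients that control their relative contributions. In this study, we set $\lambda_1$ and $\lambda_2$ to balance the two paths. During inference, the final prediction $\hat{y}_{T+1}$ is obtained as the weighted average of the two branches: $\hat{y}_{T+1} = w\, \hat{y}^{\mathrm{temp}}_{T+1} + (1 - w)\, \hat{y}^{\mathrm{st}}_{T+1}$ with $w \in [0, 1]$. This design ensures that both the temporal and spatiotemporal perspectives contribute to the final result, thereby improving stability and robustness.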
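The dual-branch objective and the inference-time combination can be sketched as follows. The equal weights of 0.5 are placeholders chosen for the example, not values taken from the paper, and the function names and toy numbers are hypothetical.

```python
import numpy as np

def mse(y_true, y_pred) -> float:
    """Mean squared error between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def dual_branch_loss(y_true, y_temp, y_st, w_temp=0.5, w_st=0.5) -> float:
    """Weighted sum of the per-branch MSE losses (weights are placeholders)."""
    return w_temp * mse(y_true, y_temp) + w_st * mse(y_true, y_st)

def combine(y_temp, y_st, w_temp=0.5, w_st=0.5) -> np.ndarray:
    """Weighted average of the two branch predictions at inference time."""
    y_temp = np.asarray(y_temp, dtype=float)
    y_st = np.asarray(y_st, dtype=float)
    return (w_temp * y_temp + w_st * y_st) / (w_temp + w_st)

# Toy obesity rates in percent (synthetic values for illustration only).
y_true = [30.0, 31.0]
y_temp = [29.0, 32.0]   # temporal-branch predictions
y_st = [31.0, 31.0]     # spatiotemporal-branch predictions

loss = dual_branch_loss(y_true, y_temp, y_st)
y_hat = combine(y_temp, y_st)
print(loss, y_hat)
```

Training both branches against the same label and averaging them at inference is a simple form of ensembling: errors that the two branches make in opposite directions partly cancel in the combined prediction.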
4 Dataset and evaluation metrics
4.1 Dataset
This study uses data from the Behavioral Risk Factor Surveillance System (BRFSS) of the Centers for Disease Control and Prevention (CDC), which covers public health information related to adult diet, physical activity, and weight status. The data are included in the Data, Trends, and Maps database of the Division of Nutrition, Physical Activity, and Obesity (DNPAO), providing trends on obesity, nutrition, physical activity, and breastfeeding at both the national and state levels. In this study, we selected data from ten states for analysis, namely Alaska (AK), Alabama (AL), Arkansas (AR), Arizona (AZ), California (CA), Colorado (CO), Connecticut (CT), District of Columbia (DC), Delaware (DE), and Florida (FL). Table 2 provides a detailed description of the dataset features. In terms of data preprocessing, we first examined missing values and found that each state contained approximately 10% missing entries, which were randomly distributed. To ensure data integrity, we applied mean imputation for continuous variables and mode imputation for categorical ones. Outliers were detected using the interquartile range method and subsequently replaced with the mean value of the corresponding feature. Finally, all features were normalized using z-score standardization to improve training stability.
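The preprocessing steps described above (imputation, IQR-based outlier replacement, z-score standardization) could look roughly like the following pandas sketch. The 1.5 × IQR fence and the use of the inlier mean as the replacement value are conventional choices assumed here, since the text does not specify them; the function name and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd

def preprocess(df: pd.DataFrame, numeric_cols, categorical_cols) -> pd.DataFrame:
    """Impute, replace IQR outliers, and z-score numeric columns; mode-impute categoricals."""
    out = df.copy()
    for col in numeric_cols:
        out[col] = out[col].astype(float)
        out[col] = out[col].fillna(out[col].mean())           # mean imputation
        q1, q3 = out[col].quantile(0.25), out[col].quantile(0.75)
        iqr = q3 - q1
        # Assumed fence: the conventional 1.5 * IQR rule.
        mask = (out[col] < q1 - 1.5 * iqr) | (out[col] > q3 + 1.5 * iqr)
        # Assumed replacement: mean of the non-outlier values.
        out.loc[mask, col] = out.loc[~mask, col].mean()
        std = out[col].std(ddof=0)
        out[col] = (out[col] - out[col].mean()) / std if std > 0 else 0.0
    for col in categorical_cols:
        out[col] = out[col].fillna(out[col].mode().iloc[0])   # mode imputation
    return out

# Synthetic rows using BRFSS-style column names (values are made up).
df = pd.DataFrame({
    "Data_Value": [30.0, 31.0, np.nan, 32.0, 95.0, 31.5, 30.5, 31.2],
    "Stratification1": ["Male", "Female", None, "Male", "Female", "Male", "Female", "Male"],
})
clean = preprocess(df, ["Data_Value"], ["Stratification1"])
print(clean)
```

To avoid information leaking from the test period, the imputation means, outlier fences, and standardization statistics would normally be estimated on the training portion only and then applied to the held-out data.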
In terms of feature engineering, we selected Data_Value (obesity rate) as the primary prediction target, while YearStart, Sample_Size, and Stratification1 were retained as auxiliary explanatory variables. To capture temporal dynamics, we further constructed lag features (one- and two-year lags), first-order and second-order differences, and rolling means with windows of three and five years. These derived features were designed to represent short-term fluctuations, long-term trends, and seasonal effects in obesity rates. All categorical variables such as Stratification1 were converted into one-hot encodings to ensure compatibility with the model. We chose this feature set because it balances interpretability with predictive power, and preliminary correlation analysis confirmed that these engineered variables had significant associations with the target variable.
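The derived temporal features can be built with pandas `shift`, `diff`, and `rolling`. In the sketch below, differences and rolling means are computed from past values only (the series shifted by one year), an assumption made here so that the features for a given year never include that year's target; the paper does not state how this was handled. The yearly values are synthetic.

```python
import pandas as pd

def add_temporal_features(df: pd.DataFrame, target: str = "Data_Value",
                          year: str = "YearStart") -> pd.DataFrame:
    """Add lag, difference, and rolling-mean features for one state's yearly series."""
    out = df.sort_values(year).reset_index(drop=True)
    y = out[target]
    past = y.shift(1)                                  # value from the previous year
    out["lag_1"] = past
    out["lag_2"] = y.shift(2)
    out["diff_1"] = past.diff()                        # first-order difference of past values
    out["diff_2"] = past.diff().diff()                 # second-order difference
    out["roll_mean_3"] = past.rolling(3).mean()        # mean of the previous 3 years
    out["roll_mean_5"] = past.rolling(5).mean()        # mean of the previous 5 years
    return out

# Synthetic yearly obesity rates (percent) for a single hypothetical state.
df = pd.DataFrame({
    "YearStart": [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018],
    "Data_Value": [27.0, 27.5, 28.1, 28.0, 28.9, 29.4, 30.1, 30.0],
})
feats = add_temporal_features(df)
print(feats)
```

The first rows contain missing values because not enough history is available; they are typically dropped before training. Categorical columns such as Stratification1 can be one-hot encoded with `pandas.get_dummies`.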
In addition, we summarize the number of test samples for each state in Table 3. Reporting the sample size ensures transparency of the evaluation process, as the reliability of statistical tests and confidence intervals can be influenced by the scale of the test data. The variation in the number of samples across states also highlights potential heterogeneity in data availability, which may affect model robustness.
4.2 Evaluation metric
In the evaluation of model performance, we selected five commonly used metrics to measure the accuracy and stability of the predictions. The specific definitions are as follows.
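Mean Absolute Error (MAE) measures the average absolute difference between predicted values and true values. A smaller value indicates higher prediction accuracy:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
Root Mean Squared Error (RMSE) is more sensitive to larger errors and reflects the overall level of prediction bias:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^{2}}$$
Symmetric Mean Absolute Percentage Error (sMAPE) is a relative error measure that eliminates the effect of different scales, defined as:
$$\mathrm{sMAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \frac{\left| y_i - \hat{y}_i \right|}{\left( |y_i| + |\hat{y}_i| \right) / 2}$$
The Coefficient of Determination ($R^2$) measures the proportion of variance in the true values explained by the model. Its range is $(-\infty, 1]$, and values closer to 1 indicate better fitting performance:
$$R^{2} = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^{2}}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^{2}}$$
Mean Absolute Scaled Error (MASE) normalizes the error by comparing it with the error of the naive forecasting method, facilitating comparisons across different time series:
$$\mathrm{MASE} = \frac{\frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|}{\frac{1}{T-1} \sum_{t=2}^{T} \left| y_t - y_{t-1} \right|}$$
Here $y_i$ and $\hat{y}_i$ are the true and predicted values of the $i$-th test sample, $n$ is the number of test samples, $\bar{y}$ is the mean of the true values, and the MASE denominator is the mean absolute error of the naive one-step forecast on the training series of length $T$.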
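The five metrics can be computed directly with NumPy. The sketch below uses the standard textbook definitions (sMAPE in its common form with the mean of absolute values in the denominator, MASE scaled by the in-sample naive one-step forecast); the toy values are synthetic.

```python
import numpy as np

def mae(y, yhat):
    return float(np.mean(np.abs(np.asarray(y, float) - np.asarray(yhat, float))))

def rmse(y, yhat):
    return float(np.sqrt(np.mean((np.asarray(y, float) - np.asarray(yhat, float)) ** 2)))

def smape(y, yhat):
    """Symmetric MAPE in percent."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(100.0 * np.mean(np.abs(y - yhat) / ((np.abs(y) + np.abs(yhat)) / 2.0)))

def r2(y, yhat):
    """Coefficient of determination; can be negative for poor fits."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def mase(y, yhat, y_train):
    """MAE scaled by the in-sample MAE of the naive (previous-value) forecast."""
    naive_mae = np.mean(np.abs(np.diff(np.asarray(y_train, float))))
    return float(mae(y, yhat) / naive_mae)

# Synthetic example values (percent).
y_train = [28.0, 29.0, 31.0, 30.0]
y_test = [30.0, 32.0, 31.0, 33.0]
y_pred = [29.0, 33.0, 31.0, 35.0]

print(mae(y_test, y_pred), rmse(y_test, y_pred), smape(y_test, y_pred),
      r2(y_test, y_pred), mase(y_test, y_pred, y_train))
```

In this toy example the residual sum of squares (6) exceeds the total sum of squares (5), so R² is negative, illustrating why its range is unbounded below. A MASE below 1 means the model beats the naive previous-year forecast on average.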
5 Experimental results and analysis
5.1 Experimental setup
In the experimental process, we modeled and predicted obesity rate data from ten states based on the aforementioned improved spatiotemporal attention model. The model input consisted of time series data mapped through embedding, which were jointly modeled using an attention mechanism combining temporal embeddings and spatial constraints, and finally processed by a prediction function to output the estimated obesity rates. During training, we consistently adopted Mean Squared Error (MSE) as the optimization objective, and applied state-specific independent modeling to ensure that the experimental results could faithfully reflect the dynamic characteristics of different regions.
To guarantee reproducibility and fairness of the experiments, we maintained consistent hyperparameter settings across all experiments. Specifically, the model was trained with a fixed learning rate and batch size, while the number of iterations was controlled within a reasonable range to avoid overfitting. In addition, appropriate hidden dimensions were set for both the attention mechanism and the feed-forward network. More concretely, the overall architecture consisted of two stacked spatiotemporal attention layers, each followed by a feed-forward sub-layer. The multi-head attention module used 8 heads, with each head operating on a sub-dimension of 16 (total embedding dimension 128). The feed-forward network contained two fully connected layers with dimensions 256 and 128, respectively. We applied the ReLU activation function after each feed-forward layer, and a dropout rate of 0.1 was used to improve generalization. Layer normalization was applied after both the attention and feed-forward sub-layers. The main hyperparameter configurations are presented in Table 4.
5.2 Comparative experimental results
To validate the effectiveness of the proposed method, this study selected a variety of representative time series forecasting models for comparative experiments. The baseline models include traditional neural network approaches (MLP [34], LSTM [35], 1D-CNN [36], BiLSTM [37]), as well as several recently proposed advanced architectures (iTransformer [38], TimeMixer [39], Mamba [40], and LSTM-Transformer [41]). By comparing the performance of different models under the same data and task settings, we can more comprehensively evaluate the advantages and applicability of the proposed method in obesity rate prediction. The average evaluation metrics are shown in Table 5. The complete state-wise experimental results are provided in Table 12 in the Appendix.
Results are averaged over three random seeds; the same applies to all subsequent tables.
From the overall comparison, it can be observed that traditional neural network methods such as MLP, LSTM, and BiLSTM are able to capture certain temporal dependencies in obesity rate prediction, but they still suffer from insufficient accuracy when dealing with complex temporal dynamics. Models based on convolution or improved architectures (e.g., 1D-CNN, Mamba, LSTM-Transformer) achieve further improvements in error control and stability, indicating that deeper modeling of time series is of significant value for forecasting public health data. Meanwhile, recently proposed architectures such as iTransformer and TimeMixer outperform traditional methods on most metrics, demonstrating the potential of novel temporal architectures in health trend prediction.
Among all methods, the proposed improved spatiotemporal attention model achieves the best performance, obtaining the lowest values in MAE, RMSE, sMAPE, and MASE, while reaching the highest score in R2. These results indicate that the proposed method not only provides more accurate predictions of obesity rate trends but also offers more reliable evidence for public health interventions. In addition, when analyzing the results across different states, we observe that the predictive performance is not uniformly distributed. For example, states with relatively stable obesity trends exhibit lower variance in prediction errors, whereas states with stronger fluctuations or more irregular patterns show larger deviations. This suggests that regional characteristics, including population structure, lifestyle, and survey sample size, contribute to heterogeneous prediction difficulty. By explicitly comparing these state-level differences, our model demonstrates robustness in both stable and volatile regions, further supporting its generalization ability.
Beyond numerical superiority, we further analyze the reasons behind the performance gains. Compared to LSTM and BiLSTM, our method better captures long-term dependencies through the temporal embedding mechanism and multi-head attention, avoiding gradient decay issues inherent in recurrent models. Relative to iTransformer, the introduction of the Spatiotemporal Attention module allows our model to incorporate cross-feature relational constraints, which is particularly beneficial for capturing interactions among lag, difference, and rolling mean features that iTransformer tends to treat independently. In comparison with TimeMixer, our Joint Temporal-Health encoding explicitly integrates temporal signals with domain-specific health indicators, thereby strengthening the alignment between feature dynamics and health outcomes. These design choices collectively explain why the proposed model not only improves error metrics but also provides stronger interpretability and adaptability across diverse state contexts.
In real-world scenarios, this means that relevant authorities can identify potential risk distributions of obesity earlier and more precisely, thereby formulating more scientific nutrition interventions and physical activity promotion policies, ultimately providing strong support for public health management and disease prevention. The experimental results of the significance test (paired t-test) are further given, as shown in Table 6.
Results marked with * indicate p < 0.05.
In the results of the statistical significance tests, it can be observed that our method demonstrates significant advantages over most baseline models in key metrics such as MAE, sMAPE, and MASE (p<0.05), indicating that the performance improvement is not due to random fluctuations but represents stable and reliable gains. At the same time, for certain metrics such as RMSE and R2, the differences compared with the strongest baseline (e.g., TimeMixer) do not reach the level of statistical significance, suggesting that there is still room for further optimization in specific scenarios. Overall, the significance analysis verifies the effectiveness and robustness of the proposed method across multiple dimensions, providing statistical support for its practical applicability.
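A paired t-test of this kind can be run with `scipy.stats.ttest_rel`. The sketch below pairs per-state MAE values of a baseline and the proposed model; the pairing unit and all numbers are synthetic assumptions for illustration and are not results from this study.

```python
import numpy as np
from scipy import stats

# Synthetic per-state MAE values for ten states (not taken from the paper).
baseline_mae = np.array([1.10, 1.25, 0.98, 1.30, 1.15, 1.05, 1.20, 1.12, 1.08, 1.18])
proposed_mae = np.array([1.00, 1.12, 0.90, 1.15, 1.06, 0.97, 1.08, 1.03, 1.00, 1.05])

# Paired test: the same states are evaluated under both models,
# so the test is applied to the per-state differences.
t_stat, p_value = stats.ttest_rel(baseline_mae, proposed_mae)

print(f"t = {t_stat:.3f}, p = {p_value:.4g}")
print("significant at 0.05" if p_value < 0.05 else "not significant at 0.05")
```

A positive t statistic here means the baseline error is larger on average. Because several metrics and baselines are tested, a multiple-comparison correction may be worth considering when interpreting individual p-values.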
5.3 Ablation experiment results
To further validate the effectiveness of each proposed module, we conducted ablation experiments based on the baseline Transformer architecture. Specifically, we sequentially introduced the joint modeling of time series embedding and health indicators (JTH) module and the attention mechanism with spatiotemporal constraints (STA) module, and evaluated their respective contributions to overall performance. Finally, both modules were combined to form the complete model (Ours). This step-by-step comparison makes the role and value of each design in the obesity rate prediction task clear. Table 7 presents the average experimental results across the ten states. The complete state-wise experimental results are provided in Table 13 in the Appendix.
From the experimental results, it can be observed that the baseline Transformer is able to capture certain temporal features in the obesity rate prediction task, but the overall error remains relatively high. After introducing the Joint modeling module (+JTH), the model shows significant improvements in metrics such as MAE, RMSE, and sMAPE, indicating that jointly modeling time series embeddings and health indicators enables better characterization of long-term trends and dynamic variations, thereby enhancing prediction stability and accuracy.
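For readers who want to reproduce the evaluation metrics used throughout this section, the snippet below gives one possible set of definitions. It is a sketch under stated assumptions: sMAPE is taken in its common symmetric form expressed as a percentage, and MASE is scaled by the in-sample error of a naive last-value forecast. The definitions used in this study may differ in detail, and all numeric values are hypothetical.

```python
import math

def mae(y, yhat):
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

def rmse(y, yhat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))

def smape(y, yhat):
    # Assumed variant: 100 * mean(2|y - yhat| / (|y| + |yhat|)).
    return 100 * sum(2 * abs(a - b) / (abs(a) + abs(b)) for a, b in zip(y, yhat)) / len(y)

def r2(y, yhat):
    ybar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - ybar) ** 2 for a in y)
    return 1 - ss_res / ss_tot

def mase(y, yhat, train):
    # Assumed scaling: in-sample MAE of the naive one-step (last value) forecast.
    naive = sum(abs(train[i] - train[i - 1]) for i in range(1, len(train))) / (len(train) - 1)
    return mae(y, yhat) / naive

# Hypothetical obesity prevalence values (%), for illustration only.
train = [28.0, 29.0, 29.5, 31.0, 32.0]
y_true = [33.0, 34.0, 36.0]
y_pred = [32.5, 34.5, 35.0]

for name, value in [
    ("MAE", mae(y_true, y_pred)),
    ("RMSE", rmse(y_true, y_pred)),
    ("sMAPE", smape(y_true, y_pred)),
    ("R2", r2(y_true, y_pred)),
    ("MASE", mase(y_true, y_pred, train)),
]:
    print(f"{name}: {value:.4f}")
```

A MASE below 1 means the forecast has a smaller error than the naive last-value forecast had on the training series, which makes it convenient for comparing states with different prevalence levels.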
With the introduction of the spatiotemporally constrained attention module (+STA), model performance improves further, with a notable gain in the R2 metric, demonstrating the importance of this mechanism in capturing temporal dependencies and constraint information. When both modules are combined to form the complete model (Ours), the best performance is achieved across all evaluation metrics. This indicates that the two designs are complementary in obesity rate prediction and highlights the capability of the proposed method to support disease risk assessment and policy-making in public health data analysis. Table 8 reports the results of the paired t-test.
Asterisks indicate p<0.05.
From the results of the significance tests, it can be observed that Ours demonstrates statistically significant advantages (p < 0.05) over the three baselines (Transformer, +JTH, and +STA) across all five metrics (MAE, RMSE, sMAPE, R2, and MASE). This indicates that the improvements in overall prediction accuracy and robustness achieved by the proposed method are not due to random fluctuations but represent stable and reliable gains. Overall, these findings further validate the effectiveness and necessity of the proposed approach under different module combinations.
5.4 Hyperparameter sensitivity experiment results
We further examine the sensitivity of the model to two hyperparameters: the embedding dimension and the number of attention heads. The experimental results are shown in Tables 9 and 10.
This set of sensitivity experiments shows that, for the choice of attention heads, the model performance improves steadily as the number of heads increases from 2 to 8. When the number of attention heads reaches 8, the evaluation metrics achieve the best values. This indicates that adding more heads enables the model to capture temporal dependencies in different subspaces more effectively, thereby enhancing its ability to model complex dynamic patterns.
For the embedding dimension, the results demonstrate a clear improvement when increasing from 64 to 128, with the best performance obtained at 128 dimensions. Further increasing the dimension to 256 and 512 leads to slight performance degradation, suggesting that overly large dimensions may introduce redundant features and increase the risk of overfitting. Therefore, these experiments indicate that the optimal hyperparameter configuration is attention heads = 8 and embedding dimension = 128.
5.5 Scatter plot visualization results
To more intuitively demonstrate the correspondence between the predicted values and the ground truth, scatter plots were generated based on the data from ten states. This visualization provides a clear reflection of the model’s fitting performance and its consistency with the overall trend in the obesity rate prediction task. The experimental results are shown in Fig 5.
By plotting the data from different states, the model’s fitting performance and overall trend consistency in obesity rate prediction can be intuitively demonstrated.
From the scatter plot distributions of the ten states, it can be observed that the predicted values and the ground truth generally follow a trend close to the diagonal line, indicating that the model is able to effectively capture the variation patterns of obesity rates. Although the point cloud distributions differ slightly across states, reflecting regional characteristics and fluctuations inherent in health data, the overall alignment with the diagonal remains strong, suggesting that the method achieves reliable predictive accuracy and generalization across multiple regions.
Further examination of the scatter clustering reveals that most points are concentrated in the medium-to-low value range and closely align with the reference diagonal line. This indicates that the model performs more robustly within the main distribution range of obesity rates. However, at higher observed values a slight downward deviation of predictions can be seen, implying that the model tends to underestimate extreme obesity rates. While a few outliers still exist in certain states, their impact on the overall trend is limited. Acknowledging this limitation, the model nonetheless demonstrates strong overall spatiotemporal modeling capability and provides interpretable evidence to support public health research, assisting policymakers in designing targeted intervention measures across states.
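The underestimation at high observed values described above can also be checked numerically rather than visually. The following sketch computes the mean residual (prediction minus observation) within bins of the observed value. The helper name `mean_residual_by_bin`, the bin edges, and the data are all hypothetical and chosen only to illustrate the check.

```python
def mean_residual_by_bin(y_true, y_pred, edges):
    """Mean of (prediction - observation) within each observed-value bin.

    Bins are half-open [edges[i], edges[i+1]), except the last, which also
    includes its upper edge. Empty bins yield None.
    """
    n_bins = len(edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for y, p in zip(y_true, y_pred):
        for i in range(n_bins):
            is_last = i == n_bins - 1
            if edges[i] <= y < edges[i + 1] or (is_last and y == edges[-1]):
                sums[i] += p - y
                counts[i] += 1
                break
    return [s / c if c else None for s, c in zip(sums, counts)]

# Hypothetical observed and predicted obesity rates (%), for illustration only.
y_true = [24.0, 26.0, 28.0, 31.0, 33.0, 36.0, 38.0, 39.0]
y_pred = [24.2, 25.9, 28.1, 30.8, 33.1, 35.0, 36.8, 37.5]

bias = mean_residual_by_bin(y_true, y_pred, edges=[20, 30, 35, 40])
for low, high, b in zip([20, 30, 35], [30, 35, 40], bias):
    print(f"observed in [{low}, {high}): mean residual = {b:+.2f}")
```

A clearly negative mean residual in the highest bin, next to near-zero values in the lower bins, is the numerical counterpart of points falling below the diagonal at the upper end of the scatter plot.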
5.6 Visualization results of SHAP value importance analysis
To further explore the interpretability of the model predictions, this study employed SHAP values to visualize the importance of input variables. By illustrating the contributions of different variables to the prediction results, the analysis intuitively reflects the key factors that the model focuses on in the obesity rate prediction task, thereby providing more interpretable references for public health research. The experimental results are shown in Fig 6.
The figure illustrates the contribution of different features in obesity rate prediction, providing an intuitive basis for the interpretability analysis of the model.
From the visualization results, it can be observed that the ranking of feature contributions varies across different states, but overall, time-related variables (such as lag terms, difference terms, and rolling averages) play a dominant role. This indicates that the model relies more on the dynamic temporal patterns of the time series itself when predicting obesity rates. In particular, features such as the three-step rolling mean (roll3), the first-order difference (diff1), and the one-step lag (lag1) consistently show the highest SHAP values, highlighting that cumulative temporal effects and short-term fluctuations are the most critical signals. These findings suggest that policymakers should prioritize monitoring temporal dynamics of obesity prevalence, as both long-term accumulation and short-term variations directly drive risk changes. Furthermore, the ability to identify states with rapidly rising short-term fluctuations implies that the model can provide early-warning signals several months in advance, creating a valuable intervention window.
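To make the temporal features named above concrete, the snippet below sketches one way to derive lag1, diff1, and roll3 from a prevalence series. It rests on an assumption that is not stated in the text: every feature for step t is computed only from values before t, so that the target does not leak into its own inputs. The exact definitions used in this study may differ, and the series is hypothetical.

```python
def make_temporal_features(series):
    """Build (lag1, diff1, roll3, target) rows from a prevalence series.

    Assumption: every feature for step t uses only values before t.
    """
    rows = []
    for t in range(3, len(series)):
        rows.append({
            "lag1": series[t - 1],                   # previous value
            "diff1": series[t - 1] - series[t - 2],  # most recent first-order change
            "roll3": sum(series[t - 3:t]) / 3,       # mean of the three previous values
            "target": series[t],
        })
    return rows

# Hypothetical state-level obesity prevalence (%), for illustration only.
prevalence = [28.0, 29.0, 30.5, 30.0, 31.5, 33.0]
rows = make_temporal_features(prevalence)
for row in rows:
    print(row)
```

Because roll3 averages three past values while diff1 uses only the latest change, the two features respond to accumulation and short-term fluctuation respectively, which matches the distinction drawn in the paragraph above.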
Meanwhile, some auxiliary variables, such as sample size and the width of confidence intervals, also show some importance in different states, indicating that statistical characteristics and data stability influence the reliability of predictions. For example, states with smaller survey sample sizes or wider confidence intervals may face greater uncertainty, implying the need for stronger investment in data collection and quality assurance to ensure reliable evidence for policy use. In this regard, the model not only helps allocate resources toward high-risk states but also helps distinguish between modifiable and non-modifiable factors, enabling more targeted and efficient policy design. SHAP analysis reveals the key driving factors emphasized by the model across states, which both enhances the interpretability of the method and provides data support for public health research. By explicitly linking these explanatory results with decision-making, the analysis helps policymakers distinguish between universally important temporal drivers and state-specific statistical conditions.
5.7 SHAP dependency graph analysis
We also select AK, AZ, and AR and present SHAP dependence plots for the three most important features in each state. The experimental results are shown in Fig 7.
The dependence plots in the figure show the effects of the three most important features. Across feature values, the corresponding SHAP values exhibit relatively stable monotonic trends, indicating that these features provide clear directional contributions in obesity rate prediction. In other words, as the values of key features increase, their positive impact on the prediction results also strengthens, thereby enhancing the model’s sensitivity to temporal variations.
At the same time, the color of the points represents the values of another interacting feature, revealing potential coupling relationships between different features. At the same level of the main feature, variations in the interacting feature lead to slight differences in SHAP values, suggesting that the model does not rely solely on a single feature for prediction but instead considers the interactions among multiple features. Such interactions are particularly important for explaining complex public health data, as obesity rates are often influenced by multiple factors simultaneously. The SHAP analysis further uncovers nonlinear effects and heterogeneous impacts across states. For example, the same lag feature may contribute positively in states with stable obesity trends but show weaker or even negative marginal effects in highly fluctuating states. This indicates that the model captures not only the global monotonic influence of features but also context-dependent variations, which provides a more fine-grained understanding of how temporal and health-related variables jointly drive obesity dynamics.
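The interaction effect described above can be illustrated on a toy example. The sketch below computes exact Shapley values by enumerating feature orderings, with absent features replaced by a single baseline value. This is a simplification of what SHAP tooling does (which typically averages over a background dataset), and `toy_model` is a hypothetical predictor with a lag1 and diff1 interaction term, not the model proposed in this paper.

```python
from itertools import permutations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, with absent features set to baseline."""
    names = list(x)
    phi = {name: 0.0 for name in names}
    for order in permutations(names):
        current = dict(baseline)
        prev = f(current)
        for name in order:
            current[name] = x[name]
            value = f(current)
            phi[name] += value - prev  # marginal contribution in this ordering
            prev = value
    n_orders = factorial(len(names))
    return {name: total / n_orders for name, total in phi.items()}

def toy_model(v):
    # Hypothetical predictor with a lag1 x diff1 interaction; not the paper's model.
    return 0.6 * v["lag1"] + 0.3 * v["roll3"] + 0.1 * v["lag1"] * v["diff1"]

baseline = {"lag1": 30.0, "diff1": 0.0, "roll3": 30.0}
x_stable = {"lag1": 32.0, "diff1": 0.0, "roll3": 31.0}  # flat recent trend
x_rising = {"lag1": 32.0, "diff1": 1.0, "roll3": 31.0}  # rising recent trend

phi_stable = shapley_values(toy_model, x_stable, baseline)
phi_rising = shapley_values(toy_model, x_rising, baseline)
print("stable:", phi_stable)
print("rising:", phi_rising)
```

In this toy setting lag1 has the same value in both inputs, yet its attribution differs because diff1 differs. That is the kind of vertical spread at a fixed feature value that the point colors in a dependence plot are meant to explain.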
Furthermore, the distributions of different features are shown in the gray histograms at the bottom, indicating that the model performs robustly within the main sample distribution regions, while some uncertainty may exist at extreme values. This highlights that while the model is stable within the majority data range, the SHAP dependence plots also expose boundary conditions where predictions are less reliable, thereby offering valuable guidance for identifying risk scenarios that require additional public health attention. Overall, this experimental result not only validates the interpretability of the model but also provides a powerful tool for revealing the associations between risk factors and outcomes in the field of public health.
5.8 Independent validation experiments and analysis
This paper concludes with independent validation experiments conducted on additional states. Specifically, the model was first trained on all ten states and then tested on unseen states. We selected Guam (GU), Idaho (ID), Massachusetts (MA), Mississippi (MS), North Carolina (NC), and South Carolina (SC) for testing. The results are presented in Table 11.
From the table, it can be observed that the model achieves overall stable prediction performance across different states, with MAE maintained between 2.0–2.3 and all R2 values greater than 0.84. This indicates that the proposed method retains strong generalization ability under the complex spatiotemporal structures of public health data. Notably, Idaho (ID) and Mississippi (MS) show slightly lower errors and relatively higher predictive fit, reflecting the model’s adaptability to different regional feature distributions. In contrast, North Carolina (NC) and Massachusetts (MA) present slightly higher errors, suggesting that local population characteristics or social-environmental factors may increase modeling difficulty.
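The hold-out protocol described above, in which entire states are excluded from training, can be sketched as follows. The record layout, the helper names `split_by_state` and `per_state_mae`, the numeric values, and the placeholder drift predictor are all hypothetical. The predictor stands in for the trained model only to keep the example self-contained.

```python
from collections import defaultdict

def split_by_state(records, held_out):
    """Separate records so that held-out states never appear in training."""
    train = [r for r in records if r["state"] not in held_out]
    test = [r for r in records if r["state"] in held_out]
    return train, test

def per_state_mae(records, predict):
    """Mean absolute error of `predict`, grouped by state."""
    errors = defaultdict(list)
    for r in records:
        errors[r["state"]].append(abs(r["target"] - predict(r)))
    return {state: sum(e) / len(e) for state, e in errors.items()}

# Hypothetical records (values in %), for illustration only.
records = [
    {"state": "AK", "lag1": 30.0, "target": 31.0},
    {"state": "AL", "lag1": 35.0, "target": 36.5},
    {"state": "GU", "lag1": 29.0, "target": 30.0},
    {"state": "GU", "lag1": 30.0, "target": 32.0},
    {"state": "ID", "lag1": 28.0, "target": 28.5},
]
train, test = split_by_state(records, held_out={"GU", "ID"})

# Placeholder predictor: last value plus the mean increment seen in training.
drift = sum(r["target"] - r["lag1"] for r in train) / len(train)
scores = per_state_mae(test, lambda r: r["lag1"] + drift)
print(scores)
```

Splitting by state rather than by row ensures that no observation from a test state influences fitting, which is what makes the reported per-state errors a measure of generalization to unseen regions.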
In the context of public health, these prediction results represent more than numerical differences; they also reflect the complexity of obesity rate variations and the multifactorial drivers across regions. The model’s consistently high predictive accuracy across multiple states provides a reliable decision-support tool for relevant authorities. For instance, in regions with relatively higher obesity rates and slightly larger prediction errors, intervention policies can be prioritized and resources better allocated based on model outputs, thereby enabling more precise and effective public health management.
6 Discussion
The findings of this study highlight the value of incorporating spatiotemporal modeling into obesity rate prediction. While traditional models can capture general temporal trends, the independent contributions of the Joint Temporal-Health encoding and the Spatiotemporal Attention module emphasize that obesity dynamics cannot be fully understood without explicitly modeling both time-evolving patterns and cross-feature interactions. This suggests that the obesity epidemic reflects not only gradual changes over time but also the complex interplay between health indicators, demographics, and survey characteristics. Our results therefore provide evidence that advanced temporal architectures can enhance the reliability of public health forecasting.
Moreover, the SHAP-based interpretability analysis reveals important insights into how specific features contribute to prediction outcomes. The monotonic relationships observed in dependence plots confirm that temporal lag and difference features exert stable directional effects, while state-level variations indicate that regional factors may influence prediction difficulty. Although some fluctuations remain in extreme cases, these observations point to the practical significance of explainable modeling: public health authorities can better understand not just the accuracy of forecasts, but also the underlying drivers of obesity risk. Overall, the discussion underscores that the proposed approach not only improves predictive accuracy but also enriches interpretability in ways that are meaningful for real-world health management.
Compared with existing research, our approach builds upon and extends prior findings in time series forecasting and health trend analysis. Recurrent models such as LSTM or BiLSTM focus on sequential dependencies but suffer from gradient decay over long sequences, while convolutional architectures like 1D-CNN improve local feature extraction but lack global interpretability. More recent methods such as iTransformer and TimeMixer emphasize efficient attention or token mixing, yet they primarily treat features as independent inputs. In contrast, our JTH and STA modules introduce explicit cross-feature and temporal-health interactions, providing a more domain-aware representation. This represents a fundamental difference rather than an incremental adjustment, as it embeds health-specific priors directly into the modeling process.
At the same time, it is important to acknowledge the limitations of our work. Compared with generic models such as TimeMixer, our method requires additional preprocessing and domain-specific feature engineering, which may reduce flexibility in unseen application areas. Furthermore, while JTH and STA provide improved accuracy, the complexity of the model is higher than lightweight baselines, potentially increasing training cost. These trade-offs highlight that our contribution is not solely numerical improvement, but rather a balance between predictive performance, interpretability, and domain relevance. By situating our method alongside both traditional and modern baselines, the discussion clarifies which aspects are incremental optimizations and which constitute fundamental innovations.
7 Conclusion
This study proposes an improved Transformer model that integrates temporal embeddings with a spatially constrained attention mechanism for state-level spatiotemporal prediction of obesity prevalence. Comparative experiments and ablation studies conducted across ten states, together with independent validation on six unseen states, demonstrate that the proposed method outperforms various mainstream models in terms of MAE, RMSE, sMAPE, R2, and MASE, while maintaining stable predictive performance across different regions. Furthermore, model interpretability was assessed using the SHAP method, revealing the influence patterns of diverse health indicators on obesity prediction, thereby providing data-driven references for public health policymaking.
Looking forward, the approach presented in this study can be further extended to larger-scale public health datasets, such as nationwide long-term dynamic monitoring or more fine-grained population stratification prediction. In addition, future research may integrate multi-source heterogeneous data (e.g., environmental, socioeconomic, and lifestyle factors) and explore advanced methodologies such as graph neural networks and causal inference to more deeply characterize the complex driving mechanisms of obesity prevalence. By connecting model predictions with policy simulations, the framework offers clearer and more practical guidance for public health authorities when designing targeted intervention strategies.
Appendix A. Appendix
Appendix A.1. Comparison of forecast results for ten states
References
- 1. World Health Organization. WHO European regional obesity report 2022. 2022.
- 2. Chukwuonye II, Ohagwu KA, Ogah OS, John C, Oviasu E, Anyabolu EN, et al. Prevalence of overweight and obesity in Nigeria: systematic review and meta-analysis of population-based studies. PLOS Glob Public Health. 2022;2(6):e0000515. pmid:36962450
- 3. GBD 2021 Adolescent BMI Collaborators. Global, regional, and national prevalence of child and adolescent overweight and obesity, 1990-2021, with forecasts to 2050: a forecasting study for the Global Burden of Disease Study 2021. Lancet. 2025;405(10481):785–812. pmid:40049185
- 4. Okati-Aliabad H, Ansari-Moghaddam A, Kargar S, Jabbari N. Prevalence of obesity and overweight among adults in the middle east countries from 2000 to 2020: a systematic review and meta-analysis. J Obes. 2022;2022:8074837. pmid:35154826
- 5. Tulp OL, Obidi OF, Oyesile TC, Einstein GP. The prevalence of adult obesity in Africa: a meta-analysis. Gene Reports. 2018;11:124–6.
- 6. GBD 2021 Adult BMI Collaborators. Global, regional, and national prevalence of adult overweight and obesity, 1990-2021, with forecasts to 2050: a forecasting study for the Global Burden of Disease Study 2021. Lancet. 2025;405(10481):813–38. pmid:40049186
- 7. NCD Risk Factor Collaboration (NCD-RisC). Worldwide trends in underweight and obesity from 1990 to 2022: a pooled analysis of 3663 population-representative studies with 222 million children, adolescents, and adults. Lancet. 2024;403(10431):1027–50. pmid:38432237
- 8. Bentham J, Di Cesare M, Bilano V, Boddy LM, et al. Worldwide trends in children’s, adolescents’ body mass index, underweight, obesity, in comparison with adults and from 1975 to 2016: a pooled analysis of 2,416 population-based measurement studies with 128.9 million participants. Lancet. 2017.
- 9. GBD 2015 Obesity Collaborators, Afshin A, Forouzanfar MH, Reitsma MB, Sur P, Estep K, et al. Health effects of overweight and obesity in 195 countries over 25 years. N Engl J Med. 2017;377(1):13–27. pmid:28604169
- 10. Ng M, Dai X, Cogen RM, Abdelmasseh M, Abdollahi A, Abdullahi A. National-level, state-level prevalence of overweight, obesity among children, adolescents and adults in the USA 1990–2021, and forecasts up to 2050. Lancet. 2024;404(10469):2278–98.
- 11. Wang L, Zhou B, Zhao Z, Yang L, Zhang M, Jiang Y, et al. Body-mass index, obesity in urban and rural China: findings from consecutive nationally representative surveys during 2004–18. Lancet. 2021;398(10294):53–63. pmid:34217401
- 12. Gao L, Peng W, Xue H, Wu Y, Zhou H, Jia P, et al. Spatial-temporal trends in global childhood overweight and obesity from 1975 to 2030: a weight mean center and projection analysis of 191 countries. Global Health. 2023;19(1):53. pmid:37542334
- 13. Guo C, Wang H, Feng G, Li J, Su C, Zhang J, et al. Spatiotemporal predictions of obesity prevalence in Chinese children and adolescents: based on analyses of obesogenic environmental variability and Bayesian model. Int J Obes (Lond). 2019;43(7):1380–90. pmid:30568273
- 14. Tong Z, Zhang H, Yu J, Jia X, Hou X, Kong Z. Spatial-temporal evolution of overweight and obesity among Chinese adolescents from 2016 to 2020. iScience. 2024;27(1).
- 15. Azanaw MM, Zewde EA, Gebremariam AD, Dagnaw FT, Asnakew DT, Chanie ES, et al. Spatiotemporal distribution and determinants of overweight or obesity among urban women in Ethiopia: a multivariate decomposition analysis. BMC Womens Health. 2022;22(1):494. pmid:36471341
- 16. Shiri MS, Karami H, Ghanbarnezhad A, Bordbar N, Mouseli A, Emamgholipour S. National and subnational trends in obesity prevalence in Iran: a Spatiotemporal study with future predictions. Sci Rep. 2025;15(1):17664. pmid:40399344
- 17. Grimaccia E, Rota L. Spatiotemporal analysis of obesity: the case of Italian regions. Obesities. 2025;5(2):37.
- 18. Xie F, Zhang Z, Li L, Zhou B, Tan Y. EpiGNN: Exploring spatial transmission with graph neural network for regional epidemic forecasting. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. 2022. p. 469–85.
- 19. Çolak H. Future projections of elderly obesity in the United States using time series models. Obesity Medicine. 2025;56:100627.
- 20. Dahu BM, Khan S, Toubal IE, Alshehri M, Martinez-Villar CI, Ogundele OB, et al. Geospatial modeling of deep neural visual features for predicting obesity prevalence in Missouri: quantitative study. JMIR AI. 2024;3:e64362. pmid:39688897
- 21. Rota L, Argiento R, Cameletti M. Modeling spatio-temporal dynamics of obesity in Italian regions via Bayesian beta regression. arXiv preprint 2025.
- 22. Allen B. An interpretable machine learning model of cross-sectional U.S. county-level obesity prevalence using explainable artificial intelligence. PLoS One. 2023;18(10):e0292341. pmid:37796874
- 23. Görmez Y, Yagin FH, Yagin B, Aygun Y, Boke H, Badicu G, et al. Prediction of obesity levels based on physical activity and eating habits with a machine learning model integrated with explainable artificial intelligence. Front Physiol. 2025;16:1549306. pmid:40740428
- 24. Du J, Yang S, Zeng Y, Ye C, Chang X, Wu S. Visualization obesity risk prediction system based on machine learning. Sci Rep. 2024;14(1):22424. pmid:39342032
- 25. Khater T, Tawfik H, Singh B. Explainable artificial intelligence for investigating the effect of lifestyle factors on obesity. Intelligent Systems with Applications. 2024;23:200427.
- 26. Lin W, Shi S, Huang H, Wen J, Chen G. Predicting risk of obesity in overweight adults using interpretable machine learning algorithms. Front Endocrinol (Lausanne). 2023;14:1292167. pmid:38047114
- 27. Khater T, Tawfik H, Sowdagar S, Singh B. Interpretable models for ML-based classification of obesity. In: Proceedings of the 2023 7th International Conference on Cloud and Big Data Computing, 2023. p. 40–7. https://doi.org/10.1145/3616131.3616137
- 28. Phan-Vo TL. An interpretable prediction model for obesity prediction using EHR data. 2019.
- 29. Cho HN, Ahn I, Gwon H, Kang HJ, Kim Y, Seo H, et al. Explainable predictions of a machine learning model to forecast the postoperative length of stay for severe patients: machine learning model development and evaluation. BMC Med Inform Decis Mak. 2024;24(1):350. pmid:39563368
- 30. Amarasinghe K, Rodolfa KT, Lamba H, Ghani R. Explainable machine learning for public policy: use cases, gaps, and research directions. Data & Policy. 2023;5.
- 31. Gupta M, Phan T-LT, Bunnell HT, Beheshti R. Obesity prediction with EHR data: a deep learning approach with interpretable elements. ACM Trans Comput Healthc. 2022;3(3):32. pmid:35756858
- 32. Deva A, Shingi S, Tiwari A, Bannur N, Jain S, White J. Interpretability of epidemiological models: the curse of non-identifiability. arXiv preprint 2021. https://arxiv.org/abs/2104.14821
- 33. Kumar S, Yu SC, Kannampallil T, Abrams Z, Michelson A, Payne PRO. Self-explaining neural network with concept-based explanations for ICU mortality prediction. In: Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics. 2022. p. 1–9. https://doi.org/10.1145/3535508.3545547
- 34. Poulton MM. Multi-layer perceptrons and back-propagation learning. In: Handbook of Geophysical Exploration: Seismic Exploration. vol. 30. Elsevier; 2001. p. 27–53.
- 35. Sahoo BB, Jha R, Singh A, Kumar D. Long short-term memory (LSTM) recurrent neural network for low-flow hydrological time series forecasting. Acta Geophys. 2019;67(5):1471–81.
- 36. Kiranyaz S, Ince T, Abdeljaber O, Avci O, Gabbouj M. 1-D convolutional neural networks for signal processing applications. In: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2019. p. 8360–4. https://doi.org/10.1109/icassp.2019.8682194
- 37. Tavakoli N. Modeling genome data using bidirectional LSTM. In: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC). 2019. https://doi.org/10.1109/compsac.2019.10204
- 38. Liu Y, Hu T, Zhang H, Wu H, Wang S, Ma L. iTransformer: Inverted transformers are effective for time series forecasting. arXiv preprint 2023. https://arxiv.org/abs/2310.06625
- 39. Wang S, Wu H, Shi X, Hu T, Luo H, Ma L. Timemixer: decomposable multiscale mixing for time series forecasting. arXiv preprint 2024.
- 40. Gu A, Dao T. Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint 2023.
- 41. Kabir MR, Bhadra D, Ridoy M, Milanova M. LSTM–transformer-based robust hybrid deep learning model for financial time series forecasting. Sci. 2025;7(1):7.