Abstract
Introduction
The growing aging population raises significant concerns about the ability of individuals to age healthily, avoiding chronic diseases and maintaining cognitive and physical functions. However, the pathways through which social determinants of health (SDOH) are associated with healthy aging remain unclear.
Methods
This retrospective cohort study uses the All of Us Research Program (AoURP) registered tier dataset v7. Eligible participants are those aged 50 and older who responded to any of the SDOH survey questions and have available EHR data. Three algorithms were trained: logistic regression (LR), multi-layer perceptron (MLP), and extreme gradient boosting (XGBoost). The outcome is healthy aging, measured by a composite score of the status for 1) comorbidities, 2) cognitive conditions, and 3) mobility function. We evaluate model performance by the area under the receiver operating characteristic curve (AUROC) and assess the fairness of the best-performing model through predictive parity. Feature importance is analyzed using SHapley Additive exPlanations (SHAP) values.
Results
Our study included 99,935 participants aged 50 and above. The mean (SD) age was 74 (9.3) years; 55,294 (55.3%) were female, 67,457 (67.5%) were White, 11,109 (11.1%) were of Hispanic ethnicity, and 44,109 (44.1%) were classified as healthy aging. Most individuals lived in their own house (64%), were married (51%), had obtained college or advanced degrees (74%), and had Medicare (56.2%). The best predictive model was XGBoost with a random oversampler, with AUROC [95% CI]: 0.793 [0.788–0.796], F1 score: 0.697 [0.692–0.701], recall: 0.739 [0.732–0.748], precision: 0.659 [0.655–0.663], and accuracy: 0.716 [0.712–0.720]. The XGBoost model achieved predictive parity, with similar positive and negative predictive values across race and sex groups (ratios 0.86–1.06). In the feature importance analysis, health insurance type ranked as the most predictive feature, followed by employment status, substance use, and health insurance coverage (yes/no).
Citation: Chen W-H, Lee Y-A, Tang H, Li C, Lu Y, Huang Y, et al. (2026) Social determinants of healthy aging: An investigation using the all of us cohort. PLoS One 21(3): e0342292. https://doi.org/10.1371/journal.pone.0342292
Editor: Bettye A. Apenteng, Georgia Southern University, UNITED STATES OF AMERICA
Received: January 8, 2025; Accepted: January 20, 2026; Published: March 6, 2026
Copyright: © 2026 Chen et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: This dataset for this study is sourced from the All of Us Research Program Registered tier dataset v7, which is not publicly available. The authors have obtained all necessary permissions to access and utilize the All of Us Research Program registered tier dataset for this study. Researchers who are interested in accessing the All of Us Research Program dataset can apply for access through the All of Us Research Hub. For more information and to initiate the application process, please visit the All of Us Research Hub website at https://www.researchallofus.org/. The code for building the models and assessing the model performance using bootstrap is shared using Github repository. https://github.com/weihanc11/Healthy-aging.git.
Funding: Dr. Guo received funding from National Institute of Health / National Institute on Aging: R01AG089445. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Aging is an inevitable biological process characterized by a gradual decline in physiological functions, leading to increased vulnerability to diseases and death. The concept of healthy aging has emerged as a critical focus in geriatric research and public health. [1–3] Healthy aging refers to the process of developing and maintaining functional abilities that enable well-being in older age. [4–6] Several theoretical frameworks have been proposed for defining and operationalizing healthy or successful aging in the gerontological literature. Rowe and Kahn [7] conceptualized successful aging as comprising low probability of disease and disease-related disability, high cognitive and physical functional capacity, and active engagement with life. The World Health Organization (WHO) similarly emphasizes physical, cognitive, and psychosocial well-being as central elements of healthy aging. [8] Given the multidimensional nature and ongoing debates about the definition of healthy aging, our study adopts a pragmatic approach by measuring healthy aging based on comorbidities, cognitive status, and mobility, which are consistent with the key domains identified in these foundational frameworks. This approach enables us to operationalize a construct of healthy aging that is both clinically relevant and comparable to prior research while remaining sensitive to the complexities inherent in the concept. Healthy aging encompasses physical, mental, cognitive, and social well-being, allowing individuals to live independently and enjoy a good quality of life despite the natural aging process. The increasing proportion of older individuals in the global population has intensified the need to understand and promote healthy aging, making it a vital area of study. [9] In 2020, nearly 1 in 6 Americans were 65 years or older, and this group is estimated to constitute 23% of the total US population in 2050. [10,11]
Social determinants of health (SDOH), the conditions in which people are born, grow, work, live, and age, play a crucial role in individuals’ health, influencing the aging process and the ability to age healthily. [12] SDOH include factors such as socioeconomic status, education, neighborhood and physical environment, employment, social support networks, and access to healthcare. [13–15] Previous studies have demonstrated that these SDOH can significantly affect an individual’s health outcomes by influencing behaviors, exposures, and access to resources necessary for maintaining health. [16–20] Individuals with higher socioeconomic status, better education, and stronger social support tend to have better health outcomes and a higher likelihood of healthy aging. [21–23] Addressing disparities in SDOH is therefore essential for promoting health equity and improving the quality of life for older adults, especially those in socioeconomically disadvantaged groups. [24–26]
Existing studies on the impact of SDOH on healthy aging are limited. [27] For instance, Sowa et al. identified a set of predictors using health surveys in Europe; however, they focused only on lifestyle and psychosocial factors and did not consider many other SDOH. [28] Meanwhile, machine learning (ML) models have shown great promise in predicting health outcomes. [29] Other studies that applied ML techniques have mainly focused on biological or physiological factors in healthy aging, [30] and none have studied SDOH.
To fill this gap, the objective of this study is to develop a prediction model of healthy aging by leveraging a large cohort of older adults from the All of Us (AoU) Research Program and advanced ML techniques. Understanding the relationship between SDOH and healthy aging holds significant clinical and policy implications. Clinically, this knowledge enables healthcare providers to create more personalized care plans that address both medical and social factors influencing a patient’s health. On the policy side, identifying key SDOH linked to healthy aging can guide targeted interventions and resource allocation, fostering public health strategies that promote healthy aging across diverse populations. We also evaluated the fairness of the ML models in predicting healthy aging, to check that they do not perpetuate existing disparities and can be applied equitably across different demographic groups. Lastly, we identified the top predictors of healthy aging using SHapley Additive exPlanations (SHAP) values, a well-established explainable ML method, which could inform the development of targeted interventions and policies to support healthy aging by addressing the most influential SDOH.
Methods
Data source and study population
We used the All of Us (AoU) Research Program registered tier dataset v7. [31] The AoU is a nationwide program that aims to provide diverse and comprehensive information among under-represented groups. The database includes survey questions (e.g., lifestyle, demographic, and social determinants of health) and electronic health records (EHR). [32] Both survey questions and EHR are standardized and can be mapped using the Observational Medical Outcomes Partnership (OMOP) Common Data Model infrastructure. [33] We included individuals aged ≥50 years who responded to any of the SDOH survey questions and had available EHR data.
Study outcome
The primary outcome is a dichotomous measure of healthy aging, derived from a composite score of the status for 1) comorbidities, 2) cognitive conditions, and 3) mobility function. First, the Charlson comorbidity index (CCI) by Quan et al. [34] was used to assess comorbidity status. We modified the original CCI algorithm to exclude age as a parameter (referred to as modified CCI [mCCI]) since our goal was to predict healthy aging. Secondly, we assessed cognitive conditions using ICD-9-CM and ICD-10-CM codes for a diagnosis of mild cognitive impairment (MCI). Lastly, to assess mobility function, we identified individuals in assisted living using CPT/HCPCS codes and records of discharge locations. An individual aged 75 or older is classified as experiencing healthy aging if they have a composite score of 0, meaning an mCCI score of 0, no MCI, and no record of assisted living. Thus, a composite score of 0 indicates that an individual is free from medical, cognitive, and functional impairments, which is how healthy aging is defined in this study.
We also defined a secondary cohort in which individuals aged 85 or older with a composite score of 0 were classified as healthy aging, and all others as non-healthy aging. Two distinct cohorts were then created for the primary and secondary outcome analyses, respectively. The secondary cohort analysis allows us to examine whether the association between SDOH and healthy aging holds when applying a more stringent definition of healthy aging. Consistent results across both cohorts would reinforce the robustness of our findings across varying definitions of healthy aging.
Study design
We adopted a retrospective cohort study design and illustrated the cohort selection process in Fig 1. Patients aged under 75 with an mCCI score of 0 were excluded from the analysis in the primary cohort. For the secondary cohort, this exclusion extended to patients aged under 85 with an mCCI score of 0.
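As an illustration, the outcome definition and cohort exclusion described above can be sketched in Python. This is a minimal sketch, not the study's code: the `Participant` record, its field names, and the rule of counting impaired domains for non-zero scores are assumptions made for illustration. Only the conditions for a score of 0, the exclusion rule, and the age thresholds (75 and 85) come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    """Hypothetical record; field names are illustrative, not AoU column names."""
    age: int
    mcci: int                 # modified Charlson index (age component removed)
    has_mci: bool             # diagnosis of mild cognitive impairment
    in_assisted_living: bool  # proxy for impaired mobility function


def composite_score(p: Participant) -> int:
    """Count impaired domains (assumed scoring); 0 means free of all three."""
    return int(p.mcci > 0) + int(p.has_mci) + int(p.in_assisted_living)


def healthy_aging_label(p: Participant, age_threshold: int = 75) -> Optional[int]:
    """Return 1 (healthy aging), 0 (non-healthy aging), or None (excluded).

    Participants younger than the threshold with an mCCI of 0 are excluded,
    following the study design; use age_threshold=85 for the secondary cohort.
    """
    if p.age < age_threshold and p.mcci == 0:
        return None
    if p.age >= age_threshold and composite_score(p) == 0:
        return 1
    return 0


# Example: the same 80-year-old is labeled differently in the two cohorts.
person = Participant(age=80, mcci=0, has_mci=False, in_assisted_living=False)
primary_label = healthy_aging_label(person)                      # 1
secondary_label = healthy_aging_label(person, age_threshold=85)  # None (excluded)
```

Keeping the age threshold as a parameter lets one function serve both cohorts, which mirrors how the secondary analysis only tightens the age criterion.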
Potential risk factors
Potential risk factors (i.e., input features) were SDOH information collected from multifaceted survey questions, including The Basics (demographic information), Lifestyle (smoking, alcohol use, substance use, etc.), Healthcare Access & Utilization (access to and use of health care resources), and Social Factors (neighborhood, social life, stress, etc.). Self-reported race and gender were also recorded and included in the analysis. We reported the counts (percentages) for categorical variables and median (interquartile range) for continuous variables.
Statistical analysis
We aimed to use SDOH features to develop a machine learning model to predict healthy aging. Three machine learning algorithms were applied: logistic regression (LR), multi-layer perceptron (MLP), and Extreme Gradient Boosting [35] (XGBoost). Regularization was employed in both logistic regression (lasso [L1] [36], ridge [L2] [37], and ElasticNet [38]) and XGBoost (alpha [L1] and lambda [L2]) to reduce overfitting. Following machine learning best practices, we split the data into training and testing with a ratio of 8:2. To account for target class imbalance, we employed both random over-sampling and random under-sampling methods and compared their performance for further analyses. [39] For random over-sampling, we increased the minority class to match the size of the majority class, resulting in a final balanced distribution of 50% for each class. Similarly, for random under-sampling, we reduced the majority class to match the size of the minority class, also achieving a balanced class distribution.
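The split and resampling steps above can be sketched as follows. This is a minimal standard-library sketch under stated assumptions, not the study's code: the study used Imbalanced-learn, whereas the helper names here (`train_test_split_indices`, `random_oversample`, `random_undersample`) are hypothetical, and applying resampling to the training split only is an assumption, since the text does not say which split was resampled.

```python
import random
from collections import Counter


def train_test_split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle row indices and split them 8:2 into train and test."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def _split_by_class(labels, indices):
    """Group indices by binary label and return (minority, majority)."""
    by_class = {0: [], 1: []}
    for i in indices:
        by_class[labels[i]].append(i)
    minority, majority = sorted(by_class.values(), key=len)
    return minority, majority


def random_oversample(labels, indices, seed=0):
    """Duplicate random minority rows until both classes have equal size."""
    rng = random.Random(seed)
    minority, majority = _split_by_class(labels, indices)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra


def random_undersample(labels, indices, seed=0):
    """Drop random majority rows until both classes have equal size."""
    rng = random.Random(seed)
    minority, majority = _split_by_class(labels, indices)
    return minority + rng.sample(majority, len(minority))


# Toy labels: 30 positives and 70 negatives (illustrative, not study data).
labels = [1] * 30 + [0] * 70
train_idx, test_idx = train_test_split_indices(len(labels))
over = random_oversample(labels, train_idx)    # balanced by duplication
under = random_undersample(labels, train_idx)  # balanced by removal
over_counts = Counter(labels[i] for i in over)
under_counts = Counter(labels[i] for i in under)
```

Leaving the test split untouched keeps its class distribution realistic, so reported metrics are not inflated by duplicated rows.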
After hyperparameter tuning using Bayesian optimization with 5-fold cross-validation over 100 iterations to optimize the area under the receiver operating characteristic curve (AUROC), we reported the performance metrics on the testing set, including AUROC, precision, recall, F1 score, and specificity. In addition, we obtained the 95% confidence intervals (CI) of the average performance metrics by the bootstrap method with 50 iterations. Specifically, the bootstrap involved repeatedly sampling with replacement from the test set, applying the final trained model with the chosen hyperparameters to each resampled set, and calculating the average performance metrics. The best model was selected based on average AUROC and on potential clinical application, with the goal of a higher F1 score reflecting the balance between precision and recall.
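The bootstrap procedure described above can be illustrated with a short Python sketch. It is written under assumptions rather than taken from the study's repository: AUROC is computed from pairwise comparisons instead of a library call, the interval is a percentile interval (the text does not state how the CI bounds were derived), and the synthetic scores are purely illustrative.

```python
import random
from statistics import mean, quantiles


def auroc(y_true, y_score):
    """Probability that a random positive scores above a random negative.

    Ties count as one half; this equals the area under the ROC curve.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auroc(y_true, y_score, n_iter=50, seed=0):
    """Resample the test set with replacement and summarize AUROC.

    Returns (mean, lower, upper), where the bounds are the 2.5th and 97.5th
    percentiles of the resampled AUROC values (an assumed CI construction).
    """
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_iter:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain a single class
        stats.append(auroc(ys, [y_score[i] for i in idx]))
    cuts = quantiles(stats, n=40)  # 39 cut points in steps of 2.5%
    return mean(stats), cuts[0], cuts[-1]


# Synthetic test-set scores: positives are shifted upward by 0.3.
rng = random.Random(1)
y_true = [i % 2 for i in range(200)]
y_score = [0.3 * y + 0.7 * rng.random() for y in y_true]
avg_auc, ci_low, ci_high = bootstrap_auroc(y_true, y_score)
```

The model is trained once and only the test rows are resampled, so the interval reflects sampling variability in the test set rather than variability from retraining.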
We then assessed the fairness of the selected machine learning model by comparing the ratios of metrics such as positive predictive value (PPV), negative predictive value (NPV), false positive rate (FPR), true positive rate (TPR), false negative rate (FNR), and overall accuracy across race and gender. We considered models with ratios between 0.8 and 1.25 as achieving predictive parity. We designated non-Hispanic Whites and females as the privileged groups for race and gender, respectively, and Black participants and males as the protected groups for these categories. Lastly, we used SHapley Additive exPlanations (SHAP) values to identify and rank the most important features, with a view to providing explainability and improving clinical decision-making. [40]
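The ratio-based fairness check described above can be sketched as follows. This is a minimal sketch under assumptions: the function names are hypothetical, the ratio is taken as protected over privileged (the text does not state the direction), and the toy labels and group names "A" and "B" are illustrative only.

```python
from math import nan


def _safe_div(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else nan


def group_metrics(y_true, y_pred):
    """Confusion-matrix metrics for one demographic group."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "PPV": _safe_div(tp, tp + fp),
        "NPV": _safe_div(tn, tn + fn),
        "TPR": _safe_div(tp, tp + fn),
        "FPR": _safe_div(fp, fp + tn),
        "FNR": _safe_div(fn, fn + tp),
        "accuracy": _safe_div(tp + tn, len(pairs)),
    }


def parity_ratios(y_true, y_pred, groups, protected, privileged):
    """Ratio of each metric: protected group over privileged group (assumed)."""
    def metrics_for(name):
        rows = [(t, p) for t, p, g in zip(y_true, y_pred, groups) if g == name]
        return group_metrics([t for t, _ in rows], [p for _, p in rows])

    prot, priv = metrics_for(protected), metrics_for(privileged)
    return {m: _safe_div(prot[m], priv[m]) for m in prot}


def achieves_parity(ratio, lower=0.8, upper=1.25):
    """Apply the 0.8 to 1.25 band used in the text; NaN never passes."""
    return lower <= ratio <= upper


# Toy data: group "A" is treated as privileged, group "B" as protected.
y_true = [1, 1, 1, 0, 0, 0, 1, 0] + [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0] + [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
groups = ["A"] * 8 + ["B"] * 10
ratios = parity_ratios(y_true, y_pred, groups, protected="B", privileged="A")
passing = {m: achieves_parity(r) for m, r in ratios.items()}
```

In this toy example the PPV, NPV, and accuracy ratios fall inside the band while the FPR ratio does not, which shows why the choice of metric matters when reporting parity.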
We performed all analyses in Python version 3.10 with libraries such as Scikit-learn and Imbalanced-learn. The study followed the STROBE cohort reporting guideline [41] and was approved by the University of Florida institutional review board as “non-human subject” (#NH00044557) with the use of de-identified data.
Results
Descriptive statistics
In the primary cohort, 99,936 eligible older adults aged 50 or older who had responded to SDOH survey questions were included, and 44,109 (44%) were identified as healthy aging (age ≥ 75 and a composite condition score of 0, Table 1). The mean (SD) age was 74 (9.3) years; 55,294 (55.3%) were female and 41,977 (42.0%) were male. Of the cohort, 67,457 (67.5%) were White, 14,612 (14.6%) were Black or African American, and 11,109 (11.1%) were of Hispanic ethnicity. The median (IQR) mCCI was 1 (0–2). Most individuals lived in their own house (64%), were married (51%), had obtained college or advanced degrees (74%), and had Medicare (56.2%).
In the secondary cohort, 62,475 participants were included, and 6,648 (10.6%) were identified as healthy aging (i.e., age ≥ 85 and a composite condition score of 0, Table 1). The mean (SD) age was 71 (10.6) years; 36,101 (58%) were female and 24,671 (40%) were male. Of the cohort, 38,802 (62%) were White, 11,437 (18%) were Black or African American, and 8,270 (13%) were of Hispanic ethnicity. The median (IQR) mCCI was 2 (1–3). Similarly, most individuals lived in their own house (57%), were married (46.9%), had obtained college or advanced degrees (70%), and had Medicare (46%).
Model performance and selection
Performance metrics on the test dataset and the AUROC for the three models are presented in Fig 2 and S1 Table. Bootstrapped performance with 95% CI over 50 iterations for the best algorithm is included in S2 Table. Overall, all three models achieved reasonable prediction performance in the AoU database, with AUROC >0.7. Among them, the XGBoost model with over-sampling (AUROC: 0.795 and 0.862 for the primary and secondary cohorts, respectively) showed the best performance, outperforming both the LR model (AUROC: 0.786 and 0.85) and the MLP model (AUROC: 0.794 and 0.854). Although the AUROC was comparable between XGBoost and MLP, the F1 score and other metrics of the MLP classifier were much lower than those of the XGBoost classifier. Detailed machine learning algorithms and tuning values are included in S3 Table, and the confusion matrix is included in S4 Table.
(A) Performance on test datasets of the three algorithms in the primary cohort. (B) Performance on test datasets of the three algorithms in the secondary cohort. *XGBoost: extreme gradient boosting, LR: logistic regression, MLP: multilayer perceptron.
Model fairness assessment
Table 2 demonstrates the fairness metrics across gender and race for the best selected model, XGBoost. The ratios of accuracies and PPVs showed no evidence of model biases towards a specific population.
Feature importance analysis
Fig 3 shows the SHAP values that explain the healthy aging predictions of the best-performing model, XGBoost. In both cohorts, health insurance type (e.g., Medicare, Medicaid, insurance purchased from a company) ranked as the most predictive feature (SHAP value: 0.595), followed by employment status (0.233), substance use (0.171), and health insurance coverage (yes/no; 0.143). The direction of the plots revealed that all top 10 features were positively (red on the right in Fig 3) associated with healthy aging.
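To make the idea behind this ranking concrete, the following Python sketch computes exact Shapley values for a tiny model and ranks features by mean absolute contribution. It is a conceptual illustration under assumptions, not the study's pipeline: the study applied SHAP to XGBoost, whereas here the linear `toy_model`, its three unnamed features, and the two-row background set are hypothetical, and the brute-force enumeration is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, background):
    """Exact Shapley values of model(x) relative to the background mean.

    A coalition's value is the average prediction when its features take
    the values in x and all other features come from the background rows.
    """
    n = len(x)

    def value(subset):
        total = 0.0
        for row in background:
            mixed = [x[j] if j in subset else row[j] for j in range(n)]
            total += model(mixed)
        return total / len(background)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = set(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def mean_abs_importance(model, rows, background):
    """Global importance: mean absolute Shapley value per feature."""
    n = len(rows[0])
    totals = [0.0] * n
    for x in rows:
        for j, v in enumerate(shapley_values(model, x, background)):
            totals[j] += abs(v)
    return [t / len(rows) for t in totals]


def toy_model(row):
    """Hypothetical linear score over three binary features."""
    return 2.0 * row[0] - 1.0 * row[1] + 0.5 * row[2]


background = [[0, 0, 0], [1, 1, 1]]
phi = shapley_values(toy_model, [1, 0, 1], background)
importance = mean_abs_importance(toy_model, [[1, 0, 1], [0, 1, 0]], background)
```

For a linear model each Shapley value reduces to the coefficient times the feature's deviation from its background mean, and the values sum to the difference between the prediction and the average background prediction.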
Each test sample is depicted as a point for every feature, with the x-axis indicating whether the feature’s effect on the model’s prediction is positive (red on the right) or negative (blue on the right). The color of each point reflects the feature’s value, and this color scale is adjusted individually according to the value range present in the dataset. (A). SHAP values and feature importance for the primary cohort using XGBoost. (B). SHAP values and feature importance for the secondary cohort using XGBoost.
Discussion
This cohort study leverages the AoU datasets, which not only included diverse populations from historically underrepresented groups and racial/ethnic minority groups, but also provided a rich source of SDOH through standardized OMOP data infrastructure. Our findings suggest that machine learning models are capable of identifying patterns for healthy aging from SDOH information, highlighting the promise of integrating SDOH factors into clinical decision-making.
It is noteworthy that our work combined high-dimensional SDOH data, a large-scale population, and an explainable ML framework. Few previous studies have included SDOH across several domains, such as neighborhood environment and education access. [42,43] This study also adds value by integrating objective measures from EHRs with detailed survey data on SDOH, providing a more comprehensive assessment of healthy aging than studies relying solely on self-reported data. [28]
Our models showed fairness in terms of predictive parity, where the ratios of both positive (PPV) and negative predictive values (NPV) are close to 1. Ensuring that ML models do not discriminate against different racial or ethnic groups is crucial, as these models must perform equitably independent of sensitive features. In our context, if healthy aging is less accurately identified in disadvantaged groups, it may lead to unnecessary and potentially harmful treatments, thereby increasing their financial burden and causing undue harm. Therefore, maintaining equal PPV and NPV across different demographic groups is imperative to prevent such disparities and ensure equitable healthcare outcomes. [44]
Some machine learning classifiers are notorious for being “black boxes,” where excellent performance is often obtained at the cost of interpretability. [45–49] In the feature importance analysis, health insurance was the strongest positive SDOH factor for predicting healthy aging. Our study identified top SDOH factors from several domains that were positively associated with healthy aging: health insurance type, employment status, education level, marital status, and housing status. These findings align with previous studies indicating that higher socioeconomic status, including higher income and education level, is associated with better health outcomes. [30,50] Although the precise mechanism by which marital and housing statuses affect healthy aging has not yet been identified, studies have shown an intricate pattern of association with mental health and chronic diseases. [51,52] Our study adds evidence that they could also be related to healthy aging.
Past research has suggested that substance use and related drug overdoses may have contributed to lower life expectancy; [53–55] however, we found that substance and alcohol use were positively associated with healthy aging. We suggest several possible explanations for this result. First, substance and alcohol users in our cohort may be healthier than those in the general populations of previous studies, since our participants mainly joined AoU voluntarily rather than being randomly recruited into the program. Secondly, some substance and alcohol users may have altered their health behaviors following enrollment in the program, namely the Hawthorne effect, where individuals knowingly adopt healthier behavior when they are being assessed in a research program. [56,57] Thus, there may be differences in substance and alcohol use status between the baseline period and the follow-up period. Additionally, the apparent protective effects observed may reflect “healthy user bias,” where individuals who engage in moderate substance use also participate in other health-promoting behaviors such as regular exercise, a healthy diet, and preventive healthcare seeking. [58] It is also crucial for future studies to consider substance use patterns in the elderly population to further interpret the study findings. According to statistics from the Centers for Disease Control and Prevention, drug overdose death rates were higher among groups aged 25–44 (~50 deaths per 100,000 population) than among those aged over 55 (5–35 deaths per 100,000 population). [59] Thus, our results could be affected by selection bias, since we only included participants aged 50 and older, whose health behaviors differ from those of the general population of substance and alcohol users. These factors could contribute to the unexpected positive association between substance use and healthy aging in our study.
This study has some limitations. First, to date, a unanimous definition of healthy aging has not been reached. [60] Our definition of healthy aging, while based on objective measures, may not capture all constructs of ‘healthy aging’ such as quality of life, social engagement, and subjective well-being. Also, the CCI covers a limited spectrum of comorbidities. For instance, some may not consider living with Parkinson’s disease to be healthy aging; however, it is not covered in the CCI.
Secondly, while the All of Us Research Program provides a large and diverse dataset, it may not be fully representative of the U.S. population. Participants in All of Us are volunteers who agreed to share their health data, which could introduce selection bias. These individuals may be more health-conscious, have better access to healthcare, or have obtained higher educational degrees than the general population, potentially leading to an overestimation of healthy aging in our sample. While the generalizability of our findings is limited to the participants in AoU, it is important to note that the cohort spans the nation. Furthermore, the representation of racial and ethnic minority groups has improved in AoU, which enhances the applicability of our results to a more diverse population. [61] However, caution should still be exercised when extrapolating these findings to other populations or other clinical settings.
Thirdly, while we attempted to account for a wide range of SDOH factors, there may still be unmeasured confounders. For instance, we did not have data on lifelong health behaviors or early-life exposures that could significantly impact aging trajectories. Therefore, residual confounding may be present and could bias our study findings, and a causal relationship cannot be inferred. Future studies could also consider using genetic data to understand the epigenetic factors for healthy aging. Also, the SDOH factors in the AoURP that were collected through surveys could be subject to recall bias. We also acknowledge that while our selected fairness metrics provided a quantitative approach to assessing bias, these metrics may mask underlying disparities that contribute to differential outcomes. For example, achieving statistical parity may not imply that both groups have equitable access to health care. Future studies could consider integrating causal analysis to contextualize these fairness metrics by uncovering the root causes of health disparity and exploring the intersectionality between SDOH.
Conclusion
In this cohort study utilizing the AoU database, our machine learning model effectively predicted individuals likely to achieve healthy aging, emphasizing the critical influence of health insurance on this outcome. The findings highlight that access to health insurance is not merely a facilitator of healthcare services but a pivotal determinant of long-term health outcomes in older adults. By addressing the gaps in health insurance, policymakers can contribute to the promotion of healthy aging across diverse populations, ultimately leading to improved quality of life. The integration of health insurance into public health strategies could therefore be a powerful tool in enhancing the overall well-being of aging populations.
Supporting information
S1 Table. Performance metrics for the three algorithms.
https://doi.org/10.1371/journal.pone.0342292.s001
(DOCX)
S2 Table. Bootstrapped performance on the test dataset over 50 iterations.
https://doi.org/10.1371/journal.pone.0342292.s002
(DOCX)
S3 Table. Selection of model hyperparameters using Bayesian optimization with 5-fold cross-validation.
https://doi.org/10.1371/journal.pone.0342292.s003
(DOCX)
S4 Table. Confusion matrix for the primary cohort.
https://doi.org/10.1371/journal.pone.0342292.s004
(DOCX)
Acknowledgments
We gratefully acknowledge All of Us participants for their contributions, without whom this research would not have been possible. We also thank the National Institutes of Health’s All of Us Research Program for making available the participant data examined in this study.
References
- 1. Menassa M, Stronks K, Khatmi F, Roa Díaz ZM, Espinola OP, Gamba M, et al. Concepts and definitions of healthy ageing: A systematic review and synthesis of theoretical models. EClinicalMedicine. 2023;56:101821. pmid:36684393
- 2. Rudnicka E, Napierała P, Podfigurna A, Męczekalski B, Smolarczyk R, Grymowicz M. The World Health Organization (WHO) approach to healthy ageing. Maturitas. 2020;139:6–11. pmid:32747042
- 3. Michel J-P, Leonardi M, Martin M, Prina M. WHO’s report for the decade of healthy ageing 2021-30 sets the stage for globally comparable data on healthy ageing. Lancet Healthy Longev. 2021;2(3):e121–2. pmid:36098109
- 4. Pan American Health Organization. Healthy Aging. PAHO/WHO. https://www.paho.org/en/healthy-aging. 2024. Accessed 2024 July 7.
- 5. Beard JR, Officer A, de Carvalho IA, Sadana R, Pot AM, Michel J-P, et al. The World report on ageing and health: A policy framework for healthy ageing. Lancet. 2016;387(10033):2145–54. pmid:26520231
- 6. Healthy aging. National Institute on Aging. https://www.nia.nih.gov/health/healthy-aging. Accessed 2024 July 7.
- 7. Rowe JW, Kahn RL. Successful aging. Gerontologist. 1997;37(4):433–40. pmid:9279031
- 8. Dey AB. World report on ageing and health. Indian Journal of Medical Research. 2017;145(1):150–1.
- 9. Jackson EMJ, O’Brien K, McGuire LC, Baumgart M, Gore J, Brandt K, et al. Promoting healthy aging: Public Health as a leader for reducing dementia risk. Public Policy Aging Rep. 2023;33(2):92–5. pmid:37736523
- 10. Bureau UC. Older Population and Aging. Census.gov. https://www.census.gov/topics/population/older-aging.html. Accessed 2024 July 7.
- 11. Vespa J, Medina L, Armstrong DM. Population Estimates and Projections.
- 12. Noren Hooten N, Pacheco NL, Smith JT, Evans MK. The accelerated aging phenotype: The role of race and social determinants of health on aging. Ageing Res Rev. 2022;73:101536. pmid:34883202
- 13. CDC. Social Determinants of Health (SDOH). About CDC. https://www.cdc.gov/about/priorities/why-is-addressing-sdoh-important.html. 2024. Accessed 2024 July 7.
- 14. Social Determinants of Health - Healthy People 2030. https://health.gov/healthypeople/priority-areas/social-determinants-health. Accessed 2023 October 1.
- 15. Social determinants of health: Key concepts. https://www.who.int/news-room/questions-and-answers/item/social-determinants-of-health-key-concepts. Accessed 2024 July 7.
- 16. Rangachari P, Govindarajan A, Mehta R, Seehusen D, Rethemeyer RK. The relationship between Social Determinants of Health (SDoH) and death from cardiovascular disease or opioid use in counties across the United States (2009-2018). BMC Public Health. 2022;22(1):236. pmid:35120479
- 17. Tran R, Forman R, Mossialos E, Nasir K, Kulkarni A. Social Determinants of disparities in mortality outcomes in congenital heart disease: A systematic review and meta-analysis. Front Cardiovasc Med. 2022;9:829902. pmid:35369346
- 18. Short SE, Mollborn S. Social determinants and health behaviors: Conceptual frames and empirical advances. Curr Opin Psychol. 2015;5:78–84. pmid:26213711
- 19. Alcántara C, Diaz SV, Cosenzo LG, Loucks EB, Penedo FJ, Williams NJ. Social determinants as moderators of the effectiveness of health behavior change interventions: Scientific gaps and opportunities. Health Psychol Rev. 2020;14(1):132–44. pmid:31957557
- 20. Ayangunna E, Kalu K, Shah G. Role of Community-level health behaviors and social determinants of health in preventable hospitalizations. JGPHA. 2022;8(3).
- 21. Bundy JD, Mills KT, He H, LaVeist TA, Ferdinand KC, Chen J, et al. Social determinants of health and premature death among adults in the USA from 1999 to 2018: A national cohort study. Lancet Public Health. 2023;8(6):e422–31. pmid:37244672
- 22. Monroe P, Campbell JA, Harris M, Egede LE. Racial/ethnic differences in social determinants of health and health outcomes among adolescents and youth ages 10-24 years old: A scoping review. BMC Public Health. 2023;23(1):410. pmid:36855084
- 23. Adkins-Jackson PB, George KM, Besser LM, Hyun J, Lamar M, Hill-Jarrett TG, et al. The structural and social determinants of Alzheimer’s disease related dementias. Alzheimers Dement. 2023;19(7):3171–85. pmid:37074203
- 24. Perez FP, Perez CA, Chumbiauca MN. Insights into the social determinants of health in older adults. JBiSE. 2022;15(11):261–8.
- 25. Llorens-Ortega R, Bertran-Noguer C, Juvinyà-Canals D, Garre-Olmo J, Bosch-Farré C. Influence of social determinants of health in the evolution of the quality of life of older adults in Europe: A comparative analysis between men and women. Humanit Soc Sci Commun. 2024;11(1).
- 26. Yearby R. The social determinants of health, health disparities, and health justice. J Law Med Ethics. 2022;50(4):641–9. pmid:36883406
- 27. Abud T, Kounidas G, Martin KR, Werth M, Cooper K, Myint PK. Determinants of healthy ageing: A systematic review of contemporary literature. Aging Clin Exp Res. 2022;34(6):1215–23. pmid:35132578
- 28. Sowa A, Tobiasz-Adamczyk B, Topór-Mądry R, Poscia A, la Milia DI. Predictors of healthy ageing: Public health policy targets. BMC Health Serv Res. 2016;16(Suppl 5):289. pmid:27609315
- 29. Wong J, Horwitz MM, Zhou L, Toh S. Using machine learning to identify health outcomes from electronic health record data. Curr Epidemiol Rep. 2018;5(4):331–42. pmid:30555773
- 30. Wagg E, Blyth FM, Cumming RG, Khalatbari-Soltani S. Socioeconomic position and healthy ageing: A systematic review of cross-sectional and longitudinal studies. Ageing Res Rev. 2021;69:101365. pmid:34004378
- 31. All of Us Research Program Investigators, Denny JC, Rutter JL, Goldstein DB, Philippakis A, Smoller JW, et al. The “All of Us” Research Program. N Engl J Med. 2019;381(7):668–76. pmid:31412182
- 32. Tesfaye S, Cronin RM, Lopez-Class M, Chen Q, Foster CS, Gu CA, et al. Measuring social determinants of health in the All of Us Research Program. Sci Rep. 2024;14(1):8815. pmid:38627404
- 33. Data Standardization – OHDSI. https://www.ohdsi.org/data-standardization/. Accessed 2024 February 29.
- 34. Quan H, Li B, Couris CM, Fushimi K, Graham P, Hider P, et al. Updating and validating the Charlson comorbidity index and score for risk adjustment in hospital discharge abstracts using data from 6 countries. Am J Epidemiol. 2011;173(6):676–82. pmid:21330339
- 35. Chen T, Guestrin C. XGBoost. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016. 785–94.
- 36. Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology. 1996;58(1):267–88.
- 37. Hoerl AE, Kennard RW. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics. 1970;12(1):55–67.
- 38. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society Series B: Statistical Methodology.
- 39. Lemaître G, Nogueira F, Aridas C. Imbalanced-learn: A Python toolbox to tackle the curse of imbalanced datasets in machine learning. 2016;18.
- 40. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017. 4768–77.
- 41. von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP, et al. Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: Guidelines for reporting observational studies. BMJ. 2007;335(7624):806–8. pmid:17947786
- 42. de Keijzer C, Bauwelinck M, Dadvand P. Long-term exposure to residential greenspace and healthy ageing: A systematic review. Curr Environ Health Rep. 2020;7(1):65–88. pmid:31981136
- 43. Chen M, Tan X, Padman R. Social determinants of health in electronic health records and their impact on analysis and risk prediction: A systematic review. J Am Med Inform Assoc. 2020;27(11):1764–73. pmid:33202021
- 44. Rajkomar A, Hardt M, Howell MD, Corrado G, Chin MH. Ensuring fairness in machine learning to advance health equity. Ann Intern Med. 2018;169(12):866–72. pmid:30508424
- 45. Stiglic G, Kocbek P, Fijacko N, Zitnik M, Verbert K, Cilar L. Interpretability of machine learning‐based prediction models in healthcare. WIREs Data Min & Knowl. 2020;10(5).
- 46. Zihni E, Madai VI, Livne M, Galinovic I, Khalil AA, Fiebach JB, et al. Opening the black box of artificial intelligence for clinical decision support: A study predicting stroke outcome. PLoS One. 2020;15(4):e0231166. pmid:32251471
- 47. Sajid MR, Khan AA, Albar HM, Muhammad N, Sami W, Bukhari SAC, et al. Exploration of black boxes of supervised machine learning models: A demonstration on development of predictive heart risk score. Comput Intell Neurosci. 2022;2022:5475313. pmid:35602638
- 48. Azodi CB, Tang J, Shiu S-H. Opening the black box: Interpretable machine learning for geneticists. Trends Genet. 2020;36(6):442–55. pmid:32396837
- 49. Ratti E, Graves M. Explainable machine learning practices: Opening another black box for reliable medical AI. AI Ethics. 2022;2(4):801–14.
- 50. Wu Y-T, Daskalopoulou C, Muniz Terrera G, Sanchez Niubo A, Rodríguez-Artalejo F, Ayuso-Mateos JL, et al. Education and wealth inequalities in healthy ageing in eight harmonised cohorts in the ATHLOS consortium: A population-based study. Lancet Public Health. 2020;5(7):e386–94. pmid:32619540
- 51. Swope CB, Hernández D. Housing as a determinant of health equity: A conceptual model. Soc Sci Med. 2019;243:112571. pmid:31675514
- 52. Yannakoulia M, Panagiotakos D, Pitsavos C, Skoumas Y, Stefanadis C. Eating patterns may mediate the association between marital status, body mass index, and blood cholesterol levels in apparently healthy men and women from the ATTICA study. Soc Sci Med. 2008;66(11):2230–9. pmid:18329772
- 53. Rehm J, Probst C. Decreases of life expectancy despite decreases in non-communicable disease mortality: The role of substance use and socioeconomic status. Eur Addict Res. 2018;24(2):53–9. pmid:29627831
- 54. Imtiaz S, Probst C, Rehm J. Substance use and population life expectancy in the USA: Interactions with health inequalities and implications for policy. Drug Alcohol Rev. 2018;37 Suppl 1:S263–7. pmid:29737615
- 55. Gold MS. The role of alcohol, drugs, and deaths of despair in the U.S.’s falling life expectancy. Mo Med. 2020;117(2):99–101. pmid:32308224
- 56. Clifford PR, Davis CM, Maisto SA, Stout RL. Alcohol treatment research contributing to changes in substance use behavior and related negative consequences. J Stud Alcohol Drugs. 2022;83(3):364–73. pmid:35590177
- 57. Berkhout C, Berbra O, Favre J, Collins C, Calafiore M, Peremans L, et al. Defining and evaluating the Hawthorne effect in primary care, a systematic review and meta-analysis. Front Med (Lausanne). 2022;9:1033486. pmid:36425097
- 58. Shrank WH, Patrick AR, Brookhart MA. Healthy user and related biases in observational studies of preventive interventions: A primer for physicians. J Gen Intern Med. 2011;26(5):546–50. pmid:21203857
- 59. Drug overdose deaths - Health, United States. https://www.cdc.gov/nchs/hus/topics/drug-overdose-deaths.htm. 2023. Accessed 2024 July 7.
- 60. Lu W, Pikhart H, Sacker A. Domains and measurements of healthy aging in epidemiological studies: A review. Gerontologist. 2019;59(4):e294–310. pmid:29897451
- 61. Kathiresan N, Cho SMJ, Bhattacharya R, Truong B, Hornsby W, Natarajan P. Representation of race and ethnicity in the contemporary US health cohort All of Us Research Program. JAMA Cardiol. 2023;8(9):859–64. pmid:37585212