Unsupervised learning using EHR and census data to identify distinct subphenotypes of newly diagnosed hypertension patients

  • Jaclyn M. Hall,

    Roles Conceptualization, Data curation, Investigation, Methodology, Validation, Visualization, Writing – original draft, Writing – review & editing

    jaclynha@ufl.edu

    Affiliation Department of Health Outcomes & Biomedical Informatics, University of Florida, Gainesville, Florida, United States of America

  • Jie Xu,

    Roles Formal analysis, Methodology, Validation, Writing – original draft

    Affiliation Department of Health Outcomes & Biomedical Informatics, University of Florida, Gainesville, Florida, United States of America

  • Marta G. Walsh,

    Roles Data curation, Methodology, Project administration, Resources, Writing – review & editing

    Affiliation Department of Pharmaceutical Outcomes & Policy, University of Florida, Gainesville, Florida, United States of America

  • Hee-Deok Cho,

    Roles Data curation, Formal analysis, Investigation, Methodology, Visualization

    Affiliation Department of Health Outcomes & Biomedical Informatics, University of Florida, Gainesville, Florida, United States of America

  • Grant Harrell,

    Roles Conceptualization, Investigation, Methodology, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Department of Community Health & Family Medicine, University of Florida, Gainesville, Florida, United States of America

  • Shailina A. Keshwani,

    Roles Data curation, Investigation, Methodology, Software, Writing – review & editing

    Affiliation Department of Pharmaceutical Outcomes & Policy, University of Florida, Gainesville, Florida, United States of America

  • Steven M. Smith,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliations Department of Pharmaceutical Outcomes & Policy, University of Florida, Gainesville, Florida, United States of America, Division of Cardiovascular Medicine, Department of Medicine, University of Florida, Gainesville, Florida, United States of America

  • Stephanie A. S. Staras

    Roles Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Writing – original draft, Writing – review & editing

    Affiliation Department of Health Outcomes & Biomedical Informatics, University of Florida, Gainesville, Florida, United States of America

Abstract

Background

Hypertension (HTN) is a complex condition with significant heterogeneity in presentation and treatment response. Identifying distinct subphenotypes of HTN may improve our understanding of its underlying mechanisms and guide more precise treatment or public health initiatives.

Methods

Using EHR and Medicaid claims data from the OneFlorida+ research consortium (2012–2021), we identified a cohort of adult Floridians with newly diagnosed HTN (first diagnosis following two outpatient blood pressures ≥140/90 mmHg & no prior anti-HTN treatment). We extracted demographic and clinical data from the diagnosis visit and ≤1 year prior. We used hierarchical clustering (unsupervised machine learning) to identify distinct subphenotypes within the OneFlorida+ HTN population.

Results

A total of 40,686 patients were included (mean ± SD age, 60.9 ± 17.5 y; 55% women). Five subphenotypes (S1-5) were identified. S1 was characterized by older age, higher Body Mass Index (BMI), and prevalent type 2 diabetes. In S2, over 50% of patients were Black; members were primarily women and younger, with higher BMI, and lived in communities with higher levels of socioeconomic vulnerability. S3 contained a higher percentage of Hispanic patients with comparatively lower BMI. S4 was characterized by higher age and co-morbidities. S5 had 94% of patients with chronic kidney disease. Distinctions in social determinants of health factors were also observed.

Conclusions

Unsupervised learning identified 5 HTN subphenotypes varying in demographic, socioeconomic, and risk profiles. Further investigation into the biological mechanisms of these subphenotypes and the relationships to social factors may enhance our ability to deliver targeted interventions that consider social policy implications in addition to the traditional behavioral and physiological interventions.

Controlling high blood pressure (BP) is paramount to public health. High BP, also known as hypertension (HTN), is a serious circulatory condition that afflicts approximately 48% of U.S. adults [1]. HTN is the leading underlying risk factor contributing to death and morbidity worldwide. Despite several evidence-based guidelines and proven interventions, only 1 in 4 U.S. adults have their BP under control [1]. In the U.S., cardiovascular health (CVH) remains concerningly low, and there are multiple opportunities to monitor, maintain, and improve CVH in individuals and the population [2].

HTN can affect people of all ages, genders, and ethnic backgrounds. Certain factors can increase the risk of developing HTN, and these include advancing age, gender, family history of HTN, smoking, obesity, and chronic conditions, including kidney disease, diabetes, and sleep apnea [3–6]. For example, men are more likely to smoke than women but also more likely to be physically active [2,7]. Thus, tailored interventions to focus on reducing smoking may be needed for subgroups of men. It is also well established that some ethnic groups have a higher predisposition for developing HTN, including individuals of African, Caribbean, or South Asian descent [8–11]. Tailored interventions, based on a group’s cultural values that reflect the behavioral preferences and expectations of the group, are more successful [12].

Successful hypertension management is strongly influenced by social context, such as education and income [13]. Incorporating known non-medical drivers of health can improve the accuracy of hypertension risk models by accounting for environmental and social factors that contribute to disease development and progression. Many environmental and community factors (such as chronic stress from substandard housing or financial insecurity) can raise blood pressure, worsen control, and contribute to related conditions such as poor sleep and mental health [14–16]. Hypertension disproportionately affects marginalized groups, and community-measured social determinants of health, such as poverty or limited access to care, contribute to disparities in hypertension outcomes [17]. Community-level variables provide the critical context that helps explain the higher risk faced by some patients, despite their clinical profiles, leading to improved risk stratification and more targeted interventions. A better understanding of the contribution of these social determinants of health can inform policymakers and health systems seeking to align resources with the needs of the community, e.g., community clinics, transportation services, or education campaigns in underserved areas.

Despite the knowledge that HTN is a complex condition with significant heterogeneity in presentation and treatment response [3,18], standard clinical recommendations to improve CVH suggest a one-size-fits-all treatment approach that includes an extensive list of behavioral and clinical interventions. The clinically supported recommendation by the American Heart Association Life’s Essential 8 ™ is to improve CVH by focusing on eight categories: 1) manage cholesterol, 2) control blood sugar, 3) lower BP, 4) improve nutrition, 5) increase physical exercise, 6) manage a healthy weight, 7) refrain from smoking, and 8) sleep 7–9 hours daily [19]. While critical to CVH, only a small fraction (0.45%) of the U.S. population is reaching optimal goals for all eight measures [2]. Large differences exist in the scores; thus, the health interventions needed to achieve these eight measures will also differ across groups [20].

Asking a patient with, or at risk for, HTN to undergo multiple health interventions simultaneously can potentially be overwhelming and challenging for them to manage effectively. Implementing multiple interventions simultaneously might be unrealistic for some patients, leading to willpower depletion, behavioral inertia, or the natural resistance to change a habit. These factors are amplified when patients have limited time, resources, or support [21]. To successfully improve their odds of preventing HTN, clinicians could guide patients in beginning their journey to better health by focusing on one or two of the eight components (e.g., nutrition and physical exercise). There is evidence that the choice of the priority intervention matters and may differ by population subgroups. For example, because racial/ethnic disparities in HTN prevalence are significant, identifying the most beneficial targeted interventions may be predicted by the needs and circumstances of population subgroups [22,23]. However, there is little evidence to assist clinicians in determining which areas of change should be prioritized.

By recognizing and categorizing HTN subphenotypes, we can potentially tailor more precise and effective treatments, leading to better patient outcomes. Additionally, identifying distinct subphenotypes of HTN may advance our understanding of the underlying mechanisms of this condition. In this paper, we applied hierarchical clustering to analyze patient data to identify distinct subgroups of patients with HTN. This allowed for a detailed exploration of the variations within the hypertensive population and provided insights that could enhance our approach to managing HTN.

Methodology

Ethics statement

This study was conducted using anonymized HIPAA-limited electronic health records obtained from the OneFlorida+ Data Trust. The OneFlorida+ Data Trust Consortium reviewed and approved the approach and variables. The study data were fully de-identified and did not contain any personal identifiers that could be linked to individual patients, ensuring the protection of patient privacy. This retrospective study did not require direct interaction with human subjects. The study was conducted in full compliance with relevant ethical standards for human data research and was approved by the University of Florida Institutional Review Board (IRB# IRB202201071).

Data source and study population

This study utilized EHR and Medicaid claims data from the OneFlorida+ Clinical Research Network (CRN) [24], covering from January 1, 2012 to December 31, 2021, and data were received in August of 2022. OneFlorida+ is a comprehensive clinical research network with a longitudinal dataset of over 17 million Floridians’ real-world patient-level data, encompassing Medicaid claims, cancer registries, vital statistics, and EHRs from clinical partners (health systems with broad geographic coverage across the state). The network complies with the PCORnet Common Data Model, providing anonymized information on patient demographics, vital signs, conditions, encounters, diagnoses, procedures, prescriptions, dispensing, and laboratory results.

To identify the study population, we applied a strict criterion to develop the base cohort of patients with newly diagnosed HTN, similar to our prior studies [25]. Specifically, we included adult individuals (age ≥ 18 years) who received their first HTN diagnosis, marked by ICD-9 codes 401.x or ICD-10 codes I10-I15, with the first HTN diagnosis being considered the index date. To ensure the accuracy and reliability of our findings, and avoid patients with missing data, we identified patients who are meaningfully engaged with the healthcare system. Patients were required to have ≥2 prior elevated outpatient BP readings on separate dates from the same health system within the 18 months preceding (and inclusive of) the index date. Elevated BP was defined as a systolic BP ≥ 140 or diastolic BP ≥ 90 mm Hg. Patients were excluded if they had a prior history of prescribed 1st-line antihypertensive treatment (angiotensin-converting enzyme inhibitor [ACEI], angiotensin receptor blocker [ARB], calcium channel blocker [CCB], thiazide diuretic, or β-blocker). Additionally, to maximize the inclusion of consistent patients (who would be expected to have the most relevant healthcare data at the health system in which they were determined to have incident HTN), we required that patients have ≥2 outpatient encounters over the 24 months preceding (and exclusive of) the index date, at the same health system as the elevated BP and HTN diagnosis. Finally, we excluded non-Florida residents, those without a BP in the 6 months prior to and including the index date, those with missing or physiologically implausible lab or other biometric measurements, and those with pregnancy-associated HTN (ICD-10 O12-O16 or ICD-9 642.3–642.6 within nine months of index date) (Fig 1). The study protocol received ethical approval from the Institutional Review Board of the study institution.
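
The elevated-BP requirement above can be illustrated with a minimal sketch. It is not the authors' cohort-building code: the function names, the tuple layout of a reading, and the approximation of 18 months as 548 days are all assumptions made for illustration, and the readings passed in are assumed to be outpatient readings from a single health system.

```python
from datetime import date, timedelta

# Thresholds taken from the text: systolic >= 140 or diastolic >= 90 mm Hg.
SBP_THRESHOLD = 140
DBP_THRESHOLD = 90
# Assumption: the 18-month look-back is approximated as 548 days.
LOOKBACK_DAYS = 548


def is_elevated(sbp, dbp):
    """Return True if a single reading counts as elevated."""
    return sbp >= SBP_THRESHOLD or dbp >= DBP_THRESHOLD


def meets_bp_criterion(readings, index_date):
    """Check for >= 2 elevated readings on separate dates.

    readings: iterable of (date, sbp, dbp) tuples (hypothetical layout).
    The window runs from LOOKBACK_DAYS before the index date up to and
    including the index date.
    """
    window_start = index_date - timedelta(days=LOOKBACK_DAYS)
    elevated_dates = {
        d
        for d, sbp, dbp in readings
        if window_start <= d <= index_date and is_elevated(sbp, dbp)
    }
    return len(elevated_dates) >= 2
```

Collecting the dates in a set means that several elevated readings taken on the same day count only once, which matches the "separate dates" wording.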

Fig 1. Flow diagram for development of initial HTN diagnosis cohort.

https://doi.org/10.1371/journal.pone.0326776.g001

Data preprocessing

In this study, the clustering analysis utilized essential demographic and clinical information, such as comorbidities and prescribed drugs, from the diagnosis visit and the period up to one year before the index date. A total of 41 variables from OneFlorida+ EHR included five demographic variables (gender, age, race, ethnicity, insurance type), a history of 17 comorbidities, 21 clinically relevant variables, 13 lab results, BMI, and the Charlson Comorbidity Score (CCS). The CCS is a well-validated index incorporating 19 categories of medical conditions identifiable in medical records, allowing for a more precise understanding of the burden of chronic disease at a population level [26]. In addition, to gain insights into the contextual factors influencing patient health outcomes, patients’ 5-digit residential zip codes were used to link to community-level factors associated with the social determinants of health. We identified the Zip Code Tabulation Area (ZCTA) for each 5-digit zip code. ZCTAs are the generalized spatial representations of the geographic extent of the mail routes that a ZIP Code represents, built using Census blocks. We linked each patient’s most recent residential 5-digit zip code to 17 variables related to the Social Determinants of Health (SDoH) from Census.gov. These included the ZCTA-level unemployment rate, percent poverty, and percentage of the population without a high school degree, among others. Including community variables enabled us to consider broader social and environmental determinants that might impact HTN, providing valuable context for our research.
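
The zip code to ZCTA linkage described above amounts to two table joins. The sketch below shows one way this could look in pandas; the table layouts, the column names (`zip5`, `zcta`, `pct_poverty`, `unemployment_rate`), and all values are invented for illustration and do not come from the study data.

```python
import pandas as pd

# Toy patient table (hypothetical columns and values).
patients = pd.DataFrame(
    {"patient_id": [1, 2, 3], "zip5": ["32601", "33101", "99999"]}
)

# Toy ZIP-to-ZCTA crosswalk; a real crosswalk would be obtained separately.
crosswalk = pd.DataFrame(
    {"zip5": ["32601", "33101"], "zcta": ["32601", "33101"]}
)

# Toy ZCTA-level SDoH table; the numbers are made up.
sdoh = pd.DataFrame(
    {
        "zcta": ["32601", "33101"],
        "pct_poverty": [30.1, 18.4],
        "unemployment_rate": [5.2, 4.1],
    }
)

# Left joins keep every patient, even when no ZCTA or SDoH row matches.
linked = patients.merge(crosswalk, on="zip5", how="left").merge(
    sdoh, on="zcta", how="left"
)
```

Zip codes are kept as strings so that leading zeros survive. Patients whose zip code has no match keep missing community values, which would then be handled by whatever missing-data rule the analysis uses.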

Thorough data preprocessing, described below, was conducted to ensure the quality and reliability of the dataset used for HTN subphenotyping. For variables with missing values, we employed imputation techniques using the median value of that specific variable among all patients with values. Additionally, we handled categorical variables like race and binary attributes like sex, using one-hot encoding [27] to transform categorical variables into numerical representations, facilitating their incorporation into our hierarchical clustering analyses.
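
As an illustration of these two preprocessing steps, the following sketch applies median imputation and one-hot encoding with pandas. The column names and values are invented, and the use of pandas is an assumption; the paper does not state which software was used.

```python
import numpy as np
import pandas as pd

# Toy data with one missing lab value (all values invented).
df = pd.DataFrame(
    {
        "age": [45.0, 62.0, 71.0, 58.0],
        "ldl": [110.0, np.nan, 95.0, 130.0],
        "race": ["Black", "White", "Asian", "White"],
        "sex": ["F", "M", "F", "F"],
    }
)

numeric_cols = ["age", "ldl"]
categorical_cols = ["race", "sex"]

# Median imputation: each column's median is computed over the observed
# (non-missing) values only, then used to fill the gaps in that column.
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

# One-hot encoding: each category becomes its own 0/1 indicator column.
encoded = pd.get_dummies(df, columns=categorical_cols, dtype=float)
```

In this toy example the missing LDL value is filled with 110.0, the median of the three observed values, and `race` expands into the indicator columns `race_Asian`, `race_Black`, and `race_White`.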

Subphenotyping with agglomerative hierarchical clustering

Hierarchical cluster analysis aims to group objects or records that are “close” to one another, and the calculation of distance measures is repeated between clusters, which are then grouped into larger clusters. The outcome is represented graphically as a dendrogram. Following the preprocessing of variables as previously described, we employed agglomerative hierarchical clustering of the cohort using Ward linkage to derive subphenotypes [28]. The primary objective was to minimize within-group dispersion at each binary fusion step. The linkage function specifies the distance between two clusters and is computed as the increase in the “error sum of squares” (ESS) after combining patients from the two clusters into a single cluster. It is repeated until only two clusters remain. The resulting dendrogram initially differentiated the clustering outcomes, illustrated in Fig 2. To identify the optimal number of clusters that best captured the underlying patterns within the data, we utilized two distinct metrics, i.e., the gap statistic to compare the change in within-cluster dispersion with that expected under an appropriate reference null distribution [29], and Davies-Bouldin index to evaluate the ratio between the cluster scatter and the cluster separations [30].
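
A minimal sketch of this procedure, using SciPy for Ward-linkage clustering and scikit-learn for the Davies-Bouldin index, is given below. It runs on synthetic data rather than the study cohort. The feature scaling step, the candidate range of cluster counts, and the choice of libraries are assumptions, and the gap statistic is omitted because neither library provides it directly.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.metrics import davies_bouldin_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the patient feature matrix: three well-separated
# groups of 50 points each (not study data).
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# Assumption: features are standardized before clustering; the paper does
# not say whether scaling was applied.
X_scaled = StandardScaler().fit_transform(X)

# Agglomerative clustering with Ward linkage: each merge is the one that
# gives the smallest increase in the within-cluster error sum of squares.
Z = linkage(X_scaled, method="ward")

# Cut the dendrogram at several candidate cluster counts and score each cut.
# Lower Davies-Bouldin values indicate tighter, better separated clusters.
scores = {}
for k in range(2, 7):
    labels = fcluster(Z, t=k, criterion="maxclust")
    scores[k] = davies_bouldin_score(X_scaled, labels)

best_k = min(scores, key=scores.get)
```

The linkage matrix `Z` can also be passed to `scipy.cluster.hierarchy.dendrogram` to draw a tree of the kind shown in Fig 2.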

Fig 2. Dendrogram depicting the five selected subphenotypes of HTN patients.

The blue dotted line represents the cut-point, or the initial differentiation of the clustering outcome. The horizontal axis represents the inter-cluster distances, while the vertical axis represents the distribution of patients with HTN.

https://doi.org/10.1371/journal.pone.0326776.g002

After completing the clustering process, we conducted a series of statistical tests to explore potential differences among the identified subphenotypes. For categorical variables, we applied the chi-square test [31]. In the case of continuous data, we first assessed the normality of the variables using the Shapiro-Wilk test [32]. For those variables found to be normally distributed, we performed adjustments for age and sex using One-Way ANOVA analysis [33]. Additionally, we applied the Kruskal-Wallis test for variables exhibiting a skewed distribution [34]. This approach to clustering and subsequent statistical analysis aimed to reveal meaningful patterns and associations within the data, providing insights into the characteristics of the distinct HTN subphenotypes.
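
The test-selection logic in this paragraph can be sketched with SciPy as follows. The data are synthetic, the 0.05 threshold for the normality check is an assumption, and the helper name `compare_continuous` is hypothetical. The sketch does not include the adjustment for age and sex mentioned above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Categorical variable: rows are clusters, columns are category counts
# (invented counts).
table = np.array([[30, 10], [12, 28], [20, 20]])
chi2, p_cat, dof, expected = stats.chi2_contingency(table)

# Continuous variable: one synthetic sample per cluster.
groups = [rng.normal(loc=mu, scale=1.0, size=40) for mu in (0.0, 0.5, 3.0)]


def compare_continuous(samples, alpha=0.05):
    """Pick one-way ANOVA or Kruskal-Wallis based on Shapiro-Wilk results."""
    shapiro_pvalues = [stats.shapiro(s)[1] for s in samples]
    if all(p > alpha for p in shapiro_pvalues):
        # No sample rejects normality, so use the parametric test.
        _, p_value = stats.f_oneway(*samples)
        return "anova", p_value
    # At least one sample looks non-normal, so use the rank-based test.
    _, p_value = stats.kruskal(*samples)
    return "kruskal", p_value


test_name, p_cont = compare_continuous(groups)
```

With very large samples the Shapiro-Wilk test tends to reject normality for even small departures, so in a cohort of this size the Kruskal-Wallis branch would likely be taken for most variables.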

Results

Between January 1, 2012, and June 30, 2021, 2,798,975 patients with an HTN diagnosis were recorded in OneFlorida+. We identified a unique cohort of 40,686 adult Floridians with newly diagnosed HTN (first diagnosis following two outpatient BPs ≥ 140/90 mmHg and no prior anti-HTN treatment) who remained within the same health system. Among this final cohort, 55% were female, 44% were 45–64 years old, 20% were Black, 22% were Hispanic, and 0.08% had no race or ethnicity identified (Table 1).

Table 1. Newly diagnosed HTN Patient Cohort from OneFlorida (2012–2021).

https://doi.org/10.1371/journal.pone.0326776.t001

The average age of the final cohort was 60.9 (± SD 17.5) years, with an average BMI of 30.5 kg/m² and 23% with records of current tobacco use (Table 2). The average combined comorbidity score for the entire cohort was 1.3. Only 6.1% of the patients were insured by Medicaid, 30% were insured by Medicare, and 45% were privately insured. Also of note, 10% of the cohort had a history of depression, 15% had a history of sleep disorder, 15% had a history of type 2 diabetes, and 26% had a history of chronic kidney disease. The mean LDL-cholesterol was in the normal range for the entire cohort, but the mean value for the protective HDL-cholesterol, 51.5 mg/dL, was lower than the recommended ≥60 mg/dL.

Table 2. Characteristics of 5 HTN subphenotypes derived from unsupervised hierarchical clustering. Clustering was guided by the Davies-Bouldin index and gap statistic to determine the appropriate number of clusters. Cells are color-scaled within each row: the maximum value is red, the minimum value is deepest blue, the row mean would be white, and each cell is shaded according to its relative position between the minimum and maximum. P-values are from ordinary least squares regression, adjusted for age and sex. All p-values are < .05 unless marked with * (* denotes p-value > .05).

https://doi.org/10.1371/journal.pone.0326776.t002

Five subphenotypes of new HTN patients were identified by evaluating the optimal clustering statistics, with patient numbers ranging from 482 to 22,991 (Table 2). Colors represent the highest (red) and lowest (blue) values, with shading indicating where values fall within the range of each row. Characteristics of each contributing variable to the subphenotypes are given in Table 2, including the adjusted p-value result of testing the statistical differences between subphenotypes. Only three clinical variables (history of atrial fibrillation, COPD, and uric acid measurement) and one community-level variable (per capita income) did not show statistical differences across subphenotypes. Two race variables had < 50 individuals—Native American/Alaskan and Native Hawaiian/Other Pacific Islander.

Subphenotype 1 had the largest membership (22,991 patients, 57%) and had a characterization more closely reflecting the overall cohort for demographic and most clinical variables. Compared to the other subphenotypes, Subphenotype 1 has higher participation in private insurance, a higher EHR recorded history of asthma, depression, and sleep disorders, and higher lab values for cholesterol, triglycerides, ALT, and AST. Patients in S1 also tended to live in ZCTAs with communities associated with less socioeconomic vulnerability and less minority population but a higher percentage of mobile homes.

Subphenotype 2 (5,983 patients, 15%) had a larger share of female (62%) and Black patients (52%) compared with the overall cohort. S2 members were characterized by the youngest mean age, the fewest comorbidities (mean CCS = 0.8), the highest percentage with Medicaid insurance, higher BMI, and a higher percentage of members residing in ZCTAs with greater socioeconomic burden (e.g., higher rates of poverty, unemployment, persons with a disability, and adults with less than a high school education).

Subphenotype 3 (8,712 patients, 21%) had a higher percentage of White and Hispanic patients than the overall cohort. Most Asian and multi-race patients were members of S3. S3 had relatively low rates of comorbidities (mean CCS = 1.2), a low mean BMI, and a comparatively lower smoking rate (16%). S3 members also resided more frequently in communities with higher incomes and less vulnerability and minority populations who speak English ‘less well.’

Subphenotype 4 (2,518 patients, 6%) had a higher percentage of White and non-Hispanic patients. This group represents older patients, mostly with Medicare insurance, 36% of whom smoke, and who have multiple comorbidities (mean CCS = 3.6), including a high percentage with a history of various cardiac-related conditions (50% with coronary heart disease, 77% with atherosclerotic heart disease, and 9% with a history of stroke). This subphenotype was associated with a higher percentage of residences in ZCTAs that closely aligned with the overall cohort on social and economic variables.

Subphenotype 5 (482 patients, 1%) was a small, specific group composed of individuals with chronic kidney disease (95%), end-stage kidney failure (70%), and the highest levels of creatinine, potassium, and uric acid. S5 had the second lowest mean age and the lowest BMI but the highest mean CCS (5.3). SDoH variables showed only slightly more burden than the larger population.

Discussion

We identified five subphenotypes of patients with a first diagnosis of HTN. Derived from an EHR and claims data trust representing over 17 million Floridians, our results represent those Floridians seeking and receiving health care. While the majority of the patients were separated into a more traditional risk factor group of older White adults with some chronic conditions and Black patients with obesity, we also identified a group consisting of older Hispanics and two distinct groups of patients with more advanced chronic conditions. These subphenotypes represent a first step in the process of precision prioritization of the Essential Eight to specific groups of patients, which may lead to more effective and efficient control of BP.

Our overall cohort had expected differences compared to those reported for the Florida population. First, similar to other EHR-based studies, the percentage of females is higher than that of males, as women seek health care services more than men. Additionally, for HTN specifically, older women have a higher risk for HTN after losing the beneficial effects of hormones post-menopause [35,36]. Second, the racial and ethnic breakdown of the overall cohort of newly diagnosed HTN aligns with that of the population of the state of Florida in higher age categories [37]. While the HTN cohort has a lower percentage of Hispanics compared to Florida, this is expected because of the lower prevalence of HTN in US Hispanic populations and the smaller (but growing) proportion of older-age Hispanics in Florida [38]. Third, consistent with known HTN risk factors, the cohort had higher rates of smoking (cohort = 23.2% vs. Florida = 14.7%) and diabetes (cohort = 15.2% vs. Florida = 11.8%) than the Florida population [37].

Our results demonstrated significant differences in clinical values and co-occurring condition presentation across different subphenotypes. Subphenotype 1 contained 56.5% of the initial newly diagnosed HTN cohort and represents what might be thought of as a historically typical patient with HTN: majority White, non-Hispanic, ~60 years of age, overweight with high cholesterol, having a history of sleep disorders or depression, at risk of kidney disease, and with about 1 in 4 being tobacco users. This subphenotype may benefit from interventions around maintaining weight, lowering cholesterol, and maximizing restful sleep. Members of S1 were from various communities but were more likely to live in a community of mobile homes, possibly representing residents of the many rural areas or wintering retiree communities in Florida. In contrast, Subphenotype 2 had a higher share of Black patients and women, lower mean age at HTN onset, and fewer comorbidities, yet higher mean BMI. Both obesity and HTN affect Black men and women at younger ages in the US [39]. Patients in this subphenotype are more likely to live in communities with higher poverty, a higher percentage of non-White residents, and lower income [40]. This subphenotype is an important target group, and clinicians can prioritize these younger Black patients for early identification of HTN to prevent further HTN-related complications. Subphenotype 3 represents Florida’s first sizeable population of retiring Hispanic residents. This group is slightly higher in mean age than the entire HTN cohort. Still, on average, they are relatively healthy and tend to live in communities with higher income, possibly representing the large Hispanic population in South Florida. This subphenotype may benefit most from maintaining weight and other healthy lifestyle behaviors. Almost all Asians in the cohort were placed in Subphenotype 3, demonstrating how our clustering method efficiently identifies even small subgroups.
Subphenotype 4 has a higher age and much higher rates of comorbidities. This subphenotype of primarily White men represents an important group that needs concentrated efforts to boost cardiovascular wellness, along with help with sleep disorders, depression, tobacco use, and diabetes. They also live in communities with higher rates of residents living in mobile homes.

Americans from a broad spectrum of community types struggle to achieve optimal CVH [41]. In their 1988 report, the Joint National Committee on Detection, Evaluation, and Treatment of High Blood Pressure recommended considering patient demographic characteristics when selecting initial treatment options [42]. The standard treatments for HTN have evolved as new drugs have become available and with new understandings of health measures associated with improving and maintaining CVH [19, p. 8]. Still, patients face unique challenges in managing CVH, including maintaining healthy behaviors and utilizing preventative health services [43]. With the exception of per-capita income, our five subphenotypes differ significantly in SDoH and demographic factors. Our results demonstrated the significant differences in clinical values and co-occurring condition presentation across different subphenotypes.

Considering sociodemographics and factors related to patient residential community will be necessary to produce both culturally relevant and clinically effective health interventions [44]. Interventions should include health education strategies tailored to specific groups and target a broader set of locations for public health engagement, such as mobile home communities and community centers in low-income areas, as the subphenotypes outline. For example, men may have a higher prevalence of hypertension earlier in life and lower hypertension awareness [45]. Thus, men with a profile aligning with Subphenotype 1 may benefit from early monitoring and interventions that address occupational stress or dietary habits common in male populations.

Strengths and limitations

The authors acknowledge several limitations and strengths of this study. The study relied on EHRs and Medicaid claims data, which may not capture all relevant patient data [46]. However, using the OneFlorida+ Data trust, which takes multiple approaches to improve data quality [24], we are assured that data have been standardized across different health data providers. The data used were only from the state of Florida. Florida has the third largest population and is one of the most representative states, as it closely aligns with the nation’s population on matters of race, education, income, employment, etc. [47]. Therefore, studies using data from Florida can be considered generalizable. In addition, OneFlorida+ has EHR and claims data for 16 million FL patients at the time of this study [24], and the final cohort of newly diagnosed HTN patients was sufficiently large to conduct population-level studies. We chose a well-established and interpretable method to identify preliminary patient subgroups that may have clinical relevance. As an exploratory effort, this study offers a starting point for future investigations that could employ more advanced or tailored clustering approaches to further examine these patterns. By incorporating SDoH-related data, we complemented the traditional understanding of clinical patient types. Some variables may contain larger proportions of missing data than others. While we minimized the impact of missingness by employing standard imputation techniques, we acknowledge the missing data may not be random. For laboratory measurements (e.g., ALT and AST, both measures of liver function, or lipid measurements of HDLC, LDLC, and TRIG), the missingness may be correlated. However, the variables are interesting because the values are informative and represent modifiable health measures.

Conclusions

We employed machine learning to delve beyond classic risk factors for hypertension. We identified distinctly different demographic populations that may exhibit different risks and ultimately benefit from different intervention strategies through precision public health initiatives. Unsupervised learning identified 5 HTN subphenotypes varying in demographic, socioeconomic, and risk profiles. These subtypes inform our understanding of HTN patients and the potential barriers they may face to controlling their BP. Identifying distinct subphenotypes of HTN may improve our understanding of the underlying mechanisms of HTN and guide the development of personalized and precise treatment of individuals. Further investigation into the biological mechanisms of these subphenotypes could reveal their potential and barriers to successful blood pressure control. This greater understanding will enhance our ability to deliver targeted interventions that consider social policy implications in addition to the traditional behavioral and physiological interventions.

References

  1. Sekkarie A, Fang J, Hayes D, Loustalot F. Prevalence of Self-Reported Hypertension and Antihypertensive Medication Use Among Adults - United States, 2017-2021. MMWR Morb Mortal Wkly Rep. 2024;73(9):191–8. pmid:38451865
  2. Lloyd-Jones DM, Ning H, Labarthe D, Brewer L, Sharma G, Rosamond W, et al. Status of Cardiovascular Health in US Adults and Children Using the American Heart Association’s New “Life’s Essential 8” Metrics: Prevalence Estimates From the National Health and Nutrition Examination Survey (NHANES), 2013 Through 2018. Circulation. 2022;146(11):822–35. pmid:35766033
  3. Bakris GL, Sorrentino M. Hypertension: A Companion to Braunwald’s Heart Disease. Elsevier Health Sciences; 2017.
  4. Oliveros E, Patel H, Kyung S, Fugar S, Goldberg A, Madan N, et al. Hypertension in older adults: Assessment, management, and challenges. Clin Cardiol. 2020;43(2):99–107. pmid:31825114
  5. CDC. High Blood Pressure Risk Factors. n.d. [cited 2024 June 9]. https://www.cdc.gov/high-blood-pressure/risk-factors/index.html
  6. The connection between diabetes, kidney disease and high blood pressure. www.heart.org. n.d. [cited 2024 June 9]. https://www.heart.org/en/news/2020/11/03/the-connection-between-diabetes-kidney-disease-and-high-blood-pressure
  7. Janković J, Mandić-Rajčević S, Davidović M, Janković S. Demographic and socioeconomic inequalities in ideal cardiovascular health: A systematic review and meta-analysis. PLoS One. 2021;16(8):e0255959. pmid:34379696
  8. He J, Bundy JD, Geng S, Tian L, He H, Li X, et al. Social, Behavioral, and Metabolic Risk Factors and Racial Disparities in Cardiovascular Disease Mortality in U.S. Adults: An Observational Study. Ann Intern Med. 2023;176(9):1200–8. pmid:37579311
  9. Wendell CR, Waldstein SR, Evans MK, Zonderman AB. Distributions of Subclinical Cardiovascular Disease in a Socioeconomically and Racially Diverse Sample. Stroke. 2017;48(4):850–6. pmid:28235961
  10. Pursnani S, Merchant M. South Asian ethnicity as a risk factor for coronary heart disease. Atherosclerosis. 2020;315:126–30. pmid:33317714
  11. Baptiste D-L, Turkson-Ocran R-A, Ogungbe O, Koirala B, Francis L, Spaulding EM, et al. Heterogeneity in Cardiovascular Disease Risk Factor Prevalence Among White, African American, African Immigrant, and Afro-Caribbean Adults: Insights From the 2010-2018 National Health Interview Survey. J Am Heart Assoc. 2022;11(18):e025235. pmid:36073627
  12. Marín G. Defining culturally appropriate community interventions: Hispanics as a case study. J Community Psychol. 1993;21(2):149–61.
  13. Mashuri YA, Ng N, Santosa A. Socioeconomic disparities in the burden of hypertension among Indonesian adults - a multilevel analysis. Glob Health Action. 2022;15(1):2129131. pmid:36217968
  14. Sims M, Kershaw KN, Breathett K, Jackson EA, Lewis LM, Mujahid MS, et al. Importance of Housing and Cardiovascular Health and Well-Being: A Scientific Statement From the American Heart Association. Circ Cardiovasc Qual Outcomes. 2020;13(8):e000089. pmid:32673512
  15. Gu KD, Faulkner KC, Thorndike AN. Housing instability and cardiometabolic health in the United States: a narrative review of the literature. BMC Public Health. 2023;23(1):931. pmid:37221492
  16. Mason KE, Alexiou A, Li A, Taylor-Robinson D. The impact of housing insecurity on mental health, sleep and hypertension: Analysis of the UK Household Longitudinal Study and linked data, 2009-2019. Soc Sci Med. 2024;351:116939. pmid:38749252
  17. Chaturvedi A, Zhu A, Gadela NV, Prabhakaran D, Jafar TH. Social Determinants of Health and Disparities in Hypertension and Cardiovascular Diseases. Hypertension. 2024;81(3):387–99. pmid:38152897
  18. Materson BJ, Reda DJ, Cushman WC, Massie BM, Freis ED, Kochar MS, et al. Single-drug therapy for hypertension in men. A comparison of six antihypertensive agents with placebo. The Department of Veterans Affairs Cooperative Study Group on Antihypertensive Agents. N Engl J Med. 1993;328(13):914–21. pmid:8446138
  19. Life’s Essential 8 Fact Sheet. n.d. [cited 2023 November 9]. https://www.heart.org/en/healthy-living/healthy-lifestyle/lifes-essential-8/lifes-essential-8-fact-sheet
  20. Virani SS, Alonso A, Benjamin EJ, Bittencourt MS, Callaway CW, Carson AP, et al. Heart Disease and Stroke Statistics-2020 Update: A Report From the American Heart Association. Circulation. 2020;141(9):e139–596. pmid:31992061
  21. Paterick TE, Patel N, Tajik AJ, Chandrasekaran K. Improving health outcomes through patient education and partnerships with patients. Proc (Bayl Univ Med Cent). 2017;30(1):112–3. pmid:28152110
  22. Sanchez E. Life’s Simple 7: Vital But Not Easy. J Am Heart Assoc. 2018;7(11):e009324. pmid:29773574
  23. Fu J, Liu Y, Zhang L, Zhou L, Li D, Quan H, et al. Nonpharmacologic Interventions for Reducing Blood Pressure in Adults With Prehypertension to Established Hypertension. J Am Heart Assoc. 2020;9(19):e016804. pmid:32975166
  24. Hogan WR, Shenkman EA, Robinson T, Carasquillo O, Robinson PS, Essner RZ, et al. The OneFlorida Data Trust: a centralized, translational research data infrastructure of statewide scope. J Am Med Inform Assoc. 2022;29(4):686–93. pmid:34664656
  25. Smith SM, Winterstein AG, Gurka MJ, Walsh MG, Keshwani S, Libby AM, et al. Initial Antihypertensive Regimens in Newly Treated Patients: Real World Evidence From the OneFlorida+ Clinical Research Network. J Am Heart Assoc. 2023;12(1):e026652. pmid:36565195
  26. Charlson ME, Pompei P, Ales KL, MacKenzie CR. A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. J Chronic Dis. 1987;40(5):373–83. pmid:3558716
  27. Rodríguez P, Bautista MA, Gonzàlez J, Escalera S. Beyond one-hot encoding: Lower dimensional target embedding. Image and Vision Computing. 2018;75:21–31.
  28. Murtagh F, Contreras P. Algorithms for hierarchical clustering: an overview. WIREs Data Min & Knowl. 2012;2(1):86–97.
  29. Tibshirani R, Walther G, Hastie T. Estimating the Number of Clusters in a Data Set Via the Gap Statistic. Journal of the Royal Statistical Society Series B: Statistical Methodology. 2001;63(2):411–23.
  30. Ros F, Riad R, Guillaume S. PDBI: A partitioning Davies-Bouldin index for clustering evaluation. Neurocomputing. 2023;528:178–99.
  31. McHugh ML. The chi-square test of independence. Biochem Med (Zagreb). 2013;23(2):143–9. pmid:23894860
  32. Hanusz Z, Tarasińska J. Simulation Study on Improved Shapiro–Wilk Tests for Normality. Communications in Statistics - Simulation and Computation. 2014;43(9):2093–105.
  33. Judd CM, McClelland GH, Ryan CS. Data Analysis: A Model Comparison Approach To Regression, ANOVA, and Beyond. 3rd ed. New York: Routledge; 2017. https://doi.org/10.4324/9781315744131
  34. McKight PE, Najab J. Kruskal‐Wallis Test. The Corsini Encyclopedia of Psychology. Wiley; 2010. 1–1. https://doi.org/10.1002/9780470479216.corpsy0491
  35. Bertakis KD, Azari R, Helms LJ, Callahan EJ, Robbins JA. Gender differences in the utilization of health care services. J Fam Pract. 2000;49(2):147–52. pmid:10718692
  36. Smith SM, McAuliffe K, Hall JM, McDonough CW, Gurka MJ, Robinson TO, et al. Hypertension in Florida: Data From the OneFlorida Clinical Data Research Network. Prev Chronic Dis. 2018;15:E27. pmid:29494332
  37. FLHealthCHARTS.gov: Population Characteristics Data. n.d. [cited 2024 June 7]. https://www.flhealthcharts.gov/charts/PopulationCharacteristics/default.aspx
  38. Elfassy T, Zeki Al Hazzouri A, Cai J, Baldoni PL, Llabre MM, Rundek T, et al. Incidence of Hypertension Among US Hispanics/Latinos: The Hispanic Community Health Study/Study of Latinos, 2008 to 2017. J Am Heart Assoc. 2020;9(12).
  39. Lofton H, Ard JD, Hunt RR, Knight MG. Obesity among African American people in the United States: A review. Obesity (Silver Spring). 2023;31(2):306–15. pmid:36695059
  40. Heart Disease and African Americans. n.d. [cited 2024 June 9]. https://minorityhealth.hhs.gov/heart-disease-and-african-americans
  41. Wang Y, Wang QJ. The prevalence of prehypertension and hypertension among US adults according to the new joint national committee guidelines: new challenges of the old problem. Arch Intern Med. 2004;164(19):2126–34. pmid:15505126
  42. The 1988 report of the Joint National Committee on Detection, Evaluation, and Treatment of High Blood Pressure. Arch Intern Med. 1988;148(5):1023–38. pmid:3365073
  43. Javed Z, Haisum Maqsood M, Yahya T, Amin Z, Acquah I, Valero-Elizondo J, et al. Race, Racism, and Cardiovascular Health: Applying a Social Determinants of Health Framework to Racial/Ethnic Disparities in Cardiovascular Disease. Circ Cardiovasc Qual Outcomes. 2022;15(1):e007917. pmid:35041484
  44. Kreuter MW, Thompson T, McQueen A, Garg R. Addressing Social Needs in Health Care Settings: Evidence, Challenges, and Opportunities for Public Health. Annu Rev Public Health. 2021;42:329–44. pmid:33326298
  45. Song J-J, Ma Z, Wang J, Chen L-X, Zhong J-C. Gender Differences in Hypertension. J Cardiovasc Transl Res. 2020;13(1):47–54. pmid:31044374
  46. Gomes KM, Ratwani RM. Evaluating Improvements and Shortcomings in Clinician Satisfaction With Electronic Health Record Usability. JAMA Netw Open. 2019;2(12):e1916651. pmid:31834390
  47. Analysis | What state best represents America? Washington Post; n.d. [cited 2024 July 3]. https://www.washingtonpost.com/business/2024/05/10/most-representative-most-unique-places-america/