Toward better data disaggregation: A person-centered approach to understanding AANHPI sociodemographic diversity in resource constrained times

  • Lu Dong,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Supervision, Writing – original draft

    ‡ LD and JS contributed equally to this work.

    Affiliations RAND, Santa Monica, California, United States of America, Department of Psychology, Stony Brook University, Stony Brook, New York, United States of America

  • Jaimie Shaff,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Supervision, Writing – original draft

    jshaff@rand.org

    ‡ LD and JS contributed equally to this work.

    Affiliation RAND, Santa Monica, California, United States of America

  • Douglas Yeung,

    Roles Conceptualization, Investigation, Project administration, Resources, Supervision, Writing – original draft

    Affiliation RAND, Santa Monica, California, United States of America

  • Ruolin Lu,

    Roles Formal analysis, Visualization, Writing – review & editing

    Affiliation RAND, Arlington, Virginia, United States of America

  • Delia Bugliari,

    Roles Formal analysis, Supervision, Writing – review & editing

    Affiliation RAND, Santa Monica, California, United States of America

  • Anthony Rodriguez,

    Roles Conceptualization, Methodology, Supervision, Writing – review & editing

    Affiliation RAND, Boston, United States of America

  • Anita Chandra

    Roles Data curation, Funding acquisition, Project administration, Resources, Supervision, Writing – review & editing

    Affiliation RAND, Arlington, Virginia, United States of America

Abstract

Background

Each year, the United States loses billions of dollars due to health inequities. Data disaggregation is essential for understanding the health status and needs of populations to identify these inequities and inform efficient resource allocation. For example, aggregating data from people identifying with Asian, Native Hawaiian, and other Pacific Islander (AANHPI) communities may inhibit the identification of important health challenges within this large and diverse community, impeding meaningful progress toward reducing differences in health outcomes.

Methods

This study employed Latent Class Analysis (LCA) to identify meaningful subgroups within the AANHPI population. Two studies were conducted: Study 1 analyzed data from the Amplify AAPI Survey, which included 1,026 AANHPI adults, while Study 2 utilized the 2023 National Survey of Health Attitudes (NSHA) with a sample of 318 AANHPI respondents. Both studies collected comprehensive sociodemographic measures, including educational attainment, household income, and employment status.

Results

Study 1 identified four latent classes, revealing heterogeneity within the AANHPI sample based on income, education, language use, and generational status. Class characteristics highlighted variations in age, marital status, and employment. Study 2 identified two classes: high socioeconomic status (SES) and low SES. Class characteristics demonstrated differences in age distribution, homeownership, and perceptions of community well-being.

Conclusion

This study demonstrated the feasibility and utility of a person-centered analytic approach like LCA to identify meaningful subgroups within an aggregated population. These findings join a growing body of evidence that emphasizes the complexity within the AANHPI population and the importance of data disaggregation in public health. These insights are crucial for informing targeted interventions and optimizing resource allocation to effectively address health disparities.

Introduction

Health inequities cost the United States (U.S.) billions of dollars each year [1]. Data disaggregation, or the process of breaking data into subgroups, is a crucial public health activity that can inform actions to meaningfully address these inequities and reduce the economic burden associated with them [2]. In the U.S., disaggregated data may be used to understand the characteristics or needs of different populations, determine how to most efficiently allocate resources, and identify opportunities for tailored efforts.

Data disaggregation is used across industries to allocate resources, from determining how to best invest advertising dollars to deploying limited public health resources. Unfortunately, aging systems and outdated practices can limit efficient resource allocation by concealing populations with the greatest needs. Several factors meaningfully influence real-world implementation of recommendations and best practices: aging data infrastructure (e.g., inability to ingest disaggregated data [3]), limiting statistical norms (e.g., suppression rules [4]), mischaracterization of populations as “hard to measure” (e.g., limited inclusion of low-income communities in survey data [5]), and concerns about the validity of typical approaches (e.g., perceived unmet need [6]). These structural limitations impede the research community from identifying opportunities to better allocate limited resources.

Public health data are typically disaggregated in alignment with federal national data standards, norms within the field, and local demographics. Within public health, data are disaggregated to understand the health of different populations, identify opportunities to address potential health inequities, and direct limited resources to address the most pressing needs. Throughout this paper, we will be using Asian, Native Hawaiian, and other Pacific Islander populations as an illustrative example. The federal national data standards include two distinct categories for “Asian” and “Native Hawaiian or Other Pacific Islander” [7]. Government classifications have long grouped these populations into a single category, referred to in different ways, such as “Asian American and Pacific Islander,” and public health initiatives are often developed for this aggregate group. This approach to aggregating groups together is often done to simplify political and public sector efforts; additionally, Asian, Native Hawaiian, and other Pacific Islander (AANHPI) communities have a long history of working together to form coalitions and uplift the needs of their communities as part of a larger voice [8,9].

AANHPI describes culturally, linguistically, and ethnically diverse people with origins from over 50 countries and territories [10]. Historical immigration laws, global events, and migration patterns have resulted in geographic diversity of AANHPI heritage groups across the U.S. and will continue to do so as the AANHPI population grows [11]. Accordingly, the need for data disaggregation within the AANHPI population has been well established and can greatly improve the efficiency of resource allocation [12–14]. Subgrouping recommendations by ancestral heritage have been developed and adopted in the 2020 Census [15] and for health survey data in California [16] and New York [12]. However, questions remain about whether grouping by ancestral origin is appropriate for identifying health inequities, or whether other factors are more salient [17,18]. Additionally, data disaggregation efforts are limited by the ways in which data are collected: many health, state, and local agencies continue to collect only aggregated AANHPI data [19].

Like any other community, different dimensions of wellbeing of AANHPI community members are influenced by social factors such as ancestry group, language, immigration status, and geography [12–14]. Historical events, such as selective immigration policies, influenced the movement of different ancestry groups, positionality in society, biases faced in the U.S., and access to health services [12,14]. Cultural norms, linguistic differences, acculturation, and racialization by phenotypes influence health, perceptions of health status and needs, and multiple steps of the care-seeking process [12,14,20]. These factors impact experiences in the U.S.; the way data are provided, collected, and interpreted; the way interventions are funded, designed, and evaluated; and, ultimately, our ability to make meaningful progress towards all members of our communities experiencing optimal health and wellbeing [21].

Decades of effort have been expended in both advocating for and developing methods to facilitate more granular analyses of population subgroups. Many such efforts have focused on increasing data granularity [22], which in turn increases the need to collect greater amounts of data. However, existing data disaggregation approaches may not always be sufficient or feasible, as they fail to capture the important intersectional factors that drive disparities in health outcomes. Methodological characteristics of these approaches may constrain their usefulness. Efforts to disaggregate beyond conventional groups can be impeded by factors such as the way data were collected, lack of statistical power, and risk of identifiability of individuals [22–24]. Further, shifting funding priorities may mean that certain types of data (e.g., race/ethnicity) either cannot be collected, or if already collected, are no longer available [19]. If such real-world policy leads to data on race/ethnicity becoming either less available or less acceptable, the public health community may need to identify alternate approaches to inform efficient and effective allocation of limited resources. Addressing such concerns may require multi-pronged data disaggregation approaches [25].

Intersectionality research leverages a person-centered theoretical framework with quantitative or empirical methods to incorporate social experiences to develop meaningful subgroups [26]. Data-driven quantitative methods, such as latent class analysis (LCA), have successfully supported exploration beyond variable-centered approaches typically used for identifying opportunities to address health needs [26]. Given limitations of existing data, public health teams working to identify and address health inequities can use more advanced methods, such as LCA, to better understand and respond to health needs in their communities that have been masked by less advanced techniques.

The goal of this analysis is to empirically identify and describe distinct sociodemographic subgroups within the AANHPI population using a person-centered analytic approach. Given the considerable heterogeneity in educational attainment, income, language use, and generational status among AANHPI individuals [10], we used LCA to uncover meaningful subtypes based on these core indicators. We then characterized these latent classes using additional demographic variables, including age, AANHPI origin, household size, marital status, and employment status. Finally, we assessed the replicability of the identified class structure using a second independent survey sample. This multi-step analytic approach with the use of two study samples allows for a more nuanced understanding of heterogeneity within AANHPI communities and lays the groundwork for more targeted and data-informed research, practice, and decision-making.

Study 1: Analysis of the Amplify AAPI Survey

Methods

Participants and procedures.

In a joint effort by RAND and the Robert Wood Johnson Foundation (RWJF), researchers supplemented the 2023 sample for the National Survey of Health Attitudes (NSHA) [27] with additional AAPI respondents drawn from the Amplify Panel, a probability-based panel of AANHPI populations conducted by NORC at the University of Chicago. Data were accessed May-June 2025; authors had no access to information that could identify individual participants during or after data collection.

Participants for the current study were drawn from the May 2024 wave of the Amplify AAPI Omnibus Survey, a national, probability-based survey of Asian American, Native Hawaiian, and Pacific Islander (AANHPI) adults administered by NORC at the University of Chicago. The survey targeted individuals aged 18 years or older residing in the United States, including all 50 states and the District of Columbia. A total of 5,303 panelists were invited to participate, and 1,026 individuals completed the survey during the field period from May 6 to May 10, 2024, yielding a completion rate of 19.3% and a weighted cumulative response rate of 3.3%.
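As a quick arithmetic check, the completion rate quoted above can be reproduced from the two counts in the paragraph. The function name is illustrative only. The weighted cumulative response rate is not recomputed here, because it also depends on panel recruitment and retention stages that the text does not report.

```python
# Counts reported in the paragraph above.
invited = 5303
completed = 1026


def completion_rate(n_completed: int, n_invited: int) -> float:
    """Share of invited panelists who completed the survey, in percent."""
    return 100.0 * n_completed / n_invited


rate = completion_rate(completed, invited)
print(f"Completion rate: {rate:.1f}%")  # prints "Completion rate: 19.3%"
```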

Participants were recruited from the Amplify AAPI Panel, a probability-based household panel designed to be representative of the U.S. AANHPI population. The panel includes members recruited through two sources: (1) a dedicated AAPI panel built using targeted address-based sampling and (2) AAPI respondents from NORC’s flagship AmeriSpeak Panel. Panel recruitment incorporated multistage sampling with stratification by age, gender, education, and geographic location. To enhance representativeness and reduce language-related selection bias, panel materials and surveys were available in English, Mandarin, Cantonese, Vietnamese, and Korean.

Surveys were administered using a mixed-mode design. Respondents completed the survey either online or via telephone interview, depending on their indicated preference during panel enrollment. Telephone interviews were conducted in English only. NORC implemented a randomized block design to rotate question order and mitigate order effects, as the Omnibus Survey included questions from multiple research sponsors.

To ensure data quality, NORC applied strict data cleaning procedures. A total of 103 cases were excluded due to response quality issues, including speeding, excessive item nonresponse, and straight-lining on grid items. Final analytic weights were computed in multiple stages: panel base weights, study-specific base weights accounting for sampling probabilities, and final weights adjusted for nonresponse and calibrated to U.S. Census American Community Survey (ACS) 2018–2022 benchmarks on age, gender, education, nativity, and AANHPI subgroup. Additional information about panel methodology and recruitment is publicly available [28].
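The text says the final weights were calibrated to ACS benchmarks but does not name the algorithm. One common calibration technique is raking (iterative proportional fitting), sketched below under that assumption. The function name, the toy respondents, and the target shares are all hypothetical and are not taken from NORC's procedure or from ACS data.

```python
import numpy as np


def rake(weights, groups, targets, n_iter=50):
    """Rake weights so that weighted category shares match target shares.

    weights: base weights, shape (n,)
    groups:  dict of margin name -> integer category codes, shape (n,)
    targets: dict of margin name -> target shares per category (sum to 1)
    Every category must contain at least one respondent.
    """
    w = np.asarray(weights, dtype=float).copy()
    for _ in range(n_iter):
        for name, codes in groups.items():
            codes = np.asarray(codes)
            share = np.asarray(targets[name], dtype=float)
            totals = np.bincount(codes, weights=w, minlength=len(share))
            # Scale each category so its weighted share equals the target.
            factors = share * w.sum() / totals
            w *= factors[codes]
    return w


# Hypothetical toy sample: 10 respondents, two calibration margins.
age = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2, 2])
nativity = np.array([0, 1, 0, 1, 1, 0, 0, 1, 1, 1])
base = np.ones(10)
targets = {"age": [0.3, 0.4, 0.3], "nativity": [0.35, 0.65]}  # made-up shares

w = rake(base, {"age": age, "nativity": nativity}, targets)
print(np.bincount(age, weights=w) / w.sum())
print(np.bincount(nativity, weights=w) / w.sum())
```

Each pass rescales one margin at a time, so the total weight is preserved while the weighted shares move toward all targets jointly.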

Measures

The survey instrument included a comprehensive set of sociodemographic measures developed and administered by NORC at the University of Chicago as part of the Amplify AAPI May 2024 Omnibus Survey. These measures were selected to capture key indicators relevant to social stratification, cultural background, and demographic heterogeneity within the AANHPI population.

Basic sociodemographic variables.

Respondents self-reported their gender (male, female), age group (18–29, 30–44, 45–59, 60+), and educational attainment (less than high school, high school diploma or GED, some college, bachelor’s degree, postgraduate/professional degree). Annual household income was categorized into five levels: less than $50,000; $50,000–$74,999; $75,000–$99,999; $100,000–$149,999; and $150,000 or more. Marital status was collected and classified as married, divorced/separated, or never married. Employment status was categorized as employed by others (employee), self-employed, not working (e.g., homemakers, job seekers), or retired. Respondents reported the number of individuals residing in their household, categorized as 1, 2, 3–4, or 5 or more persons.

Race/ethnicity.

Respondents were asked to identify their specific AANHPI origin, with mutually exclusive options including Chinese, Asian Indian, Filipino, Vietnamese, Korean, Japanese, Native Hawaiian/Pacific Islander, other singular AAPI origin, or multiple AAPI origins. Those who selected multiple categories were classified into a “multiple AAPI origins” category for analysis.

Nativity and generational status.

Participants were asked their place of birth and their parents’ place of birth, enabling classification into first generation (foreign-born), second generation (U.S.-born with foreign-born parents), or third-plus generation (U.S.-born with U.S.-born parents).
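The classification rule above can be written as a small function. This is a sketch, not the survey's actual coding logic: the function and argument names are hypothetical, and the handling of U.S.-born respondents with exactly one U.S.-born parent is an assumption that follows the Fig 3 category label ("3rd generation born to one or both US parents").

```python
def generation(respondent_us_born: bool, n_parents_us_born: int) -> str:
    """Classify generational status from respondent and parent nativity.

    n_parents_us_born is 0, 1, or 2. Treating one U.S.-born parent as
    third-plus generation is an assumption based on the Fig 3 label.
    """
    if not respondent_us_born:
        return "first"        # foreign-born
    if n_parents_us_born == 0:
        return "second"       # U.S.-born, both parents foreign-born
    return "third_plus"       # U.S.-born, one or both parents U.S.-born


print(generation(False, 0))  # first
print(generation(True, 0))   # second
print(generation(True, 2))   # third_plus
```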

Language spoken at home.

Language spoken at home was assessed with a binary indicator: English or a non-English language (e.g., Mandarin, Cantonese, Korean, Vietnamese). The inclusion of Asian languages in survey administration aimed to reduce measurement error and underrepresentation of linguistically isolated households.

All sociodemographic variables were used in descriptive analyses and served as covariates or inputs to the latent class analysis (LCA), which aimed to identify meaningful subgroups within the AANHPI population based on shared patterns of educational attainment, income, language use, and generational status.

Ratings of perceived community well-being.

Two well-being ratings were included in the current analysis. First, for national well-being, participants rated, on a five-point scale, the well-being of “most people living in the United States” with response options ranging from “Excellent” to “Poor.” Second, for community well-being, participants rated the well-being of “the community in which you live” using the same five-point scale described above.

Statistical analysis

The analytic plan comprised three steps: 1) identify latent classes based on key sociodemographic variables using the Amplify AAPI survey; 2) characterize the latent classes identified using additional variables; and 3) replicate the latent classes using a second sample. The analytic sample included 1,026 individuals with complete data on the variables used in the latent class analysis (LCA) and auxiliary characterization. LCA is a person-centered, model-based clustering technique used to identify unobserved (latent) subgroups within a population based on individuals’ patterns of responses across observed categorical variables. LCA was appropriate for this analysis because it enables the empirical identification of distinct subgroups of AAPI individuals who share similar sociodemographic profiles, without relying on arbitrary groupings. This method is particularly valuable for disaggregating heterogeneous populations to uncover meaningful subtypes that may differ in health, social, or behavioral outcomes.
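To make the idea of model-based clustering on categorical responses concrete, the following is a minimal expectation-maximization (EM) sketch of an unweighted latent class model. It is an illustration only and is not the Mplus estimation used in this study: it ignores survey weights, uses a single random start, and runs on simulated data. All names and simulated values are hypothetical.

```python
import numpy as np


def fit_lca(X, n_classes, n_iter=200, seed=0):
    """Fit a latent class model to integer-coded categorical data via EM.

    X: array (n, J) of category codes starting at 0.
    Returns class proportions, per-item probability tables of shape
    (n_classes, n_categories), posterior membership probabilities, and
    the log-likelihood evaluated in the final E-step.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    n, J = X.shape
    n_cats = [int(X[:, j].max()) + 1 for j in range(J)]
    pi = np.full(n_classes, 1.0 / n_classes)
    # Random starting values for the item-response probabilities.
    theta = [rng.dirichlet(np.ones(c), size=n_classes) for c in n_cats]
    for _ in range(n_iter):
        # E-step: posterior class probabilities, assuming items are
        # independent within a class (local independence).
        log_joint = np.tile(np.log(pi), (n, 1))
        for j in range(J):
            log_joint += np.log(theta[j][:, X[:, j]]).T
        log_norm = np.logaddexp.reduce(log_joint, axis=1)
        post = np.exp(log_joint - log_norm[:, None])
        # M-step: update class sizes and item-response probabilities.
        pi = post.mean(axis=0)
        for j in range(J):
            counts = np.stack(
                [post[X[:, j] == c].sum(axis=0) for c in range(n_cats[j])],
                axis=1,
            ) + 1e-9  # tiny constant avoids log(0) for empty cells
            theta[j] = counts / counts.sum(axis=1, keepdims=True)
    return pi, theta, post, log_norm.sum()


# Simulated example: two well-separated classes, five binary indicators.
rng = np.random.default_rng(1)
n = 2000
true_class = (rng.random(n) < 0.7).astype(int)
p_yes = np.where(true_class[:, None] == 1, 0.9, 0.1)
X = (rng.random((n, 5)) < p_yes).astype(int)

pi, theta, post, loglik = fit_lca(X, n_classes=2)
print("Estimated class proportions:", np.round(np.sort(pi), 3))
```

In practice, software such as Mplus runs many random starts and supports weights and auxiliary variables; a single-start sketch like this one can stop at a local optimum on less clearly separated data.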

LCA models were estimated in Mplus version 8.0 [29] using the manual three-step approach [30]. Four indicator variables were included in the LCA model: education level, annual household income, language spoken at home, and generational status in the U.S. The optimal class solution was determined by evaluating a combination of fit indices, including the negative two log likelihood (−2LL), Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), sample-size adjusted BIC (aBIC), the Vuong-Lo-Mendell-Rubin adjusted likelihood ratio test (VLMR), and the Lo-Mendell-Rubin (LMR) adjusted likelihood ratio test. For all log-likelihood-based indices (−2LL, AIC, BIC, aBIC), lower values indicate better model fit [31]. The VLMR and LMR tests provide significance tests comparing models with k versus k–1 classes (e.g., 5 versus 4 classes). We also considered the theoretical interpretability and sample sizes of the emergent classes.

To characterize the four latent classes, we examined associations with additional sociodemographic variables as well as the two well-being ratings not included in the class formation: age (categorized as 18–29, 30–44, 45–59, and 60+), Asian origin (nine categories), household size (1–6+ persons), marital status (married, divorced/separated, never married), and employment status (working for an employer, self-employed, not working, retired). These variables were treated as auxiliary variables in Mplus. The distal categorical outcome (DCAT) approach was used to compare proportions across latent classes without influencing class membership estimation, while the Bolck, Croon, and Hagenaars (BCH) method was used to examine relationships between latent class membership and continuous auxiliary variables.

For the two well-being ratings, chi-square tests were conducted to assess whether the distribution of these ratings differed significantly across the identified latent classes. All comparisons were conducted using listwise deletion for cases with missing auxiliary variable data (e.g., marital status), consistent with Mplus default settings. Significant omnibus and pairwise differences (p < .05) were used to interpret meaningful distinctions among the classes.

Study 1 results

Table 1 presents the sample characteristics of the Amplify sample. This sample (N = 1,026) was nearly evenly split by gender, with 52.9% male and 47.1% female. The largest share identified as Chinese (35.8%), and most participants were highly educated (84.8% held at least a bachelor’s degree), aged 30–59 (70.2%), and had household incomes of $100,000 or more (63.8%).

Table 1. Sample Characteristics of the Amplify Sample (N = 1,026).

https://doi.org/10.1371/journal.pone.0336912.t001

LCA Results

As shown in Table 2, we estimated latent class models with one through five classes using key sociodemographic variables (education, income, English language use at home, and generational status) to identify distinct subgroups within the AANHPI sample. Model selection was guided by a combination of statistical fit indices, likelihood ratio tests, and substantive interpretability.

Table 2. Model Fit Statistics and Likelihood Ratio Tests for Latent Class Analyses.

https://doi.org/10.1371/journal.pone.0336912.t002

The four-class model was selected as the best-fitting and most interpretable solution. While the three-class model demonstrated strong fit, with improvements over the two-class model indicated by significant VLMR (p = .019) and LMR adjusted likelihood ratio tests (p = .020), the four-class solution provided additional model improvement. Specifically, the bootstrapped likelihood ratio test comparing the three- and four-class solutions was significant (p < .001), as were the VLMR (p = .0132) and LMR (p = .0142) tests. Although the four-class solution had a slightly higher BIC compared to the three-class model (7470.573 vs. 7449.896), the adjusted BIC favored the four-class model (7359.409 vs. 7367.317).
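The disagreement between BIC and aBIC noted above follows from their penalty terms. The sketch below uses the standard textbook formulas (AIC = −2LL + 2p, BIC = −2LL + p·ln n, aBIC = −2LL + p·ln((n + 2)/24)), which are assumed here to match what Mplus reports. The helper names are hypothetical. The second function backs out the number of free parameters p implied by a reported BIC and aBIC pair, taking n = 1,026 as the sample size.

```python
import math


def information_criteria(log_likelihood, n_params, n_obs):
    """Return -2LL, AIC, BIC, and sample-size adjusted BIC."""
    neg2ll = -2.0 * log_likelihood
    return {
        "-2LL": neg2ll,
        "AIC": neg2ll + 2 * n_params,
        "BIC": neg2ll + n_params * math.log(n_obs),
        "aBIC": neg2ll + n_params * math.log((n_obs + 2) / 24),
    }


def implied_n_params(bic, abic, n_obs):
    """Parameter count implied by a BIC/aBIC pair under the formulas above."""
    return (bic - abic) / (math.log(n_obs) - math.log((n_obs + 2) / 24))


n_obs = 1026
# (BIC, aBIC) pairs as reported in the text for the 3- and 4-class models.
reported = {3: (7449.896, 7367.317), 4: (7470.573, 7359.409)}
for k, (bic, abic) in reported.items():
    print(k, "classes: implied parameters =",
          round(implied_n_params(bic, abic, n_obs), 2))

# Per-parameter penalty under each criterion at this sample size.
print("BIC penalty per parameter: ", round(math.log(n_obs), 3))
print("aBIC penalty per parameter:", round(math.log((n_obs + 2) / 24), 3))
```

Under these assumptions the reported values are consistent with roughly 26 and 35 free parameters for the three- and four-class models. Because BIC charges about 6.9 per parameter and aBIC about 3.8 at this sample size, the same improvement in −2LL can be outweighed by the BIC penalty yet still lower the aBIC.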

The average latent class probabilities for most likely class membership ranged from 0.749 to 0.913, indicating adequate classification certainty and separation among classes. Class proportions were as follows: Class 1 (30.6%), Class 2 (7.8%), Class 3 (17.7%), and Class 4 (43.9%). The four latent classes reflected distinct sociodemographic profiles. Class 1 was predominantly high-income, highly educated, English-speaking, and second- or third-generation individuals. Class 2 was a smaller group with lower income and education levels, predominantly English-speaking, and mostly third-generation. Class 3 included individuals with mixed educational attainment and income levels, primarily first-generation and limited English use at home. Class 4 was the largest group, characterized by high income, postgraduate education, and a high proportion of first-generation immigrants with limited English use at home.
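The classification-certainty summary reported above can be illustrated as follows: each respondent is assigned to the class with the highest posterior probability, and the posterior probabilities for that class are then averaged among the respondents assigned to it. The posterior matrix below is made up for illustration and the function name is hypothetical; this is not output from the study's models.

```python
import numpy as np


def average_posterior_by_assigned_class(post):
    """Mean posterior probability of the assigned class, per class.

    post: array (n, K) of posterior class probabilities (rows sum to 1).
    Returns an array of length K; a class with no assigned cases
    would yield NaN.
    """
    post = np.asarray(post, dtype=float)
    assigned = post.argmax(axis=1)  # modal (most likely) class per person
    n_classes = post.shape[1]
    return np.array(
        [post[assigned == c, c].mean() for c in range(n_classes)]
    )


# Hypothetical posterior probabilities for four respondents, two classes.
post = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.3, 0.7],
    [0.4, 0.6],
])
print(average_posterior_by_assigned_class(post))  # [0.85 0.65]
```

Values closer to 1 indicate that respondents are assigned to their modal class with little ambiguity.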

Class characterization results

After identifying the optimal four-class solution, we further characterized the latent classes using additional sociodemographic variables: age, Asian origin, household size, marital status, and employment status. Table 3 presents the class characterization results. Equality tests revealed significant differences across classes for all variables (p < .001), indicating meaningful heterogeneity in demographic composition. Figs 1 through 4 show the probability by class for each of the four LCA indicator variables.

Table 3. Class Characterization Results from the Amplify Sample.

https://doi.org/10.1371/journal.pone.0336912.t003

Fig 1. Probability by Class and Category for English at Home.

Amplify Sample: Probability distribution of language spoken at home across the four latent classes identified in the latent class analysis. Classes 1 and 2 are predominantly composed of individuals who speak English at home. Classes 3 and 4 show substantially higher probability of speaking languages other than English at home. Categories include: English at Home (green) and Other than English (orange).

https://doi.org/10.1371/journal.pone.0336912.g001

Fig 2. Probability by Class and Category for Income.

Amplify Sample: Probability distribution of annual household income across the four latent classes identified in the latent class analysis. Class 1 is predominantly concentrated in the highest income bracket of $100,000 or more. Class 2 shows elevated probability in the under $60,000 income bracket. Class 3 demonstrates a more balanced distribution across income categories. Class 4 is concentrated in the highest income bracket. Categories include: $100,000 or more (green), $60,000 to under $100,000 (orange), and under $60,000 (blue).

https://doi.org/10.1371/journal.pone.0336912.g002

Fig 3. Probability by Class and Category for U.S. Generation.

Amplify Sample: Probability distribution of generational and immigration status across the four latent classes identified in the latent class analysis. Class 1 is predominantly composed of individuals who are 2nd generation born in the US or 3rd generation. Class 2 is heavily represented by 3rd generation individuals born to one or both US parents. Class 3 shows substantial representation across all generational categories. Class 4 demonstrates high probability of being 1st generation born outside the US to parents born outside the US. Categories include: 1st generation born outside US to parents born outside US (green), 2nd generation born in US to parents both born outside US (orange), and 3rd generation born to one or both US parents (blue).

https://doi.org/10.1371/journal.pone.0336912.g003

Fig 4. Probability by Class and Category for Education.

Amplify Sample: Probability distribution of educational attainment levels across the four latent classes identified in the latent class analysis. Class 1 shows elevated probability of post-graduate or professional degrees. Class 2 demonstrates minimal representation in post-graduate education with higher representation in lower education categories. Class 3 shows a balanced distribution across multiple education levels. Class 4 is predominantly represented in postgraduate or professional degrees. Categories include: Bachelor’s degree (green), Less than high school/graduate equivalent (orange), Post-graduate study/professional degree (blue), and Some college/associate’s degree (pink).

https://doi.org/10.1371/journal.pone.0336912.g004

Class 1 (30.6%) consisted primarily of older individuals, with nearly 26% aged 60 or older and another 31% aged 45–59. Members of this class were highly educated, had high income, and almost exclusively spoke English at home. They were predominantly married (77.9%) and employed as salaried workers (64.9%). Most lived in two-person households. Asian origin composition was largely Chinese (33.7%) and Japanese (22.0%), with smaller proportions from other subgroups.

Class 2 (7.8%) was the smallest class, characterized by older individuals (41.1% were age 60+), lower income, and lower educational attainment. This class had the highest proportion of one-person households (49.0%) and showed a diverse AAPI origin composition, including high representation of Japanese (32.3%) and individuals identifying with multiple AAPI origins (18.5%). Marital status was diverse, with only 28.6% married and a majority never married or divorced/separated. Employment status was also varied, with 28.4% retired and 37.0% not working.

Class 3 (17.7%) was composed of younger individuals, with 27.4% aged 18–29 and 25.9% aged 30–44. Members were predominantly 1st generation immigrants who spoke a language other than English at home. Educational attainment centered around a bachelor’s degree, and household sizes were relatively balanced. Class 3 had higher representation of Chinese (30.0%) and Filipino (14.3%) origin. Marital and employment status were also balanced, with 42.3% married and 37.8% employed, while 29.0% were not working.

Class 4 (43.9%), the largest class, included younger adults primarily in the 30–44 (50.7%) and 45–59 (34.6%) age ranges. This group was highly educated, had high income, and was predominantly 1st generation, although language use at home varied. Members were largely of Chinese origin (45.3%), with others identifying across a range of AAPI backgrounds. Most were married (73.9%), lived in moderately sized households, and were working as employees (80.5%).

Well-being ratings by latent class

We examined differences across the four latent classes in self-rated perceptions of national and community well-being using continuous ratings (1 = Excellent to 5 = Poor). Perceptions of national well-being were most negative in Class 2 (M = 3.745, SE = 0.120), followed by Class 3 (M = 3.465, SE = 0.072), Class 4 (M = 3.313, SE = 0.043), and Class 1 (M = 3.299, SE = 0.052), with relatively modest variation. These differences were statistically significant overall (χ² = 14.083, p = 0.003), with Class 2 differing significantly from Classes 1 and 4. Stronger variation was observed in perceptions of community well-being. Respondents in Class 2 rated their community’s well-being least favorably (M = 3.515, SE = 0.144), followed by Class 3 (M = 3.141, SE = 0.081), Class 4 (M = 2.640, SE = 0.049), and Class 1 (M = 2.549, SE = 0.060). These differences were also statistically significant (χ² = 62.457, p < .001), with multiple significant pairwise contrasts, especially between Class 2 and the other classes.

Study 2: Analysis of the National Survey of Health Attitudes (NSHA) sample

Methods

Participants and procedures.

Participants for the NSHA analysis were drawn from the 2023 National Survey of Health Attitudes (NSHA), a large, nationally representative survey designed to assess public views about health, well-being, and health equity across diverse U.S. populations. Data were collected by RAND in partnership with the Robert Wood Johnson Foundation. The final analytic sample (N = 5,620) combined two national probability-based online survey panels: the American Life Panel (ALP; n = 1,570) and the KnowledgePanel (n = 4,050), both of which are long-standing survey platforms commonly used for U.S. health and social research.

All ALP respondents included in the NSHA sample were panelists who had previously participated in the 2015 wave of the NSHA and remained active in the ALP at the time of the 2023 fielding. KnowledgePanel respondents were newly sampled for the 2023 administration, which allowed for adjustments to the sampling strategy to better meet project goals. Specifically, KnowledgePanel recruitment intentionally oversampled Black, Hispanic, and, where feasible, Asian American, Native Hawaiian, and Pacific Islander (AANHPI) respondents to support subgroup analyses. Sampling procedures were designed to ensure national representativeness and were weighted to reflect the broader U.S. adult population using post-stratification benchmarks from the American Community Survey.

Survey data were collected between March and April 2023 via self-administered web-based questionnaires. Respondents could complete the survey using either desktop or mobile devices. Participants provided informed consent electronically prior to participation, and survey instructions and items were available in English only. The full survey instrument included questions about perceived health, well-being, personal and community experiences with structural barriers, and attitudes toward equity-promoting policy actions. Data were accessed May-June 2025; authors had no access to information that could identify individual participants during or after data collection.

Measures

Race/ethnicity.

To capture racial and ethnic diversity, survey participants were first asked to select all applicable racial and ethnic groups (e.g., Asian, Black or African American, Hispanic or Latino) and then provide further detail through subgroup categories. For instance, those selecting “Asian” could choose one or more detailed subgroups (e.g., Chinese, Vietnamese, Filipino, Korean, Indian, Japanese). This race/ethnicity item design follows the revised federal recommendations for data collection (OMB Statistical Policy Directive No. 15) and enables granular identification of ethnic subgroup identity [7]. Respondents identifying with Native American groups were asked to specify their tribal affiliation via free-text entry. Notably, participants could endorse more than one racial or ethnic identity, and as a result, some were categorized into multiple subgroups in descriptive analyses.
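
Because respondents could endorse more than one identity, descriptive tallies count a person once in every subgroup they selected, so subgroup percentages can sum to more than 100%. A minimal sketch of that tallying, using hypothetical responses and a hypothetical function name:

```python
from collections import Counter


def subgroup_percentages(responses):
    """Percent of respondents endorsing each subgroup.

    A respondent who selects several subgroups is counted once in each of them.
    """
    n = len(responses)
    counts = Counter(group for selected in responses for group in set(selected))
    return {group: 100.0 * count / n for group, count in counts.items()}


# Hypothetical responses (not NSHA data): the third respondent endorses two subgroups.
responses = [["Chinese"], ["Filipino"], ["Chinese", "Vietnamese"], ["Korean"]]
percentages = subgroup_percentages(responses)
total = sum(percentages.values())  # exceeds 100 whenever anyone selects multiple subgroups
```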

Well-being ratings.

As in Study 1, two well-being ratings were included in the current analysis. First, for national well-being, participants rated, on a five-point scale, the well-being of “most people living in the United States” with response options ranging from “Excellent” to “Poor.” Second, for community well-being, participants rated the well-being of “the community in which you live” using the same five-point scale.

Statistical analysis

Similar to Study 1, we used LCA to identify distinct subgroups among Asian American respondents in the 2023 NSHA sample, employing a person-centered analytic approach. Given the relatively small number of AAPI respondents in the NSHA (n = 318), we focused on a reduced set of structural socioeconomic indicators to maximize interpretability and class stability while illustrating the utility of LCA in typical survey samples.

The final model included three categorical input variables: educational attainment (less than high school, high school diploma/GED, some college, bachelor’s degree or higher), annual household income (categorized into three levels: < $50,000; $50,000–$99,999; ≥ $100,000), and employment status (employed, unemployed, retired/other). Additional covariates, including census region, urbanicity, and homeownership, were considered in preliminary models but ultimately excluded from the final LCA due to insufficient variation and minimal contribution to class differentiation. The LCA method was detailed in the Study 1 Statistical Analysis section.
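
For reference, the standard latent class model for categorical indicators can be written as below. The notation is generic and is an assumption of this illustration rather than taken from the Study 1 description: $C$ is the number of classes, $\pi_c$ the class proportions, $J = 3$ the number of indicators (education, income, employment), $K_j$ the number of categories of indicator $j$, and $\rho_{jk \mid c}$ the probability that a member of class $c$ falls in category $k$ of indicator $j$. The model assumes the indicators are independent within a class (local independence).

```latex
P(Y_i = y_i) = \sum_{c=1}^{C} \pi_c \prod_{j=1}^{J} \prod_{k=1}^{K_j}
  \rho_{jk \mid c}^{\,\mathbb{1}[y_{ij} = k]},
\qquad \sum_{c=1}^{C} \pi_c = 1, \quad \sum_{k=1}^{K_j} \rho_{jk \mid c} = 1 .

P(c \mid y_i) = \frac{\pi_c \prod_{j=1}^{J} \rho_{j y_{ij} \mid c}}
  {\sum_{c'=1}^{C} \pi_{c'} \prod_{j=1}^{J} \rho_{j y_{ij} \mid c'}}
```

The second expression is the posterior probability of class membership, which is the quantity used to assign respondents to their most likely class.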

Study 2 results

As shown in Table 4, the NSHA subsample (N = 318) was nearly evenly split by gender and predominantly comprised individuals of Chinese (29.9%), Asian Indian (17.0%), and Filipino (12.6%) origin. Most participants were highly educated (76.1% with a bachelor’s degree or higher), aged 30 and older (88.7%), and reported household incomes of $100,000 or more (66.4%).

Table 4. Sample Characteristics for NSHA Subsample (N = 318).

https://doi.org/10.1371/journal.pone.0336912.t004

LCA results

LCA was used to identify distinct sociodemographic subgroups among Asian American respondents in the 2023 NSHA sample (N = 318). The model incorporated educational attainment, household income, and employment status as indicator variables. Model fit statistics are presented in Table 5. The two-class solution was selected as the best-fitting model, demonstrating significantly improved fit over the one-class model across all indices (AIC, BIC, aBIC) and supported by statistically significant VLMR, LMR, and BLRT tests (p < .0001). The three-class solution did not yield significant improvement in model fit and was therefore not retained.
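
The information criteria named above follow standard formulas, sketched below. The log-likelihood values are hypothetical placeholders (the fitted values appear in Table 5 and are not reproduced here), and the parameter count assumes the category counts listed in the Statistical analysis section (four education, three income, and three employment categories) with no parameter constraints.

```python
import math


def lca_free_parameters(n_classes, categories_per_indicator):
    """(C - 1) class proportions plus C * sum(K_j - 1) item-response probabilities."""
    return (n_classes - 1) + n_classes * sum(k - 1 for k in categories_per_indicator)


def fit_indices(log_likelihood, n_params, n_obs):
    """Return AIC, BIC, and sample-size-adjusted BIC (lower values indicate better fit)."""
    deviance = -2.0 * log_likelihood
    return {
        "AIC": deviance + 2 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
        "aBIC": deviance + n_params * math.log((n_obs + 2) / 24),
    }


categories = [4, 3, 3]  # assumed from the indicator descriptions in the text
n_obs = 318
# Hypothetical log-likelihoods for illustration only; these are not the values in Table 5.
hypothetical_ll = {1: -900.0, 2: -850.0, 3: -846.0}
results = {
    c: fit_indices(ll, lca_free_parameters(c, categories), n_obs)
    for c, ll in hypothetical_ll.items()
}
best_by_bic = min(results, key=lambda c: results[c]["BIC"])
```

Each added class costs eight more parameters under these assumptions, so a third class must improve the log-likelihood substantially before the BIC penalty is offset. The likelihood-ratio-based tests (VLMR, LMR, BLRT) are separate procedures and are not covered by this sketch.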

Table 5. Model Fit Comparison (NSHA Sample; N = 318).

https://doi.org/10.1371/journal.pone.0336912.t005

Class 1 (High SES), which comprised 57.7% of the sample, was characterized by higher levels of educational attainment and household income, with a greater proportion of individuals currently employed. This group included more respondents in younger and middle-aged categories, with over one-third (38.9%) between the ages of 30 and 44. The majority of individuals in this class reported being married (73.4%) and owning their home (81.6%). In contrast, Class 2 (Low SES), which comprised the remaining 42.3% of the sample, had lower income and education levels, and a higher representation of older adults, with 46.2% aged 60 and above. Members of this group were more likely to be retired or unemployed, renters (54.5%), and never married (32.8%).

Differences also emerged by census region and urbanicity, with Class 1 more likely to reside in the Northeast and Class 2 more likely to reside in the West. However, nearly all participants across both classes lived in urban areas, limiting the utility of rural-urban stratification.

Class characterization results

Following the identification of the optimal two-class solution, we further examined demographic and structural differences between the classes using age, household size, marital status, homeownership, and geographic region. Class characterization results are presented in Table 6. Figs 5–7 illustrate the probability by class for each category of the LCA indicator variables (income, employment status, and education).

Table 6. Class Characterization Results from the NSHA Subsample.

https://doi.org/10.1371/journal.pone.0336912.t006

Fig 5. Probability by Class and Category for Income.

NSHA Sample: Probability distribution of annual household income across the two latent classes identified in the latent class analysis. Class 1 is predominantly concentrated in the highest income bracket, reflecting high socioeconomic status. Class 2 is heavily concentrated in lower income brackets, with minimal representation in the highest income category, reflecting economic disadvantage. Categories include: $100,000+ (green), $50,000 - $100,000 (orange), and <$50,000 (blue).

https://doi.org/10.1371/journal.pone.0336912.g005

Fig 6. Probability by Class and Category for Employment Status.

NSHA Sample: Probability distribution of employment status across the two latent classes identified in the latent class analysis. Class 1 demonstrates substantially higher employment probability, reflecting greater economic stability. Class 2 shows greater economic precarity, with substantially higher unemployment probability compared to Class 1. Categories include: Employed (green) and Unemployed (orange).

https://doi.org/10.1371/journal.pone.0336912.g006

Fig 7. Probability by Class and Category for Education.

NSHA Sample: Probability distribution of educational attainment levels across the two latent classes identified in the latent class analysis. Class 1 shows elevated probability of post-graduate or professional degrees, with minimal representation in lower education categories. Class 2 demonstrates a more balanced distribution across bachelor’s degrees, with virtually no representation in post-graduate education. Categories include: Bachelor’s degree (green), Less than high school/graduate equivalent (orange), Post-graduate study/professional degree (blue), and Some college/associate’s degree (pink).

https://doi.org/10.1371/journal.pone.0336912.g007

Class 1 (57.7%) represented a high socioeconomic status profile, characterized by younger and middle-aged individuals: nearly 39% were aged 30–44 and 31% were aged 45–59, with only 9.4% under 30. This class had a higher homeownership rate (81.6%) and a lower proportion of renters (18.4%) than Class 2. Most respondents lived in moderately sized households of two to four people. A strong majority were married (73.4%), and only 4.2% were divorced or separated. The group was overwhelmingly urban (98.1%) and regionally distributed across the West (46.5%), Northeast (26.6%), and South (19.7%), with fewer respondents in the Midwest (7.3%).

Class 2 (42.3%) reflected a lower SES profile, with nearly half (46.2%) of respondents aged 60 or older. This class had greater economic precarity, as reflected in the higher rate of rental housing (54.5%) and lower homeownership (45.5%). Household composition skewed smaller, with 39.4% living in two-person households, although a slightly higher proportion reported larger households (5+ members) compared to Class 1. Just 58.4% were married, while 32.8% were never married and 8.8% were divorced or separated. Although the large majority resided in urban settings (93.6%), this group had higher rural representation (6.4%) than Class 1. Regionally, respondents in this class were more concentrated in the West (58.7%) and had lower representation in the Northeast (13.9%).

Well-being ratings by latent class

We examined differences across the two latent classes in self-rated perceptions of national and community well-being, rated on a scale from 1 (Excellent) to 5 (Poor), so that lower scores indicate more favorable ratings. Mean ratings of national well-being did not significantly differ across classes (High SES: M = 3.41, SE = 0.06; Low SES: M = 3.28, SE = 0.10; χ² = 1.098, p = 0.295). However, a significant difference emerged for community well-being (χ² = 5.689, p = 0.017), with the High SES class reporting a more favorable mean rating (M = 2.35, SE = 0.06) compared to the Low SES class (M = 2.66, SE = 0.11).
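
One simple way to compare two class means from their standard errors is a Wald chi-square test with one degree of freedom, sketched below. This is an approximation and not the authors' procedure: it treats the two classes as independent groups and uses the rounded means and standard errors reported above, so it does not reproduce the reported test statistics exactly, although it leads to the same conclusions at the 0.05 level.

```python
import math


def wald_chi_square(mean_a, se_a, mean_b, se_b):
    """Wald test for equality of two independent means.

    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    chi_sq = (mean_a - mean_b) ** 2 / (se_a ** 2 + se_b ** 2)
    # Upper tail of a chi-square distribution with 1 df: P(|Z| > sqrt(chi_sq)).
    p_value = math.erfc(math.sqrt(chi_sq / 2.0))
    return chi_sq, p_value


# Rounded means and standard errors as reported in the text (1 = Excellent, 5 = Poor).
national = wald_chi_square(3.41, 0.06, 3.28, 0.10)
community = wald_chi_square(2.35, 0.06, 2.66, 0.11)
```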

General discussion

Building from an effort in the 2023 NSHA to collect additional data that would better reflect health attitudes within the AANHPI community, this exploratory study aimed to use LCA to disaggregate data from AANHPI community members into meaningful subgroups to inform practice. Findings from this study underscore the demographic complexity and heterogeneity within the AANHPI population. The LCA identified four latent classes in the larger Amplify sample and two latent classes in the smaller NSHA sample. These classes represent distinct subgroups that vary not only in socioeconomic characteristics but also in language use, generational status, and national origin.

When compared to disaggregating by AANHPI origin group alone, LCA provided more nuanced insights that incorporated several important social determinants of health and helped overcome sample size limitations. The LCA results indicate that even in a relatively small and traditionally sampled survey like the NSHA, latent class analysis can reveal meaningful sociodemographic heterogeneity within aggregated racial/ethnic categories. Class characterization results suggest that meaningful class-based stratification exists within the AANHPI NSHA subsample. While both classes were predominantly urban due to the nature of the respondent pool, substantial differences emerged in age structure, household stability, marital status, and homeownership, factors that likely influence perceptions of well-being and opportunity. The two-class solution captures key structural dimensions of heterogeneity that may be obscured in aggregate racial/ethnic data.

Given the staggering cost of health inequities [1] and limited resources available for public health entities to address these health inequities, it is crucial that data disaggregation efforts be feasible and informative for action and practice [32]. While origin-specific factors such as culture, language, indigeneity, and history are important for developing tailored community-humble supports, origin-focused disaggregation approaches are limited by the availability of data on origin [14]. Given current societal shifts that may limit the collection and use of data, such as race or ethnicity, to identify potential health disparities, it is essential that public health efforts continue to utilize available data to inform action [2].

Community-based organizations and public health researchers have typically had limited capacity to use methods such as LCA in their research due to limited access to appropriate training and software. However, public health training programs offer instruction in these methodologies, and some public health departments are training their own staff in these methods to address health inequities within their communities. Additionally, freely available packages developed for the R environment can conduct LCA, allowing typically resource-constrained research teams to use these methods [33–35]. As more public health researchers adopt R as a standard programming language, the feasibility of incorporating methods such as LCA may continue to improve.
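
To make concrete what such software estimates, the following minimal sketch fits a latent class model by expectation-maximization using only the Python standard library. It is not the software used in this study, nor any of the R packages cited above; the function name, the smoothing constant, and the synthetic data are illustrative assumptions. Applied analyses should rely on established packages and multiple random starts.

```python
import math
import random


def fit_lca(data, n_categories, n_classes, n_iter=200, seed=0):
    """Fit a latent class model by EM.

    data: list of tuples of integer category codes, one code per indicator.
    n_categories: number of categories for each indicator.
    Returns (class proportions, item probabilities, log-likelihood), where the
    log-likelihood is evaluated at the parameters entering the final iteration.
    """
    rng = random.Random(seed)
    n = len(data)
    pi = [1.0 / n_classes] * n_classes
    # rho[c][j][k]: probability of category k on indicator j in class c (random start).
    rho = []
    for _ in range(n_classes):
        class_probs = []
        for k_j in n_categories:
            raw = [rng.random() + 0.5 for _ in range(k_j)]
            total = sum(raw)
            class_probs.append([r / total for r in raw])
        rho.append(class_probs)

    log_lik = float("-inf")
    for _ in range(n_iter):
        # E-step: posterior class membership probabilities for each respondent.
        posteriors = []
        log_lik = 0.0
        for row in data:
            joint = [
                pi[c] * math.prod(rho[c][j][y] for j, y in enumerate(row))
                for c in range(n_classes)
            ]
            marginal = sum(joint)
            log_lik += math.log(marginal)
            posteriors.append([p / marginal for p in joint])
        # M-step: update class proportions and item probabilities.
        for c in range(n_classes):
            weight_c = sum(post[c] for post in posteriors)
            pi[c] = weight_c / n
            for j, k_j in enumerate(n_categories):
                for k in range(k_j):
                    num = sum(
                        post[c] for post, row in zip(posteriors, data) if row[j] == k
                    )
                    # Tiny smoothing constant keeps probabilities away from exactly 0.
                    rho[c][j][k] = (num + 1e-6) / (weight_c + 1e-6 * k_j)
    return pi, rho, log_lik


# Synthetic data (not survey data): two clearly separated response patterns.
data = [(0, 0, 0)] * 40 + [(1, 1, 1)] * 60
pi, rho, log_lik = fit_lca(data, n_categories=[2, 2, 2], n_classes=2)
```

Class labels are arbitrary, so the recovered proportions should be compared after sorting. With real survey data the classes overlap, and the solution should be checked across several seeds.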

This study utilized observable variables associated with health inequities, such as educational attainment [36], household income [37], language spoken at home [38], and generational status [39], to identify latent groups within the AANHPI populations sampled. Both LCA and clustering methods have been used in intersectionality research to identify subgroups within populations often treated as homogeneous and have the potential to inform more tailored public health practice [26]. The identified differences in community well-being underscore how these sociocultural factors may shape individuals’ local outlooks and perceptions of health-related opportunity, in ways not readily captured by ancestry or racial/ethnic labels alone. These differences emphasize the limitations of treating AANHPI individuals as a single, uniform group in research or practice. Using these statistical approaches to identify potential inequities within an aggregated racial group may also allow for investigations that assess the impact of racism, a health risk factor that is often assessed by proxy using variables collected for race or ethnicity [18]. While it remains important to consider origin-specific factors when tailoring interventions and practices for AANHPI communities, reviewing the distribution of respondent AANHPI origin across classes and well-being indicators illuminates the potential opportunities of a person-centered disaggregation approach. LCA is not intended to replace disaggregated data collection and analysis; rather, LCA is offered as a complementary strategy.

This study has several limitations. Although the included surveys were intended as representative samples, unweighted data were used in this particular analysis, limiting the generalizability of the results. The majority of respondents for both the NSHA and Amplify panels endorsed being homeowners, living in urban areas, having a household income over $100,000, being married, and having a bachelor’s degree or higher; this may have resulted in an overrepresentation of individuals with higher socioeconomic status. Adequate representation of diverse AANHPI community members in health surveys and research is a longstanding issue [14,40]. While the LCA approach described in this study is an important tool, complementary community-engaged research is an important step to identify information on the health status and needs of community members underrepresented or excluded from existing data [41]. This research can also serve to explore community- and culturally specific aspects of health that existing tools and instruments are unable to capture. As the Amplify panel did not capture information on Multiracial/ethnic AANHPI community members with racial/ethnic identities outside of the AANHPI categories, the LCA analyses for each of the studies excluded individuals who identify as AANHPI and some other race or ethnicity. Multiracial/ethnic AANHPI people are a growing part of the AANHPI community, particularly within younger generations [42]. Public health data continue to identify Multiracial/ethnic communities, including Multiracial/ethnic AANHPI community members, as experiencing a high burden of adverse health outcomes, such as suicidal thoughts and behaviors [42–44]. Future research should consider how to include Multiracial/ethnic AANHPI community members in research, and practice recommendations should be inclusively developed.

Conclusion

What individual and societal differences, variations, and factors genuinely matter to health outcomes and health attitudes? This exploratory analysis suggests the benefits of a shift in disaggregation perspective, focusing on the relevance and impact of the identified variables rather than solely on the breadth of disaggregation. While this study used AANHPI populations as an illustrative example, LCA can be used within any population to disaggregate data and inform policy and practice recommendations.

This study demonstrates the utility of applying a person-centered, disaggregated analytic approach to capture the within-group heterogeneity of the AANHPI communities. The Amplify panel was intentionally designed to improve on standard national survey limitations. While an improvement, even dedicated sampling has limitations in capturing subpopulations. This study demonstrated the opportunity for methods such as LCA to elucidate critical insights on the health status and needs of AANHPI subgroups that may be masked by aggregated data and/or non-representative samples. This is crucial for informing tailored resource allocation and strategies that reflect the multifaceted circumstances of AANHPI individuals [45]. Critically, LCA can also support efforts to disaggregate data from a larger group using available variables. This process can inform advancements in research and practice and can support practitioners throughout the public health and healthcare system to consider other social factors that drive health inequities in different subpopulations. This approach can also enable the adoption of more inclusive analytic methods capable of capturing unmeasurable differences within the group.

It is important to recognize the value of intentional efforts to collect sufficient data to speak to the health needs within populations often underrepresented in health data when opportunities exist, as evidenced by the existence of the Amplify panel. Even if limited to origin-only stratification, the Amplify panel highlighted several differences across origin-groups in perceptions of well-being, particularly for people identifying as Native Hawaiian and Other Pacific Islander, that were not captured by the more traditionally sampled NSHA study. The sample size of the Amplify panel also allowed for more nuanced LCA, which can further inform future disaggregation efforts and the tailoring of supportive resources. Future use of LCA could be enhanced with targeted oversampling of smaller racial/ethnic communities. We encourage researchers to continue exploring how to better understand the health status and needs of populations underrepresented or excluded from existing data. Approaches such as integrating an oversample into an existing survey, conducting a complementary study that aims to generate a more representative sample, and partnering with communities to support the generation of data on communities by communities are examples of how we can bridge known and unknown data gaps.

We must also recognize the ongoing shortage of resources for public health activities and the increasing cost of healthcare, which will likely result in increased competition for available resources. Simultaneously, the availability of data on social drivers of health inequities within existing and future datasets is uncertain, requiring researchers to consider different avenues to inform practice recommendations. This study presents a feasible approach to data disaggregation using variables that were available in each study and provides important insights on differences in outcome variables that are important for practitioners to consider as resources are allocated. When paired with staff training in LCA methodologies, this approach may help health agencies make more efficient use of these limited resources. Future research and policy should continue to embrace these methodologies to promote data-driven solutions for decision-making that go beyond the data aggregate to make meaningful progress toward reducing differences in health outcomes.

Acknowledgments

We would like to extend our gratitude to the participants of the surveys included in this study for taking the time to share their perspectives on health and wellbeing in the United States. We would also like to thank our RAND colleagues Dulani Woods, Jennifer Huang Bouey, Madhumita Ghosh Dastidar, Nipher Malika, Eunice Wong, Laurie Martin, and Regina Shih as well as Vincent A. Eng from the Asian and Pacific Islander American Health Forum for contributing and reviewing earlier conceptual proposals that led to this work. We would also like to recognize the tremendous efforts of community leaders, communities, and researchers who have been working to advance efforts to make meaningful progress towards achieving health equity.

References

  1. LaVeist TA, Pérez-Stable EJ, Richard P, Anderson A, Isaac LA, Santiago R, et al. The Economic Burden of Racial, Ethnic, and Educational Health Inequities in the US. JAMA. 2023;329(19):1682–92. pmid:37191700
  2. Ponce NA, Becker T, Shimkhada R. Breaking Barriers with Data Equity: The Essential Role of Data Disaggregation in Achieving Health Equity. Annu Rev Public Health. 2025;46(1):21–42. pmid:39883940
  3. CSTE. Addressing gaps in public health reporting of race and ethnicity data for COVID-19: findings & recommendations among 45 state & local health departments. 2022.
  4. Bowen CM, Snoke J. Do No Harm Guide: Applying Equity Awareness in Data Privacy Methods. Urban Institute; 2023.
  5. Tran V. Asian Americans are falling through the cracks in data representation and social services. Urban Institute; 2018.
  6. Jang Y, Yoon H, Park NS, Rhee M-K, Chiriboga DA. Mental Health Service Use and Perceived Unmet Needs for Mental Health Care in Asian Americans. Community Ment Health J. 2019;55(2):241–8. pmid:30357724
  7. Revisions to OMB’s Statistical Policy Directive No. 15: Standards for Maintaining, Collecting, and Presenting Federal Data on Race and Ethnicity. 2024. https://www.federalregister.gov/documents/2024/03/29/2024-06469/revisions-to-ombs-statistical-policy-directive-no-15-standards-for-maintaining-collecting-and
  8. Asian Pacific Institute on Gender-Based Violence. Census Data API Identities. 2025. https://api-gbv.org/resources/census-data-api-identities/
  9. Nguyen KH, Lew KP, Trivedi AN. Trends in Collection of Disaggregated Asian American, Native Hawaiian, and Pacific Islander Data: Opportunities in Federal Health Surveys. Am J Public Health. 2022;112(10):1429–35. pmid:35952328
  10. Ruiz NG, Noe-Bustamante L, Shah S. Diverse cultures and shared experiences shape Asian American identities. Pew Research Center; 2023.
  11. Budiman A, Ruiz NG. Key facts about Asian origin groups in the U.S. Pew Research Center; 2021.
  12. King L, Deng WQ, Hinterland K, Rahman M, Wong BC, Mai C. Health of Asians and Pacific Islanders in New York City. New York City Department of Health and Mental Hygiene; 2021.
  13. Yeung D, Dong L. The health of Asian Americans depends on not grouping communities under the catch-all term. RAND. 2021.
  14. Obra JK, Lin B, Đoàn LN, Palaniappan L, Srinivasan M. Achieving Equity in Asian American Healthcare: Critical Issues and Solutions. J Asian Health. 2021;1(1):e202103. pmid:37872960
  15. Shih H. What has changed with detailed Asian group data for the 2020 census? Medium. 2024.
  16. California Department of Public Health. Asian and Pacific Islander Data Disaggregation Highlights: California Assembly Bill 1726 (2016). 2022.
  17. Boyd RW, Lindo EG, Weeks LD, McLemore MR. On racism: A new standard for publishing on racial health inequities. Health Affairs. 2020.
  18. Chokshi DA, Foote MMK, Morse ME. How to Act Upon Racism-not Race-as a Risk Factor. JAMA Health Forum. 2022;3(2):e220548. pmid:36218834
  19. Ponce NA, Becker T, Riti S, Scheitler A, Babey S. Data democracy in crisis: How changing federal data reshapes research and representation. The Milbank Quarterly. 2025.
  20. Kurien P, Purkayastha B. Why Don’t South Asians in the U.S. Count As “Asian”?: Global and Local Factors Shaping Anti‐South Asian Racism in the United States*. Sociological Inquiry. 2024;94(2):351–68.
  21. Kauh TJ, Read JG, Scheitler AJ. The Critical Role of Racial/Ethnic Data Disaggregation for Health Equity. Popul Res Policy Rev. 2021;40(1):1–7. pmid:33437108
  22. Yang YT, Sudarshan S. Data disaggregation and unintended consequences. Lancet. 2024;403(10426):528. pmid:38237627
  23. Chandra A, Martin LT, Acosta JD, Nelson C, Yeung D, Qureshi N, et al. Equity as a Guiding Principle for the Public Health Data System. Big Data. 2022;10(S1):S3–8. pmid:36070506
  24. Ponce NA, Shimkhada R, Adkins-Jackson PB. Making Communities More Visible: Equity-Centered Data to Achieve Health Equity. Milbank Q. 2023;101(S1):302–32. pmid:37096622
  25. Alcántara C, Suglia SF, Ibarra IP, Falzon AL, McCullough E, Alvi T, et al. Disaggregation of Latina/o Child and Adult Health Data: A Systematic Review of Public Health Surveillance Surveys in the United States. Popul Res Policy Rev. 2021;40(1):61–79.
  26. Bauer GR, Mahendran M, Walwyn C, Shokoohi M. Latent variable and clustering methods in intersectionality research: systematic review of methods applications. Soc Psychiatry Psychiatr Epidemiol. 2022;57(2):221–37. pmid:34773462
  27. Chandra A, Bugliari D, May LW, Weilant S, Nelson CD, Martin LT. 2023 National Survey of Health Attitudes: Description and Top-Line Summary Data. RAND Corporation; 2024.
  28. NORC. Our capabilities. 2025. https://amplifyaapi.norc.org/our-capabilities.html
  29. Muthén LK, Muthén BO. Mplus User’s Guide. Los Angeles, CA: Muthén & Muthén; 2017.
  30. Nylund-Gibson K, Grimm R, Quirk M, Furlong M. A Latent Transition Mixture Model Using the Three-Step Specification. Structural Equation Modeling: A Multidisciplinary Journal. 2014;21(3):439–54.
  31. Nylund KL, Asparouhov T, Muthén BO. Deciding on the Number of Classes in Latent Class Analysis and Growth Mixture Modeling: A Monte Carlo Simulation Study. Structural Equation Modeling: A Multidisciplinary Journal. 2007;14(4):535–69.
  32. Michaud J, Kates J, Oum S, Rouw A. U.S. Public Health. KFF; 2025.
  33. Van Lissa CJ, Garnier-Villarreal M, Anadria D. Recommended Practices in Latent Class Analysis Using the Open-Source R-Package tidySEM. Structural Equation Modeling: A Multidisciplinary Journal. 2023;31(3):526–34.
  34. Naldi L, Cazzaniga S. Research Techniques Made Simple: Latent Class Analysis. J Invest Dermatol. 2020;140(9):1676-1680.e1. pmid:32800180
  35. Mournet AM, Kleiman EM. Latent class analysis of depressive symptoms and associations with suicidal thoughts, plans, and attempts among a large national sample. Psychol Med. 2024;54(12):1–6. pmid:39320458
  36. Suiter SV, Meadows ML. Educational Attainment and Educational Contexts as Social Determinants of Health. Prim Care. 2023;50(4):579–89. pmid:37866832
  37. Chen AM. Barriers to health equity in the United States of America: can they be overcome? Int J Equity Health. 2025;24(1):39. pmid:39920763
  38. Kurlander D, Lam AG, Dawson-Hahn E, de Acosta D. Advocating for language equity: a community-public health partnership. Front Public Health. 2023;11:1245849. pmid:37915815
  39. Loi S, Li P, Myrskylä M. Unequal weathering: How immigrants’ health advantage vanishes over the life-course. J Migr Health. 2025;11:100303. pmid:39911450
  40. Wu B, Qi X. Addressing Health Disparities Among Older Asian American Populations: Research, Data, and Policy. Public Policy Aging Rep. 2022;32(3):105–11. pmid:35992733
  41. American Public Health Association. Advancing Community-Based Participatory Practice in Public Health. 2024.
  42. Yi SS. Data Equity and Multiracial and Multiethnic Communities. JAMA Netw Open. 2024;7(11):e2446839. pmid:39576649
  43. Shaff J, Wang X, Cubbage J, Bandara S, Wilcox HC. Mental health and Multiracial/ethnic adults in the United States: a mixed methods participatory action investigation. Front Public Health. 2024;11:1286137. pmid:38274534
  44. Substance Abuse and Mental Health Services Administration. Highlights by Race/Ethnicity for the 2023 National Survey on Drug Use and Health. 2024.
  45. Rajagopalan R, Fujimura JH. Will personalized medicine challenge or reify categories of race and ethnicity? Virtual Mentor. 2012;14(8):657–63. pmid:23351323