Abstract
Race is a critical variable in understanding health disparities, yet health databases lack consistent practices for identifying race. This rapid scoping review aimed to examine existing recommendations for identifying race in health databases and to highlight gaps in the current literature to guide future research. Following the Joanna Briggs Institute methodology and PRISMA-ScR guidelines, searches were conducted in MEDLINE, Embase, and Scopus for relevant literature published between January 2019 and February 2025. Articles were included if they addressed race identification in health databases, were available in English with full-text access, and were peer-reviewed studies, knowledge syntheses, or grey literature. All articles were double screened in Covidence, and twenty-one articles were included. Descriptive thematic analysis identified five recommendation categories: self-identification and patient-centered practice, standardization across healthcare systems, data quality and completeness, algorithmic and predictive methods, and disaggregated data use and cross-sector collaboration. Common findings included the value of self-identification, cross-system consistency, and tools such as natural language processing and imputation models. Some articles emphasized combining multiple strategies to improve system-wide practices, and overall, little conflicting evidence was observed. However, gaps remain in operationalizing these recommendations across healthcare settings. Future work should prioritize implementation-focused research and cross-jurisdictional comparisons to inform scalable, equity-driven improvements in race data practices. Ultimately, improving the consistency and accuracy of race data will enhance health equity monitoring, guide equitable resource distribution, and inform policies that better reflect the needs of racialized populations.
Citation: Chow M, Senthinathan A, El-Kotob R, Guilcher SJ (2025) Recommendations to improve race identification in health records: A rapid scoping review. PLoS One 20(12): e0339025. https://doi.org/10.1371/journal.pone.0339025
Editor: Usama Waqar, Emory University, UNITED STATES OF AMERICA
Received: September 19, 2025; Accepted: November 30, 2025; Published: December 29, 2025
Copyright: © 2025 Chow et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript and its Supporting Information files (S1 File and S2 File).
Funding: The author(s) received no specific operational funding for this work. Dr. Sara Guilcher is supported by the University of Toronto Centre for the Study of Pain salary award.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Health databases are comprehensive repositories that store patient health data collected across healthcare encounters; they include health administrative databases, electronic health records (EHRs), electronic medical records (EMRs), clinical administrative data, and health information systems [1]. The terms EHR and EMR are often used interchangeably; however, EMRs typically describe records within a single healthcare organization, whereas EHRs are shared across multiple settings to provide comprehensive patient care [2]. In general, these databases capture details on hospital admissions, diagnoses, treatments, and demographics, making them important tools for health system performance monitoring and research [3–5]. Capturing social determinants of health, such as race, supports equity-focused research, system performance monitoring, and tailored clinical care and policy [6]. However, race data are often not collected, incomplete, inconsistently reported, or oversimplified [6]. The capacity of health databases to support equity-focused research is therefore limited, which may obscure insights into systemic health disparities [6,7].
Although race was historically misunderstood as a biological concept, it is now widely accepted as a social determinant of health, with increasing attention focused on its role in perpetuating health inequities [8,9]. Race generally refers to categories of people grouped by perceived physical characteristics to which social meaning has been assigned. This classification is rooted in discrimination and social oppression and serves to maintain hierarchies of privilege and power [8,10]. In contrast, ethnicity refers to shared cultural identity, social practices, and heritage, such as language, religion, and customs [8,10]. Recognizing this difference is critical for designing equitable health interventions and policies. However, the current literature uses the terms “race” and “ethnicity” inconsistently, which complicates this work. To reduce this confusion, this review focuses on the definition of "race" while reporting the authors’ original terms and instances of conflation [11,12].
Racialized populations face disproportionate health burdens shaped by structural and systemic inequities, including discrimination, food insecurity, and unequal access to healthcare [7]. For instance, in Canada, chronic conditions such as diabetes are 2.3 times more prevalent among South Asian adults, 1.9 times more prevalent among Black adults, and 1.8 times more prevalent among Arab and West Asian adults than among White adults [13]. Mental health disparities are also evident: Southeast Asian and Arab adults are less likely to rate their mental health as excellent or good, and West Asian and Black populations report lower life satisfaction. Socioeconomic inequalities compound these disparities [14]. Food insecurity is 2.8 times more common among Black adults, and core housing need is more than twice as high among Arab, West Asian, and Black Canadians [15]. Indigenous groups also experience some of the highest unemployment levels in Canada [16]. Discrimination within healthcare further contributes to inequities: approximately 50% of racialized Canadians reported unfair treatment when accessing services, including being dismissed by healthcare providers, receiving less thorough assessments, or feeling disrespected during clinical encounters [17,18]. These patterns underscore the urgency of incorporating race data into health research to inform inclusive and effective interventions.
Race is increasingly understood as a fundamental variable in health research, particularly for detecting and reducing health disparities [19]. Yet the collection of race data in health databases, and its analysis, remain inconsistent across countries [19]. Some health systems collect race information directly through EHRs, while others rely on surveys, geographic proxies, or algorithmic estimates based on names or locations [19,20]. These practices differ in quality and reliability, often leading to misclassification, oversimplification, and racial data that cannot be compared across settings [20]. The lack of standardized frameworks, compounded by overgeneralized racial categories, missing data, and the absence of individual-level information, limits the ability to identify meaningful within-group differences, evaluate targeted interventions, and monitor longitudinal disparities in marginalized populations [21]. In health systems and research, racialized populations remain underrepresented because methods of collecting and reporting race data in health databases vary [20]. As a result, our understanding of how systemic racism and social determinants affect health outcomes for minority populations is limited [22,23]. Strengthening the consistency and accuracy of race data across health databases is therefore essential to support equity-driven research, policy development, and system-level change.
In summary, most of the literature has focused on describing disparities rather than evaluating how race information is identified or classified. To address this gap, this rapid scoping review aimed to 1) examine recommendations for race identification in health databases, including best practices, frameworks, guidelines, and protocols for data identification, collection, and accuracy, and 2) identify gaps in the literature on recommendations for identifying race in health databases to inform further research.
Methods
Methodological framework
This rapid scoping review was designed and conducted according to the six-step Joanna Briggs Institute (JBI) Methodology for Scoping Reviews: 1) defining the research questions and aims; 2) developing eligibility criteria for study selection; 3) performing the search strategy; 4) screening and selecting evidence; 5) extracting and analyzing data; and 6) presenting the results [24]. Reporting follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) checklist [25], and the completed checklist is provided in the Supporting information. The scoping review protocol was developed a priori and registered on the Open Science Framework Registries on February 3rd, 2025 (https://osf.io/yw4sd/).
Eligibility criteria
This scoping review included articles that focused on recommendations for identifying race in health databases. Peer-reviewed studies, knowledge syntheses, and grey literature, such as dissertations and organizational documents, were included to capture diverse and reliable sources of information. Articles were considered only if they were available in English (including English translations), so that researchers could accurately interpret and analyze the findings.
Editorials and opinion-based sources were excluded to minimize bias and maintain focus on evidence-based recommendations. Articles published before 2019 were omitted to ensure the research reflects relevant and contemporary practices and frameworks. Sources without access to full text were excluded to ensure a comprehensive understanding of the content. Lastly, articles that focused solely on ethnicity were excluded, as the review specifically examined race data collection and reporting in health databases to address inequities tied to racial factors.
Search strategy
Three electronic databases were searched on February 20th, 2025: MEDLINE (Ovid), Embase (Ovid), and Scopus (Elsevier). The Ovid MEDLINE search was reviewed by lab members (MC, HZ, AS) and refined by two experienced health science librarians (AW, AR). The searches were built around three key concepts: 1) “health databases” (e.g., health administrative databases, EMRs, digital health systems); 2) “race” (e.g., race data, racial minorities, socio-demographic data); and 3) “recommendations” (e.g., best practices, guidelines, frameworks, algorithms). The search strategies were adapted for each database, where applicable, using each platform’s command language and controlled vocabulary. A publication year limit was applied so that only articles published from 2019 onward were retrieved. The full search strategies, presented exactly as executed, are provided in S2 File.
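The three-concept structure follows the usual pattern for assembling a bibliographic search: synonyms within a concept are combined with OR, and the concept blocks are combined with AND. The Python sketch below illustrates only that logic, in generic Boolean syntax, using a few example terms taken from the text. The term lists and function names are illustrative assumptions; the executed strategies in S2 File use each platform's own syntax, truncation, and controlled vocabulary, which this sketch does not reproduce.

```python
# Illustrative concept blocks built from example terms in the text (an assumption);
# the real strategies in S2 File are longer and platform-specific.
CONCEPTS: dict[str, list[str]] = {
    "health databases": ["health administrative database", "electronic medical record",
                         "digital health system"],
    "race": ["race data", "racial minorities", "socio-demographic data"],
    "recommendations": ["best practices", "guidelines", "frameworks", "algorithms"],
}


def quote(term: str) -> str:
    """Quote multi-word terms so they are treated as phrases."""
    return f'"{term}"' if " " in term else term


def build_query(concepts: dict[str, list[str]]) -> str:
    """OR the synonyms inside each concept, then AND the concept blocks together."""
    blocks = ["(" + " OR ".join(quote(t) for t in terms) + ")" for terms in concepts.values()]
    return " AND ".join(blocks)


if __name__ == "__main__":
    print(build_query(CONCEPTS))
```

Because a record must match at least one term from every block, adding synonyms widens a concept, while adding a concept narrows the whole search.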
Study selection
Articles identified from the database searches were uploaded on February 20th, 2025 into Covidence, an online platform for managing reviews, which was used for de-duplication and screening. To calibrate screening, three reviewers (MC, HZ, AS) first piloted title and abstract screening on 25 articles and reached 88% interrater agreement, exceeding the 80% threshold set for good agreement; agreement was calculated in Microsoft Excel. Discrepancies were resolved through virtual discussion until consensus was reached. Because no revisions or clarifications to the eligibility criteria were necessary, the remaining titles and abstracts were independently double screened (MC, HZ), with disagreements resolved by a third reviewer (AS). Next, the same reviewers (MC, HZ, AS) piloted full-text screening on 20 articles to ensure consistent interpretation and application of the eligibility criteria, achieving 85% interrater agreement; discrepancies were again resolved through virtual discussion. The remaining full-text articles were independently double screened (MC, HZ), and disagreements were resolved by a third reviewer (AS).
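Percent agreement, as used in the pilot rounds, is the share of articles on which reviewers reached the same screening decision. A minimal Python sketch follows. The decision lists are hypothetical (item-level pilot decisions are not reported), and counting an article as agreed only when all three reviewers made the same call is an assumption, since the text does not say how agreement across three reviewers was computed in Excel.

```python
def percent_agreement(decisions: list[tuple[str, ...]]) -> float:
    """Percentage of items on which every reviewer made the same include/exclude decision."""
    if not decisions:
        raise ValueError("no screening decisions supplied")
    agreed = sum(1 for item in decisions if len(set(item)) == 1)
    return 100 * agreed / len(decisions)


# Hypothetical title/abstract pilot: 25 articles, three reviewers each.
# 22 unanimous decisions and 3 split decisions (illustrative only, not the study data).
pilot = (
    [("include", "include", "include")] * 4
    + [("exclude", "exclude", "exclude")] * 18
    + [("include", "exclude", "include")] * 3
)

THRESHOLD = 80.0  # the review's criterion for good agreement

if __name__ == "__main__":
    score = percent_agreement(pilot)
    print(f"{score:.0f}% agreement; meets threshold: {score > THRESHOLD}")
```

With these hypothetical decisions the sketch prints `88% agreement; meets threshold: True`. Raw percent agreement does not adjust for agreement expected by chance; chance-corrected statistics such as Cohen's kappa address that, but the review reports raw agreement.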
Data extraction and analysis
In Covidence, a structured data extraction table was formulated to guide the extraction process. Each article was double extracted by two lab members (MC, HZ) independently to enhance reliability. A third reviewer (AS) resolved any discrepancies. Extracted variables included general study information (study ID, title, authors, year of publication, journal, and country), study characteristics (objectives, design, methodology, health database examined, and racial identities analyzed), and reported recommendations for identifying race within health databases.
The extracted data were then analyzed descriptively using thematic analysis. Data were summarized by study design, health database, location, year of publication, demographic data and key findings. Next, a qualitative synthesis of the recommendations was conducted, where one lab member (MC) inductively coded the data to identify recurring themes and patterns across studies. These themes were iteratively reviewed to ensure they captured the breadth and nuances of the recommendations, allowing for a structured categorization of best practices and proposed strategies for race data identification.
Results
Study selection
The database search yielded 7072 records. After 3041 duplicates were removed, 4031 records remained. Title and abstract screening excluded 3898 records, leaving 133 for full-text screening. Of these 133 full-text articles, 112 were excluded, and 21 studies were included in data extraction and analysis. Fig 1 presents the PRISMA flow diagram documenting records identified, included, and excluded.
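Each stage of the flow is obtained by subtraction from the previous stage, so the reported counts can be checked for internal consistency. The short Python sketch below recomputes the flow from the figures given in the text; the function name is illustrative.

```python
def prisma_flow(identified: int, duplicates: int,
                excluded_title_abstract: int, excluded_full_text: int) -> dict[str, int]:
    """Recompute the record counts at each PRISMA stage by successive subtraction."""
    after_dedup = identified - duplicates
    full_text = after_dedup - excluded_title_abstract
    included = full_text - excluded_full_text
    return {"after_dedup": after_dedup, "full_text": full_text, "included": included}


flow = prisma_flow(identified=7072, duplicates=3041,
                   excluded_title_abstract=3898, excluded_full_text=112)

# Compare with the counts reported in the Results.
assert flow == {"after_dedup": 4031, "full_text": 133, "included": 21}
print(flow)
```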
Study characteristics
A summary of the study characteristics (author, year, country, objective, study design, health database source, and racial categories analyzed) and recommendations is provided in Table 1. Nearly all included studies were conducted in North America (n = 20, 95%) [20,26–44], in either the United States (n = 19, 90%) [20,27–44] or Canada (n = 1, 5%) [26]. The remaining study spanned multiple European countries, specifically Belgium, France, and the Netherlands (n = 1, 5%) [45]. By publication year, the fewest studies were published in 2020 (n = 1, 5%) [29] and 2025 (n = 1, 5%) [26], followed by 2021 (n = 2, 10%) [28,33], 2022 (n = 3, 14%) [37,42,43], 2024 (n = 3, 14%) [34,38,40], and 2019 (n = 4, 19%) [20,31,36,41], with the most published in 2023 (n = 7, 33%) [27,30,32,35,39,44,45].
All included studies employed qualitative or observational methodologies [20,26–45]. The most common study design was the validation study (n = 9, 43%) [20,27,32,34–36,38,41,43], followed by cross-sectional studies (n = 3, 14%) [26,31,42], scoping reviews (n = 2, 10%) [30,45], and systematic reviews (n = 2, 10%) [28,39]. The remaining articles used other approaches (n = 5, 24%) [29,33,37,40,44], such as organizational policy statements or descriptive implementation reports. The included studies assessed a range of health databases, most commonly EHRs (n = 9, 43%) [27,29,31,35–38,40,42], followed by health administrative databases (n = 5, 24%) [20,32,34,41,43] and EMRs (n = 1, 5%) [26]. Some studies engaged with other databases (n = 3, 14%) [30,33,44], and a few did not examine a specific database (n = 3, 14%) [28,39,45] because they reviewed literature or discussed conceptual frameworks.
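All percentages in this part of the Results are counts out of the 21 included studies, rounded to the nearest whole number. The Python sketch below recomputes them from the counts given in the text as a consistency check; the shortened category labels are illustrative.

```python
TOTAL_STUDIES = 21

# Counts as reported in the text.
publication_year = {2019: 4, 2020: 1, 2021: 2, 2022: 3, 2023: 7, 2024: 3, 2025: 1}
study_design = {"validation": 9, "cross-sectional": 3, "scoping review": 2,
                "systematic review": 2, "other": 5}
database_source = {"EHR": 9, "health administrative": 5, "EMR": 1,
                   "other database": 3, "no specific database": 3}


def percentages(counts: dict, total: int = TOTAL_STUDIES) -> dict:
    """Convert category counts to whole-number percentages of the included studies."""
    if sum(counts.values()) != total:
        raise ValueError("categories should account for every included study exactly once")
    return {category: round(100 * n / total) for category, n in counts.items()}


if __name__ == "__main__":
    for name, counts in [("year", publication_year), ("design", study_design),
                         ("database", database_source)]:
        print(name, percentages(counts))
```

Three of 21 studies corresponds to 14% and nine of 21 to 43%; the check that each breakdown sums to 21 also confirms that every study falls in exactly one category per characteristic.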
Approaches to identifying race and ethnicity varied across the included studies. Seven studies used the five-category classification from the United States Office of Management and Budget framework: American Indian or Alaska Native, Asian, Black or African American, Native Hawaiian or Other Pacific Islander, and White [20,32,33,35,36,39,42]. Six studies adopted an updated seven-category framework that added Hispanic and Multiracial as distinct groups [28,29,34,39,40,44]. Notably, six studies combined race and ethnicity into a single variable [26,31,37,41,43,45]. One study focused specifically on Indigenous populations [30], and two studies discussed identity-related concepts (e.g., immigration status or foreign-born populations) but did not use a specific framework to organize race data collection [27,45]. These studies were retained because they provided relevant recommendations for improving race data analysis, despite discussing identity more broadly. This variation highlights ongoing challenges in standardizing race and ethnicity data collection across datasets and research contexts.
Key Recommendations
Five overarching recommendation categories were inductively identified across the included studies: (1) Self-identification and patient-centered practice [20,27,30,33–35,38,40,45], (2) Standardization across healthcare systems [26,34,35,39,40], (3) Data quality and completeness [20,26–31,36,37,39], (4) Algorithmic and predictive methods [28,32,37,39,41–44], and (5) Disaggregated data use and cross-sector collaboration [27,29,32,33,37,39,40]. The overlapping and interdisciplinary nature of efforts to improve race data in health databases was evident, as numerous studies addressed more than one recommendation category. The key recommendations are summarized in Table 2, which describes each category alongside the corresponding articles and their examples.
Discussion
This rapid scoping review yielded several important findings. First, self-identification and person-centered practices were described as the gold standard for race data collection, emphasizing patient autonomy and lived experiences [20,27,30,33–35,38,40,45]. Notably, many articles emphasized improving race data collection at the point of care [20,26,27,33,34,38,40], such as self-reported patient intake forms or staff-administered surveys, rather than methods that refine or supplement race data after they have been recorded. Second, multiple studies highlighted the importance of transparency in how data are sourced and reported, particularly in distinguishing between self-reported, inferred, and algorithmically imputed data [20,27,39]. Third, studies identified standardization across systems as essential for consistent categorization of race and reliable comparisons across databases [26,34,35,39,40]. Finally, strategies to improve data completeness, such as supplementary data sources and predictive tools, were proposed to enhance existing datasets when self-reported data were unavailable [26,28,41–44]. Altogether, these findings highlight opportunities to strengthen race data practices in healthcare and reveal persistent gaps in implementation, standardization, and health research.
Self-identification was the most widely supported and validated method for collecting race data in health systems, although real-world implementation challenges remain [20,26,27,30,33,34,38,40,45]. It is grounded in the premise that individuals are best positioned to define and express their own identities [26]. Several studies highlighted strategies to operationalize self-identification [26,27,30,40], such as embedding self-reported surveys into EMRs [27], with reported response validity ranging from 84% to 100% in primary care settings [45]. Almklov et al. [27] demonstrated that electronic self-reporting tools outperformed both standard EHR documentation and staff-entered race data in completeness and accuracy. Patient-facing tools, such as pre-visit portals or kiosk check-ins, were also proposed for clinical implementation, allowing individuals to record and update their demographic information directly and thereby enhancing data quality and patient control [26,34]. Self-identification may also improve the validity of race data by reducing common misclassification errors in observational or administrative databases, while supporting more equitable care by centering patient voices and lived experiences [20,26,27,30,34,38,40]. This shift from extractive to patient-informed data practices may foster trust and align with principles of cultural safety.
Despite these strengths, policy, resource, and ethical challenges continue to limit widespread implementation. In terms of policy, the absence of national or institutional mandates for standardized race data collection leads to inconsistent adoption [46]. Resource constraints, such as limited staff training or digital infrastructure, further complicate routine collection in already burdened clinical settings [46]. Ethically, concerns about data privacy and potential misuse create hesitancy among institutions and patients. For instance, higher rates of “decline to answer” were observed when demographic questions lacked clarity or sensitivity, or when patients were unsure how their information would be used [27]. Future research should explore how to frame self-reported race questions to encourage disclosure, particularly among populations hesitant to self-identify. In addition, although many studies emphasized the importance of self-identification and proposed techniques to improve data collection, few evaluated their implementation in real-world healthcare settings. Most recommendations remained conceptual, offering limited insight into the feasibility, cost, equity impact, and sustainability of these interventions [30,33–35,40,45]. Further research should assess these approaches using patient engagement outcomes and determine the supports needed to translate them into routine practice.
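Comparisons of completeness across collection methods, such as the one reported by Almklov et al. [27], rest on simple proportions, and counting “decline to answer” separately from empty fields distinguishes patient choice from documentation gaps. The Python sketch below computes both for two collection methods. The records, the decline labels, and the function name are hypothetical and are not data from the included studies.

```python
from collections import Counter
from typing import Optional

# Hypothetical labels treated as an explicit refusal rather than a missing value.
DECLINE_LABELS = {"declined", "prefer not to answer"}


def completeness_report(values: list[Optional[str]]) -> dict[str, float]:
    """Percentage of records that are recorded, explicitly declined, or missing (None)."""
    if not values:
        raise ValueError("no records supplied")
    counts = Counter(
        "missing" if v is None else "declined" if v.lower() in DECLINE_LABELS else "recorded"
        for v in values
    )
    return {status: 100 * counts[status] / len(values)
            for status in ("recorded", "declined", "missing")}


# Hypothetical race fields for ten patients under each collection method.
self_reported = ["White", "Black", "Asian", "White", "Black",
                 "declined", "White", "Asian", "White", "Black"]
staff_entered = ["White", None, "White", None, "Black",
                 None, "White", "declined", None, "White"]

if __name__ == "__main__":
    print("self-reported:", completeness_report(self_reported))
    print("staff-entered:", completeness_report(staff_entered))
```

Keeping the two kinds of gap apart matters for the questions raised above: a high decline rate points to question wording or trust, whereas a high missing rate points to workflow or documentation.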
Another key finding of our review was the need for transparency in how race data are sourced, recorded, and interpreted [20,27,39]. Swilley-Martinez et al. [39] emphasized that researchers and health systems should clearly document whether race information was self-reported, observed, or algorithmically assigned, and record who made the classification. This level of documentation would strengthen the credibility of race data and ensure that downstream research and policy decisions rest on well-understood sources [27]. Despite these advantages, few health systems consistently tracked or reported how race data were collected, and this information was often omitted from research publications and datasets [46]. This omission introduced uncertainty, limited comparability, and made it difficult to determine whether observed disparities reflected real differences or artifacts of data collection methods [46]. Future research should therefore develop clear standards for disclosing race data sources and examine how different data collection techniques influence the interpretation of race-based analyses in health equity research. Greater transparency would lay the groundwork for standardized race data practices that are credible across healthcare systems.
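One way to make the documentation recommended by Swilley-Martinez et al. [39] concrete is to store the data source and the classifier alongside every race value rather than in separate notes. The Python sketch below is a hypothetical record design, not a schema proposed by the included studies. The field names, the example values, and the preference order (self-report first, then observed, then imputed, reflecting the emphasis on self-identification as the gold standard) are assumptions.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class RaceSource(Enum):
    SELF_REPORTED = "self-reported"
    OBSERVED = "observed"  # assigned by staff or another observer
    IMPUTED = "algorithmically imputed"


# Hypothetical preference order: a lower value means a preferred source.
SOURCE_PRIORITY = {RaceSource.SELF_REPORTED: 0, RaceSource.OBSERVED: 1, RaceSource.IMPUTED: 2}


@dataclass(frozen=True)
class RaceObservation:
    patient_id: str
    category: str
    source: RaceSource
    classified_by: str  # e.g. "patient", a staff role, or a model name and version
    recorded_on: date


def preferred_observation(observations: list[RaceObservation]) -> RaceObservation:
    """Pick the preferred source; within the same source, pick the most recent record."""
    if not observations:
        raise ValueError("no observations for this patient")
    return min(observations,
               key=lambda o: (SOURCE_PRIORITY[o.source], -o.recorded_on.toordinal()))


history = [
    RaceObservation("p001", "White", RaceSource.OBSERVED,
                    "registration clerk", date(2021, 3, 1)),
    RaceObservation("p001", "Black or African American", RaceSource.SELF_REPORTED,
                    "patient", date(2023, 6, 12)),
    RaceObservation("p001", "Black or African American", RaceSource.IMPUTED,
                    "surname-geocoding model v1", date(2022, 1, 5)),
]

if __name__ == "__main__":
    chosen = preferred_observation(history)
    print(chosen.category, "|", chosen.source.value, "|", chosen.classified_by)
```

Because the source and classifier travel with each value, an analysis can report how many race values came from each source, or restrict itself to self-reported data, without consulting separate documentation.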
Our findings highlighted the need for standardized frameworks for race data collection to improve comparability, accuracy, and equity across healthcare settings and databases [32,34,35,39]. Inconsistent definitions and classification schemes were common barriers to accurately identifying and comparing race data, particularly in analyses across institutions or national datasets [32]. Standardization was suggested to improve data interoperability, reduce data cleaning, and enable large-scale equity analyses within health systems. For instance, Huang et al. [32] evaluated Medicare administrative data and found significant variability in the accuracy of commonly used coding systems depending on the racial group being identified. If racial groups were classified consistently, health databases could produce demographic information that is comparable across sources, strengthening the ability to monitor disparities, allocate resources, and inform equitable policy responses. However, standardization also has limitations. Implementing harmonized systems requires extensive coordination among stakeholders such as government agencies, health system leaders, and EHR vendors, which may be resource intensive [37]. Future research should investigate strategies that account for the institutional readiness, stakeholder alignment, and policy supports needed to adopt standardized data practices in different health system contexts. Additionally, overly broad or outdated categories could perpetuate exclusion and obscure meaningful differences in health outcomes, for example by masking subgroup variation or collapsing multiracial identities [26]. Further studies should explore frameworks for expanding racial categories to reflect evolving identities and capture within-group diversity.
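In practice, standardization often means mapping each system's local codes onto a shared category set through a crosswalk. The Python sketch below maps two hypothetical local coding schemes onto the five Office of Management and Budget race categories; the system names and local codes are invented for illustration. It keeps the original local code next to the standardized category, so that detail such as specific Asian subgroups is not discarded when categories are collapsed, and it returns no category for an unmapped code rather than guessing.

```python
from typing import Optional

OMB_RACE_CATEGORIES = (
    "American Indian or Alaska Native",
    "Asian",
    "Black or African American",
    "Native Hawaiian or Other Pacific Islander",
    "White",
)
AIAN, ASIAN, BLACK, NHPI, WHITE = OMB_RACE_CATEGORIES

# Hypothetical crosswalks: each local system codes the same concepts differently.
CROSSWALK: dict[str, dict[str, str]] = {
    "hospital_a": {"AIAN": AIAN, "CHN": ASIAN, "FIL": ASIAN, "SAS": ASIAN,
                   "BLK": BLACK, "NHPI": NHPI, "WHT": WHITE},
    "clinic_b": {"1": AIAN, "2": ASIAN, "3": BLACK, "4": NHPI, "5": WHITE},
}


def harmonize(system: str, local_code: str) -> tuple[Optional[str], str]:
    """Return (standard category, or None if unmapped, together with the original local code)."""
    standard = CROSSWALK.get(system, {}).get(local_code)
    return standard, local_code


if __name__ == "__main__":
    records = [("hospital_a", "CHN"), ("hospital_a", "FIL"), ("clinic_b", "2"), ("clinic_b", "9")]
    for system, code in records:
        print(system, code, "->", harmonize(system, code))
```

The output makes the trade-off discussed above visible: records from both systems become comparable at the "Asian" level, while the retained local codes still distinguish subgroups that the shared category merges.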
Finally, our review identified supplementary and predictive models as a strategy to improve the completeness of race data where self-reported data were incomplete or missing [26,28,41–44]. These included data linkage techniques, natural language processing, and algorithmic imputation such as the Bayesian Improved Surname Geocoding (BISG) model. BISG, the most frequently mentioned method, estimates a patient’s race from their surname and geocoded address [26,28,42]. It was shown to reduce missing data and enhance the reliability of race-based categorization in large datasets [28,42,44]. Some studies explored advanced or locally tailored imputation models that incorporated additional factors such as age, income, or household structure to improve accuracy [41,43]. To convey uncertainty, multiple sources recommended probabilistic rather than fixed categorical assignments [28,41–44]. For instance, a patient might be reported as having a 70% probability of being Black, 20% Hispanic, and 10% White, which supports a more nuanced analysis that acknowledges the possibility of error [41,42]. Ultimately, this approach might improve the utility of health datasets that lack self-identified race information and increase the usable sample size. However, predictive models rely on proxies (e.g., name, location, neighbourhood composition) that may not reflect an individual's lived racial identity [41]. This could unintentionally reinforce systematic biases, especially in the absence of validation or transparency. Ethical concerns such as lack of consent, limited interpretability, and potential misuse in policy or research contexts should also be considered.
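BISG, as commonly described, combines its two inputs with Bayes' rule: P(race | surname, geography) is proportional to P(race | surname) × P(geography | race), normalized over the racial categories, which assumes that surname and place of residence are independent given race. The Python sketch below implements that update and shows how probabilistic output can be aggregated without forcing each patient into a single category. The probabilities are hypothetical values chosen to reproduce the 70/20/10 split in the example above; real applications draw them from census surname and geography tables, which are not included here, and the function names are illustrative.

```python
def bisg_posterior(p_race_given_surname: dict[str, float],
                   p_geo_given_race: dict[str, float]) -> dict[str, float]:
    """Bayesian update: P(r | s, g) is proportional to P(r | s) * P(g | r), normalized over races."""
    if p_race_given_surname.keys() != p_geo_given_race.keys():
        raise ValueError("both inputs must cover the same racial categories")
    joint = {r: p_race_given_surname[r] * p_geo_given_race[r] for r in p_race_given_surname}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("surname and geography give zero support to every category")
    return {r: joint[r] / total for r in joint}


def expected_counts(posteriors: list[dict[str, float]]) -> dict[str, float]:
    """Sum probabilities across patients instead of assigning each to its most likely group."""
    totals: dict[str, float] = {}
    for posterior in posteriors:
        for race, p in posterior.items():
            totals[race] = totals.get(race, 0.0) + p
    return totals


# Hypothetical inputs: P(race | surname) from a surname table, and
# P(geography | race) = share of each group's population living in the patient's area.
surname_probs = {"Black": 0.35, "Hispanic": 0.40, "White": 0.25}
geo_probs = {"Black": 0.002, "Hispanic": 0.0005, "White": 0.0004}

if __name__ == "__main__":
    patient = bisg_posterior(surname_probs, geo_probs)
    print({r: round(p, 2) for r, p in patient.items()})
    other = bisg_posterior({"Black": 0.2, "Hispanic": 0.2, "White": 0.6}, geo_probs)
    print(expected_counts([patient, other]))
```

Summing posteriors yields expected group counts that retain uncertainty; assigning each patient to the highest-probability group would instead count the first patient simply as Black and discard the 30% probability spread across the other groups.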
Future directions include exploring how advancements in artificial intelligence and natural language processing could improve contextual understanding of unstructured clinical notes, enhance data linkage, and automate validation processes to reduce biases and strengthen accuracy in race data identification. Future research should continue to focus on validating these imputation models against self-reported data, identifying methods for local tailoring, and creating ethical standards for probabilistic reporting to ensure these tools are used responsibly. As predictive approaches evolve, ensuring their ethical applications will be essential for advancing the quality of research and equity-informed health policies.
While most studies presented similar recommendations, a few offered conflicting perspectives on feasibility and data validity. Specifically, some authors questioned whether standardized categories risk oversimplifying complex identities [20,33,38], while others argued that uniform classification is necessary for comparability across databases [34,37,39]. For instance, certain studies recommended consolidating racial categories to align with United States national standards, such as the Office of Management and Budget standardized codes [26,34,35,39], while others emphasized the need to expand definitions or disaggregate data to better capture within-group differences and nuanced identities [29,33,37]. Furthermore, while some articles viewed algorithmic approaches as practical solutions for missing data [32,36,42–44], others cautioned that they could perpetuate biases or obscure inequities [26,40,41]. These contradictions reflect tensions in the field between precision and comparability of data, and between innovation and ethics. Future research should explore these perspectives by integrating patient-centered and system-level approaches that balance data accuracy with lived experience.
Overall, these findings reflected both advancements and ongoing challenges when collecting and reporting race data in health systems. Notably, of the 21 articles identified, most were based in the United States, limiting the global applicability of the recommendations due to differences in health systems, race categorizations, and data governance structures across diverse regions. Therefore, while current research provides a constructive foundation, the small number of applicable studies relative to the broad scope of the review highlights an important area for further research. Strengthening race data practices might enable a more accurate understanding of racial disparities in healthcare use and outcomes, which may guide more effective, equity-informed interventions and policy decisions.
Study strengths and limitations
A notable strength of this review was the comprehensiveness of the search strategy, which included interdisciplinary sources to capture diverse methodologies and perspectives on race data identification. Nevertheless, this review has several limitations. Although the database search was extensive, it was limited to studies published between January 2019 and February 2025, which may have omitted earlier foundational work or longstanding recommendations that remain relevant. Additionally, restricting eligibility to English-language studies may have introduced selection bias by excluding important findings published in other languages. The review also focused specifically on race rather than ethnicity, which may have excluded studies examining broader constructs of identity and social categorization that also shape population health outcomes. Furthermore, the quality of the included studies and their recommendations was not assessed, as critical appraisal is not typically required for scoping reviews [24]. As a result, the relative strength and reliability of individual strategies could not be evaluated.
Conclusion
This rapid scoping review identified current recommendations for improving the identification of race within health databases, revealing five key thematic areas: self-identification and patient-centered practices, standardization across healthcare systems, data quality and completeness, algorithmic and predictive methods, and disaggregated data use and cross-sector collaboration. While promising strategies exist, implementation remains inconsistent, and gaps such as limited global applicability, limited critical appraisal, and minimal focus on real-world feasibility highlight the need for further research. Strengthening race data identification is essential not only for improving data accuracy but also for supporting equity-driven research, policy, and health system transformation. Continued efforts to refine and operationalize these recommendations are crucial to advancing more inclusive healthcare systems.
Supporting information
S2 File. Search strategy used in MEDLINE (Ovid), Embase (Ovid), and Scopus (Elsevier).
https://doi.org/10.1371/journal.pone.0339025.s002
(DOCX)
Acknowledgments
The authors would like to acknowledge Dr. Laura van Staalduinen for her thoughtful support throughout this project. Thank you to Lauren Cadel and Amy Cho for their assistance with refining the search strategy and to Hannah Zuta for her contributions to screening and data extraction. The authors would also like to thank the Queen’s University librarians, Amanda Ross-White and Angelique Roy, for their expertise and contributions in developing the search strategy.
References
- 1. Cadarette SM, Wong L. An Introduction to Health Care Administrative Data. Can J Hosp Pharm. 2015;68(3):232–7. pmid:26157185
- 2. Bednorz A, Mak JKL, Jylhävä J, Religa D. Use of Electronic Medical Records (EMR) in Gerontology: Benefits, Considerations and a Promising Future. Clin Interv Aging. 2023;18:2171–83. pmid:38152074
- 3. Mukasa CDM, Kovacheva VP. Development and implementation of databases to track patient and safety outcomes. Curr Opin Anaesthesiol. 2022;35(6):710–6. pmid:36302209
- 4. Canadian Institutes of Health Research. Health Services Research. Government of Canada. 2014 [cited 2025 Sep 17]. https://cihr-irsc.gc.ca/e/48809.html
- 5. Boncyk CS, Jelly CA, Freundlich RE. The Blessing and the Curse of the Administrative Database. Ann Am Thorac Soc. 2020;17(2):174–5. pmid:32003609
- 6. Cary MP Jr, Zink A, Wei S, Olson A, Yan M, Senior R, et al. Mitigating Racial And Ethnic Bias And Advancing Health Equity In Clinical Algorithms: A Scoping Review. Health Aff (Millwood). 2023;42(10):1359–68. pmid:37782868
- 7. Lorem G, Cook S, Leon DA, Emaus N, Schirmer H. Self-reported health as a predictor of mortality: A cohort study of its relation to other health measurements and observation time. Sci Rep. 2020;10(1).
- 8. Suyemoto KL, Curley M, Mukkamala S. What Do We Mean by “Ethnicity” and “Race”? A Consensual Qualitative Research Investigation of Colloquial Understandings. Genealogy. 2020;4(3):81.
- 9. Baciu A, Negussie Y, Geller A. The root causes of health inequity. Communities in Action: Pathways to Health Equity. U.S. National Library of Medicine. 2017.
- 10. Cheon YM, Bayless SD, Wang Y, Yip T. The Development of Ethnic/Racial Self-Labeling: Individual Differences in Context. J Youth Adolesc. 2018;47(10):2261–78. pmid:29546623
- 11. Macias-Konstantopoulos WL, Collins KA, Diaz R, Duber HC, Edwards CD, Hsu AP, et al. Race, Healthcare, and Health Disparities: A Critical Review and Recommendations for Advancing Health Equity. West J Emerg Med. 2023;24(5):906–18. pmid:37788031
- 12. Lu C, Ahmed R, Lamri A, Anand SS. Use of race, ethnicity, and ancestry data in health research. PLOS Glob Public Health. 2022;2(9):e0001060. pmid:36962630
- 13. Public Health Agency of Canada. Inequalities in health of racialized adults in Canada. 2022. https://www.canada.ca/en/public-health/services/publications/science-research-data/inequalities-health-racialized-adults-18-plus-canada.html
- 14. Statistics Canada. Perceived health and well-being indicators among racialized groups. 2023. https://www150.statcan.gc.ca/n1/pub/89-657-x/89-657-x2025004-eng.htm
- 15. Statistics Canada. Canadians are facing higher levels of food insecurity. 2024 [cited 2025 Nov 3]. https://www.statcan.gc.ca/o1/en/plus/6257-canadians-are-facing-higher-levels-food-insecurity
- 16. Durand-Moreau Q, Lafontaine J, Ward J. Work and health challenges of Indigenous people in Canada. Lancet Glob Health. 2022;10(8):e1189–97. pmid:35839817
- 17. Husbands W, Lawson DO, Etowa EB, Mbuagbaw L, Baidoobonso S, Tharao W, et al. Black Canadians’ Exposure to Everyday Racism: Implications for Health System Access and Health Promotion among Urban Black Communities. J Urban Health. 2022;99(5):829–41. pmid:36066788
- 18. Statistics Canada. Half of racialized people have experienced discrimination or unfair treatment in the past five years. The Daily. 2024 [cited 2025 Nov 3]. https://www150.statcan.gc.ca/n1/daily-quotidien/240516/dq240516b-eng.htm
- 19. Stanford FC. The Importance of Diversity and Inclusion in the Healthcare Workforce. J Natl Med Assoc. 2020;112(3):247–9. pmid:32336480
- 20. Polubriaginof FCG, Ryan P, Salmasian H, Shapiro AW, Perotte A, Safford MM, et al. Challenges with quality of race and ethnicity data in observational databases. J Am Med Inform Assoc. 2019;26(8–9):730–6. pmid:31365089
- 21. Sheikh F, Fox-Robichaud AE, Schwartz L. Collecting Race-Based Data in Health Research: A Critical Analysis of the Ongoing Challenges and Next Steps for Canada. Can J Bioeth. 2023;6(1):75–80.
- 22. Macias-Konstantopoulos WL, Collins KA, Diaz R, Duber HC, Edwards CD, Hsu AP, et al. Race, Healthcare, and Health Disparities: A Critical Review and Recommendations for Advancing Health Equity. West J Emerg Med. 2023;24(5):906–18. pmid:37788031
- 23. Tinuoye O. Frameworks that guide race and ethnicity data collection practices in health settings: A scoping review. Western University. 2024. https://ir.lib.uwo.ca/etd/10153
- 24. Peters MD, Godfrey C, McInerney P, Munn Z, Tricco AC, Khalil H. Scoping reviews. JBI Manual for Evidence Synthesis. JBI. 2024.
- 25. Tricco AC, Lillie E, Zarin W, O’Brien KK, Colquhoun H, Levac D, et al. PRISMA Extension for Scoping Reviews (PRISMA-ScR): Checklist and Explanation. Ann Intern Med. 2018;169(7):467–73. pmid:30178033
- 26. Abulibdeh R, Tu K, Butt DA, Train A, Crampton N, Sejdić E. Assessing the capture of sociodemographic information in electronic medical records to inform clinical decision making. PLoS One. 2025;20(1):e0317599. pmid:39823404
- 27. Almklov E, Cohen AJ, Russell LE, Mor MK, Fine MJ, Hausmann LRM, et al. Assessing an electronic self-report method for improving quality of ethnicity and race data in the Veterans Health Administration. JAMIA Open. 2023;6(2):ooad020. pmid:37063405
- 28. Cook LA, Sachs J, Weiskopf NG. The quality of social determinants data in the electronic health record: a systematic review. J Am Med Inform Assoc. 2021;29(1):187–96. pmid:34664641
- 29. Cusick MM, Sholle ET, Davila MA, Kabariti J, Cole CL, Campion TR Jr. A Method to Improve Availability and Quality of Patient Race Data in an Electronic Health Record System. Appl Clin Inform. 2020;11(5):785–91. pmid:33241548
- 30. Gartner DR, Maples C, Nash M, Howard-Bobiwash H. Misracialization of Indigenous people in population health and mortality studies: a scoping review to establish promising practices. Epidemiol Rev. 2023;45(1):63–81.
- 31. Hatef E, Rouhizadeh M, Tia I, Lasser E, Hill-Briggs F, Marsteller J, et al. Assessing the Availability of Data on Social and Behavioral Determinants in Structured and Unstructured Electronic Health Records: A Retrospective Analysis of a Multilevel Health Care System. JMIR Med Inform. 2019;7(3):e13802. pmid:31376277
- 32. Huang AW, Meyers DJ. Assessing the validity of race and ethnicity coding in administrative Medicare data for reporting outcomes among Medicare advantage beneficiaries from 2015 to 2017. Health Serv Res. 2023;58(5):1045–55. pmid:37356821
- 33. Kauh TJ, Read JG, Scheitler AJ. The Critical Role of Racial/Ethnic Data Disaggregation for Health Equity. Popul Res Policy Rev. 2021;40(1):1–7. pmid:33437108
- 34. Martino SC, Elliott MN, Haas A, Peltz A, Saliba D, Hassan S, et al. Assessing the accuracy of race‐and‐ethnicity data in the Outcome and Assessment Information Set. J Am Geriatr Soc. 2024;72(8):2508–15.
- 35. Samalik JM, Goldberg CS, Modi ZJ, Fredericks EM, Gadepalli SK, Eder SJ, et al. Discrepancies in Race and Ethnicity in the Electronic Health Record Compared to Self-report. J Racial Ethn Health Disparities. 2023;10(6):2670–5. pmid:36418736
- 36. Sholle ET, Pinheiro LC, Adekkanattu P, Davila MA III, Johnson SB, Pathak J, et al. Underserved populations with missing race ethnicity data differ significantly from those with structured race/ethnicity documentation. J Am Med Inform Assoc. 2019;26(8–9):722–9.
- 37. Smith MA, Gigot M, Harburn A, Bednarz L, Curtis K, Mathew J, et al. Insights into measuring health disparities using electronic health records from a statewide network of health systems: A case study. J Clin Transl Sci. 2023;7(1):e54. pmid:37008604
- 38. Sojka PC, Maron MM, Dunsiger SI, Belgrave C, Hunt JI, Brannan EH, et al. Evaluation of Reliability Between Race and Ethnicity Data Obtained from Self-report Versus Electronic Health Record. J Racial Ethn Health Disparities. 2025;12(4):2200–3. pmid:38839729
- 39. Swilley-Martinez ME, Coles SA, Miller VE, Alam IZ, Fitch KV, Cruz TH, et al. “We adjusted for race”: now what? A systematic review of utilization and reporting of race in American Journal of Epidemiology and Epidemiology, 2020-2021. Epidemiol Rev. 2023;45(1):15–31. pmid:37789703
- 40. Weathers AL, Garg N, Lundgren KB, Benish SM, Baca CB, Benson RT. Improved Accuracy/Completeness of EHR Race/Ethnicity Data: A Requisite Step to Address Disparities in Care. Neurol Clin Pract. 2024;14(3):e200313. pmid:38720950
- 41. Xue Y, Harel O, Aseltine RH Jr. Imputing race and ethnic information in administrative health data. Health Serv Res. 2019;54(4):957–63. pmid:31099021
- 42. Yee K, Hoopes M, Giebultowicz S, Elliott MN, McConnell KJ. Implications of missingness in self‐reported data for estimating racial and ethnic disparities in Medicaid quality measures. Health Serv Res. 2022;57(6):1370–8.
- 43. Zavez K, Harel O, Aseltine RH Jr. Imputing race and ethnicity in healthcare claims databases. Health Serv Outcomes Res Method. 2022;22(4):493–507.
- 44. Bear Don’t Walk OJ 4th, Pichon A, Reyes Nieva H, Sun T, Li J, Joseph J, et al. Contextualized race and ethnicity annotations for clinical text from MIMIC-III. Sci Data. 2024;11(1):1332. pmid:39638783
- 45. Meudec M, Affun-Adegbulu C, Cosaert T. Review of health research and data on racialised groups: Implications for addressing racism and racial disparities in public health practice and policies in Europe: a study protocol. F1000Res. 2023;12:57.
- 46. Sharghi S, Khalatbari S, Laird A, Lapidus J, Enders FT, Meinzen-Derr J, et al. Race, ethnicity, and considerations for data collection and analysis in research studies. J Clin Transl Sci. 2024;8(1).