Racism as a Determinant of Health: A Systematic Review and Meta-Analysis

Despite a growing body of epidemiological evidence in recent years documenting the health impacts of racism, the cumulative evidence base has yet to be synthesized in a comprehensive meta-analysis focused specifically on racism as a determinant of health. This meta-analysis reviewed the literature focusing on the relationship between reported racism and mental and physical health outcomes. Data from 293 studies reported in 333 articles published between 1983 and 2013, and conducted predominately in the U.S., were analyzed using random effects models and mean weighted effect sizes. Racism was associated with poorer mental health (negative mental health: r = -.23, 95% CI [-.24,-.21], k = 227; positive mental health: r = -.13, 95% CI [-.16,-.10], k = 113), including depression, anxiety, psychological stress and various other outcomes. Racism was also associated with poorer general health (r = -.13, 95% CI [-.18,-.09], k = 30), and poorer physical health (r = -.09, 95% CI [-.12,-.06], k = 50). Moderation effects were found for some outcomes with regard to study and exposure characteristics. Effect sizes of racism on mental health were stronger in cross-sectional compared with longitudinal data and in non-representative samples compared with representative samples. Age, sex, birthplace and education level did not moderate the effects of racism on health. Ethnicity significantly moderated the effect of racism on negative mental health and physical health: the association between racism and negative mental health was significantly stronger for Asian American and Latino(a) American participants compared with African American participants, and the association between racism and physical health was significantly stronger for Latino(a) American participants compared with African American participants. Protocol PROSPERO registration number: CRD42013005464.


Introduction
Racism can be defined as organized systems within societies that cause avoidable and unfair inequalities in power, resources, capacities and opportunities across racial or ethnic groups. Racism can manifest through beliefs, stereotypes, prejudices or discrimination. This encompasses everything from open threats and insults to phenomena deeply embedded in social systems and structures [1]. Racism can occur at multiple levels, including: internalized (the incorporation of racist attitudes, beliefs or ideologies into one's worldview), interpersonal (interactions between individuals) and systemic (for example, the racist control of and access to labor, material and symbolic resources within a society) [1][2][3]. Racism persists as a cause of exclusion, conflict and disadvantage on a global scale [4], and existing data suggest that racism is increasing in many national contexts (e.g., [5][6][7][8][9]).
The first reviews on discrimination and health were conducted in the mid-1990s and were concerned largely with conceptual and methodological advancements in studying the role of racism in health disparities in the United States [16,17]. These reviews identified fewer than a dozen studies conducted since 1983 on racial discrimination and a range of mental and physical health outcomes. They provided early indications of the adverse impacts of racism on health and called for further research on the topic. Four additional reviews were conducted in the early 2000s that included studies from 1972 onwards, consisting of samples ranging from 13 to 53 studies [18][19][20][21]. These reviews further examined the poor mental and physical health outcomes of racial discrimination, mainly in the U.S. among African-American adult populations. They found consistent evidence for associations between racism and mental health outcomes, and mixed evidence regarding associations with physical health outcomes (of which blood pressure and hypertension were the main outcomes).
Two larger systematic reviews were published in 2006 and 2009, covering a combined total of 253 empirical studies, published between 1984 and 2007 [10,22]. International in scope, these reviews focused on racism and a wide range of health outcomes, and found the strongest and most robust associations between racism and poor mental health as well as health-related behaviors. More recently, two large-scale meta-analyses focused on the relationship between discrimination more generally, and mental health [13,23] and physical health [13]. They examined 192 and 328 studies (with some overlap) published between 1986 and 2012, of which nearly two thirds examined racial discrimination. These meta-analyses found significant negative impacts of discrimination on mental health [13,23], and a somewhat weaker, but still significant, association with physical health [13]. Additional analyses focused specifically on racism found significant associations with mental health (operationalized as well-being), self-esteem and psychological distress, and similar results for life satisfaction, anxiety and depression (though metrics for the latter three were not published) [23]. Their results regarding the adverse mental health effects of racism and discrimination were also confirmed by a smaller meta-analysis and another review [24,25].
Several reviews and meta-analyses have concentrated on specific populations, such as children and adolescents, and ethnic groups, including Asian Americans, African Americans, and Latino/a Americans, with negative mental health outcomes as the main health outcome of interest [14,15,26,27,28,29,30,31]. These reviews were published between 1985 and 2011 and included between 20 and 62 studies, with the exception of one review that included 121 studies [14]. They have found consistent, adverse associations with mental health, while for the few that also examined physical health, findings were mixed [14,15,26,27].
Several recent reviews and a meta-analysis have been conducted specifically on racism and physical health outcomes, particularly blood pressure and hypertension, and cardiovascular disease [32][33][34][35]. These works included study samples ranging from 15 to 44 studies, covering the period 1984-2013. They have found mixed, and often weak, associations between racism and hypertension and blood pressure, with the exception of ambulatory blood pressure, a potential measure of stress reactivity, which has shown consistent associations with racism [34,35].
In summary, the reviews and meta-analyses thus far have noted that self-reported discrimination is consistently related to poor mental health, but less consistently related to poor physical health. A limitation of these prior reviews, however, is that cross-sectional studies were aggregated alongside longitudinal studies (with the exception of Schmitt et al.'s [23] analysis of longitudinal effects for discrimination more generally). Greater attention to longitudinal analyses is required as a way to assess causality, as well as to examine the possibility of a lag between exposure to discrimination and the development of physical health problems, which some studies have previously indicated [36][37][38].

Moderators
Moderators are variables that influence the nature (i.e., direction and/or strength) of the relationship between a predictor and an outcome variable [39]. Many scholars have noted the important role that moderators may play in understanding the differential health-related outcomes among individuals experiencing racism and associated stress [11,40,41]. Clark et al. [40] hypothesized that moderators may first influence perceiving the environmental stimulus as a type of racism, and, second, impact processes via which racism affects the individual. More recently, Williams and Mohammed [41] developed a causal model that highlights moderators such as age, socio-economic status, racial group, gender and relational status, as influencing the racism and health relationship.
Despite this theoretical interest, the empirical data regarding the influence of moderators on relationships between racism and health outcomes are currently mixed, with previous moderation analyses of participant subgroups largely inconclusive. Some studies have found no differences in the association between discrimination and health between men and women, others have found stronger effects on the mental health of women, and still others have found the opposite [42]. In a recent meta-analysis, gender did not significantly moderate the association between discrimination and mental health, or between discrimination and physical health [13]. Similarly, ethnicity has been a significant moderator in some studies (e.g., [43]), whereas others have found no significant differences across ethnic groups (e.g., [44]). A meta-analysis that included an analysis specifically on racism and mental health found that effects were significantly smaller for studies of anti-White discrimination compared with studies of discrimination against other ethnic groups, but found no additional differences between other ethnic groups [23]. A meta-analysis comparing the associations between discrimination and mental and physical health in Asian, Black, Hispanic, Native American, and White participants showed no significant differences based on ethnicity [13]. Similarly, little is known about age as a possible moderator of this relationship. In a meta-analysis on discrimination and mental health, age was a significant moderator in multivariate (but not univariate) analysis [23].

Other materials, including conference papers and presentations, were excluded. The majority of articles were identified through an online search. The search covered the following databases and electronic collections: Medline, PsycInfo, Sociological Abstracts, Social Work Abstracts, ERIC, CINAHL, Academic Search Premier, Web of Science and ProQuest (for dissertations/theses).
For a list of search terms used, please see S1 Appendix. In addition, the authors' personal databases were searched for additional references. We also identified 25 major literature reviews, meta-analyses and other relevant works for which reference lists were manually searched for additional articles.

Inclusion criteria
Articles were considered for review if they consisted of empirical studies reporting quantitative data on the association between racism and a health outcome/s. The titles and abstracts of articles were first screened, and then their full texts were screened in an additional stage. We use 'articles' throughout this paper to refer to both published and unpublished material, while we use 'study' to relate to unique research; therefore one study can be reported in multiple articles, and an article may report several studies.
Exposure. Reported racism is the exposure examined in this study, and includes: self-reported racism experienced directly in interpersonal contact; racism directed towards a group (e.g., based on ethnicity/race/nationality) of which the person is a member; vicarious experiences of racism (e.g., witnessing racism experienced by family members or friends); proxy reports of racism (e.g., a child's experiences of racism as reported by their parent); and internalized racism (i.e., the incorporation of racist attitudes and/or beliefs within an individual's worldview). Exposure measures include discrimination, maltreatment, prejudice, stereotypes, aggression and related terms (see S1 Appendix), where the reason/s for these include race, skin color, ethnicity, culture, ancestry, origin, birth country, nationality, migration status, religion, language and/or accent.
More general measures of discrimination, wherein the specific effect of racism cannot be isolated, were excluded. For example, papers that used the Everyday Discrimination Scale as a measure in its original (general discrimination) format, were excluded, unless the scale was modified to explicitly specify race, skin color, ethnicity, etc. as the reason for discrimination. In cases where the majority of items assessed racism specifically and all remaining items were about discrimination broadly defined (where the reason for discrimination was not specified), the measure was included.
Exposures that focused solely on discrimination due to other reasons (e.g., gender, socioeconomic status) were excluded. Several instruments combine racism and possible health outcomes in the same measure. These were excluded since this review is focused on studies in which exposure and outcome were clearly delineated as separate constructs, to allow an examination of their association without possible confounding. Exposures to race-related stress or discrimination-distress (e.g., the extent to which racial discrimination was stressful/upsetting, as measured for example by the Index of Race-Related Stress-Brief Version (IRRS-B); [47] or by the Racism Experiences Stress Scale (EXP-STR); [48]), and other exposures relating racism to health within the same instrument, or combining racism with responses to racism (e.g., how much respondents are bothered by racism) were therefore excluded. For example, versions of the Perceived Racism Scale (PRS; [49]) that included measurement of emotions, coping behaviors, and cognitive appraisals related to racist encounters, were excluded (e.g., [50]). Ecological exposure measures of racism (e.g., racial segregation), experimental exposures (e.g., videos, vignettes, tasks) (e.g., [51,52]) and other exposures where racism was assessed by the researcher, were also excluded due to our focus on observational studies examining racism perceived by research participants.
Outcomes. The following health outcomes were included: (1) negative mental health (depression, anxiety, distress, psychological stress, negative affect, post-traumatic stress (PTS) and posttraumatic stress disorder (PTSD), somatization, internalizing, suicidal ideation/planning/attempts, other mental health symptoms such as paranoia and psychoticism, and general mental health); (2) positive mental health (self-esteem, life satisfaction, control and mastery, wellbeing, positive affect); (3) physical health (blood pressure and hypertension, overweight-related measures, heart conditions and illnesses, diabetes, high cholesterol, and miscellaneous/mixed measures of physical health); and (4) general health (including both physical and mental health, or unspecified as physical and/or mental health; e.g., feeling unhealthy).
Several articles used relevant exposure and outcome measures but did not examine and/or did not report their association, and were therefore excluded. Studies that measured racism as an outcome (rather than as an exposure) were also excluded.

Screening
Approximately 20,672 articles were screened for titles and abstracts. Online database searches yielded 19,292 articles, searches of the authors' personal databases yielded 896 articles, and another 484 articles were found in the reference lists of 25 meta-analyses and literature reviews. Search results were imported into Endnote X5 [53], duplicates deleted, and two reviewers independently screened all titles and abstracts to assess eligibility for inclusion. All authors of the study protocol [46] were involved in the screening. Articles were further considered for inclusion if their title and abstract indicated that they may contain empirical research, and that they report quantitative data on the relationship between racism and health. Articles examining discrimination more generally were retained at this stage, and their full-texts further examined in the next stage. Articles not focused primarily on the association between racism and relevant health outcomes but nonetheless reporting association/s between them were included (e.g., associations between racism and health reported as part of a large correlation matrix in a study primarily focused on other variable/s). Disagreements were resolved by consensus or by a third reviewer. After completing the first screening stage of titles and abstracts, a total of 1,110 articles were retained.
Full texts of potentially eligible articles were obtained, with each full text then independently screened by two reviewers. Disagreements were resolved by consensus or by a third reviewer. The main reasons for exclusion were the reporting of irrelevant exposure and/or irrelevant outcome measures, especially articles only reporting general experiences of discrimination. Associations with outcome groups such as health behaviors/risk behaviors, pregnancy and birth outcomes and health care utilization, were excluded as beyond the scope of this meta-analysis. Where the exact same journal article was published multiple times, only the most recent version was retained and duplicates excluded. Duplicate articles by the same authors that report identical association data were excluded. Multiple articles reporting identical association data but written by different authors were retained and their associations were averaged using Comprehensive Meta-Analysis software version 2.0 (CMA, see below) [54]. After completing the second stage of screening, 534 articles remained.
In the third and final screening stage, one reviewer examined whether articles reported appropriate and sufficient association data to allow the calculation of correlation coefficients. Decisions regarding exclusion were discussed with a second reviewer. Articles reporting only adjusted associations between racism and health were recorded but excluded from the meta-analysis. While it is appropriate to adjust for covariates in individual studies, articles on racism and health often adjust for different sets of covariates, so the effects of each covariate or set of covariates cannot be determined. Articles adjusting for different covariates and reporting effect size data that need to be converted pose an additional limitation, since consistent conversions are not possible in such cases [13]. Given that few articles adjust for the same sets of covariates (e.g., [13]), we opted to focus solely on unadjusted effect size data.
The correlation coefficient was the most commonly reported measure of association between racism and health (particularly mental health), and was used as the measure of effect size. Besides correlations and sample sizes, standardized beta coefficients in unadjusted (univariate) regression models, which are equal to correlation coefficients, were used as correlations. All other metrics were converted to correlation coefficients. The following data formats were converted using CMA: 1) Odds Ratios (OR) and confidence intervals (CIs); 2) Means, standard deviations and sample sizes of two groups (racism and no racism); 3) Cross-tabulations (2x2) of events and non-events (racism/no racism and poor/good health); 4) Means and sample sizes for two groups, and an independent group t-value; 5) Means and sample sizes for two groups, and an independent group p-value; 6) Standardized mean differences (Cohen's d) and sample sizes for two groups; and 7) p-value and sample size for correlation coefficient. In a few cases (fewer than 2% of associations) only the sample size and p-level were reported, as well as whether the association was significant or not. Where the p-value was not significant and its exact value not reported, the correlation coefficient was conservatively recorded as zero (see [34] for a similar approach). Where the p-value was significant and its exact value not reported, the p-value was conservatively recorded just below the p-level (e.g., a significant p-value at 0.05 was recorded as p = 0.049999). Unstandardized regression coefficients were first converted into standardized betas and then converted into Cohen's d using the Campbell Collaboration web-based effect size calculator [55, 56], before their conversion into correlations. Covariances were converted into correlations using corresponding standard deviations [57]. When the sample size was described as a range, the range's minimum sample size was used.
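To make the conversion step concrete, the sketch below shows how a few of the listed formats can be turned into correlation coefficients. The conversions in this review were run in CMA; the function names here are hypothetical, and the formulas are the textbook ones commonly used in meta-analysis (logit method for odds ratios, pooled standard deviation for Cohen's d), assumed rather than taken from CMA's documentation.

```python
import math

def d_to_r(d, n1, n2):
    """Cohen's d -> r, with a correction term for unequal group sizes."""
    a = (n1 + n2) ** 2 / (n1 * n2)  # equals 4 when the groups are equal
    return d / math.sqrt(d ** 2 + a)

def odds_ratio_to_r(odds_ratio, n1, n2):
    """Odds ratio -> d (logit method) -> r."""
    d = math.log(odds_ratio) * math.sqrt(3) / math.pi
    return d_to_r(d, n1, n2)

def t_to_r(t, df):
    """Independent-groups t-value -> r, where df = n1 + n2 - 2."""
    return t / math.sqrt(t ** 2 + df)

def means_to_d(mean1, sd1, n1, mean2, sd2, n2):
    """Two group means and SDs -> Cohen's d using the pooled SD."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)
```

An odds ratio of 1 maps to r = 0 under these formulas, and the sign of the resulting r follows the direction in which the odds ratio or mean difference was coded.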
After exclusion of articles that did not report appropriate and sufficient association data to allow the calculation of correlation coefficients (including articles reporting adjusted effect size data only), 333 articles were left that comprised the final sample for analysis.

Data extraction and coding
Articles were reviewed and data were extracted and coded by six reviewers, including two of the authors and four postgraduate students (three doctoral students and one master's student, all with experience in research on racism and health). An Excel spreadsheet and a corresponding manual were developed for data extraction purposes (both available from the authors upon request). Reviewers extracted five types of data from each article, including data at the level of the study, participants, exposure measures, outcome measures, and effect size data. Before extraction began, reviewers read the study protocol [46], and previous meta-analyses on the topic. They were instructed on using the data extraction manual and spreadsheet, practiced extracting data from several articles, and discussed unclear issues with the first and second authors. Two reviewers each independently extracted data from a random sample of approximately 10% of articles, for which agreement was 85%. Data from all other articles were extracted by one of the six reviewers, with another reviewer double-checking coding. Disagreements were resolved by a third reviewer.

Data integration
Using CMA, effect size data were coded so that a negative correlation indicates association between high levels of racism and low (i.e. poor) levels of health, that is, when racism increases, decreased health was coded as a negative association (or OR values lower than 1). Where OR values higher than 1 originally indicated association between high levels of racism and poorer health, and values lower than 1 originally indicated association between high levels of racism and better health, these values were reverse-coded using 1/OR.
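The direction coding described above can be illustrated with a minimal sketch. The helper names and the boolean flag are hypothetical; the only rule taken from the text is that, after coding, a negative correlation (or an OR below 1) means that more racism goes with poorer health, with odds ratios reversed via 1/OR. Flipping the sign of a correlation for outcomes scored so that higher values mean poorer health (e.g., depression) is an assumption about how the same rule applies to correlations.

```python
def orient_odds_ratio(odds_ratio, above_one_means_poorer_health):
    """Return the OR coded so that values below 1 mean racism ~ poorer health."""
    if odds_ratio <= 0:
        raise ValueError("an odds ratio must be positive")
    # Reverse-code with 1/OR when OR > 1 originally signalled poorer health.
    return 1.0 / odds_ratio if above_one_means_poorer_health else odds_ratio

def orient_correlation(r, higher_score_means_poorer_health):
    """Return r coded so that negative values mean racism ~ poorer health."""
    # Assumed analogue of the OR rule: flip the sign for outcomes such as
    # depression, where a higher score indicates worse health.
    return -r if higher_score_means_poorer_health else r
```

For example, an OR of 2.0 for hypertension among those reporting racism would be recorded as 0.5, and a correlation of .30 between racism and depressive symptoms would be recorded as -.30.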
Weighted effect sizes were calculated to account for variation in sample sizes, giving more weight to effects from larger samples. When a study comprises multiple associations between racism and health for the same participant group/s, these associations are not independent. To ensure that data are independent, each study should contribute exactly one association per analysis. A single association can be selected or calculated through averaging. We chose averaging as it allows retaining as much data as possible. We extracted all relevant associations and used a shifting unit of analysis approach (e.g., [58]) to conduct analyses both at the level of individual outcomes (e.g., for depression only), and at the level of broader outcome groups (e.g., negative mental health), in which case we averaged associations for different negative mental health outcomes such as depression, anxiety and distress. Where multiple articles reported associations from the same study or data source (e.g., the Sinai Improving Community Health Survey, k = 2), these associations were averaged. We also examined associations from papers by the same first author where the names of data sources were not mentioned but where the methodology and sample characteristics were identical or nearly identical, suggesting the same data may have been used in multiple papers. Nine such potential data sources, each reported in multiple papers, were identified in discussion between two reviewers. Associations were averaged for each data source using CMA.
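The shifting unit of analysis can be sketched as follows: the same extracted associations are averaged within each study either at the level of a single outcome or at the level of a broader outcome group, so that each study contributes one value per analysis. The record layout and function name are hypothetical, and a plain arithmetic mean of r is assumed here; the averaging in this review was carried out in CMA, which may combine associations differently.

```python
from collections import defaultdict
from statistics import mean

def average_per_study(records, level, value):
    """Average r within each study for one outcome or one outcome group.

    records: dicts with keys "study", "group", "outcome" and "r".
    level:   "outcome" (e.g., depression only) or "group"
             (e.g., all negative mental health outcomes).
    """
    by_study = defaultdict(list)
    for rec in records:
        if rec[level] == value:
            by_study[rec["study"]].append(rec["r"])
    # One averaged association per study keeps the pooled data independent.
    return {study: mean(rs) for study, rs in by_study.items()}

records = [
    {"study": "A", "group": "negative mental health", "outcome": "depression", "r": -0.30},
    {"study": "A", "group": "negative mental health", "outcome": "anxiety", "r": -0.20},
    {"study": "B", "group": "negative mental health", "outcome": "depression", "r": -0.10},
]
group_level = average_per_study(records, "group", "negative mental health")
outcome_level = average_per_study(records, "outcome", "depression")
```

In this toy input, study A contributes the average of its depression and anxiety associations to the outcome-group analysis, but only its depression association to the depression analysis.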
We used the random effects model in aggregating effect sizes in all analyses. This model has been used in previous meta-analyses on the topic, as it is more appropriate than a fixed-effects model given various differences in methods, instrumentation and sample characteristics across studies (e.g., [13]) and considering that our aim is to generalize findings to the population of studies on racism and health outcomes (see also [31,59]). Mixed effect models were used in all moderator analyses, as a more conservative approach that enables testing of differences between different moderator levels (e.g., [31]).
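As a rough illustration of random-effects pooling of correlations, the sketch below uses Fisher's z transformation with the DerSimonian-Laird estimate of between-study variance. This is one standard approach and is assumed here for illustration; the analyses in this review were run in CMA, and the function name and return format are hypothetical.

```python
import math

def random_effects_pooled_r(correlations, sample_sizes):
    """Pool correlations under a random-effects model (DerSimonian-Laird).

    Returns (pooled_r, ci_lower, ci_upper, tau_squared).
    """
    z = [math.atanh(r) for r in correlations]       # Fisher z transform
    v = [1.0 / (n - 3) for n in sample_sizes]       # within-study variance of z
    w = [1.0 / vi for vi in v]                      # fixed-effect weights
    z_fixed = sum(wi * zi for wi, zi in zip(w, z)) / sum(w)
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, z))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (len(z) - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (vi + tau2) for vi in v]          # random-effects weights
    z_re = sum(wi * zi for wi, zi in zip(w_re, z)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    lower, upper = z_re - 1.96 * se, z_re + 1.96 * se
    # Back-transform the pooled estimate and its 95% CI to the r metric.
    return math.tanh(z_re), math.tanh(lower), math.tanh(upper), tau2
```

When the between-study variance is estimated as zero, the random-effects weights reduce to the fixed-effect weights, so larger samples dominate; as heterogeneity grows, the weights become more equal across studies.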

Moderation analyses
Moderation analyses were conducted separately for each individual outcome, as well as for the broader outcome groups of negative mental health, positive mental health, physical health, and general health. Moderation analyses were conducted only when at least two levels of the moderator included five or more studies. This threshold was used in prior meta-analyses, and is based on minimum thresholds established in the literature [60, 61]. Study was the unit of analysis utilized in moderation for the following variables: publication status (published/unpublished), sampling procedure (representative/non-representative), country (U.S./Other than U.S.), publication year (2005 or earlier/2006 or later), and data types (cross-sectional/longitudinal). Missing values were less than 5% and excluded from all analyses. We also conducted study-level analyses for exposure measure characteristics: exposure instrument name (comparing the 8 most frequently reported instruments, e.g., Schedule of Racist Events (SRE), Experience of Discrimination (EOD)), exposure instrument type (direct/indirect exposure to racism), exposure instrument number of items (8 or less/9 or more), exposure instrument reliability coefficient (Cronbach's alpha) value (lower than 0.8/0.8 or higher) [62], and exposure timeframe (less than 3 years/3 years or more/not specified). Categorizations were based on commonly used cutoff points (see for example [32,63]), with the aim of retaining as much data as possible given low study numbers reporting each moderator level in several analyses. Where a study reported multiple levels of the same moderator (e.g., a study reporting two exposure instruments, each using a different timeframe), such studies were excluded from the moderation analysis to avoid violating the assumption of independence.
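The eligibility rule for moderation analyses can be expressed as a small check. The sketch below assumes a hypothetical data layout in which each study maps to the moderator levels it reports; studies with a missing value or with more than one level are dropped, and the analysis proceeds only if at least two levels retain five or more studies.

```python
from collections import Counter

def eligible_for_moderation(study_levels, min_studies=5, min_levels=2):
    """Check whether a moderator has enough studies per level to be analysed.

    study_levels: dict mapping study id -> list of moderator levels reported
                  by that study (empty list when the value is missing).
    """
    counts = Counter()
    for levels in study_levels.values():
        distinct = set(levels)
        # Drop studies with missing values or with multiple levels of the
        # moderator, which would violate the independence assumption.
        if len(distinct) == 1:
            counts[distinct.pop()] += 1
    well_populated = [level for level, n in counts.items() if n >= min_studies]
    return len(well_populated) >= min_levels
```

For instance, five cross-sectional and four longitudinal studies would not qualify for a data-type moderation analysis, whereas five of each would.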
Additional moderation analyses were conducted for participant subgroups. We included in the analysis studies entirely focused on a single participant subgroup (e.g., females only) as well as studies that reported separate, independent associations for multiple different subgroups (e.g., associations for females and males). Participant groups analyzed included: sex (male/female), age (under 18/18 or over), U.S. ethnic group (African American/Asian American/European American/Latino/a American/Native American), birth country (local-born/foreign-born), and, among student samples, current education level (primary/secondary/tertiary).

Quality appraisal
Several moderators were also examined as indicators of study quality, namely: 1) publication status; 2) sampling procedure; 3) data type (i.e., longitudinal versus cross-sectional); 4) exposure instrument number of items; and 5) exposure instrument reliability coefficient (Cronbach's alpha) value. Published studies, representative sampling, longitudinal data, 9 or more items and alpha coefficient values of 0.8 or higher for exposure measures were considered indicators of higher study quality.

Publication bias analyses
Three methods were used to assess publication bias among the sample of studies. First, we produced funnel plots and examined their symmetry to assess whether there was evidence of bias. Second, we used Egger's weighted regression method [64], where the intercept's significance was examined for statistical evidence of bias. Third, we calculated a fail-safe N to estimate the number of un-located studies with an average zero effect size required to change the results substantively [65]. The fail-safe N allows us to assess whether the effect is an artifact of bias. Where we detected publication bias, we used the trim and fill method to estimate and adjust for missing (un-reported) studies to estimate what the effect size would have been in the absence of bias [66,67]. All tests of publication bias, as well as the trim and fill point estimates, were calculated using the CMA program.
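Two of these checks are simple enough to sketch directly. The code below assumes one common formulation of Egger's test (an ordinary regression of each study's standardized effect on its precision, where a non-zero intercept suggests asymmetry) and Rosenthal's fail-safe N with a one-tailed criterion of z = 1.645. The function names are hypothetical, the significance test of the intercept is omitted, and the publication bias statistics in this review were computed in CMA, whose defaults may differ.

```python
def egger_intercept(effects, standard_errors):
    """Regress effect/SE on 1/SE; return (intercept, slope)."""
    x = [1.0 / se for se in standard_errors]                   # precision
    y = [e / se for e, se in zip(effects, standard_errors)]    # standardized effect
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must vary across studies")
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def rosenthal_fail_safe_n(z_scores, z_alpha=1.645):
    """Number of unlocated null studies needed to make the combined
    result non-significant (z_alpha = 1.645 is an assumed criterion)."""
    k = len(z_scores)
    return sum(z_scores) ** 2 / z_alpha ** 2 - k
```

If every study estimated the same underlying effect regardless of its precision, the intercept would be close to zero and the slope would approximate that common effect.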

Results
Descriptive statistics are provided in Tables 1 and 2. The screening resulted in 333 articles that met all inclusion criteria and comprised the final sample analyzed in this study [43][44]. The 333 articles contained unadjusted association data from 293 unique studies, after accounting for multiple articles that report the same study and for several articles (N = 5) that report two studies each. As in previous reviews and meta-analyses, this descriptive section reports on data per article rather than per study. Since multiple articles reporting the same study often examine different subsets of the data, reporting descriptive data at the level of the study was not feasible. This approach potentially double counts participants from 25 studies, each reported in multiple articles (altogether N = 70 articles), which we recognize as a potentially minor bias. Study, exposure, and outcome characteristics are presented in Table 1.
Articles used a variety of instruments for assessing exposure to racism, with several articles using more than one exposure instrument. The most commonly used instruments of exposure to racism were the Schedule of Racist Events (SRE) (used in 10.2% of articles), Racism and Life Experience Scales (RaLES) (6.0%), Experiences of Discrimination (EOD) (5.7%), Perceived Racism Scale (PRS) (5.7%), Everyday Discrimination Scale (EDS) (4.2%), Perceived Ethnic Discrimination Questionnaire (PEDQ) (3.0%), Multidimensional Inventory of Black Identity (MIBI)-public regard subscale (3.0%), and the Nadanolitization scale (NAD) (1.5%). Most articles (79.9%) used measures of direct exposure to racism, and 18.0% of articles used indirect measures (e.g., group, vicarious, proxy-reports). Most articles used instruments that did not specify the exposure timeframe (63.1%), while a 12-month exposure timeframe was used in 16.5% of articles, and more than 3 years (including lifetime exposure) in 14.1% of articles.
With regard to internal reliability of exposure instruments, almost half of the articles (48.6%) reported instruments with Cronbach's alpha coefficients of 0.80 or higher, and 14.1% of articles reported coefficients of 0.79 or lower. Nearly half of the articles (48.6%) used instruments with 9 or more items, over a third (35.1%) used instruments with 2-8 items, and single items were reported in 16.8% of articles. The most frequently reported mental health outcome was depression (reported in 37.2% of articles), followed by self-esteem (24.3%), psychological stress (21.3%), distress (18.3%), and anxiety (14.4%). Life satisfaction was reported in 8.4% of articles, followed by negative affect (7.5%), control and/or mastery (5.7%), posttraumatic stress and posttraumatic stress disorder (4.8%), somatization (3.9%), internalizing (3.6%), suicidal ideation, planning and/or attempts (3.6%), general mental health (3.6%), wellbeing (3.0%), and positive affect (1.2%). Other mental health symptoms, such as paranoia and psychoticism, were reported in 3.6% of articles. Among physical health outcomes, blood pressure and hypertension were reported in 7.2% of articles, followed by overweight-related outcomes (Body Mass Index (BMI), overweight, obesity, Waist Circumference (WC), Waist-Hip Ratio (WHR)) (5.1%), heart conditions and illnesses (2.4%), diabetes (2.1%), and cholesterol (1.2%). Miscellaneous physical health outcomes were reported in 6.0% of articles. Miscellaneous physical health outcomes included: 1) measures that combine some of the physical health outcomes listed above (e.g., a measure combining blood pressure and diabetes); and 2) measures of other health outcomes listed below, either on their own, combined with each other, or combined with outcomes listed above. These included: angina, back problems, arthritis, asthma, bodily pain, brittle bones, cancer, constipation, diarrhea, ear infection, exhaustion, fever, headache, gastrointestinal infection and disease, general/overall physical health, kidney and liver/gallbladder problems, major paralysis, muscular problems, nausea, neurological conditions, number of childhood illnesses, osteoporosis, Parkinson's disease, physical disability, physical functioning and role-physical, physical health-related quality of life, respiratory infection, rheumatism, scabies, sickle cell disease, sickle cell trait, skin infection, sleeping problems, sore throat, stomachache, stroke, and trouble breathing. General health outcomes, either unspecified as physical or mental health, or combining physical and mental health, were reported in 9.0% of articles.
Table 2 reports the characteristics of the participants across articles. Most articles reported age, sex and, for U.S. articles, ethnic groups (Table 2). Age was reported in 264 articles, where 84.4% of participants were aged 18 years or above. Sex was reported in 322 articles, with females accounting for 60% of the total sample. Ethnic groups in the U.S. were reported in 271 articles. The largest participant subgroup was African Americans (37.1%), followed by European Americans (29.6%), Hispanic/Latino/a Americans (18.6%), and Asian Americans (9.4%). Birth country was reported in 147 articles, where 66.9% of participants were native-born, and 33.1% foreign-born. Level of education was captured in two ways. In 137 articles reporting student samples, 13.9% of participants were elementary, 54.7% were secondary, and 28.5% were tertiary school students. Additionally, in 68 articles that reported the highest level of education completed, 18.0% of participants completed less than high school/General Education Development (GED), 27.2% completed high school, and 54.8% completed more than high school. Additional papers reported different groupings of highest levels of education that are not presented here.

Effect sizes (r) by outcome groups
Table 3 presents the mean weighted effect sizes for the associations between racism and negative mental health, positive mental health, physical health, and general health using a random-effects model. Study-level results and forest plots are presented for the four main outcome groups. Of these four health outcome groups, the largest mean weighted effect size was for negative mental health (r = -.23).
Examination of the funnel plots showed fairly symmetric plots for all four outcome groups (available upon request from the authors). Using Rosenthal's (1979) fail-safe N criterion that the value should be over five times greater than the number of studies included in the meta-analysis, the fail-safe Ns for all outcome groups were at least eight times larger than the criterion value, suggesting that the effect sizes are unlikely to be an artifact of bias. Finally, Egger's regression intercept was statistically significant for negative mental health, physical health, and general health, suggesting potential publication bias. It was not statistically significant for positive mental health. For negative mental health, physical health, and general health, Duval and Tweedie's trim-and-fill procedures were utilized. The number of imputed missing studies ranged from two studies (for general health) to 47 studies (for negative mental health). Imputing resulted only in minor reductions in effect sizes, which remained significant.

Effect sizes (r) by individual outcomes
Table 3 also presents the mean weighted effect sizes for the associations between racism and individual health outcomes using a random-effects model. Due to space limitations, forest plots are not presented for individual outcomes, but are available from the authors upon request.
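As an illustration of how a mean weighted effect size can be obtained under a random-effects model, the sketch below pools correlations on Fisher's z scale using the DerSimonian-Laird estimator of between-study variance. The estimator, the function name pool_correlations, and the input values are assumptions made for illustration only: the text does not state which estimator or software routine was used, and the correlations and sample sizes shown are hypothetical, not values from Table 3.

```python
import math

def pool_correlations(rs, ns):
    """Random-effects pooling of correlations (DerSimonian-Laird, Fisher z).

    rs: study-level correlations; ns: study sample sizes (each n > 3).
    Returns the pooled r, its 95% CI, tau^2, Cochran's Q and k.
    """
    zs = [math.atanh(r) for r in rs]           # Fisher z transform
    vs = [1.0 / (n - 3) for n in ns]           # sampling variance of z
    w = [1.0 / v for v in vs]                  # fixed-effect weights
    sw = sum(w)
    z_fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sw
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, zs))
    df = len(rs) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Between-study variance, truncated at zero
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in vs]      # random-effects weights
    z_re = sum(wi * zi for wi, zi in zip(w_re, zs)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    lo, hi = z_re - 1.96 * se, z_re + 1.96 * se
    return {
        "r": math.tanh(z_re),                  # back-transform to r
        "ci": (math.tanh(lo), math.tanh(hi)),
        "tau2": tau2,
        "q": q,
        "k": len(rs),
    }

# Hypothetical inputs, for illustration only
result = pool_correlations([-0.30, -0.15, -0.22, -0.10], [120, 450, 80, 300])
print(result)
```

Pooling on the z scale and back-transforming keeps the confidence interval inside the valid range for r; other estimators of between-study variance would give somewhat different weights.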
The majority of studies examined negative mental health outcomes, with the mean weighted effect sizes ranging from r = -.34 for post-traumatic stress and post-traumatic stress disorders (95% CI [-.40,-.27], k = 16) to r = -.16 for suicidal ideation (95% CI [-.19,-.12], k = 10). All effect sizes for the negative mental health outcomes were significantly negative, indicating that racism is related to poorer mental health. The effect sizes for positive mental health outcomes ranged from r = .00 (positive affect) to r = -.19 (wellbeing). With the exception of positive affect (k = 4), which did not reach significance, racism had a significant negative association with all positive mental health outcomes.
The effect sizes for physical health outcomes ranged from r = .00 for each of the following: . Except for weight-related outcomes (r = -.08, 95% CI [-.11,-.05], k = 15) and miscellaneous physical health outcomes, there was no statistically significant association between racism and physical health outcomes.
Examination of the funnel plots showed fairly symmetric distributions (available upon request from the authors) and the fail-safe Ns were all well above the criterion values for all individual health outcomes. Of the 23 individual outcomes, Egger's regression intercept was statistically significant for seven: depression, distress, stress, self-esteem, life satisfaction, overweight, and general health. Imputing possible missing studies for these seven outcomes resulted in either no, or very little, reduction in effect sizes, suggesting that the impact of bias is likely to be trivial. Depression (k = 109) had the highest number of imputed studies (i.e., 23 studies) and the largest adjustment to its random-effects point estimate, from r = -.21 to r = -.18.
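Two of the publication-bias checks described above can be sketched in a few lines. The code below computes Rosenthal's fail-safe N from study-level Z scores and the intercept of Egger's regression of standardized effects on precision. The function names and input values are hypothetical. The tolerance level of 5k + 10 is the commonly cited form of Rosenthal's criterion and is an assumption here, since the text describes the criterion only as more than five times the number of studies. Trim-and-fill is not shown.

```python
import math

def failsafe_n(z_scores, z_alpha=1.645):
    """Rosenthal's fail-safe N: number of unpublished null studies needed
    to bring the combined one-tailed p-value above .05."""
    k = len(z_scores)
    return (sum(z_scores) ** 2) / (z_alpha ** 2) - k

def egger_intercept(effects, ses):
    """Egger's regression: regress effect/SE on 1/SE by ordinary least squares.

    Returns (intercept, standard error of intercept, residual df).
    An intercept far from zero suggests funnel-plot asymmetry.
    """
    y = [e / s for e, s in zip(effects, ses)]  # standard normal deviates
    x = [1.0 / s for s in ses]                 # precisions
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in resid) / (n - 2)   # residual variance
    se_int = math.sqrt(s2 * (1.0 / n + mx * mx / sxx))
    return intercept, se_int, n - 2

# Hypothetical data: smaller studies (larger SE) report stronger effects
effects = [-0.40, -0.30, -0.22, -0.20]
ses = [0.20, 0.12, 0.06, 0.04]
b0, se_b0, df = egger_intercept(effects, ses)
print("Egger intercept:", b0, "SE:", se_b0, "df:", df)

z_scores = [2.0, 2.0, 2.0, 2.0]
k = len(z_scores)
print("fail-safe N:", failsafe_n(z_scores), "tolerance (assumed 5k + 10):", 5 * k + 10)
```

The intercept divided by its standard error can be referred to a t distribution with the returned degrees of freedom to obtain a significance test.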

Study-level moderators
Using moderation analyses, we examined whether associations between racism and health were moderated by study-level variables. Results are shown in Table 4. Significant moderation effects were found for all moderators except the internal reliability of exposure instruments.
Publication status: publication status significantly moderated the association between racism and self-esteem (Q(1) = 7.38, p = .007), and between racism and positive mental health more broadly (Q(1) = 5.89, p = .015). Published studies had larger effect sizes as compared to unpublished studies.
Publication year: publication year was a significant moderator for the association between racism and depression (Q(1) = 6.04, p = .014), and between racism and anxiety (Q(1) = 7.00, p = .008). Studies published from 2006 onwards had stronger effect sizes compared to studies published before 2006.
Country: the country where the study was conducted significantly moderated the association between racism and negative affect (Q(1) = 4.33, p = .037), and between racism and self-esteem (Q(1) = 5.02, p = .025). Specifically, the effect sizes for studies conducted outside the U.S. were larger than effect sizes for studies conducted in the U.S.
Sampling: the sampling design was a significant moderator of the association between racism and the following three outcomes: stress (Q(1) = 3.98, p = .046), anxiety (Q(1) = 4.34, p = .037), and negative mental health more broadly (Q(1) = 5.13, p = .023). The effect sizes for studies using non-representative sampling were larger than the effect sizes for studies using representative sampling.
Longitudinal versus cross-sectional data: there were sufficient numbers of studies reporting longitudinal data to allow the comparison of effect sizes from cross-sectional data versus longitudinal data for two outcome groups (positive mental health and negative mental health) as well as two individual health outcomes (self-esteem and depression). Type of data collected (i.e., longitudinal versus cross-sectional data) significantly moderated the association between racism and negative mental health (Q(1) = 5.58, p = .018), but did not moderate the associations for self-esteem (Q(1) = 2.63, p = .105), positive mental health (Q(1) = 3.00, p = .083), and depression (Q(1) = 2.37, p = .124). There were insufficient data to examine data type features for physical health and other outcomes.
For negative mental health, a moderation analysis of cross-sectional and longitudinal studies (regardless of time between exposure and outcome) showed that the effect size for cross-sectional data (r = -.22, z = -25.55, p < .001, k = 197) was larger than the effect size for longitudinal data (r = -.16, z = 5.84, p < .001, k = 14, Q(1) = 5.58, p = .018). Sufficient longitudinal data were reported in studies on racism and negative mental health to allow separate analyses for short-term longitudinal data (up to 1 year between exposure and outcome) and long-term longitudinal data (more than 1 year between exposure and outcome) (not reported in the table). This moderator was significant once again (Q(2) = 13.08, p = .001). The effect size for cross-sectional data (r = -.22, z = -25.55, p < .001, k = 197) was similar to the effect for short-term longitudinal data (r = -.21, z = -4.38, p < .001, k = 5, Q(1) = 0.049, p = .826), and significantly larger than the effect for long-term longitudinal data (r = -.11, z = -3.79, p < .001, k = 7, Q(1) = 13.082, p < .001), which was significant nonetheless. For the longitudinal studies reporting negative mental health outcomes, we also tested the months between exposure and outcome as a potential moderator. The number of months between exposure and outcome was not a significant predictor of effect sizes (B = .0057, z = 1.57, p = .115, k = 12). Two studies reported both short-term and long-term longitudinal effects and were excluded from this supplementary analysis.
Exposure instrument type: exposure instrument type was a significant moderator for the association between racism and depression (Q(1) = 4.68, p = .031), and between racism and negative mental health more broadly (Q(1) = 6.58, p = .010). Studies using exposure instruments that measure direct exposure to racism had larger effect sizes as compared to studies using exposure instruments that measure indirect (i.e., group or vicarious) exposure to racism.
Exposure instrument timeframe: the exposure instrument timeframe significantly moderated the association between racism and life satisfaction (Q(1) = 4.07, p = .044). Studies that ask about exposure to racism in the last 3 years or less had a smaller effect size as compared to studies using exposure instruments with unspecified timeframes.
Number of exposure instrument items: the number of items in the exposure instruments significantly moderated the association between racism and distress (Q(1) = 3.95, p = .047). The effect size for studies using exposure instruments with 9 items or more was larger than the effect size for studies using exposure instruments with 8 items or less.
Internal reliability of exposure instruments: only five outcome groups (depression, anxiety, self-esteem, negative mental health and positive mental health) had sufficient studies to allow moderation analysis using internal reliability of the exposure instrument as a moderator. In these analyses, exposure instrument reliability did not significantly moderate the associations between racism and any of the five health outcomes.

Participant-level moderators
We conducted moderation analyses for participant subgroups, examining age, sex, ethnicity (for studies conducted in the U.S.), level of education and birthplace as potential moderators of the association between racism and health outcomes. Age (≥ 18 years vs. < 18 years), sex (male vs. female), current education level (primary vs. secondary vs. tertiary) and birthplace (local vs. foreign born) did not significantly moderate the association between racism and any of the health outcomes examined, where at least two levels of the moderator consisted of five or more studies. However, the associations between racism and depression, negative mental health, and physical health were significantly different across U.S. ethnic groups.
For depression (Q(2) = 6.29, p = .043), the associations for Asian Americans produced the largest effect size (r = -.28, z = -6.21, p < .001, k = 11), followed by associations for Latino/a Americans (r = -.24, z = -12.49, p < .001, k = 29). While these two groups did not have significantly different effect sizes from each other, they both had significantly larger effects when compared with African Americans (r = -.18, z = -8.06, p < .001, k = 38). Effect sizes for the other ethnic groups (European Americans, Native Americans) were few in number and were therefore not included in the analysis.
For negative mental health, which consists of depression as well as other mental health outcomes, the overall moderation analysis including 5 ethnic groups was not significant (Q(4) = 7.64, p = .106). However, ethnicity was significant as a moderator in pairwise analyses comparing effects for African Americans with effects for Latino/a Americans and for Asian Americans. Accordingly, the associations for Asian Americans produced the largest effect size (r = -.28, z = -8.30, p < .001, k = 20), followed by Latino/a Americans (r = -.25, z = -14.32, p < .001, k = 49). While these two groups did not have significantly different effect sizes from each other, both had significantly larger effect sizes when compared with African Americans (r = -.20, z = -10.79, p < .001, k = 68). The effect sizes for European Americans (r = -.21, z = -5.28, p < .001, k = 6), and for Native Americans (r = -.21, z = -8.25, p < .001, k = 6), were not significantly different from the effect sizes for the 3 other ethnic groups.
For physical health, African Americans and Latino/a Americans were the only groups for which sufficient numbers of associations were reported to allow moderation analysis. Ethnicity significantly moderated the association between racism and physical health (Q(1) = 5.22, p = .022). Specifically, the effect size for Latino/a Americans (r = -.12, z = -3.26, p = .001, k = 5) was significantly larger than the effect size for African Americans (r = -.03, z = -2.33, p = .020, k = 24).
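The pairwise moderation tests reported above can be approximated from the published summary statistics alone. The sketch below is a simplified reconstruction under stated assumptions, not the procedure used in the analysis: it assumes each reported z is the pooled Fisher-z estimate divided by its standard error, recovers the standard error as atanh(r) / z, and forms a between-group Q with one degree of freedom. The function name subgroup_q is hypothetical. Because the inputs are rounded to two decimals, the result should land near, but not exactly on, the reported Q(1) values.

```python
import math

def subgroup_q(r1, z1, r2, z2):
    """Approximate between-subgroup Q(1) from two pooled correlations.

    r1, r2: pooled correlations for the two subgroups.
    z1, z2: their reported test statistics (assumed to equal estimate / SE
            on the Fisher z scale).
    Returns (Q, p), with p from a chi-square distribution on 1 df.
    """
    f1, f2 = math.atanh(r1), math.atanh(r2)    # Fisher z estimates
    se1, se2 = abs(f1 / z1), abs(f2 / z2)      # recovered standard errors
    q = (f1 - f2) ** 2 / (se1 ** 2 + se2 ** 2)
    p = math.erfc(math.sqrt(q / 2.0))          # upper tail of chi-square(1)
    return q, p

# Physical health: Latino/a Americans vs. African Americans (reported Q(1) = 5.22)
q_eth, p_eth = subgroup_q(-0.12, -3.26, -0.03, -2.33)
print("ethnicity:", q_eth, p_eth)

# Negative mental health: cross-sectional vs. long-term longitudinal data
# (reported Q(1) = 13.082)
q_long, p_long = subgroup_q(-0.22, -25.55, -0.11, -3.79)
print("data type:", q_long, p_long)
```

A check of this kind is useful mainly for confirming that reported effect sizes, test statistics and Q values are mutually consistent; it does not replace a moderation analysis on the study-level data.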

Discussion
This meta-analysis is the first to focus specifically on racism and health across a range of populations, national contexts and health outcomes. Using a comprehensive and rigorous search protocol, a total of 293 studies reported in 333 articles were located. Consistent with previous systematic reviews, trends over time indicate increasing output of published articles focusing on racism and (particularly adult) health, while a relative majority of research is still being conducted in the U.S. among African Americans [10,14]. However, this meta-analysis indicates an increasing trend for articles to include European Americans (often for comparative purposes) with a growing focus on Latina/o and Asian Americans as well. The majority of participants were adults, and only about 16% of participants were younger than 18 years old. Most participants were female, and over a third of articles focused on student samples (predominately from secondary and tertiary, rather than elementary, education levels). This research demonstrates a primary focus on locally-born populations but with a third of articles also including migrant populations (see also [14]). Comparative studies on the impact of racism over time on both migrants and native-born populations of similar ethnic/racial backgrounds are currently lacking and should be examined in future research.
This meta-analysis indicates that racism is significantly related to poorer health, with the relationship being stronger for poor mental health and weaker for poor physical health. After adjusting for publication bias, the correlation with poor mental health remained twice as large as the correlation for poor physical health, with results for general health (unspecified as mental or physical health/mental and physical health combined) falling in-between. This contrasts with the findings reported by Pascoe and Richman [13], where the association between perceived discrimination and physical health compared to mental health did not differ significantly. One possible reason for the discrepancy is that Pascoe and Richman [13] examined multiple forms of perceived discrimination, whereas the present study focused on discrimination based explicitly on race and related classifiers like ethnicity and nationality.
A more detailed examination of health outcomes indicates a two-fold range in the strength of association between racism and poor mental health (from r = -0.16 for suicidal ideation, planning, and attempts to r = -0.34 for post-traumatic stress and post-traumatic stress disorder). After adjustment for publication bias, depression (the most commonly reported outcome) had the same magnitude of association as the broader category of negative mental health. With regard to physical health, only overweight-related outcomes and a range of miscellaneous physical health outcomes were significantly associated with racism, as found in recent longitudinal studies [399,400]. A recent review [401] found only some significant associations between racism and obesity. Our different finding may be due to the more comprehensive nature of our examination of overweight-related outcomes (including BMI, WC, WHR, overweight and obesity), the inclusion of additional studies (some of which were conducted more recently), and the use of different designs (a meta-analysis versus a literature review).
The small number of studies for cholesterol (k = 4) and heart conditions/illnesses (k = 8) may have limited power to detect associations for such outcomes; however, this was not the case for the null finding between racism and blood pressure/hypertension (k = 24). In a recent meta-analysis, Dolezsar et al. [34] also found that racism was not significantly related to either blood pressure or hypertension. Whereas previous meta-analyses in this field have tended to consider physical health outcomes as a group, our additional examination of disaggregated outcomes reveals mixed findings across distinct physical health outcomes. Although some research has suggested the relationship between racism and blood pressure may be curvilinear [121,281,304,402,403], this possibility was not explored in the current meta-analysis due to limitations in the statistical analysis program we utilized.
The stronger association between racism and mental health outcomes, compared with physical health, raises questions about the mechanisms by which racism affects health. Chronic exposure to racism may be implicated in hypothalamic-pituitary-adrenal (HPA) axis dysregulation that, in turn, can damage bodily systems and lead to physical outcomes such as CVD and obesity. The impacts of racism on the dysregulation of cognitive-affective regions such as the prefrontal cortex, anterior cingulate cortex, amygdala and thalamus share similarities with pathways leading to anxiety, depression and psychosis [404]. Neuroimaging studies have also identified activation of these regions in response to social rejection that are correlated with self-report distress levels and are analogous to the activation of regions involved in physical pain [405]. Such neurobiological changes may also be precursors to racism-related vigilance and rumination which are emerging as health risk factors in their own right [406][407][408][409][410][411].
For negative mental health, the effect size for studies using cross-sectional data was larger than the effect size for studies using longitudinal data. We found that long-term longitudinal data (more than one year between exposure and outcome) showed weaker, although still significant, associations between racism and health compared to either cross-sectional or shorter-term longitudinal data (up to one year between exposure and outcome). Schmitt et al. [23] have found similar results with regard to discrimination more broadly. This finding suggests that the detrimental impact of racism may attenuate over time, perhaps because of the fading impact of brief exposure or due to individuals becoming 'hardened' to racism over time [412,413]. The implications of such preliminary evidence for the etiological 'half-life' of racism warrant further investigation through a greater focus on longitudinal data in the field (comprising only 9% of articles to date).
Moderation by age, sex, education level and birthplace of study participants has been found in previous individual studies (e.g., [42,414]) but has been inconclusive in previous systematic reviews and meta-analyses (e.g., [13]). Similarly, we found that these variables did not significantly moderate the association between racism and the health outcomes. Ethnicity, however, significantly moderated the association between racism and depression, negative mental health, and physical health, providing tentative evidence that the association between racism and negative mental health is stronger for Asian Americans and Latino/a Americans compared with African Americans, and that the association between racism and physical health is stronger for Latino/a Americans compared with African Americans. There were no significant differences between African Americans and either European or Native Americans. These findings could suggest that African (and possibly Native) Americans are more resilient to racism than other minority groups [415]. It is also possible that, as may be the case for European Americans [416], their experiences are qualitatively distinct from other minority groups. These results should be treated as preliminary given the number of studies reporting effects for some ethnic groups was rather small (for example, only 6 studies of Native Americans and 6 studies of European Americans were used in analyses of racism and mental health, and only 5 studies of Latino/a Americans were used in analyses of racism and physical health).
Other studies have found variations in correlation strength within Latino/a American groups (Cubans, Mexicans, Dominicans and Puerto Ricans) [29] and between Asian, African and Latino Americans in relation to chronic conditions [417]. The magnitude of associations in our meta-analysis was similar to (or slightly higher than) findings from previous meta-analyses that focused on these specific ethnic/racial population groups [28][29][30][31].
In terms of study quality, there was some indication of publication bias whereby some associations between racism and health outcomes were stronger in published versus unpublished studies, consistent with the tendency for 'null' studies to remain unpublished (i.e., the 'file drawer' problem [418]). Similarly, studies using non-representative samples had larger effect sizes than those using representative samples, indicating that bias may be introduced through 'convenience' and other sampling strategies. This finding raises questions about the potential role of sampling bias. Convenience samples often employ strategies such as snowball sampling, whereby participants are connected to one another, which then leads to problems with autocorrelation, or recruitment via advertisements in community locations that could lead to recruitment of those participants for whom racism is more salient, thus inflating effect sizes. By contrast, larger and more representative studies often use more sophisticated sampling methods, as well as statistical corrections (e.g., sampling weights) that may lead to less biased estimates. We stress that the associations between racism and health are evident regardless of the study methodology, but note that studies using convenience samples may overestimate the association between racism and health outcomes.
It is not immediately clear why associations between racism and some negative mental health outcomes were stronger in studies published more recently (from 2006) or in non-U.S. studies. Further research, including meta-regression to control for potential methodological covariates, should investigate such trends more closely. Unlike Pieterse et al. [31], who found no moderation by sample, publication or instrument type in their meta-analysis of perceived racism and mental health for Black Americans, we found stronger associations for some health outcomes in studies utilizing direct rather than indirect exposure measures, exposure assessment with unspecified rather than specified timeframes, and instruments with more rather than fewer items. The latter resonates with previous findings that the association between racism and health is stronger in studies that employ multiple item or multiple domain measures of racism [10,22,419].
One of the key challenges in the study of racism and health is the profusion of exposure measurements currently utilized by researchers [420,421]. Combining the eight most popular scales accounts for only about a third of extant articles, with the Schedule of Racist Events (the most commonly utilized tool) being referenced in only around 10% of articles. As the field matures, it is likely that measurement will converge on validated best-practice instruments. Both the Everyday Discrimination Scale [422] and Experiences of Discrimination scale [221] are likely candidates given applicability to a range of ethnoracial groups and extensive psychometric validation [280,[423][424][425][426][427][428], although a focus on how these and other instruments can validly assess racism among children and youth is currently lacking [14]. We caution however, that our focus on explicit mentions of race will tend to understate the contribution of scales like the Everyday Discrimination Scale, for which attributions to race are volunteered by participants, rather than intrinsic to the question asked.
This study noted some differential findings by exposure measures, for example for PRS. Studies using PRS found stronger effects of racism on mental health when compared with studies using RaLES, whereas the effect of racism on physical health was weaker in studies using PRS compared with studies using EDS and EOD. These novel findings should be investigated in future studies that explicitly compare effects between various measures (e.g., [429]).
Internal reliability, however, did not moderate associations between racism and health. Alpha coefficients are the most commonly reported indicator of an instrument's reliability but, unfortunately, measure only a single dimension (internal consistency). More valid and reliable instruments should be better able to detect associations between racism and health. Accordingly, we encourage authors to provide a more comprehensive description of their instrument's psychometric properties (e.g., test-retest reliability). Finally, the relatively weaker association found in studies using indirect measures (e.g., group and vicarious exposures to racism) may stem from the relatively under-developed nature of measurement approaches to date (e.g., [295,430,431]), highlighting a need for further development.
Although required to estimate variations in exposure over time [432], explicit time frames were not always included in exposure instruments. Gee, Walsemann and Brondolo [413] have argued that explicit attention to the timing of racism is critical both for theoretical and empirical reasons, especially these dimensions of time: (1) the length of exposure to discriminatory events; (2) the timing of these events within the life course; and (3) the etiological period between exposure and the onset of illness. While in this study the effects of racism on health were not modified by age (categorized as 18 years and older vs. less than 18 years), it is highly plausible that children are more vulnerable to the harmful effects of racism, and that experiences of racism in the early years of life have more severe and persistent health consequences than racism experienced later in life [14]. This is likely through the biological embedding of early life stress as well as weathering effects resulting from chronic exposure to stress throughout life [433]. The few studies on racism and health conducted among pre-adolescent populations limit the extent to which moderation effects by age can be tested. More longitudinal studies commencing early in the lifespan, from pre-conception onwards, are required to further elucidate these causal pathways.
This study includes a number of limitations. First, it does not include articles in languages other than English and thus may under-represent studies from countries which publish in other languages. Second, it focuses only on unadjusted associations, mainly because of the challenges related to adjusting for different sets of covariates. A recent meta-analysis demonstrated that for mental health and discrimination more generally, differences between unadjusted associations and those adjusted for covariates were not significant [23]. Nonetheless, we concur with Pascoe and Richman [13] that a specific set of control covariates should be reported in future studies to allow more thorough meta-analytic investigation of partial correlations. A growing body of literature may also allow further elucidation of moderators using meta-regression. Although beyond the scope of this paper, a fine-grained examination of individual outcomes (e.g., psychological stress, depression) is also required given evidence of differential associations between racism and various health outcomes.
This study is the most comprehensive meta-analysis on racism and health to date, providing information on the state of play in this rapidly growing field. Our findings corroborate previous research findings as to the magnitude of associations between racism and mental health, adding novel meta-analyses of associations between racism and a diverse range of outcomes, including overweight, somatization, psychological stress, and post-traumatic stress (PTS) and post-traumatic stress disorder (PTSD). It also provides evidence that racism has long-term effects on health that remain significant despite attenuation over time. It is hoped that this meta-analysis can provide new directions for research in understanding, as well as addressing, racism as a determinant of ill-health.
Supporting Information S1