Death by Segregation: Does the Dimension of Racial Segregation Matter?

The county-level geographic mortality differentials have persisted in the past four decades in the United States (US). Though several socioeconomic factors (e.g., inequality) partially explain this phenomenon, the role of race/ethnic segregation, in general, and the different dimensions of segregation, more specifically, has been underexplored. Focusing on all-cause age-sex standardized US county-level mortality (2004–2008), this study has two substantive goals: (1) to understand whether segregation is a determinant of mortality and if yes, how the relationship between segregation and mortality varies by racial/ethnic dyads (e.g., white/black), and (2) to explore whether different dimensions of segregation (i.e., evenness, exposure, concentration, centralization, and clustering) are associated with mortality. A third goal is methodological: to assess whether spatial autocorrelation influences our understanding of the associations between the dimensions of segregation and mortality. Race/ethnic segregation was found to contribute to the geographic mortality disparities. Moreover, the relationship with mortality differed by both race/ethnic group and the dimension of segregation. Specifically, white/black segregation is positively related to mortality, whereas the segregation between whites and non-black minorities is negatively associated with mortality. Among the five dimensions of segregation, evenness and exposure are more strongly related to mortality than other dimensions. Spatial filtering approaches also identified six unique spatial patterns that significantly affect the spatial distribution of mortality. These patterns offer possible insights that help identify omitted variables related to the persistent patterning of mortality in the US.


Introduction
While the United States (US) as a whole has experienced a significant decrease in mortality since World War II, from approximately 20 deaths per 1,000 population to 8 deaths per 1,000 population [1], two mortality patterns have remained: racial/ethnic and geographic mortality disparities. An example of the racial/ethnic mortality disparity can be found in data comparing the mortality of non-Hispanic blacks with that of other race/ethnicity groups. In 2011, the age-adjusted mortality among non-Hispanic blacks was 9.04 deaths (per 1,000 population), whereas non-Hispanic whites and Hispanics had 7.54 deaths and 5.39 deaths, respectively [2]. Similarly, geographic mortality disparities across the US have been stable for the past four decades [3]. For example, the counties in the Black Belt and lower Mississippi Valley have had relatively high mortality rates, whereas those in the Great Plains, Mid-West, and along the US/Mexico border have had much lower rates [3]. Explanations for these geographic mortality disparities have mainly focused on socioeconomic factors (e.g., poverty, unemployment, educational attainment, income inequality, and social capital), demographic structure (e.g., racial composition), and context (e.g., rurality) [4][5][6][7][8][9]. While these factors are important, the variation in mortality along racial/ethnic and geographic dimensions has not been fully explained in the literature. A potential determinant, namely race/ethnic segregation and the different dimensions of segregation, has, perhaps surprisingly, rarely been incorporated into ecological mortality research, particularly nationwide county-level studies [10].
The concept of segregation is complex, but earlier studies on health and segregation largely overlooked the complexity of how to measure segregation. Massey and Denton [11] have shown that segregation can be classified into five different dimensions (evenness, exposure, concentration, centralization, and clustering) and that each of these dimensions can in turn be calculated in multiple ways. Little attention has been paid to the nuances of race/ethnic segregation in health research [12,13]. A recent study used three different measures of segregation to explore the associations with mortality [14]; however, this analysis focused solely on white/black segregation, ignoring other race/ethnic groups. In this context, this paper makes a unique contribution to the literature by examining whether racial/ethnic segregation and specific dimensions of segregation are associated with US county-level mortality. The findings of this study will substantiate the utility of ecological mortality research and contribute new material and perspectives to discussions of segregation and health.
Research on race/ethnic segregation has tended to use the metropolitan and nonmetropolitan definitions proposed by the Office of Management and Budget and to focus on US metropolitan areas [15][16][17]. This metropolitan focus persists even though race/ethnic segregation has been reported to be higher in nonmetropolitan areas where the racial composition is generally less diverse [18][19][20][21]. Following Lobao and colleagues [22], we adopt a county-level perspective to consider both metropolitan and nonmetropolitan areas of the US, and this approach complements the metropolitan focus in the literature in the following ways. First, the metropolitan focus is often on a single metropolitan area or a subset of metropolitan areas and does not cover the entire US, with the consequence that nonmetropolitan areas are overlooked. The county-level perspective provides a nationwide assessment of the relationship between segregation and mortality and allows for a direct comparison with previous county-level studies, e.g., Cossman et al. [3]. Second, some large metropolitan areas are composed of multiple counties, and in this context the findings are specific to a metropolitan area and not to each county within it; as a consequence, conventional approaches overlook the heterogeneity within metropolitan areas and may obscure the variation in mortality within a metropolitan area [10]. Thus the county-level perspective disentangles the segregation-mortality relationship within a metropolitan area and offers nuanced insight into the geographic mortality differential across the US. However, the use of county-level data in an ecological analysis can also raise several methodological issues. Most specifically, counties are not independent observations, and thus using county-level data without controlling for spatial dependence may result in biased estimates and lead to invalid or misleading interpretation of findings [23]. In this paper, we directly address this issue via the use of spatial filtering regression methods, an emergent technique in spatial analysis.
This study has four main goals: (1) to understand whether race/ethnic segregation is a determinant of mortality in US counties, (2) to investigate whether the relationship between segregation and mortality varies by the segregation of individual minority groups (i.e., blacks, Hispanics, and Asians/Pacific Islanders) from the dominant group (i.e., whites), (3) to explore whether different dimensions of segregation are associated with mortality, and finally (4), a methodologically informed goal, to assess whether spatial dependence influences our understanding of the associations between the individual dimensions of segregation and mortality. Our analysis focuses on US county-level mortality data for 2004-2008.

Segregation and Health by Race/Ethnicity
The conventional belief that race/ethnic segregation is adversely related to health is partially rooted in the ethnic stratification perspective. Collins and Williams [24] argued that race/ethnic segregation could be understood as a structural manifestation of racism against minorities, and in particular, non-Hispanic blacks (hereafter blacks). Discrimination against minority groups can take many forms. Among the most dominant forms is when discrimination fuels the residential sorting process, which, as Logan [25] noted, is a powerful mechanism that maintains the advantages of the majority group and generates ethnic stratification. From this perspective, people living in racially segregated neighborhoods are exposed to multiple health risk factors, such as poverty, crime, and poor public services [24,26], and these risk factors are associated with poor health and racial health disparities [27]. The legal challenges to discrimination and discriminatory practices enacted since the 1960s have in part led to the decline in white/black segregation [28].
Extending the ethnic stratification perspective, there are four key mechanisms underlying the common belief that race/ethnic segregation is detrimental to health: (1) areas with high levels of race/ethnic segregation have poor socioeconomic status (e.g., high poverty and high unemployment), which may contribute to poor health outcomes [24,26,29]; (2) race/ethnic segregation is associated with political alienation and powerlessness, and these factors may lead to relatively few resources being channeled into a minority area; (3) the environment of an area with high race/ethnic segregation is more likely to be neglected and to lack infrastructure [30,31]; and (4) the hospitals and community health care centers in highly segregated areas have been found to lack state-of-the-art technology or facilities that can reduce mortality, and living in segregated areas may translate into poor access to and/or utilization of health care services [32,33]. These pathways, individually and in combination, may expose local residents to multiple health risks, and thus a negative association between race/ethnic segregation and health is expected [27]. The conventional framework is heavily driven by the ethnic stratification perspective, but this framework may not be applicable to non-black minority groups, such as Hispanics and Asians/Pacific Islanders.
The landscape of racial composition has rapidly changed since the 1980s, mainly due to the influx of immigrants from both Latin America and Asia [34]. The geographic mixing, and thus the race/ethnic segregation, of Hispanics and Asians/Pacific Islanders with whites has been transformed [35,36]. Researchers examining the residential sorting processes for different minority groups find that the processes separating the growing race/ethnic groups from the white population differ from those accounting for white/black segregation. Though Asian immigrants have stronger social capital and higher educational attainment than Hispanic immigrants, both groups tend to live in ethnically bound neighborhoods or enclaves [37]. This living arrangement can help them to improve their socioeconomic situation and ease the process of adaptation to the new society. That is, the segregation of Hispanics and Asians from whites may be strategic for these minority groups. Logan and colleagues [37] identified two types of neighborhood that serve the goal of helping immigrants to survive and thrive: ethnic enclaves and ethnic communities. The distinction between these two neighborhoods is grounded in the motives of minority residents. Specifically, the former plays a temporary or transitional role in the process of adaptation, whereas the latter is established by minority members who voluntarily live nearby, usually in the later stage of the process of adaptation [37]. Despite the difference, the shared and imperative function of both neighborhoods is to help minorities to thrive or accumulate social and financial capital. This function may encourage Hispanics and Asians to self-segregate to take advantage of ethnically bound neighborhoods. Racial segregation may, hence, be beneficial to Hispanics and Asians for the following reasons.
First, living in an ethnic enclave/community can translate into increased social support, frequent social engagement with people of the same race/ethnicity, and fewer challenges emerging from linguistic isolation [31]. These factors foster strong social cohesion that may facilitate health and well-being [38]. Second, and related, an ethnic enclave/community may provide social, economic, and structural resources generated by the close-knit social connections among residents of the same race/ethnicity [39,40]. That is, access to educational, informational, and occupational opportunities may be better in an ethnic enclave or community than in other types of neighborhood. Third, being segregated from the dominant racial group indicates a low level of exposure to direct racial discrimination, and in such a neighborhood, the norm that racial discrimination is intolerable would prevail thanks to a strong ethnic identity [41,42]. All of these factors lead us to argue that race/ethnic segregation may be beneficial to health (i.e., mortality in this study), particularly for non-black minorities. It should be noted that scholars recently found that living in ethnic enclaves was associated with no access to a psychiatrist in a neighborhood, which conflicts with our arguments [33]; however, the majority of the literature still provides support for the aforementioned benefits.
As all-cause mortality is directly affected by the prevalence of diseases and/or health behaviors, in order to better estimate the relationship between mortality and segregation, it becomes necessary to control for population health in a county. Somewhat surprisingly, relatively few ecological mortality studies in the past decade took this aspect into consideration, cf. Kindig and Cheng [43]. This study will take population health measures into account. Should this approach be used and the relationship between segregation and mortality remain, we obtain strong evidence to support our substantive arguments.
Since segregation and mortality are essentially ecological measures, their distribution across counties may be spatially dependent, as are many other social data [44]. To obtain unbiased estimates, researchers have become increasingly aware of the importance of taking spatial dependence into account, particularly in ecological demographic studies [45]. We will discuss how this study addresses this methodological issue later.

Why Do Dimensions of Segregation Matter?
We argue that the aforementioned mechanisms linking segregation and health may be better captured by some specific dimensions of segregation, and we propose two reasons to justify this. First, it has been found that the levels of segregation are not necessarily consistent across the five dimensions. For example, Wilkes and Iceland [17] examined the five dimensions of segregation across US metropolitan areas and found that the cities with high concentration and centralization tend to have relatively low evenness, exposure, and clustering. That is, for a given metropolitan area, a highly uneven distribution of minority groups does not necessarily translate into a high concentration of minority groups. In this way, using a single dimension of segregation may not fully capture the mechanisms through which segregation affects health.
Second, the social and health implications differ by the dimensions of segregation. For example, exposure focuses on the potential contact between the minority and dominant racial/ethnic groups, and high levels of contact indicate less segregation. The exposure dimension better echoes the concept of social integration or social capital that facilitates population health [38,46] than do other dimensions. In contrast, evenness is a measure of the spatial distribution of minority groups; however, the evenness dimension may be less relevant to health or social outcomes than clustering, as high clustering indicates a racial/ethnic enclave, where the life chances may be better or worse than in other types of communities [37]. Similarly, higher centralization has been found to increase the odds of an area being a primary care physician shortage area, particularly for blacks [32], indicating that living in a highly centralized area would lead to a lack of access to primary care physicians. Given these potential differences across dimensions, it is important to examine the relationship between health and the dimensions of segregation.

Using Spatial Structure to Address Spatial Dependence
In this study, mortality and segregation are both measured for ecological units (i.e., counties), and the spatial relationships between these units need to be explicitly incorporated into an analytic strategy if we are to obtain unbiased estimates of the relationships that exist between them [47,48]. Several demographers have explicitly incorporated a spatial perspective into mortality research based on a spatial econometric framework and concluded that applying a spatial perspective to county-level data would generate better model fits and more accurate predictions than models without a spatial component [8,9,14]. Arguably, spatial econometric methods are the most popular in spatial demography, but they can only provide an overall assessment of how much spatial structure matters. A common criticism of the spatial econometrics approach is that researchers do not know how spatial structure matters. Griffith [49,50] and Tiefelsdorf and Griffith [51] have proposed spatial filtering methods that use eigenfunctions to create a series of spatial patterns that are mutually unrelated but associated with the spatial structure underlying the spatial units. A recent study by Thayn and Simanis [52] suggested that the spatial filtering approach effectively minimizes spatial misspecification errors, improves model fit, and eliminates spatial dependence. In addition, the unrelated spatial patterns can be visualized (i.e., mapped) to gain further insight into how spatial structure contributes to the analysis and potentially shed new light on omitted variable bias [49]. Indeed, perhaps the most distinctive feature of the spatial filtering approach lies in the decomposition of errors, which allows for the visualization of unknown spatial processes that affect the spatial pattern of the dependent variable (i.e., mortality). To our knowledge, our study is the first to use a spatial filtering approach to examine the mortality pattern across US counties.
Drawing from the discussion above, we propose two substantive research hypotheses and one derived from our use of spatial filtering. The main substantive hypotheses are:
1. White/black segregation is positively related to mortality as the segregation process (ethnic stratification) is rooted in discrimination, and the segregation between whites and non-black minority groups is beneficial to mortality as the segregation processes lead to the formation of enclaves and/or communities.
2. Not all five dimensions of segregation matter, and the exposure dimension is the most relevant dimension as it captures the mechanisms linking segregation and mortality by assessing the level of interaction between the minority and the dominant racial/ethnic group.
We also expect that spatial dependence will bias the estimates of the relationships between segregation and mortality, and as such a spatial filtering approach can help refine our model specification and be used to identify the spatial patterns. We will examine these issues focusing on five dimensions of segregation.

Data and Measures
The county-level mortality rate is the dependent variable of this study. Based on the Compressed Mortality Files maintained by the National Center for Health Statistics [53], we created the 2004-2008 five-year average mortality rates that are standardized to the 2006 US age-sex population structure. Using the five-year average rates minimizes the fluctuations across years, and this approach has been used in recent ecological mortality studies [54,55]. As race/ethnic segregation plays a crucial role in this study, rather than standardizing mortality rates with respect to racial groups, we include the race/ethnic composition of a county in the analysis. While segregation is the key independent variable of this study, we further control for six factors that have been commonly included in ecological mortality research, namely metropolitan status, socioeconomic status, racial composition, income inequality, social capital, and population health. Segregation and each of the other six factors are measured with different variables; we discuss the operationalization of our analytical variables in detail below.

Segregation
We measured all five dimensions of racial segregation. Specifically, the "evenness" and "exposure" dimensions are measured with the entropy index [56] and the isolation index, respectively. The entropy index assesses the average deviation of the sub-unit (i.e., tract in this study) from the county's racial diversity, whereas the isolation index measures "the extent to which minority members are exposed only to one another (p. 288)" [11]. The entropy and isolation indices both range between 0 and 1, and higher values suggest higher segregation. The "centralization" dimension is captured with the absolute centralization index, which was developed to assess whether the minority group is distributed around the center of a county. The absolute centralization index varies between -1 and 1, where a positive value indicates that minorities tend to live nearer the center of a county, whereas negative values suggest that minority populations live in the outlying areas [11]. The "concentration" dimension is based on the delta index [57], which assesses the proportion of minority members who live in areas where the minority density is higher than average and ranges between 0 (no concentration) and 1. The delta index represents the proportion of a minority group that would have to move to reach a uniform density within an area [11]. The fifth and final dimension, "clustering," is measured by the spatial proximity index [58], which is the average of proximities within the minority and majority group, respectively. A spatial proximity index greater than 1 suggests that minority members live close to one another and so do the majority. When a spatial proximity index is less than 1, it means that majority and minority members live closer to each other than to the members of their own groups.
We calculated the five segregation measures for the following three race/ethnicity combinations: non-Hispanic whites vs. non-Hispanic blacks (white/black), non-Hispanic whites vs. Hispanics (white/Hispanic), and non-Hispanic whites vs. Asians/Pacific Islanders (white/API). We only focused on the three largest minority groups in order to avoid unreliable segregation measures due to small-number issues in many counties. The statistical procedures developed by Iceland and colleagues [35] were applied to the 2010 Census Summary File 1 race/ethnicity data and we aggregated tract data into counties to obtain the fifteen segregation measures for all US counties.
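To make two of these measures concrete, the following is a minimal sketch of the isolation index and the delta index, assuming the standard Massey and Denton [11] definitions (isolation as the minority-weighted average of each tract's minority share; delta as half the summed absolute difference between each tract's share of the minority population and its share of land area). The function names and the four-tract county are hypothetical and are not drawn from the data used in this study.

```python
import numpy as np

def isolation_index(minority, total):
    """Isolation index: minority-weighted mean of each tract's minority share."""
    minority = np.asarray(minority, dtype=float)
    total = np.asarray(total, dtype=float)
    return float(np.sum((minority / minority.sum()) * (minority / total)))

def delta_index(minority, area):
    """Delta index: half the sum of absolute gaps between each tract's share
    of the minority population and its share of the county's land area."""
    minority = np.asarray(minority, dtype=float)
    area = np.asarray(area, dtype=float)
    gaps = np.abs(minority / minority.sum() - area / area.sum())
    return float(0.5 * gaps.sum())

# Hypothetical county with four equal-area tracts of 100 residents each;
# all minority residents live in the first two tracts.
minority = [50, 50, 0, 0]
total = [100, 100, 100, 100]
area = [1.0, 1.0, 1.0, 1.0]

iso = isolation_index(minority, total)   # 0.5
delta = delta_index(minority, area)      # 0.5
print(f"isolation = {iso:.2f}, delta = {delta:.2f}")
```

In this toy county, half of the minority population would have to move to reach a uniform density, which is what a delta of 0.5 expresses.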

Metropolitan
The rural-urban mortality differential in US counties has been documented and metropolitan counties have been found to have higher mortality rates than their rural counterparts [7]. Taking other socioeconomic covariates, such as poverty and educational attainment, into account does not fully explain the geographic mortality differential and thus it is important to consider metropolitan status in this study. We employed the metropolitan status developed by the US Office of Management and Budget in 2010 to dichotomize US counties into metropolitan and nonmetropolitan counties. While the heterogeneity within each group could be great, a recent study [9] reported that the conclusions based on the metropolitan-nonmetropolitan dichotomy were similar to those drawn from a finer metropolitan measure (i.e., the rural-urban continuum codes). That is, the dichotomous metropolitan status provides modeling parsimony to this study.

Socioeconomic status
As discussed previously, socioeconomic status is an important factor for mortality. We used the 2005-2009 American Community Survey (ACS) to obtain a set of social indicators and applied principal component analysis to reduce the total number of variables. This variable reduction approach is comparable with that proposed by Sampson and colleagues [59]. Specifically, the following four indicators loaded on the concept of social affluence (factor loadings in parentheses): the log of income per capita (0.72), the percentage of population aged 25 or over with at least a bachelor's degree (0.91), the percentage of population working in professional, administrative, and managerial positions (0.87), and the percentage of families with annual incomes higher than $75,000 (0.87). Similarly, the concept of social disadvantage was derived from three indicators: the poverty rate (0.72), the percentage of population receiving public assistance (0.71), and the percentage of female-headed households with children below 18 (0.81). These two concepts, social affluence and social disadvantage, account for more than 70 percent of the variation among these seven indicators. We used the regression approach to generate the factor scores included in the analysis.

Racial composition
As the dependent variable, county-level mortality rate, was not standardized with the race/ethnicity structure, we included racial composition variables, namely the proportion of blacks, the proportion of Hispanics, and the proportion of other races. It should be noted that racial composition is highly correlated with the five segregation measures. For example, the proportion of blacks is strongly associated with the white/black isolation index (Pearson's r = 0.90). To avoid multicollinearity, we excluded the racial composition variable from models that include the same group-specific minority segregation measures (e.g., excluding the proportion of blacks in the model with all five white/black segregation indices). Furthermore, even when the proportion of Asians or Pacific Islanders was separated from the proportion of other races and included in the analysis, its association with mortality was not statistically significant (due to very small proportions across counties) and our conclusions were not altered. For the purpose of modeling parsimony, we present only the results using the proportion of other races.

Income inequality
The relationship between income inequality and health has drawn much attention in the literature [60] and income inequality is closely related to racial segregation [61]. To understand whether the association between segregation and mortality is independent of income inequality, we used the 2005-2009 ACS household income data to calculate the Gini coefficient and included it in the analysis to control for the level of income inequality in a county. The Gini coefficient ranges between 0 and 1 and a larger Gini coefficient indicates a higher level of income inequality. As the top-coded category in the ACS income data is an open category ($200,000 or above), the income inequality measure may be underestimated (a common drawback when calculating the Gini coefficient with grouped, instead of individual, income data).
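As an illustration of how a Gini coefficient can be obtained from grouped income data, the sketch below builds a piecewise-linear Lorenz curve from bracket counts and assumed bracket mean incomes. It treats every household in a bracket as earning the bracket mean, which ignores within-bracket inequality and therefore tends to understate the coefficient, in line with the caveat above. The bracket counts, the bracket means, and in particular the mean assigned to the open-ended top bracket are hypothetical; the exact procedure used in this study is not specified here.

```python
import numpy as np

def grouped_gini(counts, bracket_means):
    """Gini coefficient from grouped data via a piecewise-linear Lorenz curve.

    counts        -- number of households in each income bracket
    bracket_means -- assumed mean income of each bracket (an assumption,
                     especially for an open-ended top bracket)
    """
    counts = np.asarray(counts, dtype=float)
    means = np.asarray(bracket_means, dtype=float)
    order = np.argsort(means)                 # poorest bracket first
    counts, means = counts[order], means[order]

    pop_share = np.concatenate(([0.0], np.cumsum(counts) / counts.sum()))
    income = counts * means
    inc_share = np.concatenate(([0.0], np.cumsum(income) / income.sum()))

    # Area under the Lorenz curve by the trapezoid rule.
    widths = pop_share[1:] - pop_share[:-1]
    heights = (inc_share[1:] + inc_share[:-1]) / 2.0
    return float(1.0 - 2.0 * np.sum(widths * heights))

# Hypothetical county: four brackets, the last standing in for "$200,000 or above".
counts = [200, 500, 250, 50]
bracket_means = [15_000.0, 45_000.0, 110_000.0, 300_000.0]
gini = grouped_gini(counts, bracket_means)
print(f"grouped Gini = {gini:.3f}")
```

Raising the assumed mean of the top bracket raises the estimate, which is one way to gauge how sensitive a county's value is to the top-coding.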

Social capital
As for social capital, we adopted the social capital index developed by Rupasingha et al. [62], which measures county-level social capital based on Putnam [63]. Four indicators were used to assess the strength of social capital in a county: the number of associations (e.g., sports clubs) per 10,000 population, the number of non-profit organizations per 10,000 population, the mail response rate for the decennial census, and the presidential election voting rate. Using principal factor analysis, Rupasingha and Goetz [64] calculated the 2005 social capital index (the latest available) for US counties, and a larger social capital index suggests stronger social capital in a county. We used the 2005 social capital index in the analysis.

Population health
As discussed previously, the county-level mortality rate is a consequence of overall population health in a county, and including population health covariates helps us to better clarify the segregation-mortality association. Somewhat surprisingly, few previous studies explicitly include the population health measures of a county in the analysis. To fill this gap, we obtained two reliable population health measures from the University of Wisconsin Population Health Institute [65]: average unhealthy days per month of the population in a county and the adult obesity rate. The unhealthy days include both mentally and physically unhealthy days based on residents' answers to the question of "how many days during the past 30 days was your physical and mental health not good." The adult obesity rate indicates the percentage of adults with a body mass index greater than 30. Both measures were originally developed by the Centers for Disease Control and Prevention (CDC) and have been included in the Behavioral Risk Factor Surveillance Surveys conducted and maintained by CDC. The reliability and validity of these measures have been examined by CDC and the University of Wisconsin Population Health Institute. The methodologies used to obtain these county-level health measures can be found elsewhere [66][67][68].

Methodology: Spatial filtering
Mortality rates are not evenly distributed across the US [3] and importantly these patterns indicate strong spatial dependence. Based on a spatial filtering approach, the spatial pattern of mortality can be decomposed into three parts: (1) a spatial trend that can be explained by a set of independent variables related to mortality, (2) a spatial process that can only be captured by factors that are not included as independent variables, and (3) the random disturbances.
The eigenvector spatial filtering approach aims to extract distinctive spatial patterns that are not only associated with the spatial process in (2) but also account for the spatial autocorrelation in the dependent variable (i.e., mortality). This eigenvector spatial filtering approach can be adopted by most classical regression models, such as the ordinary least squares (OLS) and logistic regression; the estimates of the spatially filtered models would be unbiased and the interpretations of the estimates remain the same [51].
Since the dependent variable of this study is continuous, we discuss the eigenvector spatial filtering approach under the OLS framework. The basic OLS regression model for mortality can be expressed as y = βX + ε Ã , where y is a vector of mortality rates, β represents the parameters associated with a set of independent variables, X, and ε Ã are spatially autocorrelated errors. The eigenvector spatial filtering approach further decomposes ε Ã into Eγ + ε, where E represents a set of unspecified factors that are related to the spatial autocorrelation of mortality, γ is a set of estimates for E (i.e., the relationships with mortality), and ε denotes the random disturbances. In empirical research, E is often, if not always, unknown. However, as the goal of spatial filtering is to account for spatial autocorrelation in the dependent variable, E can be created based on Moran's I [49,51], a commonly used measure of global spatial autocorrelation [69].
To obtain E, we create a set of dummy variables (B) based on the equation below: where I is an n-by-n identity matrix and 1 is a vector of length n containing ones. The superscripted T indicates a transposed matrix and n is the total number of observations. We can use B to transform the spatial weight matrix (C) underlying the spatial data: where O is the transformed matrix and C is the spatial weight matrix based on the spatial relationships (defined by adjacency or distance) among units. The fact that the eigenvectors of O are orthogonal (i.e., uncorrelated) is the reason why C has to be transformed. The orthogonal eigenvectors indicates the unique spatial patterns filtered from the spatially autocorrelated errors. The Moran's I value for each eigenvector given a specific spatial weight matrix (C) can be expressed as a function of the eigenvalues of O [49, 50]: Eq (3) suggests that the Moran's I values can be computed for any numerical values of a dependent variable (e.g., mortality) in a data set with n observations. It should also be noted that the first eigenvector (denoted as E 1 ) will have the largest Moran's I value given the spatial structure C and the second eigenvector will be a set of numbers that will make Moran's I statistic largest, yet smaller, than the Moran's I of the first eigenvector. Similarly, the third eigenvector includes the real numbers that generate the largest Moran's I value that is smaller than the Moran's I of the second eigenvector. In this fashion, spatial dependence, as measured by the Moran's I, decreases as the order of eigenvector increases [49].
In order to tie the spatial filtering procedures above to the OLS regression, one needs to incorporate a set of independent variables (X) into Eq (1) as follows:
$$M = I - X(X^{T}X)^{-1}X^{T} \qquad (4)$$
where X is a matrix containing the independent variables, which is the same X included in the basic OLS regression. Multiplying M by y results in the vector ε*, and the eigenvectors of the transformed matrix (MCM) are hence derived from and are orthogonal to the independent variables (X). This MCM matrix can still be applied to Eq (3) and the spatial filtering procedures can be implemented. Importantly, the selected eigenvectors based on Moran's I can be added to the OLS model as supplementary covariates that mainly account for spatial autocorrelation in the dependent variable [50,51,70]. In other words, each eigenvector included in a spatial filtering model can be understood as an explanatory variable that is unrelated to the other independent variables and accounts for the variation in the dependent variable (i.e., mortality).
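A short sketch may make the role of M concrete. It assumes the usual OLS residual-maker form M = I − X(XᵀX)⁻¹Xᵀ and uses simulated data; the line-shaped neighbor structure, the coefficients, and the variable names are hypothetical. It checks that My reproduces the OLS residuals and that the leading eigenvectors of MCM are orthogonal to X.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30

# Hypothetical spatial structure: units along a line, each adjacent to the next.
C = np.zeros((n, n))
idx = np.arange(n - 1)
C[idx, idx + 1] = 1.0
C[idx + 1, idx] = 1.0

# Simulated design matrix (intercept plus two covariates) and outcome.
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)

# Residual-maker matrix: projects onto the space orthogonal to the columns of X.
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

# M y should equal the residuals from an ordinary least-squares fit.
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
ols_resid = y - X @ beta_hat

# Leading eigenvectors of MCM: candidate spatial filters.
eigvals, eigvecs = np.linalg.eigh(M @ C @ M)
E = eigvecs[:, np.argsort(eigvals)[::-1][:5]]

print("max |M y - OLS residual|:", np.abs(M @ y - ols_resid).max())
print("max |X'E|:", np.abs(X.T @ E).max())
```

Both printed quantities should be zero up to floating-point error, which is why adding such eigenvectors to the regression leaves them uncorrelated with the original covariates.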
Since the eigenvectors of MCM can be as many as n and all these eigenvectors are orthogonal to one another (i.e., mutually uncorrelated), the next stage of spatial filtering is to select a parsimonious subset of eigenvectors. The conventional approaches to eigenvector selection are either to set a Moran's I threshold for inclusion [50] or to apply stepwise regression to eigenvectors with positive Moran's I values [71]. However, both of these approaches use an iterative process, which can be computationally demanding. To address the computational demands, Tiefelsdorf and Griffith [51] proposed minimizing a Z-score objective function of the Moran's I of the residuals, and they found that this approach greatly facilitates the process of eigenvector selection and removes spatial autocorrelation. Furthermore, this approach guarantees that each eigenvector included in the regression is associated with the dependent variable.
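The idea of selecting eigenvectors by their effect on residual autocorrelation can be sketched as follows. This is a deliberately simplified stand-in, not the Tiefelsdorf and Griffith procedure: it greedily minimizes the absolute value of the residuals' Moran's I rather than its Z-score, and the grid, the tolerance `tol`, the simulated data, and the function names are all hypothetical.

```python
import numpy as np

def morans_i(r, C):
    """Global Moran's I of vector r under weight matrix C."""
    z = r - r.mean()
    return (len(r) / C.sum()) * (z @ C @ z) / (z @ z)

def select_eigenvectors(resid, cand, C, tol=0.01):
    """Greedy forward selection: repeatedly add the candidate eigenvector that
    most reduces |Moran's I| of the residuals (simplified objective)."""
    chosen = []
    current = abs(morans_i(resid, C))
    while current > tol and len(chosen) < cand.shape[1]:
        trials = {}
        for k in range(cand.shape[1]):
            if k not in chosen:
                e = cand[:, k]
                # Candidates are orthonormal and orthogonal to X, so removing
                # the projection on e gives the residuals after adding e.
                trials[k] = abs(morans_i(resid - e * (e @ resid), C))
        best = min(trials, key=trials.get)
        if trials[best] >= current:
            break  # no candidate improves the objective
        e = cand[:, best]
        resid = resid - e * (e @ resid)
        chosen.append(best)
        current = trials[best]
    return chosen, resid

# Hypothetical 7 x 7 grid with Queen contiguity (Chebyshev distance of 1).
coords = np.array([(r, c) for r in range(7) for c in range(7)])
C = (np.abs(coords[:, None, :] - coords[None, :, :]).max(axis=2) == 1).astype(float)
n = len(coords)

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
vals, vecs = np.linalg.eigh(M @ C @ M)
cand = vecs[:, np.argsort(vals)[::-1][:10]]   # ten leading candidate eigenvectors

# Simulated outcome with a planted spatial pattern (the first candidate).
y = X @ np.array([8.0, 0.5]) + 10.0 * cand[:, 0] + rng.normal(scale=0.5, size=n)

ols_resid = M @ y
chosen, filtered_resid = select_eigenvectors(ols_resid, cand, C)
before, after = morans_i(ols_resid, C), morans_i(filtered_resid, C)
print("selected candidates:", chosen)
print(f"residual Moran's I before: {before:.3f}, after: {after:.3f}")
```

In this toy setting the planted eigenvector should be among those selected, and the residual Moran's I should move toward zero once it is included.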
In this study, we use the Tiefelsdorf and Griffith [51] approach; details of how to implement this approach can be found in Chun and Griffith [72]. The spatial weight matrix was constructed based on the first order Queen specification (i.e., neighbors are defined as those counties that share the same boundary or a vertex).
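The Queen specification can be illustrated on a regular grid, where it is easy to verify by hand. The sketch below is an assumption-laden toy (square cells instead of county polygons, a hypothetical helper named `grid_neighbors`) that contrasts Queen neighbors, which share an edge or a vertex, with Rook neighbors, which share an edge only.

```python
def grid_neighbors(nrow, ncol, rule="queen"):
    """Return {cell: [neighbor cells]} for a regular grid.

    rule="queen": neighbors share an edge or a vertex.
    rule="rook":  neighbors share an edge only.
    """
    neighbors = {}
    for r in range(nrow):
        for c in range(ncol):
            cell = r * ncol + c
            neighbors[cell] = []
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # a cell is not its own neighbor
                    if rule == "rook" and dr != 0 and dc != 0:
                        continue  # skip diagonal (vertex-only) contacts
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < nrow and 0 <= cc < ncol:
                        neighbors[cell].append(rr * ncol + cc)
    return neighbors

queen = grid_neighbors(3, 3, "queen")
rook = grid_neighbors(3, 3, "rook")
print("center cell:", len(queen[4]), "Queen neighbors,", len(rook[4]), "Rook neighbors")
print("corner cell:", len(queen[0]), "Queen neighbors,", len(rook[0]), "Rook neighbors")
```

On a 3 × 3 grid the center cell has eight Queen neighbors but only four Rook neighbors, which shows how the Queen rule yields a denser spatial weight matrix.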
The analytic strategy has several steps. First, we examined descriptive statistics and Moran's I tests for measuring spatial autocorrelation. Second, we estimated both OLS and spatial filtering regression models. This second step helps us to better understand whether individual dimensions of segregation are associated with mortality. In the third step, we focused on each dimension of segregation by including the same segregation dimension measures for three race/ethnic groups in one regression model, e.g., simultaneously considering white/black, white/Hispanic, and white/API entropy indices. The models were estimated with both OLS and spatial filtering approaches. Following Tiefelsdorf and Griffith [51], we only consider the eigenvectors that are statistically significant and help to explain the spatial variation in mortality. We used R for all statistical analyses in this study [73].

Results
The descriptive statistics and Moran's I values are shown in Table 1. The average age-sex standardized mortality rate in US counties was 8.9 deaths per 1,000 population between 2004 and 2008, which is comparable with the number in a recent report [74]. The Moran's I of mortality was 0.55, suggesting that counties with similar mortality rates tend to cluster together and that the strength of spatial autocorrelation is moderate. As for the segregation measures, two findings are notable. First, regardless of dimension, white/black segregation measures are, on average, higher than white/Hispanic and white/API segregation indices. Second, all five dimensions of segregation were found to be spatially autocorrelated. Based on Moran's I, the exposure and centralization dimensions have the strongest and weakest levels of global autocorrelation, respectively. This pattern is consistent across the three race/ethnic groups. We also computed the Pearson's correlation coefficients between mortality and all fifteen segregation measures (Table A in S1 File).
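To give a sense of how Moran's I behaves, the following toy example computes it for two artificial patterns. It is only a sketch: it uses a hypothetical 4 × 4 grid with Rook (edge-sharing) neighbors for simplicity, not the first order Queen county weights used in the study, and the helper names are illustrative.

```python
def rook_pairs(nrow, ncol):
    """Directed neighbor pairs (i, j) for grid cells that share an edge."""
    pairs = []
    for r in range(nrow):
        for c in range(ncol):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrow and 0 <= cc < ncol:
                    pairs.append((r * ncol + c, rr * ncol + cc))
    return pairs

def morans_i(values, pairs):
    """Global Moran's I with binary weights given as directed neighbor pairs."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    cross = sum(z[i] * z[j] for i, j in pairs)       # sum of w_ij * z_i * z_j
    return (n / len(pairs)) * cross / sum(d * d for d in z)

pairs = rook_pairs(4, 4)
# Clustered pattern: high values in the two left columns, low values on the right.
clustered = [1.0 if c < 2 else 0.0 for r in range(4) for c in range(4)]
# Checkerboard pattern: every cell differs from all of its edge neighbors.
checker = [float((r + c) % 2) for r in range(4) for c in range(4)]

i_clustered = morans_i(clustered, pairs)
i_checker = morans_i(checker, pairs)
print(f"clustered: {i_clustered:.2f}")   # prints 0.67
print(f"checkerboard: {i_checker:.2f}")  # prints -1.00
```

Positive values indicate that similar values sit next to each other, as with the clustered pattern, while the checkerboard represents the opposite extreme; a value of 0.55 for county mortality falls on the clustered side of this range.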
With respect to other covariates, roughly 35 percent of counties were metropolitan counties and overall (and on average), a county had approximately 9 percent of non-Hispanic blacks, 8 percent of Hispanics, and 4 percent of other minorities. The average Gini coefficient in our data, 0.43, closely matched the national level of income inequality [75]. Every month, the population in a county reported approximately 7 unhealthy days and the average adult obesity rate was almost 30 percent. All of the independent variables have moderate to strong spatial autocorrelations based on a Queen first order spatial weights matrix (see Moran's I in Table 1). The spatial structure evident in our dependent and independent variables implies the need for model specifications that are explicitly spatial.
Table 2 presents the OLS and spatial filtering regression results by race/ethnicity groups. There are several important findings. First, among the white/black segregation measures, only the isolation index (the exposure dimension) was found to be positively related to mortality. While this association holds in both OLS and spatial filtering models, the estimated association between isolation and mortality decreased by more than 30 percent between the OLS and the spatial filtering model ((1.051-0.709)/1.051 = 0.33). Second, as we argued and expected, the white/Hispanic and white/API segregation measures are negatively associated with mortality, though the evidence for this is based only on the isolation index. Two segregation measures, the white/Hispanic spatial proximity index and white/API absolute centralization, follow our expectations in the OLS models, but in the spatial filtering models they are no longer significant (see spatial filtering models in Table 2).
That is, their associations with mortality in the OLS models are confounded with the eigenvectors (i.e., omitted variables), as taking the eigenvectors identified by the spatial filtering approach into account fully explained the association. Third, we visualized the spatial distributions of the three significant isolation indices and mortality rates by quintiles in Fig 1. The white/black isolation index and mortality rates share a similar pattern where the Black Belt, Mississippi Delta, and eastern Texas have both high mortality rates and white/black isolation. By contrast, white/Hispanic and white/API isolation indices are higher along the US/Mexico border and Pacific coast, in which mortality rates are relatively low. Fig 1, to some extent, provides an explanation for why the associations of the exposure dimension of segregation with mortality differ by race/ethnic groups. Even considering other potential explanatory variables in the models, our results offered support to the bivariate visual comparison in Fig 1. The spatial filtering approach identified more than 50 eigenvectors and improved the adjusted R-squared by approximately 20 percent from the OLS models for each race/ethnicity group. After examining these eigenvectors (results not shown but available upon request), a fourth observation from our analysis is that the first four most important eigenvectors were the same across race/ethnic groups and they are eigenvectors 1, 6, 15, and 19. Following Griffith [49], we visualized these shared eigenvectors (based on quintiles) to gain a better understanding of what their spatial patterns are. As shown in Fig 2, Fig A in S2 File). As discussed in the method section, the four eigenvectors are independent of one another and each represents a variable at the county level not considered in the analysis.
Note that in the spatial filtering models, these eigenvectors were already included in the analysis and could be regarded as additional independent variables that represent the covariates omitted in regression but associated with mortality (e.g., migration trends). Their estimated relationships with mortality are reported in Table 2. Except for eigenvector 1, all other eigenvectors are positively associated with mortality and the parameter estimates are stable across models. The estimates for these eigenvectors are relatively large because the elements of each eigenvector are small (i.e., decomposition of errors).
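One way to see why these estimates are large is sketched below under two assumptions that the text does not state explicitly: each eigenvector e_k is scaled to unit length, and it is orthogonal to X and to the other selected eigenvectors.

```latex
e_k^{T} e_k = \sum_{i=1}^{n} e_{ik}^{2} = 1
\quad\Longrightarrow\quad
\sqrt{\frac{1}{n}\sum_{i=1}^{n} e_{ik}^{2}} = \frac{1}{\sqrt{n}},
\qquad
\hat{\gamma}_k = \frac{e_k^{T} y}{e_k^{T} e_k} = e_k^{T} y .
```

Under these assumptions a typical element of e_k is of order 1/√n, which is very small when n is the number of US counties, so the estimate for e_k must be correspondingly large for its contribution to the fitted mortality rate to be of a meaningful size.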
Fifth, beyond the findings related to segregation, the associations of other independent variables with mortality in Table 2 echo recent findings in the mortality literature [8,76]. For instance, both OLS and spatial filtering results indicated that metropolitan counties have higher mortality rates than nonmetropolitan counterparts, the so-called rural paradox [9]. Socioeconomic status variables suggested that better socioeconomic environment is associated with lower mortality, as Link and Phelan [29] argued. Regarding racial composition, the proportion of Hispanic population was negatively related to mortality, whereas the presence of other minority groups increased mortality. It should be noted that these associations between race/ethnicity groups and mortality were found even after taking racial segregation into account.
Finally, the statistically significant associations for both income inequality and social capital index with mortality observed in the OLS models were eliminated in the spatial filtering models. That is, once spatial structure is taken into account, the relevance of social capital and income inequality to mortality is reduced; as such, our findings contribute to the ongoing debates on these topics [60,77].
Following our analytic strategy, we also implemented analyses by segregation dimensions and the results were summarized in Tables 3 and 4. Since the main interest of this study is segregation (and the findings related to other independent variables, such as metropolitan status and income inequality, were similar to those in Table 2), we focus our discussions on segregation and spatial filtering results. Again, there are several notable findings. First and foremost, the estimated relationships between the white/black segregation measures and mortality were all positive and they were statistically significant in four of the five segregation dimensions (except clustering). By contrast, the associations of white/API segregation measures with mortality were negative, with a statistically significant association for evenness, exposure, and clustering dimensions. While the white/Hispanic entropy index was marginally significant and negatively related to mortality, overall, white/Hispanic segregation does not affect all-cause mortality (in models including other covariates). Second, the OLS models seemed to overestimate the importance of segregation, such as the findings in the evenness and clustering dimensions. The spatial structure underlying the data contributes to this overestimation as the spatial filtering models generated weaker relationships between segregation measures and mortality, and improved the adjusted R-squared.
Third, the total number of eigenvectors found in each model is comparable across segregation dimensions and, among them, six were commonly shared by the five segregation dimensions, i.e., eigenvectors 15, 19, 1, 6, 21, and 17. Compared with the findings in Table 2, two additional eigenvectors (21 and 17) were identified. We would like to reiterate that the six eigenvectors capture the spatial processes that are not associated with the independent variables in the models but they contribute to the observed spatial pattern of mortality in the US. Last, as the spatial filtering approach aims to remove spatial autocorrelation in the dependent variable, we conducted Moran's I tests to assess if the residuals of these models are still spatially autocorrelated. Tables 2 and 3 indicated that spatial filtering effectively reduces spatial autocorrelation by approximately 85 percent from OLS models. The residuals' Moran's I values in spatial filtering models are all very close to zero (i.e., no spatial autocorrelation), although they remained statistically significant. The explanation for the statistical significance is that our eigenvector sets are optimal (based on statistical significance), rather than exhaustive [72]. When we included those eigenvectors with a p-value between 0.05 and 0.1, the residuals' Moran's I became non-significant (results not shown but available). Since adding more eigenvectors does not change our findings, we believe the models presented in Tables 2 and 3 are the most parsimonious.

Discussion and Conclusions
We used the findings above to examine the research hypotheses. Following the ethnic stratification perspective, we first hypothesized that white/black segregation is positively related to mortality. When taking all five segregation dimensions into account (Table 2), only the exposure dimension was found significantly and positively related to mortality. Nonetheless, the dimension-specific analyses (Tables 3 and 4) offered stronger evidence to support our hypothesis as four out of the five white/black segregation measures followed our expectation. In contrast to the literature [14], we have provided further evidence to suggest that most dimensions of white/black segregation would increase mortality even after adjusting for other covariates.
In addition, we also expected that the segregation between whites and non-black minority groups would be negatively related to mortality (i.e., beneficial) and that exposure dimension may be the most relevant determinant among the five dimensions. Our findings provided some support to our hypotheses. Specifically, the results in Table 2 suggested that higher levels of isolation between non-Hispanic whites and these two minority groups are associated with lower levels of mortality in US counties. Though white/Hispanic spatial proximity index (clustering dimension) and white/API absolute centralization index (centralization dimension) in the OLS models followed our expectations, these findings did not hold when the spatial structure underlying the data was considered. When considering one dimension of segregation at a time, we obtained stronger evidence for white/API than white/Hispanic segregation measures. Specifically, three of the five white/API dimensions of segregation were negatively related to mortality, whereas only the white/Hispanic entropy index was marginally significant. Overall, we received relatively weak support (in contrast to white/black segregation) for the argument that living in ethnic enclaves or communities may be beneficial for non-black minorities. Coupled with the recent findings by Dinwiddie and colleagues [33], we suggested that more efforts should be made to further understand whether the segregation between whites and non-black minorities can benefit population health.
The third hypothesis indicated that the spatial autocorrelation affects the estimates of the relationships between segregation and mortality and spatial filtering approach would identify the spatial patterns that are not only related to county-level mortality but also shared by various segregation dimensions. This hypothesis was confirmed as the OLS models tend to overestimate the importance of segregation (see both Tables 2 and 3). This pattern has been reported by several studies using a spatial perspective [8,14,48]. This study reinforces the suggestion that when analyzing ecological data a spatial analytic approach should be employed. Furthermore, the race/ethnicity-specific models shared four eigenvectors and the dimension-specific analyses identified two additional eigenvectors. The six eigenvectors have distinctive spatial patterns and each of them represents a dimension not captured by our proposed independent variables. While it is not clear what these covariates may be, they provide scientific insights into future mortality studies as researchers could explore what factors correspond to these spatial patterns [49]. For example, Eigenvector 1 in Fig 2 suggests the combination of the edge effect surrounding the US boundaries and the potential association of Mainline Protestants with mortality in the US (see the spatial patterns in Fig 2 and Fig A in S2 File). We would like to emphasize that these eigenvectors were already included in the spatial filtering models so it is not necessary to include new independent variables into the models, which is the advantage of applying the spatial filtering in empirical research [72].
Overall, we believe that our hypotheses received some support, especially from the spatial filtering models. This study includes several noteworthy findings. First, the race/ethnicity-specific analyses suggested that the relationship of the isolation index with mortality is the most consistent among the segregation measures, which suggests that exposure to non-Hispanic whites may be the most important dimension of segregation. As defined by Massey and Denton [11], the exposure dimension refers to the extent to which minority and majority group members interact within a given area and the isolation index captures the level of segregation experienced by minorities. This definition fits the ethnic stratification and ethnic community/enclave perspectives and the opposite associations between non-Hispanic blacks and other minority groups were expected. Furthermore, the evenness and exposure dimensions of segregation were more closely related to mortality than the other three dimensions (Tables 3 and 4), which is consistent with Kramer and Hogue's [13] emphasis on evenness and exposure dimensions of segregation in the literature on segregation and health.
Second, the relationships between white/Hispanic segregation and mortality were weakly supported by our analytic results. Though we suspected that white/Hispanic segregation measures may be highly correlated with racial composition or other independent variables, such as social capital and income inequality, our sensitivity analyses (not shown) where these variables were excluded did not support this explanation. Thus, the possible explanation would be that white/Hispanic segregation measures are associated with the spatial processes that are not included or captured by our models.
Our findings related to income inequality and social capital directly speak to but do not resolve some debates in ecological mortality research. Specifically, in Tables 2 and 3, we found that income inequality and social capital became non-significant determinants of mortality when using a spatial filtering approach, which contradicts several recent studies using other analytical techniques, such as spatial econometrics and weighted least squares modeling [7,9,55,78]. One plausible explanation for the discrepancy is that inequality and social capital both have strong spatial patterns and when spatial filtering is used, these spatial patterns may be explained by the eigenvectors generated by the spatial filtering method. In other words, there may be some unknown covariates that confound the relationships between mortality and inequality and social capital. As the strength of spatial filtering is to identify potential covariates omitted in the analysis, our findings somewhat challenge the literature that income inequality and social capital matter for mortality. A systematic examination on why different spatial methods yield different conclusions on this topic is necessary. These intertwined substantive and methodological issues are beyond the scope of this current study but we recognize exploring this would be a fruitful future direction.
This study contributes to the mortality literature in the following ways. Only a few mortality studies have employed segregation as an explanation for mortality differentials across the US counties or cities [14,24] and even fewer have focused on the segregation between whites and non-black minority groups, as well as the five dimensions of segregation. Using the ethnic stratification and ethnic community/enclave perspective, we argued that white/black segregation is detrimental to overall mortality but white/Hispanic and white/API segregation are beneficial. It should be emphasized that the empirical support (and findings) for our arguments was obtained even after controlling for county-level population health measures, which strengthens our conclusions. In addition, the spatial filtering approach identified several common eigenvectors that demonstrate unique spatial patterns in US counties. These eigenvectors represent missing variables that have implications for mortality but cannot be captured with any of the independent variables in our models. That said, mortality researchers should look beyond conventional covariates to find determinants of geographic mortality differentials, and these unique patterns offer some clues.
We included several health infrastructure variables, such as numbers of medical doctors or hospital beds per 1,000 population, into the analysis to see if they account for the relationship between segregation and mortality. Similar to the findings reported by Kindig and Cheng [43], we did not find significant associations between county-level health infrastructure and mortality (not shown but available upon request). Our conclusions related to segregation and mortality are not altered, suggesting that our findings are robust and consistent. Moreover, the spatial dependence embedded in our data (both mortality and independent variables) is confirmed even using different types of spatial weight matrix, such as first order Rook or second order Queen. As there is no agreement on which spatial weight matrix should be used, this study follows previous literature [9,14,48] to report the findings based on the first order Queen approach.
There are several limitations specific to this study. First, the ecological relationships between segregation measures and mortality cannot be generalized to individuals, and we would urge health researchers to investigate how the health of non-black minorities is affected by segregation from non-Hispanic whites. Second, this study combined multiple data sources from different time points to explore the associations between the independent and dependent variables. No causality can be derived from our analyses and the temporal misalignment should be noted. Third, like other ecological studies, our analysis is subject to the modifiable areal unit problem [79] as changing the unit of analysis (e.g., using tracts) may alter the findings and conclusions. Fourth, somewhat related to the previous limitation, this study used the NCHS county-level vital statistics to study the persistent US mortality spatial pattern. While some state governments offer vital statistics at the tract level, these data do not allow a nationwide study. Future efforts are warranted to investigate the segregation-mortality association at other geographic levels.
Fifth, this study used all-cause age-sex standardized mortality as the dependent variable as it is an overall evaluation of population health in a county. While race-specific mortality rates can be calculated, we encountered the small area/population estimation issues [80] as numerous counties have zero deaths for non-black minorities. The NCHS also suggested that mortality rates with fewer than 20 age-specific deaths should be suppressed for statistical reliability, which further complicates the small area issue. This problem will also make it difficult to compute the racial/ethnic mortality difference within a county. A related issue is that we did not exclude counties with relatively few minority residents (e.g., fewer than 1,000), and thus counties with less stable mortality rates remain in our analysis. We note that some segregation measures such as the isolation index, our measure of exposure, may be more sensitive to the size of minority groups. While the approach we adopt is commonly used [14,35], our findings may be subject to underestimated segregation indices [35]. A potential solution is to concentrate on one or two states with large Hispanic populations, such as California or Arizona, and examine the relationships between age-race-specific mortality rates and segregation (but the total number of counties would be low).
Finally, while our findings suggested that evenness and exposure dimensions are more important than others, future research may still need to investigate whether hypersegregation [81,82] is a more useful measure in health research. Some segregation measures, such as absolute centralization index, may work better in metropolitan counties than nonmetropolitan counties, which may be one reason why we did not find significant findings for these dimensions (though we note that most nonmetropolitan counties include urban places, including county seats). As one of our goals is to understand whether the relationship between segregation and mortality varies by individual dimension of segregation, the analysis using hypersegregation is beyond the scope of our study. Similarly, several scholars have developed spatial segregation indices [36,83,84]; however, these measures have not been commonly used in mortality research. Questions as to whether the choice of segregation measures matters remain underexplored [85].
Some future research directions can be drawn from this study. First, our discussion on ethnic enclave and community is relevant to the literature on immigrant health [86,87]. The Compressed Mortality Files used in this study do not include the information on nativity of the deceased, which prevents us from directly addressing this issue. Future studies should use other data sources to investigate the relationship among immigration, segregation, and health. Second, most mortality research has used the latest available data to explain the mortality differentials across social groups, and the long-term latency between disease onset and death has been overlooked [88]. That is, the mortality differentials observed today may be the result of the socioeconomic or environmental factors in existence decades prior rather than those measured concurrently with mortality. Addressing the latency issue may better clarify the causality between the persistent mortality pattern and its determinants. Third, the segregation measures used in this study only concern two race/ethnic groups. The measures of multi-group segregation [89] and spatial segregation measures [36,83] should be employed to understand whether the choice and specification of segregation measure alters the results/conclusions. Finally, more attention should be paid to the mortality ratios between different race/ethnicity groups (e.g., white and black) and their associations with two-group racial/ethnic segregation measures. The mortality ratios measure racial/ethnic health disparities within an area and using these ratios as dependent variables would speak directly to the important question of whether segregation is associated with racial/ethnic disparities. Moreover, examining the patterns and trends in race-specific mortality ratios and racial/ethnic health disparities over time should be a research priority.
In sum, racial segregation is argued to be the major cause of health disparities [26] and a determinant of health outcomes [13]. Previous evidence has been drawn heavily from the non-Hispanic blacks. A growing body of literature has found that segregation may be beneficial to health outcomes or behaviors for non-black minorities, i.e., Hispanics and Asians/Pacific Islanders [90][91][92]. This study echoes the recent development in the literature and offers county-level evidence for the potential benefit of segregation for Hispanics and Asians/Pacific Islanders.

Supporting Information
S1 File. Pearson's Correlation Coefficients between Mortality and Segregation Indices (Table A). (DOCX)
S2 File. Spatial Distribution of Mainline Protestants in US counties, by Quintiles (Fig A).