Census tracts are often used to investigate area-based correlates of a variety of health outcomes. This approach has been shown to be valuable in understanding the ways that health is shaped by place and in designing appropriate interventions that account for community-level processes. Following this line of inquiry, it is common in the study of pedestrian injuries to aggregate the point-level locations of these injuries to the census tracts in which they occur. Such aggregation enables investigation of the relationships between a range of socioeconomic variables and areas of notably high or low incidence. This study reports on the spatial distribution of child pedestrian injuries in a mid-sized U.S. city over a three-year period. A combination of geospatial approaches (Near Analysis, Kernel Density Estimation, and Local Moran’s I) enables identification, visualization, and quantification of close proximity between incidents and tract boundaries. Specifically, results reveal that nearly half of the 100 incidents occur within roads that are also census tract boundaries. Results also uncover incidents that occur on tract boundaries, not merely near them. This geographic pattern raises the question of the utility of associating area-based census data from any one tract with the injuries occurring in these border zones. Furthermore, when a standard spatial join technique in a Geographic Information System (GIS) is used, these points located on the border are counted as falling into census tracts on both sides of the boundary, which introduces uncertainty in any subsequent analysis. Therefore, two additional approaches to aggregating points to polygons were tested in this study. Results differ with each approach, but the GIS user receives no alert of such differences.
This finding raises a fundamental concern about techniques through which points are aggregated to polygons in any study using point level incidents and their surrounding census tract socioeconomic data to understand health and place. This study concludes with a suggested protocol to test for this source of uncertainty in analysis and an approach that may remove it.
Citation: Curtis JW (2017) Spatial distribution of child pedestrian injuries along census tract boundaries: Implications for identifying area-based correlates. PLoS ONE 12(6): e0179331. https://doi.org/10.1371/journal.pone.0179331
Editor: Catherine Staton, Duke University, UNITED STATES
Received: December 13, 2016; Accepted: May 26, 2017; Published: June 14, 2017
Copyright: © 2017 Jacqueline W. Curtis. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are provided in the Supporting Information file named "S1 File".
Funding: The author received no specific funding for this work.
Competing interests: The author has declared that no competing interests exist.
Many studies report a relationship between area-based socio-economic characteristics and injury [1–9], and the census tract is often the geographic unit in which these relationships are operationalized [10]. In particular, understanding of child pedestrian injuries has been advanced through such linkages [11–15], which are more recently being facilitated through use of Geographic Information Systems (GIS) [16–22].
In such studies, it is standard to overlay a) points of incidents with b) census tracts. The points are then aggregated to the census tract in which they are located through a spatial join technique. Once the data are aggregated in this way, they can then be normalized by an appropriate denominator and analyzed with statistical approaches to show relationships between outcomes and their socioeconomic context.
This technique of aggregating points to the polygons in which they occur, the spatial join, is generally performed using one of four options, depending on the data being utilized and the objectives of data manipulation. Using ArcGIS terminology [23], these options are:
1) Points to Polygons: Each polygon is appended with a summary of the numeric attribute of the points that fall inside it, and a count field of the points that fall inside it.
2) Points to Polygons: Each polygon is appended with the attributes of the point that is closest to its boundary, and a distance field showing how close the point is.
3) Polygons to Points: Each point is appended with the attributes of the polygon that it falls inside.
4) Polygons to Points: Each point is appended with the attributes of the polygon that is closest to it.
For numerous studies of health and place where outcomes or events are represented as points and the place is represented by tracts, Option 1 is the standard approach.
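The options differ most visibly for a point that sits exactly on an edge shared by two polygons. The following is a minimal sketch in plain Python, assuming two hypothetical rectangular tracts, `A` and `B`, that share the edge x = 10. All names are illustrative, and the code does not reproduce how ArcGIS implements its spatial join. It contrasts a boundary-inclusive count per polygon (in the spirit of Option 1), a strict-interior assignment per point (in the spirit of Option 3), and a nearest-polygon assignment with an arbitrary tie-break (in the spirit of Option 4).

```python
# Hypothetical tracts as axis-aligned rectangles: (xmin, ymin, xmax, ymax).
TRACTS = {"A": (0.0, 0.0, 10.0, 10.0), "B": (10.0, 0.0, 20.0, 10.0)}


def covers(rect, pt):
    """True if pt is inside rect or on its boundary."""
    xmin, ymin, xmax, ymax = rect
    x, y = pt
    return xmin <= x <= xmax and ymin <= y <= ymax


def strictly_inside(rect, pt):
    """True only if pt lies in the open interior of rect."""
    xmin, ymin, xmax, ymax = rect
    x, y = pt
    return xmin < x < xmax and ymin < y < ymax


def distance_to_rect(rect, pt):
    """Euclidean distance from pt to rect (0 if inside or on the edge)."""
    xmin, ymin, xmax, ymax = rect
    x, y = pt
    dx = max(xmin - x, 0.0, x - xmax)
    dy = max(ymin - y, 0.0, y - ymax)
    return (dx * dx + dy * dy) ** 0.5


def count_inclusive(points):
    """Count per tract; a boundary point is counted in every tract it touches."""
    return {name: sum(covers(r, p) for p in points) for name, r in TRACTS.items()}


def assign_interior(pt):
    """Tract whose interior contains pt, or None for a boundary point."""
    for name, rect in TRACTS.items():
        if strictly_inside(rect, pt):
            return name
    return None


def assign_nearest(pt):
    """Nearest tract; ties are broken by tract name (an arbitrary choice)."""
    return min(sorted(TRACTS), key=lambda name: distance_to_rect(TRACTS[name], pt))


# The middle point lies exactly on the shared edge.
points = [(3.0, 5.0), (10.0, 5.0), (14.0, 2.0)]
inclusive = count_inclusive(points)
interior = [assign_interior(p) for p in points]
nearest = [assign_nearest(p) for p in points]

print(inclusive, sum(inclusive.values()))  # per-tract counts sum to 4 for 3 points
print(interior)                            # the edge point has no tract
print(nearest)                             # every point has exactly one tract
```

In this toy case the inclusive count sums to more than the number of points, the strict-interior rule leaves the edge point unassigned, and the nearest rule assigns it to one tract only because of the tie-break.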
This is the most intuitive way to achieve a count of points in each polygon. In addition to enabling quantitative analysis with census variables, this data transformation has the benefits of improved visualization of spatial patterns and spatial confidentiality. For example, when looking at a point layer of many types of health outcomes in a city, it is difficult to make sense of their geographic distribution, depending on the data and scale of observation. One efficient way to quickly see areas of high versus low occurrence is to aggregate the points to some underlying administrative boundary and create a choropleth map. Furthermore, such a map has the added effect of masking the locations of individual outcomes, which is particularly important if the points represent residential locations. For these reasons of intuitiveness, visualization, and confidentiality, Option 1 is the standard approach for aggregating points to polygons. To date, there has been no reason to question its validity.
This study was originally designed to include such a traditional analysis of area-based correlates of child pedestrian injury. However, in the process of initial mapping and aggregating point-level incidents to their associated census tracts, two unexpected outcomes resulted. First, a striking geographic pattern was observed where many of these points were close to or even apparently on tract boundaries (Fig 1). Second, after aggregating these points to the census tracts in which they fall, the total number of incidents for the tracts was greater than the number of incident points. This observation suggests that the spatial join approach so widely used to aggregate points to polygons counts some points as falling inside more than one polygon. However, there is no indication to the user of this occurrence or where these multiple counts are located.
While such a coupling of incidents to tract boundaries is not expected in many health outcomes, pedestrian injuries are an exception. These events occur within or very near roads, and roads serve as one form of census tract boundary. If these points along the boundaries are being counted as falling inside all tracts that share this boundary, it is likely that these census tracts are being identified as having substantially high numbers of incidents based on what occurs at the outer fringes of these units.
Therefore, the aim of this study is to a) demonstrate the spatial distribution of child pedestrian injury incidents along census tract boundaries and then to b) test the implications of this specific geographic pattern in aggregating the point locations to the census tract in which they are located. The study concludes with a suggested protocol for avoiding the potential uncertainty that this pattern can introduce in examining area-based correlates and an approach that removes such uncertainty.
Materials and methods
Police incident data on pedestrian-motor vehicle crashes in a mid-sized Ohio city occurring between January 1, 2013 and December 31, 2015 were acquired [S1 File]. In this 3-year period, 327 incidents were reported. Age of the pedestrian was reported for all but one incident (Table 1). All 100 incidents involving children had an accompanying x,y coordinate for the location of occurrence.
The census tract boundary file created for the 2010 U.S. Census was acquired for the study area, and both datasets (incidents and census tracts) were mapped in ArcGIS 10.4 [23]. Once in the GIS, the data were transformed from a geographic coordinate system in the North American Datum 1983 (GCS NAD 83) to Universal Transverse Mercator projection (UTM Zone 17 N) for analysis.
The spatial relationship between injury incidents and tract boundaries was first investigated using Near Analysis. First, census tracts were converted from polygons to lines using the Polygon to Line tool, a Geoprocessing Tool under Data Management > Features. Then, the distance between each incident and its nearest line (tract boundary) was calculated using Near Analysis under Analysis Tools > Proximity > Near. This approach created a new field in the incident attribute table labeled “NEAR_DIST”. As these calculations were conducted in UTM projection, results are reported in meters. To identify clusters in these data, a) Kernel Density Estimation (KDE) was performed and b) the “NEAR_DIST” values were used as weights when calculating spatial autocorrelation [24], with close distances being more heavily weighted. These approaches enable visualization of specific areas where incidents are located on or near census tract boundaries and would therefore raise concern about linkage with tract-level socioeconomic variables in these places.
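The distance calculation behind a field such as NEAR_DIST can be sketched as the minimum point-to-segment distance over all boundary edges. The example below is a plain-Python illustration under stated assumptions: the tract outlines and incident coordinates are hypothetical, the function names are invented for this sketch, and coordinates are assumed to be already projected so that distances are in meters. It is not the ArcGIS Near tool itself.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b (projected units)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Clamp the projection so the closest point stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def near_dist(p, rings):
    """Minimum distance from p to any edge of any closed boundary ring."""
    return min(
        point_segment_distance(p, ring[i], ring[(i + 1) % len(ring)])
        for ring in rings
        for i in range(len(ring))
    )


# Hypothetical tract outlines in meters; the two tracts share the edge x = 1000.
rings = [
    [(0, 0), (1000, 0), (1000, 1000), (0, 1000)],
    [(1000, 0), (2000, 0), (2000, 1000), (1000, 1000)],
]
incidents = [(1000, 400), (990, 400), (500, 500)]
distances = [near_dist(p, rings) for p in incidents]
print(distances)  # one point on the boundary, one 10 m away, one 500 m away
```

A point with a distance of zero lies on a boundary line, which is the case that later complicates assignment to a single tract.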
To further investigate how points are assigned to polygons based on location, especially in border regions, three relevant approaches were tested: Option 1 (standard approach), Option 3, and Option 4. Option 2 was not tested, as its output does not allow the number of points within each polygon to be counted.
These data and methods address the aims of this study to a) demonstrate the spatial distribution of child pedestrian injury incidents along census tract boundaries and then to b) test the implications of this specific geographic pattern in aggregating the point locations to the census tract in which they are located.
Near analysis, hot spots, and spatial autocorrelation
Near analysis results in a range of incidents located from 0–633.48m from a census tract boundary, with an Interquartile Range (IQR) of 215.90m. Twelve of these points have a value of 0m. Fig 2 provides a summary of the distances of each incident to its nearest census tract boundary.
Furthermore, street widths were measured to identify a zone of census tract change, where the distance of the incident from the boundary is greater than zero, but still in close proximity. Based on these measurements, 30m was selected as a representative street zone of transition between tracts. Using this measure, all incidents were selected with a NEAR_DIST value of 30m or less. Forty-six incidents (46%) occurred within this zone of a census tract change and all of these points are 11m or closer to the boundary. Fig 3 illustrates the distribution of the points that are within a street width distance of the boundary.
These results identify a preponderance of incidents occurring just at the boundary between one census tract and another, with some located directly on the boundary itself. However, it is not enough to know that this pattern exists; it is also necessary to know where it occurs (e.g., is it present across the study area or only in certain places) and what it portends for linking child pedestrian injury locations to their surrounding socioeconomic context. Therefore, KDE was used to visualize the presence of hot spots and then Local Moran’s I was employed to test the significance of clustering of these border and near-border incidents.
KDE (800m bandwidth) revealed that these intersecting or near-intersecting points are concentrated in the northeast quadrant of the study area (Fig 4), with smaller pockets distributed throughout the city.
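As an illustration of what a kernel density surface computes at a single location, the sketch below sums a distance-decaying contribution from every incident within the bandwidth. The quartic (biweight) kernel is an assumption made for this example; the text above reports only the 800m bandwidth. The coordinates and the function name `quartic_kde` are hypothetical.

```python
import math


def quartic_kde(x, y, points, bandwidth):
    """Kernel density at (x, y) using a quartic (biweight) kernel.

    The kernel choice is an assumption for illustration; only the bandwidth
    is taken from the text. The result is in points per square map unit.
    """
    total = 0.0
    for px, py in points:
        d = math.hypot(x - px, y - py)
        if d < bandwidth:  # points beyond the bandwidth contribute nothing
            total += (3.0 / math.pi) * (1.0 - (d / bandwidth) ** 2) ** 2
    return total / bandwidth ** 2


# Hypothetical incident coordinates in meters.
incidents = [(100.0, 100.0), (300.0, 200.0), (5000.0, 5000.0)]
near_cluster = quartic_kde(200.0, 150.0, incidents, bandwidth=800.0)
far_away = quartic_kde(9000.0, 9000.0, incidents, bandwidth=800.0)
print(near_cluster > far_away)
```

Evaluating this function over a grid of cells yields the kind of surface used to visualize hot spots; a larger bandwidth smooths the surface, while a smaller one isolates local pockets.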
In order to quantify the significance of this pattern of apparent clustering, Local Moran’s I was calculated using a fixed distance threshold of 3200m to ensure a minimum number of neighbors in the analysis, and the distances calculated from the Near Analysis (NEAR_DIST) were used as weights. However, no areas met the criteria of statistical significance (p<0.05). Despite falling short of significance, visualizing their distribution is still valuable as it reveals a spatial pattern of the risk of introducing uncertainty in aggregation and the subsequent linkage with census tract socioeconomic data. These techniques demonstrate the spatial distribution of child pedestrian injury incidents along census tract boundaries, which portends geographic variation in uncertainty within the study area for subsequent area-based analysis. The pattern exists and is concentrated more in some areas than in others, but how this pattern manifests in subsequent analysis requires further investigation through testing of a variety of common forms of aggregation.
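For readers unfamiliar with the statistic, a minimal sketch of Local Moran’s I with a fixed distance band follows. Several assumptions apply: it uses the general form of the local statistic with binary weights (1 for neighbors within the threshold, 0 otherwise), it treats a NEAR_DIST-like value as the analysed attribute, and it omits the significance testing that GIS software adds. The coordinates, values, and function name are hypothetical, and software implementations may standardize the variance term differently.

```python
import math


def local_morans_i(coords, values, threshold):
    """Local Moran's I per observation with binary fixed-distance weights.

    I_i = (z_i / m2) * sum_j w_ij * z_j, where z are deviations from the mean,
    m2 = sum(z**2) / n, and w_ij = 1 when j != i and d_ij <= threshold.
    Assumes the values are not all identical (otherwise m2 is zero).
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(zi * zi for zi in z) / n
    result = []
    for i, (xi, yi) in enumerate(coords):
        # Spatial lag: sum of neighbor deviations inside the distance band.
        lag = sum(
            z[j]
            for j, (xj, yj) in enumerate(coords)
            if j != i and math.hypot(xi - xj, yi - yj) <= threshold
        )
        result.append(z[i] / m2 * lag)
    return result


# Hypothetical data: two pairs of nearby points with similar values (meters).
coords = [(0.0, 0.0), (100.0, 0.0), (5000.0, 0.0), (5100.0, 0.0)]
values = [0.0, 0.0, 10.0, 10.0]
local_i = local_morans_i(coords, values, threshold=3200.0)
print(local_i)
```

Positive values indicate that an observation is surrounded by neighbors with similar values (low among low, or high among high); whether such clustering is statistically significant requires a separate test, such as the permutation approach commonly used in practice.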
The three relevant options that can produce a count of the number of points (incidents) in a polygon (census tract) each produced different results.
Option 1: This approach resulted in 87 points being assigned to only one census tract, 11 assigned to two census tracts, and 2 assigned to three census tracts (Fig 5). The sum of all incidents is 115.
Option 3: This approach resulted in 13 of the points not being assigned to a census tract (those with a distance to the boundary of 0m through 0.000016m). The sum of all incidents is 100.
Option 4: This approach resulted in all points being assigned to only one census tract. The sum of all incidents is 100.
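The Option 1 total can be reconstructed from the counts reported above, since each point contributes once for every tract to which it is assigned. A short arithmetic check in Python, using only those reported figures (the variable names are illustrative):

```python
# Counts reported for Option 1: number of points assigned to k tracts.
points_by_tract_count = {1: 87, 2: 11, 3: 2}

# Distinct incidents in the original point layer.
n_points = sum(points_by_tract_count.values())

# Tract-level sum after the join: each point counts once per assigned tract.
joined_total = sum(k * n for k, n in points_by_tract_count.items())

# Extra counts introduced by points assigned to more than one tract.
excess = joined_total - n_points

print(n_points, joined_total, excess)  # prints: 100 115 15
```

The 15 extra counts come entirely from the 13 points assigned to more than one tract (11 counted twice and 2 counted three times).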
Comparing the results of Option 3 and Option 4, both yield a sum of 100 incidents, which indicates that neither assigns a point to more than one tract, as occurred in Option 1. Examination of the census tracts reveals that of the 44 tracts with at least 1 incident occurring within their boundaries based on at least one of the spatial join approaches, 17 (41%) demonstrated inconsistency in the number of incidents across the three approaches. Seventeen (100%) of the census tracts changed between Option 1 and Option 3, with a total difference of 28 incidents. Thirteen (76%) of the census tracts changed between Option 1 and Option 4, with a total difference of 15 incidents. Seven (41%) of the census tracts changed between Option 3 and Option 4, with a difference of 13 incidents. Recall that Option 3 counted 13 incidents as NULL values and therefore these are not assigned to any census tract. Table 2 displays the census tracts in which the number of incidents attributed to each varied based on the spatial join approach. The grayed cells identify the anomalies.
This result raises the question about the nature of the geography of the points in relation to the census tract boundaries in these places and how it may impact subsequent linkage with area-based correlates. In sum, results are twofold: 1) a spatial pattern of child pedestrian injury incidents is often aligned with census tract boundaries, and 2) different spatial join procedures produce differential assignment of points to the polygons in which they are located.
Observations and implications
This study produces two important findings to which researchers should be alerted in the near term. The first is both conceptual and methodological and focuses on the implications of a coupled point-boundary geographic pattern of child pedestrian injury incidents. Near Analysis revealed that, in this case, 46 out of 100 incidents occurred within 30m of a census tract boundary. In effect, this means that if the incident occurs in one lane of the road, it is assigned to that tract, but take a few steps further into the adjacent lane, and now the incident is assigned to that tract instead. Such a small distance may result in large differences in the number of incidents counted as occurring in each tract and being associated with a different set of socioeconomic contextual variables.
Roads are often used as census tract boundaries as they are relatively stable, visible, and identifiable features [25]. Roads are also the common location of pedestrian injuries. Therefore, while census tracts are useful for studying the area-based correlates of many health outcomes, even of injury types, this geographic unit presents a special problem in the case of pedestrian injuries, as they often occur at the boundaries between tracts. Conceptually, when so many of this type of injury occur at tract boundaries, is it valid to assign them only to the tract in which they occur to investigate area-based correlates? Perhaps using the census data for the tract that the incident “falls within” is only partially satisfactory. Unfortunately, despite the many advances enabled through GIS, it is limited in that polygon boundaries are necessarily represented as firm lines, with one set of values assigned to one side and another set of values assigned to the other (Fig 6). However, the reality is that it is unreasonable to expect that human spatial patterns and processes respect a hard stop at a census tract boundary. It is more realistic to expect that these boundaries have a fuzzy characteristic that is not easily represented in a GIS environment.
The second issue of immediate concern, which is technical in nature, is how points are assigned to polygons in GIS when they are located on a boundary. Option 1, the standard and most intuitive approach, resulted in multiple counting of each border incident, which propagated into a larger number of incidents in the tracts than was reported in the original data set. This led to investigation of alternative approaches. In Option 3, each point was appended with the attributes of the polygon that it falls inside. This meant that every point representing a child pedestrian injury incident was assigned the name of the census tract in which it was located, as well as any other variables included in the tract data. However, this study demonstrated that those points on the border were not assigned to a tract. By summarizing the newly appended tract field, a table was created with counts of incidents for each tract in which they occurred. This table was then joined to the census tract layer to create the choropleth map and conduct the same analysis as would result from Option 1. Option 4 enabled the same output, but also included a field calculating the distance from the points to their nearest polygon boundary and when the point was located inside the polygon, the distance was zero. This option assigned all points to only one tract each. These results should provide guidance to the numerous studies aggregating health outcomes to the census tracts in which they occur, especially those investigating outcomes that may be geographically associated with census boundaries. Pedestrian injury in general, and child pedestrian injury in particular, is one area where the growing body of research [13–16, 21, 25, 26–28] is becoming aware of inherently geographic issues within their studies–those identified through this investigation should be added to the list.
Identifying and managing uncertainty
The findings of this study demonstrate that, using the standard spatial join approach in a GIS, points located on a polygon boundary are counted as falling inside all adjacent polygons. This occurs without any alert to the user, leaving the researcher to proceed without recognizing that a) the total number of points has increased from the original dataset and b) individual counts of points in particular polygons where a number of points are located on the boundary may be inflated above the expected value. This source of uncertainty is possible in any spatial join procedure of points to polygons, but especially those where the spatial distribution of points is naturally aligned with areal unit boundaries. Therefore, practical approaches are needed immediately for identifying and systematically managing this uncertainty. Table 3 offers a three-stage protocol to address this issue.
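One way such a check could be automated is sketched below. It is not the protocol in Table 3, which is not reproduced here; it is a hypothetical audit in plain Python that compares the rows produced by a spatial join with the original point identifiers and reports points assigned to more than one polygon or to none. The function name and the input layout (pairs of point id and polygon id) are assumptions for illustration.

```python
from collections import Counter


def audit_spatial_join(point_ids, join_rows):
    """Compare a spatial join result with the source points.

    point_ids: identifiers of the original points.
    join_rows: (point_id, polygon_id) pairs produced by a spatial join.
    """
    hits = Counter(pid for pid, _ in join_rows)
    return {
        "n_points": len(point_ids),
        "n_join_rows": len(join_rows),
        "multiply_assigned": sorted(p for p in point_ids if hits[p] > 1),
        "unassigned": sorted(p for p in point_ids if hits[p] == 0),
    }


# Hypothetical join output: point 2 is counted in two tracts, point 4 in none.
report = audit_spatial_join(
    point_ids=[1, 2, 3, 4],
    join_rows=[(1, "A"), (2, "A"), (2, "B"), (3, "B")],
)
print(report)
```

In this toy example the number of join rows equals the number of points even though one point is double counted and another is dropped, so comparing totals alone would miss both problems; checking each point identifier individually is the safer test.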
In sum, these recommendations are easy and quick first steps to integrate into existing GIS data protocols when needing to aggregate point data into the polygons in which they are located. They are simple steps to identify and manage uncertainty, as well as to provide transparency in interpreting results in the numerous studies where GIS is used to understand the relationship between health and place. In this case, GIS should be a tool that provides greater insight into the socioeconomic contexts that result in child pedestrian injuries. However, as they are often spatially distributed along roads that serve as census tract boundaries, unless protocols are adjusted to account for boundary effects on aggregation, it could be muddying the waters.
Looking beyond big cities
This investigation was initiated through a process that began with observation of points overlaying census tracts. Despite the simplistic nature of this exercise, observation remains essential. In this case, GIS facilitated identification, visualization and then quantification of this geographic pattern. In part, this positive outcome is due to the relatively small size of the study area and low number of incidents which stands in contrast to existing research, most of which has been conducted in large urban areas [13–14, 21, 26]. In major metropolitan areas, the census tracts are smaller and there are usually more points on the map. The result of these large study areas with small tracts and many incidents is difficulty in observing macro-level spatial patterns. Looking at the data in a GIS provides a snapshot of a sea of points, with little ability to see what lies beneath them. It would be unusual to clearly see a boundary relationship between points of incidents and census tracts in these types of study sites when looking at the area in its entirety. In this case, working in a mid-sized city is a benefit as it has sufficient numbers of incidents to cause concern and require analysis, but the numbers and geographic area of their distribution are not so large as to preclude quick, clear visualization. Looking beyond big cities to the mid-sized and smaller towns results in less data distributed over less space, which can lead to greater clarity in observation of geographic patterns.
The contributions of this study and their implications for current and future research must be considered in the context of the project’s limitations. First, the results are only for one mid-sized U.S. city using police incident data over a three-year period. Additional investigation is needed in smaller and in larger areas with different numbers of incidents and geographies of census tracts (e.g., urban areas where tracts are smaller and more numerous and rural areas where they are larger and fewer). Comparative studies across such types of sites will enable understanding of what geographic conditions create a coupling of numerous pedestrian injury incidents with census tract boundaries. Furthermore, this study did not examine the degree of uncertainty that these discrepancies in counts present to subsequent analysis of child pedestrian injury with area-based correlates. For example, how much do census data vary from tract to contiguous tract? Further investigation must be conducted to identify if and how much this issue affects results. Finally, the findings on spatial join approaches and their discrepant results presented in this study are only for a specific GIS software and version. Although the software utilized in this project is the most common, there is growing adoption of other packages and these questions should be investigated in them as well. These are issues for future research to address.
To date, there has been no reason to question the standard GIS practice of aggregating points to polygons. However, this study reveals that investigating this process and its implications for reported relationships between area-based correlates and health outcomes is needed. This is a particularly pressing concern for research on phenomena that may be tightly coupled with census boundaries.
This study highlights not only a technical concern, but also a more substantive one about the geography of phenomena and their core-periphery relationships to administrative boundaries. Analysis of census tract socioeconomic data has offered valuable understanding of the varied and complex linkage between health and place. Though most health outcomes and processes are not spatially tied to roads and other features of the environment that serve as tract boundaries, some necessarily are linked in this way. Research is needed to identify potentially boundary-dependent processes and outcomes.
With the rapid expansion of GIS throughout public health and the growing acknowledgment of the powerful insights this technology can offer, this study offers a caution. Scholars should pause to critically reflect about the often “black box” nature of the technology and our own personal limits in understanding its operations. Developing formal protocols for use of GIS techniques (even the most basic ones) in public health can improve transparency and confidence as this line of inquiry continues apace to yield real-world benefits for people’s health.
S1 File. PLOS_One_Minimal_Dataset.xlsx.
The records listed in this file can be accessed online: http://online.akronohio.gov/apdonline/reportlookup/EULA.aspx?referrer=ReportLookup.
The author is grateful for the thorough and thoughtful feedback provided by the reviewers.
- Conceptualization: JWC.
- Data curation: JWC.
- Formal analysis: JWC.
- Investigation: JWC.
- Methodology: JWC.
- Project administration: JWC.
- Resources: JWC.
- Validation: JWC.
- Visualization: JWC.
- Writing – original draft: JWC.
- Writing – review & editing: JWC.
- 1. Durkin MS, Davidson LL, Kuhn L, O'Connor P, Barlow B. (1994) Low-income neighborhoods and the risk of severe pediatric injury: a small-area analysis in northern Manhattan. Am J Public Health 84: 587–592.
- 2. Reading R, Langford IH, Haynes R, Lovett A. (1999) Accidents to preschool children: comparing family and neighbourhood risk factors. Soc Sci Med 48: 321–330. pmid:10077280
- 3. Cubbin C, LeClere FB, Smith GS. (2000) Socioeconomic status and injury mortality: individual and neighbourhood determinants. Journal of Epidemiology and Community Health 54: 517–524. pmid:10846194
- 4. Haynes R, Reading R, Gale S. (2003) Household and neighbourhood risks for injury to 5–14 year old children. Soc Sci Med 57: 625–636. pmid:12821011
- 5. Soubhi H, Raina P, Kohen D. (2004) Neighborhood, family, and child predictors of childhood injury in Canada. Am J Health Behav 28: 397–409. pmid:15482969
- 6. Reading R, Haynes R, Shenassa ED. (2005) Neighborhood influences on child injury risk. Child Youth Environ 15: 165–185.
- 7. Potter BK, Speechley KN, Koval JJ, Gutmanis IA, Campbell MK, Manuel D. (2005) Socioeconomic status and non-fatal injuries among Canadian adolescents: variations across SES and injury measures. BMC Public Health 5: 1. Available: http://bmcpublichealth.biomedcentral.com/articles/10.1186/1471-2458-5-132
- 8. Kim MH, Subramanian SV, Kawachi I, Kim CY. (2007) Association between childhood fatal injuries and socioeconomic position at individual and area levels: a multilevel study. J Epidemiol Community Health 61: 135–140. pmid:17234872
- 9. Bell N, Schuurman N, Hameed SM (2009) A small-area population analysis of socioeconomic status and incidence of severe burn/fire-related injury in British Columbia, Canada. Burns 35: 1133–1141. pmid:19553025
- 10. Krieger N. (2006) A century of census tracts: health & the body politic (1906–2006). J Urban Health 83: 355–361. pmid:16739037
- 11. Rivara FP, Barber M. (1985) Demographic analysis of childhood pedestrian injuries. Pediatrics 76(3): 375–381. pmid:4034298
- 12. Braddock M, Lapidus G, Gregorio D, Kapp M, Banco L. (1991) Population, income, and ecological correlates of child pedestrian injury. Pediatrics 88: 1242–1247. pmid:1956744
- 13. LaScala EA, Gerber D, Gruenewald PJ. (2000) Demographic and environmental correlates of pedestrian injury collisions: a spatial analysis. Accid Anal Prev 32: 651–658. pmid:10908137
- 14. Chakravarthy B, Anderson CL, Ludlow J, Lotfipour S, Vaca FE. (2010) The relationship of pedestrian injuries to socioeconomic characteristics in a large Southern California County. Traffic Inj Prev 11: 508–513. pmid:20872307
- 15. Statter M, Schuble T, Harris-Rosado M, Liu D, Quinlan K. (2011) Targeting pediatric pedestrian injury prevention efforts: teasing the information through spatial analysis. J Trauma Acute Care Surg 71: S511–S516.
- 16. Braddock M, Lapidus G, Cromley E, Cromley R, Burke G, Banco L. (1994) Using a geographic information system to understand child pedestrian injury. Am J Public Health 84: 1158–1161. pmid:8017545
- 17. Cusimano M, Chipman M, Glazier RH, Rinner C, Marshall SP. (2007) Geomatics in injury prevention: the science, the potential and the limitations. Inj Prev 13: 51–56. pmid:17296690
- 18. Edelman LS. (2007) Using geographic information systems in injury research. J Nurs Scholarsh 39: 306–311. pmid:18021129
- 19. Bell N, Schuurman N (2010) GIS and injury prevention and control: history, challenges, and opportunities. Int J Environ Res Public Health 7: 1002–1017. pmid:20617015
- 20. Cusimano M, Marshall S, Rinner C, Jiang D, Chipman M. (2010) Patterns of urban violent injury: a spatio-temporal analysis. PLoS One 5(1): e8669. Available: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0008669 pmid:20084271
- 21. Chakravarthy B, Anderson CL, Ludlow J, Lotfipour S, Vaca FE. (2012) A geographic analysis of collisions involving child pedestrians in a large Southern California county. Traffic Inj Prev 13(2): 193–198. pmid:22458798
- 22. Malek M, Guyer B, Lescohier I. (1990) The epidemiology and prevention of child pedestrian injury. Accid Anal Prev 22: 301–313. pmid:2222697
- 23. ESRI. (2015) ArcMap 10.4, Redlands, CA.
- 24. Anselin L. (1995). Local indicators of spatial association—LISA. Geogr Anal 27(2), 93–115.
- 25. U.S. Census. (2010) Geographic Terms and Concepts–Census Tract. Available online: https://www.census.gov/geo/reference/gtc/gtc_ct.html Last accessed: 12/5/2016.
- 26. Lightstone AS, Dhillon PK, Peek-Asa C, Kraus JF. (2001) A geographic analysis of motor vehicle collisions with child pedestrians in Long Beach, California: comparing intersection and midblock incident locations. Inj Prev 7: 155–160. pmid:11428565
- 27. Haas B, Doumouras AG, Gomez D, De Mestral C, Boyes DM, Morrison L, Nathens AB. (2015) Close to home: an analysis of the relationship between location of residence and location of injury. J Trauma Acute Care Surg 78(4): 860. pmid:25807410
- 28. Silverman JD, Hutchison MG, Cusimano MD. (2013) Association between neighbourhood marginalization and pedestrian and cyclist collisions in Toronto intersections. Can J Public Health 104(5): e405–9. pmid:24183182