Differences in Factors Affecting Various Crash Types with High Numbers of Fatalities and Injuries in China

Objectives Road traffic crashes that involve very high numbers of fatalities and injuries arouse public concern wherever they occur. In China, there are two categories of such crashes: a crash that results in 10–30 fatalities, 50–100 serious injuries or a total cost of 50–100 million RMB ($US8-16m) is a “serious road traffic crash” (SRTC), while a crash that is even more severe or costly is a “particularly serious road traffic crash” (PSRTC). The aim of this study is to identify the main factors affecting different types of these crashes (single-vehicle, head-on, rear-end and side impact), with the ultimate goal of informing prevention activities and policies. Methods Detailed descriptions of the SRTCs and PSRTCs that occurred from 2007 to 2014 were collected from the database “In-depth Investigation and Analysis System for Major Road Traffic Crashes” (IIASMRTC), which is maintained by the Traffic Management Research Institute of the Ministry of Public Security of China (TMRI). Eighteen main risk factors, categorized into four areas (participant-, vehicle-, road- and environment-related), were chosen as potential independent variables for the multinomial logistic regression analysis. Single-vehicle, head-on, rear-end and side impact crashes were compared in terms of the factors affecting crash occurrence. Findings Five risk factors were significant across the six category contrasts of the multinomial logistic regression model: location, vertical alignment, roadside safety rating, driver distraction and overloading of cargo. Intersections were more likely to have side impact SRTCs and PSRTCs, especially with poor visibility at night. Overloaded freight vehicles were more likely to be involved in a rear-end crash than other freight vehicles. Driver distraction was an important risk factor for head-on crashes, while vertical alignment and roadside safety rating were positively associated with single-vehicle crashes.
Conclusion Based on the findings, promising measures are proposed to prevent each type of SRTC and PSRTC; governmental or regulatory agencies could use them to plan strategies that reduce SRTCs and PSRTCs and to support lifesaving policies.


Introduction
The global burden of road crashes has been rising with increasing motorization: it is estimated that over 1.2 million people are killed and up to 50 million people are injured in road traffic crashes each year [1]. While the cost of these crashes is hard to calculate with certainty, it is considered to be between 1% and 3% of a country's Gross Domestic Product, which globally amounts to roughly 500 billion US dollars. Although non-fatal collisions are more numerous, fatal crashes, and especially crashes with high numbers of fatalities and injuries, have far more significant impacts on those involved and their families. The costs of medical care, loss of income (often due to the death or injury of the main breadwinner) and the demands of long-term care for impairment can impoverish families.
The "493 Decree" published by the China State Council contains definitions for road traffic crashes with very high numbers of fatalities and injuries or a high cost. A "serious road traffic crash" (SRTC) is defined as one in which 10 to 30 people die, or 50 to 100 people are seriously injured, or the direct economic loss is 50 to 100 million RMB (about 8 to 16 million US dollars). A "particularly serious road traffic crash" (PSRTC) is more severe or costly, involving more than 30 deaths, more than 100 seriously injured people, or more than 100 million RMB in direct economic loss. Official statistics for the period 2007 to 2014 show that 2908 deaths and 2897 injuries occurred in 194 SRTCs and PSRTCs [2]. Most of these crashes (187) were SRTCs; the 7 PSRTCs accounted for 294 deaths and 69 injuries.
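The two thresholds above can be written as a small classification rule. The sketch below is illustrative only: the function name `classify_crash` is hypothetical, and it assumes that each band's lower bound is inclusive and its upper bound exclusive, because the wording "10 to 30" and "more than 30" leaves the boundary case (exactly 30 deaths, 100 serious injuries or 100 million RMB) open. It also derives, by subtraction from the totals above, the casualties attributable to the 187 SRTCs.

```python
def classify_crash(deaths: int, serious_injuries: int, direct_loss_million_rmb: float) -> str:
    """Hypothetical SRTC/PSRTC classifier based on the thresholds in the text.

    Assumption: lower bounds inclusive, upper bounds exclusive
    (e.g. 10 <= deaths < 30 for an SRTC); the source does not fix the boundary.
    """
    # Meeting any single criterion is enough for the more severe class
    if deaths >= 30 or serious_injuries >= 100 or direct_loss_million_rmb >= 100:
        return "PSRTC"
    if deaths >= 10 or serious_injuries >= 50 or direct_loss_million_rmb >= 50:
        return "SRTC"
    return "below SRTC threshold"


# Casualties in the 187 SRTCs, obtained by subtracting the 7 PSRTCs from the totals
srtc_deaths = 2908 - 294
srtc_injuries = 2897 - 69
print(classify_crash(12, 0, 0), classify_crash(35, 0, 0))  # prints SRTC PSRTC
print(srtc_deaths, srtc_injuries)  # prints 2614 2828
```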
Prediction models are a well-recognized means of analyzing the factors that contribute to increased road crash risk and that could therefore be addressed through preventive measures. However, what constitutes an efficient and effective prediction model can vary with specific objectives, such as what is being predicted and at which level [3]. Most prediction models attempt to predict crash severity. These models fall into three major classes: (1) binary outcome models, which are typically applicable to situations with dichotomous injury severity outcomes, such as fatal and nonfatal crashes; (2) ordered discrete outcome models, which account for the ordinal nature of injury data (e.g., ranging from property damage only crashes to injury crashes and fatal crashes); and (3) unordered multinomial discrete outcome models, which handle three or more outcomes and do not specifically consider the ordering of injury severity data [4]. In addition to crash severity, many researchers have aimed to predict crash frequency (the number of crashes that occur during a period) or crash rate (crash frequency relative to traffic exposure). Common methods include Poisson regression models, negative binomial regression models, and variations of these models [5].
Despite the contribution of previous modeling exercises, recent studies have argued that the nature of crashes and the hazardous factors influencing crash occurrence vary significantly by crash type, e.g. single-vehicle, head-on, etc. [6,7]. Identifying risk factors for different crash types is therefore important for understanding the mechanisms behind different crash types and for informing prevention activities and policies. Golob and Recker applied nonlinear canonical correlation together with cluster analyses to more than 1000 crashes in Southern California to determine how crash type (e.g., rear-end, sideswipe, hit object) was related to traffic flow conditions [8]. Sullivan and Daly used descriptive statistics to compare crash types on roads with and without median trees, based on crash data from urban and suburban conventional highways in California [9]. They found that median trees were associated with lower proportions of head-on and broadside collisions and with a higher proportion of hit-pedestrian collisions. In addition, more single-vehicle collisions and fewer collisions involving three or more parties occurred in the presence of median trees. In Basyouny's study [10], the relationship between combinations of weather elements and seven crash types was explored using multivariate Poisson lognormal (MVPLN) models based on 11 years of daily weather and crash data for Edmonton, Canada. Adverse weather states were found to be associated with increases of 9% to 73.7% across all crash types, with the highest increase for run-off-the-road crashes. The effects of more comprehensive factors (human-related, road-related, vehicle-related, traffic-related factors, etc.) on the occurrence of specific crash types were considered in Yu and Larsen's research [11,12]. Active traffic management strategies designed to improve traffic safety were also proposed based on the identification of real-time crash patterns.
The literature reviewed above shows that most crash-type analyses have been undertaken in developed countries and regions. However, the characteristics and mechanisms of crashes in developing countries differ from those in developed countries [13][14][15]. In addition, road traffic crashes with high numbers of fatalities and injuries are much more frequent in developing countries such as China. Investigating such crashes is of great importance for saving lives and meeting public expectations of safety. The purpose of this study is therefore to bridge the research gap on how SRTCs and PSRTCs in China differ by crash type, by studying the factors affecting the occurrence of single-vehicle, head-on, rear-end and side impact crashes using data on SRTCs and PSRTCs that occurred from 2007 to 2014. In addition, the preventive measures and policies suited to each crash type are highlighted.

Data Preparation
In China, every SRTC and PSRTC is investigated thoroughly by an expert team of the Ministry of Public Security (MPS), and all data and files are recorded in a comprehensive database, the In-depth Investigation and Analysis System for Major Road Traffic Accidents (IIASMRTA). The expert team is composed of local traffic police authorities, road traffic crash inspectors of the MPS and university professors with relevant expertise; it identifies the risk factors for each SRTC and PSRTC based on a combination of witness interviews, hospital records, coroners' reports, physical evidence and collision reconstruction. The risk factors in this study were selected from two sources: the "Traffic Violation Code Table" published by the MPS and the "Relevant Factors" listed in the IIASMRTA database. The former relates to traffic violation behaviors, and the latter classifies factors into three categories: participant-related factors (with four subcategories: distraction, health problem, misjudgment, operation error), vehicle-related factors, and road and infrastructure-related factors.
In this study, data for all 194 SRTCs and PSRTCs that occurred in China from 2007 to 2014 were collected from the IIASMRTA database with the assistance of the Traffic Management Research Institute of the Ministry of Public Security of China (TMRI). A preliminary data screening procedure developed by Jiang and Lyles was followed to make the crash data relatively error free [16]. Five crashes were filtered out as a result of missing information and the final data contained 189 crashes, which comprised 100 single-vehicle crashes, 39 head-on crashes, 26 rear-end crashes and 24 side impact crashes (side crashes and sideswipe crashes). The percentage of missing records (2.6%) was much lower than those in several similar studies [4,17,18], which implies relatively good data quality.
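The sample bookkeeping in this paragraph can be checked with a few lines of arithmetic. The sketch below uses only the counts reported above; the variable names are illustrative.

```python
collected = 194  # SRTCs and PSRTCs retrieved for 2007-2014
excluded = 5     # removed during screening for missing information
by_type = {"single-vehicle": 100, "head-on": 39, "rear-end": 26, "side impact": 24}

analysed = sum(by_type.values())
assert analysed == collected - excluded  # the 189 crashes that enter the models

missing_rate = excluded / collected
shares = {crash_type: n / analysed for crash_type, n in by_type.items()}

print(f"missing: {missing_rate:.1%}")  # prints missing: 2.6%
for crash_type, share in shares.items():
    print(f"{crash_type}: {share:.1%}")
```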
The validity and reliability of the risk factors assigned to the crashes were cross-checked by the authors through an examination of witness interviews, hospital records, coroners' reports and local police reports. With the assistance of TMRI, this was undertaken for only part of the period (2007 to 2011). A second cross-check was undertaken by the first author through on-the-spot investigations of three SRTCs in 2012/13. The first author accompanied the expert team to the crash scenes immediately after the crashes occurred, analyzed the same sources of evidence and reconstruction used by the expert team, and was thus able to validate the risk factors assigned by the team.
The risk factors were ranked by overall frequency of involvement in crashes, and the top 18 factors were chosen as potential independent variables for multinomial logistic regression. These factors were chosen because they met the data size requirement for multinomial logistic regression analysis, a minimum of 10 cases per independent variable [19]; lower-ranked factors did not meet this requirement. The 18 factors can be divided into four categories (participant-related, vehicle-related, road-related and environment-related) [20,21]. Table 1 provides summary descriptive statistics for the dependent and potential independent variables used in this paper.
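One reading of the 10-cases-per-variable rule, which is our assumption rather than something the text states, treats it as a cap on the number of candidate predictors given the 189 analysed crashes: 189 // 10 = 18, the number of factors retained. The sketch below encodes that reading; the helper names and the factor counts in the usage line are hypothetical, and the rule could instead be applied to each factor's own case count.

```python
from typing import Dict, List


def max_candidate_variables(n_cases: int, min_cases_per_variable: int = 10) -> int:
    """Largest number of predictors allowed by a cases-per-variable rule."""
    return n_cases // min_cases_per_variable


def select_top_factors(factor_counts: Dict[str, int], n_cases: int,
                       min_cases_per_variable: int = 10) -> List[str]:
    """Rank factors by the number of crashes they were involved in and keep
    as many as the cases-per-variable rule permits (hypothetical helper)."""
    ranked = sorted(factor_counts, key=factor_counts.get, reverse=True)
    return ranked[:max_candidate_variables(n_cases, min_cases_per_variable)]


print(max_candidate_variables(189))  # prints 18

# Hypothetical frequencies for illustration only (not Table 1)
counts = {"factor_a": 120, "factor_b": 95, "factor_c": 40}
print(select_top_factors(counts, n_cases=20))  # prints ['factor_a', 'factor_b']
```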
Some of the road-related and participant-related risk factors listed in Table 1 are governed by regulations that differ from those in other countries, so they need some explanation. Regarding "Missing or nonstandard median strip", the type, width and barrier specification of median strips on highways and urban roads of various grades are regulated by highway and road design standards; nevertheless, substandard median barriers or missing median strips are found in some SRTCs and PSRTCs. Such road deficiencies tend to cause crashes in which vehicles deviate into the opposing lane and collide head-on with other vehicles. With respect to "Missing or nonstandard signs" and "Missing or nonstandard markings", although the Ministry of Transport has issued a series of standards on the design and use of signs and markings, missing warning signs before hazardous locations (e.g. road sections next to slopes/embankments and bodies of water, with a high roadbed, or at a combination of horizontal curve and steep slope) are very common. Missing or indistinct road markings (e.g. lane demarcation lines, lane border lines, delineators, shoulder rumble strips) are also frequently observed near or at such locations.
Among the participant-related factors, "Illegal overtaking" is defined as "overtaking that occurs in overtaking-prohibited road sections or without safe overtaking conditions". "Fatigued driving" is defined as "continuous driving over 4 hours without a break or a break time less than 20 minutes". "Driver distraction" is defined as "the diversion of driver attention away from the driving task, which includes passenger-related distraction, cell phone-related distraction, cognitive distraction, and in-vehicle distraction".
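The "Fatigued driving" definition can be operationalized as a check over a trip's sequence of driving and rest periods. The sketch below is a hypothetical encoding, not the MPS's procedure: it assumes that only driving time accumulates, that a rest of at least 20 minutes resets the clock while shorter rests do not, and that "over 4 hours" means strictly more than 240 minutes.

```python
from typing import List, Tuple

MAX_CONTINUOUS_DRIVING_MIN = 240  # "over 4 hours" => strictly more than 240 min
MIN_BREAK_MIN = 20                # assumption: shorter breaks do not reset the clock


def is_fatigued_driving(segments: List[Tuple[str, float]]) -> bool:
    """Hypothetical check: segments is an ordered list of ("drive", minutes)
    or ("rest", minutes) tuples describing one driver's trip."""
    continuous = 0.0
    for kind, minutes in segments:
        if kind == "drive":
            continuous += minutes
            if continuous > MAX_CONTINUOUS_DRIVING_MIN:
                return True
        elif kind == "rest":
            if minutes >= MIN_BREAK_MIN:
                continuous = 0.0  # a qualifying break resets continuous driving time
        else:
            raise ValueError(f"unknown segment kind: {kind!r}")
    return False


# A 15-minute stop is not a qualifying break, so 150 + 100 minutes accumulate
print(is_fatigued_driving([("drive", 150), ("rest", 15), ("drive", 100)]))  # prints True
print(is_fatigued_driving([("drive", 150), ("rest", 20), ("drive", 100)]))  # prints False
```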

Methods
The main objective of this research is to identify factors that have a significant influence on the occurrence of each crash type. Multinomial logistic regression is employed for two reasons. First, the impact of factors can be analyzed quantitatively, since this method predicts the probability of each category of the dependent variable from multiple independent variables; the results are therefore more interpretable than those of qualitative methods such as decision trees. Second, multinomial logistic regression does not require normality, linearity or homoscedasticity of the data, which makes it more attractive than methods that impose more assumptions on the data, e.g. discriminant function analysis [22].
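How the model turns predictors into category probabilities can be sketched from the baseline-category logit form defined later in this section: each linear predictor $\eta_i = \alpha_i + \mathbf{x}^{T}\boldsymbol{\beta}_i$ equals $\log(\pi_i/\pi_j)$, so $\pi_j = 1/(1 + \sum_i e^{\eta_i})$ and $\pi_i = e^{\eta_i}\pi_j$. The function name and all coefficient values below are hypothetical, not the estimates in Table 5.

```python
import math
from typing import Dict, Sequence


def category_probabilities(
    x: Sequence[float],
    intercepts: Dict[str, float],
    coefs: Dict[str, Sequence[float]],
    baseline: str,
) -> Dict[str, float]:
    """Probabilities of every category under a baseline-category logit model.

    intercepts[c] and coefs[c] hold alpha_c and beta_c for each NON-baseline
    category c; the baseline category has no equation of its own.
    """
    # Linear predictor eta_c = alpha_c + x^T beta_c = log(pi_c / pi_baseline)
    etas = {
        c: intercepts[c] + sum(b * xi for b, xi in zip(coefs[c], x))
        for c in intercepts
    }
    denom = 1.0 + sum(math.exp(e) for e in etas.values())
    probs = {c: math.exp(e) / denom for c, e in etas.items()}
    probs[baseline] = 1.0 / denom
    return probs


# Hypothetical coefficients for one binary predictor (NOT the paper's estimates)
intercepts = {"single-vehicle": 1.2, "head-on": 0.5, "rear-end": 0.1}
coefs = {"single-vehicle": [-1.5], "head-on": [-1.0], "rear-end": [-0.8]}
p = category_probabilities([1.0], intercepts, coefs, baseline="side impact")
print({k: round(v, 3) for k, v in p.items()})
```

Setting all coefficients to zero gives equal probabilities across the four crash types, which is a quick sanity check on any implementation.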
However, multinomial logistic regression does need careful consideration of the correlation between each independent variable and the dependent variable (crash type), as well as the possible multicollinearity among the independent variables.
Since Pearson's chi-square test is suitable for testing the association between two categorical variables, it was used to evaluate the relationship between each potential independent variable and crash type [23]. Independent variables with P values greater than 0.05 were eliminated from the subsequent analyses. The Pearson chi-square test statistic is [24]

$$\chi^{2} = \sum_{i=1}^{n} \frac{(O_{i} - E_{i})^{2}}{E_{i}}$$

where $\chi^{2}$ is Pearson's cumulative test statistic; $O_{i}$ is the number of observations of type $i$; $N$ is the total number of observations; $n$ is the number of cells; and $E_{i}$ is the expected (theoretical) frequency of type $i$, given by

$$E_{i} = N p_{i}$$

where $p_{i}$ is the fraction of type $i$ in the population. The P-value is obtained by comparing $\chi^{2}$ with a chi-squared distribution whose number of degrees of freedom equals $n$ minus the reduction in degrees of freedom; for an $r \times c$ contingency table, such as crash type against a binary factor, this is $(r - 1)(c - 1)$.

After the chi-square tests, bivariate correlation analysis was undertaken to identify potential correlations among the remaining variables. Since most variables are discrete, Spearman's rank correlation coefficients rather than Pearson's correlation coefficients were used [25]. Spearman's rank correlation coefficient is given by [26]

$$\rho = 1 - \frac{6 \sum_{i=1}^{m} d_{i}^{2}}{m\,(m^{2} - 1)}$$

where $d_{i}$ is the difference between the ranks of each $(x_{i}, y_{i})$ data pair and $m$ is the number of data pairs.

Subsequently, a multinomial logistic regression model was employed to reveal the relationships among single-vehicle, head-on, rear-end and side impact crashes. Multinomial logistic regression predicts the probability of category membership of a dependent variable based on multiple independent variables. In this paper, the model is expressed as [27]

$$\log\left(\frac{\pi_{i}}{\pi_{j}}\right) = \alpha_{i} + \mathbf{x}^{T}\boldsymbol{\beta}_{i}, \qquad i = 1, \ldots, p, \quad i \neq j$$

where $\pi_{i}$ is the probability of non-baseline category $i$ of the response variable (crash type); $p$ is the number of categories of the response variable; $\pi_{j}$ is the probability of baseline category $j$ of the response variable; $\alpha_{i}$ is the intercept of the $i$-th equation; $\mathbf{x}^{T}$ is the transpose of the independent variable vector $\mathbf{x}$; and $\boldsymbol{\beta}_{i}$ is the coefficient vector for the $i$-th equation. Finally, the statistical significance of the variables and the fit of the model were assessed [28]. All statistical analyses were performed in SPSS version 22 for Windows.

Table 2 shows the results of Pearson's chi-square tests. Several independent variables are not statistically significant at the 0.05 level, including "Speeding", "Fatigued driving", "Overloading of people" and "Vehicle malfunction", so these variables were discarded. Table 3 gives the Spearman's rank correlation coefficients among the remaining factors; many strong correlations are observed at the 0.01 significance level.
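The two screening steps above can be sketched in pure Python. This is an illustrative re-implementation, not the SPSS procedure the authors used. `chi2_sf` evaluates the chi-squared upper tail through the closed form of the regularized incomplete gamma function for integer degrees of freedom. `pearson_chi_square` tests independence in an r × c table, with each expected count computed as $N p$ and $p$ estimated from the row and column margins. `spearman_rho` computes Pearson's correlation on average ranks, which reduces to the $1 - 6\sum d_i^2/(m(m^2-1))$ form when there are no ties; because most variables here are binary and therefore heavily tied, the tie-aware form matters, and it is also what SciPy's `spearmanr` computes. The 4 × 2 table and binary indicators in the usage lines are hypothetical; only the table's row totals match the crash-type counts.

```python
import math
from typing import List, Sequence, Tuple


def chi2_sf(x: float, dof: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-squared distribution with a
    positive integer number of degrees of freedom, via Q(dof/2, x/2)."""
    if dof < 1:
        raise ValueError("dof must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if dof % 2 == 0:
        # Q(k, y) = exp(-y) * sum_{j=0}^{k-1} y^j / j!
        return sum(math.exp(j * math.log(y) - y - math.lgamma(j + 1))
                   for j in range(dof // 2))
    # Q(1/2, y) = erfc(sqrt(y)); then Q(s+1, y) = Q(s, y) + y^s e^-y / Gamma(s+1)
    total = math.erfc(math.sqrt(y))
    for j in range(1, (dof - 1) // 2 + 1):
        total += math.exp((j - 0.5) * math.log(y) - y - math.lgamma(j + 0.5))
    return total


def pearson_chi_square(table: Sequence[Sequence[float]]) -> Tuple[float, int, float]:
    """Chi-square test of independence; returns (statistic, dof, P-value).
    Expected count per cell = row_total * col_total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof, chi2_sf(stat, dof)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as Pearson's correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    m = len(rx)
    mean_x, mean_y = sum(rx) / m, sum(ry) / m
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical counts: rows = single-vehicle, head-on, rear-end, side impact;
# columns = factor absent, factor present
table = [[70, 30], [20, 19], [16, 10], [20, 4]]
stat, dof, p = pearson_chi_square(table)
print(f"chi2 = {stat:.2f}, dof = {dof}, P = {p:.4f}")

# Hypothetical binary indicators for two factors across eight crashes
factor_x = [1, 0, 1, 1, 0, 0, 1, 0]
factor_y = [1, 0, 1, 0, 0, 0, 1, 0]
print(f"rho = {spearman_rho(factor_x, factor_y):.3f}")
```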

Results
To reduce multicollinearity among the factors, the "Forward Selection (Likelihood Ratio)" stepwise method was employed in the multinomial logistic regression analysis, with variable entry based on the significance of the score statistic (entry criterion P ≤ 0.05) and removal based on the probability of a likelihood-ratio statistic computed from the maximum partial likelihood estimates (removal criterion P > 0.10) [29]. In the end, only five factors remained (Table 4), and the likelihood ratio test shows that all of them are statistically significant at the 0.05 level.
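The entry and removal logic of forward stepwise selection can be sketched independently of any model-fitting code. In the sketch below, `entry_p` and `removal_p` are hypothetical callbacks standing in for the score-test and likelihood-ratio P-values that SPSS obtains by refitting the multinomial model; the thresholds mirror the 0.05 entry and 0.10 removal criteria above. It is a simplified illustration rather than SPSS's implementation, and the `max_steps` guard is our own addition to stop enter/remove cycling.

```python
from typing import Callable, List, Sequence

# Hypothetical callback: (currently selected variables, variable) -> P-value
PValueFn = Callable[[List[str], str], float]


def forward_stepwise(candidates: Sequence[str], entry_p: PValueFn, removal_p: PValueFn,
                     p_enter: float = 0.05, p_remove: float = 0.10,
                     max_steps: int = 100) -> List[str]:
    selected: List[str] = []
    for _ in range(max_steps):  # guard against enter/remove cycling
        remaining = [v for v in candidates if v not in selected]
        if not remaining:
            break
        # Entry: the candidate with the smallest entry P-value, if P <= p_enter
        best_p, best_v = min((entry_p(selected, v), v) for v in remaining)
        if best_p > p_enter:
            break
        selected.append(best_v)
        # Removal: drop any selected variable whose removal P-value exceeds p_remove
        selected = [v for v in selected if removal_p(selected, v) <= p_remove]
    return selected


# Hypothetical, fixed P-values for three candidate variables
fixed = {"x1": 0.001, "x2": 0.01, "x3": 0.30}
chosen = forward_stepwise(list(fixed), lambda sel, v: fixed[v], lambda sel, v: 0.0)
print(chosen)  # prints ['x1', 'x2']
```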
The multinomial logistic regression model predicts the odds that the dependent variable (crash type) falls in one category rather than another. For this analysis there are therefore six category contrasts: (1) single-vehicle vs side impact crashes, (2) head-on vs side impact crashes, (3) rear-end vs side impact crashes, (4) single-vehicle vs head-on crashes, (5) single-vehicle vs rear-end crashes, and (6) head-on vs rear-end crashes. A multinomial logistic regression model with a four-category dependent variable directly yields three contrasts against a single reference category, and results are usually reported that way; here all six contrasts are provided to offer insights that would otherwise be hard to obtain [30]. Table 5 summarizes the parameter estimates of the final models; note that SPSS uses 1 rather than 0 as the reference level for each binary variable. The quality of the model was then examined. First, the goodness-of-fit test gives deviance P-values greater than 0.05, indicating acceptable fit [5]. Second, compared with the intercept-only model, the change in −2 log-likelihood for the final model is 174.133, well above the critical value at the 0.05 significance level for 15 degrees of freedom, which implies that the model has sufficient explanatory power [31]. In addition, the pseudo-R² values indicate a reasonable level of fit, and the overall prediction accuracy (75.1%) compares favorably with previous literature [32][33][34].
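The degrees of freedom and the significance of the likelihood-ratio statistic can be checked directly. Assuming each of the five retained factors is binary (one coefficient per equation), the final model has 5 × (4 − 1) = 15 more parameters than the intercept-only model, which accounts for the 15 degrees of freedom. The sketch reuses a closed-form chi-squared tail function for integer degrees of freedom; 24.996 is the standard tabulated 0.95 quantile of the chi-squared distribution with 15 degrees of freedom.

```python
import math


def chi2_sf(x: float, dof: int) -> float:
    """Upper-tail probability of a chi-squared distribution, integer dof."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if dof % 2 == 0:
        return sum(math.exp(j * math.log(y) - y - math.lgamma(j + 1))
                   for j in range(dof // 2))
    total = math.erfc(math.sqrt(y))
    for j in range(1, (dof - 1) // 2 + 1):
        total += math.exp((j - 0.5) * math.log(y) - y - math.lgamma(j + 0.5))
    return total


n_categories = 4                 # single-vehicle, head-on, rear-end, side impact
n_equations = n_categories - 1   # one logit equation per non-reference category
n_factors = 5                    # predictors retained by stepwise selection
dof = n_factors * n_equations    # assumes one parameter per factor per equation

lr_statistic = 174.133           # reported change in -2 log-likelihood
critical_value = 24.996          # tabulated chi-squared critical value, 15 df, alpha = 0.05
p_value = chi2_sf(lr_statistic, dof)
print(dof, lr_statistic > critical_value, p_value < 0.05)  # prints 15 True True
```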

Discussion
Taking side impact crashes as the reference type, Table 5 shows that the signs of the "Location" variable are consistently negative in all three models (single-vehicle vs side impact, head-on vs side impact, rear-end vs side impact), which indicates that intersections are more likely to have side impact SRTCs and PSRTCs. One explanation is that traffic lights are sometimes missing or malfunctioning at intersections in China, so drivers may fail to notice a vehicle approaching from the side, especially at night with poor visibility. In the data set, eight side impact SRTCs and PSRTCs occurred at intersections; at all of them the traffic lights were missing or malfunctioning, and six of the crashes occurred at night without street lights. This finding is also broadly in line with Abdel-Aty et al.'s suggestion that side impact crashes have higher injury severity than other crash types at intersections [35]. When rear-end crashes are taken as the reference type, "Location" has positive signs in all the models, which means rear-end crashes are less likely to occur at intersections. Similarly, the positive coefficients of "Roadside safety rating" mean that rear-end crashes are less likely than other crashes to happen at locations with hazardous and non-traversable roadsides (e.g. locations next to slopes/embankments and bodies of water, or with a high roadbed). The coefficients of "Overloading of cargo" are all negative; thus, freight vehicles are more likely to be involved in a rear-end crash when overloaded. A reasonable explanation is that overloading tends to cause brake failure and increases braking distance [36], which in turn leads to rear-end collisions with vehicles in front. This interpretation is supported by the crash records, which show that brake failure occurred in 11 of the 13 rear-end crashes involving "Overloading of cargo".
Compared with head-on crashes, roads with vertical grades are more likely to have single-vehicle crashes (OR = 11.688), rear-end crashes (OR = 3.472) and side impact crashes (OR = 2.513). "Driver distraction" has negative signs in the three models, which shows that distracted drivers are more likely to be involved in a head-on crash. The main cause of this kind of crash is inattentive driving or dangerous overtaking, with the driver crossing the centerline without noticing the oncoming vehicle [23]. Suitable preventive measures include appropriate road markings, overtaking-forbidden signs, and caution devices such as centerline rumble strips to raise drivers' awareness at locations prone to head-on crashes.
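Because all contrasts come from one multinomial model, odds ratios against a common reference combine multiplicatively: on the log scale, $\beta_{a\,\mathrm{vs}\,b} = \beta_{a\,\mathrm{vs}\,h} - \beta_{b\,\mathrm{vs}\,h}$, where $h$ is the head-on reference. The sketch below applies this to the three "Vertical alignment" odds ratios reported above to obtain the implied contrasts among single-vehicle, rear-end and side impact crashes. These derived values are arithmetic on the rounded published figures, not estimates reported by the authors, and the helper name is hypothetical.

```python
import math

# Odds ratios for "Vertical alignment" reported in the text, each relative to head-on
or_vs_head_on = {"single-vehicle": 11.688, "rear-end": 3.472, "side impact": 2.513}

# Work on the coefficient (log-odds) scale, where contrasts are differences
beta_vs_head_on = {k: math.log(v) for k, v in or_vs_head_on.items()}


def implied_odds_ratio(a: str, b: str) -> float:
    """OR of crash type a versus crash type b implied by the shared head-on reference."""
    return math.exp(beta_vs_head_on[a] - beta_vs_head_on[b])


for a, b in [("single-vehicle", "rear-end"),
             ("single-vehicle", "side impact"),
             ("rear-end", "side impact")]:
    print(f"{a} vs {b}: OR = {implied_odds_ratio(a, b):.3f}")
```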
For single-vehicle SRTCs and PSRTCs, the coefficients of "Driver distraction" and "Overloading of cargo" are positive, showing that these are not the main risk factors for such crashes. However, the negative signs of "Vertical alignment" and "Roadside safety rating" indicate that they are strongly associated with single-vehicle crashes. Of the 100 single-vehicle crashes, 76 occurred at locations with vertical grades and hazardous, non-traversable roadsides. Notably, 73 were run-off-road crashes at locations with missing or nonstandard roadside safety infrastructure. As providing clear zones is usually not economical on these road sections, roadside barriers of the proper type and sufficient strength are strongly recommended. Delineators, shoulder rumble strips and warning signs that alert drivers to danger should also be installed [37].

Conclusions
Eighteen main risk factors of the SRTCs and PSRTCs that occurred from 2007 to 2014 were derived from the IIASMRTA database, which is maintained by the TMRI. Multinomial logistic regression analysis was performed to compare single-vehicle, head-on, rear-end and side impact crashes in terms of the factors affecting crash occurrence. Based on the analysis results, several important findings and recommendations can be made:
1. Intersections are more likely to have side impact SRTCs and PSRTCs, especially those with poor visibility at night. Maintaining traffic lights and street lights would be a feasible measure to reduce such crashes.
2. Freight vehicles are more likely to be involved in a rear-end crash when overloaded. Since the major motivation for overloading of freight vehicles is the excessive pursuit of financial benefit, increasing regular checks by traffic police and improving transportation management to increase the income of contractors and drivers are practical ways to prevent overloading of cargo.
3. Distracted drivers are more likely to be involved in a head-on crash. The main cause of this kind of crash is inattentive driving or dangerous overtaking, with the driver crossing the centerline without noticing the oncoming vehicle. Promising preventive measures include appropriate road markings, overtaking-forbidden signs, and caution devices such as centerline rumble strips to raise drivers' awareness at locations prone to head-on crashes.
4. Vertical alignment and roadside safety rating are strongly and positively associated with single-vehicle crashes, most of which are run-off-road crashes. As providing clear zones is usually not economical on these road sections, roadside barriers of the proper type and sufficient strength are strongly recommended. Delineators, shoulder rumble strips and warning signs that alert drivers to danger should also be installed.
Despite the contributions of this work, one limitation lies in the validity and reliability of the risk factors. The risk factors collected from the IIASMRTA database were determined through in-depth investigation and analysis by the expert team. Although the authors reviewed the same evidence for most of the period under consideration and validated the factors assigned by the expert team for three crashes through on-the-spot investigation, the reliability of the factors assigned across all cases cannot be completely guaranteed.
Another limitation is the size of the data set. Information on SRTCs and PSRTCs that occurred before 2007 was unavailable in the IIASMRTA database and was therefore not analyzed in this study. Although the data set met the required sample size, a larger sample would increase the prediction accuracy of the model and allow more potential contributing factors to be investigated at various significance levels [38]; this should be a focus of future work.