Analysis of influencing factors of traffic accidents on urban ring road based on the SVM model optimized by Bayesian method

Lei Wang; Mei Xiao; Jiliang Lv; Jian Liu

doi:10.1371/journal.pone.0310044

Abstract

Fully exploring the potential factors that influence traffic accident severity on the basis of small-sample accident data from specific scenarios has become a key and effective research method. To analyze these factors for urban ring roads, this paper collected records of 1250 traffic accidents of differing severity that occurred over the past 3 years on the urban ring road of a central city in northwest China. First, the Support Vector Machine (SVM), a non-parametric method, is used to analyze the data, and models with three kernel functions (linear, inhomogeneous polynomial and Gaussian radial basis) are constructed. Taking into account 16 potential influencing factors covering the integrated driver-vehicle-road-environment system, the SVM models with the three kernel functions are verified; accuracy reaches 0.771 and F1 reaches 0.841. Then, Bayesian Optimization (BO), Grid Search (GS) and Rough Set (RS) are used as optimizers to tune the parameters of the Gaussian radial basis function SVM model. The BO-SVM model performs best, with an average accuracy of 0.875 and an F1 of 0.886 on the test set, outperforming the benchmark models GS-SVM, RS-SVM, Bilayer-LSTM and BP. Finally, sensitivity analysis is used to quantify the sensitivity of accident severity to each potential influencing factor, and the backward selection method is used to screen the core influencing factors, which are found to be age, driving mileage and vehicle type. This paper provides a reference for analyzing the significant influencing factors of road accident severity and theoretical support for the precise formulation of accident improvement strategies.

Citation: Wang L, Xiao M, Lv J, Liu J (2024) Analysis of influencing factors of traffic accidents on urban ring road based on the SVM model optimized by Bayesian method. PLoS ONE 19(9): e0310044. https://doi.org/10.1371/journal.pone.0310044

Editor: Tien Anh Tran, Vietnam Maritime University, VIET NAM

Received: May 7, 2024; Accepted: August 21, 2024; Published: September 24, 2024

Copyright: © 2024 Wang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: Data cannot be shared publicly because these data are collected by the transportation management department of Chinese government and are sensitive information related to the social governance of the city mentioned in the paper, the data is prohibited from being made public for public inquiry. According to relevant laws and regulations and data usage agreements, this data can only be used for our team's scientific research, and we cannot share it with third parties, including the journal editorial department. We have described in detail the methods and processes of data processing in the article to ensure transparency and reproducibility of the research. Data Management Secretary, Ms. Zhu, zhuqianqian@catarc.ac.cn. Researchers can contact the provided data contact person for data queries.

Funding: This research was funded by Tianjin Transportation Technology Development Plan Project (2023-7), Science & Technology Development Fund of Tianjin Education Commission for Higher Education (2023KJ246), and Tianjin Science and Technology Plan (23YDTPJC00810).

Competing interests: The authors have declared that no competing interests exist.

1. Introduction

The study of road traffic accidents serves as a cornerstone for the development of effective traffic system safety countermeasures. The complexity of the causes underlying road traffic accidents necessitates a thorough examination of the factors that influence their severity. Drawing insights from accident data to uncover the roots of traffic accident severity has emerged as a promising research theory and method.

Analyzing the intricate causes of traffic accidents necessitates a thorough examination of the pivotal factors that govern the severity of road incidents. This comprehensive understanding allows for the precise formulation of targeted traffic safety enhancement strategies, thereby effectively reducing the detrimental consequences of accidents and safeguarding the personal well-being and property interests of all road users. Traditional research, heavily reliant on discrete mathematical statistics methods like Logit/Probit, has yielded valuable insights into accident influencing factors [1]. Notably, Sheikh Manirul Islam et al. [2] used a hierarchical multinomial logit model to evaluate factors affecting severity of right-turn crash injuries at 221 signalized intersections, considering within-intersection crash correlation. Shen Xin et al. [3] employed the Multinomial Logit model to explore the severity of traffic accidents, highlighting significant contributing variables. Furthermore, Wang Peng et al. [4] employed the ordered Probit method to delve into the intricate causal relationships between a wide array of factors—encompassing environmental conditions, road attributes, vehicle specifications, and driver behavior—and the severity of rear-end collisions. Despite the proven effectiveness of discrete choice models in accident analysis, their utilization often neglects the inherent ordinality of accident severity. Moreover, the reality of incomplete accident records, imprecise indicator variables, and potential false positives can introduce biases into parameter estimations, undermining the accuracy of results [5]. Consequently, there is a pressing need to transcend these limitations and adopt advanced analytical approaches capable of capturing the complexity and nuances of traffic accident causality. 
This endeavor will not only advance our understanding of accident severity but also pave the way for the development of more effective and targeted traffic safety interventions.

Owing to their structure, traditional discrete models not only demand high data quality, for example careful data collation in the preparation stage, but also apply only under restrictive conditions. To address these problems, some scholars have used non-parametric methods to analyze accident influencing factors, including classification and regression models, Bayesian network models and neural network models. Studies have shown that non-parametric methods achieve better statistical fit and generalization performance in accident analysis. For example, Sun Yixuan et al. [6] established a decision tree model to analyze differences in the factors that influence road accident severity. Li Qingquan et al. [7] used a fuzzy support vector machine (FSVM) to predict road accident status. Shi Juan et al. [8] used a back-propagation neural network to analyze the unsafe factors faced by workers and established a prediction model of accident injury severity that achieved good performance. Nonetheless, existing research has limitations: the sample datasets may be small and may lack comprehensive records of casualties, injury locations and vehicle conditions, as noted in the reference article. Furthermore, the categorization of road accident severity into a finite number of classes could be refined to give a more detailed picture of accident outcomes.

Despite these limitations, non-parametric methods offer a promising avenue for advancing beyond traditional discrete models. They empower researchers to delve deeper into the intricate web of factors influencing road accident severity, thereby informing the formulation of more targeted and effective traffic safety strategies. This paper employs an enhanced Support Vector Machine (SVM) model to analyze urban ring road accident data and uncover the impact of latent influencing factors on accident severity. Through the construction of various SVM models, including those with linear kernel functions, inhomogeneous polynomial kernel functions, and Gaussian radial basis kernel functions, this study comprehensively explores the effects of 16 potential influencing factors spanning the four primary aspects of humans, vehicles, roads, and the environment on accident severity. This approach can overcome the limitations of traditional discrete choice models, enhancing the accuracy of accident severity predictions. Additionally, it provides a fresh perspective and tool for traffic accident analysis, facilitating a deeper understanding of accident mechanisms and influencing factors. Based on the precise prediction model proposed in this paper, traffic management departments can develop more targeted safety strategies, thereby effectively reducing the severity of ring road traffic accidents and protecting public life and property safety.

2. Data description and processing

2.1 Data description

The data utilized in this paper comprises 1250 data records of traffic road safety accidents that occurred over the past 3 years on the urban ring roads of a central city in northwest China. As detailed in Table 1, these records encapsulate essential information pertaining to the accidents’ characteristics, the parties involved and their casualties, vehicle type specifics, and additional supplementary data. To ensure the utmost precision in analyzing the interplay between accident severity and its influencing factors, a rigorous data cleansing process was undertaken. This involved discarding attributes characterized by high missing rates, such as those related to the transportation of hazardous materials or vehicle modifications, along with attributes displaying low correlation with accident severity, like vehicle registration numbers, colors, and investigator details. Recognizing the paramount importance of the closed-loop driving system—encompassing drivers, vehicles, roads, and the environment—in shaping accident severity, these components were designated as independent variables for subsequent modeling, with accident severity itself serving as the dependent variable.

Table 1. List of influencing factors.

https://doi.org/10.1371/journal.pone.0310044.t001

2.2 Data processing

Data imbalance, caused by values of a potential influencing factor that account for an excessively large or small share of the records, degrades the analysis performance of the model; merging values with similar meanings can compensate for this imbalance [9]. To prevent any value of an attribute from having too small a share, the lower limit on the proportion of each attribute value is set to 10% in this paper [10]. For example, owing to improved road maintenance and management, "snow and ice" rarely occurs on the road. Because the proportion of "snow and ice" in the road condition attribute is below this lower limit, as shown in Table 1, it is merged into "others".
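The merging rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' preprocessing code: the function name `merge_rare_values`, the `"others"` label handling and the category counts are assumptions chosen only to show the 10% threshold at work.

```python
from collections import Counter

def merge_rare_values(values, min_share=0.10, other_label="others"):
    """Relabel every value whose share of the records is below min_share."""
    counts = Counter(values)
    total = len(values)
    rare = {v for v, n in counts.items() if n / total < min_share}
    return [other_label if v in rare else v for v in values]

# Hypothetical road-condition column (made-up counts, not the paper's data).
road_condition = (
    ["dry"] * 62
    + ["slippery"] * 15
    + ["waterlogged"] * 12
    + ["others"] * 8
    + ["snow and ice"] * 3
)

merged = merge_rare_values(road_condition)
merged_counts = Counter(merged)  # "snow and ice" is now counted under "others"
```

Note that a merged bucket can itself remain below the threshold if too few rare values exist; in that case a coarser regrouping of the attribute would be needed.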

3. Theoretical methods

3.1 Data inspection

Reviewing the current state of research, most scholars rely on traditional discrete choice models, namely Logit/Probit models, to analyze traffic accident severity, but little work verifies the conditions under which these models apply. Traditional discrete choice models have strict conditions of applicability: the Multinomial Logit model requires the independence of irrelevant alternatives (IIA) assumption to hold, while the ordered Logit model requires the parallel lines assumption. In most studies, road accident severity is divided into “only property damage accident”, “injury accident” and “fatal accident”. Using SPSS, the Small-Hsiao [11] method is applied to test the IIA hypothesis, and the Hausman [12] method is applied to test the parallel lines hypothesis. The test results are given in Tables 2 and 3.

Table 2. Small-Hsiao independence hypothesis test.

https://doi.org/10.1371/journal.pone.0310044.t002

Table 3. Hausman parallel line hypothesis test.

https://doi.org/10.1371/journal.pone.0310044.t003

Tables 2 and 3 show that the data satisfy neither the independence of irrelevant alternatives hypothesis nor the parallel lines hypothesis, so the traditional discrete choice models cannot be used to analyze the accident data. Therefore, a non-parametric model is used to analyze the data.

3.2 SVM model

The SVM model is a non-parametric approach to classification problems based on statistical learning theory. For a binary classification problem, it seeks a hyperplane that separates the two sets of samples, where the hyperplane equation can be expressed as follows: (1)

Among all separating hyperplanes there is an optimal separating hyperplane, which is uniquely determined by the margin vectors. When the data are not linearly separable, training the SVM model is therefore transformed into solving the following optimization problem: (2)

In Eq (2), C and ξ_i denote the penalty factor for classification errors and the slack variable for classification errors, respectively. The value of C must satisfy the constraints above.

Lagrange multipliers are then introduced to obtain the corresponding classification decision function: (3)

In Eq (3), α_i, x_i and b denote the Lagrange multipliers, the support vectors and a real number, respectively, where b is the offset of the hyperplane. Since the data are not linearly separable, they must be mapped into a high-dimensional feature space. A kernel function can be used for this nonlinear transformation: (4)

For multi-classification problems, SVM follows the principle of structural risk minimization. It handles nonlinear as well as linear classification by using a kernel function to map the feature space, and it determines the optimal hyperplane in the mapped feature space that maximizes the classification margin; the choice of kernel function changes this mapping. Commonly used kernel functions are the linear kernel function k_l, the inhomogeneous polynomial kernel function k_p and the Gaussian radial basis kernel function k_r, as shown in formula (5): (5a) (5b) (5c)
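For reference, the standard textbook forms of the soft-margin SVM that match the descriptions of Eqs (1) to (5) can be written as below. This is a reconstruction under stated assumptions, not the authors' typeset equations: the symbols w, y_i, n, φ, c and d, and the conditions C > 0 and c > 0, are conventional choices introduced here, and the authors' notation and parameterization (especially of the polynomial kernel) may differ.

```latex
% (1) separating hyperplane
w^{\mathsf T} x + b = 0

% (2) soft-margin optimization problem, with penalty factor C > 0
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad y_i\left(w^{\mathsf T} x_i + b\right) \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad i = 1,\dots,n

% (3) classification decision function
f(x) = \operatorname{sgn}\!\left(\sum_{i=1}^{n} \alpha_i\, y_i\, x_i^{\mathsf T} x + b\right)

% (4) kernel function as an inner product in the mapped feature space
K(x_i, x_j) = \varphi(x_i)^{\mathsf T} \varphi(x_j)

% (5a) linear, (5b) inhomogeneous polynomial, (5c) Gaussian radial basis
k_l(x_i, x_j) = x_i^{\mathsf T} x_j
k_p(x_i, x_j) = \left(x_i^{\mathsf T} x_j + c\right)^{d},\quad c > 0
k_r(x_i, x_j) = \exp\!\left(-\gamma \lVert x_i - x_j \rVert^{2}\right)
```

In these forms the linear kernel has no kernel parameter, the polynomial kernel has c and d, and the Gaussian radial basis kernel has γ; the penalty factor C from (2) is shared by all three.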

Combining formulas (2), (4) and (5) shows that, whichever of the three kernel functions is used, the corresponding kernel parameters must be determined.
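To make the role of the kernel parameters concrete, the three kernels can be sketched as plain Python functions. This is an illustrative sketch only: the parameter names `c`, `d` and `gamma`, their default values and the sample vectors are assumptions, and the paper's own experiments were run in Matlab.

```python
import math

def linear_kernel(x, z):
    """Linear kernel: the plain inner product, no kernel parameter."""
    return sum(a * b for a, b in zip(x, z))

def polynomial_kernel(x, z, c=1.0, d=2):
    """Inhomogeneous polynomial kernel with assumed offset c and degree d."""
    return (linear_kernel(x, z) + c) ** d

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian radial basis kernel with assumed width parameter gamma."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)

# Two made-up feature vectors, used only to exercise the functions.
x, z = [1.0, 2.0], [3.0, 0.0]
k_lin = linear_kernel(x, z)       # inner product: 1*3 + 2*0
k_poly = polynomial_kernel(x, z)  # (inner product + 1) squared
k_rbf = rbf_kernel(x, z)          # exp(-0.5 * squared distance)
```

A larger `gamma` makes the radial basis kernel decay faster with distance, which is one reason the choice of γ matters for the later tuning step.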

The SVM model cannot be applied directly to multi-classification problems, which must instead be decomposed step by step. Generally, there are three ways to handle multi-classification problems with this model:

Cortes: for K categories, one category is first selected and treated as one class while the remaining K-1 categories are treated as the other class, and K SVM models are constructed step by step. This method is called "one-to-other".
Knerr [13]: for a K-class problem, two categories are selected each time, and a binary SVM model is designed for each pair of categories. This method is called "one-to-one".
Platt [14]: with the help of decision tree theory, a binary decision tree is combined with the SVM model to build a new multi-class recognizer.
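The first two decomposition schemes can be illustrated with a short sketch that enumerates the binary sub-problems for the three severity classes and combines "one-to-one" outputs by majority vote. The helper names and the pairwise outputs below are hypothetical, and majority voting is assumed here as the combination rule because the text does not specify one.

```python
from collections import Counter
from itertools import combinations

PDO, INJURY, FATAL = (
    "only property damage accident",
    "injury accident",
    "fatal accident",
)
SEVERITY = [PDO, INJURY, FATAL]

def one_to_other_tasks(classes):
    """One binary task per class: that class against all remaining classes."""
    return [(c, [o for o in classes if o != c]) for c in classes]

def one_to_one_tasks(classes):
    """One binary task per unordered pair of classes: K*(K-1)/2 in total."""
    return list(combinations(classes, 2))

def one_to_one_vote(pairwise_winners):
    """Majority vote over the winners reported by the pairwise classifiers."""
    votes = Counter(pairwise_winners.values())
    return votes.most_common(1)[0][0]

ovr_tasks = one_to_other_tasks(SEVERITY)
ovo_tasks = one_to_one_tasks(SEVERITY)

# Hypothetical outputs of the three pairwise classifiers for one record.
winners = {(PDO, INJURY): INJURY, (PDO, FATAL): PDO, (INJURY, FATAL): INJURY}
predicted = one_to_one_vote(winners)
```

With three severity classes both schemes need three binary models, but the counts diverge quickly as K grows (for K = 4, four against six).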

4. Model verification and result analysis

4.1 Model verification

When the SVM model is applied to multi-classification problems, the "one-to-one" method is widely used because of its advantages, so it is adopted to analyze the road accident data in this study. In repeated tests of the SVM model with the three kernel functions, the average prediction accuracy of the corresponding classification models was highest when 10-fold cross-validation was used. In addition to using the average prediction accuracy over “only property damage accident”, “injury accident” and “fatal accident” as the basis for judging the classification accuracy of the three SVM models, this paper introduces the F1 index, computed from the confusion matrix, as the standard for evaluating generalization performance. Table 4 shows the prediction performance of the SVM model with the three kernel functions.
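As an illustration of the two evaluation indices, the sketch below derives accuracy and F1 from a three-class confusion matrix. The matrix entries are made up for demonstration, and macro-averaging of the per-class F1 scores is an assumption, since the text does not state how F1 is aggregated across the three severity classes.

```python
def accuracy(cm):
    """Share of records on the diagonal (rows: true class, columns: predicted)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total

def per_class_f1(cm):
    """F1 = 2TP / (2TP + FP + FN), computed separately for each class."""
    k = len(cm)
    scores = []
    for i in range(k):
        tp = cm[i][i]
        fp = sum(cm[r][i] for r in range(k)) - tp
        fn = sum(cm[i]) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores

def macro_f1(cm):
    """Unweighted mean of the per-class F1 scores (assumed aggregation)."""
    scores = per_class_f1(cm)
    return sum(scores) / len(scores)

# Made-up confusion matrix for property damage, injury and fatal accidents.
cm = [
    [50, 8, 2],
    [6, 30, 4],
    [1, 3, 21],
]
acc = accuracy(cm)
f1_scores = per_class_f1(cm)
f1_macro = macro_f1(cm)
```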

Table 4. SVM prediction performance corresponding to different kernel functions.

https://doi.org/10.1371/journal.pone.0310044.t004

From the classification performance of the three kernel functions above, it is clear that the SVM model with the Gaussian radial basis function has the highest classification accuracy. When dealing with nonlinear classification problems, the kernel parameter γ and penalty factor C of the Gaussian radial basis function influence the distribution of the mapped space. The value of γ also affects the number of support vectors: the more support vectors are obtained, the more prone the model is to under-fitting, whereas the fewer support vectors are obtained, the more prone it is to over-fitting [15]. These two parameters affect not only the training accuracy of the model but also its training speed. Therefore, finding the optimal kernel parameter γ and penalty factor C is extremely important [16].

4.2 Model parameter tuning

In order to further explore the impact of kernel parameters of Gaussian radial basis function on the classification performance of SVM models, three classification prediction models, BO-SVM, GS-SVM and RS-SVM, with Bayesian Optimization (BO), Grid Search (GS) and Rough Set (RS) as optimizers, are established respectively. The optimization parameters and prediction performance of the three models are shown in Table 5.

Table 5. Three improved SVM model performance.

https://doi.org/10.1371/journal.pone.0310044.t005

Comparing Tables 4 and 5 shows that the optimizers further improve the performance of the Gaussian radial basis function SVM model. Among the three optimized models, BO-SVM performs best, ahead of GS-SVM and RS-SVM. Table 5 shows that all three classification prediction models perform well on the data set, with test set accuracy above 80%, but BO-SVM performs best on both the training set and the test set. The optimal parameters of the three models were obtained through 10-fold cross-validation. The parameters of the SVM models optimized by GS and RS are larger than those obtained by BO, and these models are over-fitted, which ultimately leads to poorer performance on the test set. Specifically, the effectiveness of the RS algorithm in optimizing parameters depends on the number of samples drawn from the sample feature space; the sampling results easily fall into local optima, and the optimization efficiency is low. The GS algorithm, in turn, cannot traverse all values of C and γ; obtaining the optimal parameter combination requires a denser search grid and more computing resources.

To further explore the impact of kernel parameters in the Gaussian radial basis function on the classification performance of the SVM model, we established three classification prediction models, BO-SVM, GS-SVM and RS-SVM, using Bayesian Optimization (BO), Grid Search (GS) and Rough Set (RS) as optimizers, respectively. To evaluate these models comprehensively, in addition to comparing the three SVM variants, we also introduced two deep learning models, a double-layer long short-term memory neural network (Bilayer-LSTM) [17] and a Back-propagation Neural Network (BP), as comparative algorithms. All five prediction algorithms were trained on a Windows 10 system using Matlab R2023a. For robust model evaluation, this paper split the HighD dataset 8:1:1 into training, validation and testing sets, and all models adopted 10-fold cross-validation. The deep learning environment was configured as follows: Deep Learning Toolbox 23.2 was used as the learning framework, running on hardware equipped with an Intel Xeon W-2295 CPU and an NVIDIA RTX A2000 GPU. For the deep learning models, we employed the Adam algorithm as the optimizer, with a learning rate of 0.001, a dropout rate of 0.2 to prevent over-fitting, a batch size of 128 and 300 training epochs. Comparing Tables 4 and 5, we can draw the following conclusions.

The performance of SVM models with Gaussian radial basis kernel functions is further improved by the use of an optimizer. Among the SVM variants, the BO-SVM model performs best compared with GS-SVM and RS-SVM. All five classification prediction models perform well on the dataset, with test set accuracy exceeding 80%. However, BO-SVM demonstrates the best performance in all aspects.
Through 10-fold cross-validation, the optimal parameters for the three traditional machine learning models, BO-SVM, GS-SVM, and RS-SVM were obtained. Analysis of these optimal parameters reveals that the SVM model parameters obtained using GS and RS optimization are relatively large compared to those obtained using BO. This results in model over-fitting, ultimately leading to poor performance on the test set. Specifically, the effectiveness of the RS algorithm in parameter optimization depends on the sampling frequency of the sample feature space, and the sampling results can easily lead to local optima with lower optimization efficiency. On the other hand, the GS algorithm cannot iterate through all possible combinations of C and γ [18, 19]. To obtain the optimal parameter combination, it requires increasing the grid search density, demanding more computational resources. These factors could explain why GS-SVM and RS-SVM do not exhibit optimal performance on the dataset.
Although the Bilayer-LSTM and BP models possess strong learning capabilities, their performance in this experiment did not surpass that of the BO-SVM model with parameter tuning. This can be attributed to several reasons. First, considering the data scale and characteristics, the potential of deep learning models may not be fully realized on smaller datasets or less complex feature spaces, whereas traditional machine learning algorithms such as SVM perform more robustly and effectively in such scenarios. Second, owing to their higher complexity, deep learning models place greater demands on data and computational resources; limitations in either may lead to over-fitting or inadequate training, thereby affecting performance. Finally, BO-SVM fine-tunes its kernel parameters through Bayesian optimization, giving it an advantage in specific tasks, whereas the performance of deep learning models relies more on the selection of hyperparameters and optimization strategies. The BP model performs slightly better than Bilayer-LSTM because the dataset does not contain strictly time-sequenced data.
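The grid-density limitation noted in the second conclusion can be illustrated with a toy example. The function `toy_cv_accuracy` below is a hypothetical smooth surface standing in for the 10-fold cross-validation accuracy of a Gaussian radial basis SVM over log10(C) and log10(γ); its shape, its peak location and the search ranges are assumptions, and the sketch shows only grid search, not the authors' BO or RS procedures.

```python
from itertools import product

def toy_cv_accuracy(log_c, log_gamma):
    """Hypothetical stand-in for cross-validated accuracy (peak placed off-grid)."""
    return 0.90 - 0.05 * ((log_c - 0.5) ** 2 + (log_gamma + 1.2) ** 2)

def linspace(lo, hi, n):
    """n evenly spaced values from lo to hi inclusive."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]

def grid_search(score, log_c_values, log_gamma_values):
    """Evaluate every grid node; return (best node, best score, evaluations)."""
    best = max(product(log_c_values, log_gamma_values), key=lambda p: score(*p))
    return best, score(*best), len(log_c_values) * len(log_gamma_values)

# Coarse grid: 5 x 5 nodes. Dense grid: 41 x 41 nodes over the same ranges.
coarse = grid_search(toy_cv_accuracy, linspace(-2, 2, 5), linspace(-3, 1, 5))
dense = grid_search(toy_cv_accuracy, linspace(-2, 2, 41), linspace(-3, 1, 41))
```

In this toy setting the dense grid recovers the peak but needs 1681 evaluations instead of 25, which mirrors the trade-off between search density and computing resources described above.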

4.3 Analysis of potential influencing factors

4.3.1 Sensitivity analysis of influencing factors.

Commonly used methods for explaining the influence of independent variables on a response variable include sensitivity analysis [20, 21], the marginal influence coefficient [22] and the backward selection method [23]. Sensitivity analysis and the marginal influence coefficient method work on a similar principle. First, the value of each independent variable is changed one at a time while the remaining independent variables are kept unchanged. Second, the change in the average probability of each accident severity level after each modification is calculated, which measures the relationship between the potential influencing variable and accident severity. The results of the sensitivity analysis are shown in Table 6. However, sensitivity analysis and the marginal influence coefficient reflect only the single-factor relationship between a potential influencing variable and accident severity; they cannot reflect the effect of interactions among multiple variables. Therefore, sensitivity analysis is used to quantify the influence of each single factor on accident severity, and the backward selection method is adopted to explore the effect of interactions among influencing variables and to reveal the combination of variables that most influences road accident severity.
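The one-at-a-time procedure can be sketched as follows. This is one possible reading of the description, not the authors' implementation: `toy_predict_proba` is a hypothetical stand-in for a trained classifier that returns probabilities for the three severity levels, and the records, factor names and effect sizes are made up.

```python
def toy_predict_proba(record):
    """Hypothetical model: (property damage, injury, fatal) probabilities."""
    injury, fatal = 0.30, 0.10
    if record["vehicle_type"] == "truck":
        injury += 0.05
        fatal += 0.05
    if record["lighting"] == "yes":
        injury -= 0.03
        fatal -= 0.02
    return (1.0 - injury - fatal, injury, fatal)

def mean_proba(model, records):
    """Average predicted probability of each severity level over all records."""
    sums = [0.0, 0.0, 0.0]
    for record in records:
        for k, p in enumerate(model(record)):
            sums[k] += p
    return [s / len(records) for s in sums]

def sensitivity(model, records, factor, reference, alternative):
    """Change in mean class probabilities when only `factor` is switched
    from its reference level to an alternative level in every record."""
    base = mean_proba(model, [dict(r, **{factor: reference}) for r in records])
    changed = mean_proba(model, [dict(r, **{factor: alternative}) for r in records])
    return [c - b for b, c in zip(base, changed)]

records = [
    {"vehicle_type": "car", "lighting": "yes"},
    {"vehicle_type": "car", "lighting": "no"},
    {"vehicle_type": "truck", "lighting": "yes"},
    {"vehicle_type": "truck", "lighting": "no"},
]
delta = sensitivity(toy_predict_proba, records, "vehicle_type", "car", "truck")
```

Because the three probabilities sum to one for every record, the three changes in `delta` sum to zero, so an increase for one severity level is always offset by decreases elsewhere.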

Table 6. Sensitivity analysis results.

https://doi.org/10.1371/journal.pone.0310044.t006

Table 6 shows that the same potential influencing factor affects different accident severity levels to different degrees, and that different potential influencing factors affect accidents of the same severity level to different degrees [24]. The influence of the potential factors on road accident severity is analyzed below from four aspects: driver, vehicle, road and environment.

(1) Driver factor. Driver age is divided into three stages, with the group under 30 years old as the reference group. Compared with the reference group, the fatal accident rate of the other two age groups is lower by 0.0063 and 0.0293 respectively, indicating that middle-aged drivers are more cautious and less likely than young drivers to cause serious accidents. However, the proportion of injuries caused by these two groups is higher by 0.0073 and 0.1503 respectively, indicating that middle-aged people are more vulnerable to injury than young people in accidents. For driver gender, taking female drivers as the reference group, the ratios of injury accidents and fatal accidents caused by male drivers are lower by 0.0086 and 0.0056 respectively.

This shows that male drivers are calmer than female drivers in accidents and better able to deal with emergencies. However, the proportion of "only property damage accident" caused by male drivers is 0.0155 higher than that of female drivers. This may be because male drivers are more prone to “angry” driving in serious traffic jams; maneuvers such as forced cutting-in and forced lane changes are more likely to cause "scratch" accidents. Driving age is a deficient measure of whether a driver is skilled, and driving mileage is a more practical standard. Drivers with a driving mileage of 40000 km can be considered skilled drivers and avoid casualty accidents more often than unskilled drivers. Taking unskilled drivers as the reference group, the ratios of casualty accidents caused by skilled drivers are lower by 0.0135 and 0.0896 respectively, so driver training should be strengthened. Perfecting the driver training system and improving drivers' own competence are very important for reducing the occurrence of injury and fatal accidents.

(2) Vehicle factor. Among accidents caused by violations of rules and regulations, taking motor vehicle violations (including not following road markings, not giving way to pedestrians and running yellow lights) as the reference group, the ratios of property damage and injury accidents caused by non-motor vehicle violations are higher by 0.0283 and 0.0095 respectively, indicating that non-motor vehicles are more likely to cause property-damage-only accidents or injury accidents involving pedestrians or non-motor vehicle drivers. In addition, the rate of fatal accidents caused by non-motor vehicles is lower by 0.0053, indicating that non-motor vehicles are more likely to be involved in accidents with other non-motor vehicles or pedestrians. Compared with motor vehicle violations, pedestrian violations are more likely to cause serious injuries to pedestrians, indicating that pedestrians are the road users most vulnerable to serious injury and should improve their safety awareness and self-cultivation. Among accidents involving trucks, trucks are more likely to cause serious accidents and casualties: the ratios of casualties are higher by 0.0155 and 0.0263 respectively compared with cars. The reason is that, owing to their large inertia, trucks are more likely to seriously injure surrounding road users in an accident. To reduce the consequences of serious truck accidents, truck routes and travel times can be reasonably restricted to avoid periods of heavy traffic. The type of motor vehicle accident is also a potential influencing factor of road accident severity.
Taking rear-end collisions as the reference group, side collisions are more likely than rear-end collisions to cause minor traffic accidents: the ratio of "only property damage accident" is higher by 0.0205. Compared with rear-end collisions, however, frontal collisions are more likely to cause casualties; specifically, the ratios of casualties caused by frontal collisions are higher by 0.0715 and 0.0010 respectively. The reason for the higher probability of casualties in frontal collisions is that drivers or their passengers have weak traffic safety awareness and have not taken effective passive safety measures, such as wearing a seat belt. Therefore, seat belt use needs to be enforced not only for front-seat occupants but also, and especially, for rear passengers.

(3) Road factor. The road itself is also a potentially important factor in road accidents. Compared with roads without lighting, roads with lighting have a lower rate of casualty accidents. Lighting gives drivers a broader view at night, and drivers on well-lit roads do not need to use high beams, which reduces the threat of dazzle and even "instant blindness" that high beams cause to drivers. Beyond lighting conditions themselves, drivers are also very sensitive to changes between light and dark. Especially in urban tunnel sections with saturated traffic flow, accidents in the tunnel are more likely to cause serious chain collisions at the tunnel entrance. To prevent this, a light transition section at the tunnel entrance should be considered at the design stage to help drivers adapt to the change in light. In addition, adding separation facilities between motor vehicles and non-motor vehicles in the same lane can reduce the ratios of injuries and fatalities by 0.0155 and 0.0265 respectively. Since non-motor vehicles are more vulnerable to injury than motor vehicles when driving on the road, separating motor vehicles and non-motor vehicles on the same lane is particularly important.

(4) Traffic environment factor. The traffic environment is also one of the potential factors behind traffic accidents of different severity. The environmental factors considered in this study include weather, road surface, season and holiday traffic flow. For weather, visibility greater than 200 m is taken as the reference group. Comparing accident severity across visibility levels shows that the rate of casualty accidents decreases to varying degrees as visibility falls. The reason is that drivers become more cautious and deliberately reduce their speed when visibility is reduced, so that even if an accident occurs it tends to be less severe. Changes in road surface conditions also influence driving speed. Taking dry roads as the reference group, slippery, waterlogged, and snowy or icy roads all lead drivers to reduce their speed. It is therefore not surprising that these three road conditions reduce the ratio of fatal accidents by 0.1032, 0.0542 and 0.1278, respectively.

The saturation of traffic flow is also an important factor affecting the severity of road accidents. Because license plate restrictions are not applied on holidays in large Chinese cities, the traffic environment becomes more congested and the traffic flow more saturated on those days. However, because vehicle speeds are lower, the rate of fatalities and injuries is lower than on workdays.
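The changes in ratio reported in this subsection can be read as shifts in the predicted share of each severity class when one factor is moved from its reference level to another level. The sketch below illustrates that idea in plain Python. It is only an assumed reading of the procedure, not the authors' implementation: `toy_predict` is a hypothetical stand-in for the trained BO-SVM classifier, and the record fields (`collision`, `belt`) and class labels are invented for illustration.

```python
from collections import Counter
from typing import Callable, Dict, List

Record = Dict[str, str]


def class_shares(records: List[Record],
                 predict: Callable[[Record], str]) -> Dict[str, float]:
    """Share of records assigned to each severity class by `predict`."""
    counts = Counter(predict(r) for r in records)
    return {label: c / len(records) for label, c in counts.items()}


def share_change(records: List[Record], factor: str, reference: str,
                 alternative: str,
                 predict: Callable[[Record], str]) -> Dict[str, float]:
    """Change in predicted class shares when `factor` is switched from the
    reference level to the alternative level for every record."""
    ref = class_shares([{**r, factor: reference} for r in records], predict)
    alt = class_shares([{**r, factor: alternative} for r in records], predict)
    labels = set(ref) | set(alt)
    return {lab: alt.get(lab, 0.0) - ref.get(lab, 0.0) for lab in labels}


def toy_predict(record: Record) -> str:
    # Hypothetical rule standing in for the fitted classifier.
    if record["collision"] == "frontal" and record["belt"] == "no":
        return "injury"
    return "property_damage"


records = [
    {"collision": "rear_end", "belt": "no"},
    {"collision": "side", "belt": "yes"},
    {"collision": "rear_end", "belt": "yes"},
    {"collision": "frontal", "belt": "no"},
]
change = share_change(records, "collision", "rear_end", "frontal", toy_predict)
print(change)
```

A positive value for a class means that the alternative level raises the predicted share of that class relative to the reference group, and a negative value means that it lowers it.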

4.3.2 Backward selection method analysis.

The backward selection method proceeds as follows. First, all potential influencing factors are used to build the BO-SVM model, giving a 10-fold cross-validation accuracy y₁. Second, the influencing factors are ranked by significance, as shown in Table 7, and the lowest-ranked factor is eliminated; the BO-SVM model is then refitted to obtain a 10-fold cross-validation accuracy y₂. This step is repeated until all influencing factors have been eliminated, yielding a series of prediction models M_i,j (i = 1,2,3; j = 1,2,…, 16) and the corresponding accuracies y_i,j (i = 1,2,3; j = 1,2,…, 16). The performance of the backward selection models is shown in Table 8.
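The elimination loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: `score_fn` is a hypothetical callable standing in for "fit BO-SVM on this factor subset and return its 10-fold cross-validation accuracy", the toy weights are invented, and the fixed `tolerance` cut-off is a simplification of the significance test (P values) that the paper reports.

```python
from typing import Callable, List, Sequence, Tuple

Subset = Tuple[str, ...]


def backward_selection(
    ranked_factors: Sequence[str],
    score_fn: Callable[[Subset], float],
) -> List[Tuple[Subset, float]]:
    """Score the full factor set, then drop the lowest-ranked factor
    one at a time and re-score each remaining subset."""
    remaining = list(ranked_factors)  # most significant factor first
    results = []
    while remaining:
        subset = tuple(remaining)
        results.append((subset, score_fn(subset)))
        remaining.pop()  # eliminate the currently lowest-ranked factor
    return results


def core_factors(results: List[Tuple[Subset, float]],
                 tolerance: float) -> Subset:
    """Smallest subset, in elimination order, whose score stays within
    `tolerance` of the full model's score."""
    full_score = results[0][1]
    core = results[0][0]
    for subset, score in results:
        if full_score - score <= tolerance:
            core = subset
        else:
            break
    return core


# Hypothetical contributions used only to make the example runnable.
toy_weights = {"age": 0.30, "driving_mileage": 0.25, "vehicle_type": 0.20,
               "weather": 0.01, "season": 0.005}


def toy_score(subset: Subset) -> float:
    return 0.1 + sum(toy_weights[f] for f in subset)


ranked = sorted(toy_weights, key=toy_weights.get, reverse=True)
results = backward_selection(ranked, toy_score)
core = core_factors(results, tolerance=0.02)
print(core)
```

In this toy run, removing the two lowest-ranked factors barely changes the score, while removing the third causes a clear drop, so the first three factors are retained as the core set.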


Table 7. Influential factor significance ranking.

https://doi.org/10.1371/journal.pone.0310044.t007


Table 8. Backward selection method model performance comparison.

https://doi.org/10.1371/journal.pone.0310044.t008

Using the backward selection method, the P values and accuracies of the "only property damage accident" models are generated. As shown in Table 8, models M_1,8 to M_1,15 differ significantly from M_1,1, and their accuracy is significantly lower. Therefore, according to Table 8, the top seven core factors that affect the severity of "only property damage accidents" are: accident type, accident period, accident lane, vehicle type, driving mileage, week day, and age. Similarly, the top eight core factors affecting the severity of "injury accidents" are: driving mileage, risky driving, accident lane, gender, vehicle type, age, road conditions, and violations. The top eight core factors affecting the severity of "fatal accidents" are: age, accident type, gender, vehicle type, road conditions, road lighting, risky driving, and violations.

Drivers are the largest source of uncertainty in traffic operation safety. Some studies have found that human factors account for up to 90% of the factors related to traffic accidents [25, 26]. The driver factors considered in this study include age, driving mileage, and gender. The vehicle factors include the type of vehicle involved (trucks, cars, and non-motor vehicles). Using the backward selection method, age, driving mileage, and vehicle type were ultimately selected as the core influencing factors. Drivers’ physical functions, such as their ability to obtain and process information and their agility, decline to varying degrees as they age. Driver health monitoring measures, such as regular physical examinations and mandatory rest after more than 4 hours of continuous driving, can be adopted to protect drivers. As driving mileage increases, drivers become better able to handle unexpected events and their vehicle’s performance. It is recommended that truck drivers receive additional training time beyond the current driving test to further improve their driving skills and psychological resilience. In addition, trucks account for a large proportion of the factors that cause serious casualties. Where conditions permit, the travel times and lanes of trucks on urban roads should be restricted so that trucks avoid heavy vehicle and pedestrian flows as far as possible.

5. Conclusion

To analyze the potential influencing factors of the severity of traffic accidents on urban ring roads and provide theoretical support for the precise formulation of road accident improvement strategies, this paper explores the application of Support Vector Machine (SVM) models based on optimization strategies in predicting and classifying the severity of traffic accidents, mainly drawing the following research conclusions.

By employing a linear kernel function, a non-homogeneous polynomial kernel function, and a Gaussian radial basis function, we have validated the SVM model’s prediction and generalization abilities in the multi-class classification of traffic accident severity. The SVM model with a Gaussian radial basis kernel performs best. Furthermore, tuning its parameters with the BO optimizer, which gives the BO-SVM model, yields additional performance gains.
The non-parametric BO-SVM model, built on the Gaussian radial basis kernel function, serves as a valuable complement to conventional discrete choice models such as Logit/Probit. Through sensitivity analysis, we have quantified the impact of 16 potential factors covering the integrated driver-vehicle-road-environment system on the likelihood of accidents of varying severity.
By adopting the backward selection technique, the paper identifies the key factors that significantly influence accident severity for "only property damage accidents", "injury accidents", and "fatal accidents". This analysis provides insight into the determinants of accident severity and improves our understanding of the factors that shape the outcomes of traffic accidents.

Although our study offers important contributions, we must also acknowledge its limitations. The sample size is limited. The sample data lack detailed records of casualties, and the injury site and injury severity are not described in detail. The records also lack a detailed description of vehicle conditions. Finally, accident severity is divided into only three grades, which needs to be further refined and improved.

References

1. Hosseinpour M, Sahebi S, Zamzuri Z H, et al. Predicting crash frequency for multi-vehicle collision types using multivariate Poisson-lognormal spatial model: A comparative analysis[J]. Accident Analysis & Prevention, 2018, 118(SEP.):277–288. pmid:29861069
2. Islam S M, Washington S, Kim J, et al. A hierarchical multinomial logit model to examine the effects of signal strategies on right-turn crash injury severity at signalised intersections[J]. Accident Analysis & Prevention, 2023, 188: 107091. pmid:37150130
3. SHEN Xin, SHEN Jinxing, ZHENG Changjiang, et al. Severity Analysis of Slow Traffic Accidents in North Carolina Based on Multinomial Logit Model[J]. Traffic and Transportation. 2021(05):24–28.
4. WANG Peng, LU Xiaozhao, YAN Zhangcun, et al. Analysis on Influencing Factors of Rear-end Crash Severity Based on Ordered Probit Model[J]. Journal of Highway and Transportation Research and Development, 2018, 35(4): 102–107, 122.
5. Hazimeh H, Mazumder R, Radchenko P. Grouped variable selection with discrete optimization: Computational and statistical perspectives[J]. The Annals of Statistics, 2023, 51(1): 1–32.
6. SUN Yixuan, SHAO Chunfu, ZHAO Dan, et al. Traffic accident severity prediction model based on C5.0 decision tree[J]. Journal of Chang’an University (Natural Science Edition). 2014(05):109–116.
7. LI Qingquan, GAO Dequan, YANG Bisheng, et al. Urban road traffic status classification based on fuzzy support vector machines[J]. Journal of Jilin University (Engineering and Technology Edition),2009,39 (S2): 131–134.
8. SHI Juan, CHANG Dingyi, ZHENG Peng. An early warning model of unsafe behaviors of construction workers based on BP neural network[J]. China Safety Science Journal,2022, 32(01):27–33.
9. Sangapu S C, Prasad K S N, Kannan R J, et al. Impact of class imbalance in VeReMi dataset for misbehavior detection in autonomous vehicles[J]. Soft Computing, 2023: 1–11.
10. BAI Yu, WEI Yi. Influencing Factors on Severity of Serious Road Traffic Accidents[J]. Traffic and Transportation. 2022(03):22–26.
11. GÜNERİ Ö İ, DURMUŞ B, İNCEKIRIK A. ASSUMPTIONS, TESTS AND COMPARATIVE CRITERIA IN QUALITATIVE PREFERENCE MODELS[J]. Academic Research & Reviews in Social, Human and Administrative Sciences-II, 2023: 120.
12. Hahn J, Liao Z, Liu N, et al. Some finite-sample results on the Hausman test[J]. Economics Letters, 2024, 238: 111721.
13. Knerr S, Personnaz L, Dreyfus G, et al. Single-layer learning revisited: a stepwise procedure for building and training a neural network[J]. Neurocomputing Algorithms Architectures & Applications, 1990.
14. Platt J, Cristianini N, Shawe-Taylor J. Large Margin DAG’s for Multiclass Classification[C]. Neural Information Processing Systems. MIT Press, 1999.
15. Ramampiandra E C, Scheidegger A, Wydler J, et al. A comparison of machine learning and statistical species distribution models: Quantifying overfitting supports model interpretation[J]. Ecological Modelling, 2023, 481: 110353.
16. Jahed Armaghani D, Ming Y Y, Salih Mohammed A, et al. Effect of SVM kernel functions on bearing capacity assessment of deep foundations[J]. Journal of Soft Computing in Civil Engineering, 2023, 7(3): 111–128.
17. Wang J, Sun L, Li H, Ding R, Chen N. Prediction model of fouling thickness of heat exchanger based on TA-LSTM structure[J]. Processes, 2023, 11(9): 2594.
18. Sun Y, Ding S, Zhang Z, et al. An improved grid search algorithm to optimize SVR for prediction[J]. Soft Computing, 2021, 25: 5633–5644.
19. Shams M Y, Elshewey A M, El-Kenawy E S M, et al. Water quality prediction using machine learning models based on grid search method[J]. Multimedia Tools and Applications, 2024, 83(12): 35307–35334.
20. Iooss B, Saltelli A. Introduction to sensitivity analysis[J]. Handbook of uncertainty quantification, 2017: 1103–1122.
21. Zamanian A, Ahmidi N, Drton M. Assessable and interpretable sensitivity analysis in the pattern graph framework for nonignorable missingness mechanisms[J]. Statistics in Medicine, 2023, 42(29): 5419–5450. pmid:37759370
22. Ma Z, Zhang H, Chien I J, et al. Predicting expressway crash frequency using a random effect negative binomial model: A case study in China[J]. Accident Analysis & Prevention, 2017, 98(JAN.):214–222. pmid:27764690
23. Noroozi Z, Orooji A, Erfannia L. Analyzing the impact of feature selection methods on machine learning algorithms for heart disease prediction[J]. Scientific Reports, 2023, 13(1): 22588. pmid:38114600
24. Farooq A, Xie M, Stoilova S, et al. The Application of Smart Urban Mobility Strategies and Initiatives: Application to Beijing[J]. European Transport\Trasporti Europei, 2019.
25. Stewart T. Overview of motor vehicle traffic crashes in 2021[R]. 2023.
26. Das D K. Exploring the significance of road and traffic factors on traffic crashes in a South African city[J]. International journal of transportation science and technology, 2023, 12(2): 414–427.
