Figures
Abstract
Animal-vehicle crashes (AVC) pose risks in rural areas, often leading to casualties and injuries. Despite their infrequent occurrence, AVC can have significant consequences, especially when larger animals are involved. This study investigates factors contributing to fatalities and injuries resulting from animal-involved collisions. It examines 24 variables using 1403 animal-vehicle crash observations on intercity and major intra-city roads from 2016–2021. The study employs a random parameters logit model (RPLM) and ensemble machine learning approaches to explore the contributory factors in crashes. The RPLM accounts for unobserved heterogeneity, identifying significant variables. Meanwhile, the ensemble learner and Shapley Additive exPlanations (SHAP) provide further insights. Key findings show that expressways, roads with one or two lanes per direction, horizontal curvature, and structurally poor pavement surfaces increase the risk of severe crashes, i.e., fatalities and injuries. Side fence barriers and speed bumps also impact crash severity. The absence of side fencing and damaged fencing both positively influence severe crashes, while the presence of speed bumps is likely to increase severe crashes. Camel exposure, vacation-period crashes, and adverse weather also play positive roles. However, heavy truck involvement is negatively associated with severe crashes. Policymakers and road safety authorities can use these findings to implement effective countermeasures to prevent such collisions.
Citation: Zawad MN, Almannaa M, Alkahtani KF (2025) Investigating factors influencing fatalities and injuries in animal-vehicle crashes using a random parameters logit model and ensemble machine learning approaches. PLoS One 20(9): e0331197. https://doi.org/10.1371/journal.pone.0331197
Editor: Yajie Zou, Tongji University, CHINA
Received: June 17, 2025; Accepted: August 13, 2025; Published: September 2, 2025
Copyright: © 2025 Zawad et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The data used in this study are owned by the Saudi Ministry of Transport and Logistic Services (MTLS) and cannot be shared publicly due to a non-disclosure agreement. Researchers interested in accessing similar data may refer to the official website of the Ministry: https://mot.gov.sa.
Funding: Ongoing Research Funding program, (ORF-2025-291), King Saud University, Riyadh, Saudi Arabia. The funders of this research effort had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Background
Animal-vehicle crashes (AVC) pose a significant road safety concern globally, particularly in rural regions. These incidents result in severe human injuries, fatalities, property damage, and wildlife loss. Despite a generally low human fatality count, AVC demand attention due to recent crash trends, associated costs, and severe patient aftermath [1]. In 2021, the USA reported over 280,000 AVC, leading to more than 27,000 injuries and 156 fatalities [2]. In Canada, the Insurance Corporation of British Columbia recorded an annual average of over 11,000 AVC from 2018–2022, including 900 injuries and 3 fatalities [3]. In Australia, Australian Associated Motor Insurers Limited received over 17,000 vehicle insurance claims in 2022 involving crashes with animals. According to their report, one in every seven vehicles engaged with wildlife had to be junked off due to the severe damage [4].
Economically, AVC cost the USA over USD 8 billion annually [5], while São Paulo state in Brazil incurs over USD 25 million annually [6]. AVC often involve large animals like kangaroos in Australia, moose and deer in the USA, Canada, and parts of Europe, and camels in Middle Eastern countries. Crashes involving larger animals like moose or camels, which typically weigh over 500 kg and stand 6–8 ft. tall, can cause long-term injuries, especially in the head, face, and cervical spine [7,8] in addition to fatalities and significant vehicle damage.
In Saudi Arabia (KSA), AVC, particularly involving Arabian camels (both domesticated and feral), are prevalent, causing approximately 600 collisions annually with a human fatality rate of 0.25 per collision [9]. Camel owners often graze their camels near highways, leading to increased AVC due to inadequate roadside protection. To aggravate the situation, roadside fencing installed to protect the highways from animal infiltration is often cut to make a passage for camels and sheep and even for vehicles to travel to the other side of the locality [10]. Besides camels and sheep, feral donkeys are also seen to approach human habitations in rural areas, increasing the likelihood of AVC [11].
To minimize the loss of human capital and protect the animals in KSA, especially camels which hold great cultural significance, it is essential to investigate more about animal-related collisions. While mitigation measures like using warning signs of particular shape and size [9], developing animal detection system using GPS [12], and deep learning models [13] have been proposed in context of Saudi Arabia, other preventive investigations are yet to be explored. Concerning crash severity, researchers in KSA have explored individual victim classes such as drivers of varying nationalities [14] and ages [15] while investigating overall crash severity. However, no attention has been given to understanding the contributory factors affecting the severity of animal-vehicle crashes. To fill this gap, this study aims to analyze the severity outcomes of animal-involved crashes and identify the contributory factors influencing fatalities and injuries, based on 1,403 events recorded over four years in Saudi Arabia. The analysis employs both statistical and machine learning approaches, with a focus on severe outcomes over no-injury cases due to their greater impact on human life and public health. The ultimate goal is to provide insights that can assist policymakers in enhancing road safety and reducing severe injuries and fatalities.
Literature review
Crash injury severity has been a long-standing topic in road safety research, analyzing the relationship between human injury severity levels and influencing factors using statistical methods [16], or machine learning (ML) techniques [17]. However, limited studies have focused on AVC severity. This scarcity is often attributed to the inherent challenges in acquiring detailed and high-quality data on animal-vehicle collisions. Despite challenges, recent methodological advances have addressed underreporting issues using copula regression models [18,19].
Savolainen and Ghosh [19] Savolainen and Ghosh [20] conducted an early study in Michigan, USA, using a multinomial logit model to investigate severe injuries in AVC. The study highlighted that younger drivers (≤ 25 years) and female drivers were more likely to experience severe injuries. Additionally, horizontal curvature and speed limits exceeding 90 kph were identified as contributing factors to crash severity, emphasizing the role of roadway geometry and speed in crash outcomes.
Building on this, Al-Bdairi, Behnood [21] applied a random parameters logit model (RPLM) with heterogeneity in means and variances to crash data of vehicle collisions with deer and elk from Washington, USA. Their findings revealed that crashes occurring during daylight, on freeways, roads with high-speed limits (≥ 90 kph) were associated with higher injury severity. Ahmed, Cohen [22] explored deer-vehicle collisions in Pennsylvania, USA, using a correlated random parameter ordered logit model. Their analysis distinguished between factors contributing to severe injuries and fatalities, and property-damage-only (PDO) crashes. Airbag deployment was found to significantly increase the likelihood of severe injuries and fatalities indicating high-impact collisions with animals, while unbelted passengers, nighttime crashes, and younger drivers (≤30 years) were linked to PDO outcomes.
Gharraie and Sacchi [23] expanded the scope by investigating wildlife-vehicle collisions in Saskatchewan, Canada, using a structural equation modeling approach. Their results demonstrated that non-intersection sites, divided roadways, poor pavement conditions, wet road surfaces, heavy vehicle involvement, and bad weather significantly influenced crash severity. Horizontal curvature was again confirmed as a critical factor contributing to severe injury outcomes, aligning with findings from earlier studies.
In recent years, ML has been a popular choice in AVC severity research due to some of its advantages over conventional statistical models. Unlike statistical methods, ML models, non-parametric models, to be specific, do not require any assumptions regarding the underlying probability distribution of the data and presumed relationships between the response variable and predictor variables [24]. Moghaddam, Balali [25] utilized supervised ML methods, specifically CatBoost classifiers, to study AVC in Tennessee, USA. Their research primarily identified significant variables like speed limits, crash timing, and alcohol involvement but lacked detailed interpretive insights. Rahman, Das [26] adopted an unsupervised ML method, association rules mining, to analyze injury severity in Louisiana, USA. They identified unlit dark conditions, involvement of light trucks, younger drivers (≤ 25 years), and speed limits above 100 kph as critical factors.
While these studies provide a foundational understanding of AVC severity, they primarily focus on specific geographic regions or animal types. Moreover, variables like roadway conditions, animal types, and temporal factors are explored in isolation rather than in combination, leaving room for more comprehensive analyses.
Although ML models excel in computational capability and predictive accuracy, they often lack interpretability, which is a critical requirement for policymakers developing effective countermeasures for crash prevention. To address this limitation, many crash-severity studies have adopted post-hoc interpretability techniques like Local Sensitivity Analysis, Partial Dependence Plots (PDP), Local Interpretable Model-agnostic Explanations (LIME), and Shapley Additive exPlanations (SHAP) [27]. However, the application of ML to capture unobserved heterogeneity in crash data remains underdeveloped compared to statistical approaches [28,29].
This gap highlights the need for a hybrid approach combining statistical and ML methods to use their complementary strengths. Statistical models like the RPLM offer insights into variable significance and unobserved heterogeneity, while ML methods enhance predictive accuracy and can analyze complex datasets. Employing a hybrid approach that combines these models is increasingly recognized as a promising research direction [30]. While numerous studies have adopted this approach through comparative analyses of the methods’ results in the past [15,31,32], recent studies have shifted the focus on integrating traditional statistical models, particularly heterogeneity models, with machine learning techniques to form a cohesive connection and derive comprehensive conclusions [33–38].
Research aim and contribution
This study contributes empirically by including new variables related to the conditions of road safety items at crash sites, which were not previously examined in the context of AVC severity analysis. These variables, listed in Table 1, include observing the existence and conditions of road marking, cat eyes, concrete barriers, metal cable barriers, side fence barriers, traffic signs, overhead signs, speed bumps, cleanliness of the roadway, and road shoulders, presence of U-turns, and obstacles on the sides of the road. The RPLM is employed to quantify variable significance and capture potential unobserved heterogeneity. Using the statistically significant variables from RPLM, the study further explores variable impacts on crash severity outcomes by utilizing Ensemble Machine Learning classifier (EL) and SHAP dependency plots. This research is the first to integrate RPLM and EL methods to investigate factors contributing to AVC severity. EL method was chosen for its superior predictive accuracy over other ML models in classifying severity outcomes [17]. This work is also the first to examine camel-involved collisions, which accounts for 88% of the dataset records.
Materials and methods
Data description
Two datasets were utilized from the Saudi Ministry of Transportation and Logistic Services (MTLS): Crash Records and Roadway Condition Records. While the observations in the crash dataset were recorded at the time of the incident, the second dataset consisting of roadway evaluation reports observing both safety and maintenance items was collected later by another survey team. Although there was a separate column for animal-vehicle collision records in the crash dataset, we also manually reviewed crash observation remarks to ensure that all possible AVC cases were captured.
From this process, 1,403 AVCs were identified and extracted from a larger dataset of 59,403 total crashes that occurred on intercity and major intra-city roads across all 13 provinces in Saudi Arabia over a period from October 2016 to February 2021. This subset represents approximately 2.3% of the total crash dataset and was included in its entirety, without any sampling or subjective selection criteria. Fig 1 illustrates the regional distribution of these animal-vehicle crashes. It is worth noting that Medina reported the highest number of (AVCs) in the dataset, followed by Riyadh.
The list of variables extracted from the two datasets with detailed descriptive statistics are shown in Table 1.
The dataset was modified for analysis, with two new variables: Weekend and Vacation, created from crash date records. AVC records on Fridays and Saturdays were labeled as Weekend, reflecting KSA’s official weekends. The Vacation variable indicates if crashes occurred during public holidays. Length of Road variable consists of the information about road length using the road ID number. This variable has been used as a proxy-variable for road classification to capture the effects of long-distance driving in animal-vehicle crash severity for each road type. The Road Geometry variable was added using crash location coordinates, but 171 data points were labeled ‘not available’ due to missing location information. The dataset includes information on animal types involved in crashes, obtained from crash observation notes. However, the animal’s name was not mentioned in the crash report for 36 AVC events. The distribution of crashes over time is grouped into two variables: daytime refers to the crashes that occurred between 6 AM and 6 PM, and nighttime includes the crash events that occurred at night. Any features not matching the roadway type or particular road section were labeled ‘not applicable’. The table also includes two distinct variables: Road Surface Condition, assessing pavement quality, and Cleanliness of the Roadway, checking for sand dunes or debris accumulation.
The crash severity outcome initially had three levels: fatality, injury, and property damage only (PDO). Due to fewer fatal outcomes (11%), they were combined with injury classes (40%) to avoid bias towards the majority class. This combination is labeled ‘Severe’ in subsequent sections.
Methodology
The methodological framework of crash severity modelling is shown in Fig 2. As proposed, RPLM and EL classifiers were used in this study to explore the contributing factors in animal-involved road crash severity. RPLM contributed to finding statistically significant factors that affect the AVC severity outcome (either Severe or PDO). The identified significant variables in the RPLM were then investigated using the best EL classifier. Finally, the findings were explained with the help of SHAP summary and dependency plots.
It is worth noting that the methodological approach proposed here differs from conventional hybrid methods that employ machine learning for initial feature ranking before statistical interpretation. Instead, statistical methods (Binary Logit with VIF and LASSO, followed by RPLM) were prioritized for feature selection before applying the EL classifier with SHAP interpretation. This reverse sequence was chosen for three key reasons. First, to address potential multicollinearity and unobserved heterogeneity in our dataset through statistically rigorous feature selection; second, to ensure that selected features have statistical significance and theoretical relevance rather than merely predictive power; and third, to utilize ML’s pattern recognition capabilities on a refined feature set while exploring complex non-linear relationships and interaction effects through SHAP, which might not be fully captured by the RPLM alone.
Feature selection
Redundant variables often cause multicollinearity among the independent variables which violates one of the general assumptions of logistic regression. Variance Inflation Factor (VIF) was used to detect the presence of such variable redundancy in the dataset. Although no strict values of VIF are set, a rule of thumb of 10 or 5 is often used [39]. In this study, a VIF value greater than 5 was set as the condition for dropping the highly correlated variables. An iterative approach was adopted instead of removing all highly correlated variables simultaneously. The initial results identified three variables with high VIF scores: Road Type (5.03), Side Fence (6.25), and Overhead Sign (6.22). Despite its lower score compared to the Side Fence variable, the Overhead Sign variable was removed first due to the greater significance of side fencing conditions in AVC compared to the overhead signs. After removal, remaining variables met the inclusion criteria. Post redundancy removal, the Least Absolute Shrinkage and Selection Operator (LASSO), an embedded method [40] was applied to prevent overfitting of the model. LASSO shrank 6 variables to zero, selecting 17 in total as potential predictors.
Random parameters logit model (RPLM)
RPLM is a statistical model that improves reliability and prediction accuracy by accounting for unobserved variables. Adopting the mathematical background from the book of Washington, Karlaftis [41], the formulation of the model begins with the crash severity function which determines the AVC severity outcome. The basic form of it can be defined as:
where, ysi is the severity function which determines the probability of AVC severity outcome s in crash i. βs is a vector of parameters to be estimated for AVC severity level s. These parameters represent the effects of various factors on crash severity. xsi is a vector of observed variables (predictors), and εsi is a random error term assumed as extreme values. The derivation of the random parameters logit model can be shown as:
where Pi(s) is the probability of the crash i that results in a discrete outcome s (s ∈ S), either Severe or PDO. In RPLM, some of the βs are allowed to vary across individuals to capture unobserved heterogeneity. The weights used to determine Pi(s) is governed by the f(β│φ), the density function of β. The function follows a specific distribution (normal/ log-normal/ triangular or others) which is described by the vector φ. If the estimated variance of the parameter distribution is statistically significant, it is confirmed to be a random parameter. The coefficients of RPLM were estimated using maximum simulated likelihood, with 500 Halton draws used for probability simulation, as suggested by Sarrias [42]. Several distributions for the random parameters were explored (such as, normal, log-normal, truncated normal, triangular, and uniform). The random parameters with normal distribution were found to yield better statistical fit than other form of distributions which aligns with the previous findings of crash severity analysis by Al-Bdairi, Behnood [21] Islam and Jones [43], and others. Statistical testing was carried out to determine the well-fitted model with the help of Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC). A likelihood ratio (LR) test was also conducted to compare the statistical significance of the RPLM model over a standard logit model (LM). The LR test statistic follows a chi-square distribution, with degrees of freedom equal to the difference in the number of parameters estimated in the models. The null hypothesis, that models with or without random parameters are equal, is rejected if the p-value is less than the chosen significance level. In this study, the LR test statistic based on the likelihoods of RPLM and LM was found as 6.26. The p-value of the chi-square test statistic with three degrees of freedom is 0.099, providing evidence at a 90% confidence interval in favor of the RPLM.
Ensemble learning (EL) classifiers
EL classifiers combine multiple base machine learning algorithms. These base models are trained individually to predict crash severity outcomes, with results aggregated using a ‘majority vote’ method for categorical outcomes [44]. This study tested five ensemble machine learners—Random Forest [45], Extra Trees [46], CatBoost [47], XGBoost [48], and LightGBM [49] —using significant variables from the RPLM results. To avoid complexity, mathematical modeling of all these ensemble learners has not been elaborated upon in this subsection. Readers seeking a more detailed understanding of the algorithms are encouraged to refer to the citations provided alongside each ensemble learner.
The metrics used to evaluate and compare the prediction performance of the ensemble learners include Accuracy, Recall, Precision, Area Under the Curve (AUC), and F1-score. These metrics use the summarization of the prediction performance of a specific classifier and return a value between 0 and 1 based on the formulas. Table 2 shows a standard format of the confusion matrix, and the metrics formulas are shown in Equation (3–6).
In the confusion matrix, ‘positive’ implies the severe crash outcomes whereas ‘negative’ indicates crashes with no injuries, i.e., PDO. Using the matrix, Accuracy measures the proportion of correctly identified instances among all. It is considered a fair estimation of the classifier’s performance for class-balanced datasets. The unit of Accuracy ranges in [0, 1], where 1 represents a case with perfect prediction. Recall, also known as sensitivity or true positive rate (TPR), is an evaluator which is more concerned with the positive outcomes. It is expressed as the ratio between correctly identified positive instances and all positive instances. A classifier with higher sensitivity is recommended in case of crash severity modeling to avoid misjudgment and underestimation of the severity level. In addition to Accuracy and Recall, other evaluation metrics including Precision, AUC, and F1 Score provide better insights into the classifier’s performance. Precision evaluates how many of the crashes predicted as severe were actually severe. AUC quantifies the overall performance of the model. A model with an AUC close to 1 demonstrates strong discriminatory power, whereas an AUC near 0.5 indicates random guessing. F1 Score serves as a harmonic mean of Precision and Recall, balancing the trade-off between the two. It is especially beneficial when dealing with class imbalance, as it considers both false positives and false negatives.
Addressing reproducibility challenges in machine learning, a repeated 10-fold cross-validation (CV) strategy was implemented for the best EL model selection. The crash dataset was randomly split into training and testing samples, and five ensemble learners were fitted on the training dataset. A stratified 10-fold CV assessed model performances, with steps repeated ten times and scores recorded each iteration. The classifier with the highest average score was selected for further hyperparameter tuning to optimize performance.
Model interpretation
The Shapley Additive exPlanations (SHAP) method [50] was used to interpret the optimal EL model, providing an extensive understanding of the significant variables derived from the RPLM. SHAP, a novel model interpretation technique introduced by Lundberg and Lee [50], is based on the Shapley value, a solution concept in game theory, originally proposed by Shapley [51]. The Shapley value for each variable is calculated as the average change in prediction when the particular variable is included, compared to when it is not, across all possible subsets of features. This concept is similar to the average marginal contribution of a variable.
In the context of the animal-vehicle crash (AVC) severity analysis, let f(x) represent our ensemble model that predicts the probability of a severe crash outcome given a set of crash characteristics (x) (including road, environmental, and vehicle factors). For each crash instance in the dataset, SHAP computes the contribution of each feature to the deviation from the average predicted probability of severe outcomes across the entire dataset. Mathematically, for a specific crash with features (x), the prediction f(x) is decomposed as:
where represents the base value (average prediction across the dataset), N is the total number of features, and
is the SHAP value for feature j. The SHAP value is defined as:
where N is the set of all features, S is a subset of features excluding feature j, and fx(S) represents the expected model output when only the features in subset S are known.
The SHAP method was selected for its ability to provide both global and local interpretability, which is crucial for understanding the complex relationships in crash severity outcomes. Unlike other methods, such as LIME, which require repetitive model fitting, SHAP provides consistent analysis across the entire dataset with reduced computational complexity. Its additive property ensures that feature contributions are clear and intuitive, allowing policymakers to understand the findings.
The SHAP summary plot arranges predictors in descending order of significance towards the predicted severity outcome, illustrating each predictor’s local contribution. The spread of color density in the scatter plot indicates the significance, with colors representing predictor levels from low (blue) to high (red). The location of specific color from zero (either right or left) explains the significance of the level of that particular feature (either positive or negative) on the response variable. The SHAP feature dependency plots were used to reveal feature interactions and their joint effects on crash severity outcomes. For a pair of features i and j, the dependency effect is calculated as:
In the dependency plots, Shapley values are labelled on the y-axis and crash instances are distributed on the x-axis according to the primary feature levels. Crash instances are colored based on the levels of the secondary feature, which interacts most strongly with the primary feature. In SHAP dependency plots, the distribution of the color above the zero line indicates positive influence of the interacting feature levels on the response variable, which in this study is crash severity outcome.
Results and discussions
Estimation results of RPLM
Based on the best-fit model, 11 variables were identified as statistically significant. A summary of the estimation results for these 11 variables is provided in Table 3.
The odds ratio results of RPLM show that severe animal-related road crashes are more likely to happen on freeways than undivided roads by a factor of 3.72 (e^1.313). Roads with three lanes per direction have 74% (100 × (0.258−1)) fewer chances of injuries and fatalities than one-lane roads. Roads with two lanes per direction (Lane_2) were found to be a random parameter with a mean of −0.32971 and a standard deviation of 3.806206. Using the cumulative distribution function of the normal distribution, it can be stated that for 47% of the observations, roads having two lanes per direction increase the probability of severe crash outcomes. The mean of Lane_2 not being statistically significant suggests that the unobserved heterogeneity of roads with two lanes in the sample varied from one location or event to another, thus cancelling positive and negative effects. This variation could be due to other road characteristics, such as traffic volume, or might relate to animal habitat or grazing points. Concerning road geometry, the odds of having severe animal-vehicle crashes are 61.20% higher on roads with horizontal curvature than on straight roads. Factors such as poor pavement, vacation periods, and adverse weather increase the likelihood of severe AVC outcomes. Good side fencing protection can decrease the chances of severe crash outcomes as the odds of having more injuries and fatalities are associated with damaged/ tampered/worn-out side fences that need replacement. As for the involvement of heavy trucks, the chances of severe outcomes are 69.51% less likely than the involvement of cars or buses. Collisions with camels are 2.95 times more likely to result in injuries and fatalities than those with donkeys. In addition to the interpretations based on odds ratio, further insights were sought regarding variable impacts on the crash severity outcome using ensemble machine learning approaches.
Selection of best ensemble learner and interpretation of the results
A subset of the AVC dataset involving only the statistically significant 11 predictor variables was trained using the five EL classifiers to find out the best learner. The CatBoost classifier was found to have better overall accuracy and recall scores than others.
Table 4 summarizes the best performance metrics measured on the validation datasets. The hyperparameters of CatBoost classifier were tuned further to enhance its prediction performance resulting in an accuracy score of 0.61 and a recall score of 0.64.
The findings of the CatBoost classifier were interpreted using SHAP plots. According to the SHAP summary plot shown in Fig 3, the type of animal involved has the greatest impact on the crash severity prediction, followed by the length of road and road surface condition. The distribution for each variable in the SHAP summary plot supports the findings of the estimation results of RPLM.
To interpret the results in an organized way, variables with similar attributes are discussed in the subsequent subsections utilizing both SHAP summary and dependency plots.
Roadway features
Fig 4a illustrates the impact of road type and road length on crash severity. For undivided roads, crashes tend to be more severe when the length is less than 70 kilometers. These shorter, undivided roads often have more conflict points, such as intersections, frequent stops, and turns, which could lead to a higher number of injuries and fatalities, particularly in rural areas. In contrast, animal-vehicle crashes on freeways are more likely to be severe, mostly on longer stretches of road (more than 130 kilometers). This observation aligns with the findings of Al-Bdairi, Behnood [21] and Erdogan [52]. Given their uninterrupted and access-controlled nature, vehicles on freeways or freeways are likely to travel at higher speeds. Combined with longer distances, this could lead to driver fatigue and drowsiness, potentially resulting in severe crashes, especially if animals suddenly appear on the road. A study by Ting, Hwang [53] suggests that on highways, the maximum safe limit to avoid fatigue-related crashes is 80 minutes of driving, which equates to a distance ranging from 133 to 187 kilometers in KSA, depending on the vehicle type. Divided multilane roads are expected to have fewer injuries and fatalities than the other two road types.
Fig 4b shows the interaction between the number of lanes and road length. Regardless of the road length, the chances of severe crashes are high on single-lane roads per direction, whereas the chances are low for roads with three lanes. With more lanes on rural roads, drivers are expected to get better space to avoid severe collisions with animals. Although unrelated to animal-vehicle collisions, similar conclusions were made by Se, Champahom [54] found less severe crashes in two-lane roads per direction compared to one-lane. Islam and Jones [43], while investigating pedestrian-at-fault crashes, also found that single-lane per direction roads have high probability of injuries than multilane roads in rural areas. Regarding roads with two lanes per direction, severe crashes are more commonly associated with longer roads. Conversely, shorter lengths may have varying severity outcomes, which could be influenced by unobserved circumstances.
Pavement surfaces with distresses, potholes, or rutting are likely to have more severe crashes in all road types, as plotted in Fig 5a. This is likely due to the challenges in maneuvering vehicles on such poorly maintained road surfaces, especially on high-speed roads with less time to react [55].
According to Fig 5b, AVC on straight roads have more possibilities to result in injuries and fatalities when they occur on vacation, possibly due to over-speeding. The spread of blue dots with positive SHAP values in road sections with horizontal curvatures indicates that roads with curvatures are more sensitive to severe crashes, even without a rush for holiday trips. The scatter plot of road geometry in Fig 3 also suggests that AVC on horizontal curves are more vulnerable to severe crashes, possibly due to the limited visibility of the driver on the curves, leading to reduced reaction time. This finding is also evident in the work of Gharraie and Sacchi [23] and Savolainen and Ghosh [20].
Road safety and maintenance items
Fig 6a shows the interaction effects of speed bumps and road type. The presence of speed bumps (whether in good or bad conditions) is likely to have a positive influence on severe outcomes of crashes occurring on divided and undivided roads. While speed bumps were associated with increased severity, this may reflect confounding due to road environment or driver behavior (e.g., speeding, limited visibility), rather than the speed bumps themselves. For instance, in rural areas where the speed limit is 80 kph or above, it is plausible that drivers, who may be distracted by mobile devices or impaired by limited night vision, could lose control of their vehicles upon encountering a sudden speed bump. This could lead to collisions with either animals, fixed objects, tripped rollovers, or rear-end crashes due to sudden vehicle braking.
Fig 6b shows the interaction effects of side fencing with road type. The scatter plot reveals a positive association of severe crashes in freeways when the side fencing is in a state of bad condition (either worn out or bent) which can lead to animal exposure on roads. The absence of side fences also increases the likelihood of injuries and fatalities on freeways and undivided rural roads. Although not related to severity, presence of well-designed and maintained side fencing alongside roads was found to be an effective way in reducing animal-vehicle collisions, as reviewed by Hedlund, Curtis [56].
Environmental feature
Animal-vehicle collisions in adverse weather conditions are expected to have severe outcomes, as depicted in the SHAP summary plot (Fig 3). This finding is supported by the studies of Rahman, Das [26] and Gharraie and Sacchi [23]. Adverse weather conditions, such as rain or sandstorms, can create visibility hazards and reduce surface friction, increasing the likelihood of severe crashes [57]. However, the impact of these conditions on the model is relatively low compared to other variables. This may be due to the small proportion of crash occurrences (2%) in adverse weather conditions in the dataset.
Other crash contributory variables
For animal type, Fig 3 suggests that collisions with camels (represented by blue spreads) have a positive influence on the severe crashes. Being heavier and taller compared to other animal types in KSA, the possibility of more injuries and fatalities from camel-vehicle collisions is expected.
Regarding vacation, the SHAP summary plot (Fig 3) indicates that crashes occurring during vacation periods (represented by red spreads) are more likely to result in severe outcomes. Almannaa, Zawad [58]) also found vacation to be one of the contributing parameters towards overall fatal crashes in KSA. During vacation periods, the likelihood of crashes increases, particularly in rural locations. These incidents often involve multiple vehicles, angular collisions, rear-end collisions, and instances where restraints are not used [59].
Concerning trucks, the negative dispersion of red dots on SHAP values in Fig 3 suggests that the involvement of heavy trucks contributes less to injuries and fatalities in crashes involving animals. This is consistent with the findings of Al-Bdairi, Behnood [21]. Since trucks are large in sizes and are capped to a relatively lower speed limit than passenger vehicles and buses, drivers may find it easier to control their vehicles in the event of a sudden animal appearance on the road the severity remains lower. However, the chances of fatality remains high for the animals if got hit by a heavy truck [60].
Conclusions
Animal-vehicle crashes (AVC) are often overlooked in crash severity studies due to their relative infrequency. However, in regions like Saudi Arabia (KSA), where camels are a significant part of the animal population and often wander onto rural roads, the impact of AVC can be substantial. This research, a pioneering effort in KSA, examined the factors contributing to fatalities and injuries in AVC. The study analyzed 1,403 AVC on intercity and major intra-city roads, examining 24 variables related to roadway features, safety items, crash associated factors, and spatial, temporal, and environmental attributes.
A systematic approach was used to test the variables for multicollinearity and select potential variables for crash severity modeling using LASSO regularization. A random parameters logit model (RPLM) was employed to quantify statistically significant variables and account for unobserved heterogeneity in the dataset. A likelihood ratio test was conducted between a standard logit model and RPLM to justify the selection of the model. The estimation results of RPLM found one variable (roads with two lanes per direction) to be a random parameter and 11 statistically significant variables contributing to injuries and fatalities in animal-vehicle crashes. These variables were further examined using ensemble machine-learning techniques to seek more insights. Five ensemble learners were used to train the data and predict the crash outcome. The CatBoost classifier, chosen based on its superior average prediction scores, was tuned to improve accuracy. Shapley Additive exPlanations (SHAP) was utilized to interpret the findings of the tuned CatBoost classifier, providing detailed insights into the contributory factors’ relative significance and their interaction effects.
The findings of this study provide valuable insights for policymakers and road safety authorities to mitigate the severity of animal-vehicle crashes. Several practical implications emerge from the analysis. Roadside fencing along highways is critical for preventing animal intrusions from grazing areas and nearby localities. Implementing proper fencing combined with regular maintenance can significantly reduce AVC severity. Crashes occurring during vacations, which show a higher likelihood of severe outcomes, highlight the need for targeted interventions, including stricter enforcement of traffic regulations, enhanced traffic monitoring, and public awareness campaigns. Although heavy truck involvement was found to be less likely to contribute to severe crashes, the significance of truck involvement in crashes underscores the importance of increasing monitoring and inspection, particularly on roads lacking checkpoint stations. Similarly, periodic maintenance of road safety items, including speed bumps, is essential. According to the findings, poor pavement conditions also contribute significantly to severe AVC across all roadway types. Ensuring compliance with construction standards, incorporating accurate traffic forecasts during planning, and scheduling regular pavement maintenance are critical measures to address this issue. Uninterrupted freeway driving increases the risk of fatigue-related crashes, necessitating measures like frequent rest areas, fatigue-alert signage, and in-vehicle monitoring devices. Finally, the significant contribution of camels to AVC fatalities calls for safety awareness programs targeting camel owners to prevent animal exposure on roads. These actionable strategies can help reduce AVC severity and improve road safety.
Limitations and future works
The research faced certain limitations due to some constraints. The roadway condition dataset was collected by a separate survey team at different times from the crash events. This temporal misalignment may introduce measurement error that could affect the precision of severity attribution. However, most infrastructure elements (e.g., pavement quality, signage, fencing) change slowly, especially on intercity roads. In addition, the use of LASSO regularization and robust standard errors in the RPLM helps mitigate potential bias from unobserved time-varying factors. Future studies are encouraged to employ real-time or near-real-time road monitoring systems to reduce such limitations. Additionally, due to data unavailability, factors such as traffic volume, road segment length, street lighting conditions, driver’s age, airbag deployment, seat belt usage, vertical curvature on roadway, and effect of vegetation or locality alongside rural roads could not be investigated. For future research, it is recommended to include these variables to provide a more comprehensive understanding of the factors influencing animal-involved crash severity. Exploring other methodologies, such as capturing unobserved heterogeneity using artificial neural networks can add new dimensions to animal-vehicle crash severity modeling.
References
- 1. Ang JY, Gabbe B, Cameron P, Beck B. Animal-vehicle collisions in Victoria, Australia: An under-recognised cause of road traffic crashes. Emerg Med Australas. 2019;31(5):851–5. pmid:31361079
- 2.
NHTSA. Traffic safety facts 2021: A compilation of motor vehicle traffic crash data: National Center for Statistics and Analysis; 2023 [updated 12/2023; cited 2024 January]. Available from: https://crashstats.nhtsa.dot.gov/Api/Public/ViewPublication/813527
- 3.
ICBC. Quick Statistics: Insurance Corporation of British Columbia; 2023 [updated October 5, 2023; cited 2024 January]. Available from: https://public.tableau.com/app/profile/icbc/viz/QuickStatistics-Crashesinvolving/CrashesInvolving
- 4.
AAMI. Aussie drivers’ admit to risky tactics to avoid wildlife collisions: Suncorp Group; 2023 [cited 2024 February]. Available from: https://www.suncorpgroup.com.au/news/news/AAMI-animal-collision-data
- 5.
Huijser MP, McGowan P, Clevenger AP, Ament R, Berger L. Wildlife-Vehicle Collision Reduction Study: Best Practices Manual: Report to Congress. 2008.
- 6. Abra FD, Granziera BM, Huijser MP, Ferraz KMPM de B, Haddad CM, Paolino RM. Pay or prevent? Human safety, costs to society and legal perspectives on animal-vehicle collisions in São Paulo state, Brazil. PLoS One. 2019;14(4):e0215152. pmid:30973920
- 7. Ansari S, Ashraf Ali KS. Camel collision as a major cause of low cervical spinal cord injury. Spinal Cord. 1998;36(6):415–7. pmid:9648198
- 8. Clark DE, Fulton G, Ontengco JB, Lachance T, Sutton JE. Moose–motor vehicle collision: a continuing hazard in Northern New England. Journal of the American College of Surgeons. 2019;228(6):941–7.
- 9. Al-Ghamdi AS, AlGadhi SA. Warning signs as countermeasures to camel-vehicle collisions in Saudi Arabia. Accid Anal Prev. 2004;36(5):749–60. pmid:15203352
- 10. Abdullah M, Yousuf M. Facing the unkindest cut of all. Arab News. 2008.
- 11. Soultan A, Darwish M, Al-Johani N, Abdulkareem A, Alfaifi Y, Assaeed AM, et al. Feral Donkey Distribution and Ecological Impacts in a Hyper-Arid Region. Animals (Basel). 2023;13(18):2885. pmid:37760285
- 12. Jadi AM. Implementing a safe travelling technique to avoid the collision of animals and vehicles in Saudi Arabia. International Journal of Advanced Computer Science Applications. 2019;10(9).
- 13. Alghamdi S, Algethami A, Tan T. Vehicle-camel collisions in Saudi Arabia: Application of single and multi-stage deep learning object detectors. Ain Shams Engineering Journal. 2023;:102328.
- 14. Alogaili A, Mannering F. Unobserved heterogeneity and the effects of driver nationality on crash injury severities in Saudi Arabia. Accid Anal Prev. 2020;144:105618. pmid:32535248
- 15. Islam MK, Gazder U, Akter R, Arifuzzaman M. Involvement of road users from the productive age group in traffic crashes in Saudi Arabia: An investigative study using statistical and machine learning techniques. Applied Sciences. 2022;12(13):6368.
- 16. Savolainen PT, Mannering FL, Lord D, Quddus MA. The statistical analysis of highway crash-injury severities: a review and assessment of methodological alternatives. Accid Anal Prev. 2011;43(5):1666–76. pmid:21658493
- 17. Santos K, Dias JP, Amado C. A literature review of machine learning algorithms for crash injury severity prediction. J Safety Res. 2022;80:254–69. pmid:35249605
- 18. Zou Y, Zhong X, Tang J, Ye X, Wu L, Ijaz M. A copula-based approach for accommodating the underreporting effect in wildlife‒vehicle crash analysis. Sustainability. 2019;11(2):418.
- 19. Yang X, Zou Y, Wu L, Zhong X, Wang Y, Ijaz M. Comparative analysis of the reported animal‐vehicle collisions data and carcass removal data for hotspot identification. Journal of Advanced Transportation. 2019;2019(1):3521793.
- 20. Savolainen P, Ghosh I. Examination of Factors Affecting Driver Injury Severity in Michigan’s Single-Vehicle–Deer Crashes. Transportation Research Record: Journal of the Transportation Research Board. 2008;2078(1):17–25.
- 21. Al-Bdairi NSS, Behnood A, Hernandez S. Temporal stability of driver injury severities in animal-vehicle collisions: A random parameters with heterogeneity in means (and variances) approach. Analytic Methods in Accident Research. 2020;26:100120.
- 22. Ahmed SS, Cohen J, Anastasopoulos PCh. A correlated random parameters with heterogeneity in means approach of deer-vehicle collisions and resulting injury-severities. Analytic Methods in Accident Research. 2021;30:100160.
- 23. Gharraie I, Sacchi E. Severity Analysis of Wildlife–Vehicle Crashes using Generalized Structural Equation Modeling. Transportation Research Record: Journal of the Transportation Research Board. 2020;2675(2):53–64.
- 24. Zhang J, Li Z, Pu Z, Xu C. Comparing prediction performance for crash injury severity among various machine learning and statistical methods. IEEE Access. 2018;6:60079–87.
- 25. Moghaddam K, Balali V, Singh P, Khalilikhah M. Evaluation of multi-class multi-label machine learning methods to identify the contributing factors to the severity of animal-vehicle collisions. International Journal for Traffic & Transport Engineering. 2021;11(3).
- 26. Rahman MA, Das S, Sun X, Sun M, Hossain MM. Using unsupervised learning to investigate injury-associated factors of animal-vehicle crashes. Int J Inj Contr Saf Promot. 2023;30(2):210–9. pmid:36278330
- 27. Wen X, Xie Y, Jiang L, Pu Z, Ge T. Applications of machine learning methods in traffic crash severity modelling: current status and future directions. Transport Reviews. 2021;41(6):855–79.
- 28.
Jiang T, Hu W, Liu Y, Xiao L, editors. Capturing and Exploring Unobserved Heterogeneity in Traffic Crash Injuries in China: A Deep Learning Approach. 2022 IEEE 7th International Conference on Intelligent Transportation Engineering (ICITE); 2022: IEEE.
- 29. Yu H, Li Z, Zhang G, Liu P, Ma T. Fusion convolutional neural network-based interpretation of unobserved heterogeneous factors in driver injury severity outcomes in single-vehicle crashes. Analytic methods in accident research. 2021;30:100157.
- 30. Mannering F, Bhat CR, Shankar V, Abdel-Aty M. Big data, traditional data and the tradeoffs between prediction and causality in highway-safety analysis. Analytic Methods in Accident Research. 2020;25:100113.
- 31. Iranitalab A, Khattak A. Comparison of four statistical and machine learning methods for crash severity prediction. Accid Anal Prev. 2017;108:27–36. pmid:28841408
- 32. Goswamy A, Abdel-Aty M, Islam Z. Factors affecting injury severity at pedestrian crossing locations with Rectangular RAPID Flashing Beacons (RRFB) using XGBoost and random parameters discrete outcome models. Accid Anal Prev. 2023;181:106937. pmid:36599213
- 33. Sun Z, Wang D, Gu X, Abdel-Aty M, Xing Y, Wang J, et al. A hybrid approach of random forest and random parameters logit model of injury severity modeling of vulnerable road users involved crashes. Accid Anal Prev. 2023;192:107235. pmid:37557001
- 34. Agheli A, Aghabayk K. How does distraction affect cyclists’ severe crashes? A hybrid CatBoost-SHAP and random parameters binary logit approach. Accid Anal Prev. 2025;211:107896. pmid:39673830
- 35. Hossain A, Sun X, Das S, Jafari M, Rahman A. Investigating pedestrian-vehicle crashes on interstate highways: Applying random parameter binary logit model with heterogeneity in means. Accid Anal Prev. 2024;199:107503. pmid:38368777
- 36. Islam M, Patel D, Hasan AS, Jalayer M. An exploratory analysis of two-vehicle crashes for distracted driving with a mixed approach: Machine learning algorithm with unobserved heterogeneity. Journal of Transportation Safety & Security. 2024;16(7):709–45.
- 37. Sadeghi M, Aghabayk K, Quddus M. A hybrid Machine learning and statistical modeling approach for analyzing the crash severity of mobility scooter users considering temporal instability. Accid Anal Prev. 2024;206:107696. pmid:38964138
- 38. Alhadidi TI, Al-Marafi MN, Alazimi A. Development of safety performance measures for different crashes severity at urban roundabouts. Results in Engineering. 2025;25:103680.
- 39.
Kutner MH, Nachtsheim CJ, Neter J, Li W. Applied linear statistical models: McGraw-hill; 2005.
- 40. Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology. 1996;58(1):267–88.
- 41.
Washington S, Karlaftis MG, Mannering F, Anastasopoulos P. Statistical and econometric methods for transportation data analysis: CRC press; 2020.
- 42. Sarrias M. Discrete choice models with random parameters in R: The Rchoice package. Journal of Statistical Software. 2016;74:1–31.
- 43. Islam S, Jones SL. Pedestrian at-fault crashes on rural and urban roadways in Alabama. Accid Anal Prev. 2014;72:267–76. pmid:25089767
- 44.
Rokach L. Ensemble learning: pattern classification using ensemble methods. World Scientific; 2019.
- 45. Breiman L. Random forests. Machine Learning. 2001;45:5–32.
- 46. Geurts P, Ernst D, Wehenkel L. Extremely randomized trees. Machine Learning. 2006;63:3–42.
- 47. Prokhorenkova L, Gusev G, Vorobev A, Dorogush AV, Gulin A. CatBoost: unbiased boosting with categorical features. Advances in neural information processing systems. 2018;31.
- 48.
Chen T, Guestrin C, editors. Xgboost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2016.
- 49. Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, et al. Lightgbm: A highly efficient gradient boosting decision tree. Advances in neural information processing systems. 2017;30.
- 50. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems. 2017;30.
- 51.
Shapley LS. A value for n-person games. Contributions to the Theory of Games. 21953. p. 307-17.
- 52. Erdogan S. Explorative spatial analysis of traffic accident statistics and road mortality among the provinces of Turkey. J Safety Res. 2009;40(5):341–51. pmid:19932314
- 53. Ting P-H, Hwang J-R, Doong J-L, Jeng M-C. Driver fatigue and highway driving: a simulator study. Physiol Behav. 2008;94(3):448–53. pmid:18402991
- 54. Se C, Champahom T, Jomnonkwao S, Kronprasert N, Ratanavaraha V. The impact of weekday, weekend, and holiday crashes on motorcyclist injury severities: accounting for temporal influence with unobserved effect and insights from out-of-sample prediction. Analytic Methods in Accident Research. 2022;36:100240.
- 55. Lee J, Nam B, Abdel-Aty M. Effects of pavement surface conditions on traffic crash severity. Journal of Transportation Engineering. 2015;141(10):04015020.
- 56. Hedlund JH, Curtis PD, Curtis G, Williams AF. Methods to reduce traffic crashes involving deer: what works and what does not. Traffic Inj Prev. 2004;5(2):122–31. pmid:15203947
- 57. Jung S, Jang K, Yoon Y, Kang S. Contributing factors to vehicle to vehicle crash frequency and severity under rainfall. J Safety Res. 2014;50:1–10. pmid:25142355
- 58. Almannaa M, Zawad MN, Moshawah M, Alabduljabbar H. Investigating the effect of road condition and vacation on crash severity using machine learning algorithms. Int J Inj Contr Saf Promot. 2023;30(3):392–402. pmid:37079354
- 59. Anowar S, Yasmin S, Tay R. Comparison of crashes during public holidays and regular weekends. Accid Anal Prev. 2013;51:93–7. pmid:23201757
- 60. Yuan Q, Xu X, Yang Z, Shi D, Qi S, Zhang Y. Investigating crash-related injuries between animal-related and motor vehicle in Rural China: Bayesian random parameter probit model considering endogenous variables. Cogent Engineering. 2023;10(1).