Classification of road traffic injury collision characteristics using text mining analysis: Implications for road injury prevention

Road traffic injuries are a leading cause of morbidity and mortality globally. Understanding the circumstances leading to road traffic injury is crucial to improving road safety and implementing countermeasures to reduce the incidence and severity of road trauma. We aimed to characterise the crash characteristics of road traffic collisions in Victoria, Australia, and to examine the relationship between crash characteristics and fault attribution. Data were extracted from the Victorian State Trauma Registry for motor vehicle drivers, motorcyclists, pedal cyclists and pedestrians aged ≥16 years who were injured in 2010–2016 and had a no-fault compensation claim. People with intentional injury, serious head injury, no compensation claim or a missing injury event description, or who died ≤12 months post-injury were excluded, resulting in a sample of 2,486. Text mining of the injury event descriptions using QDA Miner and Wordstat was used to classify crash circumstances for each road user group. Crashes in which no other party was at fault included circumstances involving losing control or avoiding a hazard, mechanical failure or medical conditions. Collisions in which another party was predominantly at fault occurred at intersections with another vehicle entering from an adjacent direction, and in head-on collisions. Crashes with a higher prevalence of unknown fault included multi-vehicle collisions, pedal cyclists injured in rear-end collisions, and pedestrians hit while crossing the road or navigating slow traffic areas. We discuss several methods to promote road safety and to reduce the incidence and severity of road traffic injuries. Our recommendations take into consideration the incidence and impact of road trauma for different types of road users, and range from engineering and infrastructure controls through to interventions targeting or accommodating human behaviour.


Introduction
By 2030, road traffic injuries are projected to become the fifth leading cause of mortality globally [1]. In Australia, road traffic injuries lead to significant long-term disability and mortality, especially for people with injuries that require treatment in hospital [2,3], and have an economic impact exceeding $33 billion AUD per year [4]. Road traffic collisions are influenced by a range of factors but are principally attributed to human error, which highlights the need to consider the five pillars of the Safe System approach in order to reduce the incidence and impact of road trauma: safe roads, safe people, safe vehicles, safe speeds and post-crash care [5]. To date, few studies have sought to characterise the circumstances of road traffic collisions, a key gap in knowledge that impedes our capacity to reduce road trauma. Many jurisdictions collate road traffic injury data through police, insurance, or government transport agencies. These data can be used to conduct large-scale analyses of the incidence of road traffic collisions [6–8]. However, these databases often lack coding of the range of individual characteristics and contributory factors involved in each collision. Studies that do generate detailed coding of road traffic injury events typically do so only for specific road user groups [7,9–12], or focus on specific types of roadways [9,13,14]. These studies provide important insights to guide injury prevention strategies for specific road users or locations. However, injury prevention strategies could be improved further if we understood the events leading to serious injury for all road users who may play a direct role in causing or preventing a road traffic injury event, particularly motor vehicle drivers, motorcycle riders, cyclists and pedestrians.
When seeking to characterise road traffic injury events and to identify avenues for injury prevention, it is helpful to know which party was at fault. Fault can be measured in several ways, including who was legally responsible (e.g., contributory negligence, intent, knowledge or recklessness of each party) [15]; who has legal liability, or which entity must pay compensation for the injuries sustained; and who the injured person blames or feels was responsible for the injury event [16]. The differences between these attributions are important given that people may feel partially to blame even if their actions during the event were not negligent or reckless, or may recognise that multiple parties were at fault. Fault attributions play a key role in understanding the causal factors contributing to road traffic injury, and are also known to influence outcomes. People who are not responsible, or for whom another party has legal liability to provide injury compensation, have been found to have poorer health, pain and work outcomes after transport injury [16]. While the reasons for this association are not known, it is thought that external attributions of causality and perceptions of injustice may impede psychological recovery and negatively influence injury-related beliefs and recovery over time [17]. Understanding the patterns of fault attributions across different types of road traffic collisions may help us to (a) identify where and how injury prevention strategies should be implemented to have the greatest impact on road safety; and (b) identify the types of collisions that may lead to worse injury outcomes, enabling the provision of early targeted interventions to injured road users.
The primary aim of the present study was to classify, using text mining methods, the injury events reported by motor vehicle drivers, motorcyclists, pedal cyclists and pedestrians who survived an unintentional road traffic collision. A secondary aim was to examine the associations between road user characteristics, injury characteristics and fault attributions.

Methods
The study received low-risk ethics approval from the Monash University Human Research Ethics Committee (Project number 14283). The study involved analysis of deidentified data. All trauma cases are included in the Victorian State Trauma Registry (VSTR) using an opt-out process.
Victorian State Trauma Outcomes Registry and Monitoring Group (VSTORM). Instructions are available here: https://www.monash.edu/medicine/sphpm/vstorm/data-requests, and data requests require ethics approval before data can be provided. Other limits to data requests may apply. Obtaining the linked data from the Transport Accident Commission (TAC) used in this study, which were more extensive than the routine linkage between the VSTR and TAC, requires a data request via the client research team at the TAC (research@tac.vic.gov.au). To initiate both of these data requests, interested parties should first contact the VSTORM project office; contact details are available here: https://www.monash.edu/medicine/sphpm/vstorm/contact. To gain access to the exact dataset used in the present study, interested parties should contact the study corresponding author and follow the same data custodian request processes outlined above.

Participants
Participants from the VSTR were included if they sustained a road traffic injury as a motor vehicle driver, motorcycle rider, pedal cyclist, or pedestrian between 1 July 2010 and 30 June 2016, were aged 16 years and older at the time of injury, and had an accepted compensation claim with the Transport Accident Commission (TAC). Availability of linked claimant data from the TAC was required, as the linked claim provided the text description of the injury event and the claimant and police fault attributions. Motor vehicle and motorcycle passengers were excluded given that the fault data are unlikely to represent the passenger's own role in the injury event. People with "other" injury circumstances, predominantly injuries sustained on a tram, train, mobility scooter or public bus, were excluded due to small numbers. Participants were excluded if their injuries were intentional or the intent was unknown, if they had a serious head injury (head injury Abbreviated Injury Scale (AIS) score >2 and Glasgow Coma Scale (GCS) score ≥3 and ≤8), or if they died within 12 months of injury (Fig 1). People with a serious head injury were excluded as they were considered less likely to be able to report the circumstances of the injury event.
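The inclusion and exclusion criteria above amount to a set of record-level filters. The Python sketch below expresses them as a single predicate. The `Case` fields and road user labels are hypothetical stand-ins for the VSTR and TAC variables, which are not described here, and the sketch is not the study's own extraction code; the later requirement for a text description of the injury event is not included.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical record layout; real VSTR/TAC variable names will differ.
@dataclass
class Case:
    road_user: str               # "driver", "motorcyclist", "pedal_cyclist", "pedestrian", ...
    injury_date: date
    age: int                     # age in years at the time of injury
    tac_claim_accepted: bool     # accepted no-fault claim with the TAC
    intent: str                  # "unintentional", "intentional" or "unknown"
    head_ais: int                # maximum head AIS severity (0 if no head injury)
    gcs: Optional[int]           # Glasgow Coma Scale score, None if not recorded
    died_within_12_months: bool

INCLUDED_ROAD_USERS = {"driver", "motorcyclist", "pedal_cyclist", "pedestrian"}
STUDY_START = date(2010, 7, 1)
STUDY_END = date(2016, 6, 30)

def has_serious_head_injury(case: Case) -> bool:
    """Head AIS > 2 together with a GCS score from 3 to 8 inclusive."""
    return case.head_ais > 2 and case.gcs is not None and 3 <= case.gcs <= 8

def is_eligible(case: Case) -> bool:
    """True if the case passes every inclusion and exclusion criterion."""
    return (
        case.road_user in INCLUDED_ROAD_USERS      # passengers and "other" excluded
        and STUDY_START <= case.injury_date <= STUDY_END
        and case.age >= 16
        and case.tac_claim_accepted
        and case.intent == "unintentional"         # intentional or unknown intent excluded
        and not has_serious_head_injury(case)
        and not case.died_within_12_months
    )
```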

Setting, data sources and data linkage
The VSTR is a population-based registry that collects information on all patients admitted to one of 138 trauma receiving health services across the state of Victoria who meet major trauma criteria [18]. The inclusion criteria for the VSTR are: (1) death after injury; (2) admission to an intensive care unit (ICU) for >24 hours and requiring mechanical ventilation for at least part of the ICU stay; (3) Injury Severity Score (ISS) >12; or (4) surgery within 48 hours for intracranial, intrathoracic or intra-abdominal injury, or for fixation of pelvic or spinal fractures. The VSTR includes demographic, pre-injury health, and injury-related characteristics that are collected either from the hospitals or from structured post-discharge telephone interviews. Post-discharge mortality outcomes are determined through linkage with the Victorian Registry of Births, Deaths and Marriages.
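Criterion (3) depends on the ISS, which is conventionally computed as the sum of the squares of the highest AIS severity in each of the three most severely injured of six body regions, with any AIS 6 injury setting the ISS to 75. A minimal Python sketch of that calculation and of the four VSTR inclusion criteria follows; the region labels and argument names are illustrative assumptions, not VSTR field names.

```python
from typing import Dict

# The six ISS body regions (labels are illustrative, not VSTR field names).
ISS_REGIONS = ("head_neck", "face", "chest", "abdomen", "extremities", "external")

def injury_severity_score(max_ais_by_region: Dict[str, int]) -> int:
    """ISS from the maximum AIS severity (1-6) recorded in each body region."""
    scores = [max_ais_by_region.get(region, 0) for region in ISS_REGIONS]
    if 6 in scores:
        return 75  # by convention, any AIS 6 injury gives the maximum ISS of 75
    top_three = sorted(scores, reverse=True)[:3]
    return sum(s * s for s in top_three)

def meets_vstr_criteria(died: bool, icu_hours: float, ventilated: bool,
                        iss: int, urgent_surgery: bool) -> bool:
    """Any one of the four VSTR inclusion criteria is sufficient.

    urgent_surgery: surgery within 48 hours for intracranial, intrathoracic or
    intra-abdominal injury, or for fixation of pelvic or spinal fractures.
    """
    return died or (icu_hours > 24 and ventilated) or iss > 12 or urgent_surgery
```

For example, a maximum AIS of 4 in the head or neck, 3 in the chest and 2 in the extremities gives an ISS of 16 + 9 + 4 = 29, which meets criterion (3).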
The TAC is a government-owned organisation that provides financial compensation to people injured in collisions involving at least one motorised vehicle or a vehicle that operates on rails in the State of Victoria, or that involved a vehicle with a Victorian registration. An injured person is entitled to a compensation claim, regardless of fault, to support their healthcare costs if they meet a medical excess, which was $651 during the study period but does not apply to people who are admitted to hospital. People who sustain permanent impairment greater than 10%, as determined by an independent medical examiner, are entitled to lump sum impairment benefits. Additionally, people who are seriously injured in a collision in which another party was partially or completely at fault are entitled to common law compensation.
Claims data from the TAC were accessed from routine data linkage between the VSTR and the TAC using the TAC claim number. Claimant data for people included in the present study were provided to the research team following the routine annual linkage. The TAC provided the claimant's description of the injury event as a special request for this study after they had removed all identifiable nouns (i.e., person and location names).

Participant demographics
Demographic characteristics from the VSTR included age at injury, sex, preferred language, education, work status and occupation pre-injury, and neighbourhood characteristics based on residential postcode at the time of injury. Highest level of education (university, high school, advanced diploma, did not complete high school) was classified in accordance with the Australian Standard Classification of Education (ASCED) [19]. Occupation skill level was classified in accordance with the Australian Standard Classification of Occupations (ASCO) [20]. Occupational skill levels were categorised into the following occupation levels: managers and professionals; associate professionals; tradespersons and advanced clerical workers; intermediate sales, clerical, service, production, and transport workers; and elementary sales, clerical, or service workers and labourers.
The Index of Relative Socioeconomic Advantage and Disadvantage (IRSAD) deciles [21] classify neighbourhood socioeconomic position based on national census data on the typical family structure, employment and education level within each postcode region. Victorian ranked IRSAD deciles were summarised into quintiles ranging from one (most disadvantaged) to five (least disadvantaged). The Accessibility/Remoteness Index of Australia (ARIA) (Department of Health and Aged Care, 2001) classifies regions in Australia into five levels of remoteness (major cities, inner regional, outer regional, remote, very remote), which were summarised as major cities versus regional and remote areas due to the small number of remote regions in Victoria.
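The decile-to-quintile and remoteness recodes can be sketched in Python as below. The sketch assumes adjacent deciles were paired (deciles 1–2 form quintile 1, and so on), which the text implies but does not state; the function names are hypothetical.

```python
import math

def irsad_quintile(decile: int) -> int:
    """Collapse a Victorian-ranked IRSAD decile (1 = most disadvantaged) into a quintile."""
    if not 1 <= decile <= 10:
        raise ValueError("IRSAD decile must be between 1 and 10")
    return math.ceil(decile / 2)  # deciles 1-2 -> 1, 3-4 -> 2, ..., 9-10 -> 5

ARIA_LEVELS = ("major cities", "inner regional", "outer regional", "remote", "very remote")

def remoteness_group(aria_level: str) -> str:
    """Binary remoteness grouping: major cities versus regional and remote areas."""
    if aria_level not in ARIA_LEVELS:
        raise ValueError(f"unknown ARIA level: {aria_level!r}")
    return "major cities" if aria_level == "major cities" else "regional and remote"
```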

Pre-injury health
International Classification of Diseases, 10th Revision, Australian Modification (ICD-10-AM) diagnosis codes from hospital coders were used to allocate the Charlson Comorbidity Index (CCI) weight [22] and to identify comorbid substance use and mental health conditions. The CCI provides weightings for the severity and number of comorbid conditions, where a weighting of zero indicates no comorbid conditions that increase mortality risk and higher weightings represent greater risk of mortality. CCI weightings are validated predictors of outcomes following major trauma [23] and orthopaedic injury [24]. Pre-injury substance use and mental health conditions were identified in accordance with published criteria [25]. Disability level in the week prior to injury was assessed using a five-level rating scale ranging from no disability to severe disability [26].
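To illustrate how a CCI weight is derived from diagnosis codes, the Python sketch below scores a small, illustrative subset of Charlson conditions using ICD-10 code prefixes and the original Charlson weights. It is not the full index used in the study [22]: most conditions are omitted, and no hierarchy rules between related conditions are applied.

```python
# Illustrative subset only: condition -> (ICD-10 code prefixes without dots, weight).
CHARLSON_SUBSET = {
    "myocardial_infarction": (("I21", "I22", "I252"), 1),
    "congestive_heart_failure": (("I50",), 1),
    "dementia": (("F00", "F01", "F02", "F03", "G30"), 1),
    "metastatic_solid_tumour": (("C77", "C78", "C79", "C80"), 6),
}

def charlson_weight(icd_codes):
    """Sum the weights of the subset conditions present among a patient's codes."""
    normalised = [code.replace(".", "").upper() for code in icd_codes]
    total = 0
    for prefixes, weight in CHARLSON_SUBSET.values():
        # str.startswith accepts a tuple; each condition counts once,
        # however many of the patient's codes fall within it.
        if any(code.startswith(prefixes) for code in normalised):
            total += weight
    return total
```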

Injury characteristics and the injury event description
Injury characteristics included the ISS (classified into tertiles), injured body regions based on the maximum AIS severity score in each body region, length of hospital stay and discharge destination. Road user group and place of injury were determined from the injury coding in the VSTR. The time of day at which the injury occurred was classified in relation to sunrise and sunset times obtained from Geoscience Australia, using criteria consistent with previous studies examining injury in relation to daylight hours [14,27]. We could not classify injury events according to whether they occurred on or off road.
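Classifying an injury time relative to sunrise and sunset can be sketched in Python as follows. The category labels and the one-hour window are assumptions chosen for illustration (the Discussion refers to injuries within one hour of sunset); the exact criteria of the cited studies [14,27] are not reproduced, and sunrise and sunset are assumed to be supplied for the injury date and location.

```python
from datetime import datetime, timedelta

def light_condition(event: datetime, sunrise: datetime, sunset: datetime,
                    window: timedelta = timedelta(hours=1)) -> str:
    """Classify an injury time against same-day sunrise and sunset.

    The categories and window are hypothetical, not the study's definitions.
    """
    if sunrise <= event <= sunset:
        return "daylight"
    if sunset < event <= sunset + window:
        return "within one hour after sunset"
    if sunrise - window <= event < sunrise:
        return "within one hour before sunrise"
    return "dark"
```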
Data from the TAC were used to characterise the injury event, including the claimant's text description of the injury event, the number of claimants and vehicles involved in the collision, and fault attribution by the claimant and in the police report. The claim lodgement process includes the question "Was another vehicle at fault in the accident?" with responses of "yes", "no", or "unknown". Fault status therefore did not capture the injured person's own (or their vehicle's) role in the collision. The claimant and police responses were used to categorise participants into the following five groups:

Data analysis
The data were processed and analysed using QDA Miner version 5, Wordstat 7.1.22 and Stata version 15.0. Associations between participant demographic, health and injury-related characteristics and fault attribution were examined for each road user group using chi-square and Kruskal-Wallis tests, to provide a descriptive overview of characteristics associated with fault attributions. The text descriptions of injury events were analysed in Wordstat. The text was first corrected for spelling and grammatical errors, and then processed using lemmatization and categorisation dictionaries that were iteratively developed by examining the keywords and phrases that appeared in the corpus. Cluster analyses were used to help identify the terms used to describe each type of injury event, and these terms were exported to Stata for analysis. A detailed explanation of the text analysis methods is available in the S4 File; in brief, the text analysis followed the six steps outlined below:
1. Removal of punctuation and symbols, and correction of spelling errors.
2. Applying the exclusion list (S1 File) for cluster and keyword co-occurrence analyses.
3. Viewing keywords in context and applying an adaptation of the English lemmatization dictionary (S2 File) developed by Provalis Ltd to consolidate the terms to be processed.
4. Developing a categorisation dictionary (S3 File) based on the review of keywords and phrases so that semantically similar concepts could be analysed as a single category of terms.
5. Undertaking multiple iterations of exploratory cluster analyses for each road user group to review the commonly co-occurring keywords and category terms. Cluster terms were then reviewed in context to identify the combinations of keywords and category terms that are used to describe similar types of injury events.
6. Extracting keyword occurrences to Stata for classification of injury events, and identification of the relevant VicRoads Definitions for Classifying Accidents [DCA; 28] categories.
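The dictionary-driven steps 1 to 4, and the co-occurrence similarities that the cluster analyses in step 5 start from, can be sketched in Python as below. The exclusion, lemmatization and categorisation dictionaries are tiny hypothetical stand-ins for the S1–S3 Files, spelling correction is omitted, and Jaccard's coefficient is used as one common co-occurrence measure; this is not a reproduction of Wordstat's implementation.

```python
import re
from collections import Counter
from itertools import combinations

# Tiny hypothetical stand-ins for the exclusion (S1), lemmatization (S2)
# and categorisation (S3) dictionaries.
EXCLUSION = {"a", "an", "and", "the", "i", "my", "me", "was", "on", "of", "to", "by", "into"}
LEMMAS = {"hitting": "hit", "struck": "hit", "lost": "lose", "losing": "lose",
          "skidded": "skid", "veered": "veer"}
CATEGORIES = {"car": "VEHICLE", "ute": "VEHICLE", "truck": "HEAVY_VEHICLE",
              "tree": "TREE", "pole": "POLE", "gravel": "EARTH_MATTER", "dirt": "EARTH_MATTER"}

def preprocess(description: str) -> list[str]:
    # Step 1: lower-case and replace anything that is not a letter with a space
    # (spelling correction is not attempted here).
    text = re.sub(r"[^a-z\s]", " ", description.lower())
    # Step 2: drop terms on the exclusion list.
    tokens = [t for t in text.split() if t not in EXCLUSION]
    # Step 3: consolidate inflected forms with the lemmatization dictionary.
    tokens = [LEMMAS.get(t, t) for t in tokens]
    # Step 4: map semantically similar terms onto a single category.
    return [CATEGORIES.get(t, t) for t in tokens]

def cooccurrence_jaccard(docs: list[list[str]]) -> dict[tuple[str, str], float]:
    """Jaccard similarity of each term pair, counted over documents."""
    term_sets = [set(doc) for doc in docs]
    doc_freq = Counter(term for terms in term_sets for term in terms)
    pair_freq = Counter(pair for terms in term_sets
                        for pair in combinations(sorted(terms), 2))
    # Documents containing both terms / documents containing either term.
    return {(a, b): n / (doc_freq[a] + doc_freq[b] - n)
            for (a, b), n in pair_freq.items()}
```

Highly similar pairs (for example an earth matter category co-occurring with "TREE") would then be reviewed in context, as in step 5, before any classification rule is written.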
The DCA is used to classify collision and near-collision events from the perspective of the driver of a vehicle, and the key categories for this study included: pedestrian impacts (DCA 100-108); turning vehicles from an adjacent direction (DCA 110-118) or from the opposing direction (DCA 121-125); head-on collision (DCA 120); collisions in the same direction (i.e., rear-end collisions; DCA 130-137); manoeuvring (e.g., U-turns, leaving or entering a car park; DCA 140-148); overtaking or changing lanes (DCA 150-154); collision with an animal (DCA 167); events in which the vehicle lost control on path (DCA 160-167), or off path on a straight (DCA 170-175) or on a curve (DCA 180-184); and other miscellaneous events involving falls from a vehicle, being struck by a load falling from a vehicle, hitting a train, tram or railway infrastructure, or being hit by a runaway parked car. Injury events could be classified into more than one scenario. Injury events were analysed separately for each road user group, and were summarised using frequencies and percentages.
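Because an event could fall into more than one scenario, classification is multi-label, and percentages within a road user group need not sum to 100%. The Python sketch below assigns every label whose trigger keywords appear in a description and then summarises frequencies and percentages per group. The rules and labels are hypothetical and far coarser than the DCA-based classifications used in the study.

```python
import re
from collections import Counter, defaultdict

# Hypothetical keyword rules; far coarser than the study's DCA-based classifications.
RULES = {
    "rear-end": {"rear", "behind", "rearended"},
    "animal": {"kangaroo", "wombat", "deer", "animal"},
    "lost control": {"lost", "skidded", "veered"},
    "tram/train": {"tram", "train", "tracks"},
}

def classify(description: str) -> set[str]:
    """Return every label whose trigger words occur in the description."""
    tokens = set(re.findall(r"[a-z]+", description.lower()))
    return {label for label, triggers in RULES.items() if tokens & triggers}

def summarise(cases):
    """cases: iterable of (road_user_group, description) pairs.

    Returns {group: {label: (count, percent_of_group)}}; percentages can sum
    to more than 100 because one event may carry several labels.
    """
    group_totals = Counter()
    label_counts = defaultdict(Counter)
    for group, description in cases:
        group_totals[group] += 1
        label_counts[group].update(classify(description))
    return {
        group: {label: (n, round(100 * n / group_totals[group], 1))
                for label, n in label_counts[group].items()}
        for group in group_totals
    }
```

A trigger such as "lost" would also match "lost consciousness", which is why reviewing keywords in context (step 3) matters before relying on rule-based counts.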

Cohort overview
There were a total of 9,754 cases admitted to hospital following road traffic injuries who met major trauma criteria for inclusion in the VSTR between 1 July 2010 and 30 June 2016. Of those cases, 3,941 were excluded from this study due to the injury and compensation claim eligibility criteria (Fig 1). Of the 5,813 motor vehicle drivers, motorcyclists, pedal cyclists and pedestrians with a compensation claim who met the inclusion criteria, only the 2,486 whose claim included a description of the injury event could be included in the study. A higher proportion of cases whose claim did not include a description of the injury event spoke English as their preferred language, lived in neighbourhoods with greater disadvantage, were working pre-injury, had a pre-existing substance use or mental health condition, sustained more severe injuries, were involved in collisions with fewer vehicles, were injured in the evening hours, or were injured when another vehicle was at fault (Table 1). Text descriptions were only available for people injured between 2013 and 2016, except for two cases from 2011–12.
The cohort had a median age of 43 years (Q1–Q3: 28–60) and was predominantly male (n = 1782, 71.7%). Motorcyclists had the youngest median age of all injured road users (median (Med) = 38, Q1–Q3: 27–50) compared with motor vehicle drivers (Med = 44, Q1–Q3: 28–66) and pedal cyclists (Med = 44, Q1–Q3: 34–74), and pedestrians were the oldest injured road users (Med = 54, Q1–Q3: 30–74), p<0.001. A higher proportion of motorcyclists were male (95.2%) compared with pedal cyclists (79%), motor vehicle drivers (60%) and pedestrians (53%), p<0.001. Eighty-eight percent of injured pedal cyclists and pedestrians lived in metropolitan areas, compared with 62.8% (n = 712) of motor vehicle drivers and 76.9% (n = 598) of motorcycle riders, p<0.001. While 69% of pedal cyclists were living in neighbourhoods in the highest two quintiles of socioeconomic position, the other road user groups were relatively

The characteristics associated with fault attribution groups for motor vehicle drivers (n = 1150), motorcyclists (n = 786), pedal cyclists (n = 196) and pedestrians (n = 354) are reported in Tables 2–5, respectively. For motor vehicle drivers, the only characteristic that differed across fault attribution groups was that a larger proportion of people who were working pre-injury were injured when another was at fault, or claimed that another was at fault. Even though the majority of drivers did not have a substance use (91.9%) or mental health condition (92.3%), it was notable that a higher proportion of people injured in collisions where no other was at fault did have a substance use condition. A higher proportion of drivers who were the sole claimant or who were injured in a single vehicle collision reported that no other vehicle was at fault, denied that another was at fault, or did not know if another was at fault. A higher proportion of motorcyclists who were injured at the fault of another were working pre-injury, and they were a median of 7–11 years older than the other fault groups.
A higher proportion of motorcyclists who had a pre-existing substance use condition were injured in collisions in which no other was at fault, and a larger proportion of motorcyclists with unknown fault had pre-injury disability. Most motorcyclists were the sole claimant from their injury event; however, a higher proportion of motorcyclists with no other at fault, or who denied another was at fault, were injured in single vehicle collisions compared with the other fault groups.
Most pedal cyclists (61.7%) and pedestrians (49.4%) reported unknown fault. No demographic, health or injury characteristics varied across fault groups for pedal cyclists. Pedestrians who reported unknown fault were a median of 20 years older and were predominantly unemployed pre-injury compared with pedestrians who reported whether or not another party was at fault. A higher proportion of pedestrians who reported that no other was at fault, or denied that another was at fault, had a CCI-weighted condition pre-injury.
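The comparisons across road user and fault groups rely on chi-square tests for categorical characteristics and Kruskal-Wallis tests for continuous ones such as age. For readers who want to see what these statistics compute, the stdlib-only Python sketch below returns the uncorrected Pearson chi-square statistic for a contingency table and the tie-corrected Kruskal-Wallis H, each with its degrees of freedom. It is an illustration, not the Stata code used in the study.

```python
from collections import Counter
from itertools import chain

def _average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def kruskal_wallis(*groups):
    """Tie-corrected Kruskal-Wallis H and its degrees of freedom (k - 1)."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = _average_ranks(pooled)
    rank_sum_term, offset = 0.0, 0
    for group in groups:
        group_ranks = ranks[offset:offset + len(group)]
        rank_sum_term += sum(group_ranks) ** 2 / len(group)
        offset += len(group)
    h = 12 / (n * (n + 1)) * rank_sum_term - 3 * (n + 1)
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - tie_term / (n ** 3 - n)), len(groups) - 1

def pearson_chi_square(table):
    """Uncorrected Pearson chi-square statistic for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df
```

P-values follow from the chi-square distribution with the returned degrees of freedom, for example via `scipy.stats.chi2.sf(stat, df)`. The H statistic is undefined when every observation is tied, because the tie correction then divides by zero.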

Injury events
The total text corpus for injury event descriptions contained 40,057 words (Table 6). Each case contained a median of 11 words (Q1–Q3: 6–18). Across all road user groups, 229 (9.2%) cases had no recollection of the injury event (Fig 2). Sixty-eight percent of motorcyclists and 84% of motor vehicle drivers with no recollection of the injury event were injured in collisions where no other was at fault, or in which they denied another was at fault. By contrast, 48% and 57% of pedestrians and pedal cyclists, respectively, were injured in collisions where the fault of another vehicle was unknown.
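Corpus summaries of this kind can be computed directly from the descriptions. The Python sketch below counts whitespace-separated words per description and reports the total, median and quartiles. Wordstat's tokenisation and Stata's percentile definition may differ from Python's `str.split` and `statistics.quantiles`, so small discrepancies from the reported figures would be expected.

```python
import statistics

def corpus_summary(descriptions):
    """Total words, median words per description, and quartiles (Q1, Q3)."""
    counts = [len(text.split()) for text in descriptions]
    q1, _, q3 = statistics.quantiles(counts, n=4)  # Python's default 'exclusive' method
    return {"total_words": sum(counts), "median": statistics.median(counts),
            "q1": q1, "q3": q3}
```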
Most injury events occurred during daylight hours (Fig 3). There were no apparent differences in the number of injuries occurring throughout the week for motor vehicle drivers; however, a larger proportion of motorcyclists were injured on the weekend than during the week, a larger proportion of pedal cyclists were injured between Tuesday and Thursday, and pedestrian injuries peaked on Fridays.

Motor vehicle driver injury event classifications
There were 34 motor vehicle driver collision classifications. The number of vehicles and fault attribution groups involved in each type of collision varied, as described below and shown in Fig 4 for the most common collision types. The majority of driver collisions in which no other was at fault, or the driver denied that another was at fault, involved losing control of the vehicle and/or veering off the road, including losing control when driving over or onto earth matter (e.g., dirt, gravel or stones) or plant matter (e.g., branches on the road) on or by the side of the road. All but one collision type occurred on a road, street or highway according to the VSTR place of injury classification. Other circumstances with no other at fault involved poor weather, road conditions, or limited visibility; having a mechanical failure; hitting a tram. Mechanical failures for motor vehicle drivers primarily included having a flat or blown tyre, failed brakes, or the driver's foot being stuck on the accelerator. Driver injury events in which there was predominantly another at fault, or a claim that another was at fault, involved another vehicle entering the road or the driver's lane; being hit while the driver's vehicle was stationary or slowing in traffic; being in a rear-end or T-bone collision; or losing control when trying to avoid an obstacle or another vehicle. Driver collision classifications with a higher proportion of cases (>10% of the collision scenarios) with unknown fault included scenarios in which the driver was hit while stationary; the driver hit a pole; the collision involved hitting or being hit by another vehicle; or there was a T-bone or rear-end collision with another vehicle. In particular, collisions involving two or more vehicles had a high proportion of cases with unknown fault (35.3% of all cases involving two or more vehicles).
A large number of cases involved a collision with a tree (n = 166), pole (n = 55) or other type of barrier (e.g., fence, house, wall or safety railing; n = 20). Sixty-two cases involved a collision with a heavy vehicle, and the vehicle rolled in 91 "loss of control" injury events. Collision classifications with a frequency of <10 cases that are not depicted in Fig 4 included: cases who reported being unable to stop in time to avoid an obstacle; loss of control while towing a trailer or caravan, or travelling around a corner or bend; collision with a rock on or at the side of the road; collision while reversing; being hit by another vehicle and going into an embankment; and disclosure that the driver was under the influence of illicit substances or alcohol.

Motorcyclist injury event classifications
There were 35 motorcyclist collision classifications, of which 31 are summarised in Fig 5. Crash classifications where there was predominantly no other at fault, or where motorcyclists denied that another was at fault, included scenarios in which the motorcyclist described that they fell off or lost control while avoiding, or attempting to avoid, an obstacle; or lost control after riding on gravel or earth matter, or when travelling around a bend or corner. Of the 68 motorcyclists who lost control on gravel or earth matter, 42 were riding on a road, street or highway, only two of whom reported in the injury event description that the road had a dirt or gravel surface. Other circumstances with no other at fault included motorcyclists who had a mechanical failure; had poor visibility, road or weather conditions (88.2% of which involved wet weather); slipped on tram tracks; or collided with, or attempted to avoid, an animal, most of which involved kangaroos (n = 24, 68.6%). Three hundred and twelve (39.7%) motorcyclist injuries involved the motorcyclist losing control, of which 225 (72.1%) motorcyclists reported no other was at fault. Mechanical failures for motorcyclists predominantly included the brakes or wheels locking up or failing, stalling the motorcycle, or other mechanical factors (e.g., the accelerator getting stuck). Collision classifications in which there was predominantly no other at fault also included events in which the motorcyclist was injured while executing a jump (e.g., at a motocross track), or hit a kerb or gutter, fence or wall, railing or barrier, pole or street sign, tree, or stationary vehicle. Claimants injured in collisions occurring at an intersection or roundabout predominantly reported that no other was at fault (51.7%), with 14 (48.3%) of those events occurring at a roundabout and only 9 (9.4%) referring to a turning vehicle.
Collision classifications where there was predominantly another at fault, or the motorcyclist claimed that another was at fault, included collisions with: a turning vehicle or while turning; a vehicle turning from an adjacent street or driveway; a vehicle in or exiting a car park; a vehicle that was merging or changing lanes; or an oncoming vehicle. It should be noted that collisions involving a turning vehicle could have resulted in injury to the motorcyclist because they collided with the other vehicle, or because they lost control avoiding that vehicle and fell from the motorcycle. Most types of injury events included less than a quarter of cases with unknown fault. The remaining collision classifications not shown in Fig 5 included events where the claimant described falling off their bike but provided no other detail to allow more precise classification of the injury event (n = 38), the majority of which involved no other party at fault or a claimant who denied that another was at fault (n = 29, 76.4%). Fewer than five claimants referred to a driver running a red light.

Table 5. Characteristics of pedestrians stratified by fault attribution, n(%), N = 354.

PLOS ONE

Pedal cyclist injury event classifications
There were 24 cyclist injury event classifications, and only 10 classifications included five or more claimants (Fig 6). While fault status was unknown for the majority of cases (median proportion of cases with unknown fault = 58.6%; range: 47.7–83.3%), collision classifications with particularly large proportions of cases with unknown fault status included events where the cyclist collided with the door of a parked vehicle, when the cyclist was hit from behind by a vehicle travelling in the same direction, or when the cyclist hit a stationary or slowing vehicle in front of them. Less than 20% of all collision classifications involved no other at fault; however, the collisions in which some cyclists reported no other at fault, or denied that another was at fault, involved a cyclist being hit by a turning or oncoming vehicle, occurred while the cyclist was turning or at an intersection, or occurred when a vehicle was stationary or slowing down in front of them (e.g., the vehicle was attempting to enter a car park). Injury event classifications in which there was predominantly another at fault, or the cyclist claimed that another was at fault, occurred when there was a head-on collision; when a vehicle drove into the path of the cyclist from an adjacent direction (e.g., a side street) or from another lane of traffic; when travelling through a roundabout or intersection; or while attempting to avoid another vehicle, cyclist or object. Collision classifications that had fewer than five cases included events in which the cyclist was hit by a reversing vehicle or a vehicle exiting a driveway; when riding or walking across the road at a pedestrian crossing; when the cyclist suddenly braked to avoid an obstacle or vehicle; or when they hit a vehicle that braked suddenly in front of them.
Other collision classifications with few cases included events in which the cyclist or motor vehicle was manoeuvring with a U-turn or hook turn; a vehicle was exiting a car park into traffic; the cyclist veered off the road, or slipped on tram or train tracks; a motor vehicle ran a red light; or there were poor weather conditions.

Pedestrian injury event classifications
There were 13 injury event classifications in which pedestrians sustained injuries, 10 of which comprised five or more cases (Fig 7). Injury events for which a larger proportion of cases included no other at fault, or the pedestrian denied that another was at fault, included circumstances when the pedestrian was hit by a vehicle exiting a driveway, or in a "slow moving" traffic area (e.g., walking through a car park, behind a vehicle reversing into a car park, or at a petrol station). The injury classifications in which a larger proportion of cases reported that another was at fault, or the pedestrian claimed that another was at fault, included circumstances when the pedestrian was exiting or entering their own vehicle; the pedestrian was hit while on a nature strip, median strip, footpath, or outdoor dining area by the side of a road; the pedestrian was hit while at the side of the road (e.g., in an emergency lane, changing a tyre, or waiting to cross); or a vehicle ran a red light. Injury event classifications with a high proportion of cases with unknown fault included a collision with the pedestrian in a slow traffic area; an impact with a tram or train, or with infrastructure around train or tram tracks; or when a pedestrian was hit at the side of the road. Injury event classifications that had fewer than five cases involved the pedestrian falling onto the road, or being injured in a hit and run or through a deliberate act. All cases that recorded that the collision involved a "hit and run" reported unknown fault status, probably because the claimant could not recall the event and the police could not ascertain whether the other vehicle was at fault because it had left the scene.

Discussion
Understanding the circumstances of road trauma events resulting in serious injury, particularly in relation to which vehicle was at fault, is important for advancing road safety and injury prevention strategies. In the present study of compensation claimants who survived a serious injury in Victoria, Australia, between 2010 and 2016, we examined crash circumstances using text mining, and evaluated the relationship between demographic, health, injury and injury event characteristics and fault attribution. Two thirds of motor vehicle drivers and just over half of motorcyclists were injured in collisions where no other vehicle was at fault, or where the claimant denied that another was at fault, and more than a third of those injury events involved a single vehicle. For pedal cyclists and pedestrians, however, fault status was unknown for more than half of all cases. The vast majority of all road traffic injuries occurred during daylight hours or within one hour of sunset, consistent with the fact that 77% of Victorian road use occurs between 7am and 7pm [29]. While there were some common patterns in the relationships between collision circumstances and fault attribution across road user groups, several features were specific to each road user group or type of road traffic collision. We now discuss the implications of the present findings for road safety and injury prevention.
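Classifying an injury event as occurring in daylight or within one hour of sunset requires comparing the event time with the sunrise and sunset times for that date and location. The following is a minimal sketch, assuming those times are already available as `datetime` objects on the same date and in the same local time zone; the function name, the category labels and the symmetric one-hour "dawn" window before sunrise are hypothetical choices for illustration, not the study's coding rules.

```python
from datetime import datetime, timedelta


def classify_light(event_time, sunrise, sunset, window=timedelta(hours=1)):
    """Assign a coarse light category to an event time.

    Hypothetical categories: 'daylight' between sunrise and sunset,
    'dusk' up to `window` after sunset, 'dawn' up to `window` before
    sunrise, otherwise 'dark'. All three datetimes are assumed to share
    the same date and local time zone.
    """
    if sunrise <= event_time <= sunset:
        return "daylight"
    if sunset < event_time <= sunset + window:
        return "dusk"
    if sunrise - window <= event_time < sunrise:
        return "dawn"
    return "dark"


# Illustrative sunrise/sunset times for a single made-up date and location.
day = datetime(2015, 6, 21)
sunrise = day.replace(hour=7, minute=35)
sunset = day.replace(hour=17, minute=8)
for hour, minute in [(12, 0), (17, 45), (7, 0), (22, 30)]:
    t = day.replace(hour=hour, minute=minute)
    print(t.strftime("%H:%M"), classify_light(t, sunrise, sunset))
```

Keeping the window as a parameter makes it easy to test how sensitive the daylight/dusk split is to the one-hour cut-off.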

Implications for road safety
Understanding the circumstances leading to road traffic injury is crucial to improving road safety and to implementing countermeasures that reduce both the incidence and severity of road trauma [13,30]. Most road traffic injuries are thought to occur due to human error [5,31], even when environmental conditions or engineering and infrastructure factors are recognised to have played a role. Therefore, road safety strategies must consider how to reduce injury risk due to human error through all aspects of the Safe System approach, from improving road and infrastructure conditions to engineering or behavioural interventions and post-crash care, to reduce both the incidence and impacts of human error.
Not surprisingly, road traffic injuries were common in circumstances with heightened opportunities for conflict between road users, including being hit by another vehicle changing lanes or emerging from an adjacent direction, or when approaching intersections or roundabouts. In many of these instances a large proportion of cases were injured when another vehicle was at fault, or the injured person claimed that another was at fault, perhaps due to perceived human error or recklessness by the other driver. For instance, these injury events often involve driver distraction, failing to leave a safe braking distance, or misjudging the location of vehicles in other lanes [32]. Previous studies have shown that motor vehicle collision frequency is increased on roads with shared or multiple lanes [33,34] and at or approaching intersections [35]. In particular, intersection collisions often involve vehicles driving too closely behind another vehicle, turning or crossing in front of oncoming traffic, or violating traffic signals [32,35,36]. For cyclists, one of the most common on-road collision events occurs when vehicles merge into or across the path of a cyclist [12], and the motor vehicle driver is typically at fault [37]. "Dooring" injuries were infrequent for cyclists in the present cohort, which is consistent with previous studies in Australia [12,38]. However, we recognise that cyclists may avoid dooring injuries by riding closer to traffic on roads with parked cars than they do on roads without parked cars [39]. Road safety may be enhanced with traffic calming strategies that reduce traffic volume and speed near intersections and collision hotspots for road users faced with turning or merging traffic, especially cyclists [34]. Moreover, protected infrastructure for pedal cyclists that separates them from both parked cars and traffic may reduce both the frequency and severity of cyclist injuries.
Pedestrians were generally older than the other road user groups, and were most often injured when crossing the road. Older pedestrians are known to be particularly vulnerable to serious injury when crossing the road [7], possibly because older adults typically need more time to safely reach the other side of the road, particularly those with comorbid conditions or frailty. Nature strips, traffic islands and median strips are common traffic calming measures used to promote pedestrian safety, and may provide a place of refuge for pedestrians who can cross the road in stages [6]. However, we found that five percent of pedestrians reported that they were injured on a median strip, footpath or outdoor dining area by the side of a road. In the absence of scene analysis, we speculate that most of those injury events occurred due to driver error, in which the driver breached the kerb or median and struck the pedestrian, rather than reckless pedestrian behaviour. Moreover, while the specific road features of the injury locations were not known, previous studies have shown that the protective effects of median strips are lost if the median is not raised above the road surface, or if it is less than 150 cm wide [34]. An analysis of the locations where pedestrians are more often injured near or on median strips should allow the identification of hotspot areas requiring enhanced safety strategies.
Nearly half of injured motor vehicle drivers, and one in ten motorcyclists, reported losing control in the collision. In most of those cases, there were no other vehicles in the collision and no other vehicle was reported to be at fault. Rather, most people described situations in which they lost traction when the vehicle veered onto the gravel shoulder of the road or into an embankment, or in which they veered off the road and hit another obstacle or rolled their vehicle. While pedal cyclist injuries have often been found to occur after the cyclist loses control in Canadian [40] and Australian studies [11,41], these descriptors did not emerge in the text mining of the present pedal cyclist injury event descriptions. For motorised vehicles, previous research has found that single vehicle road traffic injuries often occur on roads with low shoulder width and sight distance where, presumably, the risk of losing traction or control of the vehicle is heightened if the driver veers onto the roadside shoulder [14]. We could not find any previous studies that reported the prevalence or circumstances of "loss of control" collisions for motorised vehicles. Most modern vehicles include safety features that detect when the driver is at risk of losing control (e.g., due to loss of traction or oversteering); however, many people in the community lack these safety mechanisms, given that 43% of vehicles registered in Australia are more than 10 years old [42] and electronic stability control systems only became mandatory for new vehicle registrations in Victoria in 2011 [43].
Given that many of the loss-of-control injury events occurred when drivers veered off the road, or when motorcyclists were travelling around a bend or corner, environmental interventions (e.g., rumble strips, roadside barriers, or other traffic calming interventions) and behavioural interventions (e.g., safety campaigns via billboards in high risk zones, or broadcast via television, radio or social media) may have the greatest effect at reducing the risk of drivers losing control of their vehicle.
Many motor vehicle and motorcyclist collisions involved interactions with unsafe road conditions, losing control at the roadside shoulder or due to debris on the road, and collisions with trees, fences, poles or barriers at the roadside. One key safety target, therefore, could be to improve protection from fixed roadside objects by installing roadside barriers, reducing speed limits, or widening the sealed shoulder of rural roads [9]. That said, it is likely that the presence of roadside barriers reduced the severity of injuries sustained in those collisions (e.g., by preventing the injured person from colliding with oncoming traffic), and we should not assume that the presence of roadside barriers caused those collisions.
People often lost control in injury events in which they attempted to avoid a hazard or animal, especially kangaroos. In Australia, kangaroos are the most prevalent animal counterpart in road traffic collisions [30,44], and the incidence of animal-vehicle collisions has increased since 2017 in Victoria [30]. The implementation of countermeasures is therefore a critical priority to reduce the burden from these types of injury events. Given that most animal-related injuries involved kangaroos, and that injury risk is known to be heightened during the winter months and between dusk and midnight [44], injury prevention strategies should target high risk rural road networks and the road users who are most likely to encounter kangaroos. Countermeasures may include reduced speed limits between dusk and midnight, and reviews of vehicle safety and driver skills, particularly for drivers who travel in high risk zones at high risk times of the day. Installation of roadside barriers in high risk zones could also prevent drivers from veering into oncoming traffic or trees at the roadside while avoiding a kangaroo. While installation of underpasses or overpasses that allow animals to cross roads can reduce risk in some settings [45], this is not feasible for Australia's extensive rural road network [44]. Moreover, any such roadside barriers would need to account for the capacity of kangaroos to jump 2-3 metres in height. While driver education could reduce the incidence of animal-vehicle collisions, education-based strategies have not achieved these safety outcomes in other countries [46], and would require critical evaluation before large-scale investment in Australia.
Inclement weather conditions generally increase the risk of road traffic collisions, with no apparent difference in risk between moderate and heavy conditions [for a review, see 13]. Weather conditions increase collision risk predominantly by reducing traction and the visibility of hazards and other road users [47]. While only a small number of collisions cited a role of inclement conditions, we recognise that traffic volumes are typically reduced during periods of heavy fog, strong wind or rainfall. In particular, very few cyclists and only 20 motorcyclists referred to inclement conditions in their collision, which is consistent with previous studies [12,41]. This suggests not only that fewer people probably choose to ride in inclement conditions, but also that those who do may be more experienced, fitter or more cautious, and therefore less likely to sustain an injury [48]. The prevention of road traffic injuries due to weather or visibility could be facilitated through targeted safety campaigns, broadcasting and traffic mitigation techniques during periods of poor weather, especially in winter or during heavy rainfall, and by adjusting speed limits in hazard hotspots during inclement conditions [47]. Moreover, safety features in modern vehicles can reduce injury risk in all conditions, but particularly in wet conditions [32]. These features include anti-lock braking systems, lane departure warning, traction control, and reversing cameras and warning systems, as well as improved tyre quality and maintenance.
While we did not have information on the type or age of the vehicles involved in the present collisions, a small number of motor vehicle and motorcyclist collisions involved mechanical failures, highlighting the need for regular maintenance, particularly of braking systems and tyres, given that these were the primary mechanical issues reported. Previous studies have shown that younger drivers may be particularly vulnerable to serious road traffic injury because they tend to drive older vehicles [10]. While few prior studies have examined the role of mechanical failures in road traffic injuries, a Canadian study published more than ten years ago also reported that mechanical failures were infrequent causes of serious road traffic injuries [49].

Implications for understanding fault attribution
There are two major theories on how people attribute fault following a road traffic injury. First, the actor-observer effect is thought to generate biases in attributions of personal responsibility versus the responsibility of others. That is, people typically focus on the impact of situational and contextual factors (e.g., the weather, road conditions, unexpected hazards) on their own behaviour when they were at fault, but focus on intrinsic or dispositional characteristics of the other person when the other person is at fault [50]. Second, Defensive Attribution Theory proposes that, when describing an injury event, people tend to identify features of the event that were controllable and preventable, with a bias towards believing that another party was responsible for preventing the injury, particularly when the consequences are severe [51]. While it was not possible to quantitatively evaluate these biases in the present study, people who did not attribute fault to another vehicle appeared to predominantly describe the role of the environment or precipitating factors in the injury event, or their loss of control due to those factors. In contrast, people who were injured when another was at fault typically described the other driver's contribution to the collision and the nature of the impact. It did not appear, however, that many people used emotive language or made statements about intrinsic characteristics of the other driver, with the exception of a handful of cases (e.g., ". . . was turning right with a green arrow . . . an idiot driving an [imported model car] did not stop for her red light and she plowed [sic] into me").
Few previous studies have examined which characteristics are associated with fault or responsibility attributions after injury. In one study, Gabbe et al. [52] reported characteristics associated with fault attribution in people who were hospitalised for orthopaedic road traffic injuries. Men were more often injured when no other party was at fault, or when they denied that another was at fault. A larger proportion of people injured when another was at fault, however, had a university education, were injured while cycling, and did not sustain a serious injury. Similarly, in a study in Florida, USA, women and "not at fault" drivers were more likely to be injured in motor vehicle collisions, but were less likely to cause injuries to drivers of other vehicles [53]. Another recent study in Victoria, Australia, examined characteristics associated with being personally responsible for events resulting in serious injury [54]. People predominantly reported low levels of personal responsibility if they were female or if their injury was compensable. In contrast, high levels of personal responsibility were associated with injuries from falls or motorcycle collisions, compared with motor vehicle collisions, and with a pre-existing substance use condition. Unknown personal responsibility was associated with sustaining a head or spinal cord injury. Together with the present findings, therefore, it appears that people are more likely to be injured when another person is at fault if they are female and have higher socioeconomic position (i.e., higher levels of education, employment, or living in a neighbourhood with higher levels of socioeconomic advantage).

Study strengths and limitations
This registry study has several strengths, particularly the inclusion of a very large sample of people who sustained serious injury after a road traffic collision. However, more than half of the potential participants could not be included because their compensation claim did not contain a text description of the injury event; the majority of these people were injured in 2011-2012. In particular, the study disproportionately excluded eligible cases whose preferred language was English, who had a pre-existing substance use or mental health condition, who sustained more severe injuries, or who were injured in events involving fewer vehicles, occurring in the evening, or in which another vehicle was at fault. The present results may therefore under-represent the types or prevalence of collisions in which another was at fault.
Even with relatively well-structured text data, text mining is often described as part 'art' and part 'science' [55]. Innumerable decisions are made when mining text data to refine and improve classifications through iterative data processing and dictionary development, and it is not possible to record and quantify every one of those steps and decisions. In natural language processing and text mining research, we typically seek to use coding rules and machine learning-based processes, including robust dictionaries and coding rules generated using both a training set and a test set from the corpus, which may also be validated against an alternative coding system [56]. When preparing the dataset for text analysis, we concluded that it would be unwise to use natural language processing and machine learning techniques to extract meaning from the grammatical structure of the injury event descriptions, or to make assumptions about the actions of the various parties involved in an injury event, because we often could not make these assumptions ourselves when reading the descriptions. Moreover, in several cases the injury event description only referred to the ambulance or police reports, which were not available to the researchers, and 229 people simply indicated in their text description that they could not recall the injury event. Many claimants described the actions of their own vehicle or person, relative to those of other parties, in vague terms; e.g., in collisions that involved a turning vehicle it was not always clear whose vehicle was turning or in which direction it was turning relative to the other vehicles involved. In some cases multiple claimants from the same injury event appeared to have the same text description. We assume that these were probably linked claims for family members in which individual claimants did not or could not provide a unique accident description.
Therefore, we could not use the text to make assumptions about the individual claimant's personal attribution of fault. These are common problems in text descriptions of injuries in claims data [55]. Unfortunately, the injury event classifications that are routinely generated by police and government transport authorities, such as DCA coding [28], were not available to the study team, so the injury event classifications that we generated could not be validated. Instead, we regularly cross-checked the classifications against the full text descriptions to ensure that they were accurate.
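Dictionary-based classification of this kind assigns a free-text injury description to every category whose keyword patterns it contains. The sketch below is illustrative only: it is written in plain Python rather than QDA Miner or WordStat, and the category names and keyword patterns are hypothetical examples, not the dictionaries developed in this study. Descriptions that match no category (e.g., claimants who could not recall the event) are left unclassified rather than guessed.

```python
import re

# Hypothetical dictionary: category -> regular-expression keyword patterns.
DICTIONARY = {
    "lost_control": [r"\blost control\b", r"\bskid(?:ded|ding)?\b", r"\bloss of traction\b"],
    "animal": [r"\bkangaroos?\b", r"\bwombats?\b"],
    "rear_end": [r"\brear[- ]end(?:ed)?\b", r"\bhit (?:me )?from behind\b"],
}

# Compile each pattern once, ignoring case.
COMPILED = {
    category: [re.compile(p, re.IGNORECASE) for p in patterns]
    for category, patterns in DICTIONARY.items()
}


def classify(description):
    """Return the sorted list of categories whose patterns match the text."""
    return sorted(
        category
        for category, patterns in COMPILED.items()
        if any(p.search(description) for p in patterns)
    )


examples = [
    "Swerved to avoid a kangaroo and lost control, hit a tree",
    "Stopped at lights and was rear-ended by a truck",
    "Cannot recall the accident",
]
for text in examples:
    print(classify(text) or ["unclassified"], "-", text)
```

Because a description can match several categories, the output is a list; in practice each match would still be cross-checked against the full text, as described above.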
Finally, because the text data were taken from claims submitted to the compensation scheme, the text descriptions probably lacked details about potentially negligent or reckless behaviour by the claimant during the injury event (e.g., mobile phone use, fatigue, or drug/alcohol use), either because claimants were not asked about it when making their claim, or because they wanted to reduce the risk of being charged with dangerous driving causing serious injury or death, or to maximise their entitlements to additional benefits (e.g., loss of earnings) or common law damages. Moreover, the median length of the text descriptions was 11 words, which highlights that the majority of claimants did not provide detailed descriptions of the injury event.
Despite these limitations, this study has generated novel insights and resources through iterative text mining and processing. The comprehensive dictionaries that we developed could be used by compensation schemes, or by other researchers, seeking to understand the patterns and incidence of different collision classifications using routinely collected administrative or insurance data. We now recommend that future studies extend the present methods with predictive modelling to examine how fault attribution, collision characteristics (e.g., single vehicle versus multiple vehicles) and circumstances (e.g., environmental conditions, driver characteristics) relate to a range of outcomes, including crash severity, injury severity, survival rates and long-term outcomes.

Conclusions
The present study provides a novel overview of compensable road traffic collision circumstances resulting in serious injury. Using text mining, we characterised the predominant types of road traffic injury events that occur when another vehicle is, or is not, at fault, and identified potential strategies to enhance road safety. Future research should now examine whether the implementation of countermeasures has played a role in reducing the incidence and severity of the targeted types of collisions, from animal-related and loss-of-control collisions through to multi-vehicle, vehicle-cyclist, and vehicle-pedestrian conflicts.

Acknowledgments
for obtaining the sunrise and sunset times from Geoscience Australia to classify the time of day of injury events. We also acknowledge the TAC road safety and business intelligence teams, which provided important guidance for the analysis and synthesis of the injury events.