The effects of sleep loss on young drivers’ performance: A systematic review

Young drivers (18–24 years) are over-represented in sleep-related crashes (comprising one in five fatal crashes in developed countries), primarily due to decreased sleep opportunity, lower tolerance for sleep loss, and ongoing maturation of brain areas associated with driving-related decision making. Impaired driving performance is the proximal cause of most car crashes. The body of evidence examining the effects of sleep loss on young drivers’ performance is still limited, with discrepancies in the methodologies used and in the definition of outcomes. This study aimed to identify the direction and magnitude of the effects of sleep loss on young drivers’ performance, and to appraise the quality of current evidence, via a systematic review. Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) approach, 16 eligible studies were selected for review, and their findings were summarised. Next, critical elements of these studies were identified, and the Grading of Recommendations Assessment, Development and Evaluation (GRADE) guidelines were augmented to rate those elements. Using those criteria, the quality of individual papers was calculated, and the overall body of evidence for each driving outcome was assigned a quality ranking (from ‘very low’ to ‘high quality’). Two metrics, the standard deviation of lateral position and the number of line crossings, were commonly reported outcomes (although within an overall ‘low-quality’ body of evidence), with significant impairments after sleep loss identified in 50% of studies. While speed-related outcomes and crash events (both supported by very low-quality evidence) increased under chronic sleep loss, discrepant findings were reported under conditions of acute total sleep deprivation.
It is crucial to obtain more reliable data about the effects of sleep loss on young drivers’ performance by using higher-quality experimental designs, adopting common protocols, using consistent metrics, and reporting findings according to the GRADE criteria and the PRISMA statement.
Key words: Young drivers, sleep loss, driving performance, PRISMA, GRADE, systematic review.


Introduction
Sleepiness is a primary cause of road crashes [1–3], underlying an average of 20% of all crashes in developed countries [2,4–8] (17% in Australia [9,10] and 25% in the UK [11,12]). Road crashes impose a huge economic and social burden on modern societies, estimated at $1,855 billion per year globally [13]. Based on conservative estimates derived from police reports, sleep-related crashes cost $12.5 billion in monetary losses in the USA annually [14]. However, these figures are likely the tip of the iceberg, with actual costs potentially reaching $29.2 to $37.9 billion in the USA [15]. Sleepiness is mainly induced by sleep deprivation [16], whether from total sleep loss, partial sleep loss, extended wake duration, or sleep fragmentation and sleep disturbances.
Young drivers (those aged 18–24 years) are generally at higher risk of road crashes than older drivers [2,17–19], with an estimated crash risk 2 to 10 times that of other age groups [5,20]. Young drivers also account for a greater proportion of driver fatalities. Specific characteristics of young drivers, such as the late maturation of the brain's decision-making areas [21,22], slower reaction times while sleepy [23,24], and a lower tolerance for sleep loss than older adults [25], result in greater vulnerability to sleep deprivation [26,27] and hence their over-representation in sleep-related crashes [20,26,28].
To the best of the authors' knowledge, only three systematic reviews of the effects of sleep deprivation on driving tasks have been published. The first review examined the effect of driver sleepiness (from shift work, excessive daytime sleepiness and sleep loss) on crash rates, but not on any other specific index of driver performance [29]. This review included 18 cross-sectional and case-control studies, only one of which examined the effect of sleep loss on crash rate. Owing to small sample sizes, biases and aspects of their designs, the included papers generally could not draw robust conclusions about the relationship between fatigue and crash rate, and could not identify a strong effect of sleepiness on crash rate [29]. The second review investigated the effect of sleepiness on driving performance outcomes to determine whether such outcomes could reliably predict driver sleepiness on the road. This review applied broad inclusion criteria for participants, cause of sleepiness (sleep loss vs fatigue from time-on-task), driver experience (professional driver vs road user), and sleep disturbance (shift worker vs non-shift worker). The authors found that most studies had examined simple performance measures, such as the standard deviation of lane position, in controlled experimental settings, with results reported as an average across drivers; individual differences were largely not taken into account [30]. A recent systematic review by the National Sleep Foundation Drowsy Driving Consensus Working Group [31] considered the relationship between the severity of sleep loss and motor vehicle crash involvement for drivers over the age of 15 years. Their consensus conclusion was that drivers would be impaired by 3 to 5 hours of sleep loss incurred during the preceding 24 hours.
Apart from the above-mentioned systematic reviews, about 200 original research papers have been published on the effects of sleepiness or fatigue on driving tasks. However, the effects of sleep loss specifically on young drivers' performance remain uncertain, because (a) more than 50% of these papers did not study sleepiness arising from sleep loss but from other sources, such as time-on-task fatigue or usual daytime sleepiness, or they examined the effect of countermeasures for sleepiness (e.g. light, modafinil, caffeine) rather than the effects of sleepiness itself; and (b) about 40% of papers included a broad range of drivers (professional and non-professional, young and old), or examined only the prevalence of sleepiness or outcome measures other than driving performance. Less than 10% of the existing literature has examined the direct effects of sleep loss on the driving performance of young drivers (18–24 years old).
Given the higher vulnerability of young drivers to sleep-related crashes, and the high cost of sleepiness-related fatalities and severe injuries, it is crucial to systematically review the available body of evidence. A systematic review provides the opportunity to better understand the effects of sleep deprivation on the driving performance of young drivers and to inform future prevention strategies. This paper aims to systematically review all peer-reviewed original research studies on the effects of sleep deprivation on young drivers' driving performance published over the last 12 years, and to rate the quality of the available body of evidence. A preliminary search of the databases revealed that applicable and relevant data about the effects of sleep loss on driving performance outcomes in young adults specifically are largely limited to the last decade. As such, a 12-year period was defined for the inclusion of relevant studies. As sleep loss is a public health problem, the research team agreed that if a meta-analysis was not feasible due to data limitations, an appropriate evaluative approach would be taken to estimate the quality of evidence (i.e. the confidence in current knowledge).
The term 'sleepiness' in this paper also encompasses the broader term 'fatigue'. It is acknowledged that 'sleepiness' could be more precisely distinguished from other conceptualisations of 'fatigue', particularly chronic fatigue [32]. However, in the current review, owing to the coexistence of sleepiness and fatigue after sleep loss [33] and the lack of standard definitions for these terms, the two terms are used interchangeably to denote a 'need for sleep'.

Materials and methods
This systematic review was conducted by the authors based on the PRISMA statement (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) [34]. A protocol was developed for this systematic review but was not registered. First, following the PRISMA statement, the research question, the scope of the study and the inclusion/exclusion criteria were defined. Next, the available literature was systematically screened before eligible studies were selected using the PRISMA flowchart. Finally, the selected papers were reviewed, the quality of the body of evidence was rated, and the effect sizes of sleepiness on drivers' performance were summarised using the GRADE guidelines (Grading of Recommendations Assessment, Development and Evaluation) [35–48]. Two review groups (group 1: SH.SH.S + S.S.S; group 2: M.J.W + V.G.H) conducted each review step independently and reached a consensus before moving to the next step.

Research question
The elements of Population, Intervention, Comparator (control), Outcomes and Study design (PICOS) [34] from the PRISMA statement were considered in developing the research question: "What are the effects of sleep loss on young drivers' driving performance outcome measures?"

Scope of the review, inclusion/exclusion criteria
To answer the research question, specific inclusion and exclusion criteria were set to define the inclusion of original research papers studying the independent effects of sleep deprivation on young adults' driving performance. These criteria were based on characteristics of the papers such as peer-review status, participants, sleepiness exposure, outcome measures, publication date and study design, as well as publication language (Table 1). Because of the increased risk of bias from translating information from other languages into English [49], and the likely modest impact of removing non-English literature on the estimation of effects [50], papers published in other languages were excluded. A specific search statement was developed as follows: [("sleep depriv*" OR "sleep loss" OR "sleep limitation" Or "sleep restriction") AND ("sleepiness" OR drows* OR hypersomnol* OR "sleep onset" OR "excessive sleep*" OR "sleep propensity" OR fatigue* OR microsleep* OR alert* OR vigilance OR hypovigilan*) AND (driver OR simulator OR vehicle OR "commercial driver" OR "professional driver" OR "driver performance" OR "truck driver" OR "bus driver")].

Search strategy and selection of studies
Some databases, such as the Transportation Research Information Database, The Cochrane Library and EMBASE, do not support the asterisk (*) in their search syntax. As such, the complete wordings of key words were used in the search statement for these databases. Filters were applied to narrow the records to peer-reviewed papers published within the last 12 years (from 2004 to 2016). In some cases, the journal websites were checked directly to confirm peer-review processes. The search (via the above databases) was further restricted to English-language papers only. Search alerts were activated where available to update the records automatically. The bibliographies of all identified papers were also examined to identify additional potential papers for inclusion. Eligible papers had to include driving performance outcomes, from either a driving simulator or on-road driving, as primary outcome measures. Driving performance outcomes could be studied individually or along with other objective and subjective determinants of sleepiness. Using the PRISMA 2009 flow diagram [34], all potential papers were first identified via this search strategy. After aggregating all records and removing duplicates, the titles and abstracts of all papers were screened against the inclusion criteria by the two review groups independently. The full texts of selected papers were assessed for eligibility, and the reasons for inclusion/exclusion were recorded by each review group independently. Finally, papers were selected through discussion with other members of the research team, and a consensus approach was used to resolve any discrepancy. Where required, further information was sought from the authors of selected papers to inform these decisions.
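As a minimal sketch of how the three-part Boolean statement can be assembled, and how wildcard terms can be swapped for complete wordings in databases without truncation support, the Python below may help. The function name `build_query` and the table `FULL_WORDS` are hypothetical. The full-word substitutes follow the review's database-specific Statement (1) (e.g. `drows*` becomes `drowsiness`). Real databases differ in syntax (field tags, operator case), so the output would still need adapting per database.

```python
# Hypothetical helper: joins terms with OR inside each group and the
# groups with AND, optionally replacing wildcard terms with full words.

SLEEP_LOSS = ['"sleep depriv*"', '"sleep loss"', '"sleep limitation"', '"sleep restriction"']
SLEEPINESS = ['"sleepiness"', 'drows*', 'hypersomnol*', '"sleep onset"', '"excessive sleep*"',
              '"sleep propensity"', 'fatigue*', 'microsleep*', 'alert*', 'vigilance',
              'hypovigilan*']
DRIVING = ['driver', 'simulator', 'vehicle', '"commercial driver"', '"professional driver"',
           '"driver performance"', '"truck driver"', '"bus driver"']

# Complete wordings used where the asterisk is not supported.
FULL_WORDS = {
    '"sleep depriv*"': '"sleep deprivation"',
    'drows*': 'drowsiness',
    'hypersomnol*': 'hypersomnolence',
    '"excessive sleep*"': '"excessive sleepiness"',
    'fatigue*': 'fatigue',
    'microsleep*': 'microsleep',
    'alert*': 'alertness',
    'hypovigilan*': 'hypovigilance',
}


def build_query(groups, wildcards=True):
    """Return '(a OR b) AND (c OR d) ...' for the given term groups."""
    parts = []
    for terms in groups:
        if not wildcards:
            # Swap each truncated term for its full wording, if one is listed.
            terms = [FULL_WORDS.get(t, t) for t in terms]
        parts.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(parts)


print(build_query([SLEEP_LOSS, SLEEPINESS, DRIVING]))
print(build_query([SLEEP_LOSS, SLEEPINESS, DRIVING], wildcards=False))
```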

Summarising the papers
Based on the GRADE guidelines [37,39,40], the important elements of the selected studies were summarised and criteria for rating the quality of the papers were developed. For this purpose, specific aspects of individual papers, such as study design/objective, sample size, participants' age range, sleep deprivation regime, driving setting, driving duration, frequency and time of drive, and driving performance outcomes, were reviewed, and the important methodological elements (strengths and potential flaws of the studies) were extracted and summarised. Not all items specified in GRADE (a schema developed primarily for reviewing health and medical literature) are applicable to road safety studies; as such, GRADE had to be adapted to this literature (e.g. experimental studies versus RCTs). Also, based on the possible differential consequences of various degrees of sleep deprivation [51–53], sleep deprivation regimens were classified as acute or chronic sleep loss, with acute sleep loss rated at three levels, moderate (2–4 h), severe (4–6 h) and total (8 h), and chronic sleep loss rated at two levels, mild (1–2 h) and moderate (2–4 h).
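The severity bands above can be expressed as a small lookup. This is only a sketch, and the function name is hypothetical. The published bands share endpoints (4 h appears in both the moderate and severe acute bands, 2 h in both chronic bands) and leave gaps (e.g. 6–8 h of acute loss). The boundary handling below (inclusive upper bounds, `None` for unbanded values) is therefore an assumption, not the review's rule.

```python
from typing import Optional


def classify_sleep_loss(regime: str, hours_lost: float) -> Optional[str]:
    """Map a sleep-loss protocol onto the review's severity levels.

    Assumption: upper bounds are inclusive where bands share an endpoint,
    and values outside every band return None.
    """
    if regime == "acute":
        if 2 <= hours_lost <= 4:
            return "moderate"
        if 4 < hours_lost <= 6:
            return "severe"
        if hours_lost >= 8:
            return "total"
    elif regime == "chronic":
        if 1 <= hours_lost <= 2:
            return "mild"
        if 2 < hours_lost <= 4:
            return "moderate"
    return None  # not covered by the bands defined in the review


print(classify_sleep_loss("acute", 5.5))    # severe
print(classify_sleep_loss("chronic", 1.5))  # mild
```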

Development of the GRADE criteria
The GRADE guidelines [37,39,40] include criteria for rating the quality of papers. GRADE is a flexible approach that relies to some extent on the judgment of the researcher; as such, additional criteria were derived from the summarised aspects of the studies and their methodological elements to augment the existing GRADE criteria. These modified GRADE criteria comprised discipline-specific downgrading and upgrading scores for rating the quality of the reviewed papers.

Identification of the quality of the body of evidence
Using the modified GRADE criteria and the GRADE guidelines [37], a multi-step approach was taken to identify the quality of the body of evidence for each outcome. First, the modified GRADE criteria were used to calculate a single GRADE score for every outcome measure reported in each individual paper. Next, these single GRADE scores were used to calculate an overall quality of evidence across all papers reporting the same outcome. Finally, a quality rank was assigned to the body of evidence for every driving performance outcome.
Rating the quality of individual papers. The quality of a driving performance outcome measure was rated in individual papers by considering downgrading factors, including poor study design, risk of bias (due to inadequate monitoring of sleepiness during testing (wake EEG) and presence of a practice effect) and imprecision (due to ungeneralizable findings and small sample size), as well as upgrading factors, including large effect size, large sample size, objective measurement of sleepiness (EEG) and control for distraction. For this purpose, a four-step approach was taken: 1) For study design, GRADE scores of 4, 2, 1 and 0 were first assigned to randomised controlled trials (RCTs), longitudinal, quasi-experimental and other designs, respectively. In sleep studies, quasi-experimental designs that manipulate sleep and longitudinal studies that detail the cumulative effects of chronic sleep deprivation are both capable of showing the magnitude and direction of the effect of sleep loss on drivers' performance. Therefore, the GRADE scores were modified by adding one point to studies applying either of these two designs.
2) The quality of the papers was further assessed for risk of bias and imprecision. Because risk of bias and imprecision adversely affect the measurement of driving performance outcomes and the generalizability of the findings, one point was deducted for existing risk of bias (e.g. inadequate monitoring of sleepiness during the driving task, presence of a practice effect), and a further point was deducted for imprecision (e.g. increased uncertainty due to small sample size).
3) The quality of papers was upgraded by adding one point for methodological strengths, such as strong control of sleep loss before testing, and an additional point for factors increasing the certainty of findings. 4) A single quality score was assigned to each individual paper by summing all positive and negative points from the steps above. The same process was repeated for the other outcome measures of driving performance.
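The four scoring steps above can be sketched as follows. The sketch also includes the sample-size weighting that the review then uses to combine paper-level scores into an Overall GRADE Score (OGS). It is an illustrative reading, not the authors' implementation. Each downgrading or upgrading factor is reduced to a yes/no flag worth one point. The names `PaperOutcome`, `single_grade_score` and `overall_grade_score` are hypothetical, and the example inputs are invented rather than taken from any reviewed paper.

```python
from dataclasses import dataclass

# Step 1 base scores by study design.
DESIGN_SCORE = {"rct": 4, "longitudinal": 2, "quasi_experimental": 1, "other": 0}


@dataclass
class PaperOutcome:
    design: str             # key of DESIGN_SCORE
    risk_of_bias: bool      # e.g. no sleepiness monitoring, practice effect
    imprecision: bool       # e.g. small sample size
    strong_methods: bool    # e.g. strong control of sleep loss before testing
    certainty_factor: bool  # e.g. factor increasing certainty of findings
    sample_size: int


def single_grade_score(p: PaperOutcome) -> int:
    score = DESIGN_SCORE[p.design]                            # step 1
    if p.design in ("quasi_experimental", "longitudinal"):
        score += 1                                            # step 1 modification
    score -= int(p.risk_of_bias) + int(p.imprecision)         # step 2
    score += int(p.strong_methods) + int(p.certainty_factor)  # step 3
    return score                                              # step 4: summed points


def overall_grade_score(papers: list) -> float:
    """Sample-size-weighted mean of single GRADE scores (two or more papers)."""
    if len(papers) < 2:
        raise ValueError("an OGS needs at least two papers reporting the outcome")
    total_n = sum(p.sample_size for p in papers)
    return sum(single_grade_score(p) * p.sample_size for p in papers) / total_n


# Hypothetical inputs for illustration only.
a = PaperOutcome("quasi_experimental", True, True, True, False, 12)
b = PaperOutcome("longitudinal", False, True, True, True, 24)
print(single_grade_score(a), single_grade_score(b))  # 1 4
print(overall_grade_score([a, b]))                   # 3.0
```

Weighting by sample size means a larger study pulls the OGS toward its own score, which is the stated intent of treating larger studies as more important contributors.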
Rating the quality of body of evidence. Based on the single GRADE scores of individual papers, an Overall GRADE Score (OGS) was calculated for each body of evidence (comprising at least two individual papers reporting the same outcome). The OGS for a body of evidence was not determined by simply averaging the single GRADE scores, but by considering the contribution of individual papers to the estimated magnitude of the effect of sleep loss on a given driving performance outcome. For example, studies with larger sample sizes were considered more important contributors and were weighted to reflect that contribution. As the GRADE guidelines recommend no algorithm for calculating the OGS for a body of evidence, a new formula incorporating sample size was developed:

$$\text{OGS} = \frac{\sum_{i} \left(\text{GRADE score}_i \times n_i\right)}{\sum_{i} n_i}$$

where $\text{GRADE score}_i$ is the single GRADE score of paper $i$ and $n_i$ is its sample size, so that the denominator is the total sample size of the body of evidence.

Ranking the quality of body of evidence. The quality of the body of evidence for each outcome was ranked by review team consensus at four levels, from 'very low' to 'high quality', based on the GRADE guidelines [37]. These ranks reflect the extent of confidence that the estimated effect is close to the true effect. Because the GRADE guidelines [37] do not map directly onto OGS values, four ranges of OGS were assigned to the four quality ranks (based on the judgment of the research team) as follows:

1. High quality (3 OGS): a high confidence that the true effect lies close to the estimated effect […]

Using these grading and ranking protocols, the two review groups first graded and ranked the studies independently before a group discussion to ensure consensus. Table 2 shows the search statement and the number of papers initially selected from individual databases. Initially, 331 records were identified through an online search of the 15 electronic databases.
From these 331 records, 131 duplicates were removed. The titles and abstracts of the 200 remaining papers were screened, and 108 irrelevant records were excluded. The majority of these 200 papers (more than 50%) did not address the implications of sleep loss for adults' performance; instead, they studied the effects of time-on-task fatigue, usual daytime sleepiness or obstructive sleep apnoea on drivers' performance, or they examined the effects of naps, light, wake-promoting agents, caffeine, etc. on drivers' sleepiness. The full texts of the 92 remaining records were assessed, and 76 papers (38% of the 200 screened papers) were excluded because they studied professional drivers, examined only the prevalence of sleepy driving, or did not include driving performance outcomes in their designs. Finally, the 16 remaining papers (only 8% of the 200 screened papers) were included in the systematic review. Some other sleepiness-related studies included the same age group [54,55] but could not be included, since their outcome measures did not include driving performance [54] or their sample also included older adults [55].

There were no randomised controlled trials among the reviewed papers. All were experimental studies: 4 cross-over [56–59], 5 between-groups [26,60–63] and 7 within-group [64–70]. Of the 16 studies, only 4 were conducted on real roadways [56–59], two of which also included simulated drives in their protocols [56,57]. The remaining 12 studies used a driving simulator only.
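The screening counts reported above can be checked with simple arithmetic. The short Python sketch below (variable names are illustrative) recomputes each stage of the PRISMA flow. It also expresses each exclusion step as a percentage of the 200 records screened after de-duplication.

```python
# Counts as reported in the text.
identified = 331
duplicates = 131
excluded_title_abstract = 108
excluded_full_text = 76

screened = identified - duplicates               # titles/abstracts screened
full_text = screened - excluded_title_abstract   # full texts assessed
included = full_text - excluded_full_text        # studies in the review


def pct(part: int, whole: int) -> float:
    """Percentage of `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(screened, full_text, included)           # 200 92 16
print(pct(excluded_title_abstract, screened))  # 54.0
print(pct(excluded_full_text, screened))       # 38.0
print(pct(included, screened))                 # 8.0
```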

Designs and methodologies
Driving durations ranged from 10 minutes to 8 hours. Overall, 50% of studies (8 papers) adopted short durations of 30 minutes or less (10 minutes [60,63,66], 20 minutes [67,69] and 30 minutes [61,65,68,70]). The other 8 papers varied in the durations of their drives, with some studies examining multiple drive durations in their protocols. Only two papers, reporting data from the same study, adopted longer driving durations of 4 and 8 hours [58,59]. The majority of the reviewed studies (12 papers) adopted an acute sleep loss protocol: 3 papers imposed moderate (2–4 hours) sleep loss [61,62,64], 7 papers severe sleep loss [26,56–59,61,64] and 4 papers total sleep loss [65,68–70], with some papers examining more than one level. The remaining 4 papers used a chronic sleep deprivation paradigm [60,63,66,67]. Table 3 presents a detailed summary of key methodological characteristics of individual papers, including year and country of publication, design and objectives, sample size, participant age, sleep deprivation regime, driving setting and duration, frequency and time of day of driving, and driving performance outcome measures. All papers reported more than one outcome measure. Many papers did not directly report standardised effect sizes such as partial eta squared, Cohen's d, Cohen's f², the correlation coefficient (r) or the coefficient of determination (r²). Instead, they reported unstandardised effect sizes (differences in outcome variables in their original units), and some papers reported results as confidence intervals. Only four papers [61,63,67,69] reported effect sizes as Cohen's d, Cohen's f² or partial eta squared. Different outcome measures were reported, including lane crossing events, lateral position variables, speed variables and crash events.
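Most papers reported only unstandardised differences, so it may help to show how the standardised measures named above relate to one another. The Python sketch below uses standard textbook formulas:

- pooled-SD Cohen's d for two independent groups;
- Cohen's f² from partial eta squared;
- the d-to-r conversion for equal group sizes.

The function names are hypothetical, and nothing here reproduces the reviewed studies' analyses. Computing d needs each group's standard deviation and size, which an unstandardised mean difference alone does not supply. For the within-group and cross-over designs that dominate this literature, d would need a repeated-measures variant that accounts for the correlation between conditions, which this sketch does not cover.

```python
import math


def cohens_d(mean1, mean2, sd1, sd2, n1, n2):
    """Standardised mean difference using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


def f2_from_partial_eta2(eta2_p):
    """Cohen's f squared from partial eta squared: eta2_p / (1 - eta2_p)."""
    return eta2_p / (1 - eta2_p)


def r_from_d(d):
    """Point-biserial r from d, assuming two groups of equal size."""
    return d / math.sqrt(d ** 2 + 4)


# Hypothetical numbers for illustration only.
d = cohens_d(mean1=2.0, mean2=1.0, sd1=1.0, sd2=1.0, n1=10, n2=10)
print(d, r_from_d(d), f2_from_partial_eta2(0.2))
```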
As Table 3 shows, there was great variability in the methodological profiles of the studies, which presented challenges for comparing the effects of sleep loss and for the generalisability of findings. More specifically, despite the intention stated in the protocol to conduct meta-analyses, the heterogeneity of the studies and incomplete reporting of effect sizes made this inappropriate.

Findings of the reviewed papers
Lateral position variables. As Table 3 shows, lane position (lateral position) was defined differently across studies, referring to the distance from a certain point on the car (e.g. the centre of the car, the right side of the right front wheel) to a reference point on the road (e.g. the roadway midline, one of the lane markers, the left lane marker). Mean lateral position was not the primary outcome in most studies and was reported in only two studies, neither of which found an effect of sleep loss on this outcome [61,68]. By contrast, the standard deviation of lateral position, representing variability in lane positioning, was the most frequently reported outcome after both acute and chronic sleep loss (nine papers [26,60,61,63,65–69]).
While moderate acute sleep loss (3 h) increased the standard deviation of lateral position (large effect size, in a short simulated drive of 30 min [61]), with unclear changes in longer drives [62,64], severe acute sleep loss of 5 to 5.5 h increased this outcome measure in both short (30 min) [61] and long drives [26], by 1.2-fold after the 30th minute of a 1.5-h drive [26]. One night of total sleep loss also increased the standard deviation of lateral position in […] Overall, these nine simulator papers reported an adverse effect of sleep loss, except for one study [67] that reported no significant change in this outcome with sleep loss, while none of the on-road studies, which used severe sleep loss (6 h) and longer drives (1.5–2 h), reported this outcome measure [56–59].

Notes to Table 2: a Statement (1): (driver or simulator or vehicle or "commercial driver" or "Professional driver" or "driver performance" or "truck driver" or "bus driver") and (sleepiness or drowsiness or hypersomnolence or "sleep onset" or "excessive sleepiness" or "sleep propensity" or fatigue or microsleep or alertness or vigilance or hypovigilance) and ("sleep deprivation" or "sleep loss" or "sleep limitation" or "sleep restriction"); b Statement (2): ("sleep depriv*" OR "sleep loss" OR "sleep limitation" Or "sleep restriction") AND TX (("sleepiness" OR drows* OR hypersomnol* OR "sleep onset" OR "excessive sleep*" OR "sleep propensity" OR fatigue* OR microsleep* OR alert* OR vigilance OR hypovigilan*) AND TX (driver OR simulator OR vehicle OR "commercial drivers" OR "professional driver" OR "driver performance" OR "truck driver" OR "bus driver").

Lane crossings. Lane crossings (inappropriate line crossings) were the second most frequently reported variable, appearing in eight papers [56–59,61,62,64,66], and were variously defined as crossing one lateral lane marker, leaving the road with all four wheels, or running off the road with at least two wheels.
In simulated driving paradigms, lane crossings increased significantly under different combinations of sleep loss and drive duration. Both moderate (3 h) and severe (5 h) acute sleep loss increased the number of lane crossings in both short (30 min) [61] and long drives (the last 30 min of a 1.5-h drive) [62], and 6 h of sleep loss increased the cumulative number of lane crossings in a 2-h simulated drive [56]. A positive correlation between lane crossings and distraction (defined as looking away from the main roadway for more than 3 s) has also been reported in long simulated drives of 2 h under both moderate (3 h) and severe (5 h) sleep loss [64]. Similarly, chronic moderate sleep loss (3 h) in a forced desynchrony protocol increased lane crossings in short simulated drives of 10 min [66].
In on-road studies, severe acute sleep loss (6 h) increased the number of line crossings [56] and the cumulative number of line crossings per person [57] during six and five episodes of a 1.5-h drive per day, respectively, as well as during longer drives of 2 h, 4 h and 8 h compared with the reference driving session (9–10 p.m.) [58,59]. In general, line crossings reportedly increased across a variety of combinations of sleep loss and drive time.

Speed variables. A variety of speed variables were reported in six studies [61,65–68,70]. Neither moderate to severe acute sleep loss (3–5 h) [61] nor total sleep loss [65] in short simulated drives of 30 min impaired the mean deviation from the speed limit [61] or the standard deviation of deviation from the speed limit (speed variability) [61,65]. In two other studies with the same drive times, total sleep loss did not change mean speed or speed variability, but significantly increased the mean deviation from the speed limit [68,70].
Chronic mild sleep loss (1.5 h) over 5 nights did not affect mean speed or speed variability in short simulated drives (20–30 min) [67]. Conversely, chronic moderate sleep loss (3 h) in a 9-day forced desynchrony protocol with short 10-min simulated drives increased deviation from the speed limit and speed violation (cumulative time spent more than 5 km/h above the speed limit) as sleep debt accumulated over the 9 days [66], and also increased speed variability at night (an effect of circadian phase) [66]. Overall, speed variables were reported less frequently and responded inconsistently to the various types and severities of sleep loss and durations of drives.
Crash events. Crash events were reported in four papers [65,67,68,70], either with no explicit definition [67,68,70] or defined as driving off the road, stoppage events or truck collisions [65]. Of the three studies of acute total sleep loss in short simulated drives of 30 min, one found no change in the number of crashes [65], while two found significant increases [68,70]. Chronic mild (1.5 h) sleep loss also did not change the presence of crashes in 20-min simulated drives [67]. These findings suggest that crash events respond inconsistently across sleep deprivation paradigms.
Effect of circadian drive for sleepiness on the findings. The circadian-mediated drive for sleep (time of day) contributed to impairments in some outcomes during the circadian nadir (typically the early morning hours) or in the afternoon. The time-of-day effect was reported in three forced desynchrony studies that applied 1–2 h [60], 3 h [66] or 4 h [63] of sleep deprivation and a 10-min drive in their protocols. In one study, the effect of prior wake time on the standard deviation of lateral position was significantly greater at circadian phase 60° after nadir (2 h after awakening) than at circadian phase 180° after nadir (7 h after awakening) [60]. In another study, the standard deviation of lateral position rose significantly at circadian phase 180° after nadir (7 h after awakening), as opposed to circadian phase 60° after nadir (2 h after awakening) [66]. In a more recent study, a large effect of circadian phase on the standard deviation of lateral position was found at the circadian nadir (circadian phase 0°) [63]. Greater impairments in speed variability at circadian phase 60° after nadir (2 h after awakening) have also been reported [66].
The interaction between sleep loss and time of day is an important consideration. Forced desynchrony studies support an effect of sleep restriction on performance, but one that is mediated by circadian phase position.
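Circadian phase in these studies is expressed as an angle, with 0° at the circadian nadir and 360° spanning one full cycle. The sketch below converts a phase angle into hours after the nadir, assuming a 24-h cycle; real intrinsic periods in forced desynchrony are close to, but not exactly, 24 h. The function name and default period are assumptions. Note that the result is time after the nadir, not time since awakening, which is what the parenthetical times above report.

```python
CYCLE_DEGREES = 360.0


def phase_to_hours_after_nadir(phase_deg: float, period_h: float = 24.0) -> float:
    """Convert a circadian phase angle (0 degrees = nadir) to hours after the nadir.

    Assumption: one full cycle (360 degrees) lasts `period_h` hours.
    """
    # Multiply before dividing to keep whole-hour results exact.
    return (phase_deg % CYCLE_DEGREES) * period_h / CYCLE_DEGREES


for phase in (0, 60, 180):
    print(phase, phase_to_hours_after_nadir(phase))
```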

Direction of effects
The possibility of statistically combining the quantitative results in a formal meta-analysis was explored. However, owing to the insufficiency, inconsistency and non-comparability of the reported unstandardised effects, it was not feasible to combine the data into a single pooled effect size estimate for each outcome. Instead, this review determined and summarised the direction of the effects of sleep loss on each outcome measure, an approach adopted in other sleep-related systematic reviews [71].
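Summarising the direction of effects amounts to a simple vote count of how many studies report an increase, no change or a decrease for each outcome. The Python sketch below tallies directions for crash events and the standard deviation of lateral position, using the study-level findings described in the results above. The dictionary layout and function name are illustrative. Vote counting ignores effect magnitude and precision, which is one reason it supports statements about direction only, not size.

```python
from collections import Counter

# Study-level directions as described in the findings section.
crash_events = {
    "[65]": "no change",  # acute total sleep loss, 30-min drive
    "[68]": "increase",   # acute total sleep loss
    "[70]": "increase",   # acute total sleep loss
    "[67]": "no change",  # chronic mild sleep loss, 20-min drive
}

# Standard deviation of lateral position: nine simulator papers,
# all adverse (increase) except [67].
sdlp = {ref: "increase" for ref in ["[26]", "[60]", "[61]", "[63]",
                                    "[65]", "[66]", "[68]", "[69]"]}
sdlp["[67]"] = "no change"


def summarise_direction(findings: dict) -> Counter:
    """Count how many studies report each direction of effect."""
    return Counter(findings.values())


print(summarise_direction(crash_events))
print(summarise_direction(sdlp))
```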

Quality of individual papers and the body of evidence
A summary of the methodological elements (strengths and weaknesses) of the reviewed studies that were considered in developing the GRADE criteria is presented in the supplementary information (S1 Table). The GRADE criteria for rating the quality of each outcome measure in the individual papers are presented in Table 4.
The quality of each outcome measure within individual papers and across papers (body of evidence) is rated against the GRADE criteria in S2 Table. Each individual paper was assigned different quality scores for different outcomes. Table 5 presents the ranking of the body of evidence for each outcome. The bodies of evidence for infrequently reported driving performance outcomes, such as mean lateral position, deviation from the speed limit, speed variability and crash events, were all ranked very low quality, suggesting very low confidence that the estimated effect of sleep deprivation on these outcomes is close to the true effect. The bodies of evidence for more frequently reported outcomes, such as the standard deviation of lateral position, lane crossings, mean speed and standard deviation of speed, were ranked low quality, with limited confidence in the validity of the estimated effect. None of the reported outcomes came from a medium- or high-quality body of evidence.

Discussion
Based on the PRISMA-based systematic search in this review, the available evidence on the impact of sleep loss on the driving performance of young drivers over the last decade is limited to 16 peer-reviewed original papers, and no systematic reviews were identified. This limited literature suffers from considerable inconsistencies in study designs, sample sizes, sleep deprivation regimes, definition and measurement of outcomes, driving settings, time of day, duration of drives, control for confounding factors, reporting of methodologies and results, and magnitudes of effects. This heterogeneity across multiple study aspects and reported outcomes limits the generalisability of the findings and the ability to conduct a meta-analysis. When the GRADE approach was applied for quality ranking, the lack of high-quality evidence in the existing literature appeared to be mainly due to weak design, risk of bias and imprecision. The study designs included some robust quasi-experimental cross-over, within-groups, or between-groups repeated measures designs, but no randomized controlled trials (RCTs), large-scale studies or other strong experimental designs. "Risk of bias" stemmed from inadequate monitoring of sleepiness during the experiments and the presence of task practice effects, whereas "imprecision" (uncertainty) arose from small samples consisting only of male participants. The exclusive recruitment of men may reflect their over-representation in road crashes, or an attempt to control for sex differences in response to sleep loss.
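To illustrate why small samples translate into imprecision, the sketch below estimates how many participants a within-subject (rested versus sleep-deprived) comparison would need to detect a given standardized effect with 80% power. It uses the common normal approximation n = ((z₁₋α/₂ + z_power) / d_z)², where d_z is the mean within-person difference divided by the standard deviation of the differences. The effect sizes are hypothetical and are not taken from the reviewed studies; the function name `paired_sample_size` is invented for the example.

```python
from math import ceil
from statistics import NormalDist


def paired_sample_size(d_z, alpha=0.05, power=0.80):
    """Approximate participants needed for a two-sided paired comparison.

    Normal approximation n = ((z_{1-alpha/2} + z_{power}) / d_z) ** 2.
    It slightly underestimates the n required by an exact paired t-test.
    """
    if d_z <= 0:
        raise ValueError("d_z must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(((z_alpha + z_power) / d_z) ** 2)


# Hypothetical large, medium and small standardized effects.
for d in (0.8, 0.5, 0.3):
    print(d, paired_sample_size(d))
```

Under these assumptions a medium effect (d_z = 0.5) already calls for about 32 drivers, and a small effect (d_z = 0.3) for about 88, which puts the modest samples typical of simulator studies in context.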
The standard deviation of lateral position and lane crossings were the two most commonly examined outcomes in this review, and the ones most often impaired. The findings suggest that the standard deviation of lateral position is sensitive to the prior wake period, time of day, and day of sleep deprivation [66], with significant impairments under both acute [61] and chronic sleep loss [60,67]. Similarly, lane crossings were reported to increase after acute sleep loss [56]. These findings agree with previous reports that lateral lane position and steering wheel variables are the outcomes most sensitive to sleep loss, and that impairment in either could result in lane crossings or collisions with adjacent cars [55]. However, none of the reviewed papers reported findings for steering wheel variables in enough detail to allow any conclusion about the utility of those variables. These findings therefore have limited reliability, and the low quality of the body of evidence supports only limited confidence in these two outcomes.
Speed-related outcomes and crash events in this review both responded to sleep loss inconsistently. For instance, mean and standard deviation of speed, as well as deviation from the speed limit, did not change after acute sleep loss, but deteriorated significantly after chronic sleep loss. Likewise, crash events did not change after acute sleep loss in some studies, but increased after both acute and chronic sleep loss in others. These findings therefore do not indicate a clear direction for the effect of sleep loss and, having been graded as low or very low quality, carry very limited confidence in their accuracy (reliability).
In summary, only a small body of evidence currently addresses the consequences of sleep loss for young drivers' performance, with considerable variety in study designs, outcome measures, severity of sleep loss and methodologies. The reviewed studies do not support a robust, generalisable conclusion about the type and magnitude of the effects. Consistent increases in the standard deviation of lateral position and in line crossing events were identified, but this was not the case for crash events or other speed-related outcomes. Nor is there a clear distinction between the impact of sleep loss and that of circadian misalignment, since the confounding circadian contributors to sleepiness were not considered in the majority of these studies. Even these limited findings are questionable, as the supporting evidence was rated very low to low quality under the GRADE criteria.
To draw a unified conclusion on the effect of sleep loss on young drivers' performance, it is crucial that future studies first adopt higher-quality experimental designs, including RCTs to test interventions, or stronger epidemiological methods that ensure adequate statistical power. Next, common protocols and consistent metrics should be adopted when developing methodologies. Young female drivers should be included, both so that study samples represent the driving population and to investigate gender differences in response to sleep loss. The ecological limitations of driving simulators, together with progressive developments in driver and in-vehicle monitoring technologies, suggest a need to shift from simulators towards on-road measurements. Lastly, best-practice reporting protocols, as outlined in the GRADE guidelines and the PRISMA Statement, should be followed when reporting findings so as to enable meta-analyses.
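Consistent reporting matters because pooling requires standardized effects. As a minimal sketch, assuming a study reports the mean, standard deviation and sample size of an outcome for a sleep-loss group and an independent control group, the bias-corrected standardized mean difference (Hedges' g) and its variance can be computed and combined across studies by inverse-variance weighting. The function names and all numbers below are hypothetical. Within-subject crossover designs, which dominate this literature, additionally require the standard deviation of the within-person differences (or the within-person correlation), which is often unreported. A fixed-effect model is used here for brevity; given the heterogeneity described above, a random-effects model would probably be more appropriate in practice.

```python
from math import sqrt


def hedges_g(m1, s1, n1, m2, s2, n2):
    """Bias-corrected standardized mean difference for two independent groups.

    Group 1 = sleep loss, group 2 = control. Returns (g, variance of g).
    """
    df = n1 + n2 - 2
    sd_pooled = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df)
    d = (m1 - m2) / sd_pooled
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    g = j * d
    var_g = j ** 2 * ((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return g, var_g


def pool_fixed(effects):
    """Inverse-variance fixed-effect pooled estimate and its standard error."""
    weights = [1 / v for _, v in effects]
    pooled = sum(w * g for w, (g, _) in zip(weights, effects)) / sum(weights)
    return pooled, sqrt(1 / sum(weights))


# Hypothetical SDLP summaries (metres): sleep loss vs control.
study_a = hedges_g(0.40, 0.10, 20, 0.35, 0.10, 20)
study_b = hedges_g(0.45, 0.12, 15, 0.38, 0.12, 15)
pooled, se = pool_fixed([study_a, study_b])
print(round(pooled, 3), round(se, 3))
```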
Supporting information
S1 Table. Methodological elements of papers considered for quality rating. (DOCX)

S2 Table. Quality of individual papers and body of evidence based on the GRADE criteria.
A single GRADE score for a given outcome within an individual paper, and an Overall Grade Score (OGS) for the body of evidence for that outcome, are presented in the last two columns of this table. Upgrading and downgrading elements are highlighted in yellow. Outcomes reported in only one paper could not be assigned an OGS and are marked NA.