Visual attention outperforms visual-perceptual parameters required by law as an indicator of on-road driving performance

Purpose A variety of visual and psychometric tests have been developed for assessing on-road driving performance and fitness to drive. The diagnostic power of a state-of-the-art psychometric test battery (Vienna Test System), combined with a set of standard visual parameters recommended for assessing fitness to drive, is investigated using an on-road driving test. The study aimed to determine whether a psychometric test battery could predict older adults’ on-road driving performance. The relevance of visual standards required by law is discussed.
Methods Vision impairment is more prevalent in later adulthood, and many studies on the visual and cognitive impact on driving safety and performance therefore focus on adults above 60 years of age. We therefore acquired an extensive set of driving-related visual and psychometric performance parameters in a group of elderly drivers (N = 84, median age 69, SD 6.6 years). Visual assessment included foveal acuity, perimetric field size, and dynamic aspects of peripheral vision (termed “PP”) in the computer-based Vienna Test System (VTS; Schuhfried), as well as letter contrast thresholds in foveal and parafoveal vision in a separate setup. A selection of psychometric driving-aptitude tests that have demonstrated the battery’s capacity to predict aspects of driving performance and safety was further conducted on the VTS. Driving performance was assessed in a standardized on-road driving test. Two independent observers rated driving performance using a fixed scoring system assessing the number of driving errors in pre-defined traffic situations. In addition, globalized driving competence scores were assigned on a 6-point scale.
Results The test battery performed excellently in identifying good drivers but failed in predicting bad driving performance. Visual performance indicators required by German law were less indicative of driving ability than psychometric assessment. Selective and divided attention turned out to be much more important for predicting fitness to drive than either visual acuity, size of the visual field, or contrast sensitivity.
Conclusion Predicting fitness to drive by means of visual and psychometric tests is an ambitious challenge. On the one hand, the sensitivity of a multi-disciplinary test battery is too low to reliably predict driving ability in diagnostic settings, which require an unambiguous interpretation of test results for individual drivers. Low sensitivity and low predictive values are incompatible with that objective. On the other hand, the results are valuable for a routine screening of fitness to drive. For that case, the assessment of attentional abilities in particular appears to be promising. Performance measures of divided and selective attention proved to be the most predictive for fitness to drive in a sample pre-screened for clear visual deficits. Visual performance parameters required by law, in contrast, had no meaningful impact on driving performance, indicating a gap between mandatory regulations of state authorities and research results. Our results suggest that visual acuity tests designed for clinical diagnosis and monitoring of eye diseases should not be the method of choice for a screening of fitness to drive.


Introduction
Driving is a multifactorial competence. Driving means enacting distinctive, driving-specific competences such as navigation and orientation, lane keeping, speed control, distance monitoring, environmental scanning, and giving way [1,2]. A certain level of basic sensory and psychological functions, like visual and attentional capabilities, further has a profound impact on driving-specific performance requirements [1]. Unlike driving competence, fitness to drive is a legal term that is defined by the respective national legislation. In the European Union's member states, for example, there were about 110 different driver license models in 2006, with differing rights of disposal and periods of validity [3]. Requirements for fitness to drive now differ mainly depending on vehicle class. According to Annex III of the EU Directive on driving licences, two groups of vehicles are distinguished: Group 1 comprises motorbikes, passenger cars, and tractors for agriculture and forestry (license classes A, A1, A2, B, BE, AM, L, and T), and Group 2 comprises trucks, truck trailers above 750 kg total mass, buses, and passenger transportation (license classes C, C1, CE, C1E, D, D1, DE, D1E).
Based on that EU Directive, the German Driving Licences Regulation [4] entitles permanent residents aged 18 and above who possess verified visual abilities and knowledge of regulations to obtain a car driving license in Germany (Group 1). Adequate driving competence must be confirmed by an independent observer (examiner) in a driver's license examination. Psychological areas of competence (stress load capacity, navigation skills, ability to concentrate, attentiveness, responsiveness) are not routinely checked and are only tested as warranted (when there are indications of degraded performance or traffic offences).
Basically, assessment of driving fitness rests on a categorical diagnostic model: fitness (the criterion) is inferred from the scores on a pre-defined set of performance measures (the preconditions). In the subsequent dichotomization, individuals are classified as fit or unfit [...] to sustain steady control over brake and accelerator, and to quickly shift from one pedal to the other as circumstances may require); Head/Neck Rotation; and Arm Reach [13]. Among the perceptual-cognitive abilities, the Motor-Free Visual Perception Test/Visual Closure subtest was most predictive for identifying at-risk drivers by a wide margin. Three additional perceptual-cognitive measures, namely Trail-making (Part B), Delayed Recall, and Useful Field of View (subtest 2), were also shown to be potentially useful predictors. Among the physical measures, the Rapid Pace Walk and Head/Neck Rotation appear to have the greatest potential value as predictors of driving impairment. Interestingly, absolute cut-off values (rather than relative comparisons) were defined to indicate the need for either prevention efforts or intervention, e.g., for the UFOV (Subtest 2), a value of 200 msec (prevention) and 300 msec (intervention).
In Australia, Wood et al. developed a very similar, multi-disciplinary driving assessment battery, incorporating tests from vision, cognitive, and motor domains [14]. Central motion sensitivity was measured using a computer-based random dot kinematogram (Dot Motion) [15]. Further tests were a computerized choice-reaction-time task (Colour Choice Reaction Time) [14] and Postural Sway (participants standing on a medium-density foam mat with their eyes closed are asked to remain as still as possible for 30 seconds). In addition, the average number of kilometers driven per week was recorded.
In contrast to the relative-measures principle of psychometric psychological testing described above (where results are expressed as relative scores, set in relation to age-independent standard samples), tests of visual assessment from the ophthalmic and optometric tradition provide sample-independent, physically defined cut-off values for testing fitness to drive (Table 1). Essential for defining visual performance requirements in Europe are the basic recommendations of the EU's Eyesight Working Group [16]. According to the guidelines, binocular visual acuity should be 0.3 logMAR or better. The visual field should have a horizontal extent of at least 120 degrees. It should furthermore extend to 20 degrees above and below the horizontal meridian, and a minimum of 50 degrees to the right and to the left is suggested. There are no [...] in particular has been found to have a stronger relation with traffic accidents and violations than visual acuity), it is not clear what cut-off value and method of measurement to use for contrast sensitivity, nor which cut-off value to use for glare sensitivity. As the authors emphasize, the rationale for the recommended cut-off values has not yet been properly justified. In an international comparison, strikingly similar standards for certifying fitness to drive are in place, both for the visual sensory functions tested (acuity, visual field, and contrast sensitivity) and for the methods of their measurement [18]. Part of that consensus might be due to a common point of view when discussing visual performance in connection with causes and consequences of vision loss. In particular, aspects of vision are typically discussed there with a focus on structural and functional changes of the eye. Cognitive visual skills (like reading and orientation), in contrast, are not discussed [19,20].
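The physically defined cut-offs quoted above lend themselves to a simple check. The following is a minimal sketch, assuming the thresholds exactly as stated in this paragraph; the `VisionRecord` container and the function names are hypothetical and not part of any cited guideline.

```python
from dataclasses import dataclass

# Thresholds as quoted in the text (EU Eyesight Working Group recommendations).
MAX_LOGMAR = 0.3            # binocular acuity must be 0.3 logMAR or better (lower is better)
MIN_HORIZONTAL_DEG = 120.0  # total horizontal extent of the visual field
MIN_SIDE_DEG = 50.0         # minimum extent to the left and to the right
MIN_VERTICAL_DEG = 20.0     # extent above and below the horizontal meridian


def logmar_to_decimal(logmar: float) -> float:
    """Convert logMAR to decimal acuity (decimal = 10 ** -logMAR)."""
    return 10.0 ** (-logmar)


@dataclass
class VisionRecord:
    """Hypothetical container for one driver's measurements."""
    logmar: float
    left_deg: float
    right_deg: float
    up_deg: float
    down_deg: float


def meets_recommendations(v: VisionRecord) -> bool:
    """Return True if all quoted cut-offs are satisfied."""
    return (
        v.logmar <= MAX_LOGMAR
        and v.left_deg + v.right_deg >= MIN_HORIZONTAL_DEG
        and min(v.left_deg, v.right_deg) >= MIN_SIDE_DEG
        and min(v.up_deg, v.down_deg) >= MIN_VERTICAL_DEG
    )
```

Note that 0.3 logMAR corresponds to a decimal acuity of about 0.5, which is why both notations appear for the same limit.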
Furthermore, the importance of (non-visual) cognitive skills for safe driving appears to be generally underestimated, in light of the fact that these are not standardly required for a common driver's license. While there is a consensus about the necessity of visual-cognitive attentional fitness, evidenced e.g. in the studies on the UFOV test [21], studies on the diagnostic quality of cognitive tests for predicting fitness for driving, and in particular their comparison to visual skills for that aim, are rare. Wood et al. showed that a multi-disciplinary driving assessment battery was superior to measures of visual function tests (visual acuity, contrast sensitivity, and visual field) in its ability to predict on-road driving performance outcomes [22].
The purpose of our study was to investigate which aspects of vision and psychological testing are most predictive for an assessment of fitness to drive. Visual acuity was included because it has face validity and is the only mandatory requirement for driving license applicants [23]. Assessment of further visual parameters followed the recommendations of the EU's Eyesight Working Group [16]. Psychological testing followed the procedures for incident-driven assessments, i.e., those ordered when driver's license authorities have reasoned doubts about a driver's fitness to drive, according to the requirements of the German Federal Highway Research Institute [5]. We therefore assumed that visual and psychological abilities should both add equivalent diagnostic value for predicting fitness to drive as assessed in an on-road driving test in real traffic.

Participants
Participants were 84 older drivers aged 60 years and above (56 = 67% male, 28 = 33% female; aged 60-91 yr with a mean age of 68.9, SD 6.6 yr), recruited by advertisements in local newspapers. Subjects were paid for joining the study. None of the participants reported ongoing ophthalmic treatment or eye disease (detected or declared). However, data were excluded from analysis if the individual dataset showed significant visual field loss that indicated ocular or visual pathway disease, or was simply due to systematic measurement error. In sum, seven participants were excluded from the original sample. Three of them showed significant visual field loss that indicated ocular or visual pathway disease; four participants were excluded due to indications of systematic measurement error. None of the participants was excluded due to failing the visual acuity required by law (0.3 logMAR, i.e., decimal acuity 0.5, for the better eye or binocularly).
All participants were current drivers and licensed to drive in Germany. They were given a full explanation of the nature of the study and experimental procedures, and written informed consent was obtained. The study adhered to the tenets of the Declaration of Helsinki and was approved by the Human Science Center (HWZ), a permanent institution of the University of Munich. Participants attended two testing sessions. The first included a series of tests of vision and psychometric assessment and took about 2½ hours. The second session comprised a one-hour on-road driving test.

Visual assessment
Visual assessment followed the recommendations of the Eyesight Working Group, which suggests testing those functions that, based on reasoning, common sense, and practical experience, are important for safe driving [16]. We therefore included measurements of visual field size, visual acuity, and contrast sensitivity (Table 2).
For diagnosing visual field size, automated kinetic perimetry was performed (Octopus 101, Haag-Streit, Interzeag, Switzerland; Software PeriTrend V6.05). The Octopus 101 is a 90°-field cupola perimeter. The perimeter was installed in a darkroom and was controlled by a standard personal computer operating under Microsoft Windows. We used the Octopus Goldmann Kinetic Perimetry (GKP) module with a Goldmann III 4e stimulus (stimulus "size III" has a diameter of 0.43°; intensity "4e" corresponds to 1000 asb, i.e., 318 cd/m²). The kinetic module allows moving stimuli slowly from the outer boundaries of the visual field towards the center. Along 24 radii separated by an angle of 15°, stimuli moving at an angular velocity of 4°/s were presented in random order. Each stimulus was preannounced by an acoustic cue to ensure that it was attended to. Participants responded by pressing a button, thereby indicating that the stimulus had been detected. The area seen by the respective, steadily fixating eye defines the monocular visual field. Binocular visual field size and maximum field size on the horizontal meridian were computed manually. No trial lenses for correction of refractive error were applied. Quality of eye fixation was monitored by the system, which interrupted the examination when the participant was not fixating or was closing the eye. Participants showing characteristic defects that indicated structural damage of the visual system (e.g., acute loss of visual field or quadrantanopia), on the basis of a visual analysis of printouts (analyzing the shape of the visual field), were excluded from the study. Isopters were not corrected for participants' reaction time.
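The stimulus parameters above can be cross-checked with a few lines of arithmetic. This is a minimal sketch, assuming the standard definition 1 asb = 1/π cd/m²; the function names and the 0.5 s reaction time in the usage note are illustrative assumptions, not values from the study.

```python
import math
import random
from typing import List, Optional


def apostilb_to_cd_per_m2(asb: float) -> float:
    """Convert luminance from apostilb to cd/m^2 (1 asb = 1/pi cd/m^2)."""
    return asb / math.pi


def meridian_order(n_radii: int = 24, seed: Optional[int] = None) -> List[float]:
    """Return equally spaced meridian angles (degrees) in random order."""
    step = 360.0 / n_radii  # 24 radii give a spacing of 15 degrees
    angles = [i * step for i in range(n_radii)]
    random.Random(seed).shuffle(angles)
    return angles


def isopter_shift_deg(reaction_time_s: float, speed_deg_per_s: float = 4.0) -> float:
    """Distance the stimulus travels during the response latency."""
    return reaction_time_s * speed_deg_per_s
```

For instance, `apostilb_to_cd_per_m2(1000)` gives about 318 cd/m², matching the value quoted above, and an assumed response latency of 0.5 s at 4°/s would displace an uncorrected isopter inwards by 2°.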
Visual acuity was measured using an Oculus Binoptometer (Binoptometer 3, G/59850/0207/d). The device allows visual acuity measurement using standardized optotypes (Landolt C) and standardized viewing conditions (DIN EN ISO 8596). Optotypes are presented in eight orientations. The natural status of the accommodation-convergence cross coupling is maintained by optical means in the apparatus. The simulated free-space viewing conditions thus induce no instrument myopia. To ensure reliable test results, the system provides predefined assessment procedures by microprocessor-controlled test sequences. An external serial interface (RS 232C) was used for control and result storage. Test sequences 4 and 5 of procedure G25 were selected for reporting visual acuity in six predefined steps: 0.30, 0.22, 0.15, 0.10, 0.00, and -0.10 logMAR.
Table 2 (excerpt; https://doi.org/10.1371/journal.pone.0236147.t002). Visual field: detection of kinetic stimuli moving from the outer boundaries towards the visual field center. Contrast sensitivity: R_Contrast (Strasburger, 1987), recognition of numerals varying in contrast (light on gray background) presented at one of five predefined locations in the center (0°) and near-periphery (10°).
Contrast sensitivity was measured using the routine R_Contrast, a program developed for rapid assessment of recognition contrast thresholds [24]. The recognition contrast threshold is defined as the level of contrast needed for correctly identifying a pattern out of a number of alternative patterns. Standard stimuli were the 10 numerals (0 to 9; size: 1° visual angle; presentation time: 100 msec; background luminance: 62 cd/m²) presented as light-on-gray patterns on a 19-inch flat screen. Stimuli were presented singly in random order at one of five predefined locations in the central visual field. Participants were positioned at a constant viewing distance of 43 cm from the monitor, with distance stabilized by a chin and head rest. Recognized numerals were reported by keyboard entry. R_Contrast uses the adaptive maximum-likelihood technique ML-PEST developed by Harvey [25,26] for stimulus presentation and threshold estimation. Both central-foveal (0°) and near-peripheral (eccentricity 10°) recognition contrast thresholds were obtained.
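The stimulus geometry described above (1° numerals, 10° eccentricity, 43 cm viewing distance) translates into on-screen sizes by simple trigonometry. The sketch below assumes a flat screen viewed perpendicularly at its center; the function names are hypothetical and not taken from R_Contrast.

```python
import math


def size_for_visual_angle(angle_deg: float, distance_cm: float) -> float:
    """Physical size (cm) that subtends `angle_deg`, centered on the line of sight."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


def eccentricity_offset_cm(ecc_deg: float, distance_cm: float) -> float:
    """On-screen distance (cm) from fixation to a point at eccentricity `ecc_deg`."""
    return distance_cm * math.tan(math.radians(ecc_deg))


if __name__ == "__main__":
    print(round(size_for_visual_angle(1.0, 43.0), 2))    # numeral height in cm
    print(round(eccentricity_offset_cm(10.0, 43.0), 2))  # offset of the 10 deg locations
```

Under these assumptions, a 1° numeral is about 0.75 cm high and the near-peripheral locations lie about 7.6 cm from fixation.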

Psychological assessment
The Vienna Test System (VTS) that we used for the psychometric assessment is a computerized assortment of tests that can be used singly or combined as test batteries. Part of the system is the "Expert System Traffic" (XPSV), which has been developed for subject assessment in the field of traffic psychology. At the core of the Expert System Traffic are two standardized test batteries (Standard and Standard Plus) that can be used to test abilities relevant to traffic. Various tests were selected from the XPSV, focusing on perceptual performance in traffic situations, on selective and divided attention, and on reactive performance (Table 3) [27,28]. Psychological testing of older adults took about two hours, with large variance (mean: 110 minutes, SD: 25, range: 73-224). Notably, the maximum test durations in particular exceeded the approximate range of processing times provided by the manufacturer of the system (84-169 minutes).

Driving assessment
Driving performance was assessed in a standardized on-road driving test in a natural traffic environment representing a typical setting for driver licensing tests in Germany. Open-road test designs are known to have particularly high validity and represent the gold standard for assessing driving performance [29]. Driving assessment for the study took place in a driving-school car (Audi A3) with manual gearshift. An accredited, professional driving instructor sitting in the front passenger seat was responsible for directing the driver along the route and monitoring safety. Whenever the situation allowed, informal conversation was made while driving to relax drivers and to distract them from the fact that they were taking a driving test, unless drivers did not wish to talk. Upon finishing the route, subjects were invited to give a self-assessment and to receive feedback from the driving instructor.
Participants drove along a 20-km route on the open road, starting with a short familiarization period. The driving assessment took around 60 minutes to complete. It was conducted either at mid-morning or mid-afternoon. The route included different kinds of road in the city, along suburban streets and two-lane bypass roads, and on small highways without centered road marking. The range of driving behaviors included merging (lane changing, merging and entering/exiting traffic flow), priority/giving way (at intersections, pedestrian crossings, and roundabouts), behavior at traffic-light controlled intersections, as well as traffic sign recognition for orientation and navigation. It thus complied with typical settings for driving assessment [30,31].
Driving was scored both by the driving instructor and by a psychologist trained in test drives, seated in the back of the car (backseat evaluator). Scoring was done at 52 predefined sections along the route. At each of the 52 sections, relevant observational aspects of driving were rated on four levels (0 = not observable; 1 = mistake; 2 = solved; 3 = very well solved). In sum, there were 134 observations per driver. Each observation was further assigned to one of nine behavioral categories: distance behavior (keeping appropriate distance to vehicles in front and to the side, and to other road users), lane keeping (lane departures, lane positioning in curves, driving in the middle of the lane), transparent communication and interaction (non-verbal communication to other road users and pedestrians to indicate one's own behavioral intentions, i.e., face-to-face interaction and manual signaling), speed behavior (exceeding speed limits; inappropriate speed, i.e., driving too slowly in relation to other drivers; speed compromising safety), observation of blind spots (correct checking for blind spots and shoulder checks, checking the rear-view and side mirrors, taking a second look at intersections), priority/giving way (give way to the right, give way at intersections), indicating/signaling (appropriate use of the directional indicator), attentive and perceptive behavior (scanning the environment, paying visual attention to other road users, pedestrians, and bicycles), and anticipatory driving (avoiding heavy accelerations and decelerations or short intervals between accelerating and braking, appropriate planning and preparation). The proportion of errors observed by either the driving instructor or the psychologist was calculated for each observational category.
Table 3 (excerpt). Task: reacting to given critical constellations of visual and auditory stimuli by moving a finger from a resting position (resting button) to a response button.
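The per-category error proportion described above can be sketched as follows. The input format is hypothetical, and two details are assumptions not stated in the text: an observation counts as an error if either observer coded a mistake, and observations that both observers coded as not observable are left out of the denominator.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

NOT_OBSERVABLE = 0  # rating codes as given in the text
MISTAKE = 1

Observation = Tuple[str, int, int]  # (category, instructor code, psychologist code)


def error_proportions(observations: Iterable[Observation]) -> Dict[str, float]:
    """Proportion of observations with a mistake, per behavioral category."""
    errors: Dict[str, int] = defaultdict(int)
    rated: Dict[str, int] = defaultdict(int)
    for category, instructor, psychologist in observations:
        if instructor == NOT_OBSERVABLE and psychologist == NOT_OBSERVABLE:
            continue  # assumption: unobservable situations do not count
        rated[category] += 1
        if MISTAKE in (instructor, psychologist):
            errors[category] += 1  # mistake seen by either observer
    return {category: errors[category] / rated[category] for category in rated}
```

With made-up ratings such as `[("lane keeping", 2, 2), ("lane keeping", 1, 2)]`, the function returns an error proportion of 0.5 for lane keeping.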
In addition, overall driving performance was scored on a 6-point scale based on driving standards criteria. Driving scores from 1 to 6 were assigned as subjective ratings that reflect the error scores obtained during the test drive. Consistency of these results was checked by rank correlations (Spearman's rho) between the two scoring systems (number of mistakes observed, and globalized ratings). Assessments proved to be highly consistent, in particular for those categories that were easy to observe (like lane keeping, speed behavior, observation of blind spots). A higher proportion of errors led to poorer globalized ratings. In line with our consistency checks, globalized driving scores were previously shown to be valid measures for differentiating between good and bad driving performance [22,32-34]. A score of "1" indicated excellent driving skills with near-flawless behavior. Drivers scoring "2" would definitely pass the licensing test, indicating average driving skills with minor driving errors. A score of "3" indicated below-average driving and observation skills, where the driver might or might not pass the licensing test. There are, however, no major driving errors. A score of "4" indicated the driver would definitely fail the licensing test. Drivers with that score failed to drive in a safe manner; major driving errors had been observed and traffic rules had been disregarded. Scores of "5" or "6" indicated that drivers represent an increased risk for other road users or that the instructor had to take action to avoid an incident. Overall driving performance was then calculated as the mean value of the ratings from both observers, instructor and psychologist. The driving instructor was not informed about the participants' functional performance in the laboratory testing; the backseat evaluator conducted both the laboratory testing and the driving assessment, for practical reasons.
Inter-rater reliability for the ratings between the driving instructor and backseat evaluator was excellent, with an inter-rater correlation coefficient (Spearman Rho) of 0.86 (p<0.001).
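Spearman's rho, used above for both the consistency check and the inter-rater reliability, is the Pearson correlation computed on ranks. The following is a minimal plain-Python sketch with average ranks for ties; it is meant as an illustration, not as the routine used in the study (which was run in SPSS), and it omits the p value.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation between two raters' scores."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because the 6-point ratings produce many ties, the average-rank treatment matters here; the simplified formula based on squared rank differences would only be exact without ties.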

Statistical analysis
For the statistical analysis, in a first step drivers were grouped, post hoc, as safe or unsafe, according to their overall driving performance scores. Drivers who scored worse than 2.5, i.e., numerically above 2.5 (indicating driving skills where the driver may fail the licensing test), were labeled "possibly unfit to drive", and drivers who scored 2.50 or better were labeled "fit to drive". The chosen cutoff value of 2.5 thus represents a very strict and rigid criterion. It was chosen to suit the categorical diagnostic model that is commonly used for testing driving ability, resulting in a split between drivers who are fit to drive and those who are not. Dichotomous diagnostic models, by definition, cannot differentiate performance levels in between safe and unsafe. We thus had to decide between a strict and a tolerant dichotomous model. Because of the statistical requirement for balanced sample sizes, we decided in favor of the strict model. Since "unfit to drive" might be a misleading label for drivers who scored around 3 in the driving test, we described them as "possibly unfit to drive".
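The grouping rule can be written down in a few lines. This is a minimal sketch, assuming the 6-point scale described in the driving assessment (1 = best, 6 = worst) and the mean of the two observers' ratings as the overall score; the function names are hypothetical.

```python
CUTOFF = 2.5  # scores of 2.50 or better (numerically lower) count as fit


def overall_score(instructor_rating: float, psychologist_rating: float) -> float:
    """Overall driving performance as the mean of both observers' ratings."""
    return (instructor_rating + psychologist_rating) / 2.0


def classify(score: float) -> str:
    """Dichotomize an overall score using the strict cutoff."""
    return "fit to drive" if score <= CUTOFF else "possibly unfit to drive"
```

A driver rated 2 by one observer and 3 by the other (mean 2.5) is still labeled "fit to drive", whereas two ratings of 3 yield "possibly unfit to drive".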
Post-hoc group differences for vision and driving characteristics were examined using independent t-tests and χ² tests, where appropriate. When group differences were significant, effect sizes were calculated using an online analysis tool provided by www.psychometrica.de [35], as significance and p values on their own are insufficient and can be misleading in interpreting data [36-38]. As the interpretation of effect sizes depends on context, we interpreted the practical relevance of effect sizes using Ferguson's recommendations for social science data [36]. In particular, Ferguson proposes a recommended minimum effect size representing a "practically" significant effect for social science data ("RMPE"). We interpret effect sizes above that value but below the moderate size as a "small effect".
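As an illustration of the effect-size step, the sketch below computes Cohen's d with a pooled standard deviation and attaches a label. Only the moderate threshold of 1.15 appears in this paper; the RMPE of 0.41 and the strong threshold of 2.70 are assumptions recalled from Ferguson's table for d and should be verified against [36]. The online tool cited above may use a different variant of d.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence

RMPE = 0.41      # assumption: recommended minimum practical effect for d, see [36]
MODERATE = 1.15  # moderate effect, as used in this paper
STRONG = 2.70    # assumption: strong effect for d, see [36]


def cohens_d(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Cohen's d for two independent groups, using the pooled sample SD."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def effect_label(d: float) -> str:
    """Label the magnitude of d following the scheme described in the text."""
    size = abs(d)
    if size < RMPE:
        return "below RMPE"
    if size < MODERATE:
        return "small effect"
    if size < STRONG:
        return "moderate effect"
    return "strong effect"
```

Under these thresholds, an effect of d = 1.41 is labeled a moderate effect.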
In a second step, for each subset of variables (visual and psychometric measures), multiple regression with backward elimination via the Wald criterion (alpha = 0.1) was used, to isolate those variables with most power for predicting driving performance. Using the derived subset of visual and psychometric predictors, we attempted to reproduce the groups by means of binary logistic regressions. Data were analyzed with SPSS (V. 19.0; www.ibm.com).

Results
The sample was grouped into "fit to drive" (n = 59) and "possibly unfit to drive" drivers (n = 25) based on their driving performance in the on-road driving test, as described in Statistical analysis (Fig 1). In total, 59 drivers (70%) were assessed to definitely meet the driving license requirements for standard on-road driving tests (scores 1 and 2), while, on the other hand, 7 drivers (8%) were assessed to definitely fail to drive in a safe manner (scores 4 and 5; risky drivers). Drivers with scores in between (around 3) were labeled as below-average drivers (18 drivers, i.e., 21%). So, very bad drivers were rare; the majority of older adults in the study were, on the whole, good drivers. Table 4 shows the driving characteristics of safe and potentially unsafe drivers. Not surprisingly, there are striking differences between the post-hoc groupings of the sample, evidenced by mostly substantial effect sizes in nearly all observed categories. In particular, deficits in give-way situations, missing observation of blind spots, inappropriate speed behavior, and missing attentive/perceptive behavior mark drivers with low driving competence (Cohen's d > 1.15; i.e., moderate size). Table 5 shows the results of the fitness-to-drive tests, together with subject demographics. A post-hoc analysis showed between-group differences with small effect sizes for age, gender, estimated kilometers driven per year, and number of years possessing a class B driver's license (cars). No differences were found for level of education. Additional correlative analysis for the total sample reported elsewhere [39] revealed significant intercorrelations between a number of variables. The older the participants, the lower the estimated number of kilometers driven per year (correlation of estimated kilometers per year and age: r = -0.27). Estimated kilometers per year is also related to gender (correlation of estimated kilometers per year and gender: r = -0.29).

Visual assessment
Results of the visual assessment are shown in Table 6. Significant results, albeit of small impact on driving performance, were found for visual field size and visual acuity. The maximum horizontal visual field size (termed binocular visual field) varied, in total, between 115° and 161°, and the average field size for the fit drivers was slightly larger (151°) than that for the bad drivers (144°). Visual acuity, which was good overall and even at its lowest (logMAR 0.3, i.e., 20/40 or decimal 0.5) far from values that would indicate visual impairment, also showed a significant, but small, difference between the group means. For central-foveal and peripheral contrast sensitivity, no significant relationship to driving performance was found. As a predictive analysis, multiple linear regression was used to explain the relationship between driving performance and the visual performance indicators (Table 7). To reduce the predictive model to only those variables that explain additional variance in driving performance, a stepwise regression procedure with backward elimination of variables was chosen (Wald criterion alpha: 0.1). The resulting regression model accounts for about 18% (R² = 0.174, F(2, 81) = 8.51, p < 0.01) of the variance found in driving performance. In summary, visual acuity and binocular visual field show significant regression weights, indicating that drivers who score better on those two variables are expected to show slightly better driving performance.

PLOS ONE
Visual attention as a promising indicator of on-road driving performance

Psychometric assessment
We repeated the procedure described above to analyze the impact of psychological (psychometric) parameters (Table 8). Seven out of ten variables revealed significant effects on driving performance, with divided attention showing the most prominent effect (Cohen's d = 1.41; i.e., moderate size). Intelligence, learning ability, and reaction time showed no significant effect on driving performance. Again, a multiple linear regression with backward elimination of variables was used to find the set of variables most predictive for driving performance (Table 9). The resulting regression model accounts for about 30% (R² = 0.305, F(2, 81) = 17.79, p < 0.01) of the variance in driving performance. Selective attention (Cognitrone test, "COG") and divided attention (Peripheral Perception Test, "PP") show significant though small regression weights, indicating that drivers with better attentional skills show better driving performance.
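The reported F statistics can be related to the R² values through the standard formula for the overall F test of a linear regression, F = (R²/k) / ((1 − R²)/df_residual). The sketch below applies it to the figures given in the text; small deviations from the reported F values are to be expected because R² is rounded to three decimals.

```python
def f_from_r2(r2: float, n_predictors: int, df_residual: int) -> float:
    """Overall F statistic of a linear regression computed from R^2."""
    return (r2 / n_predictors) / ((1.0 - r2) / df_residual)


if __name__ == "__main__":
    # Psychometric model: R^2 = 0.305 with F(2, 81)
    print(round(f_from_r2(0.305, 2, 81), 2))
    # Vision model: R^2 = 0.174 with F(2, 81)
    print(round(f_from_r2(0.174, 2, 81), 2))
```

Both results land within a few hundredths of the reported values (17.79 and 8.51), which supports reading the degrees of freedom as two predictors and 81 residual degrees of freedom.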
To calculate the amount of variance that attention measures account for beyond pure vision measures, we ran a further regression analysis. The analysis included both the visual and the psychometric parameters shown to be important for driving performance in the preceding regressions (see Tables 7 and 9). The regression model shows three significant predictors [40], of which the two attentional measures (selective attention, β = −0.271, p = 0.005, and divided attention, β = 0.334, p = 0.001) are more predictive than the visual parameters. The mixed regression model accounts for about 40% (R² = 0.388, F(4, 78) = 12.38, p < 0.01) of the variance in driving performance (Table 10). Attentional measures therefore account for about 20% of variance beyond that explained by the vision measures (visual acuity, binocular visual field), which explained 18% of the variance in driving performance (cf. Visual assessment). Multicollinearity, i.e., correlation between predictors, is low (roughly 10%). Indicators of multicollinearity (tolerance and variance inflation factor (VIF)) are close to the ideal level (the obtained values of close to one, in both cases, indicate that the predictors are essentially uncorrelated).
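The increment in explained variance and the multicollinearity indicators follow from simple relations: ΔR² is the difference between the R² of the combined and the vision-only model, tolerance is 1 − R²_j, and VIF is its reciprocal, where R²_j is the variance of predictor j explained by the other predictors. In the sketch below, reading the "roughly 10%" as R²_j ≈ 0.10 is an assumption made for illustration.

```python
def incremental_r2(r2_full: float, r2_reduced: float) -> float:
    """Variance explained by the added predictors beyond the reduced model."""
    return r2_full - r2_reduced


def tolerance(r2_j: float) -> float:
    """Share of a predictor's variance not explained by the other predictors."""
    return 1.0 - r2_j


def vif(r2_j: float) -> float:
    """Variance inflation factor, the reciprocal of tolerance."""
    return 1.0 / tolerance(r2_j)


if __name__ == "__main__":
    print(round(incremental_r2(0.388, 0.174), 3))  # about 0.21, i.e. roughly 20%
    print(round(tolerance(0.10), 2), round(vif(0.10), 2))  # both close to one
```

With R² values of 0.388 and 0.174, the increment is about 0.21, consistent with the "about 20%" stated above.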

Predicting driving ability by means of binary logistic regression
Finally, a binary logistic regression was performed to determine the relative impact of visual and psychometric parameters as part of a combined multivariable diagnostic model predicting drivers as being fit or unfit to drive. The logistic regression model was statistically highly significant (χ²(4) = 28.619, p < 0.001). The result is in line with the non-significant result of the Hosmer-Lemeshow goodness-of-fit test, which indicates that the model prediction does not significantly differ from the observed values. The final model explained 42% (Nagelkerke R²) of the variance in fitness to drive, revealing a moderate goodness of fit between the predictors and the prediction. The model correctly classified 80% of the participants, providing high specificity (93%). Sensitivity was low (46%), however, identifying just 11 out of 25 bad drivers of the standardized driving test (Table 11). Performance of the model rests on just two significant predictor variables. Low selective attention scores (COG) significantly reduced the probability of being classified as fit to drive (15% reduction of likelihood, Exp(B) = 0.848). Increased scores in divided attention were associated with an increased likelihood (28%, Exp(B) = 1.275) of being classified as fit to drive. None of the standard visual performance indicators (contrast sensitivity, visual acuity) added significantly to the model (Table 12).
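The percentages attached to Exp(B) and the classification figures can be reproduced with elementary formulas. The sketch below treats "possibly unfit to drive" as the positive class. The cell counts in the usage example are illustrative reconstructions from the text (11 of 25 unfit drivers detected, roughly 93% of 59 fit drivers correctly classified) and are assumptions; the exact counts are those of Table 11, so the printed values need not match the reported percentages exactly.

```python
from typing import Dict


def percent_change_in_odds(exp_b: float) -> float:
    """Percent change in the odds per unit increase of a predictor."""
    return (exp_b - 1.0) * 100.0


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Standard diagnostic indices from a 2x2 classification table."""
    return {
        "sensitivity": tp / (tp + fn),  # unfit drivers correctly identified
        "specificity": tn / (tn + fp),  # fit drivers correctly identified
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


if __name__ == "__main__":
    print(round(percent_change_in_odds(0.848), 1))  # about -15
    print(round(percent_change_in_odds(1.275), 1))  # about +28
    # Assumed counts, for illustration only (see lead-in).
    for name, value in classification_metrics(tp=11, fn=14, tn=55, fp=4).items():
        print(name, round(value, 2))
```

Strictly speaking, Exp(B) describes a change in odds rather than in probability, so the 15% and 28% figures are best read as changes in the odds of being classified as fit to drive.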

Discussion
We investigated the relevance of visual and psychometric predictors for fitness to drive in a sample of elderly drivers. The diagnostic power of a state-of-the-art psychometric test battery on the Vienna Test System, combined with a set of standard visual parameters recommended for assessing fitness to drive by the Eyesight Working Group of the European Union (visual acuity, field of view, and letter contrast sensitivity), turned out to be surprisingly low. We thus believe that the test battery, even though it is state of the art and includes additional visual tests, does not satisfactorily meet its main purpose, which is to screen for fitness to drive and thereby identify bad drivers to ensure traffic safety. Bad drivers in our study revealed deficits in particular in observational/attentional behavior related to traffic safety (observation of blind spots, attentive and perceptive behavior, giving way), indicating below-average driving skills with which a driver might fail a licensing test. However, only 11 out of 25 drivers who showed impaired driving performance in the standardized on-road driving test were identified by the augmented test battery (i.e., test sensitivity was 46%). The test battery as designed would thus not even be able to appropriately screen out the drivers most at risk for driving accidents (i.e., drivers scoring 3.50 or worse in the driving test). Out of seven risky drivers, three had been incorrectly labeled as "fit to drive". In short, the diagnostic model failed to reliably predict fitness to drive. Another main finding of the study concerns the practical relevance of the predictors involved. Visual performance parameters required by law had no meaningful impact on driving performance. This indicates an astounding gap between mandatory regulations of state authorities and research results.
Visual testing, deemed particularly important, explained only about 20% of variance in driving performance, in contrast to psychological testing, which accounts for about 30%, based mainly on two single predictors: performance in selective-attention and divided-attention tests. Our results are in line with those of Staplin et al., whose aim was to update a set of screening guidelines for older drivers with the American Association of Motor Vehicle Administrators (AAMVA) [13]. The validity criteria there, unlike the criterion of driving performance in the present study, are traffic-safety outcome measures provided by the National Highway Traffic Safety Administration. That study also revealed the importance of perceptual-cognitive measures addressing selective and divided attention as potentially useful predictors. Interestingly, no exclusively visual screening test was part of the guidelines recommended by Staplin et al., in spite of driving being unarguably a highly visual task [23].
The role of vision in driver safety and driving performance was summarized by Owsley and McGwin in a comprehensive review of the literature [23]. Although driving performance should in theory be linked to driver safety, they found little empirical evidence for that link. This means that road users who demonstrate impaired driving performance in on-road driving tests are not necessarily the ones at high risk for future crash involvement, and vice versa. Although visual screening is the ubiquitous standard at licensing agencies for determining driving fitness, the reviewed studies mostly revealed no significant relationship between visual acuity and driving safety (i.e., motor-vehicle collision involvement). This is also true for visual-acuity related driving performance decrements (e.g., sign recognition), which do not translate into reduced safety. The authors conclude that visual acuity testing does not measure the visual skills required for driving. For visual field impairment, the reviewed studies showed mixed results. Owsley and McGwin refer to a famous large-scale study (10,000 drivers) conducted by Johnson & Keltner, which showed that drivers with severe binocular field loss had significantly higher motor-vehicle collision and violation rates than those without loss [41]. Similar results were found by Rubin et al. [42]. In contrast, several other studies found no significant relation between visual field impairment and driving safety. Owsley and McGwin attribute the ambiguous results to differences in the definition of visual field impairment, mixed procedures of visual field measurement, and missing control of compensatory strategies (i.e., eye and head movement) [23]. Concerning contrast sensitivity, there are no known licensing regulations in the U.S. or Europe that require its assessment, but numerous studies have reported significant associations between impaired contrast sensitivity and crash involvement [42,43] or driving performance [44,45]. The authors close their review by promoting a more practical approach to improving the efficacy of vision screening in assessing fitness to drive: supplementing the screening of visual acuity with other types of screening that focus on contrast sensitivity, visual field, processing speed, and divided attention.
Our conclusion is similar to that stated by Owsley & McGwin [23]. As visual impairment is mostly related to eye diseases (e.g., glaucoma, age-related macular degeneration [AMD], retinitis pigmentosa, cataract), it is the task of ophthalmologists and optometrists to test the integrity and health of the visual system as part of a medical check-up or health-care activity. This primary objective of visual screening might be the implicit reason why some studies have found significant relationships between visual function assessment and measures of driving performance and safety, and others have not. The extent of the relationship depends on the nature of the sample under observation [21]. For example, population-based studies have failed to find an association between contrast sensitivity deficits and increased crash risk [42,46], whereas studies of drivers with diagnosed cataracts (who showed a pronounced impairment of contrast sensitivity) revealed a strong relationship between contrast-sensitivity impairment and increased crash risk [43]. In sum, several studies have shown that people suffering from eye diseases might show impaired driving performance, depending on the severity of the associated symptoms [43,47,48,49]. Thus, in a health-care setting, visual diagnostics are effective in revealing the accompanying risks for persons affected by eye diseases, risks that should be taken care of. Future studies are needed to further explore the ability to drive safely with visual impairment, taking into account compensatory strategies that might help to prolong driving ability [50]. Importantly, however, visual acuity tests designed for clinical diagnosis and monitoring of eye health are not at all the method of choice for screening fitness to drive in a healthy population. As Wood & Owens, for example, show, the results of a visual acuity test are in most cases not at all, or only minimally, relevant for fitness to drive [51].
A screening for fitness to drive should reflect the complexity of the driving task; here we enter the domain of psychological testing. If scores fail the cut-off criterion, the respective drivers might be referred for further testing (e.g., to an on-road test). But how can psychological testing be established adequately in the licensing policies of state authorities? The European Council Directive 2006/126/EC [52] gives general recommendations for license renewal intervals that may be related to a screening for fitness to drive. As of 19 January 2013, licenses issued by Member States shall have an administrative validity of 10 or 15 years. EU Member States are free to reduce the period of administrative validity for license holders who have reached the age of 50 years in order to allow for an increased frequency of medical checks or other specific safety measures, such as requiring attendance at refresher courses. There is widespread agreement that aging generally results in some level of decline in sensory, perceptual, cognitive, psychomotor, and physical performance, and therefore probably also in driving skills [53]. However, age is not a useful index for determining intervals for license renewal or frequency of medical checks. The arguments against age-based assessment are many and varied: it has no demonstrable road safety benefits, it prompts premature cessation of driving, and it prompts older people to use alternative modes of transportation that are riskier than the private car [53][54][55][56][57]. There are likely even counterproductive effects of age-based assessments, indicated by a higher rate of fatalities of older pedestrians in countries with very strict renewal requirements [58]. According to Siren et al. [59], this is also the case for an age-based cognitive screening of older drivers.
After implementation of a cognitive screening program, more drivers were involved in fatal accidents, which was attributed to a possible shift of drivers who did not pass cognitive screening to more dangerous modes of transportation (i.e., becoming pedestrians), making them more vulnerable in traffic. There are even indications that a simple in-person license renewal may be more effective in reducing older driver deaths than tests [54]. On the other hand, independent of age, a large number of drivers continue driving even when they are not in a condition to do so [60]. Regarding the type of indisposition experienced (physical and/or psychological), the main causes are related to physical health. Emotional or psychological shortcomings related to driving (e.g., memory problems, attentional disturbances) are either not known to the respective drivers or ignored [61]. This is especially true for drivers suffering from dementia. It is estimated that one in three dementia patients still drives [62][63][64]. Hence, leaving the decision to continue driving to the driver appears insufficient to ensure on-road safety. We therefore argue for a multi-tiered strategic approach for enhancing individual and public safety without, however, unjustifiably (judging by crash statistics) stigmatizing the group of older drivers as potentially risky drivers. Research has shown older drivers to be among the safest groups of drivers on the road. Older people in particular need individual mobility as a significant factor for quality of life. Driving cessation is often associated with loss of independence, with isolation, and even death [56]. Hence, older drivers should drive their own car for as long as possible. We therefore recommend a brief and inexpensive routine first-stage screening in connection with an age-independent, in-person renewal of driving licenses. Here, instruments of known validity for safe driving should be applied.
Cognitive tests should cover all relevant faculties, in particular selective and divided attention, in a comprehensive battery, thereby improving test sensitivity for driving competence [65]. The routine screening should be accompanied by community policies for referring to it drivers who are suspected of having a high crash risk. At the core of the referral system would be family practitioners and caregivers trained to understand issues of safe driving and health-related factors or illnesses that could impede fitness to drive and on-road safety [61]. Practitioners and caregivers would need to be trusted third parties, able to refer a patient to more extensive mobility checking [66].
As Baldock et al. have shown, psychological assessment is a promising way to screen for fitness to drive [67]. The study tested 90 drivers aged 60 to 91 and used a battery of psychological, visual, physical, and cognitive tests, combined with a standardized on-road driving test. A computerized test of visual attention, devised specifically for the study, proved to be the best predictor of on-road driving performance. Wood et al. likewise evaluated screening tests for predicting older drivers' performance and its relationship to driving safety in an on-road driving test [14,22]. The studies included 270 [14] or 79 [22] older drivers and incorporated tests from the vision (e.g., static acuity, contrast sensitivity, visual fields, central motion sensitivity), cognition (e.g., processing speed, complex reaction time for driving), and sensorimotor domains (e.g., postural sway, total range of neck rotation). The final result of both studies is that a multi-disciplinary test battery composed of parameters derived from different domains showed a marked capacity to predict safe and unsafe driving in older adults assessed in an on-road driving test under in-traffic conditions. As in Baldock et al.'s study, the final battery included no standard optometric visual parameters [67]. The test battery comprised driving exposure (kilometers driven per year), sensorimotor assessment (Postural Sway), and cognitive testing (Dot Motion, Color Choice Reaction Time), providing a sensitivity of 80% (i.e., the proportion of unsafe drivers who were correctly identified as such by the test) and a specificity of 73% (i.e., the proportion of drivers without an incident or critical error who were correctly identified as safe) [22]. Wood et al. reported similar values, 89% sensitivity and 77% specificity [14]. The levels of sensitivity and specificity reported exceed ours and those of other studies [68].
According to Wood, this might result from dissimilarities in the samples and/or different levels of difficulty of the driving tests that allow visual or cognitive deficits to show up [22]. The methods of statistical analysis also differed. In the tradeoff between sensitivity and specificity, Wood et al. found cutoff points by inspecting the ROC curve that led to acceptable values for each, slightly favoring sensitivity [14,22,69]. Last but not least, the multi-disciplinary driving assessment proposed by Wood et al. incorporates a much wider range of domains important to driving, mirroring the underlying multi-factorial nature of driving [1,70]. Our study focused exclusively on visual and cognitive assessment. Nonetheless, both studies come to the same conclusion: cognitive assessment provides the key predictors for safe and unsafe driving in older adults, whereas visual assessment turns out to be less effective in comparison.
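How a cutoff can be chosen along the ROC curve may be easier to see in code. The procedure actually used by Wood et al. is not described here, so the following is only a sketch of one common approach: scan all candidate cutoffs and maximize a Youden-type index, with an optional weight that favors sensitivity. The function names, the weighting scheme, and the toy scores are all hypothetical.

```python
def sens_spec(scores, unsafe, cutoff):
    """Sensitivity and specificity when drivers with score >= cutoff are flagged as unsafe."""
    tp = sum(1 for s, u in zip(scores, unsafe) if u and s >= cutoff)
    fn = sum(1 for s, u in zip(scores, unsafe) if u and s < cutoff)
    tn = sum(1 for s, u in zip(scores, unsafe) if not u and s < cutoff)
    fp = sum(1 for s, u in zip(scores, unsafe) if not u and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores, unsafe, sens_weight=1.0):
    """Cutoff maximizing sens_weight * sensitivity + specificity - 1.

    With sens_weight = 1 this is Youden's J; larger weights favor sensitivity.
    """
    best = None
    for cutoff in sorted(set(scores)):
        se, sp = sens_spec(scores, unsafe, cutoff)
        index = sens_weight * se + sp - 1.0
        if best is None or index > best[0]:
            best = (index, cutoff, se, sp)
    return best[1:]


# HYPOTHETICAL toy data: higher screening score = more impaired.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
unsafe = [False, False, False, True, False, False, True, True, True, True]

balanced_choice = best_cutoff(scores, unsafe)                        # plain Youden's J
sensitive_choice = best_cutoff(scores, unsafe, sens_weight=2.5)      # favors sensitivity

print("cutoff, sensitivity, specificity:", balanced_choice)
print("cutoff, sensitivity, specificity:", sensitive_choice)
```

With the toy data, weighting sensitivity more heavily moves the cutoff downward: every unsafe driver is then flagged, at the price of more safe drivers being flagged as well. This is the kind of tradeoff a screening instrument has to settle explicitly.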
While our findings do not provide clear guidance for predicting fitness to drive, the study results are consistent with previous research [22,23,67], suggesting the need for a specific test battery, with demonstrated practical relevance, for a population-based screening for safe and unsafe driving in older individuals. Towards that goal, our results clearly indicate the superior effectiveness of psychological assessment over visual screening. Our results are based on driving assessment under real traffic conditions using a standardized route providing a wide variety of typical driving challenges. Agreement in ratings between the driving instructor and the psychological supervisor was high, indicating reliable on-road driving assessment. A limitation of the statistical analysis in the present study might, however, be a selection bias of participants. In Germany there is no restriction for driving licenses based on cognitive measures. Cognitive measures are required only for drivers having committed documented, repeated traffic violations (speeding, jumping red lights, etc.). Hence, the visual variability in the sample was indeed limited due to selective processes (e.g., datasets indicating ocular or visual pathway disease were excluded since persons with poor visual acuity get no driving license). In contrast, no screening for cognitive deficits took place, providing room for variability in cognitive parameters. Thus, it is fair to say that attention measures were more predictive than vision measures in a sample pre-screened for clear-cut visual deficits, but these results might not generalize beyond that restriction. Furthermore, drivers aware of their visual deficits or driving impairments might not have responded to our newspaper announcement inviting participation in a research project related to fitness to drive. Our sample may thus not fully represent the population of elderly drivers.
Another limitation of our study is the limited number of drivers who were rated as unfit to drive in the on-road driving test (25 out of 84). While that proportion of 30% may be representative of the elderly population, it is not the best precondition for logistic regression, which works better with a balanced sample (50%/50%) [71]. Classification accuracy, in the unbalanced situation, is not a good measure for judging model performance. The proportion of drivers rated unfit to drive in our sample (30%) was below that value, but their number met the recommended minimum (N = 25).
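The point about unbalanced samples can be made concrete with the figures reported in this study. The sketch below uses hypothetical helper names and takes the group sizes (25 of 84 unfit) and the reported sensitivity (46%) and specificity (93%) as inputs. It compares the accuracy of trivially labeling every driver as fit with a balanced accuracy that weights both groups equally.

```python
def majority_baseline_accuracy(n_minority: int, n_total: int) -> float:
    """Accuracy obtained by assigning every case to the larger class."""
    return (n_total - n_minority) / n_total


def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Mean of sensitivity and specificity; insensitive to class proportions."""
    return (sensitivity + specificity) / 2.0


# Figures reported in the text.
baseline = majority_baseline_accuracy(n_minority=25, n_total=84)
balanced = balanced_accuracy(sensitivity=0.46, specificity=0.93)

print(f"accuracy when everyone is labeled fit: {baseline:.3f}")
print(f"balanced accuracy of the model:        {balanced:.3f}")
```

Labeling all drivers as fit would already be correct in about 70% of cases, so the 80% overall classification rate is a smaller gain than it first appears, and the balanced accuracy of about 0.70 reflects the low sensitivity.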
In summary, our study highlights the need to take notice of the disparate goals of medical diagnosis in health care and of traffic safety, and to distinguish screenings in the tradition of ophthalmologists and optometrists from screenings for fitness to drive that address the driving safety of road users. The former, based on visual acuity testing and assessment of the intactness of the visual field, is highly effective in clinical diagnosis and monitoring of eye diseases. The latter addresses human performance in a highly complex behavioral task: driving. The improved practical relevance and superior effectiveness of specific test batteries designed exclusively for that purpose have been shown, here and elsewhere. Future research should take into account ongoing innovation in vehicle technologies, which will also alter the requirements for the screening for fitness to drive. Advanced driver-assistance systems (ADAS), designed to minimize human error, might compensate for known driving deficits that accompany a driver's sensory impairments underlying bad driving performance [23]. Increasing automation of driving will shift active drivers to the role of a supervisor of the car, thus reducing the complexity of driving to that of simply monitoring the system [72][73][74][75]. Inherently, the pronounced shift of technology towards automated driving might shift the focus of a screening for fitness to drive even more to an assessment of the attentional abilities needed to monitor the system.