Visual scanning strategies in the cockpit are modulated by pilots’ expertise: A flight simulator study

Christophe Lounis; Vsevolod Peysakhovich; Mickaël Causse

doi:10.1371/journal.pone.0247061

Abstract

During a flight, pilots must rigorously monitor their flight instruments, since this is one of the critical activities that contribute to updating their situation awareness. Monitoring is cognitively demanding but necessary for timely intervention in the event of a parameter deviation. Many studies have shown that a large part of commercial aviation accidents involved poor cockpit monitoring by the crew. Research in eye-tracking has developed numerous metrics to examine visual strategies in fields such as art viewing, sports, chess, reading, aviation, and space. In this article, we propose to use both basic and advanced eye metrics to study visual information acquisition, gaze dispersion, and gaze patterning among novices and pilots. The experiment involved a group of sixteen certified professional pilots and a group of sixteen novices during a manual landing task scenario performed in a flight simulator. The two groups landed three times with different levels of difficulty (manipulated via a dual task paradigm). Compared to novices, professional pilots had a higher perceptual efficiency (more numerous and shorter dwells), a better distribution of attention, an ambient mode of visual attention, and more complex and elaborate visual scanning patterns. We classified pilots’ profiles (novices vs. experts) by machine learning based on Cosine KNN (K-Nearest Neighbors) using transition matrices. Several eye metrics were also sensitive to the landing difficulty. Our results can benefit the aviation domain by helping to assess the monitoring performance of crews, improve initial and recurrent training, and ultimately reduce incidents and accidents due to human error.

Citation: Lounis C, Peysakhovich V, Causse M (2021) Visual scanning strategies in the cockpit are modulated by pilots’ expertise: A flight simulator study. PLoS ONE 16(2): e0247061. https://doi.org/10.1371/journal.pone.0247061

Editor: Peter James Hills, Bournemouth University, UNITED KINGDOM

Received: July 10, 2020; Accepted: February 1, 2021; Published: February 18, 2021

Copyright: © 2021 Lounis et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the manuscript and its Supporting information files.

Funding: This research was supported by a chair grant from Dassault Aviation (“CASAC”, holder: MC). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Monitoring activity in the cockpit

Throughout the flight, pilots must build and update their situation awareness (SA) to maintain flight safety margins [1]. The flight crew cannot update the SA without monitoring specific flight instruments (e.g., attitude indicator, speed, altimeter, engine parameters) and the external environment (in clear weather). The monitoring activity, particularly critical during dynamic flight phases such as take-off and landing, includes the observation and interpretation of the flight path data, aircraft-configuration status, automation modes, and on-board systems. It requires a real-time comparison of instrument data or system modes against the values expected for the current flight phase. Rigorous cockpit monitoring allows timely corrective actions in case of a parameter deviation, ensuring an optimal level of safety [2]. This monitoring activity is structured in sequences of attentional shifts from one instrument to another.

Irregularities in these sequences can undermine the safety margins. In numerous cases of aircraft accidents, pilots’ visual scanning has been described as “inadequate”, “ineffective”, or “insufficient” [3]. Since the 1994 report by the National Transportation Safety Board, which found that inappropriate monitoring was involved in 84% of major accidents in the United States [4], numerous studies have investigated the visual behavior of pilots. However, a “practical guide for improving flight path monitoring” by the Flight Safety Foundation [5], which investigated 188 accidents with monitoring issues, underlines that many monitoring errors still occur, most of them during dynamic phases of flight (e.g., climb, descent, approach, and landing). In 2013, the Federal Aviation Administration required airlines to include an explicit training program to improve monitoring skills [6, 7]. Following the PARG study [7], the Bureau d’Enquêtes et d’Analyses (French Investigation Agency) encouraged the use of eye tracking systems to finely analyze and improve crews’ visual scanning. Interestingly, an extensive survey conducted on 931 pilots during the PARG study [8] showed that most pilots need a better description of what a “standard” visual circuit in the cockpit is. Similarly, in another recent survey [9], 75% of pilots deemed it helpful to know the required visual patterns for the different flight phases to enhance their cockpit monitoring skills.

Visual scanning strategies as a marker of expertise

The relationship between visual scanning skills and performance has been highlighted in experiments where participants were trained to gaze at relevant areas. For instance, Shapiro et al. [10] demonstrated that videogamers who were trained using efficient visual scanning examples showed better performance compared with random pattern training or no training at all. In an air traffic control study, Kang and Landry [11] enhanced novices’ performance in a conflict detection task by presenting experts’ visual scans overlaid on the radar screen during the task. The study also showed that the visual presentation outperformed the “instruction-only” condition. These studies support the relationship between visual patterns and task performance, and demonstrate the possibility of improving these patterns with adequate training. Task performance increases with experience and the associated expertise. Links between visual scanning strategies and expertise have been observed in fields such as radiology, driving, sport, military aviation, and chess (e.g., [12–14]). Gegenfurtner, Lehtinen, and Säljö [15] conducted a meta-analysis and highlighted that experts (compared to non-experts) generally demonstrate more fixations on task-relevant areas as well as shorter fixations. In their review of eye movements in medicine and chess, Reingold and Sheridan [16] labeled this greater perceptual effectiveness of experts as “superior perceptual encoding of domain-related patterns”.

Several studies in the aeronautical domain showed that pilots’ visual scanning strategies (e.g. duration and frequency of fixations) evolve with the level of expertise [17–25]. According to Bellenkes et al. [26] the fixations of experts are shorter and fixations on instruments are more frequent. Similarly, Kasarskis, Stehwien, and Hickox [27] noticed that expert pilots (1500—2150 flight hours) perform more fixations and have shorter dwell times than novices (40—70 flight hours), and argued that experts have more structured visual patterns. Lorenz et al. [28] have shown that experts (3000–10300 flight hours) spend more time looking outside the cockpit compared to novices (13–500 flight hours) during a taxiing task. Furthermore, a study involving fighter pilots flying high speed low altitude flights [29] highlighted the importance of efficient visual scanning strategies. In this study, the pilots who achieved the best flight performance made shorter fixations on the heads-down tactical display and alternated more frequently between the tactical display and the outside world. Similar results were found in experts (>1000 hours) and novices (200—400 hours) playing flight simulation games [30]. Because visual scanning appears to differentiate between expert and novice pilots’ performance, it is interesting to examine which eye tracking metrics are available in the literature [31] to compare the visual scanning strategies using various approaches such as the estimation of the distribution and patterning of the visual scanning.

The objective of the present work is to provide a framework for eye movement data analysis techniques to study visual scanning strategies in novices and experts. These eye movement measures and algorithms are presented in light of the results of an experiment involving novice and expert pilots during a landing scenario performed in a flight simulator. We examined the impact of expertise and the difficulty of the flight scenario on visual attention allocation. The participants performed the same landing scenario three times with varying difficulty conditions. Two difficulty conditions incorporated a supplementary visual monitoring task, with different time pressure, to make cockpit monitoring more complex by increasing visuomotor activity. We analyzed the effect of the participants’ profile (pilot vs. novice) as well as the effects of the landing difficulty on numerous standard (number of dwells, average dwell times) and advanced eye movement metrics (Lempel-Ziv Complexity, Gaze Transition Entropy, attentional modes, N-gram methods) presented in the following section.

State-of-the-art visual scanning metrics

Classical eye movement measures such as fixation duration, dwell time, or the number of fixations provide relevant results when comparing novices vs. experts. However, statistical analyses of these metrics often involve time-averaging operations, thus neglecting the information regarding the sequence of instrument scanning. Consequently, a rich part of the data that reflects the dynamics of the deployment of attention is lost or not fully exploited. Numerous other metrics are available to explore and characterize visual scanning strategies in more depth. We use the broad term “visual scanning” to describe a sequence made up of at least one dwell on one area of interest (AOI), followed by a transition and a dwell on another AOI; “visual scanning pattern” is used when the visual scanning is made up of repeated sequences of a given “visual scanning”. One approach to examining visual scanning strategies is to analyze the transition matrix (e.g., [32–35]), a second one is the characterization of fluctuation between ambient/focal visual behavior [36], and another one is to derive global pattern metrics such as entropy (e.g., [37, 38], see [39] for a review). More generally, in this paper, we classified visual scanning strategy metrics into three AOI-based approaches: one is based on Markov chains (transition matrix), another is based on the attentional modes, and the last one is based on sequence analyses. Fig 1 presents a comparison of the visual scanning metrics described below (e.g., formula, definition, strengths, shortcomings).

Fig 1. Overview of the different visual scanning metrics classified by approaches.

https://doi.org/10.1371/journal.pone.0247061.g001

Markov chains

Several metrics allow examining whether visual scanning is narrow or wide.

The transition matrix probabilities.

Transition matrices contain information about how often a transition from one Area Of Interest (AOI) to another occurred, based on subsequent dwells of the visual scan. This method provides a data representation that can also lead to the development of stochastic and queuing models [40] of the pilot’s scanning in the cockpit. This method can be extended to three dimensions by considering the location of the previous two dwells, which Norris et al. [41] have described as a second-order Markov chain. Jones et al. [42] showed that transition matrices are sensitive to flight maneuvers. Based on the transition matrices, Hayashi proposed in 2004 [43] a Hidden Markov Model approach corresponding to different flight tasks. This work was later used to model the dwell patterns of the space shuttle crew [44].
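
As a minimal sketch of how a first-order transition matrix can be estimated from a dwell sequence, the following Python example counts AOI-to-AOI transitions and normalizes each row into probabilities. The function name `transition_matrix`, the 0-based AOI coding, and the toy sequence are illustrative assumptions, not the authors' implementation.

```python
def transition_matrix(aoi_sequence, n_aoi):
    """Return (counts, probabilities) for first-order AOI transitions.

    aoi_sequence: AOI indices in 0..n_aoi-1, one entry per dwell (assumed coding).
    """
    counts = [[0] * n_aoi for _ in range(n_aoi)]
    # Each pair of consecutive dwells is one transition "from" -> "to".
    for src, dst in zip(aoi_sequence, aoi_sequence[1:]):
        counts[src][dst] += 1
    probs = []
    for row in counts:
        total = sum(row)
        # Rows with no outgoing transition are left as all zeros.
        probs.append([c / total if total else 0.0 for c in row])
    return counts, probs


# Toy dwell sequence over three hypothetical AOIs.
counts, probs = transition_matrix([0, 1, 0, 2, 0, 1], n_aoi=3)
```

Row-normalizing gives, for each "from" AOI, the estimated probability of moving to each "to" AOI, which is the quantity a first-order Markov chain model requires.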

Transition matrix density.

Introduced by Goldberg and Kotval [31], the transition matrix density describes the dispersion of attention over time [45]. Transition matrix density provides a single quantitative value by dividing the number of active transition cells (i.e., those containing at least one transition) by the total number of cells. An unusually dense transition matrix (large index value), with most cells filled with at least one transition, can indicate a dispersed, lengthy, and wandering visual scan (this can reflect an extensive search on a display for example) [46]. A sparse matrix can reflect a more efficient and directed search, for example when using a computer software [40], or, in other contexts, can indicate a failure to properly monitor the environment, for example when a novice driver directs his gaze continuously to the road while ignoring/forgetting the rearview mirrors or when a pilot is excessively engaging his visual attention on a single instrument (e.g., [47]).
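
The density computation described above can be sketched as follows in Python. This is an illustration only: the function name is hypothetical, and counting all cells (including the diagonal) in the denominator is an assumption that may differ from the convention used in [31].

```python
def transition_matrix_density(aoi_sequence, n_aoi):
    """Fraction of transition-matrix cells that contain at least one transition."""
    # Collect the distinct (from, to) cells visited by consecutive dwells.
    active_cells = set(zip(aoi_sequence, aoi_sequence[1:]))
    # Assumption: the denominator is the full n_aoi x n_aoi grid.
    return len(active_cells) / (n_aoi * n_aoi)


# Toy dwell sequence over three hypothetical AOIs (0-based coding).
density = transition_matrix_density([0, 1, 0, 2, 0, 1], n_aoi=3)
```

In this toy case four of the nine cells are active, so the density is 4/9.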

Attentional modes

K coefficient.

Another evaluation of the dispersion of attention is a parametric scale called the K coefficient, introduced by Krejtz et al. [48]. This metric was developed in studies of artwork exploration (e.g., paintings) and map viewing [49] in order to investigate the dynamics of the visual scan (focal vs. ambient) during such tasks. In a recent study, Lounis et al. [50] used this method with modified input data, using dwells and transitions instead of fixations and saccades. During various flight phases with automation in a full-flight simulator, they calculated for each pilot the mean difference between the standardized values (z-scores) of each dwell duration and of the amplitude of the transition that follows it:

$$K_i = \frac{d_i - \mu_d}{\rho_d} - \frac{a_{i+1} - \mu_a}{\rho_a} \qquad (1)$$

where d_i is the duration of the i-th dwell and a_{i+1} the amplitude of the transition that occurs after the i-th dwell; μ_d and μ_a are the mean dwell duration and mean transition amplitude, respectively, and ρ_d and ρ_a are the corresponding standard deviations. Values of K_i close to zero indicate relative similarity between dwell durations and transition amplitudes. Positive values of K_i show relatively long dwells followed by short transition amplitudes, which indicate focal attention. Negative values of K_i refer to the situation where relatively short dwells are followed by relatively long transitions, suggesting ambient (diffuse) attention. According to Heitz and Engle [51], in the diffuse mode, visual attention is allocated to all regions of the visual field in roughly equal proportion; in the focused mode, attention is concentrated on a few areas of interest, specified by a central or peripheral cue. An extremely focused mode could be compared to the concept of attentional tunnelling [47]. It is worth noting that the values of the K coefficient should be interpreted together with dwell duration results because different groups can have different average values of dwell duration and transition amplitude.
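
A minimal Python sketch of the K coefficient computation is given below. It assumes that the means and standard deviations used for the z-scores come from a reference pool (for example, all participants), because standardizing a subset against itself would force its mean K to zero. The function name, the use of the sample standard deviation, and the toy data are assumptions for illustration.

```python
from statistics import mean, stdev


def k_coefficient(durations, amplitudes, ref_durations, ref_amplitudes):
    """Mean of z(dwell duration) - z(following transition amplitude).

    durations[i] is paired with amplitudes[i], the transition after dwell i.
    The ref_* lists supply the mean and SD used for standardization (assumed pool).
    """
    if len(durations) != len(amplitudes):
        raise ValueError("each dwell needs exactly one following transition")
    mu_d, sd_d = mean(ref_durations), stdev(ref_durations)
    mu_a, sd_a = mean(ref_amplitudes), stdev(ref_amplitudes)
    k_values = [(d - mu_d) / sd_d - (a - mu_a) / sd_a
                for d, a in zip(durations, amplitudes)]
    return mean(k_values)


# Hypothetical data: dwell durations in ms, transition amplitudes in degrees.
focal_d, focal_a = [900, 1000, 1100], [2, 3, 2]        # long dwells, short transitions
ambient_d, ambient_a = [200, 250, 300], [20, 25, 22]   # short dwells, long transitions
pool_d, pool_a = focal_d + ambient_d, focal_a + ambient_a

k_focal = k_coefficient(focal_d, focal_a, pool_d, pool_a)
k_ambient = k_coefficient(ambient_d, ambient_a, pool_d, pool_a)
```

With these toy values `k_focal` is positive and `k_ambient` is negative, matching the focal versus ambient interpretation given above.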

Sequence analyses

The sequence analyses approach allows measuring the extent to which the time sequence of eye movements is ordered or random during a flight.

Gaze Transition Entropy (GTE).

Defined by Shannon and Weaver [52], entropy is a measure of the lack of predictability in a sequence. This metric makes it possible to evaluate how structured the gaze is [53]. When applied to eye tracking data, transition entropy describes the amount of information needed to describe the visual strategies, following the formula:

$$H_t = -\sum_{i} p(i) \sum_{j} p(j \mid i) \log_2 p(j \mid i) \qquad (2)$$

where i represents the “from” AOI and j represents the “to” AOI, p(i) is the probability of dwelling on AOI i, and p(j | i) is the probability of a transition to AOI j given a dwell on AOI i. Higher transition entropy denotes more randomness and more frequent switching between AOIs [54]. Ephrath, Tole, Stephens, and Young [55] noticed an increase of entropy with increasing pilots’ mental workload (by adding a secondary task). Van de Merwe et al. [56] found that entropy increased as a result of cockpit instrument failure, a condition that most likely produces an increased mental workload. More recently, using GTE, Allsop and Gray [57] revealed that visual scanning became more random during an anxiety-inducing landing scenario. Diaz-Piedra et al. [58] observed a significant decrease in pilots’ gaze entropy when pilots faced a more complex scenario.
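
To illustrate the transition entropy described above, here is a minimal Python sketch that computes it from a sequence of visited AOIs. It assumes that the probability of each "from" AOI is estimated empirically as the share of transitions leaving that AOI (other implementations use the stationary distribution of the Markov chain instead); the function name and the toy sequences are hypothetical.

```python
from collections import Counter
from math import log2


def gaze_transition_entropy(aoi_sequence):
    """Conditional entropy (in bits) of the next AOI given the current AOI."""
    pairs = list(zip(aoi_sequence, aoi_sequence[1:]))
    if not pairs:
        return 0.0
    pair_counts = Counter(pairs)
    source_counts = Counter(src for src, _ in pairs)
    total = len(pairs)
    entropy = 0.0
    for (src, _dst), count in pair_counts.items():
        p_src = source_counts[src] / total            # empirical p(i), an assumption
        p_dst_given_src = count / source_counts[src]  # p(j | i)
        entropy -= p_src * p_dst_given_src * log2(p_dst_given_src)
    return entropy


regular = gaze_transition_entropy("ABCABCABC")   # fully predictable cycle
variable = gaze_transition_entropy("ABACABACA")  # from A, B and C are equally likely
```

The fully predictable cycle yields an entropy of 0 bits, while the second sequence yields 0.5 bits because only the transitions leaving A (half of all transitions) carry one bit of uncertainty.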

Lempel-Ziv complexity.

The complexity (i.e., the quantity and diversity) of visual scanning patterns can be assessed using Lempel-Ziv Complexity (LZC). LZC was defined by Lempel and Ziv in 1976 (for a review, see [59]) as a data compression algorithm computing the minimum number of bits from which a particular message or file can effectively be reconstructed. This algorithm counts the number of different patterns in a sequence when scanned from left to right. For instance, the Lempel-Ziv complexity of s = 101001010010111 is 7, because when scanned from left to right, 7 different patterns are observed: 1∣0∣10∣01∣010∣0101∣11. Recently, LZC was applied to dwell transitions to evaluate the number of different visual scanning patterns [60].
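
The left-to-right pattern count in the example above can be reproduced with a short Python sketch. It uses a dictionary-style parsing in which each new pattern is the shortest run of symbols not yet recorded as a pattern; this matches the worked example, but the implementation used in [60] may follow a different Lempel-Ziv variant. The function name and the handling of a trailing incomplete pattern are assumptions.

```python
def lz_pattern_count(sequence):
    """Count distinct patterns found when parsing the sequence left to right."""
    seen = set()
    phrase = ()
    for symbol in sequence:
        phrase += (symbol,)
        # Close the pattern as soon as it has not been observed before.
        if phrase not in seen:
            seen.add(phrase)
            phrase = ()
    # Assumption: a leftover, already-seen fragment counts as one more pattern.
    return len(seen) + (1 if phrase else 0)


complexity = lz_pattern_count("101001010010111")
```

For the string in the text, the parsing yields the seven patterns 1, 0, 10, 01, 010, 0101, and 11. The same function accepts a list of AOI numbers in place of a binary string.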

N-gram sequences.

N-grams are an essential component of many methods in bioinformatics, including genome and transcriptome assembly, metagenomic sequencing, and error correction of sequence reads [61]. Basically, an N-gram model predicts the occurrence of an AOI based on the occurrence of its N–1 previous AOIs. The question addressed here is: how far back in the history of a sequence of AOIs should we go to predict the next AOI? For instance, a bigram model (N = 2) predicts the occurrence of an AOI given only its previous AOI (as N–1 = 1 in this case). Similarly, a trigram model (N = 3) predicts the occurrence of an AOI based on its previous two AOIs. The common N-gram sequence analysis used the n-grams frequency-based method [62] to identify the number of common 3, 4, 5, and 6-gram sequences in each group. With this method, it is possible to count the occurrences of each N-gram of AOIs for each pilot, and thus to compare, for each N, the intra-group consistency of patterns.
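
A frequency-based N-gram analysis of this kind can be sketched in Python as follows. The sketch counts every N-gram in each participant's AOI sequence and then keeps the N-grams found in all members of a group. Treating "common" as "present in every participant of the group" is an assumption made for illustration, as are the function names and the toy sequences.

```python
from collections import Counter


def ngram_counts(aoi_sequence, n):
    """Count each run of n consecutive AOIs in one participant's sequence."""
    return Counter(tuple(aoi_sequence[i:i + n])
                   for i in range(len(aoi_sequence) - n + 1))


def common_ngrams(group_sequences, n):
    """N-grams that occur at least once in every sequence of the group (assumed definition)."""
    per_participant = [set(ngram_counts(seq, n)) for seq in group_sequences]
    return set.intersection(*per_participant) if per_participant else set()


# Three hypothetical participants, AOIs coded by number.
group = [[1, 2, 3, 1, 2, 3, 6], [6, 1, 2, 3, 5], [1, 2, 3, 1, 5]]
shared_trigrams = common_ngrams(group, 3)
```

Here only the trigram (1, 2, 3) is shared by all three participants; repeating the call with n = 4, 5, and 6 gives the number of common sequences for each N-gram length.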

Current study

In the present study, we evaluated the efficiency of the previously described metrics on eye tracking data from novice and expert pilots. Our main hypotheses were that expert pilots should exhibit different visual behaviors than novices, including more numerous dwells and shorter dwell times, following the idea that superior perceptual encoding comes with expertise. We also expected all advanced metrics to be sensitive to expertise, with more visual scanning complexity (as evaluated by the Lempel-Ziv complexity and the visual pattern lengths) and a more regular visual scanning (as evaluated by the transition entropy) in experts. We also assumed that the pilots’ expertise could be classified (using machine learning) from the way they switched from one instrument to another, using transition matrices. Finally, we hypothesized that the addition of a parallel monitoring task should also have an impact on ocular behavior, notably by increasing complexity, reducing the regularity level, and generating an ambient mode of attention (i.e., more diffuse attention).

Materials and methods

For reproducibility purpose, the protocol is available on protocols.io; DOI number: dx.doi.org/10.17504/protocols.io.zb5f2q6.

Participants

Thirty-two participants, all male, took part in this experiment. They all had normal or corrected-to-normal vision. They were not informed about the exact purpose of the study. They were divided into two groups according to their flying experience. A first group called “novices” consisted of participants with no real flight experience (n = 16, mean age 25.7 ± 5.5 years). They were recruited from a French aerospace engineering school (ISAE-SUPAERO, Toulouse, France). All these novice participants had advanced theoretical knowledge of aeronautical engineering, were familiar with the various information given by the instruments in the cockpit (altimeter, altitude etc.), and had basic notions of how to manually interact with the aircraft. Our experimental flight scenarios were relatively simple: the participant had to control the trajectory and the speed of the aircraft. The scenarios did not require complex navigation activities or interaction with automation. Thus the scenarios were feasible for these novices after a relatively short training session. A second group called “pilots” consisted of active professional airline pilots (n = 16, mean age 34.39 ± 8.86 years) with a minimum of 1600 flight hours (mean = 4321.73 ± 2911.41 hours). They were recruited from various airline companies. They had all flown the A320 and were currently flying the A320 (68.75%) or the B737 (31.25%) at the time of the experiment.

Ethics statement

This research project was approved by the local institutional Research Ethics Committee of the University of Toulouse (Comité d’Ethique de la Recherche de l’Université de Toulouse, code N°2019-131) and was conducted in accordance with the Helsinki Declaration. Volunteers signed an informed consent prior to the experiment and were informed of their right to stop their participation at any time.

Materials

Flight simulator.

We used an A320-like flight simulator (“PEGASE”) located at ISAE-SUPAERO (Toulouse, France), see Fig 2. Like in the A320 aircraft, flight instruments included a Primary Flight Display (PFD), a Navigation Display (ND), an Electronic Central Aircraft Monitoring display (ECAM), and an FCU (Flight Control Unit). The field of view covered by the simulator is about 180°. The participants controlled the aircraft with a side-stick, two thrust levers, and a rudder. We recorded flight data to calculate flight performance during the landing.

Fig 2. ISAE-SUPAERO flight simulator with its external screens.

https://doi.org/10.1371/journal.pone.0247061.g002

Flight scenarios.

Participants manually (i.e., without the autopilot) performed the same landing scenario three times under three different conditions. The “control scenario” (CS) was a nominal landing without a supplementary task. The “easy dual task scenario” (EDTS) and the “difficult dual task scenario” (HDTS) were similar to the “control scenario” except that participants were asked to perform a supplementary monitoring task. The purpose of this supplementary task was to increase the level of visuo-attentional effort: participants had to regularly check the ND Zone in the ND screen to say aloud the value at the right time. In the “easy dual task scenario”, participants were asked to say aloud the distance between the aircraft and the airfield threshold every 0.5 Nm (information provided by a radio beacon located near the airfield and displayed in the ND Zone, see Fig 3). In the “difficult dual task scenario”, they were asked to say aloud this distance every 0.2 Nm. The experimenter stayed in the cockpit during the entire experiment. Each of the three landing scenarios consisted of performing an approach/landing to Toulouse-Blagnac Airport, Runway LFBO 14R. The flight began at coordinates 1.2159° of longitude and 43.7626° of latitude. During each scenario, the participants had to comply with the same specific instructions related to the flight. In particular, they had to maintain a vertical speed between +500 ft/min and -800 ft/min, a speed of 130 knots, and a heading of 143° (corresponding to Runway 14R). We chose these values because they roughly correspond to a standard landing speed with a commercial aircraft. The negative vertical speed of -800 ft/min approximately corresponds to the vertical speed at 130 kt with an angle of approach of three degrees. We defined a tolerance range in case the participant was not well stabilized on the approach slope and had to regain altitude (+500 ft/min maximum). 
Each landing scenario started at an altitude of 2000 ft and lasted approximately four minutes. The three scenarios were randomized across participants to avoid learning effects. Performance dependent variables were heading, vertical speed, and speed deviations. The number of omissions (i.e., the participant omitted to call out the distance) during the supplementary task was also calculated.

Fig 3. Overview of the ten different AOIs: (1) Attitude indicator, (2) Speed tape, (3) Vertical speed tape, (4) Flight mode annunciator, (5) Heading tape, (6) Navigation display, (7) ND zone (displays the distance to recall during the two landing scenarios with the supplementary task), (8) Flight control unit, (9) Electronic centralized aircraft monitoring, (10) Out of the window.

https://doi.org/10.1371/journal.pone.0247061.g003

Eye movements recordings.

Eye movements were recorded at 60 Hz using a Smart Eye remote eye tracker (Smart Eye AB, Sweden). The system detects face/head movements, eye movements, and gaze direction. Gaze direction and eyelid positions are determined by combining image edge information with 3-D models of the eye and eyelids. As presented in Fig 3, the system uses five cameras integrated into the cockpit. A major advantage of using several cameras is that eye and head tracking can be maintained despite significant head motions (translation and rotation) or occlusion of one of the cameras by the participant (e.g., by their hand). The Smart Eye system makes it possible to design a 3D environment and to establish calibration points (in the vicinity of the AOIs). Once the world model is designed, only an automatic calibration is needed for each participant.

World model and area of interest.

The cockpit was split into 10 AOIs, corresponding to the different flight instruments and displays that pilots can examine during a flight, see Fig 3. We chose to restrict our analysis to instruments that display information directly related to the flight parameters (altitude, speed etc.) and to the external view (i.e., Out of the Window).

Procedure

First, participants filled out the consent form and provided demographic information such as their flight qualification (aircraft type) and their flight experience (total hours of flight experience). Participants were briefed on the study and instructed about the different flight scenarios. Then, they were invited to sit in the flight deck at the captain position (left seat). The eye-tracking system was calibrated using an 11-point calibration. Following the Smart Eye manual recommendation, the 11 points were located in the vicinity of the AOIs. Participants performed a training session consisting of two runs of a landing scenario. All participants (including novices) were able to control the aircraft correctly after these two landings. Then, the participants performed the same landing scenario as in training three times, but with varying levels of complexity.

Data processing

Flight simulator and eye-tracking data were analyzed using MATLAB R2019b with custom homebuilt scripts. The data were recorded from the beginning of the landing scenario to touch-down. Because the landing duration depends on the pilot’s actions, landing durations could differ by a few seconds. As a consequence, the beginning of the scenarios has been cut out to obtain the same duration for each participant, corresponding to 14,000 frames sampled at 60 Hz for the eye-tracking data and 233 frames at 1 Hz for the flight simulator.

Eye tracking data.

Fig 4 shows the entire eye tracking analysis pipeline. Each AOI was coded using numbers from 1 to 10 corresponding to the flight instruments (see Fig 3). Only AOI-based data were extracted in this experiment and concatenated to obtain two chronological vectors containing the indices of the visited AOIs (from 1 to 10) and the time spent on them. Dwells shorter than 200 ms [40] were discarded. Furthermore, consecutive fixations in the same area were merged (e.g., for 1, 1, 4, 4, 5, 5, 5, 6 we only consider 1, 4, 5, 6). The transition vector (the vector containing the transitions between AOI numbers) was used to compute LZC and GTE. Concerning the transition matrices, their high dimensionality makes it difficult to use classical inferential statistics. Therefore, we applied machine learning algorithms to the concatenated transition matrices to compare the two groups of participants (novice vs. pilot). Various types of machine learning models were used (SVM, LDA, K-Nearest Neighbor; for a review see [63]). The algorithm achieving the best accuracy (Cosine KNN) was selected in this paper. The transition probabilities from one AOI to another were taken as features, giving a total of 100 features (i.e., 10 AOIs × 10 AOIs). A principal component analysis (PCA) was used to reduce the number of features. This restricts the model to 35 features corresponding to the main transition probabilities of the matrices. Five-fold cross-validation was used, which is a good trade-off between bias and variance estimation [64]. According to Combrisson and Jerbi [65], the theoretical chance level for classification at p < 0.05 with two classes is around 58%. The K coefficient, the transition entropy, and the Lempel-Ziv complexity were computed following the methods of [48], [54], and [60], respectively. 
Finally, based on the transition vector, the n-grams frequency-based method [62] was used to identify the number of common 3, 4, 5, and 6-gram sequences in each group. After counting the occurrence of given n-grams for each participant, the number of common sequences of each n-gram was calculated for each group (Novice/Pilots).
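
The dwell preprocessing described in this section (discarding dwells shorter than 200 ms and merging consecutive visits to the same AOI) can be sketched in Python as below. The original analysis was run in MATLAB; this is only an illustration, and the function name, the per-frame input format, and the choice to sum durations when two dwells are merged are assumptions.

```python
def frames_to_dwells(frame_aois, sampling_hz=60, min_dwell_ms=200):
    """Turn per-frame AOI labels into (AOI vector, dwell-duration vector in ms)."""
    # 1) Run-length encode the frames into raw dwells: [aoi, number_of_frames].
    runs = []
    for aoi in frame_aois:
        if runs and runs[-1][0] == aoi:
            runs[-1][1] += 1
        else:
            runs.append([aoi, 1])
    # 2) Convert to milliseconds and discard dwells below the threshold.
    kept = [(aoi, n * 1000.0 / sampling_hz) for aoi, n in runs
            if n * 1000.0 / sampling_hz >= min_dwell_ms]
    # 3) Merge consecutive dwells on the same AOI (assumption: durations are summed).
    aois, durations = [], []
    for aoi, duration in kept:
        if aois and aois[-1] == aoi:
            durations[-1] += duration
        else:
            aois.append(aoi)
            durations.append(duration)
    return aois, durations


# Hypothetical recording: 500 ms on AOI 1, a 83 ms glance at AOI 4, then AOI 1 and AOI 5.
frames = [1] * 30 + [4] * 5 + [1] * 20 + [5] * 15
aois, durations = frames_to_dwells(frames)
```

In this toy recording the brief glance at AOI 4 is dropped, the two visits to AOI 1 are merged, and the resulting AOI vector is 1, 5.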

Fig 4. Analysis pipeline for the eye tracking data.

https://doi.org/10.1371/journal.pone.0247061.g004

Flight simulator data.

Flying performance was examined to quantify the ability of the participant to comply with the specific flying instructions given by the experimenter. As presented in Fig 5, Root Mean Square Errors (RMSEs) were calculated for 3 different flight parameters: speed, vertical speed, and heading. In this experiment, the predicted values corresponded to the target values and limits given by the experimenter (i.e., speed of 130 kt; vertical speed between -800 ft/min and +500 ft/min; heading of 143°) and the observed values corresponded to the participants’ actual performance. The deviations were calculated following the formula:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(P_i - O_i\right)^2} \qquad (3)$$

where, for n data points between points k and k + 1, P_i was the predicted value and O_i the observed value.
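
The RMSE computation can be illustrated with the short Python sketch below. For vertical speed, the sketch assumes that the predicted value is the nearest value inside the tolerance band stated in the flight instructions (-800 to +500 ft/min), so that samples inside the band contribute no error; this handling of the band, the function names, and the sample values are assumptions rather than the authors' procedure.

```python
from math import sqrt


def rmse(predicted, observed):
    """Root mean square error between paired predicted and observed samples."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted))


def nearest_in_band(value, low, high):
    """Closest value inside [low, high]; equals `value` when already inside."""
    return min(max(value, low), high)


# Hypothetical 1 Hz samples.
speed_kt = [128.0, 130.0, 134.0]
speed_rmse = rmse([130.0] * len(speed_kt), speed_kt)

vertical_speed = [-900.0, -600.0, 700.0]
vs_target = [nearest_in_band(v, -800.0, 500.0) for v in vertical_speed]
vs_rmse = rmse(vs_target, vertical_speed)
```

With a fixed target such as the 130 kt speed, every sample contributes its distance to the target; with a band, only excursions outside the band are penalized.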

Fig 5. Analysis pipeline for the flight parameters data.

https://doi.org/10.1371/journal.pone.0247061.g005

Statistical analysis

We performed a 2 × 3 repeated measures analysis of variance (ANOVA) for each dependent variable (i.e., dual task omissions, average dwell time, total number of dwells, LZC, transition entropy, K coefficient, RMSE heading, RMSE vertical speed, RMSE speed) to assess the effects of the group (novices, pilots) as the between-subjects factor and of scenario difficulty as the within-subjects factor (three levels: Control scenario, Easy dual task scenario, Difficult dual task scenario). The normality of the distribution of each dependent variable was also checked. We used the Greenhouse-Geisser and Huynh-Feldt adjustments to correct for violations of the sphericity assumption when needed. Bonferroni post-hoc tests were performed for multiple comparisons; only those showing significant differences are reported. The level of significance was set to α = 0.05 and partial η² was used to estimate the effect sizes.

Results

Flight performances

The flight performances are shown in Fig 6.

Fig 6. Flight performances for heading, vertical speed, and speed deviations among novices and pilots groups.

Error bars represent SD and * indicates main effects p < 0.05. (CS = control scenario; EDTS = Easy dual task scenario; HDTS = Hard dual task scenario).

https://doi.org/10.1371/journal.pone.0247061.g006

Heading.

There was no significant main effect of the group, F(1, 30) = 0.03, p = 0.874, nor main effect of the scenario, F(2, 60) = 0.9, p = 0.39, on heading deviations. The scenario × group interaction was not significant, F(2, 60) = 0.4, p = 0.67.

Speed.

A significant main effect of the group on speed deviation was found, F(1, 30) = 4.3, p < 0.05, η² = 0.13, with the novice group (M = 5.46; SD = 1.94) showing higher speed deviation than the pilot group (M = 2.66; SD = 1.97). Analyses also revealed a significant main effect of the scenario, F(2, 60) = 3.6, p < 0.05, η² = 0.11. Bonferroni post-hoc test showed that speed deviation was lower during the control scenario (M = 2.95; SD = 0.93) compared to the easy dual task scenario (M = 3.70; SD = 1.02) and the difficult dual task scenario (M = 5.53; SD = 2.79). The scenario × group interaction was significant, F(2, 60) = 3.3, p < 0.05, η² = 0.09. Bonferroni post-hoc test showed that, in the difficult dual task scenario, speed deviation was lower for the pilot group (M = 2.93; SD = 3.97) than for the novice group (M = 8.13; SD = 4.02).

Vertical speed.

Analyses revealed a significant main effect of the group, F(1, 30) = 11.4, p < 0.05, η² = 0.28, on vertical speed deviation, with the novice group (M = 565; SD = 130) showing higher vertical speed deviation than the pilot group (M = 258; SD = 134). Analyses also revealed a significant main effect of the scenario, F(2, 60) = 5.1, p < 0.01, η² = 0.15. Bonferroni post-hoc test showed that the vertical speed deviation was lower during the control scenario (M = 265; SD = 103) compared to the easy dual task scenario (M = 403; SD = 141) and the difficult dual task scenario (M = 566; SD = 184). The scenario × group interaction was not significant, F(2, 60) = 0.7, p = 0.52, η² = 0.02.

Dual task omissions

Analyses showed (Fig 7) a significant main effect of the group on omissions, F(1, 30) = 35.3, p < 0.05, η² = 0.54. The novice group had a higher number of omissions (M = 2.75; SD = 1) than the pilot group (M = 0.68; SD = 0.5). Analyses also revealed a significant main effect of the scenario, F(1, 30) = 24.8, p < 0.05, η² = 0.45. Bonferroni post-hoc test showed that the difficult dual task scenario (M = 2.37; SD = 0.52) yielded more omissions than the easy dual task scenario (M = 1.06; SD = 0.3). The scenario × group interaction was significant, F(1, 30) = 16.2, p < 0.05, η² = 0.35. Bonferroni post-hoc test showed that novices made more omissions during the difficult dual task scenario (M = 3.95; SD = 2) than during the easy dual task scenario (M = 1.5; SD = 1), whereas the number of omissions did not differ between the two scenarios for pilots.

Fig 7. Omission number for the easy dual task scenario and hard dual task scenario among novices and pilots groups.

Error bars represent SD and * indicates main effects p < 0.05.

https://doi.org/10.1371/journal.pone.0247061.g007

Basic eye metrics

Average dwell times.

Analyses showed (Fig 8) a significant main effect of group, F(1, 30) = 8.1, p < 0.05, η² = 0.22, with shorter average dwell times for the pilot group (M = 1.1; SD = 0.2) compared to the novice group (M = 1.51; SD = 0.21). We also found a significant main effect of the scenario, F(2, 60) = 19.0, p < 0.05, η² = 0.39. Bonferroni post-hoc test showed that the average dwell time was shorter during the easy dual task scenario (M = 1.16; SD = 0.12) and the difficult dual task scenario (M = 1.16; SD = 0.17) than during the control scenario (M = 1.58; SD = 0.22). There was no significant scenario × group interaction, F(2, 60) = 2.3, p = 0.11, η² = 0.07. The time spent gazing outside the defined AOIs was relatively low (M = 4.21% for experts; M = 4.62% for novices); see the supplementary material for detailed information (S1 Fig).

Fig 8. From left to right, the average dwell time and the number of dwells, averaged over all scenarios, among novice and pilot groups.

Error bars represent SD and * indicates main effects p < 0.05.

https://doi.org/10.1371/journal.pone.0247061.g008

Number of dwells.

Analyses showed (Fig 8) a significant main effect of group, F(1, 30) = 13.3, p < 0.05, η² = 0.31, with a higher number of dwells for the pilot group (M = 188; SD = 21) compared to the novice group (M = 137.5; SD = 19.9). Analyses also revealed a significant main effect of the scenario, F(2, 60) = 13.2, p < 0.05, η² = 0.31. Bonferroni post-hoc test showed that the number of dwells was higher during the easy dual task scenario (M = 172; SD = 16) and during the difficult dual task scenario (M = 177; SD = 18) compared to the control scenario (M = 137; SD = 17). There was no significant scenario × group interaction, F(2, 60) = 0.7, p = 0.50, η² = 0.02.

Markov chain and machine learning

The confusion matrix presented in Fig 9 shows that the approach based on Cosine KNN reached a classification accuracy of up to 91.7% when classifying expertise from transition matrices during the baseline scenario. As shown in Fig 10, the differences in transition matrices between novices and pilots lie mainly in a more widely spread distribution of transition probabilities from one instrument to another for the pilot group. Most of the transitions made by the novice group involved AOIs concentrated in the PFD (AOIs 1 to 5, see Fig 4), whereas the pilot group explored other combinations of AOIs.
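The general idea of classifying expertise from transition matrices can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the AOI sequences are invented toy data with four AOIs, the helper names are hypothetical, k = 3 is an arbitrary choice, and no cross-validation is performed.

```python
import math
from collections import Counter


def transition_matrix(aoi_sequence, n_aoi):
    """Row-normalised first-order transition probabilities between AOIs."""
    counts = [[0] * n_aoi for _ in range(n_aoi)]
    for src, dst in zip(aoi_sequence, aoi_sequence[1:]):
        counts[src][dst] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def flatten(matrix):
    """Turn a transition matrix into a single feature vector."""
    return [p for row in matrix for p in row]


def cosine_distance(u, v):
    """1 minus the cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training vectors closest in cosine distance."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: cosine_distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented toy sequences: "novice" gaze mostly alternates between AOIs 0 and 1,
# while "pilot" gaze moves across all four AOIs.
sequences = {
    "novice": [[0, 1, 0, 1, 0, 1, 2, 0, 1], [1, 0, 1, 0, 1, 0, 3, 1, 0]],
    "pilot": [[0, 2, 3, 1, 2, 0, 3, 1, 2], [3, 1, 0, 2, 3, 0, 2, 1, 3]],
}
train_x, train_y = [], []
for label, seqs in sequences.items():
    for seq in seqs:
        train_x.append(flatten(transition_matrix(seq, 4)))
        train_y.append(label)

query = flatten(transition_matrix([0, 1, 0, 1, 0, 1, 0, 2, 1], 4))
predicted = knn_predict(train_x, train_y, query, k=3)
print(predicted)  # novice
```

Cosine distance compares the shape of the transition profiles rather than their magnitude, which is one plausible reason for preferring it over Euclidean distance for probability-like feature vectors.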

Fig 9. Confusion matrix of the fivefold cross-validation using Cosine K-Nearest Neighbors among novice and pilot groups during the baseline scenario.

https://doi.org/10.1371/journal.pone.0247061.g009

Fig 10. Markov chains (Left) and transition matrices (Right) AOI-based representations among novices (top) and pilots groups (bottom) during the baseline scenario.

https://doi.org/10.1371/journal.pone.0247061.g010

Attentional modes and K coefficient

Analyses showed (Fig 11) no significant main effect of the group, F(1, 30) = 3.3, p = 0.07, η² = 0.10, on the K coefficient. However, the main effect of the scenario was significant, F(2, 60) = 38.1, p < 0.01, η² = 0.56. Bonferroni post-hoc test showed that the K coefficient was lower during the easy dual task scenario (M = -0.12; SD = 0.06) and during the difficult dual task scenario (M = -0.01; SD = 0.12) compared to the control scenario (M = 0.28; SD = 0.10). There was also a significant difference between the easy dual task scenario (M = -0.12; SD = 0.06) and the difficult dual task scenario (M = 0; SD = 0.12). The scenario × group interaction was significant, F(2, 60) = 4.8, p = 0.01, η² = 0.15. Bonferroni post-hoc test showed that, in the control scenario, the K coefficient was lower for the pilot group (M = 0.14; SD = 0.16) than for the novice group (M = 0.41; SD = 0.16). Likewise, in the difficult dual task scenario, the K coefficient was lower for the pilot group (M = -0.10; SD = 0.17) than for the novice group (M = 0.09; SD = 0.16).
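The sign of the K coefficient can be illustrated with a minimal Python sketch. It assumes the commonly used definition of coefficient K: the mean difference between the standardized duration of each fixation and the standardized amplitude of the saccade that follows it, with standardization performed over a pooled reference sample. The function name, the pooling choice, the use of the sample standard deviation, and the toy values are all assumptions made for illustration and do not reproduce the study's data or processing.

```python
import statistics


def k_coefficient(durations, amplitudes, ref_durations, ref_amplitudes):
    """Mean of z(fixation duration) - z(following saccade amplitude).

    z-scores are computed against the pooled reference samples, so a subset
    with long fixations and short saccades yields K > 0 (focal), and a subset
    with short fixations and long saccades yields K < 0 (ambient).
    """
    mu_d = statistics.mean(ref_durations)
    sd_d = statistics.stdev(ref_durations)
    mu_a = statistics.mean(ref_amplitudes)
    sd_a = statistics.stdev(ref_amplitudes)
    diffs = [(d - mu_d) / sd_d - (a - mu_a) / sd_a
             for d, a in zip(durations, amplitudes)]
    return statistics.mean(diffs)


# Invented toy data: fixation durations (ms) paired with the amplitude (deg)
# of the saccade that follows each fixation.
focal_d, focal_a = [400.0, 450.0], [1.0, 1.5]        # long fixations, short saccades
ambient_d, ambient_a = [120.0, 100.0], [9.0, 10.0]   # short fixations, long saccades

pooled_d = focal_d + ambient_d
pooled_a = focal_a + ambient_a

k_focal = k_coefficient(focal_d, focal_a, pooled_d, pooled_a)
k_ambient = k_coefficient(ambient_d, ambient_a, pooled_d, pooled_a)
print(k_focal > 0, k_ambient < 0)  # True True
```

Because the z-scores are taken over the pooled sample, K averaged over the whole pool is zero by construction; only the values for subsets (for example, one group in one scenario) are informative.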

Fig 11. Ambient focal K coefficient during the control scenario, the easy dual task scenario, and hard dual task scenario among novices and pilots groups.

K > 0 indicates focal visual attention, whereas K < 0 indicates ambient visual attention. Error bars represent SD and * indicates main effects p < 0.05.

https://doi.org/10.1371/journal.pone.0247061.g011