Using Highlighting to Train Attentional Expertise

Brett Roads; Michael C. Mozer; Thomas A. Busey

doi:10.1371/journal.pone.0146266

Abstract

Acquiring expertise in complex visual tasks is time consuming. To facilitate the efficient training of novices on where to look in these tasks, we propose an attentional highlighting paradigm. Highlighting involves dynamically modulating the saliency of a visual image to guide attention along the fixation path of a domain expert who had previously viewed the same image. In Experiment 1, we trained naive subjects via attentional highlighting on a fingerprint-matching task. Before and after training, we asked subjects to freely inspect images containing pairs of prints and determine whether the prints matched. Fixation sequences were automatically scored for the degree of expertise exhibited using a Bayesian discriminative model of novice and expert gaze behavior. Highlighted training causes gaze behavior to become more expert-like not only on the trained images but also on transfer images, indicating generalization of learning. In Experiment 2, to control for the possibility that the increase in expertise is due to mere exposure, we trained subjects via highlighting of fixation sequences from novices, not experts, and observed no transition toward expertise. In Experiment 3, to determine the specificity of the training effect, we trained subjects with expert fixation sequences from images other than the one being viewed, which preserves coarse-scale statistics of expert gaze but provides no information about fine-grain features. Observing at least a partial transition toward expertise, we obtain only weak evidence that the highlighting procedure facilitates the learning of critical local features. We discuss possible improvements to the highlighting procedure.

Citation: Roads B, Mozer MC, Busey TA (2016) Using Highlighting to Train Attentional Expertise. PLoS ONE 11(1): e0146266. https://doi.org/10.1371/journal.pone.0146266

Editor: Rouwen Canal-Bruland, VU University Amsterdam, NETHERLANDS

Received: September 13, 2015; Accepted: December 15, 2015; Published: January 8, 2016

Copyright: © 2016 Roads et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: Files containing raw data from Experiments 1–3 are available from the figshare database: http://dx.doi.org/10.6084/m9.figshare.2009589.

Funding: This work was supported by the National Science Foundation, Directorate of Social, Behavioral and Economic Sciences (SBE-0542013 to MCM); National Science Foundation, Office of Multidisciplinary Activities (SMA-1041755 to MCM); National Science Foundation, Division of Social and Economic Sciences (SES-1461535 to MCM); and the National Institute of Justice (Grant #2009-DN-BX-K226 to TAB). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Individuals spend a majority of their waking hours performing complex visual tasks and many occupations specifically require operating in challenging visual environments, e.g., monitoring multiple stock-exchange status displays, controlling air traffic, screening baggage, examining fingerprints, inspecting medical images, driving trucks, and performing surgery. Yet, acquiring visual expertise in any task domain is challenging and time-consuming.

Visual expertise might be decomposed into two interacting abilities: attentional expertise—knowing where to attend in complex, cluttered scenes—and procedural expertise—knowing what to do with the information gathered at the focus of attention. Acquiring these two abilities poses a circular challenge: individuals cannot learn what features and locations in the environment are task-relevant until they understand how the information should be integrated and processed, but individuals cannot learn how to process information until they identify relevant feature locations.

Studying the acquisition of visual expertise is quite challenging due to the interaction between the attentional and procedural skills required. Can one type of skill be studied in isolation? It does not make sense to study procedural expertise in the absence of attentional expertise, because attentional expertise provides the visual representations on which classification and judgment procedures operate. However, the reverse is not true: in principle, attentional expertise can be acquired in the absence of procedural expertise, as in, for example, a situation where one is instructed to classify based on color but not told the classification rule. Once novices have learned where and to what to attend, expert-like attention logically supports the acquisition of procedures an expert must perform.

Indeed, past work provides encouraging indications that guiding attention can guide higher-order cognitive processing [1,2,3,4,5,6,7,8]. For example, Grant and Spivey [2] noted that certain fixation patterns predict success on the tumor-and-lasers radiation problem, and simply cueing the critical locations increased the probability of success. Although subjects had no top-down guidance or goals that steered attention to a critical location, the mere act of attending to the location was sufficient to increase the likelihood of the relevant insight.

Our long-term goal is to develop procedures that improve training of visual expertise. Just as visual expertise might be decomposed into attentional and procedural skills, training of visual expertise might be decomposed similarly. Given the dependencies between attentional and procedural expertise we just discussed, it seems like a sensible first step to focus on the challenge of training attentional expertise. Thus, we wish to develop an efficient and relatively effortless means by which novices learn to deploy spatiotemporal attention in a task-appropriate manner. We explore a training paradigm that leverages expert knowledge in a perceptual learning paradigm that involves the following steps: (1) recording gaze dynamics of experts as they perform a particular task; (2) building a model that predicts locations experts are likely to inspect in specific images and task contexts; and (3) placing a novice in the visual environment and having them perform the task while highlighting predicted locations of interest via saliency manipulations.

This paradigm, which we refer to as attentional highlighting, addresses two challenges that arise when using experts to assist in the training of novices. First, expert knowledge is often procedural and implicit, and experts are limited in their ability to articulate their strategies [9,10]. For example, when experts fail to report abnormalities in medical images, fixation statistics still discriminate between missed abnormalities and abnormality-free areas (see [11] for a review). This finding suggests that expert gaze behavior reflects additional implicit knowledge that is not readily verbalizable. Attentional highlighting avoids the issue of knowledge accessibility by analyzing where experts are looking instead of asking experts to verbally report their strategies; expert fixations indicate the locations and features that are important for accomplishing the task. Second, verbal instruction may be unhelpful or even harmful to a trainee because it interferes with the deployment of attention and the natural pace of perceptuomotor behavior while performing a task. With attentional highlighting, verbal instructions are replaced by saliency enhancements, which leverage individuals’ rapid and automatic machinery for directing gaze.

This paper explores whether the acquisition of attentional expertise can be facilitated via highlighting—guiding attention by dynamically modulating saliency during training. Because individuals are capable of learning statistical correlations in visual information for statistically structured sequences of objects [12,13], task-irrelevant perceptual information [14], and visuospatial context [15], attending to expert fixation locations may be sufficient to train novices to deploy attention in a task-relevant manner.

Highlighting has been used to boost performance in problem solving and memory tasks. For example, in a collaborative puzzle-solving task involving a novice and an expert, Velichkovsky [7] observed improved performance when either the novice is cued to the expert partner’s fixation sequence, or vice versa. However, highlighting in conjunction with a verbal description of a solution procedure can impair a novice’s performance [16]. In the domain of recognition memory, yoking fixations at encoding and test improves performance, although replaying other-observer fixations at test is as effective as replaying same-observer fixations, and scanpath order does not matter [17]. In perception, Litchfield et al. [18] showed that novice radiologists benefit from viewing another individual’s scanpath.

Recently, attentional guidance has been used as a training method. Nalanagula, Greenstein, and Gramopadhye [19] trained novices to detect defects in circuit boards. Training included viewing three displays in which dynamic highlighting of location sequences was provided. Instead of using actual expert fixation sequences, the sequences used were based on the expert’s verbal expression of their search strategy given a trace of the raw saccade data. Dynamic highlighting of sequences during training led to better performance on the detection task over a control condition in which subjects viewed images without gaze cues. Unfortunately, the two conditions were not strictly matched for viewing time or controlled to ensure equal attention to the training images. Vine et al. [8] also found that guiding attention can expedite the learning of laparoscopic skills needed by surgeons. The task involved remotely manipulating balls into cups. Highlighting occurred via a mask overlaid on the field of view that occluded most of the scene. Under manual control, the experimenter serially unmasked single locations as the objects at those locations became task-relevant. Because highlighted locations corresponded to task subgoals, highlighting served to sequence actions of the student in a task involving extended action sequences. Causer et al. [20] found that reviewing gaze behavior with subjects improved performance outcomes. As part of the training, subjects viewed videos of their gaze pattern and that of the top nationally ranked shotgun shooter. During the video sessions, researchers highlighted similarities and differences between the subject’s gaze behavior and the expert model. The eye gaze behavior of the training group became significantly more expert-like and shooting accuracy improved. Tomlinson, Howe, and Love [6] studied a video game in which players could select one of eight different status information formats for an on-screen display. Using a model of expert selection as a training companion, novices provided with contextually relevant information converged more rapidly on expert-like behavior.

Although these experiments establish that cueing novices to locations can speed training, a causal link between expert eye movements and performance is still uncertain. Are the eye movements exhibited by experts functional, i.e., do they contribute to the expert’s performance, or are they merely a byproduct of cognitive operations? To the extent that high-resolution vision is needed for performing a complex information processing task, clearly critical visual information must be foveated. For example, when radiologists are presented with x-ray images for a duration brief enough to prevent saccades, increasing the distance from the fixation point in the image to the tumor results in a monotonic drop in detection accuracy [21]. Although all individuals can attend to a location other than the locus of fixation [22], moving the eyes is more efficient than covertly shifting attention when performing complex visual tasks [23,24]. Beyond these arguments that fixation is efficient or necessary to obtain expert-level performance, a study by Thomas and Lleras [5] provides a causal link between fixation and performance. While subjects tried to solve the tumor-and-lasers problem, they were cued in a particular fixation sequence using an irrelevant detection task that was superimposed on the standard tumor diagram. Subjects were unaware that they were being cued in a particular manner, but when cued in a solution-specific sequence (multiple saccades across different parts of the tumor boundary), subjects were more successful in solving the problem.

Based on the above results, we find it difficult to conceive that the fixations of experts are not a key contributing factor to their performance. Nonetheless, expertise has benefits that go beyond where the expert is fixating. When stimulus presentations are sufficiently brief that experts do not have time to make saccades, expert performance is often above chance, indicating that experts can utilize parafoveal and peripheral vision to attend to local features in a larger spatial context. Expert use of parafoveal and peripheral vision is directly implicated in domains such as chess, where experts do not always foveate on individual chess pieces, but instead foveate on empty squares that lie at the centroid of an arrangement of pieces [25].

As a domain, we focus on the forensic task of comparing a pair of fingerprints. We begin by describing the fingerprint-matching domain, characterizing the fixation sequences of novices and experts, and constructing a model that reliably discriminates saccade sequences of novices and experts. With this model, we can conduct experiments and evaluate the degree to which training via attentional highlighting is effective in transitioning novices toward expert-like attentional control.

Characterization of Fingerprint Examiners

Fingerprint analysis is part of the broader field of forensic expertise, which involves the examination of partial or distorted trace evidence left at a crime scene. Fig 1 illustrates a typical fingerprint pair used in casework and adopts the common practice of placing the latent print on the left and the inked print on the right. Latent prints are obtained from crime scenes and are often distorted, partial, and overlaid on surplus visual information. However, inked prints are made under carefully controlled conditions to get as clean and consistent an impression as possible. Automated classification techniques have been developed to reliably match pairs of inked prints, but due to the variability and degradation of latent prints, matches involving a latent print and an inked print require expert human judgments.

Fig 1. An example of realistic fingerprint casework taken from the National Institute of Standards and Technology Special Database 27.

This example demonstrates the noisy and partial nature of a latent print (left) compared to the matched inked print (right).

https://doi.org/10.1371/journal.pone.0146266.g001

Fingerprint examination expertise is acquired via a time-consuming training period; it can take 1–2 years until a trainee is allowed to carry out unsupervised casework [26]. However, following this training period, fingerprint examiners are exceedingly accurate compared to novices [27,28]. Due to the substantial training required, even modest gains in training efficiency would be beneficial. Forensic science is well-poised to benefit from novel training techniques that exploit research on perceptual expertise and cognitive science in general [29,30,31].

Although the expert examiner’s task will always involve latent prints, in the present work we focus on the task of comparing pairs of inked prints, for two reasons. First, initial training on noise-free and complete examples can benefit the learner [32]. Second, expert scanpaths are more consistent than novice scanpaths on inked prints [33]. In contrast, expert scanpaths are actually less consistent than novice scanpaths on latent prints, possibly because the partial and noisy nature of latent prints elicits more idiosyncratic strategies [33]. Consequently, it is easier to evaluate the expertise of an examiner using pairs of inked prints.

There is broad consensus among fingerprint experts that a fingerprint examination occurs at multiple levels of analysis [34]. At a coarse level of analysis, experts examine the overall ridge flow of the fingerprint. Two diagnostic ridge flow patterns are the core and the delta (Fig 2). While it is possible to reject a match at this coarse level, further analysis is necessary to confirm a match. At a fine level of analysis, experts zoom in to local discriminative features called minutiae; the two fundamental minutiae are ridge endings and bifurcations (Fig 2). Experts report that they typically rely on the intermediate level of analysis for making a judgment between two fingerprints [34].

Fig 2. Close up of an inked print from the dataset used by Busey et al. [35].

Shown are the locations of the core and delta used in level one analysis and examples of the two fundamental minutiae—ridge endings and bifurcations—that are used in level-two analysis.

https://doi.org/10.1371/journal.pone.0146266.g002

Eye Movement Data from Novice and Expert Fingerprint Examiners

To explicate differences between expert and novice fingerprint analysts we reanalyzed a dataset collected by Busey et al. [35]. The dataset consists of 26 matching inked fingerprint pairs and two mismatching pairs. (Matching pairs are a preferred and more informative source of data because ‘same’ judgments are typically slower and more deliberate than ‘different’ judgments.) Busey et al. [35] collected fixation data from 12 experts and 12 novices performing the matching task. In this study, experts were recruited at forensic identification conferences and laboratories, and novices were recruited from the Bloomington, Indiana community. Experts in the dataset had a reported average of 7.6 years of unsupervised latent print work (minimum 3 years, maximum 13 years). On each trial, subjects viewed a pair of fingerprints side by side, which we’ll refer to as the left and the right prints. Each trial was divided into three parts: for 5 s, the left print was presented alone, then for 5 s, the right print was presented alone, and finally for 10 s, both prints were presented simultaneously. After each trial, subjects were asked to indicate whether they believed the two prints came from the same source, different sources, or ‘unable to tell’.

Summary statistics of the dataset reveal no gross differences between expert and novice fingerprint examiners [35]. The total number of fixations (19.0 for experts vs. 18.1 for novices) and their mean durations (176 ms for experts vs. 174 ms for novices) are similar. Some differences can be found: experts have smaller saccade amplitudes on both the left and right print, and, relative to novices, experts make a larger share of their saccades within prints rather than across prints. The fact that experts make more saccades within prints is consistent with the idea that experts are more efficient at encoding and searching for new information, similar to the behavior exhibited by expert chess players [25] and expert medical diagnosticians [11]. Experts also have slightly more fixations on the left print than the right, which is consistent with the standard procedure for latent print examinations. In contrast to summary statistics, saccade targets and sequences indicate stark differences between novices and experts, as illustrated by Fig 3. We therefore focus on using stimulus-specific fixation patterns to discriminate novice and expert viewing behavior.

Fig 3. A prototypical scanpath of a novice (a) and an expert (b).

The scanpath begins at the black dot on the end of the dark blue segment. The first fixation is located at the black dot on the right and the color scheme shows the expert's fixation sequence as it progresses through time, going from dark blue to yellow to dark red. This expert demonstrates a typical scanpath that concentrates on the core and delta regions of the fingerprint, diagnostic of level one analysis. In contrast, the novice exhibits less directed behavior.

https://doi.org/10.1371/journal.pone.0146266.g003

Discriminative Modeling of the Degree of Expertise

In the experiments we describe in this article, subjects are shown fingerprint images and are instructed to inspect the images in order to determine whether or not the pair of fingerprints matches. The dependent measure in the experiments is the degree of attentional expertise exhibited by a subject before and after training. We use a subject’s fixation sequence to compute an attentional expertise score. The S1 Appendix presents the probabilistic model underlying this score.

Briefly, the model assigns a likelihood to a fixation sequence F given image I for a population s (novices or experts), denoted p(F | s, I). If novices and experts systematically differ, a model of experts will assign a higher likelihood to expert sequences than to novice sequences, and a model of novices will do the opposite. By contrasting the predictions of the models, one can thus discriminate novice and expert attentional behavior. Specifically, Bayes’ rule can be used to classify a fixation sequence as having been produced by a novice or by an expert:

$$p(s = \text{expert} \mid F, I) = \frac{p(F \mid s = \text{expert}, I)\, p(s = \text{expert})}{\sum_{s' \in \{\text{novice},\, \text{expert}\}} p(F \mid s', I)\, p(s')} \qquad (1)$$

where p(s) denotes the prior probability of population s.

We use a measure related to Eq 1, the log likelihood ratio (LLR), to characterize the degree of attentional expertise reflected in a subject’s fixation sequence.
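To make the relationship between the likelihoods, Bayes’ rule, and the LLR concrete, the following is a minimal sketch rather than the scoring code used in the study. It assumes that the LLR is the difference between the expert-model and novice-model log likelihoods of a sequence and that the two populations have equal prior probability unless stated otherwise; the exact definition is given in the S1 Appendix. The function names and the example log-likelihood values are hypothetical.

```python
import math


def log_likelihood_ratio(loglik_expert: float, loglik_novice: float) -> float:
    """Assumed form of the LLR: log p(F | expert, I) - log p(F | novice, I).

    Positive values mean the expert model explains the sequence better.
    """
    return loglik_expert - loglik_novice


def posterior_expert(loglik_expert: float, loglik_novice: float,
                     prior_expert: float = 0.5) -> float:
    """Two-class Bayes' rule, evaluated from log likelihoods for stability."""
    log_prior_odds = math.log(prior_expert) - math.log(1.0 - prior_expert)
    log_odds = log_likelihood_ratio(loglik_expert, loglik_novice) + log_prior_odds
    # Logistic function of the log posterior odds, written to avoid overflow.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1.0 + z)


# Hypothetical log likelihoods of one fixation sequence under each model.
llr = log_likelihood_ratio(-95.0, -100.0)
p_expert = posterior_expert(-95.0, -100.0)
```

With equal priors the posterior is a monotonic function of the LLR, so ranking sequences by LLR and ranking them by posterior probability of expertise give the same order.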

The novice and expert likelihood models used in Eq 1, p(F | s = novice, I) and p(F | s = expert, I), describe the generative process underlying fixation sequences. However, the models should not be thought of as making theoretical claims about underlying cognitive processes. Instead, the models are simply being used as a black-box tool for discriminating novice and expert fixation sequences. In the S1 Appendix, we describe a family of models and, via previously collected data [35], identify the specific model that achieves the best inter-group discrimination.

Assuming models exist that can be used to assess the attentional expertise exhibited by subjects, we now address the main goal of our work: to develop and evaluate training procedures that yield a rapid and relatively effortless shift toward expert-like behavior by highlighting locations where an expert is likely to attend. We describe three experiments that explore attentional highlighting to train novice fingerprint examiners.

Experiment 1: Expert Highlighting

In Experiment 1, naive subjects were trained on fingerprint pairs to replicate the fixation sequence of one randomly selected expert from Busey et al.’s [35] dataset. (We chose to train each subject on a single expert to allow for the possibility of expert-specific idiosyncrasies [36].) Each training trial began with presentation of a fingerprint pair, followed by a cue to the expert’s first fixation location within the image. The cue consisted of a blinking red spot in the image background (Fig 4). When the subject made a saccade to the cue, as registered by an eye tracker, the cue shifted to the expert’s next fixation, and so forth. From the subject’s perspective, the training task involved following a sequence of red spots, sometimes jumping within a fingerprint and sometimes jumping to the other fingerprint. The training procedure preserves the order of fixations because there is potential information in the sequence itself [37], which may support learning. Experiments 2 and 3 involved a similar training procedure, but in Experiment 2, subjects were trained to follow the fixation sequence of a naive viewer, and in Experiment 3, subjects were trained to follow the fixation sequence that an expert made to a different stimulus rather than the current one. The rationale for these experiments will be explained shortly.

Fig 4. An example of the attentional highlighting used in Experiments 1–3.

Each fixation location in a selected sequence shows attentional highlighting using a flashing red Gaussian intensity bump.

https://doi.org/10.1371/journal.pone.0146266.g004

All three experiments included pre-training and post-training phases during which subjects performed free viewing of images. A comparison of fixation patterns pre- and post-training was used to assess the influence of training. During free viewing, the subjects’ task was to perform as a fingerprint analyst and examine the pair of prints in order to determine similarities and differences between them.

Participants

Twelve subjects with normal or corrected-to-normal vision were drawn from a paid subject pool at the University of Colorado, Boulder. Subjects were given $10 in compensation for their time. One subject was dropped because the eye tracker failed to reliably detect gaze shifts. The dropped subject was rerun.

Apparatus and Materials

The experiment utilized a Tobii T60 XL eye tracker (24-inch widescreen monitor, 1920 x 1200 pixels), an open source Talk2Tobii extension [38], the Psychophysics Toolbox extension [39], and MATLAB, all running on an iMac computer. The 28 stimuli (each 1680 x 1050 pixels) consisted of two fingerprints side by side in 8-bit (256-level) gray scale. (Two stimuli were discarded because some training data were corrupted on those two stimuli.) Stimuli were displayed pixel perfect on the eye tracker screen and subtended approximately 39.2 by 25.1 degrees of visual angle. The remainder of the screen surrounding the stimulus was displayed in black. The two fingerprints in each stimulus cannot be distinguished easily. Among the 28 possible stimuli, only two stimuli contained mismatching fingerprint pairs and not every subject was assigned a mismatching case.
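The reported stimulus size in degrees can be checked from the display geometry. The sketch below is illustrative only. It assumes square pixels, that the nominal 24-inch diagonal is the viewable diagonal, and the viewing distance of approximately 25 inches stated in the Procedure; the function and constant names are hypothetical.

```python
import math


def visual_angle_deg(size_px: float, px_per_inch: float, distance_in: float) -> float:
    """Angle subtended by an extent centered on the line of sight."""
    size_in = size_px / px_per_inch
    return math.degrees(2.0 * math.atan(size_in / (2.0 * distance_in)))


DIAGONAL_IN = 24.0           # nominal monitor diagonal (assumed viewable)
RES_X, RES_Y = 1920, 1200    # native resolution
VIEWING_DISTANCE_IN = 25.0   # approximate eye-to-screen distance

# Pixel density from the diagonal, assuming square pixels.
PX_PER_INCH = math.hypot(RES_X, RES_Y) / DIAGONAL_IN

width_deg = visual_angle_deg(1680, PX_PER_INCH, VIEWING_DISTANCE_IN)
height_deg = visual_angle_deg(1050, PX_PER_INCH, VIEWING_DISTANCE_IN)
```

Under these assumptions the computed width and height come out close to the reported 39.2 by 25.1 degrees.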

Procedure

The protocol for Experiments 1–3 was approved by the Institutional Review Board of the University of Colorado Boulder (Protocol 12–0661, "Visual Task Training Using Attentional Highlighting"). Written informed consent was obtained from all participants.

Prior to the beginning of the experiment, subjects were calibrated to maximize performance of the eye tracker. Subjects were seated so that their eyes were approximately 25 inches (63.5 cm) from the eye tracker screen, as recommended by Tobii documentation, and the eye tracker’s height was adjusted so that the subject’s eyes were approximately level with the center of the eye tracker screen. Prior to any experiment-specific instructions, all subjects were verbally informed that blinking during the experiment was acceptable and that slight movement was permitted. The eye tracker was calibrated for each subject using a script that was slightly modified from a publicly available script that accompanied the Talk2Tobii software. Halfway through the experiment the eye tracker was calibrated again to account for drift.

At the initiation of the experiment, an automated program randomly assigned each subject 16 stimuli from 28 possible stimuli in the training dataset. These 16 stimuli made up the subject’s stimulus set. Eight of the 16 stimuli in the stimulus set were randomly assigned to the training set and the remaining eight were assigned to the transfer set. In addition to being randomly assigned stimuli, each of the 12 subjects was randomly paired with one of the 12 experts in the training dataset, in one-to-one correspondence. All attentional highlighting provided to a subject during the training phase utilized fixations from their matched expert and a given stimulus always showed the same fixation sequence.
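The assignment procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the program used in the study: stimuli, subjects, and experts are represented by hypothetical integer indices, and the fixed seed is there only to make the example reproducible.

```python
import random


def assign_stimuli(rng: random.Random, n_stimuli: int = 28,
                   set_size: int = 16, n_train: int = 8):
    """Draw a subject's stimulus set and split it into training and transfer sets."""
    # sample() returns the chosen items in random order, so slicing the
    # result yields a random split of the stimulus set.
    chosen = rng.sample(range(n_stimuli), set_size)
    return chosen[:n_train], chosen[n_train:]


def pair_subjects_with_experts(rng: random.Random, n: int = 12) -> dict:
    """Random one-to-one pairing: subject index -> expert index."""
    experts = list(range(n))
    rng.shuffle(experts)
    return dict(enumerate(experts))


rng = random.Random(0)  # hypothetical seed
train_set, transfer_set = assign_stimuli(rng)
pairing = pair_subjects_with_experts(rng)
```

Shuffling the list of experts and pairing it with the subject indices guarantees the one-to-one correspondence, which independent random draws per subject would not.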

Following calibration, subjects received on-screen instructions which asked them to imagine that it was their first day on the job as a fingerprint analyst and they should look for similarities and differences between pairs of fingerprints displayed on the screen. After subjects indicated they were ready to continue, they were immediately presented with the on-screen instructions for the pre-training phase, informing them that they would begin by examining fingerprints without assistance. (The overall trial sequence is shown in Fig 5.)

Fig 5. The four phases of Experiments 1–3.

During the pre-test and post-test phases, subjects view images for 10 s and their fixations are recorded as they examine the prints. During the training phase, attentional highlighting guides fixations to locations of interest (red dot). Once subjects saccade to the location of the cue, the next fixation in the sequence is cued. Each trial begins with a fixation cue for 1 s preceding the onset of the fingerprint image. The fingerprint pairs shaded in purple represent the trained stimulus set, while the cyan fingerprint pairs represent the transfer stimulus set.

https://doi.org/10.1371/journal.pone.0146266.g005

The pre-training phase consisted of 16 trials, each involving free viewing of a stimulus for 10 s. On each trial, one of the 16 stimuli in the subject’s stimulus set was displayed, in randomized order. During the 10 s viewing period, the subject’s gaze was recorded. Prior to each trial, a white fixation cross was presented on a black background for 1 s. This pre-training phase was designed to mimic the simultaneous exposure period in Busey et al. [35] where two inked fingerprints were shown side by side. In the pre-training phase and the remainder of the experiment, subject responses were not collected because the focus of the study is on eye gaze behavior rather than on decision-making. Even without overt judgments, we were confident that the prints were carefully examined by virtue of the number of saccades produced on each trial, and the observation that the vast majority of saccades landed on one of the two prints.

Following the pre-training phase, subjects were trained using attentional highlighting over 64 trials, divided into eight blocks each consisting of the eight stimuli in the training set, randomized within a block. After blocks 2, 4, and 6, subjects were informed of their progress, reminded of their task and given the opportunity to take a quick break. Prior to the training phase, subjects were presented with on-screen instructions explaining that they would be examining pairs of fingerprints aided by attentional highlighting. They were told that they should look at the locations of a series of cues and that the trial would end when all of the cues in the sequence had been visited. They were not given any information about the source of the cues (i.e., that the locations were provided by an expert).
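The block structure of the training phase can be written down as a short schedule generator. This is a sketch under assumptions: the tuple layout and function name are hypothetical, and the stimulus identifiers are placeholders.

```python
import random


def training_schedule(train_set, n_blocks: int = 8, break_after=(2, 4, 6), seed=None):
    """Build the ordered list of training events for one subject.

    Each block presents every training stimulus once, in a fresh random order.
    """
    rng = random.Random(seed)
    schedule = []
    for block in range(1, n_blocks + 1):
        order = list(train_set)
        rng.shuffle(order)  # randomize within the block
        schedule.extend(("trial", block, stim) for stim in order)
        if block in break_after:
            schedule.append(("break", block, None))  # progress report and rest
    return schedule


schedule = training_schedule(train_set=list(range(8)), seed=1)
n_trials = sum(1 for event in schedule if event[0] == "trial")
n_breaks = sum(1 for event in schedule if event[0] == "break")
```

Eight blocks of eight stimuli give the 64 training trials, with three scheduled breaks.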

Each trial began with presentation of a white fixation cross on a black background. After 1 s, the cross was replaced by a pair of fingerprints centered on the cross location. Gaze-contingent attentional highlighting sequentially cued subjects to the locations of their assigned fixation sequence. The particular fixation sequence that each subject saw was determined by the expert they were assigned at the beginning of the experiment. For a given subject, all training trials used fixation sequences from the same expert viewing both images simultaneously. The onset of the first highlight (i.e., the first fixation in the expert sequence) coincided with the stimulus onset. Once the subject made a correct saccade to the cued location, the highlight was immediately removed and after 180 ms, a highlight appeared at the next fixation in the sequence. (If the subject did not saccade to the cue after 3 s, the next cue was presented. Although this event was rare, a cue timeout guaranteed the experiment would not extend beyond the allotted time. A saccade to a cue might not be registered because the subject could not locate it or the eye tracker could not get an appropriate reading.) The highlighted training proceeded until all fixations in the sequence had been visited in the correct order. The trial ended when the last fixation in the sequence was visited. Each trial lasted roughly 30 sec. By following the cues, the subjects traced out their assigned expert’s scanpath.
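The gaze-contingent logic of a training trial can be illustrated with a small offline simulation over a stream of gaze samples. This is a sketch and not the MATLAB code used in the experiment. The 3 s timeout, the 180 ms delay, and the 60 Hz sampling rate come from the text; the hit radius is a hypothetical value, because the criterion for registering a saccade to the cue is not specified, and the sketch assumes that the next cue appears immediately after a timeout.

```python
import math

SAMPLE_RATE_HZ = 60      # eye tracker sampling rate
TIMEOUT_S = 3.0          # advance to the next cue after this long without a hit
GAP_S = 0.180            # delay between a hit and the onset of the next cue
HIT_RADIUS_PX = 60.0     # hypothetical acceptance radius around the cue


def run_cue_sequence(cues, gaze_samples):
    """Return 'hit' or 'timeout' for each cue reached within the sample stream.

    cues: list of (x, y) cue centers in pixels, in presentation order.
    gaze_samples: list of (x, y) gaze points, or None for invalid samples.
    """
    outcomes = []
    dt = 1.0 / SAMPLE_RATE_HZ
    cue_idx = 0
    cue_onset_t = 0.0  # first cue appears at stimulus onset
    for i, sample in enumerate(gaze_samples):
        if cue_idx >= len(cues):
            break
        t = i * dt
        if t < cue_onset_t:
            continue  # no cue is on screen during the post-hit delay
        cx, cy = cues[cue_idx]
        hit = (sample is not None
               and math.hypot(sample[0] - cx, sample[1] - cy) <= HIT_RADIUS_PX)
        if hit:
            outcomes.append("hit")
            cue_idx += 1
            cue_onset_t = t + GAP_S
        elif t - cue_onset_t >= TIMEOUT_S:
            outcomes.append("timeout")
            cue_idx += 1
            cue_onset_t = t  # assumed: next cue appears immediately
    return outcomes


# The simulated subject finds the first cue but never looks at the second.
demo = run_cue_sequence([(100, 100), (500, 500)],
                        [(100, 100)] * 30 + [(0, 0)] * 200)
```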

Each cue had a Gaussian-distributed intensity profile and filled only the white background of the image, leaving the black fingerprint ridges unobstructed. The center of the cue was specified by the expert’s fixation and was fully saturated red. The cue had a standard deviation of 0.25° of visual angle (10 pixels) and flashed once per second, with an onset-to-offset interval of 300 ms, to facilitate its detection. Onsets and offsets are strong attractors of attention regardless of the task [40,41].
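A per-pixel sketch of such a cue is given below. Only the 10-pixel standard deviation, the red peak, the restriction to background pixels, and the 1 s / 300 ms flash timing come from the description above; the blending rule, the `ridge_threshold` value, and all function names are assumptions made for illustration.

```python
import math

def cue_weight(x, y, cx, cy, sigma_px=10.0):
    """Gaussian weight in (0, 1], equal to 1 at the cue center (cx, cy)."""
    d2 = (x - cx) ** 2 + (y - cy) ** 2
    return math.exp(-d2 / (2.0 * sigma_px ** 2))

def highlight_pixel(gray, weight, ridge_threshold=128):
    """Blend one grayscale pixel (0-255) toward red; returns (r, g, b).

    Dark ridge pixels are returned unchanged so the cue never
    obscures ridge detail (threshold value is an assumption).
    """
    if gray < ridge_threshold:
        return (gray, gray, gray)
    reduced = round(gray * (1.0 - weight))  # remove green and blue
    return (gray, reduced, reduced)

def cue_visible(t_since_onset_ms, period_ms=1000, on_ms=300):
    """True while the flashing cue is in the 'on' part of its cycle."""
    return (t_since_onset_ms % period_ms) < on_ms
```

With these choices, a white pixel at the cue center becomes `(255, 0, 0)`, and the tint fades smoothly to the unmodified background a few standard deviations away.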

Following the training phase, subjects were presented with on-screen instructions informing them that they would be comparing fingerprints without assistance (as they had done during the pre-training phase). The post-training phase consisted of a recorded 10 s free gaze period identical to the pre-training period. The stimulus order in the post-training was independently randomized from the pre-training stimulus order. The entire experiment took about 45 minutes.

In each phase of the experiment, each subject’s gaze was recorded by the Tobii T60 XL at a rate of 60 Hz. Along with the eye coordinates at the time of sampling, the Tobii records a code for each eye indicating whether or not the eye was detected. The data were cleaned by removing invalid gaze points. Although the Tobii is able to repair samples in which only one eye is invalid, samples in which both eyes are invalid must be discarded. Any trial with more than 50% invalid gaze samples was discarded, since this suggests a systematic problem with the trial. While this rejection criterion is lenient, in practice nearly all subjects in the following experiments had greater than 90% valid gaze samples. Gaze collection issues tended to be concentrated in specific individuals, resulting in a small number of subjects being dropped from further analysis. Following the removal of invalid gaze points, the location of each remaining gaze point was checked to ensure that it fell on the stimulus. Since the stimuli do not fill the entire screen of the eye tracker, it is possible for subjects to look off the stimulus. Any samples off the stimulus were removed, and trials with more than 50% of gaze points off the stimulus were discarded because this suggests systematic inattention to the task.
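The two-stage cleaning procedure can be sketched as follows. The sample format, the function name `clean_trial`, and the rectangular stimulus bounds are hypothetical. The sketch also assumes that the off-stimulus fraction is computed relative to the samples that survive the validity check, which the text does not state explicitly.

```python
def clean_trial(samples, stim_box, max_bad_fraction=0.5):
    """Clean one trial of gaze data.

    samples:  list of (x, y, left_valid, right_valid) tuples
    stim_box: (x_min, y_min, x_max, y_max) of the stimulus on screen
    Returns the retained (x, y) points, or None if the trial is rejected.
    """
    if not samples:
        return None
    # Stage 1: drop samples in which neither eye was detected.
    valid = [(x, y) for x, y, lv, rv in samples if lv or rv]
    n_invalid = len(samples) - len(valid)
    if not valid or n_invalid / len(samples) > max_bad_fraction:
        return None
    # Stage 2: drop samples that fall outside the stimulus.
    x0, y0, x1, y1 = stim_box
    on_stim = [(x, y) for x, y in valid if x0 <= x <= x1 and y0 <= y <= y1]
    n_off = len(valid) - len(on_stim)
    if n_off / len(valid) > max_bad_fraction:
        return None
    return on_stim
```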

Since the eye tracker collects data at a uniform rate and does not directly determine fixations and saccades, it is necessary to run the gaze data through a fixation filter to extract fixation locations. The filter uses parameters common in other research [35,42]. Specifically, a median filter with a window of three consecutive points is run over the data to reduce noise. The magnitude of the velocity is then calculated at every point in the smoothed data, and a velocity threshold of 10°/s is used to segment successive gaze points into fixations. A set of gaze points must span a minimum of 67 ms to be grouped as a fixation, which eliminates extremely brief fixations; this value mirrors the analysis of Busey et al. [35]. Using alternative minimum durations between 50 and 150 ms does not qualitatively change the results of later analyses. One subject was replaced due to difficulty collecting sufficient valid data. All subjects used for analysis had 32 valid trials.
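A minimal velocity-threshold fixation filter along these lines is sketched below. It is an illustration under stated assumptions, not the filter actually used: the conversion of 40 pixels per degree follows from the cue description (0.25° = 10 pixels), the 67 ms minimum is interpreted as four samples at 60 Hz, endpoints are left unsmoothed, and all function names are hypothetical.

```python
import math
from statistics import median

def median_filter3(values):
    """Window-3 median filter; the two endpoints are left unchanged."""
    out = list(values)
    for i in range(1, len(values) - 1):
        out[i] = median(values[i - 1:i + 2])
    return out

def _summarize(group):
    """Return (mean_x, mean_y, n_samples) for a group of gaze points."""
    n = len(group)
    return (sum(p[0] for p in group) / n, sum(p[1] for p in group) / n, n)

def detect_fixations(xs, ys, rate_hz=60.0, px_per_deg=40.0,
                     vel_thresh_deg_s=10.0, min_dur_s=0.067):
    """Segment uniformly sampled gaze data (pixels) into fixations."""
    xs, ys = median_filter3(xs), median_filter3(ys)
    min_samples = round(min_dur_s * rate_hz)  # 4 samples at 60 Hz (assumed reading)
    fixations, current = [], []
    for i in range(len(xs)):
        if i > 0:
            step_deg = math.hypot(xs[i] - xs[i - 1], ys[i] - ys[i - 1]) / px_per_deg
            if step_deg * rate_hz > vel_thresh_deg_s:
                # Velocity above threshold: close the current group.
                if len(current) >= min_samples:
                    fixations.append(_summarize(current))
                current = []
        current.append((xs[i], ys[i]))
    if len(current) >= min_samples:
        fixations.append(_summarize(current))
    return fixations
```

At 60 Hz and 40 pixels per degree, the 10°/s threshold corresponds to a displacement of roughly 6.7 pixels between consecutive samples.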

Results

The experiment involved four conditions in a 2-by-2 design: the time of recording (before vs. after training) crossed with whether the stimuli were used for training or not (training vs. transfer stimuli). We performed three different analyses on the cleaned data: (1) an empirically based analysis that measures the spatial proximity of subject fixations to expert fixations, (2) a model-based analysis that evaluates the degree of expertise via expert and novice image-specific models, and (3) a summary analysis of gaze statistics. We discuss each in turn.

Empirical analysis of spatial proximity to expert fixations.

For each fixation of each subject on a particular image, we determined the mean Euclidean distance to all expert fixations on the same fingerprint impression. These mean distances were used to form an empirical cumulative distribution function (CDF). By comparing the CDFs before and after training, we can assess the degree to which subjects’ fixations better approximated the experts’. The top row of Fig 6 shows the empirical CDF in the four conditions of Experiment 1. A two-sample Kolmogorov-Smirnov test (K-S test) allows us to test whether the probability distributions before and after training differ. If we treat each fixation as independent of the others, the K-S test produces highly significant effects for both the training stimuli (D(2025, 1913) = 0.114, p < .001) and the transfer stimuli (D(2049, 1825) = 0.083, p < .001). However, such a treatment does not consider the dependence among fixations in a sequence and therefore inflates the apparent degrees of freedom in the data. Consequently, we propose an approach to estimating the reliability of differences in the distributions that excludes adjacent fixations: the K-S test is performed with only every k-th fixation in a sequence, i.e., fixations 1, k+1, 2k+1, and so on. With a suitably large k, the retained fixations can be treated as independent. By intuition, we chose k = 3, which yields a reliable shift toward expertise for both training and transfer stimuli (D(706, 669) = 0.123, p < .001; D(718, 640) = 0.096, p < .005). Significance levels in Fig 6 are based on k = 3. Because of the arbitrariness in the choice of k, we performed a second, model-based analysis of the data.
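The proximity measure, the every-k-th subsampling, and the K-S statistic can be sketched in a few lines. The function names are hypothetical, and the sketch computes only the statistic D; a p-value would come from a statistics package (for example, `scipy.stats.ks_2samp`), which is not used here to keep the example self-contained.

```python
import math

def mean_distance_to_experts(fixation, expert_fixations):
    """Mean Euclidean distance from one subject fixation (x, y)
    to all expert fixations on the same impression."""
    return sum(math.dist(fixation, e) for e in expert_fixations) / len(expert_fixations)

def every_kth(sequence, k=3):
    """Keep fixations 1, k+1, 2k+1, ... of a single fixation sequence."""
    return sequence[::k]

def ks_statistic(a, b):
    """Two-sample K-S statistic: the largest vertical gap between
    the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        v = min(a[i], b[j])
        while i < len(a) and a[i] <= v:   # advance past all values <= v
            i += 1
        while j < len(b) and b[j] <= v:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d
```

In use, `every_kth` would be applied to each trial's fixation sequence separately before pooling the resulting mean distances into the before-training and after-training samples passed to `ks_statistic`.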

Fig 6. Summary of spatial proximity results from Experiments 1–3.

Each experiment is shown in a row of the figure. The left and right columns present results for trained and transfer stimuli, respectively. Each graph shows the empirical cumulative distribution function before (blue) and after (red) the training phase. Each sample in the cumulative distribution function derives from the mean distance to all expert fixations (on the same image and impression) for a given subject fixation. Experiments 1–3 involved highlighted training based on expert fixations, novice fixations, and incongruent expert fixations, respectively.

https://doi.org/10.1371/journal.pone.0146266.g006

Model-based analysis.

The data were evaluated via the expert and novice image-specific models to obtain a log-likelihood ratio (the LLR, defined in Eq A5). The LLR provides a measure of relative expertise, allowing us to compare the nature of eye movements before and after training. The top row of Fig 7 shows relative expertise across subjects in the four conditions of Experiment 1. For both trained and transfer stimuli (left and right graphs, respectively), highlighted training shifts performance toward expertise. We conducted an analysis of variance (ANOVA) with subjects as the random variable and two within-subject factors: experiment phase (before vs. after training) and stimulus type (training, transfer). The ANOVA obtained a main effect of phase (F(1, 11) = 24.92, p < .001), no reliable effect of stimulus type (F(1, 11) = 0.05), and no phase-stimulus type interaction (F(1, 11) = 0.16). Thus, highlighted training was effective not only in altering fixation patterns on the specific images trained, but on novel images as well.

Fig 7. Summary of model-based analysis from Experiments 1–3.

Each experiment is shown in a row of the figure; the left column presents results for trained stimuli and the right column for transfer stimuli. The ordinate of each graph indicates the log-likelihood ratio (LLR), a measure of relative expertise. An increase in the LLR from before to after training indicates acquisition of expertise. Experiments 1–3 involved highlighted training based on expert fixations, novice fixations, and incongruent expert fixations, respectively. The error bars are calculated to account for systematic tendencies of data derived from the same subject and variability between subjects [43].

https://doi.org/10.1371/journal.pone.0146266.g007

Summary gaze statistics.

The data were also evaluated to obtain low-level gaze statistics before and after training. The statistics were the proportion of fixations on the left (vs. the right) impression, the proportion of saccades within impressions (vs. between impressions), the mean fixation duration, the mean saccade amplitude on the left, and the mean saccade amplitude on the right. Data were collapsed across training and transfer stimuli to reduce the variance of the statistics. The top section of Table 1 shows the analysis of these variables for Experiment 1. There was a significant increase in the percentage of saccades that occurred within impressions; no other effect was significant.
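These statistics are straightforward to compute from a fixation sequence, as the following sketch shows. It assumes that the two impressions are separated by a vertical boundary at `mid_x` and that saccade amplitude on a given side is measured over within-impression saccades only; both assumptions, along with the function and key names, are illustrative rather than taken from the original analysis.

```python
import math

def _mean(values):
    """Arithmetic mean, or NaN for an empty list."""
    return sum(values) / len(values) if values else float("nan")

def gaze_summary(fixations, mid_x):
    """Summarize one trial.

    fixations: ordered list of (x, y, duration_ms)
    mid_x:     x coordinate separating left and right impressions (assumed)
    """
    side = ["L" if x < mid_x else "R" for x, _, _ in fixations]
    amplitudes = {"L": [], "R": []}
    n_within = 0
    for i in range(1, len(fixations)):
        if side[i] == side[i - 1]:  # saccade stays within one impression
            n_within += 1
            amplitudes[side[i]].append(
                math.dist(fixations[i - 1][:2], fixations[i][:2]))
    n_saccades = len(fixations) - 1
    return {
        "left_fixations": _mean([s == "L" for s in side]),
        "within_saccades": n_within / n_saccades if n_saccades > 0 else float("nan"),
        "fixation_duration_ms": _mean([d for _, _, d in fixations]),
        "saccade_amp_left": _mean(amplitudes["L"]),
        "saccade_amp_right": _mean(amplitudes["R"]),
    }
```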

Table 1. Low-level statistics of subject gaze behavior for Experiments 1, 2 and 3.

For each experiment, an analysis was performed on the before- and after-training data for the following statistics: the percentage of fixations that occur on the left impression versus the right (Left Fixations), the percentage of saccades that occur within an impression versus across impressions (Within Saccades), the mean fixation duration (Fixation Duration), and the amplitude of saccades that occur on the left and right impressions (Saccade Amplitude). Experiments 1–3 involved highlighted training based on expert fixations, novice fixations, and incongruent expert fixations, respectively.

https://doi.org/10.1371/journal.pone.0146266.t001

Discussion

The results of Experiment 1 indicate that training via attentional highlighting leads to gaze patterns that are less like those of domain novices and more like those of domain experts. This shift toward expertise is observed both for stimuli that were used in training and for novel stimuli. Had highlighting benefited only the training stimuli, one might have explained the effect in terms of memorization of previous scanpaths [44]. However, both the empirical analysis and the model-based analysis indicate that generalization is quite robust, making it unlikely that subjects were simply memorizing the instructed fixation sequences.

The analysis of low-level gaze statistics provides some further support for the value of highlighting in facilitating the acquisition of expertise, but also points out some limitations. The increase in the percentage of saccades that occur within an impression is in agreement with the observations of Busey et al. [35] that experts make a higher percentage of saccades within impressions than do novices. It is possible that this reflects adoption of a more efficient examination strategy that seeks to compare an arrangement of features rather than compare features one at a time. In contrast, our analysis suggests that highlighting does not increase the percentage of fixations on the left or decrease the amplitude of saccades within an impression. Although Busey et al. [35] did find that experts fixated significantly more often on the left impression, the difference between experts and novices in their corpus was small (52.9% versus 51.4% respectively), and the reason for this difference with real-world experts is that examination strategies may focus on the latent print, which is conventionally placed on the left [35]. In our experiment, no latent print was used.

Thus far, we have focused on gaze statistics and have not reported measures of the primary discrimination task that subjects performed: comparing a pair of prints to determine whether they came from the same individual. As we mentioned previously, 26 of the 28 images used in Experiment 1 contained matching prints. The predominance of matching prints ensures extended gaze sequences: with inked prints, the clarity of the images makes it easy to reject mismatching prints. Given a relatively small dataset and the imbalance of matches and mismatches, we did not expect any measure of discriminability or accuracy to provide reliable indications of learning, and thus did not collect same/different judgments.

One concern with Experiment 1 is that subjects might not be learning from the highlighting per se; rather, their improved performance might simply be due to mere exposure to the fingerprint stimuli and the resulting increased familiarity with fingerprints in general. To test this hypothesis, Experiment 2 was identical to Experiment 1, except that instead of highlighted training based on expert fixation sequences, the training procedure was based on novice fixation sequences. Each subject in Experiment 2 was matched with a novice in the training dataset, and the training novice provided the fixation sequences for highlighting. If highlighting has no effect, then Experiment 2 should produce the same results as Experiment 1. However, if subjects are learning from the highlighted sequence, the shift towards expertise should not be observed in Experiment 2.