Pupil diameter differentiates expertise in dental radiography visual search

Expert behavior is characterized by rapid information processing abilities, dependent on more structured schemata in long-term memory designated for their domain-specific tasks. From this understanding, expertise can effectively reduce cognitive load on a domain-specific task. However, certain tasks could still evoke different gradations of load even for an expert, e.g., when having to detect subtle anomalies in dental radiographs. Our aim was to measure pupil diameter response to anomalies of varying levels of difficulty in expert and student dentists’ visual examination of panoramic radiographs. We found that students’ pupil diameter dilated significantly from baseline compared to experts, but anomaly difficulty had no effect on pupillary response. In contrast, experts’ pupil diameter responded to varying levels of anomaly difficulty, where more difficult anomalies evoked greater pupil dilation from baseline. Experts thus showed proportional pupillary response indicative of increasing cognitive load with increasingly difficult anomalies, whereas students showed pupillary response indicative of higher cognitive load for all anomalies when compared to experts.

Background

Characterizing Expertise

Expertise lies in the mind. The theory that expert aptitude develops a more structured long-term memory designated for domain-specific tasks [17] offers insight into experts' faster and more accurate abilities [18]. Long-term working memory, proposed by Ericsson and Kintsch [17], offers this explanation for how experts seemingly effortlessly handle their domain-specific tasks. Their memory structuring facilitates their ability to maintain working memory at optimal capacity, avoiding overload, which affects productivity and performance.

Expertise and Memory

Long-term memory (LTM) manages how we automatically engage in familiar activities without much thought (e.g., riding a bicycle, remembering your childhood phone number). Similarly, expertise is dependent on this capacity. However, the mechanism that effectively accesses domain-specific information in LTM is a distinguishing asset to experts. Generally, working memory is understood as temporary storage for processing readily available information [19] and has two prongs: short-term and long-term working memory. Where the former relates to structuring available for limited capacity, the latter relates to the structuring available to the larger, long-lasting storage and is of more interest in skill learning [17]. This structuring conceptualization explains why experts intuitively handle their domain-specific tasks. For instance, chess players employ memory chunking that enables them to quickly recognize favorable positions and movements with less focus on single pieces [20]. Athletes show faster reaction to attentional cues, especially in interceptive sports (e.g., basketball), indicating more rapid mental processing [21]. Also, medical professionals have been thought to proficiently employ heuristics in their decision making strategies, i.e., visual search of radiographs [22] or diagnostic reasoning [23,24].
Skill Acquisition and Cognitive Load

Developing new skills and the related memory structures for a specific discipline relies heavily on the capacity of working memory. According to Just and Carpenter [25], when the working memory capacity is reached, comprehension is inhibited, leading to negative effects on performance. Effective comprehension then relies on resource allocation [25]. Optimal resource allocation supports rapid convergence to the most appropriate task solution. Experts can filter out irrelevant information, which is evident in gaze behavior; they focus more on areas relevant to the task solution and less on areas that are irrelevant to the solution [18]. For instance, expert radiologists have more fixations on anomaly-prone areas [26-28] and have shorter time to fixation on an anomaly [22,29].

Additionally, when the task becomes too difficult, there is more demand on working memory [30]. Sweller points out that the means-to-an-end problem solving strategies that novices employ can overload working memory [31]. For instance, a student using a trial and error approach to an end goal needs to maintain a history of all their wrong answers so far. Each wrong answer then gets added to the stack, taking up working memory capacity. Cognitive load, or more specifically intrinsic cognitive load [32], is the effect of "heavy use of limited cognitive-processing capability" [31]. For more information, see the review by Paas and Ayres [33]. High cognitive load has been shown to have negative effects on performance [30] and effective learning in general [34]. Thus, too high a cognitive load can hinder the aspect of learning where the memory structures are developed.

challenging problems that could evoke more load on working memory. It has been found that experts employ more efficient reasoning strategies when accurately evaluating clinical case examinations [36]. However, experts who inaccurately evaluated these examinations also employed reasoning strategies similar to novices [36]. Furthermore, inefficient reasoning strategies were also likely to be elicited in experts in more complicated case examinations [37]. Inefficient reasoning strategies bring back the illustration of the stack in working memory being filled with irrelevant information, exhausting capacity and creating cognitive load.

One way to assess levels of cognitive load is the pupillary response [38,39], where pupil size has been shown to increase as a response to memory resources reaching capacity [40,41] as well as when the task becomes too difficult [34,42]. Accordingly, experts have a higher threshold for what is difficult compared to their novice counterparts, which is evident in the pupil response. Therefore, we are interested in expert and novice differences in task difficulty as measured by the pupil diameter.

Specifically, we examine expert and novice dentists interpreting anomalies of varying degrees of difficulty in panoramic radiographs. More interestingly, our aim is to further understand experts' perception of difficulty in their domain-specific tasks and whether this affects cognitive load.

performance, e.g., during search, has been characterized by metrics derived from the discrimination of fixations and saccades. Fixations are the periods when eye movements are relatively still, indicating focus of attention, usually on areas prone to a specific diagnosis [43]. Saccades, the rapid eye movements, are usually made when scanning over areas irrelevant to a specific diagnosis [18]. Kok et al. [44] showed that distinguishable gaze strategies were evident in expert, intermediate, and novice radiologists. Their strategies were affected by top-down (context, knowledge-based) or bottom-up (salient, noticeable image features) aspects of the task. In other words, top-down based gaze behavior can be representative of the cognitive processing during efficient reasoning.

Conversely, less efficient reasoning can be linked to bottom-up based gaze behavior, where attention is spread out over areas deemed salient, regardless of whether they are relevant to the diagnosis at hand. Additionally, depending on the anomaly, experts employ a mixture of focal and diffusive, or ambient, gaze strategies [44-46]; however, they are more accurate at determining anomalies than novices and intermediates.

Beyond gaze strategies based on fixations and saccades, other forms of eye movements that have been used to measure aspects of cognition during visual search include their speed and direction [5], microsaccades [47], pupil diameter oscillation [48], and measures related to pupil diameter itself [49]. Generally, most of these measures concentrating on estimation of cognitive load have produced metrics sensitive to the presence or absence of cognitive load. In this paper, we show that the baseline-related pupil dilation, which has been one of the more consistently reliable measures of cognitive load, can also discriminate varying levels of difficulty in experts' search of radiograph images.

Not only does visual search strategy reflect cognitive processes, but pupil diameter has also been shown to be a robust, non-invasive measurement of cognitive otherwise known as task-evoked pupillary response. Originally, Kahneman and Beatty [50] linked pupil response to attentional differences. Then, the link between attention and capacity was promoted [42], where higher load on the working memory showed a larger change in pupil dilation. Additionally, pupillary response has been found to be an indicator of learning [34], where pupil diameter decreased with more experience in a task. This understanding of pupil diameter changes has further been employed as a robust cognitive load classifier [39].

Much of the early research in processing capacity and cognitive load has elicited effects from language or number recall tests [40-42]. However, pupil activity correlates with workload during a variety of other tasks (see the review by van der Wel et al. [56]). For instance, in visual search tasks, more distractors make the paradigm more difficult, affecting the pupil diameter increase [57]. Furthermore, when participants were asked to recall the number of objects in the stimuli, pupil diameter increased even more [57]. Backs and Walrath [58] found that monochrome displays evoked longer search time and more pupil dilation than colored displays when performing visual search tasks for both object counting and target finding (footnote 1). Regarding uncertainty during a search task, an increase in pupil diameter was associated with response time and uncertainty of target selection [59]. Although the effects of learning are still apparent, pupil dilation decreases as an effect of training over time [60].

One of the more important takeaways from the visual search literature is the interplay of selective attention, increasing task demand, and the mental effort evoked.

Moreover, this understanding is applicable to medical professionals and the cognitive processes involved during diagnostic interpretation of radiographs, where experts are highly exposed to varying difficulties of this task and their student counterparts less so, though the latter are in the process of learning. Over time, students accumulate more experiences and face more challenging tasks, reducing cognitive load.

Though it is apparent that pupillary response is a product of cognitive load, other factors have been shown to affect pupil size. For example, changes in luminance in the environment result in the physiological response of constriction or dilation [55]. Age difference has also been shown to affect pupil size, where overall pupil size in older adults is smaller than in younger adults, though variance between subjects in similar age groups is also quite high [51,55]. With these factors in mind, studies on pupil diameter and load recommend a task-to-baseline comparison in luminance-controlled environments [34, 38-42, 50, 53, 57, 59, 61]. A model was developed that measures pupil dilation during workload and accounts for light changes, where task-related changes are still measurable given varying lighting conditions [62]. Other factors known to affect pupillary response include fatigue [63,64], caffeine or drug consumption [65,66], and emotion or arousal [67]. Therefore, when measuring pupillary response in relation to cognitive load or mental effort in general, these factors should be controlled in order to avoid such confounds.

[53] found that novices had a larger pupillary response compared to experts. Also, experts' pupillary response was not affected by question difficulty or accurate response [53]. Thus, for questions related to field of expertise, trained physicians showed more accurate performance and less cognitive load, whereas novices exhibited greater cognitive load, especially for more difficult questions [53].

Further research in laparoscopy found that expert surgeons' pupil diameter increased as a result of increasing task difficulty during laparoscopic procedures [68]. While studying hernia repair surgery, Tien et al. [69] found that junior surgeons had larger pupil sizes than experts and that specific tasks also affected their pupillary response. This pupil response due to less experience was corroborated by self-report of task load, where experts experienced less mental demand than juniors [69]. For more references highlighting lower pupillary response as an effect of medical expertise (e.g., surgeons, anesthesiologists, physicians), see Szulewski et al. [70].

Footnote 1: They could not rule out luminance differences as a possible confound.

Regarding specifically medical image interpretation, Brunyé and colleagues [52] evaluated expert physicians viewing digitized breast biopsies with varying levels of difficulty and their resulting case diagnoses. They found pupil diameter increases as an effect of difficulty in diagnostic decision making, more so for cases that were accurately diagnosed [52]. They attribute their results to experts' possible perception of case difficulty during an initial analysis. Therefore, pupil diameter can be indicative of the cognitive processes involved in interpreting medical images and can indicate the level of expertise as well as the degree of difficulty. Brunyé et al. [71] further highlight the prospects that pupillary response in combination with gaze behavior has in understanding uncertainty in medical decision making.

One of the earlier studies that specifically focused on dental expertise and OPT interpretation found that the degree of image difficulty (obvious, intermediate, and subtle pathologies) had an effect on the gaze behavior for both experts and students [72]. They found that experts had shorter total search times as well as shorter times to identify (first fixation) an anomaly compared to novices. However, experts used more fixations and longer fixation durations on difficult images compared to obvious images. Students showed no differences in how often and how long they looked at obvious images or difficult images [72].

Castner and colleagues [73] also found a possible effect of degree of difficulty on how often an expert dentist glances at an anomaly before he or she physically labels it as such. Certain anomalies were only glanced at once to be accurately labeled, whereas others needed to be glanced at multiple times to be accurately labeled. Moreover, gaze behavior is indicative of the expertise and the cognitive processes involved in interpreting medical images. Additionally, the degree of difficulty in accurate pathology detection can affect gaze behavior, which can be indicative of the reasoning strategies used. For this reason, we are interested in further understanding the cognitive processes during visual search of dental radiographs. Mainly we wish to know how the degree of pathology difficulty can interrupt the flow of efficient expert reasoning.

With this intention in mind, we looked at expert and novice dentists' pupillary response while fixating on anomalies of varying difficulty in panoramic radiographs. Not only do these OPTs have multiple anomalies, but also within one OPT, varying difficulties can be present. Therefore, we are not analyzing an overall impression of an easy or difficult image. Rather, through the course of the search strategy, we are extracting when they spot an anomaly and extracting the mental processing at that moment. We propose that the degree of anomaly interpretation difficulty can be indicated by changes in the pupillary response, where a larger response is more representative of harder to interpret anomalies. We also expect to find a difference in the pupillary response between experts and novices, as established by prior research. However, whether novices are as attuned to anomaly difficulty as their expert counterparts is also of interest to our work.

Data collection was performed during summer and winter semesters from 2017 to 2019. Students from semesters six through ten were recorded during an OPT inspection task. Only the sixth semester students were evaluated three times in each period of data collection due to their curriculum requirement of an OPT interpretation training course. We chose to evaluate the sixth semester students after this course (N_sixthM3 = 50), since they were more likely to experience cognitive load due to the increase in conceptual knowledge and OPT reading skills from this course. Fig 1 shows the difference in students' as well as experts' pupil diameters over the stimulus duration. The sixth semester students after this training course ("Six M3" in Fig. 4) have higher overall pupil diameter.

Smoothed raw data is averaged over 1 second bins. The sixth semester students were measured on three separate occasions: before, during, and at the end of their obligatory training course (as indicated by "M" for measurement). Students had OPT images presented for 90 seconds, whereas experts had the images presented for 45 seconds.

Table 1 details both the student and expert data. Experts (N_experts = 28) from the University clinic volunteered their expertise for the same task that students performed. Experience was defined as professional years working as a dentist and ranged from 1 to 43 years (M_years = 9.88). 50% of experts saw between 11 and 30 patients on a given day and the remainder saw fewer than 10 patients a day. All experts had the necessary qualifications to practice dentistry and/or any other dental related specialty, e.g., Prosthodontics, Orthodontics, Endodontics, etc.

Due to technical difficulties, eye [85]. While the screen luminance was also controlled and followed the standard protocols for viewing radiographs, the exact effect of the screen brightness on the pupillary response is out of the scope of this work; rather, the pupillary response dependent on mental load during these reading tasks is the focus.

Eye Tracker

The SMI RED250 remote eye tracker, a commercial eye tracker with a 250 Hz sampling frequency, was used for gaze data collection. The software included with the eye tracker offers an experiment designer (Experiment Center) and an event analysis tool (BeGaze). Since the eye tracker has a high sampling frequency, both stable (fixations) and rapid (saccadic) eye movements for static stimuli can be measured. Analysis was performed on the raw gaze data output from the eye tracker: x and y coordinates with timestamps mapped to the screen dimensions. The raw data points also have pupil diameter output in millimeters (footnote 4). Although the data is raw and has not been run through event detection algorithms, raw gaze points are labeled as fixation, saccade, or blink. We evaluated gaze data for the left eye.

Footnote 2: Illuminance is the amount of light on a given space. Luminance is light reflected off a surface [74].

Footnote 3: Depending on nature of radiograph, e.g., mammography, CT.

OPTs. In total, the participants view 20 OPTs with a short break after the first ten.

Outline of Experimental Session. Initially, there was a calibration and procedural instructions. Then for each image, there is a fixation cross for baseline data, the exploration phase (45 s duration for experts and 90 s for students), instructions for the marking phase, and the marking phase (unlimited time). Students received two sets of 10 OPTs with a break in between and experts received one set of 15 OPTs with a break after the first seven.
The diagnostic task for the expert group was highly similar to that of the students. However, it was determined that 90 seconds is too long a duration for the experts, since much of the previous literature has shown experts are faster at scanning radiographs [18,22,27,28,72,87-89]. Therefore, the exploration phase was shortened to a duration of 45 seconds. Additionally, due to their busy schedules, experts only viewed 15 OPTs, with a short pause after the first seven.

Both students and experts were able to move their head during the experiment, although they were instructed to move their head as little as possible. Further details of one of the student data collections can be found in Castner et al. [90] and expert data collections can be found in Castner et al. [73].

Footnote 4: Millimeters extrapolated from pupil height and width dimensions in pixels [86].

Footnote 5: The first data collection of the sixth semester students was done with 13 points. However, the other data collections were done with 9 points. The sixth semester students and experts analyzed for this work both performed 9-point calibrations.

This graph shows raw pupil signals of the left and right eye over the course of image presentation. Red and green dots in the lower part show when the eye tracker labels the data point as a blink. The particular subject in 4a had a high tracking ratio (98%) for this image, even though there is the possibility that many data samples are missing and incorrectly labeled as blinks. The participant in 4b also has a high tracking ratio, though the data appears to be acceptable with typical blink durations detected and little signal loss.

Gaze Signal. Only gaze data from the interpretation phase was of interest to this work, since gaze data from the marking phase was affected by the use of the screen drawing tool. Initially, the raw gaze data was examined for signal quality. The eye tracker reports the proportion of valid gaze signal to stimulus time as the tracking ratio.

Therefore, if a participant's tracking ratio for an OPT was deemed insufficient (less than 80%), we omitted his or her data for this OPT. If, overall, a participant had poor tracking ratios for the majority of OPTs he or she viewed (i.e., a maximum of three images with acceptable tracking ratios), the total gaze data for that participant was removed. This preprocessing stage helps ensure that errors in the gaze data (e.g., post-calibration shifts, poor signal due to glasses) are substantially minimized. Table 1 gives the distribution of participants and the percent of datasets excluded due to low tracking ratio (last row): 199 datasets were initially excluded on the grounds of poor quality data.

Upon investigation, standard and simple implementations for blink detection define a minimum duration threshold that detects a blink if there is no gaze signal for this threshold or longer [43]. The minimum blink duration in the current data set is 70 ms, as corroborated by the SMI manual [86]. However, the manual also states that it is not possible for their implementation to distinguish a blink from pupil signal loss [86]. Consequently, the main issue stems from the apparent lack of a maximum blink duration threshold.

The question becomes: did the student in Fig. 4a close his or her eye, pirate style, for almost half of the stimulus, or is this simply a situation where the left eye was not detected for more than half the time?

An extra criterion was necessary to further detect and exclude datasets with pupil signal loss mislabeled as a blink. We overestimated the threshold for atypical blink durations, setting this value to 5000 ms, to account for situations where a participant could possibly be rubbing his or her eye(s) or possibly even closing the eye shortly. This threshold could then optimally leave an acceptable amount of pupil data for the entire stimulus presentation (90 or 45 seconds). Since baseline data was the two second fixation cross presented directly before each stimulus, we set the threshold blink duration to 500 ms and added an extra criterion of a minimum of 200 pupil samples to effectively extract enough samples for an acceptable pupil diameter baseline. Therefore, the initial quality check was low tracking ratio exclusion. Then, the second data quality check removed datasets if blink durations were atypical. These datasets were excluded from the final analysis, leaving 570 datasets from 72 participants (48 students, 24 experts).
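The two quality checks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the string labels ("blink", "fixation"), and the choice of strict versus inclusive comparisons are hypothetical, while the numeric thresholds (80% tracking ratio, 5000 ms and 500 ms blink durations, 200 baseline samples, 250 Hz) are taken from the text.

```python
SAMPLE_RATE_HZ = 250          # sampling frequency of the eye tracker (from the text)
MIN_TRACKING_RATIO = 0.80     # recordings below 80% valid signal are dropped
MAX_BLINK_MS_STIMULUS = 5000  # atypical-blink threshold during the stimulus
MAX_BLINK_MS_BASELINE = 500   # stricter threshold for the 2 s fixation cross
MIN_BASELINE_SAMPLES = 200    # minimum valid pupil samples in the baseline


def longest_blink_ms(labels, sample_rate_hz=SAMPLE_RATE_HZ):
    """Duration (ms) of the longest run of consecutive 'blink' labels."""
    longest = run = 0
    for label in labels:
        run = run + 1 if label == "blink" else 0
        longest = max(longest, run)
    return longest * 1000.0 / sample_rate_hz


def passes_quality_checks(tracking_ratio, stimulus_labels,
                          baseline_labels, baseline_pupil_mm):
    """Apply the tracking-ratio check first, then the blink-duration checks."""
    if tracking_ratio < MIN_TRACKING_RATIO:
        return False
    # Assumption: a dataset is rejected only if a blink run EXCEEDS the threshold.
    if longest_blink_ms(stimulus_labels) > MAX_BLINK_MS_STIMULUS:
        return False
    if longest_blink_ms(baseline_labels) > MAX_BLINK_MS_BASELINE:
        return False
    # Assumption: a baseline sample is valid if it is not a blink and is non-zero.
    valid = sum(1 for d, lab in zip(baseline_pupil_mm, baseline_labels)
                if d > 0 and lab != "blink")
    return valid >= MIN_BASELINE_SAMPLES
```

Checking the run length of blink labels, rather than the total blink count, is what lets a long stretch of signal loss mislabeled as a blink be told apart from many ordinary short blinks.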

Pupil Diameter. Data analysis was done for the left eye. As previously mentioned, the raw gaze signal is divided into the labels blink, fixation, or saccade. Thus, we can determine when the gaze signal is indicative of fixation-like behavior and saccade-like behavior. For further signal processing, we removed gaze coordinates and pupil data for the raw data points labeled as saccades (since visual input is not perceived during rapid eye movements [43]). Data points with a pupil diameter of zero or labeled as a blink were also removed. Additionally, data points 100 ms before and after blinks were removed, due to pupil size distortions from partial eyelid occlusion. Lastly, the first and last 125 data points in the stimulus presentation were removed due to stimulus flickering [91-93]. The remaining data was smoothed with a third order low-pass Butterworth filter with a 2 Hz cutoff, as illustrated by the purple data points in Fig. 5.

Both dentists independently examined the OPTs and the patient work-ups and further consolidated together to determine ground truths for each image. Two OPTs were negative (no anomalies) controls.
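The sample-removal steps described under Pupil Diameter (saccades, blinks and zero diameters, 100 ms around blinks, and the first and last 125 samples) can be sketched as follows. This is a minimal illustration, assuming per-sample string labels and a fixed 250 Hz sampling rate; the function name and label strings are hypothetical. The Butterworth smoothing step is deliberately left out so the sketch stays within the standard library.

```python
SAMPLE_RATE_HZ = 250   # eye tracker sampling frequency (from the text)
BLINK_PAD_MS = 100     # samples this close to a blink are discarded
EDGE_SAMPLES = 125     # 125 samples = 500 ms at 250 Hz, dropped at each end


def clean_pupil_samples(pupil_mm, labels,
                        sample_rate_hz=SAMPLE_RATE_HZ,
                        blink_pad_ms=BLINK_PAD_MS,
                        edge_samples=EDGE_SAMPLES):
    """Return (index, diameter) pairs for the samples that survive cleaning."""
    n = len(pupil_mm)
    pad = int(round(blink_pad_ms * sample_rate_hz / 1000))  # 25 samples at 250 Hz
    drop = [False] * n
    for i, (diameter, label) in enumerate(zip(pupil_mm, labels)):
        # Saccade samples and zero-diameter samples are removed outright.
        if label == "saccade" or diameter <= 0:
            drop[i] = True
        # Blink samples are removed together with a window on either side.
        if label == "blink":
            for j in range(max(0, i - pad), min(n, i + pad + 1)):
                drop[j] = True
    # Trim the start and end of the stimulus presentation.
    for i in range(min(edge_samples, n)):
        drop[i] = True
        drop[n - 1 - i] = True
    return [(i, d) for i, d in enumerate(pupil_mm) if not drop[i]]
```

Returning the original indices alongside the diameters keeps each surviving sample aligned with its gaze coordinates and timestamp for the later mapping onto anomalies.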

Anomaly Ground Truths. Additionally, the level of difficulty for each anomaly was pre-determined. Fig. 6 shows four OPT images viewed in the experiment.

Anomalies anomalies in this image are approximal caries (4) and a maxillary sinus mass. Anomalies indicated by the white dashed circles were determined as ambiguous, e.g., the nature of their difficulty and/or pathology is unclear. For example, in Fig. 6(B), (7,8) are impacted wisdom teeth, though it is uncertain whether this will become a problem for the patient and therefore is regarded as potentially pathologic. (6) is an apical translucency at the mesial root apex and it is unclear whether it is indicative of an inflammation. Therefore, they were kept in this analysis even though the nature of their difficulty is unclear.

anomalies, all colored red. Thresholding for red values was performed to automatically get the pixel coordinates of the ellipse edges. Then, the ellipses were filled with the poly2mask() function. Anomalies automatically extracted from this process were double checked for overlapping and had their boundaries corrected. Certain anomalies inside another that were highly similar in nature, such as (2a,b) in Fig. 6(C), were grouped together as one anomaly. Other anomalies too close together and too different in pathology, such as (3,8) in Fig. 6(C), were excluded from the analysis, due to possible spatial accuracy errors in the gaze. Similarly, anomalies that were denoted by too small an ellipse were padded to have a larger pixel area, e.g., (4) in Fig. 6(B), to account for spatial accuracy errors in the gaze. Each segmented anomaly is given a

Fig. 6. Each anomaly is segmented and given a distinguishing integer. Raw gaze points from the left eye are then mapped onto the anomaly map and gaze coordinates receive the corresponding value. These distinguishing values are further linked to the pre-determined anomaly difficulty in order to get a count of how many raw gaze hits landed on each anomaly type.
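The mapping from raw gaze points to anomaly difficulty can be sketched as a lookup in an integer label map. This is a hedged illustration, assuming the map is stored as rows of integers with 0 for background; the function name, the argument layout, and the difficulty names used in the example are hypothetical.

```python
def count_gaze_hits(label_map, gaze_points, difficulty_by_label):
    """Count raw gaze points per anomaly difficulty.

    label_map: rows of integers, indexed as label_map[y][x]; 0 means background.
    gaze_points: iterable of (x, y) screen coordinates in pixels.
    difficulty_by_label: maps each anomaly integer to its pre-determined difficulty.
    """
    height = len(label_map)
    width = len(label_map[0]) if height else 0
    counts = {}
    for x, y in gaze_points:
        col, row = int(x), int(y)
        # Gaze points that fall outside the image are ignored.
        if not (0 <= row < height and 0 <= col < width):
            continue
        label = label_map[row][col]
        if label == 0:
            continue  # gaze landed on a non-anomaly region
        difficulty = difficulty_by_label[label]
        counts[difficulty] = counts.get(difficulty, 0) + 1
    return counts
```

For example, with a map containing two anomalies labeled 1 and 2, calling `count_gaze_hits(label_map, gaze_points, {1: "easy", 2: "difficult"})` returns a dictionary of hit counts keyed by difficulty.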

For both students and experts, we plotted the raw gaze points that landed in each anomaly and extracted its level of difficulty. For simplicity, we will refer to them as gaze hits. For all hits on an anomaly for a participant, we calculated the median pupil diameter. The median baseline pupil diameter for that image was then subtracted from the median pupil diameter for each anomaly. Therefore, the difference from baseline could be indicative of diameter increase (positive value) or diameter decrease (negative value) compared to baseline.
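The change-from-baseline measure can be written as a short Python sketch. The function name and argument names are hypothetical; the sign convention follows the text, where a positive value means the pupil was larger on the anomaly than during the baseline fixation cross.

```python
from statistics import median


def pupil_change_from_baseline(anomaly_samples_mm, baseline_samples_mm):
    """Median pupil diameter on an anomaly minus the median baseline diameter.

    Positive values indicate dilation relative to baseline, negative values
    indicate constriction. Both inputs are lists of diameters in millimeters.
    """
    return median(anomaly_samples_mm) - median(baseline_samples_mm)
```

Using the median for both terms keeps the measure robust to the occasional outlier sample that survives preprocessing.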

With the gaze hits on anomalies of varying difficulties, we can evaluate the pupillary response of both experts and students during anomaly fixations. The pupillary response, as measured by change from baseline, can then provide insight into the mental/cognitive load both groups are undergoing while interpreting the anomalies.

Overall Change from Baseline

Independent of gaze on anomaly behavior, we looked at participants' median pupil diameter for each image compared to baseline median pupil diameters. We favored the median over the mean, because it has greater robustness towards noise and outliers.

The median pupil diameter change from baseline for students (blue bars) and experts (red bars) when gazing on anomalies of varying difficulty. Standard errors are indicated in black. Students had a larger pupillary response from baseline compared to experts, but this effect was homogeneous across the differing anomalies, whereas experts showed a pupillary response that increased with difficulty. The nature of the anomalies pre-defined as ambiguous, however, remains unclear.

We measured pupil diameter change from baseline when gazing on anomalies of varying difficulty during visual search of dental panoramic radiographs. We found that the gradation of anomalies in these images had an effect on expert pupillary response.

Anomaly gradation did not have an effect on student pupillary response.

Students showed larger and more homogeneous pupil size change from baseline for all anomaly gradations compared to experts. Thus for students, pupillary response was independent of whether an anomaly was easy or difficult to interpret. This effect was also found during visual inspection of the whole image (Fig. 8). Students showed significantly larger pupillary response than experts, which is supported by the previous literature [52,53,69-71,94]. This response has also been indicative of higher cognitive load [25, 40-42, 50, 53, 92]. For instance, Tien et al. [69] found that novices have higher memory load compared to an expert performing the same task. This behavior can be likened to students' lack of conceptual knowledge and experience causing them to "think harder" [95,96] to interpret these images.

502
The more interesting takeaway from this work is the lack of influence of anomaly 503 gradation on student cognitive processing. One would imagine that even the most 504 pronounced of anomalies would make the recognition process easier. However, the 505 pupillary response indicates that, regardless of how conspicuous, the level of mental 506 workload remains constant. The large pupil size could be reflective of learning during 507 the task, where students are developing the proper memory structures as theorized by 508 Ericsson and Kintsch [17]and Sweller [31]. Additionally, it could reflect that they have 509 not yet developed the conceptual knowledge to quickly recognize the image features 510 indicative the specific anomalies or how to interpret their underlying patholgies. Even 511 for easy anomalies, they may be unsure of whether they accurately interpreted or not. 512 Therefore, pupillary response while focusing on anomalies during visual search of 513 OPTs suggests students employ similary cognitive strategies for differing anomaly 514 gradations. Patel et al. [36] found this similar behavior when novices interpreted clinical 515 case examinations. Furthermore, previous reseearch has found systematic gaze 516 strategies were similary present in students searching medical images [22,28,44].
Systematic search has also been shown to produce larger pupil dilation [97]. Systematic search places more load on working memory; however, this is what students are generally trained to perform when they are first exposed to these images [89,98].

Conversely, experts showed a strong pupillary response to anomaly gradation: the least difficult anomalies showed less change from baseline, the intermediary anomalies showed more, and the most difficult anomalies evoked the largest response (Fig. 9). That is, as the gradation of difficulty increases, so does the pupillary response. This pattern, however, was not evident for the ambiguous anomalies, which showed the smallest change from baseline. This effect may lie in the uncertain nature of these anomalies. As determined by the two experts involved in the project, this category was a mixture of areas that may or may not have included an anomaly, or that included an anomaly with no cause for alarm. Therefore, it is uncertain how difficult or easy these anomalies were, or even whether they existed.

Nevertheless, when expert dentists perform a visual inspection of an OPT, they gaze at many areas that potentially have a multitude of differing pathologies or even positional and summation errors. Depending on the gradation of the area they are focusing on, proper interpretation may require differing processing strategies. In general, as task difficulty increases, so does the workload [68] and, correspondingly, the pupil dilation [30,42,48,99]. Patel et al. [37] found more cognitive load in physicians examining more complicated case examinations. Duchowski et al. [47] also showed increased cognitive load during decision-making with increasingly difficult abstract stimuli, but did so using microsaccade rate. Chi et al. also found that experts can more accurately determine how difficult a problem is [100].

Gaze behavior in expert dentists was also shown to change with difficult images [72]. Castner and colleagues [73] also found that different image types evoked either more or fewer fixations in order to accurately detect anomalies. The current work went one step further and found changes within the visual search of an OPT, in contrast to the overall response to interpretation of such an image. In visual search, employing a top-down strategy means that someone uses his or her acquired knowledge and understanding of the current problem to focus on the relevant aspects of an image to effectively process it [26,98]. Moreover, prior knowledge of a problem has been shown to reduce cognitive load [31,33,40,53]. An expert generally knows in which areas of the OPT anomalies are prevalent and how they appear in the image features. Therefore, from these top-down effects, an expert can quickly recognize an image feature as a specific anomaly. In contrast to overall visual inspection, where we found that experts showed a low average change from baseline, pupil dilation fluctuations during inspection of specific areas can be indicative of changes in workload even for experts. Although experts have a higher threshold for perceived difficulty than students, it is assumed that they still encounter tasks or subtasks that they perceive as difficult or are uncertain about.

However, if all anomalies and their pathologies were equally prevalent and salient in OPTs or any other medical image type, experts could effortlessly detect the vast array of issues with 100 percent accuracy. Also in this case, detection accuracy would improve with training solely through more exposure. Naturally, interpretation of medical images is not this simple, and certain image or pathology features can obscure the true diagnosis.

Experts are more robust at detecting difficult or subtle anomalies [12,28,72,89,101]. Nevertheless, harder-to-detect anomalies evoke behavior indicative of task difficulty [34,42]: more subtle anomalies evoked behavior that likely reflects more thorough inspection.

Experts, renowned for their streamlined processing abilities, are able to selectively allocate their attention to relevant information, and this is evident in the pupillary response. However, selective attention coupled with focus on an area perceived as challenging can increase the pupil dilation even further, as we found in our investigation. Similar to students, albeit to a lesser extent and only for difficult anomalies, experts show an effect of uncertainty on pupil size when looking at these specific image features.

In short, we found evidence of workload in experts as well as differences between expert and novice workload during visual inspection of dental OPTs. However, it should be noted that there were age differences between the two groups. Due to the sensitivity of the expert demographic data, we did not record their ages, but we can expect them to be older than their student counterparts. Age has been found to have an effect on the average pupil size [51,55]. For this reason, we measured change from baseline.
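To illustrate why a change-from-baseline measure reduces the influence of group differences in average pupil size, the following is a minimal sketch. It assumes a subtractive correction against the median of a participant's baseline samples; the function name `baseline_corrected` and the diameter values are hypothetical, and the exact baseline procedure used in the study may differ.

```python
from statistics import median


def baseline_corrected(samples, baseline_samples):
    """Return pupil diameter change relative to a participant's baseline.

    Assumption: subtractive correction against the median of the baseline
    samples. This is one common choice, not necessarily the study's own.
    """
    if not baseline_samples:
        raise ValueError("baseline_samples must not be empty")
    baseline = median(baseline_samples)
    return [s - baseline for s in samples]


# Hypothetical diameters (mm) for two participants whose resting pupil
# sizes differ, e.g. because of age.
younger = baseline_corrected([5.5, 5.75], [5.0, 5.0, 5.25])
older = baseline_corrected([4.0, 4.25], [3.5, 3.5, 3.75])

print(younger)  # change from baseline for the larger resting pupil
print(older)    # change from baseline for the smaller resting pupil
```

In this toy example both participants show the same change from baseline even though their absolute diameters differ. A subtractive correction removes differences in resting pupil size only; it does not address age-related differences in how strongly the pupil responds to workload.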

Additionally, Van Gerven et al. [54] found that pupillary response to workload in older adults is not as pronounced as in younger adults. However, their population was adults in their late sixties and early seventies compared to adults in their early twenties [54].

Though we cannot say exactly how old our expert population was, they were all still working in the clinic and therefore more than likely younger than their early seventies. Also, their years of experience in the clinic (average of 10 years) suggest they were more middle aged (30 to 45 years old). Further research is needed to better address this limitation and control for possible age effects on pupillary response.

Another limitation of this work could be the technical problems associated with eye tracker data collection. We removed data sets determined to be of poor quality; however, spatial resolution errors can accumulate within an experimental session if a participant moves too much. The gaze then appears shifted by an offset, which would affect precision in determining whether a participant looked at an anomaly. To control for this error, we increased the areas of smaller ground-truth anomalies and excluded anomalies that were too close together and too different in nature. The total gaze hits on each type of anomaly were not evenly distributed, with more gaze hits on easier and intermediary anomalies. Students had more total gaze hits due to the longer OPT presentation time, but the distributions were highly similar to those of experts. Future research can further untangle the differences in gaze hits on easier and difficult anomalies while controlling for presentation time differences.
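The idea of enlarging anomaly areas to tolerate a gaze offset can be sketched as follows. This is an illustration only: it assumes rectangular areas of interest and a uniform padding margin applied to every area, whereas the study enlarged only the smaller ground-truth anomalies. The `AOI` class, the `count_gaze_hits` function, the coordinates, and the margin value are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AOI:
    """Rectangular area of interest around an anomaly (hypothetical layout)."""
    label: str      # difficulty category, e.g. "easy" or "difficult"
    x: float        # left edge in pixels
    y: float        # top edge in pixels
    width: float
    height: float

    def padded(self, margin):
        # Grow the rectangle by `margin` pixels on every side.
        return AOI(self.label, self.x - margin, self.y - margin,
                   self.width + 2 * margin, self.height + 2 * margin)

    def contains(self, gx, gy):
        return (self.x <= gx <= self.x + self.width
                and self.y <= gy <= self.y + self.height)


def count_gaze_hits(gaze_points, aois, margin=0.0):
    """Count gaze points per AOI label after padding each AOI by `margin`."""
    hits = Counter()
    padded = [aoi.padded(margin) for aoi in aois]
    for gx, gy in gaze_points:
        for aoi in padded:
            if aoi.contains(gx, gy):
                hits[aoi.label] += 1
                break  # attribute each gaze point to at most one AOI
    return hits


aois = [AOI("easy", 100, 100, 50, 50), AOI("difficult", 300, 100, 20, 20)]
gaze = [(120, 120), (295, 105), (310, 110), (500, 500)]

strict = count_gaze_hits(gaze, aois)               # no tolerance
tolerant = count_gaze_hits(gaze, aois, margin=10)  # 10 px tolerance
print(strict, tolerant)
```

With the padded areas, the gaze point that falls just outside the small "difficult" anomaly is counted as a hit. The trade-off is that padding can make neighboring areas overlap, which is one reason to exclude anomalies that lie too close together.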

Although a majority of expert studies have established that experts are more robust at accurately solving their domain-specific tasks than their student counterparts [17,18,26,102], pupillary response during anomaly inspection in connection to detection performance is also of interest for future work. It would be interesting to see whether pupil diameter may be indicative of not only anomaly difficulty but also accurate detection of difficult anomalies.

The temporal scanpath information is also an interesting direction for future research, in particular systematic search in students and its effect on workload and pupillary response. For example, how often do "look backs" on anomaly areas occur, and does the pupil dilation increase with each look back? Another question is whether easy or more conspicuous anomalies are viewed first and how the pupillary response in students incorporates this initial information. Following up on the understanding that systematic search produces more memory load as measured by pupil dilation [97], it would also be interesting to replicate that finding with temporal information from our data.
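One possible way to operationalize "look backs" for such an analysis is sketched below. It assumes the scanpath is available as a time-ordered list of fixations, each tagged with the anomaly area it fell on (or `None`) and a baseline-corrected pupil value. A visit is defined as a run of consecutive fixations on the same area, and every visit after the first counts as a look back. The function name `visits_per_aoi`, the data layout, and the values are hypothetical.

```python
from itertools import groupby


def visits_per_aoi(fixations):
    """Group a scanpath into visits and average the pupil change per visit.

    fixations: time-ordered list of (aoi_label_or_None, pupil_change).
    Returns {aoi_label: [mean pupil change of visit 1, visit 2, ...]}.
    """
    visits = {}
    # groupby merges consecutive fixations that share the same AOI label.
    for label, group in groupby(fixations, key=lambda f: f[0]):
        if label is None:
            continue  # fixation outside any anomaly area
        values = [pupil for _, pupil in group]
        visits.setdefault(label, []).append(sum(values) / len(values))
    return visits


# Hypothetical scanpath: area "A" is visited, left, and visited again.
scanpath = [("A", 0.25), ("A", 0.75), (None, 0.0), ("B", 0.5), ("A", 1.0)]

visits = visits_per_aoi(scanpath)
look_backs = {label: len(means) - 1 for label, means in visits.items()}
print(visits)
print(look_backs)
```

Comparing the per-visit means within an area would then show whether pupil dilation grows with each return to that area.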