
Attention Allocation for Dysphoric Information in Adults with Depression Symptoms Using Eye-tracking and Mouse-tracking

  • Mary E. McNamara ,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Validation, Visualization, Writing – original draft, Writing – review & editing

    mmcnamara14@mgb.org (MEM)

    Affiliations: Department of Psychology and Institute for Mental Health Research, University of Texas at Austin, Austin, Texas, United States of America; McLean Hospital, Belmont, Massachusetts, United States of America

  • Jason Shumake,

    Roles Formal analysis, Software, Supervision

    Affiliation: Department of Psychology and Institute for Mental Health Research, University of Texas at Austin, Austin, Texas, United States of America

  • Christopher G. Beevers

    Roles Conceptualization, Supervision, Writing – review & editing

    Affiliation: Department of Psychology and Institute for Mental Health Research, University of Texas at Austin, Austin, Texas, United States of America

Abstract

Biased attention for dysphoric stimuli is thought to maintain depression, but poor measurement has limited prior tests of this hypothesis. The current study examined the association between biased attention for dysphoric information and depression using a novel free-viewing attention bias task in three samples of college students: line of visual gaze was measured directly via in-person eye-tracking (Experiment 1, N = 129) or approximated via a behavioral proxy, remotely collected mouse-tracking (Experiment 2, N = 79; Experiment 3, N = 154). Mixed effects regression analyses revealed that depression severity was significantly associated with greater attention to dysphoric stimuli in Experiments 1 and 2, but not in Experiment 3. Results suggest that depression severity is associated with attention to dysphoric information (although the findings from Experiment 3 temper this conclusion) and that eye- and mouse-tracking may be good options for measuring attention bias in depression. Additional work using longitudinal research designs seems warranted to further examine the relationship between attention bias for dysphoric information and the maintenance of depression.

Introduction

Attention bias refers to an information processing bias central to Beck’s model of depression [1]. Briefly, attention bias is the tendency for depressed individuals to spend more time attending to mood-congruent, typically dysphoric, stimuli (both internal and external) and less time attending to positive stimuli than their non-depressed counterparts. Traditionally, these attentional processes have been examined using externally-directed (i.e., visual) attention for stimuli, namely words and facial expressions [2]. While non-depressed individuals generally exhibit an attentional preference away from negative information, depressed individuals tend to spend more time viewing dysphoric stimuli, relative to individuals reporting little-to-no depression symptoms [3–7]. Additionally, depressed individuals have difficulty disengaging their attention once captured by sad stimuli [8] and take longer to disengage than both never-depressed participants [9] and individuals with lower depression symptoms [10]. However, these effects have not always been consistent, and other studies have not captured evidence of attention bias in depression ([11,12]; see [13] for a review of mixed evidence).

According to cognitive theory, these biased attentional processes will yield downstream effects: depressed persons will ultimately take in more negative information and less positive information about themselves, the world, and others than their non-depressed peers [1]. Importantly, these attentional biases can also function as part of a feedback loop, where depressed persons might also have increased awareness of their unwanted thoughts, feelings, and physiological symptoms, which may engender more dysphoria and extend negative moods [14], maintaining the cycle [15,16]. Indeed, both attention bias [17] and delayed disengagement from sad faces in turn predict difficulty with mood recovery [9].

However, the assessment of attention bias in depression has been notoriously elusive and fraught with challenges, due in large part to two issues. The first and more widely discussed obstacle has been difficulty with reliable measurement and the poor psychometric properties of the bias metrics. Traditionally, the most widely used behavioral paradigm for measuring attention bias has been the dot-probe task [18]. In the dot-probe, participants are presented with stimuli (most often words or faces) on either side of the screen, typically of neutral and emotional (e.g., dysphoric) valences. Both stimuli then disappear and are replaced by a dot in the former location of one of the two stimuli. Participants indicate with a keyboard press whether the dot appeared on the left or right side of the screen, and a bias score is derived from these reaction-time data. The bias score is used to infer attentional bias: attentional preference for sad stimuli is presumed when participants exhibit shorter reaction times when the dot appears in the location of the sad stimulus (implying the participant was already looking there) and longer reaction times when the dot appears in the location of the neutral stimulus (suggesting the participant was looking in the location of the sad stimulus and needed to disengage and shift attention to the other side of the screen).
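The reaction-time bias score described above can be sketched in a few lines of Python. The trial fields (`rt`, `probe_at_sad`) and the sign convention (positive = inferred preference for sad stimuli) are illustrative assumptions, not taken from any particular dot-probe implementation.

```python
from statistics import mean

def dot_probe_bias(trials):
    """Reaction-time bias score for one participant.

    Each trial is a dict with (hypothetical) fields:
      'rt'           - reaction time in ms
      'probe_at_sad' - True if the dot replaced the sad stimulus (congruent)

    A positive score means faster responses when the probe replaced the
    sad stimulus, i.e., an inferred attentional preference for sad stimuli.
    """
    congruent = [t["rt"] for t in trials if t["probe_at_sad"]]
    incongruent = [t["rt"] for t in trials if not t["probe_at_sad"]]
    return mean(incongruent) - mean(congruent)
```

For example, mean reaction times of 510 ms on congruent trials and 620 ms on incongruent trials would yield a bias score of 110 ms toward the sad stimuli.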

However, the psychometrics of the traditional reaction-time bias score derived from the dot-probe task are quite poor [19]. In particular, the bias metric has repeatedly been found to have poor test-retest reliability [20,21]. A recent evaluation of the threat version of the dot-probe (where participants are presented with threat and neutral stimuli) involving 9,600 participants found that, across 36 different variations of the task, none produced internal reliabilities sufficient to justify the use of the bias difference score [22]. Attempts have been made to improve the psychometrics through computational modeling [23], as well as through the use of eye-tracking instead of reaction-time data to compute the bias score from the dot-probe [24]. Methodologically, eye-tracking is a more precise means of measuring participant gaze than inferring it from reaction-time data. Moreover, the components of the bias score are often heavily correlated when using reaction-time data, which contributes to instability of the bias score [25,26]. Despite these efforts, there have been continued calls to develop more psychometrically sound assessments of attentional bias [27].

The second principal challenge is that behavioral approaches to measuring attention bias in depression are largely an artifact of history and may not be optimally designed for depression in particular. The dot-probe task was first used to assess attention bias in patients with clinical levels of anxiety (e.g., the aforementioned threat dot-probe), and the verbal stimuli used depicted words that corresponded to social or physical threats [18]. Attentional bias in anxiety is characterized by hypervigilance and rapid, initial orienting to threat, and the design of the dot-probe task was originally suited to capture this process as it manifests in anxiety. In contrast, attentional bias in depression is marked by more elaborative processing, such that when individuals’ attention is captured by dysphoric stimuli, they experience difficulty shifting their attention away [6,28]. Indeed, later research demonstrated that attentional bias effects in depression tend to emerge when stimuli are presented for longer durations [2], suggesting that different paradigms may be needed to optimally measure attention bias as it manifests in depression. This led to the development of free-viewing tasks that allow for more sustained processing of emotional information to better capture bias effects [4,29–31]. Importantly, given these poor psychometrics and misaligned task designs, the ability to test the role of attention bias in cognitive theory has been significantly hampered.

Deriving metrics from eye-tracking data collected during free-viewing tasks appears to produce better psychometrics than the dot-probe task, particularly when using extended dwell time rather than early orienting measures [32,33]. However, collecting eye-tracking data in person is sometimes unfeasible (due, for instance, to expensive equipment or when trying to recruit participants from other geographic locations). In these instances, mouse-tracking can be paired with free-viewing tasks as a proxy for eye-tracking. Prior work has shown that mouse-tracking can measure attention with high validity and reliability [34]. Mouse-tracking functions by first occluding the stimuli on the screen using a Gaussian-blurred overlay, mimicking peripheral vision. Participants then use a clear, circular aperture, controlled by moving their mouse, to “view” stimuli, allowing researchers to track the amount of time spent “viewing” emotional versus neutral stimuli. In our prior evaluation, dwell-time metrics derived from mouse-tracking data had good psychometric properties [35], consistent with what others have found [36].

In the present study, we examined the role of attentional bias in depression using a gold-standard eye-tracking method and a more novel mouse-tracking method. We used the same attention bias task with both approaches: a free-viewing task that presents a 4 x 4 matrix of sad and neutral facial expressions [30,31]. Importantly, the metrics derived from these free-viewing tasks seem to have good psychometric properties [30]. Having first demonstrated that dwell time for facial stimuli had strong internal consistency and unidimensionality in both the eye-tracking version of the task [33] and the mouse-tracking version [35], we turn toward using these metrics to evaluate whether depressed individuals display an attentional bias for dysphoric stimuli (faces) across one eye-tracking and two mouse-tracking samples. To date, no studies have used this free-viewing paradigm with mouse-tracking among people with depression. The current study involved data collection from three samples. In Experiment 1, we analyzed data from N = 129 college students who completed an eye-tracking version of this free-viewing task. In Experiment 2, we tested a similar version of the task using mouse-tracking in an online sample of N = 79 college students. In Experiment 3, we attempted to replicate the findings of Experiment 2 in a larger sample (N = 154) of college students.

Using this free-viewing task, our aim was to determine whether attentional biases are observed among those with elevated depression, as predicted by Beck’s cognitive theory of depression. Specifically, we examined whether depression severity was associated with greater attention for dysphoric stimuli relative to neutral stimuli. We expected a significant depression severity by stimuli valence interaction, as higher depression severity was expected to be associated with more time spent viewing dysphoric stimuli compared to neutral stimuli. This work has important implications for obtaining assessments of attention bias remotely (thereby opening up the possibility for large-scale online research) and for testing an important aspect of the cognitive theory of depression.

Methods and Procedures

Experiment 1

Methods.

The procedures and methods for Experiment 1 are described in detail in a previous publication documenting the psychometrics of the data derived from this task [33]; however, we briefly describe our sample and recapitulate the important methodological details here.

Participants.

We initially collected data from 138 college students who received course credit for their participation. We excluded 9 individuals due to issues completing the experiment (only completing one of the two blocks and/or technical malfunctions during the task administration), or missing more than 50% of their data. This left us with a sample of N =  129.

Participants were eligible for the study so long as they were (a) between the ages of 18-45 years old; (b) able to speak, read, and understand English fluently; and (c) willing and able to provide informed consent. The study was approved by the University of Texas Institutional Review Board and all participants gave written consent to participate through REDCap (Research Electronic Data Capture), a secure web application for building and managing online studies. Data collection began on 3/14/2019 and concluded on 12/4/2019, spanning the spring, summer, and fall academic semesters of 2019.

Descriptive characteristics can be found in Table 1. Average age was 19.4 (SD =  1.4). The sample was mostly women (55.8%), non-Hispanic (67.4%), and the most common race reported was white (48.8%).

Materials.

In this sample, depression severity was measured using the Beck Depression Inventory-II (BDI-II; [37]). In the present study, we used a 20-item version that excluded the suicidal ideation item. We removed the suicidal ideation item because immediate evaluation and triage would not be possible, as the experimental sessions were conducted by undergraduate (i.e., unlicensed) research assistants. In this sample, the 20-item BDI-II had strong internal consistency (α = .91, 95% CI [.88, .93]).

Eye-tracking apparatus and procedure.

Eye-tracking was captured via a video-based eye tracker (EyeLink 1000 Plus Desktop Mount; SR Research, Osgoode, ON, Canada). Dominant eye for each participant was determined using a modified version of the near-far alignment task [38]. Calibration was completed prior to beginning the task in order to map the participants’ gaze onto the screen coordinates; a 13-point calibration routine was used. We allowed for natural head movement (e.g., we did not use a headrest), and used a head-based tracker to provide consistent eye-tracking. Sampling was taken at 250 Hz using the participant’s dominant eye.

The eye-tracking task was presented on a 23.6-inch VIEWPixx monitor (VPixx Technologies, Quebec, Canada) with a screen resolution of 1920 × 1080 pixels (120 Hz refresh rate). The task was run using OpenSesame, a graphical experiment builder with options for scripting via PsychoPy, an open-source package for running experiments [39]. OpenSesame also integrates with EyeLink software for data acquisition. Data were initially processed using EyeLink Data Viewer.

Eye-tracking attention bias task.

In this study, we altered a previous version of a task originally designed by Lazarov and colleagues [30,31,40], which we have previously described in detail [33]. Matrices of sixteen faces in a 4 x 4 array are presented in each trial of this task, consistent with other free-viewing approaches that have been used in the literature for assessing more naturalistic patterns of visual attention [4].

Stimuli were chosen from the FACES dataset and were selected based on previously documented accuracy ratings [41]. In our task, we used equal numbers of middle-aged and young-adult actor images and did not use the older adult facial stimuli in an effort to be more consistent with the demographics of our participants.

Each trial consisted of a 4 x 4 matrix of photographs on a black background. When designing the task, we selected 32 images of 8 male and 8 female actors, with a neutral and a sad face from each actor (i.e., 8 sad male faces, 8 neutral male faces, 8 sad female faces, and 8 neutral female faces). Only sad and neutral facial expressions were used in this task. We separated the images into four pools and generated 15 trials from each pool for a total of 60 trials. Consistent with the design of Lazarov and colleagues, the generation of the matrices was random with the following constraints: (a) each actor could appear only once in the matrix, (b) there was an even split of genders in each matrix (8 male and 8 female), (c) there was an even split of valences in each matrix (8 neutral and 8 sad), and (d) the four inner faces always contained two emotional and two neutral faces. We refer to this task hereafter as the matrix task. An example trial can be found in Figure 1.
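The constrained randomization described above can be sketched as follows. This is a minimal illustration of constraints (a)–(d) using placeholder actor labels; the actual trial generation also drew from four separate image pools, which is omitted here.

```python
import random

# Placeholder labels: 8 male (M1-M8) and 8 female (F1-F8) actors
ACTORS = [f"{g}{i}" for g in ("M", "F") for i in range(1, 9)]

def make_matrix(rng=random):
    """Generate one 4x4 trial matrix of (actor, valence) pairs.

    Constraints (a) each actor appears once and (b) 8 male / 8 female
    hold by construction, because all 16 actors fill the 16 cells.
    """
    # Constraint (c): exactly 8 of the 16 faces are sad
    sad = set(rng.sample(ACTORS, 8))
    faces = [(a, "sad" if a in sad else "neutral") for a in ACTORS]
    rng.shuffle(faces)
    # Constraint (d): the four inner cells must hold 2 sad + 2 neutral;
    # reshuffle until satisfied (simple rejection sampling)
    inner = [5, 6, 9, 10]  # flat indices of the 2x2 centre of a 4x4 grid
    while sum(faces[i][1] == "sad" for i in inner) != 2:
        rng.shuffle(faces)
    return [faces[r * 4:(r + 1) * 4] for r in range(4)]
```

Rejection sampling is a reasonable choice here because roughly 40% of random shuffles already satisfy the inner-cell constraint, so the loop terminates quickly.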

Fig 1. Example trial of the matrix task.

Note: In our actual trials, we had the following constraints, consistent with Lazarov, Abend, and Bar-Haim (2016): (a) each actor could appear only once on the matrix, (b) there was an even split of genders in each matrix (8 male and 8 female), (c) there was an even split of valences in each matrix (e.g. 8 neutral and 8 sad), and (d) the four inner faces always contained two emotional and two neutral faces. This sample trial uses images approved for publication, so while criteria (b) - (d) are met, criterion (a) is not fulfilled in this example [41].

https://doi.org/10.1371/journal.pone.0318923.g001

Our task consisted of 60 trials displayed for 10 seconds each, and trials were arranged in two blocks of 30 trials each. Between blocks, participants were allowed to take a break and were told to press the spacebar when they were ready to continue. Instructions for the task told participants to fixate on the fixation dot presented at the beginning of each trial. The following instructions were given for the task, “Before each matrix, a fixation dot will appear on the screen. Make sure to fixate on the dot when it appears. When the matrix appears, look at the images freely and naturally. Do you have any questions?” These instructions were used to allow for and encourage naturalistic processing of the images [30].

Eye Tracking Metrics.

From each stimulus presentation, we created 16 areas of interest (AOIs), one for each individual photograph presented in the image grid during each trial. We then collapsed the AOIs into two categories, one sad AOI and one neutral AOI for the corresponding faces (i.e., two AOIs per trial). We computed dwell time for sad and neutral faces, which in our prior work has shown good internal consistency and unidimensionality [33]. Dwell time was operationalized as the sum of fixation durations across the course of the whole trial and is presented in milliseconds (ms).
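Collapsing the 16 per-face AOIs into two categories and summing fixation durations amounts to a simple aggregation. This sketch assumes each fixation has already been mapped to its collapsed "sad" or "neutral" AOI category; the data layout is chosen for illustration, not taken from the actual processing scripts.

```python
from collections import defaultdict

def dwell_times(fixations):
    """Sum fixation durations (ms) per collapsed AOI category for one trial.

    `fixations` is a list of (aoi_category, duration_ms) tuples, e.g.
    [("sad", 310), ("neutral", 220), ...]. Fixations landing outside any
    AOI are assumed to have been excluded upstream.
    """
    totals = defaultdict(int)
    for category, duration in fixations:
        totals[category] += duration
    return dict(totals)
```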

Procedure.

Participants were recruited through SONA systems, a cloud-based recruitment and participant management system geared toward university research studies. This study was conducted by trained research assistants in the Mood Disorders Laboratory on-campus at the University of Texas at Austin. Participants completed the study in exchange for 1 hour of course credit.

Participants completed a consent form, followed by a demographic form and self-report questionnaires. They then completed the matrix task and were awarded course credit at the end of the experiment session.

Experiment 2

Methods

Participants.

We collected a sample of 108 college students from the University of Texas participant pool. However, before running our main analyses, we performed a number of data quality assessments typically performed when collecting data remotely and fully unsupervised, ultimately retaining a sample of N =  79. Each step of our data quality assessment is detailed in the section Data Quality Analysis below, along with the number of people excluded in each step.

Participants were eligible for the study so long as they were (a) between the ages of 18-45 years old; (b) able to speak, read, and understand English fluently; and (c) willing and able to provide informed consent. Participants were also required to use a desktop or laptop computer to access the study. This study was approved by the University of Texas Institutional Review Board. Since participants consented to the study online through clicking a checkbox, we obtained a waiver of consent documentation from the IRB. Data collection took place between 11/23/2021-12/13/2021.

Descriptive characteristics of the sample can be found in Table 1. Our sample was majority female (50.6%), white (54.4%), and non-Hispanic (68.4%). Average age was 19.3 (SD =  1.3) and mean PHQ-8 score was 11.2 (SD =  6.4), indicating mild depression severity.

Materials

PHQ-8.

We measured depression severity via the Patient Health Questionnaire-8 (PHQ-8). The PHQ-8 is identical to the PHQ-9 except that the suicide item has been removed. The PHQ-8 is potentially advantageous in internet-based research, where immediate follow-up for a risk evaluation is not logistically possible [42]. The PHQ-8 appears to have similar properties to the PHQ-9 [43], and one report found no significant differences in internal consistency and validity between the 8- and 9-item versions [44]. Internal consistency in our sample was high (alpha = .91, 95% CI [.87, .94]).

IDAS Dysphoria.

As a test of robustness, in this study we included a secondary measure of depression severity, the Inventory of Depression and Anxiety Symptoms (IDAS) dysphoria subscale. The IDAS questionnaire was derived using factor analysis and the dysphoria subscale appears to have good psychometric properties [45,46] including unidimensionality justifying use of a total sum score. Internal consistency in our sample was also excellent (alpha = .94, 95% CI [.92,.96]).

Attentive Responding Scale - Infrequency.

Given that this study was administered online, we incorporated a number of attention checks to ensure participants were engaged in the task. For data cleaning purposes, we used the 11-item Infrequency subscale of the 33-item Attentive Responding Scale [47] (further information about all data cleaning procedures is detailed below in the Assessment of Online Data Quality section). The Infrequency subscale is designed to capture participants with a highly unlikely pattern of responding. Items are designed to elicit highly skewed responses from most participants (e.g., “My favorite subject is agronomy”, “I love going to the DMV (Department of Motor Vehicles)”, “My main interests are coin collecting and interpretive dancing”). Frequent endorsement of these items results in a higher score. Participants who scored above a previously established threshold (> 11.5 [47]) were excluded on the basis of inattentive responding.

Mouse-tracking attention bias task.

Our mouse-tracking task was very similar to the task described in Experiment 1, with a few exceptions. The major difference was the number of trials and the length of stimulus presentation. When piloting the task, we observed that it takes longer to navigate through all the images in the matrix using the mouse cursor than when scanning them visually. Therefore, we doubled the presentation time of each trial to 20 seconds and reduced the total number of trials to 30, distributed across two blocks (15 trials each). We reduced the number of trials so as not to increase the total time spent on the task and thereby increase participant burden.

Between blocks, participants were allowed to take a break and were told to press the spacebar when they were ready to continue. Instructions for the task read: “In this task, you will use your mouse to look at the pictures. You will need to click on the fixation cross to make the pictures appear. You can then look freely at the pictures in any way you choose until they disappear. Press the spacebar when you are ready to continue with the task.” The requirement to click the fixation cross at the beginning of each trial was implemented for engagement and to ensure participants started at the same point on the screen for each trial. In each trial, participants were presented with a 4 x 4 matrix of photographs; in this study, the images appeared on a white background.

In combination with the task, we used Mouseview.js, a JavaScript library that is integrated with the Gorilla experiment builder. The Mouseview software first applies a Gaussian blur overlay, a photo-processing technique that blurs the images on the screen using a Gaussian function; the entire webpage is blurred and all images on the screen are obfuscated. The participant can then view portions of the screen by moving a small circular aperture with the computer’s mouse; images are visible only where the aperture, tied to the location of the mouse, hovers over them. Mouseview tracks the coordinates of the aperture viewing window to identify where and for how long participants have been viewing stimuli. Participants were instructed to look at the pictures by moving their mouse any way they chose until the pictures disappeared, and all participants were first given a practice trial as a demonstration. An example trial of the free-viewing task presented with Mouseview can be found in Figure 2. The Gaussian overlay can be adjusted in terms of color (e.g., black vs. color overlay), opacity (how transparent the overlay is), and degree of blur. We used the default settings within Gorilla (color = black; alpha (opacity) = 0.8; Gaussian blur = 20). We also maintained the default size for the viewing aperture (aperture size = 5% of the computer screen).

Fig 2. Example trial of matrix task with Mouseview.js when stimuli are occluded via Gaussian Blur, and the mouse-aperture is hovering over an image.

https://doi.org/10.1371/journal.pone.0318923.g002

Mouse Tracking Metrics.

Similar to the eye-tracking data described above, 16 AOIs were created from the 4x4 matrix in each trial. The AOIs were then collapsed into two omnibus categories per trial: one sad AOI and one neutral AOI for the corresponding faces. We computed total scan time for sad and neutral faces on a per trial basis. Scan time was operationalized as the sum of the amount of time the aperture was positioned over the faces across the course of the whole trial (presumably reflecting attention), and is presented in milliseconds (ms). We have previously established that the metrics used in this analysis have good psychometric properties [35].
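Accumulating scan time from aperture positions can be sketched as a rectangle hit-test over timestamped samples. The sampling format and AOI representation here are illustrative assumptions, not Mouseview.js’s actual output format.

```python
def scan_times(samples, aois):
    """Accumulate aperture time (ms) over sad vs. neutral AOIs for one trial.

    `samples` is a chronological list of (timestamp_ms, x, y) aperture
    positions; `aois` maps category -> list of (x0, y0, x1, y1) rectangles.
    The interval between consecutive samples is credited to the category
    whose rectangle contains the earlier sample's position; time spent
    outside all AOIs is discarded.
    """
    totals = {category: 0 for category in aois}
    for (t0, x, y), (t1, _, _) in zip(samples, samples[1:]):
        for category, rects in aois.items():
            if any(x0 <= x < x1 and y0 <= y < y1 for x0, y0, x1, y1 in rects):
                totals[category] += t1 - t0
                break
    return totals
```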

Procedure.

Participants were recruited through SONA systems, a cloud-based recruitment and participant management system geared toward university research studies. Within SONA they were given a link to access the study hosted on Gorilla – an online platform for running behavioral experiments [36]. All aspects of this study were conducted through the Gorilla platform (e.g., consent, questionnaires, behavioral tasks).

Participants completed the study in exchange for 1 hour of course credit. While the study protocol was less than 1 hour, participants were given a 2-hour window from the time they initiated the study to finish the questionnaires and tasks (although some of the tasks had time limits). If participants were interrupted or were not aware of the 2-hour time limit and wanted to be able to complete the study for course credit, they had to restart the study from the beginning. No one was given more than 1 additional attempt at completing the study.

Participants completed a consent form at the beginning of the study. They were then randomized to one of two studies, the other of which is not described in this report; no participants completed both studies. Next, they filled out a demographic form and self-report questionnaires. They then completed the matrix task, along with other behavioral tasks not relevant to the current study. Finally, they completed a question asking how much effort they had put into the study (detailed further below) and were awarded course credit.

Assessment of Online Data Quality.

Given the importance of assessing participant attention and effort [47], particularly in online research [48,49], we incorporated a number of attention checks [50] into our data cleaning pipeline for our online samples prior to computing our results. First, we removed individuals who were missing 50% or more of their trials in the matrix task. Trials were also coded as missing if participants did not move the aperture over at least 4 faces in the trial (25% of the stimuli in the trial). Thus, if participants were engaging with less than a quarter of the stimuli per trial for 50% or more of the trials, we removed them on the basis of poor engagement (9 participants).

Next, we considered a measure of engagement, the Infrequency subscale [47]. Using the previously established cutoff score, we excluded 3 individuals who scored above 11.5 on the ARS Infrequency scale, which is designed to capture a pattern of highly unusual responses signaling inattention to the task (further details on this measure can be found in the Materials section). One additional individual was excluded for failing to complete the self-report measures.

At the end of the experiment, we asked participants to rate their overall level of engagement: “Please answer the following question honestly. You will still get credit for your participation, regardless of your answer. How much effort did you put into the tasks in this study?” Participants could select from the following responses: 1 - “A lot; I tried my best the whole time”; 2 - “Some; I gave it a shot”; 3 - “Not much; I was distracted”; 4 - “None; I clicked through randomly.” We then excluded 3 individuals who indicated a 3 or 4 on this item, self-disclosing that they had been distracted during the experiment.

At this point, we had a sample of 92 participants. However, after data collection was completed, the Gorilla software documentation was updated to warn experimenters that the Gaussian blur component of Mouseview does not work as intended in the Safari browser. Therefore, we had to exclude an additional 13 participants who had accessed the experiment on a Safari browser, yielding a sample size of N = 79. Note that the same individuals may have met more than one data quality exclusion criterion. This level of attrition ((108 − 79)/108 = 26.8%) is typical for remotely conducted studies [50–52].
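The sequential exclusion pipeline above can be summarized as an ordered list of filters. The field names and dictionary layout are illustrative, not taken from the study’s actual cleaning scripts.

```python
def apply_exclusions(participants):
    """Sequentially apply the Experiment 2 data-quality filters.

    Each participant is a dict with hypothetical fields:
      'prop_missing_trials' - proportion of matrix trials coded missing
      'ars_infrequency'     - ARS Infrequency subscale score
      'has_self_report'     - completed the self-report measures
      'effort'              - self-rated effort item (1-4)
      'browser'             - browser used to access the study

    Returns the retained participants and a per-step exclusion tally.
    """
    steps = [
        ("engagement", lambda p: p["prop_missing_trials"] < 0.5),
        ("infrequency", lambda p: p["ars_infrequency"] <= 11.5),
        ("self_report", lambda p: p["has_self_report"]),
        ("effort", lambda p: p["effort"] <= 2),
        ("browser", lambda p: p["browser"].lower() != "safari"),
    ]
    tally = {}
    for name, keep in steps:
        kept = [p for p in participants if keep(p)]
        tally[name] = len(participants) - len(kept)
        participants = kept
    return participants, tally
```

Because the filters run in sequence, a participant meeting several criteria is only counted at the first step that removes them, mirroring the per-step counts reported above.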

Experiment 3

Methods.

The procedures for Experiment 3 are identical to those in Experiment 2. Because Gorilla is an online behavioral software platform and our entire study was housed on Gorilla’s platform, we were able to simply make a copy of the project used for Experiment 2 and initiate recruitment, which is ideal for a replication study. The only change was to require that participants use Chrome or Firefox as their browser, to ensure the Gaussian blur overlay would work as intended (described above in the Assessment of Online Data Quality section for Experiment 2). Thus, in this methods section we report only participant demographics, the internal consistency of the depression measures in the new sample, and our assessment of data quality, as our materials, behavioral task, and data analysis protocols were the same as above.

Participants

We initially collected a sample of 176 college students from the University of Texas participant pool. We once again performed a data quality assessment, utilizing the same data cleaning pipeline found in Study 2, ultimately retaining a sample of N =  154. Further information about our Data Quality Analysis and the number of participants excluded at each step can be found below.

We retained the same eligibility criteria as experiment 2: to be eligible, participants must be: (a) between the ages of 18-45 years old; (b) able to speak, read, and understand English fluently; and (c) willing and able to provide informed consent. Participants also needed a desktop or laptop computer to access the study. This study was approved by the University of Texas Institutional Review Board. Since participants consented to the study online through clicking a checkbox, we obtained a waiver of consent documentation from the IRB. Data collection took place between 8/30/2022-12/7/2022.

Descriptive characteristics of the sample can be found in Table 1. Our sample was majority female (58.7%), white (55.8%), and non-Hispanic (63.0%). Average age was 19.1 (SD =  1.4) and mean PHQ-8 score was 9.9 (SD =  5.8), indicating mild depression severity.

Materials

PHQ-8.

Internal consistency for the PHQ-8 in this sample was high (alpha = .91, 95% CI [.88, .93]).

IDAS Dysphoria.

Internal consistency for the IDAS dysphoria scale was also high (alpha = .91, 95% CI [.89, .93]).

Assessment of Online Data Quality.

Again, we used the same data quality evaluation procedure described above for Study 2, including the same code in our scripts. First, we excluded 14 participants who were missing 50% or more of their trials in the matrix task. We excluded an additional 4 participants for inattention, operationalized as scoring higher than 11.5 on the ARS Infrequency Scale, and another 4 participants for indicating on the effort check item that they had been distracted during the experiment. Finally, because we required that participants not use Safari in this second iteration of the mouse-tracking study, we did not need to exclude anyone for browser type as we did in the prior data cleaning pipeline. Thus, we retained a sample of N = 154, with an attrition level of only 12.5% ((176 - 154)/176).
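The exclusion bookkeeping above reduces to simple arithmetic. As an illustrative sketch (the actual pipeline was implemented in R; the dictionary labels below are shorthand for the criteria described in the text):

```python
# Minimal sketch of the Experiment 3 exclusion arithmetic described above.
# Counts come from the text; the actual data-quality pipeline was written in R.

def apply_exclusions(initial, exclusions):
    """Return the retained N and the attrition rate after sequential exclusions."""
    retained = initial - sum(exclusions.values())
    attrition = (initial - retained) / initial
    return retained, attrition

exclusions = {
    "missing >= 50% of matrix trials": 14,
    "ARS Infrequency Scale > 11.5": 4,
    "self-reported distraction": 4,
}
retained, attrition = apply_exclusions(176, exclusions)
print(retained, attrition)  # 154 participants retained, 0.125 (12.5%) attrition
```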

Sample Size Justification.

A detailed report of our power analyses can be found in supplementary materials (“sample-size-justification.pdf” at https://doi.org/10.18738/T8/RIV4X9). In summary, we used a simulation approach with the R package SIMR (Green & MacLeod, 2016) to estimate statistical power for the depression severity by stimuli valence interaction in a mixed effects model used in our analyses. We tested a range of effect sizes for the focal fixed effect interaction while keeping other simulation parameters (obtained from sample 1 and sample 2, respectively) constant.

As can be seen in the supplementary materials, a sample size of 75 participants would be sufficient to detect a relatively small (d = .05) valence by depression severity interaction effect using a random-intercepts model with parameters (random-intercept standard deviations, fixed effects, residual variance, and random intercept and slope variance-covariance matrix) similar to those observed in Study 1. Thus, we had adequate statistical power to detect a relatively small depression severity by stimuli valence interaction in Experiment 1 (N = 129).

Not surprisingly, sample size requirements are more demanding for the mouse-tracking method, given that the data are collected in less controlled environments, which should increase random error. Power analyses suggest that for our mouse-tracking studies, a sample size of about 200 would be sufficient to detect a relatively small (d = .05) valence x depression severity interaction effect using a random-intercepts model with parameters similar to those observed in Study 2. Thus, Studies 2 (N = 79) and 3 (N = 154) may have been underpowered to detect relatively small interaction effect sizes, but statistical power to detect a larger (i.e., d = .08) depression severity x stimuli valence effect would be considered adequate.

Data

Assessment of internal consistency.

We calculated and reported both alpha and omega total for each metric using the psych package in R [53]. We report both in response to calls to examine multiple metrics of internal consistency beyond Cronbach’s alpha [54].
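For readers unfamiliar with the alpha computation that the psych package performs, the formula is alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). A minimal stdlib-Python illustration (the item scores below are hypothetical; the reported coefficients were computed in R):

```python
# Illustrative Cronbach's alpha; the reported analyses used the psych package in R.
from statistics import variance

def cronbach_alpha(items):
    """items: one inner list of scores per item, all of equal length (respondents)."""
    k = len(items)
    item_vars = sum(variance(item) for item in items)
    totals = [sum(scores) for scores in zip(*items)]
    return k / (k - 1) * (1 - item_vars / variance(totals))

# Hypothetical 3-item scale, 4 respondents; perfectly consistent items give alpha = 1.
scores = [[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6]]
print(cronbach_alpha(scores))  # 1.0
```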

Data Analysis.

Our data analysis approach was the same across studies and reused the same segments of code for consistency. Data were analyzed in R (version 4.4.0). We used the tidyverse packages for data cleaning and processing [55], along with knitr [56], gridExtra [57], and ggpubr [58]. We also used itrak, a package developed in-house for processing eye-tracking data [59], and adapted it for use with the MouseView data. Mixed-effects models were fit using lmerTest [60], which uses lme4 [61] to run the models and then calculates degrees of freedom with the Satterthwaite method, along with significance tests for the coefficients. Assumptions were tested using the performance package [62]. Plots of the models were made with sjPlot [63], sjmisc [64], and ggplot2 [65]. Tables of model coefficients were created using rempsyc [66]. We calculated the internal consistency of the depression measures in our sample (IDAS dysphoria and PHQ-8) using ltm [67]. All code and data used in the current studies are available within the Mood Disorders Laboratory’s dataverse, hosted here: https://doi.org/10.18738/T8/RIV4X9.

Statistical Model.

We ran linear mixed-effects models predicting stimuli viewing time, that is, the amount of time participants spent looking at stimuli: dwell time (in milliseconds) for eye tracking (Experiment 1) and scan time (in milliseconds) for mouse tracking (Experiments 2 and 3). In these models, we examined the interaction between the fixed effects of stimuli valence and participant depression score. A random effect for participant (id) was also modeled. Initially, we attempted to include random effects for both stimulus presentation (ia_id) and trial alongside the interaction of stimuli valence and participant depression, but this model was too complex for the data (i.e., the model would not converge). We then iterated through the following options for random effects, in order: 1. (1 | ia_id) + (trial | id); 2. (1 | ia_id) + (trial || id); 3. (1 | ia_id) + (1 | id); 4. (trial | id); 5. (trial || id); 6. (1 | id). Only the last model, with a random intercept for id, converged, so we instead included trial as a fixed effect. Thus, our model specification was: viewing time ~ valence * depression + trial + (1 | id).
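The model just described, a valence-by-depression interaction plus trial as fixed effects and a random intercept per participant, can be sketched as follows. This is an illustration on simulated data with variable names of our own choosing; the reported analyses were fit with lmerTest/lme4 in R, but statsmodels' MixedLM fits the same random-intercept structure:

```python
# Illustrative fit of the final model structure (valence x depression + trial,
# random intercept per participant) on simulated data. The reported analyses
# used lmerTest/lme4 in R; all parameter values below are made up.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
rows = []
for pid in range(30):                      # 30 simulated participants
    dep = rng.uniform(0, 24)               # simulated depression score
    intercept = 1500 + rng.normal(0, 100)  # per-participant random intercept
    for trial in range(16):
        for valence in (0, 1):             # 0 = neutral, 1 = sad
            dwell = intercept - 5 * trial - 3 * dep * valence + rng.normal(0, 50)
            rows.append({"id": pid, "trial": trial, "valence": valence,
                         "depression": dep, "dwell": dwell})
df = pd.DataFrame(rows)

# groups=df["id"] gives the (1 | id) random intercept; the formula gives the
# valence * depression interaction plus trial as fixed effects.
model = smf.mixedlm("dwell ~ valence * depression + trial", df, groups=df["id"])
result = model.fit()
print(result.params["valence:depression"])  # recovers a negative interaction
```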

Results

Internal consistency of metrics

We examined the internal consistency of each of our dwell/scan time metrics. While the eye-tracking metric had the strongest internal consistency (alphas = .95 for both sad and neutral dwell time), the mouse-tracking data also showed strong psychometrics (alphas between .80 and .86). A full description of the internal consistency indices can be found in Table 2.

Tests of Attentional Bias in Depression

Experiment 1 (Eye-tracking).

BDI-II: Mixed effects model results for the first sample can be found in Table 3. The amount of time participants spent fixating on faces was significantly predicted by the interaction of BDI-II score and stimuli valence (β = -4.915, t = -2.099, p = .036). As depression severity increased, dwell time declined more steeply for neutral faces than for sad faces; in other words, at higher levels of depression, dwell time for sad stimuli was greater than dwell time for neutral stimuli. A plot of these effects can be found in Figure 3.

Table 3. Experiment 1: Mixed effects model predicting Dwell Time (ms) from depression severity (BDI-II), stimuli valence, and trial.

https://doi.org/10.1371/journal.pone.0318923.t003

Fig 3. Significant Interaction of depression severity (BDI-II) and stimuli valence on dwell time.

https://doi.org/10.1371/journal.pone.0318923.g003

Experiment 2 (Mouse-tracking)

PHQ-8.

Results of the model using the PHQ-8 as the measure of depression severity can be found in Table 4. The amount of time spent scanning faces was again predicted by the interaction of depression severity and image valence, although this effect did not reach significance (β = -24.715, t = -1.91, p = .056). These results were in the same direction as the pattern seen in Study 1: depression reduced scan times for both face types but produced a shallower reduction for sad faces. A plot of these effects can be found in Figure 4A. In our sensitivity analyses (which can be found in the Supplementary Materials file “depressionbyvalence_paper.pdf” at https://doi.org/10.18738/T8/RIV4X9), we also observed a three-way interaction of depression severity, stimuli valence, and trial (β = -6.25, t = -4.16, p < .001). Individuals with greater depression severity spent more time viewing sad stimuli than neutral stimuli as the task progressed, whereas individuals with lower depression severity spent less time viewing sad stimuli and more time viewing neutral stimuli as the task progressed.

Table 4. Experiment 2: Mixed effects model predicting Dwell Time (ms) from depression severity (PHQ-8), neutral stimuli valence, and trial.

https://doi.org/10.1371/journal.pone.0318923.t004

Fig 4. Interaction of depression severity and stimuli valence when using PHQ-8 (4A, marginally significant) and IDAS Dysphoria subscale (4B, significant).

https://doi.org/10.1371/journal.pone.0318923.g004

IDAS Dysphoria Subscale.

As a test of robustness, we fit an identical model using the IDAS Dysphoria subscale instead of the PHQ-8. Results of this mixed effects model can be found in Table 5. The amount of time participants spent scanning faces was significantly predicted by the interaction of depression severity and stimuli valence (β = -18.145, t = -2.273, p = .023). Individuals with higher depression scores spent less time viewing faces overall, but the reduction in viewing time was smaller for sad faces than for neutral faces (Figure 4B). As with the PHQ-8 model, these results were explained by a higher-order three-way interaction of depression, stimuli valence, and trial in our sensitivity analyses (β = -3.62, t = -3.92, p < .001). Individuals with elevated depression symptoms spent more time viewing sad faces than neutral faces as the task progressed, whereas participants endorsing fewer depression symptoms spent more time viewing neutral faces and less time on sad faces as the task progressed.

Table 5. Experiment 2: Mixed effects model predicting Dwell Time (ms) from depression severity (IDAS dysphoria subscale), neutral stimuli valence, and trial.

https://doi.org/10.1371/journal.pone.0318923.t005

Experiment 3 (Mouse-tracking)

PHQ-8.

The mouse-tracking replication study diverged from the pattern of results in the other samples (Table 6). The interaction of depression severity and stimuli valence did not significantly predict scan time in the task (β = 10.706, t = 1.07, p = .285; see Figure 5A). After removing the interaction term, there was a main effect of valence, such that participants spent more time looking at neutral faces across the range of depression severity (β = 249.870, t = 4.29, p < .001), and a main effect of trial, such that scan time dropped as the task progressed (β = -21.40, t = -6.261, p < .001). These effects were qualified by a higher-order valence-by-trial interaction in the sensitivity analyses (β = 13.756, t = 2.028, p = .043), whereby participants spent less time viewing stimuli as the task progressed, especially sad stimuli. All follow-up analyses can be found in the “depressionbyvalence_paper.pdf” document at https://doi.org/10.18738/T8/RIV4X9.

Table 6. Experiment 3: Mixed effects model predicting Dwell Time (ms) from depression severity (PHQ-8), neutral stimuli valence, and trial.

https://doi.org/10.1371/journal.pone.0318923.t006

Fig 5. No interaction of depression severity and stimuli valence when using PHQ-8 (5A) or IDAS Dysphoria subscale (5B).

https://doi.org/10.1371/journal.pone.0318923.g005

IDAS Dysphoria.

We again failed to replicate the finding that depression severity was associated with greater scan time for sad than neutral faces (β = 0.907, t = 0.142, p = .887; Table 7, Figure 5B). After removing the interaction term, there was again a main effect of valence (β = 249.870, t = 4.29, p < .001), such that participants spent more time viewing neutral than sad stimuli regardless of depression severity. There was also a significant main effect of trial, demonstrating that participants spent less time viewing stimuli as the task progressed (β = -21.40, t = -6.262, p < .001). This was again qualified by a higher-order interaction of valence and trial (β = 13.743, t = 2.026, p = .043), whereby viewing time declined over the course of the task, particularly for sad stimuli.

Table 7. Experiment 3: Mixed effects model predicting Dwell Time (ms) from depression severity (IDAS dysphoria subscale), neutral stimuli valence, and trial.

https://doi.org/10.1371/journal.pone.0318923.t007

We also conducted sensitivity analyses for each of the three samples examining the effects of trial, gender, and time of the academic semester (these analyses can be found in the Supplementary Materials). No consistent pattern of effects emerged across samples.

Discussion

The putative role of attention bias for dysphoric information in depression has been challenging to characterize, in part because of poor measurement. Thus, we used a psychometrically robust, free-viewing task designed to capture the elaborative processing of complex emotional stimulus configurations using eye-tracking and mouse-tracking approaches, which more directly measure line of visual gaze rather than inferring attention from reaction time. We examined whether attentional bias for dysphoric information was associated with depression severity in three samples (in person with eye-tracking, N = 129; remotely with mouse-tracking, N = 79 and N = 154). The results in Studies 1 and 2 generally supported a depression-by-valence interaction, such that people with greater depression severity looked longer at sad faces relative to neutral faces compared to people with fewer depression symptoms. These results did not replicate in Study 3.

These findings are consistent with other studies that examine dwell time collected via eye-tracking-based free-viewing tasks. Individuals with elevated depression symptoms tend to spend more time looking at sad and dysphoric stimuli relative to neutral or happy faces [7], which is consistent with what we found in our eye-tracking study and the first mouse-tracking sample. Specifically, in regard to the matrix task, our results are mostly consistent with previous work. When using a happy/sad facial expression version of the matrix with eye-tracking, Lazarov et al. [30] found a significant group-by-valence interaction for average dwell time for individuals with clinical depression, high depression symptoms, and low depression symptoms. While all groups spent more time looking at happy faces relative to sad, the clinically depressed and high-depression severity groups spent significantly more time looking at sad faces than did the low-depression participants [30]. Klawohn et al. [31] also found evidence of a stimuli valence by group interaction (depressed participants versus healthy controls). Healthy control participants spent more time looking at neutral faces than sad, while depressed participants did not display this preference and allocated their attention evenly. The authors also reported that follow-up analyses confirmed depressed participants also viewed sad faces for longer periods than healthy control participants did [31].

Finally, Basel et al. [68] compared the performance of low-depression and high-depression participants across happy-neutral, sad-neutral, and sad-happy blocks of trials in their administration of the matrix task. They collapsed areas of interest (AOIs) into relatively negative and relatively positive stimuli, with valence determined by what the stimuli were paired with in each block: neutral stimuli were considered negative when paired with happy stimuli but positive when paired with sad stimuli (hence the term “relatively” negative and “relatively” positive). Using this approach, Basel and colleagues reported a significant group-by-AOI-type interaction in which low-depression individuals displayed a relatively positive bias, preferentially attending to relatively positive stimuli in the matrices. In contrast, high-depression individuals distributed their attention more evenly across positive and negative stimuli, reflecting what the authors referred to as a more “accurate” attention distribution. While we did not find evidence for a positive bias (e.g., avoidance of sad faces) amongst our low-depression participants, our more-depressed participants in the eye-tracking sample and one mouse-tracking sample had greater scan time for sad faces than neutral faces, consistent with Lazarov et al. [30] and Klawohn et al. [31].

Our study also extends previous work by demonstrating how negative attention bias can be assessed remotely using a task that captures the elaborative processing of emotional stimuli. Until recently, the measurement of attention bias was restricted to reaction time-based tasks like the dot-probe or assessments that utilize eye-tracking data. While eye-tracking data used in combination with free-viewing tasks seem to produce stronger attention bias metrics in depression, these data can be challenging and expensive to collect. We have shown that mouse-tracking can be used with an attention bias task to assess bias remotely without the need for expensive equipment. Further, because the task is free-viewing, it may capture more naturalistic (and elaborative) attention processes, which have been implicated as key mechanisms in depression [30,40]. This contrasts with anxiety, where attentional vigilance for threat-related stimuli is a key mechanism.

Two of our three samples provided evidence for a depression by stimuli valence interaction when predicting time spent viewing stimuli in the task. That is, for the eye-tracking study and one of the mouse-tracking studies, depression severity was associated with greater scan time for sad stimuli relative to neutral stimuli. At lower levels of depression, there was no difference between viewing time for sad and neutral stimuli. We intentionally included the null findings from Sample 3 in an effort to avoid contributing to the “file drawer” predicament in which many null results end up [69,70].

It is tempting to speculate about which results are true findings and which are the result of a Type I or II error. Confirmation bias often leads us to believe the significant effects and dismiss the non-significant findings as erroneous. However, thinking about results in a binary fashion misses the mark: even if all three samples had yielded significant results, we could only speak in terms of degrees of evidence. That is, significant findings in three samples would increase our confidence in the effect; significant results in only two of those samples tempers that confidence. Including all three studies gives a more realistic estimate of the effect, which is better for the field.

Why might we not have observed these effects across all three samples? Our power analyses demonstrated that we were underpowered to detect smaller interaction effects in our remote samples. It is also possible that the effect demonstrated in Study 2 was the result of noise. Future work will need to collect data from relatively large samples to address these concerns. Moreover, our samples had limited symptom severity. The average BDI-II score in Study 1 was 9.1 (SD = 7.9), which falls in the “minimal depression” range. In Study 2, a mean depression score of 11.2 (SD = 6.4) on the PHQ-8 falls in the “moderate depression” range, and an average of 9.9 (SD = 5.8) in Study 3 constitutes “mild depression.” Therefore, while we see this as a promising first step in assessing the feasibility of measuring depression-related attention bias with a relatively new task, both in person and remotely, the relatively small number of participants with high levels of depression may have made it more difficult to detect depression-related attention effects.

Another difference between samples was that the eye-tracking data were collected in person in the lab, while the mouse-tracking samples were collected entirely online, where participants could be more distracted. A poor testing environment could further increase measurement error in the mouse-tracking data, which in turn can make it more difficult to detect significant interactions and moderator effects [71]. This underscores the importance of testing the psychometrics of both mouse-tracking and eye-tracking task data in all applicable contexts (e.g., both remote and in person) in which we hope to administer them, rather than assuming good psychometrics in one setting will transfer to another [72].

The observed effects in this study are admittedly small. As our coefficients are measured in milliseconds (ms), each unit increase in depression scores yields a small increase in ms of viewing sad versus neutral stimuli. For instance, in the absence of depression (i.e., BDI-II score of 0), the difference in viewing time on a given trial between sad and neutral stimuli is not different from zero (14 ms). This difference expands to a difference of 78 ms per trial at minimal depression (BDI-II of 13), 107 ms at mild depression (BDI-II of 19), 152 ms at moderate (BDI-II of 28), and 211 ms at severe depression (BDI-II of 40). This means that over the course of an approximately 10-minute task, someone with severe depression will view sad stimuli for 11.8 seconds more on average than someone with no depression. We observe a similar pattern in the mouse-tracking task, with participants with severe depression symptoms (IDAS dysphoria =  40) spending 19.0 seconds longer on average viewing sad stimuli over neutral, relative to participants with no depression (IDAS dysphoria =  0).
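The worked numbers in this paragraph follow from a baseline sad-minus-neutral difference of 14 ms plus roughly 4.9 ms per BDI-II point per trial. The sketch below reproduces them; note that the 60-trial task length is our assumption, chosen so the per-trial differences accumulate to the reported ~11.8-second total:

```python
# Sketch of the per-trial arithmetic reported above (Experiment 1, eye-tracking).
# The 14 ms baseline and ~4.915 ms/point slope come from the text; the 60-trial
# count is an assumption chosen to match the reported cumulative total.

BASELINE_MS = 14.0   # sad-minus-neutral dwell difference at BDI-II = 0
SLOPE_MS = 4.915     # additional ms of sad viewing per BDI-II point per trial
N_TRIALS = 60        # assumed task length

def sad_minus_neutral_ms(bdi):
    """Per-trial sad-vs-neutral dwell-time difference at a given BDI-II score."""
    return BASELINE_MS + SLOPE_MS * bdi

for bdi in (0, 13, 19, 28, 40):
    print(bdi, round(sad_minus_neutral_ms(bdi)))  # 14, 78, 107, 152, 211 ms

# Cumulative extra sad viewing for severe (BDI-II = 40) vs. no depression:
extra_s = (sad_minus_neutral_ms(40) - sad_minus_neutral_ms(0)) * N_TRIALS / 1000
print(round(extra_s, 1))  # 11.8 seconds over the task
```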

This raises an important point: while these effects are small, they accumulate over the course of the task and, in real-world settings, over the course of the day. If individuals with elevated depression symptoms repeatedly have small windows of increased exposure to negative information throughout the day, and less exposure to neutral (or even positive) information, the cumulative effect of this bias can result in depressed persons attending to, thinking about, and taking in more negative information than someone with low levels of depression. Indeed, new work using eye tracking to capture participant gaze while viewing a manufactured news website showed that high-depression participants spent more time looking at sad articles than positive ones, compared to participants with minimal depression [73]. While the snapshot of effects here is small, the combined impact of additional moments of biased attention to dysphoric information (both external and internal) may maintain negative mood states.

This raises a second important point, and an exciting area for future research: the design and ecological validity of these tasks. While most attention bias research measures externally directed attention via visual attention to images, this is somewhat of a construct mismatch with how cognitive theory posits attention bias may maintain depression. While there may be biased processing of externally directed attention, negatively biased attention toward one’s own thoughts, moods, and internal states is also thought to play a role in core features of depression like rumination [74]. Measuring biased attention toward internal information using more personalized “stimuli” may also yield larger effects. Future work should explore the development of this type of idiographic measurement for negative attention bias. Finally, our tasks were free-viewing tasks in which participants were encouraged to view the stimuli however they wished in order to encourage naturalistic processing of the images. Future work should evaluate the effect of alternative instructions or objectives as part of the task.

Nevertheless, we believe the present study is a promising first step towards demonstrating that depression-related attention bias can be measured remotely using a free-viewing task that allows for more naturalistic processing. The use of mouse-tracking to evaluate attentional biases seems particularly appropriate for depression, given the theoretical importance of sustained processing for the maintenance of depression. Mouse-tracking may also allow researchers to collect larger and more representative samples, which would be an important step towards rigorously testing long-standing and influential theories of psychopathology, such as the cognitive model of depression.

Acknowledgements

We are grateful to the research assistants of the Mood Disorders Laboratory who assisted with data collection for Study 1.

References

1. Beck AT. Depression: Clinical, Experimental, and Theoretical Aspects. University of Pennsylvania Press; 1967. 370 p.
2. LeMoult J, Gotlib IH. Depression: a cognitive perspective. Clin Psychol Rev. 2019;69:51–66. pmid:29961601
3. Gotlib IH, Kasch KL, Traill S, Joormann J, Arnow BA, Johnson SL. Coherence and specificity of information-processing biases in depression and social phobia. J Abnorm Psychol. 2004;113(3):386–98. pmid:15311984
4. Kellough JL, Beevers CG, Ellis AJ, Wells TT. Time course of selective attention in clinically depressed young adults: an eye tracking study. Behav Res Ther. 2008;46(11):1238–43. pmid:18760771
5. Peckham AD, McHugh RK, Otto MW. A meta-analysis of the magnitude of biased attention in depression. Depress Anxiety. 2010;27(12):1135–42. pmid:21049527
6. Armstrong T, Olatunji BO. Eye tracking of attention in the affective disorders: a meta-analytic review and synthesis. Clin Psychol Rev. 2012;32(8):704–23. pmid:23059623
7. Suslow T, Hußlack A, Kersting A, Bodenschatz CM. Attentional biases to emotional information in clinical depression: A systematic and meta-analytic review of eye tracking findings. J Affect Disord. 2020;274:632–42. pmid:32663997
8. Koster EHW, De Raedt R, Goeleven E, Franck E, Crombez G. Mood-congruent attentional bias in dysphoria: maintained attention to and impaired disengagement from negative information. Emotion. 2005;5(4):446–55. pmid:16366748
9. Sanchez A, Vazquez C, Marker C, LeMoult J, Joormann J. Attentional disengagement predicts stress recovery in depression: an eye-tracking study. J Abnorm Psychol. 2013;122(2):303–13. pmid:23421524
10. Sears CR, Thomas CL, LeHuquet JM, Johnson JCS. Attentional biases in dysphoria: An eye-tracking study of the allocation and disengagement of attention. Cogn Emot. 2010;24(8):1349–68.
11. Koster E, Leyman L, Raedt R, Crombez G. Cueing of visual attention by emotional facial expressions: The influence of individual differences in anxiety and depression. Pers Individ Diff. 2006;41(2):329–39.
12. Lichtenstein-Vidne L, Okon-Singer H, Cohen N, Todder D, Aue T, Nemets B, et al. Attentional bias in clinical depression and anxiety: The impact of emotional and non-emotional distracting information. Biol Psychol. 2017;122:4–12. pmid:27422409
13. Gotlib IH, Joormann J. Cognition and depression: current status and future directions. Annu Rev Clin Psychol. 2010;6:285–312. pmid:20192795
14. De Raedt R, Koster EHW. Understanding vulnerability for depression from a cognitive neuroscience perspective: A reappraisal of attentional factors and a new conceptual framework. Cogn Affect Behav Neurosci. 2010;10(1):50–70. pmid:20233955
15. Beck AT. The evolution of the cognitive model of depression and its neurobiological correlates. Am J Psychiatry. 2008;165(8):969–77. pmid:18628348
16. Mennen AC, Norman KA, Turk-Browne NB. Attentional bias in depression: understanding mechanisms to improve training and treatment. Curr Opin Psychol. 2019;29:266–73. pmid:31521030
17. Clasen PC, Wells TT, Ellis AJ, Beevers CG. Attentional biases and the persistence of sad mood in major depressive disorder. J Abnorm Psychol. 2013;122(1):74–85. pmid:22867117
18. MacLeod C, Mathews A, Tata P. Attentional bias in emotional disorders. J Abnorm Psychol. 1986;95(1):15–20. pmid:3700842
19. Machulska A, Kleinke K, Klucken T. Same same, but different: A psychometric examination of three frequently used experimental tasks for cognitive bias assessment in a sample of healthy young adults. Behav Res Methods. 2023;55(3):1332–51. pmid:35650382
20. Schmukle SC. Unreliability of the dot probe task. Eur J Pers. 2005;19(7):595–605.
21. Staugaard SR. Reliability of two versions of the dot-probe task using photographic faces. Citeseer; 2009 [cited 2021 Aug 22]. Available from: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.476.6498&rep=rep1&type=pdf
22. Xu I, Passell E, Strong RW, Grinspoon E, Jung L, Wilmer JB, et al. No evidence of reliability across 36 variations of the emotional dot-probe task in 9,600 participants. Clin Psychol Sci. 2024.
23. Price RB, Brown V, Siegle GJ. Computational modeling applied to the dot-probe task yields improved reliability and mechanistic insights. Biol Psychiatry. 2019;85(7):606–12. pmid:30449531
24. Price RB, Kuckertz JM, Siegle GJ, Ladouceur CD, Silk JS, Ryan ND, et al. Empirical recommendations for improving the stability of the dot-probe task in clinical research. Psychol Assess. 2015;27(2):365–76. pmid:25419646
25. Trafimow D. A defense against the alleged unreliability of difference scores. Cogent Mathematics. 2015;2(1):1064626.
26. Waechter S, Nelson AL, Wright C, Hyatt A, Oakman J. Measuring attentional bias to threat: reliability of dot probe and eye movement indices. Cogn Ther Res. 2013;38(3):313–33.
27. Rodebaugh TL, Scullin RB, Langer JK, Dixon DJ, Huppert JD, Bernstein A, et al. Unreliability as a threat to understanding psychopathology: The cautionary tale of attentional bias. J Abnorm Psychol. 2016;125(6):840–51. pmid:27322741
28. Mogg K, Bradley BP. Attentional bias in generalized anxiety disorder versus depressive disorder. Cogn Ther Res. 2005;29(1):29–45.
29. Eizenman M, Yu LH, Grupp L, Eizenman E, Ellenbogen M, Gemar M, et al. A naturalistic visual scanning approach to assess selective attention in major depressive disorder. Psychiatry Res. 2003;118(2):117–28. pmid:12798976
30. Lazarov A, Ben-Zion Z, Shamai D, Pine DS, Bar-Haim Y. Free viewing of sad and happy faces in depression: A potential target for attention bias modification. J Affect Disord. 2018;238:94–100. pmid:29870821
31. Klawohn J, Bruchnak A, Burani K, Meyer A, Lazarov A, Bar-Haim Y, et al. Aberrant attentional bias to sad faces in depression and the role of stressful life events: Evidence from an eye-tracking paradigm. Behav Res Ther. 2020;135:103762. pmid:33160270
32. Skinner IW, Hübscher M, Moseley GL, Lee H, Wand BM, Traeger AC, et al. The reliability of eyetracking to assess attentional bias to threatening words in healthy individuals. Behav Res Methods. 2018;50(5):1778–92. pmid:28812285
33. McNamara ME, Hsu KJ, McSpadden BA, Risom S, Shumake J, Beevers CG. Beyond face value: assessing the factor structure of an eye-tracking based attention bias task. Cogn Ther Res. 2023;47(5):772–87.
34. Egner S, Reimann S, Hoeger R, Zangemeister WH. Attention and information acquisition: comparison of mouse-click with eye-movement attention tracking. J Eye Mov Res. 2018;11(6):10.16910/jemr.11.6.4. pmid:33828714
35. McNamara ME, Beevers CG. Measurement properties of web-based attentional bias in depression with cursor-directed aperture in online studies. Translational Issues in Psychological Science. 2024;10(3):346–63.
36. Anwyl-Irvine AL, Armstrong T, Dalmaijer ES. MouseView.js: Reliable and valid attention tracking in web-based experiments using a cursor-directed aperture. Behav Res Methods. 2022;54(4):1663–87. pmid:34590288
37. Beck AT, Steer RA, Brown G. Beck depression inventory–II. Psychological Assessment. 1996;8(2):77–100.
38. Miles WR. Ocular dominance in human adults. J Gen Psychol. 1930;3(3):412–30.
39. Mathôt S, Schreij D, Theeuwes J. OpenSesame: an open-source, graphical experiment builder for the social sciences. Behav Res Methods. 2012;44(2):314–24. pmid:22083660
40. Lazarov A, Abend R, Bar-Haim Y. Social anxiety is related to increased dwell time on socially threatening faces. J Affect Disord. 2016;193:282–8. pmid:26774515
41. Ebner NC, Riediger M, Lindenberger U. FACES--a database of facial expressions in young, middle-aged, and older women and men: development and validation. Behav Res Methods. 2010;42(1):351–62. pmid:20160315
42. Kroenke K, Strine TW, Spitzer RL, Williams JBW, Berry JT, Mokdad AH. The PHQ-8 as a measure of current depression in the general population. J Affect Disord. 2009;114(1–3):163–73. pmid:18752852
43. Wu Y, Levis B, Riehm KE, Saadat N, Levis AW, Azar M, et al. Equivalency of the diagnostic accuracy of the PHQ-8 and PHQ-9: a systematic review and individual participant data meta-analysis. Psychol Med. 2020;50(8):1368–80. pmid:31298180
44. Shin C, Lee S-H, Han K-M, Yoon H-K, Han C. Comparison of the usefulness of the PHQ-8 and PHQ-9 for screening for major depressive disorder: analysis of psychiatric outpatient data. Psychiatry Investig. 2019;16(4):300–5. pmid:31042692
45. Watson D, O’Hara MW, Simms LJ, Kotov R, Chmielewski M, McDade-Montez EA, et al. Development and validation of the Inventory of Depression and Anxiety Symptoms (IDAS). Psychol Assess. 2007;19(3):253–68. pmid:17845118
46. Watson D, O’Hara MW, Chmielewski M, McDade-Montez EA, Koffel E, Naragon K, et al. Further validation of the IDAS: evidence of convergent, discriminant, criterion, and incremental validity. Psychol Assess. 2008;20(3):248–59. pmid:18778161
47. Maniaci MR, Rogge RD. Caring about carelessness: Participant inattention and its effects on research. J Res Personality. 2014;48:61–83.
48. Ramsey SR, Thompson KL, McKenzie M, Rosenbaum A. Psychological research in the internet age: The quality of web-based data. Comput Hum Behav. 2016;58:354–60.
49. Buchanan EM, Scofield JE. Methods to detect low quality data and its implication for psychological research. Behav Res Methods. 2018;50(6):2586–96. pmid:29542063
  50. 50. Aguinis H, Villamor I, Ramani RS. MTurk research: review and recommendations. J Manag. 2020;47(4):823–37.
  51. 51. Musch J, Reips U-D. Chapter 3 - A Brief History of Web Experimenting. In: Birnbaum MH, editor. Psychological Experiments on the Internet. San Diego: Academic Press; 2000. p. 61–87.
  52. 52. Zhou H, Fishbach A. The pitfall of experimenting on the web: How unattended selective attrition leads to surprising (yet false) research conclusions. J Pers Soc Psychol. 2016;111(4):493–504. pmid:27295328
  53. 53. Revelle W. psych: Procedures for Psychological, Psychometric, and Personality Research. Evanston, Illinois: Northwestern University; 2019. Available from: https://CRAN.R-project.org/package=psych
  54. 54. Flora DB. Your coefficient alpha is probably wrong, but which coefficient omega is right? a tutorial on using r to obtain better reliability estimates. Adv Methods Pract Psychol Sci. 2020;3(4):484–501.
  55. 55. Wickham H, Averick M, Bryan J, Chang W, McGowan L, François R, et al. Welcome to the tidyverse. JOSS. 2019;4(43):1686.
56. Xie Y. knitr: a comprehensive tool for reproducible research in R. In: Stodden V, Leisch F, Peng RD, editors. Implementing Reproducible Computational Research. Chapman and Hall/CRC; 2014. Available from: http://www.crcpress.com/product/isbn/9781466561595
57. Auguie B. gridExtra: Miscellaneous Functions for “Grid” Graphics. 2017. Available from: https://CRAN.R-project.org/package=gridExtra
58. Kassambara A. ggpubr: “ggplot2” Based Publication Ready Plots. 2020. Available from: https://CRAN.R-project.org/package=ggpubr
59. Shumake J. itrak: Tools for processing eye-tracking data. 2016. Available from: http://github.com/jashu/itrak
60. Kuznetsova A, Brockhoff PB, Christensen RHB. lmerTest package: tests in linear mixed effects models. J Stat Softw. 2017;82(13):1–26.
61. Bates D, Mächler M, Bolker B, Walker S. Fitting linear mixed-effects models using lme4. J Stat Softw. 2015;67(1):1–48.
  62. 62. Lüdecke D, Ben-Shachar M, Patil I, Waggoner P, Makowski D. performance: an R package for assessment, comparison and testing of statistical models. JOSS. 2021;6(60):3139.
  63. 63. Lüdecke D. sjPlot: Data Visualization for Statistics in Social Science. 2021. Available from: https://CRAN.R-project.org/package=sjPlot
  64. 64. Lüdecke D. sjmisc: data and variable transformation functions. JOSS. 2018;3(26):754.
  65. 65. Wickham H. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York; 2016. Available from: https://ggplot2.tidyverse.org
  66. 66. Thériault R. rempsyc: Convenience Functions for Psychology. 2022. Available from: https://rempsyc.remi-theriault.com
67. Rizopoulos D. ltm: an R package for latent variable modeling and item response theory analyses. J Stat Softw. 2006;17(5):1–25.
68. Basel D, Aviram T, Lazarov A. Lack of an attention bias away from relatively negative faces in dysphoria is not related to biased emotion identification. Behav Ther. 2022;53(2):182–95. pmid:35227397
69. Greenwald AG. Consequences of prejudice against the null hypothesis. Psychol Bull. 1975;82(1):1–20.
70. Rosenthal R. The file drawer problem and tolerance for null results. Psychol Bull. 1979;86(3):638–41.
71. McClelland GH, Judd CM. Statistical difficulties of detecting interactions and moderator effects. Psychol Bull. 1993;114(2):376–90. pmid:8416037
72. Zorowitz S, Niv Y. Improving the reliability of cognitive task measures: a narrative review. Biol Psychiatry Cogn Neurosci Neuroimaging. 2023;8(8):789–97. pmid:36842498
73. Rudich-Strassler A, Hertz-Palmor N, Lazarov A. Looks interesting: Attention allocation in depression when using a news website - An eye tracking study. J Affect Disord. 2022;304:113–21. pmid:35219737
74. Koster EHW, De Lissnyder E, Derakshan N, De Raedt R. Understanding depressive rumination from a cognitive science perspective: the impaired disengagement hypothesis. Clin Psychol Rev. 2011;31(1):138–45. pmid:20817334