Neurally-constrained modeling of human gaze strategies in a change blindness task

doi:10.1371/journal.pcbi.1009322

Fig 1.

Gaze metrics predict success in a change blindness experiment.

A. Schematic of a change blindness experiment trial, comprising a sequence of alternating images (A, A’), each displayed for 250 ms, with intervening blank frames (B), also displayed for 250 ms (“flicker” paradigm), repeated for 60 s. Red circle: location of change (not shown to participants). All 20 change image pairs tested are available via the Data Availability link. B. Distribution of success rates of n = 39 participants in the change blindness experiment. Red and blue bars: good performers (top 30th percentile; n = 9) and poor performers (bottom 30th percentile; n = 12), respectively. Inverted triangles: cut-off values of success rates for classifying good (red) versus poor (blue) performers. C. Classification accuracy, quantified with the area under the ROC curve (AUC), for classifying trials as hits versus misses (left horizontal line) and performers as good versus poor (right horizontal line), obtained with a support vector machine classifier. Violin plots: null distributions of classification accuracies based on a permutation test (*** p<0.001). Error bars: Clopper-Pearson binomial confidence intervals. D. Feature selection measures for identifying the most informative features that distinguish good from poor performers. From top to bottom: Fisher score, information gain, change in AUC, and bag of decision trees (for details, see Feature Selection Metrics in the Materials and Methods). Brighter colors indicate more informative features. Solid red outline: most informative feature in the fixation feature subgroup (left); dashed red outline: most informative feature in the saccade feature subgroup (right). FD: fixation duration; SA: saccade amplitude; SD: saccade duration; SPS: saccade peak speed. μ and σ² denote the mean and variance of the respective parameter. E. Distribution of mean fixation duration (μ_FD, in milliseconds) across 19 change images for good performers (x-axis) versus poor performers (y-axis); one change image pair, successfully detected by all performers, was not included in these analyses (see text). Each data point denotes the average μ_FD across performers in each category, for one image. Dashed diagonal line: line of equality. p-value: significance of the difference in mean fixation duration between good and poor performers. F. Same as in E, but comparing the variance of saccade amplitudes (in squared degrees of visual angle) for good versus poor performers. Other conventions are the same as in panel E.
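Panels C and D rest on two standard quantities, the AUC and the Fisher score. The sketch below is illustrative only and is not the authors' pipeline: it assumes classifier outputs (for example, SVM decision values) are already available as scores, computes the AUC as the probability that a randomly chosen positive scores above a randomly chosen negative, builds a label-shuffling permutation null like the one summarized by the violin plots, and uses one common two-class form of the Fisher score. All function names are hypothetical.

```python
import random
from statistics import mean, pvariance


def fisher_score(good, poor):
    """Two-class Fisher score for one feature (larger = better separated).

    Between-class scatter divided by within-class scatter:
    (n1*(m1 - m)**2 + n2*(m2 - m)**2) / (n1*v1 + n2*v2).
    Raises ZeroDivisionError if both classes have zero variance.
    """
    m = mean(good + poor)
    between = len(good) * (mean(good) - m) ** 2 + len(poor) * (mean(poor) - m) ** 2
    within = len(good) * pvariance(good) + len(poor) * pvariance(poor)
    return between / within


def auc(pos_scores, neg_scores):
    """AUC = P(random positive scores above random negative); ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def permutation_p_value(pos_scores, neg_scores, n_perm=2000, seed=0):
    """Observed AUC and a one-sided p-value against a label-shuffled null."""
    rng = random.Random(seed)
    observed = auc(pos_scores, neg_scores)
    pooled = list(pos_scores) + list(neg_scores)  # copy so the inputs stay untouched
    k = len(pos_scores)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of hit/miss (or good/poor) labels
        if auc(pooled[:k], pooled[k:]) >= observed:
            exceed += 1
    # The +1 terms count the observed labelling itself and keep p above zero.
    return observed, (exceed + 1) / (n_perm + 1)
```

For instance, `fisher_score` applied to per-participant μ_FD values of the two groups ranks that feature, and `permutation_p_value(hit_scores, miss_scores)` returns the observed AUC with its permutation p-value.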

Fig 2.

Scan paths and fixation maps do not distinguish good from poor performers.

A. (Left) Representative image used in the change blindness experiment (Image #6 in the Data Availability link). (Right) Clustering of the fixation points, with the number of clusters (n = 13) chosen at the peak of the fitted BIC profile. Fixation points in different clusters are plotted in different colors. Black fixations occurred in fixation-sparse regions that were not included in the clustering. Black arrows show a representative scan path, i.e., a sequence of fixation points. The character “string” representation of this scan path is shown to the right of the image. B. Variation in the Bayesian Information Criterion (BIC; y-axis) with the number of clusters into which fixation points were grouped (x-axis; Materials and Methods). Circles: data points. Gray curve: bi-exponential fit. C. Distribution of edit distances among good performers (x-axis) versus edit distances among poor performers (y-axis). Each data point denotes the median edit distance for one image (n = 19). Other conventions are the same as in Fig 1E. D. Distribution of intra-category edit distance (y-axis), among the good or among the poor performers, versus inter-category edit distance (x-axis), between good and poor performers. Red and blue data: intra-category edit distance for good and poor performers, respectively. Each data point denotes the median for one image (n = 19). Other conventions are the same as in panel C. E. Same as panel C, but comparing Pearson correlations of fixation maps among good (x-axis) and poor performers (y-axis). F. Same as in panel D, but comparing intra- versus inter-category Pearson correlations of fixation maps. Other conventions are the same as in panel D. G. Distribution of time to first fixation within the region of change (in seconds) for good performers (x-axis) versus poor performers (y-axis). Other conventions are the same as in panel C. H. Same as in panel G, but comparing time to detect change (in seconds) for good versus poor performers. Other conventions are the same as in panel G.
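The scan-path comparison in panels A, C and D turns each sequence of fixations into a string of cluster labels and compares strings by edit distance. A minimal sketch follows, assuming the Levenshtein distance (unit cost for insertions, deletions and substitutions), 0-based cluster indices mapped to letters, and that fixations outside any cluster (the black fixations) are simply dropped; these choices and the helper names are assumptions, not the paper's exact procedure.

```python
from itertools import combinations
from statistics import median

LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def scanpath_to_string(cluster_labels):
    """Encode 0-based fixation cluster labels as letters; None (unclustered) is dropped."""
    return "".join(LETTERS[c] for c in cluster_labels if c is not None)


def edit_distance(a, b):
    """Levenshtein distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))          # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                           # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def median_within(strings):
    """Median edit distance over all pairs within one performer group (intra-category)."""
    return median(edit_distance(a, b) for a, b in combinations(strings, 2))


def median_between(group_a, group_b):
    """Median edit distance over all pairs drawn from two groups (inter-category)."""
    return median(edit_distance(a, b) for a in group_a for b in group_b)
```

Applied per image, `median_within` on good and on poor performers' strings gives the two coordinates of a point in panel C, and `median_between` gives the x-coordinate in panel D.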

Fig 3.

Saccade probabilities and fixated features are similar across good and poor performers.

A. Average saccade probability matrices for the good performers (top; red outline) and poor performers (bottom; blue outline). These correspond to probabilities of making a saccade between different “domains” (1–4), each corresponding to a (non-contiguous) collection of image regions, ordered by frequency of fixations, from the most fixated regions (domain 1) to the least fixated regions (domain 4). Cell (i, j) (row, column) of each matrix indicates the probability of saccades from domain j to domain i. B. Classification accuracy for classifying good versus poor performers based on the saccade probability matrix features, using a support vector machine classifier. Other conventions are as in Fig 1C. Error bars: s.e.m. C. Identifying low-level fixated features across good and poor performers. 112×112 image patches were extracted, centered on each fixation, for each participant; each point in the 112×112-dimensional space represents one such image patch. Principal component analysis (PCA) was performed to identify low-level spatial features explaining maximum variance among the fixated image patches, separately for good and poor performers. D. Top 6 principal components, ranked by proportion of variance explained, corresponding to the spatial features explaining the greatest variance across fixations, for good performers (left panels) and poor performers (right panels). These spatial features were highly correlated across good and poor performers (median r = 0.20, p<0.001, across n = 150 components).
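Panel A's matrices can be built from the sequence of domains visited by consecutive fixations. In the sketch below, an illustration under stated assumptions rather than the authors' code, regions are ranked by fixation count and split into four near-equal groups (domain 1 = most fixated), and each column j is normalized so that cell (i, j) estimates the probability that a saccade starting in domain j lands in domain i. Whether the paper normalizes per column or over all saccades is not stated in this caption; the function names are hypothetical.

```python
def assign_domains(fixation_counts, n_domains=4):
    """Map region id -> domain (1 = most fixated).

    Regions are ranked by fixation count and split into n_domains near-equal groups.
    """
    ranked = sorted(fixation_counts, key=fixation_counts.get, reverse=True)
    n = len(ranked)
    return {region: 1 + (rank * n_domains) // n for rank, region in enumerate(ranked)}


def saccade_probability_matrix(domain_sequence, n_domains=4):
    """M[i][j] ~ P(saccade lands in domain i+1 | it starts in domain j+1).

    domain_sequence lists the 1-based domain of each consecutive fixation.
    Columns with no outgoing saccades are left as zeros.
    """
    counts = [[0] * n_domains for _ in range(n_domains)]
    for src, dst in zip(domain_sequence, domain_sequence[1:]):
        counts[dst - 1][src - 1] += 1        # row = destination, column = source
    matrix = [[0.0] * n_domains for _ in range(n_domains)]
    for j in range(n_domains):
        column_total = sum(counts[i][j] for i in range(n_domains))
        if column_total:
            for i in range(n_domains):
                matrix[i][j] = counts[i][j] / column_total
    return matrix
```

The flattened 4 × 4 matrix per participant is the kind of feature vector that the classifier in panel B could take as input.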

Fig 4.

A Bayesian model of gaze strategies for change detection.

A. Schematic showing a typical fixation across the pair of images (A, A’) and an intervening blank. B. Detailed steps for modeling change detection (see text for details). (Clockwise from top left) At each fixation, a Cartesian variable resolution (CVR) transform is applied to mimic foveal magnification, followed by a saliency map computation to determine firing rates at each location. Instantaneous evidence for change versus no change (log-likelihood ratio, log L(t)) is computed across all regions of the image. An inverse CVR transform is applied to project the evidence back into the original image space, where noisy evidence is accumulated (sequential probability ratio test; drift-diffusion model). The next fixation point is chosen using a softmax function applied over the accumulated evidence (E_t). To model human saccadic biases, a distribution of saccade amplitudes and turn angles is imposed on the evidence values before the next fixation location is selected (polar plot inset). C. A representative gaze scan path from a model simulation (cyan arrows). Colored squares: specific points of fixation (see panel D). Grid: fine divisions into which the image was sub-divided to facilitate evidence computation. Green (1), blue (2) and red (3) squares denote the first (beginning of simulation), an intermediate (during simulation) and the last (change detection) fixation points, respectively. D. Evidence accumulated as a function of time for the same three representative regions as in panel C; each color and number denotes evidence at the corresponding square in panel C. When the model fixated on the green or blue squares (in panel C), the accumulated evidence did not cross the threshold for change detection, so the model continued to scan the image. When the model fixated on the red square (in the change region), the accumulated evidence crossed the threshold (horizontal dashed gray line) and the change was detected.
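The accumulation step in panel B and the threshold crossings in panel D follow the logic of a sequential probability ratio test: log-likelihood ratios are summed, starting from the log prior odds, until the running total reaches a threshold. The sketch below tracks a single image region with one (change) threshold. Its leak term, E_t = (1 - γ)·E_{t-1} + log L(t), is an assumed stand-in for the paper's decay factor (Eq 2), chosen so that γ = 1 discards all past evidence; the Gaussian log-likelihood-ratio generator is likewise hypothetical and replaces the saliency-based firing-rate model.

```python
import random


def accumulate_evidence(llr_samples, log_prior_odds=0.0, gamma=0.0, threshold=3.0):
    """Leaky SPRT-style accumulation of log-likelihood ratios for one image region.

    Assumed update (a stand-in for the paper's Eq 2):
        E_t = (1 - gamma) * E_{t-1} + log L(t),  with E_0 = log prior odds.
    gamma = 0 is a non-leaky one-sided SPRT; gamma = 1 keeps only the latest sample.
    Returns the evidence trajectory and the first time bin at which E_t >= threshold
    (change detected), or None if the threshold is never reached.
    """
    evidence = log_prior_odds
    trajectory = []
    for t, llr in enumerate(llr_samples):
        evidence = (1.0 - gamma) * evidence + llr
        trajectory.append(evidence)
        if evidence >= threshold:
            return trajectory, t
    return trajectory, None


def simulate_llr(drift, noise_sd, n_bins, seed=0):
    """Hypothetical noisy log L(t) samples: positive drift in a changed region, ~0 otherwise."""
    rng = random.Random(seed)
    return [rng.gauss(drift, noise_sd) for _ in range(n_bins)]
```

Calling `accumulate_evidence(simulate_llr(0.5, 1.0, 40), threshold=3.0)` mimics a changed region, where the running total tends to drift toward threshold; with `drift=0.0` the total tends to wander without a systematic rise, like the green and blue traces in panel D.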

Table 1.

Model parameters and their default values.

Fig 5.

Effect of model parameters on change detection success.

A. Change in model performance (success rate, % correct) when varying the relative durations of the images and blanks, measured in time bins (Δt = 25 ms/time bin; Table 1), while keeping the total image + blank interval constant (at 50 time bins). Positive values on the x-axis denote longer image intervals relative to blanks, and negative values the reverse. Blue points: data; gray curve: sigmoid fit. B. Same as in panel A, but varying the maximum decay factor (γ; Eq 2). Curves: sigmoid fits. C. Same as in panel A, but varying the firing rate prior (μ_f) for image pairs with the lowest (blue; bottom 33rd percentile) and highest (red; top 33rd percentile) magnitudes of firing rate change. Curves: smoothing spline fits. Colored squares: μ_f corresponding to the center of area of each of the two curves. D. Same as in panel A, but varying the mean fixation duration (μ_FD; in time bins, Δt = 25 ms/time bin). (Inset; lower) Variation of μ_FD with the prior ratio of change to no change (P(C:NC)). (Inset; upper) Same as lower inset, but varying the threshold decay rate ζ (Table 1). E. Same as in panel A, but varying the saccade amplitude variance (σ²_SA). (Inset) Variation of σ²_SA with the softmax temperature parameter (T) (see text for details). F. Same as in panel A, but varying the saccade amplitude variance (σ²_SA). (Inset) Variation of σ²_SA with the foveal magnification factor (FMF). Other conventions in B-F are the same as in panel A. Error bars (all panels): s.e.m.
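The temperature parameter T in the panel E inset enters through the softmax that picks the next fixation (Fig 4B). A minimal sketch, assuming evidence values are given per candidate location and omitting the saccade amplitude and turn-angle bias: low T makes the choice nearly greedy, while very high T (such as T = 10⁴ for Control 3 in Fig 6C) makes it nearly uniform, i.e., a random search. Function names are hypothetical.

```python
import math
import random


def softmax(values, temperature):
    """p_i = exp(v_i / T) / sum_j exp(v_j / T), computed stably."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)  # subtracting the maximum avoids overflow in exp
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def choose_next_fixation(evidence_by_location, temperature, rng):
    """Sample the next fixation location with probability softmax(E_t / T).

    The saccade amplitude / turn-angle bias of Fig 4B is deliberately omitted.
    """
    locations = list(evidence_by_location)
    probs = softmax([evidence_by_location[loc] for loc in locations], temperature)
    return rng.choices(locations, weights=probs, k=1)[0]
```

Because a higher T spreads probability over distant as well as nearby locations, it plausibly widens the spread of saccade amplitudes, which is the relationship the inset plots.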

Fig 6.

Comparison between human and model performance.

A. (Left) Joint distribution of saccade amplitude and saccade turn angle for human participants (averaged over n = 39 participants). Colorbar: hotter colors denote higher proportions. (Right) Same as in the left panel, but for the model, averaged over n = 40 simulations. B. Correlation between change detection success rates for human participants (x-axis) and the model (y-axis). Each point denotes the average success rate for one of the 20 images tested, across n = 39 participants (human) or n = 40 simulations (model). Error bars denote standard error of the mean across participants (x-axis) or simulations (y-axis). Dashed gray line: line of equality. C. Average absolute deviation from human performance for the sequential probability ratio test (SPRT) model (Model, leftmost bar), a control model in which evidence decayed rapidly (Control 1, γ = 1; second bar from left), a control model in which the stopping rule was based on the derivative of the posterior odds ratio (Control 2; third bar from left), and a control model that employed a random search strategy (Control 3, T = 10⁴; rightmost bar). p-values denote significance levels from a paired signed-rank test across n = 20 images (*p < 0.05).
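Panel C compares models by their per-image deviation from human success rates, using a paired signed-rank test. The sketch below is an assumption about how such a comparison could be set up, not the authors' analysis code: it computes the mean absolute deviation and an exact two-sided Wilcoxon signed-rank test on per-image deviations of two models. For brevity it rejects zero or tied differences rather than applying the usual corrections; a library routine such as `scipy.stats.wilcoxon` handles those cases.

```python
def mean_abs_deviation(human, model):
    """Average |human - model| success rate across images."""
    return sum(abs(h - m) for h, m in zip(human, model)) / len(human)


def signed_rank_test(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples x, y.

    Simplification: zero or tied |x - y| differences are rejected rather than
    handled with the usual corrections. Returns (W_plus, p_value).
    """
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    if any(d == 0 for d in diffs) or len({abs(d) for d in diffs}) < n:
        raise ValueError("zero or tied differences are not handled in this sketch")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)

    # Under H0 each rank is equally likely to carry a + or - sign, so the null
    # distribution of W+ counts the subsets of {1..n} with each possible sum.
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum
    for r in range(1, n + 1):
        for s in range(max_sum, r - 1, -1):
            counts[s] += counts[s - r]
    total = 2 ** n
    p_lower = sum(counts[: w_plus + 1]) / total   # P(W+ <= observed)
    p_upper = sum(counts[w_plus:]) / total        # P(W+ >= observed)
    return w_plus, min(1.0, 2 * min(p_lower, p_upper))
```

With per-image success rates `human`, `model` and `control` (hypothetical lists of 20 values), one would pass `[abs(h - c) for h, c in zip(human, control)]` and `[abs(h - m) for h, m in zip(human, model)]` to `signed_rank_test`.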

Fig 7.

Comparison between human, model and Deep Gaze II performance.

A. Distribution of saccade amplitudes for human participants (yellow), the sequential probability ratio test (SPRT) model (red) and the Deep Gaze II neural network (blue). B. Top 10 clusters of human fixations, ranked by cumulative fixation duration (rows/columns 1–10). Increasing indices correspond to progressively lower cumulative fixation duration. C. Saccade probability matrix averaged across all images and all participants (left), for simulations of the SPRT model (middle), and for the Deep Gaze II neural network (right). D. Distribution, across images, of the correlations (r-values) of saccade probability matrices between human participants and the SPRT model (left) and between human participants and the Deep Gaze II neural network (right). p-value: significance of the paired difference between these correlations across n = 20 images.
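Panel D summarizes, for each image, how similar two saccade probability matrices are. A minimal sketch, assuming the r-value is a Pearson correlation computed over all matrix cells after flattening (for example, the 10 × 10 matrices indexed by the fixation clusters of panel B); the helper names are hypothetical.

```python
from math import sqrt


def flatten(matrix):
    """Row-major list of all cells in a nested-list matrix."""
    return [value for row in matrix for value in row]


def pearson_r(a, b):
    """Pearson correlation of two equal-length sequences (undefined if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def matrix_similarity(human_matrix, other_matrix):
    """r-value between two saccade probability matrices of the same shape."""
    return pearson_r(flatten(human_matrix), flatten(other_matrix))
```

Computing `matrix_similarity` per image against the SPRT model and against Deep Gaze II yields two paired lists of 20 r-values, which is the comparison the panel D p-value summarizes.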
