Fig 1.

General diagram of the experimental attention analysis process.

The process is composed of three stages: 1) data preparation and experimental setup, 2) image transformation, and 3) distribution analysis. Data preparation and setup consist of the image evaluation process by each experiment participant. This stage requires the use of an eye-tracker to determine participants’ gaze positions and define the experimental conditions. Image transformation consists of generating an attention map using the ViT attention module and of having users experimentally evaluate it on a set of objects. The last stage compares both information sources and thus determines whether there is any correlation. Note: The craft figures shown are similar but not identical to the original images and are included for illustrative purposes only.

Fig 2.

Objects used in the experiment consisted of ten basketry objects and ten ginger jars.

The objects were randomly selected, with the fewest possible objects in the background. Note: The craft figures shown are similar but not identical to the original images and are included for illustrative purposes only.

Fig 3.

Experimental procedure for object visualization.

Before the experimental phase starts, a calibration procedure is performed by recording a sequence of five points on the screen. Once calibration is complete, the experimental phase begins with an image of a red dot on a white background, displayed for 5 seconds. Then one of the 20 objects is displayed for 10 seconds. This procedure repeats until all objects have been displayed. Note: The craft figures shown are similar but not identical to the original images and are included for illustrative purposes only.

Fig 4.

Participant setup in front of the screen during the experimental phase.

All participants remain seated while the experiment is conducted. At the beginning of each experiment, a calibration process is performed with the eye-tracker and the experiment is explained to the participant. The chosen distance between the user and screen remains relatively fixed at 150 cm, as it reduces visual fatigue. Note: The craft figures shown are similar but not identical to the original images and are included for illustrative purposes only.

Fig 5.

Heatmap generation according to positions recorded by each observer.

The heatmap of each object is constructed as the average of individual visualizations transformed to a two-dimensional Gaussian distribution. Note: The craft figures shown are similar but not identical to the original images and are included for illustrative purposes only.
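
The construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the image size, the fixation coordinates, and the kernel width `sigma` (in pixels) are all assumptions made for the example.

```python
import numpy as np

def fixation_heatmap(fixations, shape, sigma):
    """Sum an isotropic 2D Gaussian centered on each (x, y) fixation."""
    height, width = shape
    ys, xs = np.mgrid[0:height, 0:width]
    heatmap = np.zeros(shape, dtype=float)
    for x, y in fixations:
        heatmap += np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
    total = heatmap.sum()
    # Normalize so each observer contributes a map that sums to 1.
    return heatmap / total if total > 0 else heatmap

def average_heatmap(per_observer_fixations, shape, sigma):
    """Average the individual observer maps into one heatmap per object."""
    maps = [fixation_heatmap(f, shape, sigma) for f in per_observer_fixations]
    return np.mean(maps, axis=0)

# Hypothetical gaze positions (x, y) in pixels for three observers.
observers = [[(10, 12), (30, 20)], [(11, 13)], [(28, 22), (29, 21)]]
avg = average_heatmap(observers, shape=(40, 60), sigma=3.0)
```

Normalizing each observer's map before averaging keeps an observer with many recorded positions from dominating the result; whether the original study weights observers this way is not stated in the caption.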

Table 1.

Classification of each human fixation.

Table 2.

Sociodemographic information of participants.

Fig 6.

Heatmap analysis by each observer.

(a) Heatmap of each user for object #1 (basketry). (b) Users' gaze as the parameter increases: the larger the parameter, the greater the coverage area of the average visualization. (c) Heatmap of each user for object #13 (ginger jar).

Fig 7.

Density of users’ average gaze for each object for .

Basketry: Zoom object #4: detail of a region of an object without a buckle. Zoom object #6: detail of a buckle with longer observation time by users. Ginger jar: Zoom object #14: vase symbol with the highest amount of observation.

Fig 8.

Twelve heatmaps generated by the ViT attention module for a basketry-type object and for a jar.

Each of the 12 heatmaps represents part of the attention visualization within the algorithm. Note: The craft figures shown are similar but not identical to the original images and are included for illustrative purposes only.

Fig 9.

Average ViT for objects.

Average heatmap of the 12 heads of the ViT attention module for each object in the experiment.
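
As a rough sketch of this averaging step, assuming the 12 head maps for one object are already available as a single array (the random data and the 14x14 grid size below are placeholders, not values from the study):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for the 12 per-head attention maps of one object.
head_maps = rng.random((12, 14, 14))

def normalize(m):
    """Rescale a map to the [0, 1] range for display."""
    m = m - m.min()
    span = m.max()
    return m / span if span > 0 else m

# Average across the head axis, then rescale for visualization.
mean_map = normalize(head_maps.mean(axis=0))
```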

Fig 10.

The distance between the participants and each ViT head is computed separately for each metric (KL, CC, SSIM, SIM).

In this way, we estimate the distance between the 30 participants and the 12 heatmaps from the ViT module (). This procedure is repeated for the 20 objects in the experiment (), and since this distance is computed for a given , we evaluate multiple values with . Thus, we obtain combinations.
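
The participant-by-head comparison can be illustrated with the sketch below. It uses definitions of KL, CC, and SIM that are common in saliency evaluation; the assumption that the study uses exactly these forms is ours, and the random maps are placeholders. SSIM is omitted because it is usually taken from an image-processing library rather than written by hand.

```python
import numpy as np

EPS = 1e-12  # guards against log(0) and division by zero

def to_distribution(m):
    """Shift a map to be non-negative and scale it to sum to 1."""
    m = np.asarray(m, dtype=float)
    m = m - m.min()
    total = m.sum()
    return m / total if total > 0 else np.full(m.shape, 1.0 / m.size)

def kl_divergence(human, model):
    """KL divergence of the model map from the human map (lower is closer)."""
    p, q = to_distribution(human), to_distribution(model)
    return float(np.sum(p * np.log((p + EPS) / (q + EPS))))

def correlation_coefficient(human, model):
    """Pearson correlation between the two maps (higher is closer)."""
    return float(np.corrcoef(np.ravel(human), np.ravel(model))[0, 1])

def similarity(human, model):
    """Histogram intersection of the two distributions (higher is closer)."""
    p, q = to_distribution(human), to_distribution(model)
    return float(np.minimum(p, q).sum())

rng = np.random.default_rng(1)
participants = rng.random((30, 14, 14))  # hypothetical human heatmaps
heads = rng.random((12, 14, 14))         # hypothetical ViT head maps

# One value per (participant, head) pair for a single object and metric.
kl = np.array([[kl_divergence(h, a) for a in heads] for h in participants])
```

Repeating the last line for each object, each metric, and each value of the smoothing parameter yields the full set of combinations referred to in the caption.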

Fig 11.

Each point in the box plot represents the distance between one of the 20 objects and each of the attention heads.

In this example, the parameter is fixed at 2.6. Note that each mean corresponds to the average distance with respect to each head.

Fig 12.

Variation of distance as the value of increases for each head.

Each distance is computed as the average distance between the average visualization across all objects and each ViT head. Across all metrics, head 12 attains the value closest to human attention. However, as increases, the standard error (shown in light blue) also increases.

Fig 13.

Variation of distance as the value of increases for each head.

Each distance is computed as the average distance between the average visualization across all objects and each ViT head. The blue region corresponds to the 95% confidence interval.

Fig 14.

Comparison between each average visualization with respect to head #12 of the ViT attention module.

In the case of average visualization, a value of has been considered.

Fig 15.

Tukey honestly significant difference (HSD) with different .

Tukey Honestly Significant Difference (HSD) test, used to measure the difference in means between attention module heads, with three variants of .
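
For readers unfamiliar with the test, a pairwise comparison of head means might look like the sketch below. It assumes SciPy 1.8 or later, which provides `scipy.stats.tukey_hsd`; the three groups of 20 distances are synthetic placeholders, and only three heads are shown instead of twelve.

```python
import numpy as np
from scipy.stats import tukey_hsd

rng = np.random.default_rng(2)
# Hypothetical per-object distances (20 objects) for three attention heads.
head_a = rng.normal(1.0, 0.1, 20)
head_b = rng.normal(1.0, 0.1, 20)
head_c = rng.normal(0.5, 0.1, 20)  # deliberately shifted mean

result = tukey_hsd(head_a, head_b, head_c)

# result.pvalue[i, j] is the p-value for the difference in means
# between group i and group j.
pairwise_p = result.pvalue
```

A small entry in `pairwise_p` indicates that the two heads differ in mean distance after correcting for the multiple pairwise comparisons.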

Fig 16.

Behavior of the p-value for head #12 according to the HSD test.

The analysis was performed only for the KL and SSIM metrics, as they exhibit a statistically significant difference.

Fig 17.

Heatmap of lift by head and image.

Each cell reports . Color highlights the magnitude of the effect. The threshold was defined per image and per head, as described in the evaluation section.

Table 3.

AOI analysis results for the basketry set.

Table 4.

AOI analysis results for the jar set.
