Dynamic perceptive compensation for the rotating snakes illusion with eye tracking

This study developed a dynamic perceptive compensation system for the rotating snakes illusion (RSI) with eye tracking. Large eye movements, such as saccades and blinks, were detected with an eye tracker, and perceptive compensation was dynamically performed based on the characteristics of RSI perception. The proposed compensation system considered three properties: spatial dependence, temporal dependence, and individual dependence. Several psychophysical experiments were performed to confirm the effectiveness of the proposed system. After the preliminary verification and determination of the temporal-dependent function for RSI perception, the effects of gaze information on RSI control were investigated. Five algorithms were compared using paired comparison. This confirmed that the compensation system that took gaze information into account reduced the RSI effect better than compensation without gaze information at a significance threshold of p < 0.01, calculated with Bonferroni correction. Some algorithms that are dependent on gaze information reduced the RSI effects more stably than still RSI images, whereas spatially and temporally dependent compensation had a lower score than other compensation algorithms based on gaze information. The developed system and algorithm successfully controlled RSI perception in relation to gaze information. This study systematically handled gaze measurement, image manipulation, and compensation of illusory image, and can be utilized as a standard framework for the study of optical illusions in engineering fields.


Ø (Revised in ll. 527--546) Effect of superposition of spatially and temporally dependent compensation:
The effectiveness of compensation deteriorates for many subjects when spatially and temporally dependent compensation is combined. It is necessary to consider space x time interactions for the precise reduction of optical illusion, as spatial and temporal parameters interact with each other. As described elsewhere, two main mechanisms of RSI related to eye movement have been proposed: refreshing the retinal image through large eye movements and updating the image with small eye movements.
Our study dealt only with the former under gaze-free conditions, and the results confirmed that taking large eye movements into account would be effective for compensating for RSI perception. The results also indicated that it would be possible to compensate for RSI perception by physically rotating the image in the opposite direction of the perceptual motion, a conclusion that is consistent with previous studies [12,14]. On the other hand, the results of Experiment 2 showed that the illusory effect was halved by 500 ms, while larger absolute PSE values were observed in the illusory condition than in the control setting, even at longer compensation times. This suggests not only that temporally dependent components derive from large eye movements but also that temporally independent components deriving from small eye movements likely exist, which is consistent with the results of Backus et al. [12], who reported a temporal dependence of RSI perception in the eye-fixation condition.

(c) Guidance for potential engineering applications
Thank you for your comment. Perceptive compensation for optical illusions is already in use in engineering applications such as typeface design and icon arrangement. For example, in many typefaces, the intersection of lines in the letter X is intentionally shifted such that the lines are perceived to be connected, exploiting the Poggendorff illusion effect. Taking into account the Hermann grid illusion effect, square icons should not be placed near each other in a display. In this study, we conducted three experiments on the RSI to obtain basic knowledge of how to compensate for gaze-dependent motion illusions. Following these considerations, we have added guidance for potential engineering applications, i.e., how the findings of our research can be applied to developments in the fields of virtual reality (VR) and human-computer interaction (HCI), which is summarized below.
Ø (Original in ll. 500--506, Discussion) The methods of constructing a perceptual model are broadly divided into a top-down approach that rules visual information processing from a mathematical informatics perspective [4,5] and a bottom-up approach that generates human perception in a computer using machine learning [26,35]. Both have succeeded in reproducing and emergent illusions. After conducting the above modeling, a couple of individual perceptual parameters are experimentally determined, which realizes to develop more universal compensation system for optical illusions. Ø (Revised in ll. 603--624) The methods of constructing a perceptual model are broadly divided into a top-down approach that assesses visual information processing from a mathematical perspective [4,5] and a bottom-up approach that generates human perception in a computer using machine learning [26,35]. Both have succeeded in reproducing and generating illusions. In the context of our study, it may be possible to construct a model-based illusion compensation system by creating a reproduction model of illusions, including the RSI, for which some parameters can be experimentally determined. Finally, we discuss how our proposed methodology, dynamic perceptive compensation synchronized with eye movements, can be used in various fields, such as VR and HCI. The fundamental operations of the measurement of eye movements, image manipulation, and compensation for illusory perceptions in this study can be utilized in relation to gaze-dependent perception. For example, image manipulation by suppressing optical flows that exist in peripheral vision will probably help maintain the sense of balance, which may lead to a reduction in motion sickness. In relation to compensation methodology, the results of our study suggest that it is necessary to account for both spatial and temporal dependence on eye movements.
In other words, compensation should be based on the spatio-temporal characteristics of perception in relation to eye movements. Furthermore, our results suggest that it may be necessary to optimize the algorithm parameters using the superposition effect described above. Multi-dimensional optimization methods such as QUEST+ [36] will likely be an important part of effectively improving system optimization.
Ø (Revised in ll. 239--244) That is, system compensation against the perceived rotation can reduce the RSI effects when the appropriate angular velocity is chosen: forming an approximate ±0.5 deg./s rotation against perceived motion in our experimental design. Note that this optimal parameter is expected to vary in relation to experimental conditions such as the lighting environment, display brightness, and specific pattern arrangement.
Second, in the following two sentences where the specific values of the optimal parameters are mentioned, the text "although the parameter may vary in relation to the environmental conditions, such as display brightness and lighting" has been added to soften the statements in the results of Experiment 1 and in the Conclusion. A typographical error ("anglular") and the wording have also been corrected. Ø (Original in ll. 226--228, Experiment 1) Specifically, the compensation anglular velocity of 0.5 deg./s against the perceived motion can reduce the RSI effects in both CW and CCW images in these experiment settings. Ø (Revised in ll. 264--268) Specifically, the compensation angular velocity of 0.5 deg./s against the perceived motion can reduce the effects of the RSI for both the CW and CCW images in these experimental settings, although the parameter may vary in relation to the environmental conditions, such as display brightness and lighting.
Ø (Original in ll. 518--521, Conclusion) Preliminary experiment based on paired comparison showed that the compensation by the proposed method reduced the Rotating Snakes Illusion (RSI) effects by setting appropriate parameters; compensation angular velocity at 0.5 deg./s against the perceived motion can reduce the RSI effect. Ø (Revised in ll. 635--640) A preliminary experiment using paired comparison showed that the compensation with the proposed method reduced the effects of the RSI by setting appropriate parameters to our experimental conditions; the compensation angular velocity at 0.5 deg./s against the perceived motion reduced RSI effects, although the optimal parameter could vary in relation to the experimental condition.
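As an illustration of the parameter discussed above, the following minimal sketch shows how a counter-rotation of 0.5 deg./s against the perceived motion could be turned into a rotation angle for the displayed image. It is not the code used in our system: the function names, the sign convention (positive angles are clockwise), and the 50 Hz update rate in the usage note are assumptions made only for this example.

```python
def compensation_angle_deg(elapsed_s, perceived_direction, speed_deg_per_s=0.5):
    """Physical rotation (deg) applied to the image after `elapsed_s` seconds.

    Assumed convention for this sketch: positive angles are clockwise (CW).
    The image is rotated AGAINST the perceived motion, so a pattern that is
    perceived as rotating CW receives a counterclockwise (negative) rotation.
    """
    if perceived_direction not in ("CW", "CCW"):
        raise ValueError("perceived_direction must be 'CW' or 'CCW'")
    if elapsed_s < 0:
        raise ValueError("elapsed_s must be non-negative")
    sign = -1.0 if perceived_direction == "CW" else 1.0
    return sign * speed_deg_per_s * elapsed_s


def per_frame_step_deg(update_rate_hz, speed_deg_per_s=0.5):
    """Magnitude of the rotation increment per display update (deg)."""
    if update_rate_hz <= 0:
        raise ValueError("update_rate_hz must be positive")
    return speed_deg_per_s / update_rate_hz
```

For instance, with an assumed update rate of 50 Hz, `per_frame_step_deg(50.0)` gives an increment of about 0.01 deg per update, and `compensation_angle_deg(2.0, "CW")` gives a total counter-rotation of 1 deg in the counterclockwise direction after 2 s. The speed argument is left adjustable because the optimal value may vary with the experimental conditions.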
Second, the phrase "can be personalized," found in two places in the manuscript, was changed to "was personalized" to more directly indicate the existence of the calibration step. Ø (Original in ll. 405-407, Experiment 3) Also, by executing the calibration step, the optimal algorithm parameters can be obtained for each individual, and each of the proposed algorithms can be personalized. Ø (Revised in ll. 474--475) Further, through executing a calibration step, the optimal algorithm parameters could be obtained for each individual, and each of the proposed algorithms was personalized.

#R1_3-----------------------------------------------------------------------------------------------------------------------------------
The statement that the established parameters "realizes to develop more universal compensation system for optical illusions" (lines 506-7) also seems to overstep the data: this system is specific to RSI. Thank you for your comment. We apologize for the inadequate description. Our statement likely confused the application of our compensation system to the reproduction model proposed in previous studies. Thus, we have revised the description of our challenge as follows: Ø (Original in ll. 504 -507, Discussion) Both have succeeded in reproducing and emergent illusions.After conducting the above modeling, a couple of individual perceptual parameters are experimentally determined, which realizes to develop more universal compensation system for optical illusions. Ø (Revised in ll. 607--610) Both have succeeded in reproducing and generating illusions. In the context of our study, it may be possible to construct a model-based illusion compensation system by creating a reproduction model of illusions, including the RSI, for which some parameters can be experimentally determined. Under what VR or HCI situation would you want to augment vision by reducing visual illusions in the periphery-particularly if this process requires a great deal of calibration for specific people and contexts? Thank you for your comment and we apologize for the ambiguity of our explanation. Ultimately, our goal was to develop a system that can intervene in human perception by compensating for illusions that depend on eye movements, provide awareness of illusions, and reduce unnecessary illusions. As described in #R1_0, perceptive compensation techniques are utilized in various fields, including typeface design, while compensation including gaze information has not yet been investigated. Keeping this in mind, we have added the following description. Ø (Original in ll. 
41--46, Introduction) The technology of controlling gaze-dependent illusion is not limited to understanding the dynamics of visual perception, but is also applicable to eliminating the visual perception discrepancies that are caused by optical illusions for VR and human-computer interaction (HCI). Our research systematically handles gaze measurement, image manipulation, and compensation of illusory image, which can be used as a standard framework to research on optical illusion in the engineering fields. Ø (Revised in ll. 46--57) The technology of controlling gaze-dependent illusions is not limited to understanding the dynamics of visual perception. It can also be applied to eliminating the discrepancies in visual perception that are caused by optical illusions in the fields of virtual reality (VR) and human-computer interaction (HCI). By intervening in human perception by compensating for gaze-dependent illusions using a head-mounted display or glasses with an eye tracker, it may be possible to produce a system that makes one aware of necessary perception and reduces unnecessary illusions. Because the perception of RSI alters with eye movements, use of this technology means we can achieve the fundamental knowledge necessary to intervene in human perception in relation to gaze information. Our research systematically handled gaze measurement, image manipulation, and compensation for an illusory image, and may be usable as a standard framework for research on optical illusion in engineering fields.

#R1_5-----------------------------------------------------------------------------------------------------------------------------------
Line 133 says "Both systems were set to drive at 30 Hz" and it is unclear what two systems are referred to here. Were the display and eye tracker set to 30 Hz? Why are 90 Hz and 60 Hz values also reported in the same paragraph? The current text is unclear. A 90 Hz sampling rate is well below the benchmarks achieved by research-grade eye trackers, and may negatively impact the saccade detection. However, a 30 Hz rate is definitely too low to be valid: previous work has shown that such low-resolution eye tracking fails to reproduce expected saccadic parameters. Thank you for your comment and we apologize for the ambiguous description in the device and system specifications. In our study, the eye tracker itself was driven at 90 Hz to detect eye position, and the computation time for each processing loop should be kept within approximately 10 ms to drive the whole system at 90 Hz. However, a bottleneck in the system processing caused a delay of more than 10 ms. According to the data from our pilot study (see S1 Dataset), the computational loop from the detection of eye movement to image display took 20.3±1.1 ms (approximately 50 Hz), and the system worked with this delay as a bottleneck when the devices were synchronized. Further, as you note, the sampling rate for our eye tracker was below the 1000 Hz benchmark that has been achieved by research-grade eye trackers, so we have revised the description of the possible effects as follows.
Ø (Original in ll. 128--136, Experimental setting and stimuli) Experiment 1 used the system that consisted of the eye tracker at 90 Hz (Tobii Eye Tracker 4C), the laptop PC with the keyboard (ThinkPad T440p, Intel(R) Core i-7-4710 MQ 2.50GHz), and the display at 60 Hz (BENQ G2411HD, 1920 x 1080). Experiments 2 and 3 used the system that comprised an eye tracker (Tobii Eye Tracker 4C) driven at 90 Hz, a PC (Dell, Precision 7920 Tower, Intel(R) Xeon(R) Silver 4210 2.20GHz), and the display at 60 Hz (BENQ G2411HD, 1920 x 1080) driven at 60 Hz. Both systems were set to drive at 30 Hz considering the delay in image processing and displaying. Though it is usual for video presentation at 30 Hz, note that the compensation time Δt has error under 66 ms due to refresh rate limits of the whole system. Ø (Revised in ll. 138--161) Experiment 1 used a system that consisted of an eye tracker at 90 Hz (Tobii Eye Tracker 4C), a laptop PC with a keyboard (ThinkPad T440p, Intel(R) Core i-7-4710 MQ 2.50GHz), and a display at 60 Hz (BENQ G2411HD, 1920 x 1080). Experiments 2 and 3 used the system incorporating an eye tracker (Tobii Eye Tracker 4C) driven at 90 Hz, a desktop PC (Dell, Precision 7920 Tower, Intel(R) Xeon(R) Silver 4210 2.20GHz), and a display at 60 Hz (BENQ G2411HD, 1920 x 1080). In all experiments, the eye tracker itself was driven at 90 Hz to detect the eye position, and the computation time for each processing loop should be kept within approximately 10 ms to drive the whole system at 90 Hz. However, a bottleneck in system processing caused a delay of more than 10 ms. According to data from our pilot study (S1 Dataset), the computational loop from the detection of the eye movements to the image display took 20.3±1.1 ms (approximately 50 Hz), and the system took this delay as a bottleneck when the devices were synchronized. 
The sampling rate for our eye tracker was below the 1000 Hz benchmark achieved by research-grade eye trackers, and previous studies have found that degradation in the number of detected saccades and inaccurate estimations of saccade duration occurred with eye trackers that have low sampling rates, although accurate measurement of fixation times and points is possible with such trackers [27,28]. With our eye tracker, it was difficult to follow the saccade process in detail, and the tracker sometimes failed to detect brief saccades, which led to a failure of compensation based on the detection of these brief saccades. On the other hand, our algorithm, as described above, did not require direct measurement of the number or duration of saccades, so it was assumed that the proposed algorithm was able to work adequately through detecting large eye movements and blinks.
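The timing figures quoted above can be checked with simple arithmetic. The sketch below is only an illustration with hypothetical helper names; the inputs (a 90 Hz tracker and a mean loop time of 20.3 ms) are the values reported in the text, and the code does not reproduce the analysis of the S1 Dataset.

```python
def loop_rate_hz(loop_ms):
    """Effective update rate (Hz) of a processing loop that takes `loop_ms` ms."""
    if loop_ms <= 0:
        raise ValueError("loop_ms must be positive")
    return 1000.0 / loop_ms


def sample_period_ms(rate_hz):
    """Time between consecutive samples (ms) for a device running at `rate_hz`."""
    if rate_hz <= 0:
        raise ValueError("rate_hz must be positive")
    return 1000.0 / rate_hz


def tracker_samples_per_loop(tracker_hz, loop_ms):
    """Average number of tracker samples that arrive during one processing loop."""
    return tracker_hz * loop_ms / 1000.0


if __name__ == "__main__":
    print(f"loop rate: {loop_rate_hz(20.3):.1f} Hz")
    print(f"90 Hz sample period: {sample_period_ms(90.0):.1f} ms")
    print(f"samples per loop: {tracker_samples_per_loop(90.0, 20.3):.2f}")
```

A loop time of 20.3 ms corresponds to roughly 49 Hz, which is the "approximately 50 Hz" stated above, while a 90 Hz tracker delivers a sample about every 11 ms. On average, slightly fewer than two tracker samples therefore arrive per processing loop, which is why the loop, rather than the tracker, limited the update rate.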
We have added the dataset for the pilot study to the Supporting information, as follows. Ø (Revised in ll. 668--669 , Supporting information) S1 Dataset. Dataset of eye tracking log data in the pilot study. Data from a pilot study to investigate the sampling rate of eye tracking, provided in CSV format.
The RSI reduction can be realized at the level of consumer devices, which suggests the possibility of widespread use of our technology at low cost. Thus, we have added the following sentence to the Conclusion. Ø (Original in ll. 535--537, Conclusion and future prospects) Consequently, we realized a system that can control the RSI perception depending on gaze information, and the system can be personalized. Based on these results, future prospects are described as follows. We adopted the refresh rate at 30 Hz as a usual video rate, while more precise compensation system can realize the improvement of real-time performance. Ø (Revised in ll. 652--656) Consequently, we created a system that could control the perception of the RSI depending on gaze information, and this system was personalized. The results of our experiments indicated that the effects of the RSI could be diminished even at the speed of consumer devices, which suggests the possibility of widespread use of our technology at a low cost.

#R1_6-----------------------------------------------------------------------------------------------------------------------------------
Lines 228-231 suggest that the system compensated regardless of where participants looked on the display, but this was not specifically tested. True, participants were allowed to gaze freely across the screen. However, the authors assessed compensation as an average across trials, not on the basis of where participants were looking during a trial. The compensation could work better or worse in certain regions of the visual field. Thank you for your constructive feedback, which is helpful for improving the accuracy of our system. As you note, we consider that the position from which participants observed the display affected the effectiveness of our compensation system. Although we obtained log data (all data are available in S3 Dataset and examples of gaze heat map obtained in our experiment are shown in S1 Fig to S3 Fig.) of where participants were looking during each trial, the subjects' responses themselves should have been averaged for the trials, which suggests that the RSI as compensated by the system would have been perceived as more stationary on average. Therefore, first, we have added the following note to the experimental methodology in Experiment 1. Ø (Original in ll. 180--181, Experiment 1) The fixation point of the eyes was not prepared because the experiment measures the effects of eye movements. Ø (Revised in ll. 207--213) No eye fixation point was prepared, because the experiment measured the effects of the eye movements. It should be noted that the subjects gave their responses only once per task after observation of the stimuli; the subjects probably reported which of the two image sets they perceived to have moved to a greater extent, after considering all of their observations during a task (all data are available as S3 Dataset and examples of the gaze heat map obtained in our experiment are shown in S1 Fig to S3 Fig).
Second, we added the sentence "it is possible that this compensation could work more or less easily depending on the viewpoint of the subjects " to the results in Experiment 1, as follows.
Thank you for your comment and we apologize for the confusion. In our study, the statement that "half of the subjects looked at the non-illusory image after the illusory image" in a single trial was not tested, but the statement across all tasks was tested, as the counterbalancing was performed by blocks of tasks grouped by a stimulus image. That is, the selection bias mentioned in the manuscript does not imply that half of the subjects observed the non-illusory (control) image after the illusory image during a single trial; but instead is in reference to entire tasks. Half of the subjects provided their perception under control conditions following the illusory condition because a counterbalancing experiment was performed for each block of tasks, grouped by the stimulus images; all of the images of the illusory condition were presented after the controls for half of the subjects. Thus, we revised the presentation as follows. First, the method used for counterbalancing in Experiment 2, where the tasks were grouped by stimulus image, is now described in the experimental procedures. Ø (Original in ll. 269--270, Experiment 2) All tasks were presented at random for every subject. Ø (Revised in ll. 315--320) All tasks were presented at random for each subject. Counterbalancing was performed by blocks of tasks grouped by stimulus image. In other words, half of the subjects were randomly presented with a task for each t condition for the illusory image, and after all of the tasks with the illusory image were completed, they were randomly presented with a task in the control condition. The other half received the tasks in the reverse order.
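The block-wise counterbalancing described above can be sketched as follows. This is a minimal illustration and not the code used in the experiment: the function name, the subject identifiers, the way subjects are split into halves, and the list of time conditions passed in are all hypothetical.

```python
import random


def build_schedule(subject_ids, time_conditions, seed=0):
    """Assign each subject an ordered list of (block, condition) tasks.

    Half of the subjects (rounded down) complete the whole illusory block
    first and then the control block; the remaining subjects receive the
    blocks in the reverse order. Within each block, the time conditions
    are presented in random order.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    ids = list(subject_ids)
    rng.shuffle(ids)           # random assignment of subjects to block orders
    half = len(ids) // 2
    schedule = {}
    for index, subject in enumerate(ids):
        if index < half:
            blocks = ["illusory", "control"]
        else:
            blocks = ["control", "illusory"]
        tasks = []
        for block in blocks:
            order = list(time_conditions)
            rng.shuffle(order)  # random task order inside the block
            tasks.extend((block, condition) for condition in order)
        schedule[subject] = tasks
    return schedule
```

For example, `build_schedule(range(8), [0, 250, 500])` (placeholder values) yields, for each of eight subjects, six tasks in which all tasks of one stimulus image are completed before any task of the other. Because the blocks are not interleaved, a subject who starts with the illusory block observes the illusory image throughout that block before seeing any control image.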
Second, we have added the specification that the selection bias mentioned in the manuscript was probably caused by the counterbalance grouping by trial blocks. Ø (Original in ll. 304--307 in Experiment 2) On the other hand, almost all PSEs of the controlled condition have negative values except the results for Δt = 250 ms, which can be explained by selection bias in the direction of rotation probably occurred: half of the subjects looked at the non-illusory image after the illusory image for which observers perceived clockwise rotation. Ø (Revised in ll. 359--365) On the other hand, almost all of the PSEs of the control condition had negative values, except for the results for Δt = 250 ms, which could be explained by potential selection bias in the direction of rotation: half of the subjects first worked on the illusory condition task block, and observed an illusory image due to the counterbalancing performed with the blocks of tasks grouped by stimulus images. In other words, some subjects continued to observe an illusory image during the first task block, which may have biased their perception of perceived rotation when observing the control image.

#R1_8-----------------------------------------------------------------------------------------------------------------------------------
The authors say that an "effect of superposition" may have prevented the combination of spatial and temporal compensation from being effective, but the authors do not describe this superposition effect (beyond describing the data). What mechanisms do you believe are in play here? Also, the text here says "However, the effect is still majorly higher than the still image even in that case" but algorithms O and C did not have a significant difference, and therefore C is not "majorly higher". The combined compensation algorithm does not appear to be effective, and this should be clear in the discussion. Thank you for your comment. We apologize for our inadequate description of the effects of superposition. We assumed that this effect may have been due to the insufficient tuning and selection of the compensation parameters in both the calibration and evaluation steps, caused by the fact that the differences between the images with and without compensation could only be recognized for a short time in peripheral vision. Based on the above, we have added the following sentences to the Discussion. Ø (Original in ll. 463--465, Discussion) The system and algorithm can be applied to Fraser-Wilcox illusion groups that have a similar pattern as RSI. These illusion groups cause perceived motion in a still image and the direction of motion is determined by arrangement or brightness of the pattern. Ø (Revised in ll. 547--565) In addition, we discuss the superposition effects observed in our experiment, namely, that the algorithm taking into account both spatial and temporal dependence (Algorithm C) had lower performance than algorithms considering either one or the other (Algorithms A and B). The reason for this may be that the tuning and selection of the compensation parameters were insufficient in both the calibration and the evaluation steps. Algorithm C compensated for this illusion by temporary rotation applied to peripheral vision only.
Thus, the subjects here were sometimes unable to adjust the parameters sufficiently to clearly distinguish the difference between two images. It was confirmed that Algorithms A, B, and C were more effective than Algorithm U, which suggests that taking gaze information into consideration is essential for RSI compensation, and the effects of the superposition of spatial and temporal dependence require further investigation through the imposition of restrictions on the parameters and observation times. Next, the possible application of the proposed system and algorithm is described. The fundamental points in the development of our system can be applied to other dynamic illusions that depend on gaze information. The system and algorithm can be applied to Fraser--Wilcox illusion groups with a similar pattern to that of RSI. These illusion groups cause perceived motion in still images, where the direction of motion is determined by the arrangement or brightness of the pattern.
In addition, as you note, Algorithms O and C did not show a significant difference, so we have deleted the following sentence in the explanation of the effect because we consider the previous assertion to be unsupported. Ø (Original in ll. 458--462, Discussion) Effect of Superposition of Spatial-and Temporal-dependent Compensation: The compensation effectiveness deteriorate for many subjects when spatial-and temporal-dependent compensation are combined. However, the effect is still majorly higher than the still image even in that case. It is necessary to consider space x time interaction for precise reduction of optical illusion because spatial and temporal parameters interact each other. Ø (Revised in ll. 527--531) Effect of superposition of spatially and temporally dependent compensation: The effectiveness of compensation deteriorates for many subjects when spatially and temporally dependent compensation is combined. It is necessary to consider space x time interactions for the precise reduction of optical illusion, as spatial and temporal parameters interact with each other.
Minor:

#R1_10---------------------------------------------------------------------------------------------------------------------------------
The Supporting Information contains subject-level averages, rather than raw data (trial-by-trial responses). As per PLOS data policy, the data points behind means, medians and variance measures should be available. Thank you for your comment. We have checked PLOS ONE documentation of data-availability policies (https://journals.plos.org/plosone/s/data-availability). The dataset itself was uploaded as a supplemental file, but the title of the file name was not consistent with the documentation, and it was, ultimately, not cited in the manuscript. As you note, we have added the raw data of each subject in the experiments in the same supplementary file. For example, mention of the dataset has been added to the Experiment 1 section, as follows. Ø (Original in ll. 196--198) Namely, the lower the SR is, the more effectively the system could reduce the RSI effect. The blue line denotes the results for the CW images, the red line the results for the CCW images. Ø (Revised in ll. 231--234) Lower SRs indicate more effective reduction of the RSI effect by the system. The blue line denotes the results for the CW images, and the red line gives the results for the CCW images. The original data are available in the Experiment 1 Table in the supplemental file (S2 Dataset).
Ø (Original in the caption of Fig. 4) Experimental results. Each value indicates the average results across the subjects. The horizontal axis indicates the compensation angular velocity of , the vertical axis indicates the average of the selection rate (SR) for subjects to choose the compensated image. Namely, the lower the SR is, the more effectively the system could reduce the RSI effect. The blue line denotes the results for the CW images, the red line the results for the CCW images. Ø (Revised) Results of Experiment 1. Each value indicates results averaged across the subjects. The horizontal axis presents the compensation angular velocity for Δθ, and the vertical axis indicates the average of the selection rate (SR) for subjects for choosing the compensated image. Here, the lower the SR, the more effective the reduction of the RSI effects by the system. The blue line denotes the results for the CW images, and the red line gives the results for the CCW images. The original data are available in the Experiment 1 Table of the supplemental file (S2 Dataset).
We have also added the section Supporting Information in which the dataset used in the analysis of our study was mentioned.
Ø (Revised in ll. 664--665) S2 Dataset. Dataset of Experiments 1 to 3. All data obtained in Experiments 1 through 3 are included. All data are available in the tabs in the Excel file.
All of the values in Table 2 have been altered to the values after Bonferroni correction, ten times the original values. The p-value for the comparison between Algorithms O and C has been set at 1.0, the maximum p-value, because the corrected value exceeds 1.0. Ø (Original in Table 2) Ø (Revised)
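The correction described above can be illustrated with a short sketch. Pairwise comparison of five algorithms gives ten comparisons, so each uncorrected p-value is multiplied by ten and capped at 1.0. The function name and the example p-values in the usage note are hypothetical and are not taken from Table 2.

```python
from math import comb


def bonferroni_correct(p_values, n_comparisons):
    """Return Bonferroni-corrected p-values, capped at the maximum of 1.0."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    corrected = []
    for p in p_values:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each p-value must lie in [0, 1]")
        # Multiply by the number of comparisons; a product above 1.0 is
        # reported as 1.0 because a p-value cannot exceed 1.
        corrected.append(min(1.0, p * n_comparisons))
    return corrected


# Number of pairwise comparisons among five algorithms.
N_PAIRS = comb(5, 2)
```

Here `N_PAIRS` evaluates to 10, matching the factor of ten applied to the table. With made-up inputs, `bonferroni_correct([0.0005, 0.2], N_PAIRS)` returns approximately 0.005 for the first value and exactly 1.0 for the second, since 0.2 multiplied by ten exceeds the cap.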

Reviewer #2's comments
Review on Kubota et al. "Dynamic perceptive compensation for the rotating snakes illusion with eye tracking" Overall evaluation

Ø (Original in Abstract)
This study developed a dynamic perceptive compensation system for perceptual manipulation technology based on optical illusions using an eye tracking device that reduces illusion effects depending on eye movement. Using visual stimuli of Rotating Snakes Illusion (RSI), which is one of the gazedependent illusion, large eye movements such as saccades and blinks were detected and perceptive compensation was dynamically performed based on an algorithm according to the characteristics of the RSI perception. Our dynamic perceptive compensation in this study considers three properties: spatial dependence, temporal dependence, and individual dependence. We performed several psychophysical experiments to confirm the effectiveness of the proposed system. After preliminary verification and determination of temporal-dependent function for RSI perception, the effect of gaze information on RSI control was investigated. Seven subjects observed the RSI stimuli that were compensated by our algorithms. Five algorithms were compared using the paired comparison. The obtained parameter values for each algorithm indicate the rotation against the perceived motion made appropriate compensation. The comparison results confirm that our compensation system considering gaze information reduces the RSI effect more appropriately than the compensation without gaze information at a significant level of 0.1 \% with Bonferroni correction, while the spatial-and temporal-dependent compensation has a lower score than the other compensation algorithms that may be due to the fact of superposition effects. Some algorithm dependent on the gaze information can be reduced the RSI effects more stably than the still RSI images. Consequently, our system and algorithm can control the RSI perception dependeing on gaze information, and the system can be personalized. 
Our research systematically handles gaze measurement, image manipulation, and compensation of illusory image, which can be used as a standard framework to research on optical illusion in the engineering fields.
Ø (Revised)
This study developed a dynamic perceptive compensation system for the rotating snakes illusion (RSI) with eye tracking. Large eye movements, such as saccades and blinks, were detected with an eye tracker, and perceptive compensation was dynamically performed based on the characteristics of RSI perception. The proposed compensation system considered three properties: spatial dependence, temporal dependence, and individual dependence. Several psychophysical experiments were performed to confirm the effectiveness of the proposed system. After the preliminary verification and determination of the temporal-dependent function for RSI perception, the effects of gaze information on RSI control were investigated. Five algorithms were compared using paired comparison. This confirmed that the compensation system that took gaze information into account reduced the RSI effect better than compensation without gaze information at a significance threshold of p < 0.01, calculated with Bonferroni correction. Some algorithms that are dependent on gaze information reduced the RSI effects more stably than still RSI images, whereas spatially and temporally dependent compensation had a lower score than other compensation algorithms based on gaze information. The developed system and algorithm successfully controlled RSI perception in relation to gaze information. This study systematically handled gaze measurement, image manipulation, and compensation of illusory image, and can be utilized as a standard framework for the study of optical illusions in engineering fields.
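As a reading aid for the Bonferroni-corrected threshold mentioned in the revised abstract, the following minimal sketch shows how a family-wise significance level translates into a per-comparison threshold. It assumes that all pairs among the five algorithms were tested and that 0.01 is the family-wise level; the abstract does not state the number of tests, so both values are illustrative assumptions, and the function name is hypothetical.

```python
from math import comb


def bonferroni_threshold(family_alpha: float, n_comparisons: int) -> float:
    """Per-comparison threshold that keeps the family-wise error at family_alpha."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return family_alpha / n_comparisons


# Assumption: every pair among the five algorithms is compared once.
n_algorithms = 5
n_pairs = comb(n_algorithms, 2)

# Assumption: 0.01 is the family-wise level, not the per-test level.
per_test_alpha = bonferroni_threshold(0.01, n_pairs)

print(f"{n_pairs} pairwise comparisons, per-test threshold {per_test_alpha:.4f}")
```

Under these assumptions, an individual comparison would need to reach the per-test threshold rather than the family-wise level to be reported as significant.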

#R2_2-----------------------------------------------------------------------------------------------------------------------------------
2. The eye-tracking device samples data at relatively low rate (90 Hz), which is comparably small when compared to state-of-the-art vision science labs. Could the authors please comment on the potential impact on the results?
Thank you for your comment. As you note, the sampling rate of our eye tracker was below the 1000 Hz benchmark achieved by research-grade eye trackers. However, RSI reduction can be achieved at the level of consumer devices, which suggests the possibility of widespread use of our technology at a low cost. First, we have added a description of possible effects due to the low sampling rate of eye tracking, as follows.
In all experiments, the eye tracker itself was driven at 90 Hz to detect the eye position, and the computation time for each processing loop should be kept within approximately 10 ms to drive the whole system at 90 Hz. However, a bottleneck in system processing caused a delay of more than 10 ms. According to data from our pilot study (S1 Dataset), the computational loop from the detection of the eye movements to the image display took 20.3±1.1 ms (approximately 50 Hz), and the system took this delay as a bottleneck when the devices were synchronized. The sampling rate for our eye tracker was below the 1000 Hz benchmark achieved by research-grade eye trackers, and previous studies have found that degradation in the number of detected saccades and inaccurate estimations of saccade duration occurred with eye trackers that have low sampling rates, although accurate measurement of fixation times and points is possible with such trackers [27,28]. With our eye tracker, it was difficult to follow the saccade process in detail, and the tracker sometimes failed to detect brief saccades, which led to a failure of compensation based on the detection of these brief saccades.
On the other hand, our algorithm, as described above, did not require direct measurement of the number or duration of saccades, so it was assumed that the proposed algorithm was able to work adequately through detecting large eye movements and blinks.
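The timing figures quoted above can be checked with a short calculation. The sketch below converts between per-loop latency and update rate using only the two numbers stated in the response (a 90 Hz tracker and a 20.3 ms mean loop time); the function and variable names are hypothetical and are not taken from the authors' system.

```python
def period_ms(rate_hz: float) -> float:
    """Time available per cycle, in milliseconds, at a given rate."""
    return 1000.0 / rate_hz


def rate_hz(latency_ms: float) -> float:
    """Update rate, in Hz, implied by a per-loop latency in milliseconds."""
    return 1000.0 / latency_ms


TRACKER_RATE_HZ = 90.0       # stated eye-tracker sampling rate
MEASURED_LOOP_MS = 20.3      # stated mean detection-to-display latency

budget_ms = period_ms(TRACKER_RATE_HZ)           # time per tracker sample
effective_hz = rate_hz(MEASURED_LOOP_MS)         # rate the full loop sustains
samples_per_loop = MEASURED_LOOP_MS / budget_ms  # tracker samples per loop

print(f"budget per sample: {budget_ms:.1f} ms")
print(f"effective loop rate: {effective_hz:.1f} Hz")
print(f"tracker samples elapsed per loop: {samples_per_loop:.2f}")
```

The calculation indicates that a little under two tracker samples elapse during each processing loop, which is consistent with the statement that the loop, rather than the tracker, limited the overall update rate.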
Second, we have added the following sentence to the Conclusion to present the potential impact of using a consumer device.
Ø (Original in ll. 535--537, Conclusion and future prospects)
Consequently, we realized a system that can control the RSI perception depending on gaze information, and the system can be personalized. Based on these results, future prospects are described as follows. We adopted the refresh rate at 30 Hz as a usual video rate, while more precise compensation system can realize the improvement of real-time performance.
Ø (Revised in ll. 652--656)
Consequently, we created a system that could control the perception of the RSI depending on gaze information, and this system was personalized. The results of our experiments indicated that the effects of the RSI could be diminished even at the speed of consumer devices, which suggests the possibility of widespread use of our technology at a low cost.
Ø (Original in ll. 458--462, Discussion)
Effect of Superposition of Spatial-and Temporal-dependent Compensation: The compensation effectiveness deteriorate for many subjects when spatial-and temporal-dependent compensation are combined. However, the effect is still majorly higher than the still image even in that case. It is necessary to consider space x time interaction for precise reduction of optical illusion because spatial and temporal parameters interact each other.
Ø (Revised in ll. 527--546)
Effect of superposition of spatially and temporally dependent compensation: The effectiveness of compensation deteriorates for many subjects when spatially and temporally dependent compensation is combined. It is necessary to consider space x time interactions for the precise reduction of optical illusion, as spatial and temporal parameters interact with each other. As described elsewhere, two main mechanisms of RSI related to eye movement have been proposed: refreshing the retinal image through large eye movements and updating the image with small eye movements.
Our study dealt only with the former under gaze-free conditions, and the results confirmed that taking large eye movements into account would be effective for compensating for RSI perception. The results also indicated that it would be possible to compensate for RSI perception by physically rotating the image in the opposite direction of the perceptual motion, a conclusion that is consistent with previous studies [12,14]. On the other hand, the results of Experiment 2 showed that the illusory effect was halved by 500 ms, while larger absolute PSE values were observed in the illusory condition than in the control setting, even at longer compensation times. This suggests not only that temporally dependent components derive from large eye movements but also that temporally independent components deriving from small eye movements likely exist, which is consistent with the results of Backus et al. [12], who reported a temporal dependence of the RSI perception in the eye-fixation condition.
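The interpretation above, a temporally dependent component plus a temporally independent one, can be illustrated with a toy model. The sketch below is purely an assumption made for illustration: it uses an exponential decay with a 500 ms half-life for the transient part and a constant residual, with arbitrary magnitudes. It is not the temporal-dependent function fitted in the manuscript, and all names and parameter values are hypothetical.

```python
def illusory_effect(t_ms: float,
                    transient: float = 0.8,
                    residual: float = 0.2,
                    half_life_ms: float = 500.0) -> float:
    """Toy model: decaying transient component plus a constant residual.

    t_ms is the time since the last large eye movement or blink.
    The exponential form and all magnitudes are illustrative assumptions.
    """
    if t_ms < 0:
        raise ValueError("t_ms must be non-negative")
    return transient * 0.5 ** (t_ms / half_life_ms) + residual


# Evaluate the toy model at a few delays after a large eye movement.
for t in (0, 500, 1000, 5000):
    print(f"t = {t:5d} ms -> modelled effect {illusory_effect(t):.3f}")
```

In this toy model the transient part halves every 500 ms, while the residual keeps the modelled effect above zero at long delays, mirroring the observation that PSE values in the illusory condition remained larger than in the control setting.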
Ø (Original in l. 29, Introduction)
From these background, our study aims to reduce the effect of the RSI effects that dependes on eye movements.
Ø (Revised in ll. 29--30)
Our study investigated means of reducing the effects of RSI perception in relation to eye movements.