An oculometrics-based biofeedback system to impede fatigue development during computer work: A proof-of-concept study

A biofeedback system may objectively identify fatigue and provide an individualized timing plan for micro-breaks. We developed and implemented a biofeedback system based on oculometrics using continuous recordings of eye movements and pupil dilations to moderate fatigue development in its early stages. Twenty healthy young participants (10 males and 10 females) performed a cyclic computer task for 31–35 min over two sessions: 1) self-triggered micro-breaks (manual sessions), and 2) biofeedback-triggered micro-breaks (automatic sessions). The sessions were held with one-week inter-session interval and in a counterbalanced order across participants. Each session involved 180 cycles of the computer task and after each 20 cycles (a segment), the task paused for 5-s to acquire perceived fatigue using Karolinska Sleepiness Scale (KSS). Following the pause, a 25-s micro-break involving seated exercises was carried out whether it was triggered by the biofeedback system following the detection of fatigue (KSS≥5) in the automatic sessions or by the participants in the manual sessions. National Aeronautics and Space Administration Task Load Index (NASA-TLX) was administered after sessions. The functioning core of the biofeedback system was based on a Decision Tree Ensemble model for fatigue classification, which was developed using an oculometrics dataset previously collected during the same computer task. The biofeedback system identified fatigue with a mean accuracy of approx. 70%. Perceived workload obtained from NASA-TLX was significantly lower in the automatic sessions compared with the manual sessions, p = 0.01 Cohen’s dz = 0.89. The results give support to the effectiveness of integrating oculometrics-based biofeedback in timing plan of micro-breaks to impede fatigue development during computer work.


Introduction
Fatigue is often reported by computer users [1,2] and associated with the development of musculoskeletal and psychological disorders [2,3] and compromised performance resulting in a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 applications for biofeedback [48,49], e.g. decrease of the muscular load during computer work [16,17], adjustment of the mental load of computer games using physiological signals including pupil diameter changes [50], mental training in competitive sports [51], and counteracting stress and anxiety [52][53][54]. The proposed biofeedback system in this study was designed to alert participants to take micro-breaks during computer work based on oculometrics in a statistical model describing fatigue states. Of note, counteracting fatigue within this framework of cognitive intervention is favorable from the consumption of psychoactive drugs, e.g. caffeine in association with health risks [55]. We hypothesized that the oculometrics-based biofeedback system would impede fatigue development without compromising the performance of computer work.

Participants
Twenty participants, 10 females and 10 males, aged 26 (SD 3) years old with the height of 1.72 (SD.10) m, and the body mass of 69 (SD 15) kg were recruited. All participants had normal or corrected-to-normal vision (self-reported and examined by Snellen chart). The participants were familiar with computer work and used their right hand as their dominant side for computer mouse. Participants were asked to abstain from alcohol for 24 h, and caffeine, smoking and drugs for 12 h prior to experimental sessions. The participants reported at least 6 h (mean 7.6 ± 0.8 h) of night sleep before the experimental session. The Fatigue Assessment Scale (FAS) [56] and the Visual Fatigue Scale (VFS) [57] were administered respectively to exclude participants suffering from chronic fatigue and eye strain. No participant was found with chronic fatigue and eye strain. Written informed consent was obtained from each participant. The experiment was approved by The North Denmark Region Committee on Health Research Ethics, project number N-20160023 and conducted in accordance with the Declaration of Helsinki.

Experimental approach
A counterbalanced-measures design was employed to investigate the effectiveness of biofeedback-triggered micro-breaks in comparison with self-triggered micro-breaks. To do this, two experimental sessions were conducted in two counterbalanced sessions (days) with one-week inter-session interval.
Computer task. In both experimental sessions, participants were asked to perform a cyclic computer task [38] for approx. 31-35 min (Fig 1). The task [58] developed on MATLAB R2018a (The MathWorks, Natick, MA) was displayed on a 19-in screen (1280×1024 pixels, refresh rate: 120Hz) located approx. 58 cm in front of a sitting participant subtending 27˚×22o f visual angle. The task involved 180 cycles each taking approx. 10 s (corresponding to methods times measurement (MTM-100) [59,60]). Each cycle began by memorizing a random pattern of connected points with different shapes presented on a computer screen. The order of connecting points was determined by a textual cue displayed under the pattern, indicating the starting point. It was followed by a washout period, where no pattern was displayed, and the participants were instructed to keep their gaze on a cross in the center of the screen. The cycle continued by the presentation of the doubled-size replica of the pattern without connecting lines. To redraw the lines and replicate the presented pattern, participants were required to click on a sequence of the pattern points as targets. Once the allocated time to replicate the pattern passed, a new cycle with a different pattern was presented. In this design, the perceived level of fatigue based on Karolinska Sleepiness Scale (KSS) [61], was indicated by the participants after each 20 cycles, i.e. segment, in five seconds (KSS pause). The KSS can be rated from one (extremely alert) to 10 (extremely sleepy, can't wait to sleep).
Micro-breaks. Each experimental session involved either self-triggered or the biofeedback-triggered micro-breaks, respectively termed as manual and automatic sessions. In the manual sessions, the participants were instructed to press the right click button asking for a micro-break, whenever through the task they felt fatigued equivalent to KSS�5. When a micro-break was triggered by a right click (or more), the task execution was interrupted after the earliest upcoming KSS pause (Fig 1). In the automatic sessions, the biofeedback system triggered the micro-break based on its prediction of KSS (explained further in the subsequent section) being �5 [62,63]. In this study, the micro-break consisted of a 25-s interruption of the task, while the participant took an active pause. During the micro-break, a down counter of the seconds from 25 to zero was displayed on the computer screen (Fig 1). The green color has been shown to have restorative effects on attention and cognition [64,65]. The micro-breaks involved four repetitions of seated bilateral shoulder rotations with an elastic band where the shoulders were abducted horizontally up to 45˚while keeping the elbows fixed around 90˚. During the micro-break, the participants were also instructed to perform mindful breathing based on [66,67] where the participants were guided to become aware of their breathing. Besides the benefits of active pauses [68] especially during computer work [69], mindful breathing is associated with oxygenation and reduced mental load and stress to counteract sustained attention [70,71]. The breathing rate was at the participants discretion, due to the diversity and individuality in breathing patterns [72].
Familiarization and task engagement. The participants were instructed to perform the computer task and micro-breaks in four days prior to the first session. In addition, anthropometric measures, visual acuity, and general health and fatigue questionnaires were collected. Afterwards, the participants performed the computer task for 10-min. The participants received a brief overview of the experimental procedure also in the beginning of both sessions and performed the computer task for 5-min as an additional training before commencing the experimental protocol to reduce the learning effect. The participants were not informed about the principle of functioning of the biofeedback system. It was further explained that their choices of KSS had no effect on their performance or the biofeedback system. To evaluate the perceived workload from the tasks, the questionnaire of National Aeronautics and Space Administration Task Load Index (NASA-TLX) [73] was administered after the task termination, as it provides information on different dimensions of workload of a task [74,75]. The participants were informed that their performance was measured and compared with other participants to maintain motivation, and achieving high performance makes them candidates to win a monetary reward (100 Danish Kroner).

The development of a statistical model for fatigue detection
To implement the biofeedback system, a statistical model of fatigue was developed based on previously collected oculometrics dataset (OLDSET) during an identical computer task [38]. The OLDSET consisted of the oculometrics extracted from gaze positions and pupil dilations, and KSS ratings from 38 participants in 40-min samples of an identical computer task without micro-breaks as described in [38]. The state of fatigue for each segment was assigned based on the KSS scores obtained after each segment. The KSS scores were dichotomized based on a threshold value of five, corresponding to being "neither alert nor sleepy", as a transition point between alertness and fatigue. Thus, the segments with the KSS value of �5 were assigned to the class of fatigued and the KSS scores of <5 were assigned to the class of alert. This dichotomization criterion has been used in previous studies, e.g. [62]. It is suggested as a critical value in the association between ocular metrics and sleepiness [63]. With this dichotomization criterion, 45% of the collected segments (205 out of 456) across the entire subject pool in the OLD-SET were labeled as fatigued.
Thirty-four features including oculometrics, sex, and age were used in setting up the classifier (Table 1). A series of viable classification models were examined as outlined in Table 2. A feature subset consisting of five features, i.e. Blink Frequency (BF), Percentage of the duration of closed eyes to opened eyes (PERCLOS), Saccade Frequency (SF), Saccade Peak Velocity Amplitude Relationship (SVA), and Pupil Diameter Interquartile Range (PDIR), was chosen using sequential floating forward feature selection [76]. The feature selection helps to choose a combination of features that best explain the separability of the two classes [76]. The classification criterion was the Youden's J statistic or Youden's index TP P þ TN N À 1 À � [77], where P and N are the number of instances (segments) with respectively positive (fatigued) and negative (alert) labels, and TP, TN are respectively the number of true positive (correctly detected fatigue) and true negative (correctly detected alert) instances. Here, TP P and TN N are respectively True Positive Rate (TPR), and True Negative Rate (TNR). The Youden's index was computed using leave-one-person-out (LOPO) approach on a random forest model ( Table 2) [78]. Different classifiers as outlined in Table 2 were examined using the selected feature subset as input and the class labels of fatigued or alert as output. In the LOPO approach, the classifier was trained using the data from all the participants except one, and it was tested using the excluded participant. This approach was performed for all the 38 participants to compute the average of classification performance across the entire participant pool. Finally, the ensemble of Decision Trees (DT Ensemble) was chosen based on its superior classification performance in terms of accuracy ACC = (TP + TN)/(P + N) (66±21%), TPR (61±29%), and TNR (70±22%) in comparison with the classifiers listed in Table 3, i.e. linear discriminant analysis, decision tree, k-nearest neighbors, support vector machines, Naïve Bayes, feed-forward neural networks, subtractive clustering-based Fuzzy classifier, Fuzzy c-means classifier, logistic regression classifier, and random forest.
The classification model with the best performance (DT Ensemble) was picked to form the core of the biofeedback system. The DT Ensemble with the configuration outlined in Table 2 was trained with the whole dataset consisting of 456 samples (38 participants × 12 segments) to make a statistical model to predict the class label of each segment of the biofeedback system.
The permutation test [96] was conducted on the OLDSET to further examine the classification accuracy of the DT Ensemble against the chance level accuracy obtained from 100 randomly permuted class labels. The sensitivity to the dichotomization criterion for the KSS was also performed on the OLDSET. The classification performance of the DT ensemble was significantly higher than chance level as assessed by the permutation test for the OLDSET (α = 0.01). Changing the dichotomization criterion of KSS scores to �6 as fatigued for the DT Ensemble model did not lead to better classification performances than the criterion of five in the OLDSET. The receiver operating characteristics (ROC) of the training and test sets, and the confusion matrix for the DT Ensemble model are depicted in Fig 2, where the area under the ROC curves for the training (0.89) and test (0.64) sets as well as the results from the confusion matrix supported the usability of the model in the biofeedback system.
During the computer task in the current study, the feature set was obtained across 20 consecutive cycles within a segment and the core of the biofeedback system classified the segment into either the fatigued or the alert class. The section titled "Oculometrics" outlines the  [85] performed analysis to obtain the oculometrics as features. If the segment was classified as fatigued, the biofeedback system triggered the micro-break command following the KSS pause after that specific segment. If the segment was classified into the alert class, no feedback was given. The architecture of the biofeedback system is depicted in Fig 3. The approaches from feature selection to model evaluation in the development of the statistical model to predict fatigue is summarized in Fig 3, and thereby the two mechanisms to trigger the micro-breaks in the automatic and manual sessions are illustrated. All aspects of the biofeedback system were implemented in MATLAB R2018a (The Mathworks, Natick, MA).

Data acquisition and processing
A video-based monocular eye-tracker (Eye-Trac 7, Applied Science Laboratories, Bedford, MA, USA) coupled with a head tracker (Visualeyez II system set up with two VZ4000 trackers, Choices for hyperparameters were based on the recommended constraints [87] KNN k-Nearest Neighbors 11-nearest neighbors classifier using the Euclidean distance metric and an exhaustive searcher k = 11 was chosen based on the highest classification performance in a grid search of k in [1 21], where the upper limit came from ffi ffi ffi n p , where n is the number of the training samples [88], Standardized feature set [89] SVM Support Vector Machines 2 types of SVM each using 4 different kernels. 1) c-SVM (c = 10).

Feedforward Neural Network
One hidden layer including five neurons in the hidden layer with scaled conjugate gradient backpropagation function. Different membership functions were also tested, and the best was chosen (Gaussian) [93].

Fusion
Major voting scheme The same parameter settings for the single classifiers (LDA, DT, KNN, c-SVM with Gaussian kernel, NB, NN, FIS-SC and FIS-FCM) used.
The class (C Fusion ) was determined based on the average of the posterior probability (PP i ) of the single classifiers,

Logistic Regression Classifier
Using binomial logistic regression for the two classes of fatigued and alert No additional option for the model.

DT Ensemble
Ensemble of Decision Tree classifiers Robustness against noisy samples [95] due to the subjectivity of class labels. Choices for hyperparameters were based on the recommended constraints [87] Random Forest

Ensemble of Decision Tree classifiers
Max number of splits: 5, Number of trees:51, The function to measure the quality of a split was GDI Suitable for categorical variables (e.g. sex). Choices for hyperparameters were based on the recommended constraints [87] https://doi.org/10.1371/journal.pone.0213704.t002 An oculometrics-based biofeedback system to impede fatigue development during computer work  Phoenix Technologies Inc., Canada) was utilized to measure the eye movements, pupil diameter, and point of gaze at a sampling frequency of 360 Hz. The coupling of the eye-tracker and the head-tracker was done using built-in software to integrate eye and head positioning data and to compensate for head movements allowing free head movements during the experiment. As reported by the manufacturer, spatial precision of the eye-tracker is lower than 0.5˚of visual angle. The spatial accuracy is less than 2˚in the periphery of the visual field. The calibration of the eye-tracker was performed before starting the task with 9-point calibration protocol and examined before the task execution and after the task termination on the calibration points. The measured accuracy was on average 0.7 ± 0.4˚across participants and did not significantly change across time (p> 0.6). The experiments were conducted in a noise-and illumination-and temperature-controlled indoor room to rule out environmental confounding factors.
Oculometrics. Among all the features outlined in Table 1, the oculometrics were extracted from each segment. Saccades, blinks, and fixations were first identified for each segment using the algorithm of [97] as applied in [38]. Briefly, the algorithm initiated by the computation of visual angle between consecutive samples of point-of-gaze, followed by its derivatives to the angular velocity and acceleration using a 19-samples-length second-order Savitsky-Golay filter [97]. It applied data-driven thresholds on the angular velocity to detect saccadic samples. Zero-valued samples of pupil diameter, corresponding to closed eyes or missing pupil image provided by the built-in software of the eye-tracker, constituted blink samples, and the rest of the samples were assigned to fixations. Pupil diameter (including linearly interpolated zero-valued samples) were filtered using a zero-phase low-pass thirdorder Butterworth to remove noise and artefacts [98]. Additional constraints were imposed to exclude invalid ocular events [38]. The data during the micro-breaks and KSS pauses were not included in the computation of oculometrics.
The frequency of blinks (BF), saccades (SF), and fixations (FF) were computed respectively as the number of blinks, saccades, and fixations during each segment divided by the duration of the segment. The mean duration of blinks (BD), fixations (FD), and saccades (SCD) were computed across each segment. Pupillary responses were characterized using the mean, coefficient of variation, interquartile range, instantaneous phase [83] of pupil diameter, respectively indicated by PD, PCV, PDIR, and PH. The number of closed-eyes samples (zero-valued pupil diameter) to opened-eyes samples was computed as PERCLOS. Blinks were further characterized by the frequency of blinks coincided by gaze shifts >2˚(BGF) [79,99] and their ratio to the number of all blinks (BGR). The mean of inter-blink interval (IBI), the frequency of blinks occurring with IBI<700 ms (DBF), the number of long blinks >200 ms [80] to the segment duration (LBF), and the ratio of long blinks to all blinks (LBR). Saccades were further quantified in terms of the mean value of their peak velocity (SPV), amplitude (SA), curvature [59] (SCR), peak amplitude of saccadic acceleration (SPA) and deceleration (SPD) profiles, intersaccadic intervals (ISI) excluding ISI>250 ms, and the slope of the line regressing peak velocity of saccades to their amplitude (SVA) and duration (SDA). Similarly, fixations were further characterized as the ratio of long fixations (>0.9 s) [34] to all fixations (LFR).
Gaze dispersion was characterized using the mean value of gaze-point displacements and distances between two successive fixations respectively computed as Euclidean distance between the center of gaze points of two successive fixations (FF disp ), the summation of Euclidean distances between successive gaze-points from the onset to offset of saccades connecting the two successive fixations (FF dist ), and the ratio of the FF disp to FF dist for the same two successive fixations (FF disp/dist ). The successive fixations exceeding over feasible saccade duration of >100 ms in this study were excluded [100]. In addition, overall dispersion (OD) was quantified as the averaged Euclidean distance between fixation centers and center of fixations. The center of fixations, gaze points and fixation centers were obtained using the mean value of their corresponding coordinates. The dynamics of visual perception were also quantified using the kappa coefficient of ambient and focal attention (KPA) as defined in [85]. The mean of time intervals (<700 ms) between a blink and its successive saccade was also extracted as a feature in association with blink perturbation effects on saccades [101].
The selected features of SF, SVA, BF, PDIR and PERCLOS were computed in the biofeedback system in the same way as they were computed from the OLDSET. To inspect the effect of the biofeedback system, the mean of the overall performance (OP) [38] across segments was computed. It represents how accurate and fast the pattern replication was done. Theoretically, the OP is a positive value with zero for the lowest performance (no click on the targets).

Statistical analysis
The statistical analysis was performed in SPSS 25. The classification performance of the deployed model (DT Ensemble) in the biofeedback system was reported in terms of the ACC, Sensitivity (TPR), and Specificity (TNR). The classification performance (ACC) was compared between the manual and automatic sessions using repeated measures analysis of variance (RM-ANOVA) and the interaction effect of the time of the day (morning or afternoon) on ACC was also considered. RM-ANOVA was also performed on the outcome variables (OP, KSS and oculometrics) with the time spent doing the task, Time-on-Task (TOT) (nine segments), and the automatic and the manual modes as within-subject factors (significance level p = 0.05). Post-hoc comparisons between the segments were included in pairs indicated by Bonferroni correction. The Huynh-Feldt correction was applied if the assumption of sphericity was not met. The measure of effect size, partial eta-squared, Z 2 p , was also reported. The perceived workload (NASA-TLX scores) and the number of micro-breaks was compared between the automatic and manual sessions using paired t-test with the effect size in terms of Cohen's d z [102], to further evaluate the effectiveness of the biofeedback system. The normality of the variables was assessed using Shapiro-Wilk test. The KSS scores were transformed to normal distribution using square root transformation. Due to the counter-balanced order of the sessions, half of the first sessions were conducted in the manual mode (without biofeedback) and the rest half were conducted in the automatic mode (with biofeedback). The learning effect was analysed using RM-ANOVA to compare the OP across the first and second sessions. Sexrelated differences in the OP, and the NASA-TLX scores, and the ACC were also reported. The Spearman's rank-order correlation was computed between every pair of the following variables in the manual and automatic sessions. The variables were the mean and standard deviation of the OP, the relative change of the KSS scores as (KSS last segment − KSS first segment )/KSS first segment , the number of micro-breaks, and the total and the weighted subscale scores of the NASA-TLX.

Results
In the automatic sessions, the model (DT Ensemble) predicted fatigue with the following classification performance (Mean ± SD): ACC (69±16%), Sensitivity (59±35%), and Specificity (74±22%). The segments with fatigue label constituted 55 segments in total of 180 segments. Fig 4 demonstrated the classification performance (ACC) of the model in both sessions for each participant. There was neither a significant difference in the classification performance between the automatic and manual sessions, nor an interaction of the time of the day on the ACC. The ACC was approx. 10% higher on average in the male participants than in the female participants.
The OP in the presence of the biofeedback did not significantly change, F(1,18) = 1.3, p = .262, Z 2 p ¼ :1, (Fig 5a). The OP increased significantly as TOT increased, F(8,152) = 4.7, p < .001, Z 2 p ¼ :2. Pairwise comparisons revealed that the OP was significantly higher in segments eight and nine compared with two, five, and six, Fig 5a. In addition, there was no learning effect on the OP across the first and second sessions (Fig 5b). The participants reported significantly lower workload in the automatic (55±11) than the manual sessions (65±8) in terms of the total NASA-TLX scores, t(19) = 3.86, p = 0.01, with the Cohen's d z = 0.89 corresponding to a large effect size according to [103]. This improvement was more pronounced in mental and temporal subscales than the other workload subscales as demonstrated in Fig 6. To have an insight on the individual and sex-related differences in the perceived workload, the total NASA-TLX scores as well as the scores for all NASA-TLX subscales were depicted for each participant in Fig 7. In 60% of female and 70% of male participants, the total scores of  An oculometrics-based biofeedback system to impede fatigue development during computer work NASA-TLX were lower in the automatic sessions than in the manual sessions. The total scores were almost equal for the manual and automatic sessions for the participants number 11 and 15 (both females). In 70% of female and 60% of male participants, the mental demand was lower in the automatic sessions than in the manual sessions. 60% of female and 70% of male participants found the task less temporally demanding in the automatic sessions.
The KSS scores significantly increased in both of the manual and automatic sessions as the segments increased F(5.8,109.6) = 15.6, p < .001, Z 2 p ¼ :4, Fig 8. No significant change in the KSS scores was found between the automatic and manual sessions. However, a tendency of  An oculometrics-based biofeedback system to impede fatigue development during computer work biofeedback×TOT interaction was found, F(5.9,113.6) = 1.7, p = .129, Z 2 p ¼ :1. Pairwise comparisons showed that in the manual sessions, the KSS was lower in the first segment than in the segments 5-9, similarly between the segments 2-3 and 7-9, but in the automatic sessions, the significant difference was between the segments 1 and 2 being lower than both segments of 8 and 9.
The OP in automatic and manual sessions for each participant is illustrated in Fig 9. The participants are separated by their sex in Fig 9 to represent sex-related differences on the individual level. The occurrences of micro-breaks at the end of each segment is also indicated by "1" (and "0" for no micro-break) in Fig 9. More variations within the female participants can be observed in the OP than the male participants in both sessions. The standard deviation of the OP across segments averaged across the females were 0.061 whereas it was 0.056 for the males. In addition, the OP was higher in the 58% and 48% of the first segment following the micro-breaks compared with the segment prior to the micro-breaks, respectively for the male and female participants. There was no significant difference between the number of microbreaks in the automatic sessions (2.9±1.9) from the number of micro-breaks in the manual sessions (2.5±2.3), p = . 55.
The correlation analysis revealed that there are statistically significant relationships between a number of outcome variables in the manual sessions. The relative change of the KSS scores were positively correlated with the number of micro-breaks, r s = .47, p = .039. The mean and the standard deviation of the OP tended to be negatively correlated, r s = −.43, p = .060.
In the automatic sessions, there was a marginal correlation between the standard deviation of the OP and the number of micro-breaks, r s = .42, p = .068. The number of the micro-breaks tended to positively correlate with the relative change of the KSS scores, r s = .058, p = .043. Similar to the manual sessions, the mean and the standard deviation of the OP were negatively correlated, r s = −.47, p = .037. No significant correlation was found between the total and weighted subscale scores of the NASA-TLX in the manual and automatic session with any of the mean and standard deviation of the OP and the range of the KSS scores. The presence and absence of micro-breaks are indicated respectively by "1" and "0" at the end of each segment (X-axis). The OP is color coded with the color bar indicated on the right side of the graph with blue for lower and green for higher task performance.
decreased significantly as TOT increased, F(4.9,94.6) = 3.4, p = .007, Z 2 p ¼ :1. Pairwise comparisons revealed that the SF decreased significantly from segment 5 to 9. The SVA fluctuated significantly through TOT, F(8,152) = 2.2, p = .027, Z 2 p ¼ :1. The change between segments 1 and 2 was significant in SVA. No significant effect of TOT on PDIR was observed. Neither any significant effect of biofeedback nor biofeedback×TOT interaction was found in any of the oculometrics.

Discussion
This study provided a novel framework to investigate the application of a biofeedback system to reduce fatigue development in its early stages during computer work. The proposed biofeedback system deployed a statistical model of fatigue, which used quantitative features extracted from eye movements and pupillary responses, i.e. SF, PERCLOS, PDIR, BF, and SVA. The accuracy of the statistical model was promising considering the subjectivity of KSS scores. As hypothesized, the biofeedback system with the embedded micro-breaks, effectively counteracted fatigue development reflected in delayed trending towards fatigue (Fig 8) and decreased perceived workload (Figs 6 and 7).
The involved oculometrics (SF, PERCLOS, PDIR, BF, and SVA) have been previously reported to be reliable and sensitive to fatigue progression as well as mental load [33,34,38,59,104,105]. The PERCLOS and BF are reported to increase with fatigue [34, 106], which is in line with the current results. The decrease in SF and increase in BF along-side with TOT were also in agreement with previous findings [107]. Saccadic main sequence and the range of pupil diameter decreases and increased, respectively with TOT [38,108], but the SVA and PDIR did not change monotonically with TOT, most likely because of the presence of micro-breaks.
To the best of our knowledge, this was the first study to deploy a statistical model of fatigue in a biofeedback system to trigger objective micro-breaks, and thereby to elaborate self-awareness of fatigue. A few studies have contributed in noninvasive fatigue detection. In [109], mental fatigue was detected offline using 31 statistical features from saccades, fixations, blinks, and pupillary responses exhibiting 77.1% accuracy with 10-fold cross validation via an SVM classifier. Our biofeedback system reached approx. 70% of accuracy (with an estimated misclassification rate of 0.30 in 10-fold cross-validation of the model) using the five features (versus 31 in [109]), which may facilitate real-time applications. Nine 30-s data samples collected from each participant before and after two 17-min cognitive tasks, were used to detect mental fatigue in [109]. A numerical rating scale has been used as a subjective rating of fatigue in [109], however, the samples recorded before and after the cognitive tasks have been respectively labeled as non-fatigued and fatigued, regardless of individual differences in fatigue perception as opposed to the current study. A recent study using wearable electroencephalography has classified fatigue from alertness using an SVM classifier, based on KSS threshold of five, with the accuracy of about 65% in a 10-fold cross validation [110]. In [111], fatigue, subjectively labeled using a different rating scale, has been classified via a feedforward neural network using nine features extracted from computer user interactions with mouse and keyboard achieving an accuracy of 81% in a hold-out cross validation. The classification model proposed in [111], has been validated using the same group of individuals as opposed to the present study. Moreover, the features have been extracted over the period of one hour [111], which is much longer than a segment (�200 s) to trigger micro-breaks questioning the practical use of such approach.
A general limitation in the study of fatigue is the inaccuracies of subjective ratings (KSS scores). Although it is still one of the most commonly used methods to acquire fatigue level [112], it could be affected by factors such as experimental design [113] and individual's emotional state [114,115]. One may suggest the OP as an alternative to the KSS. However, the OP cannot necessarily be translated into fatigue levels in early detection of fatigue, Fig 5a. Additionally, the task performance may consistently change with TOT [7]. This is the main point of the studies investigating the maintenance of homeostasis in response to perceived fatigue via psychophysiological measures, e.g. heart rate variability [116].
Another important issue to consider is the effect of circadian rhythms on the accuracy of the fatigue state estimation [117]. Circadian rhythm is a source of variability in oculometrics [118,119] and cognition [120], which makes the prediction of fatigue quite challenging. The non-significant difference between the classification accuracy of the DT Ensemble model for the half of the participants who did the tasks in the morning (9:00-12:00) and the rest of the participants who did the tasks in the afternoon (13:00-15:00) gave support to the robustness of the model against circadian variations, (cf. Fig 4b).
An efficient and effective design for micro-breaks is quite challenging especially due to the complex interferences between physical and mental demands of a task [121,122]. Interestingly, reduced perceived workload was observed in the sessions where the micro-breaks were triggered by the biofeedback system compared with the manual sessions. More specifically, mental and temporal demands contributed more than the other subscales to the perceived workload in both manual and automatic sessions. The slight improvements of the OP, Fig 5a, and delayed inclination to fatigue, Fig 8, were observed through using the biofeedback system. Even though the improvement in the performance was statistically insignificant, one may conceive that in a long run the improvement may be of importance for the prevention of musculoskeletal disorders [16,17,123].
The relationships between the OP, perceived fatigue, and the number of micro-breaks in the manual and automatic sessions revealed some aspects of the two approaches to take microbreaks. The positive correlation of the number of the micro-breaks with the relative change of the KSS scores, and the standard deviation of the OP, may imply that the frequency of the micro-breaks was reflected in the variations in the performance and the perceived fatigue in the automatic session. The correlation of the number of micro-breaks and the relative change of the KSS scores were stronger in the manual sessions than the tendency in the automatic sessions, which may imply that the relative change in the perception of fatigue was less dependent to the number of micro-breaks in the automatic session. The negative correlation between the mean and standard deviation of the OP implies that as performance increases its variation is restrained. The lack of correlation of the number of the micro-breaks, the mean and standard deviation of the OP, and the range of the KSS scores the NASA-TLX scores suggests that none of these variables had any significant impact on the perceived workload of the participants. This finding indicates that the perceived workload cannot be reduced by just increasing the number of micro-breaks, giving support to the necessity of an intelligent approach to implement the micro-breaks [17].
The difference between the male and female participants were assessed on individual level in terms of their perceived workload (Fig 7), the accuracy of the model (Fig 4), and the effect of occurrences of micro-breaks on the OP (Fig 9). As mentioned in the results, the females on average exhibited similar OP to males, but at the individual level, the OP fluctuated more among females. The females perceived the task more demanding than the males in line with the higher prevalence of neck-shoulder complaints reported in females [124,125]. The performance of the statistical model to predict fatigue was higher for the males than females. Different factors may contribute to the sex-related differences, e.g. hormonal variations due to the menstrual cycle of the females [126]. Of note, the stages of the menstrual cycle were not assessed in the present study. Even though menstrual cycle may contribute to differences in fatigability among females, It is noteworthy however that this has not yet well established and contradicting results can be found in the literature regarding the menstrual cycle, e.g. [126,127]. The population size and design of the current study do not allow a thorough substantiation of the matter.
Some issues have important impacts on the design and assessment of the micro-breaks. The activities during micro-breaks should not demand for the same mental resources that a task might require [128]. Considering the multiple resource model [129], targeting the same mental resources may decline performance. Accordingly, in comparison with the task demands, the micro-breaks intuitively required little physical and mental demands as well as low vigilance to attend to down-counter displayed on the screen. The measurement of respiration rate to assess the breathings [71] was avoided to approach ecological validity for computer work as a limitation of this study. The quality of the mindful breathing could be assessed based on unobtrusive measurement of the respiration rate [130] or self-assessments of mindfulness [131]. In practice, to avoid too frequent and invalid micro-breaks, interactive micro-breaks [47] and model adaptation is suggested to study through the presented framework. In comparison with [132], the simplicity and effectiveness of the proposed microbreak as well as the unconstrained technique of eye tracking potentially meet constraints of out-of-lab settings.

Conclusion
In line with our hypothesis, this study shows for the first time that the integration of oculometrics-based biofeedback in the design of micro-breaks is effective in fatigue mitigation during computer work. The effectiveness of the oculometrics-based biofeedback was evidenced by the decreased perception of workload and further by the postponed inclination to fatigue using the biofeedback system compared with self-triggered micro-breaks. In sum, the use of oculometrics as objective indices of fatigue in a biofeedback system may be a viable approach to impede fatigue development.