Subject-independent decoding of affective states using functional near-infrared spectroscopy

Affective decoding is the inference of human emotional states using brain signal measurements. This approach is crucial to develop new therapeutic approaches for psychiatric rehabilitation, such as affective neurofeedback protocols. To reduce the training duration and optimize the clinical outputs, an ideal clinical neurofeedback could be trained using data from an independent group of volunteers before being used by new patients. Here, we investigated if this subject-independent design of affective decoding can be achieved using functional near-infrared spectroscopy (fNIRS) signals from frontal and occipital areas. For this purpose, a linear discriminant analysis classifier was first trained in a dataset (49 participants, 24.65±3.23 years) and then tested in a completely independent one (20 participants, 24.00±3.92 years). Significant balanced accuracies between classes were found for positive vs. negative (64.50 ± 12.03%, p<0.01) and negative vs. neutral (68.25 ± 12.97%, p<0.01) affective states discrimination during a reactive block consisting in viewing affective-loaded images. For an active block, in which volunteers were instructed to recollect personal affective experiences, significant accuracy was found for positive vs. neutral affect classification (71.25 ± 18.02%, p<0.01). In this last case, only three fNIRS channels were enough to discriminate between neutral and positive affective states. Although more research is needed, for example focusing on better combinations of features and classifiers, our results highlight fNIRS as a possible technique for subject-independent affective decoding, reaching significant classification accuracies of emotional states using only a few but biologically relevant features.


Introduction
Multivariate brain decoding (MBD) might allow the inference of mental states based solely on specific brain signals' features [1]. In comparison to conventional analytic methods, this approach has the advantage of considering the various brain regions simultaneously, thus positive, negative or neutral affective states [25]. Critically, those images were selected to balance the arousal dimension for the different valences. In the active task, participants were instructed to imagine personal situations with positive, negative or neutral affective contexts. We expected decoding performances higher than the chance level when comparing two different affective states in each task. Also, based on the affective-workspace model [26,27], we expected the relevant information used in classification to come from nodes of neural networks mainly comprising frontal and occipital areas [28,29].

Methods
Ethical approval for this study was obtained from the Federal University of ABC Ethics Committee.

Participants
Forty-nine healthy participants (25 female, 24.65±3.23 years) were enrolled in the first part of the experiment. Inclusion criteria were no previous (self-reported) diagnosis of neurological (ICD-10: G00-G99) and/or psychiatric disorders (ICD-10: F00-F99), and normal or correctedto-normal vision. All participants were attending college or graduated and provided written informed consent to participate in this study. Two years later, a second (and independent) sample of twenty healthy participants (10 female, 24.00±3.92 years) was collected by a different researcher, and in a separate laboratory (at the same university). The inclusion criteria and the experimental protocol were equal to the first sample. However, considering that subjects, environmental noises, and experimenter bias are different between datasets, we treat them as two independent datasets. For both cases, ethical approval was obtained from the local Ethics Committee, and no payment was provided to the participants, according to the national rules.

Functional NIRS acquisition
fNIRS measurements were conducted using the NIRScout System (NIRx Medical Technologies, LLC. Los Angeles, California) using an array of optodes (12 light sources and 12 detectors) covering the prefrontal, temporal and occipital areas. Optodes were arranged in an elastic band, with nine sources and nine detectors positioned over the frontal and temporal regions, and three sources and three detectors over the occipital region. Four positions of the International 10-20 System were adopted as reference points during the setup: sensors 1 and 9 were positioned approximately over the T7 and T8 locations, respectively, while the Fpz and Oz positions were in the center of channels 5-5 and 11-11, respectively, as shown in Fig 1. The source-receptor distance was 30 mm for adjacent optodes, and the used wavelengths were 760 and 850 nm. Signals obtained from the thirty-two channels were measured with a sampling frequency of 5.2083 Hz (determined by the maximum sampling frequency of the equipment-62.5 Hz-divided by the number of sources-12) using the NIRStar 14.0 software (NIRx Medical Technologies, LLC. Los Angeles, California).

Experimental protocol
During the test, participants sat in a padded chair with armrest, positioned 1-meter distance in front of a monitor. They were asked to remain relaxed, with hands within sight resting on the armrests or the table. They were also requested to avoid eye movements, as well as any body movement. The recording room remained dark during registration and the subject used earplugs.  Each subject completed an eleven-point Likert mood scale immediately before and after the  session, to evaluate the possible influence of mood on our results. This test quantifies her/his  agitation, strength, confusion, agility, apathy, satisfaction, worry, perspicacity, stress, attention,  capacity, happiness, hostility, interest, and introspection [30].
Reactive task. For the reactive task, we used images available on the international affective picture system (IAPS) catalog [25] This task consisted of twenty trials (5 for positive stimuli, 10 for neutral and 5 for negative). For the first 2 seconds of each trial, a white cross was presented in the center of a blank screen. During the next 30 seconds, a new figure was displayed every 5 seconds, totaling six randomly selected figures per trial corresponding to the desired affective class (Fig 2). Presenting a group of images with the same valence ensures the maintenance of cognitive engagement [31] and the required duration to achieve the peak of oxygen concentration change relative to baseline [32]. At the end of the trial, a new screen was presented asking the participant to assign a score from 1 to 9 for the subjective valence (1 -extremely negative valence; 9 -highly positive valence) and subjective arousal (1 -lower arousal; 9 -higher arousal) experiences. After this, a blank screen appears for a random duration between 2-4 seconds and participants were instructed to blink and/or move in this period but not in the other phases.
Active task. The active task (affective imagination) consisted of twenty trials (5 trials for positive affect, 5 for negative affect and 10 for resting with eyes open, also referred as neutral affect). Each trial started with a baseline period of a blank screen with a white cross in the center. After 2 seconds, the instruction (representing the desired emotion) appeared to the left of the display, remaining on the screen for 2 s (Fig 2). The instruction consisted of either a green arrow pointing up (positive affect), a red arrow pointing downward (negative affect) or a blue circle (neutral affect). For 30 seconds after the instruction was presented, the screen remained unchanged, corresponding to the participant's affective imagination period. At the end of the trial, a new screen was presented asking the participant to assign a score from 1 to 9 for the subjective valence (1extremely negative valence; 9 -highly positive valence) and subjective arousal (1 -lower arousal; 9 -higher arousal) experiences. After this, a blank screen appears for a random duration between 2 to 4 seconds and participants were instructed to blink and/or move in this period.
Considering that Mayer waves-related noise around 0.1 Hz might interfere in the classification of fNIRS data [33, 34], we used a conservative frequency range in our analysis (0.01-0.1 Hz) (a supplementary analysis with a broader filter is included in the S1 File). Thus, each participant's raw data were digitally band-pass filtered by a linear-phase FIR filter (0.01-0.1 Hz). Besides Mayer waves, this range also filters noises due to heartbeat (0.8~1.2 Hz) and respiration (0.3 Hz) [35,36]. Then, each wavelength was detrended by their respective whole length record (without segmentation), and variations in concentration of oxyhemoglobin and deoxyhemoglobin were calculated by the modified Beer-Lambert law (differential pathlength factor set to 7.25 and 6.38, respectively) [37]. Each concentration curve was then segmented into the 30 s of interest of each trial, for all studied conditions (Fig 3a).
The mean concentration of oxyhemoglobin and deoxyhemoglobin for each segment was calculated for each channel using the average of moving 2s-window means with 50% overlap. Thus, each subject's database was composed of 64 features of average concentration (32 channels × 2 chromophores) in 20 experimental conditions (10 neutral trials + 5 positive trials + 5 negative trials), for both tasks.
Machine learning procedure. To decode affective states across databases, we used the Linear Discriminant Analysis (LDA) implementation provided by the BCILAB toolbox [38].
To avoid overfitting the model to our dataset, the LDA was computed with its default parameters. This analysis was divided in two main steps represented in Fig 3b. First, we performed an LDA-based feature selection using the whole feature space from the first experimental sample (49 participants) (steps in blue in Fig 3b). The LDA model searches for a linear projection of multivariate observations to univariate ones [39]. During the LDAbased feature selection, the eigenvalues of the within-class covariance matrix in the LDA can be used as a measure of feature relevance [40-44]. Thus, after sorting these eigenvalues, larger values provide the most discriminative information, while smaller values indicate less relevant features. This approach was successfully employed as a feature selection step in different neuroimaging studies [41, 43-45], even outperforming other classifier-based feature selection approaches [45]. In this study, we first ranked the absolute eigenvalues from all features, and then created different subsets selecting the best 5 to 100% (with steps of 5%) features with highest eigenvalues (in other words, one subset using the top 5% features, another subset using the top 10%, and so on).
For each obtained feature subset, projections of the training and testing datasets described by the selected features only were built using the first (49 participants) and second (20 participants) experimental samples, respectively (steps in red in Fig 3b). Then, we evaluated the predictive performance on each corresponding test set. This approach was applied to evaluate affective conditions in pairs, following "Positive versus Neutral", "Negative vs. Neutral" and "Positive vs. Negative" combinations, for reactive and active tasks separately. Here we followed this binary approach since many current affective neurofeedback protocols are mainly based on two-states designs [24,[46][47][48][49][50][51].
Since the train and test sets are composed of two independent datasets, cross-validation analyses do not apply to this case. Thus, to evaluate the prediction performance in each comparison, we applied the trained model to decode data from each subject from the test set, and calculated accuracy as: (trials correctly classified from first class + trials correctly classified from second class)/2. This ratio avoids the potential unbalance of sample size in each comparison, and this random result should be 50%.
Two complementary analysis are described in the S1 File: one repeating exactly the same procedure using a Support Vector Machine classifier; other using the LDA classifier in a leaveone-subject-out cross-validation including all 69 subjects.
Statistical analysis. Considering that the results here presented are sum-based proportions, and no ceiling-effect was observed, these data assume approximately normal distributions. Thus, here we applied parametric t-test to evaluate our results.
First, the subject scores of valence and arousal of the second database were evaluated by a two-sample t-test. The comparisons used the mean values of valence and arousal assigned during positive, negative and neutral trials. This procedure was repeated for both tasks. The pvalues were Bonferroni corrected for multiple comparisons (2 subjective measures × 3 comparisons of conditions × 2 tasks).
To evaluate possible influences of mood fluctuations on the classification performance, we calculated the Spearman's correlation between the best classification accuracies in each block and the pre-experiment mood scores, as well as between the best classification accuracies and the delta (post-experiment minus pre-experiment) mood scores. P-values were Bonferroni corrected for 64 multiple comparisons (2 experiment blocks x 2 time points x 16 mood measures).
The significance of classification accuracy of each comparison was evaluated by a one sample t-test, comparing the prediction performance from all participants in this comparison against the chance level (50%). Each result was independently adjusted by using the Bonferroni correction for 120 multiple comparisons (20 different percentages of features × 3 comparisons of conditions × 2 tasks). Table 1 presents the mean subjective scores of arousal and valence relative to the positive, negative and neutral conditions, during both active and reactive tasks. Moreover, the corrected pvalues of each comparison are also presented.

Subjective scores of arousal and valence
Valence scores presented significant differences for all the comparisons. It is relevant, however, that no significant difference of the arousal scores was found during "Positive vs. Negative" comparisons, as well as during "Positive vs. Neutral" comparison during the reactive task.
No significant correlations were found between pre-experiment mood scores and classification accuracies, or delta mood scores and classification accuracies. These results suggest that the classification results resulted from temporary affective changes induced during the respective trial, and not due to mood fluctuations not related to the experiment.

Classification results
Boxplots describing the LDA accuracy are presented in Fig 4. Classification accuracy for the reactive task significantly exceeded chance level in "Positive vs. Negative" comparisons, with highest mean result as 64.50 ± 12.03% (mean ± standard-deviation, p<0.01) using 20% of features (12 channels), and in "Negative vs. Neutral" comparisons (68.25 ± 12.97%, p<0.01, 30% of features-19 channels). No significant differences were found for "Positive vs. Neutral" comparisons.
For the active task, accuracies were greater than chance level in "Positive vs. Neutral" comparisons, and the highest mean accuracy was 71.25 ± 18.02% (p<0.01, with maximum value for one single participant equals to 100%) using only 5% of the features (3 channels). No significant differences were found for "Positive vs. Negative" and "Negative vs. Neutral" comparisons.

Relevant channels
In Fig 5, the channels selected in each of the highest significant results reported in Section 3.2 are shown alongside their respective weights assigned by the LDA classifier. The location of the channels follows the same order shown in Fig 1. Among the "Positive vs. Negative" classification during the reactive task, the highest accuracy was achieved using 12 features. Nine of these features correspond to information about Table 1. In the first half of the table, mean ± standard-deviation of subjective scores (ranging from 1 to 9) assigned by each subject and for each trial of the second database during negative, neutral and positive trials. In the second half, p-values of the comparison between the scores assigned for each trial type. the deoxyhemoglobin concentration. According to the 10-5 EEG electrode positioning system [52], four channels are approximately at the frontopolar area, two at the anterofrontal area, and three at the occipital area. Complementary, three features included information about the oxyhemoglobin concentration at the frontopolar (two channels) and the occipital (one) areas.

Mean subjective scores
For the "Negative vs. Neutral" classification during the reactive task, 19 channels carried relevant information: six frontopolar, five anterofrontal, one frontal, and three occipital channels for the deoxyhemoglobin concentration; and two frontopolar and two occipital channels for the oxyhemoglobin concentration.
Finally, the "Positive vs. Neutral" decoding during the active task used only three channels to achieve the highest performance: two frontopolar and one anterofrontal. These channels provided information about the deoxyhemoglobin concentration. The corresponding averaged time series are shown in Fig 6, as well as the averaged values used as inputs to the LDA classifier. It is notable that the time series from second sample have higher variability, since the number of subjects (20) is smaller than the first sample (49). However, it is also notable that

Affective decoding
First, it is essential to highlight the absence of statistical significance in the subjective arousal score during the "Positive vs. Negative" comparisons. In addition to the significant difference in the valence scores, it suggests that any classification result in these comparisons is exclusively related to valence differences between positive and negative affect. Further, during both "Positive vs. Neutral" and "Negative vs. Neutral" comparisons, statistical differences were found for valence and arousal scores. It is expected that Positive and Negative present differences in arousal and valence when compared to absent affective content (Neutral). Some studies suggest valence and arousal not as two orthogonal dimensions, but as a single V-shaped dimension [53]. Accordingly, these two dimensions cannot be separated in most of the emotional stimuli systems (for example, the International Affective Pictures System) [27,54]. Given this, we have demonstrated that it is possible to detect distinct patterns of hemodynamic activity generated by different affective valence elicitation tasks. Our results also provide For the decoding of the reactive task, only the accuracies for "Positive vs. Negative" and "Negative vs. Neutral" comparisons were higher than chance level. This finding is in line with previous inter-subject studies using the IAPS database to decode affective states, but with fMRI [13]. In a qualitative evaluation, some participants described, after the experiment, that the negative affect induced by the pictures presented during the reactive block was more intense than the positive one. Negative figures into the IAPS catalog are mainly related to death, malnutrition, sickness, poverty, and disgust [25], which are more consensual than the content of some positive stimuli, such as babies, pets or beaches. Moreover, some studies suggest that processing negative stimuli are more demanding in the brain [55], which might generate more clear signals to our classifiers than the neural processing of positive stimuli.
The decoding of the active task, on the other hand, presented significant accuracy for the "Positive vs. Neutral" condition exclusively. The participants also described that the positive imagery during active condition was more natural to achieve and more intense than the negative representation. This result is particularly interesting for potential neurofeedback applications [4,56]. For example, in psychiatric applications where patients present increased levels of negative emotions and decreased levels of positive emotions [57], improving the selfregulation between Positive and Neutral states might be clinically relevant [24].

Relevant features and the neural networks of affect
For both results in the reactive task, relevant features included both fNIRS derived oxyhemoglobin and deoxyhemoglobin concentrations measures. In fact, each chromophore is expected to provide different and complementary information. Deoxyhemoglobin is thought to be an indicator of functional activation and is more directly related to the fMRI BOLD signal [15,58], while a recent work found a correlation between oxyhemoglobin signal and the EEG band power variation in some cortical regions [59]. Other decoding experiments also reported both hemoglobin concentrations as relevant for classification [19]. These results suggest a nonredundancy of these measures.
In accordance with two meta-analyses of neuroimaging data in affective tasks, we found that occipital areas signals were among the relevant features for classification [28,29]. Additionally, fNIRS studies also described activation of the occipital cortex during affective stimulation [60,61]. Minati and colleagues, for example, found increased occipital response to positive and negative affects relative to neutral pictures of the same database we have used (IAPS) [62]. Moreover, it is interesting to note that the classifier assigned high absolute weights for a considerable number of channels from frontopolar and anterofrontal regions. A frontal lateralization of activity during the experience and regulation of emotional responses is a classical effect [63][64][65][66]. However, a broader theoretical account of affective processing, affective workspace hypothesis, states that activity patterns of the same core neural network could implement both positive and negative affective responses [26,27]. Our application of subjectindependent affective decoding, which considers distributed brain activity, is in line with the theoretical assumptions of such model. Frontal regions, which were consistently relevant for classification in our experiments, superpose to the network nodes that have been proposed as core regions of the affective neural workspace [28, 67].

Subject-independent designs
fNIRS-based affective neurofeedback protocols were recently applied to healthy [24,68] and psychiatric populations [69], including patients with schizophrenia [70], autism disorder [71], and attention-deficit/hyperactivity disorder (ADHD) [72,73]. However, all these protocols have subject-specific designs which require training blocks or calibration trials for every experimental session. In addition to the expected duration of the setup preparation, these demands can act as stressors for patients presenting anxiety symptoms, or cause physical and mental fatigue shortly after the initial blocks [6]. Consequently, patients may not achieve their best performance, or the therapeutic benefits associated with the protocol. All these aspects make the investigation of subject-independent fNIRS protocols a timely research topic.
Critically, classification accuracies presented in this paper are lower than those presented in our previous subject-specific study [22]. However, the performance drop during subject-independent designs is an expected effect due to the inter-subject variability of anatomy, level of stress, rest, among others. For example, Robinson and colleagues report more than 10% of performance reduction from a subject-dependent to a subject-independent fNIRS-based motor imagery neurofeedback [74].
Also, the mean accuracy for "Positive vs. Neutral" discrimination in the active block was slightly over the 70% threshold suggested by the brain-computer interface and neurofeedback communities as sufficient to perform device control and communication [19,75]. This is a crucial finding since the differentiation between the active elicitation of positive affect and a neutral (resting-state) condition is the state of the art of hemodynamic-based neurofeedback protocols applied to both healthy and psychiatric populations [24,47,49,50,76,77]. It is also important to emphasize that this result was reached using only three channels. Considering future applications, the use of only three channels means shorter setup procedure, lower instrumental and computational costs, and the possibility of even more portable systems.

Limitations and future perspectives
In subject-independent designs using fMRI data, the maximum accuracy for decoding data from new individuals is 100%, and averages varying from 60% to 80% depending on the number of voxels (from 2000 to 4000) included as predictors [13]. Although our best results belong to the same accuracy range from fMRI studies, fMRI studies outperform fNIRS experiments. However, comparisons should consider the differences in the spatial resolution of both methods. While these experiments include voxels from subcortical regions, such as the amygdala [13], or even from specific neuron populations [78], affect decoding using fNIRS relies on data from external layers from prefrontal gyri [16,17]. This is a relevant limitation that should be considered when planning future experiments.
Despite the controls we implemented in our experimental design, collection and analysis, limitations should be recognized. First, although fNIRS signal is mainly related to the nearinfrared light absorption into the cortical surface, it also encompasses peripheral responses, including changes in skin, muscular and cranial blood flow and the aerobic process of energy consumption related to muscle contractions [16]. Consequently, some authors refer to fNIRS applications for control of devices such as corporeal machine interfaces [19]. As this constitutes one fundamental issue of this method, further validation using this approach is warranted. Here, one approach that could be explored in future studies is the use of shortseparation channels to filter systemic hemodynamic fluctuations from non-neural sources [79,80].
Also, we used minimal preprocessing steps, classifying fNIRS data using the mean changes in oxy and deoxyhemoglobin concentrations as features. This approach was used in order to test fNIRS based MDB of affective state with the least assumptions possible, to avoid introducing spurious artifact in the signal and to test the feasibility for posterior real-time analysis [81,82]. In comparison, recent studies identified that combining the mean hemoglobin concentration with other temporal and time-frequency features improves the decoding accuracies reaching values close to 90% in within-subject decoding [19,83,84]. Therefore, future studies should also evaluate the effect of different feature extraction techniques to the inter-participants MBD of affective states. Regarding experimental design, it is important to emphasize that potential differences in perceptual and semantic complexity of positive and negative images and of the phenomenological features of the affective states (e.g., vividness, content of memory retrieval) were not controlled in depth detail. Exploring the effects of such differences on valence decoding and investigating the possibility of disentangling such differences using subject-independent fNIRS-based classification approaches would constitute prolific lines of enquiry in affective neurofeedback.
However, even with these limitations, we reinforce the advantages of fNIRS compared to other neuroimaging techniques in implementing MBD systems. Particularly, the portability, benefit-cost relation and the good balance between spatial and temporal resolutions makes fNIRS especially suitable for this purpose [16]. In fact, these aspects create a solid base for an increasing number of studies using fNIRS for MBD of motor, cognitive, and more recently for decoding of affective states [19,20,22]. In subject-independent designs, experiments can also be found classifying semantic experiences [85] or audiovisual processing [86]. Additionally, the portability of fNIRS systems allow a fast evolution of naturalistic experiments targeting real-world applications [87,88], such as affective neurofeedback systems for therapeutic purposes [24,69].

Conclusion
Although more experiments are necessary to increase the classification accuracies reached here, the results from this pilot study suggest that fNIRS is a potential tool for subject-independent decoding of affective states. The accuracy in discriminating positive and neutral emotions during the affective task was significant and above the threshold desired for effective control of BCI or neurofeedback protocols. Thus, the construction of fNIRS-based subject-independent neurofeedback devices should be attempted.