Failure to modulate reward prediction errors in declarative learning with theta (6 Hz) frequency transcranial alternating current stimulation

Recent evidence suggests that reward prediction errors (RPEs) play an important role in declarative learning, but the underlying neurophysiological mechanism remains unclear. Here, we tested the hypothesis that RPEs modulate declarative learning via theta-frequency oscillations, which have been related to memory encoding in prior work. For that purpose, we examined the interaction between RPE and transcranial Alternating Current Stimulation (tACS) in declarative learning. Using a between-subject (real versus sham stimulation group), single-blind stimulation design, 76 participants learned 60 Dutch-Swahili word pairs while theta-frequency (6 Hz) tACS was administered over the medial frontal cortex (MFC). Previous studies have implicated MFC in memory encoding. We replicated our previous finding of signed RPEs (SRPEs) boosting declarative learning, with larger and more positive RPEs enhancing memory performance. However, tACS failed to modulate the SRPE effect in declarative learning and did not affect memory performance. Bayesian statistics supported evidence for an absence of effect. Our study confirms a role of RPE in declarative learning, but also calls for standardized procedures in transcranial electrical stimulation.


Introduction
Declarative memory consists of memory for facts and events that can be consciously recalled [1,2]. Memoranda are learned rapidly, often after a single exposure [3]. The process of acquiring such memories is called declarative learning. Declarative memory differs from procedural memory, where a skill is learned slowly and by means of repeated practice (e.g., learning how to drive a car). Research has firmly established that prediction errors modulate declarative memory [4], just like they do in procedural memory [5]. Recent research shows that reward prediction errors (RPE; i.e., mismatches between reward outcome and reward prediction) specifically may facilitate memory formation. RPEs were primarily studied within procedural learning (e.g., [6]). However, recent evidence suggests that RPEs are crucial for declarative learning as well [7][8][9]. Unfortunately, the evidence for theta modulation of RPEs in declarative memory thus far remains correlational only. With the rise of non-invasive brain stimulation (NIBS) techniques, the causal role of neural oscillations and their relation to behavior can be explicitly tested [37]. More specifically, transcranial Alternating Current Stimulation (tACS) allows modulating neural oscillations [38]. It is hypothesized that tACS causes underlying brain networks to synchronize or desynchronize. Although tACS has rather low temporal and spatial resolution, its frequency resolution is high. By applying a weak sinusoidal current to the scalp, the likelihood of neural firing is increased or decreased, depending on the stimulation parameters [39]. Ongoing neural oscillations can thus be entrained at specific frequencies of interest [39]. This synchronization modulates brain activity and alters cognitive processes, leading to behavioral changes, which can be measured through, for example, memory performance [40].
Whereas several tACS experiments entraining oscillations at theta frequency examined effects on working memory [41][42][43][44][45][46], only a few studies have investigated effects on declarative memory. [47] applied theta-frequency tACS over the right fusiform cortex while face and scene pairs were encoded. Here, stimulation enhanced memory performance measured after a 24-hour delay. Similarly, [48] found enhanced long-term memory performance after applying theta-frequency tACS over the right posterior cortex while participants learned face-monetary value pairs. To the best of our knowledge, no study has examined the effects of theta-frequency tACS over MFC in relation to declarative learning.
Together, these findings suggest that RPEs are projected from brainstem to MFC; elicit theta phase synchronization between several neural areas; and thus boost declarative learning. As such, the goal of the current study was to use theta-frequency (6 Hz) tACS to entrain neural oscillations whilst encoding new word pairs associated with RPEs of different sizes and values. To this end, tACS was applied over the MFC while participants acquired 60 Dutch-Swahili word pairs using the variable-choice experimental paradigm. We hypothesized that if declarative learning is modulated by theta oscillations in MFC, then subsequent memory performance and certainty ratings should be modulated by tACS (i.e., higher recognition accuracies and certainty ratings in the real compared to sham stimulation group); and if theta oscillations are driven by RPE, as the literature review suggests, tACS and RPE should interact.

Participants
We tested a total of 77 healthy, Dutch-speaking participants. One participant was excluded from further analysis due to below chance level performance on the recognition test. The analyses were run on the remaining 76 participants (57 females, range = 18-29 years, M age = 20.8 years, SD age = 2.4 years). All participants had no prior knowledge of Swahili, gave written informed consent, were randomly assigned to a real (N = 38) or sham (N = 38) stimulation group, and were paid €17.5. The study was approved by the Medical Ethics Review Board of the Ghent University Hospital and was carried out in accordance with the Declaration of Helsinki.

Material
A total of 330 words (66 Dutch, 24 Japanese and 240 Swahili words) (S1-S4 Tables) were used. Each participant memorized 60 Dutch-Swahili word pairs. The experiment was run on an HP ProBook 6560b laptop with a 15.6" screen size running PsychoPy software (version 1.85.4) [49].

Experimental paradigm
Familiarization task. Participants started with a familiarization task using the stimuli in the experiment, to control for the novelty of the foreign Swahili words. All Dutch (N = 60) and Swahili (N = 240) words were randomly and sequentially presented on the screen for a duration of two seconds. Participants were asked to press the space bar whenever a Dutch word was presented.
Acquisition task. Prior to the actual acquisition task, a total of six practice trials with Dutch (N = 6) and Japanese (N = 24) words was presented. After successfully finishing the practice set, participants were presented with the acquisition task. Here, the aim was to learn 60 unique Dutch-Swahili word pair associations. On each trial, one Dutch word was shown together with four Swahili translations (Fig 1A). After four seconds, frames surrounded the eligible Swahili translations. Either one, two or four Swahili translations were framed. In the one-option condition, one Swahili translation was framed and participants could only choose this Swahili word as the translation for the Dutch word. In the two-option condition, two Swahili translations were framed and participants could choose between two options. In the four-option condition, all four Swahili translations were framed and participants could choose among these four options. The probability of choosing the correct Swahili translation was therefore 100% (in one-option condition trials), 50% (in two-option condition trials), or 25% (in four-option condition trials). Importantly, each trial was associated with a specific RPE value by fixing a priori whether a trial was rewarded or not and the number of eligible Swahili translations. As a result, participants did not learn the actual Swahili translations of the Dutch words. They were unaware of this manipulation during the experiment, but were debriefed afterwards. Note also that although not explicitly communicated to the participants, there was a clear, normatively correct choice that had to be remembered on each trial. The intention of the experiment was also made clear by the colors (i.e., red/green) and the feedback (i.e., wrong/correct) that were used in the acquisition task. Participants responded with the index and middle finger of the right and left hand.
For stimulation purposes, trial duration was controlled by instructing participants to make their choice as soon as the fixation cross turned blue. If no choice was made after two seconds, the fixation cross turned red, urging participants to choose as soon as possible. To ensure that stimulation was given throughout the entire duration of the acquisition task, total time spent in the acquisition task was equated for each participant. Specifically, if participants made a choice less than two seconds after the fixation cross turned blue, feedback was delayed by the remaining time (two seconds minus the choice duration). After participants made their choice, the fixation cross turned into a blue "o" indicating that their response had been registered. They were then provided with feedback where they saw the Dutch word, an equals sign, and the to-be-learned Swahili translation (in green for correct choices and in red for incorrect choices) for a duration of five seconds. This was followed by reward feedback (+0.5 Euros for correct choices and +0 Euros for incorrect choices) and a reward update telling them how much money they had earned up until the last completed trial (two seconds). After every ten trials, the acquisition task was briefly paused for ten seconds to allow an impedance check.
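The timing rule for fast choices can be illustrated with a short Python sketch. It is not the PsychoPy script used in the experiment: the names `CHOICE_WINDOW_S` and `feedback_delay` are hypothetical, and the sketch assumes that choices slower than two seconds are followed by feedback without any extra delay, which the text does not state explicitly.

```python
# Illustrative sketch only; names are hypothetical, not taken from the study code.
CHOICE_WINDOW_S = 2.0  # choice window after the fixation cross turns blue


def feedback_delay(choice_duration_s: float) -> float:
    """Return the padding (in seconds) inserted before feedback.

    Fast choices are padded so that choice time plus padding equals the
    two-second window. Slow choices get no padding (an assumption).
    """
    if choice_duration_s < 0:
        raise ValueError("choice duration must be non-negative")
    return max(0.0, CHOICE_WINDOW_S - choice_duration_s)


for rt in (0.5, 1.5, 2.4):
    print(f"choice after {rt:.1f} s -> feedback delayed by {feedback_delay(rt):.1f} s")
```

Padding fast choices in this way keeps the time between the blue fixation cross and feedback at two seconds or more on every trial, which is what equating total task time requires.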
Design. Parametric modulation of RPEs was accomplished by fixing a priori the number of options (one, two or four) and reward on each trial (reward/no reward). This allowed the computation of an RPE for each cell of the design (Fig 1B). In addition, the proportion of trials in each cell of the design matched the reward expectation (i.e., 100% rewarded trials in the one-option condition, 50% rewarded and 50% non-rewarded trials in the two-option condition, and 25% rewarded and 75% non-rewarded trials in the four-option condition).

PLOS ONE
SRPEs were obtained by subtracting reward probability from reward outcome. For rewarded trials, reward outcome is equal to one, whereas reward outcome is equal to zero for unrewarded trials. Reward probability is determined by the number of options. URPEs are computed by taking the absolute value of the SRPE.
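As a worked illustration of these definitions, the following Python sketch computes the SRPE and URPE for each cell of the design. The function names are hypothetical, and the sketch assumes that reward probability equals one divided by the number of framed options, as implied by the 100%, 50% and 25% values given for the acquisition task.

```python
from fractions import Fraction

OPTIONS = (1, 2, 4)  # number of framed Swahili translations per condition


def srpe(n_options: int, rewarded: bool) -> Fraction:
    """Signed RPE: reward outcome (1 or 0) minus reward probability."""
    outcome = 1 if rewarded else 0
    return outcome - Fraction(1, n_options)


def urpe(n_options: int, rewarded: bool) -> Fraction:
    """Unsigned RPE: absolute value of the signed RPE."""
    return abs(srpe(n_options, rewarded))


def design_cells():
    """List (n_options, rewarded, SRPE, URPE) for every cell of the design."""
    cells = []
    for n in OPTIONS:
        for rewarded in (True, False):
            if n == 1 and not rewarded:
                continue  # one-option trials are always rewarded
            cells.append((n, rewarded, srpe(n, rewarded), urpe(n, rewarded)))
    return cells


def mean_srpe(n_options: int) -> Fraction:
    """Average SRPE when the share of rewarded trials equals reward probability."""
    p = Fraction(1, n_options)
    return p * srpe(n_options, True) + (1 - p) * srpe(n_options, False)


for n, rewarded, s, u in design_cells():
    print(f"{n} option(s), rewarded={rewarded}: SRPE={float(s):+.2f}, URPE={float(u):.2f}")
```

Because the proportion of rewarded trials in each condition matches the reward probability, `mean_srpe` returns zero for every condition: positive and negative prediction errors balance out within each number of options.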
Recognition test. In the recognition test, participants' recognition was tested on the 60 Dutch-Swahili word pairs that were acquired during the acquisition task (Fig 1A). On each trial, one Dutch word was shown together with the same four Swahili translations from the acquisition task. Spatial positions of the Swahili translations were randomly shuffled relative to the acquisition task to prevent participants from responding based on spatial position instead of the learned translation of the Dutch word. In contrast to the acquisition task, no frames surrounded the Swahili translations, and no feedback was provided. No time limit was imposed. At the end of each trial, participants rated their certainty on a four-point scale ("very certain", "rather certain", "rather uncertain", "very uncertain").

Sensations questionnaire
A subset of participants (N = 61) filled out a sensations questionnaire [50] (S1 File). Participants rated seven sensations (itching, pain, burning, warmth/heat, pinching, metallic/iron taste and fatigue) on a five-point scale (none, mild, moderate, considerable, strong). They were also asked when the discomfort began, how long the discomfort lasted and how much these sensations affected their performance. The sensations questionnaire was used to verify whether participants in the real and sham stimulation group report a difference in sensations.

tACS stimulation
tACS stimulation was applied using a DC-stimulator Plus device (NeuroConn GmbH, Ilmenau, Germany). Two saline-soaked sponge electrodes (5 x 6.5 cm²) were placed on the scalp and neck. The stimulation (red) electrode was positioned at FCz (according to the 10-20 positioning system), targeting the MFC, while the reference (blue) electrode was placed on the neck (Fig 1C). The sponge electrodes were fixed onto the participant's head with elastic fabric bands. Impedance between electrodes was kept below 15 kΩ. Participants received tACS stimulation at the theta (6 Hz) frequency with an intensity of 2 mA (peak-to-peak; mean 0 mA). A sinusoidal stimulation waveform was used with no DC offset and a phase shift of zero degrees. A fade-in and fade-out period of 5 seconds (30 cycles) was used. tACS was administered during the entire acquisition task for a duration of 16.6 minutes (6000 cycles) in the real stimulation group, while the sham stimulation group received 40 seconds (240 cycles) of stimulation at the beginning of the acquisition task only. Sham stimulation duration was deliberately kept short to avoid changes in cortical excitability [51,52]. Current flow was simulated using the ROAST (Realistic vOlumetric Approach to Simulate Transcranial electric stimulation) toolbox [53] in MATLAB (Fig 1D).
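The relation between stimulation duration and number of cycles, and the shape of the stimulation waveform, can be sketched in Python as follows. This is an illustration of the stated parameters, not the stimulator's firmware: the function names are hypothetical, and the sketch assumes a linear fade envelope that lies inside the total stimulation duration, which the text does not specify.

```python
import math

FREQ_HZ = 6.0                          # theta stimulation frequency
PEAK_TO_PEAK_MA = 2.0                  # intensity, peak-to-peak
AMPLITUDE_MA = PEAK_TO_PEAK_MA / 2.0   # zero-mean sine peaks at +/- 1 mA
FADE_S = 5.0                           # fade-in and fade-out duration


def cycles(duration_s: float) -> float:
    """Number of stimulation cycles contained in a given duration."""
    return duration_s * FREQ_HZ


def seconds_from_cycles(n_cycles: float) -> float:
    """Duration (in seconds) needed to deliver a given number of cycles."""
    return n_cycles / FREQ_HZ


def current_ma(t_s: float, total_s: float) -> float:
    """Current (mA) at time t_s: a zero-phase sine with no DC offset.

    The linear fade-in/fade-out envelope is an assumption of this sketch.
    """
    if not 0.0 <= t_s <= total_s:
        return 0.0
    envelope = min(1.0, t_s / FADE_S, (total_s - t_s) / FADE_S)
    return AMPLITUDE_MA * envelope * math.sin(2.0 * math.pi * FREQ_HZ * t_s)


print("fade period in cycles:", cycles(FADE_S))
print("sham stimulation in cycles:", cycles(40.0))
print("real stimulation in minutes:", seconds_from_cycles(6000) / 60.0)
```

At 6 Hz, 6000 cycles correspond to 1000 seconds, which matches the reported duration of roughly 16.6 minutes.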

Data analysis
Both frequentist and Bayesian statistics were calculated. With regard to frequentist statistics, all data were analyzed within the linear mixed effects framework in R software [54], unless mentioned otherwise. For continuous dependent variables (i.e., certainty ratings in the recognition test) linear mixed effects models were used, while for categorical dependent variables (i.e., recognition accuracy) generalized linear mixed effects models were applied. A random intercept for participants was included in each model, while all predictors (i.e., accuracy, SRPE and stimulation) were mean-centered. Note that SRPEs were treated as a continuous predictor allowing the inclusion of all 60 trials per participant to estimate its regression coefficient, with the exception of invalid trials (i.e., trials on which a non-framed Swahili translation was chosen during the acquisition task). We report the χ² statistics from the ANOVA Type III tests. All data are made publicly available at OSF (DOI 10.17605/OSF.IO/ZXHQ4).
In addition to frequentist statistics, Bayesian repeated measures analyses of variance (ANOVAs) are reported that were performed in JASP (version 0.11.1; [55]). In Bayesian ANOVAs, recognition accuracy and certainty ratings were analyzed as a function of SRPE and stimulation. Bayes factors (BFs) quantify the evidence in favor of the null hypothesis (BF01; e.g., tACS does not influence memory performance) or the alternative hypothesis (BF10 = 1/BF01; e.g., tACS influences memory performance). BF01 is reported when the Bayesian analysis provides relatively more evidence for the null hypothesis; BF10 is instead reported when the analysis provides relatively more evidence for the alternative hypothesis. We used default prior settings for all analyses [56]. To determine the strength of evidence, we used Jeffreys' benchmarks [57], with BFs corresponding to anecdotal (0-3), substantial (3-10), strong (10-30), very strong (30-100) or decisive (>100) evidence.
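The reporting convention and the evidence categories described above can be summarized in a short Python sketch. The function names are hypothetical and this is not part of the JASP analysis. How values that fall exactly on a boundary (3, 10, 30 or 100) are classified is an assumption of the sketch; the input values in the usage lines are illustrative and are not results of the study.

```python
def bf10_from_bf01(bf01: float) -> float:
    """Convert evidence for the null (BF01) into evidence for the alternative (BF10)."""
    if bf01 <= 0:
        raise ValueError("a Bayes factor must be positive")
    return 1.0 / bf01


def reported_bf(bf01: float):
    """Return the Bayes factor in the direction with relatively more evidence."""
    if bf01 >= 1.0:
        return ("BF01", bf01)
    return ("BF10", bf10_from_bf01(bf01))


def evidence_label(bf: float) -> str:
    """Classify a reported Bayes factor using the benchmarks listed in the text.

    Lower bounds are treated as inclusive (an assumption), except that
    'decisive' requires a value strictly greater than 100.
    """
    if bf <= 0:
        raise ValueError("a Bayes factor must be positive")
    if bf < 3:
        return "anecdotal"
    if bf < 10:
        return "substantial"
    if bf < 30:
        return "strong"
    if bf <= 100:
        return "very strong"
    return "decisive"


# Illustrative values only.
for bf01 in (2.0, 0.2, 50.0):
    name, value = reported_bf(bf01)
    print(f"{name} = {value:.2f}: {evidence_label(value)} evidence")
```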

Sensations questionnaire
Independent samples t-tests were used to verify whether sensations varied between the two stimulation groups. Participants in the real and sham stimulation groups did not report a significant difference for any of the sensations probed (itching, pain, burning, warmth/heat, pinching, metallic/iron taste and fatigue) (all p > .06). Furthermore, there were no significant differences between stimulation groups with regard to when the discomfort began, t(58.
A Bayesian repeated measures ANOVA provided substantial evidence for the absence of a stimulation effect (BF01 = 3.02, evidence for null versus alternative model). Thus, the observed data were about 3 times more likely under the model without a stimulation effect than under the alternative model that included one. The evidence for the SRPE effect was decisive (BF10 > 100, evidence for alternative versus null model). In addition, there was very strong evidence against the interaction of SRPE and stimulation (BF01 = 54.66, evidence for main-effects-only relative to main-effects-plus-interaction model).
A Bayesian repeated measures ANOVA revealed anecdotal evidence for the absence of a stimulation effect (BF01 = 1.33, null model relative to model including stimulation). For the SRPE effect, the evidence was decisive (BF10 > 100, model including SRPE compared to null model). We also found strong evidence against the interaction of SRPE and stimulation (BF01 = 19.74, compared to two-main-effects model).

Discussion
The main objective of our study was to examine whether theta-frequency (6 Hz) tACS can modulate the effect of RPEs in declarative learning. For this purpose, participants acquired 60 Dutch-Swahili word pairs, associated with RPEs of different sizes and values, while the MFC was stimulated. We replicated our earlier finding of SRPEs driving declarative learning [10]. Word pair recognition increased for large and positive RPEs. However, contrary to our hypothesis, theta-frequency (6 Hz) tACS neither improved memory nor modulated the effect of RPEs on declarative learning. There was a small effect of stimulation on certainty in the correctly recognized words, but this effect requires replication and must currently be interpreted with caution.
Whereas the importance of RPEs in procedural learning has been well established, their role in declarative learning has remained elusive until recently. One of the first experimental paradigms examining the effect of RPEs in declarative learning was put forward by [58]. Although this RPE effect on declarative memory could not be replicated [59,60], several research labs have since then used a range of experimental paradigms to investigate the role of RPEs in declarative learning. Most of these studies revealed positive effects of RPEs on declarative memory [8,9,61], but one study also reported negative effects [62] (for review see [7]). Overall, these studies (including the current one) support the claim that RPEs are a key factor in the formation of declarative memory.
Fig 2. (A-B) Recognition accuracy as a function of SRPE in the real and sham stimulation group, respectively. The average recognition and its 95% confidence interval were estimated and superimposed. Gray dots represent data points for individual subjects. Recognition accuracy increases linearly with larger and more positive RPEs in the two stimulation groups, suggesting an SRPE effect. (C-D) Certainty rating for correct recognitions in the real and sham stimulation group, respectively. The average certainty and its 95% confidence interval were estimated and superimposed. Gray dots and rectangles represent data of individual subjects for correct recognitions. In the two stimulation groups, SRPE significantly predicted certainty for correctly recognized word pairs. (E-F) Certainty rating for incorrect recognitions in the real and sham stimulation group, respectively. The average certainty and its 95% confidence interval were estimated and superimposed. Gray dots and rectangles represent data of individual subjects for incorrect recognitions. In the two stimulation groups, SRPE did not significantly predict certainty for incorrectly recognized word pairs. https://doi.org/10.1371/journal.pone.0237829.g002
Prior research has repeatedly shown a role of theta frequency in (reward) prediction error processing [63][64][65][66] as well as memory performance [21]. In particular, [25] provided direct evidence for a causal role of theta frequency in memory. Memory for multimodal (audiovisual) stimuli was enhanced only when these stimuli were modulated at the theta frequency and not at other frequencies. Furthermore, in an earlier EEG study from our lab, we examined the neural signatures of RPEs in declarative learning and found increased theta (4-8 Hz) power during reward feedback [11]. However, it must be noted that in this particular EEG study, theta frequency followed an unsigned RPE (URPE) pattern during reward feedback. Theta power thus increased for both large negative and large positive RPEs. This URPE pattern evolved into an SRPE pattern during reward feedback and was accompanied by power increases in the high-beta (20-30 Hz) and high-alpha (10-17 Hz) frequency bands. Although beta and alpha power followed a clear SRPE pattern, we opted not to stimulate at these frequencies as there is more inter-individual variability with regard to peak frequency [67].
We hypothesized that declarative learning is facilitated by theta frequency synchronization. Neurons are synchronized when their activation is locked to a common (slow-wave) phase. In such a case, spikes of pre- and postsynaptic neurons are highly correlated, enabling synaptic learning between pairs of neurons because synaptic plasticity relies on the precise spike timing of neurons [68]. Theta phase may modulate spike-timing-dependent plasticity by ensuring that (anatomically distant) neurons fire in synchrony [69,70]. As tACS modulates the spike timing of neurons [71][72][73], it is a promising tool to causally manipulate neural oscillations related to RPE processing in declarative learning. For this reason, theta-frequency tACS was used to stimulate the MFC. Unfortunately, however, our tACS manipulation did not affect memory performance.
In the following section, we speculate why we found no effect of theta-frequency (6 Hz) tACS and provide suggestions for future research. First, tACS has a relatively low spatial resolution. As a consequence, current flow is not focal, but distributed across the entire scalp. In Fig 1D, we simulated the electric field in our paradigm. The distribution of current flow is indeed very broad, encompassing several brain areas. Therefore, it is conceivable that our tACS manipulation did not exclusively stimulate the MFC. Due to a complex interplay of brain networks, it remains possible that other brain regions were stimulated as well, potentially interacting or interfering with our RPE effect in declarative learning. Second, tACS only generates weak electrical fields. The simulation in Fig 1D shows that using a stimulation intensity of 2 mA caused, at best, an electric field strength of 0.3 V/m, which is on the weak side. The induction of weak electrical fields makes it difficult to entrain endogenous oscillations. This is especially the case if the brain regions that need to be stimulated are located deeper within the brain. For instance, [74] reported that low-frequency tACS did not modulate ongoing brain activity during resting wakefulness. [75] also found that conventional stimulation parameters are insufficient to induce measurable effects. However, the use of stronger currents might be accompanied by increased discomfort. Third, some researchers raised the issue of brain-state-dependent effects [76][77][78][79][80]. More specifically, tACS effects might depend on the current brain state of the participant. If a participant is in an optimal brain state where brain networks are synchronized, enabling high encoding efficiency, stimulating the learning brain might impair learning. If, however, a participant is in a non-optimal brain state where synchronization is less pronounced and accompanied by decreased encoding efficiency, then applying stimulation could facilitate learning and improve memory performance. Importantly, [81] have shown that endogenous brain oscillations are entrained only when phase alignment is achieved between the applied stimulation and the ongoing brain activity (see also [72]). Therefore, stimulation should ideally be phase-aligned to participants' internal brain states [82]. As we could not measure participants' brain states in our study, it is possible that tACS interacted with ongoing endogenous brain states. Fourth, it remains possible that theta frequency has no effect on RPEs in declarative learning and declarative memory per se. For instance, [83] applied theta-frequency (5 Hz) tACS over the ventrolateral prefrontal cortex during the acquisition of face-occupation pairs in older adults. In line with our study, theta-frequency tACS did not affect memory performance. Fifth, due to logistical constraints, a between-subjects design was used. In such a design, individual differences are not easily controlled. This could be mitigated by using a within-subjects design, where each participant is subjected to a real and a sham stimulation condition. Finally, due to the lack of standardized tACS procedures across studies, it remains difficult to draw definitive conclusions. The absence of an effect highlights the importance of understanding the mechanisms underlying tACS [84], and of setting up general procedural guidelines for neurostimulation studies [51,85].
Taking these issues together, we argue that the lack of strong, localized, and phase-dependent stimulation is the most important factor contributing to our null result. Therefore, a follow-up of our study would be to use rhythmic Transcranial Magnetic Stimulation (TMS) to improve spatial resolution and induce stronger electrical fields [86] while simultaneously measuring EEG. Even though the spatial resolution of TMS remains debated [87], it is more focal than tACS. By using a closed-loop approach, brain states are continuously monitored and stimulation can be phase-aligned to individual theta oscillations. As such, we would be in a better position to influence learning. Interestingly, in the same experimental paradigm where rTMS at beta frequency modulated declarative memory [88], tACS at beta frequency did not successfully modulate memory formation [89]. This finding thus further supports the use of (rhythmic) TMS over tACS. To further increase stimulation strength, instead of delivering single pulses at theta frequency, another procedure would be to deliver high-frequency bursts at theta frequency. This procedure has also been shown to increase memory performance and certainty ratings [90,91] and is thus a viable alternative for future research.
In summary, the current study examined whether applying theta-frequency (6 Hz) tACS over the MFC modulates the RPE effect in declarative learning. Previous behavioral results were replicated, with SRPEs driving declarative learning. However, theta tACS over the MFC did not modulate the effect of RPEs on declarative learning, and we proposed guidelines for future neuromodulation studies in declarative memory.