Dynamic Changes in Single Unit Activity and Gamma Oscillations in a Thalamocortical Circuit during Rapid Instrumental Learning

The medial prefrontal cortex (mPFC) and mediodorsal thalamus (MD) together form a thalamocortical circuit that has been implicated in the learning and production of goal-directed actions. In this study we measured neural activity in both regions simultaneously, as rats learned to press a lever to earn food rewards. In both MD and mPFC, instrumental learning was accompanied by dramatic changes in the firing patterns of the neurons, in particular the rapid emergence of single-unit neural activity reflecting the completion of the action and reward delivery. In addition, we observed distinct patterns of changes in the oscillatory LFP response in MD and mPFC. With learning, there was a significant increase in theta band oscillations (6–10 Hz) in the MD, but not in the mPFC. By contrast, gamma band oscillations (40–55 Hz) increased in the mPFC, but not in the MD. Coherence between these two regions also changed with learning: gamma coherence in relation to reward delivery increased, whereas theta coherence did not. Together these results suggest that, as rats learned the instrumental contingency between action and outcome, the emergence of task related neural activity is accompanied by enhanced functional interaction between MD and mPFC in response to the reward feedback.

However, despite their well-established anatomical connectivity, the functional interaction between MD and mPFC during goal-directed behavior remains poorly understood, because no previous study has recorded activity from both regions simultaneously during such behavior. Based on previous work [4,28], we hypothesized that instrumental learning is accompanied by significant changes in the coordination of medial prefrontal and mediodorsal thalamic activity. We predicted that, as rats learn to perform reward-guided actions, activity in both regions will change to reflect the acquisition of the action-outcome instrumental contingency. To test this hypothesis, we chronically implanted miniaturized multi-electrode arrays (up to 64 channels) in rats and recorded single unit activity as well as local field potentials (LFP) from both MD and mPFC as the rats learned to press a lever to earn food rewards. We measured the oscillatory activity in these brain regions simultaneously across successive days of instrumental learning. Our results show that, in the MD-mPFC circuit, dynamic changes in both single unit spiking activity and oscillatory LFP response in neuronal populations accompany the learning of a new action.

Ethics Statement
All procedures were approved by the Institutional Animal Care and Use Committee at Duke University and followed National Institutes of Health guidelines (Protocol Number: A087-08-03).

Animals and Surgery
Eight male Long-Evans rats (~3 months of age at the beginning of the experiments) were used: in 5 rats we recorded single unit and LFP activity from MD and mPFC simultaneously, and in 3 rats we recorded from MD only. Surgery was performed under general anesthesia with isoflurane (2%). A craniotomy was performed over the bilateral thalamic and/or cortical locations according to known stereotaxic coordinates (in mm from bregma: MD, AP -2.1 to -3.3, ML -1 to 1; mPFC, AP 4.6 to 2.5, ML -1 to 1). The electrode arrays used in this study consisted of 4×8 or 2×8 platinum-coated tungsten microwire electrodes (35 μm diameter; Innovative Neurophysiology, NC), with 150 μm between microwires and 200 μm between rows. The arrays were lowered to the appropriate stereotaxic depth (MD, ~5.0 mm; mPFC, ~2.5 mm). Electrode placement was confirmed post-mortem after perfusion and fixation with 10% formalin, followed by thionin staining of 100 μm coronal sections (Figure 1).

In vivo Multi-electrode Recording during Instrumental Learning
Two weeks after surgery, rats were food deprived and maintained at ~85% of free-feeding weight throughout the experiments. Training took place in a Med Associates (St. Albans, VT) operant chamber designed for in vivo extracellular recording. The chamber was equipped with a food magazine that received 45 mg dustless precision pellets (Bio-Serv, NJ) from a pellet dispenser, two retractable levers on either side of the magazine, and a 3 W 24 V house light mounted on the wall opposite the levers and magazine. A computer running the Med-PC-IV program was used to control the equipment and record behavior. Time stamps for lever pressing and reward delivery were sent as TTL pulses to the Blackrock Cerebus data acquisition system.
Lever press training consisted of four daily sessions under a continuous reinforcement schedule (CRF, each press earns one food pellet). Each session started with illumination of the house light and insertion of the lever, and ended with turning off the house light and retraction of the lever after 120 minutes or 100 earned pellets (whichever came first). The amount of training used was based on previous work on instrumental conditioning, which showed that performance was goal-directed following limited training [31]. In a pilot experiment, we also verified the goal-directed control of the instrumental performance using an outcome devaluation procedure. Rats (n = 4) were given a 90-min pre-feeding session using the same pellets as the training sessions. They were then tested on a 2-min probe test conducted in extinction, i.e. without any reward delivery.
Single-unit and LFP activity were recorded using the Cerebus data acquisition system (Blackrock Microsystems). For 4×8 electrode arrays, a TBSI (Triangle Biosystems) gain 2 headstage was used. For 2×8 arrays, the Blackrock gain 1 headstage was used, as recently described [32,33]. In brief, the data were filtered with both analog and digital bandpass filters (analog high-pass first order Butterworth filter at 0.3 Hz, analog low-pass third order Butterworth filter at 7.5 kHz) and sampled at 30 kHz. Single unit data were separated with a high-pass digital filter (fourth order Butterworth filter at 250 Hz), while local field potential (LFP) signals were filtered with a third order high-pass filter and seventh order low-pass filter (0.1 Hz-5 Hz cutoffs).
Spikes were sorted using Offline Sorter (Plexon) and single-unit activity was isolated on the basis of principal component analysis. Only single-unit activity with a clear separation from noise was used for the analysis. Matlab was used to remove 60 Hz line noise and large transient artifacts in the LFP data: 60 Hz noise was removed using a blocked least mean squares (LMS) adaptive filter algorithm. The reference signal for the adaptive filter was created by finding the peak frequency of the LFP signal near the expected line noise frequency, and creating a sinusoidal reference signal with that frequency. The step size of the LMS algorithm was estimated by running the algorithm on a portion of the input signal for a range of varying step sizes, and using the step size that yielded the lowest RMS value of the error. Large transient motion artifacts were removed by subtracting a 20-sample moving window average around portions of the line-noise filtered signal with amplitude of greater than 6 standard deviations from the mean.
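The line-noise removal step described above can be illustrated with a minimal sketch. It is not the authors' MATLAB code: it uses a sample-by-sample (not blocked) LMS update with a sine/cosine reference pair, and the function names, the default step size, and the fixed `f_line` argument (standing in for the peak frequency found near 60 Hz) are assumptions made for illustration only.

```python
import math

def lms_line_noise_filter(signal, fs, f_line=60.0, mu=0.01):
    """Two-weight LMS noise canceller (illustrative, sample-by-sample).

    The reference is a sine/cosine pair at f_line. The error signal, i.e. the
    input minus the current line-noise estimate, is returned as the cleaned LFP.
    """
    w_sin, w_cos = 0.0, 0.0
    cleaned = []
    for n, x in enumerate(signal):
        phase = 2.0 * math.pi * f_line * n / fs
        r_sin, r_cos = math.sin(phase), math.cos(phase)
        estimate = w_sin * r_sin + w_cos * r_cos  # current line-noise estimate
        err = x - estimate                        # cleaned sample
        w_sin += 2.0 * mu * err * r_sin           # LMS weight updates
        w_cos += 2.0 * mu * err * r_cos
        cleaned.append(err)
    return cleaned

def rms(values):
    """Root-mean-square value of a sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def pick_step_size(segment, fs, f_line, candidates):
    """Return the candidate step size that gives the lowest RMS error on a segment."""
    return min(candidates,
               key=lambda mu: rms(lms_line_noise_filter(segment, fs, f_line, mu)))
```

In this sketch, `pick_step_size` mirrors the step-size search described in the text: the filter is run on a portion of the signal for each candidate value, and the value with the lowest RMS error is kept. The transient-artifact step (moving-window subtraction above 6 standard deviations) is not shown.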

Data Analysis
Neuronal data analysis was performed with Neuroexplorer (Nex Technologies), Microsoft Excel, GraphPad Prism (GraphPad Software), and MATLAB (MathWorks). Neural activity was binned in 50-ms bins, averaged across trials, and smoothed with a Gaussian filter to construct the peri-event histogram. To classify "action initiation" neurons, neural activity within 500 ms before the onset of lever pressing was compared to a baseline window from 1500 ms to 1000 ms before the lever press (two-tailed t test, p < 0.01). To classify "reward delivery" neurons, neural activity within a 1000 ms window after reward delivery was compared with a baseline window from 2000 to 1000 ms before reward delivery. The time windows used were based on visual inspection of the data.
Figure 1. Electrode placement and behavioral results. A, Coronal sections of the rat brain illustrating MD and mPFC electrode placements. The coordinates are based on a standard rat brain atlas [58]. The numbers indicate distance in mm from bregma. MDC, mediodorsal thalamic nucleus, central part; MDM, mediodorsal thalamic nucleus, medial part; Cg1, cingulate cortex, area 1; PrL, prelimbic cortex. B, Outcome devaluation test. Devalued, rats received 1 h of unlimited food pellets, the same as those earned by lever pressing. Non-devalued, rats did not receive any food for 1 h before the test. Normalized press rates are the ratio of presses under each condition. Error bars indicate SEM. doi:10.1371/journal.pone.0050578.g001
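The peri-event histogram and the window comparison described in the Data Analysis text above can be sketched as follows. This is an illustration rather than the Neuroexplorer procedure: the function names are hypothetical, times are in seconds, the Gaussian width is an arbitrary choice, and the comparison is written as a paired t statistic across trials, which is an assumption because the text does not state whether the t test was paired.

```python
import math
from statistics import mean, stdev

def peri_event_histogram(spike_times, event_times, t_start, t_stop, bin_width=0.05):
    """Trial-averaged firing rate (spikes/s) in fixed bins around each event."""
    n_bins = int(round((t_stop - t_start) / bin_width))
    counts = [0] * n_bins
    for ev in event_times:
        for s in spike_times:
            rel = s - ev
            if t_start <= rel < t_stop:
                idx = min(int((rel - t_start) / bin_width), n_bins - 1)
                counts[idx] += 1
    return [c / (len(event_times) * bin_width) for c in counts]

def gaussian_smooth(values, sigma_bins=1.0):
    """Smooth with a Gaussian kernel, renormalizing the kernel at the edges."""
    radius = int(math.ceil(3 * sigma_bins))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma_bins) ** 2) for k in offsets]
    smoothed = []
    for i in range(len(values)):
        num = den = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < len(values):
                num += w * values[j]
                den += w
        smoothed.append(num / den)
    return smoothed

def window_rates(spike_times, event_times, lo, hi):
    """Per-trial firing rate (spikes/s) in the window [lo, hi) relative to each event."""
    return [sum(1 for s in spike_times if lo <= s - ev < hi) / (hi - lo)
            for ev in event_times]

def paired_t_statistic(test_rates, baseline_rates):
    """Paired t statistic across trials (assumed test form; p value not computed)."""
    diffs = [a - b for a, b in zip(test_rates, baseline_rates)]
    sd = stdev(diffs)
    if sd == 0.0:
        raise ValueError("differences have zero variance")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))
```

With these helpers, a "reward delivery" comparison would use `window_rates(spikes, rewards, 0.0, 1.0)` against `window_rates(spikes, rewards, -2.0, -1.0)`, and an "action initiation" comparison would use the windows (-0.5, 0.0) and (-1.5, -1.0) relative to the lever press. Converting the statistic to a two-tailed p value requires a t distribution, which is left to a statistics package.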
Spectral analysis of LFP power and coherence was performed using Neuroexplorer. The power spectra were calculated using Welch's method (512 frequencies between 1 and 100 Hz, smoothed with a Gaussian kernel with bin width 3). Coherence is a measure of the linear correlation between two signals as a function of frequency [34,35]. Coherence between two signals is calculated by normalizing the cross-spectral density function by the auto-spectral density functions of the two signals. The cross-spectrum between two time series and the auto-spectrum of each signal are obtained by calculating the product of the fast Fourier transformed series. The signals are subdivided into time intervals of length equal to the number of frequency samples divided by the maximum frequency, and the spectra are estimated by averaging the spectrum over these intervals (Welch's method). The coherence measure is sensitive to both a change in power and a change in phase relationships. Consequently, if either power or phase changes in one of the signals, the coherence value is affected. In our study, coherence analysis between LFPs from two regions was performed using 512 frequencies between 1 and 100 Hz with a 5% overlap window, smoothed with a Gaussian kernel with bin width = 3.
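As an illustration of the coherence estimate described above, the sketch below computes magnitude-squared coherence, C_xy(f) = |P_xy(f)|^2 / (P_xx(f) P_yy(f)), from Welch-averaged spectra. The exact normalization and windowing used by Neuroexplorer are not stated in the text, so the Hann window, the naive DFT, and the function names here are assumptions chosen to keep the example self-contained.

```python
import cmath
import math

def dft(segment):
    """Naive discrete Fourier transform for non-negative frequencies (short segments only)."""
    n = len(segment)
    return [sum(segment[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]

def welch_coherence(x, y, seg_len, overlap=0.5):
    """Magnitude-squared coherence of equal-length signals x and y.

    Auto- and cross-spectra are averaged over Hann-windowed segments before
    the ratio is taken. Bin k corresponds to frequency k * fs / seg_len.
    """
    if seg_len < 2:
        raise ValueError("seg_len must be at least 2")
    step = max(1, int(seg_len * (1.0 - overlap)))
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / (seg_len - 1))
              for t in range(seg_len)]
    n_freq = seg_len // 2 + 1
    pxx = [0.0] * n_freq
    pyy = [0.0] * n_freq
    pxy = [0j] * n_freq
    for start in range(0, len(x) - seg_len + 1, step):
        fx = dft([w * v for w, v in zip(window, x[start:start + seg_len])])
        fy = dft([w * v for w, v in zip(window, y[start:start + seg_len])])
        for k in range(n_freq):
            pxx[k] += abs(fx[k]) ** 2               # auto-spectrum of x
            pyy[k] += abs(fy[k]) ** 2               # auto-spectrum of y
            pxy[k] += fx[k].conjugate() * fy[k]     # cross-spectrum
    return [abs(pxy[k]) ** 2 / (pxx[k] * pyy[k]) if pxx[k] > 0 and pyy[k] > 0 else 0.0
            for k in range(n_freq)]
```

Averaging over segments is what makes the measure informative: with a single segment the ratio equals 1 at every frequency, whereas with many segments it approaches 1 only where the two signals keep a consistent amplitude and phase relationship.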

Behavior
All rats were naive when training began. Within the very first session of training, they learned to press the lever for reward, and their performance improved over 4 days. Previous work has established that with such limited training, instrumental behavior is highly goal-directed and sensitive to devaluation of the outcome [31,36]. In a separate experiment, we assessed the effect of outcome devaluation on lever pressing with limited training. After the same amount of CRF training, rats were given 1 hour of exposure to an unlimited amount of food pellets just before a 2-min probe session conducted in extinction. Outcome devaluation by pre-feeding significantly reduced instrumental performance (n = 4, paired t test, p = 0.01; Figure 1B), suggesting that with the amount of training used in this study the performance is controlled by the action-outcome instrumental contingency.

Electrode Placement
In 5 rats, MD and mPFC were recorded simultaneously, with each array covering both sides of the brain. Three rats were implanted in the MD only. Histological analysis showed clear electrode tracks and recording sites in MD and mPFC (mainly prelimbic and infralimbic regions), but not in the anterior cingulate cortex (Figure 1A). We recorded from a total of 268 neurons from MD (n = 69, 71, 66, 62 for each recording session) and 170 neurons from mPFC (n = 44, 45, 44, 37 per session). Based on the waveform differences over days from the same electrode, new neurons were considered to be recorded each day.

Changes in Single Unit Activity during Acquisition
Single unit neural activity was recorded starting with the first session of CRF training. All rats learned to press a lever for food pellets within 4 sessions of training. In the beginning very few neurons were task-related. With training, however, the neural activity in both MD and mPFC changed dramatically. The most common type of task-related modulation was found in response to reward delivery. Figure 2A shows the dynamic changes of the firing rate of all recorded neurons upon reward delivery across four consecutive sessions. Interestingly, the firing rates of mPFC neurons increased with learning (one-way ANOVA, Kruskal-Wallis test, p = 0.02), but not those of MD neurons (Kruskal-Wallis test, p = 0.71).
In both regions, many units responded after the termination of the lever press and the delivery of the reward (Figure 2B). The reported increase was observed even when only the first 30 presses from each session were analyzed. Thus, the increased number of lever presses in the later sessions was not responsible for producing this effect. Representative waveforms of the single units are shown in Figure 3A. We found 42 MD neurons and 28 mPFC neurons that were significantly excited by the reward delivery. On the other hand, there were 32 MD neurons and 17 mPFC neurons that reduced firing after reward delivery (Figure 3B). Some neurons exhibited clear increased responses to reward delivery even within a single session after learning (Figure 4).
We also analyzed single unit activity just before the lever press. We found that the activity of fewer neurons was modulated by action preparation and initiation. We found 15 "excited" neurons in the MD and 3 in the mPFC; and 17 "inhibited" neurons in the MD and 10 in the mPFC.

Changes in LFP during Learning
We also examined changes in LFP during learning. We recorded from 14 mPFC channels from 5 rats in which MD and mPFC were simultaneously recorded, and from 22 MD channels from 5 MD-mPFC and 2 MD rats (1 rat was excluded because of excessive noise in the LFP recording). Representative peri-event histograms are displayed in Figure 5A. Upon reward delivery, a prominent dip was observed in the LFP, indicating a net depolarization in the subthreshold activity of the neuronal population. As shown in Figure 5B, this depolarization increased in the course of learning. The effect was observed when we only analyzed the same number of presses from the first session and the last session, to rule out any differences due to the increase in the number of presses during learning.

Dynamic Changes in Neural Oscillations Associated with Learning
In the MD, LFP showed strong theta oscillations (~7-8 Hz) and weak gamma oscillations (~50 Hz), whereas mPFC LFP showed the opposite pattern (Figure 6). More importantly, as shown in Figure 7, the overall oscillatory activity in both MD and mPFC changed dramatically during learning. In the MD, theta power increased during learning (Figure 7B, one-way ANOVA, F = 5.75, p = 0.002), but gamma power did not change significantly (F = 0.53, p = 0.66). In the mPFC, on the other hand, gamma oscillations became very pronounced after learning (Figure 7D, one-way ANOVA, F = 4.60, p = 0.008), but no significant changes were seen in the theta power (F = 1.15, p = 0.34).
In accord with our single unit recording data, we did not find significant modulation of the LFP during the action initiation period (just before the lever press). However, gamma power in both MD and mPFC peaked upon reward delivery, and mPFC showed higher gamma power than MD. Two representative peri-event spectrograms are shown in Figure 8. LFP oscillations upon reward delivery (during the time window from the reward delivery to the start of the head entry into the food cup) in MD and mPFC changed differentially across training sessions. In the MD, neither theta nor gamma power changed significantly during acquisition (repeated measures ANOVA, Fs < 2.33, ps > 0.05). By contrast, in the mPFC, gamma oscillations became more pronounced with training (repeated measures ANOVA, F = 3.21,

Changes in Coherence between MD and mPFC Activity during Learning
To determine the dynamic interactions between MD and mPFC during learning, we analyzed the coherence between these areas across four sessions. Coherence can be used as an estimate of the strength of coupling between activities from two different brain regions. As shown in Figure 9, the overall coherence between MD and mPFC changed significantly during the course of learning. Theta coherence did not change significantly across sessions (repeated measures ANOVA, F = 2.40, p = 0.07). By contrast, gamma coherence was weak at first, but increased significantly with learning (repeated measures ANOVA, F = 3.39, p = 0.02).
Next, we examined how the coherence between MD and mPFC was modulated by reward delivery across sessions. Gamma coherence upon reward delivery increased during learning (repeated measures ANOVA, F = 7.75, p = 0.0001), but theta coherence did not (F = 1.43, p = 0.24).

Discussion
To understand the role of MD and mPFC in the acquisition of goal-directed behavior, we recorded from both areas as rats learned to press a lever for food rewards. All rats learned to press the lever by the end of the first session, and progressively increased their rate of lever pressing (Figure 2). They were able to learn rapidly the relationship between the lever press and reward. Neural activity in this thalamocortical circuit changed dramatically during instrumental learning. Our results suggest that MD and mPFC form a functional circuit, with similar task-related activity that emerges in the course of learning. However, we also found significant differences in the pattern of oscillatory activity in these two regions, and above all in the dynamic changes of such activity during training. Such oscillatory activity was modulated by reward delivery. The coherence between MD and mPFC activity also changed significantly during the course of learning (Table 1).
Figure 7. Dynamic changes in oscillatory activity during learning. A, Power spectral analysis of theta and gamma oscillations in the MD. Theta band oscillations increased during training, but gamma oscillations did not. First, first session; Last, last (4th) training session. Representative data are shown from one rat with simultaneous MD and mPFC recordings. B, Normalized (% of the first session) power of theta and gamma oscillations in the MD (n = 22) during acquisition. Theta oscillations in the MD increased significantly over time, whereas gamma oscillations did not. Data from all animals are averaged and shown here. Error bars represent SEM. C, Power spectral analysis of theta and gamma oscillations in the mPFC. Representative data are shown from one rat with simultaneous MD and mPFC recordings. D, Normalized (% of the first session) power of theta and gamma oscillations in the mPFC (n = 14) during acquisition. There was a significant increase in gamma oscillations but not in theta oscillations. doi:10.1371/journal.pone.0050578.g007
In our study, we recorded from completely naive rats learning to press the lever for the first time. We were thus able to collect data on how neural activity changed during the initial phase of instrumental learning, when the animal rapidly acquired the relationship between the lever press and reward delivery. It is important to note that performance of the action after initial acquisition is highly sensitive to changes in outcome value, as shown by our devaluation test. The lever pressing was therefore clearly goal-directed. The observed plasticity accompanies the acquisition of the action-outcome contingency.
At the start of training, there were virtually no task-related neurons in either MD or mPFC. However, as the rats learned to press the lever, many neurons in both regions increased or decreased their rate of firing in relation to lever pressing and reward (Figure 2). The LFP data (Figure 5), which show significant depolarization in the subthreshold activity in response to reward delivery, also suggest that the emergence of reward-elicited activity is a widespread phenomenon. To our knowledge, this is the first report of significant plasticity in vivo in this thalamocortical circuit during instrumental learning. In the continuous reinforcement task used in this study, reward is delivered immediately upon the completion of the lever press. Surprisingly, although the firing rates of some neurons were modulated during the action initiation period (starting at 500 ms before the lever press), such neurons were rare in both MD and mPFC. Nor did we observe significant population activity (LFP) that was modulated by action initiation. In contrast, neurons that altered their firing following the completion of the action and the reward delivery were much more common, as confirmed by the LFP recordings (Figures 3 and 5). These results suggest that the primary role of the MD-mPFC circuit is to signal the outcome of goal-directed behavior, in this case the reward feedback. This is in accord with previous work showing that learning of stimulus-reward associations also requires the MD [21,22,23].

Changes in Oscillatory Activity in Local Field Potential Recording
Oscillatory activities in different frequency ranges are widely found in different brain areas and correlated with behavioral states [37,38,39,40,41]. Previous work has shown significant changes in oscillations during learning [41,42,43,44]. Despite the similarities between MD and mPFC in their overall pattern of task-related activity, we observed striking differences between these areas in the dynamic changes in oscillatory LFP activity. Above all, gamma power increased in mPFC, but not in MD ( Figure 8).
Theta oscillations have also been linked to working memory performance in rodents [48], monkeys [49] and humans [50]. Gamma oscillations in the PFC are hypothesized to play an important role in attention by enhancing the neuronal representation of attended sensory input and by regulating the communication among neuronal groups in distinct areas that convey the behaviorally relevant information [46].
The coherence measure could reflect the functional interactions between different brain regions [51,52]. When we measured the overall coherence between simultaneously recorded MD and mPFC LFP during the course of training, we found that theta coherence did not change, whereas gamma coherence increased with instrumental learning. When we examined the coherence in response to the reward delivery, we also found a significant increase in gamma coherence, but theta coherence did not change significantly across sessions (Figure 9). The enhanced gamma coherence could reflect excitatory inputs responsible for the increase in firing rate of single units immediately after reward delivery ( Figure 3). Thus, an overall increase in gamma coherence between MD and mPFC in response to reward delivery is the most striking change in the LFP during initial acquisition. Such changes can have a major impact on effective communication between these two structures. Whether the increase in gamma coherence we observed reflects increased perceptual attention to essential environmental feedback for goal-directed actions, or plays a more critical role in the generation of the appropriate action, remains to be determined by future studies that manipulate online neural activity directly.
In short, our data revealed that instrumental learning in a standard operant task is accompanied by dramatic changes in coordination of population activity between MD and mPFC. Few neurons in MD and mPFC changed their activity prior to the initiation of action, suggesting that this thalamocortical circuit is not critical for action initiation and selection, in agreement with the effects of lesions to these two areas [4,28]. On the other hand, basal ganglia lesions are well known to impair action initiation [53]. Given the strong projections from the mPFC to the ventral and medial striatal regions, signals representing behavioral outcomes (such as reward) could be transmitted to the basal ganglia, which play an important role in the learning and expression of goal-directed actions [54]. The role of the MD-mPFC circuit therefore appears to be restricted to the signaling of the reward feedback following the action [55]. Our findings are also in agreement with previous lesion studies implicating MD and mPFC in the learning of the action-outcome contingency [4,56,57]. It is important to point out that this thalamocortical circuit alone is not sufficient for instrumental learning; a distributed circuit involving additional brain regions in the basal ganglia is needed [28,54]. The present study therefore merely represents an initial step in elucidating the computational roles of the brain regions that are essential for the acquisition and expression of goal-directed behaviors.