Fig 1.
Stimulating the network at an electrode evokes a burst of activity. Response strengths were dependent on the period of inactivity preceding the stimulus.
(A) Raster of the responses recorded at one selected channel to 50 stimuli delivered at the same electrode. Stimuli were delivered periodically and thus at random latencies relative to the preceding spontaneous burst (SB). Stimulation cycled through five preselected electrodes at 10 s intervals. Stimulus properties: −0.7 V, 0.4 ms, monophasic against common ground. Trials were aligned to the time of stimulation (red line) and sorted by the spike count within the designated response window (magenta overlay); a 2 s response window was chosen for this network. The sorting reveals the relationship between response strength and the preceding period of inactivity. The first 200 ms after the stimulus are magnified in (B). Responses typically consisted of an early (≤ 20 ms post-stimulus) and a late (≥ 50 ms post-stimulus) component. (C) The relationship between response strength and the period of prior inactivity can be captured by a saturating exponential model, similar to the one describing response length [20].
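A minimal Python sketch of the two quantities in this figure, under stated assumptions: response strength is taken as the spike count in the response window after the stimulus (2 s here), and the saturating exponential in (C) is assumed to have the form RS(t) = A(1 − e^(−t/λ)) + B, with y-intercept B and plateau A + B, matching the parameter names of Figs 4 and 5 (this caption does not state the formula). Because the model is linear in A and B once λ is fixed, the hypothetical `fit_saturating_exp` solves ordinary least squares for A and B at each λ of a user-supplied grid and keeps the best λ. This is one simple fitting approach, not necessarily the one used in the study.

```python
import math

def response_strength(spike_times, stim_time, window=2.0):
    """Spike count in the response window (stim_time, stim_time + window]."""
    return sum(1 for s in spike_times if stim_time < s <= stim_time + window)

def saturating_exp(t, a, b, lam):
    """Assumed model: RS(t) = a * (1 - exp(-t / lam)) + b (intercept b, plateau a + b)."""
    return a * (1.0 - math.exp(-t / lam)) + b

def fit_saturating_exp(latencies, strengths, lam_grid):
    """Least-squares fit: closed-form (a, b) for each lam in lam_grid, best lam kept."""
    n = len(latencies)
    mean_y = sum(strengths) / n
    best = None
    for lam in lam_grid:
        # For fixed lam the model is linear in a and b with regressor x.
        x = [1.0 - math.exp(-t / lam) for t in latencies]
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0.0:
            continue  # degenerate regressor, a cannot be separated from b
        a = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, strengths)) / sxx
        b = mean_y - a * mean_x
        sse = sum((a * xi + b - yi) ** 2 for xi, yi in zip(x, strengths))
        if best is None or sse < best[0]:
            best = (sse, a, b, lam)
    if best is None:
        raise ValueError("no usable lam in lam_grid")
    return best[1], best[2], best[3]
```

Passing the latencies and spike counts of the sorted trials, together with a λ grid spanning a plausible range, returns (A, B, λ); a finer grid or a nonlinear optimizer would refine λ further.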
Fig 2.
Stimulation trials and the closed-loop architecture.
(A) A trial began at the end of a spontaneous burst (SB) and was terminated either by the next SB (dotted box) or by a stimulus. In our paradigm, reward was defined as the number of spikes in the response. Interruptions by SBs led to neutral rewards (punishment). (B) The time within each trial was discretized into 0.5 s steps, corresponding to states 1, …, N. In each state, the controller could choose between two actions: wait or stimulate. A ‘stimulate’ action led to one of the terminal states Ti, with i indicating the strength of the response. Terminal state F was reached if the trial was interrupted by ongoing activity. (C) Schematic of the closed-loop architecture.
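The trial structure in (A) and (B) can be sketched as a short episodic process in Python. This is an illustration under assumptions rather than the controller of the study: state k is taken to correspond to a latency of k × 0.5 s after the end of the previous SB, the onset of the next SB is supplied directly, and `response_fn` is a hypothetical stand-in for the network's response strength. The policy is fixed to "wait until a chosen state, then stimulate".

```python
from dataclasses import dataclass
from typing import Callable

STEP_S = 0.5  # each discrete state spans 0.5 s of inactivity (Fig 2B)

@dataclass
class TrialOutcome:
    terminal: str   # "T<i>" after a stimulus (i: rounded response strength) or "F"
    state: int      # state (1-based) in which the trial ended
    reward: float   # spike count of the response; 0.0 (neutral) if interrupted

def run_trial(stim_state: int, next_sb_onset_s: float,
              response_fn: Callable[[float], float]) -> TrialOutcome:
    """Play the policy 'wait until stim_state, then stimulate' for one trial.

    Hypothetical convention: state k corresponds to latency k * STEP_S after the
    end of the previous SB; next_sb_onset_s is when the next spontaneous burst
    would start if no stimulus were delivered.
    """
    if stim_state < 1:
        raise ValueError("states are numbered from 1")
    for state in range(1, stim_state + 1):
        latency = state * STEP_S
        if next_sb_onset_s <= latency:
            return TrialOutcome("F", state, 0.0)   # interrupted by ongoing activity
        if state == stim_state:
            strength = response_fn(latency)         # action: stimulate
            return TrialOutcome(f"T{round(strength)}", state, strength)
        # action: wait, move on to the next state
    raise AssertionError("unreachable")
```

Keeping the policy fixed isolates the environment; a learning controller would choose the action in each state instead of committing to `stim_state` in advance.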
Fig 3.
Identification of network specific objective functions.
(A) Networks of dissociated neurons in vitro exhibit activity characterized by intermittent network-wide spontaneous bursts (SBs) separated by periods of reduced activity (raster plot of 60 channels in a DIV 27 network). Shading marks the limits of individual SBs as detected by the burst-detection algorithm. (B) The distribution of inter-burst intervals (IBIs) is approximately lognormal. The histogram shows the IBI distribution for the network in (A). Its cumulative distribution function (red) predicts the probability of being interrupted by ongoing activity given the elapsed period of inactivity, i.e., the current state s_t. (C) This distribution was used to weight response strengths, so that each dot represents the mean response strength that can be evoked at a given stimulation latency over a set of trials, including those interrupted before a stimulus could be delivered. The fit predicts the objective function of the optimization problem. The example shows data from the network in Fig 1C. The curve is quasiconcave, with a unique global maximum at an optimal latency of ≈ 2.5 s in this network. (D) Fits to the probability of avoiding an interruption (blue), the predicted response strength (orange), and the resulting weighted response curve (orange, dotted) for another network. Here an optimal latency of ≈ 1.5 s emerges. (E) The predicted objective functions of all 20 networks studied were quasiconcave, each yielding a unique optimal stimulus latency. The objective functions were normalized to their peak magnitude.
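The weighting in (C) amounts to f(t) = RS(t) · (1 − F_IBI(t)), where F_IBI is the cumulative IBI distribution: trials interrupted before latency t contribute no response, and the rest contribute RS(t). A minimal Python sketch, assuming a lognormal IBI distribution with location μ and scale σ and a saturating-exponential response model RS(t) = A(1 − e^(−t/λ)) + B (an assumed form), evaluates f on a 10 ms grid and returns the maximizing latency. The parameter values in the usage line are of the kind used in the model analyses of Figs 4 and 5, not fits to the networks shown here.

```python
import math

def lognormal_cdf(t, mu, sigma):
    """P(IBI <= t) for a lognormal IBI distribution; 0 for t <= 0."""
    if t <= 0.0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(t) - mu) / (sigma * math.sqrt(2.0))))

def response_model(t, a, b, lam):
    """Assumed saturating exponential: intercept b, plateau a + b."""
    return a * (1.0 - math.exp(-t / lam)) + b

def objective(t, a, b, lam, mu, sigma):
    # Mean response per trial when stimulating at latency t: trials interrupted
    # before t contribute nothing, the rest contribute RS(t).
    return response_model(t, a, b, lam) * (1.0 - lognormal_cdf(t, mu, sigma))

def optimal_latency(a, b, lam, mu, sigma, t_max=10.0, dt=0.01):
    """Grid search for the latency that maximizes the objective function."""
    grid = [k * dt for k in range(1, int(round(t_max / dt)) + 1)]
    return max(grid, key=lambda t: objective(t, a, b, lam, mu, sigma))

t_opt = optimal_latency(a=20.0, b=6.67, lam=1.0, mu=0.6, sigma=1.0)
print(f"t* = {t_opt:.2f} s, f(t*) = {objective(t_opt, 20.0, 6.67, 1.0, 0.6, 1.0):.2f}")
```

A grid search is used because f is one-dimensional and cheap to evaluate; quasiconcavity would also permit a golden-section search.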
Fig 4.
Dependence of optimal latency on parameters that capture the network’s response to stimuli.
Dependence of the objective function on parameters that capture the network’s response to stimuli. In all panels the parameters λ, μ and σ were set to 1, 0.6 and 1, respectively. (A) Changes in response strength with the gain A of the response-strength model within the experimentally observed range (5 ≤ A ≤ 40, B = 6.67; t: stimulus latency). (B) The optimal latencies t* (dots), i.e., the maxima of the objective function f(t), increased non-linearly with the gain A (dashed line). Color code as in (A) (B = 6.67). (C) Optimal latency t* as a function of the gain A and the y-intercept B within the experimentally observed range (−10 ≤ B ≤ 20). B shapes the relationship between t* and A; at B = 0, t* is independent of A. Black dots and dashed line indicate the case B = 6.67 shown in (B). Note that A + B > 0 was imposed to ensure that the maximal responses were strictly positive.
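The special role of B = 0 in (C) follows from the form of the objective: with B = 0, f(t) = A · g(t), where g(t) collects every t-dependent factor, so a positive gain rescales f without moving its maximum. For B ≠ 0 the additive offset breaks this scaling. The Python sketch below checks both cases numerically, assuming RS(t) = A(1 − e^(−t/λ)) + B, lognormal IBI survival, and λ = 1, μ = 0.6, σ = 1 (the values Fig 5 associates with Fig 4B); the function names are hypothetical.

```python
import math

def objective(t, a, b, lam=1.0, mu=0.6, sigma=1.0):
    """f(t) = RS(t) * P(no SB before t), with an assumed saturating-exponential RS."""
    survival = 0.5 * math.erfc((math.log(t) - mu) / (sigma * math.sqrt(2.0)))
    return (a * (1.0 - math.exp(-t / lam)) + b) * survival

def t_star(a, b, dt=0.01, t_max=10.0):
    """Grid maximizer of f(t) for a given gain a and y-intercept b."""
    grid = [k * dt for k in range(1, int(round(t_max / dt)) + 1)]
    return max(grid, key=lambda t: objective(t, a, b))

gains = [5.0, 10.0, 20.0, 40.0]                  # within 5 <= A <= 40
with_offset = [t_star(a, 6.67) for a in gains]   # B = 6.67, as in panel (B)
no_offset = [t_star(a, 0.0) for a in gains]      # B = 0: f(t) = A * g(t)
print("B = 6.67:", with_offset)
print("B = 0:   ", no_offset)
```

With B > 0, small gains leave f dominated by the decaying survival term, which favors early stimulation; larger gains shift weight toward the rising response curve and push t* later.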
Fig 5.
Dependence of the optimal latency on properties of the network’s activity dynamics.
(A) Dependence of the optimal stimulus latency t* on A and B. Each plane corresponds to a different value of the time constant λ of the recovery function within the experimentally observed range (0.2 ≤ λ ≤ 1.2). (inset) Zoom-in to −2 ≤ B ≤ 6.67, revealing the monotonic rise of t* (dots and dashed line) that corresponds to the case described in Fig 4B (λ = 1). (B) Gain in stimulation efficacy achieved by stimulating at t* rather than at random latencies, as a function of the time constant λ of the recovery function. μ, A, B and σ were set to 0.6, 20, 6.67 and 1, respectively. (C) IBI distributions spanning the experimentally observed range of the location parameter μ (0.6 ≤ μ ≤ 2), with A, B, λ and σ set to 20, 6.67, 1 and 1, respectively. (D) The family of objective functions corresponding to the IBI distributions in (C) shows the near-linear dependence of the optimal latency on μ (dots and dashed line); A, B, λ, σ and colors as in (C). (E) Summary of the dependence of the optimal stimulus latency on A, B and μ for λ = 1. Each plane corresponds to a different value of the location parameter μ of the IBI distribution. (inset) Zoom-in to −2 ≤ B ≤ 6.67, revealing the rise of t* (dots and dashed line) that corresponds to the case described in Fig 4B (λ = 1, μ = 0.6).
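Panel (B) compares stimulation at t* with stimulation at random latencies, and Fig 7E estimates the random strategy as the mean of the objective function. A Python sketch of that comparison, assuming the saturating-exponential response model RS(t) = A(1 − e^(−t/λ)) + B and a lognormal IBI distribution: the gain is taken as f(t*) divided by the mean of f over a hypothetical range of random latencies (0.5 to 5 s), which the captions do not specify. The second loop sweeps μ over the range in (C) to show the shift of t* toward later latencies described in (D).

```python
import math

def objective(t, a=20.0, b=6.67, lam=1.0, mu=0.6, sigma=1.0):
    """Assumed model: saturating-exponential RS(t) times lognormal IBI survival."""
    survival = 0.5 * math.erfc((math.log(t) - mu) / (sigma * math.sqrt(2.0)))
    return (a * (1.0 - math.exp(-t / lam)) + b) * survival

def latency_grid(t_min=0.5, t_max=5.0, dt=0.01):
    """Hypothetical range of latencies that a random strategy would draw from."""
    n = int(round((t_max - t_min) / dt))
    return [t_min + k * dt for k in range(n + 1)]

def efficacy_gain(**params):
    """Return (t*, f(t*) / mean f); the mean stands in for random latencies."""
    grid = latency_grid()
    values = [objective(t, **params) for t in grid]
    best = max(range(len(grid)), key=values.__getitem__)
    return grid[best], values[best] / (sum(values) / len(values))

for lam in (0.2, 0.6, 1.2):            # time constants as in panel (B)
    print(f"lambda={lam}: gain={efficacy_gain(lam=lam)[1]:.2f}")
for mu in (0.6, 1.0, 1.5, 2.0):        # location parameters as in panels (C), (D)
    print(f"mu={mu}: t*={efficacy_gain(mu=mu)[0]:.2f} s")
```

Averaging f over a uniform grid approximates the expected efficacy of uniformly random latencies; a different sampling distribution for the random strategy would change the gain.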
Fig 6.
A closed-loop learning session in an example network.
The session consisted of 1000 trials: four pairs of rounds, each pair comprising 200 training trials (Ti, red) followed by 50 testing trials (Xi, green). (A) Raster diagram of the activity at the recording channel around the time of stimulation. Trials interrupted by ongoing activity are left empty at t > 0 s; the spikes of the interrupting SB were removed in (A) and (B) for clarity. Successful stimuli evoked responses at t > 0 s. Blue lines mark the latency period preceding the stimulus at t = 0 s. Magenta triangles indicate stimuli delivered in preceding trials. Within training rounds, the controller was free to explore the state space; these rounds ran in closed-loop mode but with a random sequence of stimulation latencies. The strategy in this example was non-greedy. During testing rounds, the best policy learned so far was applied. After the final round, the learned latency was ≈ 1.4 s. Stimulus properties were as in Fig 1. (B) Zoom-in on the responses evoked throughout the session. Interrupted trials appear as empty rows; in this example, every stimulus delivered elicited a response. (C) Stimulus efficacy, estimated as the response strength per SB (RS/SB), computed for each training and testing round. RS/SB was considerably higher in testing than in training rounds. The fraction of interrupted trials in each round is shown as red circles and given numerically; the dashed line is a visual guide.
Fig 7.
Comparison of open-loop predictions with autonomously learned strategies.
(A) Dependence of response strength on the period of pre-stimulus inactivity during a closed-loop session in an example network. Each box summarizes the response strengths recorded at one discrete state: the central mark is the median and the box edges are the 25th and 75th percentiles. Whiskers extend to the most extreme data points not considered outliers, and outliers are plotted individually. The fit (red) was made to the medians. The minimal latency for burst termination was 0.4 s in this example; this was therefore the earliest state available for stimulation. (B) Across networks, closed-loop estimates of the gain A correlated strongly with open-loop estimates (r = 0.91, p < 10⁻⁵, n = 15 networks), indicating that A was largely stable during the experiments. (C) Closed-loop estimates of B also agreed with open-loop ones (r = 0.66, p = 0.003, n = 18 networks), although to a lesser degree. (D) Across networks, learned stimulus latencies correlated positively with the predicted optimal values (r = 0.94, p < 10⁻⁸, n = 17 networks). (E) Despite some variability in (B)–(D), the magnitudes of the modeled objective functions at the predicted and learned latencies matched closely (green dots), indicating that the network/stimulator system operated in a near-optimal regime even with slight discrepancies in latency. Exact optima were likely unreachable owing to the coarse discretization (0.5 s) of states. Red dots denote the corresponding magnitudes for a strategy delivering stimuli at random latencies (t_rand), estimated as the mean of the objective function. (F) The distribution of errors between learned and predicted latencies is centered on the predicted optimum and confined to within two discrete steps of it.
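The box-plot conventions in (A) can be reproduced with Python's `statistics` module. Two details are assumptions, since the caption does not define them: outliers are taken as points more than 1.5 interquartile ranges beyond the box (the common Tukey rule), and quartiles use `statistics.quantiles` with its default "exclusive" method, so values may differ slightly from other software's percentile definitions. The hypothetical helper `latency_error_steps` expresses a latency error in units of the 0.5 s state discretization, as in (F).

```python
import statistics

def box_stats(values, whisker=1.5):
    """Box-plot summary of the response strengths recorded at one discrete state."""
    q1, _, q3 = statistics.quantiles(values, n=4)   # 'exclusive' method by default
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr   # assumed Tukey fences
    inside = [v for v in values if low <= v <= high]
    return {
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
        "whiskers": (min(inside), max(inside)),   # most extreme non-outliers
        "outliers": sorted(v for v in values if v < low or v > high),
    }

def latency_error_steps(learned_s, predicted_s, step_s=0.5):
    """Learned minus predicted latency, in units of discrete states."""
    return round((learned_s - predicted_s) / step_s)
```

Fitting the response model to the per-state medians, as in (A), makes the fit less sensitive to the outliers that this summary flags.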
Fig 8.
Performance evaluation of the controller.
(A) Percentage of interrupted trials during training (x-axis) and testing (y-axis) sessions (n = 52 pairs across 11 networks). This percentage decreased sharply after learning in 94.2% of the recorded sessions. (B) The mean RS evoked per stimulus, however, was comparable in both sessions. (C) The variability in RS per stimulus decreased significantly (p = 0.01, two-sample t-test). (D) Comparison of the optimal stimulus efficacies predicted by our models with the efficacies achieved during the final closed-loop testing sessions. Vertical bars represent 99% confidence intervals of the models fitted for each network. Achieved values fell within the interval in 8 of the 11 networks studied. (E) Mean rewards over the trials of the final training and testing rounds, used to compare the controller’s performance. After learning, mean rewards increased in every network, indicating improved stimulation efficacy. In every network, the rewards across the sequence of trials in the training and testing rounds were drawn from distinct distributions (p < 0.002, two-sample Kolmogorov-Smirnov test). The individual distributions are shown in S2 and S3 Figs. (F) Summary of learning across networks on a plane of normalized RS/stimulus vs. normalized interruption probability (11 networks). Only the final training and testing rounds were considered. Interruption probabilities were normalized to the model-based estimate for stimulation at random latencies in each network; RS/stimulus was likewise normalized to the model-based estimate of efficacy under a random stimulation strategy. The improvement in performance clearly separates the data points in this plane. Of the two factors contributing to stimulus efficacy, the improvement was dominated by the reduction in interruption probability.
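The two-sample Kolmogorov-Smirnov comparison in (E) can be sketched with the Python standard library. The statistic D is the largest vertical distance between the two empirical cumulative distributions. The p-value below uses a common asymptotic approximation of the Kolmogorov distribution with a small-sample correction, which is an assumption about method (the caption does not say whether an exact or an asymptotic test was used). In practice a maintained implementation such as `scipy.stats.ks_2samp` would be preferable; the function names here are hypothetical.

```python
import bisect
import math

def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_x(v) - F_y(v)|."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for v in xs + ys:  # the supremum is attained at one of the sample points
        fx = bisect.bisect_right(xs, v) / len(xs)
        fy = bisect.bisect_right(ys, v) / len(ys)
        d = max(d, abs(fx - fy))
    return d

def ks_pvalue(d, n1, n2, terms=100):
    """Approximate two-sided p-value from the asymptotic Kolmogorov distribution."""
    ne = n1 * n2 / (n1 + n2)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        return 1.0  # the alternating series converges poorly here; the tail is ~1
    q = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, terms + 1))
    return min(1.0, max(0.0, q))
```

Called on the per-trial rewards of a network's final training and testing rounds, `ks_statistic` gives D and `ks_pvalue(D, n_train, n_test)` the approximate p-value.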