
The authors have declared that no competing interests exist.

Conceived and designed the experiments: SH ZJ WM. Performed the experiments: ZJ. Wrote the paper: SH ZJ WM. Theoretical analysis: SH.

Experimental data from neuroscience suggest that a substantial amount of knowledge is stored in the brain in the form of probability distributions over network states and trajectories of network states. We provide a theoretical foundation for this hypothesis by showing that even very detailed models for cortical microcircuits, with data-based diverse nonlinear neurons and synapses, have a stationary distribution of network states and trajectories of network states to which they converge exponentially fast from any initial state. We demonstrate that this convergence holds in spite of the non-reversibility of the stochastic dynamics of cortical microcircuits. We further show that, in the presence of background network oscillations, separate stationary distributions emerge for different phases of the oscillation, in accordance with experimentally reported phase-specific codes. We complement these theoretical results by computer simulations that investigate resulting computation times for typical probabilistic inference tasks on these internally stored distributions, such as marginalization or marginal maximum-a-posteriori estimation. Furthermore, we show that the inherent stochastic dynamics of generic cortical microcircuits enables them to quickly generate approximate solutions to difficult constraint satisfaction problems, where stored knowledge and current inputs jointly constrain possible solutions. This provides a powerful new computing paradigm for networks of spiking neurons, one that also throws new light on how networks of neurons in the brain could carry out complex computational tasks such as prediction, imagination, memory recall, and problem solving.

The brain not only has the capability to process sensory input; it can also produce predictions and imaginations, and solve problems that combine learned knowledge with information about a new scenario. Although these more complex information processing capabilities lie at the heart of human intelligence, we still do not know how they are organized and implemented in the brain. Numerous studies in cognitive science and neuroscience conclude that many of these processes involve probabilistic inference. This suggests that neuronal circuits in the brain process information in the form of probability distributions, but we are missing insight into how complex distributions could be represented and stored in large and diverse networks of neurons in the brain. We prove in this article that realistic cortical microcircuit models can store complex probabilistic knowledge by embodying probability distributions in their inherent stochastic dynamics – yielding a knowledge representation in which typical probabilistic inference problems such as marginalization become straightforward readout tasks. We show that in cortical microcircuit models such computations can be performed satisfactorily within a few

The question of whether brain computations are inherently deterministic or inherently stochastic is obviously of fundamental importance. Numerous experimental data highlight inherently stochastic aspects of neurons, synapses and networks of neurons on virtually all spatial and temporal scales that have been examined

The goal of this article is to provide a theoretical foundation for understanding stochastic computations in networks of neurons in the brain, in particular also for the generation of structured spontaneous activity. To this end, we prove here that even biologically realistic models

Our theoretical results imply that virtually any data-based model

A crucial issue which arises is whether reliable readouts from

The notion of a cortical microcircuit arose from the observation that “it seems likely that there is a basically uniform microcircuit pattern throughout the neocortex upon which certain specializations unique to this or that cortical area are superimposed”

We show that for this standard model of a cortical microcircuit, marginal probabilities for single random variables (neurons) can be estimated through sampling, even for fairly large instances with 5000 neurons, within a few

We also address the question of the extent to which our theoretical framework can be applied in the context of periodic input, for example in the presence of background theta oscillations

Finally, our theoretically founded framework for stochastic computations in networks of spiking neurons also throws new light on the question of how complex constraint satisfaction problems could be solved by cortical microcircuits

In order to make the results of this article accessible to non-theoreticians we present in the subsequent

A simple notion of network state at time

This result can be derived within the theory of Markov processes on general state spaces, an extension of the more familiar theory of Markov chains on finite state spaces to continuous time and infinitely many network states. Another important difference from typical Markov chains (e.g., the dynamics of Gibbs sampling in Boltzmann machines) is that the Markov processes describing the stochastic dynamics of cortical microcircuit models are non-reversible. This is a well-known difference between simple neural network models and networks of spiking neurons in the brain, where a spike of a neuron causes postsynaptic potentials in other neurons, but not vice versa. In addition, experimental results show that brain networks tend to exhibit non-reversible dynamics on longer time scales as well (e.g., stereotypical trajectories of network states
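The distinction between reversible and non-reversible Markov dynamics can be made concrete with a small numerical sketch (the three-state chain below is a hypothetical illustration, not a model from this article): a chain with a cyclic bias still possesses a unique stationary distribution, yet violates detailed balance.

```python
import numpy as np

# Transition matrix of a small 3-state Markov chain (illustrative example):
# the cyclic bias (each state mostly moves to its "next" state) makes the
# chain non-reversible.
P = np.array([[0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8],
              [0.8, 0.1, 0.1]])

# Stationary distribution: left eigenvector of P for eigenvalue 1.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi /= pi.sum()

# Detailed balance pi_i P_ij == pi_j P_ji holds iff the chain is reversible.
flows = pi[:, None] * P          # probability flow i -> j at stationarity
reversible = np.allclose(flows, flows.T)
print(pi)          # uniform distribution [1/3, 1/3, 1/3] (doubly stochastic P)
print(reversible)  # False: a stationary distribution exists without reversibility
```

The same dichotomy applies to spiking networks: a spike from neuron A influencing neuron B, but not vice versa, corresponds to such an asymmetric probability flow.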

In order to prove results on the existence of stationary distributions

Note that Theorem 1 states that the network embodies not only the joint distribution

Theorem 1 requires that neurons fire stochastically. More precisely, a basic assumption required for Theorem 1 is that the network behaves sufficiently stochastically at

An illustration for Theorem 1 is given in

The external input

The influence of the initial network state on the first

Theorem 1 also applies to networks which generate stereotypical trajectories of network activity

We address two basic types of knowledge extraction from a stationary distribution

In order to place the estimation of marginals into a biologically relevant context, assume that a particular component

But according to Theorem 1, the correct marginal distribution

Marginal probabilities of subpopulations, for example

Notably, the estimation of marginals sketched above is guaranteed by ergodic theory to converge to the correct probability as observation time increases (due to Theorem 1 which ensures that the network is an ergodic Markov process, see
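The logic of this time-average estimation of marginals can be illustrated with a minimal two-state toy chain (the switching probabilities below are arbitrary illustrative values, not circuit parameters): the fraction of time a binary unit spends in the "on" state converges to its stationary marginal.

```python
import numpy as np

rng = np.random.default_rng(0)

# A binary "neuron" switches on with probability a and off with probability b
# per time step (hypothetical parameters for illustration only).
a, b = 0.2, 0.6          # P(off -> on), P(on -> off)
T = 200_000
state, on_time = 0, 0
for _ in range(T):
    if state == 0:
        state = 1 if rng.random() < a else 0
    else:
        state = 0 if rng.random() < b else 1
    on_time += state

marginal_est = on_time / T        # fraction of time the unit is "on"
marginal_true = a / (a + b)       # stationary marginal of this chain
print(marginal_est, marginal_true)  # both close to 0.25
```

By ergodicity, the longer the observation window, the closer the time average comes to the stationary marginal, mirroring the readout scheme described in the text.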

Approximate maximum a posteriori (MAP) assignments to small subsets of variables

A sample-based approximation of this operation can be implemented by keeping track of which network states in the subnetwork
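A minimal sketch of this sample-based MAP readout (with a toy target distribution over two binary variables, chosen for illustration only) simply keeps a tally of visited states and reports the most frequent one.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(1)

# Hypothetical target distribution over the 4 joint states of two binary variables.
states = [(0, 0), (0, 1), (1, 0), (1, 1)]
probs = [0.1, 0.2, 0.15, 0.55]

# Sample-based MAP: draw network states and keep track of the most
# frequently visited one, as sketched in the text.
samples = rng.choice(len(states), size=5000, p=probs)
counts = Counter(samples)
map_state = states[counts.most_common(1)[0][0]]
print(map_state)  # (1, 1), the state with the highest probability
```

For small subsets of variables the count table stays small, which is what makes this readout plausible for subnetworks.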

Whereas many types of computations (for example probabilistic inference via the junction tree algorithm

We address this question by analyzing the convergence speed of stochastic computations in the cortical microcircuit model of

Various more efficient

Using the Gelman-Rubin diagnostic, we estimated convergence speed for marginals of single neurons (see

The above simulations were performed in a circuit of 560 neurons, but eventually one is interested in the properties of much larger circuits. Hence, a crucial question is how the convergence properties scale with the network size. To address this question, we compared convergence in the cortical microcircuit model of

In order to estimate the computation time required for obtaining marginal probabilities and MAP solutions on small

An interesting research question is which dynamic or structural properties of a cortical microcircuit model

Convergence properties for single neurons (as in

As a reference point,

In Theorem 1 we had already addressed one important case where the network

Hence, a circuit

Theta-paced spatial path sequences in hippocampus constitute a particularly well-studied example of phase-specific network activity

The previously described theoretical framework also provides an interesting new perspective on multi-stability, a widespread phenomenon which has been observed in various sensory domains

In

Altogether, one sees that the presence of background oscillations has relevant functional implications for multi-stability. In particular, oscillations in multi-stable networks facilitate both exploitation within a cycle and exploration across cycles: within a cycle, high firing rates force the network into one of the attractors, thereby avoiding interference with other attractors and facilitating the readout of a consistent network state. At the end of a cycle, low firing rates allow the network to switch to different attractors, thereby promoting fast convergence to the stationary distribution. The rhythmic deepening and flattening of attractors and the resulting phase-specific attractor dynamics could be particularly useful for the extraction of information from the circuit if downstream networks are phase-locked to the same rhythm, as reported, for example, for the interactions between neurons in hippocampus and prefrontal cortex

Whenever an inhibitory neuron fires, it briefly reduces the firing probability of its postsynaptic targets. In fact, new experimental data

We have selected a specific constraint satisfaction problem for demonstrating the capability of networks of spiking neurons to rapidly generate approximate solutions to constraint satisfaction problems through their inherent stochastic dynamics: solving Sudoku puzzles (see

This architecture makes it easy to impose the interlocking constraints of Sudoku (and of many other constraint satisfaction problems). Each pyramidal cell (or each local group of pyramidal cells) votes for placing a particular digit into an empty field of the grid that is not dictated by the external input

A specific puzzle can be entered by providing strong input

In our simulations we found that the solve time (the time until the correct solution is found for the first time) generally depends on the hardness of the Sudoku, in particular on the number of givens. For the “hard” Sudoku with 26 givens from

We have shown that for common noise models in cortical microcircuits, even circuits

The stationary distribution

Our computer simulations for a standard cortical microcircuit model

Another important issue is the tradeoff between sampling time and sampling accuracy. In high-level cognitive tasks, for example, it has been argued that “approximate and quick” sample-based decisions are often better than “accurate but slow” decisions

It had been shown in

Attractor neural networks

We had focused in our computer simulations on the investigation of the stationary distribution

A surprisingly large number of computational tasks that the brain has to solve, from the formation of a percept from multi-modal ambiguous sensory cues, to prediction, imagination, motor planning, rule learning, problem solving, and memory recall, have the form of constraint satisfaction problems: A global solution is needed that satisfies all or most of a set of soft or hard constraints. However, this characterization per se does not help us to understand how the brain can solve these tasks, because many constraint satisfaction problems are computationally very demanding (in fact, often NP-hard

Future work will need to investigate whether and how this approach can be scaled up to larger instances of NP-complete constraint satisfaction problems. For example, it will be interesting to see whether stochastic networks of spiking neurons can also efficiently generate heuristic solutions to energy minimization problems

Furthermore, additional research is needed to address suitable readout mechanisms that stabilize and evaluate promising candidate solutions (see

A substantial number of behavioral studies in cognitive science (see e.g.

In biological networks it is reasonable to assume that the network dynamics unfolds on a continuum of time scales from milliseconds to days. Our goal in this article was to focus on stochastic computations on shorter time scales, between a few milliseconds to seconds. To this end we assumed that there exists a clear separation of time scales between fast and slow dynamical network features, thus allowing us to exclude the effect of slower dynamical processes such as long-term plasticity of synaptic weights during these shorter time scales. In network models and experimental setups where slower processes significantly influence (or interfere with) the dynamics on shorter time scales, it would make sense to extend the concept of a stationary distribution to include, for example, also the synaptic parameters as random variables. A first step in this direction has been made for neurons with linear sub-threshold dynamics and discretized synapses in

Deterministic network models such as leaky integrate-and-fire neurons without noise (no external background noise, no synaptic vesicle noise and no channel noise) violate the assumptions of Theorems 1 and 2. Furthermore, although realistic neurons are known to possess various noise sources, the theoretical assumptions could in principle still fail if the network is not

For deterministic (or insufficiently stochastic) networks the question arises whether convergence to a unique stationary distribution may still occur under appropriate conditions, perhaps in some modified sense. Notably, it has been recently observed that deterministic networks may indeed lead to apparently stochastic spiking activity

Our theoretical results demonstrate that every neural system

Our Theorem 2 suggests in addition that neural systems

Our Theorem 1 predicts in addition that a generic neural circuit

The model for problem solving that we have presented in

The Sudoku example has shown that networks of spiking neurons with noise are in principle able to carry out quite complex computations. The constraints of many other demanding constraint satisfaction problems, in fact even of many NP-complete problems, can be encoded quite easily into circuit motifs composed of excitatory and inhibitory spiking neurons, and can be solved through the inherent stochastic dynamics of the network. This provides new computational paradigms and applications for various energy-efficient implementations of networks of spiking neurons in neuromorphic hardware, provided they can be equipped with sufficient amounts of noise. In particular, our results suggest that attractive computational properties of Boltzmann machines can be ported into spike-based hardware. These novel stochastic computing paradigms may also become of interest for other types of innovative computer hardware: during the coming decade, computer technology will approach the molecular scale, where noise is abundantly available (whether one wants it or not) and where it becomes inefficient to enforce traditional deterministic computing paradigms.

The results of this article show that stochastic computation provides an attractive framework for the investigation of computational properties of cortical microcircuits, and of networks of microcircuits that form larger neural systems. In particular it provides a new perspective for relating the structure and dynamics of neural circuits to their computational properties. In addition, it suggests a new way of understanding the organization of brain computations, and how they are modified through learning.

The Markov state

For each neuron

We denote the space of all possible network states of length

We study general theoretical properties of stochastic spiking circuit models, driven by some external, possibly vector-valued, input

We consider in this article two different noise models for a neuron: In noise model I, the spike generation is directly modeled as a stochastic process. All network dynamics, including axonal delays, synaptic transmission, short-term synaptic dynamics, dendritic interactions, integration of input at the soma, etc., can be modeled by a function that maps the Markov state (which includes the recent spike history of the neuron itself) onto an instantaneous spiking probability. This model is highly flexible and may account for various types of neuronal noise. In the more specific noise model II, the firing mechanism of the neuron is assumed to be deterministic, and noise enters its dynamics through stochastic vesicle release at afferent synaptic inputs. Combinations of noise models I and II within the same neuron and circuit are also covered by our theoretical results, for example neurons with a generic stochastic spiking mechanism that in addition possess stochastic synapses, or mixtures of neurons from models I and II in the same circuit.
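A toy sketch of noise model I can make the mapping from membrane potential to spiking probability concrete. The exponential link function and all parameter values below are illustrative assumptions, not the specific choices of this article:

```python
import numpy as np

# Sketch of noise model I: spikes are generated stochastically with an
# instantaneous rate that depends on the membrane potential u, here via a
# common exponential link (illustrative parameters).
def firing_prob(u, dt=1e-3, u_theta=-50e-3, du=2e-3, rate_at_theta=100.0):
    rate = rate_at_theta * np.exp((u - u_theta) / du)  # instantaneous rate in Hz
    return 1.0 - np.exp(-rate * dt)                    # spike prob. per time step dt

# Near the soft threshold the neuron fires at ~100 Hz; well below it is nearly silent.
print(round(firing_prob(-50e-3), 3))   # 0.095
print(firing_prob(-70e-3) < 1e-4)      # True
```

The soft, graded dependence of the spiking probability on the state is exactly the kind of "sufficient stochasticity" that the theoretical assumptions require.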

In noise model I, the instantaneous spiking probability of neuron

Assumptions

The input signal

In noise model II the basic stochastic event is a synaptic vesicle release (in noise model I it is a spike). Accordingly, the Markov state

Combinations of noise model I and II are also possible. In this case, the Markov state

Below, we prove the existence and uniqueness of stationary distributions of network states for the considered network models, and provide bounds on the speed of convergence to these stationary distributions. To obtain a comprehensive picture, convergence is studied under three different input conditions: constant, stochastic, and periodic input. All proofs are described in detail for noise model I. The results transfer in a straightforward manner to noise model II and mixtures of these two models, since the same framework of assumptions applies to all cases.

We view the simulation of a cortical microcircuit model, under a given input condition and starting from a given initial network state, as a random experiment. Formally, we denote the set of all possible outcomes in this random experiment by

We define the index set of time

For subsequent proofs the following definition of a

Here we write

Before studying specific input conditions, a few key properties of the network dynamics

The proposition follows directly from the fact that

Proposition 1 entails a central contraction property of stochastic networks of spiking neurons

Note that the above Contraction Lemma which holds for spiking neural networks has some similarities to Lemma 1 in

We divided the precise formulation of Theorem 1 into two Lemmata: Lemma 2 is a precise formulation for the case where inputs are fixed (e.g. fixed input rates). Lemma 3 in the next section corresponds to the case where input rates are controlled by a Markov process. The precise assumptions on the network model required for both Lemmata are described above (see “Scope of theoretical results”).

Here we assume that the vector of inputs

Under constant input conditions,

Lemma 2 provides a general ergodicity result for the considered class of stochastic spiking networks in the presence of fixed input rates

Note that, although aperiodicity and irreducibility are well-known necessary and sufficient conditions for ergodicity in discrete-time Markov chains on finite state spaces, they are not sufficient for exponential ergodicity in continuous-time Markov processes on general state spaces (see

Lemma 2 constitutes a proof for Theorem 1 for fixed input rates

Fixed input assumptions may often hold for the external input

Let

The second part of Theorem 1 (exponentially fast convergence for the case of external input generated by an ergodic Markov process) follows from Lemma 3. Note that in the main text we slightly abuse the notation

We have split the proof of Lemma 3 into proofs of four auxiliary claims (Propositions 2–5). Consider the following variations of Proposition 1, which hold for the Markov process

It is easy to show that these properties, together with the fact that

The

Denote by

According to this definition, one can express the hitting time of degree

Let

Furthermore, let

This follows from (25) and (26) which ensure that whenever the input process visits the small set

By Proposition 5,

The Markov states

If the input sequence is periodic with period

This implies the following result, which is a more precise version of Theorem 2:

Lemma 4 then follows from recursive application of (54)–(56) for multiple periods, and choosing a singleton

In the main text, we use the notation

Previous work on the question of whether states of spiking neural networks might converge to a unique stationary distribution had focused on the case where neuronal integration of incoming spikes occurs in a linear fashion, i.e., linear subthreshold dynamics followed by a single output non-linearity

The recent publication

We are not aware of previous work that studied convergence in spiking networks with dynamic synapses, or in the presence of stochastic or periodic inputs (see the second part of Theorem 1 concerning Markov processes as input, and Theorem 2). We further note that our method of proof builds on a new and rather intuitive intermediate result, Lemma 1 (Contraction Lemma), which may be useful in its own right for two reasons. On the one hand it provides more direct insight into the mechanisms responsible for convergence (the contraction between any two distributions). On the other hand, it holds regardless of the input trajectory

A key advantage of sample-based representations of probability distributions is that probabilities and expected values are in principle straightforward to estimate: To estimate the expected value

Under the mild assumptions of Theorem 1 the dynamics of a stochastic spiking network in response to an input

This approach can also be used to estimate marginal probabilities, since probabilities can be expressed as expected values, for example,
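The estimation scheme can be sketched with a toy distribution (three independent binary "neurons" with on-probability 0.3; purely illustrative, with no network dynamics involved): expected values are sample averages, and a marginal probability is the sample average of an indicator function.

```python
import numpy as np

rng = np.random.default_rng(3)

# Monte Carlo estimate of an expected value and of a marginal probability
# expressed as the expectation of an indicator function.
samples = rng.binomial(1, 0.3, size=(100_000, 3))    # three independent binary units
expected_sum = samples.sum(axis=1).mean()            # E[z1 + z2 + z3] = 3 * 0.3 = 0.9
marginal_z1 = (samples[:, 0] == 1).mean()            # p(z1 = 1) = 0.3
print(round(expected_sum, 1), round(marginal_z1, 1))  # 0.9 0.3
```

In the network setting, the samples would be network states drawn over time rather than independent draws, but the estimators take the same form.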

All simulations of microcircuit models for

A stochastic variation of the leaky integrate-and-fire model with conductance-based integration of synaptic inputs was used for both excitatory and inhibitory neurons. Sub-threshold dynamics of the membrane potential
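The conductance-based sub-threshold integration can be sketched with a minimal forward-Euler step. All parameter values below are illustrative placeholders, not the data-based settings used in the simulations:

```python
import numpy as np

# Minimal Euler-integration sketch of conductance-based sub-threshold dynamics:
#   C_m du/dt = g_L (E_L - u) + g_e(t) (E_e - u) + g_i(t) (E_i - u)
# with exponentially decaying synaptic conductances g_e, g_i.
def lif_step(u, g_e, g_i, dt=1e-4, C_m=250e-12, g_L=16.7e-9,
             E_L=-70e-3, E_e=0.0, E_i=-75e-3, tau_syn=5e-3):
    du = (g_L * (E_L - u) + g_e * (E_e - u) + g_i * (E_i - u)) / C_m
    u_new = u + dt * du
    decay = np.exp(-dt / tau_syn)       # conductance decay per time step
    return u_new, g_e * decay, g_i * decay

# With no synaptic input the membrane relaxes toward the leak reversal E_L.
u, g_e, g_i = -55e-3, 0.0, 0.0
for _ in range(10_000):                 # 1 s of simulated time
    u, g_e, g_i = lif_step(u, g_e, g_i)
print(abs(u - (-70e-3)) < 1e-4)         # True: converged close to E_L
```

Presynaptic spikes would be incorporated by incrementing g_e or g_i at spike arrival times; the stochastic spike generation itself is handled by the noise model.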

Note that Theorem 1 also holds for substantially more complex multi-compartment neuron models incorporating, for example, data on signal integration in the dendritic tuft of pyramidal cells

The short-term dynamics of synapses in all data-based simulations was modeled according to
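A common formulation of such short-term synaptic dynamics is the Tsodyks-Markram model; the following is a minimal sketch with illustrative parameters (the article's data-based parameter settings are not reproduced here).

```python
import math

# Tsodyks-Markram-style short-term synaptic dynamics (illustrative parameters):
# u tracks facilitation (utilization), x tracks available resources (depression).
def tm_synapse(spike_times, U=0.5, tau_fac=50e-3, tau_rec=800e-3):
    u, x, last_t = 0.0, 1.0, None
    amps = []
    for t in spike_times:
        if last_t is not None:
            dt = t - last_t
            u *= math.exp(-dt / tau_fac)                   # facilitation decays
            x = 1.0 - (1.0 - x) * math.exp(-dt / tau_rec)  # resources recover
        u = u + U * (1.0 - u)   # facilitation jump at spike arrival
        amps.append(u * x)      # relative synaptic efficacy of this spike
        x = x * (1.0 - u)       # resources consumed by this spike
        last_t = t
    return amps

# For these parameters, a regular 20 Hz spike train yields depressing amplitudes.
amps = tm_synapse([i * 0.05 for i in range(5)])
print(all(b < a for a, b in zip(amps, amps[1:])))  # True: depressing synapse
```

Depending on U, tau_fac and tau_rec, the same update equations produce depressing or facilitating synapses, which is why this family of models is convenient for data-based circuits with diverse synapse types.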

Layer | E | I
L2/3 | 120 | 30
L4 | 80 | 20
L5 | 200 | 50

Synaptic parameters and connectivity rules for the data-based cortical column model were taken from

We tested the validity of our cortical microcircuit model by comparing the average activity of different layers (see

The small cortical microcircuit model of

Various methods have been developed for measuring convergence speed to a stationary distribution in the context of Markov chain Monte Carlo sampling

The Gelman-Rubin convergence diagnostic

For computing the scale reduction factor
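A minimal univariate version of the scale reduction factor can be sketched as follows (the data below are synthetic, and this is the basic R-hat formula without refinements such as chain splitting):

```python
import numpy as np

rng = np.random.default_rng(4)

# Univariate Gelman-Rubin potential scale reduction factor R-hat,
# computed from m parallel chains of length n.
def gelman_rubin(chains):
    chains = np.asarray(chains)              # shape (m, n)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)          # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()    # within-chain variance
    var_hat = (n - 1) / n * W + B / n        # pooled variance estimate
    return np.sqrt(var_hat / W)

# Chains drawn from the same distribution give R-hat close to 1 ...
same = rng.normal(0.0, 1.0, size=(4, 2000))
# ... while chains stuck around different values give R-hat well above 1.
split = rng.normal(0.0, 1.0, size=(4, 2000)) + np.array([[0.], [0.], [5.], [5.]])
print(gelman_rubin(same) < 1.1, gelman_rubin(split) > 1.5)  # True True
```

Values near 1 indicate that the chains have become indistinguishable from draws of a common stationary distribution, which is the sense in which the diagnostic measures convergence speed.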

An unfortunate source of confusion is the fact that Gelman and Rubin

In the multivariate case (e.g. when analyzing convergence of the vector-valued simple state of a small subset of neurons as in the dotted lines of

Gelman-Rubin values were calculated based on

Random readouts for

Synapses onto the readout neuron were created in a similar manner as connections within the cortical column model: short-term plasticity parameters were set depending on the type of connection (EE or IE) according to

Convergence analysis of vector-valued simple states of subsets of neurons (see

In

Below are additional details to the circuits used for

The theoretical proof for Theorem 2 can be found after the proof of Theorem 1 above. For

A constraint satisfaction problem consists of a set of variables defined on some domain and a set of constraints, which limit the space of admissible variable assignments. A solution to a problem consists of an assignment to each variable such that all constraints are met. To formulate Sudoku as a constraint satisfaction problem, we define for each of the 81 fields (from a standard 9×9 grid), each of which has to be filled with a digit from 1 to 9, a set of 9 binary variables (taking values in
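This binary-variable encoding of Sudoku can be written down directly; the sketch below counts violated constraints for an assignment x, where x[r][c][d] == 1 means "digit d+1 is placed in field (r, c)" (a plain constraint checker, independent of any network implementation).

```python
# Count violated Sudoku constraints for a binary assignment x[r][c][d].
def violated_constraints(x):
    bad = 0
    rows = range(9)
    # exactly one digit per field
    bad += sum(sum(x[r][c]) != 1 for r in rows for c in rows)
    # each digit exactly once per row, per column, and per 3x3 block
    for d in range(9):
        bad += sum(sum(x[r][c][d] for c in rows) != 1 for r in rows)
        bad += sum(sum(x[r][c][d] for r in rows) != 1 for c in rows)
        bad += sum(sum(x[3*br+i][3*bc+j][d] for i in range(3) for j in range(3)) != 1
                   for br in range(3) for bc in range(3))
    return bad

# A trivially valid grid: digit ((r*3 + r//3 + c) % 9) + 1 in field (r, c).
x = [[[1 if d == (r * 3 + r // 3 + c) % 9 else 0 for d in range(9)]
      for c in range(9)] for r in range(9)]
print(violated_constraints(x))  # 0: all 324 constraints satisfied
```

The number of violated constraints plays the role of an energy: assignments with value 0 are exactly the valid Sudoku solutions.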

Sudoku can be implemented in a spiking neural network by creating for each of the 9 binary variables in each Sudoku field a local group of

Stochastic spike generation in both excitatory and inhibitory neurons is implemented consistent with the theoretical noise model I (see next section for details). The network thus fulfills all theoretical conditions for Theorem 1, and is guaranteed to have a unique stationary distribution

Simulations for

WTA circuits were formed by reciprocally connecting a single inhibitory neuron to all participating pyramidal cells. The single inhibitory neuron was modeled to mimic the response of a population of inhibitory neurons (i.e. strong inhibition for a prolonged amount of time), using an absolute refractory period of
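The functional effect of such a WTA motif can be abstracted in a toy model: shared inhibition ensures that essentially one pyramidal cell in the group dominates at any moment, with more strongly driven cells winning more often. The softmax abstraction below is an illustrative simplification, not the spiking implementation used in the simulations.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy abstraction of a WTA group: the momentary winner is drawn from a
# softmax over the cells' input drives (beta controls competition sharpness).
def wta_sample(inputs, n_samples=10_000, beta=2.0):
    p = np.exp(beta * np.asarray(inputs))
    p /= p.sum()
    return rng.choice(len(inputs), size=n_samples, p=p)

winners = wta_sample([1.0, 3.0, 0.5])
counts = np.bincount(winners, minlength=3)
print(counts.argmax())  # 1: the most strongly driven cell wins most often
```

Crucially, the competition is stochastic rather than hard: weakly driven cells still win occasionally, which is what allows the network to escape poor partial solutions.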

To set a particular puzzle, given numbers were fixed by providing strong input currents to the corresponding pyramidal cells. In particular, neurons coding for the given numbers in a Sudoku field received a constant positive input current (a constant input

A final practical remark concerns the number of neurons coding for each binary variable,

We would like to thank Stefan Häusler, Robert Legenstein, Dejan Pecevski, Johannes Bill and Kenneth Harris for helpful discussions. We are grateful to Dejan Pecevski for developing the NEVESIM simulator, an event-based simulator for networks of spiking neurons written in C++ with a Python interface (