Towards a theory of cortical columns: From spiking neurons to interacting neural populations of finite size

Neural population equations such as neural mass or field models are widely used to study brain activity on a large scale. However, the relation of these models to the properties of single neurons is unclear. Here we derive an equation for several interacting populations at the mesoscopic scale starting from a microscopic model of randomly connected generalized integrate-and-fire neuron models. Each population consists of 50–2000 neurons of the same type but different populations account for different neuron types. The stochastic population equations that we find reveal how spike-history effects in single-neuron dynamics such as refractoriness and adaptation interact with finite-size fluctuations on the population level. Efficient integration of the stochastic mesoscopic equations reproduces the statistical behavior of the population activities obtained from microscopic simulations of a full spiking neural network model. The theory describes nonlinear emergent dynamics such as finite-size-induced stochastic transitions in multistable networks and synchronization in balanced networks of excitatory and inhibitory neurons. The mesoscopic equations are employed to rapidly integrate a model of a cortical microcircuit consisting of eight neuron types, which allows us to predict spontaneous population activities as well as evoked responses to thalamic input. Our theory establishes a general framework for modeling finite-size neural population dynamics based on single cell and synapse parameters and offers an efficient approach to analyzing cortical circuits and computations.


Introduction
When neuroscientists report electrophysiological, genetic, or anatomical data from a cortical neuron, they typically refer to the cell type, say, a layer 2/3 fast-spiking interneuron, a parvalbumin-positive neuron in layer 5, or a Martinotti cell in layer 4, together with the area, say primary visual cortex or somatosensory cortex [1][2][3][4]. Whatever the specific taxonomy used, the fact that a taxonomy is plausible at all indicates that neurons can be viewed as being organized into populations of cells with similar properties. In simulation studies of cortical networks with spiking neurons, the number of different cell types, or neuronal populations, per cortical column ranges from eight [5] to about 200 [6] with 31'000 to 80'000 simulated neurons for one cortical column, but larger simulations of several columns adding up to a million neurons (and 22 cells types) have also been performed [7]. In the following, we will refer to a model where each neuron in each population is simulated explicitly by a spiking neuron model as a microscopic model.
On a much coarser level, neural mass models [8][9][10], also called field models [11][12][13], population activity equations [14], rate models [15], or Wilson-Cowan models [16] represent the activity of a cortical column at location x by one or at most a few variables, such as the mean activity of excitatory and inhibitory neurons located in the region around x. Computational frameworks related to neural mass models have been used to interpret data from fMRI [17,18] and EEG [9]. Since neural mass models give a compact summary of coarse neural activity, they can be efficiently simulated and fit to experimental data [17,18].
However, neural mass models have several disadvantages. While the stationary state of neural mass activity can be matched to the single-neuron gain function and hence to detailed neuron models [11,14,[19][20][21][22], the dynamics of neural mass models in response to a rapid change in the input does not correctly reproduce a microscopic simulation of a population of neurons [14,22,23]. Second, fluctuations of activity variables in neural mass models are either absent or described by an ad hoc noise model. Moreover, the links of neural mass models to local field potentials are difficult to establish [24]. Because a systematic link to microscopic models at the level of spiking neurons is missing, existing neural mass models must be considered as heuristic phenomenological, albeit successful, descriptions of neural activity.
In this paper we address the question of whether equations for the activity of populations, similar in spirit but not necessarily identical to Wilson-Cowan equations [16], can be systematically derived from the interactions of spiking neurons at the level of microscopic models. At the microscopic level, we start from generalized integrate-and-fire (GIF) models [14,25,26] because, first, the parameters of such GIF models can be directly, and efficiently, extracted from experiments [27] and, second, GIF models can predict neuronal adaptation under stepcurrent input [28] as well as neuronal firing times under in-vivo-like input [26]. In our modeling framework, the GIF neurons are organized into different interacting populations. The populations may correspond to different cell types within a cortical column with known statistical connectivity patterns [3,6]. Because of the split into different cell types, the number of neurons per population (e.g., fast-spiking inhibitory interneurons in layer 2/3) is finite and in the range of 50-2000 [3]. We call a model at the level of interacting cortical populations of finite size a mesoscopic model. The mathematical derivation of the mesoscopic model equations from the microscopic model (i.e. network of generalized integrate-and-fire neurons) is the main topic of this paper. The small number of neurons per population is expected to lead to characteristic fluctuations of the activity which should match those of the microscopic model.
The overall aims of our approach are two-fold. As a first aim, this study would like to develop a theoretical framework for cortical information processing. The main advantage of a systematic link between neuronal parameters and mesoscopic activity is that we can quantitatively predict the effect of changes of neuronal parameters in (or of input to) one population on the activation pattern of this as well as other populations. In particular, we expect that a valid mesoscopic model of interacting cortical populations will become useful to predict the outcome of experiments such as optogenetic stimulation of a subgroup of neurons [29][30][31]. A better understanding of the activity patterns within a cortical column may in turn, after suitable postprocessing, provide a novel basis for models of EEG, fMRI, or LFP [9,17,18,24]. While we cannot address all these points in this paper, we present an example of nontrivial activity patterns in a network model with stochastic switching between different activity states potentially linked to perceptual bistability [32][33][34] or resting state activity [18].
As a second aim, this study would like to contribute to multiscale simulation approaches [35] in the neurosciences by providing a new tool for efficient and consistent coarse-grained simulation at the mesoscopic scale. Understanding the computations performed by the nervous system is likely to require models on different levels of spatial scales, ranging from pharmacological interactions at the subcellular and cellular levels to cognitive processes at the level of large-scale models of the brain. Ideally, a modeling framework should be efficient and consistent across scales in the following sense. Suppose, for example, that we are interested in neuronal membrane potentials in one specific group of neurons which receives input from many other groups of neurons. In a microscopic model, all neurons would be simulated at the same level; in a multi-scale simulation approach, only the group of neurons where we study the membrane potentials is simulated at the microscopic level, whereas the input from other groups is replaced by the activity of the mesoscopic model. A multiscale approach is consistent, if the replacement of parts of the microscopic simulation by a mesoscopic simulation does not lead to any change in the observed pattern of membrane potentials in the target population. The approach is efficient if the change of simulation scale leads to a significant speed-up of simulation. While we do not intend to present a systematic comparison of computational performance, we provide an example and measure the speed-up factor between mesoscopic and microscopic simulation for the case of a cortical column consisting of eight populations [5].
Despite of its importance, a quantitative link between mesoscopic population models and microscopic neuronal parameters is still largely lacking. This is mainly due to two obstacles: First, in a cortical column the number of neurons of the same type is small (50-2000 [3]) and hence far from the N ! 1 limit of classic "macroscopic" theories in which fluctuations vanish [14,[36][37][38]. Systematic treatments of finite-size networks using methods from statistical physics (system size expansion [39], path integral methods [40,41], neural Langevin equations [42][43][44][45]) have so far been limited to simplified Markov models that lack, however, a clear connection to single neuron physiology.
Second, spikes generated by a neuron are generally correlated in time due to refractoriness [46], adaptation and other spike history dependencies [28,[47][48][49][50][51]. Therefore spike trains are often not well described by an (inhomogeneous) Poisson process, especially during periods of high firing rates [46]. As a consequence, the mesoscopic population activity (i.e. the sum of spike trains) is generally not simply captured by a Poisson model either [52][53][54], even in the absence of synaptic couplings [55]. These non-Poissonian finite-size fluctuations on the mesoscopic level in turn imply temporally correlated synaptic input to other neurons (colored noise) that can drastically influence the population activity [53,54,56] but which is hard to tackle analytically [57]. Therefore, most theoretical approaches rely on a white-noise or Poisson assumption to describe the synaptic input [58][59][60][61][62], thereby neglecting temporal correlations caused by spike-history dependencies in single neuron activity. Here, we will exploit earlier approaches to treating refractoriness [23] and spike frequency adaptation [55,63] and combine these with a novel treatment of finite-size fluctuations.
Our approach is novel for several reasons. First, we use generalized integrate-and-fire models that accurately describe neuronal data [25,26] as our microscopic reference. Second, going beyond earlier studies [58][59][60]64], we derive stochastic population equations that account for both strong neuronal refractoriness and finite population size in a consistent manner. Third, our theory has a non-perturbative character as it neither assumes the self-coupling (refractoriness and adaptation) to be weak [65] nor does it hinge on an expansion around the N ! 1 solution for large but finite N [55,66,67]. Thus, it is also valid for relatively small populations and non-Gaussian fluctuations. And forth, in contrast to linear response theories [55,[68][69][70][71][72], our mesoscopic equations work far away from stationary states and reproduce large fluctuations in multistable networks.
In the RESULTS section we present our mesoscopic population equations, suggest an efficient numerical implementation, and illustrate the main dynamical effects via a selected number of examples. To validate the mesoscopic theory we numerically integrate the stochastic differential equations for the mesoscopic population activities and compare their statistics to those of the population activities derived from a microscopic simulation. A detailed account of the derivation is presented in the METHODS section. In the discussion section we point out limitations and possible applications of our mesoscopic theory.

Results
We consider a structured network of interacting homogeneous populations. Homogeneous here means that each population consists of spiking neurons with similar intrinsic properties and random connectivity within and between populations. To define such populations, one may think of grouping neurons into genetically-defined cell classes of excitatory and inhibitory neurons [4], or, more traditionally, into layers and cell types (Fig 1A). For example, pyramidal cells in layer 2/3 of rodent somatosensory cortex corresponding to whisker C2 form a population of about 1700 neurons [3]. Pyramidal cells in layer 5 of the same cortical column form another one (*1200 neurons [3]), fast-spiking inhibitory cells in layer 2/3 a third one (*100 neurons [3]) and non-fast-spiking inhibitory cells in layer 2/3 a fourth one (*130 neurons [3]), and so on [3,6,73]. We suppose that the parameters of typical neurons from each population [27,73,74], the number of neurons per population [3,73] and the typical connection probabilities [5] and strengths within and between populations [73,[75][76][77][78][79] are known from experimental data. The resulting spiking neural network can be simulated on a cellular level by numerical integration of the spiking dynamics of each individual neuron (Fig 1B). In the following, we will refer to this level of description as the microscopic level. Apart from being computationally expensive, the full microscopic network dynamics is highly complex and hence difficult to understand. To overcome these shortcomings, we have developed a new mean-field description for the mesoscopic dynamics of interacting populations.

Mesoscopic population equations
A population α of size N α is represented by its population activity A a N ðtÞ (Greek superscripts label the populations, Fig 1C)

defined as
s a i ðtÞ: ð1Þ  [5] with * 80'000 neurons organized into four layers (L2/3, L4, L5, L6) each consisting of an excitatory ("e") and an inhibitory ("i") population. On the microscopic level, each individual neuron is described by a generalized integrate-and-fire (GIF) model with membrane potential u a i ðtÞ, dynamic threshold W a i ðtÞ and conditional intensity f a ðu a i ðtÞ À W a i ðtÞÞ. Inset: GIF dynamics for a specific neuron i of population L4e (α = L4e). The neuron receives spikes from neurons in L4e, L4i and L6e, which drive the membrane potential u a i ðtÞ. Spikes are elicited stochastically by a conditional intensity l a i ðtÞ ¼ f a ðu a i ðtÞ À W a i ðtÞÞ that depends on the instantaneous difference between u a i ðtÞ and the dynamic threshold W a i ðtÞ. Spike feedback (voltage reset and spike-triggered threshold movement) gives rise to spike history effects like refractoriness and adaptation. https://doi.org/ 10.1371/journal.pcbi. 1005507.g001 Here, s a i ðtÞ ¼ P k dðt À t a i;k Þ with the Dirac δ-function denotes the spike train of an individual neuron i in population α with spike times t a i;k . Empirically, the population activity is measured with a finite temporal resolution Δt. In this case, we define the coarse-grained population activity as where Δn α (t) is the number of neurons in population α that have fired in a time bin of size Δt starting at time t. The two definitions converge in the limit Δt ! 0. An example of population activities derived from spiking activity in a cortical circuit model under a step current stimulation is shown in Fig 1B. To bridge the scales between neurons and populations, the corresponding mean-field model should ideally result in the same population activities as obtained from the full microscopic model (Fig 1D). Because of the stochastic nature of the population activities, however, the qualifier "same" has to be interpreted in a statistical sense. The random fluctuations apparent in Fig 1B and 1D are a consequence of the finite number of neurons because microscopic stochasticity is not averaged out in the finite sum in Eq (1). This observation is important because estimated neuron numbers reported in experiments on local cortical circuits are relatively small [3,73]. Therefore, a quantitatively valid population model needs to account for finite-size fluctuations. As mentioned above, we will refer to the population-level with finite size populations (N * 50 to 2000 per population) as the mesoscopic level. In summary, we face the following question: is it possible to derive a closed set of evolution equations for the mesoscopic variables A a N ðtÞ that follow the same statistics as the original microscopic model?
To address this question, we describe neurons by generalized integrate-and-fire (GIF) neuron models (Fig 1A (inset) and METHODS, Sec. "Generalized integrate-and-fire model") with escape noise [14]. In particular, neuron i of population α is modeled by a leaky integrate-and-fire model with dynamic threshold [49,80]. The variables of this model are the membrane potential u a i ðtÞ and the dynamic threshold W a i ðtÞ ¼ u th þ R t À 1 y a ðt À t 0 Þs a i ðt 0 Þ dt 0 (Fig 1A, inset), where u th is a baseline threshold and θ α (t) is a spike-triggered adaptation kernel or filter function that accounts for adaptation [26,47,[81][82][83][84] and other spike-history effects [14,84] via a convolution with the neurons spike train s a i ðtÞ. In other words, the dynamic threshold depends on earlier spikes t a i;k of neuron i: W a i ðtÞ W a ðt; t a i;k < tÞ. Spikes are elicited stochastically depending on the present state of the neuron (Fig 1A, inset). Specifically, the probability that neuron i fires a spike in the next time step [t, t + Δt] is given by λ i (t)Δt, where l a i ðtÞ is the conditional intensity of neuron i (also called conditional rate or hazard rate): with an exponential function f a ðxÞ ¼ c a expðx=D a u Þ. Analysis of experimental data has shown that the "softness" parameter D a u of the threshold is in the range of 1 to 5 mV [85]. The parameter c α can be interpreted as the instantaneous rate at threshold.
Besides the effect of a spike on the threshold as mediated by the filter function θ α (t), a spike also triggers a change of the membrane potential. In the GIF model (METHODS, Sec. "Generalized integrate-and-fire model"), the membrane potential u a i ðtÞ is reset after spiking to a reset potential u r , to which u a i ðtÞ is clamped for an absolute refractory period t ref . Absolute refractoriness is followed by a period of relative refractoriness, where the conditional intensity Eq (3) is reduced. This period is determined by the relaxation of the membrane potential from the reset potential to the unperturbed or "free" potential, denoted h(t), which corresponds to the membrane potential dynamics in the absence of resets.
The GIF model accurately predicts spikes of cortical neurons under noisy current stimulation mimicking in-vivo like input [25,26] and its parameters can be efficiently extracted from single neuron recordings [26,27]. Variants of this model have also been suggested that explicitly incorporate biophysical properties such as fast sodium inactivation [86,87], conductancebased currents [88] and synaptically-filtered background noise [89].
Mean-field approximations. In order to derive a mesoscopic mean-field theory for populations of GIF neurons, we first approximate the conditional intensity l a i ðtÞ of an individual neuron by an effective rate l a A ðtjt a i Þ that only depends on its last spike timet a i and on the history of the population activity A a N ðt 0 Þ, t 0 < t (as expressed by the subscript "A"). This is called the quasi-renewal approximation [63]. Taking into account the dependence on the last spike is particularly important because of neuronal refractoriness.
To obtain such a quasi-renewal description we make two approximations. Firstly, we approximate the random connectivity by an effective full connectivity with proportionally scaled down synaptic weights ("mean-field approximation"). As a result, all neurons of the same population are driven by identical synaptic input (see METHODS). This implies that for all neurons that had the same last spike time, the time course of the membrane potential is identical, u a i ðtÞ % u A ðt;t a i Þ. Secondly, we make the quasi-renewal approximation for GIF neurons [63], which replaces the threshold ϑ i (t) by an effective threshold W a A ðt;t a i Þ. Again, the effective threshold only depends on the last spike time and the history of the population activity. As a final result we obtain the conditional intensity for all neurons with a given last spike timet as ( Fig 1C, inset). A comparison of Eq (4) with Eq (3) shows that the explicit dependence on all past spike times t a i;k <t of a given neuron i has disappeared. Instead, the conditional intensity now only depends on the last firing timet and the past population activity A a N ðt 0 Þ, t 0 < t. To keep the notation simple, we omit in the following the population label α at all quantities.
Finite-size mean field theory. In this section, we present the main theoretical results with a focus on the finite-size effects arising from neuronal refractoriness. So far, we have effectively reduced the firing probability of a neuron to a function l A ðtjtÞ that only depends on its last spike timet (Fig 2A). This allows us to describe the evolution of the system by the density of the last spike time [23,68,88,89]. Because the last spike time characterizes the refractory state of the neuron, this density will also be referred to as the refractory density. Before we describe the novel finite-N theory, it is instructive to first recall the population equations for the case of infinite N (Fig 2B and 2C). Let us look at the population of neurons at time t and ask the following question: What fraction of these neurons has fired their last spike betweent and t þ dt? This fraction is given by the number of neurons AðtÞdt that have fired in this interval multiplied by the survival probability SðtjtÞ that such a neuron has not fired again until time t. In other words, the product Q 1 ðt;tÞ ¼ SðtjtÞAðtÞ evaluated at time t represents the density of last spike timest. Because a neuron with last spike timet emits a spike with rate l A ðtjtÞ, the total population activity at time t is given by the integral [23] AðtÞ ¼ This situation is depicted in Fig 2B. At the same time, the survival probability SðtjtÞ of neurons In the bin immediately before t, a large fluctuation (blue peak) was induced by forcing some of the neurons with last spike time around t* to fire. At time t, the density of last spike times (blue) has a hole and a peak of equal probability mass. The red line shows the pseudo-density S 0 ðtjtÞAðtÞ that would be obtained if we had used the survival probability S 0 ðtjtÞ of the unperturbed system. The peak att ¼ t À Dt does not contribute to the activity A(t) because of refractoriness, but the hole att ¼ t Ã contributes with a non-vanishing rate (A), implying a reduction of A(t) (hatched bin). (D) For a finite population size (here N = 400), the refractory density S N ðtjtÞA N ðtÞ (blue), determines the expectation " AðtÞ (hatched bin) of the fluctuating activity A N (t). Analogous to the forced fluctuation in (C), the finite-size fluctuations are associated with negative and positive deviations in the refractory density (holes and overshoots) compared to the non-normalized density SðtjtÞA N ðtÞ (red line) that would be obtained if only the mean SðtjtÞ and not the exact survival fraction S N ðtjtÞ had been taken into account. The variance of the deviations is proportional to vðt;tÞ given by Eq (12). As a function oft, vðt;tÞ shows the range oft where deviations are most prominent (bottom). (E, F) Given the number of neurons firing in the bin ½t;t þ DtÞ, DnðtÞ, the fraction of neurons that survive until time t is shown for ten realizations (gray lines, one highlighted in black for clarity). The mean fraction equals the survival probability SðtjtÞ (red line, top panel). The variance of the number of survived neurons at time t, vðt;tÞNDt, is shown at the bottom (red line: semi-analytic theory, Eq (12); circles: simulation). (E) DnðtÞ ¼ 5, (F) DnðtÞ ¼ 40.
In the limit N ! 1, the dynamics of A N (t) = A(t) is deterministic because microscopic noise averages out. Nevertheless, the infinite-N case is useful to understand the main effect of fluctuations in the finite-N case. To this end, let us perform the following thought experiment: in a small time interval of length Δt immediately before time t, we induce a large, positive fluctuation in the activity by forcing many of the neurons with last spike close to a given timet ¼ t Ã to fire a spike ( Fig 2C). As a result, the density of last spike times at time t exhibits a large peak just prior to time t corresponding to the large number of neurons that have been forced to fire in the time interval [t − Δt, t). At the same time, these neurons leave behind a "hole" in the density aroundt ¼ t Ã . Because the number of neurons is conserved, this hole exactly balances the area under the peak, and hence, the density of last spike times remains normalized. However, the two fluctuations (the hole and the peak) have two different effects on the population activity after time t. Specifically, the hole implies that some of the neurons which normally would have fired at time t with a nonzero rate λ A (t|t Ã ) > 0 are no longer available. Moreover, neural refractoriness implies that neurons which fired in the peak have a small or even vanishing rate λ A (t|t − Δt) % 0 at time t. As a result, the population activity is reduced shortly after the perturbation (Fig 2C). This example highlights the importance of the normalization of the refractory density as well as the state-dependence of the firing probability for understanding the effect of fluctuations. In particular, the normalization condition and neuronal refractoriness imply that a positive fluctuation of the population activity is succeeded by a negative fluctuation, and vice versa. This behavior is characteristic for spiking neurons, which are known to exhibit negative auto-correlations of their mean-centered spike trains at short time lags (see e.g. [14,53,90]) . We now turn to the finite-size case. In this case, it is advantageous to discretize the last spike times into small bins of size Δt that begin at the grid pointst k ¼ kDt, k 2 Z. Furthermore, we adopt the definition of the coarse-grained population activity, Eq (2), i.e. we consider the number of spikes Dnðt k Þ in the time bin ½t k ;t k þ DtÞ. Instead of the survival probability, we introduce the fraction of survived neurons S N ðtjt k Þ, t >t k , such that S N ðtjt k ÞDnðt k Þ is the number of neurons from bin k that have not fired again until time t (Fig 2D and 2E). Dividing this number by NΔt and taking the continuum limit Δt ! 0, yields the density of last spike times Q N ðt;tÞ ¼ S N ðtjtÞA N ðtÞ in the case of finite N. Since all neurons are uniquely identified by their last spike time, this density is normalized [23] 1 ¼ We note that differentiating this equation with respect to time t yields the population activity A N ðtÞ ¼ À R t À 1 @ t S N ðtjtÞA N ðtÞ dt as the formal finite-size analog of Eq (5). The change of the survival fraction @ t S N ðtjtÞ, however, is not deterministic anymore as in Eq (6) but follows a stochastic jump process (Fig 2E and 2F): In the time step [t, t + Δt), the number of survived neurons for a given bint k in the past, S N ðtjt k ÞDnðt k Þ, makes a downstep Xðt;t k Þ that corresponds to the number of neurons that fire in the group with last spike timet k . For sufficiently small Δt, this number is Poisson-distributed with mean l A ðtjt k ÞS N ðtjt k ÞDnðt k ÞDt. Hence, the fraction of survived neurons S N ðtjt k Þ evolves in time like a random stair case according to the update rule S N ðt þ Dtjt k Þ ¼ S N ðtjt k Þ À Xðt;t k Þ=Dnðt k Þ. The activity A N (t) = Δn(t)/(NΔt) in the time bin starting at t is given by the sum of all the downsteps, DnðtÞ ¼ P^t k <t Xðt;t k Þ, where the sum runs over all possible last spike times. This updating procedure represents the evolution of the density of last spike times, Q N ðt;tÞ ¼ S N ðtjtÞA N ðtÞ, for finite N under the quasirenewal approximation (cf. METHODS, Eqs (41) and (42)). Although it is possible to simulate such a finite-N stochastic process using the downsteps Xðt;t k Þ, this process will not yield the reduced mesoscopic dynamics that we are looking for. The reason is that the variable S N ðtjt k Þ refers to the subpopulation of neurons that fired in the small time bin att k . For small Δt, the size of the subpopulation, Dnðt k Þ, is a small number, much smaller than N. In particular, in the limit Δt ! 0, the simulation of S N ðtjt k ÞDnðt k Þ for allt k in the past would be as complex as the original microscopic dynamics of N neurons. Therefore we must consider such a simulation as a microscopic simulation. To see the difference to a mescoscopic simulation, we note that the population activity A N (t) involves the summation of many random processes (the downsteps Xðt;t k Þ) over many small bins. If we succeed to simulate directly A N (t) from an underlying rate dynamics that depends deterministically on the past activities A N (t 0 ), t 0 < t, we will have a truely mescoscopic simulation. How to arrive at a formulation directly at the level of mescocopic quantities will be the topic of the rest of this section.
The crucial question is whether the stochasticity of the many different random processes fS N ðtjt k Þg^t k <t can be reduced to a single effective noise process that drives the dynamics on the mesoscopic level. To this end, we note that for small Δt and given history Dnðt k Þ,t k < t, each bint k contributes with rate l A ðtjt k ÞS N ðtjt k ÞDnðt k Þ a Poisson random number of spikes to the total activity at time t ( Fig 2D). Therefore, the total number of spikes Δn(t) is Poisson distributed with mean N " AðtÞDt, where " AðtÞ is the expected population rate Here, the integral extends over all last spike timest up to but not including time t. Eq (8) still depends on the stochastic variables fS N ðtjt k Þg^t k <t . The main strategy to remove this microscopic stochasticity is to use the evolution of the survival probability SðtjtÞ, given by Eq (6), and the normalization condition Eq (7). For finite N, the quantity Sðtjt k Þ is formally defined as the solution of Eq (6) and can be interpreted as the mean of the survival fraction S N ðtjt k Þ (Fig 2E and 2F, see METHODS). Importantly, Sðtjt k Þ is a valid mesoscopic quantity since it only depends on the mesoscopic population activity A N (through l A ðtjt k Þ, cf. Eq (6)), and not on a specific microscopic realization. Combining the survival probability SðtjtÞ with the actual history of the mesoscopic activity A N ðtÞ fort < t yields the pseudo-density Qðt;t k Þ ¼ Sðtjt k ÞA N ðt k Þ. In contrast to the macroscopic density Q 1 ðt;tÞ ¼ SðtjtÞAðtÞ in Eq (5) or the microscopic density Q N ðt;t k Þ ¼ S N ðtjt k ÞA N ðt k Þ, the pseudo-density is not normalized. However, the pseudo-density Sðtjt k ÞA N ðt k Þ has the advantage that it is based on mesoscopic quantities only.
Let us split the survival fraction into the mesoscopic quantity Sðtjt k Þ and a microscopic deviation, S N ðtjt k Þ ¼ Sðtjt k Þ þ dSðtjt k Þ. In analogy to the artificial fluctuation in our thought experiment, endogenously generated fluctuations in the finite-size system are accompanied by deviations of the microscopic density S N ðtjt k ÞA N ðt k Þ from the pseudo-density Sðtjt k ÞA N ðt k Þ (Fig 2C and 2D, red line). A negative deviation (dSðtjt k Þ < 0) can be interpreted as a hole and a positive deviation (dSðtjt k Þ > 0) as an overshoot (compare red curve and blue histogram in Fig 2D). Similar to the thought experiment, the effect of these deviations needs to be weighted by the conditional intensity l A ðtjt k Þ in order to arrive at the population activity. Eq (8) yields Analogously, the normalization of the refractory density, Eq (7), can be written as We refer to the second integral in Eq (9) as a correction term because it corrects for the error that one would make if one sets S N = S in Eq (8). This correction term represents the overall contribution of the holes (δS < 0) and overshoots (δS > 0) to the expected activity.
To eliminate the microscopic deviations dSðtjtÞ in Eq (9) we use the normalization condition, Eq (10). This is possible because the correction term is tightly constrained by the sum of all holes and overshoots, R t À 1 dSðtjtÞA N ðtÞ dt, which by Eq (10), is completely determined by the past mesoscopic activities. Eqs (9) and (10) suggest to make the deterministic ansatz R t À 1 l A dSðtjtÞA N ðtÞ dt % LðtÞ R t À 1 dSðtjtÞA N ðtÞ dt for the correction term. As shown in METH-ODS ("Mesoscopic population equations"), the optimal rate Λ(t) that minimizes the mean squared error of this approximation is given by Here, the quantity vðt;tÞ, called variance function, obeys the differential equation with initial condition vðt;tÞ ¼ 0 (see METHODS, Eq (51). Importantly, the dynamics of v involves mesoscopic quantities only, and hence v is mesoscopic. As shown in METHODS and illustrated in Fig 2D (bottom), we can interpret vðt;t k ÞNDt as the variance of the number of survived neurons, S N ðtjt k ÞDnðt k Þ. To provide an interpretation of the effective rate Λ(t) we note that, for fixed t, the normalized variance vðt;tÞ= R t À 1 vðt;tÞ dt is a probability density overt. Thus, the effective rate Λ(t) can be regarded as a weighted average of the conditional intensity l A ðtjtÞ that accounts for the expected amplitude of the holes and overshoots.
Using the effective rate Λ(t) in Eq (9) results in the expected activity Looking at the structure of Eq (13), we find that the first term is the familiar population integral known from the infinite-N case, Eq (5). The second term is a correction that is only present in the finite-N case. In fact, in the limit N ! 1, the pseudo-density SðtjtÞA N ðtÞ converges to the macroscopic density SðtjtÞAðtÞ, which is normalized to unity. Hence the correction term vanishes and we recover the population Eq (5) for the infinite system. To obtain the population activity we consider an infinitesimally small time scale dt such that the probability of a neuron to fire during an interval [t, t + dt) is much smaller than one, i.e. " AðtÞdt ( 1. On this time scale, the total number of spikes dn(t) is an independent, Poisson distributed random number with mean " AðtÞNdt, where " AðtÞ is given by Eq (13). From Eq (2) thus follows the population activity Alternatively, the population activity can be represented as a δ-spike train, or "shot noise", where ft pop;k g k2Z is a random point process with a conditional intensity function l pop ðtjH t Þ ¼ N " AðtÞ. Here, the condition H t denotes the history of the point process {t pop,k } up to (but not including) time t, or equivalently the history of the population activity A N ðtÞ fort < t. The conditional intensity means that the conditional expectation of the population activity is given by hA N ðtÞij H t ¼ " AðtÞ, which according to Eq (13) is indeed a deterministic functional of the past activities. Finally, we note that the case of large but finite populations permits a Gaussian approximation, which yields the more explicit form Here, ξ(t) is a Gaussian white noise with correlation function hξ(t)ξ(t 0 )i = δ(t − t 0 ). The correlations of ξ(t) are white because spikes at t 0 and t > t 0 are independent given the expected population activity " AðtÞ at time t. However, we emphasize that the expected population activity " AðtÞ does include information on the past fluctuations ξ(t 0 ) at time t 0 . Therefore the fluctuations of the total activity A N (t) are not white but a sum of a colored process " AðtÞ and a white-noise process ξ(t) [55]. The white noise gives rise to the delta peak of the auto-correlation function at zero time lag which is a standard feature of any spike train, and hence also of A N (t). The colored process " AðtÞ, on the other hand, arises from Eq (13) via a filtering of the actual population activity A N ðtÞ which includes the past fluctuations xðtÞ. For neurons with refractoriness, " AðtÞ is negatively correlated with recent fluctuations xðtÞ (cf. the thought experiment of Fig  2B) leading to a trough at small time lags in the spike auto-correlation function [14,53,90].
The set of coupled Eqs (6), (12), (11)- (14) constitute the desired mesoscopic population dynamics and is the main result of the paper. The dynamics is fully determined by the history of the mesoscopic population activity A N . The Gaussian white noise in Eq (15) or the independent random number involved in the generation of the population activity via Eq (14) is the only source of stochasticity and summarizes the effect of microscopic noise on the mesoscopic level. Microscopic detail such as the knowledge of how many neurons occupy a certain microstatet has been removed.
One may wonder where the neuronal interactions enter in the population equation. Synaptic interactions are contained in the conditional intensity l A ðtjtÞ which depends on the membrane potential u A ðt;tÞ, which in turn is driven by the synaptic current that depends on the population activity via Eq (29) in METHODS. An illustration of the derived mesoscopic model is shown in Fig 1C (inset). In this section, we considered a single population to keep the notation simple. However, it is straightforward to formulate the corresponding equations for the case of several equations as shown in METHODS, Sec. "Several populations".
Stochastic population dynamics can be efficiently simulated forward in time. The stochastic population equations provide a rapid means to integrate the population dynamics on a mesoscopic level. To this end, we devised an efficient integration algorithm based on approximating the infinite integrals in the population equation Eq (13) by discrete sums over a finite number of refractory statest (METHODS, Sec. "Numerical implementation"). The algorithm involves the generation of only one random number per time step and population, because the activity is sampled from the mesoscopic rate " A a ðtÞ. In contrast, the microscopic simulation requires in each time step to draw a random number for each neuron. Furthermore, because the population equations do not depend on the number of neurons, we expect a significant speed-up factor for large neural networks compared to a corresponding microscopic simulation. For example, the microscopic simulation of the cortical column in Fig 1B took 13.5 minutes to simulate 10 seconds of biological time, whereas the corresponding forward integration of the stochastic population dynamics (Fig 1D) took only 6.6 seconds on the same machine (see Sec. "Comparison of microscopic and mesoscopic simulations").
A pseudocode of the algorithm to simulate neural population dynamics is provided in METHODS (Sec. "Numerical implementation"). In addition to that, a reference implementation of this algorithm is publicly available under https://github.com/schwalger/mesopopdyn_gif, and will be integrated in the Neural Simulation Tool (NEST) [91], https://github.com/nest/ nest-simulator, as a module presumably named gif_pop_psc_exp.

Two different consequences of finite N
For a first analysis of the finite-size effects, we consider the special case of a fully-connected network of Poisson neurons with absolute refractory period [14]. In this case, the conditional intensity can be represented as l A ðtjtÞ ¼ f ðhðtÞÞYðt Àt À t ref Þ, where t ref is the absolute refractory period, Θ(Á) is the Heaviside step function and h(t) is the free membrane potential, which obeys the passive membrane dynamics where τ m is the membrane time constant, μ(t) = u rest + RI(t) (where u rest is the resting potential and R is the membrane resistance) accounts for all currents I(t) that are independent of the population activities, J is the synaptic strength and (t) is a synaptic filter kernel (see METHODS, Eq (27) for details). For the mathematical analysis, we assume that the activity A N (t) and input μ(t) have started at t = −1 so that we do not need to worry about initial conditions. In a simulation, we could for example start at t = 0 with initial conditions A N (t) = δ(t) for t 0 and h(0) = 0. For the conditional intensity given above, the effective rate Λ(t), Eq (11), is given by Λ(t) = f(h(t)) because the variance vðt;tÞ is zero during the absolute refractory period t À t ref t < t. As a result, the mesoscopic population Eq (13) reduces to the simple form This mesoscopic equation is exact and could have been constructed directly in this simple case. For N ! 1, where A N (t) becomes identical to " AðtÞ, this equation has been derived by Wilson and Cowan [16], see also [14,23,92]. The intuitive interpretation of Eq (17) is that the activity at time t consists of two factors, the "free" rate λ free (t) = f(h(t)) that would be expected in the absence of refractoriness and the fraction of actually available ("free") neurons that are not in the refractory period. For finite-size populations, these two factors explicitly reveal two distinct finite-size effects: firstly, the free rate is driven by the fluctuating population activity A N (t) via Eq (16) and hence the free rate exhibits finite-size fluctuations. This effect originates from the transmission of the fluctuations through the recurrent synaptic connectivity. Secondly, the fluctuations of the population activity impacts the refractory state of the population, i.e. the fraction of free neurons, as revealed by the second factor in Eq (17). In particular, a large positive fluctuations of A N in the recent past reduces the fraction of free neurons, which causes a negative fluctuation of the number N " AðtÞDt of expected firings in the next time step. Therefore, refractoriness generates negative correlations of the fluctuations hΔA(t)ΔA(t 0 )i for small |t − t 0 |. We note that such a decrease of the expected rate would not have been possible if the correction term in Eq (13) was absent. However, incorporating the effect of recent fluctuations (i.e. fluctuations in the number of refractory neurons) on the number of free neurons by adding the correction term, and thereby balancing the total number of neurons, recovers the correct Eq (17).
The same arguments can be repeated in the general setting of Eq (13). Firstly, the conditional intensity l A ðtjtÞ depends on the past fluctuations of the population activity because of network feedback. Secondly, the fluctuations lead to an imbalance in the number of neurons across different states of relative refractoriness (i.e. fluctuations do not add up to zero) which gives rise to the "correction term", i.e. the second term on the r.h.s. of Eq (13).

Comparison of microscopic and mesoscopic simulations
We wondered how well the statistics of the population activities obtained from the integration of the mesoscopic equations compare with the corresponding activities generated by a microscopic simulation. As we deal with a finite-size system, not only to the first-order statistics (mean activity) but also higher-order statistics needs to be considered. Because there are several approximations involved (e.g. full connectivity, quasi-renewal approximation and effective rate of fluctuations in the refractory density), we do not expect a perfect match. To compare first-and second-order statistics, we will mainly use the power spectrum of the population activities in the stationary state (see METHODS, Sec. "Power spectrum").
Mesoscopic equations capture refractoriness. Our theory describes the interplay between finite-size fluctuations and spike-history effects. The most prominent spike-history effect is refractoriness, i.e. the strong effect of the last spike on the current probability to spike.
To study this effect, we first focus on a population of uncoupled neurons with a constant threshold corresponding to leaky integrate-and-fire (LIF) models without adaptation (Fig 3). The reset of the membrane potential after each spike introduces a period of relative refractoriness, where spiking is less likely due to a hyper-polarized membrane potential ( Fig 3A). Because of the reset to a fixed value, the spike trains of the LIF neurons are renewal processes. In the stationary state, the fluctuation statistics as characterized by the power spectrum is known analytically for the case of a population of independent renewal spike trains (Eq (134) in METHODS).
A single realization of the population activity A N (t) fluctuates around the expected activity " AðtÞ that exhibits a typical ringing in response to a step current stimulation [23,93]. The time course of the expected activity as well as the size of fluctuations are roughly similar for microscopic simulation (Fig 3B, top) and the numerical integration of the population equations ( Fig  3B, bottom). We also note that the expected activity " AðtÞ is not constant in the stationary regime but shows weak fluctuations. This is because of the feedback of the actual realization of A N (t 0 ) for t 0 < t onto the dynamics of " AðtÞ, Eq (13). A closer inspection confirms that the fluctuations generated by the mesoscopic population dynamics in the stationary state exhibit the same power spectrum (Fig 3C) as the theoretically predicted one, which is given by Eq (134). In particular, the mesoscopic equations capture the fluctuation statistics even at high firing rates, where the power spectrum strongly deviates from the white (flat) power spectrum of a Poisson process (Fig 3C bottom). The pronounced dip at low-frequencies is a well-known signature of neuronal refractoriness [94].
Mesoscopic equations capture adaptation and burstiness. Further important spike-history effects can be realized by a dynamic threshold. For instance, spike-frequency adaptation, where a neuron adapts its firing rate in response to a step current after an initial strong response ( Fig 4A and 4B), can be modeled by an accumulating threshold that slowly decays between spikes [47,49]. In single realizations, the mean population rate as well as the size of fluctuations appear to be similar for microscopic and mesoscopic case ( Fig 4B, top and bottom, respectively). For a more quantitative comparison we compared the ensemble statistics as quantified by the power spectrum. This comparison reveals that the fluctuation statistics is well captured by the mesoscopic model ( Fig 4C). The main effect of adaptation is a marked reduction in the power spectrum at low frequencies compared to the non-adaptive neurons of Fig 3. The small discrepancies compared to the microscopic simulation originate from the quasi-renewal approximation, which does not account for the individual spike history of a neuron before the last spike but only uses the population averaged history. This approximation is expected to work well if the threshold kernel changes slowly, effectively averaging the spike history locally in time [63].
In the case of fast changes of the threshold kernel, we do not expect that the quasi-renewal approximation holds. For example, a biphasic kernel [95] with a facilitating part at short interspike intervals (ISI) and an adaptation part for large ISIs (Fig 4D-(i)) can realize bursty spike patterns ( Fig 4D-(ii)). The burstiness is reflected in the ISI density by a peak at small ISIs, corresponding to ISIs within a burst, and a tail that extends to large ISIs representing interburst intervals ( Fig 4D-(iii)). Remarkably, the mesoscopic equations with the quasi-renewal approximation qualitatively capture the burstiness, as can be seen by the strong low-frequency power at about 1 Hz (Fig 4E). At the same time, the effect of adaptation manifests itself in a reduced power at even lower frequencies. The systematic overestimation of the power across frequencies implies a larger variance of the empirical population activity obtained from the  mesoscopic simulation, which is indeed visible by looking at the single realizations ( Fig 4D-(iv)). As an aside, we note that facilitation which is strong compared to adaptation can lead to unstable neuron dynamics even for isolated neurons [96].
Recurrent network of randomly connected neurons. So far, we have studied populations of uncoupled neurons. This allowed us to demonstrate that the mesoscopic dynamics captures effects of single neuron dynamics on the fluctuations of the population activity. Let us now suppose that each neuron in a population α is randomly connected to presynaptic neurons in population β with probability p αβ such that the in-degree is fixed to p αβ N β connections. In the presence of synaptic coupling, the fluctuations at time t are propagated through the recurrent connectivity and may significantly influence the population activity at time t 0 > t. For instance, in a fully-connected network (p αβ = p = 1 for all α and β) of excitatory and inhibitory neurons (E-I network, Fig 5B and 5C), all neurons within a population receive identical inputs given by the population activities A a N ðtÞ (cf. Eq (29)). Finite-size fluctuations of A a N ðtÞ generate common input to all neurons and tend to synchronize neurons. This effect manifests itself in large fluctuations of the population activity ( Fig 5B). Since the mean-field approximation of the synaptic input becomes exact in the limit p ! 1, we expect a good match between the microscopic and mesoscopic simulation in this case. Interestingly, the power spectra of the population activities obtained from these simulations coincide well even for an extremely small E-I network consisting of only one inhibitory and four excitatory neurons ( Fig 5C). This extreme case of N = 5 neurons with strong synapses (here, w EE = w IE = 12 mV, w II = w EI = −60 mV) highlights the non-perturbative character of our theory for fully-connected networks, which does not require the inverse system size or the synaptic strength to be small. In general, the power spectra reveal pronounced oscillations that are induced by finite size fluctuations [43]. The amplitude of these stochastic oscillations decreases as the network size increases and vanishes in the large-N limit.
If the network is not fully but randomly-connected (0 < p < 1), neurons still share a part of the finite-size fluctuations of the population activity. Earlier theoretical studies [58,68,97] have pointed out that these common fluctuations inevitably yield correlated and partially synchronized neural activity, as observed in simulations (Fig 5D and 5F). This genuine finite-size effect decreases for larger networks approaching an asynchronous state [98] (Fig 5C and 5E). As argued in previous studies [58][59][60], the fluctuations of the synaptic input can be decomposed into two components, a coherent and an incoherent one. The coherent fluctuations are given by the fluctuations of the population activity and are thus common to all neurons of a population. This component is exactly described by our mean-field approximation, u a i ðtÞ % u A ðt;t a i Þ used in Eq (4) (cf. METHODS, Eq (31)). The incoherent fluctuations are caused by the quenched randomness of the network (i.e. each neuron receives input from a different subpopulation of the network) and have been described as independent Poisson input in earlier studies [58][59][60]. If we compare the membrane potential of a single neuron with the one expected from the mean-field approximation ( Fig 6A, 6C and 6E, top), we indeed observe a significant difference in fluctuations. This difference originates from the incoherent component. Differences in membrane potential will lead to differences in the instantaneous spike emission probability for each individual neuron; cf. Eq (3). However, in order to calculate the population activity we need to average the conditional firing rate of Eq (3) over all neurons in the population (see Methods, Eq (28) for details). Despite the fact that each neuron is characterized by a different last firing timet, the differences in firing rate caused by voltage fluctuations will, for sufficiently large N and not too small p, average out whereas common fluctuations caused by past fluctuations in the population activity will survive (Fig 6A, 6C and 6E, bottom). In other words, the coherent component is the one that dominates the finite-N activity whereas the incoherent one is washed out. Therefore, mesoscopic population activities can be well described by our mean-field approximation even when the network is not fully connected (Figs 5E, 5G and 6B, 6D, 6F). Remarkably, even for C = 200 synapses per neuron and p = 0.05, the mesoscopic model agrees well with the microscopic model. However, if both N and p are small, the mesoscopic theory breaks down as expected ( Fig 5G, blue circles). Furthermore, strong synaptic weights imply strong incoherent noise, which is then passed through the exponential nonlinearity of the hazard function. This may lead to deviations of the population-averaged hazard rate from the corresponding mean-field approximation ( Fig 6E, bottom), and, consequently, to deviations between microscopic and mesoscopic population activities in networks with strong random connections ( Fig 6F).
Finite-size induced switching in bistable networks. In large but finite E-I networks, the main effect of weak finite-size fluctuations is to distort the deterministic population dynamics of the infinitely large network (stable asynchronous state or limit cycle motion) leading to stochastic oscillations and phase diffusion that can be understood analytically by linear response theory [43,55,60,69] and weakly nonlinear analysis [58]. This is qualitatively different in networks with multiple stable states. In such networks, finite-size fluctuations may have a drastic effect because they enable large switch-like transitions between metastable states that cannot be described by a linear or weakly nonlinear theory. We will show now that our mesoscopic population equation accurately captures strongly nonlinear effects, such as large fluctuations in multistable networks.
Multistability in spiking neural networks can emerge as a collective effect in balanced E-I networks with clustered connectivity [99], and, generically, in networks with a winner-take-all architecture, where excitatory populations compete through inhibitory interactions mediated by a common inhibitory population (see, e.g. [14,[100][101][102][103][104] and Fig 7A). Jumps between metastable states have been used to model switchings in bistable perception [32][33][34]. To understand such finite-size induced switching in spiking neural networks on a qualitative level, phenomenological rate models have been usually employed [32,104]. In these models, stochastic switchings are enabled by noise added to the deterministic rate equations in an ad hoc manner. Our mesoscopic mean-field equations keep the spirit of such rate equations, however with the important difference that the noisy dynamics is systematically derived from the . Switching between high-and low-activity states is more regular than in the non-adapting case as revealed by a low-frequency peak in the power spectrum (F) and a narrow, unimodal density of dominance times (G). In ( underlying spiking neural network without any free parameter. Here, we show that the mesoscopic mean-field equations quantitatively reproduce finite-size induced transitions between metastable states of spiking neural networks. We emphasize that the switching statistics depends sensitively on the properties of the noise that drive the transitions [105]. Therefore, an accurate account of finite-size fluctuations is expected to be particularly important in this case. We consider a simple bistable network of two excitatory populations with activities A E1 and A E2 , respectively, that are reciprocally connected to a common inhibitory population with activity A I (Fig 7A). We choose the mean input and the connection strength such that in the large-N limit the population activities exhibited two stable equilibrium states, one corresponding to a situation, where A E1 is high and A E2 is low, the other state corresponding to the inverse situation, where A E1 is low and A E2 is high. We found that in smaller networks, finite-size fluctuations are indeed sufficient to induce transitions between the two states leading to repeated switches between high-and low-activity states (Fig 7B and 7E). The regularity of the switching appears to depend crucially on the presence of adaptation, as has been suggested previously [32,34]. Remarkably, both in the presence and absence of adaptation, the switching dynamics of the spiking neural network appears to be well reproduced by the mesoscopic mean-field model.
For a more quantitative comparison, we use several statistical measures that characterize the bistable activity. Let us first consider the case without adaptation. As before, we compare the power spectra of the population activity for both microscopic and mesoscopic simulation and find a good agreement (Fig 7C). The peak in the power spectrum at relatively high-frequency reveals strong, rapid oscillations that are visible in the population activity after a switch to the high-activity state (inset of Fig 7B with magnified view). In contrast, the large power at low frequencies corresponds to the slow fluctuations caused by the switching of activity between the two excitatory populations, as revealed by the low-pass filtered population activity ( Fig 7B). The Lorentzian shape of the power spectrum caused by the slow switching dynamics is consistent with stochastically independent, exponentially distributed residence times in each of the two activity states (i.e., a homogeneous Poisson process). The residence time distribution shows indeed a monotonic, exponential decay ( Fig 7D) both in the microscopic and mesoscopic model. Furthermore, we found that residence times do not exhibit significant serial correlations. Together, this confirms the Poissonian nature of bistable switching in our three-population model of neurons without adaptation.
In models for perceptual bistability, residence times in the high-activity state are often called dominance times. The distribution of dominance times is usually not exponential but has been described by a more narrow, gamma-like distribution (see, e.g. [106]). Such a more narrow distribution emerges in a three-population network where excitatory neurons are weakly adaptive. When the population enters a high-activity state, the initial strong increase of the population activity is now followed by a slow adaptation to a somewhat smaller, stationary activity ( Fig 7E). Eventually, the population jumps back to the low-activity state. The switching dynamics is much more regular with than without adaptation leading to slow stochastic oscillations as highlighted by a second peak in the power spectrum at low frequencies ( Fig 7F) and a narrow distribution of dominance times (Fig 7G), in line with previous theoretical studies [32][33][34]. We emphasize, however, that in contrast to these studies the underlying deterministic dynamics for N ! 1 is in our case not oscillatory but bistable, because the adaptation level is below the critical value necessary in the deterministic model to switch back to the low-activity state.
The emergence of regular switching due to finite-size noise can be understood by interpreting the residence time of a given population in the high-activity state as arising from two stages: (i) the initial transient of the activity to a decreased (but still large) stationary value due to adaptation and (ii) the subsequent noise-induced escape from the stationary adapted state. The first stage is deterministic and hence does not contribute to the variability of the residence times. The variability results mainly from the second stage. The duration of the first stage is determined by the adaptation time scale. If this time covers a considerable part of the total residence time, we expect that the coefficient of variation (CV), defined as the ratio of standard deviation and mean residence time, is small. In the case without adaptation, a deterministic relaxation stage can be neglected against the mean noise-induced escape time so that the CV is larger.
Mesoscopic dynamics of cortical microcolumn. As a final example, we applied the mesoscopic population equations to a biologically more detailed model of a local cortical microcircuit. Specifically, we used the multi-laminar column model of V1 proposed by Potjans and Diesmann [5] (see also [72,107] for an analysis of this model). It consists of about 80 0 000 nonadapting integrate-and-fire neurons organized into four layers (L2/3, L4, L5 and L6), each accommodating an excitatory and an inhibitory population (see schematic in Fig 1A). The neurons are randomly connected within and across the eight populations. We slightly changed this model to include spike-frequency adaptation of excitatory neurons, as observed in experiments (see e.g. [26]). Furthermore, we replaced the Poissonian background noise in the original model by an increase of mean current drive and escape noise (both in the microscopic and mesoscopic model). The mean current drive was chosen such that the firing rates of the spontaneous activity were matched to the firing rates in the original model. We note that the fitting of the mean current was made possible by the use of our population equations, which allow for an efficient evaluation of the firing rates. The complete set of parameters is listed in METHODS, Sec. "Modified Potjans-Diesmann model".
Sample trajectories of the population activities have already served as an illustration of our approach in Fig 1, where neurons in layer 4 and 6 are stimulated by a step current of 30 ms duration, mimicking input from the thalamus as in the original study [5]. Individual realizations obtained from the microscopic and mesoscopic simulation differ due to the marked stochasticity of the population activities (Fig 1B and 1D). However, trial-averaging reveals that the mean time-dependent activities that can be estimated from a peri-stimulus-time histogram (PSTH) obtained from microscopic and mesoscopic simulations indeed agree well, except for a slight underestimation of the oscillatory peak during stimulus offset compared to the microscopic simulation ( Fig 8A). However, during the short moments where the mean time-dependent activity (PSTH) of the mescoscopic and microscopic simulation do not match, the timedependent standard deviation across hundreds of trials ( Fig 8B) is extremely high in both mesoscopic and microscopic simulation, indicating that fluctuations of the activity between one trial and the next are high after stimulus offset at 0.09s. The standard deviation as a function of time (Fig 8B) agrees overall nicely between microscopic and mesoscopic simulation, suggesting a good match of second-order statistics. A closer look at the second-order statistics, as provided by the power spectra of spontaneous activities ("ground state" of cortical activity), also reveals a good agreement at all frequencies (Fig 9). This agreement is remarkable in view of the low connection probabilities (p < 0.14, see table 5 in [5]) that violate the assumption of dense random connectivity used in the derivation of the mesoscopic mean-field equations. More generally, this example demonstrates that the range of validity of our mesoscopic theory covers relevant cortical circuit models.
Finally, we mention that the numerical integration of the mesoscopic population equations yields a significant speed-up compared to the microscopic simulation. While a systematic and fair comparison of the efficiencies depends on many details and is thus beyond the scope of this paper, we note that a simulation on a single core of 10s of biological time took 811.2s using the microscopic model, whereas that of the mesoscopic model only took 6.6s. This corresponds to a speed-up factor of around 120 achieved by using the mesoscopic population model. In the simulation, we employed the same integration time step of Δt = 0.5 ms for both models for a first naive assessment of the performance. However, a more detailed comparison of the performance should be based on simulation parameters that achieve a given accuracy. In this case, we expect an even larger speed-up of the mesoscopic simulation because for the same accuracy the temporally coarse-grained population equations allow for a significantly larger time step than the microscopic simulation of spiking neurons.

Discussion
In the present study we have derived stochastic population equations that govern the evolution of mesoscopic neural activity arising from a finite number of stochastic neurons. To our knowledge, this is the first time that such a mesoscopic dynamics has been derived from an underlying microscopic model of spiking neurons with pronounced spike-history effects. The microscopic model consists of interacting homogeneous populations of generalized integrateand-fire (GIF) neuron models [14,[26][27][28], or alternatively, spike-response (SRM) [14] or generalized linear models (GLMs) [55,108,109]. These classes of neuron models account for various spike-history effects like refractoriness and adaptation [14,84]. Importantly, parameters of these models can be efficiently extracted from single cell experiments [27] providing faithful representations of real cortical cells under somatic current injection. The resulting population equations on the mesoscopic level yield the expected activity of each population at the present time as a functional of population activities in the past. Given the expected activities at the present time, the actual mesoscopic activities can be obtained by drawing independent random numbers. The derived mesoscopic dynamics captures nonlinear emergent dynamics as well as finite-size effects, such as noisy oscillations and stochastic transitions in multistable networks. Realizations generated by the mesoscopic model have the same statistics as the original

Quantitative modeling of mesoscopic neural data: Applications and experimental predictions
Our theory provides a general framework to replace spiking neural networks that are organized into homogeneous populations by a network of interacting mesoscopic populations. For example, the excitatory and inhibitory neurons of a layer of a cortical column [5] may be represented by one population each, as in Fig 1. Weak heterogeneity in the neuronal parameters are allowed in our theory because the mesoscopic equations describe the population-averaged behavior. Further subdivisions of the populations are possible, however, such as a subdivision of the inhibitory neurons into fast-spiking and non fast-spiking types [26]. Populations that show initially a large degree of heterogeneity can be further subdivided into smaller populations. In this case, a correct description of finite-size fluctuations, as provided by our theory, will be particularly important. However, as with any mean-field theory, we expect that our theory breaks down if neural activity and information processing is driven by a few "outlier" neurons such that a mean-field description becomes meaningless. Further limitations may result from the mean-field and quasi-renewal approximation, Eq (4). Formally, the mean-field approximation of the synaptic input requires dense connectivity and the heterogeneity in synaptic efficacies and in synapse numbers to be weak. Moreover, the quasi-renewal approximation assumes slow threshold dynamics. However, as we have demonstrated here, our mesoscopic population equations can provide in concrete applications excellent predictions even for sparse connectivity (Figs 5D-5G, 8 and 9) and may qualitatively reproduce the mesoscopic statistics in the presence of fast threshold dynamics (Fig 4D and 4E).
Using our mesoscopic population equations it is possible to make specific predictions about the response properties of local cortical circuits. For instance, recent progress in genetic methods now enables experimentalists to selectively label and record from genetically identified cell types, such as intratelencephalic (IT), pyramidal tract (PT) and corticothalamic (CT) neurons among the excitatory neurons, and vasoactive intestinal peptide (VIP), somatostatin (Sst) and parvalbumin (Pvalb) expressing neurons among the interneurons [4]. These cell types have received much attention recently as it has been proposed that they may form a basic functional module of cortex, the canonical circuit [4,110]. The genetic classification of cells defines subpopulations of the cortical network. A model of the canonical circuits of the cortex in terms of interacting mesoscopic populations can be particularly useful if used to describe experiments that use optogenetic stimulation of genetically-defined populations by light, which in our framework can be represented as a transient external input current. To build a mesoscopic population model based on our theory demands some assumptions about microscopic parameters such as (i) typical neuron parameters for each subpopulation, (ii) structural parameters as characterized by average synaptic efficacies and time scales of connections between and within populations, and (iii) estimates of neuron numbers per subpopulation. Parameters for a typical neuron of each population could be extracted by the efficient fitting procedures presented in [26,27]. Structural parameters and neuron numbers have been estimated, for instance, for barrel columns in rodents somato-sensory cortex [3,73] and other studies (see e.g., [5]). Our population equations could then be used to make predictions about circuit responses to light stimuli, e.g. by imaging the activity of a genetically-defined subpopulation in one column in response to optogenetic stimulation of another cell class in another column.
As a first step in this direction, we have demonstrated here that our population equations correctly predict the mesoscopic activities (means and fluctuations) of a simulation of a detailed, microscopic network model of a cortical microcircuit [5] under thalamic stimulation of layer 4 and 6 neurons. Using a population density method, mean activities of this model have also been predicted in a recent study to analyze its computational properties [107] with a special focus on predictive coding. Our population density approach goes beyond that study by also predicting finite-size fluctuations of the activities and their effects on the mesoscopic network dynamics such as finite-size induced stochastic oscillations. Predicting activities in real experiments is, however, complicated by the fact that the parameters of a microscopic network model are typically underconstrained given the lack or uncertainty of measured or estimated parameters [111]. Here, our population equations provide an efficient means to constrain unknown microscopic parameters by requiring consistence with mesoscopic experimental data.
While the canonical circuit represents a model of interacting populations on the mesoscopic level, recent interest in macroscopic models of entire brain areas or even of whole brains has risen [6,7]. Population dynamics can be used in this context as a means to reduce large parts of the macroscopic neuronal network to a system of interacting populations that is numerically manageable, and requires less detailed knowledge of synaptic connectivity (mean synaptic coupling of populations as opposed to individual synapses). However, even this information about mesoscopic network structure might not be available given that it corresponds to an M × M matrix of mean synaptic efficacies, where the number M of populations, or respectively cell types, might be large. In this case, our population equations can be utilized to efficiently constrain unknown structural parameters, such as synaptic weights, such that the resulting mesoscopic activities are consistent with experimental data. This leads in turn to experimentally testable predictions for synaptic connectivities. Such an approach [111] has been recently applied to a network model of primate visual cortex demonstrating the usefulness of mean-field theories for predicting structural properties of large-scale cortical networks.
An interesting complementary route for further studies is a multiscale model, in which a large-scale model is simulated in terms of reduced, mesoscopic populations but with one or several areas in focus that are simulated in full microscopic detail. As knowledge of anatomy and computational capacity increases, more and more mesoscopic populations can be replaced by a microscopic simulation, while at any time in this process the full system is represented in the model. We therefore expect our population dynamics model to be a useful tool to continuously integrate experimental data into multiscale models of whole mammalian brains.
Simplified whole brain models of interacting neuronal areas have recently been proposed [112,113]. Furthermore, large-scale neuro imaging data are routinely modeled by phenomenological population models such as neural mass, Wilson-Cowan, or neural field models [9,22]. Our new population dynamics theory could be used in such approaches as an accurate representation of the fluctuations of neural activity in the reduced areas. For example, in macroscopic data such as resting state fMRI, EEG or MEG, the endogenously generated fluctuations of brain activity are of major interest [113]. A fortiori the same applies to mesoscopic data such as local field potentials (LFP) or voltage-sensitive dye (VSD), in which finite-size fluctuation are expected to be large. Our theory paves the way for relating macroscale fluctuations to the underlying networks of spiking neurons and their activity, and so to the neuronal circuits that underlie the computations of the brain.
Another interesting application of our population model is to predict the activity of neural networks grown in cultures. This model system is much more accessible and controllable (e.g., by optogenetic stimulations) than cortical networks in-vivo but may still provide valuable insights into the complex network activity of excitatory and inhibitory neurons as proposed in a recent study [114]. In particular, in that study the authors propose a critical role for shortterm plasticity [115]. Although we have here used static synapses, an extension of our mesoscopic mean-field theory to synaptic short-term plasticity is feasible. Furthermore, finite-size fluctuations appear to be particularly important in cell cultures as suggested by a previous theoretical study [62]. Our mesoscopic population theory thus represents a framework to predict spontaneous as well as evoked activity in neuronal cell cultures.

Theoretical aspects
From a theoretical point of view, our study represents a generalization of deterministic, macroscopic population equations for an infinite number of spiking neurons with refractoriness [14,23,63] to stochastic, mesoscopic population equations for a finite number of neurons. The resulting dynamics can be directly used to generate single stochastic realizations of mesoscopic activities, in analogy to a Langevin dynamics. Our work is thus conceptually different from earlier studies of finite-size effects [66][67][68], who also considered finite networks of spiking neurons and refractoriness but derived deterministic evolution equations for moment and crosscorrelation functions and hence characterized the ensemble dynamics. Furthermore, in contrast to these studies, our theory is not based on a perturbation expansion around the N ! 1 limit, and thus captures large and non-Gaussian fluctuations in strongly nonlinear population dynamics such as bistable networks.
Outside the low-rate Poisson firing regime, spiking neurons exhibit history dependencies in their spike trains, the most prominent of which is neuronal refractoriness, i.e. the strongly reduced firing probability depending on the time since the last spike. On the population level this means that a positive (negative) fluctuation of the population rate affects the underlying refractory state of the population because more (less) neurons than expected become refractory. This altered refractory state in turn tends to decrease (increase) the mean and variance of the population activity shortly after the fluctuation. More generally, fluctuations of the population activity influence the population density of state variables, which in turn influences fluctuations. In this study, we have worked out how to incorporate this interplay between fluctuations of the population activity and fluctuations of the refractory density into a mesoscopic population dynamics. The key insight to achieve this was (i) to exploit the normalization condition for the density of microscopic states (in our case, the density of last spike times) and (ii) to associate density fluctuations with a time-dependent but state-independent average rate that emphasizes the microscopic rates of those states that exhibit the largest finite-size fluctuations (in our case, the weighted average rate with respect to the variance vðt;tÞ).
Our work is thus in marked contrast to previous stochastic rate models for finite-size systems in the form of stochastic Wilson-Cowan equations [62,104], or stochastic neural field equations [39,116]. In these models, finite-size fluctuations of the rate may feed back through the recurrent connections but the strong negative self-feedback due to refractoriness is neglected. This is the case even if the stationary or dynamic transfer function employed in the rate dynamics corresponds to a spiking neuron model [62,72]. Furthermore, fluctuations of the population rate have often been implemented ad hoc by a phenomenological white-noise source, which was added to the macroscopic (i.e. deterministic) rate dynamics [32,102,112]. The intensity of the noise is a free parameter in these cases. Our mesoscopic equations are also driven by a noise source, but two differences to these studies are noteworthy: First, it is derived from a microscopic model and does not contain any free parameter; and second, the noise is white given the predicted mean activity but since the activity predicted in one time step depends on fluctuations in all earlier time steps, the effective noise leads to a colored noise spectrum-even if coupling is removed (see Fig 3). This observation is consistent with previous studies [55,58,60,69], in which the power spectrum of the fluctuations about a steadystate has been calculated analytically.
On the population level, refractoriness can be taken into account by population density equations such as the Fokker-Planck equation for the membrane potential density (see e.g. [36,58,59], or [107,117,118] for related master equations), or the population integral equation for the refractory density [14,23,88,89]. These studies were mainly concerned with macroscopic populations, which formally correspond to the limit N ! 1. For the refractory density formalism, we have shown here how to extend the population integral equation to the case of finite population size. To this end, we corrected for the missing normalization of the mesoscopic density (e.g. Qðt;tÞ ¼ SðtjtÞA N ðtÞ in Eq (13) or qðt;tÞ in Eq (80)), and thereby accounted for the interplay between fluctuations and refractoriness. Finite-size fluctuations of the population rate have also been used in the Fokker-Planck formalism [58][59][60] but the immediate effect of these fluctuations on the membrane potential density at threshold, and hence the refractoriness, has been neglected: in fact, a positive (negative) fluctuation of the population rate increases (decreases) the number of neurons at the reset potential while the number of neurons close to the threshold has to decrease (increase) such that the microscopic density remains normalized. The finite-N membrane potential density used by Mattia and Del Giudice [60] does not account for this normalization effect. Whereas the numerical integration of their equation may still give a satisfying solution in the low-rate, Poissonian-firing regime, where refractory effects can be neglected, it becomes unstable at higher rates unless the density is renormalized manually at every time step [119]. How to correct for the missing normalization in the Fokker-Planck approach is still an unsolved theoretical question. In this respect, using analogies to and insights from our approach might be promising.
The quasi-renewal approximation [55,63] allowed us to develop a finite-size theory for an effectively one-dimensional population density equation even in the presence of adaptation.
Here, the only microscopic state variable is the last spike timet, or equivalently the age of the neuron t ¼ t Àt. Longer lasting spike history effects such as adaptation are captured by the dependence of the conditional intensity on the population activity A N , which as a mesoscopic mean-field variable does not need to be treated as a state variable. Furthermore, Chizhov and Graham have shown that the one-dimensional population density method in terms of the age τ can also capture multiple gating variables in conductance-based neuron models with adaptation [88]. Such one-dimensional descriptions have great advantages compared to population density equations that include adaptation by additional state variables and which thus require a multi-dimensional state-space [37,[120][121][122]: Firstly, the numerical solution of the density equations grows exponentially with the number of dimensions and becomes quickly infeasible if multiple adaptation variables are needed as e.g. in the case of multi-timescale adaptation [28] or if an adiabatic approximation of slow variables [64,120,123] is not possible. Secondly, it is unclear how to treat finite-size fluctuations in the multi-dimensional case.
Our theory is based on an effective fully-connected network, in which neurons are coupled by the actual realization of the stochastic population activity (the "mean field"), both in the microscopic and mesoscopic model. Thus, in the limit of a fully-connected network, the problem of self-consistently matching the input and output statistics, which arises in mean-field theories, is automatically satisfied to any order by our finite-size theory. This is in marked contrast to the opposite limit of a sparsely-connected network [59]. In that case, the mean-field variables correspond to the statistics of the spike trains (e.g. rate and auto-correlation function) rather than to the actual realization of the population activity. These statistics must be matched self-consistently for input and output, which is a hard theoretical problem [56,57,124,125]. Between these two limit cases, where the network is randomly connected with some finite connection probability 0 < p < 1, our examples (Figs 5, 8 and 9) indicate that the approximation by an effective fully-connected network can still yield reasonable results even for relatively sparse networks with p = 5%. We emphasize that in our microscopic network model we used a fixed in-degree in order to avoid additional variability due to the quenched randomness in the number of synapses. This allowed us to focus on dynamic finite-size noise in homogeneous populations and its interactions with refractoriness. In contrast, the heterogeneity caused by the quenched randomness is a further finite-size effect [58] that needs to be examined in a future study.
As an integral equation, the mesoscopic population model is formally infinitely dimensional and represents a non-Markovian dynamics for the population activity A N . Such complexity is expected given that the derived population equations are general and not limited to a specific dynamical regime. Loosely speaking, the equations must be rich enough, and hence sufficiently complex, in order to reproduce the rich repertoire of dynamical regimes that fully connected networks of spiking neurons are able to exhibit (e.g. limit cycles, multi-stability, cluster states [23]). For a mathematical analysis, however, it is often desirable to have a lowdimensional representation of the population dynamics in terms of a few differential equations, at least for a certain parameter range. Apart from the dynamics in the neighborhood of an equilibrium point (see e.g. [126]) or in the limit of slow synapses [127], such "firing rate models" are difficult to link to the microscopic model already in the deterministic (macroscopic) case (for notable exceptions see [128,129]), let alone the stochastic, finite-size case. Here, our mesoscopic population rate equations can serve as a suitable starting point for deriving low-dimensional dynamics that links microscopic models to mesoscopic rate equations with realistic finite-size noise.

Extensions of the model
There are several ways to extend our mesoscopic population model towards more biological realism. We already mentioned the possibility to include short-term synaptic plasticity in our mean-field framework. Furthermore, the hazard function could be generalized to capture Gaussian current noise as arising from background spiking activity [5,[57][58][59][60]98]. Approximate mappings of white and colored current noise to an effective hazard function in the escape noise formalism are available [88,89] and might be combined with our mesoscopic population model. Yet another extension concerns the synaptic input model. Here we only looked at current input but, as shown by Chizhov and Graham [88], it is straightforward to extend population theories of the renewal type to the case of conductance inputs. In the simplest case, the synaptic current of neuron i embedded in population α and driven by populations β can be modelled by a linear voltage-dependence: (cf. corresponding expression Eq (22) in METHODS for current-based synapses). Here, E αβ is the reversal potential of a synapse from population β, and g αβ (t) is the conductance response (in nS) elicited by a spike of a presynaptic neuron in population β. The same mean-field arguments as for the current-based model carry over to the case of conductance-based synapses. For example, the membrane potential u a A ðt;tÞ of a current-based leaky integrate-and-fire neuron with a last spike time at timet follows the equation where at t ¼t and during an absolute refractory period u A ðt;tÞ ¼ u r is at the reset potential (see METHODS, Eq (30) for details). In the case of conductance-based input, Eq (18), we only need to replace Eq (19) by where R α is the membrane resistance. How to model nonlinear voltage-dependence of synaptic currents such as N-methyl-D-aspartate (NMDA) currents within a mean-field approximation is less obvious but approximations also exist for this case [20]. It will be an interesting question for the future how well these approaches work with the finite-N theory developed in the present study. Alternatively, effective current models [130,131] with activity-dependent, effective time constant τ m (t) and effective resting potential u rest (t) could be another solution to treat conductance inputs. Here, we have used a discrete set of populations. In large-scale models of the brain, one often regards the spatial continuum limit, resulting in so-called stochastic neural field equations [116]. These equations represent a compact description of neural activity and do not depend on a specific discretization of space. Just as discrete firing rate models, these field equations must be considered phenomenological because the link to neuronal parameters is not clear (note however that such equations have been derived from non-spiking, two-state neuron models for N < 1 [39], and from spiking models for N ! 1 [132,133]). By taking the spatial continuum limit, our mesoscopic population equations can be formulated as a stochastic neural field equation that is directly derived from a finite-size, spiking neural network. It would be interesting to employ this continuous extension of our mesoscopic equations to study the effect of spike-history effects on the stochastic behavior of bumps and waves in neural fields.
A first simple comparison of the computational performance in RESULTS, "MESOSCOPIC DYNAMICS OF CORTICAL MICROCOLUMN", demonstrated already that the mesoscopic population dynamics outperformed the microscopic simulation by a speed-up factor of around 120. In this example, the numerical integration of the population dynamics has not been particularly optimized with respect to time step Δt and history length T. A systematic comparison under the condition of some given accuracy, has the potential for an even larger speed-up because the population equations can be integrated with a larger time step than the spiking neural network. In addition to that, we have also compared the mesoscopic model to the full microscopic simulation of the refractory density (cf. RESULTS, "FINITE-SIZE MEAN-FIELD THEORY") and found a moderate enhancement in performance for sufficiently large networks (N ≳ 100). These computational aspects will be investigated in a separate study.

Methods Model
Network setup. We consider a network of M populations each consisting of N α interconnected neurons of the same type (the superscript α = 1, . . ., M labels the populations). Neuron i in population α receives p αβ N β connections (synapses) from a random subset G b i of presynaptic neurons in population β. Here, p αβ denotes the probability for a connection from a neuron in population β to a neuron in population α. That is, the connections between any two populations are random with fixed in-degree.
Let the spike train of neuron i in population α be denoted by where t a i;k is its k-th spike time and δ denotes the Dirac δ-function. The neuron receives spike train input from its presynaptic partners in population β with a transmission delay Δ αβ and synaptic weight w αβ . More precisely, the synaptic input current I a syn;i ðtÞ is modeled as a sum of post-synaptic currents caused by each spike of presynaptic neurons: where R α and t a m are the membrane resistance and membrane time constant of a neuron in population α, respectively, and w αβ sets the synaptic weights in units of mV. The synaptic kernel αβ (t) is defined as the postsynaptic current (PSC) normalized by its charge that is induced by one input spike from a neuron of population β. More precisely, αβ is the PSC divided by its integral, and therefore it has units of 1/sec. In Eq (22), the first sum runs over all populations β, whereas the second sum runs over the set G b i of all neurons in population β that project onto neuron i in population α.
In general, the filtered total synaptic input from population β, P j2G b i ð ab Ã s b j ÞðtÞ, may be modeled by a set of differential equations for a finite number of synaptic variables y ab i;' , ℓ = 1, . . ., L. In simulations, we model the synaptic kernel by a single exponential with constant delay where Θ(t) denotes the Heaviside step function. The synaptic time constants are t E s ¼ 3 ms and t I s ¼ 6 ms for excitatory and inhibitory synapses, respectively. This kernel can be realized by a single synaptic variable y ab i ðtÞ, which obeys the first-order kinetics t b s _ y ab Generalized integrate-and-fire model. Neurons are modeled by a leaky integrate-andfire model with a dynamic threshold [14,47,49,80] and an escape noise mechanism [14,[26][27][28]. Following [27], we refer to this model as the generalized integrate-and-fire (GIF) model. The crucial variables of this model are the membrane potential u a i ðtÞ and the dynamic threshold W a i ðtÞ. The membrane potential obeys the subthreshold dynamics where t a m is the membrane time constant and m a ðtÞ ¼ u rest þ R a I a ext ðtÞ is the drive in the absence of synaptic input consisting of a constant resting potential u rest and an external stimulus I a ext ðtÞ. The synaptic current I a syn;i ðtÞ has been defined in Eq (22). After each spike the voltage is reset to the potential u r , where it is clamped for an absolute refractory period t ref = 4 ms. Furthermore, each spike t a i;k adds a contribution y a ðt À t a i;k Þ to the dynamic threshold: where u a th is a baseline threshold and θ α (t) is called the spike-triggered kernel [26,27]. Since the increases in spike threshold accumulate over several spikes, the spike-triggered kernel causes spike-frequency adaptation. We set θ α (t) = 1 for t 2 (0, t ref ) so as to ensure absolute refractoriness. For the sake of simplicity, we assumed here that all spike-triggered accumulation effects can be lumped into the threshold variable (cf. Sec. "Mapping onto a generalized linear model" below). However, if realistic membrane potentials are needed (e.g., for fitting membrane potential data [27] or in a conductance-based extension of the model (see DISCUSSION) and [88]), adaptation mechanisms affecting the voltage should be kept in the voltage dynamics.
Spikes are elicited stochastically by a conditional intensity (also called hazard rate, escape rate or conditional rate) which depends on the momentary distance between the membrane potential and the threshold via the exponential link function f a ðxÞ ¼ c a expðx=D a u Þ. The parameter c α is the escape rate at threshold and the parameter D a u > 0 characterizes the softness of the threshold (Fig 1A, inset). Intuitively, a neuron fires immediately if its membrane potential is 2 Á D a u millivolts above the threshold and is unlikely to fire if its membrane potential is 2 Á D a u millivolts below the threshold [14]. In the limit D a u ! 0, the model turns into a deterministic (but adaptive) leaky integrate-and-fire model with a hard threshold. We emphasize that our standard choice of D a u $ 2 mV is consistent with the intrinsic stochasticity of neurons in cortical slices [26,85]. Alternatively, the softness of the threshold D a u may also be regarded as a phenomenological parameter that accounts for all incoherent noise sources that are individual to each neuron. This includes, e.g., any intrinsic noise but also fluctuations of external background input from other neural populations that are not modeled explicitly. For instance, to account for the external Poisson input used in the original cortical column model by Potjans an Diesmann [5], we increase in Figs 1, 8 and 9 the softness to D a u ¼ 5 mV. We note that for more detailed comparisons with the original model, more elaborate approximations of the escape rate for the case of colored noise exist [89], which in principle could be used to approximate external Poisson noise without a free parameter. However, because such a mapping is not the focus of the current study, we stick here for the sake of simplicity to the phenomenological escape rate, Eq (25), of the exponential form.
The parameters of the model used in simulations (unless specified differently) are summarized in Table 1.
Mapping onto a generalized linear model. We also considered a slightly different variant of the model, called spike-response model [14] or generalized linear model (GLM) [55,66,84,108,109]. This model does not reqire the reset rule of the integrate-and-fire model but instead relies on spike-triggered kernels to implement refractoriness and other spike-history effects. Specifically, the membrane potential is given by where h a i ðtÞ is the free membrane potential given by For a membrane filter kernel k a ðtÞ ¼ YðtÞe t=t a m =t a m , where Θ(t) denotes the Heaviside step function, the dynamics of h a i is equivalent to the dynamics of u a i (Eq (23)), except that h a i is not reset upon spiking. Spike-history effects on the level of the membrane potential are captured by the second term in Eq (26). This term represents the convolution ðZ a Ã s a i ÞðtÞ of the output spike train with a spike-triggered kernel η(t) and generates a spike-after-potential that accumulates over spikes. As before, the threshold W a i ðtÞ obeys Eq (24). Given the membrane potential u a i ðtÞ and the dynamic threshold W a i ðtÞ, spikes are generated by the same hazard rate l a i ðtÞ given by Eq (25). At low firing rates, the spike-triggered kernel η can be used to approximate the integrateand-fire dynamics by choosing ZðtÞ ¼ ðu r À u th Þe À ðtÀ t a ref Þ=t a m YðtÞ. However, this is not an exact mapping because the value of the membrane potential is not reset to a fixed value u r after spiking, in contrast to the GIF model. This is due to the accumulation of the threshold and the variability in the voltage at the moment of firing . We also mention that the kernel η can be transformed into the kernel θ of the threshold dynamics [14]. This is possible because we are only interested in the spike emissions of the neurons and not the membrane potentials. In fact, the conditional firing rate, Eq (25), is invariant under the transformation θ ! θ − η, η ! 0.

Mean-field approximation
An important variable that characterizes the internal state of a neuron is the time of its last spike, or, equivalently, the time elapsed since the last spike ("age" of the neuron) [88]. The time since the last spike is a good predictor of the refractory state of a neuron at time t. Our approach is to use a population density description for this refractory state [23,68,88,89], in which the coupling of neurons as well as the adaptation of single neurons are mediated by the mesoscopic population activities A a N ðtÞ defined by Eq (1). To this end, we replace the conditional firing rate l a i ðtÞ of a neuron i in population α by an effective rate l a A ðtjt a i Þ that only depends on its last spike timet a i and the history of the population activity fA a N ðt 0 Þg t 0 <t [63]. Here and in the following, the subscript A indicates the dependence on the history of A a N ðtÞ. We note that the expected total activity " A a ðtÞ of population α at time t is the average of all the conditional firing rates summed over all neurons in this population: " A a ðtÞ ¼ ð1=N a Þ P i l a i ðtÞ. The effective rate l a A ðtjt a i Þ is determined such that it approximates the conditional intensity on average: To find such an approximation, we proceed in two steps [55]: first, the membrane potential u a i ðtÞ is approximated by a function u a A ðt;t a i Þ using a mean-field approximation of the synaptic input. For fully connected populations, this first approximation turns into an exact statement. Second, the dynamic threshold W a i ðtÞ is approximated by a function W a A ðt;t a i Þ using the quasirenewal approximation [63]. For renewal neurons, the second approximation becomes exact. Once we have found an expression for the mean-field approximation Eq (28), we are in a position to use a population density description with respect to the last spike timest a i . In the following two paragraphs we explain the above two steps in detail.
Mean-field approximation of synaptic input. In the special case of a fully connected network (p αβ = 1), the membrane potential can be completely inferred from the last spike timet a i and the mean field A a N . In this case, the synaptic input Eq (22) can be rewritten as Thus, in a fully connected network all neurons in population α "see" the same synaptic input R a I a syn given by the "mean field" A N (t). From Eq (23) follows that GIF neurons with the same last spike timet all have the same membrane potential u a A ðt;tÞ that obeys the differential equation with J αβ = p αβ N β w αβ . The initial condition is u a A ðt;tÞ ¼ u r corresponding to the reset of the membrane potential after the last spike. If we use this insight for the conditional intensity f a ðu a i ðtÞ À W a i ðtÞÞ we see that the explicit dependence upon u a i ðtÞ can be dropped as long as we keep track of the last spike timet a i , cf. Eq (28); hence u a i ðtÞ ¼ u a A ðt;t a i Þ. In a randomly connected network (p αβ < 1), the synaptic input is different for each neuron. On the population level, however, this heterogeneity is averaged allowing us to still use the mean-field approximation, Eq (30). To see this, we note that in our network with fixed indegree, each neuron i in population α has p αβ N β presynaptic neurons in population β (β = α is possible). This means that in Eq (22) we can approximate the sum P j2G b i s b j ðtÞ over the p αβ N β presynapic neurons by (1)). The mean-field approximation, Eq (31), only depends on the population activity and is thus identical for all neurons. Therefore, fluctuations of the population activity can be regarded as common input fluctuations that are coherent across neurons. On the other hand, the deviation from the mean-field approximation (i.e. the difference between the left-and right-hand side of Eq (31)) is different for each neuron and can be regarded as incoherent noise. For low connection probabilities, this incoherent part of the fluctuations may lead to a significant deviation of the membrane potential u a i ðtÞ from the meanfield approximation u a A ðt;t a i Þ (Fig 6A, 6C and 6E, top). On the mesoscopic scale, however, the total number of spikes in a small time step Δt is determined by the population-averaged conditional firing rate, Eq (28), (cf. Eq (46) below). Hence, for sufficiently large N α , incoherent noise average out, whereas common finite-size fluctuations survive (Fig 6A, 6C and 6E, bottom). Note, however, that incoherent noise may cause a small bias because we average a nonlinear function of the noisy membrane potential on the l.h.s. of Eq (28). Effectively, the incoherent noise softens the threshold of the escape noise mechanism.
Quasi-renewal approximation. So far we have reduced the conditional intensity to l a i ðtÞ % f ðu A ðt;t a i Þ À W a i ðtÞÞ. This expression still involves the individual threshold W a i ðtÞ of neuron i in population α, which depends on the full spike history of that neuron. This means that the spike-train is generally not a time-dependent renewal process. Here, we employ the quasi-renewal approximation [63] and average over the spikes before the last spike time assuming that they occurred according to an inhomogeneous Poisson process with rate A a N ðt 0 Þ, t 0 <t a i . Averaging the conditional intensity, Eq (25), in this way, conditioned on a given last spike timet a i and a given history A a N ðt 0 Þ, t 0 <t a i , yields [63,134,135] where W a A ðt;t a i Þ is an effective dynamic threshold given by W a A ðt;tÞ ¼ u th þ y a ðt ÀtÞ þ Z^t À 1ỹ Here,ỹ a ðtÞ ¼ D a u ½1 À e À y a ðtÞ=D a u is the so-called quasi-renewal kernel [63], while y a ðt ÀtÞ describes the increase of the threshold induced by the last spike. Note that as a result of the two approximations, the conditional firing rate no longer depends on the precise spiking history of a given neuron and its presynaptic neurons, but only on its last firing time, cf. Eqs (25) and (32). This ends our explanation of Eq (28).

Discretized population density equations
Using the mean-field approximation Eq (32), we have reduced the model to a population of time-dependent renewal processes [23,68], where the conditional intensity of neuron i is l a A ðtjt a i Þ. Neurons are effectively coupled through the dependence of l a A ðtjt a i Þ upon the membrane potential u a A ðtjt i Þ, which in turn depends on the activities A b N of all populations β that are connected to population α. This is the only place where population labels different from α appear. For the sake of notational simplicity, we will omit the population label α and the subscript A in this section, keeping in mind that all quantities refer to population α and that the coupling with other populations is implicitly contained in u a A ðtjt i Þ. Microscopic dynamics of the refractory density. Because the firing probability of a neuron only depends on its last spike time and the mesoscopic population activity in the past, we can use a population density description of all last spike timest i in the population. To derive such representation it is useful to discretize time by introducing the discrete time points t k = t 0 + kΔt, k 2 Z, and the corresponding intervals I k ¼ ½t k ; t k þ DtÞ. Time is measured relative to a reference time t 0 , which, however, is irrelevant for the following arguments. We require that the size of the intervals Δt is sufficiently small so that each neuron fires at most once during any interval. Specifically, we require that Δt t ref . We also require that the sum of axonal and synaptic delays is not smaller than Δt. Furthermore, we identify the discrete time point t l as the current time, whereas indices k with k < l correspond to the past. In the population density approach, we do not keep track of the last spike time of each individual neuron but for each past time interval I k we only track the number of those neurons that have their last spike time in this interval. This number is denoted by m(t l , t k ). The collection fmðt l ; t k Þg k2Z;k<l of these numbers for all intervals I k , k < l, represents the current distribution of last spike timest i in the population at time t l (Fig 10A). Because each neuron has exactly one last spike time, the distribution m(t l , t k ) is normalized to the total number of neurons: Since the last spike time determines the refractory state of a neuron, the distribution m(t l , t k ) will be also called refractory distribution and the function Q N (t l , t k ) m(t l , t k )/(NΔt) can be regarded as the corresponding refractory density. The refractory distribution fully characterizes the microscopic state of the population. We now introduce the number of neurons that fired a spike in the interval I k (not necessarily the last spike). This number is denoted by Δn(t k ) ( Fig 10A) and is related to the population activity by A N (t k ) = Δn(t k )/(NΔt). Therefore, Δn(t k ) will be often referred to as simply the "activity" at time t k . Knowing the past activities Δn(t k 0 ) for k 0 < l and the last spike time t k fully determines the membrane potentials u(t l , t k ) and thresholds ϑ(t l , t k ), and hence the escape rate λ(t l |t k ) = f(u(t l , t k ) − ϑ(t l , t k )) associated with the interval I k . Thus, the knowledge of the past activities and the distribution of last spike times at time t l is sufficient to statistically determine these quantities at time t l+1 . In other words, the evolution of the system can be described by a Markov process if we define the microscopic state X ðt l Þ of the population at time t l by the sequence of pairs In the following, the main task will be to derive the statistics of the number of spikes Δn(t l ) in the next time interval [t l , t l + Δt) and the distribution m(t l + Δt, t k ) of last spike times at time t l+1 given the state X ðt l Þ at time t l . We mention that what we have lost in this population density description is only the information about the identity of the neurons, which, however, is irrelevant for the mesoscopic description of homogeneous populations.
There is a second interpretation of m(t l , t k ): let us consider the group of neurons that have fired in the interval I k , k < l. The number of neurons from this group that have "survived" (i.e. that have not fired again) until time t l is exactly given by m(t l , t k ). We will therefore also call it number of survived neurons or survival number for the refractory state k. Correspondingly, the ratio S N (t l |t k ) = m(t l , t k )/Δn(t k ) is the fraction of survived neurons. As time t l evolves, the number of survived neurons diminishes whenever there is a spike in that group (Fig 10B). Thus, if the group fires X lk spikes in the time step [t l , t l + Δt), then m(t l , t k ) decreases by X lk . For l > k, this gives rise to the evolution equation The initial condition is given by m(t k + Δt, t k ) = Δn(t k ), which follows from the absolute refractoriness during the first time step after a spike. Absolute refractoriness also entails that each neuron can fire only one spike per time step (Δt t ref ) with a firing probability In the last step, we introduced the average hazard rate " lðt l jt k Þ ¼ ½lðt l jt k Þ þ lðt lþ1 jt k Þ=2. Because the past activities Δn(t k ), k < l, completely determine the probability to fire P λ (t l |t k ), each neuron decides independently from the others whether it fires in the next time step. Furthermore, there is a total number of m(t l , t k ) neurons from the considered group that could potentially fire in the interval [t l , t l + Δt). Therefore, the number of spikes X lk is the result of m(t l , t k ) independent Bernoulli trials with success probability P λ (t l |t k ). This implies that X lk follows a binomial distribution: Moreover, the random numbers X lk associated with different past time intervals I k are conditionally independent given the current state of the system X ðt l Þ (cf. Eq (35)). The total number of spikes emitted in the current interval [t l , t l + Δt) is equal to the total reduction of survivals in that interval, hence Eqs (36)-(39) define the microscopic kinetics in discrete time. In a simulation, for each past time interval I k one independent Poisson random number X lk needs to be drawn per time step and population. These random numbers determine the current spike count via Eq (39) and the update of the distribution of last spike times m(t l , t k ), via Eq (36). We call this description microscopic because for small time steps, there will be many (order of N) intervals I k that contain survived neurons, i.e. for which m(t l , t k ) > 0 and for each of which one needs to draw a random number X lk in a simulation. In the limit Δt ! 0, such a simulation would be as complex as the original microscopic simulation of N neurons. The microscopic population density description can be summarized in a particularly compact form by performing the continuum limit Δt ! 0 and by assuming large N. For large N, the statistics of X(t l , t k ) becomes Gaussian with mean and variance P λ (t l |t k )m(t l , t k ). Thus, the dynamics of mðt;tÞ, Eq (36), can be rewritten as where fN ðt l ; t k Þg k;l2Z are independent, standard normal random numbers and the ramp function [x] + = xΘ(x) ensures non-negativity of m. Using the density of last spike times Q N (t l , t k ) S N (t l |t k )A N (t k ) m(t l , t k )/(NΔt), setting t l = t and t k ¼t, and expanding P l ðtjtÞ % lðtjtÞDt for small Δt we arrive at the following dynamics in the limit Δt ! 0: Here, xðt;tÞ lim Dt!0 N ðt;tÞ=Dt is a Gaussian random field with zero mean and correlation function hxðt;tÞxðt 0 ;t 0 Þi ¼ dðt À t 0 Þdðt Àt 0 Þ. For a given last spike timet, Eq (41) has the form of a Langevin equation. Its initial condition is Q N ðt;tÞ ¼ A N ðtÞ. The population activity results from Eq (39) as the integral of changes of the refractory density Q N ðt;tÞ over all refractory states, i.e. A N ðtÞ ¼ À R t À 1 @ t Q N ðt;tÞ dt, or using Eq (41): Eqs (41) and (42) represent the microscopic population density equations in continuous time under the Gaussian and quasi-renewal approximations. Mesoscopic description. At the mesoscopic level, we want to describe the state of the population at time t l only by the mesoscopic variables Δn(t k ), k < l, that have been observed so far. Therefore, we define the history of n at time t l by which completely determines the mesoscopic state. In contrast to the microscopic state X ðt l Þ defined in Eq (35), the mesoscopic state does not require the knowledge of the detailed distribution of last spike times m(t l , t k ). We call a variable mesoscopic if it only depends on the history Hðt l Þ. Likewise, an equation is called mesoscopic if it only involves mesoscopic variables. In the following sections, all averages at a given time t l have to be understood as conditional averages given the history Hðt l Þ. We will therefore often omit an explicit notation of this condition.
To derive a mesoscopic equation, we want to find an approximate dynamics with only one effective, mesoscopic noise term that summarizes the effect of all microscopic random variables X lk . In a simulation, this would imply to draw only one random number per time step and per population. Towards that end, we assume that Δt can be chosen sufficiently small such that P λ (t l |t k ) ( 1, which is always possible if neurons are stochastic und hence do not perfectly synchronize. Under this assumption, the binomially-distributed random numbers X lk are approximately Poisson-distributed, i.e.
where E X lk jmðt l jt k Þ ½ ¼ P l ðt l jt k Þmðt l ; t k Þ: ð45Þ is the conditional mean of X lk given the current survival number m(t l |t k ). Given the conditional independence of X lk for different k, the Poisson property implies that the global activity Δn(t l ) in Eq (39) is also Poisson-distributed given the current refractory distribution {m(t l , t k )} k<l , i.e.
with mean D" nðt l Þ E Dnðt l Þjfmðt l ; t k Þg k<l ; Hðt l Þ Â Ã ¼ Because of the definition of refractory densities and P λ 1, we find that D" nðt l Þ N is automatically satisfied at any moment in time. However, for the numerical implementation with finite Δt later on we need to keep in mind that the Poisson number Δn(t) could become larger than N, if D" nðt l Þ is close to N. In this case, using a binomial statistics will be more appropriate, as explained in Sec. "Numerical implementation". Eqs (46) and (47) suggest the possibility to generate Δn(t l ) by a single Poisson-distributed random number. However, Eq (47) is not a mesoscopic equation yet because it still depends on the dynamics of m(t l , t k ), Eq (36), which contains the microscopic random variables X lk . There is another, more subtle problem if we want to use Eqs (46) and (47) as a mesoscopic dynamics that generates the activities Δn(t l ). If we regard Δn(t l ) as an independent random variable, the conservation of neurons, Eq (39), imposes a constraint on the microscopic random numbers {X lk } k<l , which will therefore not be independent anymore. Conversely, if we consider both {X lk } k<l and Δn(t l ) as independent variables, we almost certainly violate the conservation of neurons, Eq (39), or equivalently, the normalization condition Eq (34). This problem does not occur in the microscopic dynamics, where Δn(t l ) is a dependent variable generated from the independent random variables {X lk } k<l via Eq (39), and hence the correct normalization is guaranteed at any time. Nevertheless, the "non-normalized" or "unconstrained" process, in which {X lk } k<l and Δn(t l ) are drawn independently, will be useful for deriving mesoscopic equations because it allows us to calculate the moments of the survival numbers m(t l , t k ). Our main strategy is to use these moments in conjunction with the normalization condition to express the expected spike count D" nðt l Þ, Eq (47), as a deterministic functional of the past activities. In this way, D" nðt l Þ will not depend anymore on the actual microscopic realizations of the constrained noise fX l 0 ;k g k;l 0 2Z;k<l 0 <l (constrained by a given history {Δn(t k )} k<l via Eq (39)) and can thus be used to generate Δn(t l ) as a Poisson random number from the knowledge of the past activities.
Moment equations. To achieve such deterministic relationship, we first derive mesoscopic equations for the mean and variance of m(t l , t k ) given the history Hðt l Þ in the so-called non-normalized ensemble or unconstrained ensemble. This means that the history determines the initial conditions of the dynamics of m(t l , t k ), Eq (36), as well as the conditional intensities λ(t l |t k ), but it does not impose the constraint Eq (39) on the random numbers {X l 0 , k } l 0 l . Although this unconstrained noise leads to a non-normalized distributionmðt l ; t k Þ, it still yields a very good approximation of its mean and variance in the actual constrained ensemble. Taking the average of Eq (36), and using Eq (44) yields the evolution of the mean: with initial condition hm kþ1;k i ¼ Dnðt k Þ. Here and in the following,m l;k is short-hand for mðt l ; t k Þ to simplify the notation, and hÁi denotes the ensemble average of the unconstrained process for a given history Hðt l Þ. Actually, the condition for the average hÁi can be extended to the history Hðt lþ1 Þ (and to any future activities) because in the unconstrained ensemble neither m l;k norm lþ1;k depend on the most recent activity Δn(t l ) (clearly, this also holds for any future activity). Importantly, Eq (48) is a mesoscopic equation because it is fully determined by the past activities.
As a next step we derive an equation for the variance ofm. To this end, let Dm l;k ¼m l;k À hm l;k i ð49Þ denote the deviation from the mean. Using the law of total variance, we find for the variance in the next time step The conditional mean of m l+1,k given the current value m l,k , denoted by E[m l+1,k |m l,k ], follows from the evolution Eqs (36) and (45) as [1 − P λ (t l |t k )]m l,k . Therefore, its variance is ½1 À P l ðt l jt k Þ 2 hDm 2 l;k i. For the second term in Eq (50), we note that the conditional variance Var[m l+1,k |m l,k ] is equal to the variance Var[X lk |m l,k ] of the decrement X lk . Because X lk is a Poisson variable, this variance is equal to the mean given by Eq (45). Taken together, we find the following update rule for the total variance with initial condition hDm 2 kþ1;k i ¼ 0. As a function of t l (Fig 2C bottom), the variance hDm 2 ðt l ; t k Þi is initially zero because all neurons have still survived immediately after firing at time t k . On the other hand, at long times t l ) t k , the variance also vanishes because according to Eq (48), the mean number of survived neurons hm(t l , t k )i appearing in Eq (51) goes to zero. As a consequence, the variance obtains a maximum at an intermediate time. Similarly, the dependence of the variance at time t l for different last spike timest ¼ t k shows the same limiting behavior which implies a maximum at an intermediate last spike timet (Fig 2B bottom). However, the rugged shape of this function with many local maxima reflects the discontinuity of the driving force hm(t l , t k )i as a function of t k that arises from the stochastic initial condition hm(t k+1 , t k )i = Δn(t k ).

Mesoscopic population equations.
Let us return to Eq (47) for the expected activity D" nðt l Þ, which is used to draw the activity Δn(t l ) as a Poisson random number (cf. Eq (46)). Because we condition on the history {Δn(t l 0 )} l 0 <l , the processes m l,k in this equation belong to the "constrained" ensemble, in which the normalization condition, Eq (34), is obeyed. We note that these constrained processes could in principle be generated microscopically by Eq (36) if at each time t l 0 in the past, the microscopic noise X l 0 k was sampled from a joint distribution that ensures the conservation of neurons, Eq (39), i.e. ∑ k X l 0 k = Δn(t l 0 ). However, as we will show in the following, such a construction is not needed because the dependence of the expected activity D" nðt l Þ on a specific realization of m l,k can be eliminated by exploiting the normalization condition, Eq (34). To this end, we take advantage of the fact that the mean hm l;k i of the unconstrained process is a mesoscopic variable. This suggests to split the constrained processes m l,k into the mean of the unconstrained process and a fluctuation part: The first contribution is deterministic given the past activities Δn(t k ) while the second contribution represents the microscopic fluctuations. We note that the fluctuation δm l,k is not equivalent to the deviation Dm l;k of the unconstrained process because hm l;k i þ Dm l;k does not obey the normalization condition, Eq (34), whereas hm l;k i þ dm l;k does.
To remove the microscopic fluctuations δm l,k , we require that both Eqs (34) and (47) are simultaneously satisfied. Substituting Eq (52) into these equations leads to D" nðt l Þ ¼ The microscopic fluctuations δm l,k enter the dynamics only in the form of two sums. First, the normalization condition Eq (53) imposes a strict relation between the summed deviation ∑ k δm l,k and the means of the unconstrained processes, hm l;k i, irrespective of the specific, underlying microscopic dynamics of m l,k . In particular, we can solve for P k dm l;k ¼ N À P k hm l;k i with terms on the r.h.s. that are completely determined given the past activities. Second, the total effect of the deviations on the expected activity D" nðt l Þ is given by the weighted sum ∑ k P λ (t l |t k )δm l,k in Eq (54) with P λ (t l |t k ) 1 for all k<l. The weighted sum ∑ k P λ (t l |t k )δm l,k is therefore tightly constrained by the value of the summed fluctuation ∑ k δm l,k . These considerations suggest to make the following decoupling approximation: with a still unknown factor P Λ (t l ), that we call effective firing probability. To determine the effective firing probability, we require that in the unconstrained ensemble the corresponding approximation minimizes the mean squared error EðP L Þ ¼ hε 2 l i, where P Λ is short-hand for P Λ (t l ). We use the unconstrained deviations Dm lk here because we are only interested in the typical error.
Note that if ε l is Gaussian distributed, minimizing the mean squared error yields the optimal estimation of P Λ in the sense that it maximizes the log-likelihood of P k P l ðt l jt k ÞDm l;k given P k Dm l;k under the linear approximation Eq (56). The error can be rewritten as ε l = ∑ k [P λ (t l |t k ) − P Λ (t l )]Δm l,k , which for N ) 1 is a sum of many independent variables that can indeed be considered to be Gaussian. The derivative of E with respect to P Λ is where we have exploited that P λ (t l |t k ) is deterministic given the past activities Δn(t k ), k<l.
Furthermore, under this condition, the fluctuations Dm l;k and Dm l;k 0 with k 6 ¼ k 0 are conditionally independent. Using this property and setting dE=dP L ¼ 0 we find that the optimal effective firing probability is The variance hDm 2 l;k i in this formula obeys the mesoscopic dynamics derived above in Eq (51). Hence, the effective firing probability is itself mesoscopic.
Using Eqs (53) and (55), ∑ k δm l,k can be eliminated in Eq (54) resulting in D" nðt l Þ ¼ Thus, we obtain an equation that yields the mean spike count D" nðt l Þ at the present time as a function of the past spike counts {Δn(t k )} k<l . Eq (59) is the desired mesoscopic equation in discrete time. For sufficiently small time steps Δt, the present spike count Δn(t l ) can be generated by drawing a Poisson random number with mean D" nðt l Þ according to Eq (46).

Mesoscopic population density equations in continuous time
In continuous time, we consider the rescaled variables Here, " AðtÞ can be interpreted as the expected population activity given the past activity A N (t 0 ), t 0 < t. For Δt small but positive, the spike count Δn(t) is an independent Poisson number with mean D" nðtÞ ¼ N " AðtÞDt. Thus, on a coarse-grained time scale, the continuum limit of the population activity may be written in the following suggestive way where dt denotes an infinitesimal (but temporally coarse-grained) time step and dn(t) is an independent Poisson-distributed random number with mean N " AðtÞdt. In the limit dt ! 0, the spike count in an infinitesimal time step is a Bernoulli random number, where dn(t) = 1 with probability N " AðtÞdt and n(t) = 0 with probability 1 À N " AðtÞdt. Therefore, in this limit the population activity A N (t) converges to a sequence of Dirac δ-functions occurring at random times t pop,k with rate N " AðtÞ. Thus, A N (t) can be written more formally as a population spike train or "shot-noise" where ðt pop;k Þ k2Z is a point process with conditional intensity l pop ðtjH t Þ ¼ N " AðtÞ. Here, the condition H t denotes the history of the population activity {A N (t 0 )} t 0 <t , or equivalently, the history of spike times {t pop,k } t pop,k <t , up to (but not including) time t.
To obtain the dynamics for " AðtÞ, we also introduce the rescaled variables The function SðtjtÞ can be interpreted as the survival probability of neurons that have fired their last spike at timet. Furthermore, for small Δt the firing probability is given by P λ (t l |t k ) = λ(t l |t k )Δt + O(Δt 2 ). Thus, the continuum limit of Eq (59) reads The sums in this equation can be regarded as the definition of stochastic integrals, which allows us to rewrite Eq (64) as Here, is an effective rate corresponding to the effective firing probability P Λ (t). Note that according to Eq (64), the stochastic integrals in Eq (65) extend only over last spike timest < t not including timet ¼ t. Taking the continuum limit of Eq (48) we find that the survival probability satisfies the differential equation This equation has the simple solution Similarly, we find from Eq (51) that the rescaled variance obeys the differential equation The set of coupled Eqs (62)-(69) defines the mesoscopic population dynamics. We emphasize that not only A N (t) depends on " AðtÞ (cf. Eq (62)) but that there is also a feedback of A N (t) onto the dynamics of " AðtÞ, cf. Eq (65). In fact, " AðtÞ can be regarded as a deterministic functional of the past activities up to but not including time t. In particular, A N (t) is not an inhomogeneous Poisson spike train because the specific realization of the spike history of A N (t) determines the conditional intensity function for the point process (t pop,k ) via Eq (65). Furthermore, we note that, in the case of synaptic coupling or adaptation, also the variables S and v depend on the history of the population activity through the dependence of lðtjtÞ on the membrane potential uðt;tÞ and the threshold Wðt;tÞ (cf. Eqs (30) and (33)).
For large N, the population activity can be approximated by a Gaussian process. To this end, we note that in the discrete time description the spike counts Δn(t l ) are conditionally independent random numbers with mean and variance N " Aðt l ÞDt. Therefore, in the large-N limit, the variable is normally distributed with mean zero and variance Δt, and hence corresponds to the increment of a Wiener process. Using Eq (60) for the population activity and taking the continuum limit Δt ! 0, we obtain where ξ(t) = lim Δt!0 ΔW(t)/Δt is a Gaussian white noise with correlation function hξ(t)ξ(t 0 )i = δ(t − t 0 ). This Gaussian approximation has the advantage that the multiplicative character of the noise in Eq (71) becomes explicit because ξ(t) is independent of the state of the system. It also explicitly reveals that the finite-size fluctuations scale like 1= ffiffiffiffi N p . We stress again that A N (t) is not a white-noise process with time-dependent mean, as Eq (71) might suggest at first glance, but it is a sum of two mutually correlated processes, (i) a white-noise term proportional to ξ(t) that reflects the fact that the population activity is a δ-spike train and (ii) a colored "noise" " AðtÞ that arises from the filtering of ξ(t) through the dynamics in Eq (65). As a result, the auto-correlation function of A N (t) contains a δ-peak and a continuous part, consistent with previous theoretical findings [55]. In particular, at short lags the auto-correlation function may be negative as a result of refractoriness: in this case, ξ and " A are anti-correlated in line with the intuitive picture discussed in the RESULTS section, Fig 2, that a positive fluctuation ξ(t) is associated with the creation of a "hole" in the distribution of last spike times leading to a reduced activity after time t. In the frequency domain, refractoriness corresponds to a trough in the power spectrum at low frequencies [94] as visible, for instance, in

Several populations
It is straightforward to generalize the population equations to several populations by adding a population label α = 1, . . ., M. For the sake of completeness, we explicitly state the full set of equations. The activity of population α is given by where ðt a pop;k Þ k2Z is a point process with conditional intensity l a For each population, the system of Eqs (73)-(78) contains a family of ordinary differential equations for the variables S, u and v parametrized by the continuous parametert with À 1 <t < t, and five integrals over this parameter. In the next section, we show that the family of ordinary differential equations is equivalent to three first-order partial differential equations. Furthermore, in Sec. "Population equations for a finite history", we reduce the infinite integrals to integrals over a finite range, which will be useful for the numerical implementation of the population equations.

Refractory density representation
There is an equivalent formulation of the population equation in terms of first-order partial differential equations for the density of ages t ¼ t Àt [23,68,88,89]. The representation in terms of age τ as a state variable is useful because it parallels the Fokker-Planck formalism for the membrane potential density [14,36,[58][59][60] or related density equations [117,118], in which the state variable is the membrane potential of a neuron. To keep the notation simple we consider in the following population α but drop the index α wherever confusion is not possible. Thus, we write e.g. S for S α and A N for A a N but we keep the index β as well as double indices αβ occurring in Eq (77).
The density of ages at time t is defined as q(t, τ) = S(t|t − τ)A N (t − τ). We recall that because of finite-size fluctuations, q is not a normalized probability density. Furthermore, we regard the functions λ, u and v as functions of t and τ. With these definitions the population equation, Eq (65), can be rewritten as The stochastic activity A N (t) then follows from Eqs (14) or (15). Eq (79) yields the expected population rate at time t for a given density of ages. In the Fokker-Planck formalism, this would correspond to the calculation of the rate from the membrane potential density as the probability flux across the threshold. Noting that @ t SðtjtÞA N ðtÞ ¼ ð@ t þ @ t Þqðt; tÞ, we find from Eq (67) the following first-order partial differential equation for the density of ages q(t, τ): Similarly, u and v obey from Eqs (77) and (76), respectively, with boundary conditions u(t, 0) = u r and v(t, 0) = 0. These functions, together with the threshold dynamics determine λ(t, τ) and Λ(t) via Eq (74), i.e.

Population equations for a finite history
To simulate the population activity forward in time, the integrals in Eq (65) over the past need to be evaluated, starting att ¼ À 1. For biological systems, however, it is sufficient to limit the integrals to a finite history of length T. This history corresponds to the range t À T t < t, where we have to explicitly account for the dependence of the conditional firing rate lðtjtÞ on the last spike timet. We will call neurons with last spike time in this range "refractory" because they still experience some degree of (relative) refractoriness caused by the last spike. The remaining part of the integral corresponding to the range À 1 <t < t À T receives a separate, compact evaluation. We will refer to neurons with their last spike time in this range as "free" because their conditional intensity is free of the influence of the last spike. How should we choose the length of the explicit history T? First of all, this length can be different for different populations and is mainly determined by the time scale of refractoriness, i.e. the time it needs to forget the individual effect of a single spike in the past. Furthermore, it depends on the properties of the spike-triggered kernel, i.e. the dynamic threshold that is responsible for adaptation. More precisely, we determine the length of the history by the following conditions: first, the conditional intensity is insensitive to the precise timing of the last spike att < t À T if Here, t ref is the absolute refractory period and τ rel is the time scale of the relative refractory period. For the GIF model τ rel = τ m . Second, we demand that T is chosen such that for t > T, the quasi-renewal kernelỹðtÞ ¼ D u ½1 À e À yðtÞ=D u can be well approximated by the original spike-triggered kernel θ(t). Taylor expansion of the exponential yields the condition The length of the history T needs to be chosen such that both conditions, Eqs (85) and (86) are fulfilled. It is important to note that condition Eq (86) does not require the time window T to be larger than the largest time scale of the spike-triggered kernel. For instance, consider the kernel yðtÞ ¼ J y t y e À t=t y , where J θ and τ θ are adaptation strength and time scale, respectively. In particular, the adaptation strength J θ sets the reduction in firing rate compared to a non-adapting neuron in the limit of strong drive irrespective of the time scale τ θ (see e.g. [80,82]). Condition Eq (86) can be fulfilled for a given T if either τ θ is small enough such that the exponential e À t=t y is small, or, for a fixed adaptation strength J θ , by increasing the adaptation time scale τ θ such that J θ /τ θ ( Δ u .
Dynamic threshold of refractory and free neurons. For free neurons, i.e. for À 1 <t < t À T, we use the average threshold under the assumption that spikes occurred in the range À 1 <t < t À T according to an inhomogeneous Poisson process with rate A N ðtÞ. This average is given by [63,134] where in the last step we used Eq (86). We assume that for t > T the spike-triggered kernel can be sufficiently well approximated by a sum of exponentials yðtÞ ¼ YðtÞ J y;' t y;' e À t=t y;' : This allows us to express the threshold for free neurons as where the variables g ℓ (t) satisfy the differential equations For refractory neurons, i.e. if t À T t < t, we need to evaluate in the effective threshold, Eq (33), an integral over the exact quasi-renewal kernelỹðtÞ. Splitting this integral into the free and refractory part yields the threshold of refractory neurons: Wðt;tÞ ¼ W free ðtÞ þ yðt ÀtÞ þ We can use the threshold for free and refractory neurons, Eqs (88), (89) and (90), respectively, to obtain the respective conditional intensities: l free ðtÞ ¼ f hðtÞ À W free ðtÞ ð Þ; lðt;tÞ ¼ f uðt;tÞ À yðt;tÞ where h(t) is the free membrane potential given by Eq (27). Let us remind the reader that h(t) obeys the dynamics Eq (23) but without resetting of the membrane potential after a spike. Population equations. We now apply the split of the history to the integrals that appear in the population equations, specifically Eqs (65) and (66). By definition, the conditional intensity of free neurons does not depend explicitly on the last spike timet. That is, lðtjtÞ ¼ l free ðtÞ for À 1 <t < t À T, where the free hazard rate λ free (t) is given by Eq (91). In the free part of the integrals in Eqs (65) and (66), the free hazard rate can be pulled out of the integral, which yields LðtÞ ¼ Here we have introduced the expected number of free neurons xðtÞ ¼ N R tÀ T À 1 SðtjtÞA N ðtÞ dt and the partial integral over the variance function zðtÞ ¼ N R tÀ T À 1 vðt;tÞ dt. Differentiating these new variables and employing Eqs (63) and (69), we find that they obey the differential equations dx dt ¼ À l free ðtÞx þ NSðtjt À TÞA N ðt À TÞ; ð94Þ Thus, the integrals no longer run from −1 to t but are now limited to the range [t − T, t). The long tails over the past have been reduced to differential equations.

Numerical implementation
Discretization of time. We discretize the time axis into a grid with step size Δt and grid points Because we keep track of a finite history with the oldest last spike timet ¼ t À T, the history consists of a finite number K of bins such that T = KΔt. If the index k = l corresponds to the current time, the oldest last spike time of the explicit history corresponds to an index k = l − K and the most recent one corresponds to the index k = l − 1. Note that the numerical implementation requires the absolute refractory period t ref to be at least as large as the integration time step Δt (see below).
To facilitate the notation of the update rules, it is convenient to introduce the following notations: u k ðt l Þ uðt l jt k Þ; ð99Þ In particular, " m k and v k correspond to the mean and variance of the unconstrained survival numbers, respectively. We also recall that the population index α is dropped wherever confusion is not possible, while the index β as well as double indices αβ will be kept.
Choice of Δt. A crucial assumption of the derivation of the population equations in discrete time was that the time step Δt is small enough such that each neuron fires at most one spike per time step. This can be achieved by the condition (cf. Sec. "Discretized population density equations"). Clearly, this condition implies that the total number of spikes per time step must be bounded by the number of neurons, i.e. the population activity must obey Δn(t l ) N. The equality sign corresponds to the case that all neurons fire in the same time step. In addition to condition Eq (101), we also had to require that Δt is not larger than the transmission delay Δ, i.e.

Dt D: ð102Þ
In order to justify the use of the Poisson statistics in the derivation of the population equations, we further assumed that Δt is sufficiently small such that the expected number of spikes per time step, D" nðtÞ, is much smaller than N, or equivalently " AðtÞDt ( 1. While this does not pose a problem for the theory, which ultimately concerns with the continuum limit Δt ! 0, an efficient numerical integration of the population equations benefits from a time step that is as large as possible and should thus not be limited by such a condition. In particular, we should allow a large fraction of neurons to fire during one time step, either as a result of an external synchronization of many neurons (e.g. by a strong, sudden stimulus) or because of synchronous oscillations emerging from the network dynamics. In this case, a Poisson-distributed spike count Δn(t) may exceed the number of neurons N. This problem can be remedied by drawing Δn(t) from a binomial distribution with mean D" nðtÞ and maximal value N. For D" nðtÞ ( N, this binomial distribution agrees with the Poisson distribution used in our theory, whereas at large D" nðtÞ it ensures that the spike count is bounded by the total number of neurons N. Although the binomial distribution does not follow strictly from our theory, it is expected to yield a very good approximation even at large D" nðtÞ. The reason is as follows: A statement analogous to our argument that the sum of Poisson numbers [Eq (39)] yields again a Poisson number [Eq (46)] is, in general, not valid for binomial random numbers if the random numbers (i.e. the firing probabilities P λ (t l |t k ) in our model) are very different. However, if neurons are strongly synchronized, and hence D" nðtÞ $ N, they fire with a similar probability, which implies indeed a binomial distribution of the spike count Δn(t).
Besides Eqs (101) and (102), a third condition concerns the approximation of the integral R tþDt t lðt 0 Þdt 0 in Eq (37) by " lðtÞDt (trapezoidal rule). This approximation is valid if the membrane potential u and threshold ϑ do not vary too strongly during a time step. More precisely, the absolute error of the trapezoidal rule is known to be Δt 3 |λ 00 |/12, which we require to be much smaller than λΔt. Thus, the relative error is of order Δt 2 . Using the definition of λ, Eq (25), this leads to the condition @u @t À @W @t 2 þ D u @ 2 u @t 2 À In summary, Δt should be chosen such that all three conditions, Eqs (101)- (103) are satisfied for all populations. Update of the membrane potential. To compute the firing probabilities, we need to update both the membrane potential and the threshold. In the presence of an exponential synaptic filter ab ðsÞ ¼ Yðs À DÞe À ðsÀ DÞ=t b s =t b s , the membrane potential of free neurons u(t) = h(t) in population α obeys the differential equation Assuming that the external stimulus I ext (t) and the population activity A N (t) are constant during one time step, the solution over one time step is given by where h tot is the total input of population α given by For refractory neurons, we obtain the membran potential in the GLM model by the simple formula u k (t l+1 ) = h(t l+1 ) + η(t l+1 − t k ). For the GIF model, the same update rule as for h(t), Eq (106), can be applied for k = l − K, . . ., l − k ref : For the absolute refractory period, l − k ref < k<l, the membrane potential remains at u k (t l+1 ) = u r . Note that the total integrated input h tot needs to be computed only once per time step.
Update of the threshold. Let us first discuss, how to compute the threshold at time t l given the values of g ℓ (t l ) and Δn(t k ) for k = l − K, . . ., l − 1. For free neurons, the threshold ϑ free (t l ) is given by Eq (88) evaluated at time t = t l . For refractory neurons, we find from Eq (90) that the threshold can be written in the discretized form W k ðt l Þ ¼ W free ðt l Þ þ yðt l À t k Þ þ 1 N X kÀ 1 k 0 ¼lÀ Kỹ ðt l À t k 0 ÞDnðt k 0 Þ; ð110Þ k = l − K, . . ., l − k ref . Eq (110) can be rewritten as where the variablesŴ k ðtÞ can be calculated iteratively starting at k = l − K: W kþ1 ðt l Þ ¼Ŵ k ðt l Þ þ N À 1ỹ t l À t k ð ÞDnðt k Þ; k ¼ l À K; . . . ; l À 1 À k ref ð112Þ with initial conditionŴ lÀ K ðt l Þ ¼ W free ðt l Þ. Thus, at each time step t l , the threshold can be rapidly evaluated in one sweep via Eqs (88), (111) and (112).
For the computation of the firing probabilities below, it is necessary to compute the threshold one time step ahead, i.e. at time t l+1 . To this end, we first update the variables g ℓ according to Eq (89): This yields the threshold ϑ free (t l+1 ) of free neurons via the formula Eq (88). For refractory neurons we find from Eqs (111) and (112) W k ðt lþ1 Þ ¼Ŵ k ðt lþ1 Þ þ y t lþ1 À t k À Á ; ð114Þ whereŴ k ðt lþ1 Þ can be iterated bŷ W kþ1 ðt lþ1 Þ ¼Ŵ k ðt lþ1 Þ þ N À 1ỹ t lþ1 À t k À Á Dnðt k Þ; k ¼ l À K; . . . ; l À k ref À 1: with initial condition W lÀ K ðt lþ1 Þ ¼ W free ðt lþ1 Þ À N À 1ỹ t lþ1 À t lÀ K À Á Dnðt lÀ K Þ: ð116Þ Firing probabilities. The firing probabilities for free and refractory neurons are given by P free ðt l Þ ¼ 1 À e À " l free ðt l ÞDt ; P l ðt l jt k Þ ¼ 1 À e À " lðt l jt k ÞDt ; ð117Þ respectively. Here, " l free ðt l Þ ¼ ½l free ðt l Þ þ l free ðt lþ1 Þ=2; ð118Þ are the arithmetic mean of the respective intensities at the beginning and end of the time interval (cf. Eq (37)). The free intensity λ free (t) is given by Eq (91). For refractory neurons, the conditional intensities are given by l k ðtÞ ¼ f u k ðtÞ À W k ðtÞ ð Þ; t k < t À k ref Dt where the last case corresponds to the absolute refractory period.

Population dynamics.
We can directly use the discretized form of the population equations given by Eqs (58) and (59). As in Eq (92) the infinite sums in Eq (59) can be split into an explicit, finite history of length K and a remaining part corresponding to k<l − K. This results in D" nðt l Þ ¼ X lÀ 1 k¼lÀ K P l ðt l jt k Þ " m k ðt l Þ þ P free ðt l Þxðt i Þþ þP L ðt l Þ N À where P L ðt l Þ ¼ X lÀ 1 k¼lÀ K P l ðt l jt k Þv k ðt l Þ þ P free ðt l Þzðt l Þ is the firing probability of "neurons" belonging to the "holes and overshoots" δm l,k (cf. RESULTS, Sec. "Mesoscopic population equations"). The variables x and z have the discrete time definition corresponding to a discretization of their integral definition above. Having calculated the expected spike count D" nðt l Þ, the actual spike count Δn(t l ) is obtained by drawing a binomially distributed random number Dnðt l Þ $ B N; p B ¼ D" nðt l Þ=N ð Þ ð124Þ as discussed above. In Eq (124), B(N, p B ) denotes the binomial distribution corresponding to N Bernoulli trials with success probability p B . The discrete evolution equations for " m k ðt l Þ and v k (t l ) are given by Eqs (48) and (51), respectively, which we repeat here for convenience: " m k ðt lþ1 Þ ¼ ½1 À P l ðt l jt k Þ " m k ðt l Þ ð125Þ v k ðt lþ1 Þ ¼ ½1 À P l ðt l jt k Þ 2 v k ðt l Þ þ P l ðt l jt k Þ " m k ðt l Þ: ð126Þ To find the update rule for x, we use the definition Eq (123) and the update rule for " m k ðt l Þ, Eq (125): " m k ðt l Þ þ ½1 À P l ðt l jt lÀ K Þ " m lÀ K ðt l Þ: Here, we have exploited that P λ (t l |t k ) = P free (t l ) for k<l − K. Using again Eq (125) we find xðt lþ1 Þ ¼ ½1 À P free ðt l Þxðt l Þ þ " m lÀ K ðt lþ1 Þ: ð128Þ 7. Realize the boundary conditions att ¼ t according to Eq (130).
These steps have to be performed for all populations α = 1, . . ., M. A detailed implementation of the algorithm is provided by the pseudocode shown in Figs 11 and 12. Whereas the complexity of the microscopic simulation is of order O(NΔt −1 ), the integration of the population equation is of order O(Δt −2 ) because in each time step one has to update a history of length K = T/Δt (step 3, 4 and 6). Hence, at low neuron numbers (e.g. N < 100), the direct simulation of the microscopic system may become more efficient. We emphasize, however, that to achieve a comparable accuracy, the integration of the mesoscopic population equation can be performed on a coarser, millisecond time scale (e.g., Δt = 1 ms), whereas the microscopic simulation requires precise spike times and hence a sub-millisecond simulation (e.g., Δt = 0.1 ms). If we take advantage of this fact, the mesoscopic population model performs well even at low neuron numbers.

Power spectrum
We characterize the fluctuations of the stationary population activityby the power spectrum defined asC ðf Þ ¼ lim is the Fourier transform of the population activity on a time window of length T. For a population of renewal neurons the power spectrum is known analytically. It is given by [134] whereP ISI ðf Þ is the Fourier transform of the interspike interval density and r is the stationary firing rate given by Note that the power of the fluctuations in Eq (134) scales like 1/N, vanishing in the macroscopic limit N ! 1. For the LIF model with escape noise, the hazard rate λ(t|0) is given by for t > t ref and λ(t|0) = 0 for t t ref .

Modified Potjans-Diesmann model
To model the cortical column of [5] in our framework, we used the parameters of the original publication and modified the model in two ways: First, the background Poisson input was replaced by a constant drive and an increased escape noise such that the populations exhibited  roughly the same stationary firing rates. Specifically, we set u r = 0 mV, u th = 15 mV andΔ u = 5 mV; and, using the mesoscopic dynamics, fitted the resting potentials of the GIF model (here denoted bym a ) without adaptation J θ = 0 to obtain firing ratesr that roughly match the target firing rates. Second, we introduced adaptation on excitatory cells with strength J θ and time scale τ θ , and re-adjusted the resting potential as follows u rest ¼m þ J yr . This yielded correct stationary firing rates in the presence of adaptation. The resulting parameters of the modified model are summarized in Table 2.

Acknowledgments
We thank Hesam Setareh for useful discussions.

Author Contributions
Conceptualization: WG TS MD.