Characterization and Compensation of Network-Level Anomalies in Mixed-Signal Neuromorphic Modeling Platforms

Advancing the size and complexity of neural network models leads to an ever increasing demand for computational resources for their simulation. Neuromorphic devices offer a number of advantages over conventional computing architectures, such as high emulation speed or low power consumption, but this usually comes at the price of reduced configurability and precision. In this article, we investigate the consequences of several such factors that are common to neuromorphic devices, more specifically limited hardware resources, limited parameter configurability and parameter variations due to fixed-pattern noise and trial-to-trial variability. Our final aim is to provide an array of methods for coping with such inevitable distortion mechanisms. As a platform for testing our proposed strategies, we use an executable system specification (ESS) of the BrainScaleS neuromorphic system, which has been designed as a universal emulation back-end for neuroscientific modeling. We address the most essential limitations of this device in detail and study their effects on three prototypical benchmark network models within a well-defined, systematic workflow. For each network model, we start by defining quantifiable functionality measures by which we then assess the effects of typical hardware-specific distortion mechanisms, both in idealized software simulations and on the ESS. For those effects that cause unacceptable deviations from the original network dynamics, we suggest generic compensation mechanisms and demonstrate their effectiveness. Both the suggested workflow and the investigated compensation mechanisms are largely back-end independent and do not require additional hardware configurability beyond the one required to emulate the benchmark networks in the first place. We hereby provide a generic methodological environment for configurable neuromorphic devices that are targeted at emulating large-scale, functional neural networks.


S2.4 Scaling
Due to the modularity of this network model, several straightforward possibilities exist for increasing or decreasing its size without affecting its basic functionality. One can vary the total number of neurons simply by modifying the number of cells per MC. One can also vary the number of MCs per attractor by varying the total number of HCs. And finally, one can change the number of attractors by changing the number of MCs per HC accordingly.
All such changes need to be accompanied by corresponding modifications in connectivity in order to preserve the network dynamics. This has been done by keeping the average input current per neuron within an active attractor constant, which is equivalent to conserving the fan-in of every neuron from each of its afferent populations and leads to the scaling rules shown in Table S2.9. In order to facilitate a comparison with the original results from [59] and [60], we have only considered homogeneous changes, meaning that all modules (MCs, HCs) were equal in size and symmetrically connected.

Parameter values used in Figure S2.1 A-C:

Parameter | A     | B     | C     | Unit | Comment
τ_refrac  | 0.16  | 0.16  | 0.16  | ms   | minimum available in hardware at the used speedup
τ_syn,e   | 17.5  | 66.6  | 6.0   | ms   | see paragraph "Synapses"
τ_syn,i   | 6.0   | --    | --    | ms   | see paragraph "Synapses"
V_reset   | -60.7 | -72.5 | -72.5 | mV   | from the fits in Figure

In Table S2.9, N_x represents the number of units of type x (the original values are found in Table S2.3) and p represents the original connection probability as found in Table S2.4. Whenever a scaled probability p̃ exceeded 1, it was clipped to 1, but the weights of the corresponding synapses were also increased by w̃_syn = w_syn · p̃.
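The fan-in conservation rule, including the clipping of scaled probabilities at 1 with a compensating weight increase, can be sketched as follows. The helper `scale_connection` is hypothetical and not code from the study:

```python
def scale_connection(p, w, n_pre_old, n_pre_new):
    """Scale the connection probability (and, if necessary, the weight)
    such that the expected fan-in n_pre * p from an afferent population,
    and thereby the average input per neuron, is conserved.

    p, w: original connection probability and synaptic weight
    n_pre_old, n_pre_new: afferent population size before/after scaling
    Returns the scaled (probability, weight) pair.
    """
    p_scaled = p * n_pre_old / n_pre_new
    if p_scaled > 1.0:
        # probability is clipped to 1; the lost fan-in is compensated
        # by increasing the weight of the remaining synapses
        return 1.0, w * p_scaled
    return p_scaled, w
```

In both branches the product of fan-in and weight, n_pre · p · w, stays constant, which is what keeps the average input current per neuron unchanged.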
The connections to the BAS cells required special treatment for two reasons. Firstly, during an active state, they receive input from a single MC, but are excited by all MCs in a HC during the competition period between active attractors. Only one aspect can be preserved when scaling, and we have considered the dynamics during UP states as most important, leading to a "PYR → BAS" scaling rule independent of N_MC. Secondly, because PYR cells in MCs only project to the nearest 8 BAS cells, there are always precisely 8 active BAS cells per HC within an active attractor, which yields a simple "BAS → PYR" scaling rule. When decreasing the number of attractors, however, the number of existing BAS cells per HC also decreases, making an appropriate scaling of the connection density necessary. This is the reason for the two different "BAS → PYR" scaling rules found in Table S2.9.

Table S2.10 shows the combinations of N_HC and N_MC used for the quantification of synapse loss after mapping the L2/3 model onto the hardware in Figure 11. In these mapping sweeps, the diffusive background noise was modeled, as for the large-scale network ported to the ESS (Section 3.1.7), with a background pool of 5000 Poisson sources, with every PYR cell receiving input from 100 of the sources.

S2.5 UP-state detection
One crucial element of the analysis is the detection of UP states, from which various other properties such as dwell times, competition times and average spike rates in UP and DOWN states are determined. The method of choice for detecting UP states is based on the fact that the mean spike rate of an attractor during an UP state is much higher than the spike rate of all remaining patterns in their corresponding DOWN states, whereas, in times of competition, two or more attractors have elevated but rather similar spike rates. A measure which quantifies this relationship is the standard deviation σ(t) of all mean spike rates per attractor at a given time t. The attractor with index i is then said to be in an UP state at time t if the following relation holds:

r_i(t) > c · σ(t) ≥ r_j(t)  ∀ j ≠ i ,    (S2.1)

where r_i(t) is the rate of attractor i at time t and c is a numerical constant which is set to 1. This method of detection has several advantages: it is based exclusively on spike trains (and not voltages or conductances, which are more difficult to read out and require much more storage space), it has a clear notion of there being at most one UP state at any given time, and it is completely local in time. In small networks with randomly spiking neurons, it might happen by chance that all but one of the spike rates lie below the (approximately) constant standard deviation. These falsely detected UP states are very short and can thus easily be filtered out by requiring a minimal duration for UP states, which we chose as 100 ms. This value was chosen after investigating dwell-time histograms, as it reliably distinguishes between random fluctuations and actual active attractors.
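The detection criterion and the 100 ms minimum-duration filter could be implemented along the following lines. This is a sketch with assumed names, operating on binned rates, not the original analysis code:

```python
import numpy as np

def detect_up_states(rates, dt, c=1.0, min_duration=100.0):
    """Detect UP states from binned attractor spike rates.

    rates: array of shape (n_attractors, n_bins), mean rate per attractor
    dt: bin width in ms
    c: threshold constant (set to 1 in the text)
    min_duration: minimal UP-state duration in ms, used to filter out
                  spurious detections in small networks
    Returns a boolean array of the same shape as `rates`.
    """
    sigma = rates.std(axis=0)          # std over attractors, per time bin
    above = rates > c * sigma          # candidate UP states
    # at most one attractor may be in an UP state at any given time
    above &= above.sum(axis=0, keepdims=True) == 1
    # discard candidate states shorter than min_duration
    min_bins = int(np.ceil(min_duration / dt))
    up = np.zeros_like(above)
    for i, row in enumerate(above):
        start = None
        for t, val in enumerate(np.append(row, False)):
            if val and start is None:
                start = t
            elif not val and start is not None:
                if t - start >= min_bins:
                    up[i, start:t] = True
                start = None
    return up
```

Note that time bins in which all attractors fire at similar rates (σ ≈ 0) are excluded automatically by the uniqueness condition, since several rates then exceed c·σ simultaneously.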

S2.6 Pattern Completion
Pattern completion is a basic property of associative-memory networks. By stimulating only a subset of PYR cells within a pattern, the complete pattern is recalled. The activity first spreads within the stimulated MCs, turning them dominant in their corresponding HCs. After that, the activity spreads further to other HCs, while the already dominating MCs stabilize each other through mutual stimulation, activating the whole pattern while suppressing all others. All PYR cells in the corresponding attractor hence enter an UP state.
To verify the pattern completion ability of the network, a series of simulations was performed. In order to reduce the occurrence of spontaneously activating attractors, which would interfere with the activation of the stimulated attractor, pattern completion was investigated in larger networks of size 25HC×25MC, as they exhibit almost no spontaneous attractors (the competition time fractions are much higher, see Figure 6 H).
For each network, all of the 25 patterns were stimulated one by one in random order. The time between consecutive stimuli was chosen to be 1000 ms to ensure minimal influence between patterns. The number of stimulated MCs (one per HC) was varied over the course of multiple simulations.
After simulation, each network was analyzed for successfully activated patterns. An activation attempt was considered successful if the stimulated pattern was measured as active within 200 ms after the stimulus onset. If another pattern was active up to 75 ms before the stimulus onset, or if the stimulated pattern had already been active between 20 and 500 ms prior to the stimulus onset, the attempt was deemed invalid and ignored in the calculation of success ratios. This was done to take into account the fact that a pattern is more difficult to activate when another one is already active or while it is still recovering from a prior activation. From all valid attempts, the success probability (assuming a binomial distribution of successful trials) was estimated using the Wilson interval

p ≈ ( p̂ + z²/(2n) ± z √( p̂(1 − p̂)/n + z²/(4n²) ) ) / ( 1 + z²/n ) ,    (S2.2)

where p̂ represents the success ratio, n the number of valid attempts and z = 1 the desired quantile.
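A minimal sketch of the Wilson interval computation; the function name and the (center, half-width) return convention are our own:

```python
from math import sqrt

def wilson_interval(successes, n, z=1.0):
    """Wilson score interval for a binomial success probability.

    successes: number of successful attempts
    n: number of valid attempts
    z: quantile of the standard normal distribution (z = 1 in the text)
    Returns (center, half_width); the interval is center +/- half_width.
    """
    p_hat = successes / n
    denom = 1.0 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center, half_width
```

Unlike the naive normal approximation, the Wilson interval stays within [0, 1] and remains informative even when all attempts succeed or all fail, which matters here given the small numbers of valid attempts per experiment.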
For most experiments (regular, synaptic weight noise and homogeneous synaptic loss) the number of invalid activations was always below 5 (out of 25). The only exception was the PYR population size scaling: starting at 15 PYR cells, the validity rate roughly halves for every reduction in size (by 5 PYR cells per step) due to the increased occurrence of spontaneous attractors. For simulations carried out on the ESS, only 10 patterns out of 25 were stimulated. Out of these 10 attempts, only 5 were valid, on average.

S2.7 Pattern Rivalry / Attentional Blink
Another important feature of the L2/3 model is its ability to reproduce the attentional blink phenomenon, i.e., the inability of one pattern, stimulated by layer 4 input, to terminate another already active pattern and become active itself. This phenomenon was investigated through a series of different networks of the same size as in Section S2.6 (25HC×25MC). For each network, 24 out of 25 patterns were randomly assigned to 12 pairs. Afterwards, pattern rivalry was tested on all of these pairs in intervals of 1000 ms.
Let the two patterns in each pair be denoted A and B. In order to guarantee an immediate activation of pattern A, 6 out of 25 HCs were stimulated (as then all completion attempts are successful, see Figure 6 N). Then, after a certain delay ∆T , pattern B was stimulated with a varying amount of HCs. Both the number of stimulated HCs as well as the delay ∆T were varied for each network.
In the same way as in Section S2.6, each network was then analyzed as to whether pattern B was successfully activated or not. If the competing pattern B was activated within 200 ms after the stimulus onset and stayed active for at least 100 ms, the attempt was counted as successful; otherwise it was deemed unsuccessful. As before, attempts during which spontaneously activated patterns intervened were ignored. From all successful and unsuccessful attempts, the success probability was then estimated in the same way as for pattern completion, using Equation S2.2.
The validity ratios for pattern rivalry are not significantly different from those discussed in Section S2.6. Most experiments (regular, synaptic weight noise and homogeneous synaptic loss) have 10 to 12 valid attempts (out of 12). As before, for the PYR population size scaling experiments, the number of valid attempts dropped progressively (8.2 ± 1.7, 4.8 ± 2.1 and 2.2 ± 1.5 valid attempts for 15, 10 and 5 PYR cells per MC, respectively). Simulations carried out on the ESS had an average of 4 (distorted case) and 6 (compensated case) valid attempts (out of 10).

Different network configurations have been compared in terms of attentional blink by estimating the 0.5 iso-probability contour in the following way. For every delay ∆T, the transition point from below to above 0.5 probability for successful activation of the second pattern was estimated by linearly interpolating between the two nearest data points with a success ratio above and below 0.5, respectively. In case there were several such transition points, only the one with the highest stimulus was considered. If no transition point could be identified, the transition was fixed at 0 or 25 stimulated MCs, depending on whether all success ratios lay above or below 0.5, respectively. When there were no valid attempts for a certain delay/stimulus pair, its success probability estimate was replaced by the median of all valid activation attempts for that particular time delay ∆T (this only occurred sporadically on the ESS and for PYR population size scaling with fewer than 15 PYR cells per MC). After identifying the transition point for every time delay ∆T, intermediate values were interpolated linearly. Finally, the interpolated values were Gauss-filtered (σ = 0.25 × step size for ∆T in the dataset) to better approximate the true 0.5 iso-probability contour.
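The transition-point estimation for a single delay ∆T could be sketched as follows. Names are hypothetical, and the handling of the no-crossing case assumes that all-above ratios pin the transition at the minimum stimulus (and all-below at the maximum):

```python
import numpy as np

def transition_point(stimuli, success_ratios, threshold=0.5):
    """Estimate the stimulus strength at which the success ratio crosses
    `threshold`, by linear interpolation between the two nearest data
    points below and above it. If several crossings exist, the one with
    the highest stimulus is returned; if none exists, the estimate is
    pinned to the minimum or maximum stimulus, depending on whether all
    ratios lie above or below the threshold.
    """
    stimuli = np.asarray(stimuli, dtype=float)
    ratios = np.asarray(success_ratios, dtype=float)
    crossings = []
    for k in range(len(ratios) - 1):
        lo, hi = ratios[k], ratios[k + 1]
        if (lo < threshold) != (hi < threshold):
            # linear interpolation between the two bracketing points
            frac = (threshold - lo) / (hi - lo)
            crossings.append(stimuli[k] + frac * (stimuli[k + 1] - stimuli[k]))
    if crossings:
        return max(crossings)
    return stimuli.min() if ratios.min() >= threshold else stimuli.max()
```

Applying this per delay ∆T yields the raw contour, which can then be interpolated and Gauss-filtered as described above.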

S2.8 Star plots
While the spiking activity of many cells can be visualized quite well in raster plots, illustrating the temporal evolution of their membrane potentials is less straightforward. Here, we have chosen to use so-called star plots for visualizing both average voltages and average firing rates of entire cell populations.
In a system evolving in an abstract space with 3 dimensions, a star plot represents the orthogonal projection of the state-space trajectory along the main diagonal of the corresponding Cartesian coordinate system onto a plane perpendicular to it. For n dimensions, points x in the star plot are no longer orthogonal projections of the states z, but are instead calculated as

x = Σ_{i=1}^{n} z_i · (cos(2πi/n), sin(2πi/n)) ,

i.e., each dimension i is assigned a unit vector in the plot plane and a state is mapped to the weighted sum of these vectors. A visualization for n = 3 is illustrated in Figure S2.2. In the case of the L2/3 network, the number of dimensions is given by the number of attractors, with each axis describing some particular feature of the corresponding attractor (such as the average voltage or spike rate of the constituent PYR cells).
In addition to the position in state space, the state-space velocity is also encoded in a star plot by both the thickness and the color of the trajectory. Especially in the case of the L2/3 network, this can be very useful for visualizing, e.g., attractor stability or competition times. Here, both line thickness and lightness were chosen proportional to (const + e^(−|dx/dt|)), with x being the position in state space. Figure S2.3 B and C show two characteristic examples of star plots used for visualizing the dynamics of the L2/3 network.
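The mapping from an n-dimensional trajectory to the star plot plane can be sketched as follows, assuming the standard star-coordinates construction in which each dimension is assigned a unit vector at angle 2πi/n (for n = 3 this coincides, up to scale, with the projection along the main diagonal):

```python
import numpy as np

def star_projection(z):
    """Map an n-dimensional state-space trajectory into the 2D star plot
    plane: dimension i is assigned the unit vector at angle 2*pi*i/n,
    and a state z is mapped to the weighted sum of these vectors.

    z: array of shape (n_points, n_dims)
    Returns an array of shape (n_points, 2).
    """
    n = z.shape[1]
    angles = 2 * np.pi * np.arange(n) / n
    axes = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # (n, 2)
    return z @ axes
```

A useful sanity check is that states on the main diagonal (all attractors equally active) map to the origin, since the n unit vectors sum to zero.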

S2.9 Average synaptic conductance due to Poisson stimulation
For a single Poisson source with rate ν_i, connected to the neuron by a synapse with weight w_i and time constant τ_syn, the conductance course can be viewed as a sum of independent random variables, each of them representing the conductance change caused by a single spike. In the limit of large ν_i, the central limit theorem guarantees the convergence of the conductance distribution to a Gaussian, with moments given by

⟨g_i⟩ = ν_i w_i τ_syn  and  σ²_{g_i} = ν_i w_i² τ_syn / 2 .

Since conductances sum up linearly, N Poisson sources lead to an average conductance of

⟨g⟩ = τ_syn Σ_{i=1}^{N} ν_i w_i .

S2.10 Detailed simulations of synapse loss and PYR population reduction

Figures S2.3 and S2.4 show the effects of various levels of synapse loss and PYR population reduction, respectively.
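The average conductance derived in Section S2.9 can be checked numerically with a minimal sketch: simulate Poisson input onto an exponentially decaying conductance and compare the time-averaged conductance with ⟨g⟩ = N ν w τ_syn. All parameter values below are illustrative, not taken from the hardware setup:

```python
import numpy as np

rng = np.random.default_rng(42)

# assumed, illustrative parameters (identical sources for simplicity)
nu = 500.0       # Poisson rate per source [1/s]
w = 0.002        # conductance increment per spike [uS]
tau_syn = 0.005  # synaptic time constant [s]
n_sources = 10
t_total = 20.0   # simulated time [s]
dt = 1e-4        # integration step [s]

# exponential synapse: each spike increments g by w, g decays with tau_syn
steps = int(t_total / dt)
decay = np.exp(-dt / tau_syn)
spike_counts = rng.poisson(n_sources * nu * dt, size=steps)
g, g_sum = 0.0, 0.0
for k in spike_counts:
    g = g * decay + k * w
    g_sum += g
g_mean = g_sum / steps

# prediction from Campbell's theorem: <g> = N * nu * w * tau_syn
g_theory = n_sources * nu * w * tau_syn
print(g_mean, g_theory)
```

With these parameters the prediction is ⟨g⟩ = 10 · 500 · 0.002 · 0.005 = 0.05 µS, and the simulated time average agrees up to discretization and sampling error of a few percent.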

S2.11 Synaptic weight noise
As can be seen in Figure S2.5, the firing rate of single PYR cells is highly dependent on the synaptic input weight that connects them to their respective Poisson source. For example, a variation of 20% in the input weight can cause the firing rate to either effectively vanish or more than triple. This heavily distorts network dynamics as PYR cells within MCs will exhibit highly disparate firing rates, thereby disrupting the network's ability to maintain stable UP states (in which all participating PYR cells should fire roughly with the same rate).