Unsupervised Learning in an Ensemble of Spiking Neural Networks Mediated by ITDP

We propose a biologically plausible architecture for unsupervised ensemble learning in a population of spiking neural network classifiers. A mixture of experts type organisation is shown to be effective, with the individual classifier outputs combined via a gating network whose operation is driven by input timing dependent plasticity (ITDP). The ITDP gating mechanism is based on recent experimental findings. An abstract, analytically tractable model of the ITDP driven ensemble architecture is derived from a logical model based on the probabilities of neural firing events. A detailed analysis of this model provides insights that allow it to be extended into a full, biologically plausible, computational implementation of the architecture which is demonstrated on a visual classification task. The extended model makes use of a style of spiking network, first introduced as a model of cortical microcircuits, that is capable of Bayesian inference, effectively performing expectation maximization. The unsupervised ensemble learning mechanism, based around such spiking expectation maximization (SEM) networks whose combined outputs are mediated by ITDP, is shown to perform the visual classification task well and to generalize to unseen data. The combined ensemble performance is significantly better than that of the individual classifiers, validating the ensemble architecture and learning mechanisms. The properties of the full model are analysed in the light of extensive experiments with the classification task, including an investigation into the influence of different input feature selection schemes and a comparison with a hierarchical STDP based ensemble architecture.


Validation of Analytic Solutions by Numerical Simulation
To validate the analytic solutions of the voter ensemble network, we compared the results derived in the first part of the Results section of the main paper with a numerical simulation that simply iterated through all the underlying equations of the same model. This validation was deemed worthwhile because the simplified analytical model is based on Bernoulli random variables that simulate per-sample firing events. The numerical simulation allowed us to check that the long-term trends and statistics matched those predicted by the analytical solutions. The simulation was a two-step process: a learning phase followed by a measuring phase. Since this initial simplified model is for illustrative purposes (to determine whether the ITDP and ensemble learning dynamics are as expected), abstract, automatically generated data was used for convenience and ease of analysis. First, we defined the posterior probability tables as in Figure 3 (in the main paper) for every ensemble voter and the gating voter by drawing their values at random from a chosen distribution. Given the posterior probabilities for an input dataset of size M, the learning phase trains the connection weights using ITDP by running the system for multiple rounds of dataset presentation until the weights have sufficiently converged. This is followed by the measuring phase, which measures the performance of the final voter using these fixed weights. The performance of a single voter was measured using normalised conditional entropy (NCE) [1], which is suitable for a multi-class discrimination task where the explicit relation between the neuronal index and the corresponding class is unavailable. NCE takes values in the range 0 ≤ NCE ≤ 0.5, with lower conditional entropy indicating that each neuron fires more predominantly for one class and hence performs better (see Methods for the details of the simulation procedure and the NCE calculation).
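As a rough illustration of the measure, the sketch below computes an NCE from a table of firing counts. The exact definition is given in the Methods and in [1]; here it is only assumed that NCE = H(class | neuron) / H(class, neuron), a choice made because it reproduces the quoted range of 0 to 0.5 for balanced classes. The function name and the table layout are hypothetical.

```python
import math

def normalised_conditional_entropy(joint):
    """NCE of class given neuron from counts joint[neuron][class].

    Assumed definition (not taken from the paper):
    H(class | neuron) / H(class, neuron).
    """
    total = sum(sum(row) for row in joint)
    h_joint = 0.0   # H(class, neuron)
    h_neuron = 0.0  # H(neuron)
    for row in joint:
        p_m = sum(row) / total
        if p_m > 0:
            h_neuron -= p_m * math.log(p_m)
        for count in row:
            p = count / total
            if p > 0:
                h_joint -= p * math.log(p)
    if h_joint == 0:
        return 0.0
    # Chain rule: H(class | neuron) = H(class, neuron) - H(neuron)
    return (h_joint - h_neuron) / h_joint

# A voter whose neurons each fire for exactly one class, and one that fires at random.
perfect = [[100 if i == j else 0 for j in range(4)] for i in range(4)]
chance = [[25 for _ in range(4)] for _ in range(4)]
nce_perfect = normalised_conditional_entropy(perfect)
nce_chance = normalised_conditional_entropy(chance)
```

Under this assumed definition, the perfectly selective voter sits at the lower end of the range and the chance-level voter at the upper end.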

Comparison with Numerical Simulation
Both the learning and measuring phases in the numerical simulation were run for $T = M \times R$ steps, where $R$ is the number of rounds of dataset presentation, set to 10 or 100 depending on the experiment. Using 4 input classes ($N_C = 4$), the total number of input samples was set to $M = 400$, consisting of 100 samples for each class. The pre-determined firing probabilities of the ensemble voters and the gating voter can be set arbitrarily, under the constraint that the firing probabilities of the neurons within a voter sum to one for each sample. To test and understand the broad picture of the ensemble behaviour easily (as well as to aid performance visualisation), let us represent the system by a few parameters describing the presynaptic voters. Since the final voter performance is affected directly by the performance of the ensemble and gating voters, and therefore cannot be objectively measured independently of them, we consider the three key factors to be: the overall performance of the ensemble voters, the gating voter performance, and the ensemble size. The overall performance of the ensemble voters is controlled by using statistically identical voters for the entire ensemble. The performance of a voter can be adjusted by varying a parameter which changes all of its firing probabilities as follows.
For $N_C = 4$, the mean firing probabilities of the neurons in each voter for the samples from each class can be considered as a 4×4 matrix in which each column holds the average firing probabilities of a voter for the samples in the corresponding input class. The voter performance is maximal (lowest NCE) when this matrix is the identity matrix. The performance of a voter is also analytically unchanged under a permutation of rows (i.e. switching neuron indices), as can be seen in Equations 27 and 28 (main paper). Thus, in order to easily assign various performance values to the voters, we defined a parameter $1/N_C \le P_{\max} \le 1$ such that every diagonal element of the matrix is set to $P_{\max}$ and all other elements are set equally to $r = (1 - P_{\max})/(N_C - 1)$. The probability matrix of a voter for $N_C = 4$ is written as:

\[
\begin{pmatrix}
P_{\max} & r & r & r \\
r & P_{\max} & r & r \\
r & r & P_{\max} & r \\
r & r & r & P_{\max}
\end{pmatrix}
\tag{S1.1}
\]

This probability matrix can be used to design a voter whose NCE has a desired value. The simplest case is to assign these class-dependent firing probabilities identically to all samples within the corresponding class, which results in an NCE ranging from 0.5 (worst, $P_{\max} = 1/N_C$) to 0 (best, $P_{\max} = 1$). Using this setting, an initial illustrative picture of voter ensemble behaviour is shown in Figure S1.1, which plots the analytically calculated voter NCE for different $P_{\max}$ values and the final NCEs of the combined output for different average ensemble NCEs and ensemble sizes.

However, having identical firing probabilities for entire sample subsets is not realistic. We therefore assign the firing probabilities of every sample in each subset randomly, using the elements of the above probability matrix as the means of normal distributions with chosen variances. For example, when $N_C = 4$, the firing probabilities of the four neurons for a sample from class 1 are set to $(p, r_1, r_2, r_3)$, where $p \sim \mathcal{N}(P_{\max}, \sigma_p^2)$, $r_1, r_2 \sim \mathcal{N}(r, \sigma_r^2)$, and $r_3 = 1 - (p + r_1 + r_2)$. Probabilities can also be assigned from a uniform distribution $U(P_L, P_H)$ over a chosen range, which allows a wider diversity in the posterior probabilities of the samples to be assessed.
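The construction of the mean probability matrix and the per-sample draw described above can be sketched as follows. This is a minimal illustration, not the authors' simulation code: the function names, the default standard deviations, and the choice to redraw until all values lie in [0, 1] are assumptions of the sketch (the text does not say how out-of-range draws are handled).

```python
import random

def voter_matrix(p_max, n_classes=4):
    """Mean firing-probability matrix: p_max on the diagonal, r elsewhere."""
    r = (1.0 - p_max) / (n_classes - 1)
    return [[p_max if i == j else r for j in range(n_classes)]
            for i in range(n_classes)]

def sample_firing_probs(p_max, cls, n_classes=4, sd_p=0.05, sd_r=0.02, rng=random):
    """Draw the firing probabilities of one voter for one sample of class `cls`.

    The matching neuron gets p ~ N(p_max, sd_p^2), all but one of the other
    neurons get r_j ~ N(r, sd_r^2), and the last takes the remainder so that
    the probabilities sum to one.
    """
    r = (1.0 - p_max) / (n_classes - 1)
    while True:
        p = rng.gauss(p_max, sd_p)
        others = [rng.gauss(r, sd_r) for _ in range(n_classes - 2)]
        last = 1.0 - (p + sum(others))
        values = [p] + others + [last]
        # Assumption of this sketch: redraw if any value is not a probability.
        if all(0.0 <= v <= 1.0 for v in values):
            break
    probs = values[1:]
    probs.insert(cls, p)  # place p at the index of the true class
    return probs

matrix = voter_matrix(0.7)
probs = sample_firing_probs(0.7, cls=2, rng=random.Random(0))
```

Passing a seeded `random.Random` instance keeps a run reproducible, which is convenient when comparing simulated statistics against analytic values.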
Using these parameter settings, the ensemble system was tested with different performances of the ensemble voters and the gating voter. First the weight convergence by ITDP was examined by comparing the analytic solution of the expected values of weights and the weights learnt by numerical simulation. Figure S1.2 shows clear separation of weight values into groups according to the discriminability of the ensemble voters, indicating that the ITDP learning reflects the history of presynaptic voter behaviours, which influences the final voter performance. After learning with sufficient guidance from the gating voter, the decision from the better performing ensemble neurons influences the final voter output more by developing relatively stronger weights than the other neurons. Thus the spike from one strongly weighted synaptic projection can overwhelm several other weakly weighted 'wrong' decisions, achieving the weighted vote learnt from the previous history of ensemble behaviour (exactly the behaviour we desire in this kind of ensemble learning). Plotting numerical vs. analytic weights, as in Figure S1.2 D-F, shows that the two solutions match well, validating our analytic formulation of expected weight values, and demonstrates that the system performs very well under appropriate parameter settings.
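The weighted-vote effect described above, where one strongly weighted projection outvotes several weak ones, can be illustrated with a toy example. The weights below are hypothetical and attached to whole voters for simplicity, whereas in the model they are learnt by ITDP for each pair of presynaptic and postsynaptic neurons.

```python
def weighted_vote(votes, weights):
    """Return the class with the largest summed weight.

    votes[e] is the class chosen by ensemble voter e; weights[e] is the
    (hypothetical) strength of the synapse carrying that vote.
    """
    totals = {}
    for cls, w in zip(votes, weights):
        totals[cls] = totals.get(cls, 0.0) + w
    return max(totals, key=totals.get)

votes = [2, 0, 0]          # one reliable voter picks class 2, two weak voters pick class 0
weights = [3.0, 1.0, 1.0]  # illustrative values only
winner = weighted_vote(votes, weights)
majority = weighted_vote(votes, [1.0, 1.0, 1.0])
```

With equal weights the two weak voters win by simple majority, while the learnt weighting lets the single reliable voter decide the outcome.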
The main role of the gating voter is to guide the spike-evoked EPSPs from the ensemble neurons that represent the same class onto one of the postsynaptic neurons of the final voter via topographic distal signals. The analytic model allows us, for example, to gain insight easily into the distribution of the outgoing weights from an ensemble neuron to each of the final voter neurons. For tractability, let us assume that each voter neuron has a constant firing probability for all samples in the same class. Using firing probabilities of the presynaptic voters defined by matrices of the form of Equation S1.1, the probabilities of the ensemble voter and the gating voter can be defined by the parameters $P^E_{\max}$ and $P^G_{\max}$ respectively. Assuming that the probabilities of input class presentations, $p(c_n)$, are all the same, the analytic solution of the weight from one of the ensemble voter neurons ($m_i$) to the final voter neuron $f_k$ can be rewritten using the class-conditional probabilities $p(m|c)$. Setting $a = e^{5}$, $N_C = 4$, $N_E = 3$, and substituting the elements of the corresponding 4×4 probability matrices of Equation S1.1 (with parameters $P^E_{\max}$ and $P^G_{\max}$) for $p(m_i|c_n)$ and $p(g_k|c_n)$, gives the piecewise expression of Equation S1.3. We can see that the most enhanced connection weight from the presynaptic neuron $m_i$ to the postsynaptic neuron $f_k$ ($i = k$ in the current setting) has an inverse relationship with the rest of the outgoing weights (connections to $f_k$ where $i \neq k$) in $(P^E_{\max}, P^G_{\max})$ space. Figure S1.3 shows (for $i = 1$) that a low $P^G_{\max}$ (i.e. low gating voter performance) results in less difference between the outgoing weights from the presynaptic neuron $m_1$, hence giving equal EPSPs to all postsynaptic neurons (i.e. $w_{11} = w_{21} = w_{31} = w_{41}$, resulting in no capability of weighted voting). The same holds for the ensemble voter parameter ($P^E_{\max}$).
Therefore, in our ensemble voter system, both the gating voter and at least one ensemble voter must have positive discriminability (NCE < 0.5) in order to learn to perform weighted voting. The performance of the final voter depends on the performance of every voter in the ensemble, on the gating voter, and on the ensemble size. The combined output of the ensemble voter network can be understood by measuring the long-term behaviour of the momentary output of the final voter. Figure S1.4 illustrates an example of the final voter probabilities iteratively calculated using the numerical simulation (with fixed weights after learning) with $N_C = 4$, $N_E = 3$. The firing probabilities of the voters for each sample were drawn from normal distributions with variance 0.1, whose means were set by the parameters $P^E_{\max} = (0.5, 0.6, 0.7)$ for the three ensemble voters and $P^G_{\max} = 0.6$ for the gating voter. The EPSP of each final voter neuron was calculated simply as the weighted sum of presynaptic spikes, giving $4^3 = 64$ possible states of the presynaptic spikes. Thus each final voter neuron can take at most 64 distinct probability values, whose individual values depend on the synaptic weights (Figure S1.4 E). At each time instance, the firing probability of each final voter neuron belongs to the corresponding set of values, and the values differ only in their frequency of appearance (as shown in Figure S1.4 A-D). The average firing behaviour (Figure S1.4 F) shows that each neuron fires dominantly for a corresponding class, which indicates that the ensemble network successfully achieved weighted voting.

The parameters of the voters in the ensemble are set by sampling from identical Gaussian distributions $\mathcal{N}(P^E_{\max}, 0.05)$ in order to easily assign various overall average ensemble performances (and $P^G_{\max}$ for the gating voter).
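The count of 64 presynaptic states discussed for Figure S1.4 can be reproduced by direct enumeration, as in the sketch below. The weighted-sum EPSP follows the text; the softmax readout that turns EPSPs into firing probabilities, the weight layout `weights[e][k][i]`, and the weight values are assumptions made only for this illustration.

```python
import itertools
import math

N_C, N_E = 4, 3  # classes (neurons per voter) and ensemble size, as in the text

def final_voter_states(weights):
    """Map each presynaptic spike pattern to final-voter firing probabilities.

    weights[e][k][i] is the weight from neuron i of ensemble voter e to final
    neuron k. A pattern gives the index of the firing neuron in each voter.
    """
    table = {}
    for pattern in itertools.product(range(N_C), repeat=N_E):
        # EPSP of final neuron k: weighted sum over the spikes that arrived.
        epsp = [sum(weights[e][k][i] for e, i in enumerate(pattern))
                for k in range(N_C)]
        # Assumed readout (not stated in this section): softmax over the EPSPs.
        peak = max(epsp)
        expd = [math.exp(u - peak) for u in epsp]
        norm = sum(expd)
        table[pattern] = [x / norm for x in expd]
    return table

# Hypothetical weights: voter e drives only the matching final neuron,
# with a strength that differs between voters.
weights = [[[strength if i == k else 0.0 for i in range(N_C)]
            for k in range(N_C)]
           for strength in (1.0, 2.0, 3.0)]
table = final_voter_states(weights)
```

Each of the `len(table)` patterns yields one probability vector, so every final voter neuron has at most that many distinct firing probabilities, with repeats whenever different patterns produce the same EPSP.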
The results show a very good match between the numerical simulations and their analytic solutions over the different combinations of the ensemble and gating voter parameters ($0.25 < P^E_{\max}, P^G_{\max} < 1.0$). The pairs of graphs in each column show the effect of increased ensemble size, which improves the weighted vote performance as analytically predicted in Figure S1.1-right.
These validation tests showed that the logical model of a spiking voter ensemble system and its analytic solutions are capable of performing efficient spike-based weighted voting, driven by ITDP, and give us important insights into how that is achieved. They also demonstrated how the seemingly complex network of interactions between stochastic processes within a population of voters can be effectively described by a series of probability metrics.