tension: A Python package for FORCE learning

First-Order, Reduced and Controlled Error (FORCE) learning and its variants are widely used to train chaotic recurrent neural networks (RNNs), and outperform gradient-based methods on certain tasks. However, there is currently no standard software framework for FORCE learning. We present tension, an object-oriented, open-source Python package that implements a TensorFlow / Keras API for FORCE. We show how rate networks, spiking networks, and networks constrained by biological data can all be trained through a shared, easily extensible high-level API. Given the same resources, our implementation outperforms a conventional RNN in loss and published FORCE implementations in runtime. Our work makes FORCE training of chaotic RNNs accessible and simple to iterate on, and facilitates modeling of how behaviors of interest emerge from neural dynamics.


A.2 Spiking neuron models
The continuous-time voltage equations of the theta, LIF, and Izhikevich spiking neuron models, as outlined in (3), are summarized below. The phase (voltage analogue) of the theta neuron is governed by:

θ̇(t) = (1 − cos(θ(t))) + π² I(t)(1 + cos(θ(t)))

The voltage of the LIF neuron is governed by:

τ_m v̇(t) = −v(t) + I(t)

with v(t) reset to v_reset once it reaches the threshold v_thr. The voltage of the Izhikevich neuron is governed by:

C v̇(t) = k(v(t) − v_r)(v(t) − v_t) − u(t) + I(t)
u̇(t) = a(b(v(t) − v_r) − u(t))

where u(t) is the adaptation variable; when v(t) reaches v_peak, v is reset to c and u is incremented by d. In all three models, I(t) = I_bias + s(t).
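As a concrete illustration, the theta-neuron equation above can be integrated with a simple forward-Euler step. The following is a minimal NumPy sketch, not tension's actual implementation; the time step, bias current, and the convention of registering a spike when the phase crosses π are assumptions:

```python
import numpy as np

def theta_neuron_step(theta, I, dt=1e-4):
    """One forward-Euler step of the theta neuron phase equation (sketch).

    Implements d(theta)/dt = (1 - cos(theta)) + pi^2 * I * (1 + cos(theta));
    a spike is registered when the phase crosses pi, after which the phase
    wraps back by 2*pi. dt and the wrapping convention are assumptions.
    """
    dtheta = (1.0 - np.cos(theta)) + np.pi**2 * I * (1.0 + np.cos(theta))
    theta_new = theta + dt * dtheta
    spiked = theta_new > np.pi
    theta_new = np.where(spiked, theta_new - 2.0 * np.pi, theta_new)
    return theta_new, spiked

# drive a single neuron with a constant suprathreshold current for 2 seconds
theta, spikes = -np.pi, 0
for _ in range(20000):
    theta, s = theta_neuron_step(theta, I=0.5)
    spikes += int(s)
```

For any constant I > 0 the right-hand side is strictly positive, so the phase advances monotonically and the neuron fires periodically.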

A.3 full-FORCE algorithm
full-FORCE (2) is a modified architecture for FORCE training in which a target-generating network, driven directly by the target f(t), is used to derive target hidden activations for each neuron in the primary network, allowing w R (t) to be updated more efficiently. The discrete-time forward pass of this target-generating network (denoted by subscript D) is:

x_D(t + Δt) = x_D(t) + (Δt/τ)(−x_D(t) + J_D h_D(t) + u_D f(t)),    h_D(t) = tanh(x_D(t))

where all weights of the target-generating network are randomly initialized and not trainable. The recurrent weights of the primary network are updated using RLS on the error between the recurrent input of the primary network and its target:

e(t) = w R (t − Δt) h(t) − (J_D h_D(t) + u_D f(t))
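Under the standard full-FORCE formulation, one RLS step on the primary network's recurrent weights can be sketched as follows. This is a minimal NumPy illustration with assumed symbol names (J for the trainable recurrent weights, J_D and u_D for the frozen target-generating weights, P for the inverse correlation matrix); it is not tension's API:

```python
import numpy as np

def full_force_rls_step(J, P, h, h_D, J_D, u_D, f_t):
    """One hypothetical RLS update of the primary recurrent weights J.

    The target for the recurrent input to each neuron comes from the
    target-generating network: J_D @ h_D + u_D * f_t (standard full-FORCE
    form; all symbol names are assumptions).
    """
    # error between the primary network's recurrent input and its target
    err = J @ h - (J_D @ h_D + u_D * f_t)
    # standard RLS update of the running inverse correlation matrix P
    Ph = P @ h
    denom = 1.0 + h @ Ph
    P -= np.outer(Ph, Ph) / denom
    # rank-one weight update proportional to the error
    J -= np.outer(err, Ph) / denom
    return J, P

rng = np.random.default_rng(0)
N = 20
J = rng.normal(scale=1 / np.sqrt(N), size=(N, N))
J_D = rng.normal(scale=1 / np.sqrt(N), size=(N, N))
u_D = rng.normal(size=N)
P = np.eye(N)
h = np.tanh(rng.normal(size=N))
h_D = np.tanh(rng.normal(size=N))
before = np.linalg.norm(J @ h - (J_D @ h_D + u_D * 0.3))
J, P = full_force_rls_step(J, P, h, h_D, J_D, u_D, 0.3)
after = np.linalg.norm(J @ h - (J_D @ h_D + u_D * 0.3))
```

A single step shrinks the error on the current sample by a factor of 1/(1 + hᵀPh), which is the usual RLS contraction.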

A.4 FORCE-training spiking RNNs
Spiking neural networks have attracted significant recent interest in both the neuroscience and machine learning communities (4; 5; 6). A FORCE approach can also be used to train spiking RNNs by incorporating synaptic filtering dynamics, which translate discrete presynaptic spikes into continuous postsynaptic currents. Based on (3), a simple discrete-time single exponential synaptic filter is:

r_j(t + Δt) = r_j(t) e^{−Δt/τ_s} + (1/τ_s) s_j(t)

and the discrete-time double exponential synaptic filter is:

r_j(t + Δt) = r_j(t) e^{−Δt/τ_d} + h_j(t) Δt
h_j(t + Δt) = h_j(t) e^{−Δt/τ_r} + (1/(τ_r τ_d)) s_j(t)

where s_j(t) = 1 if t = t_jk for some k and 0 otherwise, t_jk indicates the time of the kth spike for the jth neuron, τ_s is the synaptic time constant, and τ_r and τ_d are the synaptic rise and decay time constants. We include specific spiking neuron models in Section A.2.
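The double exponential filter can be illustrated with a short NumPy sketch that passes a single spike through the filter and records the resulting postsynaptic current. The time step and time constants below are illustrative assumptions, not tension's defaults:

```python
import numpy as np

def double_exp_filter_step(r, h, spiked, dt=5e-5, tau_r=2e-3, tau_d=20e-3):
    """One step of a discrete double-exponential synaptic filter (sketch).

    r is the slow (decay, tau_d) component read out as the postsynaptic
    current; h is the fast (rise, tau_r) auxiliary variable. `spiked`
    marks which neurons fired this step. Parameter values are assumptions.
    """
    r = r * np.exp(-dt / tau_d) + h * dt
    h = h * np.exp(-dt / tau_r) + spiked / (tau_r * tau_d)
    return r, h

# filter a single spike at t = 0 and record the resulting current
r, h = np.zeros(1), np.zeros(1)
trace = []
for step in range(2000):
    spiked = np.array([1.0 if step == 0 else 0.0])
    r, h = double_exp_filter_step(r, h, spiked)
    trace.append(r[0])
trace = np.array(trace)
```

The current rises on the τ_r timescale, peaks, and then decays on the τ_d timescale, as expected for a double-exponential kernel.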

A.5 Spiking neural networks
Per (3), the continuous-time single exponential synaptic filter is given by:

ṙ_j(t) = −r_j(t)/τ_s + (1/τ_s) Σ_k δ(t − t_jk)

and the double exponential synaptic filter is given by:

ṙ_j(t) = −r_j(t)/τ_d + h_j(t)
ḣ_j(t) = −h_j(t)/τ_r + (1/(τ_r τ_d)) Σ_k δ(t − t_jk)

where t_jk indicates the time of the kth spike for the jth neuron.

A.6 Connection to Echo-State Networks
Like FORCE, classic ESN training (7) harnesses chaotic internal dynamics within an RNN to perform time-series prediction tasks. The main difference between ESN learning and FORCE learning is the feedback signal used during training: in the ESN, the target function f (t) is fed back, while in FORCE, the actual network output z(t) is fed back, which empirically improves stability (1).
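The distinction can be made concrete with a toy forward step in which the only difference between the two schemes is which signal is fed back. This is a hypothetical sketch; the network size, time constants, and zero-initialized readout are assumptions:

```python
import numpy as np

def rnn_step(x, fb, J, u, dt=0.1, tau=1.0):
    """One leaky-tanh RNN step with a scalar feedback signal fb (sketch)."""
    r = np.tanh(x)
    return x + (dt / tau) * (-x + J @ r + u * fb)

rng = np.random.default_rng(1)
N = 50
J = rng.normal(scale=1.5 / np.sqrt(N), size=(N, N))  # chaotic regime (g > 1)
u = rng.normal(size=N)                               # feedback weights
w = np.zeros(N)                                      # linear readout
x_esn = rng.normal(size=N)
x_force = x_esn.copy()
f_t = 0.7                                            # target at this timestep

# ESN-style training step: the *target* f(t) is fed back (teacher forcing)
x_esn = rnn_step(x_esn, f_t, J, u)

# FORCE-style training step: the *actual output* z(t) is fed back
z_t = w @ np.tanh(x_force)
x_force = rnn_step(x_force, z_t, J, u)
```

Starting from identical states, the two schemes diverge as soon as z(t) ≠ f(t), which is why FORCE must keep the output error small at every step of training.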

A.7 Connection to gradient methods
The shared goal of ESNs and FORCE is to train RNNs while remaining aware of, and explicitly manipulating, the dynamics internally generated within the RNN reservoir, with the hope of expressing long-term dependencies in a basis of these dynamics (1; 8). This is conceptually distinct from gradient methods such as backpropagation through time (BPTT), which gradually reduce training error using small, (in general) full-rank updates to the trainable weights proportional to the gradient of the error function. In the chaotic regime of an RNN, gradient methods may fail to converge if the gradient degenerates (8). In contrast, RNNs with initially chaotic spontaneous activity converge more quickly and find more robust solutions under FORCE learning (1). An alternative perspective asks whether the RLS delta update can be (approximately) written as the true gradient of some objective function. (9) address a similar problem: for a linear least-squares loss with network parameters W, S samples of T timepoints each, and linear readouts Z, the authors show that an RLS update roughly approximates an SGD update with an adaptive learning rate.
In our work here, as in the majority of existing work using FORCE, the gradient of the output error is not used. The above approximation notwithstanding, the update computed by FORCE is not in general proportional to the gradient of the error. Nonetheless, interpolating between FORCE and gradient-based weight updates represents a potentially promising avenue for future work.
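The relationship can be seen in a toy one-timestep example for a linear readout: with P initialized to the identity, the RLS update points in the same direction as the SGD update, with P acting as an adaptive, matrix-valued learning rate. All names and values below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 30
r = np.tanh(rng.normal(size=N))   # hidden activations at one timestep
w = rng.normal(size=N)            # linear readout weights
f_t = 0.5                         # target value
e = w @ r - f_t                   # scalar output error

# SGD on the instantaneous squared error: delta proportional to -e * r
sgd_delta = -0.05 * e * r

# RLS: the same error direction, premultiplied by the inverse-correlation
# matrix P, which plays the role of a per-direction adaptive learning rate
P = np.eye(N)                     # P starts as (1/alpha) * I; alpha = 1 here
Pr = P @ r
rls_delta = -e * Pr / (1.0 + r @ Pr)
```

With P = I the two updates are parallel; once P accumulates the activation statistics, the RLS direction tilts away from the raw gradient, which is why FORCE is not in general a gradient method.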

A.8 Improving performance using XLA
The performance of FORCE learning is largely dictated by the speed of the underlying linear algebra library used to compute the matrix operations at the core of RLS. An advantage of a TensorFlow-based framework is that the Accelerated Linear Algebra (XLA) optimizing compiler can be deployed to speed up the code with no source changes required (10). Fig. ?? illustrates the running time of training a NoFeedbackESN with FORCEModel, with and without XLA optimization. Experiments were performed on a 32-core AMD EPYC 7551P processor @ 2 GHz, using the same input and target as in Fig. ??. Networks were trained for 5 epochs and the average running time was recorded; error bars were obtained by running each experiment 5 times. The performance improvement afforded by XLA may vary with the specific hardware used; due to known compatibility issues, we were unable to perform this experiment on GPU.
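In TensorFlow, enabling XLA for a given computation requires only passing jit_compile=True to tf.function. The kernel below is a hypothetical RLS-style rank-one update, not tension's code, used only to show where the flag goes:

```python
import numpy as np
import tensorflow as tf

# Hypothetical RLS-style rank-one update of the inverse correlation matrix P,
# the kind of matrix arithmetic that dominates FORCE runtime. Decorating it
# with tf.function(jit_compile=True) asks TensorFlow to compile it with XLA.
@tf.function(jit_compile=True)
def rls_kernel(P, h):
    Ph = tf.linalg.matvec(P, h)
    denom = 1.0 + tf.tensordot(h, Ph, axes=1)
    return P - tf.tensordot(Ph, Ph, axes=0) / denom

N = 64
P = tf.eye(N)
h = tf.constant(np.random.default_rng(3).normal(size=N), dtype=tf.float32)
P_new = rls_kernel(P, h)
```

Removing jit_compile=True leaves behavior unchanged and falls back to the standard TensorFlow runtime, which is what makes the optimization a drop-in switch.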