The authors of this paper have a financial interest in the technology derived from the work presented in this paper. Patents include the following: US6889216, Physical neural network design incorporating nanotechnology; US6995649, Variable resistor apparatus formed utilizing nanotechnology; US7028017, Temporal summation device utilizing nanotechnology; US7107252, Pattern recognition utilizing a nanotechnology-based neural network; US7398259, Training of a physical neural network; US7392230, Physical neural network liquid state machine utilizing nanotechnology; US7409375, Plasticity-induced self organizing nanotechnology for the extraction of independent components from a data stream; US7412428, Application of hebbian and anti-hebbian learning to nanotechnology-based physical neural networks; US7420396, Universal logic gate utilizing nanotechnology; US7426501, Nanotechnology neural network methods and systems; US7502769, Fractal memory and computational methods and systems based on nanotechnology; US7599895, Methodology for the configuration and repair of unreliable switching elements; US7752151, Multilayer training in a physical neural network formed utilizing nanotechnology; US7827131, High density synapse chip using nanoparticles; US7930257, Hierarchical temporal memory utilizing nanotechnology; US8041653, Method and system for a hierarchical temporal memory utilizing a router hierarchy and hebbian and anti-hebbian learning; US8156057, Adaptive neural network utilizing nanotechnology-based components. Additional patents are pending. Authors of the paper are owners of the commercial companies performing this work. Companies include the following: Cover Letter; KnowmTech LLC, Intellectual Property Holding Company: Author Alex Nugent is a Co-owner; M. Alexander Nugent Consulting, Research and Development: Author Alex Nugent is owner and Tim Molter employee; Xeiam LLC, Technical Architecture: Authors Tim Molter and Alex Nugent are co-owners. 
Products resulting from the technology described in this paper are currently being developed. This does not alter the authors’ adherence to all the PLOS ONE policies on sharing data and materials. The authors agree to make freely available any materials and data described in this publication that may be reasonably requested for the purpose of academic, non-commercial research. As part of this, the authors have open-sourced all code and data used to generate the results of this paper under a “M. Alexander Nugent Consulting Research License”.
Conceived and designed the experiments: MAN TWM. Performed the experiments: MAN TWM. Analyzed the data: MAN TWM. Contributed reagents/materials/analysis tools: MAN TWM. Wrote the paper: MAN TWM.
Modern computing architecture based on the separation of memory and processing leads to a well-known problem called the von Neumann bottleneck, a restrictive limit on the data bandwidth between CPU and RAM. This paper introduces a new approach to computing we call AHaH computing where memory and processing are combined. The idea is based on the attractor dynamics of volatile dissipative electronics inspired by biological systems, presenting an attractive alternative architecture that is able to adapt, self-repair, and learn from interactions with the environment. We envision that both von Neumann and AHaH computing architectures will operate together on the same machine, but that the AHaH computing processor may reduce the power consumption and processing time for certain adaptive learning tasks by orders of magnitude. The paper begins by drawing a connection between the properties of volatility, thermodynamics, and Anti-Hebbian and Hebbian (AHaH) plasticity. We show how AHaH synaptic plasticity leads to attractor states that extract the independent components of applied data streams and how they form a computationally complete set of logic functions. After introducing a general memristive device model based on collections of metastable switches, we show how adaptive synaptic weights can be formed from differential pairs of incremental memristors. We also disclose how arrays of synaptic weights can be used to build a neural node circuit operating AHaH plasticity. By configuring the attractor states of the AHaH node in different ways, high level machine learning functions are demonstrated. These include unsupervised clustering, supervised and unsupervised classification, complex signal prediction, unsupervised robotic actuation and combinatorial optimization of procedures, all key capabilities of biological nervous systems and modern machine learning algorithms with real-world application.
How does nature compute? Attempting to answer this question naturally leads one to consider biological nervous systems, although examples of computation abound in other manifestations of life. Some examples include plants
A brain, like all living systems, is a far-from-equilibrium energy dissipating structure that constantly builds and repairs itself. We can shift the standard question from “how do brains compute?” or “what is the algorithm of the brain?” to a more fundamental question of “how do brains build and repair themselves as dissipative attractor-based structures?” Just as a ball will roll into a depression, an attractor-based system will fall into its attractor states. Perturbations (damage) will be fixed as the system reconverges to its attractor state. As an example, if we cut ourselves
Our goal is to lay a foundation for a new type of practical computing based on the configuration and repair of volatile switching elements. We traverse the large gap from volatile memristive devices to demonstrations of computational universality and machine learning. The reader should keep in mind that the subject matter in this paper is necessarily diverse, but is essentially an elaboration of these three points:
AHaH plasticity emerges from the interaction of volatile competing energy dissipating pathways.
AHaH plasticity leads to attractor states that can be used for universal computation and advanced machine learning.
Neural nodes operating AHaH plasticity can be constructed from simple memristive circuits.
Through constant dissipation of free energy, living systems continuously repair their seemingly fragile state. A byproduct of this condition is that living systems are intrinsically adaptive at all scales, from cells to ecosystems. This presents a difficult challenge when we attempt to simulate such large scale adaptive networks with modern von Neumann computing architectures. Each adaptation event must necessarily reduce to memory–processor communication as the state variables are modified. The energy consumed in shuttling information back and forth grows in line with the number of state variables that must be continuously modified. For large scale adaptive systems like the brain, the inefficiencies become so large as to make simulations impractical.
As an example, consider that IBM’s recent cat-scale cortical simulation of 1 billion neurons and 10 trillion synapses
At the core of the adaptive power problem is the energy wasted during memory–processor communication. The ultimate solution to the problem entails finding ways to let memory configure itself, and AHaH computing is one such method.
Consider two switches, one non-volatile and the other volatile. Furthermore, consider what it takes to change the state of each of these switches, which is the most fundamental act of adaptation or reconfiguration. Abstractly, a switch can be represented as a potential energy well with two or more minima.
In the non-volatile case, sufficient energy must be applied to overcome the barrier potential. Energy must be dissipated in proportion to the barrier height once a switching event takes place. Not only the switch but also the electrode leading to the switch must be raised to the switch barrier energy. As the number of adaptive variables increases, the power required to sustain the switching events scales as the total distance needed to communicate the switching events and the square of the voltage.
A volatile switch on the other hand cannot be read without damaging its state. Each read operation lowers the switch barriers and increases the probability of random state transitions. Accumulated damage to the state must be actively repaired. In the absence of repair, the act of reading the state is alone sufficient to induce state transitions. The distance that must be traversed between memory and processing of an adaptation event goes to zero as the system becomes intrinsically adaptive. The act of accessing the memory
In the non-volatile case some process external to the switch (i.e. an algorithm on a CPU) must provide the energy needed to effect the state transition. In the volatile case an external process must
Not only does it make physical sense to build large-scale adaptive systems from volatile components, but there is also no evidence to suggest that the contrary is possible. A brain is a volatile dissipative out-of-equilibrium structure. It is therefore reasonable that a volatile solution to machine learning at low power and high densities exists. The goal of AHaH computing is to find and exploit this solution.
In 1936, Turing, best known for his pioneering work in computation and his seminal paper ‘On computable numbers’
In 1944, physicist Schrödinger published the book
In 1949, only one year after Turing wrote ‘Intelligent machinery’, synaptic plasticity was proposed as a mechanism for learning and memory by Hebb
In 1953, Barlow discovered neurons in the frog brain fired in response to specific visual stimuli
In 1960, Widrow and Hoff developed ADALINE, a physical device that used electrochemical plating of carbon rods to emulate the synaptic elements that they called
In 1969, the initial excitement with perceptrons was tempered by the work of Minsky and Papert, who analyzed some of the properties of perceptrons and illustrated how they could not compute the XOR function using only local neurons
In 1971, Chua postulated on the basis of symmetry arguments the existence of a missing fourth two terminal circuit element called a memristor (
VLSI pioneer Mead published with Conway the landmark text
Bienenstock, Cooper and Munro published a theory of synaptic modification in 1982
At roughly the same time, the theory of support vector maximization emerged from earlier work on statistical learning theory from Vapnik and Chervonenkis and has become a generally accepted solution to the generalization versus memorization problem in classifiers
In 2004, Nugent et al. showed how the AHaH plasticity rule is derived via the minimization of a kurtosis objective function and used as the basis of self-organized fault tolerance in support vector machine network classifiers. Thus, the connection that margin maximization coincides with independent component analysis and neural plasticity was demonstrated
In 2008, HP Laboratories announced the production of Chua’s postulated electronic device, the memristor
Turing spent the last two years of his life working on mathematical biology and published a paper titled ‘The chemical basis of morphogenesis’ in 1952
Answering this question in a physical sense leads one straight into the controversial 4th law of thermodynamics. The 4th law is attempting to answer a simple question with profound consequences if a solution is found: If the 2nd law says everything tends towards disorder, why does essentially everything we see in the Universe contradict this? At almost every scale of the Universe we see self-organized structures, from black holes to stars, planets and suns to our own earth, the life that abounds on it and in particular the brain. Non-biological systems such as Benard convection cells
One line of argument is that ordered structures create entropy faster than disordered structures do and self-organizing dissipative systems are the result of
One particularly clear and falsifiable formulation of the 4th law comes from Swenson in 1989:
“A system will select the path or assembly of paths out of available paths that minimizes the potential or maximizes the entropy at the fastest rate given the constraints
Others have converged on similar thoughts. For example, Bejan postulated in 1996 that:
“For a finite-size system to persist in time (to live), it must evolve in such a way that it provides easier access to the imposed currents that flow through it
Bejan’s formulation seems intuitively correct when one looks at nature, although it has faced criticism that it is too vague since it does not say what particle is flowing. We observe that in many cases the particle is either directly a carrier of free energy dissipation or else it gates access, like a key to a lock, to free energy dissipation of the units in the collective. These particles are not hard to spot. Examples include water in plants, ATP in cells, blood in bodies, neurotrophins in brains, and money in economies.
More recently, Jorgensen and Svirezhev have put forward the
Hatsopoulos and Keenan’s
“When an isolated system performs a process, after the removal of a series of internal constraints, it will always reach a unique state of equilibrium; this state of equilibrium is independent of the order in which the constraints are removed.”
The idea is that a system erases any knowledge about how it arrived in equilibrium. Schneider and Sagan state this observation in their book
We may reformulate this idea in the light of an adaptive container, as shown in
A) A first replenished pressurized container
Now we ask how the container adapts as the system attempts to come to equilibrium. If it is the
The gradient
The sudden pressurization of
We now map this thermodynamic process to anti-Hebbian and Hebbian (AHaH) plasticity and show that the resulting attractor states support universal algorithms and broad machine learning functions. We furthermore show how AHaH plasticity can be implemented via physically adaptive memristive circuitry.
The thermodynamic process outlined above can be understood more broadly as: (1) particles spread out along all available pathways through the environment and in doing so erode any differentials that favor one branch over the other, and (2) pathways that lead to dissipation (the flow of the particles) are stabilized. Let us first identify a synaptic weight,
We can now see that the synaptic weight possesses state information. If
Anti-Hebbian (erase the path): Any modification to the synaptic weight that reduces the probability that the synaptic state will remain the same upon subsequent measurement.
Hebbian (select the path): Any modification to the synaptic weight that increases the probability that the synaptic state will remain the same upon subsequent measurement.
Our use of Hebbian learning follows a standard mathematical generalization of Hebb’s famous postulate:
“When an axon of cell A is near enough to excite B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased
Hebbian learning can be represented mathematically as
To begin our mapping of AHaH plasticity to computing and machine learning systems we use a standard linear neuron model. The choice of a linear neuron is motivated by the fact that they are ubiquitous in machine learning and also because it is easy to achieve the linear sum function in a physical circuit, since currents naturally sum.
The inputs
The weights and bias change according to AHaH plasticity, which we further detail in the sections that follow. The AHaH rule acts to
What we desire is a mechanism to extract the underlying building blocks or
An AHaH node is a hyperplane attempting to bisect its input space so as to make a binary decision. There are many hyperplanes to choose from and the question naturally arises as to which one is best. The generally agreed answer to this question is “the one that maximizes the separation (margin) of the two classes.” The idea of
Given a discrete set of inputs and a discrete set of outputs it is possible to account for all possible transfer functions via a logic function. Logic is usually taught as small two-input gates such as NAND and OR. However, when one looks at a more complicated algorithm such as a machine learning classifier, it is not so clear that it is performing a logic function. As demonstrated in the following sections, AHaH attractor states are computationally complete logic functions. For example, when robotic arm actuation or prediction is demonstrated, self-configuring logic functions are also being demonstrated.
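To make the counting argument concrete, the following minimal sketch enumerates every truth table over a small set of binary inputs. It is only an illustration of the statement that a logic function can account for any discrete input-output mapping; the function name `enumerate_logic_functions` is hypothetical and does not come from the authors' code base.

```python
from itertools import product


def enumerate_logic_functions(n_inputs=2):
    """Return every Boolean function of n_inputs bits as a truth table (dict)."""
    # All 2**n_inputs possible binary input patterns.
    patterns = list(product((0, 1), repeat=n_inputs))
    functions = []
    # Each assignment of one output bit per pattern is a distinct logic function.
    for outputs in product((0, 1), repeat=len(patterns)):
        functions.append(dict(zip(patterns, outputs)))
    return functions


functions = enumerate_logic_functions(2)
nand = {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 0}

print(len(functions))     # number of distinct two-input truth tables
print(nand in functions)  # NAND appears among them
```

For two inputs there are 2^(2^2) = 16 such tables, and the count grows doubly exponentially with the number of inputs, which is why a classifier over many inputs is not usually recognized as a logic function even though it is one.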
In what follows we will be adopting a
Let us analyze the simplest possible AHaH node: one with only two inputs. The three possible input patterns are:
Stable synaptic states will occur when the sum over all weight updates is zero. We can plot the AHaH node’s stable decision boundary on the same plot with the data that produced it. This can be seen in
The AHaH rule naturally forms decision boundaries that maximize the margin between data distributions (black blobs). This is easily visualized in two dimensions, but it is equally valid for any number of inputs. Attractor states are represented by decision boundaries A, B, C (green dotted lines) and D (red dashed line). Each state has a corresponding anti-state:
We refer to the A state as the null state. The null state occurs when an AHaH node assigns the same weight value to each synapse and outputs the same state for every pattern. The null state is mostly useless computationally, and its occupation is inhibited by bias weights. Through strong anti-Hebbian learning, the bias weights force each neuron to split the output space equally. As the neuron
Recall Turing’s idea of a network of NAND gates connected by
By connecting the output of AHaH nodes (circles) to the input of static NAND gates, one may create a universal reconfigurable logic gate by configuring the AHaH node attractor states (
We can achieve all logic functions directly (without NAND gates) if we define a
Logic Pattern | Spike Logic Pattern |
(0, 0) | (1, |
(0, 1) | (1, |
(1, 0) | ( |
(1, 1) | ( |
Every AHaH attractor consists of a state/anti-state pair that can be configured and therefore appears to represent a
Although we discuss a
A differential pair of memristors is used to form a synaptic weight, allowing for both a sign and magnitude. The bar on the memristor is used to indicate polarity and corresponds to the lower potential end when driving the memristor into a higher conductance state.
The circuits capable of achieving AHaH plasticity can be broadly categorized by the electrode configuration that forms the differential synapses as well as how the input activation (current) is converted to a feedback voltage that drives unsupervised anti-Hebbian learning
The functional objective of the AHaH circuit shown in
The circuit produces an analog voltage signal on the output at node y given a spike pattern on its inputs labeled
During the read phase, driving voltage sources
During the write phase, driving voltage source F is set to either
A more intuitive explanation of the above feedback cycle is that “the winning pathway is rewarded by not getting decayed.” Each synapse can be thought of as two competing energy dissipating pathways (positive or negative evaluations) that are building structure (differential conductance). We may apply reinforcing Hebbian feedback by (1) allowing the winning pathway to dissipate more energy or (2) forcing the decay of the losing pathway. If we choose method (1) then we must at some future time ensure that we decay the conductance before device saturation is reached. If we choose method (2) then we achieve both decay and reinforcement at the same time.
Without significant demonstrations of utility there is little motivation to pursue a new form of computing. Our functional model abstraction is necessary to reduce the computational overhead associated with simulating circuits and enable large scale simulations that tackle benchmark problems with real world utility. In this section, we derive the AHaH plasticity rule again, but instead of basing it on statistical independent components as in the derivation of
During the read phase, simple circuit analysis shows that the voltage on the electrode labeled y in the circuit shown in
During the write phase the driving voltage source F is set according to either a supervisory signal or in the unsupervised case, the anti-signum of the previous read voltage:
We may adapt
Using
A) Voltages during read phase across spike input memristors. B) Voltages during write phase across spike input memristors. C) Voltages during read phase across bias memristors. D) Voltages during write phase across bias memristors.
Input Memristors (Read) | Input Memristors (Write) | Bias Memristors (Read) | Bias Memristors (Write) |
Accumulate | Decay | Decay | Accumulate |
The output voltage during the read phase reduces to:
By absorbing
Model A is an approximation that is derived by making simplifying assumptions that include linearization of the update and non-saturation of the memristors. However, when a weight reaches saturation,
To account for the growing effect of anti-Hebbian forces we can make a modification to the bias weight update, and we call the resulting form functional
The purpose of a functional model is to capture equivalent function with minimal computational overhead so that we may pursue large scale application development on existing technology without incurring the computational cost of circuit simulations. We justify the use of Model B because simulations prove it is a close functional match to the circuit, and it is computationally less expensive than Model A. However, it can be expected that better functional forms exist. Henceforth, any reference to the
Finally, in cases where supervision is desired, the sign of the Hebbian feedback may be modulated by an external supervisory signal,
Compare
Note that AHaH computing is not constrained to just one particular memristive device; any memristive device can be used as long as it meets the following criteria: (1) it is incremental and (2) its state change is voltage dependent. In order to simulate the proposed AHaH node circuit shown in
In our proposed semi-empirical model, the total current through the device comes from both a memory-dependent current component,
The Schottky component,
The memory component of our model,
An MSS possesses two states, A and B, separated by a potential energy barrier as shown in
An MSS is an idealized two-state element that switches probabilistically between its two states as a function of applied voltage bias and temperature. The probability that the MSS will transition from the B state to the A state is given by
We model a memristor as a collection of
At each time step some subpopulation of the MSSs in the A state will transition to the B state, while some subpopulation in the B state will transition to the A state. The probability that
We model the change in conductance of a memristor as a probabilistic process where the number of switches that transition between A and B states is picked from a normal distribution with a center at
The update to the memristor conductance is given by the contribution from two random variables picked from two normal distributions:
The final update to the conductance of the memristor is then given by:
Reducing the number of MSSs in the model will reduce the averaging effects and cause the memristor to behave in a more stochastic way. As the number of MSSs becomes small, the normal approximation to the binomial distribution breaks down. However, our desired operating regime of many metastable switches, and hence incremental behavior, is within the acceptable bounds of the approximation.
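The normal approximation discussed above can be sketched as follows. This is a hedged illustration, not the authors' implementation: the transition probabilities `p_a_to_b` and `p_b_to_a` are passed in as plain numbers (in the paper they depend on voltage and temperature), and the function names are hypothetical. The mean `n * p` and standard deviation `sqrt(n * p * (1 - p))` are the standard moments of a binomial distribution.

```python
import math
import random


def sample_transitions(n_switches, p_transition, rng):
    """Draw how many switches flip, using a normal approximation
    to Binomial(n_switches, p_transition)."""
    mean = n_switches * p_transition
    std = math.sqrt(n_switches * p_transition * (1.0 - p_transition))
    draw = round(rng.gauss(mean, std))
    # Clamp so the count stays physically meaningful.
    return max(0, min(n_switches, draw))


def step(n_total, n_in_b, p_a_to_b, p_b_to_a, rng):
    """Advance a population of two-state switches by one time step
    and return the new number of switches in the B state."""
    n_in_a = n_total - n_in_b
    a_to_b = sample_transitions(n_in_a, p_a_to_b, rng)
    b_to_a = sample_transitions(n_in_b, p_b_to_a, rng)
    return n_in_b + a_to_b - b_to_a


# Hypothetical parameters, chosen only to exercise the code.
rng = random.Random(42)
n_b = 0
for _ in range(5):
    n_b = step(1000, n_b, 0.05, 0.02, rng)
print(n_b)
```

With only a handful of switches the clamping and rounding dominate and the draw no longer resembles a binomial sample, which is the breakdown the text refers to.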
All experiments are software based, and they involve the simulation of AHaH nodes in various configurations to perform various adaptive learning tasks. The source code for the experiments is written in the Java programming language and can be obtained from a Git repository linked to from Xeiam LLC’s main web page at
There are two distinct models used for the simulation experiments: functional and circuit. The simulations based on the functional model use functional Model B as described above. The simulations based on the circuit model use ideal electrical circuit components and the generalized model for memristive devices. Nonideal behaviors such as parasitic impedances are not included in the circuit simulation experiments. We want to emphasize that at this stage we are attempting to cross the considerable divide between memristive electronics and general machine learning by defining a theoretical methodology for computing with dissipative attractor states. By focusing on nonideal circuit behavior at this stage we risk obfuscating what is otherwise a theory with minimal complexity.
By adjusting the free variables in the generalized memristive device model and comparing the resulting current-voltage hysteresis loops to I–V data from four real-world memristive devices, matching model parameters were determined as shown in
Device | ||||||||||
Ag-chalc | 0.32 | 8.7 | 0.91 | 0.17 | 0.22 | 1 | – | – | – | – |
AIST | 0.15 | 40 | 10 | 0.23 | 0.25 | 1 | – | – | – | – |
GST | 0.42 | 0.12 | 1.2 | 0.9 | 0.6 | 0.7 | 5×10^{−3} | 3.0 | 5×10^{−3} | 3.0 |
WO | 0.80 | 0.025 | 0.004 | 0.8 | 1.0 | 0.55 | 1×10^{−9} | 8.5 | 22×10^{−9} | 6.2 |
A) Solid line represents the model simulated at 100 Hz and dots represent the measurements from a physical Ag-chalcogenide device from Boise State University. Physical and predicted device current resulted from driving a sinusoidal voltage of 0.25 V amplitude at 100 Hz across the device. B) Simulation of two series-connected arbitrary devices with differing model parameter values. C) Simulated response to pulse trains of {10
When it comes time to manufacture AHaH node circuitry, an ideal memristor will be chosen taking into consideration many properties. It is likely that some types of memristors will be better candidates, some will not be suitable at all, and that the best device has yet to be fabricated. Based on our current understanding, the ideal device would have low thresholds of adaptation (<0.2 V), on-state resistance of ∼100 kΩ or greater, high dynamic range, durability, the capability of incremental operation with very short pulse widths and long retention times of a week or more. However, even devices that deviate considerably from these parameters will be useful in more specific applications. As an example, short retention times on the order of seconds are perfectly compatible with combinatorial optimizers.
Circuit simulations were carried out by solving for the voltage at node y in each AHaH node (
All machine learning applications built from AHaH nodes have one thing in common: the AHaH nodes take a spike pattern as input. A spike pattern is a set of integers that specifies which synapses in the AHaH node are coactive. In terms of a circuit, this is a description of what physical input lines are being driven by the driving voltage (
A simple example makes spike encoding for an AHaH node clear. Suppose a dataset is available where the colors of a person’s clothes are associated with the sex of the person. The entire dataset consists of several colors
In the case of real-value numbers, a simple recursive method for producing a spike encoding can also conveniently be realized through strictly anti-Hebbian learning via a binary decision tree with AHaH nodes at each tree node. Starting from the root node and proceeding to the leaf node, the input
If we then assign a unique integer to each node in the decision tree, the path that was taken from the root to the leaf becomes the spike encoding. This process is an adaptive analog to digital conversion. The source code used to generate this spike encoding is in
We demonstrate that the functional and circuit implementations of the AHaH node are equivalent and function correctly in order to establish a link between our benchmark results and the physical circuit. The source code for these experiments can be found in
In the derivation of the functional model, the assumption was made that the quantity
A two input AHaH node will receive three possible spike patterns
As stated earlier, the attractor states A, B, and C can be viewed as logic functions. It was earlier demonstrated how NAND gates can be used to make these attractor states computationally complete. It was also described how a spike encoding consisting of two input lines per channel can be used to achieve completeness directly with AHaH attractor states. To investigate this, 5000 AHaH nodes were initialized with random weights with zero mean. Each AHaH node was driven with 1000 spikes randomly selected from the set
To demonstrate that the attractor states and hence logic functions are stable over time, the above experiment can be repeated. However, the number of time steps can be significantly increased and the logic state of each AHaH node can be recorded at each time step. For this experiment, 100 AHaH nodes were randomly initialized, and their logic functions were tested over 50,000 time steps. The source code for this experiment is in
Clustering is a method of knowledge discovery which automatically tries to find hidden structure in data in an unsupervised manner
An AHaH node converges to attractor states that cleanly partition its input space by maximizing the margin between opposing data distributions. The set of AHaH attractor states is furthermore computationally complete. These two properties enable a sufficiently large collective of AHaH nodes to assign unique labels to unique input data distributions while maintaining a high level of tolerance to noise. If a collective of AHaH nodes is allowed to randomly fall into attractor states, the binary output vector is a label for the input feature. For example, a four node collective with outputs (0,0,0,1) would encode the output ‘0001’ and, if converted to base-10 integers, be assigned the cluster ID ‘1’. The collective node output (1,1,1,1) would encode the output ‘1111’ and be assigned the cluster ID ‘15’. Such a collective is called an AHaH clusterer.
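The labeling step described above, reading the collective's binary outputs as a base-2 number, can be sketched in a few lines. The function name `cluster_id` is hypothetical, and the sketch assumes the first node output is the most significant bit, which matches the two examples in the text.

```python
def cluster_id(node_outputs):
    """Interpret a sequence of binary node outputs as a base-2 integer label.

    Assumes the first output is the most significant bit.
    """
    bits = "".join(str(int(bool(output))) for output in node_outputs)
    return int(bits, 2)


print(cluster_id((0, 0, 0, 1)))  # 1
print(cluster_id((1, 1, 1, 1)))  # 15
```

A collective of four nodes can therefore emit at most 16 distinct cluster IDs, so the number of nodes bounds the number of clusters that can be told apart.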
The total number of possible output labels from the AHaH collective is
For example, given 64 spike patterns and 16 AHaH nodes, the probability of the collective assigning the same label is 3%. By increasing
We developed a quantitative metric to characterize the performance of our AHaH clusterer. Given a unique spike pattern
Divergence and convergence may be combined to form a composite measure we call
Perfect clustering extraction will occur with a vergence value of 1. The code used to encapsulate the vergence measurement can be found in
To investigate the AHaH clusterer’s performance as measured by our vergence metric, we swept the following parameters individually while holding the others constant: learning rate (
The number of inputs to the AHaH nodes making up the AHaH clusterer was 256. Synthetic spike patterns were created with a random spike pattern generator. Given a spike pattern length, the number of inputs available on the AHaH nodes and the number of unique spike patterns, a set of spike patterns was generated. Noise is generated by taking random input lines and activating them, or, if the input line is already active, deactivating it. The number of patterns that can be distinguished by the AHaH clusterer before vergence falls is a function of the input pattern sparsity, number of total patterns and the pattern noise. Both functional-based and circuit-based AHaH clusterers were investigated and showed good correspondence.
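A minimal sketch of the synthetic data generation described above follows. It is not the authors' generator: the function names are hypothetical, a spike pattern is represented as a set of active input-line indices, and noise lines are assumed to be drawn without replacement so that each noise bit toggles a distinct line.

```python
import random


def random_spike_pattern(n_inputs, pattern_length, rng):
    """Pick pattern_length distinct active input lines out of n_inputs."""
    return set(rng.sample(range(n_inputs), pattern_length))


def add_noise(pattern, n_inputs, n_noise_bits, rng):
    """Toggle n_noise_bits randomly chosen input lines:
    inactive lines become active, active lines become inactive."""
    noisy = set(pattern)
    for line in rng.sample(range(n_inputs), n_noise_bits):
        noisy ^= {line}  # symmetric difference toggles membership
    return noisy


# Example values only; 256 inputs matches the experiment described above.
rng = random.Random(7)
clean = random_spike_pattern(n_inputs=256, pattern_length=16, rng=rng)
noisy = add_noise(clean, n_inputs=256, n_noise_bits=3, rng=rng)
print(sorted(clean))
print(sorted(noisy))
```

Because each noise bit flips a different line, the noisy pattern differs from the clean one in exactly `n_noise_bits` positions, which makes the noise level easy to sweep as an experimental parameter.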
While the vergence experiments provide a quantitative measure of the characteristics of the AHaH clusterer, we also designed a program to qualitatively visualize the clustering capabilities. The basic idea is to create several spatial clusters in two-dimensional space and let the clusterer automatically determine the boundaries between clusters in an unsupervised manner. We used a
Linear classification is a useful tool used in the field of machine learning to characterize and apply labels to samples from datasets. State of the art approaches to classification include algorithms such as decision trees, random forests, support vector machines (SVM) and naïve Bayes
The AHaH classifier consists of one or more AHaH nodes, each node assigned to a classification label and each operating the supervised form of the AHaH rule of
To compare the AHaH classifier to other state of the art classification algorithms, we chose four popular classifier benchmark data sets: the Breast Cancer Wisconsin (Original), Census Income, MNIST Handwritten Digits, and the Reuters-21578 data sets. The source code for these classification experiments is found in
We scored the classifiers’ performance using standard classification metrics: precision, recall, F1, and accuracy. Information on these metrics and how they are used is widely available. The standard training and test sets were used for learning and testing respectively. More information about these benchmark datasets is widely available, and a large number of classification algorithms have been benchmarked against them, including SVM, naïve Bayes, and decision trees.
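For reference, the four metrics can be computed from the confusion counts of a single binary label as sketched below. The function name `binary_metrics` is hypothetical, and the sketch covers one label only; how scores are aggregated across multiple labels is not specified here.

```python
def binary_metrics(y_true, y_pred):
    """Precision, recall, F1 and accuracy for one binary label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


scores = binary_metrics(y_true=[1, 1, 1, 0, 0, 0, 1, 0],
                        y_pred=[1, 1, 0, 0, 0, 1, 1, 0])
print(scores)  # all four metrics equal 0.75 for this toy example
```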
To further validate an AHaH classifier implemented with circuit AHaH nodes against functional AHaH nodes, we use the Breast Cancer Wisconsin (Original) benchmark dataset. This dataset is relatively small allowing the circuit level simulations to complete quickly. Each sample is either labeled
Continuous valued inputs were converted using the adaptive decision tree method of
The AHaH classifier is capable of unsupervised learning by evoking
To demonstrate this unsupervised learning capability we used the Reuters-21578 dataset. The entire training and test sets were combined and the classifier was given the first 25% of the inputs in a supervised manner. For the remaining 75% of the news articles, the classifier was run in an unsupervised manner. Only when the confidence was 1.0, which indicates high certainty of a correct answer, did the classifier use its own classification as a supervised training signal. The F1 score was recorded after each story for the following most frequent labels:
Complex signal prediction involves using the prior history of a signal or group of signals to predict the future state of the signal. Signal prediction, also known as signal forecasting, is used in adaptive filters, resource planning and action selection. Some real world examples include production estimating, retail inventory planning, inflation prediction, insurance risk assessment, and weather forecasting. Current prediction algorithms include principal component analysis and regression and Kalman filtering
By posing signal prediction as a multi-label classification problem, complex signals can be learned and predicted using the AHaH classifier. As a simplified proof of concept exercise to demonstrate this, a complex temporal signal prediction experiment was designed. For each moment of time, the real-valued signal
This feature set is then used to make predictions of the current feature
Motor control is the process by which sensory information about the world and the current state of the body is used to execute actions to generate movement. Stabilizing Hebbian feedback applied to an AHaH node can occur any time after the Anti-Hebbian read, which opens the interesting possibility of using AHaH nodes for reinforcement-based learning. Here we show that a small collective of AHaH nodes, an AHaH motor controller, can be used in autonomous robotic control. As a proof-of-concept experiment we use an AHaH motor controller to guide a multi-jointed robotic arm to a target based on a value signal or cost function.
A virtual environment in which an AHaH motor controller controls the angles of
The robotic arm challenge involves a multi-jointed robotic arm that moves to capture a target. Each joint on the arm has 360 degrees of rotation, and the base joint is anchored to the floor. Using only a value signal relating the distance from the head to the target and an AHaH motor controller taking as input sensory stimuli in a closed-loop configuration, the robotic arm autonomously learns to capture stationary and moving targets. New targets are dropped within the arm’s reach radius after each capture, and the number of discrete angular joint actuations required for each catch is recorded to assess capture efficiency.
Sensors measure the relative joint angles of each segment of the robot arm as well as the distance from the target ball to each of two “eyes” located on the side of the arm’s “head”. Sensor measurements are converted into a sparse spiking representation using the method of
Opposing “muscles” actuate each joint. Each muscle is formed of many “fibers” and a single AHaH node controls each fiber. The number of discrete angular steps that move each joint,
Given a movement we can say if a fiber (AHaH node) acted for or against it. We can further determine if the movement was good or bad by observing the change in the value signal. If, at a later time, the value increased after a movement, then each fiber responsible for the movement receives rewarding Hebbian feedback. Likewise, if the fiber acted in support of a movement and later the value signal dropped, then the fiber is denied a Hebbian update. As the duration of time between movement and reward increases, so does the difficulty of the problem, since many movements can be taken during the interval. A reinforcement scheme can be implemented in a number of ways over a number of timescales, and schemes may even be combined. For example, we may integrate over a number of time scales to determine if the value increased or decreased.
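The credit-assignment logic above can be sketched as follows. This is an illustrative simplification, not the controller used in our experiments: the function name is hypothetical, each fiber's action is reduced to a vote of +1 or -1, the movement direction is assumed to be the sign of the summed votes, and fibers that acted against the movement are assumed to receive no feedback.

```python
def hebbian_feedback_mask(fiber_votes, value_before, value_after):
    """Return one boolean per fiber: True if the fiber should receive
    rewarding Hebbian feedback for the movement it helped produce."""
    net = sum(fiber_votes)
    if net == 0:
        return [False] * len(fiber_votes)  # no movement, nothing to reward
    direction = 1 if net > 0 else -1
    improved = value_after > value_before  # did the value signal rise?
    # Reward only fibers that acted in support of a movement that paid off.
    return [improved and vote == direction for vote in fiber_votes]


# Two fibers pull one way, one pulls the other; the value rose afterwards.
print(hebbian_feedback_mask([1, 1, -1], value_before=0.2, value_after=0.5))
# [True, True, False]

# Same movement, but the value dropped: the Hebbian update is withheld.
print(hebbian_feedback_mask([1, 1, -1], value_before=0.5, value_after=0.2))
# [False, False, False]
```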
Experimental observation led to constant values of
We measured the robotic arm’s efficiency in catching targets by summing the total number of discrete angular joint actuations from the time the target was placed until capture. As a control, the same challenge was carried out using a simple random actuator. The challenge was carried out for both AHaH-controlled and random-controlled robotic arm actuation for different robotic arm lengths ranging from 3 to 21 joints in increments of three. The total joint actuation is the average amount of discrete joint actuation over the 100 captured targets. The source code for this experiment is available in
An AHaH node will descend into a probabilistic output state if the Hebbian feedback is withheld. As the magnitude of the synaptic weight falls closer to zero, the chance that state transitions will occur rises from 0% to 50%. This property can be exploited in probabilistic search and optimization tasks. Consider a combinatorial optimization task such as the traveling salesman problem where the city-to-city path is encoded as a binary vector
This can be accomplished by utilizing an AHaH node with a single input as a node within a virtual routing tree. As a route progresses from the trunk to a leaf, each AHaH node is evaluated for its state and receives the anti-Hebbian update. Should the route result in a solution that is better than the average solution, all nodes along the routing path receive a Hebbian update. By repeating the procedure over and over again, a positive feedback loop is created such that more optimal routes result in higher route probabilities that, in turn, result in more optimal routes. The net effect is a collapse of the route probabilities from the trunk to the leaves as a path is locked in. The process is intuitively similar to the formation of a lightning strike searching for a path to ground and as such we call it a
To evaluate the AHaH combinatorial optimizer, we used the functional model (
The experiment consists of 200 strike searches, where
The AHaH rule reconstructions for the functional and circuit forms of the AHaH node are shown in
Each data point represents the change in a synaptic weight as a function of AHaH node activation, y. Blue data points correspond to input synapses and red data points to bias inputs. There is good congruence between the A) functional and B) circuit implementations of the AHaH rule.
As part of our functional model derivation (
Multiple AHaH nodes receive spike patterns from the set
The 2-input AHaH node receiving 500 consecutive inputs randomly chosen from the set
The AHaH rule naturally forms decision boundaries that maximize the margin between data distributions. Weight space plots show the initial weight coordinate (green circle), the final weight coordinate (red circle) and the path between (blue line). Evolution of weights from a random normal initialization to attractor basins can be clearly seen for both the functional model (A) and circuit model (B).
After being initialized with random synaptic weights, the occupation of logic states of AHaH nodes receiving the spike logic patterns of
A) Logic state occupation frequency after 5000 time steps for both functional model and circuit model. All logic functions can be attained directly from attractor states except for XOR functions, which can be attained via multi-stage circuits. B) The logic functions are stable over time for both functional model and circuit model, indicating stable attractor dynamics.
SP⇓, LF⇒ | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
( | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
( | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
(1, | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 |
(1, | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 |
Logic functions 0 and 15 represent the null state and their occupation is inhibited through the action of the bias. By increasing the number of bias inputs from 1 to 3 we can collapse the stable attractor logic states down to 3, 5, 10 and 12. These functions represent the pure independent component states and act to pass or invert each of the two input channels. Although these states are not computationally complete, they can be made so via the use of NAND gates as we discussed in the theory section. The advantage of using states 3, 5, 10 and 12 is that they are very stable. The disadvantage is that we must now rely on external circuitry (i.e. NAND gates) to achieve computational universality.
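The numbering of the logic functions, and the special status of the XOR functions, can be illustrated with a short sketch. It rests on two assumptions that are ours: the four spike patterns are mapped onto the ordinary binary inputs (0,0), (0,1), (1,0), (1,1) in table-row order, and a plain threshold unit with small integer weights and a bias stands in for a single node. Under those assumptions the truth table of logic function k is simply the four-bit binary expansion of k, and a brute-force search shows which functions a single threshold unit can realize.

```python
from itertools import product

# Assumed row order of the spike patterns, mapped to ordinary binary inputs.
PATTERNS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def truth_table(lf):
    """Outputs of logic function `lf` for the four patterns (top row first)."""
    return tuple((lf >> (3 - row)) & 1 for row in range(4))


def is_linearly_separable(lf, weights=range(-2, 3)):
    """Brute-force search for a threshold unit w1*x1 + w2*x2 + b > 0
    that reproduces the truth table of `lf`."""
    target = truth_table(lf)
    for w1, w2, b in product(weights, repeat=3):
        outputs = tuple(int(w1 * x1 + w2 * x2 + b > 0) for x1, x2 in PATTERNS)
        if outputs == target:
            return True
    return False


print(truth_table(8))  # (1, 0, 0, 0), matching column 8 of the table
print([lf for lf in range(16) if not is_linearly_separable(lf)])  # [6, 9]
```

Functions 6 and 9 are XOR and XNOR, the only two-input functions that no single threshold unit can produce, which is why a multi-stage circuit is needed for them.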
The AHaH clusterer parameter sweep experiment results are summarized in
| Learning Rate | Number of AHaH Nodes | Number of Noise Bits | Spike Pattern Length | Number of Spike Patterns |
Default Value | 0.0005 | 20 | 3 | 16 | 16 |
Range | 0.0002–0.0012 | > 7 | ≤ 7 | ≤ 36 | ≤ 28 |
The results shown in
Functional (A) and circuit (B) simulation results of an AHaH clusterer formed of twenty AHaH nodes. Spike patterns were encoded over 16 active input lines from a total spike space of 256. The number of noise bits was swept from 1 (6.25%) to 10 (62.5%) while the vergence was measured. The performance is a function of the total number of spike patterns. Blue = 16 (100% load), Orange = 20 (125% load), Purple = 24 (150% load), Green = 32 (200% load), Red = 64 (400% load).
The AHaH clusterer performs well across a wide range of different 2D spatial cluster types, all without predefining the number of clusters or the expected cluster types. A) Gaussian B) non-Gaussian C) random Gaussian size and placement.
The results show that the AHaH clusterer is able to handle a spectrum of cluster types. We demonstrate the ability to detect Gaussian and non-Gaussian clusters, clusters of non-equal size, as well as non-stationary clusters. Whereas other methods have intrinsic failure modes for certain types of clusters, our method can apparently handle a wide range of cluster types. Although more work must be done to fully compare our methods to existing clustering methods, our results thus far indicate that our method offers a genuinely new clustering mechanism with a number of distinct advantages. The most significant advantage is that we can implement the AHaH clusterer in physically adaptive AHaH circuits. In other words, clustering can now become an adaptive hardware resource.
Our AHaH classifier benchmark scores for the Breast Cancer Wisconsin (Original), Census Income, MNIST Handwritten Digits, and the Reuters-21578 data sets are shown in
Breast Cancer Wisconsin (Original) | | Census Income | | MNIST Handwritten Digits | | Reuters-21578 | |
AHaH | .997 | AHaH | .853 | AHaH | .98–.99 | AHaH | .92 |
RS-SVM | 1.0 | NBTree | .86 | deep convex net | .992 | SVM | .864 |
SVM | .972 | naïve-Bayes | .84 | large conv. net | .991 | C4.5 | .794 |
C4.5 | .9474 | C4.5 | .858 | polynomial SVM | .986 | naïve-Bayes | .72 |
In comparing our MNIST results with other methods, it is important to account for data preprocessing and artificial inflation of the training data set through transformations of training samples. We do not inflate the training set; our results are achievable with only one online training epoch. Both training and testing complete on a standard desktop computer processor in a few minutes to less than an hour, depending on the resolution of the spike encoding. The current state of the art achieves a recognition rate of 99.65% and “took a few days” to train on a desktop computer with GPU acceleration
The Reuters-21578, Census Income and Breast Cancer datasets cover a range of data types from strings to integers to continuous real-valued signals. The Census Income dataset furthermore contains mixed data types as well as exemplars with missing attributes. In all cases the AHaH classifier combined with the simple spike encoder of
A) Reuters-21578. Using the top ten most frequent labels associated with the news articles in the Reuters-21578 data set, the AHaH classifier’s accuracy, precision, recall, and F1 scores were determined as a function of its confidence threshold. As the confidence threshold increases, the precision increases while recall drops. An optimal confidence threshold can be chosen depending on the desired results and can be dynamically changed. The peak F1 score is 0.92. B) Census Income. The peak F1 score is 0.853. C) Breast Cancer. The peak F1 score is 0.997. D) Breast Cancer repeated but using the circuit model rather than the functional model. The peak F1 score and the shape of the curves are similar to the functional model results. E) MNIST. The peak F1 score is 0.98–0.99, depending on the resolution of the spike encoding. F) The individual F1 classification scores of the handwritten digits.
Using the confidence threshold as a guide, the AHaH classifier can also be used in a semi-supervised mode. Starting in supervised mode and learning over a range of training data, the classifier can then switch to unsupervised mode. In unsupervised mode we may activate Hebbian learning if the confidence exceeds a value. Results are shown in
For the first 30% of samples from the Reuters-21578 data set, the AHaH classifier was operated in supervised mode followed by operation in unsupervised mode for the remaining samples. A confidence threshold of 1.0 was set for unsupervised application of a learn signal. The F1 scores for the top ten most frequently occurring labels in the Reuters-21578 data set were tracked. These results show that the AHaH classifier is capable of continuously improving its performance without supervised feedback.
Results to date indicate that the AHaH classifier is an efficient incremental optimal linear classifier. The AHaH classifier displays a range of desirable classifier characteristics hinting that it may be an ideal general classifier capable of handling a wide range of classification applications. The classifier can learn online in a feed-forward manner. This is important for large data sets and applications that require constant adaptation such as prediction, anomaly detection and motor control. The classifier can associate an unlimited number of labels to a pattern, where the addition of a label is simply the addition of another AHaH node. By allowing the classifier to process unlabeled data it can improve over time. This has practical implications in any situation where substantial quantities of unlabeled data exist. Through the use of spike encoders, the classifier can handle mixed data types such as discrete or continuous numbers and strings. The classifier tolerates missing values, noise, and irrelevant attributes and is computationally efficient. The most significant advantage, however, is that the circuit can be mapped to physically adaptive hardware. Optimal incremental classification can now become a hardware resource.
The results of the temporal signal prediction experiment are shown in
By posing prediction as a multi-label classification problem, the AHaH classifier can learn complex temporal waveforms and make extended predictions via recursion. Here, the temporal signal (dots) is a summation of five sinusoidal signals with randomly chosen amplitudes, periods, and phases. The classifier is trained for 10,000 time steps (last 100 steps shown, dotted line) and then tested for 300 time steps (solid line).
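A test signal of the kind described in this caption can be generated as sketched below. The sampling ranges for amplitude, period and phase are illustrative assumptions, as is the function name `make_signal`; only the structure (a sum of five sinusoids with random amplitudes, periods and phases) comes from the experiment.

```python
import math
import random


def make_signal(num_components=5, seed=0):
    """Build a signal that is the sum of randomly parameterized sinusoids."""
    rng = random.Random(seed)
    components = [
        (rng.uniform(0.5, 1.5),          # amplitude (assumed range)
         rng.uniform(20.0, 200.0),       # period in time steps (assumed range)
         rng.uniform(0.0, 2 * math.pi))  # phase in radians
        for _ in range(num_components)
    ]

    def signal(t):
        return sum(a * math.sin(2 * math.pi * t / period + phase)
                   for a, period, phase in components)

    return signal, components


signal, components = make_signal()
train = [signal(t) for t in range(10_000)]         # training segment
test = [signal(t) for t in range(10_000, 10_300)]  # held-out segment
print(len(train), len(test))  # 10000 300
```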
While this temporal signal prediction demonstration is not by any means an exhaustive comparison of AHaH signal prediction to other forecasting algorithms, it demonstrates the utility and flexibility of the AHaH classifier and provides the first glimpse of using AHaH nodes in the large application space of signal forecasting. These results also shed light on how AHaH node supervisory signals could be generated in a completely self-organizing system with zero human intervention. Time is the supervisor and prediction is the Hebbian reward. From the practical perspective, prediction provides the ability to prepare or optimize for the future. It also provides the ability to detect when a system is changing. If a prediction fails to meet with reality, an anomaly has occurred.
The results of the motorized robotic arm experiment are shown in
The average total joint actuation required for the robot arm to capture the target remains constant as the number of arm joints increases for actuation using the AHaH motor controller. For random actuation, the required actuation grows exponentially.
Our results show that populations of independent AHaH nodes can effectively control multiple degrees of freedom so as to ascend (or descend) a value function. This process is spontaneous and results from the emergent behavior of many AHaH nodes acting as
The results of the traveling salesman problem experiment are shown in
By using single-input AHaH nodes as nodes in a routing tree to perform a strike search, combinatorial optimization problems such as the traveling salesman problem can be solved. Adjusting the learning rate controls the speed and quality of the solution. A) The distance between the 64 cities versus the convergence time for the AHaH-based and random-based strike search. B) Lower learning rates lead to better solutions. C) Higher learning rates decrease convergence time.
A strike evolves in time as bits are sequentially locked in via the positive feedback selection mechanism after a period of evidence accumulation. The lower the learning rate, the more evidence is accumulated before a path is locked in. In this way, a strike search appears to be a relatively generic method to accelerate the search for a procedure. Using the traveling salesman problem as an example, we could just as easily encode the strike path as a relative procedure for re-ordering a list of cities rather than an absolute ordering. For example, we could swap the cities at indices A and B, then swap the cities at indices C and D, and so on. Furthermore, we could utilize the strike procedure in a recursive manner. In the case of the traveling salesman problem we could assign lower-level strikes to find optimal sub-paths and higher-order strikes to assemble larger paths from the sub-paths. Most generally, if (1) a problem can be represented as a bit configuration and (2) the configuration can be assigned a value in an efficient manner, then a strike can be used as an adaptive learning hardware resource for optimization tasks. The ability to change the convergence times allows dynamic choices to be made in the time available.
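The two conditions above (a bit-configuration encoding and an efficient value function) are enough to sketch the control flow of a strike search. The code below is a heavily simplified stand-in and not the AHaH rule: each routing-tree node is reduced to a single weight read through Gaussian noise, the anti-Hebbian read is replaced by a decay toward zero, the Hebbian update by a fixed step in the direction taken, and "better than the average solution" by a running average. All names and parameter values are hypothetical, and the toy value function (the number of ones in the bit string) replaces the traveling salesman encoding.

```python
import random


def strike_search(num_bits, value_fn, num_strikes=500,
                  learning_rate=0.2, decay=0.02, seed=0):
    rng = random.Random(seed)
    weights = {}            # one weight per routing-tree node, keyed by path prefix
    running_average = None  # baseline a strike must beat to earn feedback
    best_bits, best_value = None, float("-inf")

    for _strike in range(num_strikes):
        prefix = ()
        visited = []
        for _depth in range(num_bits):
            weight = weights.get(prefix, 0.0)
            # Noisy read: a weight near zero gives a near 50/50 output,
            # a large weight gives a nearly deterministic output.
            bit = 1 if weight + rng.gauss(0.0, 1.0) > 0 else 0
            # Stand-in for the anti-Hebbian read: pull the weight toward zero.
            weights[prefix] = weight * (1.0 - decay)
            visited.append((prefix, bit))
            prefix = prefix + (bit,)

        value = value_fn(prefix)
        if value > best_value:
            best_bits, best_value = prefix, value

        # Stand-in for the Hebbian update: reinforce every node on a path
        # that beat the running average.
        if running_average is not None and value > running_average:
            for node, bit in visited:
                weights[node] += learning_rate * (1 if bit else -1)
        if running_average is None:
            running_average = value
        else:
            running_average = 0.9 * running_average + 0.1 * value

    return best_bits, best_value


bits, value = strike_search(num_bits=8, value_fn=sum)
print(bits, value)
```

In this sketch a smaller `learning_rate` means more strikes are needed before the weights dominate the read noise, which mirrors the trade-off between evidence accumulation and convergence time discussed above.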
Both static and dynamic power consumption pathways must be considered when calculating the energy dissipation of neuromorphic chips containing AHaH circuit architecture. The static power component is dominated by the current flowing through the AHaH node synapse arrays during the read and write phases. The dynamic power component is dominated by the charging and discharging of the capacitive components of the circuitry. This capacitance includes parasitics from circuit elements and interconnect wires. Industry best practices can optimize dynamic power consumption. Here we focus on an estimation of static power consumption. Note that because dynamic power consumption is not included in this estimation, these values represent only a lower bound on the synaptic power consumption of a neuromorphic chip. Dynamic power consumption, which is heavily dependent on chip design and architecture, may make a significant contribution to total power. Recall that one of the major motivations of AHaH computing is the elimination of the von Neumann bottleneck for machine learning applications. Considering static and dynamic power consumption together with the elimination of this bottleneck, the net gain in power efficiency compared to modern digital electronics will most likely increase.
Static power dissipation of a single AHaH node is equal to
Condition | Maximum Power | ||
Path A Selected | |||
Path B Selected | |||
No Feedback |
From
In all applications, the spike encoding plays an important role in reducing the number of spikes and hence the power consumption.
Application | Coactive Spikes | Spike Space | Sparsity | AHaH Node Count |
Breast Cancer | 31 | 70 | 0.44 | 2 |
Census Income | 63 | ∼1800 | ∼0.035 | 2 |
MNIST | ∼1000 | ∼27,500 | ∼0.036 | 10 |
Reuters-21578 | ∼100 | ∼46,000 | ∼0.002 | 119 |
Robotic Arm | 92 | 341 | 0.27 | 345 |
Comb. Opt. | 1 | 1 | n/a | ∼600,000 |
Clusterer | 16 | 256 | 0.0625 | 20 |
Prediction | 300 | 9600 | 0.031 | 32 |
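The sparsity column is the ratio of coactive spikes to spike space, as the short check below illustrates for the rows with exact (non-approximate) entries. The function name `sparsity` is hypothetical.

```python
def sparsity(coactive_spikes: int, spike_space: int) -> float:
    """Fraction of the spike space that is active at the same time."""
    return coactive_spikes / spike_space


rows = {
    "Breast Cancer": (31, 70),
    "Robotic Arm": (92, 341),
    "Clusterer": (16, 256),
    "Prediction": (300, 9600),
}
for name, (spikes, space) in rows.items():
    print(f"{name}: {sparsity(spikes, space):.4f}")
# Breast Cancer: 0.4429
# Robotic Arm: 0.2698
# Clusterer: 0.0625
# Prediction: 0.0312
```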
We have attempted to connect a low-level general statistical model of collections of metastable switches with dissipative attractor-based computation and machine learning in a physically realizable circuit. Our aim is to provide a road map for others to follow so that we may all explore and exploit this interesting and potentially useful form of computing. Our ultimate goal is to provide a physical adaptive learning hardware resource (the AHaH circuit) in much the same way as modern RAM provides a memory resource to computing systems. However, only when we have investigated the circuit and functional models and have demonstrated real world utility is it necessary to move toward simulation of nonideal circuit effects, such as parasitic impedances, signal delays, settling times and variations in memristor properties. These details are certainly required for the eventual construction of a neural processing unit (NPU), but to include them in this paper would obfuscate our core message that “a new type of computing is possible that appears to offer a solution of general machine learning”.
Our demonstrations of utility include results across the field of machine learning, from clustering and classification to prediction, control and combinatorial optimization. Given the intended broad scope of this paper it was not possible to elaborate on some of our results, to compare them with many other methods, or to discuss their implications. For this reason we have open-sourced all code used to generate the results of this paper. We encourage the reader to investigate our methods carefully and come to their own conclusions.
Although it was important to develop specific techniques to address the broad capabilities we have demonstrated, we wish to convey the idea that the AHaH node is a building block from which many higher-order adaptive algorithms may be built, including many we have not yet conceived of. As an example consider our results with the AHaH motor controller and AHaH classifier. By using the classifier’s confidence estimation as the value function for the AHaH motor controller, which in turn controls the viewing position, angle and rotation of an “eye”, it should be possible to spontaneously control the gaze of a vision system to find and center previously trained objects. Alternately, by pairing the AHaH signal prediction with the AHaH combinatorial optimizer, it should be possible to learn to predict a reward signal while simultaneously optimizing actions to attain reward. We can infer from our results that other capabilities are possible. Anomaly detection, for example, is the inverse of prediction. If a prediction can be made about a temporally dynamic signal, then an anomaly signal can be generated should predictions fail to match with reality. Tracking of non-stationary statistics is also a natural by-product of the attractor nature of the AHaH rule, and was briefly touched upon in the 2D clustering videos,
We have introduced the concept of AHaH computing. We have shown how the simple process of particles dissipating into containers through adaptive channels competing for conduction resources leads to AHaH plasticity. We have shown that memristive devices can arise from metastable switches, how differential synaptic weights may be built of two or more memristors, and how an AHaH node may be built of arrays of synapses. A simple read and write cycle driving an AHaH circuit results in physical devices implementing AHaH plasticity. We have demonstrated that the attractor states of the AHaH rule can configure computationally complete logic functions, and have shown their use in supervised and unsupervised classification, clustering, complex signal prediction, unsupervised robotic arm actuation and combinatorial optimization. We have demonstrated unsupervised clustering and supervised classification in circuit simulations, and have further shown a correspondence between our functional and circuit forms of the AHaH node.
The AHaH node may offer us a building block for a new type of computing with likely application in the field of machine learning. Indeed, we hope that our work demonstrates that functions needed to enable perception (clustering, classification), planning (combinatorial optimization, prediction), control (robotic actuation) and generic computation (universal logic) are possible with a simple circuit that, technologically speaking, may be very close at hand.
The Breast Cancer Wisconsin, Reuters-21578 Distribution 1.0, and Census Income classification benchmark datasets were obtained from the UCI Machine Learning Repository
The authors would like to thank Kristy A. Campbell from Boise State University for graciously providing us with memristor device data.
Special thanks to Air Force Research Laboratory’s (AFRL) Information Directorate.
Alex Nugent would like to personally thank Hillary Riggs, Kermit Lopez, Luis Ortiz, and Todd Hylton for their support over the years. This work would definitely not have existed without them.
This manuscript has been approved for public release; distribution unlimited. Case Number: 88ABW-2014-0103.