Computing with networks of nonlinear mechanical oscillators

As it is getting increasingly difficult to achieve gains in the density and power efficiency of microelectronic computing devices because of lithographic techniques reaching fundamental physical limits, new approaches are required to maximize the benefits of distributed sensors, micro-robots or smart materials. Biologically-inspired devices, such as artificial neural networks, can process information with a high level of parallelism to efficiently solve difficult problems, even when implemented using conventional microelectronic technologies. We describe a mechanical device, which operates in a manner similar to artificial neural networks, to solve efficiently two difficult benchmark problems (computing the parity of a bit stream, and classifying spoken words). The device consists in a network of masses coupled by linear springs and attached to a substrate by non-linear springs, thus forming a network of anharmonic oscillators. As the masses can directly couple to forces applied on the device, this approach combines sensing and computing functions in a single power-efficient device with compact dimensions.


Introduction
Massively-parallel networks of simple units with elementary non-linear processing capabilities, such as artificial neural networks, have been used for years as efficient and robust computing systems. In general, a network of N elements is described by the state column vector xðtÞ 2 R N . The state vector evolves with time t as a stimulus u(t) is imposed on the network, according to the rich dynamics created by the interconnection of the elements in the network. In a particularly simple network architecture called "reservoir computing" [1], an output function is formed using a weight vector w 2 R N . As the network is fed a training signal u Ã (t), the weights are adjusted to minimize an error function between y(t) and a target function y Ã (t) = F [u Ã (t)]. F is a complicated transformation of the input signal, which is used to represent the computing capabilities of the network. During this training phase, the internal structure of the network is left untouched, and only the output weights are adjusted. Such "reservoir computers" can approximate the transformation F correctly when their dynamics obey the echo state property [2], in which case the states xðtÞ do not depend on the stimuli u(t − τ) for a time τ that is sufficiently long. A reservoir computer with the echo state property will exhibit useful computing capabilities when the weights obtained during the training phase can be used to form an output y(t) that is close to F[u(t)], for a new stimulus u(t) similar enough to u Ã (t). It is actually observed in numerical experiments that reservoir computers perform well when the networks driven by the stimulus u(t) operate as systems with complex dynamics [3]. As such, reservoir computers are an efficient approach to exploit architectures such as recurrent neural networks, which are Turing equivalent [4], but which are difficult to train using conventional methods. In practice, reservoir computers have been shown to be accurate and resource-efficient solutions to a number of challenging problems (e.g. ref. [1,5,6]).
A useful characteristic of reservoir computers is their fixed internal structure. The network is generally constructed randomly with a few deterministic constraints such as limits on the spectral radius of the network connection matrix [7], for instance. Following the network construction, different weight vectors can be computed to enable the network to perform different tasks. This fixed structure is especially attractive from the point of view of hardware implementation, as it does not require interconnections with dynamically adjustable strengths between the network elements, or other forms of network adaptability. A number of hardware implementations of reservoir computers have been discussed, including analog electronics [8], selfassembled atomic switches [9], optoelectronic devices [10] and photonic devices [11].
This motivates the search for hardware elements that can be arranged in a network to form a dynamical system able to respond to an external stimulus, and to exhibit the echo state property. We are especially interested in dynamical systems which can be stimulated directly by physical forces, such as accelerations or mechanical pressure. In this communication, we verify numerically the hypothesis that non-linear (anharmonic) mechanical oscillators coupled by linear springs can perform non-trivial computations. This opens up the possibility for miniature, energy-efficient computers. As the mechanical elements are sensitive to physical forces, it also blurs the boundary between sensors and computers, with great opportunities in control and signal processing, for instance.
The objective of this study is to demonstrate that a network of non-linear mechanical oscillators can perform complex computations within the framework of reservoir computing. We choose a specific form for the network, which is described in details below, and show that a single instance of such a network can efficiently solve two widely different computing problems. This establishes the usability of networks of mechanical oscillators as general-purpose computing devices, which are able to efficiently process information from complex physical stimuli.
The following system is considered (see Fig 1 for a schematic description): where dots denote derivatives with respect to time, i = 1, . . ., N and obvious modifications are made to the last term for i = 1 (::: þ o 2 1 ½À x 1 ðtÞ þ x 2 ðtÞ) and i = N (::: þ o 2 1 ½x NÀ 1 ðtÞ À x N ðtÞ). Eq (2) describes a chain of nominally identical Duffing oscillators, except for the anharmonic term of strength β i which can vary in different oscillators. Oscillators with β i = 0 would be standard harmonic oscillators with resonance angular frequency o 0 and quality factor Q. The strength of the coupling between neighboring oscillators is parameterized by ω 1 .
The oscillators are driven by a harmonic term cos(Ot) at angular frequency O, with a mean amplitude A modulated by the input signal u(t), scaled by the parameter Δ i . When the drive amplitudes are large, the system can exhibit very complex dynamics, including extreme sensitivity to initial conditions (chaos). At lower drive amplitudes, the dynamics can still be complex but are no longer chaotic. Because of damping, the system exhibits the echo state property when its dynamics have a single attractor. The existence of a single attractor (for a given class of input signals u) is verified numerically by obtaining a stable, high success probability when the oscillator network is trained for a particular task.
We present in section 1 an example of an instance of a network described by Eq (2) (with N = 400), which performs well on two significantly different benchmark computing tasks, namely the computation of non-trivial digital functions requiring memory (the parity function test), and the recognition of spoken digits. This demonstrates that a given network (with fixed structure) can process information in widely different and complex ways. This might be especially relevant technologically for space-and power-constraint applications, where it is beneficial to collocate a device sensing and processing functions.
We consider as an example of a concrete implementation a network of thin doubly clamped silicon beams of length l operated in their out-of-plane mode. Such a network could be fabricated using conventional MEMS technologies. The number density of oscillators would be approximately where R w is the width-to-length ratio of the beams. The resonance frequency of such oscillators is [12] where R t is the thickness-to-length ratio of the beam. As non-linear effects must be present in the oscillators for non-trivial computing capabilities to emerge, their oscillation amplitudes should be sufficiently large. The minimum amplitude for the onset of non-linear effects in a double clamped silicon beam is approximated by [12] x nl ¼ 20 nm Q 100 The energy in one oscillator is m eff x 2 nl o 2 0 =2, for m eff the effective mass of the beam, so the mechanical power required to drive one oscillator in the reservoir is For comparison, reference [13] describes a state-of-the-art neuromorphic computing device implemented in the 28-nm CMOS technology with 10 6 computing elements (artificial neurons). This device achieves a neuron number density of 2 × 10 3 mm −2 (or 7 × 10 4 mm −2 when the peripheral circuitry not directly implementing the neurons is not considered). Its power consumption per neuron is on the order of 90 nW. As another example, reference [14] presents another neuromorphic chip (130-nm CMOS) with an artificial neuron number density of 3 × 10 3 mm −2 and power consumption per neuron of 4 nW. The mechanical portion of the proposed reservoir computer with silicon beams could thus be smaller (by one order or magnitude) or more energy efficient (by one or two orders of magnitude) than a state-of-the-art microelectronic devices with the same number of computing element. The density and power estimates of the oscillator network do not include the oscillator motion sensing, summation and amplification, which are expected to be implemented in efficient analog electronics, possibly using advanced packaging schemes such a heterogeneous integration.
Reservoir computers made of a network of coupled anharmonic mechanical oscillators therefore appear as interesting candidates for miniature, power-efficient devices that can be driven directly by physical signals (external fields, inertial or pressure forces, etc.), potentially allowing the creation of sensors with complex computing capabilities. Numerical simulations demonstrating the computing capabilities of a mechanical oscillator network are presented in section 1, while the robustness of this computing model with respect to possible variability in the hardware implementation are discussed in section 2.

Computing with a network of anharmonic oscillators
The main result of this section is a numerical demonstration that a single instance of a network of coupled anharmonic oscillators, as described by Eq (2), can perform well on different computing benchmarks. All numerical results are obtained with the same oscillator network, excepted where explicitly mentioned for robustness evaluations. It should be emphasized that the particular parameters and structure of the network presented below were only selected to demonstrate the usefulness of networks of oscillators as computers; the optimization of these parameters and structure to achieve better performances on specific tasks will be the subject of future communications.
The network consists of a long chain of N = 400 oscillators, each with fundamental angular frequency ω 0 = 1.3 and quality factor Q = 60. The oscillators are randomly assigned a strong (β = 1) non-linearity with a probability of 25%, and are otherwise assigned a weak (β = 0.005) non-linearity (the same random allocation is used in all numerical experiments). Neighboring oscillators in the chain are coupled by a linear spring of strength ω 1 = 1.5. In addition, the strength Δ i of the coupling between the signal u(t) and the oscillators is randomly set to a fixed value Δ Ã (benchmark problem-dependent) for 50% of the oscillators, and is zero otherwise. The amplitude A driving uniformly all the oscillators depends on the particular benchmark problem (see below).
Eq (2) is integrated numerically using a Runge-Kutta method. In order to extract the envelopes of the rapidly-oscillating signals x i (t), these are multiplied by cos(Ot), decimated by a factor of 10, and passed through a seventh order low-pass Butterworth filter. The envelope signals so-obtained, labeled χ i (t), contain only the low-frequency amplitude variations in the oscillator positions. They are used to form the output signal where the weights w i are computed from the data accumulated during the training period. During the training period, a signal u(t) is applied to the network, producing the data matrix X, with [X] ij = χ i (t j ) for discrete times t j , j = 1. . .M. The N-by-N matrix X X T is inverted (with Tikhonov regularization) to obtain the weights from the vector of the target function evaluated at the same time steps. It should be noted that the matrix X X T can be computed in real time by accumulating N(N + 1)/2 values at each time step. The training period is then followed by a measurement period, where the weights are fixed, the signal u(t) continues to be applied to the network, and the network's ability to reproduce the target function correctly is measured.
The parity function is considered as a first benchmark, as it requires both memory and non-trivial non-linear computational capabilities [15]. For this task, u(t) is a binary signal that can randomly switch between the two states −1 and +1 whenever t is an integer multiple of a period T. The n th -order parity of u is defined as and requires data between t − nT and t − T to be continuously computed. Numerical experiments were performed with T = 65 and O = 1.14, so that the input signal u(t) was switching at most every 9.07 cycles of the cos(Ot) drive. The amplitude parameters were set to A = 0.8 and Δ Ã = 0.7. In order to increase the robustness of the estimation of the parity functions, the weights were computed on ten contiguous, equal-length sub-sections of the full oscillator chain during a training phase of duration 359T. Each sub-section had its own set of weights and was sufficiently long (40 oscillators) to produce a good estimate of the parity function for most inputs. The ten estimates from the ten sub-sections were averaged, the sum was integrated over each period T, and the sign of the integral was used to decide between a value of -1 or of +1 for the parity function estimated by the network .  Fig 2 shows a numerical example obtained for parity functions of order 3, 4 and 5. Estimated parity values of +1 or -1 indicate that all ten sub-sections of the full chain, which are mostly equivalent to shorter independent chains with N = 40, were all able to obtain the correct value. On the other hand, estimated parity values around 0 (as observed more frequently for P 5 ) indicate that the equivalent shorter chains were unable to agree on a valid parity value. As in Fig 2, it is observed in larger scale experiments that the accuracy of the network decreases with increasing order of the parity function, with a proportion of correctly estimated parity values of 100% for P 3 , (93.48 ± 0.0031)% for P 4 and (68.78 ± 0.0058)% for P 5 . The training process appears to be relatively robust, with the 10 th percentile of the proportion of correctly estimated parity values for repeated training runs (which differ only in the randomly generated training data) estimated at 86.2% and 60.4% for P 4 and P 5 , respectively.
A convenient way to compare the capacitites of the oscillator networks to other results in the reservoir computing literature is to estimate their so-called memory capacity, as introduced in reference [3]. As bits are distributed equally between −1 and +1 in both the input and output of the parity benchmark, the mutual information [3] is estimated using for p τ the success probability of the delayed 3 rd order parity function, defined as The memory capacity is then given by with the sum truncated at τ = 4 because larger delays have negligible mutual information. The mutual information for the delayed P 3 function is 1 bit for a delay τ 2T, and then drops rapidly to 0.44 bit for τ = 3T and less than 0.04 bit for τ ! 4T. The resulting memory capacity is approximately 3.5 bits, similar to performance levels for large networks published in the reservoir computing literature (e.g. reference [3] presents a reservoir with 250 neurons with a memory capacity of 4.8 bits). As another benchmark, we consider the classification of recorded time series for the spoken words "zero" to "nine". This is a conventional benchmark for non-trivial classification tasks (e.g. ref. [16]), when the NIST TI-46 data set [17] is used. We use from this data set utterances of the words from 7 different female speakers. In this task, the driving signal u(t) is formed by concatenating time series of the utterances, padded with periods of silence (duration 70). The mean of each time series is removed, the time series is normalized by its standard deviation, and its absolute value is used for u(t). The time series are provided in TI-46 at a sampling rate of 12.5 kHz. In order to adapt the time series to the dynamics of the simulated oscillators, we stretch every second of time series data to 97.2 time units in the numerical simulation, which is the equivalent of having oscillators driven at a frequency of 808 Hz (O = 1). As a result, the network mostly uses the low frequency content of the sound recordings to classify the utterances. The amplitude parameters were set to A = 2 and Δ Ã = 6.
The training was performed on 800 utterances chosen randomly between the ten digits. One set of weights was computed for each digit. As for the parity benchmark, the chain of N = 400 oscillators was split into sub-sections to improve robustness, this time into 19 sub-sections of length 40, each overlapping the next by 20 oscillators. Each sub-section had its own set of weights and produced a value that should be one if the digit corresponding to this set was spoken, and zero otherwise. The values produced from the weights were integrated over each period where an utterance was submitted to the network. As a result, for each utterance, the nineteen sub-sections for each of the ten digits produced a total of 190 numbers c ij , with i = 1, . . ., 10 and j = 1, . . ., 19. The classification from the network was then obtained using 10 × 9/2 pair comparisons in a one-vs.-one manner, according to where T ii 0 is a threshold number, D ii 0 is a vector of coefficients to linearly combine the components of the difference of vectors c i and c i 0 , which correspond to [c i ] j = c ij . The digit i or i 0 that was returned earned one vote for each comparison, and the digit with the largest number of votes was returned by the network for the utterance. The coefficient vectors were computed using Fisher's linear discriminant, according to where N i is the number of i digits in the training data, S i is the covariance matrix of vector c i , λ is a small regularization parameter, μ i is the average of the vector c i , and similarly for i 0 . Each threshold T ii 0 was adjusted to maximize the probability over the training set of Eq (12) to return the correct digit when i and i 0 occurred with equal probability. Fig 3 presents results for the words classification benchmark. The results correspond to the average performance of 25 different training runs, performed on the same network. For all training runs, the variability in success rate is consistent with uncertainties from the finite sample size, indicating that the training procedure is repeatable. The results are relatively good, with the network correctly classifying randomly chosen utterances in (0.802 ± 0.009)% of the trials. It can be seen in Fig 3 that classification errors are principally made between small groups of digits such as {1, 3, 9} and {4, 5}, indicating that it is harder for the network to discriminate between digits within these groups than between these and other digits.
The results can be compared to other experiments in the reservoir computing literature. While success rates above 99% have been reported (for five speakers instead of seven as in this work) for networks of 200 nodes (e.g. ref. [10,18]), these experiments all make use of elaborated pre-processing schemes (specifically, the Lyon cochlear ear model [19]). It has been shown in ref. [20] that pre-processing greatly affects the efficiency of spoken words classification, with some pre-processors (e.g. using Mel-frequency cepstrum coefficients) performing well under 70% success rates for networks of 200 nodes. The results presented here with the oscillator network do not include any pre-processing, except for the rectification of the input time series to form the input signal u(t), and the normalization of the time series amplitudes. This is intended to reflect a simple system where the sound pressure directly modulates the driving force on the oscillators (e.g. by displacing a membrane). More complicated systems, for instance with modulation amplitudes for different oscillator groups that depend on the sound frequency, could in principle be significantly more efficient.

Discussion
Fig 4 shows variations in the success probability for the parity functions, as the global parameters A (amplitude of oscillator drive) and T (period of input binary signal) are varied. Each network is trained and operated independently as described in section 1. It is observed that the region of global parameter space where the performance of the network is good is reasonably large, indicating that the precise matching of the network to a given signal (especially with respect to T) is not required.
Similarly, Fig 5 shows the reduction in the success probability for the parity functions, when the parameters of the oscillators in the network are not all identical, but are rather taken to fluctuate randomly, for instance to simulate the effect of manufacturing tolerances, according to where λ i can be any of the parameters in the set {A, Q, β, ω 0 , ω 1 , Δ} for the i th oscillator, σ is the relative fluctuation level, and z is a standard normal random variable (zero mean and unit variance). Each perturbed network is trained and operated independently as described in section 1. While the performances of the network do depend on the nominal value of the parameters set for the oscillators, these data demonstrate for the parity function that the network is quite robust when the parameters of individual oscillators are independently varied around these On the other hand, Fig 6 shows how the success probability for the parity functions is reduced when perturbations described by Eq (14) are introduced just after the training of the network has been completed. This situation corresponds to oscillator parameters that are drifting over time. It can be observed that the success probability is almost independent of variations in the quality factor (Q), for relative variations as high as 10%. For the non-linear parameter (β) and the coupling strength (Δ), it is not reduced significantly for relative variations up to 1%, but drops rapidly for large variations. Finally, the success probability seems to degrade continuously with the magnitude of the relative variations for the harmonic drive amplitude (A), the coupling strength (ω 1 ) and the oscillator natural frequency (ω 0 ). These observations are compatible with the hypothesis that computational capabilities depend Variations in the success probability (P) for the three parity functions (blue: P 3 , green: P 4 and red: P 5 ) as the relative variation σ is increased, for perturbations introduced after the training of the network is performed. Error bars are computed at the 95% confidence level. https://doi.org/10.1371/journal.pone.0178663.g006 Computing with networks of nonlinear mechanical oscillators strongly on the network being operated at a precise point with respect to its dynamics (as determined mostly by A, ω 1 and ω 0 ), presumably where the motion of the oscillators is complex, but not chaotic. They also indicate that in actual physical devices, requirements on the stabilization of most parameters should be relatively mild (*1%), except for the parameters A, ω 1 and ω 0 which will have to be stable at the 0.1% level.

Conclusion
Non-linear mechanical oscillators arranged in a network using linear couplings can be used to perform complex computations, as demonstrated in this communication by numerical simulations of a single instance of a network that performs well on two widely different benchmark tasks (computation of parity functions, and classification of spoken words). The computational capacities of this network are preserved when the global network parameters are tuned to different values in a relatively large parameter space, when large (over 10%) random variations are introduced in the oscillator parameters before the network training phase, and when significant variations (0.1% to 1%, depending on the parameter) in the oscillator parameters are introduced after the training phase has been completed with a network of oscillators having nominal parameter values.
These results show the existence of a new class of computing devices based on networks of coupled non-linear mechanical oscillators. We have described one example of such device which, as mentioned above, performs robustly on two difficult computing tasks. It is remarkable that this device has a low complexity (400 oscillators) relative to modern microelectronic components. As explained in the introduction, this leads to the possibility of creating small and energy-efficient devices that are highly relevant technologically.
The device that was simulated in this work was constructed randomly from a small set of rules that are described in section 1. These rules were only minimally tuned in order to obtain good performances on the benchmark tasks. It was observed, for instance, that smaller networks did not perform as well on the spoken words benchmark. It is a general characteristic of the reservoir computing approach that the details of the dynamical systems that are used for computing are not directly relevant. However, it is likely that devices that differ from the one studied in this work might perform better, on a broader set of computing tasks. Studies related to the optimization of parameters of mechanical oscillator networks, including the number of oscillators, their linear and non-linear characteristics, as well as the way they are interconnected, for instance, will be the subject of future communications.
As the mechanical oscillators can be directly coupled to forces produced by the environment of the network (accelerations, sound pressure, etc.), devices having both sensing and computing capabilities can be envisioned. The devices, fabricated using MEMS technologies, for instance, could be very compact and energy-efficient, and compete with state-of-the-art sensors and microelectronic devices for distributed sensing or robot control.