First-order tactile neurons have spatially complex receptive fields. Here we use machine-learning tools to show that such complexity arises for a wide range of training sets and network architectures. Moreover, we demonstrate that this complexity benefits network performance, especially on more difficult tasks and in the presence of noise. Our work suggests that spatially complex receptive fields are normatively good given the biological constraints of the tactile periphery.
Citation: Zhao CW, Daley MJ, Pruszynski JA (2018) Neural network models of the tactile system develop first-order units with spatially complex receptive fields. PLoS ONE 13(6): e0199196. https://doi.org/10.1371/journal.pone.0199196
Editor: Maurice J. Chacron, McGill University Department of Physiology, CANADA
Received: April 10, 2018; Accepted: June 1, 2018; Published: June 14, 2018
Copyright: © 2018 Zhao et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant information is within the paper.
Funding: This work was supported by the Canadian Institutes of Health Research (Foundation Grant to JAP: 3531979) and the Natural Science and Engineering Research Council of Canada (Discovery Grant to MJD). JAP received a salary award from the Canada Research Chairs Program. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
First-order tactile neurons in the hairless skin of the human hand have distal axons that branch in the skin and form many transduction sites [1–3], yielding spatially complex receptive fields with many highly sensitive zones [4,5] (Fig 1A). We have recently shown that this arrangement permits first-order tactile neurons to signal high-level features of touched objects such as the orientation of a touched edge [4,6,7], a capacity previously considered a hallmark of processing in the somatosensory cortex [8–10]. Here we leverage machine learning tools to investigate why complex receptive fields arise and what computational benefits they yield. We show that complex receptive fields arise under a wide range of training sets and biologically realistic network constraints. We also show that complex receptive fields benefit network performance, especially on more complex discrimination tasks and in the presence of noise.
(A) Examples of receptive fields from human first-order tactile neurons terminating in the fingertip, acquired via microneurography. Color indicates the relative firing rate of the neuron when stimulated with a small punctate stimulus. For full details, see Pruszynski and Johansson (2014). (B) Graphic representation of a cross-section through the human glabrous skin. Note how a single afferent neuron branches and innervates multiple mechanoreceptive end organs. (C) Our four-layer feedforward neural network. The first layer models a small patch of skin, W(1) represents receptive fields, and the second layer models first-order neurons. Layers 3 and 4 are a functional abstraction of the central nervous system. The relative sizes of the layers are indicated but not drawn to scale. Arrows represent fully connected feedforward weights between successive layers. End organs and first-order neurons in (B) are colour-matched with the layers that represent them in the model. (D) Examples of training data used to represent tactile stimuli. Each stimulus is shown on a 28 × 28 step grid. Stimuli were passed through a Gaussian filter and randomly rotated and translated. Point stimuli were also randomly scaled.
We abstracted the tactile processing pathway with a four-layer feedforward neural network (Fig 1B and 1C). The input layer of our network consisted of 784 units (a 28 × 28 grid), representing mechanoreceptors distributed over a small patch of skin. In this arrangement, the weight matrix between the input and first hidden layers, which we call W(1), represents the receptive fields of first-order tactile neurons. Our network was trained on a range of stimuli including single points, multiple points, and Roman and Braille characters (Fig 1D). These stimuli were translated and rotated and were spatially filtered to crudely approximate skin mechanics. Importantly, we introduced three biologically inspired constraints. First, we applied non-negative regularization to W(1) to simulate the fact that first-order tactile neurons can only be excited when their transduction sites are stimulated. Second, we imposed convergence from the input to the first hidden layer to simulate the many-to-one convergence from mechanoreceptors in the skin to first-order tactile neurons traveling in the nerve [1–3]. Third, we used two distinct training phases, unsupervised and supervised, representing the encoding and interpreting aspects of the tactile processing pathway, respectively.
We first asked under what conditions, if any, our network learns spatially complex receptive fields. In our main analysis, the 784 units in the input layer converged to 81 units in the first hidden layer (roughly ten inputs per unit), approximating the fact that first-order tactile neurons innervate on the order of ten mechanoreceptors [1–3]. We reasoned that the complexity of the training set would influence the complexity of the receptive fields. We tested this idea with four training sets: single Gaussian points, mixed one and two Gaussian points, Roman letters, and a mixed set that included one and two Gaussian points, Roman letters, and Braille characters in equal proportions (see Methods). These training sets represent different degrees of structural complexity and consist of stimuli that have been used in tactile studies in both humans and animal models [13–17], but they were not meant to represent the natural statistics of tactile stimuli, which are unknown.
We trained our network on each of these training sets in an unsupervised fashion and examined the resulting receptive fields (i.e. the W(1) matrix). All networks, even those trained with the simplest training set, exhibited receptive fields with multiple areas of high sensitivity (Fig 2A). Overall, there was a clear effect of training set on receptive field complexity (F(3,76) = 1642, P<0.01) where the number of highly sensitive zones increased with the complexity of the training set (Fig 2B). A similar effect was evident when analyzing receptive fields in the spatial frequency domain, with more complex training sets yielding higher spatial frequency content.
(A) Examples of receptive fields learned by the 81- and 36-hidden-unit models after training on different training sets (rows). Each receptive field is shown on a 28 × 28 step grid. Heat maps show areas with high weight values, which represent highly sensitive zones. Samples were chosen to show a variety of receptive field morphologies. The number in the bottom left corner of each receptive field is the number of peaks returned by our peak counting algorithm, which measures receptive field complexity. (B) The average complexity of each network under different architectures and training sets. Each data point is the mean peak count of receptive fields from one iteration of that model, with grey violin plots showing the overall frequency distribution across the 20 iterations we performed for each architecture and training set.
We next asked how the degree of convergence between the input and first hidden layers influenced receptive fields, that is, how physical constraints on the number of first-order tactile neuron axons traveling within the peripheral nerve should affect connectivity to mechanoreceptors in the skin. We reasoned that increasing convergence would increase receptive field complexity, since a smaller set of units must still encode the same set of inputs. We tested this idea by decreasing the size of the first hidden layer from 81 to 36 units, closer to the lower limit of biologically relevant convergence [1–3], and training the network on the same four training sets described above. Increasing convergence did result in more complex receptive fields for the alphabet and mixed networks (Fig 2B). On average, the 36-unit alphabet network had 3.0 more peaks than the 81-unit alphabet network (t(38) = 46.39, P<0.01), and the 36-unit mixed network had 4.0 more peaks than the 81-unit mixed network (t(38) = 56.93, P<0.01). Interestingly, however, the one point and the one and two point networks (our simplest training sets) did not show increased complexity with increased convergence (Fig 2B). In fact, the 36-unit one point network had 0.3 fewer peaks than the 81-unit one point network (t(38) = -8.55, P<0.01), and the 36-unit one and two point network had 0.5 fewer peaks than the 81-unit one and two point network (t(38) = -10.00, P<0.01).
Next, we relaxed our network constraints to examine how they influenced the learned receptive fields. First, we trained our network on the mixed stimulus set without non-negative regularization in W(1) and found qualitative changes in receptive field morphology, such that the receptive fields no longer bore structural similarities to our previously documented empirical receptive fields (Fig 3A). Second, we trained our network on the mixed stimulus set with extreme convergence (4 units in the first hidden layer) and, again, found that the resulting receptive fields did not resemble our empirical receptive fields (Fig 3B). Last, we trained our network on each of the four stimulus sets without convergence (i.e., 784 units in the first hidden layer). We reasoned that such a network might not develop complex receptive fields because it did not need to compress the input space, especially for the single point training set given its simple spatial statistics. However, receptive fields with multiple highly sensitive zones emerged for all training sets to varying degrees (Fig 3C).
Same format as Fig 2 but showing exemplar receptive fields learned by three alternative networks featuring architectures with relaxed constraints.
Given that our networks developed complex receptive fields under all network constraints and training sets, we investigated the functional consequences of such an arrangement for sensory processing. In these analyses, we trained the network on unlabelled mixed stimuli, then fixed W(1) and trained the remaining layers as a classifier using labelled mixed stimuli. In our approach, the unsupervised training phase represents the encoding function of the tactile processing pathway, while the supervised training phase abstracts the more interpretive functions of the central nervous system. On discrimination and identification tasks, we compared this learned network against a network engineered to have single-peaked Gaussian receptive fields in W(1). For the engineered network, we selected the Gaussian receptive field width (SD = 3.0 steps) that resulted in the best performance.
We first asked whether complex receptive fields benefit spatial accuracy. We had the network perform two-point discrimination, a task central to many studies of tactile acuity [13,18,19]. Specifically, we used a two-alternative forced choice paradigm and defined the difference limen as the separation distance between stimuli at which the network classified 75% of the stimuli correctly. The learned network had a mean difference limen of 6.94 (SD = 1.36) steps on our input space, which corresponds to a modelled distance of ~1–3 mm, depending on assumptions about mechanoreceptor innervation density. Overall, the performance of learned and engineered networks did not differ significantly with 81 units in the first hidden layer (t(45) = -1.85, P = 0.071; Fig 4A). Moreover, changing the degree of convergence from 81 to 36 units did not cause a statistically significant change in performance for either the learned or the engineered network (F(1, 82) = 0.31, P = 0.58; Fig 4A).
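The engineered comparison network replaces the learned W(1) with one single-peaked Gaussian receptive field per first-order unit. The following is a minimal NumPy sketch under stated assumptions, not the authors' code: the function name `gaussian_receptive_fields`, the placement of the 81 (or 36) centres on a regular 9 × 9 (or 6 × 6) lattice, and the normalization of each field to a peak weight of 1 are hypothetical choices; only the 28 × 28 input grid and the Gaussian width (SD = 3.0 steps) come from the text.

```python
import numpy as np

def gaussian_receptive_fields(n_units=81, grid=28, sd=3.0):
    """Hypothetical engineered W(1): one isotropic Gaussian receptive field per
    first-order unit, with centres on a regular lattice (lattice layout is assumed)."""
    side = int(round(np.sqrt(n_units)))
    if side * side != n_units:
        raise ValueError("n_units must be a perfect square (e.g. 81 or 36)")
    # Evenly spaced centres, inset by half a spacing from the grid edges (assumption).
    spacing = grid / side
    centres = (np.arange(side) + 0.5) * spacing - 0.5
    rows, cols = np.mgrid[0:grid, 0:grid]
    fields = []
    for cy in centres:
        for cx in centres:
            g = np.exp(-((rows - cy) ** 2 + (cols - cx) ** 2) / (2.0 * sd ** 2))
            fields.append(g.ravel() / g.max())   # peak weight 1; all weights >= 0
    # Each row is one unit's receptive field, so W(1) maps the 784 inputs to n_units outputs.
    return np.stack(fields)

W1 = gaussian_receptive_fields(81)
print(W1.shape)  # (81, 784)
```

Because every weight is a Gaussian value, this W(1) already satisfies the non-negativity constraint, so it differs from the learned network only in the shape of its receptive fields.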
Performance of 81- and 36-hidden unit models either trained on mixed stimuli or engineered with fixed Gaussian receptive fields on the (A) two-point discrimination and (B) alphabet classification tasks. (A) Data points show the difference limen, defined as the separation distance at which the model classifies 75% of 2000 test points correctly. (B) Data points show the overall classification accuracy of 7800 tested Roman letters. Grey violin plots show the frequency distribution of difference limens and accuracy across model iterations. Performance is reported at varying levels of multiplicative or additive noise (see Methods). Groups may have different numbers of data points as some networks failed to converge and were not considered for testing.
We then asked whether complex receptive fields benefit network performance in a more difficult identification task. We assessed each network's ability to correctly classify new instances of characters from the Roman alphabet not seen during the training phase (see Methods), as has previously been done with human participants. In this case, engineering W(1) to have single-peaked Gaussian receptive fields and increasing convergence both decreased network accuracy (F(1,79) = 103.78, P < 0.01 and F(1, 39) = 107.23, P < 0.01, respectively), and the interaction between these factors was also significant (F(1, 79) = 7.05, P = 0.0096). That is, both learned and engineered networks performed well, but the learned networks outperformed the engineered networks at both levels of convergence, and the benefit of complex receptive fields grew with increased convergence (Fig 4B).
Finally, we asked whether complex receptive fields benefit network performance in the presence of noise. We introduced varying levels of normally distributed additive and multiplicative noise to the training data during both the unsupervised and supervised training phases and then tested the network's performance on a noiseless dataset. Training noise had a substantial effect on the network's ability to classify characters from the Roman alphabet (Fig 4B). With low levels of additive noise (see Methods), the learned network had an accuracy of 87.7% (SD = 1.1) compared to 75.1% (SD = 2.5) for the engineered network, a statistically significant performance gap (t(41) = 20.65, P < 0.01). Convergence also significantly influenced classification accuracy under the different noise levels (F(6, 555) = 12.36, P < 0.01). With low levels of additive noise, the accuracy of the 36-unit learned network was 1.4% lower than that of the 81-unit learned network (t(38) = 4.25, P = 0.00013). In contrast, the accuracy of the 36-unit network with engineered Gaussian receptive fields was 6.1% lower than that of the 81-unit engineered network (t(41) = 9.59, P < 0.01). The performance gap between learned and engineered networks grew with additional additive noise (Fig 4B). For all networks, multiplicative noise had a qualitatively similar but much smaller effect (Fig 4B).
A core feature of the tactile processing pathway is that there are many more mechanoreceptors in the skin of the hand than there are first-order tactile neurons in the median and ulnar nerves. It is not surprising, therefore, that first-order tactile neurons branch [1–3], since this is the only way they can innervate all the available mechanoreceptors. What may be surprising is the spatial complexity and apparent heterogeneity of the innervation pattern [4,5], a feature that has been overlooked or ignored in previous models of the tactile processing pathway [13,20–22]. Our work leverages simple machine learning tools to provide two fundamental insights in this respect. First, we show that spatially complex receptive fields are normatively good and, perhaps, biologically parsimonious, arising under a wide range of training sets and network architectures. Second, we show that spatially complex receptive fields benefit network performance, especially in relatively difficult tasks and in the presence of noise.
Heterogeneously sampling the input space is beneficial for the nervous system because sensory stimuli are inherently sparse. Neural networks like the one we use here implicitly learn the statistical regularities (and thus the sparsity) of the stimuli to which they are exposed. Indeed, such a machine learning approach has been shown to reproduce receptive field properties of neurons at various levels of the visual processing pathway [12,23]. Another mechanism for exploiting sparsity comes from the field of compressed sensing, which shows that randomly sampling the input space can, under reasonable assumptions, allow a system to fully reconstruct a sparse input signal with fewer measurements than prescribed by the Shannon-Nyquist theorem [24–27]. Given an input with sparsity S (at most S non-zero terms), in many situations the input signal can be fully reconstructed by randomly sampling at a frequency greater than 2S with no noise or multiplicative noise, or 4S with additive noise [24,26]. This is consistent with our observation that networks with more spatially complex receptive fields are particularly robust to additive noise. Fig 5 illustrates a cartoon compressed sensing scenario in our experimental setting, showing that a network with fully randomized weights in the first hidden layer performs strikingly well on the alphabet classification task relative to the learned and fixed networks described above. That is, in the absence of noise the random network performs only slightly worse than the learned network and equivalently to the fixed network and, as expected, better maintains its performance as additive noise increases.
This is not to say that the heterogeneous pattern by which first-order tactile neurons innervate mechanoreceptors is random (indeed, random connectivity yields receptive fields that are qualitatively distinct from those we record in humans; Fig 5B), but rather that even random sampling can outperform pixel-like sampling with Gaussian receptive fields.
(A) Alphabet classification performance as a function of additive noise (same methodological details as in Fig 4B) for the 81-unit learned and fixed models, relative to a network with the same architecture but random connectivity in the first hidden layer (n = 20 for each group). Boxes represent the first and third quartiles; whiskers extend to the 95th percentile. (B) Example receptive fields from one representative unit in each of the learned, fixed, and random models.
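To make the compressed sensing argument concrete, the sketch below recovers a sparse signal from a small number of random linear measurements. It is a generic textbook illustration, not the network analysis behind Fig 5 (that network classifies letters rather than reconstructing inputs): the signal length, the sparsity, the measurement count, and the use of orthogonal matching pursuit (OMP) as the recovery algorithm are our assumptions, and m = 100 measurements is chosen generously rather than at the 2S or 4S figures quoted in the text.

```python
import numpy as np

def omp(A, y, max_atoms, tol=1e-10):
    """Orthogonal matching pursuit: greedily grow a support set and refit it by least squares."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(max_atoms):
        if np.linalg.norm(residual) <= tol * np.linalg.norm(y):
            break                                        # measurements fully explained
        j = int(np.argmax(np.abs(A.T @ residual)))       # column most correlated with residual
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(A.shape[1])
    x_hat[support] = coef
    return x_hat

rng = np.random.default_rng(0)
n, s, m = 256, 5, 100                        # signal length, sparsity S, random measurements
x = np.zeros(n)
x[rng.choice(n, size=s, replace=False)] = rng.uniform(1.0, 2.0, size=s)
A = rng.normal(size=(m, n)) / np.sqrt(m)     # random Gaussian sampling matrix
y = A @ x                                    # m < n noiseless measurements
x_hat = omp(A, y, max_atoms=2 * s)
print("max reconstruction error:", np.max(np.abs(x_hat - x)))
```

Allowing up to 2S greedy steps makes the sketch tolerant of an occasional wrong pick: once the true support is contained in the selected columns, the least-squares refit assigns zero weight to the spurious ones.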
Feedforward neural network architecture
We designed a four-layer feedforward network model with layers L1 to L4 containing s1 to s4 units, respectively, where s1 = 784, s2 = 81 or 36, s3 = 784, and s4 = 26 or 2 depending on whether the network was trained to perform alphabet classification or two-point discrimination. The general form of the feedforward computation was

$$z^{(l+1)} = W^{(l)} a^{(l)}, \qquad a^{(l+1)} = f\left(z^{(l+1)}\right),$$

where W(l) denotes the weights from layer Ll to Ll+1, z(l+1) is the weighted sum of outputs from layer Ll, and a(l) is the output of layer Ll after the activation function f (with a(1) equal to the input x). For unsupervised training (L1 to L3), we used a rectified linear function f(x) = max(0, x) on the outputs of W(1) and a softmax function on the outputs of W(2). For supervised training (L1 to L4), we used a rectifier on the outputs of W(1) and W(2) and a softmax on the outputs of W(3).
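The feedforward pass described above can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the authors' implementation: bias terms are omitted because the text does not mention them, the second parameter of N(0, 0.01) is read as a standard deviation, and the helper names (`init_weights`, `forward`) are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)     # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def init_weights(sizes, rng, sd=0.01):
    # W(l) maps layer L_l (s_l units) to L_{l+1} (s_{l+1} units); no biases (assumption).
    return [rng.normal(0.0, sd, size=(n_out, n_in))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(weights, x, activations):
    """Apply z(l+1) = W(l) a(l) and a(l+1) = f(z(l+1)) layer by layer; rows of x are stimuli."""
    a = x
    for W, f in zip(weights, activations):
        a = f(a @ W.T)
    return a

rng = np.random.default_rng(0)
sizes = [784, 81, 784, 26]                   # s1..s4 for alphabet classification
weights = init_weights(sizes, rng)
x = rng.random((5, 784))                     # five hypothetical flattened 28 x 28 stimuli
recon = forward(weights[:2], x, [relu, softmax])         # unsupervised autoencoder, L1 -> L3
probs = forward(weights, x, [relu, relu, softmax])       # supervised classifier, L1 -> L4
print(recon.shape, probs.shape)              # (5, 784) (5, 26)
```

Storing each W(l) with shape (s_{l+1}, s_l) keeps the code aligned with the notation, so row i of W(1) is the receptive field of first-order unit i.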
Two-phased training and non-negativity constraint
We randomly initialized weights by drawing from the distribution N(0, 0.01). The general learning algorithm was mini-batch gradient descent with mini-batches of size 256. We trained the network in two phases. In the unsupervised learning phase, we trained L1 to L3 as an autoencoder that reproduced its input. Gradient descent minimized the categorical cross-entropy cost

$$J(W) = -\sum_{x} \sum_{k} p_k(x) \log q_k(x) + R\left(W^{(1)}\right),$$

where, for training instance x in a mini-batch, p(x) is the true output (which is equivalent to the input x in the unsupervised learning phase), q(x) is the predicted input, k indexes the output units, and R(W(1)) is the non-negativity constraint, leading to the learning rule

$$W^{(l)} \leftarrow W^{(l)} - \alpha \frac{\partial J(W)}{\partial W^{(l)}},$$

where α is the learning rate.
We incorporated the asymmetric regularization term R(W(1)), which is computed over every weight connecting unit j of L1 to unit i of L2 and penalizes only negative weights. The penalty is scaled by c, an arbitrarily large constant that we set to 1000, so that the network was harshly penalized for learning negative weights in W(1).
In the supervised phase, we froze W(1) and trained L1 to L4 as a classifier, reinitializing W(2) between the two training phases. Depending on the task, the network operated as a binary (two-point discrimination) or multiclass (alphabet) classifier. Gradient descent minimized the cross-entropy cost

$$J(W) = -\sum_{x} \sum_{k} p_k(x) \log q_k(x),$$

where p(x) is now the one-hot label of training instance x and q(x) is the network's softmax output.
Network hyperparameters used during training varied among network architectures and training sets. Networks that did not converge within the allotted number of training iterations were excluded from testing.
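The unsupervised phase (cross-entropy reconstruction plus the asymmetric non-negativity penalty) can be sketched as one hand-written gradient step. This is an assumption-laden illustration, not the authors' training code: because the exact definition of R(W(1)) is not recoverable from the text, the penalty below is a hypothetical quadratic form, c times the summed squares of the negative weights. The cost is summed (not averaged) over the mini-batch, and the learning rate of 1e-4 and the helper names are illustrative choices.

```python
import numpy as np

def log_softmax(z):
    z = z - z.max(axis=1, keepdims=True)           # stabilise before exponentiating
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def nonneg_penalty(W, c=1000.0):
    """Hypothetical R(W(1)): c * sum of squared negative weights. Non-negative weights
    contribute nothing, which makes the penalty asymmetric."""
    neg = np.minimum(W, 0.0)
    return c * np.sum(neg ** 2), 2.0 * c * neg     # value and gradient with respect to W

def autoencoder_cost_and_grads(W1, W2, X, c=1000.0):
    """Cost J and its gradients for the L1 -> L2 -> L3 autoencoder on one mini-batch X
    (rows are flattened stimuli); the target p(x) is the input x itself."""
    z1 = X @ W1.T
    h = np.maximum(z1, 0.0)                        # ReLU first hidden layer
    log_q = log_softmax(h @ W2.T)                  # log of the predicted input q(x)
    r, dR = nonneg_penalty(W1, c)
    J = -np.sum(X * log_q) + r                     # cross-entropy summed over the batch + R
    # Gradient of -sum_k p_k log softmax(z)_k is q * sum_k p_k - p (q - p if p sums to 1).
    dz2 = np.exp(log_q) * X.sum(axis=1, keepdims=True) - X
    dW2 = dz2.T @ h
    dz1 = (dz2 @ W2) * (z1 > 0)                    # back-propagate through the ReLU
    dW1 = dz1.T @ X + dR
    return J, dW1, dW2

def gradient_step(W1, W2, X, lr=1e-4, c=1000.0):
    # With 2 * c * lr < 1, the penalty term on its own cannot flip a negative weight's sign.
    J, dW1, dW2 = autoencoder_cost_and_grads(W1, W2, X, c)
    return W1 - lr * dW1, W2 - lr * dW2, J

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.01, size=(81, 784))         # W(1): L1 -> L2
W2 = rng.normal(0.0, 0.01, size=(784, 81))         # W(2): L2 -> L3
X = rng.random((256, 784))                         # one hypothetical mini-batch
for _ in range(10):
    W1, W2, J = gradient_step(W1, W2, X)
print(J, np.mean(W1 < 0))
```

For the supervised phase, the same machinery applies with W(1) held fixed and the one-hot labels used as p(x).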
We generated all training inputs X such that . We generated Gaussian-point stimuli by initializing one or two peaks with $|x_{ij}| = 10$, where i and j are integers drawn independently from U(0, 27), and then passing the result through a two-dimensional Gaussian filter with width σ = 3.0.
We generated Roman letter stimuli as Helvetica characters normalized to 17 steps in height, and we used similar height scaling for Braille characters. The filled portions of characters were set to $|x_{ij}| = 1$. We subjected each character to a random rotation drawn from N(0, 20) degrees, followed by random horizontal and vertical translations drawn from N(0, 5) steps.
We generated 60,000 training stimuli of each class. For Roman letters and Braille characters, each character appeared in approximately equal proportions. Gaussian points were evenly split between one and two points (i.e., 30,000 of each). We used standard one-hot encoding for labels in supervised training.
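The stimulus pipeline (Gaussian points plus random rotation and translation) can be sketched with NumPy alone. The following is a hedged approximation, not the authors' generator: the filter uses zero padding at the borders, rotation uses nearest-neighbour sampling about the grid centre, the second parameter of N(0, 20) and N(0, 5) is read as a standard deviation, and all function names are hypothetical. Rendering Helvetica and Braille characters is not reproduced here.

```python
import numpy as np

GRID = 28  # stimuli live on a 28 x 28 step grid

def gaussian_blur(img, sigma=3.0):
    """Separable 2-D Gaussian filter; zero padding at the borders is an assumption."""
    radius = min(int(np.ceil(4 * sigma)), (GRID - 1) // 2)   # kernel must fit inside the grid
    t = np.arange(-radius, radius + 1)
    k = np.exp(-t ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    out = np.apply_along_axis(lambda row: np.convolve(row, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda col: np.convolve(col, k, mode="same"), 0, out)

def gaussian_points(rng, n_points, amplitude=10.0, sigma=3.0):
    """One or two peaks of height 10 at integer positions drawn from U(0, 27), then blurred."""
    img = np.zeros((GRID, GRID))
    for _ in range(n_points):
        i, j = rng.integers(0, GRID, size=2)     # upper bound is exclusive, so 0..27
        img[i, j] = amplitude
    return gaussian_blur(img, sigma)

def rotate_and_translate(img, angle_deg, dy, dx):
    """Nearest-neighbour rotation about the grid centre, then translation; zero fill outside."""
    c = (GRID - 1) / 2.0
    theta = np.deg2rad(angle_deg)
    rows, cols = np.mgrid[0:GRID, 0:GRID].astype(float)
    # Inverse mapping: undo the translation, then undo the rotation, for every output pixel.
    y, x = rows - dy - c, cols - dx - c
    src_r = np.rint(np.cos(theta) * y + np.sin(theta) * x + c).astype(int)
    src_c = np.rint(-np.sin(theta) * y + np.cos(theta) * x + c).astype(int)
    valid = (src_r >= 0) & (src_r < GRID) & (src_c >= 0) & (src_c < GRID)
    out = np.zeros_like(img)
    out[valid] = img[src_r[valid], src_c[valid]]
    return out

def random_pose(img, rng, angle_sd=20.0, shift_sd=5.0):
    """Rotation ~ N(0, 20) degrees and shifts ~ N(0, 5) steps (SD reading is an assumption)."""
    dy, dx = rng.normal(0.0, shift_sd, size=2)
    return rotate_and_translate(img, rng.normal(0.0, angle_sd), dy, dx)

rng = np.random.default_rng(0)
stimuli = np.stack([random_pose(gaussian_points(rng, n_points=k), rng).ravel()
                    for k in (1, 2, 1, 2)])
print(stimuli.shape)  # (4, 784)
```

Each flattened row is one 784-element input vector of the kind fed to L1.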
Receptive field complexity
We bootstrapped 1000 receptive fields from each network. First, we designed a peak counting algorithm that calculated the number of significant local maxima in each receptive field. For each receptive field R, we define $r_{ij}$ as a peak if 1) it is a local maximum, 2) $|r_{ij}| > \max_k r_k / 2$, that is, the value of $r_{ij}$ is greater than half of the global maximum, and 3) $r_{ij}$ is at least 5 steps away from the next closest local maximum. These criteria prevent low-amplitude noise from being counted as peaks. Second, we analyzed receptive fields in the frequency domain by computing the discrete two-dimensional Fourier transform with the Fast Fourier Transform algorithm, after normalizing each sampled RF by its peak value such that $\max_k r_k = 1.0$. Last, to compare the information shared by each pair of networks, we used the mutual information between pairs of bootstrapped RFs normalized by their respective entropies, such that 1.0 means perfect correlation and 0 means no mutual information. We binned weights into 10,000 bins before calculating mutual information so that the control group (learned versus learned) RFs had a normalized mutual information close to 1.0.
We assessed network accuracy in two-point discrimination and alphabet classification. We implemented two-point discrimination using a two-alternative forced choice paradigm. We generated 2000 new stimuli (not used to train the network) of one and two Gaussian points in equal proportions. Pairs of Gaussian points were spaced symmetrically about the center of the input space at distances of 0 to 22 steps in increments of 2 steps, and each pair was rotated by a random integer angle drawn from U(0, 90) degrees. We defined the difference limen, or just-noticeable difference, for two-point discrimination as the distance at which the network correctly classified 75% of test stimuli, and we estimated it using cubic spline interpolation on the full accuracy curve.
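The peak-counting criteria and the normalized mutual information described above can be sketched as follows. Several details are not stated in the text and are assumptions here: local maxima use the 8-connected neighbourhood, distances are Euclidean in grid steps, maxima closer than 5 steps are resolved greedily in favour of the larger one, and the entropy normalization is taken as the geometric mean sqrt(H(A) H(B)). The function names are hypothetical.

```python
import numpy as np

def count_peaks(rf, min_sep=5.0):
    """Count peaks using the three criteria in the text (neighbourhood, distance metric and
    tie-breaking are assumptions)."""
    rf = np.asarray(rf, dtype=float)
    h, w = rf.shape
    padded = np.pad(rf, 1, mode="constant", constant_values=-np.inf)
    neighbours = np.stack([padded[1 + di:1 + di + h, 1 + dj:1 + dj + w]
                           for di in (-1, 0, 1) for dj in (-1, 0, 1) if (di, dj) != (0, 0)])
    is_local_max = np.all(rf >= neighbours, axis=0)            # criterion 1
    is_large = np.abs(rf) > rf.max() / 2.0                     # criterion 2
    candidates = np.argwhere(is_local_max & is_large)
    candidates = candidates[np.argsort(-rf[candidates[:, 0], candidates[:, 1]])]
    kept = []
    for p in candidates:                                       # criterion 3, largest first
        if all(np.hypot(*(p - q)) >= min_sep for q in kept):
            kept.append(p)
    return len(kept)

def _bin_indices(v, bins):
    edges = np.linspace(v.min(), v.max(), bins + 1)
    return np.clip(np.digitize(v, edges[1:-1]), 0, bins - 1)

def _entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def normalized_mutual_information(rf_a, rf_b, bins=10_000):
    """I(A;B) / sqrt(H(A) H(B)) over binned weights (normalization choice is an assumption)."""
    a = _bin_indices(np.ravel(rf_a), bins).astype(np.int64)
    b = _bin_indices(np.ravel(rf_b), bins).astype(np.int64)
    h_a, h_b = _entropy(a), _entropy(b)
    # Joint entropy from a unique code per bin pair, avoiding a dense 10,000 x 10,000 histogram.
    h_ab = _entropy(a * bins + b)
    return (h_a + h_b - h_ab) / np.sqrt(h_a * h_b)
```

Encoding each bin pair as a single integer keeps the joint distribution sparse, which matters with 10,000 bins but only 784 weights per receptive field.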
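The difference limen estimate can be sketched as interpolating the accuracy-versus-separation curve with a cubic spline and reading off where it first reaches 75%. The spline type is not specified in the text, so this sketch assumes a natural cubic spline (zero second derivative at both ends), implemented directly in NumPy; taking the first crossing on a 0.01-step grid and the function names are likewise assumptions, and the accuracy values in the usage example are made up for illustration.

```python
import numpy as np

def natural_cubic_spline(x, y):
    """Return a callable natural cubic spline through the knots (x, y)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x) - 1
    h = np.diff(x)
    A = np.zeros((n + 1, n + 1))
    rhs = np.zeros(n + 1)
    A[0, 0] = A[n, n] = 1.0                 # natural boundary: S'' = 0 at both ends
    for i in range(1, n):
        A[i, i - 1], A[i, i], A[i, i + 1] = h[i - 1], 2.0 * (h[i - 1] + h[i]), h[i]
        rhs[i] = 6.0 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
    M = np.linalg.solve(A, rhs)             # second derivatives at the knots

    def S(t):
        t = np.asarray(t, float)
        i = np.clip(np.searchsorted(x, t, side="right") - 1, 0, n - 1)
        hi, dl, dr = h[i], t - x[i], x[i + 1] - t
        return ((M[i] * dr ** 3 + M[i + 1] * dl ** 3) / (6 * hi)
                + (y[i] / hi - M[i] * hi / 6) * dr + (y[i + 1] / hi - M[i + 1] * hi / 6) * dl)
    return S

def difference_limen(distances, accuracy, criterion=0.75, resolution=0.01):
    """Smallest separation at which the interpolated accuracy reaches the criterion (NaN if never)."""
    d = np.asarray(distances, dtype=float)
    S = natural_cubic_spline(d, accuracy)
    grid = np.linspace(d[0], d[-1], int(round((d[-1] - d[0]) / resolution)) + 1)
    hits = np.nonzero(S(grid) >= criterion)[0]
    return float(grid[hits[0]]) if hits.size else float("nan")

distances = np.arange(0, 23, 2)             # 0 to 22 steps in increments of 2, as in the text
accuracy = [0.50, 0.52, 0.55, 0.60, 0.68, 0.74, 0.80, 0.86, 0.90, 0.93, 0.95, 0.96]
print(difference_limen(distances, accuracy))
```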
We assessed the network on alphabet classification by testing it on 7800 new characters (not used to train the network, as above) with 300 instances of each letter, subjected to rotational and translational variability as described above.
To assess robustness against noise, we trained the networks with noise before testing them on noiseless data. We implemented multiplicative noise on input X as $\varepsilon_{ij} = c \cdot u \cdot x_{ij}$ for each coordinate i, j in X, where u was randomly drawn from the distribution N(0, 0.01). We implemented additive noise as $\theta_{ij} = c \cdot v \cdot \max_k x_k$, where v was randomly drawn from the distribution N(0, 0.01). We designated c = 1.0 as low-level noise and c = 3.0 as high-level noise. Noise was re-instantiated at the beginning of each training epoch.
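A minimal sketch of the two noise models follows, assuming that u and v are drawn independently for every coordinate, that the noise term is added to the clean input, that the maximum in the additive term is taken per stimulus, and that the second parameter of N(0, 0.01) is a standard deviation. None of these details is confirmed by the text, and the function names are hypothetical.

```python
import numpy as np

def multiplicative_noise(X, c, rng, sd=0.01):
    # epsilon_ij = c * u_ij * x_ij with u_ij ~ N(0, sd^2), one draw per coordinate (assumption).
    u = rng.normal(0.0, sd, size=X.shape)
    return X + c * u * X

def additive_noise(X, c, rng, sd=0.01):
    # theta_ij = c * v_ij * max_k x_k, scaled by each stimulus's own peak value (assumption).
    v = rng.normal(0.0, sd, size=X.shape)
    peak = X.max(axis=1, keepdims=True)     # rows of X are flattened stimuli
    return X + c * v * peak

rng = np.random.default_rng(0)
X_clean = rng.random((256, 784))            # hypothetical clean training set
for epoch in range(3):
    # Noise is re-drawn at the start of every epoch; c = 1.0 (low) or 3.0 (high).
    X_train = additive_noise(X_clean, c=1.0, rng=rng)
    # ... train on X_train for one epoch, then test on noiseless data ...
```

Multiplicative noise leaves zero-valued (unstimulated) coordinates untouched, whereas additive noise perturbs every coordinate in proportion to the stimulus peak.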
- 1. Cauna N. Nerve supply and nerve endings in Meissner’s corpuscles. Am J Anat. 1956;99: 315–50. pmid:13372495
- 2. Cauna N. The mode of termination of the sensory nerves and its significance. J Comp Neurol. 1959;113: 169–209.
- 3. Nolano M, Provitera V, Crisci C, Stancanelli A, Wendelschafer-Crabb G, Kennedy WR, et al. Quantification of myelinated endings and mechanoreceptors in human digital skin. Ann Neurol. 2003;54: 197–205. pmid:12891672
- 4. Pruszynski JA, Johansson RS. Edge-orientation processing in first-order tactile neurons. Nat Neurosci. 2014; pmid:25174006
- 5. Johansson RS. Tactile sensibility in the human hand: Receptive field characteristics of mechanoreceptive units in the glabrous skin area. J Physiol. 1978;281: 101–123. pmid:702358
- 6. Pruszynski JA, Flanagan JR, Johansson RS. Fast and accurate edge orientation processing during object manipulation. eLife. 2018;7. pmid:29611804
- 7. Pruszynski JA, Johansson RS, Flanagan JR. A Rapid Tactile-Motor Reflex Automatically Guides Reaching toward Handheld Objects. Curr Biol. 2016;26: 788–792. pmid:26898466
- 8. Bensmaia SJ, Denchev PV, Dammann JF, Craig JC, Hsiao SS. The representation of stimulus orientation in the early stages of somatosensory processing. J Neurosci. 2008;28: 776–786. pmid:18199777
- 9. Yau JM, Pasupathy A, Fitzgerald PJ, Hsiao SS, Connor CE, Albright TD. Analogous intermediate shape coding in vision and touch. Proc Natl Acad Sci U S A. 2009;106: 16457–62. pmid:19805320
- 10. Fitzgerald PJ, Lane JW, Thakur PH, Hsiao SS. Receptive field properties of the macaque second somatosensory cortex: Representation of orientation on different finger pads. J Neurosci. 2006;26: 6473–84. pmid:16775135
- 11. Grigg P. Biophysical studies of mechanoreceptors. J Appl Physiol. 1986;60: 1107–1115. pmid:2422151
- 12. Olshausen BA, Field DJ. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature. 1996;381: 607–609. pmid:8637596
- 13. Wheat HE, Goodwin AW, Browning AS. Tactile resolution: Peripheral neural mechanisms underlying the human capacity to determine positions of objects contacting the fingerpad. J Neurosci. 1995;15: 5582–5595.
- 14. Vega-Bermudez F, Johnson KO, Hsiao SS. Human tactile pattern recognition: Active versus passive touch, velocity effects, and patterns of confusion. J Neurophysiol. 1991;65: 531–46. pmid:2051193
- 15. Phillips JR, Johnson KO, Hsiao SS. Spatial pattern representation and transformation in monkey somatosensory cortex. Proc Natl Acad Sci U S A. 1988;85: 1317–21.
- 16. Johnson KO, Lamb GD. Neural mechanisms of spatial tactile discrimination: Neural patterns evoked by braille-like dot patterns in the monkey. J Physiol. 1981;310: 117–144. pmid:7230030
- 17. Phillips JR, Johansson RS, Johnson KO. Representation of braille characters in human nerve fibres. Exp Brain Res. 1990;81: 589–592.
- 18. Johnson KO, Phillips JR. Tactile spatial resolution. I. Two-point discrimination, gap detection, grating resolution, and letter recognition. J Neurophysiol. 1981;46.
- 19. Tong J, Mao O, Goldreich D, Pleger B, Dinse HR, Bach M. Two-point orientation discrimination versus the traditional two-point test for tactile spatial acuity assessment. Front Hum Neurosci. 2013;7: 579. pmid:24062677
- 20. Friedman RM, Khalsa PS, Greenquist KW, LaMotte RH. Neural coding of the location and direction of a moving object by a spatially distributed population of mechanoreceptors. J Neurosci. 2002;22: 9556–9566. pmid:12417680
- 21. Dodson MJ, Goodwin AW, Browning AS, Gehring HM. Peripheral neural mechanisms determining the orientation of cylinders grasped by the digits. J Neurosci. 1998;18: 521–530. pmid:9412528
- 22. Saal HP, Delhaye BP, Rayhaun BC, Bensmaia SJ. Simulating tactile signals from the whole hand with millisecond precision. Proc Natl Acad Sci U S A. 2017;
- 23. Yamins DLK, Hong H, Cadieu CF, Solomon EA, Seibert D, DiCarlo JJ. Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proc Natl Acad Sci U S A. 2014;111: 8619–8624. pmid:24812127
- 24. Candès EJ, Wakin MB. An introduction to compressive sampling. IEEE Signal Process Mag. 2008;25: 21–30.
- 25. Candès EJ, Romberg J, Tao T. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Trans Inf Theory. 2006;52: 489–509.
- 26. Candès EJ. The restricted isometry property and its implications for compressed sensing. Comptes Rendus Math. 2008;346: 589–592.
- 27. Donoho DL. Compressed sensing. IEEE Trans Inf Theory. 2006;52: 1289–1306.
- 28. Lemme A, Reinhart RF, Steil JJ. Efficient online learning of a non-negative sparse