
The authors have declared that no competing interests exist.

First-order tactile neurons have spatially complex receptive fields. Here we use machine-learning tools to show that such complexity arises for a wide range of training sets and network architectures. Moreover, we demonstrate that this complexity benefits network performance, especially on more difficult tasks and in the presence of noise. Our work suggests that spatially complex receptive fields are normatively good given the biological constraints of the tactile periphery.

First-order tactile neurons in the hairless skin of the human hand have distal axons that branch in the skin and form many transduction sites [

(A) Examples of receptive fields from human first-order tactile neurons terminating in the fingertip acquired via microneurography. Color indicates the relative firing rate of the neuron when stimulated with a small punctate stimulus. For full details, see Pruszynski and Johansson (2014). (B) Graphic representation of a cross-section through the human glabrous skin. Note how a single afferent neuron branches and innervates multiple mechanoreceptive end organs. (C) Our four-layer feedforward neural network. The first layer models a small patch of skin, W^{(1)} represents receptive fields, and the second layer models first-order neurons. The relative sizes of each layer are shown but not to scale. Arrows represent fully connected feedforward weights between subsequent layers. Layers 3 and 4 are a functional abstraction of the central nervous system. End organs and first-order neurons in (B) are color matched with the layers that represent them in the model. (D) Examples of training data used to represent tactile stimuli. Each stimulus is shown on a 28 x 28 step grid. Stimuli were passed through a Gaussian filter and randomly rotated and translated. Points data were also randomly scaled.

We abstracted the tactile processing pathway with a four-layer feedforward neural network whose first weight matrix—W^{(1)}—represents the receptive fields of first-order tactile neurons. Our network was trained on a range of stimuli including single points, multiple points, as well as Roman and Braille characters. We applied non-negative regularization to W^{(1)} to simulate the fact that first-order tactile neurons can only be excited when their transduction sites are stimulated [

We first asked under what conditions, if any, our network learns spatially complex receptive fields. In our main analysis, the 784 units in the input layer converged to 81 units in the first hidden layer, approximating the fact that first-order tactile neurons innervate on the order of ten mechanoreceptors [

We trained our network on each of these training sets in an unsupervised fashion and examined the resulting receptive fields (i.e., the W^{(1)} matrix). All networks, even those trained with the simplest training set, exhibited receptive fields with multiple areas of high sensitivity (

(A) Examples of receptive fields learned by the 81- and 36-hidden unit models after training on different training sets (rows). Each receptive field is shown on a 28 x 28 step grid. Heat maps show areas with high weight values, which represent highly sensitive zones. Samples were chosen to show a variety of receptive field morphologies. The number on the bottom left corner of each receptive field is the number of peaks returned by our peak counting algorithm, which measures receptive field complexity. (B) The average complexity of each network under different architectures and training sets. Each data point is the mean peak count of receptive fields from that model on one iteration, with grey violin plots showing the overall frequency distribution across the 20 iterations we performed for each architecture and training set.

We next asked how the degree of convergence between the input and first hidden layers influenced receptive fields. That is, we asked how physical constraints on the number of first-order tactile neuron axons traveling within the peripheral nerve should affect their connectivity to mechanoreceptors in the skin. We reasoned that increasing convergence would increase receptive field complexity, since a smaller set of units must still encode the same set of inputs. We tested this idea by decreasing the size of the first hidden layer from 81 to 36 units, closer to the lower limit of biologically relevant convergence [

At this point we further abstracted our network constraints to examine how they influenced the learned receptive fields. First, we trained our network on the mixed stimulus set without non-negative regularization in W^{(1)} and found qualitative changes in receptive field morphology such that they no longer had structural similarities to our previously documented empirical receptive fields [

Same format as

Given that our networks developed complex receptive fields under all network constraints and training sets, we investigated the functional consequences that such an arrangement had on sensory processing. In these analyses, we trained the network on unlabelled Mixed stimuli, then fixed W^{(1)} and trained the remaining layers as a classifier using labelled Mixed stimuli. In our approach, the unsupervised training phase represents the encoding function of the tactile processing pathway, while the supervised training phase abstracts the more interpretive functions of the central nervous system. We compared this learned network against a network engineered to have single-peaked Gaussian receptive fields in W^{(1)} on discrimination and identification tasks. For the engineered network, we selected the width of the Gaussian receptive field (SD = 3.0 steps) that resulted in the best performance.
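This two-phase procedure—unsupervised learning of W^{(1)}, then supervised training of the later layers with W^{(1)} frozen—can be sketched in NumPy. All sizes, data, and the learning rate below are toy placeholders rather than the paper's values, and phase 1 is reduced to a non-negative initialization instead of full autoencoder training:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Toy stand-ins for the paper's 784 -> 81 (or 36) -> ... layer sizes.
n_in, n_hid, n_cls = 16, 4, 3
X = rng.random((32, n_in))

# Phase 1 (sketch): W1 plays the role of W^(1); here we only enforce
# non-negativity instead of running the full unsupervised phase.
W1 = np.clip(rng.normal(0.0, 0.1, (n_in, n_hid)), 0.0, None)

# Phase 2: freeze W1 and train the remaining weights as a classifier.
H = relu(X @ W1)                      # fixed "receptive field" encoding
W2 = rng.normal(0.0, 0.1, (n_hid, n_cls))
y = rng.integers(0, n_cls, size=32)
Y = np.eye(n_cls)[y]                  # one-hot labels
for _ in range(200):                  # gradient descent on cross-entropy
    P = softmax(H @ W2)
    W2 -= 0.5 * H.T @ (P - Y) / len(X)
```

Because only W2 is updated in phase 2, the encoding H never changes, mirroring the fixed peripheral representation in the model.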

We first asked whether complex receptive fields benefit spatial accuracy. We had the network perform two-point discrimination, a task central to many studies of tactile acuity [

Performance of 81- and 36-hidden unit models either trained on mixed stimuli or engineered with fixed Gaussian receptive fields on the (A) two-point discrimination and (B) alphabet classification tasks. (A) Data points show the difference limen, defined as the separation distance at which the model classifies 75% of 2000 test points correctly. (B) Data points show the overall classification accuracy of 7800 tested Roman letters. Grey violin plots show the frequency distribution of difference limens and accuracy across model iterations. Performance is reported at varying levels of multiplicative or additive noise (see
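The difference limen defined in (A) can be recovered from a psychometric curve by finding where accuracy first reaches the 75% criterion. A minimal sketch—the function name and the use of linear interpolation are our choices, not the paper's stated procedure:

```python
import numpy as np

def difference_limen(separations, accuracies, criterion=0.75):
    """Separation at which accuracy reaches the criterion, linearly
    interpolated between the two bracketing tested separations."""
    s = np.asarray(separations, dtype=float)
    a = np.asarray(accuracies, dtype=float)
    above = np.nonzero(a >= criterion)[0]
    if len(above) == 0:
        return np.inf                 # criterion never reached
    k = above[0]
    if k == 0:
        return s[0]                   # already above at smallest separation
    frac = (criterion - a[k - 1]) / (a[k] - a[k - 1])
    return s[k - 1] + frac * (s[k] - s[k - 1])
```

For example, if accuracy rises through 75% between separations of 4 and 6 steps, the interpolated limen falls between them.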

We then asked whether complex receptive fields benefit network performance in a more difficult identification task. We assessed each network's ability to correctly classify new instances of characters from the Roman alphabet not previously seen by the network during the training phase (see Methods). Engineering W^{(1)} to have single-peaked Gaussian receptive fields and increasing convergence both decreased network accuracy (F(1,79) = 103.78, P < 0.01; F(1,39) = 107.23, P < 0.01, respectively), and the interaction between these factors was also significant (F(1,79) = 7.05, P = 0.0096). That is, both learned and engineered networks performed well, but the learned networks outperformed engineered networks at both levels of convergence, and the benefit of complex receptive fields increased with increased convergence (

Finally, we asked whether complex receptive fields benefit network performance in the presence of noise. We introduced varying levels of normally distributed additive and multiplicative noise to the training data during both unsupervised and supervised training phases and then tested the network’s performance on a noiseless dataset. The effect of training noise on the network’s ability to classify characters from the Roman alphabet was substantial (

A core feature of the tactile processing pathway is that there are many more mechanoreceptors in the skin of the hand than there are first-order tactile neurons in the median and ulnar nerves. It is not surprising, therefore, that first-order tactile neurons branch [

Heterogeneously sampling the input space is a good thing for the nervous system to do because the input space of sensory stimuli is inherently sparse. Neural networks like the one we use here implicitly learn the statistical regularities (and thus sparsity) of the stimuli to which they are exposed. Indeed, such a machine learning approach has been shown to reproduce biological receptive field properties of neurons at various levels of the visual processing pathway [

(A) Alphabet classification performance as a function of additive noise (same methodological details as in ^{th} percentile. (B) Example receptive fields from one representative unit in the learned, fixed, and the random models, respectively.

We designed a four-layer feedforward network model with layers L_{1} to L_{4} containing N_{1} to N_{4} units, respectively (N_{1} = 784, N_{2} = 81 or 36, N_{3} = 784, and N_{4} = 26). W^{(l)} denotes the weights from layer L_{l} to L_{l+1}, z^{(l+1)} is the weighted sum of outputs from layer L_{l}, and a^{(l)} is the output of layer L_{l} after the activation function f^{(l)}. For unsupervised training (L_{1} to L_{3}), we used a rectified linear function for f^{(1)} and a softmax function for f^{(2)}. For supervised training (L_{1} to L_{4}), we used a rectifier for f^{(1)} and f^{(2)} and softmax for f^{(3)}.
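A forward pass consistent with these layer sizes and activations might look like the following sketch (the weight initialization scale and the inputs are placeholders of ours):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

sizes = [784, 81, 784, 26]            # N1..N4 (using the N2 = 81 variant)
rng = np.random.default_rng(1)
W = [rng.normal(0.0, 0.05, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]

def forward(x):
    a = x
    for Wl in W[:-1]:
        a = relu(a @ Wl)              # rectified linear hidden layers
    return softmax(a @ W[-1])         # softmax over the 26 letter classes

x = rng.random((5, 784))              # five flattened 28 x 28 stimuli
p = forward(x)                        # (5, 26) class probabilities
```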

We randomly initiated weights by drawing from a random distribution and trained layers L_{1} to L_{3} as an autoencoder that reproduced the input. The goal of gradient descent was to minimize the categorical cross-entropy cost between the input and the reconstruction a^{(3)}, plus the term λΩ(W^{(1)}), where Ω(W^{(1)}) is the non-negativity constraint, leading to the corresponding learning rule.

We incorporated the asymmetric regularization term Ω(W^{(1)}), where w_{ij} denotes the weight between unit i in L_{1} and unit j in L_{2}; the penalty falls on the negative entries of W^{(1)}, enforcing non-negativity.
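The exact form of the asymmetric regularization term on W^{(1)} is not preserved above; one common choice that matches its stated purpose—penalizing negative weights to enforce non-negativity—is a quadratic cost on negative entries only. A sketch under that assumption, not necessarily the paper's exact form:

```python
import numpy as np

def nonneg_penalty(W, lam=1.0):
    """Asymmetric regularizer: quadratic cost on negative weights only."""
    neg = np.minimum(W, 0.0)
    return lam * np.sum(neg ** 2)

def nonneg_penalty_grad(W, lam=1.0):
    """Gradient of the penalty; zero wherever the weight is non-negative."""
    return 2.0 * lam * np.minimum(W, 0.0)

W = np.array([[0.5, -0.2],
              [-1.0, 0.3]])
# Only the two negative entries contribute: (-0.2)**2 + (-1.0)**2 = 1.04
cost = nonneg_penalty(W)
```

Because positive weights incur no cost, gradient descent is free to grow excitatory weights while being pushed away from inhibitory ones.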

In the supervised phase, we froze W^{(1)} and trained L_{1} to L_{4} as a classifier. We reinitiated W^{(2)} between the two training phases. Depending on the discrimination task to be performed, the network operated as a binary (for two-point discrimination) or multiclass (for alphabet) classifier. Gradient descent minimized the cross-entropy cost:

The learning rule in this phase was:

Network hyperparameters used during training varied among different network architectures and training sets. Networks that did not reach convergence in the number of iterations were removed from testing.

We generated all training inputs with |x_{ij}| = 10, where

We generated Roman letter stimuli as Helvetica characters normalized to 17 steps in height. We used similar height scaling for Braille characters. The filled portions of characters were initiated as |x_{ij}| = 1. We subjected each character to a random rotational angle drawn from distribution

We generated 60,000 training stimuli of each class. For Roman letters and Braille characters, there were approximately equal proportions of each character. Gaussian-points were evenly split between one and two points (i.e., 30,000 of each). We used standard one-hot encoding for labelling in supervised training.
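As an illustration of the Gaussian-points class (the amplitude, width, and placement ranges below are assumptions on our part, and the random rotation step is omitted), smoothed point stimuli can be generated analytically on the 28 x 28 grid:

```python
import numpy as np

rng = np.random.default_rng(2)

def gaussian_bump(grid, center, sigma):
    """2-D Gaussian bump with unit peak at the given grid coordinate."""
    yy, xx = np.mgrid[0:grid, 0:grid]
    d2 = (yy - center[0]) ** 2 + (xx - center[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def make_points_stimulus(n_points=2, grid=28, sigma=1.5, amp=10.0):
    """One 'Gaussian-points' stimulus: blurred points, randomly placed."""
    img = np.zeros((grid, grid))
    for _ in range(n_points):
        center = rng.integers(4, grid - 4, size=2)  # random translation
        img += amp * gaussian_bump(grid, center, sigma)
    return img

stim = make_points_stimulus()
```

Summing analytic bumps is equivalent, up to edge effects, to placing impulses and applying a Gaussian filter as described for the training data.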

We bootstrapped 1000 receptive fields from each network. First, we designed a peak counting algorithm that calculated the number of significant local maxima contained in each receptive field. For each receptive field w, we counted w_{ij} as a peak if 1) it is a local maximum, 2) |w_{ij}| > max_{k}(w_{k})/2, that is, the value of w_{ij} is greater than half of the global maximum, and 3) w_{ij} is at least 5 steps away from the next closest local maximum. These criteria prevent low-amplitude noise from being counted as peaks. Second, we analyzed receptive fields in the frequency domain by performing discrete two-dimensional Fourier transformation using the Fast Fourier Transform algorithm. We performed Fourier transformation after normalizing sampled RFs by their peak values such that max_{k}(w_{k}) = 1.0. Last, to compare information shared by each pair of networks, we used mutual information between pairs of bootstrapped RFs, normalized by their respective entropies, such that 1.0 means perfect correlation and 0 means no mutual information. We binned weights into 10,000 bins before calculating mutual information so that the control group (learned versus learned) RFs had normalized mutual information close to 1.0.
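The three peak criteria can be implemented directly. This sketch is our own implementation, not the authors' code: it applies the local-maximum test, the half-of-global-maximum threshold, and the 5-step minimum separation, accepting candidates greedily from the largest down:

```python
import numpy as np

def count_peaks(rf, min_dist=5):
    """Number of significant local maxima in a 2-D receptive field."""
    n_rows, n_cols = rf.shape
    thresh = rf.max() / 2.0               # half of the global maximum
    candidates = []
    for i in range(n_rows):
        for j in range(n_cols):
            v = rf[i, j]
            nb = rf[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2]
            if v <= thresh or v < nb.max():   # criteria 1 and 2
                continue
            candidates.append((v, i, j))
    peaks = []
    for v, i, j in sorted(candidates, reverse=True):
        # criterion 3: at least min_dist steps from any accepted peak
        if all(np.hypot(i - pi, j - pj) >= min_dist for pi, pj in peaks):
            peaks.append((i, j))
    return len(peaks)
```

On a field with two well-separated hot spots above half maximum, this returns 2; sub-threshold bumps are ignored.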

We assessed network accuracy in two-point discrimination and alphabet classification. We implemented two-point discrimination using a two-alternative forced choice paradigm. We generated 2000 new stimuli (not used to train the network) of one and two Gaussian-points in equal proportions. Two Gaussian-points were spaced symmetrically about the center of the input space at distances 0 to 22 steps apart with increments of 2 steps. We subjected two Gaussian-points to a random integer rotational angle drawn from distribution

We assessed the network on alphabet classification by testing it on 7800 new characters (not used to train the network, as above) with 300 instances of each letter, subjected to rotational and translational variability as described above.

To assess robustness against noise, we trained the networks with noise before testing them on noiseless data. We implemented multiplicative noise on input x as x̃_{ij} = εx_{ij} for each coordinate, and additive noise as x̃_{ij} = x_{ij} + ε·max_{k}(x_{k}), where
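A sketch of the two noise models as we read them (the per-coordinate Gaussian draw and the scaling of additive noise by the stimulus peak are assumptions on our part):

```python
import numpy as np

rng = np.random.default_rng(3)

def add_training_noise(x, sigma, mode="additive"):
    """Inject normally distributed noise into a stimulus array."""
    eps = rng.normal(0.0, sigma, size=x.shape)
    if mode == "multiplicative":
        return x * (1.0 + eps)            # per-coordinate gain noise
    return x + eps * np.abs(x).max()      # additive, scaled to the peak
```

Scaling additive noise by the stimulus peak keeps the noise level comparable across stimulus classes with different amplitudes.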

This work was supported by the Canadian Institutes of Health Research (Foundation Grant to JAP: 3531979) and the Natural Sciences and Engineering Research Council of Canada (Discovery Grant to MJD). JAP received a salary award from the Canada Research Chairs Program.