
Neuromorphic computing for content-based image retrieval

  • Te-Yuan Liu,

    Roles Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Target Corporation, Sunnyvale, California, United States of America

  • Ata Mahjoubfar,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    ata.mahjoubfar@target.com

    Affiliation Target Corporation, Sunnyvale, California, United States of America

  • Daniel Prusinski,

    Roles Methodology, Resources, Validation, Writing – original draft, Writing – review & editing

    Affiliation Target Corporation, Sunnyvale, California, United States of America

  • Luis Stevens

    Roles Conceptualization, Investigation, Project administration, Resources, Supervision, Validation, Writing – original draft, Writing – review & editing

    Affiliation Target Corporation, Sunnyvale, California, United States of America

Abstract

Neuromorphic computing mimics the neural activity of the brain by emulating spiking neural networks. In numerous machine learning tasks, neuromorphic chips are expected to provide superior solutions in terms of cost and power efficiency. Here, we explore the application of Loihi, a neuromorphic computing chip developed by Intel, to the computer vision task of image retrieval. We evaluated the functionalities and the performance metrics that are critical in content-based visual search and recommender systems using deep-learning embeddings. Our results show that the neuromorphic solution is about 2.5 times more energy-efficient than an ARM Cortex-A72 CPU and 12.5 times more energy-efficient than an NVIDIA T4 GPU for inference with a lightweight convolutional neural network at batch size 1, while maintaining the same level of matching accuracy. The study validates the potential of neuromorphic computing in low-power image retrieval as a complementary paradigm to the existing von Neumann architectures.

Introduction

Neuromorphic computing is a non-von Neumann computer architecture that aims to achieve ultra-high-efficiency machines for a diverse set of information processing tasks by mimicking the temporal neural activity of the brain [1–3]. In neuromorphic computing, numerous spiking signals carry information among computing units, i.e., artificial neurons, synchronously or asynchronously [4], forming a mesh-like, nonlinear dynamical system [5]. The information can be encoded in the temporal characteristics of the signals, for example, in firing rates [6].

In this work, we implement and analyze a low-power computer vision model for visual search engines and recommender systems that evaluate the visual similarity between a query image and a database of product images. In conventional machine learning pipelines, this is often performed by transfer learning using a deep convolutional neural network (CNN) [7] pre-trained on a large-scale dataset, e.g., ImageNet [8, 9], and fine-tuned on a domain-specific image dataset, e.g., DeepFashion2 for apparel [10]. The embeddings of the images are calculated by inferring the activation values of the last few layers of the neural network as visual features [11–17]. The distances between the embeddings of the query image and the database images are used to find the nearest neighbors of the query image in the embedding space, identifying the most visually similar items [18].

Here, we evaluate the same visual search and recommendation technique using embeddings generated by neuromorphic neural networks. We train spiking convolutional neural networks on a clothing-specific image classification dataset, Fashion-MNIST [19]. The trained spiking neural networks are then used to extract features from the product images and the query images. The embeddings are based on the patterns of the temporal spikes and, as with conventional convolutional neural networks, are used to find the nearest visual neighbors of the query image among the product images. Our results show considerable power efficiency in finding the most visually similar products using neuromorphic chips, and particularly Loihi [20].

Methods

To explore applications of neuromorphic computing in image retrieval, we built and deployed a spiking neural network (SNN) on Intel’s Loihi neuromorphic chip. Our image search pipeline is shown in Fig 1. First, we convert a trained artificial neural network (ANN) into an SNN and deploy it on the Loihi chip. We then feed training and test images into the SNN and probe the neurons of the layer before the output layer to get image embeddings. Finally, nearest neighbor search is employed on CPU cores to find the best matches in the training dataset for each test image.

Fig 1. Pipeline of image retrieval by spiking neural network.

https://doi.org/10.1371/journal.pone.0264364.g001

In the first step, we train different ANNs by minimizing the cross-entropy loss function for the classification of the Fashion-MNIST dataset via backpropagation. Then, we convert the ANNs into SNNs, compare the classification test accuracies of the SNNs, and select the most accurate SNN model. As suggested by Hunsberger and Eliasmith [21] and Sengupta et al. [22], we reduce the feature map size using average pooling rather than max pooling, and employ dropout for regularization [23].

Note that there are two constraints on the neural network architectures that can be deployed on Loihi chips. One constraint is that the synaptic memory, which stores the connection weights within each neuromorphic core, is limited to 128 KB. This caps the number of parameters associated with the neurons in a core. The other constraint is the maximum fan-in of 4,096 per neuromorphic core, which means the input size of the neurons cannot exceed 4,096 [24]. These two constraints push networks deployed on the Loihi chip toward relatively slim layers rather than wide ones.

Given an ANN, the conversion is done by building an SNN with the same architecture as the ANN but changing the neuron type to a Leaky Integrate-and-Fire (LIF) neuron with soft reset, a variant of the Residual Membrane Potential (RMP) neuron proposed by Han et al. [25]. Then, the floating-point ANN parameters are scaled to integers and transplanted into the SNN, because the Loihi chip executes operations on integer numbers. The spiking threshold of each LIF neuron is determined at the same time as the parameter scaling, using a method provided by the Loihi NxSDK [26]. The method of parameter scaling and threshold calculation is shown in Algorithm 1 (for more details, see the work on mapping spiking neural networks onto a manycore neuromorphic architecture by Lin et al. [26]).

Algorithm 1 Parameter scaling and threshold calculation

Require: Normalized Input: input ∈ [0, 1]N×H×W×C

1: WMAX = 2^(num_weight_bits − 1) − 1, bMAX = 2^(num_bias_bits − 1) − 1

2: slope = 1, param_percentile = 99.999, activation_percentile = 99.999

3: for snn_layer, ann_layer in zip(SNN.layers, ANN.layers) do

4:  if snn_layer is input layer then

5:   param_scale = WMAX

6:   dvdt = input × param_scale

7:  else

8:   weight, bias = ann_layer.get_param()

9:   bias = bias × slope

10:   weight_norm = percentile(abs(weight), param_percentile)

11:   bias_norm = percentile(abs(bias), param_percentile)

12:   weight_ratio = WMAX / weight_norm, bias_ratio = bMAX / bias_norm

13:   param_scale = min(weight_ratio, bias_ratio)

14:   weight = int(weight × param_scale), bias = int(bias × param_scale)

15:   snn_layer.set_param(weight, bias)

16:   dvdt = ReLU(weight × spikerate + bias)

17:  end if

18:  threshold = int(percentile(dvdt, activation_percentile))

19:  snn_layer.threshold = threshold

20:  spikerate = dvdt / threshold

21:  slope = slope × param_scale / threshold

22: end for

Similar to the spike-norm algorithm proposed by Sengupta et al. [22], a set of images is fed into the network, and the threshold at each layer is set to the maximum activation at that layer. However, the Loihi chip uses a rate-based simulation of the SNN instead of the actual SNN forward pass to calculate the spiking thresholds.

In Algorithm 1, there are two important variables. One is named param_scale, which gives the factor we use to scale the ANN parameters to integers to get the SNN parameters. The other one is named threshold, which is the spiking threshold that decides the LIF neuron spiking activity.

Algorithm 1 requires a batch of input images to tune the spiking threshold, represented as an N × H × W × C matrix with floating-point elements ranging between zero and one; N is the number of images, and H, W, C are the image’s height, width, and number of channels. Line 1 sets WMAX and bMAX, and line 2 sets the slope, param_percentile, and activation_percentile variables. If we use 9 bits to represent SNN weights on the Loihi chip, then the maximum weight WMAX is 2^(9−1) − 1 = 255. We set the maximum bias bMAX in the same way. The slope variable is the ratio between the SNN neuron output and the ANN neuron output at the current layer and is initialized to one.
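As a concrete check of line 1, the scale ceilings follow directly from signed-integer ranges. A minimal sketch (the 13-bit bias width used below is only a hypothetical illustration, not a documented Loihi setting):

```python
def max_magnitude(num_bits):
    # Largest magnitude representable by a signed integer with num_bits bits
    # (one bit is reserved for the sign).
    return 2 ** (num_bits - 1) - 1

W_MAX = max_magnitude(9)   # 9-bit weights -> 255, as in the text
B_MAX = max_magnitude(13)  # hypothetical 13-bit biases -> 4095
```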

In line 3, we get snn_layer and its corresponding ann_layer. From line 4 to 6, if snn_layer is the input layer which encodes input images into spike time series, we set param_scale to WMAX and multiply input by param_scale to get the dvdt, which is the neuron membrane potential increment rate. Note that dvdt here still has the shape of N × H × W × C as the input layer only multiplies the input by a scalar.

From lines 7 to 16, if snn_layer is not the input layer, we scale the ANN parameters and set the SNN parameters. In line 8, we get the ANN weight and bias from ann_layer. Then in line 9, we multiply bias by slope to update bias with the scaling of the previous layer. In line 10, we set weight_norm to a single value by taking a percentile of abs(weight), and we set bias_norm likewise in line 11. Then in line 12, we set weight_ratio to the ratio between WMAX and weight_norm to find out how far we can scale up weight without exceeding WMAX, and we calculate bias_ratio the same way. In line 13, we compare weight_ratio and bias_ratio and set param_scale to the smaller value. In lines 14 and 15, we use param_scale to scale the ANN weight and bias, quantize them to integers, and set them as the parameters of snn_layer. In line 16, we calculate dvdt by simulating the ANN neuron activation, and the shape of dvdt becomes N × FH × FW × FC, where FH, FW, and FC stand for the feature map’s height, width, and number of channels.

In lines 18 and 19, we set the threshold of the neurons at snn_layer to the quantized percentile value of dvdt, so there is a single threshold value for the layer. Then, in line 20, we calculate the spikerate, an estimate of the spiking probability of the neurons, as the output of snn_layer; it has the same shape as dvdt. In line 21, we update slope by multiplying it by the ratio of param_scale and threshold.
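The loop above can be sketched in NumPy for a stack of dense layers. This is a minimal sketch under stated assumptions: the dense forward pass and the spikerate estimate dvdt / threshold are our simplifications for illustration, not the NxSDK internals, and the function and variable names are ours.

```python
import numpy as np

def scale_and_threshold(ann_params, inputs, num_weight_bits=9,
                        num_bias_bits=13, pct=99.999):
    # ann_params: list of (weight, bias) float arrays, one pair per dense layer.
    # inputs: N x D array of normalized inputs in [0, 1].
    # Returns integer SNN parameters and one spiking threshold per layer.
    w_max = 2 ** (num_weight_bits - 1) - 1
    b_max = 2 ** (num_bias_bits - 1) - 1

    # Input layer (lines 4-6): scale the normalized input by w_max.
    dvdt = inputs * w_max
    threshold = max(int(np.percentile(dvdt, pct)), 1)   # lines 18-19
    thresholds = [threshold]
    spikerate = dvdt / threshold                        # line 20 (assumed form)
    slope = w_max / threshold                           # line 21, starting from 1

    snn_params = []
    for weight, bias in ann_params:                     # lines 7-16
        bias = bias * slope                             # fold in previous scaling
        weight_norm = np.percentile(np.abs(weight), pct)
        bias_norm = np.percentile(np.abs(bias), pct)
        param_scale = min(w_max / weight_norm,
                          b_max / bias_norm if bias_norm > 0 else np.inf)
        w_int = (weight * param_scale).astype(int)      # lines 14-15: quantize
        b_int = (bias * param_scale).astype(int)
        snn_params.append((w_int, b_int))
        dvdt = np.maximum(spikerate @ w_int + b_int, 0) # line 16: ReLU

        threshold = max(int(np.percentile(dvdt, pct)), 1)  # lines 18-19
        thresholds.append(threshold)
        spikerate = dvdt / threshold                    # line 20 (assumed form)
        slope = slope * param_scale / threshold         # line 21
    return snn_params, thresholds
```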

With an SNN at hand, we start feeding images into the network. For each image, we probe the neurons of the layer before the output layer at the last execution time step to get the neuron membrane potentials. The membrane potential vector is then the embedding of the input image.

Our SNN takes images in the training and test sets as inputs and generates their embeddings. We treat the training image embeddings as a corpus of image features. For each test image, we apply nearest neighbor search using cosine similarity to find the images in the corpus that are closest to the test image in the embedding space.
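The retrieval step amounts to a cosine-similarity ranking over the corpus embeddings. A minimal NumPy sketch (function and variable names are ours, not from our codebase):

```python
import numpy as np

def top_k_matches(query_emb, corpus_embs, k=3):
    # Rank corpus images by cosine similarity to the query embedding and
    # return the indices of the k most similar ones, best first.
    q = np.asarray(query_emb, dtype=float)
    c = np.asarray(corpus_embs, dtype=float)
    q = q / np.linalg.norm(q)
    c = c / np.linalg.norm(c, axis=1, keepdims=True)
    sims = c @ q                  # cosine similarity with every corpus item
    return np.argsort(-sims)[:k]
```

For example, with a query embedding of [1, 0] and a corpus of [[1, 0], [0, 1], [2, 0.1]], the first and third corpus items rank highest.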

Results

We implemented and tested 3-layer, 4-layer, and 5-layer SNNs for classification of the Fashion-MNIST dataset. We selected Fashion-MNIST as our evaluation dataset because it is suitable for benchmarking small-footprint computer vision models. Note that we use this dataset without data augmentation in our experiments. The architectures analyzed are shown in Table 1. In the architecture column, the numbers of convolutional kernels (output channels) in each layer are concatenated by hyphens. Note that the last architecture in Table 1 was not deployable on the Loihi chip because the maximum fan-in was exceeded. The fourth architecture in Table 1 scores the best classification test accuracy when converted to an SNN; this architecture is shown in Fig 2. It consists of three layers: two convolutional layers and one dense layer. We use this SNN architecture for the rest of the experiments. The image embeddings are generated by flattening the output of the last convolutional layer from a 4 × 4 × 64 tensor to a 1024-dimensional vector.

Table 1. Various CNN architectures and their performance.

https://doi.org/10.1371/journal.pone.0264364.t001

The SNN layer partition on a Loihi chip is shown in Fig 3. There are 128 neuromorphic cores on a Loihi chip, arranged in 8 rows of 16 cores. Each layer occupies a certain number of the neuromorphic cores [26]. Our best performing SNN (Fig 2) is relatively compact, so the number of cores it occupies is small compared with the number of cores available on a Loihi chip.

An SNN has an intrinsic execution time parameter, the number of time steps, which defines how many discrete time slots the network is given to process information during inference. Intuitively, the more time steps we give the SNN to process the information, the higher the performance, but the longer the runtime. This tradeoff between performance and number of time steps is shown in Fig 4. The performance metrics rise sharply between 4 and 16 time steps and then plateau, showing that 16 time steps are enough to achieve near-peak performance. The error bars indicate the negligible variation among five independently trained networks, demonstrating the reproducibility of our results.

Fig 4. Tradeoff between performance metrics and number of time steps.

https://doi.org/10.1371/journal.pone.0264364.g004

The relation between the runtime and the number of time steps is shown in Fig 5. As we gradually increase the number of time steps, the runtime scales up almost linearly. However, the runtime is nearly independent of the number of time steps for small numbers, e.g., 4 or 8 time steps, because overhead takes up the majority of the runtime.

Fig 5. Tradeoff between runtime and number of time steps.

https://doi.org/10.1371/journal.pone.0264364.g005

The performance comparison between the selected SNN and its ANN counterpart is shown in Table 2. Note that the number in the parentheses next to the model type is the number of time steps used per example during SNN inference. The ANN and SNN have the same network architecture but different neuron types and parameters. The SNN using 128 time steps has accuracies very close to the ANN, indicating that the SNN is capable of achieving comparable performance with its ANN counterpart. Using fewer time steps, e.g., 16, our SNN suffers a classification accuracy degradation, but the gap is smaller than 5%. Moreover, the top-1 and top-3 accuracies of the SNN with 16 time steps are still very close to those of the ANN. This means that the SNN with 16 time steps per inference generates reasonable embeddings, suitable for the image retrieval task.

Several examples of the SNN image retrieval are shown in Fig 6. The first column shows query images, each from a class in the dataset. The next three columns present three randomly-selected images from the corpus with the same class label as the query images. The next three columns demonstrate the top-three images selected by image search from the corpus using the ANN-generated image embeddings. The last three columns show the top-three images selected from the corpus using the SNN-generated image embeddings. It is obvious that image retrieval results, either using ANN or SNN, are visually closer to the query images compared with the randomly-selected images from the corpus. Again, our SNN implemented on the Loihi chip demonstrates comparable performance with the ANN.

Fig 6. Examples of the image retrieval by the artificial (ANN) and spiking (SNN) neural networks.

https://doi.org/10.1371/journal.pone.0264364.g006

The neural network inference latency (forward-pass runtime per example) comparison between the selected SNN and its ANN counterpart is shown in Table 3. Note that Loihi could not support batch sizes larger than one at the time of the experiments. When the batch size equals one, the SNN on Loihi using 16 time steps has approximately 13.8x/11.3x longer runtime than the ANNs on Xeon/i7 CPUs, 3.8x longer than the ANN on the ARM CPU, and 2.3x/2.5x longer than the ANNs on V100/T4 GPUs. The difference is even more dramatic if we use larger batch sizes for inference on the CPUs or GPUs. Clearly, the SNNs on the Loihi chip do not have an advantage in terms of inference latency. The multiple time steps an SNN needs to converge to its result lead to long execution times. Reducing the runtime is a direction in which we hope neuromorphic hardware will improve.

The comparison of the average power consumption between the SNNs and the ANNs is shown in Table 4. With the batch size set to one, the SNN with 16 time steps uses 217.0x/24.0x less power than the ANNs on Xeon/i7 CPUs, 9.3x less than the ANN on the ARM CPU, and 40.8x/31.3x less than the ANNs on the V100/T4 GPUs. This is where neuromorphic hardware starts to shine, as it consumes far less power than conventional hardware. By appropriately exploiting the temporal sparsity of SNNs, we believe neuromorphic hardware can further reduce its power consumption. Another observation from Table 4 is that the static (idle) power dominates the power consumption of the Loihi chip.

We measured the total energy used per inference (forward pass), reported in Table 5. These results can also be estimated by combining the results of Tables 3 and 4. As summarized in Table 5, with the batch size set to one, the energy consumption of the SNN with 16 time steps is 15.6x/3.2x less than the ANNs on Xeon/i7 CPUs, 2.5x less than the ANN on the ARM CPU, and 17.5x/12.5x less than the ANNs on V100/T4 GPUs per inference. This demonstrates the benefits of neuromorphic hardware in low-energy-budget applications of machine learning, particularly lightweight image search engines and visual recommender systems. When large batch sizes are used, CPUs and GPUs consume less energy per example. However, there are many use cases where inference is executed in small batches, and these are the targets for neuromorphic hardware at the current stage.

Another observation is that the energy consumption for a small number of time steps does not scale linearly. For example, the energy consumption per inference for 128 time steps is only 4.0 times larger than 16 time steps (Table 5). This is due to the constant portion of the energy needed for running each inference, which does not change by the number of time steps.
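This sublinear scaling can be reproduced with a toy cost model in which each inference pays a fixed overhead plus a per-time-step cost. The numbers below are hypothetical illustrations, not the measured values from Table 5:

```python
def energy_ratio(fixed, per_step, steps_a, steps_b):
    # Ratio of per-inference energy between two time-step settings,
    # assuming energy = fixed overhead + per-step cost * number of steps.
    return (fixed + per_step * steps_b) / (fixed + per_step * steps_a)

# With a sizable fixed overhead, 8x the time steps costs well under 8x energy.
ratio = energy_ratio(2.0, 0.25, 16, 128)   # (2 + 32) / (2 + 4) ~ 5.7x, not 8x
```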

We use energy probes provided by Loihi NxSDK to perform the power and energy measurements of the Loihi chips. For the CPUs, we use Intelligent Platform Management Interface (IPMI) and the system profiler information to measure the power consumption and then, we integrate the power readings over time to get the energy consumption. For the GPUs, we use NVIDIA System Management Interface (nvidia-smi) [27] to measure the power consumption and again, we integrate the power readings to get the energy consumption.
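Integrating the power readings into energy can be done with the trapezoidal rule over the sampled log. A minimal sketch, assuming only that the log provides timestamped wattage readings (the function name is ours):

```python
import numpy as np

def energy_from_power_log(timestamps_s, power_w):
    # Trapezoidal integration of sampled power (W) over time (s) -> energy (J).
    t = np.asarray(timestamps_s, dtype=float)
    p = np.asarray(power_w, dtype=float)
    return float(np.sum(np.diff(t) * (p[1:] + p[:-1]) / 2.0))
```

For example, a constant 10 W draw sampled over 2 s integrates to 20 J.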

Discussion

Our results confirm the energy efficiency of the Loihi neuromorphic chip. However, we noticed that the inference latency becomes impractically large when a network of Loihi chips is used. We surmise this is due to interchip communication latency. Many applications today use deep neural network models with millions of parameters and billions of intermediate activations. Neuromorphic chips need to scale up, possibly by increasing the number of neuromorphic cores and the on-chip memory, to support these applications in the future.

The energy efficiency obtained by the Loihi chip in our experiments stems from two factors. First, the model parameters are stored in the local memory of the neuromorphic cores, minimizing the energy cost of data transfers to a shared memory. Second, the neuromorphic cores are optimized for specialized functionalities; this efficiency is very similar to that of other specialized accelerators, e.g., graphics processing units (GPUs). The typical ANN-to-SNN conversion methods, including Algorithm 1 used here, do not capitalize on the temporal sparsity that is possible on neuromorphic processors, as in the brain. Designing better training and conversion algorithms that employ temporally sparse signals for neuromorphic machine learning is therefore a promising future direction.

Finally, it is worthwhile to emphasize that, to implement the complete image retrieval pipeline, we performed the nearest neighbor search on the host CPU cores. It is possible to carry out an approximate k-nearest neighbors algorithm on the neuromorphic chips [28], but we believe that CPU cores are still needed for some stages of a machine learning pipeline. Thus, the role of neuromorphic computing is to improve the performance of specialized tasks and supplement the general-purpose processors.

Conclusion

We studied the application of the Loihi chip, a neuromorphic computing hardware developed by Intel, to image retrieval. Our results show that generating deep-learning embeddings with spiking neural networks for lightweight convolutional neural networks is about 2.5 times more energy-efficient than a CPU and 12.5 times more energy-efficient than a GPU. We confirm the long-term potential of neuromorphic computing in machine learning, not as a replacement for the predominant von Neumann architecture, but as an accelerating coprocessor.

Acknowledgments

We would like to thank Hari Govind, Ramesh Subramonian, and Charles Leu at Target and Andreas Wild at Intel for helpful suggestions and constructive comments. We are also grateful to the Intel Neuromorphic Research Community for giving us access to the Loihi chips.

References

  1. James CD, Aimone JB, Miner NE, Vineyard CM, Rothganger FH, Carlson KD, et al. A historical survey of algorithms and hardware architectures for neural-inspired and neuromorphic computing applications. Biologically Inspired Cognitive Architectures. 2017;19:49–64.
  2. Wunderlich T, Kungl AF, Müller E, Hartel A, Stradmann Y, Aamir SA, et al. Demonstrating advantages of neuromorphic computation: a pilot study. Frontiers in Neuroscience. 2019;13:260. pmid:30971881
  3. Cauwenberghs G. Neuromorphic learning VLSI systems: A survey. In: Neuromorphic systems engineering. Springer; 1998. p. 381–408.
  4. Schuman CD, Potok TE, Patton RM, Birdwell JD, Dean ME, Rose GS, et al. A survey of neuromorphic computing and neural networks in hardware. arXiv preprint arXiv:1705.06963. 2017.
  5. Neckar A, Fok S, Benjamin BV, Stewart TC, Oza NN, Voelker AR, et al. Braindrop: A mixed-signal neuromorphic architecture with a dynamical systems-based programming model. Proceedings of the IEEE. 2018;107(1):144–164.
  6. Ponulak F, Kasinski A. Introduction to spiking neural networks: Information processing, learning and applications. Acta Neurobiologiae Experimentalis. 2011;71(4):409–433. pmid:22237491
  7. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436–444. pmid:26017442
  8. Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L. ImageNet: A large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE; 2009. p. 248–255.
  9. Kornblith S, Shlens J, Le QV. Do better ImageNet models transfer better? In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2019. p. 2661–2671.
  10. Ge Y, Zhang R, Wang X, Tang X, Luo P. DeepFashion2: A versatile benchmark for detection, pose estimation, segmentation and re-identification of clothing images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2019. p. 5337–5345.
  11. Babenko A, Slesarev A, Chigorin A, Lempitsky V. Neural codes for image retrieval. In: European Conference on Computer Vision. Springer; 2014. p. 584–599.
  12. Chen CL, Mahjoubfar A, Tai LC, Blaby IK, Huang A, Niazi KR, et al. Deep learning in label-free cell classification. Scientific Reports. 2016;6(1):1–16. pmid:26975219
  13. Gordo A, Almazán J, Revaud J, Larlus D. Deep image retrieval: Learning global representations for image search. In: European Conference on Computer Vision. Springer; 2016. p. 241–257.
  14. Mahjoubfar A, Churkin DV, Barland S, Broderick N, Turitsyn SK, Jalali B. Time stretch and its applications. Nature Photonics. 2017;11(6):341.
  15. Gordo A, Almazan J, Revaud J, Larlus D. End-to-end learning of deep visual representations for image retrieval. International Journal of Computer Vision. 2017;124(2):237–254.
  16. Li Y, Mahjoubfar A, Chen CL, Niazi KR, Pei L, Jalali B. Deep cytometry: deep learning with real-time inference in cell sorting and flow cytometry. Scientific Reports. 2019;9(1):1–12. pmid:31366998
  17. Noh H, Araujo A, Sim J, Weyand T, Han B. Large-scale image retrieval with attentive deep local features. In: Proceedings of the IEEE International Conference on Computer Vision; 2017. p. 3456–3465.
  18. Cao Y, Long M, Wang J, Liu S. Deep visual-semantic quantization for efficient image retrieval. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2017. p. 1328–1337.
  19. Xiao H, Rasul K, Vollgraf R. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747. 2017.
  20. Davies M, Srinivasa N, Lin TH, Chinya G, Joshi P, Lines A, et al. Loihi: A neuromorphic manycore processor with on-chip learning. IEEE Micro. 2018;38(1):82–99.
  21. Hunsberger E, Eliasmith C. Training spiking deep networks for neuromorphic hardware. arXiv preprint arXiv:1611.05141. 2016.
  22. Sengupta A, Ye Y, Wang R, Liu C, Roy K. Going deeper in spiking neural networks: VGG and residual architectures. Frontiers in Neuroscience. 2019;13. pmid:30899212
  23. Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research. 2014;15(1):1929–1958.
  24. Rajendran B, Sebastian A, Schmuker M, Srinivasa N, Eleftheriou E. Low-power neuromorphic hardware for signal processing applications: A review of architectural and system-level design approaches. IEEE Signal Processing Magazine. 2019;36(6):97–110.
  25. Han B, Srinivasan G, Roy K. RMP-SNN: Residual membrane potential neuron for enabling deeper high-accuracy and low-latency spiking neural network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2020.
  26. Lin CK, Wild A, Chinya GN, Lin TH, Davies M, Wang H. Mapping spiking neural networks onto a manycore neuromorphic architecture. ACM SIGPLAN Notices. 2018;53(4):78–89.
  27. NVIDIA System Management Interface; 2021. Available from: https://developer.nvidia.com/nvidia-system-management-interface.
  28. Frady EP, Orchard G, Florey D, Imam N, Liu R, Mishra J, et al. Neuromorphic nearest neighbor search using Intel’s Pohoiki Springs. In: Proceedings of the Neuro-inspired Computational Elements Workshop; 2020. p. 1–10.