The authors have declared that no competing interests exist.
Conceived and designed the experiments: LEG AAL. Performed the experiments: LEG. Analyzed the data: LEG AAL. Contributed reagents/materials/analysis tools: AAL. Wrote the paper: LEG AAL. Initiated project: AAL.
We have developed an open software platform called Neurokernel for collaborative development of comprehensive models of the brain of the fruit fly Drosophila melanogaster.
Reverse engineering the information processing functions of the brain is an engineering grand challenge of immense interest that has the potential to drive important advances in computer architecture, artificial intelligence, and medicine. The human brain is an obvious and tantalizing target of this effort; however, its structural and architectural complexity place severe limitations upon the extent to which models built and executed with currently available computational technology can relate its biological structure to its information processing capabilities. Successful development of human brain models must therefore be preceded by an increased understanding of the structural/architectural complexity of the more tractable brains of simpler organisms and how they implement specific information processing functions and govern behavior [
The nervous system of the fruit fly
Remarkably, the fruit fly is capable of a host of complex nonreactive behaviors that are governed by a brain containing only ∼10⁵ neurons and ∼10⁷ synapses organized into fewer than 50 distinct functional units, many of which are known to be directly involved in functions such as sensory processing, locomotion, and control [
Despite considerable progress in mapping the fruit fly’s connectome and elucidating the patterns of information flow in its brain, the complexity of the fly brain’s structure and the still-incomplete state of knowledge regarding its neural circuitry pose challenges that go beyond satisfying the current computational resource requirements of fly brain models. These include (1) the need to explicitly target the information processing capabilities of functional units in the fruit fly brain, (2) the need for fly brain model implementations to efficiently scale over additional hardware resources as they advance in complexity, and (3) the need for brain modeling to be approached as an explicitly open and collaborative process of iterative refinement by multiple parties similar to that successfully employed in the design of the Internet [
To address these challenges, we have developed an open source platform called Neurokernel for implementing connectome-based fruit fly brain models and executing them upon multiple Graphics Processing Units (GPUs). In order to achieve scaling over multiple computational resources while providing the programmability required to model the constituent functional modules in the fly brain, the Neurokernel architecture provides features similar to that of an operating system kernel. In contrast to general-purpose neural simulators, the design of Neurokernel and brain models built upon it is driven by publicly available proposals called Requests for Comments (RFCs).
Neurokernel’s design is predicated upon the organization of the fruit fly brain into a fixed number of functional modules characterized by local neural circuitry. Neurokernel explicitly enforces a programming model for implementing models of these functional modules, called Local Processing Units (LPUs), that separates an LPU’s internal design from the connectivity patterns that link its external communication interface to those of other LPUs; models and patterns can therefore be developed independently by different researchers. This modular architecture facilitates collaboration between researchers focusing on different functional modules in the fly brain by enabling independently developed models to be integrated into a single whole-brain model irrespective of their internal designs.
This paper is organized as follows. We first review the anatomy of the fruit fly brain that motivates Neurokernel’s design and then describe the platform’s architecture and its support for GPU resources and programmability. The subsequent two sections respectively present Neurokernel’s programming model and detail its API. To illustrate the use of Neurokernel’s API, we then describe its use to integrate independently developed models of the retina and lamina neuropils in the fly’s visual system. We also assess Neurokernel’s ability to exploit technology for accelerated data transmission between multiple GPUs in benchmarks of its module communication services. Finally, we compare Neurokernel to other computational projects directed at reverse engineering the function of neural circuits and discuss the project’s long-term goals.
Analysis of the
Individual neuropils are identified by different colors in the left-hand figure, with the names of several major neuropils listed. Most neuropils are paired across the fly’s two hemispheres. The right-hand figure depicts a tract of neuronal axons connecting neuropils across hemispheres highlighted in yellow (image created using data and software from [
The axons of an LPU’s local neurons and the synaptic connections between them and other neurons in the LPU constitute an internal pattern of connectivity that is distinct from the bundles, or tracts, of projection neuron processes that transmit data to neurons in other LPUs (
In contrast to a purely anatomical subdivision, the decomposition of the brain into functional modules casts the problem of reverse engineering the brain as one of discovering the information processing performed by each individual LPU and determining how specific patterns of axonal connectivity between these LPUs integrate them into functional subsystems. Modeling both these functional modules and the connectivity patterns that link them independently of the internal design of each module is a fundamental requirement of Neurokernel’s architecture.
We refer to our software framework for fruit fly brain emulation as a
Neurokernel’s architectural design consists of three planes that separate a model’s representation from its execution on multiple parallel processors (
The application plane provides support for hardware-independent specification of LPUs and their interconnects. Services that implement the neural primitives and computing methods required to execute neural circuit model instantiations on GPUs are provided by the compute plane. Translation of specified model components to the methods provided by the compute plane and management of multiple GPUs and communication resources are performed by the control plane operating on a cluster of CPUs.
A key aspect of Neurokernel’s design is the separation it imposes between the internal processing performed by an LPU model and how that model communicates with other models (
An LPU model’s internal components (cyan) are exposed via input and output ports (yellow and orange). Connections between LPUs are described by patterns (green) that link the ports of one LPU to those of another. Connections may only be defined between ports of the same transmission type.
Each communication port must either receive input (yellow) or emit output (orange), and must either transmit spikes (diamonds) or graded potentials (circles).
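The port attribute constraints described above (each port is either input or output, and either spiking or graded potential, and connections must match transmission types) can be captured in a small sketch. The `Port` class and `can_connect` helper below are illustrative stand-ins, not Neurokernel’s actual API:

```python
from dataclasses import dataclass

# Illustrative stand-in for a port descriptor; the class and field
# names are assumptions, not Neurokernel's actual API.
@dataclass(frozen=True)
class Port:
    identifier: str   # path-like identifier, e.g. '/lpu0/out/0'
    io: str           # 'in' (receives input) or 'out' (emits output)
    ptype: str        # 'spike' or 'gpot' (graded potential)

def can_connect(src: Port, dst: Port) -> bool:
    """A connection must run from an output port to an input port
    of the same transmission type."""
    return src.io == 'out' and dst.io == 'in' and src.ptype == dst.ptype
```

For example, a spiking output port may feed a spiking input port, but never a graded potential port of either direction.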
In these examples, the identifier level strings
Identifier/Selector | Comments |
---|---|
 | selects a single port |
 | equivalent to … |
 | selects two ports |
 | another example of two ports |
 | equivalent to … |
 | selects ten ports |
 | selects all ports starting with … |
 | equivalent to … |
 | equivalent to … |
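The selector semantics summarized in the table can be illustrated with a toy expansion function. The sketch below supports only a simplified subset of the syntax (literal levels, comma lists, and integer ranges) and simply concatenates level strings; it is not Neurokernel’s actual selector implementation:

```python
import itertools
import re

def expand(selector: str):
    """Expand a simplified path-like selector into port identifiers.

    Supports only a subset of the syntax described above: literal
    levels ('/lpu0/out'), comma lists ('[L1,L2]'), and integer
    ranges ('[0:10]').  Illustrative sketch only; not Neurokernel's
    actual selector implementation.
    """
    parts = []
    # Split the selector into literal text and bracketed expressions:
    for tok in re.split(r'(\[[^\]]*\])', selector):
        if not tok:
            continue
        if tok.startswith('['):
            body = tok[1:-1]
            m = re.match(r'^(\d+):(\d+)$', body)
            if m:  # integer range [a:b) expands to a, a+1, ..., b-1
                parts.append([str(i) for i in range(int(m.group(1)),
                                                    int(m.group(2)))])
            else:  # comma-separated alternatives
                parts.append(body.split(','))
        else:
            parts.append([tok])
    # The Cartesian product of all alternatives yields the selected ports:
    return [''.join(p) for p in itertools.product(*parts)]
```

For instance, a range selector such as `'/lpu0/out[0:10]'` expands to ten port identifiers, matching the "selects ten ports" row above.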
A single LPU may potentially be connected to many other LPUs; these connections must be expressed as patterns between pairs of LPUs (
An instance of the
Source Port | Destination Port |
---|---|
Port | Interface | I/O | Port Type |
---|---|---|---|
Port attributes are used by Neurokernel to determine compatibility between LPU and pattern objects. To provide LPU designers with the freedom to determine how to multiplex input data from multiple sources within an LPU, Neurokernel does not permit multiple output ports in a pattern to be connected to a single input port; an output port may, however, be connected to multiple input ports. It should be noted that the connections defined by an inter-LPU connectivity pattern do not represent synaptic models; any synapses in a brain model must be part of the design of a constituent LPU and connected to the LPU’s ports in order to receive data from or transmit data to modeling components in other LPUs.
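Under this restriction, validating a pattern amounts to checking that no input port has more than one source. The helper below is an illustrative sketch over (output port, input port) pairs, not Neurokernel’s API:

```python
from collections import defaultdict

def validate_pattern(connections):
    """Check a list of (output_port, input_port) connections against
    the fan-in restriction: every input port may receive data from at
    most one output port, while an output port may fan out to several
    input ports.  Illustrative sketch, not Neurokernel's actual API.
    """
    sources = defaultdict(set)
    for out_port, in_port in connections:
        sources[in_port].add(out_port)
    # Any input port with more than one source violates the restriction:
    bad = sorted(p for p, srcs in sources.items() if len(srcs) > 1)
    if bad:
        raise ValueError('input ports with multiple sources: %s' % bad)
    return True
```

Fan-out (one output port driving several input ports) passes this check; fan-in must instead be implemented inside the destination LPU.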
In contrast to other currently available GPU-based neural emulation packages [
To make use of Neurokernel’s LPU API, all LPU models must subclass a base Python class called
An instantiated LPU’s graded potential and spiking ports are respectively associated with GPU data arrays that Neurokernel accesses to transmit data between LPUs during emulation execution; LPU designers are responsible for reading the data elements associated with input ports and populating the elements associated with output ports in the
Inter-LPU connectivity patterns correspond to the connections described by the tracts depicted in
The designer of an LPU is responsible for associating ports with internal components that either consume input data or emit output data. Neurokernel provides a class called
Graded Potential Ports | | | Spiking Ports | | |
---|---|---|---|---|---|
Port | Array Index | Array Data | Port | Array Index | Array Data |
 | 0 | 0.71 | | 0 | 1 |
 | 1 | 0.83 | | 1 | 0 |
 | 2 | 0.52 | | 2 | 1 |
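The bookkeeping illustrated by the table, associating each port identifier with a position in a flat data array, can be sketched minimally as follows. The `PortMap` class below is a stand-in using a plain Python list in place of GPU memory; it is not Neurokernel’s actual mapper class:

```python
class PortMap:
    """Minimal sketch of a mapping between port identifiers and
    positions in a flat data array (standing in for GPU memory).
    Illustrative only; not Neurokernel's actual mapper class."""

    def __init__(self, ports):
        # Assign each port a fixed index into the data array:
        self.index = {p: i for i, p in enumerate(ports)}
        self.data = [0.0] * len(ports)

    def set(self, port, value):
        self.data[self.index[port]] = value

    def get(self, port):
        return self.data[self.index[port]]
```

Mirroring the graded potential column of the table, writing 0.83 to the second port places that value at array index 1.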
In addition to the classes that represent LPUs and inter-LPU connectivity patterns, Neurokernel provides an emulation manager class called
Apart from the API requirements discussed above, Neurokernel currently places no explicit restrictions upon an LPU model’s internal implementation, how it interacts with available GPUs, how LPUs record their output, or the topology of interconnections between different LPUs; compatible LPUs and inter-LPU patterns may be arbitrarily composed to construct subsystems (
Independently developed LPUs and connectivity patterns may be composed into subsystems (red, green) which may in turn be connected to other subsystems to construct a model of the whole brain (yellow).
Neurokernel’s compute plane currently provides GPU-based implementations for several common neuron and synapse models. Supported neuron models include the Leaky Integrate-and-Fire, Hodgkin-Huxley, and Morris-Lecar point neuron models, as well as a stochastic model of the photoreceptors in the fly retina. Alpha function and conductance-based synaptic models are also supported. These modeling components may be used to construct and execute LPUs without writing any Python code by specifying an LPU’s design declaratively as a graph stored in GEXF, an XML format for storing a property graph supported by various graph processing libraries. Neurokernel does not restrict LPU model developers to using the above models; additional modeling components may be added to the compute plane as plugins.
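As a concrete illustration of such a declarative specification, the sketch below uses the standard library’s `xml.etree.ElementTree` to emit a minimal GEXF property graph containing two neuron nodes and one synapse edge. The attribute layout is only illustrative; the schema expected by Neurokernel’s GEXF reader is not reproduced here:

```python
import xml.etree.ElementTree as ET

def tiny_lpu_gexf():
    """Build a minimal GEXF document describing two neurons joined by
    a synapse.  The node attribute names are illustrative and are not
    necessarily those expected by Neurokernel's GEXF reader."""
    gexf = ET.Element('gexf', xmlns='http://www.gexf.net/1.2draft',
                      version='1.2')
    graph = ET.SubElement(gexf, 'graph', defaultedgetype='directed')
    # Declare a string-valued 'model' attribute for nodes:
    attrs = ET.SubElement(graph, 'attributes', {'class': 'node'})
    ET.SubElement(attrs, 'attribute', id='0', title='model', type='string')
    nodes = ET.SubElement(graph, 'nodes')
    for nid, model in [('n0', 'LeakyIAF'), ('n1', 'MorrisLecar')]:
        node = ET.SubElement(nodes, 'node', id=nid)
        av = ET.SubElement(node, 'attvalues')
        ET.SubElement(av, 'attvalue', {'for': '0', 'value': model})
    # A directed edge standing in for a synaptic connection:
    edges = ET.SubElement(graph, 'edges')
    ET.SubElement(edges, 'edge', id='e0', source='n0', target='n1')
    return ET.tostring(gexf, encoding='unicode')
```

Because GEXF is an ordinary XML property graph format, such files can also be produced and consumed by general graph processing libraries.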
Communication between LPU instances in a running Neurokernel emulation is performed using MPI, enabling brain emulations to take advantage of multiple GPUs hosted either on a single computer or on a computer cluster. Neurokernel uses OpenMPI [
This section illustrates how to use the Neurokernel classes described in the Application Programming Interface section to construct and execute an emulation consisting of multiple connected LPUs. The section assumes that Neurokernel and its relevant dependencies (including OpenMPI) have already been installed on a system containing multiple GPUs. First, we import several required Python modules; the
Next, we create a subclass of
The data arrays associated with an LPU’s ports may be accessed using their path-like identifiers via two instances of the
To connect two LPUs, we specify the ports to be exposed by each LPU using path-like selectors. The example below describes the interfaces for two LPUs that each expose two graded potential input ports, two graded potential output ports, two spiking input ports, and two spiking output ports.
Using the above LPU interface data, we construct an inter-LPU connectivity pattern by instantiating the
We can then pass the defined LPU class and the parameters to be used during instantiation to a
After all LPUs and connectivity patterns are provided to the manager, the emulation may be executed for a specified number of steps as follows. Neurokernel uses the dynamic process creation feature of MPI-2 supported by OpenMPI to automatically spawn as many MPI processes as are needed to run the emulation:
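The workflow just described (subclass a base LPU class, define connectivity patterns over ports, hand both to a manager, and run for a fixed number of steps) can be mirrored in a single-process toy sketch. None of the names below belong to Neurokernel’s actual MPI- and GPU-based API; they only trace the control flow:

```python
class LPU:
    """Stand-in for an LPU model: subclasses override run_step to
    consume input port data and populate output port data."""
    def __init__(self, name):
        self.name = name
        self.in_data = {}    # input port id -> value
        self.out_data = {}   # output port id -> value

    def run_step(self):
        pass

class Counter(LPU):
    """Example LPU whose single output port emits a step counter."""
    def run_step(self):
        self.out_data['/x/out/0'] = self.out_data.get('/x/out/0', 0) + 1

class Manager:
    """Single-process stand-in for the emulation manager: it steps
    every LPU, then copies data across pattern connections."""
    def __init__(self):
        self.lpus = {}
        self.patterns = []   # (src_lpu, dst_lpu, [(out_port, in_port)])

    def add(self, lpu):
        self.lpus[lpu.name] = lpu

    def connect(self, src, dst, connections):
        self.patterns.append((src, dst, connections))

    def run(self, steps):
        for _ in range(steps):
            for lpu in self.lpus.values():
                lpu.run_step()
            for src, dst, conns in self.patterns:
                for out_port, in_port in conns:
                    self.lpus[dst].in_data[in_port] = \
                        self.lpus[src].out_data.get(out_port)
```

In the real framework each LPU runs as its own MPI process on its own GPU; the sketch collapses that parallelism into one loop to expose the step/synchronize cycle.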
To evaluate Neurokernel’s ability to facilitate interfacing of functional brain modules that can be executed on GPUs, we employed Neurokernel’s programming model to interconnect independently developed LPUs in the fruit fly’s early visual system to provide insights into the representation and processing of the visual field by the cascaded LPUs. We also evaluated Neurokernel’s scaling of communication performance in simple configurations of the architecture parameterized by numbers of ports and LPUs.
The scope of the effort to reverse engineer the fly brain and the need to support the revision of brain models in light of new data requires a structured means of advancing and documenting the evolution of those models and the framework required to support them. To this end, the Neurokernel project employs Requests for Comments documents (RFCs) as a tool for advancing the designs of both Neurokernel’s architecture and the LPU models built to use it. IPython notebooks and RFCs [
The integrated early visual system model we considered consists of models of the fruit fly’s retina and lamina. The retina model comprises a hexagonal array of 721 ommatidia, each of which contains 6 photoreceptor neurons. The photoreceptor model employs a stochastic model of how light input (photons) produces a membrane potential output. Each photoreceptor consists of 30,000 microvilli modeled by 15 equations per microvillus, a photon absorption model, and a model of how the aggregate microvilli contributions produce the photoreceptor’s membrane potential [
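The size of the retina model follows directly from these counts. Assuming the per-microvillus equations account for essentially all of the model’s differential equations, a quick check reproduces the ~1.95 billion figure cited later in the text for the fruit fly retina:

```python
# Counts taken from the retina model description above:
ommatidia = 721            # hexagonal array of ommatidia
photoreceptors = 6         # photoreceptor neurons per ommatidium
microvilli = 30_000        # microvilli per photoreceptor
eqs_per_microvillus = 15   # equations per microvillus

total_eqs = ommatidia * photoreceptors * microvilli * eqs_per_microvillus
print(total_eqs)           # 1946700000, i.e. the ~1.95 billion
                           # differential equations of the fruit fly retina
```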
The combined retina and lamina models were executed on up to 4 Tesla K20Xm NVIDIA GPUs with an 8 second natural video scene provided as input to the retinal model’s photoreceptors. The computed membrane potentials of specific photoreceptors in each retinal ommatidium and of select neurons in each cartridge of the lamina were recorded (
The hexagonal tiling depicts the array of ommatidia in the retina and the corresponding retinotopic cartridges in the lamina. Outputs of select photoreceptors in the retina (R1) that are fed to neurons in the lamina and outputs of specific neurons in the lamina (L1, L2) are also depicted.
We compared the performance of emulations in which port data stored in GPU memory is copied to and from host memory for traditional network-based transmission by OpenMPI to that of emulations in which port data stored in GPU memory is passed directly to OpenMPI’s communication functions. The latter approach enables OpenMPI to use NVIDIA’s GPUDirect Peer-to-Peer technology to accelerate transmission of data between GPUs whose hardware supports it by bypassing the host system’s CPU and memory [
To evaluate how well inter-LPU communication scales over the number of ports exposed by an LPU on a multi-GPU machine, we constructed and ran emulations comprising multiple connected instances of an LPU class with an empty
We initially examined how the above performance metrics scaled over the number of output ports exposed by each LPU in a 2-LPU emulation and over the number of LPUs in an emulation where each LPU is connected to every other LPU and the total number of output ports exposed by each LPU is fixed. We compared the performance for scenarios where data in GPU memory is directly exposed to OpenMPI to that for scenarios where the data is copied to the host memory prior to transmission; the former scenarios enabled OpenMPI to accelerate data transmission between GPUs using NVIDIA’s GPUDirect Peer-to-Peer technology. The metrics for each set of parameters were averaged over 3 trials; the emulation was executed for 500 steps during each trial.
The scaling of performance over number of ports depicted in
The number of output ports was varied over 25 equally spaced values between 50 and 15,000. The plot on the left depicts average synchronization time per execution step, while the plot on the right depicts average synchronization throughput (in number of ports per unit time) per execution step.
The total number of output ports exposed by each LPU was varied between 250 and 10,000 at 250 port intervals.
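The two benchmark metrics, average synchronization time per execution step and synchronization throughput in ports per unit time, can be computed from per-step timings as in the illustrative helper below (not part of Neurokernel):

```python
def sync_metrics(step_times, num_ports):
    """Compute the two benchmark metrics used above: average
    synchronization time per execution step, and throughput in
    ports transmitted per unit time.  Illustrative helper only."""
    avg_time = sum(step_times) / len(step_times)
    throughput = num_ports / avg_time
    return avg_time, throughput
```

In the benchmarks these quantities were further averaged over 3 trials of 500 execution steps each.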
Current research on the fruit fly brain is mainly focused on LPUs in the fly’s central complex and olfactory and vision systems. Since the interplay between these systems will be key to increasing understanding of multisensory integration and how sensory data might inform behavior mediated by the central complex, we examined how well Neurokernel’s communication mechanism performs in scenarios where LPUs from these three systems are successively added to a multi-LPU emulation. Starting with the pair of LPUs with the largest number of inter-LPU connections, we sorted the 19 LPUs in the above three systems in decreasing order of the number of connections contributed by the addition of each successive LPU and measured the average speedup in synchronization time per execution step due to direct GPU-to-GPU data transmission. The number of connections for each LPU was based upon estimates from a mesoscopic reconstruction of the fruit fly connectome; these numbers appear in Document S2 of the supplement of [
In light of their low costs and rapidly increasing power and availability, there is growing interest in leveraging the power of multiple GPUs to support neural simulations with increasingly high computational demands [
Currently available neural simulation software affords researchers a range of ways to construct neural circuit models. These include tools that enable models to be explicitly expressed as systems of differential equations [
Existing technologies for interfacing neural models currently provide no native support for the use of GPUs and none of the aforementioned services required to scale over multiple GPU resources. Neurokernel aims to address the problem of model incompatibility in the context of fly brain modeling by ensuring that GPU-based LPU model implementations and inter-LPU connectivity patterns that comply with its APIs are interoperable regardless of their internal implementations.
Despite the impressive performance GPU-based spiking neural network software can currently achieve for simulations comprising increasingly large numbers of neurons and synapses, enabling increasingly detailed fruit fly brain models to efficiently scale over multiple GPUs will require resource allocation and management features that are not yet provided by currently available neural simulation packages. By explicitly providing services and APIs for management of GPU resources, Neurokernel will enable fly brain emulations to benefit from the near-term advantages of scaling over multiple GPUs while leaving the door open to anticipated improvements in GPU technology that can further accelerate the performance of fly brain models.
The challenges of reverse engineering neural systems have spurred a growing number of projects specifically designed to encourage collaborative neuroscience research endeavors. These include technologies for model sharing [
Neuromorphic platforms whose design is directly inspired by the brain have the potential to execute large-scale neural circuit models at speeds that significantly exceed those achievable with traditional von Neumann computer architectures [
Although the Neurokernel project is specifically focused upon reverse engineering the fruit fly brain, the framework’s ability to capitalize upon the structural modularity of the brain and to facilitate collaborative modeling stands to benefit efforts to reverse engineer the brains of other model organisms. To this end, we have already used Neurokernel to successfully scale up the retinal model described in the Integration of Independently Developed LPU Models section to emulate the retina of the house fly, which comprises almost 10 times as many differential equations (18.8 billion) as that of the fruit fly (1.95 billion). Further development of Neurokernel’s support for multiple GPUs and—eventually—neuromorphic hardware will hopefully open the doors to collaborative modeling of the brains of even more complex organisms such as the zebrafish and mouse.
Efforts at reverse engineering the brain must ultimately confront the need to validate hypotheses regarding neural information processing against actual biological systems. In order to achieve biological validation of the Neurokernel, the computational modeling of the fruit fly brain must be tightly integrated with increasingly precise electrophysiological techniques and the recorded data evaluated with novel system identification methods [
Neural responses to sensory stimuli are recorded from the live fly brain in real time and compared to the computed responses of the corresponding components in a fly brain model executed on the same time scale. Discrepancies between these responses and new connectome data may be used to improve the model’s accuracy (fruit fly photograph adapted from Berger and fly robot image adapted from Vizcaíno, Benton, Gerber, and Louis, both reproduced with permission).
Although Neurokernel currently permits brain models to make use of multiple GPUs, it requires programmers to explicitly manage the GPU resources used by a model’s implementation. Having implemented the API for building and interconnecting LPUs described in the Application Programming Interface section within Neurokernel’s application plane, our next major goal is to implement a prototype GPU resource allocation mechanism within the control plane to automate selection and management of available GPUs used to execute a fly brain model. Direct access to GPUs will also be restricted to modeling components implemented by LPU developers and added to Neurokernel’s compute plane; models implemented or defined in the application plane will instantiate and invoke these components. These developments will permit experimentation with different resource allocation policies as LPU models become more complex. Restricting parallel hardware access to modeling components exposed by the compute plane will also facilitate development of future support for other parallel computing technologies such as non-NVIDIA GPUs or neuromorphic hardware.
Neurokernel is a fundamental component of the collaborative workflow needed to accelerate the process of fruit fly brain model development, execution, and refinement by multiple researchers. This workflow, however, also requires a means of efficiently constructing brain models and modifying their structure and parameters, whether in light of output discrepancies observed during validation or to incorporate new experimental data. As noted in the Application Programming Interface section, Neurokernel can currently execute LPU models declaratively specified as GEXF files that each describe an individual LPU’s design as a graph of currently supported neuron and synapse model instances, together with separately specified inter-LPU connectivity patterns. Since this model representation must either be manually constructed or generated by ad hoc processing of connectome data, modification of LPUs is currently time consuming and significantly slows down the improvement of brain models. LPUs explicitly implemented in Python that do not use supported neuron or synapse models are even harder to update because their implementations must be modified directly.
To address these limitations and enable rapid updating and reevaluation of fly brain models, we are building a system based upon graph databases called NeuroArch for the specification and sophisticated manipulation of structural data associated with LPU models and inter-LPU connectivity [
Despite the fruit fly brain’s relative numerical tractability, its successful emulation is an ambitious goal that will require the joint efforts of multiple researchers from different disciplines. Neurokernel’s open design, support for widely available commodity parallel computing technology, and ability to integrate independently developed models of the brain’s functional subsystems all facilitate this joining of forces. The framework’s first release is a step in this direction; we anticipate that aspects of the current design such as connectivity structure and module interfaces will be superseded by newer designs informed by the growing body of knowledge regarding the structure and function of the fly brain. We invite the research community to join this effort on Neurokernel’s website (
This video depicts a natural video signal input to the photoreceptors in the 721 ommatidia that comprise the retina model, the average photoreceptor response per ommatidium, and the outputs (membrane potentials) of select photoreceptors (R1) in the retina and neurons (L1, L2) in the lamina.
(ZIP)
The authors would like to thank Konstantinos Psychas, Nikul H. Ukani, and Yiyin Zhou for developing and integrating the visual system LPU models used to test the software. The authors would also like to thank Juergen Berger for kindly permitting reuse of his fruit fly photograph and thank Nacho Vizcaíno, Richard Benton, Bertram Gerber, and Matthieu Louis for permitting reuse of the robot fly image they composed for the ESF-EMBO 2010 Conference on Functional Neurobiology in Minibrains.