Simulation of the song motor pathway in birds: A single neuron initiates a chain of events that produces birdsong with realistic spectra properties

Birdsong is a complex learned behavior regulated by Neuromuscular coordination of different muscle sets necessary for producing relevant sounds. We developed a heterogeneous and stochastically connected neural network representing the pathway from the high vocal center (HVC) to the robust nucleus of the arcopallium (RA) neurons that drive the muscles to generate sounds. We show that a single active neuron is sufficient to initiate a chain of spiking events that results to excite the entire network system. The network could synthesize realistic bird sounds spectra, with spontaneous generation of intermittent sound bursts typical of birdsong (song syllables). This study confirms experiments on animals and on humans, where results have shown that single neurons are responsible for the activation of complex behavior or are associated with high-level perception events.


Introduction
Birds use acoustic signals for communication purposes and birdsong is known to play a prominent role in sexual selection by influencing female preferences and for territorial defense [1]. This is a very complex behavior regulated by different hormonal and neural mechanisms that interact in an intricate and hierarchical way [1][2][3]. Birdsong production has been studied for a long time and it is important as a small-scale model of higher level language system. The actual connection circuitry of birds or human brain is complex and very difficult to clarify [4]. Nevertheless, we know that bird brain consists of neural pathways where vocal sounds are produced via phonation through pneumatically induced vibrations [5]. In addition, we learned that birds modulate their vocal tract to filter and produce sounds at different frequencies [3]. Given that the sound generation involves modulation of the vocal tract and the coordination of breathing patterns in such a way that the syrinx receives airflow to vibrate and produce PLOS  consistent sounds [6,7], birdsong requires a highly coordinated neuromuscular activity to produce sounds of variability and recognizable details. Birdsong production involves mainly three different muscle sets: upper vocal tract, syrinx and respiratory system [8]. Previous studies have determined that coordination of these muscle sets for song production, at least in birds that learn their song (passeriformes: oscines, hereafter songbirds), is determined by a group of brain nuclei and their connecting pathways [3,8]. The one responsible for song production is known as the motor pathway, and this neural architecture comprises the HVC (high vocal center), the robust nucleus of the arcopallium (RA) and the tracheosyringeal motor nucleus [3,8,9]. The similarities between human speech and birdsong [10], and the complexity of this learned behavior in birds, have stimulated the study of the neural architecture and processes responsible for birdsong production. By using mathematical models, previous studies have attempted to understand and emulate the spiking neural network that produces birdsong [11,12]. Notably, the model introduced by Jin in 2007 [13] realized a very detailed but idealized model of sequence generation for birdsong simulation. In that study, neurons are statically connected by predetermined sequences and neuronal theory is based on Hodgkin-Huxley [14] model. Our study improves those works that use deterministic connections, by applying stochastic rules to connect neurons realizing random connection with defined statistic, as in real brains [15,16]. In previous models, for each class of neurons used, units were considered identical to each other. Here neurons are realized with the required realistic heterogeneity implementing the biologically plausible Izhikevich model [17,18] for simulation. All these improvement and modernization of previous studies resulted in a persuasive evidence that complex behavior like sound generation in birds (a paradigm of language in superior animals) can be initiated by a single neuron that drives a small sized network pathway, supporting many outstanding experiments on humans and animals [19][20][21][22].

Methods
We use a realistic and biological plausible network model (an Izhikevich spiking network [18]) to reproduce the neurological pathway from the HVC to the robust nucleus of the arcopallium (RA) that drives muscle tension for the bird labial oscillation [15,16]. These neurons are connected to realize sequential spiking (see for example Hahnloser, 2002or Spiro, 1999 [23,24]. There could be infinite methods to realize sequential spiking and stereotypical firing that results in the characteristic acoustical features of birdsong. Neurons could be linearly connected in a feed forward manner (for example Abarbanel, 2004) [12], or we could realize other ad hoc circuitry that produces firing sequences. However, a rigid and geometric sequential neuron-to-neuron connection scheme is not realistic for a biological system where neuronal connections are known only by their topological statistics [23,25]. Moreover, neurons characteristics must be heterogeneous (neurons of the same type are actually different from each other) and excitatory and inhibitory neurons ratio should be biologically plausible [18,26] In our tests neuronal connection structure is random (as neural connections are in real brains) but probability of connection to the robust nucleus of the arcopallium is increased in sequence to realize randomness and nevertheless maintain an increasing excitation flow to the RA cells. Results are random for each simulation that is repeated and averaged over several tests for reproducible outputs. The constant activity of one single neuron appears to be the causal entity that drives the complex behavior of the neural system that generates synthetic sounds. The sounds are studied and analyzed in their spectral characteristics, but also we converted them to digital waveform in ".wav" format. These synthetic songs are in this way playable with common digital means for audible sounds and those can be compared with real birdsongs. Files are given in the supplemental material (S1 Audio: "tEvol_PLOS.wave"). We simulated the neural stream that emulates the HVC up to the RA. An object, implemented in the programming language python, represents each neuron. We use a very efficient type of quadratic integrate and fire neuron model, developed by Izhikevich [18]. In this model the two main variables taken into account are the intracellular membrane voltage v and the cell recovery potential u as in the following: The variables I, v and u represent current, voltage and recovery potentials. The three parameters k 1 , k 2 and k 3 are obtained by fitting the dynamics of real cortical neurons, they are set to 0.04, 5 and 140 respectively. With these parameters choice v results to be scaled in mV and time in mSec (we will use mV, mA and mSec units in our results, however the reader should be aware that, due to variables substitution implicit in the Izhikevich model, biologically realistic values could be different) [17,26]. The other four parameters a, b, c and d define the dynamical behavior of the model. Heterogeneity is introduced by adding a random variable in the parameter generation algorithms. For the excitatory neurons a = 0.02, b = 0.2, c = −50 + 10 Ã x 2 and d = 2 − 1 Ã x 2 , for the inhibitory ones a = 0.02 + 0.08 Ã x, b = 0.25 − 0.05 Ã x 2 , c = −50 and d = 2. In this equation x is a random variable between 0 and 1 (x 2 represents simply a different random variable). With this scheme baseline values are added to random fluctuating components for the heterogeneity. An initialization procedure generates a number of HVC neurons and the same amount of RAs, for each of these groups 80% are excitatory and 20% inhibitory.
Our parameter choices are based on Izhikevich 2003 and realize neurons of "chattering spiking" (CH) types [26].
Connections to neighbor neurons are defined by an array of integers " A that points to neurons labels and by an array of floats " W that contains the corresponding connections strength values (from 0 to 1).
Neurons belonging to the same type (HVC or RA) are connected with each other with a modified small world architecture [27,28] of dimension one. This means that each neuron has two neighbors, one on its left and one on its right. No random connections are implemented within the same neuron type group, randomness is realized by heterogeneity and random connections between groups as explained below. A single neuron in the HVC group, called initiator and labeled as number one, initiates the spiking, and because of this connection structure, all HVC neurons are set to spike in sequence accordingly to random connections of increased strength as described hereafter.
In previous work of Abarbanel et al. [12] HVC neurons are all identical and connected in an exact sequence to drive the RAs network. This type of neuronal sequence is not stochastic so biologically implausible [25]. The statistical connection algorithm that realizes randomness defines the connection strength W ij for two neurons i and j is where i and j are the neurons index for the HVC (i) and the RAs (j) groups. N p is the total number of RAs, X and X 0 are random numbers between 0 and 1. For every neuron in HVC, this function generates an uniform random number X in the range between 0 and 1. This value is compared with i N p ; if the latter is greater than X, the neuron is connected to the corresponding RA j . In this way, when the ratio p ¼ i N p grows the probability of connection does as well and reaches 100% when i = N p . The connection strength W ij is a random value ranging from a minimum value of h min = 0.5 to h max = 1; in case of no connection W ij = 0.
So, for each neuron in the network, the input is determined adding up all the contributions coming from connected pre-synaptic neurons. The parameters W ij are used as conductance weights. In other words, the current I in Eq 1 is here the index i of I i and v i represents the current neuron and the index j the neurons connected to it. When this neuronal architecture is constructed (see Fig 1 for a sketch of it), the single initiator neuron can start a chain of events that brings the HVC network to spike and induces, sequentially, an increasing number of RA neurons to spike. This structure realizes both biologically plausible randomness/heterogeneity and strong stereotyped firing observed in birdsong [23].
Parts of the RA neurons are connected to the muscle that regulates the syringeal tension and parts to those that regulate bronchial pressure [12]. We chose at random 50% of RA being of the first type and the remaining RA neurons of the other. The selection is done using the count number of each of them and selecting odd and even numbers for the two classes. At any In a similar fashion are setup the RA neurons in the adjacent ring on the right. To elucidate the HVC-RA stochastic connection algorithm, the first few HVC neuron are schematically numbered as "1", "2", "3" etc. and the random connections to RAs neurons are represented by lines going on the right. The probability of connection to RA grows counterclockwise (arrow) to realize the sequential spiking. Connections are unidirectional from HVC to RA and are indicated by the curved lines (not all connections are shown, those from neuron 4 onwards are interrupted for clarity, connections are random and differ for each simulation). Each neuron is a python object that executes the Izhikevich neural model (Eq 1). Half of RA neurons connect to muscles that drive the bronchial pressure (P) and half to the labial tension (T). A neuronal recruitment model is used to integrate neuronal spikes and activate muscles (Eqs 4 and 5). RA neurons connect to a physical model of the syrinx regulated in this way by tension and bronchial pressure (Eq 6). https://doi.org/10.1371/journal.pone.0200998.g001 Simulation of song motor pathway in birds brains, a neuron initiates the spiking chain that produces birdsong PLOS ONE | https://doi.org/10.1371/journal.pone.0200998 October 5, 2018 given time, the tension and pressure driving input of the syringeal and bronchial muscular apparatus are function of the membrane potentials accordingly to a muscular recruitment process. In other words, the tension of the muscles is proportional to the number of firing motor neurons (recruitment [12]) and decays exponentially if there are no firing ones, or if the total potential is lower than a threshold. This is realized by the following equation: where pos is the positive function, Th is a threshold fixed at -64 mV, RA t and RA p are the membrane potentials for the two RA populations (t and p are the corresponding indexes and N is the total number of neurons for each population. In our case of the two groups Ra t and Ra p have same size, that is half of the total RA neurons). The values τ t and τ p are the time constants of the recruiting process, also those are supposed to be equal for the two populations τ t = τ p = 10 msec. To calibrate suitable variables range, two scaling functions are defined: these are proportional to the time-dependent values for tension and pressure (T and P). In our case alpha 0 = 0.9, alpha 1 = 50 Ã 10 −3 , β 0 = 15 Ã 10 −3 and β 1 = 8.75 Ã 10 −3 . These constants were determined analyzing the equations in the work of Abarbanel [12] and adapting to our specific neural network. The differential equation describing the physical motion is as follows [12,29]: The variable x represents the difference between the position of the labium (prephonatory site) and its rest point, y represents the corresponding velocity. These formalisms may look counter intuitive, but we used these conventions to keep the same expression format of previous well-known literature (for example again Abarbanel 2004).
In this equation the coefficient of y represents the linear dissipation, in this case the variable pressure β. The pressure variability is caused by the neural recruiting process and it acts on the velocity y damping its amplitude. The coefficient C instead is the quadratic non-linear factor of this model. Using the works of Gardner and one of Laje [11,29] as a guideline, and considering our different scaling variables, this is set to C = 0.4. Preliminary tests (not shown) indicated that this value keeps main peak centered about 500 Hz. It is a critical value that is kept constant in all the simulations presented here for reason of uniformity between all graphs. We integrate this differential equation with the Euler method. This method has been chosen since it is fast and precise enough. In literature simulation with Izhikevich neurons are integrated with time step of dT = 1msec [18], here we use dT = 0.1 msec for better stability and robustness to noise. A python controlled graphic engine (blender, www.blender.org) was used to change color and vertical length of each neuronal block when active and was used to visualize in quasi real-time the dynamics of the network for testing, optimizing and evaluation (an example of these visualizations is shown in the supporting materials S1 Video: "blendNeurVisual.avi"). A statistical characterization of the resulting membrane potential is shown in Figs 2 and 3.

Results and discussion
Birds produce a large diversity of sounds, which usually include pure-tone whistles, broadband sounds, frequency and amplitude modulations, click-like sounds, and noise. In despite of the large diversity in bird sounds, birdsong is recognized as tonal [30]. Given that birdsong production is regulated by respiratory patterns [31], songs have different duration, with an average of two seconds. In addition, a note (the minimum phonetic unit of songs) can last between 10 and 100 ms. Although the fundamental frequency of birdsong usually lies between 3 to 5 KHz [30], birds may produce songs in a large range of frequencies. Our synthetically produced sounds resemble birdsong in tonality, duration, and fundamental frequency. In Fig 4, we Simulation of song motor pathway in birds brains, a neuron initiates the spiking chain that produces birdsong present a simple two notes song of a common songbird, the great tit Parus major, (Fig 4a), compared to one of the resulting sounds from our model (Fig 4b). Considering the fact that the neural pathway simulated in the model is extremely simple and triggered by a single activating neuron, the similarities between the real sounds and the simulated one are striking.
Our simulation results are compatible with those obtained by Henry D. I. Abarbanel et al. [12] that realized a model of the song motor pathway in birds. We have significantly improved that model by introducing the Izhikevich equations, describing the variations over the time of   Simulation of song motor pathway in birds brains, a neuron initiates the spiking chain that produces birdsong the voltage response and the recovery variable which represents the difference between all inward and outward voltage-gated currents [18].
Moreover, different from the work mentioned above where the number of neurons were fixed to 10 units, our model allows more neurons, theoretically unlimited quantity, and was constructed by establishing a biological plausible stochastic architecture between the nucleus HVC and the robust nucleus of the arcopallium (RA). For each simulation run used in our work only one HVC type neuron, named initiator above, is stimulated with always the same constant strength (except when the influence of current is studied) and it induces the HVC small world network to become active and to drive the second group of RAs neurons as described previously. Eqs (1), (4), (5) and (6) are integrated with a time step of 0.1 ms for a total simulation time of 1 second. The nonlinear parameter C (equal to 0.4) and the two recruitment time constants τ t and τ p are kept both equal to 10 ms for the entire sets of simulations (except when tau was examined). The parameters τ represent the decay rate of the pressure and tension variables, in other words how fast they decay once stimuli are removed. To test reproducibility and stability, each simulation run has been repeated 3 times with different random seeds when generating neuronal parameters and connection strengths. So different runs differ slightly one from another. Trend and behavior were identical for all simulations, plots shown are one example of the three simulations, not an average of them.
Results of the simulation show intermittent burst of oscillation, which scaled in time, will result in the birdsong-like syllables. In Fig (5), we can observe the wave of pressure and tension the network produces in the upper panel, whereas in the lowest the actual labial movement is shown with clear oscillatory behavior. The network reproduces long period of labial inactivity spontaneously. Notably, one single initiator neuron is the cause of the chain of spikes that finally result in these trains of vibrations that resemble the natural birdsong as reported in many birdsong studies [1,6].
Waveforms so obtained are scaled in time, using a sampling frequency F derived directly by the time step of the Izhikevich model integration: F = 1/10 −3 dT where dT is the time step and 10 −3 is a time unit conversion to fit Izhikevich time scale of one millisecond. In the online material it is possible to download a playable file of the waveform obtained (supporting information S1 Audio: "tEvol_PLOS.wave", generated by the scipy.io.wavfile.write library with sampling rate of F and scaled over 16 bits). Beside acoustical analysis, a comparative frequency study of the output of the simulation has been performed. The Fourier spectrum of the wave is taken and prominent peaks and frequency positions are studied in function of relevant network parameters (Fig 6). Neural networks systems are extremely non-linear, so definition of parameters is fundamental for stable and reproducible performance of the network. We conducted a series of simulations changing a variety of parameters keeping C constant at 0.4: we varied HVC and RA type neurons number from 1 to 32 (for each set of neurons) (Fig 7). The main pitch of the generated sound is represented by peak number 2, and the position more than doubles in value with increased network size, for example for N = 5, peak 1 frequency is about 100 Hz, whereas for N = 15 it reaches about 500 Hz. Interestingly, once the network is big enough, about 20 neurons, frequency stabilizes and then drops as there is a mechanism that realizes an optimal network size for the system. The amplitude of the input stimulus given to the initiator neuron is studied in the range 0 to 100 mA, Fig 8. The network is affected by a generalized noise level, a current value that is superimposed homogeneously on all neurons of the entire network to emulate real brains internally generated noise (this parameter is added to I i of Eq 3). We call it here INLVL (internal noise level) and it is examined from 0 to 10 mA. Also in this case, noise produce resonant falls in the spectrum, as shown in Fig 9. Eq 4 describe the threshold phenomena occurring in the neurological process of recruiting. À T t t represents the exponential decay from the maximum recruiting value reached. The speed of decay is τ t or τ p depending on the different variable in consideration (bronchial pressure or muscle tension). In our simulations these are chosen to be both equal and to range between few to 100 msec (Fig 10).

Conclusions
We constructed a random and heterogeneous spiking network with a realistic neural model to emulate the HVC to RA motor pathway in birds. Our model is made by few tens of interconnected neurons per class, it is a minimal model based on current literature concerning the structure of bird brains. Given the limited knowledge about the exact neuronal connections, this neural pathway is purely theoretical. Moreover, the size and level of interconnections is minimal compared to real brains. Even with these limitations, the simulation demonstrates that it is possible to build a simple neuronal pathway that realizes complex sounds strikingly similar to birdsongs. It shows that a single neuronal activation can initiate the chain of events that leads to the song generation. We could demonstrate that a recruiting system is at play during sound generation and that synthetic waveforms comparable to birdsounds are produced by the network. Not only sounds are similar in center tonality and frequency profile, but also train of syllables are generated spontaneously by the network with timing and rhythm compatible to natural birdsongs. Our sounds are similar to those described by Nowicki as typical birdsong, which includes nearly sinusoidal pure tones and harmonic sounds, where frequency and amplitude modulations  Simulation of song motor pathway in birds brains, a neuron initiates the spiking chain that produces birdsong  Simulation of song motor pathway in birds brains, a neuron initiates the spiking chain that produces birdsong may be present [32]. Moreover, birdsong frequency usually ranges between 3 and 5 KHz, but birds may sing between 100 Hz and above 10 KHz [32] and our sounds, using the simulation time in the Izhikevich neuronal model, results in raw sounds centered around 500-600 Hz. The waveforms obtained could be scaled in frequency by mere re-sampling and sounds similar to birdsongs of species with higher tones can be obtained, see the supplemental material S1 Audio for an example audio file of our waveforms ("tEvol_PLOS.wave").
The chain of spiking that produces this complex behavior, is generated by a single initiator neuron that is spiking first, all other spiking in the network are spontaneous consequence of this single neuron activity, in accordance to recent experiments [33]. Our model and results are consistent with the idea that the expression of complex behaviors can be generated by a statistically connected chain of neurons similarly to previous experiments on animals and models. This study suggests the existence of a simple, self-regulating and extremely efficient mechanism for the generation of complex activity as birdsongs and paves the way for the complete simulation of more complex animal behavior with simple and primitive spiking networks instruments.
Supporting information S1 Audio. An example of the sounds produced by our network. Sounds are similar in center tonality and frequency profile, train of syllables are generated spontaneously with timing and rhythm compatible to real birdsongs. Birdsong frequency usually ranges between 3 and 5 KHz, but birds may sing between 100 Hz and above 10 KHz. The waveforms produced by the network could be scaled in frequency by mere re-sampling to obtain sounds similar to those of species with birdsong of higher tones. (WAVE) Simulation of song motor pathway in birds brains, a neuron initiates the spiking chain that produces birdsong S1 Video. A video that shows how we visualize the dynamics of our network. Each block represents a neuron, the hight of the block is proportional to the membrane potential and can be seen changing in time. Connections between neurons are not shown, but the two rings represent the HVC neurons and the RA ones. The central block represent the syrinx muscle system and its elongation is proportional the air pressure. This visualization system was useful especially at the beginning of our research as simple sanity check, but also was useful to have a qualitative insight on the flow of information along the neural pathway. (AVI)