Time-Warp–Invariant Neuronal Processing

doi:10.1371/journal.pbio.1000141

Figure 1.

Time warp in natural speech.

Sound pressure waveforms (upper panels, arbitrary units) and spectrograms (lower panels, color-code scaled between the minimum and maximum log power) of speech samples from the TI46 Word corpus [24], spoken by different male speakers. (A and B) Utterances of the word “one.” Thin black lines highlight the transients of the second, third, and fourth (bottom to top) spectral peaks (formants). The lines in (A) are compressed relative to (B) by a common factor of 0.53. (C and D) Utterances of the word “eight.”

Figure 2.

Time-warp–invariant voltage traces.

Spike rasters show a random spike pattern across N = 500 afferents (N_ex = 250 excitatory and N_in = 250 inhibitory), each of which fires a single action potential at a random time chosen uniformly between 0 and 500 ms. Whereas the original spike pattern (β = 1) is shown in (B), the pattern displayed in (A) is compressed by a factor of β = 0.5. In each panel, the lower trace depicts the voltage V(t,β) induced by the spike patterns in our model neuron with balanced uniform synaptic peak conductances that resulted in a zero mean synaptic current at rest set to for excitatory synapses and for inhibitory synapses. These values result in a mean total synaptic conductance of . In (B), the voltage trace V(t,1) (thin grey line) is superimposed on the rescaled voltage trace V(βt,β) (thick black line) from (A).
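
The random latency pattern and its global time warp described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the seed, and the use of Python's standard `random` module are assumptions made for the example. Only the afferent count (N = 500), the 0 to 500 ms window, and the factor β = 0.5 come from the caption.

```python
import random


def random_latency_pattern(n_afferents=500, t_max_ms=500.0, seed=0):
    """One spike per afferent, at a time drawn uniformly between 0 and t_max_ms.

    The seed is a hypothetical choice that only makes the example reproducible.
    """
    rng = random.Random(seed)
    return [rng.uniform(0.0, t_max_ms) for _ in range(n_afferents)]


def time_warp(spike_times_ms, beta):
    """Global time warp: scale every spike time by beta (beta < 1 compresses)."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    return [beta * t for t in spike_times_ms]


pattern = random_latency_pattern()       # beta = 1, as in panel (B)
compressed = time_warp(pattern, 0.5)     # beta = 0.5, as in panel (A)
```

Because every spike time is multiplied by the same factor, the order of spikes across afferents is unchanged and only the time axis is rescaled.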

Figure 3.

Classification of time-warped random latency patterns.

(A) Error probabilities versus the scale of global time warp β_max for the conductance-based (blue) and the current-based (red) neurons. Errors were averaged over 20 realizations; error bars depict ±1 standard deviation (s.d.). Isolated points on the right were obtained under dynamic time warp with β_max = 2.5 (see Materials and Methods). (B) Dependence of the error frequency at β_max = 2.5 on the resting membrane time constant τ_m (left) and the synaptic time constant τ_s (right). Colors and statistics as in (A). (C) Voltage traces of a conductance-based neuron (top and second rows) and a current-based neuron (third and bottom rows). Each trace was computed under global time warp with a temporal scaling factor β (color bar; see Materials and Methods) and plotted versus a common rescaled time axis. For each neuron model, the upper traces were elicited by a target and the lower traces by an untrained spike template.

Figure 4.

Adaptive learning kernel.

Change in synaptic peak conductance Δg versus the time difference Δt between synaptic firing and the voltage maximum, as a function of the mean total synaptic conductance G during this interval (color bar). Data were collected during the initial 100 cycles of learning with β_max = 2.5 and averaged over 100 realizations.

Figure 5.

Task dependence of the learned total synaptic conductance.

Error frequency of the conductance-based tempotron versus its effective integration time τ_eff. After switching from time-warp to Gaussian spike jitter, τ_eff increased as the mean time-averaged total synaptic conductance G decreased with learning time (inset).

Figure 6.

Auditory front end.

(A and B) Incoming sound signal (bottom) and its spectrogram in linear scale (top) as in Figure 1D (A). Based on the spectrogram, the log signal power in 32 frequency channels (Mel scale, see Materials and Methods) is computed and normalized to unit peak amplitude in each channel ([B], top, colorbar). Black lines delineate filterbank channels 10, 20, and 30 and their respective support in the spectrogram (connected through grey areas). In each channel, spikes in 31 afferents (small black circles) are generated by 16 onset (upper block) and 15 offset (lower block) thresholds. For the signal in channel 1 (shown twice as thick black curves on the front sides of the upper and lower blocks), resulting spikes are marked by circles (onset) and squares (offset) with colors indicating respective threshold levels (colorbar). (C) Spikes (onset, top, and offset, bottom) from all 992 afferents plotted as a function of time (x-axis) and corresponding frequency channel (y-axis). The color of each spike (short thin lines) indicates the threshold level (as used for circles and squares in [B]) of the eliciting unit.
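
The threshold-based spike generation described in this caption can be illustrated with a small sketch. The channel and afferent counts (32 channels, 16 onset and 15 offset thresholds, 992 afferents in total) are taken from the caption. Everything else is an assumption for illustration: the evenly spaced threshold levels, the rule that an afferent fires at every crossing of its level, the sample-resolution spike times, and all function names.

```python
def evenly_spaced_levels(n):
    """Hypothetical threshold levels strictly between 0 and the unit peak."""
    return [(k + 1) / (n + 1) for k in range(n)]


def threshold_spikes(signal, dt_ms, levels, direction):
    """Spike times (ms) for each threshold level, one list per afferent.

    direction="onset" detects upward crossings, "offset" downward crossings.
    The signal is assumed to be normalized to unit peak amplitude.
    """
    spikes = []
    for level in levels:
        times = []
        for i in range(1, len(signal)):
            prev, cur = signal[i - 1], signal[i]
            if direction == "onset":
                crossed = prev < level <= cur
            else:
                crossed = prev >= level > cur
            if crossed:
                times.append(i * dt_ms)
        spikes.append(times)
    return spikes


N_CHANNELS = 32
ONSET_LEVELS = evenly_spaced_levels(16)
OFFSET_LEVELS = evenly_spaced_levels(15)
n_afferents = N_CHANNELS * (len(ONSET_LEVELS) + len(OFFSET_LEVELS))  # 32 * 31

# Toy channel signal (not real speech): a triangular envelope with unit peak.
channel = [0.0, 0.25, 0.5, 0.75, 1.0, 0.75, 0.5, 0.25, 0.0]
onset = threshold_spikes(channel, 10.0, ONSET_LEVELS, "onset")
offset = threshold_spikes(channel, 10.0, OFFSET_LEVELS, "offset")
```

For the toy envelope, each onset afferent fires once on the rising flank and each offset afferent fires once on the falling flank, so the channel contributes 31 spikes.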

Table 1.

Test set error fractions of individual detector neurons.

Figure 7.

Speech-recognition task.

(A) Learned synaptic peak conductances. Each pixel corresponds to one synapse characterized by its frequency channel (right y-axis) and its onset (ON) or offset (OFF) afferent power threshold level (x-axis, in percent of maximum signal powers [see Materials and Methods]). Learned peak conductances were color coded with excitatory (warm colors) and inhibitory conductances (cool colors) separately normalized to their respective maximal values (color bar). The left y-axis shows the logarithmically spaced center frequencies (Mel scale) of the frequency channels. (B) Spike-triggered target stimuli (color-code scaled between the minimum and maximum mean log power). (C) Mean voltage traces for target (blue, light blue ±1 s.d.; spike triggered) and null stimuli (red; maximum triggered).

Figure 8.

Time-warp robustness.

(A) Error versus time-warp factor β. (B) Mean errors over the range of β shown in (A) (digit color code; triangles: female speakers, circles: male speakers) versus the mean effective time constant τ_eff calculated for β = 1 by averaging the total synaptic conductance over 100-ms time windows prior to either the output spikes (target stimuli) or the voltage maxima (null stimuli). (C) Mean voltage traces for time-warped target patterns for the neurons shown in Figure 7. Bottom row: conductance-based neurons, upper row: current-based neurons (see Materials and Methods).
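
The windowed averaging behind τ_eff in (B) can be sketched as follows. The 100-ms window preceding an event (an output spike or a voltage maximum) comes from the caption. The relation τ_eff = τ_m / (1 + G), with G expressed in units of the leak conductance, is the standard expression for a conductance-based membrane and is used here as an assumption, since the caption does not state the formula. The function names and the numerical values are hypothetical.

```python
def mean_conductance_before(g_total, dt_ms, event_ms, window_ms=100.0):
    """Average the total synaptic conductance over the window before an event.

    g_total is a list of samples spaced dt_ms apart, starting at t = 0.
    """
    end = int(round(event_ms / dt_ms))
    start = max(0, end - int(round(window_ms / dt_ms)))
    window = g_total[start:end]
    if not window:
        raise ValueError("event lies before the first sample")
    return sum(window) / len(window)


def effective_time_constant(tau_m_ms, g_mean):
    """Assumed relation: tau_eff = tau_m / (1 + G), G in units of leak conductance."""
    return tau_m_ms / (1.0 + g_mean)


# Toy trace (illustrative values only): no synaptic input for 100 ms,
# then a constant total conductance of 3 leak units for 100 ms.
g_trace = [0.0] * 100 + [3.0] * 100
g_mean = mean_conductance_before(g_trace, dt_ms=1.0, event_ms=200.0)
tau_eff = effective_time_constant(100.0, g_mean)
```

Under this assumed relation, a larger mean conductance in the window shortens the effective integration time, which is the direction of the dependence plotted in (B).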

Figure 9.

Robustness to spike failures.

The error fraction of each digit detector neuron was measured as a function of the spike failure probability over the range from 0% to 10% and fitted by linear regression. For each neuron, the resulting slope (median 0.0069) is plotted versus the intercept (median 0.0061) with symbols and colors as in Figure 8B. The median R² of the linear regression fits was 0.94. The inset shows the median error fraction of the population as a function of the spike failure probability in the range of 1% to 50%, with the robust regime breaking down at approximately 20%.
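
The two ingredients of this analysis, random spike deletion and a linear fit with its R², can be sketched with the standard library alone. This is an illustrative sketch, not the authors' procedure: the independent per-spike deletion rule, the function names, and the synthetic data points are assumptions, and the numbers below are not results from the study.

```python
import math
import random


def drop_spikes(spike_times, p_fail, rng):
    """Delete each spike independently with probability p_fail (assumed rule)."""
    return [t for t in spike_times if rng.random() >= p_fail]


def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must contain at least two distinct values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else math.nan
    return slope, intercept, r_squared


# Synthetic, exactly linear points used only to exercise the fit.
failure_prob = [0.0, 1.0, 2.0, 3.0]
toy_error = [2.0, 2.5, 3.0, 3.5]
slope, intercept, r_squared = linear_fit(failure_prob, toy_error)
```

In an analysis of this kind, the fit would be repeated for each detector neuron, giving one slope and intercept pair per neuron.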
