The Temporal Winner-Take-All Readout

How can the central nervous system make accurate decisions about external stimuli at short times on the basis of the noisy responses of nerve cell populations? It has been suggested that spike time latency is the source of fast decisions. Here, we propose a simple and fast readout mechanism, the temporal Winner-Take-All (tWTA), and undertake a study of its accuracy. The tWTA is studied in the framework of a statistical model for the dynamic response of a nerve cell population to an external stimulus. Each cell is characterized by a preferred stimulus, a unique value of the external stimulus for which it responds fastest. The tWTA estimate for the stimulus is the preferred stimulus of the cell that fired the first spike in the entire population. We then pose the questions: How accurate is the tWTA readout? What are the parameters that govern this accuracy? What are the effects of noise correlations and baseline firing? We find that tWTA sensitivity to the stimulus grows algebraically fast with the number of cells in the population, N, in contrast to the logarithmic slow scaling of the conventional rate-WTA sensitivity with N. Noise correlations in first-spike times of different cells can limit the accuracy of the tWTA readout, even in the limit of large N, similar to the effect that has been observed in population coding theory. We show that baseline firing also has a detrimental effect on tWTA accuracy. We suggest a generalization of the tWTA, the n-tWTA, which estimates the stimulus by the identity of the group of cells firing the first n spikes and show how this simple generalization can overcome the detrimental effect of baseline firing. Thus, the tWTA can provide fast and accurate responses discriminating between a small number of alternatives. High accuracy in estimation of a continuous stimulus can be obtained using the n-tWTA.


Introduction
In recent years, there has been growing interest in coding information about external stimuli by the fine temporal structure of the neural dynamic response [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][16][17][18]. Several studies have shown that response latency is modulated by external stimuli [1][2][3][4]. Many cells in the middle temporal (MT) cortex code for the direction of motion of visual stimuli, and can be characterized by a preferred direction of the stimulus, to which they respond maximally, see e.g., [19,20]. Osborne et al. [1] reported that the MT cells respond with the shortest delay when the stimulus is in their preferred direction and that the response delay increases as the stimulus direction diverges from the preferred direction of the cell. In the auditory system of the ferret, Nelken et al. [2] showed response-latency tuning in primary auditory cortex cells to the direction of a virtual sound source. In a recent work Gollisch and Meister [18] showed that relative first-spike times of retinal ganglion cells carry considerable information about the external stimulus, but they did not suggest a concrete readout mechanism.
Here we study the accuracy of a simple readout mechanism, the temporal-Winner-Take-All (tWTA), which extracts information from response latency. The tWTA estimates the stimulus by the identity of the cell that fired the first spike in a population of cells, in contrast to the conventional rate-Winner-Take-All (WTA), which estimates the stimulus by the identity of the cell that fired the most spikes. For example, the tWTA estimate for the direction of motion of a visual stimulus from the responses of a population of MT cells would be the preferred direction of the cell that fired the first spike in the entire population.
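As an illustration of the two decision rules, the following minimal Python sketch contrasts the tWTA with the rate-WTA on a hand-made set of spike trains. The function names, the toy spike times and the 60 ms counting window are assumptions made for this example only; they are not part of the model studied below.

```python
import numpy as np

def twta_readout(spike_times, preferred):
    """Preferred stimulus of the cell that fired the first spike."""
    first = np.array([t.min() if t.size else np.inf for t in spike_times])
    return preferred[int(np.argmin(first))]

def rate_wta_readout(spike_times, preferred, window):
    """Preferred stimulus of the cell that fired the most spikes in [0, window]."""
    counts = np.array([np.sum(t <= window) for t in spike_times])
    return preferred[int(np.argmax(counts))]

# Hypothetical spike times (ms) of three cells with preferred directions 0, 90, 180 deg.
preferred = np.array([0.0, 90.0, 180.0])
spike_times = [
    np.array([12.0, 30.0, 41.0]),  # fires the most spikes
    np.array([8.0, 50.0]),         # fires the earliest spike
    np.array([20.0]),
]

twta_estimate = twta_readout(spike_times, preferred)
rate_estimate = rate_wta_readout(spike_times, preferred, window=60.0)
print(twta_estimate, rate_estimate)
```

In this toy example the two readouts disagree: the tWTA follows the cell with the earliest spike, whereas the rate-WTA follows the cell with the largest spike count.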
Considerable theoretical effort has been devoted to the study of the accuracy of population code readout mechanisms, such as the population-vector, optimal-linear and ideal observer readouts. Of particular interest in the investigation of these readouts was the dependence of the readout accuracy on the population size and the effects of noise correlations in the neuronal responses. In this work, we quantify tWTA accuracy. To this end, we address three specific questions. One, what are the essential features of the neuronal dynamic response to the stimulus to which the tWTA is sensitive? Two, how does the tWTA accuracy depend on the population size? Three, what are the effects of noise correlations and baseline firing on tWTA accuracy?
These questions are addressed in the framework of a statistical model for the dynamic response of MT cells to a moving visual stimulus. In the first part of the results section we investigate tWTA accuracy in a two-column competition model, and in the second part we study tWTA accuracy in the framework of a hypercolumn model. Both parts start by defining the statistical model of the neuronal dynamic response and then follow with an investigation of tWTA accuracy in the absence of noise correlations and baseline firing. In the final stage of each part, correlations and baseline firing are introduced and their effect on tWTA accuracy is investigated.

tWTA Readout Accuracy in a Two Competing Columns Model
The model. We study tWTA accuracy in a model of two competing MT columns coding for the direction of motion of visual stimuli. Each column comprises $N$ homogeneous cells. We denote the preferred direction of the cells in column 1 by $\phi_1$ and the preferred direction of the cells in column 2 by $\phi_2$. Without loss of generality, we take $\phi_1 = 0$, which is equivalent to measuring all angles with respect to $\phi_1$. We denote by $f(t_k^{(i)} \mid \theta - \phi_i)$ the probability density of a single cell $k$ ($k = 1, \ldots, N$) in column $i$ with preferred direction $\phi_i$ to fire its first spike at time $t_k^{(i)}$, given that stimulus $\theta$ was presented at time $t_o = 0$. Assuming that first-spike times are statistically independent, the probability density of the first spike in the entire column $i$ at time $t$ is given by the product of three terms: the probability density of a specific cell to fire its first spike at time $t$, $f(t \mid \theta - \phi_i)$; the probability that the first spike times of the remaining $N - 1$ cells in the population occurred after time $t$, $\left(1 - \int_0^t dt'\, f(t' \mid \theta - \phi_i)\right)^{N-1}$; and the $N$ different possibilities of choosing the cell that fired the first spike:

$$p\left(t^{(i)} = t \mid \theta\right) = N f(t \mid \theta - \phi_i) \left(1 - \int_0^t dt'\, f(t' \mid \theta - \phi_i)\right)^{N-1} = N f(t \mid \theta - \phi_i)\, e^{(N-1) W(t \mid \theta - \phi_i)}, \qquad W(t \mid \theta) \equiv \log\left(1 - \int_0^t dt'\, f(t' \mid \theta)\right).$$

The function $W$ is the logarithm of the probability of a single cell firing its first spike after time $t$. It follows from this definition that $W(0 \mid \theta) = 0$ and that $W$ is a non-increasing function of $t$.

Throughout this section, we will quantify tWTA accuracy by using the two alternative forced choice (2AFC) paradigm. In a 2AFC discrimination task, the system is given a stimulus, either $\theta_1$ or $\theta_2$, randomly with equal probabilities. Presentation of the stimulus generates a population response in the two columns, $t^{(1)}$ and $t^{(2)}$, which are distributed as defined above. The task of the readout is to infer, on the basis of these spike times, whether the stimulus was $\theta_1$ or $\theta_2$. We will use the probability of correct discrimination, $P_C$, and the error rate, $P_e = 1 - P_C$, as measures of the tWTA performance.

We will use the term sensitivity to designate the inverse of the stimulus difference, $\Delta\theta$, at which $P_C$ crosses a certain threshold, $P_{th}$. This latter measure is related to the 'just noticeable difference' used in psychophysics.
tWTA accuracy in the absence of correlations. Assuming that column 1 responds faster to a stimulus in direction $\theta_1$ than column 2, and vice versa for stimulus $\theta_2$, we define the tWTA readout in the 2AFC task as follows: decide that the stimulus was $\theta_1$ if column 1 fired the first spike, $t^{(1)} < t^{(2)}$, and $\theta_2$ otherwise. For the sake of convenience, we take $\theta_1 = \phi_1 = 0$ and $\theta_2 = \phi_2 = \delta\theta$. This choice equalizes the probability of correct response given stimulus $\theta_1$ and given stimulus $\theta_2$. The probability of correct response, $P_C$, is given by the probability that population 1 fired the first spike, given stimulus $\theta_1$. Thus, $P_C$ can be written as the integration over all possible first-spike times, $t$, of the probability density that population 1 fired its first spike at time $t$ multiplied by the probability that the first spike time of population 2 is larger than $t$:

$$P_C = \int_0^\infty dt\, N f(t \mid 0)\, e^{(N-1) W(t \mid 0)}\, e^{N W(t \mid \delta\theta)}. \tag{4}$$

In the limit of large populations, $N \to \infty$, the integral on the right-hand side of equation (4) will be dominated by the region in which the exponent attains its maximum. Since $W$ is a monotonically decreasing function of $t$, this region is the region of small $t$. For small $t$, we approximate $f$ by a power law in $[t - \tau]_+$, where $\rho$ is the scale parameter, $\alpha$ is the shape parameter, $\tau$ is the delay parameter, and $[x]_+ = x$ for $x > 0$ and 0 otherwise.
Author Summary
Considerable experimental as well as theoretical effort has been devoted to the investigation of the neural code. The traditional approach has been to study the information content of the total neural spike count during a long period of time. However, in many cases, the central nervous system is required to estimate the external stimulus at much shorter times. What readout mechanism could account for such fast decisions? We suggest a readout mechanism that estimates the external stimulus by the first spike in the population, the tWTA. We show that the tWTA can account for accurate discriminations between a small number of choices. We find that the accuracy of the tWTA is limited by the neuronal baseline firing. We further find that, due to baseline firing, the single first spike does not encode sufficient information for estimating a continuous variable, such as the direction of motion of a visual stimulus, with fine resolution. In such cases, fast and accurate decisions can be obtained by a generalization of the tWTA to a readout that estimates the stimulus by the first n spikes fired by the population, where n is larger than the mean number of baseline spikes in the population.

Relation to the peri-stimulus time histogram (PSTH) in an inhomogeneous Poisson process (IHPP). The IHPP is widely used to model the stochastic nature of the neural temporal response [21,22] and is fully defined by the PSTH. In the context of the first-spike-time distribution, the choice of an IHPP model does not limit the generality of the model, since every PSTH, $r(t)$, of an IHPP can be mapped to a first-spike-time distribution, $f(t)$, and vice versa. For a given IHPP with PSTH $r(t)$, the first-spike-time distribution is given by (see e.g., [21,22])

$$f(t) = r(t) \exp\left(-\int_0^t dt'\, r(t')\right). \tag{6}$$

In the other direction, we want to obtain the PSTH, $r(t)$, that will yield a specific first-spike-time distribution, $f(t)$, in an IHPP model. The probability density that the first spike occurs at time $t$ in an IHPP model can be written as the product of the probability density of spiking at that time, $r(t)$, and the probability that there were no prior spikes, $1 - \int_0^t dt'\, f(t')$. Thus we obtain the reciprocal relation

$$r(t) = \frac{f(t)}{1 - \int_0^t dt'\, f(t')}, \tag{7}$$

which can be verified by substituting equation (7) into equation (6). For small $t$, $f(t) \approx r(t)$. Thus, the scale parameter corresponds to the scale of the PSTH, the shape parameter governs the initial acceleration of the PSTH, and the delay parameter measures the temporal shift of the PSTH. Figure 1 illustrates how the different parameters that characterize the initial neural response (scale, shape and delay) affect the first-spike probability density and the corresponding PSTH. Note that whereas $f(t)$ and $r(t)$ are very similar for small $t$, for large $t$, $f(t)$ decays to zero while $r(t)$ may continue to increase. Below we study the different effects of the tuning of these parameters on the accuracy of the tWTA.
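The mapping between the PSTH and the first-spike-time density in an IHPP can be checked numerically. The sketch below starts from a hypothetical PSTH (a linear ramp after a 2 ms delay, chosen only for illustration), computes the first-spike density as the rate multiplied by the probability of no prior spike, and then recovers the rate as the density divided by the survival probability. The grid, the ramp and all variable names are assumptions of this example.

```python
import numpy as np

def cumulative_integral(y, dt):
    """Cumulative trapezoidal integral of y on a uniform grid, starting at 0."""
    return np.concatenate(([0.0], np.cumsum(0.5 * (y[1:] + y[:-1]) * dt)))

dt = 1e-3                                   # time step (ms)
t = np.arange(0.0, 10.0 + dt, dt)           # time axis (ms)
# Hypothetical PSTH in spikes/ms: zero for 2 ms, then a linear ramp.
rate = 0.05 * np.clip(t - 2.0, 0.0, None)

# PSTH -> first-spike density: f(t) = r(t) * exp(-integral of r up to t)
first_spike_density = rate * np.exp(-cumulative_integral(rate, dt))

# First-spike density -> PSTH: r(t) = f(t) / (1 - integral of f up to t)
survival = 1.0 - cumulative_integral(first_spike_density, dt)
rate_recovered = first_spike_density / survival

max_abs_diff = np.max(np.abs(rate_recovered - rate))
print(max_abs_diff)
```

The recovered rate should agree with the original PSTH up to the discretization error of the numerical integration, and the density and the rate should nearly coincide at short times, where the probability of a prior spike is negligible.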
Effect of scale parameter tuning. We first consider a simple model in which the scale is the only parameter that is tuned to the stimulus. In this case, we can write $f$ near $t = 0$ as the product of a function of the stimulus, $\rho(\theta)$, and a function of time, where $\alpha$ is independent of $\theta$. Expanding $W$ in small $t$, $W(t \mid \theta) \approx -\rho(\theta) t^{\alpha}/\alpha$, and substituting in equation (4), we obtain to leading order in $1/N$

$$P_C = \frac{1}{1 + \rho(\delta\theta)/\rho(0)}. \tag{9}$$

Hence, in this case, the probability of correct response is at chance level, $P_C^{\text{chance}} = 1/2$, when the neural response has the same scale for the two alternatives, $\rho(0) = \rho(\delta\theta)$, and increases monotonically with the ratio $\rho(0)/\rho(\delta\theta)$. The accuracy of the tWTA is not improved by increasing $N$: the same accuracy will be obtained with $N = 1$ and $N = 1000$ cells, but somewhat faster in the $N = 1000$ case. Figure 2a shows the probability of correct discrimination as a function of $N$ for different values of $\rho(\delta\theta)$.

Effect of shape parameter tuning. In the case where only the shape parameter, $\alpha$, is tuned to the stimulus, $\rho$ is independent of $\theta$. We assume that population 1, with preferred direction $\phi_1 = 0$, fires faster than population 2, with preferred direction $\phi_2 = \delta\theta$, given stimulus $\theta = 0$, in the sense that for short times the probability of firing of a cell in population 1 is larger than that of a cell in population 2; hence, $\alpha(0) < \alpha(\delta\theta)$. To compute $P_C$ in the limit of large populations, equation (4), it is convenient to make a change of variables to $u(t) = -\left[W(t \mid 0) + W(t \mid \delta\theta)\right]$. Applying Watson's Lemma [23] then yields the asymptotic approximation for the error rate, equation (12). Hence, in this case, the probability of error decays algebraically with $N$ to zero. This scaling of the readout accuracy with population size is similar to the scaling of the conventional rate-WTA accuracy with population size [24].
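For small $\delta\theta$, expanding $\alpha(\delta\theta) \approx \alpha(0) + \delta\theta\, \alpha'(0)$, we obtain equation (13) for the sensitivity. Thus, although in this case tWTA sensitivity improves by utilizing larger populations, this logarithmic improvement is extremely slow. Figure 2b shows the discrimination error rate in the case of shape parameter tuning to the stimulus; compare equation (12).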
Effect of delay parameter tuning. In the case where the delay parameter, $\tau$, is the only parameter that is tuned to the stimulus, the first-spike density vanishes before the delay, $f(t \mid \theta) \propto H(t - \tau(\theta))$, where $H(x)$ is the Heaviside function: $H(x) = 1$ for $x > 0$ and 0 otherwise. In this case, we find (see Methods) that the probability of error decays exponentially fast with the population size, $N$, equation (15), with $A(\alpha)$ defined in Methods. Hence, in this case, the tWTA error rate decays to zero exponentially with $N$, in contrast to the slow algebraic scaling of the conventional rate-WTA accuracy with the population size [24]. For small $\delta\theta$, we can expand the delay parameter, $\tau$, in $\delta\theta$ and approximate $\tau(\delta\theta) \approx \tau' \delta\theta$; we thus find that tWTA sensitivity grows algebraically with $N$, equation (17), in contrast to the logarithmic scaling of the conventional rate-WTA sensitivity with population size [24]. Figure 2c shows minus the logarithm of the discrimination error rate in the case of delay parameter tuning to the stimulus. The open squares are estimates of the tWTA accuracy obtained by averaging tWTA accuracy over $10^6$ realizations of the neural stochastic response. The solid line shows the analytical prediction of equation (15).

The different effects exerted by the scale, shape and delay parameters on the scaling of the tWTA accuracy with the population size highlight the sensitivity of the tWTA to fine details of the first-spike-time distribution. Nevertheless, in the general case, all parameters will be tuned to the stimulus. The dominant contribution to the tWTA accuracy will result from the tuning of the delay parameter. Hence, the tWTA error rate will decay exponentially fast to zero with $N$, and the sensitivity will scale algebraically with $N$. We will therefore focus hereafter on models in which the delay parameter is tuned to the stimulus and ignore the tuning of other parameters to the stimulus.
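The exponential decay of the error rate under delay tuning can be illustrated with a small Monte Carlo sketch. It assumes the simplest special case: a step-function PSTH of height $\rho$, so that each cell's first spike is exponentially distributed after its delay, with zero delay in column 1 and delay $\tau$ in column 2. For this special case a short calculation (made for this example, not quoted from the analysis above) gives $P_e = \frac{1}{2} e^{-N\rho\tau}$, because the first spike of each column is its delay plus an exponential variable with rate $N\rho$. Parameter values and function names are illustrative assumptions.

```python
import numpy as np

def twta_error_rate(n_cells, rho, tau, n_trials=100_000, seed=0):
    """Monte Carlo tWTA error rate for two columns with step-function PSTHs.

    Column 1 (preferred) responds with zero delay, column 2 with delay tau.
    An error occurs when column 2 fires the first spike.
    """
    rng = np.random.default_rng(seed)
    # First spike of each cell: delay + exponential waiting time (rate rho).
    t1 = rng.exponential(1.0 / rho, size=(n_trials, n_cells)).min(axis=1)
    t2 = tau + rng.exponential(1.0 / rho, size=(n_trials, n_cells)).min(axis=1)
    return np.mean(t2 < t1)

rho, tau = 0.05, 5.0          # 50 Hz expressed in spikes/ms; delay in ms
sizes = (1, 4, 16)
simulated = [twta_error_rate(n, rho, tau) for n in sizes]
predicted = [0.5 * np.exp(-n * rho * tau) for n in sizes]
for n, sim, pred in zip(sizes, simulated, predicted):
    print(n, sim, pred)
```

Under these assumptions the simulated error rate should follow the exponential prediction within sampling error, so that doubling the population size squares the factor $e^{-N\rho\tau}$.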
Two important factors that may have a considerable effect on the tWTA accuracy are addressed below. The first is noise correlations in the fluctuations of first spike times of different cells. It has been shown that noise correlations have a considerable effect on population code readout accuracy [25][26][27][28][29]. The second factor is nonzero baseline firing rate.
Effect of correlations on the tWTA accuracy. How should the covariance between first spike times of different cells be modeled? One possible mechanism that can cause correlated firing is a shared input. Two cells that receive a common input that fluctuates above its mean will integrate it over time and reach spiking threshold sooner than their average first spike time. If the common input fluctuates below its average value, the spike times of both cells will be delayed. It is reasonable to assume that cells that are functionally close, i.e., have similar preferred directions, will have more common input. Hence, their first spike times are expected to be more positively correlated. Motivated by this intuition, we model correlations by adding a uniform random shift, $t^{(i)c}$, to the spike times of the cells in column $i = 1, 2$, which represents the effect of fluctuations in shared inputs to cells in every column. Thus, we write the first spike time $t_k^{(i)}$ of neuron $k$ in population $i$ as the sum of a correlated term and an independent term:

$$t_k^{(i)} = t^{(i)c} + t_k^{(i)u},$$

where the independent terms, $\{t_k^{(i)u}\}$, are statistically independent, given the stimulus, and the correlated terms, $t^{(1)c}$ and $t^{(2)c}$, are independent, with probability distribution $f_c(t)$. In the limit of large $N$, the probability of correct discrimination converges to a limit that depends only on the correlated terms (see Methods). Hence, for large populations, the uncorrelated fluctuations can be ignored, and the probability of correct discrimination saturates to a size-independent limit. Figure 3 shows the performance of the tWTA, in terms of percent correct discrimination, as a function of the number of cells in each column, $N$, for increasing values of $\tau_c = 0, 1, 2, 3$ ms from top to bottom. In the simulations, we used a model in which only the delay parameter is tuned to the stimulus.
Specifically, we took a distribution $f_c$ whose width is set by the correlation time scale, $\tau_c$; in this case the probability of correct discrimination is given by equation (20) (see Methods). In the absence of correlations, $\tau_c \to 0 \Rightarrow N_{\text{eff}} \to \infty$, equation (20) converges to equation (15) with $\alpha = 0$ and $W(t \mid 0) = -\rho t$. The error rate, $P_{\text{err}} = 1 - P_C$, decays to zero exponentially with the number of cells, $N$. In the presence of correlations, $\tau_c > 0$, for small populations, $N \ll N_{\text{eff}}$, the tWTA error rate decays exponentially with $N$, as in the uncorrelated case, equation (15). When $N \sim N_{\text{eff}}$, tWTA performance reaches the saturation regime, and tWTA accuracy converges to a finite limit for $N \to \infty$. Hence, in the presence of correlations, for large $N$ the tWTA error rate is an increasing function of $\tau_c$, which saturates to chance level.

Effect of baseline firing on tWTA accuracy. In the above analysis we assumed zero baseline firing for all cells. However, nonzero baseline firing may have a significant effect on the tWTA accuracy. To incorporate baseline firing into our model, it is most convenient to use the framework of the IHPP, which is defined by the PSTH. The PSTHs of the two populations are modeled by:

$$r_i(t \mid \theta) = \rho_o + (\rho - \rho_o)\, H\left(t - T - \tau(\theta - \phi_i)\right), \qquad i = 1, 2, \tag{23}$$

where $\rho_o$ is the baseline firing rate ($\rho_o < \rho$) and $T$ is the duration in which both columns fire at baseline prior to responding selectively to the stimulus. The function $\tau(\theta)$ is the tuning of the delay parameter. As above we take $\tau(0) = 0$ and $\tau(\delta\theta) \equiv \tau$. In this case, $P_C$ can be obtained in closed form. Figure 4 shows the probability of correct discrimination as a function of $N$ for different values of $T = 0, 1, 5, 10$ ms from top to bottom. For any positive $T$, the probability of correct discrimination, $P_C$, decays to chance level, $P_C^{\text{chance}} = 0.5$, exponentially fast with $N$ for large $N$. This decay results from the fact that the probability of not spiking in the time interval before time $T$ decays to zero exponentially with $N$. For $T = 0$, the probability of correct response will saturate exponentially to $P_C^{\max} = \frac{1}{1 + \rho_o/\rho}$ (compare with equation (9)), which can be high for a low baseline firing rate.

The temporal n Winners-Take-All (n-tWTA).
To overcome the detrimental effect of baseline firing we generalize the tWTA to a family of readouts, n-tWTA, that are determined by the subgroup of cells that fired the first $n$ spikes. In a 2AFC competition between two homogeneous columns, the n-tWTA estimates the stimulus by the preferred direction of the column that fired the first $n$ spikes. In the model of delayed step function response PSTH, equation (23), spikes that are fired in the absolute delay period, from time 0 to time $T$, are independent of the stimulus and hence carry no information. The informative time of spiking is that from time $T$ to time $T + \tau$, where the firing rates of the cells depend on the stimulus. For a given population size, $N$, the mean number of spikes fired during the absolute delay time is $N \rho_o T$. During the informative period, an average of $N \rho \tau$ spikes is fired by the informative group. Taking $N \rho \tau > n \gg N \rho_o T$ diminishes the detrimental effect of baseline firing and conserves the essential information embedded in the temporal order of the neural responses. Figure 5 shows the percent correct discrimination of the n-tWTA as a function of $n$. In this case, the average number of baseline spikes fired during the absolute delay time is $N \rho_o T = 1$, and $P_C$ does indeed increase as $n$ is increased from $n = 1$, reaching almost perfect discrimination at about $n = 5$. During the informative period of spiking, an average of $N \rho \tau = 25$ spikes are fired by the 'correct' group. As expected, the probability of correct discrimination deteriorates for $n > N \rho \tau = 25$. In this example, the performance of the n-tWTA will decay to chance level in the limit of large $n$, since we did not incorporate any scale differences in the firings of the two populations. Thus, a reasonable choice of $n$ can eliminate the effect of baseline firing and greatly improve the performance of the tWTA.
Note that the optimal region for $n$ depends on the population size. For any fixed $n$, increasing the population size increases the number of baseline spikes fired during the absolute delay period, $N \rho_o T$. Hence, for $N > n/(\rho_o T)$ the n-tWTA performance will decay to chance level. An alternative n-tWTA generalization is to estimate the stimulus by the preferred direction of the first single cell that fired $n$ spikes, see [2]. Results for this latter generalization are qualitatively similar to those of the former in this model.
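The benefit of reading out more than one spike can be illustrated with a short simulation of the two-column competition. The sketch assumes that each column is a pooled inhomogeneous Poisson process whose per-cell rate is the baseline rate before its response onset and the stimulus-driven rate afterwards, with onset at $T$ for the 'correct' column and $T + \tau$ for the other; spike times are generated by time rescaling of a unit-rate Poisson process. The parameter values are those quoted for Figure 5 (rates converted to spikes per ms); the function names and the closed-form check for $n = 1$ are assumptions of this example and require a strictly positive baseline rate.

```python
import numpy as np

def nth_spike_times(n_max, pooled_base, pooled_resp, t_on, n_trials, rng):
    """Times of the first n_max pooled spikes of one column (time rescaling).

    The pooled rate is pooled_base before t_on and pooled_resp afterwards.
    """
    s = np.cumsum(rng.exponential(1.0, size=(n_trials, n_max)), axis=1)
    s_on = pooled_base * t_on   # mean number of baseline spikes before onset
    return np.where(s < s_on, s / pooled_base, t_on + (s - s_on) / pooled_resp)

def n_twta_pc(n_max=30, n_cells=100, rho=0.05, rho_o=0.001, T=10.0, tau=5.0,
              n_trials=20_000, seed=0):
    """P_C of the n-tWTA for n = 1..n_max: column 1 reaches n spikes first."""
    rng = np.random.default_rng(seed)
    base, resp = n_cells * rho_o, n_cells * rho
    t1 = nth_spike_times(n_max, base, resp, T, n_trials, rng)
    t2 = nth_spike_times(n_max, base, resp, T + tau, n_trials, rng)
    return np.mean(t1 < t2, axis=0)

def twta_pc_closed_form(n_cells=100, rho=0.05, rho_o=0.001, T=10.0, tau=5.0):
    """P_C for n = 1 under the same assumptions (derived for this sketch)."""
    no_base_spike = np.exp(-2 * n_cells * rho_o * T)
    no_info_spike = np.exp(-n_cells * (rho + rho_o) * tau)
    informative = rho / (rho + rho_o) * (1 - no_info_spike) + 0.5 * no_info_spike
    return 0.5 * (1 - no_base_spike) + no_base_spike * informative

pc = n_twta_pc()
print(pc[0], twta_pc_closed_form(), pc[4])
```

In this sketch the single-spike readout is close to chance because, on average, one baseline spike is fired before either column responds, whereas a readout based on the first few spikes largely ignores the baseline spikes.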

tWTA Estimation Accuracy in a Hypercolumn Model
The model. We study the tWTA estimation accuracy in a hypercolumn model of $N$ cells coding for an angular variable, $\theta$, such as the coding for the direction of motion of a visual stimulus by MT cells. Each cell $k \in \{1, \ldots, N\}$ is characterized by its preferred direction, $\phi_k = 2\pi k/N$, to which it responds fastest. Spike time distributions of different cells are modeled by independent IHPPs with PSTH $r_k(t \mid \theta) = r(t \mid \theta - \phi_k)$ for cell $k$, $k = 1, \ldots, N$. The tWTA estimate of the stimulus is given by the preferred direction, $\phi_k$, of the cell $k$ that fired the first spike,

$$\hat\theta = \arg\min_{\phi_k} t(\phi_k),$$

where $t(\phi_k)$ denotes the time of the first spike of cell $k$ following presentation of the stimulus. Throughout this section, we quantify tWTA sensitivity by the inverse of the root-mean-square estimation error, $1/\sqrt{\langle (\hat\theta - \theta)^2 \rangle}$, where $\langle X \rangle$ denotes averaging of $X$ over the distribution of spike times for a given external stimulus $\theta$.

tWTA accuracy in the absence of correlations. The probability of the tWTA estimator to be $\hat\theta \in \{\phi_k\}_{k=1}^{N}$ is given by the probability that the first spike in the population was fired by the cell with preferred direction $\hat\theta$:

$$\Pr\left(\hat\theta = \phi_k \mid \theta\right) = \int_0^\infty dt\, r_k(t \mid \theta) \exp\left(-\sum_{j=1}^{N} \int_0^t dt'\, r_j(t' \mid \theta)\right). \tag{29}$$

Empirical examples of first-spike-time tuning to an angular external stimulus are shown, for example, in [1,2]. Since tuning of the delay parameter makes the dominant contribution to the tWTA accuracy (see above), we now analyze the case of a delayed step function PSTH model with stimulus-independent scale and shape parameters. Specifically, we take the instantaneous firing rate of cell $k$ with preferred direction $\phi_k$, given that stimulus $\theta$ was presented at time $t_o = 0$, to be:

$$r_k(t \mid \theta) = \rho\, H\left(t - \tau(\phi_k - \theta)\right). \tag{30}$$

This simple choice of PSTH does not limit the generality of our results but rather clarifies the analysis such that our conclusions are not obscured by non-relevant parameters. Figure 6 shows a typical population response to stimulus $\theta = 0$. The dots on row $\phi$ show the spike times of a single cell with preferred direction $\phi$. The dashed line shows the delay tuning function, $\tau(\phi_k - \theta) = 5\left[1 - \cos(\phi_k - \theta)\right]$ ms, which yields the minimum possible spike time for each preferred direction.
The delay tuning function, $\tau(\phi)$, is assumed to be a periodic function of $\phi$. We further assume that the delay function, $\tau(\phi)$, is a continuous, even function of its argument with a single minimum at $\phi = 0$. For cells with preferred directions close to the stimulus, we can approximate the delay function by:

$$\tau(\phi) \approx \beta |\phi|^{\alpha}, \tag{31}$$

where $\alpha \in [1, 2]$ characterizes the delay tuning function near its unique minimum (for a smooth delay function, $\alpha = 2$), and $\beta$ is a constant in units of time. Since the tWTA is affected only by the fastest cells, we can use the approximation of equation (31) to describe the delay function of the entire hypercolumn, bearing in mind that the likelihood of cells with preferred directions that are far removed from the stimulus to affect the tWTA decays exponentially fast with $N$.

Figure 5. Performance of the n-tWTA readout in a 2AFC discrimination task in a two-column model. The probability of correct discrimination of the n-tWTA readout is shown as a function of $n$. The probability of correct discrimination was estimated by averaging over $10^5$ realizations of the neural stochastic response in an IHPP model for the spike time distribution as defined in equation (23) with: $N = 100$, $\rho = 50$ Hz, $\rho_o = 1$ Hz, $T = 10$ ms and $\tau = 5$ ms.
Using the continuum limit approximation for the exponent in the right-hand side of equation (29), we evaluate the conditional probability density of the estimation error, $\delta\hat\theta \equiv \theta - \hat\theta$, denoted $P(\delta\hat\theta)$. In the limit of large $N$, $P(\delta\hat\theta)$ decays exponentially with $N$ for $\delta\hat\theta \gg N^{-\frac{1}{1+\alpha}}$. Hence, we obtain the following scaling law for the tWTA accuracy:

$$\frac{1}{\sqrt{\langle \delta\hat\theta^2 \rangle}} \propto N^{\frac{1}{1+\alpha}}. \tag{33}$$

As in the two-column competition in the 2AFC paradigm, the sensitivity of the tWTA readout in a hypercolumn model scales algebraically fast with $N$, in the absence of noise correlations and in the limit of low baseline firing. This fast scaling is in contrast to the slow logarithmic scaling of the conventional rate-WTA readout accuracy with the population size [24].

Effect of correlations on the tWTA estimation accuracy. We model the first spike time of cell $k$ as the sum of an uncorrelated part, $t_k^u$, and correlated components, equation (34), where the uncorrelated parts, $t_k^u$, are taken to be distributed according to an IHPP with a PSTH $r_k(t) = \rho\, H\left(t - \tau(\phi_k - \theta)\right)$. For the sake of simplicity, we take $\tau(\phi) = \tau\left(1 - \cos\phi\right)$. The terms $x$, $y$ and $z$ are the correlated components of the spike times. The $z$ term represents the effect of shared input to the entire hypercolumn, whereas $x$ and $y$ represent the effect of shared input that is stronger for columns that are functionally closer, i.e., have a smaller difference in preferred directions. We assume that the correlated noise has zero means, $\langle x \rangle = \langle y \rangle = \langle z \rangle = 0$, with variances $\langle x^2 \rangle = \langle y^2 \rangle = \tau_c^2$ and a separate variance $\langle z^2 \rangle$ for the uniform component. Figure 8 shows typical realizations of the population response during a single trial of presenting stimulus $\theta = 0$ in the presence of noise correlations.

In the limit of large $N$, the tWTA accuracy saturates to the asymptotic value given by equation (35), where $\Delta\theta$ is measured in radians. Note that equation (35) takes the form of a signal-to-noise ratio, where the signal is the modulation amplitude of the delay function, $\tau$, and the noise is the component of collective noise correlations that affects the tWTA estimation, $\tau_c$. The tWTA sensitivity, equation (35), is independent of the collective fluctuations in the uniform direction, $z$. Figure 9 shows the asymptotic accuracy of the tWTA as a function of the noise-to-signal ratio, $\tau_c/\tau$. The solid line shows the analytical result of equation (35) in the limit of large $N$. The open squares show the numerical estimation of the asymptotic accuracy using a population of size $N = 10^5$ cells. The finite size of the network limits the ability of the numerical estimate to follow the analytic curve at high accuracy (low noise levels). To compensate somewhat for this effect, an extremely high firing rate was used in the simulations.
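For the uncorrelated hypercolumn with zero baseline firing, the algebraic growth of the sensitivity with $N$ can be checked by direct simulation. The sketch below assumes a delayed step-function PSTH with a power-law delay tuning, $\beta|\phi|^{\alpha}$, applied to the whole ring, so that each cell's first spike is its delay plus an exponential waiting time. It then fits the slope of the root-mean-square error against $N$ on a log-log scale, which is expected to be close to $-1/(1+\alpha)$ under these assumptions. The parameter values, the number of trials and the function name are illustrative choices.

```python
import numpy as np

def twta_hypercolumn_rms(n_cells, alpha=1.0, beta=1.0, rho=0.05,
                         n_trials=500, batch=100, seed=0):
    """RMS tWTA estimation error (radians) for stimulus theta = 0."""
    rng = np.random.default_rng(seed)
    phi = 2 * np.pi * np.arange(1, n_cells + 1) / n_cells
    err = np.angle(np.exp(1j * phi))          # preferred directions wrapped to (-pi, pi]
    delay = beta * np.abs(err) ** alpha       # delay of each cell (ms)
    sq_err = []
    for _ in range(n_trials // batch):
        # First spike of each cell: delay + exponential waiting time (rate rho per ms).
        first = delay + rng.exponential(1.0 / rho, size=(batch, n_cells))
        winner = np.argmin(first, axis=1)     # cell that fired the first spike
        sq_err.append(err[winner] ** 2)
    return np.sqrt(np.mean(np.concatenate(sq_err)))

sizes = np.array([1000, 4000, 16000])
rms = np.array([twta_hypercolumn_rms(n) for n in sizes])
slope = np.polyfit(np.log(sizes), np.log(rms), 1)[0]
print(rms, slope)
```

With $\alpha = 1$ the fitted slope should be near $-1/2$, i.e., a fourfold increase in population size roughly halves the root-mean-square error.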
Effect of baseline firing on the tWTA accuracy. The effect of nonzero baseline firing on tWTA estimation accuracy is studied in the framework of a hypercolumn IHPP model with a delayed step function PSTH. Specifically, we took the following PSTH for the response of cell $k$ with preferred direction $\phi_k$:

$$r_k(t \mid \theta) = \rho_o + (\rho - \rho_o)\, H\left(t - T - \tau(\phi_k - \theta)\right), \tag{36}$$

where $T$ is the absolute delay and $\tau(\theta) \geq 0$ is the tuning of the delay parameter, with $\tau(0) = 0$. For $T = 0$ in the limit of large $N$, the probability of the tWTA estimator to be $\hat\theta \in \{\phi_k\}_{k=1}^{N}$, equation (29), can be approximated by equation (37), where $Z$ is a normalizing factor of the probability distribution. Figure 10a and 10b show histograms of tWTA estimations of stimulus $\theta = 0$ for $N = 360$ and $N = 3600$, respectively, in this model with $T = 0$. The solid line shows the analytical approximation, equation (37). The distribution is characterized by a narrow peak around zero error, with a width that decreases to zero as $N$ grows to infinity, and a uniform probability for large errors. The ratio of the peak of the distribution at zero error (at $\hat\theta = \theta$) to the distribution of a specific large error is given by $\rho/\rho_o$.

Figure 7. Estimation accuracy of the tWTA readout in a hypercolumn model. The accuracy of the tWTA readout, in terms of one over the squared estimation error of estimating $\theta = 0$, is plotted as a function of the population size, in an IHPP hypercolumn population model, equation (30). The latency tuning was modeled by $\tau(\Delta\theta) = |\Delta\theta|^{\alpha}$ ms (where $\Delta\theta$ is measured in radians) with $\alpha = 1, 1.2, \ldots, 2$ from top to bottom. Accuracy was measured by averaging the squared estimation error over 10,000 trials of simulating the neuronal stochastic response (squares). The solid lines show the analytical fit using equation (33). doi:10.1371/journal.pcbi.1000286.g007
However, since the width of the peak decreases as $N$ increases (compare Figure 10a and 10b) while the peak-to-base ratio stays fixed, a growing fraction of the estimates falls in the uniform background. The average squared estimation error therefore increases for large $N$, even for $T = 0$, in contrast to the effect of baseline firing in the 2AFC, where at $T = 0$ the probability of correct response is an increasing function of $N$. A hallmark of the tWTA readout is the high kurtosis of the estimation error.
In the case of $T > 0$, using equation (37), one obtains equation (38). Hence, in this case the peak-to-base ratio of the distribution is decreased and decays exponentially to zero with the product $N \rho_o T$. This effect is shown by the histogram of tWTA estimation errors in Figure 10c, where we took $N = 3600$ and $T = 1$ ms (compare with Figure 10b, where $N = 3600$ and $T = 0$ ms). The solid line shows the analytical approximation of equation (38).

Discussion
At the time of the first spike, the tWTA is the ideal observer and, in the case of angle estimation, it is also the population vector readout. If a decision must be made at very short times, then the tWTA is the best readout. It is therefore important that we know and understand the capabilities and limitations of this readout. Scaling of the tWTA accuracy with the population size, $N$, can show a wide range of behaviors: from constant in $N$ (equation 9), through logarithmic (equation 13) to algebraic (equation 17). These different scaling regimes depend on fine details of the tuning of the probability distribution of the first-spike times or, alternatively, on the initial rise of the PSTH response to the stimulus. In the generic case in which scale, shape and delay parameters are all tuned to the stimulus, the tWTA accuracy will increase algebraically with $N$, in contrast to the expected slow logarithmic scaling of the conventional rate-WTA readout [24]. Nevertheless, the tWTA is expected to show high sensitivity to the inherent neuronal diversity at the level of single-cell response properties (see e.g., [30]). This sensitivity of the tWTA predicts considerable subject-to-subject variability in psychophysical performance as well as large fluctuations in the psychophysical accuracy for the same subject under different stimulus conditions, such as discriminating $\theta_1$ and $\theta_1 + \delta\theta$ versus discriminating $\theta_2$ and $\theta_2 + \delta\theta$.
Noise correlations in the fluctuations of first-spike times of different cells have a drastically detrimental effect on the tWTA accuracy, limiting the effective number of degrees of freedom in the network and resulting in finite error levels, even in the limit of large $N$; see e.g., equations (21), (22) and (35) and Figures 3 and 9. This effect is similar to that reported in the population coding literature [25][26][27][28][29]31], and depends on the correlation structure. Here we investigated the effect of correlations that had a simple spatial structure and no temporal structure. A drastically detrimental effect on the tWTA accuracy is caused by neuronal response covariance that generates collective fluctuations resembling the 'signal', i.e., similar to the tuning of the delay parameter (see Figure 8). For a detailed discussion of the effects of correlation structure see [27]. The temporal structure of response covariance may also have a considerable effect. For example, if the correlations depend on the absolute time, such that they are negligible for small $t$ and build up later in time, then they will not necessarily cause saturation of the tWTA accuracy. However, better empirical understanding of first-spike-time correlations is required to yield sufficient constraints for theoretical study of this issue. It is important to emphasize that by correlations we mean first-spike-time covariance of simultaneously recorded cells, in contrast to other types of correlations [5].
Figure 9. … in a hypercolumn model, as defined in section 'effect of correlations on the tWTA estimation accuracy', see equation (34). The solid line shows the analytical asymptotic value, equation (35). Open squares show the numerical estimation of the asymptotic value as calculated by averaging the tWTA estimation error over 100 trials in a hypercolumn model of $N = 100{,}000$ cells. The latency function that was used was: To minimize the effect of finite N, an extremely high firing rate of $r = 5$ kHz was used in the IHPP simulations. doi:10.1371/journal.pcbi.1000286.g009

Figure 10. Effect of baseline firing on the tWTA estimation in a hypercolumn model. Histograms of tWTA estimation of stimulus $\theta = 0$ were obtained in a model of delayed step function response to the stimulus, equation (36), with $\tau(\theta) = \tau\,(1 - \cos\theta)$, and parameters: $\tau = 50$ ms, $r = 50$ Hz and $r_o = 1$ Hz. Population size was $N = 360$ in (a) and $N = 3600$ in (b,c). Histograms were estimated using $10^6$ repetitions in (a,b) and using $10^7$ repetitions in (c). The solid lines are analytical approximations of equation (37) in (a,b) and equation (38) in (c). doi:10.1371/journal.pcbi.1000286.g010

In a 2AFC task, nonzero baseline firing has a twofold detrimental effect on the tWTA accuracy. The first arises when the onset of the tWTA readout precedes the stimulus response of the fastest cell in the entire population, $T > 0$. In this case, the tWTA accuracy will decrease as N is increased beyond some optimal value $N_{\max}$. This effect can be minimized by obtaining a more accurate estimate for the minimal response time of the cells in the population, i.e., effectively decreasing T [5]. The second effect is a saturating effect, which limits the maximal accuracy that can be obtained by the tWTA, $P_C^{\max} = \frac{1}{1 + r_o/r}$, even for $T = 0$. Note that, although $P_C^{\max}$ is less than 1, psychophysical accuracy is also finite. The value of $P_C^{\max}$ can be rather high in cases in which the baseline firing is small relative to the stimulus response.
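The saturation accuracy 1/(1 + r_o/r) can be checked numerically. The following is a minimal sketch, not taken from the paper: it assumes one cell per alternative, each firing as a homogeneous Poisson process from readout onset (T = 0), so that first-spike times are exponentially distributed. The function names and the rate values are hypothetical choices for illustration.

```python
import numpy as np


def p_max_analytic(r, r0):
    """Saturation accuracy 1 / (1 + r0 / r) for stimulus rate r and baseline rate r0."""
    return 1.0 / (1.0 + r0 / r)


def p_correct_simulated(r, r0, trials=200_000, seed=0):
    """Fraction of trials in which the stimulus-driven cell fires before the baseline cell.

    Assumption (not from the source): both cells are homogeneous Poisson
    processes starting at t = 0, so each first-spike time is exponential
    with mean 1 / rate.
    """
    rng = np.random.default_rng(seed)
    t_signal = rng.exponential(1.0 / r, size=trials)     # cell driven by the stimulus
    t_baseline = rng.exponential(1.0 / r0, size=trials)  # cell firing at baseline only
    return float(np.mean(t_signal < t_baseline))


r, r0 = 50.0, 1.0  # Hz, illustrative values
print("analytic :", p_max_analytic(r, r0))
print("simulated:", p_correct_simulated(r, r0))
```

Under these assumptions the result does not change if each alternative is represented by N independent cells instead of one, because both pooled rates scale by the same factor N.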
These effects can be decreased for any given N by using a generalized n-tWTA readout that makes a decision according to the population that fired the first n spikes, see Figure 5. Nevertheless, for any given fixed value of n, increasing the population size, N, will decrease the n-tWTA performance to chance level, for $T > 0$. Hence, for fast decisions there are advantages to reading out the responses of small neuronal populations rather than larger populations.
Baseline firing has similar detrimental effects on the tWTA readout in estimation tasks (see Figure 10). A hallmark of the tWTA readout that can serve as a prediction is its high kurtosis. There are various ways to generalize the tWTA to use more than one spike in order to overcome the detrimental effect of baseline firing. One option is that the readout is determined by the preferred direction of the single cell that fired the first n spikes. An alternative generalization is to define the readout by a 'vote' of the cells that fired the first n spikes in the population. In the latter case, different weights may be assigned to the votes. The utility of the different possible generalizations is expected to depend largely on the structure of the correlations in the neuronal initial dynamic response to the stimulus.
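The 'vote' generalization can be sketched in a few lines. This is an illustrative sketch only: the paper does not specify how the votes are combined, so the circular weighted mean used here is an assumption, and the function name `n_twta_vote` and its arguments are hypothetical.

```python
import numpy as np


def n_twta_vote(spike_times, preferred, n=1, weights=None):
    """Estimate an angular stimulus from the first n spikes in the population.

    spike_times : spike times, one entry per spike (any consistent unit)
    preferred   : preferred direction (radians) of the cell that fired each spike
    weights     : optional vote weights, ordered from earliest to latest spike

    Assumption (not from the source): votes are combined as a circular
    weighted mean of the preferred directions. With n = 1 this reduces to
    the plain tWTA, i.e. the preferred direction of the first cell to fire.
    """
    spike_times = np.asarray(spike_times, dtype=float)
    preferred = np.asarray(preferred, dtype=float)
    first = np.argsort(spike_times, kind="stable")[:n]  # indices of the first n spikes
    if weights is None:
        w = np.ones(len(first))
    else:
        w = np.asarray(weights, dtype=float)[:len(first)]
    vote = np.sum(w * np.exp(1j * preferred[first]))     # sum of weighted unit vectors
    return float(np.angle(vote))                          # angle in (-pi, pi]


# Toy trial: the earliest spike is a baseline spike from a cell tuned far from 0.
times = [0.012, 0.004, 0.009, 0.020]
prefs = [0.1, 2.5, -0.1, 0.0]
print(n_twta_vote(times, prefs, n=1))  # plain tWTA follows the stray spike
print(n_twta_vote(times, prefs, n=3))  # the vote is pulled back toward 0
```

A circular mean is only one possible rule; a majority vote over discretized directions, or weights that decay with spike rank, would fit the same interface.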
In a series of highly influential papers, Thorpe and colleagues (see e.g., [12,14]) have highlighted the possible role of spike latency as a primary source of information in the CNS and have shown, for example, how an image falling on the retina could be reconstructed from spike latencies (see also the work of [15]). In the context of this work, their readout could be thought of as a specific choice for the n-tWTA generalization. Here, we presented a systematic investigation of the tWTA accuracy that allows for comparison with psychophysical accuracy and hence enables testing of the hypothesis that the tWTA is actually used by the CNS. In addition, our analysis provides a framework for understanding and investigating the effects of correlations and baseline firing on the tWTA accuracy.
Neural network implementations of the tWTA. Considerable theoretical effort has been devoted to the investigation of neural network models that can implement the conventional rate-WTA [32][33][34][35][36][37][38][39][40][41][42][43]. These studies have focused on inputs that are constant in time and differ by their scale. However, it has been acknowledged that the temporal structure of the inputs may have a considerable effect on the WTA readout [43]. This effect shows the sensitivity of existing rate-WTA neural network models to the order of firing and demonstrates the capability of neural networks to implement a tWTA computation. Indeed, one can imagine the responses of the (assumed excitatory) hypercolumn network, which codes for the external stimulus by spike-time latency, being input to an n-tWTA readout layer of laterally all-to-all connected inhibitory neurons. Once the input to an inhibitory cell crosses a firing threshold of n excitatory postsynaptic potentials, the cell will fire and silence the rest of the network. Investigation of various neural network implementations, their limitations, their deviations from the mathematically ideal tWTA, and the computational consequences of these deviations, if any, is beyond the scope of the current work and will be addressed elsewhere.
The neural code. To what extent does the CNS use the tWTA as a readout mechanism? Readout mechanisms used by the CNS are necessarily dynamic processes that may involve inhibition and hence generate WTA-like competition between inputs from different columns. If fast decisions between a small number of alternatives are required, then the tWTA can provide the correct result with high probability. In such a case, we predict that the readout will be determined by competition between relatively small groups of cells rather than by the entire cell population that responds to the stimulus, so as to decrease the effect of baseline firing. Such decisions include, for example, estimation of the direction of motion of a visual stimulus at a low resolution of 45°. However, for discrimination between many alternatives the tWTA is limited by the baseline firing. Why is this task more sensitive to baseline firing? Consider an example in which estimation of the direction of motion of a visual stimulus is required at a precision of 3.6°. For this angular resolution, a population of at least $N = 100$ cells is needed. Let us assume that at the stimulus onset the 'correct' cell fires at a rate of $r = 100$ Hz while the rest of the population fires at a baseline rate of $r_o = 1$ Hz. During the first $t = 10$ ms of stimulus presentation, the 'correct' cell will fire an average of $tr = 1$ spike, while the rest of the cells will fire an average of $(N-1)\,t\,r_o = 0.99$ spikes; thus, the tWTA is expected to err by more than 3.6° in about 50% of the cases. Hence, fine estimation tasks cannot rely on the single first spike, and our theory predicts that in these cases the first n spikes must be considered, where n should be larger than the average number of baseline spikes. How should the n-tWTA combine the information from the first n spikes?
The answer to this question depends on the temporal structure of correlations, on fine details of the PSTH, and on our assumptions about the computational capabilities of this readout, and is beyond the scope of the current paper. The current work provides the essential framework for addressing this question. To further study the hypothesis that the CNS actually uses the tWTA, better empirical understanding of the tuning of the first-spike-time distribution to the stimulus, of baseline firing, and of the spatial and temporal structure of noise correlations is required.
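The arithmetic of the 3.6° example above can be reproduced directly. The sketch below is illustrative: the expected spike counts follow the numbers given in the text, while the closed-form first-spike probability additionally assumes independent homogeneous Poisson firing, which is an assumption made here and not a statement from the paper. The function names are hypothetical.

```python
def expected_spike_counts(N, r, r0, t):
    """Mean spike counts in a window of length t (seconds).

    Returns (count of the 'correct' cell, summed count of the other N - 1 cells).
    """
    return r * t, (N - 1) * r0 * t


def first_spike_correct_prob(N, r, r0):
    """Probability that the first spike in the population comes from the 'correct' cell.

    Assumption (not from the source): all cells are independent homogeneous
    Poisson processes, so the first spike belongs to a given cell with
    probability proportional to that cell's rate.
    """
    return r / (r + (N - 1) * r0)


N, r, r0, t = 100, 100.0, 1.0, 0.010  # values from the example in the text
signal, baseline = expected_spike_counts(N, r, r0, t)
print("mean spikes, correct cell:", signal)
print("mean spikes, other cells :", baseline)
print("P(first spike is correct):", first_spike_correct_prob(N, r, r0))
```

With these numbers the two mean counts are nearly equal, which is the basis for the estimate that the single-spike tWTA errs in about half of the trials.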

Methods
Calculation of tWTA Accuracy in 2AFC in the Case of Delay Parameter Tuning to the Stimulus
Substituting equation (14) into equation (4), we obtain the probability of correct discrimination as a sum of two terms: The integral $I_1$, equation (40), can be evaluated exactly, yielding the contribution of where we have used $\tau$ as shorthand for $\tau(\delta\theta)$. The contribution of $I_2$ to $P_C$ is positive; hence, the tWTA error rate, in this case, will decay to zero exponentially with the population size N.
For the calculation of $I_2$, equation (41), we change variables to $u(t) = -\left[W(t|0) - W(\tau|0) + W(t|\delta\theta)\right]$, yielding: where for a small positive $t - \tau$, $\dot{W}(t|\delta\theta) \approx -r\,a\,(t-\tau)^{a-1}$. Assuming $\dot{W}(\tau|0) \neq 0$, the leading term in $t(u)$, for small u, is given by: Using Watson's Lemma to evaluate $I_2$ to leading order in N, we obtain
Calculation of the tWTA Accuracy in 2AFC with Correlations
For a given stimulus $\theta$, the probability density, $P_1(t)$, that the first spike of population 1 occurred at time t is The probability density, $P_2(t)$, that population 2 did not fire before time t is given by: Assuming the tWTA decides the stimulus is $\theta$ if the first spike comes from population 1, the probability of correct response is: and obtain: which is equivalent to the result of equation (35).