Variability Measures of Positive Random Variables

During the stationary part of the neuronal spiking response, the stimulus can be encoded in the firing rate, but also in the statistical structure of the interspike intervals. We propose and discuss two information-based measures of statistical dispersion of the interspike interval distribution: the entropy-based dispersion and the Fisher information-based dispersion. The measures are compared with the frequently used concept of standard deviation. It is shown that the standard deviation is not well suited to quantify some aspects of dispersion that are often expected intuitively, such as the degree of randomness. The proposed dispersion measures are not entirely independent, although each describes the interspike intervals from a different point of view. The new methods are applied to common models of neuronal firing and to both simulated and experimental data.


Introduction
One of the most fundamental problems in computational biology is the problem of neuronal coding, the question of how information is represented in neuronal signals [1,2]. The discharge activity of neurons is composed of series of events called action potentials (or spikes). It is widely accepted that information in neuronal systems is transferred by employing these spikes. The shapes and durations of individual spikes are very similar; therefore it is generally assumed that the form of the action potential is not important in information transmission. When a stimulus is presented, the responding neuron usually produces a transient response followed by a sustained one, which is often treated as stationary in time [3]. The firing rate of the sustained part of the response depends on the stimulus; however, the stimulus can also be ''encoded'' in the statistical structure of the interspike intervals (ISI) by the temporal coding [1,4-6].
While the description of neuronal activity from the rate coding point of view is relatively straightforward [7], the temporal code allows an infinite number of alternatives. Spike trains with equal firing rates may turn out to be different under various measures of their statistical structure beyond the firing rate. For example, more than half a century ago, the coefficient of variation (c_v) of ISIs was reported to encode information about light intensity in adapted cells of the horseshoe crab [1,8]. Similarly, changes in the level of bursting activity, also characterized by c_v, are reported to be the proper code for edge detection in certain units of the visual cortex [9]. In general, the bursting nature of neuronal firing is commonly described by c_v [10].
In order to describe and analyze the way information is represented in spike trains, methods for their mutual comparison are needed. Although the ISI probability density function (or a histogram of the data) usually provides complete information, quantitative methods are needed [11-13], especially since a visual inspection of the density shape can be misleading. Here we restrict our attention to measures of the neuronal firing precision, i.e., of the dispersion of the ISI distribution. We investigate the properties of the standard deviation, the entropy-based dispersion and the Fisher information-based dispersion. Although the standard deviation is used ubiquitously and is almost synonymous with a ''measure of statistical dispersion'', we show that it is not well suited to quantify some aspects of spiking activity that are often expected intuitively [4,14]. We show that the diversity or randomness of ISIs is better described by the entropy-based or Fisher information-based dispersions. The difference between the entropy and Fisher information descriptions lies in the fact that the Fisher information describes how ''smooth'' the distribution is, while the entropy describes how ''even'' it is. ''Smoothness'' and ''evenness'' might at first be thought interchangeable, but we show that this is not the case.
The proposed methods are illustrated on simple and frequently employed models of stationary neuronal activity, given by the lognormal, gamma and inverse Gaussian distributions of ISIs. Finally, we apply the theory to experimental data obtained by recording the spontaneous activity of rat olfactory neurons [15].

Methods
Statistical methods and methods of probability theory and stochastic point processes are widely applied to describe and analyze neuronal firing [16-18]. The probabilistic description of spiking times results from the fact that the positions of spikes cannot be predicted exactly; only the probability that a spike occurs is given [18]. Thus, under suitable conditions, the ISI or the time to the first spike after the stimulus onset can be described by a continuous positive random variable, which we denote T. A complete description of T is given by its probability density function f(t), defined on [0, ∞).
The (statistical) dispersion is a characteristic of the ''variability'' or ''spread'' of the distribution of the random variable T. Different dispersion measures are described in the literature and employed in different contexts, e.g., the standard deviation, the interquartile range [19], the mean difference [20] or the coefficient of local variance [13]. All these measures have the same physical units as T.

Standard deviation
By far the most common measure of dispersion is the standard deviation, σ, defined as the square root of the second central moment of the distribution. The corresponding relative dispersion measure is known as the coefficient of variation,

c_v = \frac{\sigma}{E(T)},   (1)

where E(T) is the mean value of T. The exponential distribution implies c_v = 1; however, this value of c_v may occur for other distributions as well.
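In data analysis, c_v is computed from an ISI sample rather than from a known density. A minimal Python sketch (hypothetical data; the helper name coef_variation is ours):

```python
import numpy as np

def coef_variation(isi):
    """Sample coefficient of variation: c_v = (sample SD) / (sample mean), Eq. (1)."""
    isi = np.asarray(isi, dtype=float)
    return isi.std(ddof=1) / isi.mean()

# Exponential ISIs with mean 1: the estimate should be close to c_v = 1.
rng = np.random.default_rng(0)
print(coef_variation(rng.exponential(scale=1.0, size=10_000)))
```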

Entropy-based dispersion
The randomness of a probability distribution can be defined as the measure of ''choice'' of possible outcomes; a bigger choice results, intuitively, in greater randomness. For discrete probability distributions such a measure of randomness is provided by the Shannon entropy, which is known to be unique in being consistent with certain natural requirements [21]. The Shannon entropy is generally infinite for continuous random variables, and therefore it cannot be used for our purposes. Formally, the notion of the differential entropy, h(f), of the probability density function f(t) is introduced as

h(f) = -\int_0^\infty f(t) \ln f(t)\,\mathrm{d}t;   (2)

however, the value of h(f) can be positive or negative and cannot by itself be used as a measure of randomness [22]. In order to obtain a properly behaving quantity, the entropy-based dispersion, σ_h, was proposed in [23],

\sigma_h = \exp[h(f) - 1].   (3)

The interpretation of σ_h relies on the asymptotic equipartition property theorem and the entropy power concept [22]. Namely, since for the exponential probability density function f_{exp}(t) = \exp[-t/E(T)]/E(T) it holds that h(f_{exp}) = 1 + \ln E(T), we see that σ_h is the standard deviation of that exponential distribution which satisfies h(f_{exp}) = h(f). Informally, the value of σ_h is bigger for those random variables which generate more diverse (or unpredictable) realizations.
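When the density f(t) is known, h(f) and σ_h can be evaluated by numerical quadrature. A small Python sketch of Eqns. (2) and (3) (the function name entropy_dispersion is our own); the exponential density serves as a check, since σ_h should then equal the standard deviation E(T):

```python
import numpy as np
from scipy.integrate import quad

def entropy_dispersion(pdf):
    """sigma_h = exp(h(f) - 1), Eq. (3), with the differential entropy
    h(f) = -int f ln f dt, Eq. (2), computed by numerical quadrature."""
    h, _ = quad(lambda t: -pdf(t) * np.log(pdf(t)) if pdf(t) > 0 else 0.0,
                0, np.inf)
    return np.exp(h - 1.0)

# Exponential density with mean 2: sigma_h should be close to 2.0.
mu = 2.0
print(entropy_dispersion(lambda t: np.exp(-t / mu) / mu))
```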
Analogously to Eq. (1), we define the relative entropy-based dispersion coefficient, c_h, as

c_h = \frac{\sigma_h}{E(T)}.   (4)

Note that Eq. (4) can be equivalently written as

c_h = \exp[-D(f \,\|\, f_{exp})],   (5)

where E(T) is the mean value of T and

D(f \,\|\, f_{exp}) = \int_0^\infty f(t) \ln\frac{f(t)}{f_{exp}(t)}\,\mathrm{d}t   (6)

is the Kullback-Leibler distance of the probability density f(t) from the exponential density with the same mean as f(t). From Eq. (5) it follows that c_h is essentially (up to the scale) equivalent to the measure of spiking randomness, η, proposed in [4], since c_h = e^{η-1}. From the properties of the Kullback-Leibler distance in Eq. (5) it follows that the maximum value of c_h is c_h = 1, which occurs if and only if f(t) is exponential [22].
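The equivalence of Eqns. (4) and (5) can be verified numerically. A sketch assuming a gamma density with E(T) = 1, so that the reference exponential density is exp(-t); illustration only:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import gamma

# Gamma density with k = 2, theta = 0.5, so that E(T) = k*theta = 1 and the
# exponential density with the same mean is f_exp(t) = exp(-t).
f = gamma(a=2.0, scale=0.5).pdf
f_exp = lambda t: np.exp(-t)

h, _ = quad(lambda t: -f(t) * np.log(f(t)) if f(t) > 0 else 0.0, 0, np.inf)
c_h_def = np.exp(h - 1.0)            # Eq. (4) with E(T) = 1

D, _ = quad(lambda t: f(t) * np.log(f(t) / f_exp(t)) if f(t) > 0 else 0.0,
            0, np.inf)
c_h_kl = np.exp(-D)                  # Eq. (5)
print(c_h_def, c_h_kl)               # both should be about 0.89
```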

Fisher information-based dispersion
The Fisher information is a measure of the minimum error in estimating a parameter of a distribution. In the special case of a location parameter, the Fisher information J(f) does not depend on the parameter itself and can be expressed directly as a functional of the density f(t) ([22], p. 671),

J(f) = \int_0^\infty \frac{1}{f(t)}\left[\frac{\partial f(t)}{\partial t}\right]^2 \mathrm{d}t.   (7)

The value of J(f) is small for smoothly shaped probability densities; any locally steep slope or the presence of modes in the shape of f(t) increases J(f) [24]. Due to the derivative in Eq. (7), certain regularity conditions are required of f(t), and in this paper we consider only densities for which J(f) takes finite values. Further theoretical considerations are, however, beyond the scope of this paper. The units of J(f) correspond to the inverse of the squared units of T; therefore we propose the Fisher information-based dispersion measure, σ_J, as

\sigma_J = \frac{1}{\sqrt{J(f)}}.   (8)

In analogy with Eqns. (1) and (4), we define the relative dispersion coefficient c_J as

c_J = \frac{\sigma_J}{E(T)} = \frac{1}{E(T)\sqrt{J(f)}}.   (9)

For the exponential distribution c_J = 1 holds; however, this value is not specific to the case f(t) = f_{exp}(t).
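A numerical sketch of Eqns. (7)-(9), using a central finite difference for the derivative of the density; the regularity and boundary subtleties mentioned above are ignored, and the helper name fisher_dispersion is ours. For the unit-mean exponential density the result should be c_J = 1:

```python
import numpy as np
from scipy.integrate import quad

def fisher_dispersion(pdf, mean, eps=1e-6):
    """c_J = 1 / (E(T) sqrt(J(f))), Eqns. (8)-(9), with J(f) of Eq. (7)
    evaluated by quadrature and a central finite difference for f'(t)."""
    def integrand(t):
        f = pdf(t)
        if f <= 0.0:
            return 0.0
        df = (pdf(t + eps) - pdf(t - eps)) / (2.0 * eps)
        return df * df / f
    J, _ = quad(integrand, 0, np.inf, limit=200)
    return 1.0 / (mean * np.sqrt(J))

# Exponential density with mean 1: J(f) = 1, hence c_J = 1.
print(fisher_dispersion(lambda t: np.exp(-t), mean=1.0))
```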
Just as c_h is related to the Kullback-Leibler distance by Eq. (5), we note that J(f) can be written as [25]

J(f) = \lim_{\epsilon \to 0} \frac{2}{\epsilon^2}\, D\big(f(t) \,\|\, f(t+\epsilon)\big).   (10)

Although Eq. (10) is not suitable for the evaluation of J(f), it shows that both c_h and c_J are connected on a fundamental level by the concept of the Kullback-Leibler distance.

Basic properties of the proposed measures
Standard deviation (or c_v) essentially measures how off-centered, with respect to E(T), the probability density of T is, and it is sensitive to outlying values. On the other hand, c_v does not quantify how random, or unpredictable, the outcomes of T are. Namely, a high value of c_v (high variability) does not indicate that the possible values of T are distributed evenly [4]. The value of σ_h (and c_h), in contrast, quantifies how evenly the probability is distributed over [0, ∞). The third measure, c_J, is sensitive to the modes and to the steepness of the slopes of the density (due to the dependence on the derivative of the probability density in Eq. (7)). Since multimodal densities can be more evenly spread than unimodal ones, the behavior of c_h cannot in general be deduced from c_J (and vice versa). The key features of the three considered dispersion measures are illustrated in Fig. 1.
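The distinction can be made concrete by comparing a bimodal with a unimodal density of similar c_v: the bimodal one has steeper slopes, hence a larger J(f) and a smaller c_J. A self-contained numerical sketch (the mixture parameters are arbitrary choices for illustration):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import gamma, lognorm

def stats_of(pdf):
    """Numerical mean, c_v, c_h and c_J of a density on [0, inf)."""
    m, _ = quad(lambda t: t * pdf(t), 0, np.inf, limit=200)
    m2, _ = quad(lambda t: t * t * pdf(t), 0, np.inf, limit=200)
    c_v = np.sqrt(m2 - m * m) / m
    h, _ = quad(lambda t: -pdf(t) * np.log(pdf(t)) if pdf(t) > 0 else 0.0,
                0, np.inf, limit=200)
    c_h = np.exp(h - 1.0) / m
    eps = 1e-6
    def g(t):
        f = pdf(t)
        if f <= 0.0:
            return 0.0
        df = (pdf(t + eps) - pdf(t - eps)) / (2.0 * eps)
        return df * df / f
    J, _ = quad(g, 0, np.inf, limit=200)
    return c_v, c_h, 1.0 / (m * np.sqrt(J))

# Bimodal mixture of two narrow lognormals versus a unimodal gamma density;
# both have c_v close to 0.5, but c_J differs markedly.
comp1, comp2 = lognorm(0.15, scale=0.5), lognorm(0.15, scale=1.5)
bimodal = lambda t: 0.5 * (comp1.pdf(t) + comp2.pdf(t))
unimodal = gamma(a=4.0, scale=0.25).pdf
print(stats_of(bimodal))
print(stats_of(unimodal))
```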
A cartoon of typical density shapes resulting from different combinations of c_v, c_h and c_J values is shown in Fig. 2. Note that the number of possible scenarios is large and therefore Fig. 2 is not exhaustive.

Common distributions of interspike intervals
We choose three widely employed statistical models of ISIs, given by the gamma, inverse Gaussian and lognormal distributions, and analyze them by means of the three described dispersion coefficients, c_v, c_h and c_J.
The gamma distribution is one of the most frequent statistical descriptors of ISIs employed in the analysis of experimental data [15,26,27]. Its probability density function, parametrized by the shape parameter k and the scale parameter θ, is

f(t) = \frac{t^{k-1}}{\Gamma(k)\,\theta^k}\exp(-t/\theta),   (11)

where Γ(z) is the gamma function [28]. The mean value of the distribution is E(T) = kθ and the coefficient of variation is equal to

c_v = \frac{1}{\sqrt{k}}.   (12)

For c_v = 1, i.e., k = 1, the gamma distribution becomes the exponential distribution. By parametrizing the density (11) by c_v and substituting it into Eqns. (4) and (9), we obtain the entropy-based and Fisher information-based dispersion coefficients as functions of c_v,

c_h = c_v^2\,\Gamma(c_v^{-2})\exp\left[c_v^{-2} - 1 + \left(1 - c_v^{-2}\right)\psi(c_v^{-2})\right],   (13)

c_J = c_v\sqrt{1 - 2c_v^2},   (14)

where ψ(z) = \frac{\mathrm{d}}{\mathrm{d}z}\ln\Gamma(z) is the digamma function [28]. For details of the calculation of c_h and c_J see Supporting Information S1. Note that the gamma density is not differentiable at t = 0 for k < 2, and J(f) is finite only for k > 2 (i.e., c_v < 1/\sqrt{2}) and in the exponential case k = 1; Eq. (14) therefore applies for k > 2.

The inverse Gaussian distribution is often used to describe neural activity and is fitted to experimentally observed ISIs [26,29,30]. This distribution describes the spiking activity of a stochastic variant of the perfect integrate-and-fire neuronal model [18,31]. The probability density function of the inverse Gaussian distribution, parametrized by its mean, μ = E(T), and the scale parameter λ, is

f(t) = \sqrt{\frac{\lambda}{2\pi t^3}}\exp\left[-\frac{\lambda(t-\mu)^2}{2\mu^2 t}\right].   (15)

The coefficient of variation is equal to

c_v = \sqrt{\frac{\mu}{\lambda}},   (16)

and the other dispersion coefficients can be expressed as functions of c_v (see Supporting Information S1),

c_h = \sqrt{2\pi}\,c_v\exp\left[\frac{3\kappa - 1}{2}\right], \qquad \kappa = \frac{K^{(1,0)}(-1/2,\,c_v^{-2})}{K(-1/2,\,c_v^{-2})},   (17)

c_J = c_v\left(1 + \frac{9}{2}c_v^2 + \frac{21}{2}c_v^4 + \frac{21}{2}c_v^6\right)^{-1/2},   (18)

where K^{(1,0)}(ν, z) = \frac{\partial}{\partial\nu}K(\nu, z) is the derivative of the modified Bessel function of the second kind with respect to its order [28].

The lognormal distribution of ISIs, with some exceptions [32], is rarely presented as a result of a neuronal model. However, it represents a common descriptor in experimental data analysis [26,30]. The lognormal probability density function, parametrized by the mean, μ, and the standard deviation, σ, of the variable ln T, is

f(t) = \frac{1}{t\sigma\sqrt{2\pi}}\exp\left[-\frac{(\ln t - \mu)^2}{2\sigma^2}\right].   (19)

In this parametrization, the mean of the lognormal distribution is E(T) = \exp(\mu + \sigma^2/2) and the coefficient of variation is equal to

c_v = \sqrt{e^{\sigma^2} - 1}.   (20)

The two other dispersion coefficients, expressed as functions of c_v, are (see Supporting Information S1)

c_h = \sqrt{\frac{2\pi\ln(1 + c_v^2)}{e\,(1 + c_v^2)}},   (21)

c_J = \frac{1}{(1 + c_v^2)^{3/2}}\sqrt{\frac{\ln(1 + c_v^2)}{1 + \ln(1 + c_v^2)}}.   (22)

The dependence of c_h on c_v is shown in Fig. 3 and the dependence of c_J on c_v in Fig. 4, for all three mentioned distributions. Obviously, the dependencies are not linear (not even monotonic), and thus neither c_h nor c_J is equivalent to c_v. The dependence of c_J on c_h is plotted in Fig. 5. We observe that c_h and c_J indeed do not describe the same qualities of the distribution, since a unique c_h value does not correspond to a unique c_J value (and vice versa). Except for the gamma distribution, the dependence between c_h and c_J forms a closed loop, where c_h = c_J = 0 for both c_v → 0 and c_v → ∞.
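The closed-form dependencies (13), (14), (18), (21) and (22) are straightforward to evaluate numerically; a sketch with our own function names follows (the inverse Gaussian c_h, Eq. (17), is omitted because of the Bessel-derivative term). The printout also illustrates that equal c_v does not imply equal c_h or c_J across distributions:

```python
import numpy as np
from scipy.special import gammaln, digamma

def gamma_c_h(c_v):
    """Eq. (13): gamma distribution, shape k = 1/c_v^2."""
    k = 1.0 / np.asarray(c_v, dtype=float)**2
    return np.exp(gammaln(k) + k - 1.0 + (1.0 - k) * digamma(k)) / k

def gamma_c_J(c_v):
    """Eq. (14): real-valued only for c_v < 1/sqrt(2), i.e. k > 2."""
    return c_v * np.sqrt(1.0 - 2.0 * c_v**2)

def invgauss_c_J(c_v):
    """Eq. (18): inverse Gaussian distribution."""
    return c_v / np.sqrt(1.0 + 4.5 * c_v**2 + 10.5 * c_v**4 + 10.5 * c_v**6)

def lognorm_c_h(c_v):
    """Eq. (21), with sigma^2 = ln(1 + c_v^2)."""
    s2 = np.log1p(np.asarray(c_v, dtype=float)**2)
    return np.sqrt(2.0 * np.pi * s2 / (np.e * (1.0 + c_v**2)))

def lognorm_c_J(c_v):
    """Eq. (22): lognormal distribution."""
    s2 = np.log1p(np.asarray(c_v, dtype=float)**2)
    return np.sqrt(s2 / (1.0 + s2)) / (1.0 + c_v**2)**1.5

# Equal c_v, different c_h and c_J across distributions:
for cv in (0.3, 0.5):
    print(cv, gamma_c_h(cv), lognorm_c_h(cv),
          gamma_c_J(cv), invgauss_c_J(cv), lognorm_c_J(cv))
```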
Additionally, just as c_h and c_J are related to c_v in Eqns. (13), (14), (17), (18), (21) and (22), c_h and c_J can also be related to higher statistical moments. For example, the skewness γ of the distribution is defined as the ratio of the third central moment to the third power of the standard deviation. For the gamma distribution γ = 2c_v holds, for the inverse Gaussian γ = 3c_v, and for the lognormal γ = c_v(c_v^2 + 3). Thus the curves depicted in Fig. 3 and Fig. 4 would retain their unimodal shapes if plotted in dependence on γ.
Different distributions with equal c_v and different c_h (or c_J) can be found, and vice versa; see Fig. 3 (or Fig. 4) for examples. Therefore, it cannot be said in general that c_h and c_J are more informative than c_v. To provide an example in which c_J offers a different view than c_v and c_h, we consider the folded normal probability density with parameters a, b > 0,

f(t) = \frac{1}{b\sqrt{2\pi}}\left\{\exp\left[-\frac{(t-a)^2}{2b^2}\right] + \exp\left[-\frac{(t+a)^2}{2b^2}\right]\right\}.   (23)

The shapes of the folded normal probability density function, Eq. (23), are shown in Fig. 6.

Simulated data
To illustrate the accuracy of the estimators ĉ_h and ĉ_J of the dispersion coefficients c_h and c_J, we simulated spike trains with gamma, inverse Gaussian and lognormal distributions of ISIs by employing the R and STAR software packages [26,33]. In all the simulations the mean ISI was fixed to 1, while the coefficient of variation, c_v, varied from 0.05 to 4.00 in steps of 0.05. In other words, we generated random samples from the mentioned distributions with the given parameters. The spike trains, represented by sample point processes, were constructed by using the generated values as the time intervals (ISIs) between successive events (spikes). Five thousand spike trains, each consisting of 100 ISIs, were simulated for each value of c_v and for each of the three distributions.
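The original simulations were performed in R with the STAR package; the following Python sketch is an equivalent setup for the gamma case (generator seed and names are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n_trains, n_isi, mean_isi = 5000, 100, 1.0

def gamma_spike_trains(c_v):
    """Spike trains with gamma ISIs of mean 1 and given c_v:
    shape k = 1/c_v^2, scale theta = mean/k, cf. Eqns. (11)-(12)."""
    k = 1.0 / c_v**2
    isi = rng.gamma(shape=k, scale=mean_isi / k, size=(n_trains, n_isi))
    return np.cumsum(isi, axis=1)          # spike times as cumulative ISI sums

trains = gamma_spike_trains(c_v=0.5)
print(trains.shape, trains[:, -1].mean())  # last spike time ~ n_isi * mean ISI
```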
In the first study, the parameters of the distributions were estimated by the maximum likelihood method. For the gamma distribution (11), the maximum likelihood estimators k̂ and θ̂ were found numerically (by minimizing the negative log-likelihood function). For the inverse Gaussian distribution (15), the maximum likelihood estimators were computed as

\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} t_i,   (24)

\frac{1}{\hat{\lambda}} = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{1}{t_i} - \frac{1}{\hat{\mu}}\right).   (25)

Similarly, for the lognormal distribution (19) of ISIs, the maximum likelihood estimators of the parameters are

\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n}\ln t_i,   (26)

\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}\left(\ln t_i - \hat{\mu}\right)^2.   (27)

The values of the coefficient of variation, c_v, were calculated by substituting the maximum likelihood estimates into Eqns. (12), (16) or (20). Consequently, the other two dispersion coefficients, ĉ_h and ĉ_J, were computed by substituting the estimated c_v into Eqns. (13) and (14) for the gamma distribution, into Eqns. (17) and (18) for the inverse Gaussian, and into Eqns. (21) and (22) for the lognormal distribution.
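The closed-form maximum likelihood estimators (24)-(27) translate directly into code. A Python sketch with our own function names (the gamma case, which requires numerical optimization, is omitted):

```python
import numpy as np

def invgauss_mle(isi):
    """Closed-form ML estimates for the inverse Gaussian (15),
    Eqns. (24)-(25): mu_hat = sample mean, 1/lambda_hat = mean(1/t - 1/mu_hat)."""
    t = np.asarray(isi, dtype=float)
    mu_hat = t.mean()
    lam_hat = 1.0 / np.mean(1.0 / t - 1.0 / mu_hat)
    return mu_hat, lam_hat

def lognorm_mle(isi):
    """ML estimates of the parameters of the lognormal (19),
    Eqns. (26)-(27): mean and variance of ln T (ddof = 0 for the MLE)."""
    x = np.log(np.asarray(isi, dtype=float))
    return x.mean(), x.var()

# c_v then follows from Eq. (16) or Eq. (20), e.g.:
mu_hat, lam_hat = invgauss_mle([0.8, 1.1, 0.9, 1.4, 0.7, 1.2])
print(np.sqrt(mu_hat / lam_hat))       # estimated c_v for the inverse Gaussian
```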
In the second study, the coefficient of variation was estimated for all the mentioned distributions by the commonly used moment method, as the ratio of the sample standard deviation, s, and the sample mean, T̄,

\hat{c}_v = \frac{s}{\bar{T}}.   (28)

Both the entropy-based and the Fisher information-based dispersion coefficients were then calculated by substituting the estimate (28) into Eqns. (13)-(14), (17)-(18) or (21)-(22). The quality of the estimators was evaluated by two measures. The former is the bias,

b(c_h) = \frac{1}{n}\sum_{i=1}^{n}\hat{c}_{h,i} - c_h,   (29)

where n = 5000 is the number of simulated spike trains and ĉ_{h,i} is the value estimated from the i-th spike train; an analogous equation is used for the evaluation of b(c_J). The latter measure is the relative standard error, e(c_h), expressed as the ratio of the standard deviation and the mean value of the estimate,

e(c_h) = \frac{\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{c}_{h,i} - \frac{1}{n}\sum_{j=1}^{n}\hat{c}_{h,j}\right)^2}}{\frac{1}{n}\sum_{i=1}^{n}\hat{c}_{h,i}}.   (30)

As c_v grows from zero, the relative standard errors of the estimators decrease and attain their minima approximately at the values of c_v for which the respective dispersion coefficients are maximal.
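A sketch of the second study for the gamma case: the moment estimate (28) is computed per simulated train, plugged into Eq. (13), and the bias (29) and relative standard error (30) are evaluated across trains. The setup follows the simulation parameters above:

```python
import numpy as np
from scipy.special import gammaln, digamma

def gamma_c_h(c_v):
    """Plug-in of the estimated c_v into Eq. (13)."""
    k = 1.0 / c_v**2
    return np.exp(gammaln(k) + k - 1.0 + (1.0 - k) * digamma(k)) / k

rng = np.random.default_rng(2)
c_v_true, n_trains, n_isi = 0.5, 5000, 100
k_true = 1.0 / c_v_true**2
isi = rng.gamma(shape=k_true, scale=1.0 / k_true, size=(n_trains, n_isi))

cv_hat = isi.std(axis=1, ddof=1) / isi.mean(axis=1)   # Eq. (28), per train
ch_hat = gamma_c_h(cv_hat)

bias = ch_hat.mean() - gamma_c_h(c_v_true)            # b(c_h), Eq. (29)
rel_err = ch_hat.std(ddof=1) / ch_hat.mean()          # e(c_h), Eq. (30)
print(bias, rel_err)
```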

Experimental data
In order to examine the variability or irregularity of the ISIs in real neurons using the proposed dispersion coefficients, we apply the measures to experimental data. The data come from extracellular recordings of olfactory receptor neurons of freely breathing and tracheotomized rats. Spontaneous, single-unit action potentials were recorded, and the single-unit nature of the recorded spikes was controlled. The experimental procedures and data analysis were published in [15], where complete details are given. The analyzed data consist of records from freely breathing rats and of 11 records from tracheotomized rats. The sample sizes range from 150 to 1500 ISIs, and all records were tested for stationarity.
All samples were fitted with the inverse Gaussian distribution (15), as a commonly used distribution of ISIs. The histogram of ISIs of a typical record and the fitted probability density function are depicted in Fig. 9. The mean, μ, and the scale parameter, λ, were estimated by the maximum likelihood method, Eqns. (24) and (25). The fit of the data to the inverse Gaussian distribution was checked by the Kolmogorov-Smirnov test; the null hypothesis was not rejected at the 5% level in any sample. The dispersion coefficients c_v, c_h and c_J were calculated by substituting the estimated parameters into Eqns. (16), (17) and (18).
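A sketch of this fitting-and-testing step with scipy, on synthetic stand-in data (the recordings themselves are not reproduced here). Note that scipy's invgauss parametrization differs from Eq. (15): the fitted distribution has mean mu_shape*scale and scale parameter λ = scale:

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for one ISI record: inverse Gaussian with mean 1 and
# c_v = 0.5 (scipy shape mu_shape = 0.25, scale = lambda = 4).
rng = np.random.default_rng(3)
isi = stats.invgauss.rvs(0.25, scale=4.0, size=500, random_state=rng)

# Maximum likelihood fit with the location fixed at zero, then a KS test of the fit.
mu_shape, loc, scale = stats.invgauss.fit(isi, floc=0)
ks = stats.kstest(isi, 'invgauss', args=(mu_shape, loc, scale))
print(ks.pvalue)                 # fit not rejected at the 5% level if p > 0.05

# In scipy's parametrization the mean is mu_shape*scale and lambda = scale,
# so c_v = sqrt(mean/lambda) = sqrt(mu_shape), cf. Eq. (16).
print(np.sqrt(mu_shape))
```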
The values of the estimated dispersion coefficients are summarized as box-and-whisker plots in Fig. 10. Generally, the two categories, tracheotomized and freely breathing, do not differ significantly in the medians of c_v, c_h or c_J. Although the ranges of the values overlap in both categories, the values of the coefficients appear relatively specific to the freely breathing category. The differences between the mean values are greater than those between the medians. We can observe that the tracheotomized category achieves higher values of c_v and lower values of both c_h and c_J than the freely breathing category. Taking into account the interquartile range and the range between the whiskers, the Fisher information-based dispersion coefficient, c_J, seems to be the best of the three examined coefficients for distinguishing the two categories in this data set. The two groups of rats were compared by applying a one-sided variant of the Mann-Whitney test to each of the three dispersion coefficients. However, due to the small sample sizes, no differences between the two groups were confirmed at the 95% confidence level.
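The group-comparison step, sketched with scipy's one-sided Mann-Whitney test on hypothetical per-neuron coefficient values (the numbers are invented for illustration and are not the experimental values):

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Hypothetical per-neuron c_J values for the two groups (made-up numbers,
# NOT the experimental values; for illustration of the test only).
c_J_tracheotomized = np.array([0.21, 0.24, 0.26, 0.28, 0.30, 0.31,
                               0.33, 0.35, 0.36, 0.38, 0.40])
c_J_free_breathing = np.array([0.30, 0.33, 0.35, 0.37, 0.39, 0.41, 0.43, 0.45])

# One-sided alternative: tracheotomized c_J tends to be lower.
stat, p = mannwhitneyu(c_J_tracheotomized, c_J_free_breathing,
                       alternative='less')
print(stat, p)
```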
Moreover, scatterplots of pairs of the dispersion coefficients are shown in Fig. 11. The two categories of rats are best distinguishable in panel c), with the tracheotomized category having lower values of c_h together with lower values of c_J than the freely breathing one. Note also the positions of the points in panel c), which confirm that two different c_J values can correspond to a unique c_h value.

Discussion
In recent years, information-based measures of signal regularity or randomness have gained significant popularity in various branches of science [24,34-37]. In this paper, we constructed dispersion-like quantities based on these information measures and applied them to the description of neuronal ISI distributions. In particular, we continued the effort initiated in [4,23] by taking into account a variant of the Fisher information, which has also been employed in different contexts [24,38-41].
We are motivated by the difference between the frequently confused notions of ISI variability and randomness, which represent two different concepts. Consider, for example, a spike train consisting of ''long'' and ''short'' ISIs with no serial correlations. By adding ISIs of ''medium'' length we do not increase the spiking variability, contrary to what might be expected intuitively, but decrease it. On the other hand, since the number of distinct ISI lengths increases, the spiking randomness is increased. Furthermore, even if a conventional analysis of two spike trains reveals no difference, the spike trains may still differ in their randomness, and the difference is tractable with a relatively limited amount of data [4].
Additionally, by considering the Fisher information-based dispersion coefficient, c_J, we show that ISI randomness (increasing with the diversity of ISI lengths) and probability density ''smoothness'' are related, but still different, notions. For example, all of the tested distributions are ''maximally smooth'' for c_v ≈ 0.5 and ''maximally even'' (maximum ISI randomness) for c_v ≈ 1.
The statistical properties of the parametric estimators of c_v, and consequently of c_h and c_J, are illustrated on simulated data. The results show that the accuracy of the dispersion coefficients depends on the distribution. However, a common property can be found: the estimated values of c_h as well as of c_J become accurate at the points of maxima of these dispersion coefficients, regardless of the ISI distribution used. It is also shown that both the ISI distribution and the method used for estimating the parameters from the sample strongly influence the bias of the estimators ĉ_h and ĉ_J.
In this paper, we used the parametric estimates of c_v, c_h and c_J for both the simulated and the experimental data analyses: a specific parametric family of distributions was assumed and only its parameters were estimated. On the other hand, it is natural to ask for nonparametric versions of the estimators. The nonparametric estimate of c_v is simply calculated from the first two sample moments; recently, the disadvantages of this estimator, notably its bias, were discussed in [42]. Nonparametric estimates of the entropy are known [43,44], and we found that c_h can be estimated reliably. As regards the nonparametric estimation of c_J, approaches based either on spline interpolation of the empirical cumulative distribution function [45] or on a specialized kernel-based method for the estimation of the probability density function [46] can be used. Nevertheless, the estimation of the Fisher information-based dispersion coefficient c_J is a complex task; preliminary results of our work in progress are promising.
The coefficients were also evaluated from the experimental data, spontaneous action potentials of olfactory receptor neurons in tracheotomized and freely breathing rats. Assuming the inverse Gaussian model, the three estimated dispersion coefficients quantify small differences between the two categories. Taking into account their variability, c_J seems to be the best measure for distinguishing the categories. Another approach uses pairs of the coefficients c_v, c_h and c_J to discriminate between the groups; for the analyzed data, the pair c_h and c_J seems to be the most effective choice.

Supporting Information
Supporting Information S1 Detailed calculation of the coefficients c_h and c_J for the gamma, inverse Gaussian and lognormal distributions. (PDF)
Figure 11. Dependencies between different dispersion coefficients for the same experimental data as in Fig. 10. The data were fitted by the inverse Gaussian distribution and the dispersion coefficients were computed from the maximum likelihood estimates of its parameters. Two categories of data are distinguished: tracheotomized and freely breathing rats. There are small differences between the two categories, as quantified by all three coefficients. See Figs. 3, 4 and 5 for the inverse Gaussian distribution for the complete curves of the dependencies. doi:10.1371/journal.pone.0021998.g011