An O(n) method of calculating Kendall correlations of spike trains

William Redman

doi:10.1371/journal.pone.0212190

Abstract

The ability to record from increasingly large numbers of neurons, and the increasing attention being paid to large scale neural network simulations, demand computationally fast algorithms to compute relevant statistical measures. We present an O(n) algorithm for calculating the Kendall correlation of spike trains, a correlation measure that is becoming especially recognized as an important tool in neuroscience. We show that our method is around 50 times faster than the O(n ln n) method which is a current standard for quickly computing the Kendall correlation. In addition to providing a faster algorithm, we emphasize the role that taking the specific nature of spike trains into account had in reducing the run time. We imagine that there are many other useful algorithms that can be even more significantly sped up when taking this into consideration. A MATLAB function executing the method described here has been made freely available online.

Citation: Redman W (2019) An O(n) method of calculating Kendall correlations of spike trains. PLoS ONE 14(2): e0212190. https://doi.org/10.1371/journal.pone.0212190

Editor: Bryan C. Daniels, Arizona State University & Santa Fe Institute, UNITED STATES

Received: June 12, 2018; Accepted: January 29, 2019; Published: February 14, 2019

Copyright: © 2019 William Redman. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: There is no data for this paper. The code used to evaluate the two principal methods discussed in the manuscript has been made available on GitHub (https://github.com/william-redman/Kendall-Correlation-for-Large-Spike-Trains) as mentioned in S1 Code.

Funding: The author received no specific funding for this work.

Competing interests: The author has declared that no competing interests exist.

Introduction

The Kendall correlation was first introduced by Maurice Kendall in 1938 [1]. As a rank correlation, it takes into account the specific ordering of the elements of the sets it is correlating. A Kendall correlation, τ, equal to 1 is interpreted as the elements in the two sets being ordered in the same way. τ = −1 is interpreted as the elements in the two sets being ordered exactly oppositely. And τ = 0 is interpreted as the ordering of the two sets having no relation to one another.

Despite being used in a number of other scientific fields [2–4], it is only recently that the Kendall correlation has started to become appreciated, and implemented, in neuroscience. In particular, due to the usual sparseness of spike trains (i.e. the large number of zeros), the Kendall correlation has been shown to be particularly appropriate for computing pairwise correlations between spike trains, especially as compared to Pearson’s correlation [5–7]. Recently, it was used to explore the place field structure of place cells in the hippocampus [7], and generally pairwise correlations can be useful for revealing aspects of the behavior of the recorded, or constructed (in the case of computational/theoretical studies), networks. We note that for the remainder of the paper, by spike train we mean specifically a vector of length n whose i^th element is a 1 if the corresponding neuron fired at least once during the i^th time bin of the recorded interval and 0 otherwise. This is a frequently used way to talk about spike trains and is appropriate if firing is particularly sparse or if the time bin size is sufficiently small.

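A simple, non-optimized way of computing the Kendall correlation of two row vectors, X and Y, is MATLAB’s function, corr(X, Y, 'Type', 'Kendall'). On MATLAB’s website [8], they define the Kendall correlation as

$$\tau = \frac{2K}{n(n-1)}, \tag{1}$$

where K = ∑_{i=1}^{n−1} ∑_{j=i+1}^{n} ξ*(X_i, X_j, Y_i, Y_j), and

$$\xi^{*}(X_i, X_j, Y_i, Y_j) = \begin{cases} 1 & \text{if } (X_i - X_j)(Y_i - Y_j) > 0, \\ 0 & \text{if } (X_i - X_j)(Y_i - Y_j) = 0, \\ -1 & \text{if } (X_i - X_j)(Y_i - Y_j) < 0. \end{cases} \tag{2}$$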

However, as additionally stated, MATLAB’s function also has a normalization constant in the calculation of τ that adjusts for ties [8]. A Kendall correlation that takes this additional consideration into account is often referred to as τ_b in the literature [9]. Therefore, the true way in which MATLAB calculates the Kendall correlation of the row vectors X and Y is

$$\tau = \frac{K}{\sqrt{(n_0 - n_1)(n_0 - n_2)}}, \tag{3}$$

where n₀ = n(n − 1)/2, n₁ = ∑_i t_i(t_i − 1)/2, and n₂ = ∑_j u_j(u_j − 1)/2. The sums of n₁ and n₂ are over all the distinct values X and Y take (respectively), and t_i is the number of elements in X equal to the i^th distinct value of X (u_j is the same, but for Y).
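To make Eqs (1) to (3) concrete, the following is a minimal Python sketch of the direct O(n²) calculation. It is an illustration written for this text, not MATLAB’s implementation; the function names xi_star, tied_pairs and kendall_tau_b_naive are hypothetical, and inputs are assumed to be equal-length sequences of numbers.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def xi_star(xi, xj, yi, yj):
    """Eq (2): +1 if the pair is ordered the same way in X and Y, -1 if opposite, 0 if tied."""
    prod = (xi - xj) * (yi - yj)
    return (prod > 0) - (prod < 0)


def tied_pairs(values):
    """Sum of t(t - 1)/2 over the distinct values, where t is each value's count."""
    return sum(t * (t - 1) // 2 for t in Counter(values).values())


def kendall_tau_b_naive(x, y):
    """Tie-adjusted Kendall correlation, Eq (3), by looping over all n(n - 1)/2 pairs."""
    n = len(x)
    k = sum(xi_star(x[i], x[j], y[i], y[j]) for i, j in combinations(range(n), 2))
    n0 = n * (n - 1) // 2
    n1, n2 = tied_pairs(x), tied_pairs(y)
    # Assumption: neither vector is constant, otherwise the denominator is zero.
    return k / sqrt((n0 - n1) * (n0 - n2))
```

For example, kendall_tau_b_naive([1, 1, 0, 0, 0], [1, 0, 1, 0, 0]) has K = 1, n₀ = 10 and n₁ = n₂ = 4, giving τ = 1/6.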

As can be seen from the definition of K, calculating τ requires summing over all pairs of values in X and Y (n(n − 1)/2 pairs, which means that the run time is O(n²)). For long spike trains, this results in a large computation time. For this reason, a faster, O(n ln n) method was developed [10], which makes use of a correspondence between sorting and the Kendall correlation. Additional work has used sorting and balanced tree structures to decrease the run time of other O(n ln n) methods [11]. These methods (below we consider specifically Knight’s method [10]) are powerful because, like the O(n²) method implemented by MATLAB, they are valid for arbitrary vectors, but this generality is unnecessary for computing the Kendall correlation of spike trains. Below, we take the inherent structure of spike trains (that is, that their elements take values only from {0, 1}) into consideration to derive a faster method of calculating Kendall correlations specific to spike trains. We show that our new method is O(n) and then examine how much faster it is than Knight’s method under various conditions.

Materials and methods

As mentioned above, the motivating idea for the following method is that, because spike trains take values only in {0, 1}, we might be able to speed up the calculation of the Kendall correlation. In particular, we show that we can write an explicit formula for K (from Eq (1)) that can be evaluated in O(n) time.

Considering Eq (2), we see that there are two principal cases we need to consider when calculating K: the case where X_i and X_j are in the same order as Y_i and Y_j (i.e. where ξ*(X_i, X_j, Y_i, Y_j) = 1), and the case where they are in the opposite order (i.e. where ξ*(X_i, X_j, Y_i, Y_j) = −1). The third case, ξ*(X_i, X_j, Y_i, Y_j) = 0, does not contribute to the value of K. We now consider these two cases separately.

Same order case

This case happens only when X_i = Y_i = 1 and X_j = Y_j = 0, or when X_i = Y_i = 0 and X_j = Y_j = 1 (for i < j).

We define the active set of X to be

$$A^{X} = \{\, i : X_i = 1 \,\}, \tag{4}$$

where 1 ≤ i ≤ n. We similarly define the active set of Y, A^Y.

We now define the combined active set, or the set of positions in the spike trains such that X_i = Y_i = 1, as

$$A = A^{X} \cap A^{Y}. \tag{5}$$

Now let N = {1, 2, …, n}. We define the silent set of X as

$$S^{X} = N \setminus A^{X}, \tag{6}$$

where ⋅\⋅ is the set minus operator. We similarly define the silent set of Y, S^Y.

We now define the combined silent set, or the set of positions in the spike trains such that X_j = Y_j = 0, as

$$S = S^{X} \cap S^{Y}. \tag{7}$$

With Eqs (5) and (7), we can find the contribution to K from this case. The number of ways ξ*(X_i, X_j, Y_i, Y_j) = 1, K⁺, is

$$K^{+} = \sum_{i \in A} \left|\{\, j \in S : j > i \,\}\right| + \sum_{i \in S} \left|\{\, j \in A : j > i \,\}\right|, \tag{8}$$

where |⋅| is the function that returns the number of elements of the set. We see clearly that the first sum in K⁺ is the number of ways X_i = Y_i = 1 and X_j = Y_j = 0, and the second sum in K⁺ is the number of ways X_i = Y_i = 0 and X_j = Y_j = 1.

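The two sums in Eq (8) are complementary: every pair made up of one position in A and one position in S is counted exactly once, in the first sum if the position in A comes first and in the second sum otherwise. We can therefore simplify K⁺ to be

$$K^{+} = |A|\,|S|. \tag{9}$$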
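The simplification of K⁺ to a product of two set sizes can be checked numerically. The sketch below is an illustration rather than the author’s code: k_plus_bruteforce counts same-order pairs directly from Eq (2), and k_plus_linear uses the product of the sizes of the combined active and silent sets. Both function names are hypothetical, and the inputs are assumed to be equal-length lists of 0s and 1s.

```python
def k_plus_bruteforce(x, y):
    """Count pairs i < j that are ordered the same way in X and Y (O(n^2))."""
    n = len(x)
    count = 0
    for i in range(n):
        for j in range(i + 1, n):
            if (x[i] - x[j]) * (y[i] - y[j]) > 0:
                count += 1
    return count


def k_plus_linear(x, y):
    """K+ as the product |A| * |S|, computed in a single pass (O(n))."""
    size_a = 0  # positions where both trains are active (X_i = Y_i = 1)
    size_s = 0  # positions where both trains are silent (X_i = Y_i = 0)
    for xi, yi in zip(x, y):
        if xi == 1 and yi == 1:
            size_a += 1
        elif xi == 0 and yi == 0:
            size_s += 1
    return size_a * size_s
```

For X = [1, 1, 0, 0, 0] and Y = [1, 0, 1, 0, 0], the combined active set has one element and the combined silent set has two, so both functions return 2.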

Opposite order case

This case happens only when X_i = Y_j = 1 and X_j = Y_i = 0, or X_i = Y_j = 0 and X_j = Y_i = 1 (for i < j).

We define the difference of X as

$$\Delta X = A^{X} \setminus A^{Y}. \tag{10}$$

We similarly define the difference of Y, ΔY. ΔX is the set of positions in the spike trains where X_i = 1 and Y_i = 0 (vice versa for ΔY).

With these we can now find the contribution to K from this case. The number of ways ξ*(X_i, X_j, Y_i, Y_j) = −1, K⁻, is

$$K^{-} = \sum_{i \in \Delta X} \left|\{\, j \in \Delta Y : j > i \,\}\right| + \sum_{i \in \Delta Y} \left|\{\, j \in \Delta X : j > i \,\}\right|, \tag{11}$$

where the first sum in K⁻ is the number of pairs (i, j) (where i < j) such that X_i = Y_j = 1 and X_j = Y_i = 0, and the second sum in K⁻ is the number of pairs (i, j) such that X_i = Y_j = 0 and X_j = Y_i = 1.

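Again, the sums are related (as they were in Eq (8)): every pair made up of one position in ΔX and one position in ΔY is counted exactly once, in whichever sum matches the order of the two positions. We can therefore re-write K⁻ as

$$K^{-} = |\Delta X|\,|\Delta Y|. \tag{12}$$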
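As a small illustration of the opposite order case, the sketch below builds the two difference sets explicitly and compares their product with a direct count of opposite-order pairs. The function names are hypothetical, the inputs are assumed to be equal-length lists of 0s and 1s, and positions are 0-based here (the text uses 1-based positions).

```python
def difference_sets(x, y):
    """Return (delta_x, delta_y): positions where only X fires, and where only Y fires."""
    delta_x = {i for i, (xi, yi) in enumerate(zip(x, y)) if xi == 1 and yi == 0}
    delta_y = {i for i, (xi, yi) in enumerate(zip(x, y)) if xi == 0 and yi == 1}
    return delta_x, delta_y


def k_minus(x, y):
    """K- as the product of the sizes of the two difference sets."""
    delta_x, delta_y = difference_sets(x, y)
    return len(delta_x) * len(delta_y)


def k_minus_bruteforce(x, y):
    """Count pairs i < j that are ordered oppositely in X and Y (O(n^2))."""
    n = len(x)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if (x[i] - x[j]) * (y[i] - y[j]) < 0
    )
```

For X = [1, 0, 1, 0, 1, 0] and Y = [0, 1, 0, 1, 1, 0], the difference sets are {0, 2} and {1, 3}, so K⁻ = 4, which matches the direct count.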

Ties

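The final thing needed in order to calculate τ is the number of tied pairs in X and Y, n₁ and n₂. This is easy in the case of spike trains, as the number of elements tied at the value 1 is just the sum of all the elements in the train, and the number of elements tied at the value 0 is just n minus that sum. Therefore, using the equation given for n₁, we have

$$n_1 = \frac{1}{2}\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} X_i - 1\right) + \frac{1}{2}\left(n - \sum_{i=1}^{n} X_i\right)\left(n - \sum_{i=1}^{n} X_i - 1\right). \tag{13}$$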

The same is true for n₂ (with Y in place of X).

Therefore, with Eqs (9), (12) and (13), we can write the Kendall correlation, Eq (3), of two neural spike trains as

$$\tau = \frac{K^{+} - K^{-}}{\sqrt{(n_0 - n_1)(n_0 - n_2)}}, \tag{14}$$

where K⁺, K⁻, n₀, n₁, and n₂ can be found with the formulas we have given for them. Note that the quantities in Eqs (5), (7), (9), (10), (12) and (13) can each be computed in time linear in n, i.e. O(n). Therefore, Eq (14) is O(n).
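Putting the pieces together, a minimal Python sketch of the O(n) calculation might look as follows. This is not the MATLAB function released with the paper; it is an illustration under the assumption that both inputs are equal-length lists of 0s and 1s and that neither is constant. The function name kendall_tau_b_binary is hypothetical.

```python
from math import sqrt


def kendall_tau_b_binary(x, y):
    """Tie-adjusted Kendall correlation of two binary spike trains in one pass."""
    n = len(x)
    both = only_x = only_y = neither = 0
    for xi, yi in zip(x, y):
        if xi and yi:
            both += 1      # position in the combined active set
        elif xi:
            only_x += 1    # position in the difference of X
        elif yi:
            only_y += 1    # position in the difference of Y
        else:
            neither += 1   # position in the combined silent set

    k = both * neither - only_x * only_y  # K+ minus K-

    n0 = n * (n - 1) // 2
    ones_x = both + only_x
    ones_y = both + only_y
    # Tied pairs: pairs of 1s plus pairs of 0s in each train.
    n1 = ones_x * (ones_x - 1) // 2 + (n - ones_x) * (n - ones_x - 1) // 2
    n2 = ones_y * (ones_y - 1) // 2 + (n - ones_y) * (n - ones_y - 1) // 2

    # Assumption: neither train is all 0s or all 1s, otherwise this divides by zero.
    return k / sqrt((n0 - n1) * (n0 - n2))
```

For X = [1, 1, 0, 0, 0] and Y = [1, 0, 1, 0, 0], the four counts are 1, 1, 1 and 2, so K⁺ − K⁻ = 1 and τ = 1/6. Only four counters are kept, so the memory use does not grow with n.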

Comparison

To compare the presented method, Eq (14), with Knight’s method and MATLAB’s method, we created random binary vectors with a specified “sparseness”. Here sparseness refers to the expected fraction of 1s present in the vectors (or, in the neural context, the expected activity over a given time interval). We generated these vectors by using MATLAB’s rand function, with which we generated 1 × n vectors with elements uniformly drawn from (0, 1) [12]. We then set every element in each vector that had a value less than the sparseness we specified to 1, and all other elements to 0. Put another way, if X^rand was our random 1 × n vector with elements drawn from (0, 1), then we used the transform

$$X_i = \begin{cases} 1 & \text{if } X^{\text{rand}}_i < \text{sparseness}, \\ 0 & \text{otherwise}. \end{cases} \tag{15}$$

We then used MATLAB’s method, Knight’s method, and our method to calculate the Kendall correlation of X and Y (where Y was similarly generated). To record the time it took for each method, we used MATLAB’s built-in tic and toc functions [13]. We did all of the calculations on a 2014 MacBook Air (1.4 GHz Intel Core i5) running MATLAB 2015a.
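The generation and timing procedure can be sketched in Python as follows. This is an analogue of the MATLAB workflow described above, not the benchmark code itself: random.Random.random stands in for rand, time.perf_counter stands in for tic and toc, and the function name random_spike_train, the seed, and the timed workload are all assumptions made for illustration.

```python
import random
import time


def random_spike_train(n, sparseness, rng):
    """Binary vector of length n whose entries are 1 with probability `sparseness`."""
    return [1 if rng.random() < sparseness else 0 for _ in range(n)]


rng = random.Random(0)  # fixed seed so the example is reproducible
x = random_spike_train(10_000, 0.05, rng)
y = random_spike_train(10_000, 0.05, rng)

# Time a single O(n) pass over both trains (a stand-in for the method being timed).
start = time.perf_counter()
coactive = sum(xi & yi for xi, yi in zip(x, y))
elapsed = time.perf_counter() - start
```

In a real comparison, the timed region would wrap each correlation method in turn, repeated over several independently generated pairs of vectors.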

For details of how we implemented Knight’s method, see the S1 Text.

Results

The results of comparing our method to Knight’s and MATLAB’s methods are shown in Fig 1. Unsurprisingly, both our method and Knight’s method show a considerable advantage over the O(n²) method that is implemented by MATLAB [8] (Fig 1a). However, our method is consistently faster than Knight’s. Importantly, this holds true for a range of sparseness values (Fig 1b), although our method slows down slightly for larger sparseness values, while Knight’s method does not. Our method is on average ≈ 35 times faster for a sparseness of 25% and ≈ 60 times faster for a sparseness of 1%. Because a sparseness of 25%, the maximum we tested, is already higher than is realistic for a neural simulation or recording, our method is faster than Knight’s throughout the neurally plausible regime.


Fig 1. Run times for all three methods.

(a) The run time as a function of spike train length using Knight’s method (black), our method (red), and the standard MATLAB method (green) for a sparseness of 5%. N = 10 and error bars are standard deviation. (b) The run time as a function of spike train length for different sparseness values: dotted line (1%), dashed line (5%), solid line (25%). N = 100, error bars are standard deviation, and colors are the same as in (a).

https://doi.org/10.1371/journal.pone.0212190.g001

Finally, for all the correlations between spike trains we computed, we checked that the Kendall correlation values returned by our method and by Knight’s method were each within 10⁻¹² of the value returned by MATLAB’s Kendall correlation function (see Table 1). Therefore, we feel confident that our method is correct and equivalent (up to machine error) to MATLAB’s method.


Table 1. Examples of calculated Kendall correlation for all three methods.

Kendall correlation of the spike trains listed at the top of the table (both with length 10⁴) for the three methods.

https://doi.org/10.1371/journal.pone.0212190.t001

Discussion

We have presented a novel method to calculate Kendall correlations of large spike trains, and have demonstrated its advantage (in terms of computation time) over the standard for fast Kendall correlation computation [10]. We achieved this by specifically taking the structure of spike trains (the fact that they are made up of 1s and 0s) into consideration, and deriving explicit formulas for the components of the Kendall correlation (Eqs (9), (12) and (13)). These formulas can all be evaluated in time linear in n, meaning our method is O(n), unlike Knight’s method, which is O(n ln n). We have also, by way of computation, provided evidence that our method is correct and equivalent (up to machine error) to MATLAB’s standard method.

With a significantly faster method to compute the Kendall correlation between large spike trains, we hope that the Kendall correlation will become a more accessible tool for neuroscience. While implementations of algorithms similar to Knight’s (as explored in [11]) may be faster than the method provided here, the simplicity of our method (a few linear equations) makes it more appealing to neuroscientists who have limited technical knowledge of, or interest in, computer science. We imagine it will be especially useful in computational/theoretical studies where large, sparse spike trains are frequently generated and whose pairwise correlations provide insight into the complex properties of the network. We hope that, because pairwise correlations over significantly longer time intervals (or equivalently, between spike trains of longer lengths) can now be calculated quickly, more in-depth analysis of generated networks (in addition to analysis of observed/recorded networks) will be achieved.

Finally, we hope that our results make clear the usefulness of specifically considering the structure of spike trains when calculating certain quantities. We expect that many other measures can be significantly sped up when taking this into consideration.

Supporting information

S1 Code.

https://doi.org/10.1371/journal.pone.0212190.s001

(PDF)

S1 Text. Implementation of Knight’s method.

https://doi.org/10.1371/journal.pone.0212190.s002