Novel Covariance-Based Neutrality Test of Time-Series Data Reveals Asymmetries in Ecological and Economic Systems

Systems as diverse as the interacting species in a community, alleles at a genetic locus, and companies in a market are characterized by competition (over resources, space, capital, etc) and adaptation. Neutral theory, built around the hypothesis that individual performance is independent of group membership, has found utility across the disciplines of ecology, population genetics, and economics, both because of the success of the neutral hypothesis in predicting system properties and because deviations from these predictions provide information about the underlying dynamics. However, most tests of neutrality are weak, based on static system properties such as species-abundance distributions or the number of singletons in a sample. Time-series data provide a window onto a system’s dynamics, and should furnish tests of the neutral hypothesis that are more powerful to detect deviations from neutrality and more informative about to the type of competitive asymmetry that drives the deviation. Here, we present a neutrality test for time-series data. We apply this test to several microbial time-series and financial time-series and find that most of these systems are not neutral. Our test isolates the covariance structure of neutral competition, thus facilitating further exploration of the nature of asymmetry in the covariance structure of competitive systems. Much like neutrality tests from population genetics that use relative abundance distributions have enabled researchers to scan entire genomes for genes under selection, we anticipate our time-series test will be useful for quick significance tests of neutrality across a range of ecological, economic, and sociological systems for which time-series data are available. Future work can use our test to categorize and compare the dynamic fingerprints of particular competitive asymmetries (frequency dependence, volatility smiles, etc) to improve forecasting and management of complex adaptive systems.

company was removed from analysis. 6 remaining PERMNOs, 88313, 88319, 88587, 89463, 89813, 90352, and 90435, had zero entries -these PERMNOs were removed from the data. The PERMNOs for the remaining 451 S&P500 companies used in our analysis are listed at the bottom of this supplemental information le.
Share prices were converted to relative abundances to obtain market shares, and each company's capitalization (the product of share price and share volume) was converted to relative abundances to obtain the market capitalization. The dates of the day-end prices were noted and time-increments were dened as the number of days between the dates of the day-end prices (e.g. ∆t = 1 for most weekdays, ∆t = 3 for most weekends, etc.).
For a comparison of relative-abundance-distribution based methods of assessing neutrality, a time point was taken from each of the datasets, the counts were rareed to 5000 for comparison across datasets (for the market capitalization, the unit was taken as the market share of the smallest company), and species-abundance distributions were t using the R package 'sads'. Many of the datasets appear neutral from this perspective, and all of the datasets whose RADs are well-t by neutral theory's expectations are in fact rejected using our test. Figure 1: Empirical rank-abundance distributions (circles) plotted against the best-t from neutral theory (blue line). Tests based on these static features may provide useful comparisons between models, but time-series data are ultimately needed to distinguish between models with similar or identical stationary properties.

Constant Volatility Transformations for the Wright-Fisher Process (WFP)
The WFP is dened by the Itô stochastic dierential equation dening uctuations in relative abundances in an n-species community, X t = X 1 t , ..., X n testing if the volatility comes from our particular model, is to nd a function, f , which has constant volatility when its input is a WFP and non-constant volatility otherwise. Thus, we seek a function f : for some constant, C. Applying Itô's lemma to equation 1, we have that If we let our constant be C = 2 a constant volatility function will be a function f such that ∇f T Σ∇f = 1 noting the trivial case with C = 0 can always be obtained by any function f X 1 t + ... + X n t = f (1). Simplifying notation by denoting X i t = x i , equation (5) can be expanded using equation (2): If n = 3, and we denote x 1 = x, x 2 = y and x 3 = z, we obtain the partial dierential equation dening constant-volatility transformations for a 3-species WFP.
When n = 2, the PDE in equation 6 is simply which can be solved by substituting u = x + y and v = x − y, simplifying to Since X t ∈ ∆ 2 and u = 1 the solution is simply For n > 2, a set of solutions can be intuited by using the grouping invariance of the WFP, i.e. that higher dimensional Wright-Fisher Processes can be collapsed into lower-dimensional Wright-Fisher processes by simply grouping species. For example, where n = 3, we can set w = x + y and the resulting process Z t = (w, z) is a 2-species WFP. Thus, f = arcsin (x + y − z) is a CVT for n = 3. Since any re-grouping of species in a WFP yields a WFP, and since we always know the CVTs for the 2-species case, we can dene a set of CVTs for an n-dimensional WFP: where a i = ±1. All 2 n of the these CVTs solve equation 6 and have constantvolatility when operating on a WFP.
Two details about this test should be noted. First, one might consider constructing a test on the magnitude -and not just the constancy -of the volatility.
Doing so will provide a stronger test of the particular WFP written in equation 1, but will have a higher false-positive rate as a test of the neutrality of competition. This is clear by considering the process formed by rescaling the volatility of a standard WFP by a scalar, α.
and thus the species can still be grouped to form lower-dimensional WFPsthis process is still neutral in the sense that species identities do not aect their dynamics, but its volatility is a dierent magnitude than the WFP of equation (1). Tests for the constancy, and not the magnitude, of the volatility of f a will provide more robust tests of the neutral symmetry -grouping invariance.
The second detail is that the 2 n constant volatility tests are redundant and some are uninformative. For instance, the sample volatilities of f a and f −a are equal, and the volatilities for a = (1, 1, ..., 1) are zero since f ±1 = arcsin (±1). Thus, there are only 2 n−1 − 1informative choices of CVT.

Modication of KS-test to account for dependence among CVTs
The P-values arising from CVT tests are dependent as two CVTs may overlap greatly in their grouping of species in the community. Consequently, a standard KS-test on the distribution of P-values will have a high false-positive rate (FIGURE S1).
Correcting the high false positive rate of a standard KS-test requires accounting for the dependence among the CVTs. Chicherportiche et al. [3] developed a modied goodness of t test for dependent data that requires calculation of copulas. However, rather than calculating copulas explicitly and using the methods developed by Chicherportiche et al., we resort to simulation of the null distribution of the KS statistic for CVTs operating on a WFP.   A full perturbation analysis including varying λ and T sim , or an explicit calculation of the copulas, will improve the accuracy of this method for testing the neutrality of competitive systems.

Simulation of Surrogate Data
Simulating the WFP is not trivial due to the bounded state space, ∆ n . Some tools for the numerical simulation of the WFP were developed in the thesis of Washburne (2015) [4]. The relevant tools are discussed here.

WFP simulation
We used Washburne's (2015) methods for simulating the WFP, namely simulating a log-ratio transformation, for i = 1, ..., n. Trajectories for the log-ratio transformation of the WFP were computed by integrating the SDE, dt, shown to keep X t ∈ ∆ n without aecting the bilinear quadratic covariation of X t [4].
We also use Washburne's method for factoring the covariance matrix, Σ, into a zero-sum matrix and a competition matrix, Here, 1 k is the k × 1 vector containing all ones, 0 k is the k × 1 vector containing all zeroes, and I k is the k × k identity matrix. V is a diagonal matrix whose ith diagonal is the ithe element of the vectorized, upper-triangular region of Σ. For example, if n = 4 and X 1 t = x, X 2 t = y, X 3 t = z and X 4 t = w, we have the All WFP trajectories simulated here were simulated by Euler-Maruyama integration of the dimension-corrected Itô SDE of the log-ratio transform, g, using the covariation matrix factorization described above. The trajectories of g t were then mapped through the inverse of g t to obtain the WFP X t .

Parameter Estimation
Surrogate datasets are constructed by numerically integrating equation (1) over a time interval, [0, T sim ]. To match at WFP to the data, we need to estimate parameters in the model, λ, ρ, and T . First, we estimate λ and ρ. Second, since λ denes an autocorrelation time, we can use λ to estimate the timescale of the simulations, T sim . The parameters λ and ρ can be estimated by their relation to the stationary mean & variance of the WFP. Consider a two-species WFP, n = 2, and for simplicity let's use the notation X 1 t = X t and X 2 t = Y t and ρ 1 = ρ. Denoting the rst and second moments of the WFP m t = E[X t ] and s t = E[(X t ) 2 ], we can use Itô's lemma and Itô's isometry to yield the system of ODEs ds t dt = 2 (1 + λρ) m t − 2 (1 + λ) s t which yields the stationary mean, m ∞ = lim t→∞ m t and second moment s ∞ = lim t→∞ s t Thus, the stationary mean is µ = ρ and the stationary variance is given by 1+λ . These two equations give one way of estimating λ and ρ. Given a two-species system, we estimate the sample mean, µ, and sample variance,v 2 , and from those obtain our estimates of λ and ρ: The last parameter that remains to be estimated is the length of time, T sim , to simulate our surrogate datasets. An estimate,T sim , can be obtained by noting that the true autocorrelation time of the WFP is τ c = 1 λ , i.e. the autocorrelation Cor(X t , X t+τc ) = e −1 . For a dataset with M time points, the autocorrelation function can be interpolated to nd the time lag, τ with 0 ≤ τ ≤ M , where C(X t , X t+τ ) = e −1 . We want a surrogate dataset to be simulated for the same number of autocorrelation times as our empirical data, i.e.T sim τc = M τ , thus the time-length for the surrogate dataset will bê T sim = M λτ . To analyze the distribution of P-values from constant-volatility tests, we generated 16,000 surrogate trajectories of f t . To generate these surrogate data, 16,000 random groupings, a, were chosen by randomly assigning a i = ±1,ρ was estimated as average relative abundance of one group,λ was kept atλ = 18.0063 from the previous estimation, and one WFP trajectory was simulated and subsampled as above. Generating 16,000 trajectories required discarding 11 trajectories which left the bounded domain. The mean distance from the boundary, δ = 0.5 − |ρ − 0.5|, was signicantly lower in the 11 discarded trajectories compared to trajectories included in our analysis: trajectories included in our analysis had δ = 0.3598, compared to δ = 0.0367 for the 11 trajectories discarded.
Consequently, these surrogate data had a very slight bias against extremely lop-sided groupings.
Part V PERMNOS