Synchrony is more than its top-down and climatic parts: interacting Moran effects on phytoplankton in British seas

Large-scale spatial synchrony is ubiquitous in ecology. We examined 56 years of data representing chlorophyll density in 26 areas in British seas monitored by the Continuous Plankton Recorder survey. We used wavelet methods to disaggregate synchronous fluctuations by timescale and determine that drivers of synchrony include both biotic and abiotic variables. We tested these drivers for statistical significance by comparison with spatially synchronous surrogate data. Identification of causes of synchrony is distinct from, and goes beyond, determining drivers of local population dynamics. We generated timescale-specific models, accounting for 61% of long-timescale (> 4yrs) synchrony in a chlorophyll density index, but only 3% of observed short-timescale (< 4yrs) synchrony. Thus synchrony and its causes are timescale-specific. The dominant source of long-timescale chlorophyll synchrony was closely related to sea surface temperature, through a climatic Moran effect, though likely via complex oceanographic mechanisms. The top-down action of Calanus finmarchicus predation enhances this environmental synchronising mechanism and interacts with it non-additively to produce more long-timescale synchrony than top-down and climatic drivers would produce independently. Our principal result is therefore a demonstration of interaction effects between Moran drivers of synchrony, a new mechanism for synchrony that may influence many ecosystems at large spatial scales.

155 If x n (t) (t = 1, . . . , T ) is the nth of N time series from different locations, and W n,σ (t) is its wavelet transform, then we refer to w n,σ (t) = W n,σ (t)/|W n,σ (t)| as the phasor-normalized transform. We also define where overbar denotes complex conjugation, as the power-normalized transform, using the same notation, 156 w n,σ (t), as was used for the phasor-normalized transforms. The normalization used in any given context 157 will be specified textually if it is not immediately clear. The terminology power-normalized transform 158 was chosen because the denominator in the expression for the power-normalized transforms w n,σ (t) is 159 the square root of the average wavelet power of time series over locations. That denominator is only a 160 single positive rescaling factor, so the w n,σ (t) contain essentially the same information as the W n,σ (t).

161
The phases of w n,σ (t) and W n,σ (t) are the same, and are equal to the phase of oscillation in x n (t) at time 162 t and timescale σ.

163
Appendix S6 Spatial coherence 164 The spatial coherence of two variables x where the w are power-normalized transforms (Appendix S5). The spatial coherence takes values between 166 0 and 1 (lemma 4). Because w (0) n,σ (t)w (1) n,σ (t) is a complex number with phase equal to the phase difference 167 between the two wavelet components, the sum over n and t is large if the phase difference is consistent the spatial coherence is the (standard) wavelet coherence [Torrence andCompo, 1998, Addison, 2002].
We use power-normalized transforms in this section. The coefficient β k (σ) is a complex number with 179 phase equal to the phase difference between the fluctuations in x  Hilbert space under the inner product w n,σ (t), and we denote the 188 associated norm by · .

189
Applying lemma 5 with V = Span(w The point v is an approximation of w sense that, for any other w ∈ V , the spatial coherence, n,σ (t) with the power-normalization 193 of w is less than or equal to the spatial coherence, n,σ (t) with the power-normalization of 194 v. This follows from lemma 5.

195
The point v can be written explicitly as a complex linear model expressing the transforms w n,σ (t) (lemma 5). 199 We know a solution of this equation exists because v ∈ V = Span(w  statistics calculated for surrogate and real data when the real data has been pre-normalized. For spatially 219 synchronous surrogates, for each timescale σ the same phase is added to the Fourier components at σ for 220 each of the x n (t). Thus surrogates also preserve cross-spectrum and cross-correlation properties of the 221 time series. This is the approach developed by Prichard and Theiler [1994]. For asynchronous surrogates, 222 independent phases are added for each x n (t), so that synchrony between the x n (t) is eliminated at all 223 timescales. 224 We developed a significance test for adding variables to the linear model constructed in Appendix S7.

226
Consider adding another variable to an explanatory model consisting of K variables. If x is significantly better than the linear model The resulting test, now described in detail, is the wavelet linear model analogue of standard F -tests used 231 to compare nested models in a classic general linear modeling framework.

232
Using the notation of Appendix S7 and V = Span(w j,n,σ (t) as the power-normalized transforms of 234 a j th synchrony-preserving surrogate of x (K+1) n (t) for j = 1, . . . , 10000, and write w j,n,σ (t)) for each timescale, σ. Thus we have d data,σ and d j,σ for 236 j = 1, . . . , 10000, making the σ dependence explicit. We consider that the model (13) is a significant 237 improvement over the model (14), for timescale σ at significance level α, if d data,σ 2 is smaller than a 238 fraction 1 − α of the d j,σ 2 . In words, the more complex model is considered better at σ if the additional 239 predictor allows a reliably greater improvement in the representation of w (0) n,σ (t) than would a randomized 240 surrogate data set with the same spatial and temporal autocorrelation properties.

241
The requirement that d data,σ 2 is smaller than a fraction 1 − α of the d j,σ 2 at timescale σ is 242 equivalent to the requirement that the spatial coherence at σ between the right side of (13) and w is greater when the right side of (13) is fitted using the actual w n,σ (t) more than would a randomized, surrogate predictor. To see this, we write the squares of the 247 spatial coherences to be compared as where (18) and (22) follow because v data ⊥ d data and v j ⊥ d j . Thus the spatial coherence (15) is larger than the spatial coherence (19) for exactly the same j for which d data 2 is smaller than d j 2 .

251
The procedures described above give one statistical test for each timescale σ. But the results of 252 the tests are not independent across values of σ, particularly for similar values. To determine whether 253 (13) is a significant improvement over (14) for a whole band of timescales σ in aggregate, the rank of 254 d data,σ 2 relative to the surrogate values d j,σ 2 was found for each timescale in the band (rank 1 was the 255 lowest) and the mean rank was computed inside the band. The same procedure applied to each surrogate 256 separately (i.e., ranking each d j,σ 2 against the d j ,σ 2 for all j = j) produced surrogate mean rank 257 values for the timescale band, and the proportion of these mean ranks less than the actual mean rank 258 provided a p-value for the band.

259
Two timescale bands, as opposed to three or more bands, were used for simplicity and because wavelet  including more than one temperature variable, more than one salinity variable, more than one wind 283 variable or more than one cloud cover index were discarded. Of the remaining models, and separately for 284 long and short timescale, we retained only those models for which each variable included in the model 285 significantly improved the fit over not including it (using the significance methods described above).

286
This step ensured the selected model was not merely the best model among poor alternatives, but also 287 explains a significant amount of variation in the response transform w were ranked (again, separately for long and short timescales) via a leave-one-out cross validation criterion.

291
The leave-one-out cross validation was performed as follows. For each model remaining after the above words, when leaving out location n we wrote (using the notation of section Appendix S7) w n =n ,σ (t) 296 as v data + d data and found the optimal (minimal d data,σ 2 ) coefficients β k for n = n . We then found 297 d data,σ 2 at location n using these coefficients. The average of d data,σ 2 over σ in the desired frequency 298 band gave the leave-one-out goodness of fit for this location n for the model. The lower the value the better the fit. We averaged over the 26 locations, left out in turn, to obtain a single index of leave-one-out 300 goodness of fit for each model for the frequency band. The resulting measure was intended to be robust 301 against overfitting by the addition of more variables. The remaining model with the best leave-one-out 302 goodness of fit was selected at high frequencies, and separately at low frequencies.

303
Appendix S10 Checking phase relationships 304 The phases of the model coefficients β k (σ) are supposed to represent the phase lag of effects (or asso-305 ciations) at timescale σ of x    timescale σ, time t, and location n. We tested for this phase correspondence and found it to be good (Fig.   310   S1). Especially at long timescales, a particular phase relationship between predictor and PCI transforms 311 is maintained consistently through time (Fig. S1a, b). PCI and temperature had a consistent phase 312 relationship on long timescales, corresponding to a phase shift of one quarter of a cycle (Fig. S1a,e). The 313 average phase difference between PCI and temperature transforms (average of the green line in Fig. S1e 314 over long timescales) was 0.42π radians at long timescales (π/2 is exactly a quarter cycle). PCI and C.

315
finmarchicus were found to be in a consistent anti-phase relationship (Fig. S1b,f). The average phase 316 difference between PCI and C. finmarchicus transforms (average of the green line in Fig. S1f) was 0.78π 317 radians for long timescales and 0.99π radians for short timescales, both of which were reasonably close 318 to anti-phase (π radians phase difference). The average phase differences between PCI and echinoderm 319 and decapod larvae, respectively, on short timescales were 0.09π and 0.01π radians, both of which are 320 approximately in-phase relationships.

322
For each predictor variable, the spatial coherence with PCI was computed, as was the spatial coherence 323 between PCI and 1000 surrogate versions of the predictor in which time series locations were randomly 324 permuted but time series were otherwise unmanipulated. For each timescale in the relevant band (long 325 timescales for growing season temperature and C. finmarchicus abundance, short timescales for C. fin-326 marchicus abundance and for echinoderm larvae and decapod larvae) the spatial coherence for the un-327 manipulated data was ranked in the distribution of surrogate spatial coherences and ranks were averaged 328 across the timescale band. The same procedure was then applied to each surrogate spatial coherence 329 separately, i.e., surrogate spatial coherences were then ranked in the other surrogate spatial coherences 330 and these values were averaged for each surrogate across the timescale band. The proportion of the sur-331 rogate mean ranks less than the actual mean rank was then computed. On long timescales, mean spatial 332 coherence rank was higher than that of 989 of 1000 such surrogates values for growing season temperature, 333 and 1000 of 1000 for C. finmarchicus. At short timescales, mean spatial coherence rank was higher than The magnitude of this quantity is between 0 and 1. The wavelet phasor mean field is one natural choice  The significance of a wavelet phasor mean field magnitude |r σ (t)| at timescale σ and time t is obtained 361 by comparing to a distribution of the mean field magnitude of N random phasors, where the u n are independent uniformly distributed random variables on the unit interval. For large N , 363 this is a Rayleigh distribution [Strutt, 1902]. For any N a distribution of possible magnitudes for the sum 364 can quickly numerically generated by summing a large number of sets of N random unit phasors.

365
If phase synchrony occurs for σ and t, so that the unit phasors w n,σ (t) have similar phase for all or 366 most n = 1, . . . , N , then |r σ (t)| will tend to be larger than the vast majority of the random quantities 367 of (24). The test applies for one pair of σ and t values at a time, so one must be cognizant of multiple 368 testing errors when examining ranges of σ and t. The approach used here is the same as that used in field, but now with the w n,σ (t) representing power-normalized transforms (Appendix S5).

375
In addition to the reasoning that the magnitude of the the wavelet mean field will tend to be bigger when fluctuations at time t and timescale σ have similar phase at all sampling locations, and are therefore synchronized, the wavelet mean field is also a natural choice for a time-and timescale-specific measure of strength of synchrony because of its mathematical properties. The mean squared magnitude for all σ if and only if the time series x n (t) (t = 1, . . . , T ) are identical (lemma 6). Also, this quantity is the power of the average time series divided by the average of the powers of all the time series (lemma 6). If time series are unsynchronized, power in the average time series will be reduced, as unsynchronized fluctuations will tend to cancel. In contrast, synchronized fluctuations will reinforce each other and contribute power to the average. In this way 1 T T t=1 |r σ (t)| 2 represents synchrony for each timescale, σ. It is a generalization of var 1 , a familiar quantity interpretable as synchrony because it represents the extent to which fluctuations in   n,σ (t)) then we make use of the model Here, dependencies on σ, n and t 386 are initially made explicit for clarity. By the reasoning of Appendix S7, we can write This is the right side of the best-fitting model (Eq.2) from the main text. 388 We let h n,σ (t) = vn,σ(t) vn,σ(t) and Π n,σ (t) is power normalized, σ , which is a nonnegative real number, is less than or equal to 1. The larger Π (0h) σ is, the closer 391 h n,σ (t) comes to v n,σ (t), and the closer v n,σ (t) comes to w = v n,σ (t) (33) Π (0h) σ equals the spatial coherence of w (0) n,σ (t) and h n,σ (t), which explains the choice of notation Π (0h) σ (see 393 Appendix S6). By orthogonality of v n,σ (t) and d n,σ (t) (i.e., v n,σ (t), d n,σ (t) = 0), h n,σ (t), d n,σ (t) = 0, 394 as well.

395
Theorem 1. A Moran theorem for wavelet linear models Using the notation defined above, and if 396 r (0) n,σ (t) is the wavelet mean field of the x If populations x In other words, synchrony of x Thus |Π   Proof. We omit subscripts σ and time arguments t, sums over n and m are understood to be from 1 to 409 N , and sums over t are understood to be from 1 to T .

414
The assumption of non-interacting populations not affected by the environments of neighboring sites 415 was interpreted to mean (36) is negligible. This is reasonable because for such populations, the magnitude that the cross terms of (36) were negligible for our best short-and long-timescale models, finding that 419 they were (Appendix S15).

420
Appendix S15 Predicted synchrony, fractions of synchrony explained, 421 and cross terms 422 Using the reasoning behind theorem 1 of Appendix S14, and continuing the notation from that section, 423 we can consider |Π

436
We tested the assumption that the cross terms of (36) were negligible for our best short-and long-437 timescale models of PCI as follows. The quantity was averaged over long timescales for our best long-timescale model of PCI. The result was 11%. The The quantity q is also, as we now show, the fraction of the total power in the mean signal 1 N n x Here we use the definitions of the power normalized transform and the mean field. Thus the power of and the fraction of the total power of the mean explained by synchrony in the x (k)

458
Appendix S16 Testing for significance of interaction effects 459 We tested whether the synchrony attributable to our model included a substantial effect due to interac-          teractions among predictors 474 The interaction terms in the following theorem indicate synchrony is not only the 'sum of its parts'.

475
Theorem 2. Synchrony attribution theorem Using the notation of the previous two sections, the 476 portion of synchrony explained by a wavelet model can be partitioned as The kth summand in the first term on the right is the timescale-specific portion of synchrony attributable to 478 the kth predictor. The second term on the right is the timescale-specific portion of synchrony attributable 479 to interaction effects. The second term is real-valued and can be positive or negative.

480
Proof. The following equations omit explicit dependencies on σ and t for brevity.
The remaining statements of the theorem follow straightforwardly.

482
Applying the synchrony attribution theorem, we can partition the fraction of synchrony explained 483 (47) as q(σ) = k q k (σ) + q int (σ), where in biological dependencies from before to after the shift, so that our results represent an average of 494 distinct phenomena before and after the regime shift instead of being a reflection of one consistent set of 495 relationships between variables. This possibility is difficult to assess because time series are too short to 496 reliably perform wavelet analyses separately for data before and after the shift.

497
Appendix S18 Tests of methods using the numeric example 498 Treating the simulated time series of the numeric example (α(n, t) and β(n, t) from Fig.1a,d and γ (3) (n, t)) 499 in the same way as real ecological data, we evaluated our wavelet methods by checking they gave results 500 consistent with the construction of those data. 501 We constructed and tested a wavelet model of γ (3) (n, t) as described in Appendix S9. Working back 502 from the time series only, we constructed the best fit model of the wavelet transform of γ (3) (n, t) in terms 503 of the wavelet transforms of its two drivers α(n, t) and β(n, t), as in equation 2. We applied the method 504 of spatially synchronous surrogates (Appendix S8) to determine that adding the second variable to a 505 one-variable model incorporating only either α(n, t) or β(n, t) would produce a statistically significant 506 improvement (p = 0.000 for 1000 surrogates in both cases) in goodness of fit. 507 We verified that the synchrony explicable by the best fit model (according to the wavelet Moran 508 theorem, see Appendix S14 and Appendix S15) is consistent with the actual synchrony of γ(3)(n, t), as 509 expected since the local noise added to γ (3) was asynchronous. At the 20 year timescale, the expected 510 synchrony according to the best fit model was 1.022 times the synchrony observed in the wavelet mean 511 field of γ(n, t), i.e., all of the synchrony in γ (3) was attributable to α(n, t) and β(n, t), as expected. In the 512 analysis of real PCI data, the synchrony associated with the statistical model we constructed was only 513 some fraction of the observed synchrony, due to the other, unattributed components of variability having 514 non-zero spatial synchrony of their own. 515 We evaluated the size of the interaction effects between α(n, t) and β(n, t) as in Appendix S16: 1) 516 we evaluated the mean squared synchrony of the model (evaluated at the 20-year timescale); and we 517 compared it to 2) the same quantity calculated after replacing α(n, t) by synchrony-preserving surrogates 518 (we used 1000 surrogates and took the average). The former quantity was 1.24 times the latter, showing 519 substantial interaction effects were detected.