Reconstructing Source-Sink Dynamics in a Population with a Pelagic Dispersal Phase

For many organisms, the reconstruction of source-sink dynamics is hampered by limited knowledge of the spatial assemblage of either the source or sink components or lack of information on the strength of the linkage for any source-sink pair. In the case of marine species with a pelagic dispersal phase, these problems may be mitigated through the use of particle drift simulations based on an ocean circulation model. However, when simulated particle trajectories do not intersect sampling sites, the corroboration of model drift simulations with field data is hampered. Here, we apply a new statistical approach for reconstructing source-sink dynamics that overcomes the aforementioned problems. Our research is motivated by the need for understanding observed changes in jellyfish distributions in the eastern Bering Sea since 1990. By contrasting the source-sink dynamics reconstructed with data from the pre-1990 period with that from the post-1990 period, it appears that changes in jellyfish distribution resulted from the combined effects of higher jellyfish productivity and longer dispersal of jellyfish resulting from a shift in the ocean circulation starting in 1991. A sensitivity analysis suggests that the source-sink reconstruction is robust to typical systematic and random errors in the ocean circulation model driving the particle drift simulations. The jellyfish analysis illustrates that new insights can be gained by studying structural changes in source-sink dynamics. The proposed approach is applicable for the spatial source-sink reconstruction of other species and even abiotic processes, such as sediment transport.


Bootstrap
Define the set of centered residuals {ě i =ê i − 1 T q T q j=1ê j ; i = 1, ..., T q}. We randomly select T q error terms with replacement, e + = (e + 1 , ..., e + T q ), from the set of centered residuals and formulate the bootstrap sample y + =ŷ + e + . The bootstrap estimates of the parameters are then found by fitting the bootstrap data using the proposed penalized estimation method. This process is repeated a number of times (400 times in the numerical work herein) to obtain the bootstrap distributions of the parameters of interest.

Model Fitting and Diagnostics
The optimization of the objective function is conducted by an iterative algorithm which alternately updates the parameters source by source until convergence, as follows. Suppose we want to update all the parameters related to the hth source. Let y (h) . Upon fixing all parameters except those of the hth source, the optimization problem becomes minimizing . . , p, j = 1, . . . , q. This much simpler problem can be efficiently solved by an alternating lasso method; see [1] for details. We use AIC to determine the optimal value of λ h . Denote the optimizer as (d j,h ), with the tuning parameter being λ h . Define Following [2], the degrees of freedom is given by where I(·) is the indicator function. We compute the solutions over a grid of 100 equally spaced λ h values on the log scale, between λ min = 0 and λ max , the smallest λ h value at which all coefficients become zero [1]. The best λ h is selected as the one yielding the smallest AIC. Following [2], the selection of the K regularization parameters is nested within the iterative algorithm, in order to avoid the computationally expensive high-dimensional grid search. For the fitted pre-1990 model, the optimal tuning parameters are 0.010, 0.018, 0.003 and 0.013 (×10 −5 ), for Alaska Peninsula, Bristol Bay, Pribilof Islands and St. Matthew Island, respectively; their post-1990 model counterparts are 0.015, 0.041, 0.001 and 0.001 (×10 −5 ). While these λ values may seem small in magnitude, the degree of penalization is also determined by the adaptive weights w i,j,k = (d kũi,kṽj,k ) −2 where (d k ,ũ i,k ,ṽ j,k ) are the least squares estimators [3]. Besides, the enforced nonnegativity constraints also promote sparsity in the estimates of the u's and v's, thereby lessening the penalty needed for recovering certain sparsity pattern.
The variability in the observed jellyfish CPUE data y j,t is substantial, which reflects the complexity and inhomogeneity of the spatial/temporal dynamics of the distribution of jellyfish. Most of the CPUE observations in the data are small, with the 50th and 75th percentiles being 0.57 and 2.03, respectively; a few observations are quite large, with the 95th percentile and the maximum being 7.26 and 22.9, respectively, indicating occasional large fluctuations in the jellyfish biomass. To alleviate the influence/domination of the few large observations, the natural log-transformed CPUE, log(y j,t + 1), is found to fit the proposed source-sink model well, i.e. log(y j,t Since log(y j,t + 1) ≈ y j,t when y j,t is small, this transformation has little effect on the majority of the observations in the data. In fact, the linearity of the model remains to hold approximately. To see this, let f j be the expected value of f j,t . By Taylor expansion, y j,t ≈ (f j,t − f j + 1) exp (e j,t + f j ) when f j,t − f j is small, which means that y j,t is approximately linearly related to f j,t . Therefore, the model interpretations in the source-sink reconstruction with the log-transformed data remain valid. For future research, it would be worthwhile to consider a more complex model with the original data, by incorporating a thick-tail distribution that accounts for the occasional large observations.
We have evaluated the goodness of fit of the fitted source-sink reconstruction model, by checking whether the errors satisfy the independent and identical distribution (i.i.d.) assumption. The residuals from the pre-1990 and post-1990 periods are combined and standardized region by region. The autocorrelation patterns of the residuals from each region have been checked, and no significant autocorrelation is found up to lag 12 for each residual series. The cross correlation patterns between each pair of prewhitened regional residual series (28 pairs) have also been examined. There are only 4 significant cross correlations in all pairs of residuals up to lag 6 (13 cross correlations per each pair), i.e. the number of significant lags is about 1%. Hence, the standardized residuals appear to satisfy the i.i.d. assumption.
We have also checked the normality assumption with the residuals. Fig. S6 displays the Normal Q-Q plots for the combined residuals, and no clear departure form normality can be seen. Based on the Shapiro-Wilk normality test, the normality of each set of residuals are not rejected at the 5% significance level. We note that our proposed method does not require the error term to be normally distributed. Our model appears to fit the data well, and except for a few possible outliers, the residuals do not show any discernible pattern.
We use a permutation approach to further assess the model validity, especially to assess the significance of the structural change in the source-sink dynamics starting in 1990. Under the null hypothesis of no change in the source-sink dynamics, we can shuffle the datacases by permuting the year. For each permutated data, we refit the model to the two separate periods (pre-and post-1990) and compute a test statistic defined as the sum of the AIC of the model using data from the "pre-1990" period and the AIC of that from the "post-1990" period. This procedure is repeated 500 times, and a reference distribution is built for the test statistic. The observed test statistic using the actual data is 4.56, and it is smaller than all the 500 values based on permuted data, i.e., the estimated p-value of no change, computed as the fraction of the 500 values based on permutated datasets being less than the observed value 4.56 based on true data, is < 0.002. This result provides strong evidence that a change occurred in the source-sink dynamics starting 1990.
Altogether, we can conclude from these model diagnostics that our fitted model is correctly specified and provides a good fit to the data.