## Abstract

The potential effects of conservation actions on threatened species can be predicted using ensemble ecosystem models by forecasting populations with and without intervention. These model ensembles commonly assume stable coexistence of species in the absence of available data. However, existing ensemble-generation methods become computationally inefficient as the size of the ecosystem network increases, preventing larger networks from being studied. We present a novel sequential Monte Carlo sampling approach for ensemble generation that is orders of magnitude faster than existing approaches. We demonstrate that the methods produce equivalent parameter inferences, model predictions, and tightly constrained parameter combinations using a novel sensitivity analysis method. For one case study, we demonstrate a speed-up from 108 days to 6 hours, while maintaining equivalent ensembles. Additionally, we demonstrate how to identify the parameter combinations that strongly drive feasibility and stability, drawing ecological insight from the ensembles. Now, for the first time, larger and more realistic networks can be practically simulated and analysed.

## Author summary

Mathematical models can be used to predict the potential effects of human actions on an ecosystem. Even without data, information from food webs and ecological theory has been used to build ecosystem models; but the current methods for generating them are slow, making analysis practical only for small, simple food webs. We used a statistical method from the field of approximate Bayesian inference to speed up the process of model generation, so that we can study larger and more complex food webs. Using ecosystem case studies and randomly generated food webs, we show that our method can produce equivalent models and predictions in a fraction of the time. When tested on a large reef food web, the existing method was not fast enough to generate models within a reasonable time, but this is now possible with our new method. Hence, we can now analyse the large and complex ecosystems that exist in nature without needing to simplify our knowledge to save computation time.

**Citation: **Vollert SA, Drovandi C, Adams MP (2024) Unlocking ensemble ecosystem modelling for large and complex networks. PLoS Comput Biol 20(3): e1011976. https://doi.org/10.1371/journal.pcbi.1011976

**Editor: **John Martin Anderies, Arizona State University, UNITED STATES

**Received: **July 24, 2023; **Accepted: **March 7, 2024; **Published: ** March 14, 2024

**Copyright: ** © 2024 Vollert et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Data Availability: **The code used for this analysis was implemented in MATLAB (R2022b) and is freely available for download on Figshare at https://doi.org/10.6084/m9.figshare.23707119.v2. The code (495MB) associated with this manuscript contains an ‘EEM Methods’ folder for using the methods and a separate ‘Results Replication’ folder for reproducing the results and figures.

**Funding: **SAV is supported by a Queensland University of Technology Centre for Data Science, Australia Scholarship. CD is supported by an Australian Research Council Future Fellowship (FT210100260). MPA and SAV acknowledge funding support from an Australian Research Council Discovery Early Career Researcher Award (DE200100683). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

**Competing interests: ** The authors have declared that no competing interests exist.

## Introduction

Conservation actions aim to help preserve the populations of threatened species, and more generally maintain the health of an ecosystem. However, it can be challenging to foresee the effects of an intervention across the whole ecosystem, leaving the potential for unintended consequences [1–4], such as a shift in predation to increased consumption of a species of interest (e.g., Roemer et al., 2002 [5]). Quantitative models can provide critical insights for ecosystem management by forecasting species populations into the future, or in response to both anthropogenic and natural perturbations [6–8]. However, parameterising these models is challenging.

There is typically limited information about the model parameters prior to any analysis [9] due to the difficulty, speed, cost, and uncertainty of expert elicitation and field experiments [10–12]. Consequently, estimates of model parameters necessary to simulate the ecosystem are often poorly constrained and subsequently yield inconclusive forecasts [13].

Since time-series abundance data is often lacking for model calibration [14, 15], parameters can be constrained based on desired *features* of the ecosystem; two common expected features are feasibility (also referred to as coexistence or persistence) and stability [16]. Ensemble ecosystem modelling (EEM)—an extension of qualitative modelling methods [3, 10, 17]—is a method used to generate an ensemble of plausible ecosystem models by randomly sampling parameter values and retaining those that yield feasible and stable ecosystems [18]. Many studies have used similar methods to simulate ecosystem properties such as these and investigate relationships between network structures, interaction strengths, and ecosystem properties [16, 19–22]. While studies investigating ecological theory could benefit from new parameterisation regimes, we focus on EEM because of its suitability in conservation planning under limited information. In practice, EEM has been used to assess the indirect consequences of species reintroductions [18, 23, 24], invasive species management [25], habitat restoration [26], population controls such as baiting [26], and assisted migration [27].

Predictions from EEM can inform conservation decisions in the all-too-common situation of limited data availability; however, the process of parameterising the ensemble becomes increasingly computationally intensive as the size of the ecosystem network increases. There can be a very low probability of randomly sampling feasible and stable systems [28]; for example, Peterson and Bode [27] reported that fewer than 1 in 1,000,000 parameter sets were both feasible and stable for an ecosystem of 15 species. These constraints are even less likely to be satisfied for larger and more complex networks [19, 29].

Due to the low probability of generating ecosystem models in which all species stably coexist, much theoretical literature, starting with the classic work of May [16, 19, 20, 29], suggests it is unlikely for complex ecosystems to exist in nature, whereas others have recently proposed explanations for why they do exist—such as natural selection [30, 31]. In order to explore these ecological theories and to build decision-making tools, it is beneficial to model feasible and stable ecosystems—especially in the absence of time-series data. Yet in practice, this becomes computationally impractical via random sampling as the food web increases in size [27].

In this paper, we exploit established efficient parameterisation methods within Bayesian statistics to present and demonstrate a new method for efficiently generating an ensemble of parameter sets that define feasible and stable ecosystem models, inspired by sequential Monte Carlo approximate Bayesian computation (SMC-ABC) [32, 33]. Promisingly, when this new method (hereafter *SMC-EEM*) is compared to the original method proposed by Baker *et al*. [18] (hereafter *standard-EEM*), the computational efficiency is increased by several orders of magnitude for larger systems, whilst retaining similar predictions. We demonstrate that SMC-EEM yields ensembles of ecosystem networks consistent with those of the standard-EEM method using two common comparisons (parameter inferences and model forecasts) as well as an analysis of model sloppiness [34], a novel model analysis tool [35] that has only recently been applied for comparison of model ensembles [36]. Additionally, we demonstrate how this analysis of sloppiness could identify the key parameter combinations driving feasibility and stability, drawing ecological insight from the obtained ensembles. Therefore, the methods presented here unlock the capabilities of ensemble ecosystem models for representing, and forecasting for, the complex ecosystem networks that exist in nature.

## Methods

### Ecosystem network modelling

An ecological community of interacting organisms and their physical environment can be represented as an ecosystem network or food web [37]. Ecosystem networks represent the interactions between individual species or groups of species (often referred to as nodes), characterising relationships such as predator-prey, host-parasite, competitive or mutualist [37, 38]. An interaction matrix is used to characterise positive and negative interactions between species that represent a beneficial or detrimental effect on the abundance of the affected species [9]. By characterising the direct effects of one population on another, the indirect effects that propagate through an ecosystem can be understood and modelled [39]. These interaction networks have been analysed both qualitatively [10, 40–42] and quantitatively [6, 9, 13, 18, 43] in order to forecast ecosystem population trajectories and predict responses to disturbances.

Ecosystems can be quantitatively modelled in many ways—such as non-parametric methods [44], empirical dynamic modelling [45, 46] or stochastic autoregressive models [43] (see [12] for an overview). Here, we focus on the common quantitative approach of using the generalised Lotka-Volterra equations for forecasting change in ecosystem node abundances over time [6, 9, 47],
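$$\frac{\mathrm{d}n_i}{\mathrm{d}t} = \left[ r_i + \sum_{j=1}^{N} \alpha_{i,j}\, n_j(t) \right] n_i(t), \tag{1}$$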
where *n*_{i}(*t*) is the abundance of the *i*th ecosystem node at time *t*, *r*_{i} is the growth rate of the *i*th ecosystem node, *N* is the number of ecosystem nodes being modelled, and *α*_{i,j} is the per-capita interaction strength characterising the effect of node *j* on node *i*.

If there is no known effect of species *j* on species *i*, the parameter *α*_{i,j} = 0. Otherwise, the relationship between species can be prescribed via the sign of the interaction strength parameters. For example, a mutualist relationship would require that both *α*_{i,j} and *α*_{j,i} are positive. Hence, connecting these Lotka-Volterra equations to an ecosystem network provides ecosystem-specific information about the interaction strength parameters *α*_{i,j} in the model. In this work, we limit consideration to identifying suitable parameter values for a known model structure, rather than identifying appropriate model structures or networks.

The system represented in Eq (1) can be equivalently expressed in a vector form as
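$$\frac{\mathrm{d}\mathbf{n}}{\mathrm{d}t} = \left[ \mathbf{r} + \mathbf{A}\mathbf{n} \right] \circ \mathbf{n}, \tag{2}$$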
where **n** = {*n*_{i} : *i* = 1, …, *N*} is the vector of species abundances, **r** = {*r*_{i} : *i* = 1, …, *N*} is the vector of species growth rates, **A** = {*α*_{i,j}: *i*, *j* = 1, …, *N*} is the *N* × *N* interaction matrix of per-capita interaction strengths between ecosystem nodes, and ∘ is the Hadamard or element-wise product.
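As a concrete illustration of the vector form in Eq (2), the following minimal Python sketch evaluates the right-hand side of the generalised Lotka-Volterra system and integrates it with a fixed-step Runge-Kutta scheme. The function names, the two-node parameter values and the choice of integrator are illustrative assumptions only; they are not taken from the MATLAB code associated with this work.

```python
import numpy as np

def glv_rhs(n, r, A):
    """Right-hand side of the generalised Lotka-Volterra model: (r + A n) * n,
    where * is the element-wise (Hadamard) product."""
    return (r + A @ n) * n

def simulate(n0, r, A, t_end=50.0, dt=0.01):
    """Integrate the model with a classical fixed-step RK4 scheme.
    (The integrator is an illustrative choice, not the authors' solver.)"""
    n = np.asarray(n0, dtype=float)
    for _ in range(int(round(t_end / dt))):
        k1 = glv_rhs(n, r, A)
        k2 = glv_rhs(n + 0.5 * dt * k1, r, A)
        k3 = glv_rhs(n + 0.5 * dt * k2, r, A)
        k4 = glv_rhs(n + dt * k3, r, A)
        n = n + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return n

# Hypothetical two-node predator-prey network (values chosen for illustration).
r = np.array([1.0, -0.2])            # growth rates
A = np.array([[-1.0, -0.5],          # prey: self-regulation, loss to predator
              [ 0.5, -0.1]])         # predator: gain from prey, self-regulation
n_final = simulate([0.5, 0.5], r, A)
```

For this hypothetical network, the simulated abundances can be compared against the equilibrium obtained by solving **r** + **A** **n** = **0** directly.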

### Feasibility and stability constraints

The EEM method generates an ensemble of plausible parameter sets for the generalised Lotka-Volterra model where there is limited data. To do this, it uses two constraints on the behaviour of the whole ecosystem: feasibility and stability [18].

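Since there cannot be negative populations, a *feasible* ecosystem is one in which equilibrium populations of all species are positive [21]. This feasibility condition is met if *n*_{i}^{*} > 0 for all *i*, where *n*_{i}^{*} is the equilibrium population abundance for node *i*, which is the solution to
$$\frac{\mathrm{d}n_i}{\mathrm{d}t} = \left[ r_i + \sum_{j=1}^{N} \alpha_{i,j}\, n_j^{*} \right] n_i^{*} = 0. \tag{3}$$
Following Eq (2), this condition can be rewritten conveniently as
$$\mathbf{n}^{*} = -\mathbf{A}^{-1}\mathbf{r} > \mathbf{0}, \tag{4}$$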
where **n*** is the vector of equilibrium population abundances for all species.
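The feasibility check can be sketched in a few lines of Python. This is a minimal illustration that assumes the non-trivial equilibrium solves **r** + **A** **n*** = **0**, as implied by Eq (2); the function names and example values are hypothetical.

```python
import numpy as np

def equilibrium(r, A):
    """Non-trivial equilibrium of the model: solve r + A n* = 0 for n*."""
    return -np.linalg.solve(A, r)

def is_feasible(r, A):
    """True if every equilibrium abundance is strictly positive."""
    try:
        n_star = equilibrium(r, A)
    except np.linalg.LinAlgError:
        return False                 # singular A: no unique equilibrium
    return bool(np.all(n_star > 0))

# Hypothetical two-node predator-prey network.
A = np.array([[-1.0, -0.5],
              [ 0.5, -0.1]])
feasible_case = is_feasible(np.array([1.0, -0.2]), A)
infeasible_case = is_feasible(np.array([1.0, -1.0]), A)
```

Solving the linear system directly avoids forming the inverse of **A** explicitly, which is generally preferable numerically.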

A *stable* ecosystem is one which can recover after small perturbations of species abundances away from equilibrium [22]. Specifically, local asymptotic stability (Lyapunov stability) requires that the dynamic system returns to the vicinity of the equilibrium point following a perturbation [21]. To determine if the stability constraint is met, the Jacobian matrix *J* must be evaluated at equilibrium **n***, such that
$$J_{i,j} = \left. \frac{\partial f_i}{\partial n_j} \right|_{\mathbf{n} = \mathbf{n}^{*}} \tag{5}$$
is the (*i*, *j*)th element of the Jacobian matrix *J*, and *f*_{i} is the change in abundance for the *i*th node represented by Eq (1). Eq (5) indicates that the elements of this Jacobian matrix approximate the effect of species *j* on species *i* when the system is close to equilibrium [22]. The dynamic system is considered locally asymptotically stable if the real parts of all eigenvalues (λ_{i}) of the Jacobian matrix *J* are negative, i.e. Re(λ_{i}) < 0 for all *i* = 1, …, *N*. For the generalised Lotka-Volterra equations, the elements of the Jacobian matrix evaluated at equilibrium can be calculated as
$$J_{i,j} = \alpha_{i,j}\, n_i^{*}. \tag{6}$$
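The stability check can be sketched similarly. The snippet below is an illustrative Python example that assumes the equilibrium Jacobian has elements *α*_{i,j} *n*_{i}^{*}, which follows from differentiating Eq (1) and noting that the bracketed term vanishes at a non-trivial equilibrium. All names and parameter values are hypothetical.

```python
import numpy as np

def jacobian_at_equilibrium(r, A):
    """Jacobian of the generalised Lotka-Volterra model at n* = -A^{-1} r.
    Row i of A is scaled by n*_i, so J[i, j] = A[i, j] * n_star[i]."""
    n_star = -np.linalg.solve(A, r)
    return n_star[:, None] * A

def is_stable(r, A):
    """True if all eigenvalues of the equilibrium Jacobian have negative real part."""
    eigenvalues = np.linalg.eigvals(jacobian_at_equilibrium(r, A))
    return bool(np.all(eigenvalues.real < 0))

# Hypothetical examples: a self-regulating predator-prey pair, and a system
# in which the second node has positive self-interaction.
stable_case = is_stable(np.array([1.0, -0.2]),
                        np.array([[-1.0, -0.5], [0.5, -0.1]]))
unstable_case = is_stable(np.array([1.0, -1.0]),
                          np.array([[-1.0, 0.0], [0.0, 1.0]]))
```

Note that the second example is feasible (both equilibrium abundances equal one) but unstable, which illustrates that the two constraints are checked separately.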

### Ensemble ecosystem modelling

Ensemble ecosystem modelling (EEM) aims to produce an ensemble of parameter sets that yield feasible and stable ecosystems for a given ecosystem network structure. The standard approach to EEM, introduced by Baker *et al*. [18], is to randomly search a pre-defined parameter space for possible intrinsic growth rate parameters *r*_{i} and interaction strengths *α*_{i,j} that together yield a feasible and stable ecosystem. Specifically, the model parameters ***θ*** ≡ {*α*_{i,j}, *r*_{i}}_{i,j=1,…,N} are first sampled from a pre-specified probability distribution which characterises any prior beliefs about the parameter values; this is the prior distribution *π*(***θ***). Next, any sampled parameter sets ***θ*** which lead to feasible and stable ecosystems are added to the ensemble of plausible models, creating an ensemble of parameter sets from the target distribution *π*(***θ***|***s***) that have the desired system features ***s***. Throughout this manuscript, we refer to this random sampling process for generating an ensemble of feasible and stable ecosystems (described in Algorithm 1) as the *standard*-EEM method. After solving each system of Lotka-Volterra equations, the forecasts are combined to produce an ensemble that can simulate the multitude of potential effects of conservation actions on each of the species within the ecosystem [18, 23–27]. A summary of the EEM process is depicted in Fig 1.

**Algorithm 1**: The standard-EEM algorithm proposed by Baker *et al*. [18].

**while** *the ensemble is not sufficiently large* **do**

Propose parameter values using any prior beliefs, ***θ***^{*} ∼ *π*(***θ***)

**if** *the model using* ***θ***^{*} *meets the feasibility and stability constraints* **then**

Save parameter values ***θ***^{*} to the ensemble

Forecast using the ensemble of ecosystem models
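The accept-reject loop of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' MATLAB implementation: the uniform priors on (0, 1), the three-node food chain, the function names and the `max_draws` safeguard are all assumptions made for the example.

```python
import numpy as np

def sample_prior(signs, rng):
    """Draw growth rates and interaction strengths from an ASSUMED prior:
    magnitudes uniform on (0, 1), with signs fixed by the network structure."""
    N = signs.shape[0]
    r = rng.uniform(0.0, 1.0, size=N)
    A = signs * rng.uniform(0.0, 1.0, size=(N, N))
    return r, A

def feasible_and_stable(r, A):
    """Check positive equilibrium abundances and negative Jacobian eigenvalues."""
    try:
        n_star = -np.linalg.solve(A, r)
    except np.linalg.LinAlgError:
        return False
    if not np.all(n_star > 0):
        return False
    J = n_star[:, None] * A                      # equilibrium Jacobian
    return bool(np.all(np.linalg.eigvals(J).real < 0))

def standard_eem(signs, ensemble_size, rng, max_draws=100_000):
    """Accept-reject sampling: keep only prior draws meeting both constraints."""
    ensemble, draws = [], 0
    while len(ensemble) < ensemble_size and draws < max_draws:
        draws += 1
        r, A = sample_prior(signs, rng)
        if feasible_and_stable(r, A):
            ensemble.append((r, A))
    return ensemble, draws

# Hypothetical three-node food chain (entry [i, j] is the sign of alpha_ij).
signs = np.array([[-1, -1,  0],
                  [ 1, -1, -1],
                  [ 0,  1, -1]])
ensemble, draws = standard_eem(signs, ensemble_size=20, rng=np.random.default_rng(0))
```

The ratio of `len(ensemble)` to `draws` estimates the acceptance rate, which is the quantity that becomes very small for large or dense networks.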

In the present work, we present the SMC-EEM method and compare it to the standard-EEM method [18]. The inputs and outputs of the EEM process are the same regardless of the parameterisation method (SMC-EEM or standard-EEM) used.

While the standard-EEM method can produce a representative ensemble of feasible and stable ecosystems, in practice it is too computationally intensive to be practical for large or dense ecosystem networks. We show here that the efficiency of EEM can be greatly improved by exploiting efficient sampling methods developed for Bayesian statistics, such as sequential Monte Carlo-approximate Bayesian computation (SMC-ABC). To explain this, we first demonstrate the connection between EEM and approximate Bayesian computation (ABC).

### Approximate Bayesian computation

ABC is a statistical inference technique used to estimate the parameters of complex models by comparing simulated data to observed data [48–51]. The technique involves simulating data from the model using prior information about the model parameters ***θ*** as specified by the prior distribution *π*(***θ***). The simulated data (from the model specified by ***θ***) is then compared to the observed data ***y*** via a summarisation function *S* that reduces the full dataset to a set of summary statistics. A discrepancy function is used to measure the similarity between the simulated and observed datasets [50], and if the simulated data closely matches the observed data, the parameter values are accepted as plausible. The target (posterior) distribution, which is a distribution of the parameters conditional on the available data *π*(***θ***|***y***), can then be approximately sampled using ABC accept-reject [52], or more efficient methods [48] such as Markov chain Monte Carlo ABC (MCMC-ABC) [53, 54] or sequential Monte Carlo ABC (SMC-ABC) [55, 56]. For the interested reader, helpful reviews on approximate Bayesian methods can be found in Beaumont et al., [51], Drovandi [57], or Sisson et al., [49].

### Connections between approximate Bayesian computation and ensemble ecosystem modelling

While similarities have been drawn between ABC and EEM [18, 26], this connection has not been exploited in the literature to our knowledge. Where ABC uses summary statistics to capture key information in the observed data, EEM applications have no abundance data and instead assume that ecosystems are *observably* feasible and stable. While we suggest that EEM is not an ABC approach in the statistical sense, we propose to frame these system features as summary statistics and adopt ABC-based sampling methods. In this way, the output of EEM should instead be considered a constraint-informed prior, rather than a posterior distribution—as feasibility and stability are not directly observed. However, by placing EEM within an ABC framework, the vast literature on efficient sampling methods developed for ABC can be used to efficiently generate an ensemble of plausible ecosystem networks.

There are many different ABC methods available [49], and the simplest is accept-reject ABC. Table 1 reveals that the steps of the standard-EEM method (Algorithm 1) are exactly analogous to the ABC accept-reject method [58]. Through both methods, the model parameters ***θ*** = {*α*_{i,j}, *r*_{i} : *i*, *j* = 1, …, *N*} are calibrated using prior information about the parameters and summaries of the data (feasibility and stability).

In the ABC accept-reject method depicted in Table 1, the aim is to minimise the discrepancy (*ρ*) between the modelled and observed data so that they match as much as possible, such that *ρ* < *ϵ* where the target discrepancy *ϵ* is small. Equivalently, in the standard-EEM method, the aim is for the features of modelled ecosystems to match what is assumed to be true for a real ecosystem of coexisting species—feasibility and stability.

Hence, ABC can be mathematically matched to EEM by introducing a discrepancy function *ρ* that becomes equal to a target discrepancy of zero (*ϵ* = 0) when the modelled ecosystem is feasible and stable. To this end, we define a discrepancy function *ρ*(***θ***) for an ecosystem represented by parameters ***θ*** = {*r*_{i}, *α*_{i,j} : *i*, *j* = 1, …, *N*}, as
$$\rho(\boldsymbol{\theta}) = v_f(\boldsymbol{\theta}) + v_s(\boldsymbol{\theta}), \tag{7}$$
$$v_f(\boldsymbol{\theta}) = \sum_{i=1}^{N} \left| \min\left\{ 0,\, n_i^{*}(\boldsymbol{\theta}) \right\} \right|, \tag{8}$$
$$v_s(\boldsymbol{\theta}) = \sum_{i=1}^{N} \max\left\{ 0,\, \mathrm{Re}\left( \lambda_i(\boldsymbol{\theta}) \right) \right\}, \tag{9}$$
where *v*_{f}(***θ***) is a measure of infeasibility of all ecosystem nodes (the negativity of equilibrium populations *n*_{i}^{*}), and *v*_{s}(***θ***) is a measure of instability of all ecosystem nodes (the positivity of the real parts of the Jacobian eigenvalues λ_{i}). Using the discrepancy function *ρ*(***θ***) defined in Eqs (7)–(9), a feasible and stable ecosystem possesses *ρ*(***θ***) = 0; however, any infeasibility or instability will result in *ρ*(***θ***) > 0.
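A discrepancy of this kind can be sketched in Python as shown below. The functional forms used here (summed negativity of the equilibrium abundances plus summed positivity of the real parts of the Jacobian eigenvalues) are one reading of the verbal description above and should be treated as an assumption; the function name and example values are hypothetical.

```python
import numpy as np

def discrepancy(r, A):
    """Zero for a feasible and stable system, positive otherwise.
    v_f: total negativity of the equilibrium abundances (assumed form).
    v_s: total positivity of the real parts of the Jacobian eigenvalues (assumed form)."""
    n_star = -np.linalg.solve(A, r)              # equilibrium abundances
    v_f = np.sum(np.abs(np.minimum(0.0, n_star)))
    J = n_star[:, None] * A                      # equilibrium Jacobian
    v_s = np.sum(np.maximum(0.0, np.linalg.eigvals(J).real))
    return float(v_f + v_s)

# Hypothetical examples.
rho_ok = discrepancy(np.array([1.0, -0.2]),
                     np.array([[-1.0, -0.5], [0.5, -0.1]]))   # feasible and stable
rho_unstable = discrepancy(np.array([1.0, -1.0]),
                           np.array([[-1.0, 0.0], [0.0, 1.0]]))  # feasible, unstable
```

Because the discrepancy varies continuously with the parameters, it indicates how far a rejected parameter set is from satisfying the constraints, which is the information that accept-reject sampling discards.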

### Sequential Monte Carlo-approximate Bayesian computation

By placing EEM within an ABC framework we can take advantage of advanced ABC sampling methods beyond ABC accept-reject sampling. Within the ABC framework, there is a large suite of methods for sampling from the approximate posterior—such as ABC accept-reject, MCMC-ABC and SMC-ABC [51]—which each present different advantages and disadvantages. In the present work, we used SMC-ABC for sampling because it can be more efficient for applications with a low probability of randomly sampling acceptable parameter values [59] and this is the key computational bottleneck in ecosystem generation for large and complex networks. Hence, in the remainder of this section, we provide a brief overview of SMC-ABC as it pertains to ecosystem generation.

SMC-ABC works by moving an ensemble of parameter sets through a sequence of distributions, ending at the target distribution [55]. Typically starting with an ensemble drawn from the prior distribution *p*_{0}, these parameter sets are manipulated to become representative of the next distribution in the sequence *p*_{1} and this process is repeated until the ensemble is representative of the target distribution *p*_{T}. In SMC-ABC, the sequence of distributions *t* = 0, …, *T* is a sequence of decreasing maximum discrepancies *ϵ*, such that the *t*th distribution is *p*_{t}(***θ*** | *ρ*(***θ***) ≤ *ϵ*_{t}), where *ϵ*_{t} ≤ *ϵ*_{t−1}. This sequence, whether prespecified or adaptively selected within the algorithm, commonly progresses the ensemble from the prior (maximum discrepancy *ϵ*_{0} = ∞) to some target discrepancy (maximum discrepancy *ϵ*_{T}). In this way, SMC-ABC breaks up the sampling problem into a series of simpler problems [60]. Provided that the sequence of distributions is chosen sensibly so that the effective sample size throughout the algorithm is maintained at a reasonable level, the sequence itself does not affect the target distribution, merely the speed at which the target distribution is obtained.

In SMC, a distribution in the sequence is characterised by many independent and weighted parameter sets referred to as ‘particles’. The weight attributed to each particle is determined by both the prior density and the discrepancy of the parameter set. As such, each particle *θ*_{i} contains a proposed value for all model parameters and a weighting, and an ensemble of *M* particles together forms an empirical approximation of the distribution *p*_{t}.

Each distribution in the sequence, *p*_{t}, can be approximated by manipulating the ensemble characterising the previous distribution *p*_{t−1}, using importance sampling and MCMC-ABC techniques [33]. To progress the particles from one distribution to the next, three steps are iteratively applied: reweighting, resampling and moving [48, 56].

**Reweighting**: The prior density and discrepancy for all particles are calculated and used to weight the particles. This ensures parameter sets that create outputs similar to the observations are more highly weighted.

**Resampling**: Particles are resampled according to their weight, such that high-weighted particles are duplicated and low-weighted particles are eliminated. This focuses the particles into areas of the parameter space that can yield low discrepancies.

**Moving**: MCMC-ABC [54] is used to move the particles according to the current distribution in the sequence *p*_{t}(***θ*** | *ρ*(***θ***) ≤ *ϵ*_{t}). This diversifies the ensemble (avoiding duplicates) by jittering each parameter set relative to its current values.

By iterating through these three steps, the cluster of weighted particles can progress through the sequence of distributions to the target distribution. Algorithm 2 shows a summary of an adaptive SMC-ABC method [55], adapted to the EEM context by building on Drovandi and Pettitt’s implementation [33]. Further details of this algorithm are provided in S1 File.

**Algorithm 2**: Overview of the SMC-EEM method (see S1 File for full details)

**INITIALISE**

Generate an ensemble of *M* particles from the prior distribution, *π*(** θ**)

**REWEIGHT**

Evaluate the discrepancy for all particles, ***ρ*** = {*ρ*(*θ*_{i}) : *i* = 1, …, *M*}

Set the discrepancy threshold *ϵ*_{t}

**while** *there are infeasible or unstable models in the ensemble*, max(***ρ***) > 0 **do**

**RESAMPLE**

Replace all particles with a discrepancy greater than the tolerance, *ρ*(*θ*_{i}) ≥ *ϵ*_{t}, by duplicating particles with discrepancies below the tolerance

**MOVE**

**while** *there are many duplicate particles* **do**

**for** *each particle that was replaced* **do**

Propose a new set of parameter values from a proposal distribution

Evaluate the discrepancy and prior density of the proposed parameter values

Accept or reject based on a Metropolis-Hastings ratio

**REWEIGHT**

Lower the discrepancy threshold, *ϵ*_{t}
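To make the structure of Algorithm 2 concrete, the following Python sketch implements a heavily simplified variant. It is not the authors' implementation (see S1 File for that): it assumes a uniform prior on (0, 1) for every parameter magnitude, so that the Metropolis-Hastings ratio reduces to a check that the proposal lies inside the prior support and below the current threshold; it sets each threshold as a fixed quantile of the current discrepancies; it uses an independent Gaussian random-walk proposal; and it applies a fixed number of move steps rather than choosing this adaptively. All function names, the discrepancy form and the three-node example are hypothetical.

```python
import numpy as np

def unpack(theta, signs):
    """Split a flat vector of magnitudes into growth rates r and signed matrix A."""
    N = signs.shape[0]
    return theta[:N], signs * theta[N:].reshape(N, N)

def discrepancy(theta, signs):
    """Assumed discrepancy: infeasibility plus instability (zero when both hold)."""
    r, A = unpack(theta, signs)
    n_star = -np.linalg.solve(A, r)
    J = n_star[:, None] * A
    return (np.abs(np.minimum(0.0, n_star)).sum()
            + np.maximum(0.0, np.linalg.eigvals(J).real).sum())

def smc_eem(signs, M, rng, keep_frac=0.5, n_moves=5, step=0.3, max_iter=200):
    N = signs.shape[0]
    # INITIALISE: draw M particles from the assumed uniform prior.
    theta = rng.uniform(0.0, 1.0, size=(M, N + N * N))
    # REWEIGHT: evaluate the discrepancy of every particle.
    rho = np.array([discrepancy(th, signs) for th in theta])
    n_keep = int(keep_frac * M)
    for _ in range(max_iter):
        if rho.max() <= 0:
            break                                  # all particles feasible and stable
        eps = np.sort(rho)[n_keep - 1]             # adaptive threshold (a quantile)
        keep = np.flatnonzero(rho <= eps)
        drop = np.flatnonzero(rho > eps)
        # RESAMPLE: overwrite discarded particles with copies of retained ones.
        src = rng.choice(keep, size=drop.size)
        theta[drop], rho[drop] = theta[src], rho[src]
        # MOVE: jitter each duplicated particle with a random-walk proposal.
        sd = step * theta[keep].std(axis=0) + 1e-12
        for _ in range(n_moves):
            for i in drop:
                prop = theta[i] + sd * rng.standard_normal(theta.shape[1])
                if np.any(prop <= 0) or np.any(prop >= 1):
                    continue                       # outside prior support: reject
                rho_prop = discrepancy(prop, signs)
                if rho_prop <= eps:                # accept only below the threshold
                    theta[i], rho[i] = prop, rho_prop
    return theta, rho

# Hypothetical three-node food chain.
signs = np.array([[-1, -1,  0],
                  [ 1, -1, -1],
                  [ 0,  1, -1]])
theta, rho = smc_eem(signs, M=200, rng=np.random.default_rng(1))
```

In this sketch the particle weights are implicitly zero or one (above or below the threshold), which is why reweighting reduces to a thresholding step. The `max_iter` argument is only a safeguard against non-termination and is not part of Algorithm 2.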

We can think of the ABC accept-reject method (standard-EEM) as “uninformed”: we reject models that do not fit the constraints, without learning from them. Instead, a more informed sampling method, such as SMC-ABC, utilises information from rejected models. SMC-ABC methods use a sequence of decreasing tolerances, so that parameter values are proposed from an iteratively more “informed” distribution, rather than the prior [60]. As a result, SMC-ABC can perform more efficiently than ABC accept-reject for simulating rare events (when the prior and target distributions are very different) [51]. S1 Video shows a visual comparison of the ABC accept-reject and SMC-ABC methods in two dimensions.

### Analysis of model sloppiness

To compare the ensembles produced by standard-EEM and SMC-EEM, an analysis of model sloppiness can be used. Analysis of model sloppiness is a data-informed sensitivity analysis [61–63] that has recently been shown to provide useful insights for biological and ecological models parameterised using Bayesian inference [34–36]. In the context of ecosystem generation, analysis of model sloppiness can be used to provide a comparison of the model ensembles generated via different Bayesian methods.

Whilst ensembles can (and should) also be compared based on the estimated marginal parameter distributions, this method can be misleading when individual parameter values are unconstrained. Complementarily, analysis of model sloppiness can be used to compare tightly constrained parameter combinations (e.g. products and ratios of parameters) between different ensembles, to indicate their similarity even when individual parameter values are relatively unconstrained [36].

The analysis of model sloppiness uses an eigendecomposition of a parameter-data sensitivity matrix to identify the directions in parameter space, with associated magnitudes, that are most informed by the data [34, 61]. Here the “data” refers to the feasibility and stability constraints. We use the posterior covariance sensitivity matrix—the inverse of the empirical covariance matrix of the logarithmically transformed ensemble [34, 35]—to capture how tightly constrained parameters are after parameterisation. Hence, using this analysis on an ensemble generated via standard-EEM yields the directions in parameter space that are important for obtaining feasible and stable systems.

These important directions can be expressed as parameter combinations [34], known as eigenparameters $\hat{\theta}_j$:
$$\hat{\theta}_j = \prod_{i=1}^{n_p} \theta_i^{\,v_{j,i}}, \tag{10}$$
where $v_{j,i}$ is the *i*th element of $\mathbf{v}_j$, the *j*th eigenvector of the sensitivity matrix, *n*_{p} is the number of model parameters, and *θ*_{i} is the *i*th parameter in the model [35, 62]. Using a logarithmically transformed ensemble allows this eigenparameter to be expressed as a product (as in Eq (10)) rather than a sum, which is common in the literature [34, 35, 61]. Each eigenparameter $\hat{\theta}_j$ has a corresponding eigenvalue λ_{j} that indicates how tightly constrained the parameter combination is, such that the largest eigenvalue (λ_{1}) corresponds to the most sensitive eigenparameter $\hat{\theta}_1$. These parameter combinations (expressed as in Eq (10)) can be directly analysed to identify important mechanisms [34], or visually represented to identify parametric trends [35] that drive the model to match the data (in this case feasibility and stability). For further information about the analysis of model sloppiness method or interpreting eigenparameters, see Monsalve-Bravo et al., 2022 [34] or Vollert et al., 2023 [35]. We applied this analysis to a case study to demonstrate the process of identifying important mechanisms and parameter trends in feasibility and stability constrained ecosystem models (see Case study 3: Great Barrier Reef network).

Additionally, we can use this analysis of sloppiness to compare the similarity of ensembles (standard-EEM and SMC-EEM) across important parameter combinations, comparing the ensembles across many parameters simultaneously. For each eigenparameter *j* (as in Eq (10)), the parameter values ***θ*** can be substituted in to yield a value for the parameter combination, $\hat{\theta}_j$. Repeating this process for all parameter sets in an ensemble therefore yields a distribution of values representing the eigenparameter $\hat{\theta}_j$. Hence, for each important parameter combination $\hat{\theta}_j$, we can produce and compare the distributions created by two different ensembles of parameter sets, assessing the ensemble similarity across the important directions in parameter space [36]. When applied to standard-EEM and SMC-EEM ensembles, this analysis reveals whether the important parameter combinations for feasibility and stability are similar between the two methods, indicating ensemble similarity even if individual parameters are unconstrained. Hence, the analysis of model sloppiness here provides a critical assessment of the similarity of the ensembles produced by the two different methods of ecosystem network generation (standard-EEM and SMC-EEM).
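The computation described above can be sketched in Python as follows. This is an illustrative example only: it assumes the ensemble is stored as an array of strictly positive parameter values (for instance, interaction-strength magnitudes), and the function names and the synthetic two-parameter ensemble are hypothetical.

```python
import numpy as np

def sloppiness(ensemble):
    """Eigendecomposition of the posterior covariance sensitivity matrix.
    ensemble: array of shape (n_samples, n_params), strictly positive values.
    Returns eigenvalues (largest first) and matching eigenvectors as columns."""
    log_theta = np.log(ensemble)
    S = np.linalg.inv(np.cov(log_theta, rowvar=False))   # inverse covariance
    eigvals, eigvecs = np.linalg.eigh(S)                  # S is symmetric
    order = np.argsort(eigvals)[::-1]                     # most constrained first
    return eigvals[order], eigvecs[:, order]

def eigenparameter_values(ensemble, v):
    """Evaluate the product of theta_i ** v_i for every parameter set."""
    return np.exp(np.log(ensemble) @ v)

# Synthetic ensemble in which the product theta_1 * theta_2 is tightly
# constrained although each parameter individually varies widely.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=500)
noise = rng.normal(0.0, 0.01, size=500)
ensemble = np.column_stack([np.exp(x), np.exp(-x + noise)])

eigvals, eigvecs = sloppiness(ensemble)
values = eigenparameter_values(ensemble, eigvecs[:, 0])
```

In this synthetic example, the leading eigenvector weights both parameters almost equally, recovering the product as the tightly constrained combination. To compare two ensembles, `eigenparameter_values` could be applied to each using the same eigenvector and the two resulting distributions compared.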

### Case studies

The standard-EEM and SMC-EEM methods were compared in two ways. Firstly, the two methods were compared generally across many randomly generated ecosystem network structures (referred to as the “simulation study”). Secondly, the methods were compared via three case studies representing natural ecosystems. An ecosystem network representing semiarid Australia—originally used by Baker *et al*. [18] to introduce EEM—was investigated as an example network where standard-EEM is practical for ecosystem generation within a reasonable computation time. A network of Phillip Island, Australia [25] was used to showcase an example where SMC-EEM is much faster than standard-EEM for ensemble generation. Finally, a coral reef food web network proposed for the Great Barrier Reef [64] was investigated as an example of interest where the standard-EEM method is computationally impractical. For the simulation study and the three case studies, the computation times and the resulting ensembles produced by each method were compared.

#### Simulation study.

To generally test the two methods, many ecosystem networks were simulated. Following the practice of May [19] (later replicated by many other studies, e.g., Allesina and Tang [28]), a random matrix theory approach was used, whereby the sign structure of an interaction network was randomly assigned, as follows.

A network of *S* species requires a *S* × *S* interaction matrix. The diagonal elements of the matrix (the effect of a species on itself) are negative so that the species populations are self-regulating. Each off-diagonal element of the matrix was treated independently via a two-step process. Firstly, the interaction was made non-zero with a probability *c*—this connectance parameter specifies the probability of direct interaction between two species [28]. We focused our results on ecosystems generated with a connectance probability of *c* = 0.5; however, we also explored varying this probability to *c* = 0.25 and *c* = 0.75. Secondly, each non-zero element was allocated either a positive or negative interaction with probability *p* = 0.5 (such that there was an equal probability of positive or negative interactions). Network structures consisting of between 3 and 15 species (inclusive) were generated with this approach.
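The two-step construction of a random sign structure can be sketched in Python as follows. The function name and its arguments are hypothetical, and the snippet is intended only to illustrate the procedure described above.

```python
import numpy as np

def random_sign_structure(S, c, rng, p_positive=0.5):
    """Random S x S sign matrix with entries in {-1, 0, +1}.
    Each off-diagonal entry is non-zero with probability c (the connectance),
    and a non-zero entry is positive with probability p_positive.
    Diagonal entries are set to -1 so that every species is self-regulating."""
    present = rng.random((S, S)) < c
    sign = np.where(rng.random((S, S)) < p_positive, 1, -1)
    M = np.where(present, sign, 0)
    np.fill_diagonal(M, -1)
    return M

rng = np.random.default_rng(0)
M = random_sign_structure(6, 0.5, rng)
```

Each off-diagonal element is drawn independently, so the effect of species *j* on species *i* and the reverse effect need not both be present, nor have related signs.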

For each randomly generated network structure of 3–15 species, 1000 feasible and stable parameterisations were found using the ensemble generation methods discussed previously (standard-EEM and SMC-EEM). We aimed to generate and simulate 1000 ecosystems of each size. However, due to the computational burden of the experiment, we were unable to simulate this many large networks; instead, at least 100 ecosystems were simulated for each network size. For each ecosystem network considered in this work (both simulated and natural case studies), the parameterisation used the prior distributions of Baker et al., 2017 [18].

#### Case study 1: Semiarid Australia network.

The two ensemble generation methods (standard-EEM and SMC-EEM) were then applied to an eight-node ecosystem network representing semiarid Australia (see Figure 1b of [18]). This ecosystem network was previously used to introduce the standard-EEM method and to evaluate the plausible consequences of dingo reintroduction to a national park in Australia [18].

Since standard-EEM has been previously applied to this case study it serves as a useful test case where both methods are expected to generate an ensemble within a practical time frame. In this network, interaction matrix elements that do not represent direct effects of species on each other are set to zero and thus do not require sampling; if this were not the case then ecosystem generation for this (eight-node) network would require sampling of 72 parameters (total 64 interaction matrix elements *α*_{i,j} and 8 growth rates *r*_{i}). Instead, this eight-node network has 33 parameters when represented as a generalised Lotka-Volterra model, which is small compared to other ecosystem networks observed in nature that have been quantitatively investigated (e.g. Booderee National Park represented as 20 nodes and 163 parameters [9]).
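The parameter counts quoted for the case studies can be cross-checked with a few lines of arithmetic. The sketch below assumes that each node contributes one growth rate and that connectance is the fraction of non-zero entries among all *S* × *S* interaction matrix elements (diagonal included). This reading is an assumption, not something stated in the text, although it reproduces the connectance values reported in the Results; the helper names are hypothetical.

```python
def n_parameters(n_nodes, n_nonzero_interactions):
    """Non-zero interaction elements plus one growth rate per node."""
    return n_nonzero_interactions + n_nodes

def implied_connectance(n_nodes, n_params):
    """Fraction of the S x S interaction matrix that is non-zero (assumed definition)."""
    return (n_params - n_nodes) / n_nodes**2

# A fully connected eight-node network: 64 interactions + 8 growth rates.
print(n_parameters(8, 64))

# (nodes, parameters) as reported for the three case studies.
reported = {
    "semiarid Australia": (8, 33),
    "Phillip Island": (22, 110),
    "Great Barrier Reef": (16, 118),
}
for name, (nodes, params) in reported.items():
    print(f"{name}: c = {implied_connectance(nodes, params):.2f}")
```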

#### Case study 2: Phillip Island network.

Next, we generated an ensemble of ecosystem models using both standard-EEM and SMC-EEM for a 22-node network that represents Phillip Island, Australia (see Figure 2 of [25]). This network is considerably larger and more complex than the semiarid Australian network (there are 110 parameters to be estimated when represented as a Lotka-Volterra system), such that the SMC-EEM method is expected to generate an ensemble faster than the standard-EEM method.

#### Case study 3: Great Barrier Reef network.

Lastly, we demonstrate the benefits of the SMC-EEM method using a case study where it is impractical to use standard-EEM. Rogers *et al*. [64] produced a conceptual 16-node coral reef food web from the literature which depicts a Great Barrier Reef ecosystem (see Figure 1 of [64]). In addition to being large, this ecosystem network is also densely connected, resulting in an extremely low probability of sampling a feasible and stable model.

## Results

### Simulation study

Our new SMC-EEM method is orders of magnitude faster than the standard-EEM method for larger ecosystems when compared generally across many randomly generated ecosystem network structures (Fig 2). We observe that for smaller ecosystems the standard-EEM method may be more computationally efficient due to the additional computational processes required by the SMC-EEM method. This key result also holds for different connectance probabilities *c* (S1 Fig).

The computation time required to parameterise an ensemble of 1000 feasible and stable ecosystem models using both the standard-EEM and SMC-EEM methods. This figure shows the medians (dots) and 7.5–92.5% quantiles (error bars) of computation times. Note, the computation time for any one ecosystem network was capped at 10^{4} seconds due to the computational burden of the simulation study.

More generally, the computation time of the standard-EEM method scales linearly with the probability of randomly selecting parameter values that are feasible and stable (Fig 3). This probability—known as the acceptance rate—is an emergent property of the model, prior and constraints, and can be estimated as the proportion of tested parameter sets that were accepted using standard-EEM. In our simulation study, the SMC-EEM method was computationally more efficient for ecosystems with an estimated acceptance rate smaller than 0.005 (vertical dashed line in Fig 3), such that less than 1 in 200 proposed systems are feasible and stable. Here, the SMC-EEM method is faster than the standard-EEM method because fewer parameter values need to be trialled (S2 Fig), making the SMC-EEM method more statistically efficient. Though standard-EEM can outperform SMC-EEM at high acceptance rates, both methods were computationally inexpensive in these scenarios. In our simulation study, ensembles of 1000 feasible and stable ecosystems could be generated in less than 12 seconds via either method in networks with an acceptance rate greater than 0.005.
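To make the role of the acceptance rate concrete, the sketch below estimates it by rejection sampling for a small generalised Lotka-Volterra network, in the spirit of standard-EEM. It is an illustration under stated assumptions, not the authors' implementation: the uniform priors on [0, 1), the three-species chain structure, and the function names are all hypothetical. Feasibility is taken as a strictly positive equilibrium and stability as all Jacobian eigenvalues having negative real part.

```python
import numpy as np

def is_feasible_and_stable(r, A):
    """Check a generalised Lotka-Volterra model dn/dt = n * (r + A n)."""
    try:
        n_star = np.linalg.solve(A, -r)   # equilibrium solves r + A n* = 0
    except np.linalg.LinAlgError:
        return False                      # singular A: no unique equilibrium
    if np.any(n_star <= 0):
        return False                      # infeasible: a population is not positive
    jacobian = n_star[:, None] * A        # Jacobian at equilibrium, diag(n*) A
    return bool(np.all(np.linalg.eigvals(jacobian).real < 0))

def rejection_sample(structure, n_models, rng, max_trials=1_000_000):
    """Sample from the prior until n_models parameter sets are accepted."""
    n = structure.shape[0]
    accepted, trials = [], 0
    while len(accepted) < n_models and trials < max_trials:
        trials += 1
        r = rng.uniform(0.0, 1.0, n)                    # assumed growth-rate prior
        A = structure * rng.uniform(0.0, 1.0, (n, n))   # assumed magnitude prior
        if is_feasible_and_stable(r, A):
            accepted.append((r, A))
    return accepted, len(accepted) / trials

# Hypothetical three-species chain: species 2 eats 1, species 3 eats 2.
structure = np.array([[-1, -1, 0], [1, -1, -1], [0, 1, -1]])
ensemble, acceptance_rate = rejection_sample(structure, 100, np.random.default_rng(0))
print(f"estimated acceptance rate: {acceptance_rate:.3f}")
print(f"expected trials for 1000 models: {1000 / acceptance_rate:.0f}")
```

Because each trial is independent, the expected number of trials for an ensemble of *N* models is *N* divided by the acceptance rate; at an acceptance rate of 0.005 this is roughly 200,000 trials for 1000 models.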

The parameterisation computation times of Fig 2 with respect to the *acceptance rate* of the standard-EEM method—an estimation of the probability of randomly sampling a feasible and stable system given a network with a pre-specified structure. Acceptance rates are logarithmically displayed from 100% acceptance (left) to very small percentages (right). Note that the computation time for any one ecosystem network was capped at 10^{4} seconds to maintain practical computations in the simulation study.

Additionally, we find that the ensembles of ecosystem models produced by the standard-EEM and SMC-EEM methods are consistent with each other in their estimated parameter distributions, eigenparameter distributions, and time-series predictions (Fig 4). For example, for a randomly sampled interaction structure (Fig 4a), the SMC-EEM method replicates the outputs of the standard-EEM method in terms of predicted model parameter distributions (blue and red densities in Fig 4b). Additionally, from an analysis of model sloppiness, the stiffest eigenparameters (i.e. parameter combinations corresponding to the largest eigenvalues of the sensitivity matrix, see Eq (10) and surrounding text for more information) also correspond extremely well between the SMC-EEM and standard-EEM methods (blue and red densities in Fig 4c). Finally, time-series forecasts of these ecosystems from a common randomly chosen initial condition are virtually indistinguishable between the methods (blue and red shaded regions in Fig 4d).

Example outputs from a randomly chosen ecosystem simulated in Fig 3 using ensembles obtained from the prior distribution (grey), standard-EEM method (red) and SMC-EEM method (blue). In each case, notice that the standard-EEM method and SMC-EEM method produce consistent results that are significantly different to the prior: **(a)** A six-species ecosystem network generated using *c* = 0.5. This example ecosystem has 27 parameters and a 0.037 probability of randomly selecting feasible and stable parameter values. **(b)** Marginal parameter distributions estimated via both methods and compared to the prior distribution. **(c)** Marginal distributions of the nine stiffest eigenparameters for each ensemble obtained from an analysis of model sloppiness. **(d)** The distribution of equilibrium population abundances predicted for each ensemble. Note that the x-axes have been limited to visualise the distribution peaks; however, the range of equilibrium populations for the prior distribution is very diffuse (and hence barely visible in these plots) compared to the ensemble-predicted distribution abundances. **(e)** Time-series predictions of population abundances for each ensemble of ecosystem models using randomly chosen initial conditions (median population prediction and 95% credible intervals shown).

### Case study 1: Semiarid Australia network

For both SMC-EEM and standard-EEM methods, it took less than a minute to generate a 10,000 model ensemble for the semiarid Australia network, though the standard-EEM method was faster (Table 2). These computation times are consistent with our previously observed relationship between acceptance rate and computation time (Fig 3), as the estimated acceptance rate for this network is 0.11, which is much larger than 0.005. As the acceptance rate for the semiarid Australia network is high, only two SMC-ABC iterations were required to generate the ensemble, making the SMC-EEM method statistically inefficient. For this eight-species ecosystem network, with connectance *c* = 0.39, the standard-EEM method would be the best choice of method, as it is faster and easier to implement.

For this network (Fig 5a), we found that the SMC-EEM method produced consistent estimated distributions of equilibrium abundances to the standard-EEM method (Fig 5b). We also observe similar estimated parameters (S3 Fig), stiff eigenparameters (S4 Fig), and time-series predictions (S5 Fig) for the standard-EEM and SMC-EEM produced ensembles.

Ensemble ecosystem modelling for an ecosystem network representing semiarid Australia parameterised using standard-EEM and SMC-EEM methods. **(a)** The semiarid Australian ecosystem network [18] consisting of eight nodes and 33 parameters when represented as a Lotka-Volterra system. **(b)** Distributions of equilibrium abundances from the prior distribution (grey), standard-EEM (red) and SMC-EEM (blue) ensembles of ecosystem models. Note that the x-axes have been limited to visualise the distribution peaks, however the range of equilibrium populations for the prior distribution is very diffuse (and hence barely visible in these plots) compared to the ensemble-predicted distribution abundances. Here the blue and red densities match almost exactly, demonstrating that the outputs of the standard-EEM and SMC-EEM methods are consistent.

### Case study 2: Phillip Island network

The standard-EEM method required 108 days to generate 100,000 ensemble members for the Phillip Island network; however, SMC-EEM completed this task in under 6 hours (Table 3). (It should be noted that these computational exercises were performed in parallel on 12 cores.) The SMC-EEM method produced the ensemble in 0.22% of the time required by standard-EEM because it required 0.13% of the simulations. This massive computational saving is consistent with the results presented in Fig 3, as the acceptance rate for the Phillip Island network was 1.7 × 10^{−6}. The SMC-EEM method is thus the only practical option, out of the two methods, for this 22-species network.

Additionally, the outputs of SMC-EEM and standard-EEM are consistent. The distributions of equilibrium abundances computed for each parameterised ensemble are consistent (Fig 6b) and both methods produce comparable estimated marginal parameter distributions (S6 Fig), stiff eigenparameter distributions (S7 Fig) and population forecasts (S8 Fig), indicating that the information gained about the parameters is consistent between methods.

Ensemble ecosystem modelling for an ecosystem network representing Phillip Island parameterised using standard-EEM and SMC-EEM. **(a)** The Phillip Island ecosystem network [25] consists of 22 nodes, with connectance *c* = 0.18, and 110 parameters when represented as a Lotka-Volterra system. **(b)** Distributions of equilibrium abundances from the prior distribution (grey), standard-EEM (red) and SMC-EEM (blue) ensembles of ecosystem models. Note that the x-axes have been limited to visualise the distribution peaks, however the range of equilibrium populations for the prior distribution is very diffuse (and hence barely visible in these plots) compared to the ensemble-predicted distribution abundances. Here the blue and red densities match almost exactly, demonstrating that the outputs of the standard-EEM and SMC-EEM methods are consistent.

### Case study 3: Great Barrier Reef network

Parameterising the Great Barrier Reef network [64] for 100,000 ensemble members took 21 hours for the SMC-EEM method (Table 4) and could not be practically computed using the standard-EEM method. Based on a preliminary analysis of 20 ensemble members, it took approximately 40 hours to generate a single ensemble member using standard-EEM (roughly 1 in a billion parameter sets tested were feasible and stable); hence, an ensemble of this size would take years to produce (estimated 450 years). The SMC-EEM method is thus the only practical option, out of the two methods, for this 16-species network.
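The extrapolated standard-EEM run time follows directly from the figures quoted above. The short calculation below reproduces the order of magnitude, assuming the cost per accepted ensemble member stays at about 40 hours, a 365-day year, and the same hardware throughout.

```python
hours_per_member = 40        # approximate cost of one accepted ensemble member
ensemble_size = 100_000      # target ensemble size

total_hours = hours_per_member * ensemble_size
years = total_hours / (24 * 365)
print(f"{total_hours:,} hours is about {years:.0f} years")
```

This gives roughly 457 years, in line with the estimate of about 450 years.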

Since we cannot produce a standard-EEM ensemble, we instead compared the outputs of two independently obtained SMC-EEM ensembles to assess their reproducibility. This comparison indicates whether SMC-EEM can adequately sample the parameter space to produce a representative ensemble. The two independent SMC-EEM ensembles of 100,000 yield consistent results when comparing the predicted equilibrium abundances (Fig 7b). Additionally, the ensembles have comparable estimated marginal parameter distributions (S9 Fig), stiff eigenparameter distributions (S10 Fig), and time-series forecasts (S11 Fig), indicating that the information gained about the parameters is consistent across independent runs. Such a result is very encouraging given that, for this case study, we obtained a representative approximation of a 118-dimensional space with 100,000 parameter sets in each ensemble.

Ensemble ecosystem modelling for an ecosystem network representing the Great Barrier Reef parameterised using standard-EEM and SMC-EEM. **(a)** The Great Barrier Reef ecosystem network [64] consists of 16 nodes, with connectance *c* = 0.4, and 118 parameters when represented as a Lotka-Volterra system. **(b)** Distributions of equilibrium abundances from the prior distribution (grey), and two independent SMC-EEM ensembles (light blue and dark blue) of ecosystem models. Note that the x-axes have been limited to visualise the distribution peaks, however the range of equilibrium populations for the prior distribution is very diffuse (and hence barely visible in these plots) compared to the ensemble-predicted distribution abundances. Here the independent SMC-EEM ensembles are consistent, demonstrating reproducibility.

Once an ensemble is obtained, a data-informed sensitivity analysis—such as the analysis of model sloppiness—can be used to identify the important parameter combinations for achieving feasibility and stability. For this Great Barrier Reef ecosystem network, each of the five most tightly constrained parameter combinations focuses on balancing the positive growth rates of basal species, or the self-regulation of top predators (see S12 and S13 Figs).

For example, using the information in S12 Fig, the first eigenparameter can be expressed as
where *r*_{i} is the positively constrained intrinsic growth rate for species *i*, *α*_{i,j} is the interaction parameter for the effect of species *j* on species *i*, and the relevant species for this equation are represented as *TA* for turf algae, *MA* for macroalgae, *C* for coral, *D* for detritus and *U* for urchins. This eigenparameter describes the balance between the proliferation of turf algae and the negative impacts on its abundance: mainly competition with macroalgae (including the proliferation rate of macroalgae), but also other lower trophic species including detritus, coral and urchins. Similar relationships can be seen for the five most influential parameter combinations (S12 Fig).

This could indicate that, given that growth rate parameters are constrained to be only positive and self-interactions are constrained to be only negative (self-regulating), the most important features for parameterising feasible and stable ecosystems are a high abundance of basal species and limited populations of top predators. This well-observed result, while not a surprising insight, indicates how this analysis could be used to identify key drivers for developing feasible and stable ecosystems.

## Discussion

In this work, we have presented and demonstrated a method that, for the first time, can rapidly generate ensemble ecosystem models for higher dimensional ecosystem networks. This new method, which we call the SMC-EEM method, can generate consistent ensembles to the current gold-standard method—standard-EEM—whilst being orders of magnitude faster for large and densely connected networks. On a Phillip Island case study [25] SMC-EEM reduced the computation time from 108 days to 6 hours, with indistinguishable time-series predictions, estimated distributions of model parameters and model parameter combinations. For a Great Barrier Reef network, we showed that standard-EEM was not capable of producing a large ensemble, such that SMC-EEM was the only practical option. This new method permits large and complex ecosystems—as observed in nature—to be practically simulated and analysed.

### The best ecosystem generation method depends on the properties of the ecosystem network

Both the standard-EEM method and our introduced SMC-EEM method have advantages and disadvantages, depending on the ecosystem being modelled. SMC-EEM is expected to be more computationally efficient for ecosystems comprised of 7 or more species (result obtained for a connectance probability *c* = 0.5 as in Fig 2; see S1 Fig for results with other values of *c*), or if less than 1 in 200 parameter values are feasible and stable when sampled using standard-EEM (acceptance rate of 0.005; Fig 3). While the acceptance rate of an ecosystem network, which encapsulates both the number of species and connectance, is a better predictor of computation time than the number of species in the system (see Figs 2 and 3), the number of species is a much more intuitive measure and does not require prior calculations to estimate, unlike the acceptance rate.

When considering the eight-species semiarid Australian ecosystem network (with *c* = 0.39), based on the number of species it would be unclear beforehand whether SMC-EEM or standard-EEM would be faster (Fig 2). However, by estimating the acceptance rate as 0.11 (roughly 1 in 9 parameter sets tested were feasible and stable), Fig 3 clearly shows standard-EEM is expected to outperform SMC-EEM for this network. Practically, both standard-EEM and SMC-EEM are acceptable choices for this case study as they both generated the model ensemble within a minute; however, we must acknowledge that standard-EEM is a simpler process (making it more straightforward to implement in computer code) and generated the ensemble faster (Table 2).

In contrast, for the 22-species Phillip Island case study (with *c* = 0.18) and an acceptance rate of 1.7 × 10^{−6} (roughly 1 in 600,000 parameter sets tested were feasible and stable), it is clear from both Figs 2 and 3 that SMC-EEM will be significantly faster. When applying standard-EEM to this system, we found it would take 108 days to generate the ensemble (Table 3), making SMC-EEM the only practical option of the two methods.

Lastly, for the 16-species Great Barrier Reef network (with *c* = 0.4), where roughly 1 in a billion parameter sets tested were feasible and stable, SMC-EEM is expected to be orders of magnitude faster according to the trends shown in Figs 2 and 3, and the observed computation times (Table 4) were within the credible ranges indicated by these trends. Here we note that the acceptance rate for this network is considerably smaller than for the Phillip Island network, and this could be due to the network being more densely connected, or to the structure of the network itself [65, 66].

### Comparing the ensembles generated by the two methods

In this work, we used the estimated parameter distributions and time-series predictions to compare ensembles produced using the two methods. Additionally, the distributions of the stiff eigenparameters, obtained using an analysis of model sloppiness, provided an additional diagnostic comparing the similarity of the ensembles. The analysis of model sloppiness can indicate how similar the ensembles are, whilst accounting for parameter interdependencies [34, 36]—a perspective not easily observed via the estimated marginal distributions, quantities of interest, or via time-series predictions. We, therefore, encourage the comparison of Bayesian inference method-generated ensembles via comparison of eigenparameter distributions alongside a comparison of marginal parameter distributions, as this provides a more comprehensive comparison. For the ensembles tested in this work, the eigenparameter distributions did not indicate any substantive differences (Fig 4, S4 and S10 Figs).
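As an illustration of how stiff eigenparameters might be extracted from an ensemble, the sketch below assumes a posterior-covariance form of the sensitivity matrix: the inverse of the empirical covariance of the log-transformed parameter magnitudes. Whether this matches Eq (10) exactly is an assumption, since that equation is not reproduced in this section; the function name and the synthetic ensemble are hypothetical.

```python
import numpy as np

def stiff_eigenparameters(ensemble, n_stiff=3):
    """ensemble: (n_models, n_params) array of non-zero parameter values."""
    log_theta = np.log(np.abs(ensemble))       # work with log-magnitudes
    cov = np.cov(log_theta, rowvar=False)      # empirical covariance of the ensemble
    sensitivity = np.linalg.inv(cov)           # assumed sensitivity matrix
    eigvals, eigvecs = np.linalg.eigh(sensitivity)
    order = np.argsort(eigvals)[::-1]          # stiffest (largest eigenvalue) first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Log of each eigenparameter for every model: a weighted sum of log-parameters.
    log_eigenparams = log_theta @ eigvecs[:, :n_stiff]
    return eigvals[:n_stiff], eigvecs[:, :n_stiff], log_eigenparams

# Synthetic ensemble in which the product theta_1 * theta_2 is tightly constrained.
rng = np.random.default_rng(0)
z, w, eps = rng.normal(size=(3, 5000))
ensemble = np.exp(np.column_stack([z, -z + 0.01 * eps, w]))
values, vectors, log_ep = stiff_eigenparameters(ensemble, n_stiff=1)
print(np.round(np.abs(vectors[:, 0]), 2))
```

In this synthetic case the stiffest direction weights the first two parameters equally and ignores the third, which recovers the constrained product. Comparing the distributions of `log_ep` between two ensembles is then a check that accounts for parameter interdependencies.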

To the best of our knowledge, the SMC-EEM method outputs match those produced by the standard-EEM method (Figs 4–7, and S3–S10 Figs). However, users should be cautious when selecting the ensemble size for SMC-EEM. While standard-EEM always randomly samples from the parameter space to propose new values, SMC-EEM proposes new values relative to current values in the ensemble (via the multivariate Gaussian proposal distribution centred on the current parameter value within the MCMC algorithm). Hence, if there are not enough particles to cover a high-dimensional parameter space, the SMC-EEM method may not sufficiently explore the parameter space, thereby creating an ensemble that is not representative and is different to the distribution of ensembles produced by standard-EEM. This difference in ensembles occurred when using only 10,000 ensemble members for both the Phillip Island and Great Barrier Reef case studies; however, the ensembles were found to be consistent for 100,000 ensemble members.

For ecosystem networks that are not overly complex, it is possible to assess whether there are enough parameter sets by comparing the results of SMC-EEM and standard-EEM. But for high-dimensional ecosystem networks, it will not be practical to compare outcomes since the latter will have impractically high computational costs (as for the Great Barrier Reef case study). We therefore recommend multiple independent runs of the SMC-EEM method and a visual assessment of whether the ensemble is reproducible (through the estimated parameter distributions, stiff eigenparameter distributions, and time-series predictions), especially if the ecosystem network is as large as the Great Barrier Reef network explored here (see Fig 7). Hence, while the foundational analysis presented here demonstrates that the SMC-EEM method finally unlocks analysis of higher-dimensional networks, its accuracy will be limited primarily by the size of the ensemble.
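A visual comparison of independent runs can be supplemented with a simple numerical summary. The sketch below computes a two-sample Kolmogorov-Smirnov statistic for each marginal parameter distribution; this is a swapped-in diagnostic chosen for illustration, not the assessment used in this work, and the function names and placeholder arrays are hypothetical.

```python
import numpy as np

def ks_statistic(x, y):
    """Largest gap between the empirical CDFs of two one-dimensional samples."""
    x, y = np.sort(x), np.sort(y)
    grid = np.concatenate([x, y])
    cdf_x = np.searchsorted(x, grid, side="right") / x.size
    cdf_y = np.searchsorted(y, grid, side="right") / y.size
    return float(np.max(np.abs(cdf_x - cdf_y)))

def compare_runs(run_a, run_b):
    """KS statistic for each parameter (column) of two (n_models, n_params) ensembles."""
    return np.array([ks_statistic(run_a[:, j], run_b[:, j])
                     for j in range(run_a.shape[1])])

# Placeholder ensembles standing in for two independent SMC-EEM runs.
rng = np.random.default_rng(0)
run_a = rng.normal(size=(2000, 4))
run_b = rng.normal(size=(2000, 4))
print(np.round(compare_runs(run_a, run_b), 3))
```

Values near 0 suggest matching marginals and values near 1 indicate disagreement. Marginal checks alone do not capture parameter interdependencies, so they complement rather than replace a comparison of eigenparameter distributions.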

### Implications for ecosystem network generation in nature

While the main motivation behind SMC-EEM was to maximise the capabilities of the conservation tool, this parameterisation regime could also be of use for drawing theoretical insights. There is substantial debate in the literature regarding which features of natural ecosystems make them more likely to be stable and feasible (e.g., [28, 67, 68]). Some literature suggests that larger and more connected networks are less likely to be feasible and stable [16, 19, 28] because there is a lower probability of randomly sampling parameter values to satisfy these two constraints. However, treating the probability of generating a feasible and stable system through random sampling as a proxy for the likelihood of these systems developing in nature creates a disparity: complex food webs are actually observed in nature, yet are perceived theoretically as highly unlikely.

Interactions in ecosystems have been shaped by processes such as co-evolution, niche partitioning, and resource competition [69], making it unlikely that interactions in ecological networks are random. Additionally, the “community assembly” hypothesis [70] suggests that the development and persistence of large food webs may be the result of natural selection of species survival (from an even larger pool of initial species) whose interaction strengths possess particular statistical properties [30, 31]. These theories imply that the probability of randomly sampling independent parameter values to satisfy feasibility and stability does not indicate the probability of the ecosystem existing in nature.

Thus, instead of being limited by the conceptual argument that the inability to efficiently generate plausible ecosystems via random sampling suggests these ecosystems cannot exist in practice, a key implication of the community assembly hypothesis is that we can instead take advantage of the full suite of Bayesian approaches (as performed here) to identify an ensemble of parameters that can plausibly generate large ecosystems in a computationally efficient manner. The SMC-EEM method also has the potential (beyond specific case studies) to broadly explore the consequences of community assembly on the general properties of ecosystem networks that form in nature [30].

Now that we can quickly produce large ensembles of parameter values that match ecological theory, insights can be drawn from the results. This method could be used to compare the relative difficulties in obtaining models that meet different constraints; for example, is there a lower probability of obtaining feasible ecosystem models, or stable ecosystem models? Alternatively, practitioners could compare the estimated parameter values, or values of interest—such as abundance correlations between species—across ensembles parameterised using different ecological theories.

In our implementation, we assumed parameters were independent in the prior distribution; however, SMC-EEM can accommodate other prior choices (e.g., prior parameter dependencies such as a trophic transfer efficiency constraint [18] or intraspecific density dependencies [71] can be implemented using conditional distributions). However, assuming prior parameter independence does not prevent dependencies from being inferred when fit to the constraints. By analysing the covariance of the parameters once incorporating the constraints (using a method such as the analysis of model sloppiness), the parameter combinations that are important for feasibility and stability could be assessed, as we have shown in our analysis.

When we applied this analysis to the Great Barrier Reef case study, it suggested that high populations of basal species and low populations of top predators were the most important factors for achieving the constraints. This result is unsurprising and therefore offers limited insight. This is likely due to the relatively uninformed prior distributions used in the analysis (following those of Baker et al. [18]), which forced intrinsic growth rate parameters to be positive and had equal magnitude across all species. Growth rate prior distributions with negative values, or other prior distributions, could easily be used within SMC-EEM instead. However, any effect of these prior distributions on the ensemble would in turn affect this analysis, so we recommend testing various prior specifications to assess their impact.

### Computational efficiency unlocks new opportunities for improving ecosystem model realism

In the present analysis, we considered ecosystem networks generated by generalised Lotka-Volterra equations—as this is the mathematical model that EEM has been thus far applied to [18]—however, alternative models have been proposed to offer more complex representations of ecosystem interactions in nature, such as different functional responses [72], or more recently, higher-order (i.e. beyond pairwise) interactions [73]. The generalised Lotka-Volterra model is computationally convenient for EEM because the equilibrium feasibility and stability conditions are readily computable via algebraic formulae (Eqs (4) and (5)). A different choice of model or constraints could be much more computationally expensive to simulate and include many more parameters for calibration—e.g., models with predator learning or prey saturation [72], or constraints on ecosystem dynamics outside of the system equilibrium [74]. The statistical efficiency of the SMC-ABC-based approach underlying our SMC-EEM method therefore offers a significant advantage over standard-EEM if other (potentially more realistic) model types and constraints are used. We surmise that the computational gains shown in the present work are expected to extend beyond the generalised Lotka-Volterra models, and feasibility and stability constraints considered here.

Within our SMC-EEM method, the choice of discrepancy function drastically reduced the computation time in comparison to the standard-EEM method for larger networks (Fig 3). We used a simple discrepancy function to indicate a measure of how infeasible and unstable an ecosystem parameterisation is (Eq (7)); however, there may be better choices for the discrepancy function which further improve the efficiency of the method—such as replacement of the sums and absolute values in Eq (7) with other distance measures like the Euclidean norm, or weighting the infeasibility and instability sums differently. We leave these investigations for future work, especially as the results regarding the “best” discrepancy function may be highly model and constraint-specific.
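To illustrate the kind of discrepancy function discussed above, the sketch below sums how far the equilibrium abundances fall below zero and how far the real parts of the Jacobian eigenvalues rise above zero, with optional weights on each term. Eq (7) is not reproduced in this section, so this form, the weights, and the function name are assumptions made for illustration only.

```python
import numpy as np

def discrepancy(r, A, w_feasible=1.0, w_stable=1.0):
    """Zero when the gLV model dn/dt = n * (r + A n) is feasible and stable."""
    try:
        n_star = np.linalg.solve(A, -r)           # equilibrium abundances
    except np.linalg.LinAlgError:
        return np.inf                             # no unique equilibrium
    # Infeasibility: total magnitude of negative equilibrium abundances.
    infeasibility = np.sum(np.abs(np.minimum(n_star, 0.0)))
    # Instability: total of positive real parts of the Jacobian eigenvalues.
    eig_real = np.linalg.eigvals(n_star[:, None] * A).real
    instability = np.sum(np.maximum(eig_real, 0.0))
    return float(w_feasible * infeasibility + w_stable * instability)

A = -np.eye(2)
print(discrepancy(np.array([1.0, 1.0]), A))    # feasible and stable case
print(discrepancy(np.array([1.0, -1.0]), A))   # one negative abundance, one unstable mode
```

A continuous measure like this lets a sequential sampler rank rejected parameter sets by how close they are to satisfying the constraints, rather than treating every rejection equally. Swapping the sums for a Euclidean norm or changing the weights only requires editing the two summary lines.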

When additional constraints are imposed on the ensemble—which further reduces the acceptance rate—maintaining computational efficiency carries even greater importance than seen here. Case studies in the literature have considered constraints in addition to feasibility and stability, including feasibility and stability for subsets of the ecosystem [18, 24], randomly assigned species interactions [25, 27] and additional constraints on combinations of parameters (e.g. trophic energy transfer constraints) [18, 24]. While the inclusion of such additional constraints in the SMC-EEM method is possible, it can require more careful algorithmic programming than the standard-EEM method.

Additional data on population estimates, where available, should be used to inform the model parameters further. Since the constraints we used to parameterise SMC-EEM are not directly observable, we can consider the resulting ensemble as a constraint-informed prior distribution [75] which can then be updated to incorporate any available time-series data in a subsequent Bayesian analysis. Furthermore, it would be interesting to analyse the effects on population forecasts of the constraint-informed prior compared to the relatively uninformed prior. Alternatively, the constraints within the discrepancy function could be redefined where additional information about species abundance estimates is available (see e.g., Neutel et al. [76]). Parameter sets with equilibrium abundances near the estimates could be given a lower discrepancy according to a Gaussian distribution, or equilibrium abundance limits could be defined, as in the feasibility constraint (see Eq (8)), to avoid unreasonable population sizes. Through connecting these data with feasibility and stability constraints, we hope that ensemble ecosystem modelling can be more accurate for conservation decision-making.

## Conclusion

Through SMC-EEM we have unlocked ensemble ecosystem modelling for large and complex networks. Increasing the computational efficiency means that users only need to wait hours, rather than months, to analyse the risks and potential consequences of conservation actions in remote and understudied ecosystems with limited data. Through drastically improved computational efficiency, SMC-EEM brings new opportunities to explore more realistic ecosystem models and constraints to study the large and complex ecosystem networks that exist in nature.

## Supporting information

### S1 Fig. Computation time required to generate an ensemble for various network connectances.

The computation time needed to generate an ensemble of 1000 feasible and stable ecosystem models using a connectance probability of *c* = 0.25 (left), *c* = 0.5 (middle) and *c* = 0.75 (right), for both the standard-EEM and SMC-EEM methods. This figure shows the medians (dots) and 7.5–92.5% quantiles (error bars) of computation times for producing the results. Note, the computation time for any one ecosystem network was capped at 10^{4} seconds due to the computational burden of the simulation study. More densely connected ecosystems (higher value of *c*) increase the computation time of both methods and decrease the network size at which the SMC-EEM method becomes more computationally efficient than the standard-EEM method.

https://doi.org/10.1371/journal.pcbi.1011976.s001

(TIF)

### S2 Fig. The number of simulations required to generate an ensemble for various network sizes.

The number of parameter sets trialled to generate an ensemble of 1000 feasible and stable ecosystem models using both the standard-EEM and SMC-EEM parameterisation methods. This figure shows the medians (dots) and 7.5–92.5% quantiles (error bars) of simulation numbers for the models parameterised in Fig 2 of the manuscript. Note, the computation time for any one ecosystem network was capped at 10^{4} seconds due to the computational burden of the simulation study.

https://doi.org/10.1371/journal.pcbi.1011976.s002

(TIF)

### S3 Fig. Parameter distributions for the semiarid Australia ecosystem network comparing standard-EEM to SMC-EEM.

Marginal parameter distributions estimated using both the standard-EEM method (red) and the SMC-EEM method (blue). Species labels represent dingoes (D), mesopredators (M), large herbivores (H), small vertebrates (V), grasses (G), invertebrates (I), fires (F) and soil quality (S). Notice that the blue and red densities match almost exactly, demonstrating that the outputs of the standard-EEM and SMC-EEM methods are consistent.

https://doi.org/10.1371/journal.pcbi.1011976.s003

(TIF)

### S4 Fig. Eigenparameter distributions for the semiarid Australia ecosystem network comparing standard-EEM to SMC-EEM.

Marginal distributions of the nine stiffest eigenparameters estimated via the prior (grey), standard-EEM (red) and SMC-EEM (blue) ensembles. Notice that the blue and red densities match almost exactly, demonstrating that the outputs of the standard-EEM and SMC-EEM methods are consistent.

https://doi.org/10.1371/journal.pcbi.1011976.s004

(TIF)

### S5 Fig. Time-series predictions for the semiarid Australia ecosystem network comparing the prior, standard-EEM, and SMC-EEM.

Time-series forecasts for the prior (grey), standard-EEM (red) and SMC-EEM (blue) ensembles simulated from a random initial condition. Depicted are the median (thick lines) and 95% credible intervals (thin dotted lines) for each ensemble. Notice that the blue and red predictions are similar, demonstrating that the outputs of the standard-EEM and SMC-EEM methods are consistent.

https://doi.org/10.1371/journal.pcbi.1011976.s005

(TIF)

### S6 Fig. Parameter distributions for the Phillip Island ecosystem network comparing standard-EEM to SMC-EEM.

The estimated marginal distributions for each parameter within the ecosystem model for the Phillip Island network were generated via the standard-EEM method (red) and the SMC-EEM method (blue). Species labels represent parameters for the red fox (RF), feral cat (FC), toxoplasmosis (T), black rat (BR), house mouse (HM), European rabbit (ER), myxoma and calici (MC), little penguin (LP), short-tailed shearwater (STS), little raven (LR), Cape Barren geese (CBG), raptors (R), woodland birds (WB), ringtail possum (RP), brushtail possum (BP), swamp wallaby (SW), eastern barred bandicoot (EBB), soil invertebrates (SI), terrestrial invertebrates (TI), woodlands (W), grasslands (G), and herbfield (H).

https://doi.org/10.1371/journal.pcbi.1011976.s006

(TIF)

### S7 Fig. Eigenparameter distributions for the Phillip Island ecosystem network comparing standard-EEM to SMC-EEM.

Distributions of the nine most constrained parameter combinations (stiffest eigenparameters) determined by an analysis of model sloppiness of the standard-EEM ensemble. Here we compare the values of the eigenparameters for the prior (grey), standard-EEM (red) and SMC-EEM (blue) ensembles.

https://doi.org/10.1371/journal.pcbi.1011976.s007

(TIF)

### S8 Fig. Time-series predictions for the Phillip Island ecosystem network comparing the prior, standard-EEM, and SMC-EEM.

Time-series forecasts for the prior (grey), standard-EEM (red) and SMC-EEM (blue) ensembles simulated from a random initial condition. Depicted are the median (thick lines) and 95% credible intervals (thin dotted lines) for each ensemble. Notice that the blue and red predictions are similar, demonstrating that the outputs of the standard-EEM and SMC-EEM methods are consistent.

https://doi.org/10.1371/journal.pcbi.1011976.s008

(TIF)

### S9 Fig. Parameter distributions for the Great Barrier Reef ecosystem network comparing two independent SMC-EEM ensembles.

The estimated marginal distributions for each parameter within the ecosystem model for the Great Barrier Reef network were generated via two independent runs of the SMC-EEM algorithm (black and blue). Species labels represent parameters for large carnivores (LC), pelagic piscivores (PP), benthic piscivores (BP), meso-carnivores (MC), invertivores (Iv), herbivores (H), detritivores (Dv), planktivores (Pv), coral cryptics (CC), invertebrates (I), urchins (U), corals (C), macroalgae (MA), turf algae (TA), detritus (D), and plankton (P).

https://doi.org/10.1371/journal.pcbi.1011976.s009

(TIF)

### S10 Fig. Eigenparameter distributions for the Great Barrier Reef ecosystem network comparing two independent SMC-EEM ensembles.

Distributions of the nine most constrained parameter combinations (stiffest eigenparameters) determined by an analysis of model sloppiness of an SMC-EEM ensemble. Here we compare the values of the eigenparameters for the prior distribution (grey) and two independent ensembles generated via the SMC-EEM algorithm (black and blue).

https://doi.org/10.1371/journal.pcbi.1011976.s010

(TIF)

### S11 Fig. Time-series predictions for the Great Barrier Reef ecosystem network comparing the prior and two independently generated SMC-EEM ensembles.

Time-series forecasts for the prior (grey) and two independently generated SMC-EEM (light and dark blue) ensembles simulated from a random initial condition. Depicted are the median (thick lines) and 95% credible intervals (thin dotted lines) for each ensemble. Notice that the two blue predictions are similar, demonstrating that the SMC-EEM ensembles are consistent.

https://doi.org/10.1371/journal.pcbi.1011976.s011

(TIF)

### S12 Fig. Five most tightly constrained parameter combinations for the Great Barrier Reef ecosystem network.

The eigenvector values for the first five eigenparameters, rescaled to be between -1 and 1. These values are shaded such that darker colours indicate a greater contribution of the parameter to the important parameter combinations. The columns of this table can be interpreted using Eq (10). Notice that the most important parameters are all growth rates for lower trophic species and self-regulation for top predators.

https://doi.org/10.1371/journal.pcbi.1011976.s012

(TIF)

### S13 Fig. Eighty most tightly constrained parameter combinations for the Great Barrier Reef ecosystem network.

The eigenvector values for the first 80 eigenparameters, shaded such that darker colours indicate a greater contribution of the parameter to the eigenparameter. Each row represents an eigenparameter (ordered from most sensitive to least) and each column represents a model parameter (grouped by type). Note that beyond the first five eigenparameters, there are no clearly interpretable trends.

https://doi.org/10.1371/journal.pcbi.1011976.s013

(TIF)

### S1 File. Additional details of the SMC-EEM method.

Additional details for implementing the SMC-EEM method, adapted from Drovandi and Pettitt’s (2011) [33] implementation of SMC-ABC.

https://doi.org/10.1371/journal.pcbi.1011976.s014

(PDF)

### S1 Video. Visualisation of the ABC accept-reject and SMC-ABC approaches.

This video shows a two-dimensional visualisation of the ABC accept-reject approach (left) and the SMC-ABC approach (right). Each parameterisation approach aims to obtain samples from 0.3 ≤ *x* ≤ 0.4 and 0.8 ≤ *y* ≤ 0.9 (grey-shaded region).

https://doi.org/10.1371/journal.pcbi.1011976.s015

(MP4)
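
As a rough illustration of the accept-reject approach shown in the left panel of S1 Video, the following minimal Python sketch draws candidate points and keeps only those that fall in the target region 0.3 ≤ *x* ≤ 0.4 and 0.8 ≤ *y* ≤ 0.9. The uniform prior on the unit square, the sample size, and the function names `in_target` and `abc_accept_reject` are assumptions made for this example; they are not taken from the video or from the SMC-EEM implementation.

```python
import numpy as np


def in_target(x, y):
    """Return True when (x, y) lies inside the target region of the video."""
    return (0.3 <= x <= 0.4) and (0.8 <= y <= 0.9)


def abc_accept_reject(n_samples, rng):
    """Accept-reject sampling: propose from the prior, keep proposals in the target.

    Assumption: the prior is uniform on the unit square [0, 1] x [0, 1].
    Returns the accepted points and the total number of proposals trialled.
    """
    accepted = []
    n_trials = 0
    while len(accepted) < n_samples:
        x, y = rng.uniform(0.0, 1.0, size=2)  # one proposal from the assumed prior
        n_trials += 1
        if in_target(x, y):
            accepted.append((x, y))
    return np.array(accepted), n_trials


rng = np.random.default_rng(seed=1)
samples, n_trials = abc_accept_reject(200, rng)
print(f"accepted {len(samples)} of {n_trials} proposals "
      f"(acceptance rate {len(samples) / n_trials:.4f})")
```

Under the assumed prior the target covers 1% of the unit square, so roughly one proposal in a hundred is expected to be accepted. This inefficiency is what the sequential approach in the right panel is designed to reduce, by moving a population of samples towards the target through a sequence of intermediate steps rather than proposing from the prior every time.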

## Acknowledgments

The authors thank Chris Baker, Cailan Jeynes-Smith, Brodie Lawson and Robert Salomone for helpful discussions during this research. Computational resources were provided by the eResearch Office, Queensland University of Technology.

## References

- 1. Prior KM, Adams DC, Klepzig KD, Hulcr J. When does invasive species removal lead to ecological recovery? Implications for management success. Biological Invasions. 2018;20(2):267–83.
- 2. Buckley YM, Han Y. Managing the side effects of invasion control. Science. 2014;344(6187):975–6. pmid:24876482
- 3. Raymond B, McInnes J, Dambacher JM, Way S, Bergstrom DM. Qualitative modelling of invasive species eradication on subantarctic Macquarie Island. Journal of Applied Ecology. 2011;48(1):181–91.
- 4. Roemer GW, Donlan CJ, Courchamp F. Golden eagles, feral pigs, and insular carnivores: how exotic species turn native predators into prey. Proceedings of the National Academy of Sciences. 2002;99(2):791–6. pmid:11752396
- 5. Roemer GW, Donlan CJ, Courchamp F. Golden eagles, feral pigs, and insular carnivores: how exotic species turn native predators into prey. Proceedings of the National Academy of Sciences. 2002;99(2):791–6. pmid:11752396
- 6. Adams MP, Sisson SA, Helmstedt KJ, Baker CM, Holden MH, Plein M, et al. Informing management decisions for ecological networks, using dynamic models calibrated to noisy time-series data. Ecology Letters. 2020;23(4):607–19. pmid:31989772
- 7. Possingham HP, Andelman S, Noon B, Trombulak S, Pulliam H. Making smart conservation decisions. Conservation Biology: research priorities for the next decade. 2001;23:225–44.
- 8. Tulloch AI, Hagger V, Greenville AC. Ecological forecasts to inform near-term management of threats to biodiversity. Global Change Biology. 2020;26(10):5816–28. pmid:32652624
- 9. Baker CM, Bode M, Dexter N, Lindenmayer DB, Foster C, MacGregor C, et al. A novel approach to assessing the ecosystem-wide impacts of reintroductions. Ecological Applications. 2019;29(1):e01811. pmid:30312496
- 10. Dambacher JM, Li HW, Rossignol PA. Qualitative predictions in model ecosystems. Ecological Modelling. 2003;161(1-2):79–93.
- 11. Kristensen NP, Chisholm RA, McDonald-Madden E. Dealing with high uncertainty in qualitative network models using Boolean analysis. Methods in Ecology and Evolution. 2019;10(7):1048–61.
- 12. Geary WL, Bode M, Doherty TS, Fulton EA, Nimmo DG, Tulloch AI, et al. A guide to ecosystem models and their environmental applications. Nature Ecology & Evolution. 2020;4(11):1459–71. pmid:32929239
- 13. Novak M, Wootton JT, Doak DF, Emmerson M, Estes JA, Tinker MT. Predicting community responses to perturbations in the face of imperfect knowledge and network complexity. Ecology. 2011;92(4):836–46. pmid:21661547
- 14. Humbert JY, Scott Mills L, Horne JS, Dennis B. A better way to estimate population trends. Oikos. 2009;118(12):1940–6.
- 15. McDonald-Madden E, Baxter PW, Fuller RA, Martin TG, Game ET, Montambault J, et al. Monitoring does not always count. Trends in Ecology & Evolution. 2010;25(10):547–50. pmid:20727614
- 16. Dougoud M, Vinckenbosch L, Rohr RP, Bersier LF, Mazza C. The feasibility of equilibria in large ecosystems: A primary but neglected concept in the complexity-stability debate. PLoS Computational Biology. 2018;14(2):e1005988. pmid:29420532
- 17. Melbourne-Thomas J, Wotherspoon S, Raymond B, Constable A. Comprehensive evaluation of model uncertainty in qualitative network analyses. Ecological Monographs. 2012;82(4):505–19.
- 18. Baker CM, Gordon A, Bode M. Ensemble ecosystem modeling for predicting ecosystem response to predator reintroduction. Conservation Biology. 2017;31(2):376–84. pmid:27478092
- 19. May RM. Will a large complex system be stable? Nature. 1972;238(5364):413–4. pmid:4559589
- 20. Allesina S, Tang S. The stability–complexity relationship at age 40: a random matrix perspective. Population Ecology. 2015;57(1):63–75.
- 21. Grilli J, Adorisio M, Suweis S, Barabás G, Banavar JR, Allesina S, et al. Feasibility and coexistence of large ecological communities. Nature Communications. 2017;8(1):1–8. pmid:28233768
- 22. Stone L. The feasibility and stability of large complex biological networks: a random matrix approach. Scientific Reports. 2018;8(1):1–12. pmid:29844420
- 23. Pesendorfer MB, Baker CM, Stringer M, McDonald-Madden E, Bode M, McEachern AK, et al. Oak habitat recovery on California’s largest islands: scenarios for the role of corvid seed dispersal. Journal of Applied Ecology. 2018;55(3):1185–94.
- 24. Peterson KA, Barnes MD, Jeynes-Smith C, Cowen S, Gibson L, Sims C, et al. Reconstructing lost ecosystems: A risk analysis framework for planning multispecies reintroductions under severe uncertainty. Journal of Applied Ecology. 2021;58(10):2171–84.
- 25. Rendall AR, Sutherland DR, Baker CM, Raymond B, Cooke R, White JG. Managing ecosystems in a sea of uncertainty: invasive species management and assisted colonizations. Ecological Applications. 2021;31(4):e02306. pmid:33595860
- 26. Bode M, Baker CM, Benshemesh J, Burnard T, Rumpff L, Hauser CE, et al. Revealing beliefs: using ensemble ecosystem modelling to extrapolate expert beliefs to novel ecological scenarios. Methods in Ecology and Evolution. 2017;8(8):1012–21.
- 27. Peterson K, Bode M. Using ensemble modeling to predict the impacts of assisted migration on recipient ecosystems. Conservation Biology. 2021;35(2):678–87. pmid:32538472
- 28. Allesina S, Tang S. Stability criteria for complex ecosystems. Nature. 2012;483(7388):205–8. pmid:22343894
- 29. Landi P, Minoarivelo HO, Brännström Å, Hui C, Dieckmann U. Complexity and stability of adaptive ecological networks: a survey of the theory in community ecology. In: Systems analysis approach for complex global challenges. Springer; 2018. p. 209–48.
- 30. Barbier M, de Mazancourt C, Loreau M, Bunin G. Fingerprints of high-dimensional coexistence in complex ecosystems. Physical Review X. 2021;11(1):011009.
- 31. Serván CA, Capitán JA, Grilli J, Morrison KE, Allesina S. Coexistence of many species in random ecosystems. Nature Ecology & Evolution. 2018;2(8):1237–42. pmid:29988167
- 32. Sisson SA, Fan Y, Tanaka MM. Sequential Monte Carlo without likelihoods. Proceedings of the National Academy of Sciences. 2007;104(6):1760–5. pmid:17264216
- 33. Drovandi CC, Pettitt AN. Estimation of parameters for macroparasite population evolution using approximate Bayesian computation. Biometrics. 2011;67(1):225–33. pmid:20345496
- 34. Monsalve-Bravo GM, Lawson BAJ, Drovandi C, Burrage K, Brown KS, Baker CM, et al. Analysis of sloppiness in model simulations: Unveiling parameter uncertainty when mathematical models are fitted to data. Science Advances. 2022;8(38):eabm5952. pmid:36129974
- 35. Vollert SA, Drovandi C, Monsalve-Bravo GM, Adams MP. Strategic model reduction by analysing model sloppiness: A case study in coral calcification. Environmental Modelling & Software. 2023;159:105578.
- 36. Botha I, Adams MP, Frazier D, Tran DK, Bennett FR, Drovandi C. Component-wise iterative ensemble Kalman inversion for static Bayesian models with unknown measurement error covariance. Inverse Problems. 2023;39(12):125014.
- 37. Ings TC, Montoya JM, Bascompte J, Blüthgen N, Brown L, Dormann CF, et al. Ecological networks–beyond food webs. Journal of Animal Ecology. 2009;78(1):253–69. pmid:19120606
- 38. Montoya JM, Pimm SL, Solé RV. Ecological networks and their fragility. Nature. 2006;442(7100):259–64. pmid:16855581
- 39. Novak M, Yeakel JD, Noble AE, Doak DF, Emmerson M, Estes JA, et al. Characterizing species interactions to understand press perturbations: what is the community matrix? Annual Review of Ecology, Evolution, and Systematics. 2016;47:409–32.
- 40. Baker CM, Holden MH, Plein M, McCarthy MA, Possingham HP. Informing network management using fuzzy cognitive maps. Biological Conservation. 2018;224:122–8.
- 41. Levins R. Discussion paper: the qualitative analysis of partially specified systems. Annals of the New York Academy of Sciences. 1974;231(1):123–38. pmid:4522890
- 42. Dambacher JM, Li HW, Rossignol PA. Relevance of community structure in assessing indeterminacy of ecological predictions. Ecology. 2002;83(5):1372–85.
- 43. Ives AR, Dennis B, Cottingham KL, Carpenter SR. Estimating community stability and ecological interactions from time-series data. Ecological Monographs. 2003;73(2):301–30.
- 44. Bonnaffé W, Coulson T. Fast fitting of neural ordinary differential equations by Bayesian neural gradient matching to infer ecological interactions from time-series data. Methods in Ecology and Evolution. 2023;14(6):1543–63.
- 45. Liu OR, Gaines SD. Environmental context dependency in species interactions. Proceedings of the National Academy of Sciences. 2022;119(36):e2118539119. pmid:36037344
- 46. Ye H, Beamish RJ, Glaser SM, Grant SC, Hsieh Ch, Richards LJ, et al. Equation-free mechanistic ecosystem forecasting using empirical dynamic modeling. Proceedings of the National Academy of Sciences. 2015;112(13):E1569–76. pmid:25733874
- 47. Murray JD. Mathematical Biology I: An Introduction. Springer, New York, New York, USA; 2002.
- 48. Beaumont MA. Approximate Bayesian computation. Annual Review of Statistics and Its Application. 2019;6:379–403.
- 49. Sisson SA, Fan Y, Beaumont M. Handbook of approximate Bayesian computation. CRC Press; 2018.
- 50. Sunnåker M, Busetto AG, Numminen E, Corander J, Foll M, Dessimoz C. Approximate Bayesian computation. PLoS Computational Biology. 2013;9(1):e1002803. pmid:23341757
- 51. Beaumont MA. Approximate Bayesian computation in evolution and ecology. Annual Review of Ecology, Evolution, and Systematics. 2010;41:379–406.
- 52. Beaumont MA, Zhang W, Balding DJ. Approximate Bayesian computation in population genetics. Genetics. 2002;162(4):2025–35. pmid:12524368
- 53. Martin GM, Frazier DT, Robert CP. Computing Bayes: From then ‘til now. Statistical Science. 2023;1(1):1–17.
- 54. Gamerman D, Lopes HF. Markov chain Monte Carlo: Stochastic simulation for Bayesian inference. CRC Press; 2006.
- 55. Del Moral P, Doucet A, Jasra A. Sequential Monte Carlo samplers. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2006;68(3):411–36.
- 56. Chopin N. A sequential particle filter method for static models. Biometrika. 2002;89(3):539–52.
- 57. Drovandi C. Approximate Bayesian Computation. Wiley StatsRef: Statistics Reference Online. 2017:1–9.
- 58. Pritchard JK, Seielstad MT, Perez-Lezaun A, Feldman MW. Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Molecular Biology and Evolution. 1999;16(12):1791–8. pmid:10605120
- 59. Cérou F, Del Moral P, Furon T, Guyader A. Sequential Monte Carlo for rare event estimation. Statistics and Computing. 2012;22(3):795–808.
- 60. Del Moral P, Doucet A, Jasra A. An adaptive sequential Monte Carlo method for approximate Bayesian computation. Statistics and Computing. 2012;22(5):1009–20.
- 61. Transtrum MK, Machta BB, Brown KS, Daniels BC, Myers CR, Sethna JP. Perspective: Sloppiness and emergent theories in physics, biology, and beyond. The Journal of Chemical Physics. 2015;143(1):010901–113. pmid:26156455
- 62. Brown KS, Sethna JP. Statistical mechanical approaches to models with many poorly known parameters. Physical Review E. 2003;68(2):021904. pmid:14525003
- 63. Gutenkunst RN, Waterfall JJ, Casey FP, Brown KS, Myers CR, Sethna JP. Universally Sloppy Parameter Sensitivities in Systems Biology Models. PLoS Computational Biology. 2007;3(10):1871–8. pmid:17922568
- 64. Rogers A, Harborne AR, Brown CJ, Bozec YM, Castro C, Chollett I, et al. Anticipative management for coral reef ecosystem services in the 21st century. Global Change Biology. 2015;21(2):504–14. pmid:25179273
- 65. Johnson S, Domínguez-García V, Donetti L, Munoz MA. Trophic coherence determines food-web stability. Proceedings of the National Academy of Sciences. 2014;111(50):17923–8. pmid:25468963
- 66. Barbier M, Loreau M. Pyramids and cascades: a synthesis of food chain functioning and stability. Ecology Letters. 2019;22(2):405–19. pmid:30560550
- 67. Emmerson M, Yearsley JM. Weak interactions, omnivory and emergent food-web properties. Proceedings of the Royal Society of London Series B: Biological Sciences. 2004;271(1537):397–405. pmid:15101699
- 68. Jacquet C, Moritz C, Morissette L, Legagneux P, Massol F, Archambault P, et al. No complexity–stability relationship in empirical ecosystems. Nature Communications. 2016;7(1):1–8. pmid:27553393
- 69. Dormann CF, Fründ J, Schaefer HM. Identifying causes of patterns in ecological networks: opportunities and limitations. Annual Review of Ecology, Evolution, and Systematics. 2017;48:559–84.
- 70. Weiher E, Freund D, Bunton T, Stefanski A, Lee T, Bentivenga S. Advances, challenges and a developing synthesis of ecological community assembly theory. Philosophical Transactions of the Royal Society B: Biological Sciences. 2011;366(1576):2403–13.
- 71. Reznick D, Bryant MJ, Bashey F. r-and K-selection revisited: the role of population regulation in life-history evolution. Ecology. 2002;83(6):1509–20.
- 72. Holling CS. The components of predation as revealed by a study of small-mammal predation of the European Pine Sawfly. The Canadian Entomologist. 1959;91(5):293–320.
- 73. Gibbs T, Levin SA, Levine JM. Coexistence in diverse communities with higher-order interactions. Proceedings of the National Academy of Sciences. 2022;119(43):e2205063119. pmid:36252042
- 74. Hastings A, Abbott KC, Cuddington K, Francis T, Gellner G, Lai YC, et al. Transient phenomena in ecology. Science. 2018;361(6406). pmid:30190378
- 75. Wesner JS, Pomeranz JP. Choosing priors in Bayesian ecological models by simulating from the prior predictive distribution. Ecosphere. 2021;12(9):e03739.
- 76. Neutel AM, Heesterbeek JA, Van de Koppel J, Hoenderboom G, Vos A, Kaldeway C, et al. Reconciling complexity with stability in naturally assembling food webs. Nature. 2007;449(7162):599–602. pmid:17914396