Dynamic Global Vegetation Models: Searching for the balance between demographic process representation and computational tractability

Vegetation is subject to multiple pressures in the 21st century, including changes in climate, atmospheric composition and human land-use. Changes in vegetation type, structure, and function also feed back to the climate through their impact on the surface-atmosphere fluxes of carbon and water. Dynamic Global Vegetation Models (DGVMs), are therefore key component of the latest Earth System Models (ESMs). Model projections for the future land carbon sink still span a wide range, in part due to the difficulty of representing complex ecosystem and biogeochemical processes at large scales (i.e. grid lengths� 100km). The challenge for developers of DGVMs is therefore to find an optimal balance between detailed process representation and the ability to scale-up. We categorise DGVMs into four groups; Individual, Average Area, Two Dimensional Cohort and One Dimensional Cohort models. From this we review popular methods used to represent dynamic vegetation within the context of Earth System modelling. We argue that the minimum level of complexity required to effectively model changes in carbon storage under changing climate and disturbance regimes, requires a representation of tree size distributions within forests. Furthermore, we find that observed size distributions are consistent with Demographic Equilibrium Theory, suggesting that One Dimensional Cohort models with a focus on tree size, offer the best balance between computational tractability and realism for ESM applications. Introduction Vegetation is a key determinant of future climate change. It is estimated that 3.4 ± 0.9 PgC yr (±σ) of carbon was absorbed into vegetation and soils over the last decade [1]. This sink represents almost 30% of annual anthropogenic carbon emissions; 11.5 ± 0.9 PgC yr. However, there remains a large range of uncertainty in the projections for the terrestrial sink over the coming century [2, 3]. Climate and human influences are similarly important determinants of the geographical distribution of observed vegetation [4, 5], which affects the land carbon storage and the hydrological cycle. A reduction in forest cover typically decreases water recycling, leading to drier conditions [6]. The additional loss of biomass is a source of carbon emissions and could lead PLOS CLIMATE PLOS Climate | https://doi.org/10.1371/journal.pclm.0000068 September 6, 2022 1 / 18 a1111111111 a1111111111 a1111111111 a1111111111 a1111111111


Introduction
Vegetation is a key determinant of future climate change. It is estimated that 3.4 ± 0.9 PgC yr −1 (±σ) of carbon was absorbed into vegetation and soils over the last decade [1]. This sink represents almost 30% of annual anthropogenic carbon emissions; 11.5 ± 0.9 PgC yr −1 . However, there remains a large range of uncertainty in the projections for the terrestrial sink over the coming century [2,3].
Climate and human influences are similarly important determinants of the geographical distribution of observed vegetation [4,5], which affects the land carbon storage and the hydrological cycle. A reduction in forest cover typically decreases water recycling, leading to drier conditions [6]. The additional loss of biomass is a source of carbon emissions and could lead to an acceleration of global warming [7,8]. Climate and land-use change could potentially cause a significant loss of regional forest ecosystems [9]. For instance, one key risk is Amazonian dieback [10,11]. Two models in the CMIP5 ensemble predicted dieback under the most extreme radiative forcing scenario [12], but the likelihood is increased when including deforestation, which reduces evapotranspiration and forest resilience to droughts [13]. Dieback of forest ecosystems has previously resulted in losses of biodiversity [14,15], social impacts [16,17], and the release of greenhouse gases [18].
To model these interactions, Dynamic Global Vegetation Models (DGVMs) are now being used within a growing number of Earth System Models (ESM). Understanding the dynamics of forest ecosystems and how the populations of plants change through time (through fecundity, growth and mortality), will inform researchers and policy makers about the risks associated with changes in land-use and climate.

Dynamic Global Vegetation Models
In the historical context of climate modelling, the representation of the land-surface has increased in complexity in recent decades. Initially, atmosphere-land interactions were represented in a rudimentary way, with Global Circulation Models using prescribed values for parameters such as surface albedo, surface roughness and effective root-depth [19]. Vegetation was first modelled explicitly to improve the representation of the diurnal cycle of temperature [20]. Later models would begin to prescribe vegetation types categorised by "Plant Functional Types" (PFTs): for example, C3 and C4 Grass or Broadleaf Deciduous Trees [21]. This allowed for stomatal conductance and phenological diversity to be more realistically represented [22][23][24][25]. However, even in these second generation land-surface models. the vegetation cover was prescribed and fixed, based-on either observed land-cover or "equilibrium" biogeographical models, such as the BIOME model [26,27].
In parallel with the development of the first climate models, computational models were developed to represent community dynamics of forests, through the concept of a forest patch. Each patch would represent a community of species in both patch age and physiology, going through different "phases" of succession [28,29]. A patch based approach was first implemented in the JABOWA [30,31] and FORET [32] models. These are more commonly known as "gap models", both simulating a grid of patches within an ecosystem on the scale of 0.01 to 0.1 hectares. Gaps are randomly formed by disturbances, resetting a patch at the beginning of its successional pathway. Other forest models sought to represent explicit individuals of tree species in three-dimensional space, to capture vertical heterogeneity [33,34]. Environmental variables such as temperature and precipitation drive the photosynthesis and growth of plants. Light competition is most simply represented using Beer's law of radiation attenuation [35,36]. Age and size are principal variables in determining mortality and competition. The reproduction or recruitment rate is either fixed based-on species or dependent on the size of the current plant population. Gap

Individual Based models
Individual Based DGVMs are descendants of the original gap models, but often coupled to more sophisticated representations of water and carbon fluxes. The earliest example of this method was the appropriately named, HYBRID (version 3.0) model [48,49]. HYBRID simulates many individual trees established from a fixed density of seeds. As with gap models, crown competition for light is represented using Beer's law. HYBRID can be driven globally using climate variables from a GCM, typically by subdividing each grid-cell into up to ten patches [50]. The SEIB-DGVM model uses a similar methodology at the global scale [51]. These individual-based models have the potential to model plant-specific processes in greater detail, resulting in more realistic modelling of site-level dynamics [52].
For practical reasons, Individual Based models use the process of "upscaling" from the scale of the stand to that of the land-surface grid-box [53]. SEIB-DGVM scales 30mx30m plots up to the order of 200kmx200km grid-boxes [51]. When coupled to the MIROC-ESM [54], SEIB-DGVM simulates in essence one random patch for one grid-box, which may lead to spurious statistical variation between grid-boxes. Nevertheless, this form of explicit upscaling has been used effectively to study the influence of soil variation on high latitude ecosystems [55], and to forecast typhoon windthrow in Japan under climate change [56].
Individual Based models have also been used to address some of the criticisms of the concept of Plant Functional Types (PFTs Individual Based models can however fail to capture some large scale dynamic processes [47,66]. For example, the influence of forest fires can extend well past single patch scale. A single fire can eliminate an entire patch, especially in savannah, within a few minutes [67]. Therefore, to include fire, Individual Based models with a single patch must represent fire through some form of parametrisation. SEIB-DGVM relies on an end-of-year probabilistic estimation of fire based on environmental factors [68], after which the model applies an average risk of plant mortality. However, generally Individual Based models that represent multiple subgrid patches are able to explicitly simulate fire. In LPJ-GUESS in cohort and individual mode Uniquely, LPJ-GUESS has two modes "population mode", which is an Area Averaged vegetation model (see section Area Averaged Models), and "cohort and individual mode" which simulates multiple patches with a three dimensional simulation of trees within each patch, fire mortality is applied daily across multiple patches (100 patches per 0.1 ha in grid-box size)removing all individuals within the patch if a fire occurs [69].

Area Averaged models
An alternative approach to representing global vegetation focuses on the area-averaged properties of most relevance to climate, such as biomass and leaf area index. These top-down models essentially represent the biophysical properties of a typical plant on some vegetated fraction of a grid-box. In many cases, these dynamical models were born out of the earlier equilibrium biogeography models [26].
The VECODE model [5], sought to represent dynamic vegetation fractions as a consequence of two climate variables-temperature and precipitation. Competition was simply between two PFTs-trees and grass, the fraction of which must always add up to one. VECODE successfully captured the general distribution of tree coverages [70]. However, this approach underplays how fast vegetation cover might change, due to the lack of physical processes such as disturbances or land-use change [71]. In an expansion to the VECODE approach, the DYNVEG model simulates PFTs with the inclusion of dynamic disturbances such as fire and windthrow [72].
The TRIFFID [73] and CTEM models [74] represent PFTs by the fraction of grid-box area they cover and model competition between the PFTs through a Lotka scheme [75]. In TRIF-FID, vegetation tiles were simulated with area averaged productivity and carbon biomass. PFTs accumulate carbon into three pools-wood, leaves and litter. For recruitment, TRIFFID uses an LAI (Leaf-Area Index) dependent fraction of carbon mass assimilate to "spread", or expand, the coverage. PFTs are grouped via a competitive hierarchy, where dominant PFTs (Trees) can freely-access the space occupied by sub-dominant PFTs (Shrubs and Grasses). The competition between the different tree PFTs is modelled using relative heights estimated from allometry.
VECODE, TRIFFID, DIVE [76] and DYNVEG collectively provide us with one of the key large-scale simplifications for dynamic vegetation modelling; the sum of PFT/plant fractional grid-box area, ∑ν i , and the total unoccupied or bare space, ν bare , must be equal to one, the total grid box fractional area: Let us call this rule the "hard exclusion approximation", which implies that despite plants competing for many resources (light, water, and below ground nutrients, see [77]), space is the principal limiting resource [5,78]. The "hard" element describes the absence of overlap between the spaces occupied by PFTs, whereas in reality there is a degree of tolerated overlap between plants, in terms of both physical space and ecological niches [79].
This method of area exclusion competition has had a number of criticisms [61,80]. It has been argued that this will not realistically capture biome shifts or the species sorting that may occur under climate change [81]. Also, the abstraction of competition means that such dynamic relationships are hard to relate to field measurements [82].
Lotka approaches that use coverage and hard exclusion for competition have also been criticised in terms of biodiversity [61]. For instance, in the steady-state they cannot simulate the observed mixed communities of broadleaf or needleleaf trees seen in temperate latitudes [74,83]. In a Lotka context, this is because having a competition coefficient equal to one can generate complete exclusion of sub-dominant PFTs. Having a competition coefficient less than one generates diversity, but this breaks the hard exclusion rule; the overall total vegetation area can then exceed the grid-box area. The CTEM achieves diversity by allowing competition coefficients to be dependent on a more complex competition-colonisation parametrisation basedon inter-PFT invasion rates [74].
The hard exclusion approximation has been used to represent area-based disturbances, from land use change to fire, along with controlling the rate of subsequent regrowth [72, 84-90]. Area based land-use change is perhaps easiest to simulate, not necessarily needing to represent plant demography. The implementation of land use change in JULES-TRIFFID is to simply replace the natural PFTs with crop PFTs at an area dependent rate [91]. However, the representation of fire or drought disturbance through purely area, may overlook important dependencies on plant size and age [92].
A second approach to Area Averaged models is to build dynamics by primarily mechanistic means. The IBIS [93], SDGVM [94,95], MC1 [96], LPJ [97] and VEGAS [98,99] models simulate explicit light and/or water competition of mean PFT members within the ecosystems. This is typically done by partitioning the population into different "layers", representing either the canopy heights or root depths of vegetation. Additionally they may use an implementation of the self-thinning rule for competitive based mortality. There are benefits to this approach as, unlike the Lotka and area exclusion methods, there is a more direct link to physical processes, allowing for more model validation with empirical measurements [68, [100][101][102]. However, there is a cost associated with increasing the number of modelled processes, and thereby the number of uncertain parameters, which may lead to additional uncertainty at regional scales [103][104][105].
A stochastic trait based approach has been implemented in Area Averaged models through the JeDi-DGVM [81]. Lotka approaches are inappropriate for trait-based stochasticity, as the large number of possible PFTs implies that the competition coefficient matrix can quickly become computationally unwieldly; at rate of I 2 , with I being the total number of PFTs. JeDI-DGVM can generate thousands of randomly sampled growth strategies on a global scale. For grid-box scaling the model uses the "biomass-ratio" hypothesis [106], as simulated in the JeDI biogeographical equilibrium model [107,108]. The Biomass-ratio hypothesis is where the function (i.e. dominant, sub-dominant growth strategies) of the species is dictated by the relative contribution it makes to the total community biomass. The JeDI-DGVM does not represent mechanistic competition for light or space. Similarly, the DIVE model uses the original JeDI approach to pre-select a distribution of PFT traits, but DIVE then updates the PFTs area using the hard-exclusion rule [76]. The PFTs in this latter model are prescribed using diversity outputs from the original JeDI trait model [107,108].

Two Dimensional Cohort models
In some ways, Individual and Area Averaged based DGVMs have similar limitations. One relies on the representability of relatively small patches, the other relies on the representability of an average plant. Real landscapes at grid-box scale are significantly more varied, and therefore beyond the scope of either of these approaches [47, 109,110].
In a bid to overcome these shortcomings, the recent development of global vegetation models has focused on producing cohort based models [110]. Cohort based models seek to partition plant populations according to demography. This is done by grouping into size-plant height, basal diameter, or mass-and/or age classes. These models occupy a middle ground in terms of complexity, and seek to capture the statistical distributions of plants while eliminating the stochastic variability associated with explicitly simulating a relatively small number of patches with an Individual Based model. The background theory derives from the application of partial differential equations (pdes) within ecology [111].
The first Cohort models focused on the community dynamics across both size and age (henceforth: "cohort 2D models"). The ED model sought to replicate the statistics of earlier stochastic gap models [112]. ED successfully captures both vertical and horizontal heterogeneity of the community by partitioning populations into height and patch-age classes [113]. Much like other vegetation models, mortality is driven by PFT type, self-thinning and resource availability. Light competition is simulated among "infinitely flat crown canopy layers" [110]. Patch-age structure is modelled by the equation: where a is the age of the given patch, p is the probability distribution, and λ is the rate of patch disturbance. Eq (2) is an application of Von Foerster's cell-age kinetics [114,115], applied to represent the statistical distribution of patch ages [116]. Modelling of patch age not only allows for the disturbance history of a region to be represented, but also allows for more realism in the modelling of recruitment and disturbance, via its dependency on "patch-interconnectedness" and forest fragmentation [117][118][119]. Additionally in ED and other Cohort models, the size-structure is updated through the Fokker-Planck or Kolmogorov forward equation, first formulated in the context of physiological age by Van Sickle in 1977 [120]. The number density of plants, n, changes through plant size (or age), z. The rate of change in the size dimension is described by the plant growth rate, g. The population is also removed through a "sink", or mortality rate, γ: The variable z, could be any physical parameter related to the plant size, such as the basal diameter or biomass, or height in the case of ED. Given information relating to how growth and mortality vary with this size variable, Eqs (2) and (3) will capture the statistical evolution of both the vertical and horizontal structure of a forest-without the need for individually representing every tree. ED and ED2 [121] have been successfully applied to model a variety of ecosystems: arid [122], boreal [123], temperate [124] and tropical vegetation [125,126].
However, the modelling of both the size and age structure generates practical difficulties [110]. Firstly, by having a varying number of discrete patch ages, the number of patches to be simulated typically increases monotonically with time. Effectively, as time tends to infinity, a ! 1 and the λ decay would mean that the number of patches, along with their corresponding size profiles, will tend to infinity (Fig 2). To get around this issue the ED series of models pragmatically merge patches of similar demographic properties [119].
Other 2D cohort approaches use the Perfect Plasticity Approximation (PPA) to describe crown competition, e.g. the LM3-PPA model [127] and the CLM(ED) model [128]. LM3-PPA simulates both basal diameter (dbh) and patch age cohorts, but also aggregates across PFTs into canopy layers to deal with competition for light. PPA assumes that adjacent plant crowns perfectly morph to fill the space. This is known as phototropism or plasticity [129,130]. PPA is in essence a more tractable, semi-analytical approximation to crown competition and morphology [131], compared to earlier Individual Based approaches, such as the TASS model [42]. In each canopy layer of height z � , the integrated product of number density and crown area, a, across the ordered height h, of the plants must be less than or equal to the available fraction, F: For the top layer, F = 1. The available fraction may decrease through each layer [127], and the fraction of photosynthetically active is typically assumed to attenuate down though the canopy according to Beer's law. PPA can be viewed as at least partly related to the hard exclusion

PLOS CLIMATE
approximation. In essence both assumptions use either the grid-box area or the canopy area as a hard limit for vegetation. In LM3-PPA, the mortality rate is a combination of size-independent mortality along with an additional height-dependent mortality due to shading from taller plants. A later version of LM3-PPA has also been successful in capturing the demographic distribution of both size and age within a tropical forest, using allometric power-law relationships for physical parameters [132]. Cohort 2D models have historically struggled with integration into ESMs or LSMs. The ED model was originally planned to be integrated into the JULES LSM in 2010 [119], but this stalled partly because of the difficulties arising from coupling; the aforementioned unbounded patch age dimension and the fact that physiology and demography were difficult to separate within ED. Nevertheless, the recent GFDL-ESM4.1 does now successfully use the LM4.0-PPA dynamic vegetation model [133], with results presented in the CMIP6 model ensemble [3].

One Dimensional Cohort models
DGVMs based on 2D cohort models represent a significant increase in complexity and computational demands compared to existing ESM land-surface schemes. Recent developments have therefore tended to focus more on 1D cohort models, which eliminate the explicit representation of either size classes or age classes. For instance, the POP model [134] eliminates size cohorts and instead focuses on an age based approach. POP calculates carbon assimilation at the grid-box scale, and then uses allometric relationships to disaggregate these grid-box mean fluxes into age cohorts. Mortality is assumed to arise from two components-stand crowding and resource limitation, which enables the model to simulate self-thinning within forest communities. POP has been successfully included in the CABLE land surface model [135].
Other models simulate multiple PFT patches, such as in ORCHIDEE-MICT [136]. The ORCHIDEE-MICT model partitions PFTs into age based sub-grid "Cohort Functional Types". Plants move up into older classifications once the basal diameter surpasses a threshold. When this occurs patches are merged, with the new functional traits being the area-weighted average. JSBACH4 [137] and LPJ-wsl2.0 models [138], use "vector tracking of fractional transitions". In this approach, there is a fixed number of age classes containing several sub-divisions. The sum of fractions across all classes must equal one, to normalise with respect to patch population (see Eq (2)). After a disturbance, such as deforestation or fire, the affected patch fraction is then substituted back into the lowest age class. The physiology or state variables within the age class is once again the area weighted average. Much like cohort 2D models, the explicit representation of patch age requires a balance between realism and computational practicality.
An explicit size-only approach is adopted in the RED model [139]. RED relies on Eq (3) to update the size-structure within mass (m) classes. The hard-exclusion approximation from Eq (1) is used to limit the number of surviving new recruits within "gaps" (i.e. the area covered by bare soil or sub-dominant PFTs). By default, RED assumes that plant growth-rate varies with size in accordance with metabolic scaling theory; g / z ϕ , with z ! m and ϕ ! 3/4 [140]. A further simplification is the assumption (by default) that tree mortality rate (excluding disturbances) is size-independent. RED has been designed to be consistent with Demographic Equilibrium Theory (DET) for the size-structure for forests [141]. DET predicts that at equilibrium, Eq (3) with @n/@t ! 0, forest demography is dictated by metabolic scaling of growth and size-independent mortality. When Eq (3) is integrated with these assumptions, we find a Weibull distribution describing how the number density of plants varies with size:

PLOS CLIMATE
DET has been validated for North America using the USDA forestry dataset [141] (Fig 3a) and for South America using the RAINFOR sites [142] (Fig 3b). Interestingly, despite the significant differences in climate between the tropical and temperate forests, the mean shape of the Weibull distribution is similar, with μ 1 = 0.220 and μ 1 = 0.255, for North and South America respectively. It therefore appears that the assumptions of DET, and by extension of RED, are broadly-consistent with observed tree size-distributions.

Discussion
Within the latest CMIP6 model ensemble only a few ESMs attempt to simulate dynamic vegetation [3]-curiously more models include interactive disturbances (such as fire). Of the DGVMs that are employed in CMIP6, most use simple area-average representations of vegetation dynamics (i.e. DYNVEG in MPI/JSBACH and TRIFFID UKESM/JULES). There is now a general acceptance of the need to include Cohort models to "bridge" the divide between the small scale and large scales [145]. Some pioneering DGVMs have made progress in this regard, such as the GFDL/LM4 (which is based-on LM3-PPA), but much more still needs to be done. In recent years there has been an overall trend towards the development of intermediate complexity DGVMs based-on Cohort models (Fig 4). These attempt to address the major criticisms of both Individual Based and Area Averaged models by capturing the demographic statistics of tree populations within forests. By simulating a probability distributions for ageclasses, size-classes, or sometimes both, cohort models seek to overcome the stochastic variability and upscaling issues of Individual Based models [112]. 2D Cohort models aim to replicate the statistics of Individual Based models in terms of both patch-age and tree-size. 1D Cohort models simplify by projecting on to just one of these dimensions, in order to reduce the dimensionality of the model phase space, and also the number of uncertain parameters [134,139] (5), for μ 0 (assuming z ! d and ϕ ! 1/3) against observations of number density versus diameter. Panel (a) shows the DET fit to North American trees [141] observations of trunk diameter (diameter at breast height) from the USDA Forest Service FIA program [143]. Panel (b) presents DET fits to RAINFOR measurements of basal diameter (b) [144]. μ 1 is the normalised metric of the turnover parameter (μ 0 at d 0 = 1 cm) describing the shape of the Weibull distribution. https://doi.org/10.1371/journal.pclm.0000068.g003

PLOS CLIMATE
DGVMs: The balance of demographic processes and computational tractability Most cohort models seek to explicitly represent patch age since disturbance. The number of patches of different ages therefore increases through time. To make this manageable, patches must be repeatedly merged, which arguably introduces an arbitrariness into such models. The alternative is to instead explicitly model the probability distribution function for tree size, and to implicitly assume that age-related processes (such as mortality and growth-rates) can use tree-size as a proxy for tree-age. We believe that such size-based demographic modelling is preferable for global ESM applications, in part because plant size is key to the realistic representation of disturbances such as drought and fire [92].
For models to accurately represent vegetation dynamics under climate and land-use change, it is clear that they must move on from simplistic average area models and include plant demography. A number of theoretical concepts may be helpful for this endeavour. For example, metabolic scaling of growth (coupled with constant mortality) offers a huge simplification, and yet appears to be consistent with the distribution of tree sizes at large-scales [141,142] (Fig 3). Likewise, the distribution of patches may be understood in terms of percolation theory [146], or via the associated concept of self-organised criticality [145].

Conclusion
In this review we have outlined four different approaches used in Dynamic Global Vegetation Models (DGVMs): Individual Based, Area Averaged, 2D Cohort and 1D Cohort models. Individual Based models explicitly model individual trees and/or patches in an attempt to capture both vertical and horizontal forest dynamics in mechanistic detail. This is in stark contrast to

PLOS CLIMATE
DGVMs: The balance of demographic processes and computational tractability Area Averaged DGVMs which simply try to represent the mean properties (e.g. biomass and leaf area index) of the Plant Functional Types within each grid-box. We have argued that neither of these approaches is appropriate to model the dynamic land-surface in response to climate and land-use changes. Area-average approaches are incapable of representing tree-size distributions, and these are vital to reliably model vegetation responses to disturbance and anthropogenic land-use change. Individual-based models can explicitly model the key processes but are essentially computationally intractable for use in ESMs. An intermediate complexity approach is offered by Cohort models that exists in two forms. 2D Cohort models seek to reproduce the statistics of the Individual Based models in both the vertical (plant size) and the horizontal (patch-dynamics). 1D Cohort models focus on just one of these dimensions, in order to simplify the implementation of these models within ESMs. Although most existing 1D Cohort models focus on patch-age, we suggest that 1D Cohort models which instead explicitly model the distribution of plant sizes (rather than patch ages), may be a better fit to the demands of global climate and Earth System modelling.