Abstract
Understanding the global distribution of biomes is essential for biodiversity conservation, climate modeling, and land-use planning. Traditional approaches often summarize climate data into indices, and recent models sometimes include extreme events such as severe droughts or rare cold spells. This study evaluates how the choice of machine learning algorithm, climate data summarization, and extreme climate indices affect the accuracy and robustness of global biome modeling. Four algorithms were tested: random forest (RF), support vector machine (SVM), naive Bayes (NV), and LeNet convolutional neural network (CNN). RF and CNN achieved the highest accuracy, with CNN preferred due to RF’s stronger overfitting. Summarizing climate data into indices reduced accuracy by 1–2%, while adding extreme indices increased accuracy by <2% (except for NV, which performed poorly overall). However, extreme climate data caused large mismatches between observed and predicted climate values, reducing robustness as measured by prediction consistency. These results indicate that including extreme climate data in global biome prediction models offers limited accuracy gains but can significantly weaken robustness, so caution is advised.
Citation: Sato H (2026) Predicting dominant terrestrial biomes at a global scale using machine learning algorithms, climate variable indices, and extreme event indices. PLoS One 21(2): e0324107. https://doi.org/10.1371/journal.pone.0324107
Editor: Chong Xu, National Institute of Natural Hazards, CHINA
Received: April 20, 2025; Accepted: December 10, 2025; Published: February 26, 2026
Copyright: © 2026 Hisashi Sato. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All data required to reproduce the analyses described herein are openly available via Zenodo (https://doi.org/10.5281/zenodo.8113935).
Funding: HS received two grants from the Ministry of Education, Culture, Sports, Science and Technology (MEXT), Japan: Arctic Challenge for Sustainability II (ArCS II) [Program Grant Number JPMXD1420318865] and Arctic Challenge for Sustainability III (ArCS III) [Program Grant Number JPMXD1720251001]. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Biomes—major regional ecological communities defined by distinctive life forms and dominant plant species [1]—are primarily determined by climate [2,3] and, in turn, influence climate through biophysical and biochemical feedbacks [4]. Understanding biome distributions is therefore essential not only for estimating land potential and guiding conservation policy, but also for informing climate projection models (reviewed in Hengl, Walsh [5]).
Many methods have been proposed to model biome distributions (reviewed by Sato and Ise [6]). Traditional approaches often use a limited set of derived climate indices (e.g., annual precipitation, coldest-month temperature) to summarize monthly or seasonal data. However, recent advances in machine learning (ML) allow models to incorporate a larger number of raw variables without summarization, potentially improving accuracy and flexibility. For example, Hengl, Walsh [5] used 160 environmental variables, including soil and topography, and monthly climate variables, to model biome distribution using ML algorithms. However, excessive input dimensionality may reduce generalizability and increase computational cost, underscoring the need to balance model complexity and performance. Among the diverse variables now accessible through ML methods, extreme climate events—such as severe droughts and rare low-temperature incidents—have emerged as particularly important predictors of boundaries of biomes (reviewed in Beigaite, Tang [7]). Incorporating extreme climate indices alongside regular variables has been found to improve the performance of decision tree-based models [7].
This study evaluates the effectiveness of machine learning models in predicting the global distribution of potential natural vegetation (PNV), i.e., vegetation in the absence of human influence, under current and future climate conditions. It addresses three questions: (1) Which machine learning algorithm performs best in reproducing the current biome distribution? (2) Does using raw monthly climate variables improve model performance compared to using derived climate indices (BIOCLIM)? (3) Does incorporating extreme climate indices (CLIMDEX) improve model robustness under projected future conditions (2060–2080, RCP8.5)? The analysis focuses on prediction accuracy, model stability, and spatial consistency under extrapolative scenarios.
To contextualize the modeling approach adopted in this study, two reference points are particularly relevant. First, process-based dynamic global vegetation models (DGVMs) simulate vegetation dynamics through mechanistic representations of ecosystem processes [8]. These include plant physiological functions, carbon cycling, and interspecies competition. While this allows for detailed representation of ecological mechanisms, DGVMs rely on numerous parameterizations and assumptions, which can introduce uncertainty—especially when projecting into novel future climates. In contrast, this study, like the one by [9], adopts a purely data-driven approach using ensemble machine learning algorithms that infer climate–vegetation relationships directly from observed data. This enables the models to capture empirical patterns with fewer structural assumptions, offering a computationally efficient and flexible alternative for large-scale biome prediction, particularly under extrapolative scenarios. Second, unlike land cover maps derived from satellite imagery, which primarily reflect current human-modified landscapes, this study aims to model potential natural vegetation (PNV)—that is, the vegetation that would exist without anthropogenic disturbance—at a 0.5° resolution. In this context, data-driven modeling provides an effective means for identifying climate-sensitive biome boundaries and anticipating potential shifts under future scenarios, independent of current and future land-use patterns.
Methods
Biome data
This study used the potential natural vegetation (PNV) dataset compiled by Beigaite, Tang (7) for their decision tree-based models of the global PNV distribution. Here, the same dataset is applied to a wider range of machine learning algorithms. In their study, the authors derived the PNV data from the Moderate Resolution Imaging Spectroradiometer (MODIS) MCD12C1 land cover product in 2001 [10], specifically using the International Geosphere Biosphere Programme (IGBP) land cover classification. This dataset, based on the supervised learning classification of MODIS Terra and Aqua reflectance data [11], contains percent cover for 17 IGBP classes [12] in each grid cell at a resolution of 0.05°. They resampled the data to 50 km × 50 km grids and identified the dominant natural vegetation type in each grid cell. Grid cells partly affected by human activity were retained, under the assumption that relative proportions of natural vegetation remain stable despite such modifications. Of the original 17 categories, only 13 representing natural vegetation were used (Fig 1, S1 Table). Cells with 100% human activity, water cover, or both were excluded, as was Antarctica, leaving 52,297 grid cells for analysis. Although Beigaite, Tang (7) did not specify the proportion removed, it is estimated that roughly 10–11% of the land grid cells (excluding Antarctica) were excluded.
Although many biome classifications exist (e.g., [13,14]), I used this PNV dataset for three practical reasons aligned with global climate-vegetation modeling at 0.5°: (i) the workflow (deriving the dominant natural class from MODIS/IGBP and resampling to ~50-km grids) matches my analysis grid; (ii) its climate inputs and projections are processed consistently with the BIOCLIM (AveI) and CLIMDEX (CEI) indices used here, with harmonized definitions and resolution for present data and CMIP5-based futures; and (iii) employing the same dataset as related machine-learning studies facilitates comparability without additional preprocessing.
Because the aim of this study is to model potential natural vegetation (PNV), I retained grid cells that are only partially affected by human activity, on the premise that the dominant natural-vegetation signal remains informative at 0.5° resolution. By contrast, cells with 100% human cover and/or water were excluded, leaving 52,297 land grid cells (approximately 10–11% of land cells excluded, Antarctica removed). This choice minimizes spatial coverage bias while preserving information on the prevailing natural type. I acknowledge, however, that uncertainty in PNV estimation may increase in regions with strong human influence, and I interpret results in those areas with caution.
Machine learning algorithms
Four machine learning algorithms were employed: RF [15], SVM [16], NV [17], and CNN [6]. RF, SVM, and NV were implemented in R v3.3.3 [18] using the randomForest function (randomForest package), the ksvm function (kernlab package), and the naiveBayes function (e1071 package), respectively, with commands such as randomForest(VegNo ~ ., DatasetTrain), where DatasetTrain is the training dataset and VegNo is the biome category column. All models were run with default settings to (1) ensure fair comparability by avoiding bias from parameter tuning, (2) keep the implementation straightforward and reproducible, and (3) align with the study’s objective of evaluating algorithms and data combinations rather than optimizing a single best-performing model. The RF model, for instance, already achieved 100% accuracy on the training sets with default parameters, indicating strong overfitting and suggesting that further tuning would not improve generalization in this case. While strategies such as cross-validation and variable selection can reduce overfitting [19], the RF results indicate that high-capacity models may still overfit even under best-practice settings.
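For readers more familiar with Python, the workflow above can be approximated with scikit-learn analogues of the three tabular-data algorithms. This is an illustrative sketch only (random stand-in data, not the study's climate tables, and not the authors' R code); it also reproduces the tendency of RF to reach near-perfect training accuracy under default settings.

```python
# Hypothetical scikit-learn analogue of the R workflow described above:
# each classifier is fit with default settings on the same tabular data
# (columns = climate variables, target = biome class).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 24))      # stand-in for the 24 Ave variables
y = rng.integers(0, 13, size=500)   # stand-in for the 13 biome classes

models = {
    "RF": RandomForestClassifier(),  # default settings, as in the study
    "SVM": SVC(),
    "NV": GaussianNB(),
}
train_acc = {name: m.fit(X, y).score(X, y) for name, m in models.items()}
# RF typically reaches ~100% training accuracy with defaults,
# illustrating the overfitting tendency noted in the text.
```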
In contrast to Beigaite, Tang (7), a decision tree algorithm was not included in this study. Although decision trees rapidly provide interpretable boundary conditions for the distribution of a given output variable, they are generally inferior to the algorithms explored in this study in terms of reconstruction accuracy [20]. The RF algorithm is an ensemble of decision tree algorithms, which should provide higher model accuracy [15].
While the other models were trained directly on numerical climate data, applying the CNN algorithm requires converting the climate data into images. CNNs are typically applied to analyze visual imagery and have been successfully adapted for species distribution modeling at regional [21,22] and global scales [6]. The present CNN was trained following the method of Sato and Ise (6), which represents climatic conditions using graphical images and employs them as training, testing, and prediction data for CNN models. I selected this method because it allows CNNs, originally developed for image analysis, to automatically extract nonlinear seasonal patterns from multiple climate variables while preserving their temporal structure, enabling convolutional filters to identify spatially coherent features. Each graphical image is 256 × 256 pixels and is divided into rectangular cells representing each data point, with tiles in each cell expressing the values in grayscale. Prior to this visualization, climate variables were standardized to the range 0.01–1.00 using a log transformation. The R code for generating the images is available in the online open data repository.
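The image-encoding step can be sketched as follows. This is a minimal illustration of the general scheme described above (log-standardize to 0.01–1.00, then paint values as grayscale tiles in a 256 × 256 image); the specific transform and cell layout are assumptions, as the authors' actual R code resides in their repository.

```python
# Illustrative sketch of encoding climate variables as a grayscale image.
import numpy as np

def log_standardize(x, lo=0.01, hi=1.00):
    """Map positive values to [lo, hi] via a log transform (assumed form)."""
    z = np.log1p(np.asarray(x, dtype=float))
    z = (z - z.min()) / (z.max() - z.min())   # rescale to [0, 1]
    return lo + z * (hi - lo)

# 24 monthly values (e.g., temperature + precipitation for one grid cell)
monthly = np.random.rand(24) * 300.0
scaled = log_standardize(monthly)

# Paint each value as a grayscale tile in a 256x256 image divided into cells
img = np.zeros((256, 256))
cell_w = 256 // len(scaled)
for i, v in enumerate(scaled):
    img[:, i * cell_w:(i + 1) * cell_w] = v   # grayscale intensity per cell
```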
In this paper, the term “CNN” is used, for convenience, to refer to the method of Sato and Ise (6), a machine-learning approach that includes transforming climate data into images as an integral part of its framework. In this method, the arbitrariness inherent in converting climate data into images can lead to variation in learning accuracies. However, Sato and Ise (6) demonstrated that, despite employing various strategies for image transformation and evaluating the resulting differences in learning performance, the effects on biome prediction were minimal: the range of training accuracies was 58.3–59.7% among four different imaging methods (such as pie charts and various color palettes), and 56.2–57.8% among four schemes used to transform climatic variables prior to graphical conversion (such as linear, log, and sigmoid transformations). Although these results are based on different climate and biome datasets than those used in the present study, they provide a useful reference.
The four algorithms selected in this study represent contrasting assumptions and capabilities in handling nonlinearity and feature interactions. RF is an ensemble-based decision tree method known for its robustness, but it can easily overfit training data without proper regularization or depth constraints, especially when using default settings [23]. CNNs have a strong capacity to extract complex nonlinear patterns from spatially structured inputs and have demonstrated superior performance in recognizing hierarchical relationships in multidimensional data [24]. SVMs and NV represent more traditional machine learning approaches: SVMs are sensitive to data scaling and kernel choice, while NV classifiers rely on conditional independence assumptions that may not hold for ecological data. Including these diverse algorithms allowed a broad assessment of how model assumptions influence biome classification performance. Although CNNs were trained using visually encoded climate data rather than tabular variables, the underlying information was identical across all algorithms, ensuring comparability despite differences in input format.
Climate data
This study used four climate datasets: averaged monthly air temperature and precipitation (Ave, 24 variables), averaged monthly climate indices (AveI, 16 variables), climate extreme indices representing extreme conditions on a daily scale such as the maximum length of a dry spell (CEI, 27 variables), and a subset of CEI (CEIpart, 21 variables). The variables included in AveI and CEI are listed in Tables 1 and 2, respectively. S1–S3 Figs show the present (1970–2000) and future (2061–2080) distributions of Ave, AveI, and CEI, respectively. Among all of the climatic variables used in this study, only six in the CEI dataset (Tn10p, Tx10p, Tn90p, Tx90p, WSDI, and CSDI) had completely separate distributions between the present and future. Another indexed extreme climate dataset, CEIpart, was constructed by excluding these variables from the CEI dataset.
The Ave data were obtained from the WorldClim version 2.1 product (released January 2020; Fick and Hijmans (25)), which represents average monthly air temperature and precipitation data for 1970–2000. The original WorldClim 2.1 product [25] was downloaded at a spatial resolution of 10 min, and resampled to 50 km × 50 km grids using the nearest-neighbor method. AveI was released by Beigaite, Tang (7), summarizing WorldClim 2.1 properties in terms of annual means (e.g., BIO1 and BIO12), seasonality (e.g., BIO4, BIO7, and BIO15), and limiting environmental factors to a monthly scale (e.g., BIO5, BIO6, and BIO14).
The CEI product was also released by Beigaite, Tang (7) using the CLIMDEX [26,27]. CLIMDEX comprises four datasets that were derived from different reanalysis datasets. Among these, Beigaite, Tang (7) used a dataset calculated from the ERA-Interim reanalysis dataset, which accurately reproduces the observed climate extremes [28]. The CEI data derived from the ERA-Interim reanalysis dataset covers 32 years (1979–2010). Multi-year CEI values were averaged for each grid; multi-year averages of extreme indices are commonly used to represent average extreme conditions in the past and future [26,27,29]. The original resolution of the CEI data was 1.5° × 1.5°; they were transformed onto 10 min × 10 min grids through conservative interpolation and then resampled to 50 km × 50 km grids using nearest-neighbor interpolation.
Future climate conditions (2061–2080) were projected using BIOCLIM [25] and CLIMDEX [26,27] indices derived from the Intergovernmental Panel on Climate Change (IPCC) Coupled Model Intercomparison Project Phase 5 (CMIP5) ensemble means of 11 models, averaged over that period. RCP8.5 was selected as it represents the most severe climate change scenario, providing a stringent test of model robustness under conditions far beyond the present-day range. In the IPCC’s Fifth Assessment Report (AR5, 2014), RCP8.5 assumes continued growth in global GHG emissions throughout the 21st century, reaching ~758 ppm CO2 by 2080 [30]. All variables were standardized to ensure compatibility between present and future datasets. Present-day BIOCLIM indices were obtained directly from WorldClim v2 [25]. Future BIOCLIM indices were generated from CMIP5 outputs that had been bias-corrected and adjusted to match the definitions and resolution of the present-day BIOCLIM data. CLIMDEX indices were calculated using the same methods for historical and projected data [26,27]. These data sources and processing steps follow the same approach used in [7]. Equivalent CMIP6 products with the same resolution and correction methods are not yet widely available.
Intermediate projection periods (e.g., 2041–2060) were not used because the aim was to test model robustness under the most extreme climate conditions. Since RCP8.5 represents the highest emission trajectory, its far-future projection (2061–2080) was considered sufficient for evaluating extrapolative performance.
Data analysis
The learning performance of six climate dataset combinations—Ave, Ave + CEI, Ave + CEIpart, AveI, AveI + CEI, and AveI + CEIpart—was compared to disentangle three effects: (1) summarizing the climate data into indices, (2) adding extreme climate indices, and (3) adding overlapping extreme climate indices whose distributions are shared between current and future conditions. Four machine learning algorithms were applied to each dataset combination, resulting in 24 models in total. By structuring the analysis this way, I avoided the complexity of interpreting results from an exhaustive examination of which specific feature combinations the machine learning algorithms most strongly depended on.
For each model, 25% of all 52,297 grids (13,074 grids) were randomly selected for training, and the remaining 75% (39,223 grids) for testing. This proportion is the reverse of the typical 70–80% allocation to training [31], chosen to emphasize model robustness over performance and to ensure that rare vegetation types (<1%) were adequately represented in the test set. Training accuracy was defined as the proportion of correct predictions on the training data, and test accuracy as the proportion on the test data; their difference was used as the overfitting score [32]. To reduce sampling bias and better assess generalization, each model was evaluated in ten replicate experiments using different random seeds. The results were averaged across the ten independent 25:75 splits, serving a similar role to k-fold cross-validation.
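The evaluation protocol can be sketched as follows; function names (`split_25_75`, `overfitting_score`) are illustrative, but the split sizes and the overfitting definition follow the text.

```python
# Minimal sketch of the protocol: a 25:75 train/test split of the 52,297
# grids, with the overfitting score defined as training accuracy minus
# test accuracy. In the study this is repeated with ten random seeds.
import random

def split_25_75(indices, seed):
    rng = random.Random(seed)
    shuffled = indices[:]
    rng.shuffle(shuffled)
    cut = len(shuffled) // 4          # 25% for training
    return shuffled[:cut], shuffled[cut:]

n_grids = 52_297
train_idx, test_idx = split_25_75(list(range(n_grids)), seed=1)
# -> 13,074 training grids and 39,223 test grids, as in the text

def overfitting_score(train_acc, test_acc):
    return train_acc - test_acc       # positive value indicates overfitting

score = overfitting_score(1.00, 0.806)   # RF-like values from Table 3
```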
To evaluate model robustness under future climate scenarios, I compared biome maps produced by different machine learning algorithms trained on the same dataset. Here, robustness refers to the stability of predictions across different algorithms. It was measured using the pairwise coincidence rate—the percentage of grid cells where two models assigned the same biome class. This metric captures agreement among models regardless of their match with the observed map, providing an indicator of consistency rather than accuracy (see Table 4).
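The coincidence-rate metric is simple to state in code; the sketch below uses hypothetical label arrays standing in for two models' biome maps.

```python
# Pairwise coincidence rate: percentage of grid cells where two models
# assign the same biome class (agreement, not accuracy against observations).
import numpy as np

def coincidence_rate(pred_a, pred_b):
    pred_a, pred_b = np.asarray(pred_a), np.asarray(pred_b)
    return 100.0 * np.mean(pred_a == pred_b)

map_rf  = np.array([1, 1, 2, 3, 3, 4])   # hypothetical biome classes
map_cnn = np.array([1, 2, 2, 3, 3, 4])
rate = coincidence_rate(map_rf, map_cnn)  # 5 of 6 cells agree, ~83.33%
```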
Results
Overall model accuracy
Across all training datasets, three of the four machine learning models—RF, CNN, and SVM—showed high test accuracy in reconstructing global PNV distributions (Table 3). Test accuracy ranged from 80.1–81.4% for RF, 77.1–82.0% for CNN, 74.6–78.0% for SVM, and 44.2–50.1% for NV. Based on these values, the ranking in descending order of accuracy was RF, CNN, SVM, and NV. The NV model performed markedly worse, with large misclassification areas in boreal and tropical forests (Fig 1; S4–S7 Figs).
All models exceeded the baseline accuracy of 17.8%, obtained by predicting all grid cells as the most frequent PNV class, grassland (S1 Table). As a further reference, a naive model assigning each grid cell the most frequent PNV observed at its latitude achieved 49% accuracy—lower than all machine learning models except NV.
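As a sanity check, the majority-class baseline can be computed as follows. The class counts here are illustrative, chosen so that the most frequent class makes up 17.8% of cells as in the observed data; they are not the study's actual frequencies.

```python
# Majority-class baseline: predict the most frequent biome everywhere.
from collections import Counter

observed = (["grassland"] * 178 + ["savanna"] * 170 + ["forest"] * 160
            + ["tundra"] * 150 + ["desert"] * 142 + ["shrubland"] * 100
            + ["wetland"] * 100)                 # 1,000 illustrative cells

majority = Counter(observed).most_common(1)[0][0]
baseline_acc = observed.count(majority) / len(observed)  # -> 0.178
```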
The NV model’s low test accuracy was mainly due to overestimating areas dominated by boreal forest, tropical rainforest, and deciduous broadleaf forest (Fig 4). In contrast, the other models primarily showed discrepancies along PNV boundaries (Figs 2, 3, and 5), consistent with the observed fragmentation of biome distributions along PNV boundaries (Fig 1). The biome distributions reconstructed by the models, however, exhibited more continuous structures (S4–S7 Figs). The NV model’s poor performance likely stems from its assumption that predictor variables are independent, a condition violated by climate variables. Due to its poor performance, the NV model was excluded from further analysis and discussion (Fig 4).
Six combinations of climate data were used for training and simulation: (a) averaged monthly air temperature and precipitation (Ave), (b) averaged monthly climate indices (AveI), (c) Ave + climate extreme indices (CEI), (d) AveI + CEI, (e) Ave + a subset of CEI (CEIpart), and (f) AveI + CEIpart.
Error analysis
All models showed similar weaknesses in classifying certain small-area biomes, “Wetlands” and “Closed Shrublands” (0.5% and 0.2% of grid cells, respectively). Using the most basic Ave dataset, test accuracy for Wetlands was RF: 23.4%, SVM: 0.0%, and CNN: 3.6%. For Closed Shrublands, the corresponding values were RF: 7.3%, SVM: 29.0%, and CNN: 5.7% (S2–S4 Tables). Notably, the SVM model did not classify any grid cells as “Wetland” across all 39,223 grids × 10 tests. These errors were widely dispersed and occurred in all models. The poor performance for these small-area biomes likely reflects their limited representation in the training data, which in turn reduces classification accuracy in both present-day reconstructions and future projections and increases inter-model variability in predicted range shifts.
Overall accuracy is a standard performance metric, but it can be misleading when class distributions are highly imbalanced, as in this study, where some biomes occupy less than 1% of grid cells. To provide a fairer assessment that accounts for chance agreement, Cohen’s Kappa [33] was calculated from the confusion matrices of models trained with the Ave dataset. Kappa values range from −1 to 1, with 0.61–0.80 indicating “substantial” agreement and 0.81–1.00 indicating “almost perfect” agreement. The results were RF: 0.784, SVM: 0.728, and CNN: 0.759, all within the “substantial” range. The ranking by Kappa was consistent with that by test accuracy: RF > CNN > SVM.
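Cohen's Kappa is computed directly from the confusion matrix; the sketch below uses an illustrative 2 × 2 matrix rather than the study's 13-class matrices.

```python
# Cohen's Kappa from a confusion matrix (rows = observed, cols = predicted):
# kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
# p_e is the agreement expected by chance from the marginals.
import numpy as np

def cohens_kappa(cm):
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    po = np.trace(cm) / n                       # observed agreement
    pe = (cm.sum(0) * cm.sum(1)).sum() / n**2   # chance agreement
    return (po - pe) / (1.0 - pe)

cm = [[45, 5],
      [10, 40]]
kappa = cohens_kappa(cm)   # ~0.70, "substantial" on the Landis-Koch scale
```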
While Cohen’s Kappa offers a chance-corrected measure of overall agreement, it does not indicate the nature of classification discrepancies. To distinguish whether these discrepancies stem from differences in class proportions or from their spatial arrangement, I decomposed the errors into quantity disagreement and allocation disagreement [34]. Quantity disagreement quantifies the difference in the number of grid cells assigned to each biome class, whereas allocation disagreement measures mismatches in their spatial placement. This breakdown can help guide whether model improvements should focus on adjusting overall class proportions or on refining spatial accuracy.
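The decomposition of Pontius and Millones [34] can be sketched as follows; the confusion matrix is illustrative (rows = observed, columns = predicted, entries = grid-cell counts), and the two components sum to the total disagreement (one minus overall accuracy).

```python
# Quantity vs. allocation disagreement from a confusion matrix.
import numpy as np

def disagreement(cm):
    p = np.asarray(cm, dtype=float)
    p /= p.sum()                                  # convert to proportions
    obs, pred = p.sum(axis=1), p.sum(axis=0)      # class marginals
    quantity = 0.5 * np.abs(obs - pred).sum()     # mismatch in class totals
    diag = np.diag(p)
    # per-class allocation error: min(omission, commission), summed
    allocation = np.minimum(obs - diag, pred - diag).sum()
    return quantity, allocation

cm = [[45, 5],
      [10, 40]]
q, a = disagreement(cm)    # q + a equals 1 - overall accuracy
```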
Quantity disagreement values were RF: 0.019, SVM: 0.045, and CNN: 0.024. Allocation disagreement values were notably higher—RF: 0.169, SVM: 0.191, and CNN: 0.185—across all models. In every case, allocation disagreement (0.169–0.191) exceeded quantity disagreement (0.019–0.045), indicating that spatial allocation errors are the primary source of model uncertainty. Consistent with other metrics, the ranking was RF > CNN > SVM. This pattern likely reflects the fragmented nature of observation-based biome distributions in regions with similar climatic conditions (Fig 1), with most mismatches occurring along biome boundaries (Figs 2, 3, and 5).
Across models, summarizing data into indices reduced test accuracy. The change from Ave to AveI was −1.1% for RF, −1.8% for SVM, and −2.0% for CNN (Table 3). Adding extreme climate indices (CEI) generally improved accuracy. Compared with Ave, Ave + CEI increased accuracy by +0.2% (RF), +1.6% (SVM), and +1.0% (CNN). Compared with AveI, AveI + CEI increased accuracy by +1.1% (RF), +3.1% (SVM), and +2.8% (CNN). Replacing CEI with partial CEI (CEIpart) produced no consistent trend: for RF, the change was negligible (+0.1% vs. +0.1%); for SVM, accuracy decreased (−0.3% vs. −0.8%), whereas for CNN, it increased (+1.7% vs. +2.1%).
Training accuracy, overfitting, and future-climate consistency
For all model–dataset combinations, training accuracy exceeded test accuracy, resulting in a positive overfitting score (training accuracy − test accuracy; Table 3). The RF model consistently achieved 100% training accuracy, leading to the highest overfitting scores (18.6–20.0%). SVM and CNN had much lower overfitting scores, at 1.38–2.05% and 0.75–2.17%, respectively.

Under current climatic conditions, all models reconstructed highly coincident PNV distributions regardless of the training datasets (coincidence rate: 70.1–86.4%, Table 4). Differences in reconstructed PNV correspondence between model pairs were small: less than 2.0% between RF and SVM (84.5–86.4%), less than 3.3% between RF and CNN (70.8–74.1%), and less than 2.3% between SVM and CNN (70.1–72.4%).
When the trained models were applied to future climatic conditions (i.e., conditions beyond the training data), much larger differences emerged among the PNV distributions generated by different model and dataset combinations (coincidence rate: 1.6–82.8%; Table 4). Discrepancies were particularly pronounced in PNV maps from models trained on CEI datasets (S8–S11 Figs). SVM models trained on the CEI dataset predicted only evergreen broadleaf forest (S9 Fig, panels c and d), whereas CNN models produced maps dominated by grassland and savanna (S11 Fig, panels c and d). Substituting CEI data with partial CEI (CEIpart) reduced these extreme outputs (S9 Fig, panels e and f; S11 Fig, panels e and f). When models trained with the NV algorithm and CEI dataset were excluded, the remaining models produced much more consistent PNV distributions under future climate conditions (coincidence rate: 51.7–82.8%, Table 4).
Discussion
Comparative performance of machine learning algorithms
Across all input dataset combinations, RF and CNN algorithms provided more accurate global PNV models than SVM and NV. A concise rationale for selecting these four algorithms and their contrasting assumptions is provided in the Methods section (“Machine learning algorithms”).
Hengl, Walsh (5) found that the RF algorithm consistently outperformed other machine learning algorithms, including neural networks. In their study, a stack of 160 global maps representing biophysical conditions over the terrestrial surface—including atmospheric, climatic, relief, and lithologic variables—was used as explanatory variables to predict 20 biome classes in the BIOME 6000 dataset [35]. Although a direct comparison with the present study is not possible, their findings support RF as an effective machine learning algorithm for reconstructing biome maps. The present study is the first to compare the performance of a CNN algorithm adapted for biome modeling [6] with that of other machine learning algorithms; this CNN showed performance comparable to that of the RF algorithm.
Model reliability: Overfitting and generalization
The RF and CNN algorithms exhibited comparable test accuracy; however, CNN was preferred because RF produced much higher overfitting scores than the other machine learning algorithms examined in this study. Although overfitting in RF could potentially be reduced by limiting tree depth, doing so would require adjusting the model based on the test data, which would compromise the experimental design by failing to keep calibration and testing independent. Overfitting is an inevitable risk associated with empirical models [32]. Fourcade, Besnard (19) demonstrated an extreme example of pseudo-predictive variables (randomly chosen classical paintings) increasing the accuracy of species distribution modeling; these models sometimes had even higher evaluation scores than models trained with relevant environmental variables. To avoid overfitting or the use of pseudo-predictive variables, Fourcade, Besnard (19) suggested investing greater effort in cross-validation and ensuring the selection of the most important predictors. This approach was followed in the present analysis.
Model robustness under climate extremes
Adding extreme climate data slightly improved test accuracy; however, it substantially reduced model robustness, defined here as the consistency of model predictions under forecast climate conditions. This reduction in robustness was primarily driven by six CEI variables whose distributions deviated markedly from those in the training data, underscoring the importance of assessing the distributions of both training and prediction variables when building empirical models. Because the improvement in test accuracy obtained by including extreme climate data did not outweigh the loss in robustness, I recommend excluding extreme climate data when predicting global biome distributions at the geographical resolution used in this study (0.5°).
Here, “robustness” is operationalized as cross-algorithm consistency under future climates; while this proxy is useful for decision-making, it can also reflect reduced sensitivity to novel or extreme conditions. The observed loss of consistency was primarily driven by a small set of CEI variables whose distributions diverged strongly between training and projection conditions. In practice, whether to include CEI should follow the research question: (i) for a high-agreement baseline map under extrapolative climates, exclude CEI; (ii) to explore a broader possibility space or risk envelopes, include CEI as a supplementary input; and (iii) for species- or local-scale questions, extremes may contribute more and could be prioritized.
Role of climate indices in simplifying input data
The climate index data used in this study reduced the number of variables by one-third (from 24 to 16). However, this only slightly decreased model accuracy (−1.1%, −1.8%, and −2.0% for RF, SVM, and CNN, respectively), demonstrating that the typical climate indices employed here effectively captured the essential climate information relevant to global biome distribution. Although indexing has limited utility in building non-transparent machine learning models, it is essential for constructing interpretable models such as decision trees [7].
Contextualizing with previous studies and future directions
As shown above, the reliability of empirical models cannot be guaranteed beyond the range of their training data. In contrast, process-based models are expected to behave appropriately even when applied to environmental conditions that deviate slightly from those represented in observational datasets. This is one reason why many groups have proposed and developed dynamic global vegetation models (DGVMs) with greater fidelity to ecological processes, aiming to predict biome distributions mechanistically, considering climate, soil, and the fundamentals of plant physiology and ecology [36]. For example, Pugh, Rademacher (37) compared several DGVMs and identified discrepancies in their outputs and mechanisms, although their study did not specifically focus on biome distribution prediction. The expected increase in the frequency of extreme climate values in the future, which could significantly differ from the current distribution, may justify a shift from empirical models toward DGVMs. However, current DGVMs are not yet a reliable option for reconstructing plant population dynamic processes on a global scale; biome map predictions under commonly used climate change scenarios differ significantly among state-of-the-art DGVMs [37,38]. Therefore, empirical models continue to play an essential role in the approximate mapping of biomes under changing climatic conditions.
There is clear evidence that climate extremes influence plant demographic processes such as growth [39,40], regeneration [41], and mortality [42,43], all of which affect plant species distributions. However, this does not necessarily mean that extreme climate data should always be included to improve biome map reconstruction, as mean climatic values are often strongly correlated with extreme climatic variables. Nevertheless, at local and species levels, extreme climate conditions may serve as more important predictors; Zimmermann, Yoccoz [44] showed that augmenting mean climate predictors with variables representing climate extremes can improve the predictive power of species distribution models.
A crucial disadvantage of the climatic envelope approach is that extrapolating current correlations between climate and biome distributions into the future may lead to substantially biased predictions. Thus, strong model performance under present climate conditions does not guarantee similar performance under novel climatic conditions that may arise in the future. However, no models—except those trained with the NV algorithm and the CEI dataset—showed notable increases in PNV uncertainty under projected climatic scenarios. This finding suggests that robust models can be developed beyond the training data if machine learning algorithms and climatic variables are carefully selected. The climatic envelope approach also has other limitations; for example, it ignores time lags between climate change and vegetation response, changes in atmospheric CO2, and human land-use change (as discussed in Sato and Ise [6]). Nonetheless, the climatic envelope approach remains useful for various applications, including benchmarking DGVMs [36].
Conclusion
This study examined how different machine learning algorithms and representations of climate data influence the accuracy and robustness of global PNV models. Both RF and CNN algorithms produced highly accurate models; however, RF exhibited substantial overfitting, which undermined its robustness. Consequently, the CNN model was considered more reliable overall, and CNN models trained on complete climate datasets without extreme indices provided the most balanced performance in terms of accuracy and robustness.
Summarizing climate data into indices provided minimal benefit and slightly reduced model accuracy (by 1–2% in RF, CNN, and SVM), suggesting that such simplification may be unnecessary for non-transparent models like CNN. Although incorporating extreme climate variables marginally improved accuracy (by 1–2% across the same models), it also reduced robustness—particularly under future climate conditions that deviated from those represented in the training data. These results underscore the need to weigh carefully the trade-off between model accuracy and stability when including extreme climate variables.
Overall, the findings suggest that, within the current modeling framework, CNN-based models trained on complete climate datasets without extreme indices achieve a favorable balance between predictive accuracy and generalizability in global biome modeling. Although the CNN approach uses a graphically encoded input format, the underlying climate data remains consistent across models, enabling cautious yet meaningful comparisons between algorithms.
Supporting information
S1 Fig. Histograms of average monthly air temperature and precipitation (Ave, 24 variables).
Red bars: averages for 1970–2000; blue bars: averages for 2061–2080.
https://doi.org/10.1371/journal.pone.0324107.s001
(PDF)
S2 Fig. Histograms of average monthly climate indices (AveI, 16 variables).
Red bars: averages for 1970–2000; blue bars: averages for 2061–2080.
https://doi.org/10.1371/journal.pone.0324107.s002
(PDF)
S3 Fig. Histograms of climate extreme indices (CEI, 27 variables).
Red bars: averages for 1970–2000; blue bars: averages for 2061–2080.
https://doi.org/10.1371/journal.pone.0324107.s003
(PDF)
S4 Fig. Simulated potential natural vegetation (PNV) under current climatic conditions using the RF model.
Six sets of climate data were used for training and simulation: (a) Ave, (b) AveI, (c) Ave + CEI, (d) AveI + CEI, (e) Ave + CEIpart, and (f) AveI + CEIpart.
https://doi.org/10.1371/journal.pone.0324107.s004
(PDF)
S5 Fig. Simulated PNV under current climatic conditions using the SVM model.
The same experimental setup as in S4 Fig. was used.
https://doi.org/10.1371/journal.pone.0324107.s005
(PDF)
S6 Fig. Simulated PNV under current climatic conditions using the NV model.
The same experimental setup as in S4 Fig. was used.
https://doi.org/10.1371/journal.pone.0324107.s006
(PDF)
S7 Fig. Simulated PNV under current climatic conditions using the CNN model.
The same experimental setup as in S4 Fig. was used.
https://doi.org/10.1371/journal.pone.0324107.s007
(PDF)
S8 Fig. Simulated PNV under future climatic conditions (2061–2080) projected under the IPCC RCP8.5 scenario using the RF model.
Six climate datasets were used for training and simulation: (a) Ave, (b) AveI, (c) Ave + CEI, (d) AveI + CEI, (e) Ave + CEIpart, and (f) AveI + CEIpart.
https://doi.org/10.1371/journal.pone.0324107.s008
(PDF)
S9 Fig. Simulated PNV under future climatic conditions (2061–2080) using the SVM model.
The same experimental setup as in S8 Fig. was used.
https://doi.org/10.1371/journal.pone.0324107.s009
(PDF)
S10 Fig. Simulated PNV under future climatic conditions (2061–2080) using the NV model.
The same experimental setup as in S8 Fig. was used.
https://doi.org/10.1371/journal.pone.0324107.s010
(PDF)
S11 Fig. Simulated PNV under future climatic conditions (2061–2080) using the CNN model.
The same experimental setup as in S8 Fig. was used.
https://doi.org/10.1371/journal.pone.0324107.s011
(PDF)
S1 Table. Potential Natural Vegetation (PNV) classes used in the modeling.
From the IGBP classification, three human-mediated classes (Croplands, Cropland/Natural Vegetation Mosaics, and Urban and Built-Up Lands) and Water Bodies were excluded. Descriptions were based on Loveland and Belward [12].
https://doi.org/10.1371/journal.pone.0324107.s012
(PDF)
S2 Table. Confusion matrix for biome classification using the Ave climate dataset and the RF model.
Columns represent the actual classes, while rows represent the predicted classes. This matrix is based on the test grids only, with a total of 392,230 predictions from 10 independent trials (39,223 grids × 10 trials). Shaded diagonal cells indicate correct classifications. Each cell shows the count (top) and the column-wise percentage within the actual class (bottom).
https://doi.org/10.1371/journal.pone.0324107.s013
(PDF)
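The cell layout described in the S2 Table caption (counts normalized column-wise within each actual class, with overall accuracy read off the diagonal) can be sketched as follows; the three-class counts here are illustrative, not values from the study:

```python
# Sketch of the confusion-matrix layout described in S2 Table:
# columns = actual class, rows = predicted class; each cell holds a
# count plus the column-wise percentage within the actual class.
# Class labels and counts below are illustrative, not study values.

def column_percentages(matrix):
    """matrix[i][j] = count with predicted class i and actual class j."""
    n = len(matrix)
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [
        [100.0 * matrix[i][j] / col_totals[j] if col_totals[j] else 0.0
         for j in range(n)]
        for i in range(n)
    ]

def overall_accuracy(matrix):
    """Fraction of predictions on the diagonal (correct classifications)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

counts = [
    [90,  5,  0],   # predicted class 0
    [ 8, 80, 10],   # predicted class 1
    [ 2, 15, 90],   # predicted class 2
]
pct = column_percentages(counts)   # e.g., pct[1][1] is the recall of class 1
acc = overall_accuracy(counts)
```

Column-wise normalization makes each diagonal percentage the recall of that actual class, which is why the caption reports percentages within columns rather than rows.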
S3 Table. Confusion matrix for biome classification using the SVM model.
The same dataset and evaluation procedure as in S2 Table were used.
https://doi.org/10.1371/journal.pone.0324107.s014
(PDF)
S4 Table. Confusion matrix for biome classification using the CNN model.
The same dataset and evaluation procedure as in S2 Table were used.
https://doi.org/10.1371/journal.pone.0324107.s015
(PDF)
Acknowledgments
The author thanks the anonymous reviewers of the previous version of the manuscript. Dr. Shuntaro Watanabe (Kagoshima Univ.) and Dr. Takeshi Ise (Kyoto Univ.) offered technical support on CNN-related issues, including the installation of the pertinent computing environments.
References
- 1.
Lincoln R, Boxshall G, Clark P. A Dictionary of Ecology, Evolution and Systematics. 2nd ed. Cambridge: Cambridge University Press; 1998.
- 2.
Adams J. Plants on the move. In: Vegetation-Climate Interaction: How Plants Make the Global Environment. 2nd ed. Springer; 2010. p. 67–96.
- 3. Prentice IC, Cramer W, Harrison SP, Leemans R, Monserud RA, Solomon AM. Special paper: a global biome model based on plant physiology and dominance, soil properties and climate. Journal of Biogeography. 1992;19(2):117.
- 4. Pitman AJ. The evolution of, and revolution in, land surface schemes designed for climate models. Int J Climatol. 2003;23(5):479–510.
- 5. Hengl T, Walsh MG, Sanderman J, Wheeler I, Harrison SP, Prentice IC. Global mapping of potential natural vegetation: an assessment of machine learning algorithms for estimating land potential. PeerJ. 2018;6:e5457. pmid:30155360
- 6. Sato H, Ise T. Predicting global terrestrial biomes with the LeNet convolutional neural network. Geosci Model Dev. 2022;15(7):3121–32.
- 7. Beigaitė R, Tang H, Bryn A, Skarpaas O, Stordal F, Bjerke JW, et al. Identifying climate thresholds for dominant natural vegetation types at the global scale using machine learning: Average climate versus extremes. Glob Chang Biol. 2022;28(11):3557–79. pmid:35212092
- 8. Argles APK, Moore JR, Cox PM. Dynamic global vegetation models: searching for the balance between demographic process representation and computational tractability. PLOS Clim. 2022;1(9):e0000068.
- 9. Bonannella C, Hengl T, Parente L, de Bruin S. Biomes of the world under climate change scenarios: increasing aridity and higher temperatures lead to significant shifts in natural vegetation. PeerJ. 2023;11:e15593. pmid:37377791
- 10.
Friedl M, Sulla-Menashe D. MODIS/Terra+Aqua Land Cover Type Yearly L3 Global 0.05Deg CMG V006. NASA EOSDIS Land Processes DAAC; 2015.
- 11. Friedl MA, Sulla-Menashe D, Tan B, Schneider A, Ramankutty N, Sibley A, et al. MODIS collection 5 global land cover: algorithm refinements and characterization of new datasets. Remote Sensing of Environment. 2010;114(1):168–82.
- 12. Loveland TR, Belward AS. The International Geosphere Biosphere Programme Data and Information System global land cover data set (DISCover). Acta Astronautica. 1997;41(4–10):681–9.
- 13. Beierkuhnlein C, Fischer J-C. Global biomes and ecozones – Conceptual and spatial communalities and discrepancies. Erdkunde. 2021;75(4):249–70.
- 14. Mucina L. Biome: evolution of a crucial ecological and biogeographical concept. New Phytol. 2019;222(1):97–114. pmid:30481367
- 15. Breiman L. Random forests. Machine Learning. 2001;45(1):5–32.
- 16. Cortes C, Vapnik V. Support-vector networks. Mach Learn. 1995;20(3):273–97.
- 17.
Langley P, Iba W, Thompson K. An analysis of Bayesian classifiers. In: Proceedings of the Tenth National Conference on Artificial Intelligence. Menlo Park, CA: AAAI Press; 1992.
- 18.
R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2018.
- 19. Fourcade Y, Besnard AG, Secondi J. Paintings predict the distribution of species, or the challenge of selecting environmental predictors and evaluation statistics. Global Ecol Biogeogr. 2017;27(2):245–56.
- 20.
Caruana R, Niculescu-Mizil A. An empirical comparison of supervised learning algorithms. In: ICML ‘06: Proceedings of the 23rd international conference on Machine learning. Pittsburgh, Pennsylvania, USA, 2006.
- 21. Benkendorf DJ, Hawkins CP. Effects of sample size and network depth on a deep learning approach to species distribution modeling. Ecological Informatics. 2020;60:101137.
- 22.
Botella C, Joly A, Bonnet P, Monestiez P, Munoz F. A deep learning approach to species distribution modelling. In: Joly A, Vrochidis S, Karatzas K, Karppinen A, Bonnet P, editors. Multimedia tools and applications for environmental & biodiversity informatics. Springer Nature; 2018. p. 169–99.
- 23. Probst P, Wright MN, Boulesteix A. Hyperparameters and tuning strategies for random forest. WIREs Data Min & Knowl. 2019;9(3).
- 24. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436–44. pmid:26017442
- 25. Fick SE, Hijmans RJ. WorldClim 2: new 1‐km spatial resolution climate surfaces for global land areas. Intl Journal of Climatology. 2017;37(12):4302–15.
- 26. Sillmann J, Kharin VV, Zhang X, Zwiers FW, Bronaugh D. Climate extremes indices in the CMIP5 multimodel ensemble: Part 1. Model evaluation in the present climate. JGR Atmospheres. 2013;118(4):1716–33.
- 27. Sillmann J, Kharin VV, Zwiers FW, Zhang X, Bronaugh D. Climate extremes indices in the CMIP5 multimodel ensemble: Part 2. Future climate projections. JGR Atmospheres. 2013;118(6):2473–93.
- 28. Donat MG, Sillmann J, Wild S, Alexander LV, Lippmann T, Zwiers FW. Consistency of temperature and precipitation extremes across various global gridded in situ and reanalysis datasets. J Climate. 2014;27(13):5019–35.
- 29. Seneviratne SI, Hauser M. Regional climate sensitivity of climate extremes in CMIP6 Versus CMIP5 multimodel ensembles. Earths Future. 2020;8(9):e2019EF001474. pmid:33043069
- 30.
Stocker TF, Qin D, Plattner G-K, Tignor M, Allen SK, Boschung J, et al. Climate change 2013: The physical science basis. Contribution of Working Group I to the fifth assessment report of the Intergovernmental Panel on Climate Change. Cambridge, United Kingdom and New York, NY, USA: Cambridge University Press; 2013. p. 1535.
- 31.
Géron A. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. 2nd ed. Sebastopol: O'Reilly Media; 2019.
- 32. Leinweber DJ. Stupid data miner tricks. JOI. 2007;16(1):15–22.
- 33. Cohen J. A Coefficient of Agreement for Nominal Scales. Educational and Psychological Measurement. 1960;20(1):37–46.
- 34. Pontius RG Jr, Millones M. Death to Kappa: birth of quantity disagreement and allocation disagreement for accuracy assessment. International Journal of Remote Sensing. 2011;32(15):4407–29.
- 35.
Harrison SP. Biome 6000 DB classified plotfile version 1. University of Reading; 2017.
- 36. Fisher RA, Koven CD, Anderegg WRL, Christoffersen BO, Dietze MC, Farrior CE, et al. Vegetation demographics in Earth System Models: A review of progress and priorities. Glob Chang Biol. 2018;24(1):35–54. pmid:28921829
- 37. Pugh TAM, Rademacher T, Shafer SL, Steinkamp J, Barichivich J, Beckage B, et al. Understanding the uncertainty in global forest carbon turnover. Biogeosciences. 2020;17(15):3961–89.
- 38. Friend AD, Lucht W, Rademacher TT, Keribin R, Betts R, Cadule P, et al. Carbon residence time dominates uncertainty in terrestrial vegetation responses to future climate and atmospheric CO2. Proc Natl Acad Sci U S A. 2014;111(9):3280–5. pmid:24344265
- 39. Ciais P, Reichstein M, Viovy N, Granier A, Ogée J, Allard V, et al. Europe-wide reduction in primary productivity caused by the heat and drought in 2003. Nature. 2005;437(7058):529–33. pmid:16177786
- 40. Jolly WM, Dobbertin M, Zimmermann NE, Reichstein M. Divergent vegetation growth responses to the 2003 heat wave in the Swiss Alps. Geophysical Research Letters. 2005;32(18).
- 41. Ibáñez I, Clark JS, LaDeau S, Lambers JHR. Exploiting temporal variability to understand tree recruitment response to climate change. Ecological Monographs. 2007;77(2):163–77.
- 42. Bigler C, Bräker OU, Bugmann H, Dobbertin M, Rigling A. Drought as an inciting mortality factor in scots pine stands of the valais, Switzerland. Ecosystems. 2006;9(3):330–43.
- 43. Villalba R, Veblen TT. Influences of large-scale climatic variability on episodic tree mortality in northern patagonia. Ecology. 1998;79(8):2624–40.
- 44. Zimmermann NE, Yoccoz NG, Edwards TC Jr, Meier ES, Thuiller W, Guisan A, et al. Climatic extremes improve predictions of spatial patterns of tree species. Proc Natl Acad Sci U S A. 2009;106 Suppl 2(Suppl 2):19723–8. pmid:19897732