Global Validation of a Process-Based Model on Vegetation Gross Primary Production Using Eddy Covariance Observations

Gross Primary Production (GPP) is the largest flux in the global carbon cycle. However, large uncertainties in current global estimations persist. In this study, we examined the performance of a process-based model (Integrated BIosphere Simulator, IBIS) at 62 eddy covariance sites around the world. Our results indicated that the IBIS model explained 60% of the observed variation in daily GPP at all validation sites. Comparison with a satellite-based vegetation model (Eddy Covariance-Light Use Efficiency, EC-LUE) revealed that the IBIS simulations yielded comparable GPP results as the EC-LUE model. Global mean GPP estimated by the IBIS model was 107.50±1.37 Pg C year−1 (mean value ± standard deviation) across the vegetated area for the period 2000–2006, consistent with the results of the EC-LUE model (109.39±1.48 Pg C year−1). To evaluate the uncertainty introduced by the parameter Vcmax, which represents the maximum photosynthetic capacity, we inversed Vcmax using Markov Chain-Monte Carlo (MCMC) procedures. Using the inversed Vcmax values, the simulated global GPP increased by 16.5 Pg C year−1, indicating that IBIS model is sensitive to Vcmax, and large uncertainty exists in model parameterization.


Introduction
Terrestrial gross primary production (GPP) is the largest carbon flux in terrestrial ecosystems, and it is approximately 20 times larger than the amount of carbon introduced from anthropogenic sources [1]. Thus, even small fluctuations in GPP can cause large changes in the airborne fraction of carbon and subsequently influence future climate change [2]. Vegetation also contributes to human welfare by providing food, fiber and energy [3][4]. Therefore, regular monitoring and reliable estimation of global terrestrial GPP is important for improving our understanding of the global carbon cycle, accurately predicting future climate, and ensuring the long-term sustainability of terrestrial ecosystem services.
Ecosystem models serve as a backbone for evaluating large-scale and global GPP. Two categories of ecosystem models are widely used: process-based and satellite-based. Satellite-based models are driven by remotely sensed data and provide simple means of estimating GPP [5]; however, they are limited in their ability to model mechanisms. Process-based models typically exhibit detailed expressions of terrestrial processes, such as photosynthesis, respiration, phenology, and hydrological cycle. Therefore, processbased models play important roles in investigating the mechanisms underlying current biases in estimated ecosystem production [6], predicting the future conditions of the terrestrial carbon cycle, and exploring its feedback to climate change [7].
Numerous attempts have been made to develop and improve process-based models. However, a recent study from the North American Carbon Project (NACP) showed that current models perform poorly and difference between observations and simulations far exceed the observational uncertainty [8]. Model parameter uncertainty is a key source limiting the accuracy of process-based models. Knorr and Heimann analyzed the uncertainties of process-based models [9]. They found that parameter uncertainties could explain much of the large variance among models and that the largest uncertainties arose from plant photosynthesis, respiration and soil water storage.
The maximum rate of carboxylation by the enzyme Rubisco (V cmax ) is fundamental in modeling photosynthesis [10]. Sensitivity analysis shows that the projections of ecosystem production are particularly sensitive to the fixed parameters associated with V cmax [11]. Therefore, the parameterization scheme of V cmax is essential for GPP simulation, and its impacts on model performance need to be tested.
Eddy covariance measurements recorded by the increasing number of eddy covariance (EC) towers provide a great opportunity for model validation and improvement. Concurrent measurements include carbon fluxes, latent heat, and sensible heat, as well as meteorological conditions such as air temperature and relative humidity, and provide unprecedented datasets for model validation and evaluations of parameter constraints [12]. The current network of EC sites covers a wide range of ecosystem types, thus it has the potential to significantly improve our understanding of the variation in GPP across time, space and biomes [13].
The goal of this study was to validate a process-based ecosystem model (Integrated BIosphere Simulator, IBIS) based on measurements from 62 EC sites [14]. The specific objectives were to (1) examine the performance of IBIS over several ecosystem types, (2) compare IBIS model performance with a satellite-based model (i.e., Eddy Covariance-Light Use Efficiency, EC-LUE), and (3) investigate the impacts of the parameter V cmax on model performance.

IBIS model and parameter inversion
The Integrated BIosphere Simulator (IBIS) is designed to integrate a variety of terrestrial ecosystem processes within a single, physically consistent modeling framework. It represents land surface processes, canopy physiology, vegetation phenology, long-term vegetation dynamics, and carbon and water cycling [14]. The photosynthesis module of the IBIS model is provided by the formulations of Farquhar [15]. C 3 photosynthesis and C 4 photosynthesis are expressed separately in the IBIS model. For C 3 plants, the gross photosynthesis rate per unit leaf area, A g (mol CO 2 m 22 s 21 ) is expressed as where J e is light-limited rate of photosynthesis, J c represents the Rubisco-limited rate of photosynthesis, and J s is the photosynthesis limited by the inadequate rate of utilization of triose phosphate. The light-limited rate of photosynthesis is given as where a 3 is the intrinsic quantum efficiency of CO 2 uptake in C 3 plants (mol CO 2 mol 21 quanta), Q p is the flux density of photosynthetically active radiation absorbed by leaf (mol quanta m 22 s 21 ), C i is the concentration of CO 2 in the intercellular air spaces of the leaf (mol mol 21 ), and C Ã is the compensation point for gross photosynthesis (mol mol 21 ). The Rubisco-limited rate of photosynthesis is calculated as where V m is the maximum carboxylase capacity of Rubisco (mol CO 2 m 22 s 21 ) and K C and K O are the Michaelis-Menten coefficients (mol mol 21 ) for CO 2 and O 2 , respectively.
Under conditions of high intercellular CO 2 concentrations and high irradiance, photosynthesis is limited by the inadequate rate of utilization of triose phosphate. This limitation is expressed as where T is the rate of triose phosphate utilization. Photosynthesis in C 4 plants is similarly modeled as the minimum of three potential capacities to fix carbon [16]. The gross photosynthesis rate is given by where J i~a4 Q p is the light-limited rate of photosynthesis, J e~Vm is the Rubisco-limited rate of photosynthesis and J c~k is the CO 2 -limited rate of photosynthesis at low CO 2 concentrations. The parameter V cmax (mol CO 2 m 22 s 21 ) is very important for simulating the photosynthesis process. In the IBIS model, it is established as a constant that differs among plant functional types (PFTs) ( Table 1). To validate the IBIS model and investigate the impact of the V cmax parameter scheme on model performance, we conducted two simulations (IBIS and IBIS-Type) for each site. IBIS simulation used the default values of V cmax (Table 1). For the IBIS-Type simulation, the vegetation-specific V cmax values were inversed for each PFT. The Markov Chain-Monte Carlo (MCMC) procedure was used for the parameter inversion, and the Metropolis-Hastings (M-H) algorithm was used as the MCMC sampler [17][18] (see Xu et al. [19] and Yuan et al. [20] for detailed descriptions of the MCMC procedure). We conducted 10000 samples for each site and assigned the V cmax value with the highest frequency as the optimal value of V cmax . Finally, we ran the model using the inversed V cmax for each PFT as the IBIS-Type simulation.

EC-LUE model
A satellite-based model (i.e. EC-LUE) [5,[21][22][23] was used to compare the local and global GPP simulations with those of the IBIS model. The EC-LUE model is based on two assumptions: (1) ecosystem GPP has a direct relationship with the absorbed photosynthetically active radiation (APAR) via light use efficiency (LUE), where LUE is defined as the amount of carbon produced per unit of APAR; and (2) realized LUE may be reduced below its theoretical potential value by environmental stressors, such as low temperatures or water shortages. The EC-LUE model is driven by four variables: the normalized difference vegetation index (NDVI), photosynthetically active radiation (PAR), air temperature, and the ratio of sensible to latent heat flux (Bowen ratio).

Data
We used the eddy covariance (EC) data to validate the IBIS model. We used data obtained from the LaThuile dataset (http:// www.fluxdata.org). The daily GPP values were estimated from the eddy covariance measurements using a community standard method [24]. Briefly, GPP was estimated from the equation: where NEE day is daytime NEE. Daytime ecosystem respiration R day was estimated using daytime temperature and an equation describing the temperature dependence of respiration, which was formulated from nighttime NEE measurements. For further details on the algorithm, see Reichstein et al. (2005) [24]. The gap filling and quality control of this dataset are conducted according to standard criteria [25][26], and the uncertainty in annual GPP can be controlled to some extent below 100 g C m 22 year 21 [25]. This dataset is widely used in model studies. In the present study, we selected 62 EC sites for model validation (Table S1). The selected data covered six major terrestrial biomes: evergreen needleleaf forest (ENF), deciduous broadleaf forest (DBF), evergreen broadleaf forest (EBF), mixed forest (MF), grassland (GRA) and savanna (SAV). Additional information on the vegetation, climate and soil characteristics of each site was collected from the associated metadata of the LaThuile dataset. Daily average, maximum and minimum temperature, relative humidity, precipitation, cloud fraction, photosynthetically active radiation, latent heat and sensible heat were used to drive the IBIS, and GPP was used to evaluate its performance.
To decrease model uncertainty, we used the satellite-based leaf area index (LAI) from the Moderate Resolution Imaging Spectroradiometer (MODIS) as a model input. The Normalized Difference Vegetation Index (NDVI) data derived from MODIS were used to drive the EC-LUE model. The 8-day MODIS-NDVI data (MOD13) and the MODIS-LAI (MOD15) data with 1-km spatial resolution were used for model verification at the EC sites. Quality control (QC) flags, which signal cloud contamination in each pixel, were examined to filter out NDVI and LAI data of insufficient quality. We temporally filled missing or unreliable values for each 1-km MODIS pixel based on their corresponding quality assessment data fields, as proposed by Zhao et al. [27]. In addition, data on soil properties, including soil texture, organic carbon content and nitrogen content, were required to input soil information into the model, and were therefore collected from the sites where EC towers are established.
For global simulation, we used meteorological datasets from the Modern Era Retrospective Analysis for Research and Applications (MERRA) archive for 2000-2006 to drive the IBIS and EC-LUE models. MERRA is a NASA reanalysis dataset for the satellite era which uses a new version of the Goddard Earth Observing System Data Assimilation System Version 5 (GEO-5). We used climate conditions at 10 meters above the land surface and at a resolution of 0.5u latitude by 0.6u longitude. The Global Gridded Surfaces of Selected Soil Characteristics datasets were used to supply soil properties for the IBIS model; detailed information is available from the website (http://ww.isric.org). The global distribution of plant functional types (PFTs) was derived by overlapping the MODIS land-cover type product with the Köppen-Geiger climate classification map, with the land cover classifications aggregated into nine PFTs ( Table 1). The IBIS model was unable to simulate carbon cycle processes in cropland or wetland; therefore, these two vegetation types were replaced with C 3 grassland in this study.

Statistical analysis
Three metrics were used to evaluate model performance: (1) the coefficient of determination, R 2 , which represents how much variation in the observations is explained by the model simulations; (2) the root mean square error (RMSE), which represents the total difference between the simulated and measured values; (3) the relative predictive error (RPE), which represents the ratio of error to observation. It is computed as where P and O are the mean simulated and measured values, respectively. One-way ANOVA was employed using SPSS software to test the significance of the differences in optimal V cmax for each biome in the IBIS-Type scheme, and the paired-samples t-test was used to test the significance of the difference in statistical metrics between different models.

Model validation at EC towers
The overall comparison of the estimated GPP with the EC measurements showed that the IBIS model performed well in capturing the variability in GPP. Across all study sites, the IBIS model explained approximately 60% of the variation in siteaveraged GPP (Fig. 1). The coefficient of determination (R 2 ) varied from 0.11 at the ES-LMa site to 0.94 at the CA-Man site, with a mean value of 0.71 across all EC sites. The mean R 2 values   (Fig. 2). On average, the mean relative predictive error (RPE) of all sites was 29.68%, and the RPE at most sites was less than 20% (Table 2). Although the IBIS model explained most of the GPP variability at individual sites, large differences between the predicted and estimated GPP values from the EC measurements were apparent at some sites and for some vegetation types. The IBIS model underestimated GPP for the majority of PFTs and overestimated GPP in evergreen broadleaf forest. Specifically, over 40 of the 62 sites had negative RPE values, and the RPEs of 26 sites were below 220%. The largest underestimation occurred at two savanna sites, ZA-Kru and BW-Ma1, with PREs of 251.83% and 248.15%, respectively. The other 24 sites with RPEs below 2 20% were predominantly deciduous broadleaf forest (6 sites), evergreen needleleaf forest (7 sites) and grassland (8 sites). In addition, the model overestimated GPP at 6 sites with RPEs greater than 20%, four of which were evergreen broadleaf forest. Extreme overestimation occurred at two sites of evergreen broadleaf forest, FR-Pue and PT-Mi1, with RPEs of 59.41% and 72.49%, respectively.

Comparison of IBIS and EC-LUE
Compared to the satellite-based EC-LUE model, the IBIS model performed comparably at most sites, according to the R 2 , RMSE and RPE values ( Table 2). The IBIS mean R 2 values for evergreen broadleaf forest, evergreen needleleaf forest, mixed forest and grassland were similar to those of the EC-LUE model, which were 0.52, 0.79, 0.75, 0.83, respectively; no significant differences were found for most PFTs. Comparable results were also found for the RMSE, and no significant differences in RMSE were detected for any PFTs except for evergreen broadleaf forest.
However, the EC-LUE model had some advantages for some PFTs (Figure 2). For broadleaf forest, grassland and savanna, the R 2 of the EC-LUE was significantly higher than that of the IBIS, particularly for savanna, where the mean was 76% higher than that of the IBIS (0.74 for EC-LUE and 0.42 for IBIS). In addition, the EC-LUE had significantly low model error for evergreen broadleaf forest, with a mean RMSE 22% lower than that of the IBIS (2.12 g C m 22 day 21 for EC-LUE and 2.71 g C m 22 day 21 for IBIS).

IBIS Performance with inversed V cmax
Model parameterization did not significantly improve the performance of the IBIS, as indicated by the R 2 and RMSE. The mean R 2 values of the IBIS-Type were 0.64, 0.51, 0.81, 0.77, 0.76, 0.41 for deciduous broadleaf forest, evergreen broadleaf forest, evergreen needleleaf forest, mixed forest, grassland and savanna, respectively. These values were very similar to those of the IBIS model; only the R 2 of the deciduous broadleaf forest differed significantly, being higher in the IBIS-Type model. The overall R 2 increased from 0.60 to 0.68 after revising V cmax (Fig. 1). The RMSEs of the IBIS and the IBIS-Type models were also comparable. The mean RMSE of the IBIS-Type varied from 1.49 g C m 22 day 21 for evergreen needleleaf forest to 2.60 g C m 22 day 21 for deciduous broadleaf forest. Evergreen broadleaf forest was the only vegetation type yielding a significant difference in RMSE between the two models.
The IBIS employs a set of parameter values for a given PFT (Table 1). In the IBIS-Type scheme, the V cmax value, which was set to as the mean value of V cmax inversed from each site within a given PFT, was largely differentiated from the original parameter values. V cmax of the IBIS-Type was 36.67%, 30.00%, 24.29%, 61.09% and 30.00% higher than that of the IBIS for deciduous broadleaf forest, evergreen needleleaf forest, mixed forest, grassland and savanna, respectively; it was 50% lower for evergreen broadleaf forest. In addition, the V cmax value inversed at each site did not indicate significant differences among different PFTs (Fig. 3).

Temporal and spatial patterns in global averaged GPP
The spatial pattern in average annual GPP estimated using the original IBIS, the IBIS-Type and the EC-LUE models from 2000 to 2006 were generally consistent (Fig. 4). The highest value was from the humid tropics (Amazonia, Central Africa and Southeast Asia), with an annual GPP over 2000 g C m 22 year 21 . Temperate regions had intermediate levels of GPP, and the lowest GPP was found in both cold and arid regions.
The magnitude of GPP estimated by the IBIS and EC-LUE models were comparable, reaching 107.5061.37 Pg C year 21 and 109.3961.48 Pg C year 21 (mean value 6 standard deviation) globally, respectively (Fig. 5). Two-model comparisons revealed consistent GPP estimations for the various PFTs (Table 3) with the exception of savanna, for which the IBIS model greatly underestimated GPP. The GPP estimate of the IBIS-Type scheme was much higher than those of the other two models, with a global value of 123.9761.76 Pg C year 21 (Fig. 5). Larger GPP estimations using the IBIS-Type simulation were found for most biomes (Table 3).

Discussion
Process-based ecosystem models are one of the most important components of earth system models used to predict future climate change [1]. The IPCC AR5 requested that all earth system models integrate the global carbon cycle module [7]. Previous studies have shown that the uncertainty in carbon cycle models can produce 40% differences in the predicted temperature by 2100 [28]. The GPP is the total photosynthetic uptake or carbon assimilation by plants, and it is a key component of terrestrial carbon balance. Any errors in the GPP simulations will propagate through the model, introducing errors into the simulated biomass and net ecosystem fluxes. If the simulated GPP is too low or too high, predicted leaf area index, wood biomass, crop yield, and soil biomass may also be too low or too high [29].
In this study, we examined the performance of the IBIS, which has been widely used to evaluate the regional and global terrestrial ecosystem carbon balance and has been integrated into earth system models (e.g., Brazilian Earth System) [30]. Our results indicate that the IBIS model is a good candidate for simulating GPP at regional-to-global scales, and its performance was comparable to that of the satellite-based EC-LUE model based on EC site validation and comparison. The magnitude of global GPP estimated by the IBIS is also consistent with results of previous studies. Our estimate of annual global mean GPP was 107.5061.37 Pg C year 21 (mean value 6 standard deviation). Beer et al. estimated global GPP as 123. 8 Pg C year 21 [31]. Two satellite-based light-use efficiency models revealed similar estimates of global GPP: 109.29 Pg C year 21 by the MODIS algorithm [27] and 111 Pg C year 21 by the EC-LUE [21]. Interestingly, we found that the IBIS was consistent with the EC-LUE model across different PFTs (Table 3), despite the large However, the IBIS did not perform well for two PFTs: evergreen broadleaf forest and savanna. Among the four evergreen broadleaf forest sites, two (PT-Mi1 and FR-Pue) have subtropical Mediterranean climate with dry summers and wet winters [322 33], and the AU-Tum site also suffers drought in summer [34]. At these sites, rainfall is the key driver of water and carbon fluxes [35]. Leuning analyzed the CO 2 and H 2 O fluxes at the AU-Tum site [34], and found that carbon uptake was more strongly constrained by water stress than by temperature, and strongly affected by soil water availability. Moreover, savannas are characterized by climate with distinct wet and dry seasons, and this climate forcing causes savanna to form open, heterogeneous woodland canopies with grass understories [36].
These particular ecosystem properties result in greater complexity of modeling fluxes [37]. Process-based models need to simulate variation in soil moisture and plant phenology. However, previous studies have identified significant biases when simulating soil moisture [38]. We examined the performance of the IBIS on soil water at the savanna sites and also found obvious differences between the simulated values and the EC measurements (Fig. 6). Moreover, the IBIS integrated temperature-dominated phenology algorithms developed by Botta et al. [39]. However, field studies suggest that for many drought-deciduous species, the first large precipitation event at the start of the rainy season initiates rapid leaf flush [40241], and leaf senescence is closely related to soil water availability in the dry season [41242]. This relationship may explain why the IBIS model did not effectively capture the variance in GPP at savanna sites.
Parameterization is another large source of uncertainty in process-based models. V cmax is the key parameter of the photosynthesis process [11], and a study by Bonan et al. [43] suggests that uncertainty in this parameter could account for 30 Pg C year 21 variations in model estimation of global GPP. In the present study, the differences in setting V cmax values between the IBIS and IBIS-Type schemes caused an increase in global GPP of over 16.5 Pg C year 21 , which also indicated that V cmax is a salient parameter for simulating GPP. Unfortunately, the determination of V cmax in current models contains large uncertainty. Rogers surveyed V cmax in current state-of-the-art models [44] and found that V cmax varied within a wide range of 246 to +77% of the PFT mean. Thus, the determination of V cmax and the reduction of uncertainty in this parameter are important issues for model development.
Many parameters in process-based models are established by PFTs, which are based on the assumption that the same type of vegetation responds similarly to the environment. However, a current study found that model parameters were more variable than previously assumed within the given PFTs [45] and that categorization of vegetation into less than eight PFTs may result in artificial multiple steady-states in a model of the Earth's climatevegetation system depending on the number of PFTs used [46]. In the present study, the variation analysis showed that V cmax did not significantly differ among PFTs. The predetermined parameterization scheme that sets the V cmax constant values for each PFT in the IBIS model may cause systematic error.
We attempted to test the impact of V cmax on model performance. However, variations in V cmax cannot explain the overall uncertainty of the IBIS model. Validation of the IBIS on three flux sites demonstrated that parameterization and formula-  Validation of Process-Based Model on GPP PLOS ONE | www.plosone.org tions of phenology also limit the model's ability to capture seasonal fluctuations in carbon and water exchange [38]. Particularly in regions with summer drought, phenology is primarily controlled by water supply rather than temperature [34]. This relationship hinders model simulation because biases in phenology and the dynamics of the leaf area index can affect the simulation of evapotranspiration. Such errors can pass to simulations of soil water content and other variables associated with the water and carbon cycles [38]. Therefore, revisions of not only the parameterization of V cmax but also other parameters and formulations are needed for model improvement.

Summary
Process-based models are important tools for carbon cycle research, but current models incorporate substantial uncertainty. This study examines the performance of the IBIS model at global EC sites. Our results showed that the IBIS model explained 60% of the variation in GPP at all EC sites and performed comparably to the EC-LUE model, which explained 80% of the variation in observed GPP. At the global scale, the magnitudes of GPP estimated by the IBIS and EC-LUE models were comparable, being 107.5061.37 Pg C year 21 and 109.3961.48 Pg C year 21 , respectively. The parameter V cmax is a key parameter in the photosynthesis model. In the IBIS model, V cmax was set as a constant for each PFT. The inversed V cmax value was largely differentiated from the original setting, and no significant differences were detected among PFTs.

Supporting Information
Table S1 Information of EC sites used in this study. (DOCX) Figure 6. Comparison of estimated and observed soil water. Daily variation in the estimated soil water fraction of the IBIS model (i.e., fraction of soil pore space containing liquid water) and in the observed soil water content at EC sites. The solid lines represent simulation data, and the dotted lines represent the observed data. doi:10.1371/journal.pone.0110407.g006 Validation of Process-Based Model on GPP