## Abstract

Crop yields are sensitive to extreme weather events. Improving the understanding of the mechanisms and the drivers of the projection uncertainties can help to improve decisions. Previous studies have provided important insights, but often sample only a small subset of potentially important uncertainties. Here we expand on a previous statistical modeling approach by refining the analyses of two uncertainty sources. Specifically, we assess the effects of uncertainties surrounding crop-yield model parameters and climate forcings on projected crop yield. We focus on maize yield projections in the eastern U.S. in this century. We quantify how considering more uncertainties expands the lower tail of yield projections. We characterize the relative importance of each uncertainty source and show that the uncertainty surrounding yield model parameters is the main driver of yield projection uncertainty.

**Citation:** Ye H, Nicholas RE, Roth S, Keller K (2021) Considering uncertainties expands the lower tail of maize yield projections. PLoS ONE 16(11): e0259180. https://doi.org/10.1371/journal.pone.0259180

**Editor:** Yangyang Xu, Texas A&M University, UNITED STATES

**Received:** June 21, 2021; **Accepted:** October 14, 2021; **Published:** November 18, 2021

**Copyright:** © 2021 Ye et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Data Availability:** The entire analysis is performed in R. The analysis code is available on GitHub: https://github.com/yhaochen/Climate_CornYield. The METDATA observations are available at https://www.northwestknowledge.net/metdata/data/. The MACAv2-METDATA datasets are available at https://cida.usgs.gov/gdp/client/#!catalog/gdp/dataset/5752f2d9e4b053f0edd15628. The crop yield data are available from Crop Production Historical Track Records (April 2018), USDA, National Agricultural Statistics Service: http://quickstats.nass.usda.gov.

**Funding:** HY, RN, SR and KK received support from the US Department of Energy, Office of Science through the Program on Coupled Human and Earth Systems (PCHES) under DOE Cooperative Agreement No. DE-SC0016162 (https://www.energy.gov). HY, RN and KK received support from the National Science Foundation (NSF) through the Network for Sustainable Climate Risk Management (SCRiM) under NSF cooperative agreement GEO-1240507 (https://www.nsf.gov). HY, RN, SR and KK also received support from Penn State (https://www.psu.edu) through the Center for Climate Risk Management. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

**Competing interests:** The authors have declared that no competing interests exist.

## 1. Introduction

Increasing greenhouse gas concentrations lead to warmer climates and more frequent extreme weather events [1]. Climate change poses threats to many economic sectors. For example, agricultural yields can be highly sensitive to temperature and precipitation change, raising concerns about food security [2].

The net impact of climate change on agriculture is highly uncertain due to our limited knowledge about drivers of yield anomalies and future climates [3–5]. Improving our understanding of climate change impacts on crop yields and quantifying the surrounding uncertainties is a potentially important avenue to improve decisions.

Statistical models are often used to represent the weather-crop yield relationship and to produce yield projections [6]. Statistical models empirically relate historical weather data and crop yield observations. Another approach to study the weather-crop yield relationship uses process-based dynamical simulations, which simulate the physiological processes of crop growth. Compared with dynamical models, statistical models have lower computation cost. This drastically simplifies the uncertainty assessments surrounding yield projections [6, 7].

Schlenker and Roberts (2009) use statistical models to identify a nonlinear effect of temperature on crop yields and show that crop yields increase modestly as temperature increases and decrease sharply once temperature exceeds a particular threshold [4]. Based on this nonlinear weather-yield relationship, Schlenker and Roberts (2009) project a maize yield decrease of 63%-82% at the 95% confidence level by the end of this century (2070–2099) under the most extreme warming scenario considered [4].

Previous studies provide valuable insights about weather impacts on crop yields, but they often sample a rather small subset of potentially important uncertainties. Some studies apply a simple “delta method” to climate projections [4, 8]. In other words, these studies approximate the future climate distribution by a linear shift of the past climate. This assumption is inconsistent with observations that suggest that the shape of summer temperature distributions has already changed in the past [9]. More recent studies adopt more complex and realistic climate forcings when projecting crop yields [10, 11]. Burke et al. (2015) report the 95% confidence intervals of maize yield projections under multiple climate forcings based on a relatively simple statistical model [10]. Keane and Neal (2018) list the range of yield projections under 19 general circulation models (GCMs) and three representative concentration pathway (RCP) emission scenarios [11]. These studies are, however, mostly silent on the relative importance of different uncertainty sources.

Here we expand on the statistical analysis of Schlenker and Roberts (2009) by incorporating and quantifying the effects as well as the relative importance of two main uncertainty sources on yield projections [4]: (i) uncertainties surrounding model parameters and (ii) climate forcings. We focus on maize as it is widely grown in most U.S. states and has high data availability. Following past studies [4, 8, 12], we consider six weather variables in a simple regression model and allow each variable to be either neglected or included in a linear or quadratic term in the model. To approximate the effects of model parameter uncertainty, we sample model parameters that pass a simple pre-calibration test [13, 14] based on observational data and the best estimates of yield hindcasts. To sample the climate forcing uncertainty, we use an ensemble of downscaled climate products to represent sampled climate conditions in the future. We project the yield distribution based on sampled model parameters and climate forcings. Finally, we quantify the relative importance of these uncertainty sources by using a cumulative uncertainty approach based on the standard deviations when considering different uncertainty sources [15].

We address two main questions: (i) How does the incorporation of different uncertainties change the maize yield projection? (ii) What is the relative importance of each uncertainty source? The remaining text introduces the chosen yield data as well as climate data (section 2), describes the process of model regression and uncertainty analysis in detail (section 3), reports the main results (section 4), and discusses methods and results (section 5) as well as caveats and limitations (section 6). The last section summarizes the conclusions and points to research needs.

## 2. Data

We collect county-level annual maize yield data from the United States Department of Agriculture [16]. We focus on 24 states in the eastern U.S. because they often rely more on precipitation than irrigation [4]. These yield data are reported as unit yield per growing area (bushel/acre) along with growing area in each county. We drop the counties with unreported data. We calculate the annual average yields for the entire study region weighted by reported growing areas.
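The regional averaging described above can be sketched in a few lines. This is a minimal illustration, not the authors' R code: the function name `area_weighted_mean_yield` and the three county values are hypothetical, and missing reports are represented as `None`.

```python
def area_weighted_mean_yield(county_records):
    """Return the area-weighted mean yield (bushel/acre) for one year.

    county_records: iterable of (yield_bu_per_acre, growing_area_acres).
    Counties with an unreported value (None) are dropped, as in the text.
    """
    reported = [(y, a) for y, a in county_records
                if y is not None and a is not None]
    total_area = sum(a for _, a in reported)
    if total_area == 0:
        raise ValueError("no reported growing area")
    return sum(y * a for y, a in reported) / total_area


# Hypothetical values for three counties in one year (not real USDA data).
example_year = [(150.0, 1000.0), (100.0, 3000.0), (None, 500.0)]
regional_yield = area_weighted_mean_yield(example_year)
print(regional_yield)
```

Weighting by growing area means that counties with large maize acreage dominate the regional mean, which is what a region-wide production estimate needs.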

We use METDATA historical climate data, a relatively high-spatial resolution (4 km × 4 km) daily surface meteorological data product covering the contiguous U.S. [17]. We choose the historical study period from 1979 to 2018. We consider five weather variables based on previous research: maximum temperature, minimum temperature, precipitation, maximum relative humidity and minimum relative humidity [12]. We use weather data within the maize growing season each year defined as a 6-month interval after the 21-day moving average temperature reaches 10°C [12].
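One way to read the growing-season rule is sketched below. Several details are assumptions made for illustration, since the text does not specify them: the moving average is taken over daily mean temperature with a trailing window, and the 6-month interval is approximated as 183 days. The function name `growing_season` is hypothetical.

```python
def growing_season(daily_mean_temp, window=21, threshold=10.0, season_days=183):
    """Return (start, end) day indices of the growing season, or None.

    start is the first day on which the trailing `window`-day mean of
    daily_mean_temp reaches `threshold` (deg C); end is exclusive and is
    capped at the length of the series.
    """
    if len(daily_mean_temp) < window:
        return None
    running = sum(daily_mean_temp[:window])
    for day in range(window - 1, len(daily_mean_temp)):
        if day >= window:
            # Slide the window forward by one day.
            running += daily_mean_temp[day] - daily_mean_temp[day - window]
        if running / window >= threshold:
            return day, min(day + season_days, len(daily_mean_temp))
    return None


# Synthetic year: 100 cold days at 5 deg C followed by 265 days at 15 deg C.
temps = [5.0] * 100 + [15.0] * 265
season = growing_season(temps)
print(season)
```

In this synthetic series the trailing mean first reaches 10°C once eleven warm days are inside the window, so the season starts on day index 110.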

For the climate projections, we use MACAv2-METDATA [18]. This dataset uses the Multivariate Adaptive Constructed Analogs (MACA) statistical method to downscale GCMs bias corrected by METDATA observations. We analyze the 2019–2099 period from these projections to extend the observed data. We focus on two 30-year time windows to represent the near (2020–2049) and far future (2070–2099). Similar to how we dealt with the METDATA observations, we extract the same daily weather variables within the maize growing season. We choose to use projections following the business-as-usual RCP8.5 scenario [19] for comparability with other studies [4, 11, 12].

We aggregate all weather data to the county level based on each grid center’s longitude and latitude [8]. For each county, we find the grids whose centers fall inside the county boundary. We take the mean of these grids to serve as county level average weather data.
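The aggregation step can be sketched as a point-in-polygon test followed by a mean. This is a simplified illustration under stated assumptions: each county is a single simple polygon given as a list of (longitude, latitude) vertices, whereas real county boundaries may have several parts. The names `point_in_polygon` and `county_mean` are hypothetical.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def county_mean(grid_cells, polygon):
    """Mean value of the grid cells whose centers fall inside the polygon.

    grid_cells: iterable of (center_lon, center_lat, value).
    Returns None if no grid center falls inside.
    """
    values = [v for lon, lat, v in grid_cells
              if point_in_polygon(lon, lat, polygon)]
    return sum(values) / len(values) if values else None


# Hypothetical unit-square "county" and three grid cells (one outside).
county = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
cells = [(0.25, 0.25, 20.0), (0.75, 0.75, 22.0), (1.5, 0.5, 30.0)]
print(county_mean(cells, county))
```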

To capture some measure of the uncertainty in climate forcing, we use an ensemble of MACAv2-METDATA projections that comprises 18 climate projections based on different Coupled Model Intercomparison Project v5 (CMIP5) models [20]. These climate projections differ considerably. We consider these ensemble members as equally likely, as there is little evidence that one model outperforms the others in terms of root-mean-square errors (RMSE) over space and time compared with observations [10].

To simplify the comparison with previous studies, we also apply the delta downscaling method for climate projection as a scenario without considering climate forcing uncertainty [21]. We realize that this is a strong approximation and use this as an idealized scenario for comparison only [8]. For each 30-year time window, we calculate the mean difference or ratio of each weather variable between MACAv2-METDATA projection and hindcast, and then shift the METDATA observations of 1981–2010 to generate a new climate. Specifically, for each 30-year projection period, we calculate the 30-year mean value for each weather variable in each MACAv2-METDATA projection dataset. We then use the multi-model ensemble mean from the 18 climate projections as the mean projection for each variable [22]. We further calculate each variable’s 30-year hindcast mean value from MACAv2-METDATA hindcast dataset for a time window of 1981–2010. We shift the observational temperatures linearly based on the absolute difference between projection mean and hindcast mean, and multiply the observational precipitation and relative humidity proportionally based on the ratio between projection mean and hindcast mean.
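The delta method described above amounts to an additive shift for temperature and a multiplicative scaling for precipitation and relative humidity. The sketch below uses hypothetical numbers and a hypothetical function name `delta_shift`; it is meant only to make the two cases concrete.

```python
def delta_shift(observed, projection_mean, hindcast_mean, mode):
    """Shift an observed series toward a projected climate.

    mode="additive": add (projection_mean - hindcast_mean), as for temperature.
    mode="multiplicative": multiply by projection_mean / hindcast_mean,
    as for precipitation and relative humidity.
    """
    if mode == "additive":
        delta = projection_mean - hindcast_mean
        return [x + delta for x in observed]
    if mode == "multiplicative":
        ratio = projection_mean / hindcast_mean
        return [x * ratio for x in observed]
    raise ValueError("mode must be 'additive' or 'multiplicative'")


# Hypothetical observed values and 30-year ensemble means.
obs_temp = [20.0, 25.0, 30.0]      # deg C
obs_precip = [2.0, 0.0, 4.0]       # mm/day
future_temp = delta_shift(obs_temp, 27.0, 24.0, "additive")
future_precip = delta_shift(obs_precip, 3.0, 2.0, "multiplicative")
print(future_temp, future_precip)
```

The additive shift leaves the spread of the temperature series unchanged (here the range stays at 10°C), which is why this approach cannot represent a change in the shape of the distribution.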

## 3. Methods

The design of the analysis is illustrated in the flow diagram (Fig 1). We consider six weather variables that previous work identified as important, derived from the five weather variables reported in the historical climate data: maximum temperature, minimum temperature, precipitation, vapor pressure deficit (VPD) calculated from temperature and relative humidity, and growing degree days (GDD) and extreme degree days (EDD) calculated from temperature [12].

We adopt county level weather data and yield data to model the weather impact on maize yields. We consider parameter uncertainty through a pre-calibration method and climate forcing uncertainty through an ensemble of downscaled climate products.

We calculate VPD (in hPa) using Eq (1), where T_{mean} is the average of maximum and minimum temperature in degrees Celsius, and RH_{mean} is the average of maximum and minimum humidity:
(1)
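Eq (1) is not reproduced in this copy. As an illustration only, the sketch below computes VPD from the same two inputs using the widely used Tetens-type saturation vapor pressure formula with FAO-56 coefficients; this form is an assumption and may differ from the authors' Eq (1). The function names are hypothetical.

```python
import math


def saturation_vapor_pressure_hpa(temp_c):
    """Saturation vapor pressure in hPa (Tetens-type formula, FAO-56 coefficients)."""
    return 6.108 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def vapor_pressure_deficit(tmax, tmin, rhmax, rhmin):
    """VPD in hPa from daily temperature (deg C) and relative humidity (%).

    T_mean and RH_mean are the simple averages of the daily extremes,
    as defined in the text; the saturation formula is an assumption.
    """
    t_mean = (tmax + tmin) / 2.0
    rh_mean = (rhmax + rhmin) / 2.0
    return saturation_vapor_pressure_hpa(t_mean) * (1.0 - rh_mean / 100.0)


# Hypothetical summer day: 30/20 deg C and 70/30 % relative humidity.
print(round(vapor_pressure_deficit(30.0, 20.0, 70.0, 30.0), 2))
```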

We adopt 10°C to 29°C as maize’s growing temperature range and use it to calculate GDD and EDD [4]. A simple estimation method of GDD and EDD follows Eqs (2) and (3) based on daily maximum and minimum temperature [23]: (2) and (3)

To calculate GDD, we treat any temperature higher than 29°C as 29°C, and any temperature lower than 10°C as 10°C. Similarly, to calculate EDD, we treat any temperature lower than 29°C as 29°C.
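The clamping rules above can be sketched as follows. Eqs (2) and (3) are not reproduced in this copy, so the averaging of the clamped daily maximum and minimum is an assumed form of the "simple estimation method"; the function names are hypothetical.

```python
T_LOW, T_HIGH = 10.0, 29.0  # maize growing temperature range in deg C


def clamp(temp, low, high):
    """Restrict temp to the interval [low, high]."""
    return max(low, min(high, temp))


def daily_gdd(tmax, tmin):
    """Growing degree days for one day: temperatures clamped to [10, 29] deg C."""
    mean_clamped = (clamp(tmax, T_LOW, T_HIGH) + clamp(tmin, T_LOW, T_HIGH)) / 2.0
    return mean_clamped - T_LOW


def daily_edd(tmax, tmin):
    """Extreme degree days for one day: temperatures below 29 deg C count as 29."""
    mean_floored = (max(tmax, T_HIGH) + max(tmin, T_HIGH)) / 2.0
    return mean_floored - T_HIGH


# Hypothetical hot day (33/15 deg C) and cold day (8/2 deg C).
print(daily_gdd(33.0, 15.0), daily_edd(33.0, 15.0))
print(daily_gdd(8.0, 2.0), daily_edd(8.0, 2.0))
```

Seasonal GDD and EDD would then be the sums of these daily values over the growing season.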

We analyze a set of model structures and determine the best choice of model structure by minimizing cross validation errors. Specifically, for each variable, we allow the model to include an up-to-quadratic relationship. This means the model may include both quadratic and linear terms, only the linear term, or nothing. The full model is shown in Eq (4).

(4)

For each model structure, we apply ten-fold cross validation. We divide the observation data into ten equally sized groups, train the model using data from nine groups, and test the hindcast performance on the remaining group. We calculate the RMSE for the test data to assess each model’s predictive skill. We repeat this process for each group and calculate the cross-validation error as the mean of the ten RMSE calculations. We adopt the model with the smallest cross-validation error as the best model to estimate the yield hindcasts (Eq 5). We treat this model (Eq 5) as a reference model to represent the common approach and to calculate the yield anomalies.
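The structure search and the cross-validation loop can be sketched as below. This is a minimal illustration, not the authors' implementation: it enumerates the three choices per variable (3^6 = 729 structures) and shows the ten-fold RMSE calculation on a toy one-predictor problem. The variable labels, the `fit`/`predict` callbacks, and the synthetic data are all hypothetical, and the fixed effects are omitted.

```python
import itertools
import math
import random

VARIABLES = ["Tmax", "Tmin", "Pr", "VPD", "GDD", "EDD"]
TERM_CHOICES = ("none", "linear", "linear+quadratic")


def enumerate_structures():
    """All combinations of term choices, one choice per weather variable."""
    return [dict(zip(VARIABLES, combo))
            for combo in itertools.product(TERM_CHOICES, repeat=len(VARIABLES))]


def k_fold_rmse(xs, ys, fit, predict, k=10, seed=0):
    """Mean of the k held-out RMSE values (folds differ in size by at most one)."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    rmses = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        sq_err = [(predict(model, xs[i]) - ys[i]) ** 2 for i in fold]
        rmses.append(math.sqrt(sum(sq_err) / len(sq_err)))
    return sum(rmses) / k


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope


def fit_mean(x, y):
    """Intercept-only model; returns (mean, 0)."""
    return sum(y) / len(y), 0.0


def predict_line(model, x):
    return model[0] + model[1] * x


structures = enumerate_structures()
xs = [float(i) for i in range(40)]
ys = [2.0 * x + 1.0 for x in xs]          # synthetic, exactly linear
cv_line = k_fold_rmse(xs, ys, fit_line, predict_line)
cv_mean = k_fold_rmse(xs, ys, fit_mean, predict_line)
print(len(structures), cv_line < cv_mean)
```

Selecting the structure with the smallest cross-validation error then reduces to taking the minimum over the candidate structures.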

(5)

We transform the yield data into anomalies based on the reference model (Eq 5). To reduce the influence of factors other than weather, we include additional fixed effect terms in the reference model (Eq 5) and estimate these fixed effects. The temporal fixed effects capture factors that are approximately constant in space such as technology trend, market price and CO_{2} concentration. The spatial fixed effects approximate the effects of factors that are approximately constant in time such as local soil quality. Following previous work, we subtract the best estimates of temporal fixed effects in each year from yield observations [4]. We do not subtract the spatial effects because we are not specifically focusing on a particular region and we work in anomaly space. We then normalize the yield anomalies by subtracting the area-weighted mean yield so that the historical mean yield anomaly is zero. In the analyses of parameter uncertainties and yield projections, the calculations are based on normalized yield anomalies.
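The two-step anomaly calculation (remove the temporal fixed effects, then subtract the area-weighted mean) can be sketched as follows. The function name and all numbers are hypothetical, and the temporal fixed effects are taken as given rather than estimated.

```python
def normalized_yield_anomalies(yields, areas, year_effects):
    """Subtract temporal fixed effects, then the area-weighted mean.

    yields, areas, year_effects: equal-length lists with one entry per
    county-year observation (bushel/acre, acres, bushel/acre).
    """
    detrended = [y - f for y, f in zip(yields, year_effects)]
    weighted_mean = (sum(d * a for d, a in zip(detrended, areas))
                     / sum(areas))
    return [d - weighted_mean for d in detrended]


# Hypothetical observations: two counties in each of two years.
yields = [100.0, 120.0, 130.0, 150.0]
areas = [1000.0, 3000.0, 1000.0, 3000.0]
year_effects = [0.0, 0.0, 20.0, 20.0]   # e.g. a technology trend
anomalies = normalized_yield_anomalies(yields, areas, year_effects)
print(anomalies)
```

By construction, the area-weighted mean of the returned anomalies is zero over the historical period.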

We use a simple pre-calibration approach to sample the model parameter uncertainty [13, 14]. The goal of pre-calibration is to characterize the parameter uncertainties and to drop unrealistic parameter samples by comparing the hindcasts with observations. The pre-calibration approach can provide several advantages. For example, it does not require a specific functional form for the parameter estimates. In addition, it provides a simple and straightforward way to sample the parameters with consideration of parameter interactions. Instead of directly applying the best model, we consider the full model shown in Eq (4) for the pre-calibration process. The best model treats the parameters that are excluded from the full model as zero. Hence the uncertainties of these parameters are neglected if the uncertainty characterization is based on the best model.

For each parameter, we specify a wide uniform range around the best estimate to sample from. The range width for all parameters (except the EDD terms) is 20 standard deviations around the best estimate of the full model. For the EDD terms we use 50 standard deviations in order to cover the best estimates of the best model. We use Latin hypercube sampling to draw 10^{10} samples within this range without considering correlation [24]. We define a plausible hindcast range to be a symmetric band around the best estimate of hindcasts from Eq (4). We adopt the width of this band as the minimum width that enables the band to cover 95% of the area-weighted annual yield anomaly observations. We accept a parameter sample if the yield hindcast falls within this plausible range. We can observe the correlation between each pair of parameters from a two-dimensional heat map of accepted parameter samples after the pre-calibration (S1 and S2 Figs). Altogether we accept 19,231 samples. We do not find strong evidence that the yield projections change drastically when using more samples (S3 Fig). We hence consider the sample size of 10^{10} a reasonable approximation. This pre-calibration approach can certainly be refined [25], but it provides a simple and intuitive benchmark.
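The sampling and acceptance steps can be sketched on a toy problem. Everything specific here is an assumption for illustration: a two-parameter linear hindcast model stands in for the 13-parameter full model, 2,000 samples stand in for 10^{10}, and the band half-width is given rather than derived from observations. The function names are hypothetical.

```python
import random


def latin_hypercube(n_samples, bounds, rng):
    """Latin hypercube sample: one draw per equal-width stratum in each dimension.

    bounds: list of (low, high) per parameter. Strata are shuffled
    independently per dimension, so no correlation is imposed.
    """
    columns = []
    for low, high in bounds:
        width = (high - low) / n_samples
        column = [low + width * (i + rng.random()) for i in range(n_samples)]
        rng.shuffle(column)
        columns.append(column)
    return list(zip(*columns))


def precalibrate(samples, hindcast, best_hindcast, half_width):
    """Keep samples whose hindcast stays within the band in every year."""
    accepted = []
    for params in samples:
        simulated = hindcast(params)
        if all(abs(s - b) <= half_width
               for s, b in zip(simulated, best_hindcast)):
            accepted.append(params)
    return accepted


# Toy hindcast model: yield anomaly = intercept + slope * predictor.
predictor = [0.0, 1.0, 2.0, 3.0, 4.0]


def toy_hindcast(params):
    intercept, slope = params
    return [intercept + slope * x for x in predictor]


best_hindcast = toy_hindcast((0.0, 1.0))
rng = random.Random(42)
samples = latin_hypercube(2000, [(-5.0, 5.0), (-1.0, 3.0)], rng)
accepted = precalibrate(samples, toy_hindcast, best_hindcast, half_width=1.0)
print(len(samples), len(accepted))
```

Even in this two-parameter toy case only a small fraction of the prior range passes the test, which hints at why the acceptance rate becomes very low as the number of parameters grows.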

To assess the yield projection uncertainty, we sample climate forcing and model parameter uncertainties. Specifically, we sample 18 climate forcings and 19,231 accepted parameter samples. As a reference, we use the linearly shifted climate projection from the delta method and the best estimates of parameters to represent the scenario without considering either uncertainty. For each 30-year interval, we calculate the average yield anomaly distribution while sampling different uncertainties.

To quantify the importance of the two uncertainty sources, we employ a cumulative uncertainty approach [15]. We use the standard deviation of the yield anomaly distribution from each of the 30-year windows to represent the uncertainties. The cumulative uncertainty approach decomposes the total uncertainty into individual uncertainty sources by calculating the uncertainty at different stages. Again, this method can certainly be refined [26], but it can provide some useful initial insights [15]. A stage is defined as a choice of considering certain uncertainty sources. For each stage, we first calculate the conditional cumulative uncertainty by fixing the factor(s) after this stage and varying the factor(s) up to this stage. For example, our first stage begins with varying model parameters while fixing the climate forcing. Under each choice of climate forcing, we calculate the standard deviation from the yield anomaly distribution considering parameter uncertainty. Then the marginal cumulative uncertainty of this stage is the mean of the conditional cumulative uncertainty when choosing different factors after this stage. In our case, it is the mean of the standard deviations when choosing different climate forcings. The marginal cumulative uncertainty represents the cumulative uncertainty up to this stage. In this study, we list all three stages with their marginal cumulative uncertainty. The three stages are: (i) a stage considering only parameter uncertainty, (ii) a stage considering only climate projection uncertainty, and (iii) a stage considering both uncertainties.
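The three stages can be sketched on a small table of projections. This is an illustration under stated assumptions: `yields[c][p]` holds a hypothetical 30-year mean yield anomaly for climate forcing `c` and parameter sample `p`, and the population standard deviation is used (the text does not say whether the sample or population form is meant).

```python
import statistics


def cumulative_uncertainty(yields):
    """Marginal cumulative uncertainty (standard deviation) for each stage.

    yields[c][p]: yield anomaly for climate forcing c and parameter sample p.
    """
    n_climate, n_param = len(yields), len(yields[0])
    # Stage (i): vary parameters under each fixed climate forcing,
    # then average the standard deviations over the climate forcings.
    param_only = statistics.mean(statistics.pstdev(row) for row in yields)
    # Stage (ii): vary climate forcing under each fixed parameter sample,
    # then average the standard deviations over the parameter samples.
    climate_only = statistics.mean(
        statistics.pstdev([yields[c][p] for c in range(n_climate)])
        for p in range(n_param)
    )
    # Stage (iii): vary both sources together.
    total = statistics.pstdev([v for row in yields for v in row])
    return {"parameter": param_only, "climate": climate_only, "total": total}


# Hypothetical 2 x 2 table (2 climate forcings, 2 parameter samples).
stages = cumulative_uncertainty([[0.0, 2.0], [4.0, 6.0]])
shares = {name: value / stages["total"] for name, value in stages.items()}
print(stages, shares)
```

In this toy table the parameter-only and climate-only shares are about 45% and 89% of the total standard deviation. They sum to more than 100%, because standard deviations of separate sources do not add linearly.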

## 4. Results

We estimate the best yield hindcasts and projections based on the selected model with the least cross-validation error shown in Eq (5) (the green and blue lines in Fig 2). Compared with the full model in Eq (4), the best model does not include maximum temperature (T_{max}) terms and the quadratic extreme degree days (EDD) term.

The black dots represent the area-weighted average annual yield observations. The green line is the best estimate of yield hindcast based on the model with the least cross-validation error. The deep blue lines are best estimates of this model for 18 different climate projections. Adding the effects of considered parameter uncertainty allows a pre-calibration to cover 95% of the observed yield data (light green area). The total effects of the considered uncertainties (climate forcings and parameters) expand the projection with a much wider 95% uncertainty range (light blue area).

Similar to previous studies, increasing growing degree days (GDD) increases maize yield while increasing extreme degree days (EDD) decreases maize yield. While the best model and the full model have similar hindcast skills, some of the parameter estimates are different, especially the linear EDD term and the linear T_{max} term (Table 1). This suggests that these two models may have different yield projections under extreme high temperatures.

Yield projections change considerably when moving from climate projections derived from the delta method to the more refined downscaling method (Fig 3). One reason for this is that the delta method underestimates the extreme high temperatures (S4 Fig). In this example, the delta method underestimates the projected temperature mean by about 0.7°C. This effect is amplified for EDD, because EDD represents the net effect of the extreme temperatures, which is the tail area of mean temperature distribution (S5 Fig). The increasing extreme high EDD can lead to potentially sharp decreases in yield projections.

**a: 2020–2049 b: 2070–2099** The point marks the point estimate without considering any uncertainty (the best estimate in a linearly shifted climate projection); the three solid lines (red, green and blue) are the distributions when considering only parameter uncertainty, only climate forcing uncertainty, and both uncertainty sources, respectively. The distribution medians are marked as vertical black lines on the box-whisker plots.

The distributions of the accepted pre-calibration parameter samples are much wider than the distributions based on the linear regression results from the best model (S6 Fig). Recall that our approach accepts parameter samples as long as the hindcasts pass the defined plausible band. Many best parameter estimates are not located at the highest density of the accepted pre-calibration samples (S1 and S2 Figs, Table 1).

As expected, adding climate forcing uncertainty to model parameter uncertainty widens the yield projection uncertainty range (Fig 2). The upper bound of the uncertainty range does not considerably increase until around 2060, but the lower bound decreases from about -40 bushel/acre in the near future down to about -150 bushel/acre in the far future. One hypothesis that could explain the increasing upper bound of the uncertainty range in the far future is that some sampled structures are more sensitive to the positive effects of climate change. In this case, the yield projections will be high under a warm but not extreme climate forcing which has high GDD and low EDD. The results in Fig 2 are consistent with this hypothesis: although many model samples have similar hindcast skill and pass the green plausible band, their projections can diverge greatly, especially under a more extreme future climate that lies outside historical experience.

The best estimates of yield projections from different climate projections miss important information about the possible low yield extremes (blue lines in Fig 2). In the near future, some lines project positive yield anomalies that exceed the upper limit of the 95% uncertainty range. In the far future, though the yield projection uncertainty will have a high upper bound, the best estimates from each climate projection are all negative values. However, in both the near future and far future, these best estimates are much higher than the lower bound of the 95% uncertainty range. This suggests that considering both uncertainty sources will lead to more extremely low yield projections.

Considering model parameter uncertainty widens the probability density function of the yield projections (red lines in Fig 3). Similarly, considering climate forcing uncertainties produces roughly similar distributions (green lines in Fig 3). Considering both uncertainty sources extends the lower tails even further and results in yield distributions with negative skewness (blue lines in Fig 3).

Model parameter uncertainty explains more of the uncertainty in yield projections, measured as standard deviation, than climate forcing uncertainty (Fig 4). The importance of the parameter uncertainty increases in the far future. In the near future, the stage with only climate forcing uncertainty explains around 66% of total uncertainty, while the stage with only parameter uncertainty explains around 83% of total uncertainty. In the far future, the stage with only climate forcing uncertainty explains around 53% of total uncertainty while the stage with only parameter uncertainty explains around 95% of total uncertainty.

The uncertainties are measured in yield anomaly standard deviations. The two panels show the two periods. **a**: 2020–2049 **b**: 2070–2099. The percentages are the proportion of the uncertainty accumulated up to each stage.

## 5. Discussion

We expand on a well-studied approach to project changes in maize yields to analyze the effects and the relative importance of model parameters and climate uncertainties [4]. The pre-calibration approach provides a conceptually easy approach to analyze the effects of parameter uncertainty. However, it is computationally very demanding. In our case with 13 parameters, the acceptance rate is about one in a million using the current sampling range. The heat map suggests that the prior range is still not wide enough because samples can still be accepted near the boundaries (S1 and S2 Figs). Another potential concern is that some accepted samples project rather extreme yield anomalies in the far future. It is possible for a model to pass the pre-calibration test but project extreme yields under an extreme climate beyond historical climate (S7 Fig). This points to potential problems with the statistical model approach.

We use a cumulative uncertainty decomposition method to quantify the relative importance of each uncertainty source. However, the decomposition result depends on the measure of uncertainty and the order of uncertainty sources to add. The common approach starts with one source and adds another at each stage. Then the contribution of a particular source is the difference of cumulative uncertainties between two successive stages with and without this source. In the results for 2020–2049, if we choose to start with climate forcing uncertainty and then add parameter uncertainty, we would conclude that climate forcing uncertainty explains 66% of total uncertainty and the addition of parameter uncertainty explains 100%-66% = 34% of total uncertainty. If we start with parameter uncertainty and then add climate forcing uncertainty, we would conclude that parameter uncertainty explains 83% of total uncertainty while the addition of climate forcing uncertainty explains 100%-83% = 17% of total uncertainty (Fig 4). The results will also be different if we use another measure of uncertainty such as the range of the distribution. We simply list the uncertainty explained by each stage. More refined variance-based uncertainty decomposition methods are available to quantify the relative importance of each uncertainty source [27]. Sobol’s method considers all the uncertainty sources simultaneously and calculates the variance explained by each individual source as well as each interaction between multiple sources.
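To illustrate the variance-based alternative mentioned above, the sketch below computes first-order variance shares and the interaction share on a full grid of projections. It is not part of the original analysis: the table `yields[c][p]` is hypothetical, every grid cell is treated as equally likely, and practical Sobol analyses usually rely on sampling-based estimators rather than a complete grid.

```python
import statistics


def variance_shares(yields):
    """First-order and interaction shares of variance on a full grid.

    yields[c][p]: yield anomaly for climate forcing c and parameter sample p.
    The first-order share of a source is Var(E[Y | source]) / Var(Y).
    """
    n_climate, n_param = len(yields), len(yields[0])
    total_var = statistics.pvariance([v for row in yields for v in row])
    climate_means = [statistics.mean(row) for row in yields]
    param_means = [statistics.mean(yields[c][p] for c in range(n_climate))
                   for p in range(n_param)]
    climate_share = statistics.pvariance(climate_means) / total_var
    param_share = statistics.pvariance(param_means) / total_var
    return {
        "climate": climate_share,
        "parameter": param_share,
        "interaction": 1.0 - climate_share - param_share,
    }


# Hypothetical 2 x 2 table with a non-additive response.
result = variance_shares([[0.0, 2.0], [4.0, 10.0]])
print(result)
```

Unlike the stage-wise percentages, these shares do not depend on the order in which the sources are added, and they sum to one together with the interaction term.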

## 6. Caveats and limitations

We chose our analysis framework for its conceptual simplicity. This simplicity, of course, requires several simplifying assumptions that lead to caveats. Here we mention four examples of these caveats. First, we only adopt a simple statistical model with linear and quadratic terms and neglect key aspects of structural model uncertainty. Second, we only consider a high forcing scenario for climate projection (RCP8.5). Including more forcing scenarios will likely expand the uncertainty in climate forcing. Third, we simply treat the 24 states as a whole and produce mean projections for the entire region. We further assume that the statistical relationship and growing area in each county will remain the same in the future. In reality, the adaptation of crops and technology development might change the weather-yield relationship greatly [11]. Last but not least, we adopt a very simple statistical approach. For example, we use a very simple acceptance criterion that does not consider the spatial correlation of yield residuals (S8 Fig). The yield anomalies usually have strong spatio-temporal patterns; thus, models that can capture and simulate these patterns might be considered more realistic.

## 7. Conclusion

Crop yields are sensitive to climate change. Many studies use statistical models to simulate the weather-yield relationship and estimate the yield projection under climate change. However, previous studies have often been silent on the effects and relative importance of the deep uncertainty surrounding model parameters and climate forcings. We identify important uncertainties in model parameters and climate forcings surrounding yield projections. We incorporate these two uncertainty sources using a statistical approach and apply a simple evaluation method to rank their relative importance. We find that considering these uncertainty sources leads to a yield projection with a wider range, larger variance, and a longer tail of low yield outcomes. By comparing the marginal cumulative uncertainty when considering different uncertainty sources, we conclude that model parameter uncertainty explains more uncertainty than sampled climate forcing uncertainty. Our study can help to inform climate impact assessments and the design of strategies to improve these assessments.

## Supporting information

### S1 Fig. Heat maps of accepted linear parameter samples based on the full model from the pre-calibration analysis.

The black dot is the best estimate based on the full model (Eq 4). The colors illustrate the probability density of the parameters with red area denoting higher and blue area denoting lower probability densities.

https://doi.org/10.1371/journal.pone.0259180.s001

(TIF)

### S2 Fig. Heat map of accepted GDD and EDD parameter samples.

This figure is a zoomed-in panel of S1 Fig. The black dot is the best estimate based on the full model. The best estimate is not necessarily located in the highest density region of the accepted pre-calibration samples.

https://doi.org/10.1371/journal.pone.0259180.s002

(TIF)

### S3 Fig. Convergence of the pre-calibration sampling approach.

Shown are the mean and standard deviation of the far future yield projection (the blue PDF in Fig 3b) as a function of the number of accepted pre-calibration samples. The solid line represents the mean, and the dashed line represents the standard deviation. Both lines stabilize after around 5,000 samples.

https://doi.org/10.1371/journal.pone.0259180.s003

(TIF)

### S4 Fig. Comparison of temperature distribution in far future under linear shifted climate and downscaled climate projection.

Here we pick the MIROC5 model projection from MACAv2-METDATA (the red histograms) [18, 20]. The linearly shifted climate underestimates the high temperatures and overestimates the low temperatures. On the box-whisker plots, the vertical black lines are the temperature medians (50th percentile), the two ends of the box are the 25th and 75th percentile temperatures, and the black points are outliers that lie more than 1.5 times the interquartile range (the width of the box) from the box.

https://doi.org/10.1371/journal.pone.0259180.s004

(TIF)

### S5 Fig. Comparison of extreme degree day (EDD) distribution in far future under linear shifted climate and downscaled climate projection.

The box-whisker plots are the same as in S4 Fig except that they are for EDD instead of temperature.

https://doi.org/10.1371/journal.pone.0259180.s005

(TIF)

### S6 Fig. Marginal distributions of each parameter.

The black lines are the parameter distributions based on the linear regression result for the best model with the least cross-validation errors. This model does not include the quadratic EDD term and Tmax terms so instead there is a black dashed line at zero in these panels. The red lines are the parameter distributions from the accepted pre-calibration samples. The range of x-axis in each panel is the wide prior range of each parameter.

https://doi.org/10.1371/journal.pone.0259180.s006

(TIF)

### S7 Fig. The full yield anomaly projections uncertainty range.

This plot is the same as Fig 2 but shows the full uncertainty range of the yield projections instead of the 95% uncertainty range.

https://doi.org/10.1371/journal.pone.0259180.s007

(TIF)

### S8 Fig. County-level yield residuals of the model hindcast.

The yield residuals have strong spatial patterns that vary from year to year. We plot the residual map for 1983, the year with the most observational data. In future studies, we plan to use spatial models to better account for these spatial patterns.

https://doi.org/10.1371/journal.pone.0259180.s008

(TIF)

### S9 Fig. The predictive skill of a model using only 32 years of data.

In an update, we added eight more years of observational data (1979, 1980, and 2013–2018). We use these data to test the predictive skill of the old model, which uses 32 years of data. The hindcasts from the old model (blue circles) are close to the hindcasts of the updated model (red line) and to the observations (black dots).

https://doi.org/10.1371/journal.pone.0259180.s009

(TIF)

## Acknowledgments

We thank the Program of Coupled Human and Earth Systems (PCHES) for its support, and the PCHES researchers and the Keller research group for discussions. Special thanks go to Ryan Sriver and Wolfram Schlenker for discussions about this study and previous work; Murali Haran for his suggestions and advice on this project; Vivek Srikrishnan and Ben Seiyon Lee for their help with statistics and coding; and Mathew Lisk for his help with code checking, packaging, and the reproducibility check. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding entities.

## References

- 1. Luber G, McGeehin M. Climate change and extreme heat events. Am J Prev Med. 2008;35: 429–435. pmid:18929969
- 2. Lobell DB, Burke MB, Tebaldi C, Mastrandrea MD, Falcon WP, Naylor RL. Prioritizing climate change adaptation needs for food security in 2030. Science. 2008;319: 607–610. pmid:18239122
- 3. Mendelsohn R, Nordhaus WD, Shaw D. The Impact of Global Warming on Agriculture: A Ricardian Analysis. Am Econ Rev. 1994;84: 753–771. Available: http://www.jstor.org/stable/2118029
- 4. Schlenker W, Roberts MJ. Nonlinear temperature effects indicate severe damages to U.S. crop yields under climate change. Proc Natl Acad Sci U S A. 2009;106: 15594–15598. pmid:19717432
- 5. Wuebbles DJ, Easterling DR, Hayhoe K, Knutson T, Kopp RE, Kossin JP, et al. Our globally changing climate. In: Climate Science Special Report: Fourth National Climate Assessment, Volume I. U.S. Global Change Research Program; 2017. pp. 35–72.
- 6. Lobell DB, Burke MB. On the use of statistical models to predict crop yield responses to climate change. Agric For Meteorol. 2010;150: 1443–1452.
- 7. Roberts MJ, Braun NO, Sinclair TR, Lobell DB, Schlenker W. Comparing and combining process-based crop models and statistical models with some implications for climate change. Environmental Research Letters. 2017. p. 095010.
- 8. Lesk C, Coffel E, Horton R. Net benefits to US soy and maize yields from intensifying hourly rainfall. Nat Clim Chang. 2020.
- 9. McKinnon KA, Rhines A, Tingley MP, Huybers P. The changing shape of Northern Hemisphere summer temperature distributions. J Geophys Res D: Atmos. 2016;121: 8849–8868.
- 10. Burke M, Dykema J, Lobell DB, Miguel E, Satyanath S. Incorporating Climate Uncertainty into Estimates of Climate Change Impacts. Rev Econ Stat. 2015;97: 461–471.
- 11. Keane MP, Neal T. The Impact of Climate Change on US Agriculture: The Roles of Adaptation Techniques and Emissions Reductions. UNSW Business School Research Paper. 2018.
- 12. Hoffman AL, Kemanian AR, Forest CE. The response of maize, sorghum, and soybean yield to growing-phase climate revealed with machine learning. Environ Res Lett. 2020;15: 094013.
- 13. Edwards NR, Cameron D, Rougier J. Precalibrating an intermediate complexity climate model. Clim Dyn. 2011;37: 1469–1482.
- 14. Knutti R, Stocker TF, Joos F, Plattner G-K. Constraints on radiative forcing and future climate change from observations and climate model ensembles. Nature. 2002;416: 719–723. pmid:11961550
- 15. Kim Y, Ohn I, Lee J-K, Kim Y-O. Generalizing uncertainty decomposition theory in climate change impact assessments. Journal of Hydrology X. 2019;3: 100024.
- 16. Crop Production Historical Track Records (April 2018). USDA, National Agricultural Statistics Service. Available: http://quickstats.nass.usda.gov
- 17. Abatzoglou JT. Development of gridded surface meteorological data for ecological applications and modelling. Int J Climatol. 2013;33: 121–131.
- 18. Abatzoglou JT, Brown TJ. A comparison of statistical downscaling methods suited for wildfire applications. Int J Climatol. 2012;32: 772–780.
- 19. Moss RH, Edmonds JA, Hibbard KA, Manning MR, Rose SK, van Vuuren DP, et al. The next generation of scenarios for climate change research and assessment. Nature. 2010;463: 747–756. pmid:20148028
- 20. Taylor KE, Stouffer RJ, Meehl GA. An Overview of CMIP5 and the Experiment Design. Bull Am Meteorol Soc. 2012;93: 485–498.
- 21. Hay LE, Wilby RL, Leavesley GH. A comparison of delta change and downscaled GCM scenarios for three mountainous basins in the United States. J Am Water Resour Assoc. 2000;36: 387–397.
- 22. Wang B, Zheng L, Liu DL, Ji F, Clark A, Yu Q. Using multi-model ensembles of CMIP5 global climate models to reproduce observed monthly rainfall and temperature with machine learning methods in Australia. Int J Climatol. 2018;38: 4891–4902.
- 23. McMaster GS, Wilhelm WW. Growing degree-days: one equation, two interpretations. Agric For Meteorol. 1997;87: 291–300.
- 24. McKay MD, Beckman RJ, Conover WJ. Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output from a Computer Code. Technometrics. 1979;21: 239–245.
- 25. Chang W, Haran M, Olson R, Keller K. A composite likelihood approach to computer model calibration with high-dimensional spatial data. Stat Sin. 2015;25: 243–259. Available: http://www.jstor.org/stable/24311014
- 26. Saltelli A, Ratto M, Andres T, Campolongo F, Cariboni J, Gatelli D. Global Sensitivity Analysis: The Primer. John Wiley & Sons; 2008.
- 27. Sobol’ IM. Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Math Comput Simul. 2001;55: 271–280.