
Pangeo-Enabled ESM Pattern Scaling (PEEPS): A customizable dataset of emulated Earth System Model output

  • Ben Kravitz,

    Contributed equally to this work with: Ben Kravitz, Abigail Snyder

    Roles Conceptualization, Methodology, Validation, Writing – original draft, Writing – review & editing

    Affiliations Department of Earth and Atmospheric Sciences, Indiana University, Bloomington, IN, United States of America, Atmospheric Sciences and Global Change Division, Pacific Northwest National Laboratory, Richland, WA, United States of America

  • Abigail Snyder

    Contributed equally to this work with: Ben Kravitz, Abigail Snyder

    Roles Conceptualization, Methodology, Software, Writing – original draft, Writing – review & editing

    Affiliation Joint Global Change Research Institute, Pacific Northwest National Laboratory, College Park, MD, United States of America


Emulation through pattern scaling is a well-established method of rapidly producing climate fields (such as temperature or precipitation) from existing Earth System Model (ESM) output that, while inaccurate, is often useful for a variety of downstream purposes. Conducting pattern scaling has historically been a laborious process, in large part because the growing volume of ESM output data has often had to be downloaded and stored locally for training. Here we describe the Pangeo-Enabled ESM Pattern Scaling (PEEPS) dataset, a repository of trained annual and monthly patterns from CMIP6 outputs. This manuscript describes and validates these updated patterns so that users can save the effort of calculating and reporting error statistics in manuscripts focused on the use of patterns. The trained patterns are available as NetCDF files on Zenodo for ease of use in the impacts community, and are reproducible with the code provided via GitHub in both Jupyter notebook and Python script formats. Because all training data for the PEEPS data set are cloud-based, users do not need to download and house the ESM output data to reproduce the patterns in the Zenodo archive, should that be more efficient. Validated on the CMIP6 archive for annual and monthly temperature, precipitation, and near-surface relative humidity, pattern scaling performs well over a variety of future scenarios except in regions with strong, potentially nonlinear climate feedbacks. Although pattern scaling is normally conducted on annual mean ESM output data, it works equally well on monthly mean ESM output data. We identify several downstream applications of the PEEPS data set, including impacts assessment and evaluating certain types of Earth system uncertainties.


Earth System Models (ESMs) are a standard tool for projecting future climate changes in response to forcing. The Coupled Model Intercomparison Project (CMIP) leads this effort and is one of the largest activities in Earth system science [1]. Nevertheless, ESMs are computationally expensive, which limits the research community’s ability to explore various kinds of uncertainties. For example, only a limited number of future scenarios are feasible to explore, especially in the context of a multi-model intercomparison [2]. Also, exploring internal variability or rare events requires large ensembles of simulations (e.g., [3]) which are similarly computationally prohibitive for many modeling groups and are difficult to explore for a wide variety of scenarios.

Climate model emulators are simple tools, trained on ESM output data, that reproduce the output of those ESMs, sometimes at reduced spatial or temporal resolution, often at a tiny fraction of the computational cost. There are numerous approaches to producing emulators. Some are built entirely from the ESM output based on either single (e.g., [4, 5]) or multiple [6–8] models or scenarios, depending upon the application. Others assume an underlying functional form of the gross Earth system behavior, such as an impulse response model [9, 10] or a simple climate model (e.g., [11]), such as those participating in the Reduced Complexity Model Intercomparison Project (RCMIP) [12], or they train a deep learning model on ESM output data to learn the underlying functional form [13, 14]. Emulators can be used to produce a variety of outputs, such as augmenting ensembles of simulations [7, 15], producing scenarios that were not simulated by the ESMs [16, 17], or estimating errors in computed indices of extreme events [18].

To emulate regional impacts, one needs a way of relating global changes to changes on finer scales. A simple, commonly used way of doing this is via pattern scaling, in which the spatial pattern of change is assumed to scale linearly with global mean temperature [19, 20]. While technically inaccurate, pattern scaling has been shown to be reasonably effective at reproducing large-scale mean field CMIP5 behavior for temperature and precipitation over a range of scenarios [4, 5] and for emulating indices of extreme events [18].

Lynch et al. [5] built a pattern scaling archive for available CMIP5 output that can be used by a variety of stakeholders, such as impacts modelers [21] and multisectoral dynamics modeling efforts exploring the co-evolution of the integrated human-Earth system [22]. Using the options available to them, Lynch et al. downloaded climate model output from 12 of the 41 available models for the historical period and four future scenarios (the Representative Concentration Pathways) [23], processed the output locally to create a database of patterns, and then hosted that database in a repository. Put simply, this required a lot of time and labor. CMIP6 has over 100 participating models running even more scenarios than CMIP5, and many models that previously participated in CMIP5 have undergone improvements, including increased spatial resolution, making the individual training data files even larger. The approach undertaken by Lynch et al. is not sustainable.

Here we introduce the Pangeo-Enabled ESM Pattern Scaling (PEEPS) data set: trained patterns updated to use CMIP6 ESM output training data and extended to novel variables (relative humidity) and time scales (monthly climatologies), and covering more ESMs. The PEEPS data set described and validated in this manuscript is reproducible via either a Jupyter notebook or Python script to conduct the pattern scaling. The burden of providing this trained data set is greatly reduced compared to Lynch et al.’s CMIP5-era approach because it relies on cloud hosting and data access to CMIP6 data provided by the Pangeo community [24]. By interfacing with Pangeo, no downloading and storing of ESM output data for training is required, which substantially reduces the effort to continue providing updated, validated trained patterns in the literature. We host the files examined here in a Zenodo repository [25] for direct download by users uninterested in running Python code themselves. These NetCDF files are much smaller than the actual ESM outputs used to train. Because of Pangeo, the dataset can also be rapidly recreated with the Jupyter notebook or Python script available in the GitHub repository, offering more flexibility to researchers for accessing these patterns in the way that works best for them. This both improves the reproducibility of the data set described here, and also provides a head start for those interested in customizing the notebook to explore and independently validate other patterns than those described here.

In this manuscript we document the PEEPS data set of CMIP6 patterns for the research community. Section 2 describes the pattern scaling method and ESM output data that we analyzed. Section 3 produces a validation of the PEEPS data sets (trained patterns) for the CMIP6 archive, exploring scaling on both annual mean and monthly mean output. Section 4 describes the code that produces the PEEPS data set, including options, formatting of the output files, and performance. Section 5 provides a discussion of the potential uses of this package.


There are numerous methods of conducting pattern scaling, including calculating a linear change from the underlying data [4, 7, 20] or assuming an underlying functional form [26] or model relating global temperature to local changes. Additional methods include using principal component analysis to represent spatial variability [27] or nonlinear methods involving machine learning [28, 29]. Here we conduct ordinary least squares linear regression (using the Python package sklearn) in each grid cell of each ESM individually to obtain a slope (m(x)) and an intercept (b(x)) value for each model and each scenario separately, where x indicates the spatial dimension. The emulated (pattern scaled) output at any grid point (T(x, t)) at any time t is then obtained from the global mean temperature T̄(t) via

T(x, t) = m(x) T̄(t) + b(x). (1)

Thus the training data for this emulation are the grid cell values for a specific ESM, scenario, and variable, with the corresponding global average temperature serving as the predictor. (In all subsequent figures showing spatial aggregates, grid point pattern scaling was conducted first, and then these results were aggregated.) If there are multiple ensemble members, the regression is computed for each ensemble member individually, and the reported slope and intercept is the ensemble average. Fig 1 shows a sample of pattern scaling for precipitation.

Fig 1. An illustrative example of pattern scaling for annual mean precipitation in a single model (in this case, CESM2-WACCM) over the historical period.

Top left shows the slopes (precipitation per degree global average temperature change) of the regression lines at each grid point (mm day−1 K−1). Top right shows the intercepts of the regression lines (mm day−1). Bottom left shows the generated precipitation (P(x)) time series calculated as P(x) = m(x) T̄ + b(x), where m(x) is the slope, T̄ is global mean temperature averaged over the last 20 years of simulation (for the historical period 1995–2014), b(x) is the intercept, and x is the spatial dimension. Bottom right shows the residual (mm day−1) of generated minus actual model output, again averaged over the last 20 years of simulation.
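The per-grid-cell regression of Eq 1 can be sketched as follows. This is a minimal illustration on synthetic data: the PEEPS code itself uses sklearn's LinearRegression on CMIP6 output accessed via Pangeo, but an ordinary least squares fit with numpy is mathematically equivalent, and all array shapes and values here are invented for the example.

```python
import numpy as np

# Minimal sketch of per-grid-cell pattern scaling (Eq 1) on a
# synthetic field; shapes and values are illustrative only.
rng = np.random.default_rng(0)
nt, ny, nx = 100, 4, 5

tgav = np.linspace(287.0, 291.0, nt)          # global mean temperature (K)
true_m = rng.normal(1.0, 0.3, size=(ny, nx))  # "true" local slopes
true_b = rng.normal(0.0, 2.0, size=(ny, nx))  # "true" local intercepts

# Synthetic local field: linear in tgav plus weak noise
field = true_m * tgav[:, None, None] + true_b \
    + 0.01 * rng.standard_normal((nt, ny, nx))

# Design matrix [tgav, 1]; solve OLS for every grid cell at once
A = np.column_stack([tgav, np.ones(nt)])                       # (nt, 2)
coef, *_ = np.linalg.lstsq(A, field.reshape(nt, -1), rcond=None)
m = coef[0].reshape(ny, nx)   # slope pattern m(x)
b = coef[1].reshape(ny, nx)   # intercept pattern b(x)

# Emulated field at any time: T(x, t) = m(x) * tgav(t) + b(x)
emulated = m * tgav[:, None, None] + b
residual = field - emulated
```

Fitting per ensemble member and averaging the resulting slopes and intercepts, as described above, is a straightforward loop over this same calculation.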

Pattern scaling is often conducted using annual mean climate model output; here we additionally explore pattern scaling using monthly mean output. We explore errors at both of these temporal resolutions for surface air temperature (tas), precipitation (pr), and relative humidity (hurs). The Pangeo architecture enables analysis of as many other variables and scenarios as are available in the CMIP6 archive (and, likely, archives for future projects). Table 1 describes the number of models we evaluated for each scenario/variable combination. The scenarios we explore are the latest future projections used in CMIP [2] and are the subject of in-depth discussions in the latest assessment report by the Intergovernmental Panel on Climate Change [30].

Table 1. The number of models available via Pangeo for each scenario/variable combination.

60 models had their output available in Pangeo at the time of our analysis.


Annual mean pattern scaling

As a metric of validation, Figs 2–4 show a measure of how large the annual mean residual is compared to the underlying data. At each grid point, for each model and scenario, the average residual over the last 20 years of simulation (1995–2014 for historical; 2081–2100 for the others) is computed. This value is then compared to the standard deviation of the model output over that 20-year period for each model and scenario. We chose a 20-year period as broadly representative of a period that would average out most modes of internal variability; further work could test different averaging periods, particularly for monthly means, which would require shorter periods of averaging. Values in Figs 2–4 indicate the percent of models for each scenario for which the residual is within one standard deviation of the model output. The standard deviation is computed assuming each year within that 20-year period is independent, which is an erroneous assumption that results in a more conservative estimate (smaller standard deviation).
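The validation criterion above can be sketched on a toy series; all numbers here are illustrative stand-ins (a linear trend plus noise for the "ESM output", and a deliberately biased reconstruction for the "emulation"), not values from the PEEPS data.

```python
import numpy as np

# Sketch of the validation metric: compare the mean residual over the
# last 20 "years" to the standard deviation of the model output over
# the same window (years treated as independent, as in the text).
rng = np.random.default_rng(1)
years = np.arange(1951, 2015)
trend = 0.02 * (years - years[0])
output = trend + rng.standard_normal(years.size)   # toy ESM output
emulated = output - 0.1                            # toy emulation, biased low by 0.1

last20 = slice(-20, None)                          # last 20 years of simulation
mean_residual = np.mean(emulated[last20] - output[last20])  # equals -0.1 here
sigma = np.std(output[last20], ddof=1)             # SD of model output over the window

within_one_sd = abs(mean_residual) <= sigma        # the criterion mapped in Figs 2-4
```

Aggregating this boolean across models at each grid point yields the percentages mapped in Figs 2–4.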

Fig 2. In each grid box, percentage of models where the surface air temperature (tas) residual for annual mean pattern scaling is within one standard deviation of the model output.

S1–S3 Figs show versions of Figs 2–4 where, instead of using the mean residual (i.e., bias), the comparison is to the root-mean-square error (RMSE) of the residual. We primarily focus on the mean residual in this study, as that is a commonly used metric and a good determinant of how well pattern scaling performs on climatological scales [18, 31]. A drawback of using the mean residual is that it can involve compensating errors, although large errors in bias also lead to large errors in RMSE [32]. Conversely, RMSE can describe how the errors in pattern scaling compare to natural variability, with the drawback that the RMSE results tend to be biased toward short-lived deviations (i.e., it penalizes outliers). This is indeed borne out in the results, as comparisons of Figs 2–4 and S1–S3 Figs largely differ over areas where there is more variability (e.g., the midlatitudes for temperature and the tropics for precipitation). Nevertheless, outside of SSP4–3.4 (for which there are few models), at least half of the models do well in all scenarios, especially over land.

As has been found in other studies performing the same methodology (e.g., [7]), for the most part, pattern scaling appears to be an effective means of replicating the local trend of annual mean climate model output for a wide variety of models and scenarios, with many regions showing that for 90+% of the models the residuals are within one standard deviation of the baseline for each model. Fig 5 further illustrates this, showing the number of scenarios for which the panels displayed in Figs 2–4 are at least 90%. For surface air temperature, 85.6% of all area (96.3% of land area) has all six scenarios meeting this criterion. For precipitation the analogous figures are 98.5% of all area (98.9% of land area), and for relative humidity 78.4% of all area (78.5% of land area). Relative humidity shows worse performance than the other two variables, as well as large inter-scenario differences. (S4 Fig shows a version of Fig 5 but computed using the RMSE metric. As expected, performance is substantially worse, as RMSE on the residuals tends to amplify deviations on sub-decadal timescales. Nevertheless, based on the results in S1–S3 Figs, choosing a cutoff value lower than 90% would undoubtedly improve the appearance of the results, again particularly over land.) S5–S8 Figs replicate Figs 2–5, respectively, but are calculated using a pooled variance across all models instead of the variances of the individual models; as might be expected, fewer values fall outside of the one standard deviation range in the pooled variance computations.

Fig 5. An aggregate of Figs 2–4 illustrating for each grid point the number of scenarios for which the value (percent of models within one standard deviation of baseline) is at least 90%.

There are regions where, for some scenarios and some models, pattern scaling introduces error. This could be due to a few reasons:

  1. Pattern scaling doesn’t work for that region. This could be due to some response in the climate system that results in a nonlinear relationship between global mean temperature and the local response (for example, feedbacks leading to Arctic Amplification); nonmonotonicity in global mean temperature (for example, ssp126 has an overshoot, which could affect regressions); or a low trend in global mean temperature (again, possible under ssp126) resulting in a poor linear fit. There could also be slowly responding elements of the climate system that result in different transient states or a different steady state (e.g., land-ocean contrast evolution).
  2. The baseline may have low variability. This would result in a greater probability of exceeding one standard deviation.
  3. There is a low number of models (as is the case for ssp434), so having a high residual for even one model can result in a large change in Figs 2–4.

Figs 6 and 7 provide further insight into potential sources of error in pattern scaling for the three variables considered here. For all variables in all scenarios, the inter-quartile range never exceeds 0.5 standard deviations, indicating that any errors tend to be due to a small subset of models rather than general features of pattern scaling. Among the models that fall outside the inter-quartile range, relative humidity tends to have greater error than the other two variables, and high latitudes tend to have more error than other regions. Historical and ssp245 tend to have the least error, ssp585 has the most, and ssp434 has too few models to support a robust comparison with the other scenarios. The models with the greatest error vary across variables and scenarios; that is, no group of models performs poorly in all cases. If a model/scenario/variable combination has high error in one spatial region, it tends to have high error in the other regions; based on the results in Figs 2–5, this result is likely dominated by local features with high residuals. Nevertheless, there are few values in Fig 7 that exceed one standard deviation, and they are almost entirely found in ssp126, ssp370, and ssp585. (S9 and S10 Figs replicate Figs 6 and 7, respectively, but are computed using the pooled variance instead of the individual model variances.) Further spatial analysis (not pictured) indicates that the mean residuals on a gridpoint basis are indeed quite small, further reinforcing that on average pattern scaling does well except for a few outliers. A notable exception is tropical precipitation, which is a known difficulty for pattern scaling due to nonlinear behavior of intense precipitation [4].

Fig 6. Box plot of root mean square error (RMSE) of pattern scaling, calculated as the number of standard deviations the generated output is from the actual model output (calculated over the last 20 years of simulation), for each scenario (panels) and for temperature (tas), precipitation (pr), and relative humidity (hurs) in a variety of regions.

Red lines indicate the median model, blue boxes indicate the inter-quartile range, and whiskers indicate the full model range. Because so few models participated in ssp434, we show the RMSE values for each model.

Fig 7. Heatmap of the number of standard deviations (colors) the generated output is from the actual model output for each model in each scenario (panels; calculated from the last 20 years of each simulation) for temperature (tas), precipitation (pr), and relative humidity (hurs) in a variety of regions.

White squares (marked by NaN) indicate that there is no model output available for that model/variable combination on Pangeo.

Figs 2–5 have some areas where pattern scaling performance is consistently worse than others. In addition to the tropics, these areas include the North Atlantic, the high latitudes (predominantly the Arctic), the Southern Ocean, and oceanic areas associated with eastern boundary currents. These are all areas associated with feedback-dominated behavior where pattern scaling might not be expected to perform well: the “warming hole” in the North Atlantic associated with the Atlantic Meridional Overturning Circulation and cloud feedbacks [33]; Arctic amplification associated with strong feedbacks like the ice albedo feedback, lapse rate feedback, and changes in atmospheric and oceanic heat transport [34, 35]; cloud feedbacks in the Southern Ocean [36]; and persistent marine stratocumulus decks off the western coasts of continents [37]. Regarding the Atlantic Meridional Overturning Circulation, the Southern Ocean, and marine stratocumulus decks, these areas are not over land and so are not directly relevant to many impacts models, for example agriculture or hydrological models. While these regions are important in general, one would not presume that pattern scaling is an effective tool for studying these sorts of complex processes and feedbacks, so it could be argued that pattern scaling performance in these regions is less important. The high latitudes are important for many impacts studies, notably sea level rise; due to substantial uncertainties in feedback strength at the high latitudes resulting in large model spread [38], we urge caution in using this data set (or pattern scaling in general) to evaluate impacts of high latitude change. Figs 2–5 do indicate, however, that even at these latitudes, there are many ESMs amenable to pattern scaling in many scenarios for many variables.

The regression approach undertaken here is not well suited to capturing interannual variability (e.g., the El Niño Southern Oscillation or the North Atlantic Oscillation). Regions strongly affected by interannual variability are unlikely to show major sources of error if the oscillation period is substantially smaller than 20 years (the averaging period of our results). If the oscillation changes such that one mode becomes more dominant than the other, the regression should be able to capture those changes, similarly resulting in low error. A potential caveat is if the oscillation has a longer period than can be captured in the 20-year average (such as the Pacific Decadal Oscillation) [39].

For temperature, pattern scaling on ssp126 is worse-performing than the other scenarios (Figs 6 and 7), and for precipitation and relative humidity, pattern scaling on the high forcing scenarios (ssp370 and ssp585) is worse than for the others. ssp126 has little global mean temperature change, and what change it does have is nonmonotonic [6], so it is difficult to obtain a confident regression slope. Under ssp370 and ssp585, there is greater excitation of temperature-related feedbacks, which is more likely to result in behavior that cannot be captured by linear regression.

Monthly mean pattern scaling

Most pattern scaling is conducted using annual mean variables, but climate model output is often available as monthly averages, and monthly averages are often more useful in many impacts models. This raises the question of whether pattern scaling on monthly output will work. There is strong reason to believe that it would, as the seasonal cycle is by far the most dominant source of variability in monthly output, and removing that cycle is standard procedure for creating climatologies on which residuals are calculated.
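Concretely, monthly pattern scaling amounts to fitting Eq 1 separately for each calendar month, so that each month gets its own slope and intercept and the seasonal cycle is absorbed into the intercepts. The sketch below illustrates this on a synthetic series; all values are invented for the example.

```python
import numpy as np

# Sketch of monthly pattern scaling: fit Eq 1 once per calendar month.
rng = np.random.default_rng(2)
n_years = 50
months = np.tile(np.arange(12), n_years)       # calendar month index, 0-11

tgav_annual = np.linspace(287.0, 290.0, n_years)
tgav = np.repeat(tgav_annual, 12)              # monthly predictor (K)

seasonal = 10.0 * np.sin(2 * np.pi * months / 12)   # strong seasonal cycle
local = 1.2 * tgav + seasonal + 0.1 * rng.standard_normal(tgav.size)

slopes, intercepts = np.empty(12), np.empty(12)
for m in range(12):
    sel = months == m                          # every January, every February, ...
    slopes[m], intercepts[m] = np.polyfit(tgav[sel], local[sel], 1)
```

Each month's regression sees only that month's values, so the seasonal offset lands in b(x) for that month while the slopes recover the common sensitivity to global mean temperature.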

Figs 8–10 show a monthly climatology of residuals calculated over six regions of the globe. (S11–S13 Figs are the pooled variance versions, respectively.) For most models and scenarios, monthly residuals are small and do not show strong seasonal variations. High latitude temperature is a notable exception, although the mean model/scenario residual is approximately zero. For precipitation and relative humidity, a few models have residuals that separate from the model pack but with few notable seasonal characteristics, indicating that the higher residuals are due to the model/scenario combination itself rather than anything intrinsic to monthly pattern scaling. S14 Fig shows a histogram comparing the residual for annual pattern scaling with the averaged residuals for monthly pattern scaling. For all three variables, the differences are several orders of magnitude smaller than the residual fields and are approximately normally distributed. This confirms our hypothesis that pattern scaling on monthly output does indeed work and, in many cases, is indistinguishable in performance from annual mean pattern scaling. S15–S35 Figs show spatial patterns of the residuals, analogous to Figs 2–5. (S36–S38 Figs show the pooled variance versions of S33–S35 Figs, respectively.) For temperature there are few obvious differences between monthly residuals, and the patterns of success in pattern scaling reflect those of annual mean pattern scaling. There are some seasonal shifts in the residuals for precipitation based on seasonal variations in tropical precipitation; again, this is expected [4]. For relative humidity, differences between months are small and, like temperature, resemble the annual mean residual patterns.

Fig 8. Climatology of monthly climatology surface air temperature (°C) residuals (generated output minus actual model output) calculated over six different regions of the globe.

Each line indicates a model/scenario combination (values shown are averages over the last 20 years of simulation), and the thick black line indicates the average over all models/scenarios. x-axis indicates the month of the climatology.

Data availability: Repository and code for reproducibility

A frozen version of the PEEPS data set and the Python code used for generating it is available for download from Zenodo, doi:10.5281/zenodo.7557622 (Version 1.1). (This has been tested for Python versions 3.7–3.11.) An institutional repository of the code, which will incorporate future updates such as possibly converting the code into a Python package with integrated testing, is available at

For each model/scenario/variable combination, the annual mean pattern scaling code produces up to three output files, depending upon the options selected:

  1. The patterns for pattern scaling: a NetCDF4 file containing the slope and intercept from the regression.
  2. A NetCDF4 file containing the timeseries of global average temperature.
  3. [optional] A NetCDF4 file containing the residuals from pattern scaling: a timeseries of ESM output data minus the reconstruction from pattern scaling at each grid point.

If the option to produce monthly pattern scaling is selected, items 1 and 3 will produce 12 files each, one per month. If a particular combination has multiple ensemble members, the code will average all ensemble members and then conduct pattern scaling on the average. All of these data are available at the Zenodo repository.
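Applying a downloaded pattern amounts to broadcasting Eq 1 over a global mean temperature trajectory. The sketch below uses synthetic stand-in arrays for the slope and intercept; in practice these would be read from the NetCDF4 pattern file (e.g., with xarray or netCDF4), and all shapes, values, and names here are hypothetical illustrations rather than the contents of an actual PEEPS file.

```python
import numpy as np

# Stand-ins for the slope and intercept patterns that a PEEPS NetCDF4
# file provides (shapes and values invented for illustration).
ny, nx = 3, 4
slope = np.full((ny, nx), 1.1)           # m(x): local change per K of global change
intercept = np.full((ny, nx), -315.0)    # b(x)

# A hypothetical global mean temperature trajectory (K), e.g. from a
# simple climate model or a scenario the ESM never ran.
tgav = np.array([288.0, 288.5, 289.2, 290.1])

# Broadcast Eq 1 over time: the result has shape (time, lat, lon)
emulated = slope[None, :, :] * tgav[:, None, None] + intercept[None, :, :]
```

Because the pattern files are small relative to the original ESM output, this step is cheap enough to repeat across many models, scenarios, and trajectories.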

Of critical importance is that the PEEPS data set is accessible via multiple avenues (either directly from Zenodo or by reproducing the dataset via code), conforms to a standard, and is reproducible. The code that generates the PEEPS data set is written in Python 3 and is contained in a fully documented Jupyter notebook calling functions from an included file (‘’); the code for calling these functions is also included as a simple Python script. The output data files are NetCDF4 files, with a structure similar to that found in CMIP6 models (which are CMORized), including inherited metadata attributes. As a result, neither access to the data set nor to its underlying code is a barrier to use for downstream research, such as impacts modeling.

While the Python code is included primarily for documentation and reproducibility standards, it does include user-editable options should a user prefer to generate specific patterns rather than download them directly from Zenodo. In the notebook itself, the user-editable options are the list of experiments and the list of variables, formatted to CMIP6 standards. The script will then automatically generate a list of all available output (all models) with those specifications and process it for annual or monthly averages, depending upon the block of code the user is running. Within each block there are options regarding whether to fit an intercept (if false, the intercept is assumed to be zero) and whether to save the residuals (item 3 above). The repository includes a tutorial and examples.
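The user-editable options described above might be set roughly as follows. This is a hedged sketch: the option names below mirror the CMIP6 conventions the text describes, but the notebook's actual variable names may differ.

```python
# Hypothetical sketch of the user-editable options described in the
# text; the actual notebook's names may differ.
experiments = ["historical", "ssp126", "ssp245", "ssp370", "ssp585"]  # CMIP6 experiment_id values
variables = ["tas", "pr", "hurs"]    # CMIP6 variable_id values
fit_intercept = True                 # if False, the intercept is assumed to be zero
save_residuals = True                # item 3 in the output file list above

# The script then enumerates all available model output matching these
# specifications and processes annual or monthly averages.
```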

It is difficult to benchmark performance because to some degree this depends upon the user’s computer. On a 2018-era MacBook Pro laptop (2.6 GHz Intel Core i7), each scenario/variable combination completed within approximately two hours. Dask is not used as it was not necessary to manage memory for creating this data set, allowing the code to be more readable.


The PEEPS output patterns for monthly and annual climate data do not, of course, emulate any aspect of the target ESM’s internal variability; internal variability can be important for a variety of impacts, including extreme events [18, 40]. While this may be a disadvantage on decadal or shorter timescales, on longer timescales uncertainties tend to be dominated by structural uncertainties (whether models include specific processes) or scenario uncertainties [41], where mean field emulation is advantageous because it is faster to conduct and easier to calibrate. A ready example of where mean field emulation could be useful lies in the flexibility of PEEPS. Because the code is open source, users can adapt it to their needs, for example if they wish to calculate patterns based on multiple ESMs or scenarios (e.g., [6]).

Mean field emulation is also valuable for long-term impacts studies, especially if one wishes to explore the consequences of different future scenarios. As an example, an activity in the Agriculture Model Intercomparison Project (AgMIP) [42] involves using global gridded crop models to produce emulators of climatological-mean yield response to uniform perturbations in growing-season mean temperature and precipitation [43]. These yield response emulators presently do not account for variability, meaning the linear patterns presented here for monthly temperature and precipitation can be used to rapidly generate gridded monthly data that can be directly translated into impacts assessments. The areas where pattern scaling performs less well lie over ocean grid cells, which are not relevant for agricultural yields. An additional advantage of PEEPS for impacts modeling is the ability to incorporate numerous ESMs. The spatial patterns for each ESM and each scenario differ; by sampling this space, one can capture a range of uncertainties in impacts.

There have recently been efforts to create novel scenarios directly from ESM output data using time sampling [44], statistical relationships [7], time sample downscaling [8], or machine learning [45]. By comparing those results with the PEEPS data set (or pattern scaling more generally), one can understand how linear (or nonlinear) the climate model output is, providing guidance on sources of uncertainties in CMIP6 projections. Hence, having the trained, validated patterns available for direct download can save time for Earth system modelers interested in such diagnoses.

Conclusions and next steps

We confirm (unsurprisingly) that, as for CMIP5, linear pattern scaling is an effective means of obtaining global-to-local relationships in CMIP6. Also, as hypothesized, pattern scaling on monthly output works well, and for temperature and precipitation it has nearly identical performance to annual mean pattern scaling. Pattern scaling on relative humidity also works quite well for both annual and monthly means. Model performance can vary across scenarios, variables, and regions (Fig 7), indicating that when using results from pattern scaling one may want to select models that work better for one’s purposes.

As discussed in the Introduction, two of the difficulties posed by the large computational expense of ESMs are generating numerous realizations and exploring multiple scenarios. The PEEPS data could be combined with other efforts to directly address these problems in an accessible way.

A frozen version of the code and output is available for download from Zenodo, doi:10.5281/zenodo.7557622 (Version 1.1). An institutional repository of the code, which will incorporate future updates, is available at

Supporting information

S1 Fig. As in Fig 2 but computed using the RMSE of the residuals instead of the mean residual.


S2 Fig. As in Fig 3 but computed using the RMSE of the residuals instead of the mean residual.


S3 Fig. As in Fig 4 but computed using the RMSE of the residuals instead of the mean residual.


S4 Fig. As in Fig 5 but computed using the RMSE of the residuals rather than the mean residual.


S5 Fig. As in Fig 2 but computed using the pooled variance across all models rather than the variance of the individual models.


S6 Fig. As in Fig 3 but computed using the pooled variance across all models rather than the variance of the individual models.


S7 Fig. As in Fig 4 but computed using the pooled variance across all models rather than the variance of the individual models.


S8 Fig. As in Fig 5 but computed using the pooled variance across all models rather than the variance of the individual models.


S9 Fig. As in Fig 6 but computed using the pooled variance across all models rather than the variance of the individual models.


S10 Fig. As in Fig 7 but computed using the pooled variance across all models rather than the variance of the individual models.


S11 Fig. As in Fig 8 but computed using the pooled variance across all models rather than the variance of the individual models.


S12 Fig. As in Fig 9 but computed using the pooled variance across all models rather than the variance of the individual models.


S13 Fig. As in Fig 10 but computed using the pooled variance across all models rather than the variance of the individual models.


S14 Fig. A histogram (count = number of model grid points) of the difference between the residuals (generated minus actual, averaged over the last 20 years of simulation for each scenario) for annual mean pattern scaling and the average of the residuals for all 12 months of monthly pattern scaling.

Units are °C, mm/day, and %, respectively.


S15 Fig. In each grid box, the percentage of models where the surface air temperature (tas) residual in the historical simulation for monthly mean pattern scaling is within one standard deviation of the model output.

See description in Section 3.1 in the main text for further details.


S27 Fig. As in S15 Fig but for surface relative humidity.


S33 Fig. An aggregate of S15–S20 Figs illustrating for each grid point the number of scenarios for which the value (percent of models within one standard deviation of the baseline temperature) is at least 90%.


S34 Fig. An aggregate of S21–S26 Figs illustrating for each grid point the number of scenarios for which the value (percent of models within one standard deviation of the baseline precipitation) is at least 90%.


S35 Fig. An aggregate of S27–S32 Figs illustrating for each grid point the number of scenarios for which the value (percent of models within one standard deviation of the baseline surface relative humidity) is at least 90%.


S36 Fig. As in S33 Fig but computed using the pooled variance across all models rather than the variance of the individual models.


S37 Fig. As in S34 Fig but computed using the pooled variance across all models rather than the variance of the individual models.


S38 Fig. As in S35 Fig but computed using the pooled variance across all models rather than the variance of the individual models.
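Several of the figures above use the pooled variance across all models rather than each model's individual variance. Under the standard degrees-of-freedom weighting, the pooled variance can be sketched as follows; this is an illustrative helper, not the PEEPS code itself.

```python
import numpy as np


def pooled_variance(variances, counts):
    """Pooled variance across models.

    Each model's sample variance s_i^2 (computed from n_i years) is weighted
    by its degrees of freedom (n_i - 1):

        s_p^2 = sum_i (n_i - 1) * s_i^2 / sum_i (n_i - 1)
    """
    v = np.asarray(variances, dtype=float)
    n = np.asarray(counts, dtype=float)
    return float(np.sum((n - 1.0) * v) / np.sum(n - 1.0))
```

Using the pooled variance gives a single multi-model spread against which every model's residuals are judged, rather than letting each model set its own tolerance.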



  1. Eyring V, Bony S, Meehl GA, Senior CA, Stevens B, Stouffer RJ, et al. Overview of the Coupled Model Intercomparison Project Phase 6 (CMIP6) experimental design and organization. Geoscientific Model Development. 2016;9(5):1937–1958.
  2. O’Neill BC, Tebaldi C, van Vuuren DP, Eyring V, Friedlingstein P, Hurtt G, et al. The Scenario Model Intercomparison Project (ScenarioMIP) for CMIP6. Geoscientific Model Development. 2016;9(9):3461–3482.
  3. Kay JE, Deser C, Phillips A, Mai A, Hannay C, Strand G, et al. The Community Earth System Model (CESM) Large Ensemble Project: A Community Resource for Studying Climate Change in the Presence of Internal Climate Variability. Bulletin of the American Meteorological Society. 2015;96(8):1333–1349.
  4. Kravitz B, Lynch C, Hartin C, Bond-Lamberty B. Exploring precipitation pattern scaling methodologies and robustness among CMIP5 models. Geoscientific Model Development. 2017;10(5):1889–1902.
  5. Lynch C, Hartin C, Bond-Lamberty B, Kravitz B. An open-access CMIP5 pattern library for temperature and precipitation: description and methodology. Earth System Science Data. 2017;9(1):281–292.
  6. Wells CD, Jackson LS, Maycock AC, Forster PM. Understanding pattern scaling errors across a range of emissions pathways. Earth system change: climate prediction; 2022. Available from:
  7. Beusch L, Gudmundsson L, Seneviratne SI. Emulating Earth system model temperatures with MESMER: from global mean temperature trajectories to grid-point-level realizations on land. Earth System Dynamics. 2020;11(1):139–159.
  8. Nath S, Lejeune Q, Beusch L, Seneviratne SI, Schleussner CF. MESMER-M: an Earth system model emulator for spatially resolved monthly temperature. Earth System Dynamics. 2022;13(2):851–877.
  9. Schwarber AK, Smith SJ, Hartin CA, Vega-Westhoff BA, Sriver R. Evaluating climate emulation: fundamental impulse testing of simple climate models. Earth System Dynamics. 2019;10(4):729–739.
  10. MacMartin DG, Kravitz B. Dynamic climate emulators for solar geoengineering. Atmospheric Chemistry and Physics. 2016;16(24):15789–15799.
  11. Dorheim K, Link R, Hartin C, Kravitz B, Snyder A. Calibrating Simple Climate Models to Individual Earth System Models: Lessons Learned From Calibrating Hector. Earth and Space Science. 2020;7(11).
  12. Nicholls ZRJ, Meinshausen M, Lewis J, Gieseke R, Dommenget D, Dorheim K, et al. Reduced Complexity Model Intercomparison Project Phase 1: introduction and evaluation of global-mean temperature response. Geoscientific Model Development. 2020;13(11):5175–5190.
  13. Weber T, Corotan A, Hutchinson B, Kravitz B, Link R. Technical note: Deep learning for creating surrogate models of precipitation in Earth system models. Atmospheric Chemistry and Physics. 2020;20(4):2303–2317.
  14. Ayala A, Drazic C, Hutchinson B, Kravitz B, Tebaldi C. Loosely Conditioned Emulation of Global Climate Models With Generative Adversarial Networks. arXiv. 2021.
  15. Link R, Snyder A, Lynch C, Hartin C, Kravitz B, Bond-Lamberty B. Fldgen v1.0: an emulator with internal variability and space–time correlation for Earth system models. Geoscientific Model Development. 2019;12(4):1477–1489.
  16. Alexeeff SE, Nychka D, Sain SR, Tebaldi C. Emulating mean patterns and variability of temperature across and within scenarios in anthropogenic climate change experiments. Climatic Change. 2018;146(3-4):319–333.
  17. Tebaldi C, Knutti R. Evaluating the accuracy of climate change pattern emulation for low warming targets. Environmental Research Letters. 2018;13(5):055006.
  18. Tebaldi C, Armbruster A, Engler HP, Link R. Emulating climate extreme indices. Environmental Research Letters. 2020;15(7):074006.
  19. Santer B, Wigley TML, Schlesinger ME, Mitchell JFB. Developing climate scenarios from equilibrium GCM results. Max Planck Institute for Meteorology; 1990. Report No. 47.
  20. Tebaldi C, Arblaster JM. Pattern scaling: Its strengths and limitations, and an update on the latest model simulations. Climatic Change. 2014;122(3):459–471.
  21. Ruane AC, Rosenzweig C, Asseng S, Boote KJ, Elliott J, Ewert F, et al. An AgMIP framework for improved agricultural representation in integrated assessment models. Environmental Research Letters. 2017;12(12):125003. pmid:30881482
  22. Calvin K, Bond-Lamberty B. Integrated human-earth system modeling—state of the science and future directions. Environmental Research Letters. 2018;13(6):063006.
  23. Meinshausen M, Smith SJ, Calvin K, Daniel JS, Kainuma MLT, Lamarque JF, et al. The RCP greenhouse gas concentrations and their extensions from 1765 to 2300. Climatic Change. 2011;109:213.
  24. Odaka TE, Banihirwe A, Eynard-Bontemps G, Ponte A, Maze G, Paul K, et al. The Pangeo Ecosystem: Interactive Computing Tools for the Geosciences: Benchmarking on HPC. In: Juckeland G, Chandrasekaran S, editors. Tools and Techniques for High Performance Computing. vol. 1190. Cham: Springer International Publishing; 2020. p. 190–204. Available from:
  25. Kravitz B, Snyder AC. Pangeo-Enabled ESM Pattern Scaling (PEEPS): A customizable dataset of emulated Earth System Model output [Dataset]; 2022. Available from:
  26. Castruccio S, McInerney DJ, Stein ML, Liu Crouch F, Jacob RL, Moyer EJ. Statistical Emulation of Climate Model Projections Based on Precomputed GCM Runs. Journal of Climate. 2014;27(5):1829–1844.
  27. Herger N, Sanderson BM, Knutti R. Improved pattern scaling approaches for the use in climate impact studies. Geophysical Research Letters. 2015;42(9):3486–3494.
  28. Kasim MF, Watson-Parris D, Deaconu L, Oliver S, Hatfield P, Froula DH, et al. Building high accuracy emulators for scientific simulations with deep neural architecture search. Machine Learning: Science and Technology. 2022;3(1):015013.
  29. Watson-Parris D, Rao Y, Olivié D, Seland O, Nowack P, Camps-Valls G, et al. ClimateBench v1.0: A Benchmark for Data-Driven Climate Projections. Journal of Advances in Modeling Earth Systems. 2022;14(10).
  30. IPCC. Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change. Cambridge University Press; 2021.
  31. Reichler T, Kim J. How Well Do Coupled Models Simulate Today’s Climate? Bulletin of the American Meteorological Society. 2008;89(3):303–312.
  32. Collier N, Hoffman FM, Lawrence DM, Keppel-Aleks G, Koven CD, Riley WJ, et al. The International Land Model Benchmarking (ILAMB) System: Design, Theory, and Implementation. Journal of Advances in Modeling Earth Systems. 2018;10(11):2731–2754.
  33. Keil P, Mauritsen T, Jungclaus J, Hedemann C, Olonscheck D, Ghosh R. Multiple drivers of the North Atlantic warming hole. Nature Climate Change. 2020;10(7):667–671.
  34. Pithan F, Mauritsen T. Arctic amplification dominated by temperature feedbacks in contemporary climate models. Nature Geoscience. 2014;7(3):181–184.
  35. Hahn LC, Armour KC, Zelinka MD, Bitz CM, Donohoe A. Contributions to Polar Amplification in CMIP5 and CMIP6 Models. Frontiers in Earth Science. 2021;9:710036.
  36. Kim H, Kang SM, Kay JE, Xie SP. Subtropical clouds key to Southern Ocean teleconnections to the tropical Pacific. Proceedings of the National Academy of Sciences. 2022;119(34):e2200514119. pmid:35969773
  37. Hill S, Ming Y. Nonlinear climate response to regional brightening of tropical marine stratocumulus. Geophysical Research Letters. 2012;39(15).
  38. Chylek P, Folland C, Klett JD, Wang M, Hengartner N, Lesins G, et al. Annual Mean Arctic Amplification 1970–2020: Observed and Simulated by CMIP6 Climate Models. Geophysical Research Letters. 2022;49(13).
  39. Newman M, Alexander MA, Ault TR, Cobb KM, Deser C, Di Lorenzo E, et al. The Pacific Decadal Oscillation, Revisited. Journal of Climate. 2016;29(12):4399–4427.
  40. Quilcaille Y, Gudmundsson L, Beusch L, Hauser M, Seneviratne SI. Showcasing MESMER-X: Spatially Resolved Emulation of Annual Maximum Temperatures of Earth System Models. Geophysical Research Letters. 2022;49(17). pmid:36245896
  41. Hawkins E, Sutton R. The potential to narrow uncertainty in projections of regional precipitation change. Climate Dynamics. 2011;37(1-2):407–418.
  42. Rosenzweig C, Jones JW, Hatfield JL, Ruane AC, Boote KJ, Thorburn P, et al. The Agricultural Model Intercomparison and Improvement Project (AgMIP): Protocols and pilot studies. Agricultural and Forest Meteorology. 2013;170:166–182.
  43. Franke JA, Müller C, Elliott J, Ruane AC, Jägermeyr J, Snyder A, et al. The GGCMI Phase 2 emulators: global gridded crop model responses to changes in CO2, temperature, water, and nitrogen (version 1.0). Geoscientific Model Development. 2020;13(9):3995–4018.
  44. Tebaldi C, Snyder A, Dorheim K. STITCHES: creating new scenarios of climate model output by stitching together pieces of existing simulations. Earth System Dynamics. 2022;13(4):1557–1609.
  45. Barnes EA, Hurrell JW, Ebert-Uphoff I, Anderson C, Anderson D. Viewing Forced Climate Patterns Through an AI Lens. Geophysical Research Letters. 2019;46(22):13389–13398.