Downscaling approaches of climate change projections for watershed modeling: Review of theoretical and practical considerations

Water resources managers must increasingly consider climate change implications of, whether the concern is floods, droughts, reservoir management, or reliably supplying consumers. Hydrologic and water quality modeling of future climate scenarios requires understanding global climate models (GCMs), emission scenarios and downscaling GCM output, since GCMs generate climate predictions at a resolution too coarse for watershed modeling. Here we present theoretical considerations needed to understand the various downscaling methods. Since most watershed modelers will not be performing independent downscaling, given the resource and time requirements needed, we also present a practical workflow for selecting downscaled datasets. Even given the availability of a number of downscaled datasets, a number of decisions are needed regarding downscaling approach (statistical vs. dynamic), GCMs to consider, options, climate statistics to consider for the selection of model(s) that best predict the historical period, and the relative importance of different climate statistics. Available dynamically-downscaled datasets are more limited in GCMs and time periods considered, but the watershed modeler should consider the approach that best matches the historical observations. We critically assess the existing downscaling approaches and then provide practical considerations (which scenarios and GCMs have been downscaled? What are some of the limitations of these databases? What are the steps to selecting a downscaling approach?) Many of these practical questions have not been addressed in previous reviews. While there is no “best approach” that will work for every watershed, having a systematic approach for selecting the multiple options can serve to make an informed and supportable decision.


Introduction
Assessment of climate change impacts on water resources involves several methodological decisions, including selection of global climate models (GCMs), emission scenarios, downscaling techniques, and hydrologic modeling approaches [1]. A watershed modeler, interested in the response of hydrology and biogeochemistry to future climate projections in a particular region to inform decisions, is faced with a number of options for implementing the modeling a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 framework, including selecting applicable GCMs, scenarios, downscaling approaches, and watershed model(s). Most watershed modelers will not be running a GCM to obtain climate predictions. While GCM output (e.g., predicted precipitation and temperature for future periods) is available for many GCM models and future scenarios, it is typically at a resolution that is unsuitable for watershed modeling, even for very large watersheds. While the spatial resolution varies according to objective in watershed modeling, it is usually on the order of tens to hundreds of km 2 , to more accurately represent local soils, land use and climate [2]. Thus, the first decision the watershed modeler needs to make is how to downscale GCM output to the finer resolution needed to model watersheds at the subwatershed scale. Downscaling transforms from global climate models output (GCMs) at coarse spatial resolution (thousands of km 2 ) to the fine spatial resolution needed for watershed modeling or other applications. Downscaling approaches assume that local climate is a combination of large-scale (global, continental, regional) climatic/atmospheric features and local attributes (topography, land use, large water bodies). Downscaling is a key step in understanding future implications to local regions, particularly for hydrology, because the underlying processes that determine impact require an understanding of the local climate and its drivers, such as topography, which are not captured at the coarser scale of GCMs. The watershed modeler may actually employ downscaled GCM products. However, it is important to understand the differences between various downscaling methods and techniques, in order to make an informed decision prior to selecting a particular downscaled product for watershed modeling.

Objectives of this review
The objective of this review is to provide water resource managers and modelers: (1) an overview of the theory behind the main downscaling approaches, and the improvements that have been made to correct for short-coming in different methods; and (2) a set of practical considerations, including an applied case study, for selecting downscaled GCM products. We critically assess the existing downscaling approaches and then provide practical considerations (which scenarios and GCMs have been downscaled? What are some of the limitations of these databases? What are the steps to selecting a downscaling approach?) Many of these practical questions have not been addressed in previous reviews.

Global climate models
GCMs predict climate behavior by using mathematical representations of well-documented physical processes to simulate the transfer of energy and matter throughout the global climate system, at a relatively coarse resolution [3]. In addition the sequence of interactions, spatially and temporally, that are used to model the climate system, GCMs depend on setting initial conditions, and considering changes to climate forcing [3]. Grid cells size defines the resolution of the model-the smaller the grid, the more detail in the model. The more detailed the GCM, the more data are required and the more computing power necessary to run the model. The additional dimension in climate models is time; GCMs can be run at hourly, daily, or monthly time-steps [3].
Once a model is set up, it is compared to historical climate observations to determine its accuracy. It can then be used to project future climate based on a range of greenhouse gas (GHG) and aerosol emissions scenarios [3]. The Intergovernmental Panel on Climate Change (IPCC) is the most common source of future emissions scenarios. Distinct scenarios have been developed for GCM testing based on radiative forcing, such as those from the Coupled Model Intercomparison Project Phase 5 (CMIP5), or more recently on the complex relationship between socioeconomic forces that drive GHG and aerosol emissions across the globe and the levels to which those emissions are expected to increase over time, from Phase 6 (CMIP6) [4,5]. The IPCC scenarios are collectively referred to as Representative Concentration Pathways (RCPs) [6].
Globally, there are several different groups of scientists developing and running climate models. NOAA's Geophysical Fluid Dynamics Laboratory (GFDL) was one of the first groups to combine oceanic and atmospheric processes into a single model, which was recently updated [7]. The US National Center for Atmospheric research has also spent decades refining the Community Atmosphere Model, which includes models for soil and vegetation. The Hadley Center for Climate Prediction and Research (Hadley) is another major research group that has developed the HadCM3 model [8]. There are several other models available, developed by different groups and countries, and each has its pros and cons, and accuracy varies by region, climate parameter, and time period. Some models may be better at predicting snowfall, whereas other at predicting extreme temperatures. The IPCC considers many different models in their evaluation of climate change, whereas for practical purposes most watershed modelers will likely select a smaller subset of models that still provides a range of probable climatic projections. The challenge becomes: Which GCMs to consider? Which future scenarios? What downscaling approach(es)? Table A in S1 Text provides additional questions and considerations that should be taken into account before launching into a climate change assessment.
Ensembles are also an option. An ensemble is a group of climate model simulations. Rather than running a single climate model, an ensemble reflects the output of several GCMs, where each GCM is somewhat unique to produce a range of possible scenarios. Ensemble results have come closest to replicating historical climate and can rely on both statistical and dynamic downscaling methods [9].

Downscaling approaches
The diversity of downscaling approaches reflects the diversity of goals and resources available for each assessment [10]. There is no single best downscaling approach, and methods will vary based on the desired spatial and temporal resolution of outputs and the characteristics of the greatest climate impact of interest [10]. There are two broad categories of downscaling methods: statistical downscaling (SD) and dynamical downscaling (DD, also referred to as regional climate models, RCMs). Each method is capable of providing station-scale data for watershed modeling. The trade-offs between methods include ease of use and ability to represent changes in temporal patterns [11]. A method's performance depend on the accuracy of: (1) representation of features (e.g., topography, land use) at higher resolution, (2) representation of processes and interactions (e.g., weather systems), and (3) physical parameterizations at the desired spatial scale and time step interval [12]. Because of this, downscaling accuracy varies by location, season, parameter, and boundary conditions (e.g., one method may be better for the West Coast and another for the Midwest) [13,14].
GCMs, even at fine spatial scales, are subject to considerable bias when compared to observed data [15]. Bias correction, is therefore a common step in SD, but also frequently in DD, as a way to remove systematic biases in the mean of the climate forcing data [5]. Climate model simulations are largely influenced by the way the model represents processes, energy and moisture budgets, as well as the simulation of clouds [15]. Therefore, GCM output postprocessing is another common step to improve GCM output before using it in downscaling studies [13,15].
Selection of downscaling method depends on the spatial scale for watershed modeling, the climate variables of interest, availability of historical observed data for the region and spatial scale of interest, and the resources available (Fig 1) [13]. Different sets of calibrated model parameters can yield divergent simulation results which may lead to different conclusions [16].

Statistical downscaling
Statistical downscaling relies on observed relationships between large scale atmospheric variables and local/regional climate variables, often referred to as predictors and predictands respectively [11]. Variables of interest are downscaled using the empirical relationship between historic observed and modeled data [14]. The underlying assumption is that the relationship between the predictors and predictands can be transferred to model future predictors from the GCMs because we assume that the processes that controlled climate in the past will continue to control local climate in the future using the same statistical patterns [17,18]. Parameter selection is an iterative process consisting of screening settings and predictors until the relationship is optimized [17]. Sensitivity analyses have shown that choice of data, choice of method, length of calibration period, selection of predictor variables and station data all have a significant impact on results [18,19]. A key limitation of SD is that it may not account for natural climate variability since earth's system is highly nonlinear, meaning that the statistical relationship that applied in the past may not apply in the future [19][20][21]. A key strength, however, of SD is its low computational demand and rapid ability to create scenarios [18].
There are four main categories of statistical downscaling techniques available: weather typing, constructed analog, weather generators, and regression methods. Regression techniques encompass a broad range of methods from the simplicity of linear regressions to machine learning. All have different pros and cons, though a review of the last couple decades of research show that many have moved away from weather typing and weather generators for future impact assessments.
Weather typing. Weather typing, also referred to as weather pattern-based, is a multivariate statistical downscaling method that relies on conditional resampling. It involves grouping local meteorological variables in relation to different classes of atmospheric circulation [22]. The main advantage of this method is that local variables are closely linked to global circulation [23]. However, reliability depends on having a static relationship between large scale circulation patterns and local climate [23]. For watershed modeling, where hourly or daily precipitation is an important variable, this method may not always be useful because typically the correlation between local precipitation and large-scale circulation patterns is not strong [23]. Approaches that have been used to improve weather typing include Constructed Analogs and variations such as Bias Corrected Constructed Analogs, Localized Constructed Analogs and Multivariate Adaptive Constructed Analogs.
Constructed analogs. Constructed Analog (CA) relies on matching large-scale variables from observations with GCM model output and using the analog relationship as a scaling factor for future climate [24][25][26]. CA assumes that if an exact analog (in the historical record) to current weather can be found, future weather should follow the same analog [24]. The method is similar to principal component analysis (PCA) where multiple dependent variables represent similar relations. CA uses linear regression to develop a best-fit analog of GCM output from historical data [24]. A key limitation of the original CA is that it neglects model bias and is unable to project climates if no analog could be identified [25].
Bias Corrected Constructed Analog (BCCA) use bias correction to improve the CA method [25]. Biases between the models are removed, typically using a quantile mapping technique, which adjusts simulated values by quantile to better match observations. Another modification to the CA method, called Localized Constructed Analogs (LOCA), has the added benefit of

PLOS WATER
being less sensitive to spatial scale and is better able to estimate extreme values [27]. LOCA produces downscaled estimates suitable for hydrologic modeling by using a multi-scale spatial matching scheme to pick appropriate analogs from observed data. Multivariate Adaptive Constructed Analogs (MACA) is another statistical downscaling method that utilizes meteorological training data (observations) to remove historical biases and match spatial patterns in climate model output [28]. Flint and Flint (2012) used statistically downscaled future climate projects from CA, then further downscaled them spatially using a gradient-inverse-distance-squared approach for hydrologic modeling at a 270-m spatial resolution [24]. They used continental-scale historical (observed) patterns of daily precipitation and air temperature at coarse resolution and their fine-resolution (approximately 12 km) equivalents with a statistical approach to climate and bias correction [24,29]. Results indicate successful downscaling as the 270-m resolution was closer to the measured monthly station data for the 18-year record than the 4-km PRISM model [24]. Teutschbein et al. (2011) investigated the variability of seasonal streamflow and flood-peak projections using three statistical methods to downscale precipitation (analog, SDSM, and a multi-objective fuzzy-rule-based classification (MOFRBC) [17]. The analog method used historical observed data to identify a set of previous analog events that most resembled the target time period [17]. MOFRBC is a circulation-pattern classification method combined with a conditional precipitation model (a stochastic weather generator) [17,30]. MOFRBC employs the circulation patterns in order to classify weather situations of different types [17,30].

Weather generators
Temporal downscaling can be performed when the temporal scale of GCM output is modified to the needs of the watershed modeling project. Weather generators are most often used in temporal downscaling, as GCM outputs are sometimes on a monthly time step and watershed models usually require a daily time step [10]. Weather generators are stochastic models that simulate future climate by either perturbing the parameters or by fitting to perturbed statistics [31][32][33][34]. Weather generators are parametric models that rely on mathematical formulae with explicit random elements to emulate actual weather data, usually at a daily timescale and for individual weather stations [34]. Downscaling using weather generators requires changing the climate parameters based on the changes between current-observed and future-predicted GCM climate [34]. The most commonly used stochastic precipitation generator is based on a first-order Markov chain process in which daily binary (dry or wet) precipitation occurrences are generated, and the amount of precipitation on wet days are generated independently from a regional/local probability distribution [34].
As with most SD methods, a major benefit to weather generators is their ability to rapidly develop long time series of climate scenarios to study the impact of rare climate events [32,33]. They also provide a stochastic downscaling approach, which is beneficial since combinations of small-scale (e.g., weather station) conditions can be consistent with the large-scale (e.g., GCM grid) model [34].
Statistical down scaling model. The Statistical Down Scaling Model (SDSM) is a conditional weather generator that uses atmospheric circulation indices and regional moisture variables to estimate time-varying parameters (e.g., precipitation and temperature) that describe the daily weather at individual sites/stations [35]. SDSM is a useful method because multiple, low-cost, single site scenarios for daily climate parameters can be generated easily and rapidly for different climate forcing conditions [36].
A test of SDSM for streamflow modeling in Quebec showed that SDSM provides reasonable downscaling data (local scale temperature and precipitation) when using predictors that represent observed current climate [37]. However, the performance was less reliable when using GCM predictors [37]. The choice of downscaling methods has a significant impact on streamflow simulations: while SDSM was found to suitable for downscaling precipitation for a river basin in Sweden, Teutschbein et al. (2011) concluded that an ensemble approach was preferable [17].

Regression methods
Regression methods rely on statistics to extrapolate data to higher resolutions based on the identified relationship between predictors and predictands. Sometimes referred to as transfer functions, a linear or nonlinear relationship between observed local climatic variables and large scale GCM outputs is defined using a regression method [35,37,38]. As with all statistical methods, the biggest limitation is the high probability of a lack of a stable relationship between current and future climate (predictors and predictands) [38].
Canonical correlation analysis. Canonical Correlation Analysis (CCA) is used to identify and measure the relationship among sets of variables. It is sometimes used in place of multiple regression when there are multiple intercorrelated variables. CCA seeks an optimal linear combination between predictors and predictands and selects pairs of patterns so that a maximum correlation is produced [39,40]. Before CCA, predictors and predictands are projected onto their empirical orthogonal functions (EOFs) to eliminate unwanted noise (small-scale features) and to reduce data space dimensionsionality [40]. Given the approach used to develop the correlations, CCA allows a physical interpretation of the mechanism controlling regional climate variability [40]. Busuioc et al. (2007) used CCA to downscale precipitation in Italy [40]. Three indices were used that focused on extreme rainfall events: the number of events exceeding the long-term 90 th percentile of rainy days, simple daily intensity, and the maximum number of consecutive dry days [40]. Busuioc et al. found that, generally, the combination of the first five to six EOFs and first two EOFs for all regional indices provided the best results [40].
Bias correction and spatial disaggregation. Bias Correction and Spatial Disaggregation (BCSD) relies on a historic dataset at the same grid scale as the spatially downscaled variables, corrects biases using quantile mapping of modeled against observed climate, and then applies anomalies in the observed variables to the spatially interpolated model [24,25,41]. Quantile mapping has the advantage of not having a specific requirement for the length of time or the number of observed stations and it has been found to have the best performance in reducing biases. BCSD has been used in several hydrologic impact analyses and has demonstrated downscaling capabilities comparable to other statistical and dynamic methods [13,41]. A major limitation, however, is that often monthly data is used and while the output can be converted to daily, unrealistic meteorology may result which is problematic for watershed modeling [41].
Maurer and Hidalgo (2008) compared CA with BCSD across the western US [13]. Coarse scale precipitation and temperature were employed in both downscaling approaches as predictors [13]. CA downscaled daily large-scale data directly but at a coarse spatial resolution whereas BCSD downscaled monthly data, with a random resampling technique to generate daily values [13]. Both methods struggled to reproduce wet and dry extremes for precipitation and were better at reproducing temperature, though the CA method exhibited slightly more accuracy with the extremes [13]. Maurer et al. (2010) later compared CA, BCSD, and a hybrid BCCA (using quantile-mapping bias correction prior to the CA method) downscaling for use with the VIC hydrologic model [42]. The bias correction considered in BCCA was essentially identical to that of BCSD, however, quantile mapping was used for all daily values (precipitation, maximum and minimum temperature) within each month [42]. While the bias correction included with BCCA forced the cumulative distribution function to match observations for the historical (observed) period, some biases due to the downscaling methods remained [42]. The three downscaling methods produced reasonable streamflow statistics at most locations, but the BCCA method consistently performed better than the other methods in simulating streamflow [42]. All methods performed well for generating extreme peak flows, but the hybrid BCCA method generated a better match of annual flow volume, showing that BCCA is also better at longer temporal scales [42]. Mizukami et al. (2015) examined the effects of four statistical downscaling methods (BCCA, BCSD daily and BCSD monthly, and asynchronous regression (AR)) on retrospective hydrologic simulations using three hydrologic models (CLM, VIC, and PRMS) [43]. Each SD method produced a different meteorological dataset including differences in precipitation, wet-day frequency, an energy input [43]. BCCA was found to consistently underestimate precipitation across the US leading to unrealistic hydrologic portrayals, whereas the other three methods overestimated precipitation, though BCSD-daily overestimated the wet-day frequency the most [43]. Despite this, the choice of downscaling method was found to impact runoff estimates less (excluding BCCA) than the choice of hydrologic model (with default parameters) [43].
Change factor. The Change Factor (CF) method, sometimes called the Delta Change method, calculates differences between simulated current and future climate and adds these differences to observed time series [18]. The CF method is based on the assumption that GCMs are more reliable simulating relative rather than absolute values, thus the observed time series is adjusted by either adding the difference or multiplying the ratio of parameters between future and present climate as simulated by the GCM [18,23]. An additive CF is the arithmetic difference between future and baseline projections, whereas a multiplicative CF is the ratio between future and baseline projections [44]. Additive and multiplicative CFs can be combined depending on the empirical distribution functions and any data limitations [44]. The selection of time periods, temporal domains, temporal scales, number of change factors, and type of change factors varies depending on the meteorological parameters and needs of the study [44]. Applying the CF method to hydrologic modeling, especially for urban watersheds, can present challenges because precipitation data need to be at fine time increments and change factors can produce negative precipitation values, can overestimate precipitation, or can increase the number of precipitation events [44,45].
A combined change factor method (CCFM) can be applied as a secondary bias-correction technique to reduce a number of the issues associated with the traditional CF method [44]. This is typically done by computing empirical cumulative distribution functions (ECDF) for both baseline and future time periods, sorting them to create ratio and difference plots, which facilitating the selection of additive or multiplicative CFs, with differences and ratios calculated for each percentile of the cumulative distribution function (CDF) [44]. Adjustments are typically then made to address negative values, overestimations, and to eliminate artificially high numbers of precipitation events [38]. The combined method eliminated or reduced these issues, and the predicted precipitation time series closely matched observed precipitation patterns [44]. Hay et al. (2007) tested the delta change method for precipitation, temperature, and runoff across three mountainous regions in the US. For the basins tested, realistic runoff scenarios were simulated successfully using statistically downscaled NCEP (National Center for Environmental Prediction) output, but not using statistically downscaled HadCM2 (Hadley Centre for Climate Prediction and Research) GCM [18]. Camici et al. (2014) compared the downscaling ability of delta change and bias correction through quantile mapping for two GCMs for hourly rainfall, discharge and flood modeling [46]. The delta change method projected a decrease in the flood frequency curve, but with quantile mapping, one GCM predicted a decrease in the frequency of annual maximum discharge and the other predicted an increase [46]. As with other similar studies, a key conclusion was that an ensemble GCM with multiple downscaling methods is preferable for flood mapping [46].
Machine learning and artificial neural networks. Artificial neural networks, sometimes called adaptable random forests, are still a relatively new and uncommon methods for SD that rely on machine learning. Though there are still only a few examples of random forest (RF) methods, they typically rely on the nonlinear relationship between predictors and predictands, making them particularly useful for downscaling precipitation, since this relationship is often nonlinear [47]. RF is an adaptable method that can be applied to downscale data from models, remote sensing retrievals, or gridded observations at a range of different resolutions [47].
An early example of machine learning for downscaling compared SDSM with Smooth Support Vector Machine (SSVM) for hydrologic modeling [48]. SSVM is a supervised machine learning method that uses limited sample information on an unconstrained convex quadratic optimization problem using smoothing [48]. Rainfall varied greatly between the two downscaling methods, though SDSM was found to have better performance than SSVM [48].
Later, He et al. (2016) developed Prec-DWARF (Precipitation Downscaling with Adaptable Random Forests), a machine-learning based method for statistical downscaling of precipitation across the continental US [47]. Different RF methods were tested and shown to reasonably reproduce the spatial and temporal patterns of precipitation [47]. A single RF tended to underestimate precipitation extremes, but predictions were improved by adding a second RF that was specifically designed to capture the relationship between covariates and target rainfall field for heavy and extreme precipitation. He et al. (2016) were able to successfully reproduce the geometrical and statistical characteristics of precipitation using RF [47]. It is still a somewhat limited method in that it consistently underestimates spatial variability, temporal dependence, and frequency of very high rainfall rates, and overestimates the amount and spatial extent of low intensity rainfall [47].

Dynamic downscaling
Dynamic downscaling (DD) relies on regional climate models (RCMs) to simulate local climate based on large-scale weather predicted by a GCM [14,24]. In DD, GCM predictions are used to provide the initial conditions and lateral boundary conditions (LBCs) (i.e., the grid edges) that define downscaling to a finer-resolution RCM [24,49]. This typically involves nesting finer-scale grids into the coarser GCM grid to produce a spatially complete set of climate variables that preserve both the spatial correlation as well as relationships between variables [23,50,51]. DD is computationally expensive, so historically, it hasn't been used as often for climate change impact studies that required long temporal periods or multiple scenarios [24], but as computing power has increased, this is less the case. Resolution capabilities and new predictive components have been added as capacity grows, increasing the functionality of DD for impact assessments [52].
The underlying assumption is that a GCM describes the response of global circulation to large-scale forcing (e.g., due to GHG emissions or solar radiation variations), while an RCM refines the climate projections, spatially and temporally, by considering smaller-scale features (e.g., topography, coastlines, bodies of water, land cover) [50]. DD explicitly solves for processbased physical dynamics to extract local-scale data from large-scale GCMs by developing either limited-area models (LAMs) or RCMs [53]. LAMs are essentially a portion of a GCM, run on a limited scale, that provide useful information for downscaling and can account for the processes impacted by local forcing (e.g., orography, coastline, vegetation, etc.) [54]. Similarly, a dynamically downscaled RCM considers more accurately areas with complex topography for estimating precipitation intensity and snow processes [55].
RCM simulations are particularly sensitive to initial conditions, thus the simulated sequence of weather events will often not match the driving model [56,57]. RCMs tend to simulate systematic errors because of their incomplete representation of the climate system and their dynamic and physical configurations. Some of these errors can be reduced through postprocessing using bias correction techniques (from SD) [57,58]. In addition, it has been found that downscaling too much using DD can actually be detrimental to prediction accuracy. Chan et al. (2013) compared the accuracy of dynamically downscaling to 50-, 12-, and 1.5-km and found that downscaling from 50-to 12-km improved precipitation simulations, but that seasonal biases increased with further downscaling to 1.5-km, resulting in too much precipitation [59].
There are multiple methods used for dynamic downscaling including one-, two-, and multi-way nesting, nudging, variable resolution grid models, and reinitialization.
Nested models. Nested RCMs are built using two or more RCM over a selected spatial domain at different grid scales, constrained by initial and time-dependent meteorological lateral boundary conditions from a GCM [50,52]. Nested RCMs are intended to add more realistic sub-GCM grid-scale detail [52]. One way nesting is subject to mismatches between simulated parameters and those of the GCM, whereas two-way nesting partially addresses this by having the RCM feed information back into the GCM model [60,61]. Multiple nesting, where consecutive nested models are used at increasing resolutions, can be used to reach highly localized areas [61,62]. However, adding nesting becomes increasingly computationally expensive.  [63]. Only the 4-km simulation correctly simulated precipitation totals, though in winter the 4-and 12-km simulations had similar performance [63]. The main advantage of the 4-km simulation was the improved mesoscale patterns of heavy precipitation and the larger-scale patterns of heavy precipitation [63].
Nudging. Nudging is a method that adds a correction to the predictive equations of variables to prevent the RCM drift from GCM results [64,65]. Spectral Nudging can be used to constrain RCM biases and improve results by forcing the RCM to follow the large-scale (GCM) driving boundary conditions [49]. What this does is relax the simulation by adding a nudging term to selected parameters that is proportional to the difference between the simulated and prescribed states (similar to the delta change method) [49]. This results in greater consistency between RCM and GCM. Nudging can, however, force the RCM to retain and potentially exacerbate biases from the GCM [49].
For example, Xu and Yang (2015) compared three sets of RCMs with identical model configurations [except for the initial and lateral boundary conditions] with and without spectral nudging, and with bias corrections and nudging where the nudging strength was progressively reduced [49]. Bias correction was done by interpolating the NNRP (National Centers for Environmental Prediction/National Center for Atmospheric Research Reanalysis Project) data to Community Atmosphere Model (CAM) grids, then computing the CAM biases in mean and variance and removing the biases by subtracting the mean bias and scaling variance from the original CAM simulation. Spectral nudging was applied to air temperature, horizontal winds, and geopotential height [49]. Both bias correction and spectral nudging improved results, though not consistently across all parameters [49]. Precipitation responded inconsistently to spectral nudging, with some regions and some seasons overestimating and some underestimating [49].
Grid nudging. Grid nudging, sometimes referred to as analysis nudging, is a variant of spectral nudging where nudging is performed for every grid cell and for all spatial scales [49]. It is used to ensure that the simulated large-scale fields are consistent with the driving fields [64,65]. Grid nudging is sometimes superior to spectral nudging if appropriate nudging coefficients are chosen to adjust the strength of the nudging force [64,65]. This method, is however, extremely computationally expensive. Liu et al. (2012) compared grid and spectral nudging in the downscaling of National Centers for Environmental Prediction (NCEP) and National Center for Atmospheric Research (NCAR) data with the Weather Research and Forecasting (WRF) model [66]. Compared with grid nudging, spectral nudging provides a better balance between the need to keep RCM results consistent with the large scale driving forces that would be provided by GCMs, and at the same time, allowed more variance to be added at smaller scales [66]. Additionally, the improvement at the small scale allowed by spectral nudging is not only reflected in spatial variability, but temporal variability as well [66]. Bowden et al. (2012) evaluated interior nudging techniques using the WRF model in a twoway nested configuration for downscaling with both the spectral and gridded approach [67]. Temperature behaved similarly regardless of which method was used, though for precipitation, gridded nudging generates monthly precipitation totals, intensity, and frequency more accurately than spectral nudging [67]. Gridded nudging appears to suppress the variability in predictions more strongly than spectral nudging [67]. Wooten et al. (2016) used the grid nudging approach using the WRF model to simulate precipitation over Puerto Rico [68]. Grid nudging generally resulted in lower precipitation than are observed but use of convective-permitting simulations improved the annual cycle, intensity, and location of rainfall [68].
Variable resolution grid models. Variable resolution grid models (also known as stretched grid models) rely on a higher resolution over the region of interest. These are not as widely used but the advantage of a variable resolution stretched-grid is that it does not require any lateral boundary conditions for forcing, which reduces the computational challenges [69]. The result is that it provides self-consistent interactions between global and regional scales while maintaining global circulation with a uniform grid [69]. It is also portable, so the area of interest can be located anywhere [69].
McGregor (2015) provided a summary of five variable resolution downscaled climate models. Some of the benefits include the absence of any lateral boundaries and the related problems with spurious reflections [70]. Useful interactions may also occur between the fine-resolution and coarser regions [70]. Two variable grid models employ the Schmidt transformation (ARPEGE and CCAM), the other three employ spherical coordinates that independently stretch the longitudes and latitudes (GEOS-SG, GEM, and LMDZ) [70]. However, lowerboundary forcing is still needed for sea-ice and sea surface temperatures [70]. Variable grid GCMs can downscale to a finer resolution, correcting some GCM biases [70]. Variable grid GCMs are increasingly being used to downscale ensemble GCMs [70]. Corney et al. (2013) implemented a variable resolution method for dynamically downscaling six GCMs for two emissions scenarios [71]. They found that application of bias adjustment and multi-staged increased resolution, the simulation can better reproduce observations, explaining more than 95% of the temperature spatial variance for~90% for precipitation [71]. Variable grid downscaling significantly improve the temporal distribution of variables and improved the seasonal variability predictions compared to the GCM [71].

Reinitialization.
Reinitialization is a method consisting of running short simulations that are frequently reinitialized. This allows the RCM to take advantage of the assimilated observations, both at the lateral boundaries and over the entire 3D atmosphere, while preserving the time sequence of the weather events from the driving field [51]. The benefit of reinitialization is that it is likely to prevent drift in climate from the GCM that wouldn't be predicted at a localized scale.
For example, Lucas-Picher et al. (2013) tested RCMs with the inclusion of reinitialization to evaluate the downscaling performance of HIRHAM5 [57]. Surface conditions were kept continuous while the atmosphere was reinitialized every 24-hours from ERA-I using a 6-hr spin up. The individual 24-hour simulations were concatenated into a single massive time series. The reinitialization successfully prevented seasonal drift towards wetter climate conditions and improved predictions near the boundary [57]. The annual mean and the deviation were also improved by reinitialization [57]. However, the approach is time-consuming and adds substantial computational requirements [57].
Bias correction. All GCMs have from some bias in their output, which when applied as boundary condition for an RCM may impact the results, sometimes substantially [72]. GCM biases can be passed through to the RCM via lateral and lower boundary conditions. There are two general approaches for bias correction: determining the differences (deltas) between predicted and observed/historic values for reference period that are then applied to observed historical data to construct future time series, or calculating scaling parameters to apply to future predictions to more closely fit observed climate [73]. One common technique is quantilequantile mapping (QM) [74,75]. For a given meteorological variable, the simulated cumulative density function (CDF) is compared to the observed CDF to calculate a correction function for each quantile [74,75]. Then the corrected function is used to remove bias from the variables in each quantile [74,75].
For example, Chen et al. (2019) used bias-correction with QM to downscale CMIP5 in Nevada at a station scale [76]. This involved: (1) validating PRISM data as the historic observed data, (2) bias correcting the CMIP5 dataset, and (3) validating the QM bias-correction process [76]. Hydrology was then simulated using Precipitation Runoff Modeling System (PRMS). Results fit well with observations of both temperature and precipitation as well as density distribution [76]. Piani et al. (2010) also used statistical bias correction to correct GCM output to produce internally consistent values that had a similar statistical distribution to the observations [77]. The approach performed unexpectedly well; in addition to improving the mean and other moments of the precipitation intensity CDF, drought predictions and heavy precipitation indeces also improved [77]. Themeßl et al. (2012) investigated the effect of quantile mapping on RCMs for its success in reducing systematic RCM errors and the ability to predict "new extremes" (values outside the calibration range) [78]. QM reduced the bias of daily mean, minimum, and maximum temperature, precipitation amount, derived indices of extremes by about one order of magnitude, and improved the frequency distributions [78]. In this case, QM was based on the CDF rather than on paired data and using empirical CDFs rather than theoretical CDFs [78]. QM was applied on a daily basis for each grid cell separately. In a 40-years RCM validation, QM was applicable to longer climate simulations and multiple parameters, irrespective of spatial and temporal error characteristics, which suggests transferability of QM to other RCMs [78]. Muerth et al. (2012) studied the importance of bias correction on daily runoff simulated with four different hydrologic models driven by different ensemble RCMs (CRCM4, RACMO2, and RCA3) [73]. Precipitation was bias corrected using the Local Intensity (LOCI) scaling method while air temperature was modified using a monthly additive correction [73]. Both methods were chosen for their simplicity, though they both have inherent flaws [73]. The monthly correction may create jumps in the corrected dataset between months and the LOCI performance has been shown to be slightly inferior to QM, especially at higher precipitation intensities [73]. As expected, bias correction of the RCMs systematically provides a more accurate representation of the actual hydrology, including mean and low flows [73]. High flow indicators are less affected, likely because the simulation of high flows is more dependent on the hydrological model's structure [73]. Pierce et al. (2015) tested three bias correction techniques (QM, CDF-transform (CDF-t), and equidistant quantile matching (EDCDFm)) on temperature and precipitation for 21 GCMs [79]. CDF-t bias correction determines a transformation to map the predicted CDF of a variable in the historical period to the observed CDF, then uses the map to adjust the future CDF [79]. EDCDFm corrects a future predicted value within a quantile in the future CDF by adding the historical value of the future CDF to the predicted change in value [79]. QM and CDF-t significantly altered the GCM downscaling by up to 30% for precipitation. EDCDFm better preserved the temperature values but not precipitation [79].
Teutschbein and Seibert (2012) compared the effectiveness of a range of bias correction methods on an ensemble of 11 RCMs across five watersheds in Sweden [80]. The comparison consisted of no correction, linear scaling, LOCI, power transformation, variance scaling, distribution transfer, and delta-change with observed data from 1961-1990 [74]. Biases were found to vary substantially. All temperature-bias corrections improved the raw RCM simulations, though the linear scaling method was least effective [80]. All precipitation bias corrections also improved the RCM simulation, though linear scaling and LOCI still had larger variability ranges and similar biases in terms of magnitude to uncorrected precipitation [80]. The largest differences between approaches were observed for the probability of dry days and intensity of wet days; LOCI and distribution mapping performed better for these two statistics of daily precipitation [80]. The other approaches partly decreased variability, but were unable to generate RCM predictions closer to observed values [80].

Combined approaches
There are some things that dynamic downscaling does not address well including uncertainties arising from sparse data, the representation of extreme summer precipitation, sub-daily precipitation, and capturing changes in small-scale processes and their feedback on large scales [81]. Thus, recent work has aimed to combine statistical and dynamic downscaling by using gridded RCM simulations and statistically downscaling them to point scales [81].
A number of studies combine elements of both statistical and dynamic downscaling. The lapse rate method is one such example that is based on empirical relationships between a predictor variable (e.g., elevation) and a predictand (e.g., temperature). The lapse rate method is similar to DD since it increases grid resolution of GCM output by considering higher resolution topography [25]. By empirically estimating local topographic lapse rates (LTLR, the statistical relationships between predictor and predictands) using higher resolution topographic and climate data, lapse rates can adjust GCM output based on elevation [25]. This is quite useful in mountainous terrains where elevation is one of the primary determinants of temperature and precipitation. Praskievicz (2017) used LTLR downscaling by re-gridding the GCM to the scale of the PRISM-derived lapse rates using bilinear interpolation and applying the interpolated lapse rate correction to the elevation difference between the GCM and PRISM grid points [25]. LTLR downscaling was compared to LOCA downscaling and LTLR did marginally better, though it was situationally dependent [25]. LTLR is stronger on temperature predictions than LOCA but comparable to LOCA on precipitation [25].
Li and Jin (2017) developed climate change scenarios for various sites through spatial downscaling of Earth System Models (ESMs) using a transfer function approach. ESM output was temporally downscaled using a weather generator and reconstructing spatiotemporal correlations using a distribution-free shuffle procedure [82]. Parametric QM and a Richardsontype weather generator were both used [82]. Linear and nonlinear transfer functions were fitted to the rank-ordered monthly observations and ESM output to calculate monthly mean and variance [82]. The nonlinear function transformed predicted monthly precipitation values that were within the range in which the nonlinear function was fitted. The linear function was employed for values outside the nonlinear function range [82]. Temporal downscaling was done by adjusting precipitation and temperature from the single-site weather generator based on the baseline period [82]. Finally, to create the expected rank correlations for various sites and climate parameters, the independent variables were paired [82]. The method performed well for inter-site and intervariable correlation and hydrological modeling, generating acceptable calibrations of monthly streamflow using the SWAT watershed model [82]. Shrestha et al. (2012) used SWAT to model hydrology of watersheds near the Great Lakes under climate change using nested RCMs (CRCM, RCM3, HRM3 and the ensemble mean of the three RCMs) and two transfer methods (delta change and bias correction) [83]. For the delta-change method, changes in mean monthly values between baseline and future periods were generated for each RCM [83]. For the bias-correction method, monthly systematic biases were calculated by comparing RCM predictions with observations for the historical period. Monthly mean biases were calculated for each RCM in terms of fractional change for precipitation, wind speed, relative humidity and solar radiation, or difference for minimum and maximum air temperature [83]. The delta-change method considers the changes in monthly mean values between historical and future periods and applies them to observed values without considering changes in variability, while the bias-correction method only removes the calculated monthly biases from the historical and future periods, preserving the changes in projected climate data variability [83]. The hydrologic simulations with either method (delta-change and bias-correction) resulted in similar streamflow [83].

Comparison between dynamic and statistical methods
A number of studies have compared SD to DD methods, using historical data for the comparison. There are a number of methods for evaluating performance of a downscaling approach (Table B in S1 Text). In general, the RCMs with bias correction seem to be the most effective at representing local historical climate accurately. For example, Chen et al. (2012) compared six different downscaling methods on the effects of climate change on hydrology in a Canadian basin [23]. Methods included using Canadian RCM (CanRCM) with and without bias correction, change factor and weather generator methods at both Canadian GCM and CanRCM scales and two statistical downscaling methods: SDSM with variance inflation and bias correction and discriminant analysis for precipitation coupled with step-wise regression for precipitation and temperature method at the Canadian GCM scale [23]. The bias correction was applied to CanRCM output for both monthly mean frequency and quantity for temperature and precipitation data [23]. The change factor method was applied by adjusting the observed daily temperature by adding the difference in monthly temperature predicted by the climate model to obtain future daily temperature. The weather generator method used CLIGEN, a first order two-state Markov chain that generates the occurrence of wet and dry days and the probability of precipitation given those and a normal distribution to simulate temperature minimas and maximas [23]. SDSM was implemented by linking daily probabilities of non-zero precipitation with large-scale predictors [84]. The final method was an extension of SDSM, with probability of precipitation downscaled using discriminant analysis and daily precipitation intensity of wet days downscaled using a stepwise linear regression [23]. All methods worked relatively well, though CanRCM with bias correction and SDSM were best.
Jang and Kavvas (2015) compared BCSD with dynamical downscaling using the MM5 model on precipitation variability over California [19]. Bias correction was done using a QM method that uses probability density functions (PDFs) for the aggregated monthly GCM simulated precipitation and bias correcting them based on the corresponding aggregated observations using QM. Spatial downscaling was then done using the synographic mapping system (SYMAP) algorithm [19]. BCSD-based normalized standard deviation and local precipitation change values did not show realistic spatial variation [19]. The spatial characteristics of the interpolated precipitation field were very different compared to observed values [19]. In contrast, MM5-based normalized standard deviation and local precipitation change values exhibited more realistic spatial patterns of monthly and annual precipitation variability [19]. Their conclusion was that BCSD was not suitable for assessment of future climate change at the watershed scale [19]. Pierce et al. (2013) used sixteen GCMs to generate temperature and precipitation projections for California [85]. The GCMs were downscaled with two statistical techniques (BCCA and BCSD) and three nested RCMs (RegCM3, WRF, and RSM) [85]. The authors compared climate projections using DD and SD approaches and evaluated systematic differences [85]. Of all methods, only BCCA maintained the daily sequence of the original GCM variability [85]. Schmidli et al. (2007) compared six SD models and three DD RCMs to evaluate downscaled daily precipitation in considering complex topography (European Alps). The SDs included regression methods, weather typing methods, a conditional weather generator, and a bias correction and spatial disaggregation method [14]. There was good agreement between the downscaled precipitation for most statistics [14]. All downscaling methods produced adequate mesoscale climate patterns [14]. Spatial congruence was better for SDs than for RCMs; predicted RCM patterns were typically shifted by a few grid points [14]. When considering simulated current precipitation, SDs and RCMs tended to have similar biases but differed in terms of interannual variations [14]. All SDs tend to strongly underestimate interannual variation magnitude, especially in summer and for precipitation intensity [14]. The downscaling approaches also diverged with regards to year-to-year anomaly correlation: in winter, over complex terrain, the better RCMs were more accurate than the SD approaches but over flat terrain and in summer, the differences were small [14]. Shrestha et al. (2013) compared BCSD with dynamically downscaled CRCM for simulating hydrologic changes using the Variable Infiltration Capacity (VIC) model [86]. The BCSD simulation was better able to predict precipitation, temperature, and runoff than the DD CRCM [86]. The biggest differences came from snow water equivalent and runoff [86]. While BCSD is identified as the preferred method for basin scale hydrologic impact modeling, projections were still found to differ considerably depending on which ensemble of GCMs was used [86].

Practical considerations
Most watershed modelers will employ pre-downscaled climate data for their models. In the Supporting Information we provide a summary of nine databases that host downscaled GCM simulations for various scenarios. A comparison of the key characteristics of each of these databases is provided in Table C in S1 Text. While this simplifies the process for a watershed modeler, it is important to understand the approaches (SD vs. DD) and specific methods employed. Not all GCMs and IPCC scenarios are available, which means the user must search for database(s) that can provide the necessary information. In addition, the user needs to determine which GCMs and downscaling approaches are more likely to accurately represent the climate in their watershed. This is best done by downloading downscaled GCM output for a historical period and comparing it to observed data. In the following sections we provide a workflow, with an applied example, which can be useful for watershed modelers considering creating projections of future hydrology and/or water quality.
Since the databases of downscaled output offer several GCMs, regional models, and downscaling methods to choose from, a methodology is needed for selecting which GCMs to use. A direct comparison between observed meteorological station data and GCM "predictions" of the past can be used to select the combinations of GCMs, regional models, and downscaling method(s) best suited to a given region. The rationale is that a model that can most adequately predict the past climate is likely to be more suitable for future projections. For example, the MACAv.2 database (https://climate.northwestknowledge.net/MACA/data_csv.php) provides downscaled data from the 21 GCMs (Table C in S1 Text). While the watershed modeler may choose to consider all 21 GCMs, resource and time considerations may require selecting the GCMs that perform better with regards to matching historical observations. A systematic approach for comparing output from each GCM to historical observations should be employed.

Conclusions
A water resources manager and/or watershed modeler interested in assessing the potential implications of climate change in their region has many options in terms of how best to consider GCM climate projections. However, selecting among the various options is complex. The selection of an appropriate downscaling method will depend on the desired spatial and temporal resolution for the watershed analysis, as well as resource and time constraints. It is difficult to recommend the "best" downscaling method, since the goals and resources of each study are unique, however, the general consensus across studies is that, particularly when trying to model hydrologic impact and take into consideration the uncertainties in climate projections, multiple GCMs and an ensemble of downscaling methods that include bias correction should be used to ensure the full range of possible outcomes is explored and accounted for.
In this review, we provide a pathway for selecting the major options, i.e., statistical vs. dynamic downscaling, as well as the menu of options available for each approach, such as weather typing, constructed analog, weather generators, and regression methods for SD, and RCMs for DD. Most watershed modelers will not implement a downscaling approach, given the considerable effort required to do so, but will rather be consumers of datasets available from several databases, which generally cover every region of the world. There are more options for obtaining SD datasets, with a wider range of GCMs and IPCC scenarios, than for DD datasets. Even with the availability of these datasets, a watershed modeler needs to make a number of important decisions regarding the GCMs and methods available, to select a subset of models that performs better in predicting the historical climate for a particular region.