Translating seasonal climate forecasts into water balance forecasts for decision making

Seasonal rainfall forecasts support early preparedness. These forecasts are typically disseminated at Regional Climate Outlook Forums (RCOFs), in the form of seasonal tercile probability categories: above normal, normal, below normal. However, these categories cannot be related directly to impacts on terrestrial water stores within catchments, since these impacts are mediated by non-linear hydrological processes occurring on fine spatiotemporal scales, including rainfall partitioning into infiltration, evapotranspiration, runoff and groundwater recharge. Hydrological models are increasingly capable of capturing these processes, but there is no simple way to drive such models with a specific RCOF seasonal tercile rainfall forecast. Here we demonstrate a new method, "Quantile Bin Resampling" (QBR), for producing seasonal water forecasts for a drainage basin by integrating a tercile seasonal rainfall forecast with a hydrological model. QBR is based on mapping historical quantiles of basin-average rainfall to historical simulations of the water balance, and circumvents challenges associated with using climate model output to drive impact models directly. We evaluate QBR by generating 35 years of seasonal reforecasts for various water balance stores and fluxes for the Upper Ewaso Ng'iro basin in Kenya. Hindcasts indicate that when input tercile rainfall forecasts have skill, QBR provides accurate water forecasts at kilometre-scale resolution, which is relevant for anticipatory action down to village level. Pilot operational experimental water forecasts were produced for this basin using QBR for the 2022 March-May rainfall season, then disseminated to regional stakeholders at the Greater Horn of Africa Climate Outlook Forum (GHACOF). We discuss this initiative, along with limitations, plans and the future potential of the method. Beyond the demonstrated application to water-related forecasts, QBR can be easily adapted to work with any rainfall-driven impact model.
It can translate objective tercile climate probabilities into impact-relevant water balance forecasts at high spatial resolution in an efficient, transparent and flexible way.


Introduction
Nearly 20% of the world's population live in areas of high or extremely high water insecurity [1], affecting lives and livelihoods in ways that are especially acute in the developing world [2]. Dryland regions have an inherently higher risk of water scarcity due to low annual rainfall totals and high atmospheric evaporative demand, so small deviations in seasonal totals and increases in interannual variability can quickly lead to drought conditions. Early warning of water insecurity can support mitigation of drought impacts on dryland communities and trigger preparedness activities (e.g., through the Red Cross Forecast-based Financing protocol [3]). These activities might include distribution of drought-tolerant seeds, destocking of livestock or provision of cash transfers to vulnerable households in advance of anomalous dry periods [4,5]. To maximise the potential for preparedness, early warnings should have as much lead time as possible; seasonal climate forecasts offer long lead times compared to weather forecasts, and are often available months ahead of the start of a season.
Seasonal climate forecasts are disseminated via Regional Climate Outlook Forums (RCOFs), which meet multiple times per year. These RCOFs were developed by the World Meteorological Organization (WMO) in the 1990s in conjunction with National Meteorological and Hydrological Services (NMHSs) and are focused on the delivery of climate outlook products that are used by a wide range of organisations to plan responses to climate hazards. In RCOFs, downscaled forecasts of seasonal total rainfall are issued at a coarse spatial resolution (30-50 km), which may be suboptimal for many decision-making contexts and limits the spatial precision of targeted interventions. For instance, boreholes may be rehabilitated in anticipation of a poor season to protect water security during times of water stress; however, low spatial precision in the expected impacts limits the ability to deploy limited resources in a concentrated manner. Providing this spatial precision requires accounting for the processes controlling the partitioning of rainfall into surface and subsurface water stores, which are non-linear in time and space and highly dependent on topographic, soil and other basin characteristics. As it stands, seasonal total rainfall forecasts do not directly provide useful information about available water at specific locations, e.g., soil moisture for crop growth, or streamflow or groundwater for drinking or irrigation. Translating climate forecasts into high-resolution water balance metrics with more relevance to people's lives and livelihoods is key to the development of people-centred multi-hazard early warning systems [6].
To provide more decision-relevant information, dynamical climate model output can be used to drive impact models directly. This method has been used to generate forecasts of hydrological variables such as soil moisture [7] or streamflow [8], but the approach faces several issues. Biases in mean total seasonal rainfall, daily rainfall distribution, and the spatial pattern of teleconnections may induce errors in grid-point output from climate models, which are in turn compounded within impact models. Spatial and temporal downscaling may also be required to generate input data at the appropriate scales, which has been shown to degrade forecast skill [9]. Initial conditions for the impact model must also be generated, often using computationally intensive techniques such as data assimilation. None of these challenges is insurmountable [7]. However, climate model biases may remain uncorrected, and significant time and computing resources are required to obtain downscaled climate forecasts for generating real-time operational forecasts and reforecasts, which may be infeasible for actors with limited technical capacity.
Even if all the challenges associated with driving models with dynamical climate model output are met, many seasonal forecasts do not provide the input variables required to drive an impact model. One example is a statistical forecast (e.g., [10,11]), which predicts seasonal total rainfall from historical relationships with large-scale predictors; such a forecast is typically not accompanied by any prediction of daily variability. Another ubiquitous example comes from RCOF forecasts, on which many NMHSs base their own forecasts. For example, forecasts for the Greater Horn of Africa Climate Outlook Forum (GHACOF) are produced by post-processing and spatially calibrating seasonal total output from dynamical climate models, to provide probabilities of seasonal total rainfall falling in each of three terciles: above normal, normal, below normal [12]. This processing does not generate accompanying daily rainfall data consistent with the tercile forecasts. Given the ubiquity of RCOF forecasts produced in this manner, it would be advantageous to be able to link a particular tercile forecast to hydrological impact models that operate at high spatiotemporal resolution.
In this paper we present a new method, "Quantile Bin Resampling" (QBR), which is used here to link GHACOF tercile forecasts with a process-based, high-resolution dryland hydrology model. QBR is parsimonious and technically much simpler than driving an impact model with high frequency seasonal forecast data, and it requires no bias correction or downscaling of climate model output. The resulting forecast is entirely consistent with the tercile seasonal climate forecast used to create it, whilst remaining faithful to physically realistic high frequency hydrometeorological processes and their interactions with the land surface. It provides seasonal outputs of multiple water balance components at kilometre-scale resolution (the native resolution of the hydrological model).
In the next section we describe QBR and the dryland hydrology model employed here, along with other methodological approaches. In section 3 we evaluate the skill of QBR-based hydrological forecasts over a basin in Kenya. In section 4 we describe the release of pilot water balance forecasts at GHACOF 60 in February 2022. Finally, in sections 5 and 6 we discuss the limitations of this approach, describe plans for development of QBR, and outline some potential applications of the method.

Methodology
We use QBR to link tercile rainfall probabilities with output from a recently published hydrological model, DRYP 1.0 [13], designed specifically for partitioning of the water balance in dryland catchments. Below, we describe the essential aspects of the DRYP model, the QBR methodology and finally our forecast verification strategy.

DRYP: A hydrological model created for drylands
DRYP has been designed to estimate the water balance with particular emphasis on dryland environments, where pre-existing models have poor capability in resolving water fluxes and stores that are faithful to the underlying hydrological processes [13]. Partitioning of rainfall into the water balance in drylands is highly dependent on key processes occurring at or near the land surface on small spatio-temporal scales. DRYP simulates these hydrological processes (such as infiltration-excess overland flow, transmission losses and focused groundwater recharge) and is designed to be run at high spatio-temporal resolution, to capture the hydrological dynamics associated with brief and spatially restricted intense rainfall events. DRYP is parsimonious and designed for use in data-poor settings, where only limited calibration is possible. The model uses spatial fields of sub-daily rainfall and potential evapotranspiration (PET) as input, along with various input layers representing land surface properties such as topography, vegetation and soil characteristics. Evaluation of the model has demonstrated skill in quantifying the main components of the dryland water balance [14].
A historical DRYP simulation for the Upper Ewaso Ng'iro basin of central Kenya [14] was used here as a testbed for QBR. This catchment is characterised by humid conditions in its headwaters near Mount Kenya, with evapotranspiration losses occurring under energy-limited conditions and largely perennial streams with a significant groundwater contribution to baseflow. In contrast, the lower part of the basin is dominated by arid and semiarid conditions and ephemeral streams, where focused groundwater recharge often occurs via transmission losses from streams. The basin boundary and accompanying shapefile were derived from the NASA SRTM dataset [15].
As described in detail by [14], a historical simulation was produced by running DRYP at a 3-hour timestep and 1-km spatial resolution. Historical rainfall forcing at 0.1° spatial resolution was taken from 3-hourly MSWEP v2.8 [16], and hourly PET at 0.1° from hPET [17]. Forcing data were interpolated to the model resolution (1 km) using bilinear interpolation. Simulations were carried out for the period 1980-2016; however, the QBR forecast analysis used only the period 1982-2016, as the first two years were used for model spin-up. Model outputs characterising the water balance were extracted as the basis of the QBR forecast generation method. These include seasonal total actual evapotranspiration (AET), seasonal average soil moisture and seasonal total potential groundwater recharge. Potential groundwater recharge is defined here as the amount of water draining below the root zone of any model grid cell (diffuse recharge) or percolating below the riparian zone along channels (focused recharge). Additionally, the seasonal average of the commonly used water requirements satisfaction index (WRSI) [18] was calculated as the ratio between estimated AET and input PET. Finally, a metric of seasonal flood hazard was estimated for all cells along the river network defined in the model by counting the number of days in each season on which streamflow exceeded the 99th percentile level (estimated from all flow days in the historical simulation).
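The two derived metrics above (WRSI and flood-hazard days) reduce to simple array operations. The sketch below is illustrative only and is not the DRYP post-processing code. It assumes the model output has already been aggregated to daily streamflow per river cell and to seasonal AET and PET totals per cell (DRYP itself runs at a 3-hour timestep), and it takes WRSI as the ratio of seasonal totals, which is one reading of the definition above. The function names `wrsi` and `flood_days` are hypothetical.

```python
import numpy as np


def wrsi(aet_season, pet_season):
    """WRSI as AET / PET per cell (assumed here: ratio of seasonal totals).

    Cells with zero PET are returned as 0 rather than dividing by zero.
    """
    aet = np.asarray(aet_season, dtype=float)
    pet = np.asarray(pet_season, dtype=float)
    return np.divide(aet, pet, out=np.zeros_like(aet), where=pet > 0)


def flood_days(daily_flow, season_ids, q=99.0):
    """Count, per season and river cell, the days on which flow exceeds the
    cell's q-th percentile computed from all days of the historical run.

    daily_flow: (n_days, n_cells) daily streamflow (assumed pre-aggregated)
    season_ids: (n_days,) label of the season each day belongs to (e.g. year)
    """
    flow = np.asarray(daily_flow, dtype=float)
    seasons = np.asarray(season_ids)
    threshold = np.percentile(flow, q, axis=0)  # one threshold per cell
    exceed = flow > threshold                    # (n_days, n_cells) booleans
    return {s.item(): exceed[seasons == s].sum(axis=0) for s in np.unique(seasons)}
```

Computing the percentile once over the whole historical record (rather than per season) matches the description above, so a dry season can legitimately score zero flood days.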

Quantile bin resampling
We used the observation-driven historical simulation from DRYP described above (hereafter, DRYP-obs) as a basis for producing hydrological forecasts with QBR. Here we demonstrate the method using tercile seasonal rainfall probabilities as input, such that the results are relevant to the transformation of the GHACOF seasonal rainfall forecast into hydrological (water balance) forecasts.
Philosophically, QBR is akin to analogue forecasting [19], in which the current climate state is matched to similar states in the past, and the regional climate impacts in this historical subsample are used to forecast future conditions. With QBR, historical model outputs are categorised by the quantile of seasonal rainfall total associated with that season (e.g., upper, middle or lower tercile). Historical hydrological outcomes can then be grouped according to these categories (Fig 1, steps 1 and 2). This mapping of rainfall quantiles to historical model simulations (the quantile bin) thus provides a plausible set of hydrological outcomes associated with each tercile.
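Steps 1 and 2 can be sketched as below. This is a minimal illustration, not the authors' implementation: it assumes the basin-average seasonal rainfall for each historical year is already available, uses numpy's default (linear) empirical quantiles for the tercile thresholds, and numbers the categories 0 = lower, 1 = middle, 2 = upper. The function name and the rainfall values are hypothetical.

```python
import numpy as np


def tercile_bins(basin_rain):
    """Assign each historical season to a tercile of basin-average rainfall.

    basin_rain: (n_years,) seasonal basin-average rainfall totals.
    Returns the tercile category per year (0 = lower, 1 = middle, 2 = upper)
    and, for each tercile, the indices of the years that fall in it.
    """
    rain = np.asarray(basin_rain, dtype=float)
    lower_t, upper_t = np.quantile(rain, [1 / 3, 2 / 3])
    category = np.digitize(rain, [lower_t, upper_t])
    bins = [np.flatnonzero(category == c) for c in range(3)]
    return category, bins


# Hypothetical example: 9 seasons of basin-average rainfall (mm)
rain = np.array([310, 120, 250, 90, 400, 180, 220, 150, 330])
category, bins = tercile_bins(rain)
# The quantile bin pairs each tercile with the historical hydrological outputs,
# e.g. hist_recharge[bins[2]] would hold the upper-tercile seasons' recharge maps.
```

Only the year indices are stored, so the same bin can be reused for every water balance variable and every pixel of the historical simulation.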
Once a quantile bin is produced, we can transform probabilistic forecasts of basin-average rainfall quantiles into a hydrological forecast ensemble of any size. This is done by resampling (with replacement) from each quantile subset, where the number of samples is chosen to be proportional to the input forecast probability. For example, a 10-member hydrological forecast ensemble can be generated from a tercile forecast of 60/30/10% upper/middle/lower terciles by resampling 6/3/1 members from each of the tercile subsets (Fig 1, part 3). Large ensembles can thus be easily generated, allowing for more robust representation of all members within each quantile subset. In the following analysis, we resample from the quantile bin to generate 100-member ensembles.
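The resampling step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it takes the per-tercile year indices from the binning step, orders probabilities as (lower, middle, upper) rather than the upper/middle/lower order used in the text, and uses largest-remainder rounding (an assumption) so the member counts always sum to the requested ensemble size, since probabilities such as 33/33/33% do not divide 100 members exactly. The names `allocate_members` and `qbr_resample` are hypothetical.

```python
import numpy as np


def allocate_members(probs, n_members):
    """Split n_members across categories in proportion to probs,
    using largest-remainder rounding so the counts sum to n_members."""
    p = np.asarray(probs, dtype=float)
    p = p / p.sum()                      # accept percentages or fractions
    raw = p * n_members
    counts = np.floor(raw).astype(int)
    shortfall = n_members - counts.sum()
    order = np.argsort(-(raw - counts), kind="stable")
    counts[order[:shortfall]] += 1       # give leftovers to largest remainders
    return counts


def qbr_resample(bins, probs, n_members, rng=None):
    """Return historical-year indices forming the hydrological ensemble.

    bins: sequence of 1-D index arrays (lower, middle, upper tercile years)
    probs: tercile forecast probabilities in the same (lower, middle, upper) order
    """
    rng = np.random.default_rng(rng)
    counts = allocate_members(probs, n_members)
    draws = [rng.choice(members, size=k, replace=True)  # with replacement
             for members, k in zip(bins, counts) if k > 0]
    return np.concatenate(draws)
```

With the 60/30/10% upper/middle/lower example above, `allocate_members([10, 30, 60], 10)` returns 1, 3 and 6 members for the lower, middle and upper terciles. Indexing a historical output array of shape (n_years, n_pixels) with the returned indices then gives the (n_members, n_pixels) hydrological forecast ensemble.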
QBR is applied here at the scale of a hydrological basin. Although the output is spatially explicit, the method is designed only for input rainfall forecasts without spatial structure, i.e., basin-average seasonal forecast probabilities. QBR thus assumes homogeneous climate forecast anomalies across the domain. This is reasonable for a basin small enough to experience similar climate forcing throughout. For very large (e.g., transboundary) basins this assumption may not be realistic, and a different approach may be required. However, predictable seasonal signals arising from large-scale forcing tend to be homogeneous over relatively large scales (e.g., Fig 4 in [20]). The Upper Ewaso Ng'iro basin has an area of approximately 15,000 km², slightly larger than a 1-degree grid box, the scale at which disseminated seasonal forecast output is typically available. We therefore consider the assumption of a homogeneous seasonal forecast signal over this basin to be reasonable.

Forecast evaluation
Evaluation strategy.
Any quantile forecast can be transformed with the quantile bin into a hydrological forecast ensemble. This ensemble can in turn be used to estimate the probability of crossing any defined hydrological threshold by calculating the proportion of hydrological ensemble members lying above or below that threshold.
Fig 1. Steps of the QBR method. (1) Historical hydrological simulations, which here come from DRYP-obs, although they could be drawn from a long observational record. (2) These historical simulations are characterised according to the quantiles (in this case, terciles) of basin-average rainfall, to produce a quantile map. (3) A hydrological forecast ensemble is generated by resampling from the quantile map. An ensemble size of N = 10 is illustrated, but in practice the ensemble should be large enough that all quantile bin members are proportionally present. Once the hydrological forecast ensemble is created, probabilities of any hydrological 'event' (on any variable of interest) can be calculated from the proportion of ensemble members in which that event occurs.
https://doi.org/10.1371/journal.pclm.0000138.g001
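Turning a hydrological ensemble into event probabilities, as described above, is a per-pixel count of members crossing a threshold. The sketch below assumes an ensemble array of shape (n_members, n_pixels) such as the one produced by resampling; the function name and the example values are hypothetical, while the two example events (recharge greater than 0 mm, and soil moisture below a quarter of capacity) are taken from the physical events described later in this paper.

```python
import numpy as np


def event_probability(ensemble, threshold, above=True):
    """Per-pixel probability of an event from a hydrological ensemble.

    ensemble: (n_members, n_pixels) resampled historical outputs
    threshold: scalar or (n_pixels,) array (e.g. a per-pixel return level)
    above: True for 'exceeds threshold', False for 'falls below threshold'
    """
    ens = np.asarray(ensemble, dtype=float)
    hits = ens > threshold if above else ens < threshold
    return hits.mean(axis=0)  # fraction of members in which the event occurs


# Hypothetical 4-member ensemble of seasonal recharge (mm) at two pixels
recharge = np.array([[0.0, 5.0], [1.0, 5.0], [2.0, 0.0], [3.0, 0.0]])
p_recharge = event_probability(recharge, 0.0)  # event: recharge > 0 mm

# Hypothetical 2-member soil moisture ensemble at one pixel, capacity 1.0
soil = np.array([[0.1], [0.3]])
p_stress = event_probability(soil, 0.25 * np.array([1.0]), above=False)
```

Because the threshold broadcasts against the pixel axis, the same function serves both fixed physical thresholds and per-pixel thresholds derived from the historical record.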
Before relying on this new forecast, it is essential to evaluate the method and its products. The lack of long-term hydrological records at sub-basin scale can present a challenge to evaluation of forecast skill (where 'skill' refers generally to the ability of a forecast system to anticipate observed variability). To move beyond this limitation and improve understanding of the QBR method and its potential forecast skill, we follow the three-tiered approach for evaluation of impact models driven by climate forecasts [21]. Tier-1 refers to evaluation of the input climate forecasts against climate observations. Tier-3 refers to evaluation of the forecast output from a model driven by climate forecasts against real observations of the target variable (for instance, long records of streamflow): a gold standard, but often hampered by lack of verification data. Tier-2 provides an intermediate test, in which the output forecasts from the impact model driven by climate forecasts are evaluated against historical simulations produced by that same model driven by historical meteorological observations. In our case, Tier-2 involves evaluating QBR forecasts (driven by quantile rainfall forecast probabilities) against DRYP-obs (driven by historical climate data).
If DRYP-obs accurately captures real-world hydrological variability, then skill found with Tier-2 evaluation will be representative of Tier-3 (i.e., "real-world") skill. For many river basins there are insufficient data records to confirm this assumption robustly. As a proxy, historical DRYP simulations have been shown to accurately represent the main components of the dryland water balance in a data-rich dryland basin [13]. Beyond this, even with significant uncertainty in the relationship between DRYP-obs and real-world hydrology, Tier-2 verification still provides a useful "stress test" of the QBR methodology for water forecasting. By comparing the Tier-2 evaluation of QBR forecasts produced with differing levels of input skill, we can test whether the output performs as expected: does higher-skill input (accurate input rainfall forecasts) result in higher-skill output?

Using the SEAS5 hindcast as a proxy for GHACOF forecast skill.
Our goal here is to estimate the potential skill of the GHACOF seasonal forecast for hydrological forecasting. The GHACOF forecast is coordinated and disseminated by the IGAD Climate Prediction and Applications Centre (ICPAC), the World Meteorological Organization's provider of climate services for the Greater Horn of Africa. Although GHACOF seasonal forecasts have been archived since their inception in 1998, the method of production changed significantly in 2019, from a so-called consensus forecast to an "objective" methodology. The new method is better documented, more transparent and repeatable, supporting a more widely accepted forecast. We therefore decided not to use the archive of forecasts from before 2019, which may not be a good indication of current forecast skill. Instead, and in order to use the full range of DRYP-obs data, we use reforecasts from the European Centre for Medium-Range Weather Forecasts (ECMWF) seasonal forecasting system, SEAS5 [22], as a proxy for the potential skill of the GHACOF forecast.
SEAS5 is a key model used in production of the GHACOF forecast (before and after 2019), and internal ICPAC evaluation indicates it is one of the most skilful models for the region. We therefore assume that SEAS5 represents a reasonable indication of likely GHACOF forecast skill. A 25-member SEAS5 hindcast is available from 1981 to present, allowing for the assessment of reforecasts covering the entire DRYP-obs period. Here we calculate seasonal rainfall totals from all members across the SEAS5 hindcast and calculate tercile thresholds across the Ewaso Ng'iro. These thresholds are then used to convert the ensemble hindcast into probabilities, where the percentage of ensemble members above the upper and below the lower tercile threshold provides the probability of upper and lower tercile seasonal rainfall.
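The conversion of the SEAS5 ensemble hindcast into tercile probabilities can be sketched as below. This is an illustration under stated assumptions rather than the processing actually used: it assumes the input is already a basin-average seasonal total for each year and member, pools all years and members to set the tercile thresholds (our reading of the description above), and takes the middle-tercile probability as the remainder. The function name is hypothetical.

```python
import numpy as np


def tercile_probabilities(hindcast):
    """Convert an ensemble hindcast into tercile probabilities.

    hindcast: (n_years, n_members) basin-average seasonal rainfall totals.
    Returns (n_years, 3) probabilities ordered (lower, middle, upper).
    """
    h = np.asarray(hindcast, dtype=float)
    # Thresholds from the hindcast's own climatology, pooled over years and members
    lower_t, upper_t = np.quantile(h, [1 / 3, 2 / 3])
    p_lower = (h < lower_t).mean(axis=1)   # fraction of members below lower tercile
    p_upper = (h > upper_t).mean(axis=1)   # fraction of members above upper tercile
    p_middle = 1.0 - p_lower - p_upper     # remaining members
    return np.column_stack([p_lower, p_middle, p_upper])


# Hypothetical 3-year, 10-member hindcast
probs = tercile_probabilities(np.arange(30.0).reshape(3, 10))
```

Setting thresholds from the model's own climatology means each year's probabilities express a shift relative to the model's typical behaviour, so a constant bias in SEAS5 seasonal totals does not by itself change the issued probabilities.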

Generating benchmark hindcasts.
We evaluate the potential of a SEAS5-driven hydrological forecast by generating QBR hindcasts using DRYP-obs. Hindcasts are created for the two main rainfall seasons of the region: the long rains (March to May, hereafter MAM) and the short rains (October to December, OND). For each season, a 100-member hydrological forecast ensemble is created for every year in DRYP-obs. The ensemble for each year is resampled based on the triad of tercile rainfall probabilities from the SEAS5 hindcast for that year.
Alongside the SEAS5 QBR hindcast, we created two benchmark hindcasts: "perfect" and "climatology". The first, "perfect", is created for each historical rainfall season using a forecast probability of 100% for the observed tercile category. If QBR works as intended, we expect this hindcast to show the highest skill, because the forecast has perfect a priori knowledge of which tercile of rainfall occurred. The skill achieved by the perfect forecast therefore represents the upper limit of skill for impact forecasts produced by the QBR method based on terciles. Any remaining forecast errors cannot arise from uncertainty in the specification of the rainfall tercile, and must instead be attributed to the simplification of all spatiotemporal variability in meteorological forcing to the tercile of seasonal basin-average rainfall.
The second benchmark hindcast, "climatology", is produced by inputting a forecast of the climatological frequency of each tercile (i.e., 33/33/33% upper/middle/lower tercile) for all seasons. This represents a lower bound on impact forecasting skill, where the input carries no information about which tercile (lower, middle or upper third) the rainfall fell in. We expect no skill from the QBR method for the "climatology" hindcast, as the input forecast probabilities do not change from one year to another. Output probabilities should be reliable (unbiased), but they will show no variability from season to season.
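The two benchmark inputs can be written down directly as yearly rows of tercile probabilities, which are then passed to the same resampling step as the SEAS5 probabilities. This is a minimal sketch: the (lower, middle, upper) ordering and the function name are assumptions for illustration. Note that exact thirds cannot be split evenly over a 100-member ensemble, so some rounding rule is needed when resampling the "climatology" benchmark.

```python
import numpy as np


def benchmark_probabilities(observed_tercile, kind):
    """Input tercile probabilities (lower, middle, upper) for each season.

    observed_tercile: (n_years,) observed tercile category, 0 = lower ... 2 = upper
    kind: "perfect" puts 100% on the observed tercile; "climatology" uses 1/3 each.
    """
    cats = np.asarray(observed_tercile, dtype=int)
    n = cats.size
    if kind == "perfect":
        probs = np.zeros((n, 3))
        probs[np.arange(n), cats] = 1.0   # all weight on the tercile that occurred
    elif kind == "climatology":
        probs = np.full((n, 3), 1.0 / 3.0)  # same input every year
    else:
        raise ValueError(f"unknown benchmark: {kind!r}")
    return probs
```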
Skill of each hindcast for each variable is assessed with the hindcast ensemble mean correlation against DRYP-obs. This is assessed for each 1-km grid cell (hereafter, pixel) in the test basin indicating the level and spatial variability in the association between forecast signals and observed variability. Statistical significance of correlations is calculated at the 95% level using a t-test [23].
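Per-pixel ensemble mean correlation and its significance can be computed in a vectorised way. The sketch below is an assumption about the details: it uses the Pearson correlation and the standard two-sided t-test with n - 2 degrees of freedom, t = r √((n − 2)/(1 − r²)), which is one common form of the test cited above. Pixels with no interannual variability return NaN (the correlation is undefined) and are flagged as not significant. The function name is hypothetical.

```python
import numpy as np
from scipy import stats


def ensemble_mean_correlation(ens_mean, obs, alpha=0.05):
    """Pearson correlation per pixel between hindcast ensemble means and
    DRYP-obs, with a two-sided t-test for significance.

    ens_mean, obs: arrays of shape (n_years, n_pixels).
    Returns (r, significant), each of shape (n_pixels,).
    """
    ens_mean = np.asarray(ens_mean, dtype=float)
    obs = np.asarray(obs, dtype=float)
    n = obs.shape[0]
    fa = ens_mean - ens_mean.mean(axis=0)   # forecast anomalies
    oa = obs - obs.mean(axis=0)             # observed anomalies
    with np.errstate(divide="ignore", invalid="ignore"):
        r = (fa * oa).sum(axis=0) / np.sqrt((fa**2).sum(axis=0) * (oa**2).sum(axis=0))
        r = np.clip(r, -1.0, 1.0)           # guard against rounding beyond +/-1
        t = r * np.sqrt((n - 2) / (1.0 - r**2))
    t_crit = stats.t.ppf(1.0 - alpha / 2.0, df=n - 2)
    significant = np.abs(t) > t_crit        # NaN compares as False
    return r, significant
```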

Probabilistic verification.
We also carried out probabilistic verification, focusing on the bias in forecast probabilities (reliability) and the interannual variability in issued probabilities (sharpness). Probabilities are calculated for a variable crossing a predefined threshold (hereafter referred to as an "event"). These two characteristics are central to the real-world utility of forecast probabilities for decision-making. High reliability indicates that the probabilities "mean what they say": when higher forecast probabilities are issued, the event occurs more often in the corresponding observations. High sharpness indicates that forecasts are issued across a wide range of probability levels and so have the capacity to discriminate low-risk from high-risk contexts.
Reliability diagrams [24] were used to evaluate reliability and sharpness, with separate diagrams produced from seasonal forecast probabilities for three "events" for each variable. The first two events are defined by one-in-three-year and one-in-five-year return period thresholds respectively, calculated per pixel across DRYP-obs (the fraction of members of the hydrological forecast ensemble breaching this threshold provides the forecast probability). The third event is defined with a fixed threshold across all pixels, corresponding to a physical event specific to each variable. For example, the physical event for groundwater recharge is defined as "greater than 0 mm". For brevity, only headline findings are reported in the main manuscript; full details of the probabilistic verification methodology and results are provided in S1 and S2 Texts, respectively.
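The inputs to a reliability diagram can be assembled as below. This sketch is not the procedure documented in S1 Text: it assumes that a 1-in-T-year threshold is the empirical (1 − 1/T) quantile of DRYP-obs for exceedance events (or the 1/T quantile for "falls below" events), and it uses ten equal-width probability bins. Bin counts give the sharpness histogram; the observed frequency per bin plotted against mean forecast probability gives the reliability curve. Both function names are hypothetical.

```python
import numpy as np


def return_level(hist, return_period, exceedance=True):
    """Per-pixel threshold with the given return period (in seasons) from a
    historical run of shape (n_years, n_pixels), as an empirical quantile."""
    q = 1.0 - 1.0 / return_period if exceedance else 1.0 / return_period
    return np.quantile(np.asarray(hist, dtype=float), q, axis=0)


def reliability_table(fc_prob, observed, n_bins=10):
    """Bin forecast probabilities and compare with observed event frequency.

    Returns, per probability bin: mean forecast probability, observed
    frequency (NaN for empty bins) and the number of forecasts (sharpness).
    """
    p = np.ravel(np.asarray(fc_prob, dtype=float))
    o = np.ravel(np.asarray(observed, dtype=float))
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.digitize(p, edges[1:-1])  # bin index 0..n_bins-1
    counts = np.bincount(idx, minlength=n_bins)
    mean_prob = np.bincount(idx, weights=p, minlength=n_bins) / np.maximum(counts, 1)
    with np.errstate(invalid="ignore", divide="ignore"):
        obs_freq = np.bincount(idx, weights=o, minlength=n_bins) / counts
    return mean_prob, obs_freq, counts
```

Forecast probabilities and outcomes from all pixels and years are pooled (flattened) here, which is one common choice for basin-wide reliability diagrams.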
Results are presented and discussed for MAM and OND, for which GHACOF forecasts are issued in February and August, respectively. We compare verification for the two benchmark hindcasts with verification based on SEAS5 hindcasts for each respective season at the time of the GHACOF (i.e., the 1-February forecast for MAM and the 1-August forecast for OND).

Forecast evaluation results
Ensemble mean correlation demonstrates the ability of a forecast method to reproduce observed interannual variability. Correlation plots for the Ewaso Ng'iro basin are shown in Figs 2 and 3 for MAM and OND, respectively. The skill ranking of the different forecasts follows expectations. When "perfect" terciles are used as input, the output shows the highest skill. Forecast skill with SEAS5 input is lower, yet it is significantly higher than the "climatology" expectation (where all terciles are equally probable).
With "perfect" tercile forecasts as input, different spatial patterns of correlation between hindcast and observed outputs can be seen for each water balance variable. Forecasts of groundwater recharge for MAM (Fig 2F) show the highest skill in the mountainous southeast, with patchy areas of zero skill across the northern half of the basin. Forecasts of soil moisture, AET and WRSI (Fig 2G-2I) all show higher skill across the basin, with slightly reduced skill in the south. Flood hazard forecasts show correlations above 0.8 for most streams (flood hazard can only be measured along channels), with skill below statistical significance in the east of the basin. These patterns are broadly consistent for OND (Fig 3F-3J), although areas of zero correlation for groundwater recharge are located primarily in the west of the basin, whilst the other variables show higher correlations overall compared to MAM.
PLOS CLIMATE
These differences in predictability are related to the underlying interannual variability in DRYP-obs. In the limit case where a variable shows zero interannual variability, there are, by definition, no anomalies to predict: the correlation is undefined and no skill is possible. Very low levels of variability may arise from "noise", that is, fluctuations which are unpredictable on a seasonal scale. Conversely, high predictability is likely to be associated with greater interannual variability in the historical record. To demonstrate this, maps of standard deviation across all DRYP-obs years for MAM and OND are shown in S1 and S2 Figs. Comparison with the "perfect" correlations in Figs 2 and 3 shows a consistent pattern: areas where interannual variability is low also show the lowest correlation values. In OND the northwest of the basin has very low interannual variability in recharge, whilst the southeast margins have low interannual variability for soil moisture, AET and WRSI (S3 Fig). All these regions show the lowest correlations for the respective variables, generally below statistical significance (Fig 3). For MAM the area of low interannual variability in recharge is much more extensive than for OND and reaches across to the east of the basin (S2 Fig, panel A): this is reflected in low correlations stretching across the same region (Fig 2F). Similarly to OND, MAM variability of AET and WRSI is also low in the southeast of the region, and low correlations are also seen here for these variables. Flood hazard shows the lowest variability in the east during MAM (S2 Fig, panel E); perfect forecasts also show below-significant correlations here (Fig 2J). This indicates a strong association between low interannual variability and low forecast skill. On the other hand, high interannual variability does not necessarily lead to high skill, since large year-to-year fluctuations may not arise from the predictable signal in the input rainfall forecast.
QBR driven by SEAS5 shows spatial patterns of ensemble mean correlation similar to those of "perfect" tercile forecasts across all variables, though with lower magnitudes (Fig 2A-2E). For example, significant correlations for MAM recharge occur only in patchy areas across the basin. Correlations for soil moisture, AET and WRSI are statistically significant across most of the basin, with lower values in the southeast. Flood hazard forecasts are statistically significant only in some streams in the northwest. Similarly, skill for SEAS5 OND (Fig 3A-3E) shows the same spatial pattern as "perfect" forecast skill, with overall higher levels of skill than SEAS5 MAM. Probabilistic verification also demonstrates high consistency between the skill of the input rainfall forecasts and the reliability and sharpness of the output water forecasts (S3 to S12 Figs). For both MAM and OND, inputting "perfect" tercile forecasts provides highly reliable (low bias) and sharp (high year-to-year variability) forecasts, as expected, whilst input of "climatology" forecast rainfall produces reliable probabilities with no sharpness. Results for SEAS5 are broadly reliable, with more sharpness than climatology forecasts and less than perfect forecasts. Higher sharpness is found across all variables for OND compared to MAM, consistent with the higher skill of input rainfall forecasts for the OND season. A more complete diagnosis of probabilistic skill can be found in the supplementary material.
The overall conclusions from the forecast evaluation are as follows. Firstly, translating a basin-average rainfall tercile forecast into water balance forecasts results in spatially heterogeneous skill. This reflects the non-linear relationship between seasonal rainfall totals and impacts on the water balance, further emphasising the need for methods that translate rainfall into more relevant metrics reflecting variability in the water balance. Secondly, the verification presented here demonstrates QBR to be robust, transferring the level of forecast skill in the inputs to the outputs in the expected way. This supports the use of QBR to generate and release real-time water forecasts for MAM for incorporation into the GHACOF, beginning in February 2022. This activity is discussed in the following section, along with interpretation of results and potential limitations of the method.

Releasing a water forecast at GHACOF 60
The GHACOF objective seasonal rainfall forecast for MAM was issued by ICPAC at GHACOF 60 on February 17th 2022 [25]. The resulting rainfall terciles over the Upper Ewaso Ng'iro basin were used to drive the QBR method and to produce water forecasts. The seasonal rainfall forecast is shown in Fig 4, along with example output from our QBR water forecasts.
Broadly, the rainfall forecast indicates an increased chance of a wetter-than-average season across most of the region (Fig 4A). Most of Kenya (including the Upper Ewaso Ng'iro) falls in the defined "Zone I" region, where the probabilities of upper, middle and lower tercile seasonal total rainfall were forecast at 50%, 25% and 25%, respectively. These probabilities were used with the QBR method to produce forecasts for the five hydrological variables evaluated in this paper. For each variable, maps were generated representing the forecast probability of each tercile. In addition to these tercile forecasts, a separate forecast was produced for each variable, corresponding to the probability of that variable falling above or below a physical threshold for that season (see supplementary material for details on these physical thresholds).
Some examples of the resulting forecasts are shown in Fig 4B-4E. They show explicitly how a basin-average rainfall tercile forecast is translated differently into water balance forecasts depending on location. For instance, Fig 4B shows that the GHACOF forecast leads to an enhanced chance of above-normal WRSI across most of the region, particularly in the north, yet without any enhanced signal in the mountainous south of the basin. Conversely, the GHACOF rainfall forecast leads to an increased chance of groundwater recharge across the south and in some areas of the north (Fig 4C), but not across large areas of the basin where interannual variability in recharge is historically low (S1 Fig, panel A).
From the verification of the SEAS5-based hindcasts, we expect the probabilities of water balance variables to be reliable: higher forecast probabilities correspond to a genuinely increased chance of the defined events. Correlation analysis indicates low predictability for soil moisture and WRSI in the southeast of the region, and across the central part of the basin (Fig 2A & 2D); consistent with this analysis, the forecasts generated for MAM 2022 show no strong forecast signal in these areas.
Examples of physical-threshold probability forecasts are shown in Fig 4D & 4E: respectively, the probability of an indicative "severe plant stress" threshold, taken as seasonal average soil moisture falling below a quarter of the soil's capacity (see supplementary methods for details), and the probability of at least one extreme flow day. As expected, the water forecast suggests that the wetter mountainous region is not at risk of severe plant stress in terms of soil moisture, whilst this probability is higher further north and east, with sub-basin variation in probability resolved down to the native 1-km scale of the DRYP output. The flood hazard map indicates the streams that are almost certain to see at least one extreme flow day, along with those where the probability is lower but still significant.
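The physical-threshold forecasts fit the same resampling framework: the forecast probability of an event is the total weight of the historical years in which the simulated event occurred. The sketch below assumes a simple tercile-based year weighting (each year's weight is its rainfall tercile's forecast probability, split equally among the years in that tercile), seasonal-mean soil moisture and soil capacity per pixel, and daily simulated flow per stream reach. The extreme-flow threshold is left as a placeholder argument because its definition lives in the supplementary methods; all names are hypothetical.

```python
import numpy as np


def year_weights(basin_rain, tercile_probs):
    # Hypothetical helper: each year's weight is the forecast probability of its
    # rainfall tercile (lower, middle, upper), split equally within that tercile.
    lo, hi = np.quantile(basin_rain, [1 / 3, 2 / 3])
    rain_bin = (basin_rain >= lo).astype(int) + (basin_rain >= hi).astype(int)
    counts = np.bincount(rain_bin, minlength=3)
    return np.asarray(tercile_probs, dtype=float)[rain_bin] / counts[rain_bin]


def event_probability(event, weights):
    """Total weight of the historical years in which the event occurred.
    event: boolean array of shape (n_years, n_cells)."""
    return (weights[:, None] * event).sum(axis=0)


def severe_plant_stress(mean_soil_moisture, capacity):
    """Seasonal-average soil moisture below a quarter of the soil's capacity.
    mean_soil_moisture: (n_years, n_pixels); capacity: (n_pixels,)."""
    return mean_soil_moisture < 0.25 * capacity


def any_extreme_flow_day(daily_flow, threshold):
    """At least one day above an (assumed) extreme-flow threshold in the season.
    daily_flow: (n_years, n_days, n_reaches); threshold: scalar or (n_reaches,)."""
    return (daily_flow > threshold).any(axis=1)


# Usage with synthetic data for 35 historical seasons.
rain = np.arange(35, dtype=float) + 100.0
w = year_weights(rain, [0.25, 0.25, 0.50])
soil = np.column_stack([np.full(35, 0.1), np.full(35, 0.5)])  # two pixels
p_stress = event_probability(severe_plant_stress(soil, np.array([1.0, 1.0])), w)
flow = np.zeros((35, 92, 1))
flow[-1, 10, 0] = 5.0  # one extreme day, in the wettest historical season only
p_flood = event_probability(any_extreme_flow_day(flow, 1.0), w)
```

Here the always-stressed pixel gets probability 1, the never-stressed pixel 0, and the reach that flooded only in one upper-tercile year gets that year's share of the 50% upper-tercile probability.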
These experimental forecasts were piloted at GHACOF 60, where they were presented to stakeholders at a water and energy co-production session on February 16th and at a side event on February 17th 2022. We also included a participatory element to elicit feedback and stimulate discussion between stakeholders, inviting perspectives on the forecasts, their appearance and formatting. The water forecasts of key water balance variables developed here for the upcoming season, together with all 35 years of historical simulations from DRYP-obs, have been uploaded as layers to the East Africa Hazards Watch (EAHW) public GIS platform, which is in development at http://hazardswatch.icpac.net/map/ea/ (an illustrative screenshot showing a flood risk layer is shown in Fig 5; the new forecasts are included under the "water security" tab). EAHW aims to provide decision-ready information to support transnational coordination and early action across borders. It allows comparison with other hazard layers, improving the overall context of the forecast information delivered at the GHACOF. The inclusion of QBR forecasts on the EAHW platform is a step toward decision-relevant water security forecasts covering a range of water balance variables. These will be developed further with stakeholders as part of a co-production approach, and expanded to more basins and ultimately the broader region at future GHACOF events.

Discussion
The evaluation of QBR presented here used tercile forecasts based on SEAS5 to produce kilometre-scale water forecasts. Results for a large basin in Kenya indicate that forecast probabilities from the method are reliable in both seasons. OND forecasts achieve higher skill by virtue of issuing a wider range of probabilities, despite being issued at a longer (one-month) lead time than MAM forecasts. This is consistent with the understanding of differences in seasonal forecast skill from dynamical models: predictions of MAM rainfall from dynamical models generally have limited skill, whilst OND is more skilful owing to its strong link to large-scale, predictable climate modes (e.g. [26][27][28][29]).
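Why a wider range of reliable probabilities means higher skill can be made concrete with the standard Murphy decomposition of the Brier score for a binary event (for example, "upper tercile"): BS = reliability - resolution + uncertainty. For two equally reliable forecast sets with the same base rate, the one issuing probabilities further from climatology has higher resolution and therefore a lower (better) Brier score. The function name is hypothetical and the numbers are synthetic and purely illustrative, not results from the hindcasts.

```python
import numpy as np


def brier_decomposition(prob, outcome):
    """Brier score and its reliability, resolution and uncertainty terms,
    grouping forecasts by their distinct issued probability values."""
    prob = np.asarray(prob, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    n = prob.size
    base_rate = outcome.mean()
    rel = res = 0.0
    for f in np.unique(prob):
        sel = prob == f
        n_k = sel.sum()
        obs_freq = outcome[sel].mean()  # observed frequency when f was issued
        rel += n_k * (f - obs_freq) ** 2
        res += n_k * (obs_freq - base_rate) ** 2
    bs = np.mean((prob - outcome) ** 2)
    return bs, rel / n, res / n, base_rate * (1.0 - base_rate)


# Two perfectly reliable synthetic forecast sets with the same base rate (0.35).
narrow_p = np.repeat([0.5, 0.2], 10)
narrow_o = np.array([1] * 5 + [0] * 5 + [1] * 2 + [0] * 8)
wide_p = np.repeat([0.6, 0.1], 10)
wide_o = np.array([1] * 6 + [0] * 4 + [1] * 1 + [0] * 9)

bs_n, rel_n, res_n, unc_n = brier_decomposition(narrow_p, narrow_o)
bs_w, rel_w, res_w, unc_w = brier_decomposition(wide_p, wide_o)
print(f"narrow: BS={bs_n:.3f}, resolution={res_n:.4f}")  # BS=0.205, resolution=0.0225
print(f"wide:   BS={bs_w:.3f}, resolution={res_w:.4f}")  # BS=0.165, resolution=0.0625
```

Both sets have zero reliability error and identical uncertainty, so the whole difference in Brier score comes from resolution, the term that rewards issuing confident, well-calibrated probabilities.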
Pilot forecasts based on the GHACOF seasonal forecast were produced for the MAM 2022 season and released in real time as part of a stakeholder workshop at the GHACOF. These water forecasts for the Ewaso Ng'iro basin, along with historical DRYP-obs data, have been integrated into the East Africa Hazards Watch platform. The format of these forecasts was initially developed with ICPAC and will evolve through continuous feedback from stakeholders, working together to tailor the GHACOF seasonal forecast optimally for decision-making. Prior research has argued for more engagement from such intermediate-level stakeholders and users, which can lead to more robust development and assessment of climate services beyond the traditional focus on end-user communities [30]. In parallel, work is underway to set up the DRYP model for basins across the Greater Horn of Africa, beginning with additional basins in Ethiopia and Somaliland. The ultimate vision is to extend DRYP-based forecasts across the entire region, transforming the GHACOF seasonal forecast into a regional water forecast at 1-km scale and enhancing its relevance for water security. This is a key step in the move from hazard forecasting to impact-based forecasting, turning forecasts into a richer source of information that helps decision makers take more informed and precise anticipatory action [31].
Verification of the QBR hindcasts demonstrates that, even with a spatially homogeneous rainfall forecast across the basin, the predictability of impacts on sub-basin water stores is highly spatially variable and strongly dependent on the specific water balance variable. Our results indicate interannual variability in water stores as a key factor controlling this potential predictability: where historical variability is low, the achievable level of predictability is limited. This raises an interesting question: why do seasonal averages of some water balance components in certain regions show minimal year-to-year fluctuations despite large variability in seasonal rainfall? We have not investigated this further here, but it likely arises from an interaction between local hydroclimatological processes and land surface properties. In the case of the Upper Ewaso Ng'iro, mountainous regions consistently receive large rainfall totals even in "lower tercile" years, so the lower bound on soil moisture and WRSI is relatively high, limiting the possible range of states (and therefore damping interannual variability). By contrast, in the drier, hotter lowlands where potential evapotranspiration is consistently high, groundwater recharge is rare (occurring only as focused recharge below channels), and tercile seasonal total rainfall is not a strong predictor of modelled recharge. This spatial diversity in the response of surface water to seasonal rainfall anomalies is not quantified in current seasonal forecasts. Providing this insight is part of the "added value" of the QBR approach in translating seasonal outlooks into more decision-relevant metrics.
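The two factors discussed above, the interannual variability of a water store and the strength of its relationship with seasonal rainfall, can both be mapped per pixel from the historical simulations. A minimal sketch follows; the use of Spearman rank correlation is an assumption (a rank measure matches the quantile-based logic of QBR and may differ from the correlation analysis used in the paper), and the function names are hypothetical.

```python
import numpy as np


def ranks(x, axis=0):
    # Rank of each value along the axis (0 = smallest); assumes no ties.
    return np.argsort(np.argsort(x, axis=axis, kind="stable"), axis=axis, kind="stable")


def predictability_diagnostics(basin_rain, sim):
    """Per-pixel interannual standard deviation of a simulated seasonal variable and
    its Spearman rank correlation with basin-average seasonal rainfall.

    basin_rain: (n_years,); sim: (n_years, n_pixels).
    Pixels with no interannual variability get a correlation of NaN.
    """
    r_rain = ranks(basin_rain).astype(float)
    r_sim = ranks(sim, axis=0).astype(float)
    r_rain -= r_rain.mean()
    r_sim -= r_sim.mean(axis=0)
    rho = (r_rain[:, None] * r_sim).sum(axis=0) / np.sqrt(
        (r_rain ** 2).sum() * (r_sim ** 2).sum(axis=0)
    )
    std = sim.std(axis=0, ddof=1)
    return std, np.where(std > 0, rho, np.nan)


# Usage with synthetic pixels: monotonic response, inverse response, no variability.
rng = np.random.default_rng(2)
rain = rng.gamma(4.0, 100.0, size=35)
sim = np.column_stack([rain ** 2, -rain, np.full(35, 5.0)])
std, rho = predictability_diagnostics(rain, sim)
```

Pixels with near-zero standard deviation or weak correlation are where even a skilful rainfall tercile forecast would be expected to translate into little or no water forecast signal.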
One limitation of QBR is that it does not capture any predictability from initial conditions. Knowledge of anomalous antecedent conditions can add valuable information to seasonal predictions, particularly where the anomaly persists for many months. Sources of initial-condition predictability include snowpack accumulation and soil moisture [32]. However, despite initial hydrological conditions prior to each forecast season not being included, we still find high skill in the forecasts. This may arise from the pronounced contrast between the dry and wet seasons in the region: each wet season being forecast follows a persistent lack of rainfall during the preceding dry season, which reduces water stores to a similarly low level by the start of most rainfall seasons [33]. Variability in start-of-season surface water may therefore offer limited predictability compared with the subsequent seasonal rainfall total, which ends up dominating hydrological predictability. This is consistent with [32], which shows that streamflow forecasts made at times of year when moisture states are typically depleted gain minimal added value from knowledge of initial soil moisture conditions, and at these times derive practically all of their hydrological forecast skill from the future boundary conditions (i.e., the rainfall that follows the initial conditions).
Another limitation of the method is that the set-up is fixed to link seasonal totals to statistics of seasonal hydrology. This means that QBR cannot disaggregate outcomes over a time series through the season. However, this problem might be overcome by updating the entire seasonal forecast as the season progresses, for instance by using recomputed tercile rainfall forecasts generated by combining observed rainfall to date with a forecast of the rainfall total for the remainder of the season, similar to the approach used in TAMSAT-ALERT to forecast soil moisture [34]. Indeed, as the season approaches, ICPAC releases regular updates after the initial GHACOF forecast, and these can be used to update the QBR forecast accordingly. QBR does not yet exploit any predictability offered by interannual variability in temperature or, more generally, PET. Rising temperatures tend to increase PET and thereby decrease available water in surface stores, but can potentially increase focused recharge due to suppressed water tables in some parts of the basin [14]. Incorporating temperature forecasts into the QBR methodology may therefore improve hydrological forecasts. Whilst the association of hydrological outcomes with temperature is weaker than with rainfall, forecast signals of temperature anomalies are generally stronger than those for rainfall, boosting their potential value for hydrological predictability. In theory, QBR could incorporate temperature forecasts by resampling from a two-dimensional matrix of seasonal total rainfall and seasonal average temperature instead of rainfall alone. However, the 35 years of simulations based on the observational record are too few for this: dividing 35 years into three rainfall bins leaves roughly 12 years per bin, whereas a three-by-three grid of rainfall and PET bins leaves fewer than four years per cell on average.
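The in-season update idea can be sketched as follows, in a simplified spirit similar to TAMSAT-ALERT [34] but not its actual implementation: rainfall observed so far is added to each historical year's rainfall for the remainder of the season, and the resulting ensemble of plausible seasonal totals is compared with the historical tercile boundaries. The recomputed probabilities could then be passed to QBR in place of the original tercile forecast. The function and inputs are hypothetical; equal weighting of historical remainders (climatology) is an assumption, and a forecast for the remainder of the season could be supplied as weights instead.

```python
import numpy as np


def updated_tercile_probs(observed_to_date, hist_remainder, hist_season_totals, weights=None):
    """Tercile probabilities for the seasonal rainfall total given rainfall observed so far.

    observed_to_date: rainfall already observed this season (scalar).
    hist_remainder: (n_years,) historical rainfall for the part of the season still to come.
    hist_season_totals: (n_years,) historical full-season totals defining the terciles.
    weights: optional probability weight per historical remainder (equal if None).
    Returns probabilities ordered (lower, middle, upper).
    """
    totals = observed_to_date + np.asarray(hist_remainder, dtype=float)
    if weights is None:
        weights = np.full(totals.size, 1.0 / totals.size)
    lo, hi = np.quantile(hist_season_totals, [1 / 3, 2 / 3])
    bins = (totals >= lo).astype(int) + (totals >= hi).astype(int)
    return np.bincount(bins, weights=weights, minlength=3)


# Usage with synthetic data: 30 historical seasons.
hist_totals = 100.0 + 10.0 * np.arange(30)  # 100, 110, ..., 390 mm
hist_remainder = 0.5 * hist_totals          # rainfall still to come in each year
# Start of season: nothing observed yet, so the "remainder" is the whole season.
print(updated_tercile_probs(0.0, hist_totals, hist_totals))
# Mid-season after a very wet start.
print(updated_tercile_probs(300.0, hist_remainder, hist_totals))
```

At the start of the season the sketch returns climatological thirds; after a very wet start every plausible completion lands in the upper tercile.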
We plan to address this limitation by creating long synthetic records of PET based on observed records [17,35] and using these with DRYP to create a much longer record of plausible outcomes. This will also facilitate the use of forecast inputs based on smaller quantile bins, such as quintiles or deciles.
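The sample-size constraint can be checked directly by counting how many historical years fall into each cell of a joint rainfall-PET quantile grid. The sketch uses synthetic, independent rainfall and PET series (real series may be correlated, which would make some cells sparser still); it illustrates the arithmetic only, is not part of the published method, and the function names are hypothetical.

```python
import numpy as np


def quantile_bin(x, n_bins):
    """Index 0..n_bins-1 of each value relative to the sample's own quantiles."""
    edges = np.quantile(x, np.arange(1, n_bins) / n_bins)
    return np.searchsorted(edges, x, side="right")


def joint_bin_counts(rain, pet, n_bins=3):
    """Number of years in each (rainfall bin, PET bin) cell, shape (n_bins, n_bins)."""
    cell = quantile_bin(rain, n_bins) * n_bins + quantile_bin(pet, n_bins)
    return np.bincount(cell, minlength=n_bins * n_bins).reshape(n_bins, n_bins)


# Synthetic, independent series: observed-length record vs a long synthetic record.
rng = np.random.default_rng(1)
for n_years in (35, 1000):
    counts = joint_bin_counts(rng.gamma(4.0, 100.0, n_years), rng.normal(5.0, 0.5, n_years))
    print(n_years, "years -> fewest years in any cell:", counts.min(), "| mean:", counts.mean())
```

The same counting applies in one dimension: 35 years split into deciles gives only three or four years per bin, which is why a longer synthetic record would also open up quintile or decile forecast inputs.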
It would be valuable to compare forecasts generated with QBR against alternative forecasting methods, in order to understand and measure relative differences in performance. However, to our knowledge there are currently no other hydrological forecasting systems able to produce kilometre-scale forecasts of the complete water balance (let alone systems that can be linked easily with tercile rainfall forecasts and have reforecast data available), so such a benchmarking is not currently possible. Besides forecasting methods based on hydrological models, a benchmark forecast based on a simple statistical relation between seasonal rainfall tercile and hydrological observations might be considered, but the lack of long, kilometre-scale records of the water balance currently precludes this. The QBR method itself can, however, be considered a relatively simple statistical forecasting method, and water balance forecasts generated with QBR are themselves a good candidate for a future benchmark. Because QBR is computationally cheap and simple to implement, it would provide a strong test for more computationally demanding kilometre-scale forecasting methods.
The application of QBR presented here used DRYP, a dryland hydrological model. However, in principle QBR can link any process-based model of any complexity to any quantile-based seasonal forecast. Indeed, an early precursor of this method was used to visualise the uncertainty in the relationship between seasonal average climate and malaria risk (MacLeod and Morse 2014). QBR could even use observational data to generate the quantile bins, provided those data cover a sufficiently long period.

Conclusions
Here we have presented a new generalised approach (QBR) to link quantile rainfall forecast probabilities with simulations from process-based impact models. QBR is applied to a basin in Kenya to link seasonal tercile rainfall forecasts to a hydrological model which represents key dryland hydrological processes. We have demonstrated the method to be robust, carrying the level of forecast skill in the inputs through to the outputs in the expected way. The main advantages of this method are its simplicity and its flexibility to work with any seasonal forecast, especially forecasts without accompanying data at the resolution required to run process-based models (i.e. daily and/or sub-daily rainfall data). This is particularly relevant for statistical seasonal forecasts based on indices, which can outperform dynamical model output (e.g. Funk 2020). QBR is useful even when daily and sub-daily data are available, as it is much less computationally intensive than alternative methods. QBR does not require any bias correction of inputs, initialisation strategy or downscaling, nor any bespoke simulations to generate new forecasts at high spatial resolution in real time. Further development of QBR, including its extension to other climate impacts, will support the translation of tercile-based seasonal climate outlooks into impact-relevant forecasts at high spatial resolution in a quick, transparent and flexible way.
Fig. As S3 Fig, for OND. (EPS)
S9 Fig. As S4 Fig, for OND. (EPS)
S10 Fig. As S5 Fig, for OND. (EPS)
S11 Fig. As S6 Fig, for OND. (EPS)
S12 Fig. As S7 Fig, for OND. (EPS)