Accurate estimations of local precipitation are necessary for assessing water resources and water-related disaster risks. Numerical models are typically used to estimate precipitation, but biases can result from insufficient resolution and incomplete physical processes. To correct these biases, various bias correction methods have been developed. Recently, bias correction methods using machine learning have been developed for improved performance. However, estimating local hourly precipitation characteristics remains difficult due to the nonlinearity of precipitation. Here, we focused on precipitation systems that could be reproduced by numerical models, and estimated the spatial distribution of local precipitation by recognizing the relationship between simulated and observed precipitation with a resolution of 0.06 degrees using a machine learning method. We subsequently applied a quantile mapping method to modify the precipitation amounts. Validation showed that our method could significantly reduce bias in numerical simulations, especially the spatial distribution of hourly precipitation frequency. However, the bias in the temporal distribution of hourly precipitation did not improve. Spatial autocorrelation analysis showed that this method can predict precipitation systems with spatial scales of 2500 to 40000 km2, which are associated with large-scale disturbances (e.g., cold fronts, warm fronts, and low-pressure systems). The high accuracy of these estimates indicates that the spatial distribution of hourly precipitation frequency is strongly dependent on precipitation systems with these spatial scales. Accordingly, our method shows that the relationship between the spatial distribution of precipitation systems and local precipitation is strong, and by recognizing this relationship, the spatial distribution of local hourly precipitation can be accurately estimated.
Citation: Yoshikane T, Yoshimura K (2022) A bias correction method for precipitation through recognizing mesoscale precipitation systems corresponding to weather conditions. PLOS Water 1(5): e0000016. https://doi.org/10.1371/journal.pwat.0000016
Editor: Debora Walker, PLOS: Public Library of Science, UNITED STATES
Received: September 10, 2021; Accepted: April 7, 2022; Published: May 12, 2022
Copyright: © 2022 Yoshikane, Yoshimura. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The MSM-GPV data were obtained from the archive system (http://apps.diasjp.net/gpv) of the Kitsuregawa laboratories at the Institute of Industrial Science, University of Tokyo. The Radar-AMeDAS data, which were provided by the JMA, are available by contacting the Japan Meteorological Business Support Center (http://www.jmbsc.or.jp/en/contact.html) by emailing the International Service Officer (firstname.lastname@example.org). The official name of the Radar-AMeDAS dataset in Japan is “Kaiseki-Uryou”. Topographic data of U.S. Geological Survey (USGS) (http://www.usgs.gov) was used. Made with Natural Earth. Free vector and raster map data @ naturalearthdata.com. (http://www.naturalearthdata.com/about/terms-of-use/).
Funding: TY and KY were funded by the Environment Research and Technology Development Fund S-20 of the Environmental Restoration and Conservation Agency of Japan (JPMEERF21S12020). KY was funded by the water environment and resource research project at the Earth Observation Research Center, Japan Aerospace Exploration Agency (JX-PSPC-533980), the advanced practice of watershed flood management using surface hydrological prediction system, "New Social Challenges" mission area, JST-Mirai Program (JPMJMI21I6), the Integrated Research Program for Advancing Climate Models (TOUGOU) (JPMXD0717935457) from the MEXT, and the Cross-ministerial Strategic Innovation Promotion Program (SIP), Cabinet Office, Government of Japan https://www8.cao.go.jp/cstp/gaiyo/sip/. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Precipitation varies greatly in time and space due to interactions between meso-scale disturbances and local conditions, such as topography. To reduce the risk of water-related hazards, regional precipitation characteristics must be accurately estimated. Numerical models are frequently used to estimate precipitation, and a meso-scale model can reproduce some local precipitation characteristics. However, it is quite difficult to avoid biases caused by model uncertainties, such as physical processes and insufficient resolution. Therefore, bias correction for precipitation is critical for predicting water-related disaster risks and water resources [2–4].
In general, bias correction methods statistically correct for model bias . Monthly bias correction, nested bias correction, and quantile mapping have been used to reduce model biases . Quantile mapping corrects the entire distribution of a variable [5, 7–10]; thus, it may be more effective in correcting the time series of daily precipitation than the other methods . However, if the number of rainy days reproduced by the numerical model is extremely small compared to the observed values, the correction value is adjusted corresponding to the simulation [12, 13]. Moreover, these bias correction methods do not consider random, small-scale variations, which can lead to artifacts . Therefore, correcting the spatial distribution of hourly precipitation frequency is difficult because the methods cannot estimate the precipitation corresponding to the time-varying local atmospheric field conditions .
A variety of bias correction techniques have been developed, most of which deal with daily and monthly precipitation; however, few studies have attempted to correct for bias in hourly precipitation simulations. Recently, focusing on the ability of regional climate models to reproduce extreme rainfall with specific synoptic patterns, a bias correction method for extreme rainfall has been proposed by adding a certain complexity to the traditional quantile mapping method . To estimate regional precipitation characteristics in detail, selecting numerical model outputs (predictions) that have sufficient information is important. Nevertheless, the features extracted for each variable are on a grid; thus the selected explanatory variables are often of a high dimension. Bias correction methods using machine learning, which has developed rapidly in recent years, can estimate precipitations by accurately recognizing the complex relationships between explanatory and objective variables . Additionally, bias correction methods have also been applied to improve the estimation accuracy for monthly or daily precipitation by applying reanalysis data and climate models [1, 14–19]. Furthermore, machine learning bias correction methods have been applied to improve the accuracy of weather forecasts. A previous bias correction study on probability forecasting for maximum hourly precipitation in an 80 km × 90 km area utilized a machine learning bias correction method (quantile regression forests: QRF) . They showed this method accurately forecasted a maximum local hourly precipitation threshold of 30 mm h-1 in the afternoon period with short lead times. However, it remains extremely difficult to estimate the more detailed spatial distribution characteristics.
Numerical models cannot accurately represent self-propagating precipitation systems that are strongly influenced by individual cumulus convection, such as squall lines, due to the nonlinearity of precipitation processes, although models can reproduce some characteristics of typical (small-scale) precipitation systems. Therefore, it is quite difficult to correct hourly precipitation using ordinary bias correction methods. Consequently, the accuracy of the bias correction for the detailed spatial distributions of hourly precipitation frequencies is significantly reduced. Nevertheless, squall-line-type precipitation systems are often formed in response to larger precipitation systems such as cold fronts, and mesoscale numerical models can reproduce these larger precipitation systems well. Therefore, we may potentially estimate the approximate temporal variations in local precipitation by recognizing the precipitation formation pattern corresponding to the movement of a larger precipitation system.
Precipitation is caused by complex interactions between precipitation systems and local conditions, such as orography. Generally, accurately reproducing the spatial distribution patterns of local precipitation is difficult due to the incompleteness of numerical models (e.g., resolution and parameterizations). Even a slight change in the atmospheric conditions, such as winds, can often significantly change the characteristics of the precipitation distribution due to the complex topography. Therefore, some differences (model biases) will exist between the spatial distribution patterns of simulated and observed precipitation at a local scale. However, the temporal variation bias for precipitation can be reduced if the observed and simulated precipitation distributions have linked patterns that correspond to the movement of larger precipitation systems. In this study, we developed and investigated the validity of a machine learning bias correction method that modifies the spatial distributions of hourly precipitation frequencies to reflect the local conditions by recognizing patterns corresponding to the movement of larger precipitation systems. The target area for the method validation is shown in Fig 1. The water resources in this area supply numerous extensive urban areas in Japan, such as Tokyo. Moreover, this area has suffered from many water disasters. Therefore, accurately estimating the precipitation distribution in this region is crucial.
A) The area surrounding the target area. B) The required area for the input data and the evaluation area for the target domain. The gray square with the dashed border is the area for the explanatory variable (simulated precipitation) and its central grid point is the objective variable (observed precipitation). The machine learning method was applied to all grid points in the evaluation area from the starting grid point (left bottom) to ending grid points (right top). Topographic data of U.S. Geological Survey (USGS) (http://www.usgs.gov) was used. Made with Natural Earth. Free vector and raster map data @ naturalearthdata.com. (http://www.naturalearthdata.com/about/terms-of-use/).
The Japan Meteorological Agency (JMA) uses the Automatic Weather Data Acquisition System (AMeDAS), which consists of 19 radars and approximately 1,300 rain gauges, to produce and provide graphs of hourly precipitation over a 1 km square area . We used Radar-AMeDAS data as the observed precipitation data in this study. Additionally, we focused on the evaluation of the bias correction effect of the method; therefore, the observed data were modified to a resolution of 0.06 degrees to correspond to the grid size of the simulated precipitation using a bi-linear interpolation method for the Grid Analysis and Display System (GrADS) .
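As a rough illustration of the regridding step, the following Python sketch applies bilinear interpolation to a small grid. It is not the GrADS routine used in this study; the function name `bilinear`, the list-of-rows grid layout, and the sample values are assumptions made for the example.

```python
def bilinear(grid, y, x):
    """Bilinearly interpolate `grid` (a list of rows) at fractional index (y, x).

    Assumes 0 <= y <= len(grid) - 1 and 0 <= x <= len(grid[0]) - 1,
    and that the grid has at least two rows and two columns.
    """
    y0 = min(int(y), len(grid) - 2)      # upper row of the enclosing cell
    x0 = min(int(x), len(grid[0]) - 2)   # left column of the enclosing cell
    ty, tx = y - y0, x - x0              # fractional position inside the cell
    top = (1 - tx) * grid[y0][x0] + tx * grid[y0][x0 + 1]
    bottom = (1 - tx) * grid[y0 + 1][x0] + tx * grid[y0 + 1][x0 + 1]
    return (1 - ty) * top + ty * bottom


# Hypothetical 2 x 2 patch of fine-grid precipitation values (mm/h).
fine = [[0.0, 10.0],
        [20.0, 30.0]]
center_value = bilinear(fine, 0.5, 0.5)  # value at the centre of the patch
```

In practice the target points would be the centres of the 0.06 degree cells expressed in fine-grid index coordinates.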
The JMA provides a mesoscale-model grid point values (MSM-GPV) dataset, which is a three-dimensional grid forecast for temperature, wind, water vapor, solar radiation, etc., for Japan and the neighboring oceans using a meso-scale numerical model at grid intervals of 5 km. Predictions for up to 39 hours in the future are released every three hours [24, 30, 31]. We used the MSM-GPV dataset as the simulated precipitation data in this study.
Bias correction for precipitation using machine learning
In machine learning, the objective variable is the one being predicted. The explanatory variables are those that can explain the objective variables. Training data is the data used to train computers to create practical artificial intelligence (AI) and machine learning models. A classifier is a machine learning model for data classification and is created using training data. We used the simulated precipitation over the 21 × 21 grid of 0.06 degree cells covering the area shown in Fig 1 as explanatory variables (feature vectors). The size of the explanatory variables was determined based on the results of experiments with different sizes, as described in the later subsection named "Definition of the size and resolution of the explanatory variables". The observed precipitation, which was measured at the center of the area for the explanatory variables, was used as the objective variable. Hence, the area required to establish the explanatory variable was larger than the evaluation area (Fig 1B). The machine learning (ML) method produced a classifier using a pair of simulated precipitation (explanatory variable) and observed precipitation (objective variable) values at each grid point. In the training data, the observed precipitation corresponded to the simulated precipitation from the initial time of data assimilation, which was conducted every 3 hours in the numerical model, to one hour following the assimilation. We used the observed and simulated precipitation values over 11 years (from 2007 to 2018, excluding the target year) as the training dataset and evaluated the estimated local precipitation. We confirmed the generalizability of the method by conducting a cross-validation using the estimated precipitation in January, April, July and October from 2007 to 2018.
Climate studies often evaluate the climatic characteristics of the four seasons: December–February, March–May, June–August, and September–November. Thus, the middle month of each season was selected to reflect its characteristics. Because the precipitation characteristics differ greatly among the seasons, evaluating all four months allowed us to assess the versatility of our method. The procedure for the bias correction method, the details of the method used to estimate local precipitation, and the number of training and testing data are shown in Fig 2, S1 Fig, and S1 Table, respectively.
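The pairing of a 21 × 21 window of simulated precipitation with the observed value at its central grid point can be sketched as follows. This is a minimal illustration in plain Python, not the code used in the study; the function names `window_features` and `training_pairs` and the list-of-rows field layout are assumptions.

```python
HALF = 10  # half-width of the 21 x 21 window: (21 - 1) // 2


def window_features(sim, cy, cx, half=HALF):
    """Flatten the (2 * half + 1)^2 simulated values centred on grid point (cy, cx)."""
    rows, cols = len(sim), len(sim[0])
    if not (half <= cy < rows - half and half <= cx < cols - half):
        raise ValueError("window extends beyond the input area")
    return [sim[y][x]
            for y in range(cy - half, cy + half + 1)
            for x in range(cx - half, cx + half + 1)]


def training_pairs(sim_fields, obs_fields, cy, cx, half=HALF):
    """Build (feature vector, target) pairs for one grid point over all time steps."""
    return [(window_features(sim, cy, cx, half), obs[cy][cx])
            for sim, obs in zip(sim_fields, obs_fields)]


# Hypothetical 5 x 5 field and a 3 x 3 window, small enough to inspect by hand.
field = [[r * 5 + c for c in range(5)] for r in range(5)]
features = window_features(field, 2, 2, half=1)
```

The bounds check reflects the point made above: the input area must extend 10 cells beyond the evaluation area on every side.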
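The cross-validation design, in which each target year is estimated by a model trained on the remaining 11 years, can be sketched as below. The generator name `leave_one_year_out` is an assumption, and loading the data for each year and month is left out.

```python
YEARS = range(2007, 2019)   # 2007 to 2018 inclusive, 12 years
MONTHS = (1, 4, 7, 10)      # January, April, July, October


def leave_one_year_out(years=YEARS):
    """Yield (target_year, training_years) with the target year held out."""
    years = list(years)
    for target in years:
        yield target, [year for year in years if year != target]


splits = list(leave_one_year_out())
```

Each of the four months would be processed with the same splits, so that no data from the estimated year enters the training set.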
We used support vector regression (SVR), the regression form of the support vector machine (SVM), as the machine learning model in this study. An SVM is a supervised learning method in which predictions depend on only a portion of the training dataset, the support vectors. An SVM attempts to obtain the optimal results by calculating the maximum-margin hyperplane, which is determined by maximizing the distance between the support vectors. SVMs have been employed in various fields, such as meteorology, hydrology, disasters, and water resources [19, 33, 34]. Previous studies have indicated the advantages and disadvantages of SVMs relative to other machine learning methods, such as neural networks and random forests [34–38]. For example, SVR has been shown to perform well with small sample sizes. Thus, this method is useful for recognizing rare precipitation events with small sample sizes. The Epsilon-Support Vector Regression implementation in the support vector machine library of scikit-learn 0.24.2 was used in this study.
Determination of hyperparameters
The SVR method requires the gamma, C, and epsilon hyperparameters to be configured. Gamma is a kernel function parameter that specifies the width of the Gaussian radial basis function kernel, C is the penalizing constraint error, and epsilon is the width of the insensitive zone.
In the SVM pattern recognition method, the vector $x = (x_1, x_2, x_3, \ldots, x_p)^T$ consisting of p explanatory variables is the input, and a classifier is trained to correctly output the objective variable $f(x_i)$. By introducing the intercept b and the coefficient vector $w = (w_1, w_2, w_3, \ldots, w_p)^T$, we define the linear regression function as follows.

$$f(x) = w^T x + b \tag{1}$$

In this function, b and w are estimated so that this relationship is satisfied. In SVR, the non-negative parameter ε is set in advance, and only large residuals of $e_i$ that exceed the range of $-\varepsilon \le e_i \le \varepsilon$ are recorded as penalties of $\xi_i$. The parameters are estimated to minimize the following equation, where $y_i$ is the observed value of the objective variable for the i-th of n training samples.

$$\min_{w,\,b,\,\xi} \; \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i \tag{2}$$

$$e_i = y_i - f(x_i) \tag{3}$$

However, the following restrictions apply.

$$\lvert e_i \rvert \le \varepsilon + \xi_i, \qquad \xi_i \ge 0 \tag{4}$$

In the above equations, C is the penalizing constraint error, and ε (epsilon) is the width of the insensitive zone. To determine the nonlinear regression equation, we introduce a feature map φ(x) that represents the vector of the nonlinear terms (features) of the explanatory variables. The regression function using the feature map is as follows.

$$f(x) = w^T \varphi(x) + b \tag{5}$$

To avoid increasing the computational cost due to the increased dimension of the feature space, a kernel function was introduced to represent the inner product in the feature space.

$$k(x, x') = \varphi(x)^T \varphi(x') \tag{6}$$

The inner product of the two vectors is the largest when they have the same direction, so the kernel function can be interpreted as the similarity between two vectors in feature space. However, it is difficult to calculate the inner product (6) when the dimension of the feature space is high; therefore, the kernel method uses the following function, which is an inner product on a high-dimensional space.

$$k(x, x') = \exp\left(-\gamma \lVert x - x' \rVert^2\right) \tag{7}$$

This function is called the Gaussian radial basis function kernel, and γ (gamma) is the kernel function parameter [41, 42].
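To make the roles of γ and ε concrete, the following sketch evaluates the Gaussian radial basis function kernel and the ε-insensitive penalty for single values. It is an illustration in plain Python rather than the scikit-learn implementation used in the study; the function names and the sample inputs are assumptions.

```python
import math


def rbf_kernel(x, z, gamma):
    """Gaussian RBF kernel exp(-gamma * ||x - z||^2) for two equal-length vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def epsilon_insensitive(residual, epsilon):
    """Penalty that is zero inside the insensitive zone and linear outside it."""
    return max(0.0, abs(residual) - epsilon)


same = rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5)       # identical vectors
apart = rbf_kernel([0.0], [1.0], gamma=1.0)                # unit distance
inside = epsilon_insensitive(0.0005, epsilon=0.001)        # residual within the zone
outside = epsilon_insensitive(-0.5, epsilon=0.001)         # residual beyond the zone
```

A smaller gamma makes the kernel decay more slowly with distance, so more distant training patterns still count as similar; a larger epsilon ignores a wider band of small residuals.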
Determining these hyperparameters is crucial for improving the generalizability of precipitation estimations. However, substantial computational resources are necessary to determine the optimal parameters [36, 43]. Therefore, it is necessary to obtain the optimal hyperparameters effectively. The hyperparameters could be configured at each point in the method, which may improve performance to some extent. However, this approach is extremely inefficient because considerable computational resources are required to determine the optimal values throughout the entire domain. Therefore, we applied the specified hyperparameter values to all grid cells in the domain according to the following procedure. First, we estimated the optimal hyperparameter values based on a random search on around 10 grid points in the domain. The optimal values of gamma, C, and epsilon were estimated to be approximately 5×10⁻⁶, 10, and 0.001, respectively, which were assumed to be applicable to all grid cells because they did not vary extensively among the grid points. Next, the precipitation estimation performance was investigated based on the correlation coefficients of 35 grid cells, which were averaged over every 10 grids, as shown in S2 Fig. In this second stage, the optimal gamma value was first estimated using temporary values of C (10) and epsilon (0.001). Second, the optimal C value was obtained using the optimal gamma value and a temporary epsilon value. Third, the optimal epsilon value was obtained using both the optimal gamma and C values. Finally, the optimal gamma was obtained again using both the optimal C and epsilon values. The parameters were considered optimally determined if they corresponded to the initial estimates or if the correlation coefficients did not change considerably. The resulting optimal values of gamma, C, and epsilon were approximately 5×10⁻⁶, 10, and 0.001, respectively. Thus, we obtained the optimal hyperparameter values and configured them for all grid cells.
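The one-parameter-at-a-time refinement described above (gamma, then C, then epsilon, then gamma again) can be sketched as a simple coordinate search. This is an illustration under assumptions: `coordinate_search` and `toy_score` are hypothetical names, and `toy_score` merely stands in for the averaged correlation coefficient that would be obtained by training and evaluating the SVR at the 35 grid cells.

```python
def coordinate_search(score, start, candidates,
                      order=("gamma", "C", "epsilon", "gamma")):
    """Tune one hyperparameter at a time while holding the others fixed."""
    best = dict(start)
    for name in order:
        # Keep the candidate that maximizes the score with the other values fixed.
        best[name] = max(candidates[name],
                         key=lambda value: score({**best, name: value}))
    return best


def toy_score(params):
    """Hypothetical stand-in for the averaged correlation coefficient."""
    return -(abs(params["gamma"] - 5e-6) * 1e5
             + abs(params["C"] - 10)
             + abs(params["epsilon"] - 0.001))


candidates = {"gamma": [1e-6, 5e-6, 1e-5],
              "C": [1, 10, 100],
              "epsilon": [0.0001, 0.001, 0.01]}
start = {"gamma": 1e-5, "C": 10, "epsilon": 0.001}   # e.g. from a random search
best = coordinate_search(toy_score, start, candidates)
```

Compared with an exhaustive grid over all three parameters, this evaluates far fewer combinations, at the cost of possibly missing interactions between parameters; the final gamma pass is a cheap check that the earlier choice still holds.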
Definition of the size and resolution of the explanatory variables
S3 Fig shows plots of the correlation coefficients obtained with different area sizes for the explanatory variable. The performance tended to improve as the area size increased. Small area sizes were likely insufficient for estimating precipitation with high accuracy because of the lack of information. In other words, limited explanatory variables may be insufficient to explain the objective variable. Considering the performance and computational cost, we set the size for the explanatory variable to a 21 by 21 grid in this study.
Estimation of heavy precipitation using quantile mapping
We utilized quantile mapping after applying the ML method to modify the amount of precipitation by using the observed and estimated cumulative distribution functions of hourly precipitation. Precipitation of 0.1 mm h⁻¹ or more was defined as a wet hour, and quantile mapping was applied to the precipitation during wet hours. The corrected intensity for the estimated precipitation of a given quantile was obtained from the observed precipitation intensity distribution with the same quantile value. Here, we performed quantile mapping using the observed hourly precipitation data and the estimated data during January, April, July, and October for 11 years from 2007 to 2018, excluding the estimated year. We also performed the conventional quantile mapping (QM) method using the observed and simulated cumulative distribution functions of hourly precipitation (Fig 2 and S1 Fig).
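A minimal empirical version of this quantile mapping step is sketched below. It assumes sorted reference samples of wet-hour values and passes dry hours through unchanged; both choices, the function name `quantile_map`, and the sample values are assumptions for illustration and may differ from the implementation used in the study.

```python
import bisect

WET_THRESHOLD = 0.1  # mm/h; hours at or above this value count as wet hours


def quantile_map(value, est_wet_sorted, obs_wet_sorted):
    """Replace an estimated wet-hour value with the observed value at the same quantile.

    Both reference samples must be sorted ascending and contain wet hours only.
    """
    if value < WET_THRESHOLD:
        return value  # assumption: dry hours are passed through unchanged
    n, m = len(est_wet_sorted), len(obs_wet_sorted)
    rank = bisect.bisect_right(est_wet_sorted, value)  # estimated values <= value
    index = (rank * m + n - 1) // n - 1                # ceil(rank * m / n) - 1
    return obs_wet_sorted[min(m - 1, max(0, index))]


estimated = [0.1, 0.2, 0.4, 0.8]   # hypothetical ML estimates (mm/h)
observed = [0.2, 0.5, 1.0, 3.0]    # hypothetical observations (mm/h)
corrected = [quantile_map(v, estimated, observed) for v in (0.05, 0.4, 0.8)]
```

Because the mapping only rescales intensities by rank, it cannot move precipitation in time or space, which is why it is applied after the ML step rather than in place of it.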
Prediction of local precipitation using 39-h forecasted precipitation simulations
We investigated the local precipitation-estimation performance of our method by using the 39-h forecasted precipitation data from the MSM-GPV dataset, which started at 0 UTC each day in January, April, July, and October for five years (from 2014 to 2018, as 39-h forecasted data were not provided by the JMA until 2014). For this purpose, we applied the classifiers and cumulative distribution functions produced by our method in advance (Fig 2 and S1 Fig).
Spatial autocorrelation of hourly precipitation
We investigated the spatial scale of the recognized precipitation systems by applying a spatial autocorrelation method to hourly precipitation data. The autocorrelations between two grid points were calculated using all grid points in the evaluation area, and the spatial scale was estimated in each month as the area in which the autocorrelation exceeded 0.7. The autocorrelation was calculated from the following equation:

$$r_{kl} = \frac{\sum_{i=1}^{n} (x_{k,i} - \bar{x}_k)(x_{l,i} - \bar{x}_l)}{\sqrt{\sum_{i=1}^{n} (x_{k,i} - \bar{x}_k)^2}\,\sqrt{\sum_{i=1}^{n} (x_{l,i} - \bar{x}_l)^2}} \tag{8}$$

where $r_{kl}$ is the spatial autocorrelation between points k and l, n is the total number of hourly precipitation data in each month from 2007 to 2018, x is the hourly precipitation, and $\bar{x}$ is its mean over the n data at the given point.
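The calculation can be sketched in plain Python as a correlation between the hourly series at a reference grid point and at every other point, followed by a threshold at 0.7. The use of the centred (Pearson) correlation, the function names, and the toy series are assumptions made for this illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (NaN if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def correlated_cells(series_by_cell, reference, threshold=0.7):
    """Return the cells whose series correlates with the reference cell above the threshold."""
    ref_series = series_by_cell[reference]
    return [cell for cell, series in series_by_cell.items()
            if pearson(ref_series, series) > threshold]


# Hypothetical hourly series (mm/h) at four grid cells.
series = {"a": [0, 1, 2, 3], "b": [0, 2, 4, 6], "c": [3, 2, 1, 0], "d": [1, 1, 1, 1]}
cells = correlated_cells(series, "a")
```

Multiplying the number of selected cells by the area of one grid cell would then give the highly autocorrelated area for that reference point.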
Nash–Sutcliffe Efficiency index of hourly precipitation
We investigated how well the temporal variation in the hourly precipitation was reproduced using the Nash–Sutcliffe Efficiency index (NSE). The NSE value was estimated using the following equation:

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n} (x_i - x'_i)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{9}$$

where NSE is the NSE index and n is the total number of hourly precipitation data in each month from 2007 to 2018. $x$, $x'$, and $\bar{x}$ are the observed hourly precipitation, estimated (or simulated) hourly precipitation, and 12-year average of the observed hourly precipitation, respectively.
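For reference, the NSE can be computed with a few lines of Python. In this sketch the mean of the supplied observed series stands in for the 12-year average; the function name `nse` and the sample values are assumptions.

```python
def nse(observed, estimated):
    """Nash-Sutcliffe Efficiency of an estimated series against observations."""
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    variance_sum = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - squared_error / variance_sum


obs = [0.0, 1.0, 2.0, 3.0]               # hypothetical hourly observations (mm/h)
perfect = nse(obs, obs)                  # identical series
climatology = nse(obs, [1.5] * 4)        # always predicting the observed mean
```

A value of 1 indicates a perfect match, 0 indicates skill equal to always predicting the observed mean, and negative values indicate worse skill than that baseline.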
Validation of the bias correction method for precipitation
The simulated precipitation distribution characteristics were substantially different from those of the observed precipitation due to topographic differences and incompleteness of numerical models. Nevertheless, as shown in S4 Fig, the MSM-GPV data represented the characteristics of the temporal variations in the area-averaged precipitation intensity over a wide area corresponding to large-scale weather patterns. Therefore, the distribution patterns of the simulated precipitation over a wide area were likely connected to the observed precipitation distribution through weather patterns.
Model bias was found in the long-term precipitation simulations. Fig 3 shows the spatial distribution of the frequency of the 95th-percentile values obtained from the observed precipitation (OBS), precipitation estimated by the ML method with quantile mapping (MLQM), simulated precipitation (SIM), and precipitation estimated by quantile mapping (QM), in January, April, July, and October from 2007 to 2018. The spatial correlation coefficient (R) and root mean square error (RMSE) of the frequency of the 95th percentile in MLQM, SIM, and QM for OBS are shown in Table 1 for the evaluation area in January, April, July, and October. The spatial correlation coefficients for the MLQM exceeded 0.98, and the RMSEs were reduced significantly. However, the correlation coefficient and RMSE values for the QM were almost the same as those for the SIM.
The observed precipitation (OBS), precipitation estimated by the ML method with quantile mapping (MLQM), simulated precipitation (SIM), and precipitation estimated by quantile mapping (QM), in January, April, July, and October from 2007 to 2018 are shown.
Figs 4 and 5 show the spatial distribution patterns of the 12-year average monthly means and 95th-percentile precipitation values during the target months, respectively. QM improved the monthly mean precipitation distribution characteristics to some extent, although its performance was inferior to that of MLQM. Unsurprisingly, QM improved the 95th-percentile values due to the characteristics of the QM method [12, 13]. The spatial correlation coefficients of the monthly means for the MLQM method exceeded 0.97, and the RMSEs were reduced significantly using this method. Moreover, the spatial correlation coefficients of the monthly means for the QM method were inferior to those for the MLQM method (Table 1).
Same as Fig 3 except for 12-year average monthly mean.
Same as Fig 3 except for 95th-percentile precipitation values.
Fig 6 shows the spatial distributions of the NSE index values for the hourly precipitation for MLQM, SIM, and QM. The NSE index shows the performance of the temporal variation of estimated (or simulated) hourly precipitation. The NSE index values for MLQM were noticeably higher than those for SIM and QM in the high precipitation areas in January. However, the difference between the NSE index values for MLQM, SIM and QM in other months was small.
Applicability of the method for forecasting precipitation data
Fig 7 shows the frequencies of the 95th-percentile OBS data, the precipitation estimated by applying the quantile mapping method after applying the 39-hour forecast value to the classifier created by the ML method (MLQM-Forecast), and the forecasted precipitation (SIM-Forecast), as well as the correlation coefficients and RMSE values, during the five years from 2014 to 2018 in January, April, July, and October. We used the forecasted data from 25 to 39 hours following the initial time to remove the influence of the initial time as much as possible. However, accurately comparing the simulated precipitation amount with the observed amount remained difficult because the forecasted data included errors based on the initial conditions. Nevertheless, the frequency distribution characteristics were accurately estimated by the MLQM-Forecast (Table 2). The correlation coefficient and RMSE values of the frequencies in the MLQM-Forecast dataset were improved over the other methods for all months.
The frequencies of the 95th-percentile OBS data, the precipitation estimated by the ML method with QM (MLQM-Forecast), and the simulated precipitation (SIM-Forecast) during the five years from 2014 to 2018 in January, April, July, and October.
Moreover, our method reasonably improved the spatial distribution patterns of the monthly mean precipitation and 95th-percentile values except for July in MLQM-Forecast (Figs 8 and 9). However, the amount of precipitation in the MLQM-Forecast at each grid point is somewhat different from that of the OBS. The overestimation of precipitation is especially pronounced in July. Additionally, some of the correlation coefficient and RMSE values for the monthly mean and 95th-percentile values of the MLQM-Forecast were lower than those of the SIM-Forecast (Table 2). The testing period (five years) might be too short to accurately evaluate the amount of precipitation because even a few disturbances such as heavy rain bands or typhoons in the evaluation area can have a significant impact on the distributions during the warm season from June to August.
Same as Fig 6 except for 5-year average monthly mean.
Same as Fig 6 except for 95th-percentile precipitation values.
Fig 10 shows the areas of the representative spatial autocorrelation with hourly precipitation greater than 0.7. The area of the OBS and the ML estimated precipitation (MLQM) were the smallest and largest in all months, respectively. The highly correlated areas increased in January and decreased in July, while those in April and October were nearly identical.
The areas of the representative spatial autocorrelation with hourly precipitation greater than 0.7 are shown.
The representative spatial scale of the precipitation systems can be estimated from the areas with higher spatial autocorrelation coefficients. Considering that only the precipitation distribution was used as an explanatory variable in this method, it is suggested that our method estimate the precipitation by recognizing the larger mesoscale precipitation systems with the areas of 2500 to 40000 km2. Meanwhile it is difficult to recognize the (small-scale) precipitation systems with the areas less than 2500 km2 because the relationship between the observations and simulations is unclear.
Numerical models cannot represent (small-scale) precipitation systems such as squall lines due to the nonlinearity (sensitive dependence on initial conditions) of the precipitation processes . Generally, squall lines are formed by the interaction between individual convective systems and ambient wind shear [46–49]. Because this precipitation is highly nonlinear, even a perfect model can produce very different results due to small differences in the initial values. Considering that squall lines are formed by the interaction between individual cumulus convection and ambient wind shear, and then are affected by the ambient wind shear, it is difficult to reproduce the squall lines in numerical models that correspond exactly to the observations because of the sensitivity of the dependence on the initial values . In some cases, the location of precipitation may shift significantly, or the precipitation itself may not form in the simulation. Consequently, the ML method cannot recognize the patterns in squall lines clearly, resulting in increased estimation errors for temporal distributions of hourly precipitation. The NSE index values shown in Fig 6 indicate the difficulty in reproducing short-duration precipitation caused at small scales. Even a one-hour shift in the peak of precipitation caused a large error. There was little improvement, except in January when the winter monsoon is prevalent and orographic precipitation is often formed. Fig 5K shows that the precipitation of 95th percentile values in July was slightly underestimated throughout the area. It has been suggested that a grid size of 1.5 km or less is needed to accurately reproduce heavy summer precipitation . Therefore, the MSM-GPV with a grid space of 5 km may not be sufficient to represent heavy precipitation. 
In cases where the passage of a simulated front is shifted by approximately one hour compared to the observations, the error in the maximum precipitation of each precipitation event will be smaller for SIM than for MLQM and QM, because the heavy precipitation amounts in SIM are underestimated. Because of this underestimation, the NSE index of SIM may be higher than that of MLQM and QM. Therefore, it is difficult to evaluate the overall performance of our method using the NSE index alone.
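The effect described above can be illustrated with a minimal sketch, assuming the standard Nash–Sutcliffe efficiency definition, NSE = 1 − Σ(est − obs)² / Σ(obs − mean(obs))². The hourly series below are hypothetical values chosen for illustration only, not data from this study: a single observed peak, an estimated peak of the correct amplitude displaced by one hour, and an underestimated peak with the same displacement.

```python
import numpy as np

def nse(obs, est):
    """Nash-Sutcliffe efficiency: 1 is a perfect match; values below 0 are worse than the observed mean."""
    obs = np.asarray(obs, dtype=float)
    est = np.asarray(est, dtype=float)
    return 1.0 - np.sum((est - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

# Hypothetical hourly precipitation (mm/h), for illustration only.
obs          = [0, 0, 0, 10, 0, 0, 0, 0]  # observed peak at hour 3
shifted_full = [0, 0, 0, 0, 10, 0, 0, 0]  # correct amplitude, one hour late
shifted_weak = [0, 0, 0, 0, 4, 0, 0, 0]   # underestimated amplitude, one hour late

nse_full = nse(obs, shifted_full)
nse_weak = nse(obs, shifted_weak)
print(f"full amplitude, shifted: {nse_full:.2f}")
print(f"underestimated, shifted: {nse_weak:.2f}")
```

In this toy case the underestimated series scores the higher NSE (about −0.33 versus −1.29), because the penalty for a displaced peak grows with the square of its amplitude.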
Nevertheless, our method significantly improved the long-term performance of the hourly estimations for precipitation. In general, small-scale precipitation systems are often formed in association with larger mesoscale precipitation systems. Moreover, the formation of long-term local precipitation strongly depends on the temporal variability of meso-β- and meso-α-scale precipitation systems associated with large-scale disturbances (e.g., cold fronts, warm fronts, atmospheric blocking, and low-pressure systems) [52–58]. Therefore, our method, which recognizes the pattern of precipitation distribution produced mainly by meso-β- and meso-α-scale precipitation systems rather than by small-scale disturbances such as squall lines, can estimate the approximate temporal variation in the intensity of local precipitation and improve estimations of the spatial distribution of temporal precipitation frequency.
S3 Fig shows that the estimation error increases when the size of the precipitation distribution used as the explanatory variable decreases. The error becomes large because, with a small input area, ML attempts to recognize mainly the movement patterns of small-scale precipitation systems such as squall lines, which are difficult to reproduce with numerical models. Thus, if the optimal input data are not selected, it will be difficult to accurately estimate local precipitation even with advanced machine learning models. If the optimal input data are unknown, we can select a set of data that may be relevant to the observations and simulations, and then determine the relevant elements from the dataset by applying dimensionality reduction. However, the selected dataset may not contain the elements necessary for bias correction. Consequently, the selection of the dataset before applying machine learning greatly influences its performance.
QM does not consider the spatial distribution patterns of precipitation; it estimates precipitation using only the characteristics of the simulation at the corresponding point. Therefore, the temporal variation in precipitation follows that of the simulation (S5 Fig). If the precipitation frequency in the simulation is severely underestimated, it will also be underestimated in QM. Thus, the accumulated precipitation (monthly average precipitation) will also be greatly affected, making it difficult to accurately estimate water resources.
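The point-wise nature of QM can be seen in a minimal sketch of empirical quantile mapping. This is a simplified rank-based variant assumed for illustration, not necessarily the implementation used in this study; the function name `quantile_map` and all values are hypothetical.

```python
import numpy as np

def quantile_map(x, sim_train, obs_train):
    """Map simulated values onto the observed distribution at a single grid point.

    Each value is assigned its rank within the training simulation and is
    replaced by the observed value at the same rank (empirical quantile).
    """
    sim_sorted = np.sort(np.asarray(sim_train, dtype=float))
    n = sim_sorted.size
    # Rank of each value within the simulated training sample, scaled to [0, 1].
    rank = np.searchsorted(sim_sorted, np.asarray(x, dtype=float), side="left")
    p = np.clip(rank / (n - 1), 0.0, 1.0)
    return np.quantile(np.asarray(obs_train, dtype=float), p)

# Hypothetical training samples (mm/h) at one grid point.
sim_train = [0, 0, 0, 1, 2, 4]   # simulation: drier and weaker
obs_train = [0, 0, 1, 3, 6, 12]  # observation: wetter and stronger

sim_series = np.array([0, 0, 4, 1, 0, 2])  # hypothetical simulated time series
qm_series = quantile_map(sim_series, sim_train, obs_train)
print(qm_series)
```

The amounts are rescaled, but every hour that is dry in the simulated series remains dry after mapping, so an underestimated precipitation frequency is carried over unchanged.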
The significant improvement in the accuracy of local precipitation estimation by our method means that the long-term characteristics of local precipitation are mainly formed by precipitation associated with systems at or above the meso-β scale, which can be reproduced by numerical models. Because the quantile mapping method uses the observed precipitation to correct the precipitation estimated by ML, the observed precipitation from squall lines is also reflected in MLQM. This could also contribute to the large errors in the temporal distribution of hourly precipitation in MLQM. Therefore, our method can estimate the long-term spatial distribution characteristics of local precipitation with high accuracy, although it does not necessarily improve the temporal distributions of hourly local precipitation.
If the input elements are simple, we can roughly understand what machine learning is recognizing and clarify the limits of the method. Conversely, the more complex a machine learning method is, the less clear the appropriateness of the input data becomes, and thus, the more difficult it becomes to identify the causes of problems. Hence, the reliability of the method can be greatly reduced. Therefore, reliability could be improved by designing the method so that the relationship between the objective and explanatory variables can be reasonably explained based on meteorological theories and previous studies.
In this study, the simulated precipitation distribution within a 113 km2 area, where the relationship between observation and simulation was clear, was used as the explanatory variable, and the observed precipitation at the center of the area was used as the objective variable for ML. Furthermore, the validity of this bias correction method was examined based on the representative spatial scales of the estimated precipitation. The results showed that our method can significantly reduce the bias in the simulated spatial distribution of hourly precipitation frequency, reflecting the local conditions, although the amount of precipitation and the temporal distribution of hourly precipitation are hard to evaluate, particularly in summer (July). The RMSEs of the spatial distribution of the MLQM and MLQM-Forecast precipitation frequency were reduced by 41%–51% and 22%–36%, respectively, in January, April, July, and October compared to the SIM and SIM-Forecast. The monthly mean precipitation in the MLQM was 50%–70% lower than that in the SIM. However, no reduction was found in the MLQM-Forecast, and precipitation increased in July compared to SIM-Forecast. The overestimation (or underestimation) of precipitation may be because the training period was too short to obtain sufficient samples of heavy rainfall, and because our method cannot handle precipitation that has not been observed in the past. Moreover, our method mainly estimates meso-β-scale and meso-α-scale precipitation systems with spatial scales corresponding to 2500–40000 km2, which contributed significantly to the spatial distributions of hourly precipitation frequencies. Accordingly, this finding shows that our bias correction method can greatly improve performance when the explanatory and objective variables are set and simplified appropriately, so that the relationship between the observations and the simulation becomes clearer.
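As a minimal sketch of how such training pairs could be assembled, the code below extracts a square window of simulated precipitation around one grid point as the explanatory variable and takes the observed value at the window center as the objective variable. The array layout (time, y, x), the function name `build_training_pairs`, the half-width of 10 grid points (a 21 × 21 window, following the example in the S3 Fig caption), and the synthetic random data are all assumptions for illustration.

```python
import numpy as np

def build_training_pairs(sim, obs, j, i, half_width=10):
    """Return (X, y) for the grid point (j, i).

    sim, obs : arrays of shape (time, ny, nx) on the same grid (assumed layout).
    X        : flattened simulated window, shape (time, (2*half_width + 1)**2).
    y        : observed precipitation at the window center, shape (time,).
    """
    h = half_width
    if not (h <= j < sim.shape[1] - h and h <= i < sim.shape[2] - h):
        raise ValueError("window extends beyond the grid")
    window = sim[:, j - h:j + h + 1, i - h:i + h + 1]
    X = window.reshape(sim.shape[0], -1)
    y = obs[:, j, i]
    return X, y

# Synthetic stand-in data: 48 hourly fields on a 40 x 50 grid.
rng = np.random.default_rng(0)
sim = rng.gamma(shape=0.5, scale=2.0, size=(48, 40, 50))
obs = rng.gamma(shape=0.5, scale=2.0, size=(48, 40, 50))

X, y = build_training_pairs(sim, obs, j=20, i=25)
print(X.shape, y.shape)  # (48, 441) (48,)
```

The pair (X, y) could then be passed to a regression model, for example scikit-learn's `SVR`.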
S1 Fig. Detailed overview of the precipitation estimation method.
The observed and simulated precipitation data over 12 years (from 2007 to 2018, excluding the target year) were used as the training dataset. In the MLQM-Forecast, precipitation estimated by the ML method and observed precipitation for the same 12 years except the target year were used to create a classifier, and the 39-hour forecast was applied to the classifier and estimated by the quantile mapping method.
S2 Fig. Separate grid points (blue) used to estimate the optimal hyperparameters.
Topographic data of U.S. Geological Survey (USGS) (http://www.usgs.gov) was used. Made with Natural Earth. Free vector and raster map data @ naturalearthdata.com. (http://www.naturalearthdata.com/about/terms-of-use/).
S3 Fig. Correlation coefficients for the difference in the number of grid points on a side of the area.
A total of 21 grid points means a 21 × 21 grid (0.06 degree) for the size of the area for the simulated precipitation (the explanatory variable).
S4 Fig. Temporal variations in the area-averaged precipitation intensity over a wide area (the bold dashed frame in Fig 1B) in the observed precipitation (OBS) and simulated precipitation (SIM) from 2007 to 2018.
S5 Fig. Temporal variations in the hourly precipitation in January 2015 at a specific grid point.
(A) Observed precipitation (OBS) (grey bars), precipitation estimated by the ML method (ML) (orange line), and precipitation estimated by the ML method with quantile mapping (MLQM) (blue line), (B) simulated precipitation (SIM) (orange line) and precipitation estimated by quantile mapping (QM) (blue line). (C) Location of the grid point (red marker). Topographic data of U.S. Geological Survey (USGS) (http://www.usgs.gov) was used. Made with Natural Earth. Free vector and raster map data @ naturalearthdata.com. (http://www.naturalearthdata.com/about/terms-of-use/).
S1 Table. The number of training data and testing data for validation and forecasting.
The MSM-GPV data were obtained from the archive system (http://apps.diasjp.net/gpv) of the Kitsuregawa laboratories at the Institute of Industrial Science, University of Tokyo. The Radar-AMeDAS data, which were provided by the JMA, are available by contacting the Japan Meteorological Business Support Center (http://www.jmbsc.or.jp/en/contact.html) by emailing the International Service Officer (email@example.com). The official name of the Radar-AMeDAS dataset in Japan is “Kaiseki-Uryou”.
- 1. Wong G, Maraun D, Vrac M, Widmann M, Eden JM, and Kent T. Stochastic model output statistics for bias correcting and downscaling precipitation including extremes. Journal of Climate. 2014;27(18), 6940–6959.
- 2. Sivakumar B. Global climate change and its impacts on water resources planning and management: assessment and challenges. Stochastic Environmental Research and Risk Assessment. 2011; 25(4): 583–600.
- 3. Piao S, Ciais P, Huang Y, Shen Z, Peng S, Li J et al. The impacts of climate change on water resources and agriculture in China. Nature. 2010; 467(7311): 43–51. pmid:20811450
- 4. Van Aalst MK. The impacts of climate change on the risk of natural disasters. Disasters. 2006; 30(1): 5–18. pmid:16512858
- 5. Maraun D, Wetterhall F, Ireson AM, Chandler RE, Kendon EJ, Widmann M, et al. Precipitation downscaling under climate change: Recent developments to bridge the gap between dynamical models and the end user. Reviews of geophysics, 2010;48(3).
- 6. Sachindra DA, Huang F, Barton A, Perera BJC, Statistical downscaling of general circulation model outputs to precipitation—Part 2: Bias-correction and future projections. Int. J. Climatol. 2014;34 (11), 3282–3303. http://dx.doi.org/10.1002/joc.3915.
- 7. Lafon T, Dadson S, Buys G, and Prudhomme C. Bias correction of daily precipitation simulated by a regional climate model: a comparison of methods. International Journal of Climatology. 2013; 33(6): 1367–1381. https://doi.org/10.1002/joc.3518.
- 8. Michelangeli PA, Vrac M, and Loukos H. Probabilistic downscaling approaches: Application to wind cumulative distribution functions. Geophysical Research Letters. 2009; 36: 11. https://doi.org/10.1029/2009GL038401.
- 9. Piani C, Haerter JO. and Coppola E. Statistical bias correction for daily precipitation in regional climate models over Europe. Theor. Appl. Climatol. 2010; 99: 187–192.
- 10. Teutschbein C. and Seibert J. Bias correction of regional climate model simulations for hydrological climate-change impact studies: review and evaluation of different methods. J. Hydrol. 2012; 456: 12–29.
- 11. Li J, Sharma A, Evans J, and Johnson F. Addressing the mischaracterization of extreme rainfall in regional climate model simulations–A synoptic pattern based bias correction approach. Journal of Hydrology, 2018;556, 901–912.
- 12. Maraun D. Bias correcting climate change simulations-a critical review. Current Climate Change Reports, 2016;2(4), 211–220.
- 13. Maraun D, Shepherd TG, Widmann M, Zappa G, Walton D, Gutiérrez JM, et al. Towards process-informed bias correction of climate change simulations. Nature Climate Change. 2017; 7(11): 764–773. https://doi.org/10.1038/nclimate3418
- 14. Vandal T, Kodra E. and Ganguly AR. Intercomparison of machine learning methods for statistical downscaling: the case of daily and extreme precipitation. Theoretical and Applied Climatology. 2019; 137(1): 557–570.
- 15. Ahmed K, Sachindra DA, Shahid S, Iqbal Z, Nawaz N, and Khan N. Multi-model ensemble predictions of precipitation and temperature using machine learning algorithms. Atmospheric Research. 2020; 236: 104806.
- 16. Li H, Sheffield J, and Wood EF. Bias correction of monthly precipitation and temperature fields from Intergovernmental Panel on Climate Change AR4 models using equidistant quantile matching. Journal of Geophysical Research: Atmospheres. 2010; 115(D10).
- 17. Ortiz-García EG, Salcedo-Sanz S, and Casanova-Mateo C. Accurate precipitation prediction with support vector classifiers: A study including novel predictive variables and observational data. Atmospheric research. 2014; 139: 128–136.
- 18. Raje D, and Mujumdar PP. A comparison of three methods for downscaling daily precipitation in the Punjab region. Hydrological Processes. 2011; 25(23): 3575–3589.
- 19. Sachindra DA, Ahmed K, Rashid MM, Shahid S, and Perera BJC. Statistical downscaling of precipitation using machine learning techniques. Atmospheric research. 2018; 212: 240–258. https://doi.org/10.1016/j.atmosres.2018.05.022.
- 20. Whan K, and Schmeits M. Comparing area probability forecasts of (extreme) local precipitation using parametric and machine learning statistical postprocessing methods. Monthly Weather Review. 2018; 146(11): 3651–3673.
- 21. Newton CW. Structure and mechanism of the prefrontal squall line. Journal of Atmospheric Sciences. 1950; 7(3): 210–222
- 22. Hanley KE, Kirshbaum DJ, Roberts NM, and Leoncini G. Sensitivities of a squall line over central Europe in a convective-scale ensemble. Monthly weather review. 2013; 141(1): 112–133.
- 23. Gagne DJ, McGovern A, and Xue M. Machine learning enhancement of storm-scale ensemble probabilistic quantitative precipitation forecasts. Weather and Forecasting. 2014; 29(4): 1024–1043.
- 24. Saito K, Fujita T, Yamada Y, Ishida J, Kumagai Y, Aranami K, et al. The operational JMA non-hydrostatic mesoscale model. Mon. Wea. Rev. 2006; 134: 1266–1298. https://doi.org/10.1175/MWR3120.1
- 25. Roe GH. Orographic precipitation. Annual Review of earth and planetary sciences, 2005;33(1), 645–671.
- 26. Taniguchi K. Future changes in precipitation and water resources for Kanto Region in Japan after application of pseudo global warming method and dynamical downscaling. Journal of Hydrology: Regional Studies, 2016;8, 287–303.
- 27. Ma W, Ishitsuka Y, Takeshima A, Hibino K, Yamazaki D, Yamamoto K, et al. Applicability of a nationwide flood forecasting system for Typhoon Hagibis 2019. Scientific reports, 2021;11(1), 1–12. pmid:33414495
- 28. Makihara Y, Uekiyo N, Tabata A, and Abe Y. Accuracy of radar-AMeDAS precipitation. IEICE Transactions on Communications 1996; 79:751–762.
- 29. Doty B, and Kinter JI. The Grid Analysis and Display System (GrADS): a desktop tool for earth science visualization. In American Geophysical Union 1993 Fall Meeting. 1993;6–10.
- 30. Ishikawa Y, and Koizumi K. Meso-scale Analysis. Outline of the Operational Numerical Weather Prediction at the Japan Meteorological Agency. 2002; 26–31.
- 31. JMA. NWP Application Products. 2019; https://www.jma.go.jp/jma/jma-eng/jma-center/nwp/outline2019-nwp/pdf/outline2019_04.pdf (accessed on 20 August 2021)
- 32. Smola A J, and Schölkopf B. A tutorial on support vector regression. Statistics and computing. 2004; 14: 199–222.
- 33. Fan J, Wang X, Wu L, Zhou H, Zhang F, Yu X, et al. Comparison of Support Vector Machine and Extreme Gradient Boosting for predicting daily global solar radiation using temperature and precipitation in humid subtropical climates: A case study in China. Energy conversion and management. 2018; 164: 102–111. https://doi.org/10.1016/j.enconman.2018.02.087
- 34. Chen H, Chandrasekar V, Cifelli R, and Xie P. A machine learning system for precipitation estimation using satellite and ground radar network observations. IEEE Transactions on Geoscience and Remote Sensing. 2019; 58(2): 982–994.
- 35. Al-Anazi AF, and Gates ID. Support vector regression to predict porosity and permeability: Effect of sample size. Computers & geosciences. 2012; 39: 64–76.
- 36. Cherkassky V, and Ma Y. Practical selection of SVM parameters and noise estimation for SVM regression. Neural networks. 2004; 17(1): 113–126. https://doi.org/10.1016/S0893-6080(03)00169-2 pmid:14690712
- 37. Liu P, Choo KK. R, Wang L, and Huang F. SVM or deep learning? A comparative study on remote sensing image classification. Soft Computing. 2017; 21(23): 7053–7065. https://doi.org/10.1007/s00500-016-2247-2
- 38. Sivapragasam C, Liong SY, and Pasha MFK. Rainfall and runoff forecasting with SSA–SVM approach. Journal of Hydroinformatics. 2001; 3(3): 141–152. https://doi.org/10.2166/hydro.2001.0014
- 39. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine learning in Python. The Journal of machine Learning research. 2011; 12: 2825–2830.
- 40. Smets K, Verdonk B, and Jordaan EM. Evaluation of performance measures for SVR hyperparameter selection. In 2007 International Joint Conference on Neural Networks. IEEE. 2007; 637–642. https://doi.org/10.1109/IJCNN.2007.4371031
- 41. Vapnik V. The nature of statistical learning theory. Springer science & business media, 1999.
- 42. Takano Y. Support vecter machine and Kernel method. Journal of the Operations Research Society of Japan, 2020;65, 304–309.
- 43. Anguita D, Ghio A, Greco N, Oneto L, and Ridella S. Model selection for support vector machines: Advantages and disadvantages of the Machine Learning Theory. The 2010 International Joint Conference on Neural Networks (IJCNN), Barcelona, Spain. 2010; 1–8.
- 44. Bergstra J, and Bengio Y. Random search for hyper-parameter optimization. Journal of machine learning research. 2012; 13(2): 281–305. https://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a
- 45. Tomosugi K, and Ysuji Y. A study on time and space distribution of heavy rainfalls (2). Analysis of correlative structures based on great-sphere data of hourly rainfall.
- 46. Kato T, Structure of the band-shaped precipitation system inducing the heavy rainfall observed over northern Kyushu, Japan on 29 June 1999. J. Meteor. Soc. Japan, 2006; 84: 129–153.
- 47. Bluestein HB, Marx GT, and Jain MH. Formation of mesoscale lines of precipitation: Nonsevere squall lines in Oklahoma during the spring. Monthly Weather Review.1987; 115(11): 2719–2727.
- 48. Browning KA, Frankhauser JC, Chalon JP, Eccles PJ, Strauch RG, Merrem FH, et al. Structure of an evolving hailstorm part V: Synthesis and implications for hail growth and hail suppression. Monthly Weather Review. 1976; 104(5): 603–610.
- 49. Rotunno R, Klemp JB, and Weisman ML. A theory for strong, long-lived squall lines. Journal of Atmospheric Sciences, 1988;45(3), 463–485.
- 50. Lorenz EN. Deterministic nonperiodic flow, J. Atmos. Sci. 1963; 20: 130.
- 51. Kendon EJ, Roberts NM, Fowler HJ, Roberts MJ, Chan SC, and Senior CA. Heavier summer downpours with climate change revealed by weather forecast resolution model. Nature Climate Change, 2014;4(7), 570–576.
- 52. Orlanski I. A rational subdivision of scales for atmospheric processes. Bulletin of the American Meteorological Society. 1975; 527–530.
- 53. Hobbs PV and Persson POG. The mesoscale and microscale structure and organization of clouds and precipitation in midlatitude cyclones. Part V: The substructure of narrow cold-frontal rainbands. Journal of Atmospheric Sciences.1982; 39(2): 280–295.
- 54. Waymire ED, Gupta VK, and Rodriguez‐Iturbe I. A spectral theory of rainfall intensity at the meso-β-scale. Water Resources Research. 1984; 20(10): 1453–1465.
- 55. Ninomiya K, Akiyama T. and Ikawa M. Evolution and fine structure of a long-lived meso-α-scale convective system in Baiu frontal zone Part I: evolution and meso-β-scale characteristics. Journal of the Meteorological Society of Japan. Ser. II. 1988; 66(2): 331–350.
- 56. Ding Y. Summer monsoon rainfalls in China. Journal of the Meteorological Society of Japan. Ser. II. 1992; 70(1B): 373–396.
- 57. Fischer C. and Lalaurette F. meso-β-scale circulations in realistic fronts. II: Frontogenetically forced basic states. Quarterly Journal of the Royal Meteorological Society. 1995; 121(526): 1285–1321.
- 58. Nuissier O. Ducrocq VD. Lebeaupin C. and Anquetin S. A numerical study of three catastrophic precipitating events over southern France. I: Numerical framework and synoptic ingredients. Quarterly Journal of the Royal Meteorological Society: A journal of the atmospheric sciences, applied meteorology and physical oceanography. 2008; 134(630): 111–130.