## Abstract

The pursuit of high-resolution gridded climate data and weather forecasts requires an unprecedented number of *in situ* near-surface observations to model the sub-mesoscale. National meteorological services (NMS) face practical and financial limits on the number of observations they can collect; opening the door to crowdsourced weather initiatives is therefore an interesting option to mitigate data scarcity. In recent years, scientists have made remarkable efforts to assess the quality of crowdsourced collections and to determine how they can add value to the “daily business” of NMS. In this work, we develop and apply a multi-fidelity spatial regression method capable of combining official observations with crowdsourced observations, which enables the creation of high-resolution interpolations of weather variables. The availability of a vast volume of crowdsourced observations also raises the question of the maximum weather complexity that can be modelled with these novel data sources. We include a structured theoretical analysis simulating increasingly complex weather patterns that uses the Shannon-Nyquist limit as a benchmark. Results show that combining official and crowdsourced weather observations pushes the Shannon-Nyquist limit further, indicating that crowdsourced data contribute to monitoring sub-mesoscale weather processes (e.g. urban scales). We believe this effort illustrates the potential of crowdsourced data, not only to expand the current range of products and services at NMS, but also to open the door to high-resolution weather forecasting and monitoring, issuing local early warnings, and advancing towards impact-based analyses.

**Citation: **van Beekvelt D, Garcia-Marti I, de Baar J (2024) Towards high-resolution gridded climatology stemming from the combination of official and crowdsourced weather observations using multi-fidelity methods. PLOS Clim 3(1):
e0000216.
https://doi.org/10.1371/journal.pclm.0000216

**Editor: **Ferdous Ahmed,
IUBAT: International University of Business Agriculture and Technology, MALAYSIA

**Received: **April 23, 2023; **Accepted: **November 13, 2023; **Published: ** January 3, 2024

**Copyright: ** © 2024 van Beekvelt et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Data Availability: **The first, second, and third party data observations used in this research are available through the following public data repository: https://data.4tu.nl/datasets/de341c98-989c-4ada-8e85-4b7621c7f9a7 with permission of the UK Met Office (© British Crown copyright 2011, Met Office) and Rijkswaterstaat (Ministry of Infrastructure and Water Management). DOI number: https://doi.org/10.4121/de341c98-989c-4ada-8e85-4b7621c7f9a7.v1.

**Funding: **The authors received no specific funding for this work.

**Competing interests: ** The authors have declared that no competing interests exist.

## 1 Introduction

In the past three decades, the need for high-quality gridded climate data sets has fuelled a continuous increase of spatial resolution (i.e. grid spacing) of these data sets [1]. Where such data sets were initially typically provided at a ≈ 50 km resolution, current data sets are often provided at ≈ 10 km resolution [2]. However, users of such data sets ask for even higher spatial resolution, and we are currently pushing into the ≈ 1 km resolution [3].

Monitoring the recent history and present state of the local weather is an important aspect of the work of National Meteorological Services (NMS) weather rooms and early warning centres. For example, monitoring temperature and precipitation at a high spatial resolution is an important part of warning for possible adverse road traffic conditions.

At the same time, weather forecasts have steadily improved since the beginning of the 1980s [4]. The inclusion of surface and satellite observations and improvements in data assimilation methods, powered by better computational capabilities, have yielded a substantial global increase in forecast skill [4, 5]. Numerical weather prediction models assimilate a vast volume of observations, enabling forecasts typically at the mesoscale or synoptic scale, resolutions adequate for monitoring large phenomena. Nevertheless, modelling sub-mesoscale processes (e.g. urban or neighborhood scales) would require an unprecedented number of surface observations that NMS might be unable to provide.

Parallel to the improvement of the numerical weather prediction models has been the advent of new technological and scientific advances that have changed the way surface observations are acquired. The appearance, consolidation, and current ubiquity of wireless networks, coupled with decreasing hardware prices, implies that today the acquisition of surface observations is possible practically anywhere on Earth [6]. These favorable conditions prompted the organic (or commercial) creation of new observational networks in which participants install personal weather stations (PWS) in their available spaces (e.g. home, schools, urban parks) and start measuring the weather collaboratively.

The Royal Netherlands Meteorological Institute (KNMI) has been an active participant of such crowdsourced initiatives. In 2015, KNMI became a partner of the Weather Observations Website (WOW) initiative, a global monitoring project conceived to provide a cloud-based platform where users can share their weather observations [7]. Currently, more than 30,000 users worldwide contribute to the WOW platform, yielding more than one million observations per day [8]. Focusing on the Dutch WOW (hereinafter: WOW-NL, http://wow.knmi.nl), the figures remain remarkable: around 1,000 stations monitor the weather in the Netherlands, yielding more than 300 million observations in nine years.

The availability of crowdsourced data collections (e.g. WOW, Netatmo) has motivated buzzing activity around this topic among European NMS researchers in recent years. Substantial efforts have been dedicated to determining the quality of the weather measurements [9–11], and some of these quality assurance procedures have turned into software packages used in research [12, 13]. Some of these research lines investigate the potential of crowdsourced weather data to study small-scale weather patterns, such as urban wind [14] or tracking the movement of storm systems [15]. These efforts have been picked up by large international organizations, such as the European Meteorological Network (EUMETNET) [16] and the European Centre for Medium-range Weather Forecasts (ECMWF), which have expressed interest in continuing to promote the usage of these novel data sources.

We join this effort by illustrating the potential of crowdsourced weather data beyond weather forecasting. We hypothesize that these (near) real-time observations have high potential to contribute to creating high-resolution interpolations and issuing local early warnings during severe weather conditions. In this work, we describe how we applied a data-driven multi-fidelity spatial regression method to create high-resolution interpolations for the Netherlands. This method is able to combine official observations from the KNMI network with WOW-NL observations. We provide a structured analysis simulating complex weather patterns that illustrates the substantial contribution crowdsourced data can offer to monitoring sub-mesoscale processes. In addition, we use the Shannon-Nyquist theorem [17] to assess how much the inclusion of WOW-NL data increases our theoretical skill at observing patterns with a higher spatial frequency (e.g. rainfall, wind). We expect that predictions made using only the official KNMI data will perform well if the spatial frequency is low, but will be less accurate when the spatial frequency is high. The Shannon-Nyquist theorem states that the more oscillations there are, the more measurement points are needed to identify them [18]. Therefore, we expect that adding more measurement points, such as the WOW stations, to our prediction will push back the Shannon-Nyquist limit, resulting in the ability to observe weather patterns of higher spatial frequency (i.e. more oscillations). This would imply that by using sources additional to KNMI we will be able to make more accurate predictions for weather phenomena with high spatial variability.

We hope these results will motivate NMS to include crowdsourced data to increase the resolution of their official products and services.

## 2 Data

### 2.1 Real-world data

KNMI operates the network of automatic weather stations (AWS) measuring the weather in the Netherlands. This network is composed of land and sea stations (e.g. in the North Sea) that are mainly sited in rural or unpopulated areas, hence complying with the World Meteorological Organization (WMO) guidelines and recommendations. These official stations comprise a set of professional-grade instruments which are regularly maintained and calibrated to ensure the best possible measurements. AWS measure a wide array of weather parameters (e.g. air temperature, precipitation, wind speed) every few seconds. These observations are sent to KNMI for their subsequent inclusion in the fundamental weather products and services. The spatial distribution of the network is such that it balances financial considerations with good coverage for large-scale phenomena, but this implies that large local regions remain unobserved [19]. We refer to these high-quality observations as ‘first-party data’ (i.e. 1PD).

Nevertheless, KNMI is not the only public organization capable of deploying sensor networks to measure weather conditions. One example is the Directorate-General for Public Works and Water Management (Rijkswaterstaat), which maintains a network of weather stations parallel to the road network (GMS, ‘Slipperiness Reporting System’ in English), so that timely measurements can be acquired in the event that severe weather conditions compromise road safety. There are over 300 stations in this network, collecting basic weather parameters (i.e. temperature, precipitation, humidity). These instruments might not be as regularly maintained and calibrated as the official stations, but they tend to be reliable sources of weather observations. We refer to these good observations provided by other trusted organizations as ‘second-party data’ (i.e. 2PD).

However, 1PD and 2PD networks are spatially constrained, since monitoring stations must be located in unpopulated places (e.g. rural areas, highways) to ensure the instruments are not disturbed by local factors (e.g. radiative effects). This implies that these networks are unable to acquire measurements in the urban environment, where most of the population lives, and therefore local effects (e.g. canyon layer urban heat island) are poorly monitored. This spatial sparsity also limits the resolution at which NMS can offer products and services, which motivates the inclusion of weather data provided by alternative networks.

In 2015, KNMI joined the Weather Observations Website (WOW) project as a partner, a global initiative promoted by the UK Met Office and intended to collect weather observations measured by the general public. Weather enthusiasts can install personal weather stations (PWS) in their private (e.g. at home) or public (e.g. schools, parks) spaces. PWS registered to this global project take weather measurements that are subsequently sent to and stored in the WOW repository. The Dutch WOW (hereinafter: WOW-NL) currently comprises more than 1,000 stations that have collected over 300 million observations in less than a decade. This is a vast volume of high-resolution observations that could potentially help complement the official products and services. We refer to these crowdsourced weather observations provided by the general public as ‘third-party data’ (i.e. 3PD).

Data quality of 3PD collections is often the main concern when it comes to integrating these observations into any research or operational chain at NMS [20]. In past years, substantial efforts have been dedicated to the quality assurance of 3PD. Researchers at KNMI implemented a modification of an existing quality control (QC) procedure for air temperature, and developed two QCs for wind speed [11] and precipitation [10, 21]. In general, results show that after a QC procedure intended to filter outliers and correct biases, 3PD collections seem to be of sufficient quality to be included in subsequent services, in line with results obtained by other researchers working with 3PD (i.e. [12–14, 22]).

In this research we focus on temperature measurements from the 25th of January 2019 from KNMI (1PD), the Directorate-General for Public Works and Water Management (Rijkswaterstaat, 2PD) and WOW-NL (3PD). We selected this date because a ‘Code Orange’ severe weather impact warning was issued for slippery roads due to ice formation. After the QC from Napoly et al. [9] was applied, there were 35 1PD stations, 319 2PD stations and 409 3PD stations yielding observations. The distributions of these stations are shown in Fig 1: the 1PD stations are evenly distributed over the country, the 2PD stations follow the road network, and the 3PD stations are located throughout the country but tend to cluster around urban and peri-urban environments.

This figure shows the locations of the stations on the 25th of January 2019 that were used in this research. The first panel shows the locations of the 1PD stations, the second the 2PD stations, and the third the 3PD stations. The base map is public domain and is available via: https://www.naturalearthdata.com/.

These data will be used to make a temperature prediction for the entire country. To make this prediction, we divide the Netherlands into a grid, as shown in Fig 1, and estimate a temperature for each grid point.

We choose to work with temperature data because a QC is available for it, and because temperature is quite homogeneous in space and time, which makes it easier to test the method. The robustness of the method will be tested on synthetic data, introduced in subsection 2.2.

### 2.2 Synthetic data

In Section 2.1 we provide a detailed description of the data sets used in this research. We consider that the weather measurements collected by the official KNMI network have reduced measurement errors, since the weather stations are regularly maintained and calibrated. However, measurement error might be larger for the 2PD and 3PD monitoring networks (especially for the latter), where error is defined as the difference between the reading of the low-fidelity sensor in its current set-up and a hypothetical high-fidelity sensor set-up at that same location. These errors make it challenging to get a clear picture of how accurate the proposed model is. In addition, the high density of the combined networks (i.e. 123PD) suggests that, potentially, very complex weather patterns (i.e. quickly changing along the spatial dimensions) can be modelled, which would illustrate the robustness of the approach. Moreover, an important limitation of real-world data is that we do not have a ‘true map’ reference against which to validate our interpolated grid.

The strategy to overcome these limitations is to use synthetic data that simulates complex weather fields, which we subsequently model. Each station of the 123PD network is assigned a synthetic temperature value and measurement error. The model runs with these synthetic measurements, and the resulting gridded maps can be compared on a per-point basis, hence enabling an assessment of how well the model performs. This scheme is repeated for several simulated weather patterns with different spatial variability (Fig 2). In this way, we also show that the approach is robust.

Maps of the synthetic data generated by using Eq (1) for various values of *N*. The base map is public domain and is available via: https://www.naturalearthdata.com/.

The synthetic data looks like a wavy lattice; it is defined precisely in the next paragraph. We use a parameter *N* that defines the wavelength of those waves, i.e. how quickly the temperature changes over a given distance. Defining the synthetic data in this way makes it easy to see how the model behaves for weather conditions of different spatial variability.

The synthetic data is defined as follows. Let *ξ*^{s} and *ξ*^{g} denote the station and grid point locations, respectively, given by their longitude *ξ*_{1} and latitude *ξ*_{2}, and let **y**^{s} and **y**^{g} denote the temperatures generated for the stations and the grid points, respectively, defined as

(1) **y**(*ξ*) = *a* sin(2*πN*(*ξ*_{1} + *ψ*)) sin(2*πN*(*ξ*_{2} + *χ*))

*ψ* and *χ* are random numbers with 0 ≤ *ψ* ≤ 1 and 0 ≤ *χ* ≤ 1. These random numbers shift the temperature field slightly, making the pattern more realistic. The resulting **y** is a temperature field that follows a lattice pattern, where the size of the lattice cells depends on *N*, a parameter that determines how many oscillations per degree longitude and latitude there are. Furthermore, we scale the field by an amplitude *a*, determined by inspecting the amplitude of real measured temperature values. In addition, a synthetic error *ϵ* is added to **y**^{s}; *ϵ* consists of a systematic error value (or ‘bias’) and a random error value (or ‘noise’) for the second and third party stations. These factors, which are arbitrary to some extent, are in our set-up determined by running the kriging procedure described in Section 3 with data from the 25th of January and observing what values were predicted for the bias and noise. For the bias, −3.22°C and −2.46°C were added to the second and third party stations, respectively, with 0.57°C and 1.62°C for the noise.
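As an illustration, the following minimal Python sketch generates such a synthetic lattice field, assuming a product of sinusoids with *N* oscillations per degree (the function name, station coordinates, and amplitude are hypothetical; the 3PD bias and noise values are those quoted above):

```python
import numpy as np

def lattice_field(lon, lat, N, a=5.0, psi=0.3, chi=0.7):
    """Lattice-like temperature field with N oscillations per degree,
    shifted by the random offsets psi and chi (a sketch of Eq 1)."""
    return a * np.sin(2 * np.pi * N * (lon + psi)) * np.sin(2 * np.pi * N * (lat + chi))

rng = np.random.default_rng(0)
# hypothetical station coordinates inside the Dutch bounding box
lon = rng.uniform(3.3, 7.2, size=400)
lat = rng.uniform(50.7, 53.5, size=400)
y_true = lattice_field(lon, lat, N=1.5)

# synthetic error for 3PD stations: bias and noise values reported in the text
bias_3pd, noise_3pd = -2.46, 1.62
y_obs = y_true + bias_3pd + rng.normal(0.0, noise_3pd, size=lon.size)
```

Repeating this for increasing *N*, and comparing the model output against `y_true` on the grid, reproduces the per-point validation scheme described above.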

The resulting temperature fields **y**^{g} are shown for various values of *N* in Fig 2. Real temperature data has low spatial variability, so it is expected to resemble Fig 2A.

The goal of using synthetic data is to show that for weather patterns with low spatial variability, the 1PD stations alone suffice to make accurate predictions (i.e. spatial regression), but that when the weather pattern has higher spatial variability, using 2PD and 3PD data is beneficial. To use all of these data for making a prediction, we need a method that can predict values at new locations given a set of known data points. In the next section we introduce such a method.

## 3 Methodology

In this paper we present and motivate an analytical framework for fusion of multi-fidelity multi-source data. Our approach is based on the idea that data is only complete when it is accompanied by a quantified indication of measurement error [23–25]. Because this measurement error is often not specified for all data sources, we propose to construct a simplified model of the measurement procedure, and learn the parameters of this measurement procedure model from the data set. We find that Bayesian Data Assimilation [26] offers a natural way of modeling this measurement process, because a model of the measurement procedure is an intrinsic part of the Bayesian updating process—formalized in the ‘likelihood’. In this way, our solution for dealing with bias and noise results in a natural way of fusing multi-fidelity multi-source data.

Kriging [27–29] is a probabilistic regression method that makes use of local weighting and the statistical properties of the known data points. Because the covariance and correlation between any two points are known, it is possible to compute an error map for the entire surface, which allows us to quantify the accuracy of the prediction of the weather variable of interest. In a Bayesian framework, these statistical properties also allow kriging to work with data containing bias and noise, because the measurement error model coefficients can be estimated with maximum likelihood estimators when they are unknown [26, 30, 31]. As an alternative to maximum likelihood estimation, a resampling procedure like cross-validation may be used [32].

Kriging is closely related to Gaussian process (GP) regression [33]. In this context, a GP is a stochastic process that consists of a (possibly infinite) collection of random variables, such that every finite collection of those random variables follows a multivariate Gaussian distribution, which can be conditioned on the observed data [34, 35].

The goal of a GP method, or equivalently kriging in a Bayesian framework, is to find the distribution of this collection of random variables by learning from training data *D* and predicting the data *Y* by modelling the distribution *P*(*Y*|*D*) as a multivariate Gaussian distribution. The process is visualised in Fig 3: samples are first drawn from the prior distribution; then the observations are taken into account, samples that are not in line with the observations are removed, and a posterior distribution is computed.

In Fig 3 we show an illustration of the methodology described in Section 3. Fig 3A shows the prior realisation of 8 ensemble members, which have not been conditioned on the data. Fig 3B shows the realisation of ensemble members, which have now been conditioned on the high-fidelity data (green dots), but not yet on the low-fidelity data (gray dots). Fig 3C illustrates the corresponding power spectra, with the reference (or ‘true’) spectrum in black, and the ensemble spectra in corresponding blue. Note that the ensemble spectra still deviate quite significantly from the reference. Fig 3D shows the realisation of ensemble members, which have now been conditioned on the high-fidelity data (green dots) as well as on the low-fidelity data (gray dots), but without a noise model. Fig 3E illustrates the corresponding power spectra. Note that the ensemble spectra still deviate quite significantly from the reference and show high-frequency noise. Fig 3F shows the realisation of ensemble members, which have now been conditioned on the high-fidelity data (green dots) as well as on the low-fidelity data (gray dots), now with a noise model. Fig 3G illustrates the corresponding power spectra. Note that the ensemble spectra now deviate much less from the reference and that the high-frequency noise has effectively been removed.

In Fig 3, the ensemble members illustrate that including a noise model in the likelihood is essential when we aim to condition on a combination of high- and low-fidelity observations. Without noise treatment, the power spectrum deteriorates; only after including a noise model, which acts similarly to a low-pass filter, do the ensemble members match the spectrum of the reference process that was used to generate this synthetic data.
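This low-pass effect can be reproduced in a small one-dimensional sketch (all numbers are illustrative, not the paper's configuration): conditioning noisy observations with **R** ≈ 0 forces near-exact interpolation and chases the noise, while a nonzero noise model smooths it away.

```python
import numpy as np

def rbf(xa, xb, length=0.3):
    """Squared-exponential covariance between two sets of 1-D points."""
    return np.exp(-((xa[:, None] - xb[None, :]) / length) ** 2)

rng = np.random.default_rng(1)
x_grid = np.linspace(0.0, 1.0, 200)
x_obs = np.sort(rng.uniform(0.0, 1.0, 40))
y_obs = np.sin(2 * np.pi * x_obs) + rng.normal(0.0, 0.3, x_obs.size)  # noisy readings

K_obs = rbf(x_obs, x_obs)
K_cross = rbf(x_grid, x_obs)

def posterior_mean(noise_var):
    """Posterior mean conditioned with noise variance noise_var on the diagonal of R."""
    R = noise_var * np.eye(x_obs.size)
    return K_cross @ np.linalg.solve(K_obs + R, y_obs)

reference = np.sin(2 * np.pi * x_grid)
rmse_exact = np.sqrt(np.mean((posterior_mean(1e-6) - reference) ** 2))     # R ~ 0
rmse_noise = np.sqrt(np.mean((posterior_mean(0.3 ** 2) - reference) ** 2)) # noise model
```

With the noise model, `rmse_noise` comes out clearly below `rmse_exact`: the posterior mean no longer interpolates the noise, mirroring the spectral behaviour in Fig 3.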

In Bayesian kriging, we define a ‘prior’, which is a description of the process under consideration before including the data. Then, we define a ‘likelihood’, which is a description (or approximate model) of the measurement procedure. Now, in the updating step, we introduce the observed data, which results in a ‘posterior’ distribution for the process under consideration. In this work, we focus on the posterior mean (the ‘map’) and the posterior variance (the ‘uncertainty map’).

For this project a modified version of kriging was used, which employs local error estimates (Kriging LE) [36]. In this version of kriging, bias and noise are incorporated in the computations through the likelihood. However, before discussing this modified version, we first describe standard simple kriging.

### 3.1 Bayesian spatial regression without bias and noise

Kriging is a non-parametric approach for estimating an unknown *n* × 1 spatial process **X**. In our case, this vector **X** contains the ‘interpolated’ values at the *n* grid points of our map. Kriging aims to estimate a distribution over all possible function realisations that fit the observed data. It defines the prior on **X** as a GP [26, 34, 35, 37, 38]:

(2) **X** ∼ 𝒩(*μ*, **P**)
Here we have the *n* × 1 prior drift *μ*. In our case, when there is no station bias, *μ* is constant; in the case of station bias, *μ* will be exploited to include the bias model (see next subsection). In addition, we have the *n* × *n* prior process covariance matrix **P**. In our case, **P** represents the expected spatial smoothness of the quantity of interest, before conditioning on the observations.

We then assume the normal likelihood for the observations [26]:
(3) **y** | **X** ∼ 𝒩(**HX**, **R**)
The likelihood can be interpreted as a model of the observation process [26]. Here, the *N* × *n* matrix **H** is an observation matrix. In our case, the observation matrix **H** indicates where the *N* measurement points are located relative to the *n* grid points for which we want to predict a value; however, **H** can also define a measurement procedure that averages the quantity of interest over an interval in time or space, or any other linear operation that describes the measurement procedure (e.g. a partial derivative operator [39]). In addition, we have the *N* × *N* observation error covariance matrix **R**, also known as the noise covariance matrix. In our case, we assume uncorrelated observation errors, such that **R** only has entries on the diagonal. When no noise is assumed, we have **R** = 0, which results in ‘exact interpolation’ (i.e. the interpolated values exactly match the observed station values). Note that in this subsection, where we assume no noise, we write **R** = 0, although for numerical reasons we implement a regularization **R** = *ε*^{2}**I**, with *ε*^{2} in the order of machine precision [39].

The *N* station observations are introduced as the *N* × 1 observation vector **y**. After conditioning on the observed data **y**, the posterior is defined by [26]:
(4) **X** | **y** ∼ 𝒩(*μ*^{post}, **P**^{post})
where the *n* × 1 posterior mean is:
(5) *μ*^{post} = *μ* + **K**(**y** − **H***μ*)
and the *n* × *n* posterior covariance is:
(6) **P**^{post} = (**I** − **KH**)**P**
where we have introduced the Kalman gain:
(7) **K** = **PH**^{T}(**HPH**^{T} + **R**)^{−1}
In our case, the posterior mean represents the expected value of the ‘interpolated’ map, while the square root of the diagonal of the posterior covariance represents the estimated local uncertainty of the map.
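The update of Eqs (4)–(7) is compact enough to sketch directly. The following Python snippet is an illustration, not the paper's implementation; the grid, kernel, and observation values are made up:

```python
import numpy as np

def kriging_posterior(mu, P, H, R, y):
    """Posterior mean and covariance of Eqs (5)-(7):
    K = P H^T (H P H^T + R)^-1, mu_post = mu + K (y - H mu), P_post = (I - K H) P."""
    S = H @ P @ H.T + R
    K = np.linalg.solve(S.T, (P @ H.T).T).T   # K = P H^T S^-1 without explicit inverse
    mu_post = mu + K @ (y - H @ mu)
    P_post = (np.eye(P.shape[0]) - K @ H) @ P
    return mu_post, P_post

# toy example: 5 grid points, 2 of them observed (nearly) noise-free
x = np.linspace(0.0, 1.0, 5)
P = np.exp(-((x[:, None] - x[None, :]) / 0.5) ** 2)   # prior covariance
mu = np.zeros(5)
H = np.zeros((2, 5)); H[0, 0] = 1.0; H[1, 3] = 1.0    # observe grid points 0 and 3
R = 1e-8 * np.eye(2)                                   # numerical regularization
y = np.array([1.0, -0.5])

mu_post, P_post = kriging_posterior(mu, P, H, R, y)
```

With **R** near zero, the posterior mean reproduces the observed values at the observed grid points (exact interpolation), and the posterior variance collapses there while remaining at the prior level far from the observations.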

The components of the covariance matrix **P** are modeled by **P**_{i,j} = *κ*(**x**_{i}, **x**_{j}), where *κ* is a positive definite kernel. We require *κ* to be positive definite to ensure that **P** is positive (semi-)definite [34]. There are several possible kernels; here we use the kernel defined as follows:

(8) **P**_{i,j} = *κ*(**x**_{i}, **x**_{j}) = *σ*^{2}*ψ*_{i,j}
where *σ*^{2} is the variance of **y** and *ψ*_{i,j} is the basis function describing the correlation between locations *i* and *j*:

(9) *ψ*_{i,j} = exp(−∑_{k=1}^{d} *θ*_{k}|*h*_{k}|^{2})

with *d* the number of dimensions and *h*_{k} the spatial distance between the two locations along dimension *k*. The hyperparameter *θ*_{k} is a spatial parameter that indicates how quickly the function changes as location *j* moves closer to or further away from location *i* [40, 41].

To estimate *θ*, we use a maximum likelihood estimator (MLE), which is equivalent to minimising [42–44]

(10) *N* ln(*σ̂*^{2}) + ln|**Ψ**|

with *σ̂*^{2} = **y**^{T}**Ψ**^{−1}**y**/*N*. The matrix **Ψ**, with entries *ψ*_{i,j}, can be interpreted as the matrix of correlations between the sample data.
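A compact sketch of this estimation in one dimension follows; the kernel, toy data, and grid search are illustrative (a practical implementation would use a numerical optimiser rather than a grid):

```python
import numpy as np

def correlation_matrix(x, theta):
    """Psi with entries psi_ij = exp(-theta * h_ij^2), i.e. Eq (9) for d = 1."""
    h = x[:, None] - x[None, :]
    return np.exp(-theta * h ** 2)

def concentrated_nll(theta, x, y):
    """Concentrated negative log-likelihood of Eq (10): N ln(sigma^2) + ln|Psi|."""
    n = x.size
    Psi = correlation_matrix(x, theta) + 1e-8 * np.eye(n)  # jitter for conditioning
    sigma2 = y @ np.linalg.solve(Psi, y) / n
    sign, logdet = np.linalg.slogdet(Psi)
    return n * np.log(sigma2) + logdet

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0.0, 1.0, 30))
y = np.sin(4.0 * x) + rng.normal(0.0, 0.05, 30)   # toy observations

thetas = np.logspace(0.0, 3.0, 100)                # grid search over theta
nll = [concentrated_nll(t, x, y) for t in thetas]
theta_hat = thetas[int(np.argmin(nll))]
```

The small diagonal jitter plays the same role as the *ε*^{2}**I** regularization mentioned in Section 3.1: it keeps **Ψ** numerically positive definite when the correlations are strong.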

### 3.2 Bayesian spatial regression with bias and noise

However, all observed data contain bias and noise, so a method is needed that can deal with both. We therefore introduce Kriging LE, a version of simple kriging equipped to handle them. In the Bayesian framework, the bias and noise, as intrinsic parts of the measurement procedure, can be included in a natural way in the likelihood [3, 26, 36, 39, 43, 45]:

(11) **y** | **X** ∼ 𝒩(**HX** + **B***β*_{B}, **R**), with diag(**R**) = **N***β*_{N}

In Eq (11), **B***β*_{B} is a linear model for the bias, containing the *N* × *b* bias budget **B** and the *b* × 1 coefficients *β*_{B}. For example, the bias budget can contain *b* sources of bias, like the type of measurement device, indicating that one type of device might lead to a different bias than another. However, other indicators can also be included in the bias budget. The corresponding coefficients *β*_{B} are estimated by including them in the MLE.

In a similar way, in Eq (11), **N***β*_{N} is a linear model for the variance of the observational noise. Here, we exploit the fact that in the present approach we can provide individual noise levels for the individual stations (and/or types of station), through the corresponding elements of the diagonal of **R** [39]. Again, the noise budget **N** can contain the type of measurement device, but can also contain other proxies for the noise level of the data. The corresponding coefficients *β*_{N} are likewise estimated by including them in the MLE.

Note that estimating hyperparameters like the bias and noise coefficients from an MLE requires us to apply an inflation factor to the posterior covariance [39].

It is worth mentioning that the regression result is robust against estimating the noise from an MLE. As an illustration, Fig 4 shows how, for a different number of low-fidelity points in the example of Fig 3, the relative RMS prediction error has a robust minimum for a range of estimated noise levels. In addition, we observe from this illustration that the robustness increases when we have more low-fidelity data points. In other words, for the resulting map it is not essential that we find very accurate values for *β*_{N}.

The accuracy of the interpolation depends on the true noise level and the estimated noise level. However, as can be seen from this illustration, the regression error is robust against changes in the estimated noise level.

Providing a bias and noise budget allows us to differentiate between the various data sources. In addition, although we do not use this in the present work, it is possible to include other proxies in the bias or noise budget. For example, it is possible to include population density as a proxy for observational noise, as for example in [45]. This, in essence, is how the Bayesian likelihood, as a model of the measurement procedure, allows us to have an analytical framework for multi-fidelity multi-source data fusion.
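The following one-dimensional sketch illustrates how such budgets enter the computation. It uses two source types and treats the bias and noise coefficients as known (hypothetical) values rather than MLE estimates; the grid, kernel, and station layout are all made up:

```python
import numpy as np

rng = np.random.default_rng(3)
n_grid, n_obs = 50, 12
x_grid = np.linspace(0.0, 1.0, n_grid)
idx = rng.choice(n_grid, size=n_obs, replace=False)

H = np.zeros((n_obs, n_grid))
H[np.arange(n_obs), idx] = 1.0                     # point observations (Eq 3)

low_fid = np.arange(n_obs) >= 4                    # stations 4..11 are low-fidelity
B = low_fid[:, None].astype(float)                 # bias budget: one indicator column
beta_B = np.array([-2.46])                         # e.g. the 3PD bias from Section 2.2

N_budget = np.stack([~low_fid, low_fid], axis=1).astype(float)
beta_N = np.array([1e-6, 1.62 ** 2])               # per-type noise variances
R = np.diag(N_budget @ beta_N)                     # diag(R) = N beta_N (Eq 11)

P = np.exp(-((x_grid[:, None] - x_grid[None, :]) / 0.2) ** 2)
truth = np.sin(2.0 * np.pi * x_grid)
y = H @ truth + B @ beta_B + rng.normal(0.0, np.sqrt(np.diag(R)))

# posterior mean with and without subtracting the modelled bias (Eqs 5 and 7, zero drift)
K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
m = K @ (y - B @ beta_B)
m_naive = K @ y                                    # ignores the bias model
rmse = np.sqrt(np.mean((m - truth) ** 2))
rmse_naive = np.sqrt(np.mean((m_naive - truth) ** 2))
```

Correcting for the modelled bias (`m` versus `m_naive`) visibly improves the reconstruction, while the per-type entries of diag(**R**) down-weight the noisy low-fidelity stations relative to the high-fidelity ones.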

### 3.3 Reliability of the uncertainty estimate

In our resulting grids, we present an expected value (posterior mean) and predicted uncertainty (square root of the diagonal of the posterior covariance). It should be noted that both the expected value and the predicted uncertainty are only estimates.

Although the posterior mean does often show good performance, the kriging covariance often lacks reliability, inviting various suggested improvements [46–48]. This issue is becoming more important, as the quantification of uncertainty in observations and derived results is nowadays a central theme in science [23, 24]. Two important reasons for the relatively poor performance of the kriging covariance are: (i) the possibly inaccurate assumption that the variogram is constant throughout the domain [23, 49] and (ii) the sensitivity of the kriging variance to changes in the hyperparameters [50]. Improving the reliability of the uncertainty estimate, as derived from the posterior covariance, is an ongoing topic of investigation, see for example [45].

## 4 Results

### 4.1 Results for synthetic data

Recall that we defined synthetic data for the stations and grid points in order to illustrate the robustness of our approach. We expect the predictions using 1PD, 12PD and 123PD to all perform well if the number of oscillations *N* is low, but we expect 12PD and 123PD to outperform 1PD when the number of oscillations increases, because of the Shannon-Nyquist theorem [18]. To verify these hypotheses, we computed the fitting performance as a function of weather complexity. The results are shown in Fig 5. For each combination of parties the model was run five times, each time with a different small shift in the synthetic data. The figure confirms our hypotheses: for a low value of *N*, all combinations perform similarly. However, as *N* increases, clear differences between the performances become visible. 1PD reaches its Shannon-Nyquist limit relatively quickly, while the other combinations still perform reasonably well. Adding more data pushes the Shannon-Nyquist limit back further, which implies that using more data increases the accuracy of the predictions when there is high spatial variability.

This figure shows the RMSE values for different combinations of 1PD, 2PD and 3PD for various levels of spatial variability of the data. For each combination in the figure, the RMSE is computed five times, each time with a different random noise added to the synthetic data. It shows that for low spatial variability all combinations perform equally well, but beyond approximately 1.5 oscillations per degree 1PD performs significantly worse than the others.
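The Shannon-Nyquist behaviour described above can be illustrated with a hypothetical one-dimensional analogue of the experiment: a sinusoidal field with a given number of oscillations is sampled at a fixed number of stations and interpolated back to a fine grid. This is a simplified sketch with illustrative names and values; the paper's experiment is two-dimensional and uses the multi-fidelity regression rather than linear interpolation.

```python
import numpy as np

def interp_rmse(n_stations, n_osc, n_test=1000, seed=0):
    """RMSE of linearly interpolating a field with n_osc oscillations
    per unit length, sampled at n_stations points.

    Hypothetical 1-D stand-in for the 2-D synthetic experiment; the
    random phase shift mimics the small shifts used for the 5 runs."""
    rng = np.random.default_rng(seed)
    shift = rng.uniform(0.0, 1.0)
    field = lambda x: np.sin(2.0 * np.pi * n_osc * (x + shift))
    x_s = np.linspace(0.0, 1.0, n_stations)   # station locations
    x_t = np.linspace(0.0, 1.0, n_test)       # fine evaluation grid
    pred = np.interp(x_t, x_s, field(x_s))    # interpolate station samples
    return np.sqrt(np.mean((pred - field(x_t)) ** 2))

# Nyquist: n_stations samples over the unit interval resolve at most
# (n_stations - 1) / 2 oscillations; beyond that, the field aliases.
sparse = interp_rmse(n_stations=10, n_osc=8)  # beyond the Nyquist limit
dense = interp_rmse(n_stations=50, n_osc=8)   # well below it
```

With the same field complexity, the dense network recovers the pattern while the sparse one aliases it, mirroring the gap between 1PD and the multi-party combinations in Fig 5.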

It is also notable that although 13PD has more measurements available than 12PD, it performs slightly worse than 12PD between *N* ≈ 1 and *N* ≈ 8. This is because 2PD measurements are more accurate than 3PD measurements, and for low *N* the quality of the measurements matters more than their quantity.

The findings of Fig 5 are supported by the resulting prediction maps. Fig 6 shows the results for *N* = 1.5: Fig 6A shows the true synthetic temperature field, and the other three maps show the predictions made using 1PD, 12PD and 123PD respectively. Fig 7 shows the uncertainty of these three predictions. The pattern predicted using only 1PD (Fig 6B) resembles the true temperature pattern quite well, especially in areas with more stations. However, in areas where the station density is lower, such as the northeast of the country, there are large differences between the prediction and the true field. At the point 53°N, 6.5°E, the 1PD prediction is unable to capture the sharp edges of the true field; considering that same point in the next two maps, it is clear that 12PD predicts it better and 123PD predicts it best. The corresponding decrease in uncertainty is visualised in Fig 7. In the first map, the uncertainty is lowest in areas where two or three stations lie close together and highest where the station density is low. Adding the 2PD and 3PD stations increases the station density in most areas, which reduces the uncertainty; this is clearly visible, because the dark green areas of the first map no longer appear in the second and third maps. The predictions made by 12PD and 123PD are very similar for this *N*, which is reflected in their similar average uncertainty. Note that we can compute this average uncertainty to three decimal places because we have assumed that our 1PD does not contain any noise.
This similarity in performance is what we expected, since Fig 5 shows that for *N* = 1.5 the RMSEs of 12PD and 123PD are close to each other.

Fig 6A shows the true synthetic temperature field, and the following panels the predictions made using 1PD, 12PD and 123PD respectively. The locations of first-party stations are indicated with a black square, second-party stations with a white circle and third-party stations with a white triangle. These maps show that for a low *N*, 1PD, 12PD and 123PD are all able to make a prediction that resembles the true temperature. However, they also show that the more parties are used, the more accurate the prediction becomes; this can clearly be seen in the north of the country. The base map is public domain and is available via: https://www.naturalearthdata.com/.

The figures show the uncertainty of the predictions made using 1PD, 12PD and 123PD from Fig 6 respectively. The first map contains some large dark green areas of high uncertainty, where the station density is very low. When we add 2PD, these dark green areas disappear and the overall uncertainty decreases considerably, by almost 1.4°C. Adding 3PD decreases the uncertainty further, but only by a small amount. The base map is public domain and is available via: https://www.naturalearthdata.com/.

An example that shows the potential of using 2PD and 3PD even more clearly is the case of *N* = 7. The resulting predictions are shown in Fig 8 and their uncertainty in Fig 9. The first map again shows the true temperature field. Since *N* is higher than in the previous example, the temperature changes more quickly in space, resulting in a finer lattice. The 1PD prediction clearly does not resemble the true temperature at all, which is substantiated by the uncertainty computations visualised in Fig 9: this map is almost completely dark green, with an average uncertainty of 4.5°C, more than double the uncertainty of the 1PD prediction for the case *N* = 1.5. The prediction made using 12PD starts to look promising: the pattern emerges in areas with high station density, such as the west of the country. However, in lower-density areas such as the northeast, the pattern is still quite spotty. These findings are reflected in the uncertainty map; the areas in the middle and west of the Netherlands are a lighter green than the northeast, and there remain large areas where the prediction uncertainty is substantial. In the case of *N* = 1.5 we expected 12PD and 123PD to behave similarly, but for *N* = 7 Fig 5 shows that their RMSEs differ by almost 0.5°C, so 123PD should perform significantly better than 12PD, which is exactly what our maps show. In the areas where 12PD predicted a spotty version of the true temperature, 123PD delivers a much clearer lattice similar to the true field. This change is also visible in the uncertainty maps: the areas with large uncertainty are greatly reduced compared to the 12PD prediction.

Fig 8A shows the true synthetic temperature field, and the following panels the predictions made using 1PD, 12PD and 123PD respectively. The second map indicates that 1PD is not able to make an accurate prediction of the temperature field at all. The third map shows that 12PD already delivers a much better prediction, but one that is still spotty in areas with a lower station density. The last map shows that adding 3PD helps predict the areas where 12PD has a low station density. The base map is public domain and is available via: https://www.naturalearthdata.com/.

The figures show the uncertainty of the predictions made using 1PD, 12PD and 123PD from Fig 8 respectively. 1PD has a very high average uncertainty, and the uncertainty is high across almost the whole country. Using 12PD reduces the average uncertainty considerably, but large areas of high uncertainty remain. Adding 3PD greatly reduces these areas, resulting in an even lower average uncertainty. The base map is public domain and is available via: https://www.naturalearthdata.com/.

### 4.2 Results for real-world data

The previous section used synthetic data to demonstrate that, when weather phenomena with high spatial variability are being modelled, it is beneficial to use 2PD and 3PD in addition to 1PD. However, we would also like to see how the model behaves with real data. The model was therefore applied to temperature data from 25 January 2019. Note that temperature fields do not exhibit high spatial variability, so we expect 1PD, 12PD and 123PD to all perform well, without large performance differences between them.

The results support these hypotheses: the temperature maps in Fig 10 show no major differences. This is also expressed in the uncertainty maps shown in Fig 11. The average uncertainty is low for all three, but we do see a small decrease in uncertainty when going from 1PD to 12PD to 123PD.

Temperature predictions for real temperature data from 25 January 2019. These figures show the predictions made using 1PD, 12PD and 123PD respectively. The base map is public domain and is available via: https://www.naturalearthdata.com/.

Uncertainty predictions for the predictions shown in Fig 10. The base map is public domain and is available via: https://www.naturalearthdata.com/.

## 5 Conclusion

The provision of local weather observations is fundamental to advancing towards high-resolution weather forecasting. Official monitoring networks managed by NMS often face financial and operational challenges that limit their growth and maintenance. In this context, the adoption of second- and third-party datasets by NMS could be crucial to substantially increasing the spatial resolution of their products and services. For this purpose, it seems necessary to increasingly adopt methods capable of combining weather observations with variable quality levels, which is the case for 2PD and in particular for 3PD. In this work, we provide a thorough description of a multi-fidelity spatial regression method capable of combining official and crowdsourced weather observations, where each network has a significantly different level of fidelity.

The proposed method is a spatial regression method that can deal with multi-fidelity data because it estimates spatial, bias and noise parameters. This adjustment of the parameters is necessary to make accurate predictions, since second- and third-party data are assumed not to be as precise as first-party data. The method has been tested in two scenarios, using synthetic data and real data. The experiment with synthetic data shows that the current approach can be useful for modelling weather phenomena with high spatial variability when first-party data are combined with second- and third-party data. The experiment with real data shows that the average standard deviation of the predicted temperature fields decreases when first-party data are combined with second- and third-party data, which tells us there is potential for significant improvement when including second- and third-party data. This means that these datasets can be used to increase the spatial resolution of observation-based weather interpolations and to significantly decrease the nationwide prediction uncertainty, especially for phenomena such as rain and wind.
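The core idea of the multi-fidelity combination, a shared spatial model in which each party's observations enter with their own noise level, can be sketched as follows. This is a simplified illustration with made-up station locations and noise values, and it omits the per-party bias terms that the method also estimates.

```python
import numpy as np

def multifidelity_mean(x_obs, y_obs, noise2_obs, x_grid, length_scale=0.3, sigma2=1.0):
    """Posterior mean of a spatial regression in which each observation
    carries its own noise variance on the covariance diagonal.

    Hypothetical sketch: squared-exponential covariance, fixed
    hyperparameters, no bias terms."""
    d = lambda a, b: a[:, None] - b[None, :]
    k = lambda a, b: sigma2 * np.exp(-0.5 * (d(a, b) / length_scale) ** 2)
    K = k(x_obs, x_obs) + np.diag(noise2_obs)  # fidelity-dependent noise
    return k(x_grid, x_obs) @ np.linalg.solve(K, y_obs)

# Illustrative 1-D networks: a few accurate 1PD stations and denser,
# noisier 3PD stations; noise variances are made-up, not estimated.
x1, y1 = np.array([0.0, 1.0]), np.array([10.0, 11.0])             # 1PD
x3, y3 = np.array([0.2, 0.5, 0.8]), np.array([10.5, 10.2, 10.9])  # 3PD
x = np.concatenate([x1, x3])
y = np.concatenate([y1, y3])
noise2 = np.concatenate([np.full(2, 1e-4), np.full(3, 0.25)])
grid = np.linspace(0.0, 1.0, 11)
mean = multifidelity_mean(x, y, noise2, grid)
```

Because the noise variance sits on the diagonal of the covariance matrix, the regression automatically trusts the low-noise 1PD observations more while still letting the dense 3PD network fill the gaps between them.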

Future work might be developed along two lines: 1) assessing the network design based on the spatial distribution of the stations; 2) decreasing uncertainty based on bias and noise budgets. The pursuit of an optimal network design is motivated by the fact that second- and third-party stations are often not evenly distributed over the country, instead forming clusters. For regions with a high station density, it would be interesting to research whether all the stations are needed or whether some of them are redundant. This has practical implications during the modelling phase, since a non-redundant network design might reduce the computational time. Network design can also be tackled for regions with low station density, where it is relevant to assess where more stations are required to decrease the prediction uncertainty. Regarding uncertainty decrease, it is important to mention that in this work we estimated one bias and one noise variable per party (i.e. type of network). However, the bias and noise could also be estimated from “bias or noise budgets” that include sensor types, station siting conditions, or even proxies (e.g. population density, local cooling or radiative effects). The bias and noise could then be estimated more accurately, and the spatial regression would perform better. A disadvantage of this approach is that it might substantially increase the computational time.

In recent years, there has been intense research activity around the usage and incorporation of crowdsourced data in the climate sciences. Substantial efforts have been carried out to perform quality assessment for crowdsourced data [16], to fit these novel observations into numerical weather prediction [22, 51], and to define workflows that transform these observations into valuable new products and services for NMS [19]. In this line, the current work illustrates how crowdsourced data enable the creation of high-resolution weather products, which has a remarkable potential to expand the current products and services at KNMI. The availability of these high-resolution maps might also be helpful for issuing local weather warnings, particularly in urban areas, and opens the door to impact-based analyses. Finally, it is important to remark that elaborating these high-resolution weather maps today will undoubtedly contribute to creating the high-resolution climate datasets of tomorrow.

## Acknowledgments

We would like to thank everyone involved in the WOW project. This research has only been possible because of all the people who own a PWS and connected it to the WOW network. Thanks to their efforts, we have been given access to a wealth of useful weather data that inspired this research. We would also like to thank Rijkswaterstaat for sharing their data with us.

## References

- 1. Daly C, Taylor G, Gibson W, Parzybok T, Johnson G, Pasteris P. High-quality spatial climate data sets for the United States and beyond. Transactions of the ASAE. 2000;43(6):1957.
- 2. van den Besselaar EJM, Haylock MR, van der Schrier G, Klein Tank AMG. A European daily high-resolution observational gridded data set of sea level pressure. Journal of Geophysical Research: Atmospheres. 2011;116(D11).
- 3. de Baar J, Garcia-Marti I, van der Schrier G. Spatial regression of multi-fidelity meteorological observations using a proxy-based measurement error model. Advances in Science and Research. 2022.
- 4. Bauer P, Thorpe A, Brunet G. The quiet revolution of numerical weather prediction. Nature. 2015;525(7567):47–55. pmid:26333465
- 5. Alley RB, Emanuel KA, Zhang F. Advances in weather prediction. Science. 2019;363(6425):342–344. pmid:30679358
- 6. Chiaravalloti RM, Skarlatidou A, Hoyte S, Badia MM, Haklay M, Lewis J. Extreme citizen science: Lessons learned from initiatives around the globe. Conservation Science and Practice. 2022;4(2):e577.
- 7. Kirk PJ, Clark MR, Creed E. Weather observations website. Weather. 2021;76(2):47–49.
- 8. Mylne K, Male H, Gilbert S. The Weather Observations Website. Copernicus Meetings; 2022.
- 9. Napoly A, Grassmann T, Meier F, Fenner D. Development and application of a statistically-based quality control for crowdsourced air temperature data. Frontiers in Earth Science. 2018;6:118.
- 10. De Vos L, Overeem A, Leijnse H, Uijlenhoet R. Rainfall estimation accuracy of a nationwide instantaneously sampling commercial microwave link network: Error dependency on known characteristics. Journal of atmospheric and oceanic technology. 2019;36(7):1267–1283.
- 11. Chen J, Saunders K, Whan K. Quality control and bias adjustment of crowdsourced wind speed observations. Quarterly Journal of the Royal Meteorological Society. 2021;147(740):3647–3664.
- 12. Båserud L, Lussana C, Nipen TN, Seierstad IA, Oram L, Aspelien T. TITAN automatic spatial quality control of meteorological in-situ observations. Advances in Science and Research. 2020;17:153–163.
- 13. Fenner D, Bechtel B, Demuzere M, Kittner J, Meier F. CrowdQC+—a quality-control for crowdsourced air-temperature observations enabling world-wide urban climate applications. Frontiers in Environmental Science. 2021;9:553.
- 14. Droste AM, Heusinkveld BG, Fenner D, Steeneveld GJ. Assessing the potential and application of crowdsourced urban wind data. Quarterly Journal of the Royal Meteorological Society. 2020;146(731):2671–2688.
- 15. Mandement M, Caumont O. Contribution of personal weather stations to the observation of deep-convection features near the ground. Natural Hazards and Earth System Sciences. 2020;20(1):299–322.
- 16. Hahn C, Garcia-Marti I, Sugier J, Emsley F, Beaulant AL, Oram L, et al. Observations from Personal Weather Stations—EUMETNET Interests and Experience. Climate. 2022;10(12):1–14.
- 17. Shannon CE. Communication in the Presence of Noise. Proceedings of the IRE. 1949;37(1):10–21.
- 18. Por E, van Kooten M, Sarkovic V. Nyquist–Shannon sampling theorem. Leiden University. 2019;1:1.
- 19. Garcia-Marti I, Overeem A, Noteboom JW, de Vos L, de Haij M, Whan K. From proof-of-concept to proof-of-value: Approaching third-party data to operational workflows of national meteorological services. International Journal of Climatology. 2022;.
- 20. Bell S, Cornford D, Bastin L. How good are citizen weather stations? Addressing a biased opinion. Weather. 2015;70(3):75–84.
- 21. van Andel J. Quality control development for near real-time rain gauge networks for operational rainfall monitoring; 2021.
- 22. Hintz KS, Vedel H, Kaas E. Collecting and processing of barometric data from smartphones for potential use in numerical weather prediction data assimilation. Meteorological Applications. 2019;26(4):733–746.
- 23. BIPM, ISO. Guide to the Expression of Uncertainty in Measurement. Geneva, Switzerland. 1995;122:16–17.
- 24. Kessel W. Measurement uncertainty according to ISO/BIPM-GUM. Thermochimica Acta. 2002;382(1-2):1–16.
- 25. Duvernoy J. Guidance on the computation of calibration uncertainties. World Meteorological Organization; 2015.
- 26. Wikle CK, Berliner LM. A Bayesian tutorial for data assimilation. Physica D: Nonlinear Phenomena. 2007;230(1-2):1–16.
- 27. Matheron G. Principles of geostatistics. Economic geology. 1963;58(8):1246–1266.
- 28. Gandin LS. Objective analysis of meteorological fields. Israel Program for Scientific Translations. 1963;242.
- 29. Cressie N. The origins of kriging. Mathematical geology. 1990;22(3):239–252.
- 30. Azpurua MA, Dos Ramos K. A comparison of spatial interpolation methods for estimation of average electromagnetic field magnitude. Progress In Electromagnetics Research M. 2010;14:135–145.
- 31. Tatalovich Z, Wilson JP, Cockburn M. A comparison of thiessen polygon, kriging, and spline models of potential UV exposure. Cartography and Geographic Information Science. 2006;33(3):217–231.
- 32. James G, Witten D, Hastie T, Tibshirani R. An introduction to statistical learning. vol. 112. Springer; 2013.
- 33. Williams CK, Rasmussen CE. Gaussian processes for machine learning. vol. 2. MIT Press, Cambridge, MA; 2006.
- 34. Quadrianto N, Kersting K, Xu Z. Gaussian Process. Boston, MA: Springer US; 2010. p. 428–439.
- 35. Murphy KP. Machine learning: a probabilistic perspective. MIT Press;.
- 36. de Baar JH, Percin M, Dwight RP, van Oudheusden BW, Bijl H. Kriging regression of PIV data using a local error estimate. Experiments in fluids. 2014;55:1–13.
- 37. Weinberger K. Lecture 15: Gaussian Processes; 2018. Machine Learning for Intelligent Systems, course page. Available from: https://www.cs.cornell.edu/courses/cs4780/2018fa/lectures/lecturenote15.html.
- 38. Shekaramiz M, Moon TK, Gunther JH. A Note on Kriging and Gaussian Processes; 2019.
- 39. de Baar JH, Dwight RP, Bijl H. Improvements to gradient-enhanced Kriging using a Bayesian interpretation. International Journal for Uncertainty Quantification. 2014;4(3).
- 40. Tobler WR. A computer movie simulating urban growth in the Detroit region. Economic geography. 1970;46(sup1):234–240.
- 41. Forrester AI, Sóbester A, Keane AJ. Multi-fidelity optimization via surrogate modelling. Proceedings of the royal society a: mathematical, physical and engineering sciences. 2007;463(2088):3251–3269.
- 42. Mardia KV, Marshall RJ. Maximum likelihood estimation of models for residual covariance in spatial regression. Biometrika. 1984;71(1):135–146.
- 43. Forrester AI, Keane AJ, Bressloff NW. Design and analysis of “Noisy” computer experiments. AIAA journal. 2006;44(10):2331–2339.
- 44. De Baar JH, Dwight RP, Bijl H. Fast maximum likelihood estimate of the Kriging correlation range in the frequency domain. In: IAMG 2011: Proceedings of the International Association of Mathematical Geosciences “Mathematical Geosciences at the Crossroads of Theory and Practice”, Salzburg, Austria, 5–9 September 2011.
- 45. de Baar J, Garcia-Marti I. Recent improvements in spatial regression of climate data. In: NATO AVT-354 workshop on multi-fidelity methods for military vehicle design; 2022. p. 26–28.
- 46. Yamamoto JK. An alternative measure of the reliability of ordinary kriging estimates. Mathematical Geology. 2000;32:489–509.
- 47. Heuvelink GB, Pebesma EJ, et al. Is the ordinary kriging variance a proper measure of interpolation error? In: The fifth international symposium on spatial accuracy assessment in natural resources and environmental sciences. RMIT University, Melbourne; 2002. p. 179–186.
- 48. Den Hertog D, Kleijnen JP, Siem AY. The correct Kriging variance estimated by bootstrapping. Journal of the Operational Research Society. 2006;57(4):400–409.
- 49. Frei C. Interpolation of temperature in a mountainous region using nonlinear profiles and non-Euclidean distances. International Journal of Climatology. 2014;34(5):1585–1605.
- 50. Van Groenigen JW. The influence of variogram parameters on optimal sampling schemes for mapping by kriging. Geoderma. 2000;97(3-4):223–236.
- 51. Nipen TN, Seierstad IA, Lussana C, Kristiansen J, Hov Ø. Adopting citizen observations in operational weather prediction. Bulletin of the American Meteorological Society. 2020;101(1):E43–E57.