Crowdsourcing air temperature data for the evaluation of the urban microscale model PALM—A case study in central Europe

In summertime and during heat events the urban heat island can negatively impact human health in urban areas. In the context of climate change, climate adaptation receives more attention in urban planning. Microscale urban climate modelling can identify risk areas and evaluate adaptation strategies. Concurrently, evaluating the model results with observational data is essential. So far, model evaluation is mostly limited to short-term field cam-paigns or a small number of stations. This study uses novel crowdsourcing data from Netatmo citizen weather stations (CWS) to evaluate the urban microscale model PALM for a hot day (T max � 30˚C) in Bochum in western Germany with anticyclonic atmospheric conditions. Urban-rural air temperature differences are represented by the model. A quality con-trol procedure is applied to the crowdsourced data prior to evaluation. The comparison between the model and the crowdsourced air temperature data reveals a good model performance with a high coefficient of determination (R 2 ) of 0.86 to 0.88 and a root mean squared error (RMSE) around 2 K. Model accuracy shows a temporal pattern and night-time air temperatures during the night are underestimated by the model, likely due to unresolved cloud cover. The crowdsourced air temperature data proved valuable for model evaluation due to the high number of stations within urban areas. Nevertheless, weaknesses related to data quality such as radiation errors must be considered during model evaluation and only the information derived from multiple stations is suitable for model evaluation. The procedure presented here can easily be transferred to planning processes as the model and the crowdsourced air temperature data are freely available. This can


Introduction
Cities are a central part of human life, providing space for living, work and services.Over the last decades, the amount of people dwelling in cities globally increased significantly.In 1950, only 30% of the world's population resided in cities.By 2018, this fraction increased to 55% and a further increase to 68% is projected in 2050 [1].This urbanisation process alters social and economic structures [1] as well as transforms the natural environment into an urban environment [2].
The transformation of natural into urban surfaces has multiple effects on the environment resulting in an urban ecosystem.Modifications of the urban atmosphere ensue from changes in the albedo and emissivity of surfaces, modified evaporation rates, aerodynamic roughness and the composition of the atmosphere in terms of water vapour, gases and particulate matter [2].One atmospheric alteration resulting from the urbanisation process is the canopy layer urban heat island (CLUHI).It describes the relative overheating of the urban atmosphere in comparison to rural areas [2][3][4][5].Combining this with warm and/or hot weather, this urban overheating can affect the health and comfort of a cities' population due to thermal stress [4,[6][7][8].Past heatwaves, like the one in Europe in 2003, have caused thousands of additional deaths due to heat-related mortality [9].The CLUHI represents an additional risk factor as it increases the heat load [7,9].
Anthropogenic climate change has the potential to further increase the risk of heat-related mortality.By 2020, the global air temperature has increased by 1.09˚C compared to pre-industrial levels [10].The rise in air temperatures is not globally uniform as the mean air temperature, e.g., in Germany has increased by 1.6˚C since 1881 [11].Furthermore, climate change is affecting extreme weather events like hot extremes.Since the 1950s, the occurrence of hot extremes has increased almost globally and human influence has contributed to the increase with medium to high confidence in many regions [10].Additionally, the increase in the frequency of hot days and nights is accentuated in urban areas compared to rural areas [12].A further rise in the occurrence of hot extremes is projected [10].The CLUHI, rising air temperatures and higher frequency of heatwaves due to climate change aggregate and lead to an increased heat load and additional heat stress for urban populations [6,13].
All the factors listed above raise the need for adaptation to changing climatic conditions in cities.City planners require detailed information on the urban microclimate to identify risk areas and plan adaptation strategies and evaluate different adaptation strategies [6,[14][15][16][17][18]. Numerical models can calculate the urban microclimate in the required resolution under current conditions and enable comparative studies to evaluate different strategies [8,14,16].
Today, various models are available to simulate the urban climate at different scales.At the microscale computational fluid dynamics (CFD) models are frequently used.CFD models combine temperature and velocity fields and enable accurate modelling of the urban microclimate with relevant processes such as turbulent flow, exchange of latent and sensible heat and radiative transfer [14,19].This qualifies them as a common tool to evaluate human thermal comfort and exposure in cities [6,18].At the same time, CFD models require detailed information on the urban structure and surface materials such as the exact position and height of buildings as well as boundary conditions and they demand significant computational resources [14,19].
One CFD model is the PALM model system which has recently been extended for urban applications with the PALM-4U model components [20].The model core has been developed for more than 15 years as a parallelised large-eddy simulation model [21].The newly developed modules for urban applications include a radiative transfer model, a building surface model, an indoor model, a surface spinup mechanism, a human biometeorology model with a multi-agent system as well as self-and offline nesting with mesoscale weather models [20].A validation study in Prague revealed a good representation of the urban microclimate for a winter and a summer situation [19].The model is sensitive to changes in surface properties and green infrastructure within a city [22,23].Therefore, the model has proven relevant capability to model thermal exposure and thermal comfort in the urban environment facilitating the evaluation of adaptation strategies for urban planning [6,24].However, case specific evaluation is still favourable to ensure its applicability under various conditions and in different surroundings.
Therefore, spatially resolved observations are needed.Data provided by professional weather stations have high installation and maintenance costs, and thus typically are not spatially dense enough to represent intraurban air temperature variations [25,26].Intensive mobile measurement campaigns are still costly to maintain and implement and limited in spatial coverage [26,27].An alternative to mobile and in-situ measurements is crowdsourcing.Due to the ever-increasing number of devices and sensors connected to the internet, observations of a variety of atmospheric quantities is available through crowdsourcing [26].In recent years, citizen weather stations (CWS) produced by the company Netatmo have received increasing attention for the investigation of the CLUHI [27][28][29][30][31][32].The data is freely accessible through an API [28].As the placement of the sensors is not regulated, several quality control procedures have been developed and improved to ensure high data quality for research purposes [28,30,31].The air temperature data have been applied to evaluate the CLUHI of cities and the air temperature differences on a local scale [29].The data were combined with remote sensing data and machine learning algorithms such as Random Forest to model the air temperature for the city of Berlin [27].
The present study uses the urban microscale model PALM to model the thermal conditions of a hot day (T max � 30˚C) in the city of Bochum, Germany.The model evaluation is based on a professional weather station and uses crowdsourced air temperature data from Netatmo CWS.The aim is to examine the usability of crowdsourced air temperature data for the evaluation of a micro-scale climate model.The following research questions shall be answered by this study: In the present study, does the PALM model display intraurban air temperature differences according to well-known principles of the urban climate?Based on the evaluation with CWS data, can a temporal and/or spatial pattern in model accuracy be detected?And finally, how suitable are CWS data for the evaluation of PALM?

Study area
The city of Bochum is located in western Germany in the polycentric urban area Ruhr encompassing 53 cities.In total, 5.1 million people live in the Ruhr area within an area of 4440 km 2 [33] characterised by a smooth transition between cities. Bochum is defined by a diverse structure of land use ranging from the densely built-up city centre to industrial and commercial sites to green areas spread over the city.
The study area encompasses the city centre, residential areas north and south of the city centre, a few commercial areas and a large park near the city centre and is characterised by a diverse structure (Fig 1).The dense city centre is surrounded by residential areas to the north and the south.To the west and the east, the urban structure is less dense with several commercial areas.At a further distance from the city centre, the urban structure is interrupted by open and natural areas.The terrain generally rises from the northwest to the southeast with an altitude ranging from 42 to 148 m above sea level.The smaller focus area, later called child domain, within the study areas highlights the northern city centre and the large park.The southern part of the focus area contains the dense city centre and the train tracks which encompass the city centre of Bochum.The train tracks are elevated from the surrounding areas.North of the train tracks the structure is less dense and impervious.Residential areas dominate interspersed with small commercial areas along the main roads.Two parks are located in the focus area: the smaller "Schmechtingwiesen Park" in the west and the larger "Stadtpark" in the east.South of Schmechtingwiesen Park are several allotments.Differences in elevation amount to about 50 m.

Study period
As the CLUHI is most pronounced in summer in mid-latitude cities, a heatwave period in August 2020 was chosen as the study period.The heatwave lasted from the 5 th of August to the 13 th of August 2020.A 30-hour period starting on the night of the hottest day (the 8 th of August) of this heatwave was chosen for the PALM simulation.According to the weather classification by the German Weather Service (DWD), central Europa was under the influence of high-pressure systems for the duration of the heatwave period, classified as BM (ridge of high pressure, central Europa) from the 5 th of August to the 7 th of August, as NEa (north-eastern weather situation, anticyclonic conditions central Europe) from the 8 th of August to the 10 th of August and as SEa (south-eastern weather situation, anticyclonic conditions central Europe)  [34], municipality borders are provided by Geobasis NRW [35].The base map (top left) is added in QGIS as XYZ Tiles [36] and the base map (bottom left) is provided by the Web Map Service (WMS) of the Federal Agency for Cartography and Geodesy [37].https://doi.org/10.1371/journal.pclm.0000197.g001from the 11 th of August to the 13 th of August.A weak flow from the northeast transported warm and dry air towards central Europe during the simulated period from 8 th of August 00:00 h to 9 th of August 06:00 h [38].All time stamps are in UTC time zone.
The influence of the high-pressure system resulted in a maximum air temperature of 36.4˚C on the 8 th of August as recorded by the professional weather station LMSS further described below.While during the first night the minimum air temperature reached 16˚C, air temperatures dropped only shortly below 20˚C in the second night.The mean relative humidity for the simulated period was 54.5 %.The overall wind speed was low with a mean of 1.4 ms - 1 and a maximum wind speed of 7.3 ms -1 at 10 m above roof level.

Weather data
Professional weather station data.A professional weather station run by the urban climatology group at Ruhr-University Bochum is situated within the study area.The station "Ludger Mintrop Stadtklima Station" (LMSS), hereafter referred to as reference station, is located in the focus area.The 10 m wind tower is on top of a building in the city centre resulting in a measurement height of 37.8 m above ground.Air temperature and humidity as well as soil temperatures are recorded in an allotment south of Schmechtingwiesen Park.The sensors for air temperature and humidity are placed inside a radiation shield without artificial ventilation.Measurement range and accuracy for all sensors are listed in Table 1.
Crowdsourced air temperature data.CWS produced by the company Netatmo are the source for the crowdsourced air temperature data.The stations consist of an indoor and an outdoor module encased in a cylindrical aluminium shell.Data recorded by the outdoor module can be made available to the public.The outdoor module records air temperature and humidity [30].The measured range of air temperature is -40 to +65˚C with an accuracy of ±0.3˚C and the range of relative humidity is 0 to 100% with an accuracy of ±3% [39].A comparative measurement by Meier et al. [30] in a climate chamber confirms the air temperature accuracy except for low air temperatures of 0˚C.A field comparison revealed an accuracy of ±0.5˚C between 14:00 h and 05:00 h and a lower accuracy after sunrise and during morning hours of up to -1.3 K [30].Naturally ventilated sensors can be subject to a warm bias due to a reduced exchange of air and heat accumulation beneath the radiation shield [40].
Data provided by Netatmo were extracted via the APIs and stored in a database.Stations received an internal ID.When relocated, a station receives a new ID to keep the time series consistent.Hourly mean values were accessed.The timestamp was modified to represent the end of the averaging interval [28].A quality control (QC) procedure, first developed by Napoly et al. [31] and recently updated by Fenner et al. [28], was applied to the observations.The QC procedure filters stations and data points in five main steps: 1. Metadata check: duplicate stations are removed (M1) 2. Outlier detection based on a comparison of individual data points to the data distribution, outliers are marked (M2) The optional QC level O1, a temporal interpolation for individual missing timesteps to increase data availability, was additionally applied to the database.Default settings for each QC level were used as presented in Fenner et al. [28], except for level M5.Isolated stations were not removed from the database.For this study, data for August 2020 was retrieved from the database.When combined with the model results, the dataset was filtered further based on whether a station provided more than 80 % of data for the study period.A total of 59 stations remained.

PALM model
Model configuration.The PALM model system 6.0 revision 4901 was applied in this evaluation study.PALM uses the incompressible Boussinesq approximations of the Navier-Stokes equations which calculate seven prognostic variables on a staggered Cartesian (Arakawa-C) grid: the velocity components u, v and w, the potential temperature, the subgrid-scale turbulence kinetic energy, the water vapour mixing ratio, and optionally a passive scalar.These variables are calculated by solving the equations for the conservation of mass, momentum, thermal internal energy, moisture, and the optional passive scalar [20].The turbulence closure on the subgrid scale followed the 1.5-order Deardorff's approach [41], further refined by Moeng and Wyngaard [42] and Saiki et al. [43].Advection was described by the 5 th order upwind scheme of Wicker and Skamarock [44].Time was discretised by the 3 rd order Runge-Kutta timestep scheme [45].A multigrid pressure solver for the Poisson equation was applied.Boundary conditions at the surface are defined using the Monin-Obukhov similarity theory where a constant flux layer is assumed between the surface and the first grid level.Here roughness lengths for heat, humidity and momentum are used to provide surface heat fluxes of momentum, heat and moisture to the first grid level [20].
The PALM model includes several modules which extend the functionality of the model core to real-world scenarios.The relevant modules for this work are the land surface model (LSM), the radiation model, the radiative transfer model (RTM), the building surface model (BSM), the plant canopy model (PCM), the surface spinup mechanism, and the offline and online nesting implementation [20].
Interactions with the surface are provided by the LSM and inside of urban areas additionally by the BSM.The LSM solves the energy balance of natural surfaces like soils, vegetation types and water surfaces and paved surfaces like streets and pavements [20,46].The BSM calculates the energy balance of building surfaces like walls, roofs, and windows.The energy balance is solved in the same way as in the land surface model, but the model is applied to vertical and horizontal surfaces [20,47].The PCM accounts for the influence of the resolved vegetation on the radiation as well as dynamic and thermodynamic processes [20,47].The built-in clear sky radiation model provides incoming and outgoing radiation fluxes and models the radiation budget at the surface [20].The radiation model is extended with the RTM which models radiative processes in complex environments where shading and multiple reflections are important [48].
The surface spinup mechanism enables a precursor simulation of the radiation model, the LSM and BSM.As detailed information on surface and material temperatures are often unavailable, the spinup mechanism improves the information on initial material and ground temperatures and provide almost equilibrium conditions at model initialisation.The atmospheric code is switched off during the spinup which saves computational cost [20].A spinup period of 24 hours was applied in this study.
PALM's offline and online nesting were utilised in the present study.Three model domains were defined to incorporate mesoscale weather effects and model microscale processes in the urban canopy.A mesoscale domain with a size of ~40 x 45 km and a resolution of 50 m horizontally was used in a precursor simulation to exclude turbulence adjustment zones from the study area.A vertical grid spacing of 25 m was used up to a height of 2000 m, followed by a grid stretching of 1.08 resulting in a vertical domain size of approx.6000 m.The following simulation was divided into a parent and child domain.The parent domain had a size of 8.  Offline nesting allows to incorporate results from mesoscale weather prediction models as atmospheric boundary conditions.The pre-processor INIFOR, provided by PALM, calculates realistic initial and boundary conditions from the COSMO D2 model and provides it for the PALM simulation via the dynamic driver file [20].COSMO-D2 was the weather forecast model used by the DWD [50].The pre-processor transforms the COSMO rotated pole coordinates to the PALM Cartesian coordinates and then interpolates the COSMO data to the desired PALM resolution.It requires specific files: a file with the COSMO numerical grid, a file with the COSMO soil map to identify water cells and hourly files with COSMO analysis or forecast data for the atmospheric field and the soil temperature and moisture.Additionally, a namelist file is required which determines the domain setup and geographical position of the domain.Further options can be provided by the command-line options of the INIFOR call [51].
The COSMO model is a Reynolds Averaged Navier Stokes (RANS) model.The RANS approach averages all turbulence and calculates a mean flow velocity and direction.All unsteadiness caused by turbulence is removed.RANS uses the non-linear Navier Stokes equations resulting in the Reynolds stress term which needs to be modelled appropriately.Several models have been developed to determine the Reynolds stress term in RANS models.Together with ensemble, time or spatial averaging the mean flow field can be predicted in a RANS model model [52].As turbulence is not explicitly resolved, turbulent fluctuations must be added to the boundary values.For this purpose, the synthetic turbulence generator was applied.It is based on a method by Xie and Castro [53] which adds perturbations to the wind components at the lateral boundaries [54].
Due to the computational cost of the large mesoscale model domain, it was impossible to directly nest the study area into the mesoscale domain.Therefore, the results of the mesoscale simulation were saved with a five-minute temporal resolution which includes the resolved turbulence.This output was then processed to serve as a dynamic driver for the following simulation where the boundary conditions are updated every five minutes.As turbulence is already resolved in the dynamic driver, the adjustment zones for the generation of turbulence in the parent domain could be reduced.The online nesting allows the simulation of a large domain with a coarse resolution combined with a small domain containing the area of interest with a fine resolution.The one-way coupled mode of the online nesting was applied here where influences from the parent domain are interpolated on the child domain without feedbacks from the child to the parent domain [55].
A detailed description of the model is given by Maronga et al. [20].Further information on the model components can be found in Gehrke et al. [46] for the LSM, Resler et al. [47] for the BSM, Salim et al. [48] for the RTM, Kadasch et al. [54] for the offline nesting and Hellsten et al. [55] for the online nesting.
The mesoscale simulation was run on 126 CPU cores.The nested microscale simulations were run on 112 CPU cores with 64 CPU cores for the parent domain and 48 CPU cores for the child domain.The runtime for the mesoscale run was 27 hours and for the microscale runs was 11 days and 4 hours.
Urban canopy description.The static driver contains all information on the surface properties and topography of the model domains.The required information was derived from freely available datasets provided by the state of North Rhine-Westphalia, the Federal Republic of Germany or the Copernicus Land Monitoring Service.The data sources are listed in Table 2.
Input data for the mesoscale simulation were the DEM200 and the CORINE land cover datasets.The digital elevation model was resampled to a resolution of 50 x 50 m and the COR-INE land cover dataset was rasterised.The CORINE land cover classes were translated into the PALM vegetation, pavement and water classes according to S1 Table .Input data for the parent and child domains were the DEM, the 3D laser scanning data, the building model and the Urban Atlas data.The DEM was resampled to a resolution of 10 x 10 m for the parent domain and 2.5 x 2.5 m for the child domain.The Urban Atlas was rasterised, and the land cover classes were converted into PALM vegetation, pavement and water classes according to S2 Table .Building heights were derived from the 3D building model.Each building was assigned an ID and a building type.In this study, PALM building type 2 (residential, 1950 to 2000) was used as data on the building age and specific usage was unavailable.The 3D laser scanning data and the DEM were used to determine tree heights in the model domains.The DEM was subtracted from the laser scanning data and buildings were clipped from the tree heights.A tree must be at least 3 m tall to be resolved by PALM.As PALM requires leaf area densities (LAD) for the plant canopy module, generic LAD profiles were applied to each tree depending on tree height and grid resolution.The applied LAD profiles are listed in S3 and S5 Tables for individual trees and S4 and S6 Tables for vegetation patches.The height and position of the trees are presented in S1 Fig.The trees have a conical shape.Soil type 3 (medium-fine porosity) was used in all domains.The input data for each domain was processed to the static driver netCDF file according to the PALM input data standard.

Evaluation
The model results from the microscale runs were compared to professional and crowdsourced air temperature data and evaluated.The 2 m potential air temperature was saved as a model output from PALM. 2 m air temperature was calculated from the 2 m potential air temperature.Areas containing buildings were clipped out of the results as the 2 m air temperature at building locations is the air temperature at 2 m above roof level.Then the time series of 2 m air temperature were extracted at every station's location and at the surrounding nine grid cells for the parent domain and 24 grid cells for the child domain.All values surrounding a single station were averaged for every timestep.All data were written in a table with the timestep and station ID as identifiers.For the parent domain, all stations within 1 km of the domain boundaries were removed to exclude the influence of boundary effects on the results.The model results and the observational data were merged based on timestep and station ID.Missing observational values were filled with NAN values.
Several statistical methods are available to evaluate the model results with observational data.For this evaluation the following measures were chosen: arithmetic mean, standard deviation, Pearson correlation coefficient, coefficient of determination (R 2 ), root mean squared error (RMSE), mean squared error (MSE), index of agreement (IoA), and bias [62].The statistical measures were calculated for the whole dataset and for four-hour time intervals.The graphical evaluation is based on boxplot time series and a direct comparison of each station's observations and the matching model results.The results can be used to evaluate overall model performance and differentiate model performance on a temporal and spatial basis.
Model results were prepared for evaluation using python version 3.8.10.Statistical evaluation was carried out with R version 4.1.2[63] in RStudio [64] and maps were generated in QGIS 3.16 [65].After sunrise (05:00 h) air temperatures rise with a slight negative gradient from east to west.The morning hours are characterised by small air temperature differences, except for shaded areas and water surfaces which are cooler than surroundings.Air temperatures continue to rise until late afternoon (16:00 h).Air temperatures in built-up areas reach 34 to 36˚C.Simultaneously, air temperature differences between densely built-up and open areas increase with built-up areas showing higher air temperatures than open spaces.Air temperatures start to decline in the early evening (18:00 h) with a pronounced cooling in open and vegetated areas.After sunset (19:00 h) highest cooling rates are observed for open spaces while built-up areas cool down at a slower rate.Larger open and vegetated spaces show a stronger cooling than natural areas within built-up areas.The cooling process continues through the night.Air temperatures in built-up areas remain above 20˚C.The pattern of 2 m air temperature distribution reflects the surface properties such as the degree of imperviousness and presence of buildings.

Model results
Air temperature differences were calculated using the urban-rural air temperature difference.Rural areas were defined using the LCZ classification scheme by Stewart & Oke [66].LCZ D (low plants) was defined as a rural area as standardised climate stations are placed on open fields.The required LCZ map was downloaded from the LCZ Generator [67].The 2 m air temperature within LCZ D was averaged for every timestep and served as the rural reference air temperature.Air temperature differences for the domain and all timesteps were calculated.The resulting air temperature differences are visualised in Fig 4 .An animation of the air temperature differences with hourly timesteps for the whole simulation period is provided as S3 Fig. Air temperature differences during the first modelled night clearly show the existence of an CLUHI.Air temperature differences decline in the first hours after sunrise starting at 05:00 h.Shaded areas are visible through their strongly negative temperature differences of up to -3 K or more.Exposed open areas in the southeast warm up quicker.Starting around 11:00 h built-up areas show higher temperatures than the rural reference.The built-up areas and exposed open spaces such as south-facing slopes show an increasing positive air temperature difference compared to the rural reference in the course of the midday and afternoon hours.North-facing slopes demonstrate negative air temperature differences in the same timeframe.The positive air temperature differences of built-up areas and sealed surfaces peak in the late afternoon and the first half of the second night.Open natural spaces cool down quicker resulting in a negative temperature difference.The model results show a strong CLUHI at night with built-up areas demonstrating air temperatures >4 K higher than the rural reference.Air temperature differences slightly decrease again in the second half of the night.temperatures on the 8 th of August are overestimated by around 3 to 5 K by PALM.Relative humidity is underestimated by the model.Measured warming after sunrise is stronger than modelled warming due to lower night-time and higher daytime air temperatures.Daytime modelled relative humidity is higher than observed relative humidity.After sunset, measured air temperatures show a stronger decrease than modelled air temperatures.Differences are highest at 22:00 h and decline from thereon.Towards the second half of the night modelled and measured air temperatures are well aligned.Differences in relative humidity correspond to the air temperature differences.Differences in wind speed are highest during morning hours on 8 th August and in the afternoon.Differences between the model and the observation in the first hours could be caused by the model spinup.The model overestimates 10 m wind speed in the afternoon.This could be a reason for the underestimation of the 2 m air temperature as higher wind speeds induce better mixing and a reduction of air temperatures.Resler et al. [19] also observed a small overestimation of the wind speed with a generally good agreement.They attributed the overestimation to the limited spatial representativeness of point measurements.Nevertheless, the comparison shows an overall agreement between the model and the observation.
The modelled heat fluxes of the energy budget at the surface show a high latent heat flux and low ground and sensible heat fluxes (Fig 6).As the surface at the station is classified as tall grass in the model, the results are as expected.The high latent heat flux could explain the higher modelled relative humidity at midday.Two sources of uncertainties could cause the overestimated relative humidity.Soil moisture was supplied to the dynamic driver from the COSMO model.Due to the coarse resolution of the COSMO model, small scale differences are neglected.A longer spinup period of more than 24 hours might reduce some of these uncertainties.The other uncertainty is the soil type provided to the model, as only one soil type was used in this setup.Soil moisture and heat transport within the soil and to the atmosphere depend on the soil type prescribed to the model [46].While the model only has a low sensitivity to soil moisture in highly urbanised areas, the influence of soil moisture is significant in the vicinity of vegetation and natural surfaces [22].Therefore, the difference between the designated soil type and the actual soil type at the measurement site can differ resulting in deviating energy fluxes a different relative humidity in the model.
Evaluation with crowdsourced air temperature data.The parent domain contains 59 Netatmo stations and the child domain nine Netatmo stations after the QC.Table 3 lists the values for all statistical parameters for the study area, divided by parent and child domain.The mean air temperature in the child domain is slightly higher in the modelled data (27.6˚C)than in the observed data (27.0˚C).In the parent domain mean air temperature of modelled and observed data is the same.Differences in standard deviation are small in both domains.The Pearson correlation coefficient, the R 2 and the IoA are close to their ideal values indicating a good agreement between modelled and observed data.These measures are slightly better for the parent domain.The bias for the parent domain is very small with 0.04.The negative bias for the child domain indicates a slight overestimation of the air temperature by the model.MSE and RMSE indicate disagreement between modelled and observed data.When reducing the stations used for evaluation of the parent domain to the stations within the child domain, the evaluation metrics for the parent domain are comparable to the child domain.The indicated better performance of the parent domain could be caused by the different sample sizes and the higher number of stations within the parent domain.
A time series with boxplots visualises the temporal evolution of air temperature and the hourly variance for the study area (Fig 7).The temporal course of modelled and observed air temperature aligns well.During the first four hours modelled and observed data agree nicely.Air temperature increase in the morning is more pronounced in the modelled data than in the crowdsourced data though the alignment between model and observation improves at midday.While the timeseries reveals an underestimation of maximum daytime air temperatures in the parent domain, the maximum air temperature in the child domain is overestimated.This suggests a better representation of small-scale air temperature differences and extremes at a finer grid resolution.The small sample size in the child domain reduces the validity of the information.The late afternoon and early evening are characterised by an overestimation of air temperatures by the model, followed by a period of good agreement before the model underestimates night-time air temperatures.The boxplots indicate a higher variance in the observed air temperature data with a smaller variance for the modelled data.The child domain exhibits a higher variance of air temperature for the modelled data during midday.Both domains experience an underestimation of night-time air temperatures.This finding contrasts with the comparison of the model results to the data from the reference station indicating that one measurement location is not representative of the thermal conditions in urban areas.Comparable to the findings of Resler et al. [19], the diurnal cycle is well represented.The time series above show a temporal pattern of differences between the model and the observation.The data sets were split into four-hour time intervals to evaluate the temporal pattern of the air temperature differences.The resulting seven time intervals are: 08.08.02:00-05:00 (1), 08.08.06:00-09:00 (2), 08.08.10:00-13:00 (3), 08.08.14:00-17:00 (4), 08.08.18:00-21:00 (5), 08.08.22:00 -09.08.01:00 (6), 09.08.02:00-05:00 (7).The RMSE, IoA, bias, as well as mean air temperature and standard deviation, were calculated (Table 4).In both model  domains, the model underestimates night-time air temperature in the second night as can be seen in the bias for time intervals six and seven.Air temperatures are overestimated by the model during the rest of the modelled period in the child domain while in the parent domain air temperatures are overestimated only during the morning and early evening.The measured air temperature range is higher than the modelled air temperature except for time interval six in both domains.The IoA is highest for time intervals two, three and five showing a good representation of the warming process in the morning and the cooling process in the evening despite a high RMSE and differences in the absolute values.Differences in standard deviation are highest at daytime (time intervals three and four) indicating higher spatial differences in air temperatures than calculated by the model.One reason could be the placement of Netatmo stations.Their height above ground differs for every station.Furthermore, their placement in the vicinity of buildings affects their orientation along the facades.Consequently, the influence of shading varies at daytime depending on the exposure to direct sunlight.All this causes a higher spatial variability than the 2 m air temperature from the model results.Additionally, the model results surrounding one station are averaged and a part of the variability might be removed.Some simplifications made in the preparation of the input data as well as the simple clearsky radiation model could cause this temporal pattern of differences between the model and the observations.A sensitivity analysis by Belda et al. [22] revealed high sensitivity of the air temperature to the presence or absence of trees.More trees reduce daytime air temperature while fewer trees reduce night-time air temperatures.Geletič et al. [18] showed air temperature reductions during the day in the vicinity of trees and at neighbourhood scale as an adaptation measure.The S1 Fig reveals that some Netatmo stations are in the vicinity of trees and could experience shading.The simplifications in the generation of LAD profiles made here could underestimate the shading caused by trees and therefore result in a higher daytime maximum air temperature as seen in the child domain and a lower night-time air temperature in both domains.An improved tree representation has the potential to reduce these differences.Tree representation could be improved by implementing a more sophisticated method to generate LAD profiles closer to reality following the method of Heldens et al. [68] or by exploring different remote sensing approaches described by Fassnacht et al [69].The underestimation of night-time air temperatures in both domains could be attributed to the uncertainty of the thermal properties of buildings and pavements.Changing the building properties to model the effect of e.g.retrofitting results in significant changes of the ambient air temperature [23].As shown by Belda et al. [22], the albedo, emissivity, thermal conductivity of walls and volumetric heat capacity have the highest sensitivity to changes.Sensitivity towards albedo is most important during the day due to its relation to the radiation balance while sensitivity towards emissivity and heat capacity is especially relevant at night for the energy balance.The current approach did not differentiate between commercial and residential buildings and building age.Furthermore, the used model version uses preliminary values for the thermal properties of the pavement types [46].Improving information on the buildings could be achieved by relating the building position to the land use class as an approximation to the usage of the building and deriving the building age from the cadastral data of ALKIS for German cities as in Heldens et al. [68].Material properties of the pavement types could be provided to the model via user-defined pavement types and albedo parameters in the static driver.
The clear-sky radiation model only considers radiation interactions at the surface [20].Data on the cloud cover for the study period is provided by a professional weather station of the DWD in the neighbouring city of Essen [70].Cloud cover is given in eighths (S4 Fig) .On the 8 th of August 2020, no cloud cover was observed until 22:00 h.A dense cloud cover developed until 01:00 h and solidified for the remainder of the night.The unresolved cloud cover in the model can be a reason for the underestimation of night-time air temperatures as the cloud cover increases longwave downward radiation and therefore reduces cooling.Furthermore, heating and cooling of the air caused by the divergence of radiative fluxes is missing in this approach [19].An alternative to the clear-sky radiation model is the RRTMG model which is an external library to the PALM model.RRTMG provides information on the shortwave and longwave radiative heating rates for 1-D vertical columns [20].This could improve the representation of radiative cooling in the night by the model [19].Another alternative is including an external radiation scheme where downwelling shortwave and longwave radiation is provided to the model.Following this method, the effect of clouds can be considered in the simulations [19].
Aside from the factors named above, the specific properties of Netatmo sensors and their placement as well as model initialisation can cause temporal differences.The underestimation of night-time air temperatures by the model and the good alignment in the morning hours when compared to the crowdsourced data contrast the comparison of the model results to data from the reference station.As the potential temperature in the dynamic driver is horizontally homogenous at the start of the simulation, small scale differences in air temperature are neglected.During this model spin-up period, the crowdsourced data fit the modelled air temperature better than the readings from a single reference station.The better alignment in the morning might be caused by the generally higher temperature measurements of Netatmo stations compared to reference stations [29,31].The higher temperature readings could also explain the higher differences between the crowdsourced data and the model at night compared to the reference station.Alternatively, the initial temperatures derived from the COS-MO-D2 model might be too high.Determining the exact cause requires further investigations with more simulations.
Comparing the modelled results with the observed data for every station individually in the parent domain (Fig 8) reveals potential sources of disagreement.The warming period in the morning has a lower density of observed data points.As the Netatmo sensors have a time lag for adjusting to rapidly changing temperatures, values from that period were likely filtered during the QC procedure due to the sensor lag.This is consistent with the study by Fenner et al. [29].Air temperature at midday and afternoon is higher in the observed data of several stations.The form of the curve of some of these stations suggests that some radiation errors remain after the QC, e.g., stations 29023, 29261, 29270 and 160849.Stations with radiation errors either must be removed manually or stricter criteria can be set in QC level M2 and/or M5.Alternatively, the QC procedure must be developed further to improve the automated removal of radiation errors.Several stations without radiation error reveal an overestimation of air temperatures by the model at midday, such as stations 28994, 29031, 29440 and 154093.To the contrary, observed and modelled data align very well for other stations e.g., 29005, 29011, 29263, 29427 and 29586.Night-time air temperatures are underestimated at 39 of 59 stations in the parent domain.Reducing the number of stations to the child domain reveals a similar picture and is therefore not shown here.While for some stations the modelled and observed data align well, for other stations differences between the model and the observation are higher.Part of the disagreement could be attributed to missing data from the observations or radiation errors.
Independently of the QC procedure, the spatial representativeness of Netatmo data is relevant in the analysis.Netatmo stations are influenced by local-and micro-scale phenomena simultaneously.Their placement within urban areas results in the air temperature readings being influenced by the CLUHI effect [30].In the present analysis, this is advantageous as the model is supposed to represent the CLUHI effect and the Netatmo data can reflect on the accuracy of the representation of the CLUHI in the model.As Netatmo stations are usually placed close to walls [31], their air temperature readings are influenced additionally by heat released from walls, resulting in higher air temperature measurements than a reference station at a further distance to buildings [29,31].This could suggest an underestimation of the air temperature by the model, as in the present study at night-time, even though the deviation is possibly due to microscale effects on the individual Netatmo station.Therefore, one individual Netatmo station should not be used to evaluate the model results at a specific location.Simultaneously, studies using Netatmo data [28][29][30][31] have shown that spatial means and the sum of all stations give a reliable picture of the thermal environment.Therefore, several stations or all stations can be used for model evaluation.Due to their placement in urban areas, Netatmo stations usually do not cover natural areas like parks [29].This is also the case in the present study where the analysis of the model performance is limited to built-up areas.
Further advantages are the usage of the same type of sensor for all stations [30] excluding differences based on sensor models and no observed sensor drift as shown by Meier et al. [30] and Fenner et al. [28].
The differences between the modelled and observed air temperature are mapped for representative points in time to evaluate the differences based on the stations' locations.Fig 9 shows the air temperature differences for the study area for four points in time.Throughout the day there is no clear pattern of under-or overestimation of a specific area.Air temperature differences between the model and the observation do not vary based on the density of the surrounding urban structure as stations in the city centre and in the suburbs both show higher and lower temperature readings than calculated by the model.In the second half of the second night, the model underestimates air temperatures over the whole domain.The differences between the model and the observation show a clearer pattern on the temporal scale while no pattern can be detected on a spatial scale.A further analysis of spatial differences by different types of urbanisation was not possible in this case study.Combining the Netatmo stations within the parent domain with the LCZ map mentioned above yields six different LCZs.Three LCZs are only represented with two or three stations and the other three LCZs have a very unequal distribution of stations rendering a statistical analysis of little value.This analysis requires a larger study area with a better representation of different LCZs and should be addressed in future studies.Aside from the above considerations on the model and the CWS, some general remarks on the computational costs and transferability of the method will be discussed.The mesoscale simulation used 126 CPU cores and the nested microscale simulations used 112 CPU cores.The runtimes were approx.27 hours for the mesoscale simulation and 11 days and 4 hours for each microscale simulation.The mesoscale simulation could be run on a smaller machine with less CPU as the runtime in the present set-up is relatively short.Reducing the available CPU cores for the microscale simulation would result in an increased runtime.As an alternative, the simulation of the parent and child domains could be split where the results of the parent domain can be used as the dynamic input for the child domain as was done with the mesoscale simulation.Nevertheless, this would reduce the transferability of the results and it could add an adjustment zone for the turbulence development in the child domain.An additional limitation to the transferability is the use of the COSMO-D2 data as dynamic input for the mesoscale simulation.As the COSMO-D2 or the newer ICON-D2 archive data are not freely available, a different data source such as the WRF model could be used when no access to the COSMO data is available.Results of a different mesoscale model will differ from the COSMO results leading to a changed dynamic input and therefore different results than in this setup.The input data for the static driver is freely available, making the description of the model domains transferable.The Netatmo data used for evaluation is also freely available through the Netatmo APIs [30].Processing of the model results, evaluation of the results with crowdsourced data and visualisation were carried out with open-source software and programming languages.

Conclusions
The model results represent the air temperature differences between built-up and natural areas.Air temperature differences are highest in the early evening and throughout the night and lowest in the hours after sunrise.Comparing the model results to measurement data from a professional weather station and crowdsourced air temperature reveals a good model performance.Evaluation metrics such as the Pearson correlation coefficient and the R 2 are close to their ideal values indicating a good agreement between model and observation.However, the MSE and RMSE give higher weight to outliers and show a certain disagreement.Differences in model accuracy on a spatial basis could not be detected.On a temporal basis, the evaluation metrics suggest a slightly worse performance for the second night.
The model results and the evaluation enable answering the research questions in the following.
In the present study, does the PALM model display intraurban air temperature differences according to well-known principles of the urban climate?
The PALM model displays intraurban air temperature variations in this case study as presented in the mapped results in Figs 3 and 4. Built-up areas are characterised by higher air temperatures compared to open or natural spaces.The air temperature differences between the urban and natural areas are higher at night following the theory behind the CLUHI [2].
Based on the evaluation with CWS data, can a temporal and/or spatial pattern in model accuracy be detected?
Model accuracy varies throughout the day.Compared to the crowdsourced air temperature data, the modelled air temperature has a lower range at nearly all times.The warming period in the morning and the cooling period in the afternoon and evening are characterised by an overestimation by the model.The overestimation in the morning is likely due to a known sensor lag of the Netatmo stations [30].The IoA for both warming and cooling periods is high indicating a good representation of the pattern of the cooling and warming process.The model underestimates night-time air temperatures in the second night.The cause of the underestimation could be caused by the clear sky radiation model where clouds are neglected.Data from a nearby climate station of the DWD revealed a dense cloud cover in the second half of the second night which was not resolved in the model.While a temporal pattern in model accuracy was detected, this study did not find a temporal pattern in model accuracy.
How suitable are citizen weather station data for model evaluation?Quality controlled crowdsourced air temperature data showed significant potential for model evaluation.Crowdsourced air temperature data have a high spatial resolution and are therefore capable of representing the thermal conditions in different urban environments [29,30].The crowdsourced data used is freely available and require no additional effort in terms of measurement campaigns.The QC procedure filters most of the errors inherent in crowdsourced data.Nevertheless, radiation errors remain which must be considered during model evaluation.The differentiation between micro-and local-scale influences on the air temperature readings is complicated.Therefore, one individual station is not sufficient for model evaluation, but the information derived from several stations and/or all stations is suitable for model evaluation.
The model evaluation and the answered research questions allow an outlook for future applications of the model and crowdsourced air temperature data for model evaluation.Model results could be improved by refining the input data in terms of tree representation, building information, surface materials and soil type.Instead of applying the clear sky radiation model, the RRTMG model can be used to achieve more realistic radiation budgets.Finally, on the modelling side, the mesoscale model WRF could be used as dynamic input for the mesoscale run.WRF is an open-source model, and the newest version can incorporate the LCZ scheme as the land cover scheme to include the effect of urban areas in mesoscale models [72,73].On the measurement side, improved QC of the crowdsourced data further filtering the radiation errors can improve data quality.Higher data quality reduces uncertainties in model evaluation caused by the crowdsourced data.Adding CWS manually above natural surfaces would enable the evaluation of model performance between natural and paved surfaces.Using Netatmo CWS instead of professional measurement equipment reduces costs and ensures comparability of the measurements as the sensors remain the same.

Fig 1 .
Fig 1. Location of the study area in Germany (top left) and in relation to the surrounding municipalities (bottom left); the right map shows building positions within the study area and land use classes provided to the PALM model; land use classes are grouped into pavement types, vegetation types and water.German federal states borders are provided by the Federal Agency for Cartography and Geodesy [34], municipality borders are provided by Geobasis NRW [35].The base map (top left) is added in QGIS as XYZ Tiles [36] and the base map (bottom left) is provided by the Web Map Service (WMS) of the Federal Agency for Cartography and Geodesy [37].
8 x 8.5 km with an isotropic grid spacing of 10 m.Vertical grid stretching was applied after 200 m resulting in a height of approx.1300 m.The child domain covered an area of 1.8 x 1.5 km with an isotropic grid spacing of 2.5 m and 60 vertical grid levels.All domain layouts are represented in Fig 2.

Fig 2 .
Fig 2. Domain layouts of the mesoscale simulation (upper right) and the microscale simulation with parent and child domains as well as the available Netatmo stations in the study area.The base map (main map) is provided by the Web Map Service (WMS) of Geobasis NRW [49] and the base map (top right) is provided by the Web Map Service (WMS) of the Federal Agency for Cartography and Geodesy [37].https://doi.org/10.1371/journal.pclm.0000197.g002

Fig 3
Fig 3 visualises the spatial and temporal development of the 2 m air temperature for four representative points in time.An animation of the air temperature with hourly timesteps for the whole simulation period is provided as S2 Fig.Early morning hours on the 8 th of August show a stronger cooling of open and vegetated areas compared to built-up areas.After sunrise (05:00 h) air temperatures rise with a slight negative gradient from east to west.The morning hours are characterised by small air temperature differences, except for shaded areas and water surfaces which are cooler than surroundings.Air temperatures continue to rise until late afternoon (16:00 h).Air temperatures in built-up areas reach 34 to 36˚C.Simultaneously, air temperature differences between densely built-up and open areas increase with built-up areas showing higher air temperatures than open spaces.Air temperatures start to decline in the early evening (18:00 h) with a pronounced cooling in open and vegetated areas.After sunset (19:00 h) highest cooling rates are observed for open spaces while built-up areas cool down at a slower rate.Larger open and vegetated spaces show a stronger cooling than natural areas within built-up areas.The cooling process continues through the night.Air temperatures in built-up areas remain above 20˚C.The pattern of 2 m air temperature distribution reflects the surface properties such as the degree of imperviousness and presence of buildings.Air temperature differences were calculated using the urban-rural air temperature difference.Rural areas were defined using the LCZ classification scheme by Stewart & Oke[66].LCZ D (low plants) was defined as a rural area as standardised climate stations are placed on open fields.The required LCZ map was downloaded from the LCZ Generator[67].The 2 m air temperature within LCZ D was averaged for every timestep and served as the rural reference air temperature.Air temperature differences for the domain and all timesteps were calculated.The resulting air temperature differences are visualised in Fig 4.An animation of the air temperature differences with hourly timesteps for the whole simulation period is provided as S3 Fig. Air temperature differences during the first modelled night clearly show the existence of an CLUHI.Air temperature differences decline in the first hours after sunrise starting at 05:00 h.Shaded areas are visible through their strongly negative temperature differences of up to -3 K or more.Exposed open areas in the southeast warm up quicker.Starting around 11:00 h built-up areas show higher temperatures than the rural reference.The built-up areas and exposed open spaces such as south-facing slopes show an increasing positive air temperature difference compared to the rural reference in the course of the midday and afternoon hours.North-facing slopes demonstrate negative air temperature differences in the same timeframe.The positive air temperature differences of built-up areas and sealed surfaces peak in the late afternoon and the first half of the second night.Open natural spaces cool down quicker resulting in a negative temperature difference.The model results show a strong CLUHI at night with built-up areas demonstrating air temperatures >4 K higher than the rural reference.Air temperature differences slightly decrease again in the second half of the night.

Fig 7 .
Fig 7. Boxplot time series of the PALM 2 m air temperature [˚C] and Netatmo air temperature [˚C] for the parent (left) and child (right) domain of the study area for the timeframe 8 th of August 02:00 h to 9 th of August 05:00 h.https://doi.org/10.1371/journal.pclm.0000197.g007

Fig 8 .
Fig 8. Timeseries of the PALM 2 m air temperature [˚C] and Netatmo air temperature [˚C] for each station individually for the parent domain for the timeframe 8 th of August 02:00 h to 9 th of August 05:00 h.https://doi.org/10.1371/journal.pclm.0000197.g008

Fig 9 .
Fig 9. Spatiotemporal pattern of the air temperature differences [K] between the PALM and Netatmo data for the study area for the timesteps 08 th of August 05:00 h, 08 th of August 14:00 h, 08 th of August 20:00 h and 09 th of August 04:00 h.Base map is provided by the Web Map Service (WMS) of Geobasis NRW [71].https://doi.org/10.1371/journal.pclm.0000197.g009