Estimation of ground-level PM2.5 concentration using MODIS AOD and corrected regression model over Beijing, China

To establish a new model for estimating ground-level PM2.5 concentration over Beijing, China, the relationship between aerosol optical depth (AOD) and ground-level PM2.5 concentration was derived and analysed firstly. Boundary layer height (BLH) and relative humidity (RH) were shown to be two major factors influencing the relationship between AOD and ground-level PM2.5 concentration. Thus, they are used to correct MODIS AOD to enhance the correlation between MODIS AOD and PM2.5. When using corrected MODIS AOD for modelling, the correlation between MODIS AOD and PM2.5 was improved significantly. Then, normalized difference vegetation index (NDVI), surface temperature (ST) and surface wind speed (SPD) were introduced as auxiliary variables to further improve the performance of the corrected regression model. The seasonal and annual average distribution of PM2.5 concentration over Beijing from 2014 to 2016 were mapped for intuitively analysing. Those can be regarded as important references for monitoring the ground-level PM2.5 concentration distribution. It is obviously that the PM2.5 concentration distribution over Beijing revealed the trend of “southeast high and northwest low”, and showed a significant decrease in annual average PM2.5 concentration from 2014 to 2016.


Introduction
With vibrant development of economy and rapid improvement of industry, environmental issues have gradually become a focus of attention. In recent years, the problem of haze has become outstanding, and the main reason for this is the absorption and scattering of PM 2.5 (particulate matter with aerodynamic diameters less than or equal to 2.5 μm) in visible light intensity. In addition to affecting the human daily life and economical production, PM 2.5 also contributes to chronic respiratory, pulmonary, cardiovascular, and other human diseases [1,2]. The reduction of PM 2.5 concentration has been a matter of great urgency all over the world.
The real-time monitoring of PM 2.5 concentration plays a vital role in PM 2.5 concentration reduction and environmental protection. Monitoring methods of PM 2 Lee et al. proposed a mixed effect model, which fully exploited remote sensing data to establish the relationships between daily PM 2.5 concentration and AOD in the New Zealand region and suggested the daily relations between AOD and PM 2.5 concentration through the statistical method [13]. Xie et al. developed a mixed effect model to invert daily PM 2.5 concentration in Beijing, which taken the diurnal variation relationship of AOD and PM 2.5 concentration into account [14]. This research concluded that high-resolution daily PM 2.5 concentration maps can contribute to assessment of short-term and long-term PM 2.5 exposure. Zheng et al. put forward a linear mixed effect model that integrated AOD measurement, meteorological parameters, and NO 2 column density for the estimation of PM 2.5 concentration in Beijing, Tianjin and Hebei, the Yangtze River Delta, and the Pearl River Delta [15]. As shown in the experimental results, the combination of different factors contributed to the accuracy of model, which provided the possibility of assessing PM 2.5 concentration in areas with different pollution levels.
You et al. adopted the GWR model to estimate the concentration of PM 2.5 concentration in China and introduced meteorological features as auxiliary variables into the model to improve the accuracy [16]. In the experiments, cross validated R 2 reached 0.79. The experimental results demonstrated that the proposed approach could evaluate a large area of PM 2.5 concentration distribution. Guo et al. proposed a geographic and temporal weighted regression model (GTWR), which took both the temporal and spatial variability into consideration [17]. The AOD measurements, meteorological factors, and land use variables were also employed for fitting different seasons. The experiments indicated that GTWR was superior to MLR and GWR but that it was influenced by the sampling frequency of PM 2.5 data [17]. Continuous daily PM 2.5 observations could improve the performance of the model.
The aforementioned methods, such as MLR, mixed effects mode, GWR, GTWR, established the empirical linear relationship between PM 2.5 concentration and satellite based AOD, and successfully mapped the PM 2.5 concentration distribution of the study area. However, adequate analysis of how the relevant factors, such as boundary layer height (BLH) and relative humidity (RH), effect the PM 2.5 concentration from a theoretical point of view is also essential for estimation of PM 2.5 concentration distribution.
In this work, we first analyse the relationship between ground-level PM 2.5 concentration and AOD, BLH, RH, and get the conclusion that BLH and RH can effectively correct the relationship between AOD and PM 2.5 concentration. Thereby, three kinds of more theoretical AOD-PM 2.5 regression models are established for PM 2.5 concentration estimation, which are more interpretable. Then, NDVI, ST and SPD are introduced as auxiliary variables to further improve the performance of the corrected regression model. The modified AOD-PM 2.5 concentration regression models were established separately according to the season and used to estimate the seasonal average PM 2.5 concentration over Beijing, China during 2014 to 2016. The seasonal and annual PM 2.5 concentration distribution over Beijing were mapped using the established models for better analysing the geographic distribution and annual changes of PM 2.5 concentration.

Materials and methods
Beijing, located at 39.4˚-41.6˚N, 115.7˚-117.4˚E, is an important economic and political center of China. With the rapid development of the economy, Beijing has become one of the cities heavily affected by haze, especially in spring and winter. PM 2.5 has a serious impact on urban traffic and people's life. In this section, MODIS AOD and NDVI products, station-measured PM 2.5 data, four kinds of meteorological data are described in this section (seen in Table 1).

PM 2.5 measurements
Hourly station-measured PM 2.5 concentration data observed at 15 uniformly-distributed air quality monitoring station in Beijing, which were collected from the Beijing Municipal Environmental Monitoring Center (http://zx.bjmemc.com.cn/) [18,19], were used for the experiments. The location of all the 15 air quality monitoring stations in Beijing is shown in Fig 1. and the geographical coordinates are listed in S1 Table. We averaged the measured PM 2.5 concentration between 13:00 pm and 14:00 pm local time to match the overpass time of the MODIS Aqua satellite (about 13:30 pm local time). The study period was from January 1, 2014, to December 31, 2016, spanning a total of 1096 days.

MODIS AOD and NDVI products
MODIS onboard the NASA Aqua satellite has been in operation since 2002. It provides retrieval products of aerosol and cloud properties with nearly daily global coverage. The MODIS data used in this study were the MYD04 Level 2 aerosol products with a resolution of 10 km and MYD13 monthly NDVI products with a resolution of 1 km [9]. The overpass time of the MODIS Aqua satellite is about 13:30 pm local time. The MODIS DT aerosol algorithms produce two separate products with different resolutions, with spatial resolutions of 3km and 10km respectively. However, it performs unsatisfactorily in urban regions. The DB algorithm performs better than the DT algorithm over urban areas, and it can provide better land coverage over both dark and bright surfaces, which is more reasonable for PM 2.5 concentration distribution research in Beijing [20][21][22].

ECMEF ERA interim meteorological data
The ERA-Interim data provided by the European Centre for Medium-Range Weather Forecasts (ECMWF) are real-time updated global atmospheric reanalysis data (https://www. ecmwf.int/en/forecasts/datasets/reanalysis-datasets/era-interim). The meteorological parameters used in this study are BLH and RH at 100m vertical direction, surface temperature and wind speed at 14:00 pm local time [23]. The spatial resolution of the ERA-Interim reanalysis data used in this study was approximately 0.125˚.

AOD-PM 2.5 corrected regression model
Because some parameters used to describe atmospheric physical conditions, such as air pressure, atmospheric temperature, and atmospheric humidity, change much more in the vertical direction than the horizontal direction [24], it is often assumed that the atmosphere has a structure in which the horizontal direction is uniform and the vertical direction is layered. When considering a single homogeneous atmospheric layer containing spherical aerosol particles, the mass concentration at the surface can be represented as in which ρ denotes the density of aerosol particles (g/m 3 ) and n dry (r) denotes the particle size distribution spectrum under dry conditions. Generally, AOD is the integral of the extinction coefficient of aerosol particles in the vertical direction of the atmosphere with aerosol scale height H [25] AODðlÞ ¼ where σ ext,amb denotes the aerosol extinction coefficient under ambient conditions. It can be expressed as s ext;amb ðlÞ ¼ R 1 0 pQ ext;amb ðr; lÞn amb ðrÞr 2 dr, where Q ext,amb denotes the particulate matter extinction efficiency under ambient conditions and n amb (r) denotes the particle size

PLOS ONE
Estimating of ground-level PM 2.5 concentrations over Beijing distribution under ambient conditions. Thus, at a given wavelength λ, the aerosol optical thickness from the ground to the height H can be expressed as [26] AOD ¼ p As we all know, different aerosol particles have different hygroscopic properties, e.g., watersoluble particles and organic aerosol particles have distinct hygroscopic properties. Thus, aerosol particles with the same mass concentration but different composition show different extinction characteristics under different humidity conditions. That is to say, the correlation between the extinction coefficient and the mass concentration changes with humidity. Therefore, to reduce the uncertainty introduced by the hygroscopic growth of the particles, it is necessary to convert the AOD obtained by satellite remote sensing into the mass concentration of dry particles.
A hygroscopic growth factor f(RH), which represents the ratio between these (size-distribution integrated) extinction efficiencies, is required to convert particulate matter extinction efficiency under ambient conditions to under dry conditions. Then, Eq (3) can be converted to the following [26,27] AOD ¼ pf ðRHÞ Furthermore, to rewrite the above formula, the size-distribution integrated extinction efficiency hQ ext i and effective radius r eff can be introduced: Hence, the relationship between AOD and near-surface PM mass concentration is finally derived as follows [25,28] As the aerosol scale height H is an assumed equivalent height under ideal conditions, it is difficult to measure directly when the near-surface aerosol extinction coefficient is unknown. In the planetary boundary layer, the obvious turbulent motion causes strong atmospheric mixing, resulting in a uniform vertical distribution of aerosols. In practice, the BLH has a similar physical meaning to the aerosol scale height H and is often used instead of H.
According to the above relationship between AOD and PM, it can be inferred that if the AOD is corrected by the factors BLH and f(RH), the corrected AOD should be expected to obtain a better correlation with PM.
Here, a time-space matched MODIS AOD-PM 2.5 data pair and a corresponding BLH and f(RH) were used to construct different linear regression models in different seasons [26].
There were 4 types of models: where a and b are model regression coefficients. Eq (8) represents a simple, original regression model, Eq (9) represents a model corrected by BLH, Eq (10) represents a model corrected by RH, and Eq (11) represents a model jointly corrected by BLH and RH. In this study, NDVI, ST and SPD are introduced as auxiliary variables to further improve the performance of the corrected regression model for estimating ground-level PM 2.5 concentration. 4 types of modified corrected regression models can be described as VIII : Those modified corrected regression models are fitted separately according to the season and used to estimate the seasonal average PM 2.5 concentration over Beijing, China during 2014 to 2016. For model evaluation, we assessed the model performance by calculating the coefficient of determination R 2 and root mean squared error (RMSE) between estimated and measured PM 2.5 concentration in different seasons: RMSE ¼ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi 1 N where f i and y i are the predicted PM 2.5 and the measured PM 2.5 concentration of the ith sample, respectively, and N is the total number of AOD-PM 2.5 pair samples.

Results and discussions
In our study, the correlation between estimated and measured PM 2.5 concentration using aforementioned models is evaluated. R 2 between estimated and measured PM 2.5 concentration obtained by different models according to the season during 2014-2016 is tabulated in Table 2. f(RH) is defined empirically as f(RH) = 1/(1 − RH/100).
In Table 2, the R 2 of model I can be interpreted as the correlation between measured PM 2.5 concentration and MODIS DB AOD. Generally, in spring and autumn, the estimated PM 2.5 concentration had closer correlation with the measured PM 2.5 concentration. This may be a result of more uniform near-surface aerosol composition, as well as lower BLH, which limits aerosol diffusion in Beijing during spring and autumn.
According to Eq (7), BLH and f(RH) corrections could improve the AOD-PM 2.5 correlation theoretically. The BLH corrected model was established on the basis of this. In Table 2, The R 2 of model II can be interpreted as the correlation between measured PM 2.5 concentration and BLH corrected AOD. Model correlations were affected differently in different seasons. The R 2 in spring had a large increase after BLH correction, from 0.72 to 0.75 in 2014, from 0.69 to 0.71 in 2015, from 0.76 to 0.82. The more obvious advantages of BLH correction appears in  autumn. The R 2 has about 0.08-0.09 increase in different years, and reaches 0.81 in 2014 and 2015, 0.79 in 2016, demonstrating the effectiveness of BLH correction in autumn. For winter, the BLH corrected model achieved a slight improvement in 2014 and 2015 and a significant improvement in 2016. However, the R 2 of the BLH corrected model had a decrease in summer, which because higher BLH cannot limit aerosol diffusion, as well as the poor quality of MODIS AOD products in summer because of cirrocumulus and rainfall.
The R 2 of model III can be interpreted as the correlation between measured PM 2.5 concentration and RH corrected AOD. With RH correction separately (Model III), a obvious increase of R 2 are appeared in summer, 2016, from 0.62 to 0.73. However, the results change little in other times, and the fluctuation range of R 2 is within 0.02. When considering BLH and f(RH) correction simultaneously (as Model IV), the results are very close to Model II.
When introduced NDVI, ST and SPD as auxiliary variables, the R 2 of Model V-VIII in all seasons have about 2% increase correspondingly in most situations. Fig 3 show the scatter plots and regression results between estimated and measured PM 2.5 concentration in the different seasons with the best results among all kinds of modified corrected models. In total, the correlation between estimated and measured PM 2.5 concentration is more higher in spring and autumn. The In our study, even though the complicated PM 2.5 concentration distribution in Beijing, the modified corrected AOD-PM 2.5 models achieved higher model fitting R 2 values than those in previous studies, e.g., an observation-based algorithm that considers the effect of the main aerosol characteristics applied to the Beijing-Tianjin-Hebei region with R 2 of 0.70 [3], the GWR model applied to the whole China mainland with an overall R 2 of 0.64 [5], an improved model applied to the Beijing-Tianjin-Hebei region with an R 2 of 0.77 [15], satellite-driven PM 2.5 models with VIIRS nighttime data applied to the Beijing-Tianjin-Hebei region with R 2 of 0.75 [19], and NAQPMS data incorporated to MODIS data applied to the Beijing-Tianjin-Hebei region from January to December 2017 with seasonal R 2 values of 0.75, 0.62, 0.80, and 0.78 in the spring, summer, autumn, and winter, respectively [22]. Table 3 is comparison between estimated and measured seasonal average PM 2.5 concentration of 15 monitoring stations from 2014 to 2016. Minor difference reflects the good performance of the established models in each season. Table 4. is the comparison of regional average PM 2.5 concentration from 2014 to 2016 over Beijing. In term of interannual seasonal change, PM 2.5 concentration in summer are more lower the other seasons because of a great deal of rainfall. PM 2.5 concentration in spring and winter is general higher than other seasons because of low rainfall and home-heating. In term of annual change of PM 2.5 concentration during 2014 to 2016, it witnesses a downward trend in the same season. The annual average PM 2.5 concentration decrease from 75.1 μg/m 3 to 62.0 μg/m 3 , about a 17% decrease, which confirms the effectiveness of a series of stringent clean air actions implemented by the Chinese government from 2013 to 2017 [29]. In 2016 winter, the PM 2.5 concentration decreased to 59.4 μg/ m 3 , which is inspiring.
The spatial distribution of the seasonal and annual average PM 2.5 concentration are mapped to investigate its spatial distribution characteristics over Beijing during 2014 to 2017, as shown in Fig 4. The measured average PM 2.5 concentration is represented as a circle using the same color scheme. In Beijing, PM 2.5 concentration spatial distribution reveals the trend of "southeast high and northwest low". In spring, autumn and winter of 2014, autumn and winter of 2015, autumn of 2016, this trend is more obvious. PM 2.5 concentration of the southeast of Beijing during these seasons are higher than 75 μg/m 3 . This conforms to the urbanization level and population distribution in Beijing. Human activities directly affect the PM 2.5 concentration. Although the satellite-derived PM 2.5 monitoring method can provide larger spatial coverage than ground monitoring stations, the satellite has lower temporal coverage mainly due to bad observation conditions, such as clouds and fogginess. In our study, there were 477 model-valid days during 2014 to 2016. In addition, due to cloudy or foggy weather in Beijing giving rise to inefficient sampling frequency of available satellite observations, the AOD retrieval algorithm may be not valid. Thus other monitoring approaches with larger spatial coverage and smaller weather limitations should be developed.
Overall, the seasonal estimated PM 2.5 concentration showed good consistency with those of ground measurements. We attempted to use relative humidity to correct MODIS AOD, but this brought little effect. Additional work should be done in this aspect.

Conclusion
This paper focused on analysing the influence of BLH and f(RH) on the linear relationship between AOD and PM 2.5 concentration and attempted to evaluate the improvement provided by modified regression models. As aerosol scale height, BLH can effectively improve the AOD-PM 2.5 correlation by transforming AOD into near-surface aerosol extinction coefficient. When introduced NDVI, ST and SPD, the R 2 in all seasons have about 2% increase. The spatial and temporal PM 2.5 concentration distribution characteristics over Beijing during 2014 to 2016 were well revealed by seasonal and annual PM 2.5 concentration distribution