An Approach to Improve the Performance of PM Forecasters


Particulate matter (PM) concentration has been one of the most relevant environmental concerns in recent decades because of its harmful effects on living beings and the earth’s atmosphere. High PM concentrations affect human health in several ways, leading to short- and long-term diseases. Forecasting systems have therefore been developed to support the decisions of organizations and governments in alerting the population. Forecasting systems based on Artificial Neural Networks (ANNs) have been highlighted in the literature due to their performance. In general, three ANN-based approaches have been found for this task: ANNs trained via learning algorithms, hybrid systems that combine search algorithms with ANNs, and hybrid systems that combine ANNs with other forecasters. Independent of the approach, it is common to suppose that the residuals (error series), obtained from the difference between the actual series and its forecasting, have a white noise behavior. However, this assumption may be violated because of misspecification of the forecasting model, complexity of the time series, or temporal patterns of the phenomenon not captured by the forecaster. This paper proposes an approach to improve the performance of PM forecasters through residual modeling. The approach analyzes the remaining residuals recursively in search of temporal patterns. At each iteration, if there are temporal patterns in the residuals, the approach generates the forecasting of the residuals in order to improve the forecasting of the PM time series. The proposed approach can be used either with only one forecaster or by combining two or more forecasting models. In this study, the approach is used to improve the performance of a hybrid system (HS) composed of a genetic algorithm (GA) and an ANN, with the residual modeling performed by two methods, namely, an ANN and the hybrid system itself. 
Experiments were performed for PM2.5 and PM10 concentration series in Kallio and Vallila stations in Helsinki and evaluated from six metrics. Experimental results show that the proposed approach improves the accuracy of the forecasting method in terms of fitness function for all cases, when compared with the method without correction. The correction via HS obtained a superior performance, reaching the best results in terms of fitness function and in five out of six metrics. These results also were found when a sensitivity analysis was performed varying the proportions of the sets of training, validation and test. The proposed approach reached consistent results when compared with the forecasting method without correction, showing that it can be an interesting tool for correction of PM forecasters.


Air pollution has been the focus of public concern due to its health impact on the worldwide population, mainly in the big urban centers [1, 2]. The contamination of the earth’s atmosphere by biological molecules, particulates and other harmful substances causes diseases and death in humans, who are also harmed by the damage that other living organisms, such as food crops, natural vegetation and herds of animals, suffer [2].

Particulate matter (PM) concentration has been a major concern among the air pollutants according to epidemiological studies [3–12], and several diseases have been associated with this substance [1]. The Global Monitoring Report [1] points out PM as the major urban air pollutant affecting human health. The level of damage usually depends upon the duration of exposure as well as the kind and concentration of particles in the air [2, 4, 7, 9]. In general, the short-term effects [1, 13, 14], such as irritation in the eyes, nose and throat, headaches, nausea and allergic reactions, are less serious [3]. However, in some cases, short-term exposure to air pollution can cause upper respiratory infections such as bronchitis and pneumonia and aggravate the medical conditions of individuals with asthma and emphysema [3]. The long-term effects [1, 8] may include chronic respiratory disease [3], lung cancer [7], cardiovascular diseases [5], such as ischemia-reperfusion injury and atherosclerosis, and even damage to the brain [15, 16], liver [16, 17], or kidneys [17, 18]. Continuous exposure to air pollution [8, 9] can severely affect the health and growth of children and may aggravate medical conditions in the elderly.

The monitoring of PM concentration is a relevant issue, as it allows the governments to create public policies to prevent and warn the population regarding high levels of PM. In this scenario, Artificial Neural Networks (ANN) have been widely used for the forecasting of PM concentration [19]. A non-exhaustive search in the literature points out three general ANN-based approaches for forecasting of PM concentration: the use of an ANN itself, hybrid systems that use search algorithms for the choice of ANN parameters, and hybrid systems that combine an ANN with another forecaster. Several studies belonging to each one of the aforementioned approaches are addressed in the following.

Four different ANN models, namely the Recurrent Network Model (RNM), the Change Point Detection Model with RNM, the Sequential Network Construction Model and the Self Organizing Feature Model, were considered by Sharma et al. [20] for forecasting the concentration data of seven pollutants, among them PM2.5 and PM10, in the California area. A Multilayer Perceptron (MLP) model, a Radial Basis Function (RBF) and a Square Multilayer Perceptron (SMLP) were addressed by Ordieres et al. [21] for forecasting PM2.5 concentration in the cities of El Paso (Texas) and Ciudad Juárez (Chihuahua). Other studies also used an MLP model: Kukkonen et al. [22] used an MLP model with homoscedastic and heteroscedastic Gaussian noise (ANN-HeG) to forecast the PM10 concentration in Helsinki, Caselli et al. [23] compared an MLP trained via backpropagation, an RBF model and a multivariate regression model to forecast daily PM10 in Bari, Italy, and Gennaro et al. [24] used the MLP model developed in [23] with a specified set of input data to forecast PM10 concentration in two sites in the Western Mediterranean. Forecasting the maximum average concentration of PM10 per day in the city of Santiago, Chile, was carried out by Perez and Reyes [25] with the use of ANN. The majority of studies concerning PM concentration forecasting are concerned with one-step-ahead forecasting. Multi-step-ahead forecasting is also found in the literature. For example, Kurt and Oktay [26] used an MLP model to forecast sulfur dioxide (SO2), carbon monoxide (CO) and PM10 concentration levels for 3 days ahead for the Besiktas district (Istanbul, Turkey) and Caselli et al. [23] applied an MLP model to forecast PM10 concentrations for 1, 2 and 3 days ahead.

Intelligent hybrid systems have also been proposed through combinations of ANNs with other techniques. These techniques are generally employed for the selection of input variables and the best ANN parameters, such as number of neurons in hidden and input layers, activation function, and training algorithm, among others. Examples of techniques that have been combined with ANN include: Principal Component Analysis (PCA) [27], which was used with MLP to forecast PM10 concentration in Thessaloniki and for selection of input variables, followed by forecasting of PM2.5 and PM10 in Thessaloniki and Helsinki via MLP and linear regression (LR) [28]; Genetic Algorithm (GA), which was applied to select the inputs for ANN to forecast PM10 emission in 26 European countries [29], and a Multi-Objective Genetic Algorithm (MOGA), which was applied to reduce the number of potential meteorological input variables, with the integration of an MLP model with a numerical weather prediction model HIRLAM (High Resolution Limited Area Model) to forecast sequential hourly time series concentrations of PM2.5 in Helsinki [30]; Nearest Neighbor method combined with an MLP model in the scenario of PM10 concentration in Santiago [31]; Wavelets combined with an ANN ensemble to forecast the daily average concentration of PM10 in Warsaw, Poland [32]. Hybrid systems were also proposed in other studies: Mishra et al. [33] proposed a Neuro-fuzzy model for forecasting PM2.5 during haze conditions in Delhi, India, and Qin et al. [34] proposed a hybrid model based on Cuckoo search (CS) and ANN training via backpropagation to forecast PM concentration levels in four major cities of China (Beijing, Shanghai, Guangzhou and Lanzhou). Hybrid systems are also employed for multi-step-ahead forecasting. For example, Ul-Saufie et al. [35] combined MLR and ANN with PCA to forecast daily PM10 concentration (one, two and three days ahead) in Negeri Sembilan, Malaysia.

Hybrid systems have been proposed assuming that one forecaster can be insufficient to model a time series [36–38], making the combination of two or more models necessary. In this context, two assumptions are adopted: (i) a time series can contain patterns that are not purely linear or nonlinear [36]; (ii) a highly nonlinear time series cannot be modeled by an ANN alone [37, 38]. In the first assumption, the use of only linear or non-linear techniques may lead to inaccurate results, making it necessary to develop models considering both the linearity and the non-linearity involved in the time series [36]. In the second assumption, the use of one ANN can be insufficient due to problems of misspecification, generating a biased or inconsistent model [37, 38]. The misspecified ANN could be generated either during structure selection or during training, causing overfitting or underfitting problems.

A particular class of hybrid systems, which uses multiforecasters, explores the error series (residuals) in an attempt to improve the forecasting. Given the forecasting of a model, the error series (residuals) is obtained from the difference between actual time series and its forecasting. Usually, it is supposed that the error series is a white noise, i.e., it consists of independent and identically distributed random shocks that are unpredictable [37, 38]. However, due to misspecification of the forecaster, or time series behavior (linear and non-linear components), or due to disturbances present in the stochastic process after the specification of the forecaster, the assumption of the white noise can be violated. Thus, the temporal patterns that still remain in the error series can be captured and used to generate the forecasting of the residuals [37, 38].

In the scenario of air pollutant concentration forecasting, hybrid systems with more than one forecaster and using ANN have been proposed. Al-Alawi et al. [39] proposed a hybrid method using multiple regression combined with both PCA and ANN to forecast the concentration of ozone in Kuwait’s lower atmosphere. Ettouney et al. [40] also used a PCA combined with two ANNs in cascade to forecast ozone concentration in two locations in Kuwait. Westerlund et al. [41] proposed a forecast combination (FC), using two methods: linear regression and ANN to forecast air quality in Bogota. Combinations with Autoregressive Integrated Moving Average (ARIMA) model were also proposed: Sánchez et al. [42] combined an Elman neural network with ARIMA for forecasting of SO2 concentration registered in a control station in the vicinity of a coal-fired power station in northern Spain and Díaz-Robles et al. [43] proposed a hybrid ARIMA and ANN model to forecast PM10 concentration levels in urban areas of Temuco, Chile.

Hybrid models have not been applied only for forecasting parameters regarding air quality. In the literature, one can observe hybrid models in other public health interest scenarios such as: ARIMA combined with generalized regression neural network (GRNN) to forecast the incidence of tuberculosis [44], ARIMA combined with nonlinear auto-regressive neural network (NARNN) to forecast incidence of hand-foot-mouth disease (HFMD) [45] and ARIMA combined with NARNN to forecast the prevalence of schistosomiasis in humans [46].

Herein, we propose a hybrid approach for improving the performance of PM concentration forecasters using residual modeling. The hybrid approach consists of recursive modeling, which at each iteration verifies whether there are temporal patterns in the remaining residuals and, if so, generates the forecasting of the error series. Then, the forecasting of the error series is used in the next stage of the sequence of forecasters in order to improve the forecasting. The proposed approach differs from previous multiforecaster hybrid systems based on error series in that the latter use only one residual series, whereas the proposed hybrid system uses as many residual series as necessary to obtain a residual with white noise behavior. Another highlight of the proposed approach is its generality, that is, the forecasting system is sufficiently versatile to use a set of different forecasters, e.g. an ANN followed by an ARIMA model, or an ARIMA model followed by an ANN, or a sequence of three different forecasters. Simulation results are presented for four time series: PM2.5 and PM10 daily concentration levels in Kallio and Vallila stations in Helsinki, Finland, shown in Fig 1. More specifically, in the present study, error series are used to improve the forecasting of a recent hybrid system composed of a genetic algorithm (GA) and an ANN presented in the literature [19]. For residual modeling, two methods are considered in this study: an MLP neural network and the hybrid system itself, as used in PM concentration series forecasting. The results are evaluated using a set of six well-known metrics and show that the proposed approach is capable of improving the performance of the PM forecaster considered for all cases.

Fig 1. Locations of the Kallio and Vallila air quality stations in Helsinki. The Vallila station is located in an urban traffic (UT) environment and the Kallio station is located in an urban background (UB) environment (adapted from [47]).

The Methodology for the Correction of Pollutant Forecasters

Fig 2 shows the architecture of the proposed approach, which is composed of two modules: (i) Forecasting and (ii) Correction. Given a univariate time series (x), the aim of the first module is to train a forecasting model (M0). Then, the error series e0 is calculated as the difference between the time series and the output of the model, e0 = x − M0(x).

The Correction Module is only executed if the residual series (e0) is not a white noise. In other words, a test (Fig 2—“Stop?”) is performed to verify if there is enough information in the error series to improve the precision of the forecasting.

Formally, a white noise [48] is a sequence of independent and identically distributed random values with zero mean and constant variance. In general, the literature of time series supposes that the residuals generated by the forecasters have a white noise behavior, and the module (i) Forecasting of the proposed approach (Fig 2) shows this classical approach. However, we expect that the error series contains useful information that was not captured by the forecaster M0 (module (i) Forecasting). In this context, there are tests [48] to check the hypothesis that the residuals are independent and identically distributed random variables. Some of these tests are: the portmanteau test, spectral analysis, turning point test and autocorrelation function (ACF) [48]. The ACF test is adopted in this study and consists of the cross-correlation of a series with itself at different times, as a function of the time lag between them. So, ACF measures the correlation between the values of a series at times t and t + k according to Eq (1): $$\rho_k = \mathrm{Corr}(y_t, y_{t+k}) = \frac{\gamma_k}{\gamma_0} \quad (1)$$ where Corr is the correlation, γk = Cov(yt, yt+k), where Cov is the covariance, and γ0 is the sample variance of the time series. ρk lies in the range [−1, 1], where 1 and −1 indicate perfect correlation and perfect anti-correlation, respectively. If the values of ρk, where k = 1, …, n, are in the range [−2s, 2s] (where s is the standard deviation of data sample), then there is no correlation; otherwise, there is correlation in the series data.
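The ACF check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names `sample_acf` and `looks_like_white_noise` are hypothetical, the default of 20 lags is arbitrary, and the band is taken as the conventional approximation ±2/√N for the standard error of a sample autocorrelation, which is one possible reading of the [−2s, 2s] range in the text.

```python
import numpy as np

def sample_acf(series, max_lag):
    """Sample autocorrelation rho_k = gamma_k / gamma_0 for k = 1..max_lag."""
    y = np.asarray(series, dtype=float)
    y = y - y.mean()
    n = len(y)
    gamma0 = np.dot(y, y) / n  # lag-0 autocovariance (sample variance)
    return np.array([np.dot(y[:n - k], y[k:]) / n / gamma0
                     for k in range(1, max_lag + 1)])

def looks_like_white_noise(series, max_lag=20):
    """True when every rho_k, k = 1..max_lag, stays inside the band.

    Assumption: the band is +/- 2 / sqrt(N), the usual large-sample
    approximation for the standard error of a sample autocorrelation.
    """
    rho = sample_acf(series, max_lag)
    bound = 2.0 / np.sqrt(len(series))
    return bool(np.all(np.abs(rho) <= bound))
```

A residual series with a clear periodic component, such as a sine wave, fails this check because its low-lag autocorrelations are close to 1.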

Based on the assumption that it is possible to extract valuable information from the error series (e0) to improve the forecasting of the whole system, the Correction Module trains a model (M1) that aims to forecast the error of the model (M0). Once again, if the residual series (e1), where e1 = e0 − M1(e0), of the model M1 is not a white noise, this procedure is repeated until the stop criterion is reached. So, the output of the Correction Module is given by two sets: models {M0, M1, …, Mn} and error series {e0, e1, …, en−1}.

After training, the models are used to forecast unseen patterns as shown in Fig 3. So, given a time series xq = [x1, x2, …, xt], we want to forecast xt+1. The predicted value of xt+1 is given by Eq (2): $$\hat{x}_{t+1} = M_0(x_q) + \sum_{i=1}^{n} M_i(e_{i-1}) \quad (2)$$ where ideally it is expected that the contribution of the Mi−1 model is greater than that of the Mi model to the final forecasting of the proposed approach. This should occur assuming that if the remaining error series is not a white noise, at each iteration i of the proposed approach a model Mi captures temporal patterns. In this context, it is expected that the contributions of the Mi models decrease at each iteration, leaving only a signal uncorrelated in time; in other words, a white noise.
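The recursion of the Correction Module and the summed forecast can be sketched as follows. This is a minimal illustration under loud assumptions: the stand-in forecaster `LinearAR` is a plain least-squares autoregression chosen only to keep the example short (the study uses an MLP or a GA-tuned MLP), and the names `train_chain`, `forecast_next` and the `is_white_noise` callback are hypothetical.

```python
import numpy as np

class LinearAR:
    """Stand-in forecaster (assumption): linear autoregression with intercept."""

    def __init__(self, lags=3):
        self.lags = lags
        self.coef = None

    def _design(self, s):
        p = self.lags
        # Row t holds s[t], ..., s[t+p-1]; the matching target is s[t+p].
        X = np.column_stack([s[i:len(s) - p + i] for i in range(p)])
        return np.column_stack([np.ones(len(X)), X]), s[p:]

    def fit(self, s):
        X, y = self._design(np.asarray(s, dtype=float))
        self.coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        return self

    def fitted(self, s):
        X, _ = self._design(np.asarray(s, dtype=float))
        return X @ self.coef

    def next(self, s):
        s = np.asarray(s, dtype=float)
        return float(np.concatenate(([1.0], s[-self.lags:])) @ self.coef)

def train_chain(x, is_white_noise, max_models=5, lags=3):
    """Fit M0 on x, then M1 on e0, M2 on e1, ... until the residual passes the test."""
    models, inputs = [], []
    series = np.asarray(x, dtype=float)
    for _ in range(max_models):
        model = LinearAR(lags).fit(series)
        residual = series[lags:] - model.fitted(series)  # e_i = input - M_i(input)
        models.append(model)
        inputs.append(series)
        if is_white_noise(residual):
            break
        series = residual  # the next model is trained on the remaining error
    return models, inputs

def forecast_next(models, inputs):
    """One-step forecast: M0(x) plus the forecasts of each error series."""
    return sum(m.next(s) for m, s in zip(models, inputs))
```

Passing the white-noise test as a callback keeps the chain independent of any particular stopping rule; a linear stand-in leaves little structure for later models to capture, so the benefit of the recursion would only show with a nonlinear forecaster.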

Simulation and Results

Four time series were used for the evaluation of the proposed approach. The series correspond to daily mean concentration of particulate matter (PM2.5 and PM10) from Helsinki. The data set is composed of values measured from 2001 to 2003 at the Kallio and Vallila stations [28]. Although both stations are located in Helsinki, they have different characteristics. The Kallio station is located in an urban background and the Vallila station is situated in an area more exposed to pollution from local traffic.

The data set is normalized to lie within the interval [0, 1] and divided into three sets: 80% for training, 10% for validation and 10% for test. For each time series, ten simulations with the proposed approach were performed and the best model was selected based on the performance (fitness value) in the validation data set. The results for all models correspond to the one-step-ahead forecasting from the test set. Table 1 shows the six metrics [28, 49, 50] used to evaluate the performance of the approach employed in this paper: Mean Squared Error (MSE), Mean Absolute Percentage Error (MAPE), U of Theil Statistics (U), Average Relative Variance (ARV), Prediction of Change in Direction (POCID) and Index of Agreement (IA). For MSE, MAPE, U and ARV, the lower the value, the better the forecasting of the model. The U and ARV measures are used to compare the model performances with the forecasting of a Random Walk model and that of the mean of the series, respectively. If the U and ARV values are equal to 1, the forecasting of the forecaster is equivalent to the random walk model (U = 1), or to the mean of the time series (ARV = 1), respectively. If the U and ARV values are less than 1, the forecasting of the model is better than that of the random walk model or the mean, respectively; if they are greater than 1, it is worse. In the case of POCID and IA, the higher the value, the better the performance of the model. The POCID can have values in the range [0, 100] and IA in the range [0, 1]. A ratio shown in Eq (3) was developed to compare the performance of the model after the current correction is added with that of the previous correction, or of the model without correction: $$\mathrm{Ratio} = \frac{\mathrm{EvaluationMeasure}_{\mathrm{correctedModel}}}{\mathrm{EvaluationMeasure}_{\mathrm{uncorrectedModel}}} \quad (3)$$ where EvaluationMeasurecorrectedModel is the forecaster’s performance reached after the current correction (n) is added and EvaluationMeasureuncorrectedModel is the value of the forecaster’s performance reached with the previous correction (n − 1, model without the current correction). 
For POCID, IA and fitness values, if the obtained value in Eq (3) is greater than 1, the correction improves the forecasting. If the value is equal to or less than 1, the correction either does not add information, or worsens the previous forecasting, respectively.

Table 1. Metrics for forecasting assessment, where N is the size of the series, targetj is the real value at period j, outputj is the forecasting at period j and $\overline{target}$ is the mean of the series.

In the column θ: ↑ means the higher the value of metric the better is the forecasting, and ↓ means the lower the value of metric the better is the forecasting.
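The six metrics and the ratio of Eq (3) can be sketched in Python. The formulas below follow common definitions from the forecasting literature and are assumptions, since the table body is not reproduced here: in particular, ARV is normalized by the variation of the target around its mean (so that forecasting the mean gives exactly 1), and POCID is averaged over the N − 1 direction changes. All function names are hypothetical and the inputs are expected to be NumPy arrays of equal length.

```python
import numpy as np

def mse(target, output):
    return float(np.mean((target - output) ** 2))

def mape(target, output):
    # Percentage error; assumes no target value is zero.
    return float(100.0 * np.mean(np.abs((target - output) / target)))

def theil_u(target, output):
    # Squared error of the model relative to a random walk (previous value).
    num = np.sum((target[1:] - output[1:]) ** 2)
    den = np.sum((target[1:] - target[:-1]) ** 2)
    return float(num / den)

def arv(target, output):
    # Squared error of the model relative to forecasting the series mean.
    num = np.sum((output - target) ** 2)
    den = np.sum((target - target.mean()) ** 2)
    return float(num / den)

def pocid(target, output):
    # Share (in %) of steps where forecast and series move in the same direction.
    same = (target[1:] - target[:-1]) * (output[1:] - output[:-1]) > 0
    return float(100.0 * np.mean(same))

def index_of_agreement(target, output):
    m = target.mean()
    den = np.sum((np.abs(output - m) + np.abs(target - m)) ** 2)
    return float(1.0 - np.sum((output - target) ** 2) / den)

def ratio(corrected, uncorrected):
    # Eq (3): above 1 is an improvement for POCID, IA and fitness;
    # below 1 is an improvement for MSE, MAPE, U and ARV.
    return corrected / uncorrected
```

As a sanity check on these definitions, a forecast that simply repeats the previous observation yields U = 1, and a forecast equal to the series mean yields ARV = 1, matching the interpretation given in the text.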

The proposed approach is used to correct a hybrid system (the model of the Forecasting Module, Fig 2) composed of a GA and an ANN of type MLP [19]. This hybrid system consists of two stages: optimization of the ANN parameters performed by the GA, and phase adjustment. In the first stage, the GA searches for the best configuration of the following MLP parameters: number of input neurons (relevant time lags), number of nodes in the hidden layer and the training algorithm among four candidates (Levenberg-Marquardt, Scaled Conjugate Gradient, Resilient Backpropagation (RPROP) and One Step Secant Conjugate Gradient).
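To make the search space concrete, the sketch below runs a small genetic algorithm over the three MLP parameters named above. It is an illustration only: the selection, crossover and elitism scheme are assumptions (the text does not describe the GA operators), the population size of 10 and mutation probability of 10% are taken from the parameter list of this study, and `toy_fitness` is a hypothetical placeholder for training an MLP and scoring it with the fitness function of Eq (4).

```python
import random

ALGORITHMS = ["levenberg_marquardt", "scaled_conjugate_gradient",
              "rprop", "one_step_secant"]

def random_individual(rng, max_lags=10, max_hidden=20):
    """One candidate: (number of time lags, hidden nodes, training algorithm)."""
    return (rng.randint(1, max_lags), rng.randint(1, max_hidden),
            rng.choice(ALGORITHMS))

def crossover(a, b, rng):
    # Uniform crossover: each gene comes from one of the two parents.
    return tuple(rng.choice(pair) for pair in zip(a, b))

def mutate(ind, rng, rate):
    # Each gene is redrawn at random with probability `rate`.
    fresh = random_individual(rng)
    return tuple(f if rng.random() < rate else g for g, f in zip(ind, fresh))

def genetic_search(fitness, generations=50, pop_size=10, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng),
                   rng, mutation_rate)
            for _ in range(pop_size - 1)
        ]
        population = [population[0]] + children  # elitism keeps the best so far
    return max(population, key=fitness)

def toy_fitness(ind):
    """Hypothetical stand-in: a real run would train an MLP and score it."""
    lags, hidden, algorithm = ind
    return -abs(lags - 4) - abs(hidden - 8) + (algorithm == "levenberg_marquardt")
```

Because the best individual is always carried over, the best fitness in the population never decreases from one generation to the next.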

The search process performed by the GA is guided by the fitness function defined in Eq (4). Silva et al. [51] reported that the definition of an adequate fitness function is a non-trivial task. So, the fitness function used here aims to aggregate different measures; the higher its value (the closer to 100), the more accurate the forecasting model. (4)

In the second stage, if necessary, a phase adjustment procedure is performed aiming to minimize the difference between the forecasting performed by the ANN and the PM concentration series. The hybrid system [19] was chosen because it reached superior results in terms of accuracy when compared with the literature [19, 50, 52]. Thus, if the proposed approach is capable of improving a forecaster with high accuracy, it is expected that the approach also improves forecasters with lower accuracy.

Two models are adopted for correction (Correction Module, Fig 2): the hybrid system itself and an MLP model. In both cases, the models are trained recursively until a stopping condition is reached. For the ANN, the numbers of input and hidden nodes are established using the ACF and a trial and error method in the interval [1, 20], respectively.

The parameters used in this work are described below:

  • The stopping conditions for recursive approach are: (i) the error series has a white noise behavior, or; (ii) an increase (≥ 5%) in the MAPE value with the addition of one correction;
  • The parameters of the hybrid system are set up to: (i) Initial acceptable fitness value (1% of error); (ii) Initial maximum number of time lags (10); (iii) Maximum number of hidden units (20); (iv) Maximum number of iterations (10);
  • The parameters of the GA used by the hybrid system are set up to: (i) Mutation probability of 10%; (ii) Population size of 10 individuals; (iii) Maximum number of generations equals 1000; (iv) Minimum fitness progress of 10^−4. Three stopping conditions for the ANN training algorithms are used: (i) Maximum number of iterations equals 1000; (ii) cross-validation process with a generation loss of 5%; (iii) Progress training of 10^−6.
  • The parameters of the ANN of type multi-layer perceptron (MLP) used in the correction phase are: (i) training algorithm is Levenberg-Marquardt; (ii) Maximum number of iterations. Three stopping conditions for Levenberg-Marquardt algorithm are used: (i) Maximum number of iterations equals 1000; (ii) cross-validation process with generation loss of 5%; (iii) Progress training of 10^−6.
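The two stopping conditions of the recursive approach listed above can be expressed as a single check. The sketch below is an assumption-laden illustration rather than the authors' code: the names `acf_within_band` and `should_stop` are hypothetical, the white-noise test uses the conventional ±2/√N band over 20 lags, and the "increase (≥ 5%)" in MAPE is read as a relative increase over the previous MAPE value.

```python
import numpy as np

def acf_within_band(residual, max_lag=20):
    """True when all sample autocorrelations for lags 1..max_lag lie in +/- 2/sqrt(N)."""
    e = np.asarray(residual, dtype=float)
    e = e - e.mean()
    n = len(e)
    gamma0 = np.dot(e, e) / n
    rho = np.array([np.dot(e[:n - k], e[k:]) / n / gamma0
                    for k in range(1, max_lag + 1)])
    return bool(np.all(np.abs(rho) <= 2.0 / np.sqrt(n)))

def should_stop(residual, mape_before, mape_after, max_increase=0.05):
    """Return (stop, reason) after one more correction has been added."""
    if acf_within_band(residual):
        return True, "white noise"      # condition (i): nothing left to model
    if mape_after >= mape_before * (1.0 + max_increase):
        return True, "MAPE increase"    # condition (ii): the correction hurt accuracy
    return False, "continue"
```

Returning the reason alongside the decision makes it easy to report which criterion ended the recursion for each series.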

Correction via Artificial Neural Network

Table 2 shows the results in terms of the evaluation measures and fitness reached by the hybrid system without correction and by the proposed methodology using ANN for PM2.5 and PM10 time series from Kallio and Vallila stations. The uncorrected hybrid system (without correction) is named HS. The corrected version of the hybrid system, denoted by HS+Cn, corresponds to the proposed methodology, with n correction terms generated by an ANN model. For all time series analyzed here, just one correction term was needed, suggesting that the HS had not captured all the information contained in the time series. Therefore, the comparison is between HS and HS+C1. After the first correction (HS+C1), the same stopping criterion was reached in all cases: the model residual behaved as white noise.

Table 2. Results with correction via ANN for all series.

The best result for each metric and fitness is highlighted in bold.

Table 2 shows that the proposed approach improved 15 out of 24 evaluation measures, achieving the best fitness for all time series. The best result for each metric and fitness is highlighted in bold. For pollutant concentration time series from Kallio station, the proposed approach improved almost all evaluation metrics, except the POCID measure. The decrease in MSE and MAPE shows that the correction was able to improve the forecasting of the uncorrected model. For Vallila station, the improvement occurred in MAPE, ARV and IA for the PM10 concentration time series and in MAPE and IA for the PM2.5 concentration time series. These results show that for all studied cases, HS+C1 reached a better performance than HS with respect to the fitness function, which is the objective of the proposed method, and, as a consequence, an improvement was also observed in other metrics.

Table 3 shows the results reached with the model HS+C1 and the uncorrected model (HS) according to the ratio defined in the Eq (3). The values greater than 1 for POCID, IA and fitness show that the addition of the correction improved the forecasting of the HS. For MSE, U, MAPE and ARV the logic of the Eq (3) is inverted; values smaller than 1 show forecasting enhancement. For concentration series from Kallio station, greater improvements occurred in U, MAPE and ARV metrics. For concentration series from Vallila station, greater improvement occurred in MAPE metric. In general, the addition of corrections made the forecasting closer to the actual time series. This improvement can be seen by comparing Figs 4(a), 5(a), 6(a) and 7(a) with Figs 4(b), 5(b), 6(b) and 7(b), which are the forecasting for PM2.5 and PM10 concentration time series from Kallio and Vallila stations without and with correction, respectively.

Table 3. Comparison between subsequent corrections with ANN and regarding the model without correction measured for Eq (3) for all series.

Fig 4. Forecasting for the PM10 concentration time series for Kallio Station (solid lines—actual values; dashed lines—predicted values).

Fig 5. Forecasting for the PM2.5 concentration time series for Kallio Station (solid lines—actual values; dashed lines—predicted values).

Fig 6. Forecasting for the PM10 concentration time series for Vallila Station (solid lines—actual values; dashed lines—predicted values).

Fig 7. Forecasting for the PM2.5 concentration time series for Vallila Station (solid lines—actual values; dashed lines—predicted values).

Correction via Hybrid System

Table 4 shows the performance of the uncorrected model (HS) and the proposed methodology using HS+Cn in terms of the evaluation measures and fitness for PM2.5 and PM10 concentration time series from Kallio and Vallila stations. In the proposed methodology using correction via HS, the number of corrections n ranged from 1 to 5. The stopping condition reached was two increases in the MAPE value in successive corrections, for all cases. This result suggests that the capacity of correction depends on the accuracy of the forecasting method used in this step. Table 4 shows that the proposed methodology using correction via HS was able to correct the forecasting of the HS more than once in 3 of 4 time series.

Table 4. Results with correction via HS for all series.

The best result for each metric and fitness is highlighted in bold.

Table 4 also shows that the proposed approach improved 21 out of 24 evaluation measures, achieving the best fitness functions for all time series. The best performance for each metric and fitness is highlighted in bold. For pollutant concentration time series from Kallio station, repeating the proposed approach improved almost all evaluation metrics, except the POCID measure. For Vallila station, the improvement also occurred in almost all metrics, except in the POCID measure for PM10 concentration time series. For all series, the best model considered was the last model found. For Kallio station, the best models were HS+C1 and HS+C1+C2 for PM10 and PM2.5, respectively. For Vallila station, the best models were HS+C1+C2 and HS+C1+…+C5 for PM10 and PM2.5, respectively. Table 4 shows that if the values of POCID and fitness are overlooked, the gain in the evaluation measures tends to be smaller at each correction, i.e., at each correction (Ci), the HS aggregated less information than past correction (Ci−1) in the final forecasting. Considering, for instance, Fig 8, it is observed that the MSE difference between HS +…+ Cn and HS (without correction) decreases as the number of corrections increases. The same behavior can be observed in Table 4 in terms of MSE, U, MAPE, ARV and IA, when more than one correction is performed.

Fig 8. MSE difference between HS with n corrections and HS (without correction) for PM2.5 concentration series for Vallila station.

Table 5 shows the ratio, according to Eq (3), used to compare (Cn − Cn−1), the model with n corrections with the previous model with n − 1 correction terms and the best model with model uncorrected (Cn − C0). As in Table 4, in the Table 5, the values greater than 1 for POCID and IA, and values less than 1 for MSE, U, MAPE and ARV show that the addition of the correction terms improved the forecasting of the HS. The values of fitness greater than 1 at each iteration show that the use of HS in the correction improved the initial forecasting for all time series. The total improvement can be seen in the last lines (Cn − C0) for each time series in Table 5 and by comparing the Figs 4(a), 5(a), 6(a) and 7(a) with Figs 4(c), 5(c), 6(c) and 7(c), which are the forecasting for PM2.5 and PM10 concentration time series from Kallio and Vallila stations without and with correction, respectively.

Table 5. Comparison between subsequent corrections via HS and regarding the model without correction measured for Eq (3) for all series.

Comparing the performances of the corrections via ANN and HS from Tables 3 and 5, it can be seen that for all series, the addition of one correction (C1 − C0) via HS led to better results than via ANN. This shows that the search performed by the HS, which combines exploration using gradient descent algorithms with exploration using the GA, outperforms the use of only one learning algorithm for the ANN. Consequently, in cases with correction via HS, where there were more corrections, the proposed methodology improved the performance reached by HS+C1.


Table 6 shows, for all series, both stations, all the metrics and fitness, the best performance reached via ANN correction and via HS correction according to Tables 2 and 4, respectively. It is possible to observe, for all cases, that the correction via HS outperforms the correction via ANN. As an example, for PM10 of the Vallila station the MSE and ARV results obtained by HS (C1+C2) are one order of magnitude smaller than those obtained by ANN (C1). This result suggests that forecasting methods with high accuracy lead to better corrections. Thus, this study indicates that, among the evaluated models, the most suitable for real applications is the HS corrected via HS.

Table 6. Comparison between the corrections via ANN and via HS for all series.

The best result for each metric and fitness is highlighted in bold.

A sensitivity analysis was performed to show the robustness of the proposed approach with respect to different proportions of the data set. Three proportions were considered for training, validation and testing: 80% − 10% − 10%, 50% − 20% − 30% and 50% − 30% − 20%. Figs 9 and 10 show the results of the sensitivity analysis in terms of the fitness for all series of both stations for correction via ANN and via HS, respectively.
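The three proportions can be illustrated with a short sketch of a chronological split. It assumes that the segments are consecutive in time and taken without shuffling, which is usual for time series but is not stated in this section; the function name `chronological_split` and the placeholder series are hypothetical.

```python
def chronological_split(series, train, val, test):
    """Split a sequence into consecutive train/validation/test segments.

    ASSUMPTION: the segments are taken in time order, without shuffling.
    """
    if abs(train + val + test - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    n = len(series)
    n_train = int(round(n * train))
    n_val = int(round(n * val))
    return (series[:n_train],
            series[n_train:n_train + n_val],
            series[n_train + n_val:])


# The three training-validation-testing proportions quoted in the text.
PROPORTIONS = [(0.8, 0.1, 0.1), (0.5, 0.2, 0.3), (0.5, 0.3, 0.2)]

series = list(range(100))  # placeholder for a daily PM concentration series
splits = [chronological_split(series, *p) for p in PROPORTIONS]
for p, (tr, va, te) in zip(PROPORTIONS, splits):
    print(p, len(tr), len(va), len(te))
```

Keeping the test segment at the end of the series means that every forecast is evaluated on data later than anything used for training or validation.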

Fig 9(a) and 9(b) show that, when the correction via ANN is performed, the fitness value of HS+C1 is greater than that of the HS without correction for the concentration series of the Kallio and Vallila stations, respectively. In all considered cases, the same stopping criterion of the proposed approach was reached after the first correction (C1): the model residual behaved as white noise.

Fig 9. Fitness evolution for Kallio and Vallila stations with correction via ANN.

Fig 10(a) and 10(b) show the sensitivity analysis of the fitness value with respect to the aforementioned proportions when the correction via HS is considered for the Kallio and Vallila stations, respectively. For both stations, the performance at each correction (HS+Cn) exceeds the performance of the previous correction (HS+Cn−1). For all cases presented in Fig 10, the stopping criterion was the increase in the MAPE value. In most cases more than one correction via HS was performed (Fig 10), in contrast to the correction via ANN, where only one correction was performed in all cases (Fig 9).

Fig 10. Fitness evolution for Kallio and Vallila stations with correction via HS.
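The two stopping criteria mentioned above (a residual that behaves as white noise, and an increase in the MAPE) can be sketched as a simple control loop. This is only an illustration under stated assumptions: the corrections are combined additively with the current forecast, a correction that fails to lower the MAPE is discarded, and the MAPE is computed on the same segment that is being corrected. The callables `model_residual` and `is_white_noise` are hypothetical stand-ins for the residual forecaster (ANN or HS) and for the ACF-based check; the toy residual model below sees the true residual, which a real forecaster would not.

```python
import numpy as np


def mape(actual, forecast):
    """Mean absolute percentage error (actual values must be nonzero)."""
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))


def recursive_correction(actual, base_forecast, model_residual,
                         is_white_noise, max_corrections=10):
    """Add residual forecasts until one of the stopping criteria is met.

    ASSUMPTIONS: corrections are combined additively, and a correction
    that does not lower the MAPE is discarded before stopping.
    """
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(base_forecast, dtype=float)
    best_mape = mape(actual, forecast)
    n_corrections = 0
    for _ in range(max_corrections):
        residual = actual - forecast
        if is_white_noise(residual):        # criterion 1: nothing left to model
            break
        candidate = forecast + model_residual(residual)
        candidate_mape = mape(actual, candidate)
        if candidate_mape >= best_mape:     # criterion 2: MAPE did not decrease
            break
        forecast, best_mape = candidate, candidate_mape
        n_corrections += 1
    return forecast, n_corrections


# Toy stand-ins, for illustration only.
actual = np.array([10.0, 12.0, 14.0, 16.0])
base = np.array([8.0, 9.0, 10.0, 11.0])
half_of_residual = lambda r: 0.5 * r                # hypothetical residual model
small_enough = lambda r: np.max(np.abs(r)) < 1.0    # hypothetical white-noise test

corrected, n = recursive_correction(actual, base, half_of_residual, small_enough)
print(n, corrected)
```

In this toy run each correction halves the remaining residual, so the loop stops through the first criterion once the residual is small enough; a residual model that made the forecast worse would instead trigger the MAPE criterion and leave the forecast unchanged.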

Fig 11 shows the ACF of the residuals obtained after each correction via HS for the PM10 series with proportion 50% − 30% − 20% for the Vallila station (Fig 10(b)). None of the residuals, up to the last correction performed, has white noise behavior. However, Fig 11 shows that with each correction the ACF behavior tends toward that of white noise. In this case the stopping criterion was the increase in the MAPE value.

Fig 11. Autocorrelation (ACF) for PM10 concentration series for Kallio station.
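As an illustration of how an ACF-based white-noise check could be carried out, the sketch below computes the sample autocorrelation of a residual series and compares it with the usual approximate 95% bounds of ±1.96/√N. The number of lags, the tolerated share of lags outside the bounds, and the function names are assumptions made for this example; the exact test used in the study is not specified in this section.

```python
import numpy as np


def sample_acf(x, max_lag):
    """Sample autocorrelation of x for lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom
                     for k in range(1, max_lag + 1)])


def looks_like_white_noise(residuals, max_lag=20, tolerated=0.05):
    """True when at most `tolerated` of the lags fall outside +/-1.96/sqrt(N).

    ASSUMPTION: this simple rule stands in for the ACF inspection in the text.
    """
    acf = sample_acf(residuals, max_lag)
    bound = 1.96 / np.sqrt(len(residuals))
    share_outside = np.mean(np.abs(acf) > bound)
    return bool(share_outside <= tolerated)


# A periodic residual still carries a temporal pattern that can be modeled.
t = np.arange(500)
periodic = np.sin(2 * np.pi * t / 50)
print(looks_like_white_noise(periodic))

# Pseudo-random noise, for comparison (result depends on the draw).
noise = np.random.default_rng(0).normal(size=500)
print(looks_like_white_noise(noise))
```

A residual that fails this check still contains structure, which is the situation in which a further correction is attempted.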

Concluding Remarks

In this paper, a new approach is proposed to improve the performance of particulate matter (PM) forecasting. Important aspects of the proposed approach are as follows: it uses recursive modeling of the residuals (error series); it keeps modeling residuals as long as the error series obtained so far is not judged (by using the ACF) to be white noise; and it is quite general in the sense that it can use a set of different forecasters. In particular, in the present work, a hybrid system (HS) [19] composed of a GA and an ANN is corrected by residual modeling using two methods: an MLP neural network and the HS itself.

Results were presented in terms of six well-known performance metrics for time series forecasting (mean daily concentration levels) of PM2.5 and PM10 at two stations in Helsinki: Kallio and Vallila. The results point out the benefits of using recursive residual modeling. An analysis was carried out to evaluate the robustness of the proposed approach with respect to different proportions of the data set (training-validation-testing). The proposed approach was observed to be robust for three different proportions: 80% − 10% − 10%, 50% − 20% − 30% and 50% − 30% − 20%. Thus, as PM concentration can damage human health, the approach presented in this paper may be used as an alternative for PM forecasting, which is relevant for supporting the decisions of organizations and governments in estimating the corresponding health risks.

To investigate the performance of the proposed approach further, new studies will be performed in different scenarios with different combinations of forecasters. Future research directions include: forecasting the concentrations of other contaminants (e.g. NO, NO2, NOx, CO, O3); forecasting in scenarios of extreme events (e.g. dust storms [53, 54]); performance evaluation considering missing data; performance assessment using series with different time scales (e.g. second, minute, hour, or month); and forecasting with data sets that present measurement (or observational) errors. Furthermore, a new architecture can be developed to estimate health risks from the forecasts of the proposed approach.


This work was partially supported by the Brazilian agencies CNPq and Facepe.

Author Contributions

Conceived and designed the experiments: PSGMN. Performed the experiments: PSGMN. Analyzed the data: PSGMN GDCC FM TAEF. Contributed reagents/materials/analysis tools: PSGMN GDCC FM TAEF. Wrote the paper: PSGMN GDCC FM TAEF.


  1. World Bank. Global Monitoring Report 2008: MDGs and the Environment: Agenda for Inclusive and Sustainable Development; 2008.
  2. European Environment Agency. Air quality in Europe—Report 2014. European Environment Agency; 2014.
  3. Wang KY, Chau TT. An Association between Air Pollution and Daily Outpatient Visits for Respiratory Disease in a Heavy Industry Area. PLoS ONE. 2013;8(10):e75220. pmid:24204573
  4. Feng J, Yang W. Effects of Particulate Air Pollution on Cardiovascular Health: A Population Health Risk Assessment. PLoS ONE. 2012;7(3):e33385. pmid:22432017
  5. Tong L, Li K, Zhou Q. Promoted Relationship of Cardiovascular Morbidity with Air Pollutants in a Typical Chinese Urban Area. PLoS ONE. 2014;9(9):e108076. pmid:25247693
  6. Block M, Calderón-Garcidueñas L. Air Pollution: Mechanisms of Neuroinflammation & CNS Disease. Trends in Neurosciences. 2009;32(9):506–516. pmid:19716187
  7. Fajersztajn L, Veras M, Barrozo LV, Saldiva P. Air pollution: a potentially modifiable risk factor for lung cancer. Nature Reviews Cancer. 2013;13(9):674–678. pmid:23924644
  8. Brody H. Lung Cancer. Nature. 2014;513(7517):S1. pmid:25208065
  9. Dadvand P, Parker J, Bell ML, Bonzini M, Brauer M, Darrow LA, et al. Maternal Exposure to Particulate Air Pollution and Term Birth Weight: A Multi-Country Evaluation of Effect and Heterogeneity. Environmental Health Perspectives. 2013;121(3):367–373.
  10. Harrison RM, Yin J. Particulate matter in the atmosphere: which particle properties are important for its effects on health? Science of The Total Environment. 2000;249(1-3):85–101. pmid:10813449
  11. Englert N. Fine particles and human health—a review of epidemiological studies. Toxicology Letters. 2004;149(1-3):235–242. pmid:15093269
  12. Katsouyanni K, Touloumi G, Spix C, Schwartz J, Balducci F, Medina S, et al. Short-term effects of ambient sulphur dioxide and particulate matter on mortality in 12 European cities: results from time series data from the APHEA project. Air Pollution and Health: a European Approach. BMJ (Clinical research ed). 1997;314(7095):1658–1663.
  13. Shang Y, Sun Z, Cao J, Wang X, Zhong L, Bi X, et al. Systematic review of Chinese studies of short-term exposure to air pollution and daily mortality. Environment International. 2013;54(0):100–111. pmid:23434817
  14. Pascal M, Falq G, Wagner V, Chatignoux E, Corso M, Blanchard M, et al. Short-term impacts of particulate matter (PM10, PM10-2.5, PM2.5) on mortality in nine French cities. Atmospheric Environment. 2014;95(0):175–184.
  15. Genc S, Zadeoglulari Z, Fuss SH, Genc K. The Adverse Effects of Air Pollution on the Nervous System. Journal of Toxicology. 2012;2012(782462):23 pages. pmid:22523490
  16. Kleinman MT, Campbell A. Central Nervous System Effects of Ambient Particulate Matter: The Role of Oxidative Stress and Inflammation. University of California and California Air Resources Board; 2014.
  17. Kim JW, Park S, Lim CW, Lee K, Kim B. The Role of Air Pollutants in Initiating Liver Disease. Toxicological Research. 2014;30(2):65–70. pmid:25071914
  18. Tarantino G, Capone D, Finelli C. Exposure to ambient air particulate matter and non-alcoholic fatty liver disease. World Journal of Gastroenterology: WJG. 2013;19(25):3951–3956. pmid:23840139
  19. de Mattos Neto PSG, Madeiro F, Ferreira TAE, Cavalcanti GDC. Hybrid intelligent system for air quality forecasting using phase adjustment. Engineering Applications of Artificial Intelligence. 2014;32(0):185–191.
  20. Sharma S, Barai SV, Dikshit AK. Studies of air quality predictors based on neural networks. International Journal of Environment and Pollution. 2003;19(5):442–453.
  21. Ordieres JB, Vergara EP, Capuz RS, Salazar RE. Neural network prediction model for fine particulate matter (PM2.5) on the US-Mexico border in El Paso (Texas) and Ciudad Juárez (Chihuahua). Environmental Modelling & Software. 2005;20(5):547–559.
  22. Kukkonen J, Partanen L, Karppinen A, Ruuskanen J, Junninen H, Kolehmainen M, et al. Extensive evaluation of neural network models for the prediction of NO2 and PM10 concentrations, compared with a deterministic modelling system and measurements in central Helsinki. Atmospheric Environment. 2003;37(32):4539–4550.
  23. Caselli M, Trizio L, de Gennaro G, Ielpo P. A Simple Feedforward Neural Network for the PM10 Forecasting: Comparison with a Radial Basis Function Network and a Multivariate Linear Regression Model. Water, Air, and Soil Pollution. 2009;201(1-4):365–377.
  24. de Gennaro G, Trizio L, Gilio AD, Pey J, Pérez N, Cusack M, et al. Neural network model for the prediction of PM10 daily concentrations in two sites in the Western Mediterranean. Science of The Total Environment. 2013;463-464(0):875–883. pmid:23872183
  25. Perez P, Reyes J. An integrated neural network model for PM10 forecasting. Atmospheric Environment. 2006;40(16):2845–2851.
  26. Kurt A, Oktay AB. Forecasting air pollutant indicator levels with geographic models 3 days in advance using neural networks. Expert Systems with Applications. 2010;37(12):7986–7992.
  27. Slini T, Kaprara A, Karatzas K, Moussiopoulos N. PM10 Forecasting for Thessaloniki, Greece. Environmental Modelling & Software. 2006;21(4):559–565.
  28. Voukantsis D, Karatzas K, Kukkonen J, Rasanen T, Karppinen A, Kolehmainen M. Intercomparison of air quality data using principal component analysis, and forecasting of PM10 and PM2.5 concentrations using artificial neural networks, in Thessaloniki and Helsinki. Science of The Total Environment. 2011;409(7):1266–1276. pmid:21276603
  29. Antanasijević DZ, Pocajt VV, Povrenović DS, Ristić MD, Perić-Grujić AA. PM10 emission forecasting using artificial neural networks and genetic algorithm input variable optimization. Science of The Total Environment. 2013;443(0):511–519. pmid:23220141
  30. Niska H, Rantamaki M, Hiltunen T, Karppinen A, Kukkonen J, Ruuskanen J, et al. Evaluation of an integrated modelling system containing a multi-layer perceptron model and the numerical weather prediction model HIRLAM for the forecasting of urban airborne pollutant concentrations. Atmospheric Environment. 2005;39(35):6524–6536.
  31. Perez P. Combined model for PM10 forecasting in a large city. Atmospheric Environment. 2012;60:271–276.
  32. Siwek K, Osowski S. Improving the accuracy of prediction of PM10 pollution by the wavelet transformation and an ensemble of neural predictors. Engineering Applications of Artificial Intelligence. 2012;25(6):1246–1258.
  33. Mishra D, Goyal P, Upadhyay A. Artificial intelligence based approach to forecast PM2.5 during haze episodes: A case study of Delhi, India. Atmospheric Environment. 2015;102(0):239–248.
  34. Qin S, Liu F, Wang J, Sun B. Analysis and forecasting of the particulate matter (PM) concentration levels over four major cities of China using hybrid models. Atmospheric Environment. 2014;98(0):665–675.
  35. Ul-Saufie AZ, Yahaya AS, Ramli NA, Rosaida N, Hamid HA. Future daily PM10 concentrations prediction by combining regression models and feedforward backpropagation models with principle component analysis (PCA). Atmospheric Environment. 2013;77(0):621–630.
  36. Zhang GP. Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing. 2003;50(0):159–175.
  37. Firmino PRA, de Mattos Neto PSG, Ferreira TAE. Error modeling approach to improve time series forecasters. Neurocomputing. 2015;153:242–254.
  38. Firmino PRA, de Mattos Neto PSG, Ferreira TAE. Correcting and combining time series forecasters. Neural Networks. 2014;50(0):1–11. pmid:24239986
  39. Al-Alawi SM, Abdul-Wahab SA, Bakheit CS. Combining principal component regression and artificial neural networks for more accurate predictions of ground-level ozone. Environmental Modelling & Software. 2008;23(4):396–403.
  40. Ettouney RS, Mjalli FS, Zaki JG, El-Rifai MA, Ettouney HM. Forecasting of ozone pollution using artificial neural networks. Management of Environmental Quality: An International Journal. 2009;20(6):668–683.
  41. Westerlund J, Urbain JP, Bonilla J. Application of air quality combination forecasting to Bogota. Atmospheric Environment. 2014;89(0):22–28.
  42. Sánchez AB, Ordóñez C, Lasheras FS, de Cos Juez FJ, Roca-Pardiñas J. Forecasting SO2 Pollution Incidents by means of Elman Artificial Neural Networks and ARIMA Model. Abstract and Applied Analysis. 2013;2013(6):238259.
  43. Díaz-Robles LA, Ortega JC, Fu JS, Reed GD, Chow JC, Watson JG, et al. A hybrid ARIMA and artificial neural networks model to forecast particulate matter in urban areas: The case of Temuco, Chile. Atmospheric Environment. 2008;42(35):8331–8340.
  44. Zhang G, Huang S, Duan Q, Shu W, Hou Y, Miao X, et al. Application of a hybrid model for predicting the incidence of tuberculosis in Hubei, China. PLOS ONE. 2013;8(11):1–7.
  45. Yu L, Zhou L, Tan L, Jiang H, Wang Y, Wei S, et al. Application of a new hybrid model with seasonal auto-regressive integrated moving average (ARIMA) and nonlinear auto-regressive neural network (NARNN) in forecasting incidence cases of HFMD in Shenzhen, China. PLOS ONE. 2014;9(6):1–9.
  46. Zhou L, Yu L, Wang Y, Lu Z, Tian L, Tan L, et al. A hybrid model for predicting the prevalence of schistosomiasis in humans of Qianjiang city, China. PLOS ONE. 2014;9(8):1–12.
  47. Vlachogianni A, Kassomenos P, Karppinen A, Karakitsios S, Kukkonen J. Evaluation of a multiple regression model for the forecasting of the concentrations of NOx and PM10 in Athens and Helsinki. Science of The Total Environment. 2011;409(8):1559–1571. pmid:21277004
  48. Box GEP, Jenkins GM, Reinsel GC. Time Series Analysis: Forecasting and Control. Wiley Series in Probability and Statistics. Wiley; 2008.
  49. Rodrigues ALJ, Silva DA, de Mattos Neto PSG, Ferreira TAE. An Experimental Study of Fitness Function and Time Series Forecasting Using Artificial Neural Networks. In: Genetic and Evolutionary Computation Conference (GECCO 2010). ACM; 2010. p. 2015–2018.
  50. de Mattos Neto PSG, Rodrigues ALJ, Ferreira TAE, Cavalcanti GDC. An intelligent perturbative approach for the time series forecasting problem. In: IEEE World Congress on Computational Intelligence (WCCI 2010). IEEE; 2010. p. 1–8.
  51. Silva DA, Alves GI, de Mattos Neto PSG, Ferreira TAE. Measurement of Fitness Function efficiency using Data Envelopment Analysis. Expert Systems with Applications. 2014;41(16):7147–7160.
  52. Ferreira TAE, Vasconcelos GC, Adeodato PJL. A New Intelligent System Methodology for Time Series Forecasting with Artificial Neural Networks. Neural Processing Letters. 2008;28(2):113–129.
  53. Csavina J, Field J, Félix O, Corral-Avitia AY, Sáez AE, Betterton EA. Effect of wind speed and relative humidity on atmospheric dust concentrations in semi-arid climates. Science of The Total Environment. 2014;487(0):82–90. pmid:24769193
  54. Wong MS, Xiao F, Nichol J, Fung J, Kim J, Campbell J, et al. A multi-scale hybrid neural network retrieval model for dust storm detection, a study in Asia. Atmospheric Research. 2015;158-159(0):89–106.