Abstract
Air pollution is a global problem that threatens environmental sustainability and severely affects public health. Monitoring air quality and predicting future pollution levels are critical for creating effective environmental policies and enabling individuals to take precautions against air pollution. This study presents a long-term assessment of daily Air Quality Index (AQI) prediction using machine learning models based on meteorological and pollutant data collected in eastern Türkiye from 2016 to 2024. The dataset includes four major air pollutants (PM₁₀, SO₂, NO₂, O₃) and five meteorological variables (temperature, precipitation, relative humidity, wind direction, wind speed). Three models, eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Support Vector Machine (SVM), were evaluated using the coefficient of determination (R²), root mean square error (RMSE) and mean absolute error (MAE) as performance metrics. Among these, XGBoost achieved the highest prediction accuracy (training set: R² = 0.999, RMSE = 0.234, MAE = 0.158; test set: R² = 0.994, RMSE = 4.84, MAE = 0.972). The results demonstrate that ensemble-based machine learning approaches, particularly XGBoost, can effectively model AQI fluctuations using environmental predictors. These results provide valuable insights for air quality forecasting systems and suggest practical implications for regional air pollution management and early warning systems, supporting public health protection and the development of environmental health policies.
Citation: Tırınk S (2025) Machine learning-based forecasting of air quality index under long-term environmental patterns: A comparative approach with XGBoost, LightGBM, and SVM. PLoS One 20(10): e0334252. https://doi.org/10.1371/journal.pone.0334252
Editor: Takayuki Mizuno, National Institute of Informatics, JAPAN
Received: August 2, 2025; Accepted: September 22, 2025; Published: October 8, 2025
Copyright: © 2025 Sevtap Tırınk. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The data underlying the results presented in this study were obtained from two official databases of the Ministry of Environment, Urbanization and Climate Change (Türkiye). Air quality monitoring data are publicly accessible at https://sim.csb.gov.tr/. Meteorological data are available through the General Directorate of Meteorology’s MEVBIS system (https://mevbis.mgm.gov.tr/) and require user registration and purchase via the Ministry’s official platform. Additional information regarding access procedures can be obtained by contacting the General Directorate of Meteorology at pazarlama@mgm.gov.tr. These data are available to other researchers under the same conditions as those applied to the author.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Abbreviations: RMSE, Root Mean Square Error; MLR, Multiple Linear Regression; MSE, Mean Squared Error; MAE, Mean Absolute Error; R², Coefficient of Determination; SD, Standard Deviation; T, Temperature; P, Precipitation; RH, Relative Humidity; WD, Wind Direction; WS, Wind Speed; RF, Random Forest; RFR, Random Forest Regression; CR, Catboost Regression; DT, Decision Tree; MP, Multilayer Perceptron; GB, Gradient Boosting; LR, LinearR, Lasso Regression; SM, Stacked Models; KNN, K-Nearest Neighbor; LightGBM, Light Gradient Boosting Machine; SVR, Support Vector Regression; LSTM, Long Short-Term Memory; LogR, Logistic Regression; ANN, Artificial Neural Networks; ILSTM, Improved Long Short-Term Memory Algorithm; BRNN, Bayesian Regularized Neural Networks
Introduction
Air is an indispensable resource for the sustainability of life, and its quality is critical to human health and the environment. However, air pollution has become a serious global environmental problem that negatively affects atmospheric quality and public health [1,2]. Rapid industrialization and urbanization have significantly increased the concentration of harmful substances in the atmosphere, especially in industrial areas, resulting in increased pollutant emissions [3,4]. This trend has increased public awareness and concern about air quality, as well as triggered calls for the development of effective air quality management and pollution control strategies [5].
The urbanization process plays a decisive role in air pollution levels. The spatial arrangement of urban areas affects the distribution and concentration of pollutants, causing denser urban environments to be exposed to higher pollution levels, usually due to traffic emissions and industrial activities [4,5]. In the Iğdır province of Türkiye, rapid economic expansion and urbanization processes also lead to a severe deterioration in air quality. In many regions of Türkiye, especially in the winter months, using fossil fuels for heating is one of the main factors increasing air pollution. In regions such as Iğdır, this situation becomes more pronounced in the winter months, and air quality deteriorates significantly [6,7].
The effects of air pollution on public health are profound. Various studies have shown that exposure to major air pollutants, including particulate matter (PM₂.₅ and PM₁₀), sulfur dioxide (SO₂), nitrogen oxides (NOx), carbon monoxide (CO), and ozone (O₃), is associated with serious health problems such as respiratory diseases, cardiovascular diseases, and even cancer [8,9]. In this regard, Kumar et al. [10] highlighted how particulate matter, together with meteorological conditions, critically contributes to poor air quality and adverse health outcomes. The World Health Organization emphasizes the urgent need for comprehensive monitoring and management strategies because of premature deaths from air pollution [11]. Growing urban populations make air quality protection more critical and require addressing both local and transboundary pollution sources [12]. In this context, authorities are trying to combat air pollution by developing long-term strategies. However, since these solutions require large-scale implementation, they can be costly in terms of time and financial resources. Providing air quality forecasts and air quality index (AQI) values is therefore important for helping individuals develop protection strategies [13]. Air quality is influenced by the interactions among economic development, urban planning, and public health. Forecasting air pollution can support local governments and vulnerable groups (e.g., individuals with respiratory diseases, pregnant women, and children) by indicating when and where air quality may deteriorate and enabling timely protective measures.
As a widely used index, the AQI provides an overall assessment of ambient air quality. It expresses air quality as a single numerical value by combining the concentrations of specific pollutants: particulate matter smaller than 2.5 micrometers (PM₂.₅), particulate matter smaller than 10 micrometers (PM₁₀), ozone (O₃), carbon monoxide (CO), nitrogen dioxide (NO₂), and sulfur dioxide (SO₂) [14,15]. The Turkish National Air Quality Index (TNAQI) is an adaptation of the AQI developed by the US Environmental Protection Agency (EPA) to national legislation and limit values [16]. On a scale from 0 to 500, the AQI is divided into six categories (good, moderate, unhealthy for sensitive groups, unhealthy, very unhealthy, and hazardous), with corresponding health warnings provided in Table 1 [16]. As the AQI value increases, the threat to human health also increases.
The relationship between air quality and meteorological factors is shaped by complex chemical and dynamic atmospheric reactions. Concentrations of air pollutants are highly sensitive to meteorological conditions such as wind speed and direction, relative humidity, and temperature, which have been shown in various studies to directly affect local air pollution levels [7–17]. Sekula et al. [18] evaluated the effect of atmospheric circulation on air quality, emphasizing the importance of humidity and temperature gradients, especially in the lower troposphere. Therefore, meteorological conditions should be taken into account in air quality assessments and forecasting, since they critically influence pollutant dispersion rather than being controllable factors for improving air quality.
In recent years, the use of machine learning algorithms in air quality prediction has become increasingly widespread, and significant progress has been made in this field. Studies conducted in this field in recent years are summarized in Table 2, where model performances are primarily assessed using Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the coefficient of determination (R²). The table includes a variety of machine learning methods such as Random Forest (RF), Random Forest Regression (RFR), Cubist Regression (CR), Decision Tree (DT), Multilayer Perceptron (MP), Gradient Boosting (GB), Linear Regression (LR), Stacked Models (SM), k-Nearest Neighbor (KNN), Light Gradient Boosting Machine (LightGBM), Support Vector Regression (SVR), Long Short-Term Memory (LSTM), Logistic Regression (LogR), Artificial Neural Network (ANN), Improved Long Short-Term Memory (ILSTM), and Bayesian Regularized Neural Networks (BRNN). Meteorological variables are abbreviated as Temperature (T), Precipitation (P), Relative Humidity (RH), Wind Direction (WD), and Wind Speed (WS). This progress results from collaborative efforts, as various studies examine the methods developed to increase the effectiveness of machine learning techniques in air quality prediction [19–21].
As shown in Table 2, ensemble learning methods such as XGBoost and LightGBM frequently achieved the highest accuracy in AQI prediction across different regions. Support vector-based models also provided competitive results in several studies, particularly for PM-based predictions. In most cases, particulate matter (PM₂.₅ and PM₁₀) emerged as the dominant factor influencing air quality, highlighting its critical role in AQI forecasting. These studies demonstrate the growing interest in AQI prediction using machine learning. However, most of them focus on short-term datasets, metropolitan regions, or limited sets of pollutants and meteorological variables. To the best of our knowledge, no long-term, region-specific study has been conducted in a geopolitically sensitive area like Iğdır, Türkiye. In addition, existing models rarely integrate both pollutant and meteorological data over extended periods. This gap further highlights the novelty of the present research.
Air pollution prediction varies across different regions and climatic conditions due to the complex, nonlinear interactions between atmospheric pollutants and meteorological parameters, posing a significant challenge on a global scale [34–36]. Although machine learning models are widely used for AQI prediction, comprehensive evaluations of the forecast performance of these models are often limited to short-term datasets or specific subsets of pollutants [37–39]. This study aims to (i) reveal the long-term status and temporal dynamics of air pollution in Iğdır, Türkiye, during 2016–2024 using the TNAQI together with pollutant and meteorological records and (ii) systematically compare the predictive performances of advanced machine learning models (XGBoost, LightGBM and SVM) for daily AQI prediction and quantitatively evaluate the impacts of air pollutants and meteorological factors. Unlike traditional studies that usually focus on short-term datasets or specific subsets of pollutants, this research presents a comprehensive evaluation by integrating multiple air pollutants (PM10, SO2, NO2, O3) with key meteorological variables to improve the forecast accuracy. The findings provide valuable insights into the adaptability of machine learning models under different environmental conditions, suggesting a scalable and generalizable AQI prediction framework that can contribute to developing data-driven air quality management policies globally.
The study’s objectives are based on these observations:
- a). Calculate daily AQI using data from four air pollutants (PM10, SO2, NO2, O3) and five meteorological parameters (temperature, precipitation, relative humidity, wind direction, wind speed) from 2016 to 2024.
- b). Apply and compare advanced machine learning models, including XGBoost, LightGBM, and SVM, to predict the AQI with high precision.
- c). Evaluate the performance of the machine learning models using metrics such as R², RMSE and MAE and assess their predictive capabilities.
- d). Train the models on 80% of the dataset and validate them using the remaining 20% to ensure reliability and accuracy in AQI prediction.
- e). Perform a comparative analysis to identify the most suitable model for AQI prediction, highlighting XGBoost’s consistent performance across all metrics.
- f). Analyze model results to determine influential parameters impacting AQI predictions and offer insights for efficient air quality management.
This study offers a novel contribution to the field of air quality modeling by focusing on Iğdır province, a border region of Türkiye adjacent to three countries—Iran, Armenia, and Nakhchivan (Azerbaijan). Despite its strategic geopolitical location and vulnerability to cross-border pollution transport, the region has remained underrepresented in empirical AQI forecasting studies. By integrating eight years of daily meteorological and pollutant data (2016–2024), this study systematically compares the predictive performance of three advanced machine learning models—XGBoost, LightGBM, and SVM. The findings demonstrate that XGBoost significantly outperforms the other models in terms of accuracy and generalizability. This comprehensive regional analysis not only fills a critical gap in the existing literature but also provides a scalable framework for AQI prediction in other transboundary or climatically similar regions.
Materials and methods
Study area and data description
The current study examined the AQI of Iğdır, Türkiye. Iğdır province is located in eastern Türkiye and borders three countries (Iran, Armenia, and Nakhchivan). Its coordinates are latitude 39° 55′ 25.36″ N and longitude 44° 02′ 42.00″ E, and its surface area is 3,588 km². Iğdır lies at an altitude of 800–900 m and has a provincial population of approximately 210,000. The geographical location of the study area is illustrated in Fig 1.
Fig 1. The study area is located in Iğdır Province, Türkiye. The base map is sourced from Natural Earth (public domain; http://www.naturalearthdata.com) and elevation data are from the USGS National Map Viewer (public domain; https://www.usgs.gov).
Air quality and meteorological data, including PM₁₀, NO₂, SO₂, O₃, wind speed, wind direction, relative humidity, temperature, and precipitation, were collected from air quality and meteorology stations operated by the Republic of Türkiye Ministry of Environment, Urbanization and Climate Change. The dataset comprises eight years of daily air pollutant data (2016–2024) for Iğdır.
Methodology for AQI calculation and indexing
In this study, AQI values were calculated using pollutant concentrations measured at the air monitoring station in Iğdır province. The TNAQI system, adapted to Türkiye’s national standards, was applied. According to this approach, the AQI is determined as the maximum sub-index among five pollutants (SO2, NO2, PM10, O3, CO). Reference intervals for pollutant concentrations and their corresponding AQI values are provided in Table 3 [16].
The AQI calculation follows Equations (1) and (2). For this purpose, four pollutants (PM₁₀, SO₂, NO₂, O₃) measured in Iğdır were used:

$$\mathrm{IAQI}_i = \frac{I_{high} - I_{low}}{C_{high} - C_{low}}\,\left(C_i - C_{low}\right) + I_{low} \tag{1}$$

$$\mathrm{AQI} = \max\left(\mathrm{IAQI}_1, \mathrm{IAQI}_2, \ldots, \mathrm{IAQI}_n\right) \tag{2}$$

where i represents the air pollutant and n is the number of pollutants considered. IAQI_i is the air quality sub-index for pollutant i; C_i is the concentration of pollutant i; C_low and C_high denote the minimum and maximum concentration values of the AQI category corresponding to the specific pollutant; I_low and I_high denote the minimum and maximum AQI values for that category (see Table 3). The following concentration values were used in the study: maximum one-hour average values (μg m⁻³) for NO₂ and SO₂, maximum eight-hour average values (μg m⁻³) for O₃, and daily average values (μg m⁻³) for PM₁₀. These daily AQI values, along with the daily average pollutant concentrations and meteorological parameters, were subsequently used as input for the machine learning models. The overall methodology is illustrated in Fig 1.
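The sub-index and maximum rule described above can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes the linear-interpolation form implied by the symbol definitions, and the breakpoint rows in `PLACEHOLDER_BREAKPOINTS` are invented placeholders, not the TNAQI limits of Table 3. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Each row is (C_low, C_high, I_low, I_high) for one AQI category.
# PLACEHOLDER values for illustration only; replace with the limits in Table 3.
Breakpoint = Tuple[float, float, float, float]

PLACEHOLDER_BREAKPOINTS: Dict[str, List[Breakpoint]] = {
    "PM10": [(0.0, 50.0, 0.0, 50.0), (50.0, 100.0, 50.0, 100.0), (100.0, 260.0, 100.0, 150.0)],
    "SO2": [(0.0, 100.0, 0.0, 50.0), (100.0, 250.0, 50.0, 100.0), (250.0, 500.0, 100.0, 150.0)],
}


def sub_index(concentration: float, table: List[Breakpoint]) -> float:
    """Interpolate linearly inside the category that contains the concentration."""
    for c_low, c_high, i_low, i_high in table:
        if c_low <= concentration <= c_high:
            return (i_high - i_low) / (c_high - c_low) * (concentration - c_low) + i_low
    raise ValueError("concentration outside the breakpoint table")


def daily_aqi(concentrations: Dict[str, float],
              breakpoints: Dict[str, List[Breakpoint]]) -> float:
    """Return the largest pollutant sub-index as the daily AQI."""
    return max(sub_index(c, breakpoints[p]) for p, c in concentrations.items())


example_aqi = daily_aqi({"PM10": 180.0, "SO2": 50.0}, PLACEHOLDER_BREAKPOINTS)
```

With these placeholder rows, the PM10 sub-index is larger than the SO2 sub-index, so PM10 alone determines the daily value, which is how a single dominant pollutant can drive the index.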
Machine learning models
Light gradient boosting machine algorithm (LightGBM).
LightGBM is a gradient boosting framework built on tree-based learners [40]. It was developed by Microsoft researchers in 2017 [41]. The algorithm supports distributed training and is designed to train faster and use less memory than conventional gradient boosting implementations. It combines a leaf-wise growth strategy with a maximum depth limit, gradient-based one-side sampling, exclusive feature bundling, and histogram-based split finding [42]. Like other boosting methods, it builds a strong learner from a combination of weak learners: trees are constructed iteratively so that each step reduces the loss function [27,44,45]. Histogram-based methods discretize continuous explanatory variables into bins, and the histogram subtraction technique derives the histogram of one child node from those of its parent and sibling, which is reported to increase efficiency and accelerate convergence [43]. These techniques allow LightGBM to handle large datasets and categorical features efficiently, where traditional approaches often struggle with speed and capacity [46].
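The histogram idea mentioned above can be illustrated with a small Python sketch. It is a simplified illustration, not LightGBM's implementation: it assumes equal-width bins and accumulates only gradient sums, and the function names are hypothetical. It shows that the histogram of one child node can be obtained by subtracting the sibling's histogram from the parent's instead of rescanning the data.

```python
from typing import List, Sequence


def build_histogram(values: Sequence[float], gradients: Sequence[float],
                    n_bins: int, lo: float, hi: float) -> List[float]:
    """Sum the gradients that fall into each equal-width bin of [lo, hi]."""
    width = (hi - lo) / n_bins
    histogram = [0.0] * n_bins
    for value, gradient in zip(values, gradients):
        # Clamp so that value == hi lands in the last bin.
        bin_index = min(int((value - lo) / width), n_bins - 1)
        histogram[bin_index] += gradient
    return histogram


def subtract_histograms(parent: Sequence[float], child: Sequence[float]) -> List[float]:
    """Histogram of the sibling node, obtained as parent minus child."""
    return [p - c for p, c in zip(parent, child)]


values = [0.5, 1.5, 2.5, 3.5]
gradients = [1.0, 2.0, 3.0, 4.0]
parent_hist = build_histogram(values, gradients, n_bins=4, lo=0.0, hi=4.0)
left_hist = build_histogram(values[:2], gradients[:2], n_bins=4, lo=0.0, hi=4.0)
right_hist = subtract_histograms(parent_hist, left_hist)
```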
eXtreme gradient boosting algorithm (XGBoost).
The XGBoost algorithm is a gradient boosting method proposed by Chen and Guestrin in 2016 [47]. It adds a regularization term to the objective function to prevent overfitting and thereby improve prediction accuracy [48]. XGBoost constructs an ensemble of decision trees trained on different dataset partitions [49]. While growing each tree level by level, it searches for the feature and threshold that give the best split at each node, and it splits successively to refine the tree structure [48]. Finally, the scores of all trees obtained during training are summed to give the final predicted value of the response variable [50].
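The additive structure described above (trees fitted one after another, with their scores summed) can be sketched as plain gradient boosting on a single feature. This is a minimal illustration under stated assumptions, not XGBoost itself: it uses squared-error loss and depth-1 trees (stumps), and it omits the regularization term and second-order information that XGBoost adds. All names and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple

Stump = Tuple[float, float, float]  # (threshold, left_value, right_value)


def fit_stump(x: Sequence[float], residuals: Sequence[float]) -> Stump:
    """Pick the threshold on x that minimizes the squared error of the residuals."""
    candidates = sorted(set(x))[:-1]
    if not candidates:
        raise ValueError("need at least two distinct feature values")
    best, best_error = None, float("inf")
    for threshold in candidates:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if error < best_error:
            best, best_error = (threshold, left_mean, right_mean), error
    return best


def predict(base: float, stumps: List[Stump], learning_rate: float, xi: float) -> float:
    """Sum the base value and the shrunken scores of all stumps."""
    total = base
    for threshold, left_value, right_value in stumps:
        total += learning_rate * (left_value if xi <= threshold else right_value)
    return total


def fit_boosting(x: Sequence[float], y: Sequence[float],
                 n_rounds: int = 50, learning_rate: float = 0.3):
    """Fit each new stump to the residuals of the current ensemble."""
    base = sum(y) / len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        residuals = [yi - predict(base, stumps, learning_rate, xi) for xi, yi in zip(x, y)]
        stumps.append(fit_stump(x, residuals))
    return base, stumps


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [10.0, 12.0, 30.0, 32.0, 50.0, 52.0]
base, stumps = fit_boosting(x, y)
mse_mean = sum((yi - base) ** 2 for yi in y) / len(y)
mse_boost = sum((yi - predict(base, stumps, 0.3, xi)) ** 2 for xi, yi in zip(x, y)) / len(y)
```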
Support vector machine algorithm (SVM).
Support vector regression (SVR) is an important branch of the support vector machine family of machine learning algorithms [51]. When the support vector machine is used for classification it is called support vector classification, whereas the variant used for modeling and prediction of continuous outcomes is called SVR [52,53]. Since SVR is a supervised learning method, the success of its predictions depends on the training and test data sets [54]. In the linear case, the main goal of SVR is to find a function f(x) that deviates from the training targets by at most ε while being as flat as possible, so that the training data lie within the band between −ε and +ε [51]. However, many problems cannot be modeled with linear features. Therefore, in nonlinear SVR the input data are mapped into a higher-dimensional Hilbert space in which a linear regression function can be fitted [54]. Many nonlinear kernel functions are available; the kernel used in this study is the Gaussian radial basis function kernel.
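Two building blocks of the method described above can be written out directly: the ε-insensitive loss, which ignores errors inside the ±ε band, and the Gaussian radial basis function kernel. This is a hedged sketch of the textbook definitions only; it does not train an SVR model, and `gamma` and `epsilon` are free parameters whose values in the study are not stated here.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float) -> float:
    """Gaussian RBF kernel: exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def epsilon_insensitive_loss(y_true: float, y_pred: float, epsilon: float) -> float:
    """Zero inside the epsilon band, growing linearly outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


same_point = rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5)
far_point = rbf_kernel([0.0, 0.0], [3.0, 4.0], gamma=0.5)
inside_band = epsilon_insensitive_loss(10.0, 10.3, epsilon=0.5)
outside_band = epsilon_insensitive_loss(10.0, 12.0, epsilon=0.5)
```

Identical points give a kernel value of 1, and the value decays toward 0 as the points move apart, which is what lets the model fit a nonlinear relationship with a linear function in the transformed space.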
Model comparison criteria.
In this study, the performance of the models was assessed using three criteria: RMSE, MAE, and R².
Root-mean-square error. The RMSE is calculated as the square root of the Mean Squared Error (MSE), which represents the average squared difference between the observed actual output values and the model’s predicted values. RMSE provides a measure of how much the predictions deviate from the actual values, effectively serving as the standard deviation of these differences. It ranges from 0 to positive infinity, where a lower RMSE indicates predictions that are closer to the target values. In comparison, a higher RMSE reflects a more significant deviation and a wider distribution of values.
Mean Absolute Error. The MAE quantifies the error in model predictions by calculating the mean of the absolute differences between the actual and predicted values. It is obtained by summing the absolute differences for all observations in the dataset and dividing this sum by the total number of observations. The MAE ranges from 0 to positive infinity, where smaller values closer to 0 indicate better performance, as they reflect that the predicted values are more closely aligned with the actual target values.
R-Squared. The R² represents how well the model predicts the target variable. This criterion ranges from 0 to 1. When R² equals 1, the predictions perfectly align with the data, indicating absolute accuracy, while lower R² values suggest weaker predictive performance.
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

In the above equations, ŷ represents the predicted AQI variable, y denotes the actual AQI variable, ȳ symbolizes the average value of AQI, n is the total number of observations, and i refers to the index of each data point in the dataset. Because MAE and RMSE have no upper bound and depend on the scale of the target, they are less effective for comparing performance across different datasets. In contrast, R², with its range confined between 0 and 1, provides a more reliable metric for such comparisons. A model is considered effective when its RMSE and MAE values are low and its R² value approaches 1. All three metrics are suitable for evaluating models on the same dataset.
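The three criteria described in this section can be computed directly from paired observed and predicted values. The following Python sketch uses the standard definitions; the toy numbers are invented for illustration and are not AQI data from the study.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Square root of the mean squared difference."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of the absolute differences."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """One minus the ratio of residual to total sum of squares."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


observed = [3.0, 5.0, 7.0, 9.0]    # invented values
predicted = [2.0, 5.0, 8.0, 9.0]   # invented values
scores = (rmse(observed, predicted), mae(observed, predicted), r_squared(observed, predicted))
```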
Software information
The dataset was randomly split into 80% for training and 20% for testing. Ten-fold cross-validation was used for all algorithms; other k values were also tested, but 10-fold validation gave the most robust results. All statistical analyses were performed using R software [55]. The “psych” package was used for descriptive statistics, the “corrplot” package to visualize the relationships between explanatory and response variables, and the “caret” package to split the dataset into training and test sets [56–58]. The LightGBM, XGBoost, and SVR algorithms were implemented with the “lightgbm”, “xgboost”, and “e1071” packages, respectively [59–61]. The feature importances of all models were visualized using the “ggplot2” package [62].
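The splitting scheme described above can be sketched independently of any package. The study itself used R with the “caret” package; the Python below is only an illustration of the idea, with hypothetical function names and an arbitrary seed. It first holds out 20% of the row indices for testing and then partitions the remaining training indices into 10 folds.

```python
import random
from typing import List, Tuple


def train_test_split_indices(n: int, test_fraction: float,
                             seed: int) -> Tuple[List[int], List[int]]:
    """Shuffle row indices and hold out a fraction of them for testing."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices: List[int], k: int) -> List[Tuple[List[int], List[int]]]:
    """Return k (training, validation) pairs; each index is validated exactly once."""
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((training, folds[i]))
    return splits


train_idx, test_idx = train_test_split_indices(n=100, test_fraction=0.2, seed=42)
cv_splits = k_fold_indices(train_idx, k=10)
```

Hyperparameters would be chosen using only the cross-validation folds, and the held-out test indices would be used once at the end to estimate performance on unseen data.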
Results
To estimate AQI, eight years of daily air pollutant data (2016–2024) for Iğdır were used to compare different machine learning models. Table 4 presents descriptive statistics such as mean, standard deviation, minimum, and maximum values for all air quality criteria and meteorological features used in this study. For descriptive analysis and interpretation, daily measurements were aggregated into monthly averages, and the values shown in the table represent the mean monthly values across the eight-year dataset (2016–2024). This approach highlights seasonal variations and long-term trends. In Table 4, n denotes the number of observations (sample size) used in the analysis. For example, the average temperature in January was 1.67°C, with a minimum of −11.8°C and a maximum of 7.7°C. In February, the average temperature increased slightly to 2.66°C. A significant rise in temperature was observed from March onwards, with the highest values recorded in July and August (27.28°C and 27.54°C). The temperature decreased from September onwards and was 1.66°C in December (Table 4).
The precipitation parameter generally remained at low levels, with a maximum value of 29.1 mm measured in November. Relative humidity varied throughout the year, reaching its highest average of 75.44% in December. Wind speed generally remained low, with a maximum of 3.8 m s⁻¹ measured in December. Wind direction generally varied between 200° and 250° (Table 4).
The PM₁₀ concentrations increased during the winter months and averaged 181.61 µg m⁻³ in January. The PM₁₀ values decreased during the summer months but remained at high levels. SO₂, NO₂, and O₃ concentrations also showed seasonal variation, with SO₂ values increasing significantly during the winter months. The SO₂ average was 13.05 µg m⁻³ in December (Table 4).
In terms of AQI, the highest values were seen in January (172.91) and November (168.16), indicating that air quality was negatively affected in the winter months. During the summer months, AQI remained at lower levels. These results show that air quality in the region is strongly affected by seasonal changes and deteriorates in winter, which can be attributed to increased fossil fuel use and winter meteorological conditions. These findings can inform measures to improve air quality in the region (Table 4).
Fig 2 illustrates the correlation coefficients among environmental factors (temperature, precipitation, relative humidity, wind direction, wind speed, PM10, SO2, NO2, and O3) and their relationship with AQI. The coefficients range between −1 and +1, where the magnitude indicates the strength and the sign shows the direction of the relationship.
According to the correlation matrix, the highest correlation coefficient is between PM10 and AQI, with a coefficient of 0.81. In addition, a relatively high and positive correlation of 0.56 between NO2 and AQI indicates that increases in NO2 levels may be associated with increased AQI values. Similarly, the correlation coefficient 0.40 between SO2 and AQI is also noteworthy. However, the correlation of −0.25 between temperature and AQI suggests that temperature increases may be associated with decreased AQI values. Lower correlation values are observed between other variables (e.g., precipitation, relative humidity, wind direction) and AQI, indicating that the effect of these variables on AQI may be less pronounced.
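A correlation coefficient of the kind reported above can be computed as follows. This sketch assumes the Pearson coefficient, which the text does not name explicitly, and the short series are invented for illustration rather than taken from the dataset.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


pollutant = [20.0, 40.0, 60.0, 80.0]       # invented values
index_up = [30.0, 55.0, 80.0, 105.0]       # rises with the pollutant
index_down = [105.0, 80.0, 55.0, 30.0]     # falls as the pollutant rises
r_positive = pearson_r(pollutant, index_up)
r_negative = pearson_r(pollutant, index_down)
```

A value near +1 means the two series rise together, a value near −1 means one falls as the other rises, and values near 0 indicate little linear association.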
Hyperparameter optimization is a critical step in machine learning model development, as it directly influences predictive performance and generalization ability. Identifying the optimal parameter combinations helps ensure that the models are neither underfitted nor overfitted, thereby improving the reliability of AQI forecasts. Fig 3 presents the three-dimensional surface plots for the optimized model results on the training set, including the XGBoost (top), LightGBM (middle), and SVM (bottom) models. These plots illustrate the model performance under different hyperparameter combinations, where the x- and y-axes represent the hyperparameters and the z-axis represents the corresponding performance metric.
In the XGBoost section (top), the model performance shows relatively low variance across different hyperparameter settings, indicating stable results under specific parameter selections. The performance metric exhibits significant improvement and reaches peak values for certain hyperparameter combinations, highlighting the optimal conditions for the model (Fig 3). Additionally, error values are lower where the R² value is maximized, confirming the importance of model-specific hyperparameter tuning (Fig 3).
The LightGBM section (middle) includes three plots representing R², RMSE, and MAE metrics. The R² plot demonstrates how well the model explains variance in the dataset, with high values for specific parameter settings. The RMSE and MAE plots indicate the magnitude of prediction errors, where consistently low values across a wide range of hyperparameters suggest the model’s ability to make accurate and reliable predictions (Fig 3).
The SVM section (bottom) also displays R², RMSE, and MAE values, providing critical insights for optimizing the hyperparameters of the SVM model. The variations in these metrics highlight the model’s sensitivity to hyperparameter tuning, showing that proper selection of hyperparameters significantly impacts overall performance (Fig 3). The trends in these plots guide further optimization efforts to enhance the model’s predictive accuracy.
This comprehensive visualization allows for a comparative evaluation of the models, facilitating a deeper understanding of their respective performances under different parameter settings (Fig 3).
Fig 4 presents the three-dimensional surface plots for the optimized model results on the test set, showcasing the XGBoost (top), LightGBM (middle), and SVM (bottom) models. These visualizations demonstrate how model performance varies with different hyperparameter settings, where the x- and y-axes correspond to the selected hyperparameters and the z-axis reflects the associated performance metric, thereby highlighting the sensitivity of each model to parameter tuning.
In the XGBoost section (top), the performance metrics—R², RMSE, and MAE—demonstrate a pattern similar to the training set results. The consistency between the train and test sets indicates that the model maintains stable predictive capabilities across different datasets. Specifically, the plots reveal that high R² values correspond to lower RMSE and MAE, confirming that the selected hyperparameter combinations contribute to accurate model predictions (Fig 4).
The LightGBM section (middle) also provides insights into the model’s predictive success using R², RMSE, and MAE metrics. The R² plot highlights regions where the model effectively explains variance, particularly for certain hyperparameter settings. The RMSE and MAE plots indicate how prediction errors fluctuate under different conditions, with lower values signifying better performance (Fig 4). The model demonstrates robust generalization ability as it maintains low error rates across a broad range of hyperparameter settings.
The SVM section (bottom) displays performance variations based on different hyperparameter choices. The R² plot illustrates the model’s explanatory power, while RMSE and MAE plots highlight fluctuations in prediction errors. Notably, the wave-like structures in the plots indicate that the model is highly sensitive to specific hyperparameter selections. This sensitivity emphasizes the importance of fine-tuning to achieve optimal performance. Additionally, the results provide valuable insights into the model’s generalization capacity on test data, aiding in hyperparameter optimization (Fig 4).
By consolidating these models into a single figure, Fig 4 facilitates a comparative evaluation, allowing for a clearer understanding of how each model performs on the test dataset under various hyperparameter settings.
Table 5 compares the goodness of fit criteria obtained with optimal hyperparameter values when LightGBM, XGBoost, and SVM algorithms are used in AQI estimation. LightGBM algorithm observed R2 values as 0.922 and 0.889 in training and test datasets, respectively, indicating that the model can explain the variance in the dataset to a large extent. RMSE and MAE values for the training set were measured as 18.661 and 5.666, respectively, while for the test set, these values were measured as 20.764 and 6.777, indicating that the model has slightly higher error rates in the test set.
The XGBoost algorithm performs better than the other models, with R² values indicating an almost perfect fit (0.999 in training, 0.994 in test). RMSE and MAE are extremely low on the training set (RMSE 0.234, MAE 0.158) and rise to 4.84 and 0.972 on the test set, suggesting that the model may have overfitted the training data. The SVM model showed a moderate fit, with an R² of 0.782 on both the training and test sets; RMSE and MAE were 28.824 and 12.233 on the training set and 31.136 and 13.546 on the test set. These results show that the SVM model performs worse than the other two models on this dataset. Overall, the XGBoost algorithm stands out as the best-performing model for AQI estimation, with the highest R² values and the lowest error metrics. The performance of LightGBM and SVM should be evaluated with more comprehensive analyses, especially in terms of generalization capability and error rates.
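The overfitting remark above can be made concrete by comparing training and test errors directly. The following is a minimal sketch, not part of the original analysis: it takes the Table 5 values quoted in the text and computes test-to-train error ratios. The function name `generalization_gap` and the dictionary layout are illustrative assumptions.

```python
# Table 5 values as quoted in the text, stored as (train, test) pairs.
reported = {
    "XGBoost":  {"R2": (0.999, 0.994), "RMSE": (0.234, 4.84),    "MAE": (0.158, 0.972)},
    "LightGBM": {"R2": (0.922, 0.889), "RMSE": (18.661, 20.764), "MAE": (5.666, 6.777)},
    "SVM":      {"R2": (0.782, 0.782), "RMSE": (28.824, 31.136), "MAE": (12.233, 13.546)},
}


def generalization_gap(metrics):
    """Return the drop in R2 and the test/train error ratios for one model.

    Hypothetical helper for illustration; a ratio near 1 means the test
    error is close to the training error.
    """
    r2_train, r2_test = metrics["R2"]
    rmse_train, rmse_test = metrics["RMSE"]
    mae_train, mae_test = metrics["MAE"]
    return {
        "r2_drop": r2_train - r2_test,
        "rmse_ratio": rmse_test / rmse_train,
        "mae_ratio": mae_test / mae_train,
    }


gaps = {name: generalization_gap(m) for name, m in reported.items()}

for name, gap in gaps.items():
    print(
        f"{name:9s} R2 drop={gap['r2_drop']:.3f} "
        f"RMSE ratio={gap['rmse_ratio']:.2f} MAE ratio={gap['mae_ratio']:.2f}"
    )
```

With the reported values, the test RMSE of XGBoost is roughly 20 times its training RMSE, while the corresponding ratios for LightGBM and SVM stay close to 1.1. The absolute test errors of XGBoost nevertheless remain the lowest of the three models.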
Fig 5 presents the variable importance levels for the three models—XGBoost (Fig 5a), LightGBM (Fig 5b), and SVM (Fig 5c)—and their respective impacts on AQI estimates. These plots highlight the significance of different variables in predicting AQI, offering insights into how each model prioritizes features.
(a) XGBoost; (b) LightGBM; (c) SVM.
In the XGBoost section (Fig 5a), PM₁₀ emerges as the most influential variable with an importance score of 0.974, followed by O₃, SO₂, and NO₂ with scores of 0.008, 0.007, and 0.004, respectively. Other features, such as Wind Direction, Relative Humidity, and Temperature, hold lower importance levels, indicating their relatively minor impact on AQI predictions.
The LightGBM section (Fig 5b) presents a variable importance distribution similar to that of XGBoost. PM₁₀ remains the most critical predictor with a score of 0.955, while O₃, SO₂, and NO₂ follow with importance values of 0.011, 0.010, and 0.009, respectively. Wind Direction, Wind Speed, and Temperature are identified as less significant features in the AQI estimation process.
The SVM section (Fig 5c) exhibits a different variable importance distribution from the other two models. PM₁₀ still has the highest importance, with a score of 0.330, but the ranking of the secondary variables shifts. Precipitation, Wind Speed, and Relative Humidity gain prominence, with scores of 0.103, 0.081, and 0.080, respectively. Conversely, Temperature, SO₂, and O₃ hold lower importance levels, suggesting a diminished role in AQI predictions within this model. This finding is consistent with Choudhary et al. [63], who reported that various machine learning algorithms such as RF, SVM, Bagged MARS, and BRNN provided reliable predictions of particulate matter and gaseous pollutants, supporting the overall effectiveness of ML-based approaches in air quality studies.
A comparative analysis of these models indicates that XGBoost and LightGBM yield highly similar variable importance rankings, identifying the same key predictors for AQI estimation. This alignment reinforces the reliability of these models in capturing influential environmental factors. Meanwhile, SVM assigns relatively lower importance to PM₁₀ and emphasizes different meteorological variables, reflecting differences in how the models process and interpret input features.
By consolidating these variable importance analyses into a single figure, Fig 5 facilitates a clearer comparison across models, offering a comprehensive understanding of how each algorithm prioritizes different features in AQI estimation.
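The text does not state how the variable importance scores were computed for each model. For readers who wish to compare models on a common footing, permutation importance is one model-agnostic option: each feature column is shuffled in turn and the resulting increase in RMSE is recorded. The sketch below is an assumption-laden illustration, not the study's procedure; it uses synthetic data and a hypothetical `predict` function standing in for a fitted model.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def permutation_importance(predict, rows, targets, n_repeats=10, seed=0):
    """Mean increase in RMSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = rmse(targets, [predict(r) for r in rows])
    scores = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving the other features untouched.
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(rmse(targets, [predict(r) for r in shuffled]) - baseline)
        scores.append(sum(increases) / n_repeats)
    return scores


# Synthetic stand-in data (NOT the study's data): feature 0 dominates the target,
# feature 1 has a small effect, and feature 2 has none.
data_rng = random.Random(42)
rows = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(300)]
targets = [5.0 * r[0] + 0.5 * r[1] + data_rng.gauss(0, 0.1) for r in rows]


def predict(row):
    """Hypothetical stand-in for a fitted model; any callable row -> value works."""
    return 5.0 * row[0] + 0.5 * row[1]


scores = permutation_importance(predict, rows, targets)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(ranking, [round(s, 3) for s in scores])
```

Because the same procedure applies to any fitted model, scores obtained this way are directly comparable across tree-based and kernel-based learners, which native importance measures generally are not.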
In the present study, the XGBoost model exhibits an impressive goodness of fit, with R² values of 0.999 and 0.994 on the training and test sets, respectively, under the optimal hyperparameter combination. These high R² values indicate that the model explains the variance in the dataset extremely well. The RMSE and MAE values of the XGBoost model are much lower than those of the other models, indicating that its estimates are closer to the actual values and that the model is therefore more reliable. While the LightGBM and SVM models also produce consistent results, the superior performance of XGBoost makes it the preferred choice for AQI estimation in terms of both the accuracy of the estimates and the overall reliability of the model.
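For reference, the three goodness-of-fit criteria discussed above have the following standard definitions. The notation is assumed here rather than taken from the text: y_i is the observed AQI, ŷ_i the predicted AQI, ȳ the mean of the observed values, and n the number of observations.

```latex
\begin{align}
R^2 &= 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}, \\
\mathrm{RMSE} &= \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}, \\
\mathrm{MAE} &= \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert .
\end{align}
```

RMSE and MAE are expressed in AQI units, whereas R² is dimensionless. Some software packages instead report R² as the squared correlation between observed and predicted values, which can differ slightly from the definition given here.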
Discussion
Monitoring and improving air quality is essential for both public authorities and individuals seeking to reduce environmental and health risks. Accurate AQI estimation supports these efforts, and machine learning algorithms provide powerful tools for this task. In this study, three algorithms (SVM, LightGBM, and XGBoost) were applied to AQI prediction using pollutant and meteorological data from Iğdır, Türkiye. All three models achieved satisfactory performance, but XGBoost consistently outperformed the others, yielding R² = 0.999, RMSE = 0.234, and MAE = 0.158 on the training set and R² = 0.994, RMSE = 4.84, and MAE = 0.972 on the test set. These results demonstrate that XGBoost is an effective and reliable model for AQI prediction, with reduced risk of overfitting compared to previous approaches.
These findings confirm that XGBoost is particularly well suited to handling complex datasets and capturing the interactions between multidimensional features. Similar performance of XGBoost has also been reported in other studies [64,65]. For example, Van et al. [13] compared the performance of the Decision Tree, Random Forest, and XGBoost algorithms and reported that XGBoost gave the best results in terms of R² (0.9993), RMSE (2.5359), and MAE (1.2844) on two different datasets.
Although the LightGBM model showed lower performance than XGBoost, it provided a good level of accuracy in AQI prediction. R² values for LightGBM were 0.922 on the training set and 0.889 on the test set. RMSE values were 18.661 and 20.764, and MAE values were 5.666 and 6.777, respectively. These findings show that LightGBM is a viable alternative where a relatively less complex model is preferred. Ravindiran et al. [24], in particular, reported that CatBoost stood out in their similar study, but that LightGBM also produced reliable results in most cases.
The SVM model, on the other hand, exhibited lower performance compared to the other two models. The R² value was recorded as 0.782 in both training and test sets. The RMSE values were 28.824 and 31.136, and the MAE values were 12.233 and 13.546, respectively. These results indicate that SVM should be optimized for more complex datasets and multidimensional features. Liu et al. [33] emphasized that an SVR-based model was successful in some cases, but its generalizability may be limited.
The challenges associated with traditional air quality monitoring methods further emphasize the importance of using machine learning for AQI estimation. Traditional approaches often rely on fixed monitoring stations, which provide limited spatial coverage and cannot effectively capture local pollution events [66,67]. In contrast, machine learning models can integrate diverse datasets, including real-time sensor data and historical pollution records, to improve prediction accuracy. This comprehensive approach is particularly important in urban environments, where pollution sources can vary significantly between areas [68,69]. Moreover, integrating meteorological parameters into predictive models is crucial because weather conditions such as temperature, humidity, and wind speed can significantly affect pollutant distribution and concentration levels [70,71]. In this study, the meteorological parameters (temperature, precipitation, wind direction, wind speed, and relative humidity) proved to be important inputs for AQI estimation, and air pollution varied with environmental conditions. Sigamani and Venkatesan [30] likewise stated that meteorological factors play an important role in AQI estimation, affecting pollution concentrations by 60–74%.
This study’s findings align with various machine learning-based AQI prediction studies in the literature. Ravindiran et al. [24] highlighted that the CatBoost model performed best, with R² = 0.9998 and RMSE = 0.76; in this study, the performance of XGBoost is close to that of CatBoost. Similarly, Liu et al. [33] found that an SVR-based model was superior in AQI prediction (R² = 0.9766), but their study did not cover newer algorithms such as XGBoost.
The study’s findings also contribute to the ongoing discourse on overfitting in machine learning models. Overfitting occurs when a model learns the noise in the training data rather than the underlying patterns, leading to poor generalization to unseen data. The ability of the XGBoost model to maintain consistent performance across the reported metrics suggests that it effectively mitigates overfitting, a concern noted in previous research on machine learning applications in environmental science [72,73]. This feature is particularly valuable in the context of air quality prediction, where accurate prediction is essential for public health interventions and policymaking.
Moreover, the implications of improved AQI prediction extend beyond purely academic interest; there are real-world applications in public health and urban planning. Accurate AQI predictions can inform government responses to pollution events, enabling timely public health warnings and interventions. For example, during periods of high pollution, authorities can impose traffic restrictions or encourage the use of public transportation to reduce exposure risks [74,75]. Additionally, individuals can use AQI predictions to make informed decisions about outdoor activities, reducing the health risks associated with poor air quality [76,77]. In this regard, Kumar et al. [78] further noted that conventional AQI measures may underestimate actual health risks, highlighting the importance of developing reliable forecasting systems that can better inform public protection strategies.
The study’s results are significant, particularly in the context of recent global events such as the COVID-19 pandemic. During the pandemic, several studies demonstrated that poor air quality aggravated respiratory conditions and increased vulnerability to severe outcomes from SARS-CoV-2 infection [79,80]. In this regard, the ability to accurately predict AQI is especially valuable, as it can support early interventions and public health strategies aimed at reducing exposure to harmful pollutants during health crises. This makes the study’s findings highly relevant for protecting public health, particularly in densely populated urban areas where pollution levels are often elevated.
Most of the studies summarized in Table 2 focus on short-term datasets, a limited number of pollutant variables, or exclusively metropolitan regions. Moreover, many of these studies do not incorporate a long-term combined evaluation of meteorological and pollutant parameters. In this context, our study contributes to the literature by using a long-term dataset that integrates both pollutant and meteorological variables. Conducting such a long-term analysis in a geopolitically sensitive and climatically unique region like Iğdır further reinforces the originality and significance of our findings.
In conclusion, the application of machine learning algorithms, particularly XGBoost, to predict AQI in Iğdır, Türkiye, demonstrates the potential of these technologies to improve air quality management. The study’s findings not only highlight the effectiveness of machine learning in this area but also underscore the importance of integrating diverse datasets and addressing overfitting to increase model reliability. As urbanization continues to increase and air pollution remains a pressing global issue, developing robust predictive models will be important for creating healthier environments and supporting public well-being.
Conclusion
Air pollution levels in Iğdır have increased significantly, particularly during the winter months. The average AQI values in January and November were 172.9 and 168.2, respectively, which correspond mostly to the “Unhealthy” category according to the TNAQI classification. PM₁₀ was identified as the primary pollutant.
The analysis and model comparison results indicate that the XGBoost model achieved significantly better performance in AQI prediction than the LightGBM and SVM models. The R² values of the XGBoost model were exceptionally high on both the training (0.999) and test (0.994) sets, and the training-set error metrics were also low (RMSE = 0.234, MAE = 0.158). These results show that the model explains the variance in the dataset very well and that the predictions are highly accurate.
The LightGBM model also exhibited robust results, with R² values of 0.922 and 0.889 on the training and test sets, respectively, and provided acceptable performance, with RMSE and MAE values lower than those of SVM. The SVM model, in contrast, exhibited lower reliability than the other two models, with R² remaining at 0.782 on both sets and relatively high RMSE and MAE values, indicating that it is not as effective as the others.
Considering these findings, the XGBoost model offers a more reliable and effective alternative for AQI estimation than the other models. These results underline the high adaptability and accuracy of XGBoost, especially for complex data structures and when evaluating the interaction of various environmental parameters. The proposed approach can serve as a valuable guide for future modeling studies and as a basis for applications on larger datasets.
Acknowledgments
The author is grateful to the Republic of Türkiye Ministry of Environment, Urbanization and Climate Change for providing the air quality and meteorological data.
References
- 1. Babatola SS. Global burden of diseases attributable to air pollution. J Public Health Afr. 2018;9(3):813. pmid:30687484
- 2. Kumar A, Patil RS, Dikshit AK, Kumar R. Comparison of predicted vehicular pollution concentration with air quality standards for different time periods. Clean Techn Environ Policy. 2016;18(7):2293–303.
- 3. Dadkhah-Aghdash H, Rasouli M, Rasouli K, Salimi A. Detection of urban trees sensitivity to air pollution using physiological and biochemical leaf traits in Tehran, Iran. Sci Rep. 2022;12(1):15398. pmid:36100647
- 4. Zhou C, Li S, Wang S. Examining the impacts of urban form on air pollution in developing countries: a case study of China’s megacities. Int J Environ Res Public Health. 2018;15(8):1565. pmid:30042324
- 5. Patton AP, Perkins J, Zamore W, Levy JI, Brugge D, Durant JL. Spatial and temporal differences in traffic-related air pollution in three urban neighborhoods near an interstate highway. Atmos Environ (1994). 2014;99:309–21. pmid:25364295
- 6. Argun YA, Tırınk S, Bayram T. Effect of urban factors on air pollution of Igdir. Black Sea J Eng Sci. 2019;2(4):123–30.
- 7. Tırınk S, Öztürk B. Evaluation of PM10 concentration by using Mars and XGBOOST algorithms in Iğdır Province of Türkiye. Int J Environ Sci Technol. 2023;20:5349–58.
- 8. Tui Y, Qiu J, Wang J, Fang C. Analysis of spatio-temporal variation characteristics of main air pollutants in Shijiazhuang city. Sustainability. 2021;13(2):941.
- 9. Erawan M, Karuniasa M. Spatial dynamics of air pollution in Tangerang City, Jabodetabek metropolitan area. In: Proceedings of the 13th International Interdisciplinary Studies Seminar, IISS 2019. 30–1.
- 10. Kumar RP, Prakash A, Singh R, Kumar P. Machine learning-based prediction of hazards fine PM2.5 concentrations: a case study of Delhi, India. Discov Geosci. 2024;2(1).
- 11. Cho H-S, Choi M. Effects of compact urban development on air pollution: empirical evidence from Korea. Sustainability. 2014;6(9):5968–82.
- 12. Abas A, Aiyub K, Awang A. Biomonitoring potentially toxic elements (PTES) using lichen transplant usnea misaminensis: a case study from Malaysia. Sustainability. 2022;14(12):7254.
- 13. Van NH, Van Thanh P, Tran DN, Tran DT. A new model of air quality prediction using lightweight machine learning. Int J Environ Sci Technol. 2023;20:2983–94.
- 14. EPA. National air quality and emissions trends report 1997. 454/R98-016. EPA; 1997.
- 15. EPA. Air quality index reporting; final rule. Fed Reg. CFR; 1999;Part III.
- 16. TNAQI. Republic of Türkiye Ministry of Environment, Urbanisation and Climate Change. The Turkish National Air Quality Index (TNAQI). 2025. https://dathm.csb.gov.tr/hava-kalitesi-indeksi-i-89066
- 17. Yang Q, Yuan Q, Li T, Shen H, Zhang L. The Relationships between PM2.5 and meteorological factors in China: seasonal and regional variations. Int J Environ Res Public Health. 2017;14(12):1510. pmid:29206181
- 18. Sekula P, Ustrnul Z, Bokwa A, Bochenek B, Zimnoch M. Random forests assessment of the role of atmospheric circulation in PM10 in an urban area with complex topography. Sustainability. 2022;14(6):3388.
- 19. Bellinger C, Mohomed Jabbar MS, Zaïane O, Osornio-Vargas A. A systematic review of data mining and machine learning for air pollution epidemiology. BMC Public Health. 2017;17(1):907. pmid:29179711
- 20. Liu Y, Wang P, Li Y, Wen L, Deng X. Air quality prediction models based on meteorological factors and real-time data of industrial waste gas. Sci Rep. 2022;12(1):9253. pmid:35661145
- 21. Zhang B, Duan M, Sun Y, Lyu Y, Hou Y, Tan T. Air quality index prediction in six major chinese urban agglomerations: a comparative study of single machine learning model, ensemble model, and hybrid model. Atmosphere. 2023;14(10):1478.
- 22. Ansari A, Quaff AR. Advanced machine learning techniques for precise hourly air quality index (AQI) prediction in Azamgarh, India. Int J Environ Res. 2025;19:1–31.
- 23. Aram SA, Nketiah EA, Saalidong BM, Wang H, Afitiri AR, Akoto AB, et al. Machine learning-based prediction of air quality index and air quality grade: a comparative analysis. Int J Environ Sci Technol. 2024;21:1345–60.
- 24. Ravindiran G, Hayder G, Kanagarathinam K, Alagumalai A, Sonne C. Air quality prediction by machine learning models: a predictive study on the indian coastal city of Visakhapatnam. Chemosphere. 2023;338:139518. pmid:37454985
- 25. Gupta NS, Mohta Y, Heda K, Armaan R, Valarmathi B, Arulkumaran G. Prediction of air quality index using machine learning techniques: a comparative analysis. J Environ Public Health. 2023;2023:1–26.
- 26. Maltare NN, Vahora S. Air quality index prediction using machine learning for Ahmedabad city. Digit Chem Eng. 2023;7:1–9.
- 27. Zhang H, Ge L, Zhang G, Fan J, Li D, Xu C. A two-stage intrusion detection method based on light gradient boosting machine and autoencoder. Math Biosci Eng. 2023;20(4):6966–92. pmid:37161137
- 28. Sarkar N, Gupta R, Keserwani PK, Govil MC. Air quality index prediction using an effective hybrid deep learning model. Environ Pollut. 2022;315:120404. pmid:36240962
- 29. Pant A, Sharma S, Bansal M, Narang M. Comparative analysis of supervised machine learning techniques for AQI prediction. In: 2022 International conference on advanced computing technologies and applications (ICACTA), 2022. 1–4. doi: https://doi.org/10.1109/icacta54488.2022.9753636
- 30. Sigamani S, Venkatesan R. Air quality index prediction with influence of meteorological parameters using machine learning model for IoT application. Arab J Geosci. 2022;15:1–12.
- 31. Janarthanan R, Partheeban P, Somasundaram K, Navin Elamparithi P. A deep learning approach for prediction of air quality index in a metropolitan city. Sustain Cities Soc. 2021;67:102720.
- 32. Castelli M, Clemente FM, Popovič A, Silva S, Vanneschi L. A machine learning approach to predict air quality in California. Complexity. 2020;2020:1–23.
- 33. Liu H, Li Q, Yu D, Gu Y. Air quality index and air pollutant concentration prediction based on machine learning algorithms. Appl Sci. 2019;9(19):1–9.
- 34. Nebenzal A, Fishbain B. Long-term forecasting of nitrogen dioxide ambient levels in metropolitan areas using the discrete-time Markov model. Environ Model Softw. 2018;107:175–85.
- 35. Liu D, Lee S, Huang Y, Chiu C. Air pollution forecasting based on attention‐based LSTM neural network and ensemble learning. Expert Syst. 2019;37(3).
- 36. Liang Y-C, Maimury Y, Chen AH-L, Juarez JRC. Machine learning-based prediction of air quality. Appl Sci. 2020;10(24):9151.
- 37. Shen J, Valagolam D, McCalla S. Prophet forecasting model: a machine learning approach to predict the concentration of air pollutants (PM2.5, PM10, O3, NO2, SO2, CO) in Seoul, South Korea. PeerJ. 2020;8:e9961. pmid:32983651
- 38. Dairi A, Harrou F, Khadraoui S, Sun Y. Integrated multiple directed attention-based deep learning for improved air pollution forecasting. IEEE Trans Instrum Meas. 2021;70:1–15.
- 39. Pappa A, Kioutsioukis I. Forecasting particulate pollution in an urban area: from copernicus to Sub-Km scale. Atmosphere. 2021;12(7):881.
- 40. Ke G, Meng Q, Finley T. LightGBM: a highly efficient gradient boosting decision tree. In: Proceedings of the 31st conference on neural information processing systems (NeurIPS 2017), Long Beach, CA, USA, 2017.
- 41. Li F, Zhang L, Chen B. A light gradient boosting machine for remaining useful life estimation of aircraft engines. In: Proceedings of the International conference on intelligent transportation, Maui, HI, USA, 2018.
- 42. Hajihosseinlou M, Maghsoudi A, Ghezelbash R. A novel scheme for mapping of MVT-type Pb–Zn prospectivity: LightGBM, a highly efficient gradient boosting decision tree machine learning algorithm. Nat Resour Res. 2023;32:2417–38.
- 43. Cai J, Li X, Tan Z, Peng S. An assembly-level neutronic calculation method based on LightGBM algorithm. Annal Nuclear Energy. 2021;150:107871.
- 44. Zhang H, Ge L, Wang Z. A high-performance intrusion detection system using LightGBM based on oversampling and undersampling. In: International conference on intelligent computing, 2022.
- 45. Xiao X, Shao YT, Luo ZT, Qiu WR. m5C-HPromoter: an ensemble deep learning predictor for identifying 5-methylcytosine sites in human promoters. Curr Bioinform. 2022;5:452–61.
- 46. Cui B, Ye Z, Zhao H, Renqing Z, Meng L, Yang Y. Used car price prediction based on the iterative framework of XGBoost+LightGBM. Electronics. 2022;11(18):2932.
- 47. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International conference on knowledge discovery and data mining, San Francisco, CA, USA, 2016. 785–94.
- 48. Szczepanek R. Daily streamflow forecasting in mountainous catchment using XGBoost, LightGBM and CatBoost. Hydrology. 2022;9(12):226.
- 49. Faraz A, Tırınk C, Önder H, Şen U, Ishaq HM, Tauqir NA, et al. Usage of the XGBoost and MARS algorithms for predicting body weight in Kajli sheep breed. Trop Anim Health Prod. 2023;55(4):276. pmid:37500805
- 50. Yang Y, Wu Y, Wang P, Jiali X. Stock price prediction based on XGBoost and LightGBM. E3S Web Conf. 2021;275:01040.
- 51. Nguyen QT, Fouchereau R, Frénod E, Gerard C, Sincholle V. Comparison of forecast models of production of dairy cows combining animal and diet parameters. Comput Electron Agric. 2020;170:1–10.
- 52. Smola AJ, Schölkopf B. A tutorial on support vector regression. Stat Comput. 2004;14:199–222.
- 53. Kavaklıoğlu K. Modeling and prediction of Turkey’s electricity consumption using support vector regression. Appl Energy. 2011;88:368–75.
- 54. Patel AK, Chatterjee S, Gorai AK. Development of a machine vision system using the support vector machine regression (SVR) algorithm for the online prediction of iron ore grades. Earth Sci Inform. 2018;12(2):197–210.
- 55. R Core Team. R: a language and environment for statistical computing; R foundation for statistical computing: Vienna, Austria, 2022. https://www.r-project.org/
- 56. Wei T, Simko V. Visualization of a correlation matrix. R Package; 2021.
- 57. Revelle W. Psych: procedures for personality and psychological research. Evanston, IL, USA: Northwestern Scholars; 2022.
- 58. Kuhn M. Caret: classification and regression training. Astrophysics Data System; 2022.
- 59. Chen T, He T, Benesty M. xgboost: extreme gradient boosting. 2023.
- 60. Meyer D, Dimitriadou E, Hornik K, Weingessel A, Leisch F. e1071: Misc functions of the department of statistics, probability theory group (Formerly: E1071), TU Wien. 2024.
- 61. Shi Y, Ke G, Soukhavong D. Lightgbm: light gradient boosting machine. 2023.
- 62. Wickham H. ggplot2: elegant graphics for data analysis. New York, NY, USA: Springer; 2016.
- 63. Choudhary A, Kumar P, Pradhan C, Sahu SK, Chaudhary SK, Joshi PK, et al. Evaluating air quality and criteria pollutants prediction disparities by data mining along a stretch of urban-rural agglomeration includes coal-mine belts and thermal power plants. Front Environ Sci. 2023;11.
- 64. Srivastava S, Kumar A, Bauddh K, Gautam AS, Kumar S. 21-day lockdown in india dramatically reduced air pollution indices in Lucknow and New Delhi, India. Bull Environ Contam Toxicol. 2020;105(1):9–17. pmid:32495123
- 65. Benchrif A, Wheida A, Tahri M, Shubbar RM, Biswas B. Air quality during three covid-19 lockdown phases: AQI, PM2.5 and NO2 assessment in cities with more than 1 million inhabitants. Sustain Cities Soc. 2021;74:103170. pmid:34290956
- 66. Zhang Z, Xue T, Jin X. Effects of meteorological conditions and air pollution on COVID-19 transmission: evidence from 219 Chinese cities. Sci Total Environ. 2020;741:140244. pmid:32592975
- 67. Sarroeira R, Henriques J, Sousa AM, Ferreira da Silva C, Nunes N, Moro S, et al. Monitoring sensors for urban air quality: the case of the municipality of Lisbon. Sensors (Basel). 2023;23(18):7702. pmid:37765759
- 68. Dong D, Xu X, Yu H, Zhao Y. The impact of air pollution on domestic tourism in China: a spatial econometric analysis. Sustainability. 2019;11(15):4148.
- 69. Ethan CJ, Mokoena KK, Yu Y. Air pollution status in 10 mega-cities in China during the initial phase of the COVID-19 outbreak. Int J Environ Res Public Health. 2021;18(6):3172. pmid:33808577
- 70. Kim B. Do air quality alerts affect household migration?. Southern Economic Journal. 2018;85(3):766–95.
- 71. Wu H, Zhang Y, Yu Q, Ma W. Application of an integrated Weather Research and Forecasting (WRF)/CALPUFF modeling tool for source apportionment of atmospheric pollutants for air quality management: a case study in the urban area of Benxi, China. J Air Waste Manag Assoc. 2018;68(4):347–68. pmid:29020513
- 72. Balakrishnan K, Dey S, Gupta T, Dhaliwal RS, Brauer M, Cohen AJ, et al. The impact of air pollution on deaths, disease burden, and life expectancy across the states of India: the global burden of disease study 2017. Lancet Planet Health. 2019;3(1):e26–39. pmid:30528905
- 73. Graça D, Reis J, Gama C, Monteiro A, Rodrigues V, Rebelo M, et al. Sensors network as an added value for the characterization of spatial and temporal air quality patterns at the urban scale. Sensors (Basel). 2023;23(4):1859. pmid:36850456
- 74. Rani N, Azid A, Khalit S, Juahir H, Samsudin M. Air pollution index trend analysis in Malaysia, 2010–15. Pol J Environ Stud. 2018;27:801–7.
- 75. Ming W, Zhou Z, Ai H, Bi H, Zhong Y. COVID-19 and air quality: evidence from China. Emerg Markets Fin Trade. 2020;56(10):2422–42.
- 76. Gao H, Yang W, Yang Y, Yuan G. Analysis of the air quality and the effect of governance policies in China’s pearl river delta, 2015–2018. Atmosphere. 2019;10(7):412.
- 77. Han C, Xu R, Zhang Y, Yu W, Zhang Z, Morawska L, et al. Air pollution control efficacy and health impacts: a global observational study from 2000 to 2016. Environ Pollut. 2021;287:1–9.
- 78. Kumar P, Choudhary A, Joshi PK, Kumar RP, Bhatla R. Machine learning models for estimating criteria pollutants and health risk-based air quality indices over eastern coast coal mine complex belts. Front Environ Sci. 2025;13.
- 79. Ignac-Nowicka J. Towards smart city: influence of air pollution on the local community of the Zabrze city in surveys and field research. Multidiscip Asp Prod Eng. 2018;1:845–50.
- 80. Gurajala S, Dhaniyala S, Matthews JN. Understanding public response to air quality using tweet analysis. Social Media Soc. 2019;5(3).