Table 1.
Recent approaches to rainfall prediction across various methodological frameworks.
Fig 1.
Workflow of the proposed pipeline.
The diagram outlines the sequential stages of data collection, preprocessing, feature engineering, model training, evaluation, and result interpretation.
Fig 2.
Rainfall intensity distribution across Bangladesh.
The figure visualizes cumulative precipitation from 2010 onward. The map highlights regional variations in rainfall patterns, offering insights into long-term hydrological trends. Base map data from Natural Earth (public domain), with additional layers and annotations by the authors.
Fig 3.
Class distribution of rainfall intensity categories.
The figure illustrates the frequency of samples across the four rainfall intensity classes (Class 0–3), highlighting the class imbalance in the dataset.
Fig 4.
Correlation matrix of meteorological variables.
The figure depicts pairwise Pearson correlation coefficients among the principal meteorological variables—temperature, humidity, and sunshine duration—and the target variable, rainfall class. Positive correlations are indicated by warmer tones and negative correlations by cooler tones. The analysis reveals humidity as the most positively correlated predictor and sunshine duration as the most negatively correlated factor influencing rainfall intensity across Bangladesh.
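As a small illustration of how such a Pearson correlation matrix is computed, the sketch below uses synthetic values and illustrative column names, not the study's data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)

# Toy frame mirroring the figure's variables (synthetic, for illustration only)
df = pd.DataFrame({
    "temperature": rng.normal(28, 3, 100),   # °C
    "humidity": rng.normal(80, 8, 100),      # %
    "sunshine": rng.normal(6, 2, 100),       # hours
})
# Synthetic target loosely following the reported pattern:
# higher humidity and lower sunshine push toward the rainy class
df["rain_class"] = (0.05 * df["humidity"] - 0.3 * df["sunshine"]
                    + rng.normal(0, 1, 100) > 1.5).astype(int)

# Pairwise Pearson correlations among all numeric columns
corr = df.corr(method="pearson")
```

The resulting 4×4 matrix is what a heatmap such as Fig 4 visualizes, with the `rain_class` row/column giving each predictor's correlation with the target.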
Table 2.
Descriptive statistics for the main variables.
Fig 5.
Regional weather variations across Bangladesh’s five hottest stations.
(A) Average temperature, (B) sunshine duration, (C) relative humidity, and (D) rainfall patterns. Data reflect annual averages from the study period.
Fig 6.
Monthly variations of weather parameters in Bangladesh.
The figure shows: (a) average rainfall (mm), (b) sunshine duration (hours), (c) relative humidity (%), and (d) temperature (°C) across a calendar year. The data highlight the characteristic monsoon pattern, with peak rainfall occurring from June to August and an inverse relationship with sunshine hours. Humidity remains persistently high (>70%) throughout the year, while temperature exhibits expected seasonal variation, with the highest values in the pre-monsoon months (April–June).
Fig 7.
Feature engineering pipeline for Feature Set 1 (standard temporal encoding).
The diagram outlines the sequential steps of temporal feature extraction, including calendar-based attributes, lag computation, and rolling window aggregation.
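The three steps named in the caption (calendar attributes, lags, rolling windows) can be sketched in pandas as follows; the column names and window sizes are illustrative, not necessarily the paper's exact configuration:

```python
import numpy as np
import pandas as pd

# Hypothetical daily weather series standing in for the station data
dates = pd.date_range("2010-01-01", periods=10, freq="D")
df = pd.DataFrame({"date": dates, "humidity": np.linspace(60, 90, 10)})

# 1. Calendar-based attributes
df["month"] = df["date"].dt.month
df["dayofyear"] = df["date"].dt.dayofyear

# 2. Lag features (previous-day and three-day lag)
df["humidity_lag_1"] = df["humidity"].shift(1)
df["humidity_lag_3"] = df["humidity"].shift(3)

# 3. Rolling-window aggregation (three-day rolling mean)
df["humidity_roll_mean_3"] = df["humidity"].rolling(window=3).mean()
```

Lag and rolling features leave NaNs in the first rows, which are typically dropped before training.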
Fig 8.
Two-stage hierarchical stacking framework.
Stage 1 produces out-of-fold (OOF) predictions from base learners; these OOF probabilities form meta-features used by the Stage 2 meta-learner for final prediction.
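The two-stage scheme can be sketched with scikit-learn; the base and meta learners below are stand-ins on toy data, not the paper's exact model choices:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Toy multi-class data standing in for the rainfall features
X, y = make_classification(n_samples=300, n_features=8, n_classes=3,
                           n_informative=5, random_state=0)

# Stage 1: out-of-fold class probabilities from a base learner.
# cross_val_predict guarantees each row's probabilities come from a model
# that never saw that row during training.
base = RandomForestClassifier(n_estimators=50, random_state=0)
oof_proba = cross_val_predict(base, X, y, cv=5, method="predict_proba")

# Stage 2: the OOF probabilities become meta-features for the meta-learner
meta = LogisticRegression(max_iter=1000).fit(oof_proba, y)
```

Using OOF rather than in-fold predictions prevents the meta-learner from overfitting to base-learner training leakage.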
Fig 9.
Mixture-of-experts (MoE) architecture.
The figure illustrates how a gating network adaptively assigns weights to expert models (Random Forest and XGBoost) for each input sample.
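A minimal sketch of per-sample gating follows; Gradient Boosting stands in for XGBoost, and the gate design (a logistic model trained on out-of-fold expert correctness) is an assumption for illustration, not the paper's specified gating network:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split

X, y = make_classification(n_samples=400, n_features=8, n_classes=3,
                           n_informative=5, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

rf = RandomForestClassifier(n_estimators=50, random_state=1)
gb = GradientBoostingClassifier(random_state=1)  # stand-in for XGBoost

# Gate labels: is the second expert correct out-of-fold on this sample?
oof_gb = cross_val_predict(gb, X_tr, y_tr, cv=3)
gate = LogisticRegression(max_iter=1000).fit(X_tr, (oof_gb == y_tr).astype(int))

rf.fit(X_tr, y_tr)
gb.fit(X_tr, y_tr)

# Per-sample mixture: the gate's probability weights each expert's output
w = gate.predict_proba(X_te)[:, 1][:, None]
proba = (1 - w) * rf.predict_proba(X_te) + w * gb.predict_proba(X_te)
```

Because each row of `proba` is a convex combination of two probability vectors, it remains a valid probability distribution.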
Fig 10.
Dynamic ensemble selection (DES) using the KNORA-U algorithm.
The figure illustrates how the KNORA-U algorithm selects classifiers (Decision Tree, Random Forest, KNN, and XGBoost) based on local neighborhood accuracy.
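The KNORA-U rule (each pool member casts one vote per validation neighbor it classifies correctly) can be hand-rolled as below; the pool and data are toy stand-ins, and libraries such as DESlib provide production implementations:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=8, n_classes=3,
                           n_informative=5, random_state=2)
# Half the data is held out as the dynamic-selection (DSEL) set
X_tr, X_dsel, y_tr, y_dsel = train_test_split(X, y, test_size=0.5, random_state=2)

pool = [DecisionTreeClassifier(random_state=2).fit(X_tr, y_tr),
        RandomForestClassifier(n_estimators=30, random_state=2).fit(X_tr, y_tr),
        KNeighborsClassifier().fit(X_tr, y_tr)]

nn = NearestNeighbors(n_neighbors=7).fit(X_dsel)

def knora_u_predict(x):
    """KNORA-U: weight each classifier's vote by how many of the query's
    DSEL neighbors it classifies correctly."""
    _, idx = nn.kneighbors(x.reshape(1, -1))
    neigh_X, neigh_y = X_dsel[idx[0]], y_dsel[idx[0]]
    votes = np.zeros(3)
    for clf in pool:
        weight = (clf.predict(neigh_X) == neigh_y).sum()
        votes[clf.predict(x.reshape(1, -1))[0]] += weight
    return int(votes.argmax())

pred = knora_u_predict(X_dsel[0])
```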
Fig 11.
Confusion matrices of the top-performing models.
The figure presents results for (a) Bidirectional LSTM, (b) LightGBM (cyclical features), (c) Gradient Boosting, and (d) Random Forest.
Table 3.
Performance comparison of ML and DL models with and without feature engineering.
Fig 12.
Unified multi-metric heatmap summarizing model performance.
The figure presents accuracy, precision, recall, and F1-score across all model categories (baseline, preprocessed, feature-engineered, and ensemble). Darker tones indicate stronger performance and highlight the relative consistency between models.
Fig 13.
Three-dimensional probability distributions of rainfall predictions.
The figure shows rainfall classification probabilities across humidity, sunshine, and temperature feature spaces using the Random Forest model. Color gradients indicate classification confidence, revealing how hygrothermal variables jointly influence rainfall occurrence.
Table 4.
Model performance after data balancing using SMOTE.
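SMOTE's core operation, interpolating between a minority sample and one of its nearest minority-class neighbors, can be sketched without the imbalanced-learn library (toy data; real use would apply `imblearn.over_sampling.SMOTE`):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
minority = rng.normal(loc=2.0, size=(20, 3))  # toy minority-class samples

def smote_sketch(X, n_new, k=5):
    """Generate n_new synthetic points by linear interpolation between a
    random minority sample and a random one of its k nearest neighbors."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(X))
        j = rng.choice(idx[i][1:])          # random neighbor, skipping self
        gap = rng.random()                  # interpolation coefficient in [0, 1)
        synth.append(X[i] + gap * (X[j] - X[i]))
    return np.array(synth)

new_samples = smote_sketch(minority, n_new=10)
```

Each synthetic point lies on a segment between two real minority samples, so the oversampled class stays inside its original feature region.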
Table 5.
Test accuracies of classical models after hyperparameter optimization using Randomized Search (RSCV) and Grid Search (GSCV).
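The RSCV/GSCV procedures correspond to scikit-learn's `RandomizedSearchCV` and `GridSearchCV`; a minimal sketch with an illustrative search space (not the paper's actual grids) follows:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=200, n_features=8, random_state=6)

# Illustrative search space; real grids would be larger
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}

# GSCV: exhaustive search over the grid with 3-fold cross-validation
gscv = GridSearchCV(RandomForestClassifier(random_state=6),
                    param_grid, cv=3).fit(X, y)

# RSCV: samples a fixed number of candidates from the same space
rscv = RandomizedSearchCV(RandomForestClassifier(random_state=6),
                          param_grid, n_iter=3, cv=3, random_state=6).fit(X, y)
```

`best_params_` and `best_score_` on each fitted search object give the tuned configuration and its cross-validated score; the final test accuracy is then measured on held-out data.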
Table 6.
Comparative performance of ensemble and stacking strategies.
Fig 14.
LIME visualization depicting local feature contributions for dry weather predictions.
The figure illustrates how the LIME algorithm identifies the most influential features responsible for predicting ‘no rain’ (dry weather) conditions.
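LIME's underlying idea, fitting a proximity-weighted linear surrogate around one instance, can be hand-rolled as below; this is a simplified sketch on toy data, not the `lime` library's full tabular explainer:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

X, y = make_classification(n_samples=300, n_features=5, random_state=4)
model = RandomForestClassifier(n_estimators=50, random_state=4).fit(X, y)

x0 = X[0]                       # instance to explain
rng = np.random.default_rng(4)

# Perturb the instance, weight perturbations by proximity to x0,
# then fit a local linear model to the black box's 'class 0' probability
Z = x0 + rng.normal(scale=0.5, size=(200, 5))
w = np.exp(-np.linalg.norm(Z - x0, axis=1) ** 2)   # proximity kernel
p = model.predict_proba(Z)[:, 0]                   # probability of class 0
surrogate = Ridge(alpha=1.0).fit(Z, p, sample_weight=w)

# Local feature contributions: the surrogate's coefficients
contributions = surrogate.coef_
```

The sign and magnitude of each coefficient indicate how that feature locally pushes the prediction toward or away from the explained class, which is what the bar plot in Fig 14 visualizes.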
Fig 15.
SHAP dependence plots for previous-day humidity across rainfall classes.
The figure shows SHAP dependence plots for Humidity_lag_1 (previous-day humidity) across rainfall classes: no rain, light rain, moderate rain, and very heavy rain. Color gradients indicate interactions with Temperature_lag_3 (three-day lag), Temperature_roll_mean_7 (seven-day rolling mean), and Humidity_roll_mean_3 (three-day rolling mean). Higher lagged humidity values correspond to stronger SHAP contributions for light and very heavy rainfall, highlighting how recent temperature and humidity persistence jointly influence precipitation.
Fig 16.
Reliability diagrams for rainfall intensity classification.
Calibration curves show the relationship between mean predicted probability and fraction of positive outcomes for each rainfall class. The diagonal dashed line represents perfect calibration. The model achieves an expected calibration error (ECE) of 0.039, demonstrating well-calibrated probabilistic predictions.
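ECE is the bin-weighted average gap between mean confidence and empirical accuracy; a minimal sketch with 10 equal-width bins and toy inputs:

```python
import numpy as np

def expected_calibration_error(conf, correct, n_bins=10):
    """ECE: sum over bins of (bin weight) * |mean confidence - accuracy|."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(conf[mask].mean() - correct[mask].mean())
    return ece

# Perfectly calibrated toy case: 80% confidence, 80% empirical accuracy
conf = np.array([0.8] * 10)
correct = np.array([1] * 8 + [0] * 2)
print(expected_calibration_error(conf, correct))  # → 0.0
```

An overconfident model (say, 90% confidence at 50% accuracy) would instead score an ECE of 0.4, so values near zero, like the 0.039 reported here, indicate trustworthy probabilities.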
Fig 17.
Prediction confidence analysis.
(a) Distribution of prediction confidence scores; 42.0% of samples exceed the high-confidence threshold of 0.8. (b) Accuracy stratified by confidence level, showing that predictions with confidence 0.9–1.0 achieve 98.9% accuracy compared to the baseline accuracy of 70.1%.
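Confidence stratification reduces to thresholding each sample's maximum predicted probability and comparing subset accuracies; a tiny worked example with hand-picked numbers (not the paper's data):

```python
import numpy as np

# Hypothetical per-sample confidence scores and correctness flags
conf = np.array([0.95, 0.92, 0.85, 0.70, 0.55, 0.45, 0.90, 0.60])
correct = np.array([1, 1, 1, 0, 1, 0, 1, 0])

high = conf >= 0.8                 # high-confidence threshold, as in Fig 17
coverage = high.mean()             # share of samples above the threshold → 0.5
acc_high = correct[high].mean()    # accuracy on high-confidence subset → 1.0
acc_all = correct.mean()           # overall baseline accuracy → 0.625
```

The coverage/accuracy trade-off shown in Fig 17 follows the same pattern: restricting to high-confidence predictions sacrifices coverage in exchange for substantially higher accuracy.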
Table 7.
Comparative overview of recent rainfall prediction and classification studies.