Table 1.
Recent approaches to rainfall prediction across various methodological frameworks.
Fig 1.
Workflow of the proposed pipeline.
The diagram outlines the sequential stages of data collection, preprocessing, feature engineering, model training, evaluation, and result interpretation.
Fig 2.
Rainfall intensity distribution across Bangladesh.
The figure visualizes cumulative precipitation from 2010 onward. The map highlights regional variations in rainfall patterns, offering insights into long-term hydrological trends. Base map data from Natural Earth (public domain), with additional layers and annotations by the authors.
Fig 3.
Class distribution of rainfall intensity categories.
The figure illustrates the frequency of samples across the four rainfall intensity classes (Class 0–3), highlighting the class imbalance in the dataset.
Fig 4.
Correlation matrix of meteorological variables.
The figure depicts pairwise Pearson correlation coefficients among the principal meteorological variables—temperature, humidity, and sunshine duration—and the target variable, rainfall class. Positive correlations are indicated by warmer tones and negative correlations by cooler tones. The analysis reveals humidity as the most positively correlated predictor and sunshine duration as the most negatively correlated factor influencing rainfall intensity across Bangladesh.
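As a small illustration of how such a Pearson correlation matrix is computed, the sketch below uses synthetic values and illustrative column names, not the study's data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)

# Toy frame mirroring the figure's variables (synthetic, for illustration only)
df = pd.DataFrame({
    "temperature": rng.normal(28, 3, 100),   # °C
    "humidity": rng.normal(80, 8, 100),      # %
    "sunshine": rng.normal(6, 2, 100),       # hours
})
# Synthetic target loosely following the reported pattern:
# higher humidity and lower sunshine push toward the rainy class
df["rain_class"] = (0.05 * df["humidity"] - 0.3 * df["sunshine"]
                    + rng.normal(0, 1, 100) > 1.5).astype(int)

# Pairwise Pearson correlations among all numeric columns
corr = df.corr(method="pearson")
```

The resulting 4×4 matrix is what a heatmap such as Fig 4 visualizes, with the `rain_class` row/column giving each predictor's correlation with the target.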
Table 2.
Descriptive statistics for the main variables.
Fig 5.
Regional weather variations across Bangladesh’s five hottest stations.
(A) Average temperature, (B) sunshine duration, (C) relative humidity, and (D) rainfall patterns. Data reflect annual averages from the study period.
Fig 6.
Monthly variations of weather parameters in Bangladesh.
The figure shows: (a) average rainfall (mm), (b) sunshine duration (hours), (c) relative humidity (%), and (d) temperature (°C) across a calendar year. The data highlight the characteristic monsoon pattern, with peak rainfall occurring from June to August and an inverse relationship with sunshine hours. Humidity remains persistently high (>70%) throughout the year, while temperature exhibits expected seasonal variation, with the highest values in the pre-monsoon months (April–June).
Fig 7.
Feature engineering pipeline for Feature Set 1 (standard temporal encoding).
The diagram outlines the sequential steps of temporal feature extraction, including calendar-based attributes, lag computation, and rolling window aggregation.
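The three steps named in the caption (calendar attributes, lags, rolling windows) can be sketched in pandas as follows; the column names and window sizes are illustrative, not necessarily the paper's exact configuration:

```python
import numpy as np
import pandas as pd

# Hypothetical daily weather series standing in for the station data
dates = pd.date_range("2010-01-01", periods=10, freq="D")
df = pd.DataFrame({"date": dates, "humidity": np.linspace(60, 90, 10)})

# 1. Calendar-based attributes
df["month"] = df["date"].dt.month
df["dayofyear"] = df["date"].dt.dayofyear

# 2. Lag features (previous-day and three-day lag)
df["humidity_lag_1"] = df["humidity"].shift(1)
df["humidity_lag_3"] = df["humidity"].shift(3)

# 3. Rolling-window aggregation (three-day rolling mean)
df["humidity_roll_mean_3"] = df["humidity"].rolling(window=3).mean()
```

Lag and rolling features leave NaNs in the first rows, which are typically dropped before training.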
Fig 8.
Two-stage hierarchical stacking framework.
Stage 1 produces out-of-fold (OOF) predictions from base learners; these OOF probabilities form meta-features used by the Stage 2 meta-learner for final prediction.
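The two-stage scheme can be sketched with scikit-learn; the base and meta learners below are stand-ins on toy data, not the paper's exact model choices:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Toy multi-class data standing in for the rainfall features
X, y = make_classification(n_samples=300, n_features=8, n_classes=3,
                           n_informative=5, random_state=0)

# Stage 1: out-of-fold class probabilities from a base learner.
# cross_val_predict guarantees each row's probabilities come from a model
# that never saw that row during training.
base = RandomForestClassifier(n_estimators=50, random_state=0)
oof_proba = cross_val_predict(base, X, y, cv=5, method="predict_proba")

# Stage 2: the OOF probabilities become meta-features for the meta-learner
meta = LogisticRegression(max_iter=1000).fit(oof_proba, y)
```

Using OOF rather than in-fold predictions prevents the meta-learner from overfitting to base-learner training leakage.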
Fig 9.
Mixture-of-experts (MoE) architecture.
The figure illustrates how a gating network adaptively assigns weights to expert models (Random Forest and XGBoost) for each input sample.
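A minimal sketch of per-sample gating follows; Gradient Boosting stands in for XGBoost, and the gate design (a logistic model trained on out-of-fold expert correctness) is an assumption for illustration, not the paper's specified gating network:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split

X, y = make_classification(n_samples=400, n_features=8, n_classes=3,
                           n_informative=5, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

rf = RandomForestClassifier(n_estimators=50, random_state=1)
gb = GradientBoostingClassifier(random_state=1)  # stand-in for XGBoost

# Gate labels: is the second expert correct out-of-fold on this sample?
oof_gb = cross_val_predict(gb, X_tr, y_tr, cv=3)
gate = LogisticRegression(max_iter=1000).fit(X_tr, (oof_gb == y_tr).astype(int))

rf.fit(X_tr, y_tr)
gb.fit(X_tr, y_tr)

# Per-sample mixture: the gate's probability weights each expert's output
w = gate.predict_proba(X_te)[:, 1][:, None]
proba = (1 - w) * rf.predict_proba(X_te) + w * gb.predict_proba(X_te)
```

Because each row of `proba` is a convex combination of two probability vectors, it remains a valid probability distribution.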
Fig 10.
Dynamic ensemble selection (DES) using the KNORA-U algorithm.
The figure illustrates how the KNORA-U algorithm selects classifiers (Decision Tree, Random Forest, KNN, and XGBoost) based on local neighborhood accuracy.
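The KNORA-U rule (each pool member casts one vote per validation neighbor it classifies correctly) can be hand-rolled as below; the pool and data are toy stand-ins, and libraries such as DESlib provide production implementations:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=8, n_classes=3,
                           n_informative=5, random_state=2)
# Half the data is held out as the dynamic-selection (DSEL) set
X_tr, X_dsel, y_tr, y_dsel = train_test_split(X, y, test_size=0.5, random_state=2)

pool = [DecisionTreeClassifier(random_state=2).fit(X_tr, y_tr),
        RandomForestClassifier(n_estimators=30, random_state=2).fit(X_tr, y_tr),
        KNeighborsClassifier().fit(X_tr, y_tr)]

nn = NearestNeighbors(n_neighbors=7).fit(X_dsel)

def knora_u_predict(x):
    """KNORA-U: weight each classifier's vote by how many of the query's
    DSEL neighbors it classifies correctly."""
    _, idx = nn.kneighbors(x.reshape(1, -1))
    neigh_X, neigh_y = X_dsel[idx[0]], y_dsel[idx[0]]
    votes = np.zeros(3)
    for clf in pool:
        weight = (clf.predict(neigh_X) == neigh_y).sum()
        votes[clf.predict(x.reshape(1, -1))[0]] += weight
    return int(votes.argmax())

pred = knora_u_predict(X_dsel[0])
```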
Fig 11.
Confusion matrices of the top-performing models.
The figure presents results for (a) Bidirectional LSTM, (b) LightGBM (cyclical features), (c) Gradient Boosting, and (d) Random Forest.
Table 3.
Performance comparison of ML and DL models with and without feature engineering.
Fig 12.
Unified multi-metric heatmap summarizing model performance.
The figure presents accuracy, precision, recall, and F1-score across all model categories (baseline, preprocessed, feature-engineered, and ensemble). Darker tones indicate stronger performance and highlight the relative consistency between models.
Fig 13.
Three-dimensional probability distributions of rainfall predictions.
The figure shows rainfall classification probabilities across humidity, sunshine, and temperature feature spaces using the Random Forest model. Color gradients indicate classification confidence, revealing how hygrothermal variables jointly influence rainfall occurrence.
Table 4.
Model performance after data balancing using SMOTE.
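SMOTE's core operation, interpolating between a minority sample and one of its nearest minority-class neighbors, can be sketched without the imbalanced-learn library (toy data; real use would apply `imblearn.over_sampling.SMOTE`):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
minority = rng.normal(loc=2.0, size=(20, 3))  # toy minority-class samples

def smote_sketch(X, n_new, k=5):
    """Generate n_new synthetic points by linear interpolation between a
    random minority sample and a random one of its k nearest neighbors."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(X))
        j = rng.choice(idx[i][1:])          # random neighbor, skipping self
        gap = rng.random()                  # interpolation coefficient in [0, 1)
        synth.append(X[i] + gap * (X[j] - X[i]))
    return np.array(synth)

new_samples = smote_sketch(minority, n_new=10)
```

Each synthetic point lies on a segment between two real minority samples, so the oversampled class stays inside its original feature region.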
Table 5.
Test accuracies of classical models after hyperparameter optimization using Randomized Search (RSCV) and Grid Search (GSCV).
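The RSCV/GSCV procedures correspond to scikit-learn's `RandomizedSearchCV` and `GridSearchCV`; a minimal sketch with an illustrative search space (not the paper's actual grids) follows:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=200, n_features=8, random_state=6)

# Illustrative search space; real grids would be larger
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}

# GSCV: exhaustive search over the grid with 3-fold cross-validation
gscv = GridSearchCV(RandomForestClassifier(random_state=6),
                    param_grid, cv=3).fit(X, y)

# RSCV: samples a fixed number of candidates from the same space
rscv = RandomizedSearchCV(RandomForestClassifier(random_state=6),
                          param_grid, n_iter=3, cv=3, random_state=6).fit(X, y)
```

`best_params_` and `best_score_` on each fitted search object give the tuned configuration and its cross-validated score; the final test accuracy is then measured on held-out data.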
Table 6.
Comparative performance of ensemble and stacking strategies.
Fig 14.
LIME visualization depicting local feature contributions for dry weather predictions.
The figure illustrates how the LIME algorithm identifies the most influential features responsible for predicting ‘no rain’ (dry weather) conditions.
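LIME's underlying idea, fitting a proximity-weighted linear surrogate around one instance, can be hand-rolled as below; this is a simplified sketch on toy data, not the `lime` library's full tabular explainer:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

X, y = make_classification(n_samples=300, n_features=5, random_state=4)
model = RandomForestClassifier(n_estimators=50, random_state=4).fit(X, y)

x0 = X[0]                       # instance to explain
rng = np.random.default_rng(4)

# Perturb the instance, weight perturbations by proximity to x0,
# then fit a local linear model to the black box's 'class 0' probability
Z = x0 + rng.normal(scale=0.5, size=(200, 5))
w = np.exp(-np.linalg.norm(Z - x0, axis=1) ** 2)   # proximity kernel
p = model.predict_proba(Z)[:, 0]                   # probability of class 0
surrogate = Ridge(alpha=1.0).fit(Z, p, sample_weight=w)

# Local feature contributions: the surrogate's coefficients
contributions = surrogate.coef_
```

The sign and magnitude of each coefficient indicate how that feature locally pushes the prediction toward or away from the explained class, which is what the bar plot in Fig 14 visualizes.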
Fig 15.
SHAP dependence plots for previous-day humidity across rainfall classes.
The figure shows SHAP dependence plots for Humidity_lag_1 (previous-day humidity) across rainfall classes: no rain, light rain, moderate rain, and very heavy rain. Color gradients indicate interactions with Temperature_lag_3 (three-day lag), Temperature_roll_mean_7 (seven-day rolling mean), and Humidity_roll_mean_3 (three-day rolling mean). Higher lagged humidity values correspond to stronger SHAP contributions for light and very heavy rainfall, highlighting how recent temperature and humidity persistence jointly influence precipitation.
Fig 16.
Reliability diagrams for rainfall intensity classification.
Calibration curves show the relationship between mean predicted probability and fraction of positive outcomes for each rainfall class. The diagonal dashed line represents perfect calibration. The model achieves an expected calibration error (ECE) of 0.039, demonstrating well-calibrated probabilistic predictions.
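ECE is the bin-weighted average gap between mean confidence and empirical accuracy; a minimal sketch with 10 equal-width bins and toy inputs:

```python
import numpy as np

def expected_calibration_error(conf, correct, n_bins=10):
    """ECE: sum over bins of (bin weight) * |mean confidence - accuracy|."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(conf[mask].mean() - correct[mask].mean())
    return ece

# Perfectly calibrated toy case: 80% confidence, 80% empirical accuracy
conf = np.array([0.8] * 10)
correct = np.array([1] * 8 + [0] * 2)
print(expected_calibration_error(conf, correct))  # → 0.0
```

An overconfident model (say, 90% confidence at 50% accuracy) would instead score an ECE of 0.4, so values near zero, like the 0.039 reported here, indicate trustworthy probabilities.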
Fig 17.
Prediction confidence analysis.
(a) Distribution of prediction confidence scores; 42.0% of samples exceed the high-confidence threshold of 0.8. (b) Accuracy stratified by confidence level, showing that predictions with confidence 0.9–1.0 achieve 98.9% accuracy compared to the baseline accuracy of 70.1%.
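Confidence stratification reduces to thresholding each sample's maximum predicted probability and comparing subset accuracies; a tiny worked example with hand-picked numbers (not the paper's data):

```python
import numpy as np

# Hypothetical per-sample confidence scores and correctness flags
conf = np.array([0.95, 0.92, 0.85, 0.70, 0.55, 0.45, 0.90, 0.60])
correct = np.array([1, 1, 1, 0, 1, 0, 1, 0])

high = conf >= 0.8                 # high-confidence threshold, as in Fig 17
coverage = high.mean()             # share of samples above the threshold → 0.5
acc_high = correct[high].mean()    # accuracy on high-confidence subset → 1.0
acc_all = correct.mean()           # overall baseline accuracy → 0.625
```

The coverage/accuracy trade-off shown in Fig 17 follows the same pattern: restricting to high-confidence predictions sacrifices coverage in exchange for substantially higher accuracy.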
Table 7.
Comparative overview of recent rainfall prediction and classification studies.