Comparison of machine learning methods in forecasting and characterizing the birch and grass pollen season

doi:10.1371/journal.pone.0332093

Fig 1.

Daily birch and grass pollen concentration.

Birch (A) and grass (B) pollen concentration data collected at the Jagiellonian University Collegium Medicum from 1991 to 2024. The solid line shows the median pollen count for each day of the year, while the dashed lines denote the first (25th percentile) and third (75th percentile) quartiles, illustrating the typical seasonal peak and interannual variability. Periods outside the pollen seasons have been shortened to improve readability.
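The per-day median and quartile curves described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `daily_profile`, the `(date, concentration)` record layout, and the choice of the inclusive quantile method are all assumptions.

```python
from collections import defaultdict
from datetime import date
from statistics import quantiles


def daily_profile(records):
    """Group (date, concentration) pairs by day of year and return
    {day_of_year: (q1, median, q3)} across all years (hypothetical helper)."""
    by_day = defaultdict(list)
    for day, value in records:
        # tm_yday is 1-based; note that it shifts by one after 29 Feb in leap years.
        by_day[day.timetuple().tm_yday].append(value)

    profile = {}
    for doy, values in sorted(by_day.items()):
        if len(values) < 2:
            continue  # statistics.quantiles needs at least two observations
        q1, q2, q3 = quantiles(values, n=4, method="inclusive")
        profile[doy] = (q1, q2, q3)
    return profile


# Usage with made-up values: one observation on 20 April in five non-leap years.
records = [
    (date(2001, 4, 20), 10),
    (date(2002, 4, 20), 20),
    (date(2003, 4, 20), 30),
    (date(2005, 4, 20), 40),
    (date(2006, 4, 20), 50),
]
profile = daily_profile(records)
```

The middle value of each tuple corresponds to the solid line and the outer two to the dashed lines.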

Table 1.

Descriptive statistics of the daily birch and grass pollen concentrations used in the study.

Table 2.

Statistical description of meteorological data for birch and grass pollen seasons.

Table 3.

Overview of the machine learning methods evaluated in this study.

For each model, the table lists its methodological family, canonical theoretical reference, and the specific software implementation used in our experiments. The theoretical references point readers to foundational publications describing each method, while implementation references provide links to the exact packages and libraries used to ensure reproducibility.

Fig 2.

MAGN structure.

Schematic structure of the Multi-Associative Graph Network (MAGN). Sensory fields (rectangular nodes A and B) contain sensory neurons representing unique feature values (green circles), each with an associated duplicate counter indicating how many times the value appeared in the training data. Object neurons (blue circles) represent individual data samples and are linked to the sensory neurons through defining connections, which encode the feature composition of each object. Similarity connections between sensory neurons capture statistical associations between feature values across the dataset, while duplicate counters on object neurons reflect the frequency of feature values and their patterns. Together, these components form a hierarchical associative graph that enables incremental learning, relational reasoning, and efficient retrieval.
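As a rough illustration of the structure in this caption, the sketch below stores sensory fields, duplicate counters, and defining connections in plain dictionaries. It is not the MAGN implementation used in the study: the class `AssociativeGraph` and its methods are hypothetical, and similarity connections are deliberately omitted because the caption does not say how they are weighted.

```python
from collections import defaultdict


class AssociativeGraph:
    """Toy associative graph: one sensory field per feature, one sensory
    neuron per unique value, one object neuron per sample (assumed layout)."""

    def __init__(self):
        # field name -> feature value -> duplicate counter
        self.counters = defaultdict(lambda: defaultdict(int))
        # (field, value) -> ids of object neurons linked by defining connections
        self.links = defaultdict(set)
        # object id -> {field: value}
        self.objects = []

    def add(self, sample):
        """Insert one sample incrementally and return its object id."""
        obj_id = len(self.objects)
        self.objects.append(dict(sample))
        for field, value in sample.items():
            self.counters[field][value] += 1      # value seen once more
            self.links[(field, value)].add(obj_id)  # defining connection
        return obj_id

    def objects_with(self, field, value):
        """Retrieve all objects that share a given feature value."""
        return sorted(self.links.get((field, value), ()))


# Usage with made-up samples over two sensory fields, A and B.
graph = AssociativeGraph()
graph.add({"A": 1, "B": 5})
graph.add({"A": 1, "B": 7})
shared = graph.objects_with("A", 1)
```

Because each unique value is stored once and indexed directly, looking up the objects that share a value does not require scanning the whole dataset.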

Fig 3.

Deep neural network architecture.

The input features, such as pollen concentration or meteorological data, are time series. The input passes sequentially through convolutional layers, LSTM/GRU layers, and dense layers. The output is a time series predicting the pollen concentration for the following days.

Table 4.

Pollen concentration forecasting results.

Fig 4.

Comparison of machine learning models based on metric averages.

The machine learning methods are compared using averages over all variants of Experiment 1 for accuracy (A), MAE (B), execution time in seconds (C), and total memory used during training and prediction in megabytes (D).

Fig 5.

Prediction examples.

Observed (gray) and predicted (green or orange) pollen concentrations for Betula (A) and Poaceae (B). The predictions were made by the XGBoost model for 3 randomly selected pollen seasons from the test set.

Fig 6.

Feature importance and normalized mutual information.

The top row shows the feature importances from the XGBoost model for Betula (A1) and Poaceae (B1). The bottom row shows the normalized mutual information for Betula (A2) and Poaceae (B2).
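For readers unfamiliar with normalized mutual information, a minimal sketch for two discrete variables is given below. The caption does not state which normalization the study used; dividing by the geometric mean of the two entropies is an assumption here, and continuous features such as temperature would first need to be binned.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (in nats) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mutual_information(x, y):
    """NMI = I(X;Y) / sqrt(H(X) * H(Y)); the normalization is an assumption."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # I(X;Y) = sum over observed pairs of p(a,b) * log(p(a,b) / (p(a) * p(b)))
    mi = sum(
        (c / n) * log(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )
    hx, hy = entropy(x), entropy(y)
    if hx == 0 or hy == 0:
        return 0.0  # a constant variable carries no information
    return mi / sqrt(hx * hy)


# Usage with made-up classes: identical partitions versus independent ones.
nmi_same = normalized_mutual_information([0, 0, 1, 1], ["a", "a", "b", "b"])
nmi_indep = normalized_mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

Unlike a model-specific importance score, this measure depends only on the joint distribution of a feature and the target.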

Fig 7.

Association rules analysis for meteorological data.

Association rules in which the antecedent is a combination of average temperature, cloud cover, humidity, mean wind speed, and sunshine duration over the previous 3 days, and the consequent is the pollen concentration class for Betula (A) and Poaceae (B). The larger the dot, the higher the support.
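The support that determines dot size can be computed as sketched below. This is an illustrative example only: the function `rule_metrics`, the item labels, and the data are hypothetical, and confidence is included as a standard companion metric even though the caption mentions only support.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    transactions: list of sets of discretized items, one set per day.
    antecedent:   iterable of items that must all be present.
    consequent:   a single item, e.g. a pollen concentration class.
    """
    antecedent = frozenset(antecedent)
    n = len(transactions)
    if n == 0:
        return 0.0, 0.0
    n_ante = sum(1 for t in transactions if antecedent <= t)
    n_both = sum(
        1 for t in transactions if antecedent <= t and consequent in t
    )
    support = n_both / n  # share of all days matching the whole rule
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence


# Usage with made-up, already discretized days.
days = [
    {"temp=high", "humidity=low", "pollen=high"},
    {"temp=high", "humidity=low", "pollen=high"},
    {"temp=high", "humidity=low", "pollen=low"},
    {"temp=low", "humidity=high", "pollen=low"},
]
support, confidence = rule_metrics(
    days, {"temp=high", "humidity=low"}, "pollen=high"
)
```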

Table 5.

Comparative overview of machine learning methods for pollen forecasting.
