Fig 1.
Daily birch and grass pollen concentration.
Birch (A) and grass (B) pollen concentrations measured at the Jagiellonian University Collegium Medicum in 1991–2024. The solid line shows the median pollen count for each day of the year, while the dashed lines denote the first (25th percentile) and third (75th percentile) quartiles, illustrating the typical seasonal peak and interannual variability. Periods outside the pollen seasons have been shortened to improve readability.
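The per-day summary described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `daily_profile`, the `(date, concentration)` record layout, and the choice of the inclusive quantile method are all assumptions. Day-of-year is taken directly from the calendar, so dates after 29 February shift by one in leap years.

```python
from collections import defaultdict
from datetime import date
from statistics import quantiles


def daily_profile(records):
    """Map day-of-year -> (Q1, median, Q3) over all years.

    records: iterable of (datetime.date, concentration) pairs (assumed layout).
    """
    by_day = defaultdict(list)
    for d, value in records:
        by_day[d.timetuple().tm_yday].append(value)

    profile = {}
    for day, values in sorted(by_day.items()):
        if len(values) < 2:
            # statistics.quantiles needs at least two data points
            profile[day] = (values[0], values[0], values[0])
            continue
        # n=4 returns the three quartile cut points
        q1, q2, q3 = quantiles(values, n=4, method="inclusive")
        profile[day] = (q1, q2, q3)
    return profile


# Hypothetical values for 20 April in five non-leap years
records = [(date(y, 4, 20), v) for y, v in
           zip((1991, 1993, 1994, 1995, 1997), (10, 20, 30, 40, 50))]
profile = daily_profile(records)
```

Plotting the middle value as a solid line and the outer two as dashed lines would reproduce the layout the caption describes.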
Table 1.
Descriptive statistics of the daily birch and grass pollen concentrations used in the study.
Table 2.
Statistical description of meteorological data for birch and grass pollen seasons.
Table 3.
Overview of the machine learning methods evaluated in this study.
For each model, the table lists its methodological family, canonical theoretical reference, and the specific software implementation used in our experiments. The theoretical references point readers to foundational publications describing each method, while implementation references provide links to the exact packages and libraries used to ensure reproducibility.
Fig 2.
Schematic structure of the Multi-Associative Graph Network (MAGN).
Sensory fields (rectangular nodes A and B) contain sensory neurons representing unique feature values (green circles), each with an associated duplicate counter indicating how many times the value appeared in the training data. Object neurons (blue circles) represent individual data samples and are linked to the sensory neurons through defining connections, which encode the feature composition of each object. Similarity connections between sensory neurons capture statistical associations between feature values across the dataset, while duplicate counters on object neurons reflect the frequency of feature values and their patterns. Together, these components form a hierarchical associative graph that enables incremental learning, relational reasoning, and efficient retrieval.
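The bookkeeping described in this caption can be illustrated with a toy data structure. This is only a sketch under stated assumptions, not the MAGN implementation: the class `ToyAssociativeGraph` and its methods are hypothetical names, object neurons are keyed by their tuple of feature values, and similarity connections between sensory neurons are omitted entirely.

```python
from collections import defaultdict


class ToyAssociativeGraph:
    """Toy structure loosely following the caption (hypothetical, simplified)."""

    def __init__(self, feature_names):
        self.feature_names = list(feature_names)
        # One sensory field per feature: unique value -> duplicate counter
        self.fields = {name: defaultdict(int) for name in self.feature_names}
        # Defining connections: (feature, value) -> keys of linked object neurons
        self.links = defaultdict(set)
        # Object neurons: tuple of feature values -> duplicate counter
        self.objects = defaultdict(int)

    def add(self, sample):
        """Insert one sample (a dict of feature -> value) incrementally."""
        key = tuple(sample[name] for name in self.feature_names)
        self.objects[key] += 1
        for name, value in zip(self.feature_names, key):
            self.fields[name][value] += 1
            self.links[(name, value)].add(key)

    def objects_with(self, name, value):
        """Return the object neurons connected to a given sensory neuron."""
        return set(self.links.get((name, value), set()))


graph = ToyAssociativeGraph(["A", "B"])
for sample in ({"A": 1, "B": "x"}, {"A": 1, "B": "y"}, {"A": 1, "B": "x"}):
    graph.add(sample)
```

Because each sample only increments counters and adds links, new data can be added without rebuilding the structure, which is the sense in which the caption calls the learning incremental.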
Fig 3.
Deep neural network architecture.
The input features, such as pollen concentration or meteorological data, are supplied as time series. They pass through convolutional layers, then LSTM/GRU layers, and finally dense layers. The output is a time series of predicted pollen concentrations for the following days.
Table 4.
Pollen concentration forecasting results.
Fig 4.
Comparison of machine learning models based on metric averages.
The machine learning methods are compared using averages over all variants of Experiment 1: accuracy (A), MAE (B), execution time in seconds (C), and total memory used during training and prediction in megabytes (D).
Fig 5.
Observed (gray) and predicted (green or orange) Betula (A) and Poaceae (B) pollen concentrations.
The predictions were made by the XGBoost model for three randomly selected pollen seasons from the test set.
Fig 6.
Feature importance and normalized mutual information.
The top row shows the feature importance from the XGBoost model for Betula (A1) and Poaceae (B1). The bottom row shows the normalized mutual information for Betula (A2) and Poaceae (B2).
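For readers unfamiliar with the quantity in the bottom row, normalized mutual information between a discretized feature and the pollen class can be sketched as below. The caption does not state which normalization was used; dividing by the arithmetic mean of the two entropies is an assumption here, and the function names are illustrative only.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(x, y):
    """Mutual information (in nats) between two equally long label sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # p(a,b) / (p(a) p(b)) simplifies to c * n / (count_a * count_b)
    return sum((c / n) * log((c * n) / (px[a] * py[b]))
               for (a, b), c in pxy.items())


def normalized_mutual_information(x, y):
    """MI divided by the mean of the two entropies (assumed normalization)."""
    denom = (entropy(x) + entropy(y)) / 2
    return mutual_information(x, y) / denom if denom > 0 else 0.0


# Hypothetical discretized feature and pollen class labels
temperature = ["low", "low", "high", "high"]
pollen_same = ["low", "low", "high", "high"]   # fully determined by the feature
pollen_indep = ["low", "high", "low", "high"]  # independent of the feature
nmi_same = normalized_mutual_information(temperature, pollen_same)
nmi_indep = normalized_mutual_information(temperature, pollen_indep)
```

Under this normalization the score ranges from 0 (independent) to 1 (one variable fully determines the other), which makes features with different numbers of bins comparable.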
Fig 7.
Association rules analysis for meteorological data.
Association rules in which the antecedent is a combination of average temperature, cloud cover, humidity, mean wind speed, and sunshine duration over the previous 3 days, and the consequent is the pollen concentration class for Betula (A) and Poaceae (B). Larger dots indicate higher support.
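The support of such a rule can be computed as in the following sketch. The item encoding (`"temp=high"` and similar strings), the function name `rule_metrics`, and the example records are hypothetical. Confidence is included because it is the usual companion measure; whether the figure also encodes it is not stated in the caption.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    transactions: list of sets of items; antecedent: iterable of items;
    consequent: a single item (here, a pollen concentration class).
    """
    antecedent = frozenset(antecedent)
    n = len(transactions)
    n_ante = sum(1 for t in transactions if antecedent <= t)
    n_both = sum(1 for t in transactions
                 if antecedent <= t and consequent in t)
    support = n_both / n if n else 0.0             # share of all records
    confidence = n_both / n_ante if n_ante else 0.0  # share of matching records
    return support, confidence


# Hypothetical discretized records, one set of items per day
transactions = [
    {"temp=high", "humidity=low", "pollen=high"},
    {"temp=high", "humidity=low", "pollen=high"},
    {"temp=high", "humidity=low", "pollen=low"},
    {"temp=low", "humidity=high", "pollen=low"},
]
support, confidence = rule_metrics(
    transactions, {"temp=high", "humidity=low"}, "pollen=high")
```

In this toy example the rule holds in two of four records, so a dot for it would be drawn with support 0.5.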
Table 5.
Comparative overview of machine learning methods for pollen forecasting.