Using explainable AI to investigate electrocardiogram changes during healthy aging—From expert features to raw signals

Cardiovascular diseases remain the leading global cause of mortality. Age is an important covariate whose effect is most easily investigated in a healthy cohort, so that age-related changes can be properly distinguished from disease-related ones. Traditionally, most such insights have been drawn from the analysis of electrocardiogram (ECG) feature changes in individuals as they age. However, these features, while informative, may obscure underlying data relationships. In this paper we present the following contributions: (1) We employ a deep-learning model and a tree-based model to analyze ECG data from a robust dataset of healthy individuals across varying ages, both as raw signals and in ECG feature format. (2) We use explainable AI methods to identify the most discriminative ECG features across age groups. (3) Our analysis with tree-based classifiers reveals age-related declines in inferred breathing rates and identifies notably high SDANN values as indicative of elderly individuals, distinguishing them from younger adults. (4) Furthermore, the deep-learning model underscores the pivotal role of the P-wave in age predictions across all age groups, suggesting potential changes in the distribution of different P-wave types with age. These findings shed new light on age-related ECG changes, offering insights that go beyond traditional feature-based approaches.


GO, YS, JMLA and NS are with Oldenburg University, Oldenburg, Germany (email: {gabriel.ott,yannik.schaubelt,juan.lopez.alcaraz,nils.strodthoff}@uol.de). WH is with Charité Universitätsmedizin Berlin, Berlin, Germany (email: wilhelm.haverkamp@dhzc-charite.de). GO and YS contributed equally. Corresponding author: Nils Strodthoff.

INTRODUCTION
Characterizing healthy aging through ECG changes. Cardiovascular diseases continue to represent the leading cause of mortality worldwide [1]. Analyzing the effect of healthy aging on the cardiovascular system makes it possible to distinguish between an old but healthy heart and a younger but cardiovascular-constrained heart. This is especially difficult as there exists a discrepancy between biological age and cardiovascular age [3]. Thus, knowing what changes in the heart during healthy aging can help to avoid deaths, since it enables early treatment through early detection of cardiovascular diseases. These changes are most commonly assessed through changes in electrocardiogram (ECG) features of healthy people with age.
Shortcomings of prior work. However, by working on the feature level, relationships in the ECG that are not covered by any features may be excluded from the analysis. Previous research suggests that deep-learning models can outperform feature-based classifiers on age prediction with ECG data [4]. Studies like [5], [6] also used deep-learning models to infer the age from ECG data, but they did not restrict their dataset to healthy people only. Moreover, when deep-learning models were used for age prediction on ECG data of healthy subjects by [7], it was not analyzed how the models managed to detect the age, and thus the study did not provide information on what changes in the heart during aging. Furthermore, [8], [9] and [10] focused on analyzing what changes in the heart with age, but did not make use of deep-learning models.
Research questions. In this work, we aim to use techniques from explainable AI (XAI) to identify both the features from feature-based age classifiers and the ECG segments from deep-learning models operating on raw data that are most important for discriminating between different age groups within a cohort of healthy people. To this end, we address the following research questions: (1) Do both approaches produce insights that correspond to literature results? (2) Can this approach be used to discover new ECG feature correlations in the context of healthy aging across diverse groups?
Main contributions and findings. We analyze a dataset [11], [12] of 1,120 ECG recordings of healthy people of varying ages using two different models: an XResNet50 and an eXtreme Gradient Boosting (XGBoost) model. The XResNet50 operates on raw ECG data, while the XGBoost model uses long-range and short-range ECG features as input. Both models were trained to predict the age of a healthy person from their 1-lead ECG and achieve competitive performance (macro-AUCs of 0.73 and 0.77, respectively). After training, both models were investigated with explainable AI methods: the XGBoost model was analyzed with SHAP to find the most important ECG features for classifying the age, whereas for the XResNet50, heartbeat-based saliency maps were superimposed to show the areas of interest for the age prediction task.
To summarize, our contributions and findings in this study are: (1) Both models reach competitive performance even though they leverage very different feature sets. (2) The XGBoost model mainly leverages long-range features. We find that the breathing rate inferred from the ECG declines with age for healthy people and that very high SDANN5 (average standard deviation of normal-to-normal RR-intervals within a 5-minute interval) values are more likely to come from a person aged 50 or more than from a person aged 34 or less, even though SDANN values generally decline with age. (3) By construction, the XResNet50 is only able to leverage short-range features. Based on its saliency maps, we show that the XResNet50 exploits age-related relationships in the P-wave, presumably indicating that the distribution of different P-wave types changes with age.

I. MATERIALS AND METHODS
A. Background
ECG analysis. ECG analysis is of great importance in understanding the impact of aging on heart health and mortality. As highlighted by [13], accelerated heart aging, as indicated by ECG-age, is associated with a significant increase in all-cause mortality, underscoring the crucial role of the ECG as a biomarker for cardiovascular risk. Investigating how the ECG of a healthy heart changes with age is therefore essential for mortality prediction. Furthermore, [10] examined ECG features to assess changes in the autonomic nervous system across various age groups and revealed clear age-related trends, although the study used only a limited set of ECG features. For a comprehensive overview of ECG analysis and its diverse applications, including critical steps like preprocessing, feature extraction, selection, transformation, and classification, see the survey in [14].
Age prediction from ECGs. [7] used deep-learning techniques on the Autonomic Aging dataset [11] for age prediction. However, their approach involved reducing the original 15 age classes to only 4, which improved model performance but limited the potential for explainability insights. Meanwhile, [5] predicted the age of individual subjects with an average error of 7 years. It is worth noting that their dataset included individuals with various health conditions, raising the possibility that age was inferred from age-related diseases. A similar result was achieved on a public dataset in [6]. [4] introduced a promising predictive approach comparing deep-learning models working on raw ECG data and tree-based classifiers using ECG features. These models were trained on a substantial dataset of over 2.3 million 12-lead ECGs for diverse tasks. However, the study did not explore the explainability of the models, which remains an important aspect for further investigation in age prediction from ECG data. Our research addresses these gaps by considering a finer granularity of age groups, allowing for more detailed insights from explainability methods. Additionally, by focusing solely on data from healthy individuals, we reduce confounding factors and obtain more reliable insights, thus contributing to a more comprehensive understanding of age prediction from ECGs.
ECG explainability. A recent review [15] highlights the potential of using techniques of explainable AI (XAI) to uncover mechanisms underlying age prediction models, which resonates with the approach taken in this work. In the domain of ECG analysis, interpretability has gained prominence in recent studies, as discussed in a recent systematic review [16]. Researchers have been actively integrating XAI techniques to enhance the interpretation of ECG data. However, in many cases, this crucial component gets reduced to anecdotal evidence obtained from the straightforward application of commonly used attribution methods to handpicked examples in order to underline the validity of the proposed algorithm. In contrast, two recent dedicated works on interpretability in the ECG domain [17], [18] highlight the methodology of aggregating attributions across patients or entire patient populations, a technique that we also adopt in this study.

B. Dataset and data preparation
Dataset. The Autonomic Aging dataset [12], [19] aims to quantify changes in cardiovascular autonomic function during healthy aging. It contains ECG recordings of 1,120 healthy control subjects sampled at 1,000 Hz in a resting state under controlled measurement conditions. For the purpose of this study, we considered only ECGs for which age information is given, leaving 1,095 subjects. The subjects' ages range from 18 to 92 years, and the recordings span from 8 to 35 minutes, with a mean of 19 minutes. Two different devices were used to measure the ECGs, a 1-lead and a 2-lead ECG recorder; therefore, we used only the matching lead (II) from both devices. The gender distribution within the dataset shows an imbalance towards female subjects (675 female and 420 male subjects). The dataset contains 38 ECG recordings with missing values; however, in most of them only a few time steps of the complete 1,000 Hz signal are affected. Missing values were therefore removed without excluding recordings. For approaches based on raw time series data, we work at a temporally downsampled resolution of 100 Hz, which was found to be sufficient for common diagnostic tasks [20], again after removing the negligible number of missing values in the time series.
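The two preprocessing steps described above (dropping missing samples, then reducing 1,000 Hz to 100 Hz) can be illustrated with a minimal sketch. The function name `preprocess_ecg` and the use of block averaging as the downsampling method are assumptions made for illustration; the text does not specify which resampling routine was used, and the released source code [21] remains the authoritative reference.

```python
import numpy as np

def preprocess_ecg(signal_1khz, factor=10):
    """Drop missing samples, then downsample by averaging blocks of `factor` samples.

    Block averaging is an illustrative choice (an assumption), not necessarily
    the resampling method used in the study.
    """
    x = np.asarray(signal_1khz, dtype=float)
    x = x[~np.isnan(x)]                  # remove missing values
    n_blocks = x.size // factor          # discard an incomplete trailing block
    x = x[: n_blocks * factor]
    return x.reshape(n_blocks, factor).mean(axis=1)

# Synthetic 2-second "recording" at 1,000 Hz with three missing samples
t = np.arange(2000) / 1000.0
raw = np.sin(2 * np.pi * 1.2 * t)
raw[[10, 500, 1500]] = np.nan
ecg_100hz = preprocess_ecg(raw)
```

Removing isolated samples slightly shifts the time axis after each gap, which is negligible when only a few time steps are missing, as reported for this dataset.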
In the dataset, all subjects are diagnosed as healthy, so we sampled 3-second crops, which are long enough to capture at least one complete heartbeat. Due to the imbalanced nature of the dataset, we opted for a 60/20/20 split between training, validation, and test sets at the subject level. Importantly, we maintained the age-group distribution in each set as far as possible while ensuring that every age group was represented in all three sets. For consistency, all models presented in this work use identical splits for their training, validation, and test data. Table I contains additional descriptive statistics of the processed dataset. See our source code [21] for more details on the dataset preprocessing steps. In contrast to the densely populated younger age groups, the last four age classes represent only 39 samples, or correspondingly 3.4% of the full dataset. Furthermore, it is important to mention that there are no male samples available for the age groups 75-79 years and 85-92 years. As past studies did not indicate a strong interaction effect between gender and age prediction, and dividing the dataset by gender would worsen the imbalance in the smaller age groups, we decided to ignore gender as a covariate in this study.
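A subject-level 60/20/20 split that preserves the age-group distribution and keeps every age group in all three sets can be sketched as follows. This is a minimal illustration under stated assumptions: the function `stratified_subject_split` and its rounding rule are hypothetical and are not taken from the released code [21].

```python
import numpy as np

def stratified_subject_split(age_groups, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return (train, val, test) subject indices, stratified by age group.

    Groups with at least three subjects contribute at least one subject to
    each set; smaller groups go entirely to training (an assumption).
    """
    rng = np.random.default_rng(seed)
    age_groups = np.asarray(age_groups)
    splits = ([], [], [])
    for group in np.unique(age_groups):
        idx = rng.permutation(np.flatnonzero(age_groups == group))
        n = idx.size
        n_val = max(1, int(round(fractions[1] * n))) if n >= 3 else 0
        n_test = max(1, int(round(fractions[2] * n))) if n >= 3 else 0
        n_train = n - n_val - n_test
        splits[0].extend(idx[:n_train])
        splits[1].extend(idx[n_train:n_train + n_val])
        splits[2].extend(idx[n_train + n_val:])
    return tuple(np.sort(np.array(s, dtype=int)) for s in splits)

# Toy example: one entry per subject, three age groups of very different size
groups = np.repeat(["20-24", "25-29", "85-92"], [10, 5, 3])
train, val, test = stratified_subject_split(groups)
```

Because the split operates on subject indices, all crops of one subject end up in the same set, which avoids leakage between training and evaluation data.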

C. Models and feature sets
Overview. For the purpose of this study, we investigate two different classifiers working on two different feature sets but both predicting the subject's age: a residual neural network (XResNet50) operating on raw time series data and a tree-based gradient-boosted decision tree classifier (XGBoost) operating on derived features. We carried out a range of experiments to identify suitable feature sets and preprocessing settings with respect to predictive performance.
In each of the settings, we report test set scores of models selected based on held-out validation set scores. To facilitate continued research, we release the source code underlying our study [21].
1) XGBoost: For the XGBoost model, we include long-range heart-rate variability (HRV) and short-range (SR) features. The HRV features describe how the ECG signal varies over time and include, among others, the standard deviation of NN intervals (SDNN), the root mean square of successive differences (RMSSD), and low- and high-frequency powers. The HRV features were extracted from the ECGs with NeuroKit2 [22]. The SR features were calculated from fiducial points and comprise features such as R-R intervals, heart rate, peak amplitudes, and characteristics of the Q, R and S peaks and of the P and T waves. The SR features were extracted per heartbeat with the Python HeartPy toolkit [23] in combination with NeuroKit2. To produce an age prediction for a whole ECG, the per-heartbeat SR features were averaged over each ECG recording. This enabled us to combine the SR and HRV feature sets. As there are two different feature sets, all three combinations were tested: each alone and both sets combined.
Training. As a countermeasure against the label imbalance in the training dataset, we created artificially balanced datasets by randomly oversampling the minority classes. Regarding the model training, we performed a grid search to determine optimal hyperparameters based on validation set performance. The only hyperparameters for which we found deviations from default values to be beneficial were max_depth=10, max_leaves=10, and learning_rate=0.008. For the explainability analysis, we leverage SHAP values [24]. This is in line with a recent comparative study [25] where SHAP values showed good overlap with cardiologists' expert features.
Relevant features. At this point, it is worthwhile to explicitly highlight a number of ECG features that will play an important role in the later analysis:
• SDNN and SDANN5: SDNN represents the standard deviation of normal-to-normal RR-intervals, while SDANN5 denotes the average SDNN calculated within a 5-minute interval. Similarly, SDANN1 refers to the average SDNN computed within a 1-minute interval.
• HRV PAS: This metric quantifies the percentage of NN intervals within alternating segments, where NN intervals represent the time intervals between normal R-peaks in the ECG signal.
• Alpha-features: Alpha-features are derived from detrended fluctuation analysis (DFA) and provide insights into the auto-correlation between heartbeats. Specifically, alpha1 characterizes short-term correlations, while alpha2 captures long-term correlations in heart rate variability.
• pNN: pNN20 signifies the percentage of heartbeat intervals with more than a 20-millisecond deviation from the previous interval, while pNN50 represents the corresponding percentage for intervals with more than a 50-millisecond deviation.
• MCVNN: MCVNN stands for the median absolute deviation of RR intervals divided by the median of RR intervals, providing information about heart rate variability.
These ECG features serve as critical components for our analysis, and understanding their definitions is essential for comprehending the subsequent sections of this paper.
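To make some of the definitions above concrete, the following sketch computes SDNN, RMSSD, pNN20, pNN50 and MCVNN directly from a short list of RR intervals. It follows the verbal definitions given in the list and is only an illustration: the function `hrv_time_domain` is hypothetical, and NeuroKit2, which was used in the study, may differ in details (for example, whether the median absolute deviation is rescaled by a constant factor).

```python
import numpy as np

def hrv_time_domain(rr_ms):
    """Time-domain HRV features from RR intervals given in milliseconds."""
    rr = np.asarray(rr_ms, dtype=float)
    diff = np.diff(rr)                       # successive differences
    median = np.median(rr)
    return {
        "SDNN": np.std(rr, ddof=1),
        "RMSSD": np.sqrt(np.mean(diff ** 2)),
        "pNN20": 100.0 * np.mean(np.abs(diff) > 20),   # percent of |diff| > 20 ms
        "pNN50": 100.0 * np.mean(np.abs(diff) > 50),   # percent of |diff| > 50 ms
        # Unscaled median absolute deviation divided by the median (assumption)
        "MCVNN": np.median(np.abs(rr - median)) / median,
    }

rr = [800, 830, 790, 860, 800, 805]          # toy RR series in ms
features = hrv_time_domain(rr)
```

In this toy series, four of the five successive differences exceed 20 ms and two exceed 50 ms, so pNN20 is 80% and pNN50 is 40%.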
2) XResNet50: The XResNet50 deep-learning model works with raw time series data. We chose the XResNet50 model because it showed competitive performance with the best-performing convolutional neural networks for a range of different ECG classification tasks [6], [26]. It represents a one-dimensional adaptation of a commonly used ResNet-type convolutional neural network from computer vision [27]. Here we additionally restrict it to a single input channel, as appropriate for 1-lead ECG data.
Training. The XResNet50 was trained with the AdamW optimizer and weight decay [28]. We investigate two different loss functions, namely focal loss (FL) [29] and cross-entropy loss (CEL). The learning rate was set to $10^{-5}$ for FL and $10^{-2}$ for CEL and adjusted with a reduce-on-plateau learning rate scheduler, which divided the learning rate by 10 if the loss did not decrease for 2 consecutive epochs. We trained for up to 20 epochs with early stopping after 3 consecutive epochs without improvement. Oversampling the training set led to insufficient performance in early experiments. Thus, as a countermeasure against the unbalanced training set, class weights were applied instead. The class weights were set to the inverse of each age group's number of occurrences in the training set. We investigate different scenarios, firstly by using two different loss functions and, secondly, by applying training class weights.
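The interplay between inverse-frequency class weights and focal loss can be illustrated with a small NumPy sketch. The functions `inverse_frequency_weights` and `focal_loss` are hypothetical helpers written for this illustration; the focusing parameter gamma=2 and the plain (unnormalized) mean over samples are assumptions, since the text does not state these details.

```python
import numpy as np

def inverse_frequency_weights(labels, n_classes):
    """Weight of each class = 1 / number of training samples in that class."""
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    return 1.0 / np.maximum(counts, 1.0)     # guard against empty classes

def focal_loss(probs, labels, gamma=2.0, class_weights=None):
    """Mean focal loss; gamma=0 reduces to (weighted) cross-entropy."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    p_true = probs[np.arange(labels.size), labels]   # probability of true class
    loss = -((1.0 - p_true) ** gamma) * np.log(p_true)
    if class_weights is not None:
        loss = loss * np.asarray(class_weights)[labels]
    return loss.mean()

labels = np.array([0, 0, 0, 1])              # toy labels, class 1 is rare
probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.3, 0.7]])
weights = inverse_frequency_weights(labels, n_classes=2)
weighted_fl = focal_loss(probs, labels, gamma=2.0, class_weights=weights)
```

The factor (1 - p)^gamma shrinks the contribution of samples that are already classified confidently, so training concentrates on hard examples, while the class weights raise the contribution of rare age groups.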
We trained the models on crops of 3 s length and aggregated predictions from multiple crops by averaging the output predictions to obtain sample-level predictions; see [20] for a detailed analysis of the benefits of this procedure. Note that using 3 s crops limits the XResNet50 to detecting short-range patterns. We leverage the methodology proposed in [30] to compute beat-aligned attribution maps over entire patient subgroups. In particular, we use saliency maps as attribution maps, as saliency was the only attribution method that satisfied the sanity checks proposed in [30].
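The crop-based procedure described above can be sketched in two steps: cutting a recording into 3 s crops (300 samples at 100 Hz) and averaging crop-level output probabilities per subject. The helper names and the non-overlapping stride are assumptions for illustration; the stride actually used is not stated in the text.

```python
import numpy as np

def extract_crops(signal, crop_len=300, stride=300):
    """Cut a 1-D signal into fixed-length crops (assumes at least one full crop)."""
    signal = np.asarray(signal, dtype=float)
    starts = range(0, signal.size - crop_len + 1, stride)
    return np.stack([signal[s:s + crop_len] for s in starts])

def aggregate_crops(crop_probs, subject_ids):
    """Average crop-level class probabilities into one prediction per subject."""
    crop_probs = np.asarray(crop_probs, dtype=float)
    subject_ids = np.asarray(subject_ids)
    subjects = np.unique(subject_ids)
    agg = np.stack([crop_probs[subject_ids == s].mean(axis=0) for s in subjects])
    return subjects, agg

crops = extract_crops(np.zeros(1000))        # 10 s of signal at 100 Hz

# Two crops from subject 7 and one crop from subject 9, two classes
crop_probs = [[0.6, 0.4], [0.8, 0.2], [0.1, 0.9]]
subjects, subject_probs = aggregate_crops(crop_probs, [7, 7, 9])
```

Averaging probabilities rather than hard labels keeps the aggregated output usable for threshold-free metrics such as the AUC.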

D. Performance metric
For comparability with earlier works, we report accuracy as a performance metric but stress its severe shortcomings in the presence of strong class imbalance, as is the case here. As the main performance metric, we report the macro-averaged (over age groups) area under the receiver operating characteristic curve (macro-AUC), which is less affected by class imbalance and operates on output probabilities rather than dichotomized outputs. To assess the uncertainty of our predictions due to the finite size of the test set, we resort to bootstrapping on the test set. We report the 2.5 and 97.5 percentiles of the test set scores, i.e., 95% confidence intervals for the test set scores. We indicate these within brackets behind the point estimate for the score, such as 0.5 (0.025, 0.975).
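The evaluation procedure described above, a one-vs-rest macro-AUC with a percentile bootstrap over the test set, can be sketched as follows. All function names are hypothetical, the number of bootstrap iterations is an assumption, and classes that are absent from a bootstrap resample are simply skipped, which is one of several possible conventions.

```python
import numpy as np

def binary_auc(y_true, scores):
    """AUC via the Mann-Whitney U statistic (tied scores get average ranks)."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores, kind="mergesort")
    ranks = np.empty(scores.size)
    ranks[order] = np.arange(1, scores.size + 1)
    for v in np.unique(scores):              # average the ranks of ties
        mask = scores == v
        ranks[mask] = ranks[mask].mean()
    n_pos, n_neg = y_true.sum(), (~y_true).sum()
    return (ranks[y_true].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def macro_auc(labels, probs):
    """Unweighted mean of one-vs-rest AUCs over all classes that can be scored."""
    labels = np.asarray(labels)
    aucs = [binary_auc(labels == c, probs[:, c])
            for c in range(probs.shape[1])
            if (labels == c).any() and (labels != c).any()]
    return float(np.mean(aucs)) if aucs else float("nan")

def bootstrap_ci(labels, probs, n_boot=1000, seed=0):
    """Point estimate and 95% percentile interval (2.5 and 97.5 percentiles)."""
    rng = np.random.default_rng(seed)
    labels, probs = np.asarray(labels), np.asarray(probs)
    n = labels.size
    scores = [macro_auc(labels[idx], probs[idx])
              for idx in (rng.integers(0, n, size=n) for _ in range(n_boot))]
    return macro_auc(labels, probs), np.nanpercentile(scores, [2.5, 97.5])

# Synthetic 3-class example in which the true class tends to score higher
rng = np.random.default_rng(1)
labels = rng.integers(0, 3, size=150)
probs = rng.random((150, 3))
probs[np.arange(150), labels] += 0.5
probs /= probs.sum(axis=1, keepdims=True)
point, (lo, hi) = bootstrap_ci(labels, probs, n_boot=200)
```

Resampling whole test subjects with replacement captures the uncertainty due to the finite test set, but not the variability that would arise from retraining the model.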

II. RESULTS
A. Predictive performance
1) XGBoost:
In Table II, we present the performance evaluation of the XGBoost model operating on different feature sets, including SR, HRV, and the combined feature set HRV+SR, trained on both balanced (oversampled) and unbalanced (original) datasets. The balanced configuration generally performs slightly better than the unbalanced one. The model operating on SR features alone performed worst, with an AUC of 0.70. Conversely, the model incorporating both SR and HRV features and trained on a balanced dataset performed best, achieving an AUC of 0.77 on the test set. The performance gain over the model leveraging only HRV features confirms that the combined model exploits both short-range and long-range features. To put these results into perspective, we also show a direct comparison between our feature-based XGBoost model and a previously introduced feature-based approach [7]. For comparability, we follow their approach and consolidate the original 15 age groups into 4 broader categories. Both models achieve closely aligned accuracy scores, with our XGBoost model achieving 0.684 (95% CI: 0.62-0.74) and the prior feature-based model 0.688 (95% CI: 0.64-0.73). The almost identical point estimates and largely overlapping confidence intervals indicate parity between our XGBoost model and the established feature-based approach. This supports the suitability of our model for age-group classification and lays the foundation for the subsequent explainability investigations.
2) XResNet: Table III presents the performance evaluation for different XResNet50 configurations. The results favor focal loss over cross-entropy loss for this task with its severely imbalanced label distribution. Furthermore, irrespective of the loss function employed, the model attains better performance when trained on the unbalanced training dataset. The best configuration uses focal loss on the unbalanced training dataset and achieves a macro-AUC of 0.74. It is worth stressing that training and validation were performed at the crop level, whereas testing was performed at the subject level: a subject-level prediction was formed by averaging the predictions of all crops belonging to a subject.
3) Comparative assessment: When comparing the results from both models, it is interesting to see that both reach comparable performance despite fundamentally different input representations and model architectures. The most direct comparison is between the XGBoost model operating on SR (short-range) features and the XResNet model, which by construction only leverages short-range features as well. It reveals a slight advantage for the XResNet model, which is in line with the original hypothesis that the raw waveform contains additional discriminative information that is not covered by conventionally considered short-range ECG features. The long-range information contained in the HRV features is somewhat complementary to this information, as shown by the increase in predictive performance of the HRV+SR model compared to the SR model. It remains an interesting question for future research whether such long-range interactions could also be exploited using models operating on raw time series data with appropriate model architectures; see [20] for first steps in this direction.
Additionally, Fig. 2 displays sample-level AUC scores for both models across age groups. Performance generally improves with increasing age, with a notable exception in the 75-79-year age group for the XResNet model and with lower scores within the 24 to 44-year age range.

B. Explainability results
1) XGBoost: SHAP feature relevances. In this study, we classify the features into two categories: those with the prefix HRV denote long-range HRV features, while those with the prefix SR represent short-range features. Subsequently, we analyze the top 10 most influential features for each age group, see Fig. 3.
Important features with consistent age trends. Certain features recur across multiple age groups and display a consistent trend with respect to age. Specifically, HRV SDANN5, HRV PAS, P-wave amplitude (p mV) and alpha-fluctuation values consistently increase with age: lower values of these features are indicative of younger individuals, whereas higher values are associated with elderly individuals. Conversely, pNN20, MCVNN, and breathing rate in conjunction with breathing signal decline with advancing age: high values of these features correspond to younger individuals, while lower values are characteristic of older individuals. Note that breathing rate and breathing signal are derived from the ECG and serve as estimates of respiratory activity.
Consistency with literature results. Our findings align with existing research in several aspects. Specifically, the observed trends in pNN, alpha-mean, HRV-PAS, P-wave amplitude and alpha-fluctuation are consistent with previous studies. For instance, the decrease in pNN50 with age among healthy subjects, as well as the discriminative power of pNN50 and pNN20 in age separation, has been noted by [31]. Similarly, the steady increase in alpha-values with age among healthy subjects, as well as rising alpha-fluctuations, has been reported by [32] and [33], albeit without specific reference to alpha-mean. Furthermore, the upward trajectory of HRV-PAS with age observed in our model is consistent with the findings of [34]. Likewise, the pattern of P-wave amplitude rising up to age 60 before declining, as observed in our study, concurs with [35]. In summary, our XGBoost model's conclusions are in accordance with existing research, reinforcing the notion that certain physiological features exhibit consistent age-related trends, which can be valuable in understanding the physiological changes associated with aging.
New insights: breathing rate and SDANN5. Having presented the findings that align with previous research, we now turn to new insights into ECG and healthy aging, specifically for breathing rate and SDANN trends.
According to [36], breathing rate and age are hardly correlated at all. However, according to [37], the range between the 2.5 and 97.5 percentiles of the breathing rate widens with age, meaning that both lower and higher breathing rates become more common with age. Furthermore, [2] suggests that there are variations in respiratory dynamics, particularly in response to metronome breathing, such as the increase in high frequency at different postural changes, especially in young subjects. In contrast to this work, these studies were not limited to healthy subjects. Since all subjects in this work are healthy, our results suggest that the breathing rate decreases with age for healthy subjects. This finding is also plausible when considering that all subjects were in a resting state during recording: because of the general decline in body activity with age, less energy and thus less oxygen is required to sustain bodily functions in a resting state. Assuming that the lungs and heart are healthy, a lower breathing rate is therefore plausible for healthy aging.
Our model reveals interesting insights regarding the SDANN5 feature. Contrary to established research showing a general decline in SDANN with age, the explainability analysis suggests that high SDANN5 values contribute positively towards the age prediction of older individuals. For instance, the model associates lower SDANN5 values with those aged 20-34 and higher values with those aged 60-64. A similar study using the same dataset [7] shows that while the mean SDANN does decline with age, there are significant variations in SDANN values among age groups, which make very high SDANN values more probable for subjects older than 50 than for subjects younger than 30. In summary, our XGBoost model uncovers an unexpected relationship between age and SDANN5 that qualifies the conventional picture of decreasing SDANN values with age. Notably, this effect is primarily observed in specific age groups beyond age 60.
2) XResNet: Beat-level descriptive analysis. At first, we explore superimposed mean heartbeats for all age groups in Fig. 4 as a plausibility test and to compare with literature statements. The amplitude of the T-wave decreases with age and shifts to the right, indicating an overall longer cardiac cycle and thus a slower heart rate. Furthermore, the T- and P-wave intervals shorten with age; moreover, the absolute magnitude of the S-peak, Q-peak, and P-wave appears to diminish with age as well, which is in accordance with [38], [39]. It is noteworthy that the amplitude of the R-peak shows no conclusive trend with age.
Aggregated saliency maps: methodology. Since ECGs even in the same age group have slightly different heart rates and are generally not aligned, the crop-level saliency maps cannot simply be laid on top of each other. Following [18], the crops of each subject with their saliency maps were split into individual heartbeats and averaged from 30 milliseconds before to 50 milliseconds after the R-peak. Then, these mean heartbeats were again averaged for each age group, resulting in one aggregated heartbeat per age group as shown in Fig. 5. To reveal the patterns exploited by the model most clearly, we used the training set to produce the aggregated attribution maps. We also mark the most salient data points (in red) to identify patterns across age groups as described below.
Aggregated saliency maps: results. The XResNet model consistently focuses on the entire P-wave across age groups. It specifically focuses on the offset, with some attention to the onset in early age groups. These variations may reflect different P-wave types, the distribution of which, according to prior studies, undergoes significant changes with age [35]. Furthermore, research by [40]-[42] has elucidated age-related disparities in various aspects of the P-wave, including its duration. Consequently, it is plausible that the deep-learning model distinguishes age groups based on distinct P-wave parts and their respective distributions, underlining the complexity of its age classification methodology. The application of more sophisticated methods, for example from the domain of concept-based XAI such as [43], would be the logical next step to uncover these changes. Apart from the P-wave, the model frequently focuses on the Q-peak and S-peak while showing limited relevance to the R-peak and the peak of the T-wave. The TP segment receives moderate attention, indicating its importance in age-related classification.
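The beat-aligned aggregation described in the methodology paragraph can be sketched as follows: windows around each R-peak are cut from a per-sample saliency trace and averaged, and the most salient time steps of the resulting mean beat are selected. The helper names are hypothetical, R-peak indices are assumed to be given, and the window size in samples is an arbitrary choice for this toy example rather than the window used in the study; only the number of marked time steps (eight) is taken from the Fig. 5 caption.

```python
import numpy as np

def beat_aligned_mean(values, r_peaks, before, after):
    """Average fixed-size windows of `values` aligned at each R-peak index."""
    values = np.asarray(values, dtype=float)
    windows = [values[r - before: r + after + 1]
               for r in r_peaks
               if r - before >= 0 and r + after + 1 <= values.size]
    return np.mean(windows, axis=0)

def top_k_timesteps(mean_saliency, k=8):
    """Indices of the k most salient time steps within the aggregated beat."""
    return np.sort(np.argsort(mean_saliency)[-k:])

# Toy saliency trace: one salient sample located 2 samples before each R-peak
saliency = np.zeros(100)
r_peaks = [20, 50, 80]
for r in r_peaks:
    saliency[r - 2] = 1.0

mean_beat = beat_aligned_mean(saliency, r_peaks, before=5, after=5)
most_salient = top_k_timesteps(mean_beat, k=1)
```

Per-subject mean beats obtained this way can be averaged once more across all subjects of an age group to obtain one aggregated map per group.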
3) Comparative assessment: Table IV presents key insights into age-related trends of ECG features that were derived from applying XAI to the XGBoost and XResNet50 models. In the XGBoost model, age is associated with a decrease in pNN20, MCVNN, and breathing rate, along with an increase in alpha-fluctuations, P-wave amplitude, and PAS. Additionally, SDANN5 values rise with age. In contrast, the XResNet50 exhibits distinct focus areas with age: given our criterion based on the eight most important saliency time steps, it emphasizes P-wave features (18.33% for onsets and 53.33% for offsets), frequently places relevance on the Q-peak (8.33%), shows little relevance on the R-peak, sometimes focuses on the S-peak (4.16%), attributes minimal relevance to the T-wave (3.33%), and frequently attends to the TP-interval (12.5%). The latter might be related to differences in the heart rate, which are difficult to analyze by means of saliency maps. Nevertheless, the observed trends offer valuable insights into how the two models differentiate between age groups.
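As a plausibility check of the reported percentages, note that 15 age groups with eight marked time steps each give 120 marked time steps in total. One possible reading, which is an assumption and not confirmed by the text, is that each percentage is the share of these 120 time steps falling into a given ECG segment. The counts below are hypothetical and were chosen only because they reproduce the reported values under this reading; they are not taken from the study.

```python
from collections import Counter

def segment_shares(segment_labels):
    """Percentage of marked time steps that fall into each ECG segment."""
    counts = Counter(segment_labels)
    total = sum(counts.values())
    return {segment: 100.0 * n / total for segment, n in counts.items()}

# Hypothetical per-segment counts (assumption): 15 age groups x 8 time steps = 120
hypothetical = (["P offset"] * 64 + ["P onset"] * 22 + ["TP interval"] * 15
                + ["Q-peak"] * 10 + ["S-peak"] * 5 + ["T-wave"] * 4)
shares = segment_shares(hypothetical)
```

Under this reading, 64/120 corresponds to 53.33%, 22/120 to 18.33%, 15/120 to 12.5%, 10/120 to 8.33%, 5/120 to about 4.17% and 4/120 to 3.33%.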
4) Data imbalance and research focus: While the 'autonomic aging' dataset [19] used in this work stands out as one of the largest datasets of its kind, it is important to acknowledge its inherent imbalance, notably the scarcity of samples from individuals aged 70 or older. Deep-learning models, with their appetite for ample training data, face a particular challenge in such scenarios. Addressing the imbalance by consolidating the underrepresented age groups might seem like a logical step, but it leads to a less nuanced prediction model. We have deliberately chosen not to merge age groups above 70 years, as our primary focus centers on understanding the nuances of a healthily aging heart. In this context, we find that the uniqueness of our dataset, even with its imbalances, continues to yield more insightful results that better align with our research objectives.

Fig. 5: Saliency maps across age groups. The subplots are ordered by age group from left to right and from top to bottom and show the beat-level ECG saliency maps, shedding light on the key ECG features that contribute to age group differentiation according to the color map scheme. 'Subjects' and 'Heartbeats' state the number of subjects and heartbeats used to create these plots. The eight highest gradients are marked in red.

III. CONCLUSION
In this study, we investigated age-related cardiovascular changes in a healthy population. Leveraging ECG data from 1,095 healthy subjects, we developed two different models that work across diverse data modalities and used feature attribution methods to study their behavior: an XGBoost model analyzing short- and long-range ECG features as well as an XResNet50 model processing raw ECG data. Our experiments suggest that the feature-based model achieves better predictive performance than the model operating on raw ECG data, and that its performance is comparable with results from the literature. The findings from the feature-based model indicate increasing heart irregularity and reduced flexibility with age, which aligns with prior research. The model also revealed a decline in inferred breathing rate with age and the significance of high SDANN values in older individuals. Notably, the deep-learning model identified the P-wave as the most important segment across all age groups. Our findings provide complementary insights into age-related ECG changes, whose identification is crucial for the early detection of cardiovascular diseases. To promote further exploration in this area of study, we release the source code underlying our study [21].

Fig. 2: Predictive performance. Predictive performance of the models per age group in terms of AUC on the test set, where yellow (left) represents the XGBoost model and blue (right) the XResNet model.

Fig. 3: SHAP values across age groups. The subplots depict the 10 most important features for classifying each age group. In each subplot, the features are arranged in descending order of importance. Blue dots represent low feature values and red dots represent high feature values, which visualizes the direction of each feature's influence. The subplots are ordered by age group from left to right and from top to bottom.

Fig. 4: Aggregated mean heartbeat. The aggregated mean heartbeat for all age groups showcases ECG feature trends across age groups.

TABLE I: Summary of the dataset composition.

Fig. 1: Age-group distribution. Age-group distribution in terms of the age groups provided in the Autonomic Aging dataset [11]. The age groups span a range from 18 to 92 years, where the majority of subjects are between 20 and 50 years old.

Age-group distribution. Fig. 1 shows the age distribution across the dataset in terms of 15 age groups, where the first age group contains subjects aged 18 to 19, whereas all following age groups but the last cover age intervals of 5 years. There is a clear imbalance in the age distribution, with the majority in age group 20-24 with 422 samples, followed by age group 25-29 with 105 samples.

TABLE II: XGBoost model macro-AUC on the test set for different configurations.

TABLE III: XResNet50 performance on different configurations.

TABLE IV: Clinical observations and feature trends in age-related explainability. Percentages refer to the relative number of age groups where a high-saliency time step (red in Fig. 5) occurred in the corresponding segment.