Abstract
Objective
Dementia, particularly Alzheimer’s disease (AD), constitutes a major global health concern, with AD accounting for approximately 70% of all cases. EEG-based biomarkers hold promise for early identification of individuals at risk; however, small and heterogeneous samples frequently limit generalizability.
Methods
An EEG-based sample enrichment framework was developed by integrating advanced signal processing, component-level feature extraction, data harmonization (neuroHarmonize), and Propensity Score Matching (PSM). EEG data from four independent cohorts were harmonized to reduce site-related variability while preserving the effects of covariates such as age and sex. Features including power, entropy, coherence, synchronization likelihood, and cross-frequency coupling were extracted from independent components. PSM was applied at 2:1, 5:1, and 10:1 ratios to expand and balance the control group (HC) relative to the Alzheimer’s risk group (ACr), composed of PSEN1-E280A mutation carriers without cognitive symptoms.
Results
Sample enrichment through PSM improved classification accuracy, with decision tree models yielding values between 0.91 and 0.96. Higher enrichment ratios enhanced model stability and generalizability, as shown by learning curves and confusion matrices. Feature selection was based on model performance and effect sizes (Cohen’s d).
Citation: Henao Isaza V, Aguillon D, Tobón-Quintero CA, Lopera F, Ochoa-Gómez JF (2026) Comprehensive methodology for sample enrichment in EEG biomarker studies for Alzheimer’s risk classification. PLoS One 21(3): e0343722. https://doi.org/10.1371/journal.pone.0343722
Editor: Diego A. Forero, Fundación Universitaria del Área Andina, COLOMBIA
Received: August 27, 2025; Accepted: February 10, 2026; Published: March 11, 2026
Copyright: © 2026 Henao Isaza et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: “The publicly available datasets are: CHBMP: https://chbmp-open.loris.ca/ SRM: https://openneuro.org/datasets/ds003775/versions/1.2.1 UdeA1 and UdeA2: https://openneuro.org/datasets/ds007427 The analysis codes are publicly available at: https://github.com/GRUNECO/eeg_harmonization https://github.com/GRUNECO/Data_analysis_ML_Harmonization_Proyect”.
Funding: This work was supported by the Comité para el Desarrollo de la Investigación (CODI), Universidad de Antioquia (Project No. 2017-16371 to CATQ) and by the Comité para el Desarrollo de la Investigación (CODI), Universidad de Antioquia (Project No. PRG 2022-53407 to JFOG). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
The treatment of neurodegenerative diseases, particularly Alzheimer’s disease (AD), is emerging as a major global health concern [1]. AD, which is characterized by progressive cognitive decline, imposes a significant burden on individuals and society at large, contributing to approximately 70% of all cases of dementia worldwide [2]. Its hallmark features include the accumulation of beta-amyloid (Aβ) plaques and hyperphosphorylated tau neurofibrillary tangles, leading to neurodegeneration over time [3]. While understanding of the underlying mechanisms has advanced in recent decades, early and accurate detection of AD remains a major obstacle in clinical practice [4].
Several biomarkers for Alzheimer’s disease have been identified in the literature; however, the use of electroencephalography (EEG) biomarkers for reliable Alzheimer’s disease risk classification remains relatively unexplored [5]. Traditional methods often struggle to balance the groups of healthy non-carrier subjects (HC) and E280A mutation Alzheimer’s disease carriers without clinically detectable cognitive impairment (ACr). In this study, ACr refers to asymptomatic carriers of the E280A mutation, defined as individuals without clinically detectable cognitive impairment according to neuropsychological assessments. Despite carrying the mutation, these carriers do not exhibit significant deficits in memory, language, or other cognitive functions and thus remain cognitively intact at the time of assessment. Further challenges arise in accounting for demographic variables such as age and sex [6].
Despite the potential of EEG to distinguish subjects in the preclinical stages of AD, obtaining sufficiently large samples for reliable comparisons remains a significant challenge. Recent studies have highlighted the importance of large-scale collaborations that integrate diverse datasets [7–9]. The EEG-IP platform, presented by van Noordt et al. [10], is an exemplary model for integrating infant EEG data from multiple sites. By pooling longitudinal cohort studies, adhering to the Brain Imaging Data Structure (BIDS) EEG standard, and implementing a common signal processing pipeline, the authors achieved a standardized and integrated dataset [11]. This pioneering effort highlights both the successes and the challenges encountered, particularly in signal annotation, timing, and independent component analysis [12] during preprocessing.
Similarly, the work of Duncan et al. [13] introduces the Data Archive for the BRAIN Initiative (DABI), a dedicated platform designed to address the complexities of sharing human intracranial neurophysiology recordings and multimodal data. This initiative aligns with the overarching goal of creating specialized repositories capable of accommodating the unique features of complex and heterogeneous datasets, further validating the feasibility and importance of harmonizing data from diverse sources [14].
The main objective of this study is to develop a framework for exploring differences in pre-symptomatic subjects with scarce samples using advanced machine learning (ML) techniques. Using propensity score matching (PSM) [15], we aim to optimize the balance between the HC and ACr groups. We provide a detailed analysis of the results, identifying significant patterns and differences in the distribution of relative performance between the groups of interest. Figures and tables are used to characterize the particularities of the EEG of ACr compared with HC.
This paper presents original research focused on an innovative workflow that integrates data from multiple sources and employs advanced data analysis techniques. By addressing the urgent need to improve the early and accurate detection of Alzheimer’s disease, we aim to provide new insights that will drive the development of more effective approaches to AD risk classification.
Materials and methods
Subjects and EEG acquisition
This study includes four EEG databases comprising healthy non-carriers (HC) and carriers of the PSEN1 E280A mutation associated with Alzheimer’s disease (ACr), none of whom presented clinically detectable cognitive impairment at the time of recording. All EEG recordings were obtained under resting-state conditions.
Inclusion and exclusion criteria
The selection of subjects and datasets followed predefined inclusion and exclusion criteria, summarized schematically in Fig 1. Briefly, included datasets were required to: (i) contain resting-state EEG recordings acquired with eyes closed and/or eyes open, (ii) include HC and/or ACr participants with available neuropsychological assessments, and (iii) be acquired using standard digital EEG systems with sufficient spatial resolution. Datasets were excluded if they were acquired using portable EEG systems, contained fewer than 58 electrodes, lacked detailed acquisition protocols, had been previously preprocessed, or corresponded to private sources without explicit consent from the data providers.
Fig 1. Schematic representation of the standardized EEG pre-processing workflow applied in this study, including detrending, referencing, artifact removal using ICA and wavelet-ICA, normalization, feature extraction, and harmonization. The pipeline generates spectral, connectivity, and entropy-based features across multiple frequency bands and independent components.
EEG acquisition
Participants were seated comfortably in a quiet room and instructed to remain relaxed without performing any specific task. No external stimuli were presented during the recordings. Depending on the database, EEG was recorded with eyes closed and/or eyes open.
General acquisition details across all databases:
- EEG systems: Standard digital EEG systems were used, with 58–128 electrodes placed according to the 10–20 or extended 10–10 system.
- Duration: Resting-state recordings lasted between 4 and 10 minutes depending on the dataset.
- Patient condition: All participants were neurologically evaluated, with ACr participants showing no cognitive impairment. Basic demographic information, including age and sex, was collected for all participants (Table 1).
- Data type: Multi-center recordings included both neurophysiological (EEG) and neuropsychological information.
Database-specific details:
- UdeA 1 Database: 68 ACr, 77 HC. EEG was recorded for 5 minutes with eyes closed and open using a Neuroscan amplifier and a 58-channel tin cap.
- UdeA 2 Database: 11 ACr, 12 HC. EEG was recorded for 5 minutes with eyes closed using a Neuroscan amplifier and 64 electrodes.
- SRM Database: 31 HC. EEG was recorded for 4 minutes with eyes closed using a BioSemi ActiveTwo system with 64 electrodes following the extended 10–10 system.
- CHBMP Database: 38 HC. EEG was recorded for 10 minutes with eyes closed using a MEDICID digital system with 64 or 128 electrodes.
All datasets were pooled to form a comprehensive database including ACr and HC subjects. The main characteristics of each dataset are summarized in Table 1.
Quality control
To ensure data reliability, quantitative quality-control metrics were applied at multiple stages of the EEG pre-processing pipeline. During the early-stage preprocessing (PREP pipeline), metrics were used to identify defective channels based on missing data (NaN), flat signals, amplitude deviation, high-frequency noise, correlation thresholds, signal-to-noise ratio, dropout events, and RANSAC-based detection. Artifact removal efficiency was further evaluated during wavelet–ICA processing by assessing the proportion of filtered independent components. Additionally, noisy time segments were identified and rejected using statistical and signal-based criteria, including kurtosis, amplitude thresholds, linear trends, and spectral power distribution.
The combined application of these metrics enabled systematic assessment of signal quality and ensured that only reliable EEG data were retained for subsequent analysis. A schematic summary of the quality-control metrics is provided in the Supplementary Materials (S1 Fig).
Ethics statement and data access dates
EEG data from the UdeA databases were collected prospectively under the approval of the Ethics Committee of the Instituto de Investigaciones Médicas at the Universidad de Antioquia (Approval Act No. 010, Code F-017–00). All participants provided written informed consent prior to participation. For the present study, the UdeA 1 and UdeA 2 datasets were accessed throughout 2022.
Data from the SRM and CHBMP databases are publicly available and were obtained under their respective terms of use. These datasets were collected by independent research teams with appropriate ethical approvals and were fully anonymized prior to release. Access to the SRM and CHBMP data for this study also took place throughout 2022. Therefore, no additional ethics approval or informed consent was required for their secondary use. The authors did not have access to any identifying participant information.
EEG data pre-processing
Given the well-known inter-subject variability and stochastic nature of EEG signals, a standardized preprocessing, normalization, and harmonization strategy was applied to minimize non-neuronal variability while preserving biologically meaningful patterns.
The raw data were preprocessed using the pipeline proposed by Suarez et al. [16]. The standardized early-stage EEG processing pipeline (PREP) was applied (Fig 1), including signal detrending, robust referencing, and interpolation of bad channels. After high-pass filtering, the FastICA algorithm was used to obtain artifactual and neural independent components. The recordings were then segmented into 5-second epochs and subjected to wavelet-ICA for further artifact removal; at this step, individual Infomax ICA was applied. A low-pass filter was then applied, and noisy epochs were detected and removed based on several criteria (see Quality control). The data were normalized to account for variability introduced by hair, scalp, and skull. Finally, spectral, connectivity, and amplitude modulation features were extracted.
The group ICA (gICA) methodology of Ochoa-Gómez et al. [17] served as a robust foundation for extracting reproducible neuronal components from resting-state EEG data. This framework ensures the reliability and reproducibility of the independent components, which were used as spatial filters to extract the signals analyzed in this study.
Building on this, features were harmonized across diverse cohorts and used to train machine learning models, focusing on Alzheimer’s disease and PSEN1-E280A mutation carriers (ACr).
Additionally, feature extraction involved assessing several key metrics:
- Relative power: the proportion of a signal’s power within a given frequency band relative to its total power, providing insight into neural activity in different brain regions [18].
- Permutation entropy: a measure of the diversity of ordinal patterns in a time series, used to characterize the complexity and regularity of brain activity [19].
- Coherence: a measure of the consistency of the relationship between two signals as a function of frequency, indicating functional communication between brain regions [20].
- Cross-frequency coupling: examines the relationship between oscillations in different frequency bands, providing insight into the organization and integration of neural activity [21].
- Synchronization likelihood: assesses the likelihood that two signals are synchronized in time, reflecting functional connectivity between brain regions [22].
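Two of the listed metrics can be made concrete with a short Python sketch. The first function computes a normalized permutation entropy from Bandt–Pompe ordinal patterns. The second averages magnitude-squared coherence within a band, using SciPy's `scipy.signal.coherence`. The function names and the embedding order, delay, and `nperseg` values are illustrative assumptions, not the parameters used in this study. The signals are synthetic.

```python
import math

import numpy as np
from scipy.signal import coherence


def permutation_entropy(x, order=3, delay=1):
    """Normalized permutation entropy (Bandt-Pompe), between 0 and 1."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (order - 1) * delay
    # Delay embedding: row j holds x[j], x[j + delay], ..., x[j + (order-1)*delay]
    emb = np.column_stack([x[i * delay : i * delay + n] for i in range(order)])
    # The ordinal pattern of each row is the permutation that sorts it
    patterns = np.argsort(emb, axis=1, kind="stable")
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    p = counts / counts.sum()
    # Shannon entropy of the pattern distribution, normalized by log(order!)
    return float(-(p * np.log(p)).sum() / math.log(math.factorial(order)))


def band_coherence(x, y, fs, band, nperseg=512):
    """Mean magnitude-squared coherence of x and y within band = (low, high) Hz."""
    f, cxy = coherence(x, y, fs=fs, nperseg=nperseg)
    mask = (f >= band[0]) & (f < band[1])
    return float(cxy[mask].mean())


# Synthetic signals (not study data): two noisy copies of a shared 10 Hz rhythm
fs = 250
t = np.arange(0, 60, 1 / fs)
rng = np.random.default_rng(1)
shared = np.sin(2 * np.pi * 10 * t)
x = shared + rng.normal(size=t.size)
y = shared + rng.normal(size=t.size)
print("PE of a ramp:", permutation_entropy(np.arange(100)))
print("PE of white noise:", permutation_entropy(rng.normal(size=5000)))
print("alpha1 coherence:", band_coherence(x, y, fs, (8.5, 10.5)))
print("beta3 coherence:", band_coherence(x, y, fs, (21, 30)))
```

A monotonic ramp produces a single ordinal pattern, so its entropy is zero. White noise uses all patterns about equally and approaches one.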
Harmonization
To harmonize features across datasets, we employed the neuroHarmonize package [23]. This tool extends neuroCombat [24] and uses the ComBat algorithm to correct multi-site data. The feature matrix and covariate matrix were prepared and harmonized, removing site effects while preserving covariate effects. This step was crucial for reducing site-related variability and improving data consistency, thereby enhancing the reliability of subsequent analyses.
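ComBat models each feature as the sum of four parts: a grand mean, covariate effects, and site-specific additive and multiplicative terms. The NumPy sketch below implements only the location/scale adjustment. It omits the empirical Bayes shrinkage that ComBat, and therefore neuroHarmonize, applies to the site parameters. Treat it as a simplified illustration of the idea, not a substitute for the package. The function name `combat_location_scale` is hypothetical and the data are synthetic.

```python
import numpy as np


def combat_location_scale(data, sites, covars):
    """Simplified ComBat: remove additive and multiplicative site effects
    while preserving covariate effects (no empirical Bayes shrinkage).

    data: (n_subjects, n_features); sites: (n_subjects,) labels;
    covars: (n_subjects,) or (n_subjects, k) numeric covariates, e.g. age.
    """
    data = np.asarray(data, dtype=float)
    sites = np.asarray(sites)
    covars = np.asarray(covars, dtype=float).reshape(len(data), -1)
    levels = np.unique(sites)
    # One indicator column per site (no separate intercept) plus covariates
    site_design = (sites[:, None] == levels[None, :]).astype(float)
    design = np.hstack([site_design, covars])
    beta, *_ = np.linalg.lstsq(design, data, rcond=None)
    # Grand mean = site intercepts weighted by site sample size
    grand_mean = (site_design.sum(axis=0) / len(data)) @ beta[: len(levels)]
    cov_effect = covars @ beta[len(levels):]
    pooled_sd = np.sqrt(((data - design @ beta) ** 2).mean(axis=0))
    # Standardize, then remove each site's mean (gamma) and scale (delta)
    z = (data - grand_mean - cov_effect) / pooled_sd
    adjusted = np.empty_like(z)
    for level in levels:
        mask = sites == level
        gamma = z[mask].mean(axis=0)
        delta = z[mask].std(axis=0, ddof=1)
        adjusted[mask] = (z[mask] - gamma) / delta
    # Map back to original units, restoring grand mean and covariate effects
    return adjusted * pooled_sd + grand_mean + cov_effect


# Synthetic example: site B has an offset of +5 and three times the noise
rng = np.random.default_rng(0)
ages = np.linspace(20, 60, 100)
feature = np.concatenate([0.1 * ages + rng.normal(0, 1, 100),
                          0.1 * ages + 5 + rng.normal(0, 3, 100)])
sites = np.array(["A"] * 100 + ["B"] * 100)
harmonized = combat_location_scale(feature[:, None], sites, np.concatenate([ages, ages]))
print("site gap before:", feature[sites == "B"].mean() - feature[sites == "A"].mean())
print("site gap after:", harmonized[sites == "B"].mean() - harmonized[sites == "A"].mean())
```

Both synthetic sites share the same ages, so any remaining gap between site means reflects residual site effects rather than covariate differences.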
Propensity score matching (PSM) used logistic regression to match ACr and HC subjects based on their similarity in pretreatment covariates such as sex and age [15]. The propensity score is the estimated probability that an individual is a gene carrier given these characteristics. Subjects with lower propensity scores were then removed to achieve the desired proportion of subjects in the treatment group. PSM balances covariates between treatment and control groups in observational studies, providing more valid comparisons and reducing bias due to non-random treatment allocation. Here it ensured comparability between the HC and ACr groups.
By controlling confounding variables in this way, logistic regression-based PSM yields more comparable and homogeneous groups. As a result, differences observed in the EEG data can be more accurately attributed to gene carrier status rather than to initial differences in participant characteristics.
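The matching step can be sketched with scikit-learn. A logistic regression on standardized covariates gives each subject a propensity score. Each carrier is then greedily paired with its `ratio` nearest non-carriers, without replacement. Carriers are processed from highest to lowest score, and matching stops when too few controls remain, so the remaining carriers are dropped. This loosely mirrors removing treated subjects to reach the desired proportion. The greedy order, the absence of a caliper, the function name, and the data are all assumptions, not the study's exact procedure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def propensity_match(covariates, treated, ratio):
    """Greedy ratio:1 nearest-neighbour matching on the propensity score.

    Returns the propensity scores and a dict {treated index: [control indices]}.
    """
    X = np.asarray(covariates, dtype=float)
    t = np.asarray(treated, dtype=bool)
    model = make_pipeline(StandardScaler(), LogisticRegression())
    # Propensity score = estimated P(carrier | covariates)
    ps = model.fit(X, t).predict_proba(X)[:, 1]
    available = set(np.flatnonzero(~t).tolist())
    matches = {}
    treated_idx = np.flatnonzero(t)
    for i in treated_idx[np.argsort(-ps[treated_idx])]:
        if len(available) < ratio:
            break  # not enough controls left: remaining carriers are dropped
        candidates = np.fromiter(available, dtype=int)
        nearest = candidates[np.argsort(np.abs(ps[candidates] - ps[i]))[:ratio]]
        matches[int(i)] = nearest.tolist()
        available.difference_update(matches[int(i)])
    return ps, matches


# Synthetic covariates (age, sex): 20 carriers, 100 non-carriers
rng = np.random.default_rng(0)
age = np.concatenate([rng.normal(45, 5, 20), rng.normal(40, 10, 100)])
sex = rng.integers(0, 2, 120)
carrier = np.r_[np.ones(20, dtype=bool), np.zeros(100, dtype=bool)]
for ratio in (2, 5, 10):
    _, matches = propensity_match(np.column_stack([age, sex]), carrier, ratio)
    print(f"{ratio}:1 -> {len(matches)} carriers matched")
```

With 100 controls, every carrier can be matched at 2:1 and 5:1. At 10:1, only ten carriers can be matched.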
Subject ratios and age-sex matching
We employed Propensity Score Matching (PSM) at varying ratios (2:1, 5:1, and 10:1) to assess the impact of sample size disparities between healthy non-carrier subjects (HC) and E280A mutation Alzheimer’s disease carriers without clinically detectable cognitive impairment (ACr) on model performance. These ratios were chosen to explore how increasing the size of the HC cohort relative to ACr influences the robustness and generalizability of our findings. Through PSM, we aimed to achieve balance in demographic variables such as age and sex, thereby mitigating biases associated with non-random treatment assignment and improving the comparability between groups.
As shown in Fig 2, the data for the HC and ACr groups were combined across all cohorts (UdeA 1, UdeA 2, SRM, CHBMP) and subjected to propensity score matching (PSM). This produced matched datasets with two, five, and ten times as many HC as ACr.
Fig 2. Age- and sex-matched records from healthy non-carrier subjects (HC) and E280A mutation Alzheimer’s disease carriers without clinically detectable cognitive impairment (ACr) were combined across four cohorts and matched using propensity score matching at 2:1, 5:1, and 10:1 ratios.
Model selection
The data were organized in dataframes, where each row corresponded to a recording and each column to a specific feature. For model implementation and validation, an 80–20 train-test split was applied, preserving class proportions to provide an unbiased evaluation on unseen data.
The cross-validation procedure used ten folds. In each fold, the estimator was trained and then used to predict held-out outcomes. Performance scores were averaged across folds to ensure a robust assessment.
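The split and cross-validation steps can be sketched with scikit-learn as follows. The decision tree, the fold shuffling, the random seed, the helper name, and the toy data are assumptions. The study's own scripts are in the repositories listed under Data Availability.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier


def split_and_cross_validate(X, y, seed=0):
    """Stratified 80-20 split, then ten-fold CV on the training portion."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    clf = DecisionTreeClassifier(random_state=seed)
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    # Mean accuracy over the ten folds (each fold refits the tree)
    cv_mean = cross_val_score(clf, X_train, y_train, cv=cv, scoring="accuracy").mean()
    # Final fit on the whole training set, evaluated once on the held-out 20%
    test_acc = clf.fit(X_train, y_train).score(X_test, y_test)
    return cv_mean, test_acc, y_test


rng = np.random.default_rng(0)
y = np.r_[np.zeros(150, dtype=int), np.ones(50, dtype=int)]  # 3:1 toy imbalance
X = np.column_stack([y + rng.normal(0, 0.5, 200), rng.normal(size=200)])
cv_mean, test_acc, y_test = split_and_cross_validate(X, y)
print(f"CV accuracy {cv_mean:.2f}, test accuracy {test_acc:.2f}, test size {len(y_test)}")
```

Passing `stratify=y` keeps the 3:1 class ratio in the 40-sample test set.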
The pseudocode (Fig 3) outlines the feature selection and model evaluation process, which highlights the steps taken to identify the most relevant features and optimize model precision:
Fig 3. Diagram illustrating the iterative feature selection process applied to harmonized EEG data, including correlation-based feature reduction, decision tree-based importance ranking, model training, and performance evaluation using confusion matrices.
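The first two stages of this process, correlation-based reduction and decision-tree importance ranking, could look like the pandas/scikit-learn sketch below. Several details are hypothetical: the 0.9 correlation threshold, keeping the earlier column of each correlated pair, the helper names, and the toy data. The default `k=100` matches the number of features reported under Feature selection.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier


def drop_correlated(features, threshold=0.9):
    """Drop the later feature of every pair whose |Pearson r| exceeds threshold."""
    corr = features.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return features.drop(columns=to_drop)


def top_features_by_tree(features, labels, k=100, seed=0):
    """Rank features by decision-tree (impurity-based) importance; keep the top k."""
    tree = DecisionTreeClassifier(random_state=seed).fit(features, labels)
    importance = pd.Series(tree.feature_importances_, index=features.columns)
    return importance.sort_values(ascending=False).head(k).index.tolist()


rng = np.random.default_rng(0)
a = rng.normal(size=300)
features = pd.DataFrame({
    "a": a,
    "a_copy": 2 * a + rng.normal(0, 0.01, 300),  # near-duplicate of "a"
    "b": rng.normal(size=300),
})
labels = (a > 0).astype(int)
reduced = drop_correlated(features)
print(list(reduced.columns), top_features_by_tree(reduced, labels, k=1))
```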
Following this process, the features that contributed most to improving model accuracy were selected. The decision tree model, which achieved the highest precision, was subsequently employed to generate a confusion matrix, effectively evaluating the model’s ability to classify individuals at risk of Alzheimer’s disease.
Decision trees were chosen for their intrinsic feature selection capabilities, enabling a deeper analysis of feature importance. This process allowed us to assess how individual features influenced the classification of participants, providing insight into both their individual and collective impact on model performance.
Additionally, the effect size of each selected feature was evaluated using Cohen’s d. This standardized measure quantified the difference between ACr and HC groups for each specific metric, offering a comprehensive understanding of each feature’s contribution to group differentiation.
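Cohen’s d divides the difference in group means by the pooled standard deviation, s_p = sqrt(((n_a − 1)s_a² + (n_b − 1)s_b²)/(n_a + n_b − 2)). A minimal sketch follows. Which group is passed first, and hence the sign of d, is an assumption here, because the text does not state the order.

```python
import numpy as np


def cohens_d(group_a, group_b):
    """Cohen's d = (mean_a - mean_b) / pooled standard deviation."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    return float((a.mean() - b.mean()) / np.sqrt(pooled_var))


print(cohens_d([1, 2, 3], [2, 3, 4]))  # -1.0
```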
Results and discussion
Our study utilized an extensive dataset to develop a model capable of identifying statistical differences and distinguishing between ACr and HC across different cohorts. This comprehensive approach allowed us to provide a more nuanced and precise understanding of the implications of the PSEN1-E280A mutation in varied carrier contexts. The findings have significant clinical and research implications, offering potential advancements in early diagnosis and targeted interventions for Alzheimer’s disease. By integrating data from diverse sources, our model enhances the robustness and generalizability of EEG-based classification, contributing valuable insights to the field of Alzheimer’s research.
Preprocessing
Using the previously validated processing pipeline of Henao Isaza et al. [25], the relative power spectral density was computed for the EEG frequency bands delta (1.5–6 Hz), theta (6–8.5 Hz), alpha 1 (8.5–10.5 Hz), alpha 2 (10.5–12.5 Hz), beta 1 (12.5–18.5 Hz), beta 2 (18.5–21 Hz), beta 3 (21–30 Hz), and gamma (30–45 Hz) [26].
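The sketch below computes relative band power for a single component time series, using the band limits above and SciPy's Welch estimator. Two choices are assumptions: normalizing by the total power between 1.5 and 45 Hz, and using 2-second Welch segments. The exact estimator settings of the pipeline in [25] may differ, and the signal is synthetic.

```python
import numpy as np
from scipy.signal import welch

# Band limits in Hz, as listed above; lower edge inclusive, upper edge exclusive
BANDS = {
    "delta": (1.5, 6), "theta": (6, 8.5), "alpha1": (8.5, 10.5),
    "alpha2": (10.5, 12.5), "beta1": (12.5, 18.5), "beta2": (18.5, 21),
    "beta3": (21, 30), "gamma": (30, 45),
}


def relative_band_power(x, fs):
    """Power in each band divided by total power in 1.5-45 Hz (Welch PSD)."""
    f, psd = welch(x, fs=fs, nperseg=int(2 * fs))  # 2-s segments, 0.5 Hz bins
    total = psd[(f >= 1.5) & (f < 45)].sum()
    return {
        name: float(psd[(f >= lo) & (f < hi)].sum() / total)
        for name, (lo, hi) in BANDS.items()
    }


fs = 250
t = np.arange(0, 60, 1 / fs)
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 9.5 * t) + 0.5 * rng.normal(size=t.size)  # 9.5 Hz rhythm
rel = relative_band_power(x, fs)
print(max(rel, key=rel.get), round(sum(rel.values()), 6))
```

The bands are contiguous and cover 1.5–45 Hz, so the relative powers sum to one.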
Fig 4 compares the independent components (ICs) of the Alpha 2 band across the four cohorts of interest. Each boxplot represents the distribution of relative power values for a specific component (C) within each group and database.
Fig 4. Boxplots showing the distribution of relative power values for independent components in the Alpha 2 frequency band across four cohorts, separated by healthy non-carrier subjects (HC) and asymptomatic E280A mutation carriers (ACr).
For component C1, the median power values range from 0.05 to 0.1 across all subjects in the four cohorts. However, outliers are observed for the SRM and CHBMP HC groups, with the CHBMP group exhibiting an outlier reaching a value of 0.30 for relative power. In addition, only the UdeA1 cohort shows an outlier for the ACr group.
For Component C2, the median power values across subjects are even closer, with outliers detected in the CHBMP and UdeA2 cohorts. Notably, in the UdeA2 cohort, outliers are present in both the HC and ACr groups. Component C3 shows greater variability in median power values, with outliers in the UdeA2 cohort. Components C4, C5, C6 and C8 show a higher prevalence of outliers, especially in the HC group. In components C7 and C9, the median power values range from 0.075 to 0.150, with outliers observed in the SRM, UdeA1, and UdeA2 cohorts for the ACr group.
Propensity score matching (PSM)
By estimating the propensity score, which is the probability of receiving a particular treatment given observed covariates, individuals in the treatment group can be matched with individuals in the control group who have similar propensity scores.
Fig 5 shows the range of common support by treatment status, with the propensity score on the x-axis and frequency on the y-axis. The common support functions are smooth, and the algorithm’s balancing property is satisfied. Subjects outside the range of common support were dropped in the logistic regression models for each case (2:1, 5:1, 10:1).
Fig 5. Distribution of propensity scores for healthy non-carrier subjects (HC) and E280A mutation Alzheimer’s disease carriers without clinically detectable cognitive impairment (ACr), illustrating the region of common support used for matching.
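Trimming to the region of common support keeps only subjects whose propensity score lies within a shared range. The lower bound is the larger of the two group minima and the upper bound is the smaller of the two group maxima. This min/max rule is one common definition and is assumed here. The helper name and the toy scores are hypothetical.

```python
import numpy as np


def common_support_mask(ps, treated):
    """True for subjects whose propensity score lies in the region of common support."""
    ps = np.asarray(ps, dtype=float)
    t = np.asarray(treated, dtype=bool)
    lower = max(ps[t].min(), ps[~t].min())  # larger of the two group minima
    upper = min(ps[t].max(), ps[~t].max())  # smaller of the two group maxima
    return (ps >= lower) & (ps <= upper)


ps = [0.10, 0.20, 0.30, 0.40, 0.25, 0.35, 0.50, 0.60]
treated = [False] * 4 + [True] * 4
print(common_support_mask(ps, treated).tolist())
# [False, False, True, True, True, True, False, False]
```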
Feature selection
Before training a model for EEG analysis, it is important to select features that capture meaningful aspects of brain activity. The features used are presented in Fig 1. They include the components from [17] and the frequency bands described in the Preprocessing section, some of which are shown in Fig 6. From nearly 967 initial features (Fig 1), the most highly correlated features were removed first. The decision tree algorithm then identified the 100 most important features for inclusion (see Model selection). Cohen’s d values were used to quantify the magnitude of the differences between the ACr and HC groups for each specific metric.
Fig 6. Kernel density plots illustrating the distributions of Cohen’s d for selected EEG features, including spectral power, functional connectivity, synchronization likelihood, entropy, and cross-frequency coupling measures. Positive or negative values indicate the direction of differences between healthy non-carrier subjects (HC) and asymptomatic E280A mutation carriers (ACr). Statistical significance was assessed using the Mann–Whitney U test.
Fig 6 illustrates the distribution of effect sizes (Cohen’s d) for selected EEG metrics, highlighting the observed differences between the ACr and HC groups. Cohen’s d is interpreted using the following conventional thresholds:
- Small (0.2): a small difference.
- Medium (0.5): a moderate difference.
- Large (0.8): a substantial difference.
- Very large (1.2 or more): a very large difference.
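The thresholds above can be applied together with the Mann–Whitney U test used for significance testing, as in the sketch below, which uses `scipy.stats.mannwhitneyu`. Treating each threshold as the lower edge of its label is an assumption. So is labeling |d| < 0.2 as "negligible", which the text does not state. The helper names are hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def effect_size_label(d):
    """Map |d| to the conventional labels; below 0.2 is called 'negligible' here."""
    magnitude = abs(d)
    if magnitude < 0.2:
        return "negligible"
    if magnitude < 0.5:
        return "small"
    if magnitude < 0.8:
        return "medium"
    if magnitude < 1.2:
        return "large"
    return "very large"


def mann_whitney_p(group_a, group_b):
    """Two-sided Mann-Whitney U test p-value for two independent samples."""
    return float(mannwhitneyu(group_a, group_b, alternative="two-sided").pvalue)


for d in (-0.53, -0.27, 1.22):
    print(d, effect_size_label(d))
print(mann_whitney_p(np.arange(20), np.arange(20) + 100))
```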
Relative power analysis revealed a Cohen’s d of −0.53 for component C7 in the Alpha2 band, indicating a moderate effect size and a highly significant difference (p < 0.001) between the ACr and HC groups. Coherence analysis for component C5 in the Alpha2 band showed a Cohen’s d of −0.49, corresponding to a moderate effect size and a highly significant difference (p < 0.01). Synchronization likelihood in component C5 within the Alpha2 band yielded a Cohen’s d of −0.34, suggesting a small-to-moderate effect. Entropy analysis for component C6 in the Alpha2 band resulted in a Cohen’s d of −0.27, indicating a small effect size. Finally, cross-frequency coupling analysis in component C9 showed comparatively smaller effects.
These features were also evaluated using Cohen’s d to assess effect size, comparing group differences across three ratios (n1, n2, and n3). An initial model was trained on the n1 dataset, followed by incremental adjustments on the n2 and n3 datasets. This iterative approach allowed the model to be refined and evaluated across different dataset compositions.
The analysis of key EEG features across the different sample ratios (Table 2) revealed even more pronounced differences between groups. In the 2:1 ratio, relative power in components C7 and C10 within the Beta3 band showed very large effect sizes (Cohen’s d = 1.22 and 1.12, respectively; p < 0.0001). As the augmentation ratio increased, the discriminative power of cross-frequency coupling features became dominant. Specifically, in the 5:1 ratio, interactions such as C8 Mbeta3-Theta and C9 Malpha1-Theta exhibited large negative effects (Cohen’s d = −1.69; p < 0.0001). This trend culminated in the 10:1 ratio, where features like C3 Mbeta3-Alpha1 and C5 Mbeta2-Alpha2 reached extreme effect sizes (Cohen’s d = −2.28; p < 0.0001). Notably, age remained a highly significant covariate across all ratios, with its effect size increasing from 1.10 in the 2:1 scenario to 2.78 in the 10:1 scenario, underscoring the importance of demographic harmonization in the classification models.
Model performance across ratios
For the 2:1 ratio, the training curve starts at a score of about 0.5 and gradually increases, stabilizing at approximately 0.8 after 30 samples. In contrast, the validation curve begins at about 0.4 and rises slightly more steeply. It stabilizes at about 0.83 after 30 samples, climbs to about 0.87 after 130 samples, and finally peaks at approximately 0.95.
In the case of the 5:1 ratio, the training curve starts near a score of 0.99, dips slightly around 50 samples, and rises again to about 1.0 by 100 samples. Meanwhile, the validation curve begins at a score of 0.83, experiences a slight dip, rises again around 40 samples, and gradually increases to about 0.97.
Lastly, for the 10:1 ratio, the training curve starts near a score of 1.00 and maintains a near-perfect trajectory throughout. The validation curve for this ratio begins at approximately 0.9 and grows steadily until it converges at 1.00, demonstrating the most stable and optimal performance among all ratios, consistent with the perfect accuracy and precision scores reported in Table 3.
The learning curves presented in Fig 7 complement the statistical results from Table 3 by illustrating the model’s convergence and stability across different augmentation ratios. For the 2:1 ratio, a noticeable gap between training and validation accuracy persists in the early stages, suggesting that smaller samples limit the model’s initial generalizability. However, as the ratio increases to 5:1 and 10:1, this gap narrows significantly and earlier in the process. Specifically, the 10:1 ratio exhibits the fastest convergence, with the validation curve reaching a plateau of 1.00 almost simultaneously with the training curve.
Fig 7. Training and validation accuracy curves for decision tree models trained using 2:1, 5:1, and 10:1 ratios of healthy non-carrier subjects (HC) to asymptomatic E280A mutation carriers (ACr).
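Learning curves like those in Fig 7 can be produced with scikit-learn's `learning_curve`. The sketch below assumes an unpruned decision tree, stratified ten-fold cross-validation, shuffled training subsets, and five training-set sizes. These settings, the helper name, and the toy data are illustrative and are not taken from the study.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, learning_curve
from sklearn.tree import DecisionTreeClassifier


def tree_learning_curve(X, y, seed=0):
    """Mean training and validation accuracy at increasing training-set sizes."""
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    sizes, train_scores, val_scores = learning_curve(
        DecisionTreeClassifier(random_state=seed), X, y,
        train_sizes=np.linspace(0.2, 1.0, 5), cv=cv, scoring="accuracy",
        shuffle=True, random_state=seed,  # avoid single-class prefixes
    )
    # Average over the ten folds at each training-set size
    return sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)


rng = np.random.default_rng(0)
y = np.r_[np.zeros(150, dtype=int), np.ones(50, dtype=int)]
X = np.column_stack([y + rng.normal(0, 0.5, 200), rng.normal(size=200)])
sizes, train_mean, val_mean = tree_learning_curve(X, y)
for n, tr, va in zip(sizes, train_mean, val_mean):
    print(f"n={n}: train {tr:.2f}, validation {va:.2f}")
```

An unpruned tree fits its training data perfectly on continuous features, so the training curve sits at 1.0. The gap to the validation curve is therefore the quantity of interest.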
Confusion matrix
Fig 8 shows three confusion matrices for the 2:1, 5:1, and 10:1 class distribution ratios, providing a visual representation of the computed precision results. In these matrices, TP stands for true positives, FP for false positives, FN for false negatives, and TN for true negatives.
Fig 8. Confusion matrices obtained from the test datasets for decision tree models trained using 2:1, 5:1, and 10:1 HC-to-ACr ratios. Performance was evaluated on 20% of the total data.
For the 2:1 ratio, the results are TP = 31, FP = 1, FN = 3, and TN = 13. For the 5:1 ratio, the results are TP = 32, FP = 0, FN = 1, and TN = 5. For the 10:1 ratio, the results are TP = 32, FP = 0, FN = 2, and TN = 1. These matrices provide insight into the performance of the model at different class distribution ratios, highlighting the differences in correct and incorrect predictions.
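As a worked check, the standard definitions can be applied directly to the counts reported above: accuracy = (TP + TN)/total, precision = TP/(TP + FP), and recall = TP/(TP + FN). The sketch assumes the positive class is the one the matrices label TP. It uses no data beyond these counts, and the helper name is hypothetical.

```python
def metrics_from_counts(tp, fp, fn, tn):
    """Accuracy, precision and recall from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


# Counts (TP, FP, FN, TN) as reported for the test sets above
reported = {"2:1": (31, 1, 3, 13), "5:1": (32, 0, 1, 5), "10:1": (32, 0, 2, 1)}
for ratio, counts in reported.items():
    scores = metrics_from_counts(*counts)
    print(ratio, {name: round(value, 3) for name, value in scores.items()})
```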
Integrating data from multiple cohorts is fundamental to improving the robustness and generalizability of EEG-based AD detection models [27]. Our study took a comprehensive approach by combining data from multiple sources. This highlights the importance of considering the diversity of data in Alzheimer’s disease research to ensure that findings apply to a wide range of populations and clinical scenarios [28].
Our discussion focuses on how PSM and gICA improve the accuracy and reliability of our model, highlighting the importance of considering different methodological approaches to address unique challenges in early AD detection. This comparison underscores the need for comprehensive and multifaceted strategies to improve the accuracy and applicability of EEG-based AD detection models [29].
Fig 7 illustrates the impact of different class distribution ratios on model stability. For the 2:1 ratio, both curves show a gradual increase, with the validation accuracy reaching 95%, matching the results in Table 3. For the 5:1 ratio, the validation curve exhibits a slight initial dip before climbing to 97%. Notably, at the 10:1 ratio, the model demonstrates its most robust performance: both curves quickly converge and reach a perfect accuracy of 100%, showing the greatest stability and steady growth over time.
The performance metrics in Table 3 further support these observations. For the 2:1 ratio, the model achieved an accuracy of 95% and an AUC of 98%. As the sample enrichment increased, performance improved: the 5:1 ratio reached an accuracy of 97% and a perfect precision of 1.00, although with a slightly lower recall of 0.97. The 10:1 ratio yielded the best results, with accuracy, AUC, recall, and precision all reaching 1.00.
This perfect performance in the 10:1 configuration suggests that a higher ratio of healthy controls to carriers helps the algorithm find an optimal decision boundary. This setup likely provides a better balance after harmonization and matching, allowing the model to discriminate between classes with no error. These findings are particularly notable because they fall at the upper end of recently reported AUC values, which range from 0.962 to 1.0 [30], and substantially surpass earlier benchmarks of 0.85 [31].
This variability in AUC outcomes can be attributed to differences in the data used (such as sample size, data quality, and specific population characteristics) and the analytical methods employed (including the types of features utilized and data preprocessing techniques). Each study may employ different datasets and methodologies, which can significantly affect the resulting AUC values.
Previous research by García-Pretelt et al. [32] applied machine learning to resting-state EEG to classify individuals at risk of Alzheimer’s disease, achieving an accuracy of up to 83% using spatial filters obtained from a gICA approach. In addition, de Meneses et al. [33] used convolutional neural networks (CNNs) to classify neurological diseases from cortical topographies. The performance of SqueezeNet in that study, with accuracies of 88.89% for Parkinson’s disease, 75.70% for depression, and 72.10% for bipolar disorder, illustrates the potential of advanced machine learning techniques for classifying neurological and psychiatric conditions and complements our study’s focus on ICA configurations.
Alves et al. [34] focused primarily on harmonizing EEG data from different cohorts and reported accuracies of 98% and 99% using CNNs in Alzheimer’s disease and schizophrenia. The present study, in contrast, takes a broader approach, integrating and analyzing data from different sources to improve Alzheimer’s risk classification. This strategy allowed us to better characterize the effect of the PSEN1-E280A mutation across different carrier contexts, and it underscores the importance of data diversity in AD research for ensuring model robustness and generalizability.
The results of de Meneses et al. [33] serve as a reference point, demonstrating the successful application of similar algorithms to EEG data across several diseases and thus supporting the suitability of our methods. Together with other work [35], this places our findings in a clinical context and points to implications for future research and clinical applications. Our study extends this work by showing how integrating data from different cohorts can improve the accuracy and generalizability of EEG-based AD detection models, which underscores the value of continuing to explore innovative and collaborative approaches to the challenges of Alzheimer’s diagnosis and treatment.
The components obtained through the reproducibility approach reported by Ochoa-Gómez et al. [17] provide a variety of metrics for classification, in line with the proposal of Prado et al. [36]. Prado et al. highlight the need for systematic harmonization in EEG connectivity studies, address critical sources of variability, and propose a composite metric strategy to improve replicability in multicenter studies.
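Harmonization is what makes pooling cohorts meaningful. neuroHarmonize, used in this study, builds on ComBat, which models site effects on the mean and variance of each feature, estimates them with empirical Bayes, and protects specified covariates such as age and sex. To convey only the location/scale idea, the following deliberately simplified sketch aligns each site's mean and standard deviation of a single feature to the pooled values. The function `align_sites` is hypothetical; it omits empirical Bayes shrinkage and covariate protection and is not a substitute for ComBat or neuroHarmonize.

```python
from statistics import mean, pstdev
from typing import Dict, List


def align_sites(values_by_site: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Simplified location/scale harmonization of one feature across sites.

    Each site's values are standardized within site and then mapped onto the
    pooled mean and standard deviation. No empirical Bayes step and no covariate
    protection: differences confounded with site are removed as well.
    """
    pooled = [v for vals in values_by_site.values() for v in vals]
    grand_mean, grand_sd = mean(pooled), pstdev(pooled)
    harmonized: Dict[str, List[float]] = {}
    for site, vals in values_by_site.items():
        site_mean, site_sd = mean(vals), pstdev(vals)
        if site_sd == 0:
            # A constant site carries no scale information; map it to the pooled mean.
            harmonized[site] = [grand_mean for _ in vals]
        else:
            harmonized[site] = [grand_mean + grand_sd * (v - site_mean) / site_sd for v in vals]
    return harmonized


# Hypothetical feature values from two sites that differ only by an offset.
sites = {"site_A": [1.0, 2.0, 3.0], "site_B": [11.0, 12.0, 13.0]}
print(align_sites(sites))
```

In this example the two sites become identical after alignment. If diagnostic groups were unevenly distributed across sites, this naive version would also erase genuine group differences, which is why covariate-aware methods such as neuroHarmonize are preferred in practice.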
While our study is promising, it is important to acknowledge its limitations. Although the dataset is comprehensive, its representativeness and potential biases must be questioned, especially given the similarity between cohorts such as UdeA1 and UdeA2. Generalizing our findings beyond these specific cohorts may be challenging, which highlights the need for validation in diverse populations and for greater cohort diversity [37]. In addition, despite the use of advanced machine learning techniques for EEG-based classification, the complexity of the models may hinder full interpretation. Finally, potential biases, measurement errors, and the assumptions underlying our analysis models warrant careful consideration and further discussion, particularly regarding model selection and the preference for SVM in the literature.
Conclusions
Our study represents a significant step forward in computational neuroscience, particularly in the detection of Alzheimer’s disease using EEG data. By harmonizing data from multiple cohorts and applying advanced machine learning techniques, we developed a robust model capable of accurately discriminating between healthy non-carrier subjects (HC) and PSEN1-E280A mutation carriers without clinically detectable cognitive impairment (ACr).
Our results show the importance of integrating data from multiple sources to improve the generalizability and applicability of disease detection models. The successful application of techniques such as propensity score matching (PSM) and group independent component analysis (gICA) demonstrates the effectiveness of comprehensive approaches in improving model accuracy and reliability.
While our study presents promising results, it is imperative to acknowledge its limitations. The representativeness of our dataset and potential biases inherent in our sample must be carefully considered. Furthermore, the complexity of our machine learning model requires thorough sensitivity analyses and cross-validation to ensure its stability and robustness.
In conclusion, our work contributes valuable insights to the burgeoning field of computational neuroscience, provides a refined understanding of the impact of the PSEN1-E280A mutation, and paves the way for more accurate and early detection methods for Alzheimer’s disease. We believe that our findings have significant clinical and research implications and represent a critical step in addressing the challenges posed by neurodegenerative diseases in our aging population.
Supporting information
S1 Fig. Quality-control metrics applied across the EEG pre-processing pipeline, including early-stage preprocessing (PREP), wavelet–ICA artifact removal, and noisy time rejection.
https://doi.org/10.1371/journal.pone.0343722.s001
(TIF)
S1 File. This file contains all supplementary tables generated during the EEG data integration and harmonization procedures.
Tables labeled sovaharmony correspond to the pre-enrichment approach, in which a single harmonization step is applied prior to downstream analyses. Tables labeled neuroHarmonize correspond to the sample enrichment approach, where harmonization is performed after data integration under different group ratio configurations. The archive also includes the table features_p_bonferroni, which reports the original p-values and Bonferroni-corrected p-values for all features retained in the sample enrichment pipeline and used as input to the machine learning models for each group ratio configuration.
https://doi.org/10.1371/journal.pone.0343722.s002
(ZIP)
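The features_p_bonferroni table in S1 File lists raw and Bonferroni-corrected p-values. As a reminder of how that correction works, here is a minimal sketch: each p-value is multiplied by the number of tests and capped at 1. The feature names, the raw p-values, and the 0.05 threshold are illustrative assumptions, not values taken from the supplementary table.

```python
from typing import Dict


def bonferroni(p_values: Dict[str, float]) -> Dict[str, float]:
    """Bonferroni-adjusted p-values: raw p times the number of tests, capped at 1."""
    m = len(p_values)
    return {name: min(1.0, p * m) for name, p in p_values.items()}


# Hypothetical raw p-values for three features.
raw = {"power_alpha": 0.004, "coherence_theta": 0.02, "entropy_beta": 0.5}
adjusted = bonferroni(raw)
alpha = 0.05  # assumed family-wise significance level
print({name: p for name, p in adjusted.items() if p < alpha})
```

Here only the first feature survives correction: 0.02 becomes 0.06 after multiplying by three tests, and 0.5 is capped at 1.0.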
References
- 1. World Health Organization. Dementia. https://www.who.int/news-room/fact-sheets/detail/dementia. 2024. Accessed 2024 September 21.
- 2. Alzheimer’s Disease International (ADI). World Alzheimer Report 2022. 2022. https://www.alzint.org/resource/world-alzheimer-report-2022/
- 3. Kang J, Lemaire HG, Unterbeck A, Salbaum JM, Masters CL, Grzeschik KH, et al. The precursor of Alzheimer’s disease amyloid A4 protein resembles a cell-surface receptor. Nature. 1987;325(6106):733–6. pmid:2881207
- 4. Pais M, Martinez L, Ribeiro O, Loureiro J, Fernandez R, Valiengo L, et al. Early diagnosis and treatment of Alzheimer’s disease: new definitions and challenges. Braz J Psychiatry. 2020;42(4):431–41. pmid:31994640
- 5. Jiao B, Li R, Zhou H, Qing K, Liu H, Pan H, et al. Neural biomarker diagnosis and prediction to mild cognitive impairment and Alzheimer’s disease using EEG technology. Alzheimers Res Ther. 2023;15(1):32. pmid:36765411
- 6. Duque-Grajales JE, López A, Ramírez M. Quantitative EEG analysis during resting and memory tasks in carriers and non-carriers of PS-1 E280A mutation of familial Alzheimer’s disease. CES Med. 2014;28(2):165–76.
- 7. Smit DJA, Andreassen OA, Boomsma DI, Burwell SJ, Chorlian DB, de Geus EJC, et al. Large-scale collaboration in ENIGMA-EEG: A perspective on the meta-analytic approach to link neurological and psychiatric liability genes to electrophysiological brain activity. Brain Behav. 2021;11(8):e02188. pmid:34291596
- 8. Ballesteros AS, Prado P, Ibanez A, Perez JAM, Moguilner S. A pipeline for large-scale assessments of dementia EEG connectivity across multicentric settings. Center for Open Science. 2023.
- 9. Moguilner S, Birba A, Fittipaldi S, Gonzalez-Campo C, Tagliazucchi E, Reyes P, et al. Multi-feature computational framework for combined signatures of dementia in underrepresented settings. J Neural Eng. 2022;19(4). doi:10.1088/1741-2552/ac87d0. pmid:35940105
- 10. van Noordt S, Desjardins JA, Huberty S, Abou-Abbas L, Webb SJ, Levin AR, et al. EEG-IP: an international infant EEG data integration platform for the study of risk and resilience in autism and related conditions. Mol Med. 2020;26(1):40. pmid:32380941
- 11. Gorgolewski KJ, Auer T, Calhoun VD, Craddock RC, Das S, Duff EP, et al. The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments. Sci Data. 2016;3:160044. pmid:27326542
- 12. Kemp F. Independent Component Analysis: Principles and Practice. J Royal Statistical Soc D. 2003;52(3):412.
- 13. Duncan D, Garner R, Brinkerhoff S, Walker HC, Pouratian N, Toga AW. Data Archive for the BRAIN Initiative (DABI). Sci Data. 2023;10(1):83. pmid:36759619
- 14. Prado P, Mejía JA, Sainz-Ballesteros A, Birba A, Moguilner S, Herzog R, et al. Harmonized multi-metric and multi-centric assessment of EEG source space connectivity for dementia characterization. Alzheimers Dement (Amst). 2023;15(3):e12455. pmid:37424962
- 15. Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70(1):41–55.
- 16. Suárez-Revelo JX, Ochoa-Gómez JF, Tobón-Quintero CA. Validation of EEG Pre-processing Pipeline by Test-Retest Reliability. Communications in Computer and Information Science. Springer International Publishing. 2018. p. 290–9.
- 17. Ochoa-Gómez JF, Mantilla-Ramos Y-J, Isaza VH, Tobón CA, Lopera F, Aguillón D, et al. Reproducible Neuronal Components found using Group Independent Component Analysis in Resting State Electroencephalographic Data. openRxiv. 2023.
- 18. Babadi B, Brown EN. A review of multitaper spectral analysis. IEEE Trans Biomed Eng. 2014;61(5):1555–64. pmid:24759284
- 19. Python Software Foundation. itertools — Functions creating iterators for efficient looping — Python 3.12.6 documentation. https://docs.python.org/3/library/itertools.html. Accessed 2024 September 21.
- 20. SciPy Community. Coherence — SciPy v1.14.1 Manual. https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.coherence.html. Accessed 2024 September 21.
- 21. Falk TH, Fraga FJ, Trambaiolli L, Anghinah R. Erratum to: EEG amplitude modulation analysis for semi-automated diagnosis of Alzheimer’s disease. EURASIP J Adv Signal Process. 2014;2014(1).
- 22. Niso G, Bruña R, Pereda E, Gutiérrez R, Bajo R, Maestú F, et al. HERMES: towards an integrated toolbox to characterize functional and effective brain connectivity. Neuroinformatics. 2013;11(4):405–34. pmid:23812847
- 23. Pomponio R, Erus G, Habes M, Doshi J, Srinivasan D, Mamourian E, et al. Harmonization of large MRI datasets for the analysis of brain imaging patterns throughout the lifespan. Neuroimage. 2020;208:116450. pmid:31821869
- 24. Richter S, Winzeck S, Correia MM, Kornaropoulos EN, Manktelow A, Outtrim J, et al. Validation of cross-sectional and longitudinal ComBat harmonization methods for magnetic resonance imaging data on a travelling subject cohort. Neuroimage Rep. 2022;2(4). pmid:36507071
- 25. Henao Isaza V, Cadavid Castro V, Zapata Saldarriaga LM, Mantilla-Ramos Y-J, Suarez Revelo JX, Tobón Quintero CA, et al. Tackling EEG Test-Retest Reliability with a Pre-Processing Pipeline Based on ICA and Wavelet-ICA. Elsevier BV. 2023.
- 26. Chaddad A, Wu Y, Kateb R, Bouridane A. Electroencephalography Signal Processing: A Comprehensive Review and Analysis of Methods and Techniques. Sensors (Basel). 2023;23(14):6434. pmid:37514728
- 27. Watanabe Y, Miyazaki Y, Hata M, Fukuma R, Aoki Y, Kazui H, et al. A deep learning model for the detection of various dementia and MCI pathologies based on resting-state electroencephalography data: A retrospective multicentre study. Neural Netw. 2024;171:242–50. pmid:38101292
- 28. Scheltens P, De Strooper B, Kivipelto M, Holstege H, Chételat G, Teunissen CE, et al. Alzheimer’s disease. Lancet. 2021;397(10284):1577–90. pmid:33667416
- 29. Xia W, Zhang R, Zhang X, Usman M. A novel method for diagnosing Alzheimer’s disease using deep pyramid CNN based on EEG signals. Heliyon. 2023;9(4):e14858. pmid:37025794
- 30. Kim S-K, Kim H, Kim SH, Kim JB, Kim L. Electroencephalography-based classification of Alzheimer’s disease spectrum during computer-based cognitive testing. Sci Rep. 2024;14(1):5252. pmid:38438453
- 31. Meghdadi AH, Stevanović Karić M, McConnell M, Rupp G, Richard C, Hamilton J, et al. Resting state EEG biomarkers of cognitive decline associated with Alzheimer’s disease and mild cognitive impairment. PLoS One. 2021;16(2):e0244180. pmid:33544703
- 32. García-Pretelt FJ, Suárez-Revelo JX, Aguillon-Niño DF, Lopera-Restrepo FJ, Ochoa-Gómez JF, Tobón-Quintero CA. Automatic classification of subjects of the PSEN1-E280A family at risk of developing Alzheimer’s disease using machine learning and resting state electroencephalography. J Alzheimers Dis. 2022;87(2):817–32. pmid:35404271
- 33. de Meneses FGA, Teles AS, Nunes M, da Silva Farias D, Teixeira S. Neural networks to recognize patterns in topographic images of cortical electrical activity of patients with neurological diseases. Brain Topogr. 2022;35(4):464–80. pmid:35596851
- 34. Alves CL, Pineda AM, Roster K, Thielemann C, Rodrigues FA. EEG functional connectivity and deep learning for automatic diagnosis of brain disorders: Alzheimer’s disease and schizophrenia. J Phys Complex. 2022;3(2):025001.
- 35. Zuber S, Bechtiger L, Bodelet JS, Golin M, Heumann J, Kim JH, et al. An integrative approach for the analysis of risk and health across the life course: challenges, innovations, and opportunities for life course research. Discov Soc Sci Health. 2023;3(1):14. pmid:37469576
- 36. Prado P, Birba A, Cruzat J, Santamaría-García H, Parra M, Moguilner S, et al. Dementia ConnEEGtome: towards multicentric harmonization of EEG connectivity in neurodegeneration. Int J Psychophysiol. 2022;172:24–38. pmid:34968581
- 37. Mindt MR, Okonkwo O, Weiner MW, Veitch DP, Aisen P, Ashford M, et al. Improving generalizability and study design of Alzheimer’s disease cohort studies in the United States by including under-represented populations. Alzheimers Dement. 2023;19(4):1549–57. pmid:36372959