Random survival forest model for early prediction of Alzheimer’s disease conversion in early and late Mild cognitive impairment stages

Amna Saeed; Asim Waris; Ahmed Fuwad; Javaid Iqbal; Jawad Khan; Dokhyl AlQahtani; Omer Gilani; Umer Hameed Shah; for The Alzheimer’s Disease Neuroimaging Initiative

doi:10.1371/journal.pone.0314725

Peer Review History

Original Submission July 18, 2024
16 Sep 2024 Decision Letter - Ghanim Ullah, Editor PONE-D-24-27193 Random Survival Forest Model for Early Prediction of Alzheimer's Disease Conversion in Early and Late Mild Cognitive Impairment Stages PLOS ONE Dear Dr. Waris, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please submit your revised manuscript by Oct 31 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript: A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. 
For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols. We look forward to receiving your revised manuscript. Kind regards, Ghanim Ullah, Ph.D. Academic Editor PLOS ONE Journal requirements: When submitting your revision, we need you to address these additional requirements. 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf 2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, we expect all author-generated code to be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse. 3. Thank you for stating the following financial disclosure: “This work was funded by the Higher Education Commission (HEC) of Pakistan under grant number 16052/NRPU/R&D/HEC/2021-2020.” Please state what role the funders took in the study. If the funders had no role, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript." If this statement is not correct you must amend it as needed. 
Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf. 4. Thank you for stating the following in the Acknowledgments Section of your manuscript: “The authors state that this work was funded by the Higher Education Commission (HEC) of Pakistan under grant number 16052/NRPU/R&D/HEC/2021-2020. Data collection and sharing for this project was funded by The Alzheimer's Disease Neuroimaging Initiative (ADNI). ADNI is funded by the National Institute on Aging (National Institutes of Health Grant U19 AG024904) for data gathering and sharing. The Northern California Institute for Research and Education is the grantee organization. In the past, ADNI has also received funding from the National Institute of Biomedical Imaging and Bioengineering, the Canadian Institutes of Health Research, and private sector contributions through the Foundation for the National Institutes of Health (FNIH) including generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research &Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics.” We note that you have provided funding information that is currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. 
We will only publish funding information present in the Funding Statement section of the online submission form. Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows: “This work was funded by the Higher Education Commission (HEC) of Pakistan under grant number 16052/NRPU/R&D/HEC/2021-2020.” Please include your amended statements within your cover letter; we will change the online submission form on your behalf. 5. We note that you have indicated that there are restrictions to data sharing for this study. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For more information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Before we proceed with your manuscript, please address the following prompts: a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially identifying or sensitive patient information, data are owned by a third-party organization, etc.) and who has imposed them (e.g., a Research Ethics Committee or Institutional Review Board, etc.). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent. b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. For a list of recommended repositories, please see https://journals.plos.org/plosone/s/recommended-repositories. You also have the option of uploading the data as Supporting Information files, but we would recommend depositing data directly to a data repository if possible. 
We will update your Data Availability statement on your behalf to reflect the information you provide. 6. We note that Figure 1 in your submission contains copyrighted images. All PLOS content is published under the Creative Commons Attribution License (CC BY 4.0), which means that the manuscript, images, and Supporting Information files will be freely available online, and any third party is permitted to access, download, copy, distribute, and use these materials in any way, even commercially, with proper attribution. For more information, see our copyright guidelines: http://journals.plos.org/plosone/s/licenses-and-copyright. We require you to either (1) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (2) remove the figures from your submission: 1. You may seek permission from the original copyright holder of Figure 1 to publish the content specifically under the CC BY 4.0 license. We recommend that you contact the original copyright holder with the Content Permission Form (http://journals.plos.org/plosone/s/file?id=7c09/content-permission-form.pdf) and the following text: “I request permission for the open-access journal PLOS ONE to publish XXX under the Creative Commons Attribution License (CCAL) CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). Please be aware that this license allows unrestricted use and distribution, even commercially, by third parties. Please reply and provide explicit written permission to publish XXX under a CC BY license and complete the attached form.” Please upload the completed Content Permission Form or other proof of granted permissions as an "Other" file with your submission. In the figure caption of the copyrighted figure, please include the following text: “Reprinted from [ref] under a CC BY license, with permission from [name of publisher], original copyright [original copyright year].” 2. 
If you are unable to obtain permission from the original copyright holder to publish these figures under the CC BY 4.0 license or if the copyright holder’s requirements are incompatible with the CC BY 4.0 license, please either i) remove the figure or ii) supply a replacement figure that complies with the CC BY 4.0 license. Please check copyright information on all replacement figures and update the figure caption with source information. If applicable, please specify in the figure caption text when a figure is similar but not identical to the original image and is therefore for illustrative purposes only. Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Partly Reviewer #2: Yes ******** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: No Reviewer #2: Yes ****** 3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. 
participant privacy or use of data from a third party—those must be specified. Reviewer #1: No Reviewer #2: Yes ****** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes ****** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: The present work aims to assess the conversion risk from early and late MCI to AD, employing several ML algorithms for survival analysis, including Random Survival Forests (RSF). The authors used data from ADNI and built separate models for the eMCI and lMCI cohorts to evaluate how the risk varies according to the onset of the disease. The findings suggest that RSF outperformed other algorithms, with lMCI showing a higher probability of conversion to AD. Although the authors' efforts are commendable, I must highlight several major issues. First, the manuscript closely resembles the work by Sarica et al., with the only apparent novelty being the comparison between eMCI and lMCI. More elaboration on how this work advances the field beyond previous studies is necessary. The reported high c-index for eMCI is surprising, given the low number of conversions to AD in this cohort. This raises concerns about the robustness of the findings. Additionally, the authors have not reported the number of events or censored data per year, which is crucial information. Twenty-three subjects over eight years is not sufficient. 
Furthermore, the authors did not declare the technique used for group balancing. It's important to note that balancing survival data is generally not recommended in the literature, as the time variable is a target and not a predictor. Regarding the timing of events, it is unclear why the authors chose to use years instead of months. A finer time resolution could potentially yield more accurate predictions. Table 1 should report the demographics of eMCI and lMCI cohorts, split between those who converted to AD and those who did not, along with the occurrence of events/censorship, as detailed in Sarica et al. ("Explainability of random survival forests in predicting conversion risk from mild cognitive impairment to Alzheimer’s disease." Brain Informatics 10.1 (2023): 31 and "Sex differences in conversion risk from mild cognitive impairment to Alzheimer’s disease: an explainable machine learning study with random survival forests and SHAP." Brain Sciences 14.3 (2024): 201). Additionally, the percentage of missing data per diagnosis is not provided, which is fundamental in determining whether imputation methods are appropriate. As a minor issue, the Results section should be separated from the Discussion to avoid confusion. The Discussion should focus on the clinical implications of the most important features identified in the analysis. Finally, in Figures 5 and 6, the feature rankings should be ordered by importance to clearly highlight the most significant variables. In summary, while I did not find major issues with the methodology itself, the authors must carefully revise the dataset and provide additional detail to ensure that their conclusions are fully supported by the data. Reviewer #2: Saeed et al. apply different machine learning (ML) survival models to predict the time to the conversion of early (eMCI) and late (lMCI) mild cognitively impaired patients to Alzheimer’s disease (AD) using both single modality and multimodal data. 
The models included Random Survival Forest (RSF), Extra Survival Trees (XST), Gradient Boosting Survival Analysis (GB), Survival Tree (ST), Cox-net, and Cox Proportional Hazard (CoxPH). The study finds that RSF performs best on both datasets (eMCI and lMCI). The authors also found that cognitive impairment is the best predictor of survival time and using multimodal data does not improve model performance significantly. This study departs from other studies on using ML in MCI to AD in that, here instead of classification the authors used survival models to predict the survival of the population and individual patients. Furthermore, the use of multimodal data to predict MCI to AD and splitting the data into eMCI and lMCI are also new additions to the field. Overall, I find this a thoroughly carried out study, which has the potential to advance the field. The paper is well-written. Thus, I recommend the manuscript for publication with the suggestion that the code reproducing the key results is made available as supplementary information with the paper. ****** 6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: Yes: Alessia Sarica Reviewer #2: No ******** [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] 
While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. https://doi.org/10.1371/journal.pone.0314725.r001
Revision 1
10 Oct 2024 Author Response Response to Editor: I am writing to submit the revised version of our manuscript titled "Random Survival Forest Model for Early Prediction of Alzheimer's Disease Conversion in Early and Late Mild Cognitive Impairment Stages." This paper investigates the differences in progression rates and predictive modeling between early and late Mild Cognitive Impairment stages using multimodal data. We have carefully addressed all the reviewers' comments and have made the necessary changes to enhance the clarity and quality of our manuscript. In our revision, we have ensured that the manuscript meets PLOS ONE's style requirements. The code used in our analysis will be made available at the time of publication. This work was funded by the Higher Education Commission (HEC) of Pakistan under grant number 16052/NRPU/R&D/HEC/2021-2020. As the Principal Investigator of the project, I secured this funding, which was used to provide all necessary resources. However, we have removed the specific financial details from the manuscript as requested. Additionally, we want to clarify that the dataset utilized in this study is owned by a third-party organization, the Alzheimer’s Disease Neuroimaging Initiative (ADNI). The data is publicly available from the ADNI website at https://adni.loni.usc.edu/data-samples/adni-data/#AccessData upon sending a request that includes the proposed analysis and the named lead investigator. Furthermore, we would like to inform you that Figure 1 was created using Canva, and the images in that figure are free elements covered under the Free Media License Agreement. This agreement allows us to use Canva designs for online or electronic publications. 
More information regarding the content license agreement can be found at https://www.canva.com/policies/content-license-agreement/ Response to Reviewer 1: Concern #1: The manuscript closely resembles the work by Sarica et al., with the only apparent novelty being the comparison between eMCI and lMCI. More elaboration on how this work advances the field beyond previous studies is necessary. Author Response: Thank you for your thoughtful feedback. We acknowledge that our work closely resembles the study by Sarica et al. and appreciate the opportunity to clarify how we have built upon and advanced it. Our contributions are as follows: • While our research builds on the foundation of Sarica et al., we have focused on an important aspect: analyzing how the progression to Alzheimer's Disease (AD) differs between early (eMCI) and late (lMCI) stages of Mild Cognitive Impairment. This detailed comparison has not been extensively covered in previous work, and we believe it adds significant value to the field. • To enhance precision and accuracy, we built separate models for eMCI and lMCI stages. This approach allows for a more tailored analysis of disease progression, rather than treating MCI as a uniform stage. • We compared six machine learning models to identify the most effective in predicting AD conversion, offering a broader evaluation than previous studies. This comparative approach highlights the strengths of different models in these distinct MCI stages. • For the eMCI dataset, which faced class imbalance, we implemented a robust strategy to handle the imbalance. This approach ensures that our model produces reliable results without overfitting or introducing bias, a common risk when dealing with imbalanced data. • Our study also performed a multimodal comparison, evaluating neuropsychological assessments, neuroimaging (MRI + PET), and CSF biomarkers both individually and in combination. 
This comprehensive analysis helped determine whether combining modalities improves predictive performance or if a single modality can perform equally well. • A key advancement in our work is generating individual survival curves from baseline data. These curves provide personalized risk assessments for each patient, offering valuable information to clinicians in both eMCI and lMCI stages. Author Action: Thank you for your valuable comments. In response, we have made additions to both the Introduction and Discussion sections to emphasize the novel aspects of our study and how it builds upon the work of Sarica et al. These additions highlight the unique contributions of our research, particularly the detailed comparison of early and late MCI stages, handling imbalance in dataset and the generation of personalized survival curves. • We have added the necessary revisions in the Introduction section, from page 5, line 111, to page 6 line 124: ‘While these studies have made substantial contributions, there is still a lack of extensive research on predicting time-to-AD conversion in the eMCI and lMCI stages using multimodal data. According to studies, the rate of AD progression differs among stages [21]. Hence it is crucial to develop ML models specific to each MCI stage, so they can capture the distinct patterns of each stage to generate more precise and personalized predictions. Such a stage-specific approach can enable clinicians to identify the individual at risk, allowing for timely interventions and better patient outcomes [21]. By developing separate models for eMCI and lMCI, our study addresses this need, advancing the precision of diagnosis and prognosis across different points in the disease continuum. Additionally, the handling of data imbalance in survival analysis datasets is a relatively underexplored challenge, especially in AD progression. 
To bridge these gaps, we implemented a comprehensive strategy using multiple ML survival models to predict AD conversion risks in eMCI and lMCI patients separately. This approach not only provides the first stage-specific ML-based survival analysis for MCI but also introduces methods to mitigate data imbalance, enabling more reliable and personalized risk assessments for clinical decision-making.’ • We have added the necessary revisions in the Discussion section, from page 26, line 538, to page 27, line 553: ‘Our results align with previous research, which has demonstrated the effectiveness of RSF in predicting time-to-event scenarios in both clinical and research settings [28], [29] [35], [36]. RSF possesses several key features that make it a reliable approach for disease forecasting, including built-in mechanisms to reduce overfitting, effectively handling high dimensional data, capturing complex relationships between predictors and survival outcomes and absence of convergence issues [37], [38]. These attributes contribute to its effectiveness in medical research and clinical applications. Our study builds upon the previous work by developing separate models for early and late MCI stages and identifying the key features that serve as important predictors for each stage. This approach enables more accurate predictions tailored to the unique progression patterns of each group. We also addressed the challenge of dataset imbalance by using oversampling techniques, leading to more reliable and unbiased model performance. We generated individual survival curves for both progressive and stable subjects in both datasets. The models used to generate these curves were trained separately on the eMCI and lMCI data. This approach enhances the precision and accuracy of the survival curves and is better suited to the specific characteristics of each stage. 
These results can help clinicians make early, personalized predictions of Alzheimer's progression, enabling timely interventions and better resource planning for at-risk patients.’ ________________________________________ Concern #2: The high c-index for eMCI is surprising, given the low number of conversions to AD in this cohort. This raises concerns about the robustness of the findings. Author Response: Thank you for raising this important concern regarding the unexpectedly high c-index for the eMCI cohort. Upon reviewing this comment, we realized that our original findings were not as robust as initially believed. The main reason behind this was the significant class imbalance in the eMCI dataset, which we had attempted to address by balancing the entire dataset before splitting it into training and test sets. As you correctly pointed out, this led to data leakage, where some oversampled data points appeared in both the training and test sets, resulting in higher accuracy and a biased model. Consequently, the test set was not truly "unseen" during evaluation, which compromised the reliability of our results. To rectify this, we adjusted our methodology. Instead of oversampling the entire dataset before splitting, we first divided the dataset into training and test sets and then oversampled the minority class (uncensored subjects) only in the training set. This way, the test set remained unbalanced, representing real-world data distribution, and contained unseen data for proper evaluation. (More on this in the response to Concern#4) Additionally, we incorporated cross-validation and hyperparameter optimization to further ensure the model's robustness. With these changes, the new results are reliable, achieving a c-index of 0.90 and IBS of 0.10 for the Random Survival Forest model, which we believe reflects the true predictive power of the model. 
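The ordering described in this response (split first, then oversample only the training portion) can be sketched as follows. This is a minimal illustration in plain Python under stated assumptions, not the authors' code: the toy cohort, the 80/20 stratified split, the seed, and the helper names `stratified_split` and `oversample_minority` are all hypothetical. The helper only mimics duplicate-with-replacement random oversampling; a ready-made `RandomOverSampler` class is distributed in the imbalanced-learn package.

```python
import random

def stratified_split(subjects, test_fraction=0.2, seed=0):
    """Split censored (0) and converter (1) subjects separately, before any resampling."""
    rng = random.Random(seed)
    train, test = [], []
    for label in (0, 1):
        group = [s for s in subjects if s["event"] == label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test

def oversample_minority(train, seed=0):
    """Randomly duplicate minority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    events = [s for s in train if s["event"] == 1]
    censored = [s for s in train if s["event"] == 0]
    minority, majority = sorted([events, censored], key=len)
    if not minority:
        return train[:]  # nothing to duplicate
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return train + extra

# Hypothetical cohort (assumption): 100 subjects, 10 of whom convert to AD.
subjects = [{"id": i, "event": 1 if i < 10 else 0, "years": 1 + i % 8}
            for i in range(100)]

train, test = stratified_split(subjects)
balanced_train = oversample_minority(train)  # only the training set is resampled

# No test subject may appear in the (oversampled) training data.
leaked = {s["id"] for s in balanced_train} & {s["id"] for s in test}
print(len(balanced_train), len(test), sorted(leaked))
```

Because duplication happens after the split, every copied row comes from a training subject, so the test set keeps its natural class ratio and shares no subject with the training data. Reversing the two steps is what allows copies of one subject to land on both sides of the split.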
Author Action: We have revised the 'Target Imbalance' section in the Methodology and made additions to the Results and Discussion sections. These updates explain how we address target imbalance to ensure robust findings and include updated results for the eMCI dataset. • We have revised the Abstract, on page 1, line 32 to line 33: ‘For eMCI, RSF trained on multimodal data achieved a C-Index of 0.90 and an IBS of 0.10. For lMCI, the C-Index was 0.82 and the IBS was 0.16.’ • We have revised the section ‘Target Imbalance’ in the Methodology, from page 10, line 208, to line 219, and updated the Figures 3 and 4: 2.3.4 Target Imbalance: ‘For prediction labels, patients were categorized into two groups: those showing progression of the disease (labeled '1') and those who did not (labeled '0'). Figure 3 compares the distribution of censored (stable) and uncensored (AD converters) subjects in both the eMCI and lMCI datasets. A noticeable imbalance is present in the eMCI group, where there are significantly fewer uncensored cases, whereas the lMCI group shows a more balanced distribution. Imbalanced datasets can lead to biased models that do not perform well on new data [22]. To address this, we used the 'random oversampler' from the sklearn library, which balances the class distribution by randomly duplicating samples from the minority class. We first split the dataset into training and testing sets, then oversampled the minority class in the training set to balance it before model training, which is illustrated in Figure 4. The testing set was left unbalanced to reflect real-world conditions and ensure the model was evaluated on unseen, naturally distributed data.’ Figure 3. Distribution of censored and uncensored data in eMCI and lMCI groups. Figure 4. Balancing the training set of eMCI group. • We have updated the Results section, on page 17, line 352 to line 357, and Figure 6. 
‘This section discusses the performance of ML models trained on multimodal data that includes all features. As compared to other models, RSF had the highest accuracy on both datasets. All the ML models outperformed the traditional CoxPH model in both datasets. For eMCI group, among the ensemble-based models, RSF showed the best performance (C-Index= 0.90 ± 0.03, IBS= 0.10 ± 0.02), followed by XST (C-Index= 0.86 ± 0.02, IBS= 0.10 ± 0.03) and Gradient Boosting (C-Index= 0.82 ± 0.02, IBS= 0.10 ± 0.03).’ Figure 6. Heatmap showing performance of models measured by C-Index and IBS for early and late MCI. • We have updated the Results section on page 19, line 382 – 385: ‘For the eMCI group, RSF trained on cognitive features achieved a C-Index of 0.85 ± 0.02 and an IBS of 0.11 ± 0.02 compared to Imaging features (C-Index= 0.76 ± 0.02, IBS= 0.14 ± 0.02, p<0.05) and CSF biomarkers (C-Index= 0.74 ± 0.03, IBS= 0.16 ± 0.02, p<0.05).’ • Furthermore, we have updated Figure 8 which illustrates the individual survival curves for individuals with eMCI and lMCI. The balancing approach has resulted in enhanced survival curves: (a) Progressive eMCI subject (b) Non-Progressive eMCI subject Figure 8. Predicted survival estimates for subjects with progressive eMCI and lMCI as well as those with non-progressive eMCI and lMCI. The red line refers to the actual event times for progressive/uncensored patients and the actual censoring time for non-progressive/censored patients. • We have made additions to the Discussion section, on page 24, line 490 to 510: ‘Handling missing data and target imbalance is a common challenge in biomedical research. To address this, we used KNN imputation to manage missing data. In the eMCI dataset, there was a significant imbalance between the number of subjects who progressed to AD (uncensored) and those who remained stable (censored). Imbalanced datasets can lead to biased models and overfitting, which affects the reliability of results [33]. 
Table 4 shows when the model was trained on the imbalanced dataset, it achieved 98% accuracy on the training set but only 71% on the test set. This indicates overfitting, suggesting that the model may have learned patterns specific to the training data that do not generalize to new data. Datta et al. [22] achieved improved predictive performance of their Cox elastic net regression models when trained on sampled data. This highlights the effectiveness of integrating sampling methods with survival analysis techniques to improve predictive performance in biomedical research. Oversampling is one of several sampling techniques used to address target imbalance in disease prediction studies [34]. To address imbalance in eMCI dataset, we first split the data into training and test sets. The training set was then balanced by oversampling the minority class (subjects progressing to AD), while the test set remained unbalanced so that the final performance evaluation remains unbiased. After applying this balancing strategy, the model’s performance improved significantly, achieving 97% accuracy on the training set and 90% on the test set. This improvement suggests that balancing the training data enhanced the model’s generalizability and reliability. Balancing the entire dataset before the train-test split was avoided, as it could cause data leakage, with some data points appearing in both the training and test sets, leading to overfitting and unreliable model evaluation. After completing preprocessing, we proceeded with training and evaluating the models.’ Table 4. RSF's performance before and after applying data balancing. ________________________________________ Concern #3: The authors have not reported the number of events or censored data per year, which is crucial information. Author Response: Thank you for bringing this to our attention. We agree that reporting the number of events and censored data per year is crucial for understanding the dataset's characteristics. 
We have now included this information in the manuscript as a figure, illustrating the number of censored events and total events per year for each dataset (eMCI and lMCI).

Author Action: Figure 2 has been added on page 8, line 3, providing details on the number of events and censored data for both eMCI and lMCI cohorts by year.

Figure 2. Censored and uncensored data distribution per year in early and late MCI individuals.

________________________________________

Concern #4: The technique used for group balancing was not declared. It is important to note that balancing survival data is generally not recommended in the literature.

Author Response: Thank you for raising this important concern regarding group balancing in survival analysis. In response, we acknowledge that balancing survival data is not always recommended in the literature due to the potential risks of introducing bias or overfitting. However, in our study, the decision to balance the training data was carefully considered due to the severe imbalance in the eMCI dataset, where the minority class (subjects progressing to AD) was significantly underrepresented. This imbalance posed a ris

Attachments
Attachment
Submitted filename: Response to Reviewers.docx

https://doi.org/10.1371/journal.pone.0314725.r002
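The split-then-balance order described in the author response (split first, oversample only the training set, leave the test set untouched) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the response does not name the oversampling technique, so random duplication of minority-class subjects is assumed here, and the function name `split_then_oversample`, the record fields `id`, `months`, and `event`, and the 100-subject toy cohort are all hypothetical.

```python
import random
from collections import Counter


def split_then_oversample(records, test_fraction=0.2, seed=0):
    """Stratified train/test split, then random oversampling of the
    minority class in the TRAINING set only (assumed technique).

    `records` is a list of dicts with a boolean "event" key
    (True = progressed to AD / uncensored, False = censored).
    Assumes both classes are present.
    """
    rng = random.Random(seed)
    train, test = [], []

    # Split each class separately so both sets keep the original class ratio.
    for label in (True, False):
        group = [r for r in records if r["event"] == label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])

    # Balance the training set by duplicating randomly chosen minority rows.
    counts = Counter(r["event"] for r in train)
    minority = min(counts, key=counts.get)
    minority_rows = [r for r in train if r["event"] == minority]
    n_extra = max(counts.values()) - counts[minority]
    extra = [rng.choice(minority_rows) for _ in range(n_extra)]

    # The test set is returned unchanged, so evaluation stays unbiased.
    return train + extra, test


# Hypothetical toy cohort: 100 subjects, 15 of whom progress to AD.
subjects = [{"id": i, "months": 12 + i % 48, "event": i < 15} for i in range(100)]
train_set, test_set = split_then_oversample(subjects)

print(Counter(r["event"] for r in train_set))  # class counts after balancing
print(Counter(r["event"] for r in test_set))   # original imbalance preserved
```

Because duplication happens after the split, every duplicated subject stays inside the training set and no subject can appear on both sides, which is the leakage the response says was avoided. With this toy cohort the training set ends up with 68 subjects per class, while the 20-subject test set keeps its 3 progressing subjects.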
15 Nov 2024 Decision Letter - Ghanim Ullah, Editor

Random survival forest model for early prediction of Alzheimer's disease conversion in early and late Mild cognitive impairment stages

PONE-D-24-27193R1

Dear Dr. Waris,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the ‘Update My Information' link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Ghanim Ullah, Ph.D.
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1.
If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed
Reviewer #2: All comments have been addressed

********

2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes
Reviewer #2: Yes

******

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #2: Yes

******

4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes
Reviewer #2: Yes

******

5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous.
Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: Yes

******

6. Review Comments to the Author. Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: I applaud the excellent work of the authors in reviewing. I am very impressed by the thorough responses they have given to my comments. The work has been greatly improved.

Reviewer #2: The authors have addressed all my concerns. The revised manuscript is clear and thorough. Therefore, I do not have any more concerns.

******

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Alessia Sarica
Reviewer #2: No

********

https://doi.org/10.1371/journal.pone.0314725.r003
Formally Accepted
29 Nov 2024 Acceptance Letter - Ghanim Ullah, Editor

PONE-D-24-27193R1

PLOS ONE

Dear Dr. Waris,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited
* All relevant supporting information is included in the manuscript submission
* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of Dr. Ghanim Ullah
Academic Editor
PLOS ONE

https://doi.org/10.1371/journal.pone.0314725.r004
