Peer Review History

Original Submission
September 6, 2023
Decision Letter - Billy Morara Tsima, Editor

PONE-D-23-26242
Application of Machine Learning to Predict Cognitive Deficits in HIV using Auditory and Demographic Factors
PLOS ONE

Dear Dr. Niemczak,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jan 18 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Billy Morara Tsima, MD MSc

Academic Editor

PLOS ONE

Journal requirements:

1. When submitting your revision, we need you to address these additional requirements.

Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at 

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf.

2. 1) Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, all author-generated code must be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

2) Please include a complete copy of PLOS’ questionnaire on inclusivity in global research in your revised manuscript. Our policy for research in this area aims to improve transparency in the reporting of research performed outside of researchers’ own country or community. The policy applies to researchers who have travelled to a different country to conduct research, research with Indigenous populations or their lands, and research on cultural artefacts. The questionnaire can also be requested at the journal’s discretion for any other submissions, even if these conditions are not met.  Please find more information on the policy and a link to download a blank copy of the questionnaire here: https://journals.plos.org/plosone/s/best-practices-in-research-reporting. Please upload a completed version of your questionnaire as Supporting Information when you resubmit your manuscript.

3. Note from Emily Chenette, Editor in Chief of PLOS ONE, and Iain Hrynaszkiewicz, Director of Open Research Solutions at PLOS: Did you know that depositing data in a repository is associated with up to a 25% citation advantage (https://doi.org/10.1371/journal.pone.0230416)? If you’ve not already done so, consider depositing your raw data in a repository to ensure your work is read, appreciated and cited by the largest possible audience. You’ll also earn an Accessible Data icon on your published paper if you deposit your data in any participating repository (https://plos.org/open-science/open-data/#accessible-data).

4. We note that the grant information you provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections does not match. 

When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the ‘Funding Information’ section.

5. Thank you for stating the following financial disclosure: 

 [YES. This study was funded by the National Institutes of Health (NIH), grant number 5R01DC009972 to principal investigator Jay C. Buckey M.D. The content of this report is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.].  

Please state what role the funders took in the study. If the funders had no role, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript." 

If this statement is not correct you must amend it as needed. 

Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf.

6. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.

""Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.

Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

We will update your Data Availability statement to reflect the information you provide in your cover letter.

7. Please upload copies of Figures S1 and S3, to which you refer in your text on pages 1 and 2 of the Supplementary Information. If the figures are no longer to be included as part of the submission, please remove all references to them within the text.

8. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information. 


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: No

Reviewer #2: Yes

Reviewer #3: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: Yes

Reviewer #3: No

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: In this work, the authors try to predict cognitive deficits in HIV using auditory measurements and demographic factors. The methodology, results, and conclusions presented by the authors are unsatisfactory and incoherent, and do not meet the standards of a well-conducted study.

This work is highly incremental, i.e., applying stock machine learning techniques to understand the predictive capability of auditory signals beyond demographic factors. The authors start by setting the objective of predicting neurocognitive deficits in patients living with HIV using auditory tests and demographic factors. However, the experiments conducted and the results presented do not seem to provide any added value to the research field.

In Table 2, the authors compare the performance of different machine learning methods for their task, showing very modest gains that do not demonstrate the importance of those small gains for the overall objective. Also, it is unclear how this prediction relates to predicting cognitive deficits in PLWH vs. healthy populations.

In Figure 2, the Shapley values for some of the variables are very different between the Gaussian naive Bayes and kernel naive Bayes methods, which casts doubt on the model predictions. No comment or further investigation has been provided on this observed phenomenon.

Overall, the manuscript provides little added value and is not suitable for publication. The results are unsatisfactory, are not adequately substantiated with evidence, do not address the key questions in the objective, and are incoherent in the several ways described above.

Reviewer #2: Summary:

The paper uses machine learning (ML) algorithms to predict cognitive deficits from auditory and demographic data. In 5 of the 7 ML algorithms under testing, auditory data help improve prediction performance over models based only on demographic data. Naive Bayes based models are the best-performing models (ROC-AUC around 0.9), based on which the most important features for cognitive deficits are identified. Both the statistical analysis and the interpretations look good. Please see the comments below:

Major:

In 2.1 Data Collection - Cognitive Data Collection, is the cognitive impairment indicator variable (MoCA score < 26) the only response variable this study focuses on? Are the "two sample" in the t-test and the "classification error" in the Bhattacharyya algorithm based on the binary variable derived from the MoCA score (<26)?

The data collection took place during 2017-2022, which includes the SARS-CoV-2 pandemic period. Can the authors comment on whether the data collection procedures changed before and after the pandemic, and whether the visit intervals of the subjects were also impacted? If yes, how would these factors influence the data and results? E.g., due to the pandemic, it is possible that a subject had multiple visits before the pandemic and one visit after it, where the time intervals between the pre-pandemic visits are short while the post-pandemic interval is much longer. The data from that single post-pandemic visit would tend to have a higher influence on the slope than the other, pre-pandemic visits.

Can the authors discuss further why the naive Bayes models perform better than the others? And does the strong assumption of naive Bayes hold in this study?

It seems the auditory variable trajectories are useful for predicting cognitive function. Can the authors comment on whether it would be interesting to predict the cognitive function trajectory instead of the final cognitive function?

"Ensemble model" represents a large class of models; which one is tested in this paper?

As indicated in the discussion, HIV status is not a significant predictor of cognitive function. I think it makes more sense not to highlight HIV in the title and abstract. After reading the title and abstract, my expectation was that there would be a section discussing how auditory and demographic factors can be used to predict cognitive deficits specifically caused by HIV.

Minor:

Typo "2.63 Data Collection" -> "2.1 Data Collection"

In supplementary figure 1, it seems that from N=557 to N=478 there are some additional exclusion criteria, but the second paragraph of "2.1 Data Collection - Subjects" does not mention them.

In supplementary figure 1, typo "Exlusionary" -> "Exclusionary"

In supplementary figure 2, the feature names on y axis are missing.

In Table 3, can the authors describe how the p-values were calculated?

Reviewer #3: The publication is well written; there is not really a lot of 'new research' with respect to machine learning, but this is an interesting use case.

Under "main outcomes" the "area under the curve" should specify the AUC is for the receiver operational characteristic (ROC) curve, and the other metrics should be cited too (F1 and Yourdon).

There are many references that describe problems or issues with the AUC measure, particularly for imbalanced data sets such as this one. Precision/recall curves would be a good complementary measure too. The main reason for this issue is that the work is fairly straightforward but does show some level of utility, so gaining more insight into how well the concept might work would be useful. For this reason, I would also like to see the confusion matrices from the supplement inserted and worked into the main text as well, for this is often the best way to see the impact of selected thresholds (which translate ROC curves and PR curves into 'real world' impact).
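A minimal sketch of the complementary precision/recall analysis suggested here, using scikit-learn on synthetic stand-in data (the model, class balance, and all names are hypothetical illustrations, not the study's pipeline):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, precision_recall_curve, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (roughly 4:1), standing in for the study's test set.
X, y = make_classification(n_samples=500, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

scores = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
precision, recall, _ = precision_recall_curve(y_te, scores)

# On imbalanced data the PR-AUC usually sits well below the ROC-AUC, which is
# why it is the recommended complementary view.
print(f"ROC-AUC: {roc_auc_score(y_te, scores):.3f}")
print(f"PR-AUC:  {auc(recall, precision):.3f}")
```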

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Revision 1

Response to the Reviewers:

We would like to thank the editor and reviewers for their comments and suggestions on our manuscript. We have made the necessary corrections, outlined below starting with the editor’s comments.

Editor’s comments:

We note that the grant information you provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections does not match. When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the ‘Funding Information’ section.

- This has been done.

In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety.

- Due to the number of potentially identifiable features in the data used for the machine learning algorithm, the Dartmouth College IRB and Muhimbili University of Health and Allied Sciences (MUHAS) have asked us to restrict data sharing to those with proper compliance certifications. We welcome data sharing but want to ensure a proper data use agreement is in place so that the data are handled properly and with complete confidentiality.

Reviewer #1:

In this work, the authors try to predict cognitive deficits in HIV using auditory measurements and demographic factors. The methodology, results, and conclusions presented by the authors are unsatisfactory and incoherent, and do not meet the standards of a well-conducted study.

- The longitudinal study in Tanzania offers a rare opportunity to examine predictors of neurocognitive function. Studying auditory tests as potential predictors is novel. The extensive dataset we have is particularly well suited to machine learning approaches. We believe we understand the reviewer’s concerns and have addressed them in the responses below.

- While we recognize the limitations of this study, we believe our approach adds to the literature on how neurocognitive deficits can be detected using auditory measures. This approach combines multiple demographic variables with novel auditory assessments to assess and predict neurocognitive deficits. If auditory tests can predict neurocognitive dysfunction using machine learning techniques, this could change how neurocognitive deficits, particularly those related to HIV, are monitored and detected in low- and middle-income countries.

This work is highly incremental, i.e., applying stock machine learning techniques to understand the predictive capability of auditory signals beyond demographic factors.

- We agree the work is incremental, but we believe an incremental approach is needed. Auditory test results correlate with neurocognitive deficits. Turning this correlation into a useful prediction requires incremental testing and the use of multiple variables. We believe starting with standard or stock machine learning approaches is a good first step. The machine learning techniques used in this manuscript offer interpretability, efficiency, robustness, and insights into feature importance. These benefits make them suitable for various research and clinical applications. If auditory test results can add to the predictive capability for neurocognitive deficits, they could dramatically change the landscape of neurocognitive treatment, monitoring, and detection.

The authors start by setting the objective of predicting neurocognitive deficits in patients living with HIV using auditory tests and demographic factors. However, the experiments conducted and the results presented do not seem to provide any added value to the research field.

- To our knowledge, a machine learning approach such as the one used here has not previously been attempted using auditory test results combined with demographic factors. The results show improved prediction when the auditory tests are added, which is a new result that has not been shown before. This finding is important because neurocognitive assessments can require long protocols and trained administrators. They are also greatly affected by education. Auditory tests are comparatively faster, can be administered by minimally trained personnel, and don’t require much education to understand. While we have not proved that auditory tests can substitute for neurocognitive assessments, this manuscript supports our understanding of this predictive relationship and moves us closer to providing accurate and accessible measures of cognitive dysfunction using readily acquired data.

In Table 2, the authors compare the performance of different machine learning methods for their task, showing very modest gains that do not demonstrate the importance of those small gains for the overall objective. Also, it is unclear how this prediction relates to predicting cognitive deficits in PLWH vs. healthy populations.

- We have tried to make the importance of these gains clearer. Demographic factors alone are strong predictors of neurocognitive deficits, so any significant gains in predictive value from additional tests, as shown in our manuscript, are worthy of further examination. We have moved the confusion matrices from the supplementary materials to the body of the manuscript to show how gains in machine learning prediction with auditory variables can be important. For example, increased age is highly related to decreased working memory, processing speed, and executive function. Our auditory tests add a significant 9% to AUC values. The fact that auditory variables add predictive value beyond age is very appealing. In addition, the F1 scores and Youden’s Index values are also larger for predictive models with auditory variables compared to those without.
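For readers less familiar with these metrics, a minimal sketch of how ROC-AUC, F1, and Youden's Index are computed from a held-out test set; the arrays are hypothetical stand-ins, not the study's data:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score

# Hypothetical stand-ins for held-out test labels, predicted probabilities,
# and thresholded class predictions (not the study's data).
y_true  = np.array([0, 0, 0, 1, 1, 0, 1, 0, 1, 0])
y_score = np.array([0.1, 0.4, 0.2, 0.8, 0.7, 0.3, 0.9, 0.2, 0.6, 0.5])
y_pred  = (y_score >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
youden = tp / (tp + fn) + tn / (tn + fp) - 1  # sensitivity + specificity - 1
print(f"ROC-AUC    = {roc_auc_score(y_true, y_score):.2f}")
print(f"F1         = {f1_score(y_true, y_pred):.2f}")
print(f"Youden's J = {youden:.2f}")
```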

In Figure 2, the Shapley values for some of the variables are very different between the Gaussian naive Bayes and kernel naive Bayes methods, which casts doubt on the model predictions. No comment or further investigation has been provided on this observed phenomenon.

- We appreciate this comment. With Gaussian naïve Bayes we assume the data follow a Gaussian distribution. Kernel naïve Bayes relaxes this assumption by modeling the distribution via kernel density estimation. This approach is likely more prone to overfitting because the kernel method requires additional hyperparameters for the density function used to model each feature's distribution. Any misspecification of the kernel parameters can impact the fit; e.g., if the kernel bandwidth is too small, the modeled distribution for that feature may not generalize. Both methods have pros and cons. We wanted to be as comprehensive as possible, so we chose to include both measures even though Gaussian naïve Bayes outperformed kernel naïve Bayes. This has been clarified in the manuscript.
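To make the bandwidth point concrete, here is a minimal kernel naive Bayes sketch (an illustrative reimplementation in Python, not the study's code): each feature's class-conditional density is modeled with a univariate KDE, and an overly small bandwidth memorizes the training data, whereas Gaussian naive Bayes's parametric form acts as a regularizer.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KernelDensity

class KernelNB(BaseEstimator, ClassifierMixin):
    """Minimal kernel naive Bayes: one univariate KDE per (class, feature)."""
    def __init__(self, bandwidth=1.0):
        self.bandwidth = bandwidth

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.priors_ = {c: np.mean(y == c) for c in self.classes_}
        # Naive independence: each feature's class-conditional density is fit alone.
        self.kdes_ = {c: [KernelDensity(bandwidth=self.bandwidth).fit(X[y == c][:, [j]])
                          for j in range(X.shape[1])]
                      for c in self.classes_}
        return self

    def predict(self, X):
        # Score per class: log P(c) + sum_j log p(x_j | c).
        scores = np.column_stack([
            np.log(self.priors_[c]) + sum(kde.score_samples(X[:, [j]])
                                          for j, kde in enumerate(self.kdes_[c]))
            for c in self.classes_])
        return self.classes_[scores.argmax(axis=1)]

# Synthetic data; a tiny bandwidth tends to overfit relative to Gaussian NB.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for model in (GaussianNB(), KernelNB(bandwidth=1.0), KernelNB(bandwidth=0.01)):
    acc = (model.fit(X_tr, y_tr).predict(X_te) == y_te).mean()
    print(f"{model}: test accuracy = {acc:.2f}")
```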


Overall, the manuscript provides little added value and is not suitable for publication. The results are unsatisfactory, are not adequately substantiated with evidence, do not address the key questions in the objective, and are incoherent in the several ways described above.

- We believe that many readers may not be aware that auditory variables could be used to improve predictions of neurocognitive performance and would find these results interesting. We agree the current results represent an incremental step in understanding how the auditory system can be used as a window into neurocognitive function, but they also show a novel approach to predicting neurocognitive deficits.

Reviewer #2:

The paper uses machine learning (ML) algorithms to predict cognitive deficits from auditory and demographic data. In 5 of the 7 ML algorithms under testing, auditory data help improve prediction performance over models based only on demographic data. Naive Bayes based models are the best-performing models (ROC-AUC around 0.9), based on which the most important features for cognitive deficits are identified. Both the statistical analysis and the interpretations look good. Please see the comments below:

- Thank you for your review of our manuscript. We have addressed your comments below.

In 2.1 Data Collection - Cognitive Data Collection, is the cognitive impairment indicator variable (MoCA score < 26) the only response variable this study focuses on? Are the "two sample" in the t-test and the "classification error" in the Bhattacharyya algorithm based on the binary variable derived from the MoCA score (<26)?

- Yes, the MoCA was the only response variable we focused on for this study. Because of the MoCA’s validated binary classifier (<26 = impairment), we felt confident that this response variable would indicate neurocognitive deficits and be understandable to readers given the international usage of this tool. Yes, the t-test and the classification error in the Bhattacharyya algorithm are based on the binary MoCA scores. We have clarified this in the figure legend.

The data collection took place during 2017-2022, which includes the SARS-CoV-2 pandemic period. Can the authors comment on whether the data collection procedures changed before and after the pandemic, and whether the visit intervals of the subjects were also impacted? If yes, how would these factors influence the data and results? E.g., due to the pandemic, it is possible that a subject had multiple visits before the pandemic and one visit after it, where the time intervals between the pre-pandemic visits are short while the post-pandemic interval is much longer. The data from that single post-pandemic visit would tend to have a higher influence on the slope than the other, pre-pandemic visits.

- These are important questions. We did not change the testing protocol except for the use of PPE during testing after the pandemic. We have published on the effects of PPE on neurocognitive tests using the Leiter-3 in pediatric patients, which showed no main effect of PPE on neurocognitive measures. The Leiter-3 involves much more operator interaction than the MoCA, so we believe the MoCA results were likely not affected. The MoCA follows strict verbal instructions that are repeated the same way for every subject.

- There was at least a 6-month break in subject testing due to the pandemic, but after resuming activities, most subjects resumed normal frequency of study visits (twice per year). Longitudinal plots of the data do not show major changes in slope related to the pandemic.

Can the authors discuss further why the naive Bayes models perform better than the others? And does the strong assumption of naive Bayes hold in this study?

- We believe the naïve Bayes models performed better than the other machine learning algorithms because they: 1) are highly scalable and can handle a large number of features, making them suitable for our high-dimensional dataset; 2) have low variance and are efficient with small datasets, which means they are less prone to overfitting, especially with our test set of 96 data points; and 3) assume conditional independence among features given the class label, effectively ignoring correlations between them. This means that the presence or absence of one feature does not affect the probability of another feature occurring, given the class. In other words, even when irrelevant features are present in the dataset, naïve Bayes models can still perform well because they focus on the most discriminative features (i.e., the auditory variables) relevant to predicting the class label. Additionally, when the independence assumption is violated, naïve Bayes may give less weight to redundant features; while this affects their apparent feature importance, it can also reduce overreliance on any single feature that might otherwise degrade predictive performance on a test set, i.e., overfitting.
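For reference, the conditional independence assumption underlying both naïve Bayes variants can be written as

$$P(c \mid x_1, \ldots, x_p) \;\propto\; P(c) \prod_{j=1}^{p} p(x_j \mid c)$$

so each feature contributes its own class-conditional likelihood term, and correlations between features are not explicitly modeled.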

- We have added a summary of why the naïve Bayes models perform better than the others in the discussion.

It seems the auditory variable trajectories are useful for predicting cognitive function. Can the authors comment on whether it would be interesting to predict the cognitive function trajectory instead of the final cognitive function?

- This is an excellent point, and predicting the trajectory of cognitive function is a long-term goal. As a first step, we wanted to see whether the auditory variables (mean and slope) would predict MoCA deficits at the timepoint with the highest likelihood of finding impaired subjects (the last visit). That is, with increased time, more subjects will show cognitive deficits; therefore, the latest visit would likely have the worst neurocognitive outcome. In a practical scenario, such as a clinical setting with limited resources or time constraints, focusing on one visit also provides an approach to assessing cognitive function without the need for extensive longitudinal data collection and analysis, especially when immediate decisions need to be made.

- Nevertheless, predicting the trajectory of neurocognitive function would provide a dynamic understanding of decline over time, which can be valuable for early detection and intervention. We plan to analyze trajectories on other neurocognitive tests. With this future work we may be able to identify patterns of change, such as accelerating decline or stabilization, which may offer insights into the underlying mechanisms of cognitive impairment. This study focused on whether auditory variables could add to the predictive ability for neurocognitive function at one timepoint.

- We have added this to the limitations section of the manuscript.

"Ensemble model" represents a large class of models; which one is tested in this paper?

- We used a LogitBoost model, an ensemble aggregation algorithm used primarily for binary classification. Because the ensemble aggregation method was a boosting algorithm, the ensemble was composed of classification trees allowing a maximum of 10 splits, with one hundred trees in total. We have added “LogitBoost” to the methods section, but because LogitBoost belongs to the overarching family of ensemble models, we chose to keep that verbiage throughout the manuscript.
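For orientation, a rough scikit-learn analogue of this configuration (a sketch under stated assumptions: scikit-learn has no LogitBoost class, so gradient boosting on the logistic loss, a close relative that targets the same loss via a different optimization scheme, stands in; the data are synthetic):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# 100 boosted classification trees; max_leaf_nodes=11 caps each tree at 10
# splits, mirroring the configuration described above. The default loss for
# classification is the logistic deviance.
X, y = make_classification(n_samples=400, random_state=0)
clf = GradientBoostingClassifier(n_estimators=100, max_leaf_nodes=11, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())
```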

As indicated in the discussion, HIV status is not a significant predictor of cognitive function. I think it makes more sense not to highlight HIV in the title and abstract. After reading the title and abstract, my expectation was that there would be a section discussing how auditory and demographic factors can be used to predict cognitive deficits specifically caused by HIV.

- The authors debated this same question. While the PLWH group had a higher percentage of cognitive impairment, HIV status was not a significant predictor in any of the algorithms. We have removed HIV from the title and tempered our focus on HIV in favor of predicting neurocognitive impairment.

Typo "2.63 Data Collection" -> "2.1 Data Collection"

- Corrected

In supplementary figure 1, it seems that from N=557 to N=478 there are some additional exclusion criteria, but the second paragraph of "2.1 Data Collection - Subjects" does not mention them.

- We corrected this in the supplementary file and in the body of the manuscript. We separated out those with hearing loss (excluded n=37) and those with abnormal middle ear function (excluded n=79).

In supplementary figure 1, typo "Exlusionary" -> "Exclusionary"

- Corrected

In supplementary figure 2, the feature names on y axis are missing.

- Added feature names.

In Table 3, can the authors describe how the p-values were calculated?

- In Table 3, p-values were calculated using a two-proportion z-test for accuracy and the DeLong method (which compares two ROC curves) for AUC. We have added this to the results section.
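For illustration, a minimal sketch of the two-proportion z-test using statsmodels; the counts are hypothetical stand-ins (n = 96 test points, per the response above, but invented numbers of correct classifications), and the DeLong AUC comparison is omitted here because it requires a dedicated implementation (e.g., the pROC package in R):

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical: correct classifications out of 96 test points for the
# demographic-only model vs. the demographic + auditory model.
correct = np.array([75, 84])
n_obs = np.array([96, 96])
z_stat, p_value = proportions_ztest(correct, n_obs)
print(f"z = {z_stat:.2f}, p = {p_value:.3f}")
```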

Reviewer #3:

The publication is well written; there is not really a lot of 'new research' with respect to machine learning, but this is an interesting use case.

- We appreciate this comment. We believe this is an interesting application of machine learning approaches to help understand how auditory measures could predict neurocognitive dysfunction.

Under "main outcomes" the "area under the curve" should specify the AUC is for the receiver operational characteristic (ROC) curve, and the other metrics should be cited too (F1 and Yourdon).


Attachment
Submitted filename: Response to the Reviewers v3.docx
Decision Letter - Billy Morara Tsima, Editor

Machine Learning for Predicting Cognitive Deficits using Auditory and Demographic Factors

PONE-D-23-26242R1

Dear Dr. Niemczak,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the ‘Update My Information’ link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Billy Morara Tsima, MD MSc

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Yes

Reviewer #3: (No Response)

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

Reviewer #3: (No Response)

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: No

Reviewer #3: (No Response)

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

Reviewer #3: (No Response)

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: The revisions look good. All the questions have been answered well. I recommend accepting it for publication.

Reviewer #3: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

Reviewer #3: No

**********

Formally Accepted
Acceptance Letter - Billy Morara Tsima, Editor

PONE-D-23-26242R1

PLOS ONE

Dear Dr. Niemczak,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission,

* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Billy Morara Tsima

Academic Editor

PLOS ONE

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.