Peer Review History
| Original Submission: April 2, 2021 |
|---|
|
PONE-D-21-10920
A prospectively validated novel risk prediction model for new onset heart failure utilizing a large statewide health information exchange
PLOS ONE

Dear Dr. Duong,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Thank you for the opportunity to edit this. It's a good piece of work, and will add to the literature in this rapidly expanding area. The more detail that can be added for translatability/reproducibility, the better. Similarly, ensuring the manuscript meets a systematic review checklist would increase the assessed quality. All suggestions are optional and intended to add value and potential impact.

Please submit your revised manuscript by Jul 10 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols. We look forward to receiving your revised manuscript. Kind regards, Dylan A Mordaunt Academic Editor PLOS ONE Journal Requirements: When submitting your revision, we need you to address these additional requirements. 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 2. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice. 3. 
PLOS requires an ORCID iD for the corresponding author in Editorial Manager on papers submitted after December 6th, 2016. Please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field. This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager. Please see the following video for instructions on linking an ORCID iD to your Editorial Manager account: https://www.youtube.com/watch?v=_xcclfuvtxQ 4. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. In your revised cover letter, please address the following prompts: a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially identifying or sensitive patient information) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent. b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. Please see http://www.bmj.com/content/340/bmj.c181.long for guidelines on how to de-identify and prepare clinical data for publication. 
For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. We will update your Data Availability statement on your behalf to reflect the information you provide. 5. Thank you for stating the following in the Competing Interests section: "I have read the journal's policy and the authors of this manuscript have the following competing interests: Dr. Ling, Mr. Widen, and Dr. Sylvester are co-founders and shareholders of HBI Solutions." We note that one or more of the authors are employed by a commercial company: HBI Solutions. a) Please provide an amended Funding Statement declaring this commercial affiliation, as well as a statement regarding the Role of Funders in your study. If the funding organization did not play a role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript and only provided financial support in the form of authors' salaries and/or research materials, please review your statements relating to the author contributions, and ensure you have specifically and accurately indicated the role(s) that these authors had in your study. You can update author roles in the Author Contributions section of the online submission form. Please also include the following statement within your amended Funding Statement. “The funder provided support in the form of salaries for authors [insert relevant initials], but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.” If your commercial affiliation did play a role in your study, please state and explain this role within your updated Funding Statement. 
b) Please also provide an updated Competing Interests Statement declaring this commercial affiliation along with any other relevant declarations relating to employment, consultancy, patents, products in development, or marketed products, etc. Within your Competing Interests Statement, please confirm that this commercial affiliation does not alter your adherence to all PLOS ONE policies on sharing data and materials by including the following statement: “This does not alter our adherence to PLOS ONE policies on sharing data and materials.” (as detailed online in our guide for authors http://journals.plos.org/plosone/s/competing-interests). If this adherence statement is not accurate and there are restrictions on sharing of data and/or materials, please state these. Please note that we cannot proceed with consideration of your article until this information has been declared. Please include both an updated Funding Statement and Competing Interests Statement in your cover letter. We will change the online submission form on your behalf. Please know it is PLOS ONE policy for corresponding authors to declare, on behalf of all authors, all potential competing interests for the purposes of transparency. PLOS defines a competing interest as anything that interferes with, or could reasonably be perceived as interfering with, the full and objective presentation, peer review, editorial decision-making, or publication of research or non-research articles submitted to one of the journals. Competing interests can be financial or non-financial, professional, or personal. Competing interests can arise in relationship to an organization or another person. Please follow this link to our website for more details on competing interests: http://journals.plos.org/plosone/s/competing-interests

Additional Editor Comments:

- One of the editors has asked for expansion on details of the HIE, as this would be of interest both for reproducibility and in terms of translation. 
I understand that the Maine HIE was based on Orion Health's Rhapsody HL7 integration engine and associated stack, which is a very similar stack to both HealthENet in NSW and CalIndex in California.

- I'm aware that previously Maine HIE had operationalized readmissions predictions based on daily extracts from the HIE, returning predictions to case managers. Although this was quality driven, it's worth considering that this was motivated by CMS funding. This track record would be worth citing.

- I can see how this model would be useful and I think the authors should be commended. The features are somewhat teleological for the purpose, but it may be that in translation the application is different. Just something for consideration.

- This is a good piece of work. I would suggest that if this were to be included in a systematic review or critically analysed, it would be worth the authors undertaking the TRIPOD checklist or similar (https://www.tripod-statement.org/), so as to increase its impact and assessed quality.

[Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes ********** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes ********** 3. Have the authors made all data underlying the findings in their manuscript fully available? 
The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: No ********** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes ********** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: Please include a more detailed description of the Health Information Exchange (HIE) - is it designed to provide real-time point-of-care access to practitioners, or is it a passive repository for secondary analysis purposes? I would like to see a sensitivity analysis that ascertains HF based on only a single outpatient HF code (rather than two outpatient codes) - does this improve sensitivity, PPV and AUC? Perhaps the authors could add to the discussion the potential to use newer survival analysis extensions to deep learning models, which might require less dimensionality reduction. 
Could expand a bit more on why there is variation in provider billing practices, how this influences ICD coding, and how this might impact the predictive models. ********** 6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: Yes: Louisa R Jorm [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. |
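Reviewer #1's requested sensitivity analysis amounts to re-deriving the outcome label under the looser case definition (a single outpatient HF code instead of two) and recomputing sensitivity, PPV, and AUC. A minimal, purely illustrative sketch of that comparison, assuming Python with scikit-learn (the synthetic code counts and risk scores below are hypothetical stand-ins for the study data, not the authors' pipeline):

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

def evaluate(y_true, y_score, threshold=0.3):
    """Sensitivity, PPV, and AUC for one outcome definition."""
    y_pred = (y_score >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return tp / (tp + fn), tp / (tp + fp), roc_auc_score(y_true, y_score)

# Hypothetical data: outpatient HF code counts and model risk scores.
rng = np.random.default_rng(0)
n = 10_000
code_counts = rng.poisson(0.05, n)                  # outpatient HF codes per patient
risk = np.clip(rng.beta(1, 20, n) + 0.3 * (code_counts > 0), 0, 1)

y_strict = (code_counts >= 2).astype(int)           # two outpatient codes
y_single = (code_counts >= 1).astype(int)           # a single outpatient code

for name, y in [("two codes", y_strict), ("single code", y_single)]:
    sens, ppv, auc = evaluate(y, risk)
    print(f"{name}: sensitivity={sens:.3f} PPV={ppv:.3f} AUC={auc:.3f}")
```

The point of the exercise is that only the labels change between the two runs; any shift in the three metrics then isolates the effect of the case definition itself.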
| Revision 1 |
|
PONE-D-21-10920R1
A prospectively validated novel risk prediction model for new onset heart failure utilizing a large statewide health information exchange
PLOS ONE

Dear Dr. Duong,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Oct 07 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols. We look forward to receiving your revised manuscript. Kind regards, Dylan A Mordaunt Academic Editor PLOS ONE Journal Requirements: Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice. Additional Editor Comments: Thank you for your submission and amendments. We have received some additional feedback with regards to methods as per the reviewers. Whether these are minor or major suggestions is a matter of perspective, but they are all worth considering. In particular it would be useful to describe how and why this model is different from previous models in the field. 
All suggestions are addressable. I look forward to receiving your resubmission. [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation. Reviewer #2: (No Response) Reviewer #3: (No Response) Reviewer #4: (No Response) Reviewer #5: (No Response) ********** 2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #2: Partly Reviewer #3: Partly Reviewer #4: Partly Reviewer #5: Partly ********** 3. Has the statistical analysis been performed appropriately and rigorously? Reviewer #2: N/A Reviewer #3: Yes Reviewer #4: Yes Reviewer #5: I Don't Know ********** 4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. 
participant privacy or use of data from a third party—those must be specified. Reviewer #2: No Reviewer #3: No Reviewer #4: Yes Reviewer #5: No ********** 5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #2: Yes Reviewer #3: Yes Reviewer #4: Yes Reviewer #5: Yes ********** 6. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #2: Some previous comments have been addressed. From my point of view, there are some major points that still need to be addressed to meet the quality required for publication: 1. The manuscript itself lacks a lot of literature review on related works. 2. The authors should compare the performance results to previous studies on the same dataset. 3. The authors should propose more feature selection techniques to find out the optimal ones. 4. How did the authors perform hyperparameter optimization of the models? 5. Machine learning-based models (e.g., XGBoost) have been used in previous biomedical studies such as PMID: 31987913 and PMID: 32942564. Thus, the authors are suggested to refer to more works in this description to attract a broader readership. 6. There must be a space before each reference number. Reviewer #3: Aim: This paper developed a risk prediction tool to detect incident heart failure in adults using a large state-wide CCHIE from the state of Maine, USA. 
A tree-boosting algorithm was trained in order to model the probability of incident heart failure in year two from data collected in year one, and then prospectively validated in year three. This paper tackles a very important problem that could be solved by using existing routinely collected data. In addition, it shows how difficult it can be for an algorithm to predict HF based on administrative and billing codes such as ICD-10. I enjoyed reading the paper. The model obtains a high specificity but a very low sensitivity. This could be expected as the classes are highly imbalanced, making this problem a difficult one. If the algorithm classifies everyone as healthy, it will already have very high specificity. I would invite the authors to tackle this problem by using some of the machine learning techniques that deal with imbalanced datasets, for example, using a Class Weighted XGBoost or Cost-Sensitive XGBoost. In addition, given the vast amount of data (labs, procedures, medicines,...) the authors could use other markers in addition to ICD-10 to better define HF. This will include some work with clinicians. Finally, the authors seem to have just "plugged" XGBoost into the data without tuning any hyper-parameters. I would also invite the authors to re-visit this and tune some of the most important hyper-parameters of the XGBoost. This could significantly improve the performance of the predictive algorithm. Please find some other suggestions below: Abstract: 1. Methods and Results: "A tree-boosting algorithm was developed": A tree-boosting algorithm was trained, rather than developed. 2. Conclusions: See my comment below regarding the term "prospectively validated". Methods Database and subject selection criteria: 1. I would use the terms "training and test" sets for the sets that you use for training and testing. I am not 100% clear which part of the data you use for training and which one for testing. Explain this in detail with dates and number of records. 
Table 1 will be ideal for that. Every machine learning algorithm is tested in some data that the model hasn't seen during the training, but I wouldn't call it "prospectively validated" unless you collect the data prospectively, which, as far as I understood, is not the case here. This paper may be of interest: https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2760438 (Prospective and External Evaluation of a Machine Learning Model to Predict In-Hospital Mortality of Adults at Time of Admission, Nathan Brajer). They used the terms training and test sets to build the model, and then they prospectively validated the model with real-time data: "The model was integrated into the production electronic health record system and prospectively validated on a cohort of 5273 hospitalizations representing 4525 unique adult patients admitted to hospital A between February 14, 2019, and April 15, 2019." Therefore, in my opinion, you use the data as if it was collected prospectively, which is exactly the idea behind any test set, but you didn't integrate the model into production and validate prospectively. I would remove the term "prospectively validated" from the manuscript and replace it with "test the algorithm" or validate it in the test set. 2. "Subjects in the model building/training cohort were randomly split into a 2/3 training and 1/3 retrospective prediction group, which was used to train separate models under differing feature sets (see below)." I was confused with the definition of the training and test set. In addition, Table 1 doesn't specify the dates or the number of patients (2/3 and 1/3) that fall into each group. Table 1 says that the training cohort contains 497,470 patients, but Figure 1 says the 497,470 were divided into "observation period" (usually this is the training set) and prediction period (this is NOT usually part of the training set). 
Finally, the term "validation cohort" is a bit confusing too, since "validation or development set" is commonly used for hyperparameter tuning. As stated above, I would change it to training and test. If validation cohort is preferred, please state clearly, with dates and number of patients, what sets you used for training, test and external testing or "validation". But I don't think it is either external (since the data comes from the same system) or prospective (as it is not collected prospectively). Feature selection and preprocessing: 1. Did you convert the ICD9 to ICD10 codes? You talked about using 5 or 3 digits with the ICD10. How do you deal with the ICD9 codes? 2. "Finally, we performed a univariate filtering step to eliminate features associated with the outcome with a chi-squared test p-value >0.2." You could use XGBoost for both purposes, 1) feature/dimensionality reduction (https://machinelearningmastery.com/feature-importance-and-feature-selection-with-xgboost-in-python/) and 2) Final model. It would be interesting to see which features the XGBoost presents as the most relevant. 3. Did you not include patient characteristics such as gender? I think males tend to have higher incidence of HF. 4. "Medications were mapped to medication class using the Established Pharmacologic Class coding system". Were medications treated as binary (1 = medication was taken, 0 = not taken), or did you use the amount of units, tablets,...? In supplementary Table 3, the medications contain this text: "Patient had *** medications", which makes me think you consider a continuous number. Please clarify. 5. Exactly the same for lab tests, for example "Patient had *** abnormal laboratory tests (INR in Blood by Coagulation assay) in the last 12 months". Maybe a binary flag will be enough. 6. In addition, I think it would help to present this supplementary table 3 by groups: medicines, laboratory tests, demographics, ICD10 Umbrella, CPT4 Code. 
It would be very interesting to see what features are relevant for the algorithm. Outcome definition: 1. "Development of HF was defined as new assignment of an ICD-10 code for HF". Therefore, no ICD9 codes were considered for the definition of HF. Am I right? I assumed all the systems were using ICD10 then. Supplementary table 1 includes some ICD9 codes, which was confusing. 2. As stated above, please clarify the training and test sets. Model construction: 1. Hyper-parameter tuning: I haven't seen any reference to the hyper-parameters of the model. As per the API, there are many hyper-parameters that can be tuned: https://xgboost.readthedocs.io/en/latest/parameter.html. That is, a "development" set (in addition to the training and test set) should be put aside to tune these hyper-parameters. Alternatively, cross validation could be used in the training set using techniques such as grid search or random search to tune those hyperparameters. a. Why did you decide to change the max_depth from the default value (6) to 5? b. Why didn't you tune any of the other hyper-parameters or a subset of them? This could change the performance considerably. For example, learning rate, number of estimators, type of regularization (L1 or L2), ... 2. Reproduction of the results: Which library, version and software did you use? Which type of machine? Results: 1. In the methods you wrote: "Finally, we performed a univariate filtering step to eliminate features associated with the outcome with a chi-squared test p-value >0.2." but in the results, it seems that the features were chosen by the XGBoost algorithm: "Of the 43,906 possible data features before feature reduction techniques were performed, the boost algorithm selected 339 for inclusion in the final model (Supplemental Table 3)." Please clarify. 2. 
I didn't understand this sentence: "The model also selected as weak classifiers features such as undergoing eye surgery, laxative use, abnormal iron levels, and vitamin D use, to name a few." Figure 2: The confusion matrix was confusing. Please use standard names: Predicted versus True. Reviewer #4: The authors report the development of a risk prediction tool for development of HF in adults using a large state-wide CCHIE from the state of Maine. In terms of data, the authors collect enough data for modeling and analysis; in terms of algorithm, the authors use the classical machine learning algorithm XGBoost to model the probability of incident HF in year two from data collected in year one, and then prospectively validated in year three. Here are some questions that may need to be explained: 1. Throughout the data set, the positive class (disease +) is much smaller than the negative class (disease -), only about 1%; the data are highly imbalanced. For this kind of binary classification problem with data imbalance, we should first down-sample the (disease -) data to keep the two classes as balanced as possible before modeling. Because of the data imbalance, the Sensitivity and PPV shown in the confusion matrix of Figure 2 are not high enough. How do the authors think about this problem? 2. On page 6, starting from line 89, the authors use the XGBoost algorithm to build two models, but the reason for choosing this algorithm needs to be supplemented: why were no other algorithms considered, such as SVM or a fully connected neural network? 3. Among the excluded patients, only the diagnosis or data of the previous year were excluded. If the patients did not come to see a doctor because of HF in the previous year, but had a previous history of HF (such as diagnosed earlier), how are they excluded? Will they be mistakenly included? 
4. The Table 1 baseline data are too simple; the baseline data should be re-compared between the training group and the validation group, and between the heart failure group and the non-heart failure group, in combination with the most important features found in Table 2. Reviewer #5: The manuscript describes an original application of machine learning for new-onset heart failure prediction in a 1-year timeframe on a large cohort of subjects. Based on the manuscript in its current form, I do not have sufficient elements to know whether statistical analysis (especially machine learning) has been performed appropriately. The level of detail in which the methods and experimental procedures are described should be increased. Most importantly, please add more details related to the machine learning experiments. For example: - Is data reduction applied before the classification? If so, was it done only on the training set to avoid information leakage? - What kind of cross-validation was used for model development? - How was the final model derived? - How were the XGBoost parameters chosen? - The data set classes appear to be highly imbalanced: was this taken into account in model development? Please provide further details about the "data reduction techniques to reduce dimensionality" mentioned in the Methods section. The initial number of features is reported only in the Results section: I suggest mentioning it also in the Methods section, "Feature selection and preprocessing", together with the number of features resulting after dimensionality reduction and the univariate filtering step. Please specify what criterion was followed to aggregate lab data into abnormal/normal. When the AUC values are first reported, e.g. "0.797 [0.790-0.803]" etc., the "95% CI" statement inside the brackets is missing. Moreover, are the results reported as mean or median plus CI? 
Minors:
- Please check the manuscript for proper spacing between words and references, spelling ("others models", "this algorithm could integrate into the HIE", etc.), and punctuation.
**********
7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #2: No
Reviewer #3: Yes: Oscar Perez-Concha
Reviewer #4: No
Reviewer #5: No
[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]
While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.
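Reviewer #4's down-sampling suggestion above is model-agnostic and can be sketched in a few lines. The following is a minimal illustration in Python with NumPy, not taken from the manuscript; the function and variable names are hypothetical:

```python
import numpy as np

def downsample_majority(X, y, ratio=1.0, seed=0):
    """Randomly drop negative (majority) examples so that
    n_negative == ratio * n_positive, then shuffle the result."""
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    keep = rng.choice(neg, size=min(len(neg), int(ratio * len(pos))), replace=False)
    order = rng.permutation(np.concatenate([pos, keep]))
    return X[order], y[order]

# Synthetic data at roughly 1% prevalence, as in the cohort discussed above
rng = np.random.default_rng(42)
X = rng.random((10_000, 5))
y = (rng.random(10_000) < 0.01).astype(int)

X_bal, y_bal = downsample_majority(X, y)
print(y.mean(), y_bal.mean())  # prevalence before vs. after (after is exactly 0.5)
```

Down-sampling balances the classes at the cost of discarding most of the negative examples; class weighting (e.g., XGBoost's scale_pos_weight) keeps all of the data instead, which appears to be the route the authors took.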
|
| Revision 2 |
|
Identification of patients at risk of new onset heart failure: utilizing a large statewide health information exchange to train and validate a risk prediction model
PONE-D-21-10920R2
Dear Dr. Ling,
We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice for payment will follow shortly after the formal acceptance.
To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.
If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.
Kind regards,
Dylan A Mordaunt, MB ChB, FRACP, FAIDH
Academic Editor
PLOS ONE
Additional Editor Comments (optional):
Thank you for your resubmission. As you will have seen, we have had some changes in reviewers, which often produces variability. I will address these as follows. The reviewers have provided some valuable feedback, and although my decision is to accept (which I detail below), I would suggest considering whether to include these points in your final submission.
Dr Perez-Concha has given a detailed discussion, and it's a shame we don't have an editorial format that would enable Dr Perez-Concha to expand on these points, as I think they're valuable but shouldn't prevent publication under the PLOS ONE format. With specific reference to PLOS ONE's criteria for publication (https://journals.plos.org/plosone/s/criteria-for-publication):
1. The study appears to present the results of original research.
2. Results appear not to have been published elsewhere.
3. Experiments, statistics, and other analyses are performed to a high technical standard and are described in sufficient detail. There are some additional comments from reviewers that do not represent critical flaws and are perhaps something to be addressed in post-publication review.
4. Conclusions are presented in an appropriate fashion and are supported by the data.
5. The article is presented in an intelligible fashion and is written in standard English.
6. The research meets all applicable standards for the ethics of experimentation and research integrity.
7. The article adheres to appropriate reporting guidelines and community standards for data availability.
Reviewers' comments:
Reviewer's Responses to Questions
Comments to the Author
1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.
Reviewer #2: (No Response)
Reviewer #3: (No Response)
Reviewer #4: (No Response)
Reviewer #5: All comments have been addressed
**********
2. Is the manuscript technically sound, and do the data support the conclusions?
The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions.
Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.
Reviewer #2: No
Reviewer #3: Partly
Reviewer #4: Partly
Reviewer #5: Yes
**********
3. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #2: I Don't Know
Reviewer #3: I Don't Know
Reviewer #4: I Don't Know
Reviewer #5: Yes
**********
4. Have the authors made all data underlying the findings in their manuscript fully available?
The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.
Reviewer #2: No
Reviewer #3: Yes
Reviewer #4: Yes
Reviewer #5: No
**********
5. Is the manuscript presented in an intelligible fashion and written in standard English?
PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.
Reviewer #2: Yes
Reviewer #3: Yes
Reviewer #4: Yes
Reviewer #5: Yes
**********
6. Review Comments to the Author
Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics.
(Please upload your review as an attachment if it exceeds 20,000 characters)
Reviewer #2: Some of my previous comments have been addressed. However, there are still some concerns, as follows:
1. I previously asked for a literature review in the Introduction covering previous works that focused on the same problem, not the related works in the Methods.
2. The authors should compare their performance results to previous studies on the same dataset. ==> If the authors aim to use their data and convince readers that their methods are good, they should replicate the other methods on their data to prove it. Currently there are some related works focusing on this prediction model, so they must try and compare.
3. The authors should propose more feature selection techniques to find the optimal ones.
Reviewer #3: Thank you for the opportunity to review this paper again. Thank you for having addressed my previous suggestions.
Abstract
• Conclusions: Instead of "passively" I would use the word "routinely".
Methods
• Lines 74-77: “Laboratory data were provided from the HIE as “abnormal” and “normal” binary categories due data interoperability challenges requiring raw test values to be converted to binary abnormal/normal categorical variables via comparing test result value against the corresponding care providers’ test normal reference range”. Questions:
a. Does this sentence mean that you do not know the criteria which were followed to aggregate lab data into abnormal/normal?
b. Did all the health providers (hospitals, outpatient clinics, …) follow the same criteria to convert numbers to normal and abnormal categories?
• The section Exploratory Model Analyses should be included in the section Model construction and tuning, as both sections deal with hyperparameters. Please create a table or summarize all the hyperparameters that you have tuned, instead of leaving this information scattered across the paper.
Some of the steps that you followed were not clear to me, and that's why I answered "I don't know" to question 3, "Has the statistical analysis been performed appropriately and rigorously?"
• It would be very useful to include a graph or plot with the exact definitions of the “discovery” and “validation” cohorts.
• Line 143: “the positive class, grid search a range of different class weightings (90, 95, 100, 110, 150)”. What was the value of the weight for the negative class? A value of 1? What is the meaning of a weight of 150?
Results
• Table 1. Reviewer 4, comment 4 suggested: “The table 1 baseline data are too simple, so we should compare the baseline data between the training group and the validation group, and whether there is any difference between the heart failure group and the non-heart failure group combined with the most important feature found by table 2”. You said you addressed this, but I don’t see that you listed the important features of Table 2 within Table 1. I agree; I would add more features to the table and percentages to the numbers. You could add the top-ranked 25 known features associated with HF.
I think the main result of this study should be how difficult it is to predict HF even with big amounts of data. I don’t think the main finding or result is a predictive algorithm per se. I would wonder how many clinicians would trust it. The sensitivity of 29.2 is a very low value. I don’t think AUC or specificity are very informative in this case, as the problem is highly imbalanced. I would suggest that the authors frame the question in terms of the methodology to predict HF, what can be done with this method, and what we have to do in the future to predict HF more accurately.
For future work, it might be worth exploring the nature of the question, that is, prediction of HF in the next year. Maybe we need to understand better which predictors/features are going to predict HF more accurately, instead of feeding everything directly into the model.
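Reviewer #3's remark that AUC and specificity are uninformative under heavy imbalance can be made concrete with simple confusion-matrix arithmetic. A worked example in Python with illustrative counts: a 100,000-patient cohort at 1% prevalence, the 29.2% sensitivity mentioned above, and a hypothetical 95% specificity (these are not the paper's Figure 2 numbers):

```python
# Hypothetical counts: 1,000 true cases, 99,000 non-cases
tp, fn = 292, 708        # sensitivity = 29.2%, as quoted above
tn, fp = 94_050, 4_950   # specificity = 95% (assumed for illustration)

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)     # precision: fraction of flagged patients who are true cases

print(f"sensitivity={sensitivity:.3f}  specificity={specificity:.3f}  ppv={ppv:.3f}")
```

Even at 95% specificity, fewer than 6% of flagged patients would be true cases, which is why PPV rather than AUC or specificity is the figure a clinician would weigh in this setting.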
In addition, it could be beneficial to use several models and make a comparison.
Reviewer #4: Whether the model is used for diagnosis or screening, the nature of machine learning is the same. If the data is highly imbalanced, the model will learn more about the characteristics of the "majority" samples and ignore the "minority" samples. So I suggest adding an experiment that samples the "majority" class so that the data are balanced and the model is built again; the PPV and sensitivity of the new model will be more valuable.
Reviewer #5: I would like to thank the authors for replying to my previous comments. I suggest rephrasing the new paragraph “Laboratory data were provided from the HIE as “abnormal” and “normal” binary categories due data interoperability challenges requiring raw test values to be converted to binary abnormal/normal categorical variables via comparing test result value against the corresponding care providers’ test normal reference range.”, since it is hard to read. Other newly added parts may also benefit from proofreading. I have no further issues.
**********
7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #2: Yes: Khanh N.Q. Le
Reviewer #3: Yes: Oscar Perez-Concha
Reviewer #4: No
Reviewer #5: No |
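For context on the class weightings Reviewer #3 asks about (Line 143): in XGBoost's scale_pos_weight parameterization, negative examples implicitly keep weight 1, and the commonly recommended heuristic sets the positive-class weight to n_negative / n_positive, which at roughly 1% prevalence lands near 100, consistent with the grid (90, 95, 100, 110, 150) quoted in the review. A sketch of that heuristic in plain Python (the function name is illustrative, not from the manuscript):

```python
def heuristic_pos_weight(y):
    """XGBoost-style scale_pos_weight heuristic: negatives keep
    weight 1; positives are up-weighted by the negative:positive ratio."""
    n_pos = sum(1 for label in y if label == 1)
    n_neg = len(y) - n_pos
    return n_neg / n_pos

y = [1] * 100 + [0] * 9_900   # ~1% prevalence
print(heuristic_pos_weight(y))  # 99.0
```

Grid-searching values around this ratio, as the authors report doing, is a standard way to tune the trade-off between sensitivity and PPV without discarding any negative examples.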
| Formally Accepted |
|
PONE-D-21-10920R2
Identification of patients at risk of new onset heart failure: utilizing a large statewide health information exchange to train and validate a risk prediction model
Dear Dr. Ling:
I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.
If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.
If we can help with anything else, please email us at plosone@plos.org.
Thank you for submitting your work to PLOS ONE and supporting open access.
Kind regards,
PLOS ONE Editorial Office Staff
on behalf of Dr. Dylan A Mordaunt
Academic Editor
PLOS ONE |
Open letter on the publication of peer review reports
PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.
We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.
Learn more at ASAPbio.