Peer Review History

Original Submission
September 9, 2019
Decision Letter - Mark Alber, Editor, Sushmita Roy, Editor

Dear Dr Isambert,

Thank you very much for submitting your manuscript 'Learning clinical networks from medical records based on information estimates in mixed-type data' for review by PLOS Computational Biology. Your manuscript has been fully evaluated by the PLOS Computational Biology editorial team and in this case also by independent peer reviewers.

The reviewers found the presented work to be timely and interesting. However, they raised some substantial concerns about the manuscript as it currently stands. In particular, the manuscript needs to be substantially revised to improve its clarity and usability and to explain how this work compares with existing work. The manuscript needs to provide a proper Methods section. The code also needs to be made available along with example inputs and outputs.

While your manuscript cannot be accepted in its present form, we are willing to consider a revised version in which the issues raised by the reviewers have been adequately addressed. We cannot, of course, promise publication at that time.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

Your revisions should address the specific points made by each reviewer. Please return the revised version within the next 60 days. If you anticipate any delay in its return, we ask that you let us know the expected resubmission date by email at ploscompbiol@plos.org. Revised manuscripts received beyond 60 days may require evaluation and peer review similar to that applied to newly submitted manuscripts.

In addition, when you are ready to resubmit, please be prepared to provide the following:

(1) A detailed list of your responses to the review comments and the changes you have made in the manuscript. We require a file of this nature before your manuscript is passed back to the editors.

(2) A copy of your manuscript with the changes highlighted (encouraged). We encourage authors, if possible, to show clearly where changes have been made to their manuscript, e.g. by highlighting text.

(3) A striking still image to accompany your article (optional). If the image is judged to be suitable by the editors, it may be featured on our website and might be chosen as the issue image for that month. These square, high-quality images should be accompanied by a short caption. Please note as well that there should be no copyright restrictions on the use of the image, so that it can be published under the Open-Access license and be subject only to appropriate attribution.

Before you resubmit your manuscript, please consult our Submission Checklist to ensure your manuscript is formatted correctly for PLOS Computational Biology: http://www.ploscompbiol.org/static/checklist.action. Some key points to remember are:

- Figures uploaded separately as TIFF or EPS files (if you wish, your figures may remain in your main manuscript file in addition).

- Supporting Information uploaded as separate files, titled Dataset, Figure, Table, Text, Protocol, Audio, or Video.

- Funding information in the 'Financial Disclosure' box in the online system.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see here

We are sorry that we cannot be more positive about your manuscript at this stage, but if you have any concerns or questions, please do not hesitate to contact us.

Sincerely,

Sushmita Roy, Ph.D.

Associate Editor

PLOS Computational Biology

Mark Alber

Deputy Editor

PLOS Computational Biology

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:

[LINK]

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors present a method for computing the mutual information between mixed variables by finding an optimal binning strategy. They demonstrate that the method is competitive with state-of-the-art methods for estimating mutual information between mixed variables and that it has a particular advantage as an independence test. They then apply this mutual information estimator to graphical model structure learning and demonstrate good performance on benchmark data as well as present a case study application to medical data.

The method and application are technically sound and well-presented.

My two main concerns are:

1. In the author summary and introduction, an impression is built up that no methods exist for computing mutual information for mixed variables. The authors are clearly aware of these methods (references 15-17); however, the mention of these methods is pushed down deep into the benchmarking subsection of the Results section. These must be brought to the forefront (be referenced in the introduction) so as not to misrepresent the state of the art.

2. There's no explanation of the principle by which "latent variables" are suggested in the graphical model, i.e. what makes an edge suggest mediation by a latent variable vs a simple correlation/anticorrelation edge. If this is a post-hoc decision in light of expert knowledge the text needs to be explicit about that.
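
For readers unfamiliar with the idea summarized in this report (estimating mutual information between a continuous and a categorical variable by binning, then using it as an independence test), the following is a minimal sketch. It is not the authors' method and does not use the miic package: the bins are fixed equal-frequency bins rather than optimized ones, the estimator is the plain plug-in estimator, and the function names (equal_freq_bins, plugin_mi, permutation_pvalue) are hypothetical names chosen for this illustration.

```python
import math
import random
from collections import Counter


def equal_freq_bins(values, n_bins):
    """Assign each continuous value to one of n_bins equal-frequency bins.

    Assumption for this sketch: a fixed number of bins, not an optimized binning.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def plugin_mi(xs, ys):
    """Plug-in (maximum-likelihood) mutual information, in nats, of two discrete sequences."""
    n = len(xs)
    cx, cy, cxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * math.log(c * n / (cx[x] * cy[y])) for (x, y), c in cxy.items())


def permutation_pvalue(xs, ys, n_perm=200, seed=0):
    """Permutation test of independence: shuffle ys and compare the MI to the observed MI."""
    rng = random.Random(seed)
    observed = plugin_mi(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if plugin_mi(xs, shuffled) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    # Categorical Y in {0, 1}; continuous X whose mean depends on Y.
    ys = [rng.randrange(2) for _ in range(300)]
    xs = [rng.gauss(2.0 * y, 1.0) for y in ys]
    binned = equal_freq_bins(xs, 5)
    print("MI estimate (nats):", plugin_mi(binned, ys))
    print("permutation p-value:", permutation_pvalue(binned, ys))
```

With a fixed bin count, the estimate depends on how many bins are chosen, which is the kind of sensitivity an optimized binning is meant to remove.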

Reviewer #2: Review of the PLOS Computational Biology manuscript #PCOMPBIOL-D-19-01535 "Learning clinical networks from medical records based on information estimates in mixed-type data" by V Cabeli, L Verny, N Sella, G Uguzzoni, M Verny, H Isambert

Summary:

This paper presents an extension of the MIIC network learning algorithm for mixed-type (i.e. both continuous and categorical) data. This new approach relies on a new estimation procedure for the (conditional) Mutual Information (MI) for such mixed-type data, also introduced in this manuscript. After introducing the need for and relevance of such methods, especially in the context of medical records, the authors present new methodological developments for estimating (conditional) MI that are suitable for mixed-type data, and illustrate their good performance on synthetic benchmark data. Then the authors outline their extension of the MIIC algorithm for mixed data, briefly benchmark it, and present an extensive application to medical records of elderly patients with cognitive disorders. Finally, a short discussion quickly highlights the conclusions from that application.

General Comments:

This manuscript presents an interesting and timely new method for estimating networks from mixed-type data such as medical records. While the manuscript is well written, the structure is a bit confusing and impedes both its readability and assessment: first, it lacks a Materials and Methods section, which should contain the methodological developments that are currently presented alongside simulation benchmarks and the application in the Results section; second, the Discussion section should be broader and better acknowledge the assumptions and limitations of the proposed method. Besides, I have questions concerning the guarantees offered by the proposed method and the assumptions required, as those are not clearly outlined in the manuscript. In particular, I wonder how the authors deal with the scaling of the MI and how it impacts edge pruning and filtering in their network inference. My questions to the authors are detailed below.

Major issues:

1. The MI is an unbounded positive quantity; therefore, one of the difficulties of using MI for inferring networks from mixed-type data is that the scale of the MI usually varies depending on the variable type (binary, categorical, continuous…). This aspect should be discussed in the manuscript. In particular, the MI for categorical variables tends to increase with the number of categories. How does the proposed method deal with this when i) pruning (and filtering) the edges of the inferred network? ii) representing the association strength, such as in Figure 4?

2. The manuscript lacks a Methods section. New methodological developments should be in a specific Methods section, with a first subsection presenting the new approach for approximating partial MI in mixed data and a second one presenting the extension of the MIIC algorithm.

3. The Discussion section should discuss the whole scope of the manuscript, including assumptions and limitations of the proposed approach for learning networks from mixed data, as well as the synthetic benchmark results and the application.

4. Page 4 line 82-33, the authors seem to make an assumption on the partitioning cut-points that should be clarified, especially if it is required for their approximation to be accurate.

5. The authors should detail a bit more how they derived equation 7 or provide a reference.

6. It is unclear whether there are guarantees for the convergence of the proposed optimization procedure presented at the bottom of page 4, or if this is more of a heuristic procedure that works in practice.

7. The authors should describe what are X and Y represented on Figure 2 and how they are generated in the synthetic benchmark (this is somewhat explained in the SI but should be mentioned and clarified in the main manuscript).

8. On page 6, the authors allude to the capacity of their approach to identify (conditional) independence. Could they clarify how they characterize independence from (part) MI? In my experience this can be difficult in practice, even with resampling procedures.

9. I commend the authors for making software available for their method in the form of the R package miic. However, I was unable to find (and so test) the mentioned discretizeMutual function either in the CRAN version of the package or on GitHub. The authors should provide a URL for the code of the proposed approach.
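
The scaling concern in Major issue 1 can be illustrated with a small simulation. For discrete variables, MI is at most the smaller of the two marginal entropies, so its attainable range grows with the number of categories; in addition, the plug-in estimate computed on independent samples has a positive bias that grows with the number of categories. The sketch below is an illustration only, under stated assumptions: it uses the plain plug-in estimator rather than the estimator of the manuscript, the normalization by the smaller entropy is one common convention rather than anything the authors are reported to use, and all function names are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(xs):
    """Plug-in Shannon entropy, in nats, of a discrete sequence."""
    n = len(xs)
    return -sum(c / n * math.log(c / n) for c in Counter(xs).values())


def plugin_mi(xs, ys):
    """Plug-in mutual information, in nats, of two discrete sequences."""
    n = len(xs)
    cx, cy, cxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * math.log(c * n / (cx[x] * cy[y])) for (x, y), c in cxy.items())


def normalized_mi(xs, ys):
    """MI divided by min(H(X), H(Y)); one possible way to put variables on a common scale."""
    h = min(entropy(xs), entropy(ys))
    return plugin_mi(xs, ys) / h if h > 0 else 0.0


def mean_mi_independent(k, n=200, n_rep=50, seed=0):
    """Average plug-in MI between two independent uniform variables with k categories each."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rep):
        xs = [rng.randrange(k) for _ in range(n)]
        ys = [rng.randrange(k) for _ in range(n)]
        total += plugin_mi(xs, ys)
    return total / n_rep


if __name__ == "__main__":
    # The true MI is zero in every case; only the estimation bias changes with k.
    for k in (2, 4, 8):
        print(k, "categories: mean plug-in MI =", mean_mi_independent(k))
```

Because the spurious MI between independent variables grows with the number of categories, a single fixed threshold on raw MI would treat binary and many-category variables differently, which is the point the reviewer asks the authors to address.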

Minor issues:

1. Page 2 line 36, “cause-effet” should appear in English

2. Figures 1 and 3 should have a linetype/color legend and should be readable in black & white

3. Page 6 line 142, the KSG acronym is never defined

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: No: The data generation process for the benchmark data was described, which may serve as a substitute for explicit tables with that data (or the authors can easily generate such tables and provide them as, e.g. csv files).

The authors didn't specify how and whether the clinical data from the case study could be accessed (likely one of the options for sensitive data will apply).

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Revision 1

Attachments
Submitted filename: responses_to_reviewers.pdf
Decision Letter - Mark Alber, Editor, Sushmita Roy, Editor

Dear Dr. Isambert,

Thank you very much for submitting your manuscript "Learning clinical networks from medical records based on information estimates in mixed-type data" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.

In particular, please provide the necessary data files underlying the figures and results in your manuscript.

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. 

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Sushmita Roy, Ph.D.

Associate Editor

PLOS Computational Biology

Mark Alber

Deputy Editor

PLOS Computational Biology

***********************

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:

[LINK]

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: This is a review of the revised version of PCOMPBIOL-D-19-01535R1 'Learning clinical networks from medical records based on information estimates in mixed-type data'

The authors present a method for computing the mutual information between mixed variables by finding an optimal binning strategy. They demonstrate that the method is competitive with state-of-the-art methods for estimating mutual information between mixed variables and that it has a particular advantage as an independence test. They then apply this mutual information estimator to graphical model structure learning and demonstrate good performance on benchmark data as well as present a case study application to medical data.

The method and application are technically sound and well-presented.

The authors' revisions have fully addressed all the concerns I had in my prior review.

I recommend publication as dissemination of this method will be of value to the research community.

Reviewer #2: The authors have adequately addressed all my comments. I recommend that the authors pay extra attention to providing **all data underlying the figures and results presented in the manuscript** in their final submission (especially regarding Fig 2 and 4).

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Yuriy Sverchkov

Reviewer #2: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, PLOS recommends that you deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see http://journals.plos.org/ploscompbiol/s/submission-guidelines#loc-materials-and-methods

Revision 2

Attachments
Submitted filename: responses_to_reviewers2.pdf
Decision Letter - Mark Alber, Editor, Sushmita Roy, Editor

Dear Dr. Isambert,

We are pleased to inform you that your manuscript 'Learning clinical networks from medical records based on information estimates in mixed-type data' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Sushmita Roy, Ph.D.

Associate Editor

PLOS Computational Biology

Mark Alber

Deputy Editor

PLOS Computational Biology

***********************************************************

Formally Accepted
Acceptance Letter - Mark Alber, Editor, Sushmita Roy, Editor

PCOMPBIOL-D-19-01535R2

Learning clinical networks from medical records based on information estimates in mixed-type data

Dear Dr Isambert,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Laura Mallard

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.