Peer Review History

Original Submission: December 13, 2021
Decision Letter - Thomas Leitner, Editor, Joel O. Wertheim, Editor

Dear Mrs. Andresen,

Thank you very much for submitting your manuscript "Unsupervised machine learning predicts future sexual behaviour and sexually transmitted infections among HIV-positive men who have sex with men" for consideration at PLOS Computational Biology.

As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

Both Reviewers saw this investigation as worthwhile. The major outstanding issue with this manuscript is whether "training" and "validation" datasets were employed (per Reviewer 1). Upon re-reading this manuscript, I agree that this is not clear. A charitable reading would suggest that "up to a certain cut-off date, and apply regression to test" indicates this approach; however, if this is the case, its presentation is too informal. I recommend the authors clarify their approach and perform a more robust assessment of their model, as re-review will be necessary before any formal consideration of this manuscript.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Joel O. Wertheim

Associate Editor

PLOS Computational Biology

Thomas Leitner

Deputy Editor

PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The manuscript entitled “Unsupervised machine learning predicts future sexual behaviour and sexually transmitted infections among HIV-positive men who have sex with men” used statistical approaches to assess whether clusters (developed using a hierarchical clustering technique) enhance predictions of sexual behaviour or sexually transmitted diseases (STIs).

Overall, I thought the paper read nicely and addressed an important policy question. However, I thought there were some methodological gaps. Below are the gaps that I think should be addressed. Thank you for letting me read your manuscript!

Major comments

1. The authors used likelihood ratio test (LRT), AIC, BIC, and auROC to assess the predictive performance of the clusters. To my knowledge, LRT, AIC, and BIC are typically used for model selection and not prediction. While these three metrics showed that a model including the cluster variables improves model fit, these metrics do not assess prediction. Therefore, based on my understanding, statements such as those on lines 174 (“…improved the model fit for predicting”) and 180 (“…improved model performance”) are not accurate as LRT does not assess prediction.

2. The manuscript did not discuss the use of training and validation datasets. As written, it seems that the entire dataset was used to assess prediction. Without training/validation datasets, the estimated predictive performance is typically too optimistic. Creating training/validation datasets seems to be the standard approach for prediction, so it would be nice to understand why that was not done.

3. The paper makes the claim that the use of the clusters improves prediction. However, it would be nice to see a more robust model selection framework, i.e., including the previous two nsCAI values as variables (instead of just the previous one) or an “ever nsCAI” variable, as well as an investigation of the functional form of age.

4. The auROC metric (which was the one metric in the paper that I typically have seen used to assess prediction) did not seem very different with and without the clusters (as seen in Table 2). This left me wondering if the unsupervised machine learning clusters really did increase prediction. Especially once training and validation datasets are created and a more robust model selection approach is taken.
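The distinction drawn in the major comments above, between in-sample fit criteria (AIC, BIC) and out-of-sample discrimination (auROC on data after a cut-off date), can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the toy records, and the cut-off date are all hypothetical, and the "score" values merely stand in for the predictions of a model that would be fitted on the pre-cut-off records only.

```python
import math

def aic(log_lik, k):
    """Akaike information criterion: 2k - 2*logL (lower is better)."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian information criterion: k*ln(n) - 2*logL (lower is better)."""
    return k * math.log(n) - 2 * log_lik

def auroc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("auROC needs at least one positive and one negative")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def temporal_split(records, cutoff):
    """Split records at a cut-off date (ISO date strings sort chronologically)."""
    train = [r for r in records if r["date"] <= cutoff]
    test = [r for r in records if r["date"] > cutoff]
    return train, test

# Toy follow-up records (entirely made up): visit date, predicted risk, outcome.
records = [
    {"date": "2016-03-01", "score": 0.10, "outcome": 0},
    {"date": "2016-09-01", "score": 0.35, "outcome": 1},
    {"date": "2017-03-01", "score": 0.40, "outcome": 0},
    {"date": "2017-09-01", "score": 0.80, "outcome": 1},
    {"date": "2018-03-01", "score": 0.20, "outcome": 0},
    {"date": "2018-09-01", "score": 0.70, "outcome": 1},
]

# Hypothetical cut-off: earlier visits form the training set, later ones the test set.
train, test = temporal_split(records, cutoff="2017-12-31")

# Discrimination is assessed only on records the model has not seen.
held_out_auc = auroc([r["outcome"] for r in test], [r["score"] for r in test])
print(len(train), len(test), held_out_auc)
```

AIC and BIC are computed from the likelihood of the data used to fit the model, so they compare the fit of competing models, whereas the held-out auROC measures how well predictions rank outcomes that played no part in fitting.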

Minor comments

1. Not sure if “corrected our models” is the right term; maybe use control or adjust (see line 108 for an example).

2. I was not clear on lines 281-284 regarding the sexual contact network. It would be helpful to include more details on the connection between the analysis and contact networks.

Reviewer #2: Thank you for giving me the opportunity to review this manuscript. I thought it was well written and the subject is interesting. Although machine learning is not my area of expertise, I do have some minor points that will hopefully help to further improve the manuscript.

Abstract

- 2nd paragraph: I find “up to a certain cut-off point” rather vague. I would suggest just to provide the cut-off date here.

- I am not sure what the authors mean by the last paragraph of the abstract, and I do not think this is elaborated on in the Discussion section of the main paper. By “framework”, do you mean the clusters? And how can this framework be used as an alternate method for categorization (this is also mentioned in the Conclusion of the main paper), and how can it contribute to a better understanding of time-varying risk factors?

Methods:

- Line 59: how did you define a non-steady partner? Or a steady partner?

- Line 65: suggestion to refrain from using abbreviations that are not commonly used such as “nsCAI” and “nsP” if the word count allows it. This would increase the readability of the manuscript.

Results:

- Figure 1: the panels for steps 1 and 2 are rather small and the text is very hard to read. Would it be possible to increase the size?

- Why was only syphilis routinely tested and not chlamydia and gonorrhea?

- Table 1: could you explain what is considered “mandatory schooling” in Switzerland? Is that similar to primary school and high school for example? And what school level corresponds to “finished apprenticeship?”

- Figure 2: I am not sure if the “total” line is necessary in this figure

Discussion:

- Line 284: do you mean the sensitivity analysis last mentioned in the Results section here? If so, I would clarify this.

- Line 292: Indeed, asymptomatic STIs may likely go unnoticed and thereby impact your outcome if it is based on self-report. How do you think this would impact your results if, instead of self-report, the outcome was based on actual STI testing done at follow-up visits?

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No: see reasons in manuscript

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Revision 1

Attachments
Attachment
Submitted filename: 20220620_ML_STIs_Reviewer_Reply.pdf
Decision Letter - Thomas Leitner, Editor, Joel O. Wertheim, Editor

Dear Mrs. Andresen,

Thank you very much for submitting your manuscript "Unsupervised machine learning predicts future sexual behaviour and sexually transmitted infections among HIV-positive men who have sex with men" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board. The Editor appreciated the attention to an important topic, and we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the below recommendations.

Having reviewed the response to reviewers and the revised manuscript, I believe the authors have appropriately responded to the reviewers' comments and criticism. However, these responses are not yet sufficiently reflected in the main text of the manuscript. Rather, many legitimate points of confusion and ambiguity were addressed only in the Response to Reviewers document, and not within the text to be published itself. This approach leaves open the strong possibility that readers of this paper will be as confused as the reviewers. Therefore, I encourage the authors to revise their manuscript again to ensure the accessibility of this manuscript. Please return to the original reviewers' comments and go through them point by point to ensure all important points are addressed in the manuscript itself. Further, I would make it clear why no validation set was included, as readers will themselves have the same question, and your response will improve the impact of this study.

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Joel O. Wertheim

Associate Editor

PLOS Computational Biology

Thomas Leitner

Deputy Editor

PLOS Computational Biology

***********************

References:

Review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript.

If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Revision 2

Attachments
Attachment
Submitted filename: 20220815_ML_STIs_Reviewer_Reply_Revised.pdf
Decision Letter - Thomas Leitner, Editor, Joel O. Wertheim, Editor

Dear Mrs. Andresen,

We are pleased to inform you that your manuscript 'Unsupervised machine learning predicts future sexual behaviour and sexually transmitted infections among HIV-positive men who have sex with men' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Joel O. Wertheim

Academic Editor

PLOS Computational Biology

Thomas Leitner

Section Editor

PLOS Computational Biology

***********************************************************

Formally Accepted
Acceptance Letter - Thomas Leitner, Editor, Joel O. Wertheim, Editor

PCOMPBIOL-D-21-02210R2

Unsupervised machine learning predicts future sexual behaviour and sexually transmitted infections among HIV-positive men who have sex with men

Dear Dr Andresen,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Zsofia Freund

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.