Peer Review History
Original Submission: April 14, 2020
Dear Dr Watson, Thank you very much for submitting your Research Article entitled 'A cautionary note on the use of machine learning algorithms to characterise malaria parasite population structure from genetic distance matrices' to PLOS Genetics. Your manuscript was fully evaluated at the editorial level and by independent peer reviewers. The reviewers appreciated the attention to an important problem, but raised some substantial concerns about the current manuscript. Based on the reviews, we will not be able to accept this version of the manuscript, but we would be willing to review again a much-revised version. We cannot, of course, promise publication at that time. Should you decide to revise the manuscript for further consideration here, your revisions should address the specific points made by each reviewer. We will also require a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. If you decide to revise the manuscript for further consideration at PLOS Genetics, please aim to resubmit within the next 60 days, unless it will take extra time to address the concerns of the reviewers, in which case we would appreciate an expected resubmission date by email to plosgenetics@plos.org. If present, accompanying reviewer attachments are included with this email; please notify the journal office if any appear to be missing. They will also be available for download from the link below. You can use this link to log into the system when you are ready to submit a revised version, having first consulted our Submission Checklist. To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see our guidelines. 
Please be aware that our data availability policy requires that all numerical data underlying graphs or summary statistics are included with the submission, and you will need to provide this upon resubmission if not already present. In addition, we do not permit the inclusion of phrases such as "data not shown" or "unpublished results" in manuscripts. All points should be backed up by data provided with the submission. While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. PLOS has incorporated Similarity Check, powered by iThenticate, into its journal-wide submission system in order to screen submitted content for originality before publication. Each PLOS journal undertakes screening on a proportion of submitted articles. You will be contacted if needed following the screening process. To resubmit, use the link below and 'Revise Submission' in the 'Submissions Needing Revision' folder. [LINK] We are sorry that we cannot be more positive about your manuscript at this stage. Please do not hesitate to contact us if you have any concerns or questions. Yours sincerely, Giorgio Sirugo Associate Editor PLOS Genetics Scott Williams Section Editor: Natural Variation PLOS Genetics Reviewer's Responses to Questions Comments to the Authors: Please note here if the review is uploaded as an attachment. Reviewer #1: review uploaded Reviewer #2: I am aware of researchers who have spent a lot of time trying to (unsuccessfully) reproduce Plasmodium falciparum and P. 
vivax sequence-based population structure analyses and conclusions, including some of those presented within the publications cited in the manuscript. The underlying difficulties in some published work seem to be multi-factorial, including methodologies applied being opaque or non-robust, data being selectively removed, collapsed or not made available (e.g. Indonesian sequence data) and, as per the manuscript, some over-interpretation of results and unawareness of assumptions. Practices are changing and methodologies improving, including with some uptake in the use of IBD approaches, and therefore the proposed manuscript is important for future robust population structure analysis; especially as such analyses could be guiding disease control activities. To strengthen the manuscript, the best practice guidelines therein, and potential impact, I have suggested some edits: (1) There is a need for additional references in the context of malaria biology or epidemiology. For example, statements like “..unlike humans, malaria parasites can self (recombination between genetically identical male and female gametes), and the rate of selfing varies with transmission intensity.” (line 49) could be supported by a citation. In addition, there is a need for a few sentences discussing different types of machine learning (ML) methods. It seems the manuscript has a narrow scope in the ML approaches applied, so some discussion of other approaches that can be used (e.g. random forests, support vector machines, etc. for classification) and examples will be useful for the reader. Perhaps these approaches could be made more explicit in Figure 1. (2) One of the issues with analyses of Plasmodium sequencing direct from clinical samples is the multiplicity of infection and potential co-infections (e.g. P. falciparum and P. malariae co-infections). For the former, the Fws and estMOI software have been applied to triage out samples with >1 clones. 
It may be worth stating in greater detail, in the discussion or methods sections (where Fws is mentioned), how MOI and co-infections can affect genetic clustering and IBD analyses. (3) The starting point for the analyses in the manuscript is a set of SNPs and samples, which have gone through a bioinformatic pipeline. In general, it is extremely difficult to generate exactly the same dataset even with a detailed Methods section. Hence, your call for greater availability of intermediate datasets is a good one. Equally, all underlying raw data needs to be available, and its unavailability is a problem in some P. falciparum and P. vivax genomics studies, where some populations (e.g. Indonesia) appear in print, but their raw data is unavailable. Whilst I do not want the authors to discuss politics, there is a need for a general sentence about the need for complete data availability and citing some examples of P. falciparum and P. vivax studies where only partial raw sequence datasets were made available. (4) Related to (3), there is a clear need for greater transparency in bioinformatic and statistical/analytical pipelines (this could be included in Panel 1). The effects of allele frequency thresholding, SNP detection algorithms, and how missing genotype values are handled, could also impact on population genetic and structure analyses. It would be worth discussing this briefly. (5) As stated, there are at least 3 software tools available for IBD analyses. Whilst the focus of the manuscript is not on inter-population comparisons (Fst analyses are cited for this), there is a need to discuss whether and how IBD could be used for such analysis. It is a natural extension from your work to identify the genetic regions that are driving population structure, and providing a brief discussion of this would lead to greater impact, especially to inform those working on molecular barcodes for geographical and transmission classification. 
More generally, transmission applications could be referenced. (6) The focus of the manuscript is mainly on SNP analysis, but there are also indel variants, and Figure 1 also mentions microsatellites. Should we include indels to improve population structure resolution and inference? How could indels be incorporated into the ML approaches? Also, a quick PubMed search revealed many papers performing population STRUCTURE analyses using microsatellites where there is an over-interpretation of “K” (as highlighted, lines 476-). Given the large number of studies, it would make sense to include a very recent example to highlight your point and demonstrate that it is a contemporary issue (the most recent one I found from a PubMed search was PMID: 32379762, but feel free to identify an alternative appropriate example). (7) The dataset used for the analysis is quite small (N=393; covering years 2011-13), especially compared to the much larger datasets currently available. Whilst I am not suggesting a revised analysis with the Pf3K data from the same locations post-2013, it would be useful to discuss the impact and use of additional data (e.g. for machine "learning"), and any new insights in light of subsequent new control measures. (8) Please check that gene names are italicised. Reviewer #3: This is an interesting and thoughtful paper highlighting the lack of critical thinking around the application of ML algorithms in analyses of malaria parasite genetic data. Comments: I struggled to get a sense of the questions that the authors had in mind in general. In the introduction and discussion there are isolated examples of the questions addressed by the methods. 
For example, p1 “an important goal in malaria parasite genetic epidemiology is inference of the full ancestral recombination graph” p2 “Many questions of clinical and public health relevance, for example, interpreting reduced haplotype diversity as a selective sweep”, p2 “characterizing population structure”, p13 “construct discrete clusters of samples” and p13 “what is the most likely sequence of events that led to the spread of a single multi-drug resistant P falciparum lineage across Southeast Asia”. There seems to be a swing from hugely broad to very specific questions – I can see why such a problem exists since there are many possible questions. It might be helpful to give some boundaries to the scope of the questions, or to categorize the types of questions, describe the categories and give specific examples within a category. There is a long paragraph in the introduction on the epidemic of multi-drug resistant parasites in the GMS. This was interesting, but I felt I had wandered into another paper. It is told from the point of view of someone interested in drug resistance, but for this paper it might be more relevant to focus on the different methods and the different conclusions reached, or to introduce it as a motivating example in a Panel. The rationale for the choice of example is not explained. It is interesting, has available data and gives different results by method - but perhaps it is a particular case. Since the paper is talking in general terms for a wider scope of questions, it may be interesting to consider if there is a set of conditions under which the results for the three methods would be the same. Chromosome painting is mentioned several times. It would be helpful to briefly describe how this works. There did not seem to be any mention of statistical models to infer networks of infection. Are these outside the scope of the questions considered? p5 Genetic distance measure 3. 
It would be helpful to mention how this measure behaves when there is recombination. NP - is this written out somewhere? (p12) ********** Have all data underlying the figures and results presented in the manuscript been provided? Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information. Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: Yes ********** PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: No Reviewer #3: No
Revision 1
Dear Dr Watson, We are pleased to inform you that your manuscript entitled "A cautionary note on the use of unsupervised machine learning algorithms to characterise malaria parasite population structure from genetic distance matrices" has been editorially accepted for publication in PLOS Genetics. Congratulations! Before your submission can be formally accepted and sent to production you will need to complete our formatting changes, which you will receive in a follow up email. Please be aware that it may take several days for you to receive this email; during this time no action is required by you. Please note: the accept date on your published article will reflect the date of this provisional accept, but your manuscript will not be scheduled for publication until the required changes have been made. Once your paper is formally accepted, an uncorrected proof of your manuscript will be published online ahead of the final version, unless you’ve already opted out via the online submission form. If, for any reason, you do not want an earlier version of your manuscript published online or are unsure if you have already indicated as such, please let the journal staff know immediately at plosgenetics@plos.org. In the meantime, please log into Editorial Manager at https://www.editorialmanager.com/pgenetics/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production and billing process. Note that PLOS requires an ORCID iD for all corresponding authors. Therefore, please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field. This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager. 
If you have a press-related query, or would like to know about one way to make your underlying data available (as you will be aware, this is required for publication), please see the end of this email. If your institution or institutions have a press office, please notify them about your upcoming article at this point, to enable them to help maximise its impact. Inform journal staff as soon as possible if you are preparing a press release for your article and need a publication date. Thank you again for supporting open-access publishing; we are looking forward to publishing your work in PLOS Genetics! Yours sincerely, Giorgio Sirugo Associate Editor PLOS Genetics Scott Williams Section Editor: Natural Variation PLOS Genetics Twitter: @PLOSGenetics ---------------------------------------------------- Comments from the reviewers (if applicable): Reviewer's Responses to Questions Comments to the Authors: Please note here if the review is uploaded as an attachment. Reviewer #1: The authors have satisfactorily dwelt on most of the issues raised. Use of panels for further detailing has improved the paper. Reviewer #3: The authors have revised the manuscript well and I have no further comments. Minor comment: The sentence starting L90 is awkwardly worded. ********** Have all data underlying the figures and results presented in the manuscript been provided? Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information. Reviewer #1: Yes Reviewer #3: Yes ********** PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. 
Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: Yes: Alfred Amambua-Ngwa Reviewer #3: No ---------------------------------------------------- Data Deposition If you have submitted a Research Article or Front Matter that has associated data that are not suitable for deposition in a subject-specific public repository (such as GenBank or ArrayExpress), one way to make that data available is to deposit it in the Dryad Digital Repository. As you may recall, we ask all authors to agree to make data available; this is one way to achieve that. A full list of recommended repositories can be found on our website. The following link will take you to the Dryad record for your article, so you won't have to re‐enter its bibliographic information, and can upload your files directly: http://datadryad.org/submit?journalID=pgenetics&manu=PGENETICS-D-20-00579R1 More information about depositing data in Dryad is available at http://www.datadryad.org/depositing. If you experience any difficulties in submitting your data, please contact help@datadryad.org for support. Additionally, please be aware that our data availability policy requires that all numerical data underlying display items are included with the submission, and you will need to provide this before we can formally accept your manuscript, if not already present. ---------------------------------------------------- Press Queries If you or your institution will be preparing press materials for this manuscript, or if you need to know your paper's publication date for media purposes, please inform the journal staff as soon as possible so that your submission can be scheduled accordingly. Your manuscript will remain under a strict press embargo until the publication date and time. This means an early version of your manuscript will not be published ahead of your final version. 
PLOS Genetics may also choose to issue a press release for your article. If there's anything the journal should know or you'd like more information, please get in touch via plosgenetics@plos.org.
Formally Accepted
PGENETICS-D-20-00579R1 A cautionary note on the use of unsupervised machine learning algorithms to characterise malaria parasite population structure from genetic distance matrices Dear Dr Watson, We are pleased to inform you that your manuscript entitled "A cautionary note on the use of unsupervised machine learning algorithms to characterise malaria parasite population structure from genetic distance matrices" has been formally accepted for publication in PLOS Genetics! Your manuscript is now with our production department and you will be notified of the publication date in due course. The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript. Soon after your final files are uploaded, unless you have opted out or your manuscript is a front-matter piece, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers. Thank you again for supporting PLOS Genetics and open-access publishing. We are looking forward to publishing your work! With kind regards, Matt Lyles PLOS Genetics On behalf of: The PLOS Genetics Team Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom plosgenetics@plos.org | +44 (0) 1223-442823 plosgenetics.org | Twitter: @PLOSGenetics
Open letter on the publication of peer review reports
PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.
We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.
Learn more at ASAPbio.