Backward compatibility of whole genome sequencing data with MLVA typing using a new MLVAtype shiny application for Vibrio cholerae

Jérôme Ambroise; Léonid M. Irenge; Jean-François Durant; Bertrand Bearzatto; Godfrey Bwire; O. Colin Stine; Jean-Luc Gala

doi:10.1371/journal.pone.0225848

Peer Review History

Original Submission: July 24, 2019
12 Aug 2019 Decision Letter - Axel Cloeckaert, Editor PONE-D-19-20834 Backward compatibility of whole genome sequencing data with MLVA typing using a new MLVAtype shiny application: the example of Vibrio cholerae PLOS ONE Dear Dr Ambroise, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please consider the comments of the reviewer to improve the manuscript. We would appreciate receiving your revised manuscript by Sep 26 2019 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols Please include the following items when submitting your revised manuscript: A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'. 
Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. We look forward to receiving your revised manuscript. Kind regards, Axel Cloeckaert Academic Editor PLOS ONE Journal Requirements: 1. When submitting your revision, we need you to address these additional requirements. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf 2. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information. 3. In your Methods section, please clarify whether the isolates were obtained from a collection, a company, or from another third party source 4. We note that you have included the phrase “data not shown” in your manuscript. Unfortunately, this does not meet our data sharing requirements. PLOS does not permit references to inaccessible data. We require that authors provide all relevant data within the paper, Supporting Information files, or in an acceptable, public repository. Please add a citation to support this phrase or upload the data that corresponds with these findings to a stable repository (such as Figshare or Dryad) and provide and URLs, DOIs, or accession numbers that may be used to access these data. 
Or, if the data are not a core part of the research being presented in your study, we ask that you remove the phrase that refers to these data. Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: No ******** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: No ****** 3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data, e.g. participant privacy or use of data from a third party, those must be specified. Reviewer #1: No ****** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes ****** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. 
You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)
Reviewer #1: Ambroise et al. describe software (MLVAtype) that deduces MLVA genotypes from assembled Whole Genome Sequence (WGS) data. The input is a draft assembly, the size of the k-mer used for the assembly and the sequence of the repeated motif. MLVAtype will search for the largest occurrence of repeated motifs. Then MLVAtype will compare the length of this largest occurrence with the k-mer value used for the assembly and will consider it a valid VNTR allele if it is smaller. To evaluate MLVAtype, the authors used 19 V. cholerae isolates and an MLVA assay comprising five loci with a 6 bp (3 loci), 7 bp or 9 bp repeat unit. They sequenced the PCR amplification products by Sanger sequencing to constitute the reference data. They then compared in vitro and WGS-derived MLVA typing data with the reference data. Nine isolates were typed in vitro using the Agilent 2100 Bioanalyzer whereas ten isolates were typed in vitro using the higher resolution capillary electrophoresis system GeneScan. 300 bp long WGS data was produced for the nine isolates versus 150 bp for the ten isolates. The 300 bp reads allow a full and correct MLVA genotype to be deduced, whereas the 150 bp reads provide partial data. 90% of in vitro MLVA profiles are incorrect. There are a number of issues associated with the software design, in vitro typing, and report organization.
-in some species in which MLVA is used, some VNTRs occur in families, i.e. different VNTR loci share the same repeat unit. How will MLVAtype behave in such a case? It seems that all loci sharing an identical repeat motif will be (incorrectly) assigned the largest allele size.
-conversely, tandem repeats are often not perfect. How will MLVAtype behave in such a case?
-what is the rationale for not taking into account contigs smaller than 2000 bp, especially given that in the present situation (Vibrio cholerae VNTRs), the VNTR loci are shorter than 200 bp?
-Page 2, third paragraph: the authors do not mention other published approaches for in silico MLVA typing. For instance, Vergnaud et al. Frontiers Microbiology 2018 previously explored the possibility of deducing MLVA from WGS data, evaluated for Brucella using an in silico PCR approach.
-Page 6, Table 4: please also provide the initial estimates before applying the “right-censorship”. The authors seem to implicitly assume that SPAdes will never correctly reconstruct tandem repeat arrays longer than the k-mer size, but do not provide the data to show that.
-by default, SPAdes will explore multiple k-mer values (rather than a specified value). It would seem useful to provide in Table 4 the results obtained in these conditions.
-if indeed SPAdes will not reconstruct tandem repeats longer than k-mer size, then why try to assemble reads when the read length is longer than available k-mer sizes (instead of recovering the reads of interest using tools such as BBduk)?
-the in vitro MLVA assay does not seem to be correctly working yet, because the Bioanalyzer does not have a sufficient amplicon sizing resolution and/or because the allele calling (conversion from size estimate to repeat copy number) is not optimized, as indeed suggested by the authors in the first paragraph page 8. The error rates reported in Table 3 for the Bioanalyzer (above 50%!) and for GeneScan (7 errors in 10 strains for the second locus) are not acceptable. What is the interest of backward compatibility of WGS with in vitro MLVA typing if the in vitro data is so bad, i.e. if MLVA does not seem applicable here? I believe that the Bioanalyzer (in)capacity to discriminate VNTR alleles with repeat units smaller than 8 bp has already been discussed in the literature, see for instance De Santis et al., BMC microbiology 2011. Regarding the GeneScan errors, this is probably due to incorrect allele calling, resulting from slightly inexact size measurement by the capillary equipment (literature available, see for instance Hyytia-Trees et al., Foodborne Pathog Dis 2010). Once correctly set up, there should be no errors at least when using the GeneScan.
Table 5: include the cost estimate for the GeneScan method (where the assay can be run in a single multiplex PCR). The estimated cost for PCR (10 € per PCR for reagent costs) seems a bit high. Please clarify the indicated WGS cost: does this cover the making of the sequencing library? What read length? More generally, Table 5 and the associated paragraph are poorly informative; it would be useful to try to estimate the overall cost, based upon commercial service prices.
Page 7, paragraph on “Theoretical feasibility …” is not informative as is. Should either be developed, by being more specific, or deleted.
Page 8, second paragraph, “the larger, the k-mer size, the better the accuracy of WGS-derived MLVA profiles.” The authors need to show the data (see previous remark on Table 4). Also please explain the reason for limiting the k-mer size to 175.
Page 8, last paragraph before conclusion: the theoretical evaluation has not been done appropriately. The postulated 6 nt motif is not applicable to any of the MLVA assays commonly used in the given list of genus/species. Rather the design of the MLVAtype software indicates that it will have a very limited range. Indeed the MLVAtype web page at https://ucl-irec-ctma.shinyapps.io/NGS-MLVA-TYPING/ appears tailored for V. cholerae (and perhaps MLVA assays with up to five VNTR loci, and short and perfect repeat arrays). The article might be more convincing if focused on V. cholerae and its 10,000 publicly available sequence read archives.
-details, Page 1, last paragraph: Mycobacterium, Streptococcus etc. are not species but genera. Please be more specific.
Page 2, first paragraph “Mounting evidence …” obviously and by definition Whole Genome Sequencing can only be better than the previous methods, no need to refer to “Mounting evidence”. The issue is cost, as detailed in the second paragraph. ****** 6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review?** For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step. https://doi.org/10.1371/journal.pone.0225848.r001
Revision 1
23 Sep 2019 Author Response
We thank the Editor and the Reviewer for their careful reading of our manuscript. In revising our paper, we carefully followed the Editor’s directions and replied point by point to the Reviewer’s questions. We are confident that the answers provided and the corresponding modifications in the revised version will now meet the Editor’s and the Reviewer’s expectations. We thank you in advance for your editorial work. The authors
Reviewer #1:
Question: -in some species in which MLVA is used, some VNTRs occur in families, i.e. different VNTR loci share the same repeat unit. How will MLVAtype behave in such a case? It seems that all loci sharing an identical repeat motif will be (incorrectly) assigned the largest allele size.
Answer: All loci sharing an identical repeat motif will indeed be incorrectly assigned the largest allele size. For such species, an in silico PCR approach would be more appropriate. Following your recommendation (see your later comment), we decided to focus the paper on V. cholerae, where each repeat unit is found in only one locus.
Question: -conversely, tandem repeats are often not perfect. How will MLVAtype behave in such a case?
Answer: For such cases, an in silico PCR approach would indeed be more appropriate. Following your recommendation (see your later comment), we have now restricted the focus of our application to V. cholerae, where tandem repeats are perfect. It is worth noting that the problem of right-censoring applies only to a perfect repetition of the same motif. This also explains why right-censored data were not observed in the study of Vergnaud et al. Frontiers Microbiology 2018.
Question: -what is the rationale for not taking into account contigs smaller than 2000 bp, especially given that in the present situation (Vibrio cholerae VNTRs), the VNTR loci are shorter than 200 bp?
Answer: The 2000 bp threshold was changed to 1000 bp to be in line with the literature. Small contigs are excluded because they are often non-informative [1, 2, 3, 4] and associated with low coverage.
Question: -Page 2, third paragraph: the authors do not mention other published approaches for in silico MLVA typing. For instance, Vergnaud et al. Frontiers Microbiology 2018 previously explored the possibility of deducing MLVA from WGS data, evaluated for Brucella using an in silico PCR approach.
Answer: We fully agree with the reviewer and thank him for his suggestion. This reference is now included in the reference list of the amended version of the paper.
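The behaviour discussed in the answers above can be illustrated with a minimal Python sketch. It is not the MLVAtype source code: the function names `longest_tandem_run` and `call_locus`, the motif `AACAGA` and the toy contigs are all assumptions made for illustration, and only the forward strand is searched. The sketch counts the longest perfect run of a motif across contigs and, following the reviewer's summary, treats the count as valid only if the run is shorter than the assembly k-mer size (otherwise it is flagged as right-censored).

```python
import re

def longest_tandem_run(contigs, motif):
    """Largest number of consecutive perfect copies of `motif` in any contig.

    Forward strand only; a lookahead is used so that runs starting at every
    position are examined, including overlapping candidates.
    """
    if not motif:
        raise ValueError("motif must be a non-empty string")
    pattern = re.compile(f"(?=((?:{re.escape(motif)})+))")
    best = 0
    for seq in contigs:
        for match in pattern.finditer(seq):
            best = max(best, len(match.group(1)) // len(motif))
    return best

def call_locus(contigs, motif, kmer):
    """Return (copies, censored); censored when the run is not shorter than kmer."""
    copies = longest_tandem_run(contigs, motif)
    censored = copies * len(motif) >= kmer
    return copies, censored

MOTIF = "AACAGA"  # hypothetical 6 nt repeat unit, not taken from the paper

# Two hypothetical loci sharing the same motif (7 and 3 copies) on one contig.
shared = ["GG" + MOTIF * 7 + "TTGCA" + MOTIF * 3 + "CC"]
# One hypothetical locus whose array is interrupted by an imperfect unit.
imperfect = ["GG" + MOTIF * 4 + "AACGGA" + MOTIF * 2 + "CC"]

print(call_locus(shared, MOTIF, kmer=55))     # (7, False)
print(call_locus(shared, MOTIF, kmer=42))     # (7, True)
print(call_locus(imperfect, MOTIF, kmer=55))  # (4, False)
```

With a shared motif, both loci would be reported as 7 copies because only the longest run is returned, and an imperfect unit cuts the reported count to 4. Both cases match the limitations acknowledged in the answers.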
Question: -Page 6, Table 4: please also provide the initial estimates before applying the “right-censorship”. The authors seem to implicitly assume that SPAdes will never correctly reconstruct tandem repeat arrays longer than the k-mer size, but do not provide the data to show that.
Answer: We agree with the comment. In the amended version of the paper, this table is now provided as ‘supplementary file 1’ and commented on in the “Results” section.
Question: -by default, SPAdes will explore multiple k-mer values (rather than a specified value). It would seem useful to provide in Table 4 the results obtained in these conditions.
Answer: We agree with the comment. In the amended version of the paper, these results are now provided in ‘supplementary file 1’ and commented on in the “Results” section.
Question: -if indeed SPAdes will not reconstruct tandem repeats longer than k-mer size, then why try to assemble reads when the read length is longer than available k-mer sizes (instead of recovering the reads of interest using tools such as BBduk)?
Answer: One advantage of the current application is that MLVA typing can be run very quickly and easily in a shiny application. This process (including data transfer and analyses) would take much longer if the MLVA profiles were typed directly from the reads. Therefore, deriving MLVA profiles directly from the reads was not tested in our study. As the current results showed no discrepancy between WGS data (after assembly) and Sanger sequencing data (gold standard), we focused the paper on deriving MLVA profiles from the assembly.
Question: -the in vitro MLVA assay does not seem to be correctly working yet, because the Bioanalyzer does not have a sufficient amplicon sizing resolution and/or because the allele calling (conversion from size estimate to repeat copy number) is not optimized, as indeed suggested by the authors in the first paragraph page 8. The error rates reported in Table 3 for the Bioanalyzer (above 50%!) and for GeneScan (7 errors in 10 strains for the second locus) are not acceptable. What is the interest of backward compatibility of WGS with in vitro MLVA typing if the in vitro data is so bad, i.e. if MLVA does not seem applicable here? I believe that the Bioanalyzer (in)capacity to discriminate VNTR alleles with repeat units smaller than 8 bp has already been discussed in the literature, see for instance De Santis et al., BMC microbiology 2011. Regarding the GeneScan errors, this is probably due to incorrect allele calling, resulting from slightly inexact size measurement by the capillary equipment (literature available, see for instance Hyytia-Trees et al., Foodborne Pathog Dis 2010). Once correctly set up, there should be no errors at least when using the GeneScan.
Answer: The objective of the paper was not to optimize Bioanalyzer-derived MLVA typing (although this can of course be done, as previously reported by Lista et al. in the literature). Accordingly, and to avoid any ambiguity about the imperfect Bioanalyzer results presented in the first version of the paper, this part was removed from the amended version. Likewise, and as discussed above, the purpose was to compare NGS results with existing published results. We did not really question the reliability of these published GeneScan results. However, following the reviewer’s comment, we raised this question with our foreign collaborators and they decided to review the initial GeneScan-based MLVA profiles; a senior technician ran them again: all mismatches but one, which remains unexplained, were corrected! These errors in our first submission and the reviewer’s comment therefore underpin the value of the work of Hyytia-Trees et al., which concluded that “proper training and experience is necessary to collect accurate information when using the GeneScan methodology”. Although not 100%, the concordance between Sanger- and GeneScan-derived MLVA profiles is substantially improved in the amended version of the manuscript (Table 3, Figure 3).
Question: Table 5: include the cost estimate for the GeneScan method (where the assay can be run in a single multiplex PCR). The estimated cost for PCR (10 € per PCR for reagent costs) seems a bit high. Please clarify the indicated WGS cost: does this cover the making of the sequencing library? What read length? More generally, Table 5 and the associated paragraph are poorly informative; it would be useful to try to estimate the overall cost, based upon commercial service prices.
Answer: The cost of the Bioanalyzer method was replaced by the cost of the GeneScan method. In addition, PCR costs were updated and a new reference was added to clarify the cost of WGS analysis.
Question: Page 7, paragraph on “Theoretical feasibility …” is not informative as is. Should either be developed, by being more specific, or deleted.
Answer: This paragraph was deleted accordingly.
Question: Page 8, second paragraph, “the larger, the k-mer size, the better the accuracy of WGS-derived MLVA profiles.” The authors need to show the data (see previous remark on Table 4). Also please explain the reason for limiting the k-mer size to 175.
Answer: Page 8, second paragraph is based on Figure 3. This is specified in the amended version of the paper. The reason for limiting the k-mer to 175 is justified as follows: “As previously reported, the length of repeat motifs should not exceed 174 nucleotides for V. cholerae, corresponding to 29 repetitions of a 6 nt motif [13]. Accordingly, the longer k-mer size (i.e. 175) proved to generate a correct MLVA profile with no censored data.”
Question: Page 8, last paragraph before conclusion: the theoretical evaluation has not been done appropriately. The postulated 6 nt motif is not applicable to any of the MLVA assays commonly used in the given list of genus/species. Rather the design of the MLVAtype software indicates that it will have a very limited range. Indeed the MLVAtype web page at https://ucl-irec-ctma.shinyapps.io/NGS-MLVA-TYPING/ appears tailored for V. cholerae (and perhaps MLVA assays with up to five VNTR loci, and short and perfect repeat arrays). The article might be more convincing if focused on V. cholerae and its 10,000 publicly available sequence read archives.
Answer: We fully agree with the reviewer’s suggestion. The updated version of the paper now focuses only on the V. cholerae MLVA application.
Question: -details, Page 1, last paragraph: Mycobacterium, Streptococcus etc. are not species but genera. Please be more specific.
Answer: Considering that the new version of the paper focuses on V. cholerae, and that the paragraph on theoretical feasibility has been removed, this sentence was removed.
Question: Page 2, first paragraph “Mounting evidence …” obviously and by definition Whole Genome Sequencing can only be better than the previous methods, no need to refer to “Mounting evidence”. The issue is cost, as detailed in the second paragraph.
Answer: This sentence has been removed, as required.
References:
1: Gurevich, Alexey, et al. "QUAST: quality assessment tool for genome assemblies." Bioinformatics 29.8 (2013): 1072-1075.
2: Bultman, Katherine M., et al. "Draft Genome Sequences of Type VI Secretion System-Encoding Vibrio fischeri Strains FQ-A001 and ES401." Microbiology resource announcements 8.20 (2019): e00385-19.
3: Rozanov, Aleksey S., et al. "Metagenome-Assembled Genome Sequence of Phormidium sp. Strain SL48-SHIP, Isolated from the Microbial Mat of Salt Lake Number 48 (Novosibirsk Region, Russia)." Microbiology resource announcements 8.31 (2019): e00651-19.
4: Parks, Dylan, et al. "Genome Sequence of Bacillus subtilis natto VK161, a Novel Strain That Produces Vitamin K2." Microbiology resource announcements 8.35 (2019): e00444-19.
Attachments Attachment Submitted filename: Response_to_reviewer.doc https://doi.org/10.1371/journal.pone.0225848.r002
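The k-mer limit of 175 quoted in the response follows from simple arithmetic: 29 repetitions of a 6 nt motif span 174 nucleotides, which is one less than 175. A minimal Python sketch of that bound is given below, assuming the rule stated in the reviewer's summary (an array is considered valid only if it is shorter than the k-mer size). The helper name `max_uncensored_copies` is hypothetical; the motif lengths of 6, 7 and 9 bp are those of the five loci mentioned in the review.

```python
def max_uncensored_copies(kmer: int, motif_len: int) -> int:
    """Largest copy number whose perfect array is strictly shorter than `kmer`."""
    if kmer < 1 or motif_len < 1:
        raise ValueError("kmer and motif_len must be positive integers")
    return (kmer - 1) // motif_len

# Repeat-unit lengths used by the V. cholerae assay described in the review.
for motif_len in (6, 7, 9):
    copies = max_uncensored_copies(175, motif_len)
    print(f"{motif_len} nt motif, k=175: up to {copies} copies "
          f"({copies * motif_len} nt)")
# 6 nt motif, k=175: up to 29 copies (174 nt)
# 7 nt motif, k=175: up to 24 copies (168 nt)
# 9 nt motif, k=175: up to 19 copies (171 nt)
```

Under this assumption, any array longer than these limits would be reported as right-censored at k = 175.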
8 Oct 2019 Decision Letter - Axel Cloeckaert, Editor PONE-D-19-20834R1 Backward compatibility of whole genome sequencing data with MLVA typing using a new MLVAtype shiny application for Vibrio cholerae PLOS ONE Dear Dr Ambroise, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please consider the comments of the reviewer to improve the manuscript. We would appreciate receiving your revised manuscript by Nov 22 2019 11:59PM. We look forward to receiving your revised manuscript. Kind regards, Axel Cloeckaert Academic Editor PLOS ONE
Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation. Reviewer #1: (No Response) ******** 2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Partly ****** 3. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: N/A ****** 4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. 
For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes ****** 5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes ****** 6. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: The authors have significantly clarified and improved their report. A few points are written in a misleading way and can easily be improved. (unfortunately for the reviewer, lines are still not numbered) Page 2, “However, it is worth noting that an in silico PCR approach to type MLVA in Brucella from WGS data was recently developed” Would be more exact to indicate: “However, it is worth noting that an in silico PCR approach to type MLVA from WGS data was recently developed and evaluated for Brucella” Along this line of benchmarking, may be worth mentioning https://github.com/Papos92/MISTReSS Page 3 second line “GeneScan determination was retrieved for Ugandan isolates from published data [9]” This sentence is misleading since it suggests that the chromatograms were published as part of the Bwire et al. 2018 publication. 
However what I understand is that the authors have in the course of the present investigation realized that the MLVA alleles calling published in the 2018 report was incorrect, and have reanalyzed the data. This is a different thing. Page 3 second paragraph “method proposed by Kendall et al. [10], the formula had to be modified to better fit the sequence length of the motif and the position of the primers (Table 1). It is of note that, the original calculation formula was used for the VC0283 motif but with a modified reverse primer” Not clear to me. Do the authors just mean that the use of the modified VC0283 reverse primer has no impact on the PCR product size? May be worth recalling the sequence of the previous primer, as in “(AGCCTCCTCAGAAGTTGAG instead of the previous XXXXX)” “Locus” would be less ambiguous than “motif” (which refers to the repeat unit). Page 3, page 4 and elsewhere: two designations are used for loci VC0171 (alias VCA0171) and VC0283 (alias VCA0283), please harmonize Page 4: “returns the number of tandem repeats” and “increasing number (j=2, 3 .., k) of tandem repeats”: may be clearer to replace “tandem repeats” by “tandemly repeated units” Page 5 cost analysis paragraph, “Reagents costs for MLVA typing (5 motifs) of V. cholerae isolates were compared using three different methods (Table 5).” The authors need to expand a little bit. Discuss the relative costs. Recall that Sanger sequencing was used only to produce a reference dataset, but not as a suggestion for routine MLVA typing (as shown, would make no sense in terms of cost). Table 5 Instead of “Motifs”, the authors probably mean “Loci”? Cost estimates: Sanger sequencing: are these commercial costs? Eight dollars for one PCR in terms of reagents and consumables seems a lot. Genscan-based typing: with only five loci, all five loci should be run in one multiplex PCR. Most laboratories running MLVA on Genscan-type of equipment and significant numbers of strains will multiplex the PCRs. 
Then the fair reagents cost estimate should be down to one PCR and one run per sample, i.e. 11.4 USD.
WGS cost: indicating current best commercial prices (for sequencing one strain alone versus as part of a batch of 96) might be useful.

Page 5, “However, it should be noted that a modification of either the primer”
When running MLVA on a Genscan-type machine, the “formula” is useless. The allele-calling software will call each allele based on its associated observed size range. This remark suggests that the authors are exporting raw size estimates from the Genscan and then converting them to alleles using the “formula”. This is not the recommended way to proceed; see the available literature.

Page 5, “was applied in order to decrease the number of mismatches, especially those observed with VC0437 and VCA0283.”
Quote Table 1 (and check locus names; see a previous remark).

Page 7: “hence solving the well-recognized issue of backward compatibility with traditional MLVA typing methods.”
The software is not solving anything! There has never been an issue of backward compatibility when the sequencing reads are longer than the tandem repeat arrays. Recall that when tandem repeats contain internal variations, software for sequence assembly may be able to correctly reconstruct tandem repeats longer than the sequencing reads. Mention alternative approaches (the in silico PCR methods, including in terms of benchmarking https://github.com/Papos92/MISTReSS) and explain why the present approach is believed to be more appropriate, at least for V. cholerae MLVA.

I would suggest merging Table 1 and S1 Table, as S1 Table clearly illustrates the impact of the k-mer size.

Discussion: the authors need to comment on the pros and cons of the approach they use here versus the more commonly used in silico PCR approach.
In particular, they need to indicate that the approach used here works in the Vibrio cholerae context because the VNTRs are unique (so the repeat unit sequence is locus-specific), the tandem repeat arrays are perfect, and there are no indels in the flanking sequences. This is an uncommon situation. They might indicate why they think this approach may be of interest when applicable. Do they think it is because it does not require the flanking sequences to have been assembled?

******

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

https://doi.org/10.1371/journal.pone.0225848.r003
Revision 2
12 Nov 2019 Author Response

The point-by-point response to the reviewer has been attached as a Word file.

Attachment (submitted filename): Response_to_reviewer.doc

https://doi.org/10.1371/journal.pone.0225848.r004
14 Nov 2019 Decision Letter - Axel Cloeckaert, Editor

Backward compatibility of whole genome sequencing data with MLVA typing using a new MLVAtype shiny application for Vibrio cholerae

PONE-D-19-20834R2

Dear Dr. Ambroise,

We are pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it complies with all outstanding technical requirements.

Within one week, you will receive an e-mail containing information on the amendments required prior to publication. When all required modifications have been addressed, you will receive a formal acceptance letter and your manuscript will proceed to our production department and be scheduled for publication.

Shortly after the formal acceptance letter is sent, an invoice for payment will follow. To ensure an efficient production and billing process, please log into Editorial Manager at https://www.editorialmanager.com/pone/, click the "Update My Information" link at the top of the page, and update your user information. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, you must inform our press team as soon as possible and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

With kind regards,
Axel Cloeckaert
Academic Editor
PLOS ONE

https://doi.org/10.1371/journal.pone.0225848.r005
Formally Accepted
2 Dec 2019 Acceptance Letter - Axel Cloeckaert, Editor

PONE-D-19-20834R2

Backward compatibility of whole genome sequencing data with MLVA typing using a new MLVAtype shiny application for Vibrio cholerae

Dear Dr. Ambroise:

I am pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

For any other questions or concerns, please email plosone@plos.org.

Thank you for submitting your work to PLOS ONE.

With kind regards,
PLOS ONE Editorial Office Staff
on behalf of Dr. Axel Cloeckaert
Academic Editor
PLOS ONE

https://doi.org/10.1371/journal.pone.0225848.r006

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.