Peer Review History

Original Submission: July 5, 2020
Decision Letter - Momiao Xiong, Editor

PONE-D-20-20743

Efficient association mapping from k-mers - an application in finding sex-specific sequences

PLOS ONE

Dear Dr. Rahman:

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Oct 26 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Momiao Xiong

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The manuscript “Efficient association mapping from k-mers - an application in finding sex-specific sequences” reimplemented a previously developed algorithm, HAWK, using C++ instead of the original R scripts and/or other scripts. Briefly, the major changes include the reimplementation of the association testing and the multiple testing correction. The package was tested in comparison with the original HAWK package. Both give similar results on the E. coli data. The authors also ran the new package to identify sequences of sex chromosomes using human population whole genome sequencing data. It showed that significant k-mers are largely from chromosomes X and Y. Running time was also compared. Overall, the manuscript aims to computationally improve the HAWK performance for k-mer GWAS. The computational speed for k-mer GWAS is important because hundreds of millions of statistical tests could be needed for a k-mer GWAS study. Below I have a few minor comments.

1. The labels for the x- and y-axes could be clearer, although the audience could guess what cpp means and what R means. Notes can be added in the legend.

2. The paragraph explaining the regions on chromosome Y with no mapped k-mers is hard to understand. If the region is missing from the reference genome as the authors speculated, what was plotted in Fig 5b? And it is not reasonable to state that “The missing region in Chromosome Y also possibly explains the large percentage of male associated k-mers that could not be mapped to it in comparison to the percentage of female associated k-mers that map to Chromosome X.” because the difference in the mapping percentage between male and female significant k-mers is not large (99.57% vs 96.37%) but the regions seem to be large. An alternative explanation is that X and Y share sequences and k-mers derived from the common sequences would not be identified as male associated k-mers, the term used by the authors.

3. Female and male associated k-mers were used in the manuscript. Actually, both k-mer sets are associated with the gender (male and female). The authors probably categorized them based on the direction (positive/negative) of correlation. I think more accurate terms are needed.

4. Lines 9-11 in the Introduction are not clear. What kind of method is it, and why did the method make the reference-based method hard?

Reviewer #2: The authors improved the implementation of HAWK, an alignment-free association mapping method for genetic research, by reducing the running time and adding the Benjamini-Hochberg method for controlling for multiple comparisons. The authors also suggest that the method can be used to identify sex-specific sequences. This study has several important strengths; however, there are several issues that should be clarified and some details should be added before publication.

1. Introduction: I think two flow-charts, one for the original HAWK and the other for the version re-implemented by the authors, would help readers understand the differences in procedures more intuitively.

2. Implementation, page 2: Please add more details about the population structure. Do they include multiple variables or is it an integrated one? How can it be used for controlling for confounding, for example, by adjusting in the regression model?

3. Implementation, page 2, line 50-52: “p-values were adjusted for population structure…” This sentence is unclear. Does this mean the associations between genes and phenotypes were adjusted for confounders and p-values were estimated, or the p-values themselves were corrected? If the former is the case, then it should be said that the p-values were adjusted.

4. Results, Figure 1: Please indicate which plot is for the original or revised one.

5. Results, FDR: What FDR value and corresponding alpha value were used, for example, in Figure 2(e) and (f)? It would also be more informative if the authors could show the alpha level from the Bonferroni correction in the same example.

6. Results, sex-specific sequences: How was the number of 32,699,548 k-mers chosen for the PCA?

7. Results, sex-specific sequences: How much variance can be explained by PC1 to PC6?

8. Results, sex-specific sequences: It would be more informative to plot PC1 to PC6 by populations. If PC5 and PC6 are also significantly different between YRI and TSI, a sensitivity analysis with additional adjustment for PC5 and PC6 would be necessary.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Revision 1

Reviewer 1

The manuscript “Efficient association mapping from k-mers - an application in finding sex-specific sequences” reimplemented a previously developed algorithm, HAWK, using C++ instead of the original R scripts and/or other scripts. Briefly, the major changes include the reimplementation of the association testing and the multiple testing correction. The package was tested in comparison with the original HAWK package. Both give similar results on the E. coli data. The authors also ran the new package to identify sequences of sex chromosomes using human population whole genome sequencing data. It showed that significant k-mers are largely from chromosomes X and Y. Running time was also compared. Overall, the manuscript aims to computationally improve the HAWK performance for k-mer GWAS. The computational speed for k-mer GWAS is important because hundreds of millions of statistical tests could be needed for a k-mer GWAS study. Below I have a few minor comments.

Comment 1

The labels for the x- and y-axes could be clearer, although the audience could guess what cpp means and what R means. Notes can be added in the legend.

Response

We have clarified the labels for x- and y-axis in the caption of Figure 2 on Page 6.

Comment 2

The paragraph explaining the regions on chromosome Y with no mapped k-mers is hard to understand. If the region is missing from the reference genome as the authors speculated, what was plotted in Fig 5b? And it is not reasonable to state that “The missing region in Chromosome Y also possibly explains the large percentage of male associated k-mers that could not be mapped to it in comparison to the percentage of female associated k-mers that map to Chromosome X.” because the difference in the mapping percentage between male and female significant k-mers is not large (99.57% vs 96.37%) but the regions seem to be large. An alternative explanation is that X and Y share sequences and k-mers derived from the common sequences would not be identified as male associated k-mers, the term used by the authors.

Response

There is a sequence of Ns that corresponds to the missing region in the reference fasta file (Lines 205-207). Hence, although nothing could be mapped to the region, the Manhattan plot could be generated.

We have rewritten this portion (Lines 207-214), which should make the explanation more reasonable. We have added a figure (Figure 6(c)) showing the histograms of counts of k-mers that could and could not be mapped to Chromosome Y. The heavier tail of the distribution of unmapped k-mer counts indicates that they are from a repeat-rich region, which is what makes the missing region hard to assemble.

We agree with the alternative explanation in the sense that if the difference in k-mer count distributions between cases and controls is small and the number of samples in the study is not large, some k-mers may not be identified. However, this is unlikely in this case because (a) we are able to identify k-mers from throughout Chromosome X, and (b) our earlier analysis (not shown here) indicates that it is easier to identify k-mers that have one copy in one group and none in the other (Chromosome Y) than k-mers with two copies in one group and one copy in the other group (Chromosome X).

Comment 3

Female and male associated k-mers were used in the manuscript. Actually, both k-mer sets are associated with the gender (male and female). The authors probably categorized them based on the direction (positive/negative) of correlation. I think more accurate terms are needed.

Response

We thank the reviewer for the observation. We have corrected the terminology.

Comment 4

Lines 9-11 in the Introduction are not clear. What kind of method is it, and why did the method make the reference-based method hard?

Response

We have rewritten the lines to make them more comprehensible. (Lines 9-20)

Reviewer 2

The authors improved the implementation of HAWK, an alignment-free association mapping method for genetic research, by reducing the running time and adding the Benjamini-Hochberg method for controlling for multiple comparisons. The authors also suggest that the method can be used to identify sex-specific sequences. This study has several important strengths; however, there are several issues that should be clarified and some details should be added before publication.

Comment 1

Introduction: I think two flow-charts, one for the original HAWK and the other for the version re-implemented by the authors, would help readers understand the differences in procedures more intuitively.

Response

We thank the reviewer for the suggestion. We have added a flowchart (Figure 1) showing the old features and highlighting the re-implemented parts.

Comment 2

Implementation, page 2: Please add more details about the population structure. Do they include multiple variables or is it an integrated one? How can it be used for controlling for confounding, for example, by adjusting in the regression model?

Response

We have added more details about the population structure (Lines 56-60) and the regression model (Lines 84-88).

Comment 3

Implementation, page 2, line 50-52: “p-values were adjusted for population structure…” This sentence is unclear. Does this mean the associations between genes and phenotypes were adjusted for confounders and p-values were estimated, or the p-values themselves were corrected? If the former is the case, then it should be said that the p-values were adjusted.

Response

It is indeed the former. We have rewritten the sentence to make it clearer. (Lines 61-63)

Comment 4

Results, Figure 1: Please indicate which plot is for the original or revised one.

Response

We have modified the caption of Figure 1 to make this clear.

Comment 5

Results, FDR: What FDR value and corresponding alpha value were used, for example, in Figure 2(e) and (f)? It would also be more informative if the authors could show the alpha level from the Bonferroni correction in the same example.

Response

We use an alpha level of 0.05. We have now mentioned this in the text (Line 161) and in the caption of Figure 3. We have also mentioned the alpha level for Bonferroni correction in the text (Line 145) and added horizontal lines in Figure 3 to highlight the threshold.
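
For readers unfamiliar with the two corrections discussed in this exchange, the following is a minimal sketch of how a Bonferroni threshold and the Benjamini-Hochberg step-up procedure are commonly computed at an alpha level of 0.05. It is not taken from the HAWK code base; the function names and the example p-values are hypothetical and chosen only for illustration.

```python
def bonferroni_threshold(alpha, num_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / num_tests


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected.

    Standard step-up procedure: find the largest rank k such that the
    k-th smallest p-value is <= (k / m) * alpha, then reject the k
    hypotheses with the smallest p-values.
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values, used only to contrast the two corrections.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
bonferroni_hits = [p <= bonferroni_threshold(0.05, len(example_p)) for p in example_p]
bh_hits = benjamini_hochberg(example_p, alpha=0.05)
```

On these example values Bonferroni rejects only the smallest p-value, whereas Benjamini-Hochberg rejects the two smallest, which illustrates why the latter is the less conservative of the two.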

Comment 6

Results, sex-specific sequences: How was the number of 32,699,548 k-mers chosen for the PCA?

Response

Each k-mer present in between 1% and 99% of the samples is chosen with probability 0.01, which led to this number. We have clarified this in the manuscript. (Lines 185-186)
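
The sampling rule described in this response can be sketched as follows. This is an illustration only, not the authors' implementation: the function name, the dictionary input, the fixed seed, and the choice to treat the 1% and 99% bounds as inclusive are all assumptions.

```python
import random


def sample_kmers(presence_counts, num_samples, prob=0.01, seed=0):
    """Randomly keep k-mers that are present in 1% to 99% of samples.

    presence_counts maps each k-mer to the number of samples containing it.
    The bounds are treated as inclusive here, which is an assumption.
    """
    rng = random.Random(seed)
    selected = []
    for kmer, count in presence_counts.items():
        # Integer comparison avoids floating-point issues at the bounds.
        in_range = num_samples <= 100 * count <= 99 * num_samples
        if in_range and rng.random() < prob:
            selected.append(kmer)
    return selected
```

Under this rule the expected number of sampled k-mers is 0.01 times the number of eligible k-mers, so the exact count obtained depends on the random draws rather than on a preset target.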

Comment 7

Results, sex-specific sequences: How much variance can be explained by PC1 to PC6?

Response

The first six PCs explain 14.90% of the variance, among which the first and second PCs explain 7.68% and 2.51% of the total variance respectively. We have now mentioned this in the paper. (Lines 187-189)
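
As a note on how such percentages are usually obtained: the share of variance explained by a principal component is its eigenvalue divided by the sum of all eigenvalues, not only those of the retained components. The sketch below uses hypothetical eigenvalues and a hypothetical function name; it does not reproduce the figures reported above. From the reported figures alone, PC3 to PC6 together account for about 14.90 - 7.68 - 2.51 = 4.71% of the variance.

```python
def explained_variance_ratio(eigenvalues):
    """Fraction of total variance explained by each principal component.

    eigenvalues must contain the eigenvalues of all components,
    otherwise the fractions are overstated.
    """
    total = sum(eigenvalues)
    return [value / total for value in eigenvalues]


# Hypothetical eigenvalues for a four-component example.
ratios = explained_variance_ratio([4.0, 3.0, 2.0, 1.0])
top_two_share = sum(ratios[:2])
```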

Comment 8

Results, sex-specific sequences: It would be more informative to plot PC1 to PC6 by populations. If PC5 and PC6 are also significantly different between YRI and TSI, a sensitivity analysis with additional adjustment for PC5 and PC6 would be necessary.

Response

We have added plots for PC3 to PC6 by populations in Figure 5. We did not observe any significant relationship between populations and PC3-PC6.

Attachments
Submitted filename: responses_to_reviewers.pdf
Decision Letter - Momiao Xiong, Editor

Efficient association mapping from k-mers - an application in finding sex-specific sequences

PONE-D-20-20743R1

Dear Dr. Rahman,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Momiao Xiong

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: (No Response)

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors have performed additional analyses to clarify some points of confusion. The revision has addressed the reviewer's comments.

Reviewer #2: The authors addressed all my concerns. I have no further comments.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Formally Accepted
Acceptance Letter - Momiao Xiong, Editor

PONE-D-20-20743R1

Efficient association mapping from k-mers - an application in finding sex-specific sequences

Dear Dr. Rahman:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Prof. Momiao Xiong

Academic Editor

PLOS ONE

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.