Peer Review History

Original Submission: November 4, 2021
Decision Letter - David Balding, Editor, Vincent Plagnol, Editor

Dear Dr Bocher,

Thank you very much for submitting your Methods entitled 'Testing for association with rare variants in the coding and non-coding genome: RAVA-FIRST, a new approach based on CADD deleteriousness score' to PLOS Genetics.

The manuscript was fully evaluated at the editorial level and by independent peer reviewers. The reviewers appreciated the attention to an important problem, but raised some substantial concerns about the current manuscript. Based on the reviews, we will not be able to accept this version of the manuscript, but we would be willing to review a much-revised version.

We cannot, of course, promise publication at that time. From an editorial perspective, the reviewers, and in particular reviewer 2, raise several major issues. I do think these issues can be addressed, as this is a request for additional work rather than a fundamental challenge to the concept of the paper, but none of them is trivial. This means (i) benchmarking the proposed methods against the most commonly used tools in the field, (ii) understanding how variability in mutation rate can complicate data interpretation, and (iii) including indels in the model (both reviewers made that point, and this is a major source of rare deleterious variation). These combined additions represent a high bar, but methods for rare variant association testing are quite mature, which in turn raises the expectations for alternative approaches.

Should you decide to revise the manuscript for further consideration here, your revisions must address the specific points raised above. We will also require a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript.

If you decide to revise the manuscript for further consideration at PLOS Genetics, please aim to resubmit within the next 60 days, unless it will take extra time to address the concerns of the reviewers, in which case we would appreciate an expected resubmission date by email to plosgenetics@plos.org.

If present, accompanying reviewer attachments are included with this email; please notify the journal office if any appear to be missing. They will also be available for download from the link below. You can use this link to log into the system when you are ready to submit a revised version, having first consulted our Submission Checklist.

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Please be aware that our data availability policy requires that all numerical data underlying graphs or summary statistics are included with the submission, and you will need to provide this upon resubmission if not already present. In addition, we do not permit the inclusion of phrases such as "data not shown" or "unpublished results" in manuscripts. All points should be backed up by data provided with the submission.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

PLOS has incorporated Similarity Check, powered by iThenticate, into its journal-wide submission system in order to screen submitted content for originality before publication. Each PLOS journal undertakes screening on a proportion of submitted articles. You will be contacted if needed following the screening process.

To resubmit, use the link below and 'Revise Submission' in the 'Submissions Needing Revision' folder.

[LINK]

We are sorry that we cannot be more positive about your manuscript at this stage. Please do not hesitate to contact us if you have any concerns or questions.

Yours sincerely,

Vincent Plagnol

Associate Editor

PLOS Genetics

David Balding

Section Editor: Methods

PLOS Genetics

Reviewer's Comments to the Authors:

Reviewer #1: In their publication 'Testing for association with rare variants in the coding and non-coding genome: RAVA-FIRST, a new approach based on CADD deleteriousness score', Bocher et al. describe a novel method for selecting disease-candidate variants among rare SNVs genome-wide. The described methodology is very interesting and enables new insights, especially in previously less regarded genomic regions. I would, however, recommend revising the text somewhat in order to improve readability:

major:

- I would recommend defining the adjusted CADD scores once and then sticking to that. Maybe use an acronym (e.g. ACS (adjusted CADD score), RAVA-Score or something like that). In my eyes, the switches between adjusted and non-adjusted CADD scores (e.g. l. 154) make the manuscript hard to read, especially as you define the adjustment multiple times. Something similar may help to distinguish 'all' and '>1000bp' CADD regions.

- Table 2: Are the percentages only from within the chosen (>1000bp) CADD regions or from all of them? Or why else are some coding CCDS not in a CADD region? Why are protein domains so much less covered (likelihood that an entire domain is covered instead of % of each bp)?

- Fig 1: Why do you include missense and MSC in the top panel? I understand that those are the same as in the bottom panel, but the empty plots should hence be excluded altogether. Maybe you could use that space in the upper panel to put (a single) legend for both panels there (i.e. the TPR, TNR and precision colours).

One should note that Figure 1 does not tell much as most variants in ClinVar are coding and thus the adjusted CADD score does not do much. You mostly just have a higher cut-off than CADD1.4=20 (similar S4, just lower) which decreases TPR and increases FPR

suggestion:

- I wonder what happens in the very large 'CADD regions' around the centromeres, where, presumably, many variants are not scored at all and general genome conservation is low, and hence so is the median (which may lead to small numbers of variants in slightly more conserved areas being considered significant). I have no idea what effect this may have, but I, personally, would probably implement a maximum length for CADD regions and split regions larger than, say, 200 kb into the maximum number of subregions such that each is at least 100 kb (just a suggestion; maybe this does not work as intended).

- Maybe it's just me, but I find Figure S1 (and, to a lesser degree, S2) definitely more relevant for the main manuscript than any of the Tables. You are generally moving a lot of the method (i.e. the definition of CADD regions) into the supplement, although it seems an important part of the manuscript.

- l. 320/321 'as recommended by https://cadd.gs.washington.edu/info, version v1.4': not to be pedantic, but I would interpret 'there is not a natural choice here -- it is always arbitrary' as a recommendation for a dynamic threshold like your method rather than for a single fixed cut-off. After all, you are pretty much proposing an (automated) solution for a problem that has also been stated there.

- l.156: ‘Note that because CADD scores are only available for SNVs’ -> CADD is available for indels; consider carefully, however, whether you want to include those in the analysis.
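
The region-splitting suggestion above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' or the reviewer's implementation: the helper name `split_region` is hypothetical, coordinates are assumed to be half-open base-pair intervals, and the 200 kb and 100 kb values are simply the ones floated in the comment.

```python
def split_region(start, end, max_len=200_000, min_len=100_000):
    """Split the half-open interval [start, end) into near-equal pieces.

    Hypothetical helper: regions no longer than max_len are returned
    unchanged; longer regions are cut into the largest number of pieces
    such that every piece is at least min_len long.
    """
    length = end - start
    if length <= max_len:
        return [(start, end)]
    # Largest piece count that still keeps every piece >= min_len.
    n_pieces = length // min_len
    base, extra = divmod(length, n_pieces)
    pieces = []
    pos = start
    for i in range(n_pieces):
        # The first `extra` pieces get one additional base pair.
        size = base + (1 if i < extra else 0)
        pieces.append((pos, pos + size))
        pos += size
    return pieces
```

With these defaults, a 250 kb region would be cut into two pieces of 125 kb each, while a 150 kb region would be left untouched. Whether such a split is appropriate near centromeres is exactly the open question raised in the comment.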

minor:

- l. 89/90 'These regions prevent the use of sliding windows procedures while enabling the study of rare variants in the whole genome' -> I would use 'avoid' or 'replace' instead of 'prevent'

- l. 92 fix -> fixed

- (multiple) I assume 'package R Ravages' should be 'R package Ravages'

- (multiple) I would write 'gnomAD', not 'GnomAD'

Reviewer #2: In this manuscript, Bocher et al. attempt to define a new approach to performing rare variant burden tests in the non-coding genome. With whole-genome sequencing of large disease cohorts increasing at a rapid rate, identifying such methods is of value to the field. Unfortunately, the authors’ approach to defining functional units of the non-coding genome fails to account for major confounders. Furthermore, they have failed to benchmark against some of the most popular tools in the field, as outlined in my comments below. In addition, their rationale for defining these CADD regions is very unclear to me.

1. The authors define the boundaries of their CADD regions as regions between two variants with an adjusted CADD score > 20. However, they do not consider mutation rate. In CCR, which the authors compare their approach to, Havrilla et al. cleverly used CpG density as a proxy for mutation rate and showed that this approach worked well. Unfortunately, because the CADD regions do not account for mutation rate, it is entirely unclear whether the lack of variation in a CADD region is due to selective pressure or lower mutability of that region due to decreased mutation rate.

2. I’m not convinced that the authors are detecting regions of the genome depleted for functional variation. I’d like to see how their regions compare to other methods that attempt to define constrained genomic regions (JARVIS, Orion, CDTS, LINSIGHT, etc.). The authors should compare performance of these approaches in classifying non-coding ClinVar variants. Also, the authors should not use ClinVar benign variants as a negative set in these comparisons, as most ClinVar benign variants are gnomAD polymorphisms. Because gnomAD variation was used to define this score, they run into confounding with this comparison. A better approach would be to use variants seen in other databases (e.g., DiscovEHR) but not gnomAD as a putative benign set.

3. In defining CADD regions, the authors required the variant to be seen in gnomAD > 2x, but they do not provide any rationale for how they came to this threshold. The same goes for choosing a CADD threshold for the burden tests.

4. While the authors compare to sliding-window based burden approaches, they should also compare their approach to defining functional units using the many other intolerance methods mentioned above.

5. They exclude indels/structural variants because these variants do not have CADD scores. Doing so results in a tremendous loss of power: indels / SVs should have much larger effect sizes than SNVs in the non-coding genome. A burden model that includes SNVs with a CADD threshold + any indels / SVs should be more powered.

6. The authors find a suggestive association between a CADD region and VTE. I’d like to see how their approach performs without any CADD filter as a negative control. Furthermore, I would want to see a comparison showing that their CADD region-based burden analysis does better than splitting the genome into random chunks (matched in size to the CADD regions).
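
One way the size-matched random-chunk control in point 6 could be set up is sketched below. This is a minimal illustration, not a method described in the manuscript or the review: the function name `size_matched_random_regions` is hypothetical, and the sketch assumes the input regions are sorted, contiguous, half-open intervals on a single chromosome segment.

```python
import random


def size_matched_random_regions(regions, seed=0):
    """Return a random partition with the same multiset of region lengths.

    Assumption: `regions` is a sorted list of contiguous half-open
    intervals (start, end). The lengths are shuffled and laid out end to
    end from the original first start, so boundaries move while the size
    distribution is preserved exactly.
    """
    if not regions:
        return []
    lengths = [end - start for start, end in regions]
    rng = random.Random(seed)  # seeded for reproducible controls
    rng.shuffle(lengths)
    shuffled = []
    pos = regions[0][0]
    for length in lengths:
        shuffled.append((pos, pos + length))
        pos += length
    return shuffled
```

Repeating this with different seeds would give a set of null partitions against which the burden results on the real regions could be compared. A shuffle can occasionally reproduce the original order, so a real analysis would likely want many replicates.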

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Revision 1

Attachments
Submitted filename: Reviews_PlosG_23May22_VF.docx
Decision Letter - David Balding, Editor, Vincent Plagnol, Editor

Dear Dr Bocher,

Thank you very much for submitting your Methods entitled 'Testing for association with rare variants in the coding and non-coding genome: RAVA-FIRST, a new approach based on CADD deleteriousness score' to PLOS Genetics.

We are happy to say that the substantial reviewers' comments have been addressed and we are therefore close to being able to accept the manuscript. However, reviewer 1 made some minor comments and we ask you to address these, after which we expect to formally accept the manuscript.

We hope to receive your revised manuscript within the next 30 days. If you anticipate any delay in its return, we would ask you to let us know the expected resubmission date by email to plosgenetics@plos.org.

If present, accompanying reviewer attachments should be included with this email; please notify the journal office if any appear to be missing. They will also be available for download from the link below. You can use this link to log into the system when you are ready to submit a revised version, having first consulted our Submission Checklist.

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

To resubmit, you will need to go to the link below and 'Revise Submission' in the 'Submissions Needing Revision' folder.

[LINK]

Please let us know if you have any questions while making these revisions.

Yours sincerely,

Vincent Plagnol

Associate Editor

PLOS Genetics

David Balding

Section Editor: Methods

PLOS Genetics

Reviewer's Comments to the Authors:

Reviewer #1: All my previous concerns have been addressed. However, there are a few minor comments on the new sections that should be addressed prior to publication:

l. 196/197: "This is expected given the fact that conservation and thus CADD scores are low in these regions."

l. 462/463 "We also observed that CADD regions close to the centromeres can be very large, possibly due to a general low conservation in these areas"

> My understanding is that there are few high quality genomes that cover these areas (repetitive elements, GC content) and hence alignments are missing to generate proper conservation scores.

l. 239 "Fig 2B" -> that is 2A now, right?

l. 477 "if" -> "of" (or the sentence does not make sense)

l. 479 "use of a" -> "use of the"

optional:

Consider rereading and revising some sentences here and there. Quite a few feel unnecessarily lengthy.

E.g., in l. 478, you can remove "of them can" without changing the meaning.

Reviewer #2: The authors have addressed my concerns.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Revision 2

Attachments
Submitted filename: Review2_response_reviewers_2.docx
Decision Letter - David Balding, Editor, Vincent Plagnol, Editor

Dear Dr Bocher,

We are pleased to inform you that your manuscript entitled "Testing for association with rare variants in the coding and non-coding genome: RAVA-FIRST, a new approach based on CADD deleteriousness score" has been editorially accepted for publication in PLOS Genetics. Congratulations!

Before your submission can be formally accepted and sent to production you will need to complete our formatting changes, which you will receive in a follow up email. Please be aware that it may take several days for you to receive this email; during this time no action is required by you. Please note: the accept date on your published article will reflect the date of this provisional acceptance, but your manuscript will not be scheduled for publication until the required changes have been made.

Once your paper is formally accepted, an uncorrected proof of your manuscript will be published online ahead of the final version, unless you’ve already opted out via the online submission form. If, for any reason, you do not want an earlier version of your manuscript published online or are unsure if you have already indicated as such, please let the journal staff know immediately at plosgenetics@plos.org.

In the meantime, please log into Editorial Manager at https://www.editorialmanager.com/pgenetics/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production and billing process. Note that PLOS requires an ORCID iD for all corresponding authors. Therefore, please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field.  This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager.

If you have a press-related query, or would like to know about making your underlying data available (as you will be aware, this is required for publication), please see the end of this email. If your institution or institutions have a press office, please notify them about your upcoming article at this point, to enable them to help maximise its impact. Inform journal staff as soon as possible if you are preparing a press release for your article and need a publication date.

Thank you again for supporting open-access publishing; we are looking forward to publishing your work in PLOS Genetics!

Yours sincerely,

Vincent Plagnol

Academic Editor

PLOS Genetics

David Balding

Section Editor

PLOS Genetics

www.plosgenetics.org

Twitter: @PLOSGenetics

----------------------------------------------------

Comments from the reviewers (if applicable):

----------------------------------------------------

Data Deposition

If you have submitted a Research Article or Front Matter that has associated data that are not suitable for deposition in a subject-specific public repository (such as GenBank or ArrayExpress), one way to make that data available is to deposit it in the Dryad Digital Repository. As you may recall, we ask all authors to agree to make data available; this is one way to achieve that. A full list of recommended repositories can be found on our website.

The following link will take you to the Dryad record for your article, so you won't have to re‐enter its bibliographic information, and can upload your files directly: 

http://datadryad.org/submit?journalID=pgenetics&manu=PGENETICS-D-21-01454R2

More information about depositing data in Dryad is available at http://www.datadryad.org/depositing. If you experience any difficulties in submitting your data, please contact help@datadryad.org for support.

Additionally, please be aware that our data availability policy requires that all numerical data underlying display items are included with the submission, and you will need to provide this before we can formally accept your manuscript, if not already present.

----------------------------------------------------

Press Queries

If you or your institution will be preparing press materials for this manuscript, or if you need to know your paper's publication date for media purposes, please inform the journal staff as soon as possible so that your submission can be scheduled accordingly. Your manuscript will remain under a strict press embargo until the publication date and time. This means an early version of your manuscript will not be published ahead of your final version. PLOS Genetics may also choose to issue a press release for your article. If there's anything the journal should know or you'd like more information, please get in touch via plosgenetics@plos.org.

Formally Accepted
Acceptance Letter - David Balding, Editor, Vincent Plagnol, Editor

PGENETICS-D-21-01454R2

Testing for association with rare variants in the coding and non-coding genome: RAVA-FIRST, a new approach based on CADD deleteriousness score

Dear Dr Bocher,

We are pleased to inform you that your manuscript entitled "Testing for association with rare variants in the coding and non-coding genome: RAVA-FIRST, a new approach based on CADD deleteriousness score" has been formally accepted for publication in PLOS Genetics! Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out or your manuscript is a front-matter piece, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Genetics and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Anita Estes

PLOS Genetics

On behalf of:

The PLOS Genetics Team

Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom

plosgenetics@plos.org | +44 (0) 1223-442823

plosgenetics.org | Twitter: @PLOSGenetics

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.