Peer Review History
| Original Submission October 3, 2019 |
|---|
|
Dear Dr Huang, Thank you very much for submitting your Research Article entitled 'Unified inference of missense variant effects and gene constraints in the human genome' to PLOS Genetics. Your manuscript was fully evaluated at the editorial level and by independent peer reviewers. The reviewers appreciated the attention to an important problem, but raised some substantial concerns about the current manuscript. Based on the reviews, we will not be able to accept this version of the manuscript, but we would be willing to review again a much-revised version. We cannot, of course, promise publication at that time. Should you decide to revise the manuscript for further consideration here, your revisions should address the specific points made by each reviewer. We will also require a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. If you decide to revise the manuscript for further consideration at PLOS Genetics, please aim to resubmit within the next 60 days, unless it will take extra time to address the concerns of the reviewers, in which case we would appreciate an expected resubmission date by email to plosgenetics@plos.org. If present, accompanying reviewer attachments are included with this email; please notify the journal office if any appear to be missing. They will also be available for download from the link below. You can use this link to log into the system when you are ready to submit a revised version, having first consulted our Submission Checklist. To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see our guidelines. 
Please be aware that our data availability policy requires that all numerical data underlying graphs or summary statistics are included with the submission, and you will need to provide this upon resubmission if not already present. In addition, we do not permit the inclusion of phrases such as "data not shown" or "unpublished results" in manuscripts. All points should be backed up by data provided with the submission. While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. PLOS has incorporated Similarity Check, powered by iThenticate, into its journal-wide submission system in order to screen submitted content for originality before publication. Each PLOS journal undertakes screening on a proportion of submitted articles. You will be contacted if needed following the screening process. To resubmit, use the link below and 'Revise Submission' in the 'Submissions Needing Revision' folder. [LINK] We are sorry that we cannot be more positive about your manuscript at this stage. Please do not hesitate to contact us if you have any concerns or questions. Yours sincerely, Scott M. Williams Section Editor: Natural Variation PLOS Genetics Hua Tang Section Editor: Natural Variation PLOS Genetics Reviewer's Responses to Questions Comments to the Authors: Please note here if the review is uploaded as an attachment. Reviewer #1: This paper describes the development and evaluation of UNEECON, a framework for jointly predicting deleterious variants and constrained genes. This is certainly an interesting topic in the context of variant effect prediction and interpretation. 
I find the attempt to unify variant-level and gene-level quite innovative, and it is certainly an approach that will be useful in the study of severe, early onset disorders. Further, the use of a deep neural network to learn parameters relevant to population genetics from millions of variants from gnomAD is a novel contribution to this area. From the perspective of constrained gene prediction, UNEECON results are quite promising. However, I remain skeptical of some of the claims with regards to pathogenicity prediction and the overall argument that this method would be better in practice than those evaluated here (see below). Overall, the paper is clearly written and the methods are outlined satisfactorily (see minor comments for some things that need to be clarified). I outline my comments below: MAJOR: - A general issue that has plagued the field is the problem of unevenly distributed variant information across genes. Some genes are over-studied and are likely to have more variants identified as pathogenic. More importantly, many genes are likely to contain variants from only one class. UNEECON is interesting in this context that it is trained on genes that are mostly going to contain only benign variants and is evaluated on a ClinVar set that will skew mostly towards pathogenic-only or bi-class genes. Ref. 58 from this paper highlighted a method that performed extremely well in its evaluations when run on “pathogenic-only” or “benign-only” proteins but drastically underperformed on “mixed” genes. Given that UNEECON is heavily influenced by gene-level features, I wonder if it is susceptible to the same issue. One way to test this would be to perform a version of the ClinVar evaluation on only the subset of genes that contain both classes of variants. If performance drops then perhaps unification of variant-level and gene-level information may not be the best approach for variant pathogenicity prediction. 
- On a related note, I am concerned about information leakage between the training set and evaluation sets used in this paper. I agree that UNEECON benefits from actually not using the pathogenic variants in its training and evaluation. However, the sheer size of gnomAD is expected to include every known gene in the training of the deep neural network. Since UNEECON uses gene-level information, there is a distinct possibility that performance may be inflated even after excluding variants in both ClinVar and gnomAD. In fact, Ref. 58 (cited for the circularity and inflation issues) points this out and recommends gene-level partitioning for cross-validation experiments such as those conducted by PolyPhen-2 and MutPred. Of course, in the context of this paper, the proposed experiment would be to train a version of UNEECON without variants in genes from the ClinVar set and evaluate that version on the ClinVar set. That way any gene-level bias in the performance measures would be eliminated. - On page 8, line 254, is it all that surprising that UNEECON-G and pLI scores do not correlate? Intuitively, the impacts of missense variants and LOF mutations are going to vary in magnitude even within the same gene. While a biological explanation (as provided here) for this may very well be plausible, it is more likely that the discrepancies are due to technical reasons. A recent commentary (PMID: 30977936) has touched upon issues related to the methodology and applicability of pLI scores. This commentary highlights the example of BRCA genes that have near-zero pLI scores but are known to harbor several deleterious missense variants. MINOR: - The bimodality of the UNEECON score distribution for active sites is worrisome, and the peak closer to 0.25 is a little confusing. I interpret this as “there are more variants in active sites that have low UNEECON scores than high.” This is counter-intuitive and warrants some explanation. - In the functional analyses related to Fig.
5, are there any interesting depletions? I am curious about the functions of those genes that are tolerant to missense but not to LOF mutations. I am also not sure what “unclassified” means in this context. - What is the difference between Eqns. 2 and 3? It is difficult to tell with q_i being defined. - In the Methods section, it would be helpful to readers if a clear account of the parameters to be estimated is provided up front. - The paper is missing details of the final model that emerged from the evaluation process, its architecture and its parameters. - Similarly, the paper lacks details on dataset sizes, particularly in the context of model training and evaluation. How many variants were used to train the deep mixed-effects model? How many variants were included in the evaluations relevant to ClinVar? How many pathogenic and how many benign? - I am also curious about the activation function of the output layer of the neural network. This is of particular relevance to z_ij and its scaling relative to u_j. Is there a potential for one quantity to systematically dominate the other in Eqn. 9? Reviewer #2: The work represents an important advance in prioritization of genes and variants relevant to human disease. It has been known since the introduction of gene level intolerance scoring in 2013 that gene level metrics of the strength of purifying selection provide independent information about variation pathogenicity to the longer established variant level metrics that largely depend on conservation and amino acid substitution features. While attempts have been made previously to integrate both approaches into a single predictive framework, these have been based on supervised learning approaches using a set of putatively pathogenic and benign variants.
The work here combines a selected set of variant level features with a gene level term and estimates selective constraint operating against all possible gene sequence changes based on human polymorphism data compared against sequence specific mutability. As such, it provides an integrated approach assessing purifying selection operating in the human population. The authors have rerun the standard assessments used to test both gene level and variant level predictors with generally improved performance both for identifying relevant gene sets (e.g. haploinsufficient genes) and pathogenic variants. In addition to these advances, the model allows some novel biological insights, including explaining an important reason for the discrepancy between intolerance to missense and loss of function variation as being due to the proportion of proteins that is disordered. The model also highlights that the gene level term is more informative than variant level terms, which is still not as widely appreciated as it should be. For these reasons the work here represents an important advance in the field. While the paper is generally clearly written and the conclusions generally fair, I do have a couple of relatively minor suggestions for consideration. Perhaps most fundamentally, while the use of the UNEECON deep learning model to combine variant features and a gene level term to predict the strength of selection operating against specific alleles is welcome, since it allows non linear combinations of these terms to be learned, it is striking that a linear approximation of the UNEECON model is very highly correlated, suggesting little benefit from the model learning optimum non linear combinations. The authors appropriately use the linear model to infer the relative importance of features, but the very high correlation between the two models suggests the linear model is likely to have similar performance to UNEECON.
Given the more direct interpretability of the linear model, the authors should comment on whether the more complex model is in fact needed for use. The second small point is that some of the comparisons are inappropriate since some of the metrics are used in ways they were not intended for. For example, in Figure 3a, representing prediction in distinguishing pathogenic variants, gene level metrics such as RVIS are compared directly to UNEECON. As outlined however in the initial work, gene level metrics are intended to be used alongside some version of a variant level predictor (since as emphasized here and in the original publications the two approaches offer independent information). The fair comparison therefore for generating a version of Figure 3 focused on variants would be to use a combination of a variant and gene level metric for all those comparisons like RVIS that are gene level metrics. This idea was outlined in the initial publications under the banner of a combined threshold for both gene level and variant level. I have no doubt that UNEECON would still perform better, but one appropriate simple comparison would be to rerun these analyses including for example a hard threshold on some appropriate variant score such as PP2 alongside the quantitative gene level score such as RVIS as currently used. Finally, the gene level metrics in use are known to struggle with small genes since there is often not enough polymorphism data to infer selection. The authors should address robustness to gene size. ********** Have all data underlying the figures and results presented in the manuscript been provided? Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.
Reviewer #1: Yes Reviewer #2: Yes ********** PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: No |
| Revision 1 |
|
* Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. * Dear Dr Huang, Thank you very much for submitting your Research Article entitled 'Unified inference of missense variant effects and gene constraints in the human genome' to PLOS Genetics. Your manuscript was fully evaluated at the editorial level and by independent peer reviewers. The reviewers appreciated the attention to an important topic but identified some aspects of the manuscript that should be improved. We therefore ask you to modify the manuscript according to the review recommendations before we can consider your manuscript for acceptance. Your revisions should address the specific points made by each reviewer. In addition we ask that you: 1) Provide a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. 2) Upload a Striking Image with a corresponding caption to accompany your manuscript if one is available (either a new image or an existing one from within your manuscript). If this image is judged to be suitable, it may be featured on our website. Images should ideally be high resolution, eye-catching, single panel square images. For examples, please browse our archive. If your image is from someone other than yourself, please ensure that the artist has read and agreed to the terms and conditions of the Creative Commons Attribution License. Note: we cannot publish copyrighted images. We hope to receive your revised manuscript within the next 30 days. If you anticipate any delay in its return, we would ask you to let us know the expected resubmission date by email to plosgenetics@plos.org. 
If present, accompanying reviewer attachments should be included with this email; please notify the journal office if any appear to be missing. They will also be available for download from the link below. You can use this link to log into the system when you are ready to submit a revised version, having first consulted our Submission Checklist. While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please be aware that our data availability policy requires that all numerical data underlying graphs or summary statistics are included with the submission, and you will need to provide this upon resubmission if not already present. In addition, we do not permit the inclusion of phrases such as "data not shown" or "unpublished results" in manuscripts. All points should be backed up by data provided with the submission. PLOS has incorporated Similarity Check, powered by iThenticate, into its journal-wide submission system in order to screen submitted content for originality before publication. Each PLOS journal undertakes screening on a proportion of submitted articles. You will be contacted if needed following the screening process. To resubmit, you will need to go to the link below and 'Revise Submission' in the 'Submissions Needing Revision' folder. [LINK] Please let us know if you have any questions while making these revisions. Yours sincerely, Scott M. 
Williams Section Editor: Natural Variation PLOS Genetics Hua Tang Section Editor: Natural Variation PLOS Genetics Reviewer's Responses to Questions Comments to the Authors: Please note here if the review is uploaded as an attachment. Reviewer #1: The revised version of the paper addresses most of the concerns that I had with the original version of the paper. However, I still remain skeptical of the claim of UNEECON’s “unmatched” performance when it comes to pathogenicity prediction. Although the AUCs are indeed higher for UNEECON in Figs. 3, 4, S3 and S4, performances in the most important region of the ROC curves (the low-false-positive-rate region) tend to be comparable to those of other methods. I suggest toning down such strong claims made with regards to performance of UNEECON in pathogenicity prediction, when compared to other methods. I also would like to follow up on the following statement in the item-by-item response: “Training a version of UNEECON without gnomAD variants in disease genes will disable UNEECON’s ability to learn gene-level constraints in disease genes, leading to an underestimation of UNEECON’s performance.” This gets to the actual motivation behind my comment. If gene-level constraints are that important to UNEECON’s performance, then it is expected that UNEECON will underperform when attempting to predict a pathogenic variant in a gene with no previous known disease association. The gnomAD subset that does not overlap with ClinVar serves as a proxy for such genes as it is quite comprehensive in the coverage of the genome. My original concern was that UNEECON may simply be good at separating disease-associated genes (which, as the author correctly said, are subject to ascertainment bias) from those in gnomAD, and that this was a major driver of variant-level predictive performance. This is somewhat alleviated through the inclusion of Fig.
S4, but a true test of UNEECON’s ability to contribute to novel discoveries is in its ability to make correct variant-level predictions in “undiscovered” disease genes. If an experiment to test this seems infeasible, it would be helpful to clearly state this as a limitation of the model in the Discussion section. Reviewer #2: The authors have done a thorough job of responding to the reviews and I have no further comments. ********** Have all data underlying the figures and results presented in the manuscript been provided? Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information. Reviewer #1: Yes Reviewer #2: Yes ********** PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: No |
| Revision 2 |
|
Dear Dr Huang, We are pleased to inform you that your manuscript entitled "Unified inference of missense variant effects and gene constraints in the human genome" has been editorially accepted for publication in PLOS Genetics. Congratulations! Before your submission can be formally accepted and sent to production you will need to complete our formatting changes, which you will receive in a follow up email. Please be aware that it may take several days for you to receive this email; during this time no action is required by you. Please note: the accept date on your published article will reflect the date of this provisional accept, but your manuscript will not be scheduled for publication until the required changes have been made. Once your paper is formally accepted, an uncorrected proof of your manuscript will be published online ahead of the final version, unless you’ve already opted out via the online submission form. If, for any reason, you do not want an earlier version of your manuscript published online or are unsure if you have already indicated as such, please let the journal staff know immediately at plosgenetics@plos.org. In the meantime, please log into Editorial Manager at https://www.editorialmanager.com/pgenetics/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production and billing process. Note that PLOS requires an ORCID iD for all corresponding authors. Therefore, please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field. This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager. 
If you have a press-related query, or would like to know about one way to make your underlying data available (as you will be aware, this is required for publication), please see the end of this email. If your institution or institutions have a press office, please notify them about your upcoming article at this point, to enable them to help maximise its impact. Inform journal staff as soon as possible if you are preparing a press release for your article and need a publication date. Thank you again for supporting open-access publishing; we are looking forward to publishing your work in PLOS Genetics! Yours sincerely, Scott M. Williams Section Editor: Natural Variation PLOS Genetics Hua Tang Section Editor: Natural Variation PLOS Genetics Twitter: @PLOSGenetics ---------------------------------------------------- Comments from the reviewers (if applicable): ---------------------------------------------------- Data Deposition If you have submitted a Research Article or Front Matter that has associated data that are not suitable for deposition in a subject-specific public repository (such as GenBank or ArrayExpress), one way to make that data available is to deposit it in the Dryad Digital Repository. As you may recall, we ask all authors to agree to make data available; this is one way to achieve that. A full list of recommended repositories can be found on our website. The following link will take you to the Dryad record for your article, so you won't have to re‐enter its bibliographic information, and can upload your files directly: http://datadryad.org/submit?journalID=pgenetics&manu=PGENETICS-D-19-01659R2 More information about depositing data in Dryad is available at http://www.datadryad.org/depositing. If you experience any difficulties in submitting your data, please contact help@datadryad.org for support. 
Additionally, please be aware that our data availability policy requires that all numerical data underlying display items are included with the submission, and you will need to provide this before we can formally accept your manuscript, if not already present. ---------------------------------------------------- Press Queries If you or your institution will be preparing press materials for this manuscript, or if you need to know your paper's publication date for media purposes, please inform the journal staff as soon as possible so that your submission can be scheduled accordingly. Your manuscript will remain under a strict press embargo until the publication date and time. This means an early version of your manuscript will not be published ahead of your final version. PLOS Genetics may also choose to issue a press release for your article. If there's anything the journal should know or you'd like more information, please get in touch via plosgenetics@plos.org. |
| Formally Accepted |
|
PGENETICS-D-19-01659R2 Unified inference of missense variant effects and gene constraints in the human genome Dear Dr Huang, We are pleased to inform you that your manuscript entitled "Unified inference of missense variant effects and gene constraints in the human genome" has been formally accepted for publication in PLOS Genetics! Your manuscript is now with our production department and you will be notified of the publication date in due course. The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript. Soon after your final files are uploaded, unless you have opted out or your manuscript is a front-matter piece, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers. Thank you again for supporting PLOS Genetics and open-access publishing. We are looking forward to publishing your work! With kind regards, Matt Lyles PLOS Genetics On behalf of: The PLOS Genetics Team Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom plosgenetics@plos.org | +44 (0) 1223-442823 plosgenetics.org | Twitter: @PLOSGenetics |
Open letter on the publication of peer review reports
PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.
We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.
Learn more at ASAPbio.