Peer Review History

Original Submission: July 6, 2019
Decision Letter - Joseph Devaney, Editor

PONE-D-19-19050

Assessing the performance of genome-wide association studies for predicting disease risk

PLOS ONE

Dear Dr. David,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please address the comments from the reviewer. In addition, please address the reviewer's point about making the program available to the general research community.

We would appreciate receiving your revised manuscript by Oct 13 2019 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.
  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.
  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Joseph Devaney

Academic Editor

PLOS ONE

Journal Requirements:

1. When submitting your revision, we need you to address these additional requirements.

Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The manuscript presents a statistical method and an R package for approximating the AUC of reported SNPs when only the summary level data of the validation dataset are available. The manuscript is well written, and it is easy to follow.

I have the following major concerns.

-----------------------------------------

On pages 7 and 8, it seems the authors first simulate the SNP profile and then assign the healthy/disease statuses. Shouldn’t the SNP profile be simulated given the healthy/disease statuses? A detailed explanation is needed for this part.

-----------------------------------------

On page 10, what is the variance inflation factor? How was it used to determine if there is multicollinearity?

-----------------------------------------

On page 14, what are odd/even SNPs?

-----------------------------------------

On page 15, the authors claimed that “This was important because the SNP inheritance model in published GWA studies may vary from SNP to SNP, and is not typically disclosed. Thus, for each of the 7 WTCCC disease studies we calculated the true AUROC under these four SNP inheritance schemes.” I do not see the point of having four schemes when using logistic and ridge regression. The scheme is automatically handled when the model is being fitted. Why do we need to restrict the inheritance scheme explicitly?

-----------------------------------------

In Table 2, some AUROCs are close to 0.5. Does this mean the reported SNPs contribute little or nothing to the classification accuracy? Is there any explanation for why SNPs with low p-values have little discriminative power?

-----------------------------------------

On page 18, there is a comparison of AUROCs between GWAS-derived vs. non-GWAS-derived risk factors. How were those AUROCs of non-GWAS-derived risk factors calculated? Did they use the same logistic or ridge regression models? If not, the comparison may not be fair. The difference may be caused by the models not the risk factors.

-----------------------------------------

Will the authors release the developed tool as open source? The impact will be limited if the research community cannot verify and apply the computational method.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

Revision 1

Sept. 29, 2019

Dear Dr. Devaney,

Please find attached our revised manuscript (PONE-D-19-19050) entitled: “Assessing the performance of genome-wide association studies for predicting disease risk” by Jonas Patron, Arnau Serra-Cayuela, Beomsoo Han, Carin Li and myself. We wish to thank the reviewer for their comments and suggestions. We have tried our best to address all the reviewer’s comments by modifying the paper and have provided detailed responses or described our modifications to the manuscript in the attached pages. We have also produced both a clean version and marked-up version of the manuscript with all changes indicated in red so that the edits or modifications can be more easily seen.

For Reviewer #1:

1. On pages 7 and 8, it seems the authors first simulate the SNP profile and then assign the healthy/disease statuses. Shouldn’t the SNP profile be simulated given the healthy/disease statuses? A detailed explanation is needed for this part.

Response: We appreciate the reviewer’s concerns and apologize for the confusion. Indeed, we simulate the SNP profiles given the healthy/disease status of the cohort. Note from equations (1)-(4) on pages 7 and 8 that we start by determining or inputting the number of cases and controls. That is, we always start by knowing the healthy/disease status of the individuals in the cohort; we just don’t know the SNP profiles of those individuals. Next, knowing the risk allele frequency in the controls and the odds ratio between the cases and controls, we calculate the risk allele frequency in the cases. Once we know the risk allele frequency in both the case and control groups, we assign the SNP profiles to each group accordingly. We have added portions of this text to the manuscript to explain the process in more detail.
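
The order of operations described in this response can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GWIZ code: it assumes the reported odds ratio is an allelic odds ratio, that the two alleles of each individual are drawn independently (Hardy-Weinberg proportions within each group), and the function names `case_allele_freq` and `simulate_genotypes` are hypothetical. The input values are those of the rs7903146 row of Table R1 in this letter.

```python
import random

def case_allele_freq(control_freq, odds_ratio):
    """Risk allele frequency in cases implied by the control frequency
    and an allelic odds ratio (assumption: the OR compares allele odds)."""
    odds_cases = odds_ratio * control_freq / (1.0 - control_freq)
    return odds_cases / (1.0 + odds_cases)

def simulate_genotypes(n, allele_freq, rng):
    """Draw n genotypes as risk-allele counts (0, 1 or 2), sampling the
    two alleles independently (Hardy-Weinberg assumption)."""
    return [(rng.random() < allele_freq) + (rng.random() < allele_freq)
            for _ in range(n)]

# Cohort sizes, control frequency and odds ratio for rs7903146 (Table R1).
n_controls, n_cases = 2598, 2309
control_freq, odds_ratio = 0.30, 1.36

rng = random.Random(0)  # fixed seed so the sketch is reproducible
case_freq = case_allele_freq(control_freq, odds_ratio)

# Disease status is fixed first; SNP profiles are then drawn per group.
controls = simulate_genotypes(n_controls, control_freq, rng)
cases = simulate_genotypes(n_cases, case_freq, rng)

print(f"risk allele frequency in cases: {case_freq:.4f}")
```

The point of the sketch is the ordering: the case and control group sizes are inputs, and the genotypes are generated afterwards from group-specific allele frequencies.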

2. On page 10, what is the variance inflation factor? How was it used to determine if there is multicollinearity?

Response: Given a regression model with multiple predictor variables, the variance inflation factor of a predictor measures how much the variance of its estimated coefficient is inflated by correlation with the other predictors; it is obtained from a model which regresses that predictor against all the others. Multicollinearity was determined to exist when at least two variables showed inflated coefficient variances, as indicated by a variance inflation factor equal to infinity. We also tried other variance inflation factor cutoff values; however, because the differences in the AUC estimates were so small (<0.009) and because standard logistic regression is more easily interpretable than its ridge regression counterpart, we found it appropriate to restrict the use of ridge regression to models with extreme (i.e. divergent) variance inflation factor estimates. We have added portions of this text to the manuscript to explain this concept in more detail.
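
For reference, the textbook variance inflation factor of predictor j is 1 / (1 - R_j^2), where R_j^2 is the coefficient of determination from regressing predictor j on the remaining predictors. The sketch below is a deliberately reduced illustration, not the procedure used in the manuscript: it handles only two predictors (where R_j^2 equals the squared Pearson correlation), the genotype vectors are made up, and the names `pearson_r` and `vif_two_predictors` are hypothetical. It shows how an exactly duplicated predictor yields a divergent (infinite) value.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2, tol=1e-12):
    """VIF = 1 / (1 - R^2). With only two predictors, R^2 is the squared
    correlation between them. Returns infinity for exact collinearity."""
    r2 = pearson_r(x1, x2) ** 2
    if 1.0 - r2 <= tol:
        return math.inf
    return 1.0 / (1.0 - r2)

# Made-up dummy-coded SNPs for eight individuals (illustrative only).
snp_a = [0, 1, 1, 0, 1, 0, 0, 1]
snp_b = [0, 1, 0, 0, 1, 1, 0, 1]  # partly correlated with snp_a
snp_c = list(snp_a)               # exact copy of snp_a

vif_ab = vif_two_predictors(snp_a, snp_b)
vif_ac = vif_two_predictors(snp_a, snp_c)
print(vif_ab, vif_ac)
```

With more than two predictors, R_j^2 would come from a full multiple regression of predictor j on all the others rather than from a single correlation.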

3. On page 14, what are odd/even SNPs?

Response: The characterization of SNPs as odd or even simply refers to their position in a list: those found at odd-numbered positions are “odd SNPs” and those found at even-numbered positions are “even SNPs”. It is a simple scheme to split the SNPs in an approximately 50-50 fashion. Consider Table R1 (see below), which lists five SNPs. The 1st, 3rd, and 5th SNPs would be the odd SNPs and the 2nd and 4th SNPs would be the even SNPs.

Table R1

Accession    Control size   Case size   Risk allele freq   OR     Odd/Even
rs7903146    2598           2309        0.3                1.36   odd
rs5219       2598           2309        0.36               1.25   even
rs10811661   2598           2309        0.85               1.21   odd
rs1801282    2598           2309        0.87               1.21   even
rs2641348    2598           2309        0.11               1.15   odd
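
The odd/even split of Table R1 amounts to taking alternating entries of the list. A minimal Python sketch follows; the variable names are illustrative and positions are counted from 1, as in the response.

```python
# SNP accessions in the order they appear in Table R1.
snps = ["rs7903146", "rs5219", "rs10811661", "rs1801282", "rs2641348"]

# Positions are counted from 1, so Python index 0 is the 1st (odd) SNP.
odd_snps = snps[0::2]   # 1st, 3rd, 5th, ...
even_snps = snps[1::2]  # 2nd, 4th, ...

print("odd: ", odd_snps)
print("even:", even_snps)
```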

4. On page 15, the authors claimed that “This was important because the SNP inheritance model in published GWA studies may vary from SNP to SNP, and is not typically disclosed. Thus, for each of the 7 WTCCC disease studies we calculated the true AUROC under these four SNP inheritance schemes.” I do not see the point of having four schemes when using logistic and ridge regression. The scheme is automatically handled when the model is being fitted. Why do we need to restrict the inheritance scheme explicitly?

Response: This is an excellent point. What we were doing was conducting two different and independent modeling steps. In the first step, we were trying to calculate the true AUROC of a given WTCCC disease study. To do this we had to convert the raw GWAS data into dummy variables to which we could apply logistic regression. Take Table R2 (below) for example. In this example, there is one individual and we have information about four of their SNPs. To run a regression analysis, the information in columns “1st Allele” and “2nd Allele” must be converted to a single number (either a 0 or a 1) known as the dummy variable. Then, in the second step, we would run our logistic/ridge regression to estimate the AUROC (in this step the choice between logistic or ridge regression would be handled automatically). Note that in this simple example a change in the choice of the SNP inheritance scheme produces three distinct results: (1,1,0,1), (0,1,0,0), and (1,1,0,0). As a result, when we run our standard regression techniques we could expect to produce 3 different AUROCs, all of which can be interpreted as the “true” AUROC.

Table R2

SNP   Risk Allele   1st Allele   2nd Allele   Dominant Inheritance   Recessive Inheritance   Dominant (odd) – Recessive (even)
#1    A             T            A            1                      0                       1
#2    A             A            A            1                      1                       1
#3    T             A            A            0                      0                       0
#4    C             C            G            1                      0                       0

As such, to rigorously test GWIZ’s AUROC predictions, we had to ensure that we could accurately predict the AUROC which resulted from any one of these inheritance schemes. In our test set we could restrict the inheritance scheme explicitly; in practice, however, given a set of odds ratios, we cannot necessarily know which inheritance scheme produced them.
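
The dummy coding in Table R2 can be reproduced with a short Python sketch. It covers only the three schemes shown in the table, and the function names (`dominant`, `recessive`, `encode`) and the scheme label `"mixed"` are hypothetical choices made for this illustration rather than anything taken from GWIZ.

```python
def dominant(risk, a1, a2):
    """1 if at least one of the two alleles is the risk allele."""
    return int(risk in (a1, a2))

def recessive(risk, a1, a2):
    """1 only if both alleles are the risk allele."""
    return int(a1 == risk and a2 == risk)

def encode(snps, scheme):
    """Convert (risk allele, 1st allele, 2nd allele) triples into 0/1
    dummy variables under the chosen inheritance scheme."""
    coded = []
    for position, (risk, a1, a2) in enumerate(snps, start=1):
        if scheme == "dominant":
            rule = dominant
        elif scheme == "recessive":
            rule = recessive
        elif scheme == "mixed":  # dominant at odd positions, recessive at even
            rule = dominant if position % 2 == 1 else recessive
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        coded.append(rule(risk, a1, a2))
    return tuple(coded)

# (risk allele, 1st allele, 2nd allele) for SNPs #1 to #4 in Table R2.
table_r2 = [("A", "T", "A"), ("A", "A", "A"), ("T", "A", "A"), ("C", "C", "G")]

for scheme in ("dominant", "recessive", "mixed"):
    print(scheme, encode(table_r2, scheme))
# dominant  -> (1, 1, 0, 1)
# recessive -> (0, 1, 0, 0)
# mixed     -> (1, 1, 0, 0)
```

The same individual therefore enters the regression with three different predictor vectors, which is why each scheme can yield its own “true” AUROC.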

5. In Table 2, some AUROCs are close to 0.5. Does this mean the reported SNPs contribute little or nothing to the classification accuracy? Is there any explanation for why SNPs with low p-values have little discriminative power?

Response: With regard to the first question, yes, when the AUROCs are close to 0.5 it does indicate that the reported SNPs contribute little or nothing to the classification accuracy. With regard to the second question, the size of the p-value and its influence on the AUROC is primarily affected by the associated SNP’s odds ratio and allele frequency. In other words, low p-values can arise from statistically significant effects of small magnitude. In GWA studies, the ‘effect’ we are looking for is a difference in allele frequencies between case and control groups, and the magnitude of that effect is measured by the odds ratio. A SNP with a p-value of 10^-80 but with an odds ratio close to 1 (which is typical of most GWAS SNPs) will have a very small effect on the AUROC. A low odds ratio often arises from a very low SNP/allele frequency. A SNP with an odds ratio of 3 but a p-value of just 10^-10 will have a large effect on the AUROC.
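
The dependence of the AUROC on the odds ratio rather than on the p-value can be illustrated for a single SNP. The sketch below is not the GWIZ method: it assumes an allelic odds ratio, Hardy-Weinberg genotype proportions within cases and within controls, and a classifier that simply scores each person by their risk-allele count; the function names are hypothetical. Note that no sample size appears anywhere in the calculation, whereas a p-value shrinks as the cohort grows.

```python
def case_allele_freq(control_freq, odds_ratio):
    """Case risk allele frequency implied by an allelic odds ratio."""
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    return odds / (1.0 + odds)

def genotype_probs(p):
    """Hardy-Weinberg probabilities of carrying 0, 1 or 2 risk alleles."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]

def single_snp_auroc(control_freq, odds_ratio):
    """Probability that a random case carries more risk alleles than a
    random control, counting ties as one half."""
    ctrl = genotype_probs(control_freq)
    case = genotype_probs(case_allele_freq(control_freq, odds_ratio))
    auc = 0.0
    for g_case, p_case in enumerate(case):
        for g_ctrl, p_ctrl in enumerate(ctrl):
            if g_case > g_ctrl:
                auc += p_case * p_ctrl
            elif g_case == g_ctrl:
                auc += 0.5 * p_case * p_ctrl
    return auc

# Same control allele frequency, two illustrative odds ratios.
weak = single_snp_auroc(0.30, 1.05)
strong = single_snp_auroc(0.30, 3.0)
print(f"OR 1.05: AUROC = {weak:.3f}")
print(f"OR 3.00: AUROC = {strong:.3f}")
```

Under these assumptions an odds ratio near 1 leaves the AUROC close to 0.5 regardless of how small the p-value is, while an odds ratio of 3 moves it well above 0.5.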

6. On page 18, there is a comparison of AUROCs between GWAS-derived vs. non-GWAS-derived risk factors. How were those AUROCs of non-GWAS-derived risk factors calculated? Did they use the same logistic or ridge regression models? If not, the comparison may not be fair. The difference may be caused by the models not the risk factors.

Response: The AUROCs of the non-GWAS derived risk factors were calculated using a mixture of different approaches. About 1/3 of the values quoted in this table used logistic regression while the others used either PLS-DA or random forest methods. It is true that the choice of the classification model can affect the performance and the reported AUROC but generally most biostatisticians will test multiple classification models (logistic regression, PLS-DA, random forest, SVM regression, etc.) and will quote the best performing model. Our own experience in using many different classifiers for many other related projects is that the AUROC differences between classifiers are generally quite small (<5%) although rare exceptions do occur. Overall, we believe the AUROC differences accurately reflect the risk factors and that they are not a (significant) function of the choice of classifier. We have added this information to the manuscript.

7. Will the authors release the developed tool as open source? The impact will be limited if the research community cannot verify and apply the computational method.

Response: The code is freely available for download at https://bitbucket.org/wishartlab/gwiz-rscript/src/master/, and is compatible with a wide variety of UNIX platforms, Mac OS and Windows operating systems. The above link to bitbucket is cited in the original manuscript on page 11; however, to reduce confusion in the future we have also mentioned this link in the abstract.

In addition to these edits we have also made some small changes to the manuscript by adding a few missing references and elaborating on GWIZ’s comparison to other programs. This led to a small update to Figure 3. Once again, we would like to thank the reviewer for their comments and suggestions. We hope that the explanations we have provided are satisfactory and certainly the changes they suggested have significantly improved the overall clarity and quality of the manuscript. We hope these edits are acceptable and are looking forward to seeing the manuscript published in PLOS ONE soon.

Sincerely,

David Wishart

Attachments
Submitted filename: Response-letter-PLOS ONE - DSW.docx
Decision Letter - Joseph Devaney, Editor

Assessing the performance of genome-wide association studies for predicting disease risk

PONE-D-19-19050R1

Dear Dr. David,

We are pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it complies with all outstanding technical requirements.

Within one week, you will receive an e-mail containing information on the amendments required prior to publication. When all required modifications have been addressed, you will receive a formal acceptance letter and your manuscript will proceed to our production department and be scheduled for publication.

Shortly after the formal acceptance letter is sent, an invoice for payment will follow. To ensure an efficient production and billing process, please log into Editorial Manager at https://www.editorialmanager.com/pone/, click the "Update My Information" link at the top of the page, and update your user information. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, you must inform our press team as soon as possible and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

With kind regards,

Joseph Devaney

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

**********

6. Review Comments to the Author

Reviewer #1: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Formally Accepted
Acceptance Letter - Joseph Devaney, Editor

PONE-D-19-19050R1

Assessing the performance of genome-wide association studies for predicting disease risk

Dear Dr. David:

I am pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

For any other questions or concerns, please email plosone@plos.org.

Thank you for submitting your work to PLOS ONE.

With kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Joseph Devaney

Academic Editor

PLOS ONE
