Correlation-based tests for the formal comparison of polygenic scores in multiple populations

Sophia Gunn; Kathryn L. Lunetta

doi:10.1371/journal.pgen.1011249

Peer Review History

Original Submission: October 24, 2023
7 Dec 2023 Decision Letter - Michael P. Epstein, Editor, Xiang Zhou, Editor

Dear Dr Gunn,

Thank you very much for submitting your Research Article entitled 'Correlation-Based Tests for the Formal Comparison of Polygenic Scores in Multiple Populations' to PLOS Genetics. The manuscript was fully evaluated at the editorial level and by independent peer reviewers. Based on the reviews, we will not be able to accept this version of the manuscript, but we would be willing to review a much-revised version (although we cannot promise publication at that time).

The reviewers appreciated the attention to an important problem, but raised some substantial concerns about the current manuscript. In particular, multiple reviewers felt the overall writing of the manuscript could be enhanced and that the work itself needs to be more self-contained. Should you decide to revise the manuscript for further consideration here, your revisions should address the specific points made by each reviewer. In addition, substantial improvements are needed in the manuscript's writing to enhance clarity, enabling readers to better understand the methods, simulations, and results. We will also require a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript.

If you decide to revise the manuscript for further consideration at PLOS Genetics, please aim to resubmit within the next 60 days, unless it will take extra time to address the concerns of the reviewers, in which case we would appreciate an expected resubmission date by email to plosgenetics@plos.org.

If present, accompanying reviewer attachments are included with this email; please notify the journal office if any appear to be missing. They will also be available for download from the link below. You can use this link to log into the system when you are ready to submit a revised version, having first consulted our Submission Checklist.
To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Please be aware that our data availability policy requires that all numerical data underlying graphs or summary statistics are included with the submission, and you will need to provide this upon resubmission if not already present. In addition, we do not permit the inclusion of phrases such as "data not shown" or "unpublished results" in manuscripts. All points should be backed up by data provided with the submission.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

PLOS has incorporated Similarity Check, powered by iThenticate, into its journal-wide submission system in order to screen submitted content for originality before publication. Each PLOS journal undertakes screening on a proportion of submitted articles. You will be contacted if needed following the screening process.

To resubmit, use the link below and 'Revise Submission' in the 'Submissions Needing Revision' folder.

We are sorry that we cannot be more positive about your manuscript at this stage. Please do not hesitate to contact us if you have any concerns or questions.

Yours sincerely,

Xiang Zhou, Ph.D.
Academic Editor, PLOS Genetics
Michael Epstein
Section Editor, PLOS Genetics

Reviewer's Responses to Questions

Comments to the Authors: Please note here if the review is uploaded as an attachment.

Reviewer #1: Correlation-Based Tests for the Formal Comparison of Polygenic Scores in Multiple Populations (Gunn and Lunetta)

This study introduces correlation-based methods to assess the performance of polygenic scores (PGS), building upon prior work that developed a statistical framework and robust test statistics for comparing multiple correlation measures across diverse populations. The adaptable framework presented here can be extended to a broader range of hypothesis tests than existing methods. The authors validate the proposed methods through simulations and illustrate their utility with two examples: evaluating previously developed PGS for low-density lipoprotein cholesterol and height across multiple populations in the All of Us cohort. Additionally, the authors have created an R package called 'coranova' featuring both parametric and non-parametric implementations of the described methods. This study is interesting; however, there are comments and questions that the authors should carefully consider.

1. This study employs a correlation-based test as a fundamental aspect of its methodology. It may be essential for the authors to explicitly articulate the distinction between correlation-based and R^2-based tests, as these are two distinct test statistics with unique properties. I strongly recommend that the authors delve deeper into this comparison, specifically exploring their performance in non-nested model comparisons. For instance, a valuable approach would be to compare the performance of correlation-based and R^2-based tests using Vuong's test as a benchmark for non-nested model comparisons (Vuong, Quang H. (1989). "Likelihood Ratio Tests for Model Selection and Non-Nested Hypotheses". Econometrica 57(2): 307–333). For nested model comparisons, a parallel analysis is recommended, focusing on the performance of correlation-based and R^2-based tests with the likelihood ratio test as a reference. Given the availability of these methods in R packages, conducting such comparisons should be straightforward. I think that the R^2-based test primarily serves to assess model fit, akin to the likelihood ratio test for nested model comparisons and Vuong's test for non-nested model comparisons. However, the correlation-based test is slightly different from the R^2-based and likelihood ratio tests, including Vuong's. To enhance clarity for readers, the authors should explicitly explain this distinction in the Introduction or early in the Results, ensuring a clear understanding of the metrics employed in their study.

2. The current study does not address nested model comparisons. Is there a specific rationale for this omission? I recognize that nested model comparisons can become intricate, especially when multiple predictors are involved. However, such comparisons can readily be applied to simpler pairwise scenarios. Nested model comparison is important because it tests the hypothesis that including an additional predictor significantly improves model fit. It would be beneficial for the authors to consider how this methodology can be applied, particularly when dealing with multiple predictors. Discussing the implications of nested model comparisons in the context of multiple predictors would enhance the completeness and robustness of the study.

3. In the abstract, the phrase 'the comparison of multiple correlation measures in multiple populations, perhaps correlated' may be misleading. The term 'correlated' should be omitted, as individuals across populations are assumed to be independent of each other.

4. The deflated type I error rate of the R^2-based test is evident in Figure 1, particularly when tau is low. This observation aligns with expectations: at low tau, sampling error causes sign switches in some replicates for the two sets of estimated taus. Consequently, the discrepancy between two PGSs is smaller for the R^2-based metric than for the correlation-based test. With a larger sample size, the deflation of the type I error rate tends to be mitigated; to substantiate this, the authors may consider verification with a substantial sample size, such as n = 10,000. Additionally, the test outcome depends on the formulated hypothesis. Specifically, the hypothesis should assert no difference between the two sets of R^2 when the direction of association is consistent between the two models. Where the direction of association differs, the R^2 for a negative regression coefficient should enter the difference with a negative sign. The authors are encouraged to consider and address this nuance in their analysis.

5. In Figure 4, I suggest the authors examine why the rejection rate of the 'between' test decreases as phi increases. Additionally, it would be beneficial to discuss why the rejection rates of the tests for the other categories increase with increasing phi, unless such explanations are already provided in the text.

6. Figures 5 and 7 could be significantly enhanced by emphasizing the hypothesis related to multiple PGSs across populations (as described in section 3.2). It is also essential to improve the comprehensibility of the legends of these figures.

7. When comparing European and African datasets using the function on the GitHub page (perform_coranova_parametric(list(afr, eur), "pheno", c("pgs1", "pgs2", "pgs3"))), the authors conducted a non-nested model comparison between two independent ancestries, where the sample size can differ between the two populations. Specifically, the authors compared the following models:
• Model 1: afr$Pheno ~ afr$pgs1 + afr$pgs2 + afr$pgs3
• Model 2: eur$Pheno ~ eur$pgs1 + eur$pgs2 + eur$pgs3
Is my understanding correct? It is also difficult to discern how the number of PGSs may vary across populations. Additionally, the process may not be very user-friendly, especially if the column headers differ across data frames. I recommend that the authors develop a user-friendly manual and interface for these functions.

Reviewer #2: Dr. Gunn and colleagues have introduced a correlation-based approach to test for differences in polygenic scores (PGS) across multiple populations. Utilizing the asymptotic covariance results for correlations from Olkin & Finn, they have developed a test framework and applied it to data from the All of Us project, assessing traits such as LDL and height. While the framework is compelling, there are aspects of the manuscript that could be refined for clarity and precision. I suggest the following revisions:

1. On page 3, section 3.1, the definition of $\mu$ as "the vector of population values" should be clarified to represent the population mean of the sample correlations. Moreover, the hypothesis-testing section should be made more formally precise by presenting the null hypothesis without the term "mean". If $u$ is defined within a specific population $j$, why not define $u_j = (r_{1,j}, \ldots, r_{P,j})$ and $\mu_j = (\rho_{1,j}, \ldots, \rho_{P,j})$? Then the "between" test null hypothesis can be written as $H_0: \mu_1 = \cdots = \mu_K$. Please update your hypothesis-testing section to make this clear.

2. Figure 1 mentions a parameter $\delta$ that is absent from the figure. The figure should be updated to include this parameter, or the text should be amended to reflect its contents accurately. Additionally, the figure legend needs more detailed explanation, especially regarding the null hypothesis tested. The same applies to Figure 2, where the legend should provide sufficient context for the results presented.

3. An explanation is needed for why the R^2-based test shows much less power when the correlation between PGS and outcome is low (e.g., tau = 0.05).

4. Figure 2 compares the power of Coranova and R2 Redux. However, Figure 2 is referenced in the following sentence, which focuses on type I error: "We find that our correlation-based method has well controlled type I error when assessing differences in two polygenic scores in a single population (figure 1 and 2)". It would be helpful to describe the results of Figure 2 separately.

5. I am a little confused by the setting of Figure 4. Why conduct the "interaction" and "within (difference across scores)" tests in a setting designed to test the "between" hypothesis?

6. Additional clarification is needed for the results depicted in Figure 4, where the power for detecting differences across PGS is higher when the PGS are more correlated with the outcome (high tau). This seems unintuitive, since one would expect differences to be harder to discern when all PGSs are highly correlated with the outcome.

7. The manuscript would benefit from a more detailed description of the simulation settings. Each setting ("between," "within," "interaction") should have clearly defined parameters against which the null hypothesis is tested. The current descriptions in the figure legends are brief.

8. In the third paragraph of the Discussion, I am confused by the following sentence: "12 PGS do not have the same correlation with LDLD cholesterol, mean PGS correlation with LDL does not differ by 1KG genetic-similarity group, and the pattern of correlation between the PGS correlation between the PGS and LDL cholesterol differs by genetic similarity group". It would be helpful if you could clarify it.

9. The modeling of correlations and the corresponding covariance matrix in your framework shows admirable flexibility, particularly the potential to use various contrast matrices A for hypothesis testing. The use of a global test to discern differences in the correlation of PRS across ancestries is insightful. However, the robustness of the approach may be challenged when applied to scenarios involving more than two PRS methods, as is common in practice. In such cases, the global test may lose power because of a larger degrees-of-freedom penalty. It would be compelling to see the framework applied to these more complex scenarios. Specifically, it would be valuable if the framework could address detailed questions in a multi-ancestry context, such as: 1) Are there performance differences across ancestries for a given PRS method? 2) Within each ancestry, which PRS method provides the best prediction performance? 3) Across all ancestries, which PRS method is the most effective?

10. When two PGSs are derived from different sources, they may show inverse correlations of similar magnitude because different effect alleles were used. I am curious whether the Coranova package can account for such scenarios, which are not typically an issue in R^2 evaluations but are pertinent to correlation analyses.

11. It would be interesting to see an example wi

https://doi.org/10.1371/journal.pgen.1011249.r001
Revision 1
5 Feb 2024 Author Response

Attachment: Submitted filename: response_to_reviewers.pdf

https://doi.org/10.1371/journal.pgen.1011249.r002
22 Feb 2024 Decision Letter - Michael P. Epstein, Editor, Xiang Zhou, Editor

Dear Dr Gunn,

Thank you very much for submitting your Research Article entitled 'Correlation-Based Tests for the Formal Comparison of Polygenic Scores in Multiple Populations' to PLOS Genetics. The manuscript was fully evaluated at the editorial level and by independent peer reviewers. There are a few remaining comments from the reviewers that we ask you to address in a revised manuscript.

We therefore ask you to modify the manuscript according to the review recommendations. Your revisions should address the specific points made by each reviewer. In addition, we ask that you:

1) Provide a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript.

2) Upload a Striking Image with a corresponding caption to accompany your manuscript if one is available (either a new image or an existing one from within your manuscript). If this image is judged to be suitable, it may be featured on our website. Images should ideally be high-resolution, eye-catching, single-panel, square images. For examples, please browse our archive. If your image is from someone other than yourself, please ensure that the artist has read and agreed to the terms and conditions of the Creative Commons Attribution License. Note: we cannot publish copyrighted images.

We hope to receive your revised manuscript within the next 30 days. If you anticipate any delay in its return, we would ask you to let us know the expected resubmission date by email to plosgenetics@plos.org.

If present, accompanying reviewer attachments should be included with this email; please notify the journal office if any appear to be missing. They will also be available for download from the link below. You can use this link to log into the system when you are ready to submit a revised version, having first consulted our Submission Checklist.
While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Please be aware that our data availability policy requires that all numerical data underlying graphs or summary statistics are included with the submission, and you will need to provide this upon resubmission if not already present. In addition, we do not permit the inclusion of phrases such as "data not shown" or "unpublished results" in manuscripts. All points should be backed up by data provided with the submission.

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article's retracted status in the References list and also include a citation and full reference for the retraction notice.
PLOS has incorporated Similarity Check, powered by iThenticate, into its journal-wide submission system in order to screen submitted content for originality before publication. Each PLOS journal undertakes screening on a proportion of submitted articles. You will be contacted if needed following the screening process.

To resubmit, you will need to go to the link below and 'Revise Submission' in the 'Submissions Needing Revision' folder.

Please let us know if you have any questions while making these revisions.

Yours sincerely,

Xiang Zhou, Ph.D.
Academic Editor, PLOS Genetics
Michael Epstein
Section Editor, PLOS Genetics

Reviewer's Responses to Questions

Comments to the Authors: Please note here if the review is uploaded as an attachment.

Reviewer #1: In the context of a polygenic score and its associated outcome, the correlation is typically expected to be positive. However, there can be instances where a negative correlation arises due to factors such as sampling error or a negative association between estimated SNP effects and the outcome. Please revise 'always' to 'typically' at lines 83 and 441. Other than that, I have no further comments.

Reviewer #2: The authors have fully addressed my comments.

Reviewer #3: The authors have addressed my major concerns. I have the following comments.

1. In the Introduction: "...this correlation should always be positive, ranging from 0 (the PGS is completely independent of its outcome), to 1 (the PGS is perfectly linearly associated with its outcome)." This statement requires more justification.

2. In the response to the reviewers' report, please see the response to comment 3 from Reviewer 3. In the portion where the authors describe the contrast matrix using matrix algebra, the columns of the first contrast matrix are not linearly independent: we can add the first and second columns to get the last column. Or did the authors mean the row vectors? The three columns cannot be linearly independent, because $\mathbb{R}^2$ cannot contain more than two linearly independent vectors.

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes
Reviewer #2: None
Reviewer #3: Yes

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose "no", your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: Yes: Haoyu Zhang
Reviewer #3: No

https://doi.org/10.1371/journal.pgen.1011249.r003
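Reviewer #3's second comment can be checked numerically. A contrast matrix with two rows and three columns has its three columns in $\mathbb{R}^2$, so those columns are always linearly dependent; for a Wald-type test, what matters is that the rows are linearly independent (full row rank). The Python sketch below uses a hypothetical 2 × 3 contrast matrix comparing three scores in one population; it is an illustration only and not the matrix from the authors' response letter, which is not reproduced here.

```python
import numpy as np

# Hypothetical 2 x 3 contrast matrix for three scores in one population:
# row 1 compares score 1 with score 2, row 2 compares score 2 with score 3.
A = np.array([[1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0]])

n_rows, n_cols = A.shape
rank = int(np.linalg.matrix_rank(A))

# Three columns living in R^2 can never be linearly independent...
columns_independent = rank == n_cols
# ...but the two rows are, so A has full row rank and defines two contrasts.
rows_independent = rank == n_rows

# Each row sums to zero, so the columns sum to the zero vector:
# any one column equals minus the sum of the other two.
column_sum = A @ np.ones(n_cols)

print(rank, columns_independent, rows_independent, column_sum)
```

Here the rank is 2, so the rows are independent while the columns are not. Because every row of a contrast sums to zero, the columns always satisfy a linear relation of the kind the reviewer describes; the row rank is what fixes the degrees of freedom of the resulting test.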
Revision 2
31 Mar 2024 Author Response

Attachment: Submitted filename: response_to_reviewers_v2.pdf

https://doi.org/10.1371/journal.pgen.1011249.r004
3 Apr 2024 Decision Letter - Michael P. Epstein, Editor, Xiang Zhou, Editor

Dear Dr Gunn,

We are pleased to inform you that your manuscript entitled "Correlation-Based Tests for the Formal Comparison of Polygenic Scores in Multiple Populations" has been editorially accepted for publication in PLOS Genetics. Congratulations!

Before your submission can be formally accepted and sent to production you will need to complete our formatting changes, which you will receive in a follow up email. Please be aware that it may take several days for you to receive this email; during this time no action is required by you. Please note: the accept date on your published article will reflect the date of this provisional acceptance, but your manuscript will not be scheduled for publication until the required changes have been made.

Once your paper is formally accepted, an uncorrected proof of your manuscript will be published online ahead of the final version, unless you've already opted out via the online submission form. If, for any reason, you do not want an earlier version of your manuscript published online or are unsure if you have already indicated as such, please let the journal staff know immediately at plosgenetics@plos.org.

In the meantime, please log into Editorial Manager at https://www.editorialmanager.com/pgenetics/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production and billing process. Note that PLOS requires an ORCID iD for all corresponding authors. Therefore, please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to 'Update My Information' (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field. This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager.
If you have a press-related query, or would like to know about making your underlying data available (as you will be aware, this is required for publication), please see the end of this email. If your institution or institutions have a press office, please notify them about your upcoming article at this point, to enable them to help maximise its impact. Inform journal staff as soon as possible if you are preparing a press release for your article and need a publication date.

Thank you again for supporting open-access publishing; we are looking forward to publishing your work in PLOS Genetics!

Yours sincerely,

Xiang Zhou, Ph.D.
Academic Editor, PLOS Genetics
Michael Epstein
Section Editor, PLOS Genetics
www.plosgenetics.org
Twitter: @PLOSGenetics

----------------------------------------------------
Comments from the reviewers (if applicable):
----------------------------------------------------

Data Deposition

If you have submitted a Research Article or Front Matter that has associated data that are not suitable for deposition in a subject-specific public repository (such as GenBank or ArrayExpress), one way to make that data available is to deposit it in the Dryad Digital Repository. As you may recall, we ask all authors to agree to make data available; this is one way to achieve that. A full list of recommended repositories can be found on our website.

The following link will take you to the Dryad record for your article, so you won't have to re-enter its bibliographic information, and can upload your files directly: http://datadryad.org/submit?journalID=pgenetics&manu=PGENETICS-D-23-01188R2

More information about depositing data in Dryad is available at http://www.datadryad.org/depositing. If you experience any difficulties in submitting your data, please contact help@datadryad.org for support.
Additionally, please be aware that our data availability policy requires that all numerical data underlying display items are included with the submission, and you will need to provide this before we can formally accept your manuscript, if not already present.

----------------------------------------------------
Press Queries

If you or your institution will be preparing press materials for this manuscript, or if you need to know your paper's publication date for media purposes, please inform the journal staff as soon as possible so that your submission can be scheduled accordingly. Your manuscript will remain under a strict press embargo until the publication date and time. This means an early version of your manuscript will not be published ahead of your final version. PLOS Genetics may also choose to issue a press release for your article. If there's anything the journal should know or you'd like more information, please get in touch via plosgenetics@plos.org.

https://doi.org/10.1371/journal.pgen.1011249.r005
Formally Accepted
18 Apr 2024 Acceptance Letter - Michael P. Epstein, Editor, Xiang Zhou, Editor

PGENETICS-D-23-01188R2

Correlation-Based Tests for the Formal Comparison of Polygenic Scores in Multiple Populations

Dear Dr Gunn,

We are pleased to inform you that your manuscript entitled "Correlation-Based Tests for the Formal Comparison of Polygenic Scores in Multiple Populations" has been formally accepted for publication in PLOS Genetics! Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out or your manuscript is a front-matter piece, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Genetics and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Livia Horvath
PLOS Genetics

On behalf of:
The PLOS Genetics Team
Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom
plosgenetics@plos.org | +44 (0) 1223-442823
plosgenetics.org | Twitter: @PLOSGenetics

https://doi.org/10.1371/journal.pgen.1011249.r006

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.