Protein prediction for trait mapping in diverse populations

Ryan Schubert; Elyse Geoffroy; Isabelle Gregga; Ashley J. Mulford; Francois Aguet; Kristin Ardlie; Robert Gerszten; Clary Clish; David Van Den Berg; Kent D. Taylor; Peter Durda; W. Craig Johnson; Elaine Cornell; Xiuqing Guo; Yongmei Liu; Russell Tracy; Matthew Conomos; Tom Blackwell; George Papanicolaou; Tuuli Lappalainen; Anna V. Mikhaylova; Timothy A. Thornton; Michael H. Cho; Christopher R. Gignoux; Leslie Lange; Ethan Lange; Stephen S. Rich; Jerome I. Rotter; NHLBI TOPMed Consortium; Ani Manichaikul; Hae Kyung Im; Heather E. Wheeler

doi:10.1371/journal.pone.0264341

Peer Review History

Original Submission August 17, 2021
19 Oct 2021 Decision Letter - Heming Wang, Editor Transfer Alert This paper was transferred from another journal. As a result, its full editorial history (including decision letters, peer reviews and author responses) may not be present. PONE-D-21-26641 Protein prediction for trait mapping in diverse populations PLOS ONE Dear Dr. Wheeler, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please submit your revised manuscript by Dec 03 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript: A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results.
Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols. We look forward to receiving your revised manuscript. Kind regards, Heming Wang, PhD Academic Editor PLOS ONE Journal Requirements: When submitting your revision, we need you to address these additional requirements. 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf 2. Thank you for stating the following in the Acknowledgments Section of your manuscript: "This work is supported by the NIH National Human Genome Research Institute Academic Research Enhancement Award R15 HG009569 (HEW). MESA and the MESA SHARe project are conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with MESA investigators. Support for MESA is provided by contracts 75N92020D00001, HHSN268201500003I, N01-HC-95159, 75N92020D00005, N01-HC-95160, 75N92020D00002, N01-HC-95161, 75N92020D00003, N01-HC-95162, 75N92020D00006, N01-HC-95163, 75N92020D00004, N01-HC-95164, 75N92020D00007, N01-HC-95165, N01-HC-95166, N01-HC-95167, N01-HC-95168, N01-HC-95169, UL1-TR-000040, UL1-TR-001079, UL1-TR-001420. Funding for SHARe genotyping was provided by NHLBI Contract N02-HL-64278. 
Genotyping was performed at Affymetrix (Santa Clara, California, USA) and the Broad Institute of Harvard and MIT (Boston, Massachusetts, USA) using the Affymetrix Genome-Wide Human SNP Array 6.0. MESA Family is conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with MESA investigators. Support is provided by grants and contracts R01HL071051, R01HL071205, R01HL071250, R01HL071251, R01HL071258, R01HL071259, by the National Center for Research Resources, Grant UL1RR033176. Also supported in part by the National Center for Advancing Translational Sciences, CTSI grant UL1TR001881, and the National Institute of Diabetes and Digestive and Kidney Disease Diabetes Research Center (DRC) grant DK063491 to the Southern California Diabetes Endocrinology Research Center. The TOPMed MESA Multi-Omics project was conducted by the University of Washington and LABioMed (HHSN2682015000031/HHSN26800004). Molecular data for the Trans-Omics in Precision Medicine (TOPMed) program was supported by the National Heart, Lung and Blood Institute (NHLBI). SOMAscan proteomics for NHLBI TOPMed: Multi-Ethnic Study of Atherosclerosis (MESA) (phs001416.v1.p1) was performed at the Broad Institute and Beth Israel Proteomics Platform (HHSN268201600034I). Core support including centralized genomic read mapping and genotype calling, along with variant quality metrics and filtering were provided by the TOPMed Informatics Research Center (3R01HL-117626-02S1; contract HHSN268201800002I). Core support including phenotype harmonization, data management, sample-identity QC, and general program coordination were provided by the TOPMed Data Coordinating Center (R01HL-120393; U01HL-120393; contract HHSN268201800001I). We gratefully acknowledge the studies and participants who provided biological samples and data for TOPMed. 
Participants in the INTERVAL randomised controlled trial were recruited with the active collaboration of NHS Blood and Transplant England (www.nhsbt.nhs.uk), which has supported field work and other elements of the trial. DNA extraction and genotyping was co-funded by the National Institute for Health Research (NIHR), the NIHR BioResource (http://bioresource.nihr.ac.uk) and the NIHR [Cambridge Biomedical Research Centre at the Cambridge University Hospitals NHS Foundation Trust]. The INTERVAL study was funded by NHSBT (11-01-GEN). The academic coordinating centre for INTERVAL was supported by core funding from: NIHR Blood and Transplant Research Unit in Donor Health and Genomics (NIHR BTRU-2014-10024), UK Medical Research Council (MR/L003120/1), British Heart Foundation (SP/09/002; RG/13/13/30194; RG/18/13/33946) and the NIHR [Cambridge Biomedical Research Centre at the Cambridge University Hospitals NHS Foundation Trust]. Proteomic assays were funded by the academic coordinating centre for INTERVAL and MRL, Merck & Co., Inc. A complete list of the investigators and contributors to the INTERVAL trial is provided in Di Angelantonio et al. [34]. The academic coordinating centre would like to thank blood donor centre staff and blood donors for participating in the INTERVAL trial. This work was supported by Health Data Research UK, which is funded by the UK Medical Research Council, Engineering and Physical Sciences Research Council, Economic and Social Research Council, Department of Health and Social Care (England), Chief Scientist Office of the Scottish Government Health and Social Care Directorates, Health and Social Care Research and Development Division (Welsh Government), Public Health Agency (Northern Ireland), British Heart Foundation and Wellcome. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript." We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form. Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows: "This work is supported by the NIH National Human Genome Research Institute Academic Research Enhancement Award R15 HG009569 (HEW). MESA and the MESA SHARe project are conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with MESA investigators. Support for MESA is provided by contracts 75N92020D00001, HHSN268201500003I, N01-HC-95159, 75N92020D00005, N01-HC-95160, 75N92020D00002, N01-HC-95161, 75N92020D00003, N01-HC-95162, 75N92020D00006, N01-HC-95163, 75N92020D00004, N01-HC-95164, 75N92020D00007, N01-HC-95165, N01-HC-95166, N01-HC-95167, N01-HC-95168, N01-HC-95169, UL1-TR-000040, UL1-TR-001079, UL1-TR-001420. Funding for SHARe genotyping was provided by NHLBI Contract N02-HL-64278. Genotyping was performed at Affymetrix (Santa Clara, California, USA) and the Broad Institute of Harvard and MIT (Boston, Massachusetts, USA) using the Affymetrix Genome-Wide Human SNP Array 6.0. MESA Family is conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with MESA investigators. Support is provided by grants and contracts R01HL071051, R01HL071205, R01HL071250, R01HL071251, R01HL071258, R01HL071259, by the National Center for Research Resources, Grant UL1RR033176. 
Also supported in part by the National Center for Advancing Translational Sciences, CTSI grant UL1TR001881, and the National Institute of Diabetes and Digestive and Kidney Disease Diabetes Research Center (DRC) grant DK063491 to the Southern California Diabetes Endocrinology Research Center. The TOPMed MESA Multi-Omics project was conducted by the University of Washington and LABioMed (HHSN2682015000031/HHSN26800004). Molecular data for the Trans-Omics in Precision Medicine (TOPMed) program was supported by the National Heart, Lung and Blood Institute (NHLBI). SOMAscan proteomics for NHLBI TOPMed: Multi-Ethnic Study of Atherosclerosis (MESA) (phs001416.v1.p1) was performed at the Broad Institute and Beth Israel Proteomics Platform (HHSN268201600034I). Core support including centralized genomic read mapping and genotype calling, along with variant quality metrics and filtering were provided by the TOPMed Informatics Research Center (3R01HL-117626-02S1; contract HHSN268201800002I). Core support including phenotype harmonization, data management, sample-identity QC, and general program coordination were provided by the TOPMed Data Coordinating Center (R01HL-120393; U01HL-120393; contract HHSN268201800001I). We gratefully acknowledge the studies and participants who provided biological samples and data for TOPMed. Participants in the INTERVAL randomised controlled trial were recruited with the active collaboration of NHS Blood and Transplant England (www.nhsbt.nhs.uk), which has supported field work and other elements of the trial. DNA extraction and genotyping was co-funded by the National Institute for Health Research (NIHR), the NIHR BioResource (http://bioresource.nihr.ac.uk) and the NIHR [Cambridge Biomedical Research Centre at the Cambridge University Hospitals NHS Foundation Trust]. The INTERVAL study was funded by NHSBT (11-01-GEN). 
The academic coordinating centre for INTERVAL was supported by core funding from: NIHR Blood and Transplant Research Unit in Donor Health and Genomics (NIHR BTRU-2014-10024), UK Medical Research Council (MR/L003120/1), British Heart Foundation (SP/09/002; RG/13/13/30194; RG/18/13/33946) and the NIHR [Cambridge Biomedical Research Centre at the Cambridge University Hospitals NHS Foundation Trust]. Proteomic assays were funded by the academic coordinating centre for INTERVAL and MRL, Merck & Co., Inc. A complete list of the investigators and contributors to the INTERVAL trial is provided in Di Angelantonio et al. The academic coordinating centre would like to thank blood donor centre staff and blood donors for participating in the INTERVAL trial. This work was supported by Health Data Research UK, which is funded by the UK Medical Research Council, Engineering and Physical Sciences Research Council, Economic and Social Research Council, Department of Health and Social Care (England), Chief Scientist Office of the Scottish Government Health and Social Care Directorates, Health and Social Care Research and Development Division (Welsh Government), Public Health Agency (Northern Ireland), British Heart Foundation and Wellcome. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript" Please include your amended statements within your cover letter; we will change the online submission form on your behalf Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. 
Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice. Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes Reviewer #2: Yes ******** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #2: Yes ****** 3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: Yes ****** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous.
Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes ****** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: Review: Protein prediction for trait mapping in diverse populations Overview: Schubert et al present work on predicting protein abundance in the TOPMed data using genotype information into large-scale GWAS across diverse populations. Their work builds conceptually on the TWAS model, which predicts steady state gene expression (mRNA levels) rather than downstream protein abundance. They investigated the predictive performance and transferability of prediction models across four TOPMed MESA populations (African Americans, Chinese, European, and Hispanic/Latino). Particular attention was paid to the performance of fine-mapping or baseline prediction models across populations, and follow-up with out-of-sample performance in the INTERVAL protein study. Next, they applied their prediction models to GWAS data for 28 phenotypes of diverse populations in the PAGE consortium. I find this area to be interesting and useful, however I find the results as presented to provide very little insight and lack clear take-aways for downstream decision making. For example, the authors performed a good deal of analyses to quantify protein prediction accuracy, but fail to provide a clear recommendation on which approach to use (e.g., does having more models matter upfront, or having population matched model/GWAS downstream).
I appreciate the complexity of multiple analyses across diverse populations complicates matters, but simple meta-analysis tools can simplify the big picture and present results in a consistent manner that help the reader understand what approaches work better on average. Similarly, there are a number of analyses that should have been performed to help place findings in context prior to predictive modeling. Overall, I think the data generated in this manuscript to be interesting and valuable to the broader community, but that a simplified presentation could greatly help with communicating primary findings and informing downstream TWAS/PWAS users. I provide more details below. Major Comments: 1. Given that protein prediction models rely on pQTL signals, it would greatly help if the authors also discussed raw pQTL association (and fine-mapping) results across populations before discussing prediction. Understanding functional enrichment of protein regulatory mechanisms is interesting and crucially unexplored in data at this scale. Including these analyses would help provide context for downstream prediction models and shed light on regulatory mechanisms themselves, prior to their relationship with complex disease risk. 2. Similarly, prediction accuracy is inherently tied to heritability. It would be helpful to see how prediction accuracy tracks with in-sample h2g estimates (or out-of-sample INTERVAL R2 with INTERVAL h2g). 3. Can the authors provide some supporting analyses for when genes fail to replicate across populations? It would be interesting to see how avg Fst at a gene/locus tracks with avg cross-pop R2. Reviewer #2: The authors took up an exploratory study that constructed pQTL models using TOPMed MESA cohorts of various ancestries, including African American, Chinese, European, Hispanic/Latino, and cross-population; models were further evaluated and validated using an independent cohort European INTERVAL. 
For each population-specific cohort or cross-population cohort, the authors have also developed a baseline model (which I believe was inclusive of all sequenced or genotyped variants) and a fine-mapped model. In general, fine-mapped models outperformed baseline models in terms of significant pQTL signals/models. Furthermore, the authors used the constructed models to perform PWAS on the PAGE cohort and replicated in the UKB+ data when the testing trait is available in the replication cohort. The authors successfully identified several known associations, for example, HDL-APOE. 1. In line 86-90, the authors stated that they identified 372 protein aptamers distinct to MESA and not found in GTEx Whole Blood models. Can there be false positives? Are these protein aptamers population-specific or from the cross-population model? It would explain the distinctiveness of these aptamers if these were population specific. 2. For table 1, are these all replication of previous findings? Are some of them novel? There was analysis in the result saying that some significant association signals of APOE isoforms went away after adjusted for PAV. Would it be better if this is noted in the table 1 or at least stated in the table 1 legend? 3. Were there any related samples in MESA? Did the authors adjust for relatedness among samples? Minor suggestions or side questions 1. What are the relationships between pQTLs in this study and MESA eQTLs? Are they correlated? 2. Some acronyms, like PIP, were declared more than once. ****** 6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No Reviewer #2: Yes: Binglan Li [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. https://doi.org/10.1371/journal.pone.0264341.r001
Revision 1
16 Dec 2021 Author Response We thank the editor and reviewers for the thorough examination of our manuscript and for providing positive and helpful feedback. We appreciate the opportunity to address reviewer comments here. Our responses are prefaced by RESPONSE: Reviewer #1: Review: Protein prediction for trait mapping in diverse populations Overview: Schubert et al present work on predicting protein abundance in the TOPMed data using genotype information into large-scale GWAS across diverse populations. Their work builds conceptually on the TWAS model, which predicts steady state gene expression (mRNA levels) rather than downstream protein abundance. They investigated the predictive performance and transferability of prediction models across four TOPMed MESA populations (African Americans, Chinese, European, and Hispanic/Latino). Particular attention was paid to the performance of fine-mapping or baseline prediction models across populations, and follow-up with out-of-sample performance in the INTERVAL protein study. Next, they applied their prediction models to GWAS data for 28 phenotypes of diverse populations in the PAGE consortium. I find this area to be interesting and useful, however I find the results as presented to provide very little insight and lack clear take-aways for downstream decision making. For example, the authors performed a good deal of analyses to quantify protein prediction accuracy, but fail to provide a clear recommendation on which approach to use (e.g., does having more models matter upfront, or having population matched model/GWAS downstream). I appreciate the complexity of multiple analyses across diverse populations complicates matters, but simple meta-analysis tools can simplify the big picture and present results in a consistent manner that help the reader understand what approaches work better on average.
Similarly, there are a number of analyses that should have been performed to help place findings in context prior to predictive modeling. Overall, I think the data generated in this manuscript to be interesting and valuable to the broader community, but that a simplified presentation could greatly help with communicating primary findings and informing downstream TWAS/PWAS users. RESPONSE: Thank you for your helpful recommendations. We have updated our results as described in our responses to your Major Comments below and have added clearer recommendations to our Discussion, lines 322-327: “Given the improved cross-population prediction of fine-mapped models (S7 Fig, S5 Table, S6 Table) and similar performance to baseline models in PWAS (Fig 5), we recommend using our fine-mapped models in PWAS. We also recommend population-matching in PWAS when protein model training sample sizes are within the same order of magnitude, as in TOPMed MESA, to maximize PWAS discovery, colocalization, and replication.” I provide more details below. Major Comments: 1. Given that protein prediction models rely on pQTL signals, it would greatly help if the authors also discussed raw pQTL association (and fine-mapping) results across populations before discussing prediction. Understanding functional enrichment of protein regulatory mechanisms is interesting and crucially unexplored in data at this scale. Including these analyses would help provide context for downstream prediction models and shed light on regulatory mechanisms themselves, prior to their relationship with complex disease risk. RESPONSE: Thank you for your suggestion. We have added more details about our cis-pQTL mapping to the beginning of the Results (lines 49-60). We added Table 1, which summarizes pQTL counts (FDR < 0.05) and added all pQTL summary statistics to the zenodo repository with the prediction models. 
We found that effect sizes were enriched near the transcription start site (TSS) for each gene region which mapped to a protein in our sample and that as sample size increased, smaller effect size SNP associations farther from the TSS were discovered (S2 Fig). 2. Similarly, prediction accuracy is inherently tied to heritability. It would be helpful to see how prediction accuracy tracks with in-sample h2g estimates (or out-of-sample INTERVAL R2 with INTERVAL h2g). RESPONSE: We agree, thank you for the suggestion. We estimated heritability of each protein trait using Bayesian Sparse Linear Mixed Modeling (Zhou et al. 2013), as we have done previously for gene expression traits (Wheeler et al. 2016, Mogil et al. 2018). This analysis is added to the Results (lines 113-119): “As the heritability of a trait determines the ceiling for genetic prediction performance, we estimated the proportion variance explained (PVE) by SNPs within 1Mb of each protein encoding gene using Bayesian Sparse Linear Mixed Modeling (BSLMM) [35]. Highly heritable proteins (high PVE) were associated with high predictive performance in INTERVAL across populations, despite larger credible sets surrounding the PVE estimates in the smaller populations, i.e., CHN and AFA (S5 Fig).” We added a description of our BSLMM analysis to the Methods (lines 501-506): “We used the software GEMMA to implement BSLMM for each protein aptamer with 100K sampling steps per aptamer. BSLMM estimates the PVE (the proportion of variance in phenotype explained by the additive genetic model, analogous to h2). From the second half of the sampling iterations for each aptamer, we compared the median and the 95% credible sets of the PVE to model performance in INTERVAL.” 3. Can the authors provide some supporting analyses for when genes fail to replicate across populations? It would be interesting to see how avg Fst at a gene/locus tracks with avg cross-pop R2. RESPONSE: Thank you for the suggestion.
We have taken your advice and believe our results make our paper stronger. We added a new figure (now Fig 4) and the following to the Results (lines 145-164): “When we compared all five TOPMed MESA training populations within each model building strategy, we observed the largest and most significant differences between populations in the baseline models rather than the fine-mapped models (S7 Fig, S5 Table, S6 Table). To test the hypothesis that allele frequency differences between populations influence predictive power, we performed a fixation index (FST) analysis. For each model set, we calculated the FST between INTERVAL and the corresponding TOPMed population for SNPs in the predictive model. We then compared the difference in average FST between protein models that had a large difference in predictive performance between populations and protein models that had a small difference (Fig 4). We tested multiple thresholds for differences in predictive performance in both fine-mapped and baseline model sets. We found that models which had minimal differences in their performance had significantly smaller differences in average FST than models which had larger differences in performance by Wilcoxon signed-rank test (Fig 4). This effect was observed for multiple thresholds in both baseline and fine-mapped model sets, but was attenuated in fine-mapped sets. Thus, performance differences between populations in the fine-mapped models are less likely due to allele frequency differences.
As sample sizes in proteomics studies increase, allowing identification of SNPs with higher PIP values, including trans-acting pQTLs, we anticipate increased cross-population performance benefit from multi-ancestries fine-mapping.” Reviewer #2: The authors took up an exploratory study that constructed pQTL models using TOPMed MESA cohorts of various ancestries, including African American, Chinese, European, Hispanic/Latino, and cross-population; models were further evaluated and validated using an independent cohort European INTERVAL. For each population-specific cohort or cross-population cohort, the authors have also developed a baseline model (which I believe was inclusive of all sequenced or genotyped variants) and a fine-mapped model. In general, fine-mapped models outperformed baseline models in terms of significant pQTL signals/models. Furthermore, the authors used the constructed models to perform PWAS on the PAGE cohort and replicated in the UKB+ data when the testing trait is available in the replication cohort. The authors successfully identified several known associations, for example, HDL-APOE. 1. In line 86-90, the authors stated that they identified 372 protein aptamers distinct to MESA and not found in GTEx Whole Blood models. Can there be false positives? Are these protein aptamers population-specific or from the cross-population model? It would explain the distinctiveness of these aptamers if these were population specific. RESPONSE: Thank you for your questions. The cross-validated prediction performance of all models with R2>0.01 is listed in S3 Table along with columns indicating whether or not the gene has a GTEx whole blood or any tissue transcription model (mashr method in Barbeira et al. 2020). 
We note many proteins (254/372) that do not have a mashr transcript model in GTEx do have a significant protein aptamer model trained in the MESA EUR population, which is the closest ancestry to GTEx, therefore most aptamers are not population-specific. Yes, we agree that some of the models listed in S3 Table may be false positives, which is why we go on to test them in the independent INTERVAL cohort. 2. For table 1, are these all replication of previous findings? Are some of them novel? There was analysis in the result saying that some significant association signals of APOE isoforms went away after adjusted for PAV. Would it be better if this is noted in the table 1 or at least stated in the table 1 legend? RESPONSE: Thank you for the suggestion. Yes, the APOE associations were no longer significant after adjusting for PAVs. We agree that this should be noted in what is now Table 2 and have added a footer indicating which associations are no longer significant after PAV adjustment. We also discuss in lines 221-227 that “Three of our protein-trait associations were not found in the original PAGE GWAS, but are still supported by independent GWAS. Increased Haptoglobin, Mixed Type was associated with decreased LDL cholesterol and decreased total cholesterol, both of which are corroborated by GWAS at this locus (Klarin et al. 2018). Increased IL-1Ra was associated with decreased C-reactive protein. SNPs near IL-1Ra associated with C-reactive protein in an independent GWAS (Han et al. 2020). The directions of effect for each protein-phenotype association were consistent between all training populations.” 3. Were there any related samples in MESA? Did the authors adjust for relatedness among samples? RESPONSE: Yes, we adjusted for cryptic relatedness using PCAIR, as described in the Methods, lines 431-442. No close relatives (1st-2nd degree) were identified. Minor suggestions or side questions 1. What are the relationships between pQTLs in this study and MESA eQTLs?
Is it correlated? RESPONSE: We agree this would be a useful analysis, but it would be a significant project beyond the scope of this paper due to differences in tissues, timing, samples, and harmonization issues. For example, the protein data come from plasma, while TOPMed MESA has RNA-Seq data in PBMCs, monocytes, and T-cells taken at different exam timepoints. We note there are other ongoing TOPMed proposed papers performing such integrative analyses. 2. Some acronyms, like PIP, were declared more than once. RESPONSE: Thank you, we have edited our manuscript so acronyms are declared upon first use and not again. Attachments Attachment Submitted filename: Schubert et al. Reviewer Response.pdf https://doi.org/10.1371/journal.pone.0264341.r002
9 Feb 2022 Decision Letter - Heming Wang, Editor

Protein prediction for trait mapping in diverse populations
PONE-D-21-26641R1

Dear Dr. Wheeler,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible, no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Heming Wang, PhD
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed
Reviewer #2: All comments have been addressed

2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes
Reviewer #2: Yes

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #2: Yes

4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes
Reviewer #2: Yes

5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: Yes

6. Review Comments to the Author. Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: (No Response)
Reviewer #2: (No Response)

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No

https://doi.org/10.1371/journal.pone.0264341.r003
Formally Accepted
14 Feb 2022 Acceptance Letter - Heming Wang, Editor

PONE-D-21-26641R1
Protein prediction for trait mapping in diverse populations

Dear Dr. Wheeler:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of Dr. Heming Wang
Academic Editor
PLOS ONE

https://doi.org/10.1371/journal.pone.0264341.r004
