DRAMS: A tool to detect and re-align mixed-up samples for integrative studies of multi-omics data

Yi Jiang; Gina Giase; Kay Grennan; Annie W. Shieh; Yan Xia; Lide Han; Quan Wang; Qiang Wei; Rui Chen; Sihan Liu; Kevin P. White; Chao Chen; Bingshan Li; Chunyu Liu

doi:10.1371/journal.pcbi.1007522

Peer Review History

Original Submission
October 30, 2019
27 Nov 2019 Decision Letter - Thomas Lengauer, Editor, Ilya Ioshikhes, Editor
Dear Dr Liu,
Thank you very much for submitting your manuscript 'DRAMS: A Tool to Detect and Re-Align Mixed-up Samples for Integrative Studies of Multi-omics Data' for review by PLOS Computational Biology. Your manuscript has been fully evaluated by the PLOS Computational Biology editorial team and in this case also by independent peer reviewers. The reviewers appreciated the attention to an important problem, but raised some substantial concerns about the manuscript as it currently stands. While your manuscript cannot be accepted in its present form, we are willing to consider a revised version in which the issues raised by the reviewers have been adequately addressed. We cannot, of course, promise publication at that time.
Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.
Your revisions should address the specific points made by each reviewer. Please return the revised version within the next 60 days. If you anticipate any delay in its return, we ask that you let us know the expected resubmission date by email at ploscompbiol@plos.org. Revised manuscripts received beyond 60 days may require evaluation and peer review similar to that applied to newly submitted manuscripts.
In addition, when you are ready to resubmit, please be prepared to provide the following:
(1) A detailed list of your responses to the review comments and the changes you have made in the manuscript. We require a file of this nature before your manuscript is passed back to the editors.
(2) A copy of your manuscript with the changes highlighted (encouraged). We encourage authors, if possible, to show clearly where changes have been made to their manuscript, e.g. by highlighting text.
(3) A striking still image to accompany your article (optional). If the image is judged to be suitable by the editors, it may be featured on our website and might be chosen as the issue image for that month. These square, high-quality images should be accompanied by a short caption. Please note as well that there should be no copyright restrictions on the use of the image, so that it can be published under the Open-Access license and be subject only to appropriate attribution.
Before you resubmit your manuscript, please consult our Submission Checklist to ensure your manuscript is formatted correctly for PLOS Computational Biology: http://www.ploscompbiol.org/static/checklist.action. Some key points to remember are:
- Figures uploaded separately as TIFF or EPS files (if you wish, your figures may remain in your main manuscript file in addition).
- Supporting Information uploaded as separate files, titled Dataset, Figure, Table, Text, Protocol, Audio, or Video.
- Funding information in the 'Financial Disclosure' box in the online system.
While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.
To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see here.
We are sorry that we cannot be more positive about your manuscript at this stage, but if you have any concerns or questions, please do not hesitate to contact us.
Sincerely,
Ilya Ioshikhes
Associate Editor
PLOS Computational Biology
Thomas Lengauer
Methods Editor
PLOS Computational Biology
A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately: [LINK]
Reviewer's Responses to Questions
Comments to the Authors: Please note here if the review is uploaded as an attachment.
Reviewer #1: Jiang et al. developed a tool to detect and correct mixed-up samples in multi-omics data. The mix-up refers to switched IDs during the data generation process. Through analyzing the PsychENCODE BrainGVEX data, they corrected 12.5% mixed-up IDs, and this correction is shown to improve the discovery of cis-QTL. I think the paper addressed an important and interesting research question, and it was written in a convincing way. Below I list some comments that may improve the manuscript.
Major comments:
“We can cluster all the highly related data and consider that the data from one cluster have only one potential ID.” It was not clear how the clustering was done. Is it based on a clustering algorithm? What is the resulting cluster size? Is it restricted to the number of data types? Is it necessary that each cluster has each data type? More details are expected to explain the clustering procedure.
In Fig 1 and S2, the range of the genetic relatedness score exceeds 1. Is it really possible to exceed one, or is this just because of density curve smoothing? If the latter, the authors may consider using a way to estimate the density within a restricted range.
Additionally, in Fig 1, “The text in each node represents the data ID. Different colors represent different omics types.” The pair of A-A in Step 1 Type 1 are in the same color. Should A-A be in different colors?
In Fig 2, the authors assessed the proportions of successfully corrected mix-ups. I am wondering if any non-mix-ups are wrongly classified as mix-ups and then overcorrected.
For example, if using the cutoff of 0.65 of genetic relatedness scores for highly related data pairs, are there any non-mix-ups wrongly classified as mismatched data pairs? It may be related to the section of "Sensitivity and specificity in extracting highly related data pairs," but they are in different simulation settings. In the simulation, how are the different data types simulated? How is the logistic regression fitted? Is it the same fitted model based on 44 data pairs as in the real data analysis?
The overcorrection may be an issue in the analysis of PsychENCODE data. “We identified a total of 1971 matched pairs and 518 mismatched pairs… In the end, we corrected 201 (12.5%) IDs for data of the six omics types.” The percentage of mismatched pairs and corrected IDs seems high.
Minor comments:
Fig 4 has a lot of abbreviations. Providing the full names in the caption would help reading.
The Supplementary Figures need captions.
“Since we intend to test whether the number of cis-QTLs increased, only chromosome 1 was used to save computing time.” This should be at least noted in the caption of Table 2 to help interpret the number of eQTLs.
Page 19, “In formula 3, Sa and Sb indicate the sex matching level for data “A” and “B”, respectively. Taking Sa as an example, the original value of Sa would be 0. If the reported sex and genetics-based sex in data “A” are matched, Sa would be plus 0.5. If the reported sex of data “A” and the genetics-based sex of data “B” are matched, which means that the reported sex and genetics-based sex are matched after switch ID from “B” to “A”, then Sa would be plus 0.5.” Could you explain more why the values of Sa are set in this way? It seems hard to understand and confusing.
Grammar errors:
Page 2: “As the number of datasets to be integrated increaseS”
Page 3: “all omics data of the same samples were clusterED together”
Page 4: “For the first step is to build highly related data pairs.”
Page 18: “those pairS with smaller scores were classified”
Reviewer #2: Comments to authors
The manuscript by Yi and colleagues, “DRAMS: A Tool to Detect and Re-Align Mixed-up Samples for Integrative Studies of Multi-omics Data”, presents a new method to detect and re-align mixed-up samples in multi-omics studies. The authors calibrated DRAMS using simulations and applied their method to the data from the PsychENCODE BrainGVEX project to correct 201 sample IDs. They further tried to validate the results by comparing PCA and eQTL results before and after correcting sample IDs. The experiment is well designed and the analyses were carefully conducted. Given the increasing emergence of multi-omics data, this method would be useful.
Major comments:
1. The authors should compare DRAMS with state-of-the-art methods of this kind in both simulation and real data analyses, such as MixupMapper.
2. DRAMS needs a set of hand-picked high-confidence mismatched data pairs as a training set. How many pairs with certain switch directions are required in the training process? What if there are no such mismatched data pairs?
3. The authors used a relatedness score of 0.65 to define highly related data pairs. Is it possible that related individual pairs (due to relatedness rather than the same individual in different data types) have a relatedness score > 0.65? More justification is required here.
4. The authors claimed that the more types of omics data are included, the higher the power. However, this would increase the computational complexity. In this case, how about the computational burden? It would be useful to summarize the resource requirements (e.g., running time, memory usage) under different simulation scenarios.
5. In the eQTL analysis, did the authors correct for ancestry and sex?
In addition, as discussed by the authors, a large number of QTLs does not necessarily mean correct data IDs. It would be useful to replicate the newly discovered eQTLs in an independent data set.
Minor comments:
1. Figure 1 is quite busy. It is difficult for readers to follow. A detailed legend is needed.
2. Is “404” mismatched data pairs a typo, given 518 mismatched pairs in total and 44 pairs with certain switch directions?
3. In Table S4, 80 data pairs with the same ID were defined as “mismatched pairs”. The definition of “matched pairs” and “mismatched pairs” in Fig. 3 is not clear. The authors also mentioned that eight data were not related to any other data. It does not make sense that only eight data are not related to any other data among all samples.
4. In the validation analysis using PCA, after correcting the mismatched IDs, was the ancestry of all the other samples matched correctly?
Have all data underlying the figures and results presented in the manuscript been provided? Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.
Reviewer #1: None
Reviewer #2: Yes
PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No
Reviewer #2: No
https://doi.org/10.1371/journal.pcbi.1007522.r001
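Reviewer #1's question about the sex matching level Sa refers to a rule that is quoted in the letter above: Sa starts at 0, gains 0.5 if the reported sex of data "A" matches the genetics-based sex of data "A", and gains another 0.5 if the reported sex of data "A" matches the genetics-based sex of data "B". The sketch below only restates that quoted rule as code to make the three possible values (0, 0.5, 1) easier to see. The function name, argument names, and the "F"/"M" sex labels are assumptions made for illustration; they are not taken from DRAMS, and formula 3 itself is not reproduced in this record.

```python
def sex_match_level(reported_sex_a, genetic_sex_a, genetic_sex_b):
    """Illustrative restatement of the quoted rule for Sa (hypothetical names).

    Starts at 0 and adds 0.5 for each of the two matches described in the
    quoted passage. This is not the DRAMS implementation.
    """
    score = 0.0
    # Reported sex of "A" agrees with the genetics-based sex of "A" itself.
    if reported_sex_a == genetic_sex_a:
        score += 0.5
    # Reported sex of "A" agrees with the genetics-based sex of "B",
    # i.e. the sexes would still agree after switching the ID from "B" to "A".
    if reported_sex_a == genetic_sex_b:
        score += 0.5
    return score


if __name__ == "__main__":
    # Made-up labels, for illustration only.
    print(sex_match_level("F", "F", "F"))
    print(sex_match_level("F", "M", "F"))
    print(sex_match_level("F", "M", "M"))
```

Under this reading, sex can only separate the two candidates when data "A" and data "B" have different genetics-based sexes; when both match the reported sex, Sa is 1 and sex carries no information about the switch direction.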
Revision 1
23 Jan 2020 Author Response Attachments Attachment Submitted filename: Responce to reviewers.docx https://doi.org/10.1371/journal.pcbi.1007522.r002
9 Feb 2020 Decision Letter - Thomas Lengauer, Editor, Ilya Ioshikhes, Editor
Dear Liu,
Thank you very much for submitting your manuscript "DRAMS: A Tool to Detect and Re-Align Mixed-up Samples for Integrative Studies of Multi-omics Data" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.
Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.
When you are ready to resubmit, please upload the following:
[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.
[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).
Important additional instructions are given below your reviewer comments.
Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.
Sincerely,
Ilya Ioshikhes
Associate Editor
PLOS Computational Biology
Thomas Lengauer
Methods Editor
PLOS Computational Biology
A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately: [LINK]
Reviewer's Responses to Questions
Comments to the Authors: Please note here if the review is uploaded as an attachment.
Reviewer #1: My comments have been addressed.
Reviewer #2: The authors addressed most of my concerns except my question #3. It is nice that the authors added a paragraph (lines 276-385) to suggest the use of a stringent relatedness score in family data. However, it is still not clear how the relatedness score will affect the result and how to choose the relatedness score in the real data analysis. It would be better to test the method using a variety of relatedness scores.
Have all data underlying the figures and results presented in the manuscript been provided? Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.
Reviewer #1: None
Reviewer #2: Yes
PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No
Reviewer #2: No
Figure Files: While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.
Data Requirements: Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.
Reproducibility: To enhance the reproducibility of your results, PLOS recommends that you deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see http://journals.plos.org/ploscompbiol/s/submission-guidelines#loc-materials-and-methods
https://doi.org/10.1371/journal.pcbi.1007522.r003
Revision 2
16 Feb 2020 Author Response Attachments Attachment Submitted filename: Responce to reviewers.docx https://doi.org/10.1371/journal.pcbi.1007522.r004
28 Feb 2020 Decision Letter - Thomas Lengauer, Editor, Ilya Ioshikhes, Editor
Dear Liu,
We are pleased to inform you that your manuscript 'DRAMS: A Tool to Detect and Re-Align Mixed-up Samples for Integrative Studies of Multi-omics Data' has been provisionally accepted for publication in PLOS Computational Biology.
Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests. Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.
IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.
Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.
Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.
Best regards,
Ilya Ioshikhes
Associate Editor
PLOS Computational Biology
Thomas Lengauer
Methods Editor
PLOS Computational Biology
Reviewer's Responses to Questions
Comments to the Authors: Please note here if the review is uploaded as an attachment.
Reviewer #2: My comments have been addressed.
Have all data underlying the figures and results presented in the manuscript been provided? Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.
Reviewer #2: Yes
PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #2: No
https://doi.org/10.1371/journal.pcbi.1007522.r005
Formally Accepted
24 Mar 2020 Acceptance Letter - Thomas Lengauer, Editor, Ilya Ioshikhes, Editor
PCOMPBIOL-D-19-01762R2
DRAMS: A Tool to Detect and Re-Align Mixed-up Samples for Integrative Studies of Multi-omics Data
Dear Dr Liu,
I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.
The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.
Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.
Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!
With kind regards,
Bailey Hanna
PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom
ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol
https://doi.org/10.1371/journal.pcbi.1007522.r006

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.