Peer Review History

Original Submission: February 6, 2025
Decision Letter - Jie Liu, Editor

PCOMPBIOL-D-25-00243

Generating Correlated Data for Omics Simulation

PLOS Computational Biology

Dear Dr. Brooks,

Thank you for submitting your manuscript to PLOS Computational Biology. After careful consideration, we feel that it has merit but does not fully meet PLOS Computational Biology's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript within 60 days (Jun 17 2025 11:59PM). If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at ploscompbiol@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pcompbiol/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

* A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. This file does not need to include responses to formatting updates and technical items listed in the 'Journal Requirements' section below.

* A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

* An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, competing interests statement, or data availability statement, please make these updates within the submission form at the time of resubmission. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Jie Liu

Academic Editor

PLOS Computational Biology

Jian Ma

Section Editor

PLOS Computational Biology

Journal Requirements:

1) We ask that a manuscript source file is provided at Revision. Please upload your manuscript file as a .doc, .docx, .rtf or .tex. If you are providing a .tex file, please upload it under the item type ‘LaTeX Source File’ and leave your .pdf version as the item type ‘Manuscript’.

2) Please provide an Author Summary. This should appear in your manuscript between the Abstract (if applicable) and the Introduction, and should be 150-200 words long. The aim should be to make your findings accessible to a wide audience that includes both scientists and non-scientists. Sample summaries can be found on our website under Submission Guidelines:

https://journals.plos.org/ploscompbiol/s/submission-guidelines#loc-parts-of-a-submission

3) Your manuscript's sections are not in the correct order. Please amend to the following order: Abstract, Introduction, Results, Discussion, and Methods.

4) Please upload all main figures as separate Figure files in .tif or .eps format. For more information about how to convert and format your figure files please see our guidelines: 

https://journals.plos.org/ploscompbiol/s/figures

5) Thank you for stating "All data used is available with accession numbers GSE151923,GSE81142, GSE151565." Please include the repository name in your Data Availability Statement.

6) Please include the grant recipients in the Funding Information tab.

7) Thank you for stating "Source code for all simulations and figures in this plot is available at github.com/itmat/dependent_sim_paper/." This link reaches a 404 error page. Please amend this to a working link.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Authors:

Please note that two reviews are uploaded as attachments.

Reviewer #1: Please see attached.

Reviewer #2: This study presents three methods for simulating correlated omics data using a Gaussian copula approach with a covariance matrix decomposition. The authors demonstrate their utility in benchmarking computational tools such as DESeq2 and CYCLOPS. This paper is quite interesting. However, there are some major and minor issues that need to be addressed:

Major issues:

1. The manuscript describes a data simulation approach. Could you provide a clear rationale for this specific goal? Also, the authors demonstrate the covariance matrix decomposition into diagonal and low-rank components. Can you explain the hypothesis behind it? More specifically, how does this decomposition compare to other methods of modeling correlation structures in RNA-seq data?

2. For the introduction section, could you include more related works on probabilistic models and generative models that are relevant to data simulation and benchmarking? Providing a broader context on these approaches would strengthen the foundation of the study.

3. While the study compares simulated data to real datasets, the validation metrics primarily focus on variance and correlation structure. Could you provide additional validation using biological benchmarks (e.g., differential analysis, functional enrichment)? This could further support the biological relevance of the simulated datasets.

4. The results indicate increased variance in DESeq2 outcomes when using correlated data, but the practical implications for differential expression analysis are not fully explored. What does this increased variance represent?

5. In the comparison with the real datasets, the authors compared the independent genes with the dependent genes using various methods. However, none of the existing simulation approaches has been compared. Can you benchmark against existing methods (e.g., scDesign, ZINB-WAVE) and show how your simulation performs?

6. What is your conclusion regarding these three methods? It seems that no method is consistently best across the different datasets.
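
On Reviewer #2's point 1, the practical appeal of a diagonal plus low-rank covariance can be sketched in a few lines. The following is a minimal illustration in Python, not the authors' implementation: it assumes a covariance of the form diag(d) + U U^T, and the names sample_diag_plus_low_rank, empirical_cov, d, and U are hypothetical. Each draw adds independent per-feature noise to k shared latent factors, so the full p x p matrix is never formed.

```python
import math
import random


def sample_diag_plus_low_rank(d, U, rng):
    """Draw one vector x ~ N(0, diag(d) + U U^T).

    d: list of p non-negative variances (diagonal part, assumed).
    U: list of p rows with k entries each (low-rank factor, assumed).
    """
    p, k = len(d), len(U[0])
    z_diag = [rng.gauss(0.0, 1.0) for _ in range(p)]  # independent noise per feature
    z_low = [rng.gauss(0.0, 1.0) for _ in range(k)]   # k shared latent factors
    return [
        math.sqrt(d[i]) * z_diag[i] + sum(U[i][j] * z_low[j] for j in range(k))
        for i in range(p)
    ]


def empirical_cov(samples, i, j):
    """Unbiased sample covariance between coordinates i and j."""
    n = len(samples)
    mean_i = sum(s[i] for s in samples) / n
    mean_j = sum(s[j] for s in samples) / n
    return sum((s[i] - mean_i) * (s[j] - mean_j) for s in samples) / (n - 1)


rng = random.Random(0)
d = [1.0, 0.5, 2.0]            # toy diagonal variances
U = [[1.0], [0.5], [-1.0]]     # toy rank-1 factor
samples = [sample_diag_plus_low_rank(d, U, rng) for _ in range(20000)]
```

With these toy values the target covariance is [[2, 0.5, -1], [0.5, 0.75, -0.5], [-1, -0.5, 3]], and the empirical covariance of the draws should approach it as the sample count grows. Each draw costs on the order of p times k operations rather than p squared.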

Reviewer #3: Main review comments are uploaded as an attachment.

Overall, this article has a lot of writing problems and the results are not particularly convincing.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: No: The author states that “Source code for all simulations and figures in this plot is available at github.com/itmat/dependent_sim_paper/”. However, currently there is no content in the provided link. The author should make sure the source code is available at the provided link. The provided github repository for the R package seems to be valid.

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

Figure resubmission:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. If there are other versions of figure files still present in your submission file inventory at resubmission, please replace them with the PACE-processed versions.

Reproducibility:

To enhance the reproducibility of your results, we recommend that authors of applicable studies deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Attachments
Submitted filename: 20250305.pdf
Submitted filename: Review Comments.pdf

Revision 1

Attachments
Submitted filename: plos_comp_bio_reviewer_response.docx

Decision Letter - Jie Liu, Editor

Dear Dr Brooks,

We are pleased to inform you that your manuscript 'Generating Correlated Data for Omics Simulation' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Jie Liu

Academic Editor

PLOS Computational Biology

Jian Ma

Section Editor

PLOS Computational Biology

***********************************************************

A few minor revision suggestions remain. It would be great if the authors could make these changes.

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: This revision has resolved my main concerns. Some remaining points for clarification:

1. The setup described in response to my minor point (2) makes sense. I would still consider calling the samples "conditionally independent" or "independent conditional on time" rather than independent, since the unqualified term might give the impression that the samples are independent without any conditioning at all.

2. The additional comparison with SPSimSeq is worthwhile. There are many simulators that could have been used, and the manuscript should justify why SPSimSeq was chosen. My understanding is that it was chosen because it is also based on Gaussian copulas, but this wasn't entirely clear.

3. The new runtime benchmarking is useful but potentially raises questions about how this proposal compares with the relatively large literature on fast simulation of multivariate Gaussians (e.g., mvnfast). While it is helpful to mention bigsimr, it does not seem fair to dismiss the entire literature based on this one package's performance, and a few extra references could help prevent any risk of misinterpretation.

Reviewer #2: I have read the Authors' response to the reviewers and am satisfied that they have addressed all of my questions and concerns.

Reviewer #3: It seems that the authors have addressed most of my review comments. I have the following four follow-up comments:

1. When responding to review comment #8, the authors included some theoretical details on how the random sampling works.

(1) “We use two basic facts about the multivariate distribution.” Here it should be “multivariate normal distribution”, not “multivariate distribution”.

(2) “if u ~ N(0, Sigma_1) and v ~ N(0, Sigma_2), then u + v ~ N(0, Sigma_1 + Sigma_2).” This only holds when u and v are independent or at least uncorrelated. The authors need to explicitly state this assumption to avoid confusion, and they also need to explain why this independence or uncorrelated assumption applies here.

2. The authors may consider spending some more effort on revising the language of the paper. Right now many passages do not read smoothly to me, although after reading them a few more times I could get the main point.

3. In the Data Availability Section, the link to the Metabolomics data from MetaboAnalyst does not work when I click it.

4. I found a typo: Row 6 of the caption of Figure 1: should be corpcor, not coprcor.
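
Reviewer #3's point 1(2) can be made concrete with a short worked identity. This is a standard fact about zero-mean random vectors, written with the reviewer's symbols as an illustration; it is not text from the manuscript or the authors' response.

```latex
\[
\operatorname{Cov}(u+v)
  = \mathbb{E}\big[(u+v)(u+v)^{\top}\big]
  = \Sigma_1 + \Sigma_2 + \mathbb{E}[u v^{\top}] + \mathbb{E}[v u^{\top}].
\]
```

The cross terms vanish when u and v are uncorrelated, which gives covariance Sigma_1 + Sigma_2; if u and v are also independent (or jointly normal), the sum is itself normal. As a counterexample without this assumption, taking v = u gives u + v = 2u ~ N(0, 4 Sigma_1), not N(0, 2 Sigma_1).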

Reviewer #4: This manuscript presents a set of efficient methods for simulating high-dimensional omics data with realistic correlation structures using a Gaussian copula framework, addressing a key limitation in current simulation tools that often assume independence across features. The authors propose three covariance matrix strategies (PCA, spiked Wishart, and corpcor), implement them in the dependentsimr R package, and demonstrate their utility through benchmarking applications with DESeq2 and CYCLOPS. The revised version thoughtfully and thoroughly addresses prior reviewer concerns. The manuscript is substantially improved. As a minor suggestion, it would be helpful to provide a brief roadmap for extending the framework to additional omics modalities (e.g., single-cell, proteomics, or multi-omics), which would strengthen the paper’s generalizability and long-term impact.
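
The Gaussian copula idea summarized by Reviewer #4 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the dependentsimr API: it uses two features, exponential marginals standing in for whatever marginal distributions a real simulator would fit, and a hypothetical function name gaussian_copula_draw. Correlated standard normals are mapped to uniforms with the normal CDF and then to the target marginals with their inverse CDFs.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and standard deviation 1


def gaussian_copula_draw(rho, rate_a, rate_b, rng):
    """Draw one pair with exponential marginals and Gaussian-copula dependence.

    rho: correlation of the latent normals (assumed parameter).
    rate_a, rate_b: rates of the two exponential marginals (assumed stand-ins).
    """
    # Step 1: correlated standard normals with correlation rho.
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    # Step 2: map each normal to a uniform via the standard normal CDF.
    # The clamp keeps the value strictly below 1 so the log below is defined.
    u1 = min(STD_NORMAL.cdf(z1), 1.0 - 1e-12)
    u2 = min(STD_NORMAL.cdf(z2), 1.0 - 1e-12)
    # Step 3: apply the inverse CDF of each target marginal (exponential here).
    x1 = -math.log1p(-u1) / rate_a
    x2 = -math.log1p(-u2) / rate_b
    return x1, x2


rng = random.Random(1)
pairs = [gaussian_copula_draw(0.8, 1.0, 2.0, rng) for _ in range(20000)]
```

The draws keep their exponential marginals (means near 1/rate_a and 1/rate_b) while inheriting positive dependence from the latent normals. Extending to other modalities would, in this sketch, amount to swapping the inverse CDF in step 3.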

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: None

**********

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Kris Sankaran

Reviewer #2: No

Reviewer #3: No

Reviewer #4: No

Formally Accepted
Acceptance Letter - Jie Liu, Editor

PCOMPBIOL-D-25-00243R1

Generating Correlated Data for Omics Simulation

Dear Dr Brooks,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

You will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Livia Horvath

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.