Peer Review History

Original Submission: May 23, 2019
Decision Letter - Hua Tang, Editor, Amy L. Williams, Editor

Dear Dr Nelson,

Thank you very much for submitting your Research Article entitled 'Coupling Wright-Fisher and coalescent dynamics for realistic simulation of population-scale datasets' to PLOS Genetics. Your manuscript was fully evaluated at the editorial level and by independent peer reviewers. The reviewers appreciated the attention to an important problem, but raised some substantial concerns about the current manuscript. Based on the reviews, we will not be able to accept this version of the manuscript, but we would be willing to review again a much-revised version. We cannot, of course, promise publication at that time.

Should you decide to revise the manuscript for further consideration here, your revisions should address the specific points made by each reviewer. Of note, both reviewers asked for a more detailed discussion and analysis of the impact of Wright-Fisher modeling. As noted by reviewer 2, performing simulations in a pedigree addresses the issues raised in this manuscript and should be explored. We will also require a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript.

If you decide to revise the manuscript for further consideration at PLOS Genetics, please aim to resubmit within the next 60 days, unless it will take extra time to address the concerns of the reviewers, in which case we would appreciate an expected resubmission date by email to plosgenetics@plos.org.

If present, accompanying reviewer attachments are included with this email; please notify the journal office if any appear to be missing. They will also be available for download from the link below. You can use this link to log into the system when you are ready to submit a revised version, having first consulted our Submission Checklist.

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see our guidelines.

Please be aware that our data availability policy requires that all numerical data underlying graphs or summary statistics are included with the submission, and you will need to provide this upon resubmission if not already present. In addition, we do not permit the inclusion of phrases such as "data not shown" or "unpublished results" in manuscripts. All points should be backed up by data provided with the submission.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

PLOS has incorporated Similarity Check, powered by iThenticate, into its journal-wide submission system in order to screen submitted content for originality before publication. Each PLOS journal undertakes screening on a proportion of submitted articles. You will be contacted if needed following the screening process.

To resubmit, use the link below and 'Revise Submission' in the 'Submissions Needing Revision' folder.

[LINK]

We are sorry that we cannot be more positive about your manuscript at this stage. Please do not hesitate to contact us if you have any concerns or questions.

Yours sincerely,

Amy L. Williams

Guest Editor

PLOS Genetics

Hua Tang

Section Editor: Natural Variation

PLOS Genetics

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: Review of Nelson et al

In this manuscript the authors describe an implementation of a discrete-time Wright-Fisher (DTWF) model as part of the msprime package. The authors show how the coalescent approximation breaks down when the sample size, call it n, approaches the effective population size, Ne, and when large regions of the genome are simulated. While this result has been known for some time, the authors describe new facets to this issue and provide an implemented solution. The DTWF model that they implement compares favorably to the Hudson coalescent with respect to runtime and adequately captures features of the genealogical process that the coalescent approximation cannot.

Generally I believe this to be a significant contribution; however, the manuscript as written needs some substantial revision. My point-by-point criticisms and suggestions follow.

1) Generally it would be helpful to explore at what ratio of n/Ne these issues manifest. I would suggest revising Figures 2 and 3 to include a number of n/Ne ratios to show how this scaling affects the fit of the coalescent approximation.

2) Figure 1A is very hard to follow. It certainly does not clarify what is going on. I’d suggest the authors create a new figure to describe things.

3) Lines 87-90 in the motivation section. Is this issue purely a consequence of diploidy not being modeled? If the authors could explain the rationale a bit here it would be helpful.

4) Also with respect to motivation: the authors currently look at IBD tract lengths and the variance in ancestry as potential issues with the coalescent approximation. While this is great, both of those are essentially two facets of the same issue: recombination not being adequately captured by the coalescent. Can the authors look at different aspects of the data? For instance, is the site frequency spectrum (SFS) perturbed in this regime under the coalescent?

5) Line 134—typo here. No section given.

6) Fig 3: it is unclear what the units of the TMRCA (shown in colors) are. Generations? Also in that figure, why is there a large gap in the data points in the top panel?

7) Figure 5 caption—the caption says that the sample was 1000 haploids but Ne=10000 diploids. Is this a typo? Was the actual sample size 10000 haploids?

8) With respect to hybrid models—it would be good to show how IBD and LD are affected by the hybrid model – are these features faithfully captured by using the hybrid models?

9) With respect to the performance analysis: it looks like the DTWF outperforms the coalescent model starting at 1e9 bp. While this is fine, we almost never have to simulate a billion-bp chromosome; instead, we can simulate unlinked chromosomes separately from one another. The authors should probably point this out.

10) In the Supplement the authors should spend more time describing the implementation. It is very non-technical at this point. Also the authors might point the reader to the code.

11) Line 382—typo “underestimated”

12) Last point: the authors should show how the issue of large samples not being adequately modeled under the coalescent is realized in empirical data. For instance, the authors could analyse IBD tract lengths in the UK Biobank dataset and show that the observed distribution does not square with a coalescent process. As written, the paper feels more like a technical computing note than a genetics paper.

Reviewer #2: This manuscript proposes a Wright-Fisher extension of msprime, a well-used coalescent simulator. Clearly this is a useful extension, but I feel that further work is needed for publication in PLOS Genetics.

First, it is disappointing to see only simulation results under a constant population size model. The authors should explore more realistic demographic models (e.g., previously inferred human population histories with two phases of exponential growth in the recent past) and study the accuracy of the standard coalescent model under those scenarios.

The authors have not directly demonstrated that using the WF model produces a better fit to real data. For example, it would be interesting to compare the IBD length distribution estimated from real data with simulation results from msprime (WF) and msprime (Hudson) under an inferred demographic model (e.g., inferred using the site frequency spectrum).

Co-author Kelleher has done interesting work on simulating pedigrees. It would be natural to think of a hybrid model where a pre-specified pedigree or a probabilistic pedigree model is used for the recent past, followed by the standard coalescent in the distant past. This would be a welcome extension and could be more useful than the WF extension. After all, the WF model is rather simple and idealized, while the actual mating pattern in real populations is much more complicated. Related to this point, would it be possible to incorporate other random mating models (e.g., general Cannings exchangeable models) into msprime?

Since one of the main motivations for the WF extension concerns IBD sharing, it seems important to implement crossover interference. If this is not an overly difficult extension, I would strongly recommend implementing it.

Please explain why msprime (WF) is faster than the previous version of msprime, as shown in Figure 5. Is it because the number of lineages is bounded by the population size in "msprime (WF)", as shown in Figure 2? It would be good to discuss Figure 5 in the context of Figure 2. Related to this point, please explain why "hybrid (100 WF generations)" is slower than "msprime (Hudson)", while "hybrid (1000 WF generations)" is faster than "msprime (Hudson)". To determine the optimal switch time in the hybrid model, it seems that one should investigate the trade-off between the computational overhead for using the WF model and the reduction in the number of lineages. This suggests that the optimal switch time would depend on the demographic model. This point should be clarified. Similarly, the authors should explain why "msprime (WF)" is less efficient than "msprime (Hudson)" for shorter regions, by discussing the trade-off mentioned above.

My understanding is that Bhaskar et al. (PNAS 2014, 111:2385-2390) first proposed the hybrid model, but this is not clearly acknowledged in the manuscript. The first three pages of the manuscript (including the title) give the impression that the idea is being proposed here for the first time.

Minor comments:

- Figure 1A: This figure is difficult to understand. Please explain it more clearly in the caption.

- Figure 2: Perhaps this should be plotted with the x-axis in log scale? Also, it would not hurt to mention that the x-axis is in "Generations (backwards in time)".

- Figure 3: In the top figure, please explain why there are few IBD segments of length between ~7*10^8 and ~10^9.

- Lines 81-82: "We traced this phenomenon to samples having more than 2^t simulated ancestors at generation t in the past" is ambiguous. I think you meant, "We traced this phenomenon to some individuals in the sample having..."

- Lines 95-96: It would help the reader to explain here why recent events in migration models induce long-range correlations along the genome.

- Lines 101-105: Bhaskar et al. (2014) compared the WF and the coalescent models with respect to the number of lineages at a single site. It would be good to discuss this result in relation to your result.

- There are blank references to sections throughout the manuscript. For example, Figure 2 caption ends with "described in Section ."

- Line 140: Replace "closely related samples" with "closely related individuals".

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: None

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Revision 1

Attachments
Submitted filename: Response to reviewers.pdf
Decision Letter - Hua Tang, Editor, Amy L. Williams, Editor

* Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. *

Dear Dr Nelson,

Thank you very much for submitting your Research Article entitled 'Accounting for long-range correlations in genome-wide simulations of large cohorts' to PLOS Genetics. Your manuscript was fully evaluated at the editorial level and by independent peer reviewers. Only one comment from Reviewer 2 remains to be addressed, and this should likely be possible quickly. One possibility is to simply make a textual change.

We therefore ask you to modify the manuscript according to the review recommendations before we can consider your manuscript for acceptance. Your revisions should address the specific points made by each reviewer.

In addition we ask that you:

1) Provide a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript.

2) Upload a Striking Image with a corresponding caption to accompany your manuscript if one is available (either a new image or an existing one from within your manuscript). If this image is judged to be suitable, it may be featured on our website. Images should ideally be high resolution, eye-catching, single panel square images. For examples, please browse our archive. If your image is from someone other than yourself, please ensure that the artist has read and agreed to the terms and conditions of the Creative Commons Attribution License. Note: we cannot publish copyrighted images.

We hope to receive your revised manuscript within the next 30 days. If you anticipate any delay in its return, we would ask you to let us know the expected resubmission date by email to plosgenetics@plos.org.

If present, accompanying reviewer attachments should be included with this email; please notify the journal office if any appear to be missing. They will also be available for download from the link below. You can use this link to log into the system when you are ready to submit a revised version, having first consulted our Submission Checklist.

To resubmit, you will need to go to the link below and 'Revise Submission' in the 'Submissions Needing Revision' folder.

[LINK]

Please let us know if you have any questions while making these revisions.

Yours sincerely,

Amy L. Williams

Guest Editor

PLOS Genetics

Hua Tang

Section Editor: Natural Variation

PLOS Genetics

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: I'm pleased with the edits made to this revision. This is an excellent contribution.

Reviewer #2: Overall the authors have done a good job of revising the paper and I am generally satisfied with all the changes. One exception is their response to my first major comment regarding the effect of demography on the distribution of pairwise IBD length. The authors have run simulations using the Out-of-Africa model from Gutenkunst et al. (2009), but my understanding is that in that model the present effective population sizes of YRI, CEU, and CHB are 7300, 29524, and 53403, respectively. What would happen if the present effective population size were much larger, say 1 million or 10 million, while the sample size is held at 1000? The authors claim, "the overall relationship between IBD counts and IBD length ... does not depend on the details of the demographic history or sample sizes." To me, this seems like a strong claim that warrants more rigorous justification, as it might send an incorrect message to the reader. To what extent does it not depend on the demographic model? Could you be more quantitative?

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: None

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Revision 2

Attachments
Submitted filename: Response to reviewers #2.pdf
Decision Letter - Hua Tang, Editor, Amy L. Williams, Editor

Dear Dr Nelson,

We are pleased to inform you that your manuscript entitled "Accounting for long-range correlations in genome-wide simulations of large cohorts" has been editorially accepted for publication in PLOS Genetics. Congratulations!

Before your submission can be formally accepted and sent to production you will need to complete our formatting changes, which you will receive in a follow up email. Please be aware that it may take several days for you to receive this email; during this time no action is required by you. Please note: the accept date on your published article will reflect the date of this provisional accept, but your manuscript will not be scheduled for publication until the required changes have been made.

Once your paper is formally accepted, an uncorrected proof of your manuscript will be published online ahead of the final version, unless you’ve already opted out via the online submission form. If, for any reason, you do not want an earlier version of your manuscript published online or are unsure if you have already indicated as such, please let the journal staff know immediately at plosgenetics@plos.org.

In the meantime, please log into Editorial Manager at https://www.editorialmanager.com/pgenetics/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production and billing process. Note that PLOS requires an ORCID iD for all corresponding authors. Therefore, please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field. This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager.

If you have a press-related query, or would like to know about one way to make your underlying data available (as you will be aware, this is required for publication), please see the end of this email. If your institution or institutions have a press office, please notify them about your upcoming article at this point, to enable them to help maximise its impact. Inform journal staff as soon as possible if you are preparing a press release for your article and need a publication date.

Thank you again for supporting open-access publishing; we are looking forward to publishing your work in PLOS Genetics!

Yours sincerely,

Amy L. Williams

Guest Editor

PLOS Genetics

Hua Tang

Section Editor: Natural Variation

PLOS Genetics

www.plosgenetics.org

Twitter: @PLOSGenetics

----------------------------------------------------

Data Deposition

If you have submitted a Research Article or Front Matter that has associated data that are not suitable for deposition in a subject-specific public repository (such as GenBank or ArrayExpress), one way to make that data available is to deposit it in the Dryad Digital Repository. As you may recall, we ask all authors to agree to make data available; this is one way to achieve that. A full list of recommended repositories can be found on our website.

The following link will take you to the Dryad record for your article, so you won't have to re‐enter its bibliographic information, and can upload your files directly: 

http://datadryad.org/submit?journalID=pgenetics&manu=PGENETICS-D-19-00848R2

More information about depositing data in Dryad is available at http://www.datadryad.org/depositing. If you experience any difficulties in submitting your data, please contact help@datadryad.org for support.

Additionally, please be aware that our data availability policy requires that all numerical data underlying display items are included with the submission, and you will need to provide this before we can formally accept your manuscript, if not already present.

----------------------------------------------------

Press Queries

If you or your institution will be preparing press materials for this manuscript, or if you need to know your paper's publication date for media purposes, please inform the journal staff as soon as possible so that your submission can be scheduled accordingly. Your manuscript will remain under a strict press embargo until the publication date and time. This means an early version of your manuscript will not be published ahead of your final version. PLOS Genetics may also choose to issue a press release for your article. If there's anything the journal should know or you'd like more information, please get in touch via plosgenetics@plos.org.

Formally Accepted
Acceptance Letter - Hua Tang, Editor, Amy L. Williams, Editor

PGENETICS-D-19-00848R2

Accounting for long-range correlations in genome-wide simulations of large cohorts

Dear Dr Nelson,

We are pleased to inform you that your manuscript entitled "Accounting for long-range correlations in genome-wide simulations of large cohorts" has been formally accepted for publication in PLOS Genetics! Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out or your manuscript is a front-matter piece, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Genetics and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Matt Lyles

PLOS Genetics

On behalf of:

The PLOS Genetics Team

Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom

plosgenetics@plos.org | +44 (0) 1223-442823

plosgenetics.org | Twitter: @PLOSGenetics
