Peer Review History

Original Submission: April 2, 2024
Decision Letter - Thomas Leitner, Editor, Eric HY Lau, Editor

Dear Mr Jamieson,

Thank you very much for submitting your manuscript "The Burr distribution as a model for the delay between key events in an individual’s infection history." for consideration at PLOS Computational Biology.

As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

The authors are expected to address all the criticisms raised by all reviewers. In particular, please further assess the fitting or setup of the Burr XII distribution, which had large standard errors (Reviewers #1 & #2); provide a more balanced description of the performance of the proposed Burr distributions and of the existing methods (Reviewer #1); and harmonize the terminology and revise the discussion of the doubly interval-censored method, the continuous model, and the adjustment of delays (Reviewer #2). In addition to the above comments, please address the following:

1. In the Abstract, the proposed Burr models are compared to “currently used methods”. However, in the main results (Table 2) they are compared only with the gamma distribution, which does not give a good representation of the best or most commonly used methods. Table 2 should be extended to include the Weibull and lognormal distributions.

2. Lines 170-171, “However, symptom onset is likely proportional to bacterial load at low loads (i.e., the early stages of infection) before saturating at large loads.” Could the authors provide a reference for this statement?

3. Also, please reconsider whether the statement is applicable to a specific disease or a specific class of diseases, e.g. Legionnaires’ disease or the example diseases in Fig S2.

4. If the statement is not a general observation (for example, if there is a delay in symptom onset with respect to the increase in viral load), would the proposed derived Burr distribution or the other Burr distribution types still be applicable?

5. Could the authors provide a brief description of the Melbourne Legionnaires’ data? In particular, how was the incubation period determined to an accuracy of a single day? Was any simplification involved?

6. Similarly, please provide a brief description of the incubation-period data for the other diseases.

7. The examples above do not illustrate the practical situation in which the uncertainty on the exposure time (or, to a lesser extent, the symptom onset) is longer than a single day; under such a situation, the full performance of the doubly-censoring framework should be demonstrated.

8. Please provide more description of the simulation, in particular how (or whether) uncertainty in the exposure time was introduced into the simulated data. Note that, to demonstrate the performance of the proposed distribution and the doubly-censoring framework, uncertainty in the exposure time similar to that encountered in practice should be introduced.
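As an illustration of the kind of day-level (or wider) uncertainty the editors describe, a minimal generative sketch is given below. This is a hypothetical scheme, not the authors' actual simulation procedure: the 1-3 day exposure window, the gamma delay, and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# True (unobserved) continuous event times, in days.
true_exposure = rng.uniform(0, 30, n)
true_onset = true_exposure + rng.gamma(2.0, 1.5, n)

# Observed data: exposure is only known to lie within a window of 1-3 days,
# and onset only to the calendar day on which it was reported.
window = rng.integers(1, 4, n)                      # exposure window width
exp_lower = np.floor(true_exposure) - (window - 1)
exp_upper = np.floor(true_exposure) + 1.0
onset_lower = np.floor(true_onset)
onset_upper = onset_lower + 1.0

# Every true time lies inside its reported interval.
assert np.all((exp_lower <= true_exposure) & (true_exposure < exp_upper))
assert np.all((onset_lower <= true_onset) & (true_onset < onset_upper))
```

Widening the exposure window beyond a single day, as in the editors' point 7, is what lets a simulation exercise the full doubly-censored likelihood rather than a single-day special case.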

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Eric HY Lau, Ph.D.

Academic Editor

PLOS Computational Biology

Thomas Leitner

Section Editor

PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors present a comparison between the Burr family of distributions and more commonly used distributions in terms of their ability to accurately capture the incubation period of a range of diseases, with a particular focus on Legionnaires’ disease. They also explore the effects of failing to account for the fact that observations of incubation periods are generally doubly censored. The paper is clear, well written and likely to be of relevance and interest to readers of this journal. I have some comments, mainly around the interpretation and framing of the results, as I worry that the generality of the findings may be overstated in places, but I think these concerns should be relatively easily addressed. Comments below:

Abstract: “Our approach provides biological justification in the derivation of our delay distribution model, the results of fitting to data highlighting the superiority of the Burr model compared to currently used models in the literature. Our results indicate that the derived Burr distribution is 13 times more likely to be a better-performing model to incubation-period data than currently used methods.”

I think the authors are overstating the generality of the results here and this statement should be toned down. The authors have shown that the Burr is more appropriate in certain situations (where there is a clearly defined mode and/or the incubation period tapers off) but not in others. The authors showed that the derived Burr is 13 times more likely to be a better-performing method than the gamma distribution in one situation applied to a single dataset, but this is far from a general finding across the range of scenarios explored. It would be reasonable to say it is better sometimes, rather than trying to claim it is better all the time, and to state briefly under which conditions it appears to perform better. By my understanding of the paper, the current statements in the abstract are not well supported.

Line 25: “The validity of these distributions has not been explored”

This feels like an overly strong statement. It feels to me more like the authors are arguing the justification for using them is suboptimal and they do a better job but I’m not sure it is fair to say nobody has ever explored the validity of this incredibly common and widely used approach.

Line 27: “The gamma distribution is the sum of n exponentially distributed random events”

Technically (assuming n is an integer, which it would need to be to relate to compartmental events), this actually describes an Erlang distribution, no? It’s a very picky point though, so I’m not too fussy about it.
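The reviewer's point is straightforward to verify numerically: the sum of n i.i.d. exponential variables is Erlang-distributed, i.e. a gamma distribution restricted to integer shape n. The following check is an editorial illustration, not part of the manuscript:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, rate = 3, 2.0

# Sum of n independent Exp(rate) variables.
sums = rng.exponential(scale=1.0 / rate, size=(100_000, n)).sum(axis=1)

# The Erlang(n, rate) density is exactly the gamma density with integer shape n.
x = np.linspace(0.1, 5.0, 50)
assert np.allclose(stats.erlang.pdf(x, n, scale=1.0 / rate),
                   stats.gamma.pdf(x, n, scale=1.0 / rate))

# The empirical mean of the sums matches the Erlang mean n / rate = 1.5.
assert abs(sums.mean() - n / rate) < 0.05
```

For non-integer shape parameters the gamma distribution no longer has the sum-of-exponentials interpretation, which is the substance of the reviewer's (self-described picky) point.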

Lines 57-58: “Using standard probability distributions, as well as censored incubation-period data in statistical analysis, is likely to produce biased inference.”

It’s not immediately clear to me (and therefore, I expect, to other readers) in which direction this should induce bias, rather than just a lack of precision. Suggest adding a bit more explanation.

Lines 92-94: “However, most of these methods are either too simple, …”

I think the authors need to say what is missing here, rather than just that the methods are too simple. If it is just the points that immediately follow, then remove the “too simple” argument.

Line 150: I may have got lost here, but I think the authors use delta, t and then deltat as three separate quantities. If so, this is confusing because when I read deltat I don't know whether it refers to the single quantity deltat or to the product of delta and t. Suggest replacing deltat with something else if this is the case.

Lines 260-264: It’s not clear to me that “strong” evidence is more than “substantial” evidence. I would just say that the larger the value, the stronger the evidence; the cutpoints are a bit arbitrary and unnecessary.

Lines 332-334: Is this correct? Is it not that there is an x% chance that the Burr model is the best model, rather than an x% chance that it fits better? That would then seem more consistent with the above.

Table 2: It is interesting to me that the gamma distribution appears to perform worse in terms of AIC when doubly censored. Any explanation for why this would be? Also, the standard errors have done some interesting things...

For the gamma distribution, the beta parameter in particular has much, much smaller standard errors. The same is true for Burr III. For Burr X, alpha has much smaller standard errors. For Burr XII, the standard errors suggest the fitting has gone wrong, because they are huge. The derived model looks OK; they've all consistently got a bit smaller.

I'd like some commentary/justification on this. You have said a bit below about the large standard errors in Burr XII, but I'm not sure how much that helps me: are you saying that two of the three parameters just don't matter and everything is governed by alpha? That seems odd to me. I don’t fully understand what’s going on here, so more explanation would be welcome.

Figure 2: It’s nice to see this plotted out like this. In the sections above (or the supplementary material) it would be nice to see comparisons against the lognormal and the Weibull (particularly the Weibull). It looks like the performance might be incredibly similar…

Table 3: Make it clear in the caption that the reason for the multiple entries here is that there are different datasets. I'd really rather see an actual metric, e.g. the difference between AIC scores, than just a series of yeses and nos. The magnitude of potential differences matters.

Lines 400-402: I don't think you can say this. Taking AIC as the measure, excluding the third Campylobacter dataset as suggested, and using the derived Burr, I count 6 cases where the gamma distribution is better and 12 where the derived Burr is better. If I do a quick test of proportions, I estimate the gamma distribution to be better 33% of the time (95% CI 14%: 59%, p-value against a null of 0.5 = 0.24). So I don't think you have support for this statement.
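The reviewer's quick calculation can be reproduced; an exact binomial test on 6 "gamma better" outcomes out of 18 comparisons gives the quoted p-value. This is a sketch of the arithmetic only; the reviewer's exact choice of test and confidence-interval method is not stated, so the interval below may differ slightly from the one quoted.

```python
from scipy.stats import binomtest

# 6 of 18 comparisons favour the gamma distribution.
result = binomtest(k=6, n=18, p=0.5)

print(round(result.k / result.n, 2))   # 0.33, the point estimate
print(round(result.pvalue, 2))         # 0.24, consistent with the review
print(result.proportion_ci(confidence_level=0.95))  # Clopper-Pearson interval
```

With the confidence interval spanning 0.5, the data cannot distinguish the two models' win rates, which is the substance of the reviewer's objection.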

Line 491: I don’t think I ever saw anywhere how you assessed statistical significance or obtained a p-value for this.

Reviewer #2: Jamieson et al. describe the family of Burr distributions for modelling incubation time data. They derive a novel probability distribution within the Burr family and use this to model incubation time data sourced from prior studies on a number of different infectious diseases. The datasets used comprise doubly censored data, in which the exact time of infection and symptom onset within any given day are not known. The resulting paper was interesting to read and seems like a useful contribution on this topic, albeit providing incremental developments rather than a major jump in methodology. I have some comments regarding the details of the methods and results presented. For context, I am a statistician with some research experience using incubation time data.

-Under ‘A probability-based approach’ in the Methods section, I feel that the use of mathematical and statistical terminology is quite loose and not entirely accurate (although this does not undermine the rest of the paper). In the initial paragraph here, it is not clear whether the authors are describing a process with discrete time steps or a stochastic process in continuous time. ‘p(t)’ can only be described as the probability that an individual will experience symptoms at time ‘t’ if time is discrete (otherwise f(t) could give the probability density or perhaps hazard of an event occurring at time ‘t’). However, the subsequent reasoning regarding the limit of the function as delta-t tends to zero only makes sense in continuous time. Why not just start here along the lines of “We first consider the option of using an exponential survival model with time-varying hazard…”? Transformations between hazard and cumulative distribution functions etc. all form a basic part of the theory of survival (/time-to-event) analysis.
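For readers less familiar with these identities, the standard survival-analysis relationships the reviewer refers to can be stated briefly (these are textbook results, not taken from the manuscript). For a hazard function h(t),

```latex
S(t) = \exp\left(-\int_0^t h(u)\,\mathrm{d}u\right), \qquad
F(t) = 1 - S(t), \qquad
f(t) = h(t)\,S(t),
```

so specifying a time-varying hazard determines the survival function, cumulative distribution function and density directly; a constant hazard h(t) = lambda recovers the exponential distribution.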

-I don’t think that the terms ‘Markovian random variable’ or ‘Markovian rate’ are very commonly used. I can guess at what the authors mean based on context and knowledge of Markov processes, but it would be better for the authors to more explicitly explain what they are trying to convey in each instance.

-There is odd phrasing in a few places: “First, αD is an exponent of t controlling the growth of probability”, I think that ‘increasing hazard’ or ‘increasing probability density’ would be more technically correct, although it may also be possible to rephrase in less technical language. “The rate at which the derived Burr approaches the exponential distribution increases for decreasing αD”, I think that it would be more correct to say something like “the rate at which the derived Burr approaches a constant hazard (as for an exponential distribution)…”. “Therefore, the parameter αD can be interpreted as a parameter that limits the rate at which the symptom onset process in an individual becomes Markovian”: I think that this sentence could be dropped.

-I think that the description of AIC and BIC etc in the Methods could be substantially condensed, as most readers will be familiar with these standard model comparison approaches. The full details could be retained in the appendices.

-RESULTS: “Models are fitted using both the continuous and doubly interval-censored models to offer comparison between the two methods.” I’m not sure exactly what is meant by the ‘continuous model’ here and throughout the rest of the manuscript. Have the integer values for incubation times in days just been used to fit each distribution using standard maximum likelihood methods as if they were truly continuous data?

-“Results from using the DI methods agrees with the continuous likelihood method in that βXII and γXII in the Burr type XII model have large standard errors, indicating that they are not important in the model fitting procedure.” I think that this rather implies an unstable model fit. This could be because there are too many model parameters, a lack of separation of the parameters, or there are computational problems…

-“Indeed, the doubly interval-censored methods account for a potential delay between exposure time and infection as well as a delay between symptoms starting to develop and the person reporting the symptoms, whereas the continuous model does not account for either delay, resulting in longer times for the incubation periods.” I’m not convinced that this statement is true. I assume that for the incubation time datasets ‘infection’ is recorded as the day of exposure and the event of ‘onset’ is recorded as the date of reporting of symptom onset. That being the case, I don’t see how any method can separate out the delay from exposure to infection or the delay from symptom onset to reporting. Any analysis method is going to model the time distribution between the events recorded. No use of a censoring method to allow for inexact recording of symptom onset will adjust for the fact that someone might have started feeling the symptoms a day or two before being recorded as a case (unless this is recorded accurately in the first place).

-The simulations are largely used to map the parameters of the derived Burr model to the gamma distribution, which ultimately produces very similar results for any of the datasets considered. I would ideally also like to see a check of appropriate model fits using the authors’ methodology with doubly censored data (which could be simulated with exact known infection and onset times), e.g. to check for bias in parameter estimation and appropriate coverage of 95% CIs.
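A check along the lines the reviewer suggests could be sketched as follows: simulate delays from a known gamma distribution, censor both endpoints to calendar days, and refit by maximising the interval-censored likelihood. This is a hypothetical sketch under those assumptions; the authors' actual implementation (and their Burr-family likelihoods) may differ.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

rng = np.random.default_rng(42)
true_shape, true_scale, n = 2.0, 1.5, 500      # true mean delay = 3.0 days

# Exact (unobserved) exposure and onset times; only calendar days are recorded.
t_exposure = rng.uniform(0, 30, n)
delays = rng.gamma(true_shape, true_scale, n)
t_onset = t_exposure + delays
d = np.floor(t_onset) - np.floor(t_exposure)   # observed delay in whole days

# With day-level censoring at both ends, the true delay lies in (d - 1, d + 1).
lower = np.clip(d - 1.0, 0.0, None)
upper = d + 1.0
assert np.all((delays > d - 1.0) & (delays < d + 1.0))

# Interval-censored gamma log-likelihood, maximised over log-parameters
# so that shape and scale stay positive.
def nll(log_params):
    a, s = np.exp(log_params)
    p = stats.gamma.cdf(upper, a, scale=s) - stats.gamma.cdf(lower, a, scale=s)
    return -np.sum(np.log(np.clip(p, 1e-300, None)))

fit = minimize(nll, x0=np.log([1.0, 1.0]), method="Nelder-Mead")
a_hat, s_hat = np.exp(fit.x)

# The fitted mean should be close to the true mean of 3.0 days.
assert abs(a_hat * s_hat - true_shape * true_scale) < 0.6
```

Repeating this over many simulated datasets, and recording how often the true parameters fall inside the fitted 95% CIs, would give the bias and coverage check the reviewer asks for.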

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: David Ewing

Reviewer #2: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Revision 1

Attachments
Attachment
Submitted filename: Response to Reviewers.docx
Decision Letter - Thomas Leitner, Editor, Eric HY Lau, Editor

Dear Mr Jamieson,

We are pleased to inform you that your manuscript 'The Burr distribution as a model for the delay between key events in an individual’s infection history.' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Eric HY Lau, Ph.D.

Academic Editor

PLOS Computational Biology

Thomas Leitner

Section Editor

PLOS Computational Biology

Feilim Mac Gabhann

Editor-in-Chief

PLOS Computational Biology

Jason Papin

Editor-in-Chief

PLOS Computational Biology

***********************************************************

Thanks for addressing all the editor’s and reviewers' comments. Congratulations on the excellent work!

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors have satisfactorily addressed my comments and concerns. I am happy to approve the revised submission.

Reviewer #2: Thank you for the comprehensive replies and adjustments to the manuscript.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: David A Ewing

Reviewer #2: No

Formally Accepted
Acceptance Letter - Thomas Leitner, Editor, Eric HY Lau, Editor

PCOMPBIOL-D-24-00529R1

The Burr distribution as a model for the delay between key events in an individual’s infection history.

Dear Dr Jamieson,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Zsofia Freund

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.