Peer Review History

Original Submission: October 20, 2022
Decision Letter - Virginia E. Pitzer, Editor

Dear Mr. Dan,

Thank you very much for submitting your manuscript "Estimating fine age structure and time trends in human contact patterns from coarse contact data: the Bayesian rate consistency model" for consideration at PLOS Computational Biology.

As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Virginia E. Pitzer, Sc.D.

Section Editor

PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: This manuscript is attractive and well-written but methodologically under-developed. The fully Bayesian model should be expanded to account for temporal trends. Currently, f_t, tau_t, alpha_t, and l_t are treated as a priori independent, but using splines, GPs, or linear models would help borrow information across survey windows and explicitly account for temporal trends. The same goes for the time-varying reporting effects.
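As an illustration of the reviewer's suggestion (a hypothetical sketch, not the authors' model), a first-order random-walk prior ties wave-specific parameters together, so smooth trajectories across survey waves receive far more prior mass than erratic ones, in contrast to independent priors:

```python
import numpy as np

def independent_logprior(theta, sd=1.0):
    # Each wave-specific parameter gets an independent N(0, sd^2) prior.
    return float(np.sum(-0.5 * (theta / sd) ** 2 - np.log(sd * np.sqrt(2 * np.pi))))

def random_walk_logprior(theta, sd0=1.0, step_sd=0.3):
    # First-order random walk: theta_1 ~ N(0, sd0^2),
    # theta_t | theta_{t-1} ~ N(theta_{t-1}, step_sd^2).
    lp = -0.5 * (theta[0] / sd0) ** 2 - np.log(sd0 * np.sqrt(2 * np.pi))
    steps = np.diff(theta)
    lp += np.sum(-0.5 * (steps / step_sd) ** 2 - np.log(step_sd * np.sqrt(2 * np.pi)))
    return float(lp)

smooth = np.array([0.0, 0.1, 0.2, 0.25, 0.3])   # gradual trend across waves
jumpy = np.array([0.0, 1.5, -1.0, 2.0, -0.5])   # erratic wave-to-wave jumps

# The random-walk prior separates the two trajectories far more strongly
# than independent priors do, which is how it borrows strength across waves.
rw_gap = random_walk_logprior(smooth) - random_walk_logprior(jumpy)
indep_gap = independent_logprior(smooth) - independent_logprior(jumpy)
```

The same construction applies to any of the wave-indexed parameters; a spline or GP prior over wave index plays the analogous role with smoother interpolation.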

Major Comments:

1) Regarding the second sentence of the section "Data Processing": the method of imputing children's ages seems more like adding noise than actual imputation (which usually uses ancillary data to predict missing values). Since you are already working within a Bayesian framework, wouldn't it make more sense to infer these ages as latent variables? Otherwise, it is hard to see the benefit of adding noise.

2) The choice of directly including (8) in (9) seems arbitrary. Wouldn't it be reasonable to include any function that favors Y and penalizes T?

3) I find the HSGP terminology a little troubling. Isn't it more accurate to call this a truncated Karhunen-Loève expansion? Is HSGP a rebranding, or is it really different?
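For context, the Hilbert-space GP (HSGP) approximation expands the kernel in Laplacian eigenfunctions on a bounded domain, weighted by the kernel's spectral density, which is indeed a truncated Karhunen-Loève-type expansion. A minimal numerical sketch for a one-dimensional squared-exponential kernel (illustrative only; all parameter values are arbitrary):

```python
import numpy as np

def se_kernel(x1, x2, sigma=1.0, ell=0.5):
    # Exact squared-exponential (RBF) covariance.
    return sigma**2 * np.exp(-0.5 * (x1[:, None] - x2[None, :])**2 / ell**2)

def hsgp_kernel(x1, x2, sigma=1.0, ell=0.5, L=3.0, m=40):
    # Hilbert-space (truncated Karhunen-Loeve-style) approximation on [-L, L],
    # following Solin & Sarkka: k(x, x') ~= sum_j S(sqrt(lambda_j)) phi_j(x) phi_j(x'),
    # with Laplacian eigenpairs lambda_j = (pi*j / (2L))^2,
    # phi_j(x) = sqrt(1/L) * sin(sqrt(lambda_j) * (x + L)),
    # and S the spectral density of the SE kernel.
    j = np.arange(1, m + 1)
    sqrt_lam = np.pi * j / (2 * L)
    S = sigma**2 * np.sqrt(2 * np.pi) * ell * np.exp(-0.5 * ell**2 * sqrt_lam**2)
    phi = lambda x: np.sqrt(1 / L) * np.sin(sqrt_lam[None, :] * (x[:, None] + L))
    return phi(x1) @ np.diag(S) @ phi(x2).T

x = np.linspace(-1, 1, 25)
err = np.max(np.abs(se_kernel(x, x) - hsgp_kernel(x, x)))
# Well inside the boundary, a modest number of basis functions reproduces
# the exact kernel to high accuracy.
```

The practical distinction from a generic truncated KL expansion is that the eigenfunctions here are kernel-independent (only the domain matters), so the basis can be precomputed and the kernel hyperparameters enter only through the diagonal weights S.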

Minor Comments:

1) I don't think the stringency index is ever defined, nor is its relevance stated. It is cited, but I shouldn't have to look it up elsewhere.

2) The sentence beginning "The overdispersion parameter" is not a complete sentence.

3) "i. e."

4) What is the main point of the second paragraph within the "Difference in Age Parameterization" section? Is it "this works better and is worth the additional computational cost"? Please clarify.

Reviewer #2: This paper introduces a novel method for analysing contact survey data in which the age of the contact has only been reported as an age group. The method allows estimation of contact matrices at a finer age scale. The resulting matrices are reminiscent of approaches in which a contact matrix of aggregate age groups has been smoothed (e.g. https://doi.org/10.1186/1471-2334-9-187), but the method presented here arguably makes better use of the precise recording of participant ages. In addition, the authors aim to adjust their estimates for reporting fatigue and for data collected at an aggregate level. This is then applied to analyse social contact data from Germany.

The authors used zero-mean two-dimensional Gaussian Processes to estimate the contact intensities between two ages. I am not familiar enough with this method to accurately review this technique. I have assumed that it is valid and appropriate in my review, but would recommend a specialist review to look at this aspect of the method.

Overall, this is a very interesting paper and a useful addition to the literature on contact matrices. I do have a number of comments, mainly on i) the adjustment for the aggregated number of contacts, and ii) the simulations used to validate the model.

Major comments:

i) Adjustment of the aggregated number of contacts

Both the incidence of cases and the stringency index remained relatively stable between waves 2 and 5 (figure 1). However, your model estimates a gradual increase in contacts between these periods (figures 5, 6, and 8). This is contradictory, as you would roughly expect changes in the force of infection and incidence to be proportional to the change in the number of contacts. In Figure S1, I would expect non-household contacts to increase as NPIs are relaxed. However, they remain stable, while mainly the aggregated contacts seem to increase over time. Part of that is artificial, because they could only be reported from wave 3 onwards (line 89 – please explain why there are also aggregate contacts in wave 2 in your figure S1). But I would expect to see a similar increase in non-household contacts if there was indeed a true increase in contact behaviour.

I have some doubts about the quality and accuracy of the reported aggregate numbers of contacts. Direct contacts are relatively clearly defined (physical touch and/or a short in-person conversation), but people can also have other in-person contacts that do not necessarily fit this definition (e.g. shop clerks interacting with customers). These other contacts are not as clearly defined, but are often referred to as indirect contacts. Many surveys, including POLYMOD, allowed participants to roughly estimate such indirect contacts. They are probably less important for transmission than direct contacts (no/fewer words spoken, usually of shorter duration or at a larger physical distance). The aggregate contacts referred to in this paper may also be an example of such indirect contacts (I could not see the actual wording of the questions asked), and I think this should be explored further:

- At a minimum, could you report the proportion of aggregate to individually reported contacts in each age and wave?

- In lines 93, 140, 205: It is not clear how participants with aggregate contacts are treated. Are the contacts added in Y_ab? Or do only the participants add to N_a, but their aggregate contacts not to Y_ab? If a person aged a reports 20 aggregate contacts, does that mean 20 is added to T_ta? Please clarify this in the text.

If they are added to T_ta (and thus given equal weight to contacts in Y_ab), it would be useful to explore treating them differently: either by excluding them from the analysis, or by giving them a lower weight than the direct contacts (this could potentially be fitted as a parameter in your model; you would have some identifying information if you assume that people of a similar age have, on average, the same number of daily contacts, and there are sufficient participants with and without aggregate contacts).

ii) Validation of the model through simulated contact data

The simulated datasets seem to generate realistic contact matrices with strong age-assortative mixing and off-diagonals representing child-parent contacts. The model seems able to estimate this matrix quite well, with the difference-in-age method performing better than the age-age method. However, that is perhaps not surprising, as the data are generated specifically to contain strong difference-in-age patterns.

Real-world data is a lot messier, and it would be useful to see how well the model performs against a real-world dataset where the age of the contact is known. POLYMOD has the age in years (as an integer) for both participants and most contacts. It would be useful to understand how well the model can estimate the age-granular POLYMOD matrix (assuming that it is true) from POLYMOD data converted to data in age groups. In addition, mixing patterns can look very different in populations that do not follow a rectangular age distribution (such as Germany or the UK), but where the age distribution is pyramid-shaped as is the case in many countries with high mortality rates (e.g. Kenya or Malawi). In these populations, age-assortative mixing is often less apparent in matrices as contacts between children dominate the overall contacts. It would be very useful to understand how well the model behaves for matrices in these populations.

Minor comments:

17: More difficult compared to what exactly?

22, 40: The authors write that participants record contact ages in large age categories in COVID-era studies to facilitate reporting. In many surveys, this restriction was decided by the survey implementor (the online market research platform) for ethical and privacy reasons. Note that it is not unique to COVID-era studies; e.g. POLYMOD also allowed participants to provide an age range if the exact age of a contact was not known (which is common).

120, 146: Why did you cut your matrices off above age 84? Are participants and contacts above these ages grouped together, and should the final age group be interpreted as a single 84+ group?

133: when imputing the age of the child, did you create multiple datasets to fit to? Did you redraw the age in each MCMC iteration?

138: It is unclear what is meant by a missing entry. I assume it means that not a single participant of age a and gender g reported any contact with contacts of age b and gender h, but this is not made explicit.

156, 161: Be consistent when describing contact rates, e.g. make it clear that these are rates per unit time (probably per day).

It would be useful to double-check the notation. In particular, the M, F and g, h subscripts are used interchangeably, which is not always necessary.

187, 191: It may be better to speak of survey wave rather than time or survey time

283: I can’t find any trace plots in the supplement, so it is not possible to assess the mixing and convergence of the chains. It would also be useful to have a table with all summary statistics of the chains (for the final model).

293: small typo: "studies" is duplicated

Table 1: it is hard to understand how the accuracy of the method changes from the summary statistics alone. Could these matrices also be plotted in the supplement?

368: did you use the same population distribution in adjusting the socialmixr estimates for symmetry as has been used in your model (your P_a parameter)? This will (slightly) affect the final estimated contact matrix.
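For reference, the standard reciprocity adjustment (as implemented in tools like socialmixr) depends explicitly on the population distribution, which is why the choice of P_a matters here. A minimal sketch with hypothetical numbers, enforcing that total contacts from group a to b equal total contacts from b to a:

```python
import numpy as np

def symmetrise_contact_matrix(m, pop):
    # Enforce reciprocity: pop[a] * m[a, b] == pop[b] * m[b, a].
    # m[a, b] is the mean number of contacts a participant in group a
    # reports with group b; pop holds the group population sizes.
    pop = np.asarray(pop, dtype=float)
    total = pop[:, None] * m              # total contacts from a to b
    total_sym = 0.5 * (total + total.T)   # average the two directions
    return total_sym / pop[:, None]       # back to per-participant rates

m = np.array([[10.0, 2.0], [5.0, 8.0]])   # hypothetical raw mean contacts
pop = np.array([800.0, 1200.0])           # hypothetical population sizes
m_sym = symmetrise_contact_matrix(m, pop)
```

Because the correction rescales by the population ratio, using a different population distribution than the model's P_a would shift the off-diagonal entries of the final matrix.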

370: contacts with missing age information are imputed by sampling. Do the socialmixr figures show a single (random) iteration of this sampling? Similarly, how are you selecting the final fitted parameters to generate your shown contact matrices from your posterior distribution?

Figure 4: is it possible to show the intensity for the top and middle rows at the same scale as the bottom row? E.g. by dividing the value of each cell by the number of integer ages in the contact age group, and copying this value for each age in the participant age group? This would also make the comparison between age groups a bit fairer, as some are in 5-year age bands and others in 10-year age bands.

Figure 5: how are the central estimates calculated? Median of posterior samples?

Figure 5, 192: I agree with the attempted adjustment for reporting fatigue. However, it would be useful to understand how well your method is able to make this adjustment. Your rho parameter essentially rescales all contacts for people who have completed >1 survey. Could you compare the adjusted estimates in figure 5 to a model fit where all data is restricted to new participants?
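The suggested check can be motivated with a toy example (hypothetical numbers; a simplified stand-in for the paper's rho parameter) in which under-reporting grows multiplicatively with each repeat participation, so the naive mean is biased low while rescaling recovers the true rate:

```python
import numpy as np

rng = np.random.default_rng(1)

true_rate = 8.0   # hypothetical true daily contact rate
rho = 0.85        # hypothetical per-repeat under-reporting factor
repeats = rng.integers(0, 4, size=10_000)   # prior survey completions (0-3)

# Reported contacts shrink multiplicatively with each repeat participation.
reported = rng.poisson(true_rate * rho ** repeats)

naive = reported.mean()                          # biased low by fatigue
adjusted = (reported / rho ** repeats).mean()    # fatigue-corrected
```

Restricting the fit to first-time participants (repeats == 0) gives a fatigue-free benchmark against which the rho-adjusted estimates can be compared, which is essentially the comparison proposed above.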

Figure 7: The POLYMOD estimates seem very high. The mean contacts in the Mossong et al paper are around 10-15 for most age groups, and similar for Germany. Could you double check these? Have you included the estimated indirect contacts in the POLYMOD estimates? Because of different survey methodologies (and changes in behaviour), the differences between POLYMOD and COVIMOD are probably not only due to the pandemic, which should be accounted for when comparing the two surveys.

484: This is figure 7.

485: I don’t think you can conclude that there has been a sustained behavioural change compared to pre-pandemic values, as the two surveys are not directly comparable.

528: I assume this is a computational limitation? Arguably, the contact patterns by age are much more important than those by/between gender. Combining the two genders may allow fitting to a larger number of survey waves.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Revision 1

Attachments
Attachment
Submitted filename: response_to_reviewers_r1.docx
Decision Letter - Virginia E. Pitzer, Editor

Dear Mr. Dan,

We are pleased to inform you that your manuscript 'Estimating fine age structure and time trends in human contact patterns from coarse contact data: the Bayesian rate consistency model' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Reviewer 2 also noted a few minor typos that you may wish to address at this stage.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Virginia E. Pitzer, Sc.D.

Section Editor

PLOS Computational Biology

Lucy Houghton

Staff

PLOS Computational Biology

***********************************************************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: I'm satisfied.

Reviewer #2: I would like to thank the authors for their work and additional time in running the additional analyses. I am very happy with their results, and with the edits made. All my comments from my first review have been answered, and I have no further comments.

I did notice a few typos that I wanted to highlight to the authors:

Line 533: 20202

Line 614: S2 Fig is duplicated

Line 619: A is capitalized

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Kevin van Zandvoort

Formally Accepted
Acceptance Letter - Virginia E. Pitzer, Editor

PCOMPBIOL-D-22-01544R1

Estimating fine age structure and time trends in human contact patterns from coarse contact data: the Bayesian rate consistency model

Dear Dr Dan,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Zsofia Freund

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.