Covasim: An agent-based model of COVID-19 dynamics and interventions

The COVID-19 pandemic has created an urgent need for models that can project epidemic trends, explore intervention scenarios, and estimate resource needs. Here we describe the methodology of Covasim (COVID-19 Agent-based Simulator), an open-source model developed to help address these questions. Covasim includes country-specific demographic information on age structure and population size; realistic transmission networks in different social layers, including households, schools, workplaces, long-term care facilities, and communities; age-specific disease outcomes; and intrahost viral dynamics, including viral-load-based transmissibility. Covasim also supports an extensive set of interventions, including non-pharmaceutical interventions, such as physical distancing and protective equipment; pharmaceutical interventions, including vaccination; and testing interventions, such as symptomatic and asymptomatic testing, isolation, contact tracing, and quarantine. These interventions can incorporate the effects of delays, loss-to-follow-up, micro-targeting, and other factors. Implemented in pure Python, Covasim has been designed with equal emphasis on performance, ease of use, and flexibility: realistic and highly customized scenarios can be run on a standard laptop in under a minute. In collaboration with local health agencies and policymakers, Covasim has already been applied to examine epidemic dynamics and inform policy decisions in more than a dozen countries in Africa, Asia-Pacific, Europe, and North America.

following changes to the manuscript to hopefully describe these features of Covasim in more detail:
• In the Introduction: "Overall, the design principle we followed with Covasim was to make common usage patterns as simple as possible, while still giving the user the ability to customize virtually all aspects of the simulation. For example, Covasim comes pre-loaded with demographic data for each country (Section 2.4), but users can also define custom populations and contact networks down to the level of a single city (25) or even university (26). In addition, Covasim includes six built-in interventions (Section 2.5), but custom interventions of arbitrary complexity can also be defined (Fig. 9). In addition, Covasim's high performance for an agent-based model, achieved via dynamic rescaling (Section 2.6.2) and array-based computations (Section 2.7.1), means that most analyses can be run on a standard laptop, removing the need to use a high-performance computing cluster except for large parameter sweeps or model calibrations (Section 2.6.8)."
6, the unit of the y-axis of the cpu-time plot is unclear. It doesn't seem to be cpu-seconds per simulation day, as this would be a few orders of magnitude off from the performance example described in the text and the linear increase labelling in the plot.
We have clarified that the simulations are run over 100 days. In addition to updating the y-axis label, we have added the following text: "Covasim performance in terms of processor usage (top) and memory usage (bottom), for the number of agents shown, simulated for 100 days. There is nearly linear scaling over three orders of magnitude of population size." We have used a unit of 100 days since this is relevant to the durations users typically run the model for (initially 60-180 days, unfortunately now 400+!).
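As a rough cross-check of the scaling described here, the sketch below estimates single-core run time from population size and duration. It assumes perfectly linear scaling at the figure of roughly 7 million simulated person-days per second quoted in the response to Reviewer 2; the function name and constant are illustrative and are not part of Covasim, and real timings will depend on hardware and on the number of contacts per agent.

```python
# Illustrative only: assumes perfectly linear scaling at a fixed throughput.
# The constant is the approximate figure quoted in the response; it is hardware-dependent.
PERSON_DAYS_PER_SECOND = 7e6


def estimate_runtime_seconds(pop_size, n_days, throughput=PERSON_DAYS_PER_SECOND):
    """Estimate single-core run time as (agents x days) / throughput."""
    person_days = pop_size * n_days
    return person_days / throughput


if __name__ == "__main__":
    # Three orders of magnitude of population size, each simulated for 100 days
    for pop_size in (1e3, 1e4, 1e5, 1e6):
        seconds = estimate_runtime_seconds(pop_size, n_days=100)
        print(f"{int(pop_size):>9,d} agents x 100 days: ~{seconds:.3f} s")
```

For 100,000 agents over 100 days (10 million person-days), this gives roughly 1.4 s, which falls inside the 1.2-1.5 s range reported for version 2.1.1.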
* We had the impression that, in its current state, the model is best suited to simulating propagation at the city level. How well could the model be extended to include another higher-level network, for example the propagation between cities, states or countries? Some sort of mobility data is often available, which allows direct modelling of the network without making too many assumptions. If you find this point relevant, it might be interesting to touch on it in the discussion.
Indeed, as of now, Covasim has been used to model populations as a single region, without the more detailed information needed to extend the networks to include geography and mobility between different regions such as cities, states, or even countries. When we first started developing the model, many locations around the world were showing significantly reduced mobility and encouraging social distancing, so we focused our model development on capturing other aspects of COVID-19 transmission instead of mobility. We agree with the reviewer that mobility data could be integrated to model the movement of agents to different regions, and our team is currently working on this. Part of what makes agent-based models both powerful and complex to develop is the need to determine which agents will move where and when, especially in the absence of data with these details. To our knowledge, most available mobility data indicate the change in the volume of visitors or traffic to locations (e.g., Google Mobility Trends, SafeGraph). For the kind of network mobility suggested here, we are looking into data on the mobility flows between regions (cell-phone-based datasets, or even Facebook's Commuting Zones data) to inform where agents are going depending on their home location and the ages of mobile agents. We thank the reviewer for their suggestion and have added the following text to the discussion section on this point: "With the deployment of vaccines come additional questions and interest regarding the lifting of mobility restrictions and social distancing guidelines, as well as questions about equitable vaccine distribution to different populations around the world. Additional Covasim development of data-driven modeling of mobility will help address the modeling and identification of the risk of importation to regions with fewer resources for early detection and treatment as pre-pandemic mobility gradually returns to parts of the world."

Reviewer 2

Comment Response
This paper presents Covasim which is a comprehensive agent-based model for Covid-19.
The paper describes the underlying model and software, which is open-source and has been developed by a team from multiple institutions. It does not describe the details of calibration methods or results, which have been presented by the authors and external users of Covasim in other publications (i.e., not in scope for this paper). The model includes all the key aspects for modelling both the dynamics of the virus and disease, as well as interventions to reduce infections. The software is written to a high standard, transparent, and easy to use and extend. A testament to this is that it has already been used by a number of external researchers beyond the core development team and has been used to advise policymakers in multiple countries. The paper is well written, clear and easy to read.

Specific comments and questions:
Thank you for the positive comments.
1. The default parameters correspond to a doubling time of 4-6 days and an R0 of 2.2-2.7. Is the range stochastic uncertainty? If so, what is the mean? Assuming the mean of these ranges, the doubling time seems a little high and the R0 seems a little low, certainly when compared to the early stages of the epidemic in Europe.
Since these numbers are so context-specific, we have revised the manuscript to make it clearer that default values should not be used without adjustment: "The value of β = 0.016 that is currently used as the default in Covasim was initially based on calibrations to data from Washington and Oregon states. However, this default value is too low for high-transmission contexts such as New York City or Lombardy (48), and may be too high for low-transmission contexts such as India (49). Hence, this parameter must be calibrated by the user to match local epidemic data, as described in Section 2.6.8." We have also tried to clarify the language indicating that the ranges are due to the factors mentioned: "For a well-mixed population where each individual has an average of 20 contacts per day, a value of β = 0.016 corresponds to a doubling time of roughly 4-6 days and an R0 of approximately 2.2-2.7, with the exact value depending on the population size, age structure, and other factors."
2. Fig 2 - Viral load timing. It seems that out of those who do go on to develop symptoms, their viral load (and thus infectiousness) will be zero prior to the onset of symptoms or non-zero for at most one day. What is the breakdown in transmission by symptom status of the source? Is the epidemic predominantly being driven by transmission from symptomatic individuals? What is the mean generation time?
It is true that in the figure shown, viral load rises at most one day before symptoms. However, in the model, individuals can become infectious more than five days prior to showing symptoms. This is shown in Fig. 2D of the companion paper, reproduced here: We have regenerated Fig. 2 using a different random seed to include a case where viral load rises more than one day before symptoms, and added a citation to the aforementioned preprint: "Viral loads for a representative sample of individuals given default parameter values are shown in Fig. 2. The proportion of transmissions by asymptomatic, presymptomatic, and symptomatic individuals varies by context; estimated proportions for Seattle are shown in (25)."
3. Fig 3 - not really necessary, all the information is in Fig 4 (which is very clear).
We (the authors) had a lively debate regarding this figure, and found there were a variety of opinions regarding its value (as well as the clarity of Fig. 4, though we appreciate the reviewer's comment). Given the difficulty of explaining and representing network connectivity, we did not want to remove Fig. 3 for the subset of readers who may find it helpful. Instead, we have added a second panel to it, which we hope bridges between the abstract-but-intuitive representation in (now) Fig. 3A and the concrete-but-unintuitive representation in Fig. 4. The new Fig. 3B uses data from the model, as in Fig. 4, but shows connections across layers for individuals, as in Fig. 3A; unlike either Fig. 3A or Fig. 4, it also shows default weights for each connection. We appreciate that there may be relatively few readers who get value out of all of Fig. 3A, Fig. 3B, and Fig. 4; however, we wish to keep all three to communicate with the broadest readership possible.
We also thank the reviewer for bringing to our attention the issue regarding uninfected people. Although the speed does not depend on the number of infected agents, this comment prompted us to ask why not, and hence rethink the logic of the innermost loop. To our surprise, a change in array indexing resulted in a significant overall speed increase. This change, along with other updates, has resulted in a 3.7-fold speed increase in the latest version of Covasim (v2.1.1) compared to when benchmarking was originally performed (v1.7.0). Fig. 6 has been updated to reflect the new results. To test this result locally, run:

    import covasim as cv
    # 100,000 agents x 100 days = 10 million person-days
    sim = cv.Sim(pop_size=100e3, n_days=100)
    sim.initialize()  # create the population and contact networks
    sim.run()

This simulation, for 10 million person-days, should take about 1.2-1.5 s to run for version 2.1.1.
We have also clarified the text in several places: "Performance scales linearly with population size [...] single-core compute time [...] scales at a rate of roughly 7 million simulated person-days per second of CPU time. [...] One consequence of the array-based implementation is that compute time depends on the number of agents and the number of connections per agent, but is independent of the number of infected agents; this is because uninfected agents are simply represented as zeros in the transmission probabilities vector. [...] Covasim [...] can also be adapted easily to other parallel processing libraries such as Celery and Dask. Although in some special situations it is possible to split a single simulation across multiple cores, parallel processing is used primarily to run multiple independent simulations simultaneously, such as for uncertainty analyses or calibration."
We have added the following text to that section: "The test suite includes unit tests (e.g., checking that sampling functions produce the desired distributions; that simulations loaded from file exactly match the original), functional tests (e.g., that a simulation run with a particular analyzer produces a plot), and end-to-end "scientific" tests (e.g., that an increase in mortality rate leads to more deaths, while adding NPIs leads to fewer
Case study - is a nice example which has been presented in a prior publication. What is the basis of the forecast interval? Is this the stochastic uncertainty of the model in a finite size population? Is it a Bayesian interval from uncertainty in the calibration of the model parameters?
It is not a rigorous Bayesian credible interval, since it is a combination of stochastic and parametric uncertainty, and since the likelihood function is not well defined. We have clarified this in the manuscript in several places:
• Section 2.6.8: "Intuitively, most distributional assumptions mean that larger errors imply a lower log-likelihood. However, we do not make explicit distributional assumptions, so caution is advised with treating them as statistically rigorous likelihoods."
• Section 3.1: "Since these forecast intervals are typically produced by a combination of both stochastic variability ("aleatory uncertainty") and imperfect knowledge of the "true" parameter values ("epistemic uncertainty"), they should not be interpreted as statistically rigorous Bayesian credible intervals (80,81)."
• Section 3.3: "We then ran the model with eight different calibrated model parameter sets (with multiple parameter sets used to capture parametric uncertainty) to (a) estimate unobserved quantities [...] (the large uncertainty interval for deaths is a consequence of the small numbers of events being predicted, i.e., fewer than 10 deaths per day; this forecast interval includes both parametric and stochastic uncertainty, as described in Section 3.1)".
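To illustrate how a forecast interval of this kind can mix stochastic and parametric uncertainty, the sketch below runs a deliberately simple stochastic SIR-style model (a toy stand-in, not Covasim) for several values of β and several random seeds, then summarizes the pooled outcomes by their first decile, median, and ninth decile. All names, the toy model, and the parameter values other than β = 0.016 and 20 contacts per day are assumptions made for illustration; the resulting range is an empirical spread of outcomes, not a Bayesian credible interval.

```python
import random
import statistics


def toy_epidemic(beta, seed, n_days=60, pop_size=2000, n_contacts=20,
                 recovery_days=8, init_infected=10):
    """Toy well-mixed stochastic SIR model; returns cumulative infections."""
    rng = random.Random(seed)  # the seed controls stochastic ("aleatory") variability
    susceptible = pop_size - init_infected
    infectious = init_infected
    cumulative = init_infected
    for _ in range(n_days):
        # Daily infection probability for one susceptible with n_contacts random contacts
        p_inf = 1 - (1 - beta * infectious / pop_size) ** n_contacts
        new_inf = sum(rng.random() < p_inf for _ in range(susceptible))
        new_rec = sum(rng.random() < 1 / recovery_days for _ in range(infectious))
        susceptible -= new_inf
        infectious += new_inf - new_rec
        cumulative += new_inf
    return cumulative


def forecast_interval(betas, seeds):
    """Pool outcomes over parameter sets and seeds; return (10%, median, 90%)."""
    # Different betas stand in for parametric ("epistemic") uncertainty
    outcomes = [toy_epidemic(beta, seed) for beta in betas for seed in seeds]
    deciles = statistics.quantiles(outcomes, n=10)
    return deciles[0], statistics.median(outcomes), deciles[-1]


if __name__ == "__main__":
    low, mid, high = forecast_interval(betas=[0.014, 0.015, 0.016, 0.017],
                                       seeds=range(10))
    print(f"Cumulative infections: median {mid:.0f} (10%-90%: {low:.0f}-{high:.0f})")
```

Because each run is independent, the loop over parameter sets and seeds is the part that would be distributed across cores in a real analysis. Holding β fixed and varying only the seed isolates the stochastic component of the spread.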