Real-time estimation of the epidemic reproduction number: Scoping review of the applications and challenges

The time-varying reproduction number (Rt) is an important measure of transmissibility during outbreaks. Estimating whether and how rapidly an outbreak is growing (Rt > 1) or declining (Rt < 1) can inform the design, monitoring and adjustment of control measures in real-time. We use a popular R package for Rt estimation, EpiEstim, as a case study to evaluate the contexts in which Rt estimation methods have been used and identify unmet needs which would enable broader applicability of these methods in real-time. A scoping review, complemented by a small EpiEstim user survey, highlights issues with the current approaches, including the quality of input incidence data, the inability to account for geographical factors, and other methodological issues. We summarise the methods and software developed to tackle the problems identified, but conclude that significant gaps remain which should be addressed to enable easier, more robust and applicable estimation of Rt during epidemics.


Overview
In this supplementary information, we provide more detailed results from our scoping review, including additional exploration to determine the usability of each identified R package and tool, more in-depth analyses of our questionnaire data, and the questionnaire itself.

Exploration of R package and tool usability
We performed some additional exploration to determine the usability of each R package and tool (Table B & Table 2 in the main text). Usability was assessed by ease of installation, the availability of adequate documentation or tutorials, and computational speed. Table B provides the full breakdown of how each score was defined.

Ease of installation
Ease of installation was determined by the availability of installation instructions, the options for how the R package could be installed, the R package dependencies, and whether other system dependencies were required for either running or installing the packages. As EpiFilter is not an R package, some of the criteria in this section are not applicable. To use EpiFilter, the repository needs to be cloned from GitHub [243].

Installation Instructions
Installation instructions may be available through an adequate README file or website, e.g. a "quick start" or introductory page. Instructions were found for all R packages except for APEestim and bayEStim.

Options for installation
Installation is easiest if the R package is available on the Comprehensive R Archive Network (CRAN), in which case it can be installed simply by calling the "install.packages()" function. earlyR, epicontacts and EpiNow2 are available on CRAN. All remaining R packages can be installed using the devtools R package, which enables packages to be installed and built directly from source (e.g. GitHub).
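The two installation routes described above can be sketched as follows. This is a minimal illustration rather than a recommended workflow: the helper install_rt_package() and its dry_run argument are hypothetical names introduced here, and "owner/APEestim" is a placeholder, since the actual GitHub repository paths are not given in this section. Only install.packages() and devtools::install_github() are existing functions.

```r
# Packages reported in the text as available on CRAN.
cran_pkgs <- c("earlyR", "epicontacts", "EpiNow2")

# Hypothetical helper: choose the installation route for a package.
# With dry_run = TRUE (the default) nothing is installed; the function
# only returns the command that would be run.
install_rt_package <- function(pkg, github_repo = NULL, dry_run = TRUE) {
  if (pkg %in% cran_pkgs) {
    cmd <- sprintf('install.packages("%s")', pkg)
    if (!dry_run) install.packages(pkg)
  } else {
    if (is.null(github_repo)) {
      stop("A GitHub 'owner/repo' path is needed for ", pkg)
    }
    cmd <- sprintf('devtools::install_github("%s")', github_repo)
    # devtools may itself require Rtools (Windows) or Xcode (Mac).
    if (!dry_run) devtools::install_github(github_repo)
  }
  cmd
}

install_rt_package("EpiNow2")
# "owner/APEestim" is a placeholder, not the real repository path.
install_rt_package("APEestim", github_repo = "owner/APEestim")
```

Setting dry_run = FALSE would perform the installation, which requires network access and, for the GitHub route, a working devtools installation.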

Speed
Computational speed was assessed by each author of this study, using computers with different specifications (see Table B).
The main function in each package/tool was taken from provided examples (if available) and wrapped in the system.time() R function to measure the execution time. The main function estimated the reproduction number for each R package/tool except for epicontacts, which estimates the serial interval. We chose the following classifications: <10 seconds = very good, 10 seconds to 5 minutes = good, >5 minutes = poor. The classification allocated to each package was based on the agreement of at least 2 out of the 3 computers. We note that such direct comparisons of runtimes may not be fair, as the examples provided by each package, which we used to assess speed, vary in terms of the dataset used, model complexity, and dimensionality of the reproduction number to estimate. Nevertheless, we assume that examples will always be relatively simple and therefore their computational speed may be a good overall indicator of the speed of reproduction number estimation in general using a given package.
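The timing and classification procedure can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the script used in the study: classify_runtime() and majority_class() are hypothetical helper names, Sys.sleep() stands in for a package's main function, and a runtime of exactly 10 seconds is assumed to fall in the "good" category.

```r
# Map an elapsed time in seconds to the classification used in the text.
# Assumption: exactly 10 s counts as "good" and exactly 300 s as "good".
classify_runtime <- function(seconds) {
  if (seconds < 10) {
    "very good"
  } else if (seconds <= 300) {
    "good"
  } else {
    "poor"
  }
}

# Return the class agreed on by at least 2 of the computers, or NA if
# no class reaches that level of agreement.
majority_class <- function(times_sec) {
  classes <- vapply(times_sec, classify_runtime, character(1))
  counts <- table(classes)
  if (max(counts) >= 2) names(counts)[which.max(counts)] else NA_character_
}

# Time a stand-in workload; in practice this would wrap the package's
# main function as taken from its provided example.
elapsed <- system.time(Sys.sleep(0.1))[["elapsed"]]
classify_runtime(elapsed)

# Hypothetical runtimes (in seconds) from three computers.
majority_class(c(8, 12, 9))
```

Returning NA when all three computers disagree is a choice made for this sketch; the text does not say how such a case was handled.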
In our experience, of those that estimated the reproduction number, the fastest packages were APEestim, earlyR and EpiFilter, all taking <10 seconds. Epidemia took from <1 minute to nearly 4 minutes, depending on whether the model was basic (uses the renewal equation to propagate infections) or extended (adds variance to this process). Interestingly, the extended model took less time to run, which, as discussed on the package website, may be due to the posterior distribution being easier to sample from [251]. Meanwhile, estimating the reproduction number using the epinow() function in EpiNow2 took longer, from 3 up to 13 minutes. This took much longer when estimating the reproduction number for each geographical region in turn using regional_epinow(), ranging from 8 to 25 minutes. Both Epidemia and EpiNow2 give an indication of progress whilst estimation is ongoing. This is particularly important for EpiNow2 given the lengthy execution time, but the progress bar only refreshes after long intervals, so the user is not updated regularly.

Table B. Usability of each R package or tool * identified in the scoping review. This table shows a full breakdown of how the classifications (very good=✓✓, good=✓, poor=✗) were determined for the "additional exploration" section of Table 2 within the main text. For the 'ease of installation' and 'documentation and tutorials' sections, each criterion was allocated a score, shown in square brackets, and the overall classification was determined by the sum of the scores. For the 'speed' section, each author used the system.time() function in R to determine the run time of the main function of the package available in the provided examples. ** The classification (<10s = ✓✓, 10s to 5min = ✓, >5min = ✗) was decided based on the time category agreed on by at least 2 out of the 3 computers.

Columns: Theme; Usability criterion; APEestim

Theme: Ease of installation
Usability criterion: Installation instructions available (e.g., detailed README file or webpage) [1]
(Note: devtools may require the installation of Rtools (Windows) or Xcode (Mac))
Overall ease of installation
Scoring:

Theme: Documentation and tutorials
Usability criterion: Function documentation (description of inputs and outputs) [1]

Questionnaire: additional results
The full questionnaire is shown in the next section. In total, there were 17 responses to the questionnaire from respondents in 11 different countries, including the USA, Canada, France, Indonesia, Austria, Bermuda, Germany, India, Peru, Uruguay, and the UK (Figure B). In an attempt to reach a wide variety of people, the questionnaire was distributed to a contact list of known EpiEstim users, in addition to being shared via the Imperial MRC Centre for Global Infectious Disease Analysis Twitter account (with over 141,000 followers). Nonetheless, the number of respondents was small, and it is possible that the sample may be more representative of epidemiologists, or those with a high level of training in the field, as opposed to other researchers who may experience greater barriers in terms of usability. Despite these limitations, the questionnaire helped us to limit the risk of publication bias and was useful to reinforce findings from our literature review whilst ensuring that we didn't overlook any issues that would be less apparent from the literature search alone.
All questionnaire respondents had performed analysis using EpiEstim to investigate COVID-19 and some also used it for Ebola Virus Disease, Norovirus and Influenza (Figure D). In terms of input datasets, all respondents had access to daily case incidence data, 24% had daily death data, and 12% had weekly case incidence data (Figure D). Only one respondent reported having data that distinguished between local and imported cases and none reported having data containing infector-infected pairs. On a scale from 1 to 5 (1 representing "badly" and 5 "very well") respondents were asked how well EpiEstim met their needs. All respondents selected between 3 and 5, with 94% selecting either 4 or 5 (Figure F). Despite this, twelve of the respondents (71%) reported having a technical or methodological issue when using EpiEstim.
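Because the sample is small, each reported percentage corresponds to a whole number of respondents out of 17. The short sketch below shows the conversion in both directions. The helper names pct() and counts_matching() are introduced here for illustration, and percentages are assumed to have been rounded to the nearest whole number; only the counts for "twelve of the respondents (71%)" and "one respondent (6%)" are stated in the text, so any other count obtained this way is an inference, not a reported figure.

```r
n_respondents <- 17

# Percentage of respondents, rounded to the nearest whole number.
pct <- function(count, total = n_respondents) {
  round(100 * count / total)
}

# Whole-number counts that would round to a reported percentage.
counts_matching <- function(reported_pct, total = n_respondents) {
  counts <- 0:total
  counts[pct(counts, total) == reported_pct]
}

pct(12)              # 71, matching "twelve of the respondents (71%)"
pct(1)               # 6, matching "one respondent (6%)"
counts_matching(94)  # 16, the only count out of 17 that rounds to 94%
```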
Almost half of respondents (47%) said they thought none of EpiEstim's features could be improved, whereas 35% thought that usability could be improved (Figure F). One respondent (6%) thought speed could be improved and 24% selected
The questionnaire highlighted similar challenges to the scoping review, including issues with incidence data, geographical factors and suggestions to extend the capability of the package to address practical and logistical issues (Table C).
Respondents wanted to account for reporting delays and time-varying reporting rates, to use weekly as well as daily incidence and different formats of data, and to have more intuitive plotting options for the results. Regarding geographical issues, they suggested making it easier to estimate the reproduction number for different regions simultaneously. For more practical purposes, they proposed expanding the methodology to allow for the projection of hospital bed occupancy, thereby assisting logistical planning in hospitals. As discussed in the main text, the questionnaire also revealed an additional three themes: usability, speed, and compatibility.
It is important to note that, as shown in section 5, for four of the survey questions that required typed answers as opposed to multiple choice, we used example responses to ensure that questions were clear for respondents. These examples were based on issues previously reported in correspondence with users of EpiEstim. It is possible that these examples may have biased the results by making respondents more likely to state these particular issues compared to others.