Understanding an evolving pandemic: An analysis of the clinical time delay distributions of COVID-19 in the United Kingdom

Understanding and monitoring the epidemiological time delay dynamics of SARS-CoV-2 infection provides insights that are key to discerning changes in the phenotype of the virus, the demographics impacted, the efficacy of treatment, and the ability of the health service to manage large volumes of patients. This paper analyses how the pandemic has evolved in the United Kingdom through temporal changes in the epidemiological time delay distributions for clinical outcomes. Using the most complete clinical data presently available, we analysed the time from infection to a clinical outcome with a doubly interval censored Bayesian modelling approach. Across the periods of the pandemic defined as epidemiologically distinct, the modelled mean ranges from 8.0 to 9.7 days for infection to hospitalisation, 10.3 to 15.0 days for hospitalisation to death, and 17.4 to 24.7 days for infection to death. The time delay from infection to hospitalisation has increased since the first wave of the pandemic. A marked decrease was observed in the time from hospitalisation to death and from infection to death at times of high incidence, when hospitals and ICUs were under the most pressure. There is a clear relationship with age: the youngest and oldest age groups have the shortest time delays before a clinical outcome. A statistically significant difference was found between genders for the time delay from infection to hospitalisation, but not for hospitalisation to death. The results by age group indicate that younger people who require clinical intervention for SARS-CoV-2 infection are more likely to require earlier hospitalisation, which leads to a shorter time to death; this suggests that the individuals in these groups who succumb to infection are largely more vulnerable. The difference found between genders in the time from infection to hospitalisation is suggestive of differences in healthcare-seeking behaviour.


Introduction
The COVID-19 pandemic has had an unprecedented impact on the global population. In the United Kingdom, as of 24 February 2021, 4,194,785 cases had been observed [1] […] healthcare system. The changing landscape of COVID-19 prevalence resulting from non-pharmaceutical interventions (NPIs) has led to a variable clinical impact. Moreover, since the onset of the pandemic, the virus has had varying temporal consequences for different demographics, affecting the time delay parameters; this was particularly pronounced during the March 2020 outbreak in care homes [2]. Understanding these temporal dynamics of infection is key to calculating the infection hospitalisation rate (IHR) and the infection fatality rate (IFR). This in turn has implications for accurately modelling the pandemic and formulating effective public health policy. For instance, changes to the time delay dynamics are central to estimating the incubation and illness periods, which are essential for defining appropriate quarantine periods for those who have been infected or exposed through a contact.
Tracking phenotypic changes in the virus is becoming more relevant given the extent of antigenic drift observed in SARS-CoV-2 [3] and worrying mutations [4] that may affect vaccine effectiveness. There is limited contemporary research on the time from infection to clinical outcomes, and we found no study that addresses its temporal changes or examines in detail the differences by gender or age. Much of the literature that estimates the time delay dynamics [5] has focused on the outbreak in Wuhan, China, in late 2019 and early 2020. From this period, Linton et al. (2020) [6] calculated the mean time from infection to hospitalisation as 9.7 days (95% CI: 5.4, 17.0), from hospitalisation to death as 13.0 days (95% CI: 8.7, 20.9), and from infection to death as 20.2 days (95% CI: 15.1, 29.5). However, these estimates are predominantly derived from small samples and, given the pandemic nature of this outbreak, depend on the demographic structure, the quality of the healthcare system, and the epidemiological context in which they were collected.
The time between infection and a clinical outcome for infectious diseases is not precisely observed and is therefore often 'coarsely' recorded; that is, we observe only a subset of the sample space in which the true but unobservable data lie [7]. Modelling of this type of data therefore needs to adjust for its imprecise nature; otherwise the estimates are unlikely to accurately capture the most likely value (the peak) or the tails of the distribution, which can be important for informing key elements of public health policy. In a meta-analysis of studies published on the incubation period of COVID-19, McAloon et al. [5] found that this has been overlooked in much of the current literature. In this study, we employed a doubly interval censored modelling approach [8] that seeks to capture all the available information on the clinical time delay distribution.
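As a concrete illustration of 'coarse' recording, suppose both event times are known only to the calendar day. A delay computed from the two dates can then differ from the true delay by up to, but not including, one day in either direction, so a recorded difference of d days only tells us that the true delay lies in (d − 1, d + 1). The short simulation below, with a hypothetical lognormal delay, demonstrates this; it is an illustration, not part of the authors' pipeline, and the function name is hypothetical.

```python
import math
import random


def simulate_coarsened_delays(n: int, seed: int = 1):
    """Simulate event pairs whose times are recorded only as calendar days.

    Returns (observed_days, true_delay) pairs, where observed_days is the
    difference between the two recorded (floored) dates.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        first = rng.uniform(0.0, 100.0)       # true time of the first event, in days
        delay = rng.lognormvariate(2.0, 0.5)  # hypothetical true delay
        second = first + delay                # true time of the second event
        observed = math.floor(second) - math.floor(first)
        pairs.append((observed, delay))
    return pairs


for observed, delay in simulate_coarsened_delays(5):
    print(f"recorded {observed} days, true delay {delay:.2f} days")
```

Each recorded value is therefore an interval, not a point, at both ends of the delay, which is what the doubly interval censored likelihood accounts for.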
The time delay from infection to a clinical outcome has changed in response to the evolution of intrinsic and extrinsic factors across the United Kingdom. Using the most complete clinical data presently available, we have calculated the time delay distributions for hospitalisations and deaths across distinct epidemiological periods of the pandemic and compared them. These periods were identified as temporally distinct intervals that were strongly associated with changes in the prevalence of SARS-CoV-2. We have further modelled differences by age group and by gender to understand and analyse distinctions between demographic groups.

Epidemiological data
Two Public Health England datasets were used in this study: the mortality line list and the Severe Acute Respiratory Infection Watch (SARI) line list [9]. The data range from 1 January 2020 to 20 January 2021. The models were developed using the dates of symptom onset, hospitalisation, and death in order to measure three quantities of interest: the time from infection to hospitalisation, the time from hospitalisation to death, and the time from infection to death.

Data preparation
The two datasets used in this study were merged and split in order to measure the three quantities of interest. Rows with missing values and duplicates were then dropped. As the datasets were anonymised, it was assumed that if two lines had the same local authority area, sex, age, start date and end date, then they referred to the same person. Additionally, the data were filtered to remove erroneous negative time delays and extreme outliers prior to model fitting. The data were then split into distinct epidemiological periods: the first wave (January to May 2020), the summer (June to August 2020), the second wave (September to November 2020), and the third wave (December 2020 to January 2021). The periods were defined by clear distributional changes in the time delays, which showed an evident seasonality with distinct peaks in prevalence and hospital admissions:
• 1st Period: The first period was characterised by a sharp increase in SARS-CoV-2 incidence, with estimates peaking at 280,000 [10]; across the period, daily hospital admissions had a median of 1,466 [1], and this precipitated the first national lockdown.
• 2nd Period: The second period saw a loosening of NPIs, with the median for daily hospital admissions dropping to 162 [1] and incidence estimates peaking at 10,700 [10].
• 3rd Period: The third period was characterised by the introduction of tiers that determined the extent of the NPIs required locally. It saw an increase in the median for daily hospital admissions to 1,025 [1] and a peak incidence estimate of 66,800 [10].
• 4th Period: The fourth period had the highest median for daily hospital admissions, 2,529, and incidence estimates peaking at 157,000 [10]; a national lockdown began in the middle of this period.
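The cleaning steps described above (dropping incomplete rows, treating lines with the same local authority area, sex, age, start date and end date as one person, and removing negative delays and extreme outliers) can be sketched in plain Python. This is a minimal sketch, not the authors' code: the `Record` type and `clean` function are hypothetical, and the outlier threshold `MAX_DELAY_DAYS` is an assumed value because the paper does not state the one it used.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional

# Hypothetical cut-off for "extreme outliers"; the paper does not state its threshold.
MAX_DELAY_DAYS = 100


@dataclass(frozen=True)
class Record:
    local_authority: Optional[str]
    sex: Optional[str]
    age: Optional[int]
    start: Optional[date]  # date of the first event, e.g. symptom onset or admission
    end: Optional[date]    # date of the outcome, e.g. admission or death


def clean(records: Iterable[Record]) -> List[Record]:
    """Drop incomplete rows, assumed duplicates, negative delays and extreme outliers."""
    seen = set()
    kept = []
    for rec in records:
        key = (rec.local_authority, rec.sex, rec.age, rec.start, rec.end)
        if any(field is None for field in key):
            continue  # missing value
        if key in seen:
            continue  # same area, sex, age and dates: treated as the same person
        seen.add(key)
        delay = (rec.end - rec.start).days
        if delay < 0 or delay > MAX_DELAY_DAYS:
            continue  # erroneous negative delay or extreme outlier
        kept.append(rec)
    return kept
```

Deduplication happens before the delay filter, mirroring the order in the text; the key is the exact combination of fields the authors assumed identifies a person.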
In addition, to assess the dependence of the time delays on gender and age, we split the combined data by ten-year age band and by gender, using the data from January 2020 to November 2020. These dates were selected so that the full distribution of hospitalisations and deaths had been observed. We did not have reliable data on the time from infection to symptom onset, so this was informed by a literature estimate [5]. For each period we then defined two categories of data according to the date used to assign a record to the period: category A, based on the date of the initial event (symptom onset or hospital admission), and category B, based on the date of the clinical outcome (hospitalisation or death). This was done to address the inherently 'coarse' [8] nature of the data, which arises in part from how it was recorded.
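The period assignment and age banding might look as follows. This sketch assumes, as the Results section implies, that category A assigns a record to a period by the date of its first event and category B by the date of its outcome; the period boundaries are taken from the Data preparation section, and all names (`period_of`, `age_band`, `categorise`) are hypothetical.

```python
from datetime import date
from typing import Optional

# Start dates (inclusive) of the four periods named in the Data preparation section.
PERIOD_STARTS = [
    (date(2020, 1, 1), "first wave"),   # January to May 2020
    (date(2020, 6, 1), "summer"),       # June to August 2020
    (date(2020, 9, 1), "second wave"),  # September to November 2020
    (date(2020, 12, 1), "third wave"),  # December 2020 to January 2021
]
DATA_END = date(2021, 1, 20)  # last date covered by the line lists


def period_of(day: date) -> Optional[str]:
    """Return the period containing `day`, or None if it lies outside the study window."""
    if day < PERIOD_STARTS[0][0] or day > DATA_END:
        return None
    label = None
    for start, name in PERIOD_STARTS:
        if day >= start:
            label = name  # keep the latest period whose start is not after `day`
    return label


def age_band(age: int) -> str:
    """Ten-year age band, e.g. 45 -> '40-49'."""
    lower = (age // 10) * 10
    return f"{lower}-{lower + 9}"


def categorise(start_date: date, end_date: date) -> dict:
    """Period of a record by its first event (category A) and by its outcome (category B)."""
    return {"A": period_of(start_date), "B": period_of(end_date)}
```

A record with symptom onset in late May and death in June falls in the first wave under category A but in the summer under category B, which is exactly the tail behaviour discussed in the Results.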

Time delay distribution modelling
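We define two events, A and B, and the times at which these events occur, $\alpha$ and $\beta$, with $\alpha < \beta$. However, $\alpha$ and $\beta$ are not known precisely: we only know that $\alpha \in [\alpha_L, \alpha_R]$ and $\beta \in [\beta_L, \beta_R]$. In addition, let O be an unobserved event that occurs a time $t_0$ prior to A. The probability density function governing the time from O to A is $p(t_0)$. Let the time between events O and B be T, a continuous random variable with probability density function $f(t; \theta)$ dependent on parameters $\theta$. Since O occurs at $\alpha - t_0$, we have $T = \beta - \alpha + t_0$, and we express the joint probability of all three events as

$$P(\alpha, \beta, t_0 \mid \theta) = p(\alpha)\, p(t_0)\, f(\beta - \alpha + t_0; \theta).$$

In the absence of information informing $p(\alpha)$, let it be a uniform distribution. In other words,

$$p(\alpha) = \frac{1}{\alpha_R - \alpha_L}, \qquad \alpha \in [\alpha_L, \alpha_R].$$

Then, we can express the likelihood of $\theta$ given an observed data point $X = (\alpha_L, \alpha_R, \beta_L, \beta_R)$ as

$$L(\theta; X) = \frac{1}{\alpha_R - \alpha_L} \int_{\alpha_L}^{\alpha_R} \int_{\beta_L}^{\beta_R} \int_{0}^{\infty} p(t_0)\, f(\beta - \alpha + t_0; \theta)\, \mathrm{d}t_0\, \mathrm{d}\beta\, \mathrm{d}\alpha.$$

For multiple data points $\mathbf{X} = \{X_i\}$, the likelihood is

$$L(\theta; \mathbf{X}) = \prod_i L(\theta; X_i),$$

and a Hamiltonian Markov chain Monte Carlo method is used within Stan [11] to find the distributions of $\theta$, given the observed data.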
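The likelihood described above can be evaluated numerically for a single observation. This is a minimal sketch, not the authors' Stan model: it assumes a lognormal f, does the integral over $\beta$ exactly as a CDF difference, and replaces the remaining integrals over $\alpha$ and $t_0$ with averages over midpoint and quantile grids rather than sampling with Hamiltonian Monte Carlo. The $t_0$ prior Lognormal(1.63, 0.50) is the literature value used in this paper; the function names and grid sizes are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def lognormal_cdf(x, mu, sigma):
    """CDF of a lognormal delay; a non-positive delay has probability 0."""
    if x <= 0.0:
        return 0.0
    return STD_NORMAL.cdf((math.log(x) - mu) / sigma)


def censored_likelihood(alpha_l, alpha_r, beta_l, beta_r, mu, sigma,
                        t0_params=None, n_alpha=50, n_t0=50):
    """Approximate likelihood of one doubly interval-censored observation.

    alpha ~ Uniform[alpha_l, alpha_r], beta in [beta_l, beta_r], and the
    O-to-B time T = beta - alpha + t0 ~ Lognormal(mu, sigma). Integrating over
    beta gives F(beta_r - alpha + t0) - F(beta_l - alpha + t0), which is then
    averaged over a midpoint grid in alpha and equally spaced quantiles of t0.
    t0_params=None stands for the half delta function (t0 = 0, A to B).
    """
    width = alpha_r - alpha_l
    alphas = [alpha_l + (i + 0.5) * width / n_alpha for i in range(n_alpha)]
    if t0_params is None:
        t0s = [0.0]
    else:
        mu0, sigma0 = t0_params
        t0s = [math.exp(mu0 + sigma0 * STD_NORMAL.inv_cdf((j + 0.5) / n_t0))
               for j in range(n_t0)]
    total = 0.0
    for a in alphas:
        for t0 in t0s:
            total += (lognormal_cdf(beta_r - a + t0, mu, sigma)
                      - lognormal_cdf(beta_l - a + t0, mu, sigma))
    return total / (len(alphas) * len(t0s))


def log_likelihood(observations, mu, sigma, t0_params=None):
    """Sum of log-likelihoods over independent (alpha_l, alpha_r, beta_l, beta_r) tuples."""
    return sum(math.log(censored_likelihood(*obs, mu, sigma, t0_params))
               for obs in observations)
```

For infection to hospitalisation or death one would pass `t0_params=(1.63, 0.50)`; for hospitalisation to death, `t0_params=None`. Summed over all possible one-day outcome windows, the likelihood of a single observation approaches 1, which is a useful sanity check.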
Within the context of this paper, the events O, A and B refer to the quantities in Table 1. We use the literature [5] to inform the time between O and A as $t_0 \sim \mathrm{Lognormal}(1.63, 0.50)$. In the specific case that the time we want to measure is in fact A to B rather than O to B, we can let $p(t_0) = \delta(t_0)$, where $\delta$ here refers to a half delta function defined on $t_0 \in \mathbb{R}_0^+$. In order to account for the right truncation present within the most recent portion of the dataset, we use a modified probability density function $f_{RT}$ that accounts for this [6], defined in terms of $F(t; \theta) = \int_0^t f(s; \theta)\, \mathrm{d}s$, the cumulative distribution function of f, and r, the exponential growth rate of type A events. In this paper there are two categories of type A event: symptom onset and admission to hospital. To calculate the growth rate, a negative binomial model was fitted to modelled incidence [12] for symptom onset and to publicly available admissions data [1] for hospitalisations.
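The effect of the growth-rate adjustment can be illustrated numerically. This sketch assumes one common form of the correction, in which, for type A events growing exponentially at rate r, the observed delay density is proportional to $f(t)e^{-rt}$; it may not be the exact expression from [6] used in the paper. The lognormal f, its parameters and the 7-day doubling time (r = ln 2 / 7) are hypothetical, and the negative binomial fit for r is not reproduced. For a lognormal f the normalising integral diverges when r < 0, so the sketch rejects negative r.

```python
import math


def lognormal_pdf(t, mu, sigma):
    """Lognormal density of a delay t (days); zero for t <= 0."""
    if t <= 0.0:
        return 0.0
    z = (math.log(t) - mu) / sigma
    return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))


def make_right_truncated_pdf(mu, sigma, r, t_max=300.0, n=30_000):
    """Return f_RT(t) = f(t) exp(-r t) / Z, with Z from the midpoint rule on [0, t_max].

    Assumes a growing (or flat) epidemic, r >= 0.
    """
    if r < 0:
        raise ValueError("this sketch assumes r >= 0")
    h = t_max / n
    norm = h * sum(
        lognormal_pdf((i + 0.5) * h, mu, sigma) * math.exp(-r * (i + 0.5) * h)
        for i in range(n)
    )

    def f_rt(t):
        return lognormal_pdf(t, mu, sigma) * math.exp(-r * t) / norm

    return f_rt


# Illustrative values: median delay exp(2) (about 7.4 days), 7-day doubling time.
r = math.log(2) / 7
f_rt = make_right_truncated_pdf(2.0, 0.5, r)
```

During growth, recently started delays are over-represented among those already completed, so the adjusted density has a shorter mean than f; with r = 0 it reduces to f itself.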

Models and assessing performance
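For each set of data points, the probability density function f is taken to follow the Lognormal, Gamma and Weibull distributions, as these are commonly used for survival data. Their probability density functions, defined for x > 0, are as follows:

$$f_{\mathrm{Lognormal}}(x; \mu, \sigma) = \frac{1}{x \sigma \sqrt{2\pi}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right),$$

$$f_{\mathrm{Gamma}}(x; a, b) = \frac{b^{a}}{\Gamma(a)}\, x^{a-1} e^{-b x},$$

$$f_{\mathrm{Weibull}}(x; k, \lambda) = \frac{k}{\lambda} \left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^{k}},$$

where $\mu$ and $\sigma$ are the mean and standard deviation of $\ln x$, a and b are the Gamma shape and rate, and k and $\lambda$ are the Weibull shape and scale. We calculated the leave-one-out cross-validation (LOO) score using Pareto-smoothed importance sampling (PSIS) and the widely applicable information criterion (WAIC) [13] for each model to compare the accuracy of the fitted Bayesian models. The WAIC is asymptotically equivalent to LOO and can be thought of as an approximation to it [14]. Therefore, LOO scores were used in conjunction with the Pareto k diagnostics and the $\hat{R}$ convergence diagnostic to assess the best model fit. Most desirable is the lowest LOO score alongside Pareto $k \le 0.7$ and $\hat{R} \le 1.05$ [13].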
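The three candidate log-densities and the WAIC can be written out in plain Python to make the comparison concrete. This is an illustration of the standard formulas, not the implementation cited in [13]: WAIC is computed as $-2(\mathrm{lppd} - p_{\mathrm{WAIC}})$ from a matrix of pointwise log-likelihoods over posterior draws, with $p_{\mathrm{WAIC}}$ the summed per-point variance. PSIS-LOO, which additionally fits a generalised Pareto distribution to the importance ratios, is not reproduced. Function names are hypothetical.

```python
import math
from statistics import variance


def lognormal_logpdf(x, mu, sigma):
    """log f for a lognormal with log-scale mean mu and sd sigma (x > 0)."""
    return -math.log(x * sigma * math.sqrt(2 * math.pi)) - (math.log(x) - mu) ** 2 / (2 * sigma ** 2)


def gamma_logpdf(x, shape, rate):
    """log f for a Gamma with the given shape and rate (x > 0)."""
    return shape * math.log(rate) - math.lgamma(shape) + (shape - 1) * math.log(x) - rate * x


def weibull_logpdf(x, shape, scale):
    """log f for a Weibull with the given shape and scale (x > 0)."""
    return math.log(shape / scale) + (shape - 1) * math.log(x / scale) - (x / scale) ** shape


def waic(loglik):
    """WAIC from loglik[s][i], the log-likelihood of point i under posterior draw s.

    Needs at least two draws; lower WAIC indicates better expected predictive fit.
    """
    n_draws = len(loglik)
    n_points = len(loglik[0])
    lppd = 0.0
    p_waic = 0.0
    for i in range(n_points):
        column = [loglik[s][i] for s in range(n_draws)]
        m = max(column)  # log-sum-exp for numerical stability
        lppd += m + math.log(sum(math.exp(v - m) for v in column) / n_draws)
        p_waic += variance(column)  # sample variance across draws
    elpd = lppd - p_waic
    return {"elpd_waic": elpd, "p_waic": p_waic, "waic": -2.0 * elpd}
```

In practice, each model's posterior draws would be plugged into the corresponding log-density for every observed delay to fill `loglik`, and the model with the lowest score preferred.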

Results
We present two sets of results in this paper: (i) the evolution of the times to clinical outcome over the course of the pandemic, and (ii) the variation in those times by sex and age group. The times to clinical outcome measured are infection to hospitalisation, hospitalisation to death, and infection to death. The modelled estimates of primary interest were informed by category A (see the section on data preparation) rather than category B, because these estimates are not influenced by historical infections in the defined periods. We report category B results (Tables A1-A3 in the S1 Appendix) as they may be useful for epidemiological modelling and for assessing external factors, such as the impact of healthcare pressure, because they capture the individuals who died or were hospitalised in that period. The choice of date category affects the whole time delay distribution: the right tail of the distribution estimated from category B data may capture some individuals infected in an earlier period, whereas estimates informed by category A may capture some hospitalisations and deaths from a later period. We were not aware of any selection bias for individuals included in the datasets used for modelling, although ascertainment bias for cases would be more evident in the earlier periods, when testing capacity was more limited. The subset of individuals in the death line list with a recorded symptom onset date reflects the reporting practices of certain testing laboratories. In the SARI dataset, highly detailed data, including symptom onset, are collected for a subset of NHS Trusts. Tables 2-4 show the distributions of these times for the four distinct periods described in the section on data preparation. Tables 2 and 4 show that the Lognormal is a better fit for the infection to hospitalisation and infection to death distributions, whereas Table 3 shows that the Weibull is a better fit for the hospitalisation to death distribution.
There is a consistent age structure for the hospitalisations and deaths, which is highly skewed towards the older demographics, irrespective of the temporal period. Noteworthy is the result that the mean time from infection to hospitalisation has remained the most constant of the three time delay quantities. This contrasts with noticeable increases observed in the time from hospitalisation to death and infection to death over the summer and early autumn months of 2020 when prevalence was lower, with declines observed in the most recent period.

Variation in time by sex and age
Modelled results by sex and age are given in Tables 5-7. Fig 1 illustrates that men had a longer time delay than women from infection to hospitalisation; this difference was statistically significant (Mann-Whitney-Wilcoxon test, p = 5.0 × 10^−15). By contrast, there was no statistically significant difference between the sexes in the time from hospitalisation to death (p = 0.93). By age, the mean time from infection to hospitalisation and to death increases from patients in their twenties to a peak in patients in their forties, followed by a steady reduction with increasing age until the 80-89 age group. The variation in the time from hospitalisation to death was more modest; nevertheless, as for infection to death, middle-aged patients had the longest times. Results for people under the age of 20 were discarded because there were too few patients for a meaningful measurement of their epidemiological characteristics.
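The comparison between the sexes uses a Mann-Whitney-Wilcoxon test. A minimal standard-library sketch of the two-sided test is given below, using the large-sample normal approximation with a tie correction to the variance and a 0.5 continuity correction. Which variant the authors used is not stated, and applying it to the observed delays of each sex is an assumption; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def _midranks(values):
    """1-based ranks with ties given their average rank, plus the size of each tie group."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        # Extend j over every value equal to the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation). Returns (U for x, p-value)."""
    n1, n2 = len(x), len(y)
    ranks, tie_sizes = _midranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    tie_term = sum(t ** 3 - t for t in tie_sizes)
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u1, 1.0  # every observation tied: no evidence of a difference
    z = max(abs(u1 - mean_u) - 0.5, 0.0) / math.sqrt(var_u)
    # Use the lower tail directly to keep precision for very small p-values.
    p = 2 * NormalDist().cdf(-z)
    return u1, p
```

Because delays recorded in whole days produce many ties, the tie correction matters here; with samples as large as national line lists, the normal approximation is appropriate.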

Discussion
The impact of SARS-CoV-2 on different subgroups of the population, and across periods defined by distinct temporal epidemiological trends, is significant for furthering understanding of the virus and how we might expect it to change over time. Understanding the clinical time delays and the drivers of changes in these distributions will help to disentangle extrinsic pressure from any further phenotypic changes we encounter in the virus. This will help to inform more impactful policy decisions on the containment and suppression of transmission and allow for a clearer understanding of variants of concern. Fig 2 shows statistically significant variation between the defined periods. This is particularly apparent in Table 4: during the first wave of SARS-CoV-2 the mean time from infection to death is 19.6±0.2 days (95% interval: 5.6, 50.0), and in the summer period that followed it rises to 24.7±1.4 days (95% interval: 5.8, 69.8). Testing volume and strategy changed substantially over the course of the pandemic, affecting the completeness with which COVID-19 deaths and hospitalisations were captured; this will be particularly significant for the January to March 2020 period. This may have introduced selection bias at the start of the pandemic, although its impact is thought to be small because testing was prioritised for individuals who required clinical care. The summer period stands out in Fig 2 for its long right tail in all three quantities, which could indicate a change in patient clinical management, as intensive care clinicians found that sustaining extremely critical patients for longer could result in a higher survival rate [15].
Moreover, patient survival is likely to have been improved by the endorsement in the UK of dexamethasone [16] use on 13 November 2020 [17], the more widespread use of individualised lung-protective ventilator strategies [18], and the support for proning [19] by the Intensive Care Society [20] in April 2020. High prevalence of SARS-CoV-2 has clearly affected the healthcare system's ability to manage the volume of patients [21], and this has been a conspicuous driver of temporal fluctuations in the clinical time delay distributions, as seen in Fig 3. However, in periods of higher prevalence we may also see a compositional shift towards more severe patients being admitted, which could be seen as an adaptive response to increasing pressure on the healthcare system; nonetheless, this should not affect the time delay distributions for deaths. This is further seen in Table 3: during the first period the time from hospitalisation to death was 10.3±0.1 days (95% interval: 0.4, 34.9), while it increased to 14.6±0.3 days (95% interval: 0.4, 53.3) in the low-prevalence summer. This association between an increase in prevalence and a decrease in the time delay to a clinical outcome can be seen across the pandemic in Fig 2. Such a decrease is perhaps the best early indicator that a healthcare system is under stress and that intervention may be required to allow hospitals to decompress [22]. […] the pandemic by Public Health England [24] found that 88% of deaths occurred within 28 days and 96% within 60 days of a positive COVID-19 test, with 54% of those excluded by the 28-day limit found to have COVID-19 on their death certificate. Moreover, as the results in this study indicate, the mean time to death is longer during times of low prevalence, which makes this categorisation less suitable at such times. We did not observe a significant impact on the clinical time delay distributions from the growth of the B. […] Corroborating previous literature [5,6], we find that the time from infection to death for SARS-CoV-2 is similar to that for SARS [25], although a shorter period to peak infectivity is now clear for SARS-CoV-2 [26]. The decrease in the time from illness onset to hospital admission observed during the 2003 SARS outbreak, thought to reflect contact tracing, has not been observed in the SARS-CoV-2 outbreak in the UK.

Table 6 (https://doi.org/10.1371/journal.pone.0257978.t006): the data were filtered for hospitalisation dates between January 2020 and November 2020; 90% credible intervals are quoted.
Table 2 illustrates how the time from infection to hospitalisation slightly increased from 8.0±0.1 days (95% interval: 2.7, 18.5) in the first wave to 9.7±0.3 days (95% interval: 4.1, 19.6) at the end of the second wave.
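The summaries quoted here, a mean and a 95% interval, follow directly from the parameters of a fitted distribution. For a lognormal with log-scale parameters $\mu$ and $\sigma$, the mean is $\exp(\mu + \sigma^2/2)$ and the central 95% range is $\exp(\mu \pm 1.96\sigma)$. The sketch below converts in both directions, assuming the quoted 95% interval is the central 2.5% to 97.5% range of the fitted lognormal; the ± term reflects posterior uncertainty and is not modelled, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

Z_975 = NormalDist().inv_cdf(0.975)  # about 1.96


def lognormal_summary(mu, sigma):
    """Mean and central 95% range of a lognormal with log-scale parameters mu, sigma."""
    mean = math.exp(mu + sigma ** 2 / 2)
    lower = math.exp(mu - Z_975 * sigma)
    upper = math.exp(mu + Z_975 * sigma)
    return mean, lower, upper


def lognormal_from_interval(lower, upper):
    """Invert a central 95% range back to (mu, sigma)."""
    mu = (math.log(lower) + math.log(upper)) / 2
    sigma = (math.log(upper) - math.log(lower)) / (2 * Z_975)
    return mu, sigma


mu, sigma = lognormal_from_interval(2.7, 18.5)
mean, lower, upper = lognormal_summary(mu, sigma)
print(f"mu={mu:.2f}, sigma={sigma:.2f}, mean={mean:.1f} days")
```

Applied to the first-wave interval (2.7, 18.5), this gives a mean of about 8.0 days, in line with the value quoted for the first wave, which suggests the reported intervals are consistent with the Lognormal fit.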
The time from infection to hospitalisation shows a statistically significant difference between genders, with males having a longer modelled mean time of 8.6±0.1 days (95% interval: 2.9, 20.0) compared with 7.9±0.1 days (95% interval: 2.8, 18.0) for females. No such difference is found between genders for the time delay distribution from hospitalisation to death. This is likely related to the well-documented epidemiological phenomenon of males tending to delay seeking medical help [27]. Galasso et al. (2020) showed across eight countries that males are, overall, less likely to comply with NPIs and more likely to take the dangers of COVID-19 less seriously. The greater fatality rate of males from COVID-19 [28] reflects a combination of biological, psychosocial, and behavioural causal factors; nonetheless, this delay in seeking medical attention may contribute to their higher overall IFR. Fig 1 shows the differences between age groups: the 40-49 age group has the longest time from infection to death, with a mean of 26.5±1.1 days (95% interval: 7.3, 69.3), while the shortest was found for the 80-89 age group, at 17.6±0.2 days (95% interval: 5.3, 44.0). The distribution of the time delays to a clinical outcome in Fig 1 shows that the youngest and oldest age groups have the shortest time delays, which suggests that the younger adults in the 20-39 age bands who require clinical intervention, or who have a severe reaction to SARS-CoV-2 infection that results in death, are predominantly more vulnerable individuals.

Conclusion
We have shown that evaluating temporal changes in the clinical time delays is key to informing public health policy, and that these delays should not be regarded as static metrics: thus far, their variation has been inherently a by-product of extrinsic pressure. Monitoring these changes will aid the calibration of quarantine periods and the calculation of fatality rates, and help in unpacking the extent of transmission. The delays should be monitored closely in response to new variants of concern, and further work should aim to understand the time delay dynamics of these variants. We also recommend further analysis to assess the impact of vaccination campaigns on these trends. The differences seen by gender are not unexpected but should help to inform public policy on how to shape messaging around when to seek medical attention. Finally, we propose that fluctuations in the modelled mean time from hospitalisation to death can be used as a proxy indicator of healthcare strain, signalling when an intervention may be required to help prevent avoidable morbidity and mortality. The main limitation of this study is that any causal impact on the clinical time delay distributions can only be inferred from the wider context.