Generative Bayesian modeling to nowcast the effective reproduction number from line list data with missing symptom onset dates

The time-varying effective reproduction number Rt is a widely used indicator of transmission dynamics during infectious disease outbreaks. Timely estimates of Rt can be obtained from reported cases counted by their date of symptom onset, which is generally closer to the time of infection than the date of report. Case counts by date of symptom onset are typically obtained from line list data, however these data can have missing information and are subject to right truncation. Previous methods have addressed these problems independently by first imputing missing onset dates, then adjusting truncated case counts, and finally estimating the effective reproduction number. This stepwise approach makes it difficult to propagate uncertainty and can introduce subtle biases during real-time estimation due to the continued impact of assumptions made in previous steps. In this work, we integrate imputation, truncation adjustment, and Rt estimation into a single generative Bayesian model, allowing direct joint inference of case counts and Rt from line list data with missing symptom onset dates. We then use this framework to compare the performance of nowcasting approaches with different stepwise and generative components on synthetic line list data for multiple outbreak scenarios and across different epidemic phases. We find that under reporting delays realistic for hospitalization data (50% of reports delayed by more than a week), intermediate smoothing, as is common practice in stepwise approaches, can bias nowcasts of case counts and Rt, which is avoided in a joint generative approach due to shared regularization of all model components. On incomplete line list data, a fully generative approach enables the quantification of uncertainty due to missing onset dates without the need for an initial multiple imputation step. In a real-world comparison using hospitalization line list data from the COVID-19 pandemic in Switzerland, we observe the same qualitative differences between approaches. The generative modeling components developed in this work have been integrated and further extended in the R package epinowcast, providing a flexible and interpretable tool for real-time surveillance.


Introduction
The accurate and timely tracking of transmission dynamics is an important objective of infectious disease surveillance during epidemics. For instance, the effective reproduction number R t was used as a central indicator of transmission dynamics to guide and evaluate the public health response in many countries during the COVID-19 pandemic [1][2][3][4][5]. R t can be estimated from clinical case data such as confirmed cases, hospitalizations or deaths under the assumption that the proportion of infections resulting in a respective case remains constant over time [6]. Tracking R t in near real-time is challenging, however. Because of substantial delays between infection and reporting of cases, case counts by date of report only provide a delayed and potentially distorted signal of transmission dynamics [7]. Therefore, approaches to estimate R t are often based on case counts by date of symptom onset, which is ideally closer to the date of infection and independent of reporting delays [1,8]. Such data are typically obtained from a line list, which stores individual information about each case, e. g. date of symptom onset, date of positive test, and date of report.
In this work, we address three important challenges that arise when tracking transmission dynamics from symptom onset data. First, information about the date of symptom onset is typically not available for all cases in the line list. Symptom onset information may be missing due to different reasons, including gaps in ascertainment or reporting, and asymptomatic infections [9,10]. Since these factors can change over time, cases with missing symptom onset date cannot simply be excluded when inferring R t over time [11]. Second, in real-time surveillance, the delay between symptom onset and reporting means that cases with symptom onset close to the present may not have been reported in the line list yet, which is also known as right truncation [12]. As a result, case counts by date of symptom onset will be subject to a downward bias towards the present, and a truncation adjustment is needed to correct for this bias [13][14][15]. Third, when estimating R t from case counts by date of symptom onset, one needs to account for stochastic incubation periods and transmission noise to accurately relate to the time of infection [8,16].
Previous methods have typically addressed the above challenges in three separate steps [17][18][19].
Specifically, cases with missing symptom onset date are first completed via an imputation model fitted on the cases with known symptom onset date. Then, case counts by date of symptom onset are corrected for occurred-but-not-yet-reported cases via a truncation adjustment model fitted on 3/42 previously observed cases and reporting delays. Finally, R t is inferred from the case counts by date of symptom onset via a non-parametric method for R t estimation that accounts for incubation periods and noise. In such a stepwise approach, estimates obtained in each step are treated as data during the subsequent step, meaning that information can only flow in one direction, such that knowledge and assumptions used in later steps cannot inform earlier steps. Moreover, the uncertainty involved in earlier steps is either neglected [17] or must be incorporated via a resampling scheme, where later steps are repeatedly applied to each sample from the current step [18]. The overall efficiency of the latter solution strongly depends on the computational effort required for downstream steps.
For example, Li and White combined imputation, truncation adjustment, and R t estimation in a fully Bayesian approach, which is however still stepwise internally and required the use of simple estimators for truncation adjustment and R t estimation which can be directly computed as posterior predictive quantities [19].
In this paper, we explore an alternative to the existing stepwise approaches by integrating missing date imputation, truncation adjustment, and effective reproduction number estimation in a single generative model [20]. In essence, we use Bayesian hierarchical modeling [21] to describe how cases arise from an infection process and are reported with stochastic, time-varying delays and potentially missing symptom onset dates. This allows us to estimate the time series of symptom onsets and the effective reproduction number over time as parameters of a single, consistent nowcasting model, where the priors and model components for imputation, truncation adjustment, and R t estimation jointly inform each other. Moreover, when estimated in a fully Bayesian framework, the uncertainty from all components is naturally represented by the posterior distribution of model parameters [22].
To compare the generative and stepwise approaches to nowcasting, we study their performance and qualitative behavior on synthetic line list data for multiple outbreak scenarios and across epidemic phases. We ensure comparability by using maximally similar modeling components in all approaches that reflect the assumptions of the synthetic data equally well. Thereby, we study sufficiently realistic nowcasting settings with time-varying reporting delays, weekday effects in reporting, and randomly missing symptom onset dates. We furthermore compare the generative and stepwise approaches on hospitalization line list data from the first and second wave of the COVID-19 epidemic in Switzerland. Based on our findings, we discuss the relative strengths and weaknesses of the different modeling approaches in real-time surveillance and implications for further model 4/42 development in the future. Figure 1 provides an overview of different stepwise and generative nowcasting approaches compared in this work. Our main nowcasting target is the so-called instantaneous effective reproduction number R t , which describes the expected number of secondary infections caused by a primary infection if conditions remained as on date (index) t. We also nowcast the number of cases by date of symptom onset N t , which can help to identify biases in R t estimation and is often of interest by itself. The data are individual cases in a line list with a date of report and a (potentially missing) date of symptom onset. These cases can be aggregated into different counts, such as C t (number of cases reported on date t) or N t,d (number of cases with symptom onset on date t and a reporting delay of d days). Importantly, the counts N t,d are subject to right truncation, as we can only observe cases with t + d ≤ T until the present date T . Moreover, cases with missing symptom onset date can only be counted by report date, as C missing t . In the following, we develop our generative model for nowcasting R t by starting with a model for direct R t estimation using only counts by date of report C t . We then gradually increase the complexity to account for right-truncated case counts by date of symptom onset and for incomplete line list data. At each stage, we first present a stepwise approach similar to existing nowcasting methods, and then show how the additional step can be directly integrated into the generative model.

(A) Effective reproduction number estimation
There exist various mechanistic and non-mechanistic approaches to estimate R t , most of which rely on the renewal equation [23,24] as a mathematical foundation. For example, the non-parametric method by Cori et al. [8] implemented in the widely used package EpiEstim allows to compute Bayesian R t estimates efficiently from a time series of case counts. To account for the delay between infection and reporting of cases however, these estimates must be shifted back in time by the mean delay, or one must first infer the time series of infections in an intermediate step using deconvolution [1,7].
This and other limitations have motivated the recent development of Bayesian methods which model infections and R t as latent variables. Here we use a semi-mechanistic renewal model similar to existing tools [25,26] to estimate R t directly from observed case counts ( Figure 1A). This allows us Case counts by date of report are not biased by right truncation and can be used to estimate R t via a direct (no truncation adjustment) approach, assuming knowledge of the delay between infection and report.
(B) Case counts by date of symptom onset are only delayed by the incubation period but subject to right truncation. The truncation can be adjusted for by using a stepwise (additional adjustment step) or generative (integrated truncation model) approach. (C) When line list data are incomplete, cases with missing onset date must be accounted for. This can be achieved using a stepwise (additional imputation step) or generative (integrated missingness model) approach. (I-V) Models used in the steps of the different approaches. To ensure comparability, the model components for R t estimation, truncation adjustment, and missing date imputation were designed to be maximally similar across all approaches (see SI Appendix B for full model definitions).
to specify a single generative model for our data, which can then be adapted to right-truncated and incomplete line lists. Specifically, we model C t , the number of cases by date of report, as Poisson distributed with rate λ t , the expected number of cases reported on date t. We link λ t to the number of infections on date t, denoted I t , via convolution over previous dates, i. e.
where ρ t−d is an ascertainment proportion, i. e. the proportion of infections at time t − d that 6/42 eventually become reported cases, and p d the probability of report d days after infection, with an assumed maximum delay D. We here let ρ t = ρ for all t, i. e. we assume that the ascertainment proportion is constant over time. Importantly, both ρ and p are assumed to be known, and p is typically composed of i) a delay from infection to symptom onset (i. e. the incubation period), and ii) a delay from symptom onset to reporting. The latter is specific to each reporting system and should be estimated from line list data. We here follow the common practice of using the empirical distribution of delays observed in the line list as a proxy for the reporting delay. This naive approach ignores the right truncation of observations and assumes a fixed delay over time, and we assess the bias resulting from this simplification on synthetic data. We remark that a rough estimate of ρ is often sufficient for estimating R t , as constant-in-time multipliers cannot bias R t estimates and will distort uncertainty quantification only in extreme cases of misspecification [11]. If ρ t changes over time however, R t estimates will be biased, with slow, constant changes introducing small but continuous bias, and incidental abrupt changes introducing bigger but only temporary bias. In real-world settings, both of these types of changes can occur.
The number of infections I t is modeled via a renewal process as in previous work [2,[27][28][29], i. e.
where ψ represents an assumed intrinsic generation time distribution [30] and G is the maximum generation time (see SI Appendix A.2.2 for the modeling of initial infections). Note that we here use a stochastic formulation, i. e. we account for variance in the offspring distribution by sampling realized infections I t as independently Poisson distributed given ι t [27,31]. As a smoothing prior for R t over time, we use a first-order random walk. To ensure R t > 0, we apply a softplus link to the random walk, i. e.
where softplus(x) = log(1+e kx ) k is the softplus function with sharpness parameter k. We choose k = 4 to ensure that the random walk is effectively on the unit scale for all realistic values of R t , thereby modeling changes in R t over time as symmetric (see SI Appendix A. 1

(B) Truncation adjustment
The direct estimation of R t from case counts by date of report C t requires a deconvolution of C t with potentially long and difficult-to-estimate delays between infection and reporting. An alternative is to estimate R t from case counts by date of symptom onset N t ( Figure 1B), such that the delay distribution in Eq. (1) is simply the incubation period, which is shorter and generally independent of the reporting system. On the present date T however, only N t,d with t + d ≤ T are known, such that the real-time count T −t d=0 N t,d is a downward-biased estimate of N t . This bias must be corrected by estimating the number of symptom onsets which have occurred but are not yet reported, also known in the statistical literature as "nowcasting" in the narrow sense [13]. Conceptually this task is related to time-to-event analysis [33], as it requires estimating the distribution of reporting delays from right truncated observations N t,d to predict the not-yet-reported cases.

Stepwise approach
To nowcast R t , existing methods follow a stepwise approach [17][18][19], where first N t is estimated using the observed N t,d in a truncation adjustment step, and then R t is estimated from the time series of N t (instead of C t , as in Section 2.1). For the truncation adjustment step, we specify a model similar to Günther et al. [17], which jointly combines Bayesian smoothing of the epidemic curve [15,34] with a discrete time-to-event model for the reporting delay distribution [14]. The latter allows to estimate reporting delays that change over time (e. g. due to changes in the reporting process or the burden on health authorities) and are subject to seasonality (e. g. due to weekday effects), which is both observed in practice [14]. For this, we define p t,d as the probability that a case with symptom onset on date t is reported with a delay of d ∈ [0, 1, . . . D] days after symptom onset,

8/42
i. e. we assume that reporting cannot occur before symptom onset (no negative delays) and is at most D days after symptom onset. In practice, it is important to assume a long enough maximum delay D that sufficiently covers the true delay distribution. Cases with delays longer than D should then be completely dropped from the analysis, as setting their delays to D instead can lead to right truncation bias. Given the above definition, we assume that cases with symptom onset on date t are Poisson distributed with rate λ t , and their reporting delays are categorically distributed with event probabilities p t,d . We can then model the observed numbers of cases with symptom onset on date t and reporting delay d as To model the delay probabilities p t,d , we use a discrete time-to-event model in which we parameterize the hazard of reporting h t,d via a logistic regression [14,35], i. e.
where γ d is the logit baseline hazard for delay d, and z t and w t+d are vectors of covariates with coefficient vectors β and η, respectively. This implements a non-parametric, proportional hazards model [36] for the reporting delay as in Günther et al., however, we here model both effects by date of symptom onset z ⊤ t β and effects by date of report w ⊤ t+d η. We design z t to account for changes in the reporting delay of cases over time via a piecewise linear weekly change point model, and w t+d to account for differences in reporting between weekdays (see SI Appendix A.3.1 for details). As a smoothing prior for λ t over time, we use a first-order random walk on the log scale, i. e.
This is a minimally informed prior assuming a stationary time series of symptom onsets which has often been used in earlier work [15, 17, To obtain a nowcast for R t , existing nowcasting approaches typically estimate R t from the nowcast for N t in a separate step. Since the nowcast for N t is subject to considerable uncertainty on dates close to the present, stepwise approaches must repeat the R t estimation step on many sampleŝ {t|t≤T } from the truncation adjustment, and finally combine the resulting R t estimates. This "resampling" procedure is necessary to account for the uncertainty from the truncation adjustment but introduces computational overhead especially if the individual R t estimation steps are costly. In practice, stepwise approaches therefore either ignore uncertainty from the truncation adjustment [38] or use simple, non-parametric methods such as EpiEstim to estimate R t [17,18]. To ensure a consistent comparison of approaches in this study, we fitted the generative model introduced in Section 2.1 on 50 different samplesN {t|t≤T } using MCMC, which incurred comparatively high computational cost. We additionally produced nowcasts using EpiEstim for R t estimation in a supplementary analysis (SI Appendix A.2.4 and C.1).

Generative approach
As an alternative to the stepwise approach, we here propose to integrate R t estimation and truncation adjustment in a single generative model. This is achieved by replacing Eq. (6) in the truncation adjustment model with where ρ t−s is again an assumed ascertainment proportion, τ is the incubation period distribution, L is the maximum incubation period, and I t is the number of infections on date t. Analogous to and obtain posterior samples for R t and N t as described before. The resulting generative approach allows to nowcast R t and N t in a single step, while quantifying uncertainty both from truncation adjustment and R t estimation. Moreover, by explicitly linking the time series of symptom onsets to the underlying infection dynamics, we use the renewal model as a structural prior for the temporal correlation of N t . This way, epidemic modeling is not only used to estimate R t but also to regularize the nowcast for N t , thereby avoiding the additional non-parametric smoothing of λ t in the stepwise approach.

(C) Missing date imputation
In practice, line list data are often incomplete, i. e. the date of symptom onset can be missing for a substantial proportion of cases. Ignoring incomplete cases and computing nowcasts as described in Section 2.2 only using the complete cases N known t,d will underestimate N t and bias nowcasts of R t if the proportion of cases with missing onset date changes over time. It is therefore important to also include the incomplete cases, which are only available as case counts C missing t by date of report.
Here we discuss two stepwise approaches of imputing missing onset dates and then show how a missingness model can be directly integrated into the generative nowcasting model ( Figure 1C).

Stepwise approach
A simple but potentially biased approach is to treat the reporting delays of cases in the line list as independent and identically distributed. Under this assumption, missing onset dates can be imputed by simply subtracting a random delay from the date of report of each incomplete case, i. e. each case counted in C missing t . The random delays could for example be drawn from the empirical delay distribution of delays from complete cases in the line list. However, even if there was no right truncation or changes in the reporting delay over time, such a procedure will yield biased results during an epidemic wave. This is because subtracting delays from the date of report implicitly conditions the delay distribution on the date of report [7,39]. The resulting distribution, also known 11/42 as backward delay distribution, depends on the shape of the past epidemic curve and therefore varies during an epidemic. A more accurate approach is therefore to only assume that cases with the same date of report have identically distributed delays, and to estimate the so-called backward delay probability p ← t,d that a case reported on date t had its symptom onset d days ago [17][18][19]. We here implement this approach by modeling the number of cases with known symptom onset date as and parameterizing p ← t,d using an identical discrete time-to-event model as specified in Eq. (5), but indexed by date of report instead of date of symptom onset (SI Appendix B.4). We remark that by conditioning on the date of report, the above model is not biased by right truncation, but potentially by left truncation (see SI Appendix A.3.2). We compute the likelihood P N known {t,d|t+d≤T } p ← t,d , θ to obtain posterior samples from P p ← t,d , θ N known {t,d|t+d≤T } via MCMC, and use the latter to impute missing onset dates by drawing random delays from the corresponding backward delay distribution.
Importantly, using observed delays to impute missing onset dates implies a missing-at-random assumption, i. e. reporting delays are assumed to be independent of whether cases in the line list are complete or incomplete. Moreover, both the independent imputation and the backward-delay imputation approach are stepwise in that they add an additional imputation step prior to nowcasting.
As a result, the imputed symptom onset dates are treated as observed data during the following step(s), which creates the same obstacle to uncertainty quantification as described in Section 2.2.
Thus, to account for uncertainty of the imputation, a multiple imputation scheme must be used, where multiple imputed data sets are created and a separate nowcast is estimated for each data set [18]. With more complex nowcasting models however, multiple imputation is often impractical and therefore omitted [17], as fitting to a large number of imputed data sets would be prohibitively computationally expensive. Similarly, we only assessed single imputations due to resource constraints in this study.

Generative approach
As an alternative, we propose to directly account for missing symptom onset dates in the generative nowcasting model described in Section 2.2. For this, we assume that symptom onsets on date t become known with probability α t and missing with probability 1 − α t . The case counts with known 12/42 and missing symptom onset date t and delay d are then modeled as which again assumes that symptom onset dates are missing at random, i. e. independent of their reporting delay. While the N missing are not directly observed in the line list, we know that is the sum of Poisson distributed case numbers from previous days with corresponding delays, and it is also Poisson distributed with Note that under this model, only observations C missing t for t > D should be used to avoid bias from left-truncation, or the latent parameters λ, p, and α must be further extended into the past (SI Appendix A.4). As a minimally informed, stationary smoothing prior for α t over time, we use a first-order random walk on the logit scale, i. e.
and estimate α 1 and σ logit(α) with weakly informed priors as before. This assumes that the probability of missing symptom onset dates varies only by date of symptom onset, but more sophisticated models could also account for weekday effects or effects by date of report. Assuming N known and obtain posterior samples for R t and N t using MCMC as before. Therefore, the model can be jointly fit to incomplete line list data without the need for an additional imputation step.
This also accounts for the uncertainty resulting from missing onset dates without the use of a multiple imputation scheme. In comparison to stepwise approaches, the generative approach requires 13/42 estimating the probability of missing onset dates over time, but in turn avoids estimating backwarddelay distributions. Moreover, using estimates for α t , separate nowcasts for N known t and N missing t can be easily obtained.

Modeling of overdispersion
As found in earlier work [17], the modeling of overdispersed case counts can improve nowcasting performance in real-world settings. When nowcasting COVID-19 hospitalizations in Switzerland, we therefore model observed case counts as Negative Binomial instead of Poisson distributed in all our models, e. g. we use

Implementation and estimation
We implemented all models used in the different nowcasting approaches ( Figure 1)

Comparison of approaches using synthetic data
We used synthetic data to compare the different stepwise and generative nowcasting approaches.
To simulate synthetic data, we generated infections through a stochastic renewal process over a period of 200 days in which R t follows a piecewise linear time series with prespecified change points.
The simulation matches the generative models described in Sections 2.1 and 2.2, but simulates the reporting of individual cases and therefore does not assume independent observation noise. We  Figure 15). We used a piecewise linear model to simulate changes in the reporting hazard over time, with random changes in the trend every four weeks, which differed across simulation runs. We also included weekday effects with the odds of reporting being at the baseline Wednesday -Friday, 70% lower on Saturdays, 80% lower on Sundays, 10% higher on Mondays, and 5% higher on Tuesdays. Details of the simulation of infections and reporting are provided in SI Appendix D.
For each simulation run and scenario, nowcasts were obtained for selected weeks in different phases of the epidemic wave ( Figure 2-5, SI Appendix B.6) by applying the nowcasting approaches at different lags of the respective week (at the end of the week, one week after, two weeks after). This allowed us to assess changes in performance as more data becomes available. We fitted the models using always the last three months of line list data until the date of the nowcast. Complete line list 15/42 data, i. e. without missing symptom onset dates, were used to compare the direct R t estimation with the stepwise and generative truncation adjustment approaches. Incomplete line list data were used to compare the stepwise and the generative missing date imputation approaches. Here we randomly selected cases to have a missing symptom onset date with a probability varying over time between 20% and 60% according to a random piecewise linear function, which differed across simulation runs.
To evaluate nowcasting performance in each of the 50 simulation runs, we scored the nowcasts for each week in each phase with respect to N t and R t against the simulated ground truth. As a performance measure, we used the weighted interval score (WIS), which is a quantile-based proper scoring rule that approximates the continuous ranked probability score (CRPS). The CRPS generalizes the absolute error to probabilistic nowcasts. For an observation y, i. e. the true N t or R t , and a predictive distribution F , i. e. the probabilistic nowcast for N t or R t , the weighted interval score is computed as where |y − F 0.5 | is the absolute error of the median, α k ∈ {α 1 , . . . , α K } represent different central prediction intervals at coverage level (1 − α k ), and IS α (F, y) is the interval score for a specific interval, defined as That is, the interval score can be decomposed into three penalty components, where the dispersion component encourages sharpness, and the under-and overprediction components encourage calibration of the nowcast. We used K = 11 different α k , corresponding to the 10%, 20%, . . . , 90%, 95%, and 98% credible intervals (CrI) for N t and R t , respectively. To obtain an overall score WIS for each evaluated week, we took the arithmetic mean of scores from each day of the week, which again yields a proper score. We furthermore computed the relative contribution of each penalty component (dispersion, over-, and underprediction) to the respective WIS. In the results, we report the average WIS over all 50 simulation runs, as well as the percentage of runs where each approach performed best.

Application to COVID-19 in Switzerland
We also compared the direct, stepwise, and generative nowcasting approaches during the COVID-19 pandemic in Switzerland. For this, we used anonymized line list data of cases reported to the Federal Office of Public Health (FOPH) from January 01, 2020 -March 31, 2021. Because in Switzerland the date of symptom onset for COVID-19 was only consistently recorded for hospitalized patients, we only used cases of hospitalization from the line list. As date of report, the date when a hospitalization was recorded by FOPH was used. The symptom onset date was either reported together with the clinical record of hospitalization or slightly before (due to a previous test result which was later merged), but not updated afterward. Negative delays were therefore assumed to be data entry errors, and the respective symptom onset dates were set to missing.
We produced retrospective nowcasts for the number of hospitalizations by date of symptom onset N t (including cases with and without known symptom onset) and the effective reproduction number R t during different phases of the first and second wave (before, at, and after peak, respectively). We On real-world data, the nowcasting performance with regard to N t can be evaluated in retrospect by observing the number of cases with a certain onset date after a sufficiently long delay. This is however only possible for cases with known onset date, i. e. N known t . To obtain a proxy ground truth for N missing t , we used nowcasts at large lags beyond the respective date (7-14 days more than the maximum delay). These nowcasts are informed by fully reported case counts and thus no longer subject to right truncation. The resulting "consolidated" estimates of the true N t were used to evaluate the performance of the different real-time nowcasts with regard to N t , using the same weighted interval scoring rule as described for the synthetic data. Since the effective reproduction

17/42
number R t cannot be observed in a real-world setting, and corresponding estimates are subject to considerable uncertainty, we refrained from assessing the performance of R t nowcasts and focused on a qualitative comparison instead.

Synthetic data: R t estimation and truncation adjustment
We used complete synthetic line list data, i. e. without missing symptom onset dates, to compare i) the direct R t estimation approach, ii) the stepwise nowcasting approach with an additional truncation adjustment step, and iii) the generative nowcasting approach with a joint truncation and renewal model. Figure 2 shows the simulated true N t and corresponding nowcasts for an exemplary simulation run in different phases of the first wave scenario (corresponding results for the second wave scenario are shown in SI Appendix Fig. 4). The baseline simulated reporting delay had a mean of 9 days and standard deviation of 8 days, and the maximum assumed delay was D = 56 days. The nowcasts were conducted using the different approaches and at different lags (nowcast at end of same week, one week after, two weeks after). The direct R t estimation approach did not produce nowcasts for N t , as it directly uses counts by date of report C t instead.

Nowcasts of the number of cases by symptom onset date N t
Compared to C t , which includes reporting delays, the time series of N t tracked transmission dynamics in a more timely and regular manner (SI Appendix Fig. 2-3). In a real-time setting however, the reported number of symptom onsets can have a substantial downward bias towards the present (Figure 2, grey bars), because a proportion of symptom onsets that occur close to the present are not yet reported at the time of nowcasting. This truncation bias was adjusted for both by the stepwise and generative approach, however, nowcasts differed between the two approaches especially on dates close to the present.
While the generative approach predicted exponentially growing or declining case counts before and after the peak of the wave, the stepwise approach predicted case counts to remain at the recent level in all phases. This qualitative difference was strongest for nowcasts conducted at short lags, where nowcasting uncertainty was especially high, and declined with longer lags, as more data

18/42
became available for the respective week.
Below each lag and phase, Figure 2 and SI Appendix Fig. 4 show the weighted interval score (WIS) for each approach across 50 simulation runs. The generative approach achieved systematically better scores in phases with exponential growth or decline, while the stepwise approach had a bias towards underprediction of N t during exponential growth and towards overprediction during decline.
This further highlights the difference in smoothing assumptions of the two approaches. While the renewal model component in the generative approach predicted a continuation of recent infection dynamics, the minimally informed random walk prior in the stepwise approach predicted stationary case counts. Notably, at the peak of the wave, this led the generative approach to overpredict case counts, however only at short lags (same week nowcast), when little signal on the change in transmission dynamics was available. Overall, the WIS for both approaches decreased substantially with larger lags, and nowcasts conducted two weeks after the evaluation period show little difference in performance between the stepwise and the generative approach. Figure 3 shows results for nowcasting R t in different phases of the first wave scenario (corresponding results for the second wave scenario are shown in SI Appendix Fig. 5). Here we found distinct biases in the direct and the stepwise approach. The direct R t estimation approach, which used the empirical distribution of delays between symptom onset and reporting in the line list to model observed cases C t , showed a strong downward bias of R t nowcasts during the first wave, where few cases had been observed initially. This reflects the right truncation of line list data, which generally leads to an underestimation of reporting delays and therefore of infections in the recent past. In the second wave scenario, the effect of right truncation was smaller. However, due to the long delay between infection and reporting in case counts C t , changes in R t were detected comparatively late by the direct R t estimation approach. The stepwise truncation adjustment approach showed a different form of bias,

Nowcasts of the effective reproduction number R t
i. e. nowcasts tended to R t = 1 towards the present, which translated into under-or overprediction of R t depending on the epidemic phase. Importantly, this was observed although the R t estimation step of the stepwise approach used a smoothing prior which expects R t to remain at its current level. This indicates that the R t estimation step was also impacted by the stationary smoothing assumption in the previous truncation adjustment step, i. e. the constant N t nowcasts biased the

19/42
downstream R t estimates towards equilibrium-level transmission. The generative approach did not show such a bias and generally achieved the smallest WIS among the three approaches. Where no sufficient signal for a change in transmission dynamics was available, R t nowcasts from the generative approach remained at the current level. At the peak of the epidemic wave, when transmission declined from R t = 2 to R t = 0.8, this led to an overprediction of R t for nowcasts in the same week.
For all approaches, performance scores improved with larger lags, however, the inferior performance of the direct approach in the first wave scenario was pronounced even for nowcasts at a two-week lag.

Additional comparisons
In the above comparisons, a random walk prior on the expected number of symptom onsets was used in the truncation adjustment step of the stepwise approach. While this smoothing prior has been a default choice in existing approaches [15,17,34, 37], we also implemented a non-stationary, exponential smoothing prior (SI Appendix A.1.2) to test whether the observed bias in nowcasts remained. We found that non-stationary smoothing in the stepwise approach reduced the bias of both N t and R t nowcasts and improved the WIS, but a slight bias remained during phases of exponential growth or decline (SI Appendix Fig. 8-11). We also tested the use of a non-stationary smoothing prior on R t for the generative approach, which led to slightly better performance of R t nowcasts when transmission was changing.
As it is widely used in practice, we also implemented a stepwise approach where the R t estimation step is performed using the non-parametric method by Cori et al. via the package EpiEstim (SI Appendix A.2.4), instead of our semi-mechanistic renewal model described in Section 2.1. Since the R t estimation step does not influence the previous truncation adjustment step, nowcasts for N t with EpiEstim were identical compared to when using a semi-mechanistic renewal model. The nowcasts for R t showed the same bias towards R t = 1, however, they were often more volatile and at the same time considerably less uncertain (SI Appendix Fig. 12-13) when using EpiEstim in the R t estimation step. This overconfidence generally led to higher WIS values of the stepwise approach with EpiEstim. We note that when using EpiEstim, nowcasts for R t could only be obtained at a lag of 8 days or more, as R t estimates must be shifted backward in time to account for the incubation period and to center the 7-day smoothing window used for EpiEstim [7].

20/42
Nowcasting approach Stepwise (truncation adjustment −> renewal model)  Shown are the true number of cases by symptom onset date N t (black), the number of cases reported until the nowcast date (grey bars), and point nowcasts with 95% credible intervals (CrI) in four different phases of the epidemic wave, obtained through i) a stepwise approach using cases by date of symptom onset with a truncation adjustment step (blue), and ii) a generative approach using cases by date of symptom onset with an integrated truncation and renewal model (red). The direct approach using cases by date of report does not produce nowcasts of N t (not shown). Shown below each phase is the weighted interval score (WIS, lower is better) for N t nowcasts of each approach during a selected week (grey shade) over 50 scenario runs. Colored vertical bars show average scores, decomposed into penalties for underprediction (crosshatch), dispersion (circles), and overprediction (stripes). The horizontal proportion bar below shows the percentages of runs where each approach performed best. Results are shown for nowcasts made at different lags from the selected week (vertical dotted lines), i. e. at the end of the selected week (top row), one week later (middle row), and two weeks later (bottom row).

21/42
Nowcasting approach Direct (renewal model) Stepwise Shown are the true R t (black) and point nowcasts with 95% credible intervals (CrI) in four different phases of the epidemic wave, obtained through i) a direct approach using cases by date of report with no truncation adjustment (yellow), ii) a stepwise approach using cases by date of symptom onset with a truncation adjustment step (blue), and iii) a generative approach using cases by date of symptom onset with an integrated truncation and renewal model (red). Shown below each phase is the weighted interval score (WIS, lower is better) for R t nowcasts of each approach during a selected week (grey shade) over 50 scenario runs. Colored bars show average scores, decomposed into penalties for underprediction (crosshatch), dispersion (circles), and overprediction (stripes). The horizontal proportion bar below shows the percentages of runs where each approach performed best. Results are shown for nowcasts made at different lags from the selected week (vertical dotted lines), i. e. at the end of the selected week (top row), one week later (middle row), and two weeks later (bottom row).

Synthetic data: Missing symptom onset date imputation
To compare the different stepwise and generative approaches to impute missing onset dates, we used the same synthetic line list data as before, but simulated symptom onset dates to be missing with a time-varying probability 1 − α t (Section 2.6). We tested i) a stepwise approach using independent imputation for the imputation step, ii) a stepwise approach using backward(-delay) imputation for the imputation step, and iii) a generative approach using an integrated missingness model. All three approaches used a generative model for truncation adjustment and R t estimation as defined in Section 2.2, allowing us to specifically assess differences resulting from the imputation approach. . N t nowcasts conducted at the end of the same week were highly similar for all three approaches, however, differences were observed for longer lags, i. e. one and two weeks after. Here, nowcasts from the generative approach consistently achieved a better WIS than the stepwise approaches. Nowcasts from the stepwise approach with independent imputation, which ignores the conditioning on the date of report, tended to underpredict the peak of the wave, and this bias increased with longer lags, as more data became available. In contrast, the stepwise approach with backward-delay imputation was generally unbiased, with approximately equal penalties for overand under-prediction. However, as uncertainty from the imputation step was ignored, this approach yielded overconfident nowcasts at larger lags. Nowcasts from the generative approach generally had the widest uncertainty intervals. Figure 5 and SI Appendix Fig. 7 show the corresponding results for nowcasts of R t in the first and second wave scenario, respectively. The three approaches to missing date imputation produced mostly similar R t nowcasts both at shorter and longer lags, but we observed a small bias of the stepwise approach with independent imputation around the change point of R t due to the underprediction of the peak in N t . However, the differences in performance between the approaches were not consistent across phases and comparatively small with regard to the overall WIS, which was dominated by the uncertainty and error from the truncation adjustment and R t estimation.

23/42
Reported until nowcast (known onset date) Reported until nowcast (missing onset date) Imputation approach Stepwise (independent imputation −> nowcasting model) Stepwise  Shown are the true number of cases by symptom onset date N t (black), the number of cases reported until the nowcast date (dark grey bars for known, light grey bars for missing onset dates), and point nowcasts with 95% credible intervals (CrI) in four different phases of the epidemic wave, obtained through i) a stepwise approach using a forward imputation step (green), ii) a stepwise approach using a backward imputation step (blue), and iii) a generative approach using an integrated missingness model (red). All approaches used a generative model for nowcasting. Shown below each phase is the weighted interval score (WIS, lower is better) for N t nowcasts of each approach during a selected week (grey shade) over 50 scenario runs. Colored bars show average scores, decomposed into penalties for underprediction (crosshatch), dispersion (circles), and overprediction (stripes). The horizontal proportion bar below shows the percentages of runs where each approach performed best. Results are shown for nowcasts made at different lags from the selected week (vertical dotted lines), i. e. at the end of the selected week (top row), one week later (middle row), and two weeks later (bottom row).

24/42
Imputation approach Stepwise (independent imputation −> nowcasting model) Stepwise  Nowcasts of R t on incomplete line list data of a simulated first wave scenario using different approaches of accounting for missing onset dates. Shown are the true R t (black) and point nowcasts with 95% credible intervals (CrI) in four different phases of the epidemic wave, obtained through i) a stepwise approach using a forward imputation step (green), ii) a stepwise approach using a backward imputation step (blue), and iii) a generative approach using an integrated missingness model (red). All approaches used a generative model for nowcasting. Shown below each phase is the weighted interval score (WIS, lower is better) for R t nowcasts of each approach during a selected week (grey shade) over 50 scenario runs. Colored bars show average scores, decomposed into penalties for underprediction (crosshatch), dispersion (circles), and overprediction (stripes). The horizontal proportion bar below shows the percentages of runs where each approach performed best. Results are shown for nowcasts made at different lags from the selected week (vertical dotted lines), i. e. at the end of the selected week (top row), one week later (middle row), and two weeks later (bottom row). 25/42

COVID-19 hospitalizations in Switzerland
We used line list data of hospitalized patients who tested positive for SARS-CoV-2 during the COVID-19 pandemic in Switzerland, reported between January 01, 2020 -March 31, 2021, to retrospectively nowcast the number of hospitalizations by date of symptom onset and the effective reproduction number. Overall, the line list contained 25,831 cases of hospitalization, with symptom onset missing for 33% of the cases (SI Appendix Fig. 14). 137 cases had a symptom onset date later than the date of report, in which case the symptom onset date was likely a data entry error and set to missing (see Section 2.7). Our assumed maximum delay of D = 56 days covered 97% of observed reporting delays (SI Appendix Fig. 15), and cases with larger delays were removed from the analysis. We assumed different incubation period and generation time distributions for the wildtype and alpha variants of SARS-CoV-2 and accounted for the introduction and spread of the alpha variant during the second wave in Switzerland (see Section 2.7). Nowcasts were conducted using i) the direct R t estimation approach, ii) a fully stepwise approach (using a backward-delay imputation step, a truncation adjustment step, and an R t estimation step), and iii) a fully generative approach (using a joint missingness, truncation, and renewal model). we found the same qualitative differences between the generative and the stepwise approach as on synthetic data. In particular, nowcasts from the stepwise approach remained mostly stationary on days close to the nowcast date, while nowcasts from the generative approach extrapolated the recent trend of exponential growth or decline. To assess real-time nowcasting performance with regard to N t , we used consolidated estimates based on nowcasts lagged by 7-14 days beyond the maximum delay as a proxy ground truth (see Section 2.7). Based on the consolidated estimates, the generative approach achieved a better WIS in all phases except at the peak of the wave. Nowcasts from the stepwise approach were slightly sharper than from the generative approach at larger lags, likely because of the neglected uncertainty from the missing onset date imputation. Of note, both nowcasting approaches underpredicted N t in the early phase of the epidemic in Switzerland, presumably due to

26/42
an underestimation of the reporting delay based on very few initial hospitalization cases. Figure 7 shows corresponding nowcasts for the hospitalization-based R t . Generally, we found R t nowcasts at a lag below one week to be largely dominated by the smoothing prior of the nowcasting model. This is partly to be expected, as transmission dynamics are additionally delayed by the incubation period, which was on average ≈ 5 days in our case, thus real-time estimates for R t are generally less informed than for N t . Of note, nowcasts using the direct R t estimation approach, which is based on the more delayed case counts C t , were dominated by the smoothing prior even longer, until a lag of two weeks or more. R t nowcasts from the fully stepwise and fully generative approach mostly agreed after a lag of two weeks but showed systematic differences at shorter lags. Specifically, as in our results on synthetic data, nowcasts from the stepwise approach tended toward R t = 1, while nowcasts from the generative approach predicted R t to remain at its current level. Nowcasting uncertainty was mostly similar between the stepwise and the generative approach.

27/42
Fully stepwise (backward imputation −> truncation adjustment −> renewal model)  Shown are nowcasts with 95% credible interval (CrI) for the number of hospitalizations with COVID-19 by date of symptom onset N t in different phases of the first and second wave, obtained through i) a fully stepwise approach using cases by date of symptom onset with a backward imputation step and a truncation adjustment step (blue), and ii) a fully generative approach using an integrated missingness, truncation and renewal model (red). The direct approach using cases by date of report does not produce nowcasts of N t (not shown). Also shown are the number of cases by symptom onset date reported until the respective nowcast date (grey bars), and consolidated point estimates (7-14 days after maximum delay, averaged over all models) of the true N t (black dashed lines). Shown below each phase is the weighted interval score (WIS, lower is better) for N t nowcasts of each approach during a selected week (grey shade) evaluated on the consolidated point estimates. The scores (colored bars) are decomposed into penalties for underprediction (crosshatch), dispersion (circles), and overprediction (stripes). Results are shown for nowcasts made at different lags from the selected week (vertical dotted lines), i. e. at the end of the selected week (top row), one week later (middle row), and two weeks later (bottom row).

28/42
Reproduction number  R t in different phases of the first and second wave, obtained from hospitalization line list data through i) a direct approach using cases by date of report with no truncation adjustment (yellow), ii) a fully stepwise approach using cases by date of symptom onset with a backward imputation step and a truncation adjustment step (blue), and iii) a fully generative approach using cases by date of symptom onset with an integrated missingness, truncation, and renewal model (red). Results are shown for nowcasts made at different lags from the selected week (vertical dotted lines), i. e. at the end of the selected week (top row), one week later (middle row), and two weeks later (bottom row).

29/42 4 Discussion
In this paper, we developed a fully generative model to nowcast the effective reproduction number R t from right-truncated line list data with missing symptom onset dates. Previous methods have addressed this task by separating missing onset date imputation, truncation adjustment, and R t estimation into consecutive, independent steps [17][18][19]. Here we proposed to unify these tasks in a single generative model that can be fit directly to observed data. With such a model, all quantities of interest and their associated uncertainty can be estimated jointly in a Bayesian framework. This is in contrast to stepwise approaches, which face difficulties in propagating uncertainty between consecutive steps and often require resampling schemes that can restrict the individual components.
Moreover, information can only flow in one direction, such that priors and model structure from later steps cannot inform earlier steps. Instead, our generative model enables shared regularization between imputation, truncation adjustment, and R t estimation, thereby eliminating the need for non-parametric smoothing at intermediate steps.
To compare the generative and the stepwise approaches with regard to performance and qualitative behavior, we applied them to synthetic line list data for different outbreak scenarios and to real-world data of hospitalizations during the COVID-19 pandemic in Switzerland. We found that under realistic delays between symptom onset and reporting of hospitalized cases, nowcasts for days close to the present are based on few reported cases and thus can be substantially influenced by prior assumptions about the smoothness and shape of the epidemic curve. In nowcasting practice, it is therefore important to be aware that under long reporting delays, real-time nowcasts can effectively become weakly informed forecasts due to a lack of data signal.
When nowcasting R t , estimates can be obtained either from case counts by date of report C t , or from case counts by date of symptom onset N t . The approach using C t can be more direct as it requires no truncation adjustment of case counts, but it depends on external estimates of the reporting delay distribution to infer the time series of infections. Our results show that the resulting R t estimates can still be substantially biased when using naive empirical estimates of the reporting delay. On the other hand, approaches using cases by date of symptom onset are more complex as they must adjust for the right truncation of the real-time case count. Existing methods conduct this truncation adjustment in a separate step, and we found that this can systematically 30/42 over-and underpredict case counts depending on the current epidemic phase. Specifically, we demonstrated that when modeling the expected number of cases via a stationary random walk -as customary in earlier studies [15,17,18,34, 37] -nowcasts will be biased during periods of exponential growth or decline. Notably, our results show that the stationary smoothing can even overwrite prior assumptions in the subsequent R t estimation step and therefore bias the R t estimates towards R t = 1.
In our proposed generative approach, the renewal model assumed for R t estimation also serves as an epidemiologically motivated smoothing prior for the case counts. As a result, the generative model When nowcasting from incomplete line list data with missing symptom onset dates, we found that even with up to 60% of missing data, informative nowcasts of the total number of symptom onset dates could be made. To account for the missing onset dates, we implemented two stepwise approaches, i.e. with an additional i) independent imputation step and ii) backward-delay imputation step, as well as a generative approach that uses an integrated missingness model. As multiple imputation was not computationally feasible, the stepwise approaches only used a single imputation and thereby ignored uncertainty arising from the missing dates. In contrast, our generative model jointly fits to the complete and incomplete cases in the line list and naturally accounts for uncertainty arising from missing onset dates. Notably, we find that for short nowcasting lags, when few cases have been reported and overall nowcasting uncertainty is high, performance differences between the approaches were mostly negligible. At longer lags, however, the stepwise approach with independent imputation showed bias from misspecified delays especially around the peak of the epidemic curve.
In contrast, the stepwise approach with backward-delay imputation correctly modeled delays during imputation and was thus unbiased but overconfident as it did not account for imputation uncertainty. were however mostly with regard to N t , and no clear differences with regard to R t could be identified, where the overall uncertainty is generally high. In the setting studied here, it thus seems that the uncertainty from missing onset dates may be small compared to the uncertainty from reporting delays and infection dynamics. In our model, we specified α t , the probability for symptom onset dates to be known, with respect to the onset date t. It should be noted that in settings where symptom onset information for already reported cases is entered retrospectively in the line list, also known as backfilling, this probability may also depend on the date of report. That is, the probability for symptom onset dates to be known could be systematically lower for cases reported closer to the present. In such a setting, our generative model may still correctly account for cases with missing onset date, however, this could require a different prior for α t which adequately represents a downward trend towards the present. When backfilling is the main source of missingness, it may also be preferable to index α t by the date of report instead.
In our models, we used a common but simplistic non-parametric smoothing prior, i. e. a first-order random walk, to smooth the time-varying parameters. In practice, a large variety of smoothing priors can be used for this purpose, with the potential of improving the performance of each approach. In a supplementary analysis, we found that when the stepwise approach is used with a non-stationary, exponential smoothing prior [50], the tendency to over-or underpredict case counts during phases of epidemic growth or decline is strongly reduced. Nevertheless, some bias remained and it should be noted that the use of more complex smoothing priors may also introduce other, more subtle biases to R t estimation. At the same time, non-stationary smoothing priors such as innovations state space models [50] or Gaussian processes [2,51] can also be employed to smooth R t in a generative model, potentially enabling tighter credible intervals and better capturing changes in transmission dynamics. Moreover, additional knowledge or data could be integrated in such priors, e. g. to provide more information on time-varying reporting delays or to account for population mobility and non-pharmaceutical interventions [52,53]. In the latter case, a generative approach offers the advantage of modeling population-level effects on disease transmission directly via R t instead of via the expected number of symptom onsets λ t , thereby ensuring temporal accuracy of the effect model. Moreover, due to the joint inference approach, better estimation of reporting delays may also improve the handling of missing onset dates, and more accurate models of transmission dynamics may also improve nowcasts of case counts. However, as the suitability of each option will vary with 32/42 the specific setting and application, it is important that practical tools for real-time surveillance allow for flexible model specification. In line with the generative approach described in this paper, We note several general limitations of the methods used in this work. First, to model changes in the reporting delay over time, we have used a time-to-event model formulation in which effects by date of onset and date of report are applied proportionally on the logit scale to the hazard for each delay [17]. This assumption reduces model complexity, but limits the ability to represent certain types of distributional shifts over time and excludes negative delays. Second, we treated reported symptom onset dates as exact, while in reality there may be uncertainty and inconsistency in how onset dates are ascertained. We also note that line lists can include patients with asymptomatic infection, in which case the imputed symptom onset dates are only hypothetical events that serve as a proxy for the date of infection. This may be the case, for example, for patients who are hospitalized for other reasons and become infected in hospital, but will only bias R t estimates if the proportion of infections in hospital to the total number of infections changes over time. Third, like previous methods, our models encode a missing-at-random assumption for cases with missing symptom onset date. If missingness correlates with reporting delay, this assumption will be violated and could lead to bias in imputation and truncation adjustment. It is therefore recommendable to conduct sensitivity analyses if additional data about the presence of symptoms is available. In theory, such information could also be integrated into our framework to explicitly model correlation between reporting delays and missingness. Fourth, we assumed that the distributions for the generation interval and the incubation period are exactly known, which is seldom the case in practice. However, both in the generative and the stepwise approach, it is possible to account for uncertainty in these parameters [2]. Fifth, we assumed a fixed ascertainment proportion ρ t = ρ over time to ensure identifiability. Under this assumption, exact knowledge of the ascertainment proportion is not essential because a misspecification of ρ does not bias R t and distorts uncertainty quantification only in extreme cases. It thus allows us to obtain R t estimates from hospitalization data that are indicative of population-wide transmission, but will lead to bias in time periods where ρ t changes. This is particularly important when there are differences in reporting delays or ascertainment proportions for cases from distinct subpopulations, e. g. age groups. In such a case, it can be necessary to add group 33/42 structure to the renewal and reporting delay components of the model [26, 54, 55], to avoid bias from differing transmission dynamics between the subpopulations. Moreover, time-varying ascertainment proportions may also estimated by fitting to several data sources, e. g. hospitalizations and deaths, simultaneously. Sixth, we have only tested a simplistic version of the direct R t estimation approach, with naive external estimates of the reporting delay. In practice, reporting delays may be externally estimated using more sophisticated methods that account for right truncation and censoring, and the R t estimation model can account for weekday effects and other modifiers of case counts [25].
Last, in our assessments using synthetic data, we stratified the performance of different approaches by epidemic phase. While this is helpful for understanding the behavior of a model, we do not recommend this approach for model selection in real-time surveillance, as it is generally difficult to identify the current epidemic phase.
We conclude by noting that the generative model proposed in this work naturally encompasses other forms of R t estimation. For example, in settings where no line list data is available, R t can only be estimated from the time series of reported cases as described in Section 2.1. However, this scenario corresponds to a special case in the generative model where all symptom onset dates are missing and a strong prior on the reporting delay is used. Similarly, R t is sometimes only estimated retrospectively, when case data are already fully reported, with some onset dates missing. This corresponds to a case in our model where all days of interest are sufficiently far in the past, beyond the maximum reporting delay. In addition, we have here used the example of nowcasting by date of symptom onset, but our framework is similarly applicable to other dates of reference, such as the date of positive test.

A Details of modeling components
The nowcasting approaches compared in this work use a varying number of models with different complexity. For instance, a fully stepwise approach for incomplete line list data uses a model for imputation, a model for truncation adjustment, and a model for R t estimation, while a fully generative approach only fits a single, joint model. At the same time, some models share certain characteristics, such as estimating a reporting delay distribution. In the following, we present these shared characteristics and discuss their modeling in detail.

A.1 Time series smoothing
All of our models include one or several latent, time-varying parameters, for example the effective reproduction number R t , the expected number of symptom onsets λ t , or the probability of a symptom onset date to become known α t . When we do not further decompose these parameters via a mechanistic model, we instead specify a non-parametric time series smoothing prior for them. This allows us to estimate the parameters from observed data while providing some regularization through the smoothing structure of the prior.

A.1.1 Random walk model
In our main analysis, we use the simple and popular random walk model [1][2][3], where θ is a time-varying parameter to estimate and f is a link function, e. g. a log link. We further use diffuse hyperpriors for the intercept θ 1 and the variance σ f (θ) 2 of the random walk (see Supplement B.6), allowing these to be simultaneously estimated from the data. In the context of nowcasting, where comparatively few data are available towards the present, this means that uncertainty about the most recent values of θ also depends on the estimated variance of the random walk. Intuitively, this reflects that expectations are less certain when an observed process exhibited high variance in the past.

A.1.2 Exponential smoothing
We tested the use of a non-stationary smoothing prior as an alternative to the simple random walk prior for λ t and R t in the stepwise approach and for R t in the generative approach. Here we used an innovations state space model with additive errors that implements an exponential smoothing prior according to Holt's linear trend method with dampening [4]. The time-varying parameter of interest, θ, was modeled recursively as where l t is the level, b t the trend, and ϵ t the additive error at time t, respectively. Moreover, α ′ is a level smoothing parameter, β ′ a trend smoothing parameter, and ϕ ′ a dampening parameter. As before, we use hyperpriors for the intercepts l 1 and b 1 and for the variance σ f (θ) 2 of the innovations (see Supplement B.6), allowing these to be estimated from the data. We also use weakly informative priors for the smoothing parameters α ′ and β ′ (Supplement B.6). Due to identifiability issues, the dampening parameter was fixed at ϕ ′ = 0.9, meaning that the strength of the trend halves approximately every week of predicting into the future.

A.1.3 Link functions
We use different link functions for the time-varying parameters. For λ t in the stepwise truncation adjustment approach, we use a log link, which means that increments have a multiplicative effect on the expected number of symptom onsets, which is in line with an exponential growth model on λ t , where the increments can be interpreted as instantaneous growth rates. For α t , we use a logit link, which ensures that 0 < α t < 1. For R t , we use a softplus link [5], i. e. f (x) = softplus −1 (x) with , where k is a sharpness parameter. We choose k = 4 to obtain a function that behaves effectively like f (x) = x for practically relevant values of R t while ensuring R t > 0.
Therefore, increments have approximately additive effects on R t , which we slightly prefer as a default smoothing assumption, as it avoids asymmetric uncertainty intervals for R t towards the present

4/36
and allows to specify an easily interpretable prior for the random walk variance. Such a model may for example be derived by assuming additive effects of interventions and behavioral changes [6].
However, if a maximum effective reproduction number can be assumed, a viable alternative may be the use of a generalized logit link [7].

A.2.1 Effective reproduction number
In all our models, R t specifically refers to the instantaneous effective reproduction number, i. e. the expected number of secondary infections an individual would cause if conditions remained as they were at time t [8,9]. It is equal to the expected number of infections at time t divided by the total infectiousness at time t, which depends on previous infections weighted by the generation interval distribution [10]. R t is thus a measure of the overall per-person transmission at time t, including effects of population immunity as well as interventions and behavioral changes. In renewal modeling, R t is inferred as a latent parameter of the discrete renewal equation [11] where I t−s is the number of infections that occurred s days before t, and ψ the generation interval distribution with maximum generation time G.

A.2.2 Seeding of infections
The renewal equation is only properly defined for times t > G, since new infections may be generated from previous infections up to G days before. The number of initial infections for t ≤ G must therefore be modeled using a non-mechanistic prior (also known as seeding when at the start of a new epidemic). We here use a random walk for ι t on the log scale, i. e.
with hyperpriors for the intercept and random walk variance as described in Supplement B.6.

A.2.3 Realized infections
To account for noise in the infection process, we assume that the number of realized infections is Poisson distributed with rate ι t (given through the renewal equation), i. e. I t |ι t ∼ Poisson(ι t ).
However, discrete latent parameters cannot be sampled using the Hamiltonian MCMC No-U-Turn sampler in Stan. Similar to earlier work [11,12], we thus use the Normal distribution as a continuous approximation of the Poisson distribution, i. e. I t |ι t ∼ Normal(ι t , √ ι t ).

A.2.4 EpiEstim
The package EpiEstim implements a Bayesian framework by Cori et al. [10] to directly compute R t estimates from the time series of cases by symptom onset date N t . Under this framework, using a Gamma(a, b) prior for R t and a smoothing window of τ days, the posterior distribution of R t is

A.3.1 Time-to-event model and reporting effects
We express the reporting delay probabilities p t,d via a discrete time-to-event model [3,13]. That is, we model the so-called hazard of reporting h t,d , i. e. the conditional probability of a case with symptom onset on date t to be reported with a delay of d days, given that it is not reported earlier.
We use logistic regression to parameterize h t,d , i. e.
Here, γ d is the logit baseline hazard for delay d. Since γ d is estimated as an independent parameter, we obtain a non-parametric model for the reporting delay distribution with flexible shape. To account for differences in reporting over time, z ⊤ t β and w ⊤ t+d η are added as additional predictors. Here, z t is a vector of covariates dependent on symptom onset date t, and w t+d is a vector of covariates dependent on the report date, which is d days after symptom onset date t. Moreover, β and η are vectors of coefficients to be estimated. This specification implements a proportional hazards model [14] for the reporting delay, where z ⊤ t β describes effects on the hazard by date of symptom onset, and w ⊤ t+d η describes effects by date of report.

6/36
We design z t to account for changes in the reporting delay of cases over time via a piecewise linear change point model [3], i. e.
where the i represent n = ((T − 1) div K) + 1 different change points, spaced at K = 7 days distance and indexed backward in time from the present (the first changepoint is K days before the present).
Note that as the most distant changepoint is not necessarily preceded by K days, it is instead specified as z t,n = max(0, min((n − 1) We design w t+d to indicate different weekdays using dummy encoding, i. e.
When applying our models to real-world data, public holidays were coded as Sundays.
Given the hazard h t,d , the probability for a case with symptom onset on date t to be reported with delay d is [13]

A.3.2 Backward delays
When modeling backward delays, we use a model analogous to the one for forward delays. For a case with date of report t ′ and reporting delay d, the backward reporting hazard is then ). We here index α t by the date of symptom onset t, but it is in theory also possible to use the date of report t ′ = t + d instead. In the latter case (date of report), we would have Which version is to be preferred will depend on assumptions about the process leading to missing onset dates. We only expect larger differences when a t is specified via a more complex time series model which e. g. accounts for weekday effects or trends towards the present.
In the joint model for missing onset dates, truncation adjustment, and R t estimation as specified in Supplement B.5, only observations C missing t for t > D can be used. This is because the counts all corresponding parameters such as I t , ι t , and R t must also be extended further into the past. An important disadvantage of this approach is that these extended parameters are potentially informed by very little data. In particular, the reporting delay probabilities p t,d for t < 1 are not directly informed by any observations in the line list, meaning that weekday effects but no trends over time can be modeled for these delays.

B Full model definitions
To realize the different approaches compared in this study, we implemented (I) a model for reproduc- model. We refer the reader to the main text for detailed explanations of the notation and variables.

B.1 Model I: R t estimation
Model C t |λ t ∼ Poisson(λ t ) / NegBin(λ t ) cases by report date reproduction number

B.5 Model V: Joint missing onset date, truncation adjustment, and R t estimation
Model incomplete cases by report date probability of known onset date reproduction number

12/36
Covariates z t = (z t,1 , z t,2 , . . . , z t,n ) date of symptom onset covariates w t+d = (w t+d,1 , w t+d,2 , . . . , w t+d,6 ) date of report covariates  Table 1 provides an overview of the priors used for parameters in different model components. Priors were specified broadly to provide regularization to the estimation without distorting results, and the same priors were used across all models. Note that for the random walk models on time-varying parameters such as λ t , R t or α t , we used a hyperprior that allowed the random walk variance to be simultaneously estimated from the data.
For the baseline hazard in the reporting delay model, we provided a constant, diffuse prior across all delays, corresponding to a piecewise exponential distribution in expectation, but with sufficient uncertainty to accommodate various delay distributions. When modeling symptom onsets in model

13/36
II, the prior for the intercept of λ was centered on a rough empirical estimate of the expected number of symptom onsets at t = 1, i. e.λ 1 = D d=0 (N known 1,d ). Similarly, for the renewal model used in models I, III, and V, the intercept of infections in the seeding phase was given a prior centered onι 1 = 1 ρλ 1 , with ascertainment proportion ρ. Such use of data in the prior only served to constrain the infection time series to the correct order of magnitude. When modeling the effective reproduction number, the use of a softplus link meant that R t is essentially described via a random walk on the unit scale. Here we chose a wide variance prior that would theoretically allow fast changes in R t , such as dropping from 4 to 1 within one week.

Implementation details
We implemented models I-V as probabilistic programs in Stan [16]. The implementation of model II was adapted from the largely similar model by Günther et al. [3], available at https://github. com/FelixGuenther/nc_covid19_bavaria and improved for efficiency. In particular, we used log scale (or logit scale) versions of the parameters λ t , p t,d , and α t and propagated them until the final likelihood to reduce computations and to improve numerical stability. We used non-centered parameterizations for the noise components of the random walk and innovations state space models.
For the stepwise approaches, posterior samples were extracted from the fitted model for the previous step and used as data input for the subsequent step. In the case of imputation, we used only a single random sample of the posterior distribution of imputed symptom onset dates from model IV to compute the imputed case countsN t,d , which were then used as input for model II.
In the case of reproduction number estimation, we

C.2 Sampling
For sampling of Stan models, we used cmdstan version 2.30.0 via cmdstanr version 0.5.0 [17] with the default configuration of the No-U-Turn sampler, i. e. four chains with 1,000 warm-up and 1,000

16/36
sampling iterations each [16]. To achieve a faster warm-up, important parameters of the model were initialized using crude empirical estimates from the data. For example, the baseline reporting hazard was initialized using an empirical estimate of the delay distribution ignoring changes over time, weekday effects, and truncation, and the time series of infections was initialized by shifting the time series of observed symptom onsets backward in time by the mean incubation period.

C.3 Diagnostics
Each fitted model was automatically checked for ineffective sampling, indicated by low effective sample sizes (ESS < 400) [18], and for convergence problems, indicated by divergent transitions [19], a low Bayesian fraction of missing information (E-BFMI < 0.2) [20], or high values of the Gelman-Rubin convergence diagnostic (R > 1.01) [21]. The checks indicated sufficient effective sampling sizes and good convergence and mixing of chains for all fitted models.

D Comparison on synthetic line list data D.1 Simulation
To obtain a synthetic line list from a single simulation run, we first simulated a stochastic trajectory of infections according to a specified transmission scenario (first or second wave), and then sampled cases with date of symptom onset and reporting delay that are recorded in the line list. Finally, to simulate incomplete line list data, we randomly marked symptom onset dates as missing for some cases. The reporting delays and probabilities of missingness were simulated to change randomly over time and differ across simulation runs. In the following, we provide details for each step of the simulation.

D.1.1 Simulation of infections
Infections were simulated by applying a stochastic renewal model to a given transmission scenario.
A transmission scenario in our simulation is characterized by a piecewise linear time series for R t and a seeding rate of initial infections. The first wave scenario starts with a seeding rate µ seed = 0.5 and R t = 2. Transmission stays at R t = 2 until day 70, then drops linearly to R t = 0. For all simulations, we used a generation interval distribution w that follows a discretized Gamma distribution with shape 1.43 and rate 0.29 (i. e. mean 4.9 and standard deviation 4.1 days) [22] and is truncated at a maximum generation time of 21 days. To obtain a simulated trajectory of infections, we sampled initial infections during the first 3 weeks as independently Poisson distributed with rate µ seed on each day. Then, we recursively applied the renewal equation while sampling realized infections I t |ι t ∼ Poisson(ι t ) at each time step, which can be interpreted as realizing an age-dependent branching process [11]. The final trajectory of infections was then used to simulate a synthetic line list.

D.1.2 Simulation of line list
Not all infections become reported cases. To mimic the generation of hospitalization line list data, we sampled infections only with a probability of 2% as cases in the line list. For all simulations, we assumed the incubation period to be independent of the generation interval and Gamma distributed with a mean of 5.3 days and a standard deviation of 3.2 days [23]. We drew a random incubation period for each case to simulate its date of symptom onset.
To obtain the corresponding date of report, we drew a reporting delay from a discrete delay distribution. As the baseline delay, we used a discretized lognormal distribution with a mean of 9 To simulate incomplete line list data with missing symptom onset dates, we selected cases with symptom onset on date t to have a missing onset date with probability 1 −α t , and let 1 −α t vary according to a random piecewise linear function with changepoints every 4 weeks, where the values at the changepoints were drawn from a geometric random walk constrained between 20% and 60% missingness.  The second wave scenario starts with a higher number of infections and equilibrium-level transmission, which increases to exponential growth after 70 days, and finally transitions into exponential decline. Differences in reporting delay, e. g. reduced reporting on weekends, only affect the time series of cases by date of report.

D.2 Performance evaluation
We produced 50 different simulation runs for the first wave and second wave scenario, respectively. For each scenario, the above procedure yielded three differently lagged nowcasts for each approach, for each epidemic phase, and for each of the 50 simulation runs. To evaluate the nowcasting performance of the different approaches, we compared each nowcast with the simulated ground truth from the respective synthetic line list. We assessed performance with regard to nowcasting the number of symptom onsets N t and nowcasting the effective reproduction number R t , using the weighted interval score (WIS) as described in the main text.

D.3 Supplementary figures for results on synthetic data
In the following, we provide supplementary figures for the results on synthetic data. These include the results of the main analyses for the second wave scenario (Figures 4 -7), results for models with exponential smoothing priors (Figures 8 -11), and results for the stepwise approach using EpiEstim (Figures 12 and 13).

21/36
Nowcasting approach Stepwise (truncation adjustment −> renewal model)  Shown are the true number of cases by symptom onset date N t (black), the number of cases reported until the nowcast date (grey bars), and point nowcasts with 95% credible intervals (CrI) in four different phases of the epidemic wave, obtained through i) a stepwise approach using cases by date of symptom onset with a truncation adjustment step (blue), and ii) a generative approach using cases by date of symptom onset with an integrated truncation and renewal model (red). The direct approach using cases by date of report does not produce nowcasts of N t (not shown). Shown below each phase is the weighted interval score (WIS, lower is better) for N t nowcasts of each approach during a selected week (grey shade) over 50 scenario runs. Colored bars show average scores, decomposed into penalties for underprediction (crosshatch), dispersion (circles), and overprediction (stripes). The horizontal proportion bar below shows the percentages of runs where each approach performed best. Results are shown for nowcasts made at different lags from the selected week (vertical dotted lines), i. e. at the end of the selected week (top row), one week later (middle row), and two weeks later (bottom row).

22/36
Nowcasting approach Direct (renewal model) Stepwise (truncation adjustment −> renewal model)  Shown are the true R t (black) and point nowcasts with 95% credible intervals (CrI) in four different phases of the epidemic wave, obtained through i) a direct approach using cases by date of report with no truncation adjustment (yellow), ii) a stepwise approach using cases by date of symptom onset with a truncation adjustment step (blue), and iii) a generative approach using cases by date of symptom onset with an integrated truncation and renewal model (red). Shown below each phase is the weighted interval score (WIS, lower is better) for R t nowcasts of each approach during a selected week (grey shade) over 50 scenario runs. Colored bars show average scores, decomposed into penalties for underprediction (crosshatch), dispersion (circles), and overprediction (stripes). The horizontal proportion bar below shows the percentages of runs where each approach performed best. Results are shown for nowcasts made at different lags from the selected week (vertical dotted lines), i. e. at the end of the selected week (top row), one week later (middle row), and two weeks later (bottom row).

23/36
Imputation approach Stepwise (independent imputation −> nowcasting model) Stepwise  Shown are the true number of cases by symptom onset date N t (black), the number of cases reported until the nowcast date (dark grey bars for known, light grey bars for missing onset dates), and point nowcasts with 95% credible intervals (CrI) in four different phases of the epidemic wave, obtained through i) a stepwise approach using a forward imputation step (green), ii) a stepwise approach using a backward imputation step (blue), and iii) a generative approach using an integrated missingness model (red). All approaches used a generative model for nowcasting. Shown below each phase is the weighted interval score (WIS, lower is better) for N t nowcasts of each approach during a selected week (grey shade) over 50 scenario runs. Colored bars show average scores, decomposed into penalties for underprediction (crosshatch), dispersion (circles), and overprediction (stripes). The horizontal proportion bar below shows the percentages of runs where each approach performed best. Results are shown for nowcasts made at different lags from the selected week (vertical dotted lines), i. e. at the end of the selected week (top row), one week later (middle row), and two weeks later (bottom row).

24/36
Imputation approach Stepwise (independent imputation −> nowcasting model) Stepwise  Nowcasts of R t on incomplete line list data of a simulated second wave scenario using different approaches of accounting for missing onset dates. Shown are the true R t (black) and point nowcasts with 95% credible intervals (CrI) in four different phases of the epidemic wave, obtained through i) a stepwise approach using a forward imputation step (green), ii) a stepwise approach using a backward imputation step (blue), and iii) a generative approach using an integrated missingness model (red). All approaches used a generative model for nowcasting. Shown below each phase is the weighted interval score (WIS, lower is better) for R t nowcasts of each approach during a selected week (grey shade) over 50 scenario runs. Colored bars show average scores, decomposed into penalties for underprediction (crosshatch), dispersion (circles), and overprediction (stripes). The horizontal proportion bar below shows the percentages of runs where each approach performed best. Results are shown for nowcasts made at different lags from the selected week (vertical dotted lines), i. e. at the end of the selected week (top row), one week later (middle row), and two weeks later (bottom row).

25/36
Nowcasting approach  Shown are the true number of cases by symptom onset date N t (black), the number of cases reported until the nowcast date (grey bars), and point nowcasts with 95% credible intervals (CrI) in four different phases of the epidemic wave, obtained through i) a generative approach with a stationary random walk prior on R t (red, same as main text) ii) a stepwise approach with a non-stationary exponential smoothing prior on λ t (blue), and iii) a generative approach with a non-stationary exponential smoothing prior on R t (beige). Shown below each phase is the weighted interval score (WIS, lower is better) for N t nowcasts of each approach during a selected week (grey shade) over 50 scenario runs. Colored bars show average scores, decomposed into penalties for underprediction (crosshatch), dispersion (circles), and overprediction (stripes). The horizontal proportion bar below shows the percentages of runs where each approach performed best. Results are shown for nowcasts made at different lags from the selected week (vertical dotted lines), i. e. at the end of the selected week (top row), one week later (middle row), and two weeks later (bottom row).

26/36
Nowcasting approach  Shown are the true number of cases by symptom onset date N t (black), the number of cases reported until the nowcast date (grey bars), and point nowcasts with 95% credible intervals (CrI) in four different phases of the epidemic wave, obtained through i) a generative approach with a stationary random walk prior on R t (red, same as main text) ii) a stepwise approach with a non-stationary exponential smoothing prior on λ t (blue), and iii) a generative approach with a non-stationary exponential smoothing prior on R t (beige). Shown below each phase is the weighted interval score (WIS, lower is better) for N t nowcasts of each approach during a selected week (grey shade) over 50 scenario runs. Colored bars show average scores, decomposed into penalties for underprediction (crosshatch), dispersion (circles), and overprediction (stripes). The horizontal proportion bar below shows the percentages of runs where each approach performed best. Results are shown for nowcasts made at different lags from the selected week (vertical dotted lines), i. e. at the end of the selected week (top row), one week later (middle row), and two weeks later (bottom row).

27/36
Nowcasting approach Generative (truncation adjustment + renewal model) Stepwise Shown are the true R t (black) and point nowcasts with 95% credible intervals (CrI) in four different phases of the epidemic wave, obtained through i) a generative approach with a stationary random walk prior on R t (red, same as main text) ii) a stepwise approach with a non-stationary exponential smoothing prior on λ t (blue), and iii) a generative approach with a non-stationary exponential smoothing prior on R t (beige). Shown below each phase is the weighted interval score (WIS, lower is better) for R t nowcasts of each approach during a selected week (grey shade) over 50 scenario runs. Colored bars show average scores, decomposed into penalties for underprediction (crosshatch), dispersion (circles), and overprediction (stripes). The horizontal proportion bar below shows the percentages of runs where each approach performed best. Results are shown for nowcasts made at different lags from the selected week (vertical dotted lines), i. e. at the end of the selected week (top row), one week later (middle row), and two weeks later (bottom row).

28/36
Nowcasting approach Generative (truncation adjustment + renewal model) Stepwise (truncation adjustment (ETS) −> renewal model)  Nowcasts of R t on line list data of a simulated second wave scenario using different approaches of adjusting for right truncation with stationary and non-stationary smoothing priors. Shown are the true R t (black) and point nowcasts with 95% credible intervals (CrI) in four different phases of the epidemic wave, obtained through i) a generative approach with a stationary random walk prior on R t (red, same as main text) ii) a stepwise approach with a non-stationary exponential smoothing prior on λ t (blue), and iii) a generative approach with a non-stationary exponential smoothing prior on R t (beige). Shown below each phase is the weighted interval score (WIS, lower is better) for R t nowcasts of each approach during a selected week (grey shade) over 50 scenario runs. Colored bars show average scores, decomposed into penalties for underprediction (crosshatch), dispersion (circles), and overprediction (stripes). The horizontal proportion bar below shows the percentages of runs where each approach performed best. Results are shown for nowcasts made at different lags from the selected week (vertical dotted lines), i. e. at the end of the selected week (top row), one week later (middle row), and two weeks later (bottom row).

29/36
Reproduction number  Nowcasts of R t on line list data of a simulated first wave scenario using a stepwise approach with a renewal model and EpiEstim for R t estimation. Shown are the true R t (black) and point nowcasts with 95% credible intervals (CrI) in four different phases of the epidemic wave, obtained through i) a stepwise approach using a renewal model for R t estimation (blue), and ii) a stepwise approach using EpiEstim with a smoothing window of 7 days for R t estimation (green). Shown below each phase is the weighted interval score (WIS, lower is better) for R t nowcasts of each approach during a selected week (grey shade) over 50 scenario runs. Colored bars show average scores, decomposed into penalties for underprediction (crosshatch), dispersion (circles), and overprediction (stripes). The horizontal proportion bar below shows the percentages of runs where each approach performed best. Results are shown for nowcasts made at different lags from the selected week (vertical dotted lines), i. e. one week later (middle row), and two weeks later (bottom row). Nowcasts using EpiEstim were shifted backward in time to account for the incubation period and to center the smoothing window of EpiEstim, and are therefore only available at a lag of 8 days or longer.

30/36
Reproduction number  Nowcasts of R t on line list data of a simulated second wave scenario using a stepwise approach with a renewal model and EpiEstim for R t estimation. Shown are the true R t (black) and point nowcasts with 95% credible intervals (CrI) in four different phases of the epidemic wave, obtained through i) a stepwise approach using a renewal model for R t estimation (blue), and ii) a stepwise approach using EpiEstim with a smoothing window of 7 days for R t estimation (green). Shown below each phase is the weighted interval score (WIS, lower is better) for R t nowcasts of each approach during a selected week (grey shade) over 50 scenario runs. Colored bars show average scores, decomposed into penalties for underprediction (crosshatch), dispersion (circles), and overprediction (stripes). The horizontal proportion bar below shows the percentages of runs where each approach performed best. Results are shown for nowcasts made at different lags from the selected week (vertical dotted lines), i. e. at the end of the selected week (top row), one week later (middle row), and two weeks later (bottom row).

31/36 E Application to COVID-19 in Switzerland E.1 Hospitalization line list data
Hospitalization line list data of patients who tested positive for COVID-19 was provided in anonymized form by the Swiss Federal Office of Public Health (FOPH). Cases were stratified by date of symptom onset, date of hospitalization, and date of report of the hospitalization to FOPH. When nowcasting, it is essential to use the date when a case effectively became available for analysis, which was the date of report of the hospitalization in our case. If we instead used the date of hospitalization as report date, any right truncation resulting from the delay between hospitalization and reporting of the hospitalization would be neglected, and the nowcasts would still be downward biased. The modeled delays in this study therefore implicitly included both the delay between symptom onset and hospitalization and the delay between hospitalization and reporting of the hospitalization. The date of hospitalization itself was not explicitly modeled. Figure 14 shows the number of cases over time by date of report of the hospitalization, stratified by cases with known and with missing symptom onset date. The weekly moving average percentage of cases with missing symptom onset date in the line list varied between 16% and 63%. Figure 15 shows incubation period and generation interval distributions explicitly in the renewal model but simply supplied different parameters to the nowcasting models over time.