The age profile of respiratory syncytial virus burden in preschool children of low- and middle-income countries: A semi-parametric, meta-regression approach

Background
Respiratory syncytial virus (RSV) infections are among the primary causes of death for children under 5 years of age worldwide. A notable challenge with many of the upcoming prophylactic interventions against RSV is their short duration of protection, making the age profile of key interest to the design of prevention strategies.

Methods and findings
We leverage the RSV data collected on cases, hospitalizations, and deaths in a systematic review in combination with flexible generalized additive mixed models (GAMMs) to characterize the age burden of RSV incidence, hospitalization, and hospital-based case fatality rate (hCFR). Owing to the flexible nature of GAMMs, we estimate the peak, median, and mean incidence of infection to inform discussions on the ideal "window of protection" of prophylactic interventions. In a secondary analysis, we reestimate the burden of RSV in all low- and middle-income countries. The peak age of community-based incidence is 4.8 months, and the mean and median age of infection are 18.9 and 14.7 months, respectively. Estimating the age profile using the incidence coming from hospital-based studies yields a slightly younger age profile, in which the peak age of infection is 2.6 months and the mean and median age of infection are 15.8 and 11.6 months, respectively. More severe outcomes, such as hospitalization and in-hospital death, have a younger age profile. Children under 6 months of age constitute 10% of the population under 5 years of age but bear 20% to 29% of cases, 28% to 39% of hospitalizations, and 38% to 50% of deaths. In an average year, we estimate 28.23 to 31.34 million cases of RSV, 2.95 to 3.35 million hospitalizations, and 16,835 to 19,909 in-hospital deaths in low-, lower-middle-, and upper-middle-income countries. In addition, we estimate 17,254 to 23,875 deaths in the community, for a total of 34,114 to 46,485 deaths. Globally, evidence shows that community-based incidence may differ by World Bank Income Group, but not hospital-based incidence, probability of hospitalization, or the probability of in-hospital death (p ≤ 0.01, p = 1, p = 0.86, and p = 0.63, respectively). Our study is limited mainly by the sparsity of the data, especially for low-income countries (LICs). The lack of information for some populations makes detecting heterogeneity between income groups difficult, and differences in access to care may impact the reported burden.

Conclusions
We have demonstrated an approach to synthesize information on RSV outcomes in a statistically principled manner, and we estimate that the age profile of RSV burden depends on whether information on incidence is collected in hospitals or in the community. Our results suggest that the ideal prophylactic strategy may require multiple products to avert the risk among preschool children.


First, we discuss some small details of how the data were extracted in order to work with our approach. A brief overview of generalized additive mixed models, the family of models behind our approach, is given in section S1.3. We then explain in detail how our model choices led to the outcome models (OMs), the burden models (BMs), and our estimates of the window of vulnerability for RSV infection, hospitalization, and death. The explicit mathematical specification behind the relationships depicted in Fig 1 is stated in section S1.4, and the model specifications for each of the four splines are given in section S1.5. The specification of the burden models is discussed in detail in section S1.6. Lastly, section S1.7 describes how the mean, median, and peak age of infection were calculated. We made some small modifications to the way the data were presented by Shi et al. in order to make them amenable to our approach in R [1,2]. No modifications were necessary to the Li et al. data.

S1.1.1 Modifications to incidence data from Shi et al.
Some data were only reported as estimates and standard deviations, requiring us to back-calculate the cases and the population. We assumed that the rate had been estimated as

$$\hat{\lambda} = \frac{x}{N}$$

with the confidence interval constructed on the log rate:

$$\ln(\hat{\lambda}) \pm \frac{z_{\alpha/2}}{\sqrt{x}}$$

where
• $N$ is the population time observed.
• $x$ are the cases observed.
• $z_{\alpha/2}$ is the z-score for the 2.5th and 97.5th percentile, usually the value of 1.96.

The above formulation avoids negative confidence bounds, which aligns with the confidence intervals in the Shi et al. data. Therefore, we back-calculated the cases with the following formula

$$x = \left( \frac{-z_{\alpha/2}}{\ln(\text{low CI} / \hat{\lambda})} \right)^{2}$$

and back-calculated the population by $N = x / \hat{\lambda}$.

S1.1.2 Modifications to fatality (hCFR) data from Shi et al.
When the probability of deaths in the hospital (hCFR) was presented as 0.0005, we calculated a count of 0. When no deaths were given in a category, we assumed that that category was not observed and we placed an NA instead of 0.

S1.2 Descriptive summaries of all data
Figure A: Geographic distribution of the available data. Countries colored in gray were countries that we did not have data from or were high-income countries and thus excluded from our analysis.
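The back-calculation described in section S1.1.1 can be sketched in a few lines. This is a minimal illustration that assumes the interval was built on the log scale with a standard error of one over the square root of the cases; the function name `back_calculate` and the example study (50 cases in 1,000 child-years) are hypothetical and not taken from the Shi et al. data.

```python
import math

def back_calculate(rate, low_ci, z=1.96):
    """Recover cases and person-time from a rate and its lower confidence bound.

    Assumes (as in section S1.1.1) that the interval was computed on the
    log scale with standard error 1 / sqrt(cases).
    """
    cases = (-z / math.log(low_ci / rate)) ** 2
    population = cases / rate
    return cases, population

# Hypothetical study: 50 cases observed over 1,000 child-years.
true_cases, true_population = 50, 1000.0
rate = true_cases / true_population
low_ci = rate * math.exp(-1.96 / math.sqrt(true_cases))

# Feeding the published rate and lower bound back in recovers the inputs.
cases, population = back_calculate(rate, low_ci)
```

Because the lower bound is always positive under this formulation, the logarithm is defined whenever the reported lower bound is below the point estimate.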

S1.3 Generalized additive mixed models (GAMM)
Generalized additive models. Unlike fully parametric models, in which the predictors are assumed to have a relationship to the outcome that can be characterized by linear combinations of predictors (potentially a strong or unwarranted assumption), generalized additive models (GAMs) do not assume that the relationship between the predictor and the outcome is linear (or a simple transformation of a linear function, such as a log-linear or logit-linear function) [3,4]. Instead, GAMs are the sum of polynomials that result in a variety of flexible curves or splines. The shape of a spline is broadly determined by two components: 1) the number of "knots", or places where the curve changes direction or convexity, and 2) the "smoothing" parameter, which constrains the number of grooves and inflections of a curve to achieve both parsimony and more general predictive capacity [5,3,4]. The smoothing parameter weights a penalty term on the objective function, based on the second derivative of the fitted curve, thus penalizing more complicated curves. The penalty term in the smoothing splines circumvents the need to find the ideal number or location of knots in each regression model. The smoothing parameter of the penalty term is selected by maximum likelihood [5,3,4].
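As a sketch of the penalty described above, a penalized fit for a single smooth term $f$ of a predictor $x$ can be written in the following generic form. The notation is ours and is not taken from the cited references: $\ell(f)$ denotes the log-likelihood of the data under the curve $f$, and $\lambda \geq 0$ denotes the smoothing parameter.

$$\hat{f} = \arg\min_{f} \left\{ -\ell(f) + \lambda \int \left[ f''(x) \right]^{2} \, dx \right\}$$

Under this form, a straight line has a second derivative of zero and incurs no penalty, while curves with many inflections are penalized more heavily as $\lambda$ grows.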
Non-normal outcomes. The observed measurements are assumed to arise from probabilistic processes that can be approximated by off-the-shelf distributions (normal, binomial with logit-link, Poisson with log-link), much like the more conventional generalized linear models, and thus the statistical package employs maximum-likelihood methods (or variants of them) to estimate the placement of the knots and the parameters of the polynomials [5,6].
Mixed-effects framework. An extension to GAMs, generalized additive mixed models (GAMMs), makes it possible to estimate fixed and random effects for these models. In our analysis, the impact of age on the outcome of interest (incidence, hospitalizations, deaths) was a fixed effect, and the random effect accounted for the differences between studies. This approach is analogous to the "meta-regression frameworks" observed in the literature [7,8,9,10,11]. In other words, splines in a mixed-effects framework decompose the observed variance into two components: a global trend (by age) using a fixed-effect spline, and a study-specific trend using random-effect splines. Specifically, the random-effect component accounts for the difference between the age-related trend observed in each study and the global trend.

S1.4 Outcome Models (OM) I & II
Four splines were constructed to characterize different features of RSV epidemiology in children under the age of 5 years:
I) incidence as measured in community-based studies
II) incidence as measured in hospital-based studies
III) probability of hospitalization (measured in community-based studies)
IV) probability of death among hospitalized patients (measured in hospital-based studies)
As illustrated in Fig 1, the incidence of all cases, hospitalized cases, and deaths (in the hospital and in the community) can be estimated in two ways: 1) from the extant data by taking the incidence from community-based studies and using the probabilities to calculate hospitalizations and hospital-based deaths (OM I); or 2) by taking the incidence from hospital-based studies (as an index of the relative level of disease at each age group) and back-calculating the incidence of disease in the community and calculating the incidence of deaths in the hospital (OM II). It should be noted that the product of incidence and probability yields an incidence.
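The two routes can be illustrated with a small sketch. All numbers below are hypothetical values standing in for predictions of splines I to IV at three ages; they are not estimates from this analysis, and the function names are ours. The division in the second function reflects the back-calculation of community incidence from hospital incidence.

```python
# Hypothetical spline predictions at three ages (months); not results from the paper.
ages = [1.5, 6.0, 24.0]
community_incidence = [0.30, 0.20, 0.05]    # spline I, cases per child-year
hospital_incidence = [0.060, 0.020, 0.002]  # spline II, hospitalizations per child-year
p_hosp = [0.20, 0.10, 0.04]                 # spline III, P(hospitalized | case)
hcfr = [0.020, 0.010, 0.005]                # spline IV, P(death | hospitalized)

def outcome_model_1(incidence, p_h, cfr):
    """OM I: start from community incidence and multiply forward."""
    hosp = [i * p for i, p in zip(incidence, p_h)]
    deaths = [h * f for h, f in zip(hosp, cfr)]
    return incidence, hosp, deaths

def outcome_model_2(hosp, p_h, cfr):
    """OM II: start from hospital incidence and back-calculate all cases."""
    cases = [h / p for h, p in zip(hosp, p_h)]
    deaths = [h * f for h, f in zip(hosp, cfr)]
    return cases, hosp, deaths

om1_cases, om1_hosp, om1_deaths = outcome_model_1(community_incidence, p_hosp, hcfr)
om2_cases, om2_hosp, om2_deaths = outcome_model_2(hospital_incidence, p_hosp, hcfr)
```

In both routes every derived quantity remains an incidence, because each step multiplies or divides an incidence by a probability.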

S1.4.1 OM I: community-based studies as a basis for incidence
For OM I, the basis of incidence calculations comes from community-based incidence studies:
All incidence = community-based incidence spline
Hospitalization incidence = community-based incidence spline × hospitalization probability spline
Incidence of hospital-based mortality = community-based incidence spline × hospitalization probability spline × hCFR probability spline

S1.4.2 OM II: hospital-based studies as a basis for incidence
For OM II, the basis of incidence calculations comes from hospital-based incidence studies:
All incidence = hospital-based incidence spline / hospitalization probability spline
Hospitalization incidence = hospital-based incidence spline
Incidence of hospital-based mortality = hospital-based incidence spline × hCFR probability spline

S1.5 Spline Models

Incidence Specifications
For incidence splines (splines I and II, the incidence of RSV based on community- and hospital-based studies, respectively) we assume a Poisson function:

$$\log(\text{cases}_{s} / \text{population}_{s}) = \beta_0 + b_{s} + f(\log(\text{age})) + f_{s}(\log(\text{age}))$$

Where:
• $\beta_0$ is the overall intercept (the baseline log incidence in any low- or middle-income country)
• $b_{s}$ is the study-specific ($s$) random effect on the intercept
• $f(\log(\text{age}))$ is a global trend (fixed-effect) spline for the relationship between the outcome and age, constructed with a thin-plate smooth with a penalty on the null space (shrunk to zero).
• $f_{s}(\log(\text{age}))$ is a set of study-specific ($s$) random-effects splines constructed with a tensor product basis of a thin-plate smooth with a penalty on the null space (shrunk to zero) and a random-effects basis.

Traditionally, the population element is put on the right-hand side as a predictor constrained to have a coefficient equal to one (called an "offset"). This follows from a property of the logarithm of a ratio:

$$\log(\text{cases}_{s}) - \log(\text{population}_{s}) = \beta_0 + b_{s} + f(\log(\text{age})) + f_{s}(\log(\text{age}))$$

$$\log(\text{cases}_{s}) = \beta_0 + b_{s} + f(\log(\text{age})) + f_{s}(\log(\text{age})) + \text{offset}[\log(\text{population}_{s})]$$

Incidence Splines: Income Group Stratification
We compared the null model above with one in which the age trend differs by World Bank income group classification:

$$\log(\text{cases}_{s}) = \beta_0 + b_{s} + \beta_{g} + f_{g}(\log(\text{age})) + f_{s}(\log(\text{age})) + \text{offset}[\log(\text{population}_{s})]$$

Where:
• $\beta_{g}$ are the fixed-effect parametric coefficient terms for the indicator variables representing the World Bank income group $g$.
• $f_{g}(\log(\text{age}))$ is a trend (fixed-effect) spline for the relationship between the outcome and age, stratified by World Bank income group and constructed with a thin-plate smooth with a penalty on the null space (shrunk to zero).
As an analogy to the more familiar linear regression, the age and the income group interact so that the difference between the outcome across income groups is not uniform across ages, but rather that the relationship of age and the outcome is unique for each set of countries in an income group. It should be noted that the spline-fitting package we employed centered the factor-specific (income-group-specific) splines, and the authors advised that the factor also be added to the model as a main effect. For more information, see the documentation for R command s(), part of the mgcv package.
Additionally, we tested the possibility that the random effects (both on the intercept and on the spline) also differed by income group classification:

$$\log(\text{cases}_{s}) = \beta_0 + b_{s,g} + \beta_{g} + f_{g}(\log(\text{age})) + f_{s,g}(\log(\text{age})) + \text{offset}[\log(\text{population}_{s})]$$

There were two reasons that we ran alternative models with an interaction term rather than running separate models using data from each income group. First, our approach made it possible to do model comparisons between the models that did not stratify countries by income group (global models) and those that did (income-group models).
Model selection. In order to select between the null model (no income group stratification) and the two models with income group stratification, we examined the χ² test of the deviance between models, with degrees of freedom equal to the difference in the number of parameters between the two models. This is the likelihood ratio test, obtained with the command anova(model1$mer, model2$mer).
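The arithmetic behind such a likelihood ratio test can be sketched as follows. This is a standalone illustration rather than the computation performed inside R: the log-likelihoods and parameter counts are hypothetical, and the helper `chi2_sf` is our own series approximation to the χ² upper-tail probability, adequate for the moderate test statistics typical of model comparison.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with df degrees of freedom.

    Uses the power series for the regularized lower incomplete gamma
    function P(df/2, x/2) and returns 1 - P.
    """
    if x <= 0:
        return 1.0
    a, t = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= t / (a + n)
        total += term
    lower = total * math.exp(-t + a * math.log(t) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def likelihood_ratio_test(loglik_null, loglik_alt, n_par_null, n_par_alt):
    """Deviance difference between nested models, its degrees of freedom, and p-value."""
    deviance_diff = 2.0 * (loglik_alt - loglik_null)
    df = n_par_alt - n_par_null
    return deviance_diff, df, chi2_sf(deviance_diff, df)

# Hypothetical fits: a global model and an income-group model with two extra parameters.
stat, df, p_value = likelihood_ratio_test(-512.4, -509.1, 6, 8)
```

A small p-value favors the more complex (income-group) model; otherwise the simpler global model is retained.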

Probability Splines
For probability splines (splines III and IV, the probability of hospitalization and the conditional probability of death among hospitalized cases, respectively), we assume a binomial process as follows:

$$\text{logit}(p_{s}) = \beta_0 + b_{s} + f(\log(\text{age})) + f_{s}(\log(\text{age}))$$

Here, too, we compared the null model above with one in which the age trend differs by World Bank income group classification:

$$\text{logit}(p_{s}) = \beta_0 + b_{s} + \beta_{g} + f_{g}(\log(\text{age})) + f_{s}(\log(\text{age}))$$

Additionally, we also tested the possibility that the random effects (both on the intercept and on the spline) also differed by income group classification:

$$\text{logit}(p_{s}) = \beta_0 + b_{s,g} + \beta_{g} + f_{g}(\log(\text{age})) + f_{s,g}(\log(\text{age}))$$

In summary, model parameters are as follows:

• $\beta_0$ is the overall intercept (the probability of hospitalization or death at age 0 in any low- or middle-income country)
• $b_{s}$ is the study-specific ($s$) random effect on the intercept
• $b_{s,g}$ is the study-specific ($s$) random effect on the intercept stratified by World Bank income group ($g$)
• $\beta_{g}$ are the fixed-effect parametric coefficient terms for the World Bank income group $g$.
• $f(\log(\text{age}))$ is a global trend (fixed-effect) spline for the relationship between the outcome and age, constructed with a thin-plate smooth with a penalty on the null space (shrunk to zero).
• $f_{g}(\log(\text{age}))$ is a trend (fixed-effect) spline for the relationship between the outcome and age, stratified by World Bank income group and constructed with a thin-plate smooth with a penalty on the null space (shrunk to zero).
• $f_{s}(\log(\text{age}))$ is a set of study-specific ($s$) random-effects splines constructed with a tensor product basis of a thin-plate smooth with a penalty on the null space (shrunk to zero) and a random-effects basis.
• $f_{s,g}(\log(\text{age}))$ is a set of study-specific ($s$) random-effects splines stratified by World Bank income group and constructed with a tensor product basis of a thin-plate smooth with a penalty on the null space (shrunk to zero) and a random-effects basis.
• $\text{offset}[\log(\text{population}_{s})]$ is a predictor constrained to have a coefficient equal to one. In these models, it adjusts for the population of the study ($s$).
Below we describe the specifications for our model and our rationale for these choices.

S1.5.1 Spline specifications and computational considerations
Mapping discrete age data to a continuous number line. The age-specific data in the literature were presented by discrete age groups, but spline models treat age as a continuous variable. For this reason, the midpoint of the age group was used to map onto the continuous real number line (e.g., observations of an age group of 0-3 months were considered to be the observations of children all aged 1.5 months).
Log-transforming the age variable. Studies in the literature tend to present the incidence of RSV at a relatively high resolution (narrow age bands) for children under the age of 12 or 24 months and in very broad age bands for children over 12 or 24 months of age. Estimating a spline model on such data on the untransformed (linear) age scale would run the risk that the data points of older age groups would have undue leverage over the spline. We therefore modeled the outcomes as functions of log(age), which spaces the age groups more evenly.
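The two preprocessing steps above can be sketched as follows. The age bands are hypothetical and merely mimic the pattern of narrow bands in infancy and broad bands afterwards; the function name is ours.

```python
import math

def age_band_to_log_midpoint(lower_months, upper_months):
    """Map an age band to its midpoint (months) and the log of that midpoint."""
    midpoint = (lower_months + upper_months) / 2.0
    return midpoint, math.log(midpoint)

# Hypothetical age bands (months): narrow in infancy, broad afterwards.
bands = [(0, 3), (3, 6), (6, 12), (12, 24), (24, 60)]
mapped = [age_band_to_log_midpoint(lo, hi) for lo, hi in bands]
midpoints = [m for m, _ in mapped]
log_midpoints = [v for _, v in mapped]

# Gaps between neighbouring age points on each scale.
linear_gaps = [b - a for a, b in zip(midpoints, midpoints[1:])]
log_gaps = [b - a for a, b in zip(log_midpoints, log_midpoints[1:])]
```

For these bands the gaps between neighbouring midpoints range from 3 to 24 months on the linear scale but only from about 0.69 to 1.10 on the log scale, so the oldest age groups no longer sit far from the rest of the data.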
Spline bases. We used a "shrinkage" variant of the thin-plate spline ("ts" in R), which "shrinks" the spline towards a straight line; in other words, the algorithm encourages shapes that are as close to a straight line as is sensible unless there is strong evidence that an alternative shape is warranted, thus avoiding overly complex curves to describe such diverse settings as the low- and middle-income countries. This was especially important for the probability of death (hCFR) splines, which were sometimes prone to unusual shapes to capture the few deaths that were observed in the data.
A tensor product basis was used to construct the random-effect splines specific to each study. Such a spline was the interaction of thin-plate splines and random-effect splines (spline basis "re" in R) in addition to random-effect parametric terms that were assumed to be normally distributed (constituting a ridge penalty and specified by random=~(EconomicSetting|studyno)) [3,6].
Number and placement of knots. We avoided choosing the ideal number of knots by using the "shrinkage" version of the thin-plate spline, which places a knot at every data point and then estimates a "smoothing" parameter to avoid "unnecessary" inflections at each knot. See Section S1.3 of this supplement. The shrinkage version of the thin-plate smooth has a penalty on the null space so that the smoothing term is shrunk to zero.
It should be noted that we excluded studies that had data for fewer than three age groups, as no age curve would be possible other than a straight line. However, we used these studies as an informal "out-of-sample validation" to visually assess how our splines would have predicted the incidence, hospitalization, and death from studies that were not used to estimate the spline models. The results for out-of-sample validations are in Sections S2-2.1 to S2-2.4 in S2 Text.
Model comparison/variable selection. We selected the model according to the likelihood ratio test since the simpler models were nested within the more sophisticated models. The likelihood ratio test uses a χ² test of the deviance with degrees of freedom equal to the difference in the number of parameters between the two models.
Family for each outcome. As described in section S1.3 of this supplement, we used regressions assuming a Poisson distribution of the outcome (with log-link) for the incidence splines (Splines I and II) and regressions assuming a binomial distribution of the outcome (with logit-link) for the probability splines (Splines III and IV).
As is common practice for Poisson models, overdispersion was checked (via the check_overdispersion command of the performance R package) [12]. While the community-based incidence splines (Spline I) showed no evidence of overdispersion, the hospital-based incidence splines (Spline II) did, and so we modeled that spline with a negative binomial family of models: the same form as the Poisson function above but with an additional parameter modeling a larger variance than would be assumed with a Poisson family of models.
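One common way to quantify overdispersion is the ratio of the Pearson chi-squared statistic to the residual degrees of freedom, sketched below. This is an illustration of the general idea only; the exact computation performed by the R package for mixed models may differ, and the counts, fitted means, and parameter count are hypothetical.

```python
def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-squared statistic divided by the residual degrees of freedom.

    Under a well-specified Poisson model the ratio should be close to 1;
    values well above 1 suggest overdispersion.
    """
    chi_sq = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    resid_df = len(observed) - n_params
    return chi_sq / resid_df

# Hypothetical case counts and Poisson-fitted means for eight age bands.
observed = [12, 30, 7, 55, 3, 41, 18, 9]
fitted = [20.0, 22.0, 15.0, 35.0, 8.0, 30.0, 25.0, 12.0]
ratio = pearson_dispersion(observed, fitted, n_params=2)
```

In this made-up example the ratio is roughly 5.3, the kind of value that would motivate a negative binomial family in place of the Poisson.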

S1.5.2 Predictions of the outcome model and uncertainty
We assumed that the posterior predictions of the splines would arise from a product of 1) a matrix of parameters × iterations (5,000) representing the random draws from a multivariate normal distribution of the coefficients for the polynomial spline and 2) a matrix describing the estimated spline. The product is then exponentiated (for incidence outcomes estimated with a log-link) or transformed with an inverse-logit formula to the range of (0,1) (for probability outcomes estimated with a logit-link). When we combined the 5,000 splines as illustrated in Fig 1 and described in section S1.4 of this supplement, we were able to calculate the credible intervals of each of the outcomes by taking the median, mean, and the 2.5th and the 97.5th percentile of the outcome of interest.
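The simulation described above can be sketched with the standard library alone. Everything numeric here is hypothetical: the two coefficients, their covariance matrix, and the two-column `basis` are stand-ins for the fitted spline coefficients and the spline design matrix, and the helper functions are ours. The example uses a log link, so the linear predictor is exponentiated.

```python
import math
import random

def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower

def draw_coefficients(mean, cov, n_draws, rng):
    """Draw coefficient vectors from a multivariate normal distribution."""
    lower = cholesky(cov)
    draws = []
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in mean]
        draws.append([m + sum(lower[i][k] * z[k] for k in range(i + 1))
                      for i, m in enumerate(mean)])
    return draws

def percentile(sorted_values, q):
    """Percentile by linear interpolation between order statistics."""
    pos = q * (len(sorted_values) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])

# Hypothetical coefficient estimates, covariance, and design rows (one per age).
coef_mean = [-1.5, -0.4]
coef_cov = [[0.04, -0.01], [-0.01, 0.02]]
basis = [[1.0, math.log(age)] for age in (1.5, 6.0, 24.0)]

rng = random.Random(2024)
draws = draw_coefficients(coef_mean, coef_cov, 5000, rng)

summaries = []
for row in basis:
    # Linear predictor for every draw, back-transformed with exp() for a log link.
    curve = sorted(math.exp(sum(b * x for b, x in zip(beta, row))) for beta in draws)
    summaries.append((percentile(curve, 0.025), percentile(curve, 0.5),
                      percentile(curve, 0.975)))
```

Each entry of `summaries` holds the 2.5th percentile, median, and 97.5th percentile of the simulated incidence at one age; for a probability outcome the `math.exp` call would be replaced by an inverse-logit transformation.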

S1.6 Burden Models (BM) I & II
As shown in Fig 1, there are two ways to calculate the burden of disease in one country: 1) for Burden Model I, we apply the country population size to the incidences calculated in Outcome Models I and II, and then we aggregate cases into age subgroups as desired; 2) for Burden Model II, we take the number of cases under 5 in each country as calculated by Shi's risk-factor model, apply the proportion of cases that occur in each month of age according to our splines, and aggregate cases into age subgroups as desired. Because there are two ways to calculate age-specific incidence in part (A) and two ways to calculate burden in part (B), four sets of burden estimates result.
Shi and colleagues presented two models in their paper: one was the total number of cases that would be seen in each country if the incidence was equal to the mean incidence from the meta-analysis. We have re-calculated the analogous incidence estimates according to our spline-based meta-regression in burden model I (BM I).
A second model presented in the supplement predicted the incidence of RSV per country according to a regression model of RSV incidence with a variety of risk factors for the disease. Their risk-factor model of incidence, however, does not disaggregate by month of age; it only presents the total incidence for all children under 5.

S1.6.1 BM I: burden models using OM I and II and population estimates
In an analogous manner to Shi et al.'s main analysis, we took the results of our meta-regressions for OM I and OM II, applied an estimate of the population of each month of age for each country, and estimated the number of cases, hospitalizations, and inpatient deaths in each country.
Each country's population structure in children under 5 was assumed to be equal to that calculated by the UN population database, which contains data on the number of children at birth, at age 1, and at age 5 [13]. Therefore, the estimates of cases implicitly take into account the infant mortality in each country.

$$\text{BM I Cases(OM I)}_{c,m} = \text{All incidence spline (OM I)}_{m} \times \text{population}_{c,m}$$

$$\text{BM I Cases(OM II)}_{c,m} = \text{All incidence spline (OM II)}_{m} \times \text{population}_{c,m}$$

To calculate cases, hospitalizations, and deaths within the hospital for age groups 0-5.9 months old, 6-11.9 months old, and 12-59.9 months old, see below, section S1.6.3.

S1.6.2 BM II: burden models combining the OM I & II, Shi et al.'s risk factor model of RSV incidence among all children under 5, and population estimates
We have also made age-specific incidence estimates that integrated the estimates of RSV incidence for children under 5 in Shi's country-level risk factor model. It should be noted that our estimate of total cases under 5 was not different from theirs, but we re-apportioned cases within the age groups such that their sum would equal Shi's estimate of cases. Our basis for apportioning cases to different age groups was our outcome models, as illustrated in Fig 1. As in BM I, we have two ways of apportioning cases, according to OM I and OM II, so there are two versions of BM II as well:

$$\text{BM II Cases(OM I)}_{c,m} = \text{Risk-factor-model Incidence}_{c,\,0\text{-}59\text{ months}} \times \text{Proportion of cases(OM I)}_{c,m} \times \text{population}_{c}$$

$$\text{BM II Cases(OM II)}_{c,m} = \text{Risk-factor-model Incidence}_{c,\,0\text{-}59\text{ months}} \times \text{Proportion of cases(OM II)}_{c,m} \times \text{population}_{c}$$

Where:
• $c$ is a specific country
• $m$ is the age in months
• Risk-factor-model Incidence 0-59 months is the incidence of all children from 0-59 months of age in Shi's risk factor model of incidence for children of that age group.
• Proportion of cases in each month of age, according to OM I:

$$\text{Proportion of cases (OM I)}_{c,m} = \frac{\text{All incidence spline (OM I)}_{m} \times \text{population}_{c,m}}{\int_{0}^{59.9} \text{All incidence spline (OM I)}_{m} \times \text{population}_{c,m} \, dm}$$

• Proportion of cases in each month of age, same as above, but according to OM II instead of OM I:

$$\text{Proportion of cases (OM II)}_{c,m} = \frac{\text{All incidence spline (OM II)}_{m} \times \text{population}_{c,m}}{\int_{0}^{59.9} \text{All incidence spline (OM II)}_{m} \times \text{population}_{c,m} \, dm}$$

• $\text{population}_{c,m}$ is the population in country $c$ at $m$ months of age, and $\text{population}_{c}$ is the corresponding total for children aged 0-59 months in country $c$

To calculate cases, hospitalizations, and deaths within the hospital for age groups 0-5.9 months old, 6-11.9 months old, and 12-59.9 months old, we aggregated each of these outcomes across $m$ as described for BM I, as detailed in section S1.6.1 of this supplement.
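The apportioning step of BM II can be sketched as follows. The example uses a discrete sum over a handful of age points in place of the integral, and every number (incidence curve, population counts, risk-factor-model incidence) is hypothetical; the function name is ours.

```python
def apportion_cases(total_cases, incidence_by_age, population_by_age):
    """Split a country's total cases across ages in proportion to
    incidence x population at each age."""
    weights = [i * p for i, p in zip(incidence_by_age, population_by_age)]
    total_weight = sum(weights)
    return [total_cases * w / total_weight for w in weights]

# Hypothetical country with four age points; not values from the paper.
incidence = [0.30, 0.25, 0.10, 0.05]     # all-incidence spline from OM I or OM II
population = [10000, 9900, 9800, 9700]  # children at each age point
risk_factor_incidence = 0.12             # under-5 incidence from the risk-factor model

total_cases = risk_factor_incidence * sum(population)
cases_by_age = apportion_cases(total_cases, incidence, population)
```

By construction the age-specific cases add up to the country total implied by the risk-factor model, so only the distribution across ages comes from the outcome model.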
Furthermore, in addition to presenting the age-specific case estimates, we presented the number of hospitalizations and inpatient deaths that would result from a model that assumes the underlying community incidence of RSV is equivalent to that of Shi et al.'s risk factor model.

S1.6.3 Burden models: computational considerations and uncertainty calculations
Approximating integrals. To calculate the number of cases, hospitalizations, and inpatient deaths at each month of age, we approximated the integrals with respect to m numerically rather than calculating them analytically. We took this approach because our splines were predicted at discrete ages rather than expressed as a closed-form mathematical expression that could be integrated. However, we judged that calculating incidence, hospitalization, and death at age intervals of 0.1 months (600 intervals between ages 0-59.9) gave us a sufficient resolution to approximate the integrals shown above.
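A numerical approximation of this kind can be sketched with a simple Riemann sum on a 0.1-month grid. The choice of a left Riemann sum is ours, and the smooth `rate` and constant `population` functions are hypothetical stand-ins for a predicted spline and a country's population by age.

```python
def aggregate_cases(rate_fn, population_fn, start, stop, step=0.1):
    """Approximate the integral of rate x population over [start, stop)
    with a left Riemann sum on a regular age grid (ages in months)."""
    n_steps = int(round((stop - start) / step))
    total = 0.0
    for k in range(n_steps):
        age = start + k * step
        total += rate_fn(age) * population_fn(age) * step
    return total

# Hypothetical curves standing in for a predicted spline and a population density.
def rate(age_months):
    return 0.02 / (1.0 + age_months / 6.0)  # cases per child-month

def population(age_months):
    return 1000.0                            # children per month of age

grid_points = int(round(60 / 0.1))
cases_0_6 = aggregate_cases(rate, population, 0.0, 6.0)
cases_6_12 = aggregate_cases(rate, population, 6.0, 12.0)
cases_0_12 = aggregate_cases(rate, population, 0.0, 12.0)
```

For this hypothetical curve the exact integral over 0 to 6 months is 120 × ln 2, about 83.2 cases, and the grid approximation lands within one case of it.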
Uncertainty in BM I. In order to calculate the uncertainty intervals of the cases, hospitalizations, and inpatient deaths described in the burden models for each country and groups of countries, we took the uncertainty in each of the outcome models, calculated as described in section S1.5.2 of this supplement. For BM I, the uncertainty in the burden estimates arose completely from the outcome models, as we fixed the population of each country such that no uncertainty was derived from the size of the population at risk. We considered this a sensible choice as the populations for most countries are known with a high degree of accuracy compared to the uncertainty in RSV outcomes.
Uncertainty in BM II. For BM II, there is uncertainty from two domains: 1) the outcome models, for which the uncertainty estimation is described in section S1.5.2 of this supplement, and 2) the uncertainty in the estimates of RSV cases in the community from Shi and colleagues' country-level risk factor model. To match the 5,000 samples of our splines, we drew 5,000 samples from a log-normal distribution described by the log-mean of the incidence and the standard deviation on the log-scale that would result in the 95% confidence intervals that were reported. We applied the sample vector of cases from the risk-factor model to each sample of our splines as described in section S1.6.2 of this supplement, resulting in a sample vector for each of the outcomes of BM II. We could then take the median and 95% credible intervals of the sample vectors of BM II to report in the results.

S1.7 Characterizing the age profile of RSV burden (mean, median, and the peak age of each outcome)
For each iteration, we calculated the mean age of cases by taking a vector of the ages and weighting by the proportion of cases that occurs at each age. The median age per iteration was calculated as the age at which half of all cases would have already occurred (usually close to the mean age, except when there is a long right tail in the distribution of the cases). The peak age was calculated by taking the age at which incidence is highest in each iteration of the outcome models. For all measures (mean, median, and peak age) we also performed the analogous approach for hospitalizations and inpatient deaths. The 95% credible intervals for each of the mean, median, and peak ages were then calculated by taking each of these measures in each iteration and calculating the median and the 2.5th and 97.5th percentiles of each.
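The three summaries can be sketched for a single iteration as follows. The case curve is a hypothetical right-skewed shape on a 0.1-month grid, not a fitted spline, and the function name is ours.

```python
import math

def age_profile(ages, cases):
    """Mean, median, and peak age of a distribution of cases over an age grid."""
    total = sum(cases)
    # Mean: ages weighted by the share of cases at each age.
    mean_age = sum(a * c for a, c in zip(ages, cases)) / total
    # Median: first age at which half of all cases have accumulated.
    cumulative = 0.0
    median_age = ages[-1]
    for a, c in zip(ages, cases):
        cumulative += c
        if cumulative >= total / 2.0:
            median_age = a
            break
    # Peak: age with the largest number of cases.
    peak_age = ages[max(range(len(cases)), key=lambda i: cases[i])]
    return mean_age, median_age, peak_age

# Hypothetical right-skewed case curve on a 0.1-month grid (one iteration).
ages = [k / 10.0 for k in range(600)]
cases = [(a + 0.5) * math.exp(-a / 5.0) for a in ages]
mean_age, median_age, peak_age = age_profile(ages, cases)
```

With a long right tail, as in this example, the peak age falls below the median, which in turn falls below the mean. Repeating the calculation for each of the 5,000 iterations and taking percentiles of the results would give the credible intervals.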

S1.8 Severe and very severe disease
'Severe' and 'very severe' RSV disease are defined according to the WHO Integrated Management of Childhood Illness (IMCI) [14].
• RSV-associated severe ALRI: cough or difficulty in breathing with chest wall indrawing and laboratory-confirmed RSV; in children aged <2 months, increased RR (>60 breaths/min) OR chest wall indrawing and laboratory-confirmed RSV
• RSV-associated very severe ALRI: cough or difficulty in breathing with one danger sign (cyanosis, difficulty in breastfeeding or drinking, vomiting everything, convulsions, lethargy, or unconsciousness, head nodding)
• Hospitalised RSV-associated severe ALRI: hospitalized ALRI cases with hypoxemia (as defined below) and laboratory-confirmed RSV
• Hospitalised RSV-associated very severe ALRI: hospitalized ALRI with one danger sign (cyanosis, difficulty in breastfeeding or drinking, vomiting everything, convulsions, lethargy, or unconsciousness, head nodding) OR proxies for very severe disease: mechanical ventilation OR ICU admission.
To explore the window of protection against severe and very severe RSV disease, we devised a model that connects to the model of disease depicted in Fig 1. The model integrating severity is shown in Fig B of this supplement. As can be seen there, the connection between severe disease and death is not clear, and therefore we have not integrated severity into our main model. This is because severity is not always a precondition for hospitalization, and hospitalization is not necessarily required for severe cases; a physician's overall impression of the child plays a role, and in some contexts, wealth, access, and space also play a role in whether a severe case is hospitalized.
Another important note is that although very severe disease could be calculated conditional on the severe disease of community-based surveillance, severe and very severe disease was not related in the studies of hospital-based surveillance. Therefore, among studies of hospital-based surveillance, the very severe probability is calculated conditional on hospitalization, but not on (non-very) severe disease. The reason for this is that severe hospitalized disease includes hypoxemia, but very severe disease could include other 'danger signs' as defined by the WHO IMCI (see above definitions and Shi et al's supplement page 85 [15]).
For OM I, the basis of incidence calculations comes from community-based incidence studies:
All incidence = community-based incidence spline
All incidence (severe cases) = community-based incidence spline × probability of severe RSV spline
All incidence (very severe cases) = community-based incidence spline × probability of severe RSV spline × probability of very severe RSV spline
Hospitalization incidence = community-based incidence spline × hospitalization probability spline
Hospitalization incidence (severe cases) = community-based incidence spline × hospitalization probability spline × probability of severe RSV among the hospitalized spline
Hospitalization incidence (very severe cases) = community-based incidence spline × hospitalization probability spline × probability of very severe RSV among the hospitalized spline

Figure B: Relationship between cases, hospitalizations, and severity in children from birth to 59 months of age. The boxes show incidence and the ovals show the probability of progressing from a case in the community to a case in the hospital, or from a case in the hospital to a fatal case, and the probability that community or hospital cases are severe or very severe. It should be noted that the product of incidence and probability yields an incidence. The colored boxes show the splines (splines I-IV) that we estimated from data in the literature, and the black boxes show incidence derived as products of our splines. In OM I, we use the incidence of community-acquired RSV cases (spline I) and the probabilities of hospitalization (spline III) and we derive the incidence of hospitalizations. In OM II (see figure below) we use the incidence of RSV cases from hospital-based studies (spline II) and the probabilities of hospitalization to back-calculate the community-based incidence; the severe and very severe cases were then calculated in a manner analogous to that in OM I. From these spline models, shown in Fig B and C in S3 Text, we also estimated the peak, median, and mean age of infection, shown in Fig D in S3 Text.

S1.8.1 OM II: hospital-based studies as a basis for incidence
For OM II, the basis of incidence calculations comes from hospital-based incidence studies:
All incidence = hospital-based incidence spline / hospitalization probability spline
All incidence (severe cases) = (hospital-based incidence spline / hospitalization probability spline) × probability of severe RSV spline
All incidence (very severe cases) = (hospital-based incidence spline / hospitalization probability spline) × probability of severe RSV spline × probability of very severe RSV spline
Hospitalization incidence = hospital-based incidence spline
Hospitalization incidence (severe cases) = hospital-based incidence spline × probability of severe RSV among the hospitalized spline
Hospitalization incidence (very severe cases) = hospital-based incidence spline × probability of very severe RSV among the hospitalized spline

S1.9 Countries included, World Bank Income Group, population, and life tables
We gathered the income groups for 2020 from the World Bank [16], and the population and life table information from the UN Population Prospects via the wpp2019 R package (version 1.1-1) [13,17].

Countries that were in Li's risk factor model but are HIC and excluded from our analysis. The difference of 8.3M constitutes less than a 1.5% difference in population, so we do not believe it drives the differences in our results.

Table columns: ISO3, Country.

Countries that were LIC, LMIC, or UMIC but were not in Li's risk factor analysis. For comparability, we have excluded them from our analysis as well. However, this is a 5% difference in the population of interest and is largely concentrated in UMICs.

Table columns: ISO3, Country name, Income Group.

To make estimates comparable, these studies were also not included in our analysis.