Modelling the first wave of COVID-19 in India

  • Dhiraj Kumar Hazra,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliations The Institute of Mathematical Sciences, CIT Campus, Taramani, Chennai, INDIA, Homi Bhabha National Institute, BARC Training School Complex, Anushaktinagar, Mumbai, INDIA, INAF/OAS Bologna, Osservatorio di Astrofisica e Scienza dello Spazio, Area della ricerca CNR-INAF, Bologna, ITALY

  • Bhalchandra S. Pujari,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Department of Scientific Computing, Modeling and Simulation, Savitribai Phule Pune University, Ganeshkhind, Pune, INDIA

  • Snehal M. Shekatkar,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Department of Scientific Computing, Modeling and Simulation, Savitribai Phule Pune University, Ganeshkhind, Pune, INDIA

  • Farhina Mozaffer,

    Roles Data curation, Investigation, Methodology, Validation, Visualization, Writing – review & editing

    Affiliations The Institute of Mathematical Sciences, CIT Campus, Taramani, Chennai, INDIA, Homi Bhabha National Institute, BARC Training School Complex, Anushaktinagar, Mumbai, INDIA

  • Sitabhra Sinha,

    Roles Formal analysis, Investigation, Methodology, Validation, Writing – original draft, Writing – review & editing

    Affiliations The Institute of Mathematical Sciences, CIT Campus, Taramani, Chennai, INDIA, Homi Bhabha National Institute, BARC Training School Complex, Anushaktinagar, Mumbai, INDIA

  • Vishwesha Guttal,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Centre for Ecological Sciences, Indian Institute of Science, Bengaluru, INDIA

  • Pinaki Chaudhuri,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliations The Institute of Mathematical Sciences, CIT Campus, Taramani, Chennai, INDIA, Homi Bhabha National Institute, BARC Training School Complex, Anushaktinagar, Mumbai, INDIA

  • Gautam I. Menon

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliations The Institute of Mathematical Sciences, CIT Campus, Taramani, Chennai, INDIA, Homi Bhabha National Institute, BARC Training School Complex, Anushaktinagar, Mumbai, INDIA, Departments of Physics and Biology, Ashoka University, Rajiv Gandhi Education City, Sonepat, Haryana, INDIA


Estimating the burden of COVID-19 in India is difficult because the extent to which cases and deaths have been undercounted is hard to assess. Here, we use a 9-component, age-stratified, contact-structured epidemiological compartmental model, which we call the INDSCI-SIM model, to analyse the first wave of COVID-19 spread in India. We use INDSCI-SIM, together with Bayesian methods, to obtain optimal fits to daily reported cases and deaths across the span of the first wave of the Indian pandemic, over the period Jan 30, 2020 to Feb 15, 2021. We account for lock-downs and other non-pharmaceutical interventions (NPIs), an overall increase in testing as a function of time, the under-counting of cases and deaths, and a range of age-specific infection-fatality ratios. We first use our model to describe data from all individual districts of the state of Karnataka, benchmarking our calculations using data from serological surveys. We then extend this approach to aggregated data for Karnataka state. We model the progress of the pandemic across the cities of Delhi, Mumbai, Pune, Bengaluru and Chennai, and then for India as a whole. We estimate that deaths were undercounted by a factor between 2 and 5 across the span of the first wave, converging on 2.2 as a representative multiplier that accounts for the urban-rural gradient. We also estimate an overall under-counting of cases by a factor of between 20 and 25 towards the end of the first wave. Our estimates of the infection fatality ratio (IFR) are in the range 0.05–0.15%, broadly consistent with previous estimates but substantially lower than values that have been estimated for other low- and middle-income countries (LMICs). We find that approximately 35% of India had been infected overall by the end of the first wave, a result broadly consistent with those from serosurveys. These results contribute to the understanding of the long-term trajectory of COVID-19 in India.

Author summary

Making sense of publicly available epidemiological data for the COVID-19 pandemic in India presents multiple challenges, largely to do with the quality of the data. Here, we describe ways of addressing these challenges by studying the data using a well-parameterised, detailed compartmental model together with Bayesian methods, alongside information derived from pan-India serological surveys. We focus on the first wave of the Indian pandemic, across the interval Jan 30, 2020 to Feb 15, 2021. We estimate that deaths were under-counted by a factor between 2 and 5 across the span of the first wave and that cases were under-counted by a factor of between 20 and 25 towards its end. We estimate an infection fatality ratio (IFR) in the range 0.05–0.15%. We find that approximately 35% of India had been infected overall by the end of the first wave, a number that helps us better understand the context in which the second and later waves unfolded.


COVID-19, a disease of zoonotic origin whose causative agent is the beta coronavirus SARS-CoV-2, is believed to have first infected humans towards the latter part of November 2019, in or near the Chinese city of Wuhan [1]. It then spread, aided by international travel networks, around the world, with devastating epidemics in the USA, the UK, Europe and South America, even as cases in China declined [2]. With close to 440 million recorded cases and 6 million recorded deaths worldwide as of early March 2022, the COVID-19 pandemic is likely the most consequential epidemiological event of our lifetimes.

The first COVID-19 case in India was detected at the end of January, 2020 [3]. By March 25, 2020, the total number of Indian cases had increased to just above 600. At that point, the Indian government ordered a sequence of stringent country-wide lockdowns that were to last for 68 days (March 25–June 1, 2020) [4]. The lockdown was then relaxed in multiple phases. All through, reported cases of COVID-19 kept rising, although the pace of increase was arguably held in check by the stringency of the lockdown [5]. A peak of about 98,000 daily cases was reached in mid-September. Cases in India then declined steadily for the next four months, even as cases rose elsewhere in the world. By mid-January, it appeared as if India might have avoided the multiple waves of cases seen elsewhere. However, numbers at the level of individual cities and states presented a more complex story. Delhi and Mumbai, for example, saw multiple waves of cases [6]. The decline in cases at the all-India level persisted from the peak around mid-September to the middle of February, when they began to increase again [7]. This increase is linked to the emergence of more transmissible variants, specifically the B.1.617 and B.1.1.7 variants [8]. The pace of this increase was far steeper than the pace at which cases rose during the first wave. By roughly mid-May 2021, the second wave began to decline, from a peak of a little more than 410,000 daily cases; a later third wave, associated with the Omicron variant, spread across India in January 2022. A time-line of the first wave of COVID-19 in India is displayed in Fig 1. This figure shows the increase in cumulative cases, in tests and in deaths, highlighting the period of the national lockdown.

Fig 1. The timeline of the first wave of COVID-19 in India, beginning from the time of the first detected cases in India and ending on February 15, 2021, across different months during that period.

The period of India’s nationwide lockdown, between March 25 and May 31, 2020, is shown as a green box. Daily infections and deaths are shown as a filled light blue curve and an inverted light brown curve, respectively. Specific milestone values for cases, deaths and tests are also provided.

Epidemiological models are useful because they allow us to reason about parameters that control pandemic spread, and how interventions serve to modulate them, extrapolating from a trajectory of cases and deaths. Population-level epidemiological information, such as results from serological surveys, helps to further constrain these models. The earliest models for COVID-19 in India come from the work of Mandal et al. [9]. This compartmental model addressed two issues: the effects of imperfect airport screening measures, and the question of optimal strategies for mitigation once the disease had spread to the major Indian cities. Chatterjee et al. used a stochastic SEIR model to examine the effects of lockdowns on case counts [10]. Work from the group of Bhramar Mukherjee provided early insights into the progress of the pandemic and continues to do so [11]. Their work uses a Bayesian extension of the SIR model, the extended susceptible-infected-removed (eSIR) model, to project case-counts and deaths. Agent-based models have provided useful insights, at the level of full cities, into mitigation methods and the effectiveness of non-pharmaceutical interventions [12]. Related references which model COVID-19 in India are [10, 11, 13–22, 22–35]. These models are very largely compartmental models of varying degrees of complexity [36]. Almost all of them were aimed at understanding the initial stages of the evolution of the pandemic and the role of interventions. To our knowledge, with the exception of Ref. [11], none have described the full trajectory of the epidemic by integrating a model with data.

India is the second most populous nation in the world, with a population of close to 1.4 billion, so the consequences of an explosion of COVID-19 cases in India could easily dwarf its impact anywhere else [37]. What remains unclear is the extent to which the Indian population has so far been infected by COVID-19 and whether any proximity to herd immunity through infection might slow later waves of disease [38–40]. Large-scale serological surveys (serosurveys) from the Indian Council of Medical Research (ICMR), adjusted for test sensitivity, estimate the overall fraction of those with a prior COVID-19 infection to be about 22% by December 2020–January 2021 [41–43]. The first two ICMR serosurveys obtained a nation-wide seroprevalence of 0.73% in May-June 2020 and of 6.6% in August-September 2020. A strong gradient of seroprevalence between urban and rural India has been a consistent feature of these national serosurveys. These were done in just 70 districts of a total of about 740 in India, however, so they represent only a relatively small cross-section of the country. Other serosurveys have studied specific Indian cities, among them Bengaluru [40], Chennai [44, 45], Delhi [46], Pune [47] and Mumbai [48]. These city-based surveys estimate that fairly substantial fractions of the urban population should have been infected by the time that the first wave waned. In many cases, such as for Delhi, Mumbai and Pune, this fraction has been estimated at over 50%. However, because of the variety of test kits used, it has proved hard to compare results from different serosurveys. Also, these presumed high levels of prior seropositivity seem to have done little to offset the dramatic rise of cases seen at the onset of the second wave in urban India. This raises questions of potential errors in the serosurveys tied to the sensitivity and specificity of the kits used, as well as of the importance of reinfections [49].

This paper describes an analysis of the first wave of COVID-19 in India, from January 2020 to February 2021. To do so, we use an age-stratified, contact-structured compartmental model for COVID-19 spread in India. We call this model INDSCI-SIM. It is a 9-component model, adopted and modified from [50], with rates bench-marked to a wide range of available data. It projects numbers of both mild and severe cases and can be generalized to a variety of India-specific situations. Finally, it can be used to incorporate the modelled effects of a number of public health interventions, including lock-downs as well as progressive improvements in case identification. At the methodological level, our techniques can account for improvements, with time, in case identification as well as in treatment leading to lower overall mortality, within a fully Bayesian framework.

Our work describes the trajectory of COVID-19, including fits to both cases and deaths, across all districts in the southern Indian state of Karnataka, as well as in the capital of that state, Bengaluru. Also included are similar fits to cases and deaths in multiple Indian cities, including Mumbai, Delhi, Chennai and Pune, as well as to aggregate data for the whole country. Via this exercise, we estimate that approximately 35% of India was infected at the time the second wave struck, roughly consistent with serosurvey results. We suggest that cases have been under-counted by a factor ranging from about 90 at the onset of the Indian epidemic to about 20 at the end of the first wave. We estimate that a multiplicative factor of about 2.2 between counted and actual deaths might be a reasonable estimate across India for the first wave of the pandemic, although even this estimate relies on a number of approximations. Finally, our results are consistent with the observation that the trajectory of the disease across India has been inhomogeneous, with complex spatio-temporal behaviour at the level of districts and states summing to give smoother results for the country as a whole.

1 Materials and methods

The epidemiological compartmental model represented by INDSCI-SIM is shown in Fig 2. INDSCI-SIM is based on a model introduced in Ref. [50], which expands the classical SIR and SEIR frameworks with compartments that account for an asymptomatic infectious state [50–52]. The model also accounts for variations in the severity of disease across the infected class [50]. There is a compartment for hospitalized cases, as well as compartments that count deaths and recoveries. The model parametrizes the transmission of infection from the infected classes to the susceptible class, also allowing this to depend on time. This is an indirect, yet useful, way of describing increased stringency as well as relaxations in non-pharmaceutical interventions. These include direct effects such as increased testing, the imposition of mask-wearing and the effects of isolation and quarantines. They also include, inter alia, indirect effects, such as increased public awareness and related modification of behaviour [53].

Fig 2. Schematic diagram of the compartmental model used in this analysis, adopted from [50].

The dotted lines indicate the force of infection from the infected compartments on the susceptible population. Transitions between compartments, denoted via solid lines and an arrow, are defined as in Eq 1 which contains both bare rates as well as information as to how flow is to be divided between compartments. The number of persons transiting from one compartment to the other in each day depends on the product of the transition rates and the branching fractions.

1.1 Disease progression

SARS-CoV-2 infection manifests as an acute respiratory infection, progressing to respiratory failure in a small number of patients [54–56]. It can result in a range of clinical manifestations, from asymptomatic or mild infection to severe, requiring hospitalization [57]. Among patients who are symptomatic, the median incubation period is approximately 4 to 5 days [58]. About 97% have symptoms within 11 days after infection [59]. These can further be categorized as mild, severe and critical [60]. Patients who are hospitalized, the severe category, can progress to severe pneumonia and acute respiratory distress syndrome (ARDS). A fraction of these patients will require ventilation and an even smaller fraction may die [61]. Older patients experience greater clinical severity of COVID-19, which we account for via age-stratified parameters [61, 62]. Co-morbidities such as cardiovascular disease, diabetes and obesity are common underlying conditions associated with worse clinical outcomes and increased disease severity [62–64]. These can be incorporated through a composite risk score affecting branching rates between mild and severe disease states, although we do not do so here. Males may experience more severe disease than females, and genetic variations, including the ABO blood type, have been implicated in clinical outcomes for patients with COVID-19. Our model ignores these effects [65, 66]. Around 40–75% of infections may be asymptomatic, a fairly broad range. We note that numbers for India suggest a larger fraction of asymptomatic cases than reported elsewhere [67, 68].

1.2 Transmission dynamics

The compartmental structure we consider contains susceptible (S), exposed (E), asymptomatic infectious (Ia), pre-symptomatic infectious (Ip), mildly symptomatic infectious (Im), severely symptomatic infectious (Is), hospitalized (H), dead (D) and recovered (R) compartments. Transitions between these model compartments are shown in Fig 2. The dotted lines indicate the force of infection from the infected compartments on the susceptible population. Frequency-dependent transmission is assumed. As appropriate for a fast-spreading infection, we ignore the effects of demography [51].

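The model equations, for unstructured compartments, are

\[
\begin{aligned}
\frac{dS}{dt} &= -\beta \frac{S}{N}\left(\epsilon_a I_a + \epsilon_p I_p + \epsilon_m I_m + \epsilon_s I_s\right),\\
\frac{dE}{dt} &= \beta \frac{S}{N}\left(\epsilon_a I_a + \epsilon_p I_p + \epsilon_m I_m + \epsilon_s I_s\right) - \gamma E,\\
\frac{dI_a}{dt} &= \alpha \gamma E - \lambda_a I_a,\\
\frac{dI_p}{dt} &= (1-\alpha)\gamma E - \lambda_p I_p,\\
\frac{dI_m}{dt} &= \mu \lambda_p I_p - \lambda_m I_m,\\
\frac{dI_s}{dt} &= (1-\mu)\lambda_p I_p - \lambda_s I_s,\\
\frac{dH}{dt} &= \lambda_s I_s - \rho H,\\
\frac{dR}{dt} &= \lambda_a I_a + \lambda_m I_m + (1-\delta)\rho H,\\
\frac{dD}{dt} &= \delta \rho H.
\end{aligned}
\tag{1}
\]

The left-hand sides of these equations denote first-order time derivatives. The population size is N. Infectious individuals in any associated compartment can infect the susceptible population regardless of symptoms and severity with a fixed transmission rate β. However, this quantity is modulated by the relative intensity of contacts between susceptible and infectious individuals, whose effect is simply specified here through factors of ϵ. These can, in principle, vary with time and can also be chosen to vary between compartments.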

Each infected individual enters an ‘exposed’ compartment (E), spending an average latent period 1/γ days before becoming infectious [69]. A fraction α of this infectious population remains asymptomatic (Ia) until recovery. Asymptomatic individuals are assumed to be infectious for an average period of 1/λa days [70, 71]. Those in the remaining fraction, of (1 − α), enter a pre-symptomatic state (Ip) where clinical symptoms are not exhibited for a relatively short average period of 1/λp [70]. Individuals in this pre-symptomatic compartment go on to developing mild or severe symptoms [72]. All rates applicable to the model are provided in Table 1.

Table 1. Age dependent branching ratios between compartments.

Our values for αi and μi are those consolidated by the COVASIM program, Refs. [76] and [77]. We use IFRs for LMICs taken from fits in Ref. [78], where the following formula is derived: log10(IFR) = −3.27 + 0.0524 * age. We modify only the δi for the 80+ age group, to account for the leveling off of mortality in older age groups described by Ref. [24]. The IFR for each age group can be obtained as IFRi = (1 − αi)(1 − μi)δi. We use the estimated δi values as our initial IFRs. When we allow them to vary as part of our minimization strategy to find the effective IFRs, we multiply all δi by a single smooth time-varying factor, optimizing this against data.
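The two relations quoted in this note can be evaluated directly. The following is a minimal sketch, not the code used in the analysis. It assumes that the fitted formula returns the IFR in percent (it exceeds 1 for ages above roughly 62, so it cannot be a fraction), that each age bracket is represented by its midpoint, and that 85 stands in for the open-ended 80+ bracket. The function names and the example values of α and μ are hypothetical.

```python
def ifr_percent(age):
    """IFR from the quoted fit, log10(IFR) = -3.27 + 0.0524 * age.

    Assumption: the result is in percent, not a fraction.
    """
    return 10 ** (-3.27 + 0.0524 * age)


def delta_from_ifr(ifr_fraction, alpha, mu):
    """Invert IFR_i = (1 - alpha_i) * (1 - mu_i) * delta_i for delta_i."""
    return ifr_fraction / ((1.0 - alpha) * (1.0 - mu))


# Hypothetical bracket midpoints; 85 for the 80+ bracket is an assumption.
MIDPOINTS = [5, 15, 25, 35, 45, 55, 65, 75, 85]

if __name__ == "__main__":
    for age in MIDPOINTS:
        ifr = ifr_percent(age) / 100.0          # convert percent to fraction
        # alpha = 0.5 and mu = 0.9 are placeholders, not Table 1 values.
        print(age, ifr, delta_from_ifr(ifr, alpha=0.5, mu=0.9))
```

Because δi is the fraction of hospitalized cases that die, values above 1 returned by `delta_from_ifr` would signal that the chosen αi and μi are incompatible with the target IFR.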

A fraction μ of symptomatic cases is assumed to develop mild symptoms (Im), while the remaining fraction (1 − μ) of cases are transferred to the severe class (Is). Infectious cases with mild symptoms recover without hospitalization after 1/λm days. Severe cases require hospitalization after an average of 1/λs days. From the hospitalized population (H), we assume that a proportion (1 − δ) recovers successfully (R), after spending an average duration of hospitalization 1/ρ days. The remaining fraction, δ, of the hospitalised, will die. The numerical values of these parameters are specified in Table 1 [73, 74].

Of these parameters, the infectivity parameter β is particularly central. It determines the effective reproduction ratio as the epidemic proceeds. The force of infection arising from asymptomatic cases alone is assumed to be lower in comparison to that arising from the pre-symptomatic, mildly symptomatic and severely symptomatic cases.
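The flows described in this section can be integrated numerically. The sketch below is illustrative only: it encodes the transitions between the nine compartments as described in the text, uses a hand-written fourth-order Runge-Kutta step so that it needs no external libraries, and takes placeholder parameter values that are assumptions rather than the values of Tables 1 and 2 (only the relative infectivity of 2/3 for asymptomatic cases is taken from the text).

```python
def derivatives(state, p):
    """Right-hand sides for (S, E, Ia, Ip, Im, Is, H, R, D)."""
    S, E, Ia, Ip, Im, Is, H, R, D = state
    # Frequency-dependent force of infection, modulated by the epsilon factors.
    force = p["beta"] * (p["eps_a"] * Ia + p["eps_p"] * Ip
                         + p["eps_m"] * Im + p["eps_s"] * Is) / p["N"]
    new_infections = force * S
    return [
        -new_infections,
        new_infections - p["gamma"] * E,
        p["alpha"] * p["gamma"] * E - p["lam_a"] * Ia,
        (1 - p["alpha"]) * p["gamma"] * E - p["lam_p"] * Ip,
        p["mu"] * p["lam_p"] * Ip - p["lam_m"] * Im,
        (1 - p["mu"]) * p["lam_p"] * Ip - p["lam_s"] * Is,
        p["lam_s"] * Is - p["rho"] * H,
        p["lam_a"] * Ia + p["lam_m"] * Im + (1 - p["delta"]) * p["rho"] * H,
        p["delta"] * p["rho"] * H,
    ]


def rk4_step(state, p, dt):
    """Advance the state by one classical Runge-Kutta step of size dt (days)."""
    def advance(k, h):
        return [s + h * ki for s, ki in zip(state, k)]
    k1 = derivatives(state, p)
    k2 = derivatives(advance(k1, dt / 2), p)
    k3 = derivatives(advance(k2, dt / 2), p)
    k4 = derivatives(advance(k3, dt), p)
    return [s + dt * (a + 2 * b + 2 * c + d) / 6
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]


def simulate(p, e0, days, dt=0.1):
    """Start with e0 exposed individuals and everyone else susceptible."""
    state = [p["N"] - e0, e0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    for _ in range(int(round(days / dt))):
        state = rk4_step(state, p, dt)
    return state


# Placeholder values (assumptions), rates in units of 1/day.
PARAMS = {"N": 1.0e6, "beta": 0.4, "eps_a": 2 / 3, "eps_p": 1.0, "eps_m": 1.0,
          "eps_s": 1.0, "gamma": 1 / 3, "alpha": 0.6, "lam_a": 1 / 7,
          "lam_p": 1 / 2, "mu": 0.95, "lam_m": 1 / 7, "lam_s": 1 / 5,
          "rho": 1 / 14, "delta": 0.1}

final = simulate(PARAMS, e0=10.0, days=200)
```

Since demography is ignored, the nine compartments should sum to N at all times, which provides a simple check on any implementation.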

1.2.1 Age-structured model.

Incorporating age structuring, together with contact matrices describing the nature of social contacts, is essential to the simulations, as the fractions of the infected population distributed in different compartments, as well as the IFRs, depend strongly on age. The IFRs, in particular, rise almost exponentially with age. NPIs such as a lockdown (or partial lockdown) affect different age groups differently, as different institutions, such as schools, workplaces and markets, are largely populated by specific age groups. The societal structures peculiar to India, with large multi-generational families living under the same roof, suggest that incorporating structured contacts between age-groups should provide a better description of disease spread.

We incorporate age-structuring, dividing each compartment into sub-compartments representing the age-intervals 0–9, 10–19, 20–29, 30–39, 40–49, 50–59, 60–69, 70–79 and 80+. We denote the sub-compartment by adding the subscript i, for each age-bracket, to each of the compartment labels, thus obtaining {Si, Ei, Ia,i, Ip,i, Im,i, Is,i, Hi, Di and Ri}. We define βij as the bare infectivity term coupling age-brackets i and j, although we will assume βij = β here. The quantities Cij define the intensity of contacts between these age-brackets. We assume that, during the lockdown, only contacts at home are effective, since schools and workplaces are closed. We also assume that contacts in public transport and other places are negligible. Since, in India, schools largely remained closed even after the lockdown ended and workplace crowding had reduced substantially, we assume that even without any lockdown, work contacts are only 50% effective. These contacts are obtained from those compiled by Ref. [75]. The appropriate contact matrices are discussed in Fig A in S1 Text.

To incorporate time-dependence in contacts, we generalize the four ϵ terms, ϵa, ϵp, ϵm and ϵs, making their magnitude time-dependent. This allows for varying implementation of testing, quarantining and isolation rules, lock-downs and other non-pharmaceutical interventions.

Our generalized equations, for Na age-brackets, are then (2)

The values used here are listed in Table 1.
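The way contact matrices enter the age-structured force of infection can be sketched as follows. This is a hedged illustration, not the paper's implementation: it assumes that βij = β, that the infectious prevalence in bracket j is normalized by that bracket's population Nj, and that the open-period contact matrix is the sum of home contacts, half of the work contacts and the remaining "other" contacts, with school contacts excluded. The treatment of "other" contacts outside the lockdown and all function names are assumptions.

```python
def effective_contacts(home, work, other, lockdown, work_weight=0.5):
    """Combine setting-specific contact matrices (nested lists).

    Lockdown: home contacts only. Otherwise (assumption): home + 50% of work
    + other, with schools treated as closed throughout.
    """
    n = len(home)
    if lockdown:
        return [row[:] for row in home]
    return [[home[i][j] + work_weight * work[i][j] + other[i][j]
             for j in range(n)] for i in range(n)]


def force_of_infection(beta, C, eps, infected, N_age):
    """Per-bracket force of infection.

    lambda_i = beta * sum_j C[i][j] * (sum_c eps[c] * I_c[j]) / N_age[j]
    where c runs over the infectious compartments (a, p, m, s).
    """
    n = len(N_age)
    weighted = [sum(eps[c] * infected[c][j] for c in eps) / N_age[j]
                for j in range(n)]
    return [beta * sum(C[i][j] * weighted[j] for j in range(n))
            for i in range(n)]
```

The rate of new exposures in bracket i is then `force_of_infection(...)[i] * S[i]`, replacing the single term β S (…)/N of the unstructured model.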

1.3 Time-dependence of effective contacts to model non-pharmaceutical interventions

The effective infectivity in the INDSCI-SIM model is a product of three terms. The first is a bare infectivity parameter (βij = β), and the second is the term involving the contacts, the Cij’s. The third is the temporal modulation, the factors of ϵ. While it is only variation in the overall product that is of significance, this decomposition allows us to conceptualize interventions in a more targeted way: the sudden changes in infectivity imposed by a lockdown can be associated with abrupt changes in β, whereas slow and secular improvements in masking etc. can be associated with the smoothly varying factors of ϵ. This product is affected by mask-wearing, hand-washing, voluntary self-isolation or self-quarantining and the maintenance of social-distancing. It is also influenced by global non-pharmaceutical interventions such as closures of schools, malls and cinemas, large-scale lock-downs, restrictions on movement and autonomous modifications of social behaviour.

We first use a “global” value of β, one that differs in the lockdown period and in the open period. We then model compliance with restrictions by modulating this with a time-dependent factor, using a hyperbolic tangent function of time, as described below. This function comes with an associated time-scale which is an output of our minimization procedure. A quantification of mask-wearing India-wide comes from studies incorporated into the IHME model for India that are standardized using survey results from the University of Maryland Social Data Science Center, from the Kaiser Family Foundation and the YouGov COVID-19 Behaviour Tracker survey [79]. This data motivates the use of such an interpolating function, since it shows an initial lag period, a sharp rise as public awareness increases and saturation at about 70% reflecting social acceptance of mask-wearing.

We also allow for a decay of the relative intensity of contacts between susceptible and infected individuals by assuming an exponential decay of the ϵi terms, viz.,

\[
\epsilon_i(t) = \epsilon_i(0)\, e^{-t/\tau_i}. \tag{3}
\]

Here, τi represents the characteristic time-scale describing the increased effectiveness of non-pharmaceutical interventions. These could include restrictions on crowding, improvements in screening procedures as well as increased testing. The net effect of these could be chosen to be different for each of the infectious categories. Although the force of infection ultimately involves a product of both the β(t) and the ϵ(t) terms, splitting them out in this way provides somewhat more control over the specifics of the non-pharmaceutical interventions, as well as the ability to account for new variants that might affect the value of β but not the ϵ’s.
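The split between abrupt changes in β and smooth changes in ϵ can be illustrated with a short sketch. It assumes a β that takes one value inside the lockdown window and another outside it, and an ϵ that decays exponentially from its initial value with time-scale τ. The lockdown dates, in days from the start of the simulation, and all numerical values are placeholders, and the function names are hypothetical.

```python
import math


def beta_of_t(t, lockdown_start, lockdown_end, beta_open, beta_lockdown):
    """Abrupt change: one beta inside the lockdown window, another outside."""
    if lockdown_start <= t < lockdown_end:
        return beta_lockdown
    return beta_open


def epsilon_of_t(t, eps0, tau):
    """Smooth change: exponential decay with characteristic time-scale tau."""
    return eps0 * math.exp(-t / tau)


def effective_infectivity(t, eps0=1.0, tau=150.0, lockdown_start=55.0,
                          lockdown_end=123.0, beta_open=0.4, beta_lockdown=0.2):
    """Product of the two modulations; all default values are assumptions."""
    return (beta_of_t(t, lockdown_start, lockdown_end, beta_open, beta_lockdown)
            * epsilon_of_t(t, eps0, tau))
```

Only the product is constrained by the data, so a change in β can be traded against a change in ϵ unless the two are given distinct functional forms, as here.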

1.4 Estimating R0

Given the central equations defining our model, we compute the dominant eigenvalue of the next generation matrix [80] obtained from Eq 1. This is the basic reproductive ratio, R0. This result, derived using a next-generation method outlined in SI, yields

\[
R_0 = \beta\left[\frac{\alpha\,\epsilon_a}{\lambda_a} + (1-\alpha)\left(\frac{\epsilon_p}{\lambda_p} + \frac{\mu\,\epsilon_m}{\lambda_m} + \frac{(1-\mu)\,\epsilon_s}{\lambda_s}\right)\right]. \tag{4}
\]

In this equation, β is the infectivity parameter. The reciprocals of the λ’s capture the timescales associated with different stages of the disease progression. Together, they determine the serial interval associated with the disease. Note that the relative contact intensities, the ϵ’s, are simply constants here; likewise, α and μ, which represent proportions, are dimensionless constants. See Table 2 for the description and value of each of these parameters.
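For the unstructured model, R0 can be read as β times the expected infectivity-weighted time that a newly exposed individual spends in each infectious compartment, taking S ≈ N early in the epidemic. A minimal sketch of this bookkeeping follows; it assumes, as described in Section 1.2, that hospitalized individuals do not transmit, and the function name and example values are hypothetical.

```python
def basic_reproduction_ratio(beta, alpha, mu, eps, lam):
    """R0 as a sum over the routes a newly exposed individual can take.

    eps and lam are dicts keyed by "a", "p", "m", "s" holding the relative
    contact intensities and the exit rates (1/day) of each infectious class.
    """
    asymptomatic = alpha * eps["a"] / lam["a"]
    # Every symptomatic case passes through Ip, then branches to Im or Is.
    symptomatic = (1 - alpha) * (eps["p"] / lam["p"]
                                 + mu * eps["m"] / lam["m"]
                                 + (1 - mu) * eps["s"] / lam["s"])
    return beta * (asymptomatic + symptomatic)


# Placeholder values (assumptions, not those of Table 2).
example = basic_reproduction_ratio(
    beta=0.4, alpha=0.6, mu=0.95,
    eps={"a": 2 / 3, "p": 1.0, "m": 1.0, "s": 1.0},
    lam={"a": 1 / 7, "p": 1 / 2, "m": 1 / 7, "s": 1 / 5},
)
```

The latent rate γ does not appear, because every exposed individual eventually becomes infectious when demography is ignored.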

Table 2. The transition rates between compartments and the efficiency parameters.

These parameters are fixed during the analysis and their values are taken from [50, 84].

1.5 Estimating R(t)

The time evolution of R(t) is obtained from Eq 2. We obtain the next generation matrix, incorporating age stratification and contact matrices. Lockdown changes the contact matrices; we take this into account in our estimation of R(t). The computation of the next generation matrix, and thence R(t) corresponding to Eq 2, is discussed in Section 1 in S1 Text. Comparing model predictions with the data generates samples of parameters. From these samples we compute the bounds on R(t) for each region of interest.

1.6 Computing effective R(t) from the data

A standard way to describe infection spread is to calculate the effective reproduction ratio R(t) at a given time t. To estimate R(t) we use a Bayesian approach developed by Bettencourt and Ribeiro [81], later modified by K. Systrom [82]. In this approach, given k new cases, the probability distribution of R(t) on a certain day t is:

\[
P(R_t \mid k) = \frac{P(k \mid R_t)\, P(R_t)}{P(k)},
\]

where P(k|Rt) is the likelihood of seeing k new cases given R(t) (this is assumed to follow a Poisson distribution), P(Rt) is the prior, and P(k) is the probability of seeing k cases. The method of deciding the appropriate priors is described in Refs. [81, 82].
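A single Bayesian update of this kind can be sketched on a grid of R values. The sketch assumes the relation used in the Bettencourt and Ribeiro approach, in which the expected number of new cases is the previous day's count multiplied by exp(γ(R − 1)), with γ the reciprocal of the serial interval. The value γ = 1/7 per day, the flat prior and the grid are assumptions made for illustration, and the function names are hypothetical.

```python
import math


def poisson_logpmf(k, lam):
    """Log of the Poisson probability of observing k events with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def rt_posterior(prev_cases, new_cases, r_grid, prior, gamma=1 / 7):
    """Posterior over R_t on a grid, from one day's new-case count.

    Likelihood: new_cases ~ Poisson(prev_cases * exp(gamma * (R - 1))).
    Requires prev_cases > 0 and strictly positive prior weights.
    """
    log_post = []
    for r, p in zip(r_grid, prior):
        expected = prev_cases * math.exp(gamma * (r - 1))
        log_post.append(poisson_logpmf(new_cases, expected) + math.log(p))
    shift = max(log_post)                       # avoid underflow
    weights = [math.exp(lp - shift) for lp in log_post]
    total = sum(weights)                        # plays the role of P(k)
    return [w / total for w in weights]


R_GRID = [i * 0.01 for i in range(0, 501)]      # R from 0 to 5
FLAT_PRIOR = [1.0] * len(R_GRID)
posterior = rt_posterior(1000, 1074, R_GRID, FLAT_PRIOR)
r_mode = R_GRID[posterior.index(max(posterior))]
```

In a sequential estimate, the posterior from one day, suitably broadened, serves as the prior for the next; that step is omitted here.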

1.7 Fixed parameters

Several parameters determine the progress of the pandemic in the INDSCI-SIM model. These include parameters which remain fixed, such as the rate of transitions between the compartments, described above, and a choice for the initial IFR. Other parameters are allowed to vary so as to model the effects of non-pharmaceutical interventions or optimized to fit available data, as discussed below. We assume that asymptomatic patients are less likely to pass on infection to susceptibles by a factor of 2/3, consistent with data from Ref. [67, 83]. The choices of our fixed parameters are provided in Table 2.

1.8 Variable parameters

We maximize a likelihood function to obtain optimal fits to the data, sampling a broad prior distribution in parameter space to assess uncertainties in our description of the disease progression. Our simulations are initiated around 2 weeks prior to the first death being reported. Since the time evolution of any similar compartmental model with constant parameters and no births or deaths would yield only a single wave, we use an adaptive parametrization to address multiple waves of COVID-19 cases. The parameters that enter our description are described below.

We choose a generic functional form for all our parametrizations that involve parameter changes with time. This form interpolates between a high and a low value, and is a function of a characteristic time-scale. We choose a hyperbolic tangent function for concreteness, and because it is easy to specify, but any other suitable function could be used in its place.

  1. Initial exposed (Einitial): The exposed people on the first day of simulation, distributed between age groups according to the population fraction.
  2. IFR-associated parameters (ΔIFR, δi(tinitial), δi(tfinal) and DIFR): A number of studies [85–87] suggest that the IFR has decreased over time. This is attributed to improved clinical handling, new pharmacological treatments such as corticosteroids, non-pharmacological treatments such as proning, earlier interventions and, finally, the potential prophylactic consequences of lower viral load exposure from masking [85]. For concreteness, we choose a specific functional form to describe this decrease, accounting for it by assuming a smooth transition governed by a hyperbolic tangent function. We model the variation of δi as: \[ \delta_i(t) = \delta_i(t_{\rm initial}) + \frac{\delta_i(t_{\rm final}) - \delta_i(t_{\rm initial})}{2}\left[1 + \tanh\left(\frac{t - D_{\rm IFR}}{\Delta_{\rm IFR}}\right)\right] \tag{5} \] where DIFR is the transition time-point (here, tied to the Hospitalized Fatality Ratio) and ΔIFR is the characteristic time width of the transition. This introduces four parameters: δi(tinitial), δi(tfinal), the timescale ΔIFR and the transition point itself, DIFR. Although this formulation is general, allowing for age-bracket dependent variation of the IFRs with time, we choose to allow all age-brackets to have the same functional behaviour, rendering the index i redundant. We take δi(tfinal) to be a sixth of δi(tinitial): given an initial IFR of 0.3%, this potentially allows the IFR to decay across the range 0.3% to 0.05%, over a timescale defined by ΔIFR, with the cross-over point between the upper and lower limit defined by DIFR.
  3. Bias (b) and bias-variation (Δb): A large fraction of the population remains asymptomatic to COVID-19 infections. In addition, some symptomatic patients may also opt not to be tested. Since testing in India has been limited throughout the first wave, a substantial percentage of actual infections can be expected to have remained undetected, with detected infections always representing an under-counting of the true numbers of infected [88]. We use a bias parameter b to accommodate this scaling relation (actual daily infection = b(t) × reported infection), introducing another parameter to estimate. We parametrise the bias factor as: \[ b(t) = b(t_{\rm initial}) + \frac{b(t_{\rm final}) - b(t_{\rm initial})}{2}\left[1 + \tanh\left(\frac{t - D_{\rm bias}}{\Delta_b}\right)\right] \tag{6} \] where Δb represents the bias variation timescale. We set the transition time Dbias to 3 months from the beginning of March, but checked that the results depend minimally on Dbias if the priors on b(tinitial) and Δb are wide enough. This introduces three parameters: b(tinitial), b(tfinal) and Δb. We fix b(tfinal) so that it will reach 1 asymptotically. Our calculation obtains the number of people infected on each day. We then fix a multiplier that relates the actual numbers of infected to those detected each day. This number, for those expected to be detected on each day, is then compared with the numbers of those reported infected.
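The generic interpolating form used for both the IFR and the bias can be sketched as a single helper. This is a minimal illustration that assumes a hyperbolic tangent centred on the transition point and scaled by the characteristic width, so that the function moves from its initial to its final value. The exact normalization used in the analysis may differ, and the function names and example numbers are hypothetical.

```python
import math


def tanh_transition(t, initial, final, midpoint, width):
    """Interpolate smoothly from `initial` (t << midpoint) to `final`."""
    return initial + 0.5 * (final - initial) * (
        1.0 + math.tanh((t - midpoint) / width))


def expected_reported(model_infections, t, b_initial, b_final, d_bias, delta_b):
    """Apply actual = b(t) * reported, i.e. reported = actual / b(t)."""
    b_t = tanh_transition(t, b_initial, b_final, d_bias, delta_b)
    return model_infections / b_t


# Example: an IFR falling from 0.3% to a sixth of that value, 0.05%.
# The midpoint (day 120) and width (30 days) are placeholders.
ifr_early = tanh_transition(0.0, 0.3, 0.05, midpoint=120.0, width=30.0)
ifr_mid = tanh_transition(120.0, 0.3, 0.05, midpoint=120.0, width=30.0)
ifr_late = tanh_transition(400.0, 0.3, 0.05, midpoint=120.0, width=30.0)
```

At the midpoint the function takes the average of its two limits, which is why the transition point and the width can be varied independently in the fit.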

If the reported numbers of daily infected cases, or the daily numbers of deaths, show multiple peaks, a single parametrization will not capture this behaviour. The sources of this behaviour are complex functions of government policy constraining contacts between people, of the sum of individual actions taken to prevent infection, and of the entry of new, potentially more infectious, variants. Our analysis must be flexible enough to account for these. Set against this is the requirement that we should not over-determine the model by allowing for a large number of such changes.

We allow our parametrization to vary in the following way. We assume that the time over which we choose to model the data is divided into N segments, with each segment denoted by i. This defines N − 1 break points or nodes. These node points are the same for both new cases and deaths, with the lag between these already taken into account in the model via the various timescales of progression of the disease. The infectivity parameter β and the timescale τ can change in these segments. For simplicity, τ is assumed to be the same between segments. We thus parametrize the model with the following:

  1. βi: Infectivity within the i’th window.
  2. τi: Timescale within the i’th window.
  3. Nodei: The position of the break points (dates) across which βi and τi change. We decide the required number of break points by comparing the Bayesian evidence for models with different numbers of break points.

Comparisons of the Bayesian evidence indicate that different τi’s defined in each segment are not favored. Thus, we work with N β’s, a single τ and N − 1 Node parameters. For N windows, then, we have N + 1 + (N − 1) parameters specific to the adaptive parametrization, 4 parameters (Einitial, DIFR, ΔIFR, b, Δb) common to both parametrizations and 2 noise error parameters (for modelling the fluctuations in the infected and death data discussed in subsubsection 1.9.1).
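The windowed parametrization can be sketched as a piecewise-constant lookup. This is a minimal illustration, assuming that nodes are expressed as day indices and that a day equal to a node belongs to the window starting there; the function names are hypothetical and do not come from the authors' code.

```python
import numpy as np

def beta_of_t(t, betas, nodes):
    """Piecewise-constant infectivity: betas[i] applies in the i'th window.

    `nodes` holds the N - 1 break points (in days), sorted in increasing
    order; `betas` holds the N window values.
    """
    betas = np.asarray(betas, dtype=float)
    nodes = np.asarray(nodes, dtype=float)
    if betas.size != nodes.size + 1:
        raise ValueError("need exactly one more beta than nodes")
    # side="right": a day equal to a node falls in the window that starts there
    window = np.searchsorted(nodes, t, side="right")
    return betas[window]

def n_adaptive_parameters(n_windows):
    """N betas + a single tau + (N - 1) nodes."""
    return n_windows + 1 + (n_windows - 1)
```

For three windows, `n_adaptive_parameters(3)` counts three β values, one τ and two nodes.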

1.9 Bayesian modeling

Our model is a mechanistic model that should capture the significant aspects of the dynamics of COVID-19 disease spread. It has a number of parameters which must be optimized. We achieve this optimization through a simultaneous fit to the data on deaths and on detected cases. The latter is related to the “true” number of cases through the time-dependent bias factor.

We address this optimization through Bayesian methods. In such methods, probability distributions over parameters, and not point estimates, are obtained [89, 90]. Bayesian models require a prior distribution over parameters to be specified as well as a likelihood of observing the data under a specific assignment of parameters. Given parameters and initial conditions, a compartmental model defines a unique solution for each of the compartments.

Such methods begin with prior estimates for the parameters entering the model, usually chosen so that they are at least partially constrained by prior knowledge [89]. A likelihood function, chosen appropriately, estimates the probability with which the observed data can be accounted for by a specific parameter set. This leads to a posterior distribution over the parameter set. Credible (confidence) intervals can be derived from such calculations.

We now address the role of initial conditions and sensitivity to parameter values in the context of our Bayesian modelling for parameter estimations. We initiate our model simulations when the infected population is small enough that the susceptible population is not substantially depleted. With this starting point, the dependence on this initial condition is negligible. All other important parameters are estimated using Bayesian methods which incorporate uncertainties in their values. Since we sample over an ensemble of initial values, the dependence on initial conditions is also expected to be minimal. Furthermore, Bayesian modelling naturally allows for parameter variation (e.g. for parameters listed in Table 3), finding an optimum range of posterior distributions constrained by the observed data.

Table 3. Priors on model parameters used in our analysis to account for time varying parameters.

1.9.1 Likelihood model.

We compute the likelihood after a logarithmic transformation, optimizing the product of the likelihoods from the infected and death data (equivalently, the sum of their logarithms). We choose a form for the likelihood function that accounts for two sources of error [91]. We assume no correlation between the variances of infected and death data and treat them independently in the joint fit. We use a Gaussian log likelihood defined as: (7) where the variance in the data is the sum of a measurement error σME and an intrinsic scatter σ. The likelihood compares a suitable function (see below) of the computed value of infected/death cases with the reported infected/death data on day t scaled with the same function. The data must first be transformed so that they are normally distributed. By log scaling the data we find that the distribution is close to normal around the mean, apart from minor outliers. Thus the two quantities compared are the logarithms of the computed and reported values respectively.

In Eq 7, the log-likelihoods are computed for both infected cases and deaths and are then added. The likelihood has 2 error parameters, corresponding to the scatter in the infection and death data. The parameter σ appearing in the intrinsic scatter term encodes both parameters σ1 and σ2, corresponding to scatters in detected infection and death reports on a log scale. They appear in the two likelihood terms defined in Eq 7. Note that the correlation in the data at different times can, in principle, be modelled through an error covariance matrix. We ignore such correlations here. In some cases reporting errors lead to negative death numbers. We reject such data points unless they are large (> 10), in which case we regularize the data by the absorption of outliers.
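A generic Gaussian log-likelihood on log-scaled counts, of the kind described above, can be sketched as follows. This is not a transcription of Eq 7: the function names are hypothetical, the measurement error and intrinsic scatter are assumed here to add in quadrature, and non-positive entries are simply skipped because their logarithm is undefined.

```python
import numpy as np

def gaussian_loglike(model, data, sigma_me, sigma_int):
    """Gaussian log-likelihood of log-scaled daily counts.

    Assumption: variance = sigma_me**2 + sigma_int**2 (measurement error
    plus intrinsic scatter), the same for every day.
    """
    model = np.asarray(model, dtype=float)
    data = np.asarray(data, dtype=float)
    keep = (data > 0) & (model > 0)  # log is undefined otherwise
    resid = np.log(model[keep]) - np.log(data[keep])
    var = sigma_me**2 + sigma_int**2
    return -0.5 * np.sum(resid**2 / var + np.log(2.0 * np.pi * var))

def joint_loglike(model_cases, cases, model_deaths, deaths,
                  sigma_me, sigma_1, sigma_2):
    """Sum of the case and death terms, treated as independent."""
    return (gaussian_loglike(model_cases, cases, sigma_me, sigma_1)
            + gaussian_loglike(model_deaths, deaths, sigma_me, sigma_2))
```

Here `sigma_1` and `sigma_2` stand for the two scatter parameters of the infection and death reports.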

1.9.2 Priors.

We use uniform priors on all the parameters. The ranges for each parameter are provided in Table 3. We ensure that the priors are broad enough to generate a two-tailed marginalized posterior distribution where the parameter can be constrained by the data.

Only for the period between nodes do the priors depend upon the zone that is being studied. With a visual inspection of the data, we can determine the possible times of transition. Different zones have local peaks at different times of the pandemic, owing to the reasons described in subsection 1.8. To capture these effects, the priors on the nodes remain region dependent. However, we impose wide priors around those transition times to remove possible bias. If the entire timeline is divided into N windows, we will have N − 1 nodes. The lower limit on the i + 1’th Node is the day following the upper limit of the i’th Node.
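The construction of uniform, non-overlapping node priors can be sketched as below. The helper names and the example widths are hypothetical; the sketch only encodes the rule that each node's lower limit is the day after the previous node's upper limit, together with the usual mapping of a unit-interval sample onto a uniform range.

```python
def uniform_prior(u, lower, upper):
    """Map u in [0, 1] to a uniform prior on [lower, upper]."""
    return lower + u * (upper - lower)

def node_ranges(first_lower, widths):
    """Build consecutive, non-overlapping prior ranges for the nodes.

    The lower limit of each range is the day after the upper limit of the
    previous one. `widths` (in days) are region-dependent choices.
    """
    ranges, lower = [], first_lower
    for width in widths:
        upper = lower + width
        ranges.append((lower, upper))
        lower = upper + 1
    return ranges

# Hypothetical example: two nodes, the first allowed between day 60 and day 100.
ranges = node_ranges(60, [40, 50])
nodes = [uniform_prior(0.5, lo, hi) for lo, hi in ranges]
```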

1.9.3 Parameter estimation and post processing.

The posterior probability p(θ|D, M) for the parameter set θ of a model M given data D is given by Bayes' theorem, p(θ|D, M) = p(D|θ, M) p(θ|M) / p(D|M), (8) where p(D|θ, M) is the likelihood and p(θ|M) the prior probability distribution. p(D|M) is the Bayesian evidence or the marginal likelihood [89]. We use a nested sampling method for parameter estimation using PolyChord [92]. We consider uniform priors and the likelihood described in Eq 7 and use CosmoChord (an extension of CosmoMC that includes PolyChord) as our generic sampler [93]. A brief description of nested sampling with PolyChord is given in Section 3 in S1 Text.

While PolyChord computes the marginal likelihood, it is also useful for parameter estimation using samples. We remove 30% of the total samples generated by the nested sampling procedure that represent the initial burn-in phase. We use getdist to obtain the marginalized posterior probabilities [94]. We use an adaptive parametrization as described in subsection 1.8. Marginal likelihoods are compared to determine the optimal numbers of windows required to describe the data. We show marginalized posteriors in certain cases in the main article while the others are presented in supplementary material.
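The comparison of marginal likelihoods across models with different numbers of windows amounts to ranking log-evidences. The sketch below assumes the log-evidence of each model is already available from the sampler; the function name and the numerical values are hypothetical and are not results from this analysis.

```python
def compare_evidence(log_evidence):
    """Rank models by log marginal likelihood.

    Returns the key of the best model and ln(Bayes factor) of every model
    relative to it (0 for the best model, negative otherwise).
    """
    best = max(log_evidence, key=log_evidence.get)
    ln_bayes = {k: v - log_evidence[best] for k, v in log_evidence.items()}
    return best, ln_bayes

# Hypothetical ln(evidence) values for models with 2, 3 and 4 windows.
ln_z = {2: -1250.4, 3: -1238.9, 4: -1239.6}
best_n, ln_b = compare_evidence(ln_z)
```

A model with more windows is preferred only if its evidence improves despite the extra parameters, which guards against over-fitting.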

1.10 Data

We use cumulative infected and death data downloaded from the COVID19India website. We manually curated the time series, but only where required to deal with major outliers arising from reported corrections to the record. (For example, Mumbai reported 917 deaths on June 16th 2020, as opposed to an average count of 78.3 for the previous week.) These outliers were smoothed by replacing their values with the local average and redistributing the excess count over the 20–60 previous days, depending upon the value. Typically, each time series has only a handful of such outliers (< 5), if any, and these are removed manually. For our simulations, we need the population data for each zone of interest (district, state or country); we mention each of these sources in the appropriate sections. For each such zone, we also need the population fraction in each age group, which we obtain from the 2011 census data [95].
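The outlier smoothing described above can be sketched as follows. The curation in the paper was manual; this function is only an illustration, assuming a 7-day local average and a uniform redistribution of the excess over a fixed number of preceding days (`window` and `spread` are hypothetical defaults).

```python
import numpy as np

def redistribute_spike(daily, idx, window=7, spread=30):
    """Replace daily[idx] by the mean of the preceding `window` days and
    spread the excess uniformly over the preceding `spread` days.

    The total count of the series is preserved.
    """
    daily = np.asarray(daily, dtype=float).copy()
    if idx < max(window, spread):
        raise ValueError("not enough earlier days before the spike")
    local_mean = daily[idx - window:idx].mean()
    excess = daily[idx] - local_mean
    daily[idx] = local_mean
    daily[idx - spread:idx] += excess / spread
    return daily

# Example with the Mumbai figures quoted above on an otherwise flat series.
series = np.full(41, 78.3)
series[40] = 917.0
smoothed = redistribute_spike(series, 40)
```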

1.11 Estimating mortality under-counting

Reported infections are expected to be substantially lower than the actual numbers of infected. While death numbers should nominally be better indicators of the state of the pandemic, most deaths in India happen at home and a medically certified cause of death may not be available, given low levels of coverage of the Medical Certification of Cause of Death (MCCD). Moreover, death registration is not uniform across the country. Estimates of overall under-counting of COVID-19 deaths vary, but reasonable estimates suggest a factor of 1.5–5 across India during the first wave [49].

Under-counting for cases and deaths cannot be estimated independently in our model, since one of these can be subsumed into a definition of the IFR. Benchmarking our results to those from serological surveys (serosurveys) provides a way of estimating this undercounting, since such surveys provide an estimate of the fraction of the population which has sustained a prior infection by the time of the test. Requiring that the quantum of deaths be consistent with the postulated IFR then allows us to estimate the undercounting of deaths at the time of the survey.

We analyse data for Karnataka—a typical Indian state with both rural and urban districts. (Approximately 62% of Karnataka’s population lives in rural areas, comparable to a figure of 65–70% for India as a whole.) We choose Karnataka for our analysis so that we can compare our numbers to district-wise serosurvey results available from a detailed study conducted in early September of 2020 [96].

For this part of the analysis, we assume that the IFR does not change over the course of the first wave; we refer to this as the fixed IFR analysis. We choose several fixed values of the (age-averaged) IFR in the range 0.15% to 0.3%. We vary the extent of death undercounting by multiplying the observed number of deaths by a factor >1. We then choose a baseline district where it is potentially safe to assume that death under-counting is minimal and, hence, that the death under-counting factor is 1. For this district, we expect that the number of infections estimated from serosurveys should coincide with that obtained from our calculations, as well as be consistent with the number of deaths given our assumed IFR.

Once we standardize our choice of (age-averaged) IFR with this method for a baseline district, we assume that the same IFR holds true across the state. We can then use the serosurvey results to estimate the true number of deaths for each district [96]. If Dreported is the number of reported deaths and Isero is the expected number of infected individuals based on the serosurvey, we can in principle estimate the death under-counting factor as the ratio of the expected deaths, IFR × Isero, to Dreported. This relation is an average estimation. However, as the pandemic was ongoing during the serosurvey, infections and deaths differ by a lag (≈ 1–2 weeks) and thus we cannot use this relation directly.

Instead, using the time series of reported infections and deaths, we first estimate the actual infections by the time of the serosurvey (X(Z) for a district Z). If there is no death under-counting, the expected proportion of actual infections will be equal to the actual proportion of infected from serosurveys (Y(Z)). We use any offset from this expectation to estimate a death under-counting factor for each district Z, given by the ratio Y(Z)/X(Z). All these estimates were performed by analysing time-series data that begin with an initial simulation date (two weeks prior to the available infected and death reports of each district) and go on to early September 2020, when the Karnataka serosurvey was conducted. We simultaneously fit the daily time series of infections and deaths, using the fixed values of the (age-averaged) IFR mentioned above. Uncertainties in the serosurvey results translate to uncertainties in our determination of this factor. Apart from the estimation of death undercounting, the rest of our analysis is based on the time-varying IFR described in the previous sections.

We also employ a ‘death multiplier’ for our analysis of Karnataka state, which is different from the death under-counting factor. The death multiplier is provided as an input that represents our estimate of the amount by which deaths are under-counted. We multiply the death numbers uniformly by the death multiplier for the entire period of analysis. For consistency, these numbers must be aligned to the IFR and the underlying numbers of infections. The under-counting factor is the parameter that we obtain as an output and that indicates possible death undercounting. For example, if in a particular case, without the use of a death multiplier, we find an under-counting factor of 2, then it indicates that the death numbers are under-counted by a factor of 2. For the same case, if the death multiplier input was 2, then we will estimate an under-counting factor of 1.
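The district-level estimate can be sketched as a ratio of the serosurvey prevalence to the model-predicted prevalence, floored at 1 since deaths cannot be over-counted. The function names and the example numbers are hypothetical; the model-predicted fraction is assumed to come from the fixed-IFR fit described above.

```python
def undercount_factor(sero_fraction, model_fraction):
    """Ratio of serosurvey prevalence to model-predicted prevalence,
    floored at 1 because death over-counting is not possible."""
    if model_fraction <= 0:
        raise ValueError("model_fraction must be positive")
    return max(1.0, sero_fraction / model_fraction)

def undercount_interval(sero_low, sero_high, model_fraction):
    """Propagate a serosurvey interval (e.g. 95% bounds) to the factor."""
    return (undercount_factor(sero_low, model_fraction),
            undercount_factor(sero_high, model_fraction))

# Hypothetical district: serosurvey finds 30% (26% to 34%), model predicts 15%.
factor = undercount_factor(0.30, 0.15)
low, high = undercount_interval(0.26, 0.34, 0.15)
```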

2 Results

We present our results in the following sequence. We first present results for the large south Indian state of Karnataka in subsection 2.1, under a variety of scenarios. In this part, our primary purpose is to obtain a reasonable estimate of death under-counting and of the percentage of the population infected by COVID-19 at the end of the first wave, by mid-February 2021. This is followed by an analysis of data for a number of Indian cities where we assume no death undercounting. We present the results for the large city of Mumbai in the main text (subsection 2.3), while relegating the rest to SI.

While the assumption of no death undercounting is certainly untrue, we present these results primarily to illustrate the power of our method in capturing time-varying parameters on relatively short scales. Finally, in subsection 2.4, we apply the model to data for India as a whole, with death undercounting taken over from our estimates for Karnataka. In each of these cases, we estimate, among other parameters, the actual number of infected and the effective IFR over the course of the first wave of the pandemic.

2.1 Karnataka districts: Estimating death under-counting

Our analysis of the Karnataka data follows a relatively complex set of protocols, which we use to obtain reasonable estimates of death undercounting and total infected in the first wave. This is a challenging exercise because (a) the cases are under-counted, (b) deaths are under-counted and (c) there is no true consensus on the IFR values that should be appropriate. Our method deals with these issues by assuming that the deaths are calibrated correctly in one urban district with the best health facilities in the state.

We begin with an estimation of death under-counting in Karnataka (subsubsection 2.1.1), using a simultaneous fitting of reported infections and deaths to our INDSCI-SIM model. We then compare this with serosurvey findings, following the fixed IFR analysis protocol described in subsection 1.11. Since the serosurvey we use to calibrate was conducted in September 2020, this first analysis is carried out for daily reported data of new cases and deaths from the start of the pandemic in India to early September 2020. We find that an IFR of 0.2% seems a reasonable estimate up to early September 2020. We then apply our model, allowing for a time-varying IFR (and other parameters, as described in subsections 1.1 to 1.10), to the data of each district of Karnataka (subsubsection 2.1.2); we find that different districts had different levels of infections, ranging from 20% to 70% in some cases, with a state-wide aggregate under-counting factor of 2.2. Finally, we present results for the aggregate state-level data for Karnataka (subsection 2.2).

2.1.1 Estimates of death under-counting up to early September 2020, with fixed IFR.

We use the serosurvey results published in Ref. [97] for all districts of Karnataka. We first scale all age-dependent IFRs with a constant factor to ensure age-averaged IFRs of 0.3%, 0.2% and 0.15%; these have been argued to be in the right range for LMICs [78]. We assume further that either (a) there is no death under-counting or (b) deaths have been under-counted by a factor of two. As can be seen in Table 4, assuming that deaths have been under-counted by a factor of two doubles our estimate of the actual infected in the same IFR category, as expected. On the other hand, a decrease in the IFR results in an increase in our estimate of the actual numbers of infected. We stress that, for the purposes of this specific analysis, the IFR is assumed to be constant over time (no change in δi) and that we only analyze the data till September 15, 2020 for all districts in Karnataka, for consistency with the Karnataka serosurvey. Population data for Karnataka districts are sourced from the projections for the period 2020–2021, as provided in the report issued by the Directorate of Economics and Statistics, Bangalore [98]. We choose Bengaluru Urban as our baseline model district, assuming that the recorded numbers of deaths are accurate. This choice is motivated by the observation that Bengaluru Urban is the largest district in Karnataka by population. It contains the state capital and is thus also likely to be under greater scrutiny than other, more remote, districts. Comparing our model results with those from the serosurveys for Bengaluru Urban, we find that an IFR of 0.2% (highlighted column in the table) yields numbers of those actually infected that are close to the measured seroprevalence (an undercounting factor close to 1). To the extent that the reported death numbers are accurate for Bengaluru Urban, this is then our best estimate for the IFR, but only up to the date of the serosurvey. We can now use this assumption to estimate levels of undercounting in other districts.
(We, however, expect that this IFR value will go down further as time goes on, reflecting improvement in patient treatments and a consequent mortality reduction as well as COVID deaths in older age-bands leading to a reduction in the population which carries the most mortality risk.)

Table 4. Estimation of death under-counting by comparing seroprevalence.

The first column lists each district of Karnataka (as of September 2020). The second column provides the seropositivity estimate as obtained in a serosurvey across the first two weeks of September 2020. The third column provides our estimate of the potential death undercounting factor in that district, represented by the ratio of seropositivity to the actual number of reported infected. Since death overcounting is not possible, the minimum value of this factor can only be 1. The uncertainty on this number is obtained by propagating the 95% uncertainties from the serosurvey reports to our prediction of actual infection. In cases where the lower bound on the factor goes below 1, we present the mean value and the range from 1 as an upper bound in parentheses. In a few cases we just present 1 as an upper bound, representing those cases where even the upper limit is less than or equal to 1.

Using these columns in Table 4, for IFR = 0.2%, we plot the death undercounting factors in Fig 3 as a choropleth in the left panel. Darker colors for districts represent a larger death undercounting. In the right panel we plot the factors for the IFR = 0.3% analysis. As expected, the estimated death undercounting increases here. This would be consistent with a death undercounting of 60% in the Bengaluru Urban district.

Fig 3. Ratio of seroprevalence reported in districts of Karnataka across the first two weeks of September, 2020 and our prediction of actual infected on the same date.

This ratio indicates the death under-counting in different districts, assuming a uniform IFR across districts. We plot the ratios here for IFR = 0.2% [a: left plot] and 0.3% [b: right plot] respectively. Note that these numbers are plotted from Table 4 for the corresponding IFRs. For IFR = 0.3% the INDSCI-SIM prediction of actual infected is necessarily lower than in the IFR = 0.2% case, so the plot on the right shows larger ratios than the plot on the left. The maps are generated using publicly available data; as per the repository's declaration, it is available under an MIT License.

2.1.2 Estimates of cases by the end of first wave (Feb 2021), with variable IFR.

Having estimated the death under-counting factors, we redo our analysis for each of Karnataka’s districts, multiplying reported deaths by the multiplying factors tabulated in Table 4. We then run the program forward till February 15, 2021, but now also allow the IFR to vary according to the parametrization described earlier. Using this analysis, we tabulate the actual infected percentages we predict by December 15, 2020 and February 15, 2021 in Table 5, together with 95% confidence intervals. We find that between 20% and 70% of the population in these districts had been infected by February 15, 2021. We find no substantial change in the actual infection between December 2020 and February 2021.

Table 5. Estimation of actual infection in Karnataka districts on December 15, 2020 and February 15, 2021.

The uncertainties are quoted at 95% C.L. This table is obtained from an analysis that assumes a time varying IFR as explained in Methods.

The relationship between actual numbers of prior infections and results from serosurveys is complicated by the observation that antibodies measured in such surveys have been observed to decay in concentration over a period of several months. However, given the fact that the peak in cases in Karnataka occurred over September-October 2020, and that the serosurvey was performed during this period, it is reasonable to assume that this will not impact our interpretation of the serosurvey observations [97].

2.2 Projections and data fitting for Karnataka state: The effect of death undercounting

We now present our analysis of Karnataka state viewed as an aggregate of all its districts. Karnataka has both urban and rural districts. We assume that a suitable estimate of undercounting at the level of the entire state can be obtained through a population average of the undercounting at the level of its individual districts. Multiplying the fraction of the total population associated with each district by the district undercounting factor U(D) and summing these values gives a population-averaged death undercounting factor of 2.2 for Karnataka.
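The population average described above is a weighted mean, which can be sketched as follows. The function name, district labels and numbers are hypothetical placeholders, not values from Table 4.

```python
def population_weighted_undercount(populations, factors):
    """Population-weighted mean of district under-counting factors.

    Both arguments are dictionaries keyed by district name.
    """
    total = sum(populations.values())
    return sum(populations[d] / total * factors[d] for d in populations)

# Hypothetical two-district example.
populations = {"District A": 1_000_000, "District B": 3_000_000}
factors = {"District A": 1.0, "District B": 3.0}
state_factor = population_weighted_undercount(populations, factors)
```

With these placeholder inputs the weights are 0.25 and 0.75, so the larger district dominates the state-level value.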

In this section we analyze the Karnataka infection curve in two ways. In the first, we assume no death undercounting. In the second we multiply the reported deaths by 2.2 to account for state-wide undercounting. Here the reported infection and deaths data are addressed using our model. We have used the data till the end of the first wave, i.e. February 15, 2021. From the sampled parameter space we predict the time evolution of bias (between actual and reported infection), the IFR and the actual infected population.

We present the results via a set of 6 plots in three panels. In the top panel we present the fit to the daily reported cases (left—(a)) and to the daily reported deaths (right—(b)). The actual predicted infections and the bias are plotted in the middle left (c) and right (d) plots respectively. Note that the bias factor, once multiplied with the reported infections, provides the estimate of actual infection. The change in the IFR (left—(e)) and the effective reproduction number R(t) (right—(f)) estimates are plotted in bottom panel. For R(t) we also plot an independent estimate [82] as a reference.

2.2.1 Karnataka: No death undercounting.

In Fig 4 we provide model results for Karnataka state data assuming deaths have been counted accurately. The posterior distributions and parameter constraints are provided in Fig F and Table A in S1 Text. Both infected and death numbers are well described by the model. The state-wide data have a single peak in both infected and deaths, indicating that the many variable peaks present in the individual districts are averaged out to yield the relatively simple structure shown in the figure. The total fraction infected ranges between 15–45% at the 2σ level by mid-February. We see that the strong urban to rural gradient in seropositivity, once averaged across the state, yields infected fractions that are below those in the urban regions. The bias multiplier follows results obtained from the city of Bengaluru.

Fig 4. Timeseries analysis for Karnataka without death undercounting.

We plot the fit to the daily infected cases [a: top left] and daily reported deaths [b: top right] assuming no death undercounting. Middle panel contains the cumulative actual infected cases [c: left] and the bias multiplicative factor [d: right] obtained as a ratio between actual and reported infections. The left plot [e] at the bottom panel contains the evolution of age averaged IFR. The bottom right plot [f] contains our estimation of R(t) and an independent [82] measurement. Note that the bands correspond to 2σ and 3σ confidence levels. We do not assume any death undercounting in this analysis.

The IFR shows a decreasing trend after lockdown with a final range of IFR between 0.05–0.12%, consistent with estimates from the serological surveys [97]. These surveys find an overall adjusted total prevalence of 27.7% (95%CI 26.1–29.3); we obtain about 25% at the mid-line of the 95% confidence interval in mid-September. The case-to-infection ratio was estimated to be 1:40 while the infection fatality rate was 0.05%; our results are consistent with both observations. The R(t) band shows an increase after the lockdown till August 2020. After that R(t) decreases till December 2020 and stays below 1 till February 2021. In this case, bounds on cumulative infections and deaths are plotted with data in Figs C-E in S1 Text.

2.2.2 Karnataka: Incorporating death undercounting.

Fig 5 shows our model predictions for the Karnataka state data where the death time series is uniformly scaled with a multiplier of 2.2. The related shifts in the posterior distributions and parameter constraints can be seen in Fig F and Table A in S1 Text. The daily infection and the scaled daily death report fits are similar to Fig 4. However, compared to the analysis that assumed no death multiplier, predictions for the fraction of those infected range between 30–50% (Table 5). A larger death undercounting thus leads to higher estimates of actual infection within the population. Correspondingly, the bias multiplier is also larger.

Fig 5. Timeseries analysis for Karnataka with death undercounting.

We plot the fit to the daily infected cases [a: top left] and daily reported deaths [b: top right] taking into account estimated death undercounting. Middle panel contains the cumulative actual infected cases [c: left] and the bias multiplicative factor [d: right] obtained as a ratio between actual and reported infections. The left plot [e] at the bottom panel contains the evolution of age averaged IFR. The bottom right plot [f] contains our estimation of R(t) and an independent [82] measurement. Note that the bands correspond to 2σ and 3σ confidence levels. In this analysis the reported death numbers are multiplied with 2.2, the average death undercounting we obtained from Karnataka districts assuming Bengaluru Urban does not have any death undercounting.

With higher death numbers, the effective IFR decreases at a later stage (around September 2020), converging around February to the values without death undercounting. The effect of death undercounting is absorbed mainly by the bias multiplier and therefore is reflected in the actual infected numbers. We find that the R(t) is nearly identical to the analysis without death undercounting. Assuming death undercounting, our estimates for the cumulative infections and deaths are plotted alongside the data in Section 5, Figs C-E in S1 Text.

2.3 Infection curves for select Indian cities

In this sub-section, we analyse the progress of COVID-19 across the period January 2020 to February 15, 2021, for a few select Indian cities: Bengaluru, Chennai, Delhi, Pune and Mumbai. For each of these cities, the data correspond to the urban district agglomeration within which the city is embedded. We assume that deaths have been counted largely accurately for these cities, and thus that U(D) defined above is 1. We will return to the implications of this assumption later.

All these cities show more than a single peak in reported cases. The origins of these peaks are complex. Together, they represent a combination of relaxations of the lockdown and of physical distancing, festivals and political activity at various times, the ebb and flow of migration from nearby districts, as well as day-to-day variations in testing. These effects are captured in our multiple window model with adaptive selection, since infectivity parameters can now vary across different windows and the bias factor between actual and reported cases can itself evolve.

We present a set of 6 plots in three panels, as in the results for Karnataka in the previous subsection. We show the results for Mumbai in a subsection below, and relegate the rest to Tables D-G and Figs K-V in S1 Text.

2.3.1 Mumbai.

Results for the city of Mumbai are provided in Fig 6. The population of the district is taken from the projections for 2021 [99]. The posterior distributions and related parameter constraints are provided in Fig H and Table B in S1 Text. The data for Mumbai show two well resolved peaks around June-July 2020 and September-October 2020. Following the second peak, reported infections reduce and remain roughly constant from about December 2020.

Fig 6. Timeseries analysis for Mumbai.

We plot the fit to the daily infected cases [a: top left] and daily reported deaths [b: top right] assuming no death undercounting. Middle panel contains the cumulative actual infected cases [c: left] and the bias multiplicative factor [d: right] obtained as a ratio between actual and reported infections. The left plot [e] at the bottom panel contains the evolution of age averaged IFR. The bottom right plot [f] contains our estimation of R(t) and an independent [82] measurement. Note that the bands correspond to 2σ and 3σ confidence levels.

The reported deaths show a peak and a plateau before reaching a second, smaller peak. While the second peak in the reported infections is higher than the first peak, the plateau in the reported death numbers around October 2020 is substantially lower than the July 2020 peak, indicating better treatment interventions. There are no systematic outliers in the death numbers once we apply our smoothing to the data. The sharp decrease in deaths, in contrast to the increase in reported infections, could result from an increase in testing (a lowering of the TPR) and/or a decrease in IFR.

While the bias multiplier decreases from 100 to 15, the IFR does not decrease as rapidly. At 2σ the data are consistent with an IFR of about 0.15%, which is the largest when compared to the previously discussed cities. In Mumbai, a relatively smaller fraction of the population (50%) appears to be infected within this calculation. However, fluctuations in the infected data w.r.t. the mean value translate to uncertainties in the time-series estimation. Thus, the bands are wider compared to Bengaluru, Chennai and Delhi. At the 2σ level we estimate that 54–84% of the total Mumbai population may have been infected. R(t) shows a steady decrease within the lockdown and after that till mid-August 2020, when it starts to rise again temporarily, indicating an upcoming second peak. Following the peak, R(t) first reduces below 1 for a few months, then rises to fluctuate around 1 in early 2021 till February.

Mumbai serosurvey data showed that 54·1% of samples in slums and 16·1% of those in non-slums tested positive [100], in a study performed between June 29 and July 19, 2020. These are broadly consistent with our estimates. Our estimates for cumulative infections and deaths are plotted with data in Fig G in S1 Text.

2.4 India

To describe the data for India, we assume that the death undercounting factor that represents India as a whole is the same as that obtained for the state of Karnataka, a factor of 2.2. In doing this we are motivated by two observations: the urban-rural gradient can be substantial, and the division between urban and rural populations is similar at the Karnataka state level and at the national level, at a ratio of about 35:65.

The national data for COVID-19 infections show a single-peak structure around mid-September 2020, at a bit less than 98,000 cases. The daily numbers then declined, reaching a value of about 10,000 by January 2021. Our model fits infection and death reports in three adaptive windows between March 1, 2020 and February 15, 2021. For our calculations, we use India’s population data from the projections reported in the 2019 report published by the National Commission on Population [101].

We find that the bias multiplier changes from 120 to 20 in this time period, indicating that by the end of the first wave, reported cases underestimated actual infections by a factor of 20. This decrease is in accordance with the reported overall increase in testing during this period. By the middle of February, when the second wave began, we find that between 20% and 50% (95% CI) of the Indian population had been infected, with a point estimate of about 35%. This is a more conservative estimate than those made by others. However, we believe it is consistent with the third ICMR serosurvey across December and January as well as with the slow pace of case accumulation over January and February.
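As a quick worked check of what these bias multipliers mean, the following minimal sketch converts a multiplier into the implied fraction of infections that were reported. It assumes only the definition used in the figure captions (bias = actual infections / reported infections); the function name is illustrative and is not part of the authors' code.

```python
def detected_fraction(bias_multiplier: float) -> float:
    """Fraction of actual infections that show up in reported cases,
    assuming bias = actual infections / reported infections."""
    if bias_multiplier <= 0:
        raise ValueError("bias multiplier must be positive")
    return 1.0 / bias_multiplier


# Start- and end-of-period multipliers quoted in the text for India.
for bias in (120, 20):
    share = 100 * detected_fraction(bias)
    print(f"bias {bias}: about {share:.1f}% of infections reported")
```

Under this definition, a multiplier of 120 corresponds to fewer than 1 in 100 infections being reported, and a multiplier of 20 to 1 in 20.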

The IFR band is consistent with a global India IFR of about 0.1% (95% CI 0.05–0.15). The R(t) band has several interesting features that are corroborated by the independent measurement plotted in dots. Before the lockdown, R(t) stays in the high range 2.5–3. During lockdown it reduced to 1.5 and thereafter to the vicinity of 1.2. It stayed around 1.2 till August, going below 1 around October, signaling a steady decrease in infected numbers. A slight upward blip in November 2020 is consistent with the festival season in the north and east of India. R(t) then stayed below 1 through February 2021. Assuming death undercounting, estimates of cumulative infections and deaths for India are plotted with data in Fig I in S1 Text. We present the posterior distribution of model parameters for the national data analysis in Fig J in S1 Text.

Our estimates indicate that at the beginning of March the mean exposed population in India was around 6000. Around 14 March the number of officially reported infections in India crossed 100, but we expect that a lot of infections must have escaped detection.

Three adaptive windows are defined: from simulation onset (beginning of March) to the 154th day from the onset (August 2, 2020); from August 2, 2020 to October 21, 2020 (the 234th day from the simulation onset); and from October 22, 2020 to February 15, 2021 (simulation end). These three windows have three different associated infectivities, β1 = 0.0755, β2 = 0.117 and β3 = 0.220. This is the base infectivity with respect to the simulation onset. This is suggestive of two phases in which increased relaxations of COVID-19-associated restrictions may have led to a rise of cases, one around the beginning of August and one around the end of October, although one should be careful not to overinterpret this data.

Note that the effective β is modified by the parameter τ, which has a mean value of 142 days. Therefore, while β increases from window to window, the effective β is also increasing. The bias parameter is unbounded from above even with a conservative prior. However, the time evolution of the bias determined by b and Δb for the national data indicates a steady decrease, reaching a bias of around 20, as plotted in Fig 7. The IFR shows a transition around mid July (108 days since the onset) with a width ΔIFR of 2 months. σ1 and σ2 represent the noise in the infected and death data respectively. They are well constrained.
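The window boundaries and base infectivities above can be written down as a small lookup, which also confirms that the quoted day counts match the calendar dates. This is only an illustrative sketch: it assumes March 1, 2020 is counted as day 0 and that the shared boundary day (August 2) belongs to the second window, and it returns the base β only, without the τ-dependent modification. All names are hypothetical and are not taken from ELiXSIR.

```python
from datetime import date, timedelta

ONSET = date(2020, 3, 1)   # simulation onset, counted here as day 0 (assumption)
END = date(2021, 2, 15)    # simulation end

# (first day offset of the window, base infectivity) as quoted in the text.
# Assumption: day 154 (August 2, 2020) is assigned to the second window.
WINDOWS = [(0, 0.0755), (154, 0.117), (235, 0.220)]


def day_offset(d: date) -> int:
    """Number of days elapsed since the simulation onset."""
    return (d - ONSET).days


def base_beta(d: date) -> float:
    """Base infectivity of the adaptive window that contains date d."""
    if not ONSET <= d <= END:
        raise ValueError("date lies outside the simulated period")
    offset = day_offset(d)
    beta = WINDOWS[0][1]
    for start, value in WINDOWS:
        if offset >= start:
            beta = value
    return beta


# Calendar check of the boundaries quoted in the text.
print(ONSET + timedelta(days=154))  # 2020-08-02
print(ONSET + timedelta(days=234))  # 2020-10-21
```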

Fig 7. Timeseries analysis for Indian national data.

We plot the fit to the daily infected cases [a: top left] and daily reported deaths [b: top right], taking into account estimated death undercounting. Middle panel contains the cumulative actual infected cases [c: left] and the bias multiplicative factor [d: right] obtained as a ratio between actual and reported infections. The left plot [e] at the bottom panel contains the evolution of age averaged IFR. The bottom right plot [f] contains our estimation of R(t) and an independent [82] measurement. Note that the bands correspond to 2σ and 3σ confidence levels. In this analysis the reported death numbers are multiplied by 2.2 to take into account a measure of presumed undercounting.

The triangle plot of posteriors reveals strong correlations between parameters. Given the time-series of infected persons, a fit to the initial data can be achieved either by having a large initial exposed population and a lower infectivity or by having a smaller exposed population and a higher infectivity. Such a negative correlation is seen in Fig J in S1 Text, between Einitial and β1. Since smaller τ1 results in a faster decrease in infection by lowering the effective β, to fit the infection data, if βi increases, τ must decrease to maintain a similar level of effective infectivity. These negative correlations are reflected in the contour plots as well. The slope of the infection curve is given by β; the β’s are thus positively correlated.

3 Discussion and conclusions

This paper describes the application of an epidemiological compartmental model, INDSCI-SIM, to COVID-19 in India. We focused on describing the first wave, starting from the first case in India on January 30, 2020 and continuing till mid-February, 2021, when a second and more destructive wave was initiated. Our intent was to obtain reasonable estimates for the fraction of the Indian population that was infected prior to the onset of the second wave. A parallel aim was to arrive at credible estimates of the Infection Fatality ratio (IFR) and of the undercounting of both cases and deaths during the first wave.

Assuming that deaths were recorded accurately in one urban district of Karnataka, we chose an infection fatality ratio for which our predicted fraction of infections and the fraction obtained through serosurveys in early September 2020 were approximately equal. We then used this to determine the extent of death undercounting across other districts of Karnataka, both urban and rural, by examining consistency between the numbers of infected as estimated through serosurveys and the numbers of deaths that should have resulted given the IFR value. We found that some districts could have under-counted deaths by a factor of about 5. Rural districts, in general, tended to have lower levels of infection spread, likely associated with the fact that agricultural labour and a good fraction of the rural workforce work outdoors, where infection risk is decidedly less. We thus estimated the overall fraction of infected in Karnataka to lie in the interval 30%-50%. Then, factoring this death undercounting into our calculation, we switched to a more refined calculation in which the IFR itself was allowed to vary. Averaging over all districts weighted by their populations yielded a factor of 2.2 between total actual and reported deaths in Karnataka state. We used these results to estimate that between 20–70% of the population across these districts were infected by February 2021.
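The consistency check described above can be illustrated with a short sketch. It is a simplified reading of the procedure, not the authors' implementation: it treats the expected deaths in a district as IFR × seroprevalence × population, ignores the delay between infection and death, and uses entirely hypothetical district data and names.

```python
from dataclasses import dataclass


@dataclass
class District:
    name: str
    population: int
    seroprevalence: float   # fraction infected according to a serosurvey
    reported_deaths: int


def expected_deaths(d: District, ifr: float) -> float:
    """Deaths implied by the serosurvey for a given IFR (as a fraction)."""
    return ifr * d.seroprevalence * d.population


def undercount_factor(d: District, ifr: float) -> float:
    """Ratio of expected to reported deaths in one district."""
    return expected_deaths(d, ifr) / d.reported_deaths


def population_weighted_factor(districts: list[District], ifr: float) -> float:
    """Average of district-level factors, weighted by district population."""
    total_population = sum(d.population for d in districts)
    weighted = sum(d.population * undercount_factor(d, ifr) for d in districts)
    return weighted / total_population


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
IFR = 0.001  # 0.1%, fixed from the benchmark district
benchmark = District("benchmark urban district", 1_000_000, 0.30, 300)
rural = District("rural district", 2_000_000, 0.20, 100)

print(undercount_factor(benchmark, IFR))                   # close to 1: deaths counted accurately
print(undercount_factor(rural, IFR))                       # close to 4: deaths under-counted
print(population_weighted_factor([benchmark, rural], IFR)) # close to 3
```

In the benchmark district the factor is 1 by construction, since the IFR was chosen to make reported deaths and serosurvey results agree there.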

Our results for the cities of Mumbai, Delhi, Pune and Bangalore (we assume no death undercounting here) indicate that, by February 2021, roughly 60–80% of the population were infected, consistent with serological surveys in these cities. We found that only one in 15–20 cases was detected overall [102]. The effective reproductive ratio in all cases settled to just below 1 between December 2020 and February 2021, a period where cases were declining across India.

Our analysis for India, incorporating the same level of death undercounting as for Karnataka state, estimates that a fraction of about 35% (CI: 20–50%) had been infected at the 95% confidence level by the end of the first wave. We believe that this estimate, distinctly on the lower side of estimates made by others, may account for the speed at which the second wave has spread. It may also account for the fact that rural spread seems to have been far larger in the second wave as opposed to the first, although the plight of urban India has tended to attract more attention.

Our IFR estimates, though broad, are consistent with an inferred IFR of about 0.1%, with a broad 95% CI of (0.05–0.15). These are, as mentioned earlier, on the lower side vis-à-vis population-based estimates for LMICs in general. We caution here that the extent of death undercounting will impact any estimate of the IFR. There are, as yet, no completely credible estimates of the extent of undercounting in the first COVID-19 wave in India, although this question has attracted much attention recently [49, 103, 104]. A recent estimate is of 3.1–3.4 million COVID deaths across both waves, including 2.5–2.8 million during the period of the second wave, April-June 2021 [105]. The implied estimate of about 0.6–0.9 million deaths in the first wave is larger than our conservative estimate, based on a multiplier of 2.2 between actual and recorded deaths, which would give us a number somewhat in excess of 0.3 million, but is overall consistent with the range we had estimated of undercounting by a factor of between 2 and 5. This would then imply an IFR of between 0.2% and 0.3%, assuming that our estimates of infections remained accurate with only the fraction of deaths among them being multiplied by a factor of between 2 and 3.
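The rescaling argument in the preceding paragraph is simple arithmetic and can be checked directly. The sketch below holds the number of infections fixed and scales the IFR by the ratio of an alternative death total to the model's own total. It uses the rounded figures quoted in the text (about 0.3 million deaths from the 2.2 multiplier, and 0.6–0.9 million from the external estimate); the names are illustrative.

```python
MODEL_DEATHS_MILLION = 0.3             # approx. 2.2 x reported deaths, rounded as in the text
EXTERNAL_DEATHS_MILLION = (0.6, 0.9)   # external first-wave range quoted in the text
BASELINE_IFR_PERCENT = 0.1             # inferred IFR with the 2.2 multiplier


def rescaled_ifr(baseline_ifr: float, model_deaths: float, alternative_deaths: float) -> float:
    """IFR implied by a different death total when infections are held fixed."""
    return baseline_ifr * alternative_deaths / model_deaths


low, high = (
    rescaled_ifr(BASELINE_IFR_PERCENT, MODEL_DEATHS_MILLION, deaths)
    for deaths in EXTERNAL_DEATHS_MILLION
)
print(f"implied IFR range: {low:.1f}% to {high:.1f}%")
```

With these rounded inputs the death totals differ by a factor of 2 to 3, so the implied IFR is 0.2% to 0.3%.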

In principle, we could have repeated the district-wise analysis we performed for Karnataka state for all states in India, but would be faced with the same problem, that of finding a benchmark district where we could safely assume that deaths were being counted more-or-less accurately (or knowing the extent of undercounting precisely) and then adjusting the mortality figures of other districts accordingly. We expect that as more data is obtained through better proxies for COVID-19 deaths or through accurate estimations of all-cause mortality, we should be able to refine these calculations further with more precise input.

Our model and methods provide a valuable framework as well as specific tools for comprehensive retrospective analysis of epidemiological waves. When used with abundant caution, we believe that it may be useful even for short-term projections; we strongly recommend that modellers present the uncertainties associated with any such projections. More specifically, the results presented here should be valuable in determining the initial conditions for the second wave, especially since we compute the posterior probabilities for the parameters which enter our model. Given these, and additional epidemiological and clinical input, for example the fraction of reinfections upon the introduction of a new strain, the decay of humoral immunity, the persistence and extent of cellular immunity, a vaccination program that was being slowly ramped up at the time of onset of the second wave as well as other features of the Indian response across February and March of 2021, we can then begin to address the question of what led to the fast increase of cases in the second wave. Our methodological innovations include accounting for a time-dependent improvement in case identification as well as in treatment leading to lower mortality, all implemented within a fully Bayesian framework. Questions such as the issue of reinfections as a consequence of immune escape, of partial immunity from the ongoing vaccination program, of vaccine breakthroughs at a low rate and of the impacts of multiple lock-downs applied inhomogeneously throughout the country should factor into any further analysis of events after the end of the first wave of COVID-19 in India. These impact public health policy intimately, especially as they are relevant to both testing strategies and vaccination policies [106, 107]. It is here that we expect that well formulated and bench-marked models should be of substantial use, both to understand the past better and to project the future more accurately.

4 Codes developed and used

We developed a fast FORTRAN code, ELiXSIR (Extended, zone Linked IX-compartmental SIR model: a code to simulate COVID19 infection), as a solver for the model [108]. A version of ELiXSIR is available for download. The code is designed for an arbitrary number of age groups and regions. We acknowledge the use of PolyChord through the publicly available code CosmoChord. ELiXSIR is fused with CosmoChord for the purpose of data analysis and post-processing. See Fig B in S1 Text for a flowchart of how our analysis is structured.

Supporting information

S1 Text.

Section 1: Computation of reproduction number R0. We provide the calculation of the basic reproductive ratio corresponding to the age-stratified model, assuming uniform infectivity and values of ϵ = 1, using a next-generation-matrix method. Section 2: Contact matrices between stratified age-groups. We describe how the contact matrices of Ref. [75] can be specified for our use here, combining contact matrices with a 5-year resolution into a more coarse-grained description. Fig A: The coarse-grained contact matrices for India during (top) and without lockdown (bottom). Section 3: Brief discussion on Nested Sampling. We briefly discuss the sampling method we have used in our analysis. Section 4: Flowchart of our analysis. We discuss the essentials of the algorithm and techniques and provide a flowchart of the analysis. Fig B: A schematic flowchart of our analysis. Section 5: Supporting plots for Karnataka, Mumbai and India (aggregate) analysis. We present supporting plots and tables corresponding to our analysis for Karnataka state, Mumbai and for India-wide numbers. Table A: Karnataka: Constraints on parameters. Fig C: Karnataka: Bounds on cumulative infection with no death multiplier. Fig D: Karnataka: Bounds on cumulative infection with a death multiplier of 2.2. Fig E: Karnataka: Bounds on cumulative infection with a death multiplier of 5. Fig F: Karnataka: Marginalized posteriors of the parameters. Table B: Mumbai: Constraints on parameters. Fig G: Mumbai: Bounds on cumulative infection with no death multiplier. Fig H: Mumbai: Marginalized posteriors of the parameters. Table C: India: Mean and 95% bounds on the parameters. Fig I: India: Bounds on cumulative infection with a death multiplier of 2.2. Fig J: India: Marginalized posteriors of the parameters. Section 6: Infection curves for select Indian cities. In the main text, we presented the analysis of Mumbai alone. 
Here, following the same protocol, we describe results for Bengaluru Urban, Chennai, Pune, and Delhi. Table D: Bengaluru Urban: Constraints on parameters. Fig K: Bengaluru Urban: Timeseries analysis for Bengaluru Urban. Fig L: Bengaluru Urban: Bounds on cumulative infection. Fig M: Bengaluru Urban: Marginalized posteriors of the parameters. Table E: Chennai: Constraints on parameters. Fig N: Chennai: Timeseries analysis for Chennai. Fig O: Chennai: Bounds on cumulative infection. Fig P: Chennai: Marginalized posteriors of the parameters. Table F: Delhi: Constraints on parameters. Fig Q: Delhi: Timeseries analysis for Delhi. Fig R: Delhi: Bounds on cumulative infection. Fig S: Delhi: Marginalized posteriors. Table G: Pune: Constraints on parameters. Fig T: Pune: Timeseries analysis for Pune. Fig U: Pune: Bounds on cumulative infection. Fig V: Pune: Marginalized posteriors of the parameters.



The numerical simulations discussed in the paper have been performed at the Institute of Mathematical Sciences’ High Performance Computing facility Nandadevi. DKH would like to thank Xingang Chen, Will Handley and Joshua Speagle for important discussions. BSP and SMS would like to thank Mihir Arjunwadkar for early discussions. GIM would like to thank Siva Athreya, Sandeep Krishna, Brian Wahl, Philip Cherian, Vaibhhav Sinha, Giridhara Babu, Bhramar Mukherjee, Gagandeep Kang, L S Shashidhara, Shahid Jameel, Kayla Laserson and Harish Iyer for many related discussions.


  1. 1. Zhu N, Zhang D, Wang W, Li X, Yang B, Song J, et al. A Novel Coronavirus from Patients with Pneumonia in China, 2019. New England Journal of Medicine. 2020;382(8):727–733. pmid:31978945
  2. 2. Russell TW, Wu JT, Clifford S, Edmunds WJ, Kucharski AJ, Jit M. Effect of internationally imported cases on internal spread of COVID-19: a mathematical modelling study. The Lancet Public Health. 2021;6(1):e12–e20. pmid:33301722
  3. 3. Andrews MA, Areekal B, Rajesh KR, Krishnan J, Suryakala R, Krishnan B, et al. First confirmed case of COVID-19 infection in India: A case report. Indian Journal of Medical Research. 2020;151(5):490. pmid:32611918
  4. 4. Lancet T. India under COVID-19 lockdown. Lancet (London, England). 2020;395(10233):1315.
  5. 5. Pulla P. Covid-19: India imposes lockdown for 21 days and cases rise. BMJ. 2020; p. m1251. pmid:32217534
  6. 6. Pons-Salort M, John J, Watson OJ, Brazeau NF, Verity R, Kang G, et al. Reconstructing the COVID-19 epidemic in Delhi, India: infection attack rate and reporting of deaths. Infectious Diseases (except HIV/AIDS); 2021. Available from:
  7. 7. Mallapaty S. India’s massive COVID surge puzzles scientists; 2021. Available from:
  8. 8. Menon GI. The Novel Coronavirus Variants and India’s Uncertain Future; 2021. Available from:
  9. 9. Mandal S, Bhatnagar T, Arinaminpathy N, Agarwal A, Chowdhury A, Murhekar M, et al. Prudent public health intervention strategies to control the coronavirus disease 2019 transmission in India: A mathematical model-based approach. Indian Journal of Medical Research. 2020;151(2):190. pmid:32362645
  10. 10. Chatterjee K, Chatterjee K, Kumar A, Shankar S. Healthcare impact of COVID-19 epidemic in India: A stochastic mathematical model. Medical Journal Armed Forces India. 2020;76(2):147–155. pmid:32292232
  11. 11. Ray D, Salvatore M, Bhattacharyya R, Wang L, Du J, Mohammed S, et al. Predictions, role of interventions and effects of a historic national lockdown in India’s response to the COVID-19 pandemic: data science call to arms. Harvard Data Science Review. 2020;2020(Suppl 1). pmid:32607504
  12. 12. Agrawal S, Bhandari S, Bhattacharjee A, Deo A, Dixit NM, Harsha P, et al. City-Scale Agent-Based Simulators for the Study of Non-pharmaceutical Interventions in the Context of the COVID-19 Epidemic: IISc-TIFR COVID-19 City-Scale Simulation Team. Journal of the Indian Institute of Science. 2020;100(4):809–847. pmid:33199946
  13. 13. Ganesan S, Subramani D. Spatio-temporal predictive modeling framework for infectious disease spread. Scientific Reports. 2021;11(1):6741. pmid:33762613
  14. 14. Banerjee R, Bhattacharjee S, Varadwaj PK. Analyses and Forecast for COVID-19 epidemic in India. Infectious Diseases (except HIV/AIDS); 2020. Available from:
  15. 15. Ansumali S, Kumar A, Agarwal S, Shashank HJ, Prakash MK. A steady trickle-down from metro districts and improving epidemic-parameters characterize the increasing COVID-19 cases in India. medRxiv. 2020; p. 2020.09.28.20202978.
  16. 16. Basu D, Salvatore M, Ray D, Kleinsasser M, Purkayastha S, Bhattacharyya R, et al. A Comprehensive Public Health Evaluation of Lockdown as a Non-pharmaceutical Intervention on COVID-19 Spread in India: National Trends Masking State Level Variations. medRxiv. 2020; p. 2020.05.25.20113043.
  17. 17. Bhaskar A, Ponnuraja C, Srinivasan R, Padmanaban S. Distribution and growth rate of COVID-19 outbreak in Tamil Nadu: A log-linear regression approach. Indian Journal of Public Health. 2020;64(6):188.
  18. 18. Bhattacharyya R, Bhaduri R, Kundu R, Salvatore M, Mukherjee B. Reconciling epidemiological models with misclassified case-counts for SARS-CoV-2 with seroprevalence surveys: A case study in Delhi, India. medRxiv. 2020; p. 2020.07.31.20166249.
  19. 19. Chinazzi M, Davis JT, Ajelli M, Gioannini C, Litvinova M, Merler S, et al. The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak. Science. 2020;368(6489):395–400. pmid:32144116
  20. 20. Deo V, Chetiya AR, Deka B, Grover G. Forecasting Transmission Dynamics of COVID-19 Epidemic in India Under Various Containment Measures- A Time-Dependent State-Space SIR Approach. medRxiv. 2020; p. 2020.05.08.20095877.
  21. 21. Gupta S, Shah S, Chaturvedi S, Thakkar P, Solanki P, Dibyachintan S, et al. An India-specific Compartmental Model for Covid-19: Projections and Intervention Strategies by Incorporating Geographical, Infrastructural and Response Heterogeneity. arXiv:200714392 [physics, q-bio]. 2020;.
  22. 22. Kotwal A, Yadav AK, Yadav J, Kotwal J, Khune S. Predictive models of COVID-19 in India: A rapid review. Medical Journal Armed Forces India. 2020;76(4):377–386. pmid:32836710
  23. 23. Lahiri A, Jha S, Bhattacharya S, Ray S, Chakraborty A, et al. Effectiveness of preventive measures against COVID-19: A systematic review of In Silico modeling studies in indian context Lahiri A, Jha SS, Bhattacharya S, Ray S, Chakraborty A—Indian J Public Health; 2020. Available from:;year=2020;volume=64;issue=6;spage=156;epage=167;aulast=Lahiri.
  24. 24. Laxminarayan R, John J. Is Gradual and Controlled Approach to Herd Protection a Valid Strategy to Curb the COVID-19 Pandemic?; 2020. Available from:
  25. 25. Mitra A, Pakhare AP, Roy A, Joshi A. Impact of COVID-19 epidemic curtailment strategies in selected Indian states: an analysis by reproduction number and doubling time with incidence modelling. medRxiv. 2020; p. 2020.05.10.20094946. pmid:32511508
  26. 26. Pant R, Choudhury LP, Rajesh JG, Yeldandi VV. COVID-19 Epidemic Dynamics and Population Projections from Early Days of Case Reporting in a 40 million population from Southern India. medRxiv. 2020; p. 2020.04.17.20070292.
  27. 27. Ranjan R. Predictions for COVID-19 Outbreak in India using epidemiological models. medRxiv. 2020; p. 2020.04.02.20051466.
  28. 28. Sardar T, Nadim SS, Rana S, Chattopadhyay J. Assessment of lockdown effect in some states and overall India: A predictive mathematical study on COVID-19 outbreak. Chaos, Solitons, and Fractals. 2020;139:110078. pmid:32834620
  29. 29. Shah K, Awasthi A, Modi B, Kundapur R, Saxena D. Unfolding trends of COVID-19 transmission in India: Critical review of available Mathematical models. Indian Journal of Community Health. 2020;32(2 (Supp)):206–214.
  30. 30. Welling A, Patel A, Kulkarni P, Vaidya VG. Multilevel Integrated Model with a Novel Systems Approach (MIMANSA) for Simulating the Spread of COVID-19. medRxiv. 2020; p. 2020.05.12.20099291.
  31. 31. Agrawal M, Kanitkar M, Vidyasagar M. SUTRA: A Novel Approach to Modelling Pandemics with Asymptomatic and Undetected Patients, and Applications to COVID-19. arXiv:210109158 [q-bio]. 2021;.
  32. 32. Singh R, Adhikari R. Age-structured impact of social distancing on the COVID-19 epidemic in India. arXiv:200312055 [cond-mat, q-bio]. 2020;.
  33. 33. Pujari BS, Shekatkar S. Multi-city modeling of epidemics using spatial networks: Application to 2019-nCov (COVID-19) coronavirus in India. medRxiv. 2020; p. 2020.03.13.20035386.
  34. 34. Ansumali S, Kaushal S, Kumar A, Prakash MK, Vidyasagar M. Modelling a pandemic with asymptomatic patients, impact of lockdown and herd immunity, with applications to SARS-CoV-2. Annual Reviews in Control. 2020;50:432–447. pmid:33071595
  35. 35. Saraswat B, Ansumali S, Prakash MK. Using high effective risk of Adult-Senior duo in multigenerational homes to prioritize COVID-19 vaccination. medRxiv. 2021; p. 2021.04.14.21255468.
  36. 36. Purkayastha S, Bhattacharyya R, Bhaduri R, Kundu R, Gu X, Salvatore M, et al. A comparison of five epidemiological models for transmission of SARS-CoV-2 in India. medRxiv. 2020; p. 2020.09.19.20198010.
  37. 37. Kuppalli K, Gala P, Cherabuddi K, Kalantri SP, Mohanan M, Mukherjee B, et al. India’s COVID-19 crisis: a call for international action. The Lancet. 2021;397(10290):2132–2135.
  38. 38. Naushin S, Sardana V, Ujjainiya R, Bhatheja N, Kutum R, Bhaskar AK, et al. Insights from a Pan India Sero-Epidemiological survey (Phenome-India Cohort) for SARS-CoV2. eLife. 2021;10:e66537. pmid:33876727
  39. 39. John J, Kang G. Tracking SARS-CoV-2 infection in India with serology. The Lancet Global Health. 2021;9(3):e219–e220. pmid:33515513
  40. 40. Velumani A, Nikam C, Suraweera W, Fu SH, Gelband H, Brown P, et al. SARS-CoV-2 Seroprevalence in 12 Cities of India from July-December 2020. medRxiv. 2021; p. 2021.03.19.21253429.
  41. 41. Murhekar MV, Bhatnagar T, Selvaraju S, Rade K, Saravanakumar V, Thangaraj JWV, et al. Prevalence of SARS-CoV-2 infection in India: Findings from the national serosurvey, May-June 2020. Indian Journal of Medical Research. 2020;152(1):48. pmid:32952144
  42. 42. Murhekar MV, Bhatnagar T, Selvaraju S, Saravanakumar V, Thangaraj JWV, Shah N, et al. SARS-CoV-2 antibody seroprevalence in India, August–September, 2020: findings from the second nationwide household serosurvey. The Lancet Global Health. 2021;9(3):e257–e266. pmid:33515512
  43. 43. Murhekar MV, Bhatnagar T, Thangaraj JWV, Saravanakumar V, Kumar MS, Selvaraju S, et al. SARS-CoV-2 seroprevalence among the general population and healthcare workers in India, December 2020–January 2021. International Journal of Infectious Diseases. 2021;108:145–155. pmid:34022338
  44. 44. Jagadeesan M, Ganeshkumar P, Kaur P, Sriramulu HM, Sakthivel M, Rubeshkumar P, et al. Epidemiology of COVID-19 and effect of public health interventions, Chennai, India, March–October 2020: an analysis of COVID-19 surveillance system. BMJ open. 2022;12(3):e052067.
  45. 45. Malani A, Ramachandran S, Tandel V, Parasa R, Sudharshini S, Prakash V, et al. SARS-CoV-2 Seroprevalence in Tamil Nadu in October-November 2020. medRxiv. 2021; p. 2021.02.03.21250949.
  46. 46. Delhi sero-survey results: Over 23% residents have coronavirus antibodies; 2020. Available from:
  47. 47. Ghose A, Bhattacharya S, Karthikeyan AS, Kudale A, Monteiro JM, Joshi A, et al. Community prevalence of antibodies to SARS-CoV-2 and correlates of protective immunity in five localities in an Indian metropolitan city. medRxiv. 2020; p. 2020.11.17.20228155.
  48. 48. Malani A, Shah D, Kang G, Lobo GN, Shastri J, Mohanan M, et al. Seroprevalence of SARS-CoV-2 in slums and non-slums of Mumbai, India, during June 29-July 19, 2020. medRxiv. 2020; p. 2020.08.27.20182741.
  49. 49. Banaji M. Estimating COVID-19 infection fatality rate in Mumbai during 2020. medRxiv. 2021; p. 2021.04.08.21255101.
  50. 50. Childs ML, Kain MP, Kirk D, Harris M, Couper L, Nova N, et al. The impact of long-term non-pharmaceutical interventions on COVID-19 epidemic dynamics and control. medRxiv. 2020; p. 2020.05.03.20089078. pmid:32511583
  51. 51. Anderson RM, May RM. Infectious Diseases of Humans: Dynamics and Control. Oxford, New York: Oxford University Press; 1992.
  52. 52. Keeling M, Rohani P. Modeling Infectious Diseases in Humans and Animals; 2007. Available from:
  53. 53. Perra N. Non-pharmaceutical interventions during the COVID-19 pandemic: A review. Physics Reports. 2021;913:1–52. pmid:33612922
  54. 54. Tsai PH, Lai WY, Lin YY, Luo YH, Lin YT, Chen HK, et al. Clinical manifestation and disease progression in COVID-19 infection. Journal of the Chinese Medical Association: JCMA. 2021;84(1):3–8. pmid:33230062
  55. 55. Choi MH, Ahn H, Ryu HS, Kim BJ, Jang J, Jung M, et al. Clinical Characteristics and Disease Progression in Early-Stage COVID-19 Patients in South Korea. Journal of Clinical Medicine. 2020;9(6):1959. pmid:32585855
  56. 56. Chen J, Qi T, Liu L, Ling Y, Qian Z, Li T, et al. Clinical progression of patients with COVID-19 in Shanghai, China. The Journal of Infection. 2020;80(5):e1–e6. pmid:32171869
  57. 57. Oran DP, Topol EJ. Prevalence of Asymptomatic SARS-CoV-2 Infection. Annals of Internal Medicine. 2020;173(5):362–367. pmid:32491919
  58. 58. Chan JFW, Yuan S, Kok KH, To KKW, Chu H, Yang J, et al. A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster. The Lancet. 2020;395(10223):514–523. pmid:31986261
  59. 59. Kimball A. Asymptomatic and Presymptomatic SARS-CoV-2 Infections in Residents of a Long-Term Care Skilled Nursing Facility—King County, Washington, March 2020. MMWR Morbidity and Mortality Weekly Report. 2020;69.
  60. 60. Richardson S, Hirsch JS, Narasimhan M, Crawford JM, McGinn T, Davidson KW, et al. Presenting Characteristics, Comorbidities, and Outcomes Among 5700 Patients Hospitalized With COVID-19 in the New York City Area. JAMA. 2020;323(20):2052–2059. pmid:32320003
  61. 61. Cummings MJ, Baldwin MR, Abrams D, Jacobson SD, Meyer BJ, Balough EM, et al. Epidemiology, clinical course, and outcomes of critically ill adults with COVID-19 in New York City: a prospective cohort study. Lancet (London, England). 2020;395(10239):1763–1770.
  62. 62. Docherty AB, Harrison EM, Green CA, Hardwick HE, Pius R, Norman L, et al. Features of 20 133 UK patients in hospital with covid-19 using the ISARIC WHO Clinical Characterisation Protocol: prospective observational cohort study. BMJ. 2020;369:m1985. pmid:32444460
  63. 63. Suleyman G, Fadel RA, Malette KM, Hammond C, Abdulla H, Entz A, et al. Clinical Characteristics and Morbidity Associated With Coronavirus Disease 2019 in a Series of Patients in Metropolitan Detroit. JAMA Network Open. 2020;3(6):e2012270–e2012270. pmid:32543702
  64. 64. Singh BB, Ward MP, Lowerison M, Lewinson RT, Vallerand IA, Deardon R, et al. Meta-analysis and adjusted estimation of COVID-19 case fatality risk in India and its association with the underlying comorbidities. medRxiv. 2020; p. 2020.10.08.20209163.
  65. 65. Gebhard C, Regitz-Zagrosek V, Neuhauser HK, Morgan R, Klein SL. Impact of sex and gender on COVID-19 outcomes in Europe. Biology of Sex Differences. 2020;11(1):29. pmid:32450906
  66. 66. Singh PP, Srivastava AK, Upadhyay SK, Singh A, Gupta P, Maurya S, et al. The association of ABO blood group with the asymptomatic COVID-19 cases in India. medRxiv. 2021; p. 2021.04.01.21254681. pmid:34366234
  67. 67. Kumar N, Shahul Hameed SK, Babu GR, Venkataswamy MM, Dinesh P, Kumar Bg P, et al. Descriptive epidemiology of SARS-CoV-2 infection in Karnataka state, South India: Transmission dynamics of symptomatic vs. asymptomatic infections. EClinicalMedicine. 2021;32:100717. pmid:33521608
  68. 68. Gupta M, Mohanta SS, Rao A, Parameswaran GG, Agarwal M, Arora M, et al. Transmission dynamics of the COVID-19 epidemic in India and modeling optimal lockdown exit strategies. International Journal of Infectious Diseases. 2021;103:579–589. pmid:33279653
  69. 69. Du Z, Xu X, Wu Y, Wang L, Cowling BJ, Meyers LA. Serial Interval of COVID-19 among Publicly Reported Confirmed Cases—Volume 26, Number 6—June 2020—Emerging Infectious Diseases journal—CDC. Emerging infectious diseases. 2020.
  70. Rasmussen AL, Popescu SV. SARS-CoV-2 transmission without symptoms. Science. 2021;371(6535):1206–1207. pmid:33737476
  71. Buitrago-Garcia D, Egli-Gany D, Counotte MJ, Hossmann S, Imeri H, Ipekci AM, et al. Occurrence and transmission potential of asymptomatic and presymptomatic SARS-CoV-2 infections: A living systematic review and meta-analysis. PLOS Medicine. 2020;17(9):e1003346. pmid:32960881
  72. Ferretti L, Ledda A, Wymant C, Zhao L, Ledda V, Abeler-Dörner L, et al. The timing of COVID-19 transmission. medRxiv. 2020; p. 2020.09.04.20188516.
  73. Barman MP, Rahman T, Bora K, Borgohain C. COVID-19 pandemic and its recovery time of patients in India: A pilot study. Diabetes & Metabolic Syndrome: Clinical Research & Reviews. 2020;14(5):1205–1211. pmid:32673841
  74. Khalili M, Karamouzian M, Nasiri N, Javadi S, Mirzazadeh A, Sharifi H. Epidemiological characteristics of COVID-19: a systematic review and meta-analysis. Epidemiology & Infection. 2020;148. pmid:32594937
  75. Prem K, Cook AR, Jit M. Projecting social contact matrices in 152 countries using contact surveys and demographic data. PLOS Computational Biology. 2017;13(9):e1005697. pmid:28898249
  76. Kerr CC, Stuart RM, Mistry D, Abeysuriya RG, Hart G, Rosenfeld K, et al. Covasim: an agent-based model of COVID-19 dynamics and interventions. medRxiv. 2020; p. 2020.05.10.20097469. pmid:33173914
  77. Ferguson N, Laydon D, Nedjati Gilani G, Imai N, Ainslie K, Baguelin M, et al. Report 9: Impact of non-pharmaceutical interventions (NPIs) to reduce COVID19 mortality and healthcare demand; 2020. Available from:
  78. Levin AT, Hanage WP, Owusu-Boaitey N, Cochran KB, Walsh SP, Meyerowitz-Katz G. Assessing the age specificity of infection fatality rates for COVID-19: systematic review, meta-analysis, and public policy implications. European Journal of Epidemiology. 2020;35(12):1123–1138. pmid:33289900
  79. IHME | COVID-19 Projections; 2021. Available from:
  80. Diekmann O, Heesterbeek JAP, Roberts MG. The construction of next-generation matrices for compartmental epidemic models. Journal of The Royal Society Interface. 2010;7(47):873–885. pmid:19892718
  81. Bettencourt LMA, Ribeiro RM. Real Time Bayesian Estimation of the Epidemic Potential of Emerging Infectious Diseases. PLOS ONE. 2008;3(5):e2185. pmid:18478118
  82. Gruisen G, Lerer A. k-sys/covid-19; 2021. Available from:
  83. Bender JK, Brandl M, Höhle M, Buchholz U, Zeitlmann N. Analysis of Asymptomatic and Presymptomatic Transmission in SARS-CoV-2 Outbreak, Germany, 2020. Emerging Infectious Diseases. 2021;27(4). pmid:33600301
  84. Childs M, Kain M, Kirk D, Harris M, Ritchie J, Couper L, et al. Potential Long-Term Intervention Strategies for COVID-19; 2020. Available from:
  85. Horwitz LI, Jones SA, Cerfolio RJ, Francois F, Greco J, Rudy B, et al. Trends in Covid-19 risk-adjusted mortality rates in a single health system. medRxiv. 2020; p. 2020.08.11.20172775. pmid:32817973
  86. Yu Y, Gu T, Valley TS, Fritsche LG, Mukherjee B. Changes in COVID-19-related outcomes and the impacts of the potential risk factors over time: a follow-up analysis. medRxiv. 2021; p. 2021.01.02.21249140.
  87. Gray WK, Navaratnam AV, Day J, Wendon J, Briggs TWR. Changes in COVID-19 in-hospital mortality in hospitalised adults in England over the first seven months of the pandemic: An observational study using administrative data. The Lancet Regional Health - Europe. 2021;5:100104. pmid:33969337
  88. Chen X, Hazra DK. Understanding the Bias between the Number of Confirmed Cases and Actual Number of Infections in the COVID-19 Pandemic. medRxiv. 2020; p. 2020.06.22.20137208.
  89. Gelman A, Carlin J, Stern H, Dunson D, Vehtari A, Rubin D. Home page for the book, “Bayesian Data Analysis”; 2021. Available from:
  90. Hobbs N, Hooten M. Bayesian Models; 2015. Available from:
  91. Hogg DW, Bovy J, Lang D. Data analysis recipes: Fitting a model to data; 2010. Available from:
  92. INSPIRE;. Available from:
  93. Lewis A, Bridle S. Cosmological parameters from CMB and other data: A Monte Carlo approach. Phys Rev D. 2002;66:103511.
  94. Lewis A. GetDist: a Python package for analysing Monte Carlo samples; 2019. Available from:
  95. Census of India: Population Enumeration Data (Final Population);. Available from:
  96. Babu GR, Sundaresan R, Athreya S, Akhtar J, Pandey PK, Maroor PS, et al. The burden of active infection and anti-SARS-CoV-2 IgG antibodies in the general population: Results from a statewide survey in Karnataka, India. medRxiv. 2020; p. 2020.12.04.20243949.
  97. Babu GR, Sundaresan R, Athreya S, Akhtar J, Pandey PK, Maroor PS, et al. The burden of active infection and anti-SARS-CoV-2 IgG antibodies in the general population: Results from a statewide sentinel-based population survey in Karnataka, India. International Journal of Infectious Diseases. 2021;108. pmid:34029705
  98. Home - Directorate of Economics and Statistics;. Available from:
  99. Maharashtra Population 2021;. Available from:
  100. Malani A, Shah D, Kang G, Lobo GN, Shastri J, Moha M, et al. Seroprevalence of SARS-CoV-2 in slums and non-slums of Mumbai, India, during June 29-July 19, 2020. medRxiv. 2020.
  101. Population projections for India and states 2011–2036;. Available from:
  102. Kar SS, Sarkar S, Murali S, Dhodapkar R, Joseph NM, Aggarwal R. Prevalence and Time Trend of SARS-CoV-2 Infection in Puducherry, India, August–October 2020. Emerging Infectious Diseases. 2021;27(2).
  103. Gupta A, Rajendran D, Rukmini S. India is under-reporting Covid-19 deaths. Here are some ways to work around the problem; 2021. Available from:
  104. Rukmini S. Interpreting deaths in Chennai. The Hindu; 2021. Available from:
  105. Deshmukh Y, Suraweera W, Tumbe C, Bhowmick A, Sharma S, Novosad P, et al. Excess mortality in India from June 2020 to June 2021 during the COVID pandemic: death registration, health facility deaths, and survey data. medRxiv. 2021; p. 2021.07.20.21260872.
  106. Foy BH, Wahl B, Mehta K, Shet A, Menon GI, Britto C. Comparing COVID-19 vaccine allocation strategies in India: A mathematical modelling study. International Journal of Infectious Diseases. 2021;103:431–438. pmid:33388436
  107. Cherian P, Krishna S, Menon GI. Optimizing testing for COVID-19 in India. PLOS Computational Biology. 2021;17(7):e1009126. pmid:34292931
  108. Hazra D. Dhiraj Kumar Hazra / elixsir; 2021. Available from: