Estimating the herd immunity threshold by accounting for the hidden asymptomatics using a COVID-19 specific model

A quantitative COVID-19 model that incorporates hidden asymptomatic patients is developed, and an analytic solution in parametric form is given. The model incorporates the impact of the lock-down and the resulting spatial migration of population triggered by its announcement. A method is presented for estimating the model parameters from real-world data, and it is shown that the various phases in the observed epidemiological data are captured well. It is shown that, for the four countries we studied, the growth of infections slows down and herd immunity is achieved when active symptomatic patients make up 10-25% of the population. Finally, a method is presented for estimating the number of asymptomatic patients, who have been the key hidden link in the spread of the infections.


Introduction
COVID-19 infections have breached the five million mark, yet there is neither a vaccine nor a scalable treatment in sight [1,2]. Furthermore, a distinctive feature of COVID-19, in contrast to other infectious diseases such as influenza or SARS, is the presence of a large fraction of "asymptomatic" patients, who do not have any obvious symptoms but are still capable of infecting susceptible individuals through contact. Identifying individuals who spread infections via the asymptomatic pathway is not easy unless extensive contact tracing and testing are performed. A major challenge is the uncertainty in the estimated asymptomatic fraction, with estimates ranging from 41% to 86% of infections [3,4]. Along the symptomatic pathway, 44% [5] to 68% [6] of infections are transmitted before the onset of symptoms, which makes quarantining people with symptoms less effective than for other infectious diseases. These challenges have driven governments to implement non-pharmaceutical interventions (NPIs) such as social distancing and partial or full lock-downs [7]. An unstated, a posteriori rationale for these lock-downs is that they provide an efficient isolation mechanism for the asymptomatic. However, a dearth of quantitative understanding of the effects of lock-down has triggered debate about its effectiveness, duration and mode (partial vs. full). It has even been suggested that societies should simply move, unhindered, towards attaining the "herd-immunity threshold" [8]. This threshold is reached when a sufficiently large proportion of a population becomes immune, so that the spread of the disease slows down. For COVID-19, the onset of herd immunity remains difficult to estimate, and it is not even clear whether herd immunity exists at all. Moreover, the high case fatality rate of 3-10% (vs. 0.05% for seasonal influenza) limits the practicality of herd immunity as an effective policy tool. Thus, models that can provide quantitative estimates of the disease spread and of the impact of policy measures are urgently needed.
As with other epidemics and pandemics, three kinds of models are used for COVID-19: 1) statistical extrapolation models, which fit the observed patterns of infections to make short-term predictions [9,10]; 2) agent-based models, which give a qualitative illustration of the microscopic dynamics of spreading infections [11]; and 3) compartmental models, which divide the population into groups based on the current disease state of each individual and model the interactions among them [2,12-14]. Since the 1927 study of the plague in Mumbai, compartmental models have been a standard guiding tool for policy decisions [15]. The spread of flu-like diseases (influenza, SARS, COVID-19, etc.) is often modelled using three or four compartments: Susceptible-Infected-Recovered (SIR) or Susceptible-Exposed-Infected-Recovered (SEIR). Some variants also consider, theoretically, a simple containment option: quarantining infected persons with symptoms. However, all these models assume that only contact between the S and I compartments leads to new infections, with the implicit assumption that contact between the S and E compartments does not lead to any infection. In contrast, an asymptomatic patient with COVID-19 can, and does, infect susceptible individuals through contact. Thus, epidemiological models must distinguish between asymptomatic and symptomatic patients. Moreover, models should distinguish between lock-down and quarantine, as these are two qualitatively different policy tools, the former operating at the level of a society and the latter at the level of a few individuals.
In this paper, we aim to model all these novel aspects of COVID-19 and accomplish three goals:
1. Formulate a minimal epidemiological model incorporating the above-mentioned unique aspects of COVID-19 disease spread and the associated policies. We accomplish this by formulating a SAIR model that includes the asymptomatics (A) and by suitably adapting the governing equations. Since lock-down was unique to COVID-19 management, we also include it explicitly through a reproduction rate that is discontinuous in time (the effective rate at which the susceptible population is converted into infected).
2. Establish that the model faithfully captures the observed epidemiological data, and sheds light on the underlying parameters and universalities that govern the dynamics in the different phases of pandemic spread and containment. We accomplish this by deriving a closed-form solution of the SAIR model with and without lock-downs, and by using the solution to estimate the underlying parameters that drive the infection dynamics.
Science and Engineering Research Board (SERB) provided additional funding to MV. No additional funding was received for this research. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: Sankhya Sutra Labs provided support for the study in the form of salaries for authors [SB and SA]. Additionally, MKP is the cofounder and director of VNIR Biotechnologies but does not receive a salary. There are no patents, products in development or marketed products associated with this research to declare. This does not alter our adherence to PLOS ONE policies on sharing data and materials.

Unique character of infectiousness
We begin by emphasizing the difference between the SEIR and SAIR models [12]. A typical SEIR model assumes a framework of serial, directed transitions across the intermediate health states of individuals (Fig 1). In this framework, infections are caused when a susceptible person comes in contact with a person deemed infected on the basis of symptoms (I). After this contact, with a certain likelihood, the person remains in a pre-symptomatic intermediate state as an exposed individual (E), who is not contagious, before transitioning to a contagious and symptomatic state (I). While this framework is acceptable for influenza or SARS, the epidemiology of COVID-19 is such that there is an alternative pathway between the susceptible (S) and recovered (R) states, which passes through asymptomatic individuals (estimated to be around 86% [3]), who never show any symptoms but carry enough viral load to infect others. Thus, a model for COVID-19 should consider two parallel pathways of infection (Fig 1B).
We develop a generalized version of the model, considering the Susceptible, Asymptomatic, Infected and Recovered compartments (SAIR model) and assuming a homogeneously mixed population segment in which COVID-19 is spreading. The system obeys the SAIR dynamics of Eq (1), in which the susceptible population evolves as $\dot S = -\alpha(t)\, S\,(I + A)$, where for any variable X the time derivative is denoted $\dot X = dX/dt$. We assume that α(t) denotes the probability with which a susceptible person who meets an infected or asymptomatic person becomes part of the asymptomatics, a compartment that for simplicity includes both the pre-symptomatics and the asymptomatics. δ is the rate at which pre-symptomatic or asymptomatic patients are converted into symptomatic ones, and γ is the rate at which both asymptomatic and symptomatic patients recover. β(t) is a migration parameter, whose origin and exact functional form are discussed in the next section.
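To make the structure of the SAIR system concrete, the following Python sketch writes down a right-hand side and integrates it with a classical fourth-order Runge-Kutta scheme. Only the $\dot S$ equation is stated explicitly above. The A, I and R equations here are our reading of the verbal description: new infections enter A, A converts to I at rate δ, and both groups recover at rate γ. The migration term is represented by a hypothetical callable `influx(t)` added to A, not by the paper's exact β(t) coupling. The function names and parameter values are illustrative only.

```python
def sair_rhs(t, y, alpha, influx, delta, gamma):
    """SAIR right-hand side for a population normalized to 1.

    alpha(t)  -- infection rate constant (callable).
    influx(t) -- HYPOTHETICAL stand-in for migration: asymptomatic arrivals
                 per unit time added to A (not the paper's exact beta coupling).
    The A, I and R equations are our reading of the verbal model description.
    """
    S, A, I, R = y
    new_infections = alpha(t) * S * (I + A)  # contacts with both I and A infect
    dS = -new_infections
    dA = new_infections - (delta + gamma) * A + influx(t)  # A leaves to I or R
    dI = delta * A - gamma * I  # A becomes symptomatic at rate delta
    dR = gamma * (A + I)  # both groups recover at rate gamma
    return [dS, dA, dI, dR]


def rk4(rhs, y0, t0, t1, steps, *params):
    """Classical fourth-order Runge-Kutta; returns (times, states)."""
    h = (t1 - t0) / steps
    t, y = t0, list(y0)
    times, states = [t], [list(y)]
    for _ in range(steps):
        k1 = rhs(t, y, *params)
        k2 = rhs(t + h / 2, [u + h / 2 * k for u, k in zip(y, k1)], *params)
        k3 = rhs(t + h / 2, [u + h / 2 * k for u, k in zip(y, k2)], *params)
        k4 = rhs(t + h, [u + h * k for u, k in zip(y, k3)], *params)
        y = [u + h / 6 * (a1 + 2 * a2 + 2 * a3 + a4)
             for u, a1, a2, a3, a4 in zip(y, k1, k2, k3, k4)]
        t += h
        times.append(t)
        states.append(list(y))
    return times, states


# Illustrative values only (not fitted): alpha0 = 0.3, delta = 0.1, gamma = 0.05.
times, states = rk4(sair_rhs, [0.999, 0.001, 0.0, 0.0], 0.0, 200.0, 2000,
                    lambda t: 0.3, lambda t: 0.0, 0.1, 0.05)
```

With zero influx, the four derivatives sum to zero, so S + A + I + R stays at 1 (up to rounding). This is a quick sanity check on any variant of the equations.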

Lock-down for infection containment
In our formulation, we model the lock-down as a sudden change in the infection rate constant, using a Heaviside function: $\alpha(t) = \alpha_0\, H(t_{lock} - t)$. We note in passing that social distancing, or an imperfect lock-down, can be modeled as a reduction in the value of α. This switching, typically absent in standard models, reflects the fact that in an idealized lock-down no susceptible person meets an infected person, so the first-order reaction changes to a zero-order reaction. In a minimal model, one may assume that asymptomatic patients either become symptomatic at an effective rate δ or recover at rate γ.
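The switched infection rate can be written as a small function of time. In the sketch below, `residual = 0` reproduces the idealized $\alpha_0 H(t_{lock} - t)$ with the convention H(0) = 0. The `residual` argument is a hypothetical knob for the partial reduction of α mentioned above (social distancing or an imperfect lock-down). The names are illustrative.

```python
def alpha_lockdown(t, alpha0, t_lock, residual=0.0):
    """alpha(t) = alpha0 * H(t_lock - t) when residual == 0 (H(0) = 0).

    residual is a HYPOTHETICAL knob: 0 < residual < 1 keeps a fraction of
    alpha0 after t_lock, mimicking social distancing or an imperfect lock-down.
    """
    return alpha0 if t < t_lock else residual * alpha0


def make_alpha(alpha0, t_lock, residual=0.0):
    """Wrap alpha_lockdown as a callable of t alone, for use in an ODE solver."""
    return lambda t: alpha_lockdown(t, alpha0, t_lock, residual)


alpha = make_alpha(0.3, t_lock=20.0)
print(alpha(10.0), alpha(25.0))  # rate before and after the lock-down
```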
Before we proceed to analyse the model, we point out that one may add further complexity by introducing more parameters and compartments. For example, the recovery rate γ and the infection rate α need not be the same for the asymptomatic and symptomatic fractions [12]. However, as there is no biological evidence to the contrary, we assume that both rates are equal, which leads to a simplified and analytically tractable framework.
Further, during this crisis we learnt that once a lock-down is announced, people migrate across different parts of a country. Even a qualitatively correct model of the disease spread must account for this migration. Migration can happen in several waves; for simplicity, however, we assume that it happens once, during a short period after the lock-down. Furthermore, one would expect that among the infected population only asymptomatic people are able to travel. Note that we are only interested in the influx of infected people into a given population segment, not in where they came from. To model this scenario, we take the typical thermodynamic route of dividing the system into two parts: the system (Eq (1)) and the universe (Eq (2)). The coupling constant is $\beta(t) = \beta_1\{H(t_{lock} + \epsilon - t) - H(t_{lock} - t)\}$, where ε is the short period after the lock-down during which population migration is allowed or possible. For the sake of conceptual completeness in showing where these migrations originate, one can also write equations for all geographical regions outside the region of interest, described for convenience as the rest of the Universe (U). The universe (rest of the world) can be assumed, for this purpose, to follow similar SAIR dynamics, but with an outflux term (−β(t)A) in the asymptomatic dynamics. This migration is a characteristic of the system (the country or region under consideration), and the parameters $\beta_1$ and ε need to be extracted from the data. Eqs (1) and (2) complete our development of a COVID-19-specific model; in the rest of this work, we focus on solving Eq (1).
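The migration coupling is a rectangular window in time. The sketch below evaluates $\beta(t) = \beta_1\{H(t_{lock} + \epsilon - t) - H(t_{lock} - t)\}$ literally, with the convention H(0) = 0. It shows that β(t) equals $\beta_1$ only for $t_{lock} \le t < t_{lock} + \epsilon$. The numerical values are hypothetical, since $\beta_1$ and ε are country-specific fit parameters.

```python
def heaviside(x):
    """Unit step H(x): 1 for x > 0, else 0 (H(0) = 0 is a convention)."""
    return 1.0 if x > 0 else 0.0


def beta_migration(t, beta1, t_lock, eps):
    """beta(t) = beta1 * [H(t_lock + eps - t) - H(t_lock - t)].

    Equals beta1 for t_lock <= t < t_lock + eps and 0 otherwise; beta1 and eps
    are country-specific and must be fitted to data.
    """
    return beta1 * (heaviside(t_lock + eps - t) - heaviside(t_lock - t))


# Hypothetical values: lock-down on day 20, a 3-day migration window.
for day in (19.0, 20.0, 22.5, 23.0):
    print(day, beta_migration(day, 0.002, 20.0, 3.0))
```

Integrated over time, the window contributes $\beta_1 \epsilon$. This is why $\beta_1$ and ε are hard to separate from aggregate data unless the window duration is known.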

Analytical solution of COVID-19 model without lock-downs
In the present work, we solve a phenomenological model of a well-mixed society, with everyone interacting with everyone else. In reality, interactions may be structured by age and by local movement of the population, and many of these effects can be modelled in the framework of agent-based models. The disease-specific interactions we formulated can also be integrated into other models that study interactions at agent-level detail, or in tandem with economic consequences [16]; both directions are beyond the scope of the present work.
With an emphasis mainly on the spread of infections at the societal level, we show that the set of equations we model is sufficient to capture most of the available epidemiological data on COVID-19.
Before the lock-down (α = $\alpha_0$, β = 0), this system of equations can be solved in terms of the reproduction rate $r_0 = \alpha_0/\gamma$. Adding the equations for A and I shows that the total infected population $M = A + I$ obeys $\dot M = \alpha_0 S M - \gamma M$, which can be solved in terms of $\tilde S = S/S_0$ as

$M = 1 - S_0\tilde S + r_0^{-1}\log\tilde S$, (4)

where M + S + R = 1 when there is no influx of people into a country, $S_0$ denotes the susceptible population at t = 0, and the recovered population at t = 0 is taken to be 0. Substituting Eq (4) into the evolution equation for S gives the parametric solution in implicit form,

$t = \int_{\tilde S}^{1} \frac{dz}{\alpha_0\, z\,\left(1 - S_0 z + r_0^{-1}\log z\right)}$. (5)

If this equation can be converted into an explicit form for $\tilde S$ as a function of t, it can be substituted into Eq (4) to obtain M as a function of t. Finally, M(t) can be split into separate expressions for I(t) and A(t) using Eq (1). Specifically, substituting A = M − I into the equation for $\dot I$ gives $\dot I = -(\delta + \gamma) I(t) + \delta M(t)$. Defining $\delta_1 = \gamma + \delta$, the solution of this equation is

$I(t) = e^{-\delta_1 t}\left[I(0) + \delta\int_0^t e^{\delta_1\tau} M(\tau)\,d\tau\right]$.

The key is therefore to turn Eq (5) into an explicit expression, to the extent possible. For this purpose, we use the Hermite-Hadamard inequality for the logarithm [17], which for 0 < z < 1 gives $(z^2 - 1)/(2z) \le \log z \le 2(z - 1)/(z + 1)$. This suggests the approximate form $\log z \approx (z - 1)(w_1/z + w_2)$, with the constraint $w_1 + w_2 = 1$. Fig 2 depicts how well this approximation captures log(z) for 0 < z < 1. Substituting the approximated logarithm into Eq (4) and the evolution equation for S gives $\dot{\tilde S} = \alpha_0(a\tilde S^2 - b\tilde S + d)$, and we obtain a solution for $\tilde S$ in explicit form, where $a = (S_0 r_0 - w_2)/r_0$, $b = (r_0 + w_1 - w_2)/r_0$, $d = w_1/r_0$, and $h = \sqrt{b^2 - 4ad}$. Once the evolution of S is known in closed form, we find the evolution of the remaining variables using Eq (4). Fig 3a and 3b depict a representative temporal variation of S, A, I and R obtained from the numerical and analytical solutions. The analytical solution based on the above approximation to the logarithm is in close agreement with the numerical solution of the ODEs.
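A minimal Python sketch of the pre-lock-down closed form follows. It is our own derivation under stated assumptions, not necessarily the exact form of the paper's explicit solution. We take $r_0 = \alpha_0/\gamma$ and complete the coefficients from the same expansion as $a = (S_0 r_0 - w_2)/r_0$, $b = (r_0 + w_1 - w_2)/r_0$ and $d = w_1/r_0$, so that $d\tilde S/dt = \alpha_0(a\tilde S^2 - b\tilde S + d)$. This Riccati equation is integrated through its fixed points $p_\pm = (b \pm h)/(2a)$, with $h = \sqrt{b^2 - 4ad}$. The result is $\tilde S(t) = (p_+ - C p_- e^{\alpha_0 h t})/(1 - C e^{\alpha_0 h t})$, where $C = (1 - p_+)/(1 - p_-)$ is fixed by $\tilde S(0) = 1$. The weight `w1 = 0.5` is a hypothetical default. How well the approximation tracks the exact logarithm at small $\tilde S$ depends strongly on this choice, so it should be checked against the exact logarithm rather than taken as given. I(t) is obtained from the variation-of-constants integral by the trapezoidal rule. All function names are illustrative.

```python
import math


def approx_log(z, w1=0.5):
    """log z ~ (z - 1) * (w1 / z + w2) with w2 = 1 - w1.

    w1 = 0.5 (HYPOTHETICAL default) gives (z**2 - 1) / (2 z), the lower
    Hermite-Hadamard bound on log z for 0 < z < 1.
    """
    return (z - 1.0) * (w1 / z + (1.0 - w1))


def riccati_coefficients(alpha0, gamma, S0, w1=0.5):
    """a, b, d with dS~/dt = alpha0 * (a S~^2 - b S~ + d), assuming r0 = alpha0 / gamma."""
    r0 = alpha0 / gamma
    w2 = 1.0 - w1
    return (S0 * r0 - w2) / r0, (r0 + w1 - w2) / r0, w1 / r0


def s_tilde(t, alpha0, gamma, S0, w1=0.5):
    """Explicit S~(t) = S(t) / S0 with S~(0) = 1, via the fixed points p+ > p-."""
    a, b, d = riccati_coefficients(alpha0, gamma, S0, w1)
    h = math.sqrt(b * b - 4.0 * a * d)  # assumes a > 0 and b^2 > 4ad
    p_plus, p_minus = (b + h) / (2.0 * a), (b - h) / (2.0 * a)
    c = (1.0 - p_plus) / (1.0 - p_minus)  # fixed by S~(0) = 1
    e = c * math.exp(alpha0 * h * t)
    return (p_plus - e * p_minus) / (1.0 - e)


def total_infected(s, alpha0, gamma, S0):
    """Eq (4): M = 1 - S0 S~ + log(S~) / r0 (exact logarithm, R(0) = 0)."""
    return 1.0 - S0 * s + (gamma / alpha0) * math.log(s)


def symptomatic(t, M, delta, gamma, I0, steps=2000):
    """I(t) = exp(-d1 t) [I0 + delta * int_0^t exp(d1 tau) M(tau) dtau],
    d1 = gamma + delta; the integral uses the trapezoidal rule."""
    if t == 0.0:
        return I0
    d1 = gamma + delta
    dt = t / steps
    f = [math.exp(d1 * k * dt) * M(k * dt) for k in range(steps + 1)]
    integral = dt * (sum(f) - 0.5 * (f[0] + f[-1]))
    return math.exp(-d1 * t) * (I0 + delta * integral)


# Illustrative parameters only: alpha0 = 0.3, gamma = 0.1, S0 = 0.999.
def M_of_t(tau):
    return total_infected(s_tilde(tau, 0.3, 0.1, 0.999), 0.3, 0.1, 0.999)


print(s_tilde(30.0, 0.3, 0.1, 0.999), symptomatic(30.0, M_of_t, 0.1, 0.1, 0.0))
```

Comparing `s_tilde` against a direct numerical integration of the same Riccati equation checks the algebra. Comparing it against the exact SIR-type dynamics checks the logarithm approximation itself. These are two separate sources of error.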

Analytical solution of COVID-19 model with lock-downs
After an idealized lock-down, no susceptible person meets an infected person, and the first-order reaction changes to a zero-order reaction ($\dot S = 0$). The solution for the intermediate period ($t_{lock} < t < t_{lock} + \epsilon$) simplifies accordingly. Once there is no further influx of asymptomatic individuals (i.e., after $t_{lock} + \epsilon$), the equation for M, $\dot M = -\gamma M$, yields an exponential decay, from which the post-lock-down infections follow. The recovery rate is split into two parts, $\dot R = \gamma A + \gamma I$. The recovery of symptomatic patients is easier to track for any country and is hence the more important equation for parameter extraction.
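The post-lock-down regimes also admit simple closed forms. The sketch below is ours and may differ in form from the paper's equations. After the migration window (α = β = 0), A decays at rate γ + δ and M = A + I decays at rate γ, so I = M − A. During the window, we additionally *assume* that migration enters as a constant influx $\beta_1$ into A. Under that assumption, A relaxes exponentially towards $\beta_1/(\gamma + \delta)$. Both the influx form and the function names are hypothetical.

```python
import math


def window_asymptomatic(tau, A_lock, beta1, delta, gamma):
    """A during the migration window, tau = t - t_lock in [0, eps).

    ASSUMPTION: alpha = 0 and migration enters as a constant influx beta1,
    so dA/dt = -(gamma + delta) A + beta1 relaxes towards beta1 / (gamma + delta).
    """
    d1 = gamma + delta
    a_inf = beta1 / d1
    return a_inf + (A_lock - a_inf) * math.exp(-d1 * tau)


def post_window(tau, A_star, I_star, delta, gamma):
    """(A, I, M) for tau = t - (t_lock + eps) >= 0, when alpha = beta = 0.

    A_star and I_star are the values at the end of the migration window.
    """
    d1 = gamma + delta
    A = A_star * math.exp(-d1 * tau)  # dA/dt = -(gamma + delta) A
    M = (A_star + I_star) * math.exp(-gamma * tau)  # dM/dt = -gamma M
    return A, M - A, M  # I = M - A
```

The I returned by `post_window` is a difference of two exponentials. Its decay at late times is governed by γ alone, while the fast component carries information about δ.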
Eqs (10), (12), (13) and (16) are the closed-form solutions to the model we developed and are used in the next section for parameter estimation.

SAIR to SIR
Formally, the SIR model can be viewed as a coarse-grained version of the SAIR model. In the coarse-grained SIR picture, no distinction is made between asymptomatic and symptomatic patients. It is therefore natural to expect that the total infection $\tilde I = A + I$ follows SIR dynamics. Indeed, ignoring migration effects (β = 0), we arrive at the SIR model in terms of $\tilde I$. We emphasize that, despite this formal relationship, the two models represent fundamentally different dynamics in reality. This is largely because $\tilde I$ is not an observable: one cannot obtain any information on it from reported infection numbers unless full contact tracing is implemented to test even the asymptomatic fraction. Thus, although an SIR model can be obtained from SAIR under certain conditions, the reverse is not possible. Nevertheless, in the limits where this conversion and comparison to the SIR model are possible, we demonstrate the quality of our approximate solution. Fig 4 shows that the proposed approximation in Eq (5) yields more accurate results than a third-order Taylor series approximation [18]. A more elaborate comparison between the present result and other approximations [18-20] is left for future work.
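The coarse-graining step can be checked by adding the A and I equations. We assume the asymptomatic equation $\dot A = \alpha(t) S (I + A) - (\delta + \gamma) A$ implied by the model description with β = 0. We also use the symptomatic equation $\dot I = \delta A - \gamma I$, which is equivalent to the form $\dot I = -(\delta + \gamma) I + \delta M$ used in the pre-lock-down solution. The sum then gives:

```latex
\begin{aligned}
\dot{\tilde I} &= \dot A + \dot I
  = \bigl[\alpha(t)\,S\,(I + A) - (\delta + \gamma) A\bigr] + \bigl[\delta A - \gamma I\bigr]
  = \alpha(t)\,S\,\tilde I - \gamma\,\tilde I, \\
\dot S &= -\alpha(t)\,S\,\tilde I, \qquad \dot R = \gamma\,(A + I) = \gamma\,\tilde I .
\end{aligned}
```

The conversion rate δ cancels exactly in the sum. This is why an SIR description, however well fitted, carries no information about the split between A and I.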

Closed form solution
COVID-19 is unique in many ways. In this work we attempt to capture two major differences in its epidemiology: the dynamics of infection spread through pre-symptomatic and asymptomatic persons, and the fact that containment is mainly achieved by lock-downs at the level of society rather than by quarantines. Our model and its closed-form solution address these unique aspects of COVID-19. Epidemics such as SARS in 2003, swine flu in 2009, and MERS in 2012 and 2015 could be managed with contact tracing and quarantine at most, so the need to model lock-downs did not arise with earlier epidemics. Moreover, the closed-form solution makes both the prediction and the assumptions and parameters behind it transparent, where they might otherwise be buried in the numerics. In addition, when working with real data rather than a priori assumptions, the formulation allows the underlying parameters to be extracted easily, as done below.

Parameters and their universality
The publicly available reported infection and death data for different countries were gathered from the worldometer site (https://worldometers.info). The infection data from countries that implemented a lock-down show three regimes: rising, intermediate and decreasing. It is reasonable to assume that the reported infections are the symptomatic infections, since most countries have been short of testing resources; as a result, patients were tested for confirmation only after the onset of symptoms. The analytical solutions for the active infections and recovered populations in the pre- and post-lock-down scenarios discussed above model these different regimes of infection spread. These analytical solutions are then fitted to the reported infections for several countries (Fig 5) to estimate the underlying parameters that govern the COVID-19 infection dynamics. In this section we discuss the estimation procedure for the parameters ($\alpha_0$, γ, δ, $\beta_1$). We begin with Eq (16), as data for recovered patients and active infections are readily available, and a simple linear fit gives an estimate of γ (Fig 6). $\gamma^{-1}$ is a measure of the average number of days it takes for a COVID-19 infected person to recover. The values of $\gamma^{-1}$ found for France, Italy, Japan and Switzerland lie within a universal, sensible range (2-4 weeks) observed for COVID-19.
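The linear fit for γ can be sketched as a least-squares slope through the origin. This assumes Eq (16) takes the form $dR_I/dt = \gamma I$, that is, symptomatic recoveries per day are proportional to active symptomatic infections. The daily rate is taken as the forward difference of cumulative recoveries. The function name is illustrative, and the paper's actual fitting details may differ.

```python
def fit_gamma(active, recovered_cum, dt=1.0):
    """Least-squares slope through the origin of dR_I/dt against I.

    ASSUMES Eq (16) has the form dR_I/dt = gamma * I. The rate is the forward
    difference of cumulative recoveries, paired with I at the start of each
    interval.
    """
    rates = [(r1 - r0) / dt for r0, r1 in zip(recovered_cum, recovered_cum[1:])]
    xs = active[:len(rates)]
    return sum(x * y for x, y in zip(xs, rates)) / sum(x * x for x in xs)


# 1 / fit_gamma(...) is then the mean recovery time in units of dt
# (days, for daily reports).
```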
Using Eq (10) and real-time infection data, the parameter $\alpha_0$ can be estimated as shown in Fig 7. Fig 7 reveals that the three regimes of the COVID-19 infection dynamics can be modelled by the framework we developed. The parameter $\alpha_0$, representing the rise, is similar for many countries, reiterating a universal pattern in the initial pre-lock-down regime. This can be understood as an intrinsic characteristic of the COVID-19 dynamics, which exhibits strong similarities across countries (see Table 1). Using the simplified post-lock-down equations (Eqs (11) and (12)), the estimated parameter γ and $(\dot I + \gamma I)$ data for different countries, we estimate the parameters δ and $\beta_1$, as shown in Figs 8 and 9, respectively. $\beta_1$ originates from the migration of people during lock-down and can be expected to be a country-specific event dictated by the prevailing socio-political conditions.
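Two of these estimates reduce to log-linear fits under simplifying assumptions, sketched below in Python.

- **Estimating α₀.** In the early phase with S ≈ $S_0$, M = A + I obeys $\dot M \approx (\alpha_0 S_0 - \gamma) M$. Reported infections therefore grow asymptotically at rate λ = $\alpha_0 S_0 - \gamma$, which gives $\alpha_0 = (\lambda + \gamma)/S_0$. This is a simplification of fitting the full pre-lock-down solution, not the paper's Eq (10) fit.
- **Estimating δ.** After the migration window, $\dot I + \gamma I = \delta A$ with $A \propto e^{-(\gamma + \delta)t}$. The log-slope of $(\dot I + \gamma I)$ is therefore $-(\gamma + \delta)$.

$\beta_1$ is not covered by this sketch. The names are illustrative.

```python
import math


def log_linear_slope(ts, ys):
    """Ordinary least-squares slope of log(y) against t (all y > 0)."""
    n = len(ts)
    logs = [math.log(y) for y in ys]
    t_mean, l_mean = sum(ts) / n, sum(logs) / n
    num = sum((t - t_mean) * (l - l_mean) for t, l in zip(ts, logs))
    den = sum((t - t_mean) ** 2 for t in ts)
    return num / den


def estimate_alpha0(ts, infected, gamma, S0=1.0):
    """Early phase with S ~ S0: infections grow at lam = alpha0 * S0 - gamma,
    so alpha0 = (lam + gamma) / S0 (a simplification of the full fit)."""
    lam = log_linear_slope(ts, infected)
    return (lam + gamma) / S0


def estimate_delta(ts, I, dIdt, gamma):
    """After the migration window: dI/dt + gamma I = delta A with
    A ~ exp(-(gamma + delta) t), so the log-slope of (dI/dt + gamma I)
    is -(gamma + delta)."""
    y = [di + gamma * i for di, i in zip(dIdt, I)]
    return -log_linear_slope(ts, y) - gamma
```

With real data, $\dot I$ has to be estimated from daily counts, so some smoothing is needed before taking logarithms.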

Asymptomatic infections
So far we have made predictions based on the COVID-19-specific model we developed, and validated its efficacy by showing that the analytical solution provides very good fits to the three regimes of infection. Following up on these validations against the observed infections, we use the model to estimate the number of asymptomatic people, an extremely important metric in quantifying the extent of COVID-19 spread. Using our model and the extracted parameters, we estimate how the number of individuals in the compartments S, A, I and R changed with time, with or without a lock-down. We estimate the ratio of asymptomatic to symptomatic individuals (Fig 10), which varies from 1 to 30 depending on the phase of the pandemic dynamics. Seroprevalence tests on small population samples of a few hundred to a few thousand people have estimated the asymptomatics to exceed the symptomatics by a factor of 5.7 in Geneva, Switzerland [21], 20 in Wuhan, China [22] and about 50 in Santa Clara, USA [23]. In practice, it has not yet been possible to test more than a few thousand people in cities with millions of inhabitants. In that sense, the reliability of these serological estimates depends on how random and representative the sampled community is. Nevertheless, the asymptomatic-to-symptomatic ratios extracted using our model are in a comparable range, which serves as a mutual validation of our predictions and the small serological sample sets.
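Because $\dot I = \delta A - \gamma I$ (the A = M − I form of the symptomatic equation used in the pre-lock-down solution), the hidden asymptomatic count can in principle be read off the symptomatic series once δ and γ are known: $A = (\dot I + \gamma I)/\delta$. The Python sketch below applies this relation with central differences. It is an illustration of the relation, not the procedure behind Fig 10, and the function name is hypothetical.

```python
def asymptomatic_ratio(I, dt, delta, gamma):
    """Estimate A/I at interior samples of a symptomatic series I (step dt).

    Uses dI/dt = delta * A - gamma * I, i.e. A = (dI/dt + gamma * I) / delta,
    with a central difference for dI/dt. Real daily counts are noisy, so in
    practice I would need smoothing before differencing.
    """
    ratios = []
    for k in range(1, len(I) - 1):
        dIdt = (I[k + 1] - I[k - 1]) / (2.0 * dt)
        A = (dIdt + gamma * I[k]) / delta
        ratios.append(A / I[k])
    return ratios
```

The estimate is only as good as δ. Since A scales as 1/δ, a relative error in δ passes directly into the inferred ratio.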

Herd immunity
Lock-down has been the most important non-pharmaceutical intervention against COVID-19. However, because of the economic losses caused by lock-downs, there have been several proposals to let the infection take its natural course, which would theoretically slow down once a sufficient fraction of the population is infected; this is popularly known as the attainment of herd immunity. Estimates of the population fraction at which herd immunity is achieved [24], for example for European countries, have been around 70%. Specifically for COVID-19, because of the large fraction of asymptomatic infections, estimates of the infection fatality rate, which considers both asymptomatic and symptomatic infections, have also been made [25]. However, none of these estimates were made within an integrated framework, and they use an asymptomatic-to-symptomatic ratio derived from other sources. Our model, developed within an integrated framework that can both estimate the fraction of asymptomatics and predict infections, was used to shed light on herd immunity. Our results show that herd immunity, defined as the fraction of the population at which symptomatic infections reach a peak and beyond which they begin to decrease, could be achieved at 12-25% of the population, as illustrated in Fig 11 (Table 2). These herd-immunity estimates seem to contradict the estimates of 50-60% [25] only until one realises that a large fraction of the infections are asymptomatic, bringing the total infection to 50-56% of the population (Table 2). Thus, our model allowed us to estimate both the hidden asymptomatics and the herd-immunity threshold, as well as the fraction of symptomatic patients who will burden the health care system.
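The definition above, the population fraction at which symptomatic infections peak, can be computed from a simulation without lock-down. The Python sketch below integrates the SAIR equations as we read them from the model description, with an extra counter $R_I$ for recoveries from the symptomatic compartment. At the peak of I it reports the cumulative symptomatic fraction I + $R_I$ and the total ever infected, 1 − S. The following points are our interpretation and assumptions, not statements from the paper:

- Which of these two quantities corresponds to the columns of Table 2.
- The RK4 step size.
- The parameter values, which are illustrative only and are not the fitted values.

```python
def _rhs(y, alpha0, delta, gamma):
    # State: S, A, I, R (all recovered), R_I (recovered from symptomatic I).
    S, A, I, R, R_I = y
    new = alpha0 * S * (I + A)
    return [-new, new - (delta + gamma) * A, delta * A - gamma * I,
            gamma * (A + I), gamma * I]


def symptomatic_peak(alpha0, delta, gamma, y0, t_end=365.0, h=0.05):
    """Integrate without lock-down or migration (RK4, step h) and return
    (t, I + R_I, 1 - S) at the time the symptomatic count I peaks."""
    y, t = list(y0), 0.0
    best_I, best = y[2], (0.0, y[2] + y[4], 1.0 - y[0])
    for _ in range(int(round(t_end / h))):
        k1 = _rhs(y, alpha0, delta, gamma)
        k2 = _rhs([u + h / 2 * k for u, k in zip(y, k1)], alpha0, delta, gamma)
        k3 = _rhs([u + h / 2 * k for u, k in zip(y, k2)], alpha0, delta, gamma)
        k4 = _rhs([u + h * k for u, k in zip(y, k3)], alpha0, delta, gamma)
        y = [u + h / 6 * (a1 + 2 * a2 + 2 * a3 + a4)
             for u, a1, a2, a3, a4 in zip(y, k1, k2, k3, k4)]
        t += h
        if y[2] > best_I:
            best_I, best = y[2], (t, y[2] + y[4], 1.0 - y[0])
    return best


# Illustrative (not fitted) values: alpha0 = 0.2, delta = 0.05, gamma = 0.1.
t_peak, symp, total = symptomatic_peak(0.2, 0.05, 0.1, [0.999, 0.001, 0.0, 0.0, 0.0])
print(f"I peaks near day {t_peak:.0f}: symptomatic {symp:.1%}, total {total:.1%}")
```

The gap between the two printed fractions is the hidden asymptomatic share. This gap is what reconciles a low symptomatic threshold with a much higher total infected fraction.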

Conclusion
In conclusion, as part of our analysis, we provide a method for estimating the asymptomatic fraction of the population. Further, by fitting our model to data from countries where the pandemic appears to have peaked, we are also able to estimate the level of herd immunity. We show that herd immunity is achieved at levels of 10% to 25%, far lower than the levels suggested in the literature. We find that the SAIR model can be readily adapted to incorporate the effects of lock-down, and that the solution of the system of equations closely matches the real-world data. The formal solution allows one to evaluate the effect of lock-down as a policy tool and can also be integrated into other frameworks that study the economic consequences of lock-downs.