A new formulation of compartmental epidemic modelling for arbitrary distributions of incubation and removal times

The paradigm for compartment models in epidemiology assumes exponentially distributed incubation and removal times, which is not realistic in actual populations. Commonly used variations with multiple exponentially distributed variables are more flexible, yet do not allow for arbitrary distributions. We present a new formulation, focussing on the SEIR concept that allows to include general distributions of incubation and removal times. We compare the solution to two types of agent-based model simulations, a spatially homogeneous one where infection occurs by proximity, and a model on a scale-free network with varying clustering properties, where the infection between any two agents occurs via their link if it exists. We find good agreement in both cases. Furthermore a family of asymptotic solutions of the equations is found in terms of a logistic curve, which after a non-universal time shift, fits extremely well all the microdynamical simulations. The formulation allows for a simple numerical approach; software in Julia and Python is provided.


Introduction
A burgeoning number of papers attempting to model the dynamics of the COVID-19 pandemic have been published over the last few months [1]. Among these, a large fraction ( [2][3][4] are just a few examples) are based in more or less complex variants of the classical SEIR (susceptible-exposed/preinfectious-infectious-removed) model [5], which assume exponential distributions of incubation and removal times. Real epidemic data do not support however an exponential distribution for these parameters which are usually described by gamma, lognormal or Weibull distributions [6,7].
It is well-known that these more general distributions are difficult to be captured by classical compartmental models such as SEIR, which normally treat the incubation and removal times as exponentially distributed, or as the sum of several exponentially-distributed independent times. On the other hand, there is a vast literature on the study of non-Markovian stochastic processes to model epidemics [8], but no simple and concrete formulation of compartmental models for epidemics that implements general distributions. The goal of this paper is to provide such a formulation, which is also practical from a numerical point of view. The principles on which a completely general formulation of compartmental models can be built are actually already present in the seminal work by Kermack and McKendrick [9], where they considered the SIR model. Here, we focus on the more general SEIR model. Our starting point is a formulation of the SEIR concept that correctly represents the evolution of an epidemic under the assumption of full homogeneity and fixed values of the microscopic parameters, namely: i) number of contacts per unit time, ii) the probability of infection per contact, iii) the incubation time and iv) the recovery/removal time. By construction the inter-compartment probabilistic transition rates transparently conserve the total probability as time evolves: we thus refer to our construction as uSEIR, where "u" stands for unitary.
We demonstrate that the results of uSEIR describe very well the result of a microscopic Agent Based Model (ABM) simulation with no stochasticity in the model parameters. Arbitrary distribution of the incubation and removal times can be easily incorporated into our equations, recovering the classical SEIR equations in the particular case of exponentiallydistributed incubation and removal times. We also show that the resulting equations can be efficiently solved numerically, and provide appropriate codes and examples (with implementations in the Julia and Python/Cython languages).
The non-homogeneity in the infection rate per unit time is more subtle, because, in the extreme case, it should invalidate the treatment in terms of global S-E-I-R populations. We study two sources of this non-uniformity: inhomogeneity in the probability of infection per contact and inhomogeneity in the number of contacts per individual. The first is modelled with an ABM model with a negative binomial distribution of the probability of infection per contact, the second is modelled with a simulation on a scale-free network. The uSEIR equations represent instead the evolution in which the infection rate per unit time is the average one, independently of the underlying distribution. We observe that the simulations show a significant variance which however amounts mostly to a time-translation. When the different curves of infected individuals are shifted to tune their maxima, all the curves fall on a universal curve correctly reproduced by the uSEIR equations. We analyse the origin of this universality, i.e., independence on initial conditions. It derives from an asymptotic solution of the uSEIR equations, which is found to be a logistic curve whose shape is fixed by the average microscopic parameters. In the case of networks we briefly comment on the effect of clustering on the dynamics of the epidemic.

Epidemic dynamics: From local to global
The spread of an infectious agent in a large population is a complex stochastic process, which under certain assumptions can be described in terms of a relatively small number of global variables, which follow deterministic differential equations. In order to understand the underlying dynamics, it is useful to think of an epidemic outbreak in terms of simple agent-based models (ABMs) [10], where the microdynamics can be studied by computer simulations. In these models agents can make their own decisions based on the rules given to them, and the evolution can capture unexpected aggregate phenomena that result from combined individual behaviours. ABMs can incorporate easily stochastic parameters as well as heterogeneities in the population, and they are therefore a useful tool to study the performance of the description in terms of global variables.
In the context of an epidemic, agents have four possible states: Susceptible (S), Exposed (E), Infectious(I) and Removed (R). Only infectious agents can induce the change of state of another susceptible agent to that of exposed with a given infection rate, r S ! E . Each exposed agent necessarily becomes infectious after an incubation time, t i , while each infectious agent remains in this state only during the recovery/removal time interval, t r . In a real epidemic this time would be the interval of time during which an individual remains infectious. The agent can move to the removed compartment either because it dies, recovers or gets isolated. All these outcomes are equivalent as regards the evolution of the epidemic, which is monitored by the total fraction of agents in states S(t), E(t), I(t), R(t) at any given time. The study of an epidemic in terms of these variables is the SEIR paradigm [9]. In this context, the so-called basic reproduction number, R 0 , is a fundamental quantity that controls the rate of infection in an homogeneous susceptible population. It is defined as the average number of individuals that a given infectious agent turn to exposed in the interval t r , assuming a fully homogeneous and susceptible population.
In this paper we want to derive a set of equations for these global variables that describe correctly the global dynamics under the necessary assumption of a well-mixed and homogeneous population, if seen at a sufficiently large scale, but that incorporates arbitrary distributions of incubation, removal and infection rates in the microdynamics. These set of equations will be presented in the next section, where they will be benchmarked against two types of ABM simulations, that we briefly describe next. In the first type of ABMs, a number N of agents progressing through the S-E-I-R compartments move in an homogeneous space and get exposed by proximity to infected neighbours with some probability. The second type of models assumes the evolution on a network where the agents have a varying number of contacts.

Spatially homogeneous ABM
We use the MESA package [11] to simulate the spread of an outbreak in a homogeneous population. The agents in the model are called "turtles", following the nomenclature of the NetLogo [12] software. A two-dimensional turtle world is divided in a grid of equal size cells, and inhabited by N turtles. At the initial time the turtles occupy a randomly chosen cell and at each clock tick they do a random move to a neighbouring cells or stay in the same one.
To simulate the evolution of an epidemic, a small number of turtles are infectious, I 0 , at the initial time, while S 0 = N − I 0 are susceptible. At each clock tick infectious (I) turtles can expose any susceptible (S) turtle they find in their neighbourhood. Two turtles are considered neighbours if they share the same cell or are in neighbouring ones.
More concretely, at each time step each susceptible neighbour of an infected turtle, k is exposed with probability p. If the neighbour k becomes exposed, the clock time of this event is recorded, t ðkÞ s!e . At each tick all the exposed turtles are examined and eventually turned into infectious, at time t À t ðkÞ s!e > t i . Again, the time at which the transition to infectious happens is recorded, t ðkÞ e!i , and the turtle remains infectious until it progresses to the recovered compartment of time t À t ðkÞ e!i > t r . The basic assumption of homogeneity at large scales or full-mixing of the S-E-I-R populations requires that the probability of infection per contact is small enough. Only in the limit p ! 0 we can expect that the infecting agent sees an average population of susceptibles (i.e the turtles have time to diffuse before a second infection succeeds). In this limit, the basic reproductive number is given by where c is the average number of neighbours or contacts per unit time. For the rules described before, and in the case of a turtle world homogeneous population, c is simply: where A is the total number of grid cells in the turtle world and 9 is the number of neighbouring cells of any given cell (including itself). � is the measure of one time step. Therefore, the infection probability can be obtained from any desired R 0 as: The simulation with this p will only match the input value of R 0 in the simulation if p is small enough, so that the assumptions that go in the derivation of this formula are satisfied. The simulation is run for a time t � t r and we use a step such that t r /� � 30.
This basic simulation setup has to be supplemented with a prescription to choose the times that turtles remain exposed/infectious. A very common assumption is to take these times exponentially distributed. This can be interpreted as each turtle trying to leave the exposed/infectious compartments at each time tick with a fixed probability, leading to a Poisson process. Other (more realistic) choices of distribution (gamma, Weibull, etc) allow to model some inhomogeneity in the population. In this case, the evolution of the disease is described by a non-Markovian SEIR model.
At each step the software records the turtles in each compartment and thus provides (in the limit of small clock ticks) the functions S(t), E(t), I(t) and R(t), which can be directly compared with the predictions of the solutions of the SEIR models.

ABM on networks
Realistic populations are not necessarily well-mixed, at least not at small scales. Most individuals have contact only with a very small fraction of the total population. Complex networks show very rich topological features that are similar to real-world social networks. They can have a small number of links between nodes and still display the small-world phenomena. Scale-free networks can also capture the large difference of contacts that different individuals in society have. The study of the evolution of diseases on complex networks allows to study the impact of this rich topological structure in the evolution of and epidemic. The spread of epidemics on networks is an area of intense research. Since the seminal works [13] many studies have been performed on this topic (see the recent review [14] and references therein).
A network is just a non-oriented graph {G, E} consisting on nodes G ¼ fn i g N i¼1 and edges linking two nodes, E = {e ij }. We say that two nodes n a and n b are connected if e ab 2 E. The number of edges attached to a node n a is called its degree and labelled k a .
In the context of the spread of an infectious disease, each node is an agent, and the edges represent the contacts. Each contact links two agents that can expose each other if one is infectious and the other susceptible. The number of edges is therefore the number of contacts. At each tick of time, � � t r , infectious nodes pick a single edge at random, and if susceptible they attempt to infect the node attached to it with probability p. In this setup, each click of time represents therefore a contact and all nodes have the same number of contacts per unit time c = 1. More general situations can be simulated by allowing infectious nodes to attempt infecting several nodes at each time step. As in the turtle world, the infectious agent remains so in a time interval t r , while the times t r , t i can be chosen different for each node by drawing samples of some previously chosen distribution.
In contrast with the turtle world, a general network breaks the assumption of full mixing, since two nodes that are not linked have exactly zero probability of transmiting the disease between them. On the other hand, a fully connected network, where each node is linked to the remaining N − 1 nodes, represents correctly a fully mixed situation. In this case the basic reproductive number R 0 is simply For a general network with small clustering, the correction to this relation is expected to scale as / 1/hki as shown in Fig 1). More generally, the value of R 0 can depend in a non-trivial way on the network topological properties.
In our study we will concentrate on a particular one-parameter family of random networks described by Klemm and Eguiluz (KE) [15]. These complex networks show a number of features that are expected in realistic networks: Scale-free Nodes with both large and small number of contacts are present. In fact the distribution of the number of nodes is given by the power law Small-world Most nodes are not linked between themselves (i.e. hki�N), but every link can be reached from any other by a small number of hops. Being more specific, the average distance between nodes gros logarithmically with the size of the network hdi�logN.
High clustering Even if two networks share the number of nodes, edges and the degree distribution, they can look very different if the average clustering coefficient, hCi, is different.

PLOS ONE
The clustering coefficient c i of a node n i measures the probability that two neighbors of n i are also neighbours In Fig 2 we show two networks with equal distribution of k that differ only in the different clustering properties. KE networks depend on a free parameter μ that does not affect the average degree of the network or its distribution, but affects severely the value of the clustering, interpolating from almost no clustering hCi = 0 for μ ! 1, to a very clustered network with hCi�0.84 for μ = 0.

uSEIR formulation
A real epidemic is a complex stochastic process that eventually evolves to a regime where there are large numbers of individuals in the S-E-I-R compartments. In the assumption that the populations in these compartements are homogeneous and maximally mixed, the dynamics of the system should be well described in terms of the global variables S(t), E(t), I(t), R(t), whose time evolution is described by a set of deterministic differential equations [9,16]. We first want to derive the set of equations that should describe the dynamics of these variables under the assumption that the incubation and removal times are fixed. The relation between the changes in these variables is essentially fixed by unitarity. On the one hand, each individual must be in one of the S, E, I or R compartments. Therefore the number of individuals in the population, N, is a constant: Secondly, there must also be a relation between the rates at which these different individuals move from one compartment to the next. An infectious process is that in which an infected individual gets in contact with a susceptible one. Let us call r S ! E the rate of infection per unit time per infected individual and per susceptible individual. The number of susceptible individuals gets reduced by those that become exposed between [t, t + dt], that is:

PLOS ONE
The parameter r S ! E that governs the infection rate is often denoted as β/N in standard SEIR notation. This is the basic equation that assumes an homogenous and maximally-mixed susceptible and infectious populations, making the treatment of the microscopic process in terms of global variables possible. It constitutes the simplest possible form of the force of infection, defined as (minus) the logarithmic derivative of S. Keeping within the simplest approximation, if the incubation and removal times of all individuals have the same values, we must also have that the individuals that become exposed at time t are those that move from compartment S ! E minus those that move from E ! I. But the latter must be the ones that entered the exposed compartment in time t − t i . Therefore we have: where θ is the Heaviside step function.
The initial conditions to these equations start with a fixed N and a number of infected individuals at time t = 0, I(0) = I 0 , so that S(0) = S 0 = N − I 0 , while E(0) = 0 and R(0) = 0. In the equations above, the number of initially infected individuals does not recover, but we can easily force this with the substitution in Eq (8): These equations depend only on three variables, namely r S ! E , t i and t r , which in principle are the same parameters appearing in the classical SEIR models. In terms of the basic reproduction number, R 0 , r S ! E corresponds to the combination: Note that R 0 is proportional to t r , while r S ! E is independent of t r . In a microscopic description of the infected process as in the ABM simulations, the rate is related to the microscopic parameters via R 0 from Eq (1) or Eq (4) for the different ABMs.
We can compare the uSEIR and classical SEIR solutions to the ABMs simulations, matching the basic microscopic parameters. In Fig 3 we show the curve for the fraction of infected individuals as a function of time measured from 10 independent turtle simulations in a population of 10 4 agents with a fraction of infectious agents of 10 −3 at t = 0, and assuming fixed parameters t i , t r and r S ! E for all the agents. The uSEIR solution agrees very well with the simulations, while the classical SEIR predicts a wider and less pronounced peak. This is of course not surprising, since classical SEIR is known to be valid when t i and t r are exponentially distributed, corresponding to an underlying Markovian stochastic process. Modifications of SEIR equations adding more compartments can be designed to represent Erlang distributions, that for sufficiently large n are narrower, but uSEIR gets the limit of fixed values of the parameters directly and without these complications.
In realistic cases, not all individuals have the same incubation or removal times, and certainly not all individuals have the same number of contacts and probability of infection per contact. In the following, we consider the effect of these different non-homogeneities.

Generically distributed t i and t r
Non-trivial distributions for t i and t r can be incorporated in the uSEIR equations by considering different compartments of individuals. For example, the population divides into those with different incubation periods, t ðkÞ i , so we have S k (t) as the susceptible individuals in the k-th compartment of incubation time. Each compartment follows its usual progression S k ! E k ! I k ! R k , but the important point to notice is that a given susceptible individual in compartment k becomes an exposed individual in the same compartment k, but can get infected from any infectious individual in any other compartment. If we assume that the capability to infect per unit time is independent on the compartment, the number of susceptible individuals in compartment k changes as they become exposed according to: while Eq (9) will still be valid for the exposed, infected and recovered in each compartment k, taking the incubation period as that corresponding to this compartment, t (k) . Summing over all the compartments, the first equation leads to: while in the others we get

PLOS ONE
Obviously in the limit of t ðkÞ i varying continuously the sum becomes an integral with the corresponding PDF, P E (t i ): X k ð:::Þ ! Z dt i P E ðt i Þð:::Þ; We can similarly assume sub-compartments for varying t r , with a PDF P I (t r ), and the modification would be analogous, resulting in the following delay integro-differential equations: We refer to Eqs (16) and (9) indistinctively as uSEIR.
A simple and efficient algorithm to solve these equations, complete with easy-to-use codes in Python and Julia, is described in S1 Appendix.

Recovering classical SEIR
In the case where the probabilities are exponential, the integro-differential equations can be reduced to regular differential ones, of the classical SEIR type.
Let us assume and define The derivative of this function is related to that of E(t), using Eq (14), so up to a constant f ðtÞ ¼ À EðtÞ Since f(0) = E(0) = 0, the constant must vanish and the equations reduce to: Analogously we define which for an exponential with average ht r i satisfies and therefore gðtÞ ¼ IðtÞ where C 0 = −I(0)/ht r i. Finally, the integral in the first equation: Z dt r P I ðt r ÞĨðtÞ ¼ IðtÞ À Ið0Þð1 À e À t=ht r i Þ � � IðtÞ: Finally, defining � RðtÞ � RðtÞ þ Ið0Þð1 À e À t=ht r i Þ; ð26Þ we recover the classical SEIR equations: We can incorporate easily the exponential distributions for t i and t r in the ABMs simulations, while we maintain the rate of infection constant. The comparison of the SEIR solution of Eq (27) and the homogeneous ABM simulation with exponentially distributed t i and t r is shown in Fig 4. The agreement as expected is good, even if the variance is much larger than in the fixed-parameters case. In fact, an interesting observation is that most of the observed variance of the outbreaks is a simple time translation. If we time-translate all the outbreaks to make their maxima coincide the variance is much smaller and the agreement with SEIR better, see Fig 5. We will discuss the origin of this time-shift in the following section.
An exponential distribution for the incubation and removal times is however not realistic. A more realistic distribution seems to be, e.g., a general gamma distribution, Γ[k, θ]. For

PLOS ONE
the COVID-19 epidemic, the distribution of incubation times has been shown to be well described with a gamma with parameters (k, θ)'(5.8, 0.948) [7], corresponding to an average ht i i'5.5 days. For the removal time, we assume the same distribution with parameters (6.5, 1). This corresponds to the average between the "short" and "long" removal times discussed in [7].
We compare the results of the gamma-distributed ABM simulations in Fig 6. As expected, classical SEIR does not give a good description of the simulations in this case, while solving the integro-differential Eq (14) does.
We note that, although there is a vast literature on incorporating arbitrary distributions for t i and t r in stochastic approaches to the propagation of epidemics (see, e.g., a recent review in [8]), we have not found a simple formulation of the problem in the context of compartmental models such as the one described by equations Eq (16). While generalizations of exponential distributions, such as the Erlang case, are dealt with in the standard SEIR literature using a superficially similar sub-compartmentation approach (see, e.g., [17][18][19][20][21]) this is not quite as general as the treatment described here.

Non-uniform infection rate and universality
A different situation is when the rate of infection is non-uniform across the population. It is important to stress that the rate depends on two independent parameters: the number of contacts per infected individual, which critically depends on the clustering properties of the social network, and the probability of infection per contact. Non-uniformity can originate in either of the two properties. In this section we will consider the simplest case of a uniform number of contacts, but a non-uniform infection probability per contact.

Non-uniform rate: The probability of infection
We could separate the population in individuals that infect others with different rates. The rate might depend on the type of infectious individual and the type of susceptible individual. Defining r l k to be the rate at which an infected individual of type l infects a susceptible individual of type k. The equations in this case are: where t i and t r might also depend on the compartment. Assuming that the rates only depend on the type of infecting individual and not on the type of susceptible and, for simplicity, that t i and t r are fixed, only the total number of individuals in each compartment needs to be evolved. This is the case, because the different compartments are in some proportion in the population and we assume the proportion is preserved by the initial conditions of the I k (0) and S k (0). The equations reduce to the usual ones with a rate that is the weighted average: where p k is the proportion of individuals in compartment k. In the continuous case where P R (r) is the corresponding PDF. However, this result seems in conflict with the fact non-uniformity in the rate is known to be very important in the evolution of an epidemic (see, e.g., [22,23]). One example of this is the relevance of the fraction of individuals for which the probability of infection is zero. Their presence in a given population implies that the effective number of useful contacts gets reduced. When the fraction of the population with zero infecting power is large enough the epidemic may be aborted. In practice, the effect is similar to that of herd immunity, used to measure the needed number of vaccinations to abort an epidemic. A very rough estimate for the fraction of herd immunity, f H , would be For example, with R 0 � 3, f H � 0.7, that is 70% of the population. One would then naively expect that in an epidemic where this estimate holds about 70% of the population ends up getting infected; however, in the previous examples a larger fraction is found. The reason for this overshooting effect is that, due to the time delay in the process, the fraction of recovered individuals grows slowly and is not effective in reducing the growth of the epidemic sufficiently, as would be the case if the fraction of immune individuals had been present from the start, as would be the case, for instance, in a (partially) vaccinated population. Note that in the SEIR paradigm, the immune population is part of the susceptible, that pass by the compartments E ! I ! R but have zero infecting power when they are I so they are inert. In practice the evolution of the epidemic would be identical if we just dropped them from the start and readjust the rate not to include them.
It has been argued that for COVID-19 the distribution of R 0 across the population is well described by the negative binomial distribution, NB[0.16, 0.0437] [24], which has average 3.5 but a large dispersion. This distribution implies that about 60% of the population is immune (not far from the naive herd immunity), while there must be few individuals that have a very large rate of infection, the famous superspreaders.
In Fig 7 shows the evolution of 100 simulations assuming fixed t i and t r while R 0 is drawn from this negative binomial. The average of those outbreaks as well as the result of uSEIR using the average hR 0 i are also shown. Clearly the variance is huge, and the average is not a good representation of the individual epidemic histories. The uSEIR curve misses completely the outliers.
There is an interesting observation however. If all the curves are time-translated to make their maxima coincide, they fall in the uSEIR curve, as shown on the right Fig 8. This fact can be interpreted as follows. The position of the peak is non-universal, because it depends very sensitively on the initial conditions, in particular on what is the infectious potential of the first infectious agents. Since all epidemics start with a small number of individuals, we cannot invoke the central limit theorem for the initial stages of an outbreak. These stages have a large variability, however as the exponential grows the averaging effect of the population starts to be effective. The curve around the maximum is in fact universal, in the sense that it depends on the average of the basic parameters and not on the initial conditions, as we now show from the uSEIR equations.

Universality and the logistic curve
We have observed that the main effect of the different initial conditions is a temporal shift of the maximum, but the shape or the height of the infection curve does not change significantly. This strongly suggest that the equations have a universal solution. We have indeed found it.

PLOS ONE
Let us consider the differential Eq (9) near the maximum of the infection curve t max , which will remain as a free parameter. Let us also assume that t max � t i , t r , and define the function The differential equations for the uSEIR with fixed t i and t r and for t � t i , t r : which implies Since I(t)!0, E(t)!0, F(t) = S(t)I(t)!0 as t ! 1, it follows that C 0 = 0, C 00 = 0 and C = N. Using the previous equations, it is easy to derive a differential equation for F(t), expanding at

PLOS ONE
linear order in t i and t r : We are interested in the solution near the maximum, so we use the initial conditions: This non-linear equation has an analytical solution given by: with a � r ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi This is the universal function that drives the evolution of the infected, exposed and suscepti-ble+recovered individuals near the maximum. The maximum of the infected is at t max − t i for the infected, while the maximum(minimum) for the exposed (susceptible+recovered) is at t max . The integral of this function from [−1, 1] is Note that for large t max , the range t < 0 gives a negligible contribution. We can also derive the value of the susceptible at t max since and the curve of the susceptible can be easily obtained The total number of susceptible at the end of the epidemic is therefore: With this we conclude that the epidemic curve is universal once the value of the maximum position is determined. The value of F 0 should also depend on the basic parameters and not the initial conditions, although the precise value is not easy to get. A rough estimate can be obtained as follows. Near the maximum, and if the incubation and removal times are sufficiently small, we can approximate that R(t max ) ' I(t max ) + E(t max ), since the infected and exposed quickly recover; using this and the value of S(t max ) we can estimate F 0 to be The only dependence on the initial condition remains in t max . In Fig 9 we compare the numerical solution to the uSEIR equations to the analytic expression of Eq (36), fixing the parameters F 0 and t max (the height and the position of the peak) from the numerical solution.
Varying the initial conditions, that is the fraction of the number of infected individuals at t = 0, shifts t max , but otherwise leaves the curve invariant. As can be seen, the analytical solution around the peak describes very well the full uSEIR solution. The agreement is better for smaller values of t i and t r .
It is possible to extend this asymptotic solution to the case where t i and t r are not fixed but drown from distributions P E (t i ) and P I (t r ). Eq (34) gets modified in that the coefficient of the last term becomes: where hi refers to the average with the corresponding PDF. Therefore the logistic, Eq (36), is still the asymptotic solution with a modified parameter: a ! r ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi

Non-uniform rate in network simulations
We now consider the non-homogeneities in the social contacts. We have generated a number of KE networks with hki = 40 and different clustering properties, by changing the μ parameter. The networks have 10 6 nodes. On these networks we evolve the epidemic using the time progression explained above, starting with 10 infected nodes. The probability of infection per contact is p = 2 × 10 −3 . For the incubation and removal times we assume a LogNormal distributions, in units of the step time, �, with parameters (μ X , σ X ) = (10 3 , 200) for t r and (μ X , σ X ) = (500, 100) for t i , where μ X is the mean and s 2 X is the variance. For these parameters, pht r i = 2, which gives an approximation to R 0 up to 1/hki corrections, as explained in sec. 1. From Fig 1, we can get a more precise estimate of R 0 = . For each network we run a number of simulations and average the S-E-I-R fractions, after performing a time-shift to make their maxima coincide (which as in previous cases, reduces most of the variance). In Fig 10 we show the evolution of infected individuals as a function of time for the various networks. We observe a clear dependence on the clustering parameter, but nevertheless the data in all cases is extremely well described by the universal behaviour derived from uSEIR, Eq (36). The lines are three-parameter fits (a, I 0 , t max ) of the form: while and using the rough estimate of Eq (42), we get I 0 /N � 1/6. Both parameters would therefore be given in terms of the average microscopic parameters.
In Fig 11 we  is roughly constant and matches rather well the microdynamical average value of Eq (37). Instead I 0 /N decreases with clustering, even for small clustering. This effect can be interpreted as effective suppression of the fraction of susceptible population: clustering seems to screen the access to the susceptible. Note that if we substitute in the uSEIR equations S by f c S, where f c is the screening factor, the asymptotic solution is as in Eq (45) with I 0 ! f c I 0 , while a ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi I 0 =N p À 1 remains invariant. This could explain the behaviour found at small clustering.
At large clustering, on the other hand, the parameters I 0 and a show a non-trivial dependence with clustering. In spite of this, the logistic remains an extremely good description of the time evolution of the infected fraction. It would be interesting to understand this behaviour in

PLOS ONE
terms of a renormalization or screening of the basic parameters, or modifications of the force of infection with respect to the well-mixed approximation yielding −S 0 /S / I. As a final comment, we note that everything we have studied here assumes no time variation of the basic parameters. In a real epidemic, measures of social distancing, self-protection, etc. are taken, that induce a sudden change of the basic parameters, particularly the rate of infection. This effect induces a quench of the epidemic curves that we have been discussing in this paper. It will be interesting to explore to what extent the evolution after the quenches can be understood in terms of the fundamental parameters, in particular whether the universality near the herd-peak translates into some universality of the curve after a quench, if it has happened in the asymptotic regime.

Conclusions
In this paper we have presented a simple formulation, uSEIR, Eq (16) of the SEIR modelization of a epidemic outbreak that properly accounts for an arbitrary distribution of incubation and removal times, reducing to classical SEIR in the limit of distributions of the exponential family. We have compared this model with a series of ABM homogeneous simulations for various scenarios including fixed values for the incubation and removal times, as well as various realistic distributions for the the latter, or for the probability of infection per contact. We have also considered ABM simulations on scale-free networks with varying clustering properties. In all cases, the model reproduced the simulations accurately after a non-universal time-shift. Only in the presence of large local clustering in the distribution of contacts we observed a clear deviation, when the averages of microdynamical parameters are included. The uSEIR formulation allowed us to understand the universality property observed in different outbreaks in the simulations. This derives from an explicit asymptotic solution found for small incubation and removal times in terms of a logistic curve, with a shape that can be determined in terms of the microdynamical parameters. This curve is found to fit very well the data even in cases of large clustering, provided the parameters are left as free fit parameters, suggesting that the dynamics in the high clustering regime may still be well described in terms of global variables but with screened or renormalized parameters. On the contrary, the early stages of an outbreak are highly non universal, an aspect that should be carefully taken into account when fitting data and predicting using any SEIR modelling. Only when the early stages of exponential growth are well underway, is uSEIR expected to be a good description. The averaging of independent outbreaks, without taking into account the non-universal time-shift, can also be very misleading.