A data driven change-point epidemic model for assessing the impact of large gathering and subsequent movement control order on COVID-19 spread in Malaysia

The second wave of COVID-19 in Malaysia is largely attributed to a four-day mass gathering held in Sri Petaling from February 27, 2020, which contributed to an exponential rise of COVID-19 cases in the country. Starting from March 18, 2020, the Malaysian government introduced four consecutive phases of a Movement Control Order (MCO) to stem the spread of COVID-19. The MCO was implemented through various non-pharmaceutical interventions (NPIs). The reported number of cases reached its peak by the first week of April and then started to reduce, hence proving the effectiveness of the MCO. To gain a quantitative understanding of the effect of MCO on the dynamics of COVID-19, this paper develops a class of mathematical models to capture the disease spread before and after MCO implementation in Malaysia. A heterogeneous variant of the Susceptible-Exposed-Infected-Recovered (SEIR) model is developed with additional compartments for asymptomatic transmission. Further, a change-point is incorporated to model disease dynamics before and after intervention which is inferred based on data. Related statistical analyses for inference are developed in a Bayesian framework and are able to provide quantitative assessments of (1) the impact of the Sri Petaling gathering, and (2) the extent of decreasing transmission during the MCO period. The analysis here also quantitatively demonstrates how quickly transmission rates fall under effective NPI implementation within a short time period. The models and methodology used provided important insights into the nature of local transmissions to decision makers in the Ministry of Health, Malaysia.


Introduction
Imported cases from China contributed to the first COVID-19 wave in Malaysia from January 25, 2020 to February 26, 2020 [1]. This first wave had a total of 22  also take into account heterogeneity in the disease spread such as varying contacts among susceptibles within the closed population. The starting point of our proposed models are the class of epidemic models with power transmission dynamics which are shown to incorporate heterogeneity (see [6,7]). Several studies in the literature [8][9][10][11][12] have analyzed the effects of NPIs in reducing the number of COVID-19 cases. In [8], the effects of different types of NPIs on COVID-19 cases are modeled using a negative-binomial distribution whose underlying parameters incorporate country information, type of NPI implemented and change-point effects. The associated Bayesian analysis is carried out using Markov Chain Monte Carlo (MCMC) algorithms to arrive at posterior parameter estimates and credible intervals. No epidemic models are considered in this work. A generalization of the SEIR epidemic model is considered in [10] to understand the dynamics of transmission in New York, USA, under various NPI settings. However, the model is complex and requires data-rich inputs for the estimation of all unknown parameters. As a result, the authors derive baseline epidemiological parameters from published literature and not from actual observed cases in New York, and in the end conduct a simulation study based on the assumed parameter values. The study in [9] extends the work of [13] and computes a time-varying basic reproduction number as a way of gauging the effect of NPIs over time. Both these works assume that serial intervals (i.e., the time between onset of symptoms for the infector and the infectee) can be computed for each case, which is another situation requiring data-rich input.
Studies that use compartmental epidemic models as a way of gauging the time-varying effects of NPIs have emerged over the course of the pandemic [14][15][16]. Compartmental epidemic models naturally model disease spread via contact rates which directly quantify the extent of NPIs (since, as mentioned earlier, NPIs are designed to reduce person-to-person transmission). Thus, epidemic models provide a natural approach for considering time-varying effects of the MCO period. Further, in this paper, the estimation of SEIR parameters is carried out based on local considerations and local data; they are not obtained from published literature based on studies conducted elsewhere where their local dynamics can be vastly different.
We seek to address one important aspect of Malaysia's multifaceted response to the COVID-19 pandemic, that is, to inform the health officials at MOH and aid them in their decision-making. Thus, our model was developed under local considerations using local data. Our model and related analyses are able to provide a quantitative assessment of (1) the impact of the Sri Petaling gathering, and (2) the extent of decreasing transmission during the MCO period by incorporating a time-varying contact rate parameter, which is estimated using locally available data. In essence, the proposed models here are being used as a lens to interpret the observed data in terms of when, and to what extent, a reduction in COVID-19 transmission occurred as result of the implementation of MCO.

Data collection
Daily situation reports on COVID-19 cases in Malaysia are published by the National Crisis Preparedness and Response Centre (CPRC) of MOH, as well as other official websites (such as outbreak.my). The data on daily COVID-19 cases have been published since 21 January 2020 and are publicly available. The reports consist of confirmed daily and cumulative cases, recovered cases and deaths, as well as cases requiring ICU care and ventilator support. Cases by states are also available for the 13 states and 3 federal territories. In this study, we studied characteristics of the second COVID-19 wave in Malaysia starting from March 1, 2020 (corresponding to the final day of the Sri Petaling gathering). Data used for the current study are confirmed daily cases for Malaysia, and for two states: Selangor and Sarawak. These states were chosen to illustrate the aggressive transmission propagated by the Sri Petaling gathering. Selangor is the state where Sri Petaling is located and from where a majority of the participants originated, whereas Sarawak represents a state which was essentially not affected by this gathering. The time period of study is between March 1, 2020 (end of Sri Petaling gathering) and April 28, 2020, covering the period immediately after the Sri Petaling gathering and the first three phases of the MCO. Our study duration is further divided into two periods. The first period ranges from March 1, 2020 until March 18, 2020, which covers the subsequent 17 days after the gathering. The second period is taken from March 18, 2020 until April 28, 2020, covering the three successive Phases 1, 2 and 3 of the MCO. Fig 1 gives the trajectories of reported daily COVID-19 cases between March 1, 2020 and April 28, 2020 for Malaysia, and the states of Selangor and Sarawak.
All COVID-19 cases reported by MOH were confirmed by real-time reverse transcriptasepolymerase chain reaction (real-time RT-PCR) tests. A positive case was reported when the person in question was found to be positive for SARS-CoV-2 via a real-time RT-PCR test. Upon confirmation, the individual was isolated at COVID-19 designated hospitals and healthcare facilities [17]. Active cases are defined as infected persons who were currently undergoing treatment, and hence, isolated and removed; the individual is assumed to be unable to infect other susceptibles in the population, and hence "removed" from further modelling steps. This study did not consider transmission from positive isolated COVID-19 patients to health personnel as there was no evidence of this type of transmission occurring in the COVID-19 designated hospitals in Malaysia.

The SEIR model
The typical and well-known SEIR compartmental model consists of four compartments (Susceptible, Exposed, Infected and Recovered) representing different stages of evolution of an infectious disease, such as COVID-19, in a population. Susceptible individuals come in contact with one or more infected individuals in the population, and subsequently, become exposed to the virus. The virus then incubates within these individuals for some time. At the end of the incubation period, the exposed person becomes infectious and transmits the disease to other susceptibles in the population who come in close contact to him/her. The infected person is assumed to be infectious for a certain period (called the infectious period) after which the  where S(t), E(t), I(t) and R(t) represents, respectively, the susceptible, exposed, infected and recovered compartments representing the total number of individuals in each compartment at time t (here, _ xðtÞ denotes the derivative of x(t) with respect to time t for x 2 {S, E, I, R}), N is the total population size, and hðS; IÞ ¼ b S N I is the rate of new infections (or, the number of new cases in the population). The parameters that govern the trajectory of the SEIR model are θ � (β, δ, γ, i 0 , e 0 ) which are, respectively, the transmission rate (i.e., number of individuals in the population an infected person comes in contact with and successfully transmits the disease per unit time), the rate of incubation of the disease, the rate of infectiousness, the initial number of infectives and the initial number of exposed individuals. Since Based on initial values of Sð0Þ and R(0) = 0 at time T 0 = 0, the SEIR ODE system can be solved numerically to yield the values of S(t), E(t), I(t) and R(t) for every t 2 [T 0 , T 1 ] where T 1 denotes the final time-point. In (1)-(4), the incubation period, 1/δ, is the reciprocal of the incubation rate δ, and similarly, the infectious period 1/γ is the reciprocal of the infectious rate γ. The correspondence between rates and exponential sojourn times is only approximately so since it is not so straightforward to establish this correspondence with an individual-based stochastic model given the non-linear nature of the process.
Modifications to the SEIR formulation (1)-(4) are made to adapt it to COVID-19 in Malaysia. The last compartment of the SEIR model in this case should be "Removed", and not "Recovered", representing all infectious individuals who are effectively isolated following a positive test result. In Malaysia, such patients are isolated in hospital wards to avoid further contacts with susceptibles [17]. For COVID-19 in particular, the onset of symptoms does not necessarily indicate the start of infectiousness; in fact, the onset of infectiousness may be somewhat earlier. Thus, the infectious period 1/γ represents the period of effective infectiousness, that is, the period from the start of infectiousness (asymptomatic or symptomatic) until the individual is isolated and can no longer infect others. Based on this understanding, 1/δ represents the incubation period, which is the period starting from getting infected until the onset of infectiousness. Recent studies on COVID-19 have also reported growing evidence of asymptomatic infections [18][19][20] which the original SEIR model does not incorporate. Hence, a new modified SEIR model is developed in the next section to account for isolated patients and asymptomatic transmissions.
Observed data in Malaysia consists of the total number of confirmed daily cases as reported by outbreak.my and CPRC (i.e., the number of cases that were tested positive, and hence, effectively isolated). Hence, only the R compartment of the SEIR model can be related to observed data while other compartments of the SEIR model remain unobserved. In the subsequent model development, the R compartment is further split into two: observed and asymptomatic sub-compartments corresponding to reported and asymptomatic infections; details are provided in the next section. An additional assumption made is that individuals from the R compartment do not return to the S compartment-at least on the timescales over which the epidemic is observed; see, for example, [21,22].
In order to provide a quantitative assessment of the impact of the MCO, we modify (1)-(4) to incorporate an instantaneous time-varying transmission rate, β � β(t) (see [23], for example) which is able to quantify the extent of disease transmission at time t. The MCO can be deemed effective if the function β(t) shows a decay reflecting an increasing effectiveness in reducing transmission among individuals over time due to implementation of the NPIs.

The modified SEIR model
The aforementioned SEIR model does not account for heterogeneity, asymptomatic transmission and change points. To this end, we propose models with power transmission dynamics that incorporate heterogeneity in the disease parameters; see, for example, [6,7]. Further, the infectious compartment in (3) of the SEIR model is now split into two, I o and I c , for symptomatic (or, observed) and asymptomatic (or, cryptic) individuals, who, respectively, exhibit and do not (or, mildly) exhibit symptoms but are nevertheless infectious. Correspondingly, the R compartment is also split into two, as mentioned earlier, to accommodate quarantined and un-quarantined cases. It is assumed that the proportion of individuals transiting from E to I o is p. The remaining exposed individuals (a proportion of 1 − p) transition into the I c compartment and remain undetected throughout their disease experience.
From now on, we consider only renormalized state values which represent proportions, and not actual numbers, of the population. The modified SEIR model with asymptomatic and observed infectiousness is given by the following system of ODEs: A key difference of the model formulation in (6)-(12) is the power transmission dynamics used to model heterogeneity in the population; see [6,7]. The epidemic models developed in [6,7] incorporate parametric heterogeneity, that is, these models describe a heterogeneous population where individuals in the population have different trait values from one person to another. In this framework, disease characteristics related to the population, such as the likelihood of a successful transmission for a susceptible or the rate of infectivity of an infectious individual, can be taken to vary heterogeneously. This heterogeneity is characterized via probability distributions, and is known as distributed susceptibility and infectivity in [6,7]. When a susceptible and infectious trait, respectively, has a Gamma(k 1 , ν 1 ) and Gamma(k 2 , ν 2 ) distribution (where Gamma(k, ν) refers to the Gamma distribution with shape parameter k > 0 and scale parameter ν > 0), it is shown in [6,7] that the resulting epidemic model has power law transmission characteristics with the rate of incidence (of new infections) given by where the powers on S and I are q 1 = 1 + 1/k 1 and q 2 = 1 + 1/k 2 depending on the shape parameter of the corresponding Gamma distribution assumed for distributed susceptibility and infectivity, respectively. Since k 1 , k 2 > 0, it follows that q 1 , q 2 > 1. The values of q 1 = 1 and q 2 = 1 corresponding to (5) can be thought as the limiting case when k 1 ! 1 and k 2 ! 1 representing homogeneity. From this discussion, it follows that the powers v, w o and w c on S(t), I o (t) and I c (t), respectively, are such that v � 1, w o � 1 and w c � 1. The lower bounds on v, w o and w c recover the original SEIR model dynamics with no heterogeneity. The remaining parameters have the following interpretation: (1) α represents a small yet significant force of infection that starts the local infection process but is eventually overwhelmed by it. In our context, α represents the initial force of infection arising from, say, foreign infectious individuals attending the large gathering at Sri Petaling starting February 27th. (2) The parameters γ o and γ c have the same interpretation as γ in (3), that is, they are rates of infectiousness but for the observed and asymptomatic compartments, respectively. (3) The parameter δ is the same as before, namely, it is the rate of incubation associated with the exposed compartment. (4) The functions β o (t) and β c (t) are time varying transmission rates for the observed and asymptomatic compartments, respectively, in the modified SEIR model with β c (t) = μβ 0 (t) and μ 2 [0, 1]. In other

PLOS ONE
words, we assume that the transmission rate for asymptomatic individuals is smaller than that of symptomatic individuals; this is a plausible assumption to make as asymptomatic individuals generally possess a lower viral load which leads to lower chances of a successful transmission. On the other hand, a longer asymptomatic infectious period may compensate for the lower transmission rates for an asymptomatic individual, and this possibility is captured by the model via the parameter γ c . A change-point is incorporated into the modified SEIR model to capture the shift in disease dynamics before and after the start of the MCO. Several works in the COVID literature have incorporated change-points to study the effectiveness of interventions; see, for example, [24][25][26]. To incorporate our change-point, an unknown threshold date, T � , is chosen so that observed daily cases fall either to the left or right of T � . We denote the observation window to the left of T � by W L consisting of dates from March 1st up to and including T � . The window to the right of T � is denoted by W R which consists of dates after T � up to and including April 28th, 2020. The change-point date T � is inferred from data; it is not taken as March 18th, 2020 which is the date of the start of MCO. Choosing T � in this data-driven way is justified as the impact of MCO on observed cases may not be realized immediately. The modified SEIR model with change-point T � is governed by the ODE system (6)- (12) for all time points t 2 W L . For the parameters of the modified SEIR model in W L , we denote them using the same symbol as before but with a subscript L. For W R , the same ODE system (6)-(12) is considered but now with a different set of parameter values in W R compared to W L which are denoted using a subscript R.
The transmission rates in W L are assumed to be constant, that is, to model possible changes in disease transmission over time. The functional form of β o,R (t) in (14) has an initial value of g o;R e a 0;R at t = T � after which it decreases (provided a 1 < 0) to the asymptotic value of a 2 γ o,R as t ! 1. Thus, a 2 γ o,R represents the residual disease transmission that may be present even during the MCO period, for example, due to close contact between family members in the same household. The general functional form of β o,R (t) subsumes the constant disease transmission rate model as a special case by taking a 1 = a 2 = 0 in (14). The constant rate submodel has the advantage of not explicitly assuming any functional form for the change in disease transmission over time. On the other hand, it can only ascertain if there is an overall change (increase or decrease) in transmission after the change point T � . Similar to the relationship β c, In the subsequent text, several redundancies are removed based on the interpretation of the parameters involved. These simplications are reasonable and lead to a reduction in the number of unknown parameters. This reduction, in turn, improves the model fitting and inference procedures. It is reasonable to assume that γ c,L = γ c,R � γ c , δ L = δ R � δ, μ L = μ R � μ and p L = p R � p; in other words, intervention does not change the values of these parameter in W L and W R . These parameters can be assumed constant since they relate to intrinsic characteristics of the disease and of the underlying population. The parameter γ c relates to asymptomatic infections which are not detected in both W L and W R , δ relates to the incubation period of the disease, μ refers to the transmissibility ratio of asymptomatic to symptomatic individuals (which depends on the viral load of infectees only), and p refers to the proportion of symptomatic versus asymptomatic infectees. Hence, these parameters are characteristics of the overall population and the disease, and not the intervention. However, we do not assume that γ o,L = γ o,R for the symptomatic infectious periods, but rather 1/γ o,R � 1/γ o,L since the process of detecting and isolating symptomatic individuals during MCO was generally quicker and more efficient in W R compared to W L , resulting in a shorter symptomatic infectious period in W R compared to W L .
In the Results section, the constant transmission rate submodel is first fitted to observed data to determine if the MCO is succesful in reducing the overall disease transmission. Only when this is established, the full change-point model, with the explicit decay form of (14) and a 1 < 0, is fitted to the observed data. Further, in the model formulation of (6)- (11), only the R o compartment is modelled directly using a likelihood function based on daily observed cases. The other compartments of the modified SEIR model remain latent and do not have any direct observation processes for modelling based on likelihoods; see the Bayesian Inference section for further details.

Quantitative assessment of disease spread
The basic reproduction number, R 0 , is defined as the number of secondary infections caused by one primary individual during his/her infectious period. It is the most important quantitative indicator reported to assess whether the disease is in control or not. It is well-known that the threshold value of 1 for R 0 distinguishes between the situations where a major epidemic occurs versus the disease dying out before a major outbreak can become established in the population. In fact, several studies in the literature report a gradual decrease in R 0 after lockdown [27][28][29]. For the SEIR model in (1)-(4), R 0 is given by the well-known formula R 0 = β/γ Time-varying measures of secondary infections per primary infected, R t , are also available in the literature. However, for the modified SEIR model, R 0 and R t cannot be computed. Therefore, we resort to an alternative quantitative assessment of disease spread-the total number of infections (i.e., generational) caused by the introduction of one additional infectious individual into the infection process at time point t. This procedure is illustrated using the I o compartment. Based on (6)- (12), new values for the ODE system are calculated from time point t onwards with current state values serving as initializations of the ODE system for all except one compartment: For the I o -compartment, the current value I o (t) is replaced by I o (t) + 1/N as the initial value. The new rate of incidence is given by hðS � ; I � o ; I � c Þ over the infectious period of the individual when the ODE system is propagated using (6)- (12) in [t, t + 1/γ o ]. The increase in the rate of incidence by the introduction of this individual in the I o compartment is given by Similarly, the increase in the rate of incidence by the introduction of one infectious individual into the I c compartment at time point t can be calculated. This is denoted by Δ t (I c ). The final increase in the incidence is the mixture where p is the proportion of exposed individuals who enter the I o compartment in (8).

Bayesian inference
Model fitting and inference is carried out in a Bayesian framework. The likelihood. The total numbers of daily cases, D t ; t 2 W L [ W R , which constitute noisy observations from the R o -compartment in the modified SEIR model, are assumed to be distributed as where NB(x;ν, τ) denotes the negative binomial probability function with mean ν and variance ν + τν 2 . The parameter τ measures overdispersion with respect to the Poisson probability function which is recovered from NB(x;ν, τ) when τ = 0; see, for example, [30,31]. The observed data on daily cases in Malaysia exhibited significant overdispersion, and as a result, the Poisson likelihood did not fit the observed cases well. More discussion on this aspect is presented later. Assignment of priors. Several parameters of the modified SEIR model in (6)-(12) take different values in W L and W R , and reflect changes in the disease trajectory due to intervention. Other parameters, including those that relate to the intrinsic characteristics of the disease, remain constant throughout W L and W R . Defining where all notation used for the parameters is as before. In the subsequent discussion, separate priors are elicited either on individual components of Θ, or on specific subsets of Θ where the parameters within these subsets satisfy certain restrictions. A full prior on Θ then follows from assuming independence between the separate prior assignments. For each ξ 1 2 {p, μ}, the prior distribution on ξ 1 is assumed to be uniform, UðA x 1 ; B x 1 Þ with hyperparameters A x 1 and B x 1 , representing the lower and upper bounds, respectively. Since each ξ 1 represents a proportion, we have A x 1 � 0 and B x 1 � 1; however, the choices of hyperparameters used in the experiments for all parameters, including ξ 1 2 {p, μ}, will be discussed later. For now, we provide only the explicit forms of the priors assumed on the components of Θ.
The parameters that relate to the infectious periods, γ o,L , γ o,R and γ c , satisfy the relations 1/ γ o,R � 1/γ o,L (as pointed out earlier) and 1/γ o,L < 1/γ c . The latter condition follows from the fact that the asymptomatic infectious period, being undetected, will generally be longer than its symptomatic counterpart in W L , since symptomatic individuals will be isolated once they test positive. The joint prior distribution on {γ o,L , γ o,R , γ c } can be specified according to the following generation scheme: We first generate 1=g o;R � UðA g o;R ; B g o;R Þ, then set 1/γ o,L = 1/γ o,R + ξ 2 , and finally take 1 independently for j = 2 and 3. We refer to the infectious periods reported in literature, namely [18][19][20]32], to choose reasonable values for A g o;L , B g o;L , B x 2 and B x 3 . Based on the reported values, infectious periods within 2-8 days are found to provide a reasonable range for our experiments. The prior on δ, where 1/δ is the incubation period, is taken as 1/δ * U(A δ , B δ ). To select reasonable values for the lower and upper bounds, we refer to the reported literature (for example, see [33]) and consider an incubation period between 2-13 days to be a satisfactory range for our study.
The proposed Bayesian methodology calls for Θ-samples to be generated from its full prior specification as described above. For this purpose, explicit mathematical experssions of the joint prior density function on parameter subsets of Θ, such as {γ o,L , γ o,R , γ c }, are not required. One only needs to be able to simulate the reciprocals based on the generation scheme described above, and then invert the reciprocals to obtain prior samples of γ o,L , γ o,R and γ c from their implicit joint density specification.
Next, the prior on τ is taken as τ * U(A τ , B τ ) where 0 � A τ < B τ , and B τ is a pre-specified positive constant representing the maximum extent of overdispersion apriori. The prior on α L is assumed to be UðA a L ; B a L Þ, where A a L 0 ¼ 0 and B a L > 0 represents the maximum value of a small but constant force of infection arising, for example, from infectious foreign attendees of the Sri Petaling gathering. The parameter α R is set to 0 as there is no such force of infection in W R . For the parameters w o,L , w o,R and v L , independent uniform priors are considered with lower and upper bounds given by A w o;L and B w o;L for w o,L , A w o;R and B w o;R for w o,R , and A v L and B v L for v L , respectively. Since these parameters have a natural lower bound of 1 that represents homogeneity, the hyperparameters of the uniform priors are chosen in the range [1,1). The priors on w c,L , w c,R and v R are taken using the following relations: The prior on the change-point T � is taken to be uniform on dates from March 18th, 2020, to March 31st, 2020, both inclusive. Since we define T � as the number of days after March 1st 2020, T � * U(17, 30) a priori.
The constant contact rate β o,L during the pre-MCO period cannot be determined by direct observation. Hence, putting a prior directly on β o,L is difficult. However, since b o;L ¼ e a 0;L g o;L , the quantity e a 0;L has the interpretation of a reproduction number, a more fundamental quantity for infectious diseases compared to β o,L . Thus, one can obtain a range of values for a 0,L from previous reports on similar flu-like epidemics (see, for example, [34]) where we found the range 0-3.5 appropriate for covering the true value of a 0,L in our case. We take the prior on a 0,L as a 0;L � UðA a 0;L ; B a 0;L Þ where A a 0;L � 0 and B a 0;L � 3:5 based on benchmark values in [34].
The function β o,R (t) has the form in (14) which depends on coefficients a 0,R , a 1 and a 2 . For the prior on a 0,R , we consider two cases: When using the constant rate submodel to determine if an overall reduction in transmission has occurred or not, the prior on a 0,R is chosen as a 0;R � UðA a 0;R ; B a 0;R Þ with A a 0;R < 0 and B a 0;R > 0. This ensures that the prior support includes both positive and negative values to represent possible increase and decrease in the transmission rate. On the other hand, if the full model of (14) is considered, a 0,R is chosen deterministically (hence, no prior) to ensure continuity of the infection process before and after T � . Specifically, a 0,R is chosen so that the incidence rate of new infections, h(S(t), I o (t), I c (t)), in W L and W R coincide at t = T � . In the case of the full change-point model, a 1 is given a uniform prior with support on ½A a 1 ; B a 1 �. We take A a 1 < 0 and B a 1 ¼ 0 if a reduction in disease transmission is first established by the constant rate submodel. For the prior on a 2 , note that 0 � a 2 � e a 0;R . Hence, a reasonable prior to put on a 2 is Uð0; e a 0;R Þ.
Next, we describe the prior assignment on the initial number of symptomatic infectious and exposed individuals, denoted by i 0 and e 0 , respectively. Given δ and γ o,L , whose priors were elicited earlier, the prior on i 0 is taken as The half-width D i 0 > 0 is fixed to be reasonably large to represent the maximum extent of prior uncertainty around m i 0 . Similar to i 0 , the prior on the initial number of exposed individuals, e 0 , is taken as e 0 � Uðm e 0 À D e 0 ; m e 0 þ D e 0 Þ for the midpoint m e 0 and half-width D e 0 . The midpoint m e 0 is determined similarly as before using the value of the second derivative of the fitted polynomial evaluated at time t = 0 and the generated value of δ. The halfwidth D e 0 > 0 is once again fixed to be large to represent the maximum extent of prior uncertainty around m e 0 .
The model fitting experiments were conducted either by choosing default values for the lower and upper bounds, or by choosing values guided by literature when such information is available. For all other hyperparameters where no information is available, reasonable values were chosen at the start of the experiments, and the hyperparameters were adjusted by repeated trial and error runs to achieve the best fit models to the observed data akin to the empirical Bayes approach [35], at least informally. Table 1 gives the values of the hyperparameters used to obtain the plots and inference in the Results section.
Computational algorithm. Based on the negative binomial likelihood and prior elicitation in The Likelihood and Assignment of Priors, respectively, the posterior of Θ can be derived using Bayes theorem as pðY j DÞ / pðYÞ where D ¼ fD t : t 2 W L [ W R g is the collection of observed daily cases from March 1st until April 28, 2020, L(Θ) is the entire likelihood for D and π(Θ) is the prior on Θ as described in Table 1. Hyperparameter values for the reported results.

PLOS ONE
Assignment of Priors. Bayesian inference is carried out using Monte Carlo importance sampling. A total of M samples, Θ i for i = 1, 2, � � �, M, are generated from the prior specification π (Θ). The likelihood, L(Θ i ), is computed for each Θ i and normalized to obtain weights can now be used to provide an (unbiased) estimate of E(g(Θ)) and its corresponding variance. The above importance sampling methodology is easy to code and implement but one has to be cautious so that the distribution of weights is not too extreme. If the prior on Θ is chosen such that it spans a considerably larger region compared to where the likelihood concentrates its mass, the distribution of weights will be become extreme. As a result, almost all of the weights fw i g M i¼1 will be extremely small (close to zero) and only a few will have large positive values. Resampling these weights will select Θ isamples associated to these large weights multiple times and this will degrade the estimation of E(g(Θ)). Thus, it is important to judiciously choose the prior ranges, where possible, so that it adequately covers regions of high likelihood values. This is achieved in the Results section by running different selections of the hyperparameters and judiciously adjusting the range of the priors (where possible and respecting the information on the disease and intervention dynamics) to obtain the best fitting models. Based on this importance sampling method as well, an approximation to the Maximum-a-Posteriori (MAP) estimator of Θ can be found sincê In other words, as M becomes large, the sampling based maximumŶ M will be close to the true MAP estimateŶ MAP with probability tending to 1. The approximation in (21) could fail if any one of the assumptions made above is not valid: The posterior pððC; T � MAP ÞjDÞ could be discontinuous at C = C MAP orŶ MAP may not lie in the interior of supp[π(Θ|D)]. However, discontinuity is not a concern here as pððC; T � MAP Þ j DÞ is a continuous (in fact, differentiable) function of C. To ensure thatŶ MAP lies in the interior of supp[π(Θ|D)], the experiments were run based on judicious choices of the prior hyperparameters which has been mentioned earlier as well. The resampled fY � i g M i¼1 values are used for plotting and providing a quantification of uncertainty aroundŶ MAP (see, for example, Figs 10 and 11 in the Results section). Numerical measures of uncertainty can be obtained, for example, by computing the usual sample variance based on fgðY � i Þg M i¼1 to describe the variablity around the sample mean which in this case is an unbiased estimate of E(g(Θ)).
The Bayesian computational algorithm described in this section is developed using R and RStudio 1 . The likelihood L(Θ) for a specific Θ is evaluated numerically from the solution of the ODE system (6)-(12) obtained using the deSolve package in R.

Results
The study period is from March 1, 2020 until April 28, 2020 with T 0 = 0 and T 1 = 59. The impacts of the Sri Petaling gathering and MCO implementation are analyzed here using the proposed model described in The modified SEIR model section. Reported daily cases at the national level as well as for Selangor and Sarawak are used for model fitting and parameter estimation. Note that all states in Malaysia implemented MCO Phases 1-3 using the same guidelines and protocols. Thus, one can gauge the impact of the Sri Petaling gathering on COVID-19 spread in Malaysia based on a comparison between states and the national experience. Here, Selangor and Sarawak are chosen as two such representative states with high and low population densities, respectively.
First, we investigate if the MCO implementation had an overall effect of reducing COVID-19 transmission rates. For this purpose, the constant rate submodel of (14) is used and the prior on a 0,R in W R is chosen to be uniform with support on both positive and negative values. The Bayesian inference methodology described in the Bayesian Inference section is carried out with M = 50, 000 to obtainŶ MAP and samples Y � i from the posterior of Θ in (19). The curves of F(t), defined as  (16)) is obtained for t in W L and W R . The plots of Δ t versus t are shown in Fig 4. A consistent feature of the plots in all three panels is that they first increase for time points t � T � followed by a significant drop for t > T � . Hence, we conclude that an exponential rise in cases occurred right after the completion of the Sri Petaling gathering on March 1st 2020, and the implementation of the MCO successfully stemmed the exponential rise at the national, Selangor and Sarawak levels.
With reduced disease transmission established in W R , we next proceed to utilize the functional form (14)  . Variability estimates can be obtained for all parameters and their functions. As an illustration, we demonstrate the extent of variability inherent in the posterior visually for the F(t) curve (see (17)) for t 2 [T 0 , T 1 ]. This  Further results from the Bayesian analyses are summarized in Tables 2-4. These tables give the MAP estimates of parameters and their corresponding 95% credible intervals for Malaysia, Selangor and Sarawak, respectively. The correlation plots of selected pairs of parameters are provided in Figs 7 and 8. From these figures and others which we did not provide due to space limitations, we find that the posterior samples of parameter pairs are not strongly correlated with each other, except for the pairs for which correlation was induced apriori via the joint prior elicitation.
We provide a summary of the salient findings of our inference procedure. The symptomatic and asymptomatic infectious periods as well as the incubation periods are found to be around 6-8 days for Malaysia, Selangor and Sarawak. These findings are similar to values reported in the literature for other countries; see, for example, [18][19][20]32]. Change-points T � are estimated not too far away from the date of MCO implementation, March 18th 2020. For Malaysia, T � = 17 is the MAP estimate which corresponds to March 18th, 2020, and the associated 95% credible interval is (17,23). For Selangor, the MAP estimate of T � is T � = 24 (March 25th 2020)

PLOS ONE
with (20,28) being the 95% credible interval. For Sarawak, the transition date is less precise. The MAP estimate is T � = 28 (March 29th 2020) but the 95% credible interval (19,33) is much larger indicating higher uncertainty in T � . This can be attributed to the fact that the trajectory of reported case numbers for Sarawak shows a slower and more gradual increase, then decrease, compared to Malaysia and Selangor (see Fig 5). The plots of Δ t versus t for Malaysia, Selangor and Sarawak are provided in Fig 9. We note that all three panels in Fig 9 indicate a decay in the transmission rates after T � . The measure Δ t is calculated to be approximately 3.39, 2.50 and 2.08 for Malaysia, Selangor and Sarawak, respectively, at the start of W L , that is, when t = T 0 . Starting from t = T 0 , Δ t showed an increase in W L , reaching values of4.55, 3.42 and 2.50 at t = T � , respectively, for Malaysia, Selangor and Sarawak. Hence, before the MCO was introduced, Selangor achieved a rate of increase higher than that of Sarawak. After T � , Δ t for Malaysia, Selangor and Sarawak registered a decay demonstrating the effectiveness of the MCO. Δ t declined sharply to a value around0.33, 0.0003 and 0.31, respectively, for Malaysia, Selangor and Sarawak at t = T � + 10, and after that, it declined more gradually to its corresponding asymptote. Based on Δ t , it is seen that the initial transmission rates tend to be higher for areas with a higher population density (comparing Selangor and Sarawak). On the other hand, based on the MAP estimates of a 1 in W R of −1.756 and −0.325 for Selangor and Sarawak, respectively (see Tables 3 and 4), higher population density areas also experience a faster decline in the transmission rates under an effective implementation of the MCO. Although the MAP estimate of a 1 for Malaysia (a 1 = −0.371 from Table 2) is not as negative as it should be, we will show in the next section that a redistribution of cases further improves this estimate of a 1 and brings it closer to that of Selangor.  Generally speaking, such outliers highlight a mismatch between the proposed model and the observed data, and point towards model inadequacy. However, we wish to emphasize that this

PLOS ONE
is not the case here. One key consideration is the effect of delay, that is, whether or not the reported case numbers coincide with the day of testing. It is highly likely that a lag occurred in the reporting of cases since the COVID-19 experience was new to Malaysia. Based on the report [36], it is reasonable to assume that delays in testing and reporting were expected during the initial days of the COVID-19 outbreak in Malaysia. The peak on Day 14 seem to suggest a significant backlog of reporting of cases coupled by the fact that the day was a Monday, the first day of the week. The effects of reporting delays on observed case trajectories and parameter inference are illustrated here based on a simulation study. A delay-in-reporting model based on the multinomial distribution is assumed: Let X * Mult(D t ;p 1 , p 2 , � � �, p K ), where X = (X 1 , X 2 , � � �, X K ) with K = 5 and X k is the number of cases (out of the total reported cases on day t, D t ) that is to be redistributed to day t − k + 1 for k = 1, 2, � � �, K. The probabilities p k , k = 1, 2, � � �, K are chosen according to a truncated geometric distribution taking values in k − 1 for k = 1, 2, � � �, K with success probability 0.4. The cases redistribution model is applied to new cases reported from Day 10 until Day 15. The redistributed reported case trajectory, the best fit curves and associated variabilities are shown in Fig 11. Comparing Figs 10(a) and 11(b), one can immediately notice that the reported case numbers in Fig 11(b) are better explained by the variabilities of the underlying model and the negative binomial likelihood. Parameter estimates and credible intervals for the redistributed case numbers are given in Table 5. The new infectious periods are still within the 6-8 day range and are consistent with previously reported literature. The redistribution of case numbers have also reduced the uncertainty around the MAP value of T � = 20: The credible interval for T � in Table 5 is narrower compared to that in Table 2. The MAP estimate of a 1 is now −1.693, which is about halfway between Selangor and Sarawak.
A final point to be discussed is our preference for the negative binomial likelihood compared to the more traditional Poisson likelihood for modelling COVID-19 case numbers. Our initial investigation used the Poisson likelihood for reported case numbers but we found that the underlying model together with the Poisson likelihood was not able to capture inherent variabilities in the observed data. Hence, we opted for the overdispersed negative binomial

PLOS ONE
likelihood which was able to satisfactorily represent the observed data via its overdispersion parameter τ. This is evidenced by the variability bands presented in Figs 10 and 11(b) which successfully enclose most of the reported case numbers. This coverage is further improved in Fig 11(b) by a redistribution of delayed cases. We also provide the loglikelihood values corresponding to the Poisson and negative binomial observation models in Table 6 for Malaysia (with original case numbers), Malaysia (with redistributed case numbers), Selangor and Sarawak. Note that the negative binomial loglikelihood values are consistently larger than the Poisson counterparts indicating a better model fit to observed data.

Conclusion
Quantitative models and assessment of the impacts of the Sri Petaling gathering and implementation of MCO on COVID-19 spread in Malaysia are developed in this paper. The MCO implementation is found to be highly effective in containing (an exponential rise of) the

PLOS ONE
COVID-19 outbreak in Malaysia. The analysis here quantitatively demonstrates how quickly transmission rates fall under effective NPI implementation within a short time period. Higher disease transmission is found in Selangor (a state with higher population density) compared to Sarawak. We also found that under MCO, the decline in transmission was faster in Selangor compared to Sarawak. The rise and fall of disease transmission in Selangor mirrored the national level whereas Sarawak showed a more gradual increase and decrease in COVID-19 transmission. The change points were mostly found to be close to the date of MCO implementation (18th March 2020) although Sarawak exhibited a larger uncertainty around that date due to its gradual and slower increasing and decreasing trends of reported case numbers. Our study developed a new model to represent COVID-19 spread in Malaysia that accounts for heterogeneity and asymptomatic transmissions. We found that reported case numbers in Malaysia exhibited large variabilities which can possibly be attributed to a delay in reporting, particularly during the early stages of the pandemic as the experience with handling COVID-19 was new to the country. Nevertheless, the model developed here together with the overdispersed negative binomial likelihood are able to capture salient features of COVID-19 spread in Malaysia and provide reliable quantitative assessments even under the challenges of limited and delayed data.