
Modeling the dynamics of the COVID-19 population in Australia: A probabilistic analysis

  • Ali Eshragh,

    Roles Conceptualization, Data curation, Formal analysis, Methodology, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliations School of Mathematical and Physical Sciences, University of Newcastle, Newcastle upon Tyne, NSW, Australia, International Computer Science Institute, University of California-Berkeley, Berkeley, CA, United States of America

  • Saed Alizamir,

    Roles Conceptualization, Methodology, Writing – original draft, Writing – review & editing

    Affiliation School of Management, Yale University, New Haven, CT, United States of America

  • Peter Howley,

    Roles Methodology, Writing – original draft, Writing – review & editing

    Affiliation School of Mathematical and Physical Sciences, University of Newcastle, Newcastle upon Tyne, NSW, Australia

  • Elizabeth Stojanovski

    Roles Methodology, Writing – original draft, Writing – review & editing

    Affiliation School of Mathematical and Physical Sciences, University of Newcastle, Newcastle upon Tyne, NSW, Australia


3 Aug 2022: The PLOS ONE Staff (2022) Correction: Modeling the dynamics of the COVID-19 population in Australia: A probabilistic analysis. PLOS ONE 17(8): e0272762.


The novel coronavirus causing COVID-19 arrived on Australian shores around 25 January 2020. This paper presents a novel method for dynamically modeling and forecasting the COVID-19 pandemic in Australia with a high degree of accuracy and in a timely manner using limited data; such a model is a valuable resource for guiding government decisions on societal restrictions on a daily and/or weekly basis. The “partially-observable stochastic process” used in this study predicts not only future case counts with extremely low error, but also the percentage of unobserved COVID-19 cases in the population. The model can further help policy makers assess the effectiveness of alternative scenarios in their decision-making processes.

Introduction: COVID-19 pandemic

The novel betacoronavirus later named SARS-CoV-2, which causes the disease COVID-19, was first reported in late December 2019 in Wuhan City, China [1]. Early reports indicated that a wet market in Wuhan was the origin of the outbreak, which affected approximately 66% of market staff and caused pneumonia-like symptoms of fever, dry cough and fatigue [2]. The market closed on 1 January 2020, following an epidemiological alert announced by the local health authority in China on 31 December 2019. The infection was reported to have spread to many cities across China during January 2020, with thousands of people in China becoming infected, while also spreading rapidly worldwide to countries including Thailand, Japan, Korea, Vietnam, Singapore, the United States and Germany [2]. The World Health Organization (WHO) declared the outbreak a pandemic on 11 March 2020 and, as of 15 June 2020, WHO had reported a total of 7,823,289 confirmed cases of COVID-19 globally, with 431,541 related deaths across at least 216 countries.

Pandemic in Australia

According to official reports, the novel coronavirus causing COVID-19 arrived on Australian shores around 25 January 2020. From 5 March 2020, the number of new cases grew rapidly, reaching over 300 cases daily in late March. Following lockdown restrictions imposed by the Australian Government from mid-March, the daily number of new cases started declining from early April, reaching approximately 20 cases daily by late April.

Preventative measures to minimize transmission were increasingly imposed by the Australian Government from 1 February 2020, with foreign nationals from mainland China banned from entering Australia and 14 days of self-quarantine imposed on citizens returning from China. Travel restrictions were subsequently imposed, with all travelers arriving in Australia required to self-isolate for 14 days from 15 March 2020 and fines of up to AUD$50,000 for non-compliance. A general travel ban was imposed from 20 March 2020, with Australia closing its borders to all non-residents.

A human biosecurity emergency was declared in Australia on 18 March 2020, with a social distancing rule of 4 square meters per person imposed from 20 March 2020. From 22 March 2020, non-essential services were required to close, and some states closed their borders, allowing only their own residents to return. From 23 March 2020, all places of social gathering were closed, with cafes and restaurants limited to takeaway. From 29 March 2020, public gatherings were limited to two people if they were not from the same household, and there were only four acceptable reasons for leaving home: shopping for essentials, medical or compassionate needs, exercise, and work or education.

Some subtle variation in the timing of these measures occurred between States and Territories, with State Governments and Territory officials also imposing additional restrictions in response to State- or Territory-specific data. For example, some states introduced social distancing measures in schools from 15 March 2020, preventing students and staff from congregating in large numbers, and several university graduations, conferences, events, classes and student-organized events were also canceled.

At the time of writing, the Government had a three-stage plan to reopen Australia by July 2020. The three stages progressively increase the number of permissible visitors in homes and public places, while still maintaining hygiene and social distancing, and reopen various places of employment and social interaction (e.g., restaurants and community centers in Stage 1). Travel restrictions are lifted alongside these: local and regional travel in Stage 1, interstate travel in Stage 2, and partial international travel, principally within the Pacific region, in Stage 3. The States and Territories invoke these stages at slightly different times to reflect their local experiences and numbers of infected people.

Vital need for modeling

The COVID-19 outbreak and the ensuing pandemic have created an unprecedented challenge and a worldwide response, compelling every nation to deploy its utmost resources toward combating the disease while managing the economic and social impacts. Tracking the epidemic, estimating the size of the infected population, and assessing the effects of potential guidelines and restrictions have become critical priorities for most governments around the globe, as they have immediate ramifications for all subsequent policy interventions (e.g., see [3–7]).

Stochastic processes are designed to handle change that involves randomness and uncertainty, both of which are paramount in the COVID-19 outbreak. In particular, partially-observable stochastic processes explicitly account for incomplete knowledge of the state of a system. A common form of partial observation is one in which the state of the system, or of each component of the system, can be observed only with a certain probability. An example is biological invasions, where an invasive species, or an individual of the species, can be detected only with a certain probability in each survey (e.g., see [8]). Another application is medical testing, where a test can give a false negative, so that an infected individual's infection is detected only with a certain probability when the test is administered (e.g., see [9, 10]). Further applications of partially-observable stochastic processes include (but are not limited to) pattern recognition [11], digital signal analysis [12], and the study of biological processes [13, 14].

In this paper, we focus on modeling the early stages of the COVID-19 outbreak in Australia, and provide an epidemic model that complements others in use by providing extremely accurate estimates of COVID-19 transmission in Australia, including estimates of hidden cases, on timeframes relevant to policy implementation. More precisely, we use a special class of stochastic processes, the partially-observable pure birth process, to model the dynamics of the COVID-19 population in Australia. In the present epidemiological context, the main sources of uncertainty are the stochastic dynamics of the system and the sampling structure, in which each infected individual is detected only with a certain probability. Our model particularly suits situations where the number of infected people is small compared with the total at-risk population, which is the case for most regional and national jurisdictions. This is a critical phase of disease spread that requires policy measures to control growth effectively. The effectiveness of these policies, in turn, depends heavily on the quality of the models used and the precision of their estimates. The following two features of our model establish its benefits relative to other epidemiological models:

  1. Robust predictive performance: with only small amounts of data, the (subsequently released) actual values are forecast very well;
  2. The capability to estimate not only the growth rate of the COVID-19 cases, but also the percentage of unobserved cases, which represent those in the population who have not been officially diagnosed.


The key contributions of this work include (but are not limited to):

  • identifying two structural break points in the numbers of new cases, at which the dynamics of the COVID-19 population change: the first, a major break point on 27 March 2020, is one week after the implementation of the “lockdown restrictions”, and the second, a minor break point on 18 April 2020, is one week after the “Easter break”;
  • forecasting the future daily numbers of new cases up to around 8 weeks in advance with extremely low mean absolute percentage errors (MAPEs) using a relative paucity of data, namely, MAPE of 1.53% using 20 days of data to predict the number of new cases for the following 6 days, MAPE of 0.43% using 34 days of data to predict the number of new cases for the following 14 days, and MAPE of 0.26% using 55 days of data to predict the number of new cases for the following 52 days;
  • estimating approximately 33% of COVID-19 cases as unobserved by 26 March 2020, reducing to less than 5% after implementing the Government’s constructive restrictions;
  • predicting that the growth rate, prior to the Government’s implementation of restrictions, was on a trajectory to infect numbers equal to Australia’s entire population by 24 April 2020;
  • estimating that the declining parameter on the growth rate of the COVID-19 population is 0.820 after the first break point, rising to 0.979 after the second break point;
  • advocating the outlined stochastic model as practically beneficial for policy makers when considering implementation and easing of virus restrictions due to the demonstrated sensitivity of the dynamics of the COVID-19 population in Australia to both major and minor system changes.

Modeling: Dynamics of the COVID-19 population in Australia

A continuous-time Markov population process (CTMPP) is a class of stochastic processes often used to model biological phenomena (e.g., see [15–17]). The study of a CTMPP under partial observations, referred to as a “partially-observable continuous-time Markov population process” (PO-CTMPP), is of interest for the present study. A special class of PO-CTMPPs is the partially-observable pure birth process (PO-PBP), in which the underlying model is a stochastic “pure birth process”, while the observations are partial and follow a binomial distribution given the true population size.
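To illustrate this structure, the following sketch simulates a hidden pure birth process and then observes it binomially. It is not the authors' C implementation. It assumes a constant per-capita rate λ treated as a per-day rate, a constant detection probability p (set here to the Step 1 estimates 0.235 and 0.67 from Table 1), observations at whole days, and a hypothetical initial hidden size of 10.

```python
import random


def simulate_pure_birth(x0, rate, days, rng):
    """Simulate a linear pure birth (Yule) process in which each infected
    individual independently causes new infections at per-day rate `rate`.
    Returns the hidden population size at t = 0, 1, ..., days."""
    if x0 < 1 or rate <= 0:
        raise ValueError("need x0 >= 1 and rate > 0")
    x, t = x0, 0.0
    sizes = [x0]
    next_day = 1
    while next_day <= days:
        # With x individuals the total birth rate is rate * x, so the
        # waiting time to the next birth is exponential with that rate.
        t += rng.expovariate(rate * x)
        # Record the size at every whole day that passes before this birth.
        while next_day <= days and next_day <= t:
            sizes.append(x)
            next_day += 1
        x += 1
    return sizes


def observe(sizes, p, rng):
    """Binomial partial observation: each hidden case is detected
    independently with probability p at each observation time."""
    return [sum(rng.random() < p for _ in range(x)) for x in sizes]


rng = random.Random(2020)
hidden = simulate_pure_birth(x0=10, rate=0.235, days=20, rng=rng)  # hypothetical x0
observed = observe(hidden, p=0.67, rng=rng)
```

The hidden sizes never decrease, while each observed value is a binomial thinning of the corresponding hidden value; it is this gap between `hidden` and `observed` that the PO-PBP is designed to estimate.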

Bean et al. [18] extensively studied the theoretical properties of a PO-CTMPP and a PO-PBP, and derived the conditional probability distribution of the true state of the system and future values of partial observations, given the history of partial observations. Furthermore, they showed that, unlike a pure birth process, a PO-PBP is not Markovian of any order. Bean et al. [19] applied these results to find the Fisher Information for a PO-PBP and derived the optimal experimental design. The details of these results are summarized in S1 Appendix.
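The results of Bean et al. are summarized in S1 Appendix and are not reproduced here. As background only, the textbook formulas below, which assume a constant per-capita rate λ, a constant detection probability p and a known initial size x_0, show where the infinite sums in the likelihood come from: the hidden size X(t) of a linear pure birth process is negative binomial, and a binomial observation Y(t) of it must be averaged over every possible hidden size.

```latex
% Linear pure birth (Yule) process, per-capita rate \lambda, X(0) = x_0 \ge 1
P\{X(t) = n \mid X(0) = x_0\}
  = \binom{n-1}{x_0-1} e^{-\lambda x_0 t}\left(1 - e^{-\lambda t}\right)^{n - x_0},
  \qquad n \ge x_0,
\qquad
\mathbb{E}[X(t)] = x_0 e^{\lambda t}.

% Binomial partial observation with detection probability p
P\{Y(t) = y \mid X(t) = n\} = \binom{n}{y} p^{y} (1-p)^{n-y}, \qquad 0 \le y \le n.

% Marginal probability of an observation: an infinite sum over hidden sizes
P\{Y(t) = y \mid X(0) = x_0\}
  = \sum_{n = \max(x_0, y)}^{\infty}
    \binom{n}{y} p^{y} (1-p)^{n-y}
    \binom{n-1}{x_0-1} e^{-\lambda x_0 t}\left(1 - e^{-\lambda t}\right)^{n - x_0}.
```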

We use these results here to model and analyze the dynamics of the COVID-19 population in Australia. Due to practical limitations described in S1 Appendix, not all infected cases may be observed each day, implying that the officially reported confirmed cases are only a partial representation of the actual cases. Therefore, a PO-PBP is better suited than a fully observed model to explain the complex dynamics of the COVID-19 population.

Remark 1 We assume that there is no shortage of “COVID-19 test kits” in Australia. If this assumption were false, the model would lose applicability once the required sampling rate reached the limit imposed by kit shortages. However, according to the Australian Prime Minister's statements during the pandemic, this has not happened to date in Australia.

The data for this study are obtained from the “daily WHO reports” provided on the WHO website. There are several official sources of COVID-19 data, with minor differences in their reports; although the structure of our model and its main outputs stay consistent across data from different sources, the numerical results might vary slightly. All algorithms are coded in C, and the outputs are analyzed in Matlab R2019a.

The first step in data analysis is visualization. Fig 1 displays the cumulative new COVID-19 cases in Australia from 1 March to 15 June 2020 and shows two structural break points where the population dynamics change, mainly attributable to new policies and exogenous factors:

  1. The first break point occurs on 27 March 2020. This point corresponds to one week after implementation of the “lockdown restrictions”. As shown in Fig 1, this is a crucial break point where the curvature of the cumulative new cases dramatically changes from a convex exponential growth to a concave stable pattern. Furthermore, it is observed that the growth rate starts declining after this break point.
  2. The second break point occurs on 18 April 2020 which corresponds to a week after the “Easter break” in Australia. Unlike the first break point, the second one does not transform the curvature or stability of the graph, but instead shifts it up slightly and slows down the speed at which the growth rate parameter is declining.
Fig 1. Cumulative new COVID-19 cases in Australia for the whole span of the data.

Two structural break points in the dynamics of the COVID-19 population are evident: a major break point on 27 March 2020, and a minor break point on 18 April 2020.

Remark 2 Fig 1, together with these two structural break points, indicates that the dynamics of the COVID-19 population in Australia appear very sensitive to major and minor system changes, which policy makers should seriously consider when easing virus restrictions.

To gain insights into the complex nature of the population for modeling its dynamics, we carry out our analysis in three nested time steps:

  • Step 1: 1–26 March 2020, until the first break point (cf. Section Step 1: 1–26 March 2020);
  • Step 2: 1 March–17 April 2020, until the second break point (cf. Section Step 2: 1 March–17 April 2020);
  • Step 3: 1 March–15 June 2020, the whole span of the data (cf. Section Step 3: 1 March–15 June 2020).

The general scope of our modeling for each step is as follows. We fit a PO-PBP to the data to model the dynamics of the COVID-19 population over the designated period. A PO-PBP has two main parameters: the growth rate λt and the observation probability pt (cf. S1 Appendix). We construct the likelihood function of partial observations according to S1 Eq together with S1 Corollary in S1 Appendix, and truncate the infinite sums involved using S2 Eq and S1 Proposition in S1 Appendix. The logarithm of the likelihood function is then maximized over the range of parameters to find their maximum likelihood estimates (MLEs). Finally, S2 Proposition in S1 Appendix is applied to predict the future values of partial observations.
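The likelihood used in this paper is built from S1 Eq and S1 Corollary in S1 Appendix, which are not reproduced here. As a hedged sketch of the general idea only, the code below assumes a simpler hidden-Markov formulation: daily observations are conditionally independent binomial draws given the hidden size, λ and p are constant, the initial hidden size x0 is known, and hidden sizes are truncated at a hypothetical bound `n_max`. Although the observations alone are not Markovian (as Bean et al. show), the hidden process is, so a forward recursion over hidden sizes yields the likelihood. All function names are illustrative.

```python
import math


def log_yule_transition(m, n, rate, dt):
    """log P(X(s+dt) = n | X(s) = m) for a linear pure birth (Yule) process
    with per-individual rate `rate` > 0; requires m >= 1 and dt > 0."""
    if n < m:
        return -math.inf
    # Negative binomial form: C(n-1, m-1) e^{-rate*dt*m} (1 - e^{-rate*dt})^(n-m)
    log_choose = math.lgamma(n) - math.lgamma(m) - math.lgamma(n - m + 1)
    log_tail = (n - m) * math.log(-math.expm1(-rate * dt)) if n > m else 0.0
    return log_choose - rate * dt * m + log_tail


def log_binom_obs(y, n, p):
    """log P(Y = y | X = n) when each of n infected individuals is
    observed independently with probability p (0 < p <= 1)."""
    if y < 0 or y > n:
        return -math.inf
    if p >= 1.0:
        return 0.0 if y == n else -math.inf
    log_choose = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    return log_choose + y * math.log(p) + (n - y) * math.log1p(-p)


def log_sum_exp(values):
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def po_pbp_loglik(obs, x0, rate, p, n_max):
    """Forward-algorithm log-likelihood of daily partial observations
    obs[0], ..., obs[T] under the hypothetical formulation described above,
    with hidden sizes truncated to x0, ..., n_max."""
    states = range(x0, n_max + 1)
    # log_alpha[n] = log P(obs[0..k], X_k = n)
    log_alpha = {n: (log_binom_obs(obs[0], n, p) if n == x0 else -math.inf)
                 for n in states}
    for y in obs[1:]:
        new_alpha = {}
        for n in states:
            # The hidden size can only grow, so predecessors satisfy m <= n.
            terms = [log_alpha[m] + log_yule_transition(m, n, rate, 1.0)
                     for m in range(x0, n + 1)]
            new_alpha[n] = log_sum_exp(terms) + log_binom_obs(y, n, p)
        log_alpha = new_alpha
    return log_sum_exp(list(log_alpha.values()))
```

A crude grid-search MLE is then `max(grid, key=lambda rp: po_pbp_loglik(obs, x0, rp[0], rp[1], n_max))` over a hypothetical grid of (λ, p) pairs. Raising `n_max` until the log-likelihood stops changing is one simple check that the truncation is adequate; the cost grows roughly with the square of the number of retained hidden states per observation.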

Furthermore, to evaluate the accuracy of predictions generated by the estimated models, the dataset for each nested time step is partitioned into two mutually exclusive segments: the training data, used to estimate the parameters and forecast future values, and the test data, used to evaluate the accuracy of predictions. To measure the latter, we use the mean absolute percentage error, introduced in Definition 3.

Definition 3 ([20]) The mean absolute percentage error (MAPE) is defined as

$$\mathrm{MAPE} = \frac{100\%}{h}\sum_{t=1}^{h}\left|\frac{x_t - f_t}{x_t}\right|,$$

where $f_t$, $x_t$ and $h$ are the forecasted values, actual values, and prediction horizon, respectively.
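The MAPE in Definition 3 averages the absolute forecast error relative to each actual value over the h-step horizon and expresses it as a percentage. A minimal sketch, with a guard for zero actual values (where the ratio is undefined); the function name is illustrative:

```python
def mape(forecast, actual):
    """Mean absolute percentage error (in percent) over horizon h = len(actual)."""
    if len(forecast) != len(actual) or not actual:
        raise ValueError("forecast and actual must be non-empty and of equal length")
    if any(x == 0 for x in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    h = len(actual)
    return 100.0 / h * sum(abs((x - f) / x) for f, x in zip(forecast, actual))
```

For example, forecasts of 110 and 90 against actual values of 100 and 100 are each 10% off, giving a MAPE of 10%.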

Remark 4 The quality of the predictions reported in the following three sections is robust to the choice of training and test data, provided there are enough observations in the training set (cf. Fig 5).

Step 1: 1–26 March 2020

This step covers the beginning of the pandemic in Australia, when the COVID-19 population was growing exponentially. The data from 1–20 March 2020 are used as the training data to estimate the parameters of the model, and the data from 21–26 March 2020 are used as the test data to evaluate the accuracy of predictions. For this period, we consider the following dynamics for the two parameters of the underlying PO-PBP, the growth rate λt and the observation probability pt: and where α1 > 0 and β1 > 0 are constant coefficients, λ and p are unknown initial values of the parameters, and t0 is the date of the first observation (i.e., 1 March 2020). Constructing the likelihood function and maximizing it over the parameters yields the MLEs in Table 1. Since the MLEs of α1 and β1 both turn out to equal 1, the estimates MLE(λt) = 0.235 and MLE(pt) = 0.67 are constant for all t in this range.

By applying S2 Proposition in S1 Appendix, we can predict the expected values of partial observations over the span of the test data (i.e., 21–26 March 2020). The results are shown in Fig 2, where the training data, test data and predictions are displayed by the black solid plot, red dot-dash plot and blue solid plot, respectively. The predicted values fit the actual test data closely, which is confirmed numerically by MAPE = 1.53%.

Fig 2. Cumulative new COVID-19 cases in Australia for three categories: (i) training data spanning 1–20 March 2020 (in black), (ii) test data spanning 21–26 March 2020 (in red), and (iii) predicted values over the period of test data (in blue) with MAPE = 1.53%.

The model additionally suggests that, prior to the Government's implementation of restrictions, the growth rate was on a trajectory to reach infection numbers equal to Australia's entire population by 24 April 2020, a prediction that would probably have been softened only somewhat by limiting factors such as our island status. This underscores the effectiveness of the Government's policies and restrictions. Fig 3 displays the semi-log plot (with a logarithmic y-axis) of the cumulative new COVID-19 cases in Australia (1 March–15 June 2020) and the model predictions (21 March–24 April 2020) in black and blue, respectively. This figure reveals the exponential growth of the COVID-19 population before the impact of the lockdown restrictions on 27 March 2020 (marked in green).

Fig 3. Cumulative new COVID-19 cases in Australia for the whole span of the data (in logarithmic scale) compared with the predicted values for the scenario in which the lockdown restrictions had not been implemented.

It shows the exponential growth of the COVID-19 population before the impact of lockdown restrictions on 27 March 2020 (marked in green).

Remark 5 Fig 3 shows an initial break point on 6 March 2020. Because it occurs at the outset of the pandemic in Australia, when the number of confirmed cases was still very small, we disregard it as a break point in our analysis; doing so does not significantly influence the results.

Finally, the MLE of the observation probability pt indicates that only 67% of COVID-19 cases in Australia had been observed (i.e., tested and officially diagnosed) by 26 March 2020, and that the hidden 33% of cases had not been recorded or diagnosed officially by this date.

Identifiability analysis.

In statistical inference, several properties bear on the quality of estimates, including identifiability.

Definition 6 ([21]) A statistical inference is called “identifiable”, if different values of the model parameters generate different probability distributions of the observable variables.

If an inference is not identifiable, the likelihood function takes the same value at all equivalent parameter values. In that case, one would expect to see a ridge of roughly constant values on the likelihood surface as the parameters change.

To assess the identifiability of the MLEs of the main parameters λ and p given in Table 1, we plot the log-likelihood function of partial observations in terms of these two parameters. As depicted in Fig 4, the log-likelihood function shows clear curvature rather than a flat ridge, indicating that these estimates are identifiable.

Fig 4. The log-likelihood function of partial observations in terms of the two parameters λ and p.

This plot illustrates the identifiability of the estimates provided in Table 1.

Remark 7 A common heuristic regards a change of at least 2 units in the log-likelihood function as worthwhile, implying the identifiability of the estimates. Considering the locus of points in (λ, p) that remain within 2 units of the log-likelihood at (MLE(λ) = 0.235, MLE(p) = 0.67), the locus allows MLE(p) to range within [0.55, 0.75] and MLE(λ) to range within [0.225, 0.245]. We also tried a few other points within these two ranges as estimates of λ and p, and observed no significant difference in the MAPE of the predictions.
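To make the 2-unit rule in Remark 7 concrete, the sketch below scans a grid of (λ, p) values, keeps every point whose log-likelihood lies within 2 units of the best grid value, and reports the ranges spanned. The quadratic `toy_loglik` surface is hypothetical and stands in for the real PO-PBP log-likelihood; only its peak location (0.235, 0.67) is taken from Table 1. For a single parameter, a drop of about 1.92 units (half the 95% quantile, 3.84, of a χ² distribution with one degree of freedom) gives an approximate 95% likelihood-ratio interval, so the 2-unit rule is close to that calibration.

```python
def likelihood_region(loglik, lambdas, ps, delta=2.0):
    """Grid-based likelihood region: keep every (lambda, p) grid point whose
    log-likelihood is within `delta` units of the best grid point, and
    return the ranges of lambda and p spanned by the kept points."""
    values = {(lam, p): loglik(lam, p) for lam in lambdas for p in ps}
    best = max(values.values())
    kept = [point for point, v in values.items() if v >= best - delta]
    lam_kept = [lam for lam, _ in kept]
    p_kept = [p for _, p in kept]
    return (min(lam_kept), max(lam_kept)), (min(p_kept), max(p_kept))


def toy_loglik(lam, p):
    # Hypothetical quadratic surface peaked at (0.235, 0.67); illustration only.
    return -((lam - 0.235) / 0.005) ** 2 - ((p - 0.67) / 0.05) ** 2


lambdas = [round(0.200 + 0.001 * i, 3) for i in range(71)]  # 0.200 .. 0.270
ps = [round(0.50 + 0.01 * j, 2) for j in range(51)]         # 0.50 .. 1.00
lam_range, p_range = likelihood_region(toy_loglik, lambdas, ps)
```

Replacing `toy_loglik` with a real log-likelihood (for instance one evaluated by a forward recursion) gives the locus described in Remark 7; the grid spacing limits how precisely the range endpoints are located.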

Sensitivity analysis.

Due to the very small amount of available data (26 observations in this step, and 76 in total), there is limited opportunity to investigate the robustness of the estimates. Nevertheless, we estimated the MLEs of the parameters for training sets of size 20 to 25 and predicted the future values over the corresponding test sets (of size 6 down to 1). Fig 5 displays the MAPE of these predictions versus the size of the training data. The largest MAPE is 3.18%, which is still very low, indicating high-quality forecasts and demonstrating the robustness of our estimates to the choice of training data.
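The training/test scheme behind Fig 5 can be sketched as an expanding-window loop. The harness accepts any fit-and-forecast function. The `loglinear_forecast` function (a least-squares fit of log counts on time) is a hypothetical placeholder swapped in for illustration and is NOT the PO-PBP, which would require the likelihood machinery in S1 Appendix.

```python
import math


def loglinear_forecast(train, horizon):
    """Hypothetical stand-in model, NOT the PO-PBP: fit log(count) = a + b*t
    by least squares on the training data and extrapolate `horizon` steps."""
    n = len(train)
    logs = [math.log(v) for v in train]
    t_bar = (n - 1) / 2
    y_bar = sum(logs) / n
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in enumerate(logs))
    sxx = sum((t - t_bar) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    return [math.exp(intercept + slope * (n + k)) for k in range(horizon)]


def sensitivity(series, min_train, fit_and_forecast):
    """For each training size from min_train to len(series) - 1, fit on the
    first n_train points, forecast the remainder, and record the MAPE (%)."""
    results = {}
    for n_train in range(min_train, len(series)):
        train, test = series[:n_train], series[n_train:]
        forecast = fit_and_forecast(train, len(test))
        errors = [abs((x - f) / x) for f, x in zip(forecast, test)]
        results[n_train] = 100.0 * sum(errors) / len(errors)
    return results
```

With a 26-point series, `sensitivity(series, 20, loglinear_forecast)` returns MAPEs for training sizes 20 to 25 (test sizes 6 to 1), mirroring the design of Fig 5. Passing the fitting function as an argument keeps the evaluation loop independent of the model being assessed.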

Fig 5. MAPE of predictions vs. size of training data, for training-data sizes from 20 to 25 (test-data sizes from 6 to 1).

Step 2: 1 March–17 April 2020

This step includes the first break point, at which the influence of the lockdown restrictions appears in the growth of the COVID-19 population. The data from 1 March–3 April 2020 are used as the training data to estimate the model parameters, and the data from 4–17 April 2020 as the test data to evaluate the accuracy of predictions. Fig 1 indicates changes to the dynamics of the growth rate after the first break point. Accordingly, we consider the following dynamics for the growth rate and the observation probability of the underlying PO-PBP: and where α2 > 0 and β2 > 0 are new constant coefficients, and bp1 is the date of the first break point (i.e., 27 March 2020). The two new parameters α2 and β2 control the impact of the first break point on the growth rate and observation probability, respectively. Table 2 provides the MLEs of the parameters.

By applying S2 Proposition in S1 Appendix, we can predict the expected values of partial observations over 14 days, the span of the test data from 4–17 April 2020. The results are shown in Fig 6, where the training data, test data and predictions are displayed by the black solid plot, red dot-dash plot and blue solid plot, respectively, and the first break point is marked in green. The predicted values fit the actual test data remarkably well, with MAPE = 0.43%, well under one percent.

Fig 6. Cumulative new COVID-19 cases in Australia for three categories: (i) training data spanning 1 March–3 April 2020 (in black), (ii) test data spanning 4–17 April 2020 (in red), and (iii) predicted values over the period of test data (in blue) with MAPE = 0.43%.

The first break point on 27 March 2020 is marked in green.

Remark 8 The MLEs in Table 2 imply that the observation probability pt starts increasing after the first break point, with an estimated boosting factor of MLE(β2) = 1.06, such that after one week it reaches the upper bound of 1. However, taking into account the identifiability of these point estimates as well as Remark 7, the locus allows MLE(pt) to range within [0.95, 1.00] over the test period of 4–17 April 2020. Hence, the model estimates that the percentage of observed COVID-19 cases from 2–17 April 2020 lies within the range [95%, 100%]. Furthermore, as in Fig 5, the robustness of the estimates is confirmed.
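Purely as an illustration of Remark 8, the sketch below assumes a geometric form capped at 1, pt = min(p0 · β2^d, 1), where d is the number of days since the first break point and p0 = 0.67 is carried over from Step 1. Both the form and the value of p0 are assumptions made here, not necessarily the paper's exact parameterization. Under this form, p reaches about 0.95 on day six and is capped at 1 on day seven (0.67 × 1.06^7 ≈ 1.007), which is consistent with the one-week statement and the [0.95, 1.00] range above.

```python
def observation_probability(p0, beta, days_after_break):
    """Hypothetical geometric dynamics p_t = min(p0 * beta**d, 1), where d is
    the number of days since the break point."""
    return min(p0 * beta ** days_after_break, 1.0)


# p0 = 0.67 (Step 1 estimate, assumed to hold at the break point), beta = 1.06.
trajectory = [observation_probability(0.67, 1.06, d) for d in range(8)]
```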

Step 3: 1 March–15 June 2020

The last step covers the whole span of the data, including both structural break points. The data from 1 March–24 April 2020 are used as the training data to estimate the parameters of the model, and the data from 25 April–15 June 2020 as the test data to evaluate the quality of predictions. Motivated by Fig 1 and Steps 1–2, we define the following dynamics for the growth rate and the observation probability of the underlying PO-PBP: and where γ > 0 is the new inflation parameter on the growth rate after the second break point, and bp2 is the date of the second break point (i.e., 18 April 2020). Table 3 provides the MLEs of the parameters.

By applying S2 Proposition in S1 Appendix, we can predict the expected values of partial observations over the span of the test data (i.e., the 52 days from 25 April–15 June 2020, inclusive). The results are shown in Fig 7, where the training data, test data and predictions are displayed by the black solid plot, red dot-dash plot and blue solid plot, respectively, and the two break points are marked in green. The predicted values fit the actual test data exceptionally closely, with MAPE = 0.26%, well under one percent.

Fig 7. Cumulative new COVID-19 cases in Australia for three categories: (i) training data spanning 1 March–24 April 2020 (in black), (ii) test data spanning 25 April–15 June 2020 (in red), and (iii) predicted values over the period of test data (in blue) with MAPE = 0.26%.

The two break points are marked in green.

Remark 9 The MLEs in Table 3 show that the declining parameter on the growth rate is inflated from MLE(α2) = 0.820 to MLE(α3) = 0.979. Although it is still less than one (indicating that the population is stable and not exploding), such an increase, a consequence of people's interactions during the Easter break as well as the easing of a few restrictions on 2 May 2020, should be taken into account by policy makers when easing COVID-19 restrictions. Moreover, taking into account the identifiability of the estimates as well as Remark 7, the locus allows MLE(α3) to range within [0.940, 1.020]. Hence, α3 could exceed one, implying that the population would start growing again. If this takes place, the population size will quickly resume exponential growth (cf. Fig 3). As in Fig 5, the robustness of the estimates is also confirmed.
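A short derivation may help show why it matters that the plausible range of α3 straddles one. It assumes, purely for illustration, that after a break point b the growth rate decays geometrically in continuous time as λt = λb α^(t−b); the paper's exact parameterization is not reproduced here. For a pure birth process with a time-varying per-capita rate, the mean grows by the exponential of the integrated rate, so α < 1 caps the total growth while α ≥ 1 does not (α = 1 gives plain exponential growth).

```latex
% Time-varying pure birth process: the mean m(t) = E[X(t)] solves m'(t) = \lambda_t m(t)
\mathbb{E}[X(t)] = \mathbb{E}[X(b)]\,\exp\!\Big(\int_b^t \lambda_s\,\mathrm{d}s\Big), \qquad t \ge b.

% Assumed geometric decay after a break point b: \lambda_s = \lambda_b\,\alpha^{s-b}, \ \alpha \ne 1
\int_b^t \lambda_b\,\alpha^{s-b}\,\mathrm{d}s = \frac{\lambda_b\,(\alpha^{t-b} - 1)}{\ln \alpha}
\;\longrightarrow\;
\begin{cases}
  \lambda_b / \ln(1/\alpha) < \infty, & 0 < \alpha < 1 \ (\text{mean plateaus}),\\
  \infty, & \alpha > 1 \ (\text{unbounded growth}),
\end{cases}
\quad \text{as } t \to \infty.

% Plateau multiplier 1/\ln(1/\alpha): about 5.0 for \alpha = 0.820 and about 47 for \alpha = 0.979
```

Under this assumed form, the rise from 0.820 to 0.979 multiplies the limiting exponent by roughly nine, which illustrates why a seemingly small change in α3 is consequential for policy.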

Justification of the estimate MLE(pt).

The Australian Government’s official statistics released on 24 April 2020 “… estimated that approximately 93 per cent of all symptomatic cases are detected in Australia. Australia has the highest reported detection rate in the world.” This aligns well with our estimated value of the detection rate, supporting our conclusions about the number of unobserved cases (cf. Remark 8). Furthermore, the Australian Government successfully controlled the COVID-19 population, reducing the number of cases to near zero, with subsequent outbreaks a consequence of people returning to the country. This trend is also consistent with our model’s prediction, which suggested a trajectory of fewer unobserved (and observed) cases as we approach the endpoint.

Remark 10 The MLEs of the parameters in Tables 1, 2 and 3 change after each pre-identified structural break point. These results confirm our choice of break points, and also suggest an algorithmic way to detect break points. The whole time frame can be partitioned into τ mutually exclusive time intervals (tk−1, tk) for k = 1, …, τ, and the following dynamics for the growth rate parameter λt and the observation probability pt can be constructed: and After estimating all parameters, consecutive intervals showing distinct MLEs for αi and βi may indicate a “structural break point”. To refine the estimated location of each break point, one can merge such consecutive intervals, partition them into finer sub-intervals with new parameters, and re-estimate all parameters together. This refinement can be repeated until the break points no longer change.
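Remark 10 leaves the merging rule informal. As one hypothetical way to make it concrete, the sketch below scans per-interval estimates of a single coefficient (for example αi; βi could be handled the same way), merges consecutive intervals whose estimates lie within a tolerance `tol` of the current run's mean, and reports where a new run starts. The tolerance is an assumption, not a value from the paper; a likelihood-ratio comparison of merged versus separate intervals would be a more principled test.

```python
def candidate_break_points(estimates, tol=0.05):
    """estimates[k] is the estimated coefficient for interval k (k = 0..tau-1).
    Returns (breaks, segments): the indices where a run of clearly different
    estimates begins, and the merged runs of similar consecutive intervals."""
    breaks = []
    segments = [[0]]
    for k in range(1, len(estimates)):
        # Compare against the mean of the current merged run rather than only
        # the previous interval, so slow drift is less likely to be absorbed.
        run = [estimates[i] for i in segments[-1]]
        if abs(estimates[k] - sum(run) / len(run)) > tol:
            breaks.append(k)
            segments.append([k])
        else:
            segments[-1].append(k)
    return breaks, segments
```

For hypothetical per-interval estimates `[1.0, 1.01, 0.82, 0.83, 0.98, 0.97]` with `tol=0.05`, the function flags new runs starting at intervals 2 and 4, i.e. three merged segments, which is the kind of pattern Remark 10 would read as two structural break points.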

PO-PBP: Strengths and limitations

The main strength of the PO-PBP model presented in this paper is the high accuracy of its predictions on a timescale of the order of 4 weeks. It is on timescales of this order that governments consider adjusting restrictions and modifying those adjustments. Such high accuracy on these timescales makes the model not only applicable but also an appealing basis for policy decisions.

A natural question is whether comparable accuracy could be obtained by directly fitting a pure birth process (PBP) to the data segments between break points. We performed this comparison and found that, although the PBP performs reasonably well, it is not as accurate as our PO-PBP. For instance, the MAPE of the PBP's predictions before the lockdown restrictions is 6.03%, compared with only 1.53% for our PO-PBP.

Remark 11 Including an additional parameter, as the PO-PBP does, may be expected to improve performance relative to a PBP, although this is not always true, since the nature of the parameter is critically important. We highlight the weaker performance of the PBP in this context, and how this shortcoming can be overcome by introducing a single new parameter: the proposed PO-PBP achieves a remarkably low prediction error, so that further parameters are not needed.

Would the accuracy of this model's predictions extend into the long-range future? This model, as with any model, should be used within its range of validity, taking into account the assumptions built into it. Due to some of the assumptions described below, long-range predictions will lose accuracy unless they are corrected using updated real-life data.

The model in this paper is based on a PBP, which is a special case of a more general continuous-time Markov population process that includes both births and deaths. Here, a “birth” is a new infection, and a “death” represents removal from the infected population through either recovery or death. The real-life course of COVID-19 involves both infection and subsequent recovery or death, so in that sense a richer model that includes both births and deaths would be appropriate. However, there is a trade-off: the theory for the richer model exists and is sound, but it is computationally much less tractable.

The trade-off between representational accuracy and computability is a common one in mathematical modeling, and well understood in the literature (e.g., see [22]). In the case of birth and death processes, if births happen much faster than deaths, it is legitimate to disregard deaths within an appropriate timescale. In the case of COVID-19, infection can happen very quickly—in a matter of hours, whereas recovery or potentially death takes much longer: weeks potentially stretching into a month or more. Thus on timescales of a few weeks, much insight can be gained from a PBP.

Furthermore, the balance between the tractability of a PBP and the longer-range accuracy of a model with both births and deaths can be tipped further towards pure-birth modeling by a device that partially accounts for both processes. Consider a growing birth and death process with birth rate λ and death rate μ, where λ > μ. To some extent, both rates can then be captured by a PBP whose birth rate is the difference λ − μ. Our PO-PBP model employs this strategy: the λ in our model is effectively the difference between an underlying birth rate and death rate. This slightly extends the timescales over which the model is useful.
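The standard moment formulas for a linear birth and death process started from x0 individuals (textbook results, not derived in this paper) show how far this approximation goes:

$$
\begin{aligned}
\text{birth and death }(\lambda,\mu):\quad
&\mathbb{E}[X_t]=x_0\,e^{(\lambda-\mu)t}, &
\operatorname{Var}[X_t]&=x_0\,\frac{\lambda+\mu}{\lambda-\mu}\,e^{(\lambda-\mu)t}\bigl(e^{(\lambda-\mu)t}-1\bigr),\\
\text{PBP with rate }\lambda-\mu:\quad
&\mathbb{E}[X_t]=x_0\,e^{(\lambda-\mu)t}, &
\operatorname{Var}[X_t]&=x_0\,e^{(\lambda-\mu)t}\bigl(e^{(\lambda-\mu)t}-1\bigr).
\end{aligned}
$$

The two means coincide exactly, while the variance of the birth and death process exceeds that of the PBP by the factor (λ + μ)/(λ − μ), which approaches 1 when μ is small relative to λ. A PBP with rate λ − μ therefore reproduces the mean trajectory but understates variability, which is one reason the approximation holds only to some extent.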

The considerations above are relevant to the prediction in Fig 3, in which the blue line indicates that if growth rates had continued in the same pattern as before the lockdown, all Australians would have been infected by 24 April 2020. In reality, that pattern would have been somewhat mitigated by factors not currently included in our PO-PBP. One such factor is the recovery and death rates discussed above. Another is the finite size of the Australian population, together with Australia's island status.

Because Australia is an island and has recently closed its borders, any circulating virus has only a finite population of approximately 25 million people to potentially infect. The more people are infected, the greater the chance that when an infected person comes into contact with another person, that other person is already infected, so that no new infection can occur. This saturation effect is not built into our PO-PBP, which is why the blue line in Fig 3 would shoot straight off the page had we not truncated it. In alternative models that account for the limiting effect of finite population size, that blue line would have curved downwards as the number of infected individuals became comparable with the overall population size. In other words, the domain of validity of the present PO-PBP is limited to situations in which the overall proportion of infected individuals is still relatively low, as was the case as of June 2020.
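To illustrate the saturation effect, the sketch below compares the mean of a pure birth process with a logistic curve capped at roughly 25 million people. The logistic curve is our own stand-in for the “alternative models” mentioned above, not a model fitted in this paper, and the initial count and rate in the usage note are hypothetical.

```python
import math

POPULATION = 25_000_000  # approximate Australian population quoted in the text


def pure_birth_mean(i0: float, rate: float, t: float) -> float:
    """Mean of a pure birth process: exponential growth with no ceiling."""
    return i0 * math.exp(rate * t)


def logistic_mean(i0: float, rate: float, t: float, n: float = POPULATION) -> float:
    """Logistic curve with the same early growth rate, saturating at n."""
    return n / (1.0 + (n - i0) / i0 * math.exp(-rate * t))
```

With 100 initial cases and a rate of 0.2 per day, the two curves differ by less than 0.01% after 10 days, which is the regime in which the PO-PBP is valid. After 100 days, the pure-birth mean (about 4.9 × 10^10) exceeds the whole population, whereas the logistic curve levels off below 25 million.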

The most remarkable advantage of the PO-PBP is that it provides a means of estimating and incorporating the proportion of hidden cases. More precisely, our model adds a new “observation probability” parameter to the underlying PBP to construct the PO-PBP. Then, by maximizing the complicated likelihood function of the PO-PBP, all parameters, including the observation probability at each time t, are estimated. By the invariance property of MLEs, one minus the MLE of that observation probability is the MLE of the proportion of hidden cases in the population.
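The observation mechanism can be illustrated with a small simulation. For simplicity, the sketch below assumes a constant growth rate and a constant observation probability, and it draws each day's observed count independently as a binomial thinning of the true count. The paper's time-varying λt and pt, and its likelihood maximization, are not reproduced, and the function names are hypothetical.

```python
import numpy as np


def simulate_pbp(x0: int, rate: float, days: int, rng: np.random.Generator) -> np.ndarray:
    """Exact daily snapshots of a pure birth process with constant rate.

    Given X_t = i, the increment X_{t+1} - i is negative binomial: the number
    of failures before i successes, each with success probability exp(-rate).
    """
    path = [x0]
    q = np.exp(-rate)
    for _ in range(days):
        i = path[-1]
        path.append(i + int(rng.negative_binomial(i, q)))
    return np.array(path)


def observe(path: np.ndarray, p: float, rng: np.random.Generator) -> np.ndarray:
    """Each true case is detected independently with probability p (assumed)."""
    return rng.binomial(path, p)


def hidden_fraction(p_hat: float) -> float:
    """Invariance of MLEs: 1 - p_hat is the MLE of the hidden proportion."""
    return 1.0 - p_hat


rng = np.random.default_rng(0)
x = simulate_pbp(x0=10, rate=0.2, days=14, rng=rng)  # true (unobserved) counts
y = observe(x, p=0.6, rng=rng)                        # confirmed counts
print(x[-1], y[-1], hidden_fraction(0.6))
```

In practice only `y` is available: the PO-PBP recovers an estimate of the observation probability from the observed counts by maximum likelihood, after which the hidden proportion follows directly, as `hidden_fraction` shows.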

Any model, however, is only as valid as its assumptions. One assumption of the PO-PBP is that sampling is “uniform and random”, that is, any infected person is as likely to be tested and identified as any other infected person. This assumption is almost never fully satisfied in a realistic scenario, since some biases will always be present; what matters is how strongly they affect the results. We therefore consider the impact of the main model assumptions in turn.

One relevant feature is the availability of test kits. If a shortage of test kits were to severely curtail sampling, this would undermine the validity of the model. In Australia, initial testing was largely limited to people considered “high risk”, including those who had recently returned from overseas, had been in contact with a confirmed COVID-19 case, or were in hospital with severe symptoms matching the disease; all other members of the population were considered “low risk”. If a significant infection had become established in the “low risk” part of the population, and a low death rate had allowed this hidden population to remain undetected, the proportion of undetected cases reported by our model would be an underestimate. However, the predicted pattern of confirmed infections would remain valid.
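A stylized calculation with hypothetical numbers shows the direction of this bias. Suppose that I_H infected people belong to the tested “high risk” group, each detected with probability p, while I_L infected people belong to an untested “low risk” group. If the fitted model effectively describes only the tested group, it reports a hidden fraction of 1 − p, whereas the true hidden fraction is larger:

$$
\underbrace{1-p}_{\text{reported}}
\;\le\;
1-\frac{p\,I_H}{I_H+I_L}
=\underbrace{\frac{(1-p)\,I_H+I_L}{I_H+I_L}}_{\text{true}},
$$

with equality only when I_L = 0 (for p > 0). For example, with I_H = 1000, I_L = 500 and p = 0.8, the model would report 20% hidden cases against a true value of 700/1500 ≈ 46.7%, while the expected confirmed count of 800 is the same under both views.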

Recently, the Australian Government has substantially expanded testing, making COVID-19 testing available to every Australian with mild respiratory symptoms, including cough and sore throat. This makes the “random sampling” assumption of the PO-PBP considerably more robust. We should soon be able to determine whether there has been a significant reservoir of undetected COVID-19 cases in Australia.

It should be noted that we implement a continuous-time model, whilst observations are reported only once daily. Using a continuous-time model is nevertheless valid, since it is general enough to accommodate the discrete data structure: the likelihood is built from the transition probabilities of the process between consecutive observation times. We use a continuous-time model because of the power of the theory that underlies it and the relevance of that theory to this modeling situation.
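How a continuous-time model connects to daily data can be seen from the transition probability of a pure birth process over one observation interval, which has a negative binomial form. The sketch below evaluates it. The function name is ours, and it covers only the underlying PBP; the full PO-PBP likelihood additionally involves the binomial observation step.

```python
import math


def pbp_transition_prob(i: int, j: int, rate: float, dt: float) -> float:
    """P(X_{t+dt} = j | X_t = i) for a pure birth process with per-capita rate.

    Negative binomial form:
        C(j-1, i-1) * exp(-rate*dt*i) * (1 - exp(-rate*dt))**(j - i),
    computed in log space so that large case counts do not overflow.
    """
    if i < 1 or rate <= 0 or dt <= 0:
        raise ValueError("need i >= 1, rate > 0 and dt > 0")
    if j < i:
        return 0.0  # a pure birth process never decreases
    log_comb = math.lgamma(j) - math.lgamma(i) - math.lgamma(j - i + 1)
    log_p = log_comb - rate * dt * i + (j - i) * math.log1p(-math.exp(-rate * dt))
    return math.exp(log_p)
```

For example, with i = 10 cases, a daily step (dt = 1) and rate 0.2, the probability of no new case by the next day is e^(−2) ≈ 0.135, and the probabilities over all j ≥ 10 sum to 1.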

We conclude this section by stating that the PO-PBP models only the impact of COVID-19 with respect to the numbers of infections—it does not model other impacts on society (negative or positive) of policy control measures, as it is recognized that restrictions on gatherings affect people’s lives in different ways. A positive example is reduced air pollution due to reduced travel [23], while a negative example is increased risk of harms like domestic violence [24]. As yet there is no single model which incorporates all of these factors.

Discussion and conclusion: Policy implications

In this paper, we apply a class of continuous-time Markov population stochastic processes, namely the partially-observable pure birth process, to model and analyze the dynamics of the COVID-19 population in Australia. Specifically, we use the theoretical properties of this stochastic process to construct the likelihood function of cumulative confirmed cases and find the maximum likelihood estimates of its parameters. These estimates are used to predict future values of the population, along with the number of unobserved hidden COVID-19 cases.

The Markovian stochastic process model that we develop is based on a partially observable pure birth process, and its predictions fit the actual observations at the Australian national level surprisingly well. Aside from its simplicity and high accuracy, there are several other advantages of this model from a policy perspective.

The stochastic process in our model revolves around only two parameters, both of which have clear and communicable practical interpretations: the former represents the speed of the spread, while the latter provides a measure of detection likelihood. We postulate that in the absence of any policy interventions, both parameters follow an evolution pattern that resembles a geometric decay. A shift in policy, however, may ratchet up or down the decay rate, thereby inducing a new infection trajectory.
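A minimal sketch of such a trajectory, under an assumed parameterization: within each policy regime the parameter is multiplied by a fixed factor per day, the factor changes at each break point, and the path is continuous across break points. The exact functional form behind Tables 1, 2 and 3 is not restated in this section, so the function below is a hypothetical illustration rather than the paper's specification.

```python
from typing import Sequence


def piecewise_geometric(initial: float, breaks: Sequence[float],
                        factors: Sequence[float], t: float) -> float:
    """Value at time t of a parameter with piecewise geometric dynamics.

    Hypothetical parameterization: on [breaks[k], breaks[k+1]] the parameter
    is multiplied by factors[k] per unit time; the last regime runs on
    indefinitely, and the path is continuous at every break point.
    """
    if len(breaks) != len(factors) or not breaks or t < breaks[0]:
        raise ValueError("need one factor per regime and t >= breaks[0]")
    value = initial
    for k, start in enumerate(breaks):
        end = breaks[k + 1] if k + 1 < len(breaks) else float("inf")
        if t <= end:
            return value * factors[k] ** (t - start)
        value *= factors[k] ** (end - start)  # carry the value across the break
    raise AssertionError("unreachable: the last regime is unbounded")
```

For instance, a growth rate starting at 0.3 that decays by 2% per day, and by 10% per day after a hypothetical policy change on day 20, is `piecewise_geometric(0.3, [0, 20], [0.98, 0.90], t)`; the change of slope at day 20 on a log scale is the kind of shift in decay rate described above.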

As demonstrated, the model captures the complex dynamics of the detected/infected ratio, which is a critical component in the design of any containment policy. Furthermore, policymakers gain a coherent and insightful representation of the situation with which to assess the consequences of action or inaction over a span of a few weeks (i.e., how the epidemic would unfold in the absence of any interventions).

Because of its efficacy, this model equips policymakers with a powerful tool for scenario-based analysis: it enables precise predictions of how a particular decision shifts the trajectory of the disease, as early as a week after the decision is enacted. This supports early identification and reinforcement of effective policies, and timely scaling back or discontinuation of others.

The model empowers decision-makers to evaluate and compare the implications of its two fundamental components: lowering the infection rate versus increasing the detection likelihood. Depending on which of these two avenues is pursued, government resources should be directed accordingly, and the corresponding message should be conveyed to the public.

If extensive community screening were undertaken, particularly of asymptomatic citizens, the model would be expected to have very high predictive accuracy across the full range of possible social distancing restrictions. The model also lends itself to crafting hybrid policies that combine these two approaches and carefully divide the available resources between them.

Availability of test kits is a practical consideration in interpreting this model. The current model assumes an adequate supply of test kits, so that sampling is not restricted by a shortage. If this assumption was not fully met during the early days of the pandemic in Australia, we may have underestimated the “hidden fraction” of undetected COVID-19 cases. Statements by the Government indicate that there is currently no shortage of test kits in Australia. Explicitly incorporating test-kit shortages into the model would be interesting but quite difficult future work, of particular relevance to countries where test-kit shortages are a pressing issue.

Additionally, our epidemic model treats the entire nation as a single pool of homogeneous agents with equal exposure to risks and similar contact behaviors. While this assumption may sound fairly restrictive, we believe that it has not profoundly affected the quality of our findings. Nevertheless, accounting for inherent heterogeneities and the local characteristics of smaller regions or communities would further enrich the model and strengthen its outcomes. This modification is particularly relevant to countries such as the U.S., where there is substantial variation in the extent and timing of the epidemic across states.

This model is likely to be applicable to many other countries and circumstances beyond Australia, since few location-specific assumptions are built into it, the main one being that testing involves a reasonably uniform sampling of the population. This strength derives partly from the analytical nature of the model, as opposed to a simulation-based model that is highly customized to local factors. Ideally, different models are used in conjunction, and this model could profitably be combined with other types of models to understand the future spread of COVID-19, both in Australia and elsewhere.

A benefit of this model is that its 4-week prediction horizon allows officials to fine-tune their short-term actions and contingency planning with reasonable confidence in the expected pattern of cases over the immediate future.

This model suggests several possibilities for further research that would extend its applicability to a broader range of circumstances. For example, in the Australian data, the structural break points were identified visually. A purely computational method for these identifications could be developed along the lines of Remark 10. This would enhance applicability to more complex and longer-term data sets, as may be expected in the future around the world.

All in all, the PO-PBP is a useful model for understanding and predicting the trajectory of COVID-19 in Australia under various policy choices. On a short timescale, relevant to government actions, predictions have been shown to be very accurate. The model also appears to be sensitive to subtle shifts in population behavior, allowing it to be useful in considering the impact of social events such as the Easter break in April 2020.

Supporting information

S1 Appendix. Appendix: Background.

A brief overview of the underlying stochastic process used for modeling in this paper is presented in S1 Appendix, provided as Supporting information.



The authors are very grateful to Dr Judy-anne Osborn from the University of Newcastle in Australia for extensive useful conversations about modeling assumptions, implications and presentation thereof, during preparation of this paper. The authors also thank Prof. Nigel Bean from the University of Adelaide in Australia, Professor Michael Saunders from Stanford University in the US, and the anonymous review team for their invaluable comments that helped improve the previous version of this paper.


  1. Huang H, Wang Y, Li X, Ren L, Zha J, Hu Y, et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet. 2020;395:497–506. pmid:31986264
  2. Wu Y, Chen C, Chan Y. The outbreak of COVID-19: An overview. Journal of the Chinese Medical Association. 2020;83:217–220. pmid:32134861
  3. Chen TM, Rui J, Wang QP, Zhao ZY, Cui JA, Yin L. A mathematical model for simulating the phase-based transmissibility of a novel coronavirus. Infectious Diseases of Poverty. 2020;9(24). pmid:32111262
  4. Dandekar RA, Henderson SG, Jansen M, Moka S, Nazarathy Y, Rackauckas C, et al. Safe Blues: A method for estimation and control in the fight against COVID-19. 2020.
  5. Koo KR, Cook AR, Park M, Sun Y, Sun H, Lim JT, et al. Interventions to mitigate early spread of SARS-CoV-2 in Singapore: A modelling study. Infectious Diseases of Poverty. 2020;9(24). pmid:32213332
  6. Moss R, Wood J, Brown D, Shearer F, Black AJ, Cheng AC, et al. Modelling the impact of COVID-19 in Australia to inform transmission reducing measures and health system preparedness. 2020.
  7. Small M, Cavanagh D. Modelling strong control measures for epidemic propagation with networks—A COVID-19 case study. 2020; arXiv:2004.10396v2.
  8. Olson CA, Beard KH, Koons DN, Pitt WC. Detection probabilities of two introduced frogs in Hawaii: Implications for assessing non-native species distributions. Biological Invasions. 2012;14:889–900.
  9. Gerberry DJ. Trade-off between BCG vaccination and the ability to detect and treat latent tuberculosis. Journal of Theoretical Biology. 2009;261:548–560. pmid:19733577
  10. Kao RR. The impact of local heterogeneity on alternative control strategies for foot-and-mouth disease. Proceedings of the Royal Society B: Biological Sciences. 2003;270:2557–2564. pmid:14728777
  11. Fink GA. Markov models for pattern recognition: From theory to application. Berlin-Heidelberg-New York: Springer; 2008.
  12. Vaseghi SV. Advanced digital signal processing and noise reduction. United Kingdom: Wiley; 2009.
  13. Aggoun L, Elliott RJ. Recursive estimation in capture-recapture methods. Sultan Qaboos University, Oman, Science and Technology. 1998;3:67–75.
  14. Baldi P, Brunak S. Bioinformatics. Cambridge: MIT Press; 2001.
  15. Black AJ, McKane AJ. Stochastic formulation of ecological models and their applications. Trends in Ecology and Evolution. 2012;27:337–345. pmid:22406194
  16. Keeling MJ, Ross JV. On methods for studying stochastic disease dynamics. Journal of the Royal Society Interface. 2008;5:171–181. pmid:17638650
  17. Renshaw E. Modelling biological populations in space and time. Cambridge: Cambridge University Press; 1993.
  18. Bean NG, Elliott R, Eshragh A, Ross JV. On binomial observations of continuous-time Markovian population models. Journal of Applied Probability. 2015;52(2):457–472.
  19. Bean NG, Eshragh A, Ross JV. Fisher Information for a partially-observable simple birth process. Communications in Statistics-Theory and Methods. 2016;45(24):7161–7183.
  20. Hyndman RJ, Koehler AB. Another look at measures of forecast accuracy. International Journal of Forecasting. 2006;22(4):679–688.
  21. Lehmann EL, Casella G. Theory of point estimation. New York: Springer-Verlag Inc; 1998.
  22. Tibshirani R. Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society. 1996;58(1):267–288.
  23. Dutheil F, Baker JS, Navel V. COVID-19 as a factor influencing air pollution? Environmental Pollution. 2020;263(A):114466. pmid:32283458
  24. Bradbury-Jones C, Isham L. The pandemic paradox: The consequences of COVID-19 on domestic violence. Journal of Clinical Nursing. 2020. pmid:32281158