Real-Time Epidemic Monitoring and Forecasting of H1N1-2009 Using Influenza-Like Illness from General Practice and Family Doctor Clinics in Singapore

Background Reporting of influenza-like illness (ILI) from general practice/family doctor (GPFD) clinics is an accurate indicator of real-time epidemic activity and requires little effort to set up, making it suitable for developing countries currently experiencing the influenza A (H1N1-2009) pandemic or preparing for subsequent epidemic waves. Methodology/Principal Findings We established a network of GPFDs in Singapore. Participating GPFDs submitted returns via facsimile or e-mail on their work days using a simple, standard data collection format, capturing: gender; year of birth; "ethnicity"; residential status; body temperature (°C); and treatment (antiviral or not); for all cases with a clinical diagnosis of an acute respiratory illness (ARI). The operational definition of ILI in this study was an ARI with fever of 37.8°C or more. The data were processed daily by the study co-ordinator and fed into a stochastic model of disease dynamics, which was refitted daily using particle filtering, with data and forecasts uploaded to a publicly accessible website. Twenty-three GPFD clinics agreed to participate. Data collection started on 2009-06-26 and lasted for the duration of the epidemic. The epidemic appeared to have peaked around 2009-08-03 and the ILI rates had returned to baseline levels by the time of writing. Conclusions/Significance This real-time surveillance system is able to show the progress of an epidemic and indicates when the peak is reached. The resulting information can be used to form forecasts, including how soon the epidemic wave will end and when a second wave will appear, if at all.


Introduction
On 2009-04-24, the World Health Organization (WHO) reported the spread of a novel influenza A (H1N1) strain in the United States and Mexico. Sentinel surveillance, which was mainly hospital based, had indicated increased numbers of influenza-like illness (ILI) cases in Mexico since 2009-03-18 [1]. Over the next months, the virus spread rapidly across the globe, resulting in the WHO declaring a pandemic and advising countries to activate their pandemic preparedness plans [2]. Singapore identified her first imported case of influenza A (H1N1-2009) on 2009-05-27 [3], and the first unlinked cases on 2009-06-19 [4], which indicated community transmission had begun in Singapore.
Singapore experienced all three influenza pandemics of the last century, in 1918, 1957 and 1968 [5,6]. During the 1957 pandemic, reporting of influenza cases by clinicians provided a reasonably clear indication of daily epidemic activity (Figure 1A) [7]. ILI has also been used widely as an indicator of influenza activity during non-pandemic epidemics, with ILI reporting by sentinel general practice/family doctor (GPFD) clinics forming the backbone of surveillance systems for influenza in many countries [8][9][10][11][12][13][14], and these have been used to monitor the current pandemic [15][16][17][18]. In Singapore, though, acute respiratory illness (ARI) data captured from electronic medical records, as a more general indicator of infectious disease outbreaks, have traditionally been used by health authorities, including during the early part of the current pandemic. However, ILI monitoring can provide an estimate of case numbers and hence attack rates, hospitalisation and case fatality ratios [19], and is more specific for influenza than ARIs.
Data from ILI monitoring can also be used for modelling of influenza epidemics and pandemics [20][21][22][23]. Modelling can be performed retrospectively to determine the relative importance of community compared to household transmission, or to determine the effect of pharmaceutical and non-pharmaceutical interventions [21][23][24][25][26]. Modelling can also be performed in real-time during an epidemic, as proposed by Hall and colleagues, who used mortality data from England and Wales to demonstrate how models could have forecast when epidemic activity would peak during several historical pandemic events [27]. Since H1N1-2009 has low hospitalisation and mortality rates (less than 1% of infected individuals) [28], reporting of ILI from GPFD clinics would potentially provide a more accurate indicator of real-time epidemic activity and progress than hospitalisations and confirmed fatalities.
While data on ARIs are routinely collated and laboratory surveillance of influenza has been in place in Singapore for more than 30 years [29], there is currently no system for monitoring GPFD consults for ILI in Singapore. In order to monitor the epidemic and adjust response plans in real-time, we rapidly developed a system for ILI surveillance, with resulting data and forecasts made publicly available via a website. The purpose of this paper is twofold: (i) to describe how the system was developed and used to monitor the progress of the epidemic; and (ii) to describe how the resulting information was used to perform near real-time forecasting of the course of the epidemic in Singapore.

Results
We started the project in early June 2009, shortly after Singapore identified her first imported case of influenza A (H1N1-2009) on 2009-05-27. We sent out mass appeals to 535 e-mail addresses of GPFDs or clinics, and 23 clinics agreed to participate; the locations of the participating GPFD clinics are shown in Figure 2. Four clinics were city or office area practices and the remainder were situated in residential areas across the island.
Figure 1(B,C) shows trends in consultations for ARIs and ILIs from the network. Data submissions started on 2009-06-25, by which time there had been 315 confirmed H1N1 cases (including 87 locally transmitted cases) in Singapore [30]. There was a clear but initially unanticipated weekly periodicity to the data, with lower consultation rates on the weekend and a post-weekend surge in attendances. For descriptive purposes (but not analytical ones), we therefore used weekly averages to provide a smoothed picture of the epidemic trajectory. A comparison of Figure 1B or D with Figure 1C clearly shows ILI to be a better indicator of epidemic activity than ARI. The weekly average ARI consults per doctor in the early epidemic period was between 10 and 15 (Figure 1B), and peaked at 17 in the week ending 2009-07-25, but from this alone it was difficult to determine how much H1N1-2009 epidemic activity there was around the time community transmission was starting; this is compounded by the high baseline rate making the height of the peak relatively low, at just around one and a half times the baseline level. Figure 1D shows the weekly epidemiological data for acute respiratory tract infections in Singapore based on government clinic attendances for ARI [31]. The government clinic ARI data peaked in the week ending 2009-08-01, but, as with our ARI surveillance data, the high levels of background noise make it difficult to ascertain how much community-level infection there was near the start of the epidemic, especially since the epidemic was preceded by a considerable dip in ARI numbers. On the other hand, there was a marked, nearly five-fold increase in our ILI case data, from an average of about 2/3 of an ILI per GPFD per day in the week ending 2009-06-27 to a peak of 3 1/2 in the week ending 2009-08-01. The highest recorded ILI rate occurred on 2009-08-03 (a Monday), with 6 1/2 ILIs per family doctor being reported. The sentinel network indicated that the epidemic had peaked around the start of August, and that ILI rates had returned to near baseline levels early in September.
Figure 1. [...] [31]. (A-C) Both daily counts (lines) and weekly averages (shaded polygons) are presented. (D) A marked drop in baseline ARI consultations can be seen immediately before the epidemic, complicating the determination of when the epidemic started using this measure. doi:10.1371/journal.pone.0010036.g001
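The weekly smoothing used for description can be illustrated with a short sketch. It assumes weeks run Sunday to Saturday (the "week ending" dates quoted in the text are Saturdays); the function names and the daily rates are made up for illustration and are not the study data.

```python
from collections import defaultdict
from datetime import date, timedelta

def week_ending(day):
    """Return the Saturday that closes the week containing `day`."""
    return day + timedelta(days=(5 - day.weekday()) % 7)

def weekly_averages(daily_rates):
    """Average daily ILI-per-GPFD rates within each Sunday-to-Saturday week."""
    buckets = defaultdict(list)
    for day, rate in daily_rates.items():
        buckets[week_ending(day)].append(rate)
    return {end: sum(v) / len(v) for end, v in sorted(buckets.items())}

# Made-up rates with a weekend dip and a Monday surge (not the study data).
start = date(2009, 7, 26)  # a Sunday
toy_rates = [1.0, 5.0, 4.0, 3.5, 3.5, 3.0, 1.0]
toy = {start + timedelta(days=k): r for k, r in enumerate(toy_rates)}
averages = weekly_averages(toy)
print(averages)
```

Averaging over whole weeks removes the day-of-week pattern from the picture, which is why it suits description but discards information the model can use.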
Predictions of the number of ILIs being seen by our GPFDs and of the total number of people infected are presented in figure 3; animations of these forecasts and of the forecast total number seeking medical attention can be found in the supporting information (ILI/GPFD/d in video S1, total ILI/d in video S2 and cumulative infections in video S3). These incorporate both population stochasticity and parametric uncertainty. The eventual forecast was that 13% of the population had been infected, with a 95% credible interval of (9%, 19%). Initial forecasts were adversely affected by uncertainty in the parameters, caused by the vagueness of the subjective prior distributions we used and the scarcity of information from the data. By the middle of July, the algorithm was correctly forecasting that the peak would occur at the start of August, although the magnitude of the epidemic was grossly overpredicted, and the accuracy of the forecast of the time of the peak may have been merely fortuitous. By the end of July, forecasts were stabilising around what transpired to be the eventual data, and by the middle of August, after the peak had come, the forecasts closely foreshadowed the tail of the epidemic. A measure of predictive accuracy is presented in figure 4. By the end of July, predictive error was averaging around 1 ILI per GPFD per day over a one-week time horizon. The sequence of subjective posterior distributions for the parameters and for the effective reproduction number $R_t$ over time are presented in figure 5, although we stress that these are our subjective distributions and do not expect the reader to share them [32][33][34].

Discussion
We have shown that it is possible to deploy, rapidly and at short notice, a real-time influenza epidemic surveillance system using GPFDs in the absence of an existing system. This is likely to be a workable model in much of the developing world, where a significant proportion of primary care is delivered by private practice GPFDs. Firstly, we provide proof of concept that it is feasible, within a month, and with no budget, to establish a protocol for daily data submission for ILI and begin submission. Secondly, we show that processing the data in near real-time, with cases seen each day entered by the following day, can provide graphical trends that describe the progress of an influenza epidemic. Finally, we demonstrate how such data can be used in real-time, and in combination with a process-based model refitted daily, to generate forecasts that can subsequently be verified against actual data as an epidemic unfolds, as is common in other dynamic applications such as weather and finance.
While ILI surveillance is used widely in temperate countries [8][9][10][11][12][13], there are few publications on the effectiveness of ILI surveillance in tropical countries to chart the spread of epidemic influenza, given the high baseline incidence of other non-influenza diseases and minimal seasonal forcing. Evidence is now emerging on the value of such surveillance systems in the tropics [35], and our study shows that ILI surveillance can track epidemic influenza activity in such settings. The slow uptake of influenza surveillance systems for tropical countries may be related to the lack of appreciation for the epidemiology and impact of tropical influenza [36]. Previous work has shown that both non-pandemic (often called "seasonal" in temperate countries in which influenza is associated with winter) and pandemic influenza caused substantial excess mortality in tropical Singapore [5,37].
In Singapore, influenza activity has traditionally been monitored through a combination of laboratory and ARI morbidity [29]. ARI data reflect the total burden of acute respiratory illness from all causes, often including non-infectious causes such as exacerbations of chronic lung disease which may be environmental in origin. However, it is clear from this study that while both ARI and ILI counts give an indication of when epidemic activity peaks, ILI data provide better resolution of influenza epidemic activity, with the relative magnitude of increase over the baseline being far greater than for ARI data, since influenza activity in the early epidemic phase is masked by the high and obstreperous baseline rates of other respiratory illnesses diagnosed as ARI. The other system for tracking influenza activity in Singapore is based on laboratory confirmed diagnoses of influenza. This is similar to what is done in many countries throughout the world as part of the World Health Organization's Global Influenza Surveillance Network. Monitoring of laboratory confirmed diagnoses picked up an increase in H1N1-2009 isolates among a sub-sample of ILI cases presenting at government polyclinics about one week before the epidemic was apparent in our ILI data (data not shown). However, the advantage of ILI surveillance is that it is much cheaper than laboratory-based surveillance and there are no capacity issues that may limit the number of samples that can be processed daily. In addition, laboratory testing of random samples is less sensitive to changes in absolute numbers of community cases at the peak of the pandemic, when the influenza proportion among ILI cases remains relatively steady [31]. ILI surveillance is therefore a cheaper and possibly more effective alternative to traditional laboratory surveillance, especially for resource-poor areas, to obtain reasonable sample sizes.
Setting up such a surveillance network has the secondary benefit of allowing real-time forecasting, which allows more informed policy making. By forecasting the epidemic ahead of time, we allow our forecasts of epidemic activity to be verified against data. We observed during the epidemic that modelling results correctly forecast the timing of peak epidemic activity on some days, but were off by up to a week at other times, though the actual magnitude of the peak was markedly different from early forecasts. We note though that even the relative accuracy of the forecast of the timing of the peak may have been merely fortuitous, and stress that we provide no theoretical results to guarantee this accuracy is repeatable. One particular difficulty we faced was ensuring the predictive accuracy of the system, given the lack of training data and the need to inform policy making as the epidemic unfolded. The results presented herein are therefore almost entirely the same as those presented on-line, including any shortcomings; the only alterations to the model and approach were to allow reporting rates to vary across the week (a change partially implemented partway through the study) and to remove an ad hoc method intended to make the approach more robust to potential changes to the parameters in time (which transpired not to improve matters enough to warrant introducing statistical non-coherencies).
The eventual forecast for the final size of the outbreak was around 13% with a 95% credible interval of (9%, 19%). If true, then combined with the rolling out of vaccine and the potential for some additional existing immunity [38] (a possibility we conservatively excluded from the analysis), this figure suggests Singapore is unlikely to experience a large second wave without substantial mutation of the pathogen. The estimate of around 13% corresponds closely to a paired serological study of Singaporean adults which estimated that 13% (11%, 16%), adjusting for the age distribution of the country, had experienced a four-fold rise in antibody titres (Mark I-Cheng Chen, personal communication). The close correspondence adds considerable confidence to the conclusions of the study.
Further evaluation is underway of the value of retaining a sentinel network permanently in a tropical city-state with year-round non-pandemic influenza transmission and additional bi-annual epidemics. By establishing an avenue for public display of infectious disease forecasts, we hope to build public and institutional confidence in and acceptance of modelling in the context of infectious diseases. To this end, the network was publicised in the local media and the website was made freely available to the general public. This helped provide an additional layer of transparency to reporting of the numbers of people infected with influenza and the relative impact on the wider community. We believe that this contributed to the overall national risk communication strategy and helped to reduce the level of panic and disruption to normal activity feared at the onset of the pandemic.
Several limitations of our work need to be highlighted. Firstly, this system of data collection was fully dependent on the goodwill of participating GPFDs, who received no monetary compensation. We found that we could continue to motivate the participating GPFDs by providing frequent updates based on their aggregated contributions. Although we sent out mass appeals to over 500 e-mail addresses, only 23 GPFDs agreed to participate. The poor response rate could be due to a combination of factors, including: 1. duplicate or invalid e-mail addresses, the former of which could have been addressed by de-duplicating the list beforehand, the latter by better book-keeping; 2. spam filtering, which can only really be addressed by using alternatives to e-mail, such as facsimile; 3. lack of publicity on the objectives and importance of our project, which might have been improved by more careful rhetoric in the invitation letter we sent; and 4. reluctance by GPFDs to commit to the burden of data collection during an impending epidemic which was already anticipated to increase workload.
Figure 3. [...] (Left) Actual (red and orange crosses) and predicted (grey shaded area) average number of patients presenting with influenza-like illness per day at the average participating GPFD. The information used to form the forecast is indicated by the red crosses. The last day of information used in forming the forecast is indicated with a red triangle. Predictions here (and in the right-hand column) take the form of decreasing credible intervals, with the region spanned by the outermost polygons corresponding to 95% credibility. Orange crosses indicate future data not used in forming the forecasts. (Right) Predicted total number of people who (i) are currently symptomatic, or (ii) have recovered, assuming no pre-existing immunity. The last day of information used in forming the forecasts is indicated with a red triangle. The cyan cross on the bottom panel indicates the age-adjusted estimate of adult seroconversion in the community from an independent study (maximum likelihood estimate and 95% confidence interval, Mark I-Cheng Chen, personal correspondence). doi:10.1371/journal.pone.0010036.g003
The final factor may be the most critical, and we suggest that some form of financial reimbursement be considered to compensate GPFDs for the effort and time needed to drive data submission in future, as this would likely improve recruitment rates and make such a system sustainable in the long term. Overall, the poor response rate highlights the challenge of recruiting appropriate clinics for any such system, particularly when using e-mails to disseminate such information, and at short notice. However, for a medium-sized city of 4.8 million residents, the network of around 20 GPFDs sufficed to provide considerable information on epidemic progress. Notwithstanding this small number of participating GPFDs, the surveillance system achieved its intended objective of tracking and forecasting influenza epidemic activity in near real-time. The small number of participating GPFDs (estimated to be about 2% of all GPFD clinics in Singapore) may make it difficult to assess if our ILI data are representative of all influenza diagnoses during the epidemic, but this is a limitation common to sentinel GPFD networks for influenza. Any non-representativeness caused by non-response would not, however, affect the validity of the forecasts, since the methods used for that do not assume the sentinels were selected at random. Other countries have used GPFD networks for surveillance of other viral illnesses [8][9][10][11][12][13][20][21][28] and perhaps the combined lessons from these strategies could be applied more widely internationally.
In hindsight, several aspects of the approach could have been bettered. We did not anticipate the strong day of the week effect on ILI consulting rates, and this had a deleterious effect on predictions, especially when moving to Mondays from Sundays. In mid-July we changed the model to allow different rates at the weekend from the rest of the week, but by mid-August it became clear that the model would fit much better were every day of the week allowed its own reporting rate; this is the model presented herein. Again, in hindsight, it is obvious that there was bound to be sufficient information in the data to be able to estimate the differential reporting rates over the days of the week. Alternative models, such as the Richards model [39,40], might have proven as or more effective, and certainly could be more parsimonious, than the compartmental model we used, but our experience was that the challenges of developing the software before any data had been collected effectively ruled out deciding on an optimal model to use. As is common in the field of infectious disease modelling, the model we used made many simplifying assumptions (see methods), all of which may potentially have reduced the quality of the forecasts. For instance, the presence of heterogeneous mixing or susceptibility in reality but not in the model may lead eventually to changes to the parameter estimates over time as the routine endeavours to fit a model excluding these effects, but in forming forecasts at an early stage, the future path of parameter estimates is unknown and so forecasts cannot take this into account. In this paper, we have used the term "forecast" sensu Keyfitz [41], to indicate the belief we invested in these predictions and the way they were used in contingency planning in some of the authors' institutions. This contrasts with his definition of a projection, which is the extrapolation of past trends without claiming to expect them to match the future. A consequence of this reticence, according to Keyfitz, is that projections cannot be wrong (never being claimed right), while predictions or forecasts are "practically certain" [41] to be in error, and are prone to black swan-type events [42]; accepting this, and excepting the initial predictions, the forecasts we made fared very well (figures 3 and 4). Had we concentrated instead on projecting the epidemic, via a suite of competing models, we might have learned more about the assumptions underlying those models, which would have informed future modelling efforts. A comparison of different projecting approaches, as has been done for seasonal influenza monitoring [43], would therefore be very useful to refine the general approach for future outbreaks of emerging diseases, but this remains work for the future.
In conclusion, a real-time GPFD surveillance system can be set up rapidly during an epidemic and is able to show the progress of the epidemic. Such an inexpensive system can be deployed even in resource-poor settings to track future influenza epidemics and pandemics and forecast their trajectories in near real-time.

Ethics approval
Ethics approval for the project was obtained from the institutional review board of the National University of Singapore.

Recruitment, enrollment and inclusion criteria for GPFDs
We obtained e-mail addresses of GPFDs in Singapore from the College of Family Physicians Singapore (CFPS) and the directory of Pandemic Preparedness Clinics, a group of over 200 clinics registered with the Ministry of Health to manage influenza cases. In all, invitations were sent to 535 e-mail addresses. A series of road shows was also conducted at the CFPS to describe how ILI surveillance could help to track an epidemic. GPFDs who agreed to participate were also asked to extend the recruitment to their contacts.
Participating GPFDs had to be doctors registered with the Singapore Medical Council who worked at least three full days a week in a general practice or family medicine clinic in the community. Participation was purely voluntary and participating GPFDs were given the option to withdraw from the project at any time.
Figure 5. [...] The reader's posterior distributions may differ from ours (see refs [32][33][34]). In the background for reference is the number of ILIs per GPFD per day (not to scale). The line of unity is marked on the panel for the effective reproduction number, $R_t$; the posterior crosses the line of unity around the day of the peak. Prior distributions for the parameters ($R_t$ is not a parameter) are indicated on the appropriate panels, using the notation Be for the beta distribution and $N^{+}_{\mu,\sigma}$ [...]

Data submission and processing
Enrolled GPFDs were requested to submit returns on their work days by e-mail or facsimile by 2pm the following day. The data submitted comprised information on clinically diagnosed ARIs. Clinically, influenza is an acute respiratory infection. As a group, the ARIs may be defined as a clinical diagnosis of patients who present with new short-term (time from onset less than two weeks) respiratory symptoms of cough, rhinorrhoea, nasal congestion and/or sore throat, which may or may not be accompanied by fever. The syndrome is usually though not exclusively associated with viral aetiologies. The range of pathogens responsible for ARI besides influenza is described in a recent WHO paper [44]. A number of viruses cause a clinical illness which is difficult to distinguish from influenza, including respiratory syncytial virus, picornaviruses, parainfluenza, and adenovirus. These produce an influenza-like illness [45]. The operational definition of ILI we then used in performing the analyses was an ARI exhibiting a fever of ≥37.8°C; this approximates the definition used by the United States' Centers for Disease Control and Prevention, which defines ILI as an acute illness with cough and/or sore throat with a fever of ≥37.8°C, in the absence of a known cause other than influenza [46]. Other data elements collected in the data collection form (figure S1) included demographic, clinical, and antiviral treatment information.
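As an illustration of the operational definition, the daily ILI count for a clinic could be derived from its submitted ARI records roughly as follows. This is a minimal sketch: the record layout and field name `temp_c` are hypothetical and are not taken from the study's data collection form.

```python
ILI_FEVER_THRESHOLD_C = 37.8  # threshold from the operational definition in the text

def is_ili(temperature_c):
    """True when a clinically diagnosed ARI case has a fever of 37.8 C or more."""
    return temperature_c >= ILI_FEVER_THRESHOLD_C

def count_ili(ari_records):
    """Count ILI cases among one day's ARI records (each a dict with 'temp_c')."""
    return sum(1 for record in ari_records if is_ili(record["temp_c"]))

# Hypothetical return from one clinic for one day (every record is an ARI case).
day_return = [{"temp_c": 36.9}, {"temp_c": 37.8}, {"temp_c": 38.6}, {"temp_c": 37.5}]
print(count_ili(day_return), "ILI out of", len(day_return), "ARI")
```

Because every submitted case is already an ARI, the fever threshold is the only extra condition needed to classify it as an ILI.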

Mathematical modelling
Disease dynamics are modelled via a standard, stochastic compartmental model [47-50, inter alios], with daily increments and individuals passing through a series of unobserved classes corresponding to clinical stages of infection: Susceptible, Exposed (infected but not infectious), Infectious and Removed (recovered and subsequently immune, or deceased). The classes are updated daily as $S_{t+1} = S_t - A_t$, $E_{t+1} = E_t + A_t - B_t$, $I_{t+1} = I_t + B_t - C_t$ and $R_{t+1} = R_t + C_t$, where $A_t$, $B_t$, $C_t$ represent the number of people in the whole population newly infected, infectious, and removed, respectively. These are assumed to follow binomial distributions, drawn from $S_t$, $E_t$ and $I_t$ respectively. To be explicit, the infection model is formulated under the simplifying assumptions that: 1. infections are allowed to arise from importation at an assumed constant rate $\epsilon$ and from other local cases using the law of mass action, with $\beta$ characterising the mixing and transmission probabilities of the local population, which as a first-order approximation is assumed to be homogeneous; 2. "importations" at rate $\epsilon$ represent inhabitants of the country becoming infected via travel abroad or via travellers passing through Singapore, not new immigrants entering the country infected; 3. the population size is taken to be fixed at 4.8M with no birth, death or genuine immigration or emigration during the epidemic (we ignored the fact that the official population size increased to 5M during the epidemic); 4. transition from exposed to infectious to removed is assumed to occur at constant rates $\lambda^{-1}$ and $\gamma^{-1}$, respectively; 5. per-capita rates ($r \in \mathbb{R}^{+}$) can be transformed to daily probabilities ($p \in (0,1)$) using the relationship $p = 1 - \exp(-r)$; and 6. no parameters change with time.
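For reference, the binomial draws can be written out explicitly. The forms for $B_t$ and $C_t$ follow directly from assumptions 4 and 5; the infection probability for $A_t$ is only one plausible reading of assumption 1, and both the way $\epsilon$ and $\beta I_t$ are combined and the symbol $n$ for the fixed population size (4.8M) are assumptions rather than something stated in the text.

```latex
\begin{align*}
A_t &\sim \mathrm{Bin}\bigl(S_t,\; 1 - \exp\{-(\epsilon + \beta I_t / n)\}\bigr)
      && \text{(assumed form of the infection hazard)} \\
B_t &\sim \mathrm{Bin}\bigl(E_t,\; 1 - \exp(-\lambda^{-1})\bigr) \\
C_t &\sim \mathrm{Bin}\bigl(I_t,\; 1 - \exp(-\gamma^{-1})\bigr)
\end{align*}
```

Under this reading, $\lambda$ and $\gamma$ are the mean latent and infectious periods in days, and each probability is obtained from a per-capita rate via $p = 1 - \exp(-r)$.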
As with all models, the assumptions that go into ours can be criticised on biological, sociological and epidemiological grounds. Note that neither the parameters $\theta = (\beta, \epsilon, \lambda, \gamma)$ nor the states $\mathbf{S}_t = \{S_t, E_t, I_t, R_t\}$ are known. The infection model is married to an observation model, namely that the (known) number of cases reported on day $t$ is $D_t \sim \mathrm{Pois}(N_t \psi_t)$, where $N_t$ is the (known) number of GPFDs submitting reports on day $t$ and $\psi_t = \delta_{d(t)} \{\phi + I_t / 2084\}$, where $d(t)$ is the day of the week of day $t$ (Monday being 1 and so on). The parameters of the observation model are thus $\delta_i$ for $i = 1, \dots, 7$ (the probability an infectious individual will seek medical attention on day of the week $i$) and $\phi$, which is related to the "background" consulting rate for non-H1N1 ILIs. We took the differential reporting rates to be the same for H1N1 and non-H1N1 ILIs, so that $\phi\delta_i$ represents the typical number of ILIs per GPFD on day $i$ in the absence of the pandemic. There were 1480 GPFDs in Singapore in 2001 [51], and the population grew 17% from 2001 to 2009, resulting in an estimated 1730 GPFDs in Singapore in 2009; 83% of patients attend these rather than polyclinics [51], and together these yield the divisor 2084; this permits the parameters to have a more natural interpretation (under the assumption that the participating GPFDs are representative) but is unnecessary for the analysis, as it functions as a mere rescaling of $\delta_{d(t)}$. We artificially take the day of the week of public holidays to be a Sunday, with the normal week structure resuming the following day.
The assumptions of the observation model are that: 7. consultations occur only when individuals are infectious; 8. consultations are conditionally independent; 9. per capita consulting probabilities for those infected with influenza A (H1N1-2009) are constant throughout the epidemic; and 10. overall consulting rates for other diseases that may be mistaken for influenza A (H1N1-2009) are also constant (excluding the day of week effect), i.e. there are no concurrent epidemics (an assumption subsequently supported by laboratory testing that suggested limited levels of co-circulating strains).
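The arithmetic behind the divisor and the Poisson mean of the observation model can be checked with a short sketch. The function names, the example parameter values and the holiday set are illustrative assumptions; only the figures 1730, 83% and 2084 and the Monday = 1 convention come from the text.

```python
from datetime import date

GPFD_2009 = 1730       # estimate quoted in the text (1480 grown by 17%, rounded)
PRIVATE_SHARE = 0.83   # share of patients attending GPFDs rather than polyclinics
divisor = round(GPFD_2009 / PRIVATE_SHARE)  # 1730 / 0.83 is about 2084.3

def day_index(day, public_holidays=frozenset()):
    """Day of week with Monday = 1; public holidays are treated as Sunday (7)."""
    return 7 if day in public_holidays else day.isoweekday()

def expected_reports(n_gpfd, delta_d, phi, infectious, scale=divisor):
    """Poisson mean N_t * psi_t, with psi_t = delta_d * (phi + I_t / scale)."""
    return n_gpfd * delta_d * (phi + infectious / scale)

# Illustrative values only: 20 reporting GPFDs, delta = 0.1, phi = 0.5, I_t = 2084.
mean_reports = expected_reports(20, 0.1, 0.5, 2084)
holidays = {date(2009, 8, 10)}  # illustrative holiday set for the example
print(divisor, mean_reports, day_index(date(2009, 8, 10), holidays))
```

Since the divisor only rescales the day-of-week parameters, changing it would alter their interpretation but not the fitted Poisson means.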
As before, the validity of these assumptions is open to debate.
In the original formulation, we forced $\delta_i = \delta$ for all $i$, i.e. to be equal. In the middle of July, in response to the obvious variation over the week, we changed the constraint of the model to $\delta_1 = \delta_2 = \dots = \delta_5$ and $\delta_6 = \delta_7$. By mid-August, it was apparent that the day of the week effect needed to differ on each day of the week to attain a good fit. It is therefore the model without constraints that we present in this paper.
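The three successive constraints on the day-of-week parameters differ only in how many free values are expanded into the seven daily values. A minimal sketch of that mapping follows; the scheme names and numerical values are hypothetical labels chosen for illustration.

```python
def expand_delta(free_values, scheme):
    """Map free parameters to seven daily values, Monday first."""
    if scheme == "single":             # original model: one value for all days
        (value,) = free_values
        return [value] * 7
    if scheme == "weekday_weekend":    # mid-July model: Mon-Fri vs Sat-Sun
        weekday, weekend = free_values
        return [weekday] * 5 + [weekend] * 2
    if scheme == "free":               # final model: one value per day
        if len(free_values) != 7:
            raise ValueError("the unconstrained model needs seven values")
        return list(free_values)
    raise ValueError("unknown scheme: " + scheme)

# Illustrative values only.
print(expand_delta([0.1], "single"))
print(expand_delta([0.12, 0.05], "weekday_weekend"))
print(expand_delta([0.16, 0.12, 0.11, 0.11, 0.10, 0.06, 0.03], "free"))
```

The unconstrained scheme costs five more parameters than the weekday/weekend one, which is the trade-off against the better fit described above.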

Statistical methodology
The parameters of the model are estimated within the Bayesian statistical paradigm [52, for instance] in which semi-informative prior distributions are assigned to parameters and incoming data incorporated via the likelihood function to obtain a time series of posterior distributions for the parameters and unobserved state space.
Since the state space is unobserved, a statistical method called particle filtering [53,54] is used to integrate over the possible realisations consistent with the daily observations. A series of 10 000 "particles" are created to which are associated parameter values and state space configurations generated from the prior distribution. Particles are iterated forward one day at a time via simulation of the state space, and the likelihood function calculated conditional on the trajectory of that particle and its associated parameter values. The likelihood function is then used to weight the particle. Particle degeneracy is overcome via resampling [54], while particle diversity is maintained via kernel smoothing [55]; the latter means that the resulting posterior distribution is approximate. The (approximate) posterior predictive distribution is derived by continuing the simulations beyond the last observation and weighting the resulting distribution via the particle weights at the last observation.
The particle filter algorithm proceeds as follows.
[...] where $q$ is drawn from the integers $\{1, 2, \ldots, P\}$ with probability proportional to $\hat{w}^{q}_{t+1}$, then setting $w^{p}_{t+1} = 1/P$ for all $p$.
[...] [55], we set $h = 0.3$, $Z$ is generated from a multivariate Gaussian distribution with mean vector 0 and variance given by the variance-covariance matrix of $\tilde{x}^{p}_{t+1}$ over all $p$, and $m^{p}_{t+1}$ is the vector of means of $\tilde{x}^{p}_{t+1}$ over all $p$, if the simulated value $x^{p}_{t+1}$ falls within the correct support, or $x^{p}_{t+1} = \tilde{x}^{p}_{t+1}$ otherwise.
6. Increment $t$ by one and repeat from step 2 onwards until the current time is reached; thereafter, to obtain the posterior predictive distribution, repeat step 2 only (incrementing $t$) for as long as desired.
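The propagate, weight, resample and smooth cycle described above can be illustrated with a minimal sketch. It is not the authors' implementation (their routines were written in R), and several parts are assumptions made only for illustration: the toy growth model in `propagate`, the Poisson observation model with a fixed `report_prob`, the priors in `particle_filter`, and the shrinkage form used in `kernel_smooth`, which is a commonly used kernel-smoothing update and may differ from the formula in the paper. For brevity the kernel is applied independently to each component rather than using the full variance-covariance matrix.

```python
import math
import random

def poisson_logpmf(k, mean):
    """Log Poisson probability of count k (mean floored to avoid log(0))."""
    mean = max(mean, 1e-9)
    return k * math.log(mean) - mean - math.lgamma(k + 1)

def propagate(state, growth, rng):
    """Toy one-day update (assumption): noisy multiplicative growth, integer-valued."""
    mean = state * (1.0 + growth)
    return max(0, round(rng.gauss(mean, math.sqrt(mean + 1.0))))

def resample(particles, weights, rng):
    """Draw P particles with probability proportional to weight; reset weights to 1/P."""
    P = len(particles)
    return rng.choices(particles, weights=weights, k=P), [1.0 / P] * P

def kernel_smooth(particles, rng, h=0.3):
    """Shrink towards the particle mean and add Gaussian jitter (assumed form).

    Simplification: each component is smoothed independently, not jointly.
    """
    P = len(particles)
    a = math.sqrt(1.0 - h * h)
    means = [sum(p[d] for p in particles) / P for d in range(2)]
    sds = [math.sqrt(sum((p[d] - means[d]) ** 2 for p in particles) / (P - 1))
           for d in range(2)]
    smoothed = []
    for growth, state in particles:
        g = means[0] + a * (growth - means[0]) + h * rng.gauss(0.0, sds[0])
        # State entries are rounded to the nearest integer.
        s = round(means[1] + a * (state - means[1]) + h * rng.gauss(0.0, sds[1]))
        # Keep the unsmoothed particle if the proposal leaves the support.
        smoothed.append((g, s) if g > 0 and s >= 0 else (growth, state))
    return smoothed

def particle_filter(observations, P=2000, report_prob=0.1, seed=1):
    """Run the filter over daily counts; returns final particles and weights."""
    rng = random.Random(seed)
    # Draw (growth rate, infectious count) from hypothetical priors.
    particles = [(abs(rng.gauss(0.2, 0.2)), round(abs(rng.gauss(60, 30))))
                 for _ in range(P)]
    weights = [1.0 / P] * P
    for y in observations:
        # Simulate the state forward one day.
        particles = [(g, propagate(s, g, rng)) for g, s in particles]
        # Weight by the likelihood of the day's observation (log scale for stability).
        logw = [math.log(w) + poisson_logpmf(y, report_prob * s)
                for w, (g, s) in zip(weights, particles)]
        top = max(logw)
        weights = [math.exp(lw - top) for lw in logw]
        # Resample to counter degeneracy, then smooth to maintain diversity.
        particles, weights = resample(particles, weights, rng)
        particles = kernel_smooth(particles, rng)
    return particles, weights

if __name__ == "__main__":
    daily_counts = [6, 7, 8, 9, 11, 12, 14]  # made-up observations
    final_particles, final_weights = particle_filter(daily_counts)
    mean_state = sum(w * s for w, (g, s) in zip(final_weights, final_particles))
    print("posterior mean number infectious (toy model):", mean_state)
```

Continuing to call `propagate` on the final particles without reweighting would give the toy analogue of the posterior predictive distribution.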
The algorithm provides the posterior distribution of any parameter, state or function thereof (such as the basic reproduction number, $R_0$, or the effective reproduction number, $R_t$; see e.g. [56,57]) by taking a weighted average of this characteristic according to the posterior weights $w^{p}_{t}$ at the last observation time $t$. Here, only the posterior predictive distribution of the underlying states is of interest. Since the prior distributions taken were subjective (see below), the resulting posterior distributions are also subjective, and as a caveat lector we caution that our posterior distributions may differ from the reader's; for further information on subjective probability the reader is directed to the writings of de Finetti (e.g. [34]) or Lindley (e.g. [33]). For references on particle filtering and examples of its use in population dynamic modelling in ecology, see [53-55, 58-61].
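As a small illustration of summarising a characteristic by its posterior weights, the sketch below computes a weighted mean and weighted quantiles over particles. The function names and the example values are hypothetical, and treating $R_0$ as the product of the infection rate and the mean infectious period is an assumption made here for the example only.

```python
def weighted_mean(values, weights):
    """Posterior mean of a characteristic evaluated on each particle."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total

def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative normalised weight reaches q."""
    total = sum(weights)
    running = 0.0
    for v, w in sorted(zip(values, weights)):
        running += w
        if running / total >= q:
            return v
    return max(values)

if __name__ == "__main__":
    # Hypothetical particles: (infection rate, mean infectious period in days).
    particles = [(1.0, 1.5), (1.2, 1.4), (0.9, 1.8), (1.4, 1.2)]
    weights = [0.1, 0.4, 0.3, 0.2]
    r0 = [rate * period for rate, period in particles]  # assumed form of R_0
    print("posterior mean:", weighted_mean(r0, weights))
    print("95% interval:", weighted_quantile(r0, weights, 0.025),
          weighted_quantile(r0, weights, 0.975))
```

With equal weights, as obtained immediately after resampling, these reduce to the ordinary mean and empirical quantiles.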
The prior distributions used are given in figure 5. In setting these, we aimed to balance the need to supplement the information content of the sentinel data with relevant information from other sources, with the desire not to obliterate the signal from the data. We set the prior mean for the infection rate, b, to be 1.2, with standard deviation 0.8. Combined with the prior distribution for the infectious period, this leads to a range for $R_0$ of 0 to around 6, i.e. more than spanning the range of estimates for historic pandemics. The prior distribution for the importation rate, e, was derived from a crude extrapolation of the timeline of the first five weeks of importations to the country [62]. The prior distributions for the latent period and infectious period were modelled loosely on symptom onset after infection on an aeroplane [63] and a review of volunteer challenge studies [64]. The prior distributions for the background rate of non-pandemic ILIs (w) were based upon the clinical insight of the authors, and for the reporting probabilities from guesstimation, noting that it is common for employers or schools in Singapore to require a formal medical certificate before allowing staff or students off work or out of class. We conservatively forced $R(0)$ to be 0 since we did not know how the findings of studies in temperate countries [38] relating to prior exposure would extrapolate to the tropics; in this way, forecasts may be seen as worst case scenarios. The prior distributions for $E(0)$ and $I(0)$ were derived from extrapolating the number of confirmed locally acquired cases.
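The integer-valued modified normal priors used for the initial states (defined in the caption of figure 5, where the mass at an integer x is the integral of the density of |Y|, Y normal, from x - 1/2 to x + 1/2) can be sketched as follows. The function names are hypothetical, and the example uses the stated prior for $E(0)$, with mean parameter 75 and standard deviation parameter 30.

```python
import math
import random

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def folded_normal_cdf(x, m, s):
    """CDF of X = |Y| where Y ~ N(m, s^2); zero for x <= 0."""
    if x <= 0:
        return 0.0
    return normal_cdf((x - m) / s) - normal_cdf((-x - m) / s)

def integer_folded_normal_pmf(x, m, s):
    """Mass at integer x: the folded-normal density integrated over [x - 1/2, x + 1/2]."""
    return folded_normal_cdf(x + 0.5, m, s) - folded_normal_cdf(x - 0.5, m, s)

def draw_integer_folded_normal(m, s, rng):
    """Sample by folding a normal draw at zero and rounding to the nearest integer."""
    return round(abs(rng.gauss(m, s)))

if __name__ == "__main__":
    rng = random.Random(0)
    print("P(E(0) = 75):", integer_folded_normal_pmf(75, 75, 30))
    print("five prior draws:", [draw_integer_folded_normal(75, 30, rng) for _ in range(5)])
```

Rounding a folded normal draw assigns each integer exactly the mass of the half-unit interval on either side of it, which is why the sampler and the mass function agree.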
Predictive error was assessed by taking the posterior distribution of the absolute difference between forecasts and observations, averaged over a one-week time horizon, and then averaging over that distribution to obtain the posterior mean prediction error.
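One way to read this calculation is sketched below: each particle's forecast is compared with the subsequently observed values, the absolute differences are averaged over the forecast horizon, and the result is averaged over particles using the posterior weights. The function name, data layout and numbers are hypothetical.

```python
def posterior_mean_prediction_error(forecasts, weights, observed):
    """Weighted mean over particles of the horizon-averaged absolute error.

    forecasts[p][d] is particle p's forecast for day d of the horizon;
    observed[d] is the value later observed on that day.
    """
    horizon = len(observed)
    total_weight = sum(weights)
    per_particle = [
        sum(abs(f[d] - observed[d]) for d in range(horizon)) / horizon
        for f in forecasts
    ]
    return sum(w * e for w, e in zip(weights, per_particle)) / total_weight

if __name__ == "__main__":
    observed = [2.0] * 7                  # made-up ILI per GPFD per day
    forecasts = [[2.0] * 7, [4.0] * 7]    # two made-up particle forecasts
    weights = [0.75, 0.25]
    print(posterior_mean_prediction_error(forecasts, weights, observed))
```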
All statistical routines were written by the authors using the R statistical programming language [65].

Automation script
Modelling results were updated daily at around 3pm on a website that could be publicly accessed [66]. This was automated using a Bourne shell script that handled time, file transfer, archiving of previous forecasts, statistical processing, and posting of new output on the web. This was run on a Unix web server using ISC's cron.

Figure 1. Influenza diagnoses in Singapore in 1957 and 2009 using alternative methods. (A) ILI in government and city council clinics, 1957 [7]. (B) ARI in this GPFD sentinel network, 2009. (C) ILI in this GPFD network, 2009. (D) Weekly ARI in government polyclinics, 2009 [31]. (A-C) Both daily counts (lines) and weekly averages (shaded polygons) are presented. (D) A marked drop in baseline ARI consultations can be seen immediately before the epidemic, complicating the determination of when the epidemic started using this measure. doi:10.1371/journal.pone.0010036.g001

Figure 2. Spot map showing the locations of participating GPFD clinics in Singapore. Most populated parts of the island were represented, the exception being the Woodlands, Sembawang and Yishun areas to the North. doi:10.1371/journal.pone.0010036.g002

Figure 3. Evaluation of forecasts. (Left) Actual (red and orange crosses) and predicted (grey shaded area) average number of patients presenting with influenza-like illness per day at the average participating GPFD. The information used to form the forecast is indicated by the red crosses. The last day of information used in forming the forecast is indicated with a red triangle. Predictions here (and in the right-hand column) take the form of decreasing credible intervals, with the region spanned by the outermost polygons corresponding to 95% credibility. Orange crosses indicate future data not used in forming the forecasts. (Right) Predicted total number of people who (i) are currently symptomatic, or (ii) have recovered, assuming no pre-existing immunity. The last day of information used in forming the forecasts is indicated with a red triangle. The cyan cross on the bottom panel indicates the age-adjusted estimate of adult seroconversion in the community from an independent study (maximum likelihood estimate and 95% confidence interval, Mark I-Cheng Chen, personal correspondence). doi:10.1371/journal.pone.0010036.g003

Figure 4. Quantification of predictive error. Posterior absolute deviation between predicted average ILIs per GPFD and observed average, with error averaged over the one week period following the time the forecast is made. doi:10.1371/journal.pone.0010036.g004

Figure 5. Subjective posterior distributions of parameters and $R_t$. Posterior mean and marginal point-wise 95% credible intervals. The reader's posterior distributions may differ from ours (see refs [32-34]). In the background for reference is the number of ILIs per GPFD per day (not to scale). The line of unity is marked on the panel for the effective reproduction number, $R_t$; the posterior crosses the line of unity around the day of the peak. Prior distributions for the parameters ($R_t$ is not a parameter) are indicated on the appropriate panels, using the notation Be for the beta distribution and $N^{+}(m, s^2)$ for the modified normal distribution such that if $X \sim N^{+}(m, s^2)$ then $Y \sim N(m, s^2)$ and $X = |Y|$. The prior distributions taken for the states were $E(0) \sim N^{+}_{\mathbb{Z}}(75, 30^2)$, $I(0) \sim N^{+}_{\mathbb{Z}}(60, 30^2)$ and $R(0) = 0$ (a Dirac delta prior), where $N^{+}_{\mathbb{Z}}(m, s^2)$ is similar to $N^{+}(m, s^2)$ except that its support is the integers, and its mass function at $x$ is obtained by integrating the density for $N^{+}(m, s^2)$ from $x - 1/2$ to $x + 1/2$. doi:10.1371/journal.pone.0010036.g005

Figure S1. Data collection form. Found at: doi:10.1371/journal.pone.0010036.s001 (0.01 MB PDF)

5. Kernel smoothing. Let $x^{p}_{t+1} = \ldots$ (with entries rounded to the nearest integer for state space values) where, following Trenkel et al.