Inferring the Spatio-temporal Patterns of Dengue Transmission from Surveillance Data in Guangzhou, China

Background Dengue is a serious vector-borne disease, and incidence rates have significantly increased during the past few years, particularly in 2014 in Guangzhou. The current situation is more complicated, due to various factors such as climate warming, urbanization, population increase, and human mobility. The purpose of this study is to detect dengue transmission patterns and identify the disease dispersion dynamics in Guangzhou, China. Methodology We conducted surveys in 12 districts of Guangzhou, and collected daily data of Breteau index (BI) and reported cases between September and November 2014 from the public health authority reports. Based on the available data and the Ross-Macdonald theory, we propose a dengue transmission model that systematically integrates entomologic, demographic, and environmental information. In this model, we use (1) BI data and geographic variables to evaluate the spatial heterogeneities of Aedes mosquitoes, (2) a radiation model to simulate the daily mobility of humans, and (3) a Markov chain Monte Carlo (MCMC) method to estimate the model parameters. Results/Conclusions By implementing our proposed model, we can (1) estimate the incidence rates of dengue, and trace the infection time and locations, (2) assess risk factors and evaluate the infection threat in a city, and (3) evaluate the primary diffusion process in different districts. From the results, we can see that dengue infections exhibited a spatial and temporal variation during 2014 in Guangzhou. We find that urbanization, vector activities, and human behavior play significant roles in shaping the dengue outbreak and the patterns of its spread. This study offers useful information on dengue dynamics, which can help policy makers improve control and prevention measures.


Introduction
Dengue is a mosquito-borne disease caused by one of the four dengue virus serotypes (DENV [1][2][3][4], and is primarily transmitted by Aedes aegypti and Aedes albopictus [1,2]. The virus and its vectors are now widely distributed throughout tropical and subtropical regions, resulting in about half the world's population being at risk of infection [1]. The World Health Organization (WHO) has estimated that 50-100 million infections occur annually in over 100 endemic countries [1,3]. More recently, Bhatt et al. took into account the inapparent infections and found that the global burden is probably much higher, at about 390 million infections per year [4]. The problem of dengue epidemics in China has intensified over the past two decades [5], and between 1991 and 2013 about 21,532 dengue cases and 620 deaths were reported. In 2014, the incidence reached a peak, with 46,864 reported cases, 80% of which were infected in Guangzhou. Dengue is not endemic in China [5], but the current situation has become more complicated, and the exact causes of the increase in incidences and the detailed transmission characteristics are unclear [5].
In this study, we aim to identify the spatio-temporal transmission patterns of dengue epidemics in Guangzhou in 2014 by addressing the following questions: How can we estimate the temporal and geographical distributions of infection cases? How can we evaluate the infection risk and the control effects? How can we assess the interactions of the factors involved among different geographical locations and thus infer the diffusion process from one location to another? The answers to these questions will be influential in epidemiological inference and public health planning. Clear information about disease burden and infection risk can help in correctly evaluating the effects of the factors involved and the correct allocation of resources for intervention [4]. An accurate reflection of the transmission dynamics and diffusion process of dengue can help us more fully understand and further predict the prevalence of epidemic propagation. There are, however, many challenges to be addressed, such as misreported surveillance data, obscure vector indices, the hidden effects of hosts and vectors, and the heterogeneous infection processes. These challenges and existing works related to our questions are discussed in more detail below.
First, disease surveillance data is usually the baseline for estimating the infection burden, but it does not directly reflect the full extent of infection, for the following reasons: (1) multiple factors, such as inapparent infections, under-reporting, and misdiagnosis can lead to the misreporting of infected cases [4,6]; (2) the incubation period of the dengue virus can create a delay between infection and reporting; (3) human mobility can lead to the mis-registration of infection locations [7]. Feasible techniques to handle these issues have been proposed, such as analyzing index clusters and serologic testing to evaluate inapparent dengue infections [8], using epidemiological models with exposed states (e.g., SEIR model) to account for the incubation period [9], and incorporating human mobility into the transmission model to examine its effect on dengue infection [10]. Epidemiological information can be extracted from hidden infections and unclear data through statistical and stochastic methods [11]. Ster et al. deployed reversible jump MCMC methods to reveal hidden infections and inferred the infectivity profile of the U.K. 2001 foot and mouth epidemic [12]. By fitting the partially observed data sequences of hospital infection, Cooper et al. estimated key epidemiological parameters using a structured hidden Markov model [13]. We take the distribution of reported cases as the essential data, and also incorporate the factors of incubation time lag, host mobility and reported rate, to estimate the 2014 actual dengue burden in Guangzhou.
Second, dengue infection risk is in reality primarily evaluated by vector indicators, such as the house index (HI), the container index (CI), the Breteau index (BI), the pupa index (PI), and the adult productivity index [2]. However, the traditional vector indices (e.g., HI, CI, and BI) have been shown to be poor proxies for measuring adult mosquito abundance and dengue risk, possibly due to the inadequate quality of the vector and incidence data, diversity of vector indices and adult vector densities, or to geographic/temporal mismatches of infection sites and index records [14][15][16]. Most vector indices only reflect vector prevalence rather than abundance [17], as they do not take into account the container type productivity. In view of this, other vector indicators have been suggested, such as sampling adult mosquitoes [14,15,18], or integrating vector indices and other information (e.g., combing demographics and indices [18]. Beyond the vector indices, standard notions have been proposed to assess infection potentials, such as the basic reproduction number [19,20], the vectorial capacity (VCAP) [21][22][23], and the entomological incubation rate (EIR) [22,24]. In this study, we systematically integrate environmental and ecological factors and BI data to evaluate the adult vector densities.
Third, the large-scale spread of dengue viruses is often caused by the spatial and temporal dynamics of vectors and hosts [25][26][27]. A combination of elements, including dense populations, frequent human-vector contact, rural-urban migration, serotype circulation, and inadequate infrastructure, can lead to dengue infection and mosquito breeding opportunities [16,28,29]. These factors are significant in shaping the spatial and temporal transmission of dengue epidemics. Two types of studies have been performed to reveal transmission process and assess the relevant factors [30,31]. First, mapping techniques and statistical methods can be used to process various data, such as geographic information systems method [32], time-series Poisson regression [29], Moran's I statistic [33,34], and spatial scan statistics [35]. These methods can identify the hot spots, evaluate the relationship between different factors (e.g., climate, imported cases and urbanization) and dengue incidence, and estimate the dispersion process [29,[32][33][34][35]. The second type of analysis methods is primarily based on mathematical or computational models, and focus more on the intrinsic biting-based transmission process and the interaction between hosts, vectors, and viruses. These include differential equations [19,20], spatially agent-based transmission models [36], and metapopulation models [10]. These methods are able to estimate the infection capacity and simulate the time evolution and spatial diffusion of dengue epidemics [10,19,20,36]. However, it has been suggested that existing models less take into account the heterogeneity of mosquito densities/behaviors and mosquitohost encounters [31,37], so may not effectively reflect the spatial heterogeneity and temporal variation in the transmission [25,31].
In this study, we use mathematical models and computational methods to tackle the aforementioned problems. First, based on the Ross-Macdonald theory [9,[21][22][23], a transmission model is defined to simulate the spatial diffusion of the dengue virus. Second, large-scale and consecutive vector indices, together with environmental and ecological information, are integrated to estimate the vector quantities. Third, human mobility as a spatio-temporal driver of dengue spreading dynamics [36], is estimated based on the radiation model proposed by Simini et al. [38]. Underlying transmission parameters are quantified by fitting the model to real-world observations using machine learning methods, such as the Markov chain Monte Carlo (MCMC) method [39]. The issue of incomplete surveillance data is addressed by comparing the estimated incidence rates and surveillance data with a reported rate. The research framework is shown in Fig 1. We conduct an empirical study in Guangzhou, where serious dengue epidemics have recently been experienced, particularly in 2014. Guangzhou is an international metropolis located in the tropical/subtropical region, and its climate and geography are ideal for vector growth and virus survival. In Guangzhou, the population is densely concentrated in the urban areas, with frequent movement between districts. By implementing the proposed model, we aim to (1) estimate the actual incidences and reveal the effects of environment and urbanization on vector activities; (2) evaluate the infection risk in terms of the basic reproduction number and explain the temporal pattern of infection potential in each district; (3) identify the underlying transmission process and mechanisms of dengue in Guangzhou.

Materials and Methods
To modeling the spatio-temporal patterns of dengue transmission in Guangzhou, we first surveyed the situations about dengue outbreak (e.g., reported cases, control strategies and possible causes of outbreak) from the local government reports, newspaper, and other media, and then investigate the related information (e.g., geographical environment, vector indices, climate, population and transportation) in this region. Based on that, we propose a transmission model. Detailed procedure is presented in the following.

Study areas
The city of Guangzhou (112°57 0 E to 114°3 0 E and 22°26 0 N to 23°56 0 N) is the capital of Guangdong province in South China, and has an area of 7434 square kilometers and about 12.93 million residents. The climate is humid and subtropical, with high temperatures and humidity in summer, and comparatively mild and dry in winter. The annual mean temperature is 22°C and the annual accumulate precipitation is 1,800 mm. Guangzhou is an international port and an important foreign trade gateway into China. The above information is based on the Guangzhou government site (http://www.gz.gov.cn/gzgov/s2289/zjgz.shtml).
Guangzhou consists of 12 districts and is divided into three areas: urban, suburban, and exurban, which follows the city's overall planning (2010-2015) and its Five-Year Plan (2011-2015) that take into account the urbanization, population density, and green coverage. The urban areas are Liwan, Yuexiu, Haizhu, Tianhe, and Baiyun (south of Liuxihe and north second ring), which account for 3.8% of the city area, 46.6% of the total population, and about 56% of transportation. The suburban areas are Panyu, Huangpu, Luogang, Huadu, and Baiyun (outside the central area), and the exurban areas are Nansha, Zengcheng, and Conghua. A detailed map is shown in Fig 2.

Data collection and parameter settings
To identify the underlying transmission patterns of dengue in 12 Guangzhou districts, the following data are collected.
• Dengue cases. Records of dengue daily cases in 12 districts are obtained from the Health Department of Guangdong Province (http://www.gdwst.gov.cn/). According to the "Law of the People's Republic of China on the prevention and treatment of infectious diseases," all dengue cases confirmed by any medical institution or hospital must be reported to the surveillance system in the local area within one day (12 hours in cities and towns, 24 hours in villages). However, some cases go unreported, and the records of infection time and place of some may be misreported [6].
• Breteau index. In Guangzhou, the available vector indicator is the BI data. Approximately 287 surveillance sites are distributed throughout the city, and the daily reports of BI in each district are collected by the Guangzhou Center for Disease Control and Prevention (http:// www.gzcdc.org.cn/). In the affected areas, any indoor and outdoor water containers near any of 50 to 100 houses in the vicinity (outside the houses, a radius of 10m is considered as part of the property) are inspected. Larval growth is closely associated with temperature and rainfall [40], so the missing BI is estimated by stepwise linear regression using one week's weather records.
• Population density. The population size in each district is retrieved from the 2013 Guangzhou Statistical Yearbook, but some of the population in certain districts (registered as residents) may travel to other districts early in the morning for work or study and return in the late afternoon.
• Transportation. The transportation data is taken from the Guangzhou Transport Development Annual Report and the Guangzhou traffic site (http://www.gzjt.gov.cn/gzjt/web/ Default.aspx), from which the daily commuting level of the districts can be evaluated.
The parameters are set as follows.
• Study time period. Our study time period in Guangzhou is from September to November 2014. In January 2014 the imported cases have been found in Guangzhou, but the first autochthonous case was reported in June, and starting from August the local infections dominated the transmission.
• Time step. The time step is set to be one day, and each time step is divided into daytime and nighttime. The duration of daytime is set to be from 0700 (7 a.m.) to 1900 (7 p.m.), according to the working hour and commuting time in Guangzhou. It is assumed that residents who work in other districts stay there during the daytime, and that all residents stay in their home districts at night.
• Biting rate during daytime and nighttime. The dominant vector of dengue transmission in Guangzhou is Aedes albopictus. It has been reported that the Aedes mosquito is a diurnal feeder with peak biting periods in the early morning and in the evening [1], but surveys have shown that the biting activity of Aedes albopictus varies from place to place. For example, in Malaysia the activity peak is observed between 0600-0900 and 1500-2000 [41], and in Macau (near Guangzhou), between 0600-0800 and 1800-2000 [42], but in India, it shifts to 2230-2300 and 2030-2100 [43]. In our study, we assume that the biting rate during daytime and nighttime is the same, but has different values in different districts. The biting rate will be estimated by our modeling approach and machine learning methods.

Transmission model
In this section, we propose a mathematical model integrating the geographic, transportation, demographic, environmental, and surveillance information in dengue transmission. The relationship between incomplete surveillance data and the estimated number of incidences is discussed below. Surveillance systems usually record disease incidences in different locations as a set of time series, so if we have observed the incidences of H locations during time period t = 1, 2, Á Á Á, T, and the spatio-temporal surveillance data at time t are denoted by a vector Γ t = (γ 1t , γ 2t , Á Á Á, γ Ht ) T , then the correspondence of incidences between reporting and modeling can be quantified as follows: where δ is the reported rate, ρ t = (ρ 1t , ρ 2t , Á Á Á, ρ Ht ) T is the numbers of incidences derived from the modeling approach presented below, and ε is the error term, which follows normal distribution with variations S ¼ diagðs 2 1 ; s 2 2 ; . . . ; s 2 H Þ. To specify and locate the infection events, the time step is set to be one day, and each is divided into daytime and nighttime, to take daily commuting and biting difference into account. For simplicity, the notations of subscript and superscript correspond to the district number and vector age, and the hat and check correspond to daytime and nighttime, respectively.
Vector density. To evaluate the infectious bites, we quantify the spatial densities of vectors (i.e., Aedes mosquitoes). The vector flight range is less than 400m, and biting behavior usually occurs around their habitats [1,2], so we assume that the Aedes mosquitoes stay in their original locations. We further assume that the abundance of immature female vector is proportional to that day's BI with parameter K, thus we can estimate the density of female Aedes mosquitoes with specific ages in each district. Let x j i ðtÞ denote the density of female adult vectors with age j at time t in district i. It is then calculated as is the value of BI at time t in district i. The other parameters are described in Table 1. The densities of adult female vectors x i ðtÞ ¼ P j x j i ðtÞ in district i is therefore dependent on the coefficient K, the BI values in early (1 + h) days, and the vector survival rates.
The proportionality coefficient K reflects the concentration of immature mosquitoes in larval habitats, which is dependent on the climate, environment, hosts, container types, and surveillance methods. Its value can therefore vary from time to time and from place to place. A recent survey carried out in Guangzhou found that the abundance of larva vectors in aquatic habitats fluctuated significantly between urban, suburban, and rural areas, which corresponds to the three different classes of urban, suburban, and exurban [44], so we subdivide the parameter as K 1 , K 2 , and K 3 accordingly. These parameters are viewed as constants, due to the short period of our study.
Daily commuting. To account for remote infections in other locations and to understand the effects of human mobility on dengue transmission, we consider the daily commuting between different locations. A radiation model has recently been developed to simulate human mobility [38], where the commuting is a daily process related to employment, and the radiation model appears to match experimental data very well [38]. In our study, commuters refer to those who live and work (or study) in different districts, and who go out early in the morning and return in the late afternoon. We use the radiation model to estimate commuting frequency in different districts. Let N i denote the population (number of residents) in district i, and S il be the total population in the circle whose center is the origin i and radius is the distance between i and the destination l, minus the population at i and l. The number of commuters departing from district i to l can then be calculated as follows: Here, T i is the total number of commuters departing from i. It can be formulated as where N c is the total number of commuters and N is the total population [38]. Thus, the number of people who physically stay in district i at night is N i ¼ N i , while during the daytime the number becomesN Incidence modeling. To determine the potential infectivity from mosquitoes, we use the notion of vectorial capacity (VCAP), which is defined as the average number of infectious mosquito bites per unit time, following the introduction of a single infected host [21,22]. VCAP can capture the critical components of an insect's role in pathogen transmission, which is adapted from the basic reproduction number based on the Ross-Macdonald's model [49]. VCAP has recently been logically generalized to consider mosquito senescence with an agedependent survival rate and life expectancy [23]. VCAP can therefore be more detailed and precise when evaluating the activities of mosquitoes. Let V j i denote the VCAP contributed by k Extrinsic incubation period (EIP) 9d [45] d The longest intrinsic incubation period (IIP) 1 8d [46] σ Age at which adult mosquitoes begin biting hosts 3d [23] b = c Transmission probability from host to vector (=vector to host) 0.4 [47] δ The report rate 3 0.25 [4] K The IIP follows Log-normal distribution denoted as P L [46]. 2 These parameters can be fitted by machine learning methods. 3 The reported rate is adopted from the proportion of symptomatic infections during the high-incidence period.
where e j+k is the expectation of remaining infectious life at age j + k, and m j i is the ratio of female mosquitos at age j to humans, whose values during daytime and nighttime arem j To further estimate the infection size in each district, we introduce the notion of entomological incubation rate (EIR), which is defined as the number of infectious bites received by a human at each time step [22]. Letŷ i ðtÞ denote the proportion of infected population in the daytime. It is given byŷ i ðtÞ ¼Î i ðtÞ=N i , whereÎ i ðtÞ is the number of infections physically staying in district i at time t. EIR in the daytime is then calculated through vectorial capacity and temporal prevalence as follows [22]: where e ¼ P hÀk j¼s ipðiÞ is the average life span of Aedes mosquitoes. The range of j = σ, Á Á Á, h − k is because only those mosquitoes who can bite and live through EIP can contribute to the infection risk. The EIR at night E i ðtÞ can be similarly calculated. A number of factors determine the selection of some of the parameters. In China, when dengue patients go to hospital for treatment, they are registered and reported according to the regulations. Most of them must be hospitalized or quarantined, and must also take care to avoid being bitten. We therefore assume that the reported cases would not be bitten by mosquitoes again and only the unregistered patients (1 À dÞŷ i are involved in the further transmission. Our study time is the high-incidence period of dengue occurrence in Guangzhou, when the government took various measures (e.g., spraying insecticide and cleaning up the environment) to control the dengue outbreak. Information about dengue was also broadcasted widely across the media. Therefore, the treatment rate can be regarded as high and the reported rate as equal to the proportion of symptomatic infections in the total infections.
Based on the definition of EIR, the estimated number of new infections in the daytime is [24]: Here, we suppose that the total population is susceptible, as those infected and who recovered in Guangzhou only make up a small part of the total population.L i ðtÞ counters those infected who physically stay in their district i during the daytime. The corresponding value at night L i ðtÞ can be derived similarly. The number of new infections in the daytime of those living in district i (including those staying away) can therefore be estimated as: Those newly infected in the previous night and who stay in district i during the daytime can be calculated as: It should be noted that all new infectionsL i ðtÞ and L i ðtÞ are in a latent state at time t. Only after the intrinsic incubation period (IIP) (3-7days [46,50]), they will become patients (with or without symptoms) and δ of them will seek treatment.
To activate the formula (5), temporal prevalencesÎ and I is estimated as follows: where P L (j) is the probability that the individuals become infected after being bitten j days before, and PðlÞ ¼ 1 À P l i¼1 P I ðiÞ is the probability that those individuals who become infected l days before remain infected. τ and d denote the longest IIP and the maximum infection period, respectively.
Based on the report cards of infectious diseases in China, patients are registered as incidence cases in their own residential districts. This may introduce a spatial mismatch between the surveillance data and real infections. Hence, to evaluate the real transmission level, any remote infections should be projected into their residential districts. Based on the radiation model (3) and the transmission equations presented above, the number of real incidences among people living in district i can be calculated as follows: So far, the standard transmission process has been formulated by Eqs (2)(3)(4)(5)(6)(7)(8)(9)(10). To implement this model in a specific region, knowledge of the BI values, the initial number of infections, and suitable parameters is needed. The initial infections can be estimated by dividing the number of the reported cases by a reported rate, and undetermined parameters can be fitted by machine learning methods.
Markov chain Monte Carlo method. We adopt a Markov chain Monte Carlo (MCMC) method to estimate the model parameters. The model parameters K and a are estimated by fitting our model with surveillance data. The relationship between the sizes of reported cases and of modeling cases can be written in matrix notation as Γ = δρ+ε, where Γ = (Γ 1 , Γ 2 , Á Á Á, Γ H ) is a H × T matrix, representing the surveillance data, and ρ = (ρ 1 , ρ 2 , Á Á Á, ρ H ) is a H × T matrix, representing the modeling incidences. The H × T matrix ε follows a matrix normal distribution, i.e., ε * N(0, I T , S). To account for any misalignment of the report date, each element in Γ equals the average of reported cases over two successive days.
The likelihood can be calculated as: where in the evolution dynamic process K = (K 1 , K 2 , K 3 ), a = (a 1 , a 2 , Á Á Á, a H ). Accordingly, the joint posterior distributions of K and a are given by PðG t jr t ; K; aÞ The procedure of the MCMC method is carried out as follows [39]: First, we initialize all of the independent model parameters K and a, each of which follows a normal distribution. We then generate the value of the modeling cases based on new parameters for calculating the posteriori likelihood P(K Ã , a Ã |Γ) according to Eq (11). For each iteration, new values of Γ are generated from an adaptive proposal distribution P(K Ã , a Ã |K, a). New values of K and a can then be calculated. All new values K Ã , A Ã and Γ Ã will be accepted with probability min 1; PðK Ã ; a Ã jGÞqðK; ajK Ã ; a Ã Þ PðK; ajGÞqðK Ã ; a Ã jK; aÞ ; where q(K Ã , a Ã |K, a) is the proposed density. After a number of iterations, we can then analyze the statistics of the model parameters and estimate their values.

The heterogeneity of vector behaviors
According to the proposed model and the MCMC algorithm, the underlying model parameters are estimated, and the values are presented in Table 2. Based on the scale factors between BI and mosquito density (i.e., K 1 , K 2 and K 3 ), it is observed that the aquatic habitats contain the highest concentration of larval mosquitoes in urban areas and the lowest concentration in exurban areas, but the difference between the exurban and suburban areas is not significant. This finding is consistent with the results in [44], where the authors have found that in urban areas of Guangzhou, the larvae and pupae of Aedes albopictus are more abundant in container habitats. The possible reason for the disparity of K is that the temperature, food sources, and types of aquatic habitats and containers vary between the urban, suburban, and rural areas [44]. The estimated number of bites per person per 12 hours by each female Aedes mosquito is shown in Table 2, presented as a i in district i. This rate is equal to the product of the human blood index (i.e., the proportion of blood meals of mosquitoes taken from humans) and the mosquito feeding frequency, which is possibly associated with the status of demography, temperature, geography and environment [42]. In Guangzhou, the terrain slopes downward from the north to the south. The temperature usually falls 1-2 degree from the south to the north, but the urban heat island effect results in over 1.7 higher degrees in the center areas in 2014. Low latitude and high temperature can cause frequent mosquito feeding. Specifically, our results indicate that urban areas possess high values of human blood index or mosquito feeding frequency. This is perhaps due to the dense population and the urban heat island effect. Tianhe is the most prosperous district (with the highest GDP) and one of the densely populated areas. Mosquitoes there prefer to bite humans frequently. In the suburban and exurban areas, however, probably due to various blood sources (e.g., chicken, dogs, and cattle), cool temperature, high altitude, and the sparse population, the biting rate on humans is relatively low.

The infection risk
The infection risk is usually evaluated by the basic reproduction number R 0 , defined as the expected number of secondary infections averagely generated by one case in a completely susceptible population [9,19,22]. R 0 is widely used as an invasion threshold: if R 0 is less than one, then the disease will become extinct; otherwise, there will exist an endemic state. In epidemiology, R 0 reflects the biology of the transmission dynamics and quantifies the transmission potential of an epidemic. For vector-borne diseases, the basic reproduction number was first derived by Macdonald (1957) and Ross (1911) [9], based on which, we present the following formula: where ϑ = ∑ l lP I (l) = 5 days [48] is the average duration of human infection. Eq (13) is an evolutionary form of the basic reproduction number with time-varied VCAP. Averaging over the vector densities through the study period and inserting it into Eq (13), we obtain the mean value of the basic reproduction number R 0 . The evolutionary and average values of R 0 in each district of Guangzhou are presented in Fig 3. It can be observed that R 0 decreases from the largest value, 4.4 (in Tianhe), in late September to less than 1 in early November, and the average value is between 1.81-2.59. We find that R 0 is much higher but decreases more quickly in urban areas, which implies that the infection capacity is at first greater in urban areas, and the following intervention measures are effective there. Two peaks of R 0 are observed in Huadu and Conghua, due to the increase in mosquito numbers. It should be noted that a larger R 0 does not determine a higher incidence, and a rapid decrease of R 0 does not means a rapid reduction in incidences, as the incidence rate is also dependent on the infection sources. The spatio-temporal patterns of dengue dynamics Applying the estimated parameters to the proposed model, we obtain the following estimation: (1) The longitudinal number of infections in each district with the infection difference between daytime and night is as shown in Fig 4, in which the time corresponds to when people are bitten and get infected; (2) The longitudinal numbers of the incidences are as shown in Fig 5, in which the reported cases are demonstrated as a part of them; (3) The levels of the remote and local infections are as summarized in Table 3, in which the living areas and bit locations of the patients are estimated; (4) The spatio-temporal incidence rates are as shown in Fig 6. From the above Figures and Table, we observe heterogeneous and interesting patterns in dengue transmission in these districts, which are specified from the following aspects. First, the number of infections is estimated at about 113,108 cases from September 24 to November 9, 2014, some of which are recorded in the surveillance system. Most of the unreported cases are inapparent. We can classify the 12 districts into 3 classes.   • The areas with the highest incidence rates are around the urban center (i.e., Yuexiu, Haizhu, Liwan, Tianhe, and Baiyun), where about 92,336 people (81.6% of the total cases) are infected. The obvious hotspots can be observed in the areas around Liwan, Yuexiu, and Baiyun. The primarily reasons for the high incidences in the urban center could be: (1) high concentrations of people (about 57% of residents and 17% of the city's area), (2) a highly fluid population (with an estimated 81.7% daily mobility rate in these 5 districts), (3) abundant initial infections, and (4) high vector density and frequent human biting.
• The areas with the second highest incidence rates make up one section of the suburban area (i.e., Panyu and Huangpu). These two adjacent districts are close to the incidence hotspot, and many residents work or study in the central areas.
• The five remaining districts demonstrate relatively small infection levels, (6,022 cases, 5.3% of the total cases). These districts are located on the border of Guangzhou, where the human mobility rate is quite low and the infection source is small. In this case, high vector indices do not lead to high incidences.
Second, the patterns of infection differences between daytime and nighttime are shown in Fig 4. A slightly higher infection rate can be observed at night for residents in the urban center, which may be due to people returning to their living districts in late evening, and the infections risk is relatively high in urban areas. However, in most suburban and exurban areas, the lower infection risk creates a lower probability of getting infected at night. If the number of people leaving a certain district each day is larger than the number moving in (i.e., in Baiyun, Panyun, Huadu, Nansha, Zengcheng, and Conghua), then the incidence rate in the daytime could be higher due to remote infections.
Third, typical temporal patterns of infection are summarized as follows: • All 12 districts experience a fluctuation in terms of dengue incidences, but the peak time varies, with the first October 11 (Yuexiu) and the last October 30 (Huadu). These phenomena are due to the diverse levels of infection sources (i.e., infected people and vectors) and the heterogeneous activities of Aedes mosquitoes.
• Population density and mobility play significant roles in dengue diffusion. For example, based on surveillance data, Yuexiu, Baiyun, and Nansha are the first to report 100 cases, but the transmission patterns are different. Low mobility yields a small incidence level in Nansha. But Baiyun as the area with the largest numbers of residents and commuters, it has the most incidences and a comparatively late peak.
• The continuing rapid decline of BI values results in a relatively rapid decline in the incidence rate, in Yuexiu, Panyu, and Nansha, and particularly in Tianhe. This result also indicates that intervention in these districts was timely and effective.
• Double peaks are observed in some districts, such as Tianhe and Nansha, due to the fluctuation of vector densities and occurrence peak incidences in other districts. The latest peak incidences occur in Huadu and Conghua, as they are the last districts involved in the transmission, and the BI values are still high in late October.

The diffusion route
Based on the surveillance data, the weight of human mobility, the quantity of remote incidences, and the arriving time of incidence peak, particularly the spatio-temporal incidence rates, we are able to identify the primary route of dengue diffusion in Guangzhou.
• The first step refers to the spread through central areas: Local dengue infection initially takes place in Yuexiu, the biggest remote infection source, which then triggers a rapid spreading of the dengue virus through Liwan, Baiyun, and then invades Haizhu. Tianhe and Panyu are at the same time involved in the transmission process, due to the interchange of infection sources.
• The second step refers to the spreading process through the periphery of the central areas.
Remote infections from the central areas gradually cause widespread transmission in suburban and exurban areas. Nansha, Zengcheng, Huangpu, and Luogang experience the infection process at a similar rate, while Huadu and Conghua suffer infections slightly later. During transmission, Nansha, Zengcheng, Huadu, and Conghua contribute few infections to other areas.
It should be noted that the effects of human mobility are not just reflected in the remote infections, and by introducing infectivity to local Aedes mosquitoes, human mobility can lead to large number of autochthonous dengue cases.

Discussion
In this study, we have developed an inference technique to identify dengue transmission patterns and applied it to the 2014 dengue outbreak in Guangzhou, China. From this approach, we can improve our understanding of the dengue burden, infection risk, and the transmission dynamics in Guangzhou. Our results can help policy makers formulate effective measures to control and prevent dengue transmission.
Our model is based on the Ross-Macdonald theory, which can be viewed as an epidemiological compartment model with an SEI infection process. The model closely combines four key sub-models necessary for describing the integrated dynamics of the system, namely, those representing mosquito population dynamics, human movement, virus transmission, and parameters estimation. First, we present a novel method to estimate the quantities of adult female mosquitoes from the BI data and environmental information. This approach differs from other studies that directly use vector indices and involved factors (e.g., climate, sociodemographic indicators, and land-cover types) to estimate the potential dengue risk from a statistical perspective [14,16,26]. Second, based on the available transportation data, we use a standard radiation model to approximate the human mobility pattern [38]. As an inevitable component in dengue spatial transmission, human mobility can also be tracked by many other methods, such as GPS data [27], agent-based models, [25], and metapopulation models [10]. Next, we integrate well-recognized formulas (e.g., the vectorial capacity [21][22][23] and the entomological incubation rate [22,24]) to elucidate the transmission process. We take into account the spatial heterogeneity of vector-host interactions, and the corresponding biting rates are estimated by MCMC methods. Our model further explore the real dynamics of disease transmission behind the observed incidences. The framework can also be applied to the space-time analysis of other vector-borne diseases.
Based on our empirical study in Guangzhou, we find that the spatio-temporal distribution of incidences is extremely heterogeneous, with 81.6% infections occurring in urban centers with different shapes of peak existing in mid-October. By considering the underlying dynamics, we observe temporal and spatial disagreement between infection cases and reported cases. The rank of disease burden in 12 districts is also inconsistent with the surveillance results.
Further, We find that, in Guangzhou, the basic reproduction number R 0 as an indicator of the infection risk decreases from the peak (3.45) on September 22 to the trough (0.73) on November 9, 2014, with a mean value of 2.24. This estimated R 0 can be applied to quantify the infectivity in 12 districts and measure the effectiveness of the control strategies. From September, the Guangzhou government began to adopt various measures to control dengue transmission, such as disseminating knowledge about dengue through different media, asking every family to clean and clear their water containers, and organizing a sweep each Friday and regular spring-cleaning throughout the city. Consequently, we find that R 0 begins to decrease from late September in most areas, particularly urban regions. However, due to a large number of infectious sources, the incidences decreased in about mid-October. This indicates that to control dengue transmission, intervention measures must be taken in a timely fashion.
Due to the availability and validity of current surveillance data, the proposed models have certain limitations, which are worthy of further improvement and discussion: (1) People in Guangzhou are assumed to be without immunity against dengue; (2) The biological parameters are extracted from the literature (see Table 1), which may show geographical disparities; (3) The reported rate is used as the the proportion of symptomatic infections from [4]. Further experiments and survey are necessary to validate these parameters.