Conceived and designed the experiments: NF HY. Performed the experiments: ZP MY HZ XC JW. Analyzed the data: TG. Wrote the paper: TG NF HY.
The authors have declared that no competing interests exist.
The spread of infectious disease epidemics is mediated by human travel. Yet human mobility patterns vary substantially between countries and regions. Quantifying the frequency of travel and length of journeys in well-defined populations is therefore critical for predicting the likely speed and pattern of spread of emerging infectious diseases, such as a new influenza pandemic. Here we present the results of a large population survey undertaken in 2007 in two areas of China: Shenzhen city in Guangdong province, and Huangshan city in Anhui province. In each area, 10,000 randomly selected individuals were interviewed, and data on regular and occasional journeys were collected. Travel behaviour was examined as a function of age, sex, economic status and home location. Women and children were generally found to travel shorter distances than men. Travel patterns in the economically developed Shenzhen region are shown to resemble those in developed and economically advanced middle-income countries, with a significant fraction of the population commuting over distances in excess of 50 km. Conversely, in the less developed rural region of Anhui, travel was much more local, with very few journeys over 30 km. Travel patterns in both populations were well fitted by a gravity model with a lognormal kernel function. The results provide the first quantitative information on human travel patterns in modern China, and suggest that a pandemic emerging in a less developed area of rural China might spread geographically sufficiently slowly for containment to be feasible, while spatial spread in the more economically developed areas might be expected to be much more rapid, making containment more difficult.
Worldwide urbanisation and increased human mobility create conditions favourable to the spread of emerging pathogens, such as the SARS epidemic
Increasing volumes of travel data for developed countries are becoming available, including origin-destination matrices for commuting obtained from census data, data from surveys and data from mobile phones
A survey of commuting and travelling behaviour of 20,000 people from two different regions in China was conducted. Half of the study participants lived in Huangshan city at the southern end of Anhui province, whereas the other half lived in Shenzhen city in Guangdong province, bordering Hong Kong Special Administrative Region (SAR), China (see
Anhui is a mainly agricultural and relatively poor province. Huangshan city has a population of 1,470,000, comprising both urban and rural areas. The population density overall is 180 people/km^{2}.
By contrast, Guangdong province has a rapidly growing population and is now the largest province by population; it is also the richest province. Shenzhen city is a Special Economic Zone, meaning it is more open to trade and commerce than other parts of China. Since it was established in the early 1980s, it has grown to a population officially recorded as 8.6 million at the end of 2007, with an average population density of 4200 people/km^{2}. Shenzhen is a major manufacturing centre in China, but its industries also include finance and high-tech enterprises. The population can be divided into local residents, who tend to be better educated, and lower-skilled migrant workers. The latter typically originate from other provinces and come into Shenzhen for at least 6 months of the year, living in dormitories provided by their companies.
The Institutional Review Board of China CDC granted ethical permission for the study. Oral informed consent was obtained from all adult participants and parental consent for minors at the time of interview.
In each of the two administrative areas, Huangshan city in Anhui province and Shenzhen city in Guangdong province, 10,000 individuals were selected.
Huangshan city consists of 3 urban districts and 4 rural counties. Study participants were selected from one urban district, Tunxi district, and one rural county, Xiuning county, in numbers reflecting the overall proportions of urban and rural citizens within Huangshan city. Within Tunxi district, households were sampled randomly from all 11 communities, whereas within Xiuning county, households were sampled randomly from 30 of the 259 villages in the county.
Shenzhen city consists of 6 urban districts with a total of 639 communities. A substantial proportion of the population are migrant workers who live in company-provided dormitories. From 30 communities within Shenzhen city, households and migrant workers were recruited for the study in numbers reflecting the proportion of migrant workers within the population.
All members of the selected households were administered a questionnaire, provided in
We analysed the demographic data, commuting distances and travel behaviour. In order to find predictors of the commuting distances, we fitted linear regression models and characterised the remaining variability within the categories defined by the best fitting regression model by fitting functional distributional forms to the observed distributions of distances travelled to school or work. To model the probability of an individual travelling we fitted logistic regression models to the data. In order to describe the distances of the occasional journeys we fitted functional forms to the overall distributions of distances travelled for the different cities using a gravity model approach. Due to the large number of possible origins and destinations within the country, we used a gravity model integrated over individual destinations that only retained information on the distances travelled.
In order to find predictors of the distance travelled to school or workplace we fitted linear regression models to the data, adjusting the confidence intervals of the estimated parameters for the clustering by households in the dataset. We used the logarithm of the distance to the school or workplace as the dependent variable. For those individuals with a reported distance of 0 km we used ln(uniform[0, 0.5)) instead. Potential independent variables were age, sex, family size, registration status, frequency of travel outside the study area and whether individuals lived in a rural area (Huangshan) or were migrant workers (Shenzhen). For students, age was categorised by type of school, whereas for employees we used 10-year age bands, with those below 20 and those over 60 merged. Family size was classified into small (≤3) and large (>3) families. To decide which variables and interactions to include in the final model, we fitted models with all possible variable combinations, and up to two interaction terms, as well as all possible combinations of interaction terms for all combinations of up to 4 variables. The model with the lowest value of the Bayesian information criterion
We wished to characterise the variation between individuals in the typical distances travelled to school or work. To this end we categorised observed log distances into bins of uniform width, with the first category containing all distances up to
We also included an additional parameter
To find predictors of those who had travelled outside the study area within the past week, we fitted logistic regression models to the data analogously to the way described above for the school and workplace distances. Here we did not perform a separate analysis for students and employees, so the occupational status was included as an additional potential independent variable. We fitted models with all possible variable combinations and up to one interaction term, and all possible combinations of interaction terms, if up to 3 variables were included.
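The exhaustive search over candidate models described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the variable names are placeholders, the sketch assumes that an interaction is only considered between two variables that are both in the model, and it assumes that the Bayesian information criterion used for the commuting regressions is also the selection criterion here. It covers only the "all variable combinations with up to one interaction term" part of the search.

```python
from itertools import combinations
from math import log


def candidate_models(variables, max_interactions=1):
    """Yield (main_effects, interactions) for every subset of variables,
    with up to max_interactions pairwise interactions among included variables."""
    for size in range(len(variables) + 1):
        for main in combinations(variables, size):
            pairs = list(combinations(main, 2))
            for k in range(max_interactions + 1):
                for inter in combinations(pairs, k):
                    yield main, inter


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower values indicate the preferred model."""
    return n_params * log(n_obs) - 2.0 * log_likelihood


# Hypothetical covariate names, used only to show the size of the search space.
variables = ["area", "sex", "commute_distance", "occupation"]
models = list(candidate_models(variables))
print(len(models), "candidate model specifications")
```

In practice each specification would be fitted (for example by logistic regression), its maximised log likelihood passed to `bic`, and the specification with the smallest value retained.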
For the occasional journeys we fitted a simplified gravity model, where we assumed that the number of journeys from each study area to destinations a certain distance
Gravity models assume that the frequency of journeys
As spatial kernel
95% confidence intervals for the parameter estimates of the fitted distributions to both the commuting and occasional journey distances were obtained by varying all parameters around the maximum likelihood estimate and identifying the part of the parameter space, where twice the negative log likelihood differs less than
The selected study participants in Huangshan were interviewed between 21^{st} July and 8^{th} September 2007, and those in Shenzhen between 9^{th} and 30^{th} August 2007. Some descriptive statistics of the study populations are shown in
Huangshan urban  Huangshan rural  Shenzhen local residents  Shenzhen migrant workers
N  % (95% CI)  N  % (95% CI)  N  % (95% CI)  N  % (95% CI)
total  2317  8126  9894  1994
male  1115  48 (46–50)  4059  50 (49–51)  5173  52 (51–53)  834  42 (40–44)
student  439  19 (17–21)  1239  15 (14–16)  1332  13.5 (12.8–14.2)  0  0 (0–0.002)
employee  1160  50 (48–52)  5832  72 (71–73)  5955  60 (59–61)  1994  100 (99.8–100)
unemployed  718  31 (29–33)  1055  13 (12–14)  2607  26 (25–27)  0  0 (0–0.002)
registered  1957  84 (83–86)  8062  99.2 (99.0–99.4)  2074  21 (20–22)  19  0.95 (0.57–1.5)
mean family size (range)  3.25 (1–8)  3.68 (1–11)  3.08 (1–12)  n/a
travelled in past week  224  9.7 (8.5–11)  233  2.9 (2.5–3.3)  360  3.6 (3.3–4.0)  30  1.5 (1.0–2.1)
mean no of journeys if travelled (range)  1.28 (1–7)  1.09 (1–7)  1.06 (1–5)  1.03 (1–2)
mean age in years (range)  35.9 (0–98)  40.9 (0–94)  30.7 (0–94)  25.4 (15–70)
There are interesting differences in occupational status by sex and age, see
Blue = males, red = females.
The distance distributions for both students and employees from Huangshan city fall off very steeply between 20 and 30 km, whereas the distance distributions from Shenzhen city show a longer tail of up to 300 km (
For Shenzhen, the best model for students included age and registration status; the best model for employees included age, sex, registration status and whether an individual was a local resident or migrant worker. Fitting the local population and migrant workers separately yielded the same included variables for the local population, but the preferred model for the migrant population was the null model, indicating a fairly homogeneous composition of this subpopulation. The R^{2} values of these fits were low to moderate for students but very low for employees (R^{2} = 0.40, 0.14, 0.037, 0.061 for Huangshan and Shenzhen students, and Huangshan and Shenzhen local employees, respectively), indicating substantial residual variation in distances within categories. The observed distance distributions by category are summarised in
The data are displayed separately for students and employees in the different study areas as box plots with mean (diamonds), median (horizontal lines), interquartile range (bars) and 90% spread (whiskers). Categories are labelled by age, gender (m = male, f = female), urban (urb) or rural (rur) area, and registration status (reg = registered, nreg = not registered).
Overall, age is the most important predictor for students with older students travelling further to school. In Huangshan, students from rural areas have longer distances than those from urban areas, whereas in Shenzhen, those not registered travel longer distances than those registered.
For employees, men tend to have a longer commuting distance than women. Commuting distance tends to decrease with age, although teenage employees in the Shenzhen local resident population travel fairly short distances. In contrast to the situation for students, local registration is associated with longer commuting distances among Shenzhen local employees. Further details about the regression models can be found in
We fitted lognormal distributions to the distance distributions stratified by study area and employment status, and further into the categories determined by the linear regression. Cumulative observed and fitted lognormal distance distributions are shown in
Diamonds = observed distributions, lines = fitted lognormal distributions. HS = Huangshan students, SS = Shenzhen students, HE = Huangshan employees, SE = Shenzhen employees, m = male, f = female, urb = urban area, rur = rural area, n reg = not registered, reg = registered.
We also fitted several other distributional forms, including exponential, power law and Weibull distributions, to the commuting distance distributions; however, the lognormal tended to give the best fit for the majority of strata.
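The comparison of distributional forms can be illustrated with a small sketch. It is not the authors' procedure: it fits unbinned distances by maximum likelihood (whereas the analysis above used binned log distances), compares only the lognormal and exponential forms, uses the Akaike information criterion as an assumed comparison measure, and runs on synthetic data generated with made-up parameters.

```python
import math
import random


def fit_lognormal(distances):
    """Maximum likelihood fit of a lognormal; returns (mu, sigma, log likelihood)."""
    logs = [math.log(d) for d in distances]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / n)
    # At the MLE the quadratic term of the log likelihood sums to n / 2.
    loglik = -sum(logs) - n * math.log(sigma * math.sqrt(2.0 * math.pi)) - n / 2.0
    return mu, sigma, loglik


def fit_exponential(distances):
    """Maximum likelihood fit of an exponential; returns (mean, log likelihood)."""
    n = len(distances)
    mean = sum(distances) / n
    loglik = -n * (math.log(mean) + 1.0)
    return mean, loglik


def aic(loglik, n_params):
    """Akaike information criterion; lower is better."""
    return 2.0 * n_params - 2.0 * loglik


# Synthetic distances in km (illustrative parameters, not survey data).
rng = random.Random(1)
distances = [rng.lognormvariate(1.0, 1.0) for _ in range(2000)]

mu_hat, sigma_hat, ll_lognormal = fit_lognormal(distances)
mean_hat, ll_exponential = fit_exponential(distances)
aic_lognormal = aic(ll_lognormal, 2)
aic_exponential = aic(ll_exponential, 1)
print("lognormal preferred:", aic_lognormal < aic_exponential)
```

Reported distances of 0 km would need to be replaced before taking logarithms, for instance with the uniform substitution described for the regression models.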
The mean number of journeys undertaken outside the study area within the 7 days prior to the questionnaire being taken varied markedly between the different study populations, see
Number of journeys  urban Huangshan  rural Huangshan  Shenzhen local  Shenzhen migrant  Total 
number of people travelling  224  233  360  30  847 
total number of journeys  286  253  380  31  950 
mean number of journeys  0.12  0.031  0.038  0.016  0.043 
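As a worked check, the mean number of journeys per person appears to be the total number of journeys divided by the size of each study group. The sketch below assumes that the denominators are the group totals given in the descriptive statistics table (2317, 8126, 9894 and 1994); under that assumption it reproduces the tabulated means to two significant figures.

```python
# Group sizes (assumed denominators, taken from the descriptive statistics table).
population = {
    "urban Huangshan": 2317,
    "rural Huangshan": 8126,
    "Shenzhen local": 9894,
    "Shenzhen migrant": 1994,
}

# Total number of journeys outside the study area in the previous 7 days.
journeys = {
    "urban Huangshan": 286,
    "rural Huangshan": 253,
    "Shenzhen local": 380,
    "Shenzhen migrant": 31,
}

# Mean journeys per person in each group, and over all participants.
mean_journeys = {group: journeys[group] / population[group] for group in population}
overall_mean = sum(journeys.values()) / sum(population.values())

for group, value in mean_journeys.items():
    print(f"{group}: {value:.2g}")
print(f"Total: {overall_mean:.2g}")
```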
The best-fit logistic regression model for predicting who had made at least one journey outside the study area for the Huangshan city population included the following covariates: rural/urban area, sex, distance travelled to work or school, and occupation. People from the urban area, males, and those with a long commute (≥10 km) to work or school travelled more, and those who were unemployed travelled less than either students or employees. All included variables are highly significant, apart from the lack of differentiation between students and employees.
For Shenzhen city, the best fit logistic regression model included registration status, distance to work or school, subpopulation (local or migrant workers) and family size. Those registered, with a long distance to work/school, from the local population and from smaller families travelled most, with all variables being highly significant. Odds ratios for these models are shown in
Variable  Category  N  % travelled  OR (95% CI)  p
area  urban  2089  9.68
area  rural  7865  2.88  0.24 (0.19–0.30)  <5e-4
sex  male  4877  5.43
sex  female  5077  3.37  0.65 (0.54–0.78)  <5e-4
distance to work/school  <10 km  8621  4.05
distance to work/school  ≥10 km  1333  6.52  1.5 (1.2–2.0)  <5e-4
occupation  student  1567  5.89
occupation  employee  6673  4.47  0.93 (0.73–1.17)  0.527
occupation  unemployed  1714  2.67  0.41 (0.28–0.59)  <5e-4
Variable  Category  N  % travelled  OR (95% CI)  p
registration  not reg  9789  2.53
registration  reg  2091  6.79  2.5 (1.9–3.3)  <5e-4
distance to work/school  <10 km  10962  2.98
distance to work/school  ≥10 km  918  6.86  1.9 (1.4–2.6)  <5e-4
subpopulation  local  9886  3.64
subpopulation  migrant  1994  1.50  0.47 (0.31–0.71)  <5e-4
family size  ≤3  6801  3.44
family size  >3  5079  3.07  0.68 (0.51–0.91)  0.011
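To help read the odds ratios, the sketch below computes crude (unadjusted) odds ratios directly from the "% travelled" columns. The function names are illustrative. The crude values are not expected to match the tabulated odds ratios exactly, because the latter come from regression models that adjust for the other covariates; the comparison simply shows the direction and rough size of each effect.

```python
def odds(proportion):
    """Convert a proportion (0-1) into odds."""
    return proportion / (1.0 - proportion)


def crude_odds_ratio(pct_group, pct_reference):
    """Unadjusted odds ratio of travelling, from two percentages travelled."""
    return odds(pct_group / 100.0) / odds(pct_reference / 100.0)


# Huangshan: rural (2.88%) versus urban (9.68%); adjusted OR in the table is 0.24.
or_rural = crude_odds_ratio(2.88, 9.68)

# Shenzhen: registered (6.79%) versus not registered (2.53%); adjusted OR is 2.5.
or_registered = crude_odds_ratio(6.79, 2.53)

# Shenzhen: migrant (1.50%) versus local (3.64%); adjusted OR is 0.47.
or_migrant = crude_odds_ratio(1.50, 3.64)

print(f"rural vs urban: {or_rural:.2f}")
print(f"registered vs not registered: {or_registered:.2f}")
print(f"migrant vs local: {or_migrant:.2f}")
```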
We fitted a simplified gravity model with a lognormal kernel function to the observed distances travelled. The lognormal distribution yielded a better fit than other distributions, namely exponential, power law and Weibull.
Symbols = observed distributions, thick lines = fitted distributions, pale lines = 95% credibility intervals of the fitted distributions.



Huangshan  2.70 (2.55–2.84)  1.04 (0.98–1.12) 
Shenzhen  4.85 (4.70–4.99)  1.21 (1.09–1.35) 
We have analysed a unique dataset of commuting and travel behaviour in two very different parts of China. The commuting and travel patterns differ markedly between these two study sites, indicating that there is substantial heterogeneity in China. This means that it is not easy to generalise our results to the whole country, although the overall demographics of China more closely resemble those in Huangshan, whereas the strongly peaked age distribution found in Shenzhen, which reflects the large number of migrant workers, appears to be unusual (see
Study participants were chosen by household. In Huangshan, households were selected from one of the urban districts and one of the rural counties to match the overall proportion of urban and rural residents, whereas in Shenzhen, households were chosen from the local residents population and the dormitories housing the migrant workers. This selection process should give a fair representation of the overall populations in Huangshan and Shenzhen, respectively, as long as these are the major subpopulations. However, should there be substantial differences between the different urban districts and rural counties in Huangshan or any other important population groups in Shenzhen that are not covered in the selection of households, it is possible that the study populations might not reflect the cities well.
In China, the hukou system of residency permit
Migrant workers move from rural areas to the cities in order to take up jobs in manufacturing for at least 6 months of the year, living in company-provided dormitories. The fact that around 17% of Shenzhen's population is made up of these migrant workers points to the rapid growth of the local economy and its need for labour, as well as the differential in wages between Shenzhen and other areas of China. The dormitories provide very crowded accommodation, with typically 8–12 people sleeping in the same room and many more assembling in the communal areas for eating and other pursuits
The regression modelling identified several predictors of the distance to school or work, but the best-fitting models explain only a small part of the observed variability. This suggests that the complex choices of where to live and work are influenced by a large number of factors that are not well understood. However, the covariates included in the models do indicate that different mechanisms determine the distance to school or work for students and employees.
For students, the most important factor is student age, with older students travelling further as the number of schools decreases from primary to secondary schools. As the population density is lower in the rural areas, students in rural Huangshan tend to have to travel longer distances to get to their school than those in the urban area. In Shenzhen, students without a residency permit tend to have longer distances to school, possibly reflecting the more difficult access to education for the unregistered population
For employees, it is tempting to surmise that a long commuting distance is associated with a higher socioeconomic status, as has been reported for other places such as Seoul
These mobility determinants might interact with other characteristics important for disease spread; for instance, the age-dependence of commuting distance might interact with the age-dependence of contact patterns or of biological susceptibility to disease.
Travelling behaviour does not differ substantially between students and employees, but otherwise the predictors of travelling are similar to those for long commuting distances, with a long commuting distance being explicitly included in the logistic regression models for travelling.
Viboud et al.
While the patterns of occasional journeys observed might have a smaller impact on disease spread than commuting distances, due to the lower frequency of journeys, infrequent but very long distance trips are likely to play an important role in spreading diseases between regions, particularly in areas like Huangshan, where commuting is limited to very short distances.
This paper has presented the first known data on human travel patterns within China. The striking differences between travel distances seen in Huangshan and Shenzhen point to substantial heterogeneity in travel behaviour within China at its current state of development. It is important to take these regional differences into account when modelling the likely speed of geographic spread of an infectious disease outbreak and in planning for containment or control of such outbreaks. Ongoing modelling work is using these data to examine the feasibility of containment of a lethal influenza pandemic in different areas of China.
We thank the Health Bureaus of Shenzhen, Huangshan and Xiuning for their assistance in coordinating the field investigations and provision of logistical support.