Travel Patterns in China

The spread of infectious disease epidemics is mediated by human travel. Yet human mobility patterns vary substantially between countries and regions. Quantifying the frequency of travel and length of journeys in well-defined populations is therefore critical for predicting the likely speed and pattern of spread of emerging infectious diseases, such as a new influenza pandemic. Here we present the results of a large population survey undertaken in 2007 in two areas of China: Shenzhen city in Guangdong province, and Huangshan city in Anhui province. In each area, 10,000 randomly selected individuals were interviewed, and data on regular and occasional journeys were collected. Travel behaviour was examined as a function of age, sex, economic status and home location. Women and children were generally found to travel shorter distances than men. Travel patterns in the economically developed Shenzhen region are shown to resemble those in developed and economically advanced middle income countries, with a significant fraction of the population commuting over distances in excess of 50 km. Conversely, in the less developed rural region of Anhui, travel was much more local, with very few journeys over 30 km. Travel patterns in both populations were well fitted by a gravity model with a lognormal kernel function. The results provide the first quantitative information on human travel patterns in modern China, and suggest that a pandemic emerging in a less developed area of rural China might spread geographically slowly enough for containment to be feasible, while spatial spread in the more economically developed areas might be expected to be much more rapid, making containment more difficult.


Demographics
The 2000 population census provides age distributions in 5-year age bands by administrative area. Figure S1a shows the age distributions of Shenzhen and Huangshan compared with that of China overall. Most remarkable is the large proportion of people in their 20s in Shenzhen, clearly reflecting the population of migrant workers (cf. Figure 3 in the main text). It is also clear that the Chinese population is not demographically stable. This could be due to the One Child Policy implemented in 1979, and other factors [1]. Because the demography is changing rapidly, there is little merit in directly comparing the age distributions found in the study populations in 2007 with the population census data from 2000. To make the data more comparable, we project the age distribution of the study populations back to the year 2000 by subtracting 7 years from each individual's age and re-normalising the distributions. While this is the best approximation to the actual age distribution the study population would have had in 2000, it does not account for deaths that occurred between 2000 and 2007 (hence the proportion of the elderly tends to be lower in the back-projected distributions than in the census), nor for migration. Figures S1b to S1d compare the age distributions from the 2000 census with those of the study populations projected back to the year 2000 for urban Huangshan, rural Huangshan and Shenzhen, respectively. In Huangshan, the distributions match fairly well for adults over about 35 years, but differ markedly for younger people. This may be due to migration of the younger population. For Shenzhen, we compare both the back-projected and the current age distribution of the study population with the census data. In the study population, the peak in the age distribution is less pronounced than that in the population census, and interestingly the peak age (20-24 years) of the current (rather than the back-projected) study population matches that of the population census in 2000. This might indicate a high turnover of the migrant worker population, whereas the broadening of the peak would be explained by the ageing of the more stable local resident population.
In China, employment rates rise rapidly between ages 15 and 20 for both men and women. Between 20 and 50 years, employment is at its highest level, with around 95% of men, but only 85% of women, employed. After the age of 50, employment rates start to decline as people retire. However, there are marked differences between regions. Both urban study areas mirror the lower employment rate for women, with particularly low employment among Shenzhen women from their mid-twenties onwards. Retirement in these areas also starts earlier than the national average. Rural Huangshan shows very high employment rates (close to 100%) for both men and women and a much higher retirement age (see Figure S2).
The proportion of the study populations in education is shown in Figure S3. Between the ages of 6 and 15, nearly full enrolment is achieved. Between 0 and 5 years, enrolment rates rise rapidly; urban Huangshan has considerably higher enrolment rates among both very young children and older teenagers than either rural Huangshan or Shenzhen.
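The back projection of the study population to the census year, as described in this section, can be sketched as follows. This is a minimal illustration and not the authors' code: the function name, the sample ages and the choice to drop individuals who were not yet born in 2000 are assumptions made here for the example.

```python
from collections import Counter

def back_project_age_distribution(ages, years_back=7, band_width=5):
    """Shift each age back by `years_back` and return the re-normalised
    distribution over age bands of width `band_width`.

    Assumption: people who would have a negative age (not yet born in the
    census year) are dropped before re-normalising.
    """
    shifted = [a - years_back for a in ages if a - years_back >= 0]
    # Label each band by its lower bound, e.g. 20 for the 20-24 band.
    counts = Counter((a // band_width) * band_width for a in shifted)
    total = sum(counts.values())
    return {band: counts[band] / total for band in sorted(counts)}

# Hypothetical ages recorded in 2007 (illustrative values only).
ages_2007 = [3, 9, 12, 25, 27, 31, 44, 58, 70]
dist_2000 = back_project_age_distribution(ages_2007)
for band, share in dist_2000.items():
    print(f"{band:>2}-{band + 4}: {share:.3f}")
```

As noted in the text, a projection of this kind cannot recover people who died or moved away between 2000 and 2007, so the resulting shares for the elderly will tend to be too low.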

Additional detail on the regression analysis
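For Huangshan students, the best fitting model includes age and urban/rural location, with an interaction. The fitted mean log commuting distance is therefore given by $\mu = \beta_0 + \beta_{\mathrm{age}} \cdot \mathrm{age} + \beta_{\mathrm{urb/rur}} + \beta_{\mathrm{age*urb/rur}} \cdot \mathrm{age}$. Note that the urban/rural and interaction coefficients are zero for the reference category by definition. For Shenzhen students, the best fitting model includes age and registration status, without an interaction term, such that the fitted mean log commuting distance is given by $\mu = \beta_0 + \beta_{\mathrm{age}} \cdot \mathrm{age} + \beta_{\mathrm{reg}}$. The parameter estimates are given in Table S1 to Table S4. As expected, most parameters are highly significant.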
Furthermore, we fitted lognormal distributions to the observed distances stratified by the categories identified in the regression analysis. The parameter values of these are given in Table S5 to Table S8. Although the fitted distributions shown in Figure 6 in the main text are visually quite convincing, the p-values for many strata are very small, indicating a significant deviation of the observed from the fitted distributions. This is particularly the case for both Huangshan students and employees, as the sharp cut-off around 30 km in the distributions is difficult to capture.
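To illustrate what fitting a lognormal distribution to one stratum involves, the sketch below uses the closed-form maximum-likelihood estimates (the mean and standard deviation of the log distances). The fitting method actually used for Tables S5 to S8 is not stated here, so this estimator, the function names and the sample distances are assumptions for illustration only.

```python
import math

def fit_lognormal(distances_km):
    """Maximum-likelihood lognormal fit: mu and sigma of log(distance)."""
    logs = [math.log(d) for d in distances_km]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / len(logs))
    return mu, sigma

def lognormal_cdf(d, mu, sigma):
    """Probability that a journey is no longer than d under the fit."""
    z = (math.log(d) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

# Hypothetical commuting distances (km) for two strata.
strata = {
    "urban": [0.8, 1.5, 2.0, 3.5, 6.0, 12.0],
    "rural": [0.5, 0.9, 1.2, 2.5, 4.0, 28.0],
}
for name, distances in strata.items():
    mu, sigma = fit_lognormal(distances)
    share_over_30 = 1.0 - lognormal_cdf(30.0, mu, sigma)
    print(f"{name}: mu={mu:.2f}, sigma={sigma:.2f}, P(d > 30 km)={share_over_30:.3f}")
```

A smooth lognormal tail assigns non-zero probability to distances beyond 30 km, which is one way to see why a sharp cut-off in the observed data leads to a poor goodness-of-fit.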

Analysis of the frequency of occasional travel
If people travelled outside the target area randomly with a given frequency, without any difference between individuals, we would expect the number of journeys per individual to follow a Poisson distribution with rate $\lambda = n/N$, where $n$ is the number of journeys observed in the population and $N$ is the population size. The expected number of people making $k$ journeys is then given by $N \lambda^k e^{-\lambda} / k!$. Table S9 shows the observed and expected numbers of people having made 0, 1 or more journeys, as well as the p-values of a $\chi^2$-test. The p-values are very small for all study areas, indicating that not all people travel with the same probability: there are more people not travelling at all, and more people making more than one journey, than would be expected.
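The comparison of observed and Poisson-expected counts can be sketched as below. The observed counts are invented for illustration, the grouping into 0, 1 and 2 or more journeys is one reading of the categories in Table S9, and the use of one degree of freedom (three categories, minus one, minus one estimated parameter) is an assumption of this sketch.

```python
import math

def poisson_expected_counts(n_journeys, population, max_k=1):
    """Expected number of people making 0..max_k journeys, plus a final
    'more than max_k' category, if everyone travels at the same rate."""
    lam = n_journeys / population
    expected = [
        population * math.exp(-lam) * lam ** k / math.factorial(k)
        for k in range(max_k + 1)
    ]
    expected.append(population - sum(expected))
    return expected

def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic over matching categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_p_value_df1(statistic):
    """Upper-tail p-value of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))

# Hypothetical survey: 10,000 people, 5,000 journeys in total.
observed = [7000, 1800, 1200]          # people with 0, 1, 2+ journeys
expected = poisson_expected_counts(n_journeys=5000, population=10000)
statistic = chi_square_statistic(observed, expected)
p_value = chi_square_p_value_df1(statistic)
print([round(e, 1) for e in expected], round(statistic, 1), p_value)
```

In this made-up example there are more non-travellers and more frequent travellers than the homogeneous model predicts, which is the pattern of deviation described in the text.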

Reconstruction of the travel distance distributions
For the datasets in Huangshan city, we know the origin of each journey by district (Tunxi district or Xiuning county); for the Shenzhen dataset, the origin can be anywhere within Shenzhen city. The destinations of the occasional journeys were recorded at the level of county/district, city or province. Some journeys had several destinations recorded; for the distance distributions these were treated as separate journeys. For each origin-destination pair we calculated a distance distribution by weighting the distances between any two points within the area of origin and the area of destination by the population densities at both points. If there was only one journey for a specific origin-destination pair, it was allocated the median distance; if there were several journeys, they were allocated distances at the relevant percentiles of the distance distribution. The population densities were obtained from the LandScan dataset [2,3], which gives global population density estimates at a resolution of less than 1 km².
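The reconstruction step can be sketched as follows. The grid cells, planar coordinates and function names are hypothetical, and spacing the journeys at evenly distributed mid-point percentiles is an assumption about what "relevant percentiles" means; with a single journey this rule reduces to the median, as in the text.

```python
import math

def weighted_distance_distribution(origin_cells, dest_cells):
    """Cells are (x_km, y_km, population). Returns (distance, weight) pairs
    sorted by distance, each pair weighted by the product of the two
    populations, with weights normalised to sum to 1."""
    pairs = []
    for ox, oy, opop in origin_cells:
        for dx, dy, dpop in dest_cells:
            pairs.append((math.hypot(dx - ox, dy - oy), opop * dpop))
    pairs.sort()
    total = sum(w for _, w in pairs)
    return [(d, w / total) for d, w in pairs]

def weighted_quantile(dist, q):
    """Smallest distance whose cumulative weight reaches q."""
    cumulative = 0.0
    for d, w in dist:
        cumulative += w
        if cumulative >= q:
            return d
    return dist[-1][0]

def allocate_distances(dist, n_journeys):
    """One journey gets the median; several journeys get evenly spaced
    percentiles (assumed rule: mid-points of n equal probability slices)."""
    return [weighted_quantile(dist, (i + 0.5) / n_journeys)
            for i in range(n_journeys)]

# Hypothetical origin and destination areas on a planar km grid.
origin = [(0.0, 0.0, 1)]
destination = [(3.0, 4.0, 1), (6.0, 8.0, 3)]
dist = weighted_distance_distribution(origin, destination)
print(allocate_distances(dist, 1))   # single journey: median distance
print(allocate_distances(dist, 4))   # four journeys: spread over percentiles
```

For real administrative areas, great-circle distances between population grid cells would replace the planar distances used here.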

Binning the distance distributions
The travel distances were binned into bins of logarithmic width, where the maximum distance within distance category is given by . For the results presented in the main paper, we used n = . Here, we investigate the sensitivity to this arbitrarily chosen level of aggregation. To ensure statistical validity of the fitting procedure, we exclude any values of n so small that any bin contains fewer than 5 observations. We furthermore exclude values of n so large that there are fewer than 6 bins. Figure S4 shows the parameter estimates obtained for different levels of aggregation for the Huangshan and Shenzhen datasets. Although the parameter estimates vary slightly across the different aggregation levels, they are consistent across the range.
With the population power τ as an additional free parameter, we again fitted a lognormal spatial kernel to the observed distance distributions at different levels of spatial aggregation. The obtained parameter estimates are shown in Figure S5. For the Shenzhen dataset, the fitted values of τ tend to be somewhat above 1, with 1 lying close to the lower bound of the confidence interval. This means that the number of people travelling to a particular area is approximately proportional to the population of that area. However, for the Huangshan dataset, the fitted values of τ are very small, indicating that the attractiveness of a travel destination does not depend on its population size. The spatial kernel has a slightly shorter range than with a population power of τ = 1.
Comparing the fitted cumulative distributions (Figure S6) with those for the simpler model with the population power fixed to 1 (Figure 7 in the main text), the additional parameter offers only a very slight improvement in the overall fit.
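The role of the population power can be illustrated with a small sketch of a gravity-type model. The exact functional form used in the main text is not reproduced in this supplement, so the form assumed here (destination weight equal to population raised to the power τ, multiplied by a lognormal kernel of distance) and all names and numbers are illustrative assumptions.

```python
import math

def lognormal_kernel(d, mu, sigma):
    """Lognormal density in distance d (km), used as the spatial kernel."""
    z = (math.log(d) - mu) / sigma
    return math.exp(-0.5 * z * z) / (d * sigma * math.sqrt(2.0 * math.pi))

def destination_probabilities(destinations, mu, sigma, tau=1.0):
    """destinations: list of (distance_km, population).
    Assumed gravity form: weight = population**tau * kernel(distance)."""
    weights = [pop ** tau * lognormal_kernel(d, mu, sigma)
               for d, pop in destinations]
    total = sum(weights)
    return [w / total for w in weights]

# Two hypothetical destinations at the same distance, differing in size.
destinations = [(20.0, 1000), (20.0, 3000)]
print(destination_probabilities(destinations, mu=2.0, sigma=1.0, tau=1.0))
print(destination_probabilities(destinations, mu=2.0, sigma=1.0, tau=0.0))
```

With τ = 1 the larger destination attracts travellers in proportion to its population, while with τ close to 0 both destinations are equally attractive, which corresponds to the contrast between the Shenzhen and Huangshan fits described above.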