Conceived and designed the experiments: JL JHK DAF. Performed the experiments: JL. Analyzed the data: JL DAF. Contributed reagents/materials/analysis tools: JL JHK DAF JVD. Wrote the paper: JL JVD. Led STEM development: JHK DAF.
The authors have declared that no competing interests exist.
Air travel plays a key role in the spread of many pathogens. Modeling the long-distance spread of infectious disease in these cases requires an air travel model. Highly detailed air transportation models can be overdetermined and computationally problematic. We compared the predictions of a simplified air transport model with those of a model of all routes and assessed the impact of the differences on models of infectious disease.
Using U.S. ticket data from 2007, we compared a simplified “pipe” model, in which individuals flow in and out of the air transport system based on the number of arrivals and departures from a given airport, to a fully saturated model where all routes are modeled individually. We also compared the pipe model to a “gravity” model where the probability of travel is scaled by physical distance; the gravity model did not differ significantly from the pipe model. The pipe model roughly approximated actual air travel, but tended to overestimate the number of trips between small airports and underestimate travel between major east and west coast airports. For most routes, the maximum number of false (or missed) introductions of disease is small (<1 per day) but for a few routes this rate is greatly underestimated by the pipe model.
If our interest is in large-scale regional and national effects of disease, the simplified pipe model may be adequate. If we are interested in the specific effects of interventions on particular air routes, or in the time for the disease to reach a particular location, a more complex point-to-point model will be more accurate. For many problems a hybrid model that independently models some frequently traveled routes may be the best choice. Regardless of the model used, the effect of simplifications and the sensitivity to errors in parameter estimation should be analyzed.
Air travel plays an important role in facilitating the spread of many infectious diseases, and systems that model that spread may need to take it into account. Crépey and Barthélemy
In modeling air travel, as with many aspects of disease spread, the temptation is to include all possible detail, but this may lead to unwieldy, complex systems that are difficult to validate and slow to run. When stochastic models are used, this computational complexity can seriously limit our ability to run the tens of thousands of simulations that may be necessary for valid results.
Models of air travel (or travel in general) may be integrated into epidemiological models in different ways, but at some level must account for the movement of, or contact between, people at distant locations. The details of this integration and of the models themselves are not our focus here, but an example is the open source Spatial Temporal Epidemiological Modeling (STEM) project
The appropriate level of abstraction, and indeed the importance of air travel itself, depends on the disease being studied and the question being asked. The analysis presented here focuses on diseases spread by person-to-person contact, which includes many of those where rapid control might be required, e.g., influenza, smallpox
As part of our work developing STEM, we evaluated a simplified air transportation model, where all individuals flow through a single hub, in comparison with a fully saturated model where all routes are modeled individually, and a “gravity” model where the probability of travel between airports is scaled by their physical distance.
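To make the gravity model concrete, the sketch below scales the product of departures and arrivals by an inverse power of great-circle distance. The functional form, the `exponent` parameter, the haversine distance, and the exclusion of self-trips are all assumptions chosen for illustration; the text does not specify the exact form used in the analysis.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def gravity_probs(departures, arrivals, coords, exponent=1.0):
    """Joint trip probabilities P(o, d) proportional to dep[o] * arr[d] / dist(o, d)**exponent.

    Hypothetical form: `exponent` is an assumed tuning parameter, and
    self-trips (o == d) are excluded.
    """
    weights = {}
    for o in departures:
        for d in arrivals:
            if o == d:
                continue
            dist = haversine_km(*coords[o], *coords[d])
            weights[(o, d)] = departures[o] * arrivals[d] / dist ** exponent
    total = sum(weights.values())
    # Normalize so the probabilities over all ordered routes sum to 1.
    return {pair: w / total for pair, w in weights.items()}
```

With `exponent=0.0` the distance term drops out and the probabilities reduce to a normalized product of departures and arrivals, which is one way to see why the gravity and pipe models behaved similarly.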
In this article we attempt to characterize the errors associated with the simplified model in a manner meaningful to the disease modeler. The level of complexity required for a model largely depends on the question being asked; by specifying the type and magnitude of errors, we hope to aid disease modelers in deciding if using a simplified air transport model will substantively impact their conclusions.
We obtained data on individual tickets within the United States for all of 2007 from the U.S. Department of Transportation Research and Innovative Technology Administration Bureau of Transportation Statistics (RITA-BTS). Tickets give the origin and destination of full trips rather than of individual flights. The RITA-BTS ticket data (DB1BTicket, from the Airline Origin and Destination Survey) are a 10% sample of U.S. tickets from reporting carriers. Using this model we calculated the probability of a trip originating at any airport
To account for the possibility of flights on unseen routes, and to ensure comparability between models, we assigned 0.1 trips per year to every possible route not seen in the RITA-BTS data. These unseen trips account for 0.01% of the trips considered in this analysis.
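The sketch below illustrates the pseudo-count step and one natural way to turn route counts into saturated-model probabilities, namely the empirical share of an origin's trips that go to each destination. The dictionary layout, function names, and the choice to enumerate only ordered pairs of distinct airports are assumptions for illustration. Because the estimate is a ratio of counts from the same origin, it is unaffected by whether the 10% sample is scaled up first.

```python
from itertools import permutations

def add_pseudocounts(counts, airports, pseudo=0.1):
    """Copy of route counts with `pseudo` trips/year on every unseen ordered route."""
    filled = dict(counts)
    # All ordered (origin, destination) pairs with origin != destination.
    for route in permutations(airports, 2):
        if route not in filled:
            filled[route] = pseudo
    return filled

def saturated_probs(counts):
    """Empirical P(destination | origin) from per-route trip counts."""
    out_totals = {}
    for (o, _), c in counts.items():
        out_totals[o] = out_totals.get(o, 0.0) + c
    return {(o, d): c / out_totals[o] for (o, d), c in counts.items()}
```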
The simplified model we used is a "pipe" model, in which individuals flow in and out of the air transport system based on the number of arrivals and departures from a given airport (i.e., individual routes are not modeled explicitly). In this model, the flow of passengers in the air transportation network is treated like that of an incompressible fluid flowing through pipes, where airports are sources and sinks of fluid. The more traffic through a given airport, the more fluid flows and the larger the associated "pipe" into the network. Since any traveler in the global transportation system has some probability of mixing with any other traveler (either on a flight or during a flight change at some hub), the pipes of all diameters join at an abstract hub in this model. Point-to-point travel is then determined by the product of the probability of travel from the origin and the probability of travel to the destination, normalized by the total travel. Under this model the probability of a trip from origin
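As a minimal sketch, assuming that trips which start and end at the same airport are excluded and the remainder renormalized, the pipe model gives each destination a share of an origin's departures proportional to that destination's arrivals. The exact normalization in the original formula is not recoverable here, so the self-trip exclusion and the function name are assumptions.

```python
def pipe_probs(departures, arrivals):
    """Pipe-model P(destination | origin), assumed proportional to arrivals[d] for d != o.

    Equivalent to normalizing the product p_out(o) * p_in(d) over all
    ordered pairs of distinct airports and then conditioning on the origin.
    """
    probs = {}
    for o in departures:
        # Total arrivals at every airport other than the origin.
        denom = sum(a for d, a in arrivals.items() if d != o)
        for d, a in arrivals.items():
            if d != o:
                probs[(o, d)] = a / denom
    return probs
```

Because the result depends only on per-airport totals, this is the hub abstraction in action: no route-level counts are needed.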
In infectious disease modeling we are interested in the rate of introductions from
Difference in rate of introductions from 

Difference in number of introductions from 

Difference in overall rate of introductions into 
The maximum likelihood estimate of
In
Legend: In all four graphs origins are ordered left to right by increasing airport traffic, and destinations are ordered bottom to top by increasing airport traffic. (A) The log-probability that a trip from a given origin airport is to a particular destination airport under the saturated model. (B) The log-probability that a trip from a given origin airport is to a particular destination airport under the pipe model. (C) The log-probability ratio of the pipe model versus the saturated model. (D) Trips for which the rate of disease introductions from a fully infected location is overestimated by at least one individual per day (red) or underestimated by at least one individual per day (blue) under the pipe model.
Of interest to the infectious disease modeler is the frequency with which a disease will be introduced under the pipe model but not under the full model (and vice versa). To quantify this, we looked at the difference between the two models in the rate of introductions from an origin to each particular destination, under the assumption that everyone at the origin is infected with the disease. Using this metric, we found that on only 10% of routes will the rate of introductions be over- or underestimated by at least one person per day, and that this over- or underestimation tends to occur on the most traveled routes (
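The metric can be sketched as follows, assuming the daily rate of introductions from a fully infected origin is its daily departures multiplied by the model's conditional probability of each destination (so yearly counts would be divided by 365 first). The function names and the 1-per-day default threshold mirror the text but are otherwise illustrative.

```python
def introduction_rate_diff(daily_departures, p_pipe, p_sat):
    """Per-route difference (pipe minus saturated) in daily introductions.

    Assumes every traveler leaving the origin is infected, so the rate of
    introductions on route (o, d) is daily_departures[o] * P(d | o).
    """
    return {
        (o, d): daily_departures[o] * (p_pipe[(o, d)] - p_sat[(o, d)])
        for (o, d) in p_sat
    }

def share_of_routes_off_by(diffs, threshold=1.0):
    """Fraction of routes over- or underestimated by at least `threshold` per day."""
    flagged = sum(1 for v in diffs.values() if abs(v) >= threshold)
    return flagged / len(diffs)
```

Positive values correspond to the red (overestimated) routes in panel D of the figure and negative values to the blue ones.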
A final method of evaluating how well the pipe model approximates the full model compares the mixture of flight origins for individuals arriving at a given airport under the saturated model and under the pipe model. This can be characterized by calculating the Euclidean distance between the vectors of the percentage of arrivals coming from each airport under the two models. Airports with the fewest arrivals per year show a larger difference in the makeup of their arrivals between the predictions of the pipe model and the saturated model, as shown in
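A small sketch of this comparison, assuming trip counts (or expected trips, e.g. departures times P(d | o) for the pipe model) are stored per (origin, destination) pair. Origins absent from one model's mix are treated as contributing a zero share; the names are illustrative.

```python
import math

def arrival_mix(joint_trips, destination):
    """Share of arrivals at `destination` coming from each origin."""
    inbound = {o: c for (o, d), c in joint_trips.items() if d == destination}
    total = sum(inbound.values())
    return {o: c / total for o, c in inbound.items()}

def mix_distance(mix_a, mix_b):
    """Euclidean distance between two origin-share vectors; missing origins count as 0."""
    origins = set(mix_a) | set(mix_b)
    return math.sqrt(sum((mix_a.get(o, 0.0) - mix_b.get(o, 0.0)) ** 2 for o in origins))
```

For example, if 75% of an airport's arrivals come from A under the saturated model but only 50% under the pipe model (with the rest from B), the distance is the square root of 0.25² + 0.25², about 0.354.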
Legend: The Euclidean distance between the vectors representing the probability that a particular arrival at an airport is from a particular origin under the pipe-transport and saturated models, plotted against the estimated number of yearly arrivals at the airport. For airports with larger numbers of arrivals, the pipe model more accurately approximates the true distribution of arrivals.
While the simplified pipe model of air travel provides a rough approximation of actual air travel, it has several shortcomings. Most of these can be traced back to the pipe model's overestimation of the number of small-town-to-small-town trips. The other simplified model considered, a gravity model that takes distance into account, has similar problems and offers little benefit for its increased complexity.
For highly infectious diseases where air transportation plays an important role, underestimating the number of disease introductions from travel between major western and eastern population centers (e.g., Los Angeles and New York) may result in models that underestimate how quickly a disease will cross the country. Similarly, overestimating the number of locations from which people travel to less busy airports may lead to models in which diseases rapidly reach locations that might otherwise remain protected for a longer period of time. However, for most routes these effects are relatively small, and the former problem may be correctable by a hybrid model, in which frequently traveled routes are treated independently (amplified). Computationally, a pipe model offers an enormous advantage: it captures disease transmission by air travel with a 2N-edge graph, compared with the 2N^{2} edges required by the point-to-point model.
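The scaling difference can be illustrated by building both edge lists. In this sketch the pipe graph has one edge into and one edge out of a shared hub per airport (2N edges), and the point-to-point graph has one directed edge per ordered pair of distinct airports, N(N−1) edges. The exact constant depends on the edge-counting convention (the text quotes 2N^{2}), but both counts grow quadratically. The hub label and the 400-airport example are hypothetical.

```python
def pipe_edges(airports, hub="HUB"):
    """Pipe graph: one edge into and one edge out of a shared (hypothetical) hub per airport."""
    return [(a, hub) for a in airports] + [(hub, a) for a in airports]

def point_to_point_edges(airports):
    """Saturated graph: one directed edge for every ordered pair of distinct airports."""
    return [(o, d) for o in airports for d in airports if o != d]

# Hypothetical network of 400 airports.
airports = [f"AP{i}" for i in range(400)]
print(len(pipe_edges(airports)), len(point_to_point_edges(airports)))
```

For 400 airports this gives 800 edges for the pipe graph versus 159,600 for the point-to-point graph, a ratio that widens linearly as airports are added.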
In modeling the large-scale regional and national effects of disease, the pipe model may be adequate if the most important driver of disease spread is local contact and transportation modeling serves only to allow the disease to make long-distance jumps across the country. If we are interested in the specific effects of interventions on particular air routes, or in the time for the disease to reach a particular location, a more complex point-to-point model will be more accurate. For the most sophisticated and realistic simulations, even a model of point-to-point trips may be too great a simplification, as contact within airports during transit may play an important role in transmission. Other factors may also lead the investigator to choose one model over the other; for instance, mixing within the air transport system is straightforward to implement in the pipe model but may be more difficult in a point-to-point model.
Regardless of which model is used, it is important that the implications of any simplifications, or errors in parameter estimation (which become more likely as model complexity increases), are analyzed so that the appropriate level of complexity for the problem at hand may be selected.
We would like to acknowledge Dr. Derek Cummings for his valuable comments in the formulation of this manuscript, and all of the contributors to the STEM project.
We also would like to acknowledge the Eclipse Foundation (