Towards a Metropolitan Fundamental Diagram Using Travel Survey Data

Using travel diary data from 2000–2001 and 2010–2012, this research examines fundamental traffic relationships at the metropolitan level. The results help explain the causes of some traffic phenomena. Network average speed by time of day can be explained by trip length and the cumulative number of vehicles on the road. A clockwise hysteresis loop is found in the Metropolitan Fundamental Diagram in the morning period, and the reverse process occurs in the afternoon.


Introduction
The scientific analysis of traffic and the analysis of travel behavior are related fields that have seen little intersection, owing to their different histories, data sources, and methods. Traditionally, travel diaries provide data about a sample of individual travelers and serve as the data source for estimating regional travel demand models, while traffic measurement gauges network performance. Travel surveys are generally weighted to produce estimates for the region as a whole, so large diaries also implicitly contain aggregate data that can be exploited to better understand the causes of variation in traffic conditions. This study explores one region's travel surveys from two points in time using tools historically applied to traffic analysis. It tests whether some of the relationships uncovered by analysis of traffic-sensor data hold at the metropolitan level.
To date, analyses of fundamental traffic relationships have used data from place-based sensors such as loop detectors, or from GPS. Some representative studies of network flow relationships, and of the macroscopic fundamental diagram (MFD) in particular, are described below.
Network-level traffic flow relationships have been considered periodically in the literature. Based on a kinetic theory of traffic, a two-fluid model for the evolution of the speed distribution on highways has been proposed [1]. A simulation model with embedded macroscopic rules shows the dynamic behavior of relative concentrations and traffic flows on boundary and interior links, such that average speed decreases as network concentration increases [2]. The macroscopic traffic network relationship was found to hold for road sections and to vary with the number of lanes and block length [3]. Large-scale simulations reproduced the relationships on larger urban networks [4].
A macroscopic fundamental diagram (MFD) linking space-mean flow, density, and speed for the complete network of a large urban area was demonstrated in a field experiment in Yokohama, Japan [5]. The data were collected by a combination of fixed detectors and floating vehicle probes (GPS-equipped taxis). The estimates of space-mean speed and density at different times of day lie close to a smoothly declining curve, with deviations smaller than those of individual links. The spatial distribution of vehicle density in the network is one of the key determinants of a low-scatter MFD and of its shape [6]: if the spatial distribution of link density is the same for two time intervals with the same number of vehicles in the network, the same average flows should be observed. That study also examines the errors of an MFD arising from the errors of individual link microscopic fundamental diagrams (μFD) and from errors in the probability density function of link occupancy. For estimating network speed, full trajectory information from probe vehicles is superior to spot sensors such as loop detectors [7].
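One common way to build a single MFD point for a time interval, assumed here, is to take length-weighted averages of link flow and link density and then obtain space-mean speed from the identity q = k·v. The following is a minimal sketch under that assumption; the `LinkObservation` record and `network_mfd_point` function are hypothetical names, not taken from the studies cited above.

```python
from dataclasses import dataclass


@dataclass
class LinkObservation:
    """One link's measurements for a single time interval (hypothetical layout)."""
    length_km: float
    flow_veh_per_h: float       # link flow q_j
    density_veh_per_km: float   # link density k_j


def network_mfd_point(links: list[LinkObservation]) -> tuple[float, float, float]:
    """Return (space-mean flow, space-mean density, space-mean speed).

    Flow and density are averaged over links with link length as the weight;
    speed then follows from q = k * v.
    """
    total_length = sum(link.length_km for link in links)
    if total_length <= 0:
        raise ValueError("total link length must be positive")
    q = sum(link.flow_veh_per_h * link.length_km for link in links) / total_length
    k = sum(link.density_veh_per_km * link.length_km for link in links) / total_length
    v = q / k if k > 0 else float("nan")
    return q, k, v


# Two hypothetical links: a 1 km link and a 3 km link.
q, k, v = network_mfd_point([
    LinkObservation(1.0, 1200.0, 20.0),
    LinkObservation(3.0, 600.0, 40.0),
])
```

Because of the length weighting, the 3 km link has three times the influence of the 1 km link on the network averages (here q = 750 veh/h and k = 35 veh/km).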
Hysteresis was identified on Minnesota freeways, which also exhibit high scatter because of different spatial and temporal distributions of congestion for the same level of average density [8]. Another cause is the synchronized occurrence of transient periods and capacity-drop phenomena during congestion offset. Similarly, a clockwise hysteresis loop exists in the MFD [9]. That research also indicates that drivers tend to choose routes adaptively to avoid congested areas; speeds subsequently recover from congestion, so the presence of adaptive routing makes hysteresis less likely. To explore the limiting properties of network-wide traffic flow relations under congested conditions, a model reproducing hysteresis and gridlock in an urban street network was developed [10]. It finds that network flow can be approximated as a non-linear function of network average density and the variation in link densities, and that networks with multiple route options tend to congest over a smaller density range. Traffic demand management and adaptive driving affect gridlock size, propagation speed, and recovery speed.
The impact of heterogeneity on the existence of an MFD has been studied with data collected by loop detectors [11]. A hysteresis pattern was also observed, which may be caused by the spatially heterogeneous nature of travel demand, including different schedules for cars and trucks. The spatial distribution of congestion affects the scatter and hysteresis of the MFD [12].
We note that the MFD suffers from the Modifiable Areal Unit Problem (MAUP): the size and precise shape of the MFD are affected by the size of the study area. The types of roads included and the way the city is divided into zones can also affect the size and shape of the MFD.
Average vehicle densities can be estimated across a network using travel speed and the MFD of urban traffic [9,13]. Congestion is critically important to the existence of an MFD: estimates based on average travel speed are inaccurate in non-congested states but reliable and accurate in congestion or at its onset. Average trip distance increases with network load, as congestion induces drivers to take longer routes [14].
In contrast with the existing research described above, this paper employs travel diaries from a metropolitan survey to examine questions that have heretofore been studied with traffic data. Using travel surveys conducted by the Metropolitan Council of the Minneapolis/Saint Paul region in 2010–2012 as well as 2000–2001, this paper aims to develop a fundamental diagram that relates speed to the total number of vehicles on the network at a given time, and to estimate a statistical relationship that accounts for that number, trip length, and the onset and offset of congestion.

Materials and Methods
Data
The 2000 and 2010 Twin Cities Travel Behavior Inventory (TBI) surveys (with data collection in 2000–2001 and 2010–2012, respectively) are used in this analysis; the two time periods allow tests of temporal stability. The data, summarized in [15], represent the entire survey period. All data were collected on weekdays, and each trip record provides the locations of origin and destination, departure time, mode, and purpose. The spatial coverage of trips is the seven-county metropolitan region (Anoka, Carver, Dakota, Hennepin, Ramsey, Scott, and Washington counties in Minnesota).
Although the survey was multi-modal, we examine only trips by automobile in this analysis; future research may investigate MFDs for other modes or for all modes together. Travel duration was reported by survey respondents. A consultant provided household weights for the survey as described in [16]. The research team computed network distances for each trip in the survey, assuming trips took the shortest-distance path on the network, using the region's TLG (Lawrence Group) network.
Much of the data is self-reported by the individuals who participated, and therefore contains reporting errors. The following censoring rules were employed to address this issue, as illustrated in Table 1. Trips were excluded if:
• The calculated travel speed was higher than 120 km/h or lower than 10 km/h.
• The reported travel time was longer than 120 min or shorter than 1 min.
• The trip weight equals zero. (The total of all person weights in the weighted survey equals the population of the survey region. A weight of zero means the individual is oversampled or there is some other data problem, so the trip is excluded.)
• The trip began earlier than 3 am or ended after midnight.
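The censoring rules above can be expressed as a single predicate over trip records. This is a sketch under assumed field names: `Trip`, `keep_trip`, and the convention that times are minutes after midnight of the survey day are illustrative, not the TBI file layout.

```python
from dataclasses import dataclass


@dataclass
class Trip:
    # Hypothetical record layout; field names are illustrative only.
    distance_km: float   # shortest network path distance
    duration_min: float  # self-reported travel time
    weight: float        # person (trip) weight
    start_min: float     # start time, minutes after midnight of the survey day
    end_min: float       # end time, minutes after midnight (may exceed 1440)

    @property
    def speed_kmh(self) -> float:
        """Calculated speed: network distance over reported duration."""
        return self.distance_km / (self.duration_min / 60.0)


def keep_trip(trip: Trip) -> bool:
    """Return True if the trip survives all four censoring rules."""
    # Duration is checked first so a zero-minute trip never reaches the division.
    if trip.duration_min < 1 or trip.duration_min > 120:
        return False
    if trip.speed_kmh < 10 or trip.speed_kmh > 120:
        return False
    if trip.weight == 0:
        return False
    if trip.start_min < 3 * 60 or trip.end_min > 24 * 60:
        return False
    return True
```

Applied to a list of records, `[t for t in trips if keep_trip(t)]` keeps only the trips that pass every rule.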
In addition, there are biases. A subsample of the TBI 2010 respondents (too small to use for these purposes) was tracked using an on-person GPS unit for 7 days, so for those respondents we know the actual route chosen [17]. Studies using GPS data find that the median actual path for commute trips is about 30 percent longer than the shortest network path [17,18].
Further, people overstate their travel times (travel times reported in surveys are higher than actual) by 50 percent on average, though this bias depends on network structure, congestion levels, and the nature of the trip [19]. Some of this may be due to definitions of when and where a trip begins (e.g., vehicle-based GPS measures engine-on to engine-off, while a subject might report door-to-door travel times).
This means actual speeds may be about 1.3 × 1.5 = 1.95 times the calculated speeds, depending on assumptions and actual values.
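The combined correction is simple arithmetic because speed is distance divided by time: a path about 1.3 times longer than the shortest path raises speed by a factor of 1.3, and a travel time overstated by a factor of 1.5 raises it by a further 1.5. A sketch, with the two factors taken from the figures above and treated as adjustable assumptions (`adjusted_speed` is a hypothetical helper):

```python
def adjusted_speed(calculated_speed_kmh: float,
                   path_factor: float = 1.3,
                   time_overreport_factor: float = 1.5) -> float:
    """Rescale a survey-based speed for the two biases discussed in the text.

    path_factor: actual path length / shortest network path length.
    time_overreport_factor: reported travel time / actual travel time.
    Both multiply speed, since speed = distance / time.
    """
    return calculated_speed_kmh * path_factor * time_overreport_factor
```

Applied to the roughly 40 km/h daytime average speed reported in the Descriptive Results, `adjusted_speed(40.0)` gives about 78 km/h, in line with the "probably closer to 80 km/h" estimate there.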
This paper uses reported travel times and estimated shortest network path distances and reports the calculated speed ($s_{calculated}$), but this bias should be kept in mind. We do not believe it affects the general findings and fundamental relationships, though it clearly affects the values associated with those findings. Future research using GPS recordings rather than self-reported travel surveys should improve the accuracy and precision of these analyses; at present, the sample sizes of GPS surveys are too small to draw sufficiently general conclusions.
Travel surveys weight individuals to try to replicate socio-economic and demographic distributions in the general population. For a 1 percent sample, for example, the average person weight would be 100. Oversampled groups receive lower weights, while undersampled groups receive higher weights.
In TBI surveys, the sampling rate is represented by a household weight $w_h$ developed by the consultants who conducted the survey to achieve the above aims. In this paper, we require a person weight $w_{p,h}$ for every person in household $h$, which is obtained by distributing the household weight equally among the members of the household:
$$w_{p,h} = \frac{w_h}{p_h}$$
where $p_h$ is the number of people in household $h$. The sum of $w_{p,h}$ over all respondents should equal the population size ($P$) of the seven-county region. The initial weights were designed to achieve that total as well, but applying the censoring filters removes trips, so the weights must be rescaled to restore it:
$$\sum_{i} \lambda\, w_i = P$$
where $\lambda$ is the weight adjustment coefficient and $w_i$ is the person weight. (The person weight also serves as the trip weight, since a person can only be on one trip at a time.) The arrivals (trips entering the network) in each time window of duration $y$ are the sum of all trips starting in that window, each multiplied by its trip weight $w_i$:
$$a(t) = \sum_{i=1}^{I} w_i\, X_{a,i}(t)$$
where $X_{a,i}(t) = 1$ if $t \le t_a(i) < t + y$, that is, if trip $i$ originates in the time window of duration $y$ indexed by $t$, and 0 otherwise. The subscript $i$ indexes trips, and $I$ is the total number of trips. The cumulative number of arrivals by time $t$ is $A(t) = \sum_{\tau \le t} a(\tau)$. Similarly, the cumulative number of departures (trips leaving the network) by time $t$, $D(t)$, is the sum of the departures in each time period, $d(t)$:
$$D(t) = \sum_{\tau \le t} d(\tau)$$
The time at which a trip leaves is simply the time at which it enters plus the duration of the trip (its length divided by its speed):
$$t_d(i) = t_a(i) + \frac{l_i}{s_i}$$
The departures in each time period are the sum of all trips ending in the time window of duration $y$ beginning at $t$, each multiplied by its trip weight $w_i$:
$$d(t) = \sum_{i=1}^{I} w_i\, X_{d,i}(t)$$
where $X_{d,i}(t) = 1$ if $t \le t_d(i) < t + y$, meaning trip $i$ leaves the network in the time window, and 0 otherwise. The total number of vehicles on the road at any given time, $N(t)$, is the difference between cumulative arrivals $A(t)$ and cumulative departures $D(t)$:
$$N(t) = A(t) - D(t)$$
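The weighting and accumulation steps above can be sketched in plain Python. Assumptions: trip start and end times are in minutes after midnight, windows are 15 minutes long (y = 15), N(t) is evaluated at the end of each window, and the weights passed to `vehicles_on_network` have already been rescaled by λ. All function names are hypothetical.

```python
from itertools import accumulate


def person_weight(household_weight: float, household_size: int) -> float:
    """w_{p,h} = w_h / p_h: the household weight split equally among members."""
    return household_weight / household_size


def rescale_weights(weights: list[float], population: float) -> list[float]:
    """Apply lambda = P / sum(w_i) so the filtered weights again sum to P."""
    lam = population / sum(weights)
    return [lam * w for w in weights]


def vehicles_on_network(start_min, end_min, weights, window_min=15, day_min=1440):
    """Return (a, d, N) per window, with N(t) = A(t) - D(t).

    Window t covers [t * window_min, (t + 1) * window_min).
    """
    n_windows = day_min // window_min
    a = [0.0] * n_windows  # weighted trips entering the network in each window
    d = [0.0] * n_windows  # weighted trips leaving the network in each window
    for start, end, w in zip(start_min, end_min, weights):
        # X_{a,i}(t) = 1 only for the window containing the trip's start time.
        a[int(start // window_min)] += w
        # X_{d,i}(t) = 1 only for the window containing its end time;
        # a trip ending exactly at midnight is counted in the last window.
        d[min(int(end // window_min), n_windows - 1)] += w
    A = list(accumulate(a))  # cumulative arrivals A(t)
    D = list(accumulate(d))  # cumulative departures D(t)
    N = [arr - dep for arr, dep in zip(A, D)]  # vehicles on the network at window end
    return a, d, N
```

With three hypothetical trips starting at 8:00, 8:05, and 8:20 and ending at 8:20, 8:50, and 8:40, N(t) rises to 5 weighted vehicles in the 8:15 window and falls back to 0 by 9:00. Note that a trip starting and ending within the same window never appears in N(t) at a window boundary.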
The travel duration of each trip is given by:
$$\tau_i = t_d(i) - t_a(i) = \frac{l_i}{s_i}$$
Total travel duration $T$ is thus the sum of this over all trips, $T = \sum_{i=1}^{I} \tau_i$, and average travel duration is $\bar{T} = T / I$. In short, we cannot use network average lengths and speeds to correctly ascertain average travel duration (though for large samples they may be similar); instead, we construct the average from individual trip records.
[20][21][22]. Moreover, the number of mid-day trips was higher in 2000 than in 2010, though at the end of the morning peak the numbers were similar. This supports observations of a decline in off-peak (typically non-work) travel between these two periods.
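The point that average travel duration must be built from individual trip records, rather than from network average length and speed, can be checked on two made-up trips. A sketch, unweighted for brevity (survey weights would turn both functions into weighted means); the function names are hypothetical:

```python
def average_duration_from_trips(lengths_km, speeds_kmh):
    """Mean of the individual durations l_i / s_i, in hours."""
    durations = [length / speed for length, speed in zip(lengths_km, speeds_kmh)]
    return sum(durations) / len(durations)


def duration_from_averages(lengths_km, speeds_kmh):
    """Mean length divided by mean speed: generally NOT the average duration."""
    n = len(lengths_km)
    return (sum(lengths_km) / n) / (sum(speeds_kmh) / n)
```

For a 5 km trip at 20 km/h and a 30 km trip at 60 km/h, the individual durations average 0.375 h (22.5 min), while mean length over mean speed gives 0.4375 h (26.25 min). The gap arises because the mean of ratios differs from the ratio of means whenever length and speed vary across trips.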

Descriptive Results
Next, the change in the total number of on-road vehicles by time of day, N(t), is presented in Fig 2, which graphs the weighted diurnal curves for the metropolitan region. There are two spikes, at around 7 am and 5 pm. The slopes of the line indicate the change in N(t) by time of day, and hence the onset and offset of congestion during the morning and afternoon rush hours. With travel surveys, unlike traffic data, we know the purpose of each trip, so we can disaggregate travel by purpose. Travel behaviors (such as trip length and time of day), and thus the congestion experienced, vary by purpose. Fig 2 compares full-time workers, part-time workers, and non-workers ($N_{full}(t)$, $N_{part}(t)$, $N_{non}(t)$). First, there are many more full-time workers than part-time workers. Second, part-time workers are much more likely to be on the road in the middle of the day, while full-time workers are more peaked in their travel patterns. This peaked nature of (full-time) work trips gives rise to congestion. The figure also shows that non-work trips are more common in the afternoon. While the number of non-work trips is far higher than the number of work trips, the greater length of work trips partly offsets this, so during the peaks most vehicles on the road are on work trips, even if most trips that originate in that period are not.
[Figure: … and a(t); the red line relates L(t) and N(t), the green line relates N(t) and a(t), and the black line shows the three-dimensional relationship among a(t), N(t), and L(t).]
Average network trip distance (on the shortest-path route) by 15-minute period, in Fig 5, shows that full-time work trips are on average longer than part-time trips, and that trips in the early morning hours tend to be longer. People with long commutes depart earlier, both to ensure on-time arrival and to avoid congestion effects, which are more onerous on longer trips. Fig 6 shows the change in (calculated) speed with time of day. The speed is quite high in the early morning and relatively flat the rest of the day, holding at about 40 km/h (probably closer to 80 km/h in actuality, if we adjust for over-reporting of travel duration and under-estimation of distance). The median speed follows a similar trend to the average speed but with lower values. The box plots also show the distribution of speed at different times of day; the spread of speed is about 10 km/h. In general, longer-distance trips take more time. However, there are differences by time of day, which we associate with (1) different congestion levels in different parts of the day and (2) different trip purposes in different parts of the day.
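The 15-minute profiles of distance and speed amount to a group-by on departure period. A sketch under stated assumptions: departure times in minutes after midnight, unweighted means and medians, and a hypothetical `summarize_by_period` helper. The median is reported next to the mean because, as noted above, the two differ.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_by_period(start_min, distance_km, speed_kmh, period_min=15):
    """Return {period_start: (mean distance, mean speed, median speed)}.

    Trips are assigned to the period containing their departure time.
    """
    groups = defaultdict(list)
    for start, dist, speed in zip(start_min, distance_km, speed_kmh):
        period_start = int(start // period_min) * period_min
        groups[period_start].append((dist, speed))
    summary = {}
    for period_start, rows in sorted(groups.items()):
        dists = [dist for dist, _ in rows]
        speeds = [speed for _, speed in rows]
        summary[period_start] = (mean(dists), mean(speeds), median(speeds))
    return summary


# Hypothetical trips: three departing 7:00-7:14, one departing at 8:00.
summary = summarize_by_period(
    [420.0, 425.0, 430.0, 480.0],
    [20.0, 10.0, 30.0, 5.0],
    [60.0, 30.0, 36.0, 25.0],
)
```

For the 7:00 period the mean speed (42 km/h) exceeds the median (36 km/h) because one fast, long trip pulls the mean up, the same direction of difference the box plots show.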
The relationship between trip distance and speed is shown in Fig 7. Long trips are associated with higher speeds, particularly for early morning departures, though speeds drop noticeably after 7:00 am. Judging from the ratio of speed to trip distance, congestion is greater in the afternoon than in the morning. Fig 8 graphs the fundamental relationship in two dimensions and shows that, in general, as more vehicles enter the network and traffic levels increase, speed decreases. But it also shows hysteresis-type processes. There is not a single line but rather four relationships: AM congestion onset (the left-most portion), mid-day congestion offset, PM congestion onset, and PM congestion offset. Unlike typical macroscopic or microscopic fundamental diagrams, we cannot assume trip length is fixed, which makes the relationship more complicated. For instance, while one might expect speed to rise during the AM congestion offset, average network speed does not rise until the PM congestion onset, and it continues rising during the PM congestion offset. What is going on (as the next section illustrates) is that trip length is a confounding factor: the variability of trip length influences speed, so we control for it when testing for hysteresis. Longer trips are faster and are more likely to occur during the AM congestion onset and the PM congested periods, while the shortest trips (typically mid-day non-work trips) occur in less congested periods but, because they are shorter and thus use more local roads, tend to have lower speeds.
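The confounding role of trip length can be seen in a two-class example with hypothetical numbers: long trips at a constant 50 km/h and short local trips at a constant 30 km/h. Only the mix of trips changes between periods, yet the average speed moves as if congestion had changed. `mixture_average_speed` is an illustrative helper, not part of the paper's method.

```python
def mixture_average_speed(classes):
    """Trip-weighted mean speed over (share, speed_kmh) pairs."""
    total_share = sum(share for share, _ in classes)
    return sum(share * speed for share, speed in classes) / total_share


# Hypothetical classes: neither class's speed changes between periods.
peak_mix = [(0.6, 50.0), (0.4, 30.0)]    # 60% long trips, 40% short trips
midday_mix = [(0.2, 50.0), (0.8, 30.0)]  # 20% long trips, 80% short trips

peak_speed = mixture_average_speed(peak_mix)      # about 42 km/h
midday_speed = mixture_average_speed(midday_mix)  # about 34 km/h
```

The mid-day average is lower even though no class of trips slowed down, which is why the speed regressions in the next section include trip length as an explanatory variable.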
That relationship becomes more obvious in Fig 9, which shows the relationship between average travel duration and total vehicles on the network for every 15-minute period. Early-morning trips are very long, and durations stabilize at about 24 minutes in the morning peak 15-minute period. Travel durations tend to decline through the mid-day as non-work (and part-time work) trips take a larger share of total travel. Durations then rise slightly in the afternoon peak but do not match the morning, as the share of non-work trips is much higher.

Statistical Results: Arrival and Departure Rates
A regression of the number of vehicles entering the network, a(t), on the number of vehicles on the network, N(t), was estimated as a quadratic relationship:
$$a(t) = \beta_0 + \beta_1 N(t) + \beta_2 N(t)^2 + \varepsilon$$
The regression results for the 2010 and 2000 TBI are shown in Tables 2 and 3, to test temporal stability. The results vary by year, but all show a positive relationship between a(t) and N(t) and a negative relationship with N(t)², indicating that the number of vehicles entering the system grows at a diminishing rate as traffic on the system rises. This resembles the relationships found with the macroscopic and microscopic fundamental diagrams. The difference between departure time and arrival time is the trip duration, and both relate to a city's dynamic state. The departure rate (exiting the network) depends on the state of traffic and on arrival time. The arrival rate (entering the network) reflects the personal choice of each traveler, based on anticipated traffic levels (a function of historic traffic) and the desired arrival time at the destination.
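A quadratic fit of a(t) on N(t) is ordinary least squares with N(t) and N(t)² as regressors. The following is a minimal pure-Python sketch (not the estimation software used for Tables 2 and 3); it solves the normal equations after rescaling each column by its maximum absolute value, which keeps the system reasonably conditioned when N(t) is in the hundreds of thousands. `ols` and `fit_arrival_curve` are hypothetical names.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    Columns are rescaled by their maximum absolute value for conditioning,
    and the system is solved by Gauss-Jordan elimination with partial pivoting.
    """
    k = len(X[0])
    scales = [max(abs(row[j]) for row in X) or 1.0 for j in range(k)]
    Z = [[row[j] / scales[j] for j in range(k)] for row in X]
    # Augmented matrix [Z'Z | Z'y].
    m = [[sum(z[i] * z[j] for z in Z) for j in range(k)]
         + [sum(z[i] * yi for z, yi in zip(Z, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    # Undo the column scaling so coefficients are in the original units.
    return [m[j][k] / m[j][j] / scales[j] for j in range(k)]


def fit_arrival_curve(n_on_network, arrivals):
    """Fit a(t) = b0 + b1*N(t) + b2*N(t)^2 and return [b0, b1, b2]."""
    X = [[1.0, n, n * n] for n in n_on_network]
    return ols(X, arrivals)
```

The pattern reported for Tables 2 and 3 corresponds to b1 > 0 and b2 < 0: arrivals still rise with N(t), but each additional vehicle on the network is associated with a smaller increase.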
In Table 4, the number of vehicles decreases faster in the morning peak hour (-35.814 vs. 24.111), while it increases faster in the afternoon peak hour (36.083 vs. -20.725); thus the number of vehicles increases fastest during the afternoon congestion onset and decreases fastest during the morning offset. This relates to travelers' trip purposes: in the morning, people need to get to their workplace on time, whereas in the afternoon people leave work over a wider range of times and may engage in other activities.
Since we have a multi-dimensional relationship, a statistical model is employed to decompose the factors explaining average calculated travel speed by time of day, as shown in Table 5. We consider three factors: the average (estimated shortest-path) trip length by time of day (L(t)), the number of vehicles on the road at time t (N(t)), and a dummy variable indicating congestion onset (c). Other functional forms (e.g., quadratic) were tested without notable improvement in statistical performance.
$$S(t) = f\big(L(t), N(t), c\big) \qquad (15)$$
We include L (trip distance) in the model to predict speed (S(t), $s_i$) because speed is not uniform: it depends on congestion and on trip distance, neither of which is fixed.
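Assuming the linear form suggested by the text (other functional forms were reported to add little), the speed model can be sketched as OLS with an onset dummy, together with the adjusted R² used to compare the aggregate and disaggregate versions. The same code applies to L(t) by time slice or to individual l_i. It repeats the hypothetical pure-Python `ols` helper so the block stands alone; `fit_speed_model` is likewise a hypothetical name.

```python
def ols(X, y):
    """OLS via normal equations with column rescaling (Gauss-Jordan, partial pivoting)."""
    k = len(X[0])
    scales = [max(abs(row[j]) for row in X) or 1.0 for j in range(k)]
    Z = [[row[j] / scales[j] for j in range(k)] for row in X]
    m = [[sum(z[i] * z[j] for z in Z) for j in range(k)]
         + [sum(z[i] * yi for z, yi in zip(Z, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[j][k] / m[j][j] / scales[j] for j in range(k)]


def fit_speed_model(lengths_km, n_vehicles, onset, speeds_kmh):
    """Fit S = b0 + b1*L + b2*N + b3*c; return (coefficients, adjusted R^2)."""
    X = [[1.0, length, n, c] for length, n, c in zip(lengths_km, n_vehicles, onset)]
    b = ols(X, speeds_kmh)
    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    mean_s = sum(speeds_kmh) / len(speeds_kmh)
    ss_res = sum((s - f) ** 2 for s, f in zip(speeds_kmh, fitted))
    ss_tot = sum((s - mean_s) ** 2 for s in speeds_kmh)
    n_obs, p = len(speeds_kmh), len(b) - 1  # p regressors, excluding the intercept
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - p - 1)
    return b, adj_r2
```

The adjustment term (n − 1)/(n − p − 1) penalizes extra regressors. In the disaggregate model, the much larger trip-to-trip variability in speed inflates the residual sum of squares, which is why adjusted R² falls even though the coefficients keep their signs.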
The second regression is estimated for each individual trip rather than by time slice, using each trip's calculated length and calculated speed rather than the values averaged by time of day.
As the number of observations rises, so does the variability between observations (the first regression averaged across multiple people traveling at a given time of day). Thus the adjusted R² is lower, dropping from 0.87 in the aggregate model to 0.33 in the disaggregate model. Nevertheless, the core relationships with length (in this case individual trip length, $l_i$) and N(t) hold. All variables are significant at the 5 percent level or better.
From the disaggregate model, each additional vehicle on the road in 2010 reduces network average speed by 0.00004 km/h, and each additional km of trip length increases speed by 0.96 km/h. Moreover, time periods in which congestion is increasing (c = 1) have speeds 0.84 km/h lower. In 2010, length is a more important factor (based on the coefficient magnitudes), and the number of vehicles on the network a less important one, than in 2000. Fig 1 shows that during congested periods the cumulative number of vehicles entering and exiting the network is higher in 2010 than in 2000, but over the whole day there were slightly more vehicles on the network in 2000 than in 2010.
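Because the model is linear, the 2010 disaggregate coefficients quoted above translate directly into predicted speed changes. A small sketch of that arithmetic; the scenario values are hypothetical, and `predicted_speed_change` is an illustrative helper, not part of the paper.

```python
# 2010 disaggregate coefficients as reported in the text (km/h per unit).
COEF_N = -0.00004   # per additional vehicle on the network
COEF_L = 0.96       # per additional km of trip length
COEF_ONSET = -0.84  # congestion onset period (c = 1) versus c = 0


def predicted_speed_change(delta_vehicles, delta_length_km, onset_change):
    """Change in predicted speed (km/h) for given changes in N, l_i, and c."""
    return (COEF_N * delta_vehicles
            + COEF_L * delta_length_km
            + COEF_ONSET * onset_change)
```

For a hypothetical trip 5 km longer than a comparison trip, made while 10,000 more vehicles are on the network during a congestion-onset period, the coefficients imply -0.4 + 4.8 - 0.84 = 3.56 km/h: within this range, trip length dominates the effect of network load.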
To test for hysteresis, Table 6 shows the coefficients of $l_i$ and N(t) on speed in the morning peak hour, mid-day, and afternoon peak hour using disaggregate data. Hysteresis depends on the variation in the number of vehicles on the network. In the speed regression, the constant for congestion onset is higher than that for offset in all six cases. The coefficient on length is higher in offset in 5 of 6 cases: longer trips are more likely to be at the edge of the region, have faster speeds, and recover faster from congestion. The coefficient on the number of vehicles on the network is higher (less negative) for offset than for onset in 5 of 6 cases. Speed is most sensitive to $l_i$ and N(t) in the least congested mid-day period. For the onset, the coefficient of $l_i$ is higher in the morning than in the afternoon and the coefficient of N(t) is lower in the morning than in the afternoon; the opposite holds for the offset. The process by which speed changes with the number of vehicles and trip length is complicated, but hysteresis can still be seen in the change of speed across time periods and traffic levels.
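One descriptive way to compare onset and offset behavior, in the spirit of Table 6, is to fit the same trip-level speed model separately to onset and offset trips and compare the coefficients. A sketch repeating the hypothetical pure-Python `ols` helper; the tuple layout and the name `onset_offset_coefficients` are assumptions. A formal test of the difference would instead pool both groups in one regression with interaction terms.

```python
def ols(X, y):
    """OLS via normal equations with column rescaling (Gauss-Jordan, partial pivoting)."""
    k = len(X[0])
    scales = [max(abs(row[j]) for row in X) or 1.0 for j in range(k)]
    Z = [[row[j] / scales[j] for j in range(k)] for row in X]
    m = [[sum(z[i] * z[j] for z in Z) for j in range(k)]
         + [sum(z[i] * yi for z, yi in zip(Z, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[j][k] / m[j][j] / scales[j] for j in range(k)]


def onset_offset_coefficients(trips):
    """Fit s_i = b0 + b1*l_i + b2*N(t) separately for onset and offset trips.

    trips: list of (length_km, n_on_network, is_onset, speed_kmh) tuples.
    Returns {"onset": [b0, b1, b2], "offset": [b0, b1, b2]}.
    """
    results = {}
    for label, flag in (("onset", True), ("offset", False)):
        rows = [t for t in trips if t[2] == flag]
        X = [[1.0, length, n] for length, n, _, _ in rows]
        y = [speed for _, _, _, speed in rows]
        results[label] = ols(X, y)
    return results
```

With hysteresis of the kind described above, the fitted onset constant would exceed the offset constant, while the offset length coefficient would be larger and the offset N(t) coefficient less negative.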

Discussion
This analysis uses a travel survey to measure total traffic levels and finds statistically significant relationships between aggregate demand and both individual and average travel speed. We also find a hysteresis process, whereby speed depends not only on vehicles entering or exiting the network and the number of vehicles on the network, but also on whether congestion is rising or falling. A strength of this analysis is the ability to use historic travel surveys to construct these curves for previous years in which traffic speed data may not have been collected. Travel surveys have been collected systematically since the 1940s, and a number of those from the 1960s and 1970s still survive (many are archived at the Metropolitan Travel Survey Archive, http://surveyarchive.org). While not every survey has the spatial detail or weighting scheme we might like today, the general results should be comparable and can show the stability (or lack thereof) of metropolitan speed, flow, and length relationships.
Cautions about the analysis include that it is based solely on personal travel reported in the survey, which excludes non-personal trucks, buses, bicycles, and other vehicles on the road. Even with such limits, however, the relationships among speed, trip distance, and trip duration are sufficiently stable that strong fundamental relationships can be observed using travel survey data.