A Two-Stage Algorithm for Origin-Destination Matrices Estimation Considering Dynamic Dispersion Parameter for Route Choice

  • Yong Wang,

    Affiliations School of Management, Chongqing Jiaotong University, Chongqing, China, Department of Civil and Environmental Engineering, University of Washington, Seattle, Washington, United States of America

  • Xiaolei Ma,

    xiaolei@buaa.edu.cn (XM); yinhai@uw.edu (YH)

    Affiliations School of Transportation Science and Engineering, Beijing Key Laboratory for Cooperative Vehicle Infrastructure, Systems, and Safety Control, Beihang University, Beijing, China, Jiangsu Province Collaborative Innovation Center of Modern Urban Traffic Technologies, SiPaiLou #2, Nanjing, Jiangsu, China

  • Yong Liu,

    Affiliation School of Management, Chongqing Jiaotong University, Chongqing, China

  • Ke Gong,

    Affiliation School of Management, Chongqing Jiaotong University, Chongqing, China

  • Kristian C. Henrickson,

    Affiliation Department of Civil and Environmental Engineering, University of Washington, Seattle, Washington, United States of America

  • Maozeng Xu,

    Affiliation School of Management, Chongqing Jiaotong University, Chongqing, China

  • Yinhai Wang

    xiaolei@buaa.edu.cn (XM); yinhai@uw.edu (YH)

    Affiliation Department of Civil and Environmental Engineering, University of Washington, Seattle, Washington, United States of America


Correction

12 Feb 2016: Wang Y, Ma X, Liu Y, Gong K, Henrickson KC, et al. (2016) Correction: A Two-Stage Algorithm for Origin-Destination Matrices Estimation Considering Dynamic Dispersion Parameter for Route Choice. PLOS ONE 11(2): e0149827. https://doi.org/10.1371/journal.pone.0149827 View correction

Abstract

This paper proposes a two-stage algorithm to simultaneously estimate the origin-destination (OD) matrix, link choice proportions, and dispersion parameter using partial traffic counts in a congested network. A non-linear optimization model is developed which incorporates a dynamic dispersion parameter, followed by a two-stage algorithm in which Generalized Least Squares (GLS) estimation and a Stochastic User Equilibrium (SUE) assignment model are iteratively applied until convergence is reached. To evaluate the performance of the algorithm, the proposed approach is implemented in a hypothetical network using input data with high error, and tested under a range of variation coefficients. The root mean squared errors (RMSE) of the estimated OD demand and link flows are used to evaluate the model estimation results. The results indicate that the estimated dispersion parameter theta is insensitive to the choice of variation coefficients. The proposed approach is shown to outperform two established OD estimation methods and produce parameter estimates that are close to the ground truth. In addition, the proposed approach is applied to an empirical network in Seattle, WA to validate the robustness and practicality of this methodology. In summary, this study proposes and evaluates an innovative computational approach to accurately estimate OD matrices using link-level traffic flow data, and provides useful insight for optimal parameter selection in modeling travelers’ route choice behavior.

Introduction

Urban sprawl and population growth have resulted in increasingly severe traffic congestion in major cities around the world. City planners and decision makers have recognized the need for comprehensive traffic management strategies to meet the challenges of rapidly evolving built environments and population demographics. Effective transportation policies and control measures can improve traffic safety and quality of service, as well as promote economic development and reduce air pollution. Obtaining the origin-destination (OD) traffic demand matrix at low cost and with high accuracy is not only a problem in transportation science, but has also drawn attention from scholars in various scientific fields. For example, researchers in statistical physics and complex systems recently proposed a number of novel methods to estimate the OD matrix directly from population data [1, 2, 3, 4, 5, 6]. Reliable OD matrix estimation can provide critical insight for traffic management, operations, and urban planning efforts to mitigate congestion [7, 8]. Thus, a reliable OD matrix estimation method is indispensable for both transportation planners and traffic engineers.

A number of approaches have been developed for estimating OD matrices in the past several decades [9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]. Compared with conventional survey-based methods, data-driven OD estimation methods relying on link-level traffic flow measurements require less effort and offer significantly reduced time and cost for data acquisition and processing. For such methods, observed traffic flows at key points throughout the network should be known as prior information for OD matrix initialization.

Past research on this topic has considered a range of different optimization methods to estimate OD demands, including entropy maximizing estimators [21, 22], maximum likelihood estimation [23], Bayesian inference estimation [24], and generalized least squares (GLS) [9, 10, 25]. Entropy maximizing estimators are used to maximize the spread of trip distributions on all available paths (routes) where the observed traffic flows are used as the only information (i.e. without a target trip matrix). Maximum likelihood estimation aims to maximize the likelihood of the closeness between the target OD matrix and the estimated OD matrix. In the Bayesian inference approach, the target OD matrix is a prior probability function of the estimated OD matrix on the basis of observed traffic count data. The GLS estimator is a robust and efficient linear unbiased estimator, which estimates the OD matrix by minimizing the Weighted Euclidean Distances (WED) between the target data and the solution data.

User equilibrium (UE) assignment models are commonly used to obtain path choice behavior based on the estimated OD demand. Deterministic UE assignment models assume that all users have access to perfect information about the generalized link travel costs, and select a route with the lowest perceived travel cost [26]. Beckman [27] formulated the UE assignment model by assuming that the OD demands are a function of level of service. A combined distribution and assignment model which relies on link-level traffic flow data was presented by Fisk and Boyce [28], and extended by Lam and Huang [29] to address multiclass-user transportation networks. Fisk [30, 31] proposed a combined entropy maximizing model with UE constraints. Yang et al. [11] integrated the GLS technique with a UE traffic assignment model for OD matrix estimation, presented in the form of a convex bi-level optimization problem. Summaries of the more recent contributions to UE-based traffic assignment are provided in Han [32], Lu et al. [33], Inoue and Maruyama [34], and Kumar and Peeta [35].

The stochastic user equilibrium (SUE) principle allows the perceived cost to vary between individuals in a heterogeneous population, which can be seen as a more realistic approach than deterministic UE [15, 36], in which the perceived travel costs cannot vary between travelers. The probit SUE was first formulated as a generalization of user equilibrium by Daganzo and Sheffi [37], and developed by Sheffi and Powell [38] as a mathematical programming problem. Liu and Fricker [39] presented a two-stage SUE approach to estimate OD matrices and the probit dispersion parameter in an iterative manner. Yang et al. [15] improved on the methods described in Liu and Fricker by incorporating link traffic flows and travel cost obtained using logit-based SUE traffic assignment. Meng et al. [40] presented a linearly constrained model and solution algorithm for the probit SUE problem with fixed demand and separable link travel time functions. This modeling approach was extended in Meng et al. [41] using elastic demand and non-separable link travel time functions. Time-dependent traffic assignment can be also formulated as a multinomial logit model [42, 43, 44], and this has become one of the most common methods for SUE-based traffic assignment [45, 46, 47]. In a fixed-point formulation, fixed target demands or link flows are used to establish model based on UE and SUE principles [13, 14, 19].

In the multinomial logit model formulation, the link choice probability is a function of a dispersion parameter θ [16], which describes road users’ perception of travel costs. Though the dispersion parameter θ is predetermined in many previous studies [14, 36, 45, 46, 47, 48], here we assume that this value should be allowed to change with traffic conditions. In addition, Lo and Chan [16] proposed a maximum likelihood procedure for simultaneously estimating the OD matrix and the dispersion parameter θ, while the link choice proportions and link flows can be further calculated based on the maximum likelihood estimators of OD matrix and θ. Compared with the previous studies, the main contributions of this paper lie in: (1) A fixed-point model is formulated with a dynamic dispersion parameter θ, where the estimation of link choice proportions is integrated into the optimization procedure; (2) A GLS estimator is utilized to train this model, and the link choice proportions can be simultaneously calculated based on the OD matrix and dispersion parameter through a multinomial logit model; (3) A two-stage iterative algorithm is presented to refine the OD matrix and dispersion parameter estimates, and Sequential Quadratic Programming (SQP) from the extended quasi-Newton method is applied in the two-stage algorithm process [49].

The remainder of this paper is organized as follows: In Section 2, relevant notation, definitions, and model formulations are presented, followed by a link choice proportion approach to calculate the observed link flow using a true OD matrix. A two-stage algorithm is described in Section 3, along with model implementation details. The performance of the proposed approach is tested in a hypothetical network, and a sensitivity analysis is conducted using a range of variation coefficients. Results are presented and compared with those obtained through other established OD estimation methods. In Section 4, results are presented for a real-world network using loop detector data in the city of Seattle, WA to demonstrate the practicality of the proposed approach. Finally, conclusions are summarized in Section 5.

Model Formulation

Related Notations and Definitions

The notation and parameter definitions used throughout the paper are as follows:

  1. K the set of network links k ∈ K, where T denotes the total number of links
  2. L the set of observed links l ∈ L, where Γ denotes the number of observed links
  3. J the set of OD pairs j ∈ J, where τ indicates the total number of OD pairs
  4. A the set of paths connecting the OD pair j, a ∈ A
  5. ck travel cost of link k
  6. crj travel cost of path r connecting the OD pair j
  7. tk the free flow travel time of link k
  8. Ck the capacity of link k
  9. αk the performance function parameter of link k
  10. βk the exponential value of link k’s performance function
  11. M the set of nodes in the network
  12. f the vector for estimated link flows, where fk is the estimated flow for link k
  13. the vector for observed link flow, where is the observed link flow for link l
  14. d the estimated OD vector matrix, where dj is the jth element of d for OD pair j
  15. the target OD vector matrix, where is the jth element of for OD pair j
  16. the initial OD vector matrix for SQP algorithm optimization
  17. W' the initial weight matrix in the group of all paths connecting each OD pair
  18. w'mn the initial weight element of all paths connecting nodes m and n, m, n ∈ M
  19. W the weight matrix in the group of all paths connecting each OD pair
  20. wmn the weight element of all paths connecting nodes m and n, m, n ∈ M
  21. E the identity matrix with the same dimension as the initial weight matrix
  22. G the vector matrix of observed link flows, where Gi is the ith element of G
  23. U the covariance matrix for the target OD vector and estimated OD vector
  24. V the covariance matrix for the observed link flows and estimated link flows
  25. P the matrix of link choice proportions, where pkj is the kjth element of P. This is equivalent to the proportion of OD pair j traveling on the observed link k
  26. Prj the probability of path r that connects OD pair j being chosen for a trip
  27. xk the observed traffic flow of link k
  28. the estimated traffic flow of link k at the sth iteration in the traffic assignment stage
  29. the auxiliary mean traffic flow of link k at the sth iteration in the traffic assignment stage
  30. θ the estimated dispersion parameter for OD estimation
  31. the target dispersion parameter for OD estimation
  32. the initial dispersion parameter for SQP algorithm optimization
  33. Q the covariance matrix for θ and
  34. Sd the feasible solution set for OD matrix
  35. Sθ the feasible solution set for θ parameter
  36. the variance for OD demands
  37. the variance for link flows
  38. the variance for dispersion parameter
  39. λd random term for the target OD matrix
  40. λf the random term for observed link flows
  41. F1 the “distance” between the estimated OD vector matrix d and target demand OD matrix
  42. F2 the “distance” between the estimated link flow vector f and observed link flow vector
  43. F3 the “distance” between the estimated dispersion parameter θ and the target dispersion parameter
  44. akrj decision variable, if the link k lies on path r connecting the OD pair j, and set akrj = 1, and 0 otherwise
  45. η the percentage of traffic flow traveling from each node to the most adjacent node
  46. RMSE(OD) the root mean squared error between estimated and true OD matrices
  47. RMSE(LF) the root mean squared error based on estimated and observed (true) link flows

Based on the above notations, the OD estimation model development and validation procedure can be described as follows:

  1. A ground truth OD matrix is used as prior information to calculate the link flows based on the link choice proportion model. These link flows represent the measured traffic flows obtained through fixed mechanical sensors or other means;
  2. The observed link flows are chosen from those calculated link flows at fixed points throughout the network;
  3. The estimated OD matrix, link flows, and dispersion parameter are obtained via the fixed-point model and two-stage iterative algorithm using the partial observed link flows from step (2);
  4. Results are evaluated and compared with the ground truth as established in step (1).

The Fixed Point Model with Dynamic Dispersion Parameter

As presented in the previous subsection, the estimated OD vector matrix is expressed as d = [d1,d2,…,dj,…,dτ]', where dj denotes the mean traffic flow of the jth element of d for OD pair j. Consider an OD pair j connected by a link k which is associated with a link performance cost function ck(fk) equal to the cost of using link k. The link performance cost function [50] is expressed during the traffic assignment procedure in Eq 1: (1)
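The link performance function cited above [50] is the Bureau of Public Roads (BPR) function. A minimal Python sketch of its standard form, c_k(f_k) = t_k [1 + α_k (f_k / C_k)^β_k], is given here for illustration. The function name `bpr_link_cost` and the sample inputs are hypothetical; the default values α_k = 0.15 and β_k = 4 are those used later in the hypothetical network test.

```python
def bpr_link_cost(flow, free_flow_time, capacity, alpha=0.15, beta=4.0):
    """Standard BPR form: t_k * (1 + alpha_k * (f_k / C_k) ** beta_k)."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return free_flow_time * (1.0 + alpha * (flow / capacity) ** beta)


# Illustrative inputs only (not taken from the paper's tables).
print(bpr_link_cost(0.0, 10.0, 500.0))    # cost at zero flow equals t_k
print(bpr_link_cost(500.0, 10.0, 500.0))  # cost when flow equals capacity
```

With α_k = 0.15, a link loaded exactly to capacity costs 15% more than its free-flow travel time, and the cost grows with the fourth power of the flow-to-capacity ratio beyond that.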

The link flow vector is defined as f = [f1,f2,…,fΓ]', and the matrix of link choice proportions is denoted as P = [pkj], where 0 ≤ pkj ≤ 1. Each element pkj represents the proportion of trips for OD pair j that use link k. Thus, the mathematical expectation of the link flow vector f can be calculated as E[f] = [Pd]Γ×1, where Pd is the product of the matrix of link choice proportions P and the mean OD vector d. P can be adjusted by the link flows and the dispersion parameter θ.

The OD matrix can be estimated via a fixed point formulation by considering the target OD matrix and observed link flows as follows [9, 10, 13, 14, 15, 16, 18, 19, 51]: (2) Where:

P(d) = {pkj(dj)} is the assignment matrix, which represents the proportion of OD pair j using the observed link k;

f = P(d)d is the estimated link flow vector. f = {fk}, where fk = Σj pkj(dj)dj.

In this study, the dispersion parameter is integrated into the objective function (Eq 2) [16, 19] as follows: (3)

This model can be seen as a Stochastic User Equilibrium (SUE) problem [13]. The Generalized Least Squares (GLS) estimator can be used to solve Eq 3 by minimizing the Weighted Euclidean Distances (WED) between the target data and the solution vector, and Eq 3 can then be reorganized as shown in Eq 4 [9, 10, 15, 48]: (4) Where:

P(d,θ) = {pkj(dj,θ)} is the assignment matrix, and is a function of both OD matrix and dispersion parameter θ.

f = P(d,θ)d is the estimated link flow vector. f = {fk}, where fk = Σj pkj(dj,θ)dj.
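As an illustration of the GLS objective, the sketch below evaluates the sum of the three weighted squared distances described in the notation list (F1 for OD demands, F2 for link flows, F3 for the dispersion parameter). It assumes diagonal covariance matrices U, V, and Q, so each term reduces to a squared difference divided by a variance, and it treats P as fixed for a single evaluation, whereas in the model P depends on d and θ. The function name, argument names, and sample numbers are hypothetical.

```python
def gls_objective(d, d_target, f_obs, P, theta, theta_target,
                  var_d, var_f, var_theta):
    """F1 + F2 + F3 with diagonal U, V, Q (an assumption of this sketch)."""
    # Estimated flow on each observed link: f_l = sum_j p_lj * d_j
    f_est = [sum(p * dj for p, dj in zip(row, d)) for row in P]
    f1 = sum((dt - dj) ** 2 / v for dt, dj, v in zip(d_target, d, var_d))
    f2 = sum((fo - fe) ** 2 / v for fo, fe, v in zip(f_obs, f_est, var_f))
    f3 = (theta_target - theta) ** 2 / var_theta
    return f1 + f2 + f3


# Two OD pairs and two observed links, illustrative numbers only.
value = gls_objective(
    d=[100.0, 200.0], d_target=[110.0, 190.0],
    f_obs=[95.0, 205.0], P=[[0.5, 0.2], [0.5, 0.8]],
    theta=1.4, theta_target=1.5,
    var_d=[100.0, 100.0], var_f=[25.0, 25.0], var_theta=0.01,
)
print(value)
```

In this example each of the two OD terms and each of the two link flow terms contributes 1, and the dispersion term contributes 1, so the objective is approximately 5.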

The matrix for link choice proportions P can be generally assumed fixed during the optimization procedure [10, 13, 14, 19]. This procedure performs well for uncongested traffic conditions or an idealized traffic network with fixed link costs. However, when the network becomes congested, users’ choices are increasingly influenced by adverse traffic conditions. In this case, link flow and cost are not independent, and the assignment matrix P should be assumed to vary within each optimization step for link flow and OD estimation. Similarly, the GLS estimators of d and θ can also be obtained by solving Eq 4.

The Link Choice Proportion Calculation Using the Dispersion Parameter

As mentioned in the notation and definitions subsection, the link flow and cost will be updated when a new set of values of d and θ is received. Drivers’ link choice decisions are influenced by the network-wide traffic conditions, and thus the link choice proportion matrix P should be allowed to vary as well. The method of successive averages (MSA) is adopted to calculate equilibrium link flows in the traffic assignment procedure [7, 16, 45, 52].

The cost of path r connecting the OD pair j can be expressed as: (5)

The probability Prj can be then computed according to the path choice logit model [45]: (6)

For a driver traveling along the path r, the weight assigned to link k is equal to exp(-ckθ). It is worth noting that the sum of probabilities over all feasible paths for each OD pair is equal to one.
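The path cost and logit path choice steps can be sketched as follows. This is a minimal illustration that assumes the path set is enumerated explicitly, that the path cost is the sum of the costs of its links, and that the path probability is proportional to exp(−θ c_rj). The function names and the three-link example are hypothetical.

```python
import math


def path_costs(link_costs, paths):
    """Path cost as the sum of the costs of the links on the path."""
    return [sum(link_costs[k] for k in path) for path in paths]


def logit_path_probabilities(costs, theta):
    """Logit choice: probability proportional to exp(-theta * path cost)."""
    c_min = min(costs)  # shift costs for numerical stability; ratios unchanged
    weights = [math.exp(-theta * (c - c_min)) for c in costs]
    total = sum(weights)
    return [w / total for w in weights]


# One OD pair with two paths: links 1 and 2 in series, or link 3 alone.
costs = path_costs({1: 2.0, 2: 3.0, 3: 4.0}, [[1, 2], [3]])
probs = logit_path_probabilities(costs, theta=1.5)
print(costs, probs)
```

The probabilities sum to one and favor the cheaper path. As θ approaches zero the two probabilities approach 0.5 each, and as θ grows the choice becomes nearly deterministic.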

As previously noted, W' = [w'mn] is the initial weight matrix of all possible paths connecting each OD pair. With the initial weight set to w'mn = exp(-ckθ), the matrices W', W'², and W'³ represent the weights for the groups of paths with one link, two links, and three links, respectively. Therefore, the weight matrix for all possible paths can be formulated as: W = W' + W'² + W'³ + … (7)

Wong [53] and Lo and Chan [16] have proven that the right side of Eq 7 is convergent for any acyclic network, and is equal to W = (E − W')^(−1) − E. Therefore, the probability of a trip from node m to node n (OD pair j) choosing link k can be calculated as follows: (8) Where link k connects node g and node v, and wmn is the element of the weight matrix for all possible paths connecting nodes m and n, m, n ∈ M. wmn is set to 1 for all nodes in the network.
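The weight matrix and link choice proportion can be illustrated on a small acyclic network. The sketch below sums the powers of W' directly instead of using the closed form (E − W')^(−1) − E; the two agree for an acyclic network because W' raised to the number of nodes is zero. The proportion is computed as w_mg · exp(−θ c_k) · w_vn / w_mn, and the sketch assumes the convention that the weight from a node to itself is 1. All function names and the four-node example are hypothetical.

```python
import math


def matmul(A, B):
    """Multiply two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def path_weight_matrix(W1):
    """Return W' + W'^2 + ... + W'^n for an n-node network."""
    n = len(W1)
    total = [row[:] for row in W1]
    power = [row[:] for row in W1]
    for _ in range(n - 1):  # adds powers 2..n; higher powers vanish if acyclic
        power = matmul(power, W1)
        total = [[t + p for t, p in zip(t_row, p_row)]
                 for t_row, p_row in zip(total, power)]
    return total


def link_choice_proportion(W, W1, m, n, g, v):
    """Share of trips from m to n using link (g, v); needs a path from m to n."""
    def w(a, b):
        return 1.0 if a == b else W[a][b]  # assumed convention: w_aa = 1
    return w(m, g) * W1[g][v] * w(v, n) / W[m][n]


theta = 1.0
link_costs = {(0, 1): 1.0, (0, 2): 2.0, (1, 3): 1.0, (2, 3): 1.0}
W1 = [[0.0] * 4 for _ in range(4)]
for (g, v), c in link_costs.items():
    W1[g][v] = math.exp(-theta * c)  # initial weight exp(-theta * c_k)
W = path_weight_matrix(W1)
p_01 = link_choice_proportion(W, W1, 0, 3, 0, 1)
p_02 = link_choice_proportion(W, W1, 0, 3, 0, 2)
print(p_01, p_02)
```

For the OD pair from node 0 to node 3, the two proportions sum to one and match the logit probabilities of the two paths (costs 2 and 3), which is the property that lets the matrix formulation avoid explicit path enumeration.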

Following the previous definition, the auxiliary mean traffic flow of link k is defined for each incoming d and θ via the following equation: (9)

The equilibrium traffic link flows can be then obtained using the MSA method. Specifically, the flow of link k can be calculated at the (s+1)th iteration with the following equation: (10)

As shown in Eq 10, the flow of link k at the (s+1)th iteration is equal to the mean of the auxiliary traffic flow of link k in the previous s iterations.
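The averaging step can be written recursively. The sketch below assumes the usual MSA update x^(s+1) = x^(s) + (y^(s) − x^(s)) / s, which makes x^(s+1) the mean of the auxiliary flows y^(1), …, y^(s), consistent with the description above. The function name and the sample auxiliary flows are hypothetical.

```python
def msa_update(x_s, y_s, s):
    """One MSA step: x^(s+1) = x^(s) + (y^(s) - x^(s)) / s."""
    return [x + (y - x) / s for x, y in zip(x_s, y_s)]


# One link with auxiliary flows 10, 20, 30 over three iterations.
x = [0.0]
history = []
for s, y in enumerate([[10.0], [20.0], [30.0]], start=1):
    x = msa_update(x, y, s)
    history.append(x[0])
print(history)
```

After each step the stored flow equals the running mean of the auxiliary flows seen so far, so the influence of any single assignment shrinks as s grows.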

When a new set of values of d and θ is received, the matrix P of link choice proportions is updated following the procedure described above, and is then integrated into the Eq 4 to update the values of d and θ. This optimization procedure continues until convergence of the OD matrix and dispersion parameter estimation is reached.

Model Solution Algorithm

To solve the Stochastic User Equilibrium (SUE) problem described above, a two-stage algorithm for GLS estimation and SUE traffic assignment is proposed: First, the OD matrix d and the dispersion parameter θ are simultaneously estimated under the condition of the fixed link flows, link costs, and weight matrix. Second, the link flows, link costs, and link choice proportions are updated according to the new values of d and θ in the SUE assignment process. The two-stage algorithm is executed iteratively until the convergence of values of d and θ is reached. Sequential quadratic programming (SQP) from the extended quasi-Newton method is chosen as the solution method [49].

Two-Stage Algorithm

The initialization procedure of the two-stage algorithm can be described as follows:

  1. Initialize the counter t = 0, set the initial OD vector matrix , the initial dispersion parameter , and the initial link flow , k ∈ K.
  2. Calculate the initial link costs for all links in the network using Eq 1, and calculate the weight matrix W for all paths based on the initial link costs and θ(0).
  3. Calculate the link choice proportion matrix P using the weight matrix W and θ(0).
  4. Calculate the initial mean auxiliary traffic flow for all the observed links with Eq 9, and update t = t + 1.

The first stage of the algorithm is described as follows:

  1. The objective function (Eq 4) can be updated with the new mean auxiliary observed link flows as follows: (11) Where:
    U−1, P(t), V−1, and Q−1 can be updated using the new mean auxiliary observed link flows, estimated OD vector matrix, dispersion parameter, and link flow vector respectively;
    The feasible set for d and θ should meet the requirements d ≥ 0, θ > 0. When the value of θ approaches zero, the path choice probabilities for all paths tend to be equal. As the value of θ increases, the path choice probabilities tend to be deterministic.
  2. Use the SQP algorithm to obtain a new set of values of d(t) and θ(t) that minimizes the objective function. The starting point for optimizing the OD vector and dispersion parameter should be fixed in advance. During the iterative process of the SQP algorithm, whenever a new value θ is received, the link choice proportion matrix P will be updated by changing the value of exp(−θck) in Eq 8, while the link cost and weight matrix should remain unchanged.

The second stage of the algorithm can be described as follows:

  3. Initialize the counter s = 1.
  4. Calculate the weight matrix W with the new dispersion parameter θ(t).
  5. Calculate the link choice proportion matrix P(t) using the weight matrix W and dispersion parameter θ(t).
  6. Calculate the mean auxiliary traffic flow for all observed links as follows:
  7. Calculate the equilibrium traffic link flow of link k via the MSA method:
  8. The maximum relative difference between current and previous mean link flows should satisfy the following requirement: (12)
    If the above requirement is met, the algorithm proceeds directly to step 11, otherwise proceed to step 9.
  9. Calculate the new link costs according to , l ∈ L.
  10. Calculate the weight matrix using the updated link costs, set s = s + 1, and return to step 5.
  11. The maximum relative difference between the current and previous OD matrix estimates should satisfy the following requirement: (13)
    If the above requirement is met, terminate the procedure and output the current estimates of OD vector matrix d and dispersion parameter θ as d(t) and θ(t). Otherwise, set t = t + 1, and proceed to step 12.
  12. Calculate the new starting points as follows: , , and return to step 1.
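The two stopping tests in steps 8 and 11 both compare a maximum relative difference against a tolerance. A minimal sketch of such a test is given below. It assumes the difference is taken element by element relative to the previous iterate and that entries with a zero previous value are skipped; the exact form of Eqs 12 and 13 may differ. The function names and sample vectors are hypothetical.

```python
def max_relative_change(current, previous):
    """Largest |current - previous| / |previous| over nonzero previous entries."""
    return max(
        (abs(c - p) / abs(p) for c, p in zip(current, previous) if p != 0),
        default=0.0,
    )


def has_converged(current, previous, eps=1e-3):
    """True when the maximum relative change does not exceed the tolerance."""
    return max_relative_change(current, previous) <= eps


print(has_converged([100.05, 200.0], [100.0, 200.0]))  # change of 0.05%
print(has_converged([101.0, 200.0], [100.0, 200.0]))   # change of 1%
```

The same helper can serve the inner loop (link flows, step 8) and the outer loop (OD estimates, step 11), each with its own tolerance.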

Model Evaluation

To evaluate the performance of the proposed method, the root mean squared errors (RMSE) for OD matrix and link flows after convergence are defined as follows:

  1. The root mean squared error (RMSE) of the estimated link flows relative to the true link flow xl is computed as follows: (14)
    Similarly, the RMSE of the observed (target) link flows relative to the true link flows xl can be defined as RMSE , where is replaced by in Eq 14.
  2. The RMSE of the estimated OD matrix d(t) relative to the true OD matrix d can be defined as RMSE (OD): (15)
    Likewise, the RMSE of target OD matrix relative to the true OD matrix d is defined as RMSE (), where d(t) is replaced by in Eq 15.
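Both evaluation measures follow the same pattern and can be computed with one helper. The sketch below assumes the common definition of RMSE, the square root of the mean squared difference over all elements; the normalization used in Eqs 14 and 15 may differ. The function name and sample vectors are hypothetical.

```python
import math


def rmse(estimated, true):
    """Root mean squared error between two equal-length sequences."""
    if len(estimated) != len(true) or len(true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((e - t) ** 2 for e, t in zip(estimated, true))
    return math.sqrt(squared / len(true))


# Illustrative values only: estimated versus true link flows on two links.
print(rmse([103.0, 204.0], [100.0, 200.0]))
```

The same function applies to RMSE (OD) by passing the estimated and true OD vectors, and to the target-based variants by passing the target vectors in place of the estimates.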

Numerical Experiment and Result Analysis

A Hypothetical Network Test

In this section, the performance of the proposed approach is tested in a hypothetical network. The network and data proposed by Yang et al. [15] and Caggiani et al. [19] are adopted as the test bed with some slight modifications. The network, presented in Fig 1, is composed of 9 nodes (3 origin centroids and 3 destination centroids) and 14 links. The true and initial OD vector matrices d and for the SQP algorithm are shown in Table 1. The initial dispersion parameter is assumed to be 4, and the true dispersion parameter is fixed to 1.5. Note that the initial OD matrix and dispersion parameter are quite dissimilar from those of the ground truth data.

The following parameters in the Bureau of Public Roads (BPR) [50] link performance function are used: αk = 0.15 and βk = 4, ∀k ∈ K. In addition, the free flow travel time (tk) and capacity (Ck) for each link are predetermined as shown in Table 2.

The ground truth link flows can be generated by assigning the true OD matrix to the traffic network using the SUE-Logit assignment method presented in Section 2.3. The true dispersion parameter is θ = 1.5, resulting in the link flows shown in Table 3. The set of links {5, 6, 7, 11, 13} is selected as the observed links.

In this example, we assume that the OD vector and link flow vector follow Poisson distributions. The covariance matrices U (for OD demands) and V (for link flows) in Eq 4 can be assumed to be diagonal matrices [9, 14, 54]. The diagonal elements of U, V, and Q can be computed respectively through the following equations: Where cvd, cvx and cvθ represent the variation coefficients for OD demands, link flows, and dispersion parameter respectively. Specifically, these parameters are set as cvd = 0.3, cvx = 0.05, and cvθ = 0.1.
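Since a variation coefficient is the ratio of a standard deviation to a mean, a diagonal variance entry can be obtained as the square of the coefficient times the corresponding mean value. The sketch below illustrates that relationship only; which quantity serves as the mean (target or current estimate) is an assumption of this example, and the function name and sample values are hypothetical.

```python
def diagonal_variances(values, cv):
    """Variance entries (cv * value) ** 2 for a diagonal covariance matrix."""
    return [(cv * v) ** 2 for v in values]


# Illustrative values: two OD demands with cv_d = 0.3,
# two link flows with cv_x = 0.05, and theta = 1.5 with cv_theta = 0.1.
u_diag = diagonal_variances([100.0, 50.0], 0.3)
v_diag = diagonal_variances([400.0, 200.0], 0.05)
q_diag = diagonal_variances([1.5], 0.1)
print(u_diag, v_diag, q_diag)
```

A larger variation coefficient inflates the variance and therefore lowers the weight of the corresponding term in the GLS objective, which is why cvd and cvθ act as tuning parameters in the sensitivity analysis.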

The target OD matrix , observed link flow vectors , and target dispersion parameter can be generated separately by adding random terms to the corresponding true values. The random terms are sampled from independent normal variables with zero means. For instance, the target OD matrix can be calculated by adding a random term with λd = 0.3 to the values of the true OD matrix divided by two, the observed link flow vectors can be generated by adding a random term with λf = 0.1, and the target parameter can be set as . In addition, the error tolerance threshold used in the optimization is set to ε1 = ε2 = 10^−3. The convergence of theta is plotted in Fig 2, which shows the estimate falling slowly over the first 120 iterations before rapidly converging to 1.5099, a very slight deviation from the true value of 1.5. In addition, the convergence of the objective function is presented in Fig 3, where the value of the objective function falls sharply at the first iteration and then gradually decreases and levels off at a lower value. Poor initial choices of the OD input vector and dispersion parameter may lead to slower convergence.

In order to further evaluate the effectiveness of the proposed approach, a sensitivity analysis is conducted with parameter cvθ (CVT) varying from 0.1 to 0.5 and cvd (CVD) varying from 0.1 to 1. This generates 50 different estimates for RMSE (OD), RMSE (LF) and Theta as presented in Figs 4-6.

As shown in Fig 4, RMSE (OD) increases with the variation coefficient cvθ when cvd falls between 0.2 and 0.8. With cvθ fixed between 0.3 and 0.5, RMSE(OD) can be seen as a convex function of cvd. Alternatively, when cvθ is between 0.1 and 0.2, cvd has a negligible impact on RMSE (OD). Thus, the maximum value (19.9022) of RMSE (OD) can be found at cvd = 0.5 and cvθ = 0.5, and the minimum value (4.6859) is obtained at cvd = 0.3 and cvθ = 0.1. Compared with the initial RMSE (OD) of 93.8971 calculated from Table 1, a 78.8% reduction is achieved at the maximum RMSE (OD), and a 95% reduction is obtained at the minimum RMSE (OD).

Fig 5 shows the impact of cvd and cvθ on RMSE (LF). With the value of cvd fixed, RMSE (LF) increases with the variation coefficient cvθ. For a fixed value of cvθ, the RMSE (LF) decreases with an increase in cvd. Thus, the maximum RMSE (LF) value of 19.6597 is located at cvd = 0.1 and cvθ = 0.5, and the minimum value of 8.6696 can be found at cvd = 0.6 and cvθ = 0.1. Compared with the initial RMSE (LF) value of 30.8347, 36.2% and 71.9% reductions can be achieved for the maximum value of RMSE (LF) and minimum value of RMSE (LF) respectively.
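The reported reductions follow directly from the RMSE values quoted above, as the short check below shows. The helper name `percent_reduction` is hypothetical; the numbers are those stated in the text for RMSE (OD) and RMSE (LF).

```python
def percent_reduction(initial, final):
    """Relative reduction from initial to final, in percent."""
    return 100.0 * (initial - final) / initial


od_at_max = percent_reduction(93.8971, 19.9022)  # RMSE (OD), worst case
od_at_min = percent_reduction(93.8971, 4.6859)   # RMSE (OD), best case
lf_at_max = percent_reduction(30.8347, 19.6597)  # RMSE (LF), worst case
lf_at_min = percent_reduction(30.8347, 8.6696)   # RMSE (LF), best case
print(round(od_at_max, 1), round(od_at_min, 1),
      round(lf_at_max, 1), round(lf_at_min, 1))
```

Rounded to one decimal place, the four values agree with the 78.8%, 95%, 36.2%, and 71.9% reductions reported for the sensitivity analysis.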

As shown in Fig 6, the value of theta varies negligibly with the choice of cvd and cvθ. In other words, the estimated value of theta always converges to approximately the true value. As shown in Fig 6, for a fixed value of cvd, the estimated θ is close to the true value for any given cvθ. For example, the value of θ fluctuates between 1.37 and 1.51 when cvd = 0.3. Likewise, for any fixed cvθ, the estimated θ varies minimally about the true value of θ using the proposed method. For example, the estimated θ is between 1.35 and 1.52 for cvθ = 0.1.

The above discussion reveals that the initial values of , , and the observed link flow vectors do not affect the theta estimation performance. This is equivalent to a convex optimization problem, where the optimal results tend to converge near the true dispersion parameter value. This implies that the estimate of θ is insensitive to the variation coefficients, and can be used as a stable and accurate parameter to determine travelers’ route decisions.

Comparison and Analysis

To further demonstrate the advantages of the proposed methodology, two OD matrix estimation methods are implemented and compared with the proposed approach. To make this comparison, we first implement the algorithm described in Yang et al. [15], which presents an optimization model for OD matrix estimation in congested networks using the logit-based SUE. The method described in Lo and Chan [16] is implemented for the second comparison. This method applies both statistical estimation and traffic assignment to simultaneously calculate the OD matrix and link choice proportions based on OD survey data and traffic counts. To maintain a fair comparison, the same test network and data set are applied in all cases.

The OD matrix estimation method proposed by Yang et al. [15] is given in section 2.2. The objective function is shown in Eq 16.

(16)

In Yang et al.’s work, the weighted Euclidean distance function is used to develop a unit weighting matrix and the value of theta is set to 1.5. The RMSE(OD), RMSE(LF), RMSE and RMSE( for Yang et al.’s approach are calculated and compared with the proposed approach in Table 4:

Table 4. Comparison between Yang et al.’s approach and the proposed approach with OD matrix and link flow estimation.

https://doi.org/10.1371/journal.pone.0146850.t004

As shown in Table 4, the proposed method yields significantly lower RMSE (OD) and RMSE (LF) relative to Yang et al.’s approach. Compared with the initial RMSE values, a 22.6% reduction in RMSE (OD) is achieved using the proposed approach, while only a 14.1% reduction is achieved using the method described in Yang et al. Similarly, the proposed approach resulted in a 34.7% reduction in RMSE (LF), while only a 28.6% reduction was achieved using Yang et al.’s approach. One reason is that the dispersion parameter is estimated and integrated into Eq 3 in the proposed method, which yields a better estimate of the dispersion parameter than previous approaches. The other reason is that the covariance matrices U (for OD demands), V (for link flows) and Q (for dispersion parameter) are not fixed during the calculation. These improvements help the method enhance the estimation performance for the OD matrix and link flow vectors.

Lo and Chan [16] present the following maximum likelihood objective function: (17)

In Lo and Chan [16], it is assumed that the observed flows are equal to the true flows in the test network. For Lo and Chan’s algorithm, we set the target dispersion parameter to the initial dispersion parameter value used by Lo and Chan [16], and the variation coefficients as follows: cvθ = 0.1, cvx = 0.05, and cvd = 0.3. To evaluate the performance of the proposed approach relative to that of Lo and Chan’s method, RMSE (OD), RMSE (LF), and the estimated Theta are selected for comparison and shown in Table 5.

Table 5. Comparison between Lo and Chan’s approach and the proposed approach with OD matrix, link flow and Theta estimation.

https://doi.org/10.1371/journal.pone.0146850.t005

Unlike Lo and Chan’s method, random terms are added to the observed link flows in the proposed approach, thus introducing additional challenges for estimation. However, the results presented in Table 5 demonstrate that the method proposed in this paper outperforms Lo and Chan’s approach in terms of OD matrix, link flow, and Theta estimation accuracy.

Application to a Square Network in Seattle

A square network in Seattle is used as a congested network case study to demonstrate the applicability and transferability of the proposed approach in a real-world traffic network (shown in Fig 7). Empirical data were collected from loop detectors located along one freeway section in the Seattle area, and were obtained for this research through the Strategic Highway Research Program 2 (SHRP 2) supported by the Washington State Department of Transportation (WSDOT) [55].

Fig 7. Square network in Seattle.

Double circle nodes represent zone centroids (origins and destinations).

https://doi.org/10.1371/journal.pone.0146850.g007

The square test network used in this case study consists of 4 nodes and 8 links, where all nodes are centroids (origins and destinations). The topology of the test network is outlined in Fig 7. We assume that the study network is acyclic, such that the traffic flow starting from one node will leave the network before returning to the original node. Specifically, Links 1 and 2 represent the SR 520 Bridge connecting I-5 in Seattle and SR 202 in Redmond. Interstate 90 (I-90) is represented by Links 3 and 4, and Interstate 5 (I-5) is represented by Links 5 and 6. Links 7 and 8 represent Interstate 405 (I-405), which intersects I-90 in the south and SR 520 in the north.

Traffic flows were obtained from loop detectors installed at nodes 1, 2, 3 and 4, illustrated in Fig 8. The parameters for the BPR link performance cost function (Eq 18) were estimated based on the empirical data and are presented in Table 6.

(18)
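
For orientation, the BPR link performance function [50] is commonly written as t = t0 · (1 + α · (x / c)^β), where t0 is the free-flow travel time, x the link flow, and c the link capacity. The Python sketch below implements that common form. Whether Eq 18 uses exactly this parameterization is an assumption, and the defaults α = 0.15 and β = 4 are the conventional textbook values, not the calibrated values in Table 6.

```python
def bpr_travel_time(flow, free_flow_time, capacity, alpha=0.15, beta=4.0):
    """Link travel time under the common BPR form.

    alpha and beta default to conventional values; the calibrated
    per-link parameters from Table 6 are not reproduced here.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return free_flow_time * (1.0 + alpha * (flow / capacity) ** beta)


# Hypothetical link: 10 minutes free-flow time, capacity 2000 veh/h.
empty_link_time = bpr_travel_time(0.0, 10.0, 2000.0)
at_capacity_time = bpr_travel_time(2000.0, 10.0, 2000.0)
print(empty_link_time, at_capacity_time)
```

With these defaults, travel time at capacity is 15% above the free-flow time, and it grows steeply once flow exceeds capacity.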

Table 7 indicates the external traffic flow recorded for each node during the peak hour, where 1-Link 1 represents the external traffic flow on Link 1 from node 1, 2-Link 7 represents the external traffic flow on Link 7 from node 2, and so forth. To convert true link flows into a ground truth OD matrix, a flow proportion of η = 0.6 is assumed for each node based on extensive video records and field surveys. This implies that, for the traffic leaving each node, 60% exits the network at the adjacent node while 40% exits at the other nodes. To avoid circular flow in the OD calculation process, it is assumed that the final remaining traffic flow leaves from the last node before returning to the original node. Based on these assumptions, the ground truth OD matrix is calculated and shown in Table 8. In addition, the initial OD matrix is computed by rounding the last digit of the true OD matrix, as shown in Table 8.
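
One way to read the η = 0.6 rule is sketched below in Python. It assumes that the proportion is applied recursively: 60% of the flow exits at the first downstream node, 60% of the remainder exits at the next node, and whatever is left exits at the last node before the origin. The recursive split of the remaining 40% is an interpretation for illustration, not a procedure stated in the text, and the function name and the 1000-vehicle flow are hypothetical.

```python
def exit_shares(num_downstream_nodes, eta=0.6):
    """Share of an entering flow that exits at each downstream node.

    Assumption: eta is applied to the remaining flow at every node
    except the last one, where all remaining flow leaves the network.
    """
    shares = []
    remaining = 1.0
    for position in range(num_downstream_nodes):
        is_last = position == num_downstream_nodes - 1
        share = remaining if is_last else remaining * eta
        shares.append(share)
        remaining -= share
    return shares


# In a 4-node ring, flow entering at one node can exit at 3 other nodes.
shares = exit_shares(3)
# Hypothetical external flow of 1000 vehicles entering at one node.
od_demands = [1000.0 * s for s in shares]
print(shares, od_demands)
```

Under this reading, the shares along one direction of travel are roughly 0.60, 0.24 and 0.16, and they always sum to one, so no flow returns to the origin node.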

Table 8. True OD matrix and initial OD matrix at peak hour for each OD pair.

https://doi.org/10.1371/journal.pone.0146850.t008

The true OD matrix in Table 8 is then used to assign the corresponding traffic flow to each link according to Eq 5 through Eq 10. The calculated traffic flows are assumed to represent the true link flows, and Links 1, 3, 5, 6, and 8 are selected as the observed links used to estimate the OD matrix, as shown in Table 9.

Similar to the hypothetical network, we assume that the OD demands and observed link flows follow the Poisson distribution, and that the covariance matrices U and V are diagonal. The initial value of the dispersion parameter is set to 40.5. The remaining input parameters are set identically to those of the hypothetical network. In addition, a sensitivity analysis with 50 different combinations of the variation coefficients cvd and cvθ was conducted to investigate the optimal parameter initialization for the proposed approach. The results of this sensitivity analysis are shown in Figs 9–11.
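
A grid of 50 combinations is consistent with ten values of cvd and five values of cvθ. The Python sketch below enumerates such a grid, assuming cvd runs from 0.1 to 1.0 and cvθ from 0.1 to 0.5 in steps of 0.1. These ranges are inferred from the values reported in the sensitivity results and the step size is an assumption. The `toy_error` function is a hypothetical stand-in for a full estimation run and has no connection to the paper's results.

```python
from itertools import product

# Assumed grid: 10 values of cv_d times 5 values of cv_theta = 50 runs.
cv_d_values = [round(0.1 * k, 1) for k in range(1, 11)]
cv_theta_values = [round(0.1 * k, 1) for k in range(1, 6)]
grid = list(product(cv_d_values, cv_theta_values))


def best_setting(settings, evaluate):
    """Return the (cv_d, cv_theta) pair with the smallest error."""
    return min(settings, key=evaluate)


def toy_error(setting):
    """Hypothetical error surface used only to exercise the search."""
    cv_d, cv_theta = setting
    return (cv_d - 0.1) ** 2 + (cv_theta - 0.1) ** 2


best = best_setting(grid, toy_error)
print(len(grid), best)
```

In practice `evaluate` would run the estimation for one setting and return RMSE (OD) or RMSE (LF), so the same loop yields the surfaces plotted against cvd and cvθ.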

Fig 9 shows the value of RMSE (OD) versus cvd and cvθ. For a fixed value of cvd between 0.1 and 0.7, RMSE (OD) increases with cvθ. RMSE (OD) is a concave function of cvθ when cvd is fixed between 0.8 and 1.0, and a convex function of cvd for a fixed value of cvθ between 0.1 and 0.5. The maximum RMSE (OD) of 85.6123 is obtained at cvd = 0.7 and cvθ = 0.5, and the minimum value of 23.5917 is obtained at cvd = 0.1 and cvθ = 0.1.

As shown in Fig 10, the value of RMSE (LF) increases with cvθ for a fixed value of cvd. For a fixed value of cvθ, the value of RMSE (LF) decreases as cvd increases. The maximum RMSE (LF) of 82.5113 is found at cvd = 0.1 and cvθ = 0.5, and the minimum value of 7.3277 at cvd = 1.0 and cvθ = 0.1.

As noted in the hypothetical case, the choice of cvd and cvθ has very little impact on the estimation of Theta. As shown in Fig 11, the estimated dispersion parameter θ lies between 20.8327 (cvd = 0.5 and cvθ = 0.5) and 22.7165 (cvd = 0.7 and cvθ = 0.2) in all cases, so the best estimate of θ can be found within this range.

Finally, using the BPR link performance cost function parameters described in Table 6, three combinations of variation coefficients (cvd = 0.3 and cvθ = 0.1; cvd = 0.5 and cvθ = 0.5; cvd = 0.7 and cvθ = 0.2) are used to estimate Theta for the actual network.

It is interesting to observe that the estimated RMSE (OD), RMSE (LF), and Theta for the hypothetical and actual networks exhibit a similar trend yet differ noticeably in magnitude. Two primary reasons may explain these differences. First, the network topology is quite different in the two scenarios. The hypothetical network is unidirectional, where each node can be either an origin or a destination. In contrast, the actual network is bidirectional, where each node is both an origin and a destination, and thus multiple paths may exist between each OD pair. For example, the traffic flows on both 1-Link 1 and 1-Link 6 contribute to the OD demands from node 1 to node 2. Second, whereas the hypothetical network uses equal cost parameters for all links, a more realistic BPR link performance cost function is adopted for the actual network. In the real-world network, the parameters (e.g. free-flow travel time and link capacity) are calibrated for each link based on empirical data. That said, the sensitivity analysis for Theta produced similar results for both the hypothetical and actual networks, indicating that this parameter is not sensitive to the choice of variation coefficients. In addition, the Theta estimates obtained using a range of different parameter settings exhibit a similar and regular trend over time of day, as shown in Fig 12. These findings provide guidance for initial parameter selection and offer useful insight for interpreting modeling results.

Conclusions

This paper proposes a two-stage algorithm to simultaneously estimate origin-destination matrices and link choice proportions by incorporating a dynamic dispersion parameter into the route choice model. The dispersion parameter θ is of practical significance in describing travelers’ route choice decisions, but has typically been assumed constant in previous studies. Finding the optimal dispersion parameter is not a straightforward task. To address this issue, this paper presents a model calibration procedure to simultaneously estimate the dispersion parameter θ, link choice proportions, and OD matrix. To obtain the Generalized Least Squares (GLS) estimators of these parameters, a two-stage algorithm is proposed which integrates GLS estimation into the SUE traffic assignment procedure. The first and second stages of the algorithm are applied iteratively until the convergence criterion on the maximum relative difference presented in Step 11 is satisfied, after which the estimated OD matrix, link choice proportions, and dispersion parameter θ are obtained. The SQP approach based on the extended quasi-Newton method is used to search for the optimal solution in the first stage of the algorithm. The SUE traffic assignment procedure is applied to incorporate both OD matrix and link choice proportion estimation into the second stage of the algorithm, and MSA is used to obtain the equilibrium link flows.
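
To illustrate the role of MSA and the dispersion parameter in a logit-based SUE assignment, the Python sketch below solves a toy problem with one OD pair and two parallel routes. It is not the authors' two-stage algorithm: there is no GLS estimation or SQP step, the routes and demand are hypothetical, and the cost function is the common BPR form with conventional default parameters.

```python
import math


def bpr(flow, free_flow_time, capacity, alpha=0.15, beta=4.0):
    """Common BPR travel time form with conventional default parameters."""
    return free_flow_time * (1.0 + alpha * (flow / capacity) ** beta)


def logit_split(costs, theta):
    """Logit route choice probabilities for dispersion parameter theta."""
    cheapest = min(costs)  # shift costs for numerical stability
    weights = [math.exp(-theta * (c - cheapest)) for c in costs]
    total = sum(weights)
    return [w / total for w in weights]


def msa_logit_sue(demand, routes, theta, iterations=200):
    """Method of successive averages on parallel routes.

    routes is a list of (free_flow_time, capacity) pairs, one per route.
    """
    flows = [demand / len(routes)] * len(routes)
    for n in range(1, iterations + 1):
        costs = [bpr(f, t0, cap) for f, (t0, cap) in zip(flows, routes)]
        probabilities = logit_split(costs, theta)
        auxiliary = [demand * p for p in probabilities]
        # Move toward the auxiliary flows with step size 1/n.
        flows = [f + (y - f) / n for f, y in zip(flows, auxiliary)]
    return flows


# Hypothetical example: 1500 vehicles, two routes with equal capacity.
routes = [(10.0, 1000.0), (12.0, 1000.0)]
flows = msa_logit_sue(1500.0, routes, theta=0.5)
print(flows)
```

A larger θ concentrates flow on the cheaper route, while θ close to zero spreads flow almost evenly, which is why the value of θ matters for the link choice proportions used in OD estimation.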

A hypothetical network was constructed to test the performance of the proposed approach, followed by a comprehensive sensitivity analysis with 50 combinations of the variation coefficients cvd (CVD) and cvθ (CVT) to investigate the stability of the estimated OD matrix, link flows, and Theta. A comparison with the two methods described in Yang et al. [15] and Lo and Chan [16] suggests that the proposed approach achieves superior performance in terms of RMSE (OD), RMSE (LF), and accuracy of the estimated Theta parameter. Moreover, a case study is presented using a real-world congested square network in Seattle, WA to demonstrate the practicality of the proposed approach, in which the true OD matrix and observed link flows are calculated from ground-truth traffic count data collected by loop detectors. The proposed method is shown to be robust under a range of initial parameter values. The RMSE (OD) can be reduced from 3426.9 to 23.6 at cvd = 0.1 and cvθ = 0.1 when traffic flows are observed on five out of eight links. In addition, the estimated dispersion parameter exhibits a consistent and regular trend by time of day for all combinations of initial parameters. For future research, the proposed approach should be tested on a network of greater complexity and size, and the impact of input data inaccuracy should be considered. Additionally, further work is needed to determine the number and location of observed links required for accurate OD estimation using the proposed approach.

Supporting Information

S1 Dataset. The dataset includes the Link Speed Data and Link Volume Data. The data were collected from loop detectors located along the freeway sections (I-5, I-90, I-405 and SR 520) in the Seattle area, and were retrieved via the Strategic Highway Research Program 2 (SHRP 2).

The file named “S1 Link Speed Data” records the average speed for all links at 20-second intervals, and the file named “S1 Link Volume data” records the volume for all links at 20-second intervals.

https://doi.org/10.1371/journal.pone.0146850.s001

(RAR)

Acknowledgments

The helpful comments from the two anonymous reviewers are gratefully acknowledged. This research is supported by the National Natural Science Foundation of China (Project Nos. 71402011, 71471024, 51408019, 71301180, 51329801), the National Social Science Foundation of Chongqing of China (No. 2013YBJJ035), the Scientific and Technological Research Program of Chongqing Municipal Education Commission (No. KJ1400307), the Natural Science Foundation of Chongqing of China (No. cstc2015jcyjA30012), the National Key Technologies R&D Program of China (2014BAG01B03), and the Science and Technology Project on Transportation Construction by the Ministry of Transport of China (2015318835200).

Author Contributions

Conceived and designed the experiments: Yong W. XM Yinhai W. Performed the experiments: Yong W. XM YL KG. Analyzed the data: Yong W. XM YL KG MX. Contributed reagents/materials/analysis tools: KCH Yinhai W. Wrote the paper: Yong W.

References

  1. Simini F, González MC, Maritan A, Barabási A-L. A universal model for mobility and migration patterns. Nature, 2012; 484: 96–100. pmid:22367540
  2. Simini F, Maritan A, Néda Z. Human mobility in a continuum approach. PLoS ONE, 2013; 8(3): e60069. pmid:23555885
  3. Yan X-Y, Zhao C, Fan Y, Di Z, Wang W-X. Universal predictability of mobility patterns in cities. Journal of the Royal Society Interface, 2014; 11: 0834.
  4. Gao ZK, Jin ND. A directed weighted complex network for characterizing chaotic dynamics from time series. Nonlinear Analysis: Real World Applications, 2012; 13(2): 947–952.
  5. Tang JJ, Wang YH, Wang H, Zhang S, Liu F. Dynamic analysis of traffic time series at different temporal scales: A complex networks approach. Physica A: Statistical Mechanics and its Applications, 2014; 405: 303–315.
  6. Gao ZK, Yang YX, Fang PC, Zou Y, Xia CY, Du M. Multiscale complex network for analyzing experimental multivariate time series. Europhysics Letters, 2015; 109(3): 30005.
  7. Cheung WM, Wong SC, Tong CO. Estimation of a time-dependent origin-destination matrix for congested highway networks. Journal of Advanced Transportation, 2010; 40(1): 95–117.
  8. Ma XL, Yu HY, Wang YP, Wang YH. Large-scale Transportation Network Congestion Evolution Prediction Using Deep Learning Theory. PLoS ONE, 2015; 10(3): e0119044. pmid:25780910
  9. Cascetta E. Estimation of trip matrices from traffic counts and survey data: a generalized least squares approach. Transportation Research Part B: Methodological, 1984; 18(4–5): 289–299.
  10. Cascetta E, Nguyen S. A unified framework for estimating or updating origin/destination matrices from traffic counts. Transportation Research Part B: Methodological, 1988; 22(6): 437–455.
  11. Yang H, Sasaki T, Iida Y, Asakura Y. Estimation of origin-destination matrices from link traffic counts on congested networks. Transportation Research Part B: Methodological, 1992; 26(6): 417–434.
  12. Hazelton ML. Some comments on origin-destination matrix estimation. Transportation Research Part A: Policy and Practice, 2003; 37(10): 811–822.
  13. Cantarella GE. A general fixed-point approach to multimode multi-user equilibrium assignment with elastic demand. Transportation Science, 1997; 31(2): 107–128.
  14. Cascetta E, Postorino MN. Fixed point approaches to the estimation of O/D matrices using traffic counts on congested networks. Transportation Science, 2001; 35(2): 134–147.
  15. Yang H, Meng Q, Bell MGH. Simultaneous estimation of the origin-destination matrices and travel-cost coefficient for congested networks in a stochastic user equilibrium. Transportation Science, 2001; 35(2): 107–123.
  16. Lo HP, Chan CP. Simultaneous estimation of an origin-destination matrix and link choice proportions using traffic counts. Transportation Research Part A: Policy and Practice, 2003; 37(9): 771–788.
  17. Manley E. Estimating urban traffic patterns through probabilistic interconnectivity of road network junctions. PLoS ONE, 2015; 10(5): e0127095. pmid:26009884
  18. Ródenas RG, Marín Á. Simultaneous estimation of the origin–destination matrices and the parameters of a nested logit model in a combined network equilibrium model. European Journal of Operational Research, 2009; 197(1): 320–331.
  19. Caggiani L, Ottomanelli M, Sassanelli D. A fixed point approach to origin-destination matrices estimation using uncertain data and fuzzy programming on congested networks. Transportation Research Part C: Emerging Technologies, 2013; 28: 130–141.
  20. Shao H, Lam WHK, Sumalee A, Chen A, Hazelton ML. Estimation of mean and covariance of peak hour origin-destination demands from day-to-day traffic counts. Transportation Research Part B: Methodological, 2014; 68: 52–75.
  21. Van Zuylen HJ, Willumsen LG. The most likely trip matrix estimated from traffic counts. Transportation Research Part B: Methodological, 1980; 14(3): 281–293.
  22. Bell MGH. The estimation of an origin destination matrix from traffic counts. Transportation Science, 1983; 17(2): 198–217.
  23. Spiess H. A maximum likelihood model for estimating origin-destination matrices. Transportation Research Part B: Methodological, 1987; 21(5): 395–412.
  24. Maher MJ. Inferences on trip matrices from observations on link volumes: a Bayesian statistical approach. Transportation Research Part B: Methodological, 1983; 17(6): 435–447.
  25. Bell MGH. The estimation of origin-destination matrices by constrained generalized least squares. Transportation Research Part B: Methodological, 1991; 25(1): 13–22.
  26. Sheu JB. A composite traffic flow modeling approach for incident-responsive network traffic assignment. Physica A: Statistical Mechanics and its Applications, 2006; 367: 461–478.
  27. Beckmann M, McGuire CB, Winsten CB. Studies in the Economics of Transportation. New Haven: Yale University Press; 1956.
  28. Fisk CS, Boyce DE. A note on trip matrix estimation from link traffic count data. Transportation Research Part B: Methodological, 1983; 17(3): 245–250.
  29. Lam WHK, Huang HJ. A combined trip distribution and assignment model for multiple user classes. Transportation Research Part B: Methodological, 1992; 26(4): 275–287.
  30. Fisk CS. On combining maximum entropy trip matrix with user optimal assignment. Transportation Research Part B: Methodological, 1988; 22(1): 69–73.
  31. Fisk CS. Trip matrix estimation from link traffic counts: The congested network case. Transportation Research Part B: Methodological, 1989; 23(5): 331–336.
  32. Han SJ. A route-based solution algorithm for dynamic user equilibrium assignments. Transportation Research Part B: Methodological, 2007; 41(10): 1094–1113.
  33. Lu CC, Mahmassani HS, Zhou XS. A bi-criterion dynamic user equilibrium traffic assignment model and solution algorithm for evaluating dynamic road pricing strategies. Transportation Research Part C: Emerging Technologies, 2008; 16(4): 371–389.
  34. Inoue SI, Maruyama T. Computational Experience on Advanced Algorithms for User Equilibrium Traffic Assignment Problem and Its Convergence Error. Procedia-Social and Behavioral Sciences, 2012; 43: 445–456.
  35. Kumar A, Peeta S. Entropy weighted average method for the determination of a single representative path flow solution for the static user equilibrium traffic assignment problem. Transportation Research Part B: Methodological, 2015; 71: 213–229.
  36. Zhang HL, Mahmassani HS, Lu CC. Dynamic pricing, heterogeneous users and perception error: Probit-based bi-criterion dynamic stochastic user equilibrium assignment. Transportation Research Part C: Emerging Technologies, 2013; 27: 189–204.
  37. Daganzo CF, Sheffi Y. On stochastic models of traffic assignment. Transportation Science, 1977; 11(3): 253–274.
  38. Sheffi Y, Powell WB. An algorithm for the equilibrium assignment problem with random link times. Networks, 1982; 12(2): 191–207.
  39. Liu S, Fricker JD. Estimation of a trip table and the θ parameter in a stochastic network. Transportation Research Part A: Policy and Practice, 1996; 30(4): 287–305.
  40. Meng Q, Lam WH, Yang L. General stochastic user equilibrium traffic assignment problem with link capacity constraints. Journal of Advanced Transportation, 2008; 42(4): 429–465.
  41. Meng Q, Liu Z. Mathematical models and computational algorithms for probit-based asymmetric stochastic user equilibrium problem with elastic demand. Transportmetrica, 2012; 8(4): 261–290.
  42. Lam WHK, Yin YF. An activity-based time-dependent traffic assignment model. Transportation Research Part B: Methodological, 2001; 35(6): 549–574.
  43. Lam WHK, Li ZC, Huang HJ, Wong SC. Modeling time-dependent travel choice problems in road networks with multiple user classes and multiple parking facilities. Transportation Research Part B: Methodological, 2006; 40(5): 368–395.
  44. Londono G, Lozano A. Dissuasive queues in the time dependent traffic assignment problem. Procedia-Social and Behavioral Sciences, 2014; 162: 378–387.
  45. Bell MGH. Alternatives to Dial’s logit assignment algorithm. Transportation Research Part B: Methodological, 1995; 29(4): 287–295.
  46. Conti PL, Giovanni LD, Naldi M. Blind maximum likelihood estimation of traffic matrices under long-range dependent traffic. Computer Networks, 2010; 54(15): 2626–2639.
  47. Guo XL, Yang H, Liu TL. Bounding the inefficiency of logit-based stochastic user equilibrium. European Journal of Operational Research, 2010; 201(2): 463–469.
  48. Akamatsu T. A dynamic traffic equilibrium assignment paradox. Transportation Research Part B: Methodological, 2000; 34(6): 515–531.
  49. Boggs PT, Tolle JW. Sequential quadratic programming for large-scale nonlinear optimization. Journal of Computational and Applied Mathematics, 2000; 124(1–2): 123–137.
  50. Bureau of Public Roads. Traffic assignment manual. U.S. Department of Commerce, Urban Planning Division, Washington, D.C., 1964.
  51. Lu ZB, Rao WM, Wu YJ, Guo L, Xia JX. A Kalman filter approach to dynamic OD flow estimation for urban road networks using multi-sensor data. Journal of Advanced Transportation, 2015; 49(2): 210–227.
  52. Liu HX, He XZ, He BS. Method of successive weighted averages (MSWA) and self-regulated averaging schemes for solving stochastic user equilibrium problem. Networks and Spatial Economics, 2009; 9(4): 485–503.
  53. Wong SC. On the convergence of Bell’s logit assignment formulation. Transportation Research Part B: Methodological, 1999; 33(8): 609–616.
  54. Cascetta E, Russo F. Calibrating aggregate travel demand model with traffic counts: estimators and statistical performance. Transportation, 1997; 24(3): 271–293.
  55. Ma X, Wu Y, Wang Y. DRIVE Net: An E-Science of Transportation Platform for Data Sharing, Visualization, Modeling, and Analysis. Transportation Research Record: Journal of the Transportation Research Board, 2011; 2215: 37–49.