A Two-Stage Algorithm for Origin-Destination Matrices Estimation Considering Dynamic Dispersion Parameter for Route Choice

This paper proposes a two-stage algorithm to simultaneously estimate the origin-destination (OD) matrix, link choice proportions, and the dispersion parameter using partial traffic counts in a congested network. A non-linear optimization model is developed which incorporates a dynamic dispersion parameter, followed by a two-stage algorithm in which Generalized Least Squares (GLS) estimation and a Stochastic User Equilibrium (SUE) assignment model are iteratively applied until convergence is reached. To evaluate the performance of the algorithm, the proposed approach is implemented in a hypothetical network using input data with high error, and tested under a range of variation coefficients. The root mean squared errors (RMSE) of the estimated OD demand and link flows are used to evaluate the model estimation results. The results indicate that the estimated dispersion parameter theta is insensitive to the choice of variation coefficients. The proposed approach is shown to outperform two established OD estimation methods and produce parameter estimates that are close to the ground truth. In addition, the proposed approach is applied to an empirical network in Seattle, WA to validate the robustness and practicality of this methodology. In summary, this study proposes and evaluates an innovative computational approach to accurately estimate OD matrices using link-level traffic flow data, and provides useful insight for optimal parameter selection in modeling travelers' route choice behavior.


Introduction
Urban sprawl and population growth have resulted in increasingly severe traffic congestion in major cities around the world. City planners and decision makers have recognized the need for comprehensive traffic management strategies to meet the challenges of rapidly evolving built environments and population demographics. Effective transportation policies and control measures can improve traffic safety and quality of service, promote economic development, and reduce air pollution. Obtaining the origin-destination (OD) traffic demand matrix in a low-cost and high-accuracy manner is not only a problem in transportation science, but has also drawn attention from scholars in various scientific fields. For example, researchers in statistical physics and complex systems recently proposed a number of novel methods to estimate the OD matrix directly from population data [1,2,3,4,5,6]. Reliable OD matrix estimation can provide critical insight for traffic management, operations, and urban planning efforts to mitigate congestion [7,8]. Thus, a reliable OD matrix estimation method is indispensable for both transportation planners and traffic engineers.
A number of approaches have been developed for estimating OD matrices in the past several decades [9,10,11,12,13,14,15,16,17,18,19,20]. Compared with conventional survey-based methods, data-driven OD estimation methods relying on link-level traffic flow measurements require less effort and offer significantly reduced time and cost for data acquisition and processing. For such methods, observed traffic flows at key points throughout the network should be known as prior information for OD matrix initialization.
Past research on this topic has considered a range of different optimization methods to estimate OD demands, including entropy maximizing estimators [21,22], maximum likelihood estimation [23], Bayesian inference estimation [24], and generalized least squares (GLS) [9,10,25]. Entropy maximizing estimators are used to maximize the spread of trip distributions on all available paths (routes) where the observed traffic flows are used as the only information (i.e. without a target trip matrix). Maximum likelihood estimation aims to maximize the likelihood of the closeness between the target OD matrix and the estimated OD matrix. In the Bayesian inference approach, the target OD matrix is a prior probability function of the estimated OD matrix on the basis of observed traffic count data. The GLS estimator is a robust and efficient linear unbiased estimator, which estimates the OD matrix by minimizing the Weighted Euclidean Distances (WED) between the target data and the solution data.
User equilibrium (UE) assignment models are commonly used to obtain path choice behavior based on the estimated OD demand. Deterministic UE assignment models assume that all users have access to perfect information about the generalized link travel costs, and select the route with the lowest perceived travel cost [26]. Beckman [27] formulated the UE assignment model by assuming that the OD demands are a function of level of service. A combined distribution and assignment model which relies on link-level traffic flow data was presented by Fisk and Boyce [28], and extended by Lam and Huang [29] to address multiclass-user transportation networks. Fisk [30,31] proposed a combined entropy maximizing model with UE constraints. Yang et al. [11] integrated the GLS technique with a UE traffic assignment model for OD matrix estimation, presented in the form of a convex bi-level optimization problem. Summaries of the more recent contributions to UE-based traffic assignment are provided in Han [32], Lu et al. [33], Inoue and Maruyama [34], and Kumar and Peeta [35].
The stochastic user equilibrium (SUE) principle allows the perceived cost to vary between individuals in a heterogeneous population, which can be seen as a more realistic approach than deterministic UE [15,36], in which the perceived travel costs cannot vary between travelers. The probit SUE was first formulated as a generalization of user equilibrium by Daganzo and Sheffi [37], and developed by Sheffi and Powell [38] as a mathematical programming problem. Liu and Fricker [39] presented a two-stage SUE approach to estimate OD matrices and the probit dispersion parameter in an iterative manner. Yang et al. [15] improved on the methods described in Liu and Fricker by incorporating link traffic flows and travel cost obtained using logit-based SUE traffic assignment. Meng et al. [40] presented a linearly constrained model and solution algorithm for the probit SUE problem with fixed demand and separable link travel time functions. This modeling approach was extended in Meng et al. [41] using elastic demand and non-separable link travel time functions. Time-dependent traffic assignment can also be formulated as a multinomial logit model [42,43,44], and this has become one of the most common methods for SUE-based traffic assignment [45,46,47]. In a fixed-point formulation, fixed target demands or link flows are used to establish a model based on UE and SUE principles [13,14,19].
In the multinomial logit model formulation, the link choice probability is a function of a dispersion parameter θ [16], which describes road users' perception of travel costs. Though the dispersion parameter θ is predetermined in many previous studies [14,36,45,46,47,48], here we assume that this value should be allowed to change with traffic conditions. In addition, Lo and Chan [16] proposed a maximum likelihood procedure for simultaneously estimating the OD matrix and the dispersion parameter θ, while the link choice proportions and link flows can be further calculated based on the maximum likelihood estimators of the OD matrix and θ. Compared with the previous studies, the main contributions of this paper are as follows: (1) A fixed-point model is formulated with a dynamic dispersion parameter θ, where the estimation of link choice proportions is integrated into the optimization procedure; (2) A GLS estimator is utilized to train this model, and the link choice proportions can be simultaneously calculated based on the OD matrix and dispersion parameter through a multinomial logit model; (3) A two-stage iterative algorithm is presented to refine the OD matrix and dispersion parameter estimates, and Sequential Quadratic Programming (SQP) from the extended quasi-Newton method is applied in the two-stage algorithm process [49].
The remainder of this paper is organized as follows: In Section 2, relevant notation, definitions, and model formulations are presented, followed by a link choice proportion approach to calculate the observed link flow using a true OD matrix. A two-stage algorithm is described in Section 3, along with model implementation details. The performance of the proposed approach is tested in a hypothetical network, and a sensitivity analysis is conducted using a range of variation coefficients. Results are presented and compared with those obtained through other established OD estimation methods. In Section 4, results are presented for a real-world network using loop detector data in the city of Seattle, WA to demonstrate the practicality of the proposed approach. Finally, conclusions are summarized in Section 5.
The Fixed Point Model with Dynamic Dispersion Parameter. As presented in the previous subsection, the estimated OD vector matrix is expressed as $d = [d_1, d_2, \ldots, d_j, \ldots, d_\tau]'$, where $d_j$ denotes the mean traffic flow of the jth element of d for OD pair j. Consider an OD pair j connected by a link k which is associated with a link performance cost function $c_k(f_k)$ equal to the cost of using link k. The link performance cost function [50] is expressed during the traffic assignment procedure in Eq 1:
$$c_k(f_k) = t_k \left[ 1 + \alpha_k \left( \frac{f_k}{C_k} \right)^{\beta_k} \right] \quad (1)$$
where $t_k$ is the free-flow travel time of link k, $C_k$ is its capacity, and $\alpha_k$ and $\beta_k$ are the parameters of the link performance function.
The OD matrix can be estimated via a fixed point formulation by considering the target OD matrix and observed link flows as follows [9,10,13,14,15,16,18,19,51]:
$$d = \arg\min_{d \ge 0} \left[ F_1(d, \bar{d}) + F_2(P d, \hat{f}) \right] \quad (2)$$
where $\bar{d}$ is the target OD matrix, $\hat{f}$ is the vector of observed link flows, $F_1$ and $F_2$ measure the distance between the estimated and target OD demands and between the estimated and observed link flows, respectively, and $P = \{p_{kj}\}$ is the assignment matrix, which represents the proportion of OD pair j using the observed link k.
In this study, the dispersion parameter is integrated into the objective function (Eq 2) [16,19] as follows:
$$(d, \theta) = \arg\min_{d \ge 0,\, \theta > 0} \left[ F_1(d, \bar{d}) + F_2(P(d,\theta)\, d, \hat{f}) + F_3(\theta, \bar{\theta}) \right] \quad (3)$$
where $\bar{\theta}$ is the target dispersion parameter. This model can be seen as a Stochastic User Equilibrium (SUE) problem [13].
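The Generalized Least Squares (GLS) estimator can be used to solve Eq 3 by minimizing the Weighted Euclidean Distances (WED) between the target data and the solution vector, and Eq 3 can then be reorganized as shown in Eq 4 [9,10,15,48]:
$$(d, \theta) = \arg\min_{d \ge 0,\, \theta > 0} \left[ (\bar{d} - d)' U^{-1} (\bar{d} - d) + (\hat{f} - P(d,\theta)\, d)' V^{-1} (\hat{f} - P(d,\theta)\, d) + (\bar{\theta} - \theta)' Q^{-1} (\bar{\theta} - \theta) \right] \quad (4)$$
where $P(d,\theta) = \{p_{kj}(d_j, \theta)\}$ is the assignment matrix, which is a function of both the OD matrix and the dispersion parameter θ, and U, V, and Q are the covariance matrices for the OD demands, link flows, and dispersion parameter, respectively.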
The matrix of link choice proportions P is generally assumed to be fixed during the optimization procedure [10,13,14,19]. This procedure performs well for uncongested traffic conditions or an idealized traffic network with fixed link costs. However, when the network becomes congested, users' choices are increasingly influenced by adverse traffic conditions. In this case, link flow and cost are not independent, and the assignment matrix P should be assumed to vary within each optimization step for link flow and OD estimation. Similarly, the GLS estimators of d and θ can also be obtained by solving Eq 4.
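As a minimal sketch of how the objective in Eq 4 could be evaluated, the following Python function sums the weighted squared distances for OD demands, observed link flows, and the dispersion parameter. The function name, argument names, and toy numbers are hypothetical, and the assignment matrix P is held fixed here for simplicity, whereas the model above lets P vary with d and θ.

```python
import numpy as np

def gls_objective(d, theta, d_target, f_obs, theta_target, P, U_inv, V_inv, q_inv):
    """Weighted squared distance between the targets and the current solution."""
    r_d = d_target - d            # OD demand residual
    r_f = f_obs - P @ d           # observed-link flow residual
    r_t = theta_target - theta    # dispersion parameter residual (scalar)
    return float(r_d @ U_inv @ r_d + r_f @ V_inv @ r_f + q_inv * r_t ** 2)

# Toy data (hypothetical): 2 OD pairs, 2 observed links, identity weights
d_target = np.array([100.0, 50.0])
P = np.array([[1.0, 0.0], [0.5, 1.0]])
f_obs = P @ d_target
weights = (np.eye(2), np.eye(2), 1.0)

value_at_target = gls_objective(d_target, 1.5, d_target, f_obs, 1.5, P, *weights)
value_off_target = gls_objective(d_target + 10.0, 1.5, d_target, f_obs, 1.5, P, *weights)
```

The objective is zero when the solution reproduces every target exactly and grows as any of the three residuals grows.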
The Link Choice Proportion Calculation Using the Dispersion Parameter. As mentioned in the notation and definitions subsection, the link flows and costs are updated whenever a new set of values of d and θ is received. Drivers' link choice decisions are influenced by network-wide traffic conditions, and thus the link choice proportion matrix P should be allowed to vary as well. The method of successive averages (MSA) is adopted to calculate equilibrium link flows in the traffic assignment procedure [7,16,45,52].
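The cost of path r connecting the OD pair j can be expressed as:
$$c_{rj} = \sum_{k \in r} c_k \quad (5)$$
The probability $P_{rj}$ can then be computed according to the path choice logit model [45]:
$$P_{rj} = \frac{\exp(-\theta c_{rj})}{\sum_{r'} \exp(-\theta c_{r'j})} \quad (6)$$
where the sum runs over all feasible paths r' connecting OD pair j. For a driver traveling along the path r, the weight assigned to link k is equal to $\exp(-\theta c_k)$. It is worth noting that the sum of probabilities over all feasible paths for each OD pair is equal to one.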
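The effect of θ on the path choice logit model can be illustrated with a short sketch. The function name and the three path costs below are hypothetical; subtracting the minimum cost before exponentiating is only a numerical safeguard and does not change the probabilities.

```python
import math

def path_choice_probabilities(path_costs, theta):
    """Logit path choice probabilities for one OD pair."""
    c_min = min(path_costs)  # shift costs for numerical stability
    weights = [math.exp(-theta * (c - c_min)) for c in path_costs]
    total = sum(weights)
    return [w / total for w in weights]

costs = [10.0, 12.0, 15.0]  # hypothetical path costs
probs_low_theta = path_choice_probabilities(costs, theta=0.01)
probs_high_theta = path_choice_probabilities(costs, theta=5.0)
```

With a small θ the three paths receive nearly equal shares, while with a large θ almost all trips use the cheapest path.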
As previously noted, $W' = [w'_{mn}]$ is the initial weight matrix of all possible paths connecting each OD pair. With the initial weight set to $w'_{mn} = \exp(-\theta c_k)$, the matrices $W'$, $W'^2$, and $W'^3$ represent the weight matrices for the groups of paths with one link, two links, and three links, respectively. Therefore, the weight matrix for all possible paths can be formulated as:
$$W = W' + W'^2 + W'^3 + \cdots \quad (7)$$
Wong [53] and Lo and Chan [16] have proven that the right side of Eq 7 is convergent for any acyclic network, and is equal to $W = (E - W')^{-1} - E$. Therefore, the probability of a trip from node m to node n (OD pair j) choosing link k can be calculated as follows:
$$p_{kj} = \frac{w_{mg} \exp(-\theta c_k)\, w_{vn}}{w_{mn}} \quad (8)$$
where link k connects node g and node v, and $w_{mn}$ is the element of the weight matrix for all possible paths connecting nodes m and n, $m, n \in M$; $w_{mm}$ is set to 1 for all nodes in the network. Following the previous definition, the auxiliary mean traffic flow $y_k^{(s)}$ of link k is defined for each incoming d and θ via the following equation:
$$y_k^{(s)} = \sum_{j} p_{kj}(d, \theta)\, d_j \quad (9)$$
The equilibrium link flows can then be obtained using the MSA method. Specifically, the flow of link k can be calculated at the (s+1)th iteration with the following equation:
$$x_k^{(s+1)} = x_k^{(s)} + \frac{1}{s} \left( y_k^{(s)} - x_k^{(s)} \right) \quad (10)$$
As shown in Eq 10, the flow of link k at the (s+1)th iteration is equal to the mean of the auxiliary traffic flow of link k in the previous s iterations.
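The closed form W = (E − W')^{-1} − E makes the link choice proportions easy to compute with a single matrix inversion. The sketch below is an illustration under stated assumptions: the four-node acyclic network, the link costs, and the function name are hypothetical, and the diagonal weights are taken as 1 by working with (E − W')^{-1} directly.

```python
import numpy as np

def link_choice_proportions(links, costs, theta, n_nodes, origin, dest):
    """Proportion of trips from origin to dest that use each link."""
    W1 = np.zeros((n_nodes, n_nodes))          # initial weight matrix W'
    for (g, v), c in zip(links, costs):
        W1[g, v] = np.exp(-theta * c)
    E = np.eye(n_nodes)
    # (E - W')^{-1} equals W + E; its diagonal is 1 on an acyclic network
    W_full = np.linalg.inv(E - W1)
    return {
        (g, v): W_full[origin, g] * W1[g, v] * W_full[v, dest] / W_full[origin, dest]
        for (g, v) in links
    }

# Hypothetical acyclic network: node 0 is the origin, node 3 the destination
links = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
costs = [1.0, 2.0, 0.5, 2.0, 1.0]
p = link_choice_proportions(links, costs, theta=1.5, n_nodes=4, origin=0, dest=3)
out_of_origin = p[(0, 1)] + p[(0, 2)]
into_dest = p[(1, 3)] + p[(2, 3)]
```

A useful sanity check is flow conservation: the proportions on the links leaving the origin sum to one, as do those on the links entering the destination.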
When a new set of values of d and θ is received, the matrix P of link choice proportions is updated following the procedure described above, and is then integrated into Eq 4 to update the values of d and θ. This optimization procedure continues until the OD matrix and dispersion parameter estimates converge.
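The MSA averaging described above can be sketched for the simplest possible case: one OD pair served by two parallel links. This is an illustration only; the function names, link parameters, and stopping rule (maximum relative change in link flow) are assumptions chosen for the example, and the BPR form is used for the link costs.

```python
import math

def bpr_cost(flow, t0, cap, alpha=0.15, beta=4):
    """BPR link cost for a given flow."""
    return t0 * (1.0 + alpha * (flow / cap) ** beta)

def msa_logit_two_links(demand, theta, t0, cap, eps=1e-3, max_iter=10000):
    """Logit SUE flows on two parallel links via successive averages."""
    x = [0.0, 0.0]                                   # initial link flows
    for s in range(1, max_iter + 1):
        costs = [bpr_cost(x[k], t0[k], cap[k]) for k in range(2)]
        w = [math.exp(-theta * c) for c in costs]
        y = [demand * wk / sum(w) for wk in w]       # auxiliary flows
        x_new = [x[k] + (y[k] - x[k]) / s for k in range(2)]
        # maximum relative change; the guard avoids division by zero at s = 1
        rel = max(abs(x_new[k] - x[k]) / max(x[k], 1e-12) for k in range(2))
        x = x_new
        if rel <= eps:
            break
    return x, s

flows, n_iter = msa_logit_two_links(100.0, 1.5, t0=(1.0, 1.2), cap=(60.0, 60.0))
```

Because every auxiliary flow vector assigns the full demand, the averaged flows also sum to the demand at each iteration.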

Model Solution Algorithm
To solve the Stochastic User Equilibrium (SUE) problem described above, a two-stage algorithm for GLS estimation and SUE traffic assignment is proposed. First, the OD matrix d and the dispersion parameter θ are simultaneously estimated with the link flows, link costs, and weight matrix held fixed. Second, the link flows, link costs, and link choice proportions are updated according to the new values of d and θ in the SUE assignment process. The two stages are executed iteratively until the values of d and θ converge. Sequential quadratic programming (SQP) from the extended quasi-Newton method is chosen as the solution method [49].
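To make the first stage concrete, the sketch below minimizes a small GLS-type objective over d and θ with SciPy's SLSQP routine, which is one readily available SQP implementation and is not necessarily the solver used in this study. The two-link setting, fixed link costs, targets, and variances are all hypothetical values chosen for illustration.

```python
import math
from scipy.optimize import minimize

COSTS = (1.0, 2.0)                        # link costs held fixed in stage one (assumed)
D_TARGET, THETA_TARGET = 90.0, 4.0        # target demand and dispersion parameter (assumed)
F_OBS = 81.76                             # observed flow on link 1 (assumed)
U, V, Q = 27.0 ** 2, 4.09 ** 2, 0.4 ** 2  # assumed variances

def share_link1(theta):
    """Logit share of link 1 for two parallel links."""
    return 1.0 / (1.0 + math.exp(-theta * (COSTS[1] - COSTS[0])))

def objective(z):
    """GLS-type objective in the unknowns z = (d, theta)."""
    d, theta = z
    return ((D_TARGET - d) ** 2 / U
            + (F_OBS - share_link1(theta) * d) ** 2 / V
            + (THETA_TARGET - theta) ** 2 / Q)

start = [D_TARGET, THETA_TARGET]
result = minimize(objective, start, method="SLSQP",
                  bounds=[(0.0, None), (1e-6, None)])  # d >= 0, theta > 0
d_est, theta_est = result.x
```

The bounds encode the feasible set d ≥ 0 and θ > 0; the small positive lower bound on θ stands in for the strict inequality.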

Two-Stage Algorithm
The initialization procedure of the two-stage algorithm can be described as follows:
1. Initialize the counter t = 0, set the initial OD vector matrix $d^{(0)} = \tilde{d}$, the initial dispersion parameter $\theta^{(0)} = \tilde{\theta}$, and the initial link flow $x_k^{(0)} = 0$, $k \in K$.
2. Calculate the initial link costs for all links in the network using Eq 1, and calculate the weight matrix W for all paths based on the initial link costs and $\theta^{(0)}$.
3. Calculate the link choice proportion matrix P using the weight matrix W and $\theta^{(0)}$.
4. Calculate the initial mean auxiliary traffic flow for all the observed links with Eq 9, and update t = t + 1.
The first stage of the algorithm is described as follows:
Step 1. The objective function (Eq 4) can be updated with the new mean auxiliary observed link flows as follows:
$$(d^{(t)}, \theta^{(t)}) = \arg\min_{d \ge 0,\, \theta > 0} \left[ (\bar{d} - d)' U^{-1} (\bar{d} - d) + (\hat{f} - P^{(t)} d)' V^{-1} (\hat{f} - P^{(t)} d) + (\bar{\theta} - \theta)' Q^{-1} (\bar{\theta} - \theta) \right]$$
where $U^{-1}$, $P^{(t)}$, $V^{-1}$, and $Q^{-1}$ can be updated using the new mean auxiliary observed link flows, estimated OD vector matrix, dispersion parameter, and link flow vector, respectively. The feasible set for d and θ should meet the requirements $d \ge 0$, $\theta > 0$. When the value of θ approaches zero, the path choice probabilities for all paths tend to be equal. As the value of θ increases, the path choice probabilities tend to be deterministic.
Step 2. Use the SQP algorithm to obtain a new set of values of $d^{(t)}$ and $\theta^{(t)}$ that minimizes the objective function. The starting points for optimizing the OD vector, $\tilde{d}^{(t)}$, and the dispersion parameter, $\tilde{\theta}^{(t)}$, should be fixed in advance. During the iterative process of the SQP algorithm, whenever a new value of θ is received, the link choice proportion matrix P is updated by changing the value of $\exp(-\theta c_k)$ in Eq 8, while the link costs and weight matrix remain unchanged.
The second stage of the algorithm can be described as follows:
Step 3. Initialize the counter s = 1.
Step 4. Calculate the weight matrix W with the new dispersion parameter $\theta^{(t)}$.
Step 5. Calculate the link choice proportion matrix $P^{(t)}$ using the weight matrix W and dispersion parameter $\theta^{(t)}$.
Step 6. Calculate the mean auxiliary traffic flow for all observed links as follows:
$$y_l^{(s)} = \sum_{j} p_{lj}^{(t)}\, d_j^{(t)}, \quad l \in L$$
Step 7. Calculate the equilibrium traffic link flow of link l via the MSA method:
$$x_l^{(s+1)} = x_l^{(s)} + \frac{1}{s} \left( y_l^{(s)} - x_l^{(s)} \right), \quad l \in L$$
Step 8. The maximum relative difference between current and previous mean link flows should satisfy the following requirement:
$$\max_{l \in L} \frac{\left| x_l^{(s+1)} - x_l^{(s)} \right|}{x_l^{(s)}} \le \varepsilon_1$$
If the above requirement is met, the algorithm proceeds directly to step 11; otherwise, it proceeds to step 9.
Step 9. Calculate the new link costs according to $x_l^{(s+1)}$, $l \in L$.
Step 10. Calculate the weight matrix using the updated link costs, set s = s + 1, and return to step 5.
Step 11. The maximum relative difference between the current and previous OD matrix estimates should satisfy the following requirement:
$$\max_{j} \frac{\left| d_j^{(t)} - d_j^{(t-1)} \right|}{d_j^{(t-1)}} \le \varepsilon_2$$
If the above requirement is met, terminate the procedure and output the current estimates of the OD vector matrix d and dispersion parameter θ as $d^{(t)}$ and $\theta^{(t)}$. Otherwise, set t = t + 1, and proceed to step 12.
Step 12. Calculate the new starting points as follows:

Model Evaluation
To evaluate the performance of the proposed method, the root mean squared errors (RMSE) for the OD matrix and link flows after convergence are defined as follows:
(1) The RMSE of the estimated link flows $x_l^{(s+1)}$ relative to the true link flows $x_l$ is computed as follows:
$$\mathrm{RMSE(LF)} = \sqrt{\frac{1}{|L|} \sum_{l \in L} \left( x_l^{(s+1)} - x_l \right)^2} \quad (14)$$
Similarly, the RMSE of the observed (target) link flows $\hat{f}_l$ relative to the true link flows $x_l$ can be defined as $\mathrm{RMSE}(L\hat{F})$, where $x_l^{(s+1)}$ is replaced by $\hat{f}_l$ in Eq 14.
(2) The RMSE of the estimated OD matrix $d^{(t)}$ relative to the true OD matrix d can be defined as RMSE (OD):
$$\mathrm{RMSE(OD)} = \sqrt{\frac{1}{\tau} \sum_{j=1}^{\tau} \left( d_j^{(t)} - d_j \right)^2}$$
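For reference, a minimal sketch of the RMSE computation is given here. It assumes the usual definition (the square root of the mean squared difference over all elements); the function name and the sample vectors are hypothetical.

```python
import math

def rmse(estimated, true):
    """Root mean squared error between two equally long sequences."""
    if len(estimated) != len(true):
        raise ValueError("sequences must have the same length")
    squared = [(e - t) ** 2 for e, t in zip(estimated, true)]
    return math.sqrt(sum(squared) / len(squared))

true_od = [100.0, 100.0, 50.0]        # hypothetical true OD demands
estimated_od = [110.0, 90.0, 50.0]    # hypothetical estimates
rmse_od = rmse(estimated_od, true_od)
rmse_perfect = rmse(true_od, true_od)
```

The same function applies to link flows by passing the estimated (or observed) and true link flow vectors.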

Numerical Experiment and Result Analysis

A Hypothetical Network Test
In this section, the performance of the proposed approach is tested in a hypothetical network. The network and data proposed by Yang et al. [15] and Caggiani et al. [19] are adopted as the test bed with some slight modifications. The network (presented in Fig 1) is composed of 9 nodes (3 origin centroids and 3 destination centroids) and 14 links. The true and initial OD vector matrices d and $\tilde{d}$ for the SQP algorithm are shown in Table 1. The initial dispersion parameter $\tilde{\theta}$ is assumed to be 4, and the true dispersion parameter $\hat{\theta}$ is fixed to 1.5. Note that the initial OD matrix $\tilde{d}$ and dispersion parameter $\tilde{\theta}$ are quite dissimilar from those of the ground truth data.
The following parameters in the Bureau of Public Roads (BPR) [50] link performance function are used: $\alpha_k = 0.15$ and $\beta_k = 4$, $\forall k \in K$. In addition, the free-flow travel time ($t_k$) and capacity ($C_k$) for each link are predetermined as shown in Table 2.
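The BPR link performance function with these parameter values can be written in a few lines. The function name and the sample free-flow time and capacity below are hypothetical; the actual link values are those listed in Table 2.

```python
def bpr_link_cost(flow, free_flow_time, capacity, alpha=0.15, beta=4):
    """BPR travel cost: free-flow time inflated by the volume-to-capacity ratio."""
    return free_flow_time * (1.0 + alpha * (flow / capacity) ** beta)

# Hypothetical link: free-flow time 10, capacity 100
cost_empty = bpr_link_cost(0.0, 10.0, 100.0)
cost_at_capacity = bpr_link_cost(100.0, 10.0, 100.0)
cost_over_capacity = bpr_link_cost(200.0, 10.0, 100.0)
```

With α = 0.15 and β = 4, a link operating at capacity is 15% slower than at free flow, and the cost rises steeply once flow exceeds capacity.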
The ground truth link flows can be generated by allocating the true OD matrix to the traffic network using the SUE-Logit assignment method presented in Section 2.3. The true dispersion parameter is θ = 1.5, resulting in the link flows shown in Table 3. The set of links {5, 6, 7, 11, 13} is selected as the observed links.
In this example, we assume that the OD vector and link flow vector follow the Poisson distribution. The covariance matrices U (for OD demands) and V (for link flows) in Eq 4 can be assumed to be diagonal matrices [9,14,54]. The diagonal elements of U, V and Q can be computed respectively through the following equations: where $cv_d$, $cv_x$ and $cv_\theta$ represent the variation coefficients for OD demands, link flows, and the dispersion parameter, respectively. Specifically, these parameters are set as $cv_d = 0.3$, $cv_x = 0.05$, and $cv_\theta = 0.1$.
The target OD matrix $\bar{d}$, observed link flow vector $\hat{f}$, and target dispersion parameter $\bar{\theta}$ can be generated separately by adding random terms to the corresponding true values. The random terms are sampled from independent normal variables with zero means. For instance, the target OD matrix can be calculated by adding a random term with $\lambda_d = 0.3$ to the values of the true OD matrix divided by two, the observed link flow vector can be generated by adding a random term with $\lambda_f = 0.1$, and the target parameter can be set as $\bar{\theta} = 4$. In addition, the error tolerance threshold used in the optimization is set to $\varepsilon_1 = \varepsilon_2 = 10^{-3}$. The convergence of theta is plotted in Fig 2, which shows the estimate slowly falling in the first 120 iterations before rapidly converging to 1.5099, a very slight deviation from the true value of 1.5. In addition, the convergence of the objective function is presented in Fig 3, where the value of the objective function falls sharply at the first iteration and then gradually decreases and levels off at a lower value. Poor initial choices of the OD input vector and dispersion parameter may lead to slower convergence.
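One way such noisy inputs could be generated is sketched below. The exact form of the random term is not spelled out here, so the multiplicative form scale × value × (1 + λ × ε) with ε drawn from a standard normal distribution is an assumption, as are the function name, the seed, and the sample demands.

```python
import numpy as np

def make_target(true_values, lam, rng, scale=1.0):
    """Perturb true values with zero-mean normal noise (assumed multiplicative form)."""
    true_values = np.asarray(true_values, dtype=float)
    noise = rng.standard_normal(true_values.shape)  # zero-mean normal terms
    return scale * true_values * (1.0 + lam * noise)

rng = np.random.default_rng(0)               # fixed seed for repeatability
true_od = np.array([120.0, 80.0, 60.0])      # hypothetical true OD demands
target_od = make_target(true_od, lam=0.3, rng=rng, scale=0.5)  # true OD halved, then perturbed
noise_free = make_target(true_od, lam=0.0, rng=rng, scale=0.5)
```

Setting λ to zero recovers the halved true OD matrix exactly, which is a convenient check that the scaling is applied as intended.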
In order to further evaluate the effectiveness of the proposed approach, a sensitivity analysis is conducted with parameter $cv_\theta$ (CVT) varying from 0.1 to 0.
Table 1, a 78.8% reduction is achieved at the maximum RMSE (OD), and a 95% reduction is obtained at the minimum RMSE (OD). As shown in Fig 6, the value of theta varies negligibly with the choice of $cv_d$ and $cv_\theta$; in other words, the estimated value of theta always converges to approximately the true value. For a fixed value of $cv_d$, the estimated θ is close to the true value for any given $cv_\theta$. For example, the value of θ fluctuates between 1.37 and 1.51 when $cv_d = 0.3$. Likewise, for any fixed $cv_\theta$, the estimated θ varies minimally about the true value of θ using the proposed method. For example, the estimated θ is between 1.35 and 1.52 for $cv_\theta = 0.1$.
The above discussion shows that the initial values of d and θ and the observed link flow vector $\hat{f}$ do not affect the theta estimation performance. The problem thus behaves like a convex optimization problem, in which the optimal results tend to converge near the true dispersion parameter value. This implies that the estimate of θ is insensitive to the variation coefficients, and can be used as a stable and accurate parameter to determine travelers' route decisions.

Comparison and Analysis
To further demonstrate the advantages of the proposed methodology, two OD matrix estimation methods are implemented and compared with the proposed approach. To make this comparison, we first implement the algorithm described in Yang et al. [15], which presents an optimization model for OD matrix estimation in congested networks using the logit-based SUE. The method described in Lo and Chan [16] is implemented for the second comparison. This method applies both statistical estimation and traffic assignment to simultaneously calculate the OD matrix and link choice proportions based on OD survey data and traffic counts. To maintain a fair comparison, the same test network and data set are applied in all cases. The OD matrix estimation method proposed by Yang et al. [15] is given in section 2.2. The objective function is shown in Eq 16.
(d, θ) = arg min
As shown in Table 4, the proposed method yields significantly lower RMSE (OD) and RMSE (LF) relative to Yang et al.'s approach. Compared with the initial RMSE values, a 22.6% reduction in RMSE (OD) is achieved using the proposed approach, while only a 14.1% reduction is achieved using the method described in Yang et al. Similarly, the proposed approach resulted in a 34.7% reduction in RMSE (LF), while only a 28.6% reduction was achieved using Yang et al.'s approach. One reason is that the dispersion parameter is estimated and integrated into Eq 3 through $F_3(\theta, \bar{\theta})$ in the proposed method, which yields a better estimate of the dispersion parameter than previous approaches. The other reason is that the covariance matrices U (for OD demands), V (for link flows) and Q (for the dispersion parameter) are not fixed during the calculation. These improvements help the method enhance the estimation performance for the OD matrix and link flow vectors.
Lo and Chan [16] present the following maximum likelihood objective function:
$$(d, \theta) = \arg\max_{d \ge 0,\, \theta > 0} \ln L(\theta, d \mid \hat{f}, \bar{d}) \quad (17)$$
In Lo and Chan [16], it is assumed that the observed flows are equal to the true flows in the test network. For Lo and Chan's algorithm, we set the target dispersion parameter to $\bar{\theta} = 4$ (which is also the initial dispersion parameter value used by Lo and Chan [16]), and the variation coefficients as follows: $cv_\theta = 0.1$, $cv_x = 0.05$, and $cv_d = 0.3$. In order to evaluate the performance of the proposed approach relative to that of Lo and Chan's method [16], RMSE (OD), RMSE (LF), and the estimated Theta are selected for comparison and shown in Table 5.
Unlike Lo and Chan's method, random terms are added to the observed link flows in the proposed approach, thus introducing additional challenges for estimation. However, the results presented in Table 5 demonstrate that the method proposed in this paper outperforms Lo and Chan's approach in terms of OD matrix, link flow, and Theta estimation accuracy.

Application to A Square Network in Seattle
A square network in Seattle is used as a congested network case study to demonstrate the applicability and transferability of the proposed approach in a real-world traffic network (shown in Fig 7). Empirical data were collected from loop detectors located along one freeway section in the Seattle area, and obtained for this research through the Strategic Highway Research Program 2 (SHRP 2 program) supported by the Washington State Department of Transportation (WSDOT) [55]. The square test network used in this case study consists of 4 nodes and 8 links, where all nodes are centroids (origins and destinations). The topology of the test network is outlined in Fig 7. We assume that the study network is acyclic, such that the traffic flow starting from one node will leave the network before returning to the original node. Specifically, Links 1 and 2 represent the SR 520 Bridge connecting I-5 in Seattle and SR 202 in Redmond. Interstate 90 (I-90) is represented by Links 3 and 4, and Interstate 5 (I-5) is represented by Links 5 and 6. Links 7 and 8 represent Interstate 405 (I-405), which intersects I-90 in the south and SR 520 in the north.
Traffic flows were obtained from loop detectors installed at nodes 1, 2, 3 and 4, illustrated in Fig 8. The parameters for the BPR link performance cost function (Eq 18) were estimated based on the empirical data and are presented in Table 6. Table 7 indicates the external traffic flow recorded for each node during the peak hour, where 1-Link 1 represents the external traffic flow on Link 1 from node 1, 2-Link 7 represents the external traffic flow on Link 7 from node 2, and so forth. To convert true link flows into a ground truth OD matrix, the flow proportion for each node, η = 0.6, is assumed based on extensive video records and field surveys. This implies that, for the traffic leaving each node, 60% exits the network from an adjacent node while 40% exits from the other nodes. In order to avoid circular flow in the OD calculation process, it is assumed that the final remaining traffic flow will leave from the last node before returning to the original node. Based on these assumptions, the ground truth OD matrix is calculated and shown in Table 8. In addition, the initial OD matrix $\tilde{d}$ can be computed by rounding the last digit of the true OD matrix as shown in Table 8.
The true OD matrix in Table 8 is then used to assign the corresponding traffic flow to each link according to Eq 5 through Eq 10. The calculated traffic flows can be assumed to represent the true link flows, where links 1, 3, 5, 6, and 8 are selected as the observed links to estimate the OD matrix, as shown in Table 9.
Similar to the hypothetical network, we assume that the OD demands and observed link flows follow the Poisson distribution, and the covariance matrices U and V can be assumed to be diagonal. The initial value of the dispersion parameter $\tilde{\theta}$ is set to 40.5. The remaining input parameters are set identically to the hypothetical network. In addition, a sensitivity analysis with 50 different combinations of variation coefficients $cv_d$ and $cv_\theta$ was conducted to investigate the optimal parameter initialization for the proposed approach. The results of this sensitivity analysis are shown in Figs 9-11. As noted in the hypothetical case, the choice of $cv_d$ and $cv_\theta$ has very little impact on the estimation of Theta. As shown in Fig 11, the estimated dispersion parameter θ lies between 20.8327 ($cv_d = 0.5$ and $cv_\theta = 0.5$) and 22.7165 ($cv_d = 0.7$ and $cv_\theta = 0.2$) in all cases, so the best estimate of the dispersion parameter θ can be found within this range.
Finally, using the BPR link performance cost function parameters described in Table 6, three combinations of variation coefficients ($cv_d = 0.3$ and $cv_\theta = 0.1$; $cv_d = 0.5$ and $cv_\theta = 0.5$; $cv_d = 0.7$ and $cv_\theta = 0.2$) are used to estimate theta for the actual network.

It is interesting to observe that the estimated RMSE (OD), RMSE (LF), and Theta for both hypothetical and actual networks exhibit a similar trend yet have obvious differences. Two primary reasons may explain these differences. First, the network topology is quite different for the two scenarios. The hypothetical network is unidirectional, where each node can be either an origin or a destination. In contrast, the actual network is bidirectional, where each node is both an origin and a destination, and thus multiple paths may exist between each OD pair. For example, the traffic flows on both 1-Link 1 and 1-Link 6 contribute to the OD demands from node 1 to node 2. Second, compared with the hypothetical network with equal cost parameters for all links, a more realistic BPR link performance cost function is adopted for the actual network. In the real-world network, the parameters (e.g. free-flow travel time and link capacity) are calibrated for each link based on empirical data. That said, the sensitivity analysis for Theta produced similar results for both the hypothetical and actual networks, indicating that this parameter is not sensitive to the choice of variation coefficients. In addition, the theta estimates obtained using a range of different parameter settings exhibit a similar and regular trend over time of day, as shown in Fig 12. These findings provide guidance for initial parameter selection, and offer useful insight for interpreting modeling results.

Conclusions
This paper proposes a two-stage algorithm to simultaneously estimate origin-destination matrices and link choice proportions by incorporating a dynamic dispersion parameter into the route choice model. The dispersion parameter θ is of practical significance in describing travelers' route choice decisions, but has typically been assumed constant in previous studies. Finding the optimal dispersion parameter is not a straightforward task. To address this issue, this paper presents a model calibration procedure to simultaneously estimate the dispersion parameter θ, link choice proportions, and OD matrix. In order to obtain the Generalized Least Squares (GLS) estimators of the above listed parameters, a two-stage algorithm is proposed which integrates GLS estimation into the SUE traffic assignment procedure. The first and second stages of the algorithm are applied iteratively until the maximum relative difference criterion presented in Step 11 is satisfied, after which the estimated OD matrix, link choice proportions, and dispersion parameter θ can be obtained. The SQP approach based on the extended quasi-Newton method is used to search for the optimal solution in the first stage of the algorithm. The SUE traffic assignment procedure is applied to incorporate both OD matrix and link choice proportion estimation into the second stage of the algorithm, and MSA is used to obtain the equilibrium link flows.
A hypothetical network was constructed to test the performance of the proposed approach, followed by a comprehensive sensitivity analysis with 50 combinations of the variation coefficients $cv_d$ (CVD) and $cv_\theta$ (CVT) to investigate the stability of the estimated OD matrix, link flows, and Theta. A comparison with two different methods described in Yang et al. [15] and Lo and Chan [16] suggests that the proposed approach can achieve superior performance in terms of RMSE (OD), RMSE (LF), and accuracy of the estimated Theta parameter. Moreover, a case study is presented using a real-world congested square network in Seattle, WA to demonstrate the practicality of the proposed approach, in which the true OD matrix and observed link flows are calculated via ground-truth traffic count data collected by loop detectors. The proposed method is shown to be robust under a range of initial parameter values. The RMSE (OD) can be reduced from 3426.9 to 23.6 at $cv_d = 0.1$ and $cv_\theta = 0.1$ when traffic flows are observed on five out of eight links. In addition, the estimated dispersion parameter exhibits a consistent and regular trend by time of day for all combinations of initial parameters. For future research, the proposed approach should be tested on a network of greater complexity and size, and the impact of input data inaccuracy should be considered. Additionally, further work is needed to determine the number and location of observed links required for accurate OD estimation using the proposed approach.

Supporting Information
S1 Dataset. The dataset includes the link speed data and link volume data, which were collected from loop detectors located along the freeway sections (I-5, I-90, I-405 and SR 520) in the Seattle area and retrieved via the Strategic Highway Research Program 2 (SHRP 2 program). The file named "S1 Link Speed Data" records the average speed for all links at 20-second intervals, and the file named "S1 Link Volume data" records the volume for all links at 20-second intervals. (RAR)