A constrained multinomial Probit route choice model in the metro network: Formulation, estimation and application

  • Yongsheng Zhang,

    Affiliation School of Traffic and Transportation, Beijing Jiaotong University, Beijing, China

  • Enjian Yao ,

    Affiliation MOE Key Laboratory for Urban Transportation Complex Systems Theory and Technology, Beijing Jiaotong University, Beijing, China

  • Heng Wei,

    Affiliation Advanced Research in Transportation Engineering and Systems (ART-EngineS) Laboratory, College of Engineering and Applied Science, The University of Cincinnati, Cincinnati, United States of America

  • Kangning Zheng

    Affiliation School of Traffic and Transportation, Beijing Jiaotong University, Beijing, China


Because metro network expansion gives passengers more alternative routes, it is attractive to integrate the impacts of the route set and of the interdependency among alternative routes on route choice probability into route choice modeling. This paper therefore presents the formulation, estimation and application of a constrained multinomial probit (CMNP) route choice model for the metro network. The utility function has three components: the compensatory component is a function of influencing factors; the non-compensatory component measures the impact of the route set on utility; and the error component follows a multivariate normal distribution whose covariance is structured into three parts, representing the correlation among routes, the transfer variance of a route, and the unobserved variance, respectively. Because the choice probability involves multidimensional integrals of the multivariate normal probability density function, the CMNP model is rewritten in hierarchical Bayes form, and a Markov chain Monte Carlo (MCMC) approach based on the Metropolis-Hastings (M-H) sampling algorithm is constructed to estimate all parameters. Reliable estimates are obtained from Guangzhou Metro data. The proposed CMNP model also performs well in forecasting route choice probabilities and in predicting transfer flow volumes.


With new lines opening almost every year, large-scale metro systems have formed in several major Chinese cities, such as Beijing, Shanghai, Guangzhou and Shenzhen. Taking Guangzhou Metro as an example, as of 2014 it was the sixth busiest metro system in the world and the third largest metro network in China, with 9 lines, 164 stations (including 21 transfer stations) and 260.5 km of track. In a large-scale metro network, the many transfer stations create numerous routes for some origin-destination (OD) pairs, which increases the complexity of route choice modeling. Usually, following a specific decision scheme, an individual chooses the best route among the alternatives by weighing multiple factors. These include variables describing the level of service of the metro system, such as in-vehicle travel time, number of transfers, transfer time and congestion level, and variables describing the influence of the network's topological structure and the route direction on passengers’ route choice preferences, such as angular cost [1, 2].

The complex nature of the route choice process in a large-scale metro system makes it challenging to build a route choice model that reflects realistic behavioral decisions. Traditionally, a route in a large-scale metro network is chosen from a route set derived from limits on attributes such as travel time and number of transfers. For example, if the shortest travel time for an OD pair is 30 min, it is common sense that passengers will not consider a route that takes more than 60 min. In this case, 60 min is the limit, and the routes with travel times below 60 min constitute the route set. However, route set generation and route choice are usually carried out separately and independently in the metro system, which breaks the consistency between the two steps. The route choice step is a compensatory process that focuses on the trade-offs among multiple influencing factors, whereas the route set generation step is a non-compensatory process that focuses on the cut-offs associated with the attribute limits. Non-compensatory behavior has been demonstrated in choice processes [3–5], and semi-compensatory route choice modeling, which combines the route set generation and route choice steps, has attracted increasing attention. The relationships between semi-compensatory, compensatory and non-compensatory choice processes are shown in Fig 1.

Fig 1. The relationships of different choice processes.

In the figure, ‘A+B = C’ means C is the combination of A and B. The arrow displays the one-to-one match.

Meanwhile, the route overlapping problem in large-scale metro networks has already been identified by Yai et al. [6]. Especially for OD pairs separated by a long direct distance, some alternative routes share links, which introduces correlation among the routes. Although most Logit-based models represent route choice behavior under route overlapping satisfactorily, they remain approximations of the real behavior. To express the interdependency among alternatives exactly, the Probit model [7] is more suitable, although it is harder to estimate than Logit models. Facing increasingly refined operational requirements and services, metro operators need a more advanced route choice model that reveals passengers’ actual route choice behavior in order to support personalized travel services and travel demand prediction.

Therefore, it is necessary to establish a semi-compensatory Probit route choice model and design an easier estimation approach for practice. In this paper, a constrained multinomial Probit route choice model is proposed to reveal the realistic route choice process along with the estimation approach, focusing on analyzing the semi-compensatory choice behavior and representing the interdependency among alternative routes.

Literature review

Route choice models based on random utility maximization (RUM) theory [8] are mainly of two types: Logit and Probit. Among them, the Multinomial Logit (MNL) model [9] is the most widely used because it is easy to estimate and apply. Ramming, Raveau et al., Zhang et al. and Liu et al. successfully analyzed route choice behavior with MNL models, considering level of service, social demographics, travel purpose and route direction [1, 2, 10, 11]. However, the assumption that the error component follows an independent and identically distributed (IID) Gumbel distribution induces several weaknesses. To alleviate one of them, the route overlapping problem caused by the interdependency among routes, many extended Logit models have been developed, such as C-Logit [12], Path Size Logit (PSL) [13, 14], Paired Combinatorial Logit (PCL) [15, 16], Cross Nested Logit (CNL) [17], Generalized Nested Logit (GNL) [18] and Mixed Logit [19]. In the metro context, Raveau et al. successfully applied the C-Logit model to analyze passengers’ route choice preferences [20]. Nevertheless, Logit models rely on the IID assumption to obtain a closed-form probability expression, which weakens their representation of the interdependency among alternative routes. The Probit model [7], in contrast, captures this interdependency directly through the covariance matrix and is therefore closer to passengers’ actual route choice behavior. Yai et al. proposed a Probit model with structured covariance to analyze route choice behavior in the railway network, which reduces the computational time to some extent [6].

Those models mainly focus on how an individual chooses the best route from a given route set. The consistency between the route set generation and route choice processes is usually neglected for metro passengers. Considering the interplay between the two sub-processes, Zhang et al. [21] introduced the constrained multinomial logit (CMNL) model [22, 23] into metro route choice modeling to analyze passengers’ semi-compensatory choice behavior. Semi-compensatory models, which combine compensatory and non-compensatory behaviors, have received growing attention [24–26]. There are two major approaches. The first, the widely used two-stage approach, generates all possible consideration route sets and then chooses routes from the generated sets [27]. A consideration route set is a subset of the master route set, limited by some specific attributes. The two-stage approach is attractive because different models can be used for each stage, and many successful applications are reported in the literature [28–30]. However, it is computationally complex because too many consideration route sets must be constructed from the master route set [31], and its choice predictions may not be sufficiently robust at the level of individual sets [32]. To avoid this combinatorial number of choice sets, the second approach adds a kink, called the non-compensatory component, to the utility function [33, 34]. The non-compensatory component does not affect the utility when the attribute value lies within the acceptable domain, but it lowers the utility sharply when the attribute value exceeds the threshold. Compared with the two-stage approach, it simplifies the structure of the semi-compensatory choice model and saves computational time by avoiding a huge number of consideration choice sets. However, in those studies the kinks make the utility function non-differentiable at the cut-off, which makes it difficult to apply in equilibrium and optimization processes.

To solve this problem, a constrained multinomial logit (CMNL) route choice model [21] was developed in which the non-compensatory component is a continuous function, making the utility function differentiable at the cut-off. However, its error component still follows an IID Gumbel distribution, so the route overlapping problem remains unsolved. Moreover, compared with the Gumbel distribution, the normal distribution is a closer approximation to the actual distribution of the error component. To address these problems, this paper develops a constrained multinomial probit (CMNP) route choice model for metro passengers. In this model, the error component follows a multivariate normal distribution instead of the IID Gumbel distribution, and the correlations among alternative routes are captured by the covariance matrix.

The remainder of this paper is organized as follows. Section 3 introduces the CMNP route choice modeling methodology. Section 4 presents the estimation approach, which applies the MCMC method after transforming the CMNP model into a Bayesian formulation. In Section 5, the CMNP model is estimated with the proposed approach using revealed preference (RP) survey data from Guangzhou Metro and applied to forecasting transfer passenger volumes. Section 6 concludes.

Modeling methodology

Based on random utility theory, the utility function with constrained characteristic attributes consists of three parts: the compensatory, non-compensatory and error components. $U_{k,n}^{rs} = V_{k,n}^{rs} + C_{k,n}^{rs} + \varepsilon_{k,n}^{rs}$ (1) where, for route k of OD pair rs, $U_{k,n}^{rs}$ is the generalized utility perceived by passenger n; $V_{k,n}^{rs}$ is the compensatory component; $C_{k,n}^{rs}$ is the non-compensatory component; and $\varepsilon_{k,n}^{rs}$ is the random error component.

The compensatory component is a trade-off function of characteristic attributes, including level of service variables, network topology, etc., and represents the compensatory trade-offs among attributes. For simplicity, it is defined as a linear function of the attributes: $V_{k,n}^{rs} = \sum_{h=1}^{H} \theta_{n,h} X_{k,h}$ (2) where H is the number of characteristic attributes; Xk,h denotes attribute h of route k; and θn,h is the corresponding parameter to be estimated.

The non-compensatory component is a cut-off function that should satisfy the following conditions:

  1. It should be a continuous function, which guarantees its applicability in traffic equilibrium and optimization processes.
  2. If a constrained attribute value of a route exceeds its threshold, the non-compensatory component drives the route utility toward negative infinity. Otherwise, the non-compensatory component tends to zero.

Therefore, the non-compensatory component $C_{k,n}^{rs}$ is formulated in this paper as $C_{k,n}^{rs} = \ln \phi_{k,n}^{rs}$ (3) where $\phi_{k,n}^{rs}$ is a continuous function bounded in (0, 1). It can be interpreted as the probability that route k is considered after its constrained attributes are compared with the corresponding thresholds. Usually more than one attribute is constrained; according to the conjunctive screening rule [35], $\phi_{k,n}^{rs}$ can then be defined as $\phi_{k,n}^{rs} = \prod_{i=1}^{I_n} \phi_i(x_{k,i}^{rs})$ (4) where, for OD pair rs, $x_{k,i}^{rs}$ is constrained attribute i of route k; $I_n$ is the number of attributes constrained by individual n; and the function $\phi_i(x_{k,i}^{rs})$ measures the considered probability influenced only by attribute $x_{k,i}^{rs}$. To specify this function, suppose that a constrained attribute with perception error Ψ1 has an upper bound with perception error Ψ2 (for a lower bound, the sign reverses). The function is then $\phi_i(x_{k,i}^{rs}) = P(x_{k,i}^{rs} + \Psi_1 \le b_{n,i}^{rs} + \Psi_2)$ (5) where, for OD pair rs, $b_{n,i}^{rs}$ is the threshold of attribute i for individual n.

In this paper, we assume that the perception errors of the constrained attribute and of the threshold are normally distributed and independent of each other. Specifically, $\Psi_1 \sim N(0, \sigma_1^2)$ and $\Psi_2 \sim N(\gamma_i, \sigma_2^2)$, where $\sigma_1^2$ and $\sigma_2^2$ are the variances and γi is the mean, acting as the location parameter. By the properties of the normal distribution, Ψ1-Ψ2 is also normal, that is, Ψ1-Ψ2 ~ N(−γi, σ2), where $\sigma^2 = \sigma_1^2 + \sigma_2^2$. The function is therefore $\phi_i(x_{k,i}^{rs}) = \Phi\left(\omega_i (b_{n,i}^{rs} - x_{k,i}^{rs} + \gamma_i)\right)$ (6) where ωi is the scale parameter related to the variance (ωi>0 if $b_{n,i}^{rs}$ is the upper bound of $x_{k,i}^{rs}$; otherwise ωi<0), which controls how quickly the probability changes from 0 to 1, and Φ(·) is the cumulative distribution function of the standard normal distribution. Because of the location parameter γi, the considered probability need not equal 0.5 even when the constrained attribute value equals the threshold; it depends on individual preference. The impacts of the scale parameter ωi and the location parameter γi on the function are shown in Fig 2.

Fig 2. The impacts of scale and location parameters.

The impacts of scale and location parameters on considered probability function are displayed separately by changing the values.
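The considered probability and the resulting non-compensatory term can be illustrated with a short Python sketch. It assumes an upper-bound constraint, an attribute error with mean 0 and a threshold error with mean γ, so that the probability takes the form Φ(ω(threshold − value + γ)), and it assumes the non-compensatory term is the plain logarithm of the product of the per-attribute probabilities. The function names and the numeric values are hypothetical and are not taken from the paper.

```python
import math


def std_normal_cdf(x):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def consideration_prob(value, threshold, omega, gamma=0.0):
    """Probability that a route passes one upper-bound constraint.

    Assumed form: Phi(omega * (threshold - value + gamma)), with omega > 0.
    """
    return std_normal_cdf(omega * (threshold - value + gamma))


def noncompensatory_term(values, thresholds, omegas, gammas):
    """Log of the product of per-attribute probabilities (conjunctive rule)."""
    phi = 1.0
    for x, b, w, g in zip(values, thresholds, omegas, gammas):
        phi *= consideration_prob(x, b, w, g)
    # phi can underflow to 0.0 for routes far beyond a threshold.
    return math.log(phi) if phi > 0.0 else float("-inf")


# Hypothetical route: 0.9 h travel time against a 1.0 h bound, 1 transfer
# against a bound of 2 transfers.
penalty = noncompensatory_term([0.9, 1.0], [1.0, 2.0], [60.0, 5.0], [0.0, 0.0])
```

A route well inside both bounds gets a penalty near zero, while a route far beyond either bound gets a very large negative penalty, which matches the two conditions listed for the cut-off function.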

Usually, in the metro system, the spatiotemporal constraints, referring to travel time and number of transfers, are taken as the constrained characteristic attributes. The fact that the passengers generally prefer to the route with smaller values of the two attributes leads to the result that both of the two constraints only have the upper bounds. In the route choice context for metro passengers, the thresholds to a specific attribute vary with OD pairs. The deterministic parts of the thresholds of constrained travel time and number of transfers are shown below respectively. (7) (8) where is the bound of travel time (including in-vehicle time and transfer time) of individual n for OD pair rs, h; is the bound of number of transfers; is the shortest travel time for OD pair rs, h; is the minimum number of transfers; αn and βn are the bound parameters needed to be calibrated.

Assuming that error component follows the multivariate normal distribution, that is , where is the covariance matrix associated with the correlation among alternative routes, together with the constraints on route availability in the utility function, it is called CMNP (constrained multinomial probit) model. With respect to the route choice scenario, the route over-lapping problem in the railway network has been identified by Yai et al. [6] which is similar to metro network. This paper rewrites the covariance matrix into three parts, where the first part depends on the correlation among routes, the second one denotes the transfer variance of the route, and the last one denotes the unobserved variance. The latter two parts distribute independently by route. (9) (10) where m is the number of alternatives in the routes set; is the unit variance which is independent from each other; is the variance of transfer 1time; is constant and identical to all routes; is the over-lapping length between route k and j for OD pair rs; is the number of transfers of route m; I is the identity matrix; li is the length of link i; Γk is the links set of route k; if link i is shared by route k and j, kikj = 1, otherwise, kikj = 0. There are only three parameters in this covariance matrix, but we just need to estimate the ratio λ1 of to and the ratio λ2 of to .
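The three-part covariance structure described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the unobserved variance is normalized to 1, λ1 is taken to scale the overlapping length between routes, and λ2 is taken to scale the number of transfers of a route. The paper's exact definition of the two ratios is not visible in this text, and the function name, link identifiers and numbers are hypothetical.

```python
def structured_covariance(route_links, link_length, n_transfers, lam1, lam2):
    """Build an m x m covariance matrix from overlap, transfer and unit parts.

    route_links: list of sets of link ids, one set per route
    link_length: dict mapping link id to length
    n_transfers: list with the number of transfers of each route
    """
    m = len(route_links)
    cov = [[0.0] * m for _ in range(m)]
    for k in range(m):
        for j in range(m):
            shared = route_links[k] & route_links[j]
            overlap = sum(link_length[i] for i in shared)
            cov[k][j] = lam1 * overlap  # correlation through shared links
        # Route-specific parts: transfer variance plus normalized unit variance.
        cov[k][k] += lam2 * n_transfers[k] + 1.0
    return cov


# Hypothetical example: two routes sharing link 1.
links = [{1, 2, 3}, {1, 4}]
lengths = {1: 2.0, 2: 1.5, 3: 1.0, 4: 3.0}
cov = structured_covariance(links, lengths, [1, 0], lam1=0.1, lam2=0.5)
```

On the diagonal the overlap of a route with itself is its full length, so longer routes and routes with more transfers receive a larger variance, while the off-diagonal terms grow with the shared length.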

Then based on the random utility maximization, given the values of all parameters, the chosen probability of route k is equal to (11) where is the routes set between OD pair rs for passenger n. With respect to current scale of the metro network, the largest size of the routes set can be set as 10.

Model estimation

The Bayesian formulation to the CMNP model

Because the choice probability involves multidimensional integrals of multivariate normal densities, especially for large route sets, the MNP model is usually estimated with a Bayesian formulation and the Markov chain Monte Carlo (MCMC) approach [36]. This paper shows how to transform the CMNP model with structured covariance into a Bayesian formulation and introduces the Cholesky decomposition to reduce the dimension of the integral and thus save computational time. After the dimension reduction and the transformation of the integration domain, the multidimensional integrals of the multivariate normal densities can be evaluated, given the parameters, with the quasi-Monte Carlo method.

Here, we construct a vector ζ = μ ∪ σ ∪ Σ = θ ∪ ω ∪ γ ∪ z ∪ λ containing all unknown parameters, where θ contains the parameters in the compensatory component; ω is the vector of scale parameters; γ is the vector of location parameters; z contains the parameters in the threshold function, that is, αn and βn in this paper; and λ contains the parameters in the covariance matrix. The vector Y denotes the indicators of the observations, referring to the chosen routes. The only difference between the traditional probit model and the proposed CMNP model is the non-compensatory component. Its value can be calculated easily given the unknown parameters, thanks to the assumption of independent normal perception errors. Based on Bayes’ theorem, the posterior distribution π(ζ|Y) is proportional to the product of the likelihood and the priors on all unknown parameters; that is, the joint posterior distribution for the hierarchical Bayes model is $\pi(\theta, \omega, \gamma, z, \lambda \mid Y) \propto P(Y \mid \theta, \omega, \gamma, z, \lambda)\, \pi(\theta, \omega, \gamma, z, \lambda)$ (12) where π(·) is the probability density function and P(Y|θ, ω, γ, z, λ) is the probability of observation Y given all unknown parameters, which is given by Eq (11). Supposing that all parameters are independent of each other, the joint prior factorizes into the product of the priors of the individual parameters.


Probit model is hard to be calculated even if all the unknown parameters are given because of the multivariate normal distribution. The Eq (11) can be rewritten into the D-dimensional integrals of the multivariate normal density as follows. (15) (16) where is the random error vector.

Because the covariance matrix is a Hermitian, positive-definite matrix, the integral of the general multivariate normal distribution can be transformed into that of the standard normal distribution via the Cholesky decomposition of the covariance matrix and further substitutions [37, 38]. In this way, the integration domain is transformed so that the m-variate integral becomes an integral over the (m-1)-dimensional hypercube.

Based on Cholesky Decomposition, the covariance matrix can be written as (17) where D is a lower triangular matrix and DT is its conjugate transpose. We set (18) where is a vector substituting random error vector εrs. Then the Eq (11) can be transformed as (19) (20) where is the upper limit of the i-th layer’s integration; di,i is the element in D. We assume (21) where is a vector substituting vector qrs; Φ−1(·) is the inverse cumulative probability function of standard normal distribution, that is vi = Φ(qi). Meanwhile, we suppose vi = wiei. Thus, we get the equation (22) (23) where w = (w1, w2,…,wm) denotes the parameters vector.

When i = k, ei = 1, so the m-dimensional integration reduces to an (m-1)-dimensional integration. The quasi-Monte Carlo method can then be used to calculate the multivariate normal probability given all unknown parameters.

The parameter identification problem

For the constrained multinomial logit model, Castro et al. discussed a parameter identification problem that arises when one attribute appears in both the compensatory and non-compensatory components [23]. This paper avoids the problem by treating the threshold as a function whose value varies with the scale of the OD pair. In this way, the parameters associated with the same attribute in the compensatory and non-compensatory components are identifiable. Another identification problem arises when the threshold parameter is added directly to the location parameter. In this paper, the constraint on the number of transfers suffers from this problem through the sum β+γτ, where β is the threshold parameter and γτ is the location parameter. The two parameters cannot be identified separately, but their sum can be estimated without affecting the other parameters. For simplicity, we set the location parameter to 0 and then obtain the value of the threshold parameter.

Probability calculation based on Monte Carlo simulation

To evaluate the probit probability in Eq (11), the quasi-Monte Carlo method is used. Supposing that every element of w follows the uniform distribution wi ~ U(0, 1) and that the elements of w are independent of each other, we use the Halton sequence to generate the sample points. The vector wj = (w1j,…, wij,…, wmj) contains the values generated by the Halton sequence in the j-th iteration. The approximate solution to Eq (11) given the unknown parameters θ, ω, γ, z, λ is then (24) where E(·) denotes the expected value.
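The idea of combining a Cholesky factor with Halton points can be sketched in pure Python. The sketch below is not the paper's Eqs (19) to (24), which are not reproduced in this text; it assumes the standard sequential-conditioning (separation-of-variables) scheme for multivariate normal probabilities, applied to the utility differences between each other route and the chosen route k. The deterministic utilities `v` are assumed to already include the non-compensatory term, and all names and values are hypothetical.

```python
from statistics import NormalDist

_STD = NormalDist()
_PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23]  # enough for route sets up to 10


def halton(index, base):
    """Radical-inverse value of a positive integer index in the given base."""
    f, r = 1.0, 0.0
    while index > 0:
        f /= base
        r += f * (index % base)
        index //= base
    return r


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][p] * low[j][p] for p in range(j))
            if i == j:
                low[i][j] = (a[i][i] - s) ** 0.5
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def probit_choice_prob(v, cov, k, n_points=2000):
    """Quasi-Monte Carlo estimate of P(route k has the highest utility)."""
    others = [j for j in range(len(v)) if j != k]
    # Covariance of the differences eps_j - eps_k and their upper limits.
    dcov = [[cov[a][b] - cov[a][k] - cov[k][b] + cov[k][k] for b in others]
            for a in others]
    upper = [v[k] - v[j] for j in others]
    low = cholesky(dcov)
    dim = len(others)
    total = 0.0
    for t in range(1, n_points + 1):
        y, prob = [], 1.0
        for i in range(dim):
            shift = sum(low[i][j] * y[j] for j in range(i))
            d = _STD.cdf((upper[i] - shift) / low[i][i])
            prob *= d
            if i < dim - 1:  # the last dimension needs no draw
                p = min(max(halton(t, _PRIMES[i]) * d, 1e-12), 1.0 - 1e-12)
                y.append(_STD.inv_cdf(p))
        total += prob
    return total / n_points


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
p_equal = probit_choice_prob([0.0, 0.0, 0.0], identity, 0)
```

With three routes of equal utility and independent unit-variance errors, the estimate should be close to 1/3, which gives a quick sanity check before using a structured covariance matrix.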

Estimation algorithm

The Metropolis-Hastings (M-H) algorithm [39, 40], one of the MCMC approaches, is widely used to generate samples of parameters from a posterior distribution that cannot be sampled directly. To generate the final samples, candidates are drawn iteratively, and in every iteration a candidate is accepted as the current sample with a certain probability. As the number of iterations increases, the Markov chain converges to a stationary distribution, which is the posterior distribution of the parameters. To improve estimation efficiency, the variable-at-a-time Metropolis sampling scheme [39] is used to generate a candidate for each parameter in the parameter set in turn. The estimation process is organized as follows.

Step 1: Randomly generate initial values for all NP parameters from the pre-defined prior distributions, where NP denotes the number of parameters. Generate the route sets Ars for the OD pairs based on physical length. Set the iteration counter m = 1 and the parameter index i = 1.

Step 2: Draw a candidate from a jumping distribution based on the Gaussian random walk Metropolis sampling method. This method suggests that the jumping distribution is supposed to be a normal distribution which is a symmetric distribution satisfying the equation . Here we set the jumping distribution as , where ξ2 is the proposal variance for the i-th parameter.

Step 3: Calculate the acceptance ratio , where (25) (26) where , referring to Eq (11). The same parts can be canceled out and the ratio is (27)

Step 4: Draw a value u from the uniform distribution U(0, 1). If u ≤ ϑ, ζ(m) = ζ*; otherwise, ζ(m) = ζ(m−1).

Step 5: If i < NP, set i = i+1 and repeat Steps 2 to 4; otherwise, continue to Step 6.

Step 6: If m < M, set m = m+1 and i = 1, and repeat Steps 2 to 5; otherwise, stop sampling.
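The six steps above can be sketched as a variable-at-a-time Gaussian random-walk Metropolis sampler in Python. This is a minimal illustration with uniform priors, under which the acceptance ratio reduces to the likelihood ratio inside the prior bounds. The toy log-likelihood (the mean of normally distributed data) stands in for the CMNP likelihood of Eq (11), and the function name, data and settings are hypothetical.

```python
import math
import random


def variable_at_a_time_metropolis(log_like, bounds, n_iter, step_sd, seed=0):
    """Sample parameters one at a time with a Gaussian random-walk proposal.

    bounds: list of (low, high) pairs defining independent uniform priors.
    """
    rng = random.Random(seed)
    current = [rng.uniform(lo, hi) for lo, hi in bounds]  # Step 1
    current_ll = log_like(current)
    chain = []
    for _ in range(n_iter):  # Step 6 loop over iterations
        for i, (lo, hi) in enumerate(bounds):  # Step 5 loop over parameters
            candidate = list(current)
            candidate[i] = rng.gauss(current[i], step_sd)  # Step 2
            if lo <= candidate[i] <= hi:  # prior density is zero outside
                cand_ll = log_like(candidate)
                # Steps 3 and 4: accept when log(u) <= log acceptance ratio.
                if math.log(1.0 - rng.random()) <= cand_ll - current_ll:
                    current, current_ll = candidate, cand_ll
        chain.append(list(current))
    return chain


# Toy stand-in for the model likelihood: unknown mean, unit variance.
_data_rng = random.Random(1)
data = [_data_rng.gauss(2.0, 1.0) for _ in range(50)]


def toy_log_like(params):
    return -0.5 * sum((x - params[0]) ** 2 for x in data)


chain = variable_at_a_time_metropolis(
    toy_log_like, [(-20.0, 20.0)], n_iter=3000, step_sd=math.sqrt(0.05))
posterior = [s[0] for s in chain[1000:]]  # discard burn-in samples
posterior_mean = sum(posterior) / len(posterior)
```

Working with log-likelihoods avoids numerical underflow when the likelihood is a product over many observations; the burn-in length of 1000 here is arbitrary and chosen only for the toy example.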

Results and discussions


All data are available in the supporting information file (S1_Dataset.).

With new lines opening continually, Guangzhou Metro has become the sixth busiest metro system in the world and the third largest in China. As of July 2014, 8 lines and 136 stations (including 19 transfer stations) were in operation, excluding the APM Line, with an operating length of 256.6 km and a daily ridership of about 6.2 million. The APM Line is not considered because it is a separate system: passengers must swipe their cards again to enter it even when arriving from other lines. The metro system covers the major urban areas of the city, reaches some large suburban areas, and connects Guangzhou and Foshan. Statistical analysis shows that for 3721 OD pairs the route with the shortest travel time is not the route with the minimum number of transfers. For these OD pairs, passengers weighing multiple factors may consider more than one route. The large-scale metro network thus increases the complexity of route choice analysis.

In July 2014, Guangzhou Metro Corporation organized a survey in the metro stations to collect passengers’ travel characteristics, such as respondents’ actual travel routes. Totally, the effective sample size is 14142. Based on the survey data, Fig 3(a) shows the relationship between the difference (namely the threshold minus the shortest travel time) and the shortest travel time. It can be seen that the difference increases logarithmically with the increase of the shortest travel time which demonstrates that the travel time threshold formula in Eq (7) is suitable. By data fitting, when αn = 0.446, the mean absolute percentage error (MAPE) is the minimum 3.657%. Fig 3(b) shows that when the minimum number of transfers is 0, the weighted mean value of the threshold of number of transfers is 1.951, that is, βn can be assumed as 1.951.

Fig 3. The thresholds of travel time and number of transfers.

The figure displays the survey data associated with thresholds. By fitting, the parameters in the thresholds can be estimated.


In the compensatory component, the in-vehicle travel time (, h), number of transfers (, time), transfer time (, h), comfort degree (, 0–1 variable) and revised angular cost (, km) are considered with the corresponding parameters θ1, θ2, θ3, θ4 and θ5, where revised angular cost measures the deviation degree of a route by transforming sin() into tan() and comfort degree represents the congestion level in the train whose value is when average load factor of one route is smaller than 20%, otherwise, . In the non-compensatory component, the travel time (, h) and number of transfers are considered with the threshold parameters α and β, scale parameters ωt and ωm as well as location parameters γt and γm respectively. In case of the parameter identification problem in the threshold of number of transfers, the location parameter is assumed to be 0, that is γm = 0. Moreover, the parameter λ1 and λ2 in the covariance matrix needs to be estimated.

Based on the survey data, the proposed model and the MNP, MNL and CMNL models are estimated. The MNL and CMNL models are estimated by maximum likelihood, while the proposed model and the MNP model are estimated with the approach proposed in this paper. Under the non-informative condition, the prior distributions of all parameters are assumed to be uniform. In total, the parameters to be estimated are θ1, θ2, θ3, θ4, θ5, α, β, ωt, ωm, γt, λ1 and λ2. Considering the expected signs of the parameters, we assume that θ1, θ2, θ3 and θ5 follow U(-20, 0); θ4, α, β, ωm, λ1 and λ2 follow U(0, 20); ωt follows U(60, 200); and γt follows U(-1, 1). The proposal variance ξ2 is 0.05. The surveyed RP data are divided into two parts: 12039 records for estimation and 2103 records for validation. The scheme used to screen the data is described later. With the MCMC approach, we ran 10000 iterations to estimate all parameters in the CMNP model; the first 5000 samples of each parameter are discarded as the burn-in period, and the remaining 5000 effective samples are retained. The distribution of the effective samples for θ1 is shown in Fig 4 as an example. From the samples, we obtain the mean and the 95% Bayesian confidence interval (CI) shown in Table 1, together with the estimates of the MNL, MNP and CMNL models.

Fig 4. The distribution of effective samples for θ1.

The figure exhibits the frequency distribution of the samples to represent the convergence directly.

The Kolmogorov-Smirnov (KS) test is used to determine whether the samples follow a normal distribution. Fig 4 shows the distribution of the effective samples for parameter θ1 as an example. Descriptive statistics show that the mean is -9.074 and the standard deviation is 0.366. The KS test gives a p value of 0.433, which is greater than 0.05, so the hypothesis that the samples follow a normal distribution cannot be rejected at the 5% significance level. Thus, the sampling process is taken to have converged. The other parameters show the same characteristics. As shown in Table 1, the different results for the parameters in the MNP and CMNP models indicate that the MCMC approach can successfully distinguish all parameters even though some of them have the same initial values, and the CI of every parameter supports accepting the means of the drawn samples. The coefficients of in-vehicle time, number of transfers, transfer time and revised angular cost are negative, meaning that the probability of choosing a route decreases as any of these attributes increases. The coefficients of comfort degree are all positive, meaning that greater comfort increases the individual's preference for the route. This is consistent with common sense. Furthermore, the t-values of the coefficients in the MNL and CMNL models exceed 1.96 in absolute value, so the null hypothesis that the true coefficients are zero can be rejected at the 0.05 significance level. The ρ2 values of all models are greater than 0.2, indicating a good fit for all models. The CMNP model has the greatest ρ2, which indicates that the proposed CMNP model fits best among all models.

In addition to the estimation performance, the forecasting performances of all models are compared. In order to gain plenty of actual choice results which are drawn from the surveyed data, the route choices between some similar OD pairs are aggregated. As shown in Fig 5, we combine the origins as an identical origin R as well as the identical destination S, that is, R contains Guangzhou South Railway Station (R1), Shibi (R2), Huijiang (R3), Nanpu (R4), Luoxi (R5), Nanzhou (R6), Dongxiao South (R7) and Jiangtai Road (R8); S contains Jingxi Nanfang Hospital (S1) and Meihuayuan (S2). The transfer stations are Haizhu Square (m1), Gongyuanqian (m2), Jiahewanggang (m3), Yantang (m4), Guangzhou East Railway Station (m5) and Tiyu West Road (m6). Excluding the routes with chosen probabilities smaller than 0.0001, we have four routes left in Table 2 along with the chosen probabilities according to different models. The number of the actual choices between the specific OD pairs in the surveyed data is 2103 and the absolute error is calculated to compare the forecasting performance as shown in Table 2. We can see that CMNP model has the smallest MAE (Mean Absolute Error) which demonstrates that the proposed CMNP model has the best forecasting performance.

Fig 5. A diagram of deleted OD pairs and routes.

The origin stations are denoted as R, the destination stations as S, and the transfer stations as m. Other stations are omitted from the figure. A route that passes through a transfer station involves a transfer at that station.
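The MAE comparison between predicted choice probabilities and observed choice shares can be sketched as follows. All numbers here are made up for illustration: only the total of 2103 observed choices comes from the text, while the split across the four routes, the predicted probabilities and the names model_a and model_b are hypothetical and do not reproduce Table 2.

```python
def mean_absolute_error(predicted, observed):
    """Average absolute gap between predicted and observed route shares."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


# Hypothetical split of the 2103 surveyed choices across four routes.
observed_counts = [1200, 600, 250, 53]
total_choices = sum(observed_counts)
observed_shares = [count / total_choices for count in observed_counts]

# Hypothetical predicted probabilities from two competing models.
predictions = {
    "model_a": [0.50, 0.30, 0.15, 0.05],
    "model_b": [0.56, 0.29, 0.12, 0.03],
}

mae_by_model = {
    name: mean_absolute_error(probs, observed_shares)
    for name, probs in predictions.items()
}
best_model = min(mae_by_model, key=mae_by_model.get)

for name, value in mae_by_model.items():
    print(f"{name}: MAE = {value:.4f}")
print("smallest MAE:", best_model)
```

Because MAE weights every route equally, a model can look good while still missing a rarely chosen route by a large relative margin; a percentage-based error would expose that case.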


The route choice model can be used to predict the transfer flow volume, the section flow volume, and similar quantities, which are the basis for scheduling the train plan, guiding individual travel routes, and so on. The proposed CMNP route choice model determines the route choice probability for every OD pair in the metro network. The flow volume on each route is then the product of that probability and the OD volume. By counting the number of passengers transferring between two different lines based on the train timetable, the transfer flow volume can be calculated. All testing data are provided by Guangzhou Metro Corporation. The results are shown in Fig 6, where the testing data are on the horizontal axis, the predicted data are on the vertical axis, and the solid line is the reference line on which the predicted value equals the testing value. Every point represents the flow volume transferring from one running direction of one line to one running direction of another line. Usually, every line has two running directions. The mean absolute percentage error (MAPE) is 4.91%, which shows that the proposed CMNP model has good application prospects.

Fig 6. The transfer flow volume forecasting performance.

Each point has two values: the testing data on the horizontal axis and the forecast value on the vertical axis.
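The aggregation from route choice probabilities to transfer flow volumes, and the MAPE against observed volumes, can be sketched as below. This is a simplified illustration under stated assumptions: it ignores the timetable-based counting mentioned in the text, and the OD volume, route probabilities, transfer movements, observed volumes and all identifiers (such as L2_up) are hypothetical.

```python
from collections import defaultdict


def transfer_flows(od_volumes, route_probs, route_transfers):
    """Sum route flows (OD volume x choice probability) per transfer movement."""
    flows = defaultdict(float)
    for od, volume in od_volumes.items():
        for route, prob in route_probs[od].items():
            route_flow = volume * prob
            # A movement is (from line/direction, to line/direction).
            for movement in route_transfers[route]:
                flows[movement] += route_flow
    return dict(flows)


def mape(predicted, observed):
    """Mean absolute percentage error over the observed movements, in percent."""
    errors = [abs(predicted[key] - observed[key]) / observed[key] for key in observed]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical inputs: one OD pair with two candidate routes.
od_volumes = {("R", "S"): 1000.0}
route_probs = {("R", "S"): {"route_1": 0.6, "route_2": 0.4}}
route_transfers = {
    "route_1": [("L2_up", "L1_up")],
    "route_2": [("L2_up", "L3_down"), ("L3_down", "L1_up")],
}
# Hypothetical observed transfer volumes for the same movements.
observed = {
    ("L2_up", "L1_up"): 630.0,
    ("L2_up", "L3_down"): 380.0,
    ("L3_down", "L1_up"): 410.0,
}

predicted = transfer_flows(od_volumes, route_probs, route_transfers)
error_percent = mape(predicted, observed)
print(predicted)
print(f"MAPE = {error_percent:.2f}%")
```

Keying each movement by a pair of line directions mirrors the way each point in Fig 6 is defined, so one predicted value lines up with one observed value.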


In a large-scale metro network, the complex nature of the route choice process makes it challenging to identify passengers' actual decision rules exactly. This paper focuses on integrating the impacts of the routes set and the interdependency among alternative routes on route choice probability into route choice modeling in the metro network. The impact of the routes set on route choice probability expresses the semi-compensatory choice process, which combines the routes set generation and route choice stages. Accordingly, a constrained multinomial probit (CMNP) model is proposed, in which the utility function consists of compensatory, non-compensatory and error parts. The compensatory part is a linear function of in-vehicle travel time, number of transfers, transfer time, congestion level and revised angular cost. The non-compensatory part measures the impact of the considered probability of a route on the route's utility through a logarithm function, where the considered probability is calculated by a binary probit equation relating the constrained attributes (e.g. travel time and number of transfers) to their corresponding thresholds. The error part follows a multivariate normal distribution whose covariance is structured into three parts, which measure the correlation among routes, the transfer variance of the route, and the unobserved variance, respectively.
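The systematic part of the utility described above (compensatory term plus the logarithm of the considered probability) can be sketched as follows. This is an assumption-laden illustration rather than the paper's exact specification: the considered probability is taken here as a product of independent binary probit terms, one per constrained attribute, and every coefficient, threshold, scale and attribute value is hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def considered_probability(attributes, thresholds, scales):
    """Probability that a route satisfies every attribute threshold.

    Assumed form: product of probit terms Phi((threshold - value) / scale).
    """
    prob = 1.0
    for name, limit in thresholds.items():
        prob *= STD_NORMAL.cdf((limit - attributes[name]) / scales[name])
    return prob


def systematic_utility(attributes, betas, thresholds, scales):
    """Compensatory linear term plus log of the considered probability."""
    compensatory = sum(betas[name] * attributes[name] for name in betas)
    prob = considered_probability(attributes, thresholds, scales)
    # Guard against log(0) when a route lies far beyond a threshold.
    return compensatory + math.log(max(prob, 1e-300))


# Hypothetical coefficients, thresholds and probit scales.
betas = {"in_vehicle_time": -0.10, "transfers": -0.80, "transfer_time": -0.15}
thresholds = {"in_vehicle_time": 60.0, "transfers": 2.0}
scales = {"in_vehicle_time": 10.0, "transfers": 0.5}

short_route = {"in_vehicle_time": 30.0, "transfers": 1.0, "transfer_time": 5.0}
long_route = {"in_vehicle_time": 75.0, "transfers": 3.0, "transfer_time": 12.0}

for label, route in (("short", short_route), ("long", long_route)):
    prob = considered_probability(route, thresholds, scales)
    utility = systematic_utility(route, betas, thresholds, scales)
    print(f"{label}: considered probability = {prob:.4f}, utility = {utility:.3f}")
```

A route well inside its thresholds has a considered probability near one and is barely penalized, while a route beyond them receives a large negative log term, which is how the non-compensatory screening enters an otherwise compensatory utility.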

With respect to the estimation, because the choice probabilities involve multidimensional integrals of the multivariate normal probability density function, the CMNP model is rewritten in a Bayesian formulation and an MCMC approach is constructed to estimate all parameters. A key step in calculating the acceptance rate is that, given the unknown parameters, the multidimensional integrals of the multivariate normal probability density function can be transformed into integrals of the standard normal distribution by applying a Cholesky decomposition to the covariance matrix, together with other substitutions. The integrals can then be simulated easily with a quasi-Monte Carlo algorithm.

Finally, the proposed model is estimated with the proposed estimation approach using the surveyed RP data from Guangzhou Metro. The estimation results show that every parameter can be distinguished even though some parameters have the same initial values, and the Bayesian CI indicates the reliability of the mean of the samples. Moreover, compared with the MNL, MNP and CMNL models, the proposed CMNP model shows the best forecasting performance in predicting route choice probabilities and transfer flow volumes.
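The sampling and interval steps summarized above can be illustrated with a toy random-walk Metropolis-Hastings sampler. This is only a sketch under stated assumptions: the target is a one-dimensional normal log-density standing in for the CMNP posterior, and the target mean, standard deviation, starting value, step size, burn-in length and function names are all hypothetical.

```python
import math
import random


def metropolis_hastings(log_post, start, step, n_draws, seed=0):
    """Random-walk M-H sampler for a one-dimensional log-posterior."""
    rng = random.Random(seed)
    current, current_lp = start, log_post(start)
    draws = []
    for _ in range(n_draws):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_post(proposal)
        # Symmetric proposal: accept with probability min(1, posterior ratio).
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        draws.append(current)
    return draws


def credible_interval(draws, level=0.95):
    """Equal-tailed interval from the empirical quantiles of the draws."""
    ordered = sorted(draws)
    cut = int((1.0 - level) / 2.0 * len(ordered))
    return ordered[cut], ordered[len(ordered) - 1 - cut]


# Toy target: normal log-density (up to a constant) with assumed parameters.
TARGET_MEAN, TARGET_SD = -9.0, 0.4


def toy_log_posterior(x):
    return -((x - TARGET_MEAN) ** 2) / (2.0 * TARGET_SD ** 2)


chain = metropolis_hastings(toy_log_posterior, start=0.0, step=0.5, n_draws=20000)
kept = chain[2000:]  # discard burn-in draws
posterior_mean = sum(kept) / len(kept)
ci_low, ci_high = credible_interval(kept)
print(f"mean = {posterior_mean:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

Starting the chain far from the target shows why a burn-in period is discarded before the mean and the interval are computed.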

In future work, we will try to estimate the proposed model using smart card data, and travel time reliability will also be incorporated into the model.

Supporting information

S1 Table. Estimations of MNL, CMNL and CMNP models.



Many thanks for helpful suggestions from reviewers and the editor.

Author Contributions

  1. Conceptualization: YZ EY.
  2. Funding acquisition: EY.
  3. Methodology: YZ EY HW KZ.
  4. Writing – original draft: YZ EY HW KZ.
  5. Writing – review & editing: YZ EY HW KZ.


  1. 1. Raveau S, Muñoz J C, De Grange L. A topological route choice model for metro. Transportation Research Part A: Policy and Practice. 2011; 45(2): 138–147.
  2. 2. Zhang Y, Yao E, Dai H. Transfer volume forecasting method for the metro in networking conditions. Journal of the China Railway Society. 2013; 23(11): 1–6.
  3. 3. Simon H A. A behavioral model of rational choice. The Quarterly Journal of Economics. 1955; 69: 99–118.
  4. 4. Tversky A. Elimination by aspects: a theory of choice. Psychological Review. 1972; 79: 281–299.
  5. 5. Bovy P H L. On modelling route choice sets in transportation networks: a synthesis. Transport Reviews. 2009; 29: 43–68.
  6. 6. Yai T, Iwakura S, Morichi S. Multinomial Probit with structured covariance for route choice behavior. Transportation Research Part B: Methodological. 1997; 31(3): 195–207.
  7. 7. Daganzo C F, Sheffi Y. On stochastic models of traffic assignment. Transportation Science. 1977; 11(3): 253–274.
  8. 8. McFadden D. The revealed preferences of government bureaucracy. Bell Journal of Economics Management Science. 1968; 6: 401–416.
  9. 9. Dial R B. A probabilistic multipath traffic assignment model: which obviates path enumeration. Transportation Research. 1971; 5: 83–111.
  10. 10. Ramming M S. Network knowledge and route choice. Massachusetts: Massachusetts Institute of Technology. 2001: 111–212.
  11. 11. Liu S, Yao E, Zhang Y. Personalized route planning algorithm for urban rail transit passengers. Journal of Transportation Systems Engineering and Information Technology. 2014; 14(05): 100–104.
  12. 12. Cascetta E, Nuzzolo A, Russo F, Vitetta A. A modified Logit route choice model overcoming path overlapping problems: specification and some calibration results for interurban networks, 13th International Symposium on Transportation and Traffic Theory. 1996; France.
  13. 13. Ben-Akiva M, Bierlaire M. Discrete choice methods and their applications to short term travel decisions. 1999; Springer, US.
  14. 14. Bekhor S, Ben-Akiva M E, Ramming M S. Evaluation of choice set generation algorithms for route choice models. Annals of Operations Research. 2006; 14(1): 235–247.
  15. 15. Chu C. A paired combinatorial Logit model for travel demand analysis, 5th World Conference on Transport Research, 1989; Japan.
  16. 16. Koppelman F S, Wen C H. The paired combinatorial Logit model: properties, estimation and application. Transportation Research Part B: Methodological. 2000; 34(2): 75–89.
  17. 17. Vovsha P. The cross-nested Logit model: application to mode choice in the Tel-Aviv metropolitan area, 76th Transportation Research Board Annual Meeting. 1997; Washington D. C., USA.
  18. 18. Wen C H, Koppelman F S. The generalized nested Logit model. Transportation Research Part B: Methodological. 2001; 35(7): 627–641.
  19. 19. McFadden D, Train K. Mixed MNL models for discrete response. Journal of Applied Econometrics. 2000; 15(5): 447–470.
  20. 20. Raveau S, Guo Z, Muñozs J C, Wilson N H M. A behavioural comparison of route choice on metro networks: time, transfers, crowding, topology and socio-demographics. Transportation Research Part A: Policy and Practice. 2014; 66: 185–195.
  21. 21. Zhang Y, Yao E, Liu S. A constrained MNL route choice model for metro passengers, 94th Transportation Research Board Annual Meeting. 2015; Washington D. C., USA.
  22. 22. Martínez F, Aguila F, Hurtubia R. The constrained multinomial Logit: a semi-compensatory choice model. Transportation Research Part B: Methodological. 2009; 43(3): 365–377.
  23. 23. Castro M, Martı´nez F, Munizaga M A. Estimation of a constrained multinomial Logit model. Transportation. 2013; 40(3), 563–581.
  24. 24. Cantillo V, Ortu´zar J D. A semi-compensatory discrete choice model with explicit attribute thresholds of perception. Transportation Research Part B: Methodological. 2005; 39(7): 641–657.
  25. 25. Kaplan S, Prato C G. Joint modeling of constrained path enumeration and path choice behaviour: a semi-compensatory approach, European Transport Conference. 2010; Glasgow.
  26. 26. Kaplan S, Prato C G. Closing the gap between behavior and models in route choice: the role of spatiotemporal constraints and latent traits in choice set formation. Transportation Research Part F: Psychology and Behaviour. 2012; 15(1): 9–24.
  27. 27. Manski C. The structure of random utility models. Theory and Decision. 1977; 8(3): 229–254.
  28. 28. Morikawa T. A hybrid probabilistic choice set model with compensatory and non-compensatory choice rules, 7th World Conference on Transport Research. 1995; Pergamon, Oxford.
  29. 29. Rob V N, Hoogendoorn-Lanser S, Koppelman F S. Using choice sets for estimation and prediction in route choice. Transportmetrica. 2008; 4(2), 83–96.
  30. 30. Kaplan S, Shiftan Y, Bekhor S. Development and estimation of a semi-compensatory model with a flexible error structure. Transportation Research Part B: Methodological. 2012; 46(2): 291–304.
  31. 31. Swait J D, Ben-Akiva M E. Incorporating random constraints in discrete models of choice set generation. Transportation Research Part B: Methodological. 1987; 21(2): 91–102.
  32. 32. Bliemer M C J, Bovy P H L. Impact of route choice set on route choice probabilities, 87th Transportation Research Board Annual Meeting. 2008; Washington, D. C., USA.
  33. 33. Cascetta E, Papola A. Random utility models with implicit availability/perception of choice alternatives for the simulation of travel demand. Transportation Research Part C: Emerging Technologies. 2001; 9(4): 249–263.
  34. 34. Swait J D. A non-compensatory choice model incorporating attribute cut-offs. Transportation Research Part B: Methodological. 2001; 35(7): 903–928.
  35. 35. Timothy J G, Greg M A. A choice model with conjunctive, disjunctive, and compensatory screening rules. Marketing Science. 2004; 23(3): 391–406.
  36. 36. Nobile A. A Hybrid Markov Chain for the Bayesian Analysis of the Multinomial Probit Model. Statistics and Computing. 1995; 8: 229–242.
  37. 37. Genz A. Numerical computation of the multivariate normal probabilities. Journal of Computational and Graphical Statistics. 1992; 1: 141–150.
  38. 38. Genz A. Comparison of methods for the computation of multivariate normal probabilities. Computing Science and Statisties. 1993; 25: 400–405.
  39. 39. Metropolis N, Rosenbluth A W, Rosenbluth M N, Teller A H, Teller E. Equation of state calculations by fast computing machines. The Journal of Chemical Physics. 1953; 21 (6): 1087–1092.
  40. 40. Hastings W K. Monte Carlo sampling methods using Markov Chains and their applications. Biometrika. 1970; 57: 97–109.