## Abstract

A semi-nonparametric generalized multinomial logit model, formulated using orthonormal Legendre polynomials to extend the standard Gumbel distribution, is presented in this paper. The resulting semi-nonparametric function can represent a probability density function for a large family of multimodal distributions. The model has a closed-form log-likelihood function that facilitates model estimation. The proposed method is applied to model commute mode choice among four alternatives (auto, transit, bicycle and walk) using travel behavior data from Aargau, Switzerland. Comparisons between the multinomial logit model and the proposed semi-nonparametric model show that violations of the standard Gumbel distribution assumption lead to considerable inconsistency in parameter estimates and model inferences.

**Citation:** Wang K, Ye X, Pendyala RM, Zou Y (2017) On the development of a semi-nonparametric generalized multinomial logit model for travel-related choices. PLoS ONE 12(10): e0186689. https://doi.org/10.1371/journal.pone.0186689

**Editor:** Jun Xu, Beihang University, CHINA

**Received:** June 13, 2017; **Accepted:** October 5, 2017; **Published:** October 26, 2017

**Copyright:** © 2017 Wang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Data Availability:** All relevant data are within the paper and its Supporting Information files.

**Funding:** This research is partially supported by the general project “Study on the Mechanism of Travel Pattern Reconstruction in Mobile Internet Environment” (no. 71671129) and the key project “Research on the Theories for Modernization of Urban Transport Governance” (no. 71734004) from the National Natural Science Foundation of China, http://www.nsfc.gov.cn/publish/portal1/. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

**Competing interests:** The authors have declared that no competing interests exist.

## 1. Introduction

The Gumbel distribution (also referred to as the Type-I extreme value distribution) plays a central role in discrete choice models for travel demand analysis[1]. This can be attributed to two major reasons. First, the Gumbel distribution closely resembles the normal distribution, which is often the preferred distribution to characterize the random disturbance term in an econometric model that accounts for the effect of unobserved factors. Second, when the Gumbel distribution is assumed for random components of utility functions, a closed-form likelihood function is obtained in the context of the application of the microeconomic utility maximization principle. With a closed-form likelihood function, maximum likelihood estimation (MLE) methods can be applied with ease to estimate model coefficients consistently and efficiently. Due to these appealing features of the Gumbel distribution, the Multinomial Logit (MNL) model is widely applied in practice and preferred over its counterpart that is based on the assumption of a normally distributed random error component (i.e., Multinomial Probit or MNP model)[2–4]. In the context of discrete-continuous choice behaviors, the Multiple Discrete-Continuous Extreme Value (MDCEV) model[5–9] developed based on the standard Gumbel distribution has a neat closed-form log-likelihood expression while others based on the normal distribution assumption do not have this feature[10–17].

However, according to the theory of maximum likelihood estimation, the consistency and efficiency of maximum likelihood estimators depend on the validity of the distributional assumption made on the random error term. It is important to ensure that the distributional assumptions on the random error terms are valid when applying the MLE method to estimate model coefficients of a discrete choice model. Methods to test for violations of the normal distribution are currently available in the economic literature[18]. Recently, the authors developed a practical method to test the validity of the distributional assumption on the random disturbance term in an MNL model and obtained significant statistical evidence to reject the standard Gumbel distribution assumption in a very commonly encountered empirical setting dealing with long distance travel mode choice[19]. That finding motivates this particular study which aims to develop and present the formulation for a Semi-nonparametric Generalized Multinomial Logit Model (SGMNL) for travel-related choices. The objective of this study is to generalize the MNL model by relaxing the assumption of a Gumbel distribution using a semi-nonparametric approach, and then demonstrate the efficacy of the approach by applying the generalized model to an empirical setting of travel mode choice. It should be noted that this generalization essentially differs from other extensions of the MNL that have yielded the Nested Logit, Cross-nested Logit, Heteroskedastic Logit or Multinomial Probit models[20]. Those models are generalized extensions that persistently employ the unimodal Gumbel or normal marginal distributions, whereas the proposed semi-nonparametric model presented in this paper allows the marginal error distribution to have multiple modes. 
Thus, the proposed model makes it possible to examine the bias in model coefficients, marginal effects and elasticities that may arise in a discrete choice model when the random components of the utility functions violate a unimodal distributional assumption such as the standard Gumbel distribution.

Discrete choice models are widely used in transportation planning practice to predict travel mode choice behavior; the choice of transport mode has important implications for traffic congestion, energy consumption and air pollution. The study of mode choice behavior and its determinants can help transportation planning professionals design alternatives and implement policies that enhance sustainability, livability, and public health while reducing delays due to congestion. A number of recent studies have examined travel mode choice behavior. For example, Shen et al. (2016) found that proximity to metro stations has a significant positive effect on the choice of rail transit as a primary commuting mode[4]. Ding et al. (2017) applied an integrated structural equation model and discrete choice model to investigate how the built environment affects travel mode choice. In their model system, they account for the mediating effects of car ownership and travel distance, thereby capturing both the direct and indirect effects of built environment attributes on travel mode choice[2]. Ding et al. (2014) proposed a cross-classified multilevel probit model of travel mode choice[21]. Comparisons with a traditional mode choice model not only revealed the effects of residential and workplace location on tour-based commute mode choice behavior, but also revealed the presence of spatial heterogeneity across home location and workplace in mode choice behavior. In this paper, a semi-nonparametric choice modeling method is proposed and applied to model commute mode choice among four alternatives (auto, transit, bicycle and walk) using data from Aargau, Switzerland. The proposed approach is motivated by the desire to offer a more flexible and robust methodological framework for activity-travel behavior analysis.

The remainder of the paper is organized as follows. In Section 2, the literature on semi-nonparametric choice models is reviewed. In Section 3, the orthonormal Legendre polynomial is introduced and then applied to extend the standard Gumbel distribution, thus enabling the development and formulation of the Semi-nonparametric Generalized Multinomial Logit Model (SGMNL). In Section 4, data used for the empirical study is described, and empirical estimation results are presented and discussed. Finally, conclusions and directions for future research are presented in the last section.

## 2. Literature review

Ever since McFadden first proposed the MNL model[22], econometricians have questioned the validity of the distributional assumption on the error term in random utility functions[23]. When a violation of the standard Gumbel distribution assumption is found, alternative modelling approaches may be explored to overcome the ill-effects. Adopting an alternative parametric distribution for random utilities may prove to be a solution; for example, the Weibull or logistic distribution recently proposed in the literature[24, 25] could serve as appropriate distributional assumptions on the random error term. In addition, a generalized multinomial logit model or a discrete-continuous choice model that allows heteroscedastic variance may also prove to be superior to the standard MNL and MDCEV model[26, 27]. However, all of these alternative distributions are unimodal in nature and therefore cannot capture potential multimodalities in random errors.

Concerns about the adverse effects of violations of distributional assumptions on the random error components have motivated the development of semi-parametric and semi-nonparametric choice models. The semi-parametric choice model employs the kernel density method to estimate the distribution of random errors, and therefore does not rely on any parametric distributional assumptions[28–32]. The semi-nonparametric (SNP) choice model, on the other hand, is developed based on a polynomial approximation of a probability density function (PDF) that takes a flexible form[33]. Because the likelihood function has an explicit analytical expression, the SNP choice modeling method appears to be more widely applied in practice than the semi-parametric approach[34–37].

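Similar to a binary probit model, the SNP binary choice model formulation also starts with a random utility (U), which can be expressed as U = V + ε, where "V" is the systematic component and "ε" is the random component. If a dummy variable "y" indicates whether an alternative is chosen or not, then P(y = 1) = P(U > 0) = P(V + ε > 0) = P(ε > −V). The probability density function of "ε" takes the following form:

$$
f(\varepsilon) = \frac{\left(\sum_{i=0}^{K} \alpha_i \varepsilon^{i}\right)^{2} \varphi(\varepsilon)}{\int_{-\infty}^{\infty} \left(\sum_{i=0}^{K} \alpha_i u^{i}\right)^{2} \varphi(u)\,du} \tag{1}
$$

In Eq (1), φ(ε) represents the PDF of the standard normal distribution and is referred to as the "a priori distribution", $\alpha_i$ are the polynomial coefficients, and K is the order of the polynomial. The denominator ensures that $\int_{-\infty}^{\infty} f(\varepsilon)\,d\varepsilon = 1$. Eq (1) can be extended as follows:

$$
f(\varepsilon) = \frac{\sum_{i=0}^{K}\sum_{j=0}^{K} \alpha_i \alpha_j\, \varepsilon^{i+j} \varphi(\varepsilon)}{\sum_{i=0}^{K}\sum_{j=0}^{K} \alpha_i \alpha_j \int_{-\infty}^{\infty} u^{i+j} \varphi(u)\,du} \tag{2}
$$

$$
P(y = 1) = \int_{-V}^{\infty} f(\varepsilon)\,d\varepsilon = \frac{\sum_{i=0}^{K}\sum_{j=0}^{K} \alpha_i \alpha_j \int_{-V}^{\infty} \varepsilon^{i+j} \varphi(\varepsilon)\,d\varepsilon}{\sum_{i=0}^{K}\sum_{j=0}^{K} \alpha_i \alpha_j \int_{-\infty}^{\infty} \varepsilon^{i+j} \varphi(\varepsilon)\,d\varepsilon} \tag{3}
$$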

To evaluate the probability value above, recursion formulas may be applied to derive the indefinite integral of ∫ ε^{i+j}φ(ε)dε. The above SNP choice model is limited to a binary choice situation due to its computational complexity in the context of a multinomial choice situation.
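The recursion mentioned above can be illustrated with a short sketch. It assumes the Gallant-Nychka form of the density, a squared polynomial in ε times the standard normal PDF; the coefficient list `alpha` and all function names are hypothetical. Integration by parts gives the upper-tail recursion $I_n(a) = a^{n-1}\varphi(a) + (n-1)\,I_{n-2}(a)$ for $I_n(a) = \int_a^{\infty}\varepsilon^{n}\varphi(\varepsilon)\,d\varepsilon$.

```python
import math

def phi(e):
    """Standard normal PDF."""
    return math.exp(-0.5 * e * e) / math.sqrt(2.0 * math.pi)

def upper_moment(n, a):
    """Integral of eps**n * phi(eps) from a to +infinity, via the recursion
    I_n(a) = a**(n-1) * phi(a) + (n-1) * I_{n-2}(a)."""
    vals = [0.5 * math.erfc(a / math.sqrt(2.0)), phi(a)]  # I_0 and I_1
    for k in range(2, n + 1):
        vals.append(a ** (k - 1) * phi(a) + (k - 1) * vals[k - 2])
    return vals[n]

def full_moment(n):
    """E[eps**n] for a standard normal: 0 for odd n, (n-1)!! for even n."""
    if n % 2:
        return 0.0
    result = 1.0
    for k in range(n - 1, 0, -2):
        result *= k
    return result

def snp_binary_prob(V, alpha):
    """P(eps > -V) when the density of eps is proportional to
    (sum_i alpha[i] * eps**i)**2 * phi(eps). `alpha` is a hypothetical input."""
    num = den = 0.0
    for i, ai in enumerate(alpha):
        for j, aj in enumerate(alpha):
            num += ai * aj * upper_moment(i + j, -V)
            den += ai * aj * full_moment(i + j)
    return num / den

# With alpha = [1.0] the density is standard normal and the model is a probit.
p_probit = snp_binary_prob(0.0, [1.0])
p_snp = snp_binary_prob(0.3, [1.0, 0.5])
```

With a single coefficient the sketch collapses to the binary probit probability Φ(V); additional coefficients reshape the error density while the probability stays in closed form.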

## 3. Modeling methodology

### 3.1 Extending the standard Gumbel distribution with the orthonormal Legendre polynomial

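Bierens[38] proposed a new polynomial, called the orthonormal Legendre polynomial, for estimating distributions on the unit interval in a semi-nonparametric framework. In the transportation choice modeling literature, this approach has been used to test normal and log-normal distributions of random coefficients in mixed logit models[39]. As per Fosgerau and Bierlaire[39] and Bierens[38], the orthonormal Legendre polynomial may be recursively defined as:

$$
L_0(x) = 1, \qquad L_1(x) = \sqrt{3}\,(2x - 1) \tag{4}
$$

$$
L_n(x) = \frac{\sqrt{4n^2 - 1}}{n}\,(2x - 1)\,L_{n-1}(x) - \frac{(n-1)\sqrt{2n+1}}{n\sqrt{2n-3}}\,L_{n-2}(x) \tag{5}
$$

In Eq (5), $n \ge 2$ and $x \in [0, 1]$. The advantage of using this polynomial is that it ensures

$$
\int_0^1 L_m(x)\,L_n(x)\,dx = \begin{cases} 1, & m = n \\ 0, & m \ne n \end{cases} \tag{6}
$$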
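As a minimal sketch, the recursion can be evaluated directly and the orthonormality property checked numerically. The code assumes the standard form of the recursion for orthonormal Legendre polynomials on [0, 1], with $L_0(x) = 1$ and $L_1(x) = \sqrt{3}(2x - 1)$; the function names are illustrative only.

```python
import math

def legendre_orthonormal(n, x):
    """Evaluate the orthonormal Legendre polynomial L_n at x in [0, 1]."""
    if n == 0:
        return 1.0
    prev2, prev1 = 1.0, math.sqrt(3.0) * (2.0 * x - 1.0)  # L_0 and L_1
    for k in range(2, n + 1):
        lead = math.sqrt(4.0 * k * k - 1.0) / k
        lag = (k - 1.0) * math.sqrt(2.0 * k + 1.0) / (k * math.sqrt(2.0 * k - 3.0))
        prev2, prev1 = prev1, lead * (2.0 * x - 1.0) * prev1 - lag * prev2
    return prev1

def inner_product(m, n, steps=20000):
    """Midpoint-rule approximation of the integral of L_m * L_n over [0, 1]."""
    h = 1.0 / steps
    return h * sum(
        legendre_orthonormal(m, (i + 0.5) * h) * legendre_orthonormal(n, (i + 0.5) * h)
        for i in range(steps)
    )

same = inner_product(2, 2)    # close to 1
cross = inner_product(1, 3)   # close to 0
```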

According to Gallant and Nychka[33], the prior distribution in the semi-nonparametric approach can be a distribution other than the standard normal distribution. In this paper, the orthonormal Legendre polynomial is used to construct a semi-nonparametric (SNP) probability density function that extends the standard Gumbel distribution as follows:

$$
f(x) = g(x)\,\frac{\left[1 + \sum_{k=1}^{K} \delta_k L_k\big(G(x)\big)\right]^{2}}{1 + \sum_{k=1}^{K} \delta_k^{2}} \tag{7}
$$

where g(x) = exp(−e^{−x}) ∙ exp(−x) and G(x) = exp(−e^{−x}) are the PDF and CDF of the standard Gumbel distribution, δ_{k} are scalar parameters and K represents the total number of polynomials. Using Eq (6) and the substitution u = G(x), it can be shown that $\int_{-\infty}^{\infty} f(x)\,dx = 1$. As f(x) is non-negative, it qualifies as a probability density function.
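The following sketch evaluates a density of this kind and confirms numerically that it integrates to one. It assumes the squared-polynomial form $f(x) = g(x)\,[1 + \sum_k \delta_k L_k(G(x))]^2 / (1 + \sum_k \delta_k^2)$; the function names and integration limits are illustrative choices.

```python
import math

def legendre_orthonormal(n, u):
    """Orthonormal Legendre polynomial L_n on [0, 1], by recursion."""
    if n == 0:
        return 1.0
    prev2, prev1 = 1.0, math.sqrt(3.0) * (2.0 * u - 1.0)
    for k in range(2, n + 1):
        lead = math.sqrt(4.0 * k * k - 1.0) / k
        lag = (k - 1.0) * math.sqrt(2.0 * k + 1.0) / (k * math.sqrt(2.0 * k - 3.0))
        prev2, prev1 = prev1, lead * (2.0 * u - 1.0) * prev1 - lag * prev2
    return prev1

def gumbel_pdf(x):
    return math.exp(-x - math.exp(-x))

def gumbel_cdf(x):
    return math.exp(-math.exp(-x))

def snp_gumbel_pdf(x, deltas):
    """SNP extension of the Gumbel density; deltas = [delta_1, ..., delta_K]."""
    u = gumbel_cdf(x)
    poly = 1.0 + sum(d * legendre_orthonormal(k, u) for k, d in enumerate(deltas, start=1))
    return gumbel_pdf(x) * poly * poly / (1.0 + sum(d * d for d in deltas))

def integrate(f, lo=-10.0, hi=30.0, steps=40000):
    """Midpoint rule on [lo, hi]; the tails outside are negligible here."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

total_k1 = integrate(lambda x: snp_gumbel_pdf(x, [2.0]))        # K = 1
total_k2 = integrate(lambda x: snp_gumbel_pdf(x, [2.0, -2.0]))  # K = 2
```

Setting every δ to zero returns the standard Gumbel density, which is the nesting property used later in the likelihood ratio tests.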

Fig 1 compares the semi-nonparametric probability densities when the number of polynomials is 1 (K = 1) and the parameter δ_{1} takes a value of -2, 0, 1 or 2. When δ_{1} is 0, the distribution reduces to a standard Gumbel distribution, as shown by the red curve. When δ_{1} takes a value of -2, 1 or 2, the distributions are bimodal, although the secondary peak in the distribution is rather flat when δ_{1} is equal to -2 or 1.

Fig 2 compares the semi-nonparametric probability densities when the number of polynomials is 2 (K = 2) and two scalar parameters δ_{1} and δ_{2} are involved. With two polynomials, and where the highest power term of “G(x)” increases to 2, the SNP function represented in Eq (7) can generate a more flexible probability density distribution. It can be seen that, when δ_{1} is 2 and δ_{2} is -2, the distribution exhibits two modes with almost equal probability densities. When δ_{1} is 0 and δ_{2} is 2, the distribution shows three modes. It may further be expected that, when the number of polynomials (K) or the highest power term of “G(x)” increases, the SNP function with a flexible form can effectively represent the probability density function for a large family of distributions with multiple modes. Such flexibility allows for a better representation of the distribution of the error term in a random utility function of a choice model, and therefore provides the ability to obtain more consistent estimates of model coefficients.

### 3.2 Simplifying the semi-nonparametric (SNP) probability density function (PDF)

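Following Gallant and Nychka[33], it is possible to employ the SNP PDF in Eq (7) to construct random components in utility functions so that multiple modes may be accommodated in their distributions. Before the choice probability can be derived, the SNP PDF needs to be simplified first. Using Eqs (4) and (5), it is possible to write the polynomial in a general form as:

$$
L_n(x) = \sum_{k=0}^{n} c_{n,k}\,x^{k} \tag{8}
$$

where c_{n,k} is a constant coefficient for the term “x^{k}” in the n^{th} polynomial. When k > n, c_{n,k} = 0. Let $a = 2\sqrt{3}$ and $b = -\sqrt{3}$. Then, L_{0} = 1 and L_{1} = *a*x + *b*. When n ≥ 2, as per Eq (5),

$$
L_n(x) = p_n\,(2x - 1)\sum_{k=0}^{n-1} c_{n-1,k}\,x^{k} - q_n \sum_{k=0}^{n-2} c_{n-2,k}\,x^{k}, \qquad p_n = \frac{\sqrt{4n^2-1}}{n}, \quad q_n = \frac{(n-1)\sqrt{2n+1}}{n\sqrt{2n-3}}
$$

where $p_n$ and $q_n$ are shorthand for the two constants of the recursion.

Since c_{n−2,n−1} = 0, $\sum_{k=0}^{n-2} c_{n-2,k}\,x^{k} = \sum_{k=0}^{n-1} c_{n-2,k}\,x^{k}$.

Then, it is possible to write:

$$
L_n(x) = 2p_n c_{n-1,n-1}\,x^{n} + \sum_{k=1}^{n-1}\left(2p_n c_{n-1,k-1} - p_n c_{n-1,k} - q_n c_{n-2,k}\right)x^{k} - \left(p_n c_{n-1,0} + q_n c_{n-2,0}\right) \tag{9}
$$

$$
c_{n,k} = \begin{cases} -p_n c_{n-1,0} - q_n c_{n-2,0}, & k = 0 \\ 2p_n c_{n-1,k-1} - p_n c_{n-1,k} - q_n c_{n-2,k}, & 1 \le k \le n-1 \\ 2p_n c_{n-1,n-1}, & k = n \end{cases} \tag{10}
$$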

When n = 0 or 1, define c_{0,0} = 1, c_{1,0} = *b*, and c_{1,1} = *a*. For any integer “n” (n ≥ 2), the recursion equations (10) can be applied to compute the coefficients c_{i,j} and all of the c_{i,j} values form a lower triangular matrix, called the “c” matrix in this paper. Table 1 provides an example of such a “c” matrix when “n” reaches 6. With the “c” matrix, the general form of the orthonormal Legendre polynomial (given the “n” value) may be obtained. For example, when n = 4, the fourth row vector of coefficients in the “c” matrix can be extracted to write the polynomial as L_{4}(x) = 3x^{0} − 60x^{1} + 270x^{2} − 420x^{3} + 210x^{4}.
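The coefficient recursion can be sketched as follows. The code assumes the standard recursion for orthonormal Legendre polynomials on [0, 1] and builds the lower triangular matrix of coefficients row by row; the function name is illustrative. The row for n = 4 reproduces the polynomial quoted above.

```python
import math

def legendre_coefficients(n_max):
    """Return c, where c[n][k] is the coefficient of x**k in L_n, n = 0..n_max."""
    c = [[0.0] * (n_max + 1) for _ in range(n_max + 1)]
    c[0][0] = 1.0
    if n_max >= 1:
        c[1][0] = -math.sqrt(3.0)       # b
        c[1][1] = 2.0 * math.sqrt(3.0)  # a
    for n in range(2, n_max + 1):
        p = math.sqrt(4.0 * n * n - 1.0) / n
        q = (n - 1.0) * math.sqrt(2.0 * n + 1.0) / (n * math.sqrt(2.0 * n - 3.0))
        for k in range(n + 1):
            # Multiplying L_{n-1} by 2x shifts its coefficients up by one power.
            shifted = c[n - 1][k - 1] if k >= 1 else 0.0
            c[n][k] = 2.0 * p * shifted - p * c[n - 1][k] - q * c[n - 2][k]
    return c

c_matrix = legendre_coefficients(6)
row_4 = c_matrix[4][:5]  # coefficients of x**0 .. x**4 in L_4
```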

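After the “c” matrix is generated, δ_{0} needs to be defined as 1 and the numerator in the SNP probability density function in Eq (7) can be rewritten as:

$$
\left[\sum_{k=0}^{K} \delta_k L_k\big(G(x)\big)\right]^{2} = \left[\sum_{k=0}^{K} \delta_k \sum_{i=0}^{K} c_{k,i}\,[G(x)]^{i}\right]^{2} = \left[\sum_{i=0}^{K} \left(\sum_{k=0}^{K} \delta_k c_{k,i}\right)[G(x)]^{i}\right]^{2}
$$

Define a “d” vector, where each element $d_i = \sum_{k=0}^{K} \delta_k c_{k,i}$. Since c_{k,i} = 0 when k < i,

$$
d_i = \sum_{k=i}^{K} \delta_k c_{k,i}, \qquad i = 0, 1, \ldots, K \tag{11}
$$

Thus, $\left[\sum_{k=0}^{K} \delta_k L_k\big(G(x)\big)\right]^{2} = \left[\sum_{i=0}^{K} d_i\,[G(x)]^{i}\right]^{2} = \sum_{i=0}^{K}\sum_{j=0}^{K} d_i d_j\,[G(x)]^{i+j}$. The SNP probability density function in Eq (7) may then be rewritten as:

$$
f(x) = g(x)\,\frac{\sum_{i=0}^{K}\sum_{j=0}^{K} d_i d_j\,[G(x)]^{i+j}}{1 + \sum_{k=1}^{K} \delta_k^{2}} \tag{12}
$$

Essentially, the SNP PDF in Eq (7) has been simplified to be:

$$
f(x) = g(x) \sum_{m=0}^{M} \xi_m\,[G(x)]^{m} \tag{13}
$$

where ξ_{m} is a function with respect to parameters δ_{k}, and M (= 2K) is the highest power term of “G(x)” in the formula. The relationship between ξ_{m} and δ_{k} is described by Eqs (11) and (12): ξ_{m} collects all products d_{i}d_{j} with i + j = m, divided by $1 + \sum_{k=1}^{K} \delta_k^{2}$. The cumulative distribution function (CDF) of the extended probability density function may be formulated as:

$$
F(x) = \sum_{m=0}^{M} \frac{\xi_m}{m+1}\,[G(x)]^{m+1} \tag{14}
$$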
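The conversion from δ to ξ can be sketched in a few lines. The code assumes that ξ_m collects the products d_i d_j with i + j = m and absorbs the normalizing constant $1 + \sum_k \delta_k^2$, so that the CDF is a polynomial in G(x); the function names are illustrative.

```python
import math

def legendre_coefficients(n_max):
    """c[n][k] is the coefficient of x**k in the orthonormal Legendre polynomial L_n."""
    c = [[0.0] * (n_max + 1) for _ in range(n_max + 1)]
    c[0][0] = 1.0
    if n_max >= 1:
        c[1][0], c[1][1] = -math.sqrt(3.0), 2.0 * math.sqrt(3.0)
    for n in range(2, n_max + 1):
        p = math.sqrt(4.0 * n * n - 1.0) / n
        q = (n - 1.0) * math.sqrt(2.0 * n + 1.0) / (n * math.sqrt(2.0 * n - 3.0))
        for k in range(n + 1):
            shifted = c[n - 1][k - 1] if k >= 1 else 0.0
            c[n][k] = 2.0 * p * shifted - p * c[n - 1][k] - q * c[n - 2][k]
    return c

def xi_from_delta(deltas):
    """Map [delta_1, ..., delta_K] to [xi_0, ..., xi_M] with M = 2K."""
    K = len(deltas)
    c = legendre_coefficients(K)
    full = [1.0] + list(deltas)  # delta_0 is defined as 1
    d = [sum(full[k] * c[k][i] for k in range(i, K + 1)) for i in range(K + 1)]
    norm = 1.0 + sum(x * x for x in deltas)
    xi = [0.0] * (2 * K + 1)
    for i in range(K + 1):
        for j in range(K + 1):
            xi[i + j] += d[i] * d[j] / norm
    return xi

def snp_cdf(x, xi):
    """CDF implied by the density g(x) * sum_m xi[m] * G(x)**m."""
    G = math.exp(-math.exp(-x))
    return sum(v * G ** (m + 1) / (m + 1) for m, v in enumerate(xi))

xi = xi_from_delta([2.0, -2.0])  # K = 2, so M = 4
```

A useful sanity check is that $\sum_m \xi_m/(m+1) = 1$, which is the CDF evaluated at its upper limit.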

### 3.3 Derivation of choice probabilities and likelihood function

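Suppose there are “J” alternatives in the choice set and their random utility functions are U_{1}, U_{2}, …, U_{J}. Let the utility U_{j} be expressed as the sum of the systematic component V_{j} and the random component ε_{j} (i.e., U_{j} = V_{j} + ε_{j}). Assume that ε_{j} independently follows the extended distribution and its semi-nonparametric PDF and CDF are given as:

$$
f_j(\varepsilon_j) = g(\varepsilon_j) \sum_{m=0}^{M_j} \xi_{j,m}\,[G(\varepsilon_j)]^{m} \tag{15}
$$

$$
F_j(\varepsilon_j) = \sum_{m=0}^{M_j} \frac{\xi_{j,m}}{m+1}\,[G(\varepsilon_j)]^{m+1} \tag{16}
$$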

The subscript “j” is added to allow ε_{j} in various random utilities to have different SNP distributions. In addition, three Lemmas, whose proofs are furnished in S1 Appendix, are used in the subsequent derivation of choice probabilities. Based on the utility maximization principle, $y = \arg\max_{j} U_j$, where “y” is a categorical choice variable indicating the specific alternative that is chosen. Then, P(y = 1) = P(ε_{2} < V_{12} + ε_{1},ε_{3} < V_{13} + ε_{1},…,ε_{J} < V_{1J} + ε_{1}), where V_{ij} = V_{i} − V_{j}.

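According to Lemma 1 in S1 Appendix, [G(ε)]^{m} = G[ε − ln(m)], where m > 0. Thus, noting that g(ε) = e^{−ε}G(ε),

$$
P(y = 1) = \int_{-\infty}^{\infty} f_1(\varepsilon_1) \prod_{j=2}^{J} F_j(V_{1j} + \varepsilon_1)\,d\varepsilon_1 = \sum_{m_1=0}^{M_1} \cdots \sum_{m_J=0}^{M_J} \xi_{1,m_1} \prod_{j=2}^{J} \frac{\xi_{j,m_j}}{m_j+1} \int_{-\infty}^{\infty} e^{-\varepsilon_1}\,G[\varepsilon_1 - \ln(m_1+1)] \prod_{j=2}^{J} G[\varepsilon_1 + V_{1j} - \ln(m_j+1)]\,d\varepsilon_1
$$

Let the integral part in the formula be defined as "Int", i.e.,

$$
\mathrm{Int} = \int_{-\infty}^{\infty} e^{-\varepsilon_1}\,G[\varepsilon_1 - \ln(m_1+1)] \prod_{j=2}^{J} G[\varepsilon_1 + V_{1j} - \ln(m_j+1)]\,d\varepsilon_1
$$

According to Lemma 2 in S1 Appendix, $G[\varepsilon_1 - \ln(m_1+1)] \prod_{j=2}^{J} G[\varepsilon_1 + V_{1j} - \ln(m_j+1)] = G(\varepsilon_1 - W)$,

where $W = \ln\left[(m_1+1) + \sum_{j=2}^{J} (m_j+1)\,e^{-V_{1j}}\right]$. Then,

$\mathrm{Int} = \int_{-\infty}^{\infty} e^{-\varepsilon_1}\,G(\varepsilon_1 - W)\,d\varepsilon_1$. According to Lemma 3 in S1 Appendix,

$$
\mathrm{Int} = e^{-W} = \frac{e^{V_1}}{\sum_{j=1}^{J} (m_j+1)\,e^{V_j}}
$$

By substituting "Int" into the choice probability expression, an elegant closed-form equation for the choice probability may be obtained:

$$
P(y = 1) = \sum_{m_1=0}^{M_1} \cdots \sum_{m_J=0}^{M_J} \left[\prod_{j=1}^{J} \frac{\xi_{j,m_j}}{m_j+1}\right] \frac{(m_1+1)\,e^{V_1}}{\sum_{j=1}^{J} (m_j+1)\,e^{V_j}} \tag{17}
$$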

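The derivation above is shown for the case when y = 1, but can be generalized to the situation where y = k. In general,

$$
P(y = k) = \sum_{m_1=0}^{M_1} \cdots \sum_{m_J=0}^{M_J} \left[\prod_{j=1}^{J} \frac{\xi_{j,m_j}}{m_j+1}\right] \frac{(m_k+1)\,e^{V_k}}{\sum_{j=1}^{J} (m_j+1)\,e^{V_j}} \tag{18}
$$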

The log-likelihood function over the entire sample may be formulated as:

$$
LL = \sum_{i=1}^{N} \sum_{k=1}^{J} I(y_i = k)\,\ln P(y_i = k) \tag{19}
$$

where I() is an indicator function; the subscript “i” is the index for an observed choice in the sample and “N” is the sample size. The log-likelihood function can be maximized to estimate model coefficients in the systematic component V_{j} as well as parameters in the vector δ_{j} that have been incorporated into $\xi_{j,m_j}$. When all M_{j} = 0, $\xi_{j,0} = 1$ and the model reduces to the familiar MNL model. Thus, the proposed model may be considered a generalized multinomial logit model based on a semi-nonparametric approach.
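A minimal sketch of the choice probabilities and log-likelihood follows. It assumes each error term has density $g(\varepsilon)\sum_m \xi_{j,m}[G(\varepsilon)]^m$, in which case integrating out the errors gives a signed mixture of logit probabilities, $P(y=k) = \sum_{m_1,\ldots,m_J} \big[\prod_j \xi_{j,m_j}/(m_j+1)\big]\,(m_k+1)e^{V_k} / \sum_j (m_j+1)e^{V_j}$. The function names and the input layout are hypothetical.

```python
import itertools
import math

def sgmnl_probabilities(V, xi):
    """Choice probabilities for one observation.

    V  : list of systematic utilities, one per alternative.
    xi : list of xi vectors, one per alternative; [1.0] means a standard
         Gumbel error for that alternative.
    """
    J = len(V)
    probs = [0.0] * J
    # Enumerate every combination (m_1, ..., m_J) of polynomial powers.
    for m in itertools.product(*(range(len(x)) for x in xi)):
        weight = 1.0
        for j in range(J):
            weight *= xi[j][m[j]] / (m[j] + 1)
        terms = [(m[j] + 1) * math.exp(V[j]) for j in range(J)]
        denom = sum(terms)
        for k in range(J):
            probs[k] += weight * terms[k] / denom
    return probs

def log_likelihood(V_rows, choices, xi):
    """Sum of log probabilities of the observed (0-based) choices."""
    return sum(math.log(sgmnl_probabilities(V, xi)[y]) for V, y in zip(V_rows, choices))

# With xi = [1.0] for every alternative the model is the ordinary MNL.
V = [0.2, -0.1, 0.4]
p_mnl = sgmnl_probabilities(V, [[1.0], [1.0], [1.0]])
```

The number of terms grows as the product of (M_j + 1) over alternatives, so the enumeration stays cheap for the small K values used in this study.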

## 4. Data and empirical estimation results

### 4.1 Data and modeling procedure

Data for the empirical study is extracted from the 2000 Swiss Microcensus travel survey. A sample consisting of 2,756 commuting trips reported by residents of Aargau Canton in Switzerland is used in this study to estimate models for commute mode choice. Four major commute modes are considered and defined as auto, transit, bicycle and walk. The sample market shares for these four alternatives show that the Aargau Canton of Switzerland depicts a multimodal transportation environment, where 57.62% of commuting trips are made by private auto and the remaining 42.38% of commuting trips are made by transit or non-motorized travel modes. In particular, the transit mode share is 15.86%, the bicycle mode share is 8.31%, and the walk mode share is 18.21%. The mode shares offer a sufficient number of observations in each travel mode, thus supporting the estimation of a mode choice model with multiple alternatives. In addition, multimodal network skim (level of service) data and commuters’ demographic and socioeconomic attributes are incorporated in the mode choice model specification.

The modeling effort started with the estimation of a simple MNL model of mode choice. Model estimation results are presented in the first part of Table 2. Both level of service (LOS) attributes and commuters’ demographic and socioeconomic attributes are included as explanatory variables in the utility functions. Travel times, including auto in-vehicle time, transit in-vehicle time, and bicycle and walk times, exhibit significantly negative coefficients in the respective utility functions. Transit service frequency takes a significantly positive coefficient, indicating that a high service frequency would increase the propensity of commuters to use transit. Model coefficients associated with demographic and socioeconomic attributes show that female commuters are less likely to use auto and bicycle modes. Low-income commuters are more likely to use transit or bicycle modes, while high-income commuters are less likely to use the transit mode. Commuters with a lower education level are less likely to use auto than those with a higher education level. Older commuters are less likely to use public transit. All of the estimation results are behaviorally intuitive and consistent with expectations. The model’s log-likelihood value at convergence is -2495.646, corresponding to an adjusted likelihood ratio index of 0.1923 for the overall goodness-of-fit measure of the model.

Next, the proposed SGMNL (semi-nonparametric generalized multinomial logit) model is estimated to relax the standard Gumbel distribution for random components in modal utility functions. First, consider the specification in which K_{j} is set at 1, where “K” is the number of polynomials in Eq (7) and “j” is an index for travel mode (i.e., j = 1, 2, 3 or 4). When K_{1} = 1, it is found that the log-likelihood value improves from -2495.646 to -2488.037. As the current model nests the original MNL model, the likelihood ratio chi-square test may be applied to show that the improvement is statistically significant [i.e., (2495.646–2488.037) ×2 = 15.22 > 3.84, the critical chi-square value for one degree of freedom at a 95% confidence level]. This result strongly rejects the assumption of a standard Gumbel distribution for the random component in the auto utility function.

Model estimation results are presented in the second part of Table 2 and denoted as “SGMNL-11”. In this model, the signs of explanatory variable coefficients do not change from those obtained in the standard MNL model, but the magnitudes of coefficients in the auto utility function are found to differ. As expected, the alternative specific constant in the auto utility function changes substantially from -0.0919 to 0.9242 because the expectation of the new SNP distribution is very different from the expectation of the standard Gumbel distribution (Euler constant ≈ 0.577), and the alternative specific constant reflects this difference. An interesting finding is that the significance level of the single coefficient δ_{1,1} (as indicated by the t-statistic) is not as strong as that implied by the χ^{2} test for the overall model fit. However, it should be noted that the likelihood ratio test should be applied to determine whether a semi-nonparametric choice model form is more appropriate because the significance of multiple coefficients, and their contribution to overall goodness-of-fit, needs to be tested in most occasions.

After one significant coefficient δ_{1,1} is found for the first utility function, K_{2} in the second utility function is then set to 1 and δ_{2,1} is estimated. Estimation results for this model, denoted as “SGMNL-21”, are presented in the third part of Table 2. It can be seen that, after δ_{2,1} is introduced in the model specification, δ_{1,1} becomes insignificant but δ_{2,1} becomes highly significant as indicated by the t-statistics. The likelihood ratio test indicates that the model “SGMNL-21” with additional coefficient δ_{2,1} is significantly better than the model “SGMNL-11”, which does not include parameter δ_{2,1} [(2488.037–2472.741) × 2 ≈ 30.59 > 3.84]. The likelihood ratio test also shows that “SGMNL-21” is significantly better than the regular MNL model specification [(2495.646–2472.741) × 2 ≈ 45.81 > 5.99, the critical χ^{2} value corresponding to two degrees of freedom at a 95% confidence level]. Given that both “SGMNL-11” and “SGMNL-21” performed significantly better than the regular MNL model, both δ_{1,1} and δ_{2,1} should be retained in the SNP model. A comparison of coefficient estimates shows considerable differences across the “SGMNL-21”, “SGMNL-11”, and “MNL” models, particularly for the transit utility functions. This is consistent with the notion that the introduction of δ_{1,1} and δ_{2,1} will change the expectation and standard deviation of random components; both alternative specific constants and coefficients of explanatory variables change accordingly.
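The likelihood ratio comparisons reported above can be reproduced from the stated log-likelihood values. The sketch below uses the closed-form chi-square survival functions for one and two degrees of freedom, so no statistics library is needed; the function name is illustrative.

```python
import math

def lr_test(ll_restricted, ll_unrestricted, df):
    """Likelihood ratio statistic and its chi-square p-value for df = 1 or 2."""
    stat = 2.0 * (ll_unrestricted - ll_restricted)
    if df == 1:
        p_value = math.erfc(math.sqrt(stat / 2.0))  # P(chi2_1 > stat)
    elif df == 2:
        p_value = math.exp(-stat / 2.0)             # P(chi2_2 > stat)
    else:
        raise ValueError("only df = 1 or 2 is handled in this sketch")
    return stat, p_value

LL_MNL, LL_11, LL_21 = -2495.646, -2488.037, -2472.741

stat_11_vs_mnl, p_11_vs_mnl = lr_test(LL_MNL, LL_11, df=1)  # about 15.22
stat_21_vs_11, p_21_vs_11 = lr_test(LL_11, LL_21, df=1)     # about 30.59
stat_21_vs_mnl, p_21_vs_mnl = lr_test(LL_MNL, LL_21, df=2)  # about 45.81
```

All three p-values fall far below 0.05, in line with the comparisons against the critical values of 3.84 and 5.99.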

When δ_{3,1} or δ_{4,1} for bicycle and walk modes are introduced, no significant improvement is observed. In the interest of brevity, those estimation results are not presented here. The modeling effort now moves to the second stage, where the “K” value is increased to 2 and the coefficients δ_{1,2}, δ_{2,2}, δ_{3,2} and δ_{4,2} are introduced into the model one by one. In this stage, it is found that only the introduction of δ_{2,2} in the transit utility function significantly improves the overall model fit (χ^{2} test value = 6.57 > 3.84) while all other δ values do not. A final model estimation effort is performed, in which the “K” value is increased to 3 and parameter δ_{2,3} is introduced in the model. The maximum likelihood estimation procedure fails to converge, indicating that the sample of 2,756 choice observations may not be sufficient to support model estimation where the “K” value is increased to 3. Thus, the final best model is considered to be that which adopts a “K” value of 2 and introduces parameter δ_{2,2}, in addition to parameters δ_{1,1} and δ_{2,1} introduced in “SGMNL-21”. This final model is designated “SGMNL-22”. If its model coefficients are compared with those in “SGMNL-21”, there is no substantial difference observed, except for the alternative specific constant and the coefficient associated with the “high-income” dummy variable in the transit utility function. As this is considered the final model, all subsequent comparisons are conducted between the MNL model and the final “SGMNL-22” model.

### 4.2 Plotting probability density distributions of random components in the “SGMNL-22” model

Fig 3 depicts the probability density distributions of random components in the “SGMNL-22” model. Eqs (11) and (12) are used to convert the estimated δ values to ξ values and then Eq (13) is used to compute the probability densities based on ξ values. The green curve represents the standard Gumbel distribution for random components in bicycle and walk mode utility functions (i.e., e3 and e4 in Fig 3). The blue curve represents the distribution of the random component in the auto utility function. The coefficient δ_{1,1} not only reduces the variance of the distribution of the random component but also shifts its mode towards the negative side by about 0.6 units. This helps explain why the alternative specific constant in the auto utility of the “SGMNL-22” model is substantially more positive than that in the MNL model. The positive alternative specific constant offsets the negative expectation of the new random component. The lower variance of the error distribution for the auto utility may be due to the existence of fewer unspecified or unobserved random factors associated with auto mode choice than with other mode choices. The distribution of the random component in the transit utility function (i.e., e2) presents an interesting pattern in the context of this study. With the inclusion of parameters δ_{2,1} and δ_{2,2} in the model (both of which are significant), “e2” depicts a bimodal distribution as shown by the red curve. The major mode on the right side is located near 0.6 and the minor one on the left side is near -1.2 on the coordinate axis. Based on this finding, it may be conjectured that there are two key groups of commuters mixed in the sample. One group of commuters has a positive attitude and inclination towards using transit and is associated with the major mode of the distribution. Meanwhile, a smaller group of commuters has a negative attitude towards transit and comprises the distribution near the minor mode. 
Although the exact source of the bimodal distribution is uncertain, the proposed SNP modeling method depicts the existence of such a phenomenon and exposes the potential limitation of using conventional MNL choice models that are based on unimodal distributional assumptions. Capturing the bimodal distribution in the choice model can help realize more consistent coefficient estimates and reliable policy sensitivities.

### 4.3 A comparison of aggregate marginal effects and elasticities

Coefficients in choice models usually do not directly reflect the impact of an explanatory variable on choice probabilities, particularly when the standard deviations of random components are scaled up or down, as in the transit or auto utility in the SGMNL model estimated in this study. To better understand differences in model sensitivity between MNL and SGMNL models, marginal effects and elasticities are computed and compared. In this subsection, aggregate marginal effects (*AME*) and aggregate elasticities (*AE*) with respect to level of service (LOS) variables are computed based on the following two equations:
(20)
(21)

In the above equations, “P” represents the choice probability expression of the MNL or SGMNL model. “x_{i}” represents a vector of explanatory variables except the one (i.e., z_{i}) whose marginal effect or elasticity is being computed. “Δ” takes a value of 0.01 in this study as it is found that such a small interval provides sufficiently accurate estimates for “*AME*” and “*AE*” in both MNL and SGMNL models. Table 3 presents a comparison of computed “*AME*” and “*AE*” values between MNL and SGMNL-22 models. Relative differences in “*AME*” and “*AE*” are found to be considerable, which validates the notion that maximum likelihood estimators are inconsistent when distributional assumptions are violated. Such differences have important policy implications for transportation planning and management. For example, suppose a transportation authority intends to shift commuters from the auto mode to the transit mode by increasing transit service frequency. In predicting the number of commute drivers who will shift from auto to transit in response to the transit improvement, the conventional MNL model underestimates the elasticity with respect to transit service frequency by 25% (-0.082 vs -0.110).
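A finite-difference calculation of this kind can be sketched as follows. The exact forms used here are assumptions and may differ from Eqs (20) and (21): the marginal effect adds an absolute increment Δ to the variable and averages the change in probability per unit, while the elasticity scales the variable by (1 + Δ) and compares predicted shares. The toy binary logit and its data are hypothetical.

```python
import math

def aggregate_effects(prob_fn, rows, var, alt, delta=0.01):
    """Finite-difference aggregate marginal effect and aggregate elasticity.

    prob_fn(row) returns the list of choice probabilities for one observation;
    rows is a list of dicts of explanatory variables (hypothetical layout).
    """
    base = [prob_fn(r)[alt] for r in rows]
    # Marginal effect: absolute increment delta, averaged over the sample.
    plus = [prob_fn({**r, var: r[var] + delta})[alt] for r in rows]
    ame = sum((p1 - p0) / delta for p0, p1 in zip(base, plus)) / len(rows)
    # Elasticity: proportional increment, relative change in predicted share.
    scaled = [prob_fn({**r, var: r[var] * (1.0 + delta)})[alt] for r in rows]
    ae = (sum(scaled) - sum(base)) / (delta * sum(base))
    return ame, ae

def toy_binary_logit(row, beta=0.5):
    """Hypothetical two-alternative logit used only to exercise the function."""
    p = 1.0 / (1.0 + math.exp(-beta * row["z"]))
    return [p, 1.0 - p]

rows = [{"z": 1.0}, {"z": 2.0}, {"z": 3.0}]
ame, ae = aggregate_effects(toy_binary_logit, rows, var="z", alt=0)
```

Because `prob_fn` is passed in, the same routine can be applied to MNL and SGMNL probability functions, which is what makes a like-for-like comparison of sensitivities possible.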

### 4.4 A comparison of disaggregate marginal effects and elasticities

The “*AME*” and “*AE*” values presented in the previous subsection measure sample sensitivity to explanatory variables at the aggregate level and show how a level of service (LOS) variable, for example, affects market shares of alternatives based on the assumption that the sample is randomly drawn and can therefore represent the population shares well. However, aggregate measures of effects mask an important difference between MNL and SGMNL models. The MNL model has the IIA (Independence of Irrelevant Alternatives) property while the SGMNL model does not have this property. In order to illustrate this important difference between the two models, disaggregate marginal effects and elasticities are computed and compared for a specific individual commuter who is a 40-year-old male with medium-level income and education level above middle school. The multimodal transportation level of service variables for this individual’s commute are as follows: auto in-vehicle time is 5 minutes; transit in-vehicle time is 8 minutes; transit service frequency is 6 times per hour; bicycle travel time is 12 minutes; and walk travel time is 35 minutes. Given these input variables for this specific commuter, both MNL and SGMNL-22 models are applied to compute choice probabilities of alternative travel modes. Results are shown in Table 4. There is a substantial difference in the choice probability of transit mode between the two models. The computations show that the MNL model returns a transit choice probability that is higher than that provided by the SGMNL-22 model by 41.8%, presumably because the MNL model does not capture and reflect the bimodal distribution of the random component in the transit utility function.

Table 4 also presents a comparison of predicted means of market shares (i.e., $\frac{1}{N}\sum_{i=1}^{N} P_{ij}$, where $P_{ij}$ is the predicted probability that commuter i chooses mode j) over the entire sample. An appealing property of the MNL model is that it can replicate the observed sample shares perfectly using alternative specific constants in utility functions [1]. The SGMNL model does not have this feature, but the greatest difference occurs in the transit share where the relative difference is found to be only 1.5%, which is quite reasonable and acceptable.

The IIA property, which is a key feature of the MNL model, also manifests in the form of equal cross-elasticities [40]. Formulations similar to those expressed in Eqs (20) and (21) are applied to compute disaggregate marginal effects and elasticities with respect to LOS variables. The only difference is that the equations are applied to the specific individual commuter as opposed to all of the commuters in the sample. Results of the computations are presented in Table 5.

It can be seen that cross-elasticities are equal in the MNL model, which reflects its IIA property. In the SGMNL model, by contrast, the variances of the auto and transit utilities are unequal, so the cross-elasticities for auto and transit choice probabilities are not equal, demonstrating that the SGMNL model does not possess the IIA property. Because the random components in the bicycle and walk utilities have equal variance, however, cross-elasticities for these two alternatives are still equal, and the IIA property therefore holds between the bicycle and walk modes even in the SGMNL model. This is similar to the situation where two alternatives belong to the same nest in a nested logit model.
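
The equality of cross-elasticities in the MNL model follows directly from the logit formula. The short derivation below is a sketch in generic notation that may differ from the symbols used in Eqs (20) and (21): it assumes a linear-in-parameters systematic utility $V_j = \boldsymbol{\beta}'\mathbf{x}_j$, with $x_{jk}$ denoting the $k$-th attribute of alternative $j$ and $\beta_k$ its coefficient.

```latex
% MNL choice probability
P_i = \frac{\exp(V_i)}{\sum_{l} \exp(V_l)}, \qquad V_j = \boldsymbol{\beta}'\mathbf{x}_j

% Cross marginal effect of attribute x_{jk} on P_i, for i \neq j
\frac{\partial P_i}{\partial x_{jk}} = -\beta_k \, P_i \, P_j

% Cross-elasticity of P_i with respect to x_{jk}, for i \neq j
E^{P_i}_{x_{jk}} = \frac{\partial P_i}{\partial x_{jk}} \cdot \frac{x_{jk}}{P_i}
                 = -\beta_k \, x_{jk} \, P_j
```

The final expression does not depend on $i$, so a change in an attribute of alternative $j$ produces the same percentage change in the choice probability of every other alternative. This is the proportional substitution pattern implied by IIA.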

### 4.5 A comparison of changes in transit choice probability in response to a service frequency improvement

To further illustrate the policy implications of alternative model forms, changes in transit choice probability predicted by the two models in response to a service frequency improvement are compared for the specific individual commuter considered previously. The result of this comparison is presented in Fig 4. Relative to the SGMNL model, the MNL model overestimates the transit choice probability when the service frequency is low (<18 per hour) but underestimates it when the service frequency is high (≥18 per hour). A service frequency of 18 transit vehicles per hour is quite high, reflecting a headway of just over three minutes. Given that most real-world transit services operate at frequencies less than 18 vehicles per hour, it appears that the MNL model is likely to overestimate the transit choice probability relative to the SGMNL model. In this particular example, when the service frequency is very low (≤4 per hour), the relative difference between the predicted transit choice probabilities computed from the MNL and SGMNL models can exceed 50%.
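
The two quantities used in this comparison, headway and relative difference, are simple to compute. The sketch below uses hypothetical helper names and illustrative probability values that are not read from Fig 4. It also assumes that the relative difference is taken with the SGMNL prediction as the baseline.

```python
def headway_minutes(vehicles_per_hour):
    """Average headway in minutes implied by a service frequency."""
    if vehicles_per_hour <= 0:
        raise ValueError("service frequency must be positive")
    return 60.0 / vehicles_per_hour


def relative_difference(p_model, p_reference):
    """Relative difference of p_model with respect to p_reference."""
    return (p_model - p_reference) / p_reference


# Threshold frequency discussed in the text: 18 vehicles per hour.
print(round(headway_minutes(18), 2))  # 3.33 minutes

# ILLUSTRATIVE probabilities only (assumptions, not values from Fig 4).
p_mnl, p_sgmnl = 0.15, 0.10
print(f"{relative_difference(p_mnl, p_sgmnl):.0%}")
```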

## 5. Conclusions

In this paper, a semi-nonparametric generalized multinomial logit (SGMNL) model is formulated and developed by applying orthonormal Legendre polynomials to extend the standard Gumbel distribution that lies at the core of multinomial logit models applied in practice. The semi-nonparametric function with flexible forms can represent a probability density function for a large family of multimodal distributions. Unlike the existing semi-nonparametric modeling method which is applied to binary choice situations in the econometric literature, the proposed method allows for modeling multinomial choices, which are typically encountered in travel-related choice behavior analysis and travel demand modeling. The advantage of the proposed method is that the formulation results in a closed-form likelihood function and standard maximum likelihood estimation methods can be applied for parameter estimation. Thus, the model estimation procedure is computationally efficient and free from simulation-based complexity or errors.

The proposed modeling method is applied to an empirical setting of commute travel mode choice among four alternatives (auto, transit, bicycle and walk), based on travel survey and network skim (level of service) data from the Canton of Aargau in Switzerland. It is found that the distribution of the random component in the auto utility function is similar to a Gumbel distribution, but has substantially smaller variance. More notably, the random component in the transit utility function follows a bimodal distribution, which indicates a significant departure from and violation of the assumption of a Gumbel distribution. Unequal variances accommodated in the formulation allow the semi-nonparametric model to be free of the limitations of the IIA property that are inherent to the multinomial logit model. The semi-nonparametric model specifications are found to offer superior goodness-of-fit when compared with the MNL model. The violation of the standard Gumbel distribution assumption in the multinomial logit model leads to inconsistent coefficient estimates, marginal effects, elasticities and choice probabilities. In the empirical context considered in this study, the multinomial logit model is found to overestimate the predicted transit choice probability relative to the semi-nonparametric model for transit service scenarios commonly encountered in the real world.

A few limitations of the proposed method and directions for future research are worthy of note. First, it may be challenging to directly apply the proposed method to model choice behaviors in the context of a large choice set (e.g. [41]). The likelihood function, depicted in Eq (18), involves multiple levels of summations and the number of levels is dependent on the number of alternatives in the choice set. Thus, the computational complexity will increase geometrically with an increase in the number of alternatives in the choice set. Future research should focus on reducing computational complexity in the context of large choice sets. Second, the proposed model is developed based on the assumption that random components in utility functions are mutually independent. However, this assumption may not hold in empirical settings. In future research, there may be the potential to introduce correlations in joint semi-nonparametric distributions and develop nested or cross-nested versions of the proposed semi-nonparametric multinomial choice model. Third, it is uncertain whether the empirical results of this study, in which the random component of the transit utility is found to follow a bimodal distribution, are valid in different geographical and modal contexts. Conducting studies similar to this one in different contexts would help shed light on the generalizability of results reported in this paper.

## Supporting information

### S1 Fig. Comparisons of semi-nonparametric probability densities when K = 1.

https://doi.org/10.1371/journal.pone.0186689.s003

(TIF)

### S2 Fig. Comparisons of semi-nonparametric probability densities when K = 2.

https://doi.org/10.1371/journal.pone.0186689.s004

(TIF)

### S3 Fig. Probability density distributions of random components in the “SGMNL-22” model.

https://doi.org/10.1371/journal.pone.0186689.s005

(TIF)

### S4 Fig. Transit choice probability for a specific commuter in response to an improvement in service frequency.

https://doi.org/10.1371/journal.pone.0186689.s006

(TIF)

### S2 Table. Model estimation results of MNL, SGMNL-11, SGMNL-21 and SGMNL-22.

https://doi.org/10.1371/journal.pone.0186689.s008

(TIF)

### S3 Table. Comparisons of aggregate marginal effects (AME) and elasticities (AE).

https://doi.org/10.1371/journal.pone.0186689.s009

(TIF)

### S4 Table. Comparisons of market shares and individual choice probabilities.

https://doi.org/10.1371/journal.pone.0186689.s010

(TIF)

### S5 Table. Comparisons of disaggregate marginal effects and elasticities.

https://doi.org/10.1371/journal.pone.0186689.s011

(TIF)

## References

- 1. Ben-Akiva M, Lerman S. Discrete choice analysis: theory and application to travel demand. Cambridge, Massachusetts, U.S.: The MIT Press; 1985.
- 2. Ding C, Wang D, Liu C, Zhang Y, Yang J. Exploring the influence of built environment on travel mode choice considering the mediating effects of car ownership and travel distance. Transp Res Part A Policy Pract. 2017;100:65–80.
- 3. Ding C, Wu X, Yu G, Wang Y. A gradient boosting logit model to investigate driver’s stop-or-run behavior at signalized intersections using high-resolution traffic data. Transp Res Part C Emerg Technol. 2016;72:225–38.
- 4. Shen Q, Chen P, Pan H. Factors affecting car ownership and mode choice in rail transit-supported suburbs of a large Chinese city. Transp Res Part A Policy Pract. 2016;94:31–44.
- 5. Bhat CR. A multiple discrete–continuous extreme value model: formulation and application to discretionary time-use decisions. Transportation Research Part B: Methodological. 2005;39(8):679–707.
- 6. Bhat CR. The multiple discrete-continuous extreme value (MDCEV) model: role of utility function parameters, identification considerations, and model extensions. Transportation Research Part B: Methodological. 2008;42(3):274–303.
- 7. Garikapati V, You D, Pendyala R, Vovsha P, Livshits V, Jeon K. Multiple discrete-continuous model of activity participation and time allocation for home-based work tours. Transp Res Rec. 2014;(2429):90–8.
- 8. Jäggi B, Weis C, Axhausen KW. Stated response and multiple discrete-continuous choice models: Analyses of residuals. Journal of choice modelling. 2013;6:44–59.
- 9. Pinjari AR, Bhat CR. Computationally efficient forecasting procedures for Kuhn-Tucker consumer demand model systems: application to residential energy consumption analysis. University of South Florida: Department of Civil and Environmental Engineering, 2011.
- 10. Bhat CR, Castro M, Khan M. A new estimation approach for the multiple discrete–continuous probit (MDCP) choice model. Transportation Research Part B: Methodological. 2013;55:1–22.
- 11. Konduri K, Ye X, Pendyala R. Probit-based discrete-continuous model of activity choice and duration with history dependency. Transp Res Rec. 2010;(2156):17–27.
- 12. Ma X, Tao Z, Wang Y, Yu H, Wang Y. Long short-term memory neural network for traffic speed prediction using remote microwave sensor data. Transp Res Part C Emerg Technol. 2015;54:187–97.
- 13. Ma X, Yu H, Wang Y, Wang Y. Large-scale transportation network congestion evolution prediction using deep learning theory. PLoS One. 2015;10(3):e0119044. pmid:25780910
- 14. Tang J, Liu F, Zou Y, Zhang W, Wang Y. An Improved Fuzzy Neural Network for Traffic Speed Prediction Considering Periodic Characteristic. IEEE trans Intell Transp Syst. 2017:1–11.
- 15. Ye X, Pendyala RM. A probit-based joint discrete-continuous model system: analyzing the relationship between timing and duration of maintenance activities. Transportation and Traffic Theory 2009: Golden Jubilee. 2009:403–23.
- 16. You D, Garikapati V, Pendyala R, Bhat C, Dubey S, Jeon K, et al. Development of vehicle fleet composition model system for implementation in activity-based travel model. Transp Res Rec. 2014;(2430):145–54.
- 17. Zou Y, Yang H, Zhang Y, Tang J, Zhang W. Mixture modeling of freeway speed and headway data using multivariate skew-t distributions. Transportmetrica A: Transport Science. 2017:1–22.
- 18. Bera AK, Jarque CM, Lee L-F. Testing the normality assumption in limited dependent variable models. Int Econ Rev. 1984:563–78.
- 19. Ye X, Garikapati VM, You D, Pendyala RM. A practical method to test the validity of the standard Gumbel distribution in Logit-Based multinomial choice models of human travel behavior. Shanghai, China: Tongji University; 2016.
- 20. Train KE. Discrete choice methods with simulation. Cambridge University Press; 2009.
- 21. Ding C, Lin Y, Liu C. Exploring the influence of built environment on tour-based commuter mode choice: a cross-classified multilevel modeling approach. Transp Res D Transp Environ. 2014;32:230–8.
- 22. McFadden D. Conditional logit analysis of qualitative choice behavior. Frontiers in Econometrics. New York: Academic Press; 1974.
- 23. Manski CF. Maximum score estimation of the stochastic utility model of choice. J Econom. 1975;3(3):205–28.
- 24. Chikaraishi M, Nakayama S. Discrete choice models with q-product random utilities. Transportation Research Part B: Methodological. 2016;93:576–95.
- 25. del Castillo J. A class of RUM choice models that includes the model in which the utility has logistic distributed errors. Transportation Research Part B: Methodological. 2016;91:1–20.
- 26. Nakayama S, Chikaraishi M. Unified closed-form expression of logit and weibit and its extension to a transportation network equilibrium assignment. Transportation Research Part B: Methodological. 2015;81:672–85.
- 27. Sikder S, Pinjari AR. The benefits of allowing heteroscedastic stochastic distributions in multiple discrete-continuous choice models. Journal of choice modelling. 2013;9:39–56.
- 28. Klein RW, Spady RH. An efficient semiparametric estimator for binary response models. Econometrica. 1993:387–421.
- 29. Koster PR, Koster HR. Commuters’ preferences for fast and reliable travel: A semi-parametric estimation approach. Transportation Research Part B: Methodological. 2015;81:289–301.
- 30. Lee L-F. Semiparametric maximum likelihood estimation of polychotomous and sequential choice models. J Econom. 1995;65(2):381–428.
- 31. Li B. The multinomial logit model revisited: A semi-parametric approach in discrete choice analysis. Transportation Research Part B: Methodological. 2011;45(3):461–73.
- 32. Zhang S, Tang J, Wang H, Wang Y, An S. Revealing intra-urban travel patterns and service ranges from taxi trajectories. J Transp Geogr. 2017;61:72–86.
- 33. Gallant AR, Nychka DW. Semi-nonparametric maximum likelihood estimation. Econometrica. 1987:363–90.
- 34. Chen HZ, Randall A. Semi-nonparametric estimation of binary response models with an application to natural resource valuation. J Econom. 1997;76(1):323–40.
- 35. Creel M, Loomis J. Semi-nonparametric distribution-free dichotomous choice contingent valuation. J Environ Econ Manage. 1997;32(3):341–58.
- 36. Crooker JR, Herriges JA. Parametric and semi-nonparametric estimation of willingness-to-pay in the dichotomous choice contingent valuation framework. Environ Resour Econ (Dordr). 2007;27(4):451–80.
- 37. Ye X. Robust modeling analysis of relationships between mode choice and trip chaining pattern using two-stage semi-nonparametric method. Transportation Research Board 89th Annual Meeting; 2010.
- 38. Bierens HJ. Semi-nonparametric interval-censored mixed proportional hazard models: Identification and consistency results. Econ Theory. 2008;24(03):749–94.
- 39. Fosgerau M, Bierlaire M. A practical test for the choice of mixing distribution in discrete choice models. Transportation Research Part B: Methodological. 2007;41(7):784–94.
- 40. Bhat CR. A heteroscedastic extreme value model of intercity travel mode choice. Transportation Research Part B: Methodological. 1995;29(6):471–83.
- 41. Wang Y, Wu B, Dong Z, Ye X. A Joint Modeling Analysis of Passengers’ Intercity Travel Destination and Mode Choices in Yangtze River Delta Megaregion of China. Mathematical Problems in Engineering. 2016;2016(7):1–10.