Inference in skew generalized t-link models for clustered binary outcome via a parameter-expanded EM algorithm

Chénangnon Frédéric Tovissodé; Aliou Diop; Romain Glèlè Kakaï

doi:10.1371/journal.pone.0249604

Abstract

The binary Generalized Linear Mixed Model (GLMM) is the most common method used by researchers to analyze clustered binary data in biological and social sciences. The traditional approach to GLMMs can cause substantial bias in estimates when the data depart from the fixed shapes of the logistic and normal distributions it assumes, thereby resulting in wrong and misleading decisions. This study puts forward an approach based on skew generalized t distributions, a class of potentially skewed and heavy-tailed distributions. Interestingly, both the traditional logistic and probit mixed models, as well as other available methods, can be accommodated within the skew generalized t-link model (SGTLM) framework. We used the Expectation-Maximization algorithm accelerated via parameter expansion for model fitting. We evaluated the performance of this approach to GLMMs through a simulation experiment by varying the sample size and data distribution. Our findings indicated that the proposed methodology outperforms competing approaches in estimating population parameters and predicting random effects when the traditional link and normality assumptions are violated. In addition, empirical standard errors and information criteria proved useful for detecting spurious skewness and avoiding complex models for probit data. An application to respiratory infection data points to the superiority of the SGTLM, which turns out to be the most adequate model. Future studies should focus on integrating the demonstrated flexibility into other generalized linear mixed models to enhance robust modeling.

Citation: Tovissodé CF, Diop A, Glèlè Kakaï R (2021) Inference in skew generalized t-link models for clustered binary outcome via a parameter-expanded EM algorithm. PLoS ONE 16(4): e0249604. https://doi.org/10.1371/journal.pone.0249604

Editor: Luca Citi, University of Essex, UNITED KINGDOM

Received: August 15, 2020; Accepted: March 19, 2021; Published: April 6, 2021

Copyright: © 2021 Tovissodé et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the paper and its Supporting information files.

Funding: CFT is grateful to the Centre d’Excellence Africain en Sciences Mathématiques et Applications (CEA-SMA, https://ceasma-benin.org/) for funding his work. CFT was also financially supported by the African German Network of Excellence in Science (AGNES), through the "AGNES mobility grant for young scientists from sub Saharan Africa" (https://agnes-h.org/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Binary outcomes are prominent in many applied sciences, including but not limited to biological and social sciences. Moreover, in cross-sectional as well as panel studies, dichotomous responses are often naturally grouped by sampling techniques or by some properties of the sampling units [1]. The preferred modern method for analyzing clustered binary data is the Generalized Linear Mixed Model (GLMM) framework [2]. Indeed, when a binary outcome has been recorded repeatedly or in the presence of latent factors, GLMMs explicitly account for over-dispersion and within-cluster correlation through random effects.

Let Y_ij denote the binary outcome (0 or 1) of the j^th measurement (j = 1, 2, ⋯, n_i) and Y_i the collection of responses from the i^th cluster, i.e. $Y_i = (Y_{i1}, \ldots, Y_{in_i})^\top$, i = 1, 2, ⋯, n. In terms of an underlying latent continuous random vector $Z_i = (Z_{i1}, \ldots, Z_{in_i})^\top$ and random effects b_i = (b_i1, ⋯, b_iq)^⊤, the mixed probit model (PM) assumes that the Y_ij are conditionally independent and given as [3]:

$$Y_{ij} = I_{(0,\infty)}(Z_{ij}), \qquad Z_i \mid b_i \sim N_{n_i}(\eta_i, I_{n_i}), \qquad b_i \sim N_q(0, D), \tag{1}$$

where I_A(x) is the indicator function, which equals 1 if x ∈ A and 0 otherwise; η_i is the n_i-vector of linear predictors, $\eta_i = (\eta_{i1}, \ldots, \eta_{in_i})^\top = X_i\beta + W_i b_i$; β is the p-vector of fixed effects; $X_i = (X_{i1}, \ldots, X_{in_i})^\top$ and $W_i = (W_{i1}, \ldots, W_{in_i})^\top$ are known n_i × p and n_i × q matrices of covariates, respectively, with X_ij = (X_ij1, ⋯, X_ijp)^⊤ and W_ij = (W_ij1, ⋯, W_ijq)^⊤; $I_{n_i}$ is the n_i × n_i identity matrix; and $N_q(0, D)$ denotes the q-variate normal distribution with null mean vector and an unknown q × q variance-covariance matrix D, meant to capture the dependence structure of Y_i. The latent variable Z_ij provides a convenient stochastic representation of the conditional outcome Y_ij. Equivalently, one may write P(Y_ij = 1|b_i) = Φ(η_ij), where Φ(⋅), the cumulative distribution function (cdf) of the standard normal distribution, is the inverse link function mapping the linear predictor η_ij to the predicted probability of the outcome Y_ij. Combined with the normality assumption on random effects, the systematic use of this link and of the well-known alternative, the logit link, is somewhat controversial [4, 5].
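The latent-variable form of Eq (1) can be checked by simulation. The following Python sketch (NumPy and SciPy; the function name `simulate_mixed_probit`, the cluster sizes, β, D and the covariate design are illustrative assumptions, not settings from this paper) draws random intercepts, builds Z_ij = η_ij + ε_ij with standard normal errors, thresholds at zero, and compares the empirical success rate with the average of Φ(η_ij).

```python
import numpy as np
from scipy.stats import norm


def simulate_mixed_probit(n=2000, m=5, beta=(-0.3, 0.8), D=((0.5,),), seed=2021):
    """Simulate Eq (1) with q = 1; every default value is an illustrative assumption."""
    rng = np.random.default_rng(seed)
    beta, D = np.asarray(beta, float), np.asarray(D, float)
    q = D.shape[0]
    # X_i: intercept plus one standard normal covariate (n clusters, m measurements each)
    X = np.concatenate([np.ones((n, m, 1)), rng.normal(size=(n, m, 1))], axis=2)
    W = np.ones((n, m, q))                                   # random-intercept design when q = 1
    b = rng.multivariate_normal(np.zeros(q), D, size=n)      # b_i ~ N_q(0, D)
    eta = X @ beta + np.einsum("imk,ik->im", W, b)           # eta_i = X_i beta + W_i b_i
    Z = eta + rng.standard_normal(eta.shape)                 # Z_i | b_i ~ N_{n_i}(eta_i, I)
    Y = (Z > 0).astype(int)                                  # Y_ij = I_(0, inf)(Z_ij)
    return Y, eta


Y, eta = simulate_mixed_probit()
# Observed success rate versus the average conditional probability Phi(eta_ij)
print(round(Y.mean(), 3), round(norm.cdf(eta).mean(), 3))
```

Because the simulation keeps track of b_i, the comparison targets the conditional probability Φ(η_ij), not the marginal success probability of Y_ij.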

The link function indeed plays a critical role in GLMMs, since it heavily impacts estimates, predictions and, consequently, interpretations [4, 6]. As a result, in the binary generalized linear model literature, besides the logistic and probit models based on the fixed-shape logistic and normal distributions, respectively, there have been increasing efforts to make the link function flexible. Many works have considered heavy-tailed link functions, for instance those based on the Semi-Nonparametric [7], Student t [8] and generalized t [9] distributions, and on elliptical scale mixtures [10, 11]. Indeed, the maximum likelihood estimators of logistic and probit regression models are not robust to outliers [7]. Heavy-tailed links are less sensitive to outliers and thus allow outlier-robust inference. In particular, link functions based on the Student t distribution incorporate observation-specific stochastic weights, which can be used for outlier detection [7, 12]. Similarly, the skew-probit [13], skew generalized t [9], asymmetric logistic [14], loglog and complementary loglog, power symmetric and reciprocal power symmetric [15] links, among others, have been used to handle situations where the probability of a given binary response approaches zero at a different rate than it approaches one. Skew logistic distributions have also been developed (see e.g. [16]) and may be used with the same aim in mind. For example, [9] discussed a prostate cancer study where the outcome variable Y represents the presence or the penetration of cancer in or near the prostate capsule of patients. The rate at which the probability of "Y = 1" approaches one is expected to be very different from (slower than) the rate at which it approaches zero [9]. For this study, the skew generalized t-link fitted the data best [9], indicating that the simultaneous use of skewed and heavy-tailed link functions can lead to more effective modelling of binary data.
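To make the notion of approach rates concrete, the sketch below (Python with SciPy; the choice of links, the t degrees of freedom and the evaluation grid are illustrative assumptions) measures how far each inverse link F departs from the symmetry F(−η) = 1 − F(η). Symmetric links such as the probit, logit and Student t approach 0 and 1 at the same rate, whereas the complementary loglog does not; the value F(−4) also illustrates the heavier lower tail of a t link compared with the probit.

```python
import numpy as np
from scipy.stats import logistic, norm, t

# Inverse link functions F(eta) = P(Y = 1 | eta) for a few classical choices
INVERSE_LINKS = {
    "probit": norm.cdf,
    "logit": logistic.cdf,
    "t(3)": lambda x: t.cdf(x, df=3),
    "cloglog": lambda x: 1.0 - np.exp(-np.exp(x)),
}


def symmetry_gap(F, eta):
    """Max |F(-eta) - (1 - F(eta))|: zero when P(Y = 1) reaches 0 and 1 at the same rate."""
    return float(np.max(np.abs(F(-eta) - (1.0 - F(eta)))))


eta = np.linspace(0.5, 4.0, 8)
gaps = {name: symmetry_gap(F, eta) for name, F in INVERSE_LINKS.items()}
tails = {name: float(F(-4.0)) for name, F in INVERSE_LINKS.items()}  # P(Y = 1) at eta = -4
for name in INVERSE_LINKS:
    print(f"{name:8s} symmetry gap = {gaps[name]:.3g}, F(-4) = {tails[name]:.2e}")
```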

Furthermore, although random effects are traditionally assumed to be normally distributed in GLMMs, this assumption may not be realistic [17, 18]. Therefore, considerable effort has been devoted to making the random effects distribution in GLMMs flexible, replacing the normal distribution with, for instance, Semi-Nonparametric [19], probability integral transformation of normal [20], skew normal [21], log-normal [22, 23], Student t [24] and scale mixture of normal [25] distributions.

The above background shows the wide range of possible approaches for fitting a flexible GLMM to correlated binary outcomes; however, none of these approaches explicitly accounts for the skewness and tail behavior of both the link function and the random effects distribution simultaneously. Yet misspecification of the link function or of the random effects distribution can introduce substantial bias and reduce the accuracy of the mean response as well as of heterogeneity estimates [6, 18]. Working in a fully parametric framework, we propose a unifying approach based on skew generalized t (SGT) distributions [26], a class of models that includes, among others, the normal, the skew normal and the Student t models. Using a skew generalized t family instead of the Student t family allows one to rescale fixed effects so that they have the same interpretation as in the mixed probit model in Eq (1).

Our contributions include i) an extension of the flexible generalized t-link model proposed by [9] for independent binary samples to dependent binary samples (mixed model); ii) a parameter-expanded EM algorithm [27] for computing maximum likelihood estimates in skew generalized t-link models for correlated binary data, extending the EM algorithm of [24] for t-link models; and iii) empirical Bayes estimators of skew t distributed random effects in mixed models for binary data.

The paper is organized as follows. Section 2 presents preliminary results on SGT distributions and truncated SGT distributions, including their first two moments. Section 3 specifies the SGT-link model (SGTLM) and describes maximum likelihood estimation and cluster-specific inference based on random effects and weights. Section 4 presents a simulation study assessing the performance of the SGTLM relative to existing methods, and an application of the modelling approach to real respiratory infection data. Concluding remarks are given in Section 5.

Preliminary results

In this section, we present some useful properties of the skew generalized t distributions.

Multivariate skew generalized t distributions

Multivariate skew generalized t (SGT) distributions are special cases of multivariate skew scale mixture of normal (SSMN) distributions [28] (pages 102-103), which we introduce first. A random vector Z is said to follow a p-variate SSMN distribution with location μ, scale Ω and shape λ if it can be represented as [29] (page 20, Eq 3.12):

$$Z = \mu + U^{-1/2}\left(\delta X + (\Omega - \delta\delta^\top)^{1/2} Z_0\right), \qquad Z_0 \sim N_p(0, I_p), \qquad X \sim HN(0, 1), \tag{2}$$

where U, called the scale mixing variable, is a positive random variable with cdf F_U(⋅|ν) indexed by a parameter vector ν; HN(0, 1) is the standard half normal distribution; Z₀, X and U are independent; and δ = (1 + λ^⊤λ)^−1/2 Ω^1/2 λ. Different choices of the scale mixing distribution F_U(⋅|ν) result in various sub-classes of skew elliptical distributions, for instance: the skew normal when P(U = 1) = 1 [28] (page 103); the skew contaminated normal when ν = (ν₁, ν₂)^⊤, 0 < ν₁ < 1, 0 < ν₂ ≤ 1 and U is discrete, taking the value U = 1 with probability 1 − ν₁ and U = ν₂ with probability ν₁ [30] (page 308); the skew slash when U ~ Beta(ν, 1), ν > 0 [30] (page 307); and the skew generalized t when ν = (ν, ν₀)^⊤, ν > 0, ν₀ > 0 and U ~ Gamma(ν/2, ν₀/2), with shape ν/2 and rate ν₀/2 [28] (page 105). The following result states conditions for the identifiability of SSMN distributions, a requirement for reliable inference using this class of distributions.
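A minimal NumPy/SciPy sketch of the stochastic representation Z = μ + U^{−1/2}(δX + (Ω − δδ^⊤)^{1/2}Z₀), with X half normal and Z₀ standard normal, is given below. It assumes the symmetric matrix square root for Ω^{1/2} and the Gamma(shape ν/2, rate ν₀/2) convention for the SGT mixing law; the function name `rssmn`, the `MIXERS` dictionary and all parameter values are hypothetical illustrations. The last lines compare the skew normal case (P(U = 1) = 1) in one dimension with SciPy's `skewnorm`, whose density 2φ(x)Φ(λx) matches this parameterization after standardization.

```python
import numpy as np
from scipy.linalg import sqrtm
from scipy.stats import skewnorm


def rssmn(size, mu, Omega, lam, draw_u, rng):
    """Sample Z = mu + U^{-1/2} (delta X + (Omega - delta delta')^{1/2} Z0)."""
    mu, lam = np.asarray(mu, float), np.asarray(lam, float)
    Omega = np.atleast_2d(np.asarray(Omega, float))
    delta = np.real(sqrtm(Omega)) @ lam / np.sqrt(1.0 + lam @ lam)   # working shape parameter
    G_half = np.real(sqrtm(Omega - np.outer(delta, delta)))         # (Omega - delta delta')^{1/2}
    X = np.abs(rng.standard_normal(size))                            # X ~ HN(0, 1)
    Z0 = rng.standard_normal((size, mu.size))                        # Z0 ~ N_p(0, I_p)
    U = draw_u(size, rng)                                            # scale mixing variable
    return mu + (X[:, None] * delta + Z0 @ G_half.T) / np.sqrt(U)[:, None]


# Mixing laws listed in the text (parameter values are arbitrary illustrations)
MIXERS = {
    "skew normal": lambda n, r: np.ones(n),
    "skew contaminated normal": lambda n, r: np.where(r.random(n) < 0.1, 0.3, 1.0),  # nu1=0.1, nu2=0.3
    "skew slash": lambda n, r: r.beta(3.0, 1.0, n),                                    # nu = 3
    "skew generalized t": lambda n, r: r.gamma(5.0 / 2.0, 2.0 / 3.0, n),               # nu=5, nu0=3
}

rng = np.random.default_rng(1)
for name, draw_u in MIXERS.items():
    Z = rssmn(50_000, [0.0, 1.0], [[1.0, 0.4], [0.4, 2.0]], [2.0, -1.0], draw_u, rng)
    print(f"{name:26s} sample mean = {Z.mean(axis=0).round(3)}")

# One-dimensional skew normal check against SciPy (shape 2, location 0.5, scale 2)
z1 = rssmn(200_000, [0.5], [[4.0]], [2.0], MIXERS["skew normal"], rng)[:, 0]
print(z1.mean(), skewnorm.mean(2.0, loc=0.5, scale=2.0))
```

NumPy's `gamma` takes a scale argument, so the rate ν₀/2 enters as scale 2/ν₀.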

Lemma 1 (see S1 Appendix for a proof) The free parameters (μ, δ, Ω and ν) of an SSMN distribution with the representation in Eq (2) are identifiable if and only if i) the scale mixing distribution F_U(⋅|ν) is identifiable, and ii) F_U(⋅|ν) does not satisfy for any element ν_k of ν and any distribution function H(⋅|ν_−k), where ν_−k is the vector ν without the element ν_k. If U has a probability density function (pdf) f_U(u|ν) for all u > 0, then condition ii) is equivalent to f_U(⋅|ν) not satisfying for any pdf h_U(⋅|ν_−k).

On setting and defining the expectations , and assuming that for the required expectations, the first two central moments of a SSMN vector Z are given by [28] (pages 109-110): (3) (4) The ability of the SSMN distributions to capture more data structure than the normal, the skew normal or the scale mixture of normals is reflected in the expressions for skewness () and kurtosis () indices given for the k^th marginal of Z as [31]: (5) (6) where δ_k is the k^th element of δ and is the k^th diagonal element of the covariance matrix given in Eq (4).
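For a numerical feel of these moments, note that Eq (2) directly gives E{Z} = μ + √(2/π) k₁ δ and Var{Z} = k₂ Ω − (2/π) k₁² δδ^⊤, with k₁ = E{U^{−1/2}} and k₂ = E{U^{−1}}, using E{X} = √(2/π) and E{X²} = 1 for X ~ HN(0, 1). The symbols k₁, k₂ and the helper names are our own notation and may differ from the paper's Eqs (3)–(4). For the SGT mixing law U ~ Gamma(shape ν/2, rate ν₀/2), E{U^{−t/2}} = (ν₀/2)^{t/2} Γ((ν − t)/2)/Γ(ν/2), which is finite only for t < ν. The Python sketch below checks these expressions by Monte Carlo under illustrative parameter values.

```python
import numpy as np
from scipy.linalg import sqrtm
from scipy.special import gammaln


def k_moment(t, nu, nu0):
    """E{U^(-t/2)} for U ~ Gamma(shape nu/2, rate nu0/2); finite only when t < nu."""
    if t >= nu:
        return np.inf
    return float(np.exp(0.5 * t * np.log(nu0 / 2.0) + gammaln((nu - t) / 2.0) - gammaln(nu / 2.0)))


def ssmn_mean_cov(mu, Omega, lam, k1, k2):
    """Mean and covariance implied by Eq (2), given k1 = E{U^(-1/2)} and k2 = E{U^(-1)}."""
    Omega, lam = np.asarray(Omega, float), np.asarray(lam, float)
    delta = np.real(sqrtm(Omega)) @ lam / np.sqrt(1.0 + lam @ lam)
    mean = np.asarray(mu, float) + np.sqrt(2.0 / np.pi) * k1 * delta
    cov = k2 * Omega - (2.0 / np.pi) * k1**2 * np.outer(delta, delta)
    return mean, cov


# Monte Carlo check for an SGT law (illustrative parameter values)
nu, nu0 = 6.0, 4.0
mu, Omega, lam = np.array([0.0, 1.0]), np.array([[1.0, 0.3], [0.3, 2.0]]), np.array([1.5, -0.5])
mean, cov = ssmn_mean_cov(mu, Omega, lam, k_moment(1, nu, nu0), k_moment(2, nu, nu0))

rng = np.random.default_rng(7)
N = 400_000
delta = np.real(sqrtm(Omega)) @ lam / np.sqrt(1.0 + lam @ lam)
G_half = np.real(sqrtm(Omega - np.outer(delta, delta)))
U = rng.gamma(nu / 2.0, 2.0 / nu0, N)                      # NumPy's gamma takes scale = 1 / rate
X = np.abs(rng.standard_normal(N))
Z = mu + (X[:, None] * delta + rng.standard_normal((N, 2)) @ G_half.T) / np.sqrt(U)[:, None]
print(mean, Z.mean(axis=0))
print(cov, np.cov(Z, rowvar=False), sep="\n")
```

With ν₀ = ν, k₂ reduces to ν/(ν − 2), the familiar Student t variance factor.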

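We notice from the expressions for the skewness Eq (5) and kurtosis Eq (6) indices that the parameter λ controls the shape of the distribution only through the working shape parameter δ = (1 + λ^⊤λ)^−1/2 Ω^1/2 λ. This quantity is invariant under marginalization: by the stochastic representation in Eq (2), for any subset of the elements of Z, the working shape parameter is the corresponding subset of δ. Note, however, that δ cannot be specified freely, independently of Ω. Indeed, δ = (1 + λ^⊤λ)^−1/2 Ω^1/2 λ implies λ = (1 + λ^⊤λ)^1/2 Ω^−1/2 δ. This in turn gives λ^⊤λ = (1 + λ^⊤λ) δ^⊤Ω⁻¹δ, which yields λ^⊤λ (1 − δ^⊤Ω⁻¹δ) = δ^⊤Ω⁻¹δ. Provided δ^⊤Ω⁻¹δ ≠ 1, we then get

$$\lambda^\top\lambda = \frac{\delta^\top\Omega^{-1}\delta}{1 - \delta^\top\Omega^{-1}\delta}, \qquad\text{so that}\qquad 1 + \lambda^\top\lambda = \left(1 - \delta^\top\Omega^{-1}\delta\right)^{-1}.$$

Since 1 + λ^⊤λ ≥ 1, this requires δ^⊤Ω⁻¹δ < 1, and λ is recovered from δ and Ω under this constraint as:

$$\lambda = \left(1 - \delta^\top\Omega^{-1}\delta\right)^{-1/2}\Omega^{-1/2}\delta. \tag{7}$$

It is nevertheless possible to specify δ and Γ = Ω − δδ^⊤ (positive definite) without restriction. In this case, Ω is recovered as Ω = Γ + δδ^⊤. Using the Sherman-Morrison identity [32] (page 121, Eq 3.1), we have

$$\Omega^{-1} = \Gamma^{-1} - \frac{\Gamma^{-1}\delta\delta^\top\Gamma^{-1}}{1 + \delta^\top\Gamma^{-1}\delta},$$

from which we get δ^⊤Ω⁻¹δ = δ^⊤Γ⁻¹δ − (δ^⊤Γ⁻¹δ)²/(1 + δ^⊤Γ⁻¹δ), which simplifies to δ^⊤Ω⁻¹δ = δ^⊤Γ⁻¹δ/(1 + δ^⊤Γ⁻¹δ) < 1. We thus have 1 − δ^⊤Ω⁻¹δ = (1 + δ^⊤Γ⁻¹δ)⁻¹, hence

$$\lambda = \left(1 + \delta^\top\Gamma^{-1}\delta\right)^{1/2}\Omega^{-1/2}\delta, \qquad \Omega = \Gamma + \delta\delta^\top. \tag{8}$$

In the binary data modeling framework, we shall consider δ and Γ as model parameters, as they turn out to be easier to estimate by means of the EM algorithm. For the multivariate skew generalized t (SGT) distribution, the mixing variable U is gamma distributed, i.e. U ~ Gamma(ν/2, ν₀/2), with pdf [33] (page 1, Eq 1):

$$f_G(u \mid \nu/2, \nu_0/2) = \frac{(\nu_0/2)^{\nu/2}}{\Gamma(\nu/2)}\, u^{\nu/2 - 1}\exp\left(-\frac{\nu_0 u}{2}\right), \qquad u > 0. \tag{9}$$

The p-variate SGT distribution, denoted SGt_p(μ, Ω, λ, ν) with ν = (ν, ν₀)^⊤, has pdf, for z ∈ ℝ^p [28] (page 106):

$$SGt_p(z \mid \mu, \Omega, \lambda, \boldsymbol{\nu}) = 2\, Gt_p(z \mid \mu, \Omega, \boldsymbol{\nu})\, T\!\left(\alpha\sqrt{\frac{\nu + p}{\nu_0 + z_0^\top z_0}} \,\middle|\, \nu + p\right), \tag{10}$$

where

$$Gt_p(z \mid \mu, \Omega, \boldsymbol{\nu}) = \frac{\Gamma\left(\frac{\nu + p}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)(\pi\nu_0)^{p/2}\,|\Omega|^{1/2}}\left(1 + \frac{z_0^\top z_0}{\nu_0}\right)^{-(\nu + p)/2} \tag{11}$$

is the pdf of the p-variate generalized t (GT) distribution, z₀ = Ω^−1/2 (z − μ), α = λ^⊤z₀, and T(⋅|ν) is the cdf of the standard univariate t distribution with ν degrees of freedom. For SGT distributions, the expectations required for computing the moments given in Eqs (3)–(6) have, for t < ν, the form

$$E\{U^{-t/2}\} = \left(\frac{\nu_0}{2}\right)^{t/2}\frac{\Gamma\left(\frac{\nu - t}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)}. \tag{12}$$

It is worth noting that the gamma mixing pdf f_G(⋅|ν/2, ν₀/2) satisfies condition i) of Lemma 1 but not condition ii).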
The SGT distribution with both ν and ν₀ as free parameters is thus not identifiable. However, fixing ν₀ at a known value (so that only ν is treated as a parameter) is sufficient to ensure the identifiability of the SGT family of distributions. When ν₀ = ν, the p-variate SGT distribution reduces to the p-variate skew t (ST) distribution [28] (page 106), denoted St_p(μ, Ω, λ, ν), which is thus identifiable, with pdf St_p(⋅|μ, Ω, λ, ν) and cdf ST_p(⋅|μ, Ω, λ, ν). If λ = 0, the SGT distribution reduces to the GT distribution, which equals the Student t distribution for ν₀ = ν. The following lemma formalizes the relationship between skew generalized t and skew t distributions.

Lemma 2 (see S2 Appendix for a proof) Let us consider the SGT distribution SGt_p(μ, Ω, λ, ν) with ν = (ν, ν₀)^⊤ and pdf in Eq (10). Set Ω* = (ν₀/ν) Ω. The following statements hold:

SGt_p(z|μ, Ω, λ, ν) = St_p(z|μ, Ω*, λ, ν);
If then .
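The first statement of Lemma 2 can be verified numerically. The Python sketch below implements the SGT log-density under our reading of Eqs (10)–(11), namely SGt_p(z) = 2 Gt_p(z|μ, Ω, ν) T(α√((ν + p)/(ν₀ + z₀^⊤z₀)) | ν + p), with Gt_p the generalized t density whose kernel is (1 + z₀^⊤z₀/ν₀)^{−(ν+p)/2}. The function name `sgt_logpdf` and the parameter values are assumptions. The sketch compares the density with (ν, ν₀) against the skew t density (ν₀ = ν) with scale (ν₀/ν)Ω, checks the Student t special case against SciPy, and checks that the univariate density integrates to one.

```python
import numpy as np
from scipy.integrate import quad
from scipy.linalg import sqrtm
from scipy.special import gammaln
from scipy.stats import t as student_t


def sgt_logpdf(z, mu, Omega, lam, nu, nu0):
    """Log SGT density (our reading of Eqs (10)-(11)); rows of z are evaluation points."""
    z = np.atleast_2d(np.asarray(z, float))
    mu, lam = np.asarray(mu, float), np.asarray(lam, float)
    Omega = np.atleast_2d(np.asarray(Omega, float))
    p = mu.size
    z0 = (z - mu) @ np.linalg.inv(np.real(sqrtm(Omega))).T    # rows are Omega^{-1/2}(z - mu)
    d = np.sum(z0**2, axis=1)
    alpha = z0 @ lam
    _, logdet = np.linalg.slogdet(Omega)
    log_gt = (gammaln((nu + p) / 2.0) - gammaln(nu / 2.0) - 0.5 * p * np.log(np.pi * nu0)
              - 0.5 * logdet - 0.5 * (nu + p) * np.log1p(d / nu0))
    log_T = student_t.logcdf(alpha * np.sqrt((nu + p) / (nu0 + d)), df=nu + p)
    return np.log(2.0) + log_gt + log_T


# Lemma 2, statement 1: SGt_p(z | mu, Omega, lam, (nu, nu0)) = St_p(z | mu, (nu0/nu) Omega, lam, nu)
nu, nu0 = 5.0, 2.5
mu, Omega, lam = np.array([0.2, -1.0]), np.array([[1.0, 0.5], [0.5, 1.5]]), np.array([1.0, 2.0])
pts = np.random.default_rng(3).normal(size=(5, 2))
lhs = sgt_logpdf(pts, mu, Omega, lam, nu, nu0)
rhs = sgt_logpdf(pts, mu, (nu0 / nu) * Omega, lam, nu, nu)     # skew t: nu0 = nu
print(np.max(np.abs(lhs - rhs)))

# With lam = 0 and nu0 = nu, the density reduces to the Student t density
xs = np.linspace(-3.0, 3.0, 7)
t_gap = np.max(np.abs(sgt_logpdf(xs[:, None], [0.4], [[2.25]], [0.0], 6.0, 6.0)
                      - student_t.logpdf(xs, 6.0, loc=0.4, scale=1.5)))

# The univariate density integrates to one
total, _ = quad(lambda x: np.exp(sgt_logpdf([[x]], [0.3], [[1.5]], [2.0], 4.0, 2.5)[0]), -np.inf, np.inf)
print(t_gap, total)
```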

Lemma 2 indicates that any SGT vector is a rescaled version of an ST vector. However, in the framework of binary data models, using an SGT distribution instead of a simple ST distribution as the link function allows one to control the scale of the link function through the parameter ν₀ [9]. Specifically, a skew generalized t-link makes it possible to define a skewed and heavy-tailed binary mixed model in which fixed effects have the same scale and interpretation as in the mixed probit model in Eq (1). Interestingly, the popular probit and logit links for binary data can be recovered, exactly in the limit (probit) or approximately (logit), from the cdf of the SGT class of distributions. Indeed, the normal distribution is a limiting case of SGT distributions when ν₀ = ν → ∞ and λ = 0. Moreover, the logistic distribution is well approximated by a rescaled Student t distribution with appropriate degrees of freedom [8] (page 228). These observations make the SGT distributions appropriate for extending the traditional probit and logistic GLMMs to account for skewness and heavy-tailed behavior. To this end, we present in the next section some results on truncated multivariate SGT distributions, since binary data can be viewed as the result of truncating latent continuous variables.
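The approximation of the logistic distribution by a rescaled Student t distribution can be explored numerically. The Python sketch below (SciPy; the grid, the candidate degrees of freedom and the search interval are assumptions, and the minimax criterion is our own choice rather than necessarily the one used in [8]) searches, for each df, for the scale s minimizing the largest gap between the logistic cdf and the cdf of s·t_df.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import logistic, t

x = np.linspace(-12.0, 12.0, 4801)  # evaluation grid for the linear predictor


def max_cdf_gap(scale, df):
    """Largest absolute difference between the logistic cdf and the cdf of scale * t(df)."""
    return float(np.max(np.abs(logistic.cdf(x) - t.cdf(x / scale, df=df))))


fits = {}
for df in (3, 7, 15):
    res = minimize_scalar(max_cdf_gap, bounds=(1.0, 2.5), args=(df,), method="bounded")
    fits[df] = (res.x, res.fun)
    print(f"df = {df:2d}: best scale = {res.x:.4f}, max |F_logit - F_t| = {res.fun:.2e}")
```

A scaled t link with moderate degrees of freedom therefore acts as a close stand-in for the logit link, which is what allows logistic-type models to sit inside the SGT family approximately.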

Truncated multivariate skew generalized t distributions

As seen for the mixed probit model in Eq (1), models for binary data can be defined by truncating latent variables following continuous distributions. We define in this section a class of truncated multivariate skew generalized t distributions which are useful for a latent variable representation of skew generalized t-link binary data models. We also give expressions to evaluate some joint moments of a truncated multivariate skew generalized t distribution and a gamma distribution, as they prove useful in the implementation of the EM algorithm for the skew generalized t-link model.

Let Z represent a p-variate skew generalized t (SGT) vector restricted to a region $\mathbb{A} \subseteq \mathbb{R}^p$, with μ a p-vector (location), Ω a p × p positive definite matrix (scale), λ a p-vector (shape) and ν = (ν, ν₀)^⊤ a vector of positive scalars (degrees of freedom). The pdf of Z is:

$$f_Z(z) = \frac{SGt_p(z \mid \mu, \Omega, \lambda, \boldsymbol{\nu})}{\alpha_{st}}\, I_{\mathbb{A}}(z), \qquad \alpha_{st} = \int_{\mathbb{A}} SGt_p(u \mid \mu, \Omega, \lambda, \boldsymbol{\nu})\, du, \tag{13}$$

where SGt_p(⋅|μ, Ω, λ, ν) is the pdf in Eq (10) and α_st serves for normalization. When λ = 0, we obtain a truncated generalized t distribution. When ν₀ = ν, we get a truncated ST distribution. If both λ = 0 and ν₀ = ν, the truncated multivariate SGT distribution reduces to a truncated multivariate t distribution [34].

In the framework of correlated binary data models, the truncation region typically has the form $\mathbb{A} = A_1 \times A_2 \times \cdots \times A_p$, where the A_k are real intervals of the form (−∞, a_k] or (a_k, ∞), for k = 1, 2, ⋯, p. Consider, for instance, a vector Y of three binary outcomes obtained by truncating the elements of a 3-variate SGT vector Z ~ SGt_3(μ, Ω, λ, ν): Y_k = 0 if Z_k ≤ 0 and Y_k = 1 if Z_k > 0. In practice, however, only the binary outcomes (Y) are observable, whereas the latent outcome Z is not. Suppose one observes the binary outcomes y = (1, 0, 1)^⊤. This implies that the corresponding value z of the latent vector Z satisfies z₁ > 0, z₂ ≤ 0 and z₃ > 0. The conditional distribution of Z given Y = y (required for maximum likelihood estimation using the EM algorithm) is thus the SGT distribution truncated to the region $\mathbb{A} = (0, \infty) \times (-\infty, 0] \times (0, \infty)$, i.e. the truncated SGT distribution defined in Eq (13).
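The example above can be reproduced by a small Monte Carlo experiment. The Python sketch below (the helpers `rsgt`, `in_region` and `prob_y`, the sample size and the parameter values are illustrative assumptions) draws SGT vectors through the gamma-mixture representation with U ~ Gamma(shape ν/2, rate ν₀/2), maps an observed binary vector y to its orthant, and estimates P(Y = y) = P(Z ∈ 𝔸), i.e. the normalizing constant α_st of the truncated law.

```python
import itertools

import numpy as np
from scipy.linalg import sqrtm


def rsgt(size, mu, Omega, lam, nu, nu0, rng):
    """SGT draws via the gamma-mixture representation with U ~ Gamma(shape nu/2, rate nu0/2)."""
    mu, lam, Omega = np.asarray(mu, float), np.asarray(lam, float), np.asarray(Omega, float)
    delta = np.real(sqrtm(Omega)) @ lam / np.sqrt(1.0 + lam @ lam)
    G_half = np.real(sqrtm(Omega - np.outer(delta, delta)))
    U = rng.gamma(nu / 2.0, 2.0 / nu0, size)                  # NumPy's gamma takes scale = 1 / rate
    X = np.abs(rng.standard_normal(size))
    Z0 = rng.standard_normal((size, mu.size))
    return mu + (X[:, None] * delta + Z0 @ G_half.T) / np.sqrt(U)[:, None]


def in_region(Z, y):
    """Indicator of Z in A = A_1 x ... x A_p: A_k = (0, inf) if y_k = 1 and (-inf, 0] if y_k = 0."""
    y = np.asarray(y)
    return np.all(np.where(y == 1, Z > 0, Z <= 0), axis=1)


def prob_y(y, mu, Omega, lam, nu, nu0, size=200_000, seed=0):
    """Monte Carlo estimate of P(Y = y) = P(Z in A), the constant alpha_st of the truncated law."""
    Z = rsgt(size, mu, Omega, lam, nu, nu0, np.random.default_rng(seed))
    return float(in_region(Z, y).mean())


mu, Omega, lam = np.array([0.3, -0.2, 0.5]), np.eye(3) + 0.4 * (1 - np.eye(3)), np.array([1.0, 0.0, -2.0])
for y in itertools.product([0, 1], repeat=3):
    print(y, prob_y(y, mu, Omega, lam, nu=5.0, nu0=3.0))
```

With λ = 0, μ = 0, Ω = I and ν₀ = ν, the law is spherically symmetric and every orthant has probability 1/8, which gives a convenient sanity check.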

We shall use the simplified notation Z ~ TSGt_p(μ, Ω, λ, ν, a), with a = (a₁, ⋯, a_p)^⊤, to denote a truncated SGT distribution when $\mathbb{A}$ is the right-truncated region $\mathbb{A} = (-\infty, a_1] \times \cdots \times (-\infty, a_p]$. In this case, α_st = SGT_p(a|μ, Ω, λ, ν), with SGT_p(⋅|μ, Ω, λ, ν) the cdf of the p-variate SGT distribution. This corresponds, for instance, to the situation where all binary outcomes are zeros. When λ = 0, the right-truncated SGT distribution is a right-truncated GT distribution, denoted TGt_p(μ, Ω, ν, a), whose pdf and cdf are respectively denoted TGt_p(⋅|μ, Ω, ν, a) and TGT_p(⋅|μ, Ω, ν, a). When ν₀ = ν, the right-truncated SGT distribution is a right-truncated ST distribution, denoted TSt_p(μ, Ω, λ, ν, a), with pdf TSt_p(⋅|μ, Ω, λ, ν, a) and cdf TST_p(⋅|μ, Ω, λ, ν, a). If both λ = 0 and ν₀ = ν, the distribution reduces to a right-truncated t distribution, denoted Tt_p(μ, Ω, ν, a), with pdf Tt_p(⋅|μ, Ω, ν, a) and cdf TT_p(⋅|μ, Ω, ν, a). In the above example, if y = (0, 0, 0)^⊤, the truncation region becomes $\mathbb{A} = (-\infty, 0]^3$. Since all truncation points are zero, we shall write in this case Z|(Y = y) ~ TSGt_3(μ, Ω, λ, ν, a) with a = (0, 0, 0)^⊤, using the above simplified notation.

The implementation of an EM algorithm for an SGT-distribution-based binary data model requires joint moments of the form , , and for s ∈ {1, 2}, Z⁽¹⁾ = Z and Z⁽²⁾ = Z Z^⊤, , , α = λ^⊤ Ω^−1/2(Z − μ), ζ₁(x) = ϕ(x)/Φ(x) with ϕ(⋅) the pdf of the standard normal distribution, and $\mathbb{A}$ is a region of the form $\mathbb{A} = A_1 \times \cdots \times A_p$ with $A_k \in \{(-\infty, a_k], (a_k, \infty)\}$ for a = (a₁, ⋯, a_p)^⊤. Proposition 1 below will be useful for the derivation of , , and .

Proposition 1 (see S3 Appendix for a proof) Let with ν = (ν, ν₀)^⊤, and set α = λ^⊤ Ω^−1/2(Z − μ). Then, for any real r > − ν and an integrable function g(⋅) of Z: (14) (15) where , , , and ; , , , with , , and .

By means of a simple linear transformation, we obtain from Proposition 1 the joint expectations , , and in terms of moments of a truncated multivariate skew t distribution.

Corollary 1 (see S4 Appendix for a proof) Let with ν = (ν, ν₀)^⊤, , and with or . Then, on setting , , A = diag(A₁, ⋯, A_p) with A_k = 1 if and A_k = −1 if , a* = Aa, μ* = A μ, , , λ* = Aλ and α_st = ST_p(a*|μ*, Ω*, λ*, ν), (16) (17) (18) (19) (20) (21) where , , and we have set and .

For a practical use of Corollary 1, the cdf of the multivariate skew t distribution is required. To this end, the function pmst of the package sn [35] for the R software environment [36] is an appropriate routine.
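Corollary 1 reduces an arbitrary orthant-type region to a right-truncated one through the reflection A = diag(A₁, ⋯, A_p), with A_k = −1 for an interval (a_k, ∞) and A_k = 1 for (−∞, a_k]; for binary outcomes truncated at zero this is A_k = 1 − 2y_k. For Python users, SciPy does not, to our knowledge, provide a multivariate skew t cdf comparable to pmst, so the sketch below only checks the reflection itself by Monte Carlo: P(Z ∈ 𝔸) should equal P(AZ ≤ Aa), where AZ follows an SGT law with location Aμ, scale AΩA and shape Aλ. The helper names and values are hypothetical; the sketch keeps the SGT form with (ν, ν₀), and Lemma 2 can convert it to a skew t scale if required.

```python
import numpy as np
from scipy.linalg import sqrtm


def rsgt(size, mu, Omega, lam, nu, nu0, rng):
    """SGT draws via the gamma-mixture representation with U ~ Gamma(shape nu/2, rate nu0/2)."""
    mu, lam, Omega = np.asarray(mu, float), np.asarray(lam, float), np.asarray(Omega, float)
    delta = np.real(sqrtm(Omega)) @ lam / np.sqrt(1.0 + lam @ lam)
    G_half = np.real(sqrtm(Omega - np.outer(delta, delta)))
    U = rng.gamma(nu / 2.0, 2.0 / nu0, size)                  # NumPy's gamma takes scale = 1 / rate
    X = np.abs(rng.standard_normal(size))
    Z0 = rng.standard_normal((size, mu.size))
    return mu + (X[:, None] * delta + Z0 @ G_half.T) / np.sqrt(U)[:, None]


def reflect(y, mu, Omega, lam):
    """Parameters of A Z with A = diag(1 - 2 y): (A mu, A Omega A, A lam)."""
    A = 1.0 - 2.0 * np.asarray(y, float)                      # +1 if y_k = 0, -1 if y_k = 1
    return A * mu, Omega * np.outer(A, A), A * lam


y = np.array([1, 0, 1])
mu, Omega, lam = np.array([0.3, -0.2, 0.5]), np.eye(3) + 0.4 * (1 - np.eye(3)), np.array([1.0, 0.0, -2.0])
nu, nu0, N = 5.0, 3.0, 400_000

# Direct estimate: fraction of SGT draws in A = (0, inf) x (-inf, 0] x (0, inf)
Z = rsgt(N, mu, Omega, lam, nu, nu0, np.random.default_rng(10))
direct = float(np.mean(np.all(np.where(y == 1, Z > 0, Z <= 0), axis=1)))

# Reflected estimate: cdf at a* = A a = 0 of the SGT law with parameters (A mu, A Omega A, A lam)
mu_s, Omega_s, lam_s = reflect(y, mu, Omega, lam)
Zs = rsgt(N, mu_s, Omega_s, lam_s, nu, nu0, np.random.default_rng(11))
reflected = float(np.mean(np.all(Zs <= 0.0, axis=1)))
print(direct, reflected)
```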

Moments of truncated multivariate skew generalized t distributions

The evaluation of the expectations involved in Corollary 1 calls for general expressions for the first and second order moments of truncated multivariate SGT distributions. These moments are required in the implementation of an EM algorithm for an SGT-distribution-based binary data model. The moments have been derived for truncated multivariate t distributions by [34] and were used by [24] in their implementation of the EM algorithm for a t-link GLMM. In this section, we present expressions for the first two moments of truncated multivariate SGT distributions, relying on Theorem 1 of [37] and on the moments of truncated multivariate t distributions available from [34] (see also [38]).

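Let Z ~ TSGt_p(μ, Ω, λ, ν, a) with ν = (ν, ν₀)^⊤ and a ∈ ℝ^p, i.e. a p-variate SGT vector restricted to the right-truncated region $\mathbb{A} = (-\infty, a_1] \times \cdots \times (-\infty, a_p]$. The pdf of Z is:

$$TSGt_p(z \mid \mu, \Omega, \lambda, \boldsymbol{\nu}, a) = \frac{SGt_p(z \mid \mu, \Omega, \lambda, \boldsymbol{\nu})}{\alpha_{st}}, \qquad z \le a \ \text{(componentwise)}, \tag{22}$$

where α_st = SGT_p(a|μ, Ω, λ, ν) and SGT_p(⋅|μ, Ω, λ, ν) is the cdf of the p-variate SGT distribution. If μ = 0, Ω is a correlation matrix (Ω = R) and ν₀ = ν, then Z ~ TSt_p(0, R, λ, ν, a). In this case, the first two moments of Z can be evaluated using the following proposition, which simply combines Theorem 3 in [34] with Theorem 1 in [37].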

Proposition 2 (see S5 Appendix for a proof) Let with R a correlation matrix. Then, (23) (24) where with T_p(⋅|μ, Ω, ν) the cdf of the p−variate t distribution with location μ, scale Ω and degrees of freedom ν; with i^th element , t(⋅) being the pdf of the standard Student t distribution; with , and ; with i^th element , ; H* is the p × p matrix with diagonal elements and off diagonal elements defined as with ; ; D* is the p × p diagonal matrix with diagonal elements , H^*i denoting the i^th column of H*; , , , , with δ_i the i^th element of δ, ρ_ij the (ij)^th element of R; Hⁱ the i^th column of H; the vector a with its (i + 1)^th element (i.e.a_i) deleted; the (i + 1)^th column of with its (i + 1)^th element (i.e. 1) deleted; , being with its (i + 1)^th row and column deleted; ; the vector with its (i + 1)^th and (j + 1)^th elements (i.e.a_i and a_j) deleted; , being with its (i + 1)^th and (j + 1)^th rows and columns deleted; being the matrix with its (i + 1)^th and (j + 1)^th columns deleted, and only its (i + 1)^th and (j + 1)^th rows kept; ; ; the vector a with its i^th element (i.e.a_i) deleted; , being R with its i^th row and column deleted; being the matrix with its first and (i + 1)^th columns deleted, and only its first and (i + 1)^th rows kept; and .

The following corollary gives the first two moments of a general right truncated SGT vector with ν = (ν, ν₀)^⊤.

Corollary 2 Let with ν = (ν, ν₀)^⊤. Then, (25) (26) where , is the i^th diagonal element of Ω, , R is the correlation matrix from Ω, , a* = Λ⁻¹(a − μ) and E{X} and E{XX^⊤} are available from Proposition 2.
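The closed forms in Proposition 2 and Corollary 2 are lengthy, so a cheap numerical cross-check is useful when implementing them. Whatever their exact form, the moments of Z and of the standardized vector X = Λ⁻¹(Z − μ), with Λ = diag(ω₁, ⋯, ω_p), ω_i² the i-th diagonal element of Ω and truncation point a* = Λ⁻¹(a − μ), are linked by linearity: E{Z} = μ + ΛE{X} and E{ZZ^⊤} = μμ^⊤ + μE{X}^⊤Λ + ΛE{X}μ^⊤ + ΛE{XX^⊤}Λ. The Python sketch below (the rejection sampler `rtsgt_right`, the other helper names and all parameter values are assumptions) draws from a right-truncated SGT law by rejection and verifies this standardization step.

```python
import numpy as np
from scipy.linalg import sqrtm


def rsgt(size, mu, Omega, lam, nu, nu0, rng):
    """SGT draws via the gamma-mixture representation with U ~ Gamma(shape nu/2, rate nu0/2)."""
    mu, lam, Omega = np.asarray(mu, float), np.asarray(lam, float), np.asarray(Omega, float)
    delta = np.real(sqrtm(Omega)) @ lam / np.sqrt(1.0 + lam @ lam)
    G_half = np.real(sqrtm(Omega - np.outer(delta, delta)))
    U = rng.gamma(nu / 2.0, 2.0 / nu0, size)
    X = np.abs(rng.standard_normal(size))
    Z0 = rng.standard_normal((size, mu.size))
    return mu + (X[:, None] * delta + Z0 @ G_half.T) / np.sqrt(U)[:, None]


def rtsgt_right(size, mu, Omega, lam, nu, nu0, a, rng, batch=100_000):
    """Rejection sampler for an SGT vector restricted to {z <= a} (componentwise)."""
    kept, n_kept = [], 0
    while n_kept < size:
        Z = rsgt(batch, mu, Omega, lam, nu, nu0, rng)
        Z = Z[np.all(Z <= a, axis=1)]                         # keep draws inside the region
        kept.append(Z)
        n_kept += len(Z)
    return np.vstack(kept)[:size]


mu, Omega, lam = np.array([0.5, -0.5]), np.array([[2.0, 0.6], [0.6, 1.0]]), np.array([-1.0, 1.5])
a = np.array([1.0, 0.0])
Z = rtsgt_right(100_000, mu, Omega, lam, 6.0, 4.0, a, np.random.default_rng(5))

# Standardization: Lambda = diag(sqrt(Omega_ii)), R = Lambda^{-1} Omega Lambda^{-1}, a* = Lambda^{-1}(a - mu)
Lam = np.diag(np.sqrt(np.diag(Omega)))
Lam_inv = np.linalg.inv(Lam)
R = Lam_inv @ Omega @ Lam_inv
a_star = Lam_inv @ (a - mu)
Xs = (Z - mu) @ Lam_inv                                       # rows are Lambda^{-1}(z - mu)
EX, EXX = Xs.mean(axis=0), Xs.T @ Xs / len(Xs)

# Moments of Z recovered from those of X by linearity
EZ = mu + Lam @ EX
EZZ = np.outer(mu, mu) + np.outer(mu, Lam @ EX) + np.outer(Lam @ EX, mu) + Lam @ EXX @ Lam
print(EZ, Z.mean(axis=0))
print(EZZ, Z.T @ Z / len(Z), sep="\n")
```

Replacing the Monte Carlo averages `EX` and `EXX` by values computed from Proposition 2 gives a direct test of an implementation of Corollary 2, up to Monte Carlo error.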

When ν → ∞, the truncated multivariate SGT family has the truncated multivariate skew normal family as a limiting case (see S5 Appendix for a definition and formulas for computing the first two moments).

Skew generalized t-link mixed binomial model

This section defines the skew generalized t-link model (SGTLM) and describes an Expectation-Maximization (EM) algorithm [39] accelerated using parameter expansion [27] for likelihood inference. Empirical Bayes estimators of random effects and weights are also obtained for cluster specific inference.

Model specification and marginal log-likelihood

The skew generalized t-link GLMM (SGTLM) considered in this work is defined as: (27) where Y_ij is the binary outcome of the j^th measurement (j = 1, 2, ⋯, n_i) on the i^th cluster (i = 1, 2, ⋯, n), Z_i is a latent continuous outcome which determines the observable , and b_i is a vector of q random effects associated with cluster i. In Eq (27), η_i = X_i β + W_i b_i, where β, X_i and W_i are as in Eq (1); , , , is the n_i-vector of all ones, , with υ₀ > 0 and ν > 2; and with and a q × q positive definite matrix.

In the SGTLM, the distribution of a single latent outcome Z_ij is where and denotes a univariate SGT distribution with location μ, scale ω², shape λ and degrees of freedom ν. Therefore, on denoting SGT(⋅|μ, ω², δ, ν) the cdf of a scalar SGT distribution , the success probability of an outcome Y_ij given the random effects b_i is . Unlike in the common probit model (PM) (see Eq (1)), for a given cluster i, the n_i latent outcomes Z_ij are not independent given the random effects b_i. Indeed, on using Eq (4) and setting , the variance-covariance matrix of Z_i given b_i is (28) so that the correlation coefficient between two elements Z_ij and Z_ik of Z_i with k ≠ j is . Thus, conditional on random effects, the n_i latent outcomes in Z_i are uncorrelated only when δ_ε = 0, i.e. a skewed link function implies correlated latent outcomes within a cluster i.

The positive constant υ₀ in the SGTLM controls the scale of the latent variable Z_ij and thus the scale of the model link function. Indeed, from Eq (28), the conditional variance of Z_ij is . Setting υ₀ = 1 would yield a skew t-link model (i.e. ν = (ν, ν)^⊤). However, to make fixed effects in the SGTLM comparable with fixed effects in the common probit model (PM), which is characterized by a link function with unit scale (i.e. Var{Z_ij|b_i} = 1), we have set (29) The application of Eq (3) to b_i shows that, as in the PM, random effects in the SGTLM have null mean vector E{b_i} = 0. Using Eq (4), the variance-covariance matrix of random effects is given by (30) When δ_ε = 0 and δ = 0, the SGTLM reduces to the t-link model in [24], except that υ₀ = 1 therein. As ν → ∞ (so that U_i = 1), the SGTLM has as a limiting case the mixed skew-probit model (SPM), which reduces to the PM for δ_ε = 0 and δ = 0.

By Eq (2), the SGTLM has the stochastic representation (31) where U_i and V_i are independent. In this representation of the SGTLM in terms of more common distributions, the n_i binary outcomes Y_ij of a cluster i are independent given the random effects b_i, the scale mixing variable U_i and the half normal variable V_i. Note that given U_i and V_i, Z_i|b_i and b_i are normally distributed and share the same U_i and V_i. As a result, the joint distribution of Z_i and b_i belongs to the class of SGT distributions. From the stochastic representation in Eq (31), we obtain the unconditional distributions of Z_i and as follows.

Proposition 3 (see S6 Appendix for a proof). Let us consider the latent vector Z_i and the binary variable Y_ij in Eq (27) and define with , , , and the related shape parameter . Then . Furthermore, the vector of binary outcomes Y_i has a multivariate Bernoulli distribution with joint probability mass function, (32) and Y_ij has a Bernoulli distribution with success probability and probability mass function, (33) where , A_ij = 1 − 2y_ij, is the j^th diagonal element of Ω_i, , and with Δ_ij and μ_ij the j^th elements of Δ_i and μ_i respectively.

For a value y_i of Y_i, Eq (32) conveniently expresses the likelihood as a cumulative probability of an ST distribution whose location, scale and shape parameters depend on y_i, using the identities P(Z_ij > z_ij) = P(−Z_ij < −z_ij) and sign(Z_ij) = 2Y_ij − 1, where sign(⋅) returns the sign of its argument. Applying Eq (4) to the distribution of Z_i given in Proposition 3, the variance-covariance matrix of the outcomes at the latent scale is (34) Thus, in a model with a cluster-specific random intercept (q = 1) with δ = 0 and , the latent intra-class correlation coefficient (the proportion of variance explained by clustering at the latent scale) is given by (35) The joint distribution of Z_i and b_i (i.e. ) is where , , , and . Thus, for j = 1, 2, ⋯, n_i and k = 1, 2, ⋯, q, the correlation between Z_ij and b_k is with the variance of b_k, the variance of Z_ij and is the covariance between Z_ij and b_k, W_ij is the j^th row of W_i and is the k^th column of , is the k^th diagonal element of and δ_k is the k^th element of δ.

The parameters of the SGTLM include β, δ_ε, δ, and ν where the vech(⋅) operator returns the lower triangle elements of its matrix argument. In order to avoid non-regular likelihood problems occurring in models based on the Student distribution and its extensions (in particular when ν is close to zero) [40], we follow some recent related works [24, 41] and first consider ν as known, focusing on . Classical inference on θ is based on the marginal likelihood of the observed data y. Using Proposition 3, the joint marginal log-likelihood of n independent clusters y_i (i = 1, 2, ⋯, n) is: (36) From Eq (36), an optimization routine like the R function optim can be used for inference on θ. We however develop an EM algorithm to circumvent the n_i-dimensional integral in Eq (32) when estimating θ.
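For intuition on direct maximization of a marginal likelihood such as Eq (36), the hedged Python sketch below treats only the simplest limiting case, the random-intercept probit model of Eq (1) (δ_ε = 0, δ = 0, ν → ∞). There the integral over b_i is one-dimensional and can be computed by Gauss-Hermite quadrature, and the likelihood can be maximized with `scipy.optimize.minimize`, playing the role of R's optim. This is not the SGTLM likelihood, whose n_i-dimensional skew t cdf motivates the EM algorithm; the function name `neg_loglik`, the number of nodes and the data-generating values are assumptions.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss
from scipy.optimize import minimize
from scipy.stats import norm


def neg_loglik(theta, y, X, cluster, n_nodes=20):
    """Minus the marginal log-likelihood of a random-intercept probit via Gauss-Hermite quadrature."""
    p = X.shape[1]
    beta, sigma = theta[:p], np.exp(theta[p])                # log-parametrized random-intercept SD
    nodes, weights = hermgauss(n_nodes)
    b = np.sqrt(2.0) * sigma * nodes                          # b_k = sqrt(2) sigma x_k
    s = 2.0 * y - 1.0                                         # sign(Z_ij) = 2 y_ij - 1
    logp = norm.logcdf(s[:, None] * ((X @ beta)[:, None] + b[None, :]))
    per_cluster = np.zeros((cluster.max() + 1, n_nodes))
    np.add.at(per_cluster, cluster, logp)                     # sum_j log P(y_ij | b_k) within clusters
    terms = per_cluster + np.log(weights / np.sqrt(np.pi))    # quadrature weights w_k / sqrt(pi)
    mx = terms.max(axis=1, keepdims=True)                     # log-sum-exp for numerical stability
    return -float(np.sum(mx[:, 0] + np.log(np.exp(terms - mx).sum(axis=1))))


# Simulated data under Eq (1) with q = 1 (illustrative settings)
rng = np.random.default_rng(42)
n, m = 300, 8
cluster = np.repeat(np.arange(n), m)
X = np.column_stack([np.ones(n * m), rng.normal(size=n * m)])
beta_true, sigma_true = np.array([-0.5, 1.0]), 1.0
b_sim = rng.normal(scale=sigma_true, size=n)
y = (X @ beta_true + b_sim[cluster] + rng.normal(size=n * m) > 0).astype(float)

fit = minimize(neg_loglik, x0=np.zeros(3), args=(y, X, cluster), method="BFGS")
beta_hat, sigma_hat = fit.x[:2], float(np.exp(fit.x[2]))
print(beta_hat.round(3), round(sigma_hat, 3))
```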

Model identifiability

If the model is not identifiable, estimation in the skew generalized t-link model (SGTLM) may produce inconsistent results, leading to unreliable and misleading conclusions. It is thus of great importance to check whether different points in the parameter space can be distinguished from the observations y_i. In this section, we analyse the identifiability of the SGTLM and indicate when the parameter space must be restricted to ensure reliable inference from observed data. We restrict attention to the case υ₀ = 1 (ignoring Eq (29)), since υ₀ is an artificial device included only to ensure a unit variance of the conditional link function, as in the traditional probit mixed model.

Although not sufficient, the identifiability of the SGTLM requires both the marginal random effects distribution and the conditional model given random effects to be identifiable. The identifiability of the random effects distribution follows from the identifiability of multivariate skew t distributions. We survey the identifiability of the conditional model before turning to the marginal model.

• Conditional identifiability

Conditional on the random effects b_i and for a fixed degrees of freedom parameter ν, the identifiability of the SGTLM reduces to the identifiability of the fixed effects skew-probit model. This follows because the skew t inverse link function is an average of the skew-probit inverse link function with respect to the gamma mixing distribution. The identifiability of the skew-probit model with one covariate has recently been shown to depend on the nature (binary/continuous) of the covariate in the model [42]. Indeed, the fixed effects skew-probit model is not identifiable in the absence of any covariate (i.e. each X_i is a column of n_i ones) [42] (page 1624, Proposition 2.1) or in the presence of a binary covariate [42] (page 1626, Proposition 2.2). On the other hand, the fixed effects skew-probit model is identifiable when the covariate is continuous [42] (page 1627, Proposition 2.3). Extension to the case of multiple covariates is obtained straightforwardly by requiring the covariate matrix to be of full column rank, as in the classical linear regression context. Whenever only binary covariates or no covariates are considered, we advocate setting δ_ϵ = 0 so that the conditional model reduces to a classical probit model.
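The lack of identifiability without covariates can be seen directly: in an intercept-only model all observations share one success probability, so any shape value can be paired with an intercept that reproduces it. The Python sketch below illustrates this with SciPy's skew normal, taking P(Y = 1) = P(Z > 0) for Z ~ SN(β₀, 1, λ). This unit-scale parameterization, the target probability and the sample y are assumptions for illustration only (the SPM in this paper uses its own scaling), so the example is qualitative.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import skewnorm

P_TARGET = 0.3  # hypothetical common success probability of an intercept-only model


def success_prob(beta0, lam):
    """P(Y = 1) = P(Z > 0) with Z ~ SN(beta0, 1, lam), an illustrative intercept-only skew-probit."""
    return float(skewnorm.sf(0.0, lam, loc=beta0))


def bernoulli_loglik(y, prob):
    """Log-likelihood of i.i.d. binary outcomes y with common success probability prob."""
    return float(np.sum(y * np.log(prob) + (1 - y) * np.log1p(-prob)))


solutions = {}
for lam in (-3.0, 0.0, 3.0):
    # For each shape lam, solve for the intercept giving the same success probability
    solutions[lam] = brentq(lambda b0: success_prob(b0, lam) - P_TARGET, -10.0, 10.0)

y = np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0])                  # any observed sample
loglik = {lam: bernoulli_loglik(y, success_prob(b0, lam)) for lam, b0 in solutions.items()}
for lam, b0 in solutions.items():
    print(f"lam = {lam:+.1f}: beta0 = {b0:+.4f}, log-likelihood = {loglik[lam]:.8f}")
```

Distinct (β₀, λ) pairs yield identical log-likelihoods, so the data cannot separate them; a binary covariate only adds a second group probability and leaves the same kind of ridge.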

• Marginal identifiability

From Proposition 3, it appears that when υ₀ = 1 the parameters δ_ε and δ enter the marginal distribution of Y_i only through the marginal working shape , whose j^th element can be written . As a result, caution is required when learning the model parameters from realizations y_i of Y_i. Indeed, if the model includes a random intercept term, i.e. W_ij has the form where , we can partition the random effects working shape as with so that the j^th element of the marginal working shape reads . Therefore, only the sum δ_ϵ + δ₀ can be estimated, and it is not possible to distinguish, in the observed skewness, the part due to the random intercept from the part due to the conditional link function. This confounding issue may be avoided by considering more complex models based, for instance, on the fundamental skew distributions [43]. Recall that skewness is introduced in the SGTLM through a hidden standard half normal variable, namely V_i. Whereas the SGTLM uses a single standard half normal variable V_i for both Z_i and b_i, using two different standard half normal hidden variables for Z_i and b_i [44] (page 667, eq 5-6) or two different standard half multivariate normal hidden vectors [45] (page 420, eq 2.2) removes the confounding problem.

Fortunately, the non-identifiability of the skewness of the conditional link function and of the random intercept does not affect the success probability of the response, since this probability depends only on Δ_i and not on the individual values of δ_ϵ and δ₀. However, since the conditional link scale depends on δ_ϵ through υ₀, the confounding problem affects the scale and thus the interpretation of the fixed effects. Moreover, inference on the random intercept is affected, since the random intercept variance and skewness depend on δ₀. For example, δ_ϵ + δ₀ = 0 only indicates null marginal skewness, and in no way the absence of link function and random intercept skewness, which could be equally strong but of opposite signs. Thus, only a lower bound can be given for the random intercept variance: where is the first diagonal element of . To rule out this situation where the model is not fully identifiable marginally, some previous works on skew normal/skew t distributions have imposed the restriction δ_ϵ = 0 (regardless of the presence of a random intercept term), for instance in linear mixed effects models [30, 46, 47] (page 1492 eq 2, page 4100 eq 4, and page 309 eq 3.2 respectively), multivariate measurement error models [48] (page 35, Eq 4.11) and nonlinear mixed effects models [49] (page 7 eq 10), but no argument was given to support this choice.
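
The confounding can also be checked numerically. For illustration only, the sketch below assumes the probit limit (ν → ∞, U = 1) and a single observation with latent Z = β₀ + δ_ε V₁ + b₀ + ε, where b₀ = δ₀ V₂ + σ₀ N, ε and N are standard normal, and V₁ and V₂ are standard half-normal. Setting V₁ = V₂ mimics the single hidden variable of the SGTLM, while independent V₁ and V₂ mimic the two-variable construction of [44]. The function names are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def half_normal_rule(n_grid=400, v_max=10.0):
    """Simpson nodes on [0, v_max] with weights that include the density 2*phi(v)."""
    h = v_max / n_grid
    rule = []
    for k in range(n_grid + 1):
        v = k * h
        simpson = 1 if k in (0, n_grid) else (4 if k % 2 else 2)
        dens = 2.0 * math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)
        rule.append((v, simpson * dens * h / 3.0))
    return rule


RULE = half_normal_rule()


def prob_shared(beta0, d_eps, d0, sigma0):
    """P(Y = 1) when one hidden V enters both the link (d_eps) and b0 (d0).

    Assumed illustrative model, probit limit; given V = v,
    Z ~ N(beta0 + d_eps*v + d0*v, 1 + sigma0^2).
    """
    c = math.sqrt(1.0 + sigma0 ** 2)  # sd of eps + sigma0 * N
    return sum(w * norm_cdf((beta0 + d_eps * v + d0 * v) / c) for v, w in RULE)


def prob_separate(beta0, d_eps, d0, sigma0):
    """P(Y = 1) with independent hidden variables V1 (link) and V2 (intercept)."""
    c = math.sqrt(1.0 + sigma0 ** 2)
    return sum(
        w1 * w2 * norm_cdf((beta0 + d_eps * v1 + d0 * v2) / c)
        for v1, w1 in RULE
        for v2, w2 in RULE
    )
```

With a shared hidden variable, (δ_ε, δ₀) = (1, 0), (0, 1) and (2, −1) give the same success probability, and (1, −1) cannot be told apart from (0, 0). With two independent hidden variables, the pairs (1, 0) and (2, −1) lead to different probabilities.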

In the very common situation where the mixed model includes a random intercept term, prior information on the shape of the link function and/or the random intercept is required to place a meaningful restriction on the parameter space, for instance δ_ϵ = 0, δ₀ = 0 or δ_ϵ = δ₀. In the absence of such information, we advocate the restriction δ₀ = 0, because the success probability of a response may exhibit skewness irrespective of the presence of random effects. This restriction thus allows one to recover a fixed effects skew generalized t-link model when no random effect is considered. Overall, the restriction δ₀ = 0 simply expresses the inability of the SGTLM to capture any additional skewness structure from the data through the inclusion of a random intercept only. For completeness, we develop in the next section an estimation procedure for the full model in Eq (27), since any equality restriction on δ_ϵ and δ₀ can be straightforwardly reduced to δ₀ = 0.

The two restrictions discussed in this section are related to the structure of the quantity Δ_i and are required only for specific data structures (models including a binary covariate, or models including a random intercept). However, even when a restriction is required on δ_ε or on the first element of δ, the quantity Δ_i itself remains unrestricted. A restriction simply forces the skewness in the data to be summarized either by δ_ε or by the elements of δ. Overall, the SGT-link function is always allowed to be skewed (unconditional link), but some designs do not allow one to distinguish skewness in the conditional link function from skewness in the distribution of the random effects.

Maximum likelihood inference

Estimation via the EM algorithm.

The choice of the value of υ₀ in Eq (29) is in line with one of our purposes: rescaling the fixed effects so that they have the same interpretation as in the mixed probit model. There is no need to define υ₀ differently across situations, because υ₀ is fixed in our proposal. However, since υ₀ is simply a scaling factor, it may be given any positive value during estimation, as long as the estimates are rescaled after convergence of the estimation procedure so that υ₀ is finally given by Eq (29). In practice, because the available routines are written for standard skew t distributions, we used υ₀ = 1 in the EM algorithm and rescaled the estimates at the end of the procedure. Let us consider the complete data . Because y_i only retains the signs of the elements of z_i, the joint density of y_i and z_i is where with if y_ij = 0 and if y_ij = 1. The density of is thus . Hence, by Bayes’ rule and in light of Eq (27) with υ₀ = 1, the density of is: (37) where f_V(v_i) = 2ϕ(v_i) I_(0,∞)(v_i) and f_G(⋅|ν/2, ν/2) is given in Eq (9). By Eq (37), and on setting and , the complete data log-likelihood is: (38) where , and tr(⋅) is the trace operator. Let θ̂^(k) denote the estimate of θ at the k^th EM iteration. The E-step of the (k + 1)^th iteration computes the expectation of ℓ(⋅|y_com) given the observed data y and the current parameter estimate θ̂^(k): (39) where and so that we have Note that the conditional expectation of is 0 since, given y_i, . The E-step thus reduces to the computation of the conditional expectations , , , , , , , , , and . The expressions for these expectations (except ) are given in the following result, where we drop the superscript (k) for simplicity.

Proposition 4 (see S7 Appendix for a proof). Consider the random variables Y_ij, Z_i, b_i, U_i, and V_i as defined in Eq (27) with υ₀ = 1, and an update of the model parameter θ. Let , , , , , , , , , . Then: (40) (41) (42) (43) (44) (45) (46) where , , , , and the expectations , , , and are to be evaluated directly using Corollary 1 applied to the conditional latent vector , with if y_ij = 0 and if y_ij = 1.
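
Corollary 1 is not restated here. As a hedged illustration of the kind of quantity it delivers, the sketch below computes the first two moments of the latent variable in the simplest limiting case: ν → ∞, no skewness, and a single unit-variance latent Z ~ N(μ, 1). Conditioning on y then truncates Z to (0, ∞) if y = 1 and to (−∞, 0] if y = 0 (an assumed sign convention). The multivariate skew t moments required by Eqs (40)–(46) are considerably more involved.

```python
import math


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def truncated_moments(mu, y):
    """E[Z | y] and E[Z^2 | y] for Z ~ N(mu, 1) truncated by the observed sign.

    Limiting normal case assumed for illustration: y = 1 keeps Z > 0,
    y = 0 keeps Z <= 0.
    """
    s = 1.0 if y == 1 else -1.0
    lam = norm_pdf(mu) / norm_cdf(s * mu)   # inverse Mills ratio of the event s*Z > 0
    m1 = mu + s * lam
    m2 = 1.0 + mu * mu + s * mu * lam
    return m1, m2
```

For example, truncated_moments(0.0, 1) returns the half-normal moments (√(2/π), 1).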

The M-step maximizes the expected complete data log-likelihood in Eq (39) jointly over θ. This yields the following updating expressions for θ.

Proposition 5 (see S8 Appendix for a proof). Consider an identifiable SGTLM as defined in Eq (27) with υ₀ = 1, and an estimate of θ. Set , , , , and . At EM iteration (k + 1), the updates of β, δ_ε, δ and are given by: (47) (48) (49) (50)
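
To show the E-step/M-step pattern of Propositions 4 and 5 in its simplest form, the sketch below runs EM for a fixed effects probit model with one covariate. This is the limiting case with no random effects, no skewness and U = 1, chosen for brevity; it is not the SGTLM algorithm. The E-step replaces each latent Z_j by its truncated-normal conditional mean given y_j, and the M-step is an ordinary least-squares fit of those means on [1, x]. The recorded observed-data log-likelihood should never decrease, which is the general EM property the full procedure also relies on. The function names are hypothetical.

```python
import math
import random


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simulate(n, beta, seed):
    """Draw (x_j, y_j) from Y = 1(beta0 + beta1*x + eps > 0), x and eps ~ N(0, 1)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [1 if beta[0] + beta[1] * xj + rng.gauss(0.0, 1.0) > 0 else 0 for xj in x]
    return x, y


def loglik(beta, x, y):
    """Observed-data probit log-likelihood."""
    return sum(
        math.log(norm_cdf((beta[0] + beta[1] * xj) * (1 if yj == 1 else -1)))
        for xj, yj in zip(x, y)
    )


def em_probit(x, y, n_iter=200):
    """EM with latent Z_j ~ N(beta0 + beta1*x_j, 1) observed only through y_j."""
    n = len(x)
    sx, sxx = sum(x), sum(xj * xj for xj in x)
    det = n * sxx - sx * sx          # determinant of X'X for the design [1, x]
    beta = [0.0, 0.0]
    trace = [loglik(beta, x, y)]
    for _ in range(n_iter):
        # E-step: truncated-normal conditional mean of each latent Z_j.
        ez = []
        for xj, yj in zip(x, y):
            eta = beta[0] + beta[1] * xj
            s = 1.0 if yj == 1 else -1.0
            ez.append(eta + s * norm_pdf(eta) / norm_cdf(s * eta))
        # M-step: least squares of E[Z] on [1, x] via the closed-form 2 x 2 inverse.
        sz = sum(ez)
        sxz = sum(xj * zj for xj, zj in zip(x, ez))
        beta = [(sxx * sz - sx * sxz) / det, (n * sxz - sx * sz) / det]
        trace.append(loglik(beta, x, y))
    return beta, trace
```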

At convergence of the EM algorithm, we obtain the estimate of θ. The corresponding estimate of the variance-covariance matrix of random effects is . More generally, when ν is unknown, the M-step of the EM algorithm can be extended to include a profiled marginal log-likelihood maximization step. Indeed, at EM iteration k, the estimate of θ depends on ν only through , and the profiled marginal log-likelihood L_ν(⋅) for ν can be obtained by simply substituting for θ in Eq (36). We can thus find using a one-dimensional optimization routine (e.g. optimize in R) to maximize L_ν(⋅). Then, the update of the parameter θ becomes . Using the profiled marginal log-likelihood instead of a profiled version of can provide substantial gains in efficiency [50] and, above all, bypasses the calculation of , which has no known closed form.

It is worth noting, however, that the inclusion of a profiled marginal log-likelihood maximization step would prevent the convergence of the whole estimation procedure if the marginal log-likelihood in Eq (36), as a function of ν, is unbounded for a particular dataset. This issue arises especially when ν is close to zero. Another challenge associated with the estimation of ν is computing time: this strategy requires a very fast routine to compute cumulative probabilities of skew t distributions. As an alternative route to estimate ν, we point out the model selection approach of [51] (page 893). It consists of setting a grid of feasible values of ν and obtaining the corresponding sequence of estimates of θ. Then, the value of ν and the associated estimate of θ that maximize the marginal log-likelihood in Eq (36) are taken as the estimates of ν and θ.
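
The two ways of handling ν can be contrasted on a problem small enough to write out. Because the marginal log-likelihood in Eq (36) needs multivariate skew t probabilities, the sketch below uses, as a clearly labelled stand-in, the log-likelihood of ν for an i.i.d. standard Student t sample. It maximizes this log-likelihood (a) by golden-section search, a simpler relative of the golden-section/parabolic-interpolation method behind R's optimize, and (b) over the grid ν = 2.5, 2.6, …, 15 used later in the application. The data are drawn through the normal/gamma scale-mixture representation with U ~ Gamma(ν/2, ν/2). All names are hypothetical.

```python
import math
import random


def t_loglik(nu, x):
    """Stand-in objective: log-likelihood of nu for i.i.d. t data (location 0, scale 1)."""
    const = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return sum(const - (nu + 1) / 2 * math.log1p(xi * xi / nu) for xi in x)


def golden_max(f, lo, hi, tol=1e-6):
    """Golden-section search for the maximizer of a unimodal f on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:          # maximizer lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:                # maximizer lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def grid_max(f, grid):
    """Grid search: the grid value with the largest objective."""
    return max(grid, key=f)


def simulate_t(n, nu, seed):
    """t variates as N(0, 1) / sqrt(U) with U ~ Gamma(shape nu/2, rate nu/2)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(nu / 2, 2.0 / nu))
            for _ in range(n)]
```

For a well-behaved profile both answers agree to within the grid step. When the profile is monotone over the search range, golden-section search simply drifts to an end point, which is one reason to inspect the whole grid.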

Accelerating EM via parameter-expansion.

Besides its attractiveness and stability for handling incomplete data models, the EM algorithm sometimes converges slowly, which has motivated many methods to accelerate its linear convergence. Among popular EM accelerators, the parameter-expanded (PX) EM algorithm was proposed by [27] to speed up convergence. Let us consider a complete data model F(y_com|θ). The PX-EM algorithm expands F(y_com|θ) to a larger model F_X(y_com|Θ) parameterized by Θ = (θ_⋆, α), where θ_⋆ plays in F_X(y_com|Θ) the role of θ in F(y_com|θ) and α is a working parameter. The use of the PX-EM algorithm requires that (i) α admits a value α₀ that preserves the original complete data model and (ii) the observed-data model is preserved by a many-to-one reduction function R: Θ ↦ θ = R(Θ), which allows θ to be unambiguously recovered from Θ. We refer to [27] for more details. For the SGTLM, let us consider the following expanded complete data model obtained by including a working q × q scale matrix α into the linear predictor as η_i = X_i β_⋆ + W_i α b_i: This expanded model equals the SGTLM in Eq (27) when α takes the value α₀ = I_q, and has expanded parameter where vec is the usual operator which stacks the columns of its matrix argument. Under this model, the marginal distribution of Y_i remains as given in Eq (32) with and so that Θ reduces as . As the observed-data model is preserved whatever the value of α, we fix α = I_q at each E-step of the EM procedure. Therefore, the E-step of the PX-EM algorithm uses Proposition 4 to obtain the conditional expectations required in Eq (39), as for the classical EM algorithm. At the M-step, the estimates of δ_⋆ and are still given by Eqs (49)–(50) respectively, whereas the estimates of δ_ε⋆, β_⋆ and vec(α) are: (51) (52) where , , and with ⊗ the Kronecker (direct) product operator. Using the reduction function, the original model parameter estimates can be recovered as , , and .
In the neighbourhood of the ML estimate of θ, the working scale estimate becomes close to α₀ = I_q [27], so that the advantage of the PX-EM algorithm over the classical EM algorithm disappears. We thus propose to stop the PX acceleration once |λ_max| < ϵ, where λ_max is the dominant eigenvalue of and ϵ is a pre-specified tolerance (e.g. ϵ = 10⁻²).
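
The SGTLM expansion through α is not reproduced here. As a hedged illustration of the PX-EM idea, the sketch below uses a classic PX-EM example: a univariate Student t location-scale model with known ν. Expanding the gamma weights by a working scale α gives the estimate α̂ = Σ_i w_i / n, and the reduction σ² = σ²_⋆ / α̂ replaces the EM divisor n by Σ_i w_i. At the fixed point Σ_i w_i = n, so α̂ returns to α₀ = 1, mirroring the behaviour of the working scale near the ML estimate. The function names are hypothetical.

```python
import math
import random


def fit_t(x, nu, px, tol=1e-10, max_iter=100_000):
    """EM (px=False) or PX-EM (px=True) for the location and scale of a
    univariate Student t with known nu (illustrative toy model, not the SGTLM)."""
    n = len(x)
    mu, s2 = sum(x) / n, 1.0                  # crude starting values
    for it in range(1, max_iter + 1):
        # E-step: E[U_i | x_i] for the gamma mixing weight.
        w = [(nu + 1.0) / (nu + (xi - mu) ** 2 / s2) for xi in x]
        sw = sum(w)
        mu_new = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ss = sum(wi * (xi - mu_new) ** 2 for wi, xi in zip(w, x))
        alpha = sw / n                        # working-parameter estimate (alpha0 = 1)
        s2_new = ss / sw if px else ss / n    # PX reduction divides by sw = n * alpha
        done = abs(mu_new - mu) < tol and abs(s2_new - s2) < tol
        mu, s2 = mu_new, s2_new
        if done:
            return mu, s2, it, alpha
    raise RuntimeError("did not converge")


def simulate_t(n, mu, s2, nu, seed):
    """Draw mu + sqrt(s2) * N(0, 1) / sqrt(U) with U ~ Gamma(shape nu/2, rate nu/2)."""
    rng = random.Random(seed)
    return [mu + math.sqrt(s2) * rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(nu / 2, 2.0 / nu))
            for _ in range(n)]
```

Both variants share the same E-step and converge to the same estimates; only the scale update differs. The PX version typically needs fewer iterations.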

Summary of the estimation procedure.

The estimation procedure starts with an initial parameter value θ̂^(0) and k = 0, iterates steps 1)–6) below until convergence, and then performs the rescaling step 7).

1) E-step: compute the conditional expectations defined by Eqs (40)–(46) with .
2) PX M-step: obtain and using Eqs (49)–(52) and the reduction function: , , and .
3) Test: compute λ_max, the dominant eigenvalue of . If |λ_max| < 10⁻², compute the marginal likelihood using Eq (36) and go to 4) with k = k + 1; otherwise return to 1) with k = k + 1.
4) E-step: compute the conditional expectations defined by Eqs (40)–(46) with .
5) M-step: obtain using Eqs (47)–(50).
6) Test: compute the marginal likelihood using Eq (36). If the convergence criterion is met, go to 7); otherwise return to 4) with k = k + 1.
7) Rescaling: compute υ₀ using Eq (29) and rescale the estimates as , and . Return .

Approximate standard errors.

To allow asymptotic inference in the SGTLM, we follow the empirical information-based method of [52] (pages 132-133) to compute the asymptotic variance-covariance matrix of the ML estimate of θ under some general regularity conditions. The observed information matrix is defined to be where , , being the contribution of the single observation y_i to the expected complete data log-likelihood in Eq (39). On setting , the elements of the score g_i can be explicitly evaluated using: (53) (54) (55) (56) Afterwards, the standard errors of the estimated model parameters are approximated by the square roots of the diagonal elements of , and confidence intervals can be built assuming asymptotic normality.
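
The SGTLM scores in Eqs (53)–(56) are not reproduced here, so the sketch below only illustrates the generic step. Given per-cluster score vectors g_i evaluated at the ML estimate, it forms the empirical information Σ_i g_i g_iᵀ − S Sᵀ / n with S = Σ_i g_i. This is one common form of the empirical information, assumed for illustration; S is close to zero at the ML estimate. The sketch then inverts the matrix and takes square roots of its diagonal. The test uses a normal model as a stand-in, and the function names are hypothetical.

```python
import math


def invert(mat):
    """Gauss-Jordan inverse (partial pivoting) of a small square matrix."""
    n = len(mat)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(mat)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def empirical_se(scores):
    """Standard errors from sum_i g_i g_i^T - S S^T / n, where g_i is the
    score contribution of cluster i (at the ML estimate) and S = sum_i g_i."""
    n, p = len(scores), len(scores[0])
    S = [sum(g[k] for g in scores) for k in range(p)]
    info = [[sum(g[a] * g[b] for g in scores) - S[a] * S[b] / n for b in range(p)]
            for a in range(p)]
    cov = invert(info)
    return [math.sqrt(cov[k][k]) for k in range(p)]
```

For a normal sample with unknown mean and variance, the standard error of the mean obtained this way is close to σ̂/√n.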

Empirical Bayes estimators of random effects and weights.

In this section, we provide the empirical Bayes estimators of cluster specific random effects and weights that are useful for evaluating individual intercepts and slopes as well as detecting outlying individuals. From Eq (27), the distribution of b_i conditional on Z_i = z_i, U_i = u_i and V_i = v_i is multivariate normal with mean and covariance matrix where , s_i = δ − r_i Δ_i, , , and . The conditional mean of b_i given Y_i = y_i is thus: (57) where , , and the quantities , , and are to be evaluated using Corollary 1 applied to with , , , if y_ij = 0 and if y_ij = 1. The empirical Bayes estimators of b_i can then be obtained as .

To detect outlying individuals, the individual weights U_i are predicted by their conditional expectation given Y_i = y_i [53], which is given by Eq (16) in Corollary 1 applied to Z_i|Y_i = y_i. The empirical Bayes estimators of U_i are thus given by . Relatively low weights (< 1) indicate outlying individuals.
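
The truncated skew t computation behind Eq (16) is not reproduced here. As a simplified and hedged illustration, the sketch below assumes that the latent vector z were observed and followed a p-variate t distribution with location μ, diagonal scale matrix and ν degrees of freedom, written as N(μ, Σ/U) with U ~ Gamma(ν/2, ν/2). In that case E[U | z] = (ν + p)/(ν + d²), where d² is the squared Mahalanobis distance, so the weight falls below 1 exactly when d² > p. The function names are hypothetical.

```python
def predicted_weight(z, mu, scale_diag, nu):
    """E[U | z] for z ~ t_p(mu, diag(scale_diag), nu), i.e. z | U ~ N(mu, Sigma / U)
    with U ~ Gamma(nu/2, nu/2). Simplified case: z treated as observed."""
    p = len(z)
    d2 = sum((zi - mi) ** 2 / si for zi, mi, si in zip(z, mu, scale_diag))
    return (nu + p) / (nu + d2)


def flag_outliers(vectors, mu, scale_diag, nu, threshold=1.0):
    """Indices of vectors whose predicted weight falls below the threshold."""
    return [i for i, z in enumerate(vectors)
            if predicted_weight(z, mu, scale_diag, nu) < threshold]
```

With p = 6 and ν = 5, a vector at the centre receives weight 11/5 = 2.2, while a vector four scale units away in every coordinate receives about 0.11 and would be flagged.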

Applications

This section presents a simulation study for assessing performance of SGTLM, and an application of the modeling approach to a real dataset.

Simulation study

We conducted a simulation study to evaluate the proposed approach to the analysis of correlated binary data. The simulation experiment targeted four specific objectives. First, it assessed, for different sample sizes, the ability of the probit (PM), skew-probit (SPM), generalized t-link (GTLM) and skew generalized t-link (SGTLM) models to recover population parameters when the common normality assumption for the link function is either violated or not. The widely used logistic model was not investigated because the logistic distribution is closely approximated by a Student t distribution [8], so that the logistic model is approximately covered by the GTLM. Second, the experiment evaluated the extent to which asymptotic 95% confidence intervals (CI_95%) can detect the presence of spurious skewness. Third, the experiment evaluated the ability of empirical Bayes estimators of random effects to predict true random effects. Finally, the simulation study assessed the ability of Akaike’s information criterion (AIC), Schwarz’s Bayesian information criterion (BIC) and the Hannan-Quinn criterion (HQ) to select the correct fitting model. All computations were performed in R.

Simulation design.

Mimicking the structure of the simulation model studied in [24] (page 1116), we considered the following GLMM: where η_i = (η_i1, ⋯, η_i6)^⊤, η_ij = β₀ + β₁ X_1i + b_0i + b_1i W_1ij, b_i = (b_0i, b_1i)^⊤; X₁ is a dichotomous covariate (Bernoulli distribution with success probability 0.5) and W₁ is a continuous occasion-varying random covariate (standard normal distribution); β₀ is an intercept, i.e. the general mean of the linear predictor η_ij, and β₁ is the fixed effect associated with the covariate X₁, with values arbitrarily fixed to β₀ = 1 and β₁ = −1; b_0i is a random intercept associated with cluster i and b_1i is the random slope associated with W_1ij; is a 2 × 2 scale matrix with diagonal elements 0.5 and 1 and off-diagonal element 0.25; is a positive distribution with finite first two negative moments, i.e. for t = 1, 2; and . We considered δ = (0, δ₁)^⊤, i.e. a null random intercept skewness, to ensure the identifiability of the model.

Under this general class of SSMN latent models, we considered two data models. The first is the probit data model, where U = 1, δ₁ = 0 and δ_ε = 0 (probit link, υ₀ = 1). The second is the skew generalized t-link data model with δ_ε = −2, δ₁ = 2 and ν = 5 (υ₀ = 0.4598). For each data model, we considered sample sizes (n) of 100, 500 and 1000, and thus generated three sets of covariates which were used for all simulations involving each of the two data models. Under each of the six resulting simulation settings, we generated 250 datasets to which we fitted the four models under evaluation (PM, SPM, GTLM, SGTLM), treating the degrees of freedom as known and equal to ν = 5 for the SGTLM. Fixed effects (β₀ and β₁) and skewness parameters (δ_ϵ and δ₁) were initialized to zero, whereas the scale matrix was initialized to the 2 × 2 identity matrix.
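
A minimal sketch of the probit data model of this design (U = 1, δ₁ = 0, δ_ε = 0) is given below. It assumes that the random effects are then N(0, Ψ), with Ψ the 2 × 2 scale matrix stated above, and that Y_ij = 1 when the latent η_ij + ε_ij is positive, ε_ij being standard normal. The SGT-link data model is not sketched because Eq (27) is not restated in this section. The function names and seeds are hypothetical.

```python
import math
import random

BETA = (1.0, -1.0)                 # beta0, beta1
PSI = ((0.5, 0.25), (0.25, 1.0))   # assumed covariance of (b0i, b1i) when U = 1, delta = 0


def chol2(m):
    """Cholesky factor of a 2 x 2 positive-definite matrix."""
    l11 = math.sqrt(m[0][0])
    l21 = m[1][0] / l11
    l22 = math.sqrt(m[1][1] - l21 * l21)
    return ((l11, 0.0), (l21, l22))


def simulate_probit_clusters(n, n_i=6, seed=0):
    """Draw rows (i, X1_i, W1_ij, y_ij) and the true b_i from the probit data model."""
    rng = random.Random(seed)
    L = chol2(PSI)
    data, b = [], []
    for i in range(n):
        x1 = 1 if rng.random() < 0.5 else 0          # Bernoulli(0.5) covariate
        e0, e1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        b0 = L[0][0] * e0                              # random intercept
        b1 = L[1][0] * e0 + L[1][1] * e1               # random slope
        b.append((b0, b1))
        for _ in range(n_i):
            w1 = rng.gauss(0.0, 1.0)                   # occasion-varying covariate
            eta = BETA[0] + BETA[1] * x1 + b0 + b1 * w1
            z = eta + rng.gauss(0.0, 1.0)              # latent response, probit link
            data.append((i, x1, w1, 1 if z > 0 else 0))
    return data, b
```

Keeping the true b_i alongside the responses makes it possible to compare them later with empirical Bayes predictions.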

Performance measures.

In addition to estimates of the fixed effects (β̂_k, k = 0, 1) and of the skewness parameters (δ̂_l for SPM and SGTLM, l = ϵ, 1), with their empirical standard errors and CI_95%, we recorded the variances of the random effects b_0i and b_1i and their covariance (σ₁₂), with approximate standard errors derived using the delta method [54] as implemented in the R package car [55], the empirical Bayes estimates of individual random effects, and the AIC, BIC and HQ criteria, defined as AIC = −2ℓ̂ + 2N_p, BIC = −2ℓ̂ + N_p log N and HQ = −2ℓ̂ + 2N_p log(log N), where ℓ̂ is the maximized log-likelihood value, N is the total number of observations and N_p is the number of estimated model parameters. These data were used to compute various performance measures (Table 1), including the relative bias (%Bias) and the root mean square error (RMSE) of the estimates; the standard deviations (SD) of the estimates; the quadratic mean of their standard errors; the coverage probabilities of β̂_k and δ̂_l, i.e. the proportion of times the CI_95% for β_k or δ_l included the true value; the arithmetic mean of the squared Pearson correlation (coefficient of determination) between simulated random effects and their empirical Bayes estimates; and the arithmetic means of the AIC, BIC and HQ criteria across the 250 simulated datasets per simulation setting.
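
The performance measures can be computed as in the sketch below. Table 1 is not reproduced here, so the formulas are common definitions assumed for illustration: signed relative bias, RMSE around the true value, SD with divisor R − 1, quadratic mean of standard errors, Wald-interval coverage and squared Pearson correlation. These are combined with the usual AIC = −2ℓ̂ + 2N_p, BIC = −2ℓ̂ + N_p log N and HQ = −2ℓ̂ + 2N_p log(log N). The function names are hypothetical.

```python
import math


def performance(estimates, ses, truth, z=1.959964):
    """Summaries over R replications for one parameter (assumed definitions).
    %Bias is undefined when truth == 0 (e.g. spurious skewness)."""
    r = len(estimates)
    mean = sum(estimates) / r
    return {
        "pct_bias": 100.0 * (mean - truth) / truth,
        "rmse": math.sqrt(sum((e - truth) ** 2 for e in estimates) / r),
        "sd": math.sqrt(sum((e - mean) ** 2 for e in estimates) / (r - 1)),
        "qm_se": math.sqrt(sum(s * s for s in ses) / r),
        # Proportion of Wald 95% intervals estimate +/- z*se containing the truth.
        "coverage": sum(abs(e - truth) <= z * s for e, s in zip(estimates, ses)) / r,
    }


def info_criteria(loglik, n_params, n_obs):
    """AIC, BIC and Hannan-Quinn criteria (smaller is better)."""
    return {
        "AIC": -2.0 * loglik + 2.0 * n_params,
        "BIC": -2.0 * loglik + n_params * math.log(n_obs),
        "HQ": -2.0 * loglik + 2.0 * n_params * math.log(math.log(n_obs)),
    }


def r_squared(a, b):
    """Squared Pearson correlation between true and predicted random effects."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab * sab / (saa * sbb)
```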

Table 1. Measures of the performance of binomial fitting models.

https://doi.org/10.1371/journal.pone.0249604.t001

Simulation results.

Simulation results presented in Tables 2 and 3 show that, under the probit data generation mechanism, the probit, skew-probit, generalized t (GT)-link and skew generalized t (SGT)-link models all recovered the population parameter values. Indeed, the relative bias was below 5% at all levels for fixed effects and below 20% for variance components. We particularly noticed a high relative bias in the variance component (%Bias = 17.33) under the probit fit to probit data (i.e. the true model) with small sample size (n = 100). This may be explained by the maximum likelihood estimation method, which is known to yield biased variance components [56]; restricted (residual) maximum likelihood estimation could reduce this bias [56]. However, the estimation improves as the sample size increases, and the empirical standard error estimates agree with the standard deviations from the simulations. The results for n = 100 and n = 500 are consistent with findings in [24], where empirical information-based standard errors approached Monte Carlo standard errors. Moreover, the 95% confidence intervals for the skewness parameters detect spurious skewness in the skew-probit and SGT-link models, with coverage probabilities of 100%. This result can be explained by the high accuracy of information-based standard errors in this type of model noted above. The ability to predict random effects ranged from R² = 0.45 to R² = 0.47 for random intercepts and from R² = 0.23 to R² = 0.28 for random slopes, and was comparable across the fitting models. Finally, on average, all model selection criteria correctly selected the probit model as the most parsimonious one.

Table 2. Results based on 250 replications of probit samples: Probit and skew-probit fits.

https://doi.org/10.1371/journal.pone.0249604.t002

Table 3. Results based on 250 replications of probit samples: Generalized t (GT)-link and skew generalized t (SGT)-link fits.

https://doi.org/10.1371/journal.pone.0249604.t003

Under the SGT-link data generation mechanism, the probit model performed poorly, with large relative biases in the fixed effects that decreased from 46% for samples of size n = 100 to 18% for samples of size n = 1000 (Table 4). The SGT-link model estimates were the least biased (%Bias < 12) as well as the most accurate, with the lowest root mean square errors at all levels (Table 5). The same observations apply to the variance components, which were strongly biased downward for the probit, skew-probit and GT-link models (%Bias up to 90) relative to the SGT-link model (%Bias < 7). For the skewness parameters, the coverage probability was low (53% to 94%) for the small sample size (n = 100) and approached the nominal value (95%) for larger sample sizes (n = 500, 1000). The skew-probit model estimates (coverage probability down to 53%) were less reliable than those from the SGT-link model (coverage probability above 90%).

Table 4. Results based on 250 replications of skew generalized t-link samples (probit and skew-probit fits).

https://doi.org/10.1371/journal.pone.0249604.t004

Table 5. Results based on 250 replications of skew generalized t-link samples (generalized t-link and skew generalized t-link fits).

https://doi.org/10.1371/journal.pone.0249604.t005

Clearly, the SGT-link model fitted the non-normal data better and, accordingly, random effects prediction was better with the SGT-link model (R² ≥ 0.49) than with the probit or skew-probit model. Moreover, all the considered model selection criteria, namely AIC, BIC and HQ, on average correctly selected the SGT-link model as the preferred model.

Application to the respiratory infection data

To demonstrate the usefulness of the proposed approach to correlated binary data modeling, we revisited the respiratory illness data (available in the R package geepack [57]), which were used by [24] to illustrate their t-link GLMM. The data were obtained from a clinical study of the effect of a treatment on 111 patients with respiratory illness, recruited from two different clinical centers. The patients were examined and their respiratory state (categorized as 1 = good, 0 = poor) was determined (baseline). They were then randomized to receive either a placebo or an active treatment. The goal of the study was to determine whether the treatment induced a better respiratory state in treated patients. The outcome is the respiratory state, measured at four visits for each patient as good (y = 1) or poor (y = 0). In addition to the treatment (treat = 0 for the placebo group (P) and treat = 1 for the treated group (A)), the following fixed covariates were included: the clinical center (center = 0 for the first center and center = 1 for the second center), the baseline (respiratory state at the first visit), gender (sex = 0 for female (F) and sex = 1 for male (M)) and the interaction of treatment and gender. Following [24], we assumed that the age effect is patient-specific (random slope) and thus considered the patient age, centered around its median (31 years), as a random covariate. Since the fixed covariates included binary variables (treat, gender, center and baseline), the conditional skew-probit model is not identifiable given the random effects, and we thus set δ_ϵ = 0 to ensure identifiability.

For the purpose of comparison, we fitted the probit, skew-probit, GT-link and SGT-link models. We initialized fixed effects β and the random slope skewness parameter δ to zero whereas the random slope scale was initialized to one. For the GT-link and the SGT-link models, we considered the model selection approach of [51] with degrees of freedom ν = 2.5, 2.6, …, 15.

As depicted in Fig 1, the profiled marginal log-likelihood for the GT-link model is unbounded, with smaller ν corresponding to better fit, in accordance with the t-link model fits in [24] (ν ≤ 4). We thus set ν = 2.5, the smallest value on the grid, for the t-link model. For the SGT-link fit, Fig 1 indicates that the log-likelihood is bounded, with a maximum at ν = 3.7, suggesting heavy-tailed link function and random slope distributions. The difference in behaviour between the GT-link and SGT-link models may be explained by the role of ν in the location of the skew t-link model (see Eq (32)).

Fig 1. Fitting the generalized t-link and the skew generalized t-link models to the respiratory infection data: Plot of the marginal log-likelihood profiled for the degrees of freedom ν.

https://doi.org/10.1371/journal.pone.0249604.g001

The maximum likelihood (ML) estimates under the probit, skew-probit, GT-link and SGT-link models (Table 6) are fairly close, and all four fits show that respiratory illness is associated with the clinical center, the baseline state and the treatment, with the treatment effect varying with gender. The SGT-link fit additionally indicates that, irrespective of the treatment, the respiratory state is poorer for male than for female patients (Table 6). For this dataset, the intercept estimate increases with model complexity, and the estimates of fixed effects and their standard errors are shrunk toward zero for the skew-probit model relative to the probit one, and for the skew-probit model relative to the SGT-link one. The skew-probit fit also gave a higher skewness than the SGT-link fit. Although the estimated skewness is relatively low for both the skew-probit and SGT-link models, the use of a skewed and heavy-tailed link clearly improved not only the precision of the estimates but also the agreement between data and model. Admittedly, the asymptotic 95% confidence interval for δ under the SGT-link model includes zero (CI_95% = [−0.0010, 0.0534]), but the simulation results showed that asymptotic CI_95% for skewness parameters become reliable only in large samples (n ≥ 500), whereas information criteria are reliable for all tested sample sizes. Thus, based on the AIC, BIC and HQ criteria in Table 6, the SGT-link fit is the best for the respiratory illness data. The estimated variance of the random slope of age under the SGT-link fit (Table 6) is close to the values obtained under the probit and skew-probit models. From the SGT-link fit, it appears that the treatment induced an overall better respiratory state for treated patients (with a negative coefficient β₄ = −2.0398 for the placebo group). Moreover, the treatment has on average a better effect on female patients than on male patients (with a positive coefficient β₅ = 1.3425 for male patients in the placebo group). However, as noted by [24], new studies are required to check this latter trend because of the highly unbalanced proportions of males (79%) and females (21%) in the data.

Table 6. Maximum likelihood fits of probit, skew-probit, Generalized T (GT)-link and Skew Generalized T (SGT) -link models to the respiratory infection data.

https://doi.org/10.1371/journal.pone.0249604.t006

Conclusion

This work has considered the skew generalized t class of distributions for both the link function and the random effects distribution in mixed models for binary data. The objective was to improve the analysis of binary data exhibiting features such as skewness and tails thicker or thinner than those of the normal distribution. To allow inference in such models, we developed a maximum likelihood estimation procedure based on the EM algorithm. We combined results from [34] and [37] to obtain expressions for computing moments of truncated multivariate skew t distributions. The computation used existing R functions for the multivariate skew t cumulative distribution function. Our simulation experiment showed that, irrespective of sample size, the SGT-link model outperforms the probit GLMM when the underlying data generation mechanism is not normal. We also demonstrated that the skew generalized t-link model performed better than the skew-probit and generalized t-link GLMMs when the underlying data are both skewed and heavy tailed.

An important finding is that when the true degrees of freedom ν is small but very large values are assumed (fitting probit and skew-probit models), the estimates of fixed effects are biased, whereas when ν is large but small values are assumed, the estimates of fixed effects are not biased. Moreover, asymptotic inference using information-based standard errors proved accurate in detecting spurious skewness in large samples (n ≥ 500), and information criteria on average selected the correct fitting model for all tested sample sizes (n = 100, 500, 1000). These findings extend results in [24] on the t-link GLMM to the SGT-link GLMM and confirm that information criteria are reliable for selecting the best model for a particular dataset.

However, the simulation experiments revealed that the EM algorithm has a high computational cost. For instance, in a model with q = 2 random effects, n = 100 clusters and n_i = 6 observations per cluster, the mean running time for the SGT-link model fit was 4.76 minutes, almost 135 times the time required by the probit model fit (2.12 seconds). Our implementation relies on the pmst function of the R package sn [35] to compute cumulative probabilities of skew t distributions. This function applies R's one-dimensional numerical integration routine integrate to the multivariate normal cumulative distribution function. Using the EM algorithm for large q values (e.g. q = 10, 15) requires the prior development of a faster routine for computing cumulative probabilities of skew t distributions, which would make the EM algorithm scalable for large q + n_i. On multicore platforms, parallel computing can also substantially speed up computations. The expressions provided for computing moments of truncated multivariate skew t distributions are limited to models with ν > 2. The use of formulae given in [38] would extend our EM algorithm to very small degrees of freedom (1 < ν ≤ 2).

Binary data related to very rare events often require special treatment and are generally analysed using zero inflated models [58]. The development of a skew generalized t-link model with zero inflation can significantly improve the exploitation of such data. In addition to binary data, GLMMs handle other data types like count, proportional and ordinal outcomes. From the good performance demonstrated in this work and in previous related ones [9, 24], we believe that the simultaneous introduction of flexible links and random effects distributions in GLMM would benefit knowledge extraction from observed data in applied research fields where advances rely on modeling capacity.

Supporting information

S1 Appendix. Proof of Lemma 1.

This supporting information gives a proof of Lemma 1.

https://doi.org/10.1371/journal.pone.0249604.s001

(PDF)

S2 Appendix. Proof of Lemma 2.

This supporting information gives a proof of Lemma 2.

https://doi.org/10.1371/journal.pone.0249604.s002

(PDF)

S3 Appendix. Proof of Proposition 1.

This supporting information gives a proof of Proposition 1.

https://doi.org/10.1371/journal.pone.0249604.s003

(PDF)

S4 Appendix. Proof of Corollary 1.

This supporting information gives a proof of Corollary 1.

https://doi.org/10.1371/journal.pone.0249604.s004

(PDF)

S5 Appendix. Proof and limiting case of Proposition 2.

This supporting information gives a proof of Proposition 2. The first two moments of truncated multivariate skew normal distributions (limiting case as ν → ∞) are also given (required for fitting skew-probit link models).

https://doi.org/10.1371/journal.pone.0249604.s005

(PDF)

S6 Appendix. Proof of Proposition 3.

This supporting information gives a proof of Proposition 3.

https://doi.org/10.1371/journal.pone.0249604.s006

(PDF)

S7 Appendix. Proof of Proposition 4.

This supporting information gives a proof of Proposition 4.

https://doi.org/10.1371/journal.pone.0249604.s007

(PDF)

S8 Appendix. Proof of Proposition 5.

This supporting information gives a proof of Proposition 5.

https://doi.org/10.1371/journal.pone.0249604.s008

(PDF)

Acknowledgments

The authors wish to thank the editor and two referees for their relevant comments and suggestions. They are also grateful to Matthews Lazaro (Kamuzu College of Nursing, Lilongwe, Malawi) for the time he devoted to edit the manuscript for language usage, spelling, and grammar.

References

1. El-Saeiti IN. Performance of mixed effects for clustered binary data models. In: AIP Conference Proceedings. vol. 1643. AIP; 2015. p. 80–85.
2. Nelder JA, Wedderburn RW. Generalized linear models. Journal of the Royal Statistical Society: Series A (General). 1972;135(3):370–384.
3. McCulloch CE. Maximum likelihood variance components estimation for binary data. Journal of the American Statistical Association. 1994;89(425):330–335.
4. Chen MH. Skewed link models for categorical response data. In: Skew-Elliptical Distributions and Their Applications. Chapman and Hall/CRC; 2004. p. 151–172.
5. McCulloch CE, Neuhaus JM. Misspecifying the shape of a random effects distribution: why getting it wrong may not matter. Statistical Science. 2011;26(3):388–402.
6. Czado C, Santner TJ. The effect of link misspecification on binary regression inference. Journal of Statistical Planning and Inference. 1992;33(2):213–231.
- View Article
- Google Scholar
7. Stewart MB. Semi-nonparametric estimation of extended ordered probit models. Stata Journal. 2004;4(1):27–39.
8. Liu C. Robit regression: a simple robust alternative to logistic and probit regression. In: Gelman A, Meng XL, editors. Applied Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives. London: Wiley; 2004. p. 227–238.
9. Kim S, Chen MH, Dey DK. Flexible generalized t-link models for binary response data. Biometrika. 2008;95(1):93–106.
10. Abanto-Valle CA, Dey DK. State space mixed models for binary responses with scale mixture of normal distributions links. Computational Statistics & Data Analysis. 2014;71:274–287.
11. Basu S, Mukhopadhyay S. Binary response regression with normal scale mixture links. Biostatistics-Basel. 2000;5:231–242.
12. Pinheiro JC, Liu C, Wu YN. Efficient algorithms for robust estimation in linear mixed-effects models using the multivariate t distribution. Journal of Computational and Graphical Statistics. 2001;10(2):249–276.
13. Chen MH, Dey DK, Shao QM. A new skewed link model for dichotomous quantal response data. Journal of the American Statistical Association. 1999;94(448):1172–1186.
14. Komori O, Eguchi S, Ikeda S, Okamura H, Ichinokawa M, Nakayama S. An asymmetric logistic regression model for ecological data. Methods in Ecology and Evolution. 2016;7(2):249–260.
15. Lemonte AJ, Bazán JL. New links for binary regression: an application to coca cultivation in Peru. Test. 2018;27(3):597–617.
16. Asgharzadeh A, Esmaeili L, Nadarajah S, Shih S. A generalized skew logistic distribution. REVSTAT–Statistical Journal. 2013;11(3):317–338.
17. Carlin JB, Wolfe R, Brown CH, Gelman A. A case study on the choice, interpretation and checking of multilevel models for longitudinal binary outcomes. Biostatistics. 2001;2(4):397–416. pmid:12933632
18. Agresti A, Caffo B, Ohman-Strickland P. Examples in which misspecification of a random effects distribution reduces efficiency, and possible remedies. Computational Statistics & Data Analysis. 2004;47(3):639–653.
19. Chen J, Zhang D, Davidian M. A Monte Carlo EM algorithm for generalized linear mixed models with flexible random effects distribution. Biostatistics. 2002;3(3):347–360. pmid:12933602
20. Nelson KP, Lipsitz SR, Fitzmaurice GM, Ibrahim J, Parzen M, Strawderman R. Use of the probability integral transformation to fit nonlinear mixed-effects models with nonnormal random effects. Journal of Computational and Graphical Statistics. 2006;15(1):39–57.
21. Hosseini F, Eidsvik J, Mohammadzadeh M. Approximate Bayesian inference in spatial GLMM with skew normal latent variables. Computational Statistics & Data Analysis. 2011;55(4):1791–1806.
22. Broström G, Holmberg H. Generalized linear models with clustered data: Fixed and random effects models. Computational Statistics & Data Analysis. 2011;55(12):3123–3134.
23. Gad AM, El Kholy RB. Generalized linear mixed models for longitudinal data. International Journal of Probability and Statistics. 2012;1(3):41–47.
24. Prates MO, Costa DR, Lachos VH. Generalized linear mixed models for correlated binary data with t-link. Statistics and Computing. 2014;24(6):1111–1123.
25. Santos CC, Loschi RH. EM-Type algorithms for heavy-tailed logistic mixed models. Journal of Statistical Computation and Simulation. 2017;87(15):2940–2961.
26. Azzalini A, Capitanio A. Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2003;65(2):367–389.
27. Liu C, Rubin DB, Wu YN. Parameter expansion to accelerate EM: the PX-EM algorithm. Biometrika. 1998;85(4):755–770.
28. Branco MD, Dey DK. A general class of multivariate skew-elliptical distributions. Journal of Multivariate Analysis. 2001;79(1):99–113.
29. Lachos Dávila VH, Cabral CRB. Scale Mixtures of Skew-Normal Distributions. In: Lachos Dávila VH, Cabral CRB, Zeller CB, editors. Finite Mixture of Skewed Distributions. Switzerland: Springer International Publishing; 2018. p. 15–36.
30. Lachos VH, Ghosh P, Arellano-Valle RB. Likelihood based inference for skew-normal independent linear mixed models. Statistica Sinica. 2010;20:303–322.
31. Capitanio A. On the canonical form of scale mixtures of skew-normal distributions; 2012. Available from: https://arxiv.org/abs/1207.0797.
32. Kéri G. The Sherman-Morrison formula for the determinant and its application for optimizing quadratic functions on condition sets given by extreme generators. In: Giannessi F, Pardalos P, Rapcsák T, editors. Optimization Theory. Boston: Springer; 2001. p. 119–138.
33. Ahmed A, Reshi J, Mir K. Structural properties of size biased Gamma distribution. IOSR J Mathem. 2013;5:55–61.
34. Ho HJ, Lin TI, Chen HY, Wang WL. Some results on the truncated multivariate t distribution. Journal of Statistical Planning and Inference. 2012;142(1):25–40.
35. Azzalini A. The R package sn: The Skew-Normal and Related Distributions such as the Skew-t (version 1.5-2); 2018. Available from: http://azzalini.stat.unipd.it/SN.
36. R Core Team. R: A Language and Environment for Statistical Computing; 2019. Available from: https://www.R-project.org/.
37. Galarza CE, Matos LA, Lachos VH. Moments of the doubly truncated selection elliptical distributions with emphasis on the unified multivariate skew-t distribution. arXiv preprint arXiv:2007.14980. 2020.
38. Galarza CE, Lin TI, Wang WL, Lachos VH. On moments of folded and truncated multivariate Student-t distributions based on recurrence relations. Metrika. 2021; p. 1–26.
39. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological). 1977;39(1):1–22.
40. Fernandez C, Steel MF. Multivariate Student-t regression models: Pitfalls and inference. Biometrika. 1999;86(1):153–167.
41. da Silva Braga A, Cordeiro GM, Ortega EM, Silva GO. The Odd Log-Logistic Student t Distribution: Theory and Applications. Journal of Agricultural, Biological and Environmental Statistics. 2017;22(4):615–639.
42. Lee D, Sinha S. Identifiability and bias reduction in the skew-probit model for a binary response. Journal of Statistical Computation and Simulation. 2019;89(9):1621–1648.
43. Arellano-Valle RB, Genton MG. Fundamental skew distributions. Journal of Multivariate Analysis. 2005;96:93–116.
44. Arellano-Valle R, Bolfarine H, Lachos V. Bayesian inference for skew-normal linear mixed models. Journal of Applied Statistics. 2007;34(6):663–682.
45. Arellano-Valle R, Bolfarine H, Lachos V. Skew-normal linear mixed models. Journal of Data Science. 2005;3(4):415–438.
46. Lin TI, Lee JC. Estimation and prediction in linear mixed models with skew-normal random effects for longitudinal data. Statistics in Medicine. 2008;27(9):1490–1507. pmid:17708515
47. Lachos VH, Dey DK, Cancho VG. Robust linear mixed models with skew-normal independent distributions from a Bayesian perspective. Journal of Statistical Planning and Inference. 2009;139(12):4098–4110.
48. Lachos VH, Labra FV, Ghosh P. Multivariate skew-normal/independent distributions: properties and inference. Pro Mathematica. 2014;28(56):11–53.
49. Pereira MAA, Russo CM. Nonlinear mixed-effects models with scale mixture of skew-normal distributions. Journal of Applied Statistics. 2019;46(9):1602–1620.
50. Liu C, Rubin DB. The ECME algorithm: a simple extension of EM and ECM with faster monotone convergence. Biometrika. 1994;81(4):633–648.
51. Lange KL, Little RJ, Taylor JM. Robust statistical modeling using the t distribution. Journal of the American Statistical Association. 1989;84(408):881–896.
52. Meilijson I. A fast improvement to the EM algorithm on its own terms. Journal of the Royal Statistical Society Series B (Methodological). 1989;51(1):127–138.
53. Meza C, Osorio F, De la Cruz R. Estimation in nonlinear mixed-effects models using heavy-tailed distributions. Statistics and Computing. 2012;22(1):121–139.
54. Cox C. Delta method. Encyclopedia of Biostatistics. 2005;2.
55. Fox J, Weisberg S. An R Companion to Applied Regression. 3rd ed. Thousand Oaks CA: Sage; 2019. Available from: https://socialsciences.mcmaster.ca/jfox/Books/Companion/.
56. Meza C, Jaffrézic F, Foulley JL. Estimation in the probit normal model for binary outcomes using the SAEM algorithm. Computational Statistics & Data Analysis. 2009;53(4):1350–1360.
57. Yan J. geepack: Yet Another Package for Generalized Estimating Equations. R-News. 2002;2/3:12–14.
58. Hall DB. Zero-Inflated Poisson and Binomial Regression with random effects: A Case Study. Biometrics. 2000;56(4):1030–1039. pmid:11129458