Bayesian Inference for Generalized Linear Mixed Model Based on the Multivariate t Distribution in Population Pharmacokinetic Study

This article provides a fully Bayesian approach for modeling of single-dose and complete pharmacokinetic data in a population pharmacokinetic (PK) model. To overcome the impact of outliers and the difficulty of computation, a generalized linear model is chosen with the hypothesis that the errors follow a multivariate Student t distribution which is a heavy-tailed distribution. The aim of this study is to investigate and implement the performance of the multivariate t distribution to analyze population pharmacokinetic data. Bayesian predictive inferences and the Metropolis-Hastings algorithm schemes are used to process the intractable posterior integration. The precision and accuracy of the proposed model are illustrated by the simulating data and a real example of theophylline data.


Introduction
The population model is a pivotal element for the estimation of the individual pharmacokinetic parameters needed for dosage individualization. For a comprehensive discussion of the principles of population pharmacokinetic analyses see Ette and Williams et al [1][2]. One of the principal aim of population pharmacokinetic studies is to estimate the parameters associated with intra-and inter-individual variability in observed drug concentrations. Another important aim in population pharmacokinetic model is the establishment of relationships between parameters and covariates to explain parameter variability and facilitate dose adjustment decisions. So the explanation of the inter-individual variability in terms of subject-specific covariates is crucial for the study of population pharmacokinetics.
Many statistical models have been proposed to fit population PK (PPK) parameters. The most popular analytic statistic model for population PK data is the linear mixed model proposed by Laird and Ware [3]. In this model, the probability distributions for the response vectors of different individuals belong to a single family. However some random-effects parameters vary across individuals, with a distribution specified at the second stage. The first nonlinear mixed-effects modeling program introduced for the analysis of large amounts of pharmacokinetic data is NONMEM [4]. In the NONMEM program, inter-and intra-individual variability measures are combined, in a first-order approximation. Besides, many other more accurate methods have been incorporated in this software including Expanded Least Squares (ELS) and maximum likelihood method. In fact non-linear methods are used to estimate parameters of a chosen compartmental model. This method generally produces good results. For other nonlinear models for the analysis of population pharmacokinetic data see Wakefield et al [5][6][7], in which M-H algorithm is being used as an implementation of MCMC.
Ruth Salway et al [8] proposed a generalized linear model (GLM) with gamma distribution to deal with population PK data. But in a number of cases, especially during the development of new drugs, the structural pharmacokinetic model and inter and intra-subject variability models could be misspecified. This may lead to biased estimation for population parameters. Misspecification of k a may lead to errors in the estimation of volume and PK parameters such as maximal concentration (C max ). Graphical methods and Hosmer-Lemeshow type goodness-of-fit statistics can be used to detect misspecifications, as can be seen in Janet R [9] and Ivy Liu [10]. In the studies of other researchers, many semiparametric and nonparametric methods have been proposed to reduce the bias of estimates [11]. However, in all of these methods mentioned above, at least one of the following important issues standout.
Firstly, all the models and methods have a common assumption that the residual error is normally distributed, but that assumption is not always proper. This assumption may lack the robustness against departures from normality and outliers and may also lead to misleading results. In the context of clinical trials with large numbers of observations on per subject, often one or two of them may give rather extreme response values, so a heavy tail distribution may be more appropriate than the (log) normal distribution. When we try to estimate the parameters of pharmacokinetic data with heavy tails, such (log) normality assumptions are inappropriate to outliers and it may affect the estimation of fixed effects and variance components seriously. Secondly, the associated intensive computation burden in the inference is a major challenge, and in some cases it can even be computationally infeasible. Particularly, for nonlinear longitudinal models the computational problem becomes much worse. Therefore, in our study, a multivariate t distribution for a generalized linear model is considered, which has two obvious advantages: (1) t-distribution is more robust for modeling data with heavy tails than the normal distribution, i.e. it is more prone to outliers. It approaches the normal distribution as n (freedom degree) approaches infinity, and smaller values of n yield heavy tails.
(2) GLM model is easier to compute than nonlinear models. So in this paper, Bayesian method to estimate parameters in the GLM model is employed. The most pragmatic merit of Bayesian approach is the ability to take account of all parameter uncertainties. In aspect of Bayesian method research, lots of authors have advocated Markov Chain Monte Carlo (MCMC) schemes to deal with intractable posterior integrations, which can help reduce the difficulty of simulating directly from the posterior distribution. Wakefield AJ(1996) has proposed that MCMC methods used for hierarchical models [12]. O.Gimenez (2010) adopted a Bayesian framework with MCMC to carry out estimation and inference [13]. Chib considered several MCMC sampling schemes for hierarchical longitudinal models [14].
This article addresses these issues by modeling the response variable with outliers and using a Bayesian approach to investigate estimated parameters of generalized linear model proposed by Ruth Salway [8]. The rest of this article is organized as follows: ''The Model'' section describes the model and the chosen priors. The Metropolis-Hastings reject algorithm is constructed in the section of ''Bayesian estimation and predictive inference''.
''Simulation Study'' section provides three simulations to illustrate the performance of our proposed method. Extensive model checking is carried out for the theophylline data in the section of ''Application''. A conclusion is made in the part of ''Discussion''.

Generalized Linear Model with t Errors Distribution
In the PPK study, usually one or two of the observations may give such a rather extreme response values that a heavy tail distribution may be more appropriate than the (log) normal distribution. Gomez et al. have introduced a multivariate generalization of the power exponential distribution which could effectively model heavy-tailed data [15]. This is a subfamily of the elliptically contoured distributions, including the multivariate normal distribution as a special case. Another better known subfamily is the multivariate Student t distribution. The strategic importance of these distributions arises because they only require simple modifications to the multivariate normal distribution which could be easily programmed.
The one-compartment model has been widely studied [16,17]. The concrete expression is Where c(t) is drug concentration which is a function of time, D is dose, V is the apparent distribution volume, and k a and k e are respectively the absorption and elimination rate. According to Ruth Salway et al [8], we can rewrite the onecompartment model (1) as where b 0~l ogfk a =½V (k a {k e )g and b 1~{ k e . This model is really a transform of 1-compartment model. Let y(t ij ) be the j th (j~1,:::,k) measured concentration for the ith (i~1,:::,n) individual at time t ij . Then using a GLM and fitting the loglinear fractional polynomial model: Here, b 2 determines the absorption and we require b 1 v0 and b 2 v0 to ensure an increasing absorption phase and a decreasing elimination phase. Owing to the skewed nature of pharmacokinetic data, we shall use a log transform of the data. And for the GLM inference for other compartment model see Ruth Salway 2008 [8].
Let t Denote an n-dimensional multivariate t distribution with location vector c, scatter matrix S and degrees-of-freedom u, then a two-stage generalized linear random effect model that incorporates a finite mixture model could be rewritten as where At the first stage represented by (2), Y i~( y i1 ,:::,y ik ) 0 is the observation vector for the ith individual; D exp (T'b i ) is the function defining the pharmacokinetic model, including subjectspecific variable (e.g. dose). The design matrix T equals (1,t i ,1=t i )', and t i is the time schedule. b is the vector of fixed effects, representing the population parameters. e ij is the vector of random effects, and S the positive definite covariance matrix. The distribution of Y i can be expressed as Y i *t ni (c i ,S,u) and interested parameters can be expressed in terms of b~(b 0 ,b 1 ,b 2 )', which representing the model parameters.
At the second stage.  where b~(b 0 ,b 1 ,b 2 )' is the associated fixed effects, and At the second stage given by (3), a finite mixture model is used to describe the population distribution, where V represents the correlation matrix. With equation (2) and (3), the population physiology parameters such as half-life period, peak concentration and time to peak (t 1 2, = t max and c max ) could be derived from b~(b 0 ,b 1 ,b 2 )' directly according to the formula given by Ruth Salway et al [8].

Bayesian Inference
Methods such as Maximum likelihood techniques together with the algorithm such as EM, SAEM have been frequently used in GLMMs [18][19]. The computational burden is relatively small. Due mainly to recent advances in computing technology, the Bayesian sampling-based approach has been recognized as an alternative modeling strategy to offer the data-analysis. The ability to consider all parameter uncertainties is the merit in this approach. And MCMC techniques have revolutionized the field of Bayesian statistics by enabling posterior inference for arbitrarily complex models. Of course it is true that the methods are computer-intensive and compared to an ''equivalent'' maximum likelihood analysis, the overall run-time may be much longer. In our research, we adopt the Bayesian method for the model with the t-distribution, the same as that of Wakefield et al [12]. Based on the Bayesian theory, the posterior probability can be computed as follows: Since p(Y ) is a normalizing factor to ensure that this factor in practice can be ignored unless alternative models are compared. Bayes theorem concerns the terms involving b in essence and hence the complete posterior is often written as Here the posterior distribution is proportional to the product of the likelihood and the prior.
To complete a Bayesian formulation of the model above, one must specify a prior distribution for (b,S,V). Suppose (b,S,V) are independent priors, that is In this paper the traditional approach is adopted to avoid the confusion. In the absence of sufficient information of prior of b, a popular method of avoiding improper posterior distributions is to use proper conjugate priors that are diffuse. For the reason that closed form posteriors or full conditionals could be derived analytically in this way, the distribution forms of priors are chosen traditionally for mathematical convenience. For example, an inverse-gamma prior is typically specified for S and V when the form of the normal density was assumed for each observation. Nowadays there exist several reliable methods for sampling from non-standard distributions, so it is worthwhile giving a little more thought towards one's choice of priors. First Bayes (http://www. shef.ac.uk/stlao/lb.html) is an entry-level/educational Bayesian software package with a graphical interface, and it can be used to explore which distributions best reflect any prior information. For specific problems, prior uncertainty might be expressed better via a log-normal or uniform distribution. For example, Natrajan and Kass discussed the various options available for specifying noninformative priors for variance components [20]. In our study, the prior distributions are applied as follows: b*N(b 0 ,V), and we assume S,V are diagonal matrix with S~S 2 1 I, s 2 1 *G(a,b), and V~S 2 2 I, s 2 2 *G(c,d). G(a,b) represents gamma distribution with scale parameter a and location parameter b. The single corresponding element of b represents the population mean value, and the prior mean of which is typically set equal to an initial estimate. The element of V that corresponds to the gradients is usually set equal to zero, which represents none existence of covariate effects. Initial estimates based on previous studies may be a better choice, so in our study we use First Bayes or S-PLUS to test different values of c and d to find a pair which can result directly in a close match. Obviously the above discussion of priors for V is also applicable to the S component. The hyperparameters may also be chosen in a similar way.
Combining the complete-data likelihood function of model (2) with the prior distribution, we have the following joint posterior density of b.
But, the posterior distribution is not tractable and thus we resort to the simulation-based methods. We provide Metropolis-Hastings algorithm schemes to deal with intractable posterior integration. The acceptance probability is calculated via a rejection algorithm. The implementation is described as follows: The reject probability was denoted as The rejection component proceeds are:

Simulation Study
In order to illustrate that the model described above can be used for application, we develop three simulated examples.
Example 1. Firstly, we generated samples from normal distribution error terms, and analyzed these samples by normal error and t-distribution error in the M-H algorithm, respectively. The detailed processes are as follows.
The data containing 12 concentration measurements obtained from each individual over the successive 36 hours was simulated based on 10 subjects given a drug of 10mg dose each individual.
If b represent the collection of derived parameter vector (b 0 ,b 1 ,b 2 )', the population problem involves estimating b given the observed data Y~(Y 1 ,Y 2 ,:::,Y n )': According to the model mentioned above, expressed by equation (2) and (3) assumed to be normal, and the variance is 0.008. Random error is generated from the multivariate normal distribution with mean value equal to 0 and correlation matrix set as a diagonal matrix with element equal to 0.008. 100 samples have been obtained following the above sets.
Stable parameter estimators will be derived as long as there are enough iteration times through the following steps: Generate an identity matrix to save the posterior mean matrix, and calculate the initial mean values. (ii) Generate one random matrix sampled from the multivariate t distributions for error. Then the simulated observations equal the mean described in (2) plus the error matrix.
Calculate the reject probability of the sample and judge the acceptance probability for each parameter (b,S,V) via the reject algorithm. (iv) Update the parameters, and do the iteration 5000 times. Iterating between step (ii) and step (iii) in the conceptual algorithm until convergence.
Repeat the steps from (ii) to (iv) 100 times. In each repetition process, we discard these initial samples to reach steady state distribution. We estimated the necessary burnin (e.g., 4000-5000 iterations) that collected from the posterior distribution. There are 100 samples in total. Therefore 100 parameter estimates are obtained, and then the means and 95% confidence intervals are computed from them.
With the assumption, the generated data are typical concentration-time picture reflecting some unexpected conditions in the  PK study. For the Bayesian models we use independent normal priors on the fixed effect. The markov chain as shown in Figure 1 could be obtained with the reject probability which is calculated by using the reject algorithm described before. For this problem, it is not easy to monitor convergence for S,V. Even if incorrect initial valves are chosen, the chain will be rapidly converged to the previous specified values. Stable markov chain of the interested parameters is achieved after 5000 iteration times. The above process was implemented using Matlab software and the program codes are available in File S1. For example, Figure 1 shows that the posterior means ranges narrowly in the specified prior values. According to Figure 1, we can draw a conclusion that the convergence could be obtained rapidly with the initial values of (0.1, 20.1, 20.1). The means and 95% confidence intervals are computed from the last 1000 iterations. Point estimates and 95% interval confidence are presented in Table 1. For diagnostic plot to evaluate the goodness of fit see Figure 2 From table 1 and figure 2, it can be seen that the t distribution derives a close result with corresponding normal models. Example 2. The second simulation is to demonstrate that the t distribution approach yields better estimates of the population parameters when outliers are present. Unlike the above simulation, error terms of the sample are assumed to be normal distribution here. We random select 5% values of a sample, then plus a shift (ƒ+5) to these selected value as outliers. For a concrete example see Figure 3. Different from simulation in Example 1, we set b 0~( 0:8,{0:04,{0:2)', and S 1~S2~0 :008: The results in table 2 shows that t distribution derives a more accurate point estimates and a relative smaller interval than normal distribution. The finding also shows that the generalized linear model with a t-distribution may achieve reliable results when the data exhibit outliers.
Example 3. Unlike the above simulations, error terms of the sample are assumed to be t -distribution here. Again, we randomly select 5% values of a sample, then plus a shift (ƒ+10) to these selected value as outliers. Different from simulation in example 2, we set b 0~( 0:8,{0:04,{0:2)', and S 1~0 :01, S 2~0 :001. The results in table 3 show that t distribution derives a more accurate point estimates and a relative smaller interval than normal distribution. The finding also shows that the generalized linear model with a t distribution may achieve reliable results when the data exhibit outliers.

Application
Here we present a true example. Real pharmacokinetic data are available from the resource Facility for Population Kinetics at http://www.rfpk.washington.edu. Initially this data set was used for Bayesian analysis of linear and non-linear population models by using the Gibbs sampling.
For a description of the theophylline study sees Upton et al (1982). The parent drug concentrations in 24 hours are plotted in Figure 4. Here we apply our method to the data of 12 subjects given an oral dose of the antiasthmatic agent theophylline, with 10 concentration measurements obtained from each individual over the successive 25 hours. The data was originally analyzed in Upton et al. (1982) [21] and is available from the Resource Facility for Population Kinetics. The data is shown in Figure 4. It has been also used in Bayesian analysis of linear and non-linear population models employing Gibbs sampling with normal errors [22]. Wakefield suggested the data be analyzed by the generalized linear mixed model with the gamma error [8]. Now we apply tdistribution for generalized linear mixed model. Again, a convergence plot of markov chain has been obtained, as can be seen in Figure 5. Confidence ellipse and confidence ellipsoid pictures are shown in Figure 6. Most of the observations are located inside the domain. Ignore some exceptional values, this may suggest a good fit for the parameters. A comparable representation with the gamma distribution for the derived parameters can be seen in Table 4 and Table 5.
For performing the fitting of the model, we choose AIC information criterion to further analyze the results and asses the model. AIC information criterion, which can judge complexity of models, is a standard to evaluate the goodness-of -fit of models. After calculation, AIC values for the three models are obtained as 42.0077 for gamma model, 39.8176 for t-distribution model and 45.1087 for normal distribution model. The AIC value of tdistribution is smallest, so in this application it is proper to assume the error of pharmacokinetic model follows a t-distribution and it is better than normal distribution.

Discussion
In population PK studies both random effects and withinsubject errors are assumed to be normally distributed for mathematical convenience. However, such a normality assumption may be not suitable and in turn may affect the estimates of regression coefficients and variance components especially when the experimental data is thicker than normal tails or atypical observations. In this paper, we offered another distributionmultivariate t-distribution for Population PK data, and have obtained a more effective result than the Gamma distribution and normal distribution. In Example 1, results of t-distribution and normal distribution are close, indicating that t-distribution can be used when estimating the parameters in population pharmacokinetic models. In Example 2, PPK sample is produced from normal distribution and outliers are introduced. The results show tdistribution is more accurate than normal distribution for this sample. In Example 3, we introduced outliers from t-distribution, which still works better than normal distribution. We conclude that heavy-tailed t errors for generalized linear regression models are more stable than the normal error model, in the presence of outliers and it is an ideal method when processing clinical data.
We choose multivariate t-distribution for three reasons. Firstly, normal distribution is the limit of t-distribution. When the degree of freedom tends to be infinite, the tdistribution is exactly the normal distribution. If the degree of freedom is small, then t-  distribution tends to be more suitable. Thus, for the real data, we should estimate the degree of freedom of t-distribution in the very beginning. In a Bayesian analysis of a model with t-distribution developed by Geweke (1999) [22], the degree-of-freedom, if unknown, must be sampled from its conditional distribution. In this paper, our approach is similar to the method described by Worsley and Friston (1995) and implemented in SPM99 [23]. This is a data driven method and independent of any specific preprocessing method. Obviously, if the number of effective degree of freedom is large enough (say, .50), approximate distributions can be used (e.g., normal or x 2 ). Secondly, tdistribution is advantageous over normal distribution especially when there exist outliers in the data because t-distribution is more stable than normal distribution. Thirdly, it only requires simple modifications to the multivariate normal distribution which could be easily programmed. It is obvious to use other distributions for extension. Bayesian estimation is a good method for Population PK data (Wakefield et al. [24]). Wakefield [25] concluded that Bayesian inference was practically feasible for GLMMs, and provided an attractive choice to likelihood-based approaches. However there have been two major obstacles during its routine use: the specification of prior distribution and the evaluation of integrals which are required for inference. If the likelihood is correctly specified, the posterior distribution will be asymptotically normal with the mean of the true values, and variance covariance matrix may be given by the inverse of the expected information. Bayesian methods can be applied to the fractional polynomial GLMM, which can in turn be converted to PK parameters in straightforward fashion. In this study we placed priors on the model parameters (beta0, etc.) instead of derived parameters of interest, e.g., t 1=2 , c max etc., to avoid complex calculation and inference. Though in the pharmacokinetic study, these derived parameters are more understandable and combine various sources of information.
In this paper we also discussed informal (graphical) methods, which could provide both informative diagnostic aids and easily-understood inferential summaries and were sufficient for practical purposes. Arguably the most useful graphical tool for assessing convergence is the ''trace'' plot, using such a plot an experienced analyst can usually detect when a single chain has reached steadily, i.e., at what point the samples become independent of the starting value. From Figure 1 and Figure 6, we could see that the simulation process have implemented convergence.
There are many problems that need further studies in the future. Firstly, in fact, missing data and censoring data are very common in population PK data analysis. So it is worthwhile to analyze missing data and censoring data by using t distribution. Secondly, in this paper, we provide the Bayesian method for fitting the pharmacokinetic data with t-distribution. Finding more efficient computing algorithm for population pharmacokinetic data with t-distribution should be considered in the future. Thirdly, choice of covariance structure should also be considered under the condition of t-distribution. Our methods are expected to expand to the context of variance structures including heteroscedasticity, autocorrelation structure, or even autoregressive time series structure.

Supporting Information
File S1 Simulation analysis matlab code. Matlab code for performing the simulation analysis presented in the paper. (DOC)