
Study of Bayesian variable selection method on mixed linear regression models

  • Yong Li ,

    Roles Funding acquisition, Project administration, Software

    qjsfxyly@163.com

    Affiliation School of Mathematics and Statistics, Qujing Normal University, Qujing, China

  • Hefei Liu,

    Roles Conceptualization, Formal analysis, Funding acquisition, Methodology, Writing – original draft

    Affiliation School of Mathematics and Statistics, Qujing Normal University, Qujing, China

  • Rubing Li

    Roles Data curation, Writing – review & editing

Affiliation School of Economics, Shanghai University of Finance and Economics, Shanghai, China

Abstract

Variable selection has always been an important issue in statistics. When a linear regression model is used to fit data, selecting appropriate explanatory variables that strongly impact the response variable has a significant effect on the prediction accuracy and interpretability of the model. This study introduces the Bayesian adaptive group Lasso method to solve the variable selection problem under a mixed linear regression model with hidden states and explanatory variables with a grouping structure. First, the definition of the hidden-state mixed linear regression model is presented. Thereafter, the Bayesian adaptive group Lasso method is used to determine the penalty function and parameters, after which the specific form of each parameter’s fully conditional posterior distribution is calculated. Moreover, the design of the Gibbs sampling algorithm is outlined. Simulation experiments are conducted to compare the variable selection and parameter estimation effects in different states. Finally, a dataset on Alzheimer’s disease is used for an application analysis. The results demonstrate that the proposed method can identify observations from different hidden states, and that the variable selection results differ markedly across states.

Introduction

Multiple observation data of each index of the sample are required in biomedical and econometric research. Such data are usually referred to as longitudinal data. The mixed linear regression model is commonly used for fitting these data. In general, mixed linear regression models contain two parts: fixed effects and random effects that are subject to an unknown distribution. The variable selection problem in a mixed linear regression model usually focuses on the variable selection in the fixed effect part.

In recent years, the class of variable selection methods with penalty functions has become very popular. These methods are based on the least absolute shrinkage and selection operator (i.e., Lasso) method proposed by Tibshirani [1]. This class of penalty methods can perform variable selection and parameter estimation simultaneously, and exhibits good stability and strong statistical properties. For example, the SCAD (smoothly clipped absolute deviation) penalty proposed by Fan and Li [2] satisfies several excellent properties, such as asymptotic unbiasedness, sparsity, and continuity. Zou [3] presented the adaptive Lasso, which exhibits consistency when the number of variables is fixed and the sample size approaches infinity. This method solves the problem of poor consistency in Lasso estimation. Moreover, the adaptive group Lasso proposed by Wang and Leng [4] assigns different adjustment parameters to different groups of regression coefficients, whereby effective variable selection and coefficient estimation can be performed and improved results obtained.

When Tibshirani proposed the Lasso method, he proved that when the prior distribution of the regression coefficients is a Laplace distribution, the Lasso estimate of the regression coefficients coincides with the maximum a posteriori estimate, which led to the new concept of the Bayesian Lasso. As the Bayesian method exhibits excellent stability and high computational efficiency, it has been rapidly extended. On this basis, Park and Casella [5] proposed a complete Bayesian model with a conditional Laplace distribution as the prior distribution, and used Gibbs sampling to estimate the posterior distribution of the parameters. Subsequently, Kyung [6] further extended this model and proposed a complete Bayesian formulation that can be combined with several variants of the Lasso. Leng [7] extended this model to the complete Bayesian adaptive Lasso and applied it to variable selection in linear models. Lykou [8] used the Bayesian Lasso method to select model variables. Khondker [9] further extended this method to the Bayesian covariance Lasso. Raman [10] proposed a Bayesian version of the group Lasso, applied it to contingency tables, and proved its stability and efficiency. Ibrahim [11] introduced the SCAD penalty and adaptive Lasso into the mixed linear regression model. Feng and Wang [12] presented the Bayesian adaptive group Lasso method and applied it to the semiparametric structural equation model. Kang and Song [13] applied the Bayesian adaptive group Lasso to the semiparametric hidden Markov model.

However, in general, the research on variable selection with a grouping structure of the explanatory variables under a mixed linear regression model with an implicit state remains lacking, and few studies have used Bayesian Lasso and its variants to solve this problem. In this study, we introduce the Bayesian adaptive group Lasso into the mixed linear regression model with hidden states to select the variables and estimate the parameters. The purpose is to explore the screening of explanatory variables in a mixed linear regression model when the samples have different states, and the explanatory variables are significant in some states and not significant in others.

The remainder of this paper is organized as follows: Section II introduces the basic form of the mixed linear regression model and its variable selection, Bayesian theory, the Bayesian Lasso and its extensions, and the MCMC sampling algorithm, with a focus on the Bayesian adaptive group Lasso method. Section III introduces the core theory of this paper. First, the data and mixed linear regression model used in this study are outlined. Thereafter, the use of the Bayesian adaptive group Lasso to estimate the parameters and select the variables under this mixed linear regression model is presented. Furthermore, the fully conditional posterior distribution of the unknown parameters involved in the Bayesian hierarchical model is derived. Finally, the specific steps of the Gibbs sampling algorithm used in this study are provided. Section IV presents the application research: the estimation accuracy for the true parameters and the variable selection accuracy of the methods and algorithms are evaluated according to the numerical simulation results, and an example is provided. Section V summarizes the paper.

Model description

Consider the following mixed linear regression model, where the observed individuals are indexed by i = 1, 2, ⋯, N, and the observation times by t = 1, 2, ⋯, T. Under the condition Sit = s, the regression model is: (1)

In the above, εit is the random error, which is independently and identically distributed as N(0, σ2); Sit is the state of the i-th sample at the t-th observation, and Sit = s means that the model is defined in the specific state s. Parameter θs is the unknown regression coefficient, also known as the fixed effect, with θs = (αs, βs)T. The explanatory variables corresponding to αs are independent of one another. βs represents the coefficients corresponding to the explanatory variables with a grouping structure.

Furthermore, αs is an L-dimensional vector, βs is a p-dimensional vector, and xit = (xit1, xit2, ⋯, xit(p+L)) is the known explanatory variable. Let the unknown random vector us be m-dimensional. Parameter us is often referred to as the random effect, and it is generally assumed that . Thus, vector zit is m-dimensional, and because the model established in this study is the longitudinal data model in the mixed linear regression family, zit is a vector in which the i-th component is 1 and the other components are 0, i = 1, 2, ⋯, N.

For state Sit, the following assumption applies: (2) where s = 1, 2, ⋯, S. Here, qs is an unknown constant with , and S is a known positive integer; that is, the total number of states is known.

As the observation values in different states affect only the specific numerical calculation, and not the theoretical form of the conditional prior of each unknown parameter or its corresponding full conditional posterior distribution, the assignment of states to the observations is handled within the iterative calculation [14]. Therefore, the subsequent theoretical part is developed within a specific state s, and for convenience of description, the subscript s is omitted. In specific state s, the model is abbreviated as (3) where θ = (α, β)T, u is the random effect, the error is distributed as N(0, σ2), and α = (α1, α2, ⋯, αL)T. All explanatory variables with a grouping structure are divided into J groups. The set of subscripts of each group is denoted Gj, j = 1, 2, ⋯, J. Thus, we can rewrite θ = (α, β)T as .
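To make the data structure concrete, the following sketch simulates observations from a two-state version of model (3). The dimensions, coefficient values, random-effect scale, and state probabilities are hypothetical illustrations, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions mirroring the model: N subjects, T times, S states
N, T, S = 100, 3, 2
L, p = 2, 9                    # ungrouped (alpha) and grouped (beta) covariates
sigma = 0.5                    # error standard deviation

# Hypothetical state-specific fixed effects theta_s = (alpha_s, beta_s)
theta = {1: np.concatenate([[-1.7, 1.3], rng.normal(size=p)]),
         2: np.concatenate([[0.8, -0.6], rng.normal(size=p)])}

states = rng.integers(1, S + 1, size=(N, T))   # S_it = s with q_s = 1/2 each
u = rng.normal(0.0, 0.3, size=N)               # subject-level random effect u_i
X = rng.normal(size=(N, T, L + p))             # explanatory variables x_it

y = np.empty((N, T))
for i in range(N):
    for t in range(T):
        s = states[i, t]
        # y_it = x_it * theta_s + z_it * u + eps_it, with z_it selecting u_i
        y[i, t] = X[i, t] @ theta[s] + u[i] + rng.normal(0.0, sigma)
```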

Bayesian inference principle

Bayesian adaptive group Lasso

In this study, the Bayesian adaptive group Lasso has the following penalty function form: (4) where the positive definite matrix is a pj-order identity matrix, and λl and γj are positive penalty parameters. Priors can be placed on λl and γj so that their corresponding full conditional posterior distributions can be calculated, and their estimated values can then be obtained by the Gibbs method [15].

We introduce the conditional Laplacian prior as the prior distribution of the coefficients of the explanatory variable [16], rewrite the model into a hierarchical structure, provide the fully conditional posterior distribution of all the parameters to be estimated, and subsequently, calculate their estimated values according to Gibbs.

The conditional Laplace prior for coefficient α is (5) where αl is the l-th component of α, which is independent and identically distributed in a univariate Laplace conditional distribution [17], with the location parameter 0 and scale parameter .

The conditional Laplace prior for coefficient β is (6) where , which denotes the components of β, is independent and identically distributed in a multivariate Laplace distribution.

Subsequently, the above Laplace prior distribution is expressed as a normal mixed distribution with an exponential mixed distribution [18, 19]:

For α: (7) For β: (8)
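The scale-mixture representation in (7) can be checked numerically: drawing τ² from an exponential distribution and then α | τ² from N(0, τ²) yields a marginal Laplace distribution [18]. A minimal sketch (the value of λ is an arbitrary illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 2.0
n = 200_000

# tau^2 ~ Exponential with rate lam^2 / 2 (numpy parameterizes by scale = 1/rate)
tau2 = rng.exponential(scale=2.0 / lam**2, size=n)
# alpha | tau^2 ~ N(0, tau^2); marginally alpha ~ Laplace(0, 1/lam)
alpha = rng.normal(0.0, np.sqrt(tau2))

# The Laplace(0, 1/lam) variance is 2 / lam^2 = 0.5 for lam = 2
sample_var = alpha.var()
```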

For convenience of description, we stack the components of each parameter: let ε = (ε11, ε12, ⋯, εmT)T be an mT-dimensional vector, Z = (z11, z12, ⋯, zmT)T be a matrix of size mT × m, and Σ = σ2ImT be a matrix of size mT × mT. Moreover, is a matrix of size m × m.

Let ε* = Zu + ε, which is distributed in NmT(0, Σ + ZDZT). Therefore, according to the model assumption, the conditional distribution of the explained variable Y can be obtained as follows: (9)

Let Σ + ZDZT = σ*2ImT; then we can rewrite (9) as follows: (10) where the * of σ*2ImT is omitted for a succinct description. The prior for parameter σ2 is set as the inverse gamma distribution. Thus, the model can be expressed as the following hierarchical model: (11) where a, b, aλ, bλ, aγ and bγ are hyperparameters.

Gibbs sampling

The hierarchical model for Bayesian adaptive group Lasso was obtained in the previous section. It is necessary to solve the fully conditional posterior distribution of all unknown parameters to use Gibbs sampling to estimate the parameters involved in the model [20, 21].

According to the hierarchical model, all conditional posterior distributions of the parameters are obtained as follows: where .

Gibbs sampling can be used for parameter estimation once the full conditional posterior distributions of all unknown parameters have been obtained. The confidence interval criterion proposed by Li and Lin [22] is used for the variable selection. According to this method, for the coefficients α corresponding to variables without a grouping structure, if the 95% confidence interval does not cover zero, the variable is considered significant; otherwise, the variable is considered not significant and is eliminated. For the coefficients β corresponding to variables with a grouping structure, if the 95% confidence interval of the estimated coefficient of any variable in the group covers zero, the entire group of variables is eliminated.
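The interval criterion described above can be sketched as a small helper applied to MCMC draws; the synthetic draws and group structure below are illustrative assumptions, not the paper's data.

```python
import numpy as np

def select_variables(draws, groups, level=0.95):
    """Apply the interval criterion to posterior draws.

    draws  : (n_samples, n_coef) array of MCMC draws.
    groups : list of index lists; singleton lists are ungrouped coefficients.
    A group is dropped (False) as soon as any of its coefficients has a
    level-% interval covering zero; otherwise it is kept (True).
    """
    lo, hi = np.quantile(draws, [(1 - level) / 2, (1 + level) / 2], axis=0)
    return [not any(lo[j] <= 0.0 <= hi[j] for j in g) for g in groups]

# Hypothetical draws: columns 0 and 2 clearly non-zero, columns 1 and 3
# centred on zero, so the singleton group 1 and the pair {2, 3} are dropped
rng = np.random.default_rng(0)
draws = np.column_stack([rng.normal(2.0, 0.1, 1000),
                         rng.normal(0.0, 0.1, 1000),
                         rng.normal(1.5, 0.1, 1000),
                         rng.normal(0.0, 0.1, 1000)])
flags = select_variables(draws, [[0], [1], [2, 3]])   # [True, False, False]
```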

In the Gibbs process, the specific iteration procedure is as follows:

(1) The specific state of the observed value is unknown but the total number of hidden states is known, and an initial value is assigned to the hidden states: Let . The initial value of each parameter under specific state s is:

(2) For the k-th iteration:

sample the parameters in each state s, s = 1, 2, ⋯, S:

sample from ,

sample from ,

⋯⋯

sample from ,

until all parameters in all states have converged.

Subsequently, the extracted parameters are used to calculate the full conditional probability density function of the hidden state: where and is the likelihood function of the observation in state s. Thus, the conditional probability density function of all hidden states is obtained, following which the state of each observation at this time can be obtained using distribution U(0, 1) as auxiliary sampling:

Update parameter qs:

The k-th iteration ends.

(3) Return to step (2) and perform the (k + 1)-th iteration until the target number of iterations is reached.
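The alternating structure of steps (1)–(3) — initialize, sweep through the full conditionals, discard burn-in — can be illustrated on a deliberately simplified conjugate model (a normal mean with a flat prior and an inverse-gamma variance prior). This is only a sketch of the sampling pattern, not the paper's full sampler, and all numeric settings are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(3.0, 1.5, size=500)     # toy data
n, ybar = len(y), y.mean()

mu, sig2 = 0.0, 1.0                    # initial values, as in step (1)
a, b = 1.0, 0.1                        # inverse-gamma hyperparameters
mu_draws, sig2_draws = [], []

for k in range(3000):                  # step (2): one full-conditional sweep
    # mu | sig2, y ~ N(ybar, sig2 / n)  (flat prior on mu)
    mu = rng.normal(ybar, np.sqrt(sig2 / n))
    # sig2 | mu, y ~ Inv-Gamma(a + n/2, b + 0.5 * sum((y - mu)^2))
    shape = a + n / 2
    rate = b + 0.5 * np.sum((y - mu) ** 2)
    sig2 = 1.0 / rng.gamma(shape, 1.0 / rate)
    mu_draws.append(mu)
    sig2_draws.append(sig2)

# step (3): discard burn-in, use posterior means of the retained draws
post_mu = np.mean(mu_draws[1000:])
post_sig2 = np.mean(sig2_draws[1000:])
```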

Simulation experiment

Model settings

The main purpose of the numerical simulations is to test the accuracy of the model parameter estimation and variable selection, the accuracy of determining the state of each observation, and the differences in the variable selection results under different states. Moreover, the effects of different sample sizes on the parameter estimation were investigated.

(1) The simulation settings are as follows:

A total of 100 experiments were conducted, each considering the following settings: number of observation times T = 3; sample sizes N = 100 and 300; number of hidden states S = 2. The probability that each observation value belonged to state 1 or state 2 was the same, namely 0.5.

In the first state: α = (−1.7, 1.3), , , and ; ; .

In the second state: , and .

The settings in the two states were considered for design matrix X:

The part corresponding to coefficient α, namely Xα, was distributed in the multivariate normal N(0,I), where I is the identity matrix.

The part corresponding to coefficient , namely , was distributed in .

As there was a strong correlation between the components of , the following settings were used: the element in row i and column k of was 0.7^|i−k|, i, k = 1, 2, 3; the element in row i and column k of was 0.6^|i−k|, i, k = 1, 2, 3; and the element in row i and column k of was 0.4^|i−k|, i, k = 1, 2, 3.
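The correlation structure above (entry ρ^|i−k| in row i, column k) can be built and used to draw the grouped design columns as follows; ρ = 0.7 matches the first group's setting, and the sample size is illustrative.

```python
import numpy as np

def ar_cov(rho, dim):
    """Covariance matrix with entry rho^|i-k| in row i, column k."""
    idx = np.arange(dim)
    return rho ** np.abs(idx[:, None] - idx[None, :])

Sigma1 = ar_cov(0.7, 3)
# Each grouped design block is then drawn as multivariate normal N(0, Sigma_j)
rng = np.random.default_rng(3)
X_group1 = rng.multivariate_normal(np.zeros(3), Sigma1, size=300)
```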

The following settings were used for the random effects:

(2) Hyperparameter and MCMC settings:

Hyperparameters a, b, aλ, bλ, aγ, and bγ in the hierarchical model (11) were set as follows [22]: (prior I) a = 1, b = 0.1, aλ = 1, bλ = 0.1, aγ = 1, and bγ = 0.01.

The number of MCMC iterations was set to 5000. Three groups of different initial values were set for all parameters to be estimated, and the EPSR values of the three parallel simulation sequences of all parameters were calculated. By 2000 iterations, the EPSR values of all parameters were less than 1.2, indicating that the chains had converged. Therefore, to ensure convergence, the samples obtained from the first 2500 iterations were discarded as burn-in, and only the subsequent 2500 draws were retained for analysis. The posterior mean was used as the estimated value of each parameter.
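The EPSR (estimated potential scale reduction) used to monitor convergence can be computed from parallel chains as below; this is a standard Gelman–Rubin sketch, and the synthetic chains are illustrative only.

```python
import numpy as np

def epsr(chains):
    """Estimated potential scale reduction for m parallel chains of length n.

    chains : array of shape (m, n), one row per chain.
    """
    m, n = chains.shape
    B = n * chains.mean(axis=1).var(ddof=1)    # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()      # within-chain variance
    var_plus = (n - 1) / n * W + B / n         # pooled variance estimate
    return np.sqrt(var_plus / W)

rng = np.random.default_rng(4)
mixed = rng.normal(0.0, 1.0, size=(3, 2000))         # chains that agree
split = mixed + np.array([[0.0], [5.0], [10.0]])     # chains that disagree
r_good, r_bad = epsr(mixed), epsr(split)             # r_good near 1, r_bad large
```

A run is treated as converged when the EPSR of every parameter drops below a threshold such as 1.2, mirroring the criterion used in the simulations.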

Analysis of results

After repeating the experiment 100 times, the estimation results for each coefficient selected in the model could be summarized for the two sample sizes and the two states, as indicated in Table 1.

It can be observed from Table 1 that the model generally had a good estimation effect for each parameter, and the estimation effect on each component of α was better than that on each component of . This may be because the three components of had a strong correlation with one another and imposed penalties on the entire group, so the estimation effect of a single component was somewhat poor.

Furthermore, the accuracy of the parameter estimation increased with the increase in the sample size, indicating that the estimation effect of the model increased.

The model variable selection was also investigated, and the 95% confidence interval of the posterior mean of each parameter was calculated. For parameter α, if the confidence interval of a component covered zero, the corresponding variable was removed. For parameter , if the confidence interval of any component of covered zero, the entire group was removed.

In this study, the components of β were considered as a whole. In each experiment, for the components of the estimated β in the two states whose true values were zero, the number of components whose corresponding variables were correctly excluded was recorded. A total of 100 results were recorded in each of the two states, and their mean is reported as the average of correct zeros in Table 2. Correspondingly, the average of incorrect zeros is the mean number of coefficient components whose true values were not zero but whose corresponding variables were nevertheless excluded.

Table 2. Identification results of insignificant variables.

https://doi.org/10.1371/journal.pone.0283100.t002

It can be observed from Table 3 that, in the 100 repeated experiments, the two group vectors with true coefficients of zero in state 2 were eliminated. However, in state 1, only one group vector had a true coefficient of zero, yet two group vectors were excluded; that is, one group vector with a non-zero true coefficient was excluded from the model.

According to Tables 1–3, in state 1 only one component of the excluded group had a non-zero true value. However, according to the “confidence interval criterion”, if the confidence interval of any component covers zero, the entire corresponding group of variables should be eliminated. As only one component had a large coefficient, the group as a whole was considered not significant and was eliminated.

Sensitivity analysis

In this section, we conduct a sensitivity analysis to examine whether the proposed method is sensitive to the prior specification. We reset the hyperparameters as follows: (prior II) a = 6, b = 4, aλ = 2, bλ = 0.01, aγ = 3, and bγ = 0.05. The MCMC settings are unchanged. Table 4 presents the parameter estimation results under prior II.

The estimated results of parameters in Table 4 are similar to those in Table 1. The experimental results show that the proposed variable selection method is robust to the prior distribution hyperparameters.

Case study

To illustrate the practicability of the proposed model and method, we apply them to the study of Alzheimer’s disease. The data and further information can be found on the ADNI website (www.adni-info.org). Because many individuals had missing information and this research does not address the missing data problem, we deleted individuals with missing information. We thus selected 512 patients and collected their clinical information and basic variables at baseline, 6 months, 12 months, 24 months, and 36 months, so N = 512 and T = 5 in this model. The specific information on the response variable and the candidate covariates initially selected is shown in Table 5.

In the model, the FAQ (Functional Assessment Questionnaire) score was selected as the response variable (yit) to reflect the cognitive and behavioral abilities of the respondents. Among the 11 candidate variables, X1, X2, X3, X4 are inborn and unchangeable biological genetic information, X5, X6, X7 are changeable biological information, X8, X9 are past historical information, and X10, X11 are current social attributes that may change. Therefore, we divide the 11 variables into 4 groups: G1 = {X1, X2, X3, X4}, G2 = {X5, X6, X7}, G3 = {X8, X9}, and G4 = {X10, X11}. We roughly divide the respondents into two states: one with cognitive and behavioral disorders, and the other without, or with only slight, cognitive and behavioral disorders. We study the following issues: 1. What are the factors that affect cognitive and behavioral abilities in each state? 2. In each state, what is the relationship between the covariates and the response variable?

The above problem amounts to selecting the variables and estimating the parameters of the model . Here α is the intercept term.

Before the empirical analysis, we first standardized three variables: the FAQ score (y), age (X5), and years of education (X8). We chose the hyperparameters of prior I in the analysis. Table 6 shows the results of variable selection and parameter estimation.

From Table 6, we can see the variable selection results of the model. Under state 1, three groups of variables, G1, G2, and G4, have significant effects on the response variable, while G3 does not. Under state 2, G2 has a significant impact on the response variable, while G1, G3, and G4 do not.

Substituting the corresponding coefficients and variables into the model, we obtain the following: when the respondents have cognitive impairment, the fitted model for the FAQ score is y = 1.71 − 0.25X1 − 0.16X2 + 0.72X3 + 0.08X4 + 0.11X5 + 0.46X6 − 0.29X7 − 0.69X10 − 0.39X11. When the respondents have no cognitive impairment or mild cognitive impairment, the fitted model for the FAQ score is y = −0.84 + 0.27X5 + 1.14X6 − 0.43X7.
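For reference, the two fitted state-specific equations can be wrapped in a small helper. Covariates omitted from the input dictionary are treated as 0 purely for illustration; the function and its interface are an assumption for demonstration, not part of the paper.

```python
def faq_score(x, impaired):
    """Predicted (standardized) FAQ score from the two fitted equations.

    x        : dict mapping covariate index (1..11) to its value; missing
               covariates default to 0 for illustration only.
    impaired : True for state 1 (cognitive impairment), False for state 2.
    """
    g = lambda j: x.get(j, 0.0)
    if impaired:  # state 1: cognitive impairment
        return (1.71 - 0.25 * g(1) - 0.16 * g(2) + 0.72 * g(3) + 0.08 * g(4)
                + 0.11 * g(5) + 0.46 * g(6) - 0.29 * g(7)
                - 0.69 * g(10) - 0.39 * g(11))
    # state 2: no or mild cognitive impairment
    return -0.84 + 0.27 * g(5) + 1.14 * g(6) - 0.43 * g(7)
```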

The results of the variable selection show that different factors affect FAQ scores in different cognitive states. When a respondent is in a cognitive disorder state, the innate genetic information, changeable biological information, and current social attributes all affect the respondent’s cognitive ability (FAQ score). When a respondent is in the state of no cognitive impairment or mild cognitive impairment, only the changeable biological information has a significant impact on cognitive ability (FAQ score).

From the variable selection results, we found that the changeable biological information (X5, X6, X7) had a significant impact on cognitive ability regardless of the respondent’s state. Further analysis shows that X5 and X6 have a positive impact on the FAQ score, and X7 has a negative impact on the FAQ score. This indicates that the older the respondent, the weaker the cognitive ability, while the larger the volume of the hippocampus, the stronger the cognitive ability. In addition, under the condition of no cognitive impairment or slight cognitive impairment, the influence of the innate genetic information (X1, X2, X3, X4) and current social attributes (X10, X11) on cognitive ability is not significant, whereas in the state of cognitive impairment, the influence of these two groups of variables is significant. This is an interesting discovery. For example, does this mean that people of different genders have different risks of cognitive impairment? These results provide a novel perspective that deserves further investigation.

In addition, the original dataset gives the diagnostic status of each respondent at each test. We used the results of the last MCMC iteration to classify the status of the respondents. Through comparison, out of 2048 sample points (512 × 4 = 2048), 1962 sample points were classified correctly, a correct rate of 95.8%. This shows that our model adapts well to the dataset.

Conclusions

In this study, the Bayesian adaptive group Lasso was applied to the mixed linear regression model with hidden states: the adaptive Lasso was applied to the independent explanatory variables, and the adaptive group Lasso was applied to the variables with a grouping structure. Under the Bayesian framework, the penalty function, penalty parameters, and prior distribution of each parameter were specified, following which the concrete form of the full conditional posterior distribution of each parameter was calculated, and the specific implementation steps of the Gibbs sampling were presented. Finally, the performance of the model in parameter estimation and variable selection was discussed. The simulation analysis demonstrated that the proposed model can identify insignificant variables, eliminate insignificant variables with a grouping structure, and estimate the parameters accurately. The case study verified that the same set of variables may be significant in some states and not in others.

References

  1. Tibshirani R. Regression shrinkage and selection via the Lasso. J R Stat Soc B. 1996;58: 267–288.
  2. Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc. 2001;96: 1348–1360.
  3. Zou H. The adaptive Lasso and its oracle properties. J Am Stat Assoc. 2006;101: 1418–1429.
  4. Wang H, Leng C. A note on adaptive group Lasso. Comput Stat Data Anal. 2008;52: 5277–5286.
  5. Park T, Casella G. The Bayesian Lasso. J Am Stat Assoc. 2008;103: 681–686.
  6. Kyung M. Penalized regression, standard errors, and Bayesian Lassos. Bayesian Anal. 2010;5: 369–411.
  7. Leng C, Tran M, Nott D. Bayesian adaptive Lasso. Ann Inst Stat Math. 2014;66: 221–244.
  8. Lykou A, Ntzoufras I. On Bayesian Lasso variable selection and the specification of the shrinkage parameter. Stat Comput. 2013;23: 361–390.
  9. Khondker ZS, Zhu H, Chu H, Lin W, Ibrahim JG. The Bayesian covariance Lasso. Stat Interface. 2013;6: 243–259. pmid:24551316
  10. Raman S, Fuchs T, Wild P, et al. The Bayesian group-Lasso for analyzing contingency tables. Proceedings of the 26th Annual International Conference on Machine Learning. 2009: 881–888.
  11. Ibrahim J, Zhu H, Garcia R, Guo R. Fixed and random effects selection in mixed effects models. Biometrics. 2011;67: 495–503. pmid:20662831
  12. Feng X, Wang G, Wang Y, Song X. Structure detection of semiparametric structural equation models with Bayesian adaptive group Lasso. Stat Med. 2015;34: 1527–1547. pmid:25640461
  13. Kang K, Song X, Hu X, Zhu H. Bayesian adaptive group Lasso with semiparametric hidden Markov models. Stat Med. 2019;38: 1634–1650. pmid:30484887
  14. Liu H, Song X. Bayesian analysis of mixture structural equation models with an unknown number of components. Struct Equ Modeling. 2018;25(1): 41–55.
  15. Liu H, Song X, Zhang B. Varying-coefficient hidden Markov models with zero-effect regions. Comput Stat Data Anal. 2022;73: 1–19.
  16. Liu H, Song X, Tang Y, Zhang B. Bayesian quantile nonhomogeneous hidden Markov models. Stat Methods Med Res. 2021;30(1): 112–128. pmid:32726188
  17. Flynn C, Hurvich C, Simonoff J. Efficiency for regularization parameter selection in penalized likelihood estimation of misspecified models. J Am Stat Assoc. 2013;108: 1031–1043.
  18. Andrews D, Mallows C. Scale mixtures of normal distributions. J R Stat Soc B. 1974;36: 99–102.
  19. Torbjorn E, Taesu K, Lee T. On the multivariate Laplace distribution. IEEE Signal Process Lett. 2006;13: 300–303.
  20. Geman S, Geman D. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans Pattern Anal Mach Intell. 1984;6: 721–741. pmid:22499653
  21. Hobert J, Casella G. The effect of improper priors on Gibbs sampling in hierarchical linear mixed models. J Am Stat Assoc. 1996;91(436): 1461–1473.
  22. Li Q, Lin N. The Bayesian elastic net. Bayesian Anal. 2010;5: 151–170.