Accelerating L1-penalized expectation maximization algorithm for latent variable selection in multidimensional two-parameter logistic models

Laixu Shang; Ping-Feng Xu; Na Shan; Man-Lai Tang; George To-Sum Ho

doi:10.1371/journal.pone.0279918

Abstract

One of the main concerns in multidimensional item response theory (MIRT) is to detect the relationship between observed items and latent traits, which is typically addressed by exploratory analysis and factor rotation techniques. Recently, an EM-based L₁-penalized log-likelihood method (EML1) was proposed as a viable alternative to factor rotation. Based on the observed test response data, EML1 can yield a sparse and interpretable estimate of the loading matrix. However, EML1 suffers from a high computational burden. In this paper, we use the coordinate descent algorithm to optimize a new weighted log-likelihood, and consequently propose an improved EML1 (IEML1) that is more than 30 times faster than EML1. The performance of IEML1 is evaluated through simulation studies, and an application to a real data set from the Eysenck Personality Questionnaire demonstrates the methodology.

Citation: Shang L, Xu P-F, Shan N, Tang M-L, Ho GT-S (2023) Accelerating L₁-penalized expectation maximization algorithm for latent variable selection in multidimensional two-parameter logistic models. PLoS ONE 18(1): e0279918. https://doi.org/10.1371/journal.pone.0279918

Editor: Mahdi Roozbeh, Semnan University, IRAN, ISLAMIC REPUBLIC OF

Received: May 17, 2022; Accepted: December 16, 2022; Published: January 17, 2023

Copyright: © 2023 Shang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the paper and its Supporting information files.

Funding: The research of Ping-Feng Xu is supported by the Natural Science Foundation of Jilin Province in China (No. 20210101152JC) and the National Natural Science Foundation of China (No. 11571050). The research of Na Shan is supported by the National Natural Science Foundation of China (No. 11871013). The research of George To-Sum Ho is supported by the Research Grants Council of Hong Kong (No. UGC/FDS14/P05/20) and the Big Data Intelligence Centre in The Hang Seng University of Hong Kong. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

1 Introduction

Multidimensional item response theory (MIRT) models are widely used to describe the relationship between the designed items and the intrinsic latent traits in psychological and educational tests [1]. Early research on the estimation of MIRT models was confirmatory, with the relationships between the responses and the latent traits pre-specified by prior knowledge [2, 3]. Under this setting, parameters are estimated by various methods, including the marginal maximum likelihood method [4] and Bayesian estimation [5]. However, misspecification of the item-trait relationships in the confirmatory analysis may lead to a serious lack of model fit and, consequently, erroneous assessment [6].

To avoid the misfit problem caused by improperly specifying the item-trait relationships, exploratory item factor analysis (IFA) [4, 7] is usually adopted. Exploratory IFA freely estimates the entire set of item-trait relationships (i.e., the loading matrix), with only some constraints on the covariance of the latent traits. To obtain a simpler loading structure for better interpretation, factor rotation [8, 9] is adopted, followed by a cut-off. Although exploratory IFA and rotation techniques are very useful, they have limitations. For some applications, different rotation techniques yield very different or even conflicting loading matrices. Therefore, it can be arduous to select an appropriate rotation or decide which rotation is best [10]. In addition, different subjective choices of the cut-off value can lead to a substantial change in the loading matrix [11].

Recently, regularization has been proposed as a viable alternative to factor rotation, and it can automatically rotate the factors to produce a sparse loadings structure for exploratory IFA [12, 13]. Scharf and Nestler [14] compared factor rotation and regularization in recovering predefined factor loading patterns and concluded that regularization is a suitable alternative to factor rotation for psychometric applications. Regularization has also been applied to produce sparse and more interpretable estimations in many other psychometric fields such as exploratory linear factor analysis [11, 15, 16], the cognitive diagnostic models [17, 18], structural equation modeling [19], and differential item functioning analysis [20, 21].

For MIRT models, Sun et al. [12] proposed a latent variable selection framework to investigate the item-trait relationships by maximizing the L₁-penalized likelihood [22]. In this framework, one can impose prior knowledge of the item-trait relationships on the estimate of the loading matrix to resolve the rotational indeterminacy. Based on the observed test response data, the L₁-penalized likelihood approach can yield a sparse loading structure by shrinking some loadings towards zero if the corresponding latent traits are not associated with a test item. Consequently, it produces a sparse and interpretable estimate of the loading matrix, and it addresses the subjectivity of the rotation approach.

Since the marginal likelihood for MIRT involves an integral over the unobserved latent variables, Sun et al. [12] carried out the expectation maximization (EM) algorithm [23] to solve the L₁-penalized optimization problem. We denote this method as EML1 for simplicity. In the E-step of EML1, numerical quadrature on fixed grid points is used to approximate the conditional expectation of the log-likelihood. This results in a naive weighted log-likelihood on an augmented data set of size N × G, where N is the total number of subjects and G is the number of grid points. To optimize the naive weighted L₁-penalized log-likelihood in the M-step, the coordinate descent algorithm [24] is used, whose computational complexity is O(N × G). However, N × G is usually very large, which leads to a high computational burden for the coordinate descent algorithm in the M-step. As shown by Sun et al. [12], EML1 requires several hours for MIRT models with three to four latent traits. Another limitation of EML1 is that it does not update the covariance matrix Σ of the latent traits in the EM iteration. Instead, Sun et al. [12] proposed a two-stage method. It first computes an estimate of Σ via a constrained exploratory analysis under identification conditions, and then substitutes the estimated Σ into EML1 as a known Σ to estimate the discrimination and difficulty parameters. However, our simulation studies show that the estimate of Σ obtained by the two-stage method can be quite inaccurate.

Further development for latent variable selection in MIRT models can be found in [25, 26]. Zhang and Chen [25] proposed a stochastic proximal algorithm for optimizing the L₁-penalized marginal likelihood. They used the stochastic approximation in the stochastic step, which avoids repeatedly evaluating the numerical integral with respect to the multiple latent traits. However, the choice of several tuning parameters, such as a sequence of step size to ensure convergence and burn-in size, may affect the empirical performance of stochastic proximal algorithm. Xu et al. [26] applied the expectation model selection (EMS) algorithm [27] to minimize the L₀-penalized log-likelihood (for example, the Bayesian information criterion [28]) for latent variable selection in MIRT models. In their EMS framework, the model (i.e., structure of loading matrix) and parameters (i.e., item parameters and the covariance matrix of latent traits) are updated simultaneously in each iteration. In the simulation of Xu et al. [26], the EMS algorithm runs significantly faster than EML1, but it still requires about one hour for MIRT with four latent traits.

In this paper, we focus on the classic EM framework of Sun et al. [12] and give an improved EM-based L₁-penalized marginal likelihood (IEML1) with the M-step’s computational complexity being reduced to O(2 × G). The fundamental idea comes from the “artificial data” widely used in the EM algorithm for computing maximum marginal likelihood estimation in the IRT literature [4, 29–32]. In Bock and Aitkin (1981) [29] and Bock et al. (1988) [4], “artificial data” are the expected number of attempts and correct responses to each item in a sample of size N at a given ability level. Essentially, “artificial data” are used to replace the unobservable statistics in the expected likelihood equation of MIRT models. It should be noted that, the number of “artificial data” is G but not N × G, as “artificial data” correspond to G ability levels (i.e., grid points in numerical quadrature). As a result, the number of data involved in the weighted log-likelihood obtained in E-step is reduced and the efficiency of the M-step is then improved.

In our IEML1, we use slightly different artificial data to obtain the weighted complete data log-likelihood [33], which is widely used in generalized linear models with incomplete data. Specifically, we classify the N × G augmented data into 2 × G artificial data (z, θ^(g)), where z (equal to 0 or 1) is the response to one item and θ^(g) is one discrete ability level (i.e., grid point value). Thus, we obtain a new weighted L₁-penalized log-likelihood based on a total of 2 × G artificial data (z, θ^(g)), which reduces the computational complexity of the M-step from O(N × G) to O(2 × G).

In addition, it is crucial to choose the grid points being used in the numerical quadrature of the E-step for both EML1 and IEML1. There are various papers that discuss this issue in non-penalized maximum marginal likelihood estimation in MIRT models [4, 29, 30, 34]. To the best of our knowledge, there is however no discussion about the penalized log-likelihood estimator in the literature. In this paper, we will give a heuristic approach to choose artificial data with larger weights in the new weighted log-likelihood. Based on this heuristic approach, IEML1 needs only a few minutes for MIRT models with five latent traits.

The rest of the article is organized as follows. In Section 2, we introduce the multidimensional two-parameter logistic (M2PL) model as a widely used MIRT model, and review the L₁-penalized log-likelihood method for latent variable selection in M2PL models. In Section 3, we give an improved EM-based L₁-penalized log-likelihood method for M2PL models with unknown covariance of latent traits. In Section 4, we conduct simulation studies to compare the performance of IEML1, EML1, the two-stage method [12], a constrained exploratory IFA with hard-threshold (EIFAthr) and a constrained exploratory IFA with optimal threshold (EIFAopt). In Section 5, we apply IEML1 to a real dataset from the Eysenck Personality Questionnaire. A concluding remark is provided in Section 6.

2 Latent variable selection in multidimensional two-parameter logistic models

In this section, the M2PL model that is widely used in MIRT is introduced. Furthermore, the L₁-penalized log-likelihood method for latent variable selection in M2PL models is reviewed.

2.1 Multidimensional two-parameter logistic model

Consider a J-item test that measures K latent traits of N subjects. Let Y = (y_ij)_N×J be the dichotomous observed responses to the J items for all N subjects, where y_ij = 1 represents the correct response of subject i to item j, and y_ij = 0 represents the wrong response. Let θ_i = (θ_i1, …, θ_iK)^T be the K-dimensional latent traits to be measured for subject i = 1, …, N. The relationship between the jth item response and the K-dimensional latent traits for subject i can be expressed by the M2PL model as follows
$$P(y_{ij} = 1 \mid \theta_i, a_j, b_j) = \frac{\exp(a_j^T \theta_i + b_j)}{1 + \exp(a_j^T \theta_i + b_j)}, \tag{1}$$
where a_j = (a_j1, …, a_jK)^T and b_j are known as the discrimination and difficulty parameters, respectively. The parameter a_jk ≠ 0 implies that item j is associated with latent trait k. P(y_ij = 1|θ_i, a_j, b_j) denotes the probability that subject i correctly responds to the jth item based on his/her latent traits θ_i and item parameters a_j and b_j. For the sake of simplicity, we use the notation A = (a₁, …, a_J)^T, b = (b₁, …, b_J)^T, and Θ = (θ₁, …, θ_N)^T. The discrimination parameter matrix A is also known as the loading matrix, and the corresponding structure is denoted by Λ = (λ_jk) with λ_jk = I(a_jk ≠ 0).
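As a quick numerical illustration of the M2PL response function, the following minimal Python sketch evaluates the probability of a correct response for one subject and one item. It assumes the logistic form in a_j^T θ_i + b_j, with b_j entering as an intercept; the function name `m2pl_prob` and all parameter values are hypothetical and chosen only for illustration.

```python
import math

def m2pl_prob(theta, a, b):
    """P(y = 1 | theta, a, b), assumed logistic in a^T theta + b."""
    eta = sum(a_k * t_k for a_k, t_k in zip(a, theta)) + b
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical item that loads on traits 1 and 3 only (a_j2 = 0).
a_j = [1.2, 0.0, 0.8]
b_j = -0.5
theta_i = [0.5, -1.0, 1.0]

p = m2pl_prob(theta_i, a_j, b_j)
print(round(p, 4))
```

Because a_j2 = 0 in this example, changing the second latent trait of the subject leaves the probability unchanged, which is what "item j is not associated with latent trait 2" means in practice.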

In M2PL models, several general assumptions are adopted. The latent traits θ_i, i = 1, …, N, are assumed to be independent and identically distributed, and follow a K-dimensional normal distribution N(0, Σ) with zero mean vector and covariance matrix Σ = (σ_kk′)_K×K. Furthermore, local independence is assumed, that is, given the latent traits θ_i, the responses y_i1, …, y_iJ are conditionally independent.

To guarantee parameter identification and resolve the rotational indeterminacy for M2PL models, some constraints should be imposed. To identify the scale of the latent traits, we assume the variances of all latent traits are unity, i.e., σ_kk = 1 for k = 1, …, K. Dealing with the rotational indeterminacy issue requires additional constraints on the loading matrix A. We adopt the constraints used by Sun et al. [12] and Xu et al. [26], that is, each of the first K items is associated with only one latent trait, i.e., a_jj ≠ 0 and a_jk = 0 for 1 ≤ j ≠ k ≤ K. In practice, the constraint on A should be determined according to prior knowledge of the items and the entire study.

2.2 Latent variable selection based on L₁-penalized method

The response function for the M2PL model in Eq (1) takes a logistic regression form, where y_ij acts as the response, the latent traits θ_i as the covariates, and a_j and b_j as the regression coefficients and intercept, respectively. We are interested in exploring the subset of the latent traits related to each item, that is, in finding all non-zero a_jk. This can be viewed as a variable selection problem in a statistical sense.

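Under the local independence assumption, the likelihood function of the complete data (Y, Θ) for the M2PL model can be expressed as follows
$$L(A, b, \Sigma \mid Y, \Theta) = \prod_{i=1}^{N} \varphi(\theta_i \mid \Sigma) \prod_{j=1}^{J} P(y_{ij} = 1 \mid \theta_i, a_j, b_j)^{y_{ij}} \left[1 - P(y_{ij} = 1 \mid \theta_i, a_j, b_j)\right]^{1 - y_{ij}}, \tag{2}$$
where φ(θ_i|Σ) is the density function of latent trait θ_i. The log-likelihood function of observed data Y can be written as
$$\ell(A, b, \Sigma \mid Y) = \sum_{i=1}^{N} \log \int \varphi(\theta_i \mid \Sigma) \prod_{j=1}^{J} P(y_{ij} = 1 \mid \theta_i, a_j, b_j)^{y_{ij}} \left[1 - P(y_{ij} = 1 \mid \theta_i, a_j, b_j)\right]^{1 - y_{ij}} \, d\theta_i. \tag{3}$$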

To investigate the item-trait relationships, Sun et al. [12] applied the L₁-penalized marginal log-likelihood method to obtain a sparse estimate of A for latent variable selection in the M2PL model. They carried out the EM algorithm [23] with the coordinate descent algorithm [24] to solve the L₁-penalized optimization problem. However, the covariance matrix Σ of the latent traits is assumed to be known, which is not realistic in real-world applications.

Instead, we will treat Σ as an unknown parameter and update it in each EM iteration. For this purpose, the L₁-penalized optimization problem including Σ is represented as
$$(\hat{A}, \hat{b}, \hat{\Sigma}) = \arg\max_{A, b, \Sigma} \left\{ \ell(A, b, \Sigma \mid Y) - \eta \|A\|_1 \right\}, \tag{4}$$
where ℓ(A, b, Σ|Y) is the log-likelihood of the observed data in Eq (3) and $\|A\|_1 = \sum_{j=1}^{J} \sum_{k=1}^{K} |a_{jk}|$ denotes the entry-wise L₁ norm of A. The tuning parameter η > 0 controls the sparsity of A. A larger value of η results in a sparser estimate of A. The tuning parameter is usually chosen by cross validation or certain information criteria. In this paper, we employ the Bayesian information criterion (BIC) as described by Sun et al. [12].

3 Implementation of the EM algorithm

Due to the presence of the unobserved variables (i.e., the latent traits Θ), the parameter estimates in Eq (4) cannot be obtained directly. Sun et al. [12] carried out EML1 to optimize Eq (4) with a known Σ. Similarly, we first give a naive implementation of the EM algorithm to optimize Eq (4) with an unknown Σ. Then, we give an efficient implementation with the M-step’s computational complexity being reduced to O(2 × G), where G is the number of grid points. Lastly, we give a heuristic approach for choosing the grid points used in the numerical quadrature in the E-step.

3.1 A naive implementation of the EM algorithm

The EM algorithm iteratively executes the expectation step (E-step) and maximization step (M-step) until certain convergence criterion is satisfied. Specifically, the E-step is to compute the Q-function, i.e., the conditional expectation of the L₁-penalized complete log-likelihood with respect to the posterior distribution of latent traits Θ. The M-step is to maximize the Q-function. Let Ψ = (A, b, Σ) be the set of model parameters, and Ψ^(t) = (A^(t), b^(t), Σ^(t)) be the parameters in the tth iteration. The (t + 1)th iteration is described as follows.

3.1.1 E-step.

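In the E-step of the (t + 1)th iteration, under the current parameters Ψ^(t), we compute the Q-function involving a Σ-term as follows
$$Q(\Psi \mid \Psi^{(t)}) = E\left[\log L(A, b, \Sigma \mid Y, \Theta) - \eta \|A\|_1 \mid Y, \Psi^{(t)}\right] = Q_0 + \sum_{j=1}^{J} Q_j, \tag{5}$$
where Q₀ is
$$Q_0 = \sum_{i=1}^{N} E\left[\log \varphi(\theta_i \mid \Sigma) \mid y_i, \Psi^{(t)}\right],$$
and for j = 1, …, J, Q_j is
$$Q_j = \sum_{i=1}^{N} E\left[y_{ij} \log F_j(\theta_i) + (1 - y_{ij}) \log\left(1 - F_j(\theta_i)\right) \mid y_i, \Psi^{(t)}\right] - \eta \|a_j\|_1,$$
where $F_j(\theta_i) = P(y_{ij} = 1 \mid \theta_i, a_j, b_j)$, $y_i = (y_{i1}, \ldots, y_{iJ})^T$, and $\|a_j\|_1 = \sum_{k=1}^{K} |a_{jk}|$ denotes the L₁-norm of vector a_j. The conditional expectations in Q₀ and each Q_j are computed with respect to the posterior distribution of θ_i as follows
$$P(\theta_i \mid y_i, \Psi^{(t)}) = \frac{\varphi(\theta_i \mid \Sigma^{(t)}) \prod_{j=1}^{J} F_j^{(t)}(\theta_i)^{y_{ij}} \left[1 - F_j^{(t)}(\theta_i)\right]^{1 - y_{ij}}}{\int \varphi(\theta \mid \Sigma^{(t)}) \prod_{j=1}^{J} F_j^{(t)}(\theta)^{y_{ij}} \left[1 - F_j^{(t)}(\theta)\right]^{1 - y_{ij}} \, d\theta},$$
where $F_j^{(t)}(\theta_i) = P(y_{ij} = 1 \mid \theta_i, a_j^{(t)}, b_j^{(t)})$, $a_j^{(t)}$ is the jth row of A^(t), and $b_j^{(t)}$ is the jth element in b^(t).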

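Note that the conditional expectations in Q₀ and each Q_j do not have closed-form solutions. They are usually approximated using Gauss-Hermite quadrature [4, 29] or Monte Carlo integration [35]. For simplicity, we approximate these conditional expectations by summations following Sun et al. [12]. Specifically, we choose a set of fixed grid points $\mathcal{G} = \{\theta^{(g)} : g = 1, \ldots, G\}$, and the posterior distribution of θ_i is then approximated by
$$\tilde{P}(\theta^{(g)} \mid y_i, \Psi^{(t)}) = \frac{1}{C_i} \, \varphi(\theta^{(g)} \mid \Sigma^{(t)}) \prod_{j=1}^{J} F_j^{(t)}(\theta^{(g)})^{y_{ij}} \left[1 - F_j^{(t)}(\theta^{(g)})\right]^{1 - y_{ij}}, \tag{6}$$
where $F_j^{(t)}(\theta) = P(y_{ij} = 1 \mid \theta, a_j^{(t)}, b_j^{(t)})$ and $C_i$, the sum of the numerator over all G grid points, serves as a normalizing factor. Thus, Q₀ can be approximated by
$$\tilde{Q}_0 = \sum_{i=1}^{N} \sum_{g=1}^{G} \tilde{P}(\theta^{(g)} \mid y_i, \Psi^{(t)}) \log \varphi(\theta^{(g)} \mid \Sigma), \tag{7}$$
and Q_j for j = 1, …, J is approximated by
$$\tilde{Q}_j = \sum_{i=1}^{N} \sum_{g=1}^{G} \tilde{P}(\theta^{(g)} \mid y_i, \Psi^{(t)}) \left[y_{ij} \log F_j(\theta^{(g)}) + (1 - y_{ij}) \log\left(1 - F_j(\theta^{(g)})\right)\right] - \eta \|a_j\|_1, \tag{8}$$
where $F_j(\theta) = P(y_{ij} = 1 \mid \theta, a_j, b_j)$. Hence, the Q-function can be approximated by
$$\tilde{Q}(\Psi \mid \Psi^{(t)}) = \tilde{Q}_0 + \sum_{j=1}^{J} \tilde{Q}_j. \tag{9}$$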
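The grid approximation of the posterior can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' R implementation: it fixes K = 2 so that the normal prior with unit variances and correlation `rho` has a simple closed-form kernel, and the function names and all parameter values are hypothetical.

```python
import itertools
import math

def m2pl_prob(theta, a, b):
    """Assumed M2PL response function, logistic in a^T theta + b."""
    eta = sum(a_k * t_k for a_k, t_k in zip(a, theta)) + b
    return 1.0 / (1.0 + math.exp(-eta))

def posterior_weights(y_i, grid, A, b, rho):
    """Approximate posterior of theta_i on a fixed grid (K = 2 only)."""
    unnorm = []
    for t1, t2 in grid:
        # Bivariate normal kernel with unit variances and correlation rho;
        # multiplicative constants cancel when the weights are normalized.
        quad = (t1 * t1 - 2.0 * rho * t1 * t2 + t2 * t2) / (1.0 - rho * rho)
        prior = math.exp(-0.5 * quad)
        lik = 1.0
        for y_ij, a_j, b_j in zip(y_i, A, b):
            p = m2pl_prob((t1, t2), a_j, b_j)
            lik *= p if y_ij == 1 else 1.0 - p
        unnorm.append(prior * lik)
    total = sum(unnorm)  # plays the role of the normalizing factor
    return [u / total for u in unnorm]

# 11 equally spaced points on [-4, 4] per dimension, so G = 11^2 = 121.
points_1d = [-4.0 + 0.8 * m for m in range(11)]
grid = list(itertools.product(points_1d, repeat=2))

# Hypothetical current parameters for J = 3 items.
A = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.6]]
b = [0.0, -0.5, 0.3]
w = posterior_weights([1, 0, 1], grid, A, b, rho=0.1)
print(len(w), sum(w))
```

The same fixed grid is reused for every subject, which is the property that later allows the weights to be aggregated across subjects.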

3.1.2 M-step.

In the M-step of the (t + 1)th iteration, we maximize the approximation $\tilde{Q}(\Psi \mid \Psi^{(t)})$ of the Q-function obtained in the E-step,
$$\Psi^{(t+1)} = \arg\max_{\Psi} \tilde{Q}(\Psi \mid \Psi^{(t)}), \tag{10}$$
subject to Σ ≻ 0 and diag(Σ) = 1, where Σ ≻ 0 denotes that Σ is a positive definite matrix, and diag(Σ) = 1 denotes that all the diagonal entries of Σ are unity.

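It can be easily seen from Eq (9) that the approximate Q-function $\tilde{Q}$ can be factorized as the summation of $\tilde{Q}_0$ involving Σ and $\tilde{Q}_j$ involving (a_j, b_j). Thus, the maximization problem in Eq (10) can be decomposed into maximizing $\tilde{Q}_0$ and maximizing each penalized $\tilde{Q}_j$ separately, that is,
$$\Sigma^{(t+1)} = \arg\max_{\Sigma \succ 0, \ \mathrm{diag}(\Sigma) = 1} \tilde{Q}_0, \tag{11}$$
and for j = 1, …, J,
$$(a_j^{(t+1)}, b_j^{(t+1)}) = \arg\max_{a_j, b_j} \tilde{Q}_j. \tag{12}$$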

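For maximization problem (11), the objective $\tilde{Q}_0$ can be represented, up to an additive constant that does not depend on Σ, as
$$\tilde{Q}_0 = -\frac{N}{2} \log |\Sigma| - \frac{N}{2} \mathrm{tr}\left[\Sigma^{-1} S^{*}\right],$$
where tr[⋅] denotes the trace operator of a matrix and
$$S^{*} = \frac{1}{N} \sum_{i=1}^{N} \sum_{g=1}^{G} \tilde{P}(\theta^{(g)} \mid y_i, \Psi^{(t)}) \, \theta^{(g)} (\theta^{(g)})^T, \tag{13}$$
with $\tilde{P}(\theta^{(g)} \mid y_i, \Psi^{(t)})$ the approximate posterior weight of grid point θ^(g) for subject i from Eq (6). Therefore, the optimization problem in (11) is a semi-definite programming problem in convex optimization. We can obtain Σ^(t+1) in the same way as Zhang et al. [36] by applying a proximal gradient descent algorithm [37]. It is noteworthy that in the EM algorithm used by Sun et al. [12], Q₀ is a constant and thus need not be optimized, as Σ is assumed to be known.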

For maximization problem (12), it is noted that the approximation of Q_j in Eq (8) can be regarded as the weighted L₁-penalized log-likelihood in logistic regression with naive augmented data (y_ij, θ_i) and weights given by the approximate posterior probabilities in Eq (6), where θ_i ranges over the G fixed grid points. Hence, the maximization problem in Eq (12) is equivalent to variable selection in logistic regression based on the L₁-penalized likelihood. Several existing methods such as the coordinate descent algorithm [24] can be directly used.

After solving the maximization problems in Eqs (11) and (12), it is straightforward to obtain the parameter estimates Σ^(t+1), A^(t+1) and b^(t+1) for the next iteration.

We call the implementation described in this subsection the naive version since the M-step suffers from a high computational burden. It should be noted that the computational complexity of the coordinate descent algorithm for maximization problem (12) in the M-step is proportional to the sample size of the data set used in the logistic regression [24]. In (12), the sample size (i.e., N × G) of the naive augmented data set, which pairs each response y_ij, i = 1, …, N, with every fixed grid point, is usually large, where G is the number of quadrature grid points. For example, if N = 1000, K = 3 and 11 quadrature grid points are used in each latent trait dimension, then G = 11³ = 1331 and N × G = 1.331 × 10⁶. This leads to a heavy computational burden for maximizing (12) in the M-step. As a result, the EML1 developed by Sun et al. [12] is computationally expensive.

3.2 An improved EM-based L₁-penalized likelihood method

In this subsection, motivated by the idea about “artificial data” widely used in maximum marginal likelihood estimation in the IRT literature [30], we will derive another form of weighted log-likelihood based on a new artificial data set with size 2 × G. Therefore, the computational complexity of the M-step is reduced to O(2 × G) from O(N × G).

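As described in Section 3.1.1, we use the same set of fixed grid points for all θ_is to approximate the conditional expectation. Let $\mathcal{G} = \{\theta^{(g)} : g = 1, \ldots, G\}$ with θ^(g) representing a discrete ability level, and let $\tilde{P}(\theta^{(g)} \mid y_i, \Psi^{(t)})$ denote the value of the approximate posterior of θ_i at θ_i = θ^(g). Using the traditional “artificial data” described in Baker and Kim [30], we can write the approximation $\tilde{Q}_j$ of Q_j as
$$\tilde{Q}_j = \sum_{g=1}^{G} \left[ r_j^{(g)} \log F_j(\theta^{(g)}) + \left(n^{(g)} - r_j^{(g)}\right) \log\left(1 - F_j(\theta^{(g)})\right) \right] - \eta \|a_j\|_1, \tag{14}$$
where $F_j(\theta) = P(y_{ij} = 1 \mid \theta, a_j, b_j)$, $n^{(g)} = \sum_{i=1}^{N} \tilde{P}(\theta^{(g)} \mid y_i, \Psi^{(t)})$ is the “expected sample size” at ability level θ^(g), and $r_j^{(g)} = \sum_{i=1}^{N} y_{ij} \tilde{P}(\theta^{(g)} \mid y_i, \Psi^{(t)})$ is the “expected frequency” of correct response to item j at ability θ^(g). Note that, in the IRT literature, $n^{(g)}$ and $r_j^{(g)}$ are known as “artificial data”, and they are applied to replace the unobservable sufficient statistics in the complete data likelihood equation in the E-step of the EM algorithm for computing maximum marginal likelihood estimation [30–32]. If η = 0, differentiating Eq (14), we can obtain a likelihood equation involving the traditional “artificial data”, which can be solved by standard optimization methods [30, 32].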

For L₁-penalized log-likelihood estimation, we should maximize Eq (14) for η > 0. Although the coordinate descent algorithm [24] can be applied to maximize Eq (14), some technical details are needed. In this paper, from a novel perspective, we will view the approximation of Q_j as a weighted L₁-penalized log-likelihood of logistic regression based on our new artificial data, inspired by Ibrahim (1990) [33], and maximize it by applying the efficient R package glmnet [24].

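Specifically, we group the N × G naive augmented data in Eq (8) into 2 × G new artificial data (z, θ^(g)), where z (equal to 0 or 1) is the response to item j and θ^(g) is a discrete ability level. Thus, the approximation $\tilde{Q}_j$ in Eq (8) can be rewritten as
$$\begin{aligned} \tilde{Q}_j &= \sum_{i=1}^{N} \sum_{g=1}^{G} \tilde{P}(\theta^{(g)} \mid y_i, \Psi^{(t)}) \left[y_{ij} \log F_j(\theta^{(g)}) + (1 - y_{ij}) \log\left(1 - F_j(\theta^{(g)})\right)\right] - \eta \|a_j\|_1 \\ &= \sum_{g=1}^{G} \sum_{z=0}^{1} \left[\sum_{i=1}^{N} I(y_{ij} = z) \, \tilde{P}(\theta^{(g)} \mid y_i, \Psi^{(t)})\right] \left[z \log F_j(\theta^{(g)}) + (1 - z) \log\left(1 - F_j(\theta^{(g)})\right)\right] - \eta \|a_j\|_1 \\ &= \sum_{g=1}^{G} \sum_{z=0}^{1} w_{jz}^{(g)} \left[z \log F_j(\theta^{(g)}) + (1 - z) \log\left(1 - F_j(\theta^{(g)})\right)\right] - \eta \|a_j\|_1, \end{aligned} \tag{15}$$
where $\tilde{P}(\theta^{(g)} \mid y_i, \Psi^{(t)})$ is the approximate posterior weight of grid point θ^(g) for subject i, $F_j(\theta) = P(y_{ij} = 1 \mid \theta, a_j, b_j)$, and $w_{jz}^{(g)} = \sum_{i=1}^{N} I(y_{ij} = z) \, \tilde{P}(\theta^{(g)} \mid y_i, \Psi^{(t)})$ is the “expected frequency” of correct or incorrect response to item j at ability θ^(g). The second equality in Eq (15) holds since z and F_j(θ^(g)) do not depend on y_ij and the order of the summation is interchanged. Thus, we obtain a new form of weighted L₁-penalized log-likelihood of logistic regression in the last line of Eq (15) based on the new artificial data (z, θ^(g)) with a weight $w_{jz}^{(g)}$. Note that $w_{j1}^{(g)}$ is the traditional expected frequency of correct responses to item j at θ^(g), and $w_{j0}^{(g)}$ is the expected sample size at θ^(g) minus that frequency, so the traditional “artificial data” can be viewed as weights for our new artificial data (z, θ^(g)).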
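The regrouping of the N × G naive augmented data into 2 × G artificial data can be checked numerically. The sketch below is a toy verification, not the authors' code: it uses K = 1 for brevity, random stand-in posterior weights (the identity holds for any nonnegative weights that sum to one per subject), and hypothetical names throughout. The penalty term is omitted because it is identical on both sides.

```python
import math
import random

def m2pl_prob(theta, a, b):
    """Assumed M2PL response function, logistic in a^T theta + b."""
    eta = sum(a_k * t_k for a_k, t_k in zip(a, theta)) + b
    return 1.0 / (1.0 + math.exp(-eta))

random.seed(0)
N, G = 50, 7
grid = [[-2.4 + 0.8 * g] for g in range(G)]        # K = 1 grid on [-2.4, 2.4]
y_j = [random.randint(0, 1) for _ in range(N)]     # responses to one item j

# Stand-in posterior weights: one normalized row per subject.
post = []
for _ in range(N):
    row = [random.random() for _ in range(G)]
    total = sum(row)
    post.append([r / total for r in row])

def cell_loglik(z, theta, a, b):
    """log F if z = 1, log(1 - F) if z = 0."""
    p = m2pl_prob(theta, a, b)
    return math.log(p) if z == 1 else math.log(1.0 - p)

def naive_loglik(a, b):
    """Weighted log-likelihood over the N x G naive augmented data."""
    return sum(post[i][g] * cell_loglik(y_j[i], grid[g], a, b)
               for i in range(N) for g in range(G))

def artificial_weights():
    """Aggregate posterior weights into 2 x G weights, indexed by (z, g)."""
    w = {0: [0.0] * G, 1: [0.0] * G}
    for i in range(N):
        for g in range(G):
            w[y_j[i]][g] += post[i][g]
    return w

def grouped_loglik(a, b, w):
    """Weighted log-likelihood over the 2 x G artificial data."""
    return sum(w[z][g] * cell_loglik(z, grid[g], a, b)
               for z in (0, 1) for g in range(G))

w = artificial_weights()
a, b = [1.3], -0.2
gap = abs(naive_loglik(a, b) - grouped_loglik(a, b, w))
print(gap)
```

The two objectives agree up to floating-point rounding for any (a, b), while the grouped version touches 2 × G terms per evaluation instead of N × G. The weights are computed once per E-step, so the saving applies to every coordinate descent pass in the M-step.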

Since Eq (15) is a weighted L₁-penalized log-likelihood of logistic regression, it can be optimized directly via the efficient R package glmnet [24]. This is an advantage of using Eq (15) instead of Eq (14). Moreover, the size of the new artificial data set {(z, θ^(g)) | z = 0, 1, and g = 1, …, G} involved in Eq (15) is 2 × G, which is substantially smaller than N × G. This significantly reduces the computational burden of optimizing the penalized objective in the M-step. We call this version of EM the improved EML1 (IEML1). Since the computational complexity of the coordinate descent algorithm is O(M), where M is the sample size of the data involved in the penalized log-likelihood [24], the computational complexity of the M-step of IEML1 is reduced from O(N × G) to O(2 × G).
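To make the weighted L₁-penalized logistic regression on artificial data concrete, here is a small pure-Python solver. Note the substitution: the paper uses coordinate descent through the R package glmnet, whereas this sketch uses proximal gradient descent (ISTA) with soft-thresholding, chosen only because it fits in a few lines. The function names, the toy weights, and the iteration count are all assumptions made for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def soft_threshold(x, t):
    """Proximal operator of t * |x|."""
    return math.copysign(max(abs(x) - t, 0.0), x)

def fit_weighted_l1_logistic(thetas, w1, w0, eta, n_iter=2000):
    """Maximize sum_g [w1_g log F_g + w0_g log(1 - F_g)] - eta * ||a||_1."""
    K = len(thetas[0])
    a, b = [0.0] * K, 0.0
    # Step size 1/L, with L an upper bound on the curvature of the
    # negative weighted log-likelihood.
    L = 0.25 * sum((w1[g] + w0[g]) * (1.0 + sum(t * t for t in thetas[g]))
                   for g in range(len(thetas)))
    step = 1.0 / L
    for _ in range(n_iter):
        grad_a, grad_b = [0.0] * K, 0.0
        for g, theta in enumerate(thetas):
            F = sigmoid(sum(a_k * t_k for a_k, t_k in zip(a, theta)) + b)
            # Derivative of the negative log-likelihood w.r.t. the linear predictor.
            resid = (w1[g] + w0[g]) * F - w1[g]
            grad_b += resid
            for k in range(K):
                grad_a[k] += resid * theta[k]
        # Gradient step, then soft-threshold the loadings; the intercept is unpenalized.
        a = [soft_threshold(a[k] - step * grad_a[k], step * eta) for k in range(K)]
        b -= step * grad_b
    return a, b

# Toy artificial data: G = 7 grid points (K = 1), weights generated exactly
# from a logistic model with a = 1 and b = 0 and expected sample size 1 per point.
thetas = [[-2.4 + 0.8 * g] for g in range(7)]
w1 = [sigmoid(t[0]) for t in thetas]     # weights for z = 1
w0 = [1.0 - v for v in w1]               # weights for z = 0

a_free, b_free = fit_weighted_l1_logistic(thetas, w1, w0, eta=0.0)
a_sparse, b_sparse = fit_weighted_l1_logistic(thetas, w1, w0, eta=1000.0)
print(a_free, b_free, a_sparse)
```

With η = 0 the solver recovers the generating parameters, and with a very large η the loading is shrunk exactly to zero, which is the mechanism behind latent variable selection. Each iteration costs O(2 × G) regardless of N.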

It is noteworthy that, when subjects i and i′ have the same response pattern, i.e., y_i = y_i′, the posterior distribution of θ_i is the same as that of θ_i′, so their approximate posterior weights at every grid point coincide. When the sample size N is large, the item response vectors y₁, ⋯, y_N can be grouped into distinct response patterns, and then the summation in computing the weights of the artificial data is not over the N subjects, but over the distinct patterns, which greatly reduces the computational time [30].
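The saving from grouping identical response patterns can be illustrated as follows. This is a toy sketch: `toy_posterior` is a hypothetical stand-in for the grid posterior (any function of the response pattern alone would do), and the data are made up.

```python
from collections import Counter

# Hypothetical responses of N = 6 subjects to J = 3 items.
Y = [(1, 0, 1), (1, 0, 1), (0, 0, 1), (1, 0, 1), (0, 0, 1), (1, 1, 1)]
pattern_counts = Counter(Y)   # distinct pattern -> number of subjects

def toy_posterior(pattern):
    """Hypothetical stand-in for the posterior over G = 2 grid points."""
    score = sum(pattern)
    unnorm = [1.0 + len(pattern) - score, 1.0 + score]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def expected_sample_size(pattern_counts, posterior_fn, G=2):
    """Sum posterior weights over distinct patterns, weighted by their counts."""
    n = [0.0] * G
    for pattern, count in pattern_counts.items():
        post = posterior_fn(pattern)      # evaluated once per distinct pattern
        for g in range(G):
            n[g] += count * post[g]
    return n

n_grouped = expected_sample_size(pattern_counts, toy_posterior)
n_direct = [sum(toy_posterior(y)[g] for y in Y) for g in range(2)]
print(len(pattern_counts), n_grouped, n_direct)
```

Here the posterior is evaluated for 3 distinct patterns rather than 6 subjects, and the aggregated quantities are unchanged.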

It should be noted that any fixed quadrature grid point set, such as a Gauss-Hermite quadrature point set, will result in the same form of weighted L₁-penalized log-likelihood as in Eq (15). However, neither adaptive Gauss-Hermite quadrature [34] nor Monte Carlo integration [35] will result in Eq (15), since adaptive Gauss-Hermite quadrature requires different adaptive quadrature grid points for different θ_i, while Monte Carlo integration usually draws different Monte Carlo samples for different θ_i.

3.3 Heuristic approach for choosing grid points

In the new weighted log-likelihood in Eq (15), the more artificial data (z, θ^(g)) are used, the more accurate the approximation of Q_j is, but the heavier the computational burden of IEML1. To reduce the computational burden of IEML1 without sacrificing too much accuracy, we give a heuristic approach for choosing a small set of grid points with which to compute this approximation.

Let us consider a motivating example based on an M2PL model with item discrimination parameter matrix A₁ with K = 3 and J = 40, which is given in Table A in S1 Appendix. The grid point set is $\mathcal{G} = \mathcal{G}_1^3$, where $\mathcal{G}_1$ denotes a set of 11 equally spaced grid points on the interval [−4, 4]. Therefore, the size of our new artificial data set used in Eq (15) is 2 × 11³ = 2662. Based on one iteration of the EM algorithm for one simulated data set, we calculate the weights of the new artificial data and then sort them in descending order.

Fig 1 (left) gives the histogram of all weights, which shows that most of the weights are very small and only a few of them are relatively large. Fig 1 (right) gives the plot of the sorted weights, in which the top 355 sorted weights are bounded by the dashed line. The sum of the top 355 weights constitutes 95.9% of the sum of all the 2662 weights. This suggests that only a few (z, θ^(g)) contribute significantly to the weighted log-likelihood in Eq (15). Furthermore, Fig 2 presents scatter plots of our artificial data (z, θ^(g)), in which the darker the color of (z, θ^(g)), the greater its weight. It can be seen that most (z, θ^(g)) with greater weights are included in {0, 1} × [−2.4, 2.4]³. In fact, artificial data with the top 355 sorted weights in Fig 1 (right) are all in {0, 1} × [−2.4, 2.4]³. These observations suggest that we should use a reduced grid point set with each dimension consisting of 7 equally spaced grid points on the interval [−2.4, 2.4]. Thus, the size of the corresponding reduced artificial data set is 2 × 7³ = 686. In this way, only 686 artificial data are required in the new weighted log-likelihood in Eq (15). Our simulation studies show that IEML1 with this reduced artificial data set performs well in terms of correctly selected latent variables and computing time.
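The sizes of the full and reduced artificial data sets, and the kind of cumulative-weight check described above, can be reproduced with a short Python sketch. The helper `top_share` and the toy weights passed to it are hypothetical; the actual weights come from an E-step on simulated data and are not reproduced here.

```python
import itertools

K = 3
full_1d = [0.8 * m for m in range(-5, 6)]      # 11 equally spaced points on [-4, 4]
reduced_1d = [0.8 * m for m in range(-3, 4)]   # 7 equally spaced points on [-2.4, 2.4]

full_grid = list(itertools.product(full_1d, repeat=K))
reduced_grid = list(itertools.product(reduced_1d, repeat=K))

# Each grid point is paired with z = 0 and z = 1.
n_full = 2 * len(full_grid)
n_reduced = 2 * len(reduced_grid)

def top_share(weights, k):
    """Fraction of the total weight carried by the k largest weights."""
    ordered = sorted(weights, reverse=True)
    return sum(ordered[:k]) / sum(ordered)

print(n_full, n_reduced)
print(top_share([1, 2, 3, 4], 2))   # toy weights only
```

Both grids share the spacing 0.8, so the reduced grid is a subset of the full grid and amounts to discarding the outer points with negligible weight.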


Fig 1. Histogram of w_j (left column) and plot of sorted w_j (right column).

https://doi.org/10.1371/journal.pone.0279918.g001


Fig 2. Scatter plots of the grid points with the weights w_j under z = 1 (left column) and z = 0 (right column).

https://doi.org/10.1371/journal.pone.0279918.g002

In the literature, Xu et al. [26] give a similar approach that chooses the naive augmented data (y_ij, θ_i) with larger weights for computing Eq (8). In this paper, however, we choose our new artificial data (z, θ^(g)) with larger weights to compute Eq (15).

4 Simulation studies

In this section, we conduct simulation studies to evaluate and compare the performance of our IEML1, the EML1 proposed by Sun et al. [12] and the constrained exploratory IFAs with hard-threshold and optimal threshold. In all methods, we use the same identification constraints described in subsection 2.1 to resolve the rotational indeterminacy. In addition, we also give simulation studies to show the performance of the heuristic approach for choosing grid points. The R codes of the IEML1 method are provided in S4 Appendix.

Here, we consider three M2PL models with the item number J equal to 40. Three true discrimination parameter matrices A₁, A₂ and A₃ with K = 3, 4, 5 are shown in Tables A, C and E in S1 Appendix, respectively. The corresponding difficulty parameters b₁, b₂ and b₃ are listed in Tables B, D and F in S1 Appendix. The non-zero discrimination parameters are generated independently from the uniform distribution U(0.5, 2). The true difficulty parameters are generated from the standard normal distribution. The diagonal elements of the true covariance matrix Σ of the latent traits are set to unity, with all off-diagonal elements equal to 0.1.
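The data-generating scheme just described can be sketched in Python as follows. This is an illustration under assumptions, not the simulation code in S4 Appendix: the loading structure below is a small hypothetical one (the true A₁, A₂, A₃ are in S1 Appendix), the response function is assumed logistic in a_j^T θ_i + b_j, and the equicorrelated latent traits are drawn through a shared-factor construction.

```python
import math
import random

def simulate_m2pl(N, A, b, rho, rng):
    """Draw N response vectors from an M2PL model with equicorrelated traits."""
    K = len(A[0])
    Y = []
    for _ in range(N):
        shared = rng.gauss(0.0, 1.0)
        # sqrt(rho) * shared + sqrt(1 - rho) * e_k has unit variance and
        # pairwise covariance rho across the K traits.
        theta = [math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
                 for _ in range(K)]
        row = []
        for a_j, b_j in zip(A, b):
            eta = sum(a_k * t_k for a_k, t_k in zip(a_j, theta)) + b_j
            p = 1.0 / (1.0 + math.exp(-eta))
            row.append(1 if rng.random() < p else 0)
        Y.append(row)
    return Y

rng = random.Random(2023)
# Hypothetical loading structure with J = 6 items and K = 3 traits.
structure = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 1]]
A = [[rng.uniform(0.5, 2.0) if lam else 0.0 for lam in row] for row in structure]
b = [rng.gauss(0.0, 1.0) for _ in range(len(structure))]
Y = simulate_m2pl(500, A, b, rho=0.1, rng=rng)
print(len(Y), len(Y[0]))
```

The first three items of the hypothetical structure each load on a single trait, mirroring the kind of identification constraint used in the paper.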

For parameter identification, we constrain items 1, 10, 19 to be related only to latent traits 1, 2, 3 respectively for K = 3, that is, (a₁, a₁₀, a₁₉)^T in A₁ is fixed as a diagonal matrix in each EM iteration. Similarly, items 1, 7, 13, 19 are related only to latent traits 1, 2, 3, 4 respectively for K = 4, and items 1, 5, 9, 13, 17 are related only to latent traits 1, 2, 3, 4, 5 respectively for K = 5.

Two sample sizes (i.e., N = 500, 1000) are considered. For each setting, we draw 100 independent data sets for each M2PL model. We obtain results by IEML1 and EML1 and evaluate them in terms of computational efficiency, correct rate (CR) for the latent variable selection and accuracy of the parameter estimation. The computational efficiency is measured by the average CPU time over 100 independent runs. The CR for the latent variable selection measures how well the estimate $\hat{\Lambda} = (\hat{\lambda}_{jk})$ recovers the true loading structure Λ = (λ_jk). The following mean squared error (MSE) is used to measure the accuracy of the parameter estimation:
$$\mathrm{MSE}(a_{jk}) = \frac{1}{S} \sum_{s=1}^{S} \left(\hat{a}_{jk}^{(s)} - a_{jk}\right)^2,$$
where $\hat{a}_{jk}^{(s)}$ denotes the estimate of a_jk from the sth replication and S = 100 is the number of data sets. The MSE of each b_j in b and σ_kk′ in Σ is calculated similarly to that of a_jk.
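The two evaluation criteria can be computed as in the sketch below. The MSE follows the definition above. For the CR, the normalization over all J × K entries of the loading structure is an assumption made here for illustration, and the function names and toy inputs are hypothetical.

```python
def correct_rate(structure_hat, structure_true):
    """Fraction of entries of the loading structure that are recovered.

    Assumption: all J x K entries are counted in the denominator.
    """
    J, K = len(structure_true), len(structure_true[0])
    hits = sum(structure_hat[j][k] == structure_true[j][k]
               for j in range(J) for k in range(K))
    return hits / (J * K)

def mse(estimates, truth):
    """Mean squared error of one parameter over S replications."""
    return sum((e - truth) ** 2 for e in estimates) / len(estimates)

# Toy loading structures with J = 3 items and K = 2 traits.
lam_true = [[1, 0], [0, 1], [1, 1]]
lam_hat = [[1, 0], [0, 1], [1, 0]]
cr = correct_rate(lam_hat, lam_true)        # 5 of the 6 entries agree

# Toy estimates of one loading over S = 3 replications, true value 1.0.
mse_a = mse([1.0, 1.5, 0.5], truth=1.0)
print(cr, mse_a)
```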

4.1 Computational efficiency

We first compare computational efficiency of IEML1 and EML1. To make a fair comparison, the covariance of latent traits Σ is assumed to be known for both methods in this subsection.

In this study, we consider the M2PL model with A₁. We use the fixed grid point set $\mathcal{G} = \mathcal{G}_1^3$, where $\mathcal{G}_1$ is the set of 11 equally spaced grid points on the interval [−4, 4]. In each M-step, the maximization problem in (12) is solved by the R package glmnet for both methods. Because EML1 is very time-consuming, we only run the two methods on 10 data sets. For each replication, the initial value of (a₁, a₁₀, a₁₉)^T is set as the identity matrix, and the other initial values in A are set as 1/J = 0.025. The initial value of b is set as the zero vector. The candidate tuning parameters are given as (0.10, 0.09, …, 0.01) × N, and we choose the best tuning parameter by the Bayesian information criterion as described by Sun et al. [12].

The average CPU times (in seconds) for IEML1 and EML1 are given in Table 1. From Table 1, IEML1 runs at least 30 times faster than EML1. Moreover, the results of IEML1 and EML1 differ by an absolute error of no more than 10⁻¹³, which numerically verifies that the two methods are equivalent.


Table 1. The average CPU time in seconds for IEML1 and EML1 under K = 3 and J = 40.

https://doi.org/10.1371/journal.pone.0279918.t001

4.2 Simulation for the unknown Σ case

In this subsection, we compare our IEML1 with a two-stage method proposed by Sun et al. [12], a constrained exploratory IFA with hard threshold (EIFAthr) and a constrained exploratory IFA with optimal threshold (EIFAopt). In EIFAthr, all parameters are estimated via a constrained exploratory analysis satisfying the identification conditions, and then the estimated discrimination parameters that are smaller than a given threshold are truncated to zero. In the simulation studies, several thresholds, i.e., 0.30, 0.35, …, 0.70, are used, and the corresponding EIFAthr are denoted by EIFA0.30, EIFA0.35, …, EIFA0.70, respectively. In EIFAthr, the threshold is preset subjectively, while in EIFAopt we further choose, in a data-driven manner, the truncated estimates corresponding to the threshold with the minimum BIC value among several given thresholds (e.g., 0.30, 0.35, …, 0.70 used in EIFAthr).
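The truncation step of EIFAthr can be sketched as follows. This is a minimal illustration with a hypothetical estimated loading matrix; it assumes the threshold is applied to the absolute value of each estimated loading, and it does not implement the BIC-based threshold selection of EIFAopt.

```python
def hard_threshold(A_hat, threshold):
    """Set estimated loadings with absolute value below the threshold to zero."""
    return [[a if abs(a) >= threshold else 0.0 for a in row] for row in A_hat]

# Candidate thresholds 0.30, 0.35, ..., 0.70.
thresholds = [round(0.30 + 0.05 * m, 2) for m in range(9)]

# Hypothetical unpenalized estimates for J = 2 items and K = 3 traits.
A_hat = [[1.20, 0.32, -0.05],
         [0.48, 0.91, 0.66]]

A_thr = {t: hard_threshold(A_hat, t) for t in thresholds}
print(A_thr[0.3])
print(A_thr[0.5])
```

In this toy example the loading 0.32 survives the threshold 0.30 but not 0.50, which illustrates how the selected loading structure depends on a subjective cut-off.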

For IEML1, the initial value of Σ is set to the identity matrix. For the other three methods, Σ is first estimated by a constrained exploratory IFA using the R package mirt with the setting “method = EM”, and the grid points are the same as in subsection 4.1.

We consider M2PL models with A₁ and A₂ in this study. To compare the latent variable selection performance of all methods, the boxplots of CR are displayed in Fig 3. According to Fig 3, IEML1 performs best, followed by the two-stage method. As expected, different hard thresholds lead to different estimates and hence different CRs, and it would be difficult to choose the best hard threshold in practice. EIFAopt performs better than EIFAthr. As complements to CR, the false negative rate (FNR), false positive rate (FPR) and precision are reported in S2 Appendix. The boxplots of these metrics show that IEML1 performs very well overall.

Fig 3. Boxplots of the correct rate of Λ obtained by IEML1 (dark gray boxes), two-stage (light gray boxes), EIFAthr and EIFAopt (white boxes) for K = 3 and 4 under sample size N = 500 and 1000.

https://doi.org/10.1371/journal.pone.0279918.g003
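The selection metrics named above can be computed from the zero/nonzero pattern of the true and estimated loading matrices. The sketch below uses the standard confusion-matrix definitions, treating a nonzero loading as a "positive"; these are assumed to match the definitions used in the paper, which are given outside this section. The function name `selection_metrics` and the small matrices are hypothetical.

```python
def selection_metrics(A_true, A_hat):
    """Compare the zero/nonzero patterns of two loading matrices entry by entry."""
    tp = fp = tn = fn = 0
    for row_true, row_hat in zip(A_true, A_hat):
        for t, h in zip(row_true, row_hat):
            if t != 0 and h != 0:
                tp += 1          # true loading, selected
            elif t == 0 and h != 0:
                fp += 1          # no loading, wrongly selected
            elif t == 0 and h == 0:
                tn += 1          # no loading, correctly left out
            else:
                fn += 1          # true loading, missed

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "CR": ratio(tp + tn, tp + fp + tn + fn),
        "FNR": ratio(fn, fn + tp),
        "FPR": ratio(fp, fp + tn),
        "precision": ratio(tp, tp + fp),
    }


# Made-up example with J = 3 items and K = 2 traits.
A_true = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
A_est = [[0.9, 0.0], [0.2, 1.1], [0.0, 0.8]]
metrics = selection_metrics(A_true, A_est)
```

In this toy example four of the six entries are classified correctly, one loading is missed and one is spuriously selected.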

Fig 4 presents boxplots of the MSE of A obtained by all methods. According to Fig 4, IEML1 and the two-stage method perform similarly, and both perform better than EIFAthr and EIFAopt. Larger thresholds lead to a smaller median MSE, but EIFAthr also produces some very large MSEs.

Fig 4. Boxplots of the MSE of A obtained by IEML1 (dark gray boxes), two-stage (light gray boxes), EIFAthr and EIFAopt (white boxes) for K = 3 and 4 under sample size N = 500 and 1000.

https://doi.org/10.1371/journal.pone.0279918.g004

Figs 5 and 6 show boxplots of the MSE of b and Σ obtained by all methods. Note that EIFAthr and EIFAopt obtain the same estimates of b and Σ and consequently produce the same MSE of b and Σ. Their boxplots for b and Σ are therefore identical and are labeled “EIFA” in Figs 5 and 6. All methods obtain very similar estimates of b, while IEML1 gives noticeably better estimates of Σ than the other methods.

Fig 5. Boxplots of the MSE of b obtained by IEML1 (dark gray boxes), two-stage (light gray boxes), EIFAthr and EIFAopt (white boxes) for K = 3 and 4 under sample size N = 500 and 1000.

https://doi.org/10.1371/journal.pone.0279918.g005

Fig 6. Boxplots of the MSE of Σ obtained by IEML1 (dark gray boxes), two-stage (light gray boxes), EIFAthr and EIFAopt (white boxes) for K = 3 and 4 under sample size N = 500 and 1000.

https://doi.org/10.1371/journal.pone.0279918.g006

4.3 Evaluation on heuristic approach for choosing grid points

As presented in the motivating example in Section 3.3, most of the grid points with larger weights are distributed in the cube [−2.4, 2.4]³. Intuitively, the grid points for each latent trait dimension can therefore be drawn from the interval [−2.4, 2.4]. In this subsection, we generate three grid point sets, denoted by Grid11, Grid7 and Grid5, and compare the performance of IEML1 based on these three sets via a simulation study. Specifically, Grid11, Grid7 and Grid5 are K-ary Cartesian powers of 11, 7 and 5 equally spaced grid points on the intervals [−4, 4], [−2.4, 2.4] and [−2.4, 2.4], respectively, in each latent trait dimension.
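The three grid point sets can be constructed directly from their description. The following minimal sketch uses hypothetical helper names (`equally_spaced`, `make_grid`) and shows how quickly the number of quadrature points shrinks when fewer points per dimension are used, here for K = 3.

```python
from itertools import product


def equally_spaced(lo, hi, n):
    """Return n equally spaced points on the closed interval [lo, hi]."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def make_grid(lo, hi, n, K):
    """K-ary Cartesian power of n equally spaced points on [lo, hi]."""
    return list(product(equally_spaced(lo, hi, n), repeat=K))


K = 3
grid11 = make_grid(-4.0, 4.0, 11, K)
grid7 = make_grid(-2.4, 2.4, 7, K)
grid5 = make_grid(-2.4, 2.4, 5, K)

print(len(grid11), len(grid7), len(grid5))   # 1331 343 125
```

The grid size grows as the number of points per dimension raised to the power K, so for K = 5 Grid5 has 5⁵ = 3125 points whereas Grid11 has 11⁵ = 161051.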

Fig 7 summarizes the boxplots of CRs and MSE of parameter estimates by IEML1 for all cases. From Fig 7, we obtain very similar results when Grid11, Grid7 and Grid5 are used in IEML1. Table 2 shows the average CPU time for all cases. The computing time increases with the sample size and the number of latent traits. The simulation studies show that IEML1 can give quite good results in several minutes if Grid5 is used for M2PL with K ≤ 5 latent traits.

Fig 7. Boxplots of the correct rate of Λ (row 1), the MSE of A (row 2), the MSE of b (row 3) and the MSE of Σ (row 4) for K = 3 (column 1), 4 (column 2) and 5 (column 3) under sample size N = 500 and 1000. The dark gray boxes, light gray boxes and white boxes represent the results via 11, 7 and 5 grid points per dimension respectively.

https://doi.org/10.1371/journal.pone.0279918.g007

Table 2. The average CPU time in seconds of IEML1 with Grid11, Grid7 and Grid5 under K = 3, 4, 5 with sample size N = 500, 1000.

https://doi.org/10.1371/journal.pone.0279918.t002

We also tried the grid point set Grid3, in which each dimension uses three grid points equally spaced on the interval [−2.4, 2.4]. However, numerical quadrature with Grid3 is not accurate enough to approximate the conditional expectation in the E-step. It should also be noted that IEML1 may depend on the initial values. In all simulation studies, we use initial values similar to those described for A₁ in subsection 4.1. These initial values lead to good results in our simulations and should be adequate for practical use in real data applications.

5 Real data analysis

In this section, we analyze a data set from the Eysenck Personality Questionnaire given in Eysenck and Barrett [38]. The data set includes the responses of 754 Canadian females (after eliminating subjects with missing data) to 69 dichotomous items, where items 1–25 measure psychoticism (P), items 26–46 measure extraversion (E) and items 47–69 measure neuroticism (N). This data set was also analyzed in Xu et al. [26]. To guarantee the psychometric properties of the items, we select those items whose corrected item-total correlation values are greater than 0.2 [39]. The selected items and their original indices are listed in Table 3, with 10, 19 and 23 items corresponding to P, E and N, respectively. Items marked with an asterisk are negatively worded items whose original scores have been reversed.
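The item screening step can be sketched as follows. The corrected item-total correlation of an item is taken here to be the Pearson correlation between the item score and the total score of the remaining items, which is the usual definition. Whether the total is formed within each subscale or across all items is not stated in this section, so the sketch simply treats one set of items. The function names and the tiny response matrix are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long sequences (nan if undefined)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def corrected_item_total(responses):
    """responses: one list of 0/1 item scores per subject."""
    totals = [sum(row) for row in responses]
    result = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        rest = [t - v for t, v in zip(totals, item)]   # total score without item j
        result.append(pearson(item, rest))
    return result


# Made-up responses of four subjects to three items.
responses = [[1, 1, 1], [1, 1, 0], [0, 0, 1], [0, 0, 0]]
citc = corrected_item_total(responses)

# Keep the items whose corrected item-total correlation exceeds 0.2.
keep = [j for j, r in enumerate(citc) if r > 0.2]
```

In this toy example the first two items are retained and the third, which is uncorrelated with the rest, is dropped.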

Table 3. The Eysenck Personality Questionnaire items.

https://doi.org/10.1371/journal.pone.0279918.t003

In the analysis, we designate two items related to each factor for identifiability. Based on the meaning of the items and previous research, we assign items 1 and 9 to P, items 14 and 15 to E, and items 32 and 34 to N. We employ IEML1 to estimate the loading structure and then compute the observed BIC under each candidate tuning parameter in (0.040, 0.038, 0.036, …, 0.002) × N, where N = 754 denotes the sample size. The minimal BIC value is 38902.46, corresponding to η = 0.02 × N. The parameter estimates of A and b are given in Table 4, and the estimate of Σ is

Table 4. The parameter estimates by the IEML1 algorithm for the real data.

https://doi.org/10.1371/journal.pone.0279918.t004

According to the results, most items remain associated with a single trait, while some items are related to more than one trait. Most of these findings are sensible. For example, item 19 (‘Would you call yourself happy-go-lucky?’), designed for extraversion, is also related to neuroticism, which reflects individuals’ emotional stability. Item 49 (‘Do you often feel lonely?’) is also related to extraversion, whose characteristics include enjoying going out and socializing. In addition, it is reasonable that item 30 (‘Does your mood often go up and down?’) and item 40 (‘Would you call yourself tense or ‘highly-strung’?’) are related to both neuroticism and psychoticism.

6 Concluding remarks

In this paper, we obtain a new weighted log-likelihood based on a new artificial data set for M2PL models, and consequently propose IEML1 to optimize the L₁-penalized log-likelihood for latent variable selection. We give a heuristic approach for choosing the quadrature points used in the numerical quadrature in the E-step, which reduces the computational burden of IEML1 significantly. IEML1 has three advantages over EML1, the two-stage method, EIFAthr and EIFAopt. First, the computational complexity of the M-step in IEML1 is reduced from O(N × G) to O(2 × G). In our simulation studies, IEML1 needs a few minutes for M2PL models with no more than five latent traits. Second, IEML1 updates the covariance matrix Σ of the latent traits and gives a more accurate estimate of Σ. Third, IEML1 outperforms the two-stage method, EIFAthr and EIFAopt in terms of the CR of the latent variable selection and the MSE of the parameter estimates.

The current study will be extended in the following directions in future research. First, we will generalize IEML1 to multidimensional three-parameter (or four-parameter) logistic models, which have received much attention in recent years. Second, other numerical integration methods, such as Gauss-Hermite quadrature [4, 29] and adaptive Gauss-Hermite quadrature [34], can be adopted in the E-step of IEML1. Gauss-Hermite quadrature uses the same fixed grid point set for each individual and can be easily adopted in the framework of IEML1, although further simulation results are needed. Compared to Gauss-Hermite quadrature, adaptive Gauss-Hermite quadrature produces an accurate, fast-converging solution with as few as two points per dimension for the estimation of MIRT models [34]. Adaptive Gauss-Hermite quadrature therefore also has potential for use in penalized likelihood estimation for MIRT models, although it does not admit our new weighted log-likelihood in Eq (15) because it applies a different grid point set to each individual. Third, we will accelerate IEML1 by parallel computing for medium-to-large scale variable selection, as [40] reported larger performance gains for MIRT estimation by applying parallel computing. Fourth, the new weighted log-likelihood on the new artificial data proposed in this paper will be applied to the EMS algorithm in [26] to reduce the computational complexity of the MS-step.

Supporting information

S1 Appendix. True discrimination and difficulty parameters in simulation studies.

https://doi.org/10.1371/journal.pone.0279918.s001

(PDF)

S2 Appendix. FNR, FPR and precision of the loading structure in the simulation for the unknown Σ case.

https://doi.org/10.1371/journal.pone.0279918.s002

(PDF)

S3 Appendix. Data sets of the study.

https://doi.org/10.1371/journal.pone.0279918.s003

(PDF)

S4 Appendix. R codes of IEML1.

https://doi.org/10.1371/journal.pone.0279918.s004

(PDF)

References

1. Reckase MD. Multidimensional Item Response Theory. 1st ed. New York: Springer; 2009.
2. Janssen R, De Boeck P. Confirmatory analyses of componential test structure using multidimensional item response theory. Multivariate Behavioral Research. 1999; 34(2): 245–268. pmid:26753937
3. Mckinley R. Confirmatory analysis of test structure using multidimensional item response theory. ETS Research Report Series. 1989; 2: i–40.
4. Bock RD, Gibbons R, Muraki E. Full-information item factor analysis. Applied Psychological Measurement. 1988; 12(3): 261–280.
5. Béguin AA, Glas CAW. MCMC estimation and some model-fit analysis of multidimensional IRT models. Psychometrika. 2001; 66(4): 541–561.
6. da Silva MA, Liu R, Huggins-Manley AC, Bazán JL. Incorporating the Q-matrix into multidimensional item response theory models. Educational and Psychological Measurement. 2019; 79(4): 665–687. pmid:32655178
7. Cai L. High-dimensional exploratory item factor analysis by a Metropolis-Hastings Robbins-Monro algorithm. Psychometrika. 2010; 75(1): 33–57.
8. Bernaards CA, Jennrich RI. Gradient projection algorithms and software for arbitrary rotation criteria in factor analysis. Educational and Psychological Measurement. 2005; 65(5): 676–696.
9. Browne MW. An overview of analytic rotation in exploratory factor analysis. Multivariate Behavioral Research. 2001; 36(1): 111–150.
10. Sass DA, Schmitt TA. A comparative investigation of rotation criteria within exploratory factor analysis. Multivariate Behavioral Research. 2010; 45(1): 73–103. pmid:26789085
11. Jin S, Moustaki I, Yang-Wallentin F. Approximated penalized maximum likelihood for exploratory factor analysis: An orthogonal case. Psychometrika. 2018; 83(3): 628–649. pmid:29876715
12. Sun J, Chen Y, Liu J, Ying Z, Xin T. Latent variable selection for multidimensional item response theory models via L₁ regularization. Psychometrika. 2016; 81(4): 921–939.
13. Hui FKC, Tanaka E, Warton DI. Order selection and sparsity in latent variable models via the ordered factor LASSO. Biometrics. 2018; 74(4): 1311–1319. pmid:29750847
14. Scharf F, Nestler S. Should regularization replace simple structure rotation in exploratory factor analysis? Structural Equation Modeling: A Multidisciplinary Journal. 2019; 26(4): 576–590.
15. Hirose K, Konishi S. Variable selection via the weighted group lasso for factor analysis models. The Canadian Journal of Statistics. 2012; 40(2): 345–361.
16. Hirose K, Yamamoto M. Sparse estimation via nonconcave penalized likelihood in factor analysis model. Statistics and Computing. 2015; 25(5): 863–875.
17. Chen Y, Liu J, Xu G, Ying Z. Statistical analysis of Q-matrix based diagnostic classification models. Journal of the American Statistical Association. 2015; 110(510): 850–866. pmid:26294801
18. Liu J, Kang HA. Q-matrix learning via latent variable selection and identifiability. In: von Davier M, Lee YS, editors. Handbook of Diagnostic Classification Models. Cham: Springer; 2019. pp. 247–263.
19. Huang PH, Chen H, Weng LJ. A penalized likelihood method for structural equation modeling. Psychometrika. 2017; 82(2): 329–354. pmid:28417228
20. Magis D, Tuerlinckx F, De Boeck P. Detection of differential item functioning using the lasso approach. Journal of Educational and Behavioral Statistics. 2015; 40(2): 111–135.
21. Tutz G, Schauberger G. A penalty approach to differential item functioning in Rasch models. Psychometrika. 2015; 80(1): 21–43. pmid:24297435
22. Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B. 1996; 58(1): 267–288.
23. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society Series B. 1977; 39(1): 1–38.
24. Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software. 2010; 33(1): 1–22. pmid:20808728
25. Zhang S, Chen Y. Computation for latent variable model estimation: A unified stochastic proximal framework. Psychometrika. 2022; 87(4): 1473–1502. pmid:35524934
26. Xu PF, Shang L, Zheng QZ, Shan N, Tang ML. Latent variable selection in multidimensional item response theory models using the expectation model selection algorithm. British Journal of Mathematical and Statistical Psychology. 2022; 75(2): 363–394. pmid:34918834
27. Jiang J, Nguyen T, Rao JS. The E-MS algorithm: Model selection with incomplete data. Journal of the American Statistical Association. 2015; 110(511): 1136–1147. pmid:26783375
28. Schwarz G. Estimating the dimension of a model. The Annals of Statistics. 1978; 6(2): 461–464.
29. Bock RD, Aitkin M. Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika. 1981; 46(4): 443–459.
30. Baker FB, Kim SH. Item Response Theory: Parameter Estimation Techniques. 2nd ed. Boca Raton: CRC Press; 2004.
31. Zheng C, Meng X, Guo S, Liu Z. Expectation-maximization-maximization: A feasible MLE algorithm for the three-parameter logistic model based on a mixture modeling reformulation. Frontiers in Psychology. 2018; 8:2302. pmid:29354089
32. Chen P, Wang C. Using EM algorithm for finite mixtures and reformed supplemented EM for MIRT calibration. Psychometrika. 2021; 86(1): 299–326. pmid:33591556
33. Ibrahim JG. Incomplete data in generalized linear models. Journal of the American Statistical Association. 1990; 85(411): 765–769.
34. Schilling S, Bock RD. High-dimensional maximum marginal likelihood item factor analysis by adaptive quadrature. Psychometrika. 2005; 70(3): 533–555.
35. Meng XL, Schilling S. Fitting full-information item factor models and an empirical investigation of bridge sampling. Journal of the American Statistical Association. 1996; 91(435): 1254–1267.
36. Zhang S, Chen Y, Liu Y. An improved stochastic EM algorithm for large-scale full-information item factor analysis. British Journal of Mathematical and Statistical Psychology. 2020; 73(1): 44–71. pmid:30511445
37. Parikh N, Boyd S. Proximal algorithms. Foundations and Trends in Optimization. 2014; 1(3): 127–239.
38. Eysenck S, Barrett P. Re-introduction to cross-cultural studies of the EPQ. Personality and Individual Differences. 2013; 54(4): 485–489.
39. Kline P. A Handbook of Test Construction: Introduction to Psychometric Design. New York: Methuen; 1986.
40. von Davier M. New results on an improved parallel EM algorithm for estimating generalized latent variable models. In: van der Ark LA, Wiberg M, Culpepper SA, Douglas JA, Wang WC, editors. Quantitative Psychology. Cham: Springer; 2017. pp. 1–8. https://doi.org/10.1007/978-3-319-56294-0_1

