Abstract
This paper introduces a new family of matrix variate distributions based on the mean-mixture of normal (MMN) models. The properties of the new matrix variate family, namely the stochastic representation, moments and characteristic function, linear and quadratic forms, as well as marginal and conditional distributions, are investigated. Three special cases, including the restricted skew-normal, exponentiated MMN and mixed-Weibull MMN matrix variate distributions, are presented and studied. Based on the specific representation of the proposed model, an EM-type algorithm can be directly implemented for obtaining maximum likelihood estimates of the parameters. The usefulness and practical utility of the proposed methodology are illustrated through two simulation studies and an analysis of the Landsat satellite dataset.
Citation: Naderi M, Bekker A, Arashi M, Jamalizadeh A (2020) A theoretical framework for Landsat data modeling based on the matrix variate mean-mixture of normal model. PLoS ONE 15(4): e0230773. https://doi.org/10.1371/journal.pone.0230773
Editor: Daniel Capella Zanotta, Universidade do Vale do Rio dos Sinos, BRAZIL
Received: August 26, 2019; Accepted: February 11, 2020; Published: April 9, 2020
Copyright: © 2020 Naderi et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are available: http://archive.ics.uci.edu/ml.
Funding: M. Naderi and A. Bekker acknowledge the research support provided by the National Research Foundation (NRF) of South Africa, Reference: CPRR160403161466 grant Number: 105840, Reference: SRUG190308422768 grant Number: 120839 and STATOMET. M. Arashi is also based upon research supported in part by the NRF of South Africa, Ref: IFR170227223754 grant Number: 109214 and SARChI Research Chair-UID: 71199 and Iran National Science Foundation (INSF) with grant number 97019472.
Competing interests: All authors have declared that no competing interests exist.
1 Introduction
The skew-normal (SN) distribution, initially introduced by Azzalini [1], has received considerable attention in both theoretical and applied statistics over the past two decades. Various extensions, forms and properties of the SN distribution in the multivariate case were derived in [2–5] and the references therein. An interesting form of the SN distribution was presented by Pyne et al. [3], who named it the restricted multivariate SN (rSN) model. Generally, the rSN distribution can be expressed as a linear combination of a multivariate normally distributed random vector and a univariate truncated normal random variable. Although the rSN model, like the original SN one, can describe the skewness of data, it is still not robust against outlying observations. To overcome this drawback, Negarestani et al. [6] used the rSN construction to introduce the family of multivariate mean mixture of normal (MMN) models. Specifically, a p-dimensional random vector X belongs to the family of MMN distributions if
X =ᵈ μ + λW + Z,  (1)
where '=ᵈ' stands for equality in distribution, Z follows the multivariate normal model with zero mean vector and covariance matrix Σ, λ is a p-dimensional skewness vector, and W is an arbitrary random variable independent of Z. It is clear that the rSN distribution is the special case of (1) in which the mixing variable W follows the standard normal distribution truncated to the interval (0, ∞), denoted by TN(0, 1; (0, ∞)). It is shown by Negarestani et al. [6] that the family of MMN distributions may provide new models with a wider range of skewness and kurtosis than the rSN, skew-t [4] and skew Student-t-normal [7] distributions. From (1), the probability density function (pdf) of the random vector X can be presented as
f(x) = ∫ ϕp(x; μ + λw, Σ) h(w; ν) dw,  (2)
where ϕp(⋅;⋅) denotes the pdf of the multivariate normal distribution and h(·; ν) is the pdf of W, parameterized by the vector parameter ν. The notation X ∼ MMNp(μ, λ, Σ; h) will be used to indicate that X has pdf (2). When the random variable W takes values on the whole real line, the pdf (2) can be either symmetric or asymmetric. However, a more flexible and skewed version of the MMN model is obtained if W has an asymmetric distribution or a distribution with positive support, such as the truncated normal, exponential or gamma distribution. Moreover, the pdf (2) can include skew-elliptical models, such as the rSN distribution, or can yield skew non-elliptically contoured models if, for example, W is distributed as the exponential, Weibull or gamma model. From Fig 2 in Appendix A, it is observed that the family of MMN distributions offers a different orientation compared with the family of mean-variance mixture of normal (MVMN) distributions [8].
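For intuition, representation (1) is straightforward to simulate. The Python sketch below (an illustrative aid, not part of the paper; the function name `rmmn` is ours) draws from a bivariate MMN model with a half-normal mixing variable, i.e. the rSN special case, and checks the mean μ + λE(W), where E(W) = √(2/π) for W ∼ TN(0, 1; (0, ∞)):

```python
import numpy as np

rng = np.random.default_rng(0)

def rmmn(n, mu, lam, Sigma, rW):
    """Draw n vectors from X = mu + lam*W + Z with Z ~ N_p(0, Sigma)
    and W drawn by rW, independently of Z (representation (1))."""
    p = len(mu)
    Z = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    W = rW(n)
    return mu + np.outer(W, lam) + Z

# rSN case: W ~ TN(0, 1; (0, inf)), i.e. a half-normal variable.
half_normal = lambda n: np.abs(rng.standard_normal(n))
X = rmmn(20000, np.array([0.0, 0.0]), np.array([2.0, 1.0]),
         np.eye(2), half_normal)
# The sample mean is close to mu + lam * sqrt(2/pi).
```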
Matrix variate distributions find their genesis in modeling dependent multivariate observations in the normal case [9]. Recent uses of the matrix variate normal (MVN) distribution can be found in modeling a wide variety of three-way data appearing in studies of control theory, stochastic systems, image recognition, repeated vector measurements, multivariate time series and spatial data, among others [10, 11]. Although the MVN distribution inherits appealing properties, features and widespread applications from the multivariate normal model, it is not robust against non-normal features such as asymmetry and heavy tails. To deal with heavy-tailed data, Kshirsagar and Bartlett [12] proposed the matrix variate t distribution by showing that the estimator of the parameter matrix of regression coefficients unconditionally follows a matrix variate t model. Bulut and Arslan [13] proposed the matrix variate slash distribution as a scale mixture of the matrix variate normal and uniform distributions. Moreover, in accommodating skewness and kurtosis, the interest in skew distributions provides a platform for robust extensions of matrix variate distributions. For instance, work on matrix variate versions of the SN distribution can be found in [14–17]. Even though the matrix variate SN distribution has many attractive properties, it lacks robustness in dealing with heavy-tailed data and poses difficulties in parameter estimation. Regarding these drawbacks of the matrix variate SN model, and considering the aforementioned properties of the MMN family of distributions, the objective of this paper is to propose a family of matrix variate mean-mixture of normal (MVMMN) distributions. Some properties and features of the introduced family, such as moments, the characteristic function, and marginal and conditional distributions, are studied.
The maximum likelihood (ML) estimates of the model parameters are computed by applying an expectation-maximization (EM) type algorithm [18].
The contribution of this work can be broken down into six parts. We begin with the model formulation of the MVMMN distribution in Section 2. Properties and characteristics of the MVMMN distribution are studied in Section 3. The parameter estimation procedure using the EM-type algorithm and some computational strategies for its implementation are given in Section 4. To examine the performance of the methodology in practice, simulation and real-world data analyses are presented in Sections 5 and 6. Finally, Section 7 gives some concluding remarks and future extensions.
2 Proposed family
To start, we introduce some notation and definitions. A p × n random matrix Y follows a MVN distribution if its pdf is given as
f(Y; M, Σ, Ψ) = (2π)^{−np/2} |Σ|^{−n/2} |Ψ|^{−p/2} etr{−δ(Y, M, Ψ, Σ)/2},  (3)
where etr{A} = exp(tr(A)), tr(⋅) is the trace operator of a matrix, δ(X, M, Ψ, Σ) = Σ−1(X − M)Ψ−1(X − M)⊤ denotes the matrix variate Mahalanobis squared distance, M is the p × n mean matrix, and the two dispersion matrices Σ (p × p) and Ψ (n × n) are positive definite. We shall use the notation Y ∼ Np,n(M, Σ, Ψ) if Y has pdf (3). The following definition is a new result that extends representation (1) to the matrix format.
Definition 1 A p × n random matrix Y is said to have a MVMMN distribution, denoted by Y ∼ MVMMNp,n(M, Λ, Σ, Ψ; h), if it can be generated by the stochastic representation
Y =ᵈ M + ΛW + X,  (4)
where X ∼ Np,n(0, Σ, Ψ), W is a random variable, independent of X, distributed by h(w; ν), and Λ is the p × n skewness matrix.
It can be easily seen that the hierarchical representation of the MVMMN model is
Y | (W = w) ∼ Np,n(M + Λw, Σ, Ψ),  W ∼ h(w; ν).  (5)
Hence, the pdf of Y can be given as
f(Y; Θ) = ∫ ϕp,n(Y; M + Λw, Σ, Ψ) h(w; ν) dw,  (6)
where ϕp,n(⋅; M, Σ, Ψ) denotes the pdf (3). Applying the well-known vectorization property of the MVN distribution, we have
vec(Y) | (W = w) ∼ Nnp(vec(M + Λw), Ψ ⊗ Σ),  (7)
where vec(B) denotes the vectorization operator of matrix B, and ⊗ stands for the Kronecker product.
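The vectorization property (7) can be checked numerically. The following minimal Python sketch (illustrative only, not from the paper; `rmatnorm` is a hypothetical helper) draws one MVMMN matrix from representation (4) with a half-normal mixing variable, and verifies empirically that vec of the matrix normal part has covariance Ψ ⊗ Σ:

```python
import numpy as np

rng = np.random.default_rng(1)

def rmatnorm(M, Sigma, Psi):
    """One draw from the matrix variate normal N_{p,n}(M, Sigma, Psi)
    via M + A E B^T, with Sigma = A A^T, Psi = B B^T and E i.i.d. N(0,1)."""
    A = np.linalg.cholesky(Sigma)
    B = np.linalg.cholesky(Psi)
    E = rng.standard_normal(M.shape)
    return M + A @ E @ B.T

Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])   # p x p row covariance
Psi = np.array([[1.0, 0.3], [0.3, 1.0]])     # n x n column covariance
M = np.zeros((2, 2))

# One MVMMN draw from (4) with a half-normal mixing variable W:
Lam = np.full((2, 2), 0.5)                   # skewness matrix
W = np.abs(rng.standard_normal())
Y = M + Lam * W + rmatnorm(M, Sigma, Psi)

# Empirical covariance of vec(X) should be close to Psi kron Sigma, as in (7).
draws = np.stack([rmatnorm(M, Sigma, Psi).flatten(order="F")
                  for _ in range(20000)])
cov = np.cov(draws, rowvar=False)
```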
Remark 1 Referring to representation (4), it is clear that the mean of Y is M + Λ E(W), showing that under the MVMMN model the mean is not fixed across members of the population. We would like to emphasize that the family of matrix variate normal mean-variance mixture (MVNMVM) models [19, 20] assumes that both the mean and the variance vary across population members. Therefore, an interesting extension of the MVMMN distribution can be introduced by considering the family of scale mixtures of MVMMN distributions.
2.1 Special cases
- Restricted matrix variate skew-normal: If W ∼ TN(0, 1; (0, ∞)) in (4), then the restricted matrix variate SN (RMVSN) distribution arises. The resulting pdf of Y, obtained directly by integrating out W in (6), is
(8) where η2 = tr(Ψ−1 Λ⊤ Σ−1 Λ) + 1, A = η−1 [tr(Ψ−1 Λ⊤ Σ−1(Y − M))], and Φ(⋅) denotes the cumulative distribution function of the standard normal model.
Lemma 1 If Y ∼ RMVSNp,n(M, Λ, Σ, Ψ), then
where ϕ(⋅) is the pdf of the standard normal distribution.
Proposition 1 Let Y ∼ RMVSNp,n(M, Λ, Σ, Ψ) and W ∼ TN(0, 1; (0, ∞)). Then, W conditionally on Y, denoted by W|Y, follows a truncated normal distribution.
Proof. Using the hierarchical representation (5), the pdf of the RMVSN model (8), and Bayes' rule, we have
which completes the proof after using some matrix factorizations.
- Convolution with exponential model: The exponentiated MVMMN (MVMMNE) distribution, say Y ∼ MVMMNEp,n(M, Λ, Σ, Ψ), is derived as another special case of (4) if W ∼ E(1), where E(1) denotes the exponential distribution with mean 1. This leads to the pdf of Y, obtained from (6), as
where
,
.
Proposition 2 Let Y ∼ MVMMNEp,n(M, Λ, Σ, Ψ) and W ∼ E(1). Then, W|Y follows a truncated normal distribution.
Proof. The proof can be completed in a similar manner to Proposition 1.
- Convolution with Weibull model: The mixed-Weibull MVMMN (MVMMNW) distribution, denoted by Y ∼ MVMMNWp,n(M, Λ, Σ, Ψ), arises when W in (4) follows the Weibull model with shape parameter α = 2 and scale parameter β = 1. Hence, the associated pdf of Y, obtained from (6), is
where
,
.
Proposition 3 Let Y ∼ MVMMNWp,n(M, Λ, Σ, Ψ) and W follow the Weibull distribution with α = 2 and β = 1. Then, W|Y has the pdf
Moreover, for r = 1, 2, …,
where
.
Proof. Results can be obtained from the Bayes’ rule and some matrix factorizations.
Theorem 1 The MVMMN distribution is log-concave if W has log-concave pdf.
Proof. Based on [21], if f(x) and g(y) are log-concave functions, then their convolution is also a log-concave function. Hence, the vectorization property (7) of the MVMMN distribution and the fact that the MVN pdf is log-concave complete the proof whenever W has a log-concave pdf.
Corollary 1 The RMVSN, MVMMNE and MVMMNW distributions are log-concave.
Proof. Since the truncated normal, exponential and Weibull (when the shape parameter is ≥ 1) distributions are log-concave, their associated matrix variate models are log-concave as well, by Theorem 1.
3 Characteristics
This section provides some substantial statistical properties of the MVMMN distribution.
Theorem 2 If Y ∼ MVMMNp,n(M, Λ, Σ, Ψ; h), then the mean and the characteristic function of Y, respectively, are
where φW(⋅) is the characteristic function of W ∼ h(w; ν).
Proof. The proof can be completed using the representations presented in Definition 1. Taking expectations on both sides of the stochastic representation (4) proves the first part. For the second part, recall that the characteristic function of the matrix variate Y is given as
Hence, through the hierarchical representation (5), the characteristic function of is obtained by
.
Theorem 3 Let
, and M = (mij), Λ = (λij), Σ = (σij), Ψ = (ψij). Then, we have
Proof. (i) follows by using the hierarchical representation (5) and applying theorem 2.3.3 of [22]. For M = 0, it is clear from part (i) that
Therefore, we have
which completes the proof.
Theorem 4 The family of MVMMN distributions is closed under the transpose operator, i.e.,
Proof. Based on theorem 2.3.1 of [22], we have
Now, applying this transpose property of the MVN distribution into the hierarchical representation (5) results in
Theorem 5 Let
, and let B be a q × p matrix of rank q ≤ p and D an n × m matrix of rank m ≤ n. Then,
Proof. The proof of the theorem is completed by obtaining the characteristic function of :
where T1 = DT⊤ B. Now, by applying Theorem 2, we have
which is the characteristic function of
.
Theorem 6 Let
, and partition
, M, Λ, Σ, and Ψ as
where
, and
. Then,
Similarly, the marginal distribution of
, and
can be obtained.
Proof. The proof follows by applying Theorem 5 with B = (Iq 0q×(p−q)) and D = (Im 0m×(n−m))⊤, where Id denotes the identity matrix of order d.
Theorem 7 Let
, and partition Ψ, Σ as Theorem 6, and
, M, Λ as follows
where
, and
. Then,
- (i).
, and
.
- (ii).
, and
, where
,
, and
.
Proof. The proof of (i) is completed by considering proper matrices B and D in Theorem 5. Using the hierarchical representation (5) and applying theorem 2.3.12 of [22], the second part of the theorem is proven.
Corollary 2 If
and under partition of Theorem 7, we have
- (i).
where
,
and
.
- (ii).
, where
,
, and
.
Corollary 3 If
and under partition of Theorem (7), we have
- (i).
where
,
, and
.
- (ii).
, where
,
, and
.
The distribution of the matrix quadratic form, derived by [23], can also be applied in the context of the MVMMN family of distributions. Referring to theorem 2.2 of [23], the distribution of the quadratic form is defined to be
where A is an n × n symmetric real matrix of rank r, and
.
Theorem 8 Let
and W ∼ h(w; ν), and let A be an n × n symmetric matrix of rank r. Then, conditionally on W = w,
are identically distributed, where δj are the non-zero eigenvalues of
and Bj are independent non-central Wishart random matrices
for j = 1, …, r, where mj = Maj and aj are the corresponding orthogonal eigenvectors (
).
Proof. Using hierarchical representation (5) of the MVMMN model, we have . Consequently, the property of the matrix variate normal distribution leads to
. Now, by definition 2.1 of [23], we have
On the other hand, through theorem 2.2 of [23], we have
Therefore, the random matrices and B have identical distributions.
4 Parameter estimation
Suppose N matrix observations Y1, …, YN of dimension p × n are drawn independently and identically from the MVMMN distribution. Therefore, the log-likelihood function of
based on the observed data
is
(9)
To obtain the ML estimate of Θ, an EM-type algorithm is implemented; it is a powerful estimation approach for dealing with unobserved (missing and/or censored) data and latent variables [18]. The computations of the EM algorithm consist of two iterative steps. In the E-step, the expected value of the complete-data log-likelihood function (the joint log-likelihood of the observed data and the latent variables) is computed, while in the M-step, the parameter estimates are updated by maximizing this expected value.
Through the hierarchical representation (5), the complete-data log-likelihood function of Θ, obtained by introducing latent variables W = (w1, …, wN) and omitting additive constants, is
(10)
ML estimation of Θ is performed by using the expectation-conditional maximization (ECM; [24]) algorithm as follows.
- Initialization: Set the iteration number to k = 0 and choose a reasonable starting point Θ(k) = (M(k), Λ(k), Σ(k), Ψ(k), ν(k)). We point out that in our data examples the parameters are initialized by
, Λ(0) = 1p×n, Σ(0) = c1 Ip, Ψ(0) = c2 In. Here, 1p×n is a matrix of dimension p × n with unit elements. Moreover, the quantities c1 and c2 are computed, respectively, as
- E-step: The expected value of the complete-data log-likelihood function (10), called Q-function, is computed as
(11) where
,
, and depending on h(w; ν)
.
- First CM-step: Maximizing the Q-function with respect to M and Λ gives the following updates
where
and
.
- Second CM-step: Update Σ and Ψ, respectively,
- Third CM-step: The additional parameter ν depending on the distribution of Wi is updated by
Remark 2 The conditional expectations
and
involved in the Q-function (11) can be obtained from Lemma 1 and Propositions 1, 2 and 3 for our three considered models. Furthermore, we note that in all special cases considered in Section 2, the distribution of the mixing random variable W is parameter-free. Therefore, the last CM-step of the ECM algorithm is not necessary.
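The E-/CM-step cycle described above follows a generic ECM loop. A minimal, runnable sketch of that structure is shown below (illustrative only; `e_step`, `cm_steps` and `loglik` are hypothetical callables standing in for the model-specific formulas of Section 4):

```python
import numpy as np

def ecm(Y, theta0, e_step, cm_steps, loglik, eps=1e-5, max_iter=500):
    """Generic ECM loop: alternate an E-step (conditional expectations of
    the latent W_i given Y_i and the current parameters) with a sequence
    of CM-steps, until the log-likelihood increment drops below eps."""
    theta, ll_old = theta0, -np.inf
    for _ in range(max_iter):
        expectations = e_step(Y, theta)    # E-step: build the Q-function
        for cm in cm_steps:                # CM-steps: update parameter blocks
            theta = cm(Y, theta, expectations)
        ll = loglik(Y, theta)
        if ll - ll_old < eps:              # simple increment-based rule
            break
        ll_old = ll
    return theta, ll

# Toy check: for a normal mean with no latent structure, the single
# CM-step is the sample mean and the loop stops after one extra pass.
rng = np.random.default_rng(0)
Y = rng.normal(3.0, 1.0, size=1000)
theta_hat, ll = ecm(
    Y, 0.0,
    e_step=lambda Y, t: None,
    cm_steps=[lambda Y, t, e: Y.mean()],
    loglik=lambda Y, t: -0.5 * np.sum((Y - t) ** 2),
)
```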
4.1 Computational aspects
4.1.1 Convergence.
The EM algorithm can be iterated until a suitable convergence rule, such as the absolute increment ℓ(Θ(k+1)) − ℓ(Θ(k)) < ε or the relative increment |ℓ(Θ(k+1))/ℓ(Θ(k)) − 1| < ε, is satisfied, where ε is a user-specified tolerance and ℓ(⋅) is defined in (9). An alternative approach to determine convergence of the EM algorithm is the Aitken acceleration method [25]. To apply this approach, the asymptotic estimate of the log-likelihood at iteration k + 1, following [26], can be obtained as
ℓ∞(k+1) = ℓ(k) + (ℓ(k+1) − ℓ(k)) / (1 − a(k)),
where the Aitken acceleration at iteration k is
a(k) = (ℓ(k+1) − ℓ(k)) / (ℓ(k) − ℓ(k−1)).
Therefore, the algorithm can be considered to have converged at iteration k + 1 when ℓ∞(k+1) − ℓ(k+1) < ε [27]. In our study, the tolerance ε is set to 10−5.
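The Aitken stopping rule can be sketched as follows (illustrative Python; the function name is ours, not the paper's):

```python
def aitken_converged(loglik, eps=1e-5):
    """Aitken acceleration stopping rule on the last three log-likelihood
    values l^(k-1), l^(k), l^(k+1): converged when the asymptotic estimate
    l_inf^(k+1) = l^(k) + (l^(k+1) - l^(k)) / (1 - a^(k)) is within eps of
    l^(k+1), with a^(k) = (l^(k+1) - l^(k)) / (l^(k) - l^(k-1))."""
    if len(loglik) < 3:
        return False
    l0, l1, l2 = loglik[-3:]
    if l1 == l0:                 # flat sequence: nothing left to gain
        return True
    a = (l2 - l1) / (l1 - l0)    # Aitken acceleration at iteration k
    l_inf = l1 + (l2 - l1) / (1.0 - a)
    return abs(l_inf - l2) < eps

# Geometric log-likelihood path l_k = 10 - 5 * 0.5**k converging to 10:
path = [10.0 - 5.0 * 0.5 ** k for k in range(25)]
```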
4.1.2 Model selection.
The competing models in our data analysis are compared using the two most commonly used measures, the Akaike information criterion (AIC; [28]) and the Bayesian information criterion (BIC; [29]), defined as
AIC = 2m − 2ℓmax and BIC = m log(N) − 2ℓmax,
where m is the number of free parameters and ℓmax is the maximized log-likelihood value. Models with lower values of AIC or BIC are preferred.
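These criteria are straightforward to compute; a small helper (illustrative, not from the paper):

```python
import math

def aic_bic(loglik_max, m, N):
    """AIC = 2m - 2*l_max and BIC = m*log(N) - 2*l_max; lower is better."""
    aic = 2 * m - 2 * loglik_max
    bic = m * math.log(N) - 2 * loglik_max
    return aic, bic

# Compare two hypothetical fitted models on N = 200 observations:
aic1, bic1 = aic_bic(-512.3, m=20, N=200)
aic2, bic2 = aic_bic(-508.9, m=28, N=200)
# A higher log-likelihood does not guarantee a lower BIC once the
# parameter count m is penalized.
```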
5 Simulation studies
In this section, the performance of our model and its computational method is illustrated by conducting two simulation studies. The first simulation study aims at comparing the special cases of MVMMN model in dealing with skewed and leptokurtic simulated data. The second simulation study demonstrates whether our proposed ECM algorithm can provide good asymptotic properties.
Example 1 Model performance
In this experiment, simulated data are generated from a matrix variate normal inverse Gaussian (MVNIG; [20]) distribution with sample sizes N = 50, 100, 500, 1000 and 2000, to compare the performance of the three special cases of the MVMMN model. The MVNIG distribution belongs to the family of MVNMVM models where the mixing random variable follows the
, such that
denotes the generalized inverse Gaussian distribution with parameters (κ, χ, ψ) [30]. We consider this matrix variate distribution to generate non-normal data as it offers the desired level of asymmetry and leptokurtosis. Let χ = ψ = 3 and
Table 1 summarizes the average (ℓAV) and standard deviation (Std.) of the maximized log-likelihood, together with the frequencies (out of 200 replications) with which each model is chosen on the basis of the largest ℓmax value. The results in Table 1 reveal that the MVMMNE distribution provides a better fit than the other two MVMMN-based models. The outperformance of the MVMMNE distribution becomes more pronounced as the sample size N increases.
To compare the accuracy of the parameter estimates against the true values, the Frobenius (Frob.) norm is adopted. For a given d × m matrix A = [aij], the Frob. norm is defined as the square root of the sum of the squares of its elements, i.e. ‖A‖F = (Σi Σj aij²)^{1/2}. Table 2 shows the average Frob. norm of
and
, where
and
are the ML estimates of the fitted model in the ith replication. It is observed that the Frob. norm decreases as the sample size increases. We can also see that the Frob. norms for Σ and Ψ are very close to each other for all models, while the MVMMNE model has the least accurate estimates of M and Λ.
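The Frob. norm used above is simply the elementwise Euclidean norm; for example (illustrative Python):

```python
import numpy as np

def frob(A):
    """Frobenius norm: square root of the sum of squared elements."""
    A = np.asarray(A, dtype=float)
    return np.sqrt((A ** 2).sum())

A = np.array([[3.0, 0.0],
              [0.0, 4.0]])
print(frob(A))   # 5.0, agrees with np.linalg.norm(A, "fro")
```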
Example 2 Performance of the model under AR(1) dependent structure
In order to investigate the effect of an auto-regressive (AR(1)) dependence structure in Σ and Λ on the parameter estimates, we conduct another Monte Carlo simulation. In this experiment, we set A = 0 and Ψ−1 = I4, and
where λ = 0.5, 2 and ρ = 0.5, 0.8. For generating a random sample from the MVMMN model, the value 0.001 is added to the diagonal elements of Σ to ensure that it is a positive definite matrix.
In each of 200 replications, we generate data from the MVNIG distribution with the true parameter values displayed above and χ = ψ = 3, for sample sizes N = 100 and 1000. By fitting the RMVSN, MVMMNE and MVMMNW distributions to the generated data, the Frob. norms of
and
are obtained. Table 3 summarizes the average Frob. norm of the ML estimates of the fitted models. As expected, the Frob. norm of the parameters decreases as the sample size increases. It can also be observed that the MVMMNW distribution has the smallest Frob. norm of
and
for the selected combinations of λ and ρ.
Example 3 Finite sample properties of the ML estimates
The second simulation study aims at investigating the finite-sample properties of the ML estimators obtained via the ECM algorithm. We consider the situation where Monte Carlo samples of sizes N = 100 and 500 are generated from each of the three special cases of the MVMMN distribution. The presumed parameters for all distributions are the same as those used in Example 1. Fig 1 shows the marginal distributions of the columns, labeled V1, V2, V3 and V4, for the RMVSN, MVMMNE and MVMMNW distributions of a typical dataset of size 100. The solid red line highlights the marginal mean. In each of 1000 replications, the synthetic dataset was fitted with the true generating model via the ECM algorithm. To investigate the estimation accuracy, we calculate the bias and the mean squared error (MSE), defined as
bias = (1/1000) Σk (θ̂(k) − θtrue) and MSE = (1/1000) Σk (θ̂(k) − θtrue)²,
where θ̂(k) denotes the ML estimate of θtrue (a specific parameter) at the kth replication.
The detailed numerical results are reported in Table 4. It can be observed that the bias and MSE for all three special cases of the MVMMN distribution tend toward zero as the sample size increases, empirically confirming the consistency of the ML estimates obtained via the ECM algorithm.
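The bias and MSE criteria of this example can be computed from the replicated estimates as follows (illustrative Python sketch; the paper's own code is in R and available from the first author):

```python
import numpy as np

def bias_mse(estimates, theta_true):
    """Monte Carlo bias and MSE of a scalar parameter over replications:
    bias = mean(theta_hat) - theta_true,
    MSE  = mean((theta_hat - theta_true)**2)."""
    est = np.asarray(estimates, dtype=float)
    bias = est.mean() - theta_true
    mse = np.mean((est - theta_true) ** 2)
    return bias, mse

# Four hypothetical replicated estimates of a parameter with true value 1:
bias, mse = bias_mse([1.1, 0.9, 1.0, 1.2], 1.0)
```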
6 Analysis of Landsat data
To investigate the performance of the developed model in real-world data analysis, we consider the Landsat satellite data (LSD), originally obtained by NASA and available at the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml). Each line of the LSD contains the four spectral values of a nine-pixel neighborhood in a satellite image. In other words, each line of the LSD corresponds to a 4 × 9 matrix of observations. Moreover, each matrix of observations belongs to one of six classes, namely red soil, cotton crop, grey soil, damp grey soil, soil with vegetation stubble, and very damp grey soil. In our analysis, we focus on two classes, red soil and cotton crop, with sizes 461 and 224, respectively, for illustrative purposes.
We fitted the RMVSN, MVMMNE and MVMMNW distributions by implementing the ECM algorithm. Table 5 shows a summary of the ML fitting results, including the parameter estimates, maximized log-likelihood values, and the AIC and BIC of the three fitted models. It is observed that the MVMMNW and MVMMNE distributions outperform the others for the red soil and cotton crop data, respectively. Based on the values of the shape matrix Λ, it is clear that the estimated skewness parameters are moderately to highly significant, showing that the distribution of the matrix observations is skewed. Moreover, the estimated scale matrices Σ and Ψ highlight the covariance structure in the data.
7 Conclusion
This paper has introduced a new family of matrix variate distributions whose component pdfs arise from the mean-mixture of the matrix variate normal model. Some properties and characteristics, as well as three special cases of the new model, are derived. We have developed an EM-based algorithm for calibrating the matrix-type parameters to the data. It is shown that the MVMMN distribution is closed under the formation of marginal and conditional distributions and under affine transformations, which makes it flexible for use in various fields of three-way data analysis, such as multivariate time series, image processing and longitudinal data analysis. Simulation results show that the ML estimates obtained via the ECM algorithm are empirically consistent. Moreover, numerical results from the application to a real dataset reveal that the proposed model is well suited to dealing with skewed matrix variate experimental data.
The utility of our current approach can be extended to accommodate censored data, based on recent work in the multivariate case by [31, 32]. It may also be interesting to propose a family of scale mixtures of MVMMN distributions to deal with heavy-tailed three-way data. Another possible extension of the work herein is to consider finite mixture models based on the MVMMN distribution as promising tools for classifying and clustering heterogeneous matrix-valued asymmetric data [19, 33]. It would also be of interest to use the distributions of the eigenvalues associated with the quadratic form (Theorem 8; in complex form) to compute the channel capacity in wireless communication systems, since experimental data do not necessarily follow a normal distribution (see [34, 35]). All computations were carried out in the R language, and the computer program is available from the first author upon request.
Appendix A: Comparison of contour plots of the MMN and MVMN families
Fig 2 illustrates the contour plots of the bivariate rSN and bivariate exponentiated MMN (MMNE) distributions as special cases of the MMN family, as well as the contour plots of the bivariate generalized hyperbolic skew-t (GHST) and bivariate normal inverse Gaussian (NIG) distributions as special cases of the MVMN family.
Acknowledgments
Our sincere thanks go to the anonymous referees and the Associate Editor, for their comments which led to a considerable improvement on an earlier version of this paper.
References
- 1. Azzalini A. A class of distributions which includes the normal ones. Scandinavian Journal of Statistics. 1985;12(2):171–178.
- 2. Azzalini A, Genton MG. Robust likelihood methods based on the skew-t and related distributions. International Statistical Review. 2008;76(1):106–129.
- 3. Pyne S, Hu X, Wang K, Rossin E, Lin TI, Maier LM, et al. Automated high-dimensional flow cytometric data analysis. Proceedings of the National Academy of Sciences. 2009;106(21):8519–8524.
- 4. Lin TI. Robust mixture modeling using multivariate skew t distributions. Statistics and Computing. 2009;20(3):343–356.
- 5. Cabral CRB, Lachos VH, Prates MO. Multivariate mixture modeling using skew-normal independent distributions. Computational Statistics & Data Analysis. 2012;56(1):126–142.
- 6. Negarestani H, Jamalizadeh A, Shafiei S, Balakrishnan N. Mean mixtures of normal distributions: properties, inference and application. Metrika. 2019;82(4):501–528.
- 7. Ho HJ, Pyne S, Lin TI. Maximum likelihood inference for mixtures of skew Student-t-normal distributions through practical EM-type algorithms. Statistics and Computing. 2012;22(1):287–299.
- 8. McNeil AJ, Frey R, Embrechts P. Quantitative risk management: Concepts, techniques and tools. 2005.
- 9. Roy SN. Some aspects of multivariate analysis. Statistical Publishing Society, Kolkata; 1957.
- 10. Girko V, Gupta A. Multivariate elliptically contoured linear models and some aspects of the theory of random matrices. In: Multidimensional Statistical Analysis and Theory of Random Matrices: Proceedings of the Sixth Eugene Lukacs Symposium, Bowling Green, Ohio, USA, 29–30 March 1996. Walter de Gruyter GmbH & Co KG; 1996. p. 327.
- 11. Anderlucci L, Viroli C. Covariance pattern mixture models for the analysis of multivariate heterogeneous longitudinal data. The Annals of Applied Statistics. 2015;9(2):777–800.
- 12. Kshirsagar AM, Bartlett MS. Some extensions of the multivariate t distribution and the multivariate generalization of the distribution of the regression coefficient. Mathematical Proceedings of the Cambridge Philosophical Society. 1961;57(01):80.
- 13. Bulut YM, Arslan O. Matrix variate slash distribution. Journal of Multivariate Analysis. 2015;137:173–178.
- 14. Chen JT, Gupta AK. Matrix variate skew normal distributions. Statistics. 2005;39(3):247–253.
- 15. Harrar SW, Gupta AK. On matrix variate skew-normal distributions. Statistics. 2008;42(2):179–194.
- 16. Akdemir D, Gupta AK. A matrix variate skew distribution. European Journal of Pure and Applied Mathematics. 2010;3(2):128–140.
- 17. Zheng S, Hardin JM, Gupta AK. The inverse problem of multivariate and matrix-variate skew normal distributions. Statistics. 2012;46(3):361–371.
- 18. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. Journal of the royal statistical society Series B (methodological). 1977;39(1):1–38.
- 19. Gallaugher MPB, McNicholas PD. Finite mixtures of skewed matrix variate distributions. Pattern Recognition. 2018;80:83–93.
- 20. Gallaugher MPB, McNicholas PD. Three skewed matrix variate distributions. Statistics & Probability Letters. 2019;145:103–109.
- 21. Boyd S, Vandenberghe L. Convex optimization. Cambridge University Press; 2004.
- 22. Gupta AK, Nagar DK. Matrix variate distributions. Chapman and Hall/CRC; 1999.
- 23. Singull M, Koski T. On the distribution of matrix quadratic forms. Communications in Statistics—Theory and Methods. 2012;41(18):3403–3415.
- 24. Meng XL, Rubin DB. Maximum likelihood estimation via the ECM algorithm: A general framework. Biometrika. 1993;80(2):267–278.
- 25. Aitken AC. On Bernoulli’s numerical solution of algebraic equations. Proceedings of the Royal Society of Edinburgh. 1927;46:289–305.
- 26. Böhning D, Dietz E, Schaub R, Schlattmann P, Lindsay BG. The distribution of the likelihood ratio for mixtures of densities from the one-parameter exponential family. Annals of the Institute of Statistical Mathematics. 1994;46(2):373–388.
- 27. Lindsay BG. Mixture models: theory, geometry and applications. In: NSF-CBMS Regional Conference Series in Probability and Statistics. JSTOR; 1995. p. i–163.
- 28. Akaike H. Information theory and an extension of the maximum likelihood principle. In: Selected Papers of Hirotugu Akaike. Springer; 1998. p. 199–213.
- 29. Schwarz G, et al. Estimating the dimension of a model. The annals of statistics. 1978;6(2):461–464.
- 30. Good IJ. The population frequencies of species and the estimation of population parameters. Biometrika. 1953;40:237–260.
- 31. Wang WL, Lin TI, Lachos VH. Extending multivariate-t linear mixed models for multiple longitudinal data with censored responses and heavy tails. Statistical Methods in Medical Research. 2015;27(1):48–64. pmid:26668091
- 32. Lin TI, Lachos VH, Wang WL. Multivariate longitudinal data analysis with censored and intermittent missing responses. Statistics in Medicine. 2018;37(19):2822–2835.
- 33. Melnykov V, Zhu X. On model-based clustering of skewed matrix data. Journal of Multivariate Analysis. 2018;167:181–194.
- 34. Ratnarajah T, Vaillancourt R. Quadratic forms on complex random matrices and multiple-antenna systems. IEEE Transactions on Information Theory. 2005;51(8):2976–2984.
- 35. Bekker A, Ferreira J. Bivariate gamma type distributions for modeling wireless performance metrics. Statistics, Optimization & Information Computing. 2018;6(3).