Abstract
This paper introduces a new family of matrix variate distributions based on the mean-mixture of normal (MMN) models. The properties of the new matrix variate family, namely the stochastic representation, moments and characteristic function, linear and quadratic forms, as well as marginal and conditional distributions, are investigated. Three special cases, including the restricted skew-normal, exponentiated MMN and mixed-Weibull MMN matrix variate distributions, are presented and studied. Based on the specific representation of the proposed model, an EM-type algorithm can be directly implemented for obtaining maximum likelihood estimates of the parameters. The usefulness and practical utility of the proposed methodology are illustrated through two simulation studies and an analysis of the Landsat satellite dataset.
Citation: Naderi M, Bekker A, Arashi M, Jamalizadeh A (2020) A theoretical framework for Landsat data modeling based on the matrix variate mean-mixture of normal model. PLoS ONE 15(4): e0230773. https://doi.org/10.1371/journal.pone.0230773
Editor: Daniel Capella Zanotta, Universidade do Vale do Rio dos Sinos, BRAZIL
Received: August 26, 2019; Accepted: February 11, 2020; Published: April 9, 2020
Copyright: © 2020 Naderi et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are available: http://archive.ics.uci.edu/ml.
Funding: M. Naderi and A. Bekker acknowledge the research support provided by the National Research Foundation (NRF) of South Africa, Reference: CPRR160403161466 grant Number: 105840, Reference: SRUG190308422768 grant Number: 120839 and STATOMET. M. Arashi is also based upon research supported in part by the NRF of South Africa, Ref: IFR170227223754 grant Number: 109214 and SARChI Research Chair-UID: 71199 and Iran National Science Foundation (INSF) with grant number 97019472.
Competing interests: All authors have declared that no competing interests exist.
1 Introduction
The skew-normal (SN) distribution, initially introduced by Azzalini [1], has received considerable attention in both theoretical and applied statistics over the past two decades. Various extensions, forms and properties of the SN distribution in the multivariate case were derived in [2–5] and the references therein. An interesting form of the SN distribution was presented by Pyne et al. [3], who named it the restricted multivariate SN (rSN) model. Generally, the rSN distribution can be expressed as a linear combination of a multivariate normally distributed random vector and a univariate truncated normal random variable. Although the rSN model, like the original SN one, can describe the skewness of data, it is still not robust against outlying observations. To overcome this drawback, Negarestani et al. [6] used the rSN construction to introduce the family of multivariate mean mixture of normal (MMN) models. Specifically, a p-dimensional random vector X belongs to the family of MMN distributions if
X =ᵈ μ + λW + Z,  (1)
where '=ᵈ' stands for equality in distribution, Z follows the multivariate normal model with zero mean vector and covariance matrix Σ, λ is a p-dimensional skewness vector, and W is an arbitrary random variable independent of Z. It is clear that the rSN distribution is the special case of (1) in which the mixing variable W follows the standard normal distribution truncated to the interval (0, ∞), denoted by TN(0, 1; (0, ∞)). It is shown by Negarestani et al. [6] that the family of MMN distributions may provide new models with a wider range of skewness and kurtosis than the rSN, skew-t [4] and skew Student-t-normal [7] distributions. From (1), the probability density function (pdf) of the random vector X can be presented as
f(x) = ∫ ϕp(x; μ + λw, Σ) h(w; ν) dw,  (2)
where ϕp(⋅;⋅) denotes the pdf of the multivariate normal distribution and h(·; ν) is the pdf of W, parameterized by the vector parameter ν. The notation X ∼ MMNp(μ, λ, Σ; h) will be used to indicate that X has pdf (2). When the random variable W takes values on the whole real line, the pdf (2) can be either symmetric or asymmetric. However, a more flexible and skewed version of the MMN model is obtained if W has an asymmetric distribution or a distribution with positive support, such as the truncated normal, exponential or gamma distribution. Moreover, the pdf (2) can include skew-elliptical models, such as the rSN distribution, or can yield skew non-elliptically contoured models if, for example, W is distributed as the exponential, Weibull or gamma model. From Fig 2 in Appendix A, it is observed that the family of MMN distributions offers a different orientation compared with the family of mean-variance mixture of normal (MVMN) distributions [8].
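For intuition, representation (1) is straightforward to simulate. The Python sketch below (an illustrative aid, not part of the paper; the function name `rmmn` is ours) draws from a bivariate MMN model with a half-normal mixing variable, i.e. the rSN special case, and checks the mean μ + λE(W), where E(W) = √(2/π) for W ∼ TN(0, 1; (0, ∞)):

```python
import numpy as np

rng = np.random.default_rng(0)

def rmmn(n, mu, lam, Sigma, rW):
    """Draw n vectors from X = mu + lam*W + Z with Z ~ N_p(0, Sigma)
    and W drawn by rW, independently of Z (representation (1))."""
    p = len(mu)
    Z = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    W = rW(n)
    return mu + np.outer(W, lam) + Z

# rSN case: W ~ TN(0, 1; (0, inf)), i.e. a half-normal variable.
half_normal = lambda n: np.abs(rng.standard_normal(n))
X = rmmn(20000, np.array([0.0, 0.0]), np.array([2.0, 1.0]),
         np.eye(2), half_normal)
# The sample mean is close to mu + lam * sqrt(2/pi).
```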
Matrix variate distributions find their genesis in modeling dependent multivariate observations in the normal case [9]. Recent uses of the matrix variate normal (MVN) distribution can be found in modeling a wide variety of three-way data appearing in studies of control theory, stochastic systems, image recognition, repeated vector measurements, multivariate time series and spatial data, among others [10, 11]. Although the MVN distribution inherits appealing properties, features and widespread applications from the multivariate normal model, it is not robust against non-normal features such as asymmetry and heavy tails. To deal with heavy-tailed data, Kshirsagar and Bartlett [12] proposed the matrix variate t distribution by showing that the estimator of the parameter matrix of regression coefficients unconditionally follows a matrix variate t model. Bulut and Arslan [13] proposed the matrix variate slash distribution as a scale mixture of the matrix variate normal and uniform distributions. Moreover, in accommodating skewness and kurtosis, the interest in skew distributions provides a platform for robust extensions of matrix variate distributions. For instance, work on matrix variate versions of the SN distribution can be found in [14–17]. Even though the matrix variate SN distribution has many attractive properties, it lacks robustness in dealing with heavy-tailed data and poses difficulties in parameter estimation. Regarding these drawbacks of the matrix variate SN model, and considering the aforementioned properties of the MMN family of distributions, the objective of this paper is to propose a family of matrix variate mean-mixture of normal (MVMMN) distributions. Some properties and features of the introduced family, such as moments, the characteristic function, and marginal and conditional distributions, are studied.
The maximum likelihood (ML) estimates of the model parameters are computed by applying an expectation-maximization (EM) type algorithm [18].
The contribution of this work can be broken down into six parts. We begin with the model formulation of the MVMMN distribution in Section 2. Properties and characteristics of the MVMMN distribution are studied in Section 3. The parameter estimation procedure using the EM-type algorithm and some computational strategies for its implementation are given in Section 4. To examine the performance of the methodology in practice, simulation and real-world data analyses are presented in Sections 5 and 6. Finally, Section 7 gives some concluding remarks and future extensions.
2 Proposed family
To start, we introduce some notation and definitions. A p × n random matrix Y follows a MVN distribution if its pdf is given as
f(Y; M, Σ, Ψ) = (2π)^{−np/2} |Σ|^{−n/2} |Ψ|^{−p/2} etr{−δ(Y, M, Ψ, Σ)/2},  (3)
where etr{A} = exp(tr(A)), tr(⋅) is the trace operator of a matrix, δ(X, M, Ψ, Σ) = Σ−1(X − M)Ψ−1(X − M)⊤ denotes the matrix variate Mahalanobis squared distance, M is the p × n mean matrix, and the two dispersion matrices Σ (p × p) and Ψ (n × n) are positive definite. We shall use the notation Y ∼ Np,n(M, Σ, Ψ) if Y has pdf (3). The following definition is a new result that extends representation (1) to the matrix format.
Definition 1 A p × n random matrix Y is said to have a MVMMN distribution, denoted by Y ∼ MVMMNp,n(M, Λ, Σ, Ψ; h), if it can be generated by the stochastic representation
Y =ᵈ M + ΛW + X,  (4)
where X ∼ Np,n(0, Σ, Ψ), W is a random variable, independent of X, distributed by h(w; ν), and Λ is the p × n skewness matrix.
It can be easily seen that the hierarchical representation of the MVMMN model is
Y | (W = w) ∼ Np,n(M + Λw, Σ, Ψ),  W ∼ h(w; ν).  (5)
Hence, the pdf of Y can be given as
f(Y; Θ) = ∫ ϕp,n(Y; M + Λw, Σ, Ψ) h(w; ν) dw,  (6)
where ϕp,n(⋅; M, Σ, Ψ) denotes the pdf (3). Applying the well-known vectorization property of the MVN distribution, we have
vec(Y) | (W = w) ∼ Nnp(vec(M + Λw), Ψ ⊗ Σ),  (7)
where vec(B) denotes the vectorization operator of matrix B, and ⊗ stands for the Kronecker product.
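The vectorization property (7) can be checked numerically. The following minimal Python sketch (illustrative only, not from the paper; `rmatnorm` is a hypothetical helper) draws one MVMMN matrix from representation (4) with a half-normal mixing variable, and verifies empirically that vec of the matrix normal part has covariance Ψ ⊗ Σ:

```python
import numpy as np

rng = np.random.default_rng(1)

def rmatnorm(M, Sigma, Psi):
    """One draw from the matrix variate normal N_{p,n}(M, Sigma, Psi)
    via M + A E B^T, with Sigma = A A^T, Psi = B B^T and E i.i.d. N(0,1)."""
    A = np.linalg.cholesky(Sigma)
    B = np.linalg.cholesky(Psi)
    E = rng.standard_normal(M.shape)
    return M + A @ E @ B.T

Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])   # p x p row covariance
Psi = np.array([[1.0, 0.3], [0.3, 1.0]])     # n x n column covariance
M = np.zeros((2, 2))

# One MVMMN draw from (4) with a half-normal mixing variable W:
Lam = np.full((2, 2), 0.5)                   # skewness matrix
W = np.abs(rng.standard_normal())
Y = M + Lam * W + rmatnorm(M, Sigma, Psi)

# Empirical covariance of vec(X) should be close to Psi kron Sigma, as in (7).
draws = np.stack([rmatnorm(M, Sigma, Psi).flatten(order="F")
                  for _ in range(20000)])
cov = np.cov(draws, rowvar=False)
```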
Remark 1 Referring to representation (4), it is clear that the mean of Y is M + Λ E(W), showing that under the MVMMN model the mean is not fixed across members of the population. We would like to emphasize that the family of matrix variate normal mean-variance mixture (MVNMVM) models [19, 20] assumes that both the mean and the variance vary across population members. Therefore, an interesting extension of the MVMMN distribution can be introduced by considering the family of scale mixtures of MVMMN distributions.
2.1 Special cases
- Restricted matrix variate skew-normal: If W ∼ TN(0, 1; (0, ∞)) in (4), then the restricted matrix variate SN (RMVSN) distribution arises. The resulting pdf of Y, obtained directly by integrating out W in (6), is
(8) where η2 = tr(Ψ−1 Λ⊤ Σ−1 Λ) + 1, A = η−1 [tr(Ψ−1 Λ⊤ Σ−1(Y − M))], and Φ(⋅) denotes the cumulative distribution function of the standard normal model.
Lemma 1 If Y ∼ RMVSNp,n(M, Λ, Σ, Ψ), then
where ϕ(⋅) is the pdf of the standard normal distribution.
Proposition 1 Let Y ∼ RMVSNp,n(M, Λ, Σ, Ψ) and W ∼ TN(0, 1; (0, ∞)). Then, W conditionally on Y, denoted by W|Y, follows a truncated normal distribution.
Proof. Using the hierarchical representation (5), the pdf of the RMVSN model (8), and Bayes' rule, we have
which completes the proof after using some matrix factorizations.
- Convolution with exponential model: The exponentiated MVMMN (MVMMNE) distribution, say Y ∼ MVMMNEp,n(M, Λ, Σ, Ψ), is derived as another special case of (4) if W ∼ E(1), where E(1) denotes the exponential distribution with mean 1. This leads to the pdf of Y, obtained from (6), as
where
,
.
Proposition 2 Let Y ∼ MVMMNEp,n(M, Λ, Σ, Ψ) and W ∼ E(1). Then, W|Y follows a truncated normal distribution.
Proof. The proof can be completed in a similar manner to Proposition 1.
- Convolution with Weibull model: The mixed-Weibull MVMMN (MVMMNW) distribution, denoted by Y ∼ MVMMNWp,n(M, Λ, Σ, Ψ), arises when W in (4) follows the Weibull model with shape parameter α = 2 and scale parameter β = 1. Hence, the associated pdf of Y, obtained from (6), is
where
,
.
Proposition 3 Let Y ∼ MVMMNWp,n(M, Λ, Σ, Ψ) and W follow the Weibull distribution with α = 2 and β = 1. Then, W|Y has the pdf
Moreover, for r = 1, 2, …,
where
.
Proof. Results can be obtained from the Bayes’ rule and some matrix factorizations.
Theorem 1 The MVMMN distribution is log-concave if W has log-concave pdf.
Proof. Based on [21], if f(x) and g(y) are log-concave functions, then their convolution is also a log-concave function. Hence, the vectorization property (7) of the MVMMN distribution and the fact that the MVN pdf is log-concave complete the proof whenever W has a log-concave pdf.
Corollary 1 The RMVSN, MVMMNE and MVMMNW distributions are log-concave.
Proof. Since the truncated normal, exponential and Weibull (when the shape parameter is ≥ 1) distributions are log-concave, their associated matrix variate models are log-concave as well, by Theorem 1.
3 Characteristics
This section provides some substantial statistical properties of the MVMMN distribution.
Theorem 2 If Y ∼ MVMMNp,n(M, Λ, Σ, Ψ; h), then the mean and the characteristic function of Y, respectively, are
where φW(⋅) is the characteristic function of W ∼ h(w; ν).
Proof. The proof can be completed using the representations presented in Definition 1. Taking expectations on both sides of the stochastic representation (4) proves the first part. For the second part, recall that the characteristic function of the matrix variate Y is given as
Hence, through the hierarchical representation (5), the characteristic function of is obtained by
.
Theorem 3 Let
, and M = (mij), Λ = (λij), Σ = (σij), Ψ = (ψij). Then, we have
Proof. (i) follows by using the hierarchical representation (5) and applying theorem 2.3.3 of [22]. For M = 0, it is clear from part (i) that
Therefore, we have
which completes the proof.
Theorem 4 The family of MVMMN distributions is closed under the transpose operator, i.e.,
Proof. Based on theorem 2.3.1 of [22], we have
Now, applying this transpose property of the MVN distribution into the hierarchical representation (5) results in
Theorem 5 Let
, and let B be a q × p matrix of rank q ≤ p and D an n × m matrix of rank m ≤ n. Then,
Proof. The proof of the theorem is completed by obtaining the characteristic function of :
where T1 = DT⊤ B. Now, by applying Theorem 2, we have
which is the characteristic function of
.
Theorem 6 Let
, and partition
, M, Λ, Σ, and Ψ as
where
, and
. Then,
Similarly, the marginal distribution of
, and
can be obtained.
Proof. The proof follows by applying Theorem 5 with B = (Iq 0q×(p−q)) and D = (Im 0m×(n−m))⊤, where Id denotes the identity matrix of order d.
Theorem 7 Let
, and partition Ψ, Σ as Theorem 6, and
, M, Λ as follows
where
, and
. Then,
- (i).
, and
.
- (ii).
, and
, where
,
, and
.
Proof. The proof of (i) is completed by considering proper matrices B and D in Theorem 5. Using the hierarchical representation (5) and applying theorem 2.3.12 of [22], the second part of the theorem is proven.
Corollary 2 If
and under partition of Theorem 7, we have
- (i).
where
,
and
.
- (ii).
, where
,
, and
.
Corollary 3 If
and under partition of Theorem (7), we have
- (i).
where
,
, and
.
- (ii).
, where
,
, and
.
The distribution of the matrix quadratic form, derived by [23], can also be applied in the context of the MVMMN family of distributions. Referring to theorem 2.2 of [23], the distribution of the quadratic form is defined to be
where A is an n × n symmetric real matrix of rank r, and
.
Theorem 8 Let
and W ∼ h(w; ν), and let A be an n × n symmetric matrix of rank r. Then, conditionally on W = w,
are identically distributed, where δj are the non-zero eigenvalues of
and Bj are independent non-central Wishart random matrices
for j = 1, …, r, where mj = Maj and aj are the corresponding orthogonal eigenvectors (
).
Proof. Using hierarchical representation (5) of the MVMMN model, we have . Consequently, the property of the matrix variate normal distribution leads to
. Now, by definition 2.1 of [23], we have
On the other hand, through theorem 2.2 of [23], we have
Therefore, the random matrices and B have identical distributions.
4 Parameter estimation
Suppose N matrix observations Y1, …, YN of dimension p × n are drawn independently and identically from the MVMMN distribution. Therefore, the log-likelihood function of
based on the observed data
is
(9)
To obtain the ML estimate of Θ, an EM-type algorithm is implemented; it is a powerful estimation approach for dealing with unobserved (missing and/or censored) data and latent variables [18]. The computations of the EM algorithm consist of two iterative steps. In the E-step, the expected value of the complete-data log-likelihood function (the joint log-likelihood of the observed data and the latent variables) is computed, while in the M-step, the parameter estimates are updated by maximizing this expected value.
Through the hierarchical representation (5), the complete-data log-likelihood function of Θ, obtained by introducing latent variables W = (w1, …, wN) and omitting additive constants, is
(10)
ML estimation of Θ is performed by using the expectation-conditional maximization (ECM; [24]) algorithm as follows.
- Initialization: Set the iteration number to k = 0 and choose a reasonable starting point Θ(k) = (M(k), Λ(k), Σ(k), Ψ(k), ν(k)). We point out that in our data examples the parameters are initialized by
, Λ(0) = 1p×n, Σ(0) = c1 Ip, Ψ(0) = c2 In. Here, 1p×n is a matrix of dimension p × n with unit elements. Moreover, the quantities c1 and c2 are computed, respectively, as
- E-step: The expected value of the complete-data log-likelihood function (10), called Q-function, is computed as
(11) where
,
, and depending on h(w; ν)
.
- First CM-step: Maximizing the Q-function with respect to M and Λ gives the following updates
where
and
.
- Second CM-step: Update Σ and Ψ, respectively,
- Third CM-step: The additional parameter ν depending on the distribution of Wi is updated by
Remark 2 The conditional expectations
and
involved in the Q-function (11) can be obtained from Lemma 1 and Propositions 1, 2 and 3 for our three considered models. Furthermore, we note that in all special cases considered in Section 2, the distribution of the mixing random variable W is parameter-free. Therefore, the last CM-step of the ECM algorithm is not necessary.
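The E-/CM-step cycle described above follows a generic ECM loop. A minimal, runnable sketch of that structure is shown below (illustrative only; `e_step`, `cm_steps` and `loglik` are hypothetical callables standing in for the model-specific formulas of Section 4):

```python
import numpy as np

def ecm(Y, theta0, e_step, cm_steps, loglik, eps=1e-5, max_iter=500):
    """Generic ECM loop: alternate an E-step (conditional expectations of
    the latent W_i given Y_i and the current parameters) with a sequence
    of CM-steps, until the log-likelihood increment drops below eps."""
    theta, ll_old = theta0, -np.inf
    for _ in range(max_iter):
        expectations = e_step(Y, theta)    # E-step: build the Q-function
        for cm in cm_steps:                # CM-steps: update parameter blocks
            theta = cm(Y, theta, expectations)
        ll = loglik(Y, theta)
        if ll - ll_old < eps:              # simple increment-based rule
            break
        ll_old = ll
    return theta, ll

# Toy check: for a normal mean with no latent structure, the single
# CM-step is the sample mean and the loop stops after one extra pass.
rng = np.random.default_rng(0)
Y = rng.normal(3.0, 1.0, size=1000)
theta_hat, ll = ecm(
    Y, 0.0,
    e_step=lambda Y, t: None,
    cm_steps=[lambda Y, t, e: Y.mean()],
    loglik=lambda Y, t: -0.5 * np.sum((Y - t) ** 2),
)
```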
4.1 Computational aspects
4.1.1 Convergence.
The EM algorithm can be iterated until a suitable convergence rule, such as the absolute increment ℓ(Θ(k+1)) − ℓ(Θ(k)) < ε or the relative increment |ℓ(Θ(k+1))/ℓ(Θ(k)) − 1| < ε, is satisfied, where ε is a user-specified tolerance and ℓ(⋅) is defined in (9). An alternative approach to determine convergence of the EM algorithm is the Aitken acceleration method [25]. To apply this approach, the asymptotic estimate of the log-likelihood at iteration k + 1, following [26], can be obtained as
ℓ∞(k+1) = ℓ(k) + (ℓ(k+1) − ℓ(k)) / (1 − a(k)),
where the Aitken acceleration at iteration k is
a(k) = (ℓ(k+1) − ℓ(k)) / (ℓ(k) − ℓ(k−1)).
Therefore, the algorithm can be considered to have converged at iteration k + 1 when ℓ∞(k+1) − ℓ(k+1) < ε [27]. In our study, the tolerance ε is set to 10−5.
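The Aitken stopping rule can be sketched as follows (illustrative Python; the function name is ours, not the paper's):

```python
def aitken_converged(loglik, eps=1e-5):
    """Aitken acceleration stopping rule on the last three log-likelihood
    values l^(k-1), l^(k), l^(k+1): converged when the asymptotic estimate
    l_inf^(k+1) = l^(k) + (l^(k+1) - l^(k)) / (1 - a^(k)) is within eps of
    l^(k+1), with a^(k) = (l^(k+1) - l^(k)) / (l^(k) - l^(k-1))."""
    if len(loglik) < 3:
        return False
    l0, l1, l2 = loglik[-3:]
    if l1 == l0:                 # flat sequence: nothing left to gain
        return True
    a = (l2 - l1) / (l1 - l0)    # Aitken acceleration at iteration k
    l_inf = l1 + (l2 - l1) / (1.0 - a)
    return abs(l_inf - l2) < eps

# Geometric log-likelihood path l_k = 10 - 5 * 0.5**k converging to 10:
path = [10.0 - 5.0 * 0.5 ** k for k in range(25)]
```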
4.1.2 Model selection.
The competing models in our data analysis are compared using the two most commonly used measures, the Akaike information criterion (AIC; [28]) and the Bayesian information criterion (BIC; [29]), defined as
AIC = 2m − 2ℓmax and BIC = m log(N) − 2ℓmax,
where m is the number of free parameters and ℓmax is the maximized log-likelihood value. Models with lower values of AIC or BIC are preferred.
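These criteria are straightforward to compute; a small helper (illustrative, not from the paper):

```python
import math

def aic_bic(loglik_max, m, N):
    """AIC = 2m - 2*l_max and BIC = m*log(N) - 2*l_max; lower is better."""
    aic = 2 * m - 2 * loglik_max
    bic = m * math.log(N) - 2 * loglik_max
    return aic, bic

# Compare two hypothetical fitted models on N = 200 observations:
aic1, bic1 = aic_bic(-512.3, m=20, N=200)
aic2, bic2 = aic_bic(-508.9, m=28, N=200)
# A higher log-likelihood does not guarantee a lower BIC once the
# parameter count m is penalized.
```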
5 Simulation studies
In this section, the performance of our model and its computational method is illustrated by conducting two simulation studies. The first simulation study aims at comparing the special cases of MVMMN model in dealing with skewed and leptokurtic simulated data. The second simulation study demonstrates whether our proposed ECM algorithm can provide good asymptotic properties.
Example 1 Model performance
In this experiment, simulated data are generated from a matrix variate normal inverse Gaussian (MVNIG; [20]) distribution with sample sizes N = 50, 100, 500, 1000 and 2000, to compare the performance of the three special cases of the MVMMN model. The MVNIG distribution belongs to the family of MVNMVM models where the mixing random variable follows the
, such that
denotes the generalized inverse Gaussian distribution with parameters (κ, χ, ψ) [30]. We consider this matrix variate distribution to generate non-normal data as it offers the desired level of asymmetry and leptokurtosis. Let χ = ψ = 3 and
Table 1 summarizes the average (ℓAV) and standard deviation (Std.) of the maximized log-likelihood, together with the frequencies (out of 200 replications) with which each model is chosen on the basis of the largest ℓmax value. The results in Table 1 reveal that the MVMMNE distribution provides a better fit than the other two MVMMN-based models. The outperformance of the MVMMNE distribution becomes more pronounced as the sample size N increases.
To compare the accuracy of the parameter estimates against the true values, the Frobenius (Frob.) norm is adopted. For a given d × m matrix A = [aij], the Frob. norm is defined as the square root of the sum of the squares of its elements, i.e. ‖A‖F = (Σi Σj aij²)^{1/2}. Table 2 shows the average Frob. norm of
and
, where
and
are the ML estimates of the fitted model in the ith replication. It is observed that the Frob. norm decreases as the sample size increases. We can also see that the Frob. norms for Σ and Ψ are very close to each other for all models, while the MVMMNE model has the least accurate estimates of M and Λ.
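The Frob. norm used above is simply the elementwise Euclidean norm; for example (illustrative Python):

```python
import numpy as np

def frob(A):
    """Frobenius norm: square root of the sum of squared elements."""
    A = np.asarray(A, dtype=float)
    return np.sqrt((A ** 2).sum())

A = np.array([[3.0, 0.0],
              [0.0, 4.0]])
print(frob(A))   # 5.0, agrees with np.linalg.norm(A, "fro")
```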
Example 2 Performance of the model under AR(1) dependent structure
In order to investigate the effect of an auto-regressive (AR(1)) dependence structure in Σ and Λ on the parameter estimates, we conduct another Monte Carlo simulation. In this experiment, we set A = 0 and Ψ−1 = I4, and
where λ = 0.5, 2 and ρ = 0.5, 0.8. For generating a random sample from the MVMMN model, the value 0.001 is added to the diagonal elements of Σ to ensure that it is a positive definite matrix.
In each of 200 replications, we generate data from the MVNIG distribution with the true parameter values displayed above and χ = ψ = 3, for sample sizes N = 100 and 1000. By fitting the RMVSN, MVMMNE and MVMMNW distributions to the generated data, the Frob. norms of
and
are obtained. Table 3 summarizes the average Frob. norm of the ML estimates of the fitted models. As expected, the Frob. norm of the parameters decreases as the sample size increases. It can also be observed that the MVMMNW distribution has the smallest Frob. norm of
and
for the selected combinations of λ and ρ.
Example 3 Finite sample properties of the ML estimates
The second simulation study aims at investigating the finite-sample properties of the ML estimators obtained via the ECM algorithm. We consider the situation where Monte Carlo samples of sizes N = 100 and 500 are generated from each of the three special cases of the MVMMN distribution. The presumed parameters for all distributions are the same as those used in Example 1. Fig 1 shows the marginal distributions of the columns, labeled V1, V2, V3 and V4, for the RMVSN, MVMMNE and MVMMNW distributions of a typical dataset of size 100. The solid red line highlights the marginal mean. In each of 1000 replications, the synthetic dataset was fitted with the true generating model via the ECM algorithm. To investigate the estimation accuracy, we calculate the bias and the mean squared error (MSE), defined as
bias = (1/1000) Σk (θ̂(k) − θtrue) and MSE = (1/1000) Σk (θ̂(k) − θtrue)²,
where θ̂(k) denotes the ML estimate of θtrue (a specific parameter) at the kth replication.
The detailed numerical results are reported in Table 4. It can be observed that the bias and MSE for all three special cases of the MVMMN distribution tend toward zero as the sample size increases, empirically confirming the consistency of the ML estimates obtained via the ECM algorithm.
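The bias and MSE criteria of this example can be computed from the replicated estimates as follows (illustrative Python sketch; the paper's own code is in R and available from the first author):

```python
import numpy as np

def bias_mse(estimates, theta_true):
    """Monte Carlo bias and MSE of a scalar parameter over replications:
    bias = mean(theta_hat) - theta_true,
    MSE  = mean((theta_hat - theta_true)**2)."""
    est = np.asarray(estimates, dtype=float)
    bias = est.mean() - theta_true
    mse = np.mean((est - theta_true) ** 2)
    return bias, mse

# Four hypothetical replicated estimates of a parameter with true value 1:
bias, mse = bias_mse([1.1, 0.9, 1.0, 1.2], 1.0)
```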
6 Analysis of Landsat data
To investigate the performance of the developed model in real-world data analysis, we consider the Landsat satellite data (LSD), originally obtained by NASA and available at the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml). Each line of the LSD contains the four spectral values of a nine-pixel neighborhood in a satellite image. In other words, each line of the LSD corresponds to a 4 × 9 matrix of observations. Moreover, each matrix of observations belongs to one of six classes, namely red soil, cotton crop, grey soil, damp grey soil, soil with vegetation stubble, and very damp grey soil. In our analysis, we focus on two classes, red soil and cotton crop, with sizes 461 and 224, respectively, for illustrative purposes.
We fitted the RMVSN, MVMMNE and MVMMNW distributions by implementing the ECM algorithm. Table 5 shows a summary of the ML fitting results, including the parameter estimates, maximized log-likelihood values, and the AIC and BIC of the three fitted models. It is observed that the MVMMNW and MVMMNE distributions outperform the others for the red soil and cotton crop data, respectively. Based on the values of the shape matrix Λ, it is clear that the estimated skewness parameters are moderately to highly significant, showing that the distribution of the matrix observations is skewed. Moreover, the estimated scale matrices Σ and Ψ highlight the covariance structure in the data.
7 Conclusion
This paper has introduced a new family of matrix variate distributions whose component pdfs arise from the mean-mixture of the matrix variate normal model. Some properties and characteristics, as well as three special cases of the new model, are derived. We have developed an EM-based algorithm for calibrating the matrix-type parameters to the data. It is shown that the MVMMN distribution is closed under the formation of marginal and conditional distributions and under affine transformations, which makes it flexible for use in various fields of three-way data analysis, such as multivariate time series, image processing and longitudinal data analysis. Simulation results show that the ML estimates obtained via the ECM algorithm are empirically consistent. Moreover, numerical results from the application to a real dataset reveal that the proposed model is well suited to dealing with skewed matrix variate experimental data.
The utility of our current approach can be extended to accommodate censored data, based on recent work in the multivariate case by [31, 32]. It may also be interesting to propose a family of scale mixtures of MVMMN distributions to deal with heavy-tailed three-way data. Another possible extension of the work herein is to consider finite mixture models based on the MVMMN distribution as promising tools for classifying and clustering heterogeneous matrix-valued asymmetric data [19, 33]. It would also be of interest to use the distributions of the eigenvalues associated with the quadratic form (Theorem 8; in complex form) to compute the channel capacity in wireless communication systems, since experimental data do not necessarily follow a normal distribution (see [34, 35]). All computations were carried out in the R language, and the computer program is available from the first author upon request.
Appendix A: Comparison of contour plots of the MMN and MVMN families
Fig 2 illustrates the contour plots of the bivariate rSN and bivariate exponentiated MMN (MMNE) distributions as special cases of the MMN family, as well as the contour plots of the bivariate generalized hyperbolic skew-t (GHST) and bivariate normal inverse Gaussian (NIG) distributions as special cases of the MVMN family.
Acknowledgments
Our sincere thanks go to the anonymous referees and the Associate Editor, for their comments which led to a considerable improvement on an earlier version of this paper.
References
- 1. Azzalini A. A class of distributions which includes the normal ones. Scandinavian Journal of Statistics. 1985;12(2):171–178.
- 2. Azzalini A, Genton MG. Robust likelihood methods based on the skew-t and related distributions. International Statistical Review. 2008;76(1):106–129.
- 3. Pyne S, Hu X, Wang K, Rossin E, Lin TI, Maier LM, et al. Automated high-dimensional flow cytometric data analysis. Proceedings of the National Academy of Sciences. 2009;106(21):8519–8524.
- 4. Lin TI. Robust mixture modeling using multivariate skew t distributions. Statistics and Computing. 2009;20(3):343–356.
- 5. Cabral CRB, Lachos VH, Prates MO. Multivariate mixture modeling using skew-normal independent distributions. Computational Statistics & Data Analysis. 2012;56(1):126–142.
- 6. Negarestani H, Jamalizadeh A, Shafiei S, Balakrishnan N. Mean mixtures of normal distributions: properties, inference and application. Metrika. 2019;82(4):501–528.
- 7. Ho HJ, Pyne S, Lin TI. Maximum likelihood inference for mixtures of skew Student-t-normal distributions through practical EM-type algorithms. Statistics and Computing. 2012;22(1):287–299.
- 8. McNeil AJ, Frey R, Embrechts P. Quantitative risk management: Concepts, techniques and tools. 2005.
- 9. Roy SN. Some aspects of multivariate analysis. Statistical Publishing Society, Kolkata; 1957.
- 10. Girko V, Gupta A. Multivariate elliptically contoured linear models and some aspects of the theory of random matrices. In: Multidimensional Statistical Analysis and Theory of Random Matrices: Proceedings of the Sixth Eugene Lukacs Symposium, Bowling Green, Ohio, USA, 29–30 March 1996. Walter de Gruyter GmbH & Co KG; 1996. p. 327.
- 11. Anderlucci L, Viroli C. Covariance pattern mixture models for the analysis of multivariate heterogeneous longitudinal data. The Annals of Applied Statistics. 2015;9(2):777–800.
- 12. Kshirsagar AM, Bartlett MS. Some extensions of the multivariate t distribution and the multivariate generalization of the distribution of the regression coefficient. Mathematical Proceedings of the Cambridge Philosophical Society. 1961;57(01):80.
- 13. Bulut YM, Arslan O. Matrix variate slash distribution. Journal of Multivariate Analysis. 2015;137:173–178.
- 14. Chen JT, Gupta AK. Matrix variate skew normal distributions. Statistics. 2005;39(3):247–253.
- 15. Harrar SW, Gupta AK. On matrix variate skew-normal distributions. Statistics. 2008;42(2):179–194.
- 16. Akdemir D, Gupta AK. A matrix variate skew distribution. European Journal of Pure and Applied Mathematics. 2010;3(2):128–140.
- 17. Zheng S, Hardin JM, Gupta AK. The inverse problem of multivariate and matrix-variate skew normal distributions. Statistics. 2012;46(3):361–371.
- 18. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. Journal of the royal statistical society Series B (methodological). 1977;39(1):1–38.
- 19. Gallaugher MPB, McNicholas PD. Finite mixtures of skewed matrix variate distributions. Pattern Recognition. 2018;80:83–93.
- 20. Gallaugher MPB, McNicholas PD. Three skewed matrix variate distributions. Statistics & Probability Letters. 2019;145:103–109.
- 21. Boyd S, Vandenberghe L. Convex optimization. Cambridge University Press; 2004.
- 22. Gupta AK, Nagar DK. Matrix variate distributions. Chapman and Hall/CRC; 1999.
- 23. Singull M, Koski T. On the distribution of matrix quadratic forms. Communications in Statistics—Theory and Methods. 2012;41(18):3403–3415.
- 24. Meng XL, Rubin DB. Maximum likelihood estimation via the ECM algorithm: A general framework. Biometrika. 1993;80(2):267–278.
- 25. Aitken AC. On Bernoulli’s numerical solution of algebraic equations. Proceedings of the Royal Society of Edinburgh. 1927;46:289–305.
- 26. Böhning D, Dietz E, Schaub R, Schlattmann P, Lindsay BG. The distribution of the likelihood ratio for mixtures of densities from the one-parameter exponential family. Annals of the Institute of Statistical Mathematics. 1994;46(2):373–388.
- 27. Lindsay BG. Mixture models: theory, geometry and applications. In: NSF-CBMS Regional Conference Series in Probability and Statistics. JSTOR; 1995. p. i–163.
- 28. Akaike H. Information theory and an extension of the maximum likelihood principle. In: Selected Papers of Hirotugu Akaike. Springer; 1998. p. 199–213.
- 29. Schwarz G, et al. Estimating the dimension of a model. The annals of statistics. 1978;6(2):461–464.
- 30. Good IJ. The population frequencies of species and the estimation of population parameters. Biometrika. 1953;40:237–260.
- 31. Wang WL, Lin TI, Lachos VH. Extending multivariate-t linear mixed models for multiple longitudinal data with censored responses and heavy tails. Statistical Methods in Medical Research. 2015;27(1):48–64. pmid:26668091
- 32. Lin TI, Lachos VH, Wang WL. Multivariate longitudinal data analysis with censored and intermittent missing responses. Statistics in Medicine. 2018;37(19):2822–2835.
- 33. Melnykov V, Zhu X. On model-based clustering of skewed matrix data. Journal of Multivariate Analysis. 2018;167:181–194.
- 34. Ratnarajah T, Vaillancourt R. Quadratic forms on complex random matrices and multiple-antenna systems. IEEE Transactions on Information Theory. 2005;51(8):2976–2984.
- 35. Bekker A, Ferreira J. Bivariate gamma type distributions for modeling wireless performance metrics. Statistics, Optimization & Information Computing. 2018;6(3).