Multivariate Longitudinal Analysis with Bivariate Correlation Test

In the context of multivariate multilevel data analysis, this paper focuses on the multivariate linear mixed-effects model, including all the correlations between the random effects when the dimensional residual terms are assumed uncorrelated. Using the EM algorithm, we suggest more general expressions of the model’s parameters estimators. These estimators can be used in the framework of the multivariate longitudinal data analysis as well as in the more general context of the analysis of multivariate multilevel data. By using a likelihood ratio test, we test the significance of the correlations between the random effects of two dependent variables of the model, in order to investigate whether or not it is useful to model these dependent variables jointly. Simulation studies are done to assess both the parameter recovery performance of the EM estimators and the power of the test. Using two empirical data sets which are of longitudinal multivariate type and multivariate multilevel type, respectively, the usefulness of the test is illustrated.


Introduction
In statistical studies, one often needs to analyze data with nested sources of variability: e.g., pupils in classes, employees in companies, repeated measurements in subjects, etc. [1] referred to these type of data as grouped data which are also named multilevel data, hierarchical data or nested data in the literature [2][3][4]. In the analysis of such data, it is usually illuminating to take account of the variability associated with each level of nesting. There is variability, e.g., between pupils but also between classes. The measurements related to a specific subject (level of nesting) can be correlated, while observations from different subjects are usually independent, and one may draw wrong conclusions if either of these sources of variability is ignored [5]. A series of works in statistical literature focus on the analysis of univariate multilevel data (or univariate grouped data) where a single outcome of interest is analyzed [6][7][8][9][10][11]. Such analyses are generally simple to deal with due to the availability of many software packages conceived to perform them [12][13][14]. In practice, many scientific questions of interest require to focus on multiple outcomes, all arising from the same multilevel study, leading to the so-called multivariate multilevel data. For example, to answer some questions of interest, [15] analyzed hearing threshold data (in the Baltimore Longitudinal Study on Aging) [16] which consisted in the longitudinal recording of 22 variables. [17] also studied the joint evolution of HIV RNA and CD4+ T lymphocytes in a cohort of HIV-1 infected patients treated with highly active antiretroviral treatment, by jointly analyzing both markers. [18] used multivariate multilevel regression analysis to investigate individual level determinants of self rated health and happiness, as well as the extent of community level covariation in health and happiness. [19] also used multivariate multilevel analysis to jointly model three commonly used indicators of fear of crime which are: feeling unsafe alone at home after dark, feeling unsafe walking alone after dark and worry about becoming a victim of crime. A variety of works were devoted to joint modeling during the last few decades (see e.g., [20][21][22][23][24]).
These analyses often require a specification of the joint density of all outcomes or, at least, the correlation structure of the data and therefore can lead to the parsimony and/or computation (optimization) problems as well as to numerical difficulties in statistical inference, when the dimension of these outcomes increases. Many analysis strategies were proposed in the statistical literature to circumvent these problems. These strategies generally consist in reducing the dimensionality of the multivariate vector of outcomes and/or in using a small number of latent variables to model correlations within these data. Joint analysis of multivariate multilevel data then requires a trade-off between the increase of the computational complexity and the gain in information.
In this work, we focus on the multivariate linear mixed-effects model, including all the correlations between the random effects along with the independent marginal (dimensional) residuals. The correlations between two dependent variables are then those from the random effects related to these dependent variables. The class of mixed-effects models considered here assumes that both the random effects and the errors (residuals) follow Gaussian distributions. These models are intended for the analysis of multivariate multilevel data in which the dependent variables are continuous.
We use the EM algorithm to estimate the parameters of the model but here, we have two novelties: 1) we suggest a general expression of EM-based estimators which can help in analyzing multivariate longitudinal data as well as the multivariate multilevel data, not of the longitudinal type, and 2) we test the significance of the correlations between the random effects of two dependent variables, using the likelihood ratio test which allows to decide if some dependent variables are significantly correlated or not. By using this bivariate correlation test, the novelty here is the illustration, through empirical data, of some of the consequences of performing separate analyses when a joint analysis is required. Two dependent variables which are found to be uncorrelated after this test will be analyzed with two independent models (or analyzed separately). This strategy may be considered as a way toward the obtaining of a more parsimonious model in high dimension without losing much information. It may also be used in a joint modeling selection procedure.
The paper is organized as follows. In Section 2, contributions of previous works are briefly presented. We also present in this section the EM-based estimators of the parameters of the multivariate linear mixed model. Simulations studies are done in Section 3 where we also discuss the power of the likelihood ratio test which allows to test the significance of the correlation between two response variables. Two illustrations on empirical data are also done in Section 3. The first, concerning bivariate two-level data, is about a study on the effects of school differences on pupils' progress in Dutch language and arithmetics in the Netherlands. The second illustration concerns a longitudinal study on the immune response to malaria of infants in Benin.

Previous works
In this Section, we briefly recall the framework of the multivariate multilevel analysis (see for instance, [25,26]). We can basically distinguish two main approaches to model such data: those which specify the joint distribution of all outcomes without the use of latent structures, and the models using latent structures. We denote by y 1 , …, y m the m dependent vectors of interest, and y ¼ ðy > 1 ; Á Á Á ; y > m Þ > .
Modeling methods without latent structures. The first approach which is that of the modeling without latent structures comprises three sub-approaches consisting in a) direct specification of the correlation structure of y, b) analysis without explicit modeling of the correlation structure of y and c) conditional models.
In the case of direct specification of Cov(y), [27] and [28] factorized the covariance matrix of y by using the Kronecker product in order to have more parsimonious models in the context of fully balanced data. With the same idea of having a parsimonious structure, [29] specified the intra-outcome and inter-outcome correlations, respectively, as follows: Corrðy k ðtÞ; y k ðsÞÞ ¼ exp ðajs À tj y k Þ and Corrðy k ðtÞ; y k 0 ðsÞÞ ¼ exp ððajs À tj þ 1Þ y kk 0 Þ, with t and s indicating the time, and k and k 0 indicating the dimension. Although these models are useful, they are often too restrictive and may not be realistic in many applications, especially when the data, for example, in the longitudinal studies are unbalanced (i.e. the number of available measurements per subject and the time points at which the measurements were taken often differ from one subject to another). Another class of joint models, specifying directly the joint distribution of y, and whose application is often not straightforward, due to unbalanced data structures is the so-called copula model [30,31]. Denoting by F i , i = 1, …, m the cumulative distribution function of the ith component of y, y i , a copula model is defined by an mdimensional cumulative distribution function C(u 1 , …, u m ) with uniform marginals such that F(y 1 , …, y m ) = C(u 1 , …, u m ) with (U 1 , …, U m ) = (F 1 (y 1 ), …, F m (y m )), where F is the joint cumulative distribution of y. That is, the joint distribution of y can be written in terms of its marginal distributions and a copula which describes the dependence structure between its components. While the construction of copulas is mathematically elegant, parameters estimation is often not feasible, especially in high-dimensional situations [26]. One of the rare applications of the copula-based modeling in the multivariate multilevel data analysis framework was proposed by [32], who studied the hemodynamic effect of a new antidepressant on the diastolic blood pressure, the systolic blood pressure and the heart rate of 10 healthy volunteers. They separately modeled, at first, each longitudinal series of response and used a copula to relate the marginal distributions of these responses at each observation time. In a second step, at each observation time, the conditional (on the past) distributions of each response were related using another copula describing the relationship between the corresponding variables. One of the advantages of this approach is that there is no need to use the same family of distributions for all response variables. As [33] used ARIMA process to model the error structure of earnings in a longitudinal data analysis context, time series models can also be used for modeling multivariate multilevel data in order to describe the dynamic dependence between variables and perform forecasting. The most commonly used multivariate time series model, the vector autoregressive (VAR) model which is relatively easy to estimate, is found to be similar to the multivariate multiple linear regression [34] where the errors for different response variables on the same trial are set to be correlated [35]. Other examples of VAR modeling include [36] and [37], but one drawback of the model is that the number of parameters can become very large, potentially leading to estimation problems [38].
Regarding analysis without explicit specification of Cov(y), [6] proposed an extension of generalized linear models to the analysis of longitudinal data, where they introduced a class of estimating equations called generalized estimating equations (GEE). GEE estimation ensures consistent estimates of the regression parameters without specifying the joint distribution of a subject's observations. That is, GEE replaces V½y by the so-called working covariance matrix W(α) which depends on an unknown vector α to estimate. The related working correlation matrix, R(α), is also considered. Incorrect choice of W(α) does not affect the consistency of the regression parameters' estimators [6]. [39] discussed the use of GEEs with multivariate discrete variables, where focus was on the modeling of the marginal (dimensional) means of these variables and their pairwise associations. The extension of the GEE method to mixed continuousdiscrete responses was discussed by [40] and [41]. [42] also avoided the need of explicit modeling of the covariance structure of bivariate longitudinal responses by using SUR [5] and GEE. As pointed out by [43], ambiguities concerning the definition of the working covariance matrix can result in a breakdown of the GEE-based estimation. For example in the longitudinal data analysis, if the true structure of correlation is equicorrelation, ðR Ã i Þ jk ¼ r, and that the working structure is autoregressive, (R i ) jk = α |j−k| , there is no solution forâ when −1/2 ρ < −1/3 [43]. This can be viewed as the major drawback of the GEE method since it can lead to the misspecification of within-subject associations in the context of longitudinal data analysis, for instance. Examples of procedures which bypass the need to explicitly model the underlying covariance structure of y include [42,44,45]. These procedures, generally, consist in regressing each component of y on relevant covariates of interest, followed by combination of these regression coefficients into a single global estimate of the covariates effect [25].
One way to avoid the direct specification of the joint distribution of y is to factorize it, leading to the so-called conditional models [46]. For two responses, the joint density f(y 1 , y 2 ) can be written as follows: f ðy 1 ; y 2 Þ ¼ f ðy 1 jy 2 Þf ðy 2 Þ ¼ f ðy 2 jy 1 Þf ðy 1 Þ ð 1Þ The choice of the conditioning response is of course arbitrary and requires very careful reflection about plausible associations between components of y. For example, in the specification of a conditional model such as f(y 1 |y 2 ), y 2 plays the role of covariate and different choices can lead to completely opposite results and conclusions [47]. In a clinical trial, for example, none of these factorizations will be of interest due to the conditioning on a post-randomization outcome which may partially attenuate the treatment effect on the other [26]. Another drawback of conditional models is that they do not directly lead to marginal inferences. Suppose that scientific interest would be in a comparison of the rate of longitudinal change in average of y 1 and y 2 . The factorization f(y 1 , y 2 ) = f(y 1 |y 2 )f(y 2 ) directly allows for inferences about the marginal evolution of y 2 , but the marginal expectation of y 1 requires computation of E½y 1 ¼ E½E½y 1 jy 2 , which, depending on the actual models, may be far from straightforward [26].
Modeling methods using latent structures. The second approach regarding models using latent structures can also be split in two sub-approaches including the strategy based on the reduction of the dimensionality of y and the mixed-effect models. The general idea of reducing the dimension of y is to use principal-component type analysis, or a summary function, to first reduce the dimensionality of y and then, use standard univariate multilevel models for the analysis of the principal factors or the retained summaries of y [48][49][50][51][52]. Although it is useful, simple to understand and easy to compute, this strategy of dimension reduction has some drawbacks such as the loss of information as discussed by [25] and [26]. [25] used this approach and retained the first principal-component only which explains 31% of the total variation in their data. They found out that the summary function does not have any physical significance and the inference results cannot be interpreted in terms of the effect of the covariates on the original (response) variables. They also found that the method fails to explore the association of the components of y along time, in the case of longitudinal studies. Furthermore, the method is not applicable in situations where all the components of y are not measured at the same time point [25], although a possible extension might be the use of functional principal components [53].
Regarding the mixed-effect models, [15], [54], [55], [56] and [57] proposed the use of random-effects models for multivariate longitudinal data. They pointed out that the main disadvantage of joining separate mixed models by allowing their model-specific random effects to be correlated is the increase of the dimension of the total vector of random effects with the number of outcomes, leading to computational problems. To circumvent these problems, [15] noted that all parameters in the joint model can be estimated by fitting all the bivariate models, based on f ðy s ; y t Þ ¼

Z Z
f ðy s jg s Þf ðy t jg t Þf ðg s ; g t Þdg s dg t for all m(m − 1)/2 pairs (y s , y t ), 1 s < t m, resulting from the main multivariate model. Estimators for the main parameters are obtained by averaging over the results from fitting the m(m − 1)/2 pairwise models. They then showed that the pseudo-likelihood theory can be used to derive the asymptotic distribution of these estimators, and used SAS procedures for mixed models [14] based on the Newton-Raphson algorithm to fit their models, following the approach in [17]. In some multilevel studies, focus is not to directly model y, but a few number of latent variables which cannot be quantified directly (e.g., depression and anxiety), but through measurements of y. In such situations, analysis may be conducted in two steps: the first produces the obtaining of the latent variables and the second proceeds to the joint analysis of these latent variables. For example, [58] proposed a latent factor linear mixed model to capture the joint trend over time of latent variables. the authors reduced, indeed, the high-dimensional responses to low-dimensional latent factors by the factor analysis model, and then used the multivariate linear mixed model to study the longitudinal trends of these latent factors, where the estimates have been done using the EM algorithm. To deal with missing values in multivariate longitudinal analysis using multivariate linear mixed-effects model, [59] proposed multiple imputations using Markov chain Monte Carlo, where they used EM algorithm for the parameters estimation. Here, the authors sped up the EM algorithm by analytically integrating the random effects out of the likelihood function, avoiding to treat them as missing data. [60] used EM based modeling to estimate the parameters of the multivariate linear mixed model under a SAS macro program encoded in IML. Although the EM algorithm is known to be slow, one of the biggest advantages of this method is that it is not computationally expensive, even with a large number of response variables. In this context, our contribution is the writing of the EM-based estimators in a more general form than those used in [58,59] and [60]. The expressions of the EM-based estimators used in this paper can easily perform any analysis in the framework of the multivariate multilevel data analysis using multivariate linear mixed-effects model.
Another technique somewhat close to those discussed in [58] is the structural equationsbased techniques. For example, [61] developed linear structural equations with latent variables approach. Considering y ¼ ðy > 1 ; y > 2 Þ > , this approach can be expressed as follows: y i = μ i + G i η i ; i = 1, 2 and βη 1 = γη 2 , where η i , i = 1, 2 are the latent variables, β(m × m) and γ(m × n) are coefficient matrices governing the linear relations of all variables involved in the m structural equations. G i , i = 1, 2 are known matrices. The parameters of the model may be estimated by gradient and quasi-Newton methods, or a Gauss-Newton algorithm that obtains least-squares, generalized least-squares, or maximum likelihood estimates. One modeling strategy which fuses together mixed-effects model and VAR model in order to analyze multivariate multilevel data is the so-called multilevel-VAR method. For example, [62] used the multilevel-VAR model in the context of network inference in psychopathology, where they used the population standard deviation of the person-specific random effects to construct a network representing individual variability. Examples of multilevel-VAR modeling include [63] and [38]. State space models [64] which are useful to investigate the dynamical properties of latent variables can also be used to analyze multivariate multilevel data. For example, [65] introduced an extension of the basic state space model which is flexible and general in the sense of it is applicable to any time series for multiple systems.
Methods for estimating the connectivity maps containing heterogeneity may also be applied to analyze multivariate multilevel data. [66] presented the Group Iterative Multiple Model Estimation (GIMME) approach, which addresses the issue of heterogeneity (the need for individual-level maps) in effective connectivity mapping while capitalizing on shared information to arrive at group inferences. Unlike mixed-effects models, GIMME allows for the structure of the connectivity maps to be unique across individuals [66].
One can also use a nonparametric function f to handle the relationship between the components of y and the covariates [67][68][69]. This strategy requires also to have sufficient data per subject, in the case of multivariate longitudinal data. Other estimation strategies implemented under softwares and discussed by [70] can perhaps be extended to the multivariate analysis case, when necessary.
Let us finally point out that the software packages which can easily and accurately analyze (jointly) the data of multivariate multilevel type are extremely rare, and one arranges the data and manipulates packages primarily designed for fitting univariate models to handle their analysis. The SabreR [71] package, under the R software [72], which has been devoted to jointly fitting up to three mixed-effects models, with random intercepts only, has been recently removed from the depot. These facts prove by themselves that the analysis of multivariate multilevel data in a single framework is a challenging task. Bayesian-based approaches can be implemented using packages like R2WinBUGS [73] under the R software, and are useful but very time consuming and require a good expertise from the user who can easily be discouraged.

Model and notations
The model discussed here is the multivariate linear mixed-effects model (or the multivariate linear multilevel model), including all the correlations between the random effects, but the marginal residual terms are assumed to be uncorrelated. For a more general multivariate linear mixed-effects model, the dependent variables are assumed to be correlated, conditional on the random-effects. That is, the marginal residual terms are correlated. In this paper, as in many other works (see for example, [59,60,74] and [58]), we assume that conditional on the random-effects, the dependent variables are uncorrelated. In the context of using EM algorithm in estimating the model parameters, this assumption allows to derive the EM-based estimators for the residual variance parameters. If the dimensional residual terms are assumed to be correlated, the EM-based estimators of theirs variance parameters are not easy to deal with and we don't treat this case here. This model assumes that both the random effects and the residuals follow Gaussian distribution, and is intended for the analysis of multivariate multilevel data in which the dependent variables are continuous. For the sake of simplicity we focus on the bivariate case (m = 2) in most of the paper, but the generalization to higher dimensions (m > 2) is straightforward. The model is as follows: For k 2 {1, 2}, β k and γ k denote respectively the fixed effects and the random effects vector of covariates, while ε k is the residual component. X k is a matrix of covariates and Z k a covariates-based design matrix. dim(X k ) = N k × p k and dim(Z k ) = N k × q k , where N k is the total number of observations in the dimension k of the model. p k and q k are, respectively, the number of fixed effect related covariates and the number of random effect related covariates in the dimension k of the model. If N k is a constant N for any k, the index k will be removed and N will denote the total number of observations in all dimensions of the model. The bold symbols represent parameters of multiple dimensions (i.e. S 1 concerns dimension 1 of the model while S concerns both dimensions).
Another way to easily understand the model is to express it using the levels of the covariate related to the random-effects. This expression (subject-based version) of the model is, generally, used in the framework of longitudinal data analysis, and lead to EM-based estimators (expressions) which are a particular case of the estimators expressions obtained in Eqs (17), (18) and (19) (for example, see [60]). Denoting by n the total number of subjects involved in the longitudinal study, the model can be expressed as follows: denoting by i a subject, for i = 1, …, n with and N 1i and N 2i are the dimensions of y 1i and y 2i , respectively. Here, we assume that the marginal residuals are homoscedastic (V½ε ki ¼ s 2 k I N ki ; k ¼ 1; 2), but the residual covariance matrices can be of full form as in Eq (4). In order to make clear the relation between the model described by Eqs (2), (3) and (4), and its version expressed by Eqs (5), (6) and (7), we propose below a detailed example.
Detailed example. We place ourselves in the case of longitudinal data where we observe two response variables y 1 and y 2 which are respectively the weight (kg) and the size (cm) of infants according to the score (V 2 ) of the quality of their food as well as the quality score (V 1 ) of their mothers' food. Infants are n = 3 girls (sex = F) and boys (sex = M) who are monitored over time. The dataset is presented by Table 1.
Suppose that the model at each of two dimensions has one random intercept by subject (infant) and one random slope by subject in the direction of the infant's age (in months). For example, considering an identifiability constraint covering the sex variable whose level F is the reference, the bivariate linear mixed model can be written as follows: where, explicitly     In the present example we have dim(X 1 ) 6 ¼ dim(X 2 ) and dim(Z 1 ) 6 ¼ dim(Z 2 ) due to the presence of the NA (Not Available) within the values of the variable V 1 . Removing information related to this NA in the dimension 1 of the model does not affect its dimension 2.
Referring to the subject-based version of the model, Then,

EM estimation
Let θ be the vector of unknown parameters in β 1 , β 2 , Γ, S 1 , S 2 . The EM algorithm requires an initial value of θ and some expressions (estimators) to update until convergence. In the next two subsections we provide these estimators, their initial values and the stopping criterion.

EM-based estimators of parameters
Theorem 1. Suppose that y ¼ ðy > 1 ; y > 2 Þ > satisfies the model based on Eqs (2), (3) and (4) and θ the vector of its unknown parameters while θ old is the previous value of θ provided by the EM algorithm. Let f(y,γ|θ) be the joint density function of y and γ given θ, and Qðyjy old Þ ¼ E½ log f ðy; gjyÞjy; y old . Let M be the mapping y old 7 ! Mðy old Þ ¼ŷ such that: Then, the EM-based estimator of θ, i.e.ŷ, is expressed through: Γ ¼ V½gjy; y old þ E½gjy; y old E½gjy; y old > ; ð18Þ where, E½g k jy; y old ¼ Covðg k ; yjy old ÞV½yjy old À1 ðy À E½yjy old Þ; V½g k jy; y old ¼ Γ k À Covðg k ; yjy old ÞV½yjy old À1 Covðg k ; yjy old Þ > and Covðg 1 ; yjy old Þ ¼ proof. For k 2 {1, …, m},b k ,Ŝ k andΓ optimize the quantity: where f(y,γ|θ) is the joint density function of the observed data y and the random effect γ. In the case of m = 2, we have: Since f is a multivariate Gaussian, using the dominated convergence theorem and the derivative under the integral sign, the differential of Q(θ|θ old ) yields: Partial derivatives of Q(θ|θ old ) yield: for k 2 {1, …, m}, k E½ðy k À X k b k À Z k g k Þ > ðy k À X k b k À Z k g k Þjy; y old S À1 k and @Qðyjy old Þ @Γ ¼ 1 2 Γ À1 E½gg > jy; y old Γ À1 À Γ À1 À Á : We then get EM-based estimators by setting these partial derivatives equal to zero. E½g k jy; y old and V½g k jy; y old are straightforward to get since ðg > k ; y > Þ > is a multivariate Gaussian.
Initialization and stopping criterion of the algorithm Various ways exist for obtaining starting values forb k ,Γ,Ŝ k ; for k ¼ 1; Á Á Á ; m. Taking inspiration from [75] and [60], we have separately fitted each dimension of the model by using the lme4 package [76] under the R software and have used marginal estimated parameters to initializeb k andŜ k . We then keep the expected random effectsg ¼Ê½gjy to initializeΓ by The stopping criterion is related to the relative error of the components of θ as follows: where (r) is the iteration index and θ j the jth component of θ. tol = 10 −5 seems to work well in practice.

Results and Discussion
Simulation studies In this section, simulation studies are used to investigate the computational properties of the EM-based estimators. For the sake of simplicity, these simulation studies are conducted using simulated bivariate longitudinal data sets. Through these studies, we pursue two objectives: the first is to assess the accuracy of parameter estimates and the second is to analyze the power of the likelihood ratio test performed via these EM-based estimators. In the following paragraph, we explain how we choose the parameters that have been used to simulate the working longitudinal data sets.
The working data sets. We suppose that we are following up a sample of subjects where the goal is to evaluate how the growth of the weight and the height of the individuals of this population are jointly explained by the sex, the score of nutrition (Nscore) and the age. We randomly choose through a uniform distribution the score of nutrition between 20 and 50, and the age between 18 and 37, using the R software. All the analysis in this paper are done using the R software. The subject's sex is also randomly chosen. The model under which we simulate the data sets is the following: n indicating the total number of subjects, for i = 1, …, n with The random effect related to the dependent variable 'weight' or 'height' is a vector composed by one random intercept and one random slope in the direction of the covariate 'Nscore'. The total number of observations is denoted by N. We randomly choose β 1 , β 2 , σ 1 and σ 2 whose values are in the first column of Table 2. Γ is also randomly chosen such that it is positive definite, with the following form: The covariance between the random effects γ 1 and γ 2 is set, intentionally, in order to be able to decrease or increase the correlation between the marginal random effects γ 1 and γ 2 , by changing the value of ρ, without losing the positive definiteness of Γ. This property of Γ will be used to assess the power of the likelihood ratio test through simulations, by changing the value of ρ. We simulate 1000 data sets with ρ = 0.8, in order to assess the accuracy of estimates using the EM-based estimators. With ρ = 0. 8 Empirical accuracy of the estimates. The 1000 data sets simulated in order to assess the accuracy of the estimates performed using the EM-based estimators contain N = 5000 observations provided by n = 300 independent subjects.
The mean and the standard deviation of the 1000 estimates are presented, respectively, in the second and the third column of the Table 2. The bias of the parameter estimates, which is the absolute difference between the true value of the parameter and the mean of the 1000 estimates, is calculated as measure of performance. These bias are contained in the forth column of the Table 2.
The bias contained in the estimates of β k and σ k ranges from 0.000 to 0.063 (Table 2), and the bias contained in the estimates of the entries of Γ ranges from 0.001 to 1.881 (Eq 37). These results show thatb k andŝ k (i.e.Ŝ k ) seem unbiased when Γ is biased. The estimates of Γ appear to be poorer than the estimates of all other parameters. In order to investigate which entries of Γ are particularly poorly estimated, we calculate the coefficients of variation (CV) of these entries. The CV computed here is obtained by dividing the standard deviation of the estimates by the true value of each entry of Γ. The CVs give an idea of the variability of estimates around the true values and enable to compare these variabilities between them. A particularly large value of CV could lead us to suspect that the corresponding input is particularly poorly estimated. Here, the CV ranges from 0.08 to 0.19, and is represented by the Fig 1 for more visibility. Given these CV values, it seems that none of the entries of Γ is particularly poorly estimated.
Deep investigation on the estimates' accuracy. Here, we compute the Mean Square Error (MSE) of the EM-based estimators with N = 600,1000 and N = 3000 across n = 50, 60, 100 and 300 to investigate how both values of n and N affect the quality of the estimates. For each value of n and N, we simulate 1000 data sets on which we estimate the model parameters and compute the MSE of these estimates.
Without surprise, Table 3 shows that the quality of estimates is clearly improved when both n and N grow. Estimations performed on dataset containing N = 3000 observations are more accurate than those performed with N = 600, observing the maximum value of the MSE in each case. For N = 600, information contained in Table 3 shows that the MSE related to n = 60 (60 subjects) are better than those related to n = 300. This result suggests a good tradeoff between the number of subjects and the total number of observations in order to have accurate estimates, especially if the number of observations is not very high. Once again, it appears thatΓ (Table 3) has the highest MSE for all values of n and N.
The bivariate likelihood ratio test. Considering the random effects covariance matrix Γ (see Eq (35)), the related correlation matrix is That is, the matrix of the correlations between the marginal random effects (i.e., the random effects related to the two dependent variables) is If we decide to test H 0 : Cor(γ 1 , γ 2 ) = 0 against H 1 : Cor(γ 1 , γ 2 ) 6 ¼ 0 in the case of these simulated data, we must know the distribution of the LR statistic S. In order to approximate the distribution of S, under H 0 , we proceed to an extensive simulation study in the next paragraph.
Empirical distribution of S under H 0 . In this paragraph, our goal is to investigate about the empirical law of the LR statistic S, under H 0 , when the size N of the data set increases. The simulated data sets used in this paragraph are also of bivariate longitudinal type, with N the total number of observations coming from n subjects. We choose N as an arithmetic sequence ranging from 50 to 2000, where the common difference is 50. We choose n = N/5 as it is sufficient to have two observations per subject for fitting the model. When N/n = 1, the randomeffects parameters and the residual variance are unidentifiable [1].
The expected (standard) asymptotic distribution of S, under H 0 , is a χ 2 (4). This may be explained by the fact that Cov(γ 1 , γ 2 ) and its transpose, Cov(γ 1 , γ 2 ) > , contain four entries, respectively, and Γ contains Cov(γ 1 , γ 2 ) and Cov(γ 1 , γ 2 ) > . Therefore, the difference between the number of entries of Γ which need to be estimated with Lðyjdata; H 0 Þ and Lðyjdata; H 1 Þ, respectively, is df = 4. Precisely, the parameters of interest are ρ η 1 τ1 , ρ η 1 τ2 , ρ η 2 τ1 and ρ η 2 τ2 (see Eq (14)). Fig 2 assumes an asymptotic distribution of χ 2 (4) and plots the Kolmogorov-Smirnov test's p-value (at log10 scale) against the total number of observations of the data set that has served to compute the LR statistic S. The blue curve is obtained by applying the empirical Bartlett correction to S and the red curve is obtained without correction. The horizontal dashed line represents log10(0.05). The empirical Bartlett corrected S, sayŜ B , can be expressed aŝ S B ¼ df Â S=Ê½SjH 0 . This Bartlett correction is applied in order to avoid the small size distortion of the χ 2 (df) distribution, when performing the LR test using a data set of small size [78]. Fig 2 thus helps to investigate how the LR distribution performs in finite and small dimension. It also helps to investigate, in the case of this bivariate correlation test, how the Bartlett correction helps to avoid the small size distortion of the chi-square approximation. As the total number of observations increases, the curves (red and blue) reach the dashed line, gradually. Assuming the χ 2 (4) distribution of S, it seems important to work with a data set containing at least 500 observations coming from at least 2 subjects, and to apply the Bartlett correction in order to avoid the breakdown of the procedure.
The type I error is generally controlled by the significance level of 10% (red and blue curves of Fig 3). It is clear that the control is almost full with the Bartlett correction (blue curve of Fig 3).
By simulating 1000 × 3000 realizations of χ 2 (4) distribution, we plot the red sheath represented in Fig 4. This sheath corresponds to the minimum and the maximum of the simulated χ 2 (4) realizations. The blue curve inside the red sheath represents the empirical LR statistics obtained from the 3000 simulated data sets under H 0 . This figure (Fig 4) shows that the asymptotic distribution of LR statistic related to the bivariate correlation test is not violated, since the blue curve does not go out of the red sheath.
Empirical power of the bivariate correlation test. In order to analyze the power of this likelihood ratio test performed with EM-based estimates, we calculate S on data sets which subjects. An asymptotic distribution of χ 2 (4) is assumed and the type I error (at log10 scale) is ploted against the total number of observations of the data set that has served to compute the LR statistic S. The blue curve is obtained by applying the empirical Bartlett correction to S and the red curve is obtained without correction. The horizontal dashed lines represent the significance levels of 5% and 10%, respectively. doi:10.1371/journal.pone.0159649.g003 have been simulated under H 0 and H 1 , respectively, leading to what we named S0 and S1 vectors containing the resulting values of S. We then plot a ROC curve with S0 and S1, where S0 is the vector of the cases while S1 contains the controls. We calculate S0 and S1 in different situations where we have changed the value of ρ in the following configuration: We maintain fixed η 1 = 5.27, η 2 = 6.00, τ 1 = 9.89, τ 2 = 1.17 and change ρ (2 {0.1, 0.2, 0.3, …, 0.9}). The number of subjects (n) and the total number of observations (N) have also been modified throughout these simulation studies. In each case, the estimated Area Under Curve (AUC) of the ROC curve with its confidence interval have been recorded to produce Fig 5. With n = 50 subjects, we detect, indeed, a correlation of 0.6 when the total number of observations is N = 3000; in contrast, if the total number of observations is N = 600, we perfectly detect a correlation of 0.7. In the case where estimates are of a higher quality (because they are performed on data sets having a sufficient number of observations N = 5000 and subjects n = 300), we plot ROC curves with low values of ρ (0.1, 0.2 and 0.3). We then show in Fig 6, the estimated AUC and its 95% confidence interval. Fig 6 shows that EM-based estimators lead to a good power of the bivariate correlation test, when we have a sufficient number of observations and subjects in the longitudinal study case. This goodness of the power of the bivariate correlation test persists when the correlation between marginal random effects is low (about 0.2).

Applications on real data sets
In this section we analyze two data sets by using the likelihood ratio test through the EM-based estimators presented above. The first dataset is of multivariate multilevel type and the second is, specifically, of longitudinal multivariate type.
Application to education data in the Netherlands. The data used here are named 'bdf' under the package nlme [13] of the R software. These data contain N = 3776 Grade eight students (aged about eleven years) in n = 208 elementary schools in the Netherlands [79]. These pupils were tested twice (with an interval of one year between grade seven and grade eight) on their proficiency in Dutch language and arithmetics, where the goal was to investigate which characteristics of schools can account for the differences in the effectiveness of schools with regard to pupils' progress in language and arithmetics. Most of the previous analyses of this dataset were concerned with investigating how the language test score depends on the pupil's intelligence, his family's socio-economic status and on related class or school variables. By fitting two independent (separate) models, [79] found that variables in Table 4 have a significant effect on post-test scores (language post-test and arithmetic post-test). These variables are: socio-economic status, intelligence score, age, gender and nationality. They also found a significant random slope related to the language pre-test and to the gender in the language post-test model.
Based on these results from [79] and some of their data (n = 131 schools, N = 2287 pupils; age and ethnicity are not present), we have fitted the bivariate linear mixed-effect model where post-test scores are the response variables and covariates are the pre-test scores, socio-economic status, intelligence score, gender and minority (a factor indicating if the pupil is a member of a minority group). Random intercepts and random slopes related to pre-test scores are integrated to the model on the school level in the configuration shown by the Table 5. Table 6 contains estimated fixed effects and residual standard deviations of the model.
The null hypothesis, H 0 , that the arithmetic post-test score and the language post-test score are independent, is rejected with a p-value of 1.436 × 10 −7 . This result justifies a joint analysis of post-scores conditionally on the covariates present in the model and is therefore a supplementary information obtained from the data. The estimated correlation matrix of the random effects is:r  Table 6 shows that covariates which are significant in the independent models are also significant in the joint model. The Minority covariate is neither significant in the joint model, nor significant in the independent models.Γ ij andr ij identify the item which is at the intersection of the row i and the column j of the matricesΓ andr, respectively. These matrices are filled from the top to the bottom in the order of (Intercept y 1 , Slope y 1 , Intercept y 2 , Slope y 2 ).
There is a clear inter-school variability with respect to the post-test scores (Γ 11 ¼ 15:15 and Γ 33 ¼ 30:69). Everything else being equal, schools that have good scores in arithmetics also have good scores in language (r 31 ¼ 0:84). The schools in which the differential effect of the pre-test score in arithmetics on the post-test score is strongly negative are in average above the average post-test score in language (r 41 ¼ À0:82); same as with the language pre-test score (r 32 ¼ À0: 84). This confirms that the scores in language and in arithmetics vary in the same direction in schools. The schools in which the differential effect of the pre-test score (in arithmetics or language) on the post-test score is strongly negative are in average above the average post score (r 21 ¼ À1 andr 43 ¼ À1), and vice versa. These schools have strived to bring the level of all pupils above the average. In contrast, pupils with a good initial level maintain their level without becoming excellent. The differential effect of the pre-test score has a very weak variability (arithmetics score:Γ 22 ¼ 0:02; language score:Γ 44 ¼ 0:01) and this implies that the pre-test score explains about 0.15% (in arithmetics) and 0.03% (in language) of inter-school variability of post-test scores.
We have fitted the bivariate model without random slopes (with random intercept only) to investigate if it fits more to the data than the model with random slopes, due to the weak variability of these random slopes. The results are presented in Table 7, where the estimated fixed effects and their significance generally remain the same.
The estimated covariance matrix, related to results contained in Table 7 of random effects iŝ A clear difference appears between these two distributions. We notice the same difference between distributions of random intercepts and slopes as shown in Fig 7c and 7d as well as the joint distribution of random slopes in Fig 7e and 7f. The joint bivariate model seems to fit more to the present data and we retain it for their analysis.
Application to malaria immune response data in Benin. The data come from a study which was conducted in nine villages (Avamé centre, Gbédjougo, Houngo, Anavié, Dohinoko, Gbétaga, Tori Cada Centre, Zébè and Zoungoudo) of Tori Bossito area (Southern Benin), where P. falciparum is the most common species in the study area (95%) [80] from June 2007 to January 2010. The aim of this study was to evaluate the determinants of malaria incidence in the first months of life of child in Benin. Details of the follow-up procedures have been published elsewhere [81].
Data description. Mothers (n = 620) were enrolled at delivery and their newborns were actively followed-up during the first year of life. One questionnaire was conducted to gather information on women's characteristics (age, parity, use of Intermittent Preventive Treatment during pregnancy (IPTp) and bed net possession) and on the course of their current pregnancy. Maternal peripheral blood as well as cord blood were collected into Vacutainer 1 EDTA (Ethylene diaminetetraacetic acid) tubes. At birth, newborn's weight and length were measured by midwives and gestational age was estimated using the Ballard method [82].
During the follow-up of newborns, axillary temperature was measured weekly. Symptomatic malaria cases, defined as fever (>37.5°C) with TBS and/or RDT positive, were treated with an artemisinin-based combination therapy as recommended by the Benin National Malaria Control Program. Systematically, TBS were made every month to detect asymptomatic infections. Every three months, venous blood was sampled to quantify the level of antibody against malaria promised candidate vaccine antigens. The environmental risk of exposure to malaria was modeled for each child, derived from a statistical predictive model based on climatic, entomological parameters, and characteristics of children's immediate surroundings as reported by [83]. Concerning the antibody quantification, two recombinant P. falciparum antigens were used to perform IgG subclass (IgG1 and IgG3) antibody quantification by Enzyme-Linked Immuno-Sorbent Assay (ELISA) standard methods developed for evaluating malaria vaccines by the African Malaria Network Trust (AMANET [www.amanet148trust.org]). Protocol was described in detail [84].
Data analysis. For our analysis, we use some of the data and we rename the proteins used in the study described above, for confidentiality reasons (some important findings are yet to be published). Thus, the proteins we use here, are named A1, A2, B and C, and are related to the antigens IgG1 and IgG3 as mentioned above in the description of the study. A1 and A2 are different domains of the same protein A, and C and D are two different proteins. Information contained in the multivariate longitudinal dataset of malaria are described in the Table 8, where Y denotes a protein which is one of the following: The aim of the analysis of these data is to evaluate the effect of the malaria infection on the child's immune (against malaria). Since the antigens which characterize the child's immune status interact together in the human body, we analyze the characteristics of the joint distribution of these antigens, conditional on the malaria infection and other factors of interest. The dependent variables are then provided by conc.Y (Table 8) which describes the level of the protein Y in the children at 3, 6, 9, 12, 15 and 18 months. All other variables in the Table 8 are covariates. We then have eight dependent variables which describe the longitudinal profile (in the child) of the proteins listed in Eq (43).
In the models that we fit to these data, we specify one random intercept by child and one random slope by child in the direction of the malaria infection. The illustration we do here is to jointly analyze each of the 28 pairs of proteins, in order to investigate if some profiles of proteins are independent, conditional on the configuration of the fitted model. After performing Fig 7. Posterior distributions of random intercepts conditional on the data related to School 47 in the education dataset. Left panels assume independence across the two dimensions while right panels assume dependence. Top panels for the joint distribution of the random intercepts, middle panels for the joint distribution of random intercept in first dimension and random slope in the second dimension, bottom panels for the joint distribution of the random slopes.
doi:10.1371/journal.pone.0159649.g007 To investigate the general configuration of these proteins, in terms of correlations, we build their hierarchical cluster tree using −log(p-value) as dissimilarity. This hierarchical cluster tree is presented by the Fig 8. The branch related to the IgG1 is different from the branch related to the IgG3. In other words, IgG1_A1, IgG1_A2, IgG1_B and IgG1_C are on the same branch which is different from the branch containing IgG3_A1, IgG3_A2, IgG3_B and IgG3_C (Fig 8). Relatively to both IgG1 and IgG3, A1 and A2 go together, and B and C also go together. These results are biologically very consistent, since A1 and A2 are domains of the same protein, and B and C are two different proteins. On the cluster (Fig 8), it also appears that the proteins IgG3_A1 and IgG1_B which are not significantly correlated (according to our bivariate test) are distant. Statistically, the model which may be used to jointly analyze these 8 protein profiles is not probably the model which contains all the 27 significant correlations, avoiding overfitting problems. Based on the results provided by the bivariate correlation test, it may be useful to perform a regularization procedure in the fitting of the full eight-variate model.

Conclusion
In the context of the multivariate linear mixed-effects model, we have suggested the more general expressions of the EM-based estimators than those used in the literature to analyze multivariate longitudinal data. These estimators fit the framework of the multivariate multilevel data analysis which, obviously, englobes the multivariate longitudinal data analysis framework. We also have built a likelihood ratio test based on these EM estimators to test the independence of two dimensions of the model. Furthermore, the simulation studies have validated the power of this test and have shown that this is an extremely sensitive test. In the context of longitudinal data, it allows to detect a modest correlation signal with a very small sample (ρ = 0.3, AUC = 0.81, with n = 60 subjects and N = 600 observations). In the simulation studies, the empirical distribution of the likelihood ratio statistic fits the χ 2 (4). The asymptotic properties of likelihood ratio statistics, under nonstandard conditions, have been shown by [85] and [86]. These works have been generalized by [87] to cover a large class of estimation problems which allow sampling from non identically distributed random variables. The asymptotic distribution of the LR statistic derived by [87] is a mixture of chi-squared distributions. In the context of likelihood ratio tests for variance components in linear mixed-effects models, [88] used the results of [87] to prove that the proposed mixture of chi-squared distributions is the actual asymptotic distribution of such LR used as test statistics for null variance components with one or two random effects. Based on these works, Further theoretical investigations may be done to properly find out the asymptotic distribution of the likelihood ratio statistic in the case of this bivariate correlation test. Finally, we have illustrated the usefulness of the test on two different real-life data. The first dataset, which is of multivariate multilevel type, concerns the effects of school and classroom characteristics on pupils' progress in Dutch language and arithmetics, where the scores in language and arithmetics are the two response variables which have been considered. Our method has yielded results that are consistent both with information in existing publications and with a conceptual understanding of the phenomenon. On this dataset, we have highlighted a joint effect between the scores in arithmetics and language within schools in the Netherlands. The second dataset, which is of longitudinal multivariate type, concerns a study of the effect of the malaria infection on the child's immune response in Benin. By jointly analyzing all the pairs of protein profiles of interest, we have plotted a hierarchical cluster tree of these proteins, using the bivariate correlation test. Information contained in this hierarchical cluster tree is consistent with the biological literature related to this issue.
The model as it is written is easily extendable to more dimensions despite a sparsity problem in choosing the parameterization of the covariance matrix or the precision matrix. Probably we could use this two-dimensional dependence test to structure a larger covariance matrix. The bivariate correlation test can help to construct iteratively, using a stepwise procedure, a parsimonious joint model containing all the components of y. This stepwise procedure may consist in adding to the constructing model, at each step, the significant correlation between two dependent variables. Using a model selection strategy, the model which fits more to the data will be retained. It could possibly be advantageous to turn to graphical LASSO type approaches to make a penalized estimation of this covariance (or precision) matrix. We could also resort to the rapid optimization methods such as that implemented in the lme4 [76] package, given the slow pace of the EM algorithm. It would be useful to assess the interest of this method compared to some heuristics such as the one which consists in setting one marginal response variable as a covariate of the other(s).
Supporting Information S1 File. Empirical data sets used in the applications section.