Random intercept and linear mixed models including heteroscedasticity in a logarithmic scale: Correction terms and prediction in the original scale

Random intercept models are linear mixed models (LMM) that include a random error and a random intercept. Sometimes heteroscedasticity is included and the response variable is transformed to a logarithmic scale, while inference is required in the original scale; the response variable then has a log-normal distribution. Hence, correction terms should be included to predict the response in the original scale. These terms multiply the exponentiated predicted response variable, which otherwise underestimates the real values. We derive the correction terms, and simulations and real data on the income of elderly people are presented to show the importance of using them to obtain more accurate predictions. Generalizations to any LMM are also presented.


Introduction
In economics and other scientific areas such as medicine, geology, and genetics, it is common to study linear models with a dependent variable defined in a logarithmic scale; for instance in studies related to income [1], health insurance [2], medical expenditures [3], health care utilization and earnings [4], or sediment discharge [5]. In the logarithmic scale, the variable can have an associated normal distribution, whereas in the original scale this is not true. In other words, the dependent variable is a random variable with a log-normal distribution, a skewed distribution for variables taking only positive values, which has been extensively used in analyses of real data such as stock prices, income (without higher-income individuals), time from infection to first symptoms, distribution of particle sizes, number of words per sentence, age at marriage, size of living tissue, etc. There are instances in which the presence of heteroscedasticity can be resolved by considering such a logarithmic scale; however, sometimes this issue is not solved even after the transformation, for instance when the variability is not proportional to the squared conditional mean response given the values of the explanatory variables. Additionally, there are data in which nesting between observations is present, for instance when observations belong to the same spatial cluster. In this case, independence between observations is not satisfied, since the values in a same cluster are correlated, and a random intercept model is preferred. A random intercept model (RIM) in a logarithmic scale is a special type of linear mixed model (LMM) [6,7]:

$$\log(Y_{ij}) = x_{ij}'\beta + \gamma_i + \varepsilon_{ij}, \qquad (1)$$

where i = 1, ..., m, j = 1, ..., n_i, m is the number of clusters, n_i is the number of observations in the ith cluster, log(Y_ij) is the response associated with the jth observation in the ith cluster, and $x_{ij} = (x_{ij1}, \ldots, x_{ijp})'$ is a vector of dimension p associated with the jth observation in the ith cluster, corresponding to the p fixed effects collected in the regression parameter vector $\beta = (\beta_1, \ldots, \beta_p)'$. The variable $\gamma_i$ represents a random intercept associated with cluster i, which allows modeling the relationship among the observations of each cluster; it has a normal distribution $\gamma_i \sim N(0, \sigma_\gamma^2)$, and the $\gamma_i$, for i = 1, ..., m, are independent and identically distributed (i.i.d.). The random error is $\varepsilon_{ij}$, and since heteroscedasticity is assumed in (1), $\varepsilon_{ij} \sim N(0, \sigma^2 w_{ij}^{-1})$ i.i.d., where $w_{ij}^{-1}$ is a known number that allows different variability between observations and clusters. The terms $w_{ij}^{-1}$ are assumed known; for instance, they could be obtained using the unit size or the ELL method [8,9]. Under the latter method, two linear models are fitted: a first model (beta) is the corresponding marginal model, and a second one (alpha) is a model associated with transformed residuals obtained from the beta model (residuals obtained after removing the effects associated with the random effects); an approximation of the terms $w_{ij}^{-1}$ can then be used in the random intercept model. Finally, the random effects $\gamma_i$ and the errors $\varepsilon_{ij}$ are independent.
In many cases, it is necessary to return to the original scale of $Y_{ij}$. Traditionally, this is done by simply applying the exponential function to the predicted values obtained from the model. However, this approach does not consider that the random terms involved in the model are transformed as well, and the predictions are underestimated. In some cases, a generalized linear mixed model (GLMM) [10] with a distribution matched to the data type could be used (e.g., gamma, Poisson, etc.). However, it is sometimes preferable to use the normal distribution in the logarithmic scale, when we know the dependent variable has a log-normal distribution (as far as we know, this is not a distribution included in programs that fit GLMMs, and transforming the dependent variable in a normal GLMM would be similar to what we are doing), or when other procedures depend on such a normal RIM. For instance, in small area estimation there are methods based on the RIM, e.g. the empirical best predictor (EBP) method [11], in which parameters are estimated from a RIM using the sample information; then the conditional distribution of the out-of-sample data given the sample data is derived from the normality assumption; predicted values, simulations, and Monte Carlo approximations are used to estimate poverty measures at a small area level (for elements in or outside of the sample); and finally, a parametric bootstrap mean squared error (MSE) estimator is obtained based on the same RIM. Additionally, not all possible distributions are implemented in GLMM software, and a logarithmic transformation must then be used in a LMM.
In this sense, [12] recently proposed a model for data showing skewness on the log scale, based on an extension of a distribution called the generalized beta of the second kind; it can be seen as a random effects model designed for skewed response variables that extends the usual log-normal nested error model. They also found empirical best predictors for poverty measures in small areas.
Statistical models to correct the logarithmic transformation in linear regression models have been proposed by different authors, e.g. [5,13,14]; other authors have used Bayesian methods to deal with it, e.g. [15]. Some extensions to address heteroscedasticity in linear regression models with a logarithmic scale have also been proposed, e.g. [2,16]. Other authors have compared the logarithmic transformation in linear models with other types of models in different applications, e.g. [4,17-21]. Moreover, others have studied the Box-Cox transformation in linear models, e.g. [3,22-24], the logarithmic transformation being a particular case. Finally, a Box-Cox transformation in LMM has been studied by [23].
In this paper, we derive the correction terms that should be used to obtain more accurate predictions from a RIM with heteroscedasticity when predictions are desired in the original scale. In economics, for instance, this allows a more accurate prediction of income, or improves predictions of measures depending on it, such as poverty measures. These correction terms multiply the exponentiated predicted values obtained from the RIM; calculating the latter values (without correction) is the usual procedure. These terms are important since they allow more precise predictions, with a smaller MSE. Since the RIM contains two random terms, the random effect and the error term, two correction terms are obtained. When these correction terms are not included, underestimation occurs. Similar terms have been obtained for linear regression models but, as far as the authors know, they have not been derived for RIM. These results are relevant not only when a RIM is fitted to data, but also when a methodology is based on such models, for instance in small area estimation.
The motivating example considers a sample of individuals over 60 years old in Mexico, for whom some household and socio-demographic measures and income are known. The information is available by state i = 1, ..., m, for m = 32; the number of individuals in state i is $n_i$, $Y_{ij}$ is the income of individual j = 1, ..., n_i in state i, and there is a total of $n = \sum_{i=1}^{m} n_i$ observations. Assume we want to estimate the expected income for these individuals, $E[Y_{ij}]$, according to the available explanatory variables. This process could be useful to estimate income for another set of similar individuals for whom income is not available but the other variables are, for imputation, or in a simulation process, as in small area estimation, which depends on simulating income for out-of-sample individuals. In the framework of linear models, there are several options for this estimation; in some of them, the distributional assumptions are better satisfied in a logarithmic scale, using log(Y_ij) as response, but the estimations are required in the original scale. A first option is to estimate the expected income, without a common random effect for state, simply using a linear regression in a logarithmic scale and a correction term to estimate in the original scale. A second option is to fit a linear model on the log-transformed scale with a random effect for state, exponentiating the predicted values to estimate the income. A third option is to associate a gamma distribution with the income, a commonly used distribution for positive skewed data such as income or costs [25], and fit a generalized linear mixed model including a random effect for state. The fourth option, which we propose, is to apply correction terms to the second option. We show here that this improves the precision of the estimated values.
We apply the correction terms to both the real dataset and simulated data to evaluate which of the described options including random effects has the best performance; for the simulations, we vary the number of clusters, the number of observations per cluster, and the parameters associated with the variances of the random effects and error terms, and consider models under different distributions. This paper is organized as follows. In the second section, we briefly present the correction term used in a linear regression model in a logarithmic scale, including the so-called smearing estimate. In the third section, we derive correction terms for a RIM with heteroscedasticity, and the corresponding correction terms for a RIM with homoscedasticity are obtained as a particular case. In the fourth section, we present simulations using a log-normal distribution to show that the MSE is minimized when the correction terms are used, and in certain cases also when gamma-distributed simulations are used, comparing the estimations from our method with those derived from a generalized linear mixed model. Additionally, the real data corresponding to income in elderly people are analyzed to show the use of the correction terms and to compare predictions using the different options. In the fifth section, we propose a generalization to LMM and to transformations different from the logarithm. Finally, the conclusion is presented in the last section, and some of the linear algebra used for the calculations is presented as Supplementary Material.

Correction term associated with a linear regression model in a logarithmic scale
In our motivating example, assume that we estimate income in a logarithmic scale without considering a random effect for state. In this case, we are in the framework of linear regression and the second index j is unnecessary; thus, for simplicity, we drop it in this section.
A linear regression model in a logarithmic scale, also called a log-normal linear model, is defined as

$$\log(Y_i) = x_i'\beta + u_i, \qquad (2)$$

where $Y_i$ is the response variable for the ith observation, $x_i = (x_{i1}, \ldots, x_{ip})'$ is a vector of the p explanatory variables for the ith observation, $\beta = (\beta_1, \ldots, \beta_p)'$ is a vector of dimension p of regression parameters, and $u_i$ is an error term with $u_i \sim N(0, \sigma^2)$ i.i.d., for i = 1, ..., n. The expected response in the original scale is

$$E[Y_i] = E[\exp(x_i'\beta + u_i)] = \exp(x_i'\beta)\,E[\exp(u_i)] = \exp(x_i'\beta)\exp((1/2)\sigma^2), \qquad (3)$$

where the last equality follows by completing the square in the normal density and noticing that the resulting integral is equal to one. Therefore, the estimator of the predicted response is given by $\hat E[Y_i] = \exp(x_i'\hat\beta)\exp((1/2)\hat\sigma^2)$. Using the same reasoning and considering heteroscedasticity in (2), allowing different variability among subjects, i.e. $u_i \sim N(0, \sigma^2 w_i^{-1})$ with $w_i^{-1}$ known, we obtain $E[Y_i] = \exp(x_i'\beta)\exp((1/2)\sigma^2 w_i^{-1})$. The last term can be estimated by replacing β with the least squares estimator $\hat\beta = (X'X)^{-1}X'\log(Y)$, and $\sigma^2$ with the biased (maximum likelihood, ML) or unbiased estimator, $\hat\sigma^2 = RSS/n$ or $\hat\sigma^2 = RSS/(n-p)$, respectively, where $RSS = (\log(Y) - X\hat\beta)'(\log(Y) - X\hat\beta)$ is the residual sum of squares. Observe that $\hat E[Y_i] = \exp(x_i'\hat\beta)\exp((1/2)\hat\sigma^2 w_i^{-1})$ is the estimator of the predicted response associated with a log-normal distribution.
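The back-transformation above can be sketched in a few lines of Python. This is a minimal sketch with simulated data; the sample size, covariates, and parameter values are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a log-normal linear model: log(Y) = X @ beta + u, u ~ N(0, sigma^2)
n, p, sigma2 = 5000, 2, 0.5
X = np.column_stack([np.ones(n), rng.uniform(0, 1, n)])
beta = np.array([1.0, 2.0])
u = rng.normal(0, np.sqrt(sigma2), n)
Y = np.exp(X @ beta + u)

# Least squares fit on the log scale
beta_hat, *_ = np.linalg.lstsq(X, np.log(Y), rcond=None)
rss = np.sum((np.log(Y) - X @ beta_hat) ** 2)
sigma2_hat = rss / (n - p)                    # unbiased estimator RSS / (n - p)

naive = np.exp(X @ beta_hat)                  # underestimates E[Y]
corrected = naive * np.exp(0.5 * sigma2_hat)  # E[Y] = exp(x'b) * exp(sigma^2 / 2)

true_mean = np.exp(X @ beta) * np.exp(0.5 * sigma2)
```

Comparing `naive` and `corrected` against `true_mean` shows the roughly `exp(sigma2/2)` downward bias of the naive estimator.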
Alternatively, the term $E[\exp(u_i)] = \int \exp(u)\,dF(u)$, where F is the distribution of the errors, can be estimated nonparametrically. Since F is not completely known, the empirical distribution $\hat F(u) = (1/n)\sum_{i=1}^{n} I_{\{\hat u_i \le u\}}$ is used, where, from (2), $\hat u_i = \log(Y_i) - x_i'\hat\beta$ is the estimated value of $u_i$, and the indicator function $I_{\{\hat u_i \le u\}}$ is equal to 1 if $\hat u_i \le u$ and 0 otherwise. Assuming $Y_0$ corresponds to an observed response with associated explanatory variable values $x_0$, the predicted response is $E[Y_0] = \exp(x_0'\beta)\,E[\exp(u_0)]$. Furthermore, substituting the regression parameter β by the estimate $\hat\beta$, the estimated predicted response (the smearing estimate) is given by

$$\hat E[Y_0] = \exp(x_0'\hat\beta)\,\frac{1}{n}\sum_{i=1}^{n}\exp(\hat u_i).$$
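A minimal sketch of the smearing estimate, with simulated non-normal errors to illustrate why the nonparametric correction is useful; all names and parameter values here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Log-scale fit where the errors are NOT normal (uniform on [-1, 1]),
# so the parametric exp(sigma^2/2) correction would be misspecified.
n = 4000
X = np.column_stack([np.ones(n), rng.uniform(0, 1, n)])
beta = np.array([0.5, 1.5])
u = rng.uniform(-1, 1, n)
log_Y = X @ beta + u

beta_hat, *_ = np.linalg.lstsq(X, log_Y, rcond=None)
resid = log_Y - X @ beta_hat                 # estimated errors u_hat

x0 = np.array([1.0, 0.3])                    # covariates of a new observation
# Smearing estimate: exp(x0'b_hat) times the mean of exp(u_hat)
smearing = np.exp(x0 @ beta_hat) * np.mean(np.exp(resid))
```

For U(-1, 1) errors, E[exp(u)] = (e - e^{-1})/2, so the smearing factor converges to that value rather than to a normal-theory factor.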

Correction terms in a RIM with heteroscedasticity and a logarithmic scale
In this section we derive the correction terms for a RIM with heteroscedasticity in a logarithmic scale. In terms of our motivating example, the process corresponds to estimating the expected income by fitting a model in a logarithmic scale, adding a common random effect for state, and using correction terms that allow more precise estimations in the original scale. First, a preliminary estimator is introduced. Second, an estimator based on the best linear predictor of the random effect is presented. Third, an estimator based on a conditional expectation is proposed. Finally, a correction term based on the smearing estimate is given.

A preliminary estimator
From the RIM given in (1), and using the independence between $\gamma_i$ and $\varepsilon_{ij}$, the expectation of the response in the original scale is

$$E[Y_{ij}] = \exp(x_{ij}'\beta)\,E[\exp(\gamma_i)]\,E[\exp(\varepsilon_{ij})]. \qquad (4)$$

As in (3), the expectations of the exponentials of $\gamma_i$ and $\varepsilon_{ij}$ are $E[\exp(\gamma_i)] = \exp((1/2)\sigma_\gamma^2)$ and $E[\exp(\varepsilon_{ij})] = \exp((1/2)\sigma^2 w_{ij}^{-1})$, and, as a consequence, $E[Y_{ij}] = \exp(x_{ij}'\beta)\exp((1/2)\sigma_\gamma^2)\exp((1/2)\sigma^2 w_{ij}^{-1})$. Therefore, using the corresponding estimators,

$$\hat E[Y_{ij}] = \exp(x_{ij}'\hat\beta)\exp((1/2)\hat\sigma_\gamma^2)\exp((1/2)\hat\sigma^2 w_{ij}^{-1}), \qquad (5)$$

where $\hat\sigma^2$ and $\hat\sigma_\gamma^2$ are variance estimators corresponding to the error and random effect terms, respectively, and $\hat\beta$ is the fixed effects estimator, obtained by the ML or restricted ML (REML) method.
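The preliminary estimator (5) is just a product of three factors. A small sketch, with purely illustrative parameter estimates (in practice they would come from an ML or REML fit):

```python
import numpy as np

# Preliminary back-transformation for the RIM:
# E[Y_ij] = exp(x_ij' beta) * exp(sigma_gamma^2 / 2) * exp(sigma^2 * w_ij^{-1} / 2)
def rim_preliminary(x_ij, beta_hat, s2_hat, s2g_hat, w_ij):
    return (np.exp(x_ij @ beta_hat)
            * np.exp(0.5 * s2g_hat)        # random-effect factor
            * np.exp(0.5 * s2_hat / w_ij)) # heteroscedastic error factor

# Illustrative (made-up) estimates:
x_ij = np.array([1.0, 0.4])
beta_hat = np.array([1.2, 0.8])
pred = rim_preliminary(x_ij, beta_hat, s2_hat=0.3, s2g_hat=0.2, w_ij=0.5)
```

With these numbers, the linear predictor is 1.52 and the two correction exponents are 0.1 and 0.3, so `pred` equals exp(1.92).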

An estimator based on the random effect best linear predictor
The predicted values in (5) do not consider that the random effect $\gamma_i$ can be estimated through its best linear predictor $\hat\gamma_i$,

$$\hat\gamma_i = \frac{\hat\sigma_\gamma^2 \sum_{j=1}^{n_i} w_{ij}(\log(Y_{ij}) - x_{ij}'\hat\beta)}{\hat\sigma^2 + \hat\sigma_\gamma^2 \sum_{j=1}^{n_i} w_{ij}}, \qquad (6)$$

thus having a predictor for each ith cluster, for i = 1, ..., m. The vector of estimated random effects corresponds to $\hat\gamma = E[\gamma \mid \log(Y)]$.
Similarly, to obtain a better predictor associated with $Y_{ij}$, it is more adequate to use (4). Hence, the predictor is

$$\hat E[Y_{ij}] = \exp(x_{ij}'\hat\beta)\,\hat E[\exp(\gamma_i) \mid \log(Y)]\,\hat E[\exp(\varepsilon_{ij})]. \qquad (7)$$

A first approach to estimating $E[\exp(\gamma_i) \mid \log(Y)]$ could be simply to use $\hat E[\exp(\gamma_i) \mid \log(Y)] = \exp(\hat\gamma_i)$, in which case (7) is based on

$$\hat Y_{ij} = \exp(x_{ij}'\hat\beta + \hat\gamma_i), \qquad (8)$$

which is the predicted value corresponding to $\log(Y_{ij})$ exponentiated to return to the original scale (naive estimator). This term is multiplied by a term associated with the error. Assuming heteroscedasticity, $\hat E[\exp(\varepsilon_{ij})] = \exp((1/2)\hat\sigma^2 w_{ij}^{-1})$, and the estimator (7) would be

$$\hat E[Y_{ij}] = \exp(x_{ij}'\hat\beta + \hat\gamma_i)\exp((1/2)\hat\sigma^2 w_{ij}^{-1}). \qquad (9)$$

Note that, according to Jensen's inequality, $E[\exp(\gamma_i) \mid \log(Y)] \ge \exp(E[\gamma_i \mid \log(Y)]) = \exp(\hat\gamma_i)$. Hence, a better prediction can be derived by computing $E[\exp(\gamma_i) \mid \log(Y)]$ directly.

An estimator based on E[exp(γ_i)|log(Y)]
In this subsection, we obtain a better predictor by computing directly the conditional expectation $E[\exp(\gamma_i) \mid \log(Y)]$. For this purpose, we first derive the conditional distribution of the random effect $\gamma_i$ given the transformed response for the sample, $\log(Y) = (\log(Y_{11}), \ldots, \log(Y_{mn_m}))'$. The random effect has a univariate distribution $\gamma_i \sim N(0, \sigma_\gamma^2)$, whereas log(Y) has a multivariate distribution $\log(Y) \sim N_n(X\beta, V)$, where X is the n × p design matrix of fixed effects associated with the response, β is the vector of dimension p of regression parameters, and V is the n × n variance-covariance matrix $Var[\log(Y)]$. The expected value of this conditional distribution corresponds to the predictor given in (6), whereas using properties of conditioned multivariate normal random variables, it can be shown (see Proposition 1 in Supplementary Material) that the variance of the conditional distribution corresponds to

PLOS ONE
$$Var[\gamma_i \mid \log(Y)] = \frac{\sigma^2 \sigma_\gamma^2}{\sigma^2 + \sigma_\gamma^2 \sum_{j=1}^{n_i} w_{ij}}. \qquad (10)$$

Thus, $\gamma_i \mid \log(Y) \sim N(\hat\gamma_i,\, Var[\gamma_i \mid \log(Y)])$. Using the result given in (3), corresponding to the expected value of a log-normal random variable,

$$E[\exp(\gamma_i) \mid \log(Y)] = \exp\!\left(\hat\gamma_i + (1/2)\,Var[\gamma_i \mid \log(Y)]\right).$$

As a consequence, the predictor of the response in the original scale, $\hat E[Y_{ij}]$, estimated considering heteroscedasticity and the predictor $E[\exp(\gamma_i) \mid \log(Y)]$, for each i = 1, ..., m, corresponds to

$$\hat E[Y_{ij}] = \exp(x_{ij}'\hat\beta + \hat\gamma_i)\exp((1/2)\hat\sigma^2 w_{ij}^{-1})\exp\!\left(\frac{\hat\sigma^2 \hat\sigma_\gamma^2}{2(\hat\sigma^2 + \hat\sigma_\gamma^2 \sum_{j=1}^{n_i} w_{ij})}\right). \qquad (11)$$

From (11) and assuming $w_{ij}^{-1} = 1$ (or $w_{ij} = 1$), which is a model with homoscedasticity in the error term, $\sum_{j=1}^{n_i} w_{ij} = n_i$, and the predictor $\hat E[Y_{ij}]$ corresponds to

$$\hat E[Y_{ij}] = \exp(x_{ij}'\hat\beta + \hat\gamma_i)\exp((1/2)\hat\sigma^2)\exp\!\left(\frac{\hat\sigma^2 \hat\sigma_\gamma^2}{2(\hat\sigma^2 + n_i \hat\sigma_\gamma^2)}\right). \qquad (12)$$

Observe how the predicted values given in (11) or (12) include the term $\exp(x_{ij}'\hat\beta + \hat\gamma_i)$, which is the naive estimator associated with $Y_{ij}$. This value is corrected according to two factors, one corresponding to the error and another to the random effect. In contrast, the predictor in (9) only considered the term associated with the error.
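Assuming the variance components are known (in practice they are replaced by ML or REML estimates), the correction in (11) can be sketched for a single cluster; the weights, design, and parameter values below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

# One cluster of a heteroscedastic RIM: log(Y_ij) = x_ij' beta + gamma_i + eps_ij,
# eps_ij ~ N(0, sigma^2 / w_ij). True variance components are taken as known here.
n_i, s2, s2g = 50, 0.3, 0.2
w = 0.5 + rng.uniform(0, 1, n_i)                  # known heteroscedasticity weights
X = np.column_stack([np.ones(n_i), rng.uniform(0, 1, n_i)])
beta = np.array([1.0, 1.0])
gamma_i = rng.normal(0, np.sqrt(s2g))
log_Y = X @ beta + gamma_i + rng.normal(0, np.sqrt(s2 / w))

# Best linear predictor of gamma_i and its conditional variance (Eqs (6) and (10)):
resid = log_Y - X @ beta
sw = np.sum(w)
gamma_hat = s2g * np.sum(w * resid) / (s2 + s2g * sw)
var_cond = s2 * s2g / (s2 + s2g * sw)

# Predictor (11): naive term times the two correction factors
pred = (np.exp(X @ beta + gamma_hat)
        * np.exp(0.5 * s2 / w)             # error-term correction
        * np.exp(0.5 * var_cond))          # random-effect correction
```

Both correction factors exceed 1, so `pred` always lies above the naive estimate `exp(X @ beta + gamma_hat)`.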
Under heteroscedasticity, the predictor given in (9) underestimates the real value since the term $E[\exp(\gamma_i) \mid \log(Y)]$, i = 1, ..., m, is not used. However, it can be easier to calculate since the sum $\sum_{j=1}^{n_i} w_{ij}$ is not involved. Once $E[\exp(\gamma_i) \mid \log(Y)]$ is calculated, the predictor is given by (11). As far as we know, this expected value had not been obtained before.
Observe that all estimators given in (9), (11), and (12) consider that a normal distribution is associated with the transformed data.

A correction term based on the smearing estimate
We saw in the second section that a smearing estimator [3] is a nonparametric statistic used to estimate the expected response on the untransformed scale after fitting a linear model on the transformed scale, thus being useful when the normality assumption is not satisfied. In this subsection, we use this type of estimator to obtain correction terms for the RIM. One variant of the estimator in a model considering homoscedasticity, Eq (12), is obtained by replacing the error-term factor with a smearing estimate:

$$\hat E[Y_{ij}] = \exp(x_{ij}'\hat\beta + \hat\gamma_i)\left(\frac{1}{n}\sum_{k=1}^{m}\sum_{l=1}^{n_k}\exp(\hat\varepsilon_{kl})\right)\exp\!\left(\frac{\hat\sigma^2 \hat\sigma_\gamma^2}{2(\hat\sigma^2 + n_i \hat\sigma_\gamma^2)}\right), \qquad (13)$$

where $\hat\varepsilon_{kl} = \log(Y_{kl}) - x_{kl}'\hat\beta - \hat\gamma_k$ are the estimated errors. A variant allowing a different variance in each ith cluster uses the corresponding within-cluster smearing estimate, $(1/n_i)\sum_{j=1}^{n_i}\exp(\hat\varepsilon_{ij})$.

Experimental results
In this section, the proposed correction terms for RIM with heteroscedasticity in a logarithmic scale are applied to simulation-based scenarios and to a real dataset on income for elderly people.

Simulation-based experiment
A simulation-based experiment is conducted to analyse the correction terms proposed in this paper.
The goal of this simulation experiment is to demonstrate that the implementation of the proposed approach works properly and, therefore, that the real values are adequately recovered by the estimated ones. We generated one hundred datasets for different scenarios; the generated covariates and general structure are as follows.
The intercept random effects $\gamma_i$, for i = 1, ..., m, are generated from a normal distribution with mean 0 and variance $\sigma_\gamma^2 \in \{0.2, 0.4\}$, $\gamma_i \sim N(0, \sigma_\gamma^2)$. In order to include heteroscedasticity, fixed values were proposed for the weights $w_{ij}$: they were deterministically assigned as $w_{ij} = (i+1)/10 + j/1000$, for i = 1, ..., m and j = 1, ..., n_i. The error vector ε is generated from a multivariate normal distribution $N_n(0, R)$, with $R = \mathrm{diag}(\sigma^2 w_{11}^{-1}, \ldots, \sigma^2 w_{mn_m}^{-1})$ and $\sigma^2 \in \{0.2, 0.4\}$. Note that, by properties of the multivariate normal distribution, it is also possible to generate ε in the following way: first simulate ε* from a multivariate standard normal distribution $N_n(0, I_{n\times n})$, or equivalently generate $\varepsilon_{ij}^*$ from a univariate standard normal distribution N(0, 1); then $\varepsilon_{ij} = \sigma w_{ij}^{-1/2}\varepsilon_{ij}^*$. Finally, the response variable in the logarithmic scale is obtained from (1), that is, by substituting the simulated values. Fig 1 shows one simulated dataset. These graphics show that the response variable log(Y) has a linear relation with Xβ (left panel); in contrast, as sometimes occurs in practice, a logarithmic transformation is needed on Y to obtain a linear relationship with the explanatory variables (right panel). Fig 2 shows the simulated response variables log(Y) and Y, compared with their estimated responses. The red squares represent the naive estimates without correction terms in (8), and the blue triangles represent the estimated values obtained by using the correction terms in (11). Note that in general the naive estimates are lower than those obtained with the proposed correction terms in (11), showing that the naive estimator underestimates the real values.
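The generation scheme above can be sketched as follows. This is a minimal sketch: the covariate distribution and β are illustrative assumptions, and the weight formula is applied with the paper's 1-based indices:

```python
import numpy as np

rng = np.random.default_rng(3)

# Generate one dataset following the scheme above (balanced design; p = 2
# with an intercept is assumed for illustration).
m, n_i, s2, s2g = 20, 30, 0.2, 0.4
beta = np.array([1.0, 2.0])

rows = []
gammas = rng.normal(0, np.sqrt(s2g), m)           # gamma_i ~ N(0, s2g)
for i in range(m):
    for j in range(n_i):
        w_ij = (i + 2) / 10 + (j + 1) / 1000      # w_ij = (i+1)/10 + j/1000, 1-based
        x = np.array([1.0, rng.uniform(0, 1)])
        eps = np.sqrt(s2 / w_ij) * rng.normal()   # eps_ij = sigma * w_ij^{-1/2} * N(0,1)
        rows.append((i, x @ beta + gammas[i] + eps))

cluster = np.array([r[0] for r in rows])
log_Y = np.array([r[1] for r in rows])
Y = np.exp(log_Y)                                 # response in the original scale
```

Repeating this loop one hundred times per scenario reproduces the structure of the experiment described above.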
Multiple datasets were generated according to the specifications provided above, and the model's performance was analyzed by using the mean squared error (MSE), given by

$$MSE = \frac{1}{n}\sum_{i=1}^{m}\sum_{j=1}^{n_i}\left(Y_{ij} - \hat E[Y_{ij}]\right)^2.$$

Table 1 shows the means and standard deviations (sd) of the MSE for the one hundred datasets simulated for each scenario, defined according to different values of m, $n_i$, $\sigma^2$, and $\sigma_\gamma^2$. The MSEs are computed using different estimates $\hat E[Y_{ij}]$: first using the naive estimator (8) (column MSE_naive), and then using the correction terms of (5), (9), (11), and (13) (columns MSE_(5), MSE_(9), MSE_(11), and MSE_(13), respectively). Finally, a GLMM with a gamma distribution and a logarithmic link is fitted (column MSE_Gamma), this being an alternative for modeling positive skewed variables that avoids fitting a transformed response in a LMM.
These simulation scenarios show that, assuming heteroscedasticity, the best estimations, with the lowest MSE mean and sd, are in general those obtained by using the correction terms given in (11). Standard deviations are always larger in column MSE_(9). Moreover, in only one case are the means and sd's in column MSE_Gamma lower than the others.
Another set of datasets was generated from a GLMM with a gamma distribution associated with the response $Y_{ij}$ and a logarithmic link. The parameter values are similar to the ones used in the previous simulation experiment concerning the RIM in a logarithmic scale, with analogous balanced and unbalanced designs, the same values of m, $n_i$, $x_{il}$, p, β, and $\sigma_\gamma^2$, and including the heteroscedasticity terms $w_{ij}$. The response variable follows a GLMM with gamma distribution and logarithmic link, where the probability density function of Y ~ Gamma(shape = a, scale = s) is given by $f_Y(y) = \frac{1}{\Gamma(a)s^a}y^{a-1}e^{-y/s}$, a > 0, s > 0, and where $\gamma_i \sim N(0, \sigma_\gamma^2)$. The shape parameter a depends on α, which was chosen as α = {1, 1.5, 5}. The purpose of simulating data from a GLMM with gamma distribution and logarithmic link was to see how our approach works even when the true distribution of the data is not Gaussian. This model is extensively used for positively skewed distributions, being an alternative to fitting a LMM on the transformed response. In fact, for some particular values assigned to the shape and scale parameters, the distribution of the data was similar to that observed for the LMM in the logarithmic scale.
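A sketch of how such gamma responses with a log link and a random intercept can be generated. Since the exact way the paper ties the shape parameter to α and the weights $w_{ij}$ is not fully recoverable from the text, a constant shape is assumed here, and all parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)

# Gamma GLMM with log link: E[Y_ij | gamma_i] = mu_ij = exp(x_ij' beta + gamma_i),
# Y_ij | gamma_i ~ Gamma(shape = alpha, scale = mu_ij / alpha),
# so the conditional mean is mu_ij and the conditional variance is mu_ij^2 / alpha.
m, n_i, s2g, alpha = 15, 40, 0.2, 5.0
beta = np.array([0.5, 1.0])
gammas = rng.normal(0, np.sqrt(s2g), m)

Y = np.empty((m, n_i))
for i in range(m):
    x2 = rng.uniform(0, 1, n_i)
    mu = np.exp(beta[0] + beta[1] * x2 + gammas[i])   # conditional mean, log link
    Y[i] = rng.gamma(shape=alpha, scale=mu / alpha)   # gamma draws with mean mu
```

With α = 1 each conditional distribution is exponential, which is where the log-normal approximation in the text performs worst.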
Table 2 shows the means and standard deviations (sd) of the MSE for the one hundred datasets simulated for each scenario, each defined according to different values of m, $n_i$, α, and $\sigma_\gamma^2$, and assuming heteroscedasticity. These scenarios show that when the shape parameter α is much bigger than 1, the best estimations, those with the lowest MSE mean and sd, are in general those obtained by using the correction terms given in (11). Hence, in this case, the estimations using the RIM in a logarithmic scale and the correction terms are good, even better than those obtained using a GLMM with a gamma distribution and logarithmic link. However, when α is close to 1 the estimations obtained using the RIM in a logarithmic scale are worse, which makes sense, since a gamma distribution with parameter α = 1 is an exponential distribution, which completely differs from a log-normal distribution.

Income for elderly people data application
Returning to our motivating example, we performed analyses based on the National Household Income and Expenditure Survey (Encuesta Nacional de Ingresos y Gastos de los Hogares, ENIGH) 2016 [26], a biennial study examining income and its distribution in Mexico. Elderly people (60 or more years old) were considered. Quarterly total income, that is, the income considering all possible sources, was obtained for each person as the response variable. Household and sociodemographic information was considered as well. To avoid the presence of outliers, only people with an income between 2,000 and 40,000 Mexican pesos were considered. Hence, a total of n = 18,512 participants were included in the analyses. As mentioned in the Introduction, a logarithmic scale was used for the response variable. To help decide which variables to use as explanatory, we first fitted linear regression models. According to the results obtained, some variables were modified (categories collapsed) or generated using information from other questions. The explanatory variables included the number of residents; the type of location of the household (0 = Rural, 1 = Urban; a location is considered urban when it has 2,500 or more residents); socioeconomic stratum (1 = Low, 2 = Low medium, 3 = High medium, 4 = High); and flooring material (1 = Ground, 2 = Cement, 3 = Wood, mosaic, or another floor covering).
Since individuals are nested within each of the 32 states, a random intercept for state was included, each state having between 400 and 1000 observations. The (fixed effects) parameter estimates of the RIM with homoscedasticity in the error term are shown in Table 3. The estimated standard deviation of the random effect, $\hat\sigma_\gamma$, is approximately 0.08, and the corresponding value for the error term, $\hat\sigma$, is approximately 0.6. A likelihood ratio test comparing the RIM with a model without the random effect, i.e. $\sigma_\gamma^2 = 0$, gave a p-value of less than 0.05 (this number divided by two is even smaller, a calculation that must be made since the hypothesis involves a value on the boundary of the parameter space). Hence, a random effect is necessary and a linear regression model (without random effects), which we defined as the first option for these data in the Introduction, should not be fitted. Fig 3 shows the histogram and qq-plot of the residuals. They indicate that the normality assumption is satisfied, and the same is true when the qq-plot of the random effects is examined. Predicted values were obtained using the correction terms in (12), a particular case of (11), which, according to the simulation results, give the best estimations (lowest MSE). Note that the estimates derived through the naive estimator are in general lower than those derived through the proposed correction terms in (12), showing that the naive estimator underestimates the data. In terms of the options discussed in the Introduction, the naive estimates and those including the correction terms correspond to the second and fourth options, respectively. When the naive estimator is compared with the true values, the square root of the mean squared error is 7027.784, whereas using the correction factor given in (12) it is 6829.003, an improvement.
In terms of the third option discussed in the Introduction, we fitted a GLMM using a gamma distribution for the response variable, a logarithmic link function, and both the penalised quasi-likelihood (PQL) and Laplace approximation methods; we checked that the normality assumption for the estimated random effects is satisfied. We obtained square roots of the mean squared error of 6973.41 and 6979.769 under the PQL and Laplace methods, respectively. Hence, in this example the estimates under the correction terms are even more precise than those obtained using a GLMM. We also fitted models considering some heteroscedasticity schemes, for instance using the ELL method, the cluster size, or the squared residuals, but only with the first of these did we obtain a lower mean squared error than under the homoscedasticity scheme; however, in that case the normality assumption was not satisfied.

Mimic simulation example
To validate our proposed correction terms for a RIM including heteroscedasticity in a logarithmic scale, we performed a simulation experiment based on 100 datasets of size 2000. The simulated data approximately mimic the motivating data on income for elderly people, assuming two types of weights associated with heteroscedasticity: the cluster size and one of the explanatory variables, as is sometimes found in real data. Details are given in the S2 Text. Our simulation strategy generated the means and standard deviations of the MSE for each of the correction terms, considering the two types of weights and varying the values of the variances of the random effects and error terms; see Table 4. The estimations with the lowest MSE corresponded to those obtained using the correction terms associated with Eqs (9) and (11). See details and a table including more values in the Supplementary Material.

Generalization to linear mixed models and with functions different from the logarithm
In this section, we generalize the correction terms for any LMM and for transformations different from the logarithm. We have seen that the estimators based on the conditional expectancy associated with the random effects given the transformed response have a better performance; thus, we present only this type of estimator for LMM, obtaining a closed formula. A LMM includes q random effects; for instance, we can have random effects associated with some or all the fixed effects. In its matrix form, a LMM corresponds to where X is the design matrix associated with the fixed effects of dimension n × p and β is the corresponding vector of parameters of dimension p. On the other hand, γ = (γ 1 , . . ., γ m ) is a vector of dimension mq of random effects, where γ i a vector of dimension q corresponding to all random effects associated with a cluster i, with distribution γ � N mq (0, G), with G a diagonal matrix of dimension mq × mq, G = diag (D, D, . . ., D), where D is the variance and covariance matrix of dimension q × q associated with the random effects, which is assumed to be the same for all clusters. This term is multiplied by the matrix U, a block diagonal matrix of dimension n × mq given by U = diag(U 1 , U 2 , . . ., U m ), with U i of dimension n i × q. The vector of errors has distribution � � N n (0, R), where R is a block diagonal matrix of dimension n × n given by R = diag(S 1 , S 2 , . . ., S m ), with S i a diagonal matrix of dimension n i × n i given by The error terms and random effects are assumed independent. Considering an individual j in a cluster i; i = 1, . . ., m and j = 1, . . ., n i , the expression analogous to (1) associated with a LMM is: where u ij is the jth row corresponding to matrix U i . From the joint distribution of the random effects γ and transformed response log(Y), we obtain (see Proposition 2 in S1 File) that the variance and covariance matrix associated with cluster i, Var(γ i |log(Y)), for i = 1, . . 
., m, is

Var(γ_i | log(Y)) = D − D U_i′ (U_i D U_i′ + S_i)^{−1} U_i D, (15)

and γ_i | log(Y) ~ N_q(γ̂_i, Var[γ_i | log(Y)]), where γ̂_i is the best linear predictor of γ_i, γ̂_i = E[γ_i | log(Y)]. Consequently, and using the expected value corresponding to a log-normal distribution,

E[exp(u_ij′ γ_i) | log(Y)] = exp(u_ij′ γ̂_i + (1/2) u_ij′ Var[γ_i | log(Y)] u_ij). (16)

Thus, to estimate E[Y_ij] in a cluster i, i = 1, . . ., m, for an individual j, j = 1, . . ., n_i, where Y_ij is modeled as in (14), we use the estimator exp((1/2) σ̂² w_ij^{−1}) for the random error ε_ij, multiplied by the expected value associated with the random effects conditional on the response, E[exp(u_ij′ γ_i) | log(Y)], calculated in (16), and the constant part exp(x_ij′ β̂). The estimator corresponds to

Ŷ_ij = exp(x_ij′ β̂) E[exp(u_ij′ γ_i) | log(Y)] exp((1/2) σ̂² w_ij^{−1}). (17)

In (17), all terms are known once the estimated variance and covariance terms for the random effects are substituted in D, and σ̂² in S_i, both D and S_i being part of Var(γ_i | log(Y)). These terms are obtained after fitting the model.
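As a minimal illustration, the estimator in (17) can be sketched in code. All quantities below (beta_hat, gamma_hat_i, the conditional covariance V_i, sigma2_hat, and the weight w_ij) are assumed to come from an already fitted LMM; the names and toy values are ours, not from any particular package.

```python
# Minimal sketch of the corrected predictor in (17). All inputs (beta_hat,
# gamma_hat_i, V_i = Var(gamma_i | log Y), sigma2_hat, w_ij) are assumed to
# come from an already fitted LMM; names and toy values are illustrative.
import numpy as np

def corrected_prediction(x_ij, beta_hat, u_ij, gamma_hat_i, V_i, sigma2_hat, w_ij):
    """Estimate E[Y_ij] as exp(x'b) * E[exp(u'g) | log Y] * exp(sigma^2 w^-1 / 2)."""
    fixed_part = np.exp(x_ij @ beta_hat)
    # E[exp(u' gamma_i) | log Y] from (16), using the log-normal mean, since
    # gamma_i | log Y ~ N_q(gamma_hat_i, V_i):
    random_part = np.exp(u_ij @ gamma_hat_i + 0.5 * u_ij @ V_i @ u_ij)
    # correction for the heteroscedastic error term:
    error_part = np.exp(0.5 * sigma2_hat / w_ij)
    return fixed_part * random_part * error_part

# toy values for one individual j in one cluster i
x = np.array([1.0, 2.0]); b = np.array([0.3, 0.1])
u = np.array([1.0, 0.5]); g = np.array([0.2, -0.1])
V = np.array([[0.04, 0.01], [0.01, 0.02]])
pred = corrected_prediction(x, b, u, g, V, sigma2_hat=0.25, w_ij=1.0)
```

Both correction factors are greater than one, so the naive back-transformation exp(x_ij′ β̂ + u_ij′ γ̂_i) is always smaller than the corrected prediction.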
For instance, consider a model including random effects associated with the intercept and a variable u. For each cluster i = 1, . . ., m, γ_i = (γ_i1, γ_i2)′, with γ_i1 and γ_i2 scalars corresponding to the random effects for the intercept and variable u, respectively. The values associated with variable u in cluster i can be accommodated in vector form as u_i = (u_i1, . . ., u_{i n_i})′; thus, U_i is a matrix of dimension n_i × 2 given by U_i = (1_{n_i}, u_i), where 1_{n_i} is a vector of ones corresponding to the intercept. Finally,

D = [ σ²_{γ1}  σ_{γ1γ2} ; σ_{γ1γ2}  σ²_{γ2} ], (18)

where σ²_{γ1} and σ²_{γ2} correspond to the variances associated with the random effects for the intercept and variable u, respectively, and σ_{γ1γ2} is the corresponding covariance. It is easy to derive that in this case (15) corresponds to

Var(γ_i | log(Y)) = D − A_i′ (U_i D U_i′ + S_i)^{−1} A_i,

with A_i′ = (σ²_{γ1} 1_{n_i} + σ_{γ1γ2} u_i, σ_{γ1γ2} 1_{n_i} + σ²_{γ2} u_i) and D given in (18). This equation can be substituted in expression (17) using estimates of σ²_{γ1}, σ²_{γ2}, and σ_{γ1γ2}, obtained after fitting the LMM in any statistical software.
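For this intercept-plus-slope example, (15) can be evaluated numerically. The following sketch assumes sg1, sg2 (the random-effect variances), sg12 (their covariance), and sigma2 are the estimates returned by the fitted LMM; the toy values are illustrative.

```python
# Sketch of (15) for the intercept-plus-slope example; sg1, sg2 (random-effect
# variances), sg12 (their covariance), and sigma2 are assumed to be the
# estimates from the fitted LMM, and w_i the heteroscedasticity weights.
import numpy as np

def cond_var_gamma(u_i, sg1, sg2, sg12, sigma2, w_i):
    """Var(gamma_i | log Y) = D - A_i'(U_i D U_i' + S_i)^{-1} A_i, with A_i' = D U_i'."""
    n_i = len(u_i)
    U_i = np.column_stack([np.ones(n_i), u_i])   # n_i x 2: intercept and variable u
    D = np.array([[sg1, sg12], [sg12, sg2]])     # (18)
    S_i = np.diag(sigma2 / w_i)                  # diagonal error covariance of cluster i
    A_t = D @ U_i.T                              # A_i' as given in the text
    V_i = U_i @ D @ U_i.T + S_i                  # marginal covariance of log(Y) in cluster i
    return D - A_t @ np.linalg.solve(V_i, A_t.T)

# toy cluster with three observations
V_post = cond_var_gamma(np.array([0.2, 1.1, -0.4]), 0.5, 0.3, 0.1, 0.2, np.ones(3))
```

Conditioning on the observed responses shrinks the uncertainty about γ_i, so the diagonal entries of the result are strictly smaller than those of D.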
We could consider a transformation more general than the logarithm, for instance a Box-Cox transformation g, whose inverse follows a power-normal distribution. Each observation Y_ij, for j = 1, . . ., n_i and i = 1, . . ., m, associated with a LMM under a Box-Cox transformation with parameter λ, satisfies g(Y_ij) ~ N(μ_ij, σ²_ε), with μ_ij = x_ij′ β + u_ij′ γ_i, so that Y_ij ~ PN(λ, μ_ij, σ²_ε). The expected value E[X] of a power-normal distribution, in this case X ~ PN(λ, μ, σ²_ε), is calculated in [27] (Lemma 1). After substituting the estimated parameters, this expression corresponds to one class of corrected predictions in the original scale, the one obtained without conditioning the random effects on the sample. For instance, for λ = 0, the expected value given in [27] is E(X) = exp(μ + σ²_ε/2), corresponding to Eq (5) when only one random effect is used. For an invertible function g(·), and considering estimators based on conditioning on the sample, as in (11) for a RIM or (17) for any LMM, a simulation can be used. Assuming that in the transformed scale all normality assumptions are satisfied, we can apply results similar to those obtained for a LMM with the logarithm transformation, so that

x_ij′ β + u_ij′ γ_i + ε_ij | g(Y) ~ N(x_ij′ β̂ + u_ij′ γ̂_i, u_ij′ Var[γ_i | g(Y)] u_ij + σ² w_ij^{−1}), (19)

where Var[γ_i | g(Y)] corresponds to (15). The expected value of the response in the original scale in a cluster i for an individual j can then be approximated by simulation, generating a set of random numbers z_l, for l = 1, . . ., L, according to the distribution given in (19), and obtaining

Ê[Y_ij] = (1/L) Σ_{l=1}^{L} g^{−1}(z_l).
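The simulation-based estimator above can be sketched as follows, using the Box-Cox inverse as g^{−1}; mean_ij and var_ij stand for the mean and variance of the normal distribution in (19), assumed to have been computed from the fitted model.

```python
# Sketch of the simulation-based estimator: draw z_l from the normal
# distribution in (19) (its mean and variance, mean_ij and var_ij, are assumed
# to be computed from the fitted model) and average g^{-1}(z_l). Here g^{-1}
# is the Box-Cox inverse; for lam != 0 it requires lam * z + 1 > 0.
import numpy as np

def boxcox_inverse(z, lam):
    return np.exp(z) if lam == 0 else np.power(lam * z + 1.0, 1.0 / lam)

def mc_prediction(mean_ij, var_ij, lam, L=100_000, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.normal(mean_ij, np.sqrt(var_ij), size=L)   # z_l, l = 1, ..., L
    return boxcox_inverse(z, lam).mean()               # (1/L) sum_l g^{-1}(z_l)

# for lam = 0 this approaches the closed log-normal form exp(mean + var/2)
approx = mc_prediction(0.4, 0.09, lam=0.0)
```

For λ = 0 the Monte Carlo average converges to the closed-form log-normal mean, which provides a simple check of the simulation.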

Conclusion
The correction terms we proposed for a RIM, with or without heteroscedasticity, with response in a logarithmic scale enable more precise predictions. This is useful since responses in a logarithmic scale are commonly used, especially in financial and poverty analyses, and with our procedure we can obtain more precise predictions of an economic measure in a population, or better simulations of the distribution of the response, or of an associated measure, for a new population (by simulating the error term and random effects and using the values of the explanatory variables). As the simulations assuming log-normal distributions and the real data show, the best predictions, with the lowest MSE, correspond to those including two correction terms, one for the errors and another for the random effects. These correction terms are easy to calculate and implement without the need for special software. Even though in a GLMM a distribution different from the normal can be used, it is sometimes desirable simply to work in a logarithmic scale when the normal behaviour under this transformation is properly satisfied; in other words, when a log-normal distribution adequately fits the data. Besides, through simulations with gamma distributions, a distribution commonly used to model income and similar variables, we showed that the predictions using the two correction terms are more precise than those obtained through a GLMM with a gamma distribution, as long as the shape parameter α of the gamma distribution is not close to one.
Even when this parameter equals one, corresponding to an exponential distribution, as the number of clusters and observations in each cluster increases, the estimations obtained using the correction terms are close to those obtained with the GLMM and a gamma distribution (those using the smearing estimate being in general better, especially for lower values of the variance of the random effect, and vice versa), and better than those obtained through another correction method or without correction terms. On the other hand, in other types of analyses, such as some small area estimation techniques, it is desirable to preserve a normal distribution, since the fit of a RIM is just a first step in a set of processes, all assuming normality; hence, assuming another distribution would change the complete technique, and, without the correction, the estimated poverty measures, or any measure associated with a small area, might be incorrect. The weights we considered for heteroscedasticity were of the form σ² w_ij^{−1}; however, a more general form σ²_ij can be used by substituting σ²_ij for σ² w_ij^{−1} in all formulas. If the variance structure is estimated using a function, for instance an exponential variance structure, we estimate the LMM including this structure; thus, the parameters for the structure are estimated together with the fixed and random effects parameters. Any inference should be performed taking care that the degrees of freedom are corrected or appropriate corrections applied, particularly for small sample sizes [28] and non-linear covariance structures [29]. For the predictions in the original scale, the σ²_ij terms can be calculated using the estimated parameters corresponding to the variance structure and then applying our formulas. Any further inference should be treated with care, considering that the variance structure was estimated. In fact, assuming any correlation structure associated with the error for each cluster, i.e.
assuming that the matrix S_i is not necessarily diagonal (the correlation structure between clusters, however, is still assumed diagonal), for instance when time is involved, Eqs (15) and (16) still hold, and formula (17) might be used by modifying the third term accordingly, though care should be taken if any inference is required.
We also generalized the procedure to any LMM, with the RIM as a particular case, and outlined the process that could be followed when a function different from the logarithm is used, though it seems that approximations are needed in this general case. Future work could continue exploring transformations different from the logarithm, to see whether better predictions with closed formulas can be obtained. An exact variance estimator of the predicted values is also desirable, though it seems, from some preliminary calculations, that a closed formula cannot be obtained; however, a better approximation than one using only simulations might be possible. We are working on the implementation of the correction terms in two-part models and their variants, for instance for health expenditure data in which there is a concentration at the zero value since some people do not spend money, to see whether our correction terms allow us to obtain better predictions, as some preliminary analyses have shown.
Supporting information
S1 File. Propositions 1 and 2, concerning the calculations to obtain Var(γ_i | log(Y)) for a RIM and a LMM, respectively, and details about the mimic simulation example.