A power approximation for the Kenward and Roger Wald test in the linear mixed model

We derive a noncentral F power approximation for the Kenward and Roger test. We use a method of moments approach to form an approximate distribution for the Kenward and Roger scaled Wald statistic, under the alternative. The result depends on the approximate moments of the unscaled Wald statistic. Via Monte Carlo simulation, we demonstrate that the new power approximation is accurate for cluster randomized trials and longitudinal study designs. The method retains accuracy for small sample sizes, even in the presence of missing data. We illustrate the method with a power calculation for an unbalanced group-randomized trial in oral cancer prevention.


Motivation
Linear mixed models are widely used in biomedical research for inference in analyses with missing data. Kenward and Roger [1] described a scaled Wald statistic and null case reference distribution for tests of fixed effects in the linear mixed model. Despite the widespread use of the Kenward and Roger [1] method for data analysis, no general methods are available to calculate power for the Kenward and Roger [1] test.
Several authors have described power approximations for related tests and models. Helms [2] described a noncentral F power approximation for a Wald test. Helms used a different null case reference distribution than the one derived by Kenward and Roger. Stroup [3] suggested an "exemplary data" approach for calculating power for mixed models with missing data. Tu et al. [4,5] developed an asymptotic power approximation based on generalized estimating equations. Shieh [6] provided non-central F power approximations for multivariate models with random covariates and no missing data. Chi, Glueck, and Muller [7] demonstrated that power methods for the general linear multivariate model may be used in complete, balanced, homoscedastic mixed models. We derive a noncentral F power approximation for the Kenward and Roger [1] test for a broad range of models. We use a method of moments approach [8] to form an approximate distribution of the Kenward and Roger [1] scaled Wald statistic, F R , under the alternative. The reference distribution of F R under the alternative depends on the approximate moments of the unscaled Wald statistic.
The remainder of the manuscript is organized as follows. In Section 2, we introduce notation for the general linear mixed model and briefly review the methods of Kenward and Roger [1]. In Section 3, we describe a noncentral F power approximation for the Kenward and Roger [1] test. In Section 4, we summarize the Monte Carlo simulation study used to evaluate the power approximation. In Section 5, we demonstrate a power calculation for a longitudinal trial in oral cancer prevention. In Section 6, we provide concluding remarks.
2 Notation, models, and hypothesis testing Extend the direct sum operator [9, Section 1.3] to sets of arbitrarily sized matrices as follows. Let {A 1 , . . ., A J } be a set of matrices such that A j has dimension (r j × c j ). Let 0 r i ;c j be an

Notation
and Let E 0 (u) and E A (u) be the expectations of the random variable u under the null and alternative hypotheses, respectively. Similarly, let V 0 ðuÞ and V A ðuÞ indicate the variance under the null and alternative hypotheses. For random matrix variates, denote the covariance under the null and alternative hypotheses as V 0 ðAÞ and V A ðAÞ, respectively. Let X � D indicate that random variable X follows a distribution D exactly, while X _ � D indicates that distribution is followed approximately. Let F � F ðn n ; n d ; gÞ indicate that the random variable F follows a noncentral F distribution [10] with numerator degrees of freedom ν n , denominator degrees of freedom ν d , and noncentrality parameter γ. For γ = 0, F is said to follow a central F distribution, written F � F ðn n ; n d Þ. Define F À 1 ðb; n n ; n d ; gÞ such that for 0 Use Y � N N;p ðM; Ξ; ΣÞ to indicate that the (N × p) matrix Y follows a matrix Gaussian distribution, with M an (N × p) matrix of means, X an (N × N) symmetric, positive definite column covariance matrix, and S a (p × p) symmetric, positive definite row covariance matrix [9,Chapter 8]

The general linear mixed model
We describe the general linear mixed model for Gaussian outcomes using the notation of Muller and Stewart [9,Chapter 5]. Let i 2 {1, . . ., N} indicate the ith independent sampling unit [9,Chapter 5]. An independent sampling unit may be a single participant, as in a clinical trial, or a group of participants, as in a cluster-randomized study. Observations from two different independent sampling units are statistically independent. Observations within an independent sampling unit may be correlated. For example, for a particpant in a longitudinal trial, repeated measurements over time will be correlated.
Let p i be the number of observations for the ith independent sampling unit, with p = max i (p i ). For the ith independent sampling unit, let y i be the (p i × 1) vector of observed outcomes, X i be the (p i × r) fixed effects design matrix of rank r, and e i be the (p i × 1) vector of random errors. Assume that for i 6 ¼ j, e i ? e j and y i ? y j . Let S i be a (p i × p i ) symmetric, positive definite matrix, with Let β be the (r × 1) vector of regression parameters. The linear mixed model for the ith independent sampling unit is Let n ¼ P N i¼1 p i . Define the (n × 1) vectors y s ¼ y 0 . Stack the fixed effect design matrices into the (n × r) matrix Throughout, we assume that predictor values are not allowed to change within an independent sampling unit, i.e., that there are no repeated covariates. In addition, we assume that all predictor values are fixed as part of the study design. The population-averaged form of the linear mixed model is Define The distribution of y s is

Tests for fixed effects in mixed models
Let α be the Type I error rate. Let C be the (a × r) matrix of fixed effects contrasts. Define the (a × 1) matrix θ = Cβ, and let θ 0 be the (a × 1) matrix of null values. The general linear hypothesis may be stated as In order to conduct power analysis for the general linear hypothesis in the mixed model, we must consider the target estimation method. Several estimation methods have been described for mixed models [12,Chapter 5]. Common estimation methods include restricted maximum likelihood and maximum likelihood.
Let m indicate the estimation method. LetΣ s;m andβ m be the estimates of S s and β obtained from method m. Defineθ m ¼ Cβ m . The Wald statistic for the linear mixed model is The distribution of the Wald statistic is not known exactly for any m. Various reference distributions have been suggested for each estimation method m. In general, the distributions share a common form, with Under the null hypothesis, γ m = 0 and w m _ � F ðn n;m ; n d;m Þ.

The Kenward-Roger test for fixed effects
Kenward and Roger [1] suggested using restricted maximum likelihood estimation (m = R) and a scaled Wald statistic.
Kenward and Roger [1] used Taylor expansion to estimate E 0 (w R ) and V 0 ðw R Þ from observed data. Kenward and Roger [1] substituted E 0 (w R ) and V 0 ðw R Þ into method of moments approximations for λ and the reference distribution of F R under the null. With F R _ � F ða; nÞ, 3 Power approximation for the Kenward-Roger test in the linear mixed model

The approximate moments of the Wald statistic
We derive a noncentral F power approximation for the Kenward approximately Wishart. Finally, under the assumption of independence, we combine the terms to obtain an approximate F distribution.

The approximate distribution of CðX
with a single central Wishart. The result follows from Theorems 1, 2 and 3 in A. The theorems provide an approximate distribution for a positive definite sum of potentially singular quadratic forms in independent inverse central Wishart matrices. The accuracy of the approximation depends on the degrees of freedom of the component quadratic forms. To ensure sufficient degrees of freedom, we make the following homoscedasticity assumptions. Recall p = max i (p i ). With S max a symmetric, positive definite matrix, For independent sampling units with observation pattern R d , assume Without loss of generality, permute the independent sampling units in Eq 8 so that Estimate S s withΣ The following thought experiment gives reasonable approximations for the distribution of eachΣ d . All independent sampling units with observed data pattern R d have p d observations. For each R d , suppose we form a complete, balanced mixed model containing only the independent sampling units with observed data pattern R d . For each balanced mixed model, assume that X s includes the full time by treatment interaction. This permits recasting each balanced mixed model as an equivalent general linear multivariate model [9,Chapter 14]. For cluster randomized designs, we assume that the mixed model is recast as a two-stage model of cluster means [13,Chapter 4], a special case of the multivariate model.
For the dth multivariate model, let q be the rank of the multivariate design matrix andÊ d be the (N d × p d ) matrix of residuals. Assume N d > (q + p d + 1). Then an unbiased, consistent estimate of S d ,Σ d;M , can be formed using known results for the multivariate model. Thus, Recall that in the Wald statistic (Eq 12), Using Eq 25 and Theorem 3 in Appendix, approximate the distribution of X 0 sΣ À 1 s;M X s with a single inverse central Wishart, From the linear properties of Wishart matrices [11, p. 111, Theorem 3.4.1], We assume that w � w R . From Eq 19, ðθ W À θ 0 Þ is approximately Gaussian. From Eq 28, where and From Eq 30, we calculate E 0 (w), E A (w), and V A ðwÞ, using standard results for central and noncentral F distributions [10].

A three-moment approximation for the distribution of the Kenward and Roger scaled Wald statistic under the alternative hypothesis
We use a method of moments approach [8] To obtain values for λ, ν, and γ under the alternative, we match three moments, setting and With we obtain n ¼ 4 þ 2ða þ 2gÞ þ ða þ gÞ 2 ra 2 À a À 2g ; ð39Þ When γ = 0, Eq 39 reduces to which shares the same form as the result obtained by Kenward and Roger (Eq 16). The exact values of ρ, and hence ν, will differ due to the disparate techniques used to obtain moments for the Wald statistics, w and w R .

Power calculation for the Kenward and Roger test
We calculate power for the Kenward

Methods
We compared approximate power values, calculated as in Section 3.3, with empirical power for two types of study designs: unbalanced, cluster randomized trials and longitudinal studies with known dropout patterns. Approximate power was calculated using our mixedPower package for R version 4.0.2 [15]. Empirical power was calculated by Monte Carlo simulation in SAS [16, version 9.4]. We defined α, S max , β, C and θ 0 . For i 2 {1, . . ., N}, we specified X i and R d . We generated 10, 000 replicates of e s and computed y s as in Eq 8. For each replicate, we tested the linear contrast C using SAS PROC MIXED with the DDFM = KenwardRoger flag to request Kenward and Roger [1] denominator degrees of freedom. Empirical power was estimated as the proportion of replicates for which the null hypothesis was rejected. Source code is available at http:// github.com/SampleSizeShop/mixedPower.

Cluster randomized designs.
We compared approximate and empirical power for 36 cluster randomized designs. We assumed that each design had a single Gaussian outcome. Half of the clusters were assumed to have complete data, with the remaining clusters assumed to have some amount of missing data. We varied the number of treatment groups, t 2 {2, 4}, the number of clusters randomized to each treatment, N treatment 2 {10, 40}, the total number of participants in a complete cluster, p 2 {5, 50} and the ratio of the incomplete cluster size to the complete cluster size s 2 {0.6, 0.8, 1}. We only included designs which met the assumption that N d > (q + p d + 1) for all R d .
For each design, we repeated the simulations for several intraclass correlation values ρ 2 {0.04, 0.1, 0.2, 0.5}, with The β matrix had the form for designs with 2 treatments and for designs with 4 treatments. The scale factor b was selected so that the approximate power was roughly 0.2, 0.5 or 0.8. In each scenario, we calculated power for the null hypothesis of no difference among treatment groups at α = 0.05. We used the Wald test with denominator degrees of freedom as described by Kenward and Roger [1].

Longitudinal designs.
We calculated approximate and empirical power for 36 longitudinal study designs. Each design had 5 repeated measures and 50 participants per treatment group. We varied the number of treatment groups, t 2 {2, 4}, the pattern of missing data, either monotone (missing the 4th and 5th observations), or non-monotone (missing the 2nd and 4th observations), and the number of participants in each treatment group with some amount of missing data, N incomplete 2 {0, 10, 20}. For observations within a given participant, we assumed a first-order auto-regressive correlation structure [12, p. 99], with ρ = 0.4 and σ 2 = 1. The β matrix had the form for designs with 2 treatments and for designs with 4 treatments. The scale factor and hypothesis testing were as described for the cluster randomized designs with one exception: we calculated power for the null hypothesis of no time by treatment interaction.

Performance criteria.
For each design, we computed the deviation as approximate power minus empirical power. We produced box plots summarizing the deviations overall, within all cluster randomized trials, and within all longitudinal designs. For the cluster randomized trials, we produced box plots stratified by the number of treatment groups, the cluster size, and the ratio of the incomplete cluster size to the complete cluster size. For the longitudinal designs, we produced box plots summarizing the deviations stratified by the number of treatment groups, the pattern of missing observations, and the number of incomplete independent sampling units per treatment.
Positive deviations indicated that the approximate power values were larger than the empirical power values. Negative deviations indicated that the approximate power values were smaller than the empirical power values. Further details for cluster-randomized designs are shown in Fig 2. The accuracy of the power approximation improved with larger cluster sizes. The approximation retained accuracy regardless of the ratio of incomplete to complete cluster sizes. As shown in Table 1, accuracy was similar across ICC values, with slight improvements with increasing correlation.

Results
Results for longitudinal designs are shown in Fig 3. The power approximation was highly accurate for all longitudinal designs tested.

Applied example
We demonstrate a power calculation for an unbalanced cluster-randomized trial of an intervention to reduce oral cancer risk behaviors. The example is based on a hypothetical study examining the impact of workplace smoking cessation programs on tobacco use. We used a synthetic, rather than a real example, so that the power calculation is easy to follow. In a real power calculation, values of differences in means, standard deviations and intra-class correlation coefficients could be drawn from the literature, as described in Guo et al. [17].
For our demonstration, we assume that 80 worksites will be randomized to 2 smoking cessation programs, with 40 sites per treatment condition. Of the 40 sites randomized to each smoking cessation program, 25 worksites will have 30 participants, and the remaining 15 will have 20 participants. The outcome for the analysis will be urinary cotinine. We wish to detect a difference of 25 ng/ml. We assume a standard deviation of 125 ng/ml, and an intraclass correlation of 0.04. We will calculate power for the Kenward and Roger [1] test of the smoking cessation program effect. We set α = 0.05.
To begin the calculation, we first identify the patterns of observations in the study, including complete clusters with 30 participants, and incomplete clusters with 20 participants.   Table 2 summarizes the design matrices and patterns of observations by cluster size and treatment assignment.

Discussion
We describe a power approximation for the Kenward and Roger (1997) test of fixed effects in the linear mixed model. The method was accurate to within about ±0.06 for all designs, with the best accuracy observed for longitudinal designs. We note that Kenward and Roger (2009) have since described a refinement which improves estimation of the non-linear covariance structures in small samples. We have restricted our discussion to the Kenward and Roger (1997) approach, since it is most commonly used in statistical practice. The method has several limitations. The assumption of N d > (q + p d + 1) may be too restrictive for multilevel designs with large cluster sizes. In addition, we assume that the pattern of missing data is known. The method does not apply to repeated covariates, which often appear in biomedical studies. However, the method does apply to baseline covariates, a common study design. We make a strong homoscedasticity assumption of equal variance for each independent sampling unit. This assumption means that the power computations are not appropriate for random regression, for models with group differences in variance, or for certain spatial-temporal applications. Nevertheless, the assumption of homoscedasticity is widely made for randomized controlled clinical trials, laboratory studies, and observational studies, which makes the method useful for a variety of cases. Lastly, the method has not been evaluated for binary or Poisson data.
The analytic results from this manuscript suggest several future extensions. We may be able to calculate power for linear mixed models with random missing data patterns by invoking conditional distribution theory and calculating expected power across patterns of missingness. In addition, the approach used to form the distribution ofΣ s;M provides the first step towards a non-iterative alternative to restricted maximum likelihood estimation for some mixed models. For big data applications, such a non-iterative approach may facilitate highly parallel computation of parameter estimates in mixed models.
Our power approximation provides a general, flexible, accurate and rapid method to calculate power for the Kenward and Roger (1997) test. For studies in which the Kenward and Roger (1997) test is the planned method of data analysis, our power approximation should be used. By aligning power analysis with the planned data analysis, researchers can more accurately assess power for biomedical studies. Accurate power analysis is an ethical imperative for research with human participants.

Appendix
is positive definite.

PLOS ONE
Note that for i 2 {1, 2, ‥, q}, I q ({i}) 0 I q ({i}) is a (q × q) matrix for which the ith diagonal element is 1 and all remaining elements are 0. Therefore, Eq 55 can be equivalently expressed as a direct sum.
From Mathai and Provost [18, p.18, Theorem 2.2b.1], it follows that each I p ðR m Þ 0 S À 1 m I p ðR m Þ is positive semi-definite. By assumption, for each Q i , there exists a c i such that X c i 2 Q i such that X c i ¼ I q ðfigÞ � I p . Then Because S À 1 c i is positive definite and the remaining I p ðR m Þ 0 S À 1 m I p ðR m Þ are positive semi-definite for i 2 {1, . . ., q}, then is positive definite. Since P k m¼1 X 0 m S À 1 m X m is a block matrix, the eigenvalues of P k m¼1 X 0 m S À 1 m X m are the eigenvalues of all of the blocks. Since each block (Eq 58) is positive definite and hence has positive eigenvalues, it follows that P k m¼1 X 0 m S À 1 m X m must also be positive definite.
it can be shown that is approximately distributed as S À 1 � � IW qp ðN � ; Ψ � Þ. Proof. Theorem 1 in Appendix demonstrates that Q −1 is positive definite under the restriction that for each i 2 {1, . . ., q}, there exists at least one m such that X m = I q ({i}) � I p .
To derive an approximate distribution for Q −1 , we match the expectation of the sum of the Wishart matrices and the variance of the trace of the sum of the Wishart matrices. Set and From Theorem 2 in Appendix and the independence of the S À 1 m , Then the approximate parameters for S À 1 � � IW q ðN � ; Ψ � Þ are and ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi where ðN m À p m ÞðN m À p m À 1Þ 2 ðN m À p m À 3Þ ; and c h ¼ ð2h 1 p À 4h 2 þ 4h 3 p þ 4h 3 þ h 4 p 2 þ 3h 4 pÞ: The method of moments approximation yields an asymptotic approximation for the sum, as desired. Theorem 4. Let n and p be positive integers, μ be a (p × 1) vector of means, and S x 6 ¼ S W be symmetric and positive definite (p × p) matrices. Suppose x � N p ðμ; Σ x Þ independently of W � W p ðn; Σ W Þ. Then with l u ¼ d À 1 u ðμ 0 Σ À 1 W μÞ; ð78Þ and Proof. Define V ¼ x 0 Σ À 1 W x=x 0 W À 1 x. Define U ¼ x 0 Σ À 1 W x. Using Lemma 17.10 in Arnold [19, p. 319], it follows that Vjx � w 2 nþpÀ 1 . Hence, V ? x, which implies V ? U. The expression U is a weighted sum of noncentral χ 2 random variables [9, Theorem 9.5, p. 176]. Approximate the distribution of U with a single noncentral χ 2 , so that U _ � l u w 2 n u ðd u Þ. Using the approach described by Kim et al. [8], obtain values for λ u , n u and δ u by matching the following three moments: and V A fl u w 2 n u ðd u Þg ¼ V A ðUÞ: ð83Þ The moments of U are [9, Corollary 9.6.3, p. 179], and Then the approximate parameters of U are and Since ðU=l u Þ _ � w 2 n u ðd u Þ, V � w 2 nþpÀ 1 , and V ? U, U=ðl u n u Þ V=ðn þ p À 1Þ _ � F fn u ; n þ p À 1 ð Þ; d u g: