Power and Sample Size Determination in the Rasch Model: Evaluation of the Robustness of a Numerical Method to Non-Normality of the Latent Trait

Patient-reported outcomes (PRO) have gained importance in clinical and epidemiological research and aim at assessing quality of life, anxiety or fatigue for instance. Item Response Theory (IRT) models are increasingly used to validate and analyse PRO. Such models relate observed variables to a latent variable (unobservable variable) which is commonly assumed to be normally distributed. A priori sample size determination is important to obtain adequately powered studies to determine clinically important changes in PRO. In previous developments, the Raschpower method has been proposed for the determination of the power of the test of group effect for the comparison of PRO in cross-sectional studies with an IRT model, the Rasch model. The objective of this work was to evaluate the robustness of this method (which assumes a normal distribution for the latent variable) to violations of distributional assumption. The statistical power of the test of group effect was estimated by the empirical rejection rate in data sets simulated using a non-normally distributed latent variable. It was compared to the power obtained with the Raschpower method. In both cases, the data were analyzed using a latent regression Rasch model including a binary covariate for group effect. For all situations, both methods gave comparable results whatever the deviations from the model assumptions. Given the results, the Raschpower method seems to be robust to the non-normality of the latent trait for determining the power of the test of group effect.


Introduction
The evaluation of perceived health outcomes, or more generally patient-reported outcomes (PRO), is increasingly performed in different health areas. PRO combine self-reported information provided by patients on their health or treatment and aim at assessing, for instance, quality of life, anxiety or pain. PRO differ from other health outcomes, such as overall survival, because the patient characteristics they capture cannot be measured directly. They are usually evaluated using self-assessment questionnaires composed of a set of questions (called items), and the responses provided by the patients are analyzed.
Analysis of PRO can be based on two approaches: Classical Test Theory (CTT) or Item Response Theory (IRT) [1]. CTT relies on observed scores (possibly weighted sums of the patients' item responses) that are assumed to provide a good representation of a "true" score. IRT relies on an underlying response model relating the item responses to a latent unobservable variable, often called the latent trait, which is usually assumed to follow a normal distribution and is interpreted as a measure of the studied concept (quality of life, for example). IRT models are increasingly used to validate PRO instruments and to analyze these outcomes [2] [3] [4]. Among the large family of IRT models, the Rasch model [5] is often used for dichotomous items in health sciences. This model has interesting psychometric properties, in particular specific objectivity [6], which implies that patients can be compared objectively, that is to say independently of the questionnaire. The Rasch model also has several advantages, such as providing a measure of the latent trait on an interval scale and handling missing data and possible floor and ceiling effects [7].
Despite the widespread use of PRO, careful a priori sample size and power determination is hardly ever reported at the design and planning stage of studies. Furthermore, it has been stressed that many studies might not be adequately powered to detect clinically important changes in PRO [8] [9]. Specific sample size methodology is therefore needed for clinical research including PRO to avoid inadequately sized studies [10]. An inappropriate sample size determination could indeed lead to erroneous or uninformative conclusions, or expose patients to inappropriate medical strategies.
The sample size for the comparison of a normally distributed endpoint in two independent groups can be computed using the usual formula, conditionally on some assumed parameter values. Generally, the expected difference between the mean values of the studied endpoint in the two groups (group effect $\gamma$) and the variance of the endpoint ($\sigma^2$), often assumed to be equal in both groups, have to be specified. These parameters can be determined from a pilot study, the literature or expert opinion. Using these assumptions, the sample size in each group can be determined for a given type I error and a given power.
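As an illustration, the usual formula for two independent groups of equal size can be sketched in Python as follows. The function name `n_per_group` and the example values are assumptions made for this sketch; the formula is the standard normal-approximation one, $n = 2\sigma^2 (z_{1-\alpha/2} + z_{1-\beta})^2 / \gamma^2$ per group, and is not specific to PRO or IRT.

```python
import math
from statistics import NormalDist


def n_per_group(gamma, sigma2, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided comparison of two means.

    gamma  : expected difference between the group means (group effect)
    sigma2 : common variance of the endpoint in both groups
    Uses the normal approximation, so it is a planning value only.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # quantile for the type I error
    z_beta = std_normal.inv_cdf(power)           # quantile for the target power
    n = 2 * sigma2 * (z_alpha + z_beta) ** 2 / gamma ** 2
    return math.ceil(n)                          # round up to whole subjects


# Hypothetical planning values: gamma = 0.5, sigma^2 = 1, 5% type I error, 80% power
print(n_per_group(gamma=0.5, sigma2=1.0))
```

Halving the group effect multiplies the required sample size by about four, because the group effect enters the formula squared.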
The usual sample size formula could be used in the framework of IRT when the latent trait (endpoint) is assumed to be normally distributed. However, a previous study has shown that this widely used formula is inadequate for IRT models because it leads to an underestimation of the required sample size [11]. This underestimation is closely related to the fact that the latent trait is an unobservable variable whose estimation requires a model, which creates uncertainty. Moreover, sample size determination is a key point of rigorous planning, since it makes it possible to detect an expected clinically relevant difference while controlling the type I and type II errors. Hence, an adaptation of the classical formula was required in order to offer a theoretical method for calculating the number of subjects for PRO studies using IRT [11].
From this perspective, a method has been developed for power and sample size determination when an IRT model, the Rasch model, is intended to be used for analysis [12]. This method named Raschpower provides the power for a given sample size during the planning stage of a study in the framework of IRT.
This method has been validated under some conditions when all the assumptions of the underlying model were fulfilled. The aim of the present work is to study the impact of a misspecification of the distribution of the latent trait on the performance of the Raschpower method, that is to say to check whether the method still produces reliable results when the required assumptions are not fulfilled. A simulation study is performed to assess whether the Raschpower method can still be used when the latent trait is not normally distributed.

Rasch model
In IRT, the link between a latent trait (quality of life, for example) and item parameters (item difficulties) is modelled. The probability that person $i$ ($i = 1, \ldots, N$) gives the response $x_{ij}$ to item $j$ ($j = 1, \ldots, J$) is modelled with a logistic model depending on two parameters: the value of the latent trait of the person, $\theta_i$, and the difficulty of item $j$, $\delta_j$. For a questionnaire composed of $J$ dichotomous items answered by $N$ patients, the Rasch model can be written as follows:

$$P(X_{ij} = x_{ij} \mid \theta_i, \delta_j) = \frac{\exp\left(x_{ij}(\theta_i - \delta_j)\right)}{1 + \exp(\theta_i - \delta_j)}$$

where $x_{ij}$ is a realization of the random variable $X_{ij}$. The $\theta_i$ are realizations of the random variable $\Theta$; $\theta_1, \theta_2, \ldots, \theta_N$ are mutually independent with a common underlying distribution, which is generally assumed to be normal. In this case, the parameters of the Rasch model can be estimated by marginal maximum likelihood (MML) [13]. The Rasch model relies on three assumptions: (i) unidimensionality: a unique latent variable explains the responses to the items; (ii) monotonicity: the probability of a positive response to an item is a non-decreasing function of the latent variable; (iii) local independence: for a given individual, the item responses are independent of one another. A constraint has to be adopted to ensure the identifiability of the model. It can be put on either the mean of the latent trait or the mean of the item difficulties; in the present paper, the mean of the latent trait ($\mu$) is set to 0 [14].
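The response probability of the Rasch model for a dichotomous item can be sketched in a few lines of Python. The function name `rasch_prob` is an assumption made for this illustration; the body is simply the logistic form of the model with a person parameter and an item difficulty.

```python
import math


def rasch_prob(x, theta, delta):
    """Probability of response x (0 or 1) under the Rasch model.

    theta : latent trait value of the person
    delta : difficulty of the item
    """
    eta = theta - delta
    return math.exp(x * eta) / (1.0 + math.exp(eta))


# A person whose latent trait equals the item difficulty answers
# positively with probability 0.5.
print(rasch_prob(1, theta=0.0, delta=0.0))
```

The monotonicity assumption is visible here: for a fixed item difficulty, the probability of a positive response increases with the latent trait value.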

Determination of the power by the Raschpower method
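To compare the means of the latent trait in two independent groups, we use the group effect ($\gamma$), which is the difference between the means of the latent trait in the two groups. The expected sample size is $N_0$ in the first group and $N_1$ in the second group. The latent regression Rasch model including a binary covariate for group effect is defined as follows:

$$P(X_{ij} = x_{ij} \mid \theta_i, \delta_j, \gamma) = \frac{\exp\left(x_{ij}(\theta_i + g_i\gamma - \delta_j)\right)}{1 + \exp(\theta_i + g_i\gamma - \delta_j)}$$

To identify the model, the mean of the latent trait ($\mu$) is set to 0, where $\mu$ is the mean of $\mu_0$ and $\mu_1$ weighted by the sample sizes $N_0$ and $N_1$. Consequently, since $\mu_1 - \mu_0 = \gamma$,

$$\mu_0 = -\frac{N_1}{N_0 + N_1}\,\gamma, \qquad \mu_1 = \frac{N_0}{N_0 + N_1}\,\gamma$$

in the first and the second group, respectively. Therefore $g_i$ corresponds to $-\frac{N_1}{N_0 + N_1}$ in the first group and to $\frac{N_0}{N_0 + N_1}$ in the second group.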
The variance of the latent trait, $\sigma^2$, is assumed to be equal in the two groups. The Wald test is used to compare the means of the latent trait in the two independent groups. The hypotheses for a two-sided test of comparison are defined as $H_0: \gamma = 0$ against $H_1: \gamma \neq 0$. To perform the test, an estimate $\Gamma$ of $\gamma$ and its variance are required, and it is assumed that the test statistic $\Gamma / \sqrt{\mathrm{var}(\Gamma)}$ follows a normal distribution $N(0,1)$ under $H_0$. The patients' responses are also required to estimate the variance of the group effect. At the planning stage they are not known, but a planning dataset can be determined conditionally on the assumed values for the sample size in each group ($N_0$ and $N_1$), the group effect ($\gamma$), the item difficulties ($\delta_j$) and the variance of the latent trait ($\sigma^2$). For each possible response pattern (corresponding to a combination of responses of an individual to all items), the associated probability in each group is computed using the Rasch model. The expected frequency of each response pattern in each group is then determined. For dichotomous items, the number of response patterns is $2^J$, where $J$ is the total number of items. The dataset composed of the expected frequencies associated with each response pattern is then analyzed with a latent regression Rasch model including a binary covariate for group effect. From this model, we estimate the group effect ($\gamma$) and its variance by marginal maximum likelihood.
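As an illustration of how such a planning dataset could be built, the following Python sketch enumerates the $2^J$ response patterns and computes their expected frequencies in each group. It is not the Raschpower implementation: the function names, the trapezoidal integration over a normally distributed latent trait and the example item difficulties are assumptions made for this sketch. The group means $-N_1\gamma/(N_0+N_1)$ and $N_0\gamma/(N_0+N_1)$ used in the code follow from constraining the overall mean of the latent trait to 0.

```python
import itertools
import math


def pattern_prob(pattern, deltas, mean, sd, n_points=201, width=6.0):
    """Marginal probability of one response pattern.

    Integrates the Rasch likelihood of the pattern over a N(mean, sd^2)
    latent trait with a trapezoidal rule on [mean - width*sd, mean + width*sd].
    """
    lower = mean - width * sd
    step = 2 * width * sd / (n_points - 1)
    total = 0.0
    for k in range(n_points):
        theta = lower + k * step
        weight = step * (0.5 if k in (0, n_points - 1) else 1.0)
        density = math.exp(-0.5 * ((theta - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
        likelihood = 1.0
        for x, delta in zip(pattern, deltas):
            eta = theta - delta
            likelihood *= math.exp(x * eta) / (1.0 + math.exp(eta))
        total += weight * likelihood * density
    return total


def expected_frequencies(n0, n1, gamma, deltas, sigma2):
    """Expected frequency of each of the 2^J patterns in both groups."""
    sd = math.sqrt(sigma2)
    mu0 = -n1 * gamma / (n0 + n1)  # mean of the latent trait in the first group
    mu1 = n0 * gamma / (n0 + n1)   # mean of the latent trait in the second group
    rows = []
    for pattern in itertools.product((0, 1), repeat=len(deltas)):
        freq0 = n0 * pattern_prob(pattern, deltas, mu0, sd)
        freq1 = n1 * pattern_prob(pattern, deltas, mu1, sd)
        rows.append((pattern, freq0, freq1))
    return rows


# Hypothetical planning values: 3 items, 100 patients per group
for pattern, f0, f1 in expected_frequencies(100, 100, 0.5, [-1.0, 0.0, 1.0], 1.0):
    print(pattern, round(f0, 2), round(f1, 2))
```

The expected frequencies are generally not integers, and within each group they sum to the group size. Enumeration grows as $2^J$, so this brute-force sketch is only practical for a small number of items.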
To approximate the variance, we use the Cramer-Rao (CR) bound, which gives a lower bound for the variance of an unbiased estimator as the inverse of the Fisher information. The expected power of the test of group effect based on the Cramer-Rao bound can be approximated by [12]:

$$1 - \beta_{CR} \approx 1 - \Phi\left(z_{1-\alpha/2} - \frac{\gamma}{\sqrt{\widehat{\mathrm{var}}(\hat{\gamma})}}\right)$$

where $\gamma$ is assumed to take on positive values, $z_{1-\alpha/2}$ is the quantile of order $1-\alpha/2$ of the standard normal distribution, $\Phi$ is the cumulative standard normal distribution function, and $\widehat{\mathrm{var}}(\hat{\gamma})$ is evaluated using the Cramer-Rao bound.
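Once a value for the variance of the group effect is available, the power approximation is a one-line computation. The Python sketch below assumes that this variance has already been obtained (for instance from the Cramer-Rao bound); the function name `power_from_variance` and the example values are hypothetical.

```python
from statistics import NormalDist


def power_from_variance(gamma, var_gamma, alpha=0.05):
    """Approximate power of the two-sided Wald test of the group effect.

    gamma     : assumed group effect (positive)
    var_gamma : variance of the estimated group effect
    Only the upper rejection region is counted, as in the approximation
    for positive gamma, so the value at gamma = 0 is alpha / 2.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    return 1 - std_normal.cdf(z_alpha - gamma / var_gamma ** 0.5)


# Hypothetical values: group effect 0.5 with a variance of 0.04
print(power_from_variance(0.5, 0.04))
```

For a fixed group effect, a smaller variance gives a higher power, which is why an accurate variance estimate is central to the method.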

Simulation study
The Raschpower method assumes a normal distribution for the latent trait. To evaluate its robustness to departures from this normality assumption, the power determined with this method was compared to the power obtained in a simulation study, regarded as the reference. The data were simulated with a non-normal distribution for the latent trait, and the empirical rejection rates obtained when fitting a model assuming a normal distribution for the latent trait to these simulated data were compared to the power obtained with the Raschpower method. The first step consisted in simulating data according to the parameters and underlying assumptions. The expected values for the parameters that are used at the planning stage of a study are the group effect ($\gamma$), the number of items ($J$), the item difficulties ($\delta_j$) and also the distribution of the latent trait. Two independent datasets (groups) were simulated. Each simulated scenario, corresponding to a combination of parameters, was replicated 1000 times. The simulated datasets were subsequently analyzed with a latent regression Rasch model including a binary covariate for group effect and assuming a normal distribution for the latent trait.
Simulated distributions of the latent trait. Data were simulated with a latent trait that followed a beta distribution depending on two parameters v and t. The beta distribution takes different shapes depending on the values of these parameters, which are both greater than zero (Figure 1). For example, when v and t were lower than 1, we obtained a U-shaped distribution. If v was set to 1 and t was greater than 1, we obtained an L-shaped distribution. Similarly, when v was greater than 1 and t was set to 1, we obtained a J-shaped distribution.
These scenarios with different distributions of the latent trait reflect some specific situations:
- The U-shaped distribution means that the population is composed of individuals responding positively to most items and, conversely, of individuals responding negatively to most items. The parameters of the beta distribution for this scenario were fixed to v = t = 0.4.
- The L-shaped distribution corresponds to a situation in which individuals respond mostly negatively. The parameters of the beta distribution were fixed to v = 1 and t = 4.
- The J-shaped distribution is the opposite situation, where individuals respond mainly positively. In this case, the parameters of the beta distribution were fixed to v = 4 and t = 1.

Type I error and power determination in the simulation study. We estimated the group effect and its variance on the simulated datasets using the latent regression Rasch model including a binary covariate for the group effect. For the simulation study, the item difficulties and the variance of the latent trait were fixed to their expected values. A Wald test was applied to each simulated dataset. The type I error was estimated as the rejection rate of the null hypothesis ($\gamma = 0$) among the datasets in which the group effect was simulated at 0 (under $H_0$). Similarly, the power was estimated as the rejection rate of the null hypothesis among the datasets in which the group effect was simulated at a value different from 0 (under $H_1$).
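The simulation of responses with a beta-distributed latent trait can be sketched in Python as follows. This is only an illustration under stated assumptions: the text does not specify how the beta draws, which lie between 0 and 1, are mapped to the latent trait scale, so the centring and rescaling to a chosen variance used here is an assumption of this sketch, as are the function name `simulate_group`, the item difficulties and the seed.

```python
import math
import random


def simulate_group(n, shift, deltas, v, t, sigma2, rng):
    """Simulate dichotomous Rasch responses for one group of n persons.

    The latent trait is a Beta(v, t) draw, centred on its theoretical mean
    and rescaled to variance sigma2 (assumption of this sketch), then
    shifted by `shift` (the group-specific mean of the latent trait).
    """
    beta_mean = v / (v + t)
    beta_var = v * t / ((v + t) ** 2 * (v + t + 1))
    scale = math.sqrt(sigma2 / beta_var)
    data = []
    for _ in range(n):
        theta = shift + scale * (rng.betavariate(v, t) - beta_mean)
        row = []
        for delta in deltas:
            p_positive = 1.0 / (1.0 + math.exp(-(theta - delta)))
            row.append(1 if rng.random() < p_positive else 0)
        data.append(row)
    return data


# Hypothetical scenario: U-shaped latent trait (v = t = 0.4), two groups of 100,
# group effect 0.5 split around 0 as -0.25 and +0.25 for equal group sizes
rng = random.Random(2024)
deltas = [-1.0, 0.0, 1.0]
group0 = simulate_group(100, -0.25, deltas, 0.4, 0.4, 1.0, rng)
group1 = simulate_group(100, 0.25, deltas, 0.4, 0.4, 1.0, rng)
print(sum(map(sum, group0)), sum(map(sum, group1)))
```

Each replication of a scenario would repeat these draws and then fit the latent regression Rasch model to the two simulated groups; the model fitting itself is outside the scope of this sketch.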

Evaluated criteria
To study the influence of the non-normality of the latent trait on the performance of the Raschpower method, several criteria were compared. The type I error ($\alpha$) and its confidence interval were estimated using the simulated datasets. The power ($1-\beta_S$) was obtained in the same way and compared to the power given by the Raschpower method based on the Cramer-Rao bound ($1-\beta_{CR}$). The latter is computed with the Raschpower module of Stata [12]. These comparisons make it possible to determine whether the estimated power differs when the latent trait is not normally distributed. As the estimation of $1-\beta_{CR}$ is based on the estimated value of the variance of $\gamma$, a good estimation of the power requires a good estimation of this variance. Hence, the mean of the variance of the group effect in the simulations is compared with the variance of the group effect estimated using the Raschpower method. Along with these criteria, the estimations of the group effect in the simulation study are examined to check that the estimated value is close to the simulated value in all cases. For the simulation study, the estimation of the group effect corresponds to the mean of the estimations obtained across the 1000 simulated datasets.
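The empirical rejection rate and its confidence interval can be computed from the replicated estimates as in the following Python sketch. The function name `rejection_rate` is hypothetical, and the normal-approximation (Wald) interval for a proportion is an assumption of this sketch, since the text does not state which interval was used.

```python
import math
from statistics import NormalDist


def rejection_rate(estimates, variances, alpha=0.05):
    """Rejection rate of the two-sided Wald test across replications.

    estimates : estimated group effects, one per simulated dataset
    variances : estimated variances of the group effect, one per dataset
    Returns the rate and a 95% normal-approximation confidence interval.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    n = len(estimates)
    rejected = sum(
        1 for g, v in zip(estimates, variances) if abs(g) / math.sqrt(v) > z_crit
    )
    rate = rejected / n
    half_width = NormalDist().inv_cdf(0.975) * math.sqrt(rate * (1 - rate) / n)
    return rate, (rate - half_width, rate + half_width)


# Toy input: 8 replications that reject and 2 that do not
rate, interval = rejection_rate([0.5] * 8 + [0.1] * 2, [0.04] * 10)
print(rate, interval)
```

When the group effect is simulated at 0 the returned rate estimates the type I error, and when it is simulated at a non-zero value the rate estimates the power. With 1000 replications and a true rate of 5%, the half-width of such an interval is roughly 1.4 percentage points.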

Results
Type I error of the test of group effect. Table 1 shows the empirical type I error for each of the 30 combinations of number of items ($J$), sample sizes ($N_g$; $g = 0,1$) and distributions of the latent trait. All the type I errors are close to the expected value of 5%. Among the 30 estimates (when $\gamma = 0$), only two of the 95% confidence intervals do not contain the expected value of 5%. These cases are observed with $N_g = 100$, $J = 10$ and with $N_g = 200$, $J = 10$ for a U-shaped latent trait distribution (parameters of the beta distribution v = t = 0.4).

Table 2 presents the mean of the estimations of the group effect ($\hat{\gamma}$) obtained using the simulated datasets according to the sample size ($N_g$; $g = 0,1$), the number of items ($J$) and the different shapes of the distribution of the latent trait. For all distributions of the latent trait and all parameter values, such as the sample size and the number of items, the group effect is correctly estimated: the estimations of the group effect ($\hat{\gamma}$) are close to their simulated values ($\gamma$). Among the 120 estimates, only three 95% confidence intervals do not contain the expected value of the group effect ($\gamma$) (results not shown).

Table 3 presents the mean variance of the group effect estimated using the simulated datasets ($\mathrm{var}_S$) and the variance of the group effect obtained with Raschpower ($\mathrm{var}_{CR}$) for different values of the group effect ($\gamma$), the sample size in each group ($N_g$; $g = 0,1$) and the number of items ($J$). These results relate to the case where the latent trait has a U-shaped beta distribution.

Estimation of the variance of the group effect
The results show that, whatever the values of the parameters, the variances of the group effect estimated from the simulations and with the Raschpower method are close. The difference between the two variance estimates is on average $8.75 \times 10^{-4}$ and ranges from $-0.0017$ ($J = 10$, $N_g = 50$, $\gamma = 0.8$) to $0.0013$ ($J = 10$, $N_g = 50$, $\gamma = 0$). As expected, the variance of the group effect decreases when the sample size ($N_g$; $g = 0,1$) increases, whereas it rises, but only slightly, when the group effect ($\gamma$) increases. Moreover, the variance of the group effect decreases when the number of items ($J$) increases. The simulation results obtained with L- and J-shaped distributions of the latent trait are similar: the variance estimates for the group effect obtained in the simulations are very close to those given by the Raschpower method (results shown in Table S1).

The power increases with the sample size ($N_g$; $g = 0,1$), the group effect ($\gamma$) and the number of items ($J$) (Table S2). For example, with the Raschpower method using $N_g = 100$, $\gamma = 0.5$ and $J = 5$, the power is 69.4%, and it is 81.1% when $J = 10$.

Discussion
The Raschpower method provides Rasch-based power determination for two-group cross-sectional comparisons when a Rasch model is intended to be used for analysing PRO data. It relies on some assumptions and in particular the normality of the latent trait distribution which might not be encountered in practice. The impact of a deviation from the normality assumption on the determination of the power of the test of group effect using the Raschpower method was studied using simulations. The power is a key point for the determination of sample size at planning stage for cross-sectional studies comparing two groups and it depends on two parameters: the group effect and its variance.
The results have shown that the powers estimated using either simulations (with a non-normal distribution of the latent trait) or the Raschpower method were very close. The violation of the assumption regarding the distribution of the latent trait had very little impact on the estimation of the variance of the group effect and thus on the power of the test of group effect. As expected, the power varied with different parameters; in particular, it increased when the number of items, the sample size and the group effect rose.
Some methodological choices can be discussed regarding the distributions of both the latent trait and the item parameters. The robustness of the Raschpower method might be related to the fact that the distributions of the latent trait and of the item parameters were overlaid. Indeed, in this study, the item parameters were simulated as regularly distributed and adapted to a population with a latent trait distributed over the same range of values. This reflects a questionnaire that is neither too easy nor too difficult for patients, with some balance of positive and negative responses. Moreover, the regularity of the distribution implies that the latent trait of individuals is evaluated with a similar precision all over its continuum. Finally, the choice of overlapping distributions avoids the presence of ceiling and floor effects. It has been shown in previous results [15] that the Raschpower method is valid when the distribution of the items is overlaid with the latent trait distribution.

Some other aspects related to the robustness of the Raschpower method could also be investigated. It could be interesting to study misspecifications of some parameters, especially those that might affect the power of the test of group effect, such as the variance of the latent trait ($\sigma^2$). The purpose would be to get more insight into the impact on the performance of the Raschpower method when the expected value of the parameter (fixed at the planning stage) is different from the value observed in the data (analysis stage).

Table 1. Type I error and confidence intervals obtained using simulations according to the sample size ($N_g$; $g = 0,1$), the number of items ($J$) and the distribution of the latent trait (Beta distribution).
In this study, the violation of the assumption of normality of the latent trait did not affect the estimation of the variance of the group effect. The power is correctly estimated, and the Raschpower method is robust for power analyses of PRO data analysed with a Rasch model. This issue of robustness to misspecification of the distribution of random effects has been approached in a more general way in generalized linear mixed models (GLMM), a family to which IRT models such as the Rasch model belong [16] [17] [18]. The consequences of misspecifying the random-effects distribution on estimation and hypothesis testing in GLMM were studied through simulations. Different distributions and variances of the random effect were investigated. The results have shown that, with a small variance and only one random effect, the estimations were correct and the type I and type II errors were controlled for all simulated distributions. Moreover, the estimates of the fixed effects are much less sensitive to misspecification of the random-effects distribution [19].
In our study, the Raschpower method and the Rasch model used in the simulation study were applied under the same conditions, with a small variance and only one random effect. Furthermore, in the simulations, the fixed effect ($\gamma$) was estimated without bias, meaning that the estimation of the power of the group [...] [18]. These results bring new elements about the robustness of the Raschpower method and complement those of the literature on robustness in GLMM. The Raschpower method seems to be robust to non-normality of the latent trait, and the power and type I error are not affected by a misspecification of the distribution of the latent trait.

Table 2. Mean of the estimations of the group effect ($\hat{\gamma}$) obtained using simulations according to the sample size ($N_g$; $g = 0,1$), the number of items ($J$) and the distribution of the latent trait (Beta distribution).

Table 3. Estimation of the variance of the group effect using simulations ($\mathrm{var}_S$) and using the Raschpower method ($\mathrm{var}_{CR}$) according to the different values of group effect ($\gamma$), sample size in each group ($N_g$; $g = 0,1$) and number of items ($J$).

Supporting Information
Table S1. Estimation of the variance of the group effect using simulations ($\mathrm{var}_S$) and using the Raschpower method ($\mathrm{var}_{CR}$) according to different values of group effect ($\gamma$), sample size in each group ($N_g$; $g = 0,1$) and number of items ($J$), for L- and J-shaped distributions of the latent trait.