A note on misspecification in general linear models with correlated errors for the analysis of crossover clinical trials

Among various approaches to the repeated measures analysis in crossover clinical trials, the general linear models (GLMs) with correlated errors attract substantial attention due to their simplicity in model specification, implementation, and interpretation. The goal of this research note is to conduct simulation studies to numerically investigate the impact of model misspecification in the GLMs with correlated errors in the analysis of crossover trials. A series of synthetic two-treatment and three-treatment crossover trials were designed, and simulation studies were conducted to assess how treatment effect estimation, type I error rates, and power can be affected by misspecified period effects, carryover effects, and variance-covariance structures in the GLMs. Numerical studies confirm that (i) the GLMs with terms for both carryover and period effects and with an unstructured variance-covariance matrix can provide unbiased treatment effect estimates and control of type I error rates and that (ii) misspecification of period effects, carryover effects, or covariance structures in the GLMs can induce inflated type I error, reduced power, or biased treatment effect estimates. Although the methodological contribution of this research note is minimal, we provide practical recommendations and advice to pharmaceutical sponsors and other investigational drug and device applicants in designing and analyzing crossover trials using the GLMs with correlated errors.


Introduction
In clinical trials with a crossover design, study subjects are assigned to receive a sequence of different treatments in multiple study periods, and study endpoints are measured on each subject in each period. Crossover trials are not uncommon in investigations of new medical devices [1] and are standard in drug studies of bioequivalence [2]. Senn [3] and Jones and Kenward [4] provide systematic reviews on design and analysis of crossover trials. Crossover trials have two major advantages over the conventional parallel-group clinical trials. First, the influence of confounding covariates on treatment evaluation can be largely reduced because the crossover subjects essentially serve as their own control. Second, optimal crossover designs are statistically efficient, and therefore fewer subjects are required than in the conventional parallel-group designs. However, crossover trials still have substantial issues in data analysis [5]. One issue is the presence of period effects caused by the order in which treatments are given to study subjects. A period effect represents a systematic difference between periods in the outcome used for evaluating a treatment. The presence of a period effect may suggest that a patient's underlying condition and potential to respond to the treatment changed from one treatment period to another. To avoid confounding period effects with treatment effects, groups of subjects are randomized to multiple sequences of treatments [4]. Another issue is the potential existence of carryover effects that may affect study endpoints together with the "direct effect" of treatments administered to the subjects. A carryover effect is defined as the lingering effect of the treatment of the previous study period on the current study period. It arises when the effect of the treatment given in the previous period persists into the current period and distorts the current treatment effect.
Carryover effects in crossover trials may bias analysis of the direct treatment effect [6].
Statistical methodologies for analyzing crossover trials were developed for various types of study endpoints, including dichotomous endpoints [7][8] and ordinal endpoints [9][10], but a large body of literature discusses the methodologies for continuous endpoints [11][12]. For continuous and normally distributed endpoints, Bellavance et al. [13] proposed a modified F-test approximation that accounts for the correlations within subjects induced by repeated measures to conduct relevant hypothesis tests. Simulation studies conducted by Bellavance et al. showed that the modified F-test approximation gives adequate control of the type I error rate over a variety of covariance structures for three-period crossover trials [13]. Yet, Jones and Kenward [4] promoted the use of linear fixed-effects and random-effects models to analyze crossover trials. Bellavance and Tardif [14] described a nonparametric approach to analyze the three-treatment, three-period crossover trials by providing unbiased treatment effect estimates and transforming the original crossover design into a randomized block design in which the well-known rank tests can be applied. Öhrvik [15] proposed another nonparametric method that can be applied to a class of crossover trials with three or more treatments.
Among a variety of analysis approaches, general linear models (GLMs) with correlated errors have attracted substantial attention as tools to analyze the data from crossover trials, primarily because of their simplicity in model specification, implementation, and interpretation [4]. However, in practice, model specification of GLMs with correlated errors has caused considerable difficulty, as data analysts and clinical investigators have had trouble deciding whether or when they should include period effects and carryover effects in the GLMs and which variance-covariance structure they should assume for the GLMs. For the GLMs, the likelihood-ratio test and information criteria, such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), can be applied to compare the performance of two or more GLMs. Littell et al. [16] showed that specification of covariance structure substantially influences the inference of fixed effects in the analysis of repeated measures data. Lu and Mehrotra [17] recommend using unstructured covariance as the default strategy for analyzing longitudinal data from randomized clinical trials with a moderate-to-large number of subjects and a small-to-moderate number of time points. The goal of this research note is to conduct simulation studies to numerically investigate the impact of model misspecification in the GLMs with correlated errors in the analysis of data collected from crossover clinical trials. This investigation was motivated by two real-world randomized crossover clinical trials for investigational medical devices: a two-treatment, two-period crossover trial comparing a new test contact lens with a control lens and a three-treatment, three-period crossover trial comparing two new contact lenses with a control lens.
Consequently, we designed a series of synthetic two-treatment and three-treatment crossover trials and simulated datasets from these trials to assess how treatment effect estimation, type I error, and power for testing treatment effects are affected by misspecified period effects, carryover effects, and variance-covariance structures.
The messages delivered by this research note are concise. The numerical studies confirm that the GLMs including carryover and period effects can provide unbiased treatment effect estimates and control of type I error rates if carryover effects are identifiable. Additionally, assuming an unstructured covariance structure for the GLMs has proven to be a safe choice if there is not sufficient confidence in covariance structure specification. The numerical studies show that misspecification of period effects, carryover effects, or covariance structures likely induces inflated type I error, reduced power, or biased treatment effect estimates. We recommend adopting the GLMs with carryover and period effects and an unstructured covariance structure to analyze crossover trials. Also, we verified that the balanced crossover design should be preferred over the unbalanced crossover design. Although the methodological contribution of this research note is minimal, its practical contribution is solid and substantial. This note provides recommendations and advice to pharmaceutical sponsors and other investigational drug and device applicants who would use the GLMs with correlated errors as their primary analysis approach to analyze crossover trial data but are uncertain about model-specification issues. These issues have never been thoroughly addressed in the context of analyzing crossover trials.

General linear models with correlated errors for the analysis of crossover clinical trials
We consider an s-sequence, p-period crossover clinical trial that compares t treatments, and it is assumed that in the crossover trial there are $n_i$ subjects in sequence group $i$, $i = 1, 2, \ldots, s$, with $\sum_{i=1}^{s} n_i = n$. Let $y_{ijk}$ denote the response observed on the $k$th subject in period $j$ of sequence group $i$, where $j = 1, 2, \ldots, p$ and $k = 1, 2, \ldots, n_i$. In this research note, we assume that the response variable $y_{ijk}$ is continuous and normally distributed and investigate misspecification in the GLM with correlated errors for analyzing the data collected from such a crossover trial:
$$y_{ijk} = \mu + \pi_j + \tau_{d[i,j]} + \lambda_{d[i,j-1]} + \varepsilon_{ijk}. \qquad (1)$$
In (1), $\mu$ is an intercept, $\pi_j$ is the effect associated with period $j$, $\tau_{d[i,j]}$ represents the direct treatment effect associated with the treatment applied in period $j$ of sequence $i$ with $d[i,j] = 1, 2, \ldots, t$, $\lambda_{d[i,j-1]}$ denotes the first-order carryover effect from the treatment applied in the preceding period $j-1$ of sequence $i$ with $d[i,j-1] = 1, 2, \ldots, t$ and $\lambda_{d[i,0]} = 0$, and $\varepsilon_{ijk}$ is the random error with zero mean and variance $\operatorname{var}(\varepsilon_{ijk})$. In the crossover trial, each subject is repeatedly measured during the p periods. Therefore, it is necessary to specify a shared variance-covariance structure $\Sigma = \Sigma_{ik}$ (with $\operatorname{cov}(\varepsilon_{ij_1k}, \varepsilon_{ij_2k})$ as its $(j_1, j_2)$ entry) for the GLM (1) to account for the correlated response measurements from each subject. In the analysis of two-period crossover trials, we specify a compound symmetry (CS) covariance structure for $\Sigma$. With this specification, (1) is equivalent to a random-intercept model if the covariance components in $\Sigma$ are non-negative. In the analysis of three-period crossover trials, two covariance structures are considered: compound symmetry and unstructured (UN) covariance structure. Additional terms, such as the second-order carryover effect, direct treatment-by-carryover interaction effect, and direct treatment-by-period interaction effect, can be added to (1).
However, such terms are rarely of much interest in practice [4] and are not included in our investigation. The estimation strategy for (1) is to achieve unbiased estimation of both fixed effects and covariance parameters simultaneously by using a likelihood function. For a specified covariance structure $\Sigma$, the maximum likelihood (ML) estimator for the fixed effects is the generalized least squares (GLS) estimator. If we assume that the response variable $y_{ijk}$ in (1) is normally distributed, let $Y = (y_{111}, \ldots, y_{1p1}, \ldots, y_{spn_s})'$, and let $X$ represent the design matrix, the GLS estimator of the fixed effects vector $\beta$ is
$$\hat{\beta} = \left(X'(\Sigma^{-1} \otimes I_n)X\right)^{-1} X'(\Sigma^{-1} \otimes I_n)Y, \qquad (2)$$
where $\otimes$ represents the Kronecker product. Asymptotically, $\hat{\beta} \sim N\left(\beta, \left(X'(\Sigma^{-1} \otimes I_n)X\right)^{-1}\right)$. If $\Sigma$ is known, the GLS estimator (2) would be a best linear unbiased estimator (BLUE). However, $\Sigma$ is usually unknown in practice. We then estimate the parameter by substituting the unknown $\Sigma$ with its estimate $\hat{\Sigma}$,
$$\hat{\beta}_E = \left(X'(\hat{\Sigma}^{-1} \otimes I_n)X\right)^{-1} X'(\hat{\Sigma}^{-1} \otimes I_n)Y. \qquad (3)$$
For a specified structure pattern for $\Sigma$, $\hat{\beta}_E$ in (3) can be obtained using the maximum likelihood (ML) method with a reduced log-likelihood [18]. However, Diggle et al. [18] noted that ML estimation presents a conflict: a large design matrix is needed for consistent estimates, whereas a design matrix with a small number of columns is required to yield approximately unbiased estimation. The method of restricted maximum likelihood (REML) [19] is usually applied in response to the objection that the ML procedure can produce biased estimators for covariance parameters [20]. Swallow and Monahan [21] recommend the REML method over other variance component estimation methods on the basis of the results from their simulation studies. The REML method for covariance parameter estimation is used throughout this article, combined with an adjustment procedure developed by Kenward and Roger [22]. This procedure has been shown to be notably effective in controlling type I errors in studies with small sample sizes [22].
Inference based on this combined procedure is more reliable than others for analyzing crossover trials [11].
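For reference, the two covariance structures named above can be written out for three periods. The block below is a sketch in assumed notation: the symbols $\Sigma$, $\sigma^2$, $\rho$, $\sigma_{ab}$, $\sigma_s^2$, and $\sigma_e^2$ are introduced here for illustration only. It also shows why the random-intercept form requires a non-negative covariance, since that covariance equals the variance of the subject effect.

```latex
% Compound symmetry (CS): 2 parameters; unstructured (UN): p(p+1)/2 = 6 parameters for p = 3
\Sigma_{\mathrm{CS}} = \sigma^2
\begin{pmatrix} 1 & \rho & \rho \\ \rho & 1 & \rho \\ \rho & \rho & 1 \end{pmatrix},
\qquad
\Sigma_{\mathrm{UN}} =
\begin{pmatrix}
\sigma_{11} & \sigma_{12} & \sigma_{13} \\
\sigma_{12} & \sigma_{22} & \sigma_{23} \\
\sigma_{13} & \sigma_{23} & \sigma_{33}
\end{pmatrix}

% Random-intercept form of the error: \varepsilon_{ijk} = s_{ik} + e_{ijk},
% with independent s_{ik} \sim N(0, \sigma_s^2) and e_{ijk} \sim N(0, \sigma_e^2)
\operatorname{var}(\varepsilon_{ijk}) = \sigma_s^2 + \sigma_e^2 = \sigma^2,
\qquad
\operatorname{cov}(\varepsilon_{ij_1k}, \varepsilon_{ij_2k}) = \sigma_s^2 = \rho\,\sigma^2 \ge 0
\quad (j_1 \ne j_2)
```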

Misspecification of carryover and period effects
Model misspecification remains a critical issue for the data analysis of crossover clinical trials because misspecified models can create bias in treatment effect estimation and the corresponding hypothesis testing. Here, we investigate the impact of misspecification in the GLMs with correlated errors in the analysis of data collected from crossover clinical trials. The primary objective of our investigation is to gauge the extent of inference bias on treatment effects when carryover effect, period effect, or both are omitted in such models. Four analysis approaches are considered in this research note: three GLMs with distinct mean structures and a naïve hypothesis testing procedure. The GLM (1) includes both period effect and carryover effect and is abbreviated as the "PE-CE model". The GLM that only includes period effect is specified as
$$y_{ijk} = \mu + \pi_j + \tau_{d[i,j]} + \varepsilon_{ijk}$$
and is abbreviated as the "PE-NCE model"; the GLM that does not include either period effect or carryover effect is specified as
$$y_{ijk} = \mu + \tau_{d[i,j]} + \varepsilon_{ijk}$$
and is abbreviated as the "NPE-NCE model". The naïve hypothesis testing procedure refers to the approach of treating only the measurements from the first period as a randomized, parallel study and ignoring other periods (i.e., periods 2 to p). For a two-period crossover trial assessing two treatment groups, the two-sample t-test is used; for a three-period crossover design, the one-way analysis of variance, or ANOVA, is used to determine the treatment effects.
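To illustrate how omitted terms can translate into bias, the sketch below computes the expected values of three simple contrasts in a CT/TC trial from the cell means implied by model (1). The contrasts are simplified stand-ins for the first-period t-test, the PE-NCE model, and the NPE-NCE model, not the REML fits used in this note. All function names and numeric values are illustrative assumptions, including the choice to set the control's direct and carryover effects to zero.

```python
# Expected values of simple contrast estimators in a CT/TC crossover.
# Every name and number here is an illustrative assumption.

def cell_means(mu, pi2, tau_d, lam_c, lam_t):
    """Expected responses under model (1) for sequences CT and TC.

    tau_d is the direct effect of T minus that of C (C's effect set to 0);
    lam_c and lam_t are the first-order carryover effects of C and T.
    """
    return {
        ("CT", 1): mu,                         # C in period 1
        ("CT", 2): mu + pi2 + tau_d + lam_c,   # T in period 2, carryover from C
        ("TC", 1): mu + tau_d,                 # T in period 1
        ("TC", 2): mu + pi2 + lam_t,           # C in period 2, carryover from T
    }


def expected_estimates(n_ct, n_tc, **effects):
    """Expected value of each contrast for given sequence sizes."""
    m = cell_means(**effects)
    d_ct = m[("CT", 2)] - m[("CT", 1)]   # period 2 minus period 1, sequence CT
    d_tc = m[("TC", 2)] - m[("TC", 1)]   # period 2 minus period 1, sequence TC
    n = n_ct + n_tc
    return {
        # first period only, as in a parallel-group comparison
        "first_period": m[("TC", 1)] - m[("CT", 1)],
        # period effect cancels when the two sequences are differenced
        "period_adjusted": (d_ct - d_tc) / 2,
        # pooled within-subject T minus C differences, period ignored
        "unadjusted": (n_ct * d_ct - n_tc * d_tc) / n,
    }


# Effects in units of the response standard deviation (assumed values)
effects = dict(mu=0.0, pi2=0.25, tau_d=0.5, lam_c=0.0, lam_t=0.25)
balanced = expected_estimates(10, 10, **effects)
unbalanced = expected_estimates(5, 15, **effects)
print(balanced)
print(unbalanced)
```

Under these assumed values the first-period contrast recovers the treatment effect of 0.5, the period-adjusted contrast is off by half the carryover difference in both designs, and the unadjusted contrast additionally picks up the period effect once the sequences are unequal in size.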

Motivation: Two crossover clinical trials for investigational medical devices
Contact lenses are medical devices used to provide flexible and convenient vision correction. Contact lenses can be used to correct various vision disorders, including myopia, hyperopia, presbyopia, and astigmatism. The numerical investigation reported in this research note was motivated by two real-world randomized crossover clinical trials for investigational contact lenses: a two-treatment, two-period crossover trial comparing a new test contact lens with a control lens and a three-treatment, three-period crossover trial comparing two new contact lenses with a control lens. Both clinical trials used a balanced design, enrolling 48 subjects for the two-period, two-sequence trial (24 subjects in each of the 2 sequences: CT and TC; "C" represents the control lens and "T" represents the test lens) and 18 subjects for the three-period trial (3 subjects in each of the 6 sequences: CT1T2, CT2T1, T1CT2, T1T2C, T2CT1, and T2T1C; "C" represents the control lens, "T1" represents the first test lens, and "T2" represents the second test lens). The primary endpoint of both trials was the subjective visual quality scores for the test versus control lenses on a scale from 0 to 100, with 0 denoting unfavorable visual quality. Boundary issues on the primary endpoint were ignored in the analysis. For the two crossover trials, Table 1 displays and compares the estimated treatment effects, period effects, and first-order carryover effects and their standard errors obtained from the PE-CE model, the PE-NCE model, and the NPE-NCE model, with CS and UN covariance structures, and from the naïve hypothesis testing procedure (two-sample t-test or one-way ANOVA) that only tests the outcome measurements from the first study period. Throughout this note, the treatment effect refers to the direct effect of a treatment minus the direct effect of its control, and thus is distinct from the direct treatment effects defined in (1).
For the two-period crossover trial, four analysis approaches all suggested significantly higher visual quality scores for the test lens than for the control lens. The estimates of the treatment effect and corresponding standard errors obtained from the naïve two-sample t-test and the PE-CE model were almost identical. Substantial carryover effect (approximately 40% of treatment effect) and a modest period effect (approximately -15% of main treatment effect) were detected from the PE-CE model, although neither effect was statistically significant. The treatment effect estimates obtained from the PE-NCE model and the NPE-NCE model were similar, and these treatment effect estimates were distinct from those obtained from the naïve two-sample t-test and the PE-CE model.
For the three-period crossover trial, the visual quality scores of two new test lenses and those of the control lens were similar in all four analysis approaches. The effect estimates differed when specifying a CS or UN covariance structure for the GLMs. The treatment effect estimates given by the one-way ANOVA procedure and the PE-CE model were not comparable. This dissimilarity may be due to the small sample size, large between-subject variation, or the higher-order carryover effects that were not considered in the analysis. The treatment effect estimates and corresponding standard errors given by the three GLMs were close to but still distinct from each other.
Analysis results in Table 1 reveal that the GLMs with different model specification can result in uncertainty in treatment effect estimation and hypothesis-testing conclusions. Therefore, model misspecification remains an issue that will largely affect the use of the GLMs with correlated errors in the analysis of crossover trials. This fact motivated us to investigate the impact of model misspecification in the GLMs in such an analysis task. A series of simulation studies were designed and conducted to assess whether treatment effect estimation, as well as type I error and power in hypothesis testing, will be affected by misspecified period effects, carryover effects, and variance-covariance structures in the GLMs with correlated errors.

Two-period, two-treatment crossover trials
We designed and conducted two simulation studies to investigate the impact of misspecification of period and carryover effects in the GLMs with correlated errors for analyzing the data collected from two-period, two-treatment (two-by-two) crossover trials. It was assumed that, in the two-by-two crossover trials, a treatment (denoted by "T") and a control (denoted by "C") were compared through two sequences, TC and CT.

Type I error under true null hypotheses
To investigate the impact of misspecification of period and carryover effects on the type I error obtained from testing treatment effects, 2000 datasets were generated from the GLM (1), with sample sizes of n = 20 and n = 200 and with both a balanced design (CT:TC = 1:1) and an unbalanced design (CT:TC = 1:3). When generating the datasets, the standard deviation of the response variable was fixed, and three period effects (the effect of period 2 relative to period 1) were considered: −15%, 0%, or 25% of the response standard deviation. Under the true null hypotheses that no treatment or carryover effects exist (when the treatment effect is zero, it was reasonable to assume the corresponding carryover effect is zero as well), both the treatment effects and carryover effect differences were set to be zero. Then, the response values were generated according to the GLM (1) with a CS covariance matrix using the within-subject correlation coefficient 0.2, 0.5, or 0.7, respectively. For each dataset, the four analysis approaches (two-sample t-test that only analyzes the outcome measurements from the first study period, the PE-CE model, the PE-NCE model, and the NPE-NCE model) were used to estimate the treatment effect and test whether the treatment effect was equal to zero. The Wald test was conducted in the hypothesis testing of treatment effects with the GLMs. The empirical type I error rates of the four approaches obtained in different scenarios are summarized in Fig 1. All four approaches yielded type I error rates near the nominal level of 5% (Fig 1A and 1C) when analyzing the data simulated from the balanced crossover trials, regardless of sample sizes (small or large) and period effects (zero or not). For the data simulated from the unbalanced crossover trials, the type I error rates obtained from the two-sample t-test, the PE-CE model, and the PE-NCE model remained near the nominal level of 5%. However, the NPE-NCE model produced noticeably inflated type I error rates when the period effect existed (Fig 1B and 1D), especially with a large sample size.
Table 1. Comparison of estimated treatment effects, period effects, and first-order carryover effects and their standard errors (in parentheses) given by the three GLMs (with CS and UN covariance structures) and naïve hypothesis testing procedure for real-world two-period and three-period crossover trials.
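The inflation seen when the period effect is ignored in an unbalanced design can be reproduced in miniature. The sketch below is a simplified stand-in, not the analysis used in this note: it tests the mean of the within-subject T minus C differences against zero with a normal critical value of 1.96, which mimics a model without a period term, and it generates compound-symmetry errors from a shared subject effect. The function names, the seed, and the chosen scenario (n = 200, period effect of 25% of the standard deviation, correlation 0.5) are assumptions for illustration.

```python
import math
import random


def simulate_differences(n_ct, n_tc, period_effect, rho, rng):
    """Within-subject T - C differences for one null two-by-two trial.

    Responses have unit variance and compound-symmetry correlation rho,
    built from a shared subject effect plus independent noise.
    """
    shared = math.sqrt(rho)          # weight of the shared subject effect
    unique = math.sqrt(1.0 - rho)    # weight of the period-specific noise
    diffs = []
    for seq, n_seq in (("CT", n_ct), ("TC", n_tc)):
        for _ in range(n_seq):
            subject = shared * rng.gauss(0.0, 1.0)
            y1 = subject + unique * rng.gauss(0.0, 1.0)
            y2 = period_effect + subject + unique * rng.gauss(0.0, 1.0)
            # T is given in period 2 of CT and in period 1 of TC
            diffs.append(y2 - y1 if seq == "CT" else y1 - y2)
    return diffs


def rejection_rate(n_ct, n_tc, period_effect, rho=0.5,
                   n_sim=2000, crit=1.96, seed=1):
    """Share of null datasets in which |mean / SE| of the differences exceeds crit."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        d = simulate_differences(n_ct, n_tc, period_effect, rho, rng)
        n = len(d)
        mean = sum(d) / n
        var = sum((x - mean) ** 2 for x in d) / (n - 1)
        if abs(mean) / math.sqrt(var / n) > crit:
            rejections += 1
    return rejections / n_sim


# No treatment or carryover effect; only a period effect is present
balanced = rejection_rate(100, 100, period_effect=0.25)    # CT:TC = 1:1
unbalanced = rejection_rate(50, 150, period_effect=0.25)   # CT:TC = 1:3
print(f"balanced: {balanced:.3f}, unbalanced: {unbalanced:.3f}")
```

In the balanced design the period effect cancels across sequences, so the rejection rate should stay close to the nominal level, whereas in the 1:3 design the uncancelled period effect shifts the mean difference away from zero and the rejection rate rises well above 5%.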

Power and estimation bias
To investigate the impact of misspecification of period effects, carryover effects, and covariance structures on estimation of treatment effect and power obtained in testing treatment effects, 2000 datasets were generated from the two-by-two crossover trials as described above but with nonzero treatment and carryover effects. In this simulation study, the treatment effect was fixed at the size of 50% of the response standard deviation, and the carryover effect difference was 0%, 15%, or 25% of the response standard deviation. This is equivalent to assuming that 0%, 30%, or 50% of the treatment effect was carried over from the first to the second period.
For each generated dataset, the four analysis approaches were used to estimate the treatment effect and to construct a 95% confidence interval for the treatment effect. We calculated the average percent error (PE = 100 × (Estimated Treatment Effect − True Treatment Effect) / True Treatment Effect) that quantifies the estimation bias of the treatment effect, the power (the percentage of simulated datasets for which the 95% confidence intervals of the treatment effect estimates did not cover zero), and the coverage probability (CP, the percentage of simulated datasets for which the 95% confidence intervals of the treatment effect estimate covered the true value). Table 2 and Table 3 summarize the estimation bias quantified by the PE, the power, and the CPs of the 95% confidence intervals obtained by analyzing the datasets for the balanced and unbalanced two-by-two crossover trials with the small sample size n = 20. Simulation results obtained from the large sample size n = 200 were similar, and thus are not reported here. For the two-by-two crossover trials with a balanced design (Table 2), the two-sample t-test and the PE-CE model produced similar unbiased estimators. The PE-NCE model and the NPE-NCE model gave treatment effect estimates that were close to each other but were biased when carryover effects existed, and the bias was proportional to the magnitude of the carryover effects. The two-sample t-test generated power similar to that of the PE-CE model, and the power was stable as the period effect, carryover effect, and within-subject correlation coefficient changed. The PE-NCE model and the NPE-NCE model had larger power than did the two-sample t-test or the PE-CE model. Their power remained stable with different magnitudes of the period effect, but decreased as the carryover effect increased, and increased as the within-subject correlation coefficient increased. For the two-by-two crossover trials with an unbalanced design (Table 3), the two-sample t-test and the PE-CE model still produced comparable unbiased estimators and power. However, the estimation bias of the treatment effect and the power obtained from the PE-NCE model were no longer close to those obtained from the NPE-NCE model. Treatment effect estimates remained biased when carryover effects existed for the PE-NCE model and the NPE-NCE model when analyzing the unbalanced two-by-two crossover trials. The NPE-NCE model produced larger relative bias than did the PE-NCE model, with up to 60% PE, when substantial period effect and carryover effect existed.
Table 2. Average percent error, power, and 95% coverage probabilities obtained from the analysis of two-period, two-treatment crossover trials based on 2000 simulated datasets, balanced design, sequence CT:TC = 1:1, sample size n = 20, and effect size 0.5.
Table 3. Average percent error, power, and 95% coverage probabilities obtained from the analysis of two-period, two-treatment crossover trials based on 2000 simulated datasets, unbalanced design, sequence CT:TC = 1:3, sample size n = 20, and effect size 0.5.
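The three summary measures used in this section (PE, power, and CP) can be computed from a set of per-dataset estimates and confidence limits as sketched below. The function name and the four toy datasets are hypothetical and serve only to show the arithmetic; they are not taken from the simulation results.

```python
def summarize(estimates, intervals, true_effect):
    """Average percent error, power (%), and coverage probability (%).

    estimates: treatment effect estimates, one per simulated dataset
    intervals: matching (lower, upper) 95% confidence limits
    """
    n = len(estimates)
    # Averaging the per-dataset percent errors equals the percent error of the mean
    pe = 100 * (sum(estimates) / n - true_effect) / true_effect
    # Power: share of intervals that exclude zero
    power = 100 * sum(1 for lo, hi in intervals if lo > 0 or hi < 0) / n
    # Coverage: share of intervals that contain the true effect
    coverage = 100 * sum(1 for lo, hi in intervals if lo <= true_effect <= hi) / n
    return pe, power, coverage


# Made-up results from four hypothetical datasets, true effect 0.5
estimates = [0.4, 0.5, 0.6, 0.3]
intervals = [(0.1, 0.7), (0.2, 0.8), (0.55, 0.65), (-0.1, 0.7)]
pe, power, coverage = summarize(estimates, intervals, true_effect=0.5)
print(pe, power, coverage)
```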

Three-period, three-treatment crossover trials
We designed and conducted two more simulation studies to investigate the impact of misspecification of period and carryover effects, as well as covariance structures, in the GLMs with correlated errors for analyzing the data collected from three-period, three-treatment crossover trials. It was assumed that, in the three-period, three-treatment crossover trials, two treatments (denoted by "T1" and "T2", respectively) and a control (denoted by "C") were compared through six sequences: CT1T2, CT2T1, T1CT2, T1T2C, T2CT1, and T2T1C.
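The six sequences are all orderings of the three treatments, which makes the first-order carryover pattern easy to enumerate. The sketch below lists, for each sequence, the treatment given in each period and the treatment whose carryover may act in that period. The function and variable names are hypothetical.

```python
from collections import Counter
from itertools import permutations

TREATMENTS = ("C", "T1", "T2")


def sequence_cells(sequence):
    """(period, treatment, carryover source) for each period of a sequence.

    The carryover source is the treatment given in the preceding period,
    or None in period 1, where no first-order carryover exists.
    """
    return [
        (j + 1, trt, sequence[j - 1] if j > 0 else None)
        for j, trt in enumerate(sequence)
    ]


# All orderings of the three treatments: the six sequences listed above
sequences = list(permutations(TREATMENTS))

# How often each treatment is immediately preceded by each other treatment
follow_counts = Counter(
    (prev, trt)
    for seq in sequences
    for _, trt, prev in sequence_cells(seq)
    if prev is not None
)

for seq in sequences:
    print("".join(seq), sequence_cells(seq))
print(follow_counts)
```

Across the six sequences every treatment follows every other treatment equally often, which is one reason this set of sequences is convenient when first-order carryover terms are included in the model.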

Type I error under the true null hypotheses
To investigate the impact of misspecification of period and carryover effects and covariance structures on the type I error obtained from testing treatment effects, 2000 datasets were generated from the GLM (1) with sample sizes of n = 24 and n = 240 for both balanced three-period, three-treatment crossover trials (n/6 subjects in each of the six sequences) and unbalanced three-treatment crossover trials (CT1T2:CT2T1:T1CT2:T1T2C:T2CT1:T2T1C = 4:4:1:1:1:1). When generating the datasets, the standard deviation of the response variable was fixed, and three (Period 2 effect, Period 3 effect) combinations were considered: (−6%, −15%), (0%, 0%), or (10%, 25%) of the Period 3 standard deviation. Given the assumed true null hypotheses on treatment effects, it was assumed that both the treatment effects and carryover effect differences were zero. Then, the response values were generated according to the GLM (1) with two CS covariance matrices using the within-subject correlation coefficient of 0.2 or 0.7, a Toeplitz (TP) covariance matrix representing homogeneous variance and distinct pairwise correlation coefficients at three periods, and an unstructured (UN) covariance matrix (the covariance matrices are specified in Table 4).
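A minimal sketch of this data-generating step is given below: it builds CS and Toeplitz covariance matrices, draws correlated error vectors through a Cholesky factor, and splits subjects across sequences by a ratio. The correlation values used for the Toeplitz matrix are made up for illustration and are not the Table 4 values, and all function names are hypothetical.

```python
import math
import random


def cs_matrix(p, sigma2, rho):
    """Compound symmetry: common variance, common correlation."""
    return [[sigma2 if a == b else sigma2 * rho for b in range(p)]
            for a in range(p)]


def toeplitz_matrix(sigma2, lag_corr):
    """Toeplitz: common variance, correlation depends only on the lag |a - b|.

    lag_corr[k - 1] is the correlation at lag k.
    """
    p = len(lag_corr) + 1
    corr = [1.0] + list(lag_corr)
    return [[sigma2 * corr[abs(a - b)] for b in range(p)] for a in range(p)]


def cholesky(m):
    """Lower-triangular L with L L' = m, for a positive definite matrix m."""
    p = len(m)
    low = [[0.0] * p for _ in range(p)]
    for a in range(p):
        for b in range(a + 1):
            s = sum(low[a][k] * low[b][k] for k in range(b))
            if a == b:
                low[a][b] = math.sqrt(m[a][a] - s)
            else:
                low[a][b] = (m[a][b] - s) / low[b][b]
    return low


def draw_errors(low, rng):
    """One subject's correlated error vector: L z with z standard normal."""
    z = [rng.gauss(0.0, 1.0) for _ in low]
    return [sum(low[a][k] * z[k] for k in range(a + 1)) for a in range(len(low))]


def allocate(n, ratio):
    """Subjects per sequence for a total n split according to ratio."""
    return [n * r // sum(ratio) for r in ratio]


rng = random.Random(2024)
cs = cs_matrix(3, 1.0, 0.2)
tp = toeplitz_matrix(1.0, [0.6, 0.3])   # assumed correlations, not Table 4
low = cholesky(tp)
errors = [draw_errors(low, rng) for _ in range(24)]
sizes = allocate(24, (4, 4, 1, 1, 1, 1))
print(sizes)
```

Adding the fixed-effect mean from (1) to each error vector would complete a simulated response. An unstructured matrix can be passed to `cholesky` in the same way, provided it is positive definite.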
For each dataset, one-way ANOVA that only analyzes the outcome measurements from the first study period and three GLMs (the PE-CE model, the PE-NCE model, and the NPE-NCE model) with both CS and UN covariance structures were used to estimate the treatment effects and to test whether the treatment effect of T1 was zero, given that the treatment effect of T2 was negligible. Additional simulation studies showed that the magnitude of the treatment effect of T2 had little impact on estimates, type I error rates, and power of the treatment effect of T1 (results are not shown). Fig 2 shows the empirical type I error rates of the four analysis approaches for testing the treatment effect of T1 with the small sample size n = 24. When the sample size increased to 240, the patterns of type I error rates were unchanged, and therefore are not shown here. For the balanced three-treatment crossover trials, both one-way ANOVA and the three GLMs with the UN within-subject covariance structure maintained adequate control of the type I error level (Fig 2A, 2C, 2E, and 2G). The three GLMs with the CS covariance structure performed well in terms of type I error for the datasets that were simulated from the true CS and UN covariance structures (Fig 2A, 2C and 2G). However, the PE-CE model tended to have an inflated type I error rate for the datasets simulated from the true TP covariance structure (Fig 2E). The inflation did not improve as the sample size increased to 240 (results are not shown). For the unbalanced three-treatment crossover trials, the ANOVA method and the PE-CE model with the UN covariance structure yielded type I error rates near the nominal 5% level for the datasets that were simulated from different true covariance structures (Fig 2B, 2D, 2F, and 2H). In contrast, the three GLMs with the CS covariance structure did not maintain control of type I error rates when the covariance structure was misspecified (Fig 2F).
Table 4. Variance-covariance matrices specified in generating simulation datasets for the three-period, three-treatment crossover trials.

Power and estimation bias
To investigate the impact of misspecification of period effects, carryover effects, and covariance structures on estimation of treatment effects and power obtained in testing treatment effects, 2000 datasets were generated from the three-treatment crossover trials as described above, but with nonzero treatment and carryover effects of T1. The T1 treatment effect was fixed at the size of 50% of the Period 3 standard deviation, and its carryover effect difference was set up as 0% or 25% of the Period 3 standard deviation, which is equivalent to assuming that 0% or 50% of the T1 treatment effect was carried over from one period to another. The treatment effect of T2 and its carryover effect difference were assumed to be zero. For each dataset, one-way ANOVA and three GLMs (the PE-CE model, the PE-NCE model, and the NPE-NCE model) with both CS and UN covariance structures were used to estimate the treatment effect of T1 and to construct a 95% confidence interval for the treatment effect. Table 5 and Table 6 present the estimation bias quantified by the PE, the power, and the CPs of the 95% confidence intervals from the balanced and unbalanced three-treatment crossover trials with small sample size. Simulation results obtained from the large sample size n = 240 were similar and are not reported here.
For the three-treatment crossover trials with a balanced design (Table 5), one-way ANOVA and the PE-CE model with either a CS or a UN covariance structure produced unbiased treatment effect estimates of T1 when analyzing the data that were simulated from three different covariance structures including the TP structure. The PE-NCE model and NPE-NCE model with an identical covariance structure generated similar treatment effect estimates, and these estimates were biased when the carryover effects existed. The bias was proportional to the magnitude of the carryover effects. The three GLMs with either a CS or a UN covariance structure were more powerful than was the one-way ANOVA in detecting whether the treatment effect of T1 was zero, regardless of the presence of period and carryover effects. The power of these models increased as the within-subject correlation coefficient of the CS covariance structure increased from 0.2 to 0.7.
The GLMs with a CS covariance structure provided slightly larger power than did the corresponding models with an UN covariance structure when analyzing the datasets generated from the CS variance-covariance matrices. However, when the datasets were simulated from the UN covariance structure, misspecification of covariance structure in GLMs reduced the power by more than 20%. For the three-treatment crossover trials with an unbalanced design (Table 6), the same patterns in PE and power were observed for the one-way ANOVA procedure and the three GLMs. A cross-table comparison between the numerical results in Table 5 for the three-treatment crossover trials with a balanced design and the results in Table 6 with an unbalanced design revealed that the power presented in Table 6 was obviously lower than the power in the corresponding position in Table 5.

Discussion and conclusion
In this research note, we report Monte Carlo simulation studies on the impact of misspecifying period effects, carryover effects, and covariance structures in the GLMs with correlated errors for analyzing data collected from crossover clinical trials. For the two-by-two crossover trials comparing two treatments, all four analysis approaches tested provide reasonable control of type I error, except for the misspecified NPE-NCE model. The PE-CE model cannot improve power over the naïve two-sample t-test that analyzes the data from the first period. Due to model misspecification, the treatment effect estimates given by the PE-NCE and NPE-NCE models are biased when period and carryover effects exist. It would therefore seem plausible to recommend prioritizing the PE-CE models over the other approaches. However, in the two-by-two crossover trials, the carryover effects are not identifiable unless further assumptions are made about them. In the simulation studies reported in this note, we assume that the carryover effect difference is proportional to the treatment effect. Our simulation results therefore indicate that the advantage of the two-by-two crossover design vanishes when carryover effects are present. This type of crossover trial is recommended only with prior knowledge that the carryover effects are trivial, in which case the PE-NCE models are recommended for data analysis. Using a washout period between the two periods is highly encouraged to eliminate carryover effects.
For three-treatment crossover trials with a substantial number of sequences, the GLMs including carryover and period effects (i.e., the PE-CE models) can provide substantially higher empirical power than the one-way ANOVA approach, which tests only the measurements from the first period, and unbiased treatment effect estimates can be attained with this model assuming the UN covariance structure. Otherwise, misspecification of period effects, carryover effects, or covariance structures can induce inflated type I error, reduced power, or biased treatment effect estimates. Therefore, we recommend adopting the PE-CE model with a UN covariance structure for data analysis in this setting. Additionally, the balanced crossover design should be preferred over the unbalanced crossover design. The numerical results also indicate that, to gain power over the conventional parallel-group study design, the two-period crossover design is recommended only when the carryover effect is negligible, whereas the three-period crossover design is recommended even when a substantial carryover effect exists. However, to make a fair comparison between parallel-group and crossover designs, investigators need to consider the costs and duration of the clinical trials.
Table 5. Average percent error, power, and coverage probabilities of 95% confidence intervals obtained from the analysis of three-period, three-treatment crossover trials based on 2000 simulated datasets: balanced design, sequence allocation CT1T2 : CT2T1 : T1CT2 : T1T2C : T2CT1 : T2T1C = 1:1:1:1:1:1, sample size n = 24, and an effect size of 0.5 for T1.
Table 6. Average percent error, power, and coverage probabilities of 95% confidence intervals obtained from the analysis of three-period, three-treatment crossover trials based on 2000 simulated datasets: unbalanced design, sequence allocation CT1T2 : CT2T1 : T1CT2 : T1T2C : T2CT1 : T2T1C = 4:4:1:1:1:1, sample size n = 24, and an effect size of 0.5 for T1.
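The remark that the PE-CE model gains no power over a first-period comparison in the two-by-two design can be illustrated with the cell means alone. The sketch below is an illustration under assumptions, not the model specification used in this note: it uses a 0/1 coding with an intercept, a period effect, a treatment effect, and a first-order carryover effect, and it assumes no sequence effect or treatment-by-period interaction (with which the carryover effect would be aliased). The helper `invert` is a hypothetical exact Gauss-Jordan routine.

```python
from fractions import Fraction

# Rows: the four sequence-by-period cells of a two-by-two crossover trial,
# ordered (TC, period 1), (TC, period 2), (CT, period 1), (CT, period 2).
# Columns: intercept, period-2 indicator, treatment-T indicator,
# indicator that T was given in the previous period (carryover).
# This 0/1 coding is an illustrative assumption.
X = [
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
]


def invert(matrix):
    """Invert a nonsingular square matrix exactly by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [
        [Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        # Find a row with a nonzero pivot (raises StopIteration if singular).
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Each row of the inverse gives the weights on the four cell means that
# recover one parameter; rows follow the column order of X.
X_inv = invert(X)
treatment_weights = [int(w) for w in X_inv[2]]
carryover_weights = [int(w) for w in X_inv[3]]
```

Under this coding the treatment weights are nonzero only for the two first-period cells, so the carryover-adjusted treatment effect reduces to the first-period difference between sequences, while the carryover weights compare subject totals between the two sequences and therefore rest on between-subject information only.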
In this research note, we considered only two-period, two-treatment and three-period, three-treatment crossover trials involving first-order carryover effects. Extrapolation of our results and recommendations beyond this range of design specifications requires further investigation and evidence.