A Comparison of Four Methods for the Analysis of N-of-1 Trials

Objective To provide practical guidance for the analysis of N-of-1 trials by comparing four commonly used models. Methods The four models (paired t-test, mixed effects model of difference, mixed effects model, and meta-analysis of summary data) were compared in a simulation study. Simulated 3-cycle and 4-cycle N-of-1 trials were set with sample sizes of 1, 3, 5, 10, 20 and 30 under the assumption of normally distributed data. The data were generated from a variance-covariance matrix under the assumption of (i) a compound symmetry structure or a first-order autoregressive structure, and (ii) no carryover effect or a 20% carryover effect. Type I error, power, bias (mean error), and mean square error (MSE) of the effect difference between the two groups were used to evaluate the performance of the four models. Results The results from the 3-cycle and 4-cycle N-of-1 trials were comparable with respect to type I error, power, bias and MSE. The paired t-test yielded type I error near the nominal level, higher power, comparable bias and small MSE, whether or not there was a carryover effect. Compared with the paired t-test, the mixed effects model produced a similar type I error and smaller bias, but lower power and larger MSE. The mixed effects model of difference and the meta-analysis of summary data yielded type I error far from the nominal level, low power, and large bias and MSE, irrespective of the presence or absence of a carryover effect. Conclusion We recommend the paired t-test for normally distributed data of N-of-1 trials because of its optimal statistical performance. In the presence of carryover effects, the mixed effects model could be used as an alternative.

Introduction N-of-1 trials (single-case randomized controlled trials, or randomized controlled trials in individual patients) are multicycle, double-blinded, controlled cross-over trials based on individuals [1,2,3,4]. N-of-1 trials are designed to test the effect difference between two treatments, which are conventionally labeled Group A (test group) and Group B (control group). For each subject, the two periods in each cycle are randomly assigned to different treatments, with a washout period. Figure 1 shows a typical 3-cycle N-of-1 trial.
The Evidence-Based Medicine Working Group suggested that N-of-1 trials provide the strongest evidence for decisions about the individual patient [16]. However, N-of-1 trials have not been widely used. One reason is that N-of-1 trials require relatively stable symptoms or diseases, medications with short half-lives, and rapidly measurable responses [3,17]. Another important reason is the difficulty of the statistical analysis of the data [18].
A number of methods have been proposed to analyze normally distributed data from N-of-1 trials. Gabler et al reported that 52% of 108 articles used a visual/graphical representation without statistical comparison, 44% used a t-test, and 24% used a pooled analysis [19]. Visual analysis, which plots the data on a graph, was commonly used [20], though its validity remained unclear. Parametric tests, such as the z-test, two-sample t-test, paired t-test and analysis of variance, were widely applied to such data [21]. Meta-analysis of summary data (hereafter, meta-analysis) was proposed to estimate the pooled treatment effect for more than two subjects [22]. Individual participant data meta-analysis used linear mixed models to estimate the treatment effect while accounting for correlations within individuals [8,22,23,24]. Mixed effects models and generalized estimating equations based on the effect difference of the two treatments in each cycle were also established [18,25].
However, it remained unclear which method should be adopted to provide more robust inferences for data from N-of-1 trials. To provide practical guidance for the analysis of N-of-1 trials, we conducted a simulation study of 3-cycle and 4-cycle N-of-1 trials to compare the performance of four methods under various variance-covariance structures.

Methods
For simplicity, the four methods are introduced below for the 3-cycle design.

Paired t-test (Model 1)
In N-of-1 trials, each cycle consists of two periods, one assigned to Group A and the other to Group B, and is treated as a pair. For example, a 3-cycle N-of-1 trial with 10 subjects yields 30 pairs of data because each subject provides 3 pairs. The paired t-test (Model 1) is then applied to all 30 pairs together, which does not account for between-subject effects.
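The pooling of pairs described above can be sketched as follows. This is a minimal illustration, not the authors' SAS code: the function name `paired_t` and the data values are hypothetical, and only the mean difference, t statistic and degrees of freedom are computed (the p-value would come from a t distribution with those degrees of freedom).

```python
import math
import statistics


def paired_t(group_a, group_b):
    """Paired t statistic for per-cycle (A, B) pairs pooled over all subjects.

    Subject identity is ignored, as in Model 1.
    """
    if len(group_a) != len(group_b):
        raise ValueError("each cycle must contribute one A and one B value")
    diffs = [a - b for a, b in zip(group_a, group_b)]
    k = len(diffs)
    if k < 2:
        raise ValueError("need at least two pairs")
    mean_diff = statistics.fmean(diffs)
    # Standard error of the mean difference (sample SD with k - 1 denominator)
    se = statistics.stdev(diffs) / math.sqrt(k)
    return mean_diff, mean_diff / se, k - 1


# Hypothetical responses: 2 subjects x 3 cycles = 6 pairs
a = [3.1, 2.4, 2.9, 3.5, 2.2, 3.0]
b = [2.0, 2.1, 1.8, 2.6, 2.3, 1.9]
mean_diff, t_stat, df = paired_t(a, b)
```

With n subjects and 3 cycles the test has 3n - 1 degrees of freedom, which is one reason it can be applied even when n = 1.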

Mixed Effects Model of Difference (Model 2)
The difference between the two groups in the same cycle is calculated, giving three differences for each subject. The mixed effects model of difference can be formulated as
$$y_{ih} = \mu + \tau_h + \gamma_i + \varepsilon_{ih}$$
where $y_{ih}$ denotes the i-th (i = 1, 2, …, n) subject's difference for the h-th (h = 1, 2, 3) cycle. The intercept $\mu$ represents the overall mean of the effect difference between Group A and Group B. $\tau_h$ represents the cycle effect for the h-th cycle. $\gamma_i$ indicates the random effect of the i-th subject. $\varepsilon_{ih}$ represents the i-th subject's random error for the h-th cycle.
If there is only one subject (n = 1), Model 2 reduces to $y_h = \mu + \varepsilon_h$, where $y_h$ denotes the subject's difference for the h-th (h = 1, 2, 3) cycle and $\varepsilon_h$ represents the random error for the h-th cycle. The model is therefore identical to the paired t-test (Model 1) when n = 1.

Mixed Effects Model (Model 3)
We assume that the carryover effect is caused only by the treatment effect of the previous period and not by other periods. All periods except the first have a carryover effect. Therefore, there are three kinds of effects: no carryover effect, carryover effect of Group A ($\lambda_A$) and carryover effect of Group B ($\lambda_B$), where $\lambda_A$ and $\lambda_B$ represent the carryover effects left over by Group A and Group B in the previous period, respectively. Based on the mixed effects model established by Zucker et al [25], we add period effect and carryover effect into the model. Considering all response values in six periods, the mixed effects model (Model 3) is set as
$$y_{ij} = \alpha + \mu \cdot \mathrm{group} + \tau_j + \lambda_A Z_A + \lambda_B Z_B + \gamma_i + \varepsilon_{ij}$$
where $y_{ij}$ denotes the i-th (i = 1, 2, …, n) subject's response value for the j-th (j = 1, 2, …, 6) period. $\alpha$ is the intercept of the model. $\mu$ is the overall mean of the effect difference between the two groups. group = 1 if the subject is assigned to Group A, and group = 0 if the subject is assigned to Group B. $\tau_j$ represents the period effect for the j-th period. $\lambda_A$ and $\lambda_B$ represent the carryover effects of Group A and Group B, respectively. $Z_A$ and $Z_B$ are indicator (dummy) variables: $Z_A = 1$ ($Z_B = 1$) means that there is a carryover effect of Group A (Group B); $Z_A = 0$ and $Z_B = 0$ mean no carryover effect of either group. $\gamma_i$ indicates the random effect of the i-th subject. $\varepsilon_{ij}$ represents the i-th subject's random error for the j-th period. The covariance structure for the random error of each subject ($\mathrm{var}(\varepsilon_{ij})$) is equal to D. The random subject effects are $\gamma_i \sim \text{iid } N(0, \sigma_i^2)$, and $\gamma_i$ is independent of $\varepsilon_{ij}$.
If there is only one subject (n = 1), Model 3 reduces to a single-subject form in which $y_j$ denotes the response value of the j-th period and $\varepsilon_j$ represents the random error of the j-th period.

Meta-analysis (Model 4)
Each subject of an N-of-1 trial is considered a separate trial (study). A typical method for analyzing n > 1 N-of-1 trials is meta-analysis [22]. Meta-analysis combines summary data from each subject to form a weighted average using the method of DerSimonian and Laird:
$$d_i = \bar{y}_{Ai} - \bar{y}_{Bi}, \qquad \hat{\mu} = \frac{\sum_{i=1}^{n} w_i d_i}{\sum_{i=1}^{n} w_i}$$
where $\bar{y}_{Ai}$ and $\bar{y}_{Bi}$ denote the means of Group A and Group B for the i-th individual, respectively. $S_{Ai}$ and $S_{Bi}$ are the corresponding standard deviations, and $n_{Ai}$ and $n_{Bi}$ are the numbers of periods in which the i-th individual received Group A and Group B, respectively; these quantities enter the weight. $d_i$ represents the mean effect difference for the i-th individual, and $w_i$ is its weight. Meta-analysis is not well defined for N-of-1 trials with n = 1 subject.
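As an illustration of the pooling step, the sketch below implements the standard DerSimonian-Laird random-effects estimator in Python. It is not the authors' code. The function names and the example summaries are hypothetical, and the within-subject variance of $d_i$ is assumed here to be $S_{Ai}^2/n_{Ai} + S_{Bi}^2/n_{Bi}$, which the text does not state explicitly.

```python
def subject_summary(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Effect difference d_i and its assumed sampling variance for one subject."""
    d = mean_a - mean_b
    v = sd_a ** 2 / n_a + sd_b ** 2 / n_b  # assumption: unpooled variance
    return d, v


def dersimonian_laird(effects, variances):
    """Pool per-subject effect differences with DerSimonian-Laird weights."""
    k = len(effects)
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sw = sum(w)
    fixed = sum(wi * d for wi, d in zip(w, effects)) / sw
    # Cochran's Q measures heterogeneity around the fixed-effect estimate
    q = sum(wi * (d - fixed) ** 2 for wi, d in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Method-of-moments between-subject variance, truncated at zero
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * d for wi, d in zip(w_star, effects)) / sum(w_star)
    se = (1.0 / sum(w_star)) ** 0.5
    return pooled, se, tau2


# Hypothetical summaries for three subjects, three periods per treatment
summaries = [
    subject_summary(2.9, 2.1, 0.6, 0.5, 3, 3),
    subject_summary(2.4, 2.2, 0.4, 0.7, 3, 3),
    subject_summary(3.1, 1.9, 0.5, 0.5, 3, 3),
]
effects = [d for d, _ in summaries]
variances = [v for _, v in summaries]
pooled, se, tau2 = dersimonian_laird(effects, variances)
```

With only three observations per treatment per subject, the within-subject standard deviations are estimated imprecisely, and a standard deviation of zero would make the weight undefined in this sketch.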

Generating Data
The continuous 6-dimensional (for 3-cycle) or 8-dimensional (for 4-cycle) normally distributed data were generated by a multivariate normal random number generator (a SAS macro) [26,27] based on the mixed effects model. That is, $y_{ij}$ was generated from a multivariate normal distribution with a compound symmetry (CS) structure or a first-order autoregressive (AR) structure. The two groups (A or B) in each cycle were randomly assigned for each subject using "Proc Plan" in the SAS 9.2 software. The subjects in each period were also randomly allocated to Group A or Group B so that half of the subjects in each period received each group. The actual response value of each subject was produced according to the allocations. For example, the allocation sequence of the first subject over six periods was BABAAB.
We assumed that the carryover effect was caused only by the previous period and was equal to a certain percentage of the previous treatment effect. We set the carryover effect at two levels: no carryover effect (0%) and a carryover effect of 20% of the previous treatment effect. That is, $\lambda_A$ and $\lambda_B$ were both set to 0% or 20% of the previous treatment effect. For instance, suppose that the allocation sequence of a subject was ABBABA, with no carryover effect in the first period. The carryover effect of Group A was added to the effect of Group B in the second period, the carryover effect of Group B was then added to the effect of Group B in the third period, and so on. The carryover rate was defined as the carryover effect divided by the treatment effect.
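The ABBABA example can be traced with a short sketch. The function name `expected_responses` is hypothetical, and the sketch assumes that "previous treatment effect" means the true effect of the treatment given in the preceding period. It returns only the expected response per period, before random error is added.

```python
def expected_responses(sequence, effect_a, effect_b, carryover_rate):
    """Expected response per period for one subject's allocation sequence."""
    effect = {"A": effect_a, "B": effect_b}
    means = []
    previous = None  # no carryover into the first period
    for treatment in sequence:
        mean = effect[treatment]
        if previous is not None:
            # Carryover is a fixed fraction of the previous period's treatment effect
            mean += carryover_rate * effect[previous]
        means.append(mean)
        previous = treatment
    return means


# True effects of 3 (Group A) and 2 (Group B), 20% carryover
means = expected_responses("ABBABA", effect_a=3.0, effect_b=2.0, carryover_rate=0.2)
```

Under these assumptions the second period (B after A) has expected response 2 + 0.2 × 3 = 2.6, and the third period (B after B) has 2 + 0.2 × 2 = 2.4.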

Parameter Setting
The 3-cycle (six periods) and 4-cycle (eight periods) designs of N-of-1 trials were included in the study. The sample size was set to 1, 3, 5, 10, 20 and 30. Three variance-covariance matrices had a compound symmetry (CS) structure, and two had a first-order autoregressive (AR) structure. All variances in the CS and AR structures were set to 1. The covariances (correlation coefficients) of the CS structures were set to 0 (CS1), 0.5 (CS2) and 0.8 (CS3). The covariances of the AR1 structure were set to $0.5$, $0.5^2$, $0.5^3$, $0.5^4$ and $0.5^5$, with an autoregressive coefficient of 0.5. The covariances of the AR2 structure were set to $0.8$, $0.8^2$, $0.8^3$, $0.8^4$ and $0.8^5$, with an autoregressive coefficient of 0.8. The true effect of Group B was equal to 2, while the true effect of Group A was equal to 2, 2.4, 2.6 or 3. The carryover rates of Group A and Group B were both set to 0% or 20%. The simulation was repeated 3000 times for each parameter setting.
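The two covariance structures can be written out directly. The following is a sketch in Python rather than the SAS macro used in the study; the helper names are hypothetical, and the hand-written Cholesky factorization is only one common way to draw correlated normal errors.

```python
import math
import random


def compound_symmetry(dim, rho, variance=1.0):
    """CS structure: equal variances, one common covariance."""
    return [[variance if r == c else variance * rho for c in range(dim)]
            for r in range(dim)]


def ar1(dim, rho, variance=1.0):
    """AR(1) structure: covariance decays as rho ** |lag|."""
    return [[variance * rho ** abs(r - c) for c in range(dim)]
            for r in range(dim)]


def cholesky(matrix):
    """Lower-triangular factor L with L L^T = matrix (positive definite input)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def draw_errors(cov, rng):
    """One draw of correlated normal errors with covariance cov."""
    lower = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in cov]
    return [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(len(cov))]


cs2 = compound_symmetry(6, 0.5)   # CS2 for the 3-cycle design
ar1_matrix = ar1(6, 0.5)          # AR1 for the 3-cycle design
rng = random.Random(2014)         # arbitrary seed for reproducibility
errors = draw_errors(ar1_matrix, rng)
```

Adding such an error vector to the expected responses of a subject gives one simulated set of six period values; the 4-cycle design uses dimension 8 instead of 6.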

Analysis and Model Assessment
Data analyses were conducted with the SAS 9.2 software package. To fit the models, the covariance structures of Model 2 and Model 3 were set to match the structure used to generate the data. Type I error, power, bias (mean error, ME), mean square error (MSE) and percent error (PE) of the effect difference were used to assess the performance of the models. The percent error was the absolute value of the ME divided by the true effect difference. Bias, MSE and PE were calculated as follows:
$$\mathrm{Bias} = \frac{1}{M}\sum_{m=1}^{M}\left(\hat{\mu}_m - \mu\right), \qquad \mathrm{MSE} = \frac{1}{M}\sum_{m=1}^{M}\left(\hat{\mu}_m - \mu\right)^2, \qquad \mathrm{PE} = \frac{\left|\mathrm{Bias}\right|}{\mu}$$
where $\mu$ represents the true effect difference, $\hat{\mu}_m$ is the estimated effect difference for the m-th simulation, and M is the number of simulations (M = 3000).
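These performance measures are simple averages over the simulation replicates. The sketch below uses hypothetical function names and toy inputs. It treats the empirical rejection rate at the 5% level as the type I error when the true difference is 0 and as the power otherwise, which is the usual convention and is assumed here.

```python
def performance(estimates, true_diff):
    """Bias (mean error), MSE and percent error over simulation replicates."""
    m = len(estimates)
    errors = [est - true_diff for est in estimates]
    bias = sum(errors) / m
    mse = sum(err ** 2 for err in errors) / m
    # PE is undefined when the true effect difference is 0
    pe = abs(bias) / abs(true_diff) if true_diff != 0 else None
    return bias, mse, pe


def rejection_rate(p_values, alpha=0.05):
    """Share of replicates with p < alpha (type I error or power)."""
    return sum(p < alpha for p in p_values) / len(p_values)


# Toy example with four replicates instead of M = 3000
bias, mse, pe = performance([0.3, 0.5, 0.4, 0.6], true_diff=0.4)
rate = rejection_rate([0.01, 0.20, 0.04, 0.50])
```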

Results
Type I Error Table 1 presents the type I error of the four models for n = 1, 3, 5, 10, 20, 30 in 3-cycle N-of-1 trials under the five variance-covariance structures, assuming that the true effect difference between the two groups was 0 ($\mu_A = \mu_B = 2$). When n = 1, Model 4 was not performed, and the results of Model 1 and Model 2 were the same.
Under the three CS structures, Model 1 and Model 3 were consistent with each other and yielded type I error near 5% (the nominal level). The type I error of Model 2 was less than 5%, while that of Model 4 was far from 5%. Under the two AR structures, the type I error of all four models was less than 5%.
As the carryover rate increased from 0% to 20%, the type I error of Model 1, Model 2 and Model 4 decreased slightly, whereas that of Model 3 was unaffected.
The results of the 4-cycle N-of-1 trials were similar to those of the 3-cycle N-of-1 trials (Table S1).

Power
The true effect differences were set to 0.4, 0.6 and 1.0. Table 2 shows the power of the 3-cycle N-of-1 trials.
When n = 1, Model 4 was unavailable and the other three models yielded very low power (less than 0.34). The power of Model 1 and Model 2 was the same. In practice, a single-individual N-of-1 design should not be considered unless the effect size is sufficiently large.
The power of all models increased with sample size. When n > 1, Model 1 yielded the highest power, followed by Model 3, in every setting. The power of Model 4 was greater than that of Model 2 when n ≤ 5, and the power of Model 2 was greater than that of Model 4 when n ≥ 10.
In most situations, the CS3 structure yielded higher power than the other four structures.
As the carryover rate increased from 0% to 20%, the power of Model 3 was unaffected, while the power of Model 1, Model 2 and Model 4 declined slightly.

Bias and MSE
In the absence of carryover effect, bias of Model 1, Model 2 and Model 3 was minuscule (near 0), while bias of Model 4 was larger.
The carryover rate affected the bias of Model 1, Model 2 and Model 4, but not that of Model 3. In the presence of a 20% carryover effect, the bias of the 4-cycle N-of-1 trials was similar to that of the 3-cycle N-of-1 trials, with smaller MSE (Table S3 and Table S4).

Examples
The first example concerned a 3-cycle N-of-1 trial in ornithine transcarbamylase deficiency (OTCD) [28]. A 48-year-old patient with OTCD was treated with either L-arginine capsules (test group) or placebo capsules (control group) for weekly periods. The patient and physicians were blinded to treatment. Plasma glutamine, the endpoint, was measured at about 3 pm on the 7th day. The mean difference in plasma glutamine between the two groups was estimated as -137.7 (P = 0.078, 95% CI: -38.4, 313.8) by Model 1 (or Model 2) and -118.7 (P = 0.129, 95% CI: -63.0, 300.3) by Model 3. Model 3 thus yielded a larger P value and a wider confidence interval than Model 1 (or Model 2).
The second example was based on 3-cycle N-of-1 trials in idiopathic chronic fatigue [29]. Double-blinded, randomized N-of-1 trials were performed on four physicians who complained of chronic fatigue. Each physician received three pairs of treatments comprising 4 weeks of Spirulina platensis (test group) and 4 weeks of placebo (control group), with a 2-week washout. Severity of fatigue was measured daily on a 10-point scale during the second half (weeks 3 and 4) of each period. The outcome was the mean fatigue score within each period. The data were re-analyzed using the four models. The estimated effect differences in fatigue between the two groups were -0.11 (P = 0.485, 95% CI: -0.43,

Discussions
We conducted a simulation study to compare four models for N-of-1 trials. The data were generated from five variance-covariance matrices, with parameter settings for cycle number, carryover effect and sample size. The performance of the four models was assessed by type I error, power, bias and MSE.
We recommend the paired t-test for normally distributed data from N-of-1 trials, with or without a carryover effect. Compared with the other three models, the paired t-test yielded the highest power in almost all situations, with type I error nearer to the nominal level. The paired t-test also had the smallest MSE in almost all situations, and its bias was comparable with that of the mixed effects model and the mixed effects model of difference. The real examples from Hackett et al [28] and Baicus et al [29] gave results consistent with the simulation study. In addition, the paired t-test is simple to apply, and its result is easy to explain. By contrast, the mixed effects model of difference and meta-analysis with the method of DerSimonian and Laird were not suitable for analyzing data from N-of-1 trials because of their low power and large bias and MSE. Although carryover effects may in theory exist in cross-over studies such as N-of-1 trials, they are unlikely in practice when the washout period is sufficient [30,31]. Based on our simulation results, a 20% carryover effect has little influence on the estimation compared with the absence of a carryover effect. Therefore, the carryover effect is usually ignored in clinical trials.
We also recommend the mixed effects model for data from N-of-1 trials when there is a carryover effect. The mixed effects model could deal with the carryover effect flexibly and separated the carryover effect from the treatment effect in n ≥ 1 N-of-1 trials. The results for the 20% carryover effect showed that the bias of the mixed effects model was approximately equal to 0. However, the mixed effects model took into account period effect, carryover effect and random individual effect, resulting in a larger standard deviation for the estimators of the effect difference, so it yielded a larger MSE. As the carryover rate increased from 0% to 20%, the power of the mixed effects model was close to (but lower than) that of the paired t-test. In addition, the mixed effects model has several advantages, such as handling missing data, accommodating numerous variance-covariance structures, and allowing new covariates to be added as conditions require. For example, some diseases might improve (such as self-healing diseases) or deteriorate (such as cancer) over time, in which case a time covariate could be added to the mixed effects model.
As sample size increased, the type I error and bias of the four models remained unchanged, the MSE of the four models gradually decreased, and the power of all the models (except meta-analysis) increased. The 3-cycle and 4-cycle N-of-1 trials had comparable results for type I error and bias. The 4-cycle N-of-1 trial yielded higher power and smaller MSE than the 3-cycle N-of-1 trial because adding periods had an effect similar to increasing the sample size.
The simulation study essentially assumed that the within-subject variance was equal across subjects. We also examined the case of unequal within-subject variance, for instance, a sample size of 20 in which the within-subject variance was 1 for 10 subjects and 2 for the other 10 subjects. The simulation results showed that the type I error and bias under unequal within-subject variance were similar to those under equal variance. The power under unequal variance was lower than the corresponding power under equal variance, and the MSE was larger because of the larger variance of the raw data. These results indicate that unequal within-subject variance could reduce the power but does not affect type I error or bias.
There were several limitations in this study. (1) The simulation study considered only the compound symmetry and first-order autoregressive variance-covariance matrices and did not take other variance-covariance matrices into account. In practice, the correlation between different times is likely to be more complex than completely equal or autoregressive. (2) Unbalanced N-of-1 trials (with different numbers of cycles), which are also applied in practice, were not considered. (3) The simulation study did not consider missing data. (4) The simulation assumed that each subject had one measurement (after treatment) per period, and thus did not account for potential information from repeated measurements within each period.
In conclusion, the paired t-test was simple and easy to apply and had better statistical performance. We recommend it for normally distributed data from N-of-1 trials. The mixed effects model provided an alternative when there was a carryover effect.