Sample Size Requirements for Studies of Treatment Effects on Beta-Cell Function in Newly Diagnosed Type 1 Diabetes

Preservation of β-cell function as measured by stimulated C-peptide has recently been accepted as a therapeutic target for subjects with newly diagnosed type 1 diabetes. In recently completed studies conducted by the Type 1 Diabetes Trial Network (TrialNet), repeated 2-hour Mixed Meal Tolerance Tests (MMTT) were obtained for up to 24 months from 156 subjects with up to 3 months duration of type 1 diabetes at the time of study enrollment. These data provide the information needed to more accurately determine the sample size needed for future studies of the effects of new agents on the 2-hour area under the curve (AUC) of the C-peptide values. The natural log(x), log(x+1) and square-root (√x) transformations of the AUC were assessed. In general, a transformation of the data is needed to better satisfy the normality assumptions for commonly used statistical tests. Statistical analyses of the raw and transformed data are provided to estimate the mean levels over time and the residual variation in untreated subjects that allow sample size calculations for future studies at either 12 or 24 months of follow-up and among children 8–12 years of age, adolescents (13–17 years) and adults (18+ years). The sample size needed to detect a given relative (percentage) difference with treatment versus control is greater at 24 months than at 12 months of follow-up, and differs among age categories. Owing to greater residual variation among those 13–17 years of age, a larger sample size is required for this age group. Methods are also described for assessment of sample size for mixtures of subjects among the age categories. Statistical expressions are presented for the presentation of analyses of log(x+1) and √x transformed values in terms of the original units of measurement (pmol/ml). Analyses using different transformations are described for the TrialNet study of masked anti-CD20 (rituximab) versus masked placebo. These results provide the information needed to accurately evaluate the sample size for studies of new agents to preserve C-peptide levels in newly diagnosed type 1 diabetes.


Introduction
Type 1 diabetes results from a T-cell mediated progressive autoimmune destruction of the insulin-secreting pancreatic β-cells [1], and numerous therapeutic targets and agents have been proposed to ameliorate this process [2] based on a growing understanding of the underlying mechanisms. The measurement of C-peptide in response to a stimulus provides a valid and reliable measure of the effects of therapy on residual β-cell function [3], the preferred stimulus being a mixed-meal tolerance test [4], as recognized in the recent FDA guidance on drug development in newly diagnosed type 1 diabetes [5]. Unfortunately, published reports from recently completed trials generally do not present the measures of residual variation and other quantities needed to guide sample size determination for future trials. The best available data [3] were based on a pooling of data from prior published and unpublished studies in subjects with a wide range of diabetes duration, heterogeneous methods of collection and assays, and limited follow-up.
The Type 1 Diabetes Trial Network, established by the National Institute of Diabetes and Digestive and Kidney Diseases, recently conducted two therapeutic trials in recent-onset type 1 diabetes. Herein the available data from these studies are used to describe the effects of different transformations on the distributional properties (e.g. normality) of the C-peptide values, and to evaluate the sample size (or power) for a new study.

Subjects
The anti-CD20 study [6] enrolled 87 subjects, 81 meeting the intention-to-treat criteria (52 rituximab, 29 placebo). The results showed that rituximab significantly preserved β-cell function at the primary 12-month outcome visit [6]. The analyses herein employ the 30 placebo-treated subjects who completed the 12-month examination, including an additional placebo subject who had been excluded from the intention-to-treat cohort because placebo infusions (double masked) were halted owing to a safety alert.
The MMF/DZB study [7] included 126 subjects randomly assigned to either mycophenolate mofetil alone or in combination with daclizumab, or a control group, who were followed for up to 2 years. Therapy was terminated for futility in the spring of 2008 by the external Data and Safety Monitoring Board after observing virtually no differences in C-peptide levels among the treatment groups. Further, since the two treated groups in the MMF/DZB study [7] were no different from placebo, the data from the 126 MMF/DZB study subjects were pooled with the 30 anti-CD20 placebo control group subjects as the basis for the analyses herein.

Methods and Procedures
The MMF/DZB and anti-CD20 studies enrolled male and female subjects aged 8–45 years, within 100 days of diagnosis of type 1 diabetes, who had at least one islet autoantibody and peak stimulated C-peptide ≥0.2 pmol/ml. Stimulated C-peptide values were obtained during a 2- or 4-hour mixed-meal tolerance test (MMTT) [4] conducted at baseline, 3, 6, 12, 18 and 24 months. Only the 2-hour data are employed herein. Over 5 minutes, participants ingested the Boost liquid oral dietary supplement (mixed meal, Nestlé HealthCare Nutrition, Inc.) dosed relative to body weight. Basal (fasting) plasma samples were collected 10 minutes prior to the meal (−10) and just prior to the time of ingestion (0), and further samples at 15, 30, 60, 90 and 120 minutes thereafter. C-peptide levels were measured centrally at the β-cell function laboratory (Seattle, WA). The primary outcome was the area under the 2-hour curve (AUC) in pmol/ml/120 min computed using the trapezoidal rule. The corresponding "AUC mean" in pmol/ml is obtained as AUC/120 [3,4]. Non-measurable timed values were set equal to the lower limit of quantification of the assay before computing the AUC.
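As an illustration of the outcome definition, the trapezoidal AUC and the AUC mean can be sketched as follows. This is a minimal sketch, not the study's actual program: it assumes the integration runs from the 0-minute sample to the 120-minute sample (how the −10 minute sample enters is not stated in the text), and the C-peptide values and the function name `auc_mean` are hypothetical.

```python
# Sketch: AUC and AUC mean of C-peptide over a 2-hour MMTT (trapezoidal rule).
# Assumption: integration from the 0-minute to the 120-minute sample.
# The sample values below are hypothetical, for illustration only.

def auc_mean(times_min, values):
    """Return (AUC, AUC mean) for timed values using the trapezoidal rule."""
    if len(times_min) != len(values) or len(times_min) < 2:
        raise ValueError("need matching time and value lists with >= 2 points")
    auc = 0.0
    for i in range(1, len(times_min)):
        width = times_min[i] - times_min[i - 1]
        # Area of one trapezoid: width times the average of the two heights.
        auc += width * (values[i] + values[i - 1]) / 2.0
    duration = times_min[-1] - times_min[0]
    return auc, auc / duration

times = [0, 15, 30, 60, 90, 120]              # minutes after ingestion
cpep = [0.20, 0.35, 0.50, 0.60, 0.55, 0.45]   # pmol/ml (hypothetical)
auc, mean = auc_mean(times, cpep)
```

Dividing by the 120-minute duration returns the result to pmol/ml, which is the scale on which the transformations discussed below are applied.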

Statistical Considerations
Most C-peptide values will fall between 0 and 1 and the distribution is positively skewed [3]. Thus, scale-contracting transformations were considered. However, the log transformation could introduce negative skewness because log(x) approaches −∞ as the value x approaches zero. This can be corrected by using log(x+1) [8]. The square-root transformation compresses the distribution of values >1 and slightly expands the distribution of values between 0 and 1. Both the MMF/DZB and anti-CD20 studies pre-specified that the primary analyses would employ the log(x+1) values.
Commonly, the primary analysis compares the mean of the C-peptide values between treatment groups after a period of treatment such as 12 or 24 months. With normally distributed errors, the most powerful test is an Analysis of Covariance (ANCOVA) adjusting for the baseline C-peptide value [9], and other baseline factors such as age and sex as previously recommended [3]. Algebraically, this is equivalent to an analysis using the change from baseline when adjusted for the baseline value [10]. Herein the analyses of the follow-up AUC mean values from the 2 hours of the MMTT are presented using the combined data from the two studies with an adjustment for study (MMF/DZB versus anti-CD20) and treatment group so as to account for any chance differences among studies and groups.
ANCOVA assumes normally distributed residuals with constant variation over the range of C-peptide values (homoscedasticity). The residuals were obtained from the regression of each subject's raw or transformed values on age, sex, study and treatment group within study. The distribution of the residuals was evaluated using quantile-quantile plots [11]. The Shapiro-Wilk test [12] assessed departures from normality. White's test [13] assessed the assumption of homoscedasticity (constant error variances among subjects).
For each transformation y = f(x), the mean values and confidence limits are presented using the inverse transformation applied to the mean of the transformed values, f⁻¹(ȳ), and to the corresponding confidence limits. Thus, for an analysis using y = log(x), the inverse mean is the geometric mean exp(ȳ). For an analysis using y = log(x+1), the inverse mean is the geometric-like mean exp(ȳ) − 1. For an analysis using y = √x, the inverse mean is ȳ².
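The inverse-mean definitions above can be sketched in a few lines. This is an illustrative sketch only; the dictionary `TRANSFORMS`, its keys and the function name `inverse_mean` are hypothetical and not part of the original analysis.

```python
import math

# Forward transformations considered in the text, paired with their inverses.
TRANSFORMS = {
    "raw": (lambda x: x, lambda y: y),
    "log": (math.log, math.exp),          # inverse mean = geometric mean
    "log1p": (math.log1p, math.expm1),    # log(x+1); inverse = exp(y) - 1
    "sqrt": (math.sqrt, lambda y: y * y),
}

def inverse_mean(values, name):
    """Mean on the transformed scale, mapped back to the original units."""
    forward, inverse = TRANSFORMS[name]
    transformed = [forward(x) for x in values]
    ybar = sum(transformed) / len(transformed)
    return inverse(ybar)
```

The same inverse functions would be applied to the lower and upper confidence limits computed on the transformed scale, which is why the resulting limits are generally asymmetric about the inverse mean.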
An analysis using the log-transformed values is readily described in terms of geometric means and a percentage difference between groups. The final Results sub-section on Statistical Computations shows how an analysis using the log(x+1) transformed values can also be described as a ratio of geometric means, and an analysis using the √x values can be described as a difference in ordinary means, both in units of pmol/ml. That section also derives expressions that can be employed to compute the standard errors and confidence limits for the inverse transformed means following an analysis with either the log(x+1) or √x transformations.

Distribution Properties
The characteristics of the subjects in the two studies were comparable owing to the similar eligibility criteria (Table 1). Of the complete cohort, 152 (97%) were evaluated at 12 months and 118 (76%) at 24 months, the latter owing to early termination of the MMF/DZB study. Figure 1 also presents box and whisker plots of the values over time, which show that the distributions are strongly positively skewed with an elongated right tail.

Figures 2–5 show the quantile-quantile (Q-Q) plots of the residuals from an analysis of the raw and the transformed values at 12 and at 24 months adjusted for the baseline value, age, sex, study (MMF/DZB versus anti-CD20) and treatment group within study. These plots compare the empirical quantiles (the dots) versus those expected from a normal distribution (the diagonal line). The observed data are normally distributed when the observed values fall directly on the line. For both the 12 and 24-month data, the raw AUC mean values show the most severe departures from normality with a distribution that is far too peaked and with right skewness. The ideal normal distribution would in fact have longer symmetric tails that would fall outside of the range of the observed values. The log(x) transformation expands the left tail, but too much so, generating left skewness, and does not correct for the peakedness. The log(x+1) and the √x transformations both provide a more symmetric and less peaked distribution relative to the ideal normal. These analyses suggest that the distributional assumptions for an ANCOVA test of means at 12 or 24 months are adequately met using either the log(x+1) or the √x transformation. The log(x) values appear to have substantial left skewness but this is attributable to a small number of values close to zero. In practice, this transformation may also be considered. Thus, in the following we describe the assessment of sample size using all of these approaches.

Sample Size Computations
The power of the test of means depends on the absolute difference between groups on the chosen scale, the sample size and the residual variation as measured by the standard deviation (SD) or root mean square error (RMSE) σ. For a given type I error probability α, residual RMSE (SD) σ, and fraction of subjects assigned to the treated group Q, equation (1) of the sub-section on Statistical Computations provides the sample size N needed to provide desired power 1−β to detect a difference Δ = μ₁ − μ₂ in the mean values on the chosen scale in the treated and control groups [14]. If no transformation was employed then Δ is the difference in the untransformed or raw mean values. If a transformation was employed then Δ is the difference in the means of the transformed values. Table 3 presents the quantities needed to compute sample size or power for non-stratified analyses without or with a transformation separately for an analysis at 12 months and at 24 months. The table also presents the relevant quantities within three age strata shown in Table 1 since the mean C-peptide values and standard deviation vary according to age, as described below. The baseline transformed and inverse mean is also presented for reference. While the trial properties could be described in terms of the change from baseline, as described in the methods, from [10] an analysis in terms of changes from baseline would have exactly the same power as an analysis of the follow-up visit values when both analyses are also adjusted for baseline. Thus, for simplicity, computations are described in terms of the month 12 or 24 values, not the changes from baseline.
Power depends on the difference between the means on the chosen scale (i.e., without or with transformation), and the smaller this difference the lower the power. Likewise, power depends on the residual RMSE, and the higher the value the lower the power. There is sampling variation in both of these quantities such that the values in a future study could be higher or lower than those observed herein, affecting power. Thus, more conservative estimates of sample size and power are provided using the one-sided lower 90% confidence limit for the mean and the upper 90% limit for the SD. Table 4 then presents sample size computations for an analysis at 12 and 24 months, respectively, using either the non-transformed or transformed data assuming a one-sided test at the 0.05 level, 85% power and a 2:1 allocation ratio (treated:control) with no losses to follow-up. These are the design parameters adopted as a template for TrialNet studies; however, additional computations with other design parameters are readily obtained from the equations presented in the sub-section on Statistical Computations. Also, to be conservative, in Table 4 the 90% lower limit for the control group mean and upper limit for the RMSE from Table 3 are employed. For an analysis of the raw AUC mean values, there is no transformation so the difference between groups (Δ) refers to a difference in the original units (pmol/ml). That difference is often specified in terms of a percentage difference, such as 50%. Alternately the difference could be specified in terms of an algebraic difference (subtraction).
For example, in Table 4 the estimated untransformed mean at 12 months in the control group is 0.4 pmol/ml. A 50% increase yields a mean of 0.6 in the treated group with Δ = 0.2 pmol/ml. With RMSE σ = 0.259, the standardized difference is Δ/σ = 0.77 with resulting N = 53.0, rounded up to 54, the next highest number divisible by 3. Alternately, the difference to be detected could have been specified as an algebraic difference of 0.2 pmol/ml, rather than a 50% increase, yielding the same result in this case. However, in other cases shown, a 50% difference may not be equivalent to a difference of 0.2 pmol/ml.

For sample size calculations where a transformation will be employed, it is important to note that the analysis, and thus the means and Δ (and the RMSE σ), must be specified in terms of the transformed values. However, the meaningful difference to be detected is generally specified in terms of the inverse means in pmol/ml. Consider, for example, detecting a 50% difference using the log(x+1) transformation. The control group mean of the log(x+1) values is 0.31 log(pmol/ml+1). The inverse control group geometric-like mean is exp(0.31) − 1 = 0.36 pmol/ml. A 50% difference yields a value of 0.54 pmol/ml in the treated group with a corresponding log(x+1) value of 0.43 log(pmol/ml+1). Compared to the mean value of 0.31 log(pmol/ml+1) in the control group, this yields Δ = (0.43 − 0.31) = 0.12. When employed in equation (1) with RMSE σ = 0.167, the resulting N = 57.7, which is rounded up to 60. Alternately, the difference could be specified as an algebraic difference in the inverse mean values, such as a 0.2 pmol/ml difference in the geometric-like means. In this case the treated group inverse mean would be 0.56 pmol/ml and the corresponding log(x+1) value is 0.44. This yields Δ = (0.44 − 0.31) = 0.13 log(pmol/ml+1) and the resulting N = 47.4, which is rounded up to 48. This smaller sample size arises because a 0.2 pmol/ml difference between groups in the inverse means results in a larger Δ in the transformed values (0.13) than when the effect is specified as a 50% improvement (0.12).
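The conversion from an effect stated in pmol/ml to a difference on the log(x+1) scale can be sketched as below. This is a sketch under stated assumptions: the function name `delta_log1p` is hypothetical, and the input 0.31 is the rounded control mean quoted in the text, so the results (about 0.125 and 0.137) differ slightly from the 0.12 and 0.13 reported above, which appear to be based on rounded intermediate or unrounded tabled values.

```python
import math

def delta_log1p(control_mean_log1p, pct_increase=None, abs_increase=None):
    """Difference on the log(x+1) scale for an effect stated in pmol/ml.

    Exactly one of pct_increase (e.g. 0.5 for 50%) or abs_increase
    (e.g. 0.2 pmol/ml) must be supplied.
    """
    if (pct_increase is None) == (abs_increase is None):
        raise ValueError("supply exactly one of pct_increase or abs_increase")
    # Inverse (geometric-like) control mean in pmol/ml: exp(y) - 1.
    control_inv = math.expm1(control_mean_log1p)
    if pct_increase is not None:
        treated_inv = control_inv * (1.0 + pct_increase)
    else:
        treated_inv = control_inv + abs_increase
    # Back to the analysis scale, then subtract the control mean.
    return math.log1p(treated_inv) - control_mean_log1p

d_pct = delta_log1p(0.31, pct_increase=0.5)
d_abs = delta_log1p(0.31, abs_increase=0.2)
```

Because the log(x+1) scale is nonlinear, the same percentage or algebraic effect corresponds to a different transformed difference whenever the control mean changes, which is why the 12- and 24-month requirements differ.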
The sample size required to detect a 50% difference at 24 months is roughly double that needed to detect a 50% difference at 12 months. One reason is that the control group mean is smaller, due to the progressive loss of C-peptide leading to lower values at 24 than at 12 months, some of which are virtually zero (and still included in the analysis). As a result, a 50% increase in the pmol/ml values results in a slightly smaller difference Δ between the transformed means at 24 than at 12 months, except for the log(x) analysis that is unchanged from 12 months. But the main reason is the higher RMSE at 24 months than at 12 months, more so for the log(x), resulting in smaller standardized difference values Δ/σ and larger N than at 12 months.
However, the sample size needed for a study designed to detect a 0.2 pmol/ml difference between groups at month 24 is about the same as that at month 12, except for the log(x). While the RMSE of the log(x) is greater at 24 than 12 months, so also is the mean difference so that the standardized difference Δ/σ is greater at 24 months, resulting in a smaller sample size requirement at month 24 than at month 12.
The sub-section on Statistical Computations also presents equations for the computation of power for a given N.

Influence of Age
In Table 3, the residual SD values in the 13–17 year age category are substantially higher than those in the other age strata at 12 months but not at 24 months. Thus, a substantially larger sample size would be required for a 12-month study in this age group, or predominantly containing this age group, regardless of whether the treatment effect of interest is stated as a percentage difference or an absolute difference between groups. Table 3 also shows that the control group mean values at 12 and at 24 months are substantially higher in the 18+ year category than in the other two strata. This indicates that a smaller sample size would be required to detect a given percentage treatment group difference within this age stratum than within other age strata. The table, however, also presents the baseline transformed and inverse means within each age stratum. Compared to the month 12 and 24 values, the baseline values are also higher in the 18+ category than the 8–12 category. This indicates that the C-peptide is declining at a lower rate in the 18+ year category and that a smaller algebraic treatment group difference might therefore be observed, thus requiring a larger sample size.
For illustration, Table 5 presents the sample size calculations for a 12-month study restricted to each of the three age strata using the log(x+1) values. To detect a 50% difference, a study in adults would require a much smaller sample size, but to detect a 0.2 pmol/ml difference, a study in children 8–12 y would require a smaller sample size. In both cases, a study restricted to adolescents 13–17 y of age would require the largest sample size.

Age Mixtures
The TrialNet data herein consist of a mixture of subjects within the three age categories as shown in Table 1, and that mixture applies to the overall estimates presented in Table 3. However, the mixture of the age groups within a study may differ from that herein, such as for a study that is restricted to adults alone initially, followed by enrollment of adolescents and adults. In this case, a sample size computation could be based on a weighted average of the age-specific differences and residual variances, as described under Age-Averaged Estimation in the sub-section on Statistical Computations.
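A weighted-average computation of this kind can be sketched as follows. The sketch assumes that the average difference is the enrollment-weighted mean of the stratum differences, that the average residual variance is the enrollment-weighted mean of the stratum variances, and that these are then used in the usual two-group normal-approximation sample size formula. The function name and all numeric inputs other than the design parameters (one-sided 0.05 level, 85% power, 2:1 allocation) are hypothetical placeholders, not values from Table 3.

```python
import math
from statistics import NormalDist

def age_averaged_sample_size(fractions, deltas, sigmas,
                             alpha=0.05, power=0.85, q=2 / 3):
    """Return (average difference, average variance, N) for an age mixture."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("age-stratum fractions must sum to 1")
    # Enrollment-weighted average difference and residual variance.
    delta_bar = sum(p * d for p, d in zip(fractions, deltas))
    var_bar = sum(p * s ** 2 for p, s in zip(fractions, sigmas))
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha) + z.inv_cdf(power)  # one-sided test
    n = z_sum ** 2 * var_bar / (q * (1 - q) * delta_bar ** 2)
    return delta_bar, var_bar, n

# Hypothetical stratum-specific inputs on the log(x+1) scale.
fractions = [0.4, 0.4, 0.2]      # ages 8-12, 13-17, 18+
deltas = [0.10, 0.12, 0.15]      # hypothetical differences
sigmas = [0.15, 0.20, 0.15]      # hypothetical residual SDs
delta_bar, var_bar, n = age_averaged_sample_size(fractions, deltas, sigmas)
```

When all strata share the same difference and SD, the result reduces to the unstratified computation, which is a convenient check on any implementation.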

The Anti-CD20 Study Results
The anti-CD20 study published results demonstrated significant differences between the rituximab and control group subjects in the primary intent-to-treat analysis of the log(x+1) values at one year of follow-up [6]. That single analysis was pre-specified in the study protocol because any post-hoc selected analysis could substantially inflate the type I error probability and lead to biased results. Nevertheless, it is instructive to examine what the study results would have been had a different approach to the analysis been chosen. Table 6 presents the ANCOVA model adjusted treatment group effect using the untransformed and transformed 12-month C-peptide values. As expected from above, the analysis of untransformed values failed to reach statistical significance, whereas those of the transformed values were each statistically significant. While the log(x) analysis produced the smallest p-value, its F-value was not substantially different from that of the other analyses.
The distribution of the raw AUC mean residuals in this study was not as distorted as that in the combined cohort data. The distributions of the transformed values were similar to those in the combined cohorts. The Shapiro-Wilk test was again significant for the raw and log(x) values, but White's test values were similar in the four analyses.
The log(x) or other transformation may produce different variances between groups, thus violating one of the assumptions of the ANCOVA test [15]. The F-test of equality of the variances of the covariate-adjusted values was highly significant for the log(x) transformed values (p < 0.0001) and marginally so for the √x values (p = 0.046), but not for the raw values or log(x+1) values. In this case, Satterthwaite's test, which allows for unequal variances, yields p = 0.017 for the difference between groups in the log(x) values, and p = 0.019 for the √x values. Two additional distribution-free tests of the difference in baseline-adjusted values also provided significant results. A Wilcoxon test (also called a rank transformation analysis [16]) provided one-sided p-values ranging from 0.023 to 0.029; and a test using White's robust information-sandwich estimate of the variance [13] yielded p-values almost identical to those in Table 6 (0.061, 0.008, 0.018 and 0.012, respectively).
Thus, even though the log(x) distribution departs from normality, it nevertheless provides a significant difference between groups as did the other transformations. Were the log(x) analysis pre-specified as the primary analysis, the result would still be valid, though the test would have less power than one using a more appropriate transformation. Table 7 then shows the inverse means and the relative and absolute differences between groups on each scale. Among the different analyses, the percentage difference in the inverse means was greatest for the log(x) values even though the control group mean was lower. The algebraic differences were similar among the analyses. Table 6 shows the transformed means, the RMSE and the standardized difference for each analysis. The standardized difference that determines power is slightly greater for the log(x) values.

Statistical Computations
This sub-section presents the statistical equations used in the computations presented in the main paper, with additional examples. This includes methods for the calculation of sample size and power; the assessments based on different mixtures of subjects in the three age strata; and the computation of a ratio or difference in the mean levels using either the log(x+1) or square root transformations. Throughout, the C-peptide value x refers to the AUC mean value in pmol/ml.
Computation of Sample Size and Power. The equations used to compute sample size and power are widely available, as in [14]. Let $\alpha$ denote the type I (false positive) error probability and $\beta$ the type II (false negative) error probability. Then $Z_{1-\alpha}$ is the critical value for the test statistic at level $\alpha$, one or two-sided as prespecified; e.g. $Z_{1-\alpha} = 1.645$ for a one-sided 0.05 level test, 1.96 for a two-sided test. $Z_{1-\beta}$ is the quantile corresponding to the desired level of power $1-\beta$, e.g. 1.04 for 85% power. To allow for an unequal allocation, let $Q$ designate the fraction of subjects assigned to the treated group, $1-Q$ that to control, e.g. $Q = 2/3$ for a 2:1 randomization to treatment and control. Then let $\mu_1$ denote the mean of the transformed (or untransformed) values in the treated group and $\mu_2$ that in the control group, with difference $\Delta = \mu_1 - \mu_2$. Denote the root mean square error (RMSE) of the transformed values as $\sigma$. Then the sample size needed to detect the difference $\Delta$ in the transformed values is provided by the equation

$$N \cong \left[ \frac{(Z_{1-\alpha} + Z_{1-\beta})\,\sigma}{\Delta \sqrt{Q(1-Q)}} \right]^2 \qquad (1)$$

where $\cong$ designates approximate equality.
To allow for a fraction $L$ of losses to follow-up or missing outcome data, the sample size would be inflated by the factor $(1-L)^{-1}$. For example, if $N = 100$ with complete follow-up, then to adjust for 20% losses the sample size would be inflated to yield $N_L = N/(1-L) = 100/0.8 = 125$.
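The sample size computation and the adjustment for losses can be sketched as follows, assuming equation (1) takes the standard two-group normal-approximation form $N \cong [(Z_{1-\alpha}+Z_{1-\beta})\sigma / (\Delta\sqrt{Q(1-Q)})]^2$. The function names are hypothetical. With the rounded inputs quoted in the text for the raw 12-month analysis (a difference of 0.2 and RMSE of 0.259) the sketch gives roughly 54.3 rather than the reported 53.0, presumably because the tabled value was computed from unrounded quantities.

```python
import math
from statistics import NormalDist

def sample_size(delta, sigma, alpha=0.05, power=0.85, q=2 / 3, one_sided=True):
    """Total N for a two-group comparison of means (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha if one_sided else 1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ((z_alpha + z_beta) * sigma / (delta * math.sqrt(q * (1 - q)))) ** 2

def inflate_for_losses(n, loss_fraction):
    """Inflate N by 1/(1 - L) to allow for a fraction L of losses."""
    return n / (1 - loss_fraction)

def round_up_to_multiple(n, multiple=3):
    """Round N up to the next integer compatible with a 2:1 allocation."""
    return math.ceil(n / multiple) * multiple

n_raw = sample_size(delta=0.2, sigma=0.259)   # raw values at 12 months
n_with_losses = inflate_for_losses(100, 0.20)
```

Rounding up to a multiple of 3 reproduces the text's conversions of 53.0 to 54, 57.7 to 60 and 47.4 to 48.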
In some cases, the sample size may be specified from other considerations and it may then be desired to evaluate the power of the study to detect a given difference. In this case, for a given $N$ the power is computed from

$$Z_{1-\beta} = \sqrt{N Q(1-Q)}\,\frac{\Delta}{\sigma} - Z_{1-\alpha} \qquad (2)$$

where power $= \Phi(Z_{1-\beta})$ is the cumulative normal fraction at the value $Z_{1-\beta}$, e.g. power = 0.85 for $Z_{1-\beta} = 1.04$. Alternately, the study properties may be specified in terms of a difference $\Delta$ that can be detected with a given level of power with a specific $N$ and $\sigma$, as provided by

$$\Delta = \frac{(Z_{1-\alpha} + Z_{1-\beta})\,\sigma}{\sqrt{N Q(1-Q)}} \qquad (3)$$

The $N$ employed in equations (2) and (3) should be the number of evaluable subjects. For example, if the planned $N = 200$, to allow for 10% losses, then the equation should employ $N = 180$.
The above simple equations are approximations to the precise computations using the non-central Student's t-distribution, for which iterative computations are required; see [14], among many. For N > 100 the precise sample size is less than 2% greater than that provided by equation (1) above, and less than 1% greater for N > 200.
For example, Table 4 shows that a sample size of N = 60 provides 85% power to detect a 50% difference using the log(x+1) transformed values at 12 months, whereas a sample size of 117 would be required at 24 months. If the study is conducted using N = 60, the power to detect a 50% difference at 24 months could be computed using (2). From Table 4, a 50% difference in a log(x+1) analysis at 24 months would yield a value $\Delta/\sigma = 0.52$. Substituting this quantity along with N = 60 yields $Z_{1-\beta} = 0.254$, which corresponds to power of 60%.
Alternately, the difference that can be detected with 85% power could be computed from (3) upon substituting $Z_{1-\beta} = 1.04$ (for 85% power), $\sigma = 0.192$ (the RMSE from Table 4) and N = 60 to show that this sample size would provide 85% power to detect a difference $\Delta = 0.14$ in the log(x+1) values at 24 months. Adding this amount to the control transformed mean in Table 4 (0.24) yields a treated transformed mean of 0.24 + 0.14 = 0.38. Taking the inverse transformation of each yields means of 0.27 pmol/ml for the control group and 0.46 pmol/ml for the treated group, or a 70% difference.
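The two worked examples above can be checked numerically with a short sketch. It assumes that equation (2) has the form $Z_{1-\beta} = \sqrt{NQ(1-Q)}\,\Delta/\sigma - Z_{1-\alpha}$ and equation (3) the form $\Delta = (Z_{1-\alpha}+Z_{1-\beta})\sigma/\sqrt{NQ(1-Q)}$, which reproduce the quoted values; the function names are hypothetical.

```python
import math
from statistics import NormalDist

def power_for_n(n, std_diff, alpha=0.05, q=2 / 3):
    """Return (Z_{1-beta}, power) for N evaluable subjects (one-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = math.sqrt(n * q * (1 - q)) * std_diff - z_alpha
    return z_beta, NormalDist().cdf(z_beta)

def detectable_delta(n, sigma, alpha=0.05, power=0.85, q=2 / 3):
    """Difference on the analysis scale detectable with the given power."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha) + z.inv_cdf(power)
    return z_sum * sigma / math.sqrt(n * q * (1 - q))

# 24-month log(x+1) analysis with N = 60 and standardized difference 0.52.
z_beta, power = power_for_n(60, 0.52)
# Difference detectable with 85% power when the RMSE is 0.192.
delta = detectable_delta(60, 0.192)
# Inverse means for control (0.24) and treated (0.24 + 0.14 = 0.38).
control_inv = math.expm1(0.24)
treated_inv = math.expm1(0.38)
relative_increase = treated_inv / control_inv - 1
```

Here `N` should be the number of evaluable subjects, so a planned enrollment would first be reduced by the expected losses before calling either function.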
Age-Averaged Estimation. As described in the text, other studies may comprise a mixture of age categories that differs from that in the TrialNet studies. Denote the fraction of subjects within the three age categories as $P_1$ for age 8–12, $P_2$ for 13–17, and $P_3$ for 18 and older. For a specified difference between groups, that could vary among the age strata, let $\Delta_i$ denote the difference to be detected within the ith age stratum, and $\sigma_i$ the residual standard deviation within each stratum ($i = 1, 2, 3$). Then, the average expected difference between groups is

$$\bar{\Delta} = \sum_{i=1}^{3} P_i \Delta_i$$

and the average residual variance would be

$$\bar{\sigma}^2 = \sum_{i=1}^{3} P_i \sigma_i^2 .$$

Then sample size would be computed using

$$N \cong \frac{(Z_{1-\alpha} + Z_{1-\beta})^2\,\bar{\sigma}^2}{Q(1-Q)\,\bar{\Delta}^2} .$$

Likewise, the power of the study for a given sample size would be obtained from

$$Z_{1-\beta} = \sqrt{N Q(1-Q)}\,\frac{\bar{\Delta}}{\bar{\sigma}} - Z_{1-\alpha}$$

and the average difference that can be detected as

$$\bar{\Delta} = \frac{(Z_{1-\alpha} + Z_{1-\beta})\,\bar{\sigma}}{\sqrt{N Q(1-Q)}} .$$

For example, consider a study designed to detect a 50% difference using the log(x+1) transformed values in an analysis at 12 months that is projected to enroll fractions $P_1 = 0.4$, $P_2 = 0.4$, and $P_3 = 0.2$. From the quantities specified in Table 3, a 50% increase in the 90% limit of the inverse control group means, the resulting transformed means, and $\Delta$ are [values not recovered]. Then the terms needed for the sample size calculation are [values not recovered].

Table 3. Quantities required to assess sample size or power using an analysis of the untransformed or transformed AUC (pmol/ml) values at 12 or 24 months, along with the baseline mean values (transformed and inverse).

Interpretation of log(x) and log(x+1) Analyses. For an analysis of the raw values, standard programs will compute the baseline adjusted mean values and their differences. For an analysis on the log scale, taking the exponential function of the baseline adjusted means provides estimates of the geometric means. Programs also provide an estimate of the difference between the means of either the raw values or the log values. In the latter case, taking the exponential function of the difference provides an estimate of the ratio of the geometric means.
However, for an analysis using the log(x+1) transformation or the square root transformation, programs do not directly provide estimates of the ratio or difference of the corresponding C-peptide mean values on the pmol/ml scale. Herein we show how these estimates can be obtained from other computer program computations.
If the log(x+1) transformation is used as the basis for the analysis of the levels of C-peptide it is useful to summarize the results using the ratio of the geometric means, say R, with 95% confidence limits on R. Since R is a ratio with a value of 1 under the null hypothesis, asymmetric confidence limits computed using the log of R will provide more accurate coverage probability than symmetric limits based on the simple estimated standard error of R itself. The necessary quantities can be obtained from an analysis of the log(x+1) values using a program such as SAS PROC GLM or PROC MIXED to compute the adjusted means (called LSmeans) and their standard errors.

Table 4. Sample sizes* for two groups with 2:1 allocation (Q = 2/3) to treatment versus control needed to provide 85% power to detect either a 50% difference, or an absolute difference of 0.2 pmol/ml, using either the raw or transformed data for an unstratified analysis at 12 or at 24 months. *All computations are for a one-sided test at the 0.05 level with no adjustment for losses to follow-up. In all cases the 90% limits for the control group mean and the SD are used as the parameter values in equation (1). The exact N is provided as well as that rounded up to the nearest integer satisfying the 2:1 allocation fractions. doi:10.1371/journal.pone.0026471.t004

Let $y_i$ refer to the LSmean of the log(x+1) values in the ith group ($i = 1, 2$). Then R is expressed as

$$R = \frac{e^{y_1} - 1}{e^{y_2} - 1} .$$

The variances of each mean, say $Var(y_1) = V_1$ and $Var(y_2) = V_2$, are provided by squaring the standard errors provided by the LSMEANS computation. An additional computation is needed to obtain the covariance of the two. The LSMEANS output also includes a computation of the difference, i.e. $D = y_1 - y_2$, but not the SE of the difference. Thus, an ESTIMATE statement is used to obtain the variance (or SE) of the difference between the group LSmeans.
Since

$$Var(D) = Var(y_1) + Var(y_2) - 2\,Cov(y_1, y_2) \qquad (11)$$

then the covariance, say $Cov(y_1, y_2) = Cov_{1,2}$, is obtained by subtraction as

$$Cov_{1,2} = \frac{V_1 + V_2 - Var(D)}{2} .$$

Using the delta method it is then shown that the variance of the log ratio is

$$Var[\log R] = \left( \frac{e^{y_1}}{e^{y_1} - 1} \right)^2 V_1 + \left( \frac{e^{y_2}}{e^{y_2} - 1} \right)^2 V_2 - 2 \left( \frac{e^{y_1}}{e^{y_1} - 1} \right) \left( \frac{e^{y_2}}{e^{y_2} - 1} \right) Cov_{1,2} .$$

The 95% asymmetric confidence limits on R are then obtained as

$$\exp\left[ \log R \pm 1.96 \sqrt{Var[\log R]} \right] .$$

Table 5. Sample sizes* for two groups with 2:1 allocation (Q = 2/3) to treatment versus control needed to provide 85% power to detect either a 50% difference, or an absolute difference of 0.2 pmol/ml, using an age-stratified analysis of log(x+1) values at 12 months.

Table 6 note. These analyses are based on the intention-to-treat cohort that includes 29 placebo-treated subjects who met the defined criteria. The results using the log(x+1) values are identical to those that appeared in the primary study manuscript [6]. doi:10.1371/journal.pone.0026471.t006

Interpretation of √x Analyses. If the analysis uses the √x transformation it would be desirable to summarize the results using the difference of the inverse means, say S, with 95% confidence limits on S, both computed in pmol/ml units. As above, let $y_1$ refer to the LSmean of √x in one group and $y_2$ that in the other group, with respective variances $V_1$ and $V_2$ obtained as the square of the standard errors, and covariance $Cov_{1,2}$ obtained by subtraction as for the log(x+1) analysis. Then the difference S in pmol/ml units is expressed as

$$S = y_1^2 - y_2^2 .$$

Again using the delta method, the variance of S is obtained as

$$Var(S) = 4 y_1^2 V_1 + 4 y_2^2 V_2 - 8\, y_1 y_2\, Cov_{1,2} .$$

The 95% symmetric confidence limits on S are then obtained as

$$S \pm 1.96 \sqrt{Var(S)} .$$
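The back-transformed summaries for both transformations can be sketched as below. This is a sketch under stated assumptions rather than the authors' program: it takes the ratio of geometric-like means as $(e^{y_1}-1)/(e^{y_2}-1)$, the difference of inverse means as $y_1^2 - y_2^2$, and applies first-order delta-method variances that include the covariance of the two LSmeans. The function names are hypothetical, and the inputs would come from the LSmeans, their standard errors and the variance of their difference as reported by the analysis program.

```python
import math

def covariance_from_difference(v1, v2, var_diff):
    """Cov(y1, y2) recovered from Var(y1 - y2) = V1 + V2 - 2 Cov."""
    return (v1 + v2 - var_diff) / 2.0

def ratio_ci_log1p(y1, y2, v1, v2, cov12, z=1.96):
    """Ratio R of inverse means after a log(x+1) analysis, asymmetric limits."""
    r = math.expm1(y1) / math.expm1(y2)
    # Derivatives of log(exp(y) - 1) with respect to y for each group.
    g1 = math.exp(y1) / math.expm1(y1)
    g2 = math.exp(y2) / math.expm1(y2)
    var_log_r = g1 ** 2 * v1 + g2 ** 2 * v2 - 2 * g1 * g2 * cov12
    half_width = z * math.sqrt(var_log_r)
    return r, r * math.exp(-half_width), r * math.exp(half_width)

def difference_ci_sqrt(y1, y2, v1, v2, cov12, z=1.96):
    """Difference S of inverse means after a sqrt(x) analysis, symmetric limits."""
    s = y1 ** 2 - y2 ** 2
    var_s = 4 * y1 ** 2 * v1 + 4 * y2 ** 2 * v2 - 8 * y1 * y2 * cov12
    half_width = z * math.sqrt(var_s)
    return s, s - half_width, s + half_width
```

Limits for R are formed on the log scale and then exponentiated, so they are asymmetric about R, whereas the limits for S are symmetric in pmol/ml.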

Discussion
Analyses of two recently completed TrialNet studies in newly diagnosed type 1 diabetes assess the properties of the stimulated C-peptide levels of a 2-hour Mixed Meal Tolerance Test used to measure β-cell function. In general, a transformation is needed to improve the normality of the distribution of values. Among those considered herein, the log(x) over-corrects, replacing right skewness with left skewness, whereas the log(x+1) and √x values are more nearly symmetrically distributed. The resulting sample size estimates for an analysis using the log(x) values are greater than those using the raw, log(x+1) or √x values. The sample sizes using the √x values were slightly greater than those using the log(x+1) values.
TrialNet pre-specified that the log(x+1) transformation would be used so as to improve the distribution because the majority of the AUC mean values are less than 1 [3]. Another approach to deal with this might be to simply multiply the AUC mean values (x) by a constant (c), such as multiplying by c = 100 to yield values 100 × AUC mean in pmol/(ml/100). Taking the log transformation yields log(cx) = log(c) + log(x). In this case the shape of the distribution of log(cx) is the same as that of log(x), and the properties of an analysis of the log(cx) values are the same as those of an analysis of the log(x) values.
While TrialNet had initially selected the log(x+1) transformation based on its preliminary data, it is possible that preliminary studies of a compound might suggest that a different transformation best captures the effect of treatment on the distribution of values. For example, if a preliminary study suggests that an analysis of the raw values appears to best reflect the treatment group difference, then a sample size calculation using the raw values might be preferred even though, based on the computations herein, a smaller sample size might be computed using a transformation. Likewise, preliminary data from other studies might suggest that a different transformation, like log(x), might be preferred, in which case we hope that the data presented herein could be useful for planning future studies.
Sample size computations are shown for an analysis at 12 and 24 months using either a relative (50%) increase or a fixed (0.2 pmol/ml) difference between groups. The N required to detect a fixed difference is principally a function of the residual variation (RMSE) that tends to be greater at 24 than at 12 months, resulting in a larger N at 24 months. The N required to detect a relative increase is also a function of the control group mean because a percentage increase from a larger control mean value equates to a larger absolute difference. For example, in Table 4, a 50% increase in the log(x+1) values at 12 months corresponds to a difference of 0.12 pmol/ml versus a difference of 0.10 pmol/ml at 24 months, again resulting in a larger N at 24 months.
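The dependence of N on the residual variation and on the size of the difference can be illustrated with a short Python sketch. It assumes the standard normal-approximation formula for comparing two means with allocation fraction Q, N = (Z_alpha + Z_beta)^2 SD^2 / [Q(1 - Q) Δ^2], which may differ in detail from equation (1) of this paper. The function names and the SD and Δ values are hypothetical and are not taken from Tables 4 or 5.

```python
import math
from statistics import NormalDist


def total_n(delta, sd, q=2 / 3, alpha=0.05, power=0.85, one_sided=True):
    """Total N (both groups) to detect a mean difference `delta` given
    residual standard deviation `sd`, using a normal approximation.

    q is the fraction allocated to treatment (2:1 allocation -> 2/3).
    """
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha if one_sided else 1 - alpha / 2)
    z_beta = nd.inv_cdf(power)
    return (z_alpha + z_beta) ** 2 * sd ** 2 / (q * (1 - q) * delta ** 2)


def round_up_to_allocation(n, block=3):
    """Round N up to a multiple of `block` so that a 2:1 split yields
    whole numbers of subjects in each group."""
    return block * math.ceil(n / block)


# Hypothetical planning values, for illustration only.
n_exact = total_n(delta=0.2, sd=0.4)
print(n_exact, round_up_to_allocation(n_exact))

# Doubling the SD, or halving the difference, quadruples the required N.
print(total_n(delta=0.2, sd=0.8) / n_exact)
print(total_n(delta=0.1, sd=0.4) / n_exact)
```

Under this approximation N scales with the square of the SD and with the inverse square of the difference, which is why a modestly larger RMSE or a modestly smaller absolute difference at 24 months translates into a noticeably larger N.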
In practice it might be more appropriate to consider a larger difference between groups at 24 than at 12 months. For example, if an effective treatment actually stabilized the level of C-peptide over 2 years, then owing to the progressive decline in the control group, there should be a larger difference between groups at 24 than at 12 months that would lead to the requirement for a smaller sample size.
The results also depended on age, stratified herein as 8-12 years, 13-17 and 18 and older at diagnosis. The residual variation among those 13-17 years was substantially higher than that in the other age categories, perhaps because they are peripubertal. Thus, methods are described to compute sample size for a study with specific planned fractions of subjects in these age categories.
It may also be prudent to consider different effect sizes within the age strata. Comparing the inverse mean values within the age strata at 12 months versus 24 months (Table 3), the rate of decline in those 18 and above is less than that in the other categories. Thus, a treatment that stabilizes the level of C-peptide over 2 years would have a smaller treatment effect among those 18 and above because the control group would be falling at a lower rate. This could readily be addressed by using a smaller difference in this age category when conducting an age specific computation as shown in the sub-section on Statistical Computations.
The TrialNet anti-CD20 study showed a statistically significant beneficial effect (p = 0.02) of rituximab versus placebo on the pre-specified log(x+1) C-peptide at 12 months [6]. Additional analyses presented herein show that the differences in the log(x) and √x, but not the raw, values were also statistically significant, more so for the log(x). While the log(x) violated the common variance assumption based on White's test being significant (Table 2), other non-parametric or robust tests not requiring that assumption were also significant. Such a test might be preferred if it is decided to use the log(x) values in the analysis of a study.

*These analyses are based on the intention-to-treat cohort that includes 29 placebo-treated subjects who met the defined criteria. The results using the log(x+1) values are identical to those that appeared in the primary study manuscript [6]. doi:10.1371/journal.pone.0026471.t007
The optimal transformation may also differ for other methods of analysis. For example, a secondary analysis of the anti-CD20 study assessed the difference between groups in the average rate of decline (or slope) in the C-peptide values over time [6]. Biologically, a constant percentage decline per year in C-peptide would be expected [3], corresponding to the rate of decline in β-cell mass. This constant percentage decline implies that the slope of the log(x) values is constant over time, or that the log C-peptide is a linear function of time with coefficient b, and the percentage change in C-peptide per year is estimated as 100(exp(b) − 1). Neither an analysis of the log(x+1) nor the √x values would have this interpretation. On this basis, the log(x) values were employed in the slope analysis presented in the published report [6]. This analysis used a random coefficient model [17] allowing a unique rate of change in log C-peptide over time for each subject with an estimate of the mean slope within each treatment group. The mean percentage decline in the rituximab group was significantly less than that with placebo (38 versus 56% per year, p = 0.027). However, had the analysis been done using the raw, log(x+1) or √x values, none would have approached significance (p > 0.155 for all).
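The relationship between a slope b on the log(x) scale and the percentage change per year can be made concrete with a small Python sketch. The helper names are hypothetical. The slopes it produces for the reported 38% and 56% declines (roughly -0.48 and -0.82 per year) are back-calculated from those percentages for illustration and are not estimates reported in the study.

```python
import math


def pct_change_per_year(b):
    """Percentage change per year implied by slope b of log C-peptide on time
    (in years): 100 * (exp(b) - 1)."""
    return 100.0 * math.expm1(b)


def slope_from_pct_change(pct):
    """Inverse relationship: slope on the log scale for a given yearly
    percentage change (a decline is entered as a negative percentage)."""
    return math.log1p(pct / 100.0)


# Back-calculated slopes for the reported yearly declines (illustration only).
for decline in (38.0, 56.0):
    b = slope_from_pct_change(-decline)
    print(decline, round(b, 3), round(pct_change_per_year(b), 1))
```

Because the relationship is multiplicative, a constant slope b implies that the same fraction exp(b) of the previous year's level remains after each year, regardless of the starting level.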
It should also be noted that there are many other possible transformations that might be employed. Among the most general is the family of Box-Cox power transformations [18,19] that can often transform a set of quantitative values to a near normal distribution. Such transformations are often used to promote a strongly linear association among variables on the transformed scales. Rarely, however, are such transformations used for inferences about the underlying mean values, as is the focus herein.
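For orientation, the Box-Cox family can be written as (x^λ − 1)/λ for λ ≠ 0 and log(x) for λ = 0, so that the log and square-root transformations considered herein correspond, up to a linear rescaling, to λ = 0 and λ = 0.5. The following Python sketch illustrates this under the assumption of strictly positive values. The function name is hypothetical, and the sketch does not estimate λ from data.

```python
import math


def box_cox(x, lam):
    """Box-Cox power transformation of a strictly positive value x.

    lam = 0 gives log(x); lam = 0.5 gives 2 * (sqrt(x) - 1), a linear
    rescaling of the square-root transformation; lam = 1 gives x - 1.
    """
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


# Hypothetical AUC mean values in pmol/ml, for illustration only.
for x in (0.2, 0.6, 1.5):
    print(x, box_cox(x, 0), box_cox(x, 0.5), box_cox(x, 1))
```

As λ approaches 0 the power form converges to log(x), which is why the log transformation is treated as a member of the same family.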
Clearly, the results herein largely apply to a population of subjects recruited in North America. Whether they apply to studies conducted in other populations is unknown. However, the distributions of C-peptide values obtained from a cross-sectional study of the properties of a mixed meal versus glucagon stimulation test conducted in North America were similar to those of an identical study conducted in Europe [4], despite the fact that different central laboratories were employed in each study. Further, it is remarkable that consistent patterns of change in C-peptide over time have been observed in the control groups of studies conducted in different populations [6,7,20–23].
In conclusion, these TrialNet studies support the need to employ a transformation in the analysis of C-peptide values over time in therapeutic studies of new onset type 1 diabetes. The patterns of variation differ after 12 months and 24 months, and among age categories. However, it is possible to fine-tune the design of a study in a manner that allows for these factors.

Supporting Information
Supplement S1 Members of the Type 1 Diabetes Trial Network. (DOCX)