Prognostic factors for changes in the timed 4-stair climb in patients with Duchenne muscular dystrophy, and implications for measuring drug efficacy: A multi-institutional collaboration

The timed 4-stair climb (4SC) assessment has been used to measure function in Duchenne muscular dystrophy (DMD) practice and research. We sought to identify prognostic factors for changes in 4SC, assess their consistency across data sources, and evaluate the extent to which prognostic scores could be useful in DMD clinical trial design and analysis. Data from patients with DMD in the placebo arm of a phase 3 trial (Tadalafil DMD trial) and two real-world sources (Universitaire Ziekenhuizen, Leuven, Belgium [Leuven] and Cincinnati Children's Hospital Medical Center [CCHMC]) were analyzed. One-year changes in 4SC completion time and velocity (stairs/second) were analyzed. Prognostic models included age, height, weight, steroid use, and multiple timed function tests and were developed using multivariable regression, separately in each data source. Simulations were used to quantify impacts on trial sample size requirements. Data on 1-year changes in 4SC were available from the Tadalafil DMD trial (n = 92), Leuven (n = 67), and CCHMC (n = 212). Models incorporating multiple timed function tests, height, and weight significantly improved prognostic accuracy for 1-year change in 4SC (R²: 29%-36% for 4SC velocity, and 29%-34% for 4SC time) compared to models including only age, baseline 4SC, and steroid duration (R²: 8%-17% for 4SC velocity and 2%-13% for 4SC time). Measures of walking and rising ability contributed important prognostic information for changes in 4SC. In a randomized trial with equal allocation to treatment and placebo, adjustment for such a prognostic score would enable detection (at 80% power) of a treatment effect of 0.25 stairs/second with 100–120 patients, compared to 170–190 patients without prognostic score adjustment. Combining measures of ambulatory function doubled prognostic accuracy for 1-year changes in 4SC completion time and velocity.
Randomized clinical trials incorporating a validated prognostic score could reduce sample size requirements by approximately 40%. Knowledge of important prognostic factors can also inform adjusted comparisons to external controls.

Introduction
Duchenne muscular dystrophy (DMD) is a progressive, debilitating neuromuscular disorder occurring in approximately 15.9 to 19.5 per 100,000 live births, based on estimates from the United States and United Kingdom [1,2]. DMD results in progressive muscle-wasting, loss of ambulation during adolescence, and death by early adulthood [3]. Over the last decade, substantial progress has been made to identify and develop targeted therapies to treat the underlying cause of DMD [4]. However, although several drugs [5][6][7][8][9] have been tested in late-phase placebo-controlled trials, conditional or full regulatory approval has thus far been secured for only two novel drugs in the ambulatory setting: ataluren for nonsense mutation DMD [10], and eteplirsen for exon 51 skippable mutations [11].
A major challenge in the design of DMD clinical trials, and in the interpretation of their results, is heterogeneity in patients' rates of disease progression [5,6,8,9]. To address this, investigators have sought to enrich trial populations for patients with less variable prognoses over the duration of a trial. Such enrichment efforts have been implemented through the use of selected patient characteristics, in particular age, duration of steroid use and baseline performance on the primary functional endpoint, to define inclusion/exclusion criteria and to stratify randomization in clinical trials. However, even in trials incorporating these approaches, observed variability in functional outcomes remains large, spanning improvement in function for some patients to complete loss of function in others.
At the same time, evidence has emerged that additional patient characteristics, particularly those measuring different aspects of ambulatory function, have additive value as important prognostic factors. Performance on the timed rise from supine has been associated with changes on the six minute walk distance (6MWD) and disease progression in general [12,13]. In addition, our earlier work has demonstrated that a composite score based on multiple baseline measures of function more than doubled prognostic accuracy for changes in 6MWD compared to predictions based on age, baseline 6MWD, and steroid use [12,14].
Use of prognostic factors to inform clinical trial design is well-established in many therapeutic areas, including, for example, cardiovascular disease [15], renal disease [16], and oncology [17] trials. The use of prognostic enrichment has also been noted as a potentially useful strategy for clinical trials in guidance issued by the Food and Drug Administration (e.g., see sections IV and V of [18]). The stronger the prognostic factors, the higher their value for trial design in terms of increasing power and decreasing sample size requirements. Better understanding of prognostic factors could be especially valuable for trials in DMD due to challenges associated with heterogeneity in patients' progression over time and the limited patient populations available to participate in trials.
Since DMD is characterized by progressive deficits in muscle strength, the timed 4-stair climb (4SC) has been used as an important assessment in clinical practice and in clinical trials [5-7, 9, 19]. In the present study we assess prognostic factors for 1-year changes in 4SC performance, develop a preliminary composite prognostic score, and assess consistency of prognostic performance across data sources. We also quantify the impacts that use of the prognostic score would have on sample size requirements in clinical trials. This study was conducted within the collaborative Trajectory Analysis Project (cTAP), a pre-competitive collaboration of drug developers, clinical experts and registries, patient advocacy groups, and data scientists engaged in research to improve drug evaluations in DMD.

Data sources and study population
This study used data shared with cTAP from three different sources: placebo arm data from Eli Lilly's phase 3 trial of tadalafil in patients with DMD (NCT01865084), and real-world data from DMD centers in Leuven, Belgium (Leuven) and at the Cincinnati Children's Hospital Medical Center (CCHMC) in Cincinnati, Ohio. At the time of this study, these constituted all data sources accessed by cTAP with available data on the 4SC test.
Tadalafil DMD trial. The Tadalafil DMD Study Group Trial (Tadalafil DMD Trial) was a randomized, double-blind, placebo-controlled phase 3 trial of tadalafil, which enrolled ambulatory boys with DMD aged 7 to 14 years, who had at least 6 months of steroid use prior to trial recruitment, and baseline 6MWD between 200 and 400 meters [5]. Patients were randomly assigned to receive placebo, low dose tadalafil (0.3 mg/kg), or high dose tadalafil (0.6 mg/kg) daily for 48 weeks [5]. Trial assessments, including timed function tests, occurred every 12 weeks. Only those patients randomized to placebo were included in the present study. The study protocol and consent forms for the use of data from Eli Lilly's phase 3 trial of tadalafil in patients with DMD were approved by the institutional review/ethics boards at each participating medical center and conducted in accordance with the Declaration of Helsinki and other international ethics guidelines. The phase 3 trial data used in this study were de-identified and thus no institutional review board (IRB) approval was required for the de-identified data used in this analysis.
Leuven. Data were collected from boys diagnosed with DMD who were routinely monitored in the pediatric neurology clinic at Universitaire Ziekenhuizen in Leuven, Belgium. This ongoing study has been approved by Ethische Commissie Onderzoek, the Ethics Committee of the University Hospitals Leuven, and was conducted in accordance with the Declaration of Helsinki. Written consent from the guardians of each participant was obtained. The database available for the present study included 158 boys with DMD and clinic visits as recent as February 2017. Assessments of timed function tests occurred approximately every 6 months, and included the timed 4SC, 10-meter walk/run, and rise from supine, as well as concurrent assessments of height, weight, and steroid use.
CCHMC. Natural history data were obtained from electronic medical records of patients with DMD receiving care at CCHMC. The data were fully de-identified and were collected under an IRB-approved clinic registry, CCHMC IRB #1, which captures clinical data from clinic visits with informed consent from patients and caregivers to participate in the clinic registry at CCHMC. The database available for the present study spanned the years 2003 to 2016 and included 480 boys diagnosed with DMD. Functional assessments occurred every 6 months to 1 year, and included timed 4SC, timed 30-foot walk/run, and timed sit to stand.

Study measures
Outcomes. The primary outcome measures in this study were the annualized changes in 4SC velocity (Δ4SC velocity) and 4SC completion time (Δ4SC time) over an approximately 1-year period (8 to 16 months). While 4SC completion time (recorded in seconds taken to complete the test) may be more clinically interpretable, 4SC velocity (measured in stairs/second) may have desirable statistical properties. It was not the purpose of this study to assess the relative statistical value of these measurement scales. Both have been considered in planned analyses of DMD clinical trials. Both measures were investigated here to assess sensitivity of prognostic factors to choice of 4SC measurement scale. Annualized changes in 4SC were calculated by dividing the change observed between two assessments by the elapsed time (in years) between those assessments. Patients who could not complete the 4SC due to loss of ability to climb stairs had their velocities set to zero and their completion times set to 12 seconds; completion times exceeding 12 seconds were also truncated to 12 seconds, as this reflects the typical maximum recording of 4SC completion times in real-world clinical practice. Sensitivity analyses using a threshold of 30 seconds, which has been used as the maximum recording time in some clinical trials [6,7], were also conducted. The analyses on the velocity scale also serve as a sensitivity analysis for the issue of truncation of completion times, since, on the velocity scale, larger completion times are translated into smaller velocities approaching zero. The current study is not intended to recommend appropriate cut-off times for clinical trials or clinical practice, but rather to assess the sensitivity of prognostic factors to different cut-off times. 4SC assessments that were not performed for other reasons (e.g., bone fracture) were considered missing.
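The outcome definitions above can be illustrated with a minimal sketch. It is not the study's analysis code: the `Assessment` record, the function names, and the computation of velocity as 4 stairs divided by the recorded (untruncated) completion time are assumptions made here for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

N_STAIRS = 4  # stairs climbed in the 4SC test


@dataclass
class Assessment:
    """One 4SC assessment (hypothetical record layout, for illustration)."""
    years: float             # visit time in years from any fixed origin
    time_s: Optional[float]  # completion time in seconds; None if unable to climb


def time_and_velocity(a: Assessment, max_time_s: float = 12.0) -> Tuple[float, float]:
    """Apply the truncation rules; return (time in s, velocity in stairs/s)."""
    if a.time_s is None:
        # Loss of stair-climbing ability: ceiling time and zero velocity.
        return max_time_s, 0.0
    # Assumption: velocity is computed from the recorded, untruncated time.
    return min(a.time_s, max_time_s), N_STAIRS / a.time_s


def annualized_change(baseline: Assessment, outcome: Assessment,
                      max_time_s: float = 12.0) -> Tuple[float, float]:
    """Return (annualized change in time, annualized change in velocity)."""
    elapsed_years = outcome.years - baseline.years
    t0, v0 = time_and_velocity(baseline, max_time_s)
    t1, v1 = time_and_velocity(outcome, max_time_s)
    return (t1 - t0) / elapsed_years, (v1 - v0) / elapsed_years
```

Passing `max_time_s=30.0` would correspond to the 30-second threshold used in the sensitivity analyses.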

PLOS ONE
There were some differences across data sources in how 4SC times were assessed and recorded, and in availability of data on loss of ability to climb stairs. In Leuven and the Tadalafil DMD Trial placebo arm, task completion occurred when the patient's second foot reached the last step, whereas in CCHMC task completion occurred when the patient's leading foot reached the last step. Additionally, the inability to complete the 4SC due to the loss of ability to climb stairs was not captured in clinical practice data available from CCHMC. Consequently, the 4SC time and velocity data from CCHMC reflects data only from patients who were able to complete the test. In contrast, the inability to complete the 4SC test was explicitly recorded at all visits in Leuven and the Tadalafil DMD Trial placebo arm.
Prognostic factors. In all three data sources prognostic factors assessed included demographics and vitals (age, height, weight, body mass index [BMI]), duration of steroid use (≥1 year vs. <1 year), current deflazacort use (yes/no), and timed function tests. Apart from 4SC, the available timed function tests differed across data sources. The 10-meter walk/run and timed rise from supine were available in Leuven and the Tadalafil DMD Trial; in CCHMC, the 30-foot walk/run and timed sit to stand were available in lieu of these two tests. Patients (in the Tadalafil DMD Trial and Leuven) who were unable to complete the 10-meter walk/run and rise from supine tests, or those who took longer than 12 seconds to complete, had their times set to 12 seconds for analysis. Velocities for these timed function tests were set to zero for patients who had lost the function being assessed by the test.

Study design
Changes in 4SC were studied over intervals of follow-up that were approximately 1 year in length (Fig 1). For each data source, we identified all intervals (i.e., pairs of clinic visits) meeting each of the following criteria: 1) the first clinic visit in the pair, defined as the baseline visit, had 4SC time < 12 seconds and information available for all prognostic factors of interest; 2) the baseline visit and a follow-up visit, defined as the outcome visit, were separated by approximately 1 year (8 to 16 months); 3) 4SC was assessed at both the baseline and outcome visits. Separate study samples were constructed in this fashion for each data source. Patients from the Tadalafil DMD Trial contributed one ~1-year interval each, corresponding to the approximately 1 year between the baseline and 48-week assessments in the trial. Patients from Leuven and CCHMC could have multiple ~1-year intervals meeting the aforementioned criteria. For these patients, all non-overlapping ~1-year intervals were included in the analyses.
The outcome visit for one interval was allowed to serve as the baseline visit for a subsequent interval, but further overlap was disallowed.
Δ4SC time = (4SC time at outcome visit - 4SC time at baseline visit) / time in years between outcome and baseline visits. Δ4SC time > 0 indicates worsened performance; Δ4SC time < 0 indicates improved performance.
Δ4SC velocity = (4SC velocity at outcome visit - 4SC velocity at baseline visit) / time in years between outcome and baseline visits. Δ4SC velocity > 0 indicates improved performance; Δ4SC velocity < 0 indicates worsened performance.
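The interval-selection criteria in this section can be sketched as follows. The text does not specify how a follow-up visit was chosen when several fell inside the 8 to 16 month window, so the rule used here (take the eligible visit closest to 12 months, then let that outcome visit serve as the next baseline) is an assumption, as are the `Visit` record and the function name.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Visit(NamedTuple):
    """One clinic visit (hypothetical record layout, for illustration)."""
    months: float      # visit time in months from any fixed origin
    has_4sc: bool      # 4SC was assessed at this visit
    baseline_ok: bool  # 4SC time < 12 s and all prognostic factors recorded


def select_intervals(visits: Sequence[Visit], lo: float = 8.0, hi: float = 16.0,
                     target: float = 12.0) -> List[Tuple[Visit, Visit]]:
    """Return non-overlapping (baseline, outcome) pairs roughly 1 year apart."""
    ordered = sorted(visits, key=lambda v: v.months)
    pairs: List[Tuple[Visit, Visit]] = []
    i = 0
    while i < len(ordered):
        base = ordered[i]
        if not (base.has_4sc and base.baseline_ok):
            i += 1
            continue
        # Follow-up visits with a 4SC assessment inside the allowed window.
        candidates = [j for j in range(i + 1, len(ordered))
                      if ordered[j].has_4sc
                      and lo <= ordered[j].months - base.months <= hi]
        if not candidates:
            i += 1
            continue
        # Assumed tie-break: the visit closest to the 12-month target.
        j = min(candidates,
                key=lambda k: abs(ordered[k].months - base.months - target))
        pairs.append((base, ordered[j]))
        i = j  # the outcome visit may serve as the next baseline
    return pairs
```

For a patient seen every 6 months from month 0 to month 24, this sketch yields two intervals, months 0 to 12 and months 12 to 24. The narrower 10 to 14 month window of the second sensitivity analysis corresponds to `lo=10.0, hi=14.0`.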

Statistical analyses
Baseline characteristics were summarized for each data source using means and standard deviations (SD) for continuous variables and counts and percentages for categorical variables. Observed Δ4SC velocity and Δ4SC completion times were also summarized for each data source.
Fitting prognostic models. Multivariable regression models for the Δ4SC velocity were fit in each data source to assess the importance of the candidate prognostic factors. Three models were fit: (1) a base model that contained only age, baseline 4SC velocity, and duration of steroid use; (2) an intermediate model that incorporated current deflazacort use and measures of walk/run and rising function, in addition to the factors in the base model; and (3) a full model that incorporated height, weight, and BMI, in addition to the factors included in the intermediate model. In all models, generalized estimating equations with an exchangeable covariance structure were used to account for repeated measures in Leuven and CCHMC. A similar set of models was fit for 4SC completion time in each data source, with the corresponding timed function test completion times replacing velocities used as baseline predictors in the intermediate and full models.
Assessing prognostic value. The prognostic value of each model was measured by calculating the root mean squared prediction error, which was computed as the SD of the difference between observed and model-predicted Δ4SC velocity. To visually assess the performance of each model, scatterplots of the observed versus model-predicted Δ4SC velocity were generated. R² values were also calculated to measure the percentage of variation in Δ4SC velocity explained by each model. Box plots of the observed Δ4SC velocity, stratified by quartiles of predicted change, were used to assess model calibration. Finally, contributions of predictors to the overall prognostic value of the model were assessed by estimating R² values for multivariable models obtained by separately adding each predictor to the base model, and separately removing each predictor from the full model. Because the rise from supine and 10-meter walk/run were not assessed in data from CCHMC, these predictors were replaced with sit to stand and 30-foot walk/run tests, respectively. Similar analyses were carried out for Δ4SC time.
Sensitivity analyses. Two sets of sensitivity analyses were conducted. First, the analyses were repeated with 4SC completion times for patients who had lost the ability to climb stairs set to 30 seconds. In this sensitivity analysis, patients with 4SC completion times exceeding 30 seconds had these times truncated to 30 seconds (rather than 12 seconds as in the primary analyses). Second, the models were re-fit using 1-year intervals defined based on visits 10 to 14 months apart rather than 8 to 16 months apart as in the primary analyses.
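The accuracy measures described under "Assessing prognostic value" can be sketched with the Python standard library. This is an illustrative sketch rather than the study's code: the function names are hypothetical, the sample (n - 1) SD is assumed for the prediction error, and R² is computed with the generic definition 1 - SS_res/SS_tot, which may differ in detail from the values reported for the fitted models.

```python
from statistics import fmean, stdev
from typing import List, Sequence


def prediction_error_sd(observed: Sequence[float],
                        predicted: Sequence[float]) -> float:
    """SD of (observed - predicted); the sample SD is assumed here."""
    return stdev([o - p for o, p in zip(observed, predicted)])


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Proportion of variation in the observed changes explained by the model."""
    mean_obs = fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def observed_mean_by_predicted_quartile(observed: Sequence[float],
                                        predicted: Sequence[float]) -> List[float]:
    """Mean observed change within each quartile of predicted change.

    Requires at least four observations; a simple numeric stand-in for the
    calibration box plots.
    """
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    n = len(order)
    groups = [order[k * n // 4:(k + 1) * n // 4] for k in range(4)]
    return [fmean(observed[i] for i in g) for g in groups]
```

For a well-calibrated model, the quartile means of the observed change would be expected to increase from the lowest to the highest quartile of predicted change.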
Impacts on trial sample size. Simulations were used to quantify the extent to which use of a prognostic score could reduce sample size requirements for randomized controlled trials in DMD. These simulations assumed that changes in 4SC velocity are normally distributed, hypothesized a treatment effect on mean Δ4SC velocity of 0.25 stairs/second, assumed a type I error rate of 0.05, and considered different scenarios for the level of variability of Δ4SC velocity (SD of Δ4SC velocity either 0.5, 0.6, or 0.7 stairs/second). For each scenario, power to detect the treatment effect was estimated for two possible statistical analyses: 1) an unadjusted analysis and 2) an analysis adjusting for a baseline prognostic score with an R² of 0.35.
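As a rough analytic cross-check on the scenarios described above, and not a reproduction of the simulations, the standard normal-approximation formula for comparing two means with equal allocation can be applied. The sketch assumes a two-sided test and assumes that adjusting for a baseline score with a given R² scales the residual variance by (1 - R²); the function name is hypothetical.

```python
import math
from statistics import NormalDist


def total_sample_size(effect: float, sd: float, r2: float = 0.0,
                      alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate total N (two equal arms) to detect a difference in means.

    effect: hypothesized treatment effect (stairs/second)
    sd:     SD of the outcome before adjustment
    r2:     R² of the baseline prognostic score (0 means unadjusted)
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided type I error (assumed)
    z_power = z.inv_cdf(power)
    residual_var = sd ** 2 * (1 - r2)   # variance left after adjustment
    n_per_arm = 2 * (z_alpha + z_power) ** 2 * residual_var / effect ** 2
    return 2 * math.ceil(n_per_arm)


unadjusted = total_sample_size(0.25, 0.6)         # SD scenario of 0.6
adjusted = total_sample_size(0.25, 0.6, r2=0.35)  # same scenario, adjusted
```

Under these assumptions, the SD = 0.6 scenario gives 182 patients without adjustment and 118 with adjustment, a reduction of about 35% that follows directly from the (1 - R²) factor.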

Baseline sample characteristics
Δ4SC velocity was available from 92 patients (contributing 92 ~1-year intervals) from the Tadalafil DMD Trial, 67 patients (235 ~1-year intervals) from Leuven, and 212 patients (543 ~1-year intervals) from CCHMC (Table 1). Baseline characteristics are summarized in Table 2. Patient demographics were generally similar across all three sources at the start of the ~1-year intervals analyzed. Duration of steroid use was longer in CCHMC and in the Tadalafil DMD Trial than in Leuven. Patients in CCHMC and Leuven predominantly received deflazacort at baseline; in the Tadalafil DMD Trial, steroid use at baseline was evenly split between deflazacort and prednisone. Patients in the Tadalafil DMD Trial had slightly worse function at baseline, on average, as indicated by their lower North Star Ambulatory Assessment (NSAA) total score and longer completion times (lower velocities) on the available timed function tests.
Observed Δ4SC velocity and Δ4SC time
4SC velocity declined over the ~1-year period in all three data sources: mean Δ4SC velocity was -0.06 stairs/second (SD = 0.65) in Leuven, -0.18 stairs/second (SD = 0.41) in the Tadalafil
Table 1. Sample selection in each data source.

intervals from the Tadalafil DMD Trial. As noted above, this information was not available in data from CCHMC.

Models for Δ4SC velocity and Δ4SC time
Δ4SC velocity. Results from the base and full multivariable models for Δ4SC velocity are presented in Table 3. Model coefficients and R² values from the intermediate model for Δ4SC velocity were similar to the full model and are shown in S1 Table. Overall, the base model explained 17% and 16% of the variation of Δ4SC velocity in Leuven and CCHMC patients, respectively. In these data sources, older age, longer duration of steroid use, and higher baseline 4SC velocity were associated with declines in 4SC velocity over the ~1-year period. Among patients from the Tadalafil DMD Trial, 8% of the variation in Δ4SC velocity was explained by the predictors included in the base model, with none of them being statistically significant.
In all three data sources, the full model, additionally including timed function tests (rise from supine, 10-meter walk/run, or sit to stand and 30-foot walk/run) and additional patient characteristics (current steroid use, height, weight, and BMI), explained a greater proportion of the variation in Δ4SC velocity (29% in the Tadalafil DMD Trial, 36% in Leuven, and 30% in CCHMC). In the full models fit in Leuven and CCHMC, longer duration of steroid use and greater velocities of 10-meter walk/run and rise from supine (or 30-foot walk/run and sit to stand) at baseline were associated with greater improvements in 4SC velocity. Higher 4SC velocity at baseline was associated with a decrease in Δ4SC velocity over the 1-year period. Scatterplots of the observed versus model-predicted Δ4SC velocity, and box plots of the observed Δ4SC velocity, stratified by quartiles of predicted change, illustrate the improvement in prediction of Δ4SC velocity with the addition of the prognostic factors in the full model (S1 and S2 Figs). Models were further evaluated by inspection of residuals plotted relative to predicted values, which were consistent with adequate model specification. In addition, use of generalized estimating equations rendered the models robust to potential non-normality of residuals. The contributions of each prognostic factor when removed from the full model, added separately to the base model, and studied in isolation are summarized in S2A-S2C Table. In general, the addition of velocities for 10-meter walk/run and rise from supine (or 30-foot walk/run and sit to stand in CCHMC), and of current deflazacort use, to the base model resulted in the greatest increases in R², while removing baseline 4SC velocity from the full model resulted in the largest decreases in R².
In a randomized trial with equal allocation to treatment and placebo arms, baseline adjustment for such a prognostic score would enable a treatment effect of 0.25 stairs/second on Δ4SC velocity to be detected with approximately 120 total patients, compared to 180 total patients without use of the prognostic score (at 80% power) (Fig 2). Power gains with adjustment for prognostic score under additional scenarios are shown in S3 and S4 Figs.
Δ4SC time. Results from the base and full multivariable models for Δ4SC time are presented in S3 Table. Overall, the base model explained 13% and 11% of the variation of Δ4SC time in patients from Leuven and CCHMC, respectively, with older age and longer duration of steroid use associated with worsening outcomes. In the Tadalafil DMD Trial, the base model explained 2% of the variation of Δ4SC time with none of the predictors being statistically significant. Results of the intermediate model for Δ4SC time are shown in S4 Table. Compared with the base model, the full model explained a greater proportion of the variation of Δ4SC time from all three data sources (29% in the Tadalafil DMD Trial, 34% in Leuven, and 34% in CCHMC). Associations of the predictors with Δ4SC time were generally very similar to those seen for Δ4SC velocity. The contributions of each prognostic factor when removed from the full model, added separately to the base model, and studied in isolation are summarized in S5A-S5C Table. In general, the addition of 10-meter walk/run, rise from supine, and current deflazacort use to the base model resulted in the greatest increases in R², while removing baseline 4SC completion time from the full model resulted in the largest decreases in R².
Sensitivity analyses. Results from both sets of sensitivity analyses were very similar to the primary analyses in all three data sources: R² values for the models in each set of analyses are summarized in S6A and S6B Table.

Discussion
This study identified prognostic factors for ~1-year change in 4SC completion time and velocity among patients with DMD. We found that timed function tests other than 4SC, measures of walking and rising ability in particular, along with height, weight, and BMI added significant prognostic value for change in 4SC, well beyond that provided by age, baseline 4SC, and steroid duration. The proportion of variation explained by the addition of these factors was approximately doubled across all data sources.
Recently, we assessed prognostic factors for 1-year change in 6MWD in an earlier study sample of patients from the Leuven database [12]. A key finding of that study was that a combination of multiple baseline measures of ambulatory function provided significantly enhanced prognostic accuracy compared with using only age, baseline 6MWD, and steroid use. These findings are echoed in the present study of 4SC outcomes, further highlighting the value of combining multiple measures of function into composite prognostic scores in DMD. In addition, consistent with the prior study of prognostic factors for 6MWD, combining multiple measures of baseline function increases prognostic value even as those baseline measures are well correlated with each other [12]. Both of these studies also found that patient age no longer carries significant prognostic value after already accounting for multiple measures of baseline function along with height and weight. This indicates that knowing how a patient performs on several different measures of ambulatory function is more important to their prognosis over the next year than knowing their age. While this may at first seem counter-intuitive for a progressive disease, the lower prognostic importance of age accords with the observed heterogeneity across patients in rates of progression of ambulatory function in DMD. Different patients progress at different ages [20], and different measures of function progress at different times, and thus combining several measures of ambulatory function provides a more accurate prognosis than relying on any single measure alone, or on age.
Inclusion criteria for clinical trials in ambulatory DMD have traditionally been defined based on three conventional prognostic factors: age, steroid use for at least a defined period of time (3 or 6 months), and the baseline value of the primary endpoint (e.g., 6MWD or 4SC) [5][6][7][8][21]. As the natural history of DMD has become better characterized, the ranges of these characteristics have varied, and generally narrowed [5,22], with the aim of defining a subset of patients with more homogeneous rates of progression in whom power to demonstrate treatment effects would be increased. In addition, primary analyses of clinical trials have typically adjusted for these conventional baseline factors (age, steroid use, and baseline assessments of the primary endpoint) in an effort to improve precision of estimated treatment effects. The present study is the second to demonstrate that composite prognostic scores can more than double the prognostic accuracy versus that provided by the conventional factors [12]. This indicates that composite prognostic scores for DMD present important opportunities for improving trial design and analyses. Many decisions in trial design are ultimately aimed at managing variation in outcomes or enriching for drug-modifiable patients. The more accurately one can predict outcomes, the better one can manage outcome variation and enrichment. Thus, these trial design decisions can be improved by use of an accurate prognostic score. For example, enrichment of a trial population (or a pre-specified subgroup) for patients with more homogeneous and drug-modifiable trajectories will be more precise when the inclusion/exclusion criteria used to define the population are based on thresholds of an accurate composite prognostic score, instead of thresholds of individual characteristics with substantially lower prognostic value.
When selecting baseline characteristics for stratifying randomization or for adjustment in statistical analyses of the treatment effect, it is usually not practical to stratify or adjust for more than 2 or 3 baseline characteristics. The stronger the prognostic factors that can be used, the greater the increase in power via stratification and adjustment. Use of a single composite prognostic score that already incorporates multiple baseline characteristics and doubles explained variation would be statistically superior to use of any combination of age, baseline 4SC, and steroid use.
Apart from parallel group randomized trials, a better understanding of prognostic factors for functional outcomes in DMD is needed to guide matched or adjusted comparisons to external control groups for single arm trials and long-term extension studies, and for augmentation of randomized placebo groups with external controls. A key step in the incorporation of external controls is mitigating the risk of bias due to differences in patient characteristics between non-randomized groups. It is usually not feasible to match on or adjust for all characteristics. Rather, the goal is to account for baseline differences in important prognostic factors. Thus, knowledge of important prognostic factors, or validated prognostic scores, is essential for making credible comparisons to external controls, and for avoiding the infeasibility or reductions in power that could arise from a desire to adjust for factors that are actually not prognostic.
This study has several strengths beginning with the inclusion of data from more than 350 boys diagnosed with DMD (n = 371) from three different sources. In addition, the general consistency of findings across data sources bodes well for the future construction of a well-validated prognostic score that performs well across data sources and can thus be reliably applied to clinical trials. In particular, despite known differences in specific measures between sites, such as differences in timed function test availability and assessment protocols, similar improvements in prognostic performance were observed when timed function tests were incorporated into the models. The use of different timed function tests in CCHMC in particular, i.e., 30 foot walk/run and sit to stand rather than 10 meter walk/run and rise from supine, qualitatively validates the models developed using the other data sources, and suggests that these tests are capturing aspects of patients' function that are biologically and statistically meaningful for prognosis.
In some cases, however, the differences across the data sources included in this study present some limitations. As discussed in the methods section, we did not have explicit recording of loss of stair climbing ability in the CCHMC data used for these analyses, and both real-world data sources seemed to halt timed function tests earlier than observed in the clinical trial placebo arm. It should also be noted that the Tadalafil DMD Trial imposed stricter inclusion criteria of baseline 6MWD between 200 and 400 meters and >6 months stable steroid use [5], whereas the study samples for the other data sources required only 4SC time < 12 seconds. This is reflected by patients in the Tadalafil DMD Trial having worse performance on baseline functional measures, and a larger average decline in 4SC over the 1-year period, compared with the other sources. The real-world data sources also included more heterogeneous intervals of time between visits than the clinical trial. We conducted a number of sensitivity analyses to account for these differences across data sources, including use of different 4SC cutoffs (12 and 30 seconds), studying both 4SC completion time and velocity, and considering different windows of follow-up time to approximate 1-year change (8-16 months and 10-14 months). Our main findings were robust to these changes.
This study has demonstrated that composite prognostic scores for 4SC are likely to be feasible and to be important for clinical trial design, and that development of a consensus-based composite prognostic score is warranted. Additional prognostic factors should also be investigated, and might include other functional measures such as the NSAA [23][24][25] and measures of muscle strength, which have been correlated with functional status [26]. Imaging assessments of bone density and changes in lean body mass [27], cardiac and pulmonary measures [28,29], dystrophin genotypes, genetic modifiers and other biomarkers could also be investigated in future analyses. Use of a prognostic score in clinical trial design and interpretation requires several steps beyond those conducted in the present study. Most importantly, a consensus score, ideally based on multiple data sources, will need to be developed and validated, and achieve clinical acceptance. In addition, the small (but not negligible) effort that it would take to calculate and interpret the score will need to be viewed as worthwhile relative to the gains that can be achieved in power and/or lower sample size requirements. Research to address these steps is currently being conducted within cTAP.
Supporting information S1