Which and How Many Patients Should Be Included in Randomised Controlled Trials to Demonstrate the Efficacy of Biologics in Primary Sjögren’s Syndrome?

Objective The goal of this study was to determine how the choice of the primary endpoint influenced sample size estimates in randomised controlled trials (RCTs) of treatments for primary Sjögren’s syndrome (pSS). Methods We reviewed all studies evaluating biotechnological therapies in pSS to identify their inclusion criteria and primary endpoints. Then, in a large cohort (ASSESS), we determined the proportion of patients who would be included in RCTs using various inclusion criteria sets. Finally, we used the population of a large randomised therapeutic trial in pSS (TEARS) to assess the impact of various primary objectives and endpoints on estimated sample sizes. These analyses were performed only for the endpoints indicating greater efficacy of rituximab compared to the placebo. Results We identified 18 studies. The most common inclusion criteria were short disease duration; systemic involvement; high mean visual analogue scale (VAS) scores for dryness, pain, and fatigue; and biological evidence of activity. In the ASSESS cohort, 35 percent of patients had recent-onset disease (<4 years), 68 percent systemic manifestations, 68 percent high scores on two of three VASs, and 52 percent biological evidence of activity. The primary endpoints associated with the smallest sample sizes (n<200) were a VAS dryness score improvement ≥20 mm by week 24 or variable improvements (10, 20, or 30 mm) in fatigue VAS by week 6 or 16. For patients with systemic manifestations, the ESSDAI change may be the most logical endpoint, as it reflects all domains of disease activity. However, the ESSDAI did not improve significantly with rituximab therapy in the TEARS study. Ultrasound score improvement produced the smallest sample size estimate in the TEARS study. Conclusion This study provides valuable information for designing future RCTs on the basis of previously published studies.
Previous RCTs used inclusion criteria that selected a small part of the entire pSS population. The endpoint was usually based on VASs assessing patient complaints. In contrast to VAS dryness cut-offs, VAS fatigue cut-offs did not affect estimated sample sizes. SGUS improvement produced the smallest estimated sample size. Further studies are required to validate standardised SGUS modalities and assessment criteria. Thus, researchers should strive to develop a composite primary endpoint and to determine its best cut-off and assessment time point.

Introduction
Primary Sjögren's syndrome (pSS) is a chronic autoimmune disorder characterised by dryness of the eyes (xerophthalmia) and mouth (xerostomia); salivary gland lesions; and the presence of autoantibodies including anti-SSA, anti-SSB, and/or rheumatoid factor. The prevalence of pSS ranges from less than 0.1 to 1 percent [1], and adult women are predominantly affected. The manifestations are disabling symptoms due to ocular and oral dryness combined with fatigue [2] and severely impaired quality of life [3,4]. Diffuse pain and fibromyalgia are also present in 5 percent of pSS patients [2,5], as seen in systemic lupus erythematosus (SLE). In addition, 5 percent to 50 percent of patients have systemic manifestations consisting chiefly of rheumatologic, neurologic, pulmonary, hematologic, and renal disorders [6]. B-cell hyperactivity is the hallmark of the disease, and the presence of germinal centres in the salivary glands predicts the development of lymphoma [7,8]. Thus, the presentation of pSS varies to an extraordinary extent across patients and over time, and key symptoms are partly assessed using subjective tests. These characteristics of pSS raise major challenges when designing studies to assess treatment responses. To date, no systemic treatment has been proven effective in altering the course of pSS [9]. The recent development of several monoclonal antibodies and new insights into the pathophysiology of pSS have provided opportunities for evaluating new treatment targets, leading to a dramatic increase in the number of randomised controlled trials (RCTs) in pSS. The first RCTs assessed TNF alpha antagonists and failed to demonstrate efficacy [10,11]. More recently, studies focussed on B cells [12], which play a central role in the development of pSS [13-15], and on other targets such as interleukin-6 and CTLA-4 [16,17].
Preliminary open-label studies of the safety and efficacy of biologics produced encouraging results [18-20] and were followed by small RCTs, which indicated efficacy in improving visual analogue scale (VAS) scores for fatigue and dryness, as well as stimulated whole salivary flow [21,22]. However, in the TEARS trial [23] (Tolerance and EfficAcy of Rituximab in primary Sjögren's syndrome), a large multicentre double-blind RCT in patients with active recent and/or systemic pSS, rituximab failed to significantly improve the primary endpoint versus a placebo at week 24, although a significant improvement was noted at week 6. Clearly, it would be useful to determine the minimal clinically important differences for endpoints used to evaluate treatments. There is also a need to estimate the sample sizes required for future studies of pSS according to the primary endpoint [24].
The goal of this study was to determine how the choice of the primary endpoint influenced sample size estimates in RCTs of treatments for pSS. We reviewed the inclusion criteria and primary endpoints used in published RCTs, and we analysed TEARS study results to evaluate how changes in these criteria and endpoints affected the sample size required for future RCTs.

Study Design
We reviewed the literature to identify the most widely used inclusion criteria for RCTs of biologics in pSS. We then applied those criteria to the ASSESS cohort [15], a recent prospective cohort of patients with well-established pSS, to determine the proportions of patients who would have been considered eligible for the RCTs. Finally, we conducted a post hoc analysis of TEARS trial data to evaluate how the primary endpoint affected the required sample size.

Literature Review
We used MeSH terms to search PubMed, EMBASE, and ClinicalTrials.gov for trials of biologics in pSS published or registered between 2000 and 2014. We considered all trials for which the inclusion criteria and primary endpoint were clearly defined.

Study Populations
We applied various inclusion criteria sets to the ASSESS (Assessment of Systemic Signs and Evolution in Sjögren's Syndrome) cohort [15], a multicentre prospective cohort of patients with well-established pSS created in 2006 to identify factors predicting lymphoma during a 5-year prospective follow-up. All patients gave their written informed consent to participation in the study, which was approved by the appropriate ethics committee. To ensure that the study population would be representative of the entire population with pSS, consecutive patients fulfilling American-European Consensus Group (AECG) criteria for pSS were enrolled. Fifteen centres recruited 395 patients. At baseline, median (25th-75th percentile) disease duration was 5 (2-9) years, median EULAR Sjögren's Syndrome Disease Activity Index (ESSDAI) [25,26] was 2 (0-7.0), and median EULAR Sjögren's Syndrome Patient Reported Index (ESSPRI) [27]
To evaluate the effect of the primary endpoint on the number of eligible patients, we performed a post hoc analysis of data from the TEARS study, a large double-blind RCT comparing the efficacy of rituximab to a placebo in pSS [23]. All patients met AECG criteria [28] and had active disease defined as values ≥50/100 mm for at least two of four VASs evaluating dryness, pain, fatigue, and global disease, respectively. Eligibility criteria were either recent disease (<10 years since symptom onset) with biological activity (anti-SSA or rheumatoid factor, cryoglobulinaemia, hypergammaglobulinaemia, elevated β2-microglobulinaemia, or hypocomplementaemia) or systemic pSS defined as at least one extra-glandular manifestation. The primary endpoint was a 30-mm improvement from week 0 to week 24 on at least two of the four VAS scores.
Secondary endpoints included improvement from baseline to week 24 in each of the four VAS scores; the ESSDAI [25,26]; basal salivary flow rate; salivary-gland ultrasound (SGUS) grade [19]; Schirmer's test results; van Bijsterveld scores; Chisholm grade; and laboratory variables (C-reactive protein and erythrocyte sedimentation rate; rheumatoid factor; antinuclear antibodies; serum IgG, IgA, and IgM levels; serum complement; cryoglobulinaemia; and serum level of B-cell-activating factor). A substudy was performed in a single centre, where 28 patients underwent B-mode and Doppler ultrasonography of the parotid and submandibular glands for assessments of echostructure and vascularisation. The 122 patients were recruited at 14 university hospitals and randomly assigned in a 1:1 ratio to blinded treatment with intravenous rituximab infusions (1 g) or placebo at weeks 0 and 2. Among them, 24 had recent-onset pSS, 31 systemic pSS, and 67 both.

Statistical Analysis
We described the inclusion criteria and endpoints used in published RCTs of treatments for pSS. We then used the S1 File (see Supporting Information) to separate the ASSESS cohort patients into groups based on whether they met the main inclusion criteria used in published RCTs, to evaluate the proportion of patients who would have been considered eligible.
We estimated the sample sizes required to obtain 80 percent power for detecting each of the TEARS study endpoints, using Epi Info 7 (method based on the Fleiss formula) [29]. The estimates were based on the SAS data set in the S2 File (see Supporting Information), which contains the imputed values used for the analysis; in this data set, labels explain the variable names, the SAS code allows the results to be reproduced, and titles identify the results concerned. These analyses were performed only for the endpoints indicating greater efficacy of rituximab compared to the placebo.
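To make the calculation concrete, the following is a minimal sketch of a Fleiss-type sample size formula for comparing two proportions with equal allocation, as in the 1:1 TEARS randomisation. It is not the Epi Info 7 implementation: the function name `fleiss_n_per_group`, the two-sided alpha of 0.05, and the optional continuity correction are assumptions made for illustration, and the proportions in the usage lines are hypothetical rather than TEARS results.

```python
import math
from statistics import NormalDist


def fleiss_n_per_group(p1, p2, alpha=0.05, power=0.80, continuity=True):
    """Patients needed PER GROUP to detect p1 versus p2 (equal group sizes).

    Assumptions (not stated in the text): two-sided test at level alpha,
    optional Fleiss continuity correction.
    """
    if not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("proportions must lie strictly between 0 and 1")
    if p1 == p2:
        raise ValueError("proportions must differ")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-sided
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power

    p_bar = (p1 + p2) / 2          # pooled proportion under equal allocation
    delta = abs(p1 - p2)           # difference to detect

    # Uncorrected per-group sample size.
    n = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2 / delta ** 2

    if continuity:
        # Fleiss continuity correction for equal group sizes.
        n = n / 4 * (1 + math.sqrt(1 + 4 / (n * delta))) ** 2

    return math.ceil(n)


# Hypothetical responder rates, for illustration only.
per_group = fleiss_n_per_group(0.60, 0.40)
print("per group:", per_group, "total:", 2 * per_group)
```

The function returns a per-group count; the total for a two-arm trial is twice that value. Because the required size scales with the inverse square of the difference between the two proportions, endpoints with a small between-group difference quickly lead to very large estimates.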

Inclusion Criteria Used in Previous Studies
Among 147 publications identified in PubMed between 2001 and 2014 and two ongoing studies identified on ClinicalTrials.gov, using the search term "sjogren's syndrome" with the limit "clinical trial", we identified 17 studies evaluating any biologic in pSS (Fig 1).
Most of them evaluated TNF alpha antagonists, abatacept, or rituximab [10,11,18-23,30-38]. An open-label study tested epratuzumab in patients with B-cell overactivity, but no RCT is available for this drug. An open-label design was used in 8 studies (Table 1). Nine studies were published or ongoing RCTs (Table 2). One unblinded non-randomised trial evaluating the efficacy of rituximab versus synthetic disease-modifying antirheumatic drugs in two centres [30] was excluded due to the absence of randomisation. All preliminary open-label studies suggested efficacy of the evaluated biologics other than etanercept. Five open-label studies and three RCTs evaluated rituximab; another study of this drug is ongoing [31]. More recently, studies evaluated belimumab [32] and abatacept [18]. A large RCT is currently evaluating tocilizumab, for which no open-label data are available. All these studies used a combination of objective and subjective inclusion criteria. The most commonly used inclusion criteria were those in the AECG classification. In addition, 4 of the 8 open-label studies used the presence of autoantibodies, 3 used VAS scores, and 3 used systemic manifestations. In RCTs (Table 2), all inclusion criteria were based on the AECG classification, with salivary gland biopsy abnormalities, autoantibodies, or salivary flow rate impairment. Composite inclusion criteria were used in some RCTs; they were based on systemic manifestations in 5/9 studies, VAS score elevation in 5/9 studies, recent disease onset in 2/7 studies, and biological activity in 2/7 studies. Systemic manifestations have been used more often since 2011, probably due to the introduction of the ESSDAI.
In summary, the main inclusion criteria were a short disease duration (<4, 5, or 10 years); systemic involvement (ESSDAI ≥1); VAS scores ≥5/10 for dryness, pain, and fatigue; and biological activity markers (hypergammaglobulinaemia and/or cryoglobulinaemia, and/or high β2-microglobulinaemia, and/or low C4). None of the studies assessed the presence of germinal centres or number of foci in salivary-gland biopsies [39,40] or the SGUS [41].

Application of Inclusion Criteria to the ASSESS Cohort
Of the 395 patients included in the ASSESS cohort, 342 (87 percent) had the data needed to assess the presence of the main inclusion criteria used in studies of biologics in pSS. At least two VAS scores were ≥50/100 in 233/342 (68 percent) patients, and 232 (68 percent) patients had systemic manifestations with an ESSDAI ≥2 (Fig 2). Only 35 percent (121/342) of patients had recent-onset disease. Requiring symptom onset within the last 4 years, systemic disease, at least two of three VAS scores ≥50/100, and biological activity would result in the inclusion of only 30/342 (9 percent) patients. The combination of recent-onset or systemic pSS with at least two of three VAS scores ≥50/100 and biological activity would include 100/342 (29 percent) patients.
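The two combined criteria sets described above differ only in whether recent onset and systemic disease are both required or either suffices. The sketch below illustrates that logic and rechecks the reported percentages from the published counts. The `Patient` record, its field names, and the function names are hypothetical and do not reflect the layout of the S1 File.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Patient:
    """Hypothetical per-patient flags for the four main inclusion criteria."""
    recent_onset: bool         # symptom onset within the last 4 years
    systemic: bool             # systemic manifestations
    high_vas: bool             # at least two of three VAS scores >= 50/100
    biological_activity: bool  # at least one biological activity marker


def strict_eligible(p: Patient) -> bool:
    """All four criteria required (the most restrictive set)."""
    return p.recent_onset and p.systemic and p.high_vas and p.biological_activity


def relaxed_eligible(p: Patient) -> bool:
    """Recent onset OR systemic disease, plus high VAS and biological activity."""
    return (p.recent_onset or p.systemic) and p.high_vas and p.biological_activity


def percent(count: int, total: int) -> int:
    """Percentage rounded to the nearest integer."""
    return round(100 * count / total)


def eligible_percent(patients: Iterable[Patient],
                     rule: Callable[[Patient], bool]) -> int:
    """Share of patients (in percent) satisfying an eligibility rule."""
    cohort = list(patients)
    return percent(sum(1 for p in cohort if rule(p)), len(cohort))


# Recheck of the percentages reported in the text from the published counts.
for label, count in [("high VAS", 233), ("systemic", 232), ("recent onset", 121),
                     ("strict set", 30), ("relaxed set", 100)]:
    print(f"{label}: {count}/342 = {percent(count, 342)} percent")
```

Any patient who meets the strict set also meets the relaxed set, so the relaxed proportion can never be smaller, which is consistent with the 9 percent and 29 percent reported.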

Sample Sizes According to TEARS Study Endpoints
In the TEARS study [23], one primary and several secondary endpoints were used to compare the efficacy of rituximab and of a placebo. The primary endpoint was an at least 30-mm improvement at week 24 in at least two of four VAS scores for fatigue, dryness, pain, and global disease. The proportion of patients who achieved the primary endpoint was not significantly different between the rituximab and placebo groups. However, several other endpoints were significantly better with rituximab. Thus, at least two of three VAS scores improved by more than 30 mm by week 6. The VAS dryness score was significantly improved at week 24 and the SGUS score was improved at week 24. Higher proportions of improved patients were found in the rituximab group for the VAS fatigue and global disease scores and for the ESSPRI. We computed the proportions of patients with improvements in the four VAS scores and in the ESSPRI or SGUS score defined using various cut-offs, and we determined the sample sizes required to demonstrate differences during future RCTs at weeks 6, 16, and 24 (Table 3). At week 24, the greatest difference between the placebo and rituximab groups was for the VAS dryness score.
Using cut-offs of VAS changes of 10, 20, or 30 mm induced large changes in the estimated sample size. A cut-off ≥20 mm increased the sample size from 132 to more than 300 at week 24 compared to a cut-off of 10 mm. In contrast, the VAS fatigue score cut-off did not influence sample size, which was less than 200 for assessment at week 6 or 16. Detecting an at least 10-point improvement in the ESSPRI at week 24 would have required 340 patients, and detecting a larger improvement would have required over 1000 patients. SGUS was associated with the smallest sample size required to detect an effect of rituximab versus placebo. Nevertheless, in the TEARS study, the SGUS score change was not associated with the improvement in the mean VAS dryness score from baseline to week 24. For patients with systemic manifestations, the ESSDAI change may be the most logical endpoint, as it reflects all domains of disease activity. However, the ESSDAI did not improve significantly with rituximab therapy in the TEARS study.
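The dependence on the cut-off arises because each cut-off defines a different binary responder endpoint, and the sample size is driven by the between-group difference in responder rates. The sketch below shows this dichotomisation step only. The function names are hypothetical and the improvement values are invented for illustration; they are not TEARS data.

```python
from typing import Dict, Sequence


def responder_rate(improvements_mm: Sequence[float], cutoff_mm: float) -> float:
    """Fraction of patients whose VAS improvement is at least cutoff_mm."""
    if not improvements_mm:
        raise ValueError("at least one patient is required")
    responders = sum(1 for change in improvements_mm if change >= cutoff_mm)
    return responders / len(improvements_mm)


def rate_differences(treated: Sequence[float],
                     control: Sequence[float],
                     cutoffs_mm: Sequence[float] = (10, 20, 30)) -> Dict[float, float]:
    """Difference in responder rates (treated minus control) for each cut-off."""
    return {
        cutoff: responder_rate(treated, cutoff) - responder_rate(control, cutoff)
        for cutoff in cutoffs_mm
    }


# Invented improvements (baseline minus follow-up, in mm); NOT trial data.
treated_mm = [35, 22, 8, 41, 15, 0, 27, 12, 31, 5]
placebo_mm = [12, 3, 25, 0, 9, 18, -4, 6, 33, 2]

for cutoff, diff in rate_differences(treated_mm, placebo_mm).items():
    print(f"cut-off {cutoff} mm: difference in responder rates = {diff:.2f}")
```

Depending on how the improvements are distributed in each arm, raising the cut-off can widen, narrow, or leave unchanged the difference in responder rates, which is why the effect of the cut-off on sample size differs between endpoints.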

Discussion
Because pSS is a complex and heterogeneous disease that is difficult to evaluate objectively, measuring treatment responses is challenging. Validated endpoints are lacking. However, tools have been developed recently to assess systemic activity (ESSDAI) and burden to the patient (ESSPRI). The best time point for such assessments is unclear, and the inclusion criteria that should be used for clinical trials are debated. Answers to these issues are urgently needed to allow the design of feasible RCTs capable of providing clinically relevant information on the efficacy of treatments for pSS. We sought to obtain such answers by examining data from earlier studies.
Most of the open-label studies of biologics in pSS suggested good safety and efficacy, but these results were not confirmed in RCTs. Inadequate sample size may have prevented the RCTs from detecting significant efficacy, and discordant results have been found between RCTs with large and small sample sizes. The use of classification criteria for patient selection is not sufficient: patients selected for RCTs should have manifestations for which improvement or stabilisation is feasible. In addition, they should exhibit biological markers of disease activity and disabling symptoms such as dryness, pain, and fatigue, since the vast majority of pSS patients report discomfort related to such symptoms. However, even when evaluated using the ESSPRI, these subjective symptoms may fail to correlate with objective tests and may fluctuate over time. Furthermore, the relative contributions of active disease processes and irreversible residual damage to the symptoms may be difficult to determine. Given the lack of specificity of dryness and fatigue, most studies relied on composite criteria that included autoantibodies or salivary-gland biopsy abnormalities [3]. Systemic manifestations were rarely used as inclusion criteria in the past, probably due to the absence of a scoring system; since the introduction of the ESSDAI in 2011, a mean ESSDAI ≥5 has been used. In the TEARS study, rituximab failed to significantly improve the systemic manifestations compared to the placebo. In another study of 20 patients given rituximab and compared to 10 patients given a placebo [42], rituximab substantially improved the ESSDAI standardised response mean at week 24. Results of ongoing RCTs will probably provide more accurate results [31].
When we applied the most widely used inclusion criteria to a large nationwide cohort of patients with recent pSS, we found that most patients reported severe discomfort, with at least two of four VAS scores (dryness, fatigue, pain, and global disease) ≥50/100. However, fewer than 20 percent of patients had the combination of recent-onset active disease with high VAS scores and biological activity. This point may affect the feasibility of RCTs. Only half the patients with disease onset within the last 4 years had evidence of biological activity. The ESSDAI was elevated, but the mean value was less than 2 points. Thus, the inclusion of patients with high sub-scores on a single ESSDAI domain may require international multicentre patient recruitment. Recent disease onset and/or systemic disease are widely believed to predict a better treatment response, particularly to biologics, compared to long-standing disease. Using these two inclusion criteria dramatically decreases the required sample size. In contrast, most patients had at least two VAS scores greater than 50/100 and biological activity.
Efficacy data from therapeutic trials depend in large part on the primary endpoint. In the TEARS study, rituximab improved the VAS dryness and fatigue scores, in keeping with earlier findings [21,22]. The definition of a clinically significant improvement in pSS is a current focus of research [42,43]. Improvements in VAS scores were used in previously published studies [21,22]. VAS fatigue score cut-offs of 10, 20, and 30 mm at week 6 or 16 were associated with similar sample size requirements. A 20-mm cut-off may provide a good balance between clinical significance and sample size. Rituximab is the first treatment for which an effect on incapacitating fatigue has been demonstrated using a randomised controlled design. For the VAS dryness score, choosing a cut-off greater than 10 mm dramatically decreased the sample size. For the ESSPRI, at least 150 patients would be needed regardless of the cut-off used.
The SGUS score deserves consideration as an endpoint. In the TEARS study, the parotid gland score improved significantly with rituximab therapy [44]. This endpoint produces the smallest sample size. SGUS is a simple and inexpensive procedure that can be repeated easily and safely over time and that is undergoing validation [45,46]. However, the exact significance of the SGUS score in terms of the disease process is still being evaluated, and there is no published evidence that it correlates with improvements in salivary flow rate or VAS dryness scores.
In conclusion, this study provides valuable information for designing future RCTs on the basis of previously published studies. The most common inclusion criteria were short disease duration; systemic involvement; high mean VAS scores for dryness, pain, and fatigue; and biological evidence of activity. Combining these criteria would select only a small proportion of patients with pSS. The primary endpoints associated with the smallest sample sizes are a VAS dryness score improvement ≥20 mm by week 24 and variable improvements (10, 20, or 30 mm) in the VAS fatigue score by week 6 or 16. SGUS improvement produced the smallest estimated sample size (n = 42). Further studies are required to validate standardised SGUS modalities and assessment criteria. Thus, researchers should strive to develop a composite primary endpoint and to determine its best cut-off and assessment time point. The recently published Sjögren's Syndrome Responder Index (SSRI) [47] could be a candidate.
Supporting Information
S1 File. Data used to separate the ASSESS cohort patients into groups based on whether they met the main inclusion criteria used in published RCTs, to evaluate the proportion of patients who would have been considered eligible. (XLS)
S2 File. SAS data set with the imputed values used for the analysis. Labels in the data set explain the variable names, the SAS code allows the results to be reproduced, and titles identify the results concerned. (ZIP)