Inconsistent selection of outcomes and measurement devices found in shoulder arthroplasty research: An analysis of studies on ClinicalTrials.gov

Introduction Recent evidence suggests a lack of standardization of shoulder arthroplasty outcomes. This issue is a limiting factor in systematic reviews. Core outcome set (COS) methodology could address this problem by delineating a minimum set of outcomes for measurement in all shoulder arthroplasty trials. Methods A ClinicalTrials.gov search yielded 114 results. Eligible trials were coded on the following characteristics: study status, study type, arthroplasty type, sample size, measured outcomes, outcome measurement device, specific metric of measurement, method of aggregation, outcome classification, and adverse events. Results Sixty-six trials underwent data abstraction and data synthesis. Following abstraction, 383 shoulder arthroplasty outcomes were organized into 11 outcome domains. The most commonly reported outcomes were shoulder outcome score (n = 58), pain (n = 33), and quality of life (n = 15). The most common measurement devices were the Constant-Murley Shoulder Outcome Score (n = 38) and American Shoulder and Elbow Surgeons Shoulder Score (n = 33). Temporal patterns of outcome use were also found. Conclusion Our study suggests the need for greater standardization of outcomes and instruments. The lack of consistency across trials indicates that developing a core outcome set for shoulder arthroplasty trials would be worthwhile. Such standardization would allow for more effective comparison across studies in systematic reviews, while also accounting for important outcomes that may otherwise be underrepresented. This review of outcomes provides an evidence-based foundation for the development of a COS for shoulder arthroplasty.

ClinicalTrials.gov to elucidate the diversity of methodologies and outcomes reported. The objective of this study is to provide an evidence-based foundation for the development of a COS for shoulder arthroplasty.

Methods
We conducted an analysis of studies catalogued in ClinicalTrials.gov to examine outcomes reported in registered orthopedic surgery clinical trials. This study did not meet the regulatory definition of human subject research as defined in 45 CFR 46.102(d) and (f) of the Department of Health and Human Services' Code of Federal Regulations [17] and, therefore, was not subject to Institutional Review Board oversight. We consulted Li et al [18], the Cochrane Handbook for Systematic Reviews of Interventions [19], and the National Academies of Science, Engineering, and Medicine's (formerly the Institute of Medicine) Standards for Systematic Reviews [20] for best practices in data collection and management for systematic reviews as we developed our methodology. To adhere to best practices in reporting, we applied relevant PRISMA guidelines [21] (Checklist items 1-3, 5-11, 13, 16-18, 20, 23, 24, 26, 27) since our study involved the synthesis of multiple registered trials. We applied SAMPL guidelines [22] for reporting descriptive statistics. This study was registered with the Core Outcome Measurement in Effectiveness Trials (COMET) Initiative (http://www.comet-initiative.org/studies/details/812?result=true). Data from this study are publicly available on figshare (https://dx.doi.org/10.6084/m9.figshare.3464831.v2).

Eligibility criteria for considering studies for this review
Primary studies registered in ClinicalTrials.gov between 2005 and 2015 in which shoulder arthroplasty (including total shoulder arthroplasty, reverse shoulder arthroplasty, hemiarthroplasty, and glenoid resurfacing) was used as an intervention were eligible for this review. For this study, both open (not yet recruiting, recruiting) and closed (active, not recruiting; completed; terminated; suspended; withdrawn; enrolling by invitation) trials were eligible for inclusion. Randomized and non-randomized clinical trials as well as observational studies were included since these study designs may be registered on ClinicalTrials.gov [23]. We used the following definitions to classify study types. A clinical trial (National Institutes of Health definition) was defined as "a research study in which one or more human subjects are prospectively assigned to one or more interventions (which may include placebo or other control) to evaluate the effects of those interventions on health-related biomedical or behavioral outcomes." An observational study was defined as "a biomedical or behavioral research study of human subjects designed to assess risk factors for disease development or progression, assess natural history of risk factors or disease, identify variations based on geographic or personal characteristics (such as race/ethnicity or gender), track temporal trends, or describe patterns of clinical care and treatment in absence of specific study-mandated interventions" [24].

Search strategy for identifying relevant studies
We consulted a research librarian to conduct a search for clinical trials registered on ClinicalTrials.gov that examined shoulder arthroplasty interventions reported in orthopedic surgery literature. ClinicalTrials.gov was searched in order to identify unpublished or ongoing trials. We used registered trials to minimize the possibility of selective outcome reporting bias [30] and to better understand the outcomes reported in current orthopedic clinical trials. This search was narrowed to four common shoulder arthroplasty procedures: total shoulder arthroplasty (TSA), reverse shoulder arthroplasty (RSA), hemiarthroplasty (HA), and glenoid resurfacing; however, we did not impose a limiter for language or restrict the search by journal. The final search string was as follows: Shoulder AND (Surg* OR operat* OR arthroplasty OR hemiarthroplasty OR (joint* AND replace*) OR debride OR debridement OR debrided OR (surface AND (replace OR replacement OR replaced)) OR resurface OR resurfaced OR resurfacing) | received from 01/01/2005 to 12/31/2016. The search was performed on June 30, 2017.

Study selection and data collection
Four authors (MTS, JTS, BMH, and GRD) divided the studies equally among themselves and independently screened them for eligibility. To be eligible, a study must have reported the use of shoulder arthroplasty as an intervention. We included total, hemi-, and reverse arthroplasty as well as glenoid resurfacing; hence, arthroscopic studies were excluded from analysis. Studies must also have been registered on ClinicalTrials.gov between 2000 and 2016. We included both observational and interventional studies, as both commonly report primary and secondary outcomes in ClinicalTrials.gov. After the initial screening was completed, a second screening was performed by an author (BND) who was blinded to previous screening results. Discrepancies in screening were resolved by discussion between BND and the other authors. Final exclusions are outlined in the PRISMA flow diagram (Fig 1).
An abstraction manual was designed after consulting several sources [25][26][27][28][29][30] to ensure data abstraction was consistently and accurately performed by authors. Authors participated in a series of meetings to apply the abstraction manual to a subset of 15 studies as a pilot test before launch. During these meetings, authors abstracted data elements by reviewing each study, discussing data elements, and reaching agreement on changes to the abstraction manual. Refinements were made based on pilot feedback and a final manual was produced. Data elements included:
• sponsor(s), title of the article;
• start date of trial (year);
• study status (not yet recruiting; recruiting; active, not recruiting; completed; terminated; suspended; withdrawn; enrolling by invitation);
• study type (interventional, observational, etc.);
• type of arthroplasty (TSA, RSA, HA, glenoid resurfacing, other);
• sample size;
• measured outcomes;
• outcome measurement device;
• specific metric of measurement (value at a time point, change from baseline, time to event, unclear);
• method of aggregation (mean, median, percent/proportion, absolute number, unclear);
• outcome classification (primary, secondary, other, unclear);
• whether the outcome was considered a side effect/harmful.
The registered studies meeting inclusion criteria were then equally divided for data abstraction among four authors (MTS, GRD, JTS, and BMH). Working in pairs, authors first abstracted data elements from articles in their set and then validated the abstracted data of their partner. Any discrepancies in data abstraction were settled by discussion between the pair, or when necessary, by adjudication with the blinded author (BND) to ensure the accuracy and integrity of this study.
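The paired validation step described above can be illustrated with a short sketch. It is not the authors' tooling: the function name `find_discrepancies`, the field names, and the two example records are hypothetical and chosen only to mirror the data elements listed in the abstraction manual. The sketch compares two abstractors' records for the same outcome and lists the fields that would need discussion or adjudication.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def find_discrepancies(first: Record, second: Record) -> List[Tuple[str, str, str]]:
    """Return (field, first_value, second_value) for every field that differs.

    Values are compared after trimming whitespace and lowercasing, so purely
    cosmetic differences are not flagged (an assumption of this sketch).
    """
    diffs = []
    for field in sorted(set(first) | set(second)):
        a = first.get(field, "")
        b = second.get(field, "")
        if a.strip().lower() != b.strip().lower():
            diffs.append((field, a, b))
    return diffs


# Hypothetical records for one registered outcome, abstracted by two authors.
abstractor_1 = {
    "outcome": "Pain",
    "device": "Visual analog scale",
    "metric": "change from baseline",
    "classification": "secondary",
}
abstractor_2 = {
    "outcome": "pain",
    "device": "Visual analog scale ",
    "metric": "value at a time point",
    "classification": "secondary",
}

if __name__ == "__main__":
    for field, a, b in find_discrepancies(abstractor_1, abstractor_2):
        print(f"{field}: {a!r} vs {b!r}")
```

In this example only the `metric` field is flagged; the differences in capitalization and trailing whitespace are treated as agreement.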

Definition and classification of measured outcomes
We defined an outcome as the exact word-for-word terms (presented as either a primary or secondary outcome) in a trial for any clinical endpoint, or physiological, metabolic, or mortality event measured by clinicians or researchers [26]. Eleven outcome domains were determined based on the distribution of outcomes within this study and previously defined domains by Page et al [28]. Outcomes were classified under the following outcome domains: Adverse Events, Function/Disability, Global Assessment of Treatment Success, Health Related Quality of Life (HRQoL), Orthopedic Tests, Other, Pain, Radiologic Evaluation, Range of Motion (ROM), Strength, and Survival. Individual outcomes were distributed into each of these categories during the coding process. In order to decrease heterogeneity of reported outcomes, authors determined standardized terminology for each outcome.

Statistical analysis
Results were summarized using frequencies and percentages for binary outcomes, and medians and interquartile ranges (IQRs) for continuous outcomes. Locally weighted scatterplot smoothing (a nonparametric regression method) was used to smooth the scatterplots of outcome domain use over time [28]. Our final scatterplot data are available on figshare (https://dx.doi.org/10.6084/m9.figshare.3464831.v2). All analyses were conducted using Stata 13.1 (College Station, TX).
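To make the smoothing step concrete, the following is a minimal sketch of locally weighted scatterplot smoothing in plain Python. It is an illustration only and not the Stata routine used in the analysis: it assumes tricube weights and local linear fits, omits the robustness iterations found in full implementations, and the default `frac` of 0.6 and the yearly counts in the demo are invented for the example.

```python
from typing import List, Sequence


def tricube(u: float) -> float:
    """Tricube kernel weight: (1 - |u|^3)^3 inside the window, 0 outside."""
    a = abs(u)
    return (1.0 - a ** 3) ** 3 if a < 1.0 else 0.0


def lowess(x: Sequence[float], y: Sequence[float], frac: float = 0.6) -> List[float]:
    """Smooth y against x with locally weighted linear regression.

    frac is the share of points used in each local fit (assumed default).
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    k = min(n, max(2, round(frac * n)))  # neighbours per local window
    smoothed = []
    for x0 in x:
        # Bandwidth h: distance from x0 to its k-th nearest point.
        h = sorted(abs(xi - x0) for xi in x)[k - 1]
        if h > 0:
            w = [tricube((xi - x0) / h) for xi in x]
        else:
            w = [1.0 if xi == x0 else 0.0 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        # With no spread in x inside the window, use the weighted mean.
        slope = sxy / sxx if sxx > 1e-12 else 0.0
        smoothed.append(my + slope * (x0 - mx))
    return smoothed


if __name__ == "__main__":
    # Hypothetical counts of outcomes in one domain by trial start year.
    years = [2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012]
    counts = [3, 5, 4, 8, 6, 9, 7, 10]
    for year, value in zip(years, lowess(years, counts)):
        print(year, round(value, 2))
```

Each smoothed value comes from a weighted straight-line fit to the nearest `frac` share of points, so a larger `frac` gives a smoother curve and a smaller one follows the raw points more closely.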

Results
A total of 114 clinical trials were identified on ClinicalTrials.gov. Forty-eight studies were excluded after failing to meet inclusion criteria (Fig 1). A final sample size of 66 trials underwent data abstraction and was included in the final data synthesis. Clinical trials included within this study started their research between 2000 and 2016, as summarized in Table 1.

[Table 2, fragment: Implant survival (11); Revision/reoperation (3); Device success rate (1); Time to first revision (1); Frequency (6); Kaplan-Meier (5); Unspecified (5); Time to event (11)]

The Radiologic Evaluation domain contained the greatest number of outcomes (n = 79) followed by the HRQoL (n = 68) and Global Assessment of Treatment Success (n = 60) domains (Table 2). In terms of outcome reporting, the Radiologic Evaluation domain contained a large number of unique outcomes that were measured in a few studies. The Global Assessment of Treatment Success domain contained the most commonly reported outcome, shoulder outcome score (n = 58). Pain (n = 33), quality of life (n = 15), function (n = 15), ROM (n = 11) and implant survival (n = 11) were also frequently reported outcomes (Table 3). Across all domains, 61 outcomes had an unspecified measurement device. The most common measurement devices were the Constant-Murley Shoulder Outcome Score (n = 38), American Shoulder and Elbow Surgeons (ASES) Shoulder Score (n = 33), and frequency counts (such as number of adverse events or revisions) (n = 30) (Table 2).
Studies reported a mean of six outcomes each, with a range of 1 to 37 outcomes per study. In each trial registration, the outcomes received a classification of primary, secondary, other, or unspecified. Of the 383 reported outcomes, 68.7% (263/383) were classified as secondary outcomes and the remaining were predominantly primary outcomes (120/383, 31.3%).
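The proportions above can be checked with a few lines of arithmetic. The sketch below uses only the counts reported in the text (263 secondary and 120 primary outcomes among 383, across 66 included trials); the helper name `percentage_breakdown` is hypothetical, and treating the per-study mean as total outcomes divided by included trials is an assumption about how that figure was derived.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Counts taken from the text: 263 secondary and 120 primary outcomes.
classifications: List[str] = ["secondary"] * 263 + ["primary"] * 120


def percentage_breakdown(labels: List[str]) -> Dict[str, Tuple[int, float]]:
    """Map each label to (count, percent of total rounded to one decimal)."""
    total = len(labels)
    counts = Counter(labels)
    return {label: (n, round(100.0 * n / total, 1)) for label, n in counts.items()}


breakdown = percentage_breakdown(classifications)

# Assumed derivation of the per-study mean: 383 outcomes over 66 trials.
mean_outcomes_per_study = round(len(classifications) / 66, 1)

if __name__ == "__main__":
    print(breakdown)
    print(mean_outcomes_per_study)
```

Under that assumption the mean works out to about 5.8 outcomes per study, which agrees with the reported mean of six.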

Discussion
Results from our study suggest the need for greater standardization of outcomes as well as the instruments used to measure them. Interestingly, concurrent evaluations by Page et al. [31][32] have affirmed the need for greater standardization of outcomes and measurement for shoulder disorders. Our findings are complementary and confirmatory even though we used different search methodologies and applied different inclusion criteria. We limited our search to registered trials to minimize selective outcome reporting, whereas Page et al. reviewed published trials that served as primary studies in Cochrane reviews or were indexed in PubMed. Furthermore, while we examined outcomes reported across studies applying specific interventions (i.e., arthroplastic procedures), Page et al. looked more broadly at shoulder disorders. Despite these differences, we observed similar inconsistencies in trial outcomes. The lack of consistency observed in these studies indicates that developing a core outcome set for shoulder arthroplasty trials would be worthwhile. Such standardization would allow for more effective study-to-study comparisons in systematic reviews, while also accounting for important outcomes that may otherwise be underrepresented.
Although trials measured six outcomes on average, some measured as many as 37 outcomes in a single trial. Core outcome sets are developed to refine outcomes to those most meaningful and important across investigations and could help limit the number of outcomes being measured. Large numbers of outcomes in trials could result in increased occurrences of selective outcome reporting bias [33] or p-hacking [34], both of which may adversely affect our understanding of the true nature of clinical trial results. We found a wide variety of shoulder instruments used across trials. For global assessment of treatment success, the Constant-Murley Score and ASES were used more frequently than other instruments. A systematic review of psychometric properties for the Constant-Murley Score reported the need for greater standardization for performing the score and greater caution during score interpretation [35]. Other issues, such as weighting the subscales, are ongoing matters of investigation with this scale. For most shoulder instruments, psychometric studies have focused on traditional validity and reliability estimates. Additional research is needed to determine important outcomes such as the minimal clinically important difference [35,36].
We noted several temporal trends in outcomes in this study. For example, our results suggest that HRQoL outcomes decreased over time. This finding is contrary to recent calls to include patient-centered outcomes in clinical research [37][38][39][40][41]. As early as the 1990s, researchers recognized the importance of including patient-centered outcomes in orthopedic surgery research, rather than relying on revision rates or clinical judgments to evaluate post-operative improvement [42]. Xu et al described HRQoL outcomes as a "necessity to fully understand the effects" of orthopedic interventions [43]. Furthermore, given recent indications of the prevalence of clinical depression in patients undergoing elective TSA, improved understanding of important quality of life variables is clearly warranted [44].

Limitations
Our study has the following limitations. We limited our sample to outcomes reported on ClinicalTrials.gov based on the recommendation of Clark and Williamson [45]. We chose this approach to include the most current outcomes, while simultaneously limiting selective outcome reporting bias. Although ClinicalTrials.gov is a United States based trial registry platform, there are currently 201 countries utilizing the registry and accounting for nearly 50% of registered studies [46]. Challenges also exist with registry-listed outcomes, which include the potential for vague and incomplete reporting. These challenges have been noted by the WHO and ClinicalTrials.gov, and actions are being taken to improve the accurate reporting of trial outcomes. We also did not search other trial registries, as Moja et al found that ClinicalTrials.gov contained enough data to adequately describe the ongoing research and was the most valuable of all registries for finding ongoing clinical trials [47]. Furthermore, we wanted to avoid translating registrations that were written in other languages. We also did not search databases of published works, like MEDLINE or Embase, since published studies have been known to limit outcome reporting to only those which were found to be statistically significant [48][49][50]; therefore, the published literature may not contain all outcomes originally intended for measurement [51].

Conclusion
In summary, this study found a lack of standardization regarding outcomes and measurement devices. This lack of standardization limits systematic reviews to outcomes reported and measured consistently across studies. Important outcomes may be omitted from a subset of studies, limiting data synthesis. Our study provides a summary of the most frequently reported and co-occurring outcomes as a foundation for a follow-up study to begin developing a core outcome set for shoulder arthroplasty studies.