Reliability and Validity of Selected PROMIS Measures in People with Rheumatoid Arthritis

Purpose To evaluate the reliability and validity of 11 PROMIS measures assessing symptoms and impacts identified as important by people with rheumatoid arthritis (RA). Methods Consecutive patients (N = 177) in an observational study completed PROMIS computerized adaptive tests (CATs) and a short form (SF) assessing pain, fatigue, physical function, mood, sleep, and participation. We assessed test-retest reliability and internal consistency using correlation and Cronbach's alpha, respectively. We assessed convergent validity by examining Pearson correlations between PROMIS measures and existing measures of similar domains, and known-groups validity by comparing scores across disease activity levels using ANOVA. Results Participants were mostly female (82%) and white (83%), with a mean (SD) age of 56 (13) years; 24% had a high school education or less, 29% had RA for ≤ 5 years (13% for ≤ 2 years), and 22% were disabled. PROMIS Physical Function, Pain Interference, and Fatigue instruments correlated moderately to strongly (rho's ≥ 0.68) with corresponding legacy PROs. Test-retest reliability ranged from .725 to .883, and Cronbach's alpha from .906 to .991. A dose-response relationship with disease activity was evident for Physical Function, with similar trends in the other scales except Anger. Conclusions These data provide preliminary evidence of the reliability and construct validity of PROMIS CATs for assessing RA symptoms and impacts, and of the feasibility of their use in clinical care. PROMIS instruments captured the experiences of RA patients across the broad continuum of RA symptoms and function, especially at low disease activity levels. Future research is needed to evaluate performance in relevant subgroups, assess responsiveness, and identify clinically meaningful changes.

Introduction
… symptoms and function. Results are reported using a common metric (i.e., a T-score with a mean of 50 and a standard deviation (SD) of 10) that has been normed to the US population. To date, only the PROMIS Physical Function scale has been evaluated in RA [24,25].
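Because the T-score metric is a linear rescaling of a standardized score, converting between T-scores and SD units from the reference-population mean is simple arithmetic. The Python sketch below illustrates this; the function names are hypothetical and are not part of any PROMIS or Assessment Center software.

```python
def theta_to_t(theta: float) -> float:
    """Convert a standardized score (theta, in SD units) to a T-score (mean 50, SD 10)."""
    return 50.0 + 10.0 * theta


def t_to_sd_units(t_score: float) -> float:
    """Distance of a T-score from the reference mean, in SD units (negative = below the mean)."""
    return (t_score - 50.0) / 10.0


def within_half_sd(t_score: float) -> bool:
    """True when a T-score lies within 0.5 SD of the reference mean (T between 45 and 55)."""
    return abs(t_to_sd_units(t_score)) <= 0.5


# A score range of -2.7 to +3.1 SD corresponds to T-scores of about 23 to 81.
low, high = theta_to_t(-2.7), theta_to_t(3.1)
print(round(low), round(high))  # 23 81
```

The 45 to 55 band used to describe scores "within normal limits" is the same as `within_half_sd` returning `True`.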
In earlier work, we identified domains that people with RA considered impactful on their health-related quality of life (HRQL) [15,18]. Here, we describe the performance and validation of 11 PROMIS instruments in adults with RA in the context of ongoing care. We hypothesized that as compared with the general US population, PROMIS scores in people with RA would reflect greater HRQL impairments related to pain, fatigue, sleep, mood, physical function, and participation. We also hypothesized that scores would correlate moderately to strongly with existing legacy instruments assessing similar constructs and would show evidence of a dose-response relationship with disease activity levels.

Materials and Methods
Data are from a prospective cohort study of people receiving guideline-based RA care in an academic rheumatology clinic. All procedures were performed in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards. The study was approved by the Johns Hopkins Institutional Review Board (NA00071923).

Sample
Adults aged 18 years or older who were fluent in English and enrolled in our clinical practice registries were eligible to participate. Individuals were excluded if the treating clinician judged that a significant medical or psychiatric illness would limit their ability to participate in the study.

Procedures
Individuals were consecutively approached by phone or at the time of a routine clinic visit and given details about the study. After participants provided written informed consent, coordinators registered them in Assessment Center (www.assessmentcenter.net), a secure online PROMIS research management tool. Assessment Center provides access to the CATs, SFs, and study-specific questionnaires, and automatically generates a report containing scores for PROMIS CATs. After checking in with the clinic receptionist, participants were given a tablet computer linked to a study-specific URL to complete the questionnaires described below; RA legacy instruments were also completed. To assess test-retest reliability, a consecutively approached subset of participants was asked to complete the same PROMIS measures again 2 days later.

Measures
Sociodemographic and RA Characteristics. Sociodemographic information was drawn from the patient's medical record and included age, sex, race/ethnicity, education, work status, RA duration, and rheumatoid factor (RF)/anti-cyclic citrullinated peptide (CCP) status. Treating rheumatologists provided swollen and tender joint counts (28 joints) and physician (MD) global assessments of disease activity (100 mm VAS). Clinical Disease Activity Index (CDAI) scores were calculated to classify disease activity level.
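As a sketch of how CDAI scores and disease activity categories can be derived from the inputs listed above: the CDAI is conventionally the sum of the 28-joint swollen and tender counts plus the patient and physician global assessments expressed on a 0 to 10 scale (maximum 76). The cut-points used below (remission ≤ 2.8, low ≤ 10, moderate ≤ 22, high > 22) are the commonly published ones and are an assumption here; this paper describes remission as CDAI <2.8, so handling of the exact boundary may differ from the study's. The `Visit` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    """Hypothetical container for the clinical inputs to the CDAI."""
    swollen_28: int            # swollen joint count, 0-28
    tender_28: int             # tender joint count, 0-28
    patient_global_mm: float   # patient global assessment, 0-100 mm VAS
    md_global_mm: float        # physician (MD) global assessment, 0-100 mm VAS


def cdai(v: Visit) -> float:
    """CDAI = SJC28 + TJC28 + patient global (0-10) + physician global (0-10)."""
    if not (0 <= v.swollen_28 <= 28 and 0 <= v.tender_28 <= 28):
        raise ValueError("joint counts must be between 0 and 28")
    if not (0 <= v.patient_global_mm <= 100 and 0 <= v.md_global_mm <= 100):
        raise ValueError("global assessments must be between 0 and 100 mm")
    # The 100 mm VAS scores are divided by 10 to put them on the 0-10 scale the CDAI uses.
    return v.swollen_28 + v.tender_28 + v.patient_global_mm / 10 + v.md_global_mm / 10


def cdai_category(score: float) -> str:
    """Assumed cut-points: remission <= 2.8, low <= 10, moderate <= 22, high > 22."""
    if score <= 2.8:
        return "remission"
    if score <= 10:
        return "low"
    if score <= 22:
        return "moderate"
    return "high"


print(cdai_category(cdai(Visit(swollen_28=2, tender_28=3, patient_global_mm=30, md_global_mm=20))))  # low
```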
Patient Reported Outcomes. Legacy RA PROs that are routinely collected included the Patient Global Assessment (100 mm VAS), Pain (100 mm VAS), the 8-item Modified Health Assessment Questionnaire of disability (M-HAQ, 0-3 scale) [10], and a fatigue 100 mm VAS [30]; for each measure, higher scores reflect more of the symptom. PROMIS instruments for physical, emotional, and social domains were selected based on earlier work [14][15][16][17]. Version 1.0 CATs were administered for Pain Interference, Fatigue, Sleep Disturbance, Sleep-Related Impairment, Depression, Anxiety, Anger, and Physical Function; the 3-item PROMIS Pain Intensity SF was also included. Version 2.0 CATs were administered for Ability to Participate in Social Roles and Activities and Satisfaction with Social Roles and Activities. Specific items, response options, and anchors are available through www.assessmentcenter.net. Scales were administered in a fixed order, and CATs were programmed to administer 4 to 8 items, stopping once the standard error (precision of the estimate) fell to 0.3 or below. Higher scores indicate more of the trait being measured: for physical function, participation, and satisfaction, a higher score is "better", whereas for symptoms, higher scores indicate higher levels of the symptom.
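The CAT stopping rule described above can be sketched as follows. This is a simplified illustration, not Assessment Center's implementation: item selection and IRT scoring are omitted, the sequence of standard errors is supplied directly, and the function names are hypothetical. The standard error is assumed to be on the standardized (theta) scale, where 0.3 corresponds to 3 T-score points.

```python
from typing import Sequence


def should_stop(n_items: int, standard_error: float,
                min_items: int = 4, max_items: int = 8,
                se_target: float = 0.3) -> bool:
    """Stopping rule applied after each administered item."""
    if n_items < min_items:
        return False                      # always give at least the minimum number of items
    if n_items >= max_items:
        return True                       # never exceed the maximum test length
    return standard_error <= se_target    # stop once the score is precise enough


def items_administered(se_after_each_item: Sequence[float]) -> int:
    """Number of items given when the SE after item n is se_after_each_item[n - 1]."""
    for n, se in enumerate(se_after_each_item, start=1):
        if should_stop(n, se):
            return n
    return len(se_after_each_item)        # item bank exhausted before the rule triggered


print(items_administered([0.90, 0.60, 0.45, 0.32, 0.29, 0.25]))  # 5
```

In this hypothetical run, the SE first reaches 0.3 or below after the fifth item, so the test stops there; a respondent whose SE stays above 0.3 would stop at the 8-item cap.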

Statistical Analysis
Pearson coefficients and Spearman's rho were used to examine the degree to which PROMIS scores were consistent with each other and with legacy PROs for similar domains. ANOVA was used to compare PROMIS and legacy domain scores across CDAI disease activity levels. Pearson correlation and Cronbach's alpha were used to assess reproducibility (test-retest reliability) and internal consistency, respectively; reliability > .7 was considered acceptable. Analyses were performed using IBM SPSS version 22.
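For readers who want to reproduce these reliability statistics outside SPSS, a minimal Python sketch of the two coefficients follows, using only the standard library. It assumes complete data (no missing item responses); the function names and the example data are illustrative, and the 0.7 threshold is the acceptability criterion stated above.

```python
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation, e.g. between test and retest scores of the same participants."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds k lists, each with one score per respondent."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]       # each respondent's total score
    item_var = sum(variance(scores) for scores in items)  # sum of item variances
    return k / (k - 1) * (1 - item_var / variance(totals))


def acceptable(reliability, threshold=0.7):
    """Apply the acceptability criterion (reliability > .7)."""
    return reliability > threshold


# Hypothetical data: 3 items answered by 4 respondents.
alpha = cronbach_alpha([[2, 3, 4, 5], [1, 3, 4, 4], [2, 2, 5, 5]])
print(round(alpha, 3), acceptable(alpha))  # 0.944 True
```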

Patient Reported Outcomes
PROMIS and legacy scale scores are shown in Table 2. Mean Patient Global and pain scores were approximately 30 on a 100-point scale, and mean fatigue was 40. All legacy PROs were positively skewed. Floor effects were evident for the pain, fatigue, and Patient Global VASs: 26 (15%) participants reported no pain, 18 (10%) reported no fatigue, and 31 (18%) scored 0 on the Patient Global (very well). M-HAQ scores reflected minimal disability, with 81 people (46%) scoring 0.
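The floor-effect percentages above are simply the share of respondents at an instrument's minimum score. A small sketch follows, using synthetic score lists built only from the counts reported in the text (N = 177); the helper name is hypothetical, and every non-zero value is a placeholder.

```python
def floor_effect(scores, floor=0):
    """Proportion of respondents scoring at the instrument's minimum (the floor)."""
    return sum(1 for s in scores if s == floor) / len(scores)


N = 177
reported_zero_counts = {"M-HAQ": 81, "Patient Global": 31, "Pain VAS": 26, "Fatigue VAS": 18}

for measure, n_zero in reported_zero_counts.items():
    # Synthetic data: only the number of zeros matches the study; the other values are placeholders.
    scores = [0] * n_zero + [1] * (N - n_zero)
    print(f"{measure}: {floor_effect(scores):.0%}")
```

This prints 46%, 18%, 15%, and 10%, matching the rounded percentages reported above.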
Scores on each PROMIS measure were distributed across a broad range (from -2.7 to +3.1 SD; see Fig 1) and were approximately normally distributed, except for Pain Intensity, Pain Interference, and Depression, which were positively skewed. Across PROMIS instruments, mean scores were between 45 and 55 (i.e., within 0.5 SD of US general population norms), except for Physical Function and Pain Intensity, which were significantly lower than population norms. Across CATs, the median number of items administered was 3 (4 for Anger), and the median completion time was 7 minutes.
Correlations among PROMIS Measures. Correlations among individual PROMIS scales ranged from weak to strong (|r| = 0.23 to 0.85; all p's ≤ .002) (Table 3). The highest correlations (|r| ≥ 0.7) were among scales measuring similar constructs: physical health (e.g., pain, fatigue, sleep), mental health (e.g., depression, anxiety, anger), and social health (Ability to Participate in Social Roles and Activities and Satisfaction with Social Roles and Activities). Physical Function was also strongly correlated with Pain Interference (r = -0.71) and Ability to Participate in Social Roles and Activities (r = 0.70). The two participation scales were moderately to strongly correlated (r's -.34 to .70) with all symptom and function scales.
Convergent and Known Groups Validation with Legacy Instruments. The PROMIS Physical Function, Pain Intensity, Pain Interference, and Fatigue instruments correlated strongly (rho's ≥ 0.75; p's ≤ 0.01) with corresponding legacy instruments (Table 4). Patient Global was moderately to strongly (rho's ≥ 0.68; p's ≤ 0.01) associated with PROMIS scales. The lowest associations were between Patient Global and the PROMIS mood scales (Anger, rho = 0.32; Depression, rho = 0.41; Anxiety, rho = 0.41; all p's ≤ 0.01).
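Spearman's rho, used for the convergent validity correlations above, is Pearson's r computed on ranks, which makes it less sensitive to the skewness and floor effects seen in the legacy PROs. The sketch below gives tied values their average rank; it is illustrative, with hypothetical function names and data, and in practice a statistics package (for example, `scipy.stats.spearmanr`) would normally be used.

```python
from statistics import mean


def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """Ranks 1..n; tied values (e.g. many zeros on a VAS) share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # average of 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as Pearson's r on average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical pairs: legacy pain VAS (mm) and PROMIS Pain Interference T-scores.
pain_vas = [0, 0, 0, 12, 35, 60, 80]
pain_interference = [41.0, 44.5, 39.1, 52.3, 58.0, 63.4, 66.2]
print(round(spearman_rho(pain_vas, pain_interference), 3))
```

The three tied zeros on the VAS all receive rank 2.0, so the floor effect does not force an arbitrary ordering among those respondents.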
In general, PROMIS scores worsened significantly (p < .05) as disease activity increased from remission through high disease activity (Table 5); physical health scores worsened by 12-17 points, social domains by 16-18 points, and emotional health by 8-11 points. A dose-response relationship was evident for Physical Function. Similar trends were evident in all other scales except Anger, which remained within normal limits for remission, low, and moderate disease activity and worsened only in those with high disease activity; however, for most measures, scores did not differ significantly between low and moderate disease activity. In all PROMIS physical and social health instruments, the increase in impairment was largest between people in LDA and those in remission (≥ 0.7 SD); Anxiety and Depression worsened by nearly 0.5 SD, whereas Anger increased only slightly. A similar pattern was seen between those with high vs. moderate disease activity, where impairment increased by 0.5 SD on average; the exception was Pain Intensity, where scores increased by an average of 3.4 points.
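The known-groups comparison relies on a one-way ANOVA across the CDAI categories, and the group differences above are expressed in SD units, where 1 SD equals 10 T-score points. A minimal sketch of the F statistic and of this conversion follows; the function names and example data are hypothetical, and p-values would come from the F distribution (for example, via SPSS, as in the study, or `scipy.stats.f_oneway`).

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over lists of scores."""
    all_scores = [s for g in groups for s in g]
    grand_mean = mean(all_scores)
    group_means = [mean(g) for g in groups]
    # Between-group variation: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group variation: spread of scores around their own group mean.
    ss_within = sum((s - m) ** 2 for g, m in zip(groups, group_means) for s in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


def t_difference_in_sd(mean_a, mean_b):
    """Difference between two mean T-scores in SD units (10 T-score points = 1 SD)."""
    return (mean_b - mean_a) / 10.0


print(t_difference_in_sd(45.0, 52.0))  # 0.7
```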

Discussion
This study is the first to report evidence of the reliability and construct validity of 11 PROMIS instruments in people with RA within the context of ongoing care. We selected PROMIS instruments that reflect outcomes people with RA identified as important to them in our foundational work, which included a literature review, patient focus groups, expert surveys, and combined patient-provider consensus Delphi exercises [15,16,32]. The 11 instruments were completed in <11 minutes by 75% of patients. Pain (Intensity and Interference), Physical Function, and Fatigue scores correlated highly (rho's ≥ 0.75) with corresponding legacy instruments.
The process for developing and validating PROs has evolved considerably over the last two decades and now includes recommendations to identify patient-relevant symptoms through qualitative inquiry, to cognitively test and debrief potential items, to evaluate them rigorously using psychometric methods, and to validate them in the targeted patient population and context of intended use (e.g., RCT vs. clinical practice) [3,23,33]. PROMIS instruments were initially developed to help researchers obtain precise estimates of symptoms and functional impacts from patients across chronic diseases using a common metric. Although PROMIS was developed and tested in the general US population and later in selected clinical conditions, evaluating the content validity of the instruments, their construct validity against legacy instruments, and their responsiveness in specific conditions is necessary, as outlined in the PROMIS instrument maturity model [23].
An important strength of the PROMIS instruments was their ability to capture patients' experiences on a common T-score metric across the broad continuum of symptoms and function experienced by people with RA, spanning roughly ± 3 SD (or 99.7% of data in a normal distribution). Notably, fatigue, emotional distress, sleep, and participation, which are not currently part of the recommended RA core set [30], also showed a wide distribution of scores, with many individuals reporting significant impairments. Floor and ceiling effects, recognized limitations of many instruments [24,25], were evident for many patients on legacy measures; for example, nearly 1 in 2 (46%) participants in our sample scored 0 on the M-HAQ. Among the 56 people in remission, substantial proportions scored 0 on legacy instruments of pain (41%), physical function (75%), fatigue (27%), and patient global (46%). In contrast, PROMIS scores for people in remission showed considerable dispersion: the range was 22 points for Pain Intensity (T-scores of 31 to 52), 43 points for Physical Function (27 to 70) and Satisfaction with Social Roles and Activities (24 to 67), and 44 points (22 to 66) for Ability to Participate in Social Roles and Activities. In physical and social domains, the largest increase in impairment was between people in remission and those in LDA. Conversely, in emotional domains, minimal differences were seen among lower levels of disease activity (remission to low, low to moderate), with the greatest differences evident between moderate and high disease activity. Most PROMIS scores were higher in people with moderate vs. low disease activity, although the differences were not statistically significant. The relatively small number of patients in each group and the substantial clustering of individuals around the cut point between low and moderate disease activity may have contributed to this finding.
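The ± 3 SD figure above follows from the normal distribution: the share of values within ± k SD of the mean is erf(k/√2). A short check using only the standard library (the helper name is hypothetical):

```python
import math


def share_within(k_sd: float) -> float:
    """Share of a normal distribution lying within +/- k_sd standard deviations of the mean."""
    return math.erf(k_sd / math.sqrt(2))


for k in (1, 2, 3):
    # On the T-score metric, +/- k SD corresponds to T-scores of 50 - 10k to 50 + 10k.
    print(f"+/-{k} SD (T {50 - 10 * k}-{50 + 10 * k}): {share_within(k):.1%}")
```

This prints 68.3%, 95.4%, and 99.7% for 1, 2, and 3 SD, so ± 3 SD corresponds to T-scores of roughly 20 to 80.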
Reliable, precise, and accurate measurement of symptoms and functional impacts across the continuum of disease activity has never been more important for optimizing RA treatment, given that remission or LDA is the current management target [11,34]. With the development of biologics and the focus on early, intensive treatment, many people with RA now reach remission or LDA; in our sample, 69% were at these targets. Composite RA disease activity measures (e.g., Disease Activity Score [DAS28], CDAI, Simplified Disease Activity Index [SDAI]) rely on the answer to a single global question about disease activity or health status, whereas multidimensional measures such as the SF-36 are proprietary and burdensome to complete and score in clinical practice settings. From the battery of available instruments, we were able to select PROMIS instruments focused on important outcomes that either can only come from patients (i.e., symptoms) or are most practical to obtain by asking patients (impacts). PROMIS CATs offer optimal precision on a common metric, with immediate scoring for real-time use in clinical encounters. The ability of PROMIS to detect small differences even at the low end of symptoms and disability in patients with minimal disease activity may offer new insight into the relative burden of living with RA and new opportunities to compare the impact and side effects of current treatments, as well as to capture changes in HRQL that may be relevant to tapering therapies after a target of remission is achieved.
Findings from domains in which we assessed both symptom intensity and impact (e.g., Pain, Sleep, and Participation) revealed some unexpected discrepancies. Median Pain Interference scores were 10 points higher than Pain Intensity scores, suggesting that the impact of pain on day-to-day function may be much greater than Pain Intensity scores reflect. Among patients in remission (CDAI <2.8), 36 (64%) had CDAI scores ≤ 1.0, consistent with the absence of detectable disease and "deep" remission. Within this group, median T-scores were better than population norms for Pain Intensity (30.7), Pain Interference (39.1), Sleep-Related Impairment (44.5), Depression (43.6), Anger (44.1), Ability to Participate in Social Roles and Activities (59.1), and Satisfaction with Social Roles and Activities (58.3); Physical Function was at the population norm (50.7). These better-than-norm scores may reflect a response shift in how patients adapt to and report their symptoms and function over time [35]. Response shifts occur as patients reconceptualize their life circumstances, reprioritize what is important, and recalibrate their internal standards (e.g., what a pain score of 10 represents) as they learn to live with RA [36]. For instance, some RA patients have reported that a score of "0" on a questionnaire does not necessarily represent the absence of a symptom, but instead reflects a new baseline of "what is normal for me" [37]. Thus, our findings also raise important questions about defining the expected "norms" for RA symptoms and function. Further evaluation in larger numbers of patients across the full range of age, disease activity, disease duration, disability, and adaptation is warranted to define RA norms.
Before widespread use of PROMIS in RA research and care can be recommended, it will be important to evaluate the instruments' performance in relevant subgroups and to demonstrate that they are sufficiently responsive (sensitive to change over time). Evaluating how PROMIS scores change with fluctuations in disease activity is needed to define minimally detectable differences and clinically meaningful changes, parameters essential to their use in the longitudinal care of individual patients. Whether PROMIS instruments perform similarly in other forms of arthritis and in other autoimmune and inflammatory diseases remains to be determined.
Strengths of this study include the use of a well-characterized cohort with a broad range of characteristics reflective of patients seen in real-world settings, and the evaluation of PROMIS instruments within the context of usual care. Limitations include the use of a mostly white, well-educated sample with established RA that was generally well controlled. Participants were English-speaking; there are ongoing efforts to evaluate translated versions of PROMIS instruments and examine cross-cultural validity [38]. The legacy measures used in this study were limited to the ACR core set PROs that we routinely administer; however, PROMIS anxiety, depression, and anger measures already have strong evidence of validity, with crosswalks to legacy measures available [39][40][41]. We used PROMIS CATs, which require an internet connection to Assessment Center and may not be feasible in some settings. The use of SFs in RA clinical care warrants further study to determine whether they retain sufficient precision for clinical decision-making [42].

Conclusions
This study contributes new evidence supporting the reliability and construct validity of 11 PROMIS instruments in RA and feasibility of real-time administration and scoring for use in clinical practice. Results demonstrate the considerable impact that RA may have on multiple domains of physical, emotional, and social health. This work provides important preliminary data supporting the applicability of PROMIS in RA research and care with broad implications for other forms of inflammatory and autoimmune diseases in estimating the intensity and impact of symptoms and function important to patients. Ongoing validation of the 'universal' PROMIS instruments in specific diseases such as RA can facilitate comparisons across diseases, treatments, cultures, and countries.