Performance of the inFLUenza Patient-Reported Outcome (FLU-PRO) diary in patients with influenza-like illness (ILI)

Background The inFLUenza Patient-Reported Outcome (FLU-PRO) measure is a daily diary assessing signs/symptoms of influenza across six body systems: Nose, Throat, Eyes, Chest/Respiratory, Gastrointestinal, and Body/Systemic. It was developed and tested in adults with influenza. Objectives This study tested the reliability, validity, and responsiveness of FLU-PRO scores in adults with influenza-like illness (ILI). Methods Data from the prospective, observational study used to develop and test the FLU-PRO in influenza virus positive patients were analyzed. Adults (≥18 years) presenting with influenza symptoms in outpatient settings in the US, UK, Mexico, and South America were enrolled, tested for influenza virus, and asked to complete the 37-item draft FLU-PRO daily for up to 14 days. Analyses were performed on data from patients testing negative. Reliability of the final, 32-item FLU-PRO was estimated using Cronbach’s alpha (α; Day 1) and intraclass correlation coefficients (ICC; 2-day reproducibility). Convergent and known-groups validity were assessed using patient global assessments of influenza severity (PGA). Patient report of return to usual health was used to assess responsiveness (Day 1–7). Results The analytical sample included 220 ILI patients (mean age = 39.3, 64.1% female, 88.6% white). Sixty-one (28%) were hospitalized at some point in their illness. Internal consistency reliability (α) of FLU-PRO Total score was 0.90 and ranged from 0.72–0.86 for domain scores. Reproducibility (Day 1–2) was 0.64 for Total, ranging from 0.46–0.78 for domain scores. Day 1 FLU-PRO scores correlated (≥0.30) with the PGA (except Gastrointestinal) and were significantly different across PGA severity groups (Total: F = 81.7, p<0.001; subscales: F = 6.9–62.2; p<0.01). Mean score improvements Day 1–7 were significantly greater in patients reporting return to usual health compared with those who did not (p<0.05, Total and subscales, except Gastrointestinal and Eyes).
Conclusions Results suggest FLU-PRO scores are reliable, valid, and responsive in adults with influenza-like illness.

Introduction Influenza (flu) is characterized by an array of symptoms, including chills, cough, sore throat, runny or stuffy nose, fatigue, muscle/body aches, and potentially diarrhea and vomiting, with symptoms ranging in severity and duration [1]. In the absence of known influenza virus, this constellation of symptoms can be caused by a variety of other viruses and is often diagnosed as influenza-like illness (ILI) [2]. While most patients recover from ILI, the symptoms can negatively impact daily activities and functioning. Symptoms of ILI approximate influenza symptoms so closely that the two have historically been indistinguishable on symptoms alone. Therefore, a symptom measure useful in influenza may also be useful for evaluating the presence and severity of symptoms of ILI.
The InFLUenza Patient-Reported Outcome (FLU-PRO) measure was designed to evaluate the presence, severity, and duration of influenza symptoms in clinical trials. Developed using good research practices for scale development [3][4][5], including those recommended by the US Food and Drug Administration (FDA) [6], this 32-item daily diary offers a comprehensive evaluation of symptoms commonly experienced by patients with influenza and can be completed in 5 minutes. While other measures of influenza symptom severity exist, such as the Flu-iiQ [7] and the Canadian Acute Respiratory Illness and Flu Scale (CARIFS) [8], these measures do not assess the full range of symptoms associated with varied strains of influenza and identified as important to the patients themselves [9]. Additionally, the CARIFS was developed for use in children, while the FLU-PRO was developed for studies of adults and children, with content validity shown in both groups [9]. Previous research demonstrated that the FLU-PRO is reliable, valid, and responsive to change in hospitalized and non-hospitalized adults with laboratory-confirmed influenza [10]. The purpose of these analyses was to test the reliability, validity, and responsiveness of FLU-PRO scores in adults with ILI.

Study design
Data from the prospective, observational FLU-PRO development study conducted with informed consent, under the National Institute of Allergy and Infectious Diseases institutional review board approval, and in accordance with the Declaration of Helsinki, were used in these analyses. Methods and results of the primary analyses in those with laboratory-confirmed influenza are reported elsewhere [10]. Briefly, adults ≥18 years of age seeking medical care for acute influenza symptoms at participating clinics in the US (16 sites; English-speaking), Argentina (two sites; Spanish-speaking), United Kingdom (one site; English-speaking), and Mexico (three sites; Spanish-speaking) were recruited during clinic visits. An elevated body temperature of 100°F [37.8°C] or greater was not an enrollment requirement. All study participants were tested for influenza using rapid influenza diagnostic tests (RIDTs), including polymerase chain reaction, rapid antigen test, and/or viral culture by nasal or nasopharyngeal swab. The performance of the FLU-PRO in those testing positive is presented elsewhere [10]. Data from subjects testing negative for the influenza virus were included in the current analysis.
In addition to the influenza diagnostic test, consented patients completed assessments of sociodemographic and clinical characteristics at baseline and a daily diary for 14 days that included the 37-item draft FLU-PRO symptom diary and nine additional questions used for FLU-PRO validation. At the Mexico site, the diary was completed via telephone interview with data entered directly into a web-based portal by the interviewer. Patients in 16 US sites, one UK site, and one Argentina site completed the survey either via interviewer-administration or self-administration via a web-based system using the subject's personal web-enabled device.

Compliance with ethical standards
The studies were conducted in accordance with the Declaration of Helsinki, and the National Institute of Allergy and Infectious Diseases ethics committee/institutional review board requirements, and good clinical practice guidelines. Informed consent was obtained from all individual participants included in the study.

Instruments: Patient-reported Outcomes (PROs)
InFLUenza Patient-Reported Outcome (FLU-PRO). The final FLU-PRO questionnaire instructed respondents to rate the severity of 32 influenza symptoms over the past 24 hours. The presence and severity of influenza signs and symptoms are assessed across six body systems affected by influenza: Nose (4 items), Throat (3 items), Eyes (3 items), Chest/Respiratory (7 items), Gastrointestinal (4 items), and Body/Systemic (11 items). A total score quantifies symptoms overall. Respondents are asked to rate each sign or symptom on a 5-point ordinal severity scale, with higher scores indicating a more severe sign or symptom. The questionnaire was developed for self-report or interviewer-administration, with slight differences in the instructions applicable for each administration.
Development procedures addressing content validity of the FLU-PRO are described elsewhere [9]. Quantitative item reduction and validation testing were performed on data from a prospective validation study, with data from laboratory-confirmed influenza positive patients (N = 221) serving as the primary analytical sample [10].

Patient Global Assessments (PGA).
The Patient Global Rating of Flu Severity is a single item to assess participants' overall influenza symptom severity. Participants were asked to rate severity on the following scale: 0 ("No flu symptoms today"), 1 ("Mild"), 2 ("Moderate"), 3 ("Severe"), and 4 ("Very severe"). The Patient Global Assessment of Interference in Daily Activities is a single item to assess interference in daily activities due to influenza symptoms during that day. Participants rated interference on the following scale: 1 ("Not at all"), 2 ("A little bit"), 3 ("Somewhat"), 4 ("Quite a bit"), and 5 ("Very much"). The Patient Global Assessment of Health is a single item to assess general physical health during that day. Participants rated their physical health on the following scale: 1 ("Poor"), 2 ("Fair"), 3 ("Good"), 4 ("Very good"), and 5 ("Excellent"). Finally, a Patient Global Rating of Change in Flu Severity was used to identify stable patients for reproducibility assessment.
Return to "usual" health and activities. Patients were asked to respond (yes/no) to the following questions: "Have you returned to your usual health today?" and "Have you returned to your usual activities today?"

Statistical analyses
Statistical tests were performed in accordance with classical test theory [11] to evaluate the psychometric properties of the FLU-PRO total and domain scores in participants with ILI, including reliability, construct and known-groups validity, and responsiveness. These analyses were performed on the entire ILI cohort and stratified by hospitalization status.
Reliability (internal and test-retest). Cronbach's coefficient alpha was used to estimate internal consistency reliability of the FLU-PRO Total and domain scores on day 1. Coefficients of 0.7-0.9 were pre-specified as "good" internal consistency, 0.4-<0.7 as moderate, and <0.4 as low or poor [11,12].
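As an illustration of the internal consistency estimate described above, the following is a minimal sketch of Cronbach's alpha with the pre-specified interpretation bands. The function names and the two-item demonstration data are hypothetical, and the handling of values above 0.9 (treated here as "good") is an assumption, since the text does not classify them.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items table of scores."""
    k = len(rows[0])  # number of items in the scale
    # Sample variance of each item across respondents
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of each respondent's summed score
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def classify_alpha(alpha):
    """Bands from the text; values above 0.9 are labeled 'good' (assumption)."""
    if alpha >= 0.7:
        return "good"
    if alpha >= 0.4:
        return "moderate"
    return "low"


# Hypothetical two-item responses from four respondents
demo = [[0, 1], [1, 0], [2, 2], [3, 3]]
alpha = cronbach_alpha(demo)
print(round(alpha, 3), classify_alpha(alpha))
```

In practice the rows would hold each patient's day 1 item responses for one domain or for all 32 items.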
Data from patients whose influenza severity was unchanged over time were used to estimate the test-retest reliability of FLU-PRO Total and domain scores. Stable subjects were defined as those with "no change" on the Patient Global Rating of Change in Flu Severity over two consecutive days during Week 1 (i.e., day 1 to day 2, day 2 to day 3, etc.). If a subject was missing FLU-PRO scores for one of the days in the planned comparison, data for this subject were excluded from that analytical pair. Intraclass correlation coefficients (ICC from a fixed-effects model), paired t-tests, and effect size (ES) were calculated to evaluate score reliability. ICCs were expected to be at least moderate, exceeding 0.60. Mean differences between the two observations were expected to be minimal with a small ES (<0.20).
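The test-retest quantities can be sketched as follows. The text specifies only "ICC from a fixed-effects model" and does not define the ES, so two assumptions are made here: the ICC is computed as the two-way, fixed-occasion, single-measurement consistency form, often written ICC(3,1), and the ES is the mean difference divided by the standard deviation of the first day's scores. Function names and data are hypothetical.

```python
from statistics import mean, stdev


def icc_consistency(day_a, day_b):
    """ICC(3,1) for two occasions (assumed form of the fixed-effects ICC)."""
    n, k = len(day_a), 2
    rows = list(zip(day_a, day_b))
    grand = mean(list(day_a) + list(day_b))
    # Partition the total sum of squares into subjects, occasions, and error
    ss_rows = k * sum((mean(r) - grand) ** 2 for r in rows)
    ss_cols = n * sum((mean(c) - grand) ** 2 for c in (day_a, day_b))
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)


def effect_size(day_a, day_b):
    """Mean change divided by the SD of the first day's scores (assumed definition)."""
    return (mean(day_b) - mean(day_a)) / stdev(day_a)


# Hypothetical scores for four stable subjects on two consecutive days
first, second = [1, 2, 3, 4], [1, 3, 2, 4]
print(icc_consistency(first, second), effect_size(first, second))
```

Subjects missing either day's score would be dropped from the pair before calling these functions, matching the exclusion rule stated above.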
Construct validity. The relationships between the FLU-PRO Total and domain scores and three global ratings were assessed using Spearman correlations (r_s) on day 1 data, hypothesizing that the relationships between FLU-PRO scores and these global ratings would be moderate to high (r_s >0.30) [13]. Correlations with the Patient Global Rating of Flu Severity were hypothesized to be strongest, while weaker correlations were expected with the more distal constructs, including the Patient Global Rating of Physical Health and the Patient Global Assessment of Interference with Daily Activities.
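For readers unfamiliar with the Spearman coefficient, a minimal sketch is given here: both variables are converted to ranks, with tied values sharing the mean of their ranks, and a Pearson correlation is computed on the ranks. Ties are common with ordinal global ratings, which is why average ranks are used. The function names and data are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for pos in range(i, j + 1):
            ranks[order[pos]] = tied_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical total scores paired with global severity ratings
scores = [0.4, 0.9, 1.6, 2.3]
ratings = [1, 1, 2, 2]
print(round(spearman(scores, ratings), 3))
```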
Known-groups validity. Analysis of variance (ANOVA) was used to compare FLU-PRO Total and domain scores across three Patient Global Rating of Flu Severity categories at day 1: "None" or "Mild"; "Moderate"; and "Severe" or "Very severe". Mean (SD), F-scores, and p-values were reported to determine the magnitude of the differences. Pairwise comparisons between means were performed using Scheffe's test adjusting for multiple comparisons.
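The known-groups comparison can be sketched as a one-way ANOVA over the three collapsed severity groups. The sketch computes only the F statistic; p-values and Scheffe-adjusted pairwise comparisons would require an F distribution from a statistics library and are omitted. The function names, group labels, and data are hypothetical.

```python
def pga_group(rating):
    """Collapse the 0-4 severity rating into the three groups named in the text."""
    if rating <= 1:
        return "none_or_mild"
    if rating == 2:
        return "moderate"
    return "severe_or_very_severe"


def one_way_f(groups):
    """F statistic for a one-way ANOVA over a list of score lists."""
    all_scores = [x for g in groups for x in g]
    grand = sum(all_scores) / len(all_scores)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical day 1 ratings and total scores for nine patients
ratings = [0, 1, 1, 2, 2, 2, 3, 4, 4]
totals = [1, 2, 3, 2, 3, 4, 5, 6, 7]
by_group = {}
for rating, total in zip(ratings, totals):
    by_group.setdefault(pga_group(rating), []).append(total)
print(one_way_f(list(by_group.values())))
```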
Responsiveness. Analysis of covariance (ANCOVA) was used to compare changes in FLU-PRO scores at day 7 in responders (those returning to usual health or activity) and non-responders (those not returning to usual health or activity), adjusting for day 1 scores.
Responders were defined using the two different anchors in two separate analyses. It was expected that responders would have significantly larger (p <0.05) change scores than nonresponders.
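The responsiveness model can be sketched as an ordinary least-squares fit of the improvement score on a responder indicator and the day 1 score; the responder coefficient is then the baseline-adjusted difference between groups. This is a simplified illustration, not the study's implementation: improvement is assumed to be day 1 minus day 7, significance testing is omitted, and all names and data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def adjusted_difference(improvement, responder, baseline):
    """Fit improvement ~ 1 + responder + baseline; return the responder coefficient."""
    x = [[1.0, float(r), float(s)] for r, s in zip(responder, baseline)]
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(x, improvement)) for i in range(3)]
    return solve(xtx, xty)[1]


# Hypothetical data: improvement = day 1 score minus day 7 score (assumed sign)
baseline = [1, 2, 3, 4, 1, 2, 3, 4]
responder = [0, 0, 0, 0, 1, 1, 1, 1]
improvement = [1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
print(round(adjusted_difference(improvement, responder, baseline), 3))
```

The same function would be run twice, once with responders defined by return to usual health and once by return to usual activities.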
Exploratory analyses. Exploratory analyses were conducted to statistically compare the day 1 symptom profiles of ILI patients in the current analytic sample with those of the influenza positive patients in the FLU-PRO development sample. Specifically, independent samples t-tests were used to compare mean FLU-PRO total and domain scores between all ILI patients (hospitalized and non-hospitalized) and all influenza positive patients, while a 2-way analysis of variance (ANOVA) was used to compare mean FLU-PRO scores by influenza status (positive or negative) and hospitalization status (hospitalized or non-hospitalized). Scheffe's test was used to assess pairwise comparisons between groups.
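The two-group comparison can be illustrated with a pooled-variance (Student) t statistic. Whether the study used the pooled or the unequal-variance form is not stated, so the pooled form is an assumption, and the function name and data are hypothetical. The p-value is omitted because it requires a t distribution from a statistics library.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Independent-samples t statistic with pooled variance (assumed form)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


# Hypothetical day 1 total scores for ILI and influenza positive patients
ili = [1, 2, 3]
flu_positive = [2, 3, 4]
print(round(pooled_t(ili, flu_positive), 3))
```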

Sample
Of the 536 subjects enrolled in the study, 441 had the minimum required data (day 1 diary assessment and ≥1 post-day 1 diary entry); data from the 220 subjects testing negative for the influenza virus were used in the current study. Table 1 presents baseline demographic and clinical characteristics for the analytical sample. The majority of the participants (70.9%) were from non-US countries.

Evaluation of psychometric properties
Results for the entire sample are reported below; results stratified by hospitalization status are provided in the online supplement S1 File.
Descriptive statistics of FLU-PRO domain and Total scores. The distributional characteristics of the FLU-PRO domain and Total scores on day 1 are shown in Table 2. Mean domain scores were lowest for the Gastrointestinal domain (mean = 0.5; SD = 0.7) and highest for the Nose (mean = 1.5; SD = 1.0) and Throat (mean = 1.5; SD = 1.2) domains. Floor effects were found for the Gastrointestinal (47%) domain. No ceiling effects were found.
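Floor and ceiling effects of this kind are typically summarized as the percentage of respondents at the lowest and highest possible score. The sketch assumes domain scores range from 0 to 4, which is consistent with the 5-point scale and the reported means but is not stated explicitly in the text. The function name and data are hypothetical.

```python
def floor_ceiling(scores, minimum=0, maximum=4):
    """Percentage of scores at the scale minimum and maximum (0-4 range assumed)."""
    n = len(scores)
    floor_pct = 100 * sum(s == minimum for s in scores) / n
    ceiling_pct = 100 * sum(s == maximum for s in scores) / n
    return floor_pct, ceiling_pct


# Hypothetical day 1 domain scores for four patients
print(floor_ceiling([0, 0, 1, 2]))
```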
Construct validity. As shown in Table 4, at day 1 the strongest associations were observed between FLU-PRO domain and Total scores and the Patient Global Rating of Flu Severity (r_s = 0.31-0.68, p<0.0001), except Gastrointestinal (r_s = 0.19, p<0.05). Moderate to high correlations were also found between FLU-PRO scores and the Patient Global Rating of Physical Health for all scores (r_s = -0.30 to -0.49) except Gastrointestinal (r_s = -0.21), Nose, and Eyes (both r_s = -0.28). Only associations between the Patient Global Assessment of Interference in Daily Activities and the Total score (r_s = 0.36, p<0.0001) and Body/Systemic domain score (r_s = 0.40, p<0.0001) were moderate to strong.
Known-groups validity. Significant differences in FLU-PRO total scores were found across patient groups according to the patient global symptom severity rating (F = 81.7, p<0.001) [...] Fig 1, demonstrating a reduction in mean scores across days 1-14. In support of responsiveness, mean total and domain change scores were significantly greater for patients reporting a return to usual health (responders) by day 7, compared to those who did not, with the exception of the Eyes and Gastrointestinal domains (Table 6). Using patients' reports of return to usual activities as an anchor for responsiveness, mean Total and Body/Systemic change scores [...]

Discussion
The objective of this study was to assess the performance properties of the FLU-PRO in patients seen in the clinic with ILI and testing negative for the influenza virus. For FDA qualification purposes, the FLU-PRO was developed and tested in patients with acute, laboratory-confirmed influenza, with scores exhibiting sound measurement properties in the sample overall and stratified by hospitalization status. This current study tested the properties of FLU-PRO scores in hospitalized and non-hospitalized patients with ILI. If the instrument performs well, this would facilitate its use in population-level epidemiologic studies and natural history studies, where laboratory diagnosis of influenza is not always sought or confirmed, and in studies of patients with influenza-like symptoms but infected with viruses or pathogens other than influenza.
Results of this study indicate FLU-PRO scores are reliable and reproducible, demonstrate construct and known-groups validity, and are responsive to change in subjects with ILI. Internal consistency levels were high for each of the domains and the total score, and 2-day test-retest reliability levels during the first seven days following enrollment were generally moderate to strong. The relatively low test-retest reliability of the Gastrointestinal domain is due, in part, to the low symptom prevalence and constrained variance. Further evaluation in patients with influenza strains or pathogens characterized by greater incidence of gastrointestinal distress should be performed, as gastrointestinal symptoms may occur in up to 40% of patients with influenza [14]. It was interesting to note that score reproducibility was lower during the first 2 assessment days in "stable" patients (those reporting "no change" in their symptoms). This may be a reflection of the relative nature of "no change" during recovery from an acute illness, an interpretation that could be explored qualitatively. As hypothesized and in support of construct validity, FLU-PRO scores were significantly related to patient global ratings of influenza severity and global health, with patterns and values similar to those found in the influenza positive sample. Weaker associations were observed with the patient global rating of interference with activities, as patients may return to usual activities before total improvement/resolution of symptoms. The data similarly supported known-groups validity as FLU-PRO scores were lowest in patients rating their symptoms as the mildest, and increased with increasing patient-reported symptom severity. It was interesting to note that eye, nose, and throat symptoms were more strongly related to flu severity than respiratory symptoms in milder cases (defined by patient global assessment), with respiratory symptoms sensitive to differences in the more severe cases.
These results suggest the non-respiratory domains may be particularly useful in studies of patients with milder influenza-like illness and to capture the full range of symptom severity. The low prevalence of gastrointestinal symptoms (relatively high percentage of participants reporting no gastrointestinal symptoms) would make it more difficult to show significant differences by patient global ratings of influenza severity. Further study in patients with strains or pathogens characterized by gastrointestinal effects is needed. When responders were defined by reports of return to usual health, the FLU-PRO demonstrated responsiveness to change from day 1-7. Similar to results observed in the influenza positive sample, FLU-PRO total and domain scores, except for Eyes and Gastrointestinal, were responsive to change in the ILI group when return to health was used to define response. Using return to usual activities as an anchor, responsiveness in ILI patients was shown in the total and Body/Systemic domain scores.
Comparing current study findings in patients with ILI to influenza-positive patients in the original validation study sample [10], the FLU-PRO performed similarly in both patient groups. FLU-PRO domain and total score reliability and reproducibility were supported, with similar coefficient profiles across domains. Additionally, there were moderate to strong correlations between FLU-PRO domain and total scores and Patient Global Rating of Flu Severity. Correlations with interference with daily activities were stronger in the laboratory-confirmed influenza patient population.
Results of this study show the utility of the multi-domain profile scores provided by the FLU-PRO. Unlike other measures that assess 1 or 2 dimensions of influenza symptom severity for use in either adults or children [7,8], the FLU-PRO is a comprehensive, multi-dimensional measure with content intended for use across a wide age spectrum [9]. Its structure yields a total score, representing symptom severity overall to facilitate hypothesis testing, and separate assessments of 6 different bodily systems that may be differentially affected by the virus and/or treatment. For example, the Flu-iiQ assesses two domains, respiratory and systemic, with cough, sore throat, and nasal congestion symptoms captured within a single domain [7]. However, the ILI experienced by the patients in the current study was characterized by sore throat, with less severe respiratory and systemic symptoms than observed in patients testing positive for the influenza virus. The detailed body system profile can capture these symptoms, which patients stated were important to them [9], and also provide information on where treatment is and is not providing symptomatic relief.
Gastrointestinal symptoms were experienced by less than 25% of the sample, indicating these symptoms were not prevalent in the year of this study, although they have been observed in a greater proportion of patients in other years with different circulating viral strains. Although this is technically a floor effect, the results provided quantitative information on the prevalence of gastrointestinal symptoms in this sample [14]. The Gastrointestinal domain should be retained to ensure comprehensive symptom assessment across viral strains or pathogens.
While the findings of the current study suggest FLU-PRO scores are reliable, valid, and responsive in this expanded target population, the study had several limitations. First, although patients were tested for influenza using RIDTs, the sensitivity and specificity of these tests for detecting influenza A or B can vary, and can differ depending on the strain of influenza circulating during any given year [15,16]. Thus, it is likely that some ILI patients in this study were infected with influenza A or B. Testing for viruses other than influenza was not available at most sites, precluding analyses by viral etiology. The FLU-PRO performance can be evaluated in various specific viral infections in future studies. It is important to note that the purpose of the FLU-PRO is not to diagnose or to differentiate influenza types. Rather, it is to quantify the presence and severity of symptoms that are characteristic of various influenza and other respiratory viruses. The FLU-PRO performed well in those testing positive for influenza [10], the primary analyses of the study, with the current study suggesting the performance properties were comparable in patients presenting to the clinic with a similar set of symptoms but testing negative for influenza using the RIDT. Second, although hospitalized patients were included in the analytic sample, details about the hospitalization event (e.g., duration of influenza prior to hospitalization, acuity level during hospitalization, duration of hospitalization, complicating comorbid conditions, treatment) are unknown. These can be evaluated in future studies, but seem unlikely to affect the measurement properties of the instrument, although they may result in a different scoring algorithm. Third, due to sample size limitations, stratified analyses by patient characteristics (e.g., sex, age) were not performed, but should be considered for future studies.
Finally, while the content validity of the FLU-PRO has been established in children and adolescents through qualitative research, quantitative evaluation of measurement properties has not been conducted in these patient groups.
Results of this study suggest the FLU-PRO may be useful as an outcome measure in clinical trials and epidemiological studies of disease due to non-influenza viruses. Although influenza is often indistinguishable from other viral infections in clinical practice, a standardized comprehensive symptom measure may be useful for exploring differences in the presentation and course of various viral infections. The FLU-PRO may also be used as an inclusion criterion for studies to ensure patients have sufficient symptoms to test treatment effects and recovery patterns. Future work also may evaluate the use of the FLU-PRO in prevention studies, such as vaccine studies, for disease due to various viruses including influenza.

Conclusion
Results of this study suggest FLU-PRO scores are reliable, valid, and responsive to change in patients testing negative for the influenza virus, indicating the instrument can be used in studies of confirmed influenza and influenza-like illness.