A practical evaluation of statistical methods for the analysis of patient reported outcomes in an observational pharmaceutical study

  • Lucy R. Williams,

    Roles Conceptualization, Formal analysis, Visualization, Writing – original draft, Writing – review & editing

    Affiliations Gilead Sciences, Inc., Foster City, California, United States of America, School of Public Health, White City Campus, Imperial College London, London, United Kingdom

  • Andrea Marongiu,

    Roles Methodology, Writing – original draft, Writing – review & editing

    Affiliation Gilead Sciences, Inc., Foster City, California, United States of America

  • Filippos T. Filippidis,

    Roles Writing – review & editing

    Affiliation School of Public Health, White City Campus, Imperial College London, London, United Kingdom

  • Marion Heinzkill,

    Roles Writing – review & editing

    Affiliation Gilead Sciences, Inc., Foster City, California, United States of America

  • Anna R. van Troostenburg,

    Roles Writing – review & editing

    Affiliation Gilead Sciences, Inc., Foster City, California, United States of America

  • Richard Haubrich,

    Roles Writing – review & editing

    Affiliation School of Medicine, University of California San Diego, San Diego, California, United States of America

  • Heribert Ramroth

    Roles Supervision, Writing – original draft, Writing – review & editing

    Heribert.Ramroth@gilead.com

    Affiliation Gilead Sciences, Inc., Foster City, California, United States of America

Abstract

Background

Patient-reported outcomes (PROs) provide a unique opportunity to tailor clinical care to patients’ needs. Observational pharmaceutical industry analyses of PROs in the HIV field often utilise simplistic pairwise comparisons of pre-defined follow-up periods to baseline, making inappropriate missing data assumptions and yielding limited information on the nature of the change in PRO. Our aim was to evaluate different statistical approaches for PRO analyses.

Methods

Paired difference tests, Friedman’s ANOVAs (F-ANOVA), linear mixed models (LMMs) and weighted generalised estimating equations (wGEEs) were applied to the analysis of the Short Form 36 (SF-36) mental component score (MCS) and physical component score (PCS) from treatment-naïve patients in an observational cohort of people living with HIV. Changes in MCS and PCS were assessed to compare the benefits of each approach.

Results

The paired difference test demonstrated statistically significant increases in MCS and PCS from baseline to every follow-up, assuming, however, that data were missing completely at random. Use of the F-ANOVA was limited due to unbalanced data, leading to non-responder bias. While controlling for covariates, the LMMs and wGEEs illustrated a statistically significant increase in MCS and PCS with a steep increase over the first few months, followed by a plateau.

Conclusion

Relative to paired difference tests, multivariable regression approaches can better handle missing data, control for confounding factors, and provide information on the timing and magnitude of PRO changes. Regression methods therefore facilitate more informative conclusions in observational PRO analyses, and thus provide more detailed evaluations of treatment regimens from the patient’s perspective.

Introduction

The use of patient-reported outcomes (PROs) in clinical practice and research allows patients’ perspectives to be integrated into the evaluation of their treatment [1]. These outcomes are defined as any report of the status of a patient’s health condition that comes directly from the patient, without interpretation by a healthcare professional [2]. They assess features of health beyond survival, laboratory tests and biomarkers, and address concepts that are either impossible or difficult to directly observe [3], such as treatment satisfaction [4] and health-related quality of life (HRQoL) [5]. PROs therefore present an opportunity to understand treatments from the patients’ perspective in a more multidimensional manner than is possible with clinical outcomes alone.

The value of PROs has been recognised in clinical development, with PRO data being increasingly collected to better understand patients’ treatment experiences [6,7]. However, concerns have been raised over their analysis and reporting [8,9], particularly regarding the handling of high rates of missing data, the use of unsuitable methods of statistical analysis and lack of consistency of analytical approaches across studies [10,11].

Missing data and unbalanced data (where there is an unequal number of observations across variable categories) are common problems in longitudinal studies as patients frequently miss visits or are lost to follow-up [12]. The missingness is rarely missing completely at random (MCAR; where missing data do not systematically differ from observed data [13]), and is more frequently missing at random (MAR; where missingness depends on observed data but not on unobserved data) or missing not at random (MNAR; where missingness is related to the unobserved responses) [14,15]. However, simplistic statistical methods that assume MCAR missingness, such as paired difference tests, are often utilised [16,17], potentially leading to biased conclusions. Often these tests are used to repeatedly compare patient-reported outcomes at each follow-up interval to baseline (e.g., treatment initiation) without adjustment for multiple testing.
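The practical consequence of these definitions can be illustrated with a minimal Python sketch. It simulates hypothetical baseline and follow-up scores, removes follow-up observations under an MCAR and a MAR mechanism, and compares the complete-case means with the full-data mean. All distributions, probabilities and names below are illustrative assumptions and are not derived from any study data.

```python
import random
import statistics

def simulate(n=20000, seed=1):
    """Return (full-data mean, MCAR complete-case mean, MAR complete-case mean)."""
    rng = random.Random(seed)
    full, mcar, mar = [], [], []
    for _ in range(n):
        baseline = rng.gauss(45, 10)            # hypothetical baseline score
        follow_up = baseline + rng.gauss(5, 8)  # hypothetical follow-up score
        full.append(follow_up)
        # MCAR: every participant has the same 60% chance of being observed
        if rng.random() < 0.6:
            mcar.append(follow_up)
        # MAR: the chance of being observed depends only on the observed baseline
        p_observed = 0.8 if baseline >= 45 else 0.4
        if rng.random() < p_observed:
            mar.append(follow_up)
    return statistics.mean(full), statistics.mean(mcar), statistics.mean(mar)

full_mean, mcar_mean, mar_mean = simulate()
print(round(full_mean, 1), round(mcar_mean, 1), round(mar_mean, 1))
```

Under these assumptions the MCAR complete-case mean stays close to the full-data mean, whereas the MAR complete-case mean is shifted upwards because participants with higher baseline scores are more likely to be observed.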

To our knowledge, all work evaluating methods for the analysis of PROs has focussed on randomised controlled trials (RCTs). The aim of this study was to evaluate different statistical approaches for the analysis of numerical PROs using a real-world observational dataset. Given that the issues surrounding the statistical analysis of PROs have not yet been addressed in the field of HIV, and the increasing importance of their use for HIV patients [3], the statistical methods were evaluated on an HIV-1 dataset.

Materials and methods

Application dataset

The statistical methods were applied to PRO data collected in a prospective observational cohort study, “TAFNES” [GS-DE-292–1912], conducted between January 2016 and November 2019 [18]. Ethical clearance was given by the Ethical Committee of the Baden-Wuerttemberg State Medical Association, file number F-2015–075. Written informed consent was gained from patients included in the study. TAFNES enrolled 767 HIV-1 patients treated with Emtricitabine/Tenofovir alafenamide (F/TAF)-based antiretroviral treatments. PRO data were collected during routine visits at approximately 0, 3, 6, 12, 18 and 24 months after treatment initiation. As previous studies have shown greater change in PROs in treatment-naïve than treatment-experienced people living with HIV (PLWH) [18–20], the treatment-naïve subgroup was selected for these analyses.

Patient reported outcomes

Two PROs were evaluated: mental and physical HRQoL, collected using the Short Form-36 (SF-36) version 1 [21]. The SF-36 assesses HRQoL on eight scales: Physical Functioning, Role Physical, Bodily Pain, General Health, Vitality, Social Functioning, Role Emotional and Mental Health. These scales are used to compute two summary scores; the Mental Component Score (MCS) and the Physical Component Score (PCS), representing overall mental and physical HRQoL respectively. Each score ranges 0–100, with higher scores indicating higher HRQoL and a score of 50 representing the mean in the US calibration population.

Statistical approaches

We evaluated four statistical approaches on the adequacy of their assumptions, the detail of their conclusions, and the communicability of their results (S1 Fig). As the true underlying change in MCS and PCS is unknown, we compare methods on their consistency with clinically plausible trends. The selected approaches were chosen because of the commonality of their use in HIV pharmaceutical PRO analyses [22–26], their applicability to skewed distributions that are characteristic of the SF-36 data [27,28], and their simplicity or ease of implementation.

  1. Paired difference test (PD-test)
  2. Friedman’s ANOVA (F-ANOVA)
  3. Linear mixed model (LMM)
  4. Weighted generalised estimating equation (wGEE)

We selected methods that are either commonly used (PD-test, F-ANOVA) or have been proposed for PRO analyses but not frequently taken up (LMM, GEE) [1,26,39]. The wGEE analysis was extended to evaluate the use of a discrete- or continuous-time variable. We replicated the discrete-time wGEE model, using a continuous-time variable, and evaluated the following methods for non-linear trends:

  1. Polynomial transformation
  2. Fractional polynomial transformation
  3. Piecewise linear splines

Paired difference test

Paired difference tests compare the means or medians of two related samples. They are widely used in PRO analyses [22,23,29–31] due to their simplicity and the interpretability of their results, but they assume that missing data are MCAR. We applied paired Wilcoxon rank sum (PWRS) tests because no assumptions are required on the normality of the outcome distribution. We tested the validity of the MCAR assumption with Little’s test [32] and then tested the significance of the differences between the MCS and PCS values at M0 and their values at each follow-up visit window.
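As an illustration of the mechanics, the following Python sketch computes a paired Wilcoxon test (commonly called the Wilcoxon signed-rank test) on complete pairs only. It is a simplified sketch and not the code used for the analyses: it uses a large-sample normal approximation without tie or continuity correction, and the function names and example scores are hypothetical.

```python
from statistics import NormalDist

def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def paired_wilcoxon(baseline, follow_up):
    """Return (number of pairs used, W+, two-sided p value)."""
    # Pairs with a missing value (None) or a zero difference are dropped
    diffs = [f - b for b, f in zip(baseline, follow_up)
             if b is not None and f is not None and f != b]
    n = len(diffs)
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = (n * (n + 1) * (2 * n + 1) / 24) ** 0.5
    z = (w_plus - mean) / sd
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return n, w_plus, p

# Hypothetical scores; None marks a missed visit
m0 = [40, 52, 47, None, 38, 55, 44, 49, 41, 50]
m3 = [46, 55, 45, 50, None, 61, 49, 53, 48, 51]
n_pairs, w_plus, p_value = paired_wilcoxon(m0, m3)
print(n_pairs, w_plus, round(p_value, 3))
```

Note that the two participants with a missing value at either visit contribute nothing to the test, which is why the validity of the result rests on the MCAR assumption.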

Friedman’s ANOVA

The non-parametric F-ANOVA tests whether two or more related population means are equal. It requires balanced data (the same number of observations at each factor level) and assumes that missing data are MCAR. The approach is occasionally used for PRO analyses [33,34], as it can compare PRO values at multiple time-points, whereas each paired difference test can be used on only two time-points. We applied the F-ANOVA to test the significance of the difference in PRO scores at each visit window. Where a significant difference was identified, post-hoc PWRS tests were used to identify the combinations of visit windows that were significantly different.
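The effect of the balanced-data requirement can be seen in a minimal Python sketch of the Friedman statistic. The sketch assumes the standard formula without tie correction, returns only the statistic and its degrees of freedom (the p value would come from a chi-squared distribution), and uses hypothetical scores and names.

```python
def friedman_statistic(rows):
    """rows: one list per participant with one score per visit (None = missing).

    Returns (participants used, degrees of freedom, Friedman statistic).
    """
    # Only participants with an observation at every visit can be ranked
    complete = [r for r in rows if all(v is not None for v in r)]
    n, k = len(complete), len(complete[0])
    rank_sums = [0.0] * k
    for row in complete:
        for j, v in enumerate(row):
            smaller = sum(1 for w in row if w < v)
            tied = sum(1 for w in row if w == v)
            rank_sums[j] += smaller + (tied + 1) / 2  # average rank within the row
    q = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    return n, k - 1, q

# Hypothetical scores at three visits; the fourth participant missed a visit
scores = [
    [40, 46, 48],
    [52, 55, 54],
    [47, 45, 50],
    [38, None, 44],
    [44, 49, 51],
]
n_used, df, q = friedman_statistic(scores)
print(n_used, df, q)
```

In this toy example one of five participants is discarded entirely because of a single missed visit; with six visit windows, the same rule removes many more participants.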

Linear mixed model

The LMM is an extension of linear regression that tests the significance of the association between two variables. This maximum likelihood approach accounts for the correlation between repeated observations through the specification of random effects, and it can handle MCAR and MAR missingness. The approach is being increasingly used for the analysis of PROs in clinical trials, with a discrete-time variable [35].

To satisfy the LMM normality assumptions with the TAFNES SF-36 outcomes, both scores were log transformed: MCSt = –ln(100 – MCS) and PCSt = –ln(100 – PCS). Both were modelled as a function of discrete-time, with the following baseline covariates: age, sex, HIV RNA count (log copies/ml), number of neuropsychiatric comorbidities, number of physical comorbidities, and presentation with advanced HIV, defined as persons presenting with a CD4 cell count ≤200/µl or presenting with an AIDS-defining event, regardless of the CD4 cell count [36]. The final combination of variables was selected using backwards selection (threshold p value 0.05) from a global model with all variables and their interactions with time. As backwards selection can lead to inappropriate removal of important covariates, we then validated the covariate selection using the Akaike information criterion (AIC), including only those covariates that improved the model fit (lower AIC). Age and sex were defined as a-priori covariates and were kept in the models regardless of statistical significance. A random effect for each individual was added to the intercept to account for correlation between repeated observations. Participants with missing covariate data were excluded from the LMM analysis.
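The transformation and its inverse can be written as a short Python sketch. The function names are hypothetical; the formula is the one stated above, and the inverse follows algebraically from it.

```python
import math

def to_model_scale(score):
    """Transform a 0-100 score with -ln(100 - score); requires score < 100."""
    return -math.log(100 - score)

def to_score_scale(value):
    """Invert the transformation: score = 100 - exp(-value)."""
    return 100 - math.exp(-value)

def shifted_score(score, shift):
    """Score obtained after adding `shift` on the transformed scale."""
    return to_score_scale(to_model_scale(score) + shift)

# The same shift on the transformed scale is a different change in score
print(round(shifted_score(40, 0.1) - 40, 2))  # change starting from a score of 40
print(round(shifted_score(80, 0.1) - 80, 2))  # change starting from a score of 80
```

Because a fixed change on the transformed scale corresponds to a smaller change in score the closer the starting score is to 100, a model coefficient does not translate into a single number of score points.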

Weighted generalised estimating equation: discrete-time variable

The GEE is also an extension of the traditional linear model. It is a quasi-likelihood approach that accounts for correlation between observations with the specification of a working correlation matrix [37,38]. Although proposed as an appropriate method for PRO analyses [26], this approach has received little attention in practice [39].

As the traditional GEE assumes missing data are MCAR but the weighted GEE accounts for both MCAR and MAR missing data, we applied the wGEE. To weight observations by their inverse probability of being observed, weights were generated using multivariable logistic regression, as described in Salazar et al. [40]. A separate logistic regression model was executed for each visit window, using a binary outcome variable for whether an individual provided an SF-36 observation during a given follow-up interval or not. The predictors were the covariates detailed in section 3.2.3 as well as the most recently observed MCS and PCS values, where available. The fit of the models was checked with Hosmer-Lemeshow tests. Selection of the working correlation matrix was based on the quasi-likelihood under the independence model criterion (QIC), a measure of model fit with lower values representing better fit. The QIC was developed as a modification of the AIC to apply to models fit by the GEE approach. Therefore, in this manuscript we use the QIC for GEE analyses and AIC for LMM analyses. Participants with missing covariate data were excluded from the wGEE analysis.
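The idea behind the weighting can be sketched in a few lines of Python. The sketch assumes that the probability of being observed has already been estimated for each observed participant (here given as hypothetical numbers rather than fitted with logistic regression) and shows only how the inverse of that probability up-weights under-represented participants. It is not the weighting procedure of Salazar et al.

```python
def inverse_probability_weights(p_observed):
    """Weight each observed participant by 1 / P(observed); requires p > 0."""
    return [1 / p for p in p_observed]

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical observed scores and estimated probabilities of being observed
scores = [55, 52, 40, 38]
p_obs = [0.8, 0.8, 0.4, 0.4]

weights = inverse_probability_weights(p_obs)
unweighted = sum(scores) / len(scores)
weighted = weighted_mean(scores, weights)
print(round(unweighted, 2), round(weighted, 2))
```

Participants who resemble those likely to be missing (low probability of observation) count for more, so the weighted mean moves towards the value expected had everyone been observed, provided the probability model is adequate.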

Weighted generalised estimating equation: continuous-time variable

To compare alternative non-linear modelling approaches for PRO analyses, the wGEE analysis was repeated with time as a continuous, numeric, non-linear variable. Models for MCS and PCS were fitted, modelling the time variable with polynomial [41], fractional polynomial [42] and piecewise linear spline [43] approaches. For each, the best-fitting functional form was first determined using the QIC in the full model and was then validated following covariate selection.
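The three functional forms differ only in how the time variable is expanded before it enters the regression. A minimal Python sketch of the corresponding time terms is given below. The shift of 0.1 and scale of 100 follow the fractional polynomial reported in the Results; the default power, the knot at 6 months and the function names are illustrative assumptions.

```python
import math

def polynomial_terms(t, degree=3):
    """Polynomial: t, t^2, ..., t^degree."""
    return [t ** d for d in range(1, degree + 1)]

def fractional_polynomial_term(t, power=-2, shift=0.1, scale=100):
    """Fractional polynomial: ((t + shift) / scale)^power, with log for power 0."""
    x = (t + shift) / scale  # the shift keeps x positive at t = 0
    return math.log(x) if power == 0 else x ** power

def linear_spline_terms(t, knots=(6,)):
    """Piecewise linear spline: t plus one hinge term max(0, t - knot) per knot."""
    return [t] + [max(0.0, t - k) for k in knots]

for month in (0, 3, 6, 12, 24):
    print(month,
          polynomial_terms(month),
          fractional_polynomial_term(month),
          linear_spline_terms(month))
```

With a spline, each hinge coefficient is the change in slope after its knot, which is why the spline form tends to be easier to communicate than the power terms of the two polynomial forms.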

Sensitivity analyses

While other GEE extensions are available, we focused on the weighted GEE as a relatively simple approach for handling MAR data. However, we performed a sensitivity analysis to compare the results of an unweighted GEE (with missing MCS and PCS values filled using multiple imputation) to the wGEE (S1 Methods).

Secondly, as previously recommended to reduce the number and strength of assumptions of more complex models, and to maximise interpretability [1], our primary analysis focussed on methods that assume MCAR or MAR missingness. Sensitivity analyses using multiple imputation were performed to evaluate the robustness of the LMM and wGEE results to MNAR data (S1 Methods).

Software

All analyses were performed using R version 3.5.2. Sample R code is available in an online repository: https://github.com/lucyrose96/PRO-Methods-Sample-Code.

Results

Descriptive analysis

The sample consisted of 293 treatment-naïve participants. Median PCS at baseline was comparable to the general population (54.3, IQR 48.0–57.4), whereas MCS was lower (46.6, IQR 35.7–54.1). The analysis population size and baseline characteristics of each statistical approach differed from those of the overall recruited population to varying extents, depending on the missing data and balanced data assumptions (Table 1). Both MCS and PCS were negatively skewed over time (Shapiro-Wilk test P value <0.0001, S2 Fig).

Table 1. Baseline demographic, clinical and SF-36 characteristics of the total analysis population and each statistical analysis population.

https://doi.org/10.1371/journal.pone.0344968.t001

While 269 patients (91.8%) provided evaluable SF-36 data at M0, this dropped to 185 (63.1%) at M3 and 163 (60.6%) at M24 (Fig 1A). The overall population MCS and PCS increased between M0 and M6, then displayed a plateau or slight decline (Fig 1C–1D). By M24, the median MCS had increased by 5.97 (12.8%), while the median PCS had increased by 1.40 (2.6%).

Fig 1. Change in SF-36 mental component score (MCS) and physical component score (PCS) over the two-year follow-up.

(A) Number of individuals providing evaluable SF-36 data in each visit window. (B) Population-level change in median (Inter-Quartile Range) MCS (blue) and PCS (red). (C) Individual-level change in PCS. (D) Individual-level change in MCS. Points in (C) and (D) coloured by visit window assigned in data processing: Red = Baseline, Yellow = 3 months, Green = 6 months, Turquoise = 12 months, Blue = 18 months, Pink = 24 months.

https://doi.org/10.1371/journal.pone.0344968.g001

PWRS test

The PWRS test demonstrated a statistically significant increase in MCS and PCS between treatment initiation and every follow-up visit window (Table 2). As most participants had provided SF-36 data at baseline, the requirement of the PWRS test for balanced data reduced its analysis population size only slightly. However, there were differences between the populations included in and excluded from the analyses. In the M0-M24 PWRS analysis population, the median baseline MCS was higher than in the excluded population, and there were moderate differences in the presence of physical comorbidities. A significant result for Little’s test (p = 0.0001) showed that the MCAR assumption of the PWRS test was not reasonable.

Table 2. Paired Wilcoxon Rank Sum Test change in median mental and physical component scores.

https://doi.org/10.1371/journal.pone.0344968.t002

Friedman’s ANOVA

The requirement for balanced data (for our analysis, an observation at each visit window) in the F-ANOVA reduced its analysis population to 73. This reduced sample demonstrated differing trends to the excluded unbalanced population, leading the F-ANOVA tests to demonstrate contrasting results to the PWRS (S3 Fig). Although the F-ANOVA demonstrated a statistically significant difference in the median MCS between study visits (df = 5, χ² = 15, p = 0.01), post-hoc tests did not find statistically significant differences between baseline and every follow-up. For PCS, the F-ANOVA did not identify a statistically significant difference in median scores between study visits (df = 5, χ² = 6, p = 0.3).

LMM and wGEE

The regression outputs for the LMM and wGEE demonstrated a significant increase in mental HRQoL from treatment initiation to every follow-up period, as well as an association with the number of ongoing neuropsychiatric comorbidities and log HIV RNA count at baseline (S1 Table). In both models, a higher number of neuropsychiatric comorbidities and higher log HIV RNA count at baseline were negatively associated with mental HRQoL, but the negative effect of initial HIV RNA count on MCS did not persist over the first few months of follow-up.

The PCS LMM and wGEE models showed a significant increase in PCS from treatment initiation to every follow-up period (S2 Table). Variables that were predictive of higher PCS were: age, number of ongoing physical comorbidities, presentation with advanced HIV and log HIV RNA at baseline. Although presentation with advanced HIV and higher HIV RNA at baseline were significantly associated with lower PCS at treatment initiation, the effects were reduced by the first follow-up visit at M3. LMM model diagnostic plots are provided in S4 Fig and the logistic regression results for the wGEE weighting models are presented in S3 Table. The unstructured working correlation matrix was selected for all wGEE models (S4 Table).

wGEE continuous-time analysis

The best-fitting continuous-time MCS wGEE was the fractional polynomial (((time + 0.1)/100)^-2; QIC = 5236), followed by the piecewise linear spline (QIC = 5238), then the polynomial (QIC = 5239). For the PCS, the best-fitting model was the 3-degree polynomial (QIC = 4364), followed by the piecewise linear spline (QIC = 4365), and the fractional polynomial (QIC = 4367). All continuous MCS and PCS models gave a better fit than their respective categorical time models (MCS categorical QIC = 5248, PCS categorical QIC = 4374). However, the similarity of the QIC values demonstrates the similarity in the quality of model fit across approaches.

The best-fitting continuous-time models showed the same associations between the covariates and the outcomes that were previously identified in the categorical models (Tables 3 and 4). These models demonstrated the steep increase in the population-average mental HRQoL immediately after treatment initiation, as well as a more gradual incline in physical HRQoL, with a slight decline after the first year (Fig 2).

Table 3. Weighted generalised estimating equation regression output for the Mental Component Score model with time modelled using a fractional polynomial.

https://doi.org/10.1371/journal.pone.0344968.t003

Table 4. Weighted Generalised Estimating Equation regression output for the Physical Component Score model with time modelled with a three-degree polynomial.

https://doi.org/10.1371/journal.pone.0344968.t004

Fig 2. Adjusted mental and physical component scores, estimated using the best-fitting non-linear weighted generalised estimating equation models.

(A) Mental component score (MCS). (B) Physical component score (PCS). MCS is modelled with time with a fractional polynomial [((time + 0.1)/10)^-2], adjusting for sex, age and presentation with advanced HIV. PCS is modelled with a 3-degree polynomial [time^3], adjusting for sex, age, physical comorbidities, presentation with advanced HIV and baseline HIV RNA count.

https://doi.org/10.1371/journal.pone.0344968.g002

Sensitivity analyses

The LMM and wGEE results were overall robust to missing data being MNAR. Analysis of the TAFNES data with an unweighted GEE with multiply imputed missing observations gave comparable results to the wGEE (S5 Table).

Discussion

In this paper we evaluated four statistical approaches and, within the last approach, three non-linear modelling approaches for the analysis of longitudinal changes in PROs. The approaches reached generally consistent conclusions, but they differed in their depth, the appropriateness of their assumptions and the communicability of their results.

The strengths and limitations of each approach are summarised in Table 5. While the PD test is simple and easily interpreted, its missing data assumption was invalid, and it provided little insight into the longitudinal nature of the change in HRQoL and the factors that influenced it. The requirement of the F-ANOVA for balanced data limited its ability to accurately analyse the dataset, such that it gave contrasting results to all other approaches. This demonstrates a limitation of simpler statistical approaches: as they require data to be complete for all, or for certain combinations of, visit windows, their sample size is reduced and they are susceptible to non-responder bias. In contrast, the LMM and wGEE approaches utilised all SF-36 observations. They illustrated how mental and physical HRQoL changed after treatment initiation and identified factors associated with the changes. This is important for identifying clinical or socio-demographic groups with poorer HRQoL outcomes or poorer responses to treatment, and for controlling for variables associated with missingness. While it is possible to explore differences in changes in an outcome between subgroups with simpler statistical approaches such as the PD test, this requires stratification, potentially into very small groups, leading to a large number of tests and increasing the likelihood of false positives. Multivariable regression approaches instead evaluate all covariates together. The LMM analyses required transformation of the MCS and PCS outcomes to meet the model’s normality assumptions and, owing to the complexity of this transformation, the parameter estimates could not be back-transformed, so the interpretability of the model coefficients was reduced. This is a major limitation of LMMs when data do not satisfy their assumptions. In these cases, back-transforming model predictions and illustrating the change over time across patient profiles would support understanding.

The issue with LMM interpretability contrasts with the more robust and flexible wGEE approach, which does not make distributional assumptions, therefore facilitating analysis on the original scale. As PRO data tend to be collected through questionnaires, they are susceptible to ceiling and floor effects (where most observed values are close to the maximum or minimum possible value, respectively). PRO data therefore often do not conform to the normal distribution, and in these cases, the wGEE may prove favourable.

Table 5. Summary of the strengths and limitations of the four statistical approaches.

https://doi.org/10.1371/journal.pone.0344968.t005

The methods evaluated in this manuscript highlight an inherent trade-off in study design between model complexity/appropriateness of assumptions, and interpretability. Typically, simpler methods are more easily communicated, particularly to audiences without statistical training. More complex models may more accurately represent the data, but the underlying message can be lost. Ultimately, method selection should strike an appropriate balance for the study objectives.

A final benefit of both multivariable regression approaches was the ability to analyse continuous variables, where greater insight can be communicated on the nature of the change in a PRO over time. These analyses showed a steep incline in mental HRQoL soon after treatment initiation, but a more gradual change in physical HRQoL. As we used observational data, it is not possible to know if this trend represents the true underlying MCS and PCS trends; however, it corroborates previously seen trends in physical and mental HRQoL following initiation of antiretrovirals in treatment-naïve PLWH [20]. Although the regression models themselves are more difficult to interpret, we recommend visualising trends through figures to aid communication of results to clinicians. From a statistical perspective, the continuous-time variable models had a better fit than the categorical-time models as they required fewer parameter estimates. This simplification is important for non-RCTs where confounders must be controlled for.

The fractional polynomial and polynomial models gave the best-fitting models for the MCS and PCS, respectively. However, as both polynomial approaches use power terms to model non-linear trends, the parameter estimates themselves may be difficult to interpret for audiences without statistical training. In this illustration, we selected the best model based on model fit; however, for other researchers, the more interpretable piecewise linear spline approach may be preferable, particularly if the model fit is similar to that of (fractional) polynomial transformations. If selecting a polynomial or fractional polynomial approach, PRO trends can be visualised using the estimates generated from the model to aid communication of results. This visualisation could also be extended to generate subgroup-specific estimates, and confidence intervals could be generated with bootstrapping. The decision on the best approach for other datasets will depend on the underlying trend being modelled.

Our recommendation for the use of multivariable regression modelling approaches for the analysis of PROs is supported by recent work by the Setting International Standards for the Analysis of Quality of Life (SISAQOL) consortium. SISAQOL has discussed key statistical considerations for PRO analyses for cancer RCTs and highlights the importance of methods that can handle missing data appropriately, make realistic assumptions and produce interpretable results [1]. For evaluating the change in PRO at a time point and for describing its response trajectory over time, they recommend the use of an LMM with a discrete-time variable. While our work supports the use of multivariable regression approaches such as the LMM, we also highlight the value of more robust regression approaches such as the wGEE. Additionally, we demonstrate the improvement in model fit that can be made by modelling time continuously, particularly for observational studies where models may require multiple covariates and interactions, where the longitudinal change is of interest, and where patients might not attend follow-up visits at exact time points.

While the primary objective of this manuscript was to evaluate statistical approaches on data that reflect the complexity and variability of real-world clinical practice, simulations would be valuable to quantify the degree of accuracy and precision of each method. A simulation study was beyond the scope of this manuscript, but future work evaluating statistical methods on simulated PRO data would be valuable. The use of one example dataset may limit the generalisability of the conclusions of this research. PRO data comes in a variety of forms, so the benefits of one method for these data may not apply for other PRO data. However, the methods recommended in this paper are versatile for a range of outcomes; both the LMM and GEE can be applied to binomial (binary PROs, e.g., improvement/no improvement), Poisson (count PROs, e.g., number of symptoms), Gamma (exponentially distributed PROs, e.g., time to self-reported recovery) and gaussian data (numeric PROs, e.g., some score-based PROs). Importantly for PRO data, both can be applied to unbalanced data and data that are MAR or MCAR, and the GEE can be applied to skewed outcome distributions.

In this paper we have covered four statistical approaches, but depending on other researchers’ data, other methods may be considered. For example, when data cannot be assumed to satisfy the MCAR or MAR assumptions, pattern-mixture models or joint modelling approaches can better handle missing data. Data that conform to the normal distribution assumption with or without a link function can be handled with the LMM or generalised LMM, giving greater interpretability than when using the LMM with a complex transformation of the outcome variable. PROs may also be analysed as time to plateau or time to deterioration, in which case survival-type approaches would be more appropriate than those covered in this manuscript. A wider range of methods has been discussed theoretically elsewhere [1], and future work that practically evaluates the applicability of these methods to PRO data would be valuable.

Conclusions

In this paper, we demonstrated the utility of multivariable regression modelling approaches for an example PRO dataset of HIV-1 patients, particularly for characterising the nature of the change in PRO over time, identifying associated covariates and for performing analysis with reasonable missing data assumptions. We highlight the benefit of the robustness of the wGEE for analysing non-normal data and recommend it as a favourable approach when the assumptions of the LMM cannot be met. Finally, we showed how modelling time continuously can improve the fit of covariate-heavy regression models.

Supporting information

S3 Fig. Balanced and Unbalanced MCS and PCS Across Visits.

https://doi.org/10.1371/journal.pone.0344968.s004

(DOCX)

S1 Table. Mental Component LMM and wGEE Regression Estimates.

https://doi.org/10.1371/journal.pone.0344968.s006

(DOCX)

S2 Table. Physical Component LMM and wGEE Regression Estimates.

https://doi.org/10.1371/journal.pone.0344968.s007

(DOCX)

S3 Table. Probability of SF-36 Observation Logistic Regression Results.

https://doi.org/10.1371/journal.pone.0344968.s008

(DOCX)

Acknowledgments

We thank Bo Zhao for conducting additional sensitivity analyses and Craig Pfeifer for helping with editorial aspects and submission of the manuscript.

References

  1. Coens C, Pe M, Dueck AC, Sloan J, Basch E, Calvert M, et al. International standards for the analysis of quality-of-life and patient-reported outcome endpoints in cancer randomised controlled trials: recommendations of the SISAQOL Consortium. Lancet Oncol. 2020;21(2):e83–96. pmid:32007209
  2. Food and Drug Administration, U.S. Department of Health and Human Services. Guidance for Industry Use in Medical Product Development to Support Labeling Claims Guidance for Industry. Clinical/Medical Federal Register. 2009.
  3. Kall M, Marcellin F, Harding R, Lazarus JV, Carrieri P. Patient-reported outcomes to enhance person-centred HIV care. Lancet HIV. 2020;7(1):e59–68. pmid:31776101
  4. Atkinson MJ, Sinha A, Hass SL, Colman SS, Kumar RN, Brod M, et al. Validation of a general measure of treatment satisfaction, the Treatment Satisfaction Questionnaire for Medication (TSQM), using a national panel study of chronic disease. Health Qual Life Outcomes. 2004;2:12. pmid:14987333
  5. Ware Jr JE, Snow KK, Kosinski M, Gandek B. The SF-36 Health Survey: Manual and Interpretation Guide. Boston: Health Institute, New England Medical Center. 2000.
  6. Doward LC, Gnanasakthy A, Baker MG. Patient reported outcomes: looking beyond the label claim. Health Qual Life Outcomes. 2010;8:89. pmid:20727176
  7. Gnanasakthy A, Mordin M, Evans E, Doward L, DeMuro C. A review of patient-reported outcome labeling in the United States (2011–2015). Value in Health. 2017;20:420–9.
  8. Mercieca-Bebber R, King MT, Calvert MJ, Stockler MR, Friedlander M. The importance of patient-reported outcomes in clinical trials and strategies for future optimization. Patient Relat Outcome Meas. 2018;9:353–67. pmid:30464666
  9. Rivera SC, Kyte DG, Aiyegbusi OL, Slade AL, McMullan C, Calvert MJ. The impact of patient-reported outcome (PRO) data from clinical trials: a systematic review and critical analysis. Health Qual Life Outcomes. 2019;17(1):156. pmid:31619266
  10. Bylicki O, Gan HK, Joly F, Maillet D, You B, Péron J. Poor patient-reported outcomes reporting according to CONSORT guidelines in randomized clinical trials evaluating systemic cancer therapy. Ann Oncol. 2015;26(1):231–7. pmid:25355720
  11. Mack DE, Wilson PM, Santos E, Brooks K. Standards of reporting: the use of CONSORT PRO and CERT in individuals living with osteoporosis. Osteoporos Int. 2018;29(2):305–13. pmid:28971256
  12. 12. Ibrahim JG, Molenberghs G. Missing data methods in longitudinal studies: a review. Test (Madr). 2009;18(1):1–43. pmid:21218187
  13. 13. Richards LE, Little RJA, Rubin DB. Statistical Analysis with Missing Data. Journal of Marketing Research. 1989;26(3):374.
  14. 14. Fairclough DL, Peterson HF, Cella D, Bonomi P. Comparison of several model-based methods for analysing incomplete quality of life data in cancer clinical trials. Statist Med. 1998;17(5–7):781–96.
  15. 15. Fairclough DL, Peterson HF, Chang V. Why are missing quality of life data a problem in clinical trials of cancer therapy?. Statistics in Medicine. 1998;:667–77.
  16. 16. Fielding S, Ogbuagu A, Sivasubramaniam S, MacLennan G, Ramsay CR. Reporting and dealing with missing quality of life data in RCTs: has the picture changed in the last decade?. Qual Life Res. 2016;25(12):2977–83. pmid:27650288
  17. 17. Pe M, Dorme L, Coens C, Basch E, Calvert M, Campbell A, et al. Statistical analysis of patient-reported outcome data in randomised controlled trials of locally advanced and metastatic breast cancer: a systematic review. Lancet Oncol. 2018;19(9):e459–69. pmid:30191850
  18. 18. Pauli R, Jessen H, Postel N, Heuchel T, Rieke A, Hillenbrand H, et al. Effectiveness, persistence and safety of elvitegravir/cobicistat/emtricitabine/tenofovir alafenamide (E/C/F/TAF), F/TAF 3rd agent or rilpivirine/F/TAF (R/F/TAF) in treatment-naïve HIV-1 infected patients - 24-month results from the German TAFNES cohort. HIV Medicine. 2019;:317–37.
  19. 19. Dessie ZG, Zewotir T, Mwambi H, North D. Multivariate multilevel modeling of quality of life dynamics of HIV infected patients. Health Qual Life Outcomes. 2020;18(1):80. pmid:32209095
  20. 20. Protopopescu C, Marcellin F, Spire B, Préau M, Verdon R, Peyramond D, et al. Health-related quality of life in HIV-1-infected patients on HAART: a five-years longitudinal analysis accounting for dropout in the APROCO-COPILOTE cohort (ANRS CO-8). Qual Life Res. 2007;16(4):577–91. pmid:17268929
  21. 21. Ware JE Jr, Sherbourne CD. The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Med Care. 1992;30(6):473–83. pmid:1593914
  22. 22. Llibre JM, Hung C-C, Brinson C, Castelli F, Girard P-M, Kahl LP, et al. Efficacy, safety, and tolerability of dolutegravir-rilpivirine for the maintenance of virological suppression in adults with HIV-1: phase 3, randomised, non-inferiority SWORD-1 and SWORD-2 studies. Lancet. 2018;391(10123):839–49. pmid:29310899
  23. 23. Aboud M, Kaplan R, Lombaard J, Zhang F, Hidalgo JA, Mamedova E, et al. Dolutegravir versus ritonavir-boosted lopinavir both with dual nucleoside reverse transcriptase inhibitor therapy in adults with HIV-1 infection in whom first-line therapy has failed (DAWNING): an open-label, non-inferiority, phase 3b trial. Lancet Infect Dis. 2019;19(3):253–64. pmid:30732940
  24. 24. Murray M, Antela A, Mills A, Huang J, Jäger H, Bernal E, et al. Patient-Reported Outcomes in ATLAS and FLAIR Participants on Long-Acting Regimens of Cabotegravir and Rilpivirine Over 48 Weeks. AIDS Behav. 2020;24(12):3533–44. pmid:32447500
  25. 25. Wohl D, Clarke A, Maggiolo F, Garner W, Laouri M, Martin H, et al. Patient-Reported Symptoms Over 48 Weeks Among Participants in Randomized, Double-Blind, Phase III Non-inferiority Trials of Adults with HIV on Co-formulated Bictegravir, Emtricitabine, and Tenofovir Alafenamide versus Co-formulated Abacavir, Dolutegravir,. Patient. 2018;11: 561–73.
  26. 26. Bell ML, Horton NJ, Dhillon HM, Bray VJ, Vardy J. Using generalized estimating equations and extensions in randomized trials with missing longitudinal patient reported outcome data. Psychooncology. 2018;27(9):2125–31. pmid:29802657
  27. 27. Burholt V, Nash P. Short Form 36 (SF-36) Health Survey Questionnaire: normative data for Wales. J Public Health (Oxf). 2011;33(4):587–603. pmid:21307049
  28. 28. Bowling A, Bond M, Jenkinson C, Lamping DL. Short Form 36 (SF-36) Health Survey questionnaire: which normative data should be used? Comparisons between the norms provided by the Omnibus Survey in Britain, the Health Survey for England and the Oxford Healthy Life Survey. J Public Health Med. 1999;21(3):255–70. pmid:10528952
  29. 29. Antinori A, Cossu MV, Menzaghi B, Sterrantino G, Squillace N, Di Cristo V, et al. Patient-Reported Outcomes in an Observational Cohort of HIV-1-Infected Adults on Darunavir/Cobicistat-Based Regimens: Beyond Viral Suppression. Patient. 2020;13(3):375–87. pmid:32266663
  30. 30. Jayaweera D, Dejesus E, Nguyen KL, Grimm K, Butcher D, Seekins DW. Virologic suppression, treatment adherence, and improved quality of life on a once-daily efavirenz-based regimen in treatment-Naïve HIV-1-infected patients over 96 weeks. HIV Clin Trials. 2009;10(6):375–84. pmid:20133268
  31. 31. Murray M, Pulido F, Mills A, Ramgopal M, LeBlanc R, Jaeger H, et al. Patient-reported tolerability and acceptability of cabotegravir + rilpivirine long-acting injections for the treatment of HIV-1 infection: 96-week results from the randomized LATTE-2 study. HIV Res Clin Pract. 2019;20(4–5):111–22. pmid:31533539
  32. 32. Little RJA. A Test of Missing Completely at Random for Multivariate Data with Missing Values. Journal of the American Statistical Association. 1988;83(404):1198–202.
  33. 33. Rintala A, Häkkinen A, Paltamaa J. Ten-year follow-up of health-related quality of life among ambulatory persons with multiple sclerosis at baseline. Qual Life Res. 2016;25(12):3119–27. pmid:27363691
  34. 34. Thompson E, Viksveen P, Barron S. A patient reported outcome measure in homeopathic clinical practice for long-term conditions. Homeopathy. 2016;105(4):309–17. pmid:27914570
  35. 35. Bottomley A, Pe M, Sloan J, Basch E, Bonnetain F, Calvert M, et al. Analysing data from patient-reported outcome and quality of life endpoints for cancer clinical trials: a start in setting international standards. Lancet Oncol. 2016;17(11):e510–4. pmid:27769798
  36. 36. Antinori A, Coenen T, Costagiola D, Dedes N, Ellefson M, Gatell J, et al. Late presentation of HIV infection: a consensus definition. HIV Med. 2011;12(1):61–4. pmid:20561080
  37. 37. Wedderburn RWM. Quasi-Likelihood Functions, Generalized Linear Models, and the Gauss-Newton Method. Biometrika. 1974;61(3):439.
  38. 38. Pan W. Akaike’s information criterion in generalized estimating equations. Biometrics. 2001;57(1):120–5. pmid:11252586
  39. 39. Hamel J-F, Saulnier P, Pe M, Zikos E, Musoro J, Coens C, et al. A systematic review of the quality of statistical methods employed for analysing quality of life data in cancer randomised controlled trials. Eur J Cancer. 2017;83:166–76. pmid:28738257
  40. 40. Salazar A, Ojeda B, Dueñas M, Fernández F, Failde I. Simple generalized estimating equations (GEEs) and weighted generalized estimating equations (WGEEs) in longitudinal studies with dropouts: guidelines and implementation in R. Stat Med. 2016;35(19):3424–48. pmid:27059703
  41. 41. Nussbaumer HJ. Polynomial Transforms. Springer S. Fast Fourier Transform and Convolution Algorithms. Springer S. Springer, Berlin, Heidelberg; 1982. pp. 151–80. https://doi.org/10.1007/978-3-642-81897-4_6
  42. 42. Royston P, Altman DG. Regression Using Fractional Polynomials of Continuous Covariates: Parsimonious Parametric Modelling. Applied Statistics. 1994;43(3):429.
  43. 43. Howe LD, Tilling K, Matijasevich A, Petherick ES, Santos AC, Fairley L, et al. Linear spline multilevel models for summarising childhood growth trajectories: A guide to their application using examples from five birth cohorts. Stat Methods Med Res. 2016;25(5):1854–74. pmid:24108269