Sequential FOLFIRI.3 + Gemcitabine Improves Health-Related Quality of Life Deterioration-Free Survival of Patients with Metastatic Pancreatic Adenocarcinoma: A Randomized Phase II Trial

Background A randomized multicenter phase II trial was conducted to assess a sequential treatment strategy alternating FOLFIRI.3 and gemcitabine (Arm 2) compared with gemcitabine alone (Arm 1) in patients with previously untreated metastatic pancreatic adenocarcinoma. The primary endpoint was the progression-free survival (PFS) rate at 6 months. The trial concluded that the sequential treatment strategy appears feasible and effective, with a 6-month PFS rate of 43.5% in Arm 2 (26.1% in Arm 1). This paper reports the longitudinal analysis of health-related quality of life (HRQoL), a secondary endpoint of the study. Methods HRQoL was evaluated using the EORTC QLQ-C30 at baseline and every two months until the end of the study or death. HRQoL deterioration-free survival (QFS) was defined as the time from randomization to a first significant deterioration as compared to the baseline score with no further significant improvement, or death. A propensity score was estimated by comparing the characteristics of partial and complete responders. Analyses were repeated with the inverse probability weighting method using the propensity score. Multivariate Cox regression analyses were performed to identify independent factors influencing QFS. Results Ninety-eight patients were included between 2007 and 2011. After adjustment for the propensity score, patients in Arm 2 had a longer QFS for Global Health Status (hazard ratio: 0.52 [0.31-0.85]), emotional functioning (0.35 [0.21-0.59]) and pain (0.50 [0.31-0.81]) than those in Arm 1. Conclusion Patients in Arm 2 had a better HRQoL, with a longer QFS, than those in Arm 1. Moreover, the propensity score method makes it possible to take into account missing data that depend on patients' characteristics. Trial registration information EudraCT N° 2006-005703-34. (Name of the Trial: FIRGEM).



Introduction
The results of a phase II trial in previously untreated patients with metastatic pancreatic cancer (mPC) have shown that sequential treatment with FOLFIRI.3 and gemcitabine was effective and safe [1].
In first-line treatment, the FOLFIRINOX protocol and the combination of nab-paclitaxel + gemcitabine improve overall survival (OS) [2,3] and represent new therapeutic options. However, the less favorable toxicity profiles of these new strategies could limit them to younger patients with a good Performance Status (0 or 1) [4]. A sequential combination of chemotherapy protocols without cross-resistance may increase anti-tumor effects and limit toxicities, preserving patients' health-related quality of life (HRQoL).
The prognosis of patients with mPC remains extremely poor. Consequently, HRQoL is a major concern for these patients, who are often in pain and symptomatic at the time of diagnosis. Moreover, HRQoL appears to be an independent prognostic factor for OS alongside classical clinical and demographic factors [5]. In metastatic settings, the current discussion is whether to consider HRQoL as a co-primary endpoint along with a tumor parameter such as progression-free survival (PFS) [6,7].
However, HRQoL results are still rarely used to modify therapeutic strategies, owing to the complexity of longitudinal HRQoL analysis and a lack of standardization. Moreover, results should be translated into information that decision makers find understandable and compelling.
In recent years, time-to-event models such as time until definitive HRQoL score deterioration (TUDD) have been proposed for longitudinal HRQoL analysis in oncology, especially in the metastatic setting [8]. The TUDD method produces results that are clinically meaningful for clinicians, such as Kaplan-Meier survival curves and hazard ratios (HR). TUDD including death as an event was defined as "HRQoL deterioration-free survival" (QFS) [9].
Another major concern in longitudinal HRQoL studies is missing data [10], particularly in advanced cancer, where attrition is common [11]. Patients may drop out before the end of the study, generally because of deteriorating health status or death. In this case, missing data can bias the analysis and its interpretation [10,12,13,14] and should be accounted for to ensure the accuracy and robustness of the results. Several methods have been investigated to handle missing data [15,16]. The best known is the pattern-mixture model [17], but it is rarely applied because of its complexity [17,18].
It would therefore be useful to develop a method that can be used in conjunction with QFS to handle informative missing data. Methods based on the propensity score are often used in observational studies to reduce the bias arising from the absence of randomization and to allow causal inference [19]. The propensity score models the probability of receiving a treatment conditional on the variables observed before treatment. The main propensity score methods are stratification, matching and inverse probability weighting (IPW) [20]. In survival analyses, the IPW method is recommended [21]. Moreover, the IPW method based on the propensity score has already been proposed to take missing data into account [22].
The objective of this study was to compare longitudinal HRQoL between treatment arms using QFS in a metastatic setting and, secondarily, to investigate the application of the IPW method based on the propensity score in conjunction with the TUDD in order to take into account missing data that depend on patients' characteristics.

Patients and eligibility criteria
This study was a multicenter, randomized, non-comparative, open phase II trial conducted in French centers. Inclusion criteria were: histologically or cytologically proven mPC, no previous chemotherapy (adjuvant chemotherapy with gemcitabine was allowed if administered more than 12 months before inclusion) or radiotherapy (unless at least one measurable target lesion was present outside the irradiated area) and WHO performance status <2. Exclusion criteria were bile duct adenocarcinoma, ampulloma and a history of another cancer. All patients were fully informed of the study and provided signed written informed consent (see S1 Informed Consent). The protocol was approved by the ethics committees ("Comité de Protection des Personnes"). The FIRGEM study was registered with EudraCT (https://eudract.ema.europa.eu/; N° 2006-005703-34) before the start date. The design of this study has been extensively described elsewhere [1]. The protocol for this trial and the supporting CONSORT checklist are available as supporting information (see S1 and S2 Protocol; see S1 Checklist). The list of ethics committees is also available in the supporting information (see S1 Authorization).

Health-related quality of life assessment
HRQoL was evaluated using the European Organisation for Research and Treatment of Cancer (EORTC) QLQ-C30 cancer specific questionnaire [23], at inclusion and every two months until progression, limiting toxicity, patient's refusal or death. The QLQ-C30 includes 30 items and measures five functional scales (physical, role, emotional, cognitive and social functioning), global health status (GHS), financial difficulties and eight symptom scales (fatigue, nausea and vomiting, pain, dyspnea, insomnia, appetite loss, constipation, diarrhea) [23]. These scores vary from 0 (worst) to 100 (best) for the functional dimensions and GHS, and from 0 (best) to 100 (worst) for the symptom dimensions and were generated according to the EORTC Scoring Manual [24].
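As an illustration of the scoring step, the sketch below applies the linear 0-100 transformation commonly described for the QLQ-C30: a raw score is the mean of the items of a scale, functional scales are reversed so that 100 is best, and symptom scales and GHS keep the item direction. The function names, the use of `None` for an unanswered item and the half-of-items rule for missing answers are assumptions made for this example; the EORTC Scoring Manual [24] remains the reference.

```python
from statistics import mean


def raw_score(items):
    """Mean of the answered items of one scale (None marks a missing answer)."""
    answered = [x for x in items if x is not None]
    # Assumption: a scale is scored only if at least half of its items were answered.
    if len(answered) * 2 < len(items):
        return None
    return mean(answered)


def scale_score(items, kind, item_range=3):
    """Linear transformation of a raw score to the 0-100 range.

    kind: "functional", "symptom" or "global".
    item_range: maximum minus minimum response value
                (3 for items answered 1-4, 6 for the two GHS items answered 1-7).
    """
    rs = raw_score(items)
    if rs is None:
        return None
    if kind == "functional":
        # Functional scales are reversed: a raw score of 1 maps to 100 (best).
        return (1 - (rs - 1) / item_range) * 100
    # Symptom scales and GHS keep the direction of the items.
    return (rs - 1) / item_range * 100


print(scale_score([1, 2, 1, 1, 2], "functional"))
print(scale_score([6, 5], "global", item_range=6))
```

With this convention a higher functional or GHS score is better and a higher symptom score is worse, which matches the interpretation given above.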

Statistical analysis
Sample size calculation. The primary endpoint was the 6-month PFS rate. Secondary endpoints were OS, safety/tolerability, tumor response, PFS and QFS. The trial was based on a Fleming one-step design [25]. The expected 6-month PFS rate with the sequential treatment was 45%. A PFS rate of 25% was chosen as the rate at or below which the treatment would be of no interest (H0: 6-month PFS 25% = unacceptable efficacy, H1: 6-month PFS 45% = expected efficacy). With a one-sided type I error of 5% and a type II error of 10%, it was necessary to include 46 patients in each arm, rounded up to 49 to compensate for an anticipated 5% rate of loss to follow-up.
Based on the Fleming decision criteria, the experimental arm was to be considered uninteresting if 15 or fewer patients were alive and free of progression, and promising if 16 or more patients were alive and free of progression.
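The sample size above can be checked with a short calculation. The sketch below uses the usual normal-approximation formula for a single-stage design, and then evaluates the stated decision rule (16 or more progression-free patients out of 46) with exact binomial probabilities. That the authors used exactly this formula is an assumption, and the function names are illustrative; the exact error rates of a rounded integer cut-off need not coincide with the nominal values.

```python
from math import ceil, comb, sqrt
from statistics import NormalDist


def single_stage_sample_size(p0, p1, alpha, beta):
    """Normal-approximation sample size for testing H0: p = p0 against H1: p = p1."""
    z_a = NormalDist().inv_cdf(1 - alpha)  # one-sided type I error
    z_b = NormalDist().inv_cdf(1 - beta)   # type II error
    n = ((z_a * sqrt(p0 * (1 - p0)) + z_b * sqrt(p1 * (1 - p1))) / (p1 - p0)) ** 2
    return ceil(n)


def prob_at_least(k, n, p):
    """Exact binomial probability of observing k or more successes out of n."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n = single_stage_sample_size(p0=0.25, p1=0.45, alpha=0.05, beta=0.10)
exact_alpha = prob_at_least(16, 46, 0.25)  # chance of "promising" if the true rate is 25%
exact_power = prob_at_least(16, 46, 0.45)  # chance of "promising" if the true rate is 45%
print(n, round(exact_alpha, 3), round(exact_power, 3))
```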
The analysis was performed according to the intent-to-treat principle (all randomized patients irrespective of treatment received and eligibility criteria). Analyses of the primary endpoint were done on the first 46 randomized patients with available PFS data (to match the Fleming decision rules), while all other analyses were done on all randomized patients. Tumor responses were defined using RECIST (version 1.1) [26] and determined by the investigators.
Population. Randomized patients with at least one HRQoL score, irrespective of eligibility criteria, were included in the QFS analysis (modified intent-to-treat analysis). Pre-specified targeted HRQoL dimensions were GHS (mITT1), physical (mITT2) and emotional functioning (mITT3), fatigue (mITT4) and pain (mITT5).
Since this is a non-comparative randomized phase II trial and HRQoL was an exploratory secondary endpoint, no p-value was provided; effect size was presented using the hazard ratio with its 95% confidence interval (95%CI). A five-point difference in HRQoL scores was considered as the Minimal Clinically Important Difference [27].
Descriptive analysis. Baseline variables were described using means and standard deviations for continuous variables and percentages for qualitative variables. Baseline HRQoL scores were described by treatment arm. The number of HRQoL questionnaires completed at each measurement time was reported. The most common grade 3 or 4 adverse events occurring during the study, graded according to the National Cancer Institute Common Terminology Criteria for Adverse Events (version 3.0) [28], were reported by treatment arm.
Missing data analysis. Two missing data patterns were defined: patients with at least one missing HRQoL score during follow-up (partial responders) and patients with all scores available until their drop-out from the study or death (complete responders). The number and percentage of patients according to the missing data profile (partial vs. complete responders) were described at each measurement time by treatment arm. The number and percentage of complete responders, partial responders and non-responders (patients who did not complete any HRQoL questionnaire) were described by treatment arm, and the difference between the two treatment arms was compared using the Chi-square test. All baseline variables that could be associated with the missing data pattern (partial vs. complete responders) were tested in a univariate logistic regression model. Variables meeting the univariate P-value threshold of 0.20 were eligible for multivariate analysis. To prevent collinearity, when two variables were significantly correlated, one variable was retained according to its clinical relevance. The final multivariate model was chosen according to the Akaike information criterion and the area under the ROC curve, and was described with odds ratios (OR) and their 95%CI. Fitted values were then extracted from the model and constituted the propensity score [29].
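The two model selection criteria mentioned above can be computed directly from the fitted probabilities of a logistic model. The sketch below is a minimal illustration, not the authors' R code: `scores` stands for the fitted probabilities of being a partial responder, `labels` for the observed profile (1 = partial, 0 = complete responder), and `n_params` for the number of estimated coefficients. All three names and the toy values are hypothetical.

```python
from math import log


def auc(scores, labels):
    """Area under the ROC curve as the proportion of (positive, negative) pairs
    in which the positive case has the higher score; ties count for one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def aic(scores, labels, n_params):
    """Akaike information criterion from the Bernoulli log-likelihood."""
    loglik = sum(log(s) if y == 1 else log(1 - s) for s, y in zip(scores, labels))
    return 2 * n_params - 2 * loglik


# Toy fitted probabilities, for illustration only.
fitted = [0.8, 0.4, 0.6, 0.2]
observed = [1, 1, 0, 0]
print(auc(fitted, observed), aic(fitted, observed, n_params=2))
```

A lower AIC and a higher area under the curve both favor a candidate model; the fitted probabilities of the retained model then serve as the propensity score.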
Longitudinal analysis. The QFS was defined as the time from randomization to a first deterioration reaching the 5-point Minimal Clinically Important Difference as compared to the baseline score, with no further improvement of more than 5 points as compared to the baseline score, or death from any cause [8]. Patients with no baseline score were censored at baseline (Day 0). Patients with no follow-up measure were censored just after baseline (Day 1). Patients with no deterioration before their drop-out and those with a deterioration followed by a significant improvement were censored at the time of the last follow-up or the last HRQoL assessment. Each targeted dimension of the QLQ-C30 was studied.
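The event and censoring rules above can be sketched as a small function that turns one patient's scores into a (time, event) pair. This is one possible reading, not the authors' implementation: "no further improvement" is interpreted here as every later assessment also remaining at least 5 points worse than baseline, and patients without follow-up are censored at Day 1 as stated in the text. The function and argument names are hypothetical.

```python
def qfs(baseline, followups, death_day=None, last_followup_day=None,
        mcid=5, higher_is_better=True):
    """Return (time_in_days, event) for one patient and one QLQ-C30 dimension.

    followups: list of (day, score) tuples sorted by day.
    event: 1 = definitive deterioration or death, 0 = censored.
    """
    if baseline is None:
        return (0, 0)                 # no baseline score: censored at Day 0
    if not followups:
        return (1, 0)                 # no follow-up measure: censored at Day 1
    sign = 1 if higher_is_better else -1
    # An assessment counts as deteriorated when it is at least `mcid` points
    # worse than baseline.
    deteriorated = [sign * (baseline - score) >= mcid for _, score in followups]
    # Assumed reading of "definitive": start of the final uninterrupted run of
    # deteriorated assessments.
    event_day = None
    for (day, _), worse in zip(reversed(followups), reversed(deteriorated)):
        if not worse:
            break
        event_day = day
    if event_day is not None:
        return (event_day, 1)
    if death_day is not None:
        return (death_day, 1)         # death from any cause counts as an event
    last_day = followups[-1][0]
    if last_followup_day is not None:
        last_day = max(last_day, last_followup_day)
    return (last_day, 0)              # censored at last follow-up or assessment


print(qfs(70, [(60, 60), (120, 70), (180, 50)]))
```

For symptom scales such as pain or fatigue, where a higher score is worse, `higher_is_better=False` reverses the direction of the comparison.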
Based on the intention-to-treat principle and according to the worst-case scenario, a sensitivity analysis was performed that included non-responder patients and considered them as deteriorated from baseline (Day 1).
QFS curves were estimated using the Kaplan-Meier method and described using the median and its 95%CI. Univariate Cox analyses were performed as exploratory analyses to estimate the size of the treatment arm effect, with the HR and its 95%CI. Follow-up was calculated using reverse Kaplan-Meier estimation.
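For readers less familiar with these estimators, the sketch below shows a plain Kaplan-Meier estimate, its median, and the reverse Kaplan-Meier median follow-up, in which censoring is treated as the event. It is an illustrative re-implementation with hypothetical function names and toy data, not the R code used in the study; the optional `weights` argument (one per patient, 1 by default) is an addition for this example, and confidence intervals are omitted.

```python
def kaplan_meier(times, events, weights=None):
    """Return [(t, S(t))] at each distinct time with at least one event."""
    if weights is None:
        weights = [1.0] * len(times)
    data = sorted(zip(times, events, weights))
    at_risk = float(sum(weights))
    surv, curve, i = 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        same_time = [row for row in data if row[0] == t]
        d = sum(w for _, e, w in same_time if e == 1)  # weighted events at t
        if d > 0:
            surv *= 1 - d / at_risk
            curve.append((t, surv))
        # Everyone observed at t (event or censored) leaves the risk set.
        at_risk -= sum(w for _, _, w in same_time)
        i += len(same_time)
    return curve


def median_time(curve):
    """First time at which the estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached


def median_follow_up(times, events):
    """Reverse Kaplan-Meier: censoring becomes the event of interest."""
    return median_time(kaplan_meier(times, [1 - e for e in events]))


times = [2, 4, 4, 6, 8]    # toy data, in months
events = [1, 1, 0, 1, 0]   # 1 = deterioration or death, 0 = censored
print(median_time(kaplan_meier(times, events)), median_follow_up(times, events))
```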
To take missing data into account, analyses were repeated by assigning a weight to each patient according to the IPW method based on the propensity score [21]. The weight equals the inverse of the propensity score for partial responders and the inverse of the complement of the propensity score (one minus its value) for complete responders [21]. A multivariate Cox regression model was also fitted as an exploratory analysis in order to investigate parameters that seemed to be associated with QFS. All variables collected at baseline were tested in univariate analysis. Some interaction effects between treatment arm and clinical variables were investigated. Variables whose HR had a 95%CI excluding 1 were eligible for multivariate analysis. The same variables were kept in multivariate analysis for the unweighted and weighted QFS analyses. The treatment arm variable was forced into the multivariate analysis.
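The weighting rule can be written in a few lines. In the sketch below, `propensity` is the fitted probability of being a partial responder. The weight for complete responders is taken as 1 / (1 - propensity), which is an assumed reading of the wording above, in line with the usual inverse probability weighting formula. The function name and the toy values are hypothetical.

```python
def ipw_weight(propensity, partial_responder):
    """Inverse probability weight for one patient.

    propensity: fitted probability of being a partial responder (strictly between 0 and 1).
    partial_responder: True for partial responders, False for complete responders.
    """
    if not 0 < propensity < 1:
        raise ValueError("propensity score must lie strictly between 0 and 1")
    # Assumption: complete responders are weighted by 1 / (1 - propensity).
    return 1 / propensity if partial_responder else 1 / (1 - propensity)


# Toy example: (propensity score, partial responder?) for three patients.
patients = [(0.25, True), (0.75, True), (0.75, False)]
weights = [ipw_weight(ps, partial) for ps, partial in patients]
print(weights)
```

Patients whose observed responder profile was unlikely given their baseline characteristics receive the largest weights, so the weighted sample is balanced on the variables included in the propensity score.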
All analyses were performed with R software [30].

Study population
Between October 2007 and May 2011, 98 patients (49 in each treatment arm) were enrolled in 10 French centers (Fig 1). Baseline characteristics of patients are summarized in Table 1.

Missing data analysis
Table 2 gives the number and percentage of complete, partial and non-responders in each treatment arm.
Among the 66 patients (67.3%) who had completed at least one HRQoL questionnaire during the study, 40 (60.6%) were partial responders (15 in Arm 1 (37.5%), 25 in Arm 2 (62.5%)) and 26 (39.4%) were complete responders (15 in Arm 1 (57.7%), 11 in Arm 2 (42.3%)) during the follow-up. The details of the HRQoL questionnaire completed at each follow-up measurement time according to treatment arm and missing data profile are given in Table 3.
Based on the univariate analyses, variables associated with responder profiles and retained to build the propensity score were a primary tumor location at the pancreatic head (yes vs. no), presence of metastatic lymph node (yes vs. no), neutrophils, hemoglobin and platelet rates (dichotomized according to the median value). In multivariate analysis, a primary tumor location at the pancreatic head (OR = 2.72 [95%CI 0.86-9.16]), the presence of lymph node metastases (7. .54-6.10]) were independently associated with partial responder profile but not statistically significant. The area under the ROC curve was equal to 0.76.

Longitudinal analysis
In Arm 1 (gemcitabine alone) and Arm 2 (gemcitabine + FOLFIRI. Regarding the unweighted analysis ( Variables retained for the Cox multivariate analysis were treatment arm (Arm 2 vs. Arm 1), number of metastatic sites (2 or more vs. 1) and an interaction effect between treatment arm and the number of metastatic sites, according to the univariate Cox regression analysis (data not shown). In the unweighted analysis, all the 95%CI contained the value of 1. Regarding the weighted analyses, the treatment arm (gemcitabine + FOLFIRI.3) and the number of metastatic sites (one site) seemed to be independently associated with longer QFS of physical functioning (Table 5). The number of metastatic sites (more than one vs. one) seemed to be associated with a shorter QFS of GHS, fatigue and pain.
As in the unweighted analysis, the same trends were observed in the unweighted sensitivity analysis including non-responder patients (see S1 Fig, S3 and S4 Tables).

Discussion
Alongside recent progress in improving OS, preserving HRQoL is of paramount importance considering the symptom burden and the poor prognosis of mPC. Although several phase III trials have attempted to show a clinical benefit or an improvement in HRQoL, few have achieved their goals [31,32]. Recently, the clinical trial comparing FOLFIRINOX to gemcitabine showed an improvement in HRQoL in the FOLFIRINOX arm [5].
In our trial, HRQoL results support the efficacy profile of the FOLFIRI.3 + gemcitabine regimen. Patients in the FOLFIRI.3 + gemcitabine arm had a longer QFS than those in the gemcitabine-alone arm, whatever the HRQoL score considered, in both QFS analyses, even though grade 3 or 4 neutropenia occurred twice as often in the FOLFIRI.3 + gemcitabine arm as in the gemcitabine-alone arm. In multivariate weighted analysis, treatment with sequential FOLFIRI.3 + gemcitabine seemed to be associated with longer QFS for each HRQoL score considered, including the pain and fatigue scores, two symptoms commonly present at the time of diagnosis. It would be interesting to study the impact of the occurrence of at least one grade 3-4 toxicity on QFS.
Median QFS for each domain was shorter than median PFS irrespective of the use of the IPW method. It is noteworthy that survival estimates depend on the QFS definition. Unlike in our definition, all-cause death was not included as an event in the definition of TUDD chosen by Gourgou-Bourgade et al. [5]. Consequently, median TUDD was not reached after 26.6 months of follow-up while the median PFS was 6.4 months in the FOLFIRINOX arm [2,5], which was not consistent with the clinical profiles of these patients. Moreover, comparison across trials is not possible, which stresses the need to adopt a common definition of TUDD or QFS [9]. Although QFS is increasingly used in clinical trials, consensus methods to optimize the management of missing data are still lacking [5,33,34,35]. In the FOLFIRINOX trial, little information was provided on the method used to deal with missing data, except that the authors stated that the two groups did not differ in terms of the rate of missing data [5].
In our study, in both unweighted and weighted analyses, patients in Arm 2 had a longer QFS than patients in Arm 1. In multivariate analyses, treatment arm (gemcitabine + FOLFIRI.3) and number of metastatic sites (one site) tended to be associated with longer QFS of physical functioning in the weighted analysis. The same trends were observed in the unweighted analysis.
Thus, using the IPW method based on the propensity score influences the results of the multivariate analysis by bringing out more significant associations. A high weight is assigned to patients with no missing data (mainly patients in Arm 1) and a low weight to partial responders (mainly patients in Arm 2). Although a longer QFS was already observed in the unweighted analysis for patients in Arm 2 compared with those in Arm 1 for most HRQoL dimensions, the HR increased with the use of the IPW method. The use of the propensity score in conjunction with the TUDD method reduced the bias due to missing data that depend on patients' characteristics during follow-up. This bias cannot be totally eliminated because missing data can also depend on unobserved data. However, logistical problems could explain some of the partial and non-response, because these patients were still followed in the study for other endpoints. Another statistical approach to be used in conjunction with the QFS method should be proposed to adequately take into account data that are missing not at random. Besides primary prevention procedures to limit the rate of missing data, additional work on statistical methods to handle missing data is still needed. Multiple imputation of the HRQoL scores could also be performed, but this method requires a larger sample and can only retain one or two factors associated with missing data [36], whereas more variables can be retained in the propensity score. The propensity score approach could therefore be suggested for trials with a limited sample size. Unlike pattern-mixture models, the IPW method in conjunction with the TUDD approach is well suited to the design of oncology clinical trials, in which many HRQoL measures are collected: the number of possible patterns increases with the number of HRQoL measures. Austin et al. recommend using IPW for time-to-event data [21].
Propensity score matching could also be performed for survival analysis, but a larger sample size is needed. Finally, the IPW method is easy to understand (observations are weighted according to the presence or absence of missing data) [21].
In conclusion, analyses of QFS support that the sequential strategy with FOLFIRI.3 followed by gemcitabine in patients with untreated mPC is feasible and, despite more toxicities, delays HRQoL deterioration. Moreover, using the propensity score makes it possible to control for the imbalance in informative missing data between the two arms and provides a more precise estimate of the treatment effect. This sequential treatment strategy will now be compared with FOLFIRINOX in a phase III trial (French study). This phase III clinical trial will make it possible to confirm or refute these results, which arose from an exploratory analysis.

Table. Results of the Kaplan-Meier estimation of the health-related quality of life deterioration-free survival for a QLQ-C30 score considering non-responder patients in deterioration since baseline and comparison between treatment arms. (DOC)
S4 Table. Results of the multivariate Cox regression analysis for the QFS analysis of each targeted score of the QLQ-C30 considering non-responder patients in deterioration since baseline.