Comparative Effectiveness of Second-Line Targeted Therapies for Metastatic Renal Cell Carcinoma: A Systematic Review and Meta-Analysis of Real-World Observational Studies

Objective
The optimal sequencing of targeted therapies for metastatic renal cell carcinoma (mRCC) is unknown. Observational studies with a variety of designs have reported differing results. The objective of this study is to systematically summarize and interpret the published real-world evidence comparing sequential treatment for mRCC.
Methods
A search was conducted in Medline and Embase (2009–2013), and conference proceedings from American Society of Clinical Oncology (ASCO), ASCO Genitourinary Cancers Symposium (ASCO-GU), and European Society for Medical Oncology (ESMO) (2011–2013). We systematically reviewed observational studies comparing second-line mRCC treatment with mammalian target of rapamycin inhibitors (mTORi) versus vascular endothelial growth factor (VEGF) tyrosine kinase inhibitors (TKI). Studies were evaluated for 1) use of a retrospective cohort design after initiation of second-line therapy, 2) adjustment for patient characteristics, and 3) use of data from multiple centers. Meta-analyses were conducted for comparisons of overall survival (OS) and progression-free survival (PFS).
Results
Ten studies reported OS and exhibited significant heterogeneity in estimated second-line treatment effects (I² = 68%; P = 0.001). Four of these were adjusted, multicenter, retrospective cohort studies, and these showed no evidence of heterogeneity (I² = 0%; P = 0.61) and a significant association between second-line mTORi (>75% everolimus) and longer OS compared to VEGF TKI (>60% sorafenib) (HR = 0.82, 95% CI: 0.68 to 0.98) in a meta-analysis. Seven studies comparing PFS showed significant heterogeneity overall and among the adjusted, multicenter, retrospective cohort studies. Real-world observational data for axitinib outcomes was limited at the time of this study.
Conclusions
Real-world studies employed different designs and reported heterogeneous results comparing the effectiveness of second-line mTORi and VEGF TKI in the treatment of mRCC. Within the subset of adjusted, multicenter observational studies, second-line use of mTORi was associated with significantly prolonged survival compared with second-line use of VEGF TKI.


Introduction
Renal cell carcinoma (RCC) has a lifetime risk of approximately 1-2%, with one third to one half of cases presenting with or progressing to metastatic disease (mRCC) [1,2]. The prognosis for mRCC is poor, with a historical 5-year survival rate of approximately 10% [3]. During the past decade, the advent of targeted therapies has significantly improved patient outcomes in mRCC. Seven targeted therapies are currently in use: the vascular endothelial growth factor (VEGF) tyrosine kinase inhibitors (TKIs) sorafenib, sunitinib, pazopanib, and axitinib, the VEGF-directed monoclonal antibody bevacizumab, and the mammalian target of rapamycin inhibitors (mTORis) everolimus and temsirolimus. Guidelines recommend treatment initiation with a VEGF TKI for most patients. However, the majority will eventually fail their first-line treatment due to disease progression or intolerance. Sequential treatment with subsequent lines of VEGF TKI or mTORi is the current standard of care for mRCC [4]. However, there is no consensus on the optimal sequencing of targeted therapies after the failure of first-line VEGF TKI.
Evidence from available randomized clinical trials does not fully inform later-line treatment choices. The mTORi everolimus has shown superior progression-free survival (PFS) compared to placebo in the second-line setting, but has not been compared to other second-line targeted therapies in a completed randomized trial [5]. Sorafenib demonstrated comparable PFS and superior overall survival (OS) to temsirolimus [6] and inferior PFS compared with axitinib in the second-line setting [7]. However, no other randomized comparisons of targeted therapies are available in the second-line setting. In addition, randomized trials in mRCC have not directly demonstrated impacts on OS, due to crossovers between treatment arms following disease progression. Given the large number of treatment options for mRCC following the failure of a first targeted therapy, the comparative effectiveness of different sequential treatment strategies for mRCC, especially in terms of OS, is of high interest to physicians and patients.
To address this need for comparative evidence, a number of observational studies have been conducted to compare outcomes among different mRCC treatment sequences. The results of these studies have been mixed. Some have associated prolonged PFS or OS with second-line mTORi versus VEGF TKI [8], others with VEGF TKI versus mTORi [9]; others have found no significant differences among second-line treatments [10]. It is possible that differences across these studies could be due to heterogeneity in data sources, study designs and analytical methods. In addition, observational studies may be subject to varying levels of confounding and selection bias due to the lack of randomization [11].
When properly conducted and reported, observational studies can provide a valuable complement to clinical trial evidence in comparative effectiveness research by providing results applicable to broader, more inclusive populations that reflect real-world practice, and by comparing longer-term clinical outcomes such as OS. The differing results among currently available observational studies in mRCC present a challenge to decision makers who are interested in considering real-world evidence.
The present study systematically summarizes and interprets the published real-world evidence comparing OS and PFS for sequential treatment with VEGF TKI-mTORi versus VEGF TKI-VEGF TKI in mRCC patients. Since most patients receive a VEGF TKI in the first-line setting, and many studies do not adequately represent third-line treatment, we focused on comparisons of second-line treatment outcomes as a practical and meaningful first step in understanding the comparative effectiveness of treatment sequences. In addition, since most studies report only class-level treatment groups, we further focused on second-line mTORi versus second-line VEGF TKI at the class level. The objectives of this study are to assess whether the comparative evidence demonstrates significant heterogeneity across studies and to obtain consensus estimates of comparative effectiveness using meta-analysis when studies are suitably similar.

Systematic Literature Review
A systematic literature review was conducted using Medline and Embase (2009-2013), and conference proceedings from American Society of Clinical Oncology (ASCO), ASCO Genitourinary Cancers Symposium (ASCO-GU), and European Society for Medical Oncology (ESMO) (2011-2013). These date ranges are intended to capture publications of real-world data following the approval of mTORi in 2009, and to capture recent real-world data presented at conferences but not yet published in manuscript form. Search queries are included in the S1 Appendix in S1 File. We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) guidelines in designing, performing, and reporting the systematic review (S1 Checklist in S1 File) [12]. Included studies were required to: 1) be observational (i.e., non-randomized), 2) compare mTORi versus VEGF TKI as second-line treatments for mRCC, 3) report PFS or OS outcomes, and 4) be published in English. Reviews, case reports, economic models, analyses of randomized trials and other studies not reporting analyses of real-world data were excluded. When multiple analyses were identified using the same data source, only the analysis based on the most recent data was included. When peer-reviewed publications and conference presentations were identified for the same analysis, the conference presentation was excluded. The systematic literature review was conducted on September 3rd, 2013. Two researchers (YZ and PQ) independently applied the selection criteria, extracted the relevant data into a data collection spreadsheet with prepared fields, and assessed the quality of each included study; a third party (NL) was consulted to arbitrate disagreement.

Assessment of Study Designs
In order to evaluate the reliability of comparative evidence, a pre-planned assessment of study designs was conducted. Included studies were classified according to criteria derived from the Newcastle-Ottawa Quality Assessment Scale for Cohort Studies [13].
1) Use of a retrospective cohort design after second-line treatment initiation. In a retrospective cohort design, inclusion criteria are applied only to patient history up to and including the exposure of interest, in this case the initiation of a second-line targeted therapy. All patients meeting the inclusion criteria are then followed, retrospectively, as long as possible for outcome events (progression or death) or censoring due to the end of follow-up. A common departure from retrospective cohort designs occurs when a patient's inclusion in the study depends on events occurring after the exposure of interest, such as initiating a later-line treatment. This results in immortal time bias, which will bias comparative treatment effects to an unknown degree and direction [14]. The Newcastle-Ottawa scale does not explicitly include this criterion; however, it is implicit in the classification of a study as a cohort study. In addition, study designs that are not valid retrospective cohorts would fail to satisfy the "representativeness of the exposed cohort" and the "adequacy of follow-up" items in the Newcastle-Ottawa scale. Therefore, the present review considered studies using a retrospective cohort design more reliable than those that did not.
2) Adjustment for patient characteristics. Non-randomized treatment groups, as are found in observational studies, may have different patient characteristics prior to starting treatment. Such differences can result in confounding bias, i.e., differences in outcomes between treatment groups that are due to differences in patient characteristics, such as demographics, severity, and prognostic factors, rather than to treatment effects. The risk of confounding bias may be reduced by adjustment for pre-treatment characteristics, such as in a multivariable regression model [15,16]. In the present review, we assessed whether or not each study reported adjusted comparative analyses, and summarized the patient characteristics included in the adjustment. Comparative analyses that do not adjust for baseline differences would fail to satisfy the comparability of cohorts criterion in the Newcastle-Ottawa scale. Adjusted results were considered more reliable than unadjusted results in the present review.
3) Inclusion of data from multiple study centers. Multicenter studies are more likely to be representative and generalizable to broader populations, and are therefore considered more reliable than single-center studies [16,17]. In addition, treatment patterns at a single center may consistently channel particular patient profiles to particular treatments, which can result in confounding biases that are difficult to address via adjustment for patient characteristics. The Newcastle-Ottawa scale includes an assessment of the representativeness of the cohorts. In the present review, multicenter studies were considered more reliable than single-center studies.
Additional items from the Newcastle-Ottawa scale, including ascertainment of exposure, methods of outcome assessments and reporting of follow-up, were also evaluated.
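The immortal time bias described under criterion 1 can be illustrated with a small simulation. The sketch below is not drawn from any of the reviewed studies: the exponential survival times, the 20-month mean OS, and the mean times to third-line initiation (4 and 12 months) are all hypothetical values chosen for illustration, and the function name simulate_arm is ours. Both arms are given identical true survival, so any difference seen among third-line starters arises purely from the inclusion rule.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def simulate_arm(n, mean_os, mean_time_to_third_line, rng):
    """Simulate one second-line arm; times are months from second-line start.

    Returns (os_all, os_third_line_only): OS for every patient (a valid
    retrospective cohort) and OS for only those patients who started a
    third-line therapy before death (the biased design).
    All distributions here are hypothetical assumptions.
    """
    os_all, os_third_line_only = [], []
    for _ in range(n):
        os_time = rng.expovariate(1.0 / mean_os)
        third_line_start = rng.expovariate(1.0 / mean_time_to_third_line)
        os_all.append(os_time)
        if third_line_start < os_time:  # patient lived long enough to switch
            os_third_line_only.append(os_time)
    return os_all, os_third_line_only

rng = random.Random(2013)
# Both arms share the same true mean OS (20 months): no real treatment effect.
a_all, a_restricted = simulate_arm(20000, 20.0, 4.0, rng)
b_all, b_restricted = simulate_arm(20000, 20.0, 12.0, rng)

print("All second-line patients:", round(mean(a_all), 1), round(mean(b_all), 1))
print("Third-line starters only:", round(mean(a_restricted), 1), round(mean(b_restricted), 1))
```

Under these assumptions the full cohorts have essentially equal mean OS, while the arm whose patients tend to switch later appears to live longer once inclusion requires a third-line therapy, because survival up to the switch is guaranteed by design. In real data the size and direction of this distortion depend on switching patterns that are usually unknown.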

Meta-Analyses
Estimated treatment effects of second-line mTORi versus VEGF TKI were synthesized for OS and PFS across all identified studies using meta-analysis. Treatment effects were measured as hazard ratios (HRs). Pooled HRs and associated 95% confidence intervals (CIs) and P values were estimated under a random effects model. Separate meta-analyses were then applied to the subgroup of adjusted, multicenter, retrospective cohort studies (i.e., studies meeting all three criteria described above). When studies did not report HRs, they were imputed based on reported medians and associated 95% CIs for time to event and a constant hazard assumption. In each meta-analysis, heterogeneity was assessed using I² and tested with Cochran's Q statistic and its associated P value. Small study bias was also assessed using funnel plots and Egger's tests. Meta-analyses were conducted using the R software [18].
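The pooling steps described above can be sketched as follows. This is a minimal illustration rather than the analysis code used in this study, which was run in R. It assumes the DerSimonian-Laird estimator for the random effects model (the text does not name the estimator), recovers standard errors from 95% CIs on the log scale, and imputes a missing HR as the ratio of median times, which follows from the constant hazard assumption (hazard = ln 2 / median). All function names are hypothetical.

```python
import math

Z95 = 1.959964  # two-sided 95% normal quantile

def log_hr_from_ci(hr, ci_low, ci_high):
    """Return (log HR, SE) from a reported HR and its 95% CI."""
    return math.log(hr), (math.log(ci_high) - math.log(ci_low)) / (2 * Z95)

def log_hr_from_medians(median_a, ci_a, median_b, ci_b):
    """Impute (log HR, SE) for arm A versus arm B from median times to event.

    Under a constant hazard, hazard = ln(2) / median, so HR = median_b / median_a.
    The SE of each log median is approximated from its 95% CI (an assumption).
    """
    log_hr = math.log(median_b) - math.log(median_a)
    se_a = (math.log(ci_a[1]) - math.log(ci_a[0])) / (2 * Z95)
    se_b = (math.log(ci_b[1]) - math.log(ci_b[0])) / (2 * Z95)
    return log_hr, math.sqrt(se_a ** 2 + se_b ** 2)

def random_effects(estimates):
    """Pool (log HR, SE) pairs with the DerSimonian-Laird estimator."""
    y = [e[0] for e in estimates]
    v = [e[1] ** 2 for e in estimates]
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(y) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0    # proportion, 0 to 1
    w_star = [1.0 / (vi + tau2) for vi in v]
    pooled = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return {
        "hr": math.exp(pooled),
        "ci_low": math.exp(pooled - Z95 * se),
        "ci_high": math.exp(pooled + Z95 * se),
        "q": q, "df": df, "i2": i2, "tau2": tau2,
    }

# Hypothetical inputs for illustration only; these are not the reviewed studies.
studies = [
    log_hr_from_ci(0.75, 0.55, 1.02),
    log_hr_from_ci(0.90, 0.70, 1.16),
    log_hr_from_medians(16.0, (12.0, 21.0), 13.0, (10.0, 17.0)),
]
print(random_effects(studies))
```

The P value for Cochran's Q would come from a chi-square distribution with df degrees of freedom; that step is omitted here to keep the sketch free of external dependencies.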

Results
The systematic literature review identified 12 studies meeting all inclusion criteria: 6 peer-reviewed journal publications [8-10,19-21] and 6 conference abstracts/posters [22-27] (Fig. 1). Among these studies, 10 reported treatment effects on OS and 7 reported effects on PFS and were subsequently included in further analyses for OS and PFS, respectively. Studies reporting OS included a pooled total of 2,228 patients: 961 patients who received second-line mTORi and 1,267 patients who received second-line VEGF TKI. Studies reporting PFS included a pooled total of 1,926 patients: 916 patients who received second-line mTORi and 1,010 patients who received second-line VEGF TKI.

Studies reporting OS
Study designs differed substantially among the 10 studies reporting OS (Table 1). Seven employed a retrospective cohort design [8,10,20-24]. The 3 studies that departed from a retrospective cohort design did so by requiring patients to receive a third-line therapy after the initiation of second-line treatment [9,19,27], resulting in immortal time bias for the effects of second-line treatment. Seven out of the 10 studies reported adjusted treatment effects [8-10,19-22]. Patient characteristics used for multivariable adjustment are listed in Table 2. With the exception of one claims-based study [10], the studies adjusted for similar mRCC prognostic factors, including the Memorial Sloan-Kettering Cancer Center (MSKCC) score [28], the Heng et al. criteria [29] or their components (MSKCC score components: Karnofsky performance status (KPS), time from diagnosis to therapy, serum lactate dehydrogenase level, hemoglobin level, and corrected serum calcium; Heng et al. criteria components: KPS, time from diagnosis to therapy, hemoglobin level, corrected serum calcium, neutrophil level, and platelet level). Eight out of the 10 studies were conducted in multiple centers in North America and Europe [8-10, 19, 20, 22, 23, 27]; the 2 identified single-center studies were conducted in South Korea [21] and Spain [24]. Four studies met all 3 criteria (i.e., were multicenter, adjusted, retrospective cohort studies) and were considered for separate meta-analyses [8,10,20,22]. The 10 studies differed in the allowed reasons for discontinuing first-line therapy, ranging from requiring progression on first-line [20] to broader definitions of first-line treatment failure including progression, non-response and lack of tolerability [8,19,21] (Table 1). Additional criteria included in the Newcastle-Ottawa scale either did not differentiate among studies or were not relevant for this review of OS and PFS. 
In particular, in all studies patients were necessarily free of the outcomes (observed progression or death) at the start of second-line therapy. None of the studies included outcome assessments that were blinded to treatment group. No studies provided a detailed accounting of all subjects lost to follow-up; however, all studies used statistical methods appropriate for random censoring. In all studies, ascertainment of exposure was based on secure records (medical records or claims).
Hazard ratios for death comparing second-line mTORi versus VEGF TKI ranged from 0.65 to 3.13 across the 10 identified studies. A meta-analysis pooling all of these HRs exhibited significant heterogeneity, with over twice as much variability arising from between-study differences as from within studies (I² = 68%; P = 0.001; Fig. 2). No evidence of small study reporting bias was detected by the funnel plot (Fig. 3) or Egger's test (P = 0.146). No significant difference in OS was identified between treatment sequences in this overall meta-analysis (HR = 1.11, 95% CI 0.84-1.45, P = 0.491), and, more importantly, the pooled effect estimate is difficult to interpret due to the significant heterogeneity.
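The reading of an I² of 68% as "over twice as much" between-study as within-study variability can be checked with a short worked calculation. It assumes the standard definition of I² as the share of total variation attributable to heterogeneity, with df equal to the number of studies minus one (here 10 - 1 = 9); the back-calculated Q is approximate because the reported I² is rounded.

```latex
I^2 = \frac{Q - \mathrm{df}}{Q}
\quad\Longrightarrow\quad
\frac{\text{between-study variability}}{\text{within-study variability}}
  \approx \frac{I^2}{1 - I^2} = \frac{0.68}{0.32} \approx 2.1,
\qquad
Q \approx \frac{\mathrm{df}}{1 - I^2} = \frac{9}{0.32} \approx 28.1
```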
A meta-analysis including only the 4 adjusted, multicenter, retrospective cohort studies [8,10,20,22] was also performed (Fig. 4). These 4 studies included a total of 1,464 patients, constituting over half of the total number of patients in all 10 studies. Of these patients, 689 received mTORi (>75% everolimus) and 775 received VEGF TKI therapy (>60% sorafenib, no axitinib) in the second line. There was no evidence of heterogeneity in the comparative effect estimates among these 4 studies (I² = 0%; P = 0.608). The funnel plot was symmetrical, indicating no evidence of publication bias (Fig. 5; Egger's test was not performed due to the small number of studies). In a meta-analysis of these four studies meeting reliability criteria, second-line mTORi was associated with significantly prolonged OS compared with VEGF TKI, corresponding to an 18% reduction in the hazard of death (HR = 0.82, 95% CI 0.68 to 0.98, P = 0.028).
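The step from a pooled HR of 0.82 to an 18% reduction in the hazard of death is simple arithmetic, sketched below together with the same conversion for the confidence limits. The helper name summarize_hr is hypothetical, and the log-scale standard error is an approximation recovered from the rounded 95% CI rather than a value reported in this study.

```python
import math

Z95 = 1.959964  # two-sided 95% normal quantile

def summarize_hr(hr, ci_low, ci_high):
    """Convert an HR and its 95% CI into percent hazard reductions.

    Returns ((point, smallest, largest) reduction in percent, SE of log HR).
    The smallest reduction comes from the upper CI limit and the largest
    from the lower CI limit.
    """
    reduction = tuple(round((1.0 - x) * 100.0, 1) for x in (hr, ci_high, ci_low))
    se_log_hr = (math.log(ci_high) - math.log(ci_low)) / (2.0 * Z95)
    return reduction, se_log_hr

reduction, se = summarize_hr(0.82, 0.68, 0.98)
print(reduction)  # (18.0, 2.0, 32.0)
```

Read this way, the reported interval is compatible with reductions in the hazard of death ranging from about 2% to about 32%.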
As a sensitivity analysis, we further investigated the impact of one additional study, Park et al. [21], which used an adjusted retrospective cohort design but was conducted at a single center in South Korea (N = 42 patients with mTORi and N = 41 patients with VEGF TKI as second-line treatment). This study reported numerically shorter OS for second-line mTORi compared to VEGF TKI (adjusted HR = 1.71, 95% CI 0.86 to 3.40, P = 0.125), which, despite the wide confidence interval and small sample size, was significantly different from the pooled HR among the 4 adjusted, multicenter, retrospective cohort studies (P = 0.004). When Park et al. was pooled with these 4 studies, the resulting HR for mTORi versus

Discussion
This study systematically reviewed and synthesized real-world comparative evidence for second-line mTORi versus VEGF TKI in the treatment of mRCC. Study designs and patient populations varied across studies, and this variation was reflected in significant heterogeneity in the estimated comparative effects on OS and PFS. Pooling all of the studies together, there was no evidence of a difference between mTORi and VEGF TKI in the second-line setting. However, more importantly, the high level of heterogeneity indicated that it was not possible to draw a single comparative conclusion from the diverse collection of all identified studies.
When synthesizing comparative evidence, it is important to consider the reliability of the included studies in addition to their heterogeneity. To this end, we applied three criteria, adapted from the Newcastle-Ottawa scale, to identify studies with the most reliable comparative study designs [13]. First, we required studies to follow a retrospective cohort design that imposed inclusion criteria only up to the initiation of second-line therapy, and then followed all included patients as long as possible for outcomes. This is similar to requiring an "intent-to-treat" approach in clinical trials. The three studies that did not meet this criterion required patients to initiate third-line therapy [9,19,27], and therefore excluded large proportions of second-line patients who did not reach third line during the study period due to loss to follow-up, continuation of second-line treatment at the time of chart review, death during second-line therapy, or other reasons. These patients contain valuable information about second-line treatment outcomes. While designs that exclude these patients, by requiring three lines of treatment, provide valuable retrospective descriptions of patient treatment sequences, they are not valid designs for comparing the effectiveness of second-line treatment choices [14]. Indeed, such study populations cannot be identified in clinical practice at the time second-line treatment decisions are made because future use of third-line treatment is unknown at that point. As a second criterion, we required studies to report comparative outcomes that were adjusted for patients' characteristics prior to the initiation of second-line treatment. Comparative studies that do not adjust for baseline characteristics could be biased by avoidable baseline differences [15,16]. Finally, we required studies to draw data from multiple treatment centers, as such studies are considered more representative and generalizable than single-center studies [16,17]. 
Studies not meeting these three criteria may provide valuable descriptive evidence, and could meet general quality criteria for reporting of observational studies, but do not provide the same level of comparative evidence as studies that do meet the criteria. On a per-patient basis, the majority of the evidence identified in our systematic review met all three of these reliability criteria. Additional items from the Newcastle-Ottawa scale were also evaluated, but did not differentiate among studies.
It is notable that after focusing the meta-analysis on adjusted, multicenter, retrospective cohort studies, there was no evidence of heterogeneity in estimated second-line treatment effects on OS. This suggests that these four studies, although based on diverse data sources including a prospective multi-national registry, medical records from Germany, a retrospective chart review in the US and US claims data, are estimating the same underlying association between second-line treatment and OS. The pooled estimate from these studies showed a significant association between use of mTORi and prolonged OS compared with VEGF TKI in the second-line setting. The magnitude of the difference was clinically significant, representing an 18% decrease in the hazard of death associated with second-line mTORi.
One additional study that employed an adjusted, retrospective cohort design, but was conducted at a single center in South Korea, was considered in a sensitivity analysis. Despite including fewer than 100 patients, this study showed a significantly different and opposite association between second-line treatment and OS than the pooled analysis of the four studies meeting all three criteria. It was not possible to assess whether this difference was due to factors affecting the single center in South Korea, or other potential differences. Nevertheless, inclusion of this study in the meta-analysis, along with the adjusted, multicenter, retrospective cohort studies, did not significantly change the hazard ratio for second-line mTORi versus VEGF TKI.
As observed for the comparative studies of OS, the full group of studies comparing PFS showed significant heterogeneity and no significant differences between second-line mTORi and VEGF TKI. However, even after focusing the meta-analysis of PFS on adjusted, multicenter, retrospective cohort studies, significant heterogeneity remained among the PFS comparisons. Potential reasons for greater heterogeneity in PFS were not clear. Results were consistent between two separate US-based chart reviews, which suggested longer PFS with second-line mTORi versus VEGF TKI [8,26]. However, a multinational European study reported the opposite association [25]. It was not possible to reach a consensus conclusion about comparative effects on PFS by pooling these studies.
This review and meta-analysis of observational studies carries important limitations. The foremost limitation is that the meta-analyses are based on non-randomized treatment comparisons. The comparisons between drug classes may be confounded by differences in the types of patients treated with each class. Potential confounding factors may include, for example, differences in age, metastatic burden, RCC histology, performance status, response to first VEGF TKI, lab values (e.g., neutrophil count, platelet count, corrected calcium level) or composite risk scores (e.g., MSKCC or Heng et al. criteria). Study design features that depart from a retrospective cohort design, such as requiring the initiation of a third-line treatment, could also introduce bias. Since the present study relied on published data, it was not possible to adjust for pre-specified characteristics at the patient level. We aimed to limit the potential for confounding in our meta-analyses by conducting sub-analyses of published studies that included more reliable comparative designs. However, even the included studies with the more reliable comparative designs and adjustment for important prognostic factors may be confounded by unobserved differences in patient populations.
Only a well-conducted randomized trial can avoid the potential for confounding. However, little evidence from randomized controlled trials comparing mTORi to VEGF TKI in the second-line setting is currently available. A recent randomized controlled trial reported comparable PFS but significantly better OS for second-line use of sorafenib, a VEGF TKI, versus temsirolimus, an mTORi [6]. However, this study did not report subsequent treatments that were off-protocol, which might have influenced the results. Additionally, this study did not include everolimus, the mTORi used by the majority of the patients in the present study, or other VEGF TKIs (e.g., sunitinib); therefore, a comparative conclusion at the class level cannot be made. There are also potential limitations due to missing or inaccurate data obtained from real-world practice. In particular, assessments of progression may vary across practices and patients depending on visit schedules and the use and interpretation of imaging. The present study also pooled treatments at the class level, comparing mTORi vs. VEGF TKI, since most underlying studies did not report drug-specific results. However, there is evidence that individual drug effects can vary within these classes [7,8,10]. Future real-world research, with adequate sample sizes, will be valuable for understanding drug-specific effects. In the present study, the majority of second-line mTORi use was everolimus and the majority of second-line VEGF TKI use was sorafenib.
This study also carries limitations inherent in meta-analysis. Though we conducted a systematic review of both peer-reviewed publications and conference proceedings, there is a possibility of publication bias, such as the selective reporting of significant findings. However, there is reason to believe that publication bias was absent or negligible in the present review. Following the advent of new targeted therapies for mRCC, there has been high interest in any real-world data on treatment outcomes with sequential targeted therapy. Indeed, most identified studies did not individually show statistically significant treatment differences. It should also be noted that this review included conference proceedings in addition to peer-reviewed publications. Conference proceedings may be subject to revision during peer review. On the other hand, conference proceedings often report more recent real-world data, which is important when studying the outcomes that reflect recently approved treatments, including everolimus in the second-line setting and pazopanib in the first-line setting. At the time of this study, adequate real-world non-clinical trial evidence was not available for study of axitinib outcomes in the second-line setting and hence no axitinib-specific data were included. Whether axitinib would change the reported results remains to be seen in future studies. Finally, the present study did not compare outcomes among different third-line treatment choices. An appropriate retrospective cohort design for comparing third-line treatment outcomes would follow patients after initiation of third-line treatment, and would adjust for patient characteristics available at the time of third-line treatment initiation, including treatments received in the first and second line. Future real-world studies of third-line treatment outcomes will be valuable.

Conclusions
In this systematic review, real-world studies employed different designs and reported heterogeneous results comparing the effectiveness of second-line mTORi and VEGF TKI in the treatment of mRCC. Due to the high heterogeneity, it was not possible to draw a comparative conclusion from the full set of identified studies. In a sub-analysis of studies with more reliable designs for comparative analysis (i.e., adjusted, multicenter, retrospective cohort studies), second-line use of mTORi was associated with significantly prolonged OS compared with second-line use of VEGF TKI in the treatment of mRCC. Real-world outcomes for axitinib were not available at the time of this analysis, and should be considered in future studies. The present review demonstrates that study design should be considered when interpreting observational studies comparing treatment sequences in mRCC.