Prognostic Value of 18F-FDG PET/CT in Surgical Non-Small Cell Lung Cancer: A Meta-Analysis

Background
The identification of surgical non-small cell lung cancer (NSCLC) patients with a poor prognosis is a priority in clinical oncology because of their high 5-year mortality. This meta-analysis explored the prognostic value of maximum standardized uptake value (SUVmax), metabolic tumor volume (MTV) and total lesion glycolysis (TLG) for disease-free survival (DFS) and overall survival (OS) in surgical NSCLC patients.

Materials and Methods
MEDLINE, EMBASE and the Cochrane Library were systematically searched up to August 1, 2015. Prospective or retrospective studies that evaluated the prognostic role of preoperative 18F-FDG PET/CT with complete DFS and OS data in surgical NSCLC patients were included. The impact of SUVmax, MTV and TLG on survival was measured using hazard ratios (HRs). Sub-group analyses were performed based on disease stage, pathological classification, surgery only and cut-off values.

Results
Thirty-six studies comprising 5807 patients were included. The pooled HRs for DFS were 2.74 (95% CI 2.33–3.24, unadjusted) and 2.43 (95% CI 1.76–3.36, adjusted) for SUVmax, 2.27 (95% CI 1.77–2.90, unadjusted) and 2.49 (95% CI 1.23–5.04, adjusted) for MTV, and 2.46 (95% CI 1.91–3.17, unadjusted) and 2.97 (95% CI 1.68–5.28, adjusted) for TLG. The pooled HRs for OS were 2.54 (95% CI 1.86–3.49, unadjusted) and 1.52 (95% CI 1.16–2.00, adjusted) for SUVmax, 2.07 (95% CI 1.16–3.69, unadjusted) and 1.91 (95% CI 1.13–3.22, adjusted) for MTV, and 2.47 (95% CI 1.38–4.43, unadjusted) and 1.94 (95% CI 1.12–3.33, adjusted) for TLG. Begg's test detected publication bias; the trim and fill procedure was therefore performed and yielded similar HRs. The prognostic role of SUVmax, MTV and TLG remained similar in the sub-group analyses.

Conclusions
High values of SUVmax, MTV and TLG predicted a higher risk of recurrence or death in patients with surgical NSCLC. We suggest the use of FDG PET/CT to select patients who are at high risk of disease recurrence or death and may benefit from aggressive treatments.


Introduction
The application of advanced diagnostic and screening techniques has increased the detection of early-stage non-small cell lung cancer (NSCLC) and improved cure rates with standard surgery [1]. Nevertheless, the 5-year survival after resection of localized NSCLC is only approximately 50%, despite improved surgical techniques and advances in adjuvant therapy [2,3]. Apart from stage and performance status, no prognostic factor has been definitively established in NSCLC, and accurate markers would be invaluable for stratifying patients for adjuvant therapy and predicting outcomes. 18F-fluorodeoxyglucose (FDG) positron emission tomography/computed tomography (PET/CT) is the standard modality for staging, treatment response monitoring and prognosis prediction for a variety of tumors, including NSCLC [4,5]. The standardized uptake value (SUV) is a semi-quantitative measure of the normalized concentration of radioactivity, and the maximum SUV (SUVmax) is the parameter most widely applied in clinical practice [6]. Volumetric parameters, including metabolic tumor volume (MTV) and total lesion glycolysis (TLG), have also recently been used to reflect disease burden and tumor aggressiveness in NSCLC [4,7]. Several recent systematic reviews and meta-analyses [8][9][10] found that higher SUV was associated with poorer prognosis in heterogeneous groups of NSCLC patients, and Im et al. [11] reported a significant prognostic value of MTV and TLG for survival in NSCLC patients. However, the quality of the existing studies has not been systematically assessed, and their clinical features have not been examined in enough detail to evaluate the association between SUV or volumetric parameters and prognosis specifically in surgical NSCLC. We therefore performed a meta-analysis to identify, appraise and synthesize the results of published studies that examined the prognostic value of SUVmax, MTV and TLG for disease-free survival (DFS) and overall survival (OS) in surgical NSCLC patients.
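To make the SUV definition concrete, the following Python sketch computes a body-weight-normalized SUV and SUVmax from voxel activity concentrations. It is a minimal illustration under stated assumptions (activity in kBq/mL, injected dose in MBq, body weight in kg, tissue density of 1 g/mL); the function names and the optional decay-correction step are hypothetical and do not describe any particular scanner's software.

```python
import math

F18_HALF_LIFE_MIN = 109.77  # physical half-life of fluorine-18 (minutes)


def suv_bw(activity_kbq_per_ml, injected_mbq, weight_kg, uptake_min=0.0):
    """Body-weight-normalized SUV of one voxel.

    SUV = C / (D / W). With C in kBq/mL, D in MBq and W in kg, the unit
    factors of 1000 cancel (assuming 1 g/mL tissue density), so SUV = C * W / D.
    Assumption: if the image is NOT decay-corrected to injection time, pass the
    injection-to-scan interval as uptake_min so the dose is decay-corrected;
    otherwise leave uptake_min at 0.
    """
    decay = math.exp(-math.log(2) * uptake_min / F18_HALF_LIFE_MIN)
    return activity_kbq_per_ml * weight_kg / (injected_mbq * decay)


def suv_max(voxel_activities, injected_mbq, weight_kg, uptake_min=0.0):
    """SUVmax: the highest single-voxel SUV within the tumor region."""
    return max(suv_bw(c, injected_mbq, weight_kg, uptake_min) for c in voxel_activities)


# Hypothetical example: 370 MBq injected, 70 kg patient, four tumor voxels.
print(round(suv_max([8.0, 15.5, 21.0, 12.0], 370, 70), 2))  # 3.97
```

Because SUVmax is the maximum over voxels, it depends on a single voxel value, which is the limitation of SUVmax discussed later in the paper.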

Materials and Methods
Search Strategy and Eligibility Criteria
MEDLINE, EMBASE and the Cochrane Library were searched up to August 1, 2015. The search combined four groups of terms, with terms joined by OR within each group and the groups joined by AND: ("non-small cell lung cancer" OR "NSCLC" OR "carcinoma, nonsmall cell lung") AND ("18F-FDG" OR "fluorodeoxyglucose" OR "PET" OR "positron emission tomography") AND ("survival" OR "local control" OR "prognostic" OR "outcome" OR "predict") AND ("surgery" OR "resect" OR "operation"). Reviews, case studies, conference abstracts and editorials were excluded.
Two authors independently searched for articles and performed an initial screening of the identified titles and abstracts. Articles were reviewed further if they reported original data on the prognosis of surgically resected NSCLC patients who underwent preoperative 18F-FDG PET/CT. Full-text articles were used for the second screening. The inclusion criteria for the meta-analysis were: (1) prospective or retrospective studies investigating the correlation of FDG uptake with DFS, recurrence-free survival (RFS) and/or OS; (2) patients with pathological stage I–IIIA NSCLC who underwent diagnostic 18F-FDG PET/CT before treatment; (3) treatment with surgery alone or surgery plus adjuvant therapy; (4) survival data reported in sufficient detail; and (5) either full anatomical resection or limited lung resection, whether performed via open thoracotomy or video-assisted thoracic surgery. Any discrepancies were resolved by consensus.
Studies that included patients with small cell lung cancer (SCLC) were eligible if more than 95% of their patients had NSCLC. Likewise, studies that included patients with advanced-stage disease (IIIB–IV) were eligible only if these patients accounted for less than 5% of the study population. When only certain sub-groups of a study met our inclusion criteria, data were extracted for those sub-groups only. Studies that included patients who received neoadjuvant therapy were excluded. When survival results for the same patient population were reported more than once, only the most recent or most complete report was included.
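The population thresholds above can be expressed as a single eligibility check. This is a hedged sketch: the StudySummary fields and the function name are hypothetical, and only the thresholds (more than 95% NSCLC, less than 5% stage IIIB–IV, no neoadjuvant therapy) come from the text. Partial extraction of eligible sub-groups is not modeled.

```python
from dataclasses import dataclass


@dataclass
class StudySummary:
    """Hypothetical per-study counts used only for this illustration."""
    n_patients: int
    n_nsclc: int
    n_stage_iiib_iv: int
    any_neoadjuvant: bool


def meets_population_criteria(study: StudySummary) -> bool:
    """True if the study population satisfies the thresholds stated in the text."""
    if study.any_neoadjuvant:
        return False  # any neoadjuvant therapy excludes the study
    nsclc_share = study.n_nsclc / study.n_patients
    advanced_share = study.n_stage_iiib_iv / study.n_patients
    # Strict inequalities: exactly 95% NSCLC or exactly 5% advanced stage fails.
    return nsclc_share > 0.95 and advanced_share < 0.05


print(meets_population_criteria(StudySummary(120, 118, 3, False)))  # True
```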

Data Extraction
Data extraction was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidance (S1 PRISMA Checklist) [12]. Two investigators independently extracted information, including the first author, publication year, country, study design, sample size, stage, treatment and survival endpoints. The primary endpoint was DFS, measured from the starting point defined in each study to the date of recurrence or first progression. The secondary endpoint was OS, defined as the time from the starting point applied in each study to death.

Study Quality Control
Three investigators independently reviewed and scored each article using a quality scale (S1 File). Based on similar studies [13][14][15], the scale comprised four modified domains: scientific design, generalizability of the results, data analysis, and PET reporting. Each domain contained five items, and each item was scored 0, 1 or 2 points, giving a maximum of 40 points. Scores were agreed by consensus among all investigators, which ensured objective scoring and correct interpretation. Final scores are expressed as percentages of the maximum, and higher values reflect greater consistency with the quality assessment standards. Any article with a final score below 60% was excluded.
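The scoring arithmetic can be sketched as follows. The domain names come from the text, but the individual items of the scale (S1 File) are not reproduced here, so the item scores in the example are hypothetical.

```python
DOMAINS = ("scientific design", "generalizability", "data analysis", "PET report")
ITEMS_PER_DOMAIN = 5
MAX_ITEM_SCORE = 2
MIN_PERCENT = 60.0  # articles scoring below this were excluded


def quality_percent(scores):
    """scores maps each domain to its five item scores (each 0, 1 or 2)."""
    total = 0
    for domain in DOMAINS:
        items = scores[domain]
        if len(items) != ITEMS_PER_DOMAIN or any(s not in (0, 1, 2) for s in items):
            raise ValueError(f"invalid item scores for {domain!r}")
        total += sum(items)
    max_total = len(DOMAINS) * ITEMS_PER_DOMAIN * MAX_ITEM_SCORE  # 4 * 5 * 2 = 40
    return 100.0 * total / max_total


def passes_quality(scores):
    """An article is kept when its score is at least 60% (excluded only if < 60%)."""
    return quality_percent(scores) >= MIN_PERCENT


# Hypothetical article scoring 30 of 40 points.
example = {
    "scientific design": [2, 2, 2, 2, 1],
    "generalizability": [2, 2, 2, 1, 1],
    "data analysis": [2, 1, 1, 1, 1],
    "PET report": [1, 1, 1, 2, 2],
}
print(quality_percent(example), passes_quality(example))  # 75.0 True
```

With a 40-point maximum, each point is worth 2.5 percentage points, so the reported range of 70.0% to 87.5% corresponds to 28 to 35 points.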

Statistical Analysis
Review Manager statistical software (RevMan, version 5.3) was used. The impact of SUVmax, MTV and TLG on DFS and OS was measured using hazard ratios (HRs), and survival data were extracted using the methodology suggested by Tierney et al. [16]. The cut-off values of SUVmax, MTV and TLG, and the delineation thresholds applied to MTV and TLG, followed the definitions used in each individual study. Both unadjusted and adjusted HRs were extracted. We extracted the HR estimate and its 95% confidence interval (CI) directly from each study when provided by the authors; otherwise, the log-rank P value, the number of patients analyzed in each group and the number of events were extracted to estimate the unadjusted HR indirectly. The correlation between quality scores and the number of patients was measured using Spearman's rank correlation coefficient.
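The two HR extraction routes described above can be sketched in Python. This is a minimal illustration, not RevMan's implementation: it assumes the reported 95% CI is symmetric on the log scale and, for the indirect route, approximates the log-rank variance V from the total number of events and the group sizes, as in the approach described by Tierney et al. The function names are hypothetical.

```python
import math
from statistics import NormalDist

Z_975 = NormalDist().inv_cdf(0.975)  # about 1.96


def from_reported_ci(hr, lower, upper):
    """ln(HR) and its standard error from a reported HR with 95% CI,
    assuming the CI is symmetric on the log scale."""
    return math.log(hr), (math.log(upper) - math.log(lower)) / (2 * Z_975)


def from_logrank(p_two_sided, total_events, n_high, n_low, high_is_worse=True):
    """Indirect ln(HR) and SE from a two-sided log-rank P value.

    V is approximated from total events and group sizes, O - E is recovered
    from the z statistic, ln(HR) = (O - E) / V and SE = 1 / sqrt(V).
    The P value carries no sign, so the direction of effect must be supplied.
    """
    v = total_events * n_high * n_low / (n_high + n_low) ** 2
    z = NormalDist().inv_cdf(1 - p_two_sided / 2)
    o_minus_e = z * math.sqrt(v) * (1 if high_is_worse else -1)
    return o_minus_e / v, 1 / math.sqrt(v)


# DFS estimate quoted for Higashi et al. [41] in the Discussion.
ln_hr, se = from_reported_ci(8.17, 2.83, 23.53)
print(round(ln_hr, 3), round(se, 3))
```

The wide CI of this estimate translates into a large standard error and therefore a small inverse-variance weight in the pooled HR.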
Heterogeneity was evaluated using Cochran's Q test and the I² statistic [17]. A Q-test P value < 0.05 was considered to indicate significant heterogeneity. An I² value of 0% indicates no heterogeneity, a value below 25% low heterogeneity, a value of 25.1–50% moderate heterogeneity, and a value above 50% substantial heterogeneity [18]. A fixed-effect model was used to calculate the pooled HRs when no, low or moderate heterogeneity was observed, and a random-effects model was applied when substantial or significant heterogeneity was observed. An HR greater than 1 implies worse survival for patients with high SUVmax, MTV or TLG, whereas an HR less than 1 implies a survival benefit for these patients. Sub-group analyses were performed based on histological subtype, pathological stage, surgery only and cut-off values.
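The pooling and model-selection rule described above can be sketched on the log-HR scale. Assumptions: inverse-variance weighting, the DerSimonian–Laird estimator for the between-study variance in the random-effects case (assumed here, not stated in the text), and a hand-written chi-square tail function to avoid external dependencies. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def chi2_sf(x, df):
    """Upper-tail P value of a chi-square statistic, via the series for the
    regularized lower incomplete gamma function (adequate for moderate x)."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2, x / 2
    term = total = 1.0 / a
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= half_x / (a + n)
        total += term
    lower = math.exp(a * math.log(half_x) - half_x - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def pool_log_hrs(log_hrs, ses):
    """Inverse-variance pooling of study log HRs.

    Returns (model, pooled HR, 95% CI lower, 95% CI upper, I^2 in %).
    Random effects if I^2 > 50% or the Q-test P < 0.05, else fixed effect.
    """
    k = len(log_hrs)
    if k < 2:
        raise ValueError("need at least two studies")
    w = [1 / se ** 2 for se in ses]
    fixed = sum(wi * y for wi, y in zip(w, log_hrs)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_hrs))  # Cochran's Q
    df = k - 1
    i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0
    if i2 > 50 or chi2_sf(q, df) < 0.05:
        model = "random"
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - df) / c)  # DerSimonian-Laird between-study variance
        w = [1 / (se ** 2 + tau2) for se in ses]
    else:
        model = "fixed"
    pooled = sum(wi * y for wi, y in zip(w, log_hrs)) / sum(w)
    se_pooled = 1 / math.sqrt(sum(w))
    z = NormalDist().inv_cdf(0.975)
    return (model, math.exp(pooled), math.exp(pooled - z * se_pooled),
            math.exp(pooled + z * se_pooled), i2)
```

Because each weight is the inverse of a study's variance, small studies with wide CIs contribute little to the pooled HR; in the random-effects branch the added between-study variance makes the weights more equal.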
Publication bias was assessed by visual inspection of funnel plots, and Begg's test was performed for meta-analyses that included more than 10 studies [19,20]. We also performed the non-parametric trim and fill procedure to further estimate the potential influence of publication bias [21,22]. This procedure calculates a new pooled HR that incorporates hypothetical missing studies.
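Begg's test and the trim and fill procedure can be sketched as below, working on log HRs and their variances. This is an illustrative implementation under assumptions, not the software used in the study: Begg's test uses the normal approximation without a tie correction, and trim and fill uses the L0 estimator of Duval and Tweedie with fixed-effect pooling and assumes that the missing studies lie on the low-HR side of the funnel plot. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def fixed_effect(y, v):
    """Inverse-variance fixed-effect estimate and total weight."""
    w = [1 / vi for vi in v]
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w), sum(w)


def begg_test(y, v):
    """Begg-Mazumdar rank correlation test (no tie correction).

    y: study log HRs; v: their variances. Returns Kendall's S (concordant minus
    discordant pairs of standardized effect vs. variance) and a two-sided P.
    """
    est, sum_w = fixed_effect(y, v)
    t_star = [(yi - est) / math.sqrt(vi - 1 / sum_w) for yi, vi in zip(y, v)]
    k = len(y)
    s = 0
    for i in range(k):
        for j in range(i + 1, k):
            prod = (t_star[i] - t_star[j]) * (v[i] - v[j])
            s += (prod > 0) - (prod < 0)
    z = s / math.sqrt(k * (k - 1) * (2 * k + 5) / 18)
    return s, 2 * (1 - NormalDist().cdf(abs(z)))


def trim_and_fill(y, v, max_iter=50):
    """Trim and fill with the L0 estimator; returns (imputed studies, adjusted HR)."""
    n = len(y)
    order = sorted(range(n), key=lambda i: y[i])  # ascending log HR

    def centre(k0):
        # Fixed-effect estimate after trimming the k0 largest effects.
        kept = order[: n - k0]
        return fixed_effect([y[i] for i in kept], [v[i] for i in kept])[0]

    k0 = 0
    for _ in range(max_iter):
        mu = centre(k0)
        dev = [yi - mu for yi in y]
        ranked = sorted(range(n), key=lambda i: abs(dev[i]))
        t_n = sum(rank for rank, i in enumerate(ranked, start=1) if dev[i] > 0)
        new_k0 = min(n - 1, max(0, round((4 * t_n - n * (n + 1)) / (2 * n - 1))))
        if new_k0 == k0:
            break
        k0 = new_k0
    mu = centre(k0)
    trimmed = order[n - k0:]  # empty when k0 == 0
    filled_y = list(y) + [2 * mu - y[i] for i in trimmed]  # mirror images
    filled_v = list(v) + [v[i] for i in trimmed]
    return k0, math.exp(fixed_effect(filled_y, filled_v)[0])
```

In a hypothetical set of two precise studies near ln(HR) = 0.25 and three imprecise studies with larger HRs, the sketch imputes two mirror-image studies and lowers the pooled HR, which is the kind of adjustment whose effect on the result the text reports.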

Study Characteristics and Qualitative Assessment
Thirty-six eligible studies were included in the meta-analysis (Fig 1, Tables 1 and 2). Only two studies [31,37] were prospectively designed. The studies were published between 2000 and 2015, and sample sizes ranged from 49 to 530 patients (median 102). Only five patients with SCLC [40,51] were included among the 5807 patients analyzed. Four studies lacked raw data on stage [23,29,40,49]; in the remaining studies, stages I, II, III and IV accounted for 80.4%, 14.2%, 4.5% (2.7% IIIA, 0.9% IIIB and 0.9% stage III not otherwise specified) and 0.9% of patients, respectively.

Table 1 lists the PET/CT scanners and models. The injected FDG dose ranged from 150 to 666 MBq according to the individual scanning protocols. The interval before scanning was 40–60 minutes in 28 studies, 81 minutes in 1 study and 90 minutes in 1 study, and was not reported in 6 studies. SUVmax, normalized to body weight, was measured in 34 studies [23][24][25]. MTV was measured in 7 studies [24,26–29,52,53], and TLG was measured in 7 studies [24,26,27,29,52,53,56]. Volumes of interest were segmented using a fixed SUV threshold of 2.5 [27,28,52,56], the gradient segmentation method [29], 50% of SUVmax [24], 42% of SUVmax [53], or the mediastinal background mean SUV plus 2 standard deviations [26]. Most studies determined cut-off values using the minimum P value approach, receiver operating characteristic (ROC) analysis or the median value. The median cut-off point was 5. Adjusted HRs were available from 25 studies. Most risk measures were adjusted for tumor size or T stage, stage, age, gender and histology, and other studies adjusted for lymph node status, differentiation and carcinoembryonic antigen (CEA) level.
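The threshold-based segmentation methods listed above can be illustrated with a short sketch. It treats the tumor volume of interest as a flat list of voxel SUVs with a known voxel volume, ignores spatial connectivity, and does not cover the gradient method; all helper names are hypothetical. MTV is taken as the volume of voxels at or above the threshold, and TLG as MTV multiplied by the mean SUV of those voxels.

```python
def fixed_threshold(level=2.5):
    """Absolute SUV threshold, e.g. SUV 2.5."""
    return level


def relative_threshold(suvs, fraction):
    """Threshold as a fraction of SUVmax, e.g. 0.42 or 0.50."""
    return fraction * max(suvs)


def background_threshold(bg_mean, bg_sd, n_sd=2.0):
    """Mediastinal background mean SUV plus n_sd standard deviations."""
    return bg_mean + n_sd * bg_sd


def mtv_tlg(suvs, voxel_volume_ml, threshold):
    """Return (MTV in mL, TLG) for voxels with SUV >= threshold.

    TLG = MTV x SUVmean of the segmented voxels. Spatial connectivity is
    ignored: every voxel in the list is assumed to belong to the tumor VOI.
    """
    inside = [s for s in suvs if s >= threshold]
    if not inside:
        return 0.0, 0.0
    mtv = len(inside) * voxel_volume_ml
    suv_mean = sum(inside) / len(inside)
    return mtv, mtv * suv_mean


# Hypothetical VOI of five voxels, 0.5 mL each.
voi = [1.0, 2.0, 3.0, 4.0, 10.0]
for name, thr in [("SUV 2.5", fixed_threshold()),
                  ("42% SUVmax", relative_threshold(voi, 0.42)),
                  ("50% SUVmax", relative_threshold(voi, 0.50))]:
    print(name, mtv_tlg(voi, 0.5, thr))
```

In this toy lesion the fixed 2.5 threshold gives an MTV of 1.5 mL, while either relative threshold keeps only the hottest voxel (0.5 mL), which shows how the choice of segmentation method alone can shift volumetric values between studies.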
Twenty-seven studies reported a complete resection rate of 100%, whereas the remaining studies did not report resection rates. The average (mean or median) follow-up duration was reported in 29 studies and ranged from 16.6 to 64 months (median 32.0 months). The follow-up protocol was described in detail in 11 studies and was not reported in 20 studies.
When necessary, we attempted to contact the authors to obtain missing information on methodological quality. The mean quality score was 77.5% (range, 70.0% to 87.5%). The Spearman correlation coefficient between the quality score and the number of patients was 0.326 (P = 0.36).

Discussion
There is a high risk of local relapse and distant metastasis after curative resection of early-stage and localized NSCLC. Adjuvant therapy has therefore been explored to eliminate occult metastases and/or residual loco-regional tumor cells, thereby reducing recurrence and prolonging survival. To optimize treatment, it is essential to identify prognostic factors that predict which patients are at high risk of recurrence and will benefit most from adjuvant therapy. In clinical practice, the evidence-based use of adjuvant therapy depends heavily on clinicopathological tumor staging. The role of 18F-FDG PET/CT in predicting local control and OS in surgical NSCLC therefore warrants investigation, because it may provide important biological information beyond TNM staging.
The present systematic review and meta-analysis found that higher values of SUVmax, MTV and TLG predicted a higher risk of disease recurrence or death in patients with surgical NSCLC. The association remained statistically significant in analyses stratified by stage, pathology and cut-off value. FDG PET/CT may therefore be used to select patients who are at high risk of tumor recurrence or death and may benefit from more aggressive subsequent treatment. SUVmax is the most commonly used parameter in 18F-FDG PET/CT diagnosis and response monitoring because of its high reproducibility and availability. The potential prognostic value of SUVmax in primary lung cancer has been widely reported in populations with various stages and treatments [8][9][10][14] (Table 4). Our meta-analysis therefore focused on surgical NSCLC only and provides the most comprehensive information for the total population and for sub-groups based on disease stage, pathological classification and cut-off values. However, SUVmax reflects only a single voxel within the tumor and does not measure the volume or heterogeneity of metabolically active disease. Volumetric parameters such as MTV and TLG have been investigated more recently, and their prognostic role has been meta-analyzed in NSCLC patients with different stages [11]. Our study, which focused on surgical NSCLC patients, yielded similar results. Volume-based parameters have advantages for measuring metabolic tumor burden, but the most appropriate segmentation method for measuring MTV and TLG remains controversial. Studies that reported complete data on FDG PET/CT-derived parameters [24,28,29,52,53] suggested that volumetric parameters may perform better than SUVmax as prognostic factors.
In the present meta-analysis, SUVmax performed comparably to the volumetric parameters, although this may reflect the limited data available for volumetric parameters compared with FDG uptake. FDG PET/CT imaging characteristics beyond the traditional parameters, such as intratumoral heterogeneity of FDG uptake, have also been studied. This heterogeneity, quantified as the area under the curve (AUC) of the cumulative histogram or by texture analysis, has been reported to predict tumor control [59] and to be an independent prognostic factor for survival [60][61][62] in NSCLC. However, these reports were not included in the present meta-analysis because their study populations were relatively small. Patient heterogeneity, statistical data mining, retrospective cohorts, and differences in PET acquisition and SUVmax calculation are major contributors to heterogeneity and have limited the application of glucose uptake as a companion diagnostic/prognostic marker. NSCLC is a heterogeneous disease, and patients with different histological types, stages, surgical procedures and adjuvant treatments were included in the meta-analysis. For example, Higashi et al. [41] and Stiles et al. [44] applied similar thresholds for FDG uptake. Higashi's study found a significant difference in DFS (HR 8.17, 95% CI 2.83–23.53), whereas Stiles's study did not (HR 1.54, 95% CI 0.92–2.56). Higashi's study included more patients with stage I NSCLC (80.7% versus 76.8%) and more patients with bronchioloalveolar carcinoma (BAC; 22.8% versus <8.3%), which may explain the lower risk of recurrence among patients with low tumor FDG uptake in that study. Heterogeneity in PET imaging thresholds between studies was also evident and may be explained by many factors, including the type of PET scanner, the reconstruction algorithm and number of iterations, the time elapsed between FDG injection and emission scanning, and the method used to determine thresholds. Differences in defining the regions of interest [63] and in the timing of data acquisition [64] may also result in different absolute SUV estimates.
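The cumulative-histogram heterogeneity index mentioned above can be sketched as follows, assuming the formulation in which the fraction of tumor volume with SUV at or above each percentage of SUVmax is plotted and the area under that curve is computed. The function name and the 1% threshold step are illustrative choices, not details taken from the cited studies; lower AUC values correspond to more heterogeneous uptake.

```python
def csh_auc(suvs, n_steps=100):
    """Area under the cumulative SUV-volume histogram (range 0 to 1).

    For thresholds from 0% to 100% of SUVmax in n_steps equal steps, take the
    fraction of tumor voxels with SUV >= threshold, then integrate that curve
    over the normalized threshold axis with the trapezoidal rule.
    Uniform uptake gives an AUC of 1; more heterogeneous uptake gives less.
    """
    peak = max(suvs)
    n_vox = len(suvs)
    xs = [i / n_steps for i in range(n_steps + 1)]
    ys = [sum(1 for s in suvs if s >= x * peak) / n_vox for x in xs]
    width = 1 / n_steps
    return sum((ys[i] + ys[i + 1]) / 2 * width for i in range(n_steps))


print(round(csh_auc([4.0] * 20), 3))            # uniform uptake
print(round(csh_auc([1.0, 2.0, 4.0, 8.0]), 3))  # heterogeneous uptake
```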
Heterogeneity between the included reports was the main limitation of this meta-analysis. In addition, non-English articles were excluded. Because small studies with negative results are less likely to be published, or are published with only brief descriptions, higher HRs tended to be accompanied by larger standard errors. The trim and fill sensitivity analysis in the present study, which incorporates the hypothetical missing studies, did not change the overall result, suggesting that the association is robust. Moreover, HRs from small studies carried less weight in the pooled HR, which also supports the reliability of the results. MTV and TLG were measured in only 7 studies, and the multivariate analyses were based on 5 studies for MTV and 4 for TLG, so the data available to meta-analyze the prognostic value of volumetric PET/CT parameters were limited. Only 2 of the included studies were prospectively designed, although PET has been assessed as a biomarker to prognosticate or predict the response to therapy for over 10 years. Prospectively designed studies [65,66] that were ineligible for the present meta-analysis also reported predominantly positive results for various FDG PET/CT-derived parameters in lung cancer patients. In the absence of sufficient evidence from prospectively designed studies, our meta-analysis offers a reasonably valid conclusion for clinical practice.
In summary, this meta-analysis demonstrated that high values of SUVmax and MTV derived from pretreatment 18F-FDG PET/CT predicted a higher risk of recurrence or death in surgical NSCLC patients. Our findings suggest that FDG PET/CT may be used for risk stratification with respect to disease control and survival. Patients whose tumors exhibit intense FDG uptake may be considered at high risk of treatment failure and may benefit from more aggressive treatment. Meta-analyses of individual patient data are needed to determine the optimal thresholds for PET imaging parameters.