Predictive value of interim 18F-FDG-PET in patients with non-small cell lung cancer treated with definitive radiation therapy

Purpose We evaluated whether early metabolic response, determined by 18F-fluorodeoxyglucose positron emission tomography/computed tomography (FDG-PET/CT) during radiotherapy (RT), predicts outcomes in non-small cell lung cancer. Material and methods Twenty-eight patients evaluated using pretreatment 18F-FDG-PET/CT (PETpre) and interim 18F-FDG-PET/CT (PETinterim) after 11 fractions of RT were retrospectively reviewed. Maximum standardized uptake value (SUVmax) was calculated for the primary lesion. The predictive value of changes in gross tumor volume (ΔGTV) and SUVmax (ΔSUVmax) was evaluated for locoregional control (LRC), distant failure (DF), and overall survival (OS). Metabolic responders were patients with ΔSUVmax >40%. Results Metabolic responders showed a trend toward better 1-year LRC (90.9%) than non-responders (47.1%) (p = 0.086). Patients with large GTVpre (≥120 cc) demonstrated poor LRC (hazard ratio 4.14, p = 0.022), while metabolic non-responders with small GTVpre (<120 cc) and metabolic responders with large GTVpre both had 1-year LRC rates of 75.0%. A 25% reduction in GTV was not associated with LRC; however, metabolic responders without a GTV response showed better 1-year LRC (83.3%) than metabolic non-responders with a reduction in GTV (42.9%). Metabolic responders showed lower 1-year DF (16.7%) than non-responders (50.0%) (p = 0.025). A ΔSUVmax threshold of 40% yielded an accuracy of 64% for predicting LRC, 75% for DF, and 54% for OS. In contrast, ΔGTV > 25% demonstrated inferior diagnostic value compared with metabolic response. Conclusions Changes in tumor metabolism assessed using PETinterim during RT predicted treatment responses, recurrences, and prognosis better than other historically used factors.

Background
Fluoro-2-deoxy-D-glucose positron emission tomography (FDG-PET) imaging has become an important and widely used tool for determining the disease stage in patients with non-small cell lung cancer (NSCLC). The National Comprehensive Cancer Network recommends the use of 18F-FDG-PET/computed tomography (CT) for the appropriate staging of lung cancer [1].
FDG-PET/CT has several roles in NSCLC, including diagnosis, prognosis, and radiotherapy (RT) planning. Recent investigations have shown that FDG-PET/CT has more than 90% accuracy in the diagnosis of malignant nodules, with a low false-positive rate [2]. FDG-PET also plays a significant role in nodal staging (accuracy 90%, sensitivity 79-85%, and specificity 87-92%) [3,4] and in the detection of distant metastasis, identifying previously unsuspected extrathoracic lesions in up to 10% of patients beyond CT alone [5]. FDG-PET also offers a benefit over conventional CT after treatment: although tumor shrinkage may be observed on CT, inflammation and fibrosis after neoadjuvant chemotherapy or RT make assessment difficult [1].
In addition, FDG-PET plays an important role in target volume delineation of the gross tumor volume (GTV), for both the primary tumor and lymph nodes [6]. Its superior contrast between tumor and non-tumor tissue means that FDG-PET can also decrease inter-physician contouring variability, compared to delineation with CT alone [7]. It also greatly assists physicians in distinguishing the tumor tissue from atelectasis [8]. Therefore, a consensus report has been endorsed for target volume delineation using PET imaging [9].
Currently, chemoradiotherapy (sequential or concurrent) is considered a standard treatment for locally advanced NSCLC. Despite the emergence of immunotherapy, targeted therapy, and new RT techniques, the prognosis of these patients remains poor. Therefore, the ability to identify non-responders during treatment, so that ineffective treatment can be changed early, is highly desirable [10]. Several studies have demonstrated interim PET (PETinterim) metrics as a prognostic factor, but most of these included conventional three-dimensional conformal RT and various chemotherapy regimens, with varied timing of PETinterim. Therefore, in this study, we focused on metabolic and volumetric parameters, which are easily accessible during RT, in patients treated with modern RT and specific chemotherapy regimens.

Study population
Patients diagnosed with NSCLC who had undergone RT with PETinterim between March 2015 and January 2018 were enrolled. Patients were excluded if they underwent RT with preoperative intent (n = 7), if pre-RT FDG-PET/CT (PETpre) was not available or was performed at another institution (n = 6), if they did not complete RT (n = 2), or if follow-up details were missing (n = 4). Ultimately, we retrospectively reviewed the medical records, tumor characteristics, and clinical outcomes of 28 patients. This study was approved by the Health Institutional Review Board of Yonsei University Hospital (No. 4-2019-0608). The study was conducted in accordance with the provisions of the 1975 Declaration of Helsinki. The requirement for informed consent was waived owing to the retrospective nature of this study. All data between March 2015 and May 2019 were fully anonymized before the authors accessed them.

Treatment
All patients, except three who were medically ineligible due to poor performance status and comorbidity, received chemotherapy using a platinum- and taxane-based regimen. Twenty-five patients began RT administered concurrently with weekly paclitaxel (45 mg per square meter of body-surface area) via intravenous infusion over 1 hour, followed by carboplatin at an area under the plasma concentration-time curve (AUC) of 2 mg/mL·minute, with a total dose of AUC × (glomerular filtration rate + 25), as an intravenous infusion over 30 minutes.
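The carboplatin dose expression above, AUC × (glomerular filtration rate + 25), can be worked through as simple arithmetic. The following is a minimal sketch, assuming the glomerular filtration rate is expressed in mL/min and the result in mg; the function name and the example GFR value are hypothetical and do not come from the study.

```python
def carboplatin_dose_mg(target_auc: float, gfr_ml_per_min: float) -> float:
    """Total carboplatin dose in mg: target AUC x (GFR + 25), as stated in the text.

    Assumes target_auc is in mg/mL*min and gfr_ml_per_min is in mL/min.
    """
    if target_auc <= 0:
        raise ValueError("target AUC must be positive")
    if gfr_ml_per_min < 0:
        raise ValueError("GFR cannot be negative")
    return target_auc * (gfr_ml_per_min + 25.0)


# Hypothetical patient: the GFR of 75 mL/min is an assumed value, not study data.
weekly_dose = carboplatin_dose_mg(target_auc=2.0, gfr_ml_per_min=75.0)
print(f"Weekly carboplatin dose: {weekly_dose:.0f} mg")
```

With the weekly target AUC of 2 used in this study, the dose scales linearly with renal function, which is why the regimen is specified by AUC rather than by a fixed amount.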
All patients underwent simulation four-dimensional CT without contrast enhancement (3-mm slice thickness) for RT planning in both the initial plan and the interim adaptive plan. The GTV was delineated by a single radiation oncology expert with more than 30 years of experience in lung cancer (C.G.L.) on simulation CT with contrast enhancement, including the primary tumor and involved regional nodes (1 cm or larger in short axis, showing abnormal FDG-avidity on PETpre, or proven on biopsy), based on both CT and pre-RT FDG-PET/CT. The internal GTV was contoured on all phases of the four-dimensional CT scans in order to reflect the effects of respiration. The clinical target volume was defined as the GTV plus a 3-5-mm margin in order to include microscopic tumor extension. An additional 3-mm margin was added to both the internal GTV and the clinical target volume to create the planning target volumes (PTV1 and PTV2, respectively), based on institutional image-guidance strategies. A simultaneous integrated boost was used to deliver 63 Gy in 30 fractions to PTV1 and 54 Gy in 30 fractions to PTV2. All patients were treated with intensity-modulated RT using volumetric-modulated arc therapy (Elekta VMAT, Elekta, Stockholm, Sweden) [11]. Daily pretreatment imaging using kilovoltage cone-beam CT was performed for image-guided RT.

18F-FDG-PET/CT method
All PETpre and PETinterim scans were performed using a Discovery STE scanner (GE Healthcare, Milwaukee, WI, USA). Every patient fasted for a minimum of 6 hours before 18F-FDG administration, ensuring a blood glucose level below 140 mg/dL. Patients were then injected with FDG at 5.5 MBq/kg. After allowing 45-60 minutes for tracer uptake, patients underwent PET/CT imaging along with a non-contrast low-dose CT scan for attenuation correction (30 mA, 140 kVp). Images were acquired from the base of the skull to the proximal thigh, with acquisition times of 3 minutes/bed position. The intrinsic spatial resolution of the system was approximately 5 mm (full width at half maximum) in the center of the field of view. All PET images were then reconstructed using a three-dimensional row-action maximum likelihood iterative reconstruction algorithm. All patients started RT a median of 16.5 days (range, 8-35 days) after the PETpre scan, so that the scan accurately reflected tumor metabolism [9]. To minimize interpretation difficulty due to non-specific FDG accumulation from radiation-induced inflammation during RT, we performed the PETinterim scan at a median of 2 weeks (range 13-22 days) after initiation of RT [12].

PET metrics
PET/CT images were consistently analyzed by two radiation oncology physicians (N.K. and C.G.L.) using MIM Maestro 6.7 (MIM Software Inc., Cleveland, OH, USA). The region of interest was delineated over the primary tumor on the PETpre and PETinterim scans using PET Edge, a semi-automatic gradient-based method validated for its superiority over manual or threshold methods [13]. This algorithm sets the contour boundary at the location where the signal gradient is highest. Deformable registration of the GTV delineated on the contrast-enhanced planning CT scans for the initial and adaptive plans was then performed to adjust the regions of interest generated by the two blinded radiation oncologists. The final region of interest used for further analysis of PET parameters was approved by a single radiation oncologist (C.G.L.). The SUV was measured in all voxels in the primary tumor region of interest. The maximum SUV (SUVmax) was defined as the maximum decay-corrected activity concentration in the tumor / (injected dose / body weight). Because metabolic target volume and total lesion glycolysis are subject to greater uncertainty than SUVmax owing to inflammation, fibrosis, or atelectasis in lung cancer, we analyzed only SUVmax in the current study.
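The SUVmax definition above can be illustrated with a short calculation. This is a minimal sketch, assuming that voxel activity concentrations are already decay-corrected and given in kBq/mL, that the injected dose is in MBq and body weight in kg, and that tissue density is approximately 1 g/mL so the ratio is dimensionless. The function names and voxel values are hypothetical; only the 5.5 MBq/kg injection is taken from the text.

```python
from typing import Iterable


def suv(activity_kbq_per_ml: float, injected_dose_mbq: float, body_weight_kg: float) -> float:
    """Body-weight SUV: activity concentration / (injected dose / body weight).

    MBq/kg equals kBq/g, so with a tissue density of about 1 g/mL
    (an assumption) the result is unitless.
    """
    if injected_dose_mbq <= 0 or body_weight_kg <= 0:
        raise ValueError("injected dose and body weight must be positive")
    return activity_kbq_per_ml / (injected_dose_mbq / body_weight_kg)


def suv_max(voxel_activities_kbq_per_ml: Iterable[float],
            injected_dose_mbq: float, body_weight_kg: float) -> float:
    """Highest SUV among all voxels of the region of interest."""
    return max(suv(a, injected_dose_mbq, body_weight_kg)
               for a in voxel_activities_kbq_per_ml)


# Hypothetical 70-kg patient injected at 5.5 MBq/kg; voxel values are made up.
weight_kg = 70.0
dose_mbq = 5.5 * weight_kg
roi_voxels = [12.1, 38.5, 27.5]  # decay-corrected kBq/mL
roi_suv_max = suv_max(roi_voxels, dose_mbq, weight_kg)
print(f"SUVmax = {roi_suv_max:.1f}")
```

Because SUVmax depends on a single voxel, it is insensitive to how the region boundary is drawn, provided the hottest voxel lies inside it.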

Statistical analysis
The percentage change in each parameter between PETpre and PETinterim was calculated using the following equation [14]:

Δparameter (%) = (parameter on PETpre − parameter on PETinterim) / parameter on PETpre × 100

Because no universally accepted optimal cut-off value exists for changes in PET parameters, receiver operating characteristic curve analysis for any failure was used to determine the cut-off threshold for the change in SUVmax on PETinterim for identifying metabolic responders. As a reference, volumetric response was assessed based on GTV changes (ΔGTV), with a threshold of 25%, which could improve the response assessment compared to Response Evaluation Criteria in Solid Tumors [15]. Locoregional recurrence (LRR) and distant failure (DF) were defined as any first recurrence within and outside the PTV, respectively, until the last follow-up. Overall survival (OS) was calculated from the first day of RT to the date of death or the last follow-up visit. Survival curves were estimated using the Kaplan-Meier method and compared using the log-rank test. Univariable analysis of LRR and DF was performed using Cox regression analysis. A multivariable analysis was not performed because no statistically significant factors were identified on univariable analysis. Sensitivity, specificity, accuracy, positive predictive value (PPV), and negative predictive value (NPV) were calculated to assess the diagnostic value of selected parameters. In addition, DeLong's test after bootstrapping 200 times was performed to compare the predictive value of the selected cut-off values. An α level of 0.05 was used; a p-value <0.05 was taken as rejecting the null hypothesis and was considered statistically significant. All statistical analyses were performed using SPSS version 25.0.0 (IBM Corp., Armonk, NY) and R (version 3.6.3; R Foundation for Statistical Computing, Vienna, Austria).
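The response classification and diagnostic measures described above can be sketched in a few lines. This is a minimal illustration, assuming that the percentage change is the reduction relative to the pretreatment value and that a reduction at or above the threshold counts as a response (both are assumptions about the convention). The function names, the example SUVmax values, and the 2×2 counts are hypothetical and are not data from the study.

```python
def percent_reduction(pre: float, interim: float) -> float:
    """Reduction from pre to interim as a percentage of the pre value.

    Positive values mean the parameter decreased during RT.
    """
    if pre <= 0:
        raise ValueError("pretreatment value must be positive")
    return (pre - interim) / pre * 100.0


def is_responder(pre: float, interim: float, threshold_pct: float = 40.0) -> bool:
    """True if the reduction reaches the threshold (>= is an assumed convention)."""
    return percent_reduction(pre, interim) >= threshold_pct


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard 2x2 diagnostic measures, returned as fractions between 0 and 1."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical SUVmax pairs (pre, interim); not patient data.
print(is_responder(20.0, 10.0))   # 50% reduction
print(is_responder(10.0, 12.5))   # SUVmax increased, so negative reduction

# Hypothetical 2x2 counts for a 28-patient cohort; not reported in the text.
for name, value in diagnostic_metrics(tp=9, fp=3, fn=7, tn=9).items():
    print(f"{name}: {value:.3f}")
```

Using a relative change rather than the absolute SUVmax keeps the criterion tied to each patient's own baseline, which is the rationale given later for preferring ΔSUVmax within a single institution.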

Cohort characteristics
Details of the patients' characteristics are presented in Table 1. Males predominated (92.9%) among the entire group of 28 patients, and the median age was 73.5 years (interquartile range (IQR) 66.0-88.0). Most patients were diagnosed as having squamous cell carcinoma (64.3%), followed by adenocarcinoma (35.7%). Median primary tumor size was 4.1 cm (IQR 3.4-5.3), and most patients (82.2%) were diagnosed at stage III. The PETinterim was obtained approximately 11 fractions after treatment initiation, at a median dose of 23.1 Gy (IQR 23.1-24.7).

Changes during RT
The median GTVpre and SUVmax(pre) were 119.6 cc (IQR 85.7-190.6) and 15.5 (IQR 11.5-21.4), respectively. Both GTV and SUVmax generally decreased on PETinterim; median ΔGTV and ΔSUVmax were 23.6% (IQR 14.0-49.6%) and 32.9% (IQR 8.4-64.6%), respectively. However, four patients showed an increased GTV and another five showed an increased SUVmax. The quantitative analysis of SUVmax and GTV is summarized in S1 Table.

Treatment outcomes
Median follow-up was 17.7 months (IQR 11.9-22.2). Twelve patients developed LRR, 15 patients showed DF, and 7 patients experienced both LRR and DF; in 4 of these, LRR and DF occurred simultaneously as the first treatment failure. The overall 1-year LRR rate was 34.3%, while the DF rate was 36.1% for the entire cohort (S1A Fig). One-year OS and progression-free survival rates were 82.0% and 53.3%, respectively (S1B Fig).

Prognostic factors for treatment outcomes
With an area under the receiver operating characteristic curve of 0.812 for any failure (S2 Table), a threshold of 40% was calculated as the optimal cut-off for ΔSUVmax. With this threshold, there were 12 metabolic responders and 16 non-responders. Metabolic response based on a ΔSUVmax of 40% was associated with a difference in locoregional control (LRC), but this was not statistically significant for the entire cohort; the 1-year LRC rate for metabolic responders (n = 12) was 90.9%, compared to 47.1% for non-responders (n = 16; Fig 1A, Table 2). In contrast, large GTVpre (≥120 cc) was identified as a poor prognostic factor for LRC on univariable analysis (HR 4.14, 95% CI 1.23-13.97; p = 0.022), whereas ΔGTV had little impact on LRC (p = 0.341). Nevertheless, metabolic response showed a borderline impact on LRC, along with GTVpre (Fig 1B).

PLOS ONE

Metabolic responders with a small GTVpre (n = 4) showed the best 1-year LRC rate, of 100%. In contrast, metabolic non-responders with a large GTVpre (n = 8) showed the worst 1-year LRC rate, of 15%. There was no difference in LRC between metabolic non-responders with a small GTVpre (n = 8) and metabolic responders with a large GTVpre (n = 8) (1-year LRC rate 75.0% vs. 75.0%, p = 0.584). Responders were patients with SUVmax reduction rates ≥40%, whereas non-responders were those with SUVmax reduction rates <40%.

Diagnostic tests
The diagnostic test results are presented in Table 4. Using the threshold of 40%, ΔSUVmax provided a sensitivity of 56.3%, specificity of 75.0%, accuracy of 64.3%, PPV of 75.0%, and NPV of 56.3% for predicting LRR. GTVpre with a threshold of 120 cc was identified as a tool for predicting LRR, with a diagnostic accuracy of 71.4%. ΔSUVmax showed better diagnostic ability for predicting DF than GTVpre, with a sensitivity, specificity, and accuracy of 75.0% each; PPV of 80.0%; and NPV of 69.2%. There was no statistical difference in AUC value for LRR between the ΔSUVmax and GTVpre criteria (

Discussion
In this study, we investigated the value of 18F-FDG-PET parameters before and during RT for predicting treatment outcomes in patients with NSCLC. Although LRC differed significantly according to GTVpre, metabolic response also showed some degree of impact on subgroup analysis. Moreover, changes in SUVmax were significantly associated with DF, and this criterion showed diagnostic value for predicting response to RT. Tumor burden, measured by GTV, is important in tumor control models of RT; a given dose induces a log cell kill, so the larger the tumor, the more cells it contains and, therefore, the more radiation is needed for LRC [16]. Given that GTVpre defined on CT was significantly associated with LRR at the RT dose (total dose of 60-63 Gy) used in the present study, it can be assumed that dose escalation is needed to achieve local control in NSCLC [17]. Secondary analysis of the RTOG 9311 study revealed that increasing GTV (>45 cm³) was related to poor OS and progression-free survival [18]. Several other series [19,20] have also suggested that tumor volume is a significant prognostic factor for survival. However, a recent prospective, observational prognostic factor study, TROG 99.05 [21], found that a large primary tumor volume was not associated with poor survival after adjusting for the effects of T and N stage. Instead, large primary tumor volume had an adverse impact on survival only within the first 18 months (comparable to the median follow-up period of the present study). In the present study, changes in GTV had no impact on the treatment outcomes, and metabolic response could help stratify patients: those with a large GTVpre and favorable metabolic response showed an LRC rate comparable to that of patients with a small GTVpre and poor metabolic response. Several series provide evidence for a correlation between SUV and tumor cell proliferation [22]. An early reduction in FDG uptake during treatment can predict tumor response.
In addition, SUVmax represents the enhanced trapping of 18F-FDG in tumor cells, due to biological mechanisms, tumor aggressiveness, and hypoxia [23].
Owing to the heterogeneity of patient populations with NSCLC at an advanced stage, there is no concrete evidence regarding the prognostic value of PETpre. A recent meta-analysis of 13 studies with 1474 patients demonstrated that high SUVmax(pre) in the primary tumor was associated with reduced survival [24]. Another meta-analysis of 36 studies on 5807 patients with surgically treated NSCLC also identified SUVmax(pre) as a prognostic factor for disease-free survival, with an HR of 1.52 (95% CI 1.16-2.00). However, the retrospective study by Hoang et al. [25] with a homogeneous population did not find a correlation between metabolic parameters on PETpre and survival, which is consistent with the findings of the present study.
Discriminating non-responders from responders can help physicians to avoid unnecessary toxicity in patients expected to have a poor prognosis, by early interruption of ineffective therapy. Because changes in FDG uptake were associated with tumor shrinkage, PETinterim can also help physicians decide when to modify the RT plan, with PTV modification or dose escalation. Several series with various sample sizes (10-77 patients) have shown the prognostic value of PETinterim in patients with NSCLC treated with RT [26, 27] and in those with other solid tumors [28,29]. A secondary analysis of the ESPATUE study also revealed that residual SUVmax in the primary tumor after induction chemotherapy was associated with survival and freedom from extracranial progression, consistent with the current study [30]. Furthermore, a recent meta-analysis of 21 studies on 627 patients reported PETinterim as a promising tool for the early assessment of treatment response [12]. However, because most of these studies were retrospective and examined multiple outcomes, there are statistical concerns regarding multiple comparisons and selective reporting of endpoints. More importantly, definite criteria or standard parameters have not yet been determined, and prognostic metrics range from SUVmax [27] and ΔSUVmax [31] to total lesion glycolysis [32] and metabolic tumor volume [14]. In our series, ΔSUVmax was associated with DF and LRR, suggesting that this parameter helps to stratify patients. Metabolic response based on ΔSUVmax was not significantly associated with LRC on univariable analysis, possibly due to the lack of statistical power.
However, SUV as a semiquantitative index has limitations owing to poor reproducibility [24], making it difficult to adopt a common threshold across different centers. In place of the SUV value itself, we calculated a cut-off value for ΔSUVmax (a 40% reduction), which was predictive of both LRR and DF. Criteria based on the relative change in SUVmax can be a tool for predicting early treatment response within the same institution, which, in turn, can minimize the issue of variability and enhance the prognostic value of this metabolic parameter.
Early response appears to be an indicator of tumor biology and a predictor of the likelihood of treatment failure. Thus, the assessment of early response makes it easier to identify poor responders who are eligible for the intensification or modification of treatment, instead of continuation of the initial treatment (the so-called 18F-FDG-PET/CT-guided treatment algorithm). A recent phase II trial showed that adaptive RT with escalated doses guided by PETinterim is feasible and results in favorable LRC [33]. A further ongoing clinical trial (RTOG 1106) is examining adaptive RT with dose escalation for FDG-avid tumors on PETinterim. Another promising area of research that needs further prospective trials is the early switching of systemic chemotherapy in patients with a small decrease in SUVmax. Several ongoing trials in other solid tumors are investigating the role of immune checkpoint blockade stratified by PET parameters (NCT 03829007, NCT 03853187, NCT 02760225).
Our study had several limitations. First, as a retrospective analysis, the results should be interpreted with caution. Although we derived an optimal cut-off value for SUVmax, we used the median value of GTV and the previously reported 25% criterion for ΔGTV to minimize statistical overfitting. An optimal threshold could be derived from further investigation with a larger number of patients, and it should be externally validated. Second, there are inherent biases because this study was carried out in a single institution. However, our analysis was strengthened by the use of consistent modern 18F-FDG-PET/CT, imaging analyses, chemotherapy regimen, and RT techniques. Other limiting factors include possible inflammatory changes caused by irradiation, which may mimic changes in tumor glucose metabolism associated with treatment. We evaluated the PETinterim at 2 weeks after initiation of RT to minimize the overlap of inflammation and residual tumor [12]. In addition, changes in SUV may be overestimated because of the partial-volume effect; tumor shrinkage may lead to underestimation of FDG uptake. Lastly, the lack of a univocal parameter remains a challenge in using metabolic parameters as a universal prognostic or predictive factor. Although FDG uptake is generally used as a parameter to reflect the proportion of viable tumor cells, new tracers that specifically detect apoptosis and proliferation are now available and may provide more accurate prediction of treatment response.

Conclusions
We cautiously conclude that response criteria based on changes in SUVmax during RT could be useful for identifying responders to current treatment among patients with NSCLC. The optimal management of poor responders identified on PETinterim remains to be determined. Furthermore, a prospective study to confirm the efficacy of 18F-FDG-PET/CT-guided algorithms in patients with NSCLC is warranted.