Prognostic value of baseline metabolic tumor volume and total lesion glycolysis in patients with lymphoma: A meta-analysis

Whether baseline metabolic tumor volume (TMTV) and total lesion glycolysis (TLG) measured by FDG-PET/CT affected prognosis of patients with lymphoma was controversial. We searched PubMed, EMBASE and Cochrane to identify studies assessing the effect of baseline TMTV and TLG on the survival of lymphoma patients. Pooled hazard ratios (HR) for overall survival (OS) and progression-free survival (PFS) were calculated, along with 95% confidence intervals (CI). Twenty-seven eligible studies including 2,729 patients were analysed. Patients with high baseline TMTV showed a worse prognosis with an HR of 3.05 (95% CI 2.55–3.64, p<0.00001) for PFS and an HR of 3.07 (95% CI 2.47–3.82, p<0.00001) for OS. Patients with high baseline TLG also showed a worse prognosis with an HR of 3.44 (95% CI 2.37–5.01, p<0.00001) for PFS and an HR of 3.08 (95% CI 1.84–5.16, p<0.00001) for OS. A high baseline TMTV was significantly associated with worse survival in DLBCL patients treated with R-CHOP (OS, pooled HR = 3.52; PFS, pooled HR = 2.93). A high baseline TLG was significantly associated with worse survival in DLBCL patients treated with R-CHOP (OS, pooled HR = 3.06; PFS, pooled HR = 2.93). The negative effect of high baseline TMTV on PFS was demonstrated in HL (pooled HR = 3.89). A high baseline TMTV was significantly associated with worse survival in ENKL patients (OS, pooled HR = 2.24; PFS, pooled HR = 3.25). A high baseline TLG was significantly associated with worse survival in ENKL patients (OS, pooled HR = 2.58; PFS, pooled HR = 2.99). High baseline TMTV or TLG predict significantly worse PFS and OS in patients with lymphoma. Future studies are warranted to explore whether TMTV or TLG could be integrated into various prognostic models for clinical decision making.


Introduction
Lymphoma continues to be the most common form of hematological malignancy worldwide [1,2]. Lymphoma is a heterogeneous group of biologically and clinically distinct neoplasms and has historically been divided into two distinct categories: non-Hodgkin lymphoma (NHL) and Hodgkin lymphoma (HL) [3]. Although major progress has been made in the treatment of patients with lymphoma, many still fail to achieve a response or subsequently relapse [4]. These patients are not easily identified by existing pretreatment prognostic indexes such as the IPI (international prognostic index), IPS (international prognostic score for Hodgkin lymphoma [HL]), FLIPI (prognostic score for follicular lymphoma), MIPI (prognostic score for mantle cell lymphoma), and PIT (prognostic index for peripheral T cell lymphoma) or by conventional computed tomography (CT)-based response assessment [5][6][7][8][9]. Therefore, there is an urgent need for new prognostic and predictive markers that permit accurate and early identification of high-risk patients. 18F-fluorodeoxyglucose positron emission tomography/computed tomography (FDG-PET/CT) has been recognized by the 2014 International Conference on Malignant Lymphoma imaging consensus guidelines as the standard imaging modality to evaluate glucose metabolism in fluorodeoxyglucose (FDG)-avid lymphomas [10,11]. Its prognostic value at interim and end of treatment has recently been investigated [12][13][14].
Many studies have also shown that quantitative volumetric parameters derived from baseline 18F-FDG PET, such as total metabolic tumor volume (TMTV) or total lesion glycolysis (TLG), could predict outcome in diffuse large B-cell lymphoma [15][16][17], follicular lymphoma [18], peripheral T-cell lymphoma [19], extranodal natural killer/T-cell lymphoma [20] and Hodgkin lymphoma [21,22]. However, studies evaluating the prognostic value of pretherapy TMTV and TLG in patients with various lymphoma subtypes have shown inconclusive and contradictory results [23].
Therefore, the purpose of this meta-analysis was to evaluate the prognostic value of baseline TMTV or TLG measured by PET/CT in patients with lymphoma, in order to provide more evidence of their clinical value as prognostic biomarkers.

Inclusion and exclusion criteria
A computerized search of PubMed, Embase and Cochrane was conducted to find relevant studies published prior to May 01, 2018. The following search terms were used: ("lymphoma"[MeSH Terms] OR lymphom* [All Fields] OR lymphoproliferative [All Fields] OR hodgkin* [All Fields] OR non-hodgkin* [All Fields]) AND ("Tomography, emission-computed"[MeSH Terms] OR ("positron emission tomograpy"[MeSH Terms]) OR (computed [All Fields] AND tomograph* [All Fields])) AND (prognos* OR predict* OR surviv* OR overall survival* OR recurrence* OR progress*). All searches were limited to human studies.
Eligible studies met the following criteria: (i) they were observational studies (retrospective or prospective) or clinical trials; (ii) the studies were limited to lymphoma; (iii) 18F-FDG PET was used as an initial imaging tool; (iv) patients had not undergone chemotherapy, immunochemotherapy or radiotherapy before the 18F-FDG PET scan; (v) the volume of the lymphoma was measured; and (vi) survival data were reported. Studies were excluded if: (i) they were case reports, case series, review articles, editorials, letters or comments; (ii) the patient survival data were unavailable or insufficient to perform the meta-analysis; (iii) the data were specifically for HIV-associated lymphoma, pediatric lymphoma, primary central nervous system lymphoma, primary testicular lymphoma, or primary mediastinal large B-cell lymphoma; or (iv) they included overlapping patients and data. Two reviewers (B.P. Guo and Q. Ke) independently selected the literature using a standardized protocol. Disagreements were resolved by discussion.

Quality assessment
The methodological quality of the primary manuscripts was independently evaluated by two reviewers (B.P. Guo and X.H. Tan) by means of the Newcastle-Ottawa Scale (NOS) [24], which is used for the quality assessment of cohort and case-control studies. The NOS comprises three quality parameters: selection (0-4 points), comparability (0-2 points), and outcome assessment (0-3 points). Studies with scores of six or more were considered to be of high quality. Any disparities between investigators were resolved by discussion. Study quality was not an exclusion criterion.

Data extraction
Data extraction was carried out by B.P. Guo and independently confirmed by the other authors (X.H. Tan and H. Cen). The following study characteristics were collected: first author, year of publication, country, study design, imaging modalities, type of lymphoma, number of patients, treatment, tumor volume parameters (maximum threshold for PET volume auto-segmentation, and median MTV/TLG), MTV/TLG cut-off values, determination of MTV cut-off, median follow-up, and endpoints. Extracted data were entered into a standardized Excel file (Microsoft Corporation). Discrepancies were resolved by discussion with coauthors.
We chose OS and PFS as endpoints for our meta-analysis. Overall survival is defined as the length of time from either the date of diagnosis or the date of recruitment in a study to the moment of death as a result of any cause. PFS is defined as the length of time from either the date of diagnosis or the date of recruitment in a study until lymphoma progression or death as a result of any cause.

Statistical analysis
The impact of MTV or TLG on survival was measured by estimating the effect size as a hazard ratio (HR). A pooled HR of more than 1.00 indicated poorer survival in the group with high MTV or TLG values than in the group with low values. For studies in which the HRs and CIs were not available, we used the method proposed by Parmar et al. [25] to derive estimates from survival curves. The point estimate of the HR was considered statistically significant at the p < 0.05 level if the 95% CI did not include the value 1.
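Pooling requires each study's log hazard ratio and its standard error. When a study reports an HR with a 95% CI, one common back-calculation can be sketched as follows. This is a minimal illustration that assumes a symmetric Wald interval on the log scale; it is not the curve-based method of Parmar et al., and the function names are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_hr_and_se(hr, ci_low, ci_high):
    """Return (log HR, standard error), assuming a CI symmetric on the log scale."""
    log_hr = math.log(hr)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    return log_hr, se


def is_significant(ci_low, ci_high):
    """An HR is significant at the 0.05 level when its 95% CI excludes 1."""
    return ci_low > 1.0 or ci_high < 1.0


# Pooled TMTV result for PFS quoted in the abstract: HR 3.05 (95% CI 2.55-3.64)
log_hr, se = log_hr_and_se(3.05, 2.55, 3.64)
print(round(log_hr, 3), round(se, 3), is_significant(2.55, 3.64))
```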
Heterogeneity was assessed by means of the Cochran Q and I² statistics. I² values of <30%, 30%-50%, 50%-75% and >75% were defined as low, moderate, substantial and considerable heterogeneity, respectively [26]. If heterogeneity existed between primary studies, a random effects model was used. Otherwise, a fixed effects model was used for the meta-analysis. Publication bias was assessed by visual inspection of the funnel plot and also by means of the Begg and Egger tests [27,28]. A p-value of less than 0.05 indicated the existence of publication bias. All statistical analyses were performed by using RevMan 5.3 (Nordic Cochrane Centre).
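The heterogeneity statistics and the two pooling models can be illustrated with a short sketch. It assumes inverse-variance weights and the DerSimonian-Laird estimator for the between-study variance; the input values are hypothetical, the assignment of boundary values (exactly 30%, 50% or 75%) to a category is an assumption, and the code is not a reproduction of the RevMan analysis.

```python
import math


def pool_log_hrs(log_hrs, ses):
    """Inverse-variance pooling with Cochran Q, I^2 and a DerSimonian-Laird estimate."""
    w = [1.0 / se ** 2 for se in ses]
    fixed = sum(wi * y for wi, y in zip(w, log_hrs)) / sum(w)
    # Cochran Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_hrs))
    df = len(log_hrs) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # DerSimonian-Laird between-study variance (tau^2), truncated at zero
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    random = sum(wi * y for wi, y in zip(w_re, log_hrs)) / sum(w_re)
    return {
        "q": q,
        "i2": i2,
        "tau2": tau2,
        "hr_fixed": math.exp(fixed),
        "hr_random": math.exp(random),
        "se_random": math.sqrt(1.0 / sum(w_re)),
    }


def classify_i2(i2):
    """Map I^2 (in percent) to the categories used in the text."""
    if i2 < 30:
        return "low"
    if i2 <= 50:
        return "moderate"
    if i2 <= 75:
        return "substantial"
    return "considerable"


# Hypothetical study-level HRs and standard errors of log HR (not from the paper)
hrs = [2.0, 3.5, 2.8, 4.1]
ses = [0.30, 0.25, 0.40, 0.35]
result = pool_log_hrs([math.log(h) for h in hrs], ses)
print(result, classify_i2(result["i2"]))
```

When the studies agree exactly, Q and the between-study variance are zero and the two models return the same pooled HR; as heterogeneity grows, the random effects weights become more equal across studies and the confidence interval widens.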

Study characteristics
The 27 observational studies that fulfilled the inclusion criteria were published between 2012 and 2018 and are summarized in Table 1. Twenty-five studies were retrospective observational studies, and three studies were prospective multicenter trials. Seventeen studies included patients with diffuse large B cell lymphoma, three included patients with follicular lymphoma, one included patients with peripheral T-cell lymphoma, four studies included patients with extranodal natural killer/T cell lymphoma, and three studies included patients with Hodgkin lymphoma. The total sample size was 2729. Either TMTV or TLG was measured in 12 studies, and both were measured in 15 studies. Three thresholding methods were used for the auto-segmentation of PET volumes. A fixed SUV of 2.5 was used in 7 studies, a percentage of the SUVmax (40%, 41%, 42% or 50%) was used in 18 studies, and in 2 studies the MTV was measured by setting the tumor margin threshold as the liver SUVmean plus 3 SDs. In each study, patients were divided into 2 groups (high and low volume) based on the cut-off values. The MTV and TLG cut-off values were determined by means of receiver operating characteristic (ROC) curve and X-tile analyses: ROC analysis was used in 19 studies, both ROC and X-tile analyses in 3 studies, and X-tile analysis alone in 1 study; 4 studies did not provide cut-off information. The MTV cut-off values ranged between 10.7 and 595 cm³ and the TLG cut-off values ranged from 46.4 to 5356. The study quality assessed by means of the NOS was fair, with a median quality score of 8 (range 5-9).
Next, we examined the prognostic significance of TLG in different types of lymphoma. A meta-analysis of seven and eight studies involving DLBCL patients showed poorer PFS and OS in those with high TLG values than in those with low TLG values, with pooled HRs for OS and PFS of 3.06 (95% CI, 1.52-6.18, p = 0.002; heterogeneity: I² = 67.3%, p = 0.003) and 2.93 (Table 3). Only one study provided relevant data on the correlation between TLG and clinical outcome in PTCL patients and only one study provided data on the correlation between TLG and outcome in HL patients; therefore, the pooled analysis could not be performed.
We also conducted subgroup analyses stratified by data collection method, sample size, and different threshold values. The subgroup analysis of retrospectively collected data showed pooled HRs for OS and PFS of 2.28 (95% CI, 1.40-3.71, p = 0.001; heterogeneity: I² = 50.1%, p = 0.029) and 2.97 (95% CI, 2.03-4.35, p<0.001; heterogeneity: I² = 35.7%, p = 0.122), respectively. Only one study involved prospectively collected data, so the pooled analysis could not be performed (Table 3). Only one study used a threshold of ≥2.5, so the pooled analysis could not be performed.

Publication bias
Inspection of the funnel plot and formal statistical tests (TMTV: Egger test, p = 0.931; Begg test, p = 0.867; TLG: Egger test, p = 0.200; Begg test, p = 0.236; Fig 4) showed no evidence of publication bias in the meta-analysis of the prognostic significance of baseline metabolic tumor volume and total lesion glycolysis in adult lymphoma.

Main findings
This meta-analysis comprehensively and systematically reviewed the currently available literature and found that: (1) A high baseline TMTV significantly predicted poor OS and shorter PFS in adult lymphoma patients (p<0.00001 and p<0.00001, respectively); (2) A high baseline TMTV was significantly associated with reduced survival in DLBCL patients treated with R-CHOP and predicted poor OS and PFS for different types of lymphomas, such as FL, ENKL and HL. The evidence supporting this association was consistent in most subgroup analyses (retrospective data collection, ethnicity, sample size, and different thresholds). The analysis of prospectively collected data and studies using a 40% threshold suggested a trend towards poor OS; however, these results were not statistically significant; (3) A high baseline TLG significantly predicted poor OS and shorter PFS in adult lymphoma patients (p<0.00001 and p = 0.005, respectively); (4) A high baseline TLG was significantly associated with reduced survival in DLBCL patients treated with R-CHOP and predicted poor OS and PFS in different types of lymphomas, such as FL, ENKL and HL. The evidence of this association was consistent in most subgroup analyses (data collection method, ethnicity, sample size, and different thresholds).
Metabolic tumor volumes can be segmented by using various methods, such as a fixed SUV threshold, a percentage threshold (based on the percentage of maximum uptake in the lesion), a threshold adjusted to the tumor-to-background ratio, or a gradient [23]. Reproducibility is the key for reliable volumetric tumor segmentation. Different TMTV measurement methods have been used in various types of lymphoma, each with specific advantages and disadvantages. A method based on the 41% SUVmax threshold is recommended by the European Association of Nuclear Medicine (EANM) for TMTV measurement of solid tumors. It has been applied in patients with HL and DLBCL, showing good reproducibility [48].
Different thresholding methods were used for PET volume auto-segmentation in the studies included herein; however, a threshold of 41% or 40% of the SUVmax was widely used. In our study, we conducted subgroup analyses stratified by different thresholds (≥2.5, 41%, and others). The results demonstrated that high TMTV or TLG values were associated with shorter PFS and OS. Subgroup stratification based on a threshold of 40% of the SUVmax showed that a high TMTV was a negative predictor of PFS; however, it did not significantly predict poor PFS and OS in the case of TLG.
Baseline MTV measured by PET/CT is a promising prognostic indicator in patients with lymphoma and performs better than size-defined bulk [16,33]. TLG, which is the MTV multiplied by the mean SUV in the volume, is also prognostic [37], but appears to be no better than MTV in predicting survival in lymphoma [16].
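The relationship between the segmentation threshold, MTV and TLG can be made concrete with a small sketch. It assumes that the SUV values of the voxels in each lesion volume of interest are already available as a plain list and that all voxels share one volume; the function names, the default 41% fraction and the example numbers are illustrative, not taken from any included study.

```python
def mtv_and_tlg(suv_values, voxel_volume_ml, fraction_of_max=0.41):
    """Segment one lesion VOI at a fraction of its SUVmax; return (MTV in mL, TLG)."""
    if not suv_values:
        return 0.0, 0.0
    threshold = fraction_of_max * max(suv_values)
    inside = [v for v in suv_values if v >= threshold]
    mtv = len(inside) * voxel_volume_ml
    suv_mean = sum(inside) / len(inside)
    # TLG is the metabolic volume multiplied by the mean SUV inside that volume
    return mtv, mtv * suv_mean


def total_mtv_and_tlg(lesions, voxel_volume_ml, fraction_of_max=0.41):
    """Sum MTV and TLG over all lesions to obtain TMTV and whole-body TLG."""
    per_lesion = [mtv_and_tlg(l, voxel_volume_ml, fraction_of_max) for l in lesions]
    return sum(m for m, _ in per_lesion), sum(t for _, t in per_lesion)


# Hypothetical lesion: five voxels of 0.5 mL each
mtv, tlg = mtv_and_tlg([10.0, 8.0, 5.0, 4.0, 2.0], voxel_volume_ml=0.5)
print(mtv, tlg)
```

Because the threshold scales with the SUVmax of each lesion, a lesion with low peak uptake admits voxels with low absolute SUV, which is one reason why relative thresholds can overestimate the volume of faint lesions.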
Several retrospective studies have shown that metabolic tumor volume (MTV) is a strong predictor of prognosis irrespective of the method [19,21,49]. However, cut-offs used to divide patients into high and low risk groups by MTV are highly dependent on the patient population and the method used. A fixed 41% SUVmax relative thresholding method has been applied successfully in different subtypes of lymphoma, but probably overestimated the volume of lesions with low SUVmax, particularly for smaller VOIs [19,21,48]. The fixed SUV 2.5 method could include the volume of nontumor regions located between small distant nodes with high uptake [50]. The 2.5 method probably overestimated MTV in approximately 12% of patients who had low FDG uptake in the liver or liver involvement by lymphoma [49]. Furthermore, the negative and positive predictive values of the 41% method have been shown to be superior to those of other methods, which results in excellent outcome prediction in other subtypes of lymphoma [21]. Generally, current evidence shows that metabolic tumor volume values are significantly influenced by the choice of the method used for determination of volume. However, no significant differences were found in terms of prognosis [21]. In clinical practice, a consensus on the most accurate method or an optimal cut-off to define the MTV for specific lymphoma subtypes will be required, and this will need validation in multicenter prospective trials.
Several methods for auto-segmentation of PET volumes exist (e.g., threshold-based, gradient-based, statistical, and texture-based methods) [51]. All methods have strengths and limitations. Reproducibility is the key for tumor segmentation in routine practice [23]. There is no universally accepted reproducible and practical method for tumor segmentation. Recently, Yu et al. reported a new semi-automatic approach that first applies an anatomical multi-atlas segmentation to the CT images to remove the organs with high physiological uptake on PET images. Using a conditional random field (CRF) model, the rate of good detection of lymphoma was 100% in 11 patients [52]. This new semi-automatic approach also had the best Dice index for the real lymphoma regions. However, this new methodology will require prospective validation in sufficiently large patient cohorts.
Among the included studies, two main approaches were used, alone or in combination, to define the optimal TMTV cut-off value as a predictor of survival: X-tile analysis and receiver operating characteristic (ROC) curve analysis. X-tile analysis is the primary approach for reliable cut-point determination. This method creates separate training and validation data sets, improving the robustness of the analysis [53]. In the studies included in this meta-analysis, ROC analysis was widely used. This method defines the optimal cut-off point as the value whose sensitivity and specificity are closest to the value of the area under the ROC curve, and for which the absolute value of the difference between the sensitivity and specificity values is minimal. This method is recommended for finding the true cut-off point [54]. Meignan et al. [18] used another approach, restricted cubic splines, to define the optimal TMTV cut-off point. Splines are used to model the relationship between TMTV as a continuous variable and survival time, but their contribution to optimal cut-off point definition is minimal. Subgroup analyses based on different MTV cut-off values demonstrated that patients with a high TMTV had shorter PFS and OS than those with a low TMTV.
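The ROC-based criterion described above can be sketched in a few lines. The sketch treats the outcome as a binary event status and ignores follow-up time, which is a simplification; it scores each candidate cut-off by how far sensitivity and specificity lie from the area under the curve and breaks ties by the absolute difference between them. The function name and the data are hypothetical, and the code is an interpretation of the criterion rather than the procedure used in any included study.

```python
def roc_cutoff(values, events):
    """values: marker (e.g. TMTV) per patient; events: 1 if the event occurred, else 0."""
    pos = [v for v, e in zip(values, events) if e == 1]
    neg = [v for v, e in zip(values, events) if e == 0]
    # AUC as the probability that an event patient has a higher value (ties count half)
    auc = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg) / (len(pos) * len(neg))
    best = None
    for c in sorted(set(values)):
        sens = sum(p >= c for p in pos) / len(pos)  # "high" means value >= cut-off
        spec = sum(n < c for n in neg) / len(neg)
        key = (abs(sens - auc) + abs(spec - auc), abs(sens - spec))
        if best is None or key < best[0]:
            best = (key, c, sens, spec)
    return best[1], best[2], best[3], auc


# Hypothetical, perfectly separated cohort: events occur only at high volumes
cutoff, sens, spec, auc = roc_cutoff([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1])
print(cutoff, sens, spec, auc)
```

A cut-off chosen this way is optimized on the same data it is later evaluated on, which is one reason such values vary widely between cohorts and need external validation.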
A major strength of this meta-analysis is that it complied with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [55]. In addition, we extracted the maximum information from the included studies by a thorough qualitative review and quantitative meta-analysis.
Our study also has some limitations. Firstly, nearly all of the included studies were retrospective, which may result in confounding and detection bias. Secondly, patients with different types of lymphoma were treated with different therapeutic regimens. Thirdly, PET scans were performed using scanners of different generations, which may affect the calculation of the SUV and therefore of TMTV and TLG as well. Similarly, the FDG uptake times were difficult to standardize. Based on all of the above, the clinical heterogeneity of the included studies could be an issue. Finally, our meta-analysis was based on data from published trials, and we did not obtain individual patient data.

Conclusions
Our meta-analysis suggests that a high baseline metabolic tumor volume or total lesion glycolysis measured by FDG-PET/CT predicts significantly worse overall survival and progression-free survival in patients with lymphoma. Therefore, TMTV and TLG may serve as new prognostic biomarkers. In view of our findings, future clinical trials in patients with different types of lymphoma are warranted to determine whether these parameters can be integrated into various prognostic models, with the goal of achieving better risk stratification and treatment selection.