The performance of MR perfusion-weighted imaging for the differentiation of high-grade glioma from primary central nervous system lymphoma: A systematic review and meta-analysis

Distinguishing high-grade glioma (HGG) from primary central nervous system lymphoma (PCNSL) remains a major diagnostic challenge. We conducted a meta-analysis to assess the performance of MR perfusion-weighted imaging (PWI) in differentiating HGG from PCNSL. Heterogeneity and the threshold effect were evaluated, and the pooled sensitivity (SEN), specificity (SPE) and area under the summary receiver operating characteristic (SROC) curve were calculated. Fourteen studies with a total of 598 participants were included. Using the best-performing parameter from each study, PWI showed high accuracy for differentiating HGG from PCNSL (area under the curve (AUC) = 0.9415). The dynamic susceptibility-contrast (DSC) technique might be the optimal technique for distinguishing HGGs from PCNSLs (AUC = 0.9812). Furthermore, among the techniques evaluated, DSC had the highest sensitivity, 0.963 (95% CI: 0.924, 0.986), whereas arterial spin-labeling (ASL) had the highest specificity, 0.896 (95% CI: 0.781, 0.963). However, the variability of the optimal thresholds across the included studies suggests that further evaluation and standardization are needed before these techniques can be widely used in clinical practice.


Introduction
Gliomas are the most common primary malignant brain tumors in adults [1]. Patients with glioma, particularly high-grade glioma (HGG), generally have short survival and poor quality of life. On MR imaging, HGGs more often appear as rim-enhancing lesions, whereas primary central nervous system lymphomas (PCNSLs) more often appear as homogeneously enhancing masses. In many cases, however, the conventional MR appearance of PCNSL mimics that of HGG: both can show rim enhancement with necrosis or present as homogeneously enhancing masses [2][3]. Yet their treatment strategies are completely different. Accurately differentiating HGG from PCNSL is therefore important for selecting the appropriate treatment and minimizing risk to patients [4][5][6].
Given the limitations of conventional MRI in differentiating HGG from PCNSL, a growing number of studies have focused on the physiological and metabolic characteristics of tumors [2,3,7,8]. MR perfusion imaging techniques, including dynamic susceptibility-contrast (DSC), dynamic contrast-enhanced (DCE), intravoxel incoherent motion (IVIM) and arterial spin-labeling (ASL) MRI, can provide information about tumor microvascular physiology. DSC is the most widely used of these techniques; its main application is the quantitative measurement of cerebral blood volume (CBV) in lesions [7]. Compared with DSC, IVIM has the advantage of providing quantitative measurements of both tumor cellularity and vascularity [9]. ASL is an emerging technique that requires no exogenous tracer and involves no radiation exposure, an advantage over other perfusion imaging techniques [10]. DCE, in turn, can characterize the vascular microenvironment, for example vascular permeability [8].
HGG and PCNSL have been reported to differ in their vascular features [7,8,11]. PWI therefore holds promise for separating HGG from PCNSL on the basis of their different patterns of angiogenesis and neovascularity [11][12][13][14]. However, individual studies have used different techniques in heterogeneous patient groups with small numbers of cases, making it difficult to evaluate the performance of PWI systematically. We therefore performed this meta-analysis to assess the diagnostic accuracy of MR perfusion in distinguishing HGG from PCNSL on the basis of the eligible published studies.

Search strategy
We conducted this meta-analysis according to the PRISMA guidelines (S1 Table). A systematic literature search of Embase, PubMed and Chinese Biomedical databases was conducted to identify eligible studies, using a combination of free-text words and MeSH terms as follows: (perfusion/PWI/perfusion weighted imaging/magnetic resonance perfusion/MR perfusion/perfusion image) AND (glioma/brain neoplasm/brain tumor) AND (lymphoma). The search covered the period from database inception to October 1, 2016, and was restricted to articles in English and Chinese. The reference lists of all eligible studies were hand-searched for additional relevant articles.

Data extraction and quality assessment
The final evaluation of the included articles was completed independently by two reviewers (A.W. Shao and W.l. Xu). The following baseline characteristics were extracted: authors, year, country, study design, number of patients, age and gender, pathology, reference standard and technical information (magnetic field strength, PWI technique, parameters and cut-off values).
For the differentiation, HGGs (WHO grades III–IV) were defined as positive and PCNSLs as negative. The numbers of true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN) were calculated for each study. Two authors independently assessed the methodological quality of the studies using the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool [15]. Any discrepancies were resolved by a senior author acting as adjudicator.
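To make the 2×2 convention concrete, the following Python sketch shows how SEN and SPE follow from one study's TP, FP, FN and TN counts when HGG is the positive class. The class name `TwoByTwo` and the counts are hypothetical illustrations, not data from any included study.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """2x2 counts for one study, with HGG as the positive class and PCNSL as negative."""
    tp: int  # HGGs classified as HGG
    fp: int  # PCNSLs classified as HGG
    fn: int  # HGGs classified as PCNSL
    tn: int  # PCNSLs classified as PCNSL

    @property
    def sensitivity(self) -> float:
        # Proportion of HGGs correctly identified: TP / (TP + FN)
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Proportion of PCNSLs correctly identified: TN / (TN + FP)
        return self.tn / (self.tn + self.fp)


# Hypothetical study with 30 HGGs and 12 PCNSLs (illustrative counts only).
study = TwoByTwo(tp=28, fp=2, fn=2, tn=10)
print(round(study.sensitivity, 3), round(study.specificity, 3))  # 0.933 0.833
```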

Statistical analysis
We used standard methods to evaluate the diagnostic accuracy [16][17].
First, we evaluated the threshold effect, one possible source of between-study heterogeneity, by computing the Spearman correlation coefficient between the logit of SEN and the logit of (1 − SPE). A threshold effect was considered present if P < 0.05.
Next, the chi-squared (Cochran's Q) test and the inconsistency index (I²) of the diagnostic odds ratio (DOR) were used to assess heterogeneity among studies. If substantial heterogeneity was present (P < 0.1 or I² > 50%), a random-effects model was used; otherwise, a fixed-effect model was used. Meta-regression analyses were performed to identify sources of heterogeneity [18,19].
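As an illustration of the heterogeneity rule above, the Python sketch below computes Cochran's Q and I² for ln(DOR) using inverse-variance weights, where var(ln DOR) = 1/TP + 1/FP + 1/FN + 1/TN, and applies the P < 0.1 or I² > 50% decision. It is a stdlib-only sketch, not the Meta-DiSc implementation; the helper names are hypothetical, and adding 0.5 whenever any cell is zero is an assumption that covers the SEN or SPE = 100% case.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-squared probability for integer df (closed form, stdlib only)."""
    y = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    total = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def heterogeneity(tables):
    """Cochran's Q, its P value, I^2 and the model choice; tables = [(tp, fp, fn, tn), ...]."""
    ln_dor, weights = [], []
    for tp, fp, fn, tn in tables:
        if 0 in (tp, fp, fn, tn):  # assumed 0.5 continuity correction for zero cells
            tp, fp, fn, tn = (c + 0.5 for c in (tp, fp, fn, tn))
        ln_dor.append(math.log((tp * tn) / (fp * fn)))
        weights.append(1.0 / (1 / tp + 1 / fp + 1 / fn + 1 / tn))  # inverse variance
    pooled = sum(w * d for w, d in zip(weights, ln_dor)) / sum(weights)
    q = sum(w * (d - pooled) ** 2 for w, d in zip(weights, ln_dor))
    df = len(tables) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    p = chi2_sf(q, df)
    model = "random" if (p < 0.1 or i2 > 0.5) else "fixed"
    return q, p, i2, model


# Hypothetical 2x2 tables (illustrative only).
homogeneous = [(28, 2, 2, 10)] * 3
heterogeneous = [(45, 5, 5, 45), (25, 25, 25, 25), (40, 10, 10, 40)]
print(heterogeneity(homogeneous)[3])    # fixed
print(heterogeneity(heterogeneous)[3])  # random
```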
We calculated the pooled sensitivity, specificity, positive and negative likelihood ratios (LR+ and LR−) and DOR, with their 95% confidence intervals (CIs), using the best-performing parameter from each study; the same principle was applied in the subgroup analyses. A value of 0.5 was added to all cells of studies with a SEN or SPE of 100%. We calculated the SROC curve, the AUC and the Q* index (the point on the SROC curve at which SEN equals SPE), which provides a single global summary of diagnostic performance. AUC values of 0.51–0.70, 0.71–0.90 and > 0.90 were taken to indicate low, moderate and high diagnostic accuracy, respectively. A subgroup required a minimum of three studies. These analyses were conducted using Meta-DiSc version 1.4 [17].
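The continuity correction and the Q* index can be sketched in Python as below. The correction follows the rule stated above (0.5 added to every cell when SEN or SPE is 100%, i.e. FN = 0 or FP = 0). For Q*, the sketch assumes a symmetric Moses–Littenberg SROC curve, on which logit(SEN) + logit(SPE) = ln(DOR) everywhere, so SEN = SPE = √DOR / (1 + √DOR); Meta-DiSc fits the SROC from the data, so its Q* need not equal this closed form. Function names and counts are hypothetical.

```python
import math


def corrected_cells(tp, fp, fn, tn):
    """Add 0.5 to every cell when SEN or SPE is 100% (FN == 0 or FP == 0)."""
    if fn == 0 or fp == 0:
        return tp + 0.5, fp + 0.5, fn + 0.5, tn + 0.5
    return float(tp), float(fp), float(fn), float(tn)


def diagnostic_odds_ratio(tp, fp, fn, tn):
    """DOR = (TP * TN) / (FP * FN), after the continuity correction."""
    tp, fp, fn, tn = corrected_cells(tp, fp, fn, tn)
    return (tp * tn) / (fp * fn)


def q_star(dor):
    """Point where SEN == SPE on a symmetric SROC curve (assumed slope b = 0)."""
    return math.sqrt(dor) / (1.0 + math.sqrt(dor))


# Hypothetical study with 100% sensitivity (FN = 0), so the correction applies.
dor = diagnostic_odds_ratio(tp=30, fp=2, fn=0, tn=10)
print(round(dor, 1), round(q_star(dor), 3))
```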
Publication bias was assessed with Deeks' funnel plot asymmetry test, with P < 0.1 indicating significant asymmetry [20]. This analysis was performed in Stata 14.0 (StataCorp LP, College Station, TX).
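Deeks' test regresses ln(DOR) on 1/√ESS, weighted by the effective sample size ESS = 4·n1·n2/(n1 + n2), where n1 and n2 are the numbers of diseased (HGG) and non-diseased (PCNSL) patients; a slope that differs from zero indicates funnel-plot asymmetry. The following stdlib-only Python sketch returns the slope, its standard error and the t statistic on k − 2 degrees of freedom. It is not the Stata routine used in this study: the function names are hypothetical, the 0.5 correction for zero cells is an assumption, and the P value would still have to be read from a t distribution.

```python
import math


def effective_sample_size(n_diseased: float, n_healthy: float) -> float:
    """ESS = 4 * n1 * n2 / (n1 + n2)."""
    return 4.0 * n_diseased * n_healthy / (n_diseased + n_healthy)


def weighted_slope(x, y, w):
    """Weighted least-squares slope, its standard error and t statistic (df = k - 2)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    rss = sum(wi * (yi - intercept - slope * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    se = math.sqrt(rss / (len(x) - 2) / sxx)
    return slope, se, slope / se


def deeks_test(tables):
    """Regress ln(DOR) on 1/sqrt(ESS), weighted by ESS; tables = [(tp, fp, fn, tn), ...]."""
    x, y, w = [], [], []
    for tp, fp, fn, tn in tables:
        ess = effective_sample_size(tp + fn, fp + tn)  # uses the uncorrected counts
        if 0 in (tp, fp, fn, tn):  # assumed 0.5 continuity correction for zero cells
            tp, fp, fn, tn = (c + 0.5 for c in (tp, fp, fn, tn))
        x.append(1.0 / math.sqrt(ess))
        y.append(math.log(tp * tn / (fp * fn)))
        w.append(ess)
    return weighted_slope(x, y, w)


# Hypothetical 2x2 tables (illustrative only).
tables = [(28, 2, 2, 10), (40, 5, 4, 20), (20, 3, 5, 15), (35, 1, 1, 12)]
slope, se, t = deeks_test(tables)
```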

Literature search and study characteristics
A total of 67 articles were screened on the basis of their abstracts and our inclusion/exclusion criteria; 17 of these were potentially eligible for further assessment. After full-text review, 14 studies evaluating patients with high-grade glioma vs primary central nervous system lymphoma (PCNSL) using MR perfusion met the eligibility criteria for the meta-analysis [9,10,21–32]. The study selection flow is shown in Fig 1, and the detailed characteristics of all 14 articles are summarized in Table 1 (further details are provided in S2 Table). As shown in Table 1, eleven studies were retrospective and only three were prospective. The number of participants per study ranged from 29 to 71, and 598 patients had data of adequate quality (according to the data extraction described in 'Materials and Methods'). These 598 patients had a mean age of 55.8 years (range 6–90). The main reference standard in each study was pathological analysis of biopsy and/or resection specimens. Of the 598 patients, 178 had PCNSL and 420 had HGG. Six studies evaluated DSC [21,22,24,25,27,32], five evaluated ASL [10,23,24,28,31], three evaluated DCE [26,29,30] and two evaluated IVIM [9,21]; some studies assessed more than one technique. Regarding magnetic field strength, 12 studies used 3.0 T MRI scanners and only 2 used 1.5 T scanners [24,28].
The quality assessment of each study is shown in Fig 2. Most studies had a low or unclear risk of bias, and overall study quality was acceptable. Fig 3A. The area under the SROC curve was 0.9415 (Fig 4A).
IVIM could not be analyzed as a separate subgroup because only two studies evaluated it, fewer than the minimum of three required for a subgroup analysis.

Heterogeneity analysis
No substantial heterogeneity was found in the DSC or ASL subgroups, but substantial heterogeneity was present in the overall and DCE groups.
The degree of malignancy correlates with both the microvascularity and the neovascularity of tumors: a higher degree of malignancy increases microvascularity and neovascularity and thus tumor blood flow [25][26][27]. Pathologically, HGG tends to be more malignant than PCNSL, so HGG tends to have higher tumor blood flow and denser vascularity. These hemodynamic variables can be measured with the different MR perfusion imaging techniques.
In our results, the AUC for the overall group was 0.9415, indicating high accuracy of PWI for distinguishing HGGs from PCNSLs. The DOR is a single indicator of test performance that combines SEN and SPE into one number [34]. The pooled DOR for the overall group was 55.83, suggesting that MR perfusion might be helpful in distinguishing HGGs from PCNSLs. LR+ and LR− are also used to assess diagnostic accuracy because they are more directly meaningful in clinical practice than the SROC curve and the DOR. An LR+ > 10 or an LR− < 0.1 generally produces a large, often conclusive shift from pre-test to post-test probability and indicates good diagnostic accuracy [35]. The LR+ for the overall group was 5.63, meaning that a positive test was approximately six times more likely in patients with HGG than in patients with PCNSL. The LR− was 0.145, meaning that a negative test (a value of the best parameter below the corresponding cut-off) was about 0.145 times as likely in patients with HGG as in patients with PCNSL; that is, a negative result multiplies the pre-test odds of HGG by 0.145. This is not low enough to exclude HGG. Heterogeneity was present in the overall group but was not caused by a threshold effect. We therefore conducted a meta-regression analysis, which suggested that the heterogeneity might arise from the MR perfusion imaging technique used (P = 0.001). Deeks' funnel plot asymmetry test showed no significant publication bias for the overall group.
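The role of the likelihood ratios can be illustrated with Bayes' theorem in odds form: post-test odds = pre-test odds × LR. The Python sketch below applies the pooled LR+ (5.63) and LR− (0.145) to a pre-test probability that, purely for illustration, is taken as the HGG proportion in the pooled sample (420 of 598); in a real clinical setting the pre-test probability would differ. The function name is hypothetical.

```python
def post_test_probability(pre_test_prob: float, likelihood_ratio: float) -> float:
    """Bayes' theorem in odds form: post-test odds = pre-test odds * LR."""
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Pre-test probability taken, for illustration only, as the HGG proportion
# in the pooled sample (420 of 598 patients).
pre = 420 / 598
after_positive = post_test_probability(pre, 5.63)   # about 0.93
after_negative = post_test_probability(pre, 0.145)  # about 0.25
```

Under this assumed pre-test probability, a negative result still leaves roughly a one-in-four probability of HGG, which is consistent with the conclusion that an LR− of 0.145 is not low enough to exclude HGG.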
For the DSC group, the AUC of 0.9812 indicated high diagnostic accuracy, and the pooled DOR of 204.10 suggested that DSC might be useful in the diagnosis of HGG. There was no evidence of heterogeneity or publication bias among the 6 relevant studies, supporting the robustness of the DSC results.
For the ASL group, the AUC of 0.9421 also indicated high diagnostic accuracy, and the pooled DOR of 47.987 suggested that ASL might also be useful in the diagnosis of HGG. There was no evidence of heterogeneity or publication bias among the 5 relevant studies, supporting the robustness of the ASL results.
For the DCE group, the AUC of 0.9179 also indicated high diagnostic accuracy, and the pooled DOR of 21.247 suggested that DCE might also be useful in the diagnosis of HGG. Heterogeneity was observed for DCE. A meta-regression indicated that study design and MRI field strength might contribute to this heterogeneity, as the three studies were all retrospective and used 3T MRI. No publication bias was observed in the DCE group.
DSC is widely used for assessing intracranial mass lesions [36,37]. In this meta-analysis, DSC showed higher diagnostic accuracy (AUC 0.9812) than ASL (0.9421) and DCE (0.9179) in distinguishing HGGs from PCNSLs. However, DSC measurements can be affected by T2* and T1 effects caused by contrast agent leakage. ASL, which requires no exogenous tracer and involves no radiation exposure, has also shown high accuracy in clinical applications [10,38–41]. Furthermore, several studies have demonstrated the successful application of DCE-MR imaging for the quantitative evaluation of vascular permeability parameters, although its limitations restrict its clinical use [42,43].
The Youden index (sensitivity + specificity − 1) combines sensitivity and specificity at a cut-point into a single summary of a test's discriminatory accuracy [44]. Based on the pooled estimates, the Youden index for differentiating HGGs from PCNSLs was higher for DSC (0.824) than for ASL (0.722) or DCE (0.645). Given this diagnostic performance, DSC might be the optimal technique for distinguishing HGGs from PCNSLs. In addition, DSC had the highest sensitivity (DSC/ASL/DCE: 0.963/0.826/0.884), whereas ASL had the highest specificity (ASL/DSC/DCE: 0.896/0.861/0.761).
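The Youden index values quoted above can be reproduced from the pooled sensitivities and specificities with a few lines of Python; the function name `youden_index` is a hypothetical helper, and the inputs are the pooled estimates reported in this section.

```python
def youden_index(sensitivity: float, specificity: float) -> float:
    """J = sensitivity + specificity - 1 (0 = no discrimination, 1 = perfect)."""
    return sensitivity + specificity - 1.0


# Pooled (sensitivity, specificity) per technique, as reported above.
pooled = {"DSC": (0.963, 0.861), "ASL": (0.826, 0.896), "DCE": (0.884, 0.761)}
for technique, (sen, spe) in pooled.items():
    print(technique, round(youden_index(sen, spe), 3))
# DSC 0.824
# ASL 0.722
# DCE 0.645
```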
However, given the limited data, further subgroup analyses of DSC are needed to identify the optimal parameter and its cut-off value for differentiating HGGs from PCNSLs.

Limitations
Although MR perfusion showed high diagnostic accuracy, our meta-analysis had several limitations.
First, most of the included studies used multiple, different parameters to evaluate MR perfusion; the optimal parameter and threshold therefore remain difficult to identify because the proposed cut-off values varied widely, and the conclusions of each study are potentially valuable only as a general guide. Further evaluation and standardization of the techniques and post-processing methods are needed before these techniques can be widely used in clinical practice. Second, we included patients with WHO grade III glioma, whereas most of the included patients had grade IV tumors; differences in tumor biology and angiogenesis might therefore have affected the results. Third, there was evidence of heterogeneity in the overall and DCE groups. Factors such as field strength, perfusion technique and post-processing method might have contributed to this heterogeneity. Although no heterogeneity was found in the DSC and ASL groups, the studies still differed in age, gender, study design, parameters and MR devices.

Conclusions
This meta-analysis showed that PWI has high accuracy for distinguishing HGGs from PCNSLs. Among the MR perfusion imaging techniques, DSC might be the optimal technique for distinguishing HGGs from PCNSLs. Furthermore, DSC showed the highest sensitivity and ASL the highest specificity. However, the variability of the optimal thresholds across the included studies suggests that further evaluation and standardization are needed before these methods can be widely used in clinical practice.
Supporting information S1