Diagnostic utility of whole body Dixon MRI in multiple myeloma: A multi-reader study

Objective To determine which of four Dixon image types [in-phase (IP), out-of-phase (OP), fat-only (FO) and water-only (WO)] is most sensitive for detecting multiple myeloma (MM) focal lesions on whole body MRI (WB-MRI) images. Methods Thirty patients with clinically suspected MM underwent WB-MRI at 3 Tesla. Unenhanced IP, OP, FO and WO Dixon images were generated and read by four radiologists. On each image type, each radiologist identified and labelled all visible myeloma lesions in the bony pelvis. Each identified lesion was compared with a reference standard consisting of pre- and post-contrast Dixon and diffusion-weighted imaging (read by a further consultant radiologist) to determine whether the lesion was truly positive. Lesion count, true positives, sensitivity, and positive predictive value were compared across the four Dixon image types. Results Lesion count, true positives, sensitivity and confidence scores were all significantly higher on FO images than on IP images (p<0.05). Discussion FO images are more sensitive than other Dixon image types for MM focal lesions, and should be preferentially read by radiologists to improve diagnostic accuracy and reporting efficiency.


Introduction
In recent years, whole body MRI (WB-MRI) has emerged as a valuable tool for assessing disease activity in multiple myeloma (MM). [1-5] MRI is a key component of the Durie-Salmon PLUS staging system [6], and the number of lesions identified on MRI correlates closely with mortality. [7] As a result, WB-MRI is developing into a first-line imaging modality in MM. [8,9] The two major obstacles to widespread use of WB-MRI are cost and long scan times. It is therefore important to maximise diagnostic value while minimising acquisition time, particularly for MM patients who may be frail and in pain. To make best use of the available scan time, WB-MRI protocols typically include both anatomical imaging (for assessment of morphology, fractures and spinal cord compression [10,11]) and functional imaging (for assessing cellularity and perfusion [5,11-13]). However, imaging protocols vary substantially between centres: anatomical imaging may use T1-weighted or T2-weighted images (or a combination), and may implement spin echo- or gradient echo-based sequences. [14] When choosing sequences, considerations include image quality, acquisition time, the cost of data acquisition and storage, and interpretation time.
Recently, gradient echo-based Dixon MRI has been used for anatomical WB-MRI in MM [5,14,15], and has several advantages over conventional T1- or T2-weighted imaging. Dixon MRI enables the generation of four separate image types: in-phase (IP), out-of-phase (OP), water-only (WO) and fat-only (FO). Acquisition times are similar to those for conventional gradient echo imaging and shorter than for spin echo imaging. [15,16] When reporting, the IP images can be viewed in a similar fashion to conventional T1-weighted images, whilst water and fat can be separately evaluated on WO and FO images.
However, it is uncertain whether the 'additional' images (OP, WO and FO) offer any diagnostic information beyond IP imaging alone, and if so, which image type is optimal for reading. The best approach to reporting WB-MRI is therefore unclear. Clarifying this issue could improve the accuracy of disease staging and also increase reporting efficiency, since radiologists could begin their read by reviewing the most diagnostically useful images. Furthermore, reconstructing and storing the additional images would only be justified if they provided additional diagnostic information.
In this study, we aimed to evaluate radiologists' diagnostic accuracy for detecting focal lesions on each of the four Dixon image types, using post-contrast and diffusion images as a reference standard. We hypothesised that sensitivity would be improved by using FO and WO images compared to IP images.

Materials and methods

Subjects
This prospective study was performed with institutional review board approval (Research Ethics Committee reference 12/LO/0428). All patients gave written informed consent.
Thirty patients (13 males and 17 females, median age 55, age range 36-82) with clinically suspected symptomatic multiple myeloma were prospectively enrolled between June 2012 and September 2014. Patients were excluded if they had a history of previous malignancy or previous chemotherapy/radiotherapy, had an estimated GFR < 50 mL/min/1.73 m², were unable to give informed consent or had a contraindication to MRI scanning. Further assessment showed that 26 out of 30 had MM, one had smoldering MM, two had a solitary plasmacytoma, and one had monoclonal gammopathy of uncertain significance. For each patient, clinical and biochemical parameters were recorded as shown in Table 1. Baseline interphase fluorescence in situ hybridisation (FISH) was performed on CD138-selected plasma cells from bone marrow samples, using probes for IGH translocations t(4;14), t(11;14) and t(14;16), del(17p), del(13) and 1p-/1q+ [17]. Genetic risk was determined according to International Myeloma Working Group recommendations [18].

Acquisition
All subjects underwent WB-MRI on a 3.0T wide-bore system (Ingenia; Philips Healthcare, Best, Netherlands) using two anterior surface coils, a head coil and an integrated posterior coil. The WB-MRI protocol included coronal pre- and post-contrast modified Dixon (Dixon) acquisitions, from which fat and water images and calculated in-phase and out-of-phase images were reconstructed on the scanner using a two-point method [19] (TR 3.0ms, TE 1.02-18, flip angle 15˚, slice thickness 5mm, pixel bandwidth 1992Hz, acquisition matrix 196 x 238, SENSE factor 2, number of slices 120), in addition to diffusion and post-contrast imaging covering vertex to toe using ten contiguous anatomical stations (Table 2). The coronal images were 'stitched' together and presented as a head-to-toe whole body image to the reader; the images were then magnified according to the reader's preference for specific analysis of the pelvis.

Image assessment
The individual sets of pre-contrast Dixon images were randomised and read by four consultant radiologists, each with between five and fifteen years of specialist expertise in oncological MR imaging. All readers were blinded to clinical data and diagnosis. On each image set, each radiologist was asked to count the number of myeloma lesions present in the bony pelvis (pubis, ischium, ilium and sacrum) and to label these lesions on the images (up to a maximum of 20). If the disease was diffuse or there were more than 20 lesions, the patient was assigned a lesion count of 20. Additionally, the radiologists were asked to provide a confidence score reflecting their degree of certainty that there were myeloma lesions in the pelvis, using a 4-point Likert scale (1: no lesions, 2: indeterminate lesions, 3: likely myeloma lesions, 4: very likely myeloma lesions).
After scoring, each labelled lesion was compared to a reference standard consisting of diffusion-weighted, pre- and post-contrast Dixon imaging, which had been evaluated by a further consultant radiologist with over 20 years of experience in myeloma and MR imaging. On the reference imaging, all lesions demonstrating abnormal marrow signal compared to background marrow (i.e. hypointense on IP and FO images, and hyperintense on WO images) and which showed contrast enhancement or restricted diffusion were assigned as myeloma lesions and labelled on the images. For the reference standard, no maximum lesion count was used (i.e. all lesions were labelled) to ensure that all lesions on the IP, OP, FO and WO Dixon images could be compared directly to a reference lesion. Using the reference standard imaging, we also recorded whether patients had focal or diffuse disease (the diffuse category included patients with focal-on-diffuse infiltration). For each Dixon image set, we compared each lesion with the reference standard to determine the number of per-set true positive lesions (TP), false positive lesions (FP; those incorrectly identified as lesions) and false negative lesions (FN; reference-standard lesions that were not identified). For each Dixon image type (30 sets per type), we determined the mean per-set lesion count, sensitivity (TP/(TP + FN)), positive predictive value (TP/(TP + FP)) and mean confidence score.
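The per-set metrics described above can be illustrated with a minimal Python sketch. The class and function names, the example counts and the handling of empty denominators (returning None) are illustrative assumptions and are not taken from the study's own analysis.

```python
from dataclasses import dataclass
from typing import Optional

MAX_LESION_COUNT = 20  # cap on reader lesion counts, as described in the text


@dataclass
class ImageSetResult:
    """One reader's labels on one image set, compared with the reference standard."""
    tp: int  # labelled lesions confirmed by the reference standard
    fp: int  # labelled lesions with no matching reference lesion
    fn: int  # reference-standard lesions the reader did not label


def capped_lesion_count(n_labelled: int, diffuse: bool = False) -> int:
    """Diffuse disease or more than 20 lesions is scored as a count of 20."""
    return MAX_LESION_COUNT if diffuse else min(n_labelled, MAX_LESION_COUNT)


def sensitivity(r: ImageSetResult) -> Optional[float]:
    """TP / (TP + FN). Returning None for an empty denominator is an assumption."""
    denom = r.tp + r.fn
    return r.tp / denom if denom else None


def ppv(r: ImageSetResult) -> Optional[float]:
    """TP / (TP + FP). Returning None for an empty denominator is an assumption."""
    denom = r.tp + r.fp
    return r.tp / denom if denom else None


# Hypothetical counts for a single image set (not study data).
example = ImageSetResult(tp=6, fp=1, fn=9)
print(sensitivity(example), ppv(example))
```

Averaging these per-set values over the 30 patients and four readers would give per-image-type means of the kind reported in the Results.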

Design and statistics
A summary of the study design is given in Fig 1. To account for clustering within the data, for each lesion detection metric (lesion count, sensitivity, positive predictive value and mean confidence score), values were compared across the four Dixon image types using a multilevel mixed-effects linear regression model, performed using Stata [Stata IC Version 14.1, College Station, USA]. Image type (i.e. IP, OP, FO or WO) was used as the predictor variable, and the value of the specific lesion detection metric being analysed (i.e. lesion count, sensitivity, positive predictive value or mean confidence score) was used as the outcome variable. Data were clustered at the level of 'subject' (patient) and 'observer' (radiologist). This analysis was repeated for the subgroup of patients who had diffuse disease (as determined by the reference standard assessment), and for the subgroup of patients with focal disease.
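One way to write down a model of this kind is sketched below. It is an illustrative assumption, not the authors' exact specification: the text does not state whether the patient and radiologist effects were crossed or nested, and the symbols are introduced here only for illustration. IP is taken as the baseline image type.

```latex
% Outcome y_{ijt}: lesion detection metric for patient i, radiologist j, image type t.
% x_{OP}, x_{FO}, x_{WO}: indicator variables for image type (IP is the baseline).
\begin{equation}
  y_{ijt} = \beta_0 + \beta_{\mathrm{OP}}\, x_{\mathrm{OP}}
          + \beta_{\mathrm{FO}}\, x_{\mathrm{FO}}
          + \beta_{\mathrm{WO}}\, x_{\mathrm{WO}}
          + u_i + v_j + \varepsilon_{ijt},
\end{equation}
\begin{equation}
  u_i \sim \mathcal{N}(0, \sigma_u^2), \qquad
  v_j \sim \mathcal{N}(0, \sigma_v^2), \qquad
  \varepsilon_{ijt} \sim \mathcal{N}(0, \sigma^2).
\end{equation}
```

Under this reading, each coefficient such as \beta_{\mathrm{FO}} is the estimated difference in the metric between that image type and IP, and the random intercepts u_i and v_j absorb the clustering by patient and by radiologist.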

Percent contrast and contrast-to-noise ratio
Percent contrast and contrast-to-noise ratio (CNR) were calculated using a previously described method [15]. Specifically, in patients with at least three focal lesions greater than 3mm in diameter, circular regions of interest (ROIs) were placed on the three largest focal myeloma lesions, and three further ROIs were placed in areas of bone marrow without focal lesions in the sacrum and iliac bones. Percent contrast was calculated from S_a, the mean signal intensity of myeloma lesions, and S_b, the background marrow signal intensity.
Similarly, CNR was calculated from S_a and S_b together with S_asd and S_bsd, the mean within-ROI standard deviation values for myeloma lesions and background marrow respectively. A one-way analysis of variance (ANOVA) with a post-hoc Tukey-Kramer multiple comparison test was used to compare percent contrast and CNR between image series.
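The expressions for percent contrast and CNR are not reproduced in the text. The Python sketch below therefore uses commonly used forms as an assumption; they are not necessarily those of the cited method [15]. The function names, the use of absolute differences and the ROI values are hypothetical.

```python
import math
from statistics import mean


def percent_contrast(s_a: float, s_b: float) -> float:
    """Assumed form: 100 * |S_a - S_b| / S_b (lesion vs. background marrow)."""
    return 100.0 * abs(s_a - s_b) / s_b


def contrast_to_noise(s_a: float, s_b: float, s_asd: float, s_bsd: float) -> float:
    """Assumed form: |S_a - S_b| / sqrt(S_asd^2 + S_bsd^2)."""
    return abs(s_a - s_b) / math.sqrt(s_asd ** 2 + s_bsd ** 2)


# Hypothetical (mean signal, within-ROI SD) pairs for three lesion ROIs and
# three background-marrow ROIs on one image type; these are not study data.
lesion_rois = [(50.0, 28.0), (55.0, 30.0), (45.0, 32.0)]
marrow_rois = [(210.0, 38.0), (190.0, 40.0), (200.0, 42.0)]

s_a = mean(m for m, _ in lesion_rois)     # mean lesion signal
s_b = mean(m for m, _ in marrow_rois)     # mean background marrow signal
s_asd = mean(sd for _, sd in lesion_rois)  # mean within-ROI SD, lesions
s_bsd = mean(sd for _, sd in marrow_rois)  # mean within-ROI SD, marrow

print(percent_contrast(s_a, s_b), contrast_to_noise(s_a, s_b, s_asd, s_bsd))
```

Taking absolute differences makes both quantities positive whether lesions are hypointense (IP, FO) or hyperintense (WO) relative to marrow; a signed definition would be an equally plausible reading.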

Results
Four radiologists read four image series for each of 30 patients (120 image series per radiologist), and identified 610, 955, 549 and 734 lesions respectively compared to 1560 reference lesions. An example of a focal lesion, as shown on the four Dixon image types, is given in Fig 2. A summary of the mean lesion count, true positives, sensitivity, positive predictive value and confidence score for each of the four image types is given in Table 3; these values are also shown graphically in Fig 3. The results of the regression analysis including confidence intervals are also provided in Table 3.

Lesion count and true positives
The mean lesion counts for each image type (averaged over all patients and all four radiologists) were 5.2 for IP, 5.8 for OP, 7.0 for FO and 5.7 for WO. Significantly more lesions were identified on the FO images than on the IP images (p = 0.006), but there was no significant difference between OP and IP images (p = 0.364) or WO and IP images (p = 0.504).
Of the identified lesions, the mean number of true positives was 4.9 for IP, 4.6 for OP, 6.5 for FO and 5.1 for WO. Significantly more true positive lesions were identified on the FO images than on the IP images (p = 0.008), but there was no significant difference in true positives between OP and IP images (p = 0.633) or WO and IP images (p = 0.702).

Sensitivity and positive predictive value
The mean sensitivity for each image type was 0.34 for IP, 0.32 for OP, 0.42 for FO and 0.36 for WO. Sensitivity was significantly higher on the FO images than on the IP images (p = 0.023), but there was no significant difference between OP and IP images (p = 0.696) or between WO and IP images (p = 0.590).
The mean positive predictive values (PPV) were 0.86 for IP, 0.67 for OP, 0.81 for FO and 0.82 for WO. There was no significant difference in PPV for FO compared to IP (p = 0.146) or WO compared to IP (p = 0.617). However, PPV was significantly poorer on OP images than on IP images (p < 0.001).

Sub-group analysis
Of 30 patients, 23 were in the focal disease group (this included the two patients with solitary plasmacytoma) and six were in the diffuse disease group. True positives, sensitivity and PPV for the focal and diffuse groups are given in Table 4 and Table 5 respectively. In the focal disease group, true positives and sensitivity were significantly higher on FO images than on IP images (p = 0.008 and 0.037 respectively). There was no significant difference in PPV between IP and FO images (p = 0.516). The OP images performed significantly less well than IP images in terms of PPV (p < 0.001).
In the diffuse disease group, there were no significant differences between FO and IP groups in terms of true positives, sensitivity or PPV (p = 0.483, p = 0.349 and p = 0.113 respectively). PPV was again significantly poorer on the OP images than on the IP images (p = 0.004).
True positives and sensitivity were higher for the focal disease group (sensitivity on FO images was 0.29 in the diffuse group and 0.46 in the focal group), although these groups were not formally compared.
True positives, sensitivity, positive predictive value and confidence are compared across the four image types, using the in-phase images as the baseline. https://doi.org/10.1371/journal.pone.0180562.t005

Percent contrast and contrast-to-noise ratio
[...] 30 and WO: 4.5), and was significantly higher for FO images than for IP images (p = 0.003). There was no significant difference between OP and IP images, or between WO and IP images.
Contrast-to-noise ratio was also highest in the FO group (the values for each image type were IP: 3.41, OP: 3.34, FO: 5.57 and WO: 5.04). However, there was no significant difference in CNR between groups.

Discussion
In this study, lesion counts, true positive counts, sensitivity, positive predictive value and reader confidence were compared across the four Dixon image types. We have shown that FO images are superior to other image types, and in particular to IP images, in terms of lesion counts, true positives, sensitivity and confidence. Furthermore, our data suggest that focal lesions demonstrate greater contrast compared to background marrow on FO images than on IP images, which may account for the superior sensitivity of FO images. The positive predictive values for FO images were similar to those for IP and WO images and higher than those for OP images, suggesting that the increase in sensitivity reflects a true increase in lesion conspicuity rather than a lower reader threshold for lesion identification. The use of FO images offered the greatest advantage for patients with focal lesions; in patients with diffuse disease, the differences between FO and IP images did not reach statistical significance.
The superior performance of FO imaging could occur because myelomatous infiltration of the bone marrow causes a proportionally greater reduction in marrow fat content than increase in water content. Normal adult bone marrow typically consists of 50-90% fat [20-22] and infiltration with myeloma cells decreases fat content [13,23,24]; however, the increase in water content may be relatively smaller because myeloma cells have an increased nuclear to cytoplasmic ratio. [25] This suggestion is supported by the observation that focal lesions are more difficult to detect in younger patients with cellular bone marrow [26] or in myeloma patients with a higher bone marrow cell percentage. [15] To our knowledge, this is the first study comparing lesion detection rates on individual Dixon images in patients with MM. A small number of studies have examined lesion contrast in Dixon imaging compared to other sequences [15,27], but none of these have directly examined lesion detection rates by radiologists. This study suggests that the use of Dixon imaging improves diagnostic sensitivity and confidence compared to in-phase T1-weighted gradient echo imaging alone. We therefore argue that Dixon imaging should be used in preference to T1-weighted imaging alone for anatomical WB-MRI in MM. Furthermore, radiologists should specifically review the FO image type when reading WB-MRI in MM to increase diagnostic yield and improve reporting efficiency.
The accuracy of lesion detection in MM directly affects assessment of disease burden and therefore prognosis. [7] Walker et al. showed that patients with more than seven focal lesions on WB-MRI had a five-year survival of 55%, compared to 73% for those with no focal lesions [7]. Moulopoulos et al. similarly showed that radiological assessment of disease burden could be used to separate patients into different survival categories. [28] In patients with only a small number of lesions, poor diagnostic sensitivity could theoretically alter the diagnosis itself: small volume disease could be missed altogether, or patients with a small number of lesions (>1) could be incorrectly diagnosed with solitary plasmacytoma.
A limitation of this study is that our observations are confined to images generated using a single Dixon sequence. It would be preferable to compare sensitivity and positive predictive value across gradient echo (Dixon) and spin echo images, including T1-weighted and STIR images, to form a more definitive overall assessment of the optimal sequence. However, this type of study would be difficult to perform in practice, since acquiring conventional T1-weighted spin echo images in addition to Dixon images would be extremely time consuming. Furthermore, previous studies suggest that gradient echo imaging offers similar image quality to spin echo imaging in MM. [16] The study is also limited by the nature of the scoring system used. In particular, the upper limit of 20 for the lesion count means that we have not captured differences in the number of lesions detected in patients with very high tumour load. However, the clinical importance of these differences is doubtful and current staging systems do not differentiate between patients with more than 20 lesions. [6,29] Our scoring system also penalises observers who fail to identify diffuse infiltration, leading to generally low sensitivity scores when compared to the reference standard.
Further work is required to examine the diagnostic utility of different MR sequences to arrive at an optimised protocol for WB-MRI in MM. In particular, it would be useful to determine the extent to which DWI, post-contrast and pre-contrast Dixon imaging each contribute to the overall interpretation of the WB-MRI scan. Careful assessment of the 'value' of each sequence is essential if cost-effective, high volume whole body scanning is to be achieved. High-value MRI is becoming an increasingly important goal for the imaging community [30], and studies specifically examining the value of WB-MRI in MM will be essential for widespread clinical implementation.

Conclusion
Fat-only Dixon images offer higher lesion detection rates compared to in-phase images alone in multiple myeloma. We suggest that radiologists should preferentially review the fat-only images when reading to improve diagnostic accuracy and reporting efficiency.

Supporting information
S1 File. Raw data showing lesion detection rates for the four Dixon image types. Lesion counts, true positives, false positives, false negatives, sensitivity, PPV and confidence are provided for each of the image types, for each observer and each patient. Please refer to the Materials and Methods section for more information on data arrangement. (XLSX)