Standardized Reporting of Prostate MRI: Comparison of the Prostate Imaging Reporting and Data System (PI-RADS) Version 1 and Version 2

Introduction: The objective of our study was to determine the agreement between version 1 (v1) and version 2 (v2) of the Prostate Imaging Reporting and Data System (PI-RADS) for the evaluation of multiparametric prostate MRI (mpMRI) and to compare their diagnostic accuracy, inter-observer agreement and practicability. Material and Methods: mpMRI examinations including T2-weighted imaging, diffusion-weighted imaging (DWI) and dynamic contrast-enhanced imaging (DCE) of 54 consecutive patients who subsequently underwent MRI-guided in-bore biopsy were re-analyzed according to PI-RADS v1 and v2 by two independent readers. Diagnostic accuracy for the detection of prostate cancer (PCa) was assessed using ROC-curve analysis. Agreement between PI-RADS versions and between observers was calculated, and the time needed for scoring was determined. Results: MRI-guided biopsy revealed PCa in 31 patients. For reader 1, diagnostic accuracy for the detection of PCa was equivalent with both PI-RADS versions, with sensitivities and specificities of 84%/91% (AUC = 0.91, 95% CI [0.8–1]) for PI-RADS v1 and 100%/74% (AUC = 0.92, 95% CI [0.8–1]) for PI-RADS v2. Reader 2 achieved similar diagnostic accuracy, with sensitivity and specificity of 74%/91% (AUC = 0.88, 95% CI [0.8–1]) for PI-RADS v1 and 81%/91% (AUC = 0.91, 95% CI [0.8–1]) for PI-RADS v2. Agreement between scores determined with the two PI-RADS versions was good (reader 1: κ = 0.62; reader 2: κ = 0.64). Inter-observer agreement was moderate with PI-RADS v2 (κ = 0.56) and fair with v1 (κ = 0.39). The time required to assign the PI-RADS score was significantly shorter with PI-RADS v2 than with v1 (24.7 ± 2.3 s vs. 41.9 ± 2.6 s, p < 0.001). Conclusion: Agreement between the PI-RADS versions was good, and both versions showed high diagnostic accuracy for the detection of PCa. Owing to better inter-observer agreement for malignant lesions and a shorter scoring time, the new PI-RADS version could be more practicable in clinical routine.

Consensus has been reached that standardization of imaging and reporting of prostate MRI is important to ensure high diagnostic quality, reproducible MRI results and the applicability of prostate MRI in multicenter studies. In 2012, an expert consensus group of the European Society of Urogenital Radiology (ESUR) introduced version 1 (v1) of the Prostate Imaging Reporting and Data System (PI-RADS) [12]. Since then, PI-RADS v1 has been applied clinically and evaluated in several studies, in which mpMRI PI-RADS v1 scores showed high diagnostic accuracy for the detection of PCa and high inter-observer agreement [13][14][15][16]. However, some limitations of PI-RADS v1 have been recognized in recent years. This first version did not clearly define how exactly scores should be determined and combined, which left room for individual interpretation and therefore for variability in the application of PI-RADS v1 [13]. Additionally, the various types of perfusion curves that can occur in the prostate led to confusion: enhancement characteristics are highly heterogeneous both in prostate cancers and in benign prostate lesions [17]. Including perfusion scores in the overall PI-RADS score led to higher scores for benign lesions. In response, the American College of Radiology (ACR), the ESUR and the AdMeTech Foundation established a Steering Committee to further develop, update and simplify PI-RADS in light of ongoing research, in an effort to make PI-RADS more globally acceptable. This resulted in the updated PI-RADS version 2 (v2) [17].
The purpose of this study was to determine the agreement between PI-RADS v1 and v2 for the evaluation of multiparametric prostate MRI and to compare their diagnostic accuracy, inter-observer agreement and practicability.

Material and Methods
Patients
Data of 69 consecutive patients who underwent mpMRI of the prostate followed by MRI-guided in-bore biopsy between December 2012 and December 2014 were retrospectively analyzed. Fifteen of the 69 patients were excluded from the analysis because of an incomplete MRI protocol or distinct artifacts in one or more MRI sequences. Of the remaining 54 patients, 18 had at least one prior negative biopsy and 36 had no prior biopsy. Mean age (± standard deviation) was 69.6 ± 9.6 years, mean PSA was 8.7 ± 4.9 μg/L and mean prostate volume was 52.1 ± 24.7 ml. The study was approved by the ethics committee of Hannover Medical School.

Multiparametric MRI
Multiparametric MRI was acquired according to ESUR guidelines [1,12] on a 3 Tesla system (Magnetom Skyra, Siemens Healthcare, Erlangen, Germany) using an 18-channel body coil and a spine coil. To reduce bowel peristalsis, all patients received an intravenous injection of 20 mg butylscopolamine (Buscopan 20 mg, Boehringer, Ingelheim, Germany) prior to the examination. T2 TSE sequences were acquired in transverse, sagittal and coronal orientation with FOV = 220 × 220 mm², matrix = 300 × 512, TR > 3000 ms and TE > 90 ms. For DWI, four b-values (0, 200, 400 and ≥800 s/mm²) were used; other sequence parameters were FOV = 320 × 320 mm², matrix = 120 × 160, TR > 4500 ms and TE > 70 ms. ADC maps were calculated with a monoexponential model including all b-values. For DCE, a VIBE sequence was acquired in the transverse plane (FOV = 260 × 260 mm², matrix = 135 × 190, TR = 5 ms, TE = 1.5 ms, temporal resolution = 7.2 s), using Gadovist as contrast agent at a weight-adapted standard dose of 0.1 mmol/kg body weight and an injection rate of 2.7 ml/s (Table 1).
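As a rough illustration of the ADC calculation, the sketch below fits the monoexponential model S(b) = S0 · exp(−b · ADC) by ordinary least squares on the log-signal over all b-values. It is a minimal sketch assuming noise-free, positive signals; the fitting routine of the scanner software is not described here and may differ (for example in weighting or noise handling). The function name `fit_adc` and the b-value set 0, 200, 400 and 800 s/mm² are illustrative assumptions.

```python
import math
from typing import Sequence, Tuple


def fit_adc(b_values: Sequence[float], signals: Sequence[float]) -> Tuple[float, float]:
    """Fit ln S(b) = ln S0 - b * ADC by ordinary least squares.

    Returns (adc, s0); with b in s/mm^2, the ADC is in mm^2/s.
    """
    if len(b_values) != len(signals) or len(b_values) < 2:
        raise ValueError("need at least two (b, signal) pairs of equal length")
    if any(s <= 0 for s in signals):
        raise ValueError("signals must be positive to take the logarithm")
    n = len(b_values)
    log_s = [math.log(s) for s in signals]
    mean_b = sum(b_values) / n
    mean_y = sum(log_s) / n
    sxx = sum((b - mean_b) ** 2 for b in b_values)
    if sxx == 0:
        raise ValueError("b-values must not all be identical")
    sxy = sum((b - mean_b) * (y - mean_y) for b, y in zip(b_values, log_s))
    slope = sxy / sxx          # slope of ln S versus b
    adc = -slope               # the signal decays, so ADC is the negative slope
    s0 = math.exp(mean_y - slope * mean_b)  # intercept back-transformed to signal units
    return adc, s0


# Hypothetical, noise-free example: a lesion with ADC = 0.5e-3 mm^2/s.
b_vals = [0, 200, 400, 800]
true_adc = 0.5e-3
sig = [1000.0 * math.exp(-b * true_adc) for b in b_vals]
adc, s0 = fit_adc(b_vals, sig)
print(f"ADC = {adc * 1e3:.2f} x 10^-3 mm^2/s, S0 = {s0:.0f}")
```

Taking the logarithm turns the exponential model into a straight line whose negative slope is the ADC, so including all b-values simply adds points to the regression.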

MRI-guided in-bore biopsy
The MRI-guided in-bore biopsy was performed on the same MRI system as the diagnostic imaging (Magnetom Skyra, Siemens Healthcare, Erlangen, Germany) using an 18-channel body coil and a spine coil integrated into the scanner. Patients with a high clinical probability of PCa and/or equivocal or suspicious MRI findings, as diagnosed by the radiologists who primarily reported the study, underwent biopsy. A total of 65 prostate lesions in 54 patients were successfully biopsied. Patients were placed in the prone position. After application of topical local anesthesia (Lidocaine, Instillagel 40 ml, Farco-Pharma, Köln, Germany), a needle guide connected to the arm of a portable biopsy device (Dyna-TRIM Device, Invivo International PC, Best, Netherlands) was inserted rectally. T2-weighted images (HASTE or TSE) were acquired in transverse and sagittal planes. Biopsy was planned using the Dyna-CAD workstation and Dyna-CAD software (Invivo International PC, Best, Netherlands), and the calculated needle position was adjusted manually. Two cores were taken per lesion with an MRI-compatible 18-gauge fully automatic biopsy gun. The needle position inside the lesion was verified using T2 HASTE sequences. No additional systematic biopsy was performed.

PI-RADS analysis
A total of 65 prostate lesions in 54 patients were analyzed independently by two readers with five years (reader 1) and two years (reader 2) of experience in the interpretation of prostate MRI, using Visage 7.1 software (Visage Imaging GmbH, Berlin, Germany) and a standardized hanging protocol. Readers were blinded to clinical findings and histopathology. Image data of all patients were interpreted twice: once according to PI-RADS v1 [12,18,19] and once according to PI-RADS v2 [17]. The interval between the two readings of the same patient was at least two weeks. To avoid reading-order bias, half of the MRI examinations were first interpreted according to PI-RADS v1 and the other half according to PI-RADS v2.
In brief, for PI-RADS v1, lesions were scored in each sequence (T2 TSE, DWI and DCE) on a scale from 1 (highly likely to be benign) to 5 (highly likely to be malignant). For scoring of DCE, the presence of focal or asymmetric contrast enhancement as well as the shape of the signal intensity-time curve was evaluated. Signal intensity-time curves were created with Visage 7.1 software by placing a region of interest in the suspicious lesion [12]. The PI-RADS sum score was calculated as the sum of the PI-RADS scores of these sequences. The global PI-RADS score was then determined as follows:
PI-RADS v2 analysis was performed according to the recently published guidelines [17]. Lesions were categorized as transition zone (tz) or peripheral zone (pz) lesions. tz lesions were scored on T2 TSE images on a scale from PI-RADS 1 to 5. T2 scores of 1-2 and 4-5 were adopted directly as the overall PI-RADS score. DWI was analyzed as a second sequence only for intermediate lesions (T2 score 3): a DWI score of 1-4 resulted in an overall PI-RADS score of 3, whereas a DWI score of 5 resulted in an overall PI-RADS score of 4. pz lesions were scored primarily on DWI.
Similarly, DWI scores of 1-2 or 4-5 were adopted directly as the overall PI-RADS score. For intermediate lesions (DWI score 3), the DCE sequence was visually analyzed and interpreted as positive or negative. DCE was considered positive when there was focal enhancement, earlier than or contemporaneous with enhancement of adjacent normal prostatic tissue, that corresponded to a lesion on T2 or DWI [17]. A positive DCE resulted in an overall PI-RADS score of 4, a negative DCE in an overall PI-RADS score of 3 [17]. ADC values were also recorded. Prostate volume was calculated according to the PI-RADS v2 guidelines using the formula for a prolate ellipsoid (maximum AP diameter × maximum transverse diameter × maximum longitudinal diameter × 0.52) [17].
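The PI-RADS v2 decision rules described above can be summarized as a small function. This is a hypothetical sketch that encodes only the rules stated in this section (one dominant sequence per zone, a second sequence only for a score of 3); it does not handle missing or non-diagnostic sequences, and the function and parameter names are illustrative. The prolate-ellipsoid volume formula is included as well, assuming diameters in centimetres so that the result is in millilitres.

```python
from typing import Optional


def _check_score(name: str, score: Optional[int]) -> int:
    """Ensure a per-sequence score is present and lies on the 1-5 scale."""
    if score is None or not 1 <= score <= 5:
        raise ValueError(f"{name} score must be an integer from 1 to 5")
    return score


def pirads_v2_overall(zone: str, t2: Optional[int] = None, dwi: Optional[int] = None,
                      dce_positive: Optional[bool] = None) -> int:
    """Overall PI-RADS v2 category using the zone-dependent rules described above."""
    zone = zone.lower()
    if zone == "tz":
        t2 = _check_score("T2", t2)
        if t2 != 3:
            return t2  # T2 scores 1-2 and 4-5 are adopted directly
        dwi = _check_score("DWI", dwi)
        return 4 if dwi == 5 else 3  # DWI 1-4 -> 3, DWI 5 -> 4
    if zone == "pz":
        dwi = _check_score("DWI", dwi)
        if dwi != 3:
            return dwi  # DWI scores 1-2 and 4-5 are adopted directly
        if dce_positive is None:
            raise ValueError("DCE result required for a pz lesion with DWI score 3")
        return 4 if dce_positive else 3  # positive DCE -> 4, negative DCE -> 3
    raise ValueError("zone must be 'tz' or 'pz'")


def prostate_volume_ml(ap_cm: float, transverse_cm: float, longitudinal_cm: float) -> float:
    """Prolate ellipsoid volume: AP x transverse x longitudinal x 0.52 (cm^3 = ml)."""
    return ap_cm * transverse_cm * longitudinal_cm * 0.52


# Lesion from the Results example: pz, DWI score 5; no other sequence is consulted.
print(pirads_v2_overall("pz", dwi=5))
# Hypothetical diameters in cm, for illustration only.
print(round(prostate_volume_ml(5.0, 5.0, 4.0), 1))
```

For the lesion in the Results example (pz, DWI score 5), the function returns 5 without looking at any other sequence, which mirrors why v2 scoring needed fewer steps than the v1 sum of three sequence scores.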

Practicability of PI-RADS versions (time required)
To evaluate the practicability of PI-RADS v2 compared to PI-RADS v1, the time required to determine the PI-RADS score was measured for each version. Timing started after all sequences had been inspected and lesions identified, and therefore reflects only the time needed to assign the PI-RADS score itself.
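Because each examination was scored with both versions, the two sets of scoring times are paired. Below is a minimal sketch of a paired Wilcoxon signed-rank test as one way to compare them; it assumes the signed-rank variant for paired samples, uses the large-sample normal approximation with a tie-corrected variance and no continuity correction, and is therefore not guaranteed to reproduce the p-values of GraphPad Prism or SPSS. The function name and the timing data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Paired Wilcoxon signed-rank test (two-sided, normal approximation).

    Returns (W+, z, p). Zero differences are discarded, tied absolute
    differences receive average ranks, and the variance is tie-corrected.
    No continuity correction is applied.
    """
    if len(x) != len(y):
        raise ValueError("samples must be paired (equal length)")
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        # Extend j over the group of equal absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions i..j share the mean 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(var)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, z, p_two_sided


# Hypothetical scoring times (s) for ten examinations read with both versions.
v1_times = [40, 42, 45, 38, 50, 41, 39, 44, 43, 47]
v2_times = [25, 24, 26, 22, 30, 23, 21, 27, 24, 28]
w, z, p = wilcoxon_signed_rank(v1_times, v2_times)
print(f"W+ = {w}, z = {z:.2f}, p = {p:.4f}")
```

In practice a statistics package would be used instead (SciPy, for instance, offers `scipy.stats.wilcoxon`, which can use exact p-values for small samples); the sketch only makes the ranking logic explicit.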

Statistical analysis
For statistical analysis, GraphPad Prism version 5 (GraphPad Software, Inc., USA) and SPSS Statistics version 21 (SPSS, IBM, Chicago, IL, USA) were used. As clinical data and PI-RADS scores were not normally distributed according to the Kolmogorov-Smirnov test, and as PI-RADS scores are ordinal variables, patients with and without biopsy-proven PCa were compared using the non-parametric Mann-Whitney U test. ADC values were normally distributed and were compared between groups with and without PCa using the unpaired t-test. The times needed to assign the PI-RADS scores were not normally distributed and were compared with the non-parametric Wilcoxon test. Diagnostic performance of mpMRI PI-RADS scores was determined for the dominant lesion biopsied by MRI-guided in-bore biopsy in each patient. Receiver operating characteristic (ROC) curve analysis for PI-RADS v1 and v2 was performed separately for each reader and for tz and pz lesions, using the histopathological results of MRI-guided in-bore biopsy as the reference standard. Youden-selected thresholds (cut-offs maximizing the Youden index) were determined, and the sensitivity and specificity of the PI-RADS scores at these thresholds were recorded. The agreement between PI-RADS versions and the inter-observer agreement for each version were determined using Cohen's kappa statistics. Agreement was classified as excellent (κ ≥ 0.81), good (κ = 0.61-0.80), moderate (κ = 0.41-0.60), fair (κ = 0.21-0.40) or poor (κ ≤ 0.20) [20]. Values are given as mean ± standard deviation (SD). P-values < 0.05 were considered statistically significant.
Table 2). An example of PI-RADS scoring in a patient with biopsy-proven PCa (Gleason 3+4 = 7a) is given in Fig 1. An overview of the PI-RADS scores assigned according to PI-RADS version 1 and version 2 is given in Tables 3 and 4.
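A minimal sketch of the agreement statistics follows: it computes unweighted Cohen's kappa from two lists of scores and maps the result onto the agreement categories of the Statistical analysis section. Whether the original analysis used an unweighted or a weighted kappa is not stated, so the unweighted form is an assumption; κ ≤ 0.20 is read as poor, and the small gap between 0.80 and 0.81 in the listed ranges is assigned to excellent by assumption. Function names and reader scores are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same category.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal category frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(count_a[c] * count_b[c] for c in set(count_a) | set(count_b)) / n ** 2
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


def interpret_kappa(kappa: float) -> str:
    """Map kappa onto the excellent/good/moderate/fair/poor categories."""
    if kappa > 0.80:
        return "excellent"
    if kappa > 0.60:
        return "good"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    return "poor"


# Hypothetical PI-RADS scores of two readers for eight lesions.
reader_1 = [1, 2, 3, 4, 5, 5, 4, 3]
reader_2 = [1, 2, 3, 4, 5, 4, 4, 2]
kappa = cohens_kappa(reader_1, reader_2)
print(f"kappa = {kappa:.2f} ({interpret_kappa(kappa)})")
```

Applied to the kappa values reported in this study, `interpret_kappa` reproduces the stated labels, e.g. 0.56 maps to moderate and 0.39 to fair.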

Diagnostic accuracy of mpMRI PI-RADS scores version 1 and version 2
For the experienced reader, sensitivity and specificity for the detection of PCa were 100% and 74%, respectively, at the Youden-selected cut-off of PI-RADS ≥3 with v2, and 84% and 91%, respectively, at the Youden-selected cut-off of PI-RADS ≥4 with v1. Youden-selected thresholds were similar for the less experienced reader, who achieved nearly equivalent sensitivities and specificities with PI-RADS v2 (Table 5, Fig 2). In addition, diagnostic accuracy was analyzed separately for pz and tz lesions. With PI-RADS v2 scores, the best diagnostic accuracy in the pz was reached with a cut-off of ≥3 (Youden index 71%; sensitivity 100%, specificity 71%), whereas in the tz a cut-off of ≥4 yielded the highest diagnostic accuracy (sensitivity 80%, specificity 100%, Youden index 80%). With PI-RADS v1, the Youden-selected cut-off was ≥4 for both pz and tz lesions (Table 6, Fig 3).
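The Youden index is J = sensitivity + specificity − 1, so the values above can be checked directly: 1.00 + 0.71 − 1 = 0.71 for the pz and 0.80 + 1.00 − 1 = 0.80 for the tz. The sketch below performs that check and shows one way a Youden-selected cut-off could be chosen from ordinal PI-RADS scores. It assumes that a score at or above the cut-off counts as positive and that ties in J favour the lowest cut-off; the exact procedure of the statistics software is not described, and the function names and lesion data are hypothetical.

```python
from typing import Sequence, Tuple


def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden index J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1


def youden_threshold(scores: Sequence[int],
                     has_cancer: Sequence[bool]) -> Tuple[int, float, float, float]:
    """Return (cut-off, J, sensitivity, specificity) for 'score >= cut-off is positive'.

    Every observed score is tried as a cut-off; ties in J keep the lowest cut-off.
    """
    if len(scores) != len(has_cancer):
        raise ValueError("scores and labels must have equal length")
    positives = [s for s, y in zip(scores, has_cancer) if y]
    negatives = [s for s, y in zip(scores, has_cancer) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    best = None
    for cut in sorted(set(scores)):
        sens = sum(s >= cut for s in positives) / len(positives)
        spec = sum(s < cut for s in negatives) / len(negatives)
        j = youden_index(sens, spec)
        if best is None or j > best[1]:  # strict '>' keeps the lowest cut-off on ties
            best = (cut, j, sens, spec)
    return best


# Check of the reported pz (cut-off >=3) and tz (cut-off >=4) values with PI-RADS v2.
print(round(youden_index(1.00, 0.71), 2), round(youden_index(0.80, 1.00), 2))

# Hypothetical PI-RADS scores and biopsy results for ten lesions.
scores = [2, 3, 3, 4, 5, 5, 1, 2, 3, 4]
cancer = [False, False, True, True, True, True, False, False, False, True]
print(youden_threshold(scores, cancer))
```

Because PI-RADS scores take only five values, trying each observed score as a cut-off covers every distinct point on the empirical ROC curve.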

Practicability of PI-RADS scoring (time required)
When comparing the time needed to determine the PI-RADS score after having evaluated the entire examination, the experienced reader needed 24.
According to PI-RADS v2, DWI is the leading sequence. Because the lesion showed high signal intensity on b1000 images with a correspondingly markedly reduced ADC of 0.5 × 10⁻³ mm²/s and a diameter >15 mm, it was scored PI-RADS 5 (highly likely to be malignant). No other sequence is required for scoring according to PI-RADS v2. With PI-RADS v1, a score has to be determined for each sequence. For this patient, the following scores were assigned: T2 TSE PI-RADS 5, DWI PI-RADS 5, DCE PI-RADS 5. This resulted in a PI-RADS sum of 15 and a global PI-RADS score of 5. In this patient, the time required for PI-RADS scoring after inspection of all images was 9 seconds with v2 and 59 seconds with v1. MRI-guided in-bore biopsy revealed a Gleason 3+4 = 7 tumor.

Discussion
We showed that the updated PI-RADS v2 for the evaluation of mpMRI of the prostate had high diagnostic accuracy for the detection of PCa, with sensitivities and specificities equivalent to those of PI-RADS v1. The agreement between PI-RADS v1 and v2 scores was good. Inter-observer agreement for malignant lesions was better with PI-RADS v2 than with PI-RADS v1, and the time needed for PI-RADS scoring was significantly shorter with PI-RADS v2, indicating better practicability.
Growing acceptance of mpMRI among urologists depends on high diagnostic accuracy for the detection of significant PCa and on reproducible interpretation. Therefore, standardized analysis and reporting of prostate MRI with comprehensible and clearly defined criteria are required. The initial version of PI-RADS (version 1) showed high diagnostic accuracy in several studies [13] and good inter-observer agreement [15]. In the present study, comparing the updated PI-RADS (version 2) with the initial version 1, we found equivalent diagnostic accuracy and good agreement between the PI-RADS versions. The updated PI-RADS v2 provides a more detailed description of the assessment of prostate MRI, with clearly defined criteria for PI-RADS scoring and representative images for each PI-RADS score and each sequence, separately for pz and tz lesions [17]. As with PI-RADS v1, the complete mpMRI examination (T2, DWI and DCE) has to be acquired and inspected. The major changes in PI-RADS v2 are that the relevance of each MRI sequence is weighted according to the location of the lesion in the pz or tz, and that scoring of one sequence is sufficient for most lesions.
A second sequence needs to be scored only for indeterminate lesions (PI-RADS 3). In PI-RADS v2, the most important sequence for the diagnosis of significant PCa is DWI, which is the leading sequence for the pz and the second sequence for the tz. This is based on previous research showing that, if only one sequence is considered, DWI provides the highest accuracy for PCa detection for both pz and tz lesions [21][22][23]. For tz tumors, the combination of T2 and DWI, as used in PI-RADS v2, has been reported to have the highest diagnostic accuracy [21], while adding DCE did not improve tumor detection in treatment-naïve prostates [24]. In contrast, in the pz the diagnostic performance of DCE is better [21,24], so it is reasonable to use DCE as the second sequence in PI-RADS v2 if DWI yields an indeterminate result (PI-RADS 3). Besides allowing for the different tissue characteristics of the pz and tz, scoring one or at most two MRI sequences with PI-RADS v2 is more straightforward and efficient. Importantly, there are several reasons why it is still recommended to acquire and interpret morphological T2-weighted images and at least two functional sequences in prostate MRI. First, tumor location and characteristics are unknown before the examination, so it is not clear in advance which sequences will be needed for PI-RADS v2 scoring. Second, if the image quality of one sequence is insufficient because of artifacts (e.g. distortion in DWI, which often occurs with hip implants), the other two sequences can still be used for PI-RADS scoring, as the new version suggests how to score when one sequence is missing [17]. Third, in pre-treated prostates following radiation therapy, focal therapy or endourethral treatment, detection of PCa is often more challenging and treatment-related signal changes have to be considered. In this situation, sequences other than those suggested in PI-RADS v2 might be helpful.
Therefore, in our study, PI-RADS scores were assigned after the entire examination had been closely evaluated and artifacts had been excluded. Sensitivities and specificities of both PI-RADS versions in our study were comparable to those reported previously for PI-RADS v1 in studies that applied PI-RADS scoring accurately, with a pooled sensitivity of 82% (95% CI 0.7-0.9) and a pooled specificity of 82% (95% CI 0.7-0.9) [13][14][15][16][25][26][27][28]. In a recent study using whole-mount pathology as the reference standard, 95% of PCa foci ≥0.5 ml were correctly identified with PI-RADS v2 [29]. Youden-selected PI-RADS thresholds in our study varied between PI-RADS ≥3 and ≥4, depending on PI-RADS version and tumor location. The calculated threshold for the best diagnostic accuracy across all lesions was ≥3 with version 2 and ≥4 with version 1. However, the number of patients may be too small to draw further conclusions from this discrepancy between PI-RADS versions. In addition, recommendations for patient management based on mpMRI PI-RADS scores have not been proposed so far and are explicitly not included in the PI-RADS v2 document [17]. Developing reliable recommendations based on PI-RADS scores requires further research with large prospective multicenter trials and long-term follow-up of patients.
The inter-observer agreement between the experienced and the less experienced reader in our study was moderate for PI-RADS v2 (κ = 0.56) and fair for PI-RADS v1 (κ = 0.39) when considering all lesions. A recent study evaluating PI-RADS v2 also found moderate inter-reader agreement among five readers when considering all lesions [30]. Rosenkrantz et al. reported that inter-observer agreement (using PI-RADS v1) was better between experienced readers (concordance correlation coefficient 0.61-0.68) than between experienced and inexperienced readers (concordance correlation coefficient 0.34-0.48) for all lesions [31], indicating that reader agreement depends on experience with prostate MRI. Schimmoeller et al. showed good inter-reader agreement for tumor-positive lesions and moderate to good inter-reader agreement for benign lesions with PI-RADS v1 [15]. One explanation for the better inter-observer agreement for tumor-positive lesions might be that various types of benign changes of prostate tissue occur, which are often difficult to differentiate from tumor lesions. In our study, when considering only biopsy-proven PCa lesions, inter-observer agreement was better with PI-RADS v2 (κ = 0.56) than with PI-RADS v1 (κ = 0.14). This may be explained by the more detailed and clearer definition of PI-RADS scores in v2. For example, PI-RADS v2 clearly differentiates PI-RADS 4 from PI-RADS 5 lesions by lesion size (a diameter ≥15 mm results in PI-RADS 5) or by the presence of definite extraprostatic extension or invasive behavior.
Another important finding of our study is that the time needed to assign the PI-RADS score itself was significantly shorter with v2 than with v1, because for most prostate lesions (i.e. PI-RADS 1, 2, 4 and 5 lesions) only one sequence has to be scored with v2 instead of three sequences with v1. Accordingly, for PI-RADS 3 lesions, which require two sequences for PI-RADS v2 scoring, the time difference was less pronounced and did not reach statistical significance. In addition, generating a signal intensity-time curve is not required for DCE analysis with version 2, which saves further time. Furthermore, the newly introduced sector map of the prostate with 39 regions is more intuitive, as the regions are named according to their location rather than numbered [17]. Therefore, our study suggests that PI-RADS v2 might indeed be more practicable and easier to implement in clinical routine. Note that the time needed to evaluate the entire prostate MRI and to identify the lesions was not included in the time needed to assign the PI-RADS score.
Limitations of our study are that MRI-guided in-bore biopsy was used as the reference standard, that no additional systematic biopsy was performed and that no long-term follow-up was available. Therefore, false-negative biopsy results cannot be excluded, which has to be considered when interpreting the diagnostic accuracy results. However, the primary objective of our work was to directly compare the two PI-RADS versions in terms of inter- and intra-observer variability, practicability and diagnostic performance.
In conclusion, standardized evaluation of prostate MRI according to PI-RADS v2 showed high diagnostic accuracy for the detection of PCa, equivalent to that of PI-RADS v1. The agreement between the two PI-RADS versions was good. Of note, inter-observer agreement for tumor-positive lesions was better with PI-RADS v2, and less time was needed for scoring according to PI-RADS v2, indicating better practicability in clinical routine.