Computer aided detection in prostate cancer diagnostics: A promising alternative to biopsy? A retrospective study from 104 lesions with histological ground truth

Background
Prostate cancer (PCa) diagnosis by means of multiparametric magnetic resonance imaging (mpMRI) is a current challenge for the development of computer-aided detection (CAD) tools. An innovative CAD software (Watson Elementary™) was proposed to achieve high sensitivity and specificity, as well as to provide a correlate of the Gleason grade.
Aim/Objective
To assess the performance of Watson Elementary™ in automated PCa diagnosis in our hospital's database of MRI-guided prostate biopsies.
Methods
The evaluation was retrospective for 104 lesions (47 PCa, 57 benign) from 79 patients (aged 64.61±6.64 years) using 3T T2-weighted imaging, Apparent Diffusion Coefficient (ADC) maps and dynamic contrast enhancement series. Watson Elementary™ utilizes signal intensity, diffusion properties and kinetic profile to compute a proportional Gleason grade predictor, termed the Malignancy Attention Index (MAI). The analysis focused on (i) the CAD sensitivity and specificity in classifying suspect lesions and (ii) the correlation of MAI with the histopathological ground truth.
Results
The software revealed a sensitivity of 46.81% for PCa classification. The specificity for PCa was 75.44%, with a positive predictive value of 61.11%, a negative predictive value of 63.24% and a false discovery rate of 38.89%. CAD classified PCa and benign lesions with equal probability (P = 0.06, χ2 test). Accordingly, receiver operating characteristic analysis suggests a poor predictive value for MAI, with an area under the curve of 0.65 (P = 0.02), which is not superior to the performance of board-certified observers. Moreover, MAI revealed no significant correlation with Gleason grade (P = 0.60, Pearson's correlation).
Conclusion
The tested CAD software for mpMRI analysis was a weak PCa biomarker in this dataset. Targeted prostate biopsy and histology remain the gold standard for prostate cancer diagnosis.

Introduction
… staff, the supply of which is disproportionately low compared with the increasing diagnostic demand of the second most common cancer in males. An increasing body of evidence supports the role of automated mpMRI analysis in the form of Computer-Aided Detection (CAD) methods. CAD systems analyze MRI modalities quantitatively and merge the information into statistical pipelines that are tuned to predict malignancy [13].
A recently commercialized, automated analysis tool for the assessment of prostate cancer in mpMRI (Watson Elementary™, Watson Medical, Den Ham, The Netherlands) has achieved high sensitivity and specificity in its first evaluation [23]. The Watson Elementary™ method is tuned to predict the malignancy grade with an mpMRI-based Gleason score correlate, termed the Malignancy Attention Index (MAI). In this study, Watson Elementary™ was retrospectively assessed in our hospital's database of 104 prostate lesions with histological ground truth after MRI-guided biopsy; it showed a low sensitivity for PCa detection that was not superior to observer-based diagnosis. Our results are compared with previous related studies.

Ethics
All patient data were derived from the prostate database of the Suedharz Hospital Nordhausen. Data were analyzed retrospectively and fully anonymized, in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki and its amendments as well as with the guidelines of the Ethics Committee for clinical studies of the University of Jena. Owing to the retrospective character of the study, the ethics committee waived the requirement to obtain legally effective informed consent from the included subjects. Therapeutic decisions were not influenced by the outcome of this study.

Study design
The evaluation was retrospective for 104 histologically characterized biopsy cores obtained with MRI-guided prostate biopsy (Table 1, Fig 1). All patients were examined on suspicion of prostate cancer based on an elevated PSA level after a negative systematic biopsy, and none of them had previously received chemotherapy for prostate cancer. Of a total of 122 patients, 43 were excluded from this study due to protocol mismatch with the software's technical requirements: in 27 patients, rejection was due to static field strength inconsistencies (1.5T excluded); in one patient, due to anatomic malposition of the prostate; in eight patients, the arterial reference curve in the DCE sequences was insufficient; three patients were rejected due to motion artifacts; in two cases, the fusion step failed (see later); and in one case the arterial input curve could not be defined. Images from one additional patient were not accepted by the software for an unknown reason. No patient- or lesion-related inclusion/exclusion criteria applied.
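The exclusion counts add up as stated; a trivial Python tally makes the bookkeeping explicit. The reason labels are paraphrased for this sketch and are not field names from the software.

```python
# Exclusion reasons and patient counts as reported in the Study design section.
# Keys are descriptive labels chosen for this sketch.
EXCLUSIONS = {
    "1.5T static field strength": 27,
    "anatomic malposition of the prostate": 1,
    "insufficient arterial reference curve (DCE)": 8,
    "motion artifacts": 3,
    "failed fusion step": 2,
    "arterial input curve not definable": 1,
    "rejected by the software, reason unknown": 1,
}
SCREENED = 122

excluded = sum(EXCLUSIONS.values())
included = SCREENED - excluded
print(f"excluded: {excluded}, included: {included}")  # excluded: 43, included: 79
```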
All lesions were graded by two radiologists, one with intermediate experience and one board-certified, according to the Prostate Imaging Reporting And Data System (PI-RADS™) v1 and v2. MRI-guided transrectal needle biopsies were always performed within 3 months of diagnosis. Histological characterization and Gleason grading on H&E-stained sections followed.
Watson Elementary™ (installed on 11.5.2016 by Watson Medical, Den Ham, The Netherlands) was tested with manually generated regions of interest (ROI) in diagnostic MRI series encompassing the targeted biopsy cores. The diagnostic accuracy hypothesis tested the ability of CAD to detect known, manually delineated lesions. Therefore, all lesions without histological ground truth, including new lesions identified by the software, were disregarded and did not influence the statistics of the current study.
Diagnostic accuracy was then evaluated by estimating the sensitivity, specificity, Positive Predictive Value (PPV) and False Negative Rate (FNR). Receiver Operating Characteristic (ROC) analysis and estimation of the Area Under the Curve (AUC) allowed for the definition of optimal diagnostic cut-off values using Youden's index (see also Statistics).
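As an illustration of the ROC and Youden steps, the Python sketch below sweeps thresholds over the observed scores, integrates the ROC curve with the trapezoidal rule and returns the threshold that maximizes Youden's J = sensitivity + specificity − 1. The scores and labels are hypothetical toy data, and the code illustrates the statistic rather than reproducing the SPSS procedure used in the study.

```python
def roc_points(scores, labels):
    """ROC points (fpr, tpr, threshold); a lesion is called positive if score >= threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0, float("inf"))]  # threshold above every score: nothing called positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos, t))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0, _), (x1, y1, _) in zip(points, points[1:]))


def youden_cutoff(points):
    """Threshold maximizing Youden's J = tpr - fpr; on ties, the highest threshold wins."""
    fpr, tpr, t = max(points, key=lambda p: p[1] - p[0])
    return t, tpr - fpr


# Hypothetical lesion-level scores (e.g. mean MAI) and histology (1 = PCa, 0 = benign).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
auc = auc_trapezoid(points)
cutoff, j = youden_cutoff(points)
print(f"AUC = {auc:.4f}, Youden cutoff = {cutoff}, J = {j:.2f}")
```

With these toy data the AUC is 0.8125 and J peaks at 0.50; several thresholds reach the same J, which is why a tie-breaking rule has to be stated.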

MRI-guided prostatic biopsy
Lesions suspicious for malignancy according to PI-RADS™ scoring were biopsied transrectally under stereotactic MRI guidance with a Philips Ingenia 3.0T MR system using a dStream Torso body coil with Flex Coverage anterior and posterior coils, providing 32-channel, 60 cm body coverage (Philips North America Corporation, Andover, MA USA).
Briefly, a stereotactic needle frame with x-y-z freedom was fixed onto the patient table. Target lesions were (re-)localized and the sampling position was planned with the frame-dedicated software (DynaCAD, Invivo Corporation, Gainesville, FL, USA). The position of the biopsy needle tip was confirmed with a T2w TSE HR sequence.

Image acquisition for mpMRI
Both biopsies and diagnostic imaging were performed on the same Philips Ingenia 3.0T MR system within a time interval of 3 months. The following axially oriented image set was used for CAD analysis (Table 2):
1. T2-weighted high-resolution Turbo Spin Echo imaging (T2 TSE HR) (Fig 2Ai, 2Bi and 2Ci)
… (Fig 2Aiii and 2Aiv, 2Biii and 2Biv, 2Ciii and 2Civ).

Description of Watson Elementary™
Watson Elementary™ software implements a fully automated 3-step method, previously reported in detail [23]. Below is a brief description of its image processing steps:
Computer aided detection of prostate cancer: A retrospective study of diagnostic accuracy
1. Affine image co-registration.
2. Pixel detection step for feature extraction and nonlinear processing, such as ADC maps from DWI sequences (Fig 2Aii, 2Bii and 2Cii), the kinetic parameters Ktrans, ve and kep of the DCE profile according to Tofts' pharmacokinetic model [24,25], and normalization of T2w images based on a rectangular prostatic reference volume for calculation of first- and second-order texture features.
3. Feature classifier with 3 logical steps:
Step 1 defines the dimension of the predictor space and the transformation parameters.
Step 2 is a linear summation that integrates the step 1 information to construct a scalar map.
Step 3 features an error-feedback method, which has been trained by supervised learning to achieve congruence of the scalar value (step 2) with the Gleason grade.
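For readers unfamiliar with the pharmacokinetic step, the standard Tofts model expresses the tissue contrast concentration as the arterial input function convolved with an exponential washout: Ct(t) = Ktrans ∫0^t Cp(τ) exp(−kep (t − τ)) dτ, with kep = Ktrans/ve. The Python sketch below evaluates this forward model numerically. It assumes that the arterial input function Cp has already been converted to concentration on a uniform time grid, and it is not Watson Elementary™'s implementation, which is not described beyond [23]; the parameter values are illustrative only.

```python
import math


def tofts_concentration(times, cp, ktrans, ve):
    """Forward standard Tofts model on a uniform time grid.

    Ct(t) = Ktrans * integral_0^t Cp(tau) * exp(-kep * (t - tau)) dtau, kep = Ktrans / ve.
    The convolution integral is evaluated with the trapezoidal rule."""
    kep = ktrans / ve
    dt = times[1] - times[0]
    ct = [0.0]  # no tissue contrast at t = 0
    for i in range(1, len(times)):
        t = times[i]
        f = [cp[j] * math.exp(-kep * (t - times[j])) for j in range(i + 1)]
        integral = dt * (sum(f) - 0.5 * (f[0] + f[-1]))
        ct.append(ktrans * integral)
    return ct


# Check against the closed form for a constant plasma concentration Cp = 1:
# Ct(t) = ve * (1 - exp(-kep * t)).
dt = 0.01                               # minutes
times = [k * dt for k in range(1001)]   # 0 to 10 min
cp = [1.0] * len(times)
ktrans, ve = 0.25, 0.5                  # 1/min and dimensionless (illustrative values)
ct = tofts_concentration(times, cp, ktrans, ve)
analytic = ve * (1 - math.exp(-(ktrans / ve) * times[-1]))
print(f"numerical {ct[-1]:.5f} vs analytic {analytic:.5f}")
```

Estimating Ktrans and ve from a measured curve would wrap this forward model in a least-squares fit; that step is omitted here.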
The final output of the feature classifier is a pixel-based malignancy prediction score ranging from 0 to 1, termed the Malignancy Attention Index (MAI). Watson Elementary™ constructs a malignancy prediction heatmap in which high MAI values are represented in warm colors. This map is projected onto T2w images, thus anatomically highlighting suspect lesions. Moreover, the MAI pixel values of each manually defined lesion are automatically sorted into a histogram (malignancy prediction histogram). Histogram shape, mean and median MAI have been suggested as PCa biomarkers [23].
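The lesion-level descriptors named above (histogram shape, mean and median MAI), together with the median-to-mean ratio used as a skewness index in the statistical analysis, can be sketched in a few lines of Python. The 10-bin layout and the pixel values are assumptions for illustration; the software's own binning is not described. For a left-skewed histogram (most pixels at high MAI, tail towards low values) the mean is pulled below the median, so the ratio exceeds 1; a right-skewed histogram gives a ratio below 1.

```python
import statistics


def mai_histogram(mai_values, n_bins=10):
    """Count pixel MAI values in n_bins equal-width bins on [0, 1]."""
    counts = [0] * n_bins
    for v in mai_values:
        if not 0.0 <= v <= 1.0:
            raise ValueError(f"MAI must lie in [0, 1], got {v}")
        counts[min(int(v * n_bins), n_bins - 1)] += 1  # MAI = 1.0 goes into the last bin
    return counts


def mai_summary(mai_values):
    """Lesion-level descriptors of the pixel MAI distribution."""
    median = statistics.median(mai_values)
    mean = statistics.fmean(mai_values)
    return {"median": median, "mean": mean, "max": max(mai_values),
            "median_to_mean": median / mean}


# Hypothetical pixel MAI values for two lesions.
left_skewed = [0.25, 0.55, 0.75, 0.75, 0.85, 0.85, 0.85, 0.95]   # mass at high MAI
right_skewed = [0.05, 0.15, 0.15, 0.25, 0.25, 0.25, 0.35, 0.85]  # mass at low MAI

print(mai_histogram(left_skewed))                        # [0, 0, 1, 0, 0, 1, 0, 2, 3, 1]
print(mai_summary(left_skewed)["median_to_mean"] > 1)    # True
print(mai_summary(right_skewed)["median_to_mean"] < 1)   # True
```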

Lesion definition and evaluation of the volume of interest
Target lesions, i.e. lesions that had been selected for MRI-guided biopsy based on their PI-RADS score, were evaluated retrospectively, after PI-RADS scoring and after the histological report of the biopsy. Lesions were manually defined in T2w images (for the transitional zone) and ADC maps (for the peripheral zone) according to the biopsy needle position by a third radiologist (IEP), blinded to the PI-RADS scoring and the histological identity to avoid sampling bias. Data analysis was performed by a medical student (AT) and a radiologist (IEP), both blinded to the PI-RADS scoring and histological identity to avoid bias in lesion classification. Watson Elementary™ allows consecutive ROIs to be defined in 2D planes to form a 3D lesion volume of interest. The system generates a probabilistic heatmap (with the MAI values) for each section, which is projected onto the corresponding T2w image. The predefined ROIs were copied from the imaging series to the probabilistic map, and for each predefined lesion (sum of ROIs) the system generated a feature summary as a .pdf file including (i) a histogram of MAI values, (ii) the average DCE curve and (iii) the lesion volume [26]. User access to the underlying data is possible only for the probabilistic histograms of each region and not for intermediate metadata such as DCE curves, which are only displayed graphically in the .pdf file. Lesions were classified visually by evaluating (i) whether the heatmap of the lesion is distinguishable from the background and (ii) the skewness of the MAI histogram (Fig 3). In this first classification step, observers made no assumption about the histological identity. After classification, the histological identity was unmasked to calculate the specificity and the sensitivity of the software for malignant lesions.

Statistics
The Statistical Package for the Social Sciences (SPSS Version 21, IBM, Armonk, NY, USA) was used for statistical analysis and graphical plotting. Data were screened for normality using the Shapiro-Wilk test. Values are expressed as median/IQR (interquartile range) and rounded to two decimal places unless otherwise stated. Statistical significance was tested using the t-test or the Mann-Whitney rank-sum test for unpaired data. MAI scores between groups were compared with Kruskal-Wallis ANOVA on ranks with Dunn's post hoc test. Linear correlations were tested with the Pearson product-moment correlation coefficient. Continuous probability distributions, as well as the independence of nominal data, were tested with the chi-squared test. ROC curves were calculated for MAI median, MAI mean and the MAI median-to-mean ratio as skewness index, as well as for ADC values and PI-RADS reading scores. Statistical significance was set at P < 0.05.

Automated feature extraction: Detection of the arterial input function
The first step in image processing by Watson Elementary™ is feature extraction. For the DCE kinetic analysis, the tissue signal is normalized to the perfusion curve of the common femoral artery (arterial input function). Arterial detection is semi-automated and has to be confirmed manually. The artery position was correctly defined automatically in 61 (77.22%) patients; in the remaining 18 (22.78%), it had to be reassessed manually.

CAD-sensitivity and specificity
All 104 biopsy cores from 79 patients were scored according to PI-RADS™ in the initial diagnostic dataset (Table 1). Datasets acquired before the establishment of PI-RADS™ v2 were rescored post hoc, so that PI-RADS™ v1 and v2 scores were available for all included cores. As a consequence, switching between PI-RADS versions upgraded or downgraded some lesions. Although PI-RADS 2 lesions were not subjected to biopsy, some were included in the analyzed database: these lesions were biopsied as PI-RADS 3 or 4 according to PI-RADS™ v1 and then downgraded to PI-RADS 2 after the introduction of PI-RADS™ v2 (Table 1).
In the fused MAI/T2w images (Fig 3A-3E, left panels), we evaluated whether the heatmap of our target lesions was visually distinguishable from the background, i.e. whether the lesion could be detected in the MAI/T2w heatmap without the use of ADC maps. Warm-colored ROIs with a characteristic left-skewed MAI histogram were evaluated as classified (detected) lesions (Fig 3A-3C), in sharp contrast to the flat or right-skewed MAI histograms of the non-classified (cold-colored) ROIs (Fig 3D and 3E). In previous work, an MAI-max cut-off value of 0.6 was used as the criterion for malignant lesion classification, based on the assumption that MAI correlates linearly with the Gleason score [23]; this method was not applicable in our study because almost all lesions, regardless of their histogram shape and histological identity, showed a maximum MAI value higher than 0.6 (Supporting information, S1).
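Why a maximum-based criterion saturates can be illustrated with two hypothetical ROIs: because the maximum depends on a single pixel, one hot pixel in an otherwise cold ROI is enough to exceed 0.6. The Python sketch contrasts the MAI-max > 0.6 rule from [23] with a hypothetical median-based rule using the same cut-off. The median rule is purely illustrative; it was used neither in [23] nor in this study, and the pixel values are invented.

```python
import statistics

MAI_MAX_CUTOFF = 0.6  # MAI-max criterion reported in previous work [23]


def max_rule(mai_values, cutoff=MAI_MAX_CUTOFF):
    """Previous criterion: lesion classified if its maximum pixel MAI exceeds the cutoff."""
    return max(mai_values) > cutoff


def median_rule(mai_values, cutoff=MAI_MAX_CUTOFF):
    """Hypothetical contrast: the same cutoff applied to the lesion median instead."""
    return statistics.median(mai_values) > cutoff


# Hypothetical ROIs: a "warm" left-skewed lesion and a "cold" lesion with one hot pixel.
warm = [0.25, 0.55, 0.75, 0.75, 0.85, 0.85, 0.85, 0.95]
cold = [0.05, 0.15, 0.15, 0.25, 0.25, 0.25, 0.35, 0.85]

for name, roi in (("warm", warm), ("cold", cold)):
    print(name, "max rule:", max_rule(roi), "median rule:", median_rule(roi))
# warm max rule: True median rule: True
# cold max rule: True median rule: False
```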
From 47 histologically confirmed malignant (PCa) and 57 benign lesions, 22 PCa and 14 benign lesions were classified (Fig 4A and Table 3). The resulting CAD sensitivity for prostate malignancy in our series was 46.81%, with a specificity of 75.44% and a PPV of 61.11% (Table 4).
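The accuracy measures follow directly from the 2×2 table of CAD classification against histology (TP = 22, FN = 47 − 22 = 25, FP = 14, TN = 57 − 14 = 43). The Python sketch below recomputes them with the conventional definitions; it reproduces the sensitivity, specificity and PPV stated above and gives NPV and false discovery rate values consistent with those in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """2x2 table of CAD classification (classified = positive) against histology."""
    tp: int
    fp: int
    fn: int
    tn: int

    def sensitivity(self):
        return self.tp / (self.tp + self.fn)

    def specificity(self):
        return self.tn / (self.tn + self.fp)

    def ppv(self):
        return self.tp / (self.tp + self.fp)

    def npv(self):
        return self.tn / (self.tn + self.fn)

    def fdr(self):
        return self.fp / (self.fp + self.tp)  # false discovery rate = 1 - PPV


# Counts from the results: 22 of 47 PCa and 14 of 57 benign lesions were classified.
cad = Confusion(tp=22, fp=14, fn=47 - 22, tn=57 - 14)
for name in ("sensitivity", "specificity", "ppv", "npv", "fdr"):
    print(f"{name}: {100 * getattr(cad, name)():.2f}%")
# sensitivity: 46.81%, specificity: 75.44%, ppv: 61.11%, npv: 63.24%, fdr: 38.89%
```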
Whether CAD preferentially classifies malignant over benign cores was tested with a chi-squared test, which indicated independent classification of benign and malignant biopsy cores, albeit with a slight trend towards the classification of malignancy (P = 0.06; Table 3). Furthermore, we asked whether CAD sensitivity leaned towards a particular histological identity, a particular Gleason malignancy grade or a specific benign condition. All malignant lesions (n = 47) were typified as acinar adenocarcinoma of various Gleason grades from 5 to 10. As shown in Fig 4B, CAD sensitivity was not influenced by the malignancy grade (P = 0.713, R = 0.193, Pearson correlation). Among false positives (i.e. benign cores falsely classified) (Fig 4C), 5/14 lesions (35.71%) corresponded to benign prostate …

Malignancy attention index as biomarker
In a previous study [23], MAI was proposed as a potential PCa biomarker and MAI heatmaps/histograms as potential core malignancy profiles (Fig 3).
Median MAI values were selected as the most representative descriptive parameter of skewed histograms. The comparison of all observed (classified and non-classified) benign and malignant biopsy cores indeed revealed a significantly lower MAI score for benign than for malignant lesions (median/IQR 0.39/0.18 versus 0.56/0.24; P = 0.023, Mann-Whitney U test) (Fig 5B).
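The group comparison above uses the Mann-Whitney U test. A minimal Python sketch of the test with the large-sample normal approximation follows. The MAI values are hypothetical, because lesion-level values are not listed here, and the sketch applies neither a tie correction nor a continuity correction, so its p-values may differ slightly from statistics packages that do. U divided by n1·n2 is the probability that a randomly chosen value from the first group exceeds one from the second, which is the same quantity as the ROC AUC.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation.

    U counts pairs (a from x, b from y) with a > b, ties counting 1/2.
    No tie correction of the variance and no continuity correction are applied."""
    n1, n2 = len(x), len(y)
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p from the standard normal
    return u, z, p


# Hypothetical lesion-level median MAI values.
malignant = [0.44, 0.52, 0.56, 0.61, 0.70]
benign = [0.31, 0.35, 0.39, 0.42, 0.47]

u, z, p = mann_whitney_u(malignant, benign)
print(f"U = {u}, AUC = {u / (len(malignant) * len(benign)):.2f}, z = {z:.2f}, p = {p:.4f}")
```

With only five values per group the normal approximation is rough; an exact test would be preferable at such small sample sizes.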
As expected, the MAI score was significantly higher in classified than in non-classified lesions regardless of identity (P < 0.05, Kruskal-Wallis ANOVA on ranks with Dunn's post hoc test). However, classified PCa and benign cores did not show any significant difference in MAI (P > 0.05, Kruskal-Wallis ANOVA on ranks with Dunn's post hoc test) (Fig 5C).
Furthermore, we tested whether the MAI score qualifies as a Gleason grade predictor using Pearson's correlation. Within the 22 classified PCa biopsy cores, MAI did not show any significant correlation with Gleason grade (P = 0.52, R = 0.14, Pearson product-moment correlation) (Fig 5D).
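A quick consistency check on the correlation result: for a Pearson coefficient r computed from n pairs, the test statistic is t = r·sqrt((n − 2)/(1 − r²)) with n − 2 degrees of freedom. The Python sketch below plugs in the reported R = 0.14 and n = 22 and compares |t| with the two-sided 5% critical value of Student's t for 20 degrees of freedom (2.086, taken from standard t tables). It checks the reported non-significance and is not a re-analysis of the data.

```python
import math


def pearson_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


T_CRIT_DF20 = 2.086  # two-sided 5% critical value of Student's t, 20 df (standard table)

t = pearson_t(0.14, 22)
print(f"t = {t:.2f}, significant at 5%: {abs(t) > T_CRIT_DF20}")  # t = 0.63, significant at 5%: False
```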
ROC analysis of median and mean MAI as malignancy predictors revealed rather poor results, with an area under the curve ± standard error of the mean (AUC±SEM) of 0.63±0.06 (95% CI 0.52-0.74, P = 0.02) for MAI median and 0.64±0.06 (95% CI 0.53-0.75, P = 0.02) for MAI mean. The predictive outcome of the median/mean ratio as skewness index was not significant (… Table 5). By setting an optimized cut-off point for MAI mean, however, we could improve the sensitivity and specificity to 70.21%/61.4% (95% CI 55.11-82.66% and 47.57-74%, respectively) (Fig 6 and Table 5). Moreover, ADC values alone showed stronger predictive behavior than the software-calculated MAI, with AUC 0.79±0.05 (95% CI 0.70-0.88; P = 0.04 compared with MAI, chi-squared test) (Fig 6). Guided by the hypothesis that Watson Elementary™ might be more specific for particular lesion locations and sizes, we tested for a possible predilection towards the peripheral or the central zone of the prostate gland. In the transitional zone, CAD classified 10 of 25 histologically confirmed PCa lesions, and in the peripheral zone 12 of 22, indicating no apparent influence of zone on performance (P = 0.481, chi-squared test). Lesion volume, however, had a significant influence on CAD performance. Among lesions smaller than 0.5 ml (Fig 7A), the vast majority was not classified (sensitivity 27.27%, FNR 31.37%). For intermediate-size cores of 0.5-1.0 ml, CAD performance improved (sensitivity 53.33%, FNR 18.42%), and false negatives were minimized for lesions larger than 1.0 ml (sensitivity 80%, FNR 13.33%). As expected, lesion volume was independent of Gleason grade (R = 0.18, P = 0.15, Pearson's correlation).
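The volume-stratified figures can be reproduced from a small table of counts. These counts are not reported in the text; they are back-calculated from the stated percentages and totals (51 lesions smaller than 0.5 ml, i.e. 49.04% of 104; about 86% smaller than 1.0 ml; 47 PCa; 22 true positives) and should be treated as an assumption. The reconstruction suggests that the FNR values quoted in this paragraph correspond to false negatives as a share of all lesions in the volume stratum, rather than the conventional FNR = FN/(TP + FN) = 1 − sensitivity; the sketch prints both.

```python
# Per-stratum counts back-calculated from the reported percentages (assumption):
# stratum -> (all lesions, histologically confirmed PCa, CAD-classified PCa = TP)
STRATA = {
    "< 0.5 ml": (51, 22, 6),
    "0.5-1.0 ml": (38, 15, 8),
    "> 1.0 ml": (15, 10, 8),
}

results = {}
for name, (n_lesions, n_pca, tp) in STRATA.items():
    fn = n_pca - tp
    results[name] = {
        "sensitivity": 100 * tp / n_pca,
        "fn_over_all_lesions": 100 * fn / n_lesions,  # matches the reported "FNR" values
        "fn_over_pca": 100 * fn / n_pca,              # conventional FNR = 1 - sensitivity
    }
    print(name, {k: round(v, 2) for k, v in results[name].items()})
```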
It is worth noting that minimizing false negatives (FN) is more critical than minimizing false positives (FP), because the therapeutic consequence of an FN is an undiagnosed PCa, whereas that of an FP is an unnecessary biopsy. In this context, CAD performance is satisfactory for lesions larger than 1.0 ml (Fig 7A). In Fig 7B, the MAI score of classified and non-classified cores is plotted against lesion volume. There is a trend towards a positive correlation between lesion size and MAI score for TP lesions (P = 0.057, Pearson's correlation) but not for any other category (TN, FP and FN, P > 0.1, Pearson's correlation). We questioned the clinical significance of lesions smaller than 0.5 ml, which make up 49.04% of our database. Interestingly, the malignancy incidence was identical in lesions smaller than 0.5 ml and in larger ones (Fig 7Ci and 7Ciii), with an approximately 43% probability of malignancy in both groups. Moreover, within malignant lesions, we observed comparable PCa aggressiveness in terms of Gleason grade (Fig 7Cii and 7Civ), with high-grade cancers equally likely in small and larger lesions.
In summary, MAI is a weak PCa biomarker, especially for lesions smaller than 0.5 ml in either the transitional or the peripheral zone, regardless of lesion aggressiveness.

Discussion
This study aims to emphasize the growing need for commercialized prostate mpMRI CAD software tools in radiological, and perhaps urological, practice. By retrospectively testing 104 lesions (47 malignant, 57 benign) in a series of 79 patients, a commercialized prostate CAD, Watson Elementary™, revealed a sensitivity of 46.81% for prostate malignancy, with a specificity of 75.44% and a PPV of 61.11%. Our results differ considerably from previous reports on the same software. Roethke et al. [23] tested Watson Elementary™ in a cohort of 45 patients with 1102 MR/TRUS-acquired biopsy cores (76 malignant/1026 benign) and achieved a sensitivity of 85.71% and a specificity of 87.50% when setting an optimal MAI mean cut-off threshold for malignancy detection. By setting an optimized cut-off value of MAI mean in our study, we could improve the sensitivity and specificity to 70.21%/61.4%, which is inferior to the previously reported values but comparable in terms of methodology [23]. This differs considerably from previous promising studies that established custom-made software tools for mpMRI analysis with high stand-alone accuracy for malignancy detection [33][34][35][36]. Litjens et al. [26] achieved a stand-alone accuracy of AUC = 0.89 in a remarkably large database of 347 patients. The discrepancy in outcome between our series and previous testing of the same software [23] could be attributed to a variety of causes.
Fig 5 (caption fragment): … ANOVA on ranks, * P < 0.05 Dunn's post hoc test. (D) No significant correlation between MAI and Gleason grade, P 0.522, N 22, R 0.144, Pearson's product moment correlation; n.s., non-significant. https://doi.org/10.1371/journal.pone.0185995.g005
In terms of methodology, a previous study applied a combination of systematic and MRI-guided transperineal biopsies, ending up with more biopsy cores per patient (approximately 25) than the current study, which was based exclusively on MRI-guided transrectal biopsies (approximately 2 cores per lesion) [23]. Nevertheless, regardless of the number of biopsies per patient or lesion, both studies define a confirmed lesion by at least one positive biopsy. Roethke et al. [23] report their results as patient-based percentages, in contrast to our study, which is needle-based. Taking the different reporting methods into account, our results are technically comparable with the needle-based results of Roethke et al., i.e. sensitivity/specificity 54.67%/97.76%.
Another methodological difference that might have contributed to the discrepancy from previous work is that Roethke et al. implemented an MAI-max cut-off value of 0.6 as the criterion for malignant lesion classification [23]. This method, though more objective than visual classification, was not applicable in our study because almost all lesions showed a maximum MAI value higher than 0.6 (Supporting information, S1).
A prerequisite for high CAD accuracy is training the classifier on a database with characteristics similar to those of the testing database [13]. An important limitation of this study is the lack of interaction with the classifier of the commercially available CAD software tested [23]. Although the classifier was trained on scanner data with the same (3T) field strength, factors such as different technical characteristics, coils, static magnetic field inhomogeneities and protocols for resonance frequency adjustment lead to contrast differences that could be large enough to affect the outcome. In a thorough review by Wang et al. [13], numerous studies with databases of between 15 and 100 patients were compared, not only in terms of performance but also in terms of the analyzed modalities, field strength, ground truth, method for candidate lesion generation and applied classifier, revealing broad heterogeneity. The use of different receiver coils, such as an endorectal coil [33,34], further increases methodological variation.
The lesion volume should be considered an independent factor. The only exclusion criterion in the current study was technical data incompatibility with Watson Elementary™. In ca. 86% of the cases, the sampled lesion's volume was smaller than 1.0 ml, and in 49% smaller than 0.5 ml, in keeping with early PCa diagnosis. Watson Elementary™ showed promising performance only in lesions larger than 1.0 ml, which might explain differences from previous studies whose volume inclusion criteria may have differed. In the studies of Roethke et al. [23,41], lesions smaller than 0.5 ml were not considered clinically significant cancer, in line with the ESUR guidelines 2012 [17]. However, the lack of a methodological definition of volume selection criteria does not allow for a more detailed comparison. It is remarkable that both the malignancy rate and the Gleason grade of smaller, "clinically insignificant" lesions do not differ from those of larger lesions, as shown in Fig 7C. This result supports the existing body of evidence that small cancers can significantly affect a patient's outcome and encourages biopsy and treatment according to guidelines [42].
Furthermore, the lack of access to whole-mount prostate pathology was a limitation of this study with possible influence on the results. The classifier of Watson Elementary™ has been trained to achieve congruence with the malignancy grade, i.e. the Gleason grade [23]. Nonetheless, MAI did not significantly correlate with the pathological outcome, which is the Gleason score in our database. Inter-observer differences in Gleason grading between the training and the testing database could contribute to this discrepancy, albeit minimally, as previous studies have shown negligible inter-observer variation, mostly at the upper and lower limits of the Gleason scale, namely 4 and 8-10 [43,44]. Another possible source of variability may lie in the classifier's training on whole-slide pathology specimens, whereas grading in the current study was based on needle biopsies [45,46]. Moreover, the unequal sample distribution, with the majority of patients showing Gleason 6 or 7 (3+3 or 3+4) at the time of diagnosis, may bias Pearson's correlation coefficient downwards [47]. However, such an unequally weighted Gleason distribution is the typical occurrence pattern in population screening and should be taken into account in the training process of a detection method [48,49].
ROC analysis of observer performance reveals lower accuracy in our study than in previously published results on PI-RADS [16,41]. In a recent study, Kasel-Seibert et al. compared the performance of PI-RADS version 2 with version 1, showing high accuracy (AUC 0.88 and 0.91 for v1 and v2, respectively) in the hands of experienced MRI readers [50]. However, considerable differences in methodology and database selection should be taken into consideration. Patients with PI-RADS 1 or 2 were not subjected to biopsy and were therefore not included in our study. The majority of PI-RADS 3 patients were also not biopsied in our hospital and were therefore excluded from this study. The limited number of included PI-RADS 2 and 3 lesions derived mostly from downgrading of PI-RADS v1 score 4 lesions in the re-evaluation after the introduction of PI-RADS v2. By contrast, Kasel-Seibert et al. [50], as well as other previous studies [26], included PI-RADS 3 lesions in their official selection criteria. Another considerable source of bias, as the authors also acknowledge, is the patient selection criteria (systematic biopsy was not performed) and the relatively low (29%) malignancy rate within the selected population [50]. In the current study, almost all patients had undergone an inconclusive or negative systematic biopsy, and the malignancy rate was ca. 45%, thus considerably higher than in previously published data [50].
Recently, studies that evaluate the role of CAD in improving radiologists' performance have been gaining ground on those evaluating stand-alone performance, such as our study [51,52]. Large-scale approaches (n = 89 [51] and n = 107 [52]) implemented different methodologies and showed that CAD implementation improved radiologists' sensitivity from 80.9% to 87.6% [51]. CAD-combined reading also improved accuracy in differentiating benign from malignant lesions (AUC 0.81 without versus 0.88 with CAD) and indolent from aggressive lesions (AUC 0.78 versus 0.88) [52]. In both studies, the CAD classifier had been previously established and trained on a comparable database, in contrast to our study, where interaction with the classifier was not possible.
This study shows that a carefully designed commercialized CAD software (Watson Elementary™) does not perform satisfactorily when tested with different instrumentation and imaging configuration, despite using almost double the number of patients compared with previous studies [13,23,33,34,35]. The lack of whole-mount prostate pathology, the low number of PI-RADS 2 and 3 lesions and the challenging character of the database, which includes small lesions not necessarily considered significant in previous studies, should be considered possible limiting factors. It is worth mentioning that the scanning parameters applied in our department fulfill the recommendations for diagnosis defined by the American College of Radiology in PI-RADS™ v2. In line with previous observations reviewed by Wang et al. [13], the results of this study suggest that overly optimistic CAD performances might be dataset-bound. Altogether, while being in the right framework, the tested software is not yet satisfactory. A necessary requirement of CAD software is the ability to generalize to different scanning settings. A larger, ideally multicenter pool of datasets for broader and possibly interactive classifier training should be implemented to improve the general applicability of CAD systems [13].