Incremental Learning with SVM for Multimodal Classification of Prostatic Adenocarcinoma

Robust detection of prostatic cancer is a challenge due to the multitude of variants and their representation in MR images. We propose a pattern recognition system with an incremental learning ensemble algorithm using support vector machines (SVM) tackling this problem employing multimodal MR images and a texture-based information strategy. The proposed system integrates anatomic, texture, and functional features. The data set was preprocessed using B-Spline interpolation, bias field correction and intensity standardization. First- and second-order angular independent statistical approaches and rotation invariant local phase quantization (RI-LPQ) were utilized to quantify texture information. An incremental learning ensemble SVM was implemented to suit working conditions in medical applications and to improve effectiveness and robustness of the system. The probability estimation of cancer structures was calculated using SVM and the corresponding optimization was carried out with a heuristic method together with a 3-fold cross-validation methodology. We achieved an average sensitivity of 0.844±0.068 and a specificity of 0.780±0.038, which yielded superior or similar performance to current state of the art using a total database of only 41 slices from twelve patients with histological confirmed information, including cancerous, unhealthy non-cancerous and healthy prostate tissue. Our results show the feasibility of an ensemble SVM being able to learn additional information from new data while preserving previously acquired knowledge and preventing unlearning. The use of texture descriptors provides more salient discriminative patterns than the functional information used. Furthermore, the system improves selection of information, efficiency and robustness of the classification. The generated probability map enables radiologists to have a lower variability in diagnosis, decrease false negative rates and reduce the time to recognize and delineate structures in the prostate.


Introduction
The most frequently diagnosed cancer for men in Germany, the UK and the US is prostatic adenocarcinoma [1][2][3]. New imaging techniques in MRI and their integration [4] have been developed with the aim of providing more information to radiologists, to improve visual interpretation and to formulate a better staging of prostate cancer. However, radiologists who perform clinical routines have to face a variety of MRI modalities, analyze a considerable amount of data and provide their evaluation. This routine requires substantial human interaction, and due to the challenging nature of interpreting and staging prostate cancer, data may be filtered cognitively or disregarded. Variations and inconsistencies in reports exist. MR imaging of the prostate has become so advanced that it presents a ''dizzying array of data'' for interpretation [5].
Various MR imaging modalities have been studied to improve visual inspection and diagnosis of cancer. MRI Diffusion-Weighted Imaging (DWI), MRI Dynamic-Contrast Enhanced (DCE) and MR-Spectroscopy [6][7][8][9][10][11] have been investigated. The general consensus is that combination of modalities increases the prediction accuracy of the diagnosis. However, the inter-and intra-variability in diagnosis and the significant amount of information to analyze have motivated the development of automated tools aimed at achieving a higher cancer detection rate and reducing clinical processing time simultaneously.
To reduce variability and inconsistency, automatic approaches for detection of prostate cancer using 1.5T multimodal imaging have been developed for in vivo studies on a per-voxel basis. The utility of multimodal imaging from coronal plane with 3D features (e.g. 3D co-occurrence matrix and a discrete cosine transform) and fisher linear discriminant (FLD) has been demonstrated using computer vision techniques [12]. The performance and robustness of second-order and first-order statistical discriminants alongside feature ensemble and random forest methods have also been enhanced [13,14]. Statistical features and MR Spectroscopy have been used together with a multikernel graph embedding method demonstrating the contribution of multimodality classification [15]. The implemented second-order statistical approaches in these studies are based however on angular dependent features.
Haralick [16] suggested a method invariant under rotation but the disadvantage is the computational cost involved, both in terms of memory requirement and computational time [17,18]. We implemented an angular independent approach based on [17], integrating features with texture proprieties close to the human perception mechanism, which was also used by other medical applications [19]. We integrate features by considering the relationship between an element and all its neighboring elements at one time instead of one direction at a time, an approach based on [18]. In Madabhushi [13] a Gabor operator was employed to extract frequency signatures. However, the contribution of this method negatively affect the classification results. Other methods as local phase quantization (LPQ) [20] and local binary pattern (LBP) [21] have been utilized to quantify textural properties. Ojansivu [22] introduced a rotation invariant local phase quantization (RI-LPQ) to extract texture information. RI-LPQ outperformed Gabor operator as well as local binary patterns (LBP). Thus, RI-LPQ was implemented to identify frequency signatures.
Detection feasibility of cancer using support vector machines (SVM) and MR images (T2-weighted, apparent diffusion coefficient (ADC) derived from DWI and wash-in-rate (k ep ) derived from DCE) has been demonstrated [23][24][25]. Moreover, the performance of SVM was enhanced in comparison with other classification methods for the recognition of prostate cancer using different MRI modalities (T2-weighted, DWI-ADC and pharmacokinetic parameters derived from DCE) [26,27].
SVM performs well when the data set is limited in size and nonlinearly distributed. However, as is the case for any supervised classifier, performance and precision rely on the availability of a considerable data set size of correctly identified observations. Generally, the studies using SVM to learn from new data involves discarding the existing classifier, aggregating the new data to the old set and training a new classifier from scratch. As an effect, these methods result in unlearning [28], which is the inability of the system to learn new patterns without forgetting previously learned ones. Not forgetting old or underrepresented patterns is important for the recognition of prostate cancer because of the different possible features of cancer and its overlap with noncancerous or normal structures. Such unlearning can be addressed through an incremental learning algorithm, defined as one that meets the following criteria [29,30]: 1. be able to learn additional information from new data. 2. not require access to the original data used to train the existing classifier. 3. preserve previously acquired knowledge.
The main contribution of our framework is the implementation and evaluation of an incremental learning algorithm based on [30], which uses an ensemble algorithm inspired from Freund and Schapire's adaptive boosting algorithm (Adaboost.M1) [31]. A random subspace method [32] was also implemented for the selection of features. Biological studies have shown that ensemble classifiers with SVM outperform single and balanced SVM [33][34][35][36]. Ensemble methods are one of the most promising solutions to many biological problems and together with SVM, as base classifier, further improvement in recognition of prostate cancer can be achieved. In this study we present a pattern recognition system to estimate the probability of prostate cancer using a multimodal data set from a real clinical environment.

Patient Data Set
The institutional review board (Institute for Clinical Radiology and Nuclear Medicine, University Medical Center Mannheim, Heidelberg University) approved this retrospective study. Patient information was anonymized and de-identified prior to any analysis. We used data of 12 patients from a clinical environment (full spectrum) with ages between 52 and 75 years, Gleason pattern of 3 and 4 with Gleason scores between 6 and 8. The data set contained in most cases other glandular non-cancerous prostate conditions like prostatitis, benign prostatic hyperplasia and prostatic atrophy.
The patients did not undergo any treatment before the acquisition of MR images. Treatments such as hormone therapy, previous biopsy or brachytherapy [12,37,38] tend to alter intensity values of the images, resulting in a drastically diminished ability to determine a threshold between cancer and benign intensity values.

Acquisition of the MRI Data Set
The data set included MRI transversal slices with three modalities: T2-weighted, Dynamic-Contrast Enhanced plasma flow (DCE-PF) and DCE mean transit time (DCE-MTT). The three MRI modalities were acquired with a mean time of 45 minutes without interruption. This process aids to minimize tissue deformation and motion artifacts between the different MR images. A 3.0T scanner (Magnetom Trio; Siemens Healthcare, Erlangen, Germany) was used with a body phased array coil combined with an endorectal coil (Medrad Medical Systems, Volkach, Germany). The T2-weighted images were acquired with the following parameters: TR/TE = 4000/101 ms; Size of matrix = 3206320; echo-train length (ETL) = 25; Pixel bandwidth = 200 Hz; Field of view (FOV) = 20 cm; Pixel spacing = 0.625/0.625 mm; Slice thickness = 3 mm; Sequence type = TSE. The acquisition parameters for DCE were: TR/ TE = 193/1.02 ms; Size of matrix = 3846388; Field of view (FOV) = 39 cm; Pixel spacing = 1.015/1.015 mm; Slice thickness = 6 mm; Sequence type = 2D TurboFlash T1; Flip angle = 9. The parametric maps, plasma flow and mean transit time were calculated using the open source perfusion analysis tool UMM-Perfusion [39]. The DCE examination was performed after bolus injection of 0.1 mmol/kg body weight of gadolinium chelate (Dotarem; Guerbet, Roissy CdG Cedex, France) with a bolus velocity of 2.5 mL/s using a power injector (Medrad Inc, Pittsburgh, PA), followed by a saline flush of 40 mL.

Histological Data Set
The group of patients had histologically confirmed cancer. The elapsed time between the acquisition of the MR images and the prostatectomy was less than 24 hours which minimizes tissue deformation artifacts, and low correlation between the pathological reports and the MR images due to anatomical movements [37,40]. The histological report consists of different prostate cuts starting from the apex, middle and basis prostate region, separated in the right and left gland, serially sectioned according to a modified Stanford protocol [41]. A pathologist marked the corresponding cancer regions on each of the corresponding cuts.

Description of the System
This subsection outlines the components of the system. In Figure 1 the workflow of the pattern recognition system is shown. The two main phases of learning and classification are indicated in the figure. The pre-processing and feature extraction blocks were used in both phases. The datasets and the corresponding code are freely available upon request.
1. Ground Truth Labels: Labels of ground truth (GT) are necessary for the training and the evaluation of a supervised classifier. An experienced radiologist traced cancer regions on MR images, section by section, with the marked regions by a pathologist. A total database of 41 slices with about 70 cancerous ROIs from all twelve patients was used as ground truth. 2. Pre-Processing: Three steps were involved, a B-Spline interpolation for re-sampling of data, in which each pixel in different modalities corresponds to the same spatial location, a bias field correction using a variation of the nonparametric nonuniform intensity normalization algorithm (N3) and a landmark-based intensity standardization to minimize the intensity variation between slices. 3. Feature Extraction: Structural and functional information corresponding to T2-weighted, DCE-MTT and DCE-PF imaging was used, as well as first-order statistical descriptors (median, standard deviation, skewness, uniformity and average entropy), second-order descriptors using a neighboring gray level dependent matrix (NGLDM) and a neighborhood graytone difference matrix (NGTDM). To extract frequency signatures a rotation invariant local phase quantization (RI-LPQ) was implemented. 4. Training: The construction of a model to automate and formalize the pattern recognition process was implemented using a pixel-based supervised learning technique employing an ensemble SVM with the corresponding extension to the non-separable case [42]. 5. Classification and Probability Map: The last step in the inductive inference process was the prediction/classification using the optimal model obtained after training. The posterior probability was also estimated to present the corresponding probability map.
6. Evaluation: The results were evaluated on accuracy, precision and efficiency as well as statistical significance. A methodology was also implemented to assess the results using area under the receiver operating characteristic curve (AUC-ROC) and to achieve an optimal generalized answer.
In the following subsections, we describe in detail the individual steps of our system.

Ground Truth Labels
Ground truth (GT) labels for the learning phase was extracted using the annotations of an expert radiologist following the histological reports of a pathologist. The same process has been used in many investigations for the generation of GT [12,23,43] in cases where the conditions of clinical data are not suitable to perform an automatic registration. An expert radiologist can recognize landmarks such as morphology of central and peripheral zone (PZ) of the prostate, urethra, seminal vesicle, calcifications, etc. These landmarks help to recognize and trace benign and malignant regions in MRI. However, delineation still remains a difficult task for a human observer. Therefore, considering the recommendation of [44], a methodology for manual segmentation to minimize variation in the delineation process was followed (see Fig. 2): 1. Tracing: Regions of interest (ROI) were traced using Osirix software [45]. The ROIs were annotated on the T2-weighted images because of its high resolution and to facilitate anatomy recognition. 2. Magnification factor: We used a fixed factor (1:800) for all data sets to annotate the ROIs. 3. Window/level value: The window width and center in the T2weighted images were fixed depending on the minimal and maximal gray value in each volume. 4. Peripheral zone (PZ): Only areas inside the PZ were assessed.
The radiologist performed the corresponding annotation of this zone on T2-weighted images. We gave special attention to the PZ not only to reduce high bias produced by non-cancerous prostate conditions such as benign prostatic hyperplasia located in the central zone [4,46,47], but also the PZ constitutes over 70% of the glandular prostate [46] and 65%-74% of the prostate tumor nodules are located in this zone [48].
After the annotations by the radiologist on T2-weighted images, the corresponding masked images constituting the ground truth were extracted. One precondition to minimize mislabeled pixels in the ground truth was the selection of a minimum size of cancer. Haider [40] pointed out that tumor sizes greater than 0.13 cm 2 are sufficient to detect significant cancer areas. Therefore sizes of cancer larger than 0.1 cm 2 were employed in this study.

Pre-Processing
During image acquisition, an affine transformation matrix was recorded describing the spatial location (translation, direction and rotation) of each data set. Based on the transformation, a spatial re-sampling of the data sets DCE-MTT, DCE-PF was performed using ITK Toolkit [49] to obtain a new image in the same spatial space as the T2-weighted data. B-Spline interpolation was employed in the re-sampling to improve registration.
The PZ of the prostate suffers from a corruption of intensity values. To correct bias field artifact a variation of the popular nonparametric nonuniform intensity normalization (N3) algorithm [50,51] was used. The algorithm, named ''N4ITK'', connects a robust B-Spline algorithm with a multiresolution optimization which demonstrated a superior performance [52]. The parameters were set following the recommendation of [53]. Figure 3 illustrates an example of an image with the calculated bias field superimposed.
Another bias factor in MRI is non-standardness between slices. To overcome intensity variation a landmark-based method [54,55] was applied because of its efficiency and accuracy [56]. The basic premise of a landmark-based method is to find a mapping based on the median and percentiles that deforms the histogram of an input image to match a reference. The mapping is performed linearly and independently per segments using three landmarks.
The landmark-based method consisted of two steps. First, a training step was executed only once for a given modality. The histogram H j was computed for each slice j~1,:::,S in a volume, where S is the total number of slices. The median intensity (M j ) and the intensities corresponding to the 0 and 99.8 percentile (P 1 j ,P 2 j ) of H j were computed (see [54] for details). After that, the median was mapped to the standard scale (R 1 ,R 2 ) (equation 1).
Finally, the mean value ( M M) of mapped M' j medians of each slice was computed.
The second step was the transformation of the intensity. In the transformation step, again the median intensity (M j ) and intensities corresponding to the 0 and 99.8 percentile (P 1 j ,P 2 j ) of H j were calculated. The intensity values for each pixel x were mapped to a new value x': Finally, the set of images were normalized using z-scores such that intensity values have a mean of 0 and standard deviation of 1.

Feature Extraction
A total of 32 features corresponding to structure (T2-weighted images (f 1 ) ), functional (DCE-MTT (f 2 ), DCE-PF (f 3 ) images) and statistical texture features were used to describe the appearance and shape of the intensity distributions of prostate structures. Texture information was extracted from the structural T2-weighted images.
Madabhushi [13] mentioned the importance of first-and second-order statistical features to improve the classification results. We consider some additional first-order descriptors of the intensity histogram based on [57]. We computed five operators with two windows sizes (k [ 3,5 f g) to get information on the median intensity (f 4 ), dispersion (f 5 ), skewness (f 6 ), smoothness (f 7 ) and variability (f 8 ) using the following operators respectively: 4. Energy (f 7 ) The operators were computed for each pixel x in all slices of a volume. N x contains the k-neighborhood pixels centered on x and x is the mean of N x . p(N x ) refers to the probability distribution of surrounding pixels.
Second-order statistical features were extracted using two angular independent approaches, a neighborhood gray-tone difference matrix (NGTDM) and a neighboring gray level dependent matrix (NGLDM).
In NGTDM, the matrix NGT is set using the equation 6, The array R x(i) contains all pixels with gray value x(i), R is the number of pixels in R x(i) .
M M x(i) is the mean value over a neighborhood centered at, but excluding x(i). Five operators were computed with the NGT matrix (Coarseness (f 9 ), Contrast (f 10 ), Busyness (f 11 ), Complexity (f 12 ) and Texture Strength (f 13 )) [17]. These operators quantify the difference between regions with different intensity levels, the spatial frequency of intensity changes from one pixel to its neighbor and the magnitude of the difference between two intensity values, weighting the difference with their probabilities of occurrence.
For NGLM, equation 7 describes the implementation of matrix Q. For a given image of size ½n,m with pixel x(i,j),i~1,2,:::,n and j~1,2,:::,m, there is a matrix Q(c,r), where c is the intensity value x(i,j) and r the number of repetitions of c.
The neighborhood was defined by d~tk=2s[Z z , where k is the windows size (k [ 3,5 f g). The points (q,v) are the coordinates to localize the pixel x[N x . The symbol # denotes the number of elements in the set, r((i,j),(q,v)) is the distance between the elements (i,j) and (q,v), and a[Z z defines the similarity between two pixels (a = 10). Three operators were computed using the Q matrix (Number Nonuniformity (f 14 ), Second Moment (f 15 ) and Entropy (f 16 )) [18].
To extract frequency signatures, a rotation invariant local phase quantization (RI-LPQ) implemented by Ojansivu [22] was integrated into the system (the Matlab implementation is available at the Outex site http://www.outex.oulu.). It is based on the blur invariance property of the Fourier phase spectrum. The phase information was extracted from a local k-neighborhood N x with k = 3, centered on a pixel x of a subimage I(x). The local spectra was computed with a short-term Fourier transform (STFT) defined by equation 8.
Where u is the frequency, and w G is a circular Gaussian function. A 2-D rotation matrix (R h ) with an angle h was used to rotate the images and the circular local neighborhood (36 different angles were generated). The local Fourier coefficients were computed and the phase information was recorded by observing the sign of the imaginary parts of the coefficients' vector. The resulting values were normalized to a range of 0 to 15. A RI-LPQ histogram of these values with 16 bins was composed and used as 16 additional features (f 17 to f 32 ) for each pixel in the classification.

Training
The advantage of an incremental learning algorithm lies in the ability to incrementally learn additional information from new data when it becomes available. We implemented an ensemble algorithm based on [30]. The ensemble classifiers were trained based on a dynamically updated distribution over the training data points, in which points that are difficult to classify receive a higher probability to increase their chance of being selected into the next training data set. The algorithm uses a database DB k ,k~1,:::,K where K corresponds to the number of patients available (in our case 12 patients). The data points of all patients were first randomly permuted and split into K = 12 equal-sized parts.
As base classifier the non-separable case formulation of SVM was employed, which will be referred to 1C-SVM. The aim of training using 1C-SVM was to construct an optimal hypotheses h, separating two classes and maximizing the distance to the closet point from either class [58], which is robust against model misspecification and provides a high predictive power. The main concept is finding an optimal hyperplane f (x)~w(x i ) T wzw 0 in the feature space using the formulation in equation (9).
where w(x i ) maps x i into a higher-dimentional space, C is a regularization hyper-parameter that penalizes misclassification. j is a slack variable to relax the constraints and is called margin error or misclassification error. The equation (9) can be solved with a quadratic programming solution using the lagrangian dual objective functions [59]. By using the dual objective function is possible to use the kernel trick K(x i ,x j ), which maps data into higher dimensions in order to handle non-linear relationships between patterns. The mapping to N a higher dimensionality also permits the capture of similarities between data points. A Gaussian Radial Basis Function (RBF) was used as a kernel, given by equation (10).
c is a hyper-parameter which changes the smoothness of the kernel function i.e. a greater or lesser relationship between the data points is found depending on the value of c.
To estimate the optimal hyper-parameter c and C for 1C-SVM the heuristic method Nelder-Mead was used [60,61]. The selected criteria to find the optimal classifier was the area under the receiver operating characteristic curve (AUC-ROC) which is commonly used in medical decision making, and is useful for organizing classifiers and used increasingly in machine learning and data mining [59,62].
The inputs to the ensemble algorithm are: 1. training data S k~( x i ,y i )Di~1,:::,N k f g . It consists of N k training data points with x i [R d being the d-dimensional features and y i [ {1,1 f g indicating the corresponding class. The N k data points are randomly selected from the k th database DB K ; 2. a base classifier to generate a hypotheses h. A requirement on the base classifier is that it obtains a 50% correct classification performance on its own training data set. 3. an integer T k specifying the number of iterations t~1,2,:::,T k to be generated for each data set. T k~3 were employed, which was enough to reduce the prediction error.
The ensemble algorithm starts by initializing a set of weights a t for the training data S k , and a distribution D t obtained from a t . According to D t the training data S k is divided into a training subset TR t and a test subset TE t at the t th iteration of the algorithm. Without a priori information the distribution of weights D t is initially set to be uniform. At each iteration t, the weights adjusted at iteration t{1 are divided by the sum of all w t{1 to ensure a legitimate distribution and a new D t is obtained. Training and test subsets are drawn randomly according to D t , and the SVM is trained using tenfold CV method. A hypothesis h t is obtained as the t th classifier, whose error e t (11) is computed on the entire data set S k~T R t zTE t .
The error e t is required to be less than 0.5 to ensure a reasonable performance of h t . If the condition is satisfied, h t is accepted and the error is normalized to obtain the normalization error (12) If the condition is not satisfied, then the current h t is discarded, and a new training subset is selected. All hypotheses generated so far are then combined using the weighted majority voting to obtain the composite hypothesis H t (13), which allows efficient incremental learning capability when new classes are introduced. The hypothesis with good performance are awarded a higher voting weight.
The error of the composite hypothesis H t is computed with (14), and must also be less than 0.5 to ensure a reasonable performance of H t , otherwise the algorithm discards the composite hypothesis and returns to select a new training subset.
The normalized composite error is then computed with (15).
To reduce the weights of those data points that are correctly classified by the composite hypothesis H t and lower the probability of being selected in the next training subset, the rule of equation 16 is used.
The hypothesis H F for the training subset and the subset of features can be obtained by combining all hypotheses that have been generated so far using the weighted majority voting rule (17).
The features for use in the classifier was performed with a pairwise correlation using the spearman's coefficients [63] avoiding multicollinearity. Features with a correlation coefficient lower than 0.60 were integrated in the pattern recognition system. Furthermore, a sequential forward methodology, leaving out the features that did not decrease significantly the prediction error, was employed.
Another method for the selection of features is a generalization of the random forest algorithm, known as random subspace ensemble [32], which perform better when the features have high correlation. A random subspace of incremental learning SVM for each training subset N k was implemented. The method randomly selected different subsets of features to train each classifier. Given the training subset N k , a subspace F was chosen with N F features. We employed F~30 and N F~5 0% of the total number of features. All subspaces F of incremental learning SVM were trained and combined using a weighted majority voting rule. To implement a random subspace the equation 17 was redefined as follow: The normalized error B F (19) was calculated with the mean square error (mse) of the hypothesis H F . The final hypothesis H final is computed with weighted majority voting (20).

Classification and Probability Map
The classification output of the SVM are decision values indicating the distance of data points to the hyper-plane without an estimation of how likely a data point belongs to a class, which is more important in a medical routine. We applied an extension of LIBSVM [42], which is an improvement of Platt [64], to give a probability estimation of cancer based on Lin et al. [65] and Wu et al. [66]. The extension proposes approximating the posterior probability by a sigmoid function using decision values and labels of each data point. Finally, the probability map estimation was saved in DICOM files for posterior analysis and diagnosis by the physicians.

Evaluation
To achieve an objective evaluation, the following methodology was implemented. The data set was divided into three subsets: training, validation and test. The aim of the validation subset was to make a fair estimation of performance, independent of the test data, and to pick optimal parameters that maximize the performance of the validation subset which in turn provides a more generalized solution.
We implemented a nested cross-validation (CV) method for an unbiased estimation of prediction error in an independent test data. As used in [14,15] threefold CV was used for the evaluation of the classifier, i.e. 8 patient studies were used for training and validation and 4 patient studies for independent testing. To estimate the unknown tuning parameters of the classifier tenfold CV was used with data points from the 8 studies selected for training and validation. The data points were randomly permuted and divided in 10 parts, one part for the validation and nine for training. The AUC of the receiver operating characteristic curve (ROC) was used as performance measure which provides a representation of the probability that a randomly chosen disease subject is correctly ranked with greater suspicion than a randomly chosen non-disease subject [67].
Since in other approaches other statistical measures of performance for binary classification were used, we also computed statistical measures, such as the sensitivity (TPR) and specificity (SPC) (21,22) as complementary information. To obtain binary classification a thresholding procedure is used. We selected a threshold T h such that a data point i is classified as cancerous if the posterior probability p(y i =x i ) §T h , otherwise is classified as noncancerous. Several threshold values T h were selected and the balanced accuracy (BAC), which is defined as the mean of sensitivity and specificity, was computed for each of the threshold values. The selected T h is the one which produced the maximum BAC value. The use of TPR and SPC yielded additional information about the performance of the classifier and allowed comparison of our results with other approaches. Additional measures like the Positive Predictive Value (PPV) (23) and the area under the PPV-TPR curve (AUC-PPV/TPR), known in machining learning as Precision-Recall curve, were also computed to give further information about performance on the positive class (cancer structure). PPV is also significant because it is also related with a priori chance of extra-capsular disease [37].
PPV~T P TPzFP ð23Þ TP and TN denote the number of true positive and true negative data points, respectively. FP and FN denote the number of false positive and false negative data points, respectively.
To evaluate the significance of the results between different models we used the Friedman test [68], which is a non-parametric statistical test that analyzes the ranked samples of an experiment and detects differences between the models across multiple test attempts. The p-value as a measure of statistical significance allows to reject or to accept the null hypothesis that there is no difference between experiments [69]. A p-value lower than 0.05 rejects the null hypothesis and the test is statistically significant, between 0.05 and 0.1 is marginal significant and higher than 0.1 is not statistically significant.

Results
Feature generation, texture analysis, standardization, normalization, data cleaning, bootstrapping and evaluation methodology were implemented using Matlab 7.10 software. Training, prediction and the extensions of 1C-SVM were computed with LIBSVM C++ code [42]. To speed up the CV process and classification phase we implemented the code using the parallel computing toolbox of Matlab and the Grid Cluster infrastructure of Baden -Wuerttemberg bwGRID [70].
The results of accuracy, precision and efficiency are presented using the mean and standard deviation. We present also the corresponding box plots to give additional statistical information. The computational time reported was running the classifications on a PC with an Intel Core i7 CPU at 3.4 GHz and 16 GB of RAM.

Performance of a Single 1C-SVM
A single 1C-SVM was run to compare the results with the proposed method. This classifier was implemented in other approaches for prostate classification giving good results [12,23,25,26]. The features were selected according to the spearman's coefficients and the sequential forward approach. The features that significantly improved the AUC-ROC were the first-order texture descriptors: standard deviation, third moment and energy. We compared three models using different configurations. The first model contained only values of T2-weighted images (T2), the second includes T2-weighted, DCE-MTT, DCE-PF (T2DCE) and the third model consisted of T2 and its most significant texture features (T2Tex). Table 1 shows the results of 1C-SVM using the three models.
Mislabeled pixels or extreme intensity variations can be present in the training dataset. The identification of outliers in highdimension for each class, without any assumption on the distribution, is challenging. To avoid deviant observation we used One-class SVM [42] in each class independently for each patient dataset. The parameters n and c, using a radial basis function, were optimized with Nelder-Mead. The training data set is subject to imbalance, known as minority class, that can cause class representation problems. Bootstrap over-sampling was used to balance the number of data points in each class for each patient. Bootstraping to enlarge the sampling pool [71] was performed with a Monte Carlo sampling algorithm [72]. The algorithm chose uniformly distributed pseudorandom data with replacement. To obtain a preferred sample size, the non-cancerous regions were undersampled, while the cancerous regions were oversampled until both classes were balanced. The algorithm was run for each patient independently. Another method to overcome the minority class problem is a cost-sensitive SVM [42] to penalize misclassification asymmetrically. However, we did not get a significant improvement compared to stratified sampling. For simplicity and reducing computing cost, we used the bootstrap sampling to balance the number of data points. Table 2 illustrates the results using a single 1C-SVM with a processed training dataset. Comparing table 1 and 2, the difficulties of a single SVM to handle outliers and an imbalanced data set can be observed. Including methods to avoid outliers and balance the data set contribute to enhance the performance of a single SVM. However, these procedures and a large number of samples introduced a significant computational cost during the training phase.

Evaluation of the Incremental Learning Algorithm
The same three models were compared (T2, T2DCE and T2Tex). Table 3 shows the results using the ensemble framework. The method based on Platt [42] to estimate the probability of cancer was implemented. We also used the decision values of SVM without the extension based on Platt to confirm the results. The AUC-ROC = 0.87060.103 and AUC-PPV/TPR = 0.46160.247 using the decision values did not present a significant difference. The model with T2-weighted and first-order texture descriptors (T2Tex) had the highest accuracy with an AUC-ROC significantly better than the T2 and T2DCE models (pv0:05), by highlighting relationship between pixels. Although T2 presented the highest sensitivity, T2Tex improved the trade-off between sensitivity, specificity and PPV values, which is more meaningful in clinical routines. T2DCE did not provide any significant improvement in performance. Figure 4 illustrates the box plot of these results. In comparison with table 2, the results of table 3 were not significantly different, however the ensemble 1C-SVM did not require a processed training data set, demonstrating that it is more robust in the presence of outliers and the imbalance in data than a single SVM. Furthermore, the computational cost in the training phase was a quarter of the time compared to a single SVM. A random subset of features was also evaluated, however we did not obtain a significant improvement (AUC-ROC = 0.84860.083) and the computing time increased by a factor of 10.

Assessment of the Classifier According to the Area of Cancer Structures
Recognition of cancer in the PZ is challenging, due to the presence of other glandular non-cancerous prostate conditions, different stages and sizes of cancer. Therefore, we evaluated the performance of the framework using the two best models: T2 and T2Tex according to the size of cancer. Table 4 summarizes the results for two different sizes of cancer (0:1 cm 2 ƒAƒ0:5 cm 2 and 0:5 cm 2 vBƒ1:3 cm 2 ). T2Tex performed better in AUC-ROC and AUC-PPV/TPR, for both small (A) and big areas (B), than T2, as well as a better trade-off between sensitivity, specificity and PPV. Texture information contributed also to identify suspect structures with different stages and sizes. In case of small areas (A) the false positive rate was higher and demonstrated discrimination complexity between small cancer areas and normal or other glandular non-cancerous prostate conditions. In case of big areas (B) the results were similar for both models.

Evaluation of Ensemble 1C-SVM Against Human Experts
For comparison, two radiologists marked only cancer regions for five different patients (around 15 slices) in T2 images observing the usual MRI modalities and following normal evaluation routines (without any knowledge of the pathology or previous diagnosis). After the segmentation of cancer structures the results were evaluated against the ground truth. Similarly, the results of the automatic classification for the selected test patient were evaluated against the ground truth. Table 5 shows the comparison of cancer structures segmented by a radiologist and results obtained by our system. In terms of PPV, the radiologist can discern with more accuracy anatomical tissue or other noncancerous prostate conditions that overlap with cancer structures. An expert who is familiarized with structures and shapes and prior knowledge about the meaning of other structures can further reduce the false positive rate. The sensitivity in the results by the  human expert exemplifies the challenging nature of interpreting and staging prostate cancer. The human expert has to handle a variety of MRI modalities, MRI intensity variation artifacts and confined time for analysis. Our system helps to improve the sensitivity and decrease the time to recognize and delineate cancer regions. Figure 5 illustrates the results after classification of two test patients using an ensemble 1C-SVM with T2Tex as features. A correlation between ground truth and classification results was observed on slices containing histological information. In neighbor slices was also observed evidence of cancer. These slices do not have histological data because of loss of information during the histology process for the extraction of prostate tissue. Nevertheless, the possibility of cancer in neighbor slices is evident due to the irregularities in shape and spread characteristics of cancer. The major number of T2-weighted slices together with the classification model contributes to locating suspicious cancer structures along the inter-slice direction.

Qualitative Evaluation
In the presence of other non-cancerous prostate conditions, ensemble 1C-SVM classified these regions as suspicious tissue with a probability of cancer between 40% to 80%, but the results depend on the stage of these prostatic conditions (e.g. first patient in the fifth slice on right side).
We trained the classifier with pixels inside the PZ. If regions outside the PZ or other prostate tissue (e.g ejaculatory ducts or neurovascular bundle) are classified, the results are meaningless. These regions have an overlap with cancer structures and were not included in the training.
Images such as the first and the second slice of patient 2 have to be assessed in detail with the radiologist and the pathologist to verify if this result was a statistical error of the classifier or a systematic error because of mislabeled data.

Discussion
In this work we present the feasibility of developing an ensemble 1C-SVM. To the best of our knowledge, this is the first automatic classifier for recognition of prostate cancer that uses incremental learning algorithms satisfying the criteria of being able to learn additional information from new data while preserving previously acquired knowledge and preventing the unlearning [28] of old or underrepresented patterns. Moreover, the training phase does not need to be initiated from scratch when new data becomes available, actually it does not require access to the data used to train the existing classifier at all. This framework makes possible not only the integration of new discriminant features, but also accommodation of new classes that may be introduced with new data.
As mentioned in [25,26], a comparison between study results is difficult because of differences between study population. In addition, each study has a different amount of data in which artifacts in the MR images and imbalances between cancer and benign data points are presented. Other differences involve annotations that do not follow same standards (e.g. classification of pixels, voxels, sextant), the discrimination task (e.g. using cancerous and normal tissue or normal tissue together with other non-cancerous conditions) and the evaluation methodology. We can report that with our framework an ensemble 1C-SVM renders Table 2. Classifier's performance for a single 1C-SVM with a processed training dataset.    [12,14]. These studies were evaluated using leave-one-out (LOO), nevertheless it is known that LOO has a high variance [59].  The analysis of cancerous structures is not trivial, they are characterized by their lack of shape and their overlap with other tissues (e.g. non-cancerous prostate conditions, and even healthy tissue). Moreover, the degree of intensity overlap depends also on the stage and size of malignant regions [9,46] and together with the anatomy that change from patient to patient, makes the discrimination between benign and cancer tissue more difficult. Our results suggest that a more straightforward and efficient approach using first-order texture features is sufficient to achieve significant improvement in the recognition of cancer. Indeed, Niaf [26] demonstrated that using all extracted features would lead to poor performance. The degraded performance in our results using second-order features could reflect the differences in stage and size of malignant regions and the presence of other prostate conditions. However, other studies [13][14][15] suggest that second-order features yield good results. An evaluation of second-order angular independent features with a larger database containing more cancer regions (e.g. in different stages or different sizes) could help to assess the importance of these features in the classification. A random subspace of features did not improve the results of the incremental leaning algorithm. In Kuncheva [73] was reported a modest improvement using random subspaces depending on the data set. A restrictive choice of features and a higher number of random subspaces could help to increase the performance, but it implies higher computational cost, and we did not find evidence suggesting that a better choice of F and N F could improve the results significantly.
The data set was pre-processed to increase the performance of the classifier and robustness of the system. A general drawback of previous publications is the limited focus on pre-processing the data to minimize artifacts and deviations, e.g. the corruption of intensity values in the peripheral zone (PZ) of the prostate near to the rectum caused by sharp near field endorectal coil artifacts [12,74] and the intensity variations between slices in MR imaging, known as non-standardness [54,55,75]. High signal intensity artifacts can not only limit the detection of cancer in the PZ, but also decrease the ability to generalize, which in most cases has not received appropriate attention in medical image applications. Although, these intensity variations have usually little impact on visual diagnosis, they significantly affect the performance of many image processing and analysis techniques based on absolute pixel intensities [76][77][78].
The ensemble workflow is similar to the decision process. A human being tends to seek several opinions before making any important decision. Individual opinions are weighted and combined to obtain a final decision [79]. This workflow offers higher accuracy and stability when compared to a single classifier. In addition, ensemble algorithms can often alleviate the problems of small sample size and high dimensionality which commonly occur in many bioinformatics applications [80].
The algorithm using ensemble 1C-SVM is also easy to implement and converges much faster than a single classifier because the ensemble algorithm does not use a large database for the training phase. Managing and analyzing huge databases implies computing complexity, storage problems and poor classification accuracy due to difficulties in finding correct classifiers [81], i.e. the size of the search space increases and underrepresented patterns that depend on the changing anatomy from patient to patient, may go undetected.
The incremental learning algorithm also demonstrates a more generalized performance and a narrow confidence interval, which indicates improved stability and robustness. Furthermore, this suits the working conditions in medical applications, in which a representative data set is difficult and time consuming to obtain. Data sets often become available in small and separate batches at different times.
The base classifier 1C-SVM method using the Gaussian Radial function also renders good results. The optimization process to   find the hyper-parameters, together with the use of a radial kernel, was crucial to obtain higher accuracy. A Gaussian function can better characterize the similarities between cancer structures and neighbor pixels, which correspond in fact to the standard deviation of the intensity values. A default c parameter depending on the number of features, as used in other studies, is not an optimal choice to find similarities between structures. A heuristic method, such as Nelder-Mead, is a better choice for model selection, however a poor starting point can lead to a local search. We suggest to first run a loose search to select the starting point, and then run Nelder-Mead for a precise search.
The results using areas of cancer confirmed the positive performance of our system for different sizes of cancer structures. In addition, we observed that in the presence of larger malignant structures, other regions such as the central zone or tissue outside the prostate capsule were also affected by cancer. This observation suggests that the PPV value is related to the chance of extracapsular disease, as mentioned also in [37].
Our results demonstrated that inclusion of the functional features used in this study: DCE parameters, plasma flow (PF) and mean transit time (MTT) do not improve the classification results. However, Vos et al. [24] demonstrated the contribution of different DCE parameters, such as T1Static, Ve in the extravascular and extracellular space, transfer constant (K trans ) and rate constant (K ep ). These DCE parameters have to be assessed with our system, which may improve the performance differentiating cancer and benign tissue. Other functional features such as Diffusion-Weighted Imaging (DWI) and MR-Spectroscopy have been evaluated [9][10][11]15]. However, we do not integrate these features because of high bias and noise within the available data which are produced by artifacts in these modalities.
In our study, the ground truth was created manually by visual correlation of the histological sections with the MRI slices. There was not enough available information to perform an automatic registration, this is an important point to improve in the future. Nevertheless, visual correlation is still an acceptable method that has been used in other technical and clinical publications [12,13,15,24,26,27].
Further investigations are necessary such as including more functional features in the system, increasing the number of patients and improving the extraction of ground truth. The integration of a third class corresponding to other non-cancerous prostate conditions is also important, for which the incremental learning algorithm can be improved e.g. using other combination rules or Figure 5. Recognition of cancer structures for two clinical patients. Five slices corresponding to the MRI volume of each patient are illustrated. The probability map was superimposed over the T2-weighted images. The background represents the gray values of the T2-weighted images for the whole prostate tissue. In the foreground the probability estimation of cancer is shown using a color map only over the PZ for the corresponding slices. The probability of cancer is ranged on a color scale: red (99%{70% probability), yellow-green (70%{30% probability), without color (lower than 30% probability of cancer). The white marks highlight the ground truth regions. doi:10.1371/journal.pone.0093600.g005 Table 6. Summary of current studies and our system results. The computer-aided studies from Artan [25], Niaf [26] used a leave-one-out (LOO) CV methodology, and from Tiwari [15] which, similar to this paper, used three-fold CV adding prior information in the ensemble to increase the performance.
Conclusion A pattern recognition system using an incremental learning algorithm and texture information for the automatic classification of prostatic adenocarcinoma, was presented. Our system not only suits the complex working conditions in medical applications, but also learns additional information from new data preventing unlearning. The performance, effectiveness and robustness of the system, along with its simplicity and speed in training as well in classification was also presented. Moreover, our system operates with a low number of features (T2-Weighted and texture operators) demonstrating the importance in the selection of information.
The probability estimation map generated by our system enables radiologist to diagnose with less variability in the diagnosis and reduce not only the false negative rate but also the time to recognize and delineate structures in the prostate. The radiologist could evaluate this estimation map before issuing a final report in order to suggest a better targeted treatment. As a result the radiologist can opt for a less aggressive therapy, can reduce the number of unnecessary biopsies and spare healthy tissue. In addition, our system can also be a suitable advisory tool for nonexpert radiologists.