Influence of Texture and Colour in Breast TMA Classification

Breast cancer diagnosis is still done by observation of biopsies under the microscope. The development of automated methods for breast TMA classification would reduce diagnostic time. This paper is a step towards the solution for this problem and shows a complete study of breast TMA classification based on colour models and texture descriptors. The TMA images were divided into four classes: i) benign stromal tissue with cellularity, ii) adipose tissue, iii) benign and benign anomalous structures, and iv) ductal and lobular carcinomas. A relevant set of features was obtained on eight different colour models from first and second order Haralick statistical descriptors obtained from the intensity image, Fourier, Wavelets, Multiresolution Gabor, M-LBP and textons descriptors. Furthermore, four types of classification experiments were performed using six different classifiers: (1) classification per colour model individually, (2) classification by combination of colour models, (3) classification by combination of colour models and descriptors, and (4) classification by combination of colour models and descriptors with a previous feature set reduction. The best result shows an average of 99.05% accuracy and 98.34% positive predictive value. These results have been obtained by means of a bagging tree classifier with combination of six colour models and the use of 1719 non-correlated (correlation threshold of 97%) textural features based on Statistical, M-LBP, Gabor and Spatial textons descriptors.


Introduction
The tissue microarray (TMA) is an ordered array that contains several hundreds of small tissue cylinders (core sections) in a paraffin block. A typical example of a breast TMA thumbnail obtained with an Aperio ScanScope T2 is shown in Fig 1. The resolution of Aperio ScanScope T2 at 40x objective is 0.23 μm/pixel. Thus, these cores are images at 40x magnification and their size varies between 6200 and 7300 pixels. These core sections can be cut and processed like any other histological section, using immunohistochemistry (IHC) for protein targets and in situ hybridisation to detect gene expressions or chromosomal alterations [1] [2]. Moreover, TMAs allow rapid and reproducible investigations of biomarkers that define the presence of [11]. Therefore, colour is also an important feature, and its interrelationship with texture should be considered when possible [12]. Table 2 summarises different state-of-the-art feature descriptors used in histopathology, the classification methods and their accuracy. The number of classes has also been included, as well as the properties of the database (BBDD). These properties include the number of images used in the classification process, if they are whole slide images (WSI) or not, the acquisition scale and the device used in the acquisition process.
The research literature using a combination of colour and texture for features description is limited and has not been significantly explored in digital pathology. The research shows that no standard colour model is used. Moreover, there is also a lack of comprehensive studies of the most suitable colour models for the different tissues types in histopathology [12][13][14].
Other problems are that the database is poorly populated and the number of classes to classify is insufficient. Though some promising results have been shown for breast biopsy classification with 98.6% and 99.25% accuracy [15,16], these works use only two classes. In breast tissue there are other structures that must be considered, such as, adipose tissue and benign anomalous structures.
This study presents an exhaustive assessment on the utility of texture and colour models never considered before for automatic classification of breast TMA into four classes or structures. To this end, four different texture models comprising six types of descriptors and eight colour spaces were analysed in order to find which descriptor or combination of descriptors best discriminate between breast TMA structures. We found significant performance differences among descriptors and a significant improvement when certain groups of descriptors are combined with different colour models, providing an overall accuracy rate of 99.88%.
The remainder of the paper is organised as follows: Section 2 presents the material and methods used in this work. Materials include the eight colour models applied on our histopathological images and the hardware employed to perform the experiments. In Section 3, the feature descriptors, the feature extraction process and the feature dimension reduction methods are described. Section 4 addresses the classification process including an explanation of the classifiers used. Section 6 presents the experimental results divided into four different experiments that combine colour models and descriptors. Finally, Section 7 concludes the paper.

Materials and Methods
This study was reviewed and approved by the ethics committee at Hospital General Universitario de Ciudad Real. This is a retrospective cohort study and no identifying information was taken from the patients. We used only the digital image obtained from the scanner at the Hospital General Universitario de Ciudad Real. The TMA images were acquired by the motorised microscope ALIAS II (LifeSpan Biosciences Inc.) and by an Aperio ScanScope T2 at 10x. Once the breast TMA cores were digitalised, 628 representative regions of the four tissue classes were selected. These regions of interest were selected in a manual way and under the supervision of a pathologist. The size of these regions was 200 x 200 pixels (0.74μm/pixel at 10x) and the TMA tissue classes were: i) benign stromal tissue with low and medium cellularity (170 images), ii) adipose tissue (103 images), iii) benign structures and anomalous (163 images), and iv) different kinds of malignancy, that is, ductal and lobular carcinomas (192 images). The first class (i) is characterised by the pink hue-blue stromal cells prior staining due to tissue with HE (hematoxylin and eosin). The second class (ii) is represented in the images as bubbles on the tissue stroma. The third class (iii) shows lobules, ducts and several anomalous structures. The types of anomalous benignity represented in class three are: sclerosing and adenosis lesions, fibroadenomas, tubular adenomas, phyllodes tumours, columnar cell lesions and duct ectasia (see Fig 2). Finally, the fourth class (iv) is characterised by the different kinds of malignancy. Images of this class show ductal and lobular carcinomas in situ and invasive.
Experiments were performed on an Intel Core i7 950 3.07 GHZ computer with 12 GB RAM. The method was implemented using C/C++ and the IPP and OpenCV libraries for image processing. The Intel TBB library was used for parallelisation of the algorithms.  Classifiers used in this paper were extracted from the PRTools (Pattern Recognition Tools) MATLAB toolbox.

Colour Model Analysis
In order to better represent the HE images and the colour patterns that the human visual system perceives, we used six colour models: RGB, CMYK, HSV, Lab, Luv, SCT and two combinations of them Lb and Hb. The eight colour representations were compared individually and jointly. In this way we addressed two issues: (i) characterisation of the HE images with limited colour spectrum, i.e., only with blue and pink hues; (ii) representation of the three components that the human visual system perceives, i.e., luminance, chrominance and an achromatic pattern component hence, the importance of analysing colour models in histopathological analysis. The eight colour spaces are described as follows: • RGB. In the RGB colour model, each colour appears as a combination of the three primary spectral components: red, green, and blue. The RGB model is based on a three-dimensional Cartesian coordinate system. This is represented by a cube with the corners corresponding to the red, green, blue, cyan, magenta, yellow, black (0,0,0) and white (255,255,255) colours. Grey scale extends in a diagonal from black to white corners of the cube. Colours are the points on or inside the cube, defined by vectors extending from the origin.
• CMYK. The CMYK colour model is a subtractive colour model contrary to the RGB which is an additive colour model. This colour model refers to the four inks used in colour printing: cyan, magenta, yellow, and key (black). In subtractive colour models black colour is the absence of light and other colours are the combination of all primary coloured lights. This colour model has been used in studies about tissues biomarkers by IHC for cervical cancer [17] and intra-epithelial lesions [18].
• HSV. The HSV model is a nonlinear transformation of the RGB colour space that describes colour (hue) in terms of their shade (saturation) and brightness (value). This is represented by a cone. Hue (base cone) is expressed as a number from 0 to 360 degrees representing hues of red (starts at 0), yellow (starts at 60), green (starts at 120), cyan (starts at 180), blue (starts at 240), and magenta (starts at 300). Saturation (radius base) is the amount of gray (0% to 100%) in the colour. Finally, value (cone height) works in conjunction with saturation and describes the brightness or intensity of the colour from 0% (black) to 100% (white). HSV and HSI have also been used in several studies with histological images of prostate [19] or breast stained with diaminobenzidine and hematoxylin (DABH) [20].
• Lab. The Lab colour model (also called CIE L Ã a Ã b Ã colour model) was specified by The International Commission on Illumination (CIE). This was designed to approximate human vision and is composed of three colour channels: L, a and b. The L channel indicates colour luminosity, black and white take values 0 and 100, respectively. The a channel indicates the colour position between magenta and green (negative values indicate green while positive values indicate magenta). Lastly, the b channel indicates the colour position between yellow and blue (negative values indicate blue while positive values indicate yellow). This colour model has been used in histological images due their capability to segment the different tissue structures, including stroma, cells and lumen [19,21,22].
• Luv. Luv (also called CIE L Ã u Ã v Ã colour model) is a colour model adopted by the CIE in 1976 as a modification of U Ã V Ã W Ã and Lab colour models. The Luv model is especially useful when working with a single illuminant and uniform chrominance. The L channel represents colour luminosity as in Lab. The chromaticity components u and v are coordinates of a specified white point, a white point being a set of values or chromaticity coordinates that serve to define the neutral colour in an image. Therefore, the Luv colour model has been used for illumination normalisation [10,16].
• SCT. The SCT colour model (spherical coordinate transform) is not a colour model per se. Nevertheless, it is possible to make a conversion to SCT based on the RGB colour model. Thus, the SCT colour model is decomposed into three components M, ϕ and θ. M reproduces the colour intensity and is the length of the RGB vector, ϕ is the angle between the blue axis to the RG plane and θ corresponds to the angle between G and R axes [23]. Conversion from RGB to SCT is shown in Eqs (1), (2) and (3). This colour model is useful when there are few changes in illumination.
• Lb and Hb. The Lb and Hb colour spaces consist of the L, H and b colour channels. The emphasis on the L and b channel (present in Lab and Luv colour models) is due to the ability of these channels to make a good segmentation on breast TMA cells. The use of L and b channels has shown to be essential to improve classification results. Images representing Lb and Hb are visualized by duplicating the b channel; thus, the three colour components are Lbb and Hbb.
TMA image samples with the eight colour spaces are shown in Fig 3. Colour model combination. Surprisingly, only one paper has studied the influence of colour models in HE histological images [24]. A combination of colour models is one of the goals of this paper. Usually, all the studies about tissue classification use only the most simple colour model, which is the RGB. Furthermore, channels and colour models of each colour model can be combined to create new models. Our study not only performs a classification using the aforementioned eight colour models individually, but also carries out the classification making combinations of them. We found that the classification results improved when more than two colour models were combined.
Textural features will be one of the bases of this study. We organised descriptors in groups according to their formulation, as shown in Table 1, which at the same time will help to conduct the later experiments. A brief description is given here and we refer the reader to plenty of well documented references.

Statistical Descriptors
In 1973, Haralick introduced a general procedure for describing textural features of an image [25]. First and second order statistical descriptors are a quantification of the spatial variation in the spatial image shade. The first order statistical descriptors are based on the image histogram (Table 3). On the other hand, the second order statistical descriptors consider the relationship of the image pixels (see Table 4). They are based on the Grey Level Co-occurrence Matrix (GLCM) of the image. GLCM are second order histograms that represent the spatial dependence of the image pixels. These spatial relationships are calculated with the neighbouring

Statistical Formula
Mean  pixels in a sliding window. Furthermore, these relationships can be defined indicating the distance and the angle between the reference pixel and its neighbour. Distances were taken at 1, 3 and 5 pixel-wide neighbourhoods and at a direction parameter equal to 0°, 45°, 90°and 135°to cover different angles. Finally, a total of 241 texture features were extracted by the first and second order statistical descriptors, 13 and 19 features respectively, as suggested in [12].

Fourier Transform
The Fourier Transform (FT) is an important image processing tool which is used in a wide range of applications to analyse spectral content of a signal where 2D frequencies arise from When: Influence of Texture and Colour in Breast TMA Classification grey level variations along features such as, image filtering, image reconstruction and image compression. Fourier is a frequency descriptor. The FT embeds translational, periodicity, rotation and scaling properties. The spectrum of the FT encloses the direction of repeating structures of the image as high concentrations of energy in the spectrum. This allows the extraction of certain parts of the image with much detail. In order to define the 2D FT let us suppose that the continuous function f(x, y) has been discretised in the succession: x takes the values 0, 1, 2, . . ., M-1 and y takes the values 0, 1, 2, . . ., N-1. Bearing this in mind, the 2D discrete Fourier transform (DFT) pair, which is composed by the 2D discrete and the 2D discrete inverse Fourier transforms, is defined as in Eqs (4) and (5) [26].
In an image using the 2D DFT, variables x and y denote columns and rows, and variables u and v denote the vertical and horizontal frequencies, respectively. Horizontal and vertical frequencies are represented in the Fourier magnitude image as vertical and horizontal lines, respectively. The central point in the image is the direct current term that represents the average intensity of the whole image. Once the 2D DFT is applied on an image it is possible to distinguish the discontinuities of the original image in the Fourier magnitude image. For this paper, four filtered images were calculated by image. On the Fourier magnitude image, different frequency radius masks were extracted. This radius represents the different frequencies in the image (see Fig 4). Corresponding pixels of each mask in the magnitude image are placed in a matrix of pixels, which is then converted to an image.

Wavelets Transform
Fourier transform has only frequential information. Therefore, it is necessary to use a function that contains frequential and spatial information, like wavelets. In this way, the Wavelet Transform (WT) can be defined as a special type of transform represented by versions of a shifted and scaled finite wave. This wave is denominated mother wavelet [27]. A set of base functions (or base wavelets) ψ j, k (x) is generated from the wavelet mother ψ x by the expression defined in Eq (6) where the variable j determines the scale and k the translations. Thus, the scale j is bigger than 0 and the translations k are real numbers.
A wavelet function is an orthonormal wavelet if the set of base functions ψ j, k (x) defined by the Eq (7) forms orthonormal bases in L 2 (R)  (8) and (9). Influence of Texture and Colour in Breast TMA Classification Where W ϕ (j 0 , k) and W ψ (j, k) are called approximation coefficient and detail coefficient, respectively.
The DWT applied to an image can be described as a low-resolution image at scale n, plus a set of details on it ranging from low to high resolution [28]. The 2D DWT is implemented by applying low-pass and high-pass filters to the rows and columns of the image. In this way, the original image is divided into four sub-images: low resolution (scale function), vertical, horizontal and diagonal orientation (tree basis wavelets or details). The low-resolution sub-image is obtained by convolving the low-pass filter on the columns and rows of the image. The vertical oriented sub-image is calculated from the convolution of the low-pass filter on the image columns and the high-pass filter on the image rows. The horizontal orientation image is extracted from the convolution of the high-pass filter and the low-pass on the image columns and rows, respectively. Finally, the diagonal orientation image is obtained from the convolution of the high-pass filters on the image rows and columns. All these images, in the same scale, must be summed to achieve orientation invariance. Based on preliminary observations, the number of scales was set to four and the wavelet used was the overcomplete version and five stem long of Daubechies basis to build our descriptors as the energy on every scaled level. Finally, four scaled levels were selected in this paper. Only the detail images of each level were used leaving out the scaled image. Additionally, these detail images were summed to merge the three detail images on a single image (see Fig 5).

Multiresolution Gabor Transform
2D Gabor filters are bandpass filters used in image processing for edge detection, segmentation and feature extraction. The Gabor filter consists of a Gaussian function modulated by complex sinusoidal of frequency and orientation. The result is the partition of the Fourier plane into bands modulated in orientation and octave bands apart in frequency. Gaussian shape ensures an optimum spreading in both dimensions, i.e., space location and frequency discrimination, while one weakness of wavelets is the pronounced frequency overlapping. However, the Gabor filter is not suitable for some applications. The DC term has non-zero mean value at some specific bandwidths, being counter-productive in pattern recognition applications. The DC component is not necessary due to the fact that it gives a feature that changes with the average value. Another drawback of the Gabor filters is that filters of arbitrarily wide bandwidth cannot be implemented since the filter width is limited to one octave. A multiresolution analysis for the Gabor filter can solve some of its limitations. The multiresolution analysis transforms the filters on different frequencies as scaled versions of each other [29]. Thus, fine detail is equivalent to high frequency. The multiresolution 2D Gabor filter in the spatial domain with l scales and o orientations is given by Eqs (10) and (11).
x 0 ¼ xcosy þ ysiny and y 0 ¼ Àxsiny þ ycosy ð11Þ Where f denotes the central frequency of the filter, θ the rotation angle of the Gaussian major axis and the plane wave, γ the sharpness along the major axis, and η the sharpness along the minor axis. The 2D Gabor filter in the frequency domain is given by Eq (12) [30].
Cðx; y; f ; yÞ ¼ e Where Similarly to wavelets, the multiresolution Gabor descriptor is formed by calculating the energy at every scaled level (see Eq (13)).
The multiresolution Gabor filter, G lo (f, θ) has been calculated with L = 4 scales and O = 4 orientations, where I(x,y) represents the image.

M-LBP
The Local Binary Pattern (LBP) operator is based on the idea that texture is described by patterns or local spatial structures within the image [31]. These patterns may be detected by a 3x3 mask called texture spectrum that compare masked values, g p (x, y), with their central pixel, g c (x, y), acting as threshold. The labeled pixels are multiplied by a fixed weighting function and summed to obtain a value, that is: Hðg p ðx; yÞ À g c ðx; yÞÞ2 p ð14Þ Where g c (p = 0,.., P) are the values of the neighbours and H(Á) is the Heaviside function. (P, R) are the parameters of the circular mask, with P = 8 sampling points and R = 1 radius of the neighbourhood. LBP are considered as powerful textural descriptors with discrimination capacity, computational simplicity and tolerance for changes of scale. Several variations have been implemented, such as the improved LBP (ILBP) and the mean LBP (M-LBP). The ILBP uses the mean of all the pixels, g p ðx; yÞ, instead of using a reference pixel, g c (x, y) [32]. The M-LBP is similar to the ILBP, but does not consider the central pixel when the binary pattern is created [33]. In this work, the best results were obtained with the M-LBP. The formulation of LBP P, R yields a total of 2 P different patterns. Thus, the range of value pixels is kept between [0. . .255] to create the texture image and then extract the first and second order Haralick statistical descriptors.

Textons
Textons descriptors represent the texture by maps, which are created performing a clusterisation of the principal pixels in an image. There are two types of textons: frequential (F-Textons) and spatial (S-Textons). Frequential textons are based on the responses generated by a filter bank. The filter bank selected is the maximum response filter bank (MR8) composed of a Gaussian and a Laplacian filter, and 18 edge and bar filters with three basic scales. The MR8 filter bank is applied over the tissue images. This process generates 38 response filters per image so that each pixel belongs to the original image and is now represented by a 38 dimensional vector [34]. In the case of spatial textons, they are not filtered by a filter bank, but each image pixel is represented by the intensity values of an NxN square neighbourhood around it. In our study, a 3x3 square neighbourhood was selected. Thus, each original image is now represented by a 9-dimensional vector. A k-means clustering algorithm is applied over all the pixel vectors (frequential or spatial). This algorithm allows creating vector groups with similar values. A representative vector called texton is selected for each vector group. Sixty representative textons were selected for each class. Thus, 240 textons were extracted and collected in the texton vocabulary. A textons map is generated by each tissue image and the texton vocabulary. These texton maps will be used later as the filtered images used for the classification. The following steps must be done for each image: • The MR8 filter bank or the square neighbourhood is applied over the tissue images. Each pixel, now represented by a 38 or a 9-dimensional vector, is assigned to its nearest texton.
• The texton map is created when all image pixels are classified by their nearest texton. This classification is performed using the k-nearest neighbours algorithm. The texton map is a representation of the original image that assigned the corresponding texton indices (a new colour) to each pixel. According to a previous study carried out by the authors of this paper, spatial textons obtained better results in the classification of breast histopahological images stained with HE [12]. For that reason, frequential textons have been dismissed in this study.

Feature Extraction
Once the images from our dataset were converted to each colour model they were filtered by each descriptor and the whole bank of statistical descriptors were calculated. Thus, statistical features (Haralick coefficients) were extracted from the original image (called intensity hereafter), each band of Fourier, Wavelets and Gabor, the M-LBP filtered images as well as the spatial texton maps (see Fig 6). This means that the total number of descriptors became four times larger for the space-frequency descriptors, given a four-level decomposition transform. Finally, the first and second order texture statistics are calculated on filtered images of each colour model. Thus, intensity, M-LBP and textons statistical descriptors contain an average of 241 features for each colour model and Fourier, wavelets and Gabor statistical descriptors containing an average of 964 (241x4) features. That is, in total for the six textural descriptors, we extract 3615 features for each colour model.

Dimensionality Reduction
In the correlation method, the dimensionality reduction is estimated by means of a Pearson correlation coefficient, which measures the linear dependence between two or more features. This dependence is estimated by the correlation coefficient [35] shown in the Eq (15), where Influence of Texture and Colour in Breast TMA Classification cov is the covariance and var the variance.
The Pearson correlation coefficient takes values between 1 and +1. When jrj % 0, this means that there is no correlation (positive or negative) between the features, and x and y are completely uncorrelated. On the other hand, if there is some linear dependence between the x and y features, jrj will take a value near 1.
The correlation method removes redundant and unnecessary information in the feature dataset. In our study, two threshold values of 97% and 99% were adopted for the correlation method. This method achieved an overall 76% dimensionality reduction in our feature dataset maintaining or improving the classification results obtained with the original features.

Classification
Once textural features are computed, the next issue is how to assign each query case to a preestablished class. Train and classification have been performed using 10-fold cross-validation (10fcv). This validation method divides the feature set into 10 disjoint subsets and performs a loop with 10 iterations using in each iteration nine of the subsets as the training set, and the another subset as the test set. In the end, the average of all folds provides an estimation of the classification accuracy of the model. This type of methodology for classification ensures that each fold has a class distribution similar to the whole dataset. A similar procedure was applied with groups of one element, the so-called leave-one-out. Both training methods gave similar results and for the sake of simplicity only 10fcv will be shown in the experiments.
Many classifiers can be applied and some of them could significantly improve accuracy rates. Here, the purpose is to find which descriptor or combination of descriptors better discriminate between breast TMA structures rather than carry out a thorough analysis of classifiers performance. Hence, although we compare an extensive bank of classifiers like nearestneighbour, k-means, neural networks, decision trees, quadratic Bayes normal classifier, Fisher classifier, linear discriminant or support vector machine (SVM), here, we select five representative ones. That is, the classifiers shown in this manuscript are Fisher, SVM, Random Forest, Bagging and AdaBoost. The classifiers based on weak individual classifiers (tree classifiers in our study) demonstrated a valuable capacity to distinguish the different breast tissue classes.

Fisher classifier
Fishers linear classifier finds a linear discriminant function by minimising the errors in the least square sense [36]. This linear discriminant is based on finding a direction in the feature space such that the projection of the data minimises Fishers criterion, i.e., the ratio of the squared distance between the class means and averaged class variances. The multi-class implementation used corresponds to the one-against-all strategy.

Support Vector Machine
Support Vector Machines (SVM) find a discriminant function by maximising the geometrical margin between positive and negative samples [37]. Thus, the space is mapped so that examples from different classes are separated by a gap as wide as possible. Besides linear classification, SVMs can function as a non-linear classifier by using the so-called kernel trick. This trick can be considered a mapping of the inputs onto a high-dimensional feature space in which classes become linearly separable. SVMs minimise both training error and the geometrical margin. The latter accounts for the generalisation abilities of the resulting classifier. In this implementation a linear kernel was used on a multi-class classifier of type one-against-all.

Random Forest
Random forest was proposed by Breiman [38] as a combination of tree predictors such that each tree depends on the values of a random selection of features. Each decision tree is built using a random subset of the features, independently of the past random subsets but with the same distribution. Finally, the forest chooses the most popular class which is the class with the most votes. The random feature vectors may be generated using several techniques, such as Bagging, random split selection and the so-called random subspace technique.
The forest error rate depends on: i) the strength of the individual trees in the forest and ii) the correlation between two trees in the forest. Random forest algorithm has the advantages of being faster, relatively robust to outliers and noise, giving useful internal estimates of error, strength, correlation and variable importance and easily parallelised. The classifier has been trained using 50 decision trees.

Bagging Trees
The Bagging (Bootstrap Aggregating) algorithm (see Table 5) is a method of classification that generates weak individual classifiers using Bootstrap. Each classifier is trained on a random redistribution of the training set so many of the original examples may be repeated in each classification [39,40]. Generally, the error of combining several types of classifiers is explained by bias-variance decomposition. The bias of each classifier is given by its intrinsic error and measures how well a classifier explains the problem. Variance is given by the training set used to create the classifier model. The total error classification is given by the sum of bias and variance. In this paper, the Bagging method was applied to classification trees and ad hoc, an ensemble of 50 trees were taken. The Bagging Tree classifier is the one that obtained the best performance.

AdaBoost
AdaBoost was introduced by Freund and Schapire [41] and is based on training different classifiers with different training sets, such as Bagging. The main idea of the algorithm [40] is to Table 5. Bagging Algorithm [40]. Influence of Texture and Colour in Breast TMA Classification assign weights to the training set. Initially, all the weights are equated but each round, the weights of misclassified examples are increased. Thus, in subsequent rounds the weak classifiers will be more focused on these examples (see Table 6). The Adaboost was one of the best classifiers available. As in the random forest classifier, the AdaBoost classifier was trained using 50 decision trees.

Evaluation Metrics
An important aspect when diagnostic results are managed is their correct interpretation. In classification problems there are two types of results produced: positive and negative detections or classifications. However, in some cases, positive types can be classified as negative and vice versa. These cases are called false positives and false negatives, respectively. Thus, four types of outputs, that is, true positive (TP), true negative (TN), false positive (FP) and false negative (FN) must be considered in the interpretation of results.
The following validity and security criteria are fulfilled when the classification results are interpreted: Table 6. AdaBoost Algorithm [40].
-L, the number of classifiers to train 2. For k = 1, . . ., L -Take a sample S k from Z using distribution w k .
-Build a classifier D k using S k as the training set.
-Calculate the weighted ensemble error at step k by -If k = 0 or k ! 0.5, ignore D k , reinitialize the weights w k j to 1 N and continue.
-Else, calculate • Validity denotes the grade of validity over the results. This condition is measured by the sensitivity and specificity coefficients (see Eq (16)). The sensitivity represents the conditional probability of classifying the tissue as positive. Then, this coefficient indicates the capability that the selected classifier has to detect positive cases. The specificity shows the conditional probability of classifying the tissue as negative. Therefore, in contrast to the sensitivity, the specificity indicates the capability that the classifier has of detecting the negative cases.
• Security is determined by the predictive value of the result (see Eq (17)). What is the security level in the classification result to predict negative or positive cases? The positive predictive value (PPV) or precision indicates the likelihood of a positive case being classified as such. This value is estimated using the amount of real positive cases that were classified as positive.
The negative predictive value (NPV) indicates the likelihood that a negative case had been classified as such. In the same ways as the previous predictive value, it is estimated using the amount of real negative cases that were classified as negative. Finally, the accuracy (ACC) is the proportion of true results (true positives and true negatives).
The evaluation metrics used in this paper are based on the binary classification method. In multi-class problems, the metrics are calculated taking the values of a single class against the mean values in all other classes (one-vs.-rest) [42].

ROC Curves
The ROC (Receiver operating characteristic) curve is another common technique to validate results [43]. Basically, it consists of a graphic plot that represents the sensitivity versus (1-specificity) for a binary classifier system, and therefore, the diagnostic accuracy of the test. ROC curves have several purposes: • To determine the point where greater sensitivity and specificity is achieved.
• To evaluate the discriminative ability of the test • To compare this capability among several different tests The Youden index is the point that represents the largest sensitivity and specificity. Graphically, it is the nearest point to the coordinate (0,1) (upper-left corner of the graph). The discriminative ability of the test is its capability to distinguish between positive and negative cases. This capability is measured by the area under the curve (AUC). This area has a value between 0.5 and 1, where 1 represents a perfect diagnosis and 0.5 a test without a discriminating capacity.

Empirical Results
Classification results depend on feature quality and the suitable selection of the classifier. Several experiments were done to show concrete aspects of descriptors and classifiers. As mentioned above, in this study, a relevant set of features was obtained from first and second order Haralick statistical descriptors obtained from the intensity image, Fourier, Wavelets, Gabor, M-LBP and spatial texton descriptors. Furthermore, each feature set was extracted for each colour space discussed in previous sections that is, RGB, CMYK, HSV, Lab, Luv, SCT and channel combinations Lb and Hb. However, the combination of so many colour models and descriptors can substantially increase the size of the feature set; hence, the importance of using methods to reduce feature dimensionality.
The combination of different features and classifiers implies a considerable number of possible experiments. In order to perform all of these tests, four types of classification experiments were performed with six classifiers: (1) classification per colour model individually, (2) classification by combination of colour models, (3) classification by combination of colour models and descriptors, and (4) classification by combination of colour models and descriptors with a previous feature set reduction.

Experiment 1: Results per colour model
As mentioned above, some colour models are able to highlight different breast tissue structures, and therefore, improve the classification results. That will be the first improvement in our study. A deep study of the different colour models and their combination was done. On average the best classifiers that work well with all features and colour models are the AdaBoost and the Bagging Trees with an average error of 0.18. The error refers to mislabelling in the 4-class classification. Reviewing the results individually, the best performance was achieved with the intensity statistical features applied to the Hb image and using AdaBoost, with an error of 0.061, an average of 93.92% PPV and 97% ACC. The worst classification error was obtained by the statistical Fourier descriptor. It was observed that the results improved significantly when more than three colour models were combined. In fact, five colour model combinations provide better results; they were: Hb&Luv&SCT, CMYK&Hb&Lb&HSV&Lab, CMYK&Hb&Lb&HSV&Luv&SCT, RGB&Hb&Lb&HSV&Luv&SCT and the eight colour models. On average the best classifier that works well with all features and all colour model combinations is the AdaBoost Bagging Tree with an average error of 0.11. Reviewing the results individually, the best performance was achieved with the   Influence of Texture and Colour in Breast TMA Classification descriptors ranges from 482 to 2410 per three, five, six and eight colour models used in the five colour model combinations.
Once again, the AdaBoost and the Bagging Tree classifiers were the best classifiers. Bagging classifier using CMYK&Hb&Lb&HSV&Luv&SCT colour combination with the Intensi-ty&M-LBP&Gabor&S-Textons descriptors, and AdaBoost classifier using the Hb&Luv&SCT colour combination with Intensity&M-LBP&Gabor descriptors. The ROC curves are shown in Fig 9. The best result was obtained with the Bagging classifier, and the confusion matrix is shown in Table 7. The results are promising, reaching 98.26% ACC and 96.95% PPV, and

Experiment 4: Results of combining colour models and descriptors with a prior reduction of correlated features
Although the previous experiments improve the classification results, the combination of colour models and descriptors also increases the number of features, and thus, the computation time needed. In addition, some features may be redundant or provide irrelevant information getting worse results. Irrelevant and redundant features could be detrimental for the training processes and increase the computational time [44]. New feature sets smaller than the originals and without redundant information may be created. As mentioned above, the correlation method [35] was analysed to carry out the dimensionality reduction.
Correlation was performed with two threshold values, 97% and 99%, which allow a reduction of 75.73% and 75.79% of the initial features, respectively.
It has to be pointed out that the best results were obtained by Bagging Tree classifier and a correlation threshold of 97% (see Fig 10). Note that using Intensity&M-LBP&Gabor, Intensi-ty&M-LBP&Gabor&S-Textons, and Intensity&M-LBP&Gabor& Wavelets descriptor combinations the error value is lower than 0.035.  In this experiment, the best result was obtained with the CMYK&Hb&Lb&HSV&Luv&SCT colour combination and Intensity&M-LBP&Gabor&S-Textons descriptors. Confusion matrices are shown in Table 8 where an average of 99.05% ACC and 98.34% PPV were obtained with a total of 1719 features. ROC curves are shown in Fig 11. The computational time of the classification was 244.2 seconds. Thus, the validity of the colour model and descriptor combination in the breast TMA classification is demonstrated.

Discussion
This paper has described a complete study on breast TMA classification using the combination of colour and texture descriptors. The study shows promising results with a dataset of 628 TMA images divided into four classes, including benign anomalous structures, which are ignored in most studies. Overall, class 2, that is, adipose tissue, is the class best classified while classes 1 (benign stromal tissue with cellularity) and 4 (ductal and lobular carcinomas) show greater difficulty in the classification. Nevertheless, a suitable combination of descriptors and colour models makes our results obtain an error value below 0.035. Besides, different dimensionality reduction methods were performed due to the large size of the feature dataset and the increase of the computational time. Finally, the best result was obtained with the combination of intensity, M-LBP, Gabor and S-Textons statistical descriptors and a suitable combination of colour models with CMYK, Hb, Lb, HSV, Luv and SCT reducing correlated features with a 97% correlation threshold and using a Bagging tree classifier. This test obtained an average of 99.05% accuracy and 98.34% positive predictive value making this study truly valuable in breast TMA classification. In addition, although the number of features was large, computational times in the classification were not very excessive, and therefore, the CAD methodology proposed is suitable for the daily work of pathologists.   Tables 9 and 10 show the results obtained depending on the correlation threshold value selected. In addition, a comparison between the results with and without feature selection is shown in Fig 18a and 18b.     Influence of Texture and Colour in Breast TMA Classification Supporting Information S1 Database Files. RGB tissue images used in this paper.

AdaBoost and Bagging classifiers in Experiment 4 results
(ZIP)