Texture Descriptors Ensembles Enable Image-Based Classification of Maturation of Human Stem Cell-Derived Retinal Pigmented Epithelium

Aims A fast, non-invasive and observer-independent method to analyze the homogeneity and maturity of human pluripotent stem cell (hPSC) derived retinal pigment epithelial (RPE) cells is warranted to assess the suitability of hPSC-RPE cells for implantation or in vitro use. The aim of this work was to develop and validate methods to create ensembles of state-of-the-art texture descriptors and to provide a robust classification tool to separate three different maturation stages of RPE cells by using phase contrast microscopy images. The same methods were also validated on a wide variety of biological image classification problems, such as histological or virus image classification. Methods For image classification we used different texture descriptors, descriptor ensembles and preprocessing techniques. Also, three new methods were tested. The first approach was an ensemble of preprocessing methods, to create an additional set of images. The second was the region-based approach, where saliency detection and wavelet decomposition divide each image in two different regions, from which features were extracted through different descriptors. The third method was an ensemble of Binarized Statistical Image Features, based on different sizes and thresholds. A Support Vector Machine (SVM) was trained for each descriptor histogram and the set of SVMs combined by sum rule. The accuracy of the computer vision tool was verified in classifying the hPSC-RPE cell maturation level. Dataset and Results The RPE dataset contains 1862 subwindows from 195 phase contrast images. The final descriptor ensemble outperformed the most recent stand-alone texture descriptors, obtaining, for the RPE dataset, an area under ROC curve (AUC) of 86.49% with the 10-fold cross validation and 91.98% with the leave-one-image-out protocol. The generality of the three proposed approaches was ascertained with 10 more biological image datasets, obtaining an average AUC greater than 97%. Conclusions Here we showed that the developed ensembles of texture descriptors are able to classify the RPE cell maturation stage. Moreover, we proved that preprocessing and region-based decomposition improves many descriptors’ accuracy in biological dataset classification. Finally, we built the first public dataset of stem cell-derived RPE cells, which is publicly available to the scientific community for classification studies. The proposed tool is available at https://www.dei.unipd.it/node/2357 and the RPE dataset at http://www.biomeditech.fi/data/RPE_dataset/. Both are available at https://figshare.com/s/d6fb591f1beb4f8efa6f.


Methods
For image classification we used different texture descriptors, descriptor ensembles and preprocessing techniques. Also, three new methods were tested. The first approach was an ensemble of preprocessing methods, to create an additional set of images. The second was the region-based approach, where saliency detection and wavelet decomposition divide each image in two different regions, from which features were extracted through different descriptors. The third method was an ensemble of Binarized Statistical Image Features, based on different sizes and thresholds. A Support Vector Machine (SVM) was trained for each descriptor histogram and the set of SVMs combined by sum rule. The accuracy of the computer vision tool was verified in classifying the hPSC-RPE cell maturation level.

Dataset and Results
The RPE dataset contains 1862 subwindows from 195 phase contrast images. The final descriptor ensemble outperformed the most recent stand-alone texture descriptors,

Introduction
The retinal pigment epithelial (RPE) cells reside in the back of the eye between the photoreceptor cells and choroid. The RPE monolayer is vitally important for the vision as RPE cells compose a diffusion barrier to protect photoreceptor cells from humoral substances, but also maintain the viability of photoreceptor cells [1]. The RPE cell differentiation and maturation is a slow process, modulated by culturing environmental trophic factors [2,3]. The morphology changes during maturation [4]: from the elongated, so called "fusiform morphology", of immature RPE; via "epithelioid morphology" i.e. rounder but still without pigmentation (after one to two weeks of culture); to "cobblestone morphology" (approximately after a month) when the cells have condensed and become heavily pigmented [4]. This phenomenon can be seen both in primary RPE [4] and in human pluripotent stem cells (hPSC) derived RPE cell maturation [5]. Recently, in the first human embryonic stem cells (hESC) RPE transplantations to humans, it was demonstrated that less pigmented cells integrated better than heavily pigmented cells [6]. Furthermore, new serial plating methods to expand the hPSC-RPE cell number [7,8] need a quality and purity evaluation after every plating step [8]. These applications would benefit from a non-invasive and reliable method to assess the maturity development of hPSC-RPE cells. The benefits of cell morphology analysis for both RPE tissue [9] and hPSC-RPE cell cultures [10] has already been shown. However, this has been mainly done by manual examination and therefore is affected by inter-and intra-operator variability, making it less suitable for clinical application. In particular, Jiang et al. [9] recently published a computer vision approach for RPE tissue explants, discriminating between age (young, <61days-old vs old, !100 or 180 days-old) and genotype (control vs rd10, considered to be a model for autosomal recessive retinitis pigmentosa). The analysis was based on 21 morphological features of the cells, including aspect ratio and area, by means of principal component analysis. In [11], three degrees of pigmentation were considered as a good maturation marker. A manual approach was chosen, where two observers subjectively classified the cell pigmentation levels and an objective pigmentation measurement was inferred from the Photoshop's Info Palette for a set of manuallyselected points.
In this paper, we focused on the specific problem of the classification of the maturation level of hPSC-RPE cells by means of a different approach: texture analysis. Together with the increasing availability of advanced and accurate image acquisition techniques, texture analysis has become nowadays a common processing approach for medical and biological images. Its versatility makes it applicable to images acquired with diverse modalities: from medical imaging to microscopy [12][13][14][15].
In spite of the recent progresses in texture analysis, textural information in medical images is still often assessed by conventional features such as first order statistics (e.g. variance, kurtosis and skewness), second order statistics (Haralick features extracted from the grey level cooccurrence matrix [16]) or wavelet features. In [17], wavelets and a subset of the Haralick features were extracted from X-ray images to diagnose presence of osteosarcomas. First order statistics and features extracted from the grey level co-occurrence matrix and run-length matrix proved their utility in colorectal polyp identification in colonoscopy [18], classification of intracardiac masses (thrombi, malignant, and benign tumors) for cardiac tumor detection [19] and breast cancer malignancy classification in histological images [20].
More recent techniques, such as local binary pattern (LBP) [21] or texture descriptors derived from it (e.g. local ternary pattern (LTP) [22], local quinary pattern (LQP) [13], etc.), were applied to medical imaging for the examination of Pap test samples [12] or in the inquiry of endoscopy images of healthy and celiac disease duodenal tissue [14]. Another important research area, where texture descriptors are commonly used, is cell classification. Due to the availability of many datasets (2D HeLa dataset (HeLa) [23], chinese hamster ovary cells (CHO) [24], etc.), this field is very prolific for specific classification tasks and for the development of more and more accurate texture descriptors. In [13], the multi-threshold approach was applied to LTP and LQP and tested by classifying six different datasets of cellular and subcellular organelles. In [15], a new variant of LBP, the rotation invariant co-occurrence among adjacent LBP (RICLBP), obtained outstanding results in the MIVIA HEp-2 dataset. It suggests also that clinical tests, such as the antinuclear antibody test, can benefit of improved accuracy texture descriptors.
In spite of the efforts performed during the latest years to improve the discriminant power of texture descriptors, preprocessing did not receive the same attention. Recently published preprocessing approaches exploit the separation of the texture image in two different regions or maps, e.g. textural information extracted from edge information. In [25], the Difference of Gaussians (DoG) filter was used to compute from a given image two maps representing the "positive" and the "negative" sides of the image edges, resulting in a classification accuracy improvement. A similar approach was exploited by [26] (details in section 2.3) for the extraction, through Sobel filtering, of an edge and a non-edge region from a texture image to compute LBP, LTP, etc. on the original image masked by each map. The technique is interesting since it can be combined with many state-of-the-art texture descriptors. In order to differentiate from the canonical preprocessing, we named the descriptors combined with this approach as region-based descriptors. In addition to them, well-assessed preprocessing algorithms were tested, e.g. wavelet [27] and Gabor filters [28]. We paid particular attention to preprocessing techniques in this study, to improve the descriptors classification power.
The main aim of this paper was to develop three simple but effective methods to create ensembles of texture descriptors: the ensemble of preprocessing, the region-based approach and the ensemble of Binarized Statistical Image Features (Bsif). We validated them on a wide range of biological image datasets, with particular focus on the quantification of hPSC-RPE cell maturation stages, which enables a user-independent method to analyze the cell cultures before their use in implantation or as in vitro cell models. To find the most suitable ensemble of descriptors for the classification of the three developmental stages, we tested a combination of large sets of both preprocessing methods and texture descriptors. In the perspective of using hPSC-RPE cells for drug tests or transplantation, we used phase-contrast microscopy images, which is a noninvasive assessment method. Our work resulted in a methodological core for a software tool in order to assess quantitatively the level of development of hPSC-RPE cells. The same ensembling methods resulted effective also for other classification problems, ranging from medical diagnostic to virus images.
The pipeline of the process consisted in the following steps. First, we considered many stateof-the-art stand-alone texture descriptors in order to select the best ones. Second, they were combined together and with techniques to augment and enhance the features extracted from each image, to improve the classification performances. Finally, the best performing features sets resulting from the previous step were combined together, thus resulting in ensembles which improved significantly the performances of the pre-existing stand-alone descriptors and of the ensembles based on a single descriptor. For classification, we used Support Vector Machines (SVM).
In detail, we propose the following novelties. First, an ensemble of preprocessing approaches (based on wavelet decomposition, Gabor filtering, orientation image and Multi-scale approach by Gaussian filtering) was applied to create a set of images to be used together with the original one. A different descriptor was extracted from each processed image and the set of SVMs combined by sum rule. Second, saliency detection and wavelet decomposition were tested for the region-based descriptors. Each image was divided into two regions from which histograms were extracted by different texture descriptors. From each histogram, a specific SVM is trained [29]. Finally, the partial scores obtained by the different SVMs were combined by sum rule. Third, the original stand-alone versions of the Bsif [30] was improved by combining different Bsif sets. They were obtained by (i) varying the size of the filter and (ii) introducing a threshold while building the Bsif image. We constructed a new ensemble, using a set of sizes and thresholds, which greatly outperformed the stand-alone version.

Proposed approaches
In this work, we developed and validated three methods for ensembling texture descriptors and techniques such as preprocessing or region-based feature extraction, to describe images with an augmented feature set and eventually improving their classification. The first method was a combination of preprocessing methods, to create additional images from the original one and then extract texture information from each of them, enhancing the feature set describing the original image. The second approach worked differently, obtaining the augmentation of the feature set by means of the region-based approach. Saliency detection and wavelet decomposition divide each image in two different regions, from which features were extracted through different descriptors. Again, the original image is described by an enhanced feature set. Different texture descriptors were tested with the first two approaches. The third method augmented the feature set describing the original image by combining feature vectors extracted by Bsif, based on different filter sizes and binarization thresholds. The most effective feature sets extracted according to the aforementioned approaches were ensembled together. To test and optimize our approaches for the hPSC-RPE classification problem, we built a new RPE image dataset (Section 2.2): 1862 subwindows were extracted from 195 phase contrast images of maturing hPSC-RPE cells. Finally, we analyzed how the three approaches performed independently on the analyzed datasets: in order to generalize their viability, 10 large datasets of medical and biological images were used (Section 2.3).
The remainder of this section is organized as follows: the section 2.1.1 briefly explains the basic texture descriptors used in this paper; the section 2.1.2 is dedicated to preprocessing techniques and their ensemble; the section 2.1.3 presents the region-based approach using, saliency maps and wavelet maps; the section 2.1.4 explains the ensemble of Bsif; and, finally, the section 2.1.5 details the new multi-quinary coding tests.
2.1.1 Texture descriptors. The standards texture descriptors used in this work are summarized in Table 1, together with the chosen parameters. As for the LBP-based approaches, we tested both uniform and rotation invariant uniform bins (see section 3 for details).
2.1.2 Preprocessing. One of the aims was to improve the performance of texture descriptors by using a set of different preprocessing methods before feature extraction. When using a given preprocessing approach, a new set of images was produced to be then processed by an ensemble of descriptors. Classification was performed separately for each descriptor using SVMs, with both linear and radial basis function kernels, as the base classifier. For each dataset, the best kernel and set of parameters were chosen using a 5-fold cross-validation approach on the training data. SVMs were implemented using the tool LibSVM (available at http://www. csie.ntu.edu.tw/~cjlin/libsvm/) and combined by sum rule.
The image preprocessing phase included the testing of the following four methods: decomposition by wavelets, multi-resolution by Gaussian filters, orientation image and Gabor filters. Of note, only the training data is used for finding the parameters of the different approaches while the test set is blind. The flowchart of the preprocessing is reported in Fig 1. Wavelet transform is frequently used in many computer vision problems related with detection and recognition of objects of interest. Wavelet transform [27], to be used for 2D decomposition, requires a 2D scaling function, φ(x,y) and three 2-D wavelets functions, ψ i (x,y), where i Table 1. Texture descriptors and their parameter sets.

Acronym
Descriptor and parameters Ref

MLQP
The multi-quinary coding version of LBP and its parameters as proposed in the original paper, i.e. the multi-threshold LQP, combined with several loci of points. [13]

CLBP
Completed LBP with 2 (radius, neighboring points) configurations: (1,8) and (2,16 represents the three possible intensity variations along horizontal, vertical and diagonal edges i = {H,V,D}. As both functions are separable, the scaled and translated basis functions are defined as: φ j;m;n ðx; yÞ ¼ 2 j=2 φð2 j x À m; 2 j y À nÞ; c i j;m;n ðx; yÞ ¼ 2 j=2 c i ð2 j x À m; 2 j y À nÞ; i ¼ fH; V; Dg: For the three discrete wavelet transform functions (W H ,W V and W D for horizontal, vertical and diagonal respectively) of a M x N function f(x,y), the used formulation is: where j 0 is an arbitrary starting scale and the W φ (j 0 ,m,n) coefficients represent an approximation (on the initial scale j 0 ) of f(x,y). The W i c ðj; m; nÞ coefficients represent the three directional (horizontal, vertical, and diagonal) details for higher scales than j 0 .
In our experiments, we used the Daubechies wavelet family (Wa) with four vanishing moments. An example of Wa processing is shown in Fig 2. The second preprocessing technique was a multi-scale approach by means of Gaussian scale-space representation (MRS). The original image was filtered to obtain two smoothed versions by using a 2D symmetric Gaussian lowpass filter of size k pixels (here we use k = 3 and k = 5) with standard deviation 1. Illustrative results of MRS preprocessing are shown in Fig 3. The third preprocessing technique was the Orientation image (OR). In the Orientation image the image gradient is soft quantized [43] using d orientations (here d = 3), thereby producing d processed images. This method is used to reduce the noise or other forms of degradation.
In detail, OR computation is organized in 3 steps: 1. for each pixel, the p gradient magnitude m(p) and orientation θ(p) are computed. θ(p) is then discretized over [0, 2π]. The pixel label is a d-dimensional vector ½ b m 1 ðpÞ; . . . ; b m d ðpÞ with just one non-null element i which is equal to m(p) if the discretized θ(p) corresponds to the i-th bin, i.e. b m i ðpÞ ¼ mðpÞ. A more refined approach, the soft decomposition of the magnitude, consists in quantizing m(p) in two parts to be assigned to the directions of the p's two nearest neighbors. Details about soft decomposition are reported in [43]; 2. to include in each pixel the p information also from its neighborhood, a local histogram of orientations computed all over the pixels contained into a squared-shape image patch (Cell) centered in p and size w. At pixel p the new feature vector is ½m 1 ðpÞ; . . . ; m d ðpÞ, wherẽ 3. finally, a self-similarity measurement computed over n cells centered in the c j pixels surrounding p (this topological structure is a circular block of radius L centered in p): otherwise and τ is a threshold slightly greater than 0 to make the mapping stronger in near-uniform regions. This produces d = 3 different orientation images (for details refer to [43]); an example of OR preprocessing is shown in Fig 4. The final preprocessing technique was Gabor filters. A 2D Gabor filter is a Gaussian kernel function modulated by a sinusoidal plane wave that is able to detect frequencies in various scales and directions. Gabor wavelets are derived from the convolution of the input image with a family of Gabor kernels. This family, or bank, of Gabor filters is created by dilating and rotating a specific function. Gabor wavelets approximate, to a certain level, the perception in the primary human visual cortex [28]. In this study four scales {1, 2, 3, 4} and four directions {0°, 45°, 90°, 135°} were implemented, thus 16 images are obtained. Choosing a specific frequency and direction allows creating a map containing the local frequency and orientation information for each pixel in an image.
A symmetric Gabor filter has the following general form in the spatial domain: Gðx; y; n; s; yÞ ¼ exp À x 02 þ y 02 where ν is the frequency of the sinusoidal wave, θ is the orientation, and σ is the standard deviation of the Gaussian envelope [44]. An example of Gabor filter and convolved image is shown in Fig 5. The various preprocessing techniques, and their ensembles, used in Section 3 are summarized in Table 2.
2.1.3 Region-based descriptors. This idea was mainly inspired by [26], where the edgebased LBP variant (Edge) is proposed. This bases on the evidence that, when an observer needs to fixate the attention to a particular image, the most likely perceived locations are the ones that present the highest spatial frequency edge information [45].
The Edge descriptor is computed as follows: 1. applying LBP to an image to obtain the LBP image (LBPI); 2. detecting the edges in the original image by means of Sobel filter. Two binary maps are created from the edge information: the edge map (E, where edge pixels are set to 1 and nonedge pixels to 0), and the non-edge map (NE, where edge pixels are set to 0 and non-edge pixels to 1); 3. combining LBPI with the E and NE masks, to obtain two histograms (H E for edge pixels and H NE for non-edge pixels), see (Abdesselam, 2013) for details; 4. mounting the final histogram (weighted concatenation): where w E and w NE represent the empirically determined weight that express the greater relevance of edge regions in capturing the viewer's visual attention; Unlike in [26], in this study the two histograms, H E and H NE , were not combined into one feature vector but they were used separately to train two different SVMs that are then combined by sum rule.

Wa
Features extracted from the four images obtained applying the wavelet decomposition to the original image. The four SVMs are combined by sum rule.

WaH
Features extracted from the images obtained applying a two level decomposition to the original image, as proposed by [14]. The chosen wavelet mother is the same used in Wa.

OR
Features extracted from the three orientation images obtained from the original image. This ensemble is built by 3 SVMs.

Ga
The original image is filtered by Gabor filters obtaining 16 images and then features are extracted. Two methods for extracting the two maps were tested: the former based on saliency and the latter on wavelet decomposition. It should be noted that for both approaches, as in Edge, the descriptor was extracted initially from the original image and the two regions were used only to calculate the two histograms. The flowchart of the region-based approach is reported in Fig 6. In detail: 1. the chosen descriptors (namely LBP, LTP, LPQ, RICLBP and WLD) were applied to the texture image to get the labeled image DescI; 2. two maps, Map + and Map -, were computed according to saliency or wavelet (details in the next sections); 3. two histograms, H + and H -, were computed by combining DescI with Map + and Map -, respectively; 4. H + and Hwere used to train two different SVMs that were then combined by sum rule.
Saliency: We used the method proposed in [46] to extract a saliency map from the image. Given an image x, the signature is defined as where sign() represents the sign operator and DCT() is the Discrete Cosine Transform. Hou et al. [46] demonstrated analytically and experimentally that the support of the foreground of an image can be approximated by the reconstructed image x, obtained by transforming back to the spatial domain the image signature, as follows: x ¼ IDCTðImageSignatureðxÞÞ: For images whose foreground is evident compared to their background, the saliency map m is defined as where g is a Gaussian kernel aimed to blur the noise induced by the sign quantization and o is the entrywise matrix product operator. To build the saliency map the standard deviation of the Gaussian kernel was set to 2.
For each image two regions were extracted. The first contained the pixels with saliency higher and the latter contained the pixels with saliency lower than a prefixed threshold. To build the saliency map two different thresholds were applied: 0.5 and 0.7. Hence, for each image, two saliency maps and four histograms were extracted.
Wavelet: The wavelet decomposition (see section 2.1.2) used four wavelets, and the horizontal, vertical and diagonal coefficients matrices were considered. These matrices were resized to the size of the original image and then the mean value of each image was calculated. Each image was divided in two regions whose pixels were respectively greater and smaller than the mean value.
The region-based methods, as well as the baselines, are summarized in the following Table 3.
2.1.4 Binarized Statistical Image Features. The Bsif descriptor assigns an n-bit label to each pixel of a given image by exploiting a set of n linear filters. Given a neighborhood of l x l pixels and a set of n linear filters of the same size, the n-bit label to be assigned to the central pixel of the neighborhood is obtained by binarizing s ¼ Wx where x is the l 2 x 1 vector notation of the l x l neighborhood and W is a n x l 2 matrix representing the stack of the vector notations of the filters. In detail, the i-th digit of s is a function of the i-th linear filter w i and it is expressed as thus each bit of the Bsif code can be obtained as The set of filters w i is estimated by maximizing, through independent component analysis, the statistical independence of the filter responses s i on a set of patches from natural images. In Table 3. Region-based methods and baselines (BAS) used for comparison.

Acronym
Region-based method BAS O Standard texture descriptor applied to the original image.

DoG
The Difference of Gaussians approach proposed in [25].

Saliency + Wavelet
Fusion by sum rule between Saliency and Wavelet;

All
The fusion by sum rule among Saliency, Edge and Wavelet; All + Comb The descriptors of All are extracted from the images obtained using Wa and OR, where the preprocessing approaches are applied to the original image and the two images obtained by MRS. Due to computation time Ga was not tested. where th = 0. We improved the stand-alone version of Bsif by combining different Bsif in two ensembles, Size_Bsif and Full_Bsif. Size_Bsif was obtained by varying the filter size = {3, 5, 7, 9, 11} (i.e. we use 5 different filters). The second ensemble, Full_Bsif, was derived from Size_Bsif by varying also the threshold th used to binarize the image. In detail, we used the following thresholds th = {-9, -6, -3, 0, 3, 6, 9} for each different size of the filter and, the 35 SVMs trained with these Bsif-based descriptors (for each couple of size and threshold a different SVM is trained) were combined by sum rule. 2.1.5 Multi-quinary coding. Variants of the original LBP descriptor were proposed, based on modifications on the binarizing function s(x), originally defined in [21] as: ; where x = q p −q c , q c represents the central pixel in the neighborhood and q p each of the surrounding pixels.
In [22], LTP was defined by encoding the same difference x with 3 values, by means of the threshold τ: In [13], this approach was extended to LQP by introducing two thresholds τ 1 and τ 2 (τ 1< τ 2 ), thus getting the quinary coding: These variants of the binary coding allow a lower sensitivity to noise, especially in near-uniform regions, and a higher level of granularity that allows catching more textural features with respect to the original version. However, to compensate the increased verbosity of the ternary and quinary codings, the ternary patterns are split into one positive and one negative binary patterns, according to the sign of its components, while the quinary patterns are split into four Table 4. Loci of points defining the different neighborhood topologies. For each geometric locus defined in [12], its formal definition and parameters are reported.

Locus
Definition Parameters Circle x 2 + y 2 = r 2 r: radius. Ellipse x 2 a 2 þ y 2 b 2 ¼ 1 a: semi-major axis; b: semi-minor axis. Parabola y ¼ À x 2 c þ 2c c: vertex-focus distance. Hyperbola x 2 a 2 À y 2 b 2 ¼ 1 a: semi-major axis; b: semi-minor axis. Spiral r = a + bθ a: turns the spiral; b: sets the inter-turnings distance; r, θ: polar coordinates. After computing one histogram for each binary pattern, the six partial histograms (two for LTP and four for LQP) are concatenated into a final histogram.
In MLQP, the threshold selection is a critical task: in [13], we set the thresholds manually to get good performance in studied datasets. The proposed thresholds were stable enough also in the RPE classification problem. The performance of MLQP was enhanced by building a large set of LQP coupling the set of thresholds with the geometric loci presented in [12] and summarized in Table 4.
All the loci of points (with the exception of the circle) were rotated by β = {0°, 45°, 90°, 135°} to catch the anisotropic structural information according to different orientations as in Fig 7. The flowchart of the quinary coding and the use of the geometric loci is reported in Fig 8. Afterwards, the Sequential Forward Floating Selection (SFFS) was applied, using the training data for selecting only a single subset of the MLQP descriptors.
SFFS and its predecessor Sequential Forward Selection (SFS) are top-down searches that sequentially select a subset of features from the original set of candidates in order find an optimal subset.
Starting from the empty subset S 0 , SFS sequentially adds the k-th feature, maximizing the objective function when combined with the subset S k-1 of the previously selected k-1 features, thus getting the current subset S k . However, the main drawback is that the selected features cannot be reevaluated and discarded after the addition of a new feature.
SFFS [47] improves SFS by carrying out backward steps after the inclusion of a new feature as long as the objective function rises. For instance, after the k-th step forward, i.e. the selection of the k-th feature, each feature in S k is removed from the subset to get a smaller subset S 0 kÀ1 whose performance is compared with S k−1 's. If S 0 kÀ1 results in a greater objective function than S k−1 , then it replaces S k−1 .
We used SFFS as a feature selection method, where each feature was assigned a couple of thresholds and a geometric locus, to find the most useful (thresholds, locus) sets for MLQP. Therefore, we selected a set of MLQP descriptors, where each descriptor was used to train a given SVM: the objective function of SFFS was the maximization of the area under the ROC curve (obtained combining by sum rule the set of SVMs) using an internal 10-fold cross validation in the training data. The same procedures (thresholds, geometric loci and feature selection) were used also to extend the Local Configuration Pattern (LCP) descriptor to its multi-threshold quinary version MLCP.
2.1.6 Color descriptors. For the RPE dataset only (Section 2.2) we used an additional set of descriptors, not based on texture, but on colors (COLORS). COLORS consists in the concatenation of statistics computed on the three channels of a color image: mean, homogeneity, standard deviation, third, fourth and fifth moments and the marginal histograms (8 bins for each channel) [ [51] were used for this study. Cell lines were cultured on top of mitomycin-treated (10 μg/ml,Sigma-Aldrich) (i.e. mitotically inactivated) human foreskin fibroblasts feeder cells (CRL-2429TM, ATCC, Manassas, VA, USA). The undifferentiated cells were cultured similarly as in Sorkio et al. [52] and after one week of culture the differentiation was induced by reducing the KO-SR concentration to 15%, removing the bFGF and commencing the floating culture as previously described in Vaajasaari et al. [53]. Floating aggregates were fed thrice a week and grown for 70-195 days. The pigmented areas of floating aggregates were manually dissected, dissociated with 1x Trypsin-EDTA and replated on collagen IV from human placenta (5 μg/cm 2 , Sigma-Aldrich). Adherently cultured cells were imaged for the fusiform morphology after 8 days (range 6-9 days), for the epithelioid morphology after 9 days (range 8-9) and for the cobblestone morphology after 19 days (range 17-24) of culturing.
2.2.2 Ethical issues. The National Authority for Medicolegal Affairs Finland has approved our research with human embryos (Decision number 1426/32/300/05). We also have a supportive statement from the local ethics committee of the Pirkanmaa hospital district Finland to derive and expand hESC lines for research purposes (R05116). Local ethics committee of the Pirkanmaa Hospital District has given a supportive statement to generate iPSC lines for research purposes (R11028), and use them to ophthalmic research (R14023). No new hESC or hiPSC lines were generated in this study.

Image acquisition.
The cell culture images were acquired for analysis with the same settings (25-125 ms exposure time, 2560 x 1920 pixels, dynamic contrast and autowhite balance) from cell cultures using a Nikon Eclipse TE200S phase-contrast microscope (Nikon Instruments Europe B.V., Amstelveen, Netherlands) with the 20x objective and Ph1 phase contrast. Cell imaging parameters are described in Table 5.
2.2.4 Building the RPE dataset. Each acquired image was divided into 16 subwindows which were manually labeled into 4 classes by two trained operators; samples particularly difficult to be labeled were inspected by a specialist. Subwindows containing clutters, out-of-focus elements or just background were discarded. The criteria of inclusion and examples are shown for each class in Table 6 and in Fig 9. In Fig 9, the four RPE classes are represented from left to right: the fusiform cell type, the epithelioid with its characteristic globular shape, the final maturation stage cobblestone and a mixed class example with an epithelioid cell in the middle of the image and, in the left side, a cluster of fusiform cells. The final dataset includes a total of 1862 subwindows: the number of subwindows per class is reported in Table 6. Before using the RPE images, they were converted to gray scale by means of the standard MATLAB (The MathWorks, Inc., Natick, Massachusetts, United States) function rgb2gray.

Validation in other datasets
For validating some of the proposed variants of texture descriptors and the system for RPE classification, we ran several comparisons also in other datasets. As testing protocol, we used the 5-fold cross validation, except for the VIR dataset for which the 10-fold validation protocol was provided by the original author.
The following datasets were used: • VIR: this dataset [55] contains 1500 images, evenly divided into 10 classes, of viruses extracted using negative stain transmission electron microscopy. The 10-fold validation protocol shared by the authors was used. The mask for background subtraction was not used and the features were extracted from the whole images. The dataset is available at http:// www.cb.uu.se/~gustaf/virustexture/; • HI: this Histopathology dataset [56] is composed of 2828 images from different organs, unevenly distributed among 4 classes, representative of the four fundamental tissues (connective, epithelial, muscular, and nervous). The dataset is available upon request to Loris Nanni [nanni@dei.unipd.it]; • BR: this dataset [57] is a subset of the digital database for screening mammography [58] and contains 1394 images of breast tissue, 810 control, 273 malignant and 311 benign breast cancers. The dataset is available upon request to Geraldo Braz Junior [ge.braz@gmail.com]; • PR: this dataset, reported in [59] contains 329 proteins, divided into DNA-binding (118 samples) and non-DNA-binding (231 samples). From the 3D tertiary structure of each protein, its 2D distance matrix was computed (considering only atoms that belong to the protein backbone) and used to extract texture features. The dataset is available upon request to Loris Nanni [nanni@dei.unipd.it]; • CHO: this cell dataset [24] contains 327 fluorescent microscopy images of Chinese Hamster Ovary cells and distributed into 5 classes. The dataset is available at http://ome.grc.nia.nih. gov/iicbu2008/hela/index.html#cho; • HeLa: the 2D HeLa dataset [23] consists in 862 single cell images, divided into 10 staining classes, from fluorescence microscope acquisitions on HeLa cells. The dataset is available at http://ome.grc.nia.nih.gov/iicbu2008/hela/index.html; • LE: the LOCATE ENDOGENOUS mouse sub-cellular organelles dataset [60] contains 502 images, unevenly distributed among 10 classes, of endogenous proteins or features of specific organelles. The dataset is available at http://locate.imb.uq.edu.au/; • LT: the LOCATE TRANSFECTED mouse sub-cellular organelles dataset [60] contains 553 images, unevenly distributed in 11 classes, of fluorescence-or epitope-tagged protein transiently expressed in specific organelles. The dataset is available at http://locate.imb.uq.edu.au/;

Experimental results for the RPE dataset
The chosen performance indicator was the area under the ROC curve (AUC) as it is more reliable than accuracy. AUC allows summarizing in one scalar value the ROC curve. In multi-class problems the one-versus-all approach was used: each of the m classes was considered as "positive" and the remaining m-1 classes as "negative", thus obtaining m partial AUCs. Finally, the global AUC was computed as an average of the partial AUCs.
As testing protocol a 10-fold cross validation was used. Of note, the 10-fold was applied at image level, so all the sub-windows of a given image belonged or to the training set or to the test set.
Of note, when referring to a texture descriptor combined with preprocessing/region-based approaches, we use the notation preprocessing(descriptor). For example, Comb(RICLBP) means RICLBP combined with the various preprocessing Wa, OR and Ga applied to the original image and the two images obtained by MRS (see Table 2).
In the first test, reported in Table 7 several texture descriptors and five different ensembles (the last five rows) were compared. The methods named A+B are the fusions by sum rule between the methods A and B. The LBP-based approaches use uniform bins, except in presence of the suffix -ri, representing the rotation invariant bins. From the results reported in Table 7, it is clear that the best performances were obtained using descriptor ensembles. The best performance was reached by the ensemble RICLBP+MLPQens+MLCP, while the best stand-alone approach was RICLBP. We used an already published method [61] for assessing the difference between two approaches in the same dataset. MLCP, i.e. the multi-threshold quinary ensemble of LCP built according to [13] (see section 2.1.5), outperforms RICLBP with a probability of 80% using a one sided test of significance with p = 0.05. However, the proposed ensembles RICLBP + MLPQens + MLCP and Comb(RICLBP) + Full_Bsif + MLPQens + MLCP outperform each stand-alone approach with a probability of 95%, using a one sided test of significance with p-value of 0.05.
We tested also the effectiveness of COLORS, obtaining an AUC of 89.10%. In Table 8 different approaches based on Bsif were compared. The method named Baseline represents the standard stand-alone Bsif with size = 7. Moreover, Full_Bsif was coupled with the best previous ensemble (i.e. RICLBP+MLPQens+MLCP) increasing slightly the performance.
The performance of the best descriptors was presented in Table 9, coupled with the preprocessing approaches detailed in section 2.2 and Table 2.
Notice that all the preprocessing approaches are coupled with only stand-alone texture descriptors due to the high computation time. The performances of all the descriptors were improved when the features were extracted from Comb, comparing with the performance obtained by O.
The region-based methods, proposed in section 2.3 and Table 3, were compared in Table 10.
It is clear that a descriptor applied to a set of processed images drastically outperformed the same descriptor obtained using only the original image. However, only some methods' performances benefited from the different preprocessing methods. The baseline multi-quinary approach (i.e. all the descriptors extracted from the circle neighborhood) and the effect of SFFS are reported in Table 11. SFFS supervised selection improved MLQP but had no positive effect on MLCP.
The final ensemble was created by sum rule among Comb(RICLBP), Full_Bsif, MLPQens and MLCP. It obtained an AUC of 86.49%. As shown in Table 6, the RPE dataset was unbalanced towards the cobblestone class, consequently a single 10-fold cross validation risks to create a training set not representative of the classes with fewer subwindows. Therefore we ran a higher-performance but more computationally demanding protocol: the leave-one-image-out, consisting in leaving out one full image (i.e. the image divided into the 16 subwindows) for each round. We tested such protocol on our best ensemble Comb(RICLBP) + MLPQens + MLCP + Full_Bsif, whose performance increased from 86.49% to 91.98%. A further improvement was obtained by the fusion by sum rule between such ensemble and COLORS, obtaining an AUC of 95.00%.

Results with other datasets
Afterwards, the proposed ensemble of Bsif, the preprocessing applied before feature extraction and the region-based descriptors were validated with the other datasets. Other tests, e.g. coupling MLCP with the selection of the geometric loci, were not performed due to their huge computational time.
As in the previous tests, the performance indicator was the AUC. Moreover, the experiments were statistically validated with the Wilcoxon signed rank test and the Bonferroni-Holm method.
The performances of Bsif and of standard texture descriptors, as baseline, were compared in Table 12. The three best baseline methods were CLBP, RICLBP and LTP. LTP outperformed all the other baseline approaches (except RICLBP and CLBP) with a p-value of 0.05. Furthermore, there was no difference between the performance of RICLBP and LTP. Moreover, Full_Bsif outperformed both Bsif, Size_Bsif and all the baseline approaches, including LTP with a p-value of 0.05. In order to avoid reporting a massive amount of results, in the following we summarized only the results from the best performing baseline descriptors, i.e. RICLBP, LPQ and LTP.
The effect of preprocessing on the additional datasets was reported in Table 13. We also tested the ensemble of preprocessing O+Wa+OR, i.e. the sum rule among the preprocessing approaches O, Wa and OR applied to the original image and to the two images obtained by MRS. We can conclude that, among the stand-alone preprocessing, O is the best one and that the best approach is the ensemble O+Wa+OR, which outperformed the baseline approach O with a p-value of 0.05.
The results reported in Table 14 showed that the region-based approaches outperformed, with a p-value of 0.05, the standard application of texture descriptors. Especially, compared to the baseline O, All+O improved AUC (or did not perform worse) for all the datasets and for the three tested descriptors with a p-value = 0.05.
Finally, to summarize the best techniques presented in this section, we created a further ensemble F where we combine Full_Bsif, O+Wa+OR(LTP), O+Wa+OR(RICLBP), O+Wa+OR (LPQ), All+O(LTP), All+O(RICLBP) and All+O(LPQ). F was compared in Table 15 to Full_Bsif: F performed better than Full_Bsif, or at least equally, on all the additional datasets with a pvalue of 0.05.

Conclusions
In this work, we assembled and tested many state-of-art texture descriptors and a large set of preprocessing methods for demanding image classification tasks and their application in a new and very specific biological problem, i.e. the automatic assessment of the maturation level of hPSC-RPE cells. This is a very well warranted problem as RPE cells are planned to be used for implantation and for in vitro cell models for drug and disease modeling. In all these applications the perquisite for the maturation assessment is the non-invasiveness and consequently there is the need for label-free methods. Thus, analysis methods based on just phase contrast microscopy images are welcomed. in this work we used the normalized histograms, while in [13] the non-normalized histograms. Therefore, for the same dataset, and for the same testing protocol, different results were reported.
The first aim of this work consisted in applying three new methods to create ensembles of texture descriptors (based on combinations of preprocessing techniques, region-based approaches and Bsif with different filter sizes and binarization thresholds) to find the most suitable descriptors for the texture-based classification of the considered datasets, in particular of the RPE dataset. A combination of different preprocessing techniques (i.e. Wa, OR and Ga applied at three different scale of representation obtained by MRS) allowed to boost all the best performing descriptors (see Table 9, with the exception of WLD, for which MRS was not necessary). It is interesting to note that while each descriptor obtained the best performance with a different preprocessing, the fusions Wa+OR+Ga and Comb, improved the single best preprocessing for all the descriptors. Similar improvements were obtained by the region-based methods, in particular by combining the region selection by Edge, Wavelet and Saliency (see Table 13. AUC obtained using the preprocessing approaches and LTP, RICLBP and LPQ.  Table 10). However, it is interesting to note that a global combination of preprocessing and region-based feature extraction did not provide significant improvements compared to using the two approaches individually (see Table 10, last column). Therefore, we chose to exploit only preprocessing, hence obtaining the new ensemble Comb(RICLBP) + Full_Bsif + MLPQens + MLCP. This approach performance was AUC = 86.49%, which is the best result observed on the RPE dataset. The second aim of this work was to provide the methodological core for a software tool in order to assess quantitatively the level of development of hPSC-RPE cells, compared to the classification provided in [4,62]. Our study primarily shows that a computer vision system is able to classify the RPE cell maturation stage and, secondly, it enables a correct and repeatable estimation of the maturation level on new images, a necessary step before using these specific cells and tissues, e.g. for drug testing. The most accurate ensemble, Comb(RICLBP) + Full_Bsif + MLPQens + MLCP, got an AUC of 86.49%, confirming that image processing methods can be employed to classify the maturity of RPE in microscopy images. Moreover, we proved that using the higher-performance, but slower to be validated, leave-one-image-out classification protocol we obtained for the best method Comb(RICLBP) + Full_Bsif + MLPQens + MLCP AUC over 91%.

Preprocessing
The only related studies are [9] and [11]. Jiang et al. [9] reported a correlation between two specific morphological features, cell area and aspect ratio, two maturation stages (young, <61days-old vs old, !100 or 180 days-old) and two genotypes (control vs rd10). However, the cell source was different, mouse eyes vs RPE cells derived from hPSCs, as well as the maturation stages, since their focus was only in the cobblestone stage. In Kamao et al. [11], the degree of pigmentation (objective dPG, in the original publication) was assessed as the main marker on hPSC-RPE maturation, by means of a manual assessment of the RGB values from the Photoshop's Info Palette, for single cells (obtained by manual segmentation) and cell-groups. In spite of the findings, in particular that the objective dPG correlated with the RPE function, the technique of Kamao et al. [11] required user interaction, which might not be objective and is not suitable for huge number of images.
To prove the feasibility of the proposed methods not only on RPE images, but also for a wider range of biological applications, additional tests were performed on a selection of 10 datasets, spacing from diagnostic to microscopy images (electronic transmission as well as fluorescence imaging). We observed that the best-performing configurations of the three new proposed approaches (namely region-based descriptors, the ensemble of preprocessing algorithms and the improved Bsif) provided good classification results, obtaining an average AUC greater than 95%. In particular, we compared all the tested/proposed ensemble approaches and the best method resulted to be Full_Bsif that outperformed all the other approaches with a p-value of 0.05. This result highlights the key role of these methods in improving many texture descriptors' accuracy in the classification of different kind of biological images. Of note, the effectiveness of Full_Bsif has to be seen all over the tested datasets (see Table 12). By changing the filter size and the binarization threshold we can obtain Size_Bsif and Full_Bsif which work better than the baseline Bsif on all the tested datasets (see Tables 8 and 12). On the RPE dataset, Full_-Bsif by itself obtains results lower than RICLBP+MLPQens+MLCP (82.79% vs 84.60%), nevertheless Full_Bsif obtains statistically higher performances on the additional datasets, thus resulting the best ensemble based on a single descriptor. Finally, we tested on the additional datasets, a last ensemble F built by gathering the best techniques of the aforementioned three new approaches: Full_Bsif, O+Wa+OR(LTP), O+Wa+OR(RICLBP), O+Wa+OR(LPQ), All+O (LTP), All+O(RICLBP) and All+O(LPQ). F outperformed Full_Bsif in six datasets and obtained equal AUC in the other four. Due to the variety of the additional datasets, F represents our proposed ensemble to process a generic dataset as well as a suggestion to other researchers for further studies on this topic. Of note, the approaches investigated in section 3.2 showed better performances on the additional datasets than in the RPE dataset. This is due to the nature of each dataset, e.g. staining, microscopy technique, etc. The RPE dataset was acquired directly imaging the cell cultures through a phase-contrast microscope. On the other hand, among the additional datasets we had more complex imaging techniques such as electronic transmission (VIR) or fluorescence imaging (CHO, HeLa, LE, LT, RNai). Such techniques involve sample treatment (e.g. ultrathin samples for electron transmission, staining with antibodies for immunofluorescence or other stains for histology images). In spite of the better image quality which then affects the classification performance, such processing necessarily alters the samples and it is time-consuming. Moreover, many datasets provided pre-segmented images, thus excluding the textural information from the background.
The main limitation in the RPE study is the strict standards we had to define and fulfill for the image acquisition, necessary to build a reliable dataset, but demanding to be implemented in the laboratory practice.
As future work, we will build new methods to build region-based approaches for the classification of biological image datasets. Furthermore, we aim to use a heterogeneous ensemble of classifiers, instead of a stand-alone SVM, to improve the performance. Finally, to remove potentially confounding patterns, we plan to design an automatic method to remove the background from samples such as fusiform and epithelioid images.
In conclusion, in this paper we presented three methods to developed ensembles of texture descriptors, proving that specific preprocessing and ensembling techniques improve the performance of many state-of-art texture descriptors. Moreover, we automated the classification of the maturation stages of RPE cells by means of an ensemble of texture descriptors. Such methods were finally validated on a wide set of general biological image analysis problems.