Supervised retinal vessel segmentation from color fundus images based on matched filtering and AdaBoost classifier

The structure and appearance of the blood vessel network in retinal fundus images is an essential part of diagnosing various problems associated with the eyes, such as diabetes and hypertension. In this paper, an automatic retinal vessel segmentation method utilizing matched filter techniques coupled with an AdaBoost classifier is proposed. The fundus image is enhanced using morphological operations, the contrast is increased using contrast limited adaptive histogram equalization (CLAHE) method and the inhomogeneity is corrected using Retinex approach. Then, the blood vessels are enhanced using a combination of B-COSFIRE and Frangi matched filters. From this preprocessed image, different statistical features are computed on a pixel-wise basis and used in an AdaBoost classifier to extract the blood vessel network inside the image. Finally, the segmented images are postprocessed to remove the misclassified pixels and regions. The proposed method was validated using publicly accessible Digital Retinal Images for Vessel Extraction (DRIVE), Structured Analysis of the Retina (STARE) and Child Heart and Health Study in England (CHASE_DB1) datasets commonly used for determining the accuracy of retinal vessel segmentation methods. The accuracy of the proposed segmentation method was comparable to other state of the art methods while being very close to the manual segmentation provided by the second human observer with an average accuracy of 0.972, 0.951 and 0.948 in DRIVE, STARE and CHASE_DB1 datasets, respectively.


Introduction
Manual inspection of fundus images by a specialist known as a direct ophthalmoscopy is being challenged by the computer-assisted diagnosis of retinal images. Although direct ophthalmoscopy using retinal fundus images could be considered as an effective approach for diagnosing various retina related diseases that can result in blindness such as macular degeneration and diabetic retinopathy, it is time-consuming and the results cannot be easily reproduced. On the other hand, computer-assisted diagnosis of retinal fundus images has been shown to be as accurate as direct ophthalmoscopy while being faster and more reliable. a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 The condition and the appearance of the blood vessel network can be considered as an important aspect in many computer-assisted diagnosis systems using retinal fundus images. As a result, many techniques have been proposed for retinal blood vessel segmentation. However, it remains a challenging task due to the variations in vessel shape and width coupled with image acquisition difficulties that often results in a low-quality image with uneven illumination and a considerable amount of noise. Previously proposed retinal blood vessel segmentation methods can be categorized into supervised and unsupervised methods. Unsupervised approaches usually rely on different combinations of image processing concepts such as morphological operations, different filtering techniques and clustering methods, to name a few. On the other hand, supervised methods (mostly including machine learning based methods) utilize a set of pixel-wise features derived from the images to construct a set of rules that can be used for separating vessels and the background.
In this paper, retinal vessel segmentation methods are briefly discussed as it is intended to provide some insight to the overall segmentation concepts and is by no means an exhaustive review of these methods. For a detailed review of vessel segmentation methods please refer to [1][2][3]. Currently, thin vessels and the noise in retinal images can be considered as the main challenges in retinal vessel segmentation. Moreover, the majority of vessel segmentation methods optimize their preprocessing and segmentation steps for each dataset separately, resulting in high accuracy on specific datasets whereas the accuracy will suffer if applied to other datasets. Although there are some interactive and semi-automatic segmentation approaches, most of the vessel segmentation methods are considered as automatic. Usually, vessel segmentation approaches in retinal images include preprocessing steps aimed at enhancing the vessels, although some methods may skip this step and go directly to the segmentation.
Popular unsupervised retinal vessel segmentation methods can be divided into vessel tracking, matched filtering and morphology based methods. Starting from a set of initial points defined either manually or automatically, vessel tracking methods try to segment the vessels by tracking the center line of the vessels. This tracking can be done by utilizing different vessel estimation profiles such as Gaussian [4][5][6], generic parametric [7], Bayesian probabilistic [8] and multi-scale profiles [9]. On the other hand, filtering based techniques utilize different kernels for modeling and enhancing retinal vessels such as matched filters [10], Gaussian filters [11], wavelet filters [12,13], Gabor filters [14,15] and COSFIRE filters [16][17][18]. Methods utilizing morphological operations can be used for both the enhancement of retinal images and the segmentation of blood vessel tree from the background [19][20][21].
The supervised approach requires the ground truth segmentation provided by human experts (regarded as the gold standard) for training a classifier using a set of features calculated based on local (pixel-wise) or global image characteristics, acting as a priori knowledge and guiding the training. This set of features should be able to effectively discriminate between different objects of interest such as vessel and background pixels and can be extracted by different concepts such as Gabor filter responses and gray level co-occurrence matrices, to name a few. Various classification concepts can also be used to classify the pixels such as adaptive boosting (AdaBoost), support vector machines (SVM), artificial neural networks (ANN), Gaussian mixture models (GMM) and k-nearest neighbors (k-NN).
Niemeijer et al. [22] proposed a supervised retinal vessel segmentation where a k-NN classifier is utilized for identifying vessel and non-vessel pixels based on a feature vector constructed using a multi-scale Gaussian filter. Staal et al. [23] proposed a similar approach utilizing a feature vector constructed using a ridge detector where the ridge pixels are grouped into convex sets that approximately represent the straight lines. Based on a feature vector constructed using 2-D multi-scale Gabor wavelet filters. Marin et al. [24] proposed a feed-forward neural network (NN) based classifier utilizing 7-D feature vector calculated using moment-invariant features. Fraz et al. [25] proposed a classifier based on boosted decision trees using a 9-D feature vector computed from Gabor filter responses, morphological transformation, line strength measures and gradient vector field. Ricci et al. [26] proposed a fast and computationally non-demanding approach utilizing an SVM coupled with features derived using a rotation-invariant linear operator and pixel intensity.
Lupascu et al. [27] proposed an AdaBoost classifier using a 41-D feature set that could achieve good accuracy on DRIVE dataset. Wang et al. [28] proposed an ensemble based retina vessel segmentation method that is currently amongst the most accurate methods proposed. Their method is based on the creation of super-pixels using a simple linear iterative clustering (SLIC) approach where one pixel from each of the super-pixels is randomly selected for feature extraction. A trainable hierarchical feature extraction approach using a convolutional neural network (CNN) is then used on the selected pixel with an ensemble based Random Forest (RF) being used as the main classifier. You et al. [29] proposed an SVM based semi-supervised method utilizing features extracted using radial projection. Roychowdhury et al. [30] proposed a GMM classifier using 8-D features calculated from pixel neighborhood on first-order and second-order gradient images.
Zhu et al. [31] proposed an extreme learning machine (ELM) based segmentation using a 39-D feature vector constructed using morphological and local features coupled with features computed from phase congruency, Hessian and divergence of vector fields. Zhu et al. [32] proposed a similar approach using a classification and regression tree (CART) classifier using a 36-D feature vector constructed using multi-scale and multi-orientation morphological transformation and local features coupled with features computed from divergence of vector fields. Wang et al. [33] proposed an SVM based segmentation using a 30-D feature vector constructed using Gaussian and multi-scale Gabor filter features coupled with features computed from divergence of vector fields. Tang et al. [34] proposed an SVM based segmentation using a feature vector constructed using Multi-Scale vessel filtering and Gabor Wavelet features. Aslani et al. [35] proposed a random forest classifier based segmentation using a 17-D feature vector constructed using multi-scale and multi-orientation Gabor filter responses and intensity features coupled with features computed from vesselness measure and B-COSFIRE filter response.
Although most supervised retina vessel segmentation methods require an extensive and computationally demanding training phase, the results obtained are more accurate than unsupervised segmentation. In this paper, a supervised approach for automatic segmentation of retinal blood vessels in fundus images is proposed. The proposed method is based on an AdaBoost classifier coupled with the most informative pixel-wise features selected by a "minimal-redundancy-maximal-relevance" (mRMR) feature selection approach for increased accuracy and decreased computational requirements. The AdaBoost classifier is used as it has strong discriminative power and is computationally efficient. Moreover, a combination of filters is used to improve the segmentation as each filter responds in a distinct manner to different pixels in the image. By combining these filters, it is also possible to make the segmentation approach more robust considering the different image characteristics between datasets. Moreover, while many features related to pixels (such as intensity and texture) can be used, these features could have a high degree of redundancy with complicated interrelations leading to high computational requirements that make supervised segmentation resource intensive. As a result, feature selection has been used to reduce the computational requirements. Furthermore, the proposed method has not been optimized for any specific dataset as the goal of the study was to identify a set of optimal features and preprocessing parameters that could be used on a variety of datasets and images. The performance of the proposed method is validated using publicly accessible DRIVE [23], STARE [36] and CHASE_DB1 [37] datasets commonly utilized in assessing the performance of retinal vessel segmentation methods utilizing fundus images. REVIEWDB [38] and BioImLab [39] datasets were excluded as they are designed for use in methods dealing with vessel width and tortuosity estimation, respectively.
The rest of the paper is organized as follows: materials and methods section introduces the datasets used, the proposed image and vessel enhancement steps followed by segmentation steps including feature extraction and selection steps along with the AdaBoost classifier. In results and discussion section, the effects of feature selection and extraction parameters on segmentation accuracy is shown and the results of the proposed method and its performance is compared to recent methods from the literature. Finally, the conclusions are drawn.

Materials and methods
In this study, the green channel of the RGB fundus image is utilized per suggestions from many previous works as it was shown that vessels have the highest contrast against the background in the green channel. The use of blue channel results in a small dynamic range and red channel offers insufficient contrast, as illustrated in Fig 1. Moreover, Mendonca and Campilho [40] further validated the use of the green channel by comparing different channels of the RGB image, Luminance channel of National Television Systems Committee (NTSC) color space and a component of the lab image representation system where the green channel of the fundus image was shown to provide better contrast. Furthermore, like most other studies, only the pixels inside the FOV area of the image is considered for the segmentation as the pixels outside this area are considered as background and have no known medical applications.

Datasets
Datasets used in this study are amongst the most popular publicly accessible datasets used for development and testing the performance of various retinal segmentation methods. Datasets used also include corresponding vessel segmentation done manually by different experts considered as the ground truth.
DRIVE dataset includes 40 color fundus images divided equally into training and testing sets with each including 20 images [23]. For each image in the dataset, an FOV mask along with the manual segmentations of the corresponding vessel tree (one expert for training set and two experts for testing set) are provided. A Canon CR5 non-mydriatic camera with an FOV of 45˚and bit depth of 8-bits was used to capture the images with a resolution of STARE dataset includes 20 color fundus images with half of them containing signs of different pathologies [36]. For each image in the dataset, manual segmentations of the corresponding vessel tree done by two experts are provided. A canon TopCon TRV-50 fundus camera with FOV of 35˚and bit depth of 8-bits was used to capture the images with a resolution of 700×605 pixels. Fig 5 illustrates an image from this dataset with its respective manual vessel segmentations.
CHASE_DB1 dataset (referred to as CHASE in some publications) includes 28 color fundus images acquired from 14 patients participated in the child heart and health study in England [37]. For each image in the dataset, manual segmentations of the corresponding vessel tree done by two experts are provided. A Nidek NM 200D fundus camera with FOV of 30˚and bitdepth of 8-bits was used to capture the images with a resolution of 1280×960 pixels. Fig 6 illustrates an image from this dataset with its respective manual vessel segmentations.
As seen from sample images, the manual segmentation provided by the first observer in DRIVE and CHASE_DB1 datasets includes finer segmentation for thinner vessels in contrast to STARE dataset where the manual segmentation provided by the second observer includes finer segmentation for thinner vessels.

Preprocessing and vessel enhancement
The extracted green channel of the fundus image is first filtered using a 3×3 median filter for reducing the image noise as median filters have the useful property of retaining edge information within an image. As retina fundus images have high contrast around the edges of the image resulting in possible false positive vessels being detected around the edges of the image, the edges are smoothed [15,17]. Initial FOV mask is computed by thresholding the luminance  channel on CIElab color space [41] computed from the original RGB fundus image. Then, the FOV boundary is dilated by one pixel where the value of the new pixel is calculated as the mean value of its 8-connected neighbor pixels. This process is repeated fifty times for ensuring that no false vessels will be detected near the FOV boundary [17].
Contrast limited adaptive histogram equalization (CLAHE) algorithm [42,43] is then used for enhancing the contrast between vessels and the background. CLAHE improves the local contrast and does not over amplify the noise present in relatively homogeneous regions. Then, the noise is further reduced by having the images morphologically opened. Finally, the image is further enhanced using a combination of morphological Top-hat and Bottom-hat operations.
Retinex based inhomogeneity correction. Often, retinal fundus images can include illumination inhomogeneity resulting in reduced segmentation accuracy. As a result, an inhomogeneity correction step is highly desired with Retinex theorem proposed by Land and McCann [44] being one of the most popular approaches. Retinex theorem, named after a combination of words retina and cortex, is used to remove the illumination inhomogeneity from the images, improving the segmentation accuracy [45][46][47][48]. In this study, a bilateral filter based Retinex algorithm is used as this approach was shown to provide better results in retinal images while preserving the vessels edge details [49,50]. Based on Retinex theorem, any given image I can be represented as a component-wise multiplication of its reflectance R and illumination L as: Let's denote x as a pixel belonging to image I, the pixel x of the reflectance image R(x) could be obtained by computing the difference of the logarithms between the original image I(x) and the resulting image L(x) after applying a bilateral filter to the original image I(x), defined as: Where w represents the window size used for measuring the spatial relations between pixels with a window size of 3×3 pixels being used in this study as suggested by [46] and M is a normalization factor. gð'; xÞ represents the spatial closeness (computed using Euclidean distance) between pixels x and ' inside the window (w) and sð'; xÞ represents the intensity similarity between these pixel pairs. σ d represents the spatial spread (based on low-pass filtering) and σ r represents the spread of the image intensity range with σ r and σ d value of 0.3 used in this study. Afterwards, the image background is computed and removed by subtracting the image from its median filtered image using a large kernel.
B-COSFIRE filter. Bar-selective combination of shifted filter responses (B-COSFIRE) was proposed by Azzopardi et al. [17,51] for detection of patterns with a bar shape. Vessels are enhanced with B-COSFIRE filter by using a bank of collinearly aligned Difference of Gaussian (DoG) filters that have been configured for detecting the bar-like appearance of blood vessels at different angles. A DoG filter for detecting the intensity variations of the image can be defined as: Where σ denotes the standard deviation of the Gaussian function. The response c σ (x,y) of a DoG filter is computed by convolving image I (x,y) with DoG (x,y) while replacing any negative values with zero. Fig 7 illustrates a B-COSFIRE filter configured for detecting a vertical bar. Let's consider point '1' as the center point for a set of concentric circles and responses c σ (x,y) along these circles, points '1' till '5' represent significant responses to significant intensity changes (assuming a circle with radios of zero at the center point). These points are represented using set S defined as: Where ρ i and Φ i represent the polar coordinates and n represents the number of DoG filter responses being considered. For the example image shown in Fig 7, S = {(0,0) 1 , (2,π/2) 2 , (2,3π/ 2) 3 , (4,π/2) 4 , (4,3π/2) 5 } with subscripts denoting the points with their position in the set S. By using this configuration, a B-COSFIRE filter is configured that is selective for the collinear alignment of significant intensity changes in an image (such as vessel patterns). By utilizing the DoG filter responses from different positions in set S, the B-COSFIRE filter output at the center points is determined. First, the DoG filter responses are blurred for allowing some tolerance in the position of the points being considered. The blurring is done by keeping the maximum value of the weighted DoG filter responses where a Gaussian function G σ 0 (x,y) is multiplied with DoG filter responses for the weighting. The standard deviation σ 0 is considered as a linear function defined as σ 0 = σ 0 + αρ i where σ 0 and α are constants. Then, the responses are centered at the center point of DoG filter where each blurred DoG filter response is shifted by distance ρ i with an opposite direction to Φ i . The DoG filter response s r i ;F i ðx; yÞ at each position (ρ i ,Φ i ) is computed as: Where: Shift values are given as: The output of the B-COSFIRE filter is then defined as the weighted geometric mean of all the blurred and shifted responses from the set S as: Where |.| t denotes the thresholding of the responses at a fraction of their maximum value where t = (0 t ! 1). It should be noted that a B-COSFIRE filter will have a response only in case of non-zero values from s r i ;F i ðx; yÞ as the weighted geometric mean is an 'AND' type function. By moving further from the center point of the B-COSFIRE filter, the contribution of the blurred and shifted responses decrease.
The orientation of the bar structures used in the configuration determines the orientation preference of a B-COSFIRE filter. As such, it is possible to achieve rotation invariance by considering outputs r s (x,y) computed from a set of bar structures oriented at angles ranging from 0 to π. Bar structures oriented at angle θ k can be computed by considering a new set R y k ðSÞ generated with respect to S as: Essentially, the process for obtaining responses r s (x,y) is the same as s(x,y) where only the bar orientations differ. In practice, twelve angles with equal intervals are used such as θ k = {kπ/12 | 0 k 12} where the maximum response value of the B-COSFIRE filters with different orientations is taken at every location (x,y) as: Due to 'AND' type output function of a B-COSFIRE filter, there should be no filter response at the end of the bars. However, due to the noise present in the images, a response will be computed although with a much lower value compared to a middle point in the bar. As such, a new B-COSFIRE filter was introduced that is selective for the bar ending known as asymmetric B-COSFIRE filter while the B-COSFIRE filter previously described is known as symmetric B-COSFIRE filter [17]. By placing the center point of the filter on the end point of the bar, a much higher response value is obtained. An example of symmetric and asymmetric B-COS-FIRE filter responses can be seen in Figs 7 and 8, respectively. In this study, the responses from both the symmetric and asymmetric B-COSFIRE filters are summed together with the controlling parameters set empirically with σ = 2.4, ρ = {0,2,4,. . .,10}, σ 0 = 3, α = 0.6 for symmetric and σ = 2.1, ρ = {0,2,4,. . .,18}, σ 0 = 1, α = 0.1 for asymmetric filters, respectively.

Frangi filter.
Frangi filters were first proposed for enhancing the vessel profiles in coronary artery segmentation [52]. Hessian matrix based Frangi filter is a popular approach that is both efficient and requires less computation time [53]. The Hessian matrix is constructed by computing the vertical and horizontal diagonals of the second-order derivative of the image. The Hessian based Frangi filter can be defined as: Where the pixel of interest is defined by x, the standard deviation for computing the Gaussian derivative of the image is denoted as σ and f represents the filter. The hessian matrix can be defined as: Where H xx , H xy , H yx and H yy represent the directional second-order partial derivatives of the image. Let's denote λ 1 and λ 2 as the eigenvalues of H, these are used for determining the probability of the pixel of interest x being a vessel based on the following notions: Then, the Hessian based Frangi filter can be defined as: Where R b and a are used for differentiating linear structures from blob-like structures while β and S are being used for differentiating background (noise) and vessels with the controlling

Extracting image features
The accuracy of machine learning based supervised segmentation approaches is highly dependent on the set of features being utilized. Thus, it is necessary to select the best set of features for a good separation between the vessels and the background. Sole use of intensity values as features is not that accurate and reliable and spatial relations between neighboring pixels should also be included in the features. In this study, a sliding window is used for extracting image features for each of the image pixels by considering a predefined set of pixels in the neighborhood of each pixel. Intensity based image features. Intensity based features can be used to represent some texture characteristics of the image using the distribution of intensities inside the image. These features are mostly calculated using the image histogram. Given an image I with size n, these features are defined as: Variance Skewness Kurtosis Image features calculated using gray level co-occurrence matrix. Co-occurrence matrices are used to track the distribution of the pixel pairs inside an image and are very popular in pattern recognition and image processing fields [54][55][56]. As this study uses gray intensity values for pixels, only one co-occurrence matrix will be used, known as gray level co-occurrence matrix (GLCM). The GLCM characterizes the image texture by calculating the adjacency occurrence of a pixel with specific intensity i to a pixel with intensity value j in a predefined distance d and angle θ. Although various distances can be used for calculating co-occurrence matrices, the choice of the angles is usually limited to angles as a multiple of 45˚(0˚, 45˚, 90˚, 135˚), allowing direct use of intensity values without any interpolation (as co-occurrence matrices are rotationally invariant). To reduce dimensionality, for any given sliding window, a single GLCM from all four angles is calculated in this study. Let P define the GLCM of a quantized image I (x,y) with P(i,j) representing the GLCM and N g representing the number of gray levels in image I. Only one GLCM of size N g × N g is computed per image per distance by simultaneously adding up the frequency of co-occurrences of all pixels with their connected neighbors with all pixels considered once as a center voxel. The entry (i,j) of the of the normalized GLCM is then defined as: Before any discussion on image features computed using GLCM, the following notions should be considered: Cluster Prominence ¼ P Homogeneity I ¼ P i P j p ði; jÞ 1 þ ji À jj ð48Þ Maximum probability ¼ max i;j pði; jÞ ð52Þ Gray level run length matrix. Given an image, a gray level run can be defined as a set of consecutive pixels having the same gray level and being collinear in any given direction with the number of pixels in this set representing the length of the run. Gray level run length matrices (GLRLM) are used to represent this set where each element, denoted by P (i, j, θ), contains the number of runs with length j with the gray level of i and the orientation θ representing line segment formed by the pixels [57,58]. Gray level run length matrices can be computed by: Where f (x, n) represents the gray level function for the pixel (m, n) , and τ (i,θ) is the length of the gray level run i with direction θ and CARD denotes the cardinality of the set (number of elements). The choice of the angles is usually limited to angles as a multiple of 45˚(0˚, 45˚, 90˚, 135˚) allowing direct use of intensity values without any interpolation (as GLRLM is rotationally invariant). To reduce dimensionality, for any given image a single GLRLM from all four angles is calculated. Utilizing the original run length matrix P(i, j, θ), texture characteristics can be defined as: Long Run Emphasis Gray Level Non À Unif ormity Run Percentage ¼ n r n p ð66Þ Low Gray Level Run Emphasis High Gray Level Run Emphasis In above equations, n r represents the total number of runs and n p represents the total number of pixels in the image.
Using Gabor filters for texture characterization. For 2D images (spatial Gabor filter), convolution is used for applying Gabor filters where varying kernels are defined as Gaussian kernels modulated by a sinusoid [59,60]. Using Cartesian basis as the center, these kernels are defined based on an abscissa with orientation θ. Gaussian and sinusoid components of the Gabor filter K θ,σ,γ,λ,φ can be customized independently. Gabor filter K θ,σ,γ,λ,φ is defined as: Where the Gaussian component is customized by its deviation σ and a spatial aspect ratio γ defining the ellipticity of the circular Gaussian and the sinusoid is customized by a spatial wavelength λ and a phase offset ϕ. The Gabor kernel is expressed using the orientation θ and with a change of scale defined by the size of pixel (v x ,v y ) and by a translation of the center of the kernel (i c ,j c ) as: Where: x 0 ¼ ði À i c Þv x cosy þ ðj À j c Þv y siny ð72Þ A Gabor kernel is made of parallel stripes with different weights inside an ellipsoidal envelope with the parameters of the kernel controlling the size, the orientation and the position of these stripes. The wavelength λ, specified in pixels, represents the wavelength of the filter. This value is used for scaling the stripes whereas by modifying the wavelength, the overall size of the stripes is modified with the stripes keeping the same orientation and relative dimensions. This wavelength should be more than two and is often chosen to be less than the fifth of the image dimensions (because of the influence of image borders).
The angle of the parallel stripes is specified using the orientation θ. By modifying θ, the kernel can be rotated and oriented at the desired position. The shift of the cosine factor is represented by the phase offset ϕ. The symmetry of the kernel is determined by ϕ whereas by modifying the shift, positions of the inhibitory and excitatory stripes changes. The kernel is symmetric for a phase shift of ϕ = 0 and asymmetric for a phase shift of ϕ = π/4. Ellipticity is represented using the aspect ratio γ. The bandwidth b is used as a replacement for the Gaussian deviation σ [61,62]. For extracting Gabor features, an image R(x,y) is defined where the input image I(x,y) is convolved with every Gabor filter g(x,y) from the banks of available filters as [63,64]: Where Ã denotes 2D linear convolution and M and N are the size of the Gabor filter mask. The local squared energy E(x,y) of the filtered image can be obtained by computing the absolute mean deviation of the transformed values in the filtered images from the mean μ within a window W of size M x M y as: Local responses from each one of the Gabor filters can also be represented in terms of amplitude A(x,y) defined as: To reduce dimensionality, for each scale λ a single mean value for squared energy and amplitude is calculated in this study.

Feature selection
As many of the extracted image features can be redundant or correlated, resulting in reduced performance in many classification methods, feature selection should be used for identifying the most prominent and informative features. Feature selection increases the classification accuracy by identifying the most discriminant features and their combination that might otherwise be hidden or contaminated by noisy features in a large feature vector. Another problem to consider during feature selection is the fact that the combinations of individually good features might not necessarily result in a good classification performance. In other words, "the m best features are not the best m features" [65]. It is likely that features selected using Max-Relevance criterion would have a high degree of redundancy, i.e. these features could have high dependency amongst them. This redundancy between features can result in a classifier with less than optimal performance. In this study, "minimal-redundancy-maximal-relevance" (mRMR) method proposed by [65] has been utilized for selecting the best set of features. The mRMR approach selects the most informative features using the maximal relevance criterion based on mutual information while minimizing the redundancy between the features and has gained wide popularity, especially in biomedical data analysis.

AdaBoost classifier
Introduced by Freund and Schapire [66], Adaptive Boosting known as AdaBoost is a supervised machine learning technique that constructs a strong classifier by combining a set of lowlevel classifiers (known as weak learners) with low discrimination ability. The AdaBoost algorithm has become very popular in problems related to computer vision and medical imaging as it is simple to implement and is relatively fast. The AdaBoost algorithm can be easily applied to many problems without the need for extensive tuning. Only the number of training iterations needs to be defined, along with a set of weak learners. Moreover, the AdaBoost algorithm can quickly calculate the results of classification, this is especially useful in methods having lots of classifications. Given a training set, composed of features (samples) with ground truth classes, AdaBoost constructs a strong classifier by combining multiple weak learners in a coarseto-fine approach. Although AdaBoost can be applied to multi-class classification problems, the context of this work only requires binary classification, therefore binary AdaBoost classification will be discussed. Given a training set X comprised of features and their classes (x i , y i ), the goal of the AdaBoost algorithm is the construction of a strong classifier H that provides rules to predict the class of a sample x i in the training set X with good accuracy. Given a set of weak learners fh j g j and a maximum number of learning iterations T, the algorithm constructs this classifier H as a weighted sum of t weak learners balanced by their weights αt. It should be noted that a certain weak learner might be used more than once while some available weak learners might remain unutilized.
The AdaBoost learning process is based on defining a classifier that minimizes the prediction error for the training set X. This minimization problem could be considered as identifying a set of weights a i along a set of weak learners {h 1 ,. . .,h T } that minimize the classification error on the training set. As the expected class y i is given for each sample x i , classification error is well defined. The main idea behind the learning process is to begin by classifying the easiest cases and then shifting the focus on more difficult cases. This process is based on a distribution D t that assigns a weight to each training sample x i . During the learning process, the weights for well classified samples will decrease while the weights for misclassified samples will increase, thus shifting the focus on more difficult samples. The learning process is iterative, containing three main steps as shown in Algorithm 1. Train a weak learner h t using distribution D t 2.
Update distribution as: First, a proper weak learner h t should be chosen with an error smaller than 0.5 for the current distribution D t . Then the error ε t is calculated with respect to the current distribution D t and later used to define the weight αt for the weak learner h t . Finally, a new distribution D t +1 is calculated to focus more on samples that were not correctly classified during the step t while reducing the importance of correctly classified samples. A normalization term Z t is also used so that D t +1 remains as a distribution. Often, samples are gives an equal weight at the initial distribution. However, in some cases such as in an unbalanced training set, the distribution might also be used to give some samples more weights. The algorithm stops when there is no error or when no weak learner has an error smaller than 0.5, as the classification accuracy cannot be improved on the training set. In the first instance, classification is already perfect in the training set, thus no further improvement is possible and in the second instance, the addition of more weak learners will not reduce the classification error. It should be noted that a treebased week learner has been utilized in this study.

Postprocessing
Due to the noise or anatomical structures that might be present in fundus images, there might be small blob-like regions of unconnected pixels wrongly identified as vessels (especially in pathological images). As a result, the final segmented image is cleaned by removing non-elongated and unconnected regions comprised of less than 30 pixels.

Performance measures
Definitions of true and false positive/negative. Sensitivity. Sensitivity represents the probability that the segmentation method will correctly identify vessel pixels. Sensitivity is computed as: Specificity. Specificity is the probability that the segmentation method will correctly identify non-vessel pixels. Specificity is computed as: The final classifier is the defined as: a t h t ðxÞ ! Accuracy. Accuracy represents the overall performance of a segmentation method. Accuracy is computed as: Area under a receiver operating characteristic curve. The area under a receiver operating characteristic curve (AUC), also known as AUROC, shows the performance of the segmentation method and is determined based on the trade-offs between sensitivity and specificity [67]. It should be noted that an AUC of 0.50 or less means that the segmentation algorithm is based purely on random guessing and an AUC of 1 means that the segmentation algorithm is able to segment all pixels correctly with respect to provided ground truth segmentation.

Results and discussion
In this study, like most of the previous studies, the performance of the proposed vessel segmentation method is compared to the provided ground truth segmentations by the first observer (referred to as the first expert in some publications) in all datasets and the test set of the DRIVE dataset. While DRIVE dataset includes separate training and testing sets, training the classifier for use in STARE and CHASE_DB1 datasets is done differently as they do not have a separate training set. Currently, there are two popular approaches for training a classifier on STARE dataset. The first approach is based on the leave-one-out concept [15,27] where one image is selected as test data and the classifier is trained using other images in the dataset with all the possible pixels used for the training, this process is repeated by changing the test image till all images in the dataset have been used once for testing. Second approach that is more popular (also used in this study) is based on randomly selecting a small number of pixels in each image and training the classifier using these samples with the use of 0.5% [26], 1% [35], 2% [28] and 6% [68] of the total pixels available in the dataset being suggested by different authors. In this study, a subset of 1% of randomly selected pixels inside the FOV from each image from STARE dataset and a subset of 5% of randomly selected pixels inside the FOV from each image from DRIVE training set is used for training the classifier as suggested by [35]. For CHASE_DB1 dataset, like the STARE dataset, a subset of 1% of randomly selected pixels inside the FOV from each image was used for training the classifier.
It should be noted that since the FOV masks are not provided in STARE and CHASE_DB1 datasets and to make the proposed method compatible with other datasets, FOV masks used were generated automatically and no dataset supplied FOV mask was used in this study. Table 1 illustrates the effects of different window sizes used for extracting image features on the classification accuracy using four randomly selected images from each of the datasets. Table 2 shows the accuracy implications of using different distances for calculating GLCM matrices computed on a 5×5 window utilizing an AdaBoost classifier with 200 learning cycles and 5-fold cross-validation using the same samples. As seen, the best segmentation accuracy can be achieved by using a window of 3×3 pixels and GLCM distance of one pixel.
The accuracy of different classification concepts was compared for selecting the most appropriate classification method for vessel segmentation using the same set of four randomly selected images from each of the datasets using 5-fold cross-validation with the AdaBoost classifier being the most accurate, as illustrated in Fig 10. As discussed, the accuracy of any classifier can be increased by selecting the most appropriate set of features and removing the redundant features. Table 3 illustrates different feature combinations sorted using the mRMR method and their effect on classifier accuracy using an AdaBoost classifier with 5-fold crossvalidation. It should be noted that each feature combination contains all the features listed before that combination using the same set of four randomly selected images from each of the datasets. As seen, a combination of 10 features can result in the best overall classification accuracy while reducing the computational time by an average of 70.74%.  As mentioned before, the length (the number of components) of an AdaBoost classifier is determined by the maximum number of learning cycles given by the user during the learning process. As the AdaBoost method is based on a coarse-to-fine approach, adding components in later learning cycles of the training process can not only result in a low increase in the overall quality of classification, it can also increase the classification error. Thus, an optimal number of components (length) of an AdaBoost classifier should be determined using a validation step. This optimal length can be determined by finding the number of components of the classifier whereby adding more components, no improvement on generalization error can be seen. Fig 11 illustrates the generalization error for an AdaBoost classifier for vessel classification using 1,250 learning cycles using the same set of four randomly selected images using Table 3. Different combinations of features and their effect on vessel segmentation accuracy. It should be noted that each feature combination contains all the features listed before that combination.

Feature combinations
Features Sensitivity Specificity Accuracy Supervised retinal vessel segmentation 5-fold cross-validation with 10 selected features per pixel (90% of samples were used for training and 10% were used for testing the classifier). As seen, by increasing the number of components the error decreases till 1100 components whereas the error begins to increase by adding more components. Based on validation results, training the classifier using 1080 learning cycles offers the best classification accuracy in this study. This validation step offers many advantages. First, it improves the noise tolerance of the classifier. If some noise is present in the training set, these noisy samples are used in later stages of the learning process due to the coarse-to-fine approach of the AdaBoost classifier. Thus, by determining the optimal number of learning cycles, the influence of these noisy samples on overall classification function might be minimized. Then, the validation can be used for reducing classifier over-fitting by identifying the optimum number of learning cycles. Finally, the classification process will be faster as the classifier will not perform unnecessary computations. It should be noted that the proposed method was implemented in MATLAB R2016a using an Intel Core i7-3370 3.4GHz CPU coupled with 8 gigabytes of RAM with average classifier training time of 260 minutes per dataset. Table 4 shows the performance of the proposed method compared to other state of the art segmentation methods for DRIVE and STARE datasets. Table 5 compares the performance of the proposed method and other state of the art segmentation methods that used CHASE_DB1 dataset. It should be mentioned that the first observer in DRIVE and CHASE_DB1 datasets has segmented thinner vessels compared to the second observer resulting in low sensitivity and high specificity while the second observer has segmented thinner vessels compared to the first observe in STARE dataset, resulting in high sensitivity and low specificity. Please note that the use of the character "-" in Tables imply that the performance metric was not implemented by the authors in their respective paper.
The results obtained on the DRIVE dataset shows that the proposed method is amongst the top methods with an accuracy of 0.9722, sensitivity of 0.8726 and specificity of 0.9884. In the  Moreover, the results show that the proposed method was more accurate than the segmentation provided by the second human observer for DRIVE and STARE datasets while being very close to the segmentation provided by the second human observer in CHASE_DB1 dataset. The results achieved are comparable to most supervised and unsupervised segmentation methods from the literature. The low interest for including CHASE_DB1 dataset in the development of vessel segmentation methods and lower segmentation accuracy observed in this dataset can be attributed to the non-uniform illumination in the background coupled with central vessel reflexes on some images and the low overall contrast between vessels, making accurate segmentation a challenging task. Although it is possible to increase the segmentation accuracy of the proposed method by adjusting the parameters for preprocessing and feature extraction/selection separately for each of the datasets, the goal of the study was to identify an optimal set of features, preprocessing and segmentation parameters that could be used on a variety of datasets and images.
For ensuring the robustness of supervised segmentation methods, a cross-training/testing approach is commonly used where a classifier is trained using one dataset and tested on other datasets and vice-versa. From a practical standpoint, this cross-training/testing can be a good measure of the effectiveness of supervised segmentation on unseen data with Table 6 illustrating the segmentation accuracy in case of this cross-training/testing. A total of 5%, 1% and 1% of the previously selected random pixels from DRIVE (train set), STARE and CHAS-E_DB1 datasets were used for training the classifiers, respectively. Table 7 compares the accuracy of different supervised methods proposed for retinal vessel segmentation in the case of cross-training/testing. As seen, the accuracy obtained for cross-training/testing is comparable to other methods from the literature with an accuracy of 0.9701 for DRIVE dataset trained using STARE dataset and an accuracy of 0.9484 for STARE dataset trained using DRIVE dataset. The proposed method can be considered independent of the training data as it did not show a considerable drop in accuracy compared to some other methods from the literature. The ROC curves of the proposed segmentation method for DRIVE, STARE and CHASE_DB1 datasets are illustrated in Fig 12. Table 8 shows the average segmentation time required by different vessel segmentation methods computed per image. Figs 13 and 14 illustrate a visual comparison between the vessel segmentation performance of the proposed method and other state of the art methods for a sample image from DRIVE and STARE datasets, respectively. As seen, the proposed method was able to provide acceptable segmentation accuracy with low levels of noise and segmentation artifacts. Although the visual comparison is subjective, it can still be used to highlight the advantages and disadvantages of different segmentation approaches. As illustrated, thin vessels and the noise in the images can be considered as the main challenges in retinal vessel segmentation. Another advantage of the proposed method can be seen in pathological retinal images where vessel pixels can be easily identified as non-vessels, degrading the usefulness of the segmentation. As illustrated in Fig 15, the proposed method could provide less noisy segmentation compared to other methods from the literature on some sample pathological retinal images from STARE dataset.

Conclusions
Vessel segmentation can be considered as an important step toward automated retina analysis tools. The segmented vessels can be used for advance retina image analysis such as computing the vessel tortuosity and diameter, differentiating arteries and veins along with measuring the arteriovenous ratio. Moreover, segmented vessels are routinely used as features in retinal disease classification systems that are used for identification of several systematic diseases such as stroke, hypertension or diabetes, to name a few. In this paper, a supervised retinal vessel segmentation algorithm based on matched filters and AdaBoost Supervised retinal vessel segmentation classifier is proposed. The image is enhanced using morphological operations, the contrast is increased utilizing CLAHE method and the image inhomogeneity is corrected by Retinex approach. Then, a combination of B-COSFIRE and Frangi matched filters are used to enhance the blood vessel network. From this enhanced image, using a sliding window, different pixel-wise statistical features are computed. Utilizing mRMR feature selection, a set of features were selected for use in an AdaBoost classifier while keeping the features as small as possible without sacrificing the segmentation accuracy. The proposed method could handle pathological retina images and produces good segmentation, especially in thinner vessels. The proposed segmentation method was validated on publicly accessible datasets using common validation metrics where the results in DRIVE (Sensitivity = 0.8726, Specificity = 0.9884), STARE (Sensitivity = 0.8085, Specificity = 0.9798) and CHASE_DB1 (Sensitivity = 0.8192, Specificity = 0.9591) datasets were shown to be comparable to all supervised and unsupervised methods from the literature.

Algorithm availability
The data used to test the algorithm with source code and MATLAB implementation of algorithms used in Table 8 are included as supporting information. The DRIVE, STARE and CHASE_DB1 datasets are available at http://www.isi.uu.nl/Research/Databases/DRIVE/, http://www.ces.clemson.edu/~ahoover/stare/ and https://blogs.kingston.ac.uk/retinal/ chasedb1/, respectively.