Multiview Locally Linear Embedding for Effective Medical Image Retrieval

Content-based medical image retrieval continues to gain attention for its potential to assist radiological image interpretation and decision making. Many approaches have been proposed to improve the performance of medical image retrieval systems, among which visual features such as SIFT, LBP, and intensity histograms play a critical role. Typically, these features are concatenated into a long vector to represent medical images, and traditional dimension reduction techniques such as locally linear embedding (LLE), principal component analysis (PCA), or Laplacian eigenmaps (LE) are then employed to alleviate the “curse of dimensionality”. Though these approaches show promising performance for medical image retrieval, the feature-concatenating method ignores the fact that different features have distinct physical meanings. In this paper, we propose a new method called multiview locally linear embedding (MLLE) for medical image retrieval. Following the patch alignment framework, MLLE preserves the geometric structure of the local patch in each feature space according to the LLE criterion. To explore complementary properties among a range of features, MLLE assigns different weights to local patches from different feature spaces. Finally, MLLE employs global coordinate alignment and alternating optimization techniques to learn a smooth low-dimensional embedding from different features. To justify the effectiveness of MLLE for medical image retrieval, we compare it with conventional spectral embedding methods. We conduct experiments on a subset of the IRMA medical image data set. Evaluation results show that MLLE outperforms state-of-the-art dimension reduction methods.


Introduction
Medical image interpretation is a procedure that requires high accuracy. Currently, radiologists rely on both knowledge and heuristics to accomplish it [1]. Because of differences in perception, training and fatigue among radiologists, different readers may interpret the same image differently [2][3][4]. Moreover, with the wide deployment of modern medical imaging devices in hospitals, large numbers of medical images are produced every day, placing an additional burden on radiologists: on the one hand, they have to render an accurate diagnosis for each image; on the other, they have to interpret large numbers of images within a limited time frame [4].
To tackle these challenges, content-based image retrieval (CBIR) has been introduced into the radiology interpretation routine in recent years [4][5][6][7][8][9][10][11]. CBIR employs visual descriptors to represent medical images, and machine learning techniques to retrieve and compare those images. For a given query image, content-based medical image retrieval (CBMIR) aims to find visually similar and semantically relevant counterparts by retrieving samples from a given medical image archive. In the context of CBMIR, a medical image is usually represented as a feature vector, and the similarity between two medical images is measured by the distance between the corresponding feature vectors. This helps radiologists to efficiently extract similar cases from a variety of archives, thus assisting medical image interpretation and decision making.
Like CBIR in general, CBMIR faces two basic issues: representing medical images with discriminative visual features, and assessing the similarity among images in the feature space. This paper focuses on the former issue.
By contrast with images in other domains, most medical images are gray-value images in which fine details are emphasized [4]. A single feature therefore cannot capture all the details of a medical image. Following this observation, many visual features have been employed simultaneously to reveal different aspects of medical images. Dimitrovski et al. [12] extracted pixel values, local binary patterns (LBP) [13], the edge histogram descriptor [14] and SIFT features [15] to represent medical images. Lehman et al. [16] proposed an automatic medical image categorization framework that combines four types of texture features and one intensity feature to represent medical images. Chen et al. [17] extracted six textural features to represent ultrasound images. In [18], Wu et al. recently extracted texture features and morphological features to classify ultrasound breast tumor images. Moreover, Dy et al. [19] proposed a lung image retrieval method based on 110 features. For a detailed review of features used in the medical domain, please refer to [4]. In this paper, we call these visual features “multiview features”.
With the increasing use of multiview features, medical CBIR also suffers from the “curse of dimensionality”. To reduce the dimension of feature vectors, one conventional solution is to concatenate these feature vectors into a long vector and then use traditional dimension reduction techniques, e.g., locally linear embedding (LLE) [20], principal component analysis (PCA) [21] or Laplacian eigenmaps (LE) [22], to project the concatenated vector into a low-dimensional subspace. Huang et al. [23] built a computer-aided breast cancer diagnosis system that uses PCA to project original high-dimensional textural features into a low-dimensional feature space. Zhang et al. [24] proposed a brain midsagittal plane image recognition system that employs PCA to perform dimensionality reduction. Chen et al. [17] used PCA to reduce the dimension of textural feature vectors extracted from breast ultrasound images. In [25], Cho et al. employed linear discriminant analysis (LDA) to perform feature selection. Although these solutions have achieved promising results, there is room for improvement, because they perform dimension reduction coarsely on all features and ignore the fact that different features have distinct physical meanings. Recently, Bagci et al. [26] proposed a hybrid scheme for chest radiological image feature selection: they first selected features that could coarsely identify abnormal imaging patterns, and then refined the selected features to enhance prediction accuracy.
To solve these problems, and considering the complementary properties of various features, we formulate a new method called multiview locally linear embedding (MLLE) to represent medical images in a low-dimensional feature space that is learned simultaneously from multiview features. MLLE is proposed in a context in which multiview learning has received intensive attention in the machine learning community [27][28][29][30][31][32][33][34][35]. The key idea of MLLE comes from the patch alignment framework [36] and LLE. The patch alignment framework unifies various spectral analysis-based dimensionality reduction algorithms in two stages: local patch construction and whole alignment [36]. LLE constructs a local patch in the low-dimensional space by preserving the patch's linear reconstruction relation in the original space, whereas MLLE constructs local patches in each feature space and preserves the geometric structure of these patches according to the LLE criterion. To explore the complementary properties among multiview features, MLLE assigns different weights to patches from different feature spaces. Finally, MLLE uses global coordinate alignment [36,37] and alternating optimization [38] techniques to learn a smooth low-dimensional embedding from the multiview features. We present a detailed evaluation of MLLE for CBMIR to demonstrate its effectiveness. Compared with conventional dimension reduction methods, e.g., PCA, LLE and LE, MLLE differs in the following ways: 1) MLLE uses LLE to obtain the optimal low-dimensional subspace on each view, and 2) MLLE learns a smooth low-dimensional global subspace by exploring the complementary properties of the views.
To evaluate the performance of the proposed MLLE, we conduct experiments on an IRMA-coded [39] medical image data set [40]. The IRMA medical image coding system [39] is a mono-hierarchical multi-axial classification standard for medical images. The system classifies medical images along four orthogonal axes: imaging modality, body orientation, examined body region and examined biological system. The IRMA coding system is applicable to medical images obtained by different imaging techniques, such as computed tomography (CT), digital radiography (DR), magnetic resonance imaging (MRI), and positron emission tomography (PET).

Multiview Locally Linear Embedding
In this section, we detail the proposed dimension reduction algorithm, MLLE. To present MLLE clearly, we first define the mathematical notation used in this paper.
In the rest of this paper, $X = \{x_1, \cdots, x_N\}$ denotes the medical image data set, which contains $N$ medical images, and $Y$ denotes the corresponding low-dimensional embedding of $X$. For each medical image $x_i$, $i = 1, \cdots, N$, we extract $V$ different low-level features to represent its visual content. We then say that $x_i$ has $V$ different views: $x_i^v$ is the feature vector of $x_i$ on the $v$th view. Accordingly, $X$ has $V$ different views, $X^v = \{x_1^v, \cdots, x_N^v\}$, $v = 1, \cdots, V$. $X_i^v = \{x_i^v, x_{i1}^v, \cdots, x_{iK}^v\}$ represents the local patch of $x_i$ built on the $v$th view, which contains $K+1$ images, where $x_{i1}^v, \cdots, x_{iK}^v$ are the $K$ nearest neighbors of $x_i^v$. A detailed description of these notations is given in Table 1.

Table 1. Important notations used in this paper.


Local Patch Construction
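Local patch construction on a single view. Given a point $x_i^v$ and its $K$ nearest neighbors $x_{i1}^v, \cdots, x_{iK}^v$, LLE reconstructs $x_i^v$ from its neighbors by linear coefficients [20]:

$$x_i^v \approx \sum_{j=1}^{K} w_{ij}\, x_{ij}^v, \qquad (1)$$

where $w_i = (w_{i1}, w_{i2}, \ldots, w_{iK})^T$ is determined by minimizing the reconstruction error

$$w_i = \arg\min_{w_i} \Big\| x_i^v - \sum_{j=1}^{K} w_{ij}\, x_{ij}^v \Big\|^2, \quad \text{s.t.} \ \sum_{j=1}^{K} w_{ij} = 1. \qquad (2)$$

By solving (2), we get [20]

$$w_{ij} = \frac{\sum_{q=1}^{K} (M^{-1})_{jq}}{\sum_{p=1}^{K} \sum_{q=1}^{K} (M^{-1})_{pq}}, \qquad M_{pq} = (x_i^v - x_{ip}^v)^T (x_i^v - x_{iq}^v),$$

where $M$ is the $K \times K$ local Gram matrix of the patch $X_i^v$.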
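When $K > m$, where $m$ is the dimension of the feature space, or when the data points $x_{i1}, \cdots, x_{iK}$ are not in general position, matrix $M$ is singular or nearly singular [41]. To avoid this, a regularization term is added to each entry of $M$ according to the following criterion [41]:

$$M_{pq} \leftarrow M_{pq} + \gamma_{pq}\, \frac{c^2}{K}\, \mathrm{tr}(M),$$

where the constant $c$ satisfies $c^2 \ll 1$, $\mathrm{tr}(\cdot)$ is the trace operator, and $\gamma_{pq} = 1$ if $p = q$ and $\gamma_{pq} = 0$ otherwise.

LLE assumes that $Y_i^v = \{y_i^v, y_{i1}^v, y_{i2}^v, \ldots, y_{iK}^v\}$, the local patch corresponding to $X_i^v$ in the learned low-dimensional embedding, is also reconstructed by $w_i$. Writing $Y_i^v$ as a $d \times (K+1)$ matrix, the local patch optimization on the $v$th view is

$$\arg\min_{Y_i^v} \Big\| y_i^v - \sum_{j=1}^{K} w_{ij}\, y_{ij}^v \Big\|^2 = \arg\min_{Y_i^v} \mathrm{tr}\big(Y_i^v L_i^v (Y_i^v)^T\big), \qquad L_i^v = \begin{bmatrix} 1 \\ -w_i \end{bmatrix} \begin{bmatrix} 1 & -w_i^T \end{bmatrix}.$$

Local patch construction on multiple views. Each medical image $x_i$ has $V$ local patches, $X_i^1, X_i^2, \ldots, X_i^V$. These multiview local patches correspond to different low-dimensional local patches, which we denote as $Y_i^1, Y_i^2, \ldots, Y_i^V$. The different features make different contributions to the representation of the medical image in the final low-dimensional embedding $Y$, so these low-dimensional local patches have different degrees of importance in determining $Y$. Considering this, we have the following objective function of multiview local patch optimization for the $i$th patch:

$$\arg\min_{Y_i^1, \ldots, Y_i^V,\, \mathbf{c}} \sum_{v=1}^{V} c_v\, \mathrm{tr}\big(Y_i^v L_i^v (Y_i^v)^T\big), \quad \text{s.t.} \ \sum_{v=1}^{V} c_v = 1, \ c_v \ge 0,$$

where $\mathbf{c} = (c_1, c_2, \ldots, c_V)^T$ and the $v$th entry $c_v$ reflects the contribution of the $v$th view to learning the final embedding $Y$.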
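As an illustration only, the regularized weight computation of (2) can be sketched in Python with NumPy. The function name `lle_weights` and the default `c = 1e-3` are hypothetical choices for this sketch (the text only requires $c^2 \ll 1$). Solving $Mw = \mathbf{1}$ and then normalizing is equivalent to the standard closed-form LLE solution [20].

```python
import numpy as np

def lle_weights(x_i, neighbors, c=1e-3):
    """Regularized LLE reconstruction weights of x_i from its K neighbors.

    x_i: (m,) feature vector; neighbors: (K, m) array, one neighbor per row.
    c: hypothetical regularization constant (only c**2 << 1 is required).
    """
    K = neighbors.shape[0]
    diff = x_i[None, :] - neighbors                   # row p is x_i - x_ip
    M = diff @ diff.T                                 # local Gram matrix M_pq
    M = M + np.eye(K) * (c ** 2 / K) * np.trace(M)    # gamma_pq * (c^2 / K) * tr(M)
    w = np.linalg.solve(M, np.ones(K))                # proportional to sum_q (M^-1)_jq
    return w / w.sum()                                # enforce sum_j w_ij = 1

# K = 4 > m = 2, so M is singular without the regularization term.
neighbors = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
w = lle_weights(np.array([0.5, 0.5]), neighbors)
```

For the centre of a unit square reconstructed from its four corners, symmetry makes all regularized weights equal to 0.25, and they reconstruct the centre exactly.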

Global Coordinate Alignment
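For each local patch $Y_i^v = \{y_i^v, y_{i1}^v, \ldots, y_{iK}^v\}$, we assume that all $Y_i^v$s are selected from the final embedding $Y = \{y_1, y_2, \ldots, y_N\}$, i.e., $Y_i^v = Y S_i^v$, where $S_i^v \in \mathbb{R}^{N \times (K+1)}$ is a selection matrix:

$$(S_i^v)_{nk} = \begin{cases} 1, & \text{if the } k\text{th element of the local patch } X_i^v \text{ is } x_n^v, \\ 0, & \text{otherwise.} \end{cases}$$

Considering the whole medical image data set $X = \{x_1, x_2, \ldots, x_N\}$, we can unify all local patches into the final embedding $Y$ to obtain the global coordinate alignment (a detailed derivation is given in Appendix S1):

$$\arg\min_{Y,\, \mathbf{c}} \sum_{v=1}^{V} c_v\, \mathrm{tr}\big(Y L_v Y^T\big), \qquad (9)$$

where $L_v = \sum_{i=1}^{N} S_i^v L_i^v (S_i^v)^T$ and $L_i^v$ encodes the LLE reconstruction relation of the $i$th local patch on the $v$th view.

Objective Function

To uniquely determine the low-dimensional embedding $Y$ from (9), we add the constraint $Y Y^T = I$. Thus $Y$ is obtained by solving the optimization problem

$$\arg\min_{Y,\, \mathbf{c}} \sum_{v=1}^{V} c_v\, \mathrm{tr}\big(Y L_v Y^T\big), \quad \text{s.t.} \ Y Y^T = I, \ \sum_{v=1}^{V} c_v = 1, \ c_v \ge 0. \qquad (11)$$

The solution to $\mathbf{c}$ is $c_v = 1$ for the view $v$ that minimizes $\mathrm{tr}(Y L_v Y^T)$, and $c_v = 0$ otherwise. This means that only one view is selected to learn the low-dimensional embedding $Y$, while the other views are discarded. To avoid this, we replace $c_v$ with $c_v^r$, where $r > 1$. The optimization problem in (11) then becomes

$$\arg\min_{Y,\, \mathbf{c}} \sum_{v=1}^{V} c_v^r\, \mathrm{tr}\big(Y L_v Y^T\big), \quad \text{s.t.} \ Y Y^T = I, \ \sum_{v=1}^{V} c_v = 1, \ c_v \ge 0. \qquad (12)$$

Alternating Optimization

There are two unknown parameters, $\mathbf{c}$ and $Y$, in (12). We employ the alternating optimization technique [38] to solve this problem. The alternating optimization procedure alternates between the following two steps.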
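Before the two steps are detailed, here is a compact sketch of the whole procedure. It assumes NumPy, dense $N \times N$ matrices (feasible only for small toy data, not for the full IRMA set), L2 nearest neighbors, equal initial view weights, and a fixed iteration count instead of a convergence test; these choices and the names `patch_alignment` and `mlle` are assumptions of this sketch, not reported details of MLLE. Following the text, it keeps the eigenvectors of the $d$ smallest eigenvalues (standard LLE would additionally discard the constant bottom eigenvector).

```python
import numpy as np

def patch_alignment(X, K, c=1e-3):
    """Alignment matrix L_v = sum_i S_i L_i S_i^T for one view (X: N x m)."""
    N = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # squared L2 distances
    np.fill_diagonal(d2, np.inf)                          # x_i is not its own neighbor
    nbrs = np.argsort(d2, axis=1)[:, :K]
    L = np.zeros((N, N))
    for i in range(N):
        diff = X[i] - X[nbrs[i]]
        M = diff @ diff.T
        M = M + np.eye(K) * (c ** 2 / K) * np.trace(M)    # regularized Gram matrix
        w = np.linalg.solve(M, np.ones(K))
        w = w / w.sum()
        idx = np.concatenate(([i], nbrs[i]))     # patch order: x_i, then its K neighbors
        e = np.concatenate(([1.0], -w))          # y_i - sum_j w_ij y_ij = Y_i @ e
        L[np.ix_(idx, idx)] += np.outer(e, e)    # add S_i L_i S_i^T with L_i = e e^T
    return L

def mlle(views, d, K, r=2.5, n_iter=20, c=1e-3):
    """Alternating optimization of (12); returns Y (d x N) and view weights c."""
    Ls = [patch_alignment(X, K, c) for X in views]
    coef = np.full(len(Ls), 1.0 / len(Ls))       # start from equal view weights
    for _ in range(n_iter):
        # Step 1: fix c; rows of Y are eigenvectors of sum_v c_v^r L_v with
        # the d smallest eigenvalues (np.linalg.eigh sorts them ascending).
        L = sum(cv ** r * Lv for cv, Lv in zip(coef, Ls))
        _, vecs = np.linalg.eigh(L)
        Y = vecs[:, :d].T                        # orthonormal rows, so Y Y^T = I
        # Step 2: fix Y; c_v proportional to (1 / tr(Y L_v Y^T))^(1 / (r - 1)).
        t = np.array([np.trace(Y @ Lv @ Y.T) for Lv in Ls])
        u = (1.0 / np.maximum(t, 1e-12)) ** (1.0 / (r - 1.0))  # guard t close to 0
        coef = u / u.sum()
    return Y, coef

rng = np.random.default_rng(0)
views = [rng.normal(size=(30, 5)), rng.normal(size=(30, 8))]  # two toy views
Y, coef = mlle(views, d=2, K=4)
```

The default `r = 2.5` mirrors the value the parameter study later reports as best; any $r > 1$ keeps every view weight strictly positive.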
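Step 1: Fix $\mathbf{c}$ to update $Y$. When $\mathbf{c}$ is fixed, the optimization problem in (12) is equivalent to

$$\arg\min_{Y} \mathrm{tr}\Big(Y \Big(\sum_{v=1}^{V} c_v^r L_v\Big) Y^T\Big), \quad \text{s.t.} \ Y Y^T = I. \qquad (13)$$

Because each $L_v$ is symmetric and positive semidefinite (the proof is given in Appendix S2) and $c_v^r \ge 0$, $\sum_{v=1}^{V} c_v^r L_v$ is also symmetric and positive semidefinite. Hence, the optimization problem in (13) can be solved by eigenvalue decomposition of $\sum_{v=1}^{V} c_v^r L_v$: the globally optimal $Y$ consists of the eigenvectors associated with the $d$ smallest eigenvalues of $\sum_{v=1}^{V} c_v^r L_v$.

Step 2: Fix $Y$ to update $\mathbf{c}$. When $Y$ is fixed, the optimization problem in (12) can be solved with a Lagrange multiplier. The Lagrange function is

$$\mathcal{L}(\mathbf{c}, \lambda) = \sum_{v=1}^{V} c_v^r\, \mathrm{tr}\big(Y L_v Y^T\big) - \lambda \Big(\sum_{v=1}^{V} c_v - 1\Big).$$

By setting the derivative of $\mathcal{L}$ with respect to each $c_v$ to zero, i.e., $r\, c_v^{r-1}\, \mathrm{tr}(Y L_v Y^T) - \lambda = 0$, and given that $\sum_{v=1}^{V} c_v = 1$, we get

$$c_v = \frac{\big(1/\mathrm{tr}(Y L_v Y^T)\big)^{1/(r-1)}}{\sum_{u=1}^{V} \big(1/\mathrm{tr}(Y L_u Y^T)\big)^{1/(r-1)}}.$$

Experiment Setup

In this section, we describe the experimental setup used to evaluate MLLE for CBMIR. In Section 3.1, we introduce our test bed, the IRMA medical image data set. In Section 3.2, we detail medical image feature extraction.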

IRMA Medical Image Data Set
The IRMA medical image data set is a popular benchmark database used to evaluate CBMIR [6,12,42,43]. The new version of the IRMA medical image data set [40] contains 193 categories with a total of 12,677 fully annotated gray value radiographs in a training set. These images are 8 bits per pixel. The images are categorized according to a mono-hierarchical multi-axial classification standard called IRMA coding system [39]. The coding system classifies a medical image from four orthogonal axes: imaging modality, body orientation, body region examined and biological system examined. We select the first 57 categories containing a total of 10,902 images from the training set for our experiment. Figure 1 shows examples of the images used in our evaluation.

Feature Extraction
All images in the IRMA dataset are gray value images, which encode ample texture information. We use three image descriptors, i.e., local binary patterns (LBP) [13], SIFT [15], and pixel intensity, to extract the visual features from each medical image.
To enhance the discriminability of the image descriptors, we divide each medical image into equal-sized regions for each descriptor. In each region, the image descriptor is used to extract visual features, and the feature vectors obtained from all regions are concatenated into a single long vector that represents the image. For each image descriptor, we employ four different image division schemes, so each descriptor generates four different features; with three image descriptors, we obtain twelve different features from each image. The feature extraction procedure of each image descriptor is detailed below.
LBP. LBP is a powerful descriptor for analyzing two-dimensional textures. It is robust to gray-scale variations and has low computational complexity, which makes it appropriate for gray-scale medical image analysis. Formally, for a center pixel $c$ at $(x, y)$ with gray value $g_c$, there are $P$ equally spaced pixels in the circularly symmetric neighbor set of $c$ with radius $R$. LBP assigns a value to the center pixel $c$ [13]:

$$\mathrm{LBP}_{P,R} = \sum_{p=0}^{P-1} s(g_p - g_c)\, 2^p, \qquad s(z) = \begin{cases} 1, & z \ge 0, \\ 0, & z < 0, \end{cases}$$

where $g_p$ is the gray value of the $p$th neighbor of center pixel $c$. Observing LBP values in their binary circular representation, we find that the vast majority of LBP binary codes, sometimes more than 90%, have a “uniform” appearance [13]. Here, a uniform appearance means that the LBP code contains only a limited number of 0/1 or 1/0 transitions. These uniform binary patterns capture discriminative local features of the image content, e.g., edges, corners, and spots. After LBP values are computed pixel by pixel over an image or image region, they are accumulated into a discrete occurrence histogram: each uniform pattern is accumulated into its own bin, while all remaining “non-uniform” patterns are accumulated into one additional bin.
In our implementation, we use the $\mathrm{LBP}^{u2}_{8,1}$ operator to compute LBP values over a medical image, pixel by pixel. The subscript (8, 1) means that eight neighbors, equally spaced on a circle of radius one, are used to determine the LBP value of the center pixel; the resulting LBP value can therefore be encoded as an 8-bit binary string. The superscript u2 denotes uniform patterns, which have at most two 0/1 or 1/0 transitions. For an 8-bit LBP binary string, there are 58 u2 patterns; hence, the resulting discrete occurrence histogram has 59 bins (58 uniform bins plus one non-uniform bin).
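A minimal NumPy sketch of the $\mathrm{LBP}^{u2}_{8,1}$ histogram follows. For simplicity it samples the eight radius-one neighbors on the integer pixel grid instead of interpolating the diagonal neighbors as in [13]; the function names are hypothetical.

```python
import numpy as np

def uniform_lookup(P=8):
    """Map each P-bit code to a bin: the 58 uniform codes get bins 0..57,
    every non-uniform code shares the last bin (58)."""
    table = np.full(2 ** P, -1, dtype=int)
    nxt = 0
    for code in range(2 ** P):
        bits = [(code >> k) & 1 for k in range(P)]
        transitions = sum(bits[k] != bits[(k + 1) % P] for k in range(P))  # circular
        if transitions <= 2:                     # "u2": at most two 0/1 or 1/0 changes
            table[code] = nxt
            nxt += 1
    table[table < 0] = nxt                       # single non-uniform bin
    return table

def lbp_u2_hist(img):
    """Normalized 59-bin LBP^{u2}_{8,1} histogram of a gray-value image."""
    img = img.astype(np.int32)
    H, W = img.shape
    center = img[1:-1, 1:-1]
    # eight neighbours in circular order on the integer grid (no interpolation)
    offsets = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]
    code = np.zeros_like(center)
    for p, (dy, dx) in enumerate(offsets):
        g_p = img[1 + dy:H - 1 + dy, 1 + dx:W - 1 + dx]
        code += (g_p >= center).astype(np.int32) << p   # s(g_p - g_c) * 2^p
    bins = uniform_lookup()[code]
    hist = np.bincount(bins.ravel(), minlength=59).astype(float)
    return hist / hist.sum()

hist = lbp_u2_hist(np.full((10, 10), 7, dtype=np.uint8))  # flat toy image
```

On a flat image every code is 11111111, a uniform pattern, so the histogram is concentrated in a single bin.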
To enhance the discriminability of the LBP descriptor, we divide the medical image into equal-sized regions. A normalized 59-bin histogram is built for each region, and these normalized histograms are concatenated into a single histogram that serves as the feature vector of the image. We employ four image division schemes, 3×3, 4×4, 5×5 and 6×6, which give us four different LBP feature vectors for each image. Figure 2 demonstrates a 4×4 image division scenario and the concatenated LBP histogram extracted from the image.

SIFT. Following the bag-of-features paradigm [44] and a dense sampling strategy, we build SIFT histograms to represent medical images. We begin by extracting 128-D SIFT vectors [15] from patches densely sampled from the image. The sampling spacing and patch size are set to 8 pixels and 16×16 pixels, respectively. The next step is to build a visual word dictionary over all the SIFT vectors extracted from the entire data set. Following the settings in [12], we employ K-means clustering with Euclidean distance to learn the dictionary. To reduce computing time, we limit K-means to 100 iterations. The visual word dictionary size is set to 500. We finally acquire a SIFT visual word dictionary $D_{sift} \in \mathbb{R}^{128 \times 500}$, where each column vector $d_i$ is a centroid SIFT vector generated by K-means clustering. We call column vector $d_i$ a “visual word”.
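The dictionary-learning step can be sketched with plain Lloyd's K-means in NumPy, using Euclidean distance and an iteration cap as described above. Initializing the centroids with randomly chosen descriptors is an assumption (the text does not state the initialization), and `kmeans_dictionary` is a hypothetical name.

```python
import numpy as np

def kmeans_dictionary(descs, n_words=500, n_iter=100, seed=0):
    """Visual-word dictionary via Lloyd's K-means with Euclidean distance.

    descs: (n, dim) descriptors pooled over the data set.
    Returns D with shape (dim, n_words); each column is a centroid ("visual word").
    """
    descs = np.asarray(descs, dtype=float)
    rng = np.random.default_rng(seed)
    centers = descs[rng.choice(len(descs), n_words, replace=False)]
    for _ in range(n_iter):                          # at most n_iter iterations
        # squared distances ||a||^2 - 2 a.b + ||b||^2, shape (n, n_words)
        d2 = ((descs ** 2).sum(1)[:, None] - 2 * descs @ centers.T
              + (centers ** 2).sum(1)[None, :])
        labels = d2.argmin(axis=1)
        new = centers.copy()
        for k in range(n_words):
            members = descs[labels == k]
            if len(members):                         # empty clusters keep their centroid
                new[k] = members.mean(axis=0)
        if np.allclose(new, centers):                # assignments have stabilized
            break
        centers = new
    return centers.T

toy = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
D = kmeans_dictionary(toy, n_words=2)
```

With 500 words and 128-D SIFT vectors, `D` would have the shape $128 \times 500$ of $D_{sift}$.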
Via dense sampling, each image or image region $x$ is represented as a collection of SIFT vectors $S = \{s_i\}_{i=1}^{P}$, where $P$ is the total number of patches sampled from $x$. For each SIFT vector $s_i$, there exists a unique visual word $d_j \in D_{sift}$ that is nearest to $s_i$. We assign the visual word index $j$ to $s_i$, so that each patch sampled from $x$ has a unique index in the visual word dictionary $D_{sift}$. Consequently, $x$ can be denoted as a collection of visual word indexes. Accumulating these indexes into a 500-bin histogram, we obtain a SIFT histogram $h_{sift} \in \mathbb{R}^{500 \times 1}$ to represent $x$. To enhance the discriminability of the SIFT descriptor, we also divide each image equally into 1×1, 2×2, 3×3 and 4×4 regions. From each region, a 500-bin SIFT histogram is generated. By concatenating and normalizing these SIFT histograms, we obtain a long vector to represent the whole image. Thus, for each image, we obtain four different SIFT features. Figure 3 illustrates a 2×2 division scenario and the corresponding normalized concatenated SIFT histogram.
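The quantization and regional pooling just described can be sketched as below. The sketch assumes that each densely sampled patch belongs to the region containing its center and that the concatenated histogram is L1-normalized; the text says only that the histograms are concatenated and normalized, so both choices, like the function name, are hypothetical.

```python
import numpy as np

def bow_grid_histogram(descs, centers_yx, D, img_shape, n):
    """Concatenated n x n regional bag-of-visual-words histogram, L1-normalized.

    descs: (P, dim) descriptors; centers_yx: (P, 2) integer patch centers (row, col);
    D: (dim, n_words) dictionary whose columns are visual words.
    """
    descs = np.asarray(descs, dtype=float)
    centers_yx = np.asarray(centers_yx, dtype=int)
    n_words = D.shape[1]
    # index of the nearest visual word (Euclidean) for every descriptor
    d2 = (descs ** 2).sum(1)[:, None] - 2 * descs @ D + (D ** 2).sum(0)[None, :]
    words = d2.argmin(axis=1)
    H, W = img_shape
    # which of the n x n equal regions each patch center falls into
    ry = np.minimum(centers_yx[:, 0] * n // H, n - 1)
    rx = np.minimum(centers_yx[:, 1] * n // W, n - 1)
    hist = np.zeros((n * n, n_words))
    np.add.at(hist, (ry * n + rx, words), 1.0)   # one histogram per region
    hist = hist.ravel()                          # concatenate region histograms
    return hist / max(hist.sum(), 1.0)

# toy check: 2-word dictionary, 2x2 grid on a 10x10 image
h = bow_grid_histogram([[1, 0], [0, 1], [1, 0]], [[0, 0], [0, 0], [9, 9]],
                       np.eye(2), (10, 10), 2)
```

For a 2×2 division with a 500-word dictionary, the output has 4 × 500 = 2,000 entries.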
Pixel intensity. The raw intensity value of each image pixel is also used as a content descriptor. We follow the bag-of-features paradigm and dense sampling strategy to generate intensity histograms from medical images. The parameter settings of dense sampling and visual word dictionary building are the same as those detailed in Section 3.2.2. We use a 16×16 image patch $p$ to densely sample each image region and obtain an intensity vector $v \in \mathbb{R}^{256 \times 1}$ by concatenating the intensity values of the 256 pixels contained in $p$. We also use K-means clustering to generate an intensity visual word dictionary $D_{intensity} \in \mathbb{R}^{256 \times 500}$. Via histogram accumulation, we finally obtain a 500-bin intensity histogram to represent the sampled image or image region.
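Dense sampling of raw intensity patches can be sketched as follows, assuming that the sampling spacing of 8 means a stride of 8 pixels between 16×16 windows; `dense_intensity_patches` is a hypothetical name. The resulting 256-D vectors and patch centers can then go through the same dictionary learning and regional histogram steps as the SIFT vectors.

```python
import numpy as np

def dense_intensity_patches(img, patch=16, step=8):
    """Flatten every patch x patch window sampled every `step` pixels.

    Returns (vectors, centers): vectors has shape (n_patches, patch * patch),
    centers holds the (row, col) center of each window.
    """
    H, W = img.shape
    vecs, centers = [], []
    for y in range(0, H - patch + 1, step):
        for x in range(0, W - patch + 1, step):
            vecs.append(img[y:y + patch, x:x + patch].astype(float).ravel())
            centers.append((y + patch // 2, x + patch // 2))
    return np.array(vecs), np.array(centers)

img = np.arange(32 * 32).reshape(32, 32)   # toy 32x32 "image"
V, C = dense_intensity_patches(img)         # 3 x 3 window positions
```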
To enhance the discriminability of the intensity descriptor, we also divide each image equally into 1×1, 2×2, 3×3 and 4×4 regions. An intensity histogram is built from each region, and the histogram of the whole image is obtained by concatenating the region intensity histograms into a long vector. Thus, for each image, we obtain four intensity features. Figure 4 shows the 1×1 division scenario and the corresponding normalized intensity histogram.

Results
This section evaluates the performance of MLLE in comparison with LLE, MSE [31], LE and PCA in the context of CBMIR. We organize this section as follows. In Section 4.1, we evaluate the performance of these dimensionality reduction methods using mean average precision (MAP). In Section 4.2, we use receiver operating characteristic (ROC) curve analysis to evaluate these methods. Section 4.3 reports evaluation results in terms of sensitivity, specificity, and diagnostic odds ratio (DOR). In Section 4.4, we explore the effects of parameters d, K and r on the performance of MLLE. In Section 4.5, we discuss how the performance of MLLE changes when different distance metrics are used to compute the K nearest neighbors of each local patch (Section 2.1). In Section 4.6, we conduct experiments to demonstrate that there is no need to perform feature selection before MLLE.

Performance Evaluations Using MAP
In this section, we use MAP to compare the effectiveness of the proposed MLLE for CBMIR with that of LLE, MSE, PCA and LE.
The experiment is conducted as follows. First, the low-dimensional subspaces of the medical image data set are learned by MLLE, PCA, LLE, MSE and LE, respectively. MLLE simultaneously learns a low-dimensional subspace from the twelve features. For the three single-view methods (PCA, LLE and LE), low-dimensional subspaces are learned from the concatenation of all twelve features. Second, based on the learned subspaces, a “leave one out” image retrieval procedure is conducted on the data set. In detail, we choose one image as the query sample for each category, and all other images in the data set are ranked according to their Euclidean distance to the query image in the low-dimensional subspace. For each query, the top N images are returned. We use MAP to evaluate the performance of each dimension reduction method: MAP is the mean of the average precisions (AP) over the different categories, where each AP is computed over the top N ranked images. Figure 5 shows the MAP values obtained with the different dimension reduction methods for N = 1 and N = 5, 10, …, 50. The result shows that our MLLE method achieves the best performance. We attribute this to the alternating optimization and global coordinate alignment techniques, which exploit the complementary properties of different features and simultaneously learn a unified low-dimensional subspace from them.
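The leave-one-out retrieval and MAP computation can be sketched as below. The text does not spell out the AP formula, so this sketch assumes a common definition: precision@k averaged over the positions k of relevant images within the top N. Embedding rows are samples, so an MLLE output $Y$ ($d \times N$) would be passed transposed; the function names are hypothetical.

```python
import numpy as np

def average_precision_at_n(ranked_labels, query_label, N):
    """AP over the top-N ranked items (assumed definition, see lead-in)."""
    rel = (np.asarray(ranked_labels[:N]) == query_label).astype(float)
    if rel.sum() == 0:
        return 0.0
    prec_at_k = np.cumsum(rel) / np.arange(1, len(rel) + 1)
    return float((prec_at_k * rel).sum() / rel.sum())

def map_at_n(Y, labels, query_idx, N):
    """Y: (n_samples, d) embedding; labels: (n_samples,); one query per category."""
    aps = []
    for q in query_idx:
        dist = np.linalg.norm(Y - Y[q], axis=1)   # Euclidean distance in the subspace
        dist[q] = np.inf                          # leave the query itself out
        order = np.argsort(dist)
        aps.append(average_precision_at_n(labels[order], labels[q], N))
    return float(np.mean(aps))

ap = average_precision_at_n([1, 0, 1], 1, 3)      # relevant at ranks 1 and 3
Y_toy = np.array([[0.0], [0.1], [5.0], [5.1]])
m = map_at_n(Y_toy, np.array([0, 0, 1, 1]), [0, 2], N=1)
```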
To illustrate the effectiveness of MLLE for CBMIR, we show one of the retrieval results in Figure 6. The figure contains six rows of medical images: from top to bottom, the first row is the query image, and the other five rows are the retrieval results of MLLE, LLE, PCA, LE and MSE, respectively. Each row of retrieval results consists of the top ten images retrieved from the data set. The figure shows that MLLE has the best retrieval performance. In (B), all of the images retrieved by MLLE come from the same category as the query image. In (C), images 2, 4, 6 and 10 retrieved by LLE are not similar to the query image. In (D), images 2, 3, 4, 5, 7 and 10 are erroneously retrieved by PCA. In (E), images 2, 3, 4 and 7 are incorrectly retrieved by LE. Moreover, images 1, 8 and 10 in (F) are erroneously retrieved by MSE.

Performance Evaluations Using ROC
In this section, we compare performance of MLLE with that of LLE, MSE, LE and PCA using ROC curve analysis.
ROC curve analysis is a popular method for measuring the ability of a computer program to classify a given medical image as “positive” or “negative”, which is the typical two-class classification problem. Currently, there are no practical methods for assessing the performance of an N-class classification task with ROC curves [45]. We therefore treat CBMIR as a binary classification problem: for a given query image, the task of CBMIR is to classify the samples in the image data set into two classes, i.e., a positive class (relevant to the query image) and a negative class (irrelevant to the query image). The IRMA medical image data set used in our experiments contains 57 categories, so we evaluate the retrieval performance of MLLE and the other dimensionality reduction methods category by category.
Experiment #2 also has two steps. It differs from experiment #1 only in that k-nearest neighbors (KNN) is used as the classifier in step 2. In detail, for a given test sample s, a “leave one out” retrieval is performed: all other images in the data set are sorted according to their Euclidean distance to s. The probability that s is positive is defined as $p = pos/k$, where $pos$ is the number of positive samples among the k nearest neighbors of s. In our experiments, we set k to 15.
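The KNN scoring of experiment #2, together with an empirical area under the ROC curve, can be sketched as follows. The $A_z$ values reported here were computed with MedCalc; the rank-based (Mann-Whitney) estimate below is a stand-in and need not match MedCalc's output exactly. The function names are hypothetical.

```python
import numpy as np

def knn_positive_scores(Y, is_pos, test_idx, k=15):
    """p = pos / k, pos = positives among the k nearest neighbors (leave one out).

    Y: (n_samples, d) embedding; is_pos: (n_samples,) 0/1 array.
    """
    scores = []
    for s in test_idx:
        dist = np.linalg.norm(Y - Y[s], axis=1)
        dist[s] = np.inf                       # exclude the test sample itself
        nn = np.argsort(dist)[:k]
        scores.append(is_pos[nn].sum() / k)
    return np.array(scores)

def empirical_auc(scores, labels):
    """P(random positive outscores random negative); ties count one half."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

Y_toy = np.array([[0.0], [0.1], [5.0], [5.1]])
p = knn_positive_scores(Y_toy, np.array([1, 1, 0, 0]), [0, 2], k=1)
auc = empirical_auc(np.array([0.9, 0.8, 0.3, 0.1]), np.array([1, 1, 0, 0]))
```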
We conduct ROC curve analysis on IRMA categories 14, 16, 20, 21, 22 and 49, respectively. For each IRMA category, the number of samples in the positive and negative test sets is detailed in Table 2. Figure 7 shows the ROC curves obtained via experiment #1, in which SVM is used as the classifier, and Table 3 details the corresponding areas under the ROC curve ($A_z$ values). Figure 8 presents the ROC curves obtained via experiment #2, in which KNN (k = 15) is used as the classifier, and Table 4 reports the corresponding $A_z$ values. These results are obtained using the statistical software MedCalc® 12.7.0.
The computed $A_z$ values for distinguishing positive from negative images in IRMA categories 16, 20, 21, 22 and 49 are also detailed in Table 3, and the corresponding ROC curves are compared in Figure 7 (B), (C), (D), (E), and (F), respectively. The results indicate that MLLE achieves better performance than traditional dimensionality reduction methods. The same conclusion can be drawn from Figure 8 and Table 4.
Based on these two experiments, we conclude that the poor performance of PCA in experiment #1 is caused by the subsequent classifier, SVM. We discuss the reason below.
PCA maximizes the mutual information between the original high-dimensional Gaussian-distributed samples and the projected low-dimensional samples; it does not explore the geometric structure of the data. Therefore, in the very low-dimensional subspace projected by PCA, when there is a great imbalance between the positive and negative sets (as shown in Table 2), it is hard for SVM to find the optimal hyperplane separating the positive set from the negative set.
Unlike PCA, MLLE, LLE, MSE and LE are manifold learning-based dimensionality reduction methods. These methods explore the geometric structure among samples in the high-dimensional data set and preserve this structure in the low-dimensional subspace. Because the geometric structures of the positive and negative sets are each preserved in the low-dimensional data, SVM can still find a hyperplane that separates the positive set from the negative set despite the great imbalance between them. Consequently, the performance of MLLE, LLE, MSE and LE is not greatly affected by the choice of classifier, as Table 3 and Table 4 show.

Performance Evaluations Using Sensitivity, Specificity, and DOR
In this section, we compare the performance of MLLE with that of LLE, MSE, LE and PCA using sensitivity, specificity, and DOR.
Sensitivity, specificity and DOR are indicators used to compare the performance of competing diagnostic tests, which separate subjects with a target disorder from subjects without it [48]. A diagnostic test is a typical two-class classification problem: for a given subject, the aim of the test is to determine whether the subject is “positive” (with the target disorder) or “negative” (without the target disorder).
Following this, we design experiments to evaluate the diagnostic performance of MLLE, LLE, MSE, LE and PCA on each category of the IRMA data set. In detail, for each IRMA category, we treat its images as the positive test set. Meanwhile, a negative test set containing the same number of samples as the positive test set is constructed by randomly selecting images from other categories. Based on the positive and negative test sets, a diagnostic test procedure is performed on the low-dimensional embeddings obtained by MLLE, LLE, MSE, LE and PCA, respectively. Specifically, for each test image, all other images in the IRMA data set are ranked according to their L2 distances to the test image. The diagnostic result of the test image is then determined by the following criterion: if more than half of the top k ranked images are positive, the test image is positive; otherwise, it is negative. In our experiments, we set k to 15. As with the ROC curve analysis, we present experimental results obtained on four IRMA categories here; results on other categories can be obtained with the same method. Table 5, Table 6, Table 7 and Table 8 compare the diagnostic performance of MLLE, LLE, MSE, LE and PCA in terms of sensitivity, specificity and DOR on IRMA categories 1, 4, 7 and 25, respectively. We obtain these results using Meta-Disc 1.4 [49]. As shown in Table 5, the estimated sensitivity, specificity and DOR of the proposed MLLE in identifying images from category 1 are 0.92 (2129/(2129+185)), 0.99 (2285/(2285+29)) and 906.76 ((2129/185)/(29/2285)), respectively. This means that, for MLLE, the odds of a positive result among medical images from IRMA category 1 are 906.76 times higher than the odds of a positive result among medical images from other IRMA categories. The DORs for LLE, MSE, LE and PCA can be calculated in the same way.
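The sensitivity, specificity and DOR quoted for MLLE on category 1 follow from the four cell counts by simple arithmetic, as this short check shows. The counts are taken from the text; labelling them as true/false positives and negatives is inferred from the formulas in the parentheses, and `diagnostic_metrics` is a hypothetical helper.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and diagnostic odds ratio from a 2x2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    dor = (tp / fn) / (fp / tn)        # odds of positivity in diseased vs. non-diseased
    return sensitivity, specificity, dor

sens, spec, dor = diagnostic_metrics(tp=2129, fn=185, tn=2285, fp=29)
print(round(sens, 2), round(spec, 2), round(dor, 2))  # 0.92 0.99 906.76
```

Equivalently, $\mathrm{DOR} = (2129 \times 2285)/(185 \times 29) = 4{,}864{,}765 / 5{,}365 \approx 906.76$.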
From Table 5, we can conclude that MLLE has the highest DOR in discriminating IRMA category 1 compared with LLE, MSE, LE and PCA (906.76 vs. 773.44, 523.27, 335.32 and 675.00, respectively). The same conclusion can be drawn from Table 6, Table 7 and Table 8.
Evaluation results in terms of sensitivity, specificity, and DOR show that the proposed MLLE outperforms traditional dimensionality reduction methods.

Effects of Parameters
In this section, we analyze the effects of parameters on MLLE performance. These parameters are d, the dimension of the learned embedding; K, the number of nearest neighbors in each local patch; and r, the scaling factor for the weight of each feature.
Effects of parameter d. Figure 9 shows the MAP values when the proposed MLLE is evaluated with different dimensionalities d. In these experiments, parameters K and r are the same as in the previous experiments. These experiments show that the proposed MLLE outperforms the existing dimension reduction methods. Moreover, we detail the MAP values of MLLE in Table 9, which shows that MLLE achieves the best performance with d set to 200.
Effects of parameter K. Figure 10 shows the MAP values when the proposed MLLE is evaluated with different K.
Effects of parameter r. Figure 11 shows the MAP values when MLLE is evaluated with different r. In these experiments, parameters d and K are fixed to 200 and 140, respectively. In Figure 11 (A), r is varied from 2 to 10 in steps of 1; MLLE achieves the best performance when r is approximately 3. In Figure 11 (B), r is varied from 1.1 to 3 in steps of 0.1; MLLE achieves the best performance when r is set to 2.5.

Performance Comparison of MLLE with Different Distance Metrics
Geodesic distance, L1 distance (also called city block or Manhattan distance) and L2 distance are well-known distance metrics used in dimensionality reduction. In Section 2.1, we use the L2 distance to find the K nearest neighbors of each medical image. In this section, we evaluate the performance of MLLE with different distance metrics, i.e., geodesic, L1, and L2 distance.
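To make the three metrics concrete, here is a sketch of K-nearest-neighbor selection under each. The text does not say how geodesic distance is computed; this sketch assumes the Isomap-style approximation, i.e., shortest paths (Floyd-Warshall) on an L2 kNN graph with a hypothetical number `n_graph` of graph neighbors per node. All function names are hypothetical.

```python
import numpy as np

def pairwise(X, metric):
    """Pairwise L1 or L2 distances between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    if metric == "l1":
        return np.abs(diff).sum(-1)
    if metric == "l2":
        return np.sqrt((diff ** 2).sum(-1))
    raise ValueError(f"unknown metric: {metric}")

def geodesic(X, n_graph=5):
    """Geodesic distances approximated by shortest paths on an L2 kNN graph."""
    D = pairwise(X, "l2")
    N = len(X)
    Dn = D.copy()
    np.fill_diagonal(Dn, np.inf)
    nn = np.argsort(Dn, axis=1)[:, :n_graph]
    G = np.full((N, N), np.inf)
    np.fill_diagonal(G, 0.0)
    for i in range(N):
        G[i, nn[i]] = D[i, nn[i]]
        G[nn[i], i] = D[i, nn[i]]                # keep the graph undirected
    for k in range(N):                           # Floyd-Warshall relaxation
        G = np.minimum(G, G[:, k:k + 1] + G[k:k + 1, :])
    return G

def k_nearest(X, K, metric="l2", n_graph=5):
    """Indices of the K nearest neighbors of each row under the chosen metric."""
    D = geodesic(X, n_graph) if metric == "geodesic" else pairwise(X, metric)
    D = D.copy()
    np.fill_diagonal(D, np.inf)                  # a point is not its own neighbor
    return np.argsort(D, axis=1)[:, :K]

X_line = np.arange(6.0)[:, None]                 # six points on a line
```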
Following the setup of experiment #1 detailed in Section 4.2, we use ROC curve analysis to evaluate the effect of these three distance metrics on MLLE performance. Figure 12 shows the ROC curves of MLLE with different distance metrics on IRMA categories 2, 3, 19, 31, 51 and 52, respectively. Table 10 gives the number of images in the positive and negative test sets for each category, and Table 11 details the corresponding A_Z values.
As shown in Table 11, for IRMA category 2, the A_Z value for discriminating between 1,103 positive images and 9,799 negative images is 0.979 ± 0.0028 when using L2 distance. With L1 distance and geodesic distance, the computed A_Z values are 0.945 ± 0.0035 and 0.592 ± 0.0083, respectively. Figure 12 (A) compares the ROC curves for these three sets of performance data. L2 distance therefore achieves the highest A_Z value in the detection of IRMA category 2, compared with L1 distance and geodesic distance (0.979 ± 0.0028 vs. 0.945 ± 0.0035 and 0.592 ± 0.0083, respectively).
The A_Z values for discriminating between positive and negative images from IRMA categories 3, 19, 31, 51 and 52 are also detailed in Table 11. The corresponding ROC curves are shown in Figure 12 (B), (C), (D), (E) and (F), respectively. From these results we conclude that, among the three metrics tested, L2 distance is the best choice for MLLE to construct local patches. The same conclusion can be drawn from experimental results obtained on other IRMA categories.
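A_Z conventionally denotes the area under a binormal ROC curve fitted to the data, for example with ROCKIT-style software. The sketch below swaps in a simpler nonparametric estimate: the Mann-Whitney area under the empirical ROC curve, with the Hanley-McNeil approximation for its standard error. It will not reproduce the fitted values in Table 11 exactly. The function names and scores are hypothetical.

```python
import math


def empirical_auc(pos_scores, neg_scores):
    """Nonparametric ROC area (Mann-Whitney U divided by n_pos * n_neg).

    Higher scores should indicate the positive class; tied scores receive average
    ranks, so a positive/negative tie counts as half a correct ordering.
    """
    labeled = sorted([(s, 1) for s in pos_scores] + [(s, 0) for s in neg_scores])
    rank_sum_pos = 0.0
    i = 0
    while i < len(labeled):
        j = i
        while j < len(labeled) and labeled[j][0] == labeled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(label for _, label in labeled[i:j])
        i = j
    n_pos, n_neg = len(pos_scores), len(neg_scores)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Approximate standard error of an ROC area (Hanley and McNeil, 1982)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    var = (auc * (1.0 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(var)


pos = [0.9, 0.8, 0.75, 0.4]          # hypothetical scores of positive images
neg = [0.7, 0.3, 0.2, 0.1, 0.05]     # hypothetical scores of negative images
auc = empirical_auc(pos, neg)        # 19 of 20 pairs ordered correctly -> 0.95
print(f"A = {auc:.3f} +/- {hanley_mcneil_se(auc, len(pos), len(neg)):.3f}")
```

The rank-based formulation runs in O(n log n), which matters for test sets of roughly 11,000 images like those used here.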

Selecting Features before MLLE
In this section, we conduct experiments to demonstrate that there is no need to perform feature selection before MLLE.
The proposed MLLE has the merit of learning a low-dimensional embedding from multiple features simultaneously, exploiting the differing importance of individual features. Specifically, MLLE assumes that each feature contributes differently to the final low-dimensional embedding, even when that feature does not differ significantly between medical images. We clarify this point with the two experiments described below.
Experiment #3 includes the following three steps.
Step 1: For each medical image x_i ∈ X, we divide its twelve features into three groups of four features each.
Step 2: For each group, we employ Laplacian score feature selection (LPFS) [50], an unsupervised feature selection method, to determine the importance of each feature. Specifically, within each feature group we concatenate the four feature vectors into one long vector. Each image x_i is therefore represented by three long feature vectors, and the medical image data set X has three different views. On each view, we use LPFS to determine the importance of each feature entry and select the m most important entries. Finally, X is represented by three dimension-reduced views.
Step 3: We use MLLE to learn the low-dimensional embedding Y from the three views obtained in Step 2, with the dimension of Y set to 200. We denote this method as lpfs-MLLE (Laplacian score feature selection-based MLLE).
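The Laplacian score used in Step 2 can be sketched as follows. The sketch uses the standard definition: build a heat-kernel k-NN graph, remove each feature's degree-weighted mean, and score the feature by fᵀLf / fᵀDf, where a lower score means the feature better preserves local structure. The neighborhood size k, the default kernel width t and the function names are our assumptions, not settings reported for lpfs-MLLE.

```python
import numpy as np


def laplacian_score(X, k=5, t=None):
    """Laplacian score of every column of X (rows = images, columns = feature entries).

    Lower scores mark features that better respect the local neighborhood structure.
    """
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, np.inf)              # exclude each sample from its own neighbors
    nn = np.argsort(d2, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    cols = nn.ravel()
    knn_d2 = d2[rows, cols]
    if t is None:                             # assumed kernel width: mean k-NN squared distance
        t = knn_d2.mean() if knn_d2.mean() > 0 else 1.0
    S = np.zeros((n, n))
    S[rows, cols] = np.exp(-knn_d2 / t)       # heat-kernel affinities
    S = np.maximum(S, S.T)                    # symmetric k-NN graph
    deg = S.sum(axis=1)                       # diagonal of the degree matrix D
    L = np.diag(deg) - S                      # graph Laplacian L = D - S
    F = X - (deg @ X) / deg.sum()             # remove the D-weighted mean of each feature
    num = np.einsum("if,ij,jf->f", F, L, F)   # f^T L f for every feature f
    den = np.einsum("if,i,if->f", F, deg, F)  # f^T D f for every feature f
    # constant features carry no information, so they get the worst possible score
    return np.where(den > 1e-12, num / np.maximum(den, 1e-12), np.inf)


def top_m_features(X, m, **kwargs):
    """Column indices of the m features with the lowest Laplacian score."""
    return np.argsort(laplacian_score(X, **kwargs))[:m]
```

A feature scores well when it varies little between neighboring images (small numerator) but varies substantially across the whole data set (large denominator).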
Experiment #4 includes the following three steps.
Step 1: This step is the same as Step 1 of experiment #3.
Step 2: For each feature group, we employ multi-cluster feature selection (MCFS) [51], a manifold learning-based feature selection method, to select the features that best preserve the multi-cluster structure of the medical image data set X. Specifically, each medical image x_i has three different feature vectors, one per view. On each view, we use MCFS to select the m feature entries that best preserve the multi-cluster structure of that view; in our experiment, m is set to 500. X is then represented by three dimension-reduced views.
Step 3: This step is the same as Step 3 of experiment #3. We denote this method as mcfs-MLLE (multi-cluster feature selection-based MLLE).
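Experiments #3 and #4 share the same view construction and differ only in how feature entries are scored. The sketch below shows that shared pipeline with a pluggable scoring function. It does not implement MCFS. The placeholder criterion (column variance), the feature names, the dimensions and m = 200 are all hypothetical.

```python
import numpy as np


def build_views(features, groups):
    """Concatenate per-image feature blocks into one matrix per view.

    features: dict mapping a feature name to an (n_images x dim) array.
    groups:   list of views, each a list of feature names to concatenate.
    """
    return [np.hstack([features[name] for name in group]) for group in groups]


def reduce_views(views, m, score_fn, higher_is_better=True):
    """Keep the m best-scoring columns of each view, preserving column order."""
    reduced = []
    for V in views:
        scores = np.asarray(score_fn(V), dtype=float)
        order = np.argsort(-scores) if higher_is_better else np.argsort(scores)
        reduced.append(V[:, np.sort(order[:m])])
    return reduced


# Hypothetical data: twelve feature blocks f0..f11 with made-up dimensions
rng = np.random.default_rng(0)
features = {f"f{i}": rng.normal(size=(50, 100 + 10 * i)) for i in range(12)}
groups = [[f"f{i}" for i in range(g * 4, g * 4 + 4)] for g in range(3)]
views = build_views(features, groups)
# Placeholder criterion; an MCFS or Laplacian-score ranking would be plugged in here
reduced = reduce_views(views, m=200, score_fn=lambda V: V.var(axis=0))
print([V.shape for V in views], [V.shape for V in reduced])
```

With the Laplacian score, pass higher_is_better=False, because lower scores indicate more important features.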
We compare the performance of MLLE, mcfs-MLLE and lpfs-MLLE using ROC curve analysis, with the same experimental setup as experiment #1 detailed in Section 4.2. Figure 13 shows the ROC curves of these methods on IRMA categories 14, 27, 30, 43, 45 and 57, respectively. For each category, the number of samples in the positive and negative test sets is detailed in Table 12, and Table 13 shows the corresponding A_Z values. According to Table 13, the A_Z value for discriminating between 151 positive images from IRMA category 14 and 10,751 negative images from other categories is 0.990 ± 0.0035 when using MLLE without feature selection. When MCFS and LPFS are applied before MLLE in the same experiment, the computed A_Z values are 0.848 ± 0.0185 and 0.869 ± 0.0137, respectively. Figure 13 (A) compares the ROC curves for these three sets of performance data. Applying MLLE directly thus yields the highest A_Z value in the discrimination of IRMA category 14, compared with applying the feature selection methods MCFS and LPFS before MLLE (0.990 ± 0.0035 vs. 0.848 ± 0.0185 and 0.869 ± 0.0137, respectively).
The A_Z values for discriminating between positive and negative images from IRMA categories 27, 30, 43, 45 and 57 are also detailed in Table 13. The corresponding ROC curves are compared in Figure 13 (B), (C), (D), (E) and (F), respectively. Based on these results, we conclude that although applying feature selection before MLLE reduces the number of features and saves computing time, the learned embedding is worse than the one obtained by applying MLLE directly.
Note that in this manuscript we extract twelve different features from each medical image in order to demonstrate the ability of MLLE to explore the complementary properties of different features. In practice, there is a trade-off between the number of visual features and retrieval performance: within an acceptable range of retrieval performance, users can extract fewer visual features to save computing time. In fact, three to six visual features are sufficient to achieve acceptable retrieval performance.

Discussion and Conclusion
This section is organized as follows. Section 5.1 gives a statistical analysis of the experimental results presented above. Section 5.2 then discusses why MLLE outperforms existing dimensionality reduction methods. Finally, Section 5.3 concludes our work.

Statistical Analysis
In this paper, we use MAP, DOR and ROC analysis as criteria to evaluate the performance of different methods. These criteria reflect the effectiveness of the methods from different aspects. MAP measures retrieval performance on the IRMA test set, while DOR and ROC measure the ability of each method to distinguish between different types of medical images. Under all of these criteria, MLLE achieves the best results.
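As a reminder of how MAP summarizes retrieval quality, the sketch below computes average precision for each query and averages it over queries. It assumes that a retrieved image is relevant when its IRMA category matches the query's. The paper does not restate this rule here, so treat it as an assumption. The function names and labels are hypothetical.

```python
def average_precision(relevance, n_relevant=None):
    """AP of one ranked result list; relevance[i] is True if the (i+1)-th result is relevant.

    n_relevant is the number of relevant images in the database; when omitted, the
    ranked list is assumed to contain all of them.
    """
    hits, precision_sum = 0, 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant position
    total = n_relevant if n_relevant is not None else hits
    return precision_sum / total if total else 0.0


def mean_average_precision(query_labels, ranked_labels):
    """MAP over queries; a result is relevant if it shares the query's category label."""
    aps = [average_precision([lbl == q for lbl in ranking])
           for q, ranking in zip(query_labels, ranked_labels)]
    return sum(aps) / len(aps)


# Hypothetical IRMA category labels of the images returned for two queries
print(mean_average_precision([14, 27], [[14, 3, 14, 14], [5, 27, 27, 9]]))
```

AP rewards placing relevant images near the top of the list. In the example, the first query scores 29/36 and the second 7/12, giving MAP = 25/36.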
Statistically, we use the F1-measure to assess the reliability of the different criteria. Comparing the F1-measure with the other performance criteria, i.e., MAP and ROC, leads to the same conclusion.
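For completeness, the F1-measure is the harmonic mean of precision and recall. The short sketch below states the definition. The counts are hypothetical, and how the paper derived the counts for each criterion is not recoverable from this section.

```python
def f1_measure(tp, fp, fn):
    """Harmonic mean of precision and recall; equivalently 2TP / (2TP + FP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical counts for one category
print(round(f1_measure(tp=90, fp=50, fn=10), 3))  # prints 0.75
```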

Discussion
There are two reasons why MLLE learns a low-dimensional embedding from multiview features more effectively than existing dimensionality reduction methods. First, MLLE learns the low-dimensional embedding from all views simultaneously. Unlike other methods, it uses LLE to obtain an optimal low-dimensional subspace for each view and a global coordinate alignment technique to unify all learned subspaces into a global one. Second, MLLE explores the complementary properties of different features. Unlike traditional dimensionality reduction methods that treat each feature equally, MLLE assigns a different weight to each feature and obtains these weights with an alternating optimization technique. Experimental results in the context of CBMIR demonstrate the effectiveness of MLLE compared with existing methods.
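The two ingredients above can be combined in a compact sketch. It assumes an MSE-style objective, min Σ_v α_v^r tr(Yᵀ M_v Y) subject to YᵀY = I and Σ_v α_v = 1, where M_v = (I − W_v)ᵀ(I − W_v) is the LLE alignment matrix of view v. Under that objective, the algorithm alternates between two steps. With the weights fixed, it solves for Y by eigen-decomposition. With Y fixed, it updates the weights in closed form. This is an illustrative reconstruction under that assumption, not the authors' implementation. The function names, the regularizer and the default values are hypothetical.

```python
import numpy as np


def lle_alignment_matrix(X, k, reg=1e-3):
    """LLE alignment matrix M = (I - W)^T (I - W) for one view (rows of X are samples)."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)                   # a sample is not its own neighbor
    nn = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nn[i]] - X[i]                        # local patch centered on x_i
        C = Z @ Z.T                                # k x k local Gram matrix
        C += (reg * np.trace(C) + 1e-12) * np.eye(k)  # regularize (needed when k > dim)
        w = np.linalg.solve(C, np.ones(k))
        W[i, nn[i]] = w / w.sum()                  # reconstruction weights sum to one
    I_W = np.eye(n) - W
    return I_W.T @ I_W


def mlle_sketch(views, d, k, r=2.5, n_iter=10):
    """Alternate between the embedding Y (weights fixed) and the weights (Y fixed); r > 1."""
    Ms = [lle_alignment_matrix(V, k) for V in views]
    alpha = np.full(len(Ms), 1.0 / len(Ms))
    for _ in range(n_iter):
        M = sum(a ** r * Mv for a, Mv in zip(alpha, Ms))  # global coordinate alignment
        _, vecs = np.linalg.eigh(M)                       # eigenvalues in ascending order
        Y = vecs[:, 1:d + 1]                              # skip the constant bottom eigenvector
        costs = np.array([np.trace(Y.T @ Mv @ Y) for Mv in Ms])
        scores = (1.0 / np.maximum(costs, 1e-12)) ** (1.0 / (r - 1.0))
        alpha = scores / scores.sum()                     # closed-form weight update
    return Y, alpha
```

Because each row of W_v sums to one, the all-ones vector lies in the null space of every M_v. Discarding the bottom eigenvector therefore removes the trivial constant embedding, as in single-view LLE.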

Conclusion
With the rapid proliferation of radiological images in the medical domain, retrieving medical images from large archives to aid radiological image interpretation has become one of the most active research fields. CBMIR uses multiple visual features to represent images, which brings the problem of the “curse of dimensionality”. Although conventional dimensionality reduction methods can be employed to tackle this problem, they ignore the fact that different visual features have distinct physical meanings. The challenge is therefore to discover the complementary properties of multiple visual features when representing medical images. In this paper, we propose a new multiview learning method called MLLE to address this problem. Experimental evaluations on a subset of the IRMA medical image data set demonstrate that the new method effectively represents medical images in a low-dimensional subspace and thus significantly improves the performance of CBMIR.
We found that the local patch size K, the subspace dimension d and the scaling factor r all affect the effectiveness of MLLE. Figure 10, Table 9 and Figure 11 show that optimal values of these parameters exist for MLLE on the IRMA medical image data set. In the future, we will evaluate MLLE on other medical image test beds to further explore the effects of these parameters.