Content-based medical image retrieval continues to gain attention for its potential to assist radiological image interpretation and decision making. Many approaches have been proposed to improve the performance of medical image retrieval systems, among which visual features such as SIFT, LBP, and intensity histograms play a critical role. Typically, these features are concatenated into a long vector to represent medical images, and thus traditional dimension reduction techniques such as locally linear embedding (LLE), principal component analysis (PCA), or Laplacian eigenmaps (LE) can be employed to mitigate the “curse of dimensionality”. Though these approaches show promising performance for medical image retrieval, the feature-concatenating method ignores the fact that different features have distinct physical meanings. In this paper, we propose a new method called multiview locally linear embedding (MLLE) for medical image retrieval. Following the patch alignment framework, MLLE preserves the geometric structure of the local patch in each feature space according to the LLE criterion. To explore complementary properties among a range of features, MLLE assigns different weights to local patches from different feature spaces. Finally, MLLE employs global coordinate alignment and alternating optimization techniques to learn a smooth low-dimensional embedding from different features. To justify the effectiveness of MLLE for medical image retrieval, we compare it with conventional spectral embedding methods. We conduct experiments on a subset of the IRMA medical image data set. Evaluation results show that MLLE outperforms state-of-the-art dimension reduction methods.
Citation: Shen H, Tao D, Ma D (2013) Multiview Locally Linear Embedding for Effective Medical Image Retrieval. PLoS ONE 8(12): e82409. https://doi.org/10.1371/journal.pone.0082409
Editor: Yong Fan, Institution of Automation, CAS, China
Received: May 25, 2013; Accepted: October 23, 2013; Published: December 13, 2013
Copyright: © 2013 Shen et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work has been supported in part by the National Natural Science Foundation of China (NSFC) under Grant No.61003017, the Project of Key Laboratory of Software Development Environment under Grant No. SKLSDE-2013ZX-30, and the ARC FT project under Grant No. FT130101457. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Medical image interpretation is a procedure which requires high accuracy. Currently, radiologists rely on both knowledge and heuristics to accomplish this procedure. As a result of perceptual, training and fatigue differences among radiologists, there are variations in the interpretations made by different personnel of the same image. Moreover, with the wide deployment of modern medical imaging devices in hospitals, large numbers of medical images are produced every day, placing an additional burden on radiologists. On one hand, they have to render accurate diagnoses for each image; on the other, they have to interpret large amounts of medical images within a limited time frame.
To tackle these challenges, content-based image retrieval (CBIR) has been introduced into the radiology interpretation routine in recent years. CBIR employs visual descriptors to represent medical images, and machine learning techniques to retrieve and compare those images. For a given query image, content-based medical image retrieval (CBMIR) aims to find its visually similar and semantically relevant counterparts by retrieving samples from a given medical image archive. In the context of CBMIR, a medical image is usually represented as a vector of feature attributes. The similarity between two medical images is then measured by the distance between the corresponding feature vectors. This helps radiologists to efficiently extract similar cases from a variety of archives, thus providing assistance with medical image interpretation and decision making.
Similar to CBIR, CBMIR faces two basic issues: using discriminative visual features to represent medical images and assessing similarity among images represented in the feature space. This paper focuses on the former issue.
In contrast to images in other domains, most medical images are gray-value images, and fine details are emphasized in the image content. A single feature therefore cannot cover all the details of a medical image. Following this observation, many visual features have been simultaneously employed to reveal different aspects of medical images. Dimitrovski et al. extracted pixel value, local binary pattern (LBP), edge histogram descriptor and SIFT features to represent medical images. Lehman et al. proposed an automatic medical image categorization framework that combines four types of texture feature and one intensity feature to represent medical images. Chen et al. extracted six textural features to represent ultrasound images. Wu et al. recently extracted texture features and morphological features to classify ultrasound breast tumor images. Moreover, Dy et al. proposed a lung image retrieval method based on 110 features. Detailed reviews of the features used in the medical domain are available in the literature. In this paper, we call these visual features “multiview features”.
With the increasing use of multiview features, medical CBIR also suffers from the “curse of dimensionality”. To reduce the dimension of feature vectors, one conventional solution is to concatenate these feature vectors into a long vector, and then use traditional dimension reduction techniques, e.g., locally linear embedding (LLE), principal component analysis (PCA) or Laplacian eigenmaps (LE), to project the concatenated vector to a low-dimensional subspace. Huang et al. built a computer-aided breast cancer diagnosis system using PCA to project original high-dimensional textural features into a low-dimensional feature space. Zhang et al. proposed a brain midsagittal plane image recognition system that employed PCA to perform dimensionality reduction. Chen et al. used PCA to reduce the dimension of textural feature vectors extracted from breast ultrasound images. Cho et al. employed linear discriminant analysis (LDA) to perform feature selection. Although these solutions have achieved promising results, there is room for performance enhancement, because these methods coarsely perform dimension reduction on all features and ignore the fact that different features have distinct physical meanings. Recently, Bagci et al. proposed a hybrid scheme for chest radiological image feature selection. They first selected features which could coarsely identify abnormal imaging patterns. Then they refined the selected features to enhance prediction accuracy.
To solve these problems, and considering the complementary properties of various features, we formulate a new method called multiview locally linear embedding (MLLE) to represent medical images in a low-dimensional feature space that is simultaneously learned from multiview features. MLLE is proposed in a context where multiview learning has received intensive attention in the machine learning community. The key idea of MLLE comes from the patch alignment framework and LLE. The patch alignment framework unifies spectral analysis-based dimensionality reduction algorithms in two stages: local patch construction and whole alignment. LLE constructs a local patch in the low-dimensional space by preserving the patch’s linear reconstruction relation in the original space, whereas MLLE constructs local patches from each feature space by preserving the geometric structure of patches according to the LLE criterion. To explore the complementary properties among multiview features, MLLE assigns various weights to patches from different feature spaces. Finally, MLLE uses global coordinate alignment and alternating optimization techniques to learn a smooth low-dimensional embedding from the multiview features. We present a detailed evaluation of MLLE for CBMIR to demonstrate its effectiveness. Compared to conventional dimension reduction methods, e.g., PCA, LLE and LE, MLLE differs in the following ways: 1) MLLE uses LLE to obtain the optimal low-dimensional subspace on each view, and 2) MLLE learns a smooth low-dimensional global subspace by exploring complementary properties of each view.
To evaluate the performance of the proposed MLLE, we conduct experiments on an IRMA-coded medical image data set. The IRMA medical image coding system is a mono-hierarchical multi-axial classification standard for medical images. The system classifies medical images along four orthogonal axes: imaging modality, body orientation, examined body region and examined biological system. The IRMA coding system is applicable to medical images obtained by different medical imaging techniques, including computed tomography (CT), digital radiography (DR), magnetic resonance imaging (MRI), and positron emission tomography (PET).
Multiview Locally Linear Embedding
In this section, we detail the presented dimension reduction algorithm, i.e., MLLE. To better present MLLE, we first explain the mathematical notation used in this paper.
In the rest of this paper, the medical image data set is a collection of medical images, and each image has a corresponding point in the low-dimensional embedding of the data set. For each medical image we extract several different low-level features to represent its visual content. We then say that the image has that many views, with one feature vector per view. Accordingly, the data set has the same number of views, with one feature matrix per view. On each view, the local patch of an image contains the image itself and its nearest neighbors on that view. A detailed description of the notation is listed in Table 1.
Local Patch Construction
Local patch construction on single view.
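Given a point, its local patch is defined as the point together with its k nearest neighbors. LLE preserves the local geometry of the patch by assuming that the point is reconstructed from its neighbors by linear coefficients (Equation 1), where the coefficients are determined by minimizing the reconstruction error (Equation 2).
Solving Equation (2) gives the coefficients in closed form in terms of a local Gram matrix built from the differences between the point and its neighbors.
When the number of neighbors exceeds the input dimension, or when data points are not in general position, the local Gram matrix is singular or nearly singular. To avoid this, a regularization term is added to the Gram matrix according to the criterion in Equation (3), in which a small constant scales the trace of the Gram matrix. The remaining quantities are defined in Equations (4) and (5).
Similar to Equation (2), the low-dimensional local patch is determined by minimizing the reconstruction error in Equation (6), whose coefficient matrix encodes the local geometric information of the patch.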
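The single-view step above can be illustrated with a minimal sketch of the standard LLE weight computation. It is not taken from the article: the function name `lle_weights`, the argument names, and the default `delta2` are assumptions, and the regularizer uses the common convention of adding a trace-scaled term to the diagonal of the local Gram matrix.

```python
import numpy as np

def lle_weights(x, neighbors, delta2=1e-3):
    """Reconstruction weights of point x from its k neighbors.

    x:         (D,) feature vector of the point.
    neighbors: (k, D) array, one neighbor per row.
    delta2:    small regularization constant (assumed value, must be << 1).
    """
    k = neighbors.shape[0]
    diffs = x[None, :] - neighbors            # differences to each neighbor, (k, D)
    gram = diffs @ diffs.T                    # local Gram matrix, (k, k)
    # Regularize so the system stays solvable when k > D or points are degenerate.
    gram = gram + (delta2 / k) * np.trace(gram) * np.eye(k)
    w = np.linalg.solve(gram, np.ones(k))     # solve gram @ w = 1
    return w / w.sum()                        # enforce the sum-to-one constraint
```

For a point lying midway between two neighbors, this sketch returns equal weights of 0.5, which is the reconstruction one would expect from the LLE criterion.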
Local patch construction on multiple views.
Each sample has different local patches on different views. These multiview local patches correspond to different low-dimensional local patches. Because different features make different contributions to the representation of the medical image in the final low-dimensional embedding, these low-dimensional local patches have different degrees of importance in determining that embedding. Considering this, we obtain the objective function of multiview local patch optimization for each patch (Equation 7), in which each weight entry reflects the contribution of the corresponding view to learning the final embedding.
Global Coordinate Alignment
For each local patch there is a low-dimensional embedding. By assuming that the coordinates of every low-dimensional patch are selected from the final embedding, we can obtain the final low-dimensional embedding. The selection matrix that picks the coordinates of a patch out of the final embedding is defined in Equation (8).
Considering the whole medical image data set, we can unify all local patches into the final embedding to obtain the global coordinate alignment given in Equations (9) and (10) (the detailed derivation is given in Appendix S1).
The solution for the view weights in (11) assigns a weight of one to the view with the smallest alignment cost and zero to all other views. This means that only one view is selected to learn the low-dimensional embedding while the other views are discarded. To avoid this, we replace each view weight with a power of itself, using an exponent greater than one. The optimization problem in (11) then reduces to the problem in Equation (12).
There are two unknowns in (12): the low-dimensional embedding and the view weights. Here we employ the alternating optimization technique to solve the optimization problem. The alternating optimization procedure includes the following two steps.
Step 1: Fix the view weights to update the embedding.
Because the alignment matrix of each view is symmetric and positive semidefinite (the proof is given in Appendix S2), their non-negatively weighted sum is also symmetric and positive semidefinite. Hence, the optimization problem in (13) can be solved by eigenvalue decomposition of this combined matrix. The globally optimal solution consists of the eigenvectors associated with its smallest eigenvalues.
Step 2: Fix the embedding to update the view weights.
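The two alternating steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the per-view alignment matrices are taken as given inputs, the names `alternating_optimization`, `L_views`, `alpha` and `r` are hypothetical, and the closed-form weight update in Step 2 is the one commonly used in multiview spectral embedding, assumed here because the corresponding equation is not available in this text. The default `r=2.5` mirrors the value the experiments fix for the weight scaling parameter.

```python
import numpy as np

def alternating_optimization(L_views, d, r=2.5, n_iter=20):
    """Alternate between the embedding Y (d x N) and the view weights alpha.

    L_views: list of (N, N) symmetric positive semidefinite alignment matrices.
    d:       target dimension of the embedding.
    r:       exponent > 1 applied to the view weights (assumed parameter name).
    """
    m = len(L_views)
    alpha = np.full(m, 1.0 / m)                      # start from uniform weights
    for _ in range(n_iter):
        # Step 1: fix alpha, update Y via eigen-decomposition of the weighted sum.
        L = sum(a ** r * Lv for a, Lv in zip(alpha, L_views))
        _, eigvecs = np.linalg.eigh(L)               # eigenvalues in ascending order
        Y = eigvecs[:, :d].T                         # eigenvectors of the d smallest
        # Step 2: fix Y, update alpha (assumed closed form, see lead-in).
        costs = np.array([np.trace(Y @ Lv @ Y.T) for Lv in L_views])
        costs = np.maximum(costs, 1e-12)             # guard against division by zero
        inv = (1.0 / costs) ** (1.0 / (r - 1.0))
        alpha = inv / inv.sum()                      # weights sum to one
    return Y, alpha
```

Views whose alignment cost is small receive larger weights, but with an exponent greater than one no single view takes all the weight.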
In this section, we describe the experimental setup for the performance evaluation of MLLE for CBMIR. We organize this section as follows. In Section 3.1, we introduce our test bed, i.e., the IRMA medical image data set. In Section 3.2, medical image feature extraction is detailed.
IRMA Medical Image Data Set
The IRMA medical image data set is a popular benchmark database used to evaluate CBMIR. The new version of the IRMA medical image data set contains 193 categories with a total of 12,677 fully annotated gray value radiographs in a training set. These images have 8 bits per pixel. The images are categorized according to a mono-hierarchical multi-axial classification standard called the IRMA coding system. The coding system classifies a medical image along four orthogonal axes: imaging modality, body orientation, body region examined and biological system examined. We select the first 57 categories containing a total of 10,902 images from the training set for our experiment. Figure 1 shows examples of the images used in our evaluation.
All images in the IRMA data set are gray value images, which encode ample texture information. We use three image descriptors, i.e., local binary patterns (LBP), SIFT, and pixel intensity, to extract the visual features from each medical image.
To enhance the discriminability of the image descriptors, we divide the medical image into equal regions for each descriptor. In each region, an image descriptor is employed to extract the visual features. Finally, we concatenate all the feature vectors obtained from the regions into a single long vector to represent the image. For each image descriptor, we employ four different image division schemes. There are three image descriptors, and each image descriptor generates four different features. Thus, we obtain twelve different features from each image. The feature extraction procedures of each image descriptor are detailed below.
LBP is a powerful descriptor for analyzing two-dimensional textures. LBP has the advantages of robustness to gray-scale variations and low computational complexity. This makes LBP appropriate for gray-scale medical image analysis.
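The region-division scheme described above can be sketched as follows. The helper names `grid_feature` and `intensity_histogram` are hypothetical, and the 16-bin gray-level histogram is only a stand-in for the actual descriptors (LBP, SIFT and intensity histograms) used in the paper.

```python
import numpy as np

def intensity_histogram(region, bins=16):
    """Stand-in descriptor: normalized gray-level histogram of one region."""
    hist, _ = np.histogram(region, bins=bins, range=(0, 256))
    return hist / max(hist.sum(), 1)

def grid_feature(img, grid, descriptor=intensity_histogram):
    """Split img into grid x grid regions and concatenate per-region descriptors."""
    img = np.asarray(img)
    row_edges = np.linspace(0, img.shape[0], grid + 1).astype(int)
    col_edges = np.linspace(0, img.shape[1], grid + 1).astype(int)
    parts = []
    for r0, r1 in zip(row_edges[:-1], row_edges[1:]):
        for c0, c1 in zip(col_edges[:-1], col_edges[1:]):
            parts.append(descriptor(img[r0:r1, c0:c1]))
    return np.concatenate(parts)
```

Calling `grid_feature` once per division scheme yields one feature vector per scheme, so a single descriptor contributes four views of the same image.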
Formally, for a center pixel at $(x_c, y_c)$ with gray value $g_c$, there are $P$ equally spaced pixels contained in the circularly symmetric neighbor set of $(x_c, y_c)$ with radius $R$. LBP assigns a unique value to the center pixel $(x_c, y_c)$:
$$\mathrm{LBP}_{P,R}(x_c, y_c) = \sum_{p=0}^{P-1} s(g_p - g_c)\,2^p \tag{17}$$
where $g_p$ is the gray value of the $p$-th neighbor of the center pixel $(x_c, y_c)$, and
$$s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases} \tag{18}$$
Observing LBP values in their binary circular representation, we find that a vast majority of LBP binary codes have a “uniform” appearance. Here, uniform appearance indicates that there is a limited number of transitions in the LBP code. These uniform binary patterns capture discriminant local features, e.g., edges, corners, and spots, of the image content. After computing LBP values over an examined image or image region pixel by pixel, these LBP values are accumulated into a discrete occurrence histogram. Uniform patterns with different LBP values are accumulated into separate bins of the histogram, while the remaining “non-uniform” patterns are accumulated into one additional bin.
In our implementation, we use the $\mathrm{LBP}_{8,1}^{u2}$ operator to compute the LBP values over a medical image, pixel by pixel. The subscript (8, 1) means that eight neighbors, equally spaced on the circle with radius one, are utilized to determine the LBP value of the center pixel. Clearly, the resulting LBP value can be encoded as an eight-bit binary string. The superscript u2 represents a uniform pattern which has at most two transitions. For an eight-bit LBP binary string, there are 58 u2 patterns. Hence the resulting discrete occurrence histogram has 59 bins.
To enhance the discriminability of the LBP descriptor, we divide the medical image into equal regions. A normalized 59-bin histogram is built for each region. Finally, these normalized histograms are concatenated into a single histogram as a feature vector of the image. We employ four image division schemes: 3×3, 4×4, 5×5 and 6×6, giving us four different LBP feature vectors for each image. Figure 2 demonstrates a 4×4 image division scenario and the concatenated LBP histogram extracted from the image.
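The count of 58 uniform patterns and the resulting 59-bin histogram can be checked with a short sketch. It makes one simplifying assumption that differs from the circular operator described above: the eight neighbors are taken from the square 3×3 neighborhood without interpolation. The names `transitions`, `UNIFORM`, `BIN_OF` and `lbp_u2_histogram` are illustrative only.

```python
import numpy as np

def transitions(code, bits=8):
    """Number of 0/1 transitions in the circular binary form of an LBP code."""
    rotated = (code >> 1) | ((code & 1) << (bits - 1))   # rotate right by one bit
    return bin(code ^ rotated).count("1")

# Codes with at most two transitions are "uniform"; each gets its own bin.
UNIFORM = [c for c in range(256) if transitions(c) <= 2]
BIN_OF = {c: i for i, c in enumerate(UNIFORM)}

def lbp_u2_histogram(img):
    """Normalized 59-bin histogram of uniform LBP codes (3x3 square neighborhood)."""
    img = np.asarray(img, dtype=np.int32)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for p, (dy, dx) in enumerate(offsets):
        neigh = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neigh >= center).astype(np.int32) << p  # threshold against center
    hist = np.zeros(len(UNIFORM) + 1)
    for c in codes.ravel():
        hist[BIN_OF.get(int(c), len(UNIFORM))] += 1       # last bin: non-uniform
    return hist / hist.sum()
```

Enumerating all 256 eight-bit codes in this way gives 58 uniform patterns, so the histogram has 58 bins plus one shared bin for all non-uniform codes.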
Following the bag of features paradigm and dense sampling strategy, we build SIFT histograms to represent medical images. We begin by extracting 128-D SIFT vectors from patches densely sampled from the image. The sampling spacing and patch size are set to 8 and 16×16, respectively.
The next step is to build a visual word dictionary over all the SIFT vectors extracted from the entire data set. Following previously published settings, we employ K-means clustering to learn the dictionary. Euclidean distance is used to measure the distance between two SIFT vectors. To reduce computing time, we set the number of iterations to 100. The visual word dictionary size is set to 500. We finally acquire a SIFT visual word dictionary in which each column vector is a centroid SIFT vector generated by K-means clustering. We call each column vector a “visual word”.
Via dense sampling, each sampled image region is represented as a collection of SIFT vectors, one per patch, where P is the total number of patches sampled from the region. For each SIFT vector there exists a unique visual word which is nearest to it. We assign the index of that visual word to the SIFT vector, so that each patch sampled from the region has a unique index in the visual word dictionary. Consequently, the region can be denoted as a collection of visual word indexes. Accumulating these indexes into a 500-bin histogram, we obtain a SIFT histogram to represent the region.
To enhance the discriminability of the SIFT descriptor, we also divide each image equally into 1×1, 2×2, 3×3 and 4×4 regions, respectively. From each region, a 500-bin SIFT histogram is generated. By concatenating and normalizing these SIFT histograms, we obtain a long vector to represent the whole image. Thus for each image, we obtain four different SIFT features. Figure 3 illustrates a 2×2 division scenario and the corresponding normalized concatenated SIFT histogram.
The raw intensity value of each image pixel is also utilized as a content descriptor to represent the image. We follow the bag of features paradigm and dense sampling strategy to generate intensity histograms from medical images. The parameter settings of dense sampling and visual word dictionary building are the same as those detailed in Section 3.2.2. We utilize a 16×16 image patch to densely sample each image region. Therefore, we obtain an intensity vector by concatenating the intensity values of the 256 pixels contained in each patch. We also utilize K-means clustering to generate an intensity visual word dictionary. Via histogram accumulation, we finally obtain a 500-bin intensity histogram to represent the sampled image or image region.
To enhance the discriminability of the intensity descriptor, we also divide each image equally into 1×1, 2×2, 3×3 and 4×4 regions, respectively. An intensity histogram is built from each region. Finally, a histogram of the whole image is obtained by concatenating the region intensity histograms into a long vector. Thus for each image, we finally obtain four intensity feature vectors. Figure 4 shows the 1×1 division scenario and the corresponding normalized intensity histogram.
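The assignment of descriptors to visual words and the histogram accumulation can be sketched as below. The function name `bow_histogram` is hypothetical, and the dictionary is stored with one centroid per row here (the text describes centroids as columns). The sketch returns raw counts, since normalization is applied after concatenation.

```python
import numpy as np

def bow_histogram(descriptors, dictionary):
    """Count how many descriptors fall on each visual word.

    descriptors: (P, D) array, one descriptor per sampled patch.
    dictionary:  (K, D) array, one K-means centroid (visual word) per row.
    Returns a K-bin histogram of visual word counts.
    """
    # Squared Euclidean distance from every descriptor to every visual word.
    d2 = ((descriptors[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)                 # index of the nearest visual word
    return np.bincount(words, minlength=dictionary.shape[0]).astype(float)
```

With a 500-word dictionary this produces the 500-bin histogram described above; the same routine applies to intensity vectors once an intensity dictionary has been learned.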
This section evaluates the performance of MLLE compared with that of LLE, MSE, LE and PCA, in the context of CBMIR. We organize this section as follows. In Section 4.1, we evaluate the performance of these dimensionality reduction methods using mean average precision (MAP). In Section 4.2, we use receiver operating characteristic (ROC) curve analysis to evaluate the performance of these methods. Section 4.3 reports evaluation results in terms of sensitivity, specificity, and diagnostic odds ratio (DOR). In Section 4.4, we explore the effects of the parameters of MLLE on its performance. In Section 4.5, we discuss the performance discrepancy of MLLE when different distance metrics are used to compute the K-nearest neighbors contained in the local patch, which is detailed in Section 2.1. In Section 4.6, we conduct experiments to demonstrate that there is no need to perform feature selection before MLLE.
In the following experiments, the subspace dimension in MLLE, LLE, MSE, PCA and LE is set to 200. The number of nearest neighbors in MLLE, LLE, MSE and LE is fixed to 140. The weight scaling parameter for MLLE and MSE is fixed to 2.5. The procedure for finding the optimal parameter values for MLLE is detailed in Section 4.4.
Performance Evaluations Using MAP
In this section, we use MAP to compare the effectiveness of the proposed MLLE for CBMIR with that of LLE, MSE, PCA and LE.
The experiment is conducted as follows. First, the low-dimensional subspaces of the medical image data set are learned by MLLE, PCA, LLE, MSE and LE, respectively. MLLE simultaneously learns a low-dimensional subspace from twelve features. For the other three methods, low-dimensional subspaces are learned by concatenating all twelve features. Second, based on the learned subspaces, a “leave one out” image retrieval procedure is conducted in the data set. In detail, we choose one image as the query sample for each category; all other images from the data set are ranked according to their Euclidean distance to the query image measured in the low-dimensional subspace. For each query, the top N images are returned. MAP is the mean of the average precisions (AP) over the different categories, and each AP is computed over the ranked top N images.
Figure 5 shows the MAP values when different dimension reduction methods are used. The number of top N images starts at one, and then increases from five to fifty in steps of five. The result shows that our MLLE method achieves the best performance. The advantage of MLLE stems from the alternating optimization and global coordinate alignment techniques, which exploit the complementary properties of different features and simultaneously learn a unified low-dimensional subspace from these features.
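The MAP computation can be sketched as below. The text does not spell out the AP formula, so this sketch assumes one common definition: the mean of the precision values at the ranks where a relevant image appears, restricted to the top N results. The function names are hypothetical.

```python
import numpy as np

def average_precision_at_n(relevant, n):
    """AP over the top-n ranked results.

    relevant: sequence of 0/1 flags in ranked order (1 = same category as query).
    """
    rel = np.asarray(relevant[:n], dtype=float)
    if rel.sum() == 0:
        return 0.0                                    # no relevant image retrieved
    precision_at_k = np.cumsum(rel) / np.arange(1, len(rel) + 1)
    return float((precision_at_k * rel).sum() / rel.sum())

def mean_average_precision(relevance_lists, n):
    """Mean of the per-query APs, one relevance list per query."""
    return float(np.mean([average_precision_at_n(r, n) for r in relevance_lists]))
```

For a ranking whose relevance flags are 1, 0, 1, 0, the precisions at the two hits are 1 and 2/3, giving an AP of 5/6 under this definition.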
To detail the effectiveness of MLLE for CBMIR, we illustrate one of the retrieval results in Figure 6. As shown in the figure, there are six rows of medical images. From top to bottom, the first row is the query image, while the other five rows are the retrieval results of MLLE, LLE, PCA, LE and MSE, respectively. Each row of retrieval results consists of the top ten images retrieved from the data set. From the figure, we can see MLLE has the best retrieval performance. In (B), all of the images retrieved by MLLE come from the same category as the query image. In (C), images 2, 4, 6, 10 retrieved by LLE are not similar to the query image. In (D), images 2, 3, 4, 5, 7, 10 are erroneously retrieved by PCA. In (E), images 2, 3, 4, 7 are incorrectly retrieved by LE. Moreover, images 1, 8, 10 in (F) are also erroneously retrieved by MSE.
Performance Evaluations Using ROC
In this section, we compare performance of MLLE with that of LLE, MSE, LE and PCA using ROC curve analysis.
ROC curve analysis is a popular mechanism to measure the ability of a computer program to determine whether a given medical image is “positive” or “negative”, which is a typical “two-class” classification problem. Currently, there are no practical methods to assess the performance of an “N-class” classification task using ROC curves. We therefore treat CBMIR as a binary classification problem: for a given query image, the task of CBMIR is to classify the samples contained in the image data set into two classes, i.e., a positive class (relevant to the query image) and a negative class (irrelevant to the query image). The IRMA medical image data set used in our experiments contains 57 categories. We evaluate the retrieval performance of MLLE and the other dimensionality reduction methods on each IRMA category and plot the corresponding ROC curves. Because of space limitations, we present here the ROC curves obtained on six IRMA categories. ROC curves on other categories can be obtained with the method detailed as follows.
We conduct two experiments, namely experiment #1 and experiment #2, to perform ROC curve analysis.
Experiment #1 includes the following two steps. Step 1: We project the high-dimensional medical image samples to a 200-dimensional subspace using MLLE, LLE, MSE, LE and PCA, respectively. In detail, for MLLE, we simultaneously learn the 200-dimensional subspace from 12 visual features. For LLE, MSE, LE and PCA, we first combine the 12 visual features into a 31,474-dimensional vector. Then we utilize these methods to project the high-dimensional data set to 200-dimensional samples. Step 2: We employ binary support vector machines (SVM) as the classifier to determine the probability that a given image is positive, based on the learned dimensionality-reduced data set. In detail, we use LIBSVM to solve the binary SVM classifier. For each IRMA category, a five-fold cross-validation scheme is employed to train the binary SVM classifier. Then we treat all images within the current IRMA category as positive test examples for ROC curve analysis. Meanwhile, we utilize images within other categories as negative test examples.
Experiment #2 also has two steps. This experiment differs from experiment #1 only in that k-nearest neighbors (KNN) is used as the classifier in step 2. In detail, for a given test sample, a “leave one out” retrieval is performed. All other images contained in the data set are sorted according to their Euclidean distance to the test sample. The probability that the test sample is positive is defined in terms of the number of positive samples among its k nearest neighbors. In our experiment, we set k to 15.
We conduct ROC curve analysis on IRMA categories 14, 16, 20, 21, 22 and 49, respectively. For each IRMA category, the number of samples contained in the positive and negative test sets is detailed in Table 2. Figure 7 shows the ROC curves obtained via experiment #1, in which we use SVM as the classifier. Table 3 details the corresponding area under the ROC curve (AZ value). Figure 8 presents the ROC curves obtained via experiment #2, in which we use KNN (K = 15) as the classifier. Table 4 reports the corresponding AZ values. These results are obtained using the statistical software MedCalc® 12.7.0.
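The KNN scoring in experiment #2 can be sketched as follows. The exact probability formula is not available in this text, so the sketch assumes the natural choice: the fraction of positive samples among the k nearest neighbors. The function and argument names are hypothetical.

```python
import numpy as np

def knn_positive_probability(embedding, labels, query_index, positive_label, k=15):
    """Leave-one-out KNN score for one test sample.

    embedding: (N, d) array of low-dimensional samples, one per row.
    labels:    (N,) array of category labels.
    Returns the fraction of positives among the k nearest neighbors (assumed form).
    """
    dists = np.linalg.norm(embedding - embedding[query_index], axis=1)
    dists[query_index] = np.inf               # leave the query itself out
    nearest = np.argsort(dists)[:k]           # indices of the k closest samples
    return float(np.mean(labels[nearest] == positive_label))
```

Sweeping a threshold over these scores for all test samples of a category yields the points of one ROC curve.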
Figure 7. ROC curves obtained via experiment #1 (the classifier is SVM). Panels (A) to (F): IRMA categories 14, 16, 20, 21, 22 and 49.
Figure 8. ROC curves obtained via experiment #2. Panels (A) to (F): IRMA categories 14, 16, 20, 21, 22 and 49.
Table 3 reports the AZ value obtained with the proposed MLLE for discriminating between the 151 positive images from IRMA category 14 and the 10,751 negative images from other categories, together with the AZ values obtained when LLE, MSE, LE and PCA are applied to the same task. Figure 7 (A) compares the ROC curves for these five sets of performance data. Table 3 demonstrates that MLLE yields the highest AZ value in the discrimination of IRMA category 14 compared to LLE, MSE, LE and PCA.
The computed AZ values for discriminating between positive and negative images for IRMA categories 16, 20, 21, 22 and 49 are also detailed in Table 3. The corresponding comparison of ROC curves is shown in Figure 7 (B), (C), (D), (E), and (F), respectively. The results indicate that MLLE achieves better performance than traditional dimensionality reduction methods. We can draw the same conclusion by analyzing Figure 8 and Table 4.
Another phenomenon that should be noted is the significant performance difference of PCA between experiment #1 and experiment #2. From Figure 7 and Table 3, we can see that PCA achieves poor performance on IRMA categories 14, 16, 20, 21, 22 and 49, and that its performance is worse than that of the other methods. In contrast, Figure 8 and Table 4 demonstrate that PCA gains a significant performance improvement on the same categories, where it performs better than MSE and LE.
Based on these two experiments, we conclude that the poor performance of PCA in experiment #1 is caused by the subsequent classifier, SVM. We discuss the reason as follows.
PCA maximizes the mutual information between original high dimensional Gaussian distributed samples and projected low-dimensional samples. It does not explore the geometric structure of the data. Therefore, in the very low dimensional subspace projected by PCA, when there exists great imbalance between positive and negative set (as shown in Table 2), it is hard for SVM to find the optimal hyperplane to separate positive set from negative set.
Unlike PCA, MLLE, LLE, MSE and LE are manifold learning based dimensionality reduction methods. These methods explore the geometric structure among samples in the high-dimensional data set, and preserve that structure in the low-dimensional subspace. Therefore, even though a great imbalance exists between the positive and negative sets, it is possible for SVM to find the optimal hyperplane to separate the positive set from the negative set, because the geometric structure of each set is preserved in the low-dimensional data set. As a result, the performance of MLLE, LLE, MSE and LE is not greatly affected by the choice of classifier. This conclusion can be drawn from Table 3 and Table 4.
Performance Evaluations Using Sensitivity, Specificity, and DOR
In this section, we compare performance of MLLE with that of LLE, MSE, LE and PCA using sensitivity, specificity, and DOR.
Sensitivity, specificity and DOR are indicators used to compare the performance of competing diagnostic tests, which are used to separate subjects with a target disorder from subjects without it. A diagnostic test is a typical “two-class” classification problem: for a given subject, the aim of the diagnostic test is to determine whether the subject is “positive” (with a target disorder) or “negative” (without a target disorder).
Following this, we design experiments to evaluate the diagnostic performance of MLLE, LLE, MSE, LE and PCA on each category of the IRMA data set, respectively. In detail, we treat each IRMA category as a positive test set. Meanwhile, a negative test set containing the same number of samples as the positive test set is constructed by randomly selecting images from other categories. Based on the positive and negative test sets, a diagnostic test procedure is performed on the low-dimensional embedding obtained by MLLE, LLE, MSE, LE and PCA, respectively. Specifically, for each test image, all other images contained in the IRMA data set are ranked according to their L2 distances to the test image. The diagnostic result of the test image is then determined by the following criterion: if more than half of the top k ranked images are positive, the test image is positive; otherwise, the test image is negative. In our experiments, we set k to 15.
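The decision rule and the three indicators can be sketched as follows. The formulas are the standard definitions computed from a 2×2 confusion table; the function names are hypothetical. Statistical packages may apply continuity corrections when a cell is zero, so their reported DOR values can differ from this plain ratio.

```python
def majority_vote(neighbor_is_positive):
    """Diagnose as positive if more than half of the top-k neighbors are positive."""
    return sum(neighbor_is_positive) > len(neighbor_is_positive) / 2

def diagnostic_indicators(tp, fn, tn, fp):
    """Sensitivity, specificity and diagnostic odds ratio from a 2x2 table.

    tp, fn: positive test images diagnosed as positive / negative.
    tn, fp: negative test images diagnosed as negative / positive.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # DOR = odds of a positive result among positives / odds among negatives.
    dor = (tp * tn) / (fp * fn) if fp * fn > 0 else float("inf")
    return sensitivity, specificity, dor
```

For example, 90 true positives, 10 false negatives, 80 true negatives and 20 false positives give a sensitivity of 0.9, a specificity of 0.8 and a DOR of 36.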
Similar to ROC curve analysis, we present here experimental results obtained on four IRMA categories. Experimental results on other categories can also be obtained with the method detailed above.
Table 5, Table 6, Table 7 and Table 8 compare the diagnostic performance of MLLE, LLE, MSE, LE and PCA in terms of sensitivity, specificity and DOR, obtained on IRMA categories 1, 4, 7 and 25, respectively. We obtained these results using Meta-DiSc 1.4. As shown in Table 5, the estimated sensitivity, specificity and DOR for the proposed MLLE in determining images from category 1 are 0.92, 0.99 and 906.76, respectively. This means that, for MLLE, the odds of positivity among medical images from IRMA category 1 are 906.76 times higher than the odds of positivity among medical images from other IRMA categories. The DORs for LLE, MSE, LE and PCA are calculated in the same way. From Table 5 we can conclude that MLLE has the highest DOR in the discrimination of IRMA category 1 compared to LLE, MSE, LE and PCA (906.76 vs. 773.44, 523.27, 335.32 and 675.00, respectively). The same conclusion can be drawn from Table 6, Table 7 and Table 8.
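For readers unfamiliar with these indicators, the following sketch shows how sensitivity, specificity and DOR follow from the four cells of a two-class confusion matrix. The counts used here are hypothetical and are not taken from Tables 5 to 8; the function name is also an assumption. Because the reported sensitivity and specificity are rounded to two decimals, plugging them into the formula need not reproduce the reported DOR exactly.

```python
def diagnostic_indicators(tp, fp, fn, tn):
    """Sensitivity, specificity and diagnostic odds ratio from raw counts.

    tp, fn : positive test images labelled positive / negative
    tn, fp : negative test images labelled negative / positive
    Note: the DOR is undefined when fp or fn is zero; this sketch does
    not apply any continuity correction.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Odds of a positive result among true positives divided by the odds
    # of a positive result among true negatives: (tp/fn) / (fp/tn)
    dor = (tp * tn) / (fp * fn)
    return sensitivity, specificity, dor

# Hypothetical counts for 100 positive and 100 negative test images
sens, spec, dor = diagnostic_indicators(tp=90, fp=5, fn=10, tn=95)
print(sens, spec, dor)
```

Equivalently, DOR = [sensitivity / (1 - sensitivity)] / [(1 - specificity) / specificity], which makes clear that a higher DOR indicates better discrimination.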
Evaluation results in terms of sensitivity, specificity, and DOR show that the proposed MLLE yields significantly higher performance than traditional dimensionality reduction methods.
Effects of Parameters
In this section, we analyze the effects of parameters on MLLE performance. These parameters are the dimension d of the learned embedding, the number K of nearest neighbors contained in a local patch, and the scaling factor r for the weight of each feature.
Effects of parameter d.
Figure 9 shows the MAP values when the proposed MLLE is evaluated using different dimensionalities d. In these experiments, parameters K and r are the same as those in the former experiment. From these experiments, we can see that the proposed MLLE outperforms existing dimension reduction methods. Moreover, we detail the MAP values of MLLE in Table 9. From the table we can see that MLLE achieves the best performance with d set to 200.
Figure 9. MAP values of the compared algorithms under different embedding dimensionalities d, panels (A) to (I).
Effects of parameter K.
Figure 10 shows the MAP values when the proposed MLLE is evaluated with different K. In the experiments, parameters d and r are fixed to 200 and 2, respectively. The results show that MLLE achieves the best performance with K set to 140.
Effects of parameter r.
Figure 11 shows the MAP values when MLLE is evaluated with different r. In the experiments, parameters d and K are fixed to 200 and 140, respectively. In Figure 11 (A), r is varied from 2 to 10 with step 1. From the figure, we can see that MLLE achieves the best performance when r is approximately 3. In Figure 11 (B), r is varied from 1.1 to 3 with step 0.1. It can be seen that MLLE achieves the best performance when r is set to 2.5.
Performance Comparison of MLLE with Different Distance Metrics
Geodesic distance, L1 distance (which is also named city block distance or Manhattan distance) and L2 distance are well-known distance metrics used in the field of dimensionality reduction. In Section 2.1, we use L2 distance to find K-nearest neighbors of each medical image. In this section, we perform experiments to evaluate performance of MLLE with different distance metrics, i.e., geodesic, L1, and L2 distance.
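To make the three metrics concrete, the sketch below computes L1 and L2 distance matrices and approximates geodesic distance by shortest paths over a nearest-neighbor graph, in the style of Isomap. How geodesic distance was actually computed in these experiments is not stated here, so the graph-based approximation, the function names and the semicircle toy data are all assumptions made for illustration.

```python
import numpy as np

def pairwise(X, metric):
    """Dense pairwise distance matrix for metric 'l1' or 'l2'."""
    diff = X[:, None, :] - X[None, :, :]
    if metric == "l1":
        return np.abs(diff).sum(axis=2)
    if metric == "l2":
        return np.sqrt((diff ** 2).sum(axis=2))
    raise ValueError(f"unknown metric: {metric}")

def geodesic(X, n_neighbors):
    """Approximate geodesic distances (assumption: Isomap-style graph)."""
    d = pairwise(X, "l2")
    n = len(X)
    graph = np.full((n, n), np.inf)
    np.fill_diagonal(graph, 0.0)
    nearest = np.argsort(d, axis=1)[:, 1:n_neighbors + 1]
    for i in range(n):
        graph[i, nearest[i]] = d[i, nearest[i]]   # edge i -> neighbors
        graph[nearest[i], i] = d[i, nearest[i]]   # keep the graph symmetric
    for m in range(n):                            # Floyd-Warshall update
        graph = np.minimum(graph, graph[:, [m]] + graph[[m], :])
    return graph

def k_nearest(dist, i, k):
    """Indices of the k nearest neighbors of sample i under `dist`."""
    order = np.argsort(dist[i])
    return [int(j) for j in order if j != i][:k]

# Toy data (hypothetical): 9 points on a unit semicircle
theta = np.linspace(0.0, np.pi, 9)
X = np.column_stack([np.cos(theta), np.sin(theta)])
geo = geodesic(X, n_neighbors=2)
print(pairwise(X, "l2")[0, -1], pairwise(X, "l1")[0, -1], geo[0, -1])
print(k_nearest(geo, 0, 2))
```

On the semicircle, the L2 distance between the two end points is the chord length, whereas the graph-based geodesic distance follows the arc and is therefore longer. The choice of metric changes which samples enter each local patch.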
Following the same experiment setup of experiment #1 detailed in Section 4.2, we conduct experiments to evaluate effects of these three different distance metrics on MLLE performance using ROC curve analysis.
Figure 12 shows ROC curves of MLLE with different distance metrics obtained on IRMA category 2, 3, 19, 31, 51 and 52, respectively. The number of images contained in positive and negative test set for each category is presented in Table 10. Table 11 details the corresponding AZ values.
(A) ROC curves obtained on IRMA category 2. (B) ROC curves obtained on IRMA category 3. (C) ROC curves obtained on IRMA category 19. (D) ROC curves obtained on IRMA category 31. (E) ROC curves obtained on IRMA category 51. (F) ROC curves obtained on IRMA category 52.
Table 11 reports, for IRMA category 2, the AZ values for discriminating between 1,103 positive images and 9,799 negative images when MLLE uses L2 distance, L1 distance and geodesic distance. Figure 12 (A) shows the comparison of ROC curves for these three sets of performance data. Table 11 demonstrates that L2 distance achieves the highest AZ value in the detection of IRMA category 2 compared to L1 distance and geodesic distance.
The computed AZ values for discriminating between positive and negative images from IRMA categories 3, 19, 31, 51 and 52 are also detailed in Table 11. The corresponding ROC curves are shown in Figure 12 (B), (C), (D), (E) and (F), respectively. From these results we can conclude that L2 distance is the best choice for MLLE to construct local patches. The same conclusion can be drawn from experimental results obtained on other IRMA categories.
Selecting Features before MLLE
In this section, we conduct experiments to demonstrate that there is no need to perform feature selection before MLLE.
The proposed MLLE has the merit of simultaneously learning a low-dimensional embedding from multiple features by exploring the different significance of each feature. In detail, MLLE assumes that each feature makes a different contribution to the final learned low-dimensional embedding, even when that feature does not differ significantly between medical images. We clarify this point with the two experiments described below.
Experiment #3 includes the following three steps. Step 1: For each medical image, we divide its twelve features into three groups: an LBP group, a SIFT group and an intensity group. Step 2: For each group, we employ laplacian score feature selection (LPFS), an unsupervised feature selection method, to determine the importance of each feature. In detail, within each feature group, we concatenate the four feature vectors into one long vector, so that each image is represented by three long feature vectors and the medical image data set has three different views. On each view, we use LPFS to determine the importance of each feature entry, and the most important entries are selected. Each image is thus represented by three dimension-reduced feature vectors, and the data set by three dimension-reduced views. In our experiment, the number of selected entries per view is set to 500. Step 3: We utilize MLLE to learn the low-dimensional embedding based on the three views obtained in step 2. The dimension of the learned embedding is set to 200. We denote this method as lpfs-MLLE (laplacian score feature selection-based MLLE).
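As an illustration of step 2, the following is a minimal sketch of Laplacian-score ranking on a single view, written from the standard formulation of the Laplacian score (a nearest-neighbor graph with heat-kernel weights; a lower score marks a more important feature entry). The neighborhood size, the kernel width `t`, the function name and the two-feature toy view are assumptions; in the experiment above, 500 entries per view are retained.

```python
import numpy as np

def laplacian_scores(X, n_neighbors=5, t=1.0):
    """Laplacian score of every column of X (lower = more important)."""
    n = X.shape[0]
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    # k nearest neighbors of each sample (column 0 is the sample itself)
    nearest = np.argsort(sq, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = nearest.ravel()
    S = np.zeros((n, n))
    S[rows, cols] = np.exp(-sq[rows, cols] / t)   # heat-kernel weights
    S = np.maximum(S, S.T)                        # symmetrise the graph
    deg = S.sum(axis=1)
    L = np.diag(deg) - S                          # graph Laplacian
    # Remove the degree-weighted mean of each feature
    centred = X - (deg @ X) / deg.sum()
    num = np.einsum("ij,ik,kj->j", centred, L, centred)   # f^T L f
    den = (centred ** 2 * deg[:, None]).sum(axis=0)       # f^T D f
    return num / den

# Toy view (hypothetical): column 0 separates two clusters, column 1 is noise
rng = np.random.default_rng(0)
informative = np.concatenate([rng.normal(0.0, 0.1, 20),
                              rng.normal(5.0, 0.1, 20)])
noise = rng.normal(0.0, 0.01, 40)
X = np.column_stack([informative, noise])

scores = laplacian_scores(X, n_neighbors=5)
selected = np.argsort(scores)[:1]   # keep the single best entry
print(scores, selected)
```

Selecting the entries with the smallest scores keeps those that vary smoothly over the neighborhood graph, which is the sense in which the selected entries are "most important".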
Experiment #4 includes the following three steps. Step 1: This step is the same as step 1 of experiment #3. Step 2: For each feature group, we employ multi-cluster feature selection (MCFS), a manifold learning-based feature selection method, to select the features that best preserve the multi-cluster structure of the medical image data set. In detail, each medical image has three different feature vectors, so the whole medical image data set can be represented by three different views. On each view, we use MCFS to select the feature entries that best preserve the multi-cluster structure of this view. In our experiment, the number of selected entries per view is set to 500. The data set is then represented by three dimension-reduced views. Step 3: This step is the same as step 3 of experiment #3. We denote this method as mcfs-MLLE (multi-cluster feature selection-based MLLE).
We compare the performance of MLLE, mcfs-MLLE and lpfs-MLLE using ROC curve analysis. The experimental setup is the same as that of experiment #1 detailed in subsection 4.2. Figure 13 shows ROC curves of these methods obtained on IRMA categories 14, 27, 30, 43, 45 and 57, respectively. For each category, the numbers of samples contained in the positive test set and the negative test set are detailed in Table 12. Table 13 shows the corresponding AZ values.
(A) ROC curves on IRMA category 14. (B) ROC curves on IRMA category 27. (C) ROC curves on IRMA category 30. (D) ROC curves on IRMA category 43. (E) ROC curves on IRMA category 45. (F) ROC curves on IRMA category 57.
Table 13 reports the AZ values for discriminating between 151 positive images from IRMA category 14 and 10,751 negative images from other categories, obtained with MLLE without feature selection and with MCFS and LPFS applied before MLLE. Figure 13 (A) shows the comparison of ROC curves for these three sets of performance data. From Table 13 we can see that directly using MLLE to perform dimensionality reduction yields the highest AZ value in the discrimination of IRMA category 14, compared to applying the feature selection methods MCFS and LPFS before conducting MLLE.
The computed AZ values for discriminating between positive and negative images from IRMA categories 27, 30, 43, 45 and 57 are also detailed in Table 13. The corresponding comparisons of ROC curves are shown in Figure 13 (B), (C), (D), (E) and (F), respectively. Based on these results, we conclude that, although applying feature selection methods before MLLE can reduce the number of features and save computing time, the learned embedding is worse than that obtained directly by MLLE.
It should be noted that, in this manuscript, we extract twelve different features from each medical image in order to demonstrate the ability of MLLE to explore complementary properties of different features. In practice, there is a trade-off between the number of visual features and retrieval performance. Within an acceptable range of retrieval performance, users can extract fewer visual features to save computing time. In fact, three to six visual features are capable of achieving acceptable retrieval performance.
Discussion and Conclusion
We organize this section as follows. In Section 5.1, we give a statistical analysis of the experimental results presented above. In Section 5.2, we discuss why MLLE is more effective than existing dimensionality reduction methods. Finally, Section 5.3 concludes our work.
In this paper, we use MAP, DOR and ROC as criteria to evaluate the performance of different methods. These criteria reflect the effectiveness of the methods from different aspects. In particular, MAP demonstrates the retrieval performance of different methods on the IRMA test set, while DOR and ROC show the ability of different methods to distinguish different types of medical image. Evaluation results obtained with the different criteria demonstrate that MLLE achieves the best results.
Statistically, we utilize the F1-measure to determine the reliability of the different criteria. Table 14 shows F1-measure values for MLLE, LLE, MSE, LE and PCA on IRMA categories 1, 4, 7 and 25, respectively. From the table, we can see that MLLE achieves the best performance compared with the other methods. This evaluation further confirms the results obtained by DOR. By applying the F1-measure to the other performance criteria, i.e., MAP and ROC, we can obtain the same conclusion.
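As a reminder of how the F1-measure is obtained, the sketch below computes it from the counts of a two-class diagnostic test as the harmonic mean of precision and recall. The counts are hypothetical and do not correspond to any entry of Table 14; the function name is an assumption.

```python
def f1_measure(tp, fp, fn):
    """Harmonic mean of precision and recall for a two-class test."""
    precision = tp / (tp + fp)   # fraction of predicted positives that are correct
    recall = tp / (tp + fn)      # fraction of true positives that are found
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 90 true positives, 5 false positives, 10 false negatives
f1 = f1_measure(tp=90, fp=5, fn=10)
print(round(f1, 4))
```

Unlike DOR, the F1-measure does not use the number of true negatives, so agreement between the two indicators gives some assurance that the ranking of methods does not depend on a single summary statistic.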
There are two reasons that make MLLE more effective to learn a low-dimensional embedding from multiview features, compared with existing dimensionality reduction methods. The first is that MLLE can simultaneously learn a low-dimensional embedding on multiview features. Different from other methods, MLLE uses LLE to obtain optimal low-dimensional subspace on each view and global coordinate alignment technique to unify all learned subspaces into a global one. The second is that MLLE can explore complementary properties among different features. Different from traditional dimensionality reduction methods that treat each feature equally, MLLE assigns different weight to each feature and utilizes alternating optimization technique to obtain these weights. Experimental results demonstrate the effectiveness of MLLE, in the context of CBMIR, compared with existing methods.
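The alternating scheme mentioned above can be sketched as follows. This is not a reproduction of the method's equations, which are given earlier in the paper; it assumes an objective of the form used in multiview spectral embedding, namely minimizing the sum over views of alpha_i^r tr(Y L_i Y^T) subject to Y Y^T = I and the weights summing to one. The per-view matrices `laplacians` are random positive definite stand-ins for the per-view LLE alignment matrices, and all names are hypothetical.

```python
import numpy as np

def update_weights(Y, laplacians, r):
    """Closed-form weights for fixed Y (assumed objective, requires r > 1)."""
    costs = np.array([np.trace(Y @ L @ Y.T) for L in laplacians])
    inv = (1.0 / costs) ** (1.0 / (r - 1.0))
    return inv / inv.sum()            # weights are positive and sum to one

def update_embedding(laplacians, alpha, r, d):
    """For fixed weights, Y spans the d smallest eigenvectors."""
    L = sum(a ** r * Li for a, Li in zip(alpha, laplacians))
    _, vecs = np.linalg.eigh(L)       # eigenvalues in ascending order
    return vecs[:, :d].T              # shape (d, n), orthonormal rows

def alternate(laplacians, r, d, n_iter=10):
    """Alternate between the two updates, starting from equal weights."""
    alpha = np.full(len(laplacians), 1.0 / len(laplacians))
    for _ in range(n_iter):
        Y = update_embedding(laplacians, alpha, r, d)
        alpha = update_weights(Y, laplacians, r)
    return Y, alpha

# Toy stand-ins (hypothetical): view 2 is ten times "costlier" than view 1
rng = np.random.default_rng(0)
B = rng.normal(size=(8, 8))
A = B @ B.T + np.eye(8)               # symmetric positive definite
Y, alpha = alternate([A, 10.0 * A], r=2.0, d=2)
print(alpha)
```

In this toy case the view with the smaller cost receives the larger weight, and as r grows the weights move toward equality, which is consistent with r acting as a scaling factor on the view weights. Real LLE alignment matrices share a constant null vector, so a practical implementation would have to handle zero costs; the positive definite stand-ins avoid that case here.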
With the rapid proliferation of radiological images in the medical domain, retrieving medical images from large archives to aid radiological image interpretation is becoming one of the most active research fields. CBMIR utilizes multiple visual features to represent images, which brings the problem of the “curse of dimensionality”. Though conventional dimensionality reduction methods can be employed to tackle this problem, these solutions ignore the fact that different visual features have a range of physical meanings. There is therefore a challenge to discover the complementary properties of multiple visual features to represent medical images. In this paper, we propose a new multiview learning method called MLLE to address the problem. Experimental evaluations on a subset of the IRMA medical image dataset have demonstrated that the new method effectively represents medical images in a low-dimensional subspace, and thus improves the performance of CBMIR significantly.
In the proposed method, we find that the local patch size K, the subspace dimension d and the scaling factor r affect the effectiveness of MLLE. From Figure 10, Table 9 and Figure 11 we can see that optimal parameter values for MLLE exist on the IRMA medical image dataset. In the future, we will evaluate the performance of MLLE on other medical image test beds to further explore the effects of these parameters.
Detailed Derivation of Equation (9).
We would like to thank the academic editor and reviewers for their constructive comments. We thank Doctor Richong Zhang for his helpful discussions. We also thank TM Deserno, Dept. of Medical Informatics, RWTH Aachen, Germany, for kindly providing the IRMA medical image dataset.
Conceived and designed the experiments: HS DT DM. Performed the experiments: HS. Analyzed the data: HS DT DM. Contributed reagents/materials/analysis tools: DT. Wrote the paper: HS DT DM.