Dual-Force ISOMAP: A New Relevance Feedback Method for Medical Image Retrieval

With great potential for assisting radiological image interpretation and decision making, content-based image retrieval in the medical domain has become a hot topic in recent years. Many methods to enhance the performance of content-based medical image retrieval have been proposed, among which the relevance feedback (RF) scheme is one of the most promising. Given user feedback information, RF algorithms interactively learn a user’s preferences to bridge the “semantic gap” between low-level computerized visual features and high-level human semantic perception and thus improve retrieval performance. However, most existing RF algorithms operate in the original high-dimensional feature space and ignore the manifold structure of the low-level visual features of images. In this paper, we propose a new method, termed dual-force ISOMAP (DFISOMAP), for content-based medical image retrieval. Under the assumption that medical images lie on a low-dimensional manifold embedded in a high-dimensional ambient space, DFISOMAP operates in the following three stages. First, the geometric structure of positive examples in the learned low-dimensional embedding is preserved according to the isometric feature mapping (ISOMAP) criterion. To precisely model the geometric structure, a reconstruction error constraint is also added. Second, the average distance between positive and negative examples is maximized to separate them; this margin maximization acts as a force that pushes negative examples far away from positive examples. Finally, the similarity propagation technique is used to provide negative examples with another force that pulls them back into the negative sample set. We evaluate the proposed method on a subset of the IRMA medical image dataset within an RF-based medical image retrieval framework. Experimental results show that DFISOMAP outperforms popular approaches for content-based medical image retrieval in terms of accuracy and stability.


Introduction
Medical image interpretation is a process that combines subjective perception and objective reasoning. Typically, radiologists extract superficial visual features from medical images and reach diagnostic conclusions based on personal knowledge and experience. Because of differences in perception, training and fatigue, different professionals, or the same professional under different circumstances, can draw different conclusions about the same medical image [1,2]. The goal of content-based medical image retrieval (CBMIR) is to help radiologists make better diagnoses for a given case by retrieving similar cases from a variety of semantically annotated medical image archives.
It is well known that the “semantic gap” is one of the issues faced by content-based image retrieval (CBIR). The fact that medical images contain varied, rich and subtle visual features [3] poses an additional challenge for the use of CBIR in radiology. Unlike regular image understanding, medical image diagnosis depends on case-specific interpretation. It is common for visually similar medical images to convey different semantic meanings, while semantically alike images may have different visual features. Medical images from the IRMA medical image dataset [4] illustrate this problem. The IRMA medical image dataset is a widely used test bed for performance evaluation of CBMIR [5][6][7][8].
The new version of the IRMA dataset [4] contains a training set of 12,677 fully annotated gray-value radiographs. These images are categorized into 193 classes according to a mono-hierarchical, multi-axial classification standard called the IRMA coding system [9]. The system classifies a medical image along four orthogonal axes: imaging modality, body orientation, body region examined and biological system examined. Figure 1 and Figure 2 illustrate the semantic gap. As shown in Figure 1, two chest radiographs have a similar visual appearance but different semantic meanings: the IRMA code [9] of the left image is “1123-127-500-000”, while that of the right image is “1123-110-500-003”. Conversely, although their visual appearance differs, the two mammograms shown in Figure 2 share the same IRMA code “1124-310-610-625”.
Relevance feedback (RF) is a promising way to bridge the semantic gap in CBIR [10]. Under the assumption that every user's needs are different and time-varying [11,12], RF provides a user-in-the-loop mechanism that allows a user to interact with the retrieval system to refine the retrieval results. The basic RF process in CBIR is as follows: 1) the retrieval system returns the initial retrieval results to the user; 2) the user labels query-relevant and query-irrelevant images as positive and negative feedback, respectively; 3) based on the labeled feedback, the retrieval system learns to improve retrieval performance and returns new results; 4) if the user is satisfied with the new results, the RF process ends; otherwise, the process returns to step 2).
Many RF methods have also been applied to CBMIR in recent years. Rahman et al. [28] used positive feedback to update the optimal query point for medical image retrieval. They also proposed an RF-based dynamic similarity fusion approach for biomedical image retrieval [29], in which RF information is used to reweight features at each iteration. Xu et al. [30,31] used RF to update feature weights for X-ray image retrieval. To address the small sample size problem, Hoi et al. [32] proposed semi-supervised SVM batch-mode active learning for both medical and regular image retrieval. In addition, Ko et al. [33] integrated the RF scheme into CBMIR to boost retrieval performance. Although these approaches achieve promising results, there is room for improvement because most of them do not consider the manifold structure of low-level image features.
In this paper, we formulate a new RF method termed dual-force ISOMAP (DFISOMAP) for CBMIR. DFISOMAP is designed to precisely explore the manifold structure of low-level image visual features [34][35][36]. DFISOMAP operates in three stages: 1) local geometry preservation, 2) margin maximization, and 3) similarity propagation. First, the local geometry of the positive examples in the high-dimensional feature space is preserved according to the isometric feature mapping (ISOMAP) criterion [37]. To precisely model the geometric structure of positive examples in the low-dimensional embedding, a reconstruction error constraint according to locally linear embedding (LLE) [38] […] [39], locality preserving projections (LPP) [40], biased discriminant analysis (BDA) [21], constrained similarity measure using support vector machine (CSVM) [18], ISOMAP and exponential locality preserving projections (ELPP) [41], DFISOMAP differs in the following ways: 1) DFISOMAP precisely preserves the geometric structure of positive feedback examples, and 2) DFISOMAP does not suffer from the undersampling problem.

Dual-Force ISOMAP
In this section, we detail the proposed DFISOMAP. To better present the method, Table 1 lists important notations used in this paper.
Consider a set of medical images $\mathcal{I} = [\mathbf{x}_1, \dots, \mathbf{x}_N] \in \mathbb{R}^{h \times N}$ in the low-level feature space, and a query image $\mathbf{x}_q \in \mathcal{I}$. Following the query-by-example paradigm of CBIR systems, the top $n$ images are returned for each query, among which $n_+$ images belong to the same semantic class as $\mathbf{x}_q$. We term them positive examples, $\mathbf{x}_{q_1}, \dots, \mathbf{x}_{q_{n_+}}$, and collect them into the positive feedback set $X_+ = [\mathbf{x}_{q_1}, \dots, \mathbf{x}_{q_{n_+}}]$. Meanwhile, we obtain $n_-$ images that belong to semantic classes different from that of $\mathbf{x}_q$. We term them negative examples, $\mathbf{x}_{q^1}, \dots, \mathbf{x}_{q^{n_-}}$, and collect them into the negative feedback set $X_- = [\mathbf{x}_{q^1}, \dots, \mathbf{x}_{q^{n_-}}]$. DFISOMAP assumes that medical images lie on a low-dimensional manifold $\mathbb{R}^l$ that is artificially embedded in a high-dimensional ambient space, i.e., the low-level feature space $\mathbb{R}^h$. The objective of DFISOMAP is to learn a mapping function $F: \mathbb{R}^h \to \mathbb{R}^l$ from the relevance feedback set $X = [X_+, X_-] \in \mathbb{R}^{h \times n}$. The learned mapping $F$ should effectively separate positive examples from negative examples. For simplicity, we assume that $F$ is linear. The problem of DFISOMAP is then converted to finding a projection matrix $U \in \mathbb{R}^{h \times l}$ such that $\mathbf{y}_i = U^T \mathbf{x}_i$.

Local Geometry Preservation
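ISOMAP preserves the local geometry of positive examples by requiring the pairwise distances among the embedded positive examples to match their geodesic distances in the feature space (equation (1) [37]), where $D_G$ (geodesic distances in the feature space) and $D_E$ (Euclidean distances in the low-dimensional embedding) are $n_+ \times n_+$ matrices. According to [37], $D_G$ and $D_E$ can be converted to the inner product matrices $\tau(D_G)$ and $\tau(D_E)$, respectively. The operator $\tau(D)$ is defined as

$$\tau(D) = -\frac{1}{2} H S H, \qquad S_{ij} = D_{ij}^2, \qquad H = I_{n_+} - \frac{1}{n_+}\,\mathbf{e}_{n_+}\mathbf{e}_{n_+}^T,$$

where $I_{n_+}$ is the $n_+ \times n_+$ identity matrix and $\mathbf{e}_{n_+} = (1, \dots, 1)^T \in \mathbb{R}^{n_+}$. Thus, equation (1) can be transformed to

$$\arg\min_U \operatorname{tr}\!\left[\left(\tau(D_G) - Y_+^T Y_+\right)^T \left(\tau(D_G) - Y_+^T Y_+\right)\right] \qquad (5)$$

where $\operatorname{tr}[\cdot]$ stands for the trace operator and $Y_+ = U^T X_+$ (with $\tau(D_E) = Y_+^T Y_+$ when the embedded positive examples are centered). Expanding (5), the term $\operatorname{tr}[\tau(D_G)^2]$ does not depend on $U$. Assuming that the remaining quadratic term $\operatorname{tr}[(Y_+^T Y_+)^2]$ is constant, equation (5) can be converted to

$$\arg\max_U \operatorname{tr}\!\left(U^T A U\right) \qquad (6)$$

where $A = X_+ \tau(D_G) X_+^T$. To minimize the reconstruction error of the local geometry preservation presented above, we further assume that each $\mathbf{y}_i \in Y_+$ can be reconstructed from its neighbors. Thus, we have

$$\arg\min_U \sum_{i=1}^{n_+} \Big\|\mathbf{y}_i - \sum_{j} W_{ij}\,\mathbf{y}_j\Big\|^2 = \arg\min_U \operatorname{tr}\!\left(U^T B U\right) \qquad (7)$$

where $B = X_+ (I - W^T)(I - W^T)^T X_+^T$ and $I$ is the $n_+ \times n_+$ identity matrix. $W_{ij}$ is obtained via locally linear embedding (LLE) [38]:

$$\min_W \sum_{i=1}^{n_+} \Big\|\mathbf{x}_i - \sum_{j} W_{ij}\,\mathbf{x}_j\Big\|^2 \quad \text{s.t.} \quad \sum_j W_{ij} = 1, \;\; W_{ij} = 0 \text{ if } \mathbf{x}_j \text{ is not a neighbor of } \mathbf{x}_i \qquad (8)$$

Putting equations (6) and (7) together, we obtain the objective function for local geometry preservation

$$\arg\max_U \operatorname{tr}\!\left(U^T (A - \alpha B) U\right) \qquad (9)$$

where $\alpha \ge 0$ is the trade-off parameter.

Table 1. Important notations used in this paper.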
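The two matrices in equation (9) can be assembled with a few lines of NumPy. The sketch below is a minimal illustration, not the authors' implementation: it assumes $D_G$ is approximated by shortest paths on a symmetric $k$-nearest-neighbor graph over the positive examples (the usual ISOMAP construction) and uses the common regularized LLE solve for $W$. The neighborhood size `k`, the regularizer `reg`, the fallback for disconnected pairs and all function names are hypothetical choices. As elsewhere in this section, `X_pos` stores one positive example per column ($h \times n_+$).

```python
import numpy as np


def pairwise_euclidean(X):
    """Euclidean distance matrix between the columns of X (h x n)."""
    sq = np.sum(X**2, axis=0)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X.T @ X
    return np.sqrt(np.maximum(D2, 0.0))  # clip tiny negatives caused by round-off


def knn_indices(D, i, k):
    """Indices of the k nearest neighbours of column i, excluding i itself."""
    order = np.argsort(D[i])
    return order[order != i][:k]


def geodesic_distances(X, k):
    """Approximate geodesic distances: shortest paths on a symmetric kNN graph."""
    n = X.shape[1]
    k = min(k, n - 1)
    D = pairwise_euclidean(X)
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    for i in range(n):
        nbrs = knn_indices(D, i, k)
        G[i, nbrs] = D[i, nbrs]
        G[nbrs, i] = D[i, nbrs]
    for m in range(n):  # Floyd-Warshall relaxation through vertex m
        G = np.minimum(G, G[:, [m]] + G[[m], :])
    # Hypothetical fallback: pairs left disconnected use the Euclidean distance.
    return np.where(np.isinf(G), D, G)


def tau(D):
    """ISOMAP operator tau(D) = -H S H / 2 with S_ij = D_ij^2 and centring matrix H."""
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * H @ (D**2) @ H


def lle_weights(X, k, reg=1e-3):
    """LLE reconstruction weights; each row sums to one over its k neighbours."""
    n = X.shape[1]
    k = min(k, n - 1)
    D = pairwise_euclidean(X)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = knn_indices(D, i, k)
        Z = X[:, nbrs] - X[:, [i]]  # neighbours shifted so that x_i is the origin
        C = Z.T @ Z  # local Gram matrix (k x k)
        C += reg * (np.trace(C) + 1e-12) * np.eye(k)  # regularise for stability
        w = np.linalg.solve(C, np.ones(k))
        W[i, nbrs] = w / w.sum()  # enforce the sum-to-one constraint of (8)
    return W


def local_geometry_matrices(X_pos, k=3):
    """Return A = X+ tau(D_G) X+^T and B = X+ (I - W^T)(I - W^T)^T X+^T.

    Requires at least two positive examples (columns of X_pos).
    """
    n_pos = X_pos.shape[1]
    A = X_pos @ tau(geodesic_distances(X_pos, k)) @ X_pos.T
    R = np.eye(n_pos) - lle_weights(X_pos, k).T
    B = X_pos @ R @ R.T @ X_pos.T
    return A, B
```

With only two to eight positive examples per feedback round, as in the experiments, the dense Floyd-Warshall loop costs next to nothing; for much larger positive sets a sparse shortest-path routine would be the more natural choice.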

Margin Maximization
In the low-dimensional embedding, we expect the average pairwise distance between negative and positive feedback examples to be as large as possible and the average pairwise distance among positive feedback examples to be as small as possible, as expressed by the maximization problem in equation (10), where $\theta \ge 1$ is the gap factor. Substituting $\mathbf{y}_i = U^T \mathbf{x}_i$, equation (10) reduces to equation (11).

Similarity Propagation
Equation (11) […] The straightforward way to shrink the pairwise distances between intraclass examples is to minimize the average weighted squared distance between all sample pairs $(\mathbf{y}_i, \mathbf{y}_j)$, $1 \le i, j \le n$, in the low-dimensional embedding (equation (12)), where $N^* \in \mathbb{R}^{n \times n}$ is termed the similarity matrix.
In this paper, we define $N^*$ as in equation (13). $N^*$ quantifies the similarity relationships among the positive examples and among the negative examples, respectively. In our implementation, we set $t$ to 0.5.
Putting equations (11) and (12) together, we obtain the maximization problem in equation (14), where $\beta > 0$ is the trade-off parameter. With $M$ defined as in equation (15), equation (14) can be rewritten as equation (16), where $C = X(L - M)X^T$ and $L$ is a diagonal matrix with $L_{ii} = \sum_{j=1}^{n} M_{ij}$.
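The role of the diagonal matrix $L$ in $C$ follows from a standard identity. Assuming $M$ is symmetric and writing $Y = U^T X$ with columns $\mathbf{y}_i$, an $M$-weighted sum of squared pairwise distances in the embedding equals a trace of the form used in $C$. The derivation below sketches that step only; it is not a reconstruction of equations (14) to (16).

```latex
\begin{aligned}
\sum_{i=1}^{n}\sum_{j=1}^{n} M_{ij}\,\|\mathbf{y}_i-\mathbf{y}_j\|^2
&= \sum_{i,j} M_{ij}\left(\mathbf{y}_i^T\mathbf{y}_i + \mathbf{y}_j^T\mathbf{y}_j - 2\,\mathbf{y}_i^T\mathbf{y}_j\right) \\
&= 2\sum_{i} L_{ii}\,\mathbf{y}_i^T\mathbf{y}_i - 2\sum_{i,j} M_{ij}\,\mathbf{y}_i^T\mathbf{y}_j
   \qquad (M \text{ symmetric, so row sums equal column sums}) \\
&= 2\operatorname{tr}\!\left(Y (L - M) Y^T\right)
 = 2\operatorname{tr}\!\left(U^T X (L - M) X^T U\right)
 = 2\operatorname{tr}\!\left(U^T C U\right)
\end{aligned}
```

Maximizing $\operatorname{tr}(U^T C U)$ therefore maximizes the $M$-weighted sum of squared distances: pairs with positive entries in $M$ are pushed apart, while any pairs with negative entries are pulled together.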

Objective Function
Combining equation (9) with (16), we obtain the objective function of DFISOMAP

$$\arg\max_U \operatorname{tr}\!\left(U^T E U\right) \qquad (17)$$

where $\gamma \ge 0$ is the margin factor and $E = A - \alpha B + \gamma C$. Because the real matrix $E$ is symmetric (the proof is given in Appendix S1), $U$ can be obtained by standard eigenvalue decomposition of $E$. By imposing $U^T U = I_l$ on (17), $U$ is formed by the $l$ eigenvectors associated with the $l$ largest eigenvalues.
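Equation (17) with the constraint $U^T U = I_l$ reduces to a symmetric eigenproblem. The following is a minimal NumPy sketch, assuming $A$, $B$ and $C$ have already been computed as $h \times h$ matrices; the function name is hypothetical.

```python
import numpy as np


def dfisomap_projection(A, B, C, alpha, gamma, l):
    """Return U (h x l): the top-l eigenvectors of E = A - alpha*B + gamma*C."""
    E = A - alpha * B + gamma * C
    E = 0.5 * (E + E.T)  # remove round-off asymmetry before calling eigh
    eigvals, eigvecs = np.linalg.eigh(E)  # eigenvalues come back in ascending order
    order = np.argsort(eigvals)[::-1][:l]  # indices of the l largest eigenvalues
    return eigvecs[:, order]  # orthonormal columns, so U^T U = I_l
```

`np.linalg.eigh` is the natural choice because $E$ is symmetric; since it returns eigenvalues in ascending order, the indices are reversed to keep the $l$ largest. With the best values reported in the parameter study, a call would look like `dfisomap_projection(A, B, C, 10.0, 1400.0, l)`; the embedding dimension $l$ is not specified in this section.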

CBMIR Framework
We use the framework depicted in Figure 3 for CBMIR; any RF algorithm can be integrated into this framework. As shown in the figure, when a query image is provided, its low-level visual features are extracted. All images in the medical image database are then sorted in ascending order of their Euclidean distance from the query image. If the user is not satisfied with the result, s/he labels some semantically relevant images as positive feedback examples and some semantically irrelevant images as negative feedback examples. Based on these feedback examples, an RF model is trained. All images, including the positive feedback, the negative feedback and the remaining images in the medical image database, are re-sorted according to the updated similarity metric, and the top-ranked images are returned. If the user is still not satisfied with the result, the RF process is repeated.
For DFISOMAP, we learn a projection matrix $U$ according to equation (17) and use $U$ to project all images into the low-dimensional embedding. In the projected embedding, images are re-sorted in ascending order of their Euclidean distance from the query image, and the top-ranked images are returned to the user. The RF procedure stops when the user is satisfied with the results.
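The ranking step of the framework, before and after projection, can be sketched as follows. This is an illustration under stated assumptions: features are stored one image per column, as in the notation above, ties in distance are broken by index order, and the function names are hypothetical.

```python
import numpy as np


def rank_by_distance(features, query):
    """Database indices sorted by ascending Euclidean distance to the query.

    features: (h, N) array with one image per column; query: (h,) vector.
    """
    dists = np.linalg.norm(features - query[:, None], axis=0)
    return np.argsort(dists, kind="stable")  # stable sort keeps index order on ties


def rerank_with_projection(features, query, U):
    """Project database and query with U (h x l), then rank in the embedding."""
    return rank_by_distance(U.T @ features, U.T @ query)
```

For example, with three 2-D images at $(0, 0)$, $(3, 0)$ and $(1, 0)$ and a query at the origin, `rank_by_distance` returns `[0, 2, 1]`; projecting onto the second coordinate with `U = [[0], [1]]` makes all distances equal, so the stable sort returns `[0, 1, 2]`.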
We use LBP [43], SIFT [44] and pixel intensity descriptors to extract features from each medical image. For the SIFT and pixel intensity descriptors, we adopt the bag-of-visual-words [45] scheme to represent the image. Specifically, we first densely sample each image with the SIFT and intensity descriptors, respectively, using a sampling spacing of 8 pixels and a patch size of 16×16. We then use K-means clustering to learn two 500-word dictionaries, i.e., a SIFT visual word dictionary and an intensity visual word dictionary. Finally, for each image, we obtain a 500-bin SIFT histogram and a 500-bin intensity histogram. We represent each image by concatenating the 531-bin LBP histogram, the 500-bin SIFT histogram and the 500-bin pixel intensity histogram into a 1531-D vector. To remove redundant information contained in the concatenated vector and reduce the computational complexity of the subsequent steps, we normalize the 1531-D vector to zero mean and unit standard deviation and then use principal component analysis (PCA) to reduce it to a 500-D feature vector.
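The visual-word encoding and the normalization-plus-PCA step can be sketched as below. This is a hedged sketch, not the authors' pipeline: it assumes hard assignment of each local descriptor to its nearest visual word, per-dimension z-scoring across the database (the text does not say whether normalization is per dimension or per vector), and PCA computed by SVD of the centered data. The K-means dictionaries and LBP histograms are assumed to be computed elsewhere, and all function names are hypothetical.

```python
import numpy as np


def bovw_histogram(descriptors, dictionary):
    """Hard-assign each local descriptor (row) to its nearest visual word; return counts."""
    d2 = (
        (descriptors**2).sum(axis=1)[:, None]
        + (dictionary**2).sum(axis=1)[None, :]
        - 2.0 * descriptors @ dictionary.T
    )  # squared Euclidean distances, shape (num_descriptors, num_words)
    words = d2.argmin(axis=1)
    return np.bincount(words, minlength=dictionary.shape[0]).astype(float)


def standardize(F):
    """Z-score each feature dimension (column) across images (rows)."""
    mu = F.mean(axis=0)
    sd = F.std(axis=0)
    sd[sd == 0] = 1.0  # constant dimensions stay at zero instead of dividing by 0
    return (F - mu) / sd


def pca_reduce(F, d):
    """Project the rows of F onto its top-d principal components via SVD."""
    Fc = F - F.mean(axis=0)
    _, _, Vt = np.linalg.svd(Fc, full_matrices=False)
    return Fc @ Vt[:d].T
```

In this sketch, each image's 1531-D vector would be built with `np.concatenate([lbp_hist, sift_hist, intensity_hist])`, the vectors stacked as rows of `F`, and the final features obtained with `pca_reduce(standardize(F), 500)`.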

Performance Evaluation
In this section, we compare the performance of the proposed DFISOMAP for CBMIR with that of other methods, i.e., LDA, LPP, BDA, CSVM, ISOMAP, LLE and ELPP.
This section is organized as follows. Section 4.1 introduces the dataset used for evaluation. Section 4.2 presents the experimental setup. Section 4.3 compares DFISOMAP with other RF approaches using mean average precision (MAP) and standard deviation (SD). Section 4.4 reports the performance of the RF methods in terms of precision and recall. Finally, Section 4.5 explores the effects of parameters on the performance of DFISOMAP.

IRMA Medical Image Dataset
The IRMA medical image dataset is widely used for CBMIR evaluation. In our experiments, we select the first 57 categories of the new version of the IRMA dataset as the test bed. The selected categories contain a total of 10,902 images. Figure 4 shows example images from the dataset, and Figure 5 illustrates three query images.

Experimental Setting
We conduct 338 independent experiments to evaluate the performance of DFISOMAP and other RF methods. Specifically, we randomly select 338 images from the IRMA dataset as query examples. These images belong to different IRMA categories; in general, five or six images are selected from each IRMA category. In the initial retrieval, each query sample has five to eight relevant images among the top 30 ranked results. For each selected image, a “leave-one-out” query is conducted: the remaining images in the dataset are ranked according to their Euclidean distance to the query sample.
Different RF algorithms are embedded into the framework depicted in Figure 3, and the RF process is performed automatically by the computer. For each query image, a computer-simulated query is run against all the other 10,901 images in the dataset. Among the top 30 images, the computer marks all query-relevant images as positive feedback and the rest as negative feedback. In general, this yields between two and eight positive feedback images. This procedure is close to a real-world application scenario, because users typically do not want to label many feedback examples during the iteration process. We set the number of RF iterations to 10. In the first iteration, the returned images are ranked according to their Euclidean distance from the query image. Starting from the second iteration, each RF algorithm learns its own projection matrix $U$ from the positive and negative feedback. In the projected low-dimensional embedding, the other images in the dataset are re-ranked according to their Euclidean distance from the query image.
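The simulated-user protocol described above can be expressed as a short loop. This is a minimal sketch under stated assumptions: `learn_projection` is a hypothetical placeholder for any RF learner that returns a projection matrix (for DFISOMAP, the $U$ of equation (17)); the text does not say whether the query itself joins the positive set, so it is passed to the learner separately; and ties in distance are broken by index order.

```python
import numpy as np


def simulate_feedback(ranking, labels, query_label, top=30):
    """Split the top-ranked images into positive (same class) and negative indices."""
    shown = ranking[:top]
    return shown[labels[shown] == query_label], shown[labels[shown] != query_label]


def run_rf_session(features, labels, q, learn_projection, iterations=10, top=30):
    """Leave-one-out RF session driven by a computer-simulated user.

    features: (h, N) array, one image per column; labels: (N,) class ids;
    q: index of the query image; learn_projection(X_pos, X_neg, x_q) -> U (h x l)
    is a hypothetical interface standing in for any RF learner.
    Returns one ranking of database indices (query excluded) per iteration.
    """
    database = np.delete(np.arange(features.shape[1]), q)  # leave the query out
    x_q = features[:, q]
    U = None  # first iteration: plain Euclidean ranking in the feature space
    rankings = []
    for it in range(iterations):
        Z = features[:, database] if U is None else U.T @ features[:, database]
        z_q = x_q if U is None else U.T @ x_q
        dists = np.linalg.norm(Z - z_q[:, None], axis=0)
        ranking = database[np.argsort(dists, kind="stable")]
        rankings.append(ranking)
        if it < iterations - 1:  # learn U for the next round from simulated labels
            pos, neg = simulate_feedback(ranking, labels, labels[q], top)
            U = learn_projection(features[:, pos], features[:, neg], x_q)
    return rankings
```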
We configure all baseline methods according to the descriptions in the corresponding papers and tune their parameters to obtain the best results. For CSVM, we choose the Gaussian kernel with $\sigma = 0.5$, and LibSVM [9] is used to find an optimal hyperplane separating negative and positive examples. For ELPP, we set the parameters as described in [41].

Performance Evaluation Using MAP and SD
In this section, we use MAP and SD to measure the performance of DFISOMAP and other RF algorithms. MAP is the mean of the average precision values of the 338 independent queries and measures the retrieval precision of an RF algorithm. SD is computed from the average precision values of the same 338 queries and assesses the stability of an RF algorithm. Figure 6 and Figure 7 illustrate the performance of the proposed DFISOMAP compared with LDA-, LPP-, BDA-, CSVM-, ISOMAP-, LLE- and ELPP-based RF algorithms. From these figures we can see that, in all experiments and after any number of iterations, the proposed DFISOMAP consistently outperforms the other conventional RF algorithms in terms of MAP. DFISOMAP also shows good stability, as demonstrated by its SD values and the trend of its SD curve. At each level (top 10
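MAP and SD can be computed from per-query average precision values. The sketch assumes the common non-interpolated definition of average precision (the mean of precision@k over the ranks k of relevant items, normalized by the number of relevant items supplied) and the population SD (`ddof=0`); the section does not state which variants were used, and the function names are hypothetical.

```python
import numpy as np


def average_precision(relevance, n_relevant=None):
    """Non-interpolated average precision of a ranked list of 0/1 relevance flags.

    n_relevant defaults to the number of relevant items in the list; pass the
    total number of relevant images in the database for the TREC-style variant.
    """
    rel = np.asarray(relevance, dtype=float)
    precision_at_k = np.cumsum(rel) / np.arange(1, rel.size + 1)
    denom = rel.sum() if n_relevant is None else n_relevant
    return float((precision_at_k * rel).sum() / denom) if denom > 0 else 0.0


def map_and_sd(ap_values):
    """MAP (mean of per-query average precision) and its SD across queries."""
    ap = np.asarray(ap_values, dtype=float)
    return float(ap.mean()), float(ap.std())
```

For a ranked list with relevance flags `[1, 0, 1, 0]`, precision is 1 at rank 1 and 2/3 at rank 3, so `average_precision` returns their mean, about 0.833.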

Performance Evaluation Using Precision and Recall
In this section, we use average precision (AP) and average recall (AR) to evaluate the performance of DFISOMAP and other methods. In the context of CBMIR, precision is the percentage of relevant medical images among the top retrieved results, and AP is the precision averaged over all queries. Recall is the percentage of all relevant images in the test bed that appear among the top retrieved results, and AR is the recall averaged over all queries. Figure 8, Table 2 and Table 3 show the AP of different methods. In detail, Figure 8 […] Table 2 and Table 3, respectively. From these two tables, we can conclude that DFISOMAP achieves better results than the other methods. Figure 9, Table 4 and Table 5 present the AR of different algorithms. Specifically, Figure 9 (A), (B), (C), (D) and (E) show the AR of different approaches in the top 10, 20, 30, 40 and 50 results, respectively; the figure shows that DFISOMAP is more effective than the other compared methods. Moreover, the AR values of the top-ranked results for different methods after the fifth and ninth feedback iterations are given in Table 4 and Table 5, respectively, and these tables likewise show that DFISOMAP is more effective than the other approaches.
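The AP and AR measures defined above (precision and recall at a cutoff, averaged over queries) can be sketched as follows. The function names are hypothetical; the sketch assumes each query's relevant set is non-empty and divides precision by the cutoff `k` even if fewer than `k` images are returned.

```python
import numpy as np


def precision_recall_at_k(ranking, relevant, k):
    """Precision and recall of the top-k results for one query.

    ranking: ranked image indices; relevant: non-empty set of all relevant
    image indices for this query in the test bed.
    """
    hits = sum(1 for i in ranking[:k] if int(i) in relevant)
    return hits / k, hits / len(relevant)


def averaged_precision_recall(rankings, relevant_sets, k):
    """AP and AR at cutoff k: precision and recall averaged over all queries."""
    pr = np.array([precision_recall_at_k(r, s, k) for r, s in zip(rankings, relevant_sets)])
    ap, ar = pr.mean(axis=0)
    return float(ap), float(ar)
```

For instance, with ranking `[3, 1, 4, 5, 9]`, relevant set `{1, 5, 2}` and `k = 5`, two of the five returned images are relevant, giving a precision of 0.4 and a recall of 2/3.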

Effects of Parameters
(1) Effects of $\alpha$. As shown in equation (17), the parameter $\alpha$ controls the contribution of $B$ to $E$, where $B$ encodes the LLE-based reconstruction constraint that preserves the local geometry of positive feedback examples.
With the same experimental setup as above, we conduct experiments to evaluate the effects of $\alpha$. We increase $\alpha$ from 0 to 100 in steps of 10 and set $\gamma$ to 1400. Table 6 and Table 7 show the AP and AR of DFISOMAP in the top 50 results, respectively. From these tables we draw the following conclusions: 1) DFISOMAP achieves its best performance when $\alpha$ is set to 10; 2) as $\alpha$ increases beyond 10, the performance of DFISOMAP degrades; 3) when $\alpha$ is set to 0, i.e., $B$ makes no contribution to $E$, the performance of DFISOMAP is worst. These results verify the effectiveness of the reconstruction error constraint.
(2) Effects of $\gamma$. Equation (17) shows that $\gamma$ controls the contribution of $C$ to $E$, where $C$ represents similarity propagation among positive and negative examples.
With the same experimental setup as above, we conduct experiments to explore the effects of $\gamma$. We increase $\gamma$ from 0 to 2000 in steps of 200 and set $\alpha$ to 10. Table 8 and Table 9 detail the AP and AR of DFISOMAP in the top 50 results, respectively. From these tables we draw the following conclusions: 1) DFISOMAP achieves its best performance when $\gamma$ is set to 1400; 2) when $\gamma$ is set to 0, i.e., there is no similarity propagation, the performance of DFISOMAP is worst. These results confirm the effectiveness of similarity propagation.

Conclusion
Starting from the assumption that medical images lie on a low-dimensional manifold artificially embedded in a high-dimensional visual feature space, we propose dual-force ISOMAP (DFISOMAP) to map medical images from the high-dimensional feature space to a low-dimensional embedding. In the CBMIR framework, DFISOMAP precisely preserves the geometric structure of positive feedback examples according to the ISOMAP criterion and effectively separates negative examples from positive examples using two forces. Evaluation results on a subset of the IRMA medical image dataset show that DFISOMAP outperforms popular dimensionality reduction-based RF algorithms, e.g., LDA, BDA, LPP, ISOMAP, LLE and ELPP, as well as support vector machine-based RF algorithms, e.g., CSVM.

Supporting Information
Appendix S1. Proof that E is symmetric.