MBR-SIFT: A mirror reflected invariant feature descriptor using a binary representation for image matching

The traditional scale invariant feature transform (SIFT) method can extract distinctive features for image matching. However, SIFT matching is extremely time-consuming because it relies on the Euclidean distance measure. Recently, many binary SIFT (BSIFT) methods have been developed to improve matching efficiency; however, none of them is invariant to mirror reflection. To address these problems, in this paper, we present a horizontal or vertical mirror reflection invariant binary descriptor named MBR-SIFT, in addition to a novel image matching approach. First, 16 cells in the local region around the SIFT keypoint are reorganized, and then the 128-dimensional vector of the SIFT descriptor is transformed into a reconstructed vector according to eight directions. Finally, the MBR-SIFT descriptor is obtained after binarization and reverse coding. To improve the matching speed and accuracy, we propose a fast matching algorithm that includes a coarse-to-fine two-step matching strategy and two similarity measures for the MBR-SIFT descriptor. Experimental results on the UKBench dataset show that the proposed method not only solves the problem of mirror reflection, but also ensures desirable matching accuracy and speed.


Introduction
The local feature point has been successfully used in pattern recognition and computer vision applications, such as image retrieval [1], object recognition [2], gesture recognition [3], texture recognition [4], 3-D reconstruction [5], building panoramas [6], and wide baseline matching [7,8]. Image matching based on local features generally consists of three stages: feature point extraction, description, and matching. In feature point extraction, reliable points of interest in the image are extracted as feature points. A good descriptor should be robust to photometric transformations, such as brightness and highlight, while being invariant to geometrical transformations, such as rotation, scaling, viewpoint, and reflection [9].
To date, numerous feature descriptors have been proposed, of which the scale invariant feature transform (SIFT) descriptor proposed by Lowe [10] is one of the most successful and popular local image feature descriptors. The SIFT descriptor, which is generated from the gradient distribution of the local region, was shown by Mikolajczyk and Schmid [11] to be the best local invariant feature descriptor. However, its matching inefficiency slows down the entire process. Much research has been conducted on improving the SIFT algorithm. The PCA-SIFT [12] descriptor improves the efficiency of the SIFT algorithm by reducing the dimension of the SIFT descriptor vector from 128 to 36. Additionally, GLOH [11] is an extension of the SIFT descriptor that is designed to increase its robustness and distinctiveness to a certain extent. Morel and Yu [13] proposed an affine SIFT, which simulates all the distortions caused by variations in the direction of a camera's optical axis.
In the matching procedure, the 128-dimensional (128-D) descriptors of all keypoints in two images are extracted. The 128-D descriptor of each keypoint in the first image is compared with those of the second image. The Euclidean distance is used as the similarity measurement of two descriptors to locate the nearest matching keypoint. However, the SIFT algorithm usually generates hundreds to thousands of keypoints for each image. Hence, the SIFT features could be numerous in a large-scale image database. Moreover, the distance computation involves calculating a square root. Thus, image matching with the SIFT method for a large-scale image database would be extremely time-consuming. To solve this problem, several binary SIFT (BSIFT) methods, which convert the SIFT descriptors to a binary representation, have been proposed in the last few years. The Hamming distance is used to measure the distance between two BSIFT descriptors, which takes advantage of bit-wise operations instead of the square-root computation and leads to a significant decrease in feature matching time. Ni [14] first proposed a binary string approach for SIFT keypoints. This method exploited the Hamming distance to measure the similarity of two BSIFT vectors. Chen et al. [15] proposed comparing the absolute difference between two adjacent values in a descriptor with a threshold, and then representing the comparison result with a binary digit (zero or one), which generated a 128-bit BSIFT descriptor string. This approach was simple and drastically decreased the matching time; however, the matching accuracy rate also decreased. Zhou et al. [16] compared the 128 values of the SIFT descriptor individually with two threshold values. The comparison results were represented by three combinations: 11, 10, and 00. Correspondingly, a 256-bit BSIFT descriptor string was obtained. This approach improved matching accuracy to some extent; however, the matching time increased compared with the approach proposed by Chen et al.
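To illustrate why binary descriptors are cheaper to compare, the following minimal Python sketch contrasts the Euclidean distance between two real-valued vectors with the Hamming distance between two bit strings stored as integers. The function names and the toy 8-bit values are illustrative assumptions and are not taken from any of the cited methods.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length real-valued vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def hamming_distance(x, y):
    """Hamming distance between two bit strings stored as Python ints.

    XOR marks the differing bit positions; counting the ones gives the distance.
    """
    return bin(x ^ y).count("1")


# Toy 8-bit descriptors (hypothetical values); they differ in two positions.
a = 0b10110010
b = 0b10011010
print(hamming_distance(a, b))
```

The Hamming distance needs only an XOR and a bit count, whereas the Euclidean distance needs a multiplication per dimension and a square root.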
The aforementioned BSIFT methods and their improved algorithms mostly ignore the problem of mirror reflection, which results in a significant increase in the mismatch rate for a mirror image pair. Guo [17] presented a mirror reflection invariant descriptor (MIFT), which was inspired by SIFT. However, the matching time of MIFT is comparable to that of SIFT.
To address these problems, this paper presents a new horizontal or vertical mirror reflection invariant binary descriptor named MBR-SIFT, in addition to a novel image matching approach. MBR-SIFT not only binarizes the SIFT descriptor, but also takes into consideration the problem of mirror reflection. First, 16 cells in the local region around the SIFT keypoint are reorganized, and then the 128-D vector of the SIFT descriptor is transformed into a reconstructed vector called R-SIFT according to eight directions. Finally, MBR-SIFT is obtained after R-SIFT binarization and reverse coding. To improve the matching speed and accuracy, we propose a fast matching algorithm that includes a coarse-to-fine two-step matching strategy and two types of similarity measure for the MBR-SIFT descriptor. To examine the effectiveness of the proposed MBR-SIFT descriptor, we also compare it with other local descriptors.

Related work
The SIFT algorithm extracts image features by searching for keypoints in the image, and then calculates the descriptors from the local region around the keypoints. As shown in Fig 1a, the local region is first divided into 16 cells with eight directions in each cell, and each direction is given a value. Finally, the 128-D SIFT descriptor, as shown in Fig 1d, is obtained. SIFT binarization transforms the 128-D descriptor $(d_0, d_1, \ldots, d_{127})$ into a binary string. The commonly used binarization approaches can be classified into two categories. The first category compares the differential value $\Delta d_i$ of two adjacent values in the descriptor with a predefined threshold $M$; the comparison result $b_i$ is zero or one, which is denoted by a single bit [15]. The second category directly compares each $d_i$ of the 128-D descriptor $(d_0, d_1, \ldots, d_{127})$ with two thresholds, $M_1$ and $M_2$; the comparison result is 11, 10, or 00, which is denoted by two bits [16]. In essence, the first category converts the original 128 decimal values to a 128-bit binary value, which decreases the memory requirements and reduces the matching time. The drawback of this type of approach is that it weakens the discriminative power of the SIFT descriptors. In terms of discriminative power, the second category is better than the first; however, its matching speed is slower.
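The two binarization categories can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the exact formulation of [15] or [16]: the pairing of each value with its successor (with wrap-around), the direction of the comparisons, and the assignment of the codes 00, 10, and 11 to the three intervals are assumptions, as are the function names.

```python
import numpy as np


def binarize_one_bit(d, m):
    """First category: one bit per value from adjacent absolute differences.

    Assumption: value i is paired with value (i + 1) mod len(d), and the bit
    is 1 when the absolute difference exceeds the threshold m.
    """
    d = np.asarray(d, dtype=float)
    diff = np.abs(np.roll(d, -1) - d)
    return (diff > m).astype(np.uint8)


def binarize_two_bits(d, m1, m2):
    """Second category: two bits per value from two thresholds m1 < m2.

    Assumption: 00 for d <= m1, 10 for m1 < d <= m2, and 11 for d > m2.
    """
    d = np.asarray(d, dtype=float)
    first = (d > m1).astype(np.uint8)   # first bit of each two-bit code
    second = (d > m2).astype(np.uint8)  # second bit of each two-bit code
    return np.stack([first, second], axis=1).reshape(-1)


# Toy 4-value descriptor (hypothetical values).
toy = [0, 10, 50, 200]
print(binarize_one_bit(toy, m=20))
print(binarize_two_bits(toy, m1=5, m2=100))
```

For a 128-D descriptor, the first function yields 128 bits and the second yields 256 bits, which matches the descriptor lengths discussed above.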
The number of mismatched pairs significantly increases for a mirror image pair, whether the matched features are SIFT descriptors or BSIFT descriptors. As shown in Fig 1, the difference between the SIFT descriptor of the original local region and that of its horizontally reflected counterpart is large, which implies that SIFT is not horizontal mirror reflection invariant. Once the local region around the keypoint is vertically reflected, as shown in Fig 1f, the corresponding 128-D SIFT descriptor is the same as in the scenario with the horizontal mirror reflection. Thus, SIFT is also not vertical mirror reflection invariant. Similarly, the binary SIFT descriptor is not horizontal or vertical mirror reflection invariant.

Our approach
An intuitive way to make a BSIFT descriptor mirror reflection invariant is to artificially reflect one image of the pair to be matched and perform image matching once again. This approach is simple; however, the matching time increases because the SIFT algorithm and binarization operation must be executed again. If the BSIFT descriptor of the mirror reflected image can instead be obtained by a simple operation on the original BSIFT descriptor, computational time is saved. The proposed binarization method, MBR-SIFT, is based on this idea.

SIFT descriptor reconstruction
By analyzing the structure of the SIFT descriptor, we found that the connection between the BSIFT descriptors before and after mirror reflection can be built by reconstructing the SIFT descriptor.
As shown in Fig 2a, the second and fourth columns of the 16 cells are reorganized in the reverse of their original order in Fig 1a, and correspondingly, in Fig 2b, the SIFT descriptor is reconstructed, in which the order of the 16 cells is "1,2,3,4,8,7,6,5,9,10,11,12,16,15,14,13." Similarly, in Fig 2c and 2d, the order of the 16 cells in the horizontal or vertical mirror image is reconstructed as "13,14,15,16,12,11,10,9,5,6,7,8,4,3,2,1." It can be observed that the orders of the 16 cells for the image and its mirror image are exactly the reverse of each other. Additionally, each cell consists of eight oriented gradients, that is, "$A_1 B_1 C_1 D_1 E_1 F_1 G_1 H_1$" for the first cell in

R-SIFT binarization
We denote the R-SIFT descriptor by the 128-D vector $(D_0, D_1, \ldots, D_{127})$ and the differential value by $\Delta D_i$ ($i = 0, 1, \ldots, 127$), given by Eq (4). The modulo operation in Eq (4) ensures that $\Delta D_i$ is the difference between adjacent values in the same direction. Two binarization methods for $\Delta D_i$ are proposed to work with the fast matching algorithm described later. The first compares $\Delta D_i$ with zero; the comparison result is zero or one, denoted by a single bit. Thus, the 128-D R-SIFT descriptor is transformed into a 128-bit binary string denoted as BR-SIFT1. The second compares $\Delta D_i$ with a threshold ($\pm T$); the comparison result is 00, 01, 10, or 11, represented by two bits. Thus, the 128-D R-SIFT descriptor is transformed into a 256-bit binary string denoted as BR-SIFT2. Here, $T$ is a positive value defined in terms of $\bar{D}$ and $\sigma$, the mean and standard deviation of the 128-D R-SIFT descriptor $(D_0, D_1, \ldots, D_{127})$, respectively, and two constants $a$ and $b$. Through numerous experiments, the optimal $a$ and $b$ values were determined to be 2.3 and 0, respectively.
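The two binarizations can be sketched in Python as follows. This is a minimal illustration under several assumptions that the text does not fix: the R-SIFT vector is assumed to be stored direction by direction (eight blocks of 16 cells), the differential value is assumed to be the next value minus the current value with wrap-around inside each block, the bit is assumed to be 1 for a positive difference, and the two-bit codes are assumed to be assigned in increasing order of the difference. The threshold is passed in as a plain parameter `t` because its defining equation is not reproduced here; all function names are hypothetical.

```python
import numpy as np

N_DIRECTIONS, N_CELLS = 8, 16


def differential(r_sift):
    """Adjacent differences within each direction, with wrap-around.

    Assumption: r_sift holds 8 direction blocks of 16 cell values each, and
    the difference is (next value) - (current value) inside a block.
    """
    blocks = np.asarray(r_sift, dtype=float).reshape(N_DIRECTIONS, N_CELLS)
    return (np.roll(blocks, -1, axis=1) - blocks).reshape(-1)


def br_sift1(r_sift):
    """One bit per differential value: 1 if positive, else 0 (assumed)."""
    return (differential(r_sift) > 0).astype(np.uint8)


def br_sift2(r_sift, t):
    """Two bits per differential value using the thresholds -t, 0, and +t.

    Assumed codes: 00 (below -t), 01 (-t up to 0), 10 (0 up to t), 11 (t and
    above), so that differences of opposite sign get complementary codes.
    Values lying exactly on a threshold are not treated symmetrically here.
    """
    level = np.digitize(differential(r_sift), [-t, 0.0, t])  # levels 0..3
    first = (level >> 1).astype(np.uint8)
    second = (level & 1).astype(np.uint8)
    return np.stack([first, second], axis=1).reshape(-1)


# Toy R-SIFT vector (hypothetical values): a simple ramp.
ramp = np.arange(128, dtype=float)
print(br_sift1(ramp)[:16])
print(br_sift2(ramp, t=5.0)[:4])
```

With this layout, the sixteenth bit of every block comes from the wrap-around pair, which is the bit that needs special treatment when the cell order is reversed.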

Reverse coding
After constructing the BR-SIFT1 and BR-SIFT2 descriptors for the original image, we illustrate how to construct the corresponding descriptors for its horizontal or vertical mirror image, denoted as MBR-SIFT1 and MBR-SIFT2, respectively. From Table 1, it can be observed that the R-SIFT descriptors of the original image and its mirror image have a reversal relation. Thus, after the differential operation, with the exception that $b^1_{15}$ is changed into $\sim b^1_{15}$, where $\sim$ represents the NOT operator, the remaining bits in direction A of the BR-SIFT1 and MBR-SIFT1 descriptors are a mirror of each other. For the first 15 bits in direction A, the MBR-SIFT1 descriptor can be recovered by scanning the BR-SIFT1 descriptor in reverse order and then performing the bitwise NOT operation. Additionally, the order of the eight directions in the BR-SIFT1 descriptor for the original image is "ABCDEFGH," whereas the order of the eight directions in the MBR-SIFT1 descriptor for its mirror image is "AHGFEDCB"; therefore, we must exchange the 16 binary values of directions B and H, directions C and G, and directions D and F in the BR-SIFT1 descriptor. Under this scheme, the MBR-SIFT1 descriptor is constructed from the BR-SIFT1 descriptor. Similarly, MBR-SIFT2 can be constructed from BR-SIFT2; the difference is that the BR-SIFT2 descriptor is 256 bits long. In Eq (6), it can be observed that if two differential values are symmetric with respect to the axis, their encoded values are also inverse to each other, that is, (0,0) and (1,1), and (0,1) and (1,0). Therefore, in the bitwise NOT operation, (0,0) and (1,1), and (0,1) and (1,0) are exchanged.
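The reverse coding of the 128-bit descriptor can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation, and it relies on assumptions: the descriptor is assumed to be stored as eight direction blocks (A to H) of 16 bits, each bit is assumed to be 1 when the next value minus the current value is positive (with wrap-around in each block), and the mirror image is modeled by reversing the cell order and reordering the directions to "AHGFEDCB". The helper names are hypothetical.

```python
import numpy as np

# Direction blocks A..H of the original appear as A, H, G, F, E, D, C, B
# in the mirror image.
MIRROR_DIRECTIONS = [0, 7, 6, 5, 4, 3, 2, 1]


def br_sift1(r_sift):
    """Assumed BR-SIFT1: sign bit of (next - current) inside each block."""
    blocks = np.asarray(r_sift, dtype=float).reshape(8, 16)
    return (np.roll(blocks, -1, axis=1) - blocks > 0).astype(np.uint8)


def mirror_r_sift(r_sift):
    """Model of the mirror image: reversed cells, reordered directions."""
    blocks = np.asarray(r_sift, dtype=float).reshape(8, 16)
    return blocks[MIRROR_DIRECTIONS, ::-1]


def reverse_code1(bits):
    """Build MBR-SIFT1 from BR-SIFT1 without recomputing the descriptor."""
    bits = np.asarray(bits, dtype=np.uint8).reshape(8, 16)
    out = np.empty_like(bits)
    out[:, :15] = bits[:, 14::-1] ^ 1  # first 15 bits: reverse, then NOT
    out[:, 15] = bits[:, 15] ^ 1       # wrap-around bit: NOT in place
    return out[MIRROR_DIRECTIONS]      # exchange B/H, C/G, and D/F blocks


# Random toy descriptor (hypothetical values, no equal neighbors expected).
rng = np.random.default_rng(0)
r = rng.random(128)
print(np.array_equal(reverse_code1(br_sift1(r)), br_sift1(mirror_r_sift(r))))
```

Under these assumptions, reverse coding gives the same bits as binarizing the mirrored descriptor directly, provided no differential value is exactly zero, and applying it twice returns the original bits.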
We compare the time complexity of the proposed MBR-SIFT method with that of the original BSIFT method. Both methods obtain 128-bit or 256-bit binary descriptors after the differential operation. Unlike BSIFT, to improve matching efficiency and accuracy, the MBR-SIFT method obtains two types of binary descriptors, BR-SIFT1 and BR-SIFT2, from the differential values. Instead of performing the SIFT algorithm on the mirror reflected image, the MBR-SIFT1 and MBR-SIFT2 binary descriptors for the mirror reflected image are directly constructed from BR-SIFT1 and BR-SIFT2, respectively, by reverse coding, which significantly reduces the computational time. The proposed binarization algorithm includes bitwise operations, such as exchanging binary digits. MBR-SIFT is faster than SIFT and slightly slower than BSIFT.

Two-step matching
To balance matching accuracy and computational efficiency, we present a coarse-to-fine two-step matching strategy. Coarse matching is performed based on the BR-SIFT1 and MBR-SIFT1 descriptors. The Hamming distance is used to measure the similarity between descriptors. The keypoints that correspond to the smallest distances are selected as the candidate keypoints for further use. Then, in the refining stage, the matching pair is selected from the candidate keypoints using the improved Hamming distance as the similarity measurement between the BR-SIFT2 and MBR-SIFT2 descriptors.

Coarse matching
Suppose $I_1$ and $I_2$ are an image pair to be matched. For example, let $B^1_1$ denote the BR-SIFT1 descriptor of keypoint $a_1$ in $I_1$, and let $B^1_2$ and $M^1_2$ denote the BR-SIFT1 and MBR-SIFT1 descriptors, respectively, of keypoint $a_2$ in $I_2$. We then calculate the Hamming distance between $B^1_1$ and $B^1_2$, and between $B^1_1$ and $M^1_2$, and take the smaller value as the distance between keypoints $a_1$ and $a_2$. In coarse matching, the BR-SIFT1 descriptor of each keypoint in $I_1$ is compared with both the BR-SIFT1 and MBR-SIFT1 descriptors of each keypoint in $I_2$. Finally, the first $n$ keypoints with the smallest distances are selected as candidate keypoints, where $n$ is determined by Eq (10), $d_{min}$ and $d'_{min}$ represent the smallest distance and second smallest distance, respectively, and $ratio \in [0, 1]$ is a predefined threshold value. If ratio is equal to one, $n$ has a higher probability of being two, and too few candidate keypoints affect the matching accuracy. By contrast, if ratio is equal to zero, $n$ has a higher probability of being five, and too many candidate keypoints decrease the matching speed. Considering these two aspects of the matching problem, ratio is set to 0.5.
[Table residue, start of list truncated: differential values in direction A used to form the original BR-SIFT1 descriptor: $\ldots - A_2$, $A_4 - A_3$, $A_8 - A_4$, $A_7 - A_8$, $A_6 - A_7$, $A_5 - A_6$, $A_9 - A_5$, $A_{10} - A_9$, $A_{11} - A_{10}$, $A_{12} - A_{11}$, $A_{16} - A_{12}$, $A_{15} - A_{16}$, $A_{14} - A_{15}$, $A_{13} - A_{14}$, $A_1 - A_{13}$.]
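The coarse matching step can be sketched in Python as follows. This is a minimal illustration under assumptions: descriptors are held as arrays of 0/1 values, and the rule that chooses $n$ is a guess that is merely consistent with the behavior described for ratio (two candidates when the smallest distance is clearly below the second smallest, five otherwise); the exact form of Eq (10) is not reproduced here. The function names are hypothetical.

```python
import numpy as np


def hamming(a, b):
    """Hamming distance between two 0/1 arrays of equal length."""
    return int(np.count_nonzero(np.asarray(a) != np.asarray(b)))


def coarse_candidates(b1, br2_list, mbr2_list, ratio=0.5):
    """Candidate keypoints in I2 for one keypoint in I1.

    b1        : BR-SIFT1 descriptor of the keypoint in I1
    br2_list  : BR-SIFT1 descriptors of the keypoints in I2
    mbr2_list : MBR-SIFT1 descriptors of the same keypoints in I2
    Requires at least two keypoints in I2.
    """
    # Distance to each keypoint: the smaller of the two Hamming distances.
    dists = np.array([min(hamming(b1, b2), hamming(b1, m2))
                      for b2, m2 in zip(br2_list, mbr2_list)])
    order = np.argsort(dists, kind="stable")
    d_min, d_second = dists[order[0]], dists[order[1]]
    # Assumed selection rule: a clear winner keeps 2 candidates, else 5.
    n = 2 if d_min <= ratio * d_second else 5
    return order[:n].tolist()
```

Taking the smaller of the two distances means a keypoint in $I_2$ is retained whether it matches directly or through its mirror descriptor, so the fine step only has to rank a handful of candidates.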

Fine matching
We redefine the similarity measurement between two keypoints based on the Hamming distance. As shown in Algorithm 1, the new similarity measurement is designed for 256-bit BR-SIFT2 or MBR-SIFT2 descriptors. For these 256-bit binary descriptors, the Hamming distance is calculated once per four bits. If the result is zero, counter counter1 or counter2 is increased by one, which can further improve matching accuracy.
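One possible reading of this similarity measurement is sketched below in Python. Because Algorithm 1 is not reproduced here, the details are assumptions: the 256 bits are split into 64 consecutive groups of four bits, a group counts only when its Hamming distance is zero, counter1 is taken against the BR-SIFT2 descriptor of the candidate and counter2 against its MBR-SIFT2 descriptor, and the larger counter is used as the similarity. The function names are hypothetical.

```python
import numpy as np


def matching_groups(a, b, group=4):
    """Number of 4-bit groups in which two bit arrays agree exactly."""
    a = np.asarray(a, dtype=np.uint8).reshape(-1, group)
    b = np.asarray(b, dtype=np.uint8).reshape(-1, group)
    group_hamming = np.count_nonzero(a != b, axis=1)
    return int(np.count_nonzero(group_hamming == 0))


def fine_similarity(b1, b2, m2):
    """Assumed similarity between a keypoint in I1 and a candidate in I2."""
    counter1 = matching_groups(b1, b2)  # against BR-SIFT2 of the candidate
    counter2 = matching_groups(b1, m2)  # against MBR-SIFT2 of the candidate
    return max(counter1, counter2)


# Toy 256-bit descriptors (hypothetical values).
query = np.zeros(256, dtype=np.uint8)
candidate = np.zeros(256, dtype=np.uint8)
candidate[[0, 5, 6]] = 1                      # differences fall in two groups
mirrored = np.ones(256, dtype=np.uint8)
print(fine_similarity(query, candidate, mirrored))
```

Under this reading, a group with any differing bit contributes nothing, which makes the measure stricter than the plain Hamming distance: the toy pair above differs in only three bits but loses two whole groups.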

Experiment
We evaluated the proposed approach, MBR-SIFT, on a public dataset, the UKBench dataset [18], which contains 10,200 images from 2,550 object/scene groups. Each group consists of four images taken from different views or in different imaging conditions.
To demonstrate the effectiveness of the improved similarity measurement, we also implemented another version called MBR-SIFT', which used the Hamming distance instead of the improved Hamming distance for fine matching. Moreover, we implemented the original SIFT, and Chen's [15] and Zhou's [16] methods to compare accuracy and efficiency. Several local binary features, such as CS-LBP [19], BRIEF [20], BRISK [21], and FREAK [22], have recently been proposed with promising performance in image matching, and we also implemented them to compare their potential with that of BSIFT in image matching. All of these methods used Eq (11) to determine whether the keypoint pairs were matched.
The results are presented with recall versus 1-precision [11,12], given by
$$\text{recall} = \frac{tn}{en} \qquad (12)$$
$$1 - \text{precision} = \frac{fn}{qn}$$
where $tn$ and $en$ represent the number of correct matches and ground truth number of matches between the images, respectively, and $fn$ and $qn$ represent the number of false matches and total number of matches between the images, respectively. To evaluate the performance of an image matching method, we need to find as many matching pairs as possible with high accuracy [23]; that is, for the same 1-precision, the method with the higher recall performs better.
In Fig 3a-3d, the first row shows the matching results of the first four methods and the second row shows the matching results of the latter four methods; the blue lines and thick red lines represent the correct matches and false matches, respectively. Table 2 shows the matching results over the mirror reflection. It can be observed from Table 2 that the performance of MBR-SIFT is superior to that of the other methods in terms of accuracy and recall. Additionally, the image pairs of Fig 3a and 3b were generated by artificial reflection. By contrast, the image pairs of Fig 3c and 3d originated from a mirror image, which implies lower similarity. Therefore, for MBR-SIFT, the recall of Fig 3a and 3b is much higher than that of Fig 3c and 3d. For CS-LBP and BRIEF, the accuracy and recall for Fig 3a-3d are zero, which is the worst performance. The main reason is that neither CS-LBP nor BRIEF is robust to rotation and scaling.
[...] This is mainly because the improved distance measurement was used, which led to a stricter matching criterion than that of the other methods, and thus achieved higher matching accuracy. Despite this, recall was slightly decreased compared with other methods. However, matching accuracy is more important than recall in image matching.

Matching accuracy and efficiency
Moreover, we compared CS-LBP, BRIEF, BRISK, and FREAK with SIFT and its variants. As shown in Fig 5, all four methods, CS-LBP, BRIEF, BRISK, and FREAK, performed much worse than SIFT and its variants in accuracy and recall. This is mainly because their discriminative power is not as good as that of SIFT and its variant feature BSIFT.
As shown in Table 3, the efficiency of the SIFT method was the lowest among the nine methods. Regarding MBR-SIFT, its computational time was approximately the same as that of MBR-SIFT', slightly higher than that of Chen's method, and lower than that of Zhou's method. This is because Chen's and Zhou's methods generated 128-bit and 256-bit binary descriptors, respectively, whereas MBR-SIFT and MBR-SIFT' generated binary descriptors that included both 128 bits and 256 bits. The efficiency of the other four methods, CS-LBP, BRIEF, BRISK, and FREAK, was higher than that of SIFT and its variants. This is because the number of features extracted by the four methods was significantly lower than that of SIFT and its variants.

Parameter analysis
Next, we conducted experiments on 200 image pairs to investigate the impact of the number of candidate keypoints n on efficiency and accuracy. For convenience, we ignored the criterion in Eq (10) and set n in the range (0,40). In terms of efficiency, as shown in Fig 6a, the matching time increased as n increased. It can be observed from Fig 6b that the accuracy first increased as n increased to four, and then remained stable as n continued to increase. This implies that the candidate keypoints obtained through coarse matching contained the correct matching pairs when n was greater than a certain value. Considering both matching time and accuracy, the maximum of n was set to five.
To improve matching efficiency, n is equal to either two or five in Eq (10). The value of n is determined by ratio. Fig 7a and 7b show the matching time and accuracy for different ratios, respectively. It can be seen from Fig 7a and 7b that both the matching time and accuracy decreased as the ratio increased, that is, the smaller the value of ratio, the longer the matching time and the higher the matching accuracy, and vice versa. Ratio is in the interval [0,1]. When ratio was set to zero, the value of n was five. When ratio was set to one, the value of n was two. This implies that the value of ratio only affected the matching time and accuracy for n between two and five. Therefore, in considering both the matching time and accuracy, ratio was set to 0.5.

Conclusion
In this paper, we presented a binary SIFT descriptor (MBR-SIFT), which was achieved by reconstructing the SIFT descriptor. The MBR-SIFT descriptor is invariant to mirror reflection while being robust to rotation, scaling, viewpoint, lighting, and blur changes. We also presented a coarse-to-fine two-step matching strategy and a novel similarity measure to further improve the performance of image matching. The experimental results show that the proposed method can achieve higher matching accuracy, whereas recall is slightly lower. In future research, we will consider how to ensure both high accuracy and recall.