A novel image registration approach via combining local features and geometric invariants

Image registration is widely used in many fields, but the adaptability of the existing methods is limited. This work proposes a novel image registration method with high precision for various complex applications. In this framework, the registration problem is divided into two stages. First, we detect and describe scale-invariant feature points using a modified oriented FAST and rotated BRIEF (ORB) algorithm, and a simple method to increase the performance of feature point matching is proposed. Second, we develop a new local constraint for rough selection according to the feature distances. Evidence shows that the existing matching techniques based on image features are insufficient for images with sparse details. We therefore propose a novel matching algorithm via geometric constraints and establish local feature descriptions based on geometric invariances for the selected feature points. Subsequently, a new cost function is constructed to evaluate the similarities between points and obtain exact matching pairs. Finally, we employ the progressive sample consensus method to remove wrong matches and calculate the spatial transformation parameters. Experimental results on various complex image datasets verify that the proposed method is more robust and significantly reduces the rate of false matches while retaining more high-quality feature points.


Introduction
Image registration aims to obtain the transformation parameters between images taken at different times, from different angles, or by different sensors, which may differ by translation, rotation, scaling and/or distortion, so as to achieve the best match at the pixel level [1]. Image registration is a fundamental issue for many computer vision technologies, such as image restoration, targeting, tracking, image stitching, image fusion, 3D reconstruction, and pattern recognition [2,3]. It is an important preliminary step for improving the accuracy and validity of these tasks.
Image registration techniques play an increasingly important role in various fields. For military applications, to improve precision strike capabilities, various images from different sensors, such as infrared, radar, and hyperspectral imaging, require high-precision registration [4,5]. The requirements are similar for civil use equipment for security monitoring, traffic [...] proposed the scale-invariant feature transform (SIFT) algorithm [14]. The SIFT algorithm provided a huge improvement in accuracy for corner detection. It is robust to light changes, noise, and affine transformation and can achieve sub-pixel accuracy [15]. Bay et al. [16] improved SIFT's efficiency and proposed the speeded-up robust features (SURF) algorithm, which uses Hessian matrices and distributed descriptors. SURF allows multiple images in a scale space to be processed simultaneously and does not require image subsampling. Therefore, it can effectively reduce descriptor dimensionality and significantly improve calculation speed while guaranteeing accuracy.
To apply feature point detection in real time, Rosten and Drummond [17] proposed the features from accelerated segment test (FAST) algorithm. FAST compares surrounding pixels to obtain key points using machine learning. Therefore, it is simple, effective and easily ported to embedded systems [18]. Calonder et al. [19] proposed the BRIEF descriptor after comparing PCA, LDA and other feature dimensionality reduction methods. It reduces the time needed to generate feature descriptors by calculating and matching binary strings. Subsequently, FAST and BRIEF were redesigned by Rublee et al., who proposed the ORB algorithm at the 2011 IEEE International Conference on Computer Vision, which provided significant advantages in performance and speed [20].
In recent years, registration techniques based on image features have found many applications. Zhang et al. [21] registered medical images by establishing a key feature model to describe the features and matching the corresponding points via a geometric constraint. Because the traditional methods were insufficient to achieve adequate results under different image deformations, Kahaki et al. [22] proposed an invariant feature matching method that overcomes these limitations by measuring the dissimilarity of the features through a path based on eigenvector properties. They then achieved the registration of high-resolution IKONOS satellite images. Li et al. [23] proposed an approach to robustly build key point mappings on multispectral images, in which a similarity transformation was considered to account for the misalignment between two images. Lee et al. [24] applied the SIFT algorithm to stitch cervical-thoracic-lumbar (C-T-L) spine magnetic resonance (MR) images, and the results indicated that it can improve diagnostic capabilities.
The local feature detection and matching methods described above have good real-time performance, noise immunity, robustness, and other positive characteristics, but image registration remains a challenging research topic [25]. Registration accuracy, reliability and computational time are three important characteristics that constrain universal registration methods in different circumstances. The traditional methods, such as SIFT or SURF, have excellent performance and high precision. However, when the images lack texture features, the feature points are difficult to extract and describe, and the matching result will be similar to Fig 1A. For images with rich textures, although it is possible to extract a large number of high-quality feature points, key point selection and precise matching still have many problems, particularly for images that are captured at different times, phases, or using different sensors, such as medical images. These images often have relative distortion, deformation, and/or uneven illumination. Traditional matching methods based on global or local features are limited by key point quantity and quality, which makes it difficult to guarantee precise results. In addition, it is difficult to describe the invariance of an image with many similar feature points by traditional methods. Therefore, we propose solving these problems by combining the modified local features and geometric invariants.

Feature detection and description
Although many modified registration methods based on image features can theoretically enhance computational efficiency, ORB always performs better in complex scenarios [6]. It is fast, effective and accurate. Therefore, during feature detection and description, we improve the ORB algorithm to extract higher quality feature points to satisfy more complex applications.
In this step, we use the improved FAST-9 algorithm to detect features. A feature discriminant response function $T$ is defined in terms of $G(p)$ and $G(i)$, the grayscale values at a candidate point $p$ and at its neighboring points $i$, and a threshold value $\xi$. Here, we set the threshold to 40. When comparing the 16 neighboring points on the circular boundary of the neighborhood, $p$ is judged to be a feature point if there are 9 consecutive points whose grayscale differences from $p$ are larger than $\xi$. Then, the corner response function proposed by Harris is used to select from the identified feature points. Next, the directions of the feature points need to be calculated. For any feature point, the moments $M_{pq}$ of the neighborhood pixels and the centroid $C$ of these moments can be expressed as
\[ M_{pq} = \sum_{x,y} x^{p} y^{q}\, G(x,y), \qquad C = \left( \frac{M_{10}}{M_{00}}, \frac{M_{01}}{M_{00}} \right), \]
where $x, y$ are pixel positions in the neighborhood of the feature point and the centroid is calculated from $M_{pq}$. The angle between the feature point and the centroid is set as the dominant orientation, expressed as $\alpha = \operatorname{atan2}(M_{01}, M_{10})$, where atan2 is the quadrant-aware version of arctan. These feature points provide directional invariance but are not scale invariant. This will be improved later.
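The intensity-centroid orientation described above can be illustrated with a short sketch. It is a minimal example rather than the authors' implementation: the function name `patch_orientation` is hypothetical, the neighborhood is assumed to be a small square patch centered on the feature point (ORB itself uses a circular patch), and coordinates are measured from the patch center.

```python
import math

def patch_orientation(patch):
    """Return the intensity-centroid angle (radians) of a square patch.

    `patch` is a list of rows of grayscale values. Coordinates are taken
    relative to the patch center, so the feature point sits at (0, 0).
    Assumption: a square patch stands in for the circular neighborhood.
    """
    half = (len(patch) - 1) / 2.0
    m10 = m01 = 0.0
    for row_idx, row in enumerate(patch):
        y = row_idx - half
        for col_idx, value in enumerate(row):
            x = col_idx - half
            m10 += x * value   # first-order moment M_10
            m01 += y * value   # first-order moment M_01
    return math.atan2(m01, m10)

# A patch that is brighter on its right-hand side pulls the centroid
# toward +x; one that is brighter at the bottom pulls it toward +y.
bright_right = [[0, 0, 10],
                [0, 0, 10],
                [0, 0, 10]]
bright_bottom = [[0, 0, 0],
                 [0, 0, 0],
                 [10, 10, 10]]
angle_right = patch_orientation(bright_right)
angle_bottom = patch_orientation(bright_bottom)
```

Because only the ratio of the two first-order moments enters atan2, the zeroth-order moment is not needed to obtain the angle.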
After detecting the feature points, the improved BRIEF descriptor is used to describe these features. First, point pairs are randomly generated in image patches. Let $(p(m), p(n))$ be the grayscale values for a point pair in patch $p$, where each point pair corresponds to a binary test $\lambda$ that compares the two grayscale values and outputs one bit.
Then, $k$ point pairs are randomly selected to generate a binary string, and the feature descriptor $D_k$ is formed from the $k$ test results. These descriptors are based on individual pixel values and are easily affected by noise. Therefore, a neighboring sub-window is defined around each test point, and the pixel value is replaced by the integrated gray level of the sub-window. To ensure that the descriptor has rotational invariance, the $n$ test pairs chosen at points $(x_i, y_i)$ are arranged into a matrix $S$, where $(x_i, y_i)$ are the coordinates of the points.
Then, using the dominant orientation obtained in the feature detection stage and the corresponding affine (rotation) transformation matrix, a new feature description matrix $S'$ and descriptor $D'$ are calculated by rotating $S$. Finally, using greedy search, the 256 pixel pairs with minimum correlation are found to describe the features.
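The steered binary test can be sketched as follows. This is an illustrative example under stated assumptions, not the paper's implementation: the name `steered_brief` is hypothetical, the test is assumed to set a bit when the first sample is darker than the second (the usual BRIEF convention), single pixels are sampled instead of integrated sub-windows, and the point pairs are given by hand rather than learned by greedy search.

```python
import math

def steered_brief(image, keypoint, pairs, angle):
    """Binary descriptor from intensity tests on rotated point pairs.

    image    : list of rows of grayscale values
    keypoint : (x, y) integer position of the feature point
    pairs    : list of ((x1, y1), (x2, y2)) offsets relative to the key point
    angle    : dominant orientation in radians
    """
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    kx, ky = keypoint

    def sample(offset):
        ox, oy = offset
        # Rotate the offset by the dominant orientation (S' = R * S).
        rx = int(round(cos_a * ox - sin_a * oy))
        ry = int(round(sin_a * ox + cos_a * oy))
        return image[ky + ry][kx + rx]

    descriptor = 0
    for bit, (first, second) in enumerate(pairs):
        if sample(first) < sample(second):   # assumed form of the binary test
            descriptor |= 1 << bit
    return descriptor

# image_a brightens along +x; image_b is the same pattern rotated by 90 degrees.
image_a = [[x for x in range(5)] for _ in range(5)]
image_b = [[y] * 5 for y in range(5)]
pairs = [((-1, 0), (1, 0)), ((0, -1), (0, 1)), ((1, 1), (-1, -1))]

desc_a = steered_brief(image_a, (2, 2), pairs, 0.0)
desc_b = steered_brief(image_b, (2, 2), pairs, math.pi / 2)
desc_b_unsteered = steered_brief(image_b, (2, 2), pairs, 0.0)
```

With steering, the rotated image yields the same bits as the original, whereas ignoring the orientation changes the descriptor.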
To achieve scale invariance, traditional methods apply multiscale partitioning before feature detection and then perform feature detection and extraction separately at each scale. However, this often results in a considerable number of mismatched features between low- and high-resolution images, which reduces the final matching rate. The SIFT algorithm suffers from the same problem. Bastanlar et al. [26] demonstrated this issue and proposed a preprocessing SIFT (PP-SIFT) solution.
In this work, we propose an optimized method to reduce mismatches. The following processing steps are added to detect and match features of multiscale images.
Step 1: For images at a high resolution, apply a Gaussian low-pass filter and downsample both horizontally and vertically.
Step 2: Apply ORB matching to images and plot the histogram of scale ratios.
Step 3: Form the histogram of scale differences and define a window $[H_{max} - \omega, H_{max} + \omega]$ around the histogram peak $H_{max}$. The parameter $\omega$ is set between 0.20 and 0.35. Matches with scale differences outside this window are rejected.
Step 4: Extract the correct scale ratio (d) from the histogram as the mean of the most dominant Gaussian in the mixture.
Step 5: Accept only the matches with a scale ratio between 0.6d and 1.4d.
[...] [27,28]. The result shows that mismatches are significantly reduced, and a large number of correct matches remain after the optimization.
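Steps 2 to 5 can be sketched as a small filter over per-match scale ratios. The sketch makes several assumptions that are not in the text: the function name `filter_by_scale_ratio` and the `bin_width` parameter are hypothetical, the scale ratio of each match is assumed to be already available, and the mean of the ratios inside the window is used as a simple stand-in for the mean of the dominant Gaussian in a mixture fit.

```python
def filter_by_scale_ratio(ratios, bin_width=0.1, omega=0.25):
    """Return (d, kept_indices) for a list of per-match scale ratios."""
    # Steps 2-3: histogram the ratios and locate the peak bin.
    counts = {}
    for r in ratios:
        b = int(r // bin_width)
        counts[b] = counts.get(b, 0) + 1
    peak_bin = max(counts, key=counts.get)
    h_max = (peak_bin + 0.5) * bin_width      # center of the peak bin

    # Step 3: reject matches outside [h_max - omega, h_max + omega].
    in_window = [r for r in ratios if abs(r - h_max) <= omega]

    # Step 4 (simplified): the window mean stands in for the mean of the
    # dominant Gaussian component.
    d = sum(in_window) / len(in_window)

    # Step 5: accept only matches whose ratio lies in [0.6 d, 1.4 d].
    kept = [i for i, r in enumerate(ratios) if 0.6 * d <= r <= 1.4 * d]
    return d, kept

# Five matches agree on a scale ratio near 2; two are inconsistent.
ratios = [2.0, 2.1, 1.9, 2.05, 0.5, 4.0, 1.95]
d, kept = filter_by_scale_ratio(ratios)
```

In this toy input the two inconsistent matches (ratios 0.5 and 4.0) are rejected and the estimated ratio is close to 2.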

Feature selection and matching method
The feature descriptor mentioned above is a binary string, which reduces storage and increases matching speed. Traditional matching methods often use the brute force (BF) algorithm [29] to match the feature points, followed by the RANSAC algorithm to eliminate mismatches. This is effective for normal scenes [30]. However, for complex applications, such as the matching of medical images or remotely sensed images, a large number of interference points is frequently generated, which leads to a high mismatch rate [22]. Furthermore, when the feature points are sparse, it is difficult to guarantee accurate matching [31]. To address this challenge, we propose new constraints for rough selection according to the distribution of feature points. We establish a new feature description vector and matching criterion based on geometrical relationships and employ the progressive sample consensus (PROSAC) algorithm for accurate matching.

Constraint by feature distribution
Let $hd_1$ and $hd_2$ be the binary strings of the feature descriptors for two images constructed by the ORB algorithm, where $p$ and $q$ are the descriptors of the two images. The Hamming distance $D$ between image features is then obtained by applying the XOR operation to the descriptors and counting the bits that differ. In traditional methods, feature points with Hamming distances smaller than a previously set threshold $\varepsilon$ [32] are selected, where $D_j$ is the distance of the $j$th match point pair and $Match_j$ denotes the selected matches. This method is applicable to images with relatively even distributions of features. However, if the image includes an energy-focused region or a region with an intensive feature distribution, it is difficult to define an appropriate threshold to avoid mismatches between nearby feature points. Since the points below the threshold are very similar, the image contains many low-quality feature points, especially in areas with strong noise interference.
Fig 4 shows the distribution of feature points for two images. Fig 4A shows the detected feature points. In Fig 4B, the green circles mark the points selected when the threshold is $\varepsilon = 60$, and the red stars (*) mark the reliable matches. The traditional method filters out most of the low-quality feature points, but many correct matches are also removed. The threshold parameter is therefore unstable and unreliable as a selection standard. Fig 4C shows the fitted curves based on the probability statistics of the feature point distances. The curves have a large overlap area around the mean with a correspondingly high degree of overlap, and most of the reliable feature points are distributed in the overlap area. Therefore, a mean-centered constraint is used as the rough selection in this paper.
We define $\bar{D}$ as the mean of the root mean distance. The selection window $R$ is centered on $\bar{D}$ and defines the matching points to be retained, where $\varepsilon_1$ and $\varepsilon_2$ are the lower and upper limits of $R$, respectively, and can be modified according to the image feature distribution density. Here, we set $\varepsilon_1$ to the midpoint between the minimum distance and $\bar{D}$ and set $\varepsilon_2$ to the midpoint between the maximum distance and $\bar{D}$.
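The Hamming distance and the mean-centered selection window can be sketched in a few lines. The function names are hypothetical, descriptors are assumed to be stored as Python integers, and the arithmetic mean of the match distances is assumed for the center of the window, since the exact definition of the mean is not spelled out here.

```python
def hamming(a, b):
    """Hamming distance between two descriptors stored as Python ints."""
    return bin(a ^ b).count("1")       # XOR, then count the differing bits

def window_select(distances):
    """Indices of matches whose distance lies in the mean-centered window R."""
    mean = sum(distances) / len(distances)     # assumed form of the mean
    eps1 = (min(distances) + mean) / 2.0       # midpoint of minimum and mean
    eps2 = (max(distances) + mean) / 2.0       # midpoint of maximum and mean
    return [j for j, dist in enumerate(distances) if eps1 <= dist <= eps2]

example_distance = hamming(0b1011, 0b0010)
distances = [10, 40, 45, 50, 55, 100]
selected = window_select(distances)
```

For these toy distances the mean is 50, so the window spans 30 to 75 and only the two extreme matches are discarded.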
The distance constraint can remove significant errors and retain most of the reliable feature points, but it still needs further screening. KNN bilateral matching is employed in the following steps to select more reliable matching points.
Let $p_i$ be a key point in the current frame, and let $p_{j1}$ and $p_{j2}$ be its two nearest matches by Hamming distance in the corresponding reference frame. Their distances, $D(p_i, p_{j1})$ and $D(p_i, p_{j2})$, are optimal and sub-optimal, respectively. Similarly, a key point $p_j$ in the reference frame has two candidate matching points in the current frame, with distances $D(p_j, p_{i1})$ and $D(p_j, p_{i2})$.
The ratio of the optimal to the sub-optimal distance is used as the selection condition for the feature points of the two images. Two sets of better-quality key points can be obtained using
\[ \frac{D(p_i, p_{j1})}{D(p_i, p_{j2})} < t, \qquad \frac{D(p_j, p_{i1})}{D(p_j, p_{i2})} < t. \]
Here, we set $t$ to 0.65 according to the experiments. Finally, matches that simultaneously satisfy both conditions are taken as the respective optimal matches.
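A minimal sketch of the bilateral ratio test is given below. It assumes brute-force search over descriptors stored as Python integers, and the function names are hypothetical; a practical implementation would rely on an optimized matcher instead.

```python
def hamming(a, b):
    """Hamming distance between two descriptors stored as Python ints."""
    return bin(a ^ b).count("1")

def ratio_matches(query, train, t=0.65):
    """Map each query index to its nearest train index if the ratio test passes."""
    accepted = {}
    for i, q in enumerate(query):
        # Rank candidates by descriptor distance and take the two nearest.
        ranked = sorted(range(len(train)), key=lambda j: hamming(q, train[j]))
        if len(ranked) < 2:
            continue
        best, second = ranked[0], ranked[1]
        d1, d2 = hamming(q, train[best]), hamming(q, train[second])
        if d2 > 0 and d1 / d2 < t:
            accepted[i] = best
    return accepted

def bilateral_matches(desc_a, desc_b, t=0.65):
    """Keep only pairs that pass the ratio test in both directions."""
    forward = ratio_matches(desc_a, desc_b, t)
    backward = ratio_matches(desc_b, desc_a, t)
    return sorted((i, j) for i, j in forward.items() if backward.get(j) == i)

desc_a = [0b11110000, 0b00001111, 0b11001100]
desc_b = [0b00001110, 0b11110001, 0b00110011]
pairs = bilateral_matches(desc_a, desc_b)
```

In this example the third descriptor of the first image passes the forward test (ratio 3/5 = 0.6) but is rejected because its candidate prefers a different partner in the backward direction.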
After applying the distance constraint of the selection window and bilateral matching, many false matches are filtered without using RANSAC, as shown in Fig 5.

A matching method based on geometric invariants
After the rough selection, the remaining feature points are more robust with higher quality. However, many mismatches may still exist for complicated situations. The main reason is that not all of the points identified by the Hamming distance match are correct matching pairs, and key point distance is only one factor considered for matching. The method has some limitations, especially when lacking texture and details or when there are few feature points [33]. It is also difficult to exactly match image pairs that have large deformations.
Therefore, this paper considers geometric invariance as a reliable matching factor, and a new method based on geometric invariants is proposed to provide further selection and matching of feature points. Geometric invariance has often been used to describe the shape of objects, as in the shape context algorithm proposed by Belongie et al. [34]. That algorithm considers the object's shape and contour in the image and makes full use of contextual information for image sequences. It uses a log-polar histogram to describe the distribution of contour sample points, and it is widely used for digit recognition, trademark retrieval, and similar tasks. Here, we briefly introduce the principles of this method and then propose our method.
Let the set $P = \{p_1, p_2, \ldots, p_m\}$ of $m$ sampling points describe an object's shape. For each point, the log-polar histogram $H_i(m)$ of the other $m-1$ sampling points is calculated as its shape context descriptor, using the log-polar transformation (LPT) of the point coordinates into a log-radius $\rho$ and an angle $\alpha$. Then, $\rho$ is divided into five equal parts and $\alpha$ into twelve equal parts, forming $k$ sections. Every point has its own distribution relative to the others, so the number of sampling points in each sub-sector can be used as the similarity criterion. Accordingly, a matching cost function $F_{i,j}$ between feature points $p_i$ and $p_j$ is computed from their histograms. The feature point matching problem can then be converted into a matching problem on a weighted undirected bipartite graph. Finally, using the Hungarian algorithm, we can find the optimal match and the minimum cost value. Thus, the key points can be easily matched.
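The shape context descriptor can be sketched as follows. This is a simplified illustration, not the paper's formulation: the function names are hypothetical, the radial range is normalized per point between the nearest and farthest neighbor, the chi-square histogram distance commonly used with shape contexts is assumed as the cost, and the bipartite assignment step is omitted. All points are assumed to be distinct.

```python
import math

def shape_context(points, index, r_bins=5, a_bins=12):
    """Log-polar histogram of the other points relative to points[index]."""
    px, py = points[index]
    others = [(x - px, y - py) for k, (x, y) in enumerate(points) if k != index]
    log_r = [math.log(math.hypot(dx, dy)) for dx, dy in others]
    lo, hi = min(log_r), max(log_r)
    span = (hi - lo) or 1.0
    hist = [0] * (r_bins * a_bins)
    for (dx, dy), lr in zip(others, log_r):
        # Five equal parts in log-radius, twelve equal parts in angle.
        r = min(int((lr - lo) / span * r_bins), r_bins - 1)
        ang = math.atan2(dy, dx) % (2 * math.pi)
        a = min(int(ang / (2 * math.pi) * a_bins), a_bins - 1)
        hist[r * a_bins + a] += 1
    return hist

def chi2_cost(h_i, h_j):
    """Chi-square distance between two histograms (assumed cost function)."""
    total = 0.0
    for a, b in zip(h_i, h_j):
        if a + b > 0:
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total

shape = [(0, 0), (1, 0), (0, 2), (3, 3), (-2, 1)]
scaled = [(2 * x, 2 * y) for x, y in shape]
cost_same = chi2_cost(shape_context(shape, 0), shape_context(scaled, 0))
cost_diff = chi2_cost(shape_context(shape, 0), shape_context(shape, 1))
```

Doubling the size of the shape leaves the histogram of a point unchanged, so the cost between corresponding points is zero, while two different points of the same shape have a positive cost.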
The feature of shape context can be easily extracted. The image scale and rotation transformation in the Cartesian coordinate system can be converted into the translation in the log coordinate system using LPT. Therefore, it has good scale and rotational invariance.
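The invariance claimed above can be checked in a few lines. The derivation below is a sketch that assumes the usual log-polar coordinates about a reference point taken as the origin, a uniform scale factor $s > 0$, and a rotation angle $\varphi$; the symbols $s$ and $\varphi$ are introduced here for illustration only.

```latex
\begin{align*}
\rho &= \log\sqrt{x^{2} + y^{2}}, \qquad \alpha = \operatorname{atan2}(y, x), \\
\begin{pmatrix} x' \\ y' \end{pmatrix}
  &= s \begin{pmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{pmatrix}
     \begin{pmatrix} x \\ y \end{pmatrix}, \\
\rho' &= \log\bigl(s\sqrt{x^{2} + y^{2}}\bigr) = \rho + \log s, \\
\alpha' &= \alpha + \varphi \pmod{2\pi}.
\end{align*}
```

Scaling shifts every point by the same amount $\log s$ along the $\rho$ axis, and rotation shifts every point by $\varphi$ along the $\alpha$ axis, so both appear as translations in the log-polar domain.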
This algorithm has advantages for matching object shape, but the result may be affected by image noise and edge detection. Additionally, when objects are deformed, the matching accuracy and stability may be compromised [35]. The algorithm also requires that a point set must be the subset of a larger group, which is difficult to satisfy for image registration. Therefore, this paper utilizes the underlying theory of this method and proposes a new model to filter and match the feature points.
We define the matching point set obtained after coarse selection as $R = \{p_1, p_2, \ldots, p_n\}$. We calculate the distance from each key point $p_i$ to the other $n-1$ key points $p_j$ without dividing the shape histogram. These distances form the feature vector $D_i(n)$ of the key point. Assuming that $k$ is the number of feature points, a feature description matrix with $k \times (k-1)$ dimensions can be obtained.
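The distance-based description above can be illustrated with a small sketch, together with one possible way of comparing two such vectors when the images contain different numbers of points. Everything beyond the distance vector itself is an assumption made for illustration: Euclidean distances are assumed, the function names are hypothetical, and the agreement count with tolerance `sigma` and acceptance `threshold` is only a plausible stand-in for the paper's cost function and ratio test.

```python
import math

def distance_vector(points, index):
    """Sorted distances from points[index] to every other key point."""
    px, py = points[index]
    return sorted(math.hypot(x - px, y - py)
                  for k, (x, y) in enumerate(points) if k != index)

def agreement_ratio(vec_a, vec_b, sigma=1.0):
    """Fraction of distances that can be paired one-to-one within sigma."""
    i = j = hits = 0
    while i < len(vec_a) and j < len(vec_b):
        diff = vec_a[i] - vec_b[j]
        if abs(diff) <= sigma:
            hits += 1
            i += 1
            j += 1
        elif diff < 0:
            i += 1
        else:
            j += 1
    return hits / min(len(vec_a), len(vec_b))

def is_match(points_a, i, points_b, j, sigma=1.0, threshold=0.8):
    """Accept the pair when enough of the two distance vectors agree."""
    ratio = agreement_ratio(distance_vector(points_a, i),
                            distance_vector(points_b, j), sigma)
    return ratio >= threshold

points_a = [(0, 0), (10, 0), (0, 20), (30, 30)]
# The second point set is the first one rotated by 90 degrees and shifted.
points_b = [(-y + 5, x + 7) for x, y in points_a]
same_point = is_match(points_a, 0, points_b, 0)
different_point = is_match(points_a, 0, points_b, 3)
```

Because rotation and translation preserve pairwise distances, corresponding points keep the same distance vector, while a non-corresponding point shares only one of three distances and is rejected.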
If the dimension is different between two images, the similarity among feature vectors cannot be measured by Eq (18). Therefore, an improved descriptive model is proposed. Let $m$ and $n$ be the number of feature points in the two images, with corresponding feature vectors. Then, a new matching cost function $F_{i,j}$ that considers the feature point distances is defined, where $D_i(s)$ and $D_j(t)$ are the feature vectors of the current image and target image, respectively, and $\sigma$ is the controllable distance error threshold, which is set to 1 in this paper. Finally, we construct binary search trees for every key point according to the cost function. Then, we calculate the ratio for the first $K$ nodes and compare it with a threshold $T$ to judge whether they form an acceptable matching point pair, where $F_{i,j}(m)$ is the matching cost function and $F_{i,j}(max)$ is the maximum matching. We set the parameter $T$ to 0.8 in this paper. This method considers location distribution and geometrical relationships among the feature points. Even in the case of few feature points, such as Fig 1A, it can achieve highly accurate matching, as shown in Fig 6.
To make the matching method more robust, we employ the progressive sample consensus (PROSAC) algorithm, which improves on RANSAC, to remove outliers. The RANSAC algorithm does not deal well with situations in which the proportion of mismatched pairs among all matched pairs is too large, and it may fail, as shown in Fig 1C. The PROSAC algorithm uses a data subset with a high matching rate as the sample set to estimate the fit [27]. It converges rapidly and can deal with a high mismatch rate. Following the methods discussed above, we can select feature points.
Then, the matched points can be easily sorted according to quality. The steps to remove the final mismatched pairs in this paper are as follows.
1) Feature points, identified following the procedures discussed above, are sorted in descending order by matched degree.
2) Set the sampling frequency, sample set λ, and sample size η. Each sample includes the coordinates of one feature point. The initial sample size η = 4 is the minimum to estimate the transformation matrix. After each loop, the sample size increases to η = η + 1.
3) Determine the initial sample set of 4 pairs. Three matching point pairs are randomly chosen from λ, and then the ηth matching point pair from λ is added to constitute the initial sample set. The transformation matrix $M$, which transforms the coordinates $(u', v')$ into $(u, v)$, can be estimated from the initial sample set.
4) Judge whether the initial matching point pairs satisfy the following two sampling termination conditions. If the conditions are not met, repeat step 2. Otherwise, exit the loop.
① The ratio between the number of inner points and the total number of points is larger than the error threshold, ξ. If $p$ and $p'$ are a matching point pair, the condition for being an inner point is $\|Mp - p'\|_2 < \xi$.
② The rate of increase of inner points should be less than the increase threshold, ξ'. In other words, the number of inner points should increase slowly after a certain number of samplings.
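The progressive sampling loop in the steps above can be sketched as follows. To keep the example short and self-contained, several simplifications are assumed and should not be read as the paper's procedure: a similarity transform estimated from 2 pairs (points encoded as complex numbers) replaces the matrix $M$ estimated from 4 pairs, only the inlier-ratio termination condition is implemented, and the names `fit_similarity`, `prosac`, `rounds_per_size` and `min_inlier_ratio` are hypothetical.

```python
import random

MIN_SIZE = 2  # assumption: a similarity transform needs 2 pairs, not 4

def fit_similarity(sample):
    """Exact similarity z' = a*z + b from two (src, dst) complex pairs."""
    (z1, w1), (z2, w2) = sample
    a = (w2 - w1) / (z2 - z1)
    return a, w1 - a * z1

def prosac(matches, inlier_tol=1.0, min_inlier_ratio=0.7,
           rounds_per_size=20, seed=0):
    """matches: (src, dst) complex pairs sorted best-first by match quality."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for eta in range(MIN_SIZE, len(matches) + 1):       # eta grows by one
        for _ in range(rounds_per_size):
            # Draw MIN_SIZE - 1 pairs from the top eta - 1, then add the eta-th.
            sample = rng.sample(matches[:eta - 1], MIN_SIZE - 1) + [matches[eta - 1]]
            if sample[0][0] == sample[1][0]:
                continue                                 # degenerate sample
            a, b = fit_similarity(sample)
            inliers = [m for m in matches
                       if abs(a * m[0] + b - m[1]) <= inlier_tol]
            if len(inliers) > len(best_inliers):
                best_model, best_inliers = (a, b), inliers
        if len(best_inliers) >= min_inlier_ratio * len(matches):
            break                                        # enough inner points
    return best_model, best_inliers

# Six correct matches under z' = 2j*z + (1+1j), followed by two outliers.
true_a, true_b = 2j, 1 + 1j
src = [0j, 1 + 0j, 1j, 2 + 1j, 3 + 0j, 1 + 3j]
matches = [(z, true_a * z + true_b) for z in src] + [(5 + 5j, 0j), (4 + 0j, 9 + 9j)]
model, inliers = prosac(matches)
```

Because the best-ranked matches are sampled first, the correct transform is recovered from the very first sample in this example, and the two low-ranked outliers never enter the estimate.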

Experimental procedure and results
The SIFT algorithm is often used in image registration and has better performance than many other methods. Here, we compare the proposed method with the SIFT algorithm, where the BF algorithm is employed to match the feature points and the RANSAC algorithm is employed to eliminate mismatches. We show the final visual matching results. To examine the performance and robustness of our proposed method in various situations, test images were chosen with different resolutions and from different applications. The images in Fig 7, Fig 8, Fig 9, Fig 10 and Fig 11B are from the public image database. The other images were newly taken using a digital camera. Visual qualitative comparisons (including the connecting lines between matching point pairs in two images) and quantitative comparisons (involving the different parameters and calculation results) were performed.

Qualitative comparison results
The SIFT algorithm combined with RANSAC has good performance, but there are still some significant mistakes that are difficult to remove using RANSAC. In our experiments, other registration algorithms, such as SURF and ORB, produce outcomes similar to those of SIFT. All methods perform well when faced with a rotation of over 30 degrees, as shown in Fig 7C. The proposed method is significantly more accurate, and it provides a more uniform key point distribution due to the multi-layer selection. Fig 8 shows the outcomes for more challenging images with overexposure of the face. The images in Fig 8A and 8B are from the public Purdue AR face image database [36]. The SIFT algorithm produces many mismatches, whereas the proposed method has more reliable matching. Large numbers of correct matches are produced for varying illumination. [...] most of them are removed by RANSAC. The SIFT algorithm provides many mismatches, while the proposed method can precisely complete registration. The images in Fig 9B are from the launch of the Tiangong rocket at different times. They have rich details, but the image content changes significantly. There are also large differences in brightness between the tail flame of the rocket and the dark background. The changing flight altitude and thick smoke can also seriously affect registration. Fig 1A and the upper image in Fig 9B are the results of SIFT and ORB, respectively. They show that both the SIFT and ORB algorithms fail to accurately register this case. In contrast, the proposed method still has a large number of high-quality matches.
In summary, the proposed method shows excellent performance in feature extraction, selection and matching, even for very complex images.

Quantitative comparison results
A further four groups of images were analyzed using SIFT and ORB with BF. Then, RANSAC was used to remove feature selection mismatches. These images correspond to different situations, including low signal-to-noise ratio, high dynamic range, lack of textures and medical images with local similarity. The number of matching points, the number of points after screening, the number of false matches and the matching rate were calculated and compared. Visual results and the quantitative comparisons for these test images are shown in Figs 10 and 11 and Table 1, respectively. The images in Fig 10A show strong noise interference and low image quality. For both the SIFT and ORB algorithms, nearly half of the matches that remain after screening by RANSAC are erroneous. The proposed method removes the low-reliability matching points, thus significantly reducing the final mismatch rate.
The images in Fig 10B have a large dynamic range and many similar features, but they lack details. Although a large number of feature points is extracted by SIFT and ORB, the methods [...] $N^2$). Therefore, the number of key points involved in matching directly affects the algorithm's efficiency. In general, in the feature detection step, the number of key points is similar between SIFT/ORB and the proposed method, but the proposed method's processing can be up to 10 times faster than SIFT. In addition, the proposed method's processing speed is similar to that of ORB, but its number of correct matches is far greater than that of ORB.

Conclusion and future work
Considering the limitations of traditional methods, this paper proposed a fast and robust image registration approach based on local features and geometric invariants. In the feature detection and description step, we proposed an improved version of the ORB algorithm. The proposed method is scale invariant and produces more high-quality feature points. Then, we improved the removal of mismatches by taking the distribution of key points into account. A new distance constraint window is set according to the distribution of feature points, and a bidirectional matching constraint based on the K nearest neighbors is used to extract higher-quality feature points.
To further improve the method's adaptability and robustness and to obtain the optimum matching point pairs, we proposed a novel matching algorithm based on geometric constraints, with a new feature description vector based on geometric invariance and a new cost function. Appropriate selection criteria were established to remove unreliable matches, and we integrated the PROSAC algorithm to further remove false matches. Thus, even in complex situations, we are still able to accurately register the images.
The experimental results show that our proposed registration method has superior adaptability and stronger robustness in terms of increasing the number of reliable key points and reducing the mismatch rate compared to the SIFT and ORB algorithms.
Future work may include improving the efficiency and real-time applications. The proposed method has fast matching speed using binary string descriptors, so parallel processing may be considered to allow higher resolution images to be processed in real time.