Computation of the normalized cross-correlation by fast Fourier transform

The normalized cross-correlation (NCC), usually its 2D version, is routinely encountered in template matching algorithms, such as in facial recognition, motion-tracking, registration in medical imaging, etc. Its rapid computation becomes critical in time sensitive applications. Here I develop a scheme for the computation of NCC by fast Fourier transform that can favorably compare for speed efficiency with other existing techniques and may outperform some of them given an appropriate search scenario.


Introduction
Covariance, by definition, provides a measure of the strength of the correlation between two sets of numbers (or time series). A serious setback of the covariance is its dependence on the amplitude of either of the series that are compared. This dependency is eliminated if one uses the normalized form of the covariance, referred to as the normalized cross-correlation (otherwise known as the correlation coefficient). The simplest form of the normalized crosscorrelation (NCC) is the cosine of the angle θ between two vectors a and b: NCC is one of those quantities with application in a variety of research fields as diverse as physics [1,2], signal processing [3][4][5][6][7], engineering [8,9], medical imaging [10], and statistical finance [11]. The similarity of the mathematical expression for the numerator of the NCC (see section 1, Eq (4)) with that of a convolution is striking and implicates that its computation must be optimal in the Fourier transform space [3]. Starting from this premise, Lewis [5] conjectured and numerically proved that the numerator of the NCC can be efficiently computed in either the spatial or the frequency domain contingent upon the parameters of the problem. For an overall rapid computation of the NCC it is necessary to have a likewise efficient computation of its denominator. a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 A number of techniques have been proposed by different authors for the fast computation of the NCC. These techniques have their merits and drawbacks, most of them excelling in very specialized cases. Here we mention three of these techniques: [5,6] and [10].
Lewis [5] proposes the computation of the denominator of the NCC in direct space by keeping sum-tables in order to avoid repetitive computations. The numerator of the NCC can however be computed either in the direct space or in the frequency domain. As Lewis notes in the introduction to [5] "Unfortunately the normalized form of correlation does not have a correspondingly simple and efficient frequency domain expression". This statement is of importance because only one or the other (spatial or frequency domain) computation can be optimally fast given the parameters of an application. This fact is duly observed by Lewis in section 4 of [5] where he writes, referring to searching a feature of length N in a time series of length M: When M is much larger than N the complexity of direct 'spatial' computation is approximately N 2 M 2 multiplications/additions and the direct method is faster than the transform method. The transform method becomes relatively more efficient as N approaches M and with larger M, N.
The computation of the denominator in the frequency domain is addressed in the present paper. While the numerator of the NCC is the covariance between two different time series, either of the two terms under the square root in the denominator represents the covariance of that series with itself (i.e. auto-covariance) at lag zero. If an efficient computation for the numerator in the frequency space is possible, one can expect this to be possible for the denominator as well.
Luo and Konofagou [10] propose the computation of the NCC exclusively in direct space by keeping sum-tables for both the numerator and the denominator. While Luo's approach cannot outperform Lewis computation when a single feature of a given size, selected from one time series, is exhaustively searched for throughout the other time series, it can however be advantageous for other search scenarios [10].
Yoo and Han [6] propose a scheme for the computation of the NCC that considerably reduces both the number and the complexity of the required computational operations as compared with other full computing schemes. The proposed scheme is based on approximative assumptions. The gain in the computational speed is done at the expense of an algorithm that generates false positives in a feature search scenario. The rate of false positives depends on the noise level present in the time series, and can be minimized by tuning parameters in the algorithm. This performance dependence on the noise level can be considered a setback when "shoot first and ask later" approach can be a problem. It is also not clear how this algorithm can be extended and will perform if signals were complex-valued.
The algorithm presented in this paper can handle complex-value signals. It can be considered as an extension/supplement of the work done by Lewis in that the fast Fourier transform (FFT) is used to compute both the numerator and the denominator of the NCC. The proposed algorithm opens the possibility for the computational parallelization (FFT threads) of the procedure and may offer speed gains for appropriate template and search region size [5].
The paper is organized as follows. Section 1 exposes the mathematical formula for the 1D complex NCC and lays the ground for developing the proposed FFT computational scheme, which is done in Section 2. The 2D formulas can be found in Appendix C, their derivation follows exactly the same steps as the 1D. In Results, two measurements are considered as test cases. Both of them are real-valued, the first is 1D and the other 2D. The python code developed for the computation of the NCC can handle complex-value measurements and is listed in Appendix B. An illustrative complex valued 1D test case is provided in Supplements 1. Python programs as well as the data sets used for the 1D and 2D illustrations can be found in the supplements.
The time spent in computing the NCC for the 1D test case is tabulated below for several numeralical methods/schemes used. The timing is done on optimized C++ code, it certainly is processor dependent and is shown here to create a general comparative idea. The execution time is averaged over 15 runs, with 5 ms wait time between runs to account for the processor delays. The parameter values used for this 1D test can be found in the Results section. The test shows that: 1. Lewis NCC with direct space computation of the numerator is slightly slower than when FFT is used instead (the denominator is computed in direct space using sum-tables in both cases), Table 1. 2. The proposed computation is slower than Lewis method with FFT computation of the numerator for the parameter values, i.e. template and search region lengths, investigated in this paper, Table 2. An additional test case comprising 1D complex value measurements and of larger size is investigated in Supplement 1. Other possible computational optimizations (multi-threads, wisdom FFT) are not investigated in this work.
3. The Luo's computation of NCC using sum-tables for the numerator is not faster than the Lewis algorithm for the search scenario tested here, Table 3.
To have a fair comparison of the computational speed between the Lewis' and the proposed method, both methods are implemented for complex-value time series (requiring approximately twice as many computations as for real-value time series, see the above listed computational times of Luo's method that handle real or complex time series). The two methods are then however applied on real-value time series for simplicity.

Section 1
Consider two, complex-value and periodic, functions of time: f(t) and g(t), both having the same time period T. Suppose that these two functions are uniformly sampled with a time step Δt such that T = nΔt. Using the discrete Fourier transform formalism the sampled func- Here fft, ifft are respectively the fast Fourier transform function and its inverse as defined in MATLAB, j represents the imaginary unit, i, k, p, q will represent integer variables, and n, m are integer parameters. Similarly, we symbolically write Consider two subsets of consecutive data points of the same length m extracted from the discrete time series {f i } and {g i }, as shown in Fig 1. Hereafter The normalized cross-correlation (NCC) between these two subsets is defined as In the following section I present the scheme for computing the NCC using the fast Fourier transform.

Section 2
Start by calculating is the Dirichlet kernel. Next calculate By variable substitution k = n − 1 − k@ and relabeling obtain where The dependency of a k, k 0 on m is left out of notation for clarity. In the following I transform the double sum to a single sum. Fig 2 helps with visualizing the procedure.
The double sum suggests that we sum along columns per each row, and then sum all terms so obtained. Chose, instead, to first sum along the secondary diagonals per each secondary diagonal, and then sum all obtained terms. By doing this the k + k 0 + 1 = l mod n remains constant during the first summation and the exponential term depending on p is factorized. Therefore with Straightforward calculations using definitions and index relabeling reveal that Now, Relabeling, noting that γ 0 = 1, and condensing Then Similar calculations hold for g and jgj 2 .
Finally calculate the remaining term that completes the computation of NCC using the fast Fourier transform where with

Results
The method is illustrated with two examples, both drawn from MRI measurements. The first example is 1D and relates to a rigid phantom, the intensity profile of which is measured at two different positions, 30 mm apart (as read on the landmark meter of the scanner) along the bore of the magnet. The two profiles differ partly because the static magnetic field in the scanner deviates even so slightly from homogeneity and partly because of the non-linearity of the gradients. The contribution of the measurement noise to the profiles' difference is of a less consequence. In this measurement the field of view (FOV) is 300 mm and the resolution xres = 128. The intention is to find the phantom's displacement by using only the information provided by these two intensity profiles. This is done by selecting a template, which represents a feature of interest, in one of the profiles, say the dashed, black line in Fig 3a. In our example the start position and the length of the selected template are q = 80 and m = 27 samples respectively. We then search for the presence of a similar feature in the other profile, the navigator, plotted as the solid, red line in Fig 3a. Because the two profiles are different, the degree of similarity between the template and chunks of the same size extracted exhaustively from the navigator profile is determined from the value of the normalized cross-correlation (NCC). The similarity of the chunk to the template is higher the larger the value of NCC between the two is, with the chunk being identical to the template for NCC = 1. The total number of chunks that can be construed from the navigator profile is n − m + 1. Here n = xres = 128, and n − m + 1 = 102. The NCC values are plotted in Fig 3b and show that the feature of interest is located at the start position max = 67 where NCC reaches it maximum value. Fig 3c shows how the template visually compares with the most similar chunk if drawn on top of it. From the difference between the start positions of the template and the most similar chunk we can compute the translational distance between the two intensity profiles as: FOV Á (q − max)/xres * 30 mm, in good agreement with what the landmark meter on the scanner displayed. A python version of the code generating the data used in the plots is listed in Appendix B and can be downloaded from the supplements.
The second example is 2D and relates to a patient resting on the scanner's table while MR images are being taken. There is no gross motion of the patient, but there is motion of his bowels. The intention is to track the motion of a region of interest from one time frame to the next. Here we display two time frames and select a template on the first. We search throughout the second frame to find where the most similar (with the template) 2D chunk is located. This test case is one of the most complicated for motion tracking as there is displacement of the region of interest out of the 2D fixed plane where images are being taken into the third dimension as well as nonrigid transformation, i.e. deformation, of the tissues. NCC is not the most suitable metric to be used for feature tracking in cases like this, at least not without any adaptation [7]. However the method performs reasonably well given that the bowels reconfigure considerably from one frame to the next, as evidenced by the low value (0.67 vs. 1) of the maximum NCC. The result is presented in Fig 4. A python version of the code used can be downloaded from the supplements. Motion tracking using the NCC would perform better the smaller the topological difference between consecutive time frames is. Unfortunately this is not always possible with MR Images in spatial domain: the spatial resolution of an MR image is directly related to the time spent collecting data for its reconstruction. If the topological difference between two consecutive time frames is large, motion tracking using the NCC can encounter a "glitch" where a chunk of irrelevant anatomy is selected as the most similar region with the template. This encounter will result either with a lost track or a return to the track after the abrupt excursion away from the track (hence the term "glitch").
Comments in the listed code are kept to a minimum with the understanding that the code flow and variable labeling closely follows the derivations presented in the text of this paper. In the code, the computations related to the template are separated from other cross-correlation computations. This is done to increase the computational efficiency since the parameters related to the template need to be computed only once while the rest of NCC computation might need to be performed over a multitude of navigators (i.e. time frames). The code for computing the 1D NCC with Lewis' [5], and Luo's [10] algorithms is listed as well so that a comparison of the NCC values obtained with these methods can be done. The provided python code cannot however be used to compare the time efficiency of these numerical methods. Their time performance comparison is done by this author using optimized C++ code.

Conclusion
In this paper I have demonstrated an algorithm for the computation of the normalized crosscorrelation (NCC) using the FFT. It is shown that, for the data sets investigated, the computation of the NCC fully in the transform space is not faster than the optimized algorithm proposed by Lewis (in the Lewis optimized algorithm the numerator of the NCC is computed in the transform space and the denominator in the direct space). The comparative computational speed between the two algorithms seems to be unaffected by the template and the search window sizes with the Lewis' algorithm always performing faster. This finding is somewhat of a surprise considering that the physical nature of the numerator and denominator in the NCC formula is almost "identical" with the numerator standing as the correlation between two time series (cross-correlation) and the denominator involving the correlation of a time series with itself (auto-correlation). A final comparison of the computational speed between these two algorithms (Lewis' optimized algorithm and the one proposed here) would be their thread parallelization.  [2]] k = arange(1,n) kernel = (1.0/m) Ã ((exp(1 j Ã 2 Ã pi Ã m Ã k/n) − 1)/(exp(1 j Ã 2 Ã pi Ã k/n) − 1)) kernel = cat(([1 + 1 j Ã 0.0], kernel)) gc, gg, FTpg = template_functions(template, kernel, n, q, m, p) cc = complex_ccor(navigator, gc, gg, kernel, FTpg, n, q, m, p) lewis_cc = lewis_ccor(navigator, template, n, q, m, p) luo_cc = luo_ccor(navigator, template, n, q, m, p)

Appendix C
For the 2D case we have we get f ¼ ifft2ðfa k;k 0 g k;k 0 gÞj p;p 0 ð27Þ jf j 2 ¼ ifft2ðffft2ðfjf j 2 i;i 0 gÞj k;k 0 g k;k 0 gÞj p;p 0 ð28Þ with similar terms for for g and jgj 2 .
The 2D result presented is generated with the 2D python code that can be found in the supplements. The two files "image1.dat" and "image2.dat" loaded from the main subroutine, are plain text. They contain the template and navigator image respectively. Both images are 512x512 "pixels" with real-value samples scaled from 0 to 255. The code runs equally well for other image sizes and images with complex-value samples.