Stereo Matching by Filtering-Based Disparity Propagation

Stereo matching is a fundamental problem in computer vision. This paper proposes a novel stereo matching algorithm based on disparity propagation using edge-aware filtering. By extracting disparity subsets for reliable points and customizing the cost volume, the initial disparity map is refined through filtering-based disparity propagation. An edge-aware filter with low computational complexity is adopted to filter the cost volume, which makes the running time of the proposed method independent of the local window size. Experimental results demonstrate the effectiveness of the proposed scheme: the number of bad pixels in the output disparity map is considerably reduced, and the proposed method outperforms the adaptive support-weight approach and other conventional window-based local stereo matching algorithms.


Introduction
Stereo matching solves the correspondence problem between stereo image pairs and has long been one of the most fundamental and challenging tasks in computer vision. As formulated by Scharstein and Szeliski [1] and widely adopted in later research, dense two-frame stereo matching follows a four-step framework (Fig 1): matching cost computation, cost aggregation, disparity computation and disparity refinement. Most existing algorithms, including local, global and semi-global ones, perform all or some of these four steps.
Local stereo matching algorithms usually perform cost computation and aggregation, followed by a simple winner-take-all (WTA) disparity selection. In contrast, global approaches transform the problem into an energy-minimization model: they formulate a global objective function composed of a data term and a smoothness term, and perform global disparity optimization [2][3][4][5][6][7][8][9] by dynamic programming (DP), graph cuts (GC) [10,11] or belief propagation (BP) [12][13][14][15]. Although global optimization can achieve high accuracy in disparity estimation, its large computation and time cost limits its use in real-time applications. Segment-based approaches fit planes to the initial disparities on each segment, based on the assumption that disparities vary smoothly and continuously within each homogeneous color segment. The result can be further improved by a global optimization model that finds the optimal disparity plane parameters, as the labeling space is relatively small. Recently, semi-global stereo matching has also been proposed, based on the recognition that the support pixels for cost aggregation should be selected from the whole image rather than restricted to a local matching window. Representatively, Hirschmuller used mutual information to compute the pixel-wise matching cost and aggregated the costs along multiple paths that end at the current pixel [16]. Yang built a graph using all image pixels as nodes and computed the matching cost adaptively based on pixel similarity on a minimum spanning tree (MST) [17]. Matching cost aggregation plays a critical role in reducing the mismatching rate when a per-pixel matching function is used. For most local stereo matching algorithms, this is achieved by summing or averaging the matching costs in a surrounding window centered on the current pixel.
Using adaptive support weights [18,19] for neighboring pixels takes edges and textures into account and yields better performance, which can be further improved by adopting big data processing technologies [20][21][22][23][24].
In this paper, a filtering-based stereo matching algorithm is proposed. Based on the observation that aggregating matching costs in a rectangular window is equivalent to filtering the cost volume, we perform adaptive support-weight cost aggregation through filtering. In particular, we use the edge-aware guided filter, whose time complexity is linear in the image size, which reduces the computational complexity and greatly shortens the running time. Reliable and unreliable points are then identified by cross-checking two roughly estimated disparity maps. Lastly, high-confidence disparity estimates are propagated from reliable points to unreliable ones by filtering a customized cost volume. A feedback-based optimization can also be achieved by integrating the initial disparity map into the guide image and re-executing the filtering process. The main novelty of the proposed method is the use of reliable points: by propagating the disparities of these points, high stereo matching performance is achieved.
The rest of this paper is organized as follows. Section 2 explains the idea that matching cost aggregation is equivalent to cost volume filtering. Section 3 first introduces the concepts of reliable points and disparity subsets, and then shows how to customize a new cost volume that propagates useful information from reliable points to unreliable ones by applying edge-aware filtering. Experimental results are presented in Section 4, along with analysis and evaluation. Finally, we conclude the paper and discuss future work in Section 5.

Cost Aggregation by Cost Volume Filtering
The robust dissimilarity measure that takes both SAD and gradient into account [13] is used as the pixel-wise matching cost function. Local stereo matching then usually applies a cost aggregation procedure that sums the costs in a window. Cost aggregation is therefore equivalent to filtering the initial cost volume. Furthermore, adaptive support weights can be realized by an edge-aware filter such as the bilateral filter. For non-linear filtering, each output pixel is calculated in $O(r^2)$ time if the kernel size is $r \times r$, so the computational cost rises sharply as the kernel size increases. The so-called $O(1)$ or constant-time bilateral filter [25,26], whose computational complexity is invariant to the kernel size, is designed for fast implementation.
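The equivalence between window aggregation and filtering can be sketched in a few lines of NumPy. The example below is a minimal illustration for grayscale images, not the exact cost function of [13]: the blend weight `alpha` and the truncation values `tau_c` and `tau_g` are hypothetical, and the function names are chosen only for this sketch. Summing costs in a window is written as a box filter applied to each disparity slice of the cost volume.

```python
import numpy as np

def shift_right(img, d):
    """Shift columns right by d pixels, replicating the left border."""
    out = np.empty_like(img)
    out[:, d:] = img[:, :img.shape[1] - d]
    if d > 0:
        out[:, :d] = img[:, :1]
    return out

def matching_cost_volume(left, right, max_disp, alpha=0.9, tau_c=0.03, tau_g=0.01):
    """Per-pixel cost: truncated absolute differences of intensity and x-gradient.

    left, right: grayscale float images in [0, 1], shape (H, W).
    alpha, tau_c, tau_g are assumed values, not taken from the paper.
    Returns a volume of shape (max_disp + 1, H, W).
    """
    grad_l = np.gradient(left, axis=1)
    grad_r = np.gradient(right, axis=1)
    volume = np.empty((max_disp + 1,) + left.shape)
    for d in range(max_disp + 1):
        # Left pixel (y, x) is compared with right pixel (y, x - d).
        diff_c = np.abs(left - shift_right(right, d))
        diff_g = np.abs(grad_l - shift_right(grad_r, d))
        volume[d] = ((1 - alpha) * np.minimum(diff_c, tau_c)
                     + alpha * np.minimum(diff_g, tau_g))
    return volume

def aggregate(volume, r):
    """Average each disparity slice over a (2r+1) x (2r+1) window.

    The explicit window sum is a box filter applied slice by slice;
    written this way it costs O(r^2) operations per pixel.
    """
    _, h, w = volume.shape
    padded = np.pad(volume, ((0, 0), (r, r), (r, r)), mode="edge")
    out = np.zeros_like(volume)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += padded[:, dy:dy + h, dx:dx + w]
    return out / (2 * r + 1) ** 2
```

Replacing the box kernel in `aggregate` with an edge-aware kernel gives adaptive support-weight aggregation without changing the rest of the pipeline.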
In this paper we adopt the guided filter proposed by He et al. [27] to conduct the cost volume filtering. The guided filter is based on a local linear model, which assumes that in a local window $\omega_k$ the filtering output $q$ can be expressed as a linear transform of the guide image $I$:

$$q_i = a_k^{T} I_i + b_k, \quad \forall i \in \omega_k \tag{1}$$

where $(a_k, b_k)$ are linear coefficients that are constant in $\omega_k$. Considering the constraints from the filtering input $p$, the linear coefficients can be derived as:

$$a_k = (S_k + \epsilon U)^{-1} \left( \frac{1}{|\omega|} \sum_{i \in \omega_k} I_i p_i - \mu_k \bar{p}_k \right) \tag{2}$$

$$b_k = \bar{p}_k - a_k^{T} \mu_k \tag{3}$$

Here, $S_k$ is the $3 \times 3$ covariance matrix of $I$ in $\omega_k$, $U$ is the $3 \times 3$ identity matrix, $\mu_k$ is the mean of $I$ in $\omega_k$, $\bar{p}_k$ is the mean of $p$ in $\omega_k$, $|\omega|$ is the number of pixels in the window, and $\epsilon$ is a regularization parameter. The output $q$ is then computed using:

$$q_i = \bar{a}_i^{T} I_i + \bar{b}_i \tag{4}$$

where $\bar{a}_i$ and $\bar{b}_i$ are the averages of $a_k$ and $b_k$ over all windows that contain pixel $i$.

The guided filter is edge-aware like the bilateral filter, yet performs better near edges. Another advantage of guided filtering is that its time complexity is only $O(1)$ per pixel and $O(N)$ for an image of $N$ pixels. In contrast, traditional edge-aware filters such as the bilateral filter take $O(r^2)$ time per pixel when the local filter window has size $r \times r$, and thus $O(Nr^2)$ time for the whole image, so their time cost grows rapidly with the kernel size. Although the guided filter follows the local linear assumption, its computational complexity is independent of the local window size. This property makes it more practical for cost volume filtering.
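To make the window-size independence concrete, the following is a minimal sketch of the guided filter for the simpler case of a single-channel guide, where the covariance matrix reduces to a scalar variance. It is an illustration under that assumption rather than the color-guide version used in the paper; the names `box_mean` and `guided_filter_gray` are chosen only for this example. Every step is a box mean computed from an integral image, so the work per pixel does not depend on the radius `r`.

```python
import numpy as np

def box_mean(img, r):
    """Mean over a (2r+1) x (2r+1) window, clipped at the image border.

    An integral image makes the cost per pixel independent of r.
    """
    h, w = img.shape
    integral = np.zeros((h + 1, w + 1))
    integral[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    y0 = np.clip(np.arange(h) - r, 0, h)[:, None]
    y1 = np.clip(np.arange(h) + r + 1, 0, h)[:, None]
    x0 = np.clip(np.arange(w) - r, 0, w)[None, :]
    x1 = np.clip(np.arange(w) + r + 1, 0, w)[None, :]
    total = (integral[y1, x1] - integral[y0, x1]
             - integral[y1, x0] + integral[y0, x0])
    return total / ((y1 - y0) * (x1 - x0))

def guided_filter_gray(guide, src, r, eps):
    """Guided filter with a single-channel guide (scalar variance case)."""
    mean_i = box_mean(guide, r)
    mean_p = box_mean(src, r)
    var_i = box_mean(guide * guide, r) - mean_i * mean_i
    cov_ip = box_mean(guide * src, r) - mean_i * mean_p
    a = cov_ip / (var_i + eps)          # linear coefficient per window
    b = mean_p - a * mean_i             # offset per window
    # Average the coefficients of all windows covering each pixel.
    return box_mean(a, r) * guide + box_mean(b, r)
```

In flat regions of the guide the variance is small compared with `eps`, so the output approaches the local mean of the input; near strong edges the coefficient `a` approaches 1 and the edge is preserved.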
When performing guided filtering on the cost volume $C$, the input image to be filtered is the slice of the cost volume at disparity candidate $d$, and the color image of the reference view is used as the guide image. An initial disparity map can then be generated from the filtered cost volume using the winner-take-all (WTA) strategy.
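The slice-by-slice filtering and the WTA selection can be sketched as follows. The layout of the cost volume, with the disparity axis first, and the callable `filter_fn` are assumptions made for this example; any edge-aware filter with the signature `filter_fn(guide, slice)` could be plugged in.

```python
import numpy as np

def initial_disparity(cost_volume, guide, filter_fn):
    """Filter every disparity slice with the same guide, then pick the minimum.

    cost_volume: array of shape (D, H, W); slice d holds the costs for disparity d.
    filter_fn(guide, cost_slice) -> filtered slice of shape (H, W); a
    hypothetical callable standing in for the guided filter.
    """
    filtered = np.stack([filter_fn(guide, cost_slice) for cost_slice in cost_volume])
    # Winner-take-all: the disparity with the lowest filtered cost wins.
    return np.argmin(filtered, axis=0)
```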

Disparity Propagation of Reliable Points
The left-right consistency check is widely used to verify the accuracy of the disparity estimate at each pixel. In this work, pixels that pass the cross-check are marked as reliable points, and the others as unreliable ones. It is reasonable to assume that the information of reliable points in the cost volume has higher confidence and should benefit other pixels.
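A left-right consistency check can be sketched as below. The tolerance `tol` of one disparity level is an assumption for illustration, since the text does not state the threshold, and integer disparity maps are assumed so that they can be used as column offsets.

```python
import numpy as np

def cross_check(disp_left, disp_right, tol=1):
    """Return a boolean mask that is True at reliable pixels of the left view.

    disp_left, disp_right: integer disparity maps of shape (H, W).
    tol: assumed maximum allowed difference between the two estimates.
    """
    h, w = disp_left.shape
    ys, xs = np.mgrid[0:h, 0:w]
    x_right = xs - disp_left                      # column of the match in the right view
    inside = (x_right >= 0) & (x_right < w)       # matches outside the image fail
    matched = disp_right[ys, np.clip(x_right, 0, w - 1)]
    return inside & (np.abs(disp_left - matched) <= tol)
```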

A. Building Disparity Subsets
Denote the full disparity range as $D_{full}$. For each reliable pixel $p$, a small subset $D_{sub}(p)$ is built that contains the disparity candidates corresponding to the $|D_{sub}(p)|$ smallest matching costs in the cost volume. The subset should be small, i.e. $|D_{sub}(p)| \ll |D_{full}|$. The extracted disparity values have the highest confidence among all candidates for the current pixel. Disparities outside this subset are penalized when filtering the cost volume. Fig 2 (a) shows the result of the cross-check based on the initial disparity map in (b). The unreliable pixels that fail the cross-check are marked in red. Computing disparities in these red regions has to draw on the useful information from nearby reliable points.
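Extracting the subset amounts to selecting the few lowest-cost disparities at each pixel. A minimal sketch, assuming a cost volume with the disparity axis first, is given below; the function name is hypothetical, and the subset is computed for every pixel so that the caller can keep only the entries at reliable pixels.

```python
import numpy as np

def disparity_subsets(cost_volume, k=2):
    """Return the k lowest-cost disparities per pixel, shape (k, H, W).

    cost_volume: array of shape (D, H, W) with k <= D.
    Candidates are ordered by increasing cost along the first axis.
    """
    # argpartition places the k smallest entries (unordered) in the first k slots.
    idx = np.argpartition(cost_volume, k - 1, axis=0)[:k]
    costs = np.take_along_axis(cost_volume, idx, axis=0)
    order = np.argsort(costs, axis=0)
    return np.take_along_axis(idx, order, axis=0)
```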

B. Disparity Propagation Based on Customized Cost Volume
We customize a new cost volume based on the initial disparity map and the disparity subsets of reliable points. Its penalty term is controlled by two parameters $\lambda_1 < \lambda_2$, so that disparities far from the subset $D_{sub}(p)$ incur a larger penalty. The penalty remains bounded, however, because an exponential function is used.
The customized cost volume is then filtered, and WTA optimization is performed once again at each pixel to obtain a refined disparity map. A constant-time median filter is then applied as post-processing to fill holes and remove peaks.
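The propagation mechanism can be illustrated with a deliberately simplified sketch. The penalty used here, $1 - \exp(-\text{dist}/\lambda)$ with a single scale `lam`, is an assumed bounded form chosen for illustration and is not the two-parameter cost defined in this paper. Two further assumptions are made: unreliable pixels are given zero cost at every disparity, and a plain 3 x 3 mean filter stands in for the guided filter. All function names are hypothetical.

```python
import numpy as np

def customized_cost_volume(subsets, reliable, num_disp, lam=1.0):
    """Assumed penalty volume of shape (num_disp, H, W).

    subsets: (k, H, W) integer disparity candidates per pixel.
    reliable: (H, W) boolean mask from the cross-check.
    Reliable pixels pay a bounded penalty that grows with the distance to
    their nearest candidate; unreliable pixels carry no evidence (zero cost).
    """
    d = np.arange(num_disp)[:, None, None, None]
    dist = np.abs(d - subsets[None]).min(axis=1)   # distance to nearest candidate
    penalty = 1.0 - np.exp(-dist / lam)            # bounded in [0, 1)
    return np.where(reliable[None], penalty, 0.0)

def mean3x3(img):
    """3 x 3 mean filter with edge replication (stand-in for the guided filter)."""
    h, w = img.shape
    p = np.pad(img, 1, mode="edge")
    return sum(p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)) / 9.0

def propagate(subsets, reliable, num_disp, lam=1.0):
    """Filter the assumed penalty volume slice by slice, then apply WTA."""
    volume = customized_cost_volume(subsets, reliable, num_disp, lam)
    filtered = np.stack([mean3x3(s) for s in volume])
    return np.argmin(filtered, axis=0)
```

Because filtering mixes the penalties of neighboring pixels, an unreliable pixel surrounded by reliable ones inherits the disparity that its neighbors agree on.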

C. Integrating Disparity into Guided Filtering
The RGB image of the reference view is used as the guide image when filtering the cost volume. Notice that an initial disparity map is already available before the new volume is generated. The estimated disparity can be fed back into the filtering procedure as an extra channel of the guide image and assist in filtering the customized cost volume. We append the disparity channel to the RGB reference image to form a new RGB-D guide image. For guided filtering, this simply means replacing the three-dimensional vector $I_i$ in Eqs (2) and (4) with a four-dimensional one. Disparity propagation using the new guide image and the customized cost volume then leads to further refinement of the disparity map.
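Forming the four-channel guide is a simple concatenation, sketched below. Scaling the disparity by `max_disp` so that it shares the [0, 1] range of the color channels is an assumption of this example, as the text does not specify how the channels are balanced. The helper `window_covariance` is hypothetical and only shows that the per-window covariance becomes a 4 x 4 matrix.

```python
import numpy as np

def rgbd_guide(rgb, disparity, max_disp):
    """Stack a normalized disparity map onto an RGB image.

    rgb: float array of shape (H, W, 3) in [0, 1].
    disparity: array of shape (H, W) with values in [0, max_disp].
    Returns an array of shape (H, W, 4).
    """
    d = disparity.astype(np.float64) / max_disp   # assumed normalization
    return np.concatenate([rgb, d[..., None]], axis=2)

def window_covariance(guide, y, x, r):
    """Covariance matrix of the guide channels in the window centered at (y, x)."""
    patch = guide[max(y - r, 0):y + r + 1, max(x - r, 0):x + r + 1]
    samples = patch.reshape(-1, guide.shape[2])
    return np.cov(samples, rowvar=False, bias=True)
```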

Experimental Results
Our method is evaluated on the standard Middlebury benchmark. All the experiments were run on a PC equipped with an Intel Core i5 CPU and 4GB memory. The size of the disparity subset is set to $|D_{sub}| = 2$, and the parameters $\lambda_1$ and $\lambda_2$ are set to 0.04 and 1.2, respectively. Fig 3 shows the left views of the four test image pairs "Tsukuba", "Venus", "Teddy" and "Cones" together with their ground truth disparity maps. The third column shows the disparity maps generated by the proposed algorithm. Bad pixels, whose absolute disparity error is larger than 1.0, are marked in the last column. It can be observed that the proposed algorithm recovers satisfactory disparity maps even for complicated scenes. Most of the bad pixels lie near edges, where occlusion often occurs and which are challenging for all stereo matching algorithms. Table 1 gives the quantitative evaluation of our results with an error threshold of 1, along with the results of other representative algorithms such as the adaptive support-weight approach and global optimization algorithms using graph cuts or belief propagation. Our method outperforms the others with respect to the average percentage of bad pixels, especially for the "Cones" image pair, on which our method ranks in the top 3 for all 3 indicators: bad pixels in regions near discontinuities, in non-occluded regions and in all regions. For the other test image pairs, our results remain competitive among the algorithms listed here. The proposed guided filtering-based disparity propagation outperforms the adaptive-weight approach on all test images except "Tsukuba". Moreover, our algorithm maintains the edge-aware property of adaptive weighting yet has a clear advantage in computational complexity, as guided filtering takes $O(1)$ time per pixel regardless of the filter kernel size. This makes it practical to use a larger local window when handling large images without a sharp rise in time cost.
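The bad-pixel indicator used in this evaluation can be computed as in the sketch below. The optional `mask` argument is an assumption of this example: it stands for a region of interest such as the non-occluded pixels or the pixels near discontinuities, supplied as a boolean array.

```python
import numpy as np

def bad_pixel_percent(disp, gt, mask=None, threshold=1.0):
    """Percentage of pixels whose absolute disparity error exceeds the threshold.

    disp, gt: estimated and ground truth disparity maps of shape (H, W).
    mask: optional boolean array selecting the evaluated region.
    """
    bad = np.abs(disp.astype(np.float64) - gt) > threshold
    if mask is None:
        mask = np.ones(bad.shape, dtype=bool)
    return 100.0 * bad[mask].mean()
```

An error exactly equal to the threshold does not count as bad, which matches the "larger than 1.0" criterion stated above.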
The running time of the proposed method is shown in Table 2. All the experiments were run on a Laptop T450s equipped with an Intel Core i5-5200U CPU and 4GB memory.
We also tested various combinations of parameters, including the size of the disparity subsets and $\lambda_1$, $\lambda_2$ in the penalty term. Table 3 gives the average percentage of bad pixels on the above 4 test image pairs for various parameter combinations, denoted in the form $(|D_{sub}|, \lambda_1, \lambda_2)$. Our experience suggests that a disparity subset size $|D_{sub}|$ in the range [2,3], with $\lambda_1$ in [0.02,1.0] and $\lambda_2$ in [0.8,2.0], achieves approximately optimal results. The parameters are selected based on the experimental results on the test images, as listed in Table 3. Empirically, the size of the disparity subsets is kept small because the initial disparity value of a reliable point is regarded as having relatively high confidence; a large penalty should therefore be given to avoid large deviations from the subsets.

Conclusions
This paper proposes a stereo matching algorithm based on disparity propagation using edge-aware filtering. By extracting disparity subsets for reliable pixels and defining a new cost volume accordingly, mismatches in the initial disparity map are corrected through disparity propagation from nearby reliable points. Guided filtering is used to conduct the propagation in $O(1)$ time per pixel, which is a clear advantage over traditional window-based cost aggregation methods. Future work will focus on how to customize a more reasonable cost volume, which is essential to the disparity propagation. Occlusion handling and post-processing of the disparity map also remain to be improved.