Abstract
Binocular vision uses the parallax principle of the human eye to obtain 3D information of an object, which is widely used as an important means of acquiring 3D information for 3D reconstruction tasks. To improve the accuracy and efficiency of 3D reconstruction, we propose a 3D reconstruction method that combines second-order semiglobal matching, guided filtering and Delaunay triangulation. First, the existing second-order semiglobal matching method is improved, and the smoothness constraint of multiple angle directions is added to the matching cost to generate a more robust disparity map. Second, the 3D coordinates of all points are calculated by combining camera parameters and disparity maps to obtain the 3D point cloud, which is smoothed by guided filtering to remove noise points and retain details. Finally, a method to quickly locate the insertion point and accelerate Delaunay triangulation is proposed. The surface of the point cloud is reconstructed by Delaunay triangulation based on fast point positioning to improve the visibility of the 3D model. The proposed approach was evaluated using the Middlebury and KITTI datasets. The experimental results show that the proposed second-order semiglobal matching method has higher accuracy than other stereo matching methods and that the proposed Delaunay triangulation method based on fast point location requires less time than the original Delaunay triangulation.
Citation: Xu Y, Liu K, Ni J, Li Q (2022) 3D reconstruction method based on second-order semiglobal stereo matching and fast point positioning Delaunay triangulation. PLoS ONE 17(1): e0260466. https://doi.org/10.1371/journal.pone.0260466
Editor: Claudionor Ribeiro da Silva, Universidade Federal de Uberlandia, BRAZIL
Received: October 29, 2020; Accepted: November 9, 2021; Published: January 25, 2022
Copyright: © 2022 Xu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The data underlying the results presented in the study are available from http://www.cvlibs.net/datasets/kitti/index.php and https://vision.middlebury.edu/.
Funding: This research was supported by the National Key R&D Program of China (No. 2018YFC0406900), the Jiangsu Key R&D Program (No. BE2018066) and the Shandong Provincial Water Conservancy Research Project (No. SDSLKY201905). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
1 Introduction
Binocular vision, which simulates human vision, matches the corresponding pixels of image pairs to obtain disparity maps and calculates the 3D coordinates of each point in the 3D scene. This process has wide applications in 3D reconstruction, industrial inspection, robot navigation and virtual reality. However, binocular vision still has great limitations in obtaining the 3D information of a scene, although its ability to roughly estimate the depth range is sufficient for humans. For 3D reconstruction tasks, an approximate depth range is not sufficient: recovering a realistic 3D scene in a computer requires accurate 3D coordinates, and efficient, realistic texture mapping is also indispensable. We improve the key components of the 3D reconstruction pipeline based on binocular vision, including stereo matching, point cloud filtering and triangulation, thereby improving the accuracy and visibility of the 3D reconstruction. Specifically, the proposed method includes a semiglobal stereo matching method based on second-order smoothness constraints, a point cloud smoothing method based on guided filtering, and a Delaunay triangulation method based on fast point positioning.
Stereo matching is vital in binocular vision and directly influences the 3D reconstruction results. Stereo matching methods can be divided into four steps: cost calculation, cost aggregation, disparity calculation and disparity refinement [1]. According to the cost aggregation strategy, the methods are classified into three categories: local matching methods [2–5], global matching methods [6–11], and semiglobal matching (SGM) methods [12]. The proposed second-order semiglobal stereo matching method aggregates multiple cost functions to enhance robustness, and its matching error is decreased by pooling costs along different directions.
The 3D coordinates of all points are calculated according to camera parameters and disparity maps. The 3D point cloud obtained by the above steps contains considerable noise and must be smoothed. Point cloud filtering methods include bilateral filtering [13] and guided filtering [14]. Bilateral filtering combines spatial proximity and intensity similarity, so it not only removes noise points but also preserves detailed features; however, its computational complexity is high. Guided filtering is a local linear filtering method that introduces a guidance image and has lower computational complexity than bilateral filtering, while still achieving good results. To restore the visual surface of a 3D scene, surface reconstruction and texture mapping are also required. Surface reconstruction methods include the distance field contour surface method [15], Poisson reconstruction [16], and Delaunay triangulation [17]. The distance field contour surface method derives the initial tangent plane from the K-nearest neighborhood points and extracts the contour surface by forming a distance field according to normal vector uniformity. Poisson reconstruction adopts implicit fitting and obtains an implicit equation corresponding to the point cloud model by solving the Poisson equation. Delaunay triangulation connects 3D points into triangles according to certain rules and has high stability. After reconstructing the surface of the scene, texture mapping is performed to enhance the realism of the 3D model.
In this paper, we propose a 3D reconstruction method based on binocular stereo matching and Delaunay triangulation based on fast point positioning to improve the accuracy and efficiency of 3D reconstruction. The contributions of this paper are as follows: (1) A semiglobal stereo matching method combining multiple matching costs is proposed to obtain disparity maps, which are applied to subsequent 3D reconstruction. (2) A 3D point cloud smoothing method based on guided filtering is used to effectively remove noise points and retain the details of the 3D point cloud. (3) The Delaunay triangulation method based on fast point positioning is proposed to accelerate the Delaunay triangulation. Finally, the surface of the point cloud is accurately reconstructed, and the 3D reconstruction of the binocular image pair is realized.
The remainder of this paper is organized as follows: Section 2 introduces related work similar to the proposed method in the field of stereo matching and Delaunay triangulation. Section 3 introduces the proposed 3D reconstruction method. Section 4 discusses the overall evaluations and discussions on the proposed method. Section 5 draws conclusions and outlines directions for future research.
2 Related work
The 3D reconstruction method based on binocular stereo matching obtains the parallax of the left and right images through stereo matching to restore depth information, and then constructs a surface triangle mesh and implements texture mapping to achieve 3D reconstruction. In the following, the latest developments in related research will be explained in terms of stereo matching and the construction of surface triangular meshes.
2.1 Stereo matching
Early stereo matching was based on feature point matching [18–20]. The sparse disparity map obtained this way must be converted into a dense disparity map through interpolation; however, the interpolation process is complicated, and feature extraction and localization have a great influence on the matching result. Sparse stereo matching is now mostly applied to tasks such as camera pose estimation in SLAM. To avoid the complex process and errors caused by interpolation, most existing methods for recovering the 3D information of a scene directly obtain a dense disparity map [21, 22].
The current research on stereo matching mainly focuses on matching strategies, matching cost calculation and cost aggregation. Stereo matching methods are divided into global methods [7, 9], local methods [4, 5] and semiglobal methods [12] according to the matching cost aggregation process. The global method is more robust to occlusions and weakly textured areas but is more complex and time-consuming. Compared with the global method, the local method is faster but less accurate. The semiglobal method (SGM) aggregates matching costs from multiple directions to achieve a balance of accuracy and efficiency and is widely used in vision-based 3D reconstruction. SGM uses hierarchical mutual information (HMI) as the matching cost and performs one-dimensional energy minimization along multiple directions to approximately replace two-dimensional global energy minimization. Woodford et al. [23] proposed quadratic pseudo-Boolean optimization to compute a second-order smoothness constraint and achieved superior performance in experiments. The proposed second-order semiglobal stereo matching method, which combines multiple matching costs, takes into account both time efficiency and robustness in weakly textured regions.
2.2 Mesh reconstruction based on triangulation
Triangulation divides discrete and disordered point clouds into mesh grids in 3D space. Usually, the point cloud is projected onto a two-dimensional plane to form a discrete point set in the plane area. Then, an irregular triangulated network of the point set is constructed. Among the methods for generating triangle meshes, the Delaunay triangulation method is the best, as it avoids the appearance of ill-conditioned triangles. Common methods for constructing Delaunay triangular meshes include the divide-and-conquer method [24], the point-by-point insertion method [25], the sweep-line method [26], and the triangulation growth method [27]. The triangulation growth method has been gradually abandoned due to its low efficiency. The divide-and-conquer method recursively divides the set of points and merges it level by level from bottom to top to generate the final triangulation. This is the most efficient method, but due to its use of recursion, it consumes considerable memory and cannot process a large amount of data. The point-by-point insertion method constructs a convex polygon containing all points, generates the initial triangulation, and inserts the remaining points one by one. This method occupies a small amount of memory and can handle a large amount of data. However, as the number of triangles increases, the efficiency of locating each insertion point gradually decreases. The proposed method of quickly locating insertion points improves the efficiency of Delaunay triangulation based on point-by-point insertion.
3 The proposed method
3.1 Matching cost of semiglobal stereo matching
In an actual scene, the binocular image pair can exhibit large differences in the gray values of pixels near a matching point due to inconsistent light intensity, camera exposure, or radiation intensity on the surface of the object. In the proposed method, the census metric and the gradient metric are used as the matching cost. The census transformation reflects the local structural features of the neighborhood of the matching point, and we denote the census transformation as T(p), where p represents the currently matched pixel. In Formula 1, Ccensus(p, d) represents the census metric between a pair of matching points with disparity value d in the binocular image pair. It is obtained by calculating the Hamming distance between T(p) and T(p − d).
Ccensus(p, d) = Hamming(T(p), T(p − d)) (1)
The gradient metric Cgradient(p, d) can be obtained from Formula 2, where Cgradient(p, d) represents the gradient metric of a matching point pair with a disparity d. ∇I(p) is the gradient value at pixel point p, which is obtained by the Sobel operator.
Cgradient(p, d) = |∇I(p) − ∇I(p − d)| (2)
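As a concrete sketch of the two metrics, the following C++ fragment (illustrative only, not the paper's implementation: it uses a 3×3 census window, and a central-difference gradient in place of the Sobel operator) computes a census code, the Hamming-distance cost, and the gradient cost on a toy grayscale image:

```cpp
#include <bitset>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Toy grayscale image: row-major vector of intensities.
struct Image {
    int w, h;
    std::vector<int> px;
    int at(int x, int y) const { return px[y * w + x]; }
};

// 3x3 census transform at (x, y): each bit records whether a
// neighbour is darker than the centre pixel.
uint8_t census3x3(const Image& img, int x, int y) {
    uint8_t code = 0;
    int c = img.at(x, y);
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx) {
            if (dx == 0 && dy == 0) continue;
            code = (code << 1) | (img.at(x + dx, y + dy) < c ? 1 : 0);
        }
    return code;
}

// Census matching cost: Hamming distance between the two codes.
int censusCost(uint8_t left, uint8_t right) {
    return static_cast<int>(std::bitset<8>(left ^ right).count());
}

// Gradient matching cost: absolute difference of horizontal gradients
// (a central-difference stand-in for the Sobel response in the paper).
int gradientCost(const Image& L, const Image& R, int x, int y, int d) {
    int gl = L.at(x + 1, y) - L.at(x - 1, y);
    int gr = R.at(x - d + 1, y) - R.at(x - d - 1, y);
    return std::abs(gl - gr);
}
```

In a full matcher, these per-pixel costs are evaluated for every candidate disparity d and then aggregated as described in the next paragraphs.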
Since both the census metric and the gradient metric are cost metrics for a single pixel, they are prone to mismatches under noise and lighting variations. The proposed second-order smoothness-constrained stereo matching cost aggregation method based on multidirectional angles can improve the matching accuracy of weakly textured regions. This is achieved by constraining the disparity difference in multiple angular directions to adapt to inclined or curved surfaces. In Fig 1, the direction r of pixel p is represented as a triangular area composed of pixels p, p − r, and p + r. Pixel p in direction r is expressed as:
(3)
Formula 4 gives the definition of the disparity smoothness constraint P3r(⋅) of pixel p in direction r. When the angle α of direction r is small, the smoothness constraint is increased to maintain the discontinuity of the disparity; when the angle α of direction r is large, the smoothness constraint is reduced to adapt to an inclined or curved plane:
(4)
In Formula 4, τ is the threshold used to prevent the parallax smoothness constraint from falling into the local minimum. The final second-order smoothness constraint is defined as:
(5)
where d is the disparity of pixel p and d′ is the disparity of pixel p − r. P1 and P2 are two constants with P2 > P1. The P2 penalty is applied when pixel p has large disparity variation to adapt to inclined or curved surfaces; the P1 penalty is applied when pixel p has small disparity variation to preserve the discontinuity of edges. The final matching cost must therefore let the cost aggregation at the current pixel be affected by all pixels in multiple directions. Eight directions are selected around the current pixel, and the matching cost in each direction is calculated using dynamic programming. Finally, the disparity is determined by the WTA (winner-takes-all) rule. The path cost of a single pixel in a certain direction is defined as follows:
Lr(p, d) = C(p, d) + min{Lr(p − r, d), Lr(p − r, d − 1) + P1, Lr(p − r, d + 1) + P1, mini Lr(p − r, i) + P2} − mink Lr(p − r, k) (6)
In Formula 6, Lr(p, d) represents the path cost of a single pixel in direction r, d is the disparity of the pixel, and C(p, d) represents the aggregation of Ccensus(p, d) and Cgradient(p, d).
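The path-cost recursion and the WTA rule can be sketched in C++ as follows. This is a minimal illustration of one aggregation step of standard SGM along a single direction, using constant penalties P1 and P2 rather than the angle-dependent constraint of Formula 4:

```cpp
#include <algorithm>
#include <vector>

// One SGM aggregation step along a path direction r: given the path costs
// of the previous pixel p - r for every disparity and the matching cost
// C(p, d), compute Lr(p, d). P1 and P2 are the small and large penalties.
std::vector<int> sgmStep(const std::vector<int>& prev,
                         const std::vector<int>& cost,
                         int P1, int P2) {
    int D = static_cast<int>(cost.size());
    int prevMin = *std::min_element(prev.begin(), prev.end());
    std::vector<int> out(D);
    for (int d = 0; d < D; ++d) {
        int best = prev[d];                                      // no disparity change
        if (d > 0)     best = std::min(best, prev[d - 1] + P1);  // change by 1
        if (d + 1 < D) best = std::min(best, prev[d + 1] + P1);
        best = std::min(best, prevMin + P2);                     // larger jump
        out[d] = cost[d] + best - prevMin;  // subtract prevMin to bound growth
    }
    return out;
}

// WTA rule: pick the disparity with the minimum aggregated cost.
int wta(const std::vector<int>& agg) {
    return static_cast<int>(std::min_element(agg.begin(), agg.end()) - agg.begin());
}
```

In the full method this step is run along eight path directions and the per-direction costs are summed before applying WTA.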
3.2 3D reconstruction based on guided filtering and Delaunay triangulation
The flowchart of the proposed 3D reconstruction method is shown in Fig 2. It includes point cloud computation, point cloud smoothing, surface reconstruction, and texture mapping. The point cloud is obtained by combining the disparity maps and camera parameters, the disparity maps are obtained by stereo matching and the camera parameters are obtained by camera calibration. The point cloud is smoothed by filtering to remove the noise points. Finally, surface reconstruction and texture mapping are performed to acquire an authentic 3D model. This section focuses on point cloud smoothing and surface reconstruction during 3D reconstruction.
3.2.1 Point cloud smoothing based on guided filtering.
Due to errors in camera calibration and stereo matching, the obtained 3D point cloud contains considerable noise. Therefore, the point cloud must be smoothed before surface reconstruction. This section adopts guided filtering to smooth the point cloud. Guided filtering is a local linear filtering method that effectively removes noise points while preserving detailed features. Guided filtering introduces a guidance image G; here, G is the input itself, and the relationship between the output image O and the input image I can be written as follows:
Op = Σq Wpq(G) Iq (7)
where q is a neighborhood pixel of pixel p, and the weight Wpq(G) is defined as follows:
Wpq(G) = (1/|w|²) Σk:(p,q)∈wk [1 + (Gp − μk)(Gq − μk)/(σk² + σ)] (8)
where μk and σk² are the mean and variance of the gray values of all pixels in the window wk, |w| is the number of pixels in the window, and σ is an adjusting parameter. Guided filtering is applied to the 3D point cloud, and the original point cloud and the smoothed point cloud satisfy a local linear relationship in depth. After guided filtering, each vertex moves along its normal direction, which is defined as follows:
V′ = V + d · n (9)
where V is a vertex in the point cloud, V′ is the vertex after smoothing by guided filtering, and d is a weighting factor. The normal direction n of vertex V is calculated by local surface fitting. A plane is established to estimate the local geometry of vertex V and is defined as follows:
Ax + By + Cz + D = 0 (10)
where A, B, C and D are plane parameters. The KD-tree algorithm is adopted to select ten points near vertex V as its spatial neighborhood. The coordinates of these points are substituted into Formula 10 so that the plane parameters can be estimated by the least squares method. Then, the updated normal direction is obtained as follows:
n = (A, B, C) / √(A² + B² + C²) (11)
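The least-squares plane fit and normal estimation can be sketched as follows. This is an illustrative C++ fragment, not the paper's code: it parameterizes the plane as z = ax + by + c (a common simplification of Formula 10 that assumes the local surface is not vertical), solves the 3×3 normal equations by Cramer's rule, and omits the ten-point KD-tree neighborhood query:

```cpp
#include <cmath>
#include <vector>

struct Vec3 { double x, y, z; };

// Fit z = a*x + b*y + c to the neighbourhood points by least squares,
// then return the unit normal (a, b, -1)/|(a, b, -1)| of the fitted plane.
Vec3 fitNormal(const std::vector<Vec3>& pts) {
    double sx = 0, sy = 0, sz = 0, sxx = 0, sxy = 0, syy = 0, sxz = 0, syz = 0;
    double n = double(pts.size());
    for (const auto& p : pts) {
        sx += p.x; sy += p.y; sz += p.z;
        sxx += p.x * p.x; sxy += p.x * p.y; syy += p.y * p.y;
        sxz += p.x * p.z; syz += p.y * p.z;
    }
    // Normal equations: M * (a, b, c)^T = r
    double M[3][3] = {{sxx, sxy, sx}, {sxy, syy, sy}, {sx, sy, n}};
    double r[3] = {sxz, syz, sz};
    auto det3 = [](double m[3][3]) {
        return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
             - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
             + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
    };
    double D = det3(M);
    double sol[3];
    for (int k = 0; k < 3; ++k) {  // Cramer's rule: replace column k by r
        double Mk[3][3];
        for (int i = 0; i < 3; ++i)
            for (int j = 0; j < 3; ++j)
                Mk[i][j] = (j == k) ? r[i] : M[i][j];
        sol[k] = det3(Mk) / D;
    }
    double len = std::sqrt(sol[0] * sol[0] + sol[1] * sol[1] + 1.0);
    return {sol[0] / len, sol[1] / len, -1.0 / len};
}
```

For points sampled exactly from a plane z = 2x + 3y + 1, the returned normal is (2, 3, −1)/√14 up to sign, matching Formula 11 after normalization.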
The weighting factor d is determined by the depth similarity between the center point and its local neighborhood points. The depth similarity is the projection of the distance vector onto the normal direction of the center point and is defined as follows:
(12)
where SV is a neighborhood point set centered on vertex V, p and q are the neighborhood points of vertex V, and S is the size of set SV. <n, V − p> and <n, V − q> indicate the depth similarity between the two points p and q and vertex V. μv and σv indicate the mean and variance of all points’ depth similarity in the set SV, respectively.
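The image-domain guided filter underlying this smoothing step can be illustrated in one dimension. This is a deliberately simplified sketch: it is self-guided (guide = input), evaluates the linear coefficients only in the window centred at each sample rather than averaging over all overlapping windows as the full guided filter does, and operates on a signal rather than a point cloud:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Minimal 1D self-guided filter sketch. In each window the output is the
// local linear model a*I + b, where a -> 1 near strong edges (high local
// variance) and a -> 0 in flat regions, so edges survive while flat
// regions are smoothed toward the local mean.
std::vector<double> guidedFilter1D(const std::vector<double>& I,
                                   std::size_t r, double eps) {
    std::size_t n = I.size();
    std::vector<double> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        std::size_t lo = i >= r ? i - r : 0;
        std::size_t hi = std::min(n - 1, i + r);
        double m = 0.0, m2 = 0.0;
        for (std::size_t j = lo; j <= hi; ++j) { m += I[j]; m2 += I[j] * I[j]; }
        double cnt = double(hi - lo + 1);
        m /= cnt; m2 /= cnt;
        double var = m2 - m * m;
        double a = var / (var + eps);  // near 1 at edges, near 0 in flat areas
        double b = (1.0 - a) * m;
        out[i] = a * I[i] + b;         // smooths flat regions, keeps edges
    }
    return out;
}
```

The regularizer eps plays the role of the adjusting parameter σ in Formula 8: larger values push a toward 0 and give stronger smoothing.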
3.2.2 Projection-based Delaunay triangulation.
To improve the visibility of the 3D model, surface patches must be added to the point cloud. A surface patch can be any shape; generally, a triangle or quadrangle is selected. Since a triangle is the smallest unit that constitutes a plane, it does not easily deform during rotation. Therefore, the triangle is selected as the basic unit for triangulation. As shown in Fig 3, projection-based Delaunay triangulation can be roughly divided into the following three steps:
- Map a 3D point cloud to a 2D plane using orthogonal projections based on the normal direction.
- Triangulate the 2D point set obtained from the mapping according to the Delaunay criterion to determine the topological connection relationship between points.
- Determine the topological connection between the original 3D points according to the topological connection relationship of the projection points in the plane. The obtained triangular mesh is the reconstructed surface model.
Among triangulation methods, Delaunay triangulation has the best mathematical properties, such as uniqueness, optimality (it maximizes the minimum angle, yielding the most regular triangles) and the empty-circumcircle property, so it is adopted for surface reconstruction. A projection-based Delaunay triangulation method is performed. First, the 3D point set is mapped to the xOy plane. Second, the planar point set is triangulated by the Delaunay triangulation method based on the Bowyer-Watson algorithm [28]. Finally, the planar triangle mesh is mapped back to 3D space to generate the 3D model. Then, texture mapping is performed to restore the surface features of the 3D model. A flow chart of the projection-based Delaunay triangulation method is shown in Fig 4.
(a) 3D point set (b) plane point set (c) plane triangulation (d) 3D triangulation (e) texture mapping.
The Bowyer-Watson method is currently the most common point-by-point insertion method, as shown in Fig 5. First, a supertriangle containing all of the scatter points is constructed. Second, each point is inserted in turn, and the influence triangles containing the point are found. Third, the common edges of the influence triangles are deleted, and the point and all the vertices of the influence triangles are joined. Finally, the unique triangular mesh model is obtained by adjusting the diagonal lines until all the points are inserted.
(a) inserting the point P (b) searching for the influence triangles (c) deleting common edge AB (d) forming triangles.
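The influence-triangle test at the heart of the Bowyer-Watson method is the classical in-circumcircle predicate, which can be written as a 3×3 determinant. The fragment below is a standard illustration (not the paper's code) and assumes the triangle vertices are given in counter-clockwise order:

```cpp
struct Pt { double x, y; };

double det3x3(double a, double b, double c,
              double d, double e, double f,
              double g, double h, double i) {
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g);
}

// Returns true when point d lies strictly inside the circumcircle of the
// counter-clockwise triangle (a, b, c), i.e. (a, b, c) is an "influence
// triangle" of the inserted point d.
bool inCircumcircle(Pt a, Pt b, Pt c, Pt d) {
    double ax = a.x - d.x, ay = a.y - d.y;
    double bx = b.x - d.x, by = b.y - d.y;
    double cx = c.x - d.x, cy = c.y - d.y;
    return det3x3(ax, ay, ax * ax + ay * ay,
                  bx, by, bx * bx + by * by,
                  cx, cy, cx * cx + cy * cy) > 0.0;
}
```

Each insertion collects all triangles for which this predicate is true, deletes their shared edges, and re-triangulates the resulting cavity around the new point.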
In the process described above, each time a point is inserted, the triangle containing the insertion point and the influence range of the insertion point must be found. The original Bowyer-Watson method needs to traverse the edges and vertices of all triangles, so as the number of triangles increases, the time cost of these two steps increases significantly. A method for quickly locating the insertion point based on direction search is proposed, which can quickly locate the triangle containing the point to be inserted. In the following, G denotes the barycenter of a triangle and P denotes the point to be inserted. Take the most recently generated triangle as the initial triangle. Starting from the initial triangle, the search direction is determined by the relative position of G and P. When G and P are on different sides of a certain edge, the next triangle to be searched is the triangle adjacent to that edge. The search stops when G and P are on the same side of all three edges of the triangle; this triangle contains the point to be inserted. Fig 6 shows the direction search method.
In Fig 6, P is the point to be inserted, S is the initial triangle, and T is the target triangle. In the implementation, the vertices of each triangle are stored clockwise, so that the barycenter of the triangle is always on the right side of each directed edge. It is then only necessary to determine whether the point to be inserted is on the left side of an edge. Some special cases:
- The point to be inserted is on a certain edge of the triangle: it is regarded as inside the triangle.
- The point to be inserted is the vertex of the triangle: do not insert the point.
- The point to be inserted is on the extension line of a certain edge of the triangle: it is considered to be on the same side of the edge as the barycentric. Continue to evaluate the next edge.
Because points are inserted one by one and successive insertion points tend to be spatially close, the direction search method locates the triangle containing each point very quickly.
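The same-side test that drives the direction search can be sketched with a signed-area (cross-product) orientation predicate. This illustrative fragment assumes clockwise vertex order as described above and returns the index of the edge to walk across, or −1 when the current triangle already contains the point (the edge cases are treated as inside, matching the rules in the text):

```cpp
struct P2 { double x, y; };

// Signed area test: > 0 when p is strictly left of the directed edge
// a -> b, < 0 when right, 0 when on the line through a and b.
double cross(P2 a, P2 b, P2 p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// With clockwise vertices v[0], v[1], v[2], the barycenter G is always on
// the right of each directed edge, so if p is strictly left of edge i,
// G and p are on different sides and the walk crosses that edge.
int nextEdge(const P2 v[3], P2 p) {
    for (int i = 0; i < 3; ++i)
        if (cross(v[i], v[(i + 1) % 3], p) > 0.0)
            return i;        // cross into the neighbour sharing edge i
    return -1;               // p is inside (or on the boundary of) v
}
```

Repeatedly applying nextEdge and stepping into the adjacent triangle implements the walk from the initial triangle S to the target triangle T in Fig 6.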
4 Results and discussion
To verify the effectiveness of the proposed method, it is implemented in C++ with OpenCV and OpenGL. The hardware platform is an Intel Core i5-4210H@2.90 GHz. The Middlebury dataset [29, 30] and the KITTI dataset [31] are adopted to evaluate the performance of the proposed method, and the depth information is compared with the ground truth to quantitatively analyze the reconstruction performance. The scenes of the Middlebury dataset are indoor environments, so a qualitative analysis of 3D reconstruction visualization is performed on the Middlebury dataset. In addition, our results are evaluated on the KITTI 2012 and KITTI 2015 benchmarks to quantitatively analyze the reconstruction quality; the contrast methods have certain similarities with the proposed semiglobal stereo matching method. We also compare the proposed Delaunay triangulation based on fast point positioning with the original Delaunay triangulation based on point-by-point insertion to show the speed advantage of the proposed fast point location method.
4.1 Middlebury dataset
In this section, six image pairs from the Middlebury dataset are selected for 3D reconstruction experiments. The experimental results are shown in Figs 7–11 for the Cloth1, Wood1, Djembe, Piano, and Shelves scenes, respectively. In these figures, (a)-(d) are the left image, right image, disparity image, and Delaunay triangulation, while (e)-(h) are the 3D reconstruction results of these scenes from different angles. The 3D reconstruction results for the Cloth1 and Cloth3 scenes are accurate, and the proposed method can accurately restore the surface of the cloth. The 3D reconstruction results of the other scenes are less ideal due to depth discontinuities in those scenes.
(a) left image, (b) right image, (c) disparity image, (d) Delaunay triangulation, (e) view 1, (f) view 2, (g) view 3, (h) view 4. (a) and (b) are republished from [vision.middlebury.edu/stereo/] under a CC BY license, with permission from [D. Scharstein], original copyright [2002].
(a) left image, (b) right image, (c) disparity image, (d) Delaunay triangulation, (e) view 1, (f) view 2, (g) view 3, (h) view 4. (a) and (b) are republished from [vision.middlebury.edu/stereo/] under a CC BY license, with permission from [D. Scharstein], original copyright [2002].
(a) left image, (b) right image, (c) disparity image, (d) Delaunay triangulation, (e) view 1, (f) view 2, (g) view 3, (h) view 4. (a) and (b) are republished from [vision.middlebury.edu/stereo/] under a CC BY license, with permission from [D. Scharstein], original copyright [2002].
(a) left image, (b) right image, (c) disparity image, (d) Delaunay triangulation, (e) view 1, (f) view 2, (g) view 3, (h) view 4. (a) and (b) are republished from [vision.middlebury.edu/stereo/] under a CC BY license, with permission from [D. Scharstein], original copyright [2002].
(a) left image, (b) right image, (c) disparity image, (d) Delaunay triangulation, (e) view 1, (f) view 2, (g) view 3, (h) view 4. (a) and (b) are republished from [vision.middlebury.edu/stereo/] under a CC BY license, with permission from [D. Scharstein], original copyright [2002].
The proposed method was also compared with other stereo matching methods, and the comparison results are shown in Figs 12 and 13. We compare the disparity maps generated by different stereo matching methods as a qualitative analysis of 3D reconstruction because comparing the texture-mapped models of different methods would make it difficult to distinguish the contrast methods from the proposed method. Fig 12 shows the Wood1 scene in the Middlebury2 dataset, where the comparison methods are ARW [32] and MST-CD2 [33]. Fig 13 shows the Piano and Shelves scenes of the Middlebury3 dataset, where the comparison methods are CSCA [34], REAF [35] and SGM [12]. In these scenes, the proposed method produces more accurate disparity edges than the other methods, so regions of different depths can be distinguished more accurately.
The left image in the figure is republished from [vision.middlebury.edu/stereo/] under a CC BY license, with permission from [D. Scharstein], original copyright [2002].
The left image in the figure is republished from [vision.middlebury.edu/stereo/] under a CC BY license, with permission from [D. Scharstein], original copyright [2002].
The quantitative experimental results on Middlebury2 and Middlebury3 also show the superior performance of the proposed method. Three scenes (Cloth1, Cloth3 and Wood1) in the Middlebury2 dataset and three scenes (Djembe, Piano and Shelves) in the Middlebury3 dataset are used for comparative experiments. We evaluate the 3D reconstruction quality of each method by calculating the average error distance between the generated point cloud and the ground truth, where the ground-truth point cloud is generated from the ground-truth disparity map. Each scene in Middlebury2 and Middlebury3 contains more than 150,000 points and more than 350,000 points, respectively. Table 1 shows the experimental results: each entry is the average distance between each point and the ground truth in a given scene, in millimeters.
4.2 KITTI dataset
The proposed method was evaluated on the KITTI 2012 and KITTI 2015 benchmarks and compared with other methods. Since the scenes of the KITTI dataset are outdoor environments, texture mapping was not implemented, because implementing texture mapping where the depth is not continuous would distort the reconstruction results. Fig 14 shows three scenes in KITTI 2012, which can be downloaded via https://osf.io/7srj4. Table 2 shows the evaluation results of the proposed method on the KITTI 2012 benchmark. Compared with other semiglobal stereo matching methods, the proposed second-order semiglobal stereo matching method has advantages in various evaluation indicators. Fig 15 shows three scenes in KITTI 2015, which can be downloaded via https://osf.io/9kzma. Table 3 shows the evaluation results of the proposed method on the KITTI 2015 benchmark.
It can be downloaded via https://osf.io/7srj4.
It can be downloaded via https://osf.io/9kzma.
4.3 Results of Delaunay triangulation based on fast point positioning
The proposed Delaunay triangulation based on fast point positioning was evaluated on the Middlebury dataset and the KITTI dataset. Table 4 shows the experimental results. The original Delaunay triangulation based on point-by-point insertion uses the implementation in OpenCV, which is denoted as Ori-D in Table 4.
In Table 4, Favg represents the average number of feature points per image, which is the average size of the point set processed by Delaunay triangulation. K-12 represents the KITTI 2012 dataset, which contains 388 images; K-15 represents the KITTI 2015 dataset, which contains 398 images; M-2 represents the Middlebury2 dataset, which contains 31 images; M-3 represents the Middlebury3 dataset, which contains 23 images. Fast-D in Table 4 represents the proposed approach. Compared with the original Delaunay triangulation based on point-by-point insertion, the proposed fast point positioning method significantly improves time efficiency. When processing a large number of points (for example, 1000 points), the proposed method finishes within 1 ms, whereas the Delaunay triangulation implementation in OpenCV cannot handle that amount of data.
5 Conclusions
3D reconstruction based on binocular vision combines camera parameters and disparity maps to calculate all of the point coordinates in a scene and perform a surface reconstruction of the point cloud. In this paper, a 3D reconstruction method based on binocular stereo matching and fast point positioning of Delaunay triangulation is proposed to improve the accuracy and efficiency of 3D reconstruction. First, a second-order semiglobal stereo matching method combining multiple matching costs is used to obtain a more robust disparity map. Second, the 3D point cloud is obtained with camera parameters and disparity maps, which is smoothed by guided filtering to remove the noise points and retain details. Finally, the Delaunay triangulation method based on fast point positioning is used to reconstruct the surface of the point cloud and improve the visibility of the 3D model.
The Middlebury and KITTI datasets are used to evaluate the performance of the proposed method. The proposed method can obtain accurate depth information and reconstruct the surface of the point cloud to obtain a 3D model. The depth information obtained thus far only comes from visible light images. In future work, we will combine the accurate 3D information of LIDAR and the guidance of visible light images to achieve high-precision 3D information perception. To further improve the authenticity of the 3D model, we will also study triangulation in cases with shadows and occlusions in the future.
References
- 1. Scharstein D, Szeliski R. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International journal of computer vision. 2002;47(1):7–42.
- 2. Zhang K, Lu J, Lafruit G. Cross-based local stereo matching using orthogonal integral images. IEEE transactions on circuits and systems for video technology. 2009;19(7):1073–1079.
- 3. Li L, Yu X, Zhang S, Zhao X, Zhang L. 3D cost aggregation with multiple minimum spanning trees for stereo matching. Applied optics. 2017;56(12):3411–3420. pmid:28430267
- 4. Lu H, Xu H, Zhang L, Ma Y, Zhao Y. Cascaded multi-scale and multi-dimension convolutional neural network for stereo matching. In: 2018 IEEE Visual Communications and Image Processing (VCIP). IEEE; 2018. p. 1–4.
- 5. Yang J, Wang H, Ding Z, Lv Z, Wei W, Song H. Local stereo matching based on support weight with motion flow for dynamic scene. IEEE Access. 2016;4:4840–4847.
- 6. Veksler O. Stereo correspondence by dynamic programming on a tree. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05). vol. 2. IEEE; 2005. p. 384–390.
- 7. Suhr JK, Jung HG. Dense stereo-based robust vertical road profile estimation using Hough transform and dynamic programming. IEEE Transactions on Intelligent Transportation Systems. 2014;16(3):1528–1536.
- 8. Kolmogorov V, Zabih R. Computing visual correspondence with occlusions using graph cuts. In: Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001. vol. 2. IEEE; 2001. p. 508–515.
- 9. Taniai T, Matsushita Y, Naemura T. Graph cut based continuous stereo matching using locally shared labels. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2014. p. 1613–1620.
- 10. Sun J, Zheng NN, Shum HY. Stereo matching using belief propagation. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2003;25(7):787–800.
- 11. Besse F, Rother C, Fitzgibbon A, Kautz J. PMBP: Patchmatch belief propagation for correspondence field estimation. International Journal of Computer Vision. 2014;110(1):2–13.
- 12. Hirschmuller H. Stereo processing by semiglobal matching and mutual information. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2007;30(2):328–341.
- 13. Tomasi C, Manduchi R. Bilateral filtering for gray and color images. In: Sixth International Conference on Computer Vision (IEEE Cat. No. 98CH36271). IEEE; 1998. p. 839–846.
- 14. He K, Sun J, Tang X. Guided image filtering. In: European Conference on Computer Vision. Springer; 2010. p. 1–14.
- 15. Hoppe H, DeRose T, Duchamp T, McDonald J, Stuetzle W. Surface reconstruction from unorganized points. In: Proceedings of the 19th Annual Conference on Computer Graphics and Interactive Techniques; 1992. p. 71–78.
- 16. Kazhdan M. Reconstruction of solid models from oriented point sets. In: Proceedings of the Third Eurographics Symposium on Geometry Processing; 2005. p. 73–es.
- 17.
Delaunay B. Sur la sphère vide. A la mémoire de Georges Vorono. Bulletin de l’Académie des Sciences de l’URSS Classe des sciences mathématiques et naturelles;6:793.
- 18.
Wang L, Yang R. Global stereo matching leveraged by sparse ground control points. In: CVPR 2011. IEEE; 2011. p. 3033–3040.
- 19.
Sarkis M, Diepold K. Sparse stereo matching using belief propagation. In: 2008 15th IEEE International Conference on Image Processing. IEEE; 2008. p. 1780–1783.
- 20.
Schauwecker K, Klette R, Zell A. A new feature detector and stereo matching method for accurate high-performance sparse stereo matching. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE; 2012. p. 5171–5176.
- 21.
Kim JC, Lee KM, Choi BT, Lee SU. A dense stereo matching using two-pass dynamic programming with generalized ground control points. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05). vol. 2. IEEE; 2005. p. 1075–1082.
- 22. Stentoumis C, Grammatikopoulos L, Kalisperakis I, Karras G. On accurate dense stereo-matching using a local adaptive multi-cost approach. ISPRS Journal of Photogrammetry and Remote Sensing. 2014;91:29–49.
- 23. Woodford O, Torr P, Reid I, Fitzgibbon A. Global stereo reconstruction under second-order smoothness priors. IEEE transactions on pattern analysis and machine intelligence. 2009;31(12):2115–2128. pmid:19834135
- 24.
Shamos MI, Hoey D. Closest-point problems. In: 16th Annual Symposium on Foundations of Computer Science (sfcs 1975). IEEE; 1975. p. 151–162.
- 25. Lewis B, Robinson J. Triangulation of planar regions with applications. The Computer Journal. 1978;21(4):324–332.
- 26.
Shewchuk JR. Triangle: Engineering a 2D quality mesh generator and Delaunay triangulator. In: Workshop on Applied Computational Geometry. Springer; 1996. p. 203–222.
- 27. Green PJ, Sibson R. Computing Dirichlet tessellations in the plane. The computer journal. 1978;21(2):168–173.
- 28. Bowyer A. Computing dirichlet tessellations. The computer journal. 1981;24(2):162–166.
- 29.
Scharstein D, Szeliski R. High-accuracy stereo depth maps using structured light. In: 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings. vol. 1. IEEE; 2003. p. I–I.
- 30.
Scharstein D, Hirschmüller H, Kitajima Y, Krathwohl G, Nešić N, Wang X, et al. High-resolution stereo datasets with subpixel-accurate ground truth. In: German conference on pattern recognition. Springer; 2014. p. 31–42.
- 31.
Geiger A, Lenz P, Urtasun R. Are we ready for autonomous driving? the kitti vision benchmark suite. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE; 2012. p. 3354–3361.
- 32. Lee S, Lee JH, Lim J, Suh IH. Robust stereo matching using adaptive random walk with restart algorithm. Image and Vision Computing. 2015;37:1–11.
- 33.
Yao P, Zhang H, Xue Y, Zhou M, Xu G, Gao Z. Iterative color-depth MST cost aggregation for stereo matching. In: 2016 IEEE International Conference on Multimedia and Expo (ICME). IEEE; 2016. p. 1–6.
- 34.
Zhang K, Fang Y, Min D, Sun L, Yang S, Yan S, et al. Cross-scale cost aggregation for stereo matching. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2014. p. 1590–1597.
- 35.
Cigla C. Recursive edge-aware filters for stereo matching. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops; 2015. p. 27–34.
- 36.
Hermann S, Klette R. Iterative semi-global matching for robust driver assistance systems. In: Asian Conference on Computer Vision. Springer; 2012. p. 465–478.
- 37.
Spangenberg R, Langner T, Rojas R. Weighted semi-global matching and center-symmetric census transform for robust driver assistance. In: International Conference on Computer Analysis of Images and Patterns. Springer; 2013. p. 34–41.
- 38. Lee Y, Park MG, Hwang Y, Shin Y, Kyung CM. Memory-efficient parametric semiglobal matching. IEEE Signal Processing Letters. 2017;25(2):194–198.
- 39.
Schuster R, Wasenmüller O, Stricker D. Dense scene flow from stereo disparity and optical flow. arXiv preprint arXiv:180810146. 2018;.