A simple approximation algorithm for the diameter of a set of points in an Euclidean plane

Jieying Hong; Zhipeng Wang; Wei Niu

doi:10.1371/journal.pone.0211201

Abstract

Approximation algorithms with linear complexity are required for processing big data. However, existing algorithms cannot output the diameter of a set of points with arbitrary accuracy and near-linear complexity. Using a partition technique, we introduce a very simple approximation algorithm with arbitrary accuracy ε and a complexity of O(N + ε⁻¹ log ε⁻¹) for the case in which all points lie in a Euclidean plane. The error bounds are rigorously proved and verified by numerical tests. This complexity is better than that of existing algorithms, and the present algorithm is also very simple to implement in applications.

Citation: Hong J, Wang Z, Niu W (2019) A simple approximation algorithm for the diameter of a set of points in an Euclidean plane. PLoS ONE 14(2): e0211201. https://doi.org/10.1371/journal.pone.0211201

Editor: J. Alberto Conejero, IUMPA - Universitat Politecnica de Valencia, SPAIN

Received: September 12, 2017; Accepted: September 20, 2018; Published: February 8, 2019

Copyright: © 2019 Hong et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the paper and its Supporting Information files.

Funding: This work was supported by National Natural Science Foundation of China (Grant Nos. 11601023). The website of the funder is http://www.nsfc.gov.cn/. The author Wei Niu received the funding. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Given a finite set of points T in the 2D Euclidean plane ℝ², its diameter, denoted by d_T, is defined as the maximum distance between two points of T. Computing the diameter of a point set is a fundamental problem in computer science. It has been proved that, in a Euclidean plane, finding the exact diameter of a set of N points can be reduced to constructing their convex hull, with a lower bound of complexity O(N log N) [1–4].
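The reduction to the convex hull can be sketched in a few lines of Python. This is only a minimal illustration and not the method of [1–4]: the function names `convex_hull` and `exact_diameter` are hypothetical, the hull is built with Andrew's monotone chain (an O(N log N) construction chosen here for brevity), and the last step compares hull vertices pairwise instead of using a linear-time scan such as rotating calipers. Points are assumed to be given as (x, y) tuples.

```python
from itertools import combinations
from math import dist


def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive for a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


def exact_diameter(points):
    """Exact diameter d_T: the farthest pair consists of hull vertices."""
    hull = convex_hull(points)
    if len(hull) < 2:
        return 0.0
    # Pairwise comparison of hull vertices, quadratic in the hull size.
    return max(dist(a, b) for a, b in combinations(hull, 2))


square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
print(exact_diameter(square))  # length of the diagonal of the unit square
```

The sort dominates the cost of the hull construction, which is where the N log N term comes from.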

In the science of big data, this classical problem faces new challenges. The number of points N can be huge, and one usually seeks linear or sub-linear algorithms to replace the O(N log N) one. Since O(N log N) is the lower bound for the exact problem, such algorithms are necessarily approximate. In the present paper we consider only algorithms without pre-processing. In this setting, no sub-linear algorithm can guarantee the accuracy of the approximate diameter, because merely visiting all the points already requires O(N) operations. Therefore, if we want an approximate diameter with a guaranteed error estimate, linear complexity is the lower bound.

As an introduction, we first present the simplest approximation algorithm. Given an arbitrary point T_i ∈ T, this algorithm simply reports its maximum distance to the other points, i.e., d_a = max_j |T_i T_j|, with |T_i T_j| the distance between points T_i, T_j ∈ T, as the approximate diameter d_a. It is easy to show that 1 ≤ d_T/d_a ≤ 2, implying a very low accuracy of the approximation.
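This simplest algorithm fits in a single pass over the points. The sketch below is an illustration under the assumption that points are (x, y) tuples; the function name `farthest_from_anchor` and the parameter `anchor_index` are hypothetical.

```python
from math import dist


def farthest_from_anchor(points, anchor_index=0):
    """Approximate diameter d_a: largest distance from one chosen point."""
    anchor = points[anchor_index]
    return max(dist(anchor, p) for p in points)


# Worst case for the factor 2: the anchor sits midway between the two
# points that realise the true diameter d_T = 6.
line = [(0.0, 0.0), (3.0, 0.0), (-3.0, 0.0)]
print(farthest_from_anchor(line, 0))  # 3.0, so d_T/d_a = 2
print(farthest_from_anchor(line, 1))  # 6.0, so d_T/d_a = 1
```

The bound 1 ≤ d_T/d_a ≤ 2 follows from the triangle inequality: any two points of T are each within d_a of the anchor.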

Two earlier works improve this approximation. Egecioglu and Kalantari designed an iterative algorithm whose reported approximate diameter d_m after m iterations satisfies a guaranteed bound [5]. Recently, Alipour et al. improved this algorithm to require fewer iterations; however, the accuracy of the approximation did not change [6].

Another type of approximation problem introduces an arbitrary number 0 < ε < 1 and aims at outputting an approximate diameter d_o in linear time, such that (1) Note that this is equivalent to the description that , indicating that if Eq (1) is satisfied, we can output instead of to satisfy that (2) which is formally consistent with the literature. Such problems are usually called (1 + ε)-approximations. In two dimensions, many approximation algorithms with various near-linear complexities have been developed. Table 1 gives a comparison of different (1 + ε)-approximation algorithms for computing the diameter of T, in chronological order. However, most of these algorithms are difficult to implement in practice, since they require complicated computational-geometry constructions. In this paper, we introduce a very simple algorithm that attains a near-linear complexity of O(N + ε⁻¹ log ε⁻¹) and is much simpler to implement in applications.

Table 1. Comparison of different (1 + ε)-approximation algorithms to compute the diameter of T.

https://doi.org/10.1371/journal.pone.0211201.t001

Approximation algorithm for the diameter in an Euclidean plane

For the finite set of points T in ℝ², we choose a point O ∈ T arbitrarily as the origin, and then divide the plane around O into 6n equal angular regions, where n is a positive integer, so that each region has central angle π/(3n). In each region S_i (i = 1, …, 6n), we find the point farthest from the origin and let r_i denote its distance to the origin. Using the origin O as the center and r_i as the radius, we obtain 6n circular sectors, as shown in Fig 1. The choice of 6n regions is what allows the analytical error estimate derived in the following parts. Let p_i (i = 1, …, 6n) be the midpoint of the arc of the i-th sector, and compute the largest distance d_p among these 6n midpoints. The main theorem of this paper, stated below, relates the diameter d_T of the point set T to the largest distance d_p. Note that the virtual points p_i may differ from the real points in T.

Fig 1. Diagram of the partition for the point set.

Solid points are real points in the set T, while empty points are the virtual points p_i.

https://doi.org/10.1371/journal.pone.0211201.g001
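The partition procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function name `partition_diameter` is hypothetical, sector 0 is assumed to start at polar angle 0 measured from the origin O, an empty sector is given radius 0 (so its virtual point coincides with O), and the 6n midpoints are compared by brute force.

```python
from itertools import combinations
from math import atan2, cos, dist, hypot, pi, sin


def partition_diameter(points, n, origin=None):
    """Largest distance d_p among the 6n virtual arc midpoints."""
    if origin is None:
        origin = points[0]              # any point of T may serve as O
    ox, oy = origin
    sectors = 6 * n
    width = 2 * pi / sectors            # central angle pi / (3n)
    radii = [0.0] * sectors             # r_i, zero for empty sectors

    # One pass over T: keep the largest radius seen in each sector.
    for x, y in points:
        dx, dy = x - ox, y - oy
        angle = atan2(dy, dx) % (2 * pi)
        i = min(int(angle / width), sectors - 1)   # guard against rounding
        radii[i] = max(radii[i], hypot(dx, dy))

    # Virtual point p_i: midpoint of the arc of sector i, relative to O.
    midpoints = [(r * cos((i + 0.5) * width), r * sin((i + 0.5) * width))
                 for i, r in enumerate(radii)]
    return max(dist(a, b) for a, b in combinations(midpoints, 2))


# Two points at radius 2 on opposite arc midpoints (30 and 210 degrees).
pts = [(0.0, 0.0),
       (2 * cos(pi / 6), 2 * sin(pi / 6)),
       (-2 * cos(pi / 6), -2 * sin(pi / 6))]
print(partition_diameter(pts, n=1))
```

Distances are translation invariant, so the midpoints are stored relative to O. The pass over T touches each point once, and only the 6n radii are kept in memory.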

Theorem 1. Let d_T be the diameter of a finite set of points T in ℝ², and let d_p be the largest distance among the 6n virtual midpoints p_i defined above. Then the following statement holds: (3)

In the following we give the proof of Theorem 1 in two parts, proving the lower and upper bounds of d_T/d_p respectively.

Lower bound of d_T/d_p

Without loss of generality, suppose that one endpoint of the line segment d_p is in a region S. We denote the opposite region by S₀ and number the other regions clockwise as S₁, …, S_6n−1, so that the region S is exactly the region S_3n. Let the line passing through the origin O and the midpoint of the arc of the region S be the x-axis; this fixes a Cartesian coordinate system in the plane, as shown in Fig 2. The coordinate of the midpoint p_i of the arc in each region S_i is , where i = 1, …, 6n − 1, and r_i is the radius of the sector region S_i. Before proving the lower bound of d_T/d_p, we state the following lemmas.

Fig 2. Diagram of the 6n regions for the point set and the Cartesian coordinate system.

https://doi.org/10.1371/journal.pone.0211201.g002

Lemma 1. If one endpoint of the line segment d_p is in the region S (i.e., S_3n) as supposed above, then the other endpoint of d_p cannot lie in the regions S_2n+1, …, S_3n−1.

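Proof. Denote R = max(r₀, …, r_6n−1); then the relationship R ≤ d_p is obvious. For the point p_i in one of the regions S_2n+1, …, S_3n−1 (i.e., i ∈ ⟦2n + 1, 3n − 1⟧) and the point p_3n(−r, 0) in the region S, the angle between Op_i and Op_3n is (3n − i)π/(3n), so the law of cosines gives the distance between these two points: |p_i p_3n|² = r² + r_i² − 2 r r_i cos((3n − i)π/(3n)). (4) Since 0 < (3n − i)π/(3n) < π/3 for these i, the cosine is greater than 1/2, and obviously |p_i p_3n|² ≤ r² − r r_i + r_i², with strict inequality whenever r r_i > 0.

Let f(r) = r² − r r_i + r_i²; then f′(r) = 2r − r_i can be easily obtained. Since r, r_i ∈ [0, R], we have f′(r) < 0 when r ∈ [0, r_i/2), and f′(r) > 0 when r ∈ (r_i/2, R]. Thus f(r)_max = f(0) or f(R).

When r = 0, f(0) = r_i² ≤ R²; and when r = R, f(R) = R² − R r_i + r_i². Let g(r_i) = R² − R r_i + r_i². Since r_i ∈ [0, R], the same argument gives the maximum of g(r_i): g(r_i)_max = g(0) or g(R), and g(0) = g(R) = R². Thus g(r_i) ≤ R².

Therefore, f(r) ≤ R², and equality holds when (r = R, r_i = 0) or (r = R, r_i = R) or (r_i = R, r = 0). In this way, we can prove that |p_i p_3n|² ≤ f(r) ≤ R² ≤ d_p². That means the distance |p_i p_3n| cannot exceed d_p, and is strictly less than d_p whenever r r_i > 0.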
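The maximisation step in this proof can be checked numerically. The sketch below assumes that f is the quadratic f(r) = r² − r·r_i + r_i², which is the form consistent with the stated derivative f′(r) = 2r − r_i and with g(0) = g(R) = R²; the function names and the grid resolution are hypothetical choices made for illustration.

```python
def f(r, r_i):
    """Assumed quadratic bound, with derivative 2r - r_i in r."""
    return r * r - r * r_i + r_i * r_i


def max_f_on_grid(R, steps=200):
    """Largest value of f over a uniform grid on [0, R] x [0, R]."""
    grid = [R * k / steps for k in range(steps + 1)]
    return max(f(r, r_i) for r in grid for r_i in grid)


R = 2.5
print(max_f_on_grid(R), R * R)   # the grid maximum should not exceed R^2
print(f(R, 0.0), f(R, R), f(0.0, R))  # the three equality cases
```

A grid search is of course no substitute for the analytic argument; it only confirms that the maximum R² is attained at the corners named in the proof.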

According to the symmetry of the regions, the following lemma can be easily obtained.

Lemma 2. If one endpoint of the line segment d_p is in the region S (i.e., S_3n), then the other endpoint of d_p cannot lie in the regions S_3n+1, …, S_4n−1.

Moreover, the cases for an endpoint of d_p in the regions S₀, S_6n−1, …, S_4n are equivalent to those in the regions S₀, S₁, …, S_2n. Therefore, if we suppose that one endpoint of the line segment d_p is in the region S, then from Lemmas 1 and 2 we only need to consider the 2n + 1 cases in which the other endpoint of d_p is in the regions S₀, …, S_2n. In what follows we consider two cases in order to compute the lower bound of d_T/d_p.

Case I: i ∈ ⟦0, 2n − 1⟧

As supposed above, one endpoint of d_p is in the region S, so there certainly exists a point q₁ on the arc of the region S. The coordinate of q₁ is (r cos θ, r sin θ), where and r is the radius of the sector region S. If the other endpoint of d_p is in the region S_i (i = 0, 1, …, 2n − 1), then there certainly exists a point q₂ on the arc of the region S_i, and the coordinate of q₂ is (−r_i cos θ_i, r_i sin θ_i), where and r_i is the radius of the sector region S_i.

The distance between the two points q₁ and q₂ can be computed as (5) where , for i = 0, 1, …, 2n − 1. Since cos x is monotone decreasing in , we have . Thus (6) From the definition of d_T, we know that (7)

Since , the following relationship can be obtained. (8)

From (8), we have (9) When x ∈ [0, 2n − 1], the function is monotone decreasing, thus . In addition, , Then we can obtain that (10)

Let and x ∈ [0, 2n − 1]. We consider the monotonicity of f(x) and compute its derivative for this purpose. (11) Since the denominator of f′(x) is always greater than 0, we only consider the sign of its numerator. Let g(x) be the numerator of f′(x) divided by π/(3n). (12)

The derivative of g(x) is . In the case x ∈ [0, 2n − 1], it is obvious that g′(x) > 0, and we can get that g(0) < 0 and g(2n − 1) < 0 by computation. Thus for any x ∈ [0, 2n − 1], g(x) < 0 and also f′(x) < 0. Therefore f(x) is monotone decreasing on the interval [0, 2n − 1].

All this leads up to the following inequality: (13)

Case II: i = 2n

In this case , and from the proof of Lemma 1 we know that . Moreover, since d_T ≥ R, we have (14)

Combining the two cases, we obtain the lower bound of d_T/d_p: (15)

Upper bound of d_T/d_p

In this subsection, we prove the upper bound of d_T/d_p. As in the proof of the lower bound, we suppose that one endpoint of the line segment d_T is in the region S; then we only need to consider the cases in which the other endpoint of d_T is in the region S_i for i ∈ ⟦0, 2n⟧.

Case I: i ∈ ⟦1, 2n⟧

As supposed above, one endpoint of d_T is in the region S and the other endpoint is in the region S_i; they are denoted by m₁ and m₂ respectively. The coordinates of m₁ and m₂ are () and () respectively, where , , and . The distance between the two points , is exactly d_T and thus .

Furthermore, (16) where , for i = 1, …, 2n. Since cos x is monotone decreasing in [0, π], we have . Thus (17)

Let , where .

In the case , . And since , we know that (18)
In the case , we can compute that .
(a) If , then when . And we can get the maximum of : . Thus , and as we mentioned above, , which leads to (19) In this case, (20) Therefore we have (21)
(b) If , then when and when . In this case, or h(r).
  If , as in case (a), (22) If , then let , where . The derivative of is . If , then , and . In this case, , and then, as in case (a), (23) If , then , and . Here we have , and thus (24)

Since , we can summarize all the cases in Case I and get (25) Then by using the similar approach for computing the lower bound of , where i ∈ ⟦1, 2n⟧, we can deduce the following inequality (26)

Case II: i = 0

In this case, . Moreover, (27)

Therefore, (28)

From Cases I and II, the upper bound of d_T/d_p follows: (29)

This completes the proof of Theorem 1. The relationship in this theorem can also be written as (30) This theorem therefore provides a fast approximation for the diameter of the point set T.

Remarks

We remark that visiting all N points in the set T, calculating their polar coordinates, and updating the values of r_i each take linear time, O(N). Calculating the diameter of the 6n virtual points takes O(n log n), as noted in the Introduction. Indeed, even if we compare all pairs among these points by brute force, the complexity is at most O(n²), which is negligible compared with O(N) when N is huge. Therefore, our approximation algorithm has a complexity of O(N + n log n), which is deterministically linear when N ≫ n.
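To make these orders of magnitude concrete, the following sketch tallies the dominant terms for N = 10⁶ and n = 100, values that also appear in the numerical tests. The function name `cost_terms` is hypothetical, and the hull-based entry is only the order-of-magnitude estimate m log₂ m for m = 6n midpoints, not an exact operation count.

```python
from math import log2


def cost_terms(N, n):
    """Rough work estimates for one run with N points and 6n sectors."""
    m = 6 * n                                   # number of virtual points
    return {
        "scan": N,                              # one visit per real point
        "brute_force_pairs": m * (m - 1) // 2,  # all pairs of midpoints
        "hull_based_estimate": m * log2(m),     # order of m log m only
    }


terms = cost_terms(10**6, 100)
for name, value in terms.items():
    print(f"{name}: {value:,.0f}")
# 600 midpoints give 179,700 pairs, fewer than the 1,000,000 point visits.
```

Even the brute-force comparison stays below the cost of the single scan in this setting, which matches the remark that the O(n²) term is negligible when N ≫ n.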

In addition, recalling the problem descriptions (1) and (2), we can also formulate the output diameter and calculate the number of regions n necessary for an arbitrary accuracy 0 < ε < 1. It is easy to verify that when (31) Eqs (1) and (2) are both satisfied. For small ε, Taylor expansion shows that (32) which indicates that (33) and the complexity becomes (34)

We therefore remark that all values of d_p, d_o and can be used as approximate diameters, depending on the accuracy interval one requires.

Numerical tests

As illustrated in the previous section, expressions (3), (1) and (2) describe the error bounds of d_p, d_o and respectively. In practice, however, these upper and lower bounds correspond to the worst cases, and in most situations the error is smaller. In this section we illustrate this error distribution with three different point sets. In the first point set T₍₁₎, the Cartesian coordinate of each point is (x, y), where x and y are independent random variables uniformly distributed in [0, 1), leading to a diameter close to the diagonal; in the second point set T₍₂₎, the polar coordinate of each point is (r, θ), where θ is a random variable uniformly distributed in [0, π) and r is a random variable with Gaussian distribution N(0, 1) [11]; the third point set T₍₃₎ is taken from a real database of the positions of fluid particles [12]. Both T₍₁₎ and T₍₂₎ have 1 × 10⁶ discrete points, while T₍₃₎ has 4 × 10⁵ discrete points. We simply use the O(n²) brute-force method to calculate the diameter of the 6n virtual points. This causes no inconvenience in the calculation: the computational time for n = 100 is only 1.01 times that for n = 1. We therefore conclude that the calculations are of near-linear complexity.
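The two synthetic point sets can be generated along the following lines. This is a sketch using Python's standard `random` module, not the authors' generator: the function names, the seed, and the reduced sample size are hypothetical, and T₍₃₎ is not reproduced because it comes from an external database [12].

```python
import random
from math import cos, pi, sin


def make_uniform_square(count, rng):
    """T(1)-style set: x and y independent and uniform in [0, 1)."""
    return [(rng.random(), rng.random()) for _ in range(count)]


def make_polar_gaussian(count, rng):
    """T(2)-style set: theta uniform on [0, pi), r drawn from N(0, 1)."""
    points = []
    for _ in range(count):
        theta = rng.uniform(0.0, pi)
        r = rng.gauss(0.0, 1.0)   # a negative r reflects the point through O
        points.append((r * cos(theta), r * sin(theta)))
    return points


rng = random.Random(12345)        # fixed seed so that runs are repeatable
t1 = make_uniform_square(10_000, rng)
t2 = make_polar_gaussian(10_000, rng)
print(len(t1), len(t2))
```

Because r is signed, restricting θ to [0, π) still covers every direction in the plane.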

Without loss of generality, we use our algorithm to output the values of d_p and compare them with the theoretical error ranges (3), as shown in Fig 3. For each value of n, we randomly select 100 different origin points. Clearly, in all cases most calculated diameters d_p are quite close to the real value d_T, and considerably better than the theoretical worst-case bounds (shown as dash-dotted lines in Fig 3). These results demonstrate the effectiveness of the present algorithm.
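A small-scale version of this experiment might look as follows. It is only a sketch under stated assumptions: the function names are hypothetical, the sample is a few hundred uniform points so that the true diameter d_T can be found by brute force, sector 0 is assumed to start at polar angle 0, and empty sectors are given radius 0.

```python
import random
from itertools import combinations
from math import atan2, cos, dist, hypot, pi, sin


def largest_midpoint_distance(points, n, origin):
    """d_p for 6n sectors centred on `origin`."""
    sectors, width = 6 * n, pi / (3 * n)
    radii = [0.0] * sectors
    for x, y in points:
        dx, dy = x - origin[0], y - origin[1]
        i = min(int((atan2(dy, dx) % (2 * pi)) / width), sectors - 1)
        radii[i] = max(radii[i], hypot(dx, dy))
    mids = [(r * cos((i + 0.5) * width), r * sin((i + 0.5) * width))
            for i, r in enumerate(radii)]
    return max(dist(a, b) for a, b in combinations(mids, 2))


def ratio_range(points, n, trials, rng):
    """Smallest and largest d_T/d_p over randomly chosen origins."""
    d_T = max(dist(a, b) for a, b in combinations(points, 2))  # brute force
    ratios = [d_T / largest_midpoint_distance(points, n, rng.choice(points))
              for _ in range(trials)]
    return min(ratios), max(ratios)


rng = random.Random(2019)
sample = [(rng.random(), rng.random()) for _ in range(400)]
low, high = ratio_range(sample, n=4, trials=20, rng=rng)
print(f"d_T/d_p ranges from {low:.4f} to {high:.4f}")
```

Independently of the sharper bounds of Theorem 1, the ratio always lies in [1/2, 2]: with R the largest sector radius, R ≤ d_T ≤ 2R, and R ≤ d_p ≤ 2R because the sector opposite the farthest point contributes a midpoint on the other side of O.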

Fig 3. Numerical results of d_T/d_p.

(a) T₍₁₎ case; (b) T₍₂₎ case; (c) T₍₃₎ case. The theoretical error bounds are shown as dash-dotted lines.

https://doi.org/10.1371/journal.pone.0211201.g003

We also present the CPU time in Fig 4. Points are generated as in the T₍₁₎ case, i.e., point coordinates are independent random variables uniformly distributed in [0, 1). The partition parameter n is fixed at 2, 12, 100 and 300, respectively. Calculations are performed in a single thread on an Intel Core i5-6200U CPU at 2.30 GHz, using Python 2.5.1 in the IDLE environment. Fig 4 shows that the CPU time is linear in N, illustrating that the present algorithm is of nearly linear complexity. In addition, although no optimization is implemented to accelerate the calculations, the real performance is acceptable: calculating the approximate diameter of 2 × 10⁶ points with n = 100 (corresponding to a relative error ε = 9 × 10⁻⁷) costs only about 3 seconds. This evidence supports the use of the present algorithm in real applications.

Fig 4. CPU time with different N values.

Points are generated similarly to the T₍₁₎ case. n = 2, 12, 100 and 300 respectively.

https://doi.org/10.1371/journal.pone.0211201.g004

Conclusion

As a fundamental problem of big data, linear approximation algorithms for the diameter of a set of points are potentially useful. Using a partition technique, we introduce an approximation algorithm with arbitrary accuracy and deterministically linear complexity. The implementation of this algorithm is very simple and does not require any complicated data structure or geometric calculation. Note that the complexity of the proposed algorithm is O(N + n log n) with n of the order of ε⁻¹, while a brute-force comparison of the virtual points increases this to O(N + n²). In practice n is much smaller than N, so O(n²) is negligible compared with O(N). In addition, increasing the number of partitions n does not introduce any multiplicative coefficient on O(N), which indicates the robustness of the near-linear complexity of our algorithm. Compared with existing approximation algorithms, the present algorithm has the lowest complexity, O(N + ε⁻¹ log ε⁻¹).

The present contribution is a preliminary attempt in the 2D plane. The method might also be extended to higher dimensions, but a division of the hyper-sphere [13–15] would be required. In those situations, other partition schemes may be more efficient; for example, one may use high-dimensional Cartesian coordinates instead of the division of the hyper-sphere. The corresponding accuracy analysis will also be more complicated, and is left for future work.

Acknowledgments

This work has been supported by the NSFC project 11601023.

References

1. Yao ACC. On constructing minimum spanning trees in k-dimensional spaces and related problems. SIAM J Comput. 1982;11:721–736.
2. Preparata FP, Shamos MI. Computational geometry: an introduction; 1985.
3. Bentley JL, Preparata FP, Faust GM. Approximation algorithms for convex hulls. Comm ACM. 1982;25:64–68.
4. Malandain G, Boissonnat JD. Computing the Diameter of a Point Set. Int J Comput Geom Appl. 2002;12(06):489–509.
5. Eg̃eciog̃lu O, Kalantari B. Approximating the diameter of a set of points in the Euclidean space. Inform Process Lett. 1989;32(4):205–211.
6. Alipour S, Kalantari B, Homapour H. Fast approximation and randomized algorithms for diameter. arXiv preprint arXiv:1410.2195. 2014.
7. Agarwal PK, Matoušek J, Suri S. Farthest neighbors, maximum spanning trees and related problems in higher dimensions. Comput Geom. 1992;1(4):189–201.
8. Barequet G, Har-Peled S. Efficiently approximating the minimum-volume bounding box of a point set in three dimensions. J Algorithms. 1999;38:82–91.
9. Chan TM. Approximating the diameter, width, smallest enclosing cylinder, and minimum-width annulus. In: Proceedings of the sixteenth annual symposium on Computational geometry. ACM; 2000. p. 300–309.
10. Arya S, Chan TM. Better ε-Dependencies for Offline Approximate Nearest Neighbor Search, Euclidean Minimum Spanning Trees, and ε-Kernels. In: Proceedings of the thirtieth annual symposium on Computational geometry. ACM; 2014. p. 416.
11. Knuth D. The Art of Computer Programming 2: Seminumerical Algorithms. MA: Addison-Wesley; 1968.
12. Fang L, Bos WJT, Jin GD. Short-time evolution of Lagrangian velocity gradient correlations in isotropic turbulence. Phys Fluids. 2015;27:125102.
13. Cooper PW. The hypersphere in pattern recognition. Inf Control. 1962;5(4):324–346.
14. Katsuki S, Frangopol DM. Hyperspace Division Method for Structural Reliability. J Eng Mech. 1994;120(11):2405–2427.
15. Sato K, Yamaji A. Uniform distribution of points on a hypersphere for improving the resolution of stress tensor inversion. J Struct Geol. 2006;28(6):972–979.

