Two Dimensional Yau-Hausdorff Distance with Applications on Comparison of DNA and Protein Sequences

Comparing DNA or protein sequences plays an important role in the functional analysis of genomes. Although many methods are available for sequence comparison, few retain the information content of the sequences. We propose a new approach, the Yau-Hausdorff method, which considers all translations and rotations when seeking the best match between the graphical curves of DNA or protein sequences. The complexity of this method is lower than that of any other two-dimensional minimum Hausdorff algorithm. The Yau-Hausdorff method measures the similarity of DNA sequences using two tools: the Yau-Hausdorff distance and the graphical representation of DNA sequences. The graphical representations conserve all sequence information, and the Yau-Hausdorff distance is mathematically proved to be a true metric. Therefore, the proposed distance can precisely measure the similarity of DNA sequences. Phylogenetic analyses of DNA sequences using the Yau-Hausdorff distance show the accuracy and stability of our approach in similarity comparison of DNA or protein sequences. This study demonstrates that the Yau-Hausdorff distance is a natural metric for DNA and protein sequences with a high level of stability. The approach can also be applied to similarity analysis of protein sequences through graphical representations, as well as to general two-dimensional shape matching.


Lemma 2:
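Let $A$ and $B$ be two finite point sets in $\mathbb{R}^d$. For a translation vector $t \in \mathbb{R}^d$, we define $A + t = \{a + t \mid a \in A\}$. For a rotation $\theta$, we define $A^\theta$ to be the set $A$ rotated around the origin by $\theta$. Let $H^d(A, B) = \inf_{t,\theta} h(A^\theta + t, B)$, where $h$ denotes the Hausdorff distance. Then $H^d$ is a metric, called the minimum $d$-dimensional Hausdorff metric.

Proof:
1. $H^d(A, B) \ge 0$, and $H^d(A, B) = 0$ means that $A$ and $B$ are of the same shape, i.e. we can find a translation $t$ and a rotation $\theta$ such that $A^\theta + t = B$. On the other hand, if $A$ and $B$ are of the same shape, then we can find $t_0$ and $\theta_0$ such that $A^{\theta_0} + t_0 = B$, so $H^d(A, B) = 0$.
2. Rotations and translations preserve distances, so $h(A^\theta + t, B) = h(A, (B - t)^{-\theta})$ for every $t$ and $\theta$. Taking the infimum over all $t$ and $\theta$ on both sides gives $H^d(A, B) = H^d(B, A)$.
3. Take three finite point sets $A$, $B$, $C$ in $\mathbb{R}^d$. Since the rotation group is compact and we only need to consider translations in a compact region, we can find $\theta_1$, $t_1$, $\theta_2$, $t_2$ such that
$$H^d(A, B) = h(A^{\theta_1} + t_1, B), \qquad H^d(B, C) = h(B^{\theta_2} + t_2, C).$$
That gives us
$$H^d(A, C) \le h\big((A^{\theta_1} + t_1)^{\theta_2} + t_2,\ C\big) \le h\big((A^{\theta_1} + t_1)^{\theta_2} + t_2,\ B^{\theta_2} + t_2\big) + h(B^{\theta_2} + t_2, C) = H^d(A, B) + H^d(B, C),$$
where the last equality uses the fact that applying the same rotation and translation to both sets does not change $h$. The triangle inequality holds.

We have proven that $H^d$ is a metric.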
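The two facts this proof relies on can be checked numerically in the plane: $h(A^{\theta_0} + t_0, B) = 0$ when $B$ is a rotated and translated copy of $A$, and $h$ is unchanged when both sets undergo the same rigid motion. The following is a minimal sketch using only the Python standard library; the helper names (`rotate`, `translate`, `hausdorff`) and the sample points are illustrative assumptions, not part of the original method.

```python
import math

def rotate(points, theta):
    """Return the 2D points rotated about the origin by theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def translate(points, t):
    """Return the points shifted by the vector t = (tx, ty)."""
    return [(x + t[0], y + t[1]) for x, y in points]

def hausdorff(A, B):
    """Symmetric Hausdorff distance h(A, B) between finite point sets."""
    def directed(P, Q):
        # Largest distance from a point of P to its nearest point of Q.
        return max(min(math.dist(p, q) for q in Q) for p in P)
    return max(directed(A, B), directed(B, A))

# Illustrative data (hypothetical, not taken from the paper).
A = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0)]
C = [(0.0, 0.0), (1.0, 1.0), (3.0, 0.0)]
theta0, t0 = math.pi / 3, (5.0, -2.0)
B = translate(rotate(A, theta0), t0)  # B has the same shape as A

# Without alignment the sets look far apart ...
h_raw = hausdorff(A, B)
# ... but the right rotation and translation bring the distance to zero.
h_aligned = hausdorff(translate(rotate(A, theta0), t0), B)

# Moving both sets by the same rigid motion leaves h unchanged,
# which is the step used in the triangle inequality.
h_before = hausdorff(A, C)
h_after = hausdorff(translate(rotate(A, theta0), t0),
                    translate(rotate(C, theta0), t0))
```

Here `h_aligned` vanishes while `h_raw` does not, which is why the infimum over rigid motions, rather than $h$ itself, is the shape-level distance.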

Theorem:
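Let $A$ and $B$ be two finite point sets in $\mathbb{R}^2$. For a rotation $\theta$, we define $P_x(A^\theta)$ to be the x-axis projection of $A^\theta$. Define
$$D(A, B) = \max\Big\{ \sup_\theta \inf_\phi H^1\big(P_x(A^\theta), P_x(B^\phi)\big),\ \sup_\phi \inf_\theta H^1\big(P_x(A^\theta), P_x(B^\phi)\big) \Big\}.$$
Here $H^1$ is the minimum one-dimensional Hausdorff distance. Then $D$ is a metric.

Proof:
1. Since $H^1(P_x(A^\theta), P_x(B^\phi)) = H^1(P_x(B^\phi), P_x(A^\theta))$ for all rotations $\theta$ and $\phi$, exchanging $A$ and $B$ only swaps the two terms inside the maximum, which gives us $D(A, B) = D(B, A)$.

2. We take three sets $A$, $B$, $C$ of finite points in $\mathbb{R}^2$. By the definition of $D$,
$$\inf_\phi H^1\big(P_x(A^{\theta_0}), P_x(B^\phi)\big) \le D(A, B)$$
for each fixed $\theta_0$. Assume $\alpha_0$ is a rotation such that
$$H^1\big(P_x(A^{\theta_0}), P_x(B^{\alpha_0})\big) = \inf_\phi H^1\big(P_x(A^{\theta_0}), P_x(B^\phi)\big).$$
Since $H^1$ satisfies the triangle inequality,
$$\inf_\psi H^1\big(P_x(A^{\theta_0}), P_x(C^\psi)\big) \le H^1\big(P_x(A^{\theta_0}), P_x(B^{\alpha_0})\big) + \inf_\psi H^1\big(P_x(B^{\alpha_0}), P_x(C^\psi)\big) \le D(A, B) + D(B, C)$$
for each fixed rotation $\theta_0$. We take the maximum over all rotations $\theta$ on the left-hand side, and we get
$$\sup_\theta \inf_\psi H^1\big(P_x(A^\theta), P_x(C^\psi)\big) \le D(A, B) + D(B, C).$$
Similarly, we can get
$$\sup_\psi \inf_\theta H^1\big(P_x(A^\theta), P_x(C^\psi)\big) \le D(A, B) + D(B, C).$$
The triangle inequality holds.

3. Assume that there are $m$ points in set $A$ and $n$ points in set $B$, and first assume that $m > n$. We can find a rotation $\theta_0$ such that $P_x(A^{\theta_0})$ has $m$ different points, while the number of points in $P_x(B^\phi)$ is no more than $n$. No translate of $P_x(B^\phi)$ can then coincide with $P_x(A^{\theta_0})$, so $H^1(P_x(A^{\theta_0}), P_x(B^\phi))$ is bounded away from zero for every rotation $\phi$ and $D(A, B) > 0$. Hence $D(A, B) = 0$ requires $m = n$.

When $m = n$, the projections determine the shape of $A$. We can determine the shape of the triangle $OA_1A_2$ with three rotations. For the next point $A_3$, three rotations give us the distance between $A_3$ and $O$, the distance between $A_3$ and $A_1$, and the distance between $A_3$ and $A_2$, so the location of $A_3$ is fixed. The other points are fixed in the same way, and each of them needs three further rotations. So with $3 + 3(n - 3) = 3n - 6$ rotations, the shape of $A$ is fixed. This means that we can reconstruct the set $A$ in the plane from the information of its projections $P_x(A^\theta)$.

We have proven that $D$ is a metric.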
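To make the quantity in this theorem concrete, the sketch below approximates $D(A, B)$ by restricting both rotations to $k$ equally spaced angles. This is an illustrative brute-force approximation, not the algorithm of the paper: the function names and the sampling scheme are assumptions, and a sampled supremum and infimum only approximate the true values. The one-dimensional distance $H^1$ is evaluated on a finite candidate set of translations, using the observation that $h(X + t, Y)$ is piecewise linear in $t$ with slopes $\pm 1$, so a minimum lies where a decreasing branch meets an increasing one.

```python
import math
from itertools import product

def hausdorff_1d(X, Y):
    """Hausdorff distance h(X, Y) between finite sets of real numbers."""
    def directed(P, Q):
        return max(min(abs(p - q) for q in Q) for p in P)
    return max(directed(X, Y), directed(Y, X))

def min_hausdorff_1d(X, Y):
    """H^1(X, Y): minimum of h(X + t, Y) over all translations t.

    Candidate minimisers are t = (y_j + y_l - x_i - x_k) / 2, the points
    where |x_i + t - y_j| (decreasing) meets |x_k + t - y_l| (increasing).
    """
    candidates = {(yj + yl - xi - xk) / 2.0
                  for xi, xk in product(X, repeat=2)
                  for yj, yl in product(Y, repeat=2)}
    return min(hausdorff_1d([x + t for x in X], Y) for t in candidates)

def x_projection(points, theta):
    """P_x(A^theta): x-coordinates of the points rotated by theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * x - s * y for x, y in points]

def yau_hausdorff_sampled(A, B, k=16):
    """Approximate D(A, B) using k sampled rotation angles (assumption)."""
    angles = [2.0 * math.pi * i / k for i in range(k)]
    PA = [x_projection(A, th) for th in angles]
    PB = [x_projection(B, ph) for ph in angles]
    # table[i][j] = H^1(P_x(A^theta_i), P_x(B^phi_j))
    table = [[min_hausdorff_1d(pa, pb) for pb in PB] for pa in PA]
    sup_theta_inf_phi = max(min(row) for row in table)
    sup_phi_inf_theta = max(min(table[i][j] for i in range(k))
                            for j in range(k))
    return max(sup_theta_inf_phi, sup_phi_inf_theta)

# Illustrative data (hypothetical, not taken from the paper).
A = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0)]
B = [(5.0 - y, -2.0 + x) for x, y in A]   # A rotated by 90 degrees, then shifted
Q = [(0.0, 0.0), (1.0, 0.5), (3.0, 1.0), (4.0, 2.0)]  # four points

d_same = yau_hausdorff_sampled(A, B, k=8)   # same shape
d_diff = yau_hausdorff_sampled(Q, A, k=8)   # different numbers of points
h1_example = min_hausdorff_1d([0.0, 1.0, 3.0, 4.0], [0.0, 1.0, 4.0])
```

With $k = 8$ the 90 degree rotation relating `A` and `B` lies on the sampling grid, so `d_same` is zero up to rounding. `d_diff` stays positive because a four-point projection cannot be matched by at most three points, which mirrors step 3 of the proof.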

Lemma
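Let $A$ and $B$ be two finite point sets in $\mathbb{R}^2$. Then
$$h(A^\theta, B^\phi) \ge H^1\big(P_x(A^\theta), P_x(B^\phi)\big)$$
for any rotations $\theta$ and $\phi$. Here $H^1$ is the minimum one-dimensional Hausdorff distance.

Proof: Write $A^\theta = \{a_1, \dots, a_m\}$ and $B^\phi = \{b_1, \dots, b_n\}$. Projection onto the x-axis does not increase distances, so
$$|P_x(b_i) - P_x(a_j)| \le |b_i - a_j|$$
for any $i$, $j$. Take the minimum over $j = 1, 2, \dots, m$ in this inequality, and we get
$$\min_j |P_x(b_i) - P_x(a_j)| \le \min_j |b_i - a_j|.$$
Take the maximum over $i = 1, 2, \dots, n$ in this inequality, and we get
$$\max_i \min_j |P_x(b_i) - P_x(a_j)| \le \max_i \min_j |b_i - a_j|.$$
This means that the directed Hausdorff distance from $P_x(B^\phi)$ to $P_x(A^\theta)$ is no larger than the directed Hausdorff distance from $B^\phi$ to $A^\theta$. Similarly, we have the same inequality with the roles of the two sets exchanged. Hence $h(P_x(A^\theta), P_x(B^\phi)) \le h(A^\theta, B^\phi)$, and since $H^1$ is a minimum over translations, $H^1(P_x(A^\theta), P_x(B^\phi)) \le h(P_x(A^\theta), P_x(B^\phi))$. Q.E.D.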
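The min-max argument in this proof rests on one fact: projecting onto the x-axis never increases the distance between two points, so the Hausdorff distance between the projections is at most the Hausdorff distance between the planar sets. The sketch below checks this on random rotations. It is only a numerical illustration under assumed sample data; the helper names and the random point sets are hypothetical.

```python
import math
import random

def hausdorff(P, Q, dist):
    """Symmetric Hausdorff distance for a given point-to-point distance."""
    def directed(U, V):
        return max(min(dist(u, v) for v in V) for u in U)
    return max(directed(P, Q), directed(Q, P))

def rotate(points, theta):
    """Return the 2D points rotated about the origin by theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def abs_diff(u, v):
    """Distance between two real numbers."""
    return abs(u - v)

random.seed(0)  # fixed seed so the check is reproducible
A = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(6)]
B = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(5)]

gaps = []
for _ in range(200):
    theta = random.uniform(0.0, 2.0 * math.pi)
    phi = random.uniform(0.0, 2.0 * math.pi)
    A_rot, B_rot = rotate(A, theta), rotate(B, phi)
    h_plane = hausdorff(A_rot, B_rot, math.dist)
    h_proj = hausdorff([p[0] for p in A_rot], [q[0] for q in B_rot], abs_diff)
    gaps.append(h_plane - h_proj)  # expected to be non-negative

smallest_gap = min(gaps)
```

Because $H^1$ additionally minimises over translations, it can only be smaller than the projected distance `h_proj` computed here.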
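Proof: We show that $H^2(A, B) \ge D(A, B)$. First we prove that $h(A^{\theta_1} + t_1, B) \ge D(A, B)$ for any fixed $\theta_1$ and $t_1$. We only need to bound each of the two terms in the definition of $D$. For any fixed $\theta_2$, rotating both sets by $\theta_2$ does not change $h$, so
$$h(A^{\theta_1} + t_1, B) = h\big((A^{\theta_1} + t_1)^{\theta_2}, B^{\theta_2}\big) \ge H^1\big(P_x((A^{\theta_1} + t_1)^{\theta_2}), P_x(B^{\theta_2})\big) = H^1\big(P_x(A^{\theta_1 + \theta_2}), P_x(B^{\theta_2})\big).$$
The inequality above is from the lemma, and the last equality holds because the rotated translation only shifts the projection, which $H^1$ ignores. That gives us
$$h(A^{\theta_1} + t_1, B) \ge \inf_\phi H^1\big(P_x(A^{\theta_1 + \theta_2}), P_x(B^\phi)\big)$$
for any fixed $\theta_2$, which means
$$h(A^{\theta_1} + t_1, B) \ge \sup_\theta \inf_\phi H^1\big(P_x(A^\theta), P_x(B^\phi)\big).$$
Similarly, we can get
$$h(A^{\theta_1} + t_1, B) \ge \sup_\phi \inf_\theta H^1\big(P_x(A^\theta), P_x(B^\phi)\big).$$
The last two inequalities give us
$$h(A^{\theta_1} + t_1, B) \ge D(A, B)$$
for any $\theta_1$ and $t_1$. Take the minimum of the left-hand side over $\theta_1$ and $t_1$, and we have $H^2(A, B) \ge D(A, B)$. Q.E.D.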
Let $a = OM = \sin\theta$ and $b = MP$; then $PN = \sin\theta = a$.
Next we prove that $\inf_\phi H^1(P_x(A^\theta), P_x(B^\phi)) = \frac{1}{2}\min\{a, b\}$. Assume $a \le b$, and draw four disks of radius $\frac{1}{2}a$ centered at the four points $O$, $M$, $P$, $N$ of $P_x(A^\theta)$. Because there are no more than three points in the projection of $B^\phi$, there must be a disk that does not contain any point of $P_x(B^\phi)$. So $H^1(P_x(A^\theta), P_x(B^\phi)) \ge \frac{1}{2}a$ for any rotation $\phi$. We then take a rigid motion $\phi_0$ such that $P_x(B^{\phi_0}) = \{O, M, N\}$.
Take $t = -\frac{1}{2}a$, and translate $P_x(B^{\phi_0})$ by $t$. Assume $P_x(B^{\phi_0}) - \frac{1}{2}a = \{O', M', N'\}$. We can see that the Hausdorff distance