Graph diffusion distance: Properties and efficient computation

doi:10.1371/journal.pone.0249624

Fig 1.

The first seven levels of the graph lineage of path graphs, with ancestry relationships.

Δl = 0 edges are colored in orange, Δl = ±1 edges are colored in blue. Self-loops are not illustrated.

Fig 2.

Top: subsamples of a mesh of the Utah teapot, of increasing density (each node is connected to its 8 nearest neighbors by the Δl = 0 edges, rendered in blue). These samples form a graph lineage (Δl = ±1 edges are not illustrated). Bottom: the same set of nodes, with only Δl = ±1 edges plotted (in orange) for one node from the coarsest level and its descendants.
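The within-level connectivity described in this caption (each node joined to its 8 nearest neighbors) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `knn_adjacency`, the Euclidean metric, and the choice to symmetrize the neighbor relation are all assumptions.

```python
import numpy as np

def knn_adjacency(points, k=8):
    """Symmetric 0/1 adjacency: i ~ j if j is among i's k nearest neighbors, or vice versa.

    Assumptions of this sketch: Euclidean distance, and an edge is kept
    if either endpoint selects the other.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    k = min(k, n - 1)                          # cannot have more neighbors than other nodes
    # Pairwise Euclidean distances via broadcasting.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)             # a node is never its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]  # indices of the k closest nodes, per row
    adj = np.zeros((n, n), dtype=int)
    rows = np.repeat(np.arange(n), k)
    adj[rows, nearest.ravel()] = 1
    return np.maximum(adj, adj.T)              # symmetrize the neighbor relation
```

For a sampled mesh, `points` would be the array of sampled vertex coordinates, one row per node, and `k=8` matches the caption.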

Fig 3.

A plot illustrating unimodality of diffusion distance.

D² was calculated between two grid graphs Sq₇ and Sq₈ of size 7 × 7 and 8 × 8, respectively. The distance is given by the formula as a function of t. The peak, at t ≈ 0.318, yields the distance D²(Sq₇, Sq₈).
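To get a feel for how such a curve in t arises, the sketch below evaluates a simplified fixed-time diffusion distance between two graphs of the same size and locates its peak on a grid of t values. It is a deliberate simplification and not the computation behind the figure: the node matching is fixed to the identity, α is fixed at 1, and the Laplacian sign convention L = A − D is assumed. All function names are hypothetical.

```python
import numpy as np

def laplacian(adj):
    """Graph Laplacian with the sign convention L = A - D (an assumption of this sketch)."""
    adj = np.asarray(adj, dtype=float)
    return adj - np.diag(adj.sum(axis=1))

def heat_kernel(lap, t):
    """exp(t * L) for a symmetric matrix L, via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(lap)
    return (vecs * np.exp(t * vals)) @ vecs.T

def path_adjacency(n):
    adj = np.zeros((n, n))
    idx = np.arange(n - 1)
    adj[idx, idx + 1] = adj[idx + 1, idx] = 1.0
    return adj

def cycle_adjacency(n):
    adj = path_adjacency(n)
    adj[0, n - 1] = adj[n - 1, 0] = 1.0        # close the path into a cycle
    return adj

def fixed_time_distance_sq(lap_g, lap_h, t):
    """Squared Frobenius norm of exp(t L_G) - exp(t L_H), with identity node matching."""
    diff = heat_kernel(lap_g, t) - heat_kernel(lap_h, t)
    return float((diff ** 2).sum())

n = 10
lap_path = laplacian(path_adjacency(n))
lap_cycle = laplacian(cycle_adjacency(n))
ts = np.linspace(0.0, 50.0, 1001)
curve = np.array([fixed_time_distance_sq(lap_path, lap_cycle, t) for t in ts])
t_peak = ts[curve.argmax()]
print(f"peak at t = {t_peak:.2f}, value = {curve.max():.4f}")
```

The curve starts at zero (both kernels equal the identity at t = 0) and decays for large t (both kernels approach the same uniform averaging matrix on a connected graph), so the maximum sits at an interior t. The full measure in the figure additionally optimizes over node assignments and α for graphs of different sizes.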

Table 1.

Summary of this paper’s investigation of different forms of our graph dissimilarity measure.

In this work, we systematically explore properties of this measure given sparsity parameter s = 0, and various regimes of t (fixed at some early time, or maximized over all t) and α (fixed at α = 1, fixed at a constant power r of the ratio of graph sizes, or minimized over all α). We leave exploration of nonzero values of the sparsity parameter to future work. Variants not explicitly called out are not considered. In the case where α and t are both optimized and s > 0, it is unclear which of the metric conditions GDD satisfies, hence the corresponding classification is left blank.

Fig 4.

Two plots demonstrating characteristics of distance calculation between a (7 × 7) grid and an (8 × 8) grid.

(a): Plot illustrating the discontinuity and multimodality of the linear version of distance. Each gray curve represents a single function of α. The thicker curve, f(α), is the lower convex hull of the thinner curves as a function of α. We see that f(α) is continuous, but has discontinuous slope, as well as several local optima (marked by arrowheads). These properties make the linear version of the distance difficult to optimize, necessitating the development of Algorithm 1. (b): As in (a), but with D²(Sq₇, Sq₈|t = 0.318) plotted instead of the linear version. This t value is the location of the maximum in Fig 3.

Fig 5.

Graph lineages used in multiple numerical experiments in the main text.

Fig 6.

Distances D²(G, H) calculated for several pairs of graphs.

The top plot shows distances where G and H are both chosen from {Grid_13×13, P₁₆₉, C₁₆₉, Ba₁₃}. At bottom, distances are calculated from G chosen in {Grid_12×12, P₁₄₄, C₁₄₄, Ba₁₂} to H chosen in {Grid_13×13, P₁₆₉, C₁₆₉, Ba₁₃}. As expected, diagonal entries are smallest.

Fig 7.

Comparison of runtimes for our algorithm and bounded golden section search over the same interval [10⁻⁶, 10].

Runtimes were measured by a weighted count of evaluations of the Linear Assignment Problem solver, with an n × n linear assignment problem counted as n³ units of cost. Because our algorithm recovers the entire lower convex hull of the objective function as a function of α, we compute the cost of the golden section search as the summed cost of multiple searches, starting from an interval bracketing each local optimum found by our algorithm. We see that our algorithm is much less computationally expensive, sometimes by a factor of 10³. The most dramatic speedup occurs in the regime where n₁ ≪ n₂. Graphs were generated by drawing n₁ uniformly from [5, 120], drawing n₂ uniformly from [n₁, n₁ + 60], and then adding edges according to a Bernoulli distribution with p in {.125, .25, .375, .5, .625, .75, .875 } (60 trials each).
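The cost accounting and the random-graph setup described in this caption can be sketched as below. This is an illustrative reading rather than the authors' benchmark code: the function names are hypothetical, and treating both size ranges as inclusive integer ranges is an assumption.

```python
import numpy as np

def weighted_lap_cost(problem_sizes):
    """Total cost when each n x n linear assignment problem counts as n**3 units."""
    return sum(n ** 3 for n in problem_sizes)

def bernoulli_adjacency(n, p, rng):
    """Symmetric 0/1 adjacency matrix; each possible edge is present with probability p."""
    upper = np.triu((rng.random((n, n)) < p).astype(int), k=1)
    return upper + upper.T

def sample_graph_pair(p, rng):
    """Draw n1 from [5, 120] and n2 from [n1, n1 + 60] (assumed inclusive), then sample edges."""
    n1 = int(rng.integers(5, 121))        # upper bound of rng.integers is exclusive
    n2 = int(rng.integers(n1, n1 + 61))
    return bernoulli_adjacency(n1, p, rng), bernoulli_adjacency(n2, p, rng)

rng = np.random.default_rng(0)
g1, g2 = sample_graph_pair(0.25, rng)
print(g1.shape, g2.shape, weighted_lap_cost([g1.shape[0], g2.shape[0]]))
```

Under this accounting, a search that solves many small assignment problems can still be cheaper than one that solves a few large ones, which is why the comparison is stated in weighted units rather than in raw solver calls.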

Fig 8.

Histograms of triangle inequality violation.

These plots show the distribution of Disc(G₁, G₂, G₃), as defined in the text, for the cases (a) top: the linear or small-time version of distance and (b) bottom: the exponential or arbitrary-time version of distance. We see that for the sizes of graph we consider, the largest violation of the triangle inequality is bounded, suggesting that our distance measure may be an infra-ρ-pseudometric for some value of ρ ≈ 1.8 (linear version) or ρ ≈ 5.0 (exponential version). See Table 1 for a summary of the distance metric variants introduced in this paper. We also plot the same histogram for out-of-order (by vertex size) graph sequences: Disc(G₂, G₁, G₃) and Disc(G₃, G₂, G₁). Each plot has a line at x = 1, the maximum discrepancy score for which the underlying distances satisfy the triangle inequality.
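The caption refers to a discrepancy score defined in the main text, which is not reproduced here. The sketch below assumes one natural form, the ratio D(G₁, G₃) / (D(G₁, G₂) + D(G₂, G₃)), chosen because it is consistent with the statement that x = 1 is the largest score compatible with the triangle inequality. The function names and the nested-list distance table are hypothetical.

```python
from itertools import permutations

def discrepancy(dist, i, j, k):
    """Assumed form: dist(i, k) / (dist(i, j) + dist(j, k)).

    A value above 1 means the triple (i, j, k) violates the triangle inequality.
    """
    return dist[i][k] / (dist[i][j] + dist[j][k])

def worst_discrepancy(dist):
    """Largest discrepancy over all ordered triples of distinct indices."""
    n = len(dist)
    return max(discrepancy(dist, i, j, k) for i, j, k in permutations(range(n), 3))

# Toy distance table in which the direct route 0 -> 2 is longer than the detour via 1.
toy = [
    [0.0, 1.0, 3.0],
    [1.0, 0.0, 1.0],
    [3.0, 1.0, 0.0],
]
print(worst_discrepancy(toy))
```

Under this assumed definition, a bounded worst-case score ρ over all triples is exactly the condition D(G₁, G₃) ≤ ρ (D(G₁, G₂) + D(G₂, G₃)), which is how the histograms relate to the infra-ρ-pseudometric claim.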

Table 2.

Mean distances between graphs in several lineages.

For two lineages G₁, G₂, … (listed at left) and H₁, H₂, … (listed at the top), each entry shows the mean distance D(G_i, H_{i+1}) (where the average is taken over i = 1 to 12). As expected, we see that the distance from elements of a graph lineage to other members of the same lineage (the diagonal entries of the table) is smaller than distances taken between lineages. Furthermore, as expected, 1D paths are more similar (but not equal) to 1D cycles than to other graph lineages.

Fig 9.

Cauchy-like behavior of graph distance as a function of sequence index, n.

The distance between successive square grids and all other graph sequences appears to diverge (the same behavior is seen for k-barbells). Notably, the distance between Grid_n×n and Grid_(n+1)×(n+1) does not appear to converge until much higher values of n (n > 100) than the other convergent series. This may be because the distances calculated are an upper bound, and may be converging more slowly than the ‘true’ optima.

Fig 10.

Limiting behavior of D and two parameters as path graph size approaches infinity.

All distances were calculated between Path_n and Path_{n+1}. We plot the value of the objective function, as well as the optimal values of α and t, as n → ∞. The optimal α rapidly approaches 1 and the optimal distance tends to 0. Additionally, the optimal t value approaches a constant (t ≈ 0.316345), providing experimental validation of the assumption we make in proving Theorem 14.

Fig 11.

Comparison of the distance D(Sq_n, Sq_{n+1}) as a function of n, to the upper bound calculated as the optimum of distance between Pa_n and Pa_{n+1}.

We see that the upper bound converges to some constant D ≈ 0.01782, whereas the actual distance appears to be converging to 0 as n → ∞.

Fig 12.

3D meshes used in the shape analysis experiment.

Each mesh was used to produce several sampled discretizations, which were then compared using GDD.

Fig 13.

Embedding of pairwise distances between mesh discretizations.

We see that GDD clusters each category of mesh tightly, and furthermore that clusters lie close together when the corresponding meshes are structurally similar, and far apart otherwise. Axes represent the three principal components of the distance matrix and are thus unitless.
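One plausible reading of "the three principal components of the distance matrix" is to treat each row of the pairwise distance matrix as a feature vector and project the rows onto their top three principal directions. The sketch below follows that reading. It is an assumption, not a description of the authors' pipeline (classical multidimensional scaling would be another reasonable choice), and the function name is hypothetical.

```python
import numpy as np

def principal_component_embedding(dist, n_components=3):
    """Project the rows of a pairwise distance matrix onto their leading principal components."""
    dist = np.asarray(dist, dtype=float)
    centered = dist - dist.mean(axis=0, keepdims=True)   # center each column
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]        # row coordinates (PCA scores)

# Toy example: two well-separated groups of items on a line.
x = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
toy_dist = np.abs(x[:, None] - x[None, :])
coords = principal_component_embedding(toy_dist)
print(coords.shape)
```

Items whose distance profiles to all other items are similar receive nearby coordinates, which is why tight clusters in such a plot indicate that the distance separates the categories. The signs of the axes are arbitrary, since singular vectors are only defined up to sign.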
