Fig 1.
The first seven levels of the graph lineage of path graphs, with ancestry relationships.
Δl = 0 edges are colored in orange, and Δl = ±1 edges are colored in blue. Self-loops are not illustrated.
Fig 2.
Top: subsamples of a mesh of the Utah teapot, of increasing density (each node is connected to its 8 nearest neighbors by Δl = 0 edges, rendered in blue). These samples form a graph lineage (Δl = ±1 edges are not illustrated). Bottom: the same set of nodes, with only Δl = ±1 edges plotted (in orange) for one node from the coarsest level and its descendants.
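The within-level connectivity described in this caption can be sketched as a brute-force nearest-neighbor search. This is a minimal illustration, assuming Euclidean distance between sampled points and treating an edge as present if either endpoint selects the other; the caption does not state either choice, and `knn_edges` is a hypothetical helper name.

```python
import math

def knn_edges(points, k=8):
    """Return undirected edges linking each point to its k nearest neighbors.

    Assumptions (not stated in the caption): Euclidean distance, and an edge
    is kept if either endpoint lists the other among its k nearest.
    """
    edges = set()
    for i, p in enumerate(points):
        # Rank every other point by distance to p; ties are broken by index.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (math.dist(p, points[j]), j),
        )
        for j in others[:k]:
            # Store each undirected edge once, smaller index first.
            edges.add((min(i, j), max(i, j)))
    return edges
```

The search is quadratic in the number of points, which is adequate for a sketch; a spatial index would be the usual choice for dense samples.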
Fig 3.
A plot illustrating unimodality of diffusion distance.
D2 was calculated between two grid graphs Sq7 and Sq8 of size 7 × 7 and 8 × 8, respectively. The quantity being maximized is plotted as a function of t. The peak, at t ≈ 0.318, yields the distance D2(Sq7, Sq8).
Table 1.
Summary of this paper’s investigation of different forms of our graph dissimilarity measure.
In this work, we systematically explore properties of this measure given sparsity parameter s = 0, and various regimes of t (fixed at some early time, or maximized over all t) and α (fixed at α = 1, fixed at a constant power r of the ratio of graph sizes, or minimized over all α). We leave exploration of nonzero values of the sparsity parameter to future work. Variants not explicitly called out are not considered. In the case where α and t are both optimized and s > 0, it is unclear which of the metric conditions GDD satisfies, hence the corresponding classification is left blank.
Fig 4.
Two plots demonstrating characteristics of distance calculation between a (7 × 7) grid and an (8 × 8) grid.
(a): Plot illustrating the discontinuity and multimodality of the linear version of distance. Each gray curve represents one function of α. The thicker curve, f(α), is the lower convex hull of the thinner curves as a function of α. We see that f(α) is continuous, but has discontinuous slope, as well as several local optima (marked by arrowheads). These properties make f(α) difficult to optimize, necessitating the development of Algorithm 1. (b): As in (a), but with D2(Sq7, Sq8|t = 0.318) plotted instead of the linear version of distance. This t value is the location of the maximum in Fig 3.
Fig 5.
Graph lineages used in multiple numerical experiments in the main text.
Fig 6.
Distances D2(G, H) calculated for several pairs of graphs.
The top plot shows distances where G and H are both chosen from {Grid13×13, P169, C169, Ba13}. At bottom, distances are calculated from G chosen in {Grid12×12, P144, C144, Ba12} to H chosen in {Grid13×13, P169, C169, Ba13}. As expected, diagonal entries are smallest.
Fig 7.
Comparison of runtimes for our algorithm and bounded golden section search over the same interval [10⁻⁶, 10].
Runtimes were measured by a weighted count of evaluations of the Linear Assignment Problem solver, with an n × n linear assignment problem counted as n³ units of cost. Because our algorithm recovers the entire lower convex hull of the objective function as a function of α, we compute the cost of the golden section search as the summed cost of multiple searches, starting from an interval bracketing each local optimum found by our algorithm. We see that our algorithm is much less computationally expensive, sometimes by a factor of 10³. The most dramatic speedup occurs in the regime where n1 ≪ n2. Graphs were generated by drawing n1 uniformly from [5, 120], drawing n2 uniformly from [n1, n1 + 60], and then adding edges according to a Bernoulli distribution with p in {.125, .25, .375, .5, .625, .75, .875} (60 trials each).
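The sampling procedure and cost accounting in this caption can be sketched as follows. This is an illustration only: it assumes integer sizes with inclusive endpoints and independent inclusion of each undirected edge, neither of which the caption spells out, and the function names are hypothetical.

```python
import random

def sample_sizes(rng):
    """Draw (n1, n2) as in the caption; integer, inclusive endpoints are assumed."""
    n1 = rng.randint(5, 120)
    n2 = rng.randint(n1, n1 + 60)
    return n1, n2

def bernoulli_graph(n, p, rng):
    """Include each of the n(n-1)/2 possible undirected edges with probability p."""
    return {(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p}

def weighted_lap_cost(problem_sizes):
    """Total cost when an n x n assignment problem counts as n**3 units."""
    return sum(n ** 3 for n in problem_sizes)

rng = random.Random(0)          # fixed seed so the sketch is repeatable
n1, n2 = sample_sizes(rng)
g1 = bernoulli_graph(n1, 0.25, rng)
g2 = bernoulli_graph(n2, 0.25, rng)
```

Under this accounting, a search that solves one 2 × 2 and one 3 × 3 assignment problem costs 8 + 27 = 35 units, regardless of wall-clock time.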
Fig 8.
Histograms of triangle inequality violation.
These plots show the distribution of Disc(G1, G2, G3), as defined in the text, for the cases (a) top: the linear or small-time version of distance and (b) bottom: the exponential or arbitrary-time version of distance. We see that for the graph sizes we consider, the largest violation of the triangle inequality is bounded, suggesting that our distance measure may be an infra-ρ-pseudometric for some value of ρ ≈ 1.8 (linear version) or ρ ≈ 5.0 (exponential version). See Table 1 for a summary of the distance variants introduced in this paper. We also plot the same histogram for out-of-order (by vertex size) graph sequences: Disc(G2, G1, G3) and Disc(G3, G2, G1). Each plot has a line at x = 1, the maximum discrepancy score for which the underlying distances satisfy the triangle inequality.
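The discrepancy score is defined in the main text, which is not reproduced here. The sketch below assumes a ratio form, Disc(G1, G2, G3) = D(G1, G3) / (D(G1, G2) + D(G2, G3)), chosen because it is consistent with the caption's statement that scores up to 1 satisfy the triangle inequality. Under that assumption, the smallest ρ for which a relaxed inequality D(G1, G3) ≤ ρ (D(G1, G2) + D(G2, G3)) holds on a sample is the largest observed score.

```python
def discrepancy(d12, d23, d13):
    """Ratio of the direct distance to the two-step detour (assumed form of Disc)."""
    detour = d12 + d23
    if detour == 0:
        raise ValueError("detour length is zero; the ratio is undefined")
    return d13 / detour

def smallest_rho(distance_triples):
    """Largest discrepancy over (d12, d23, d13) triples: an empirical bound on rho."""
    return max(discrepancy(*triple) for triple in distance_triples)
```

A value above 1 marks a triple that violates the ordinary triangle inequality; the histogram bound is an empirical estimate for the sampled graphs, not a proof.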
Table 2.
Mean distances between graphs in several lineages.
For two lineages G1, G2, … (listed at left) and H1, H2, … (listed at the top), each entry shows the mean distance D(Gi, Hi+1) (where the average is taken over i = 1 to 12). As expected, we see that the distance from elements of a graph lineage to other members of the same lineage (the diagonal entries of the table) is smaller than distances taken between lineages. Also as expected, 1D paths are more similar (but not equal) to 1D cycles than to other graph lineages.
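The index offset in each table entry can be made explicit with a short sketch. Here `distance` stands in for any graph distance function and the lineages are plain lists; both are placeholders for illustration rather than the paper's implementation.

```python
def mean_offset_distance(lineage_g, lineage_h, distance, count=12):
    """Average distance(G_i, H_{i+1}) for i = 1..count, using 0-based lists."""
    if len(lineage_g) < count or len(lineage_h) < count + 1:
        raise ValueError("lineages are too short for the requested count")
    # Pair the i-th member of one lineage with the (i+1)-th member of the other.
    total = sum(distance(lineage_g[i], lineage_h[i + 1]) for i in range(count))
    return total / count
```

Averaging over 12 offset pairs requires at least 13 members in the second lineage.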
Fig 9.
Cauchy-like behavior of graph distance as a function of sequence index, n.
The distance between successive square grids and all other graph sequences appears to diverge (the same behavior is seen for k-barbells). Notably, the distance between Gridn×n and Grid(n+1)×(n+1) does not appear to converge until much higher values of n (n > 100) than the other convergent series. This may be because the distances calculated are upper bounds, which may converge more slowly than the ‘true’ optima.
Fig 10.
Limiting behavior of D and two parameters as path graph size approaches infinity.
All distances were calculated between Pathn and Pathn+1. We plot the value of the objective function, as well as the optimal values of α and t, as n → ∞. The optimal α rapidly approaches 1 and the optimal distance tends to 0. Additionally, the optimal t value approaches a constant (t ≈ 0.316345), providing experimental validation of the assumption we make in proving Theorem 14.
Fig 11.
Comparison of the distance D(Sqn, Sqn+1) as a function of n, to the upper bound calculated as the optimum of distance between Pan and Pan+1.
We see that the upper bound converges to some constant D ≈ 0.01782, whereas the actual distance appears to be converging to 0 as n → ∞.
Fig 12.
3D meshes used in the shape analysis experiment.
Each mesh was used to produce several sampled discretizations, which were then compared using GDD.
Fig 13.
Embedding of pairwise distances between mesh discretizations.
We see that GDD clusters each category of mesh tightly, and furthermore that clusters lie close together when their meshes are structurally similar, and far apart otherwise. Axes represent the three principal components of the distance matrix and are thus unitless.
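One way to obtain such an embedding is sketched below. The caption does not say which procedure was used; this sketch assumes that each row of the pairwise distance matrix is treated as a feature vector and projected onto its leading principal axes. Power iteration is used here only to keep the example dependency-free; a linear algebra library would be the usual choice. `top_components` is a hypothetical name.

```python
import math

def top_components(dist, k=3, iters=500):
    """Project rows of a square distance matrix onto its top-k principal axes."""
    n = len(dist)
    # Center each column so the axes describe variation rather than the mean.
    means = [sum(row[j] for row in dist) / n for j in range(n)]
    x = [[row[j] - means[j] for j in range(n)] for row in dist]
    # Sample covariance matrix of the centered columns.
    cov = [[sum(x[r][i] * x[r][j] for r in range(n)) / (n - 1) for j in range(n)]
           for i in range(n)]
    axes = []
    for c in range(k):
        # Fixed, generic start vector so the result is deterministic.
        v = [1.0 / (i + 1 + c) for i in range(n)]
        for _ in range(iters):
            w = [sum(a * b for a, b in zip(row, v)) for row in cov]
            # Remove the directions already found (Gram-Schmidt deflation).
            for axis in axes:
                dot = sum(p * q for p, q in zip(w, axis))
                w = [p - dot * q for p, q in zip(w, axis)]
            norm = math.sqrt(sum(p * p for p in w))
            if norm < 1e-12:
                v = [0.0] * n  # no variance left in the remaining directions
                break
            v = [p / norm for p in w]
        axes.append(v)
    # Coordinates of each row along each principal axis.
    return [[sum(p * q for p, q in zip(row, axis)) for axis in axes] for row in x]
```

The sign of each axis is arbitrary, so embeddings that differ only by a reflection are equivalent.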