Marathon: An Open Source Software Library for the Analysis of Markov-Chain Monte Carlo Algorithms

We present the software library marathon, which is designed to support the analysis of sampling algorithms that are based on the Markov-Chain Monte Carlo principle. The main application of this library is the computation of properties of so-called state graphs, which represent the structure of Markov chains. We demonstrate applications and the usefulness of marathon by investigating the quality of several bounding methods on four well-known Markov chains for sampling perfect matchings and bipartite graphs. In a set of experiments, we compute the total mixing time and several of its bounds for a large number of input instances. We find that the upper bound gained by the famous canonical path method is often several magnitudes larger than the total mixing time and deteriorates with growing input size. In contrast, the spectral bound is found to be a precise approximation of the total mixing time.


Introduction
The task of random sampling is to return a randomly selected object from a typically large set of objects according to a specified probability distribution.Such tasks often arise in practical applications like network analysis, where properties of a certain network of interest are to be compared with those of a random null model network [1,2].Another application is approximate counting of combinatorial objects.While this is typically a hard problem, the number of solutions of a self-reducible problem, however, can be approximated in polynomial time using randomly sampled objects [3].We consider sampling methods which follow the so-called Markov-Chain Monte Carlo (MCMC) approach.A MCMC algorithm can be considered as a random walk on a directed graph where the vertices of this graph correspond to the set of objects to be sampled from.Starting at an arbitrary vertex, we go to a randomly chosen adjacent vertex and continue our random walk from this vertex; after some time, we stop the random walk.The object representing the final vertex is returned as a random sample.An overview of the surprisingly versatile applications of the MCMC approach was given by Diaconis [4].
Motivation In general, an infinite number of steps will lead to a truly random sample.But what happens if we stop the random walk after a finite number of steps?The number of steps which are necessary to sample from a probability distribution which is close to the desired distribution is known as the total mixing time of a Markov chain, and is of central interest for the applicability of an MCMC algorithm.There are several methods which are able to gain upper bounds on the total mixing time like the canonical path method.Sinclair [5] gives an excellent overview about different bounding techniques.Sometimes, such bounding methods can successfully be applied to Markov Chains to gain upper bounds on their total mixing times.When applied successfully, such upper bounds most often are high-degree polynomials on the size of the input.This is far too large to be applicable in practice, where more than a linear number of steps is infeasible for large input size.In fact, software tools like mfinder [1], which can be used for motif search in large networks, use a MCMC sampling approach with a linear number of steps as default for generation of null models.This leads to the problem that the sampling result might not be as random as desired, resulting in a non-optimal or even incorrect behaviour of the application.
On the other hand, it may be the case that upper bounds on the total mixing time are just too pessimistic.From a purely theoretical perspective it is often already a breakthrough to establish that a given Markov chain is rapidly mixing, that is, to establish a polynomial mixing time bound.Since the bounding techniques in Markov chain analysis are often fairly general and worst-case instances in terms of total mixing time are not known, it is not clear, whether the upper bounds gained by such methods are tight for some worstcase instance.However, for practical applicability, it is of eminent importance to find as sharp bounds as possible.Up to now, there is very little knowledge about the gap between the so established upper bounds and tight bounds for actual worst-case instances.Probably most researches in this field will suspect that a considerable real gap exists, but for specific Markov chains it is in general completely open how many orders of magnitude this gap may be large.By this reason, we believe that the true total mixing time might be much lower than proven by theoretical methods.Therefore, our working questions are: a) Is the total mixing time really as large as the bounding methods propose, or is it possible that the bounding methods are just not precise enough to capture the real total mixing time?b) Which bounding method has the best potential and could lead to better results when further information about the structure of state graphs is given?

Contribution
In trying to answer these questions, we developed the C++ library marathon, which has been designed to compute several structural properties of Markov chains and its corresponding state graphs.The library offers the following features: • Construction of state graphs from arbitrary input instances for a set of user-definable transition rules.
• Computation of structural properties of state graphs.
-Computation of the total mixing time for arbitrary ǫ.
-Computation of the congestion bound gained by the canonical path method with an arbitrary, user-definable, path construction scheme.
-Computation of the upper and lower spectral bound.
-Computation of network properties like diameter and average path length.
We built this library to easily add algorithms for network analysis.We expect that our library could be helpful both for theoretical scientists as well as for practitioners, who implement MCMC algorithms and have to choose an appropriate number of steps.As a demonstration on what kind of research can be done with the help of marathon, we analysed structural properties and investigated the quality of the total mixing time bounds on the example of four well-known Markov chains.
From our experiments, we gained several insights.
• We found that the congestion bound is multiple times larger than the corresponding total mixing time for almost all instances.In addition, the quality of the congestion bound deteriorates with growing input size.This indicates that the congestion bounds are bad approximations of the total mixing time, and, since this bound is often used for theoretical analysis, that the high-degree polynomial bounds from theory may be too pessimistic.
• In contrast, the upper spectral bound is close to the total mixing time for all observations.Even if its quality also deteriorates with growing instance size, the spectral bounds keeps close to the total mixing time much longer than the congestion bound.
• The lower spectral bound can be used as a very good approximation for all input instances we investigated in this work.The data suggests an almost linear relationship between the lower spectral bound and total mixing time.
• We figured out several new structural insights about the Markov chains we investigated.For example, we found that the the second largest eigenvalue in magnitude is almost always positive.The total mixing time of instances of the same input size can be very diverse and depends on the vertex degree in the state graphs.This shows that the experimental approach provided by our library may lead to structural insights which may be useful to get new intuitions for developing more precise bounding techniques.
From our experiments we can conclude that the canonical path method will not lead to tight bounds of the total mixing time, even when further information about a state graphs structure is included.Instead, developing new methods based on the spectral gap seems to be more promising.
Structure The remainder of this article is structured as follows: In the next section we give a brief introduction into the theory of Markov chains, in particular, in the concepts of total mixing time and its bounds.Thereafter, we present the main features of the marathon library.In the second part of this article, we demonstrate the applicability of the library in a set of experiments, while trying to answer the questions described above.The results of these experiments are shown and discussed in the final section.

Methods and Materials
An important tool for understanding MCMC based sampling processes are Markov chains.To make the following sections more understandable, we will briefly introduce the most important concepts of the theory of Markov chains.For a less steep introduction into the topic, we recommend the lecture notes of Sinclair [6], the textbook by Levin, Peres and Wilmer [7], and the survey of Lovász [8].

Theory of Markov Chains
A Markov chain can be seen as a random walk on a set Ω of combinatorial objects, the so-called states.Two states x, y ∈ Ω can be connected via a transition arc (x, y) ∈ Ψ, when x can be transformed into y via small local changes.This definition induces a so-called state graph Γ = (Ω, Ψ), representing the objects and their adjacencies.We define for each pair of states x, y ∈ Ω the so-called proposal probability κ : Ω × Ω → [0, 1] and a weight function w : Ω → R + .A step from x to y in a random walk is done with transition probability P (x, y) = κ(x, y)•min(1, w(y)/w(x)).The matrix P = (P (x, y) x,y∈Ω ) is called transition matrix.
Algorithm 1 shows the classical random walk based sampling method.A random state b ∈ Ω is returned, sampled from a probability distribution p (t) a which describes a state's probability after t steps when starting at state a ∈ Ω.The fundamental theorem for Markov chains (see for example [7]) says that the distribution p (t) a converges for t → ∞ to the unique stationary distribution π with π(y) = w(y) x∈Ω w(x) for all y ∈ Ω and initial states a ∈ Ω, if the state graph Γ is non-bipartite, connected and reversible with respect to π, i.e. π(x)P (x, y) = π(y)P (y, x) for each pair of states x, y ∈ Ω. Markov chains which fulfil these properties are called ergodic.Neighbour selection: Pick a neighbour y of x with probability κ(x, y).

4:
Metropolis rule: x ← y with probability min 1, w(y) w(x) .5: end for 6: return x Total Mixing Time The total variation distance measures the distance between two probability distributions µ and η: We define τ a (ǫ) := min{t ∈ N : ||p a − π|| ≤ ǫ} as the minimal number of steps a random walk has to take to reach a distribution which is close to ǫ to its stationary distribution when starting at state a ∈ Ω.The total mixing time τ (ǫ) of a Markov chain is defined as A Markov chain is known as rapidly mixing, if τ (ǫ) can be bounded from above by a polynomial which depends on the input size n and the parameter ǫ −1 .The total mixing time can be bounded by several techniques.We briefly present two widely used bounding methods which we investigate in the remainder of this article.
Spectral Bound Let 1 = λ 1 > λ 2 ≥ . . .≥ λ |Ω| > −1 be the real eigenvalues of the transition matrix P and let λ max be defined as The total mixing time can be bounded by the lower and upper spectral bound [5], i.e., where π min denotes the smallest component of π.The transition matrix P is not necessarily symmetric.However, it can be transformed into a symmetrical matrix: where D is the |Ω| × |Ω| diagonal matrix with the components of π on its main diagonal.This transformation relies on the reversibility of a Markov chain.It is a classical result that P sym has the same eigenvalues as P .
Congestion Bound Sinclair's multi commodity flow method [5] is often used for bounding the total mixing time.Let P = x =y P xy be a family of simple paths in Γ, each P xy consisting of simple paths between x and y ∈ Ω.The maximum load congestion ρ, with respect to P, is then defined as For any system of paths P the total mixing time of a reversible Markov chain can be bounded by the congestion bound, i.e., The quality of the congestion bounds depend on the path construction scheme P. The congestion bound is often used to gain theoretical bounds on the total mixing time.

The marathon library
In this section, we introduce the main features of the marathon library.Our source code is published under MIT licence and freely available at https://github.com/srechner/marathon.To install, just follow the instruction manual at github.Several example programs are available.The marathon library is designed for the study of structural properties of Markov chains, respectively their corresponding state graphs.One of its current main applications is the computation of the total mixing time and several of its bounds.The suggested way to use our library is to implement the transition rules of a Markov chain and conduct some experiments to quickly learn some properties which are typically hard to find in theory.In particular, we allow the computation of the eigenvalue λ max and the application of the canonical path method to compute the congestion bound with some path congestion scheme.This way, one can quickly evaluate whether a scheme captures the total mixing time closely or if there is a noticeable gap.The marathon library has been designed with two main goals in mind: Performance and Extensibility.Aiming for the first goal, we use the C++ programming language, so that various highly efficient third party libraries become available.In particular, we use CUDA [9] and OpenMP [10] to accelerate compute-intense tasks with the use of multicore processors and highly parallel graphic processing units.To achieve the second goal, we designed our library for easy extensibility.Our representation of the state graphs is versatile enough to allow the integration of arbitrary graph algorithms.Moreover, we provide interfaces to simplify the programming effort and to hide complexity from the user.For example, adding a new Markov chain is done in a few steps: 1. Implement a data structure which represents a state, here simply called State.
2. Create a class which implements the MarkovChain interface, using State as a template parameter.The following virtual methods have to be overridden: • The method MarkovChain::arbitraryState, constructs an arbitrary state from a given input string (e.g.construct a bipartite graph from a given bipartite degree sequence).
• The method MarkovChain::neighbours, takes an object u of type State and creates a list of adjacent states N u whose elements are the result of all valid random choices within u plus their corresponding proposal probability κ, according to the set of transition rules.
In many cases this can be done within about 200 lines of code with a high amount of re-usability.Example chains have been included within the library.
We will now give a detailed description of the algorithmic ideas behind the most important methods available in the library.

Construction of the State Graph
Given an instance representation (e.g. a degree sequence), the first step of the analysis is the construction of the state graph for the given instance.The purpose of this step is to gain a sparse representation of the state graph Γ = (Ω, Ψ).Initializing Ω with an arbitrary state generated by the MarkovChain::arbitraryState method, we perform a full graph scan to iteratively enumerate the set of states and transition arcs.In each step, we select a state u ∈ Ω and enumerate all its adjacent states N u via the MarkovChain::neighbours method.We add a transition arc (u, v) with proposal probability κ(u, v) to Ψ for each adjacent state v ∈ N u and add v to Ω, if not already included.At the end of this step, we have a complete list Ω of states and an adjacency list representation of the transition arcs Ψ.The probability of each transition , where the function w : Ω → R + can optionally be overridden to enable the Metropolis rule (see Algorithm 1).The construction of the state graph relies on connectedness of the state graph, which is given whenever the Markov chain is irreducible.The construction of the state graph is a sequential process, consuming O(|Ω| + |Ψ|) time and memory.
Computation of the Total Mixing Time We compute the total mixing time τ (ǫ) of a state graph based on the following idea: Let P be the transition matrix of an ergodic Markov chain and let a ∈ Ω be an initial state.One step of the random walk changes the probability distribution p (t) a • P .Iterating this argument, the distribution after t steps is given by where p (0) a is a row vector whose components are p (0) a (a) = 1 for the initial state a and p (0) a (i) = 0, for i = a.So, each row of the matrix P t gives the distribution to a certain p (t) a for a ∈ Ω.The total mixing time τ (ǫ) can therefore be computed from P by iterated matrix multiplication.The principle can be described as follows: We transform the adjacency list representation of Γ into its adjacency matrix representation.The entries P i,j of this matrix represent the transition probabilities for going from state i to state j, so this matrix is the transition matrix described above.Since the total variation distance decreases monotonically with t, we can find τ (ǫ) with a two-step procedure.
1. Starting with i = 0, we iteratively increment i until max a∈Ω ||p This requires, in fact, just a sequence of i = ⌈log 2 (τ (ǫ))⌉ matrix squaring operations to compute the matrices P 2 , P 4 , P 8 , . . ., P 2 i .At the end of step one, we know that the total mixing time τ (ǫ) lies in the range 2. We use binary search to find τ (ǫ).We define two variables l = 2 i−1 and r = 2 i representing respectively lower and upper bounds on τ (ǫ).In a sequence of log 2 (r−l) = ⌈log 2 (τ (ǫ))⌉−1 iterations, we half the search space in each step by computing the total variation distance max a∈Ω ||p (m) a − π|| with m = ⌊(l + r)/2⌋ from P m .Although P m could, in principle, be computed from P with binary exponentiation, we save additional matrix multiplications by reusing P l to compute P m = P l P m−l .Maintaining the invariant max a∈Ω ||p a − π||, we update the lower bound l or the upper bound r, depending, whether the total variation distance is larger than ǫ or not.We stop when l ≥ r.The value of r is the total mixing time.
Computing the total mixing time requires O(|Ω| 2 ) memory and O(log(τ (ǫ)) • |Ω| ω ) time, where ω is the exponent of the matrix multiplication algorithm.(In most libraries, ω is equal to 3.However, the matrix multiplication implementation can be exchanged by using another implementation when building the library.) Computing the total mixing time is currently by far the most time and memory consuming operation in our library.Due to quadratic memory requirement, it is applicable only for small state graphs with |Ω| ≤ 30000 (depending on main memory).Since this method is also very time-consuming, we provide three implementations, which can be used if the appropriate hardware is available: a) A classical CPU implementation on the basis of openBLAS [11], which runs in parallel on multi-core CPUs.This method should be used for very small instances, or if no CUDA-capable GPU is available.
b) A cuBLAS [12] based GPU implementation which can be significantly faster than the CPU implementation.Because of the lack of memory on most GPUs, this implementation is designed for small state graphs with |Ω| ≤ 15000 (depending on GPU memory).
c) A CPU-GPU hybrid implementation using the cuBLASXt [13] library for matrix multiplication.This implementation should be the method of choice if a CUDA capable GPU is available.This method is typically faster than the CPU implementation and has the additional advantage that the CPU is free to be used for other computations.
The three implementations share the same asymptotical running time, but differ greatly in practice (see Fig.  Computation of the Spectral Bound The difference between the largest and the second largest eigenvalue (1 − λ max ) of a Markov chain's transition matrix is known as the spectral gap and is of central interest in mixing time analysis.
To compute this quantity, we use the well-known ARPACK++ [14] library to compute the two real eigenvalues with the largest magnitude of the transition matrix P .In case of non-symmetric transition rules, we first transform the transition matrix P into the symmetrical matrix P sym via Equation ( 5).After computing both eigenvalues with the largest magnitude, we use the second eigenvalue in order to compute the upper and lower spectral bound via Inequalities (3) and ( 4).Since this method requires only a sparse representation of the transition matrix, it is also applicable for large state graphs.
Computing the Congestion Bound As stated in the introduction, we are also interested in the quality of other bounding techniques.In particular, we are interested in the canonical path method which is often used to gain theoretical bounds.We implemented a method that can be used to gain an upper bound on the mixing time by computing the maximal congestion of a user-definable path construction scheme.Four each pair of states (u, v) ∈ Ω × Ω we apply the path construction scheme to construct a path p from state u to state v.The congestion of each transition arc lying on this path is increased by |p|π(u)π(v) (see Equation ( 6)).Since the construction of |Ω| 2 paths can be done completely independently of each other, we use OpenMP to construct all paths in parallel.
The maximal congestion of any transition arc is used for computing the upper bound via Inequality (7).This method has a time complexity of O(|Ω| 2 ) and a memory complexity of O(|Ψ|).
Network Analysis As state graphs can be seen as weighted directed graphs, all kinds of graph algorithms can be integrated into marathon.As an example, we added functions for computing the diameter of a state graph, as well as functions for computing the average path length.

Exemplary Markov-Chains
To demonstrate possible applications of marathon, we will describe a set of experiments in the following section.In our experiments, we focus on two famous sampling problems.The problem of uniformly sampling a perfect matching in a bipartite graph and of uniformly sampling a bipartite graph realization for given vertex degrees.Both problems are based on bipartite graphs.A bipartite graph G = (U, V, E) is an undirected graph with disjoint vertex sets V and U and a set of edges E ⊆ U × V , connecting vertices from U with V .We will refer to the vertices of U as u i , where i ranges from 1 to n := |U | and to the vertices of V as v j , where j ranges from 1 to n ′ := |V |.We analysed four different chains that can be used for these sampling problems.To make this article self-contained, we give a introduction to the chains.

Markov Chains for Sampling of Perfect Matchings
A perfect matching in a bipartite graph G = (U, V, E), with n = n ′ , is a subset M ⊆ E of edges such that a) all edges in M are non-adjacent, and b) |M | = n.Uniformly sampling a perfect matching is the problem of choosing one perfect matching from the set of all perfect matchings of G uniformly at random.We describe two important Markov chains which are known to solve the problem of uniformly sampling perfect matchings.These chains origin from applications in statistical physics and are widely known in the field of Markov chain analysis, so they make good examples for our analysis.Both Markov chains use the set of near-perfect matchings as auxiliary states.A near-perfect matching is a subset of non-adjacent edges from E, but with one edge less than a perfect matching.So, being specific, the Markov chains used for sampling are based on the set Ω M of perfect and near perfect matchings in G.
Matching Chain One Broder [15] introduced a Markov chain which can be described as follows.In state M ∈ Ω M we pick a candidate state M ′ (line three in Algorithm 1) by choosing an edge e = {u, v} from E uniformly at random.One of five cases occurs: 1.If M is a perfect matching and e ∈ M , then remove e from M and select 2. If M is a near-perfect matching and e ∈ M , then add e to M and select 3. If M is a near-perfect matching, u is not matched in M , and an edge e ′ = {w, v} ∈ M exists, then remove e ′ from M , add e and select M ′ ← (M \ {e ′ }) ∪ {e}.
4. Symmetrically, if M is a near-perfect matching, v is not matched in M , and an edge e ′ = {u, z} ∈ M exists, then remove e ′ from M , add e and select M ′ ← (M \ {e ′ }) ∪ {e}.
5. In all other cases, stay at M .
The state M ′ is proposed as a candidate state with proposal probability κ(M, M ′ ) = 1/|E| in line three of Algorithm 1.The proposed state M ′ is accepted or refused by line four of Algorithm 1, using unit weights.So, this chain converges to a uniform stationary distribution.In the remainder of this article, we refer to this Markov chain as Matching Chain One.Jerrum and Sinclair [16] proved that Matching Chain One is rapidly mixing for a graph class where the ratio of near-perfect matchings N (G) and perfect matchings M(G) in G is polynomially bounded.Being specific, its total mixing time can be bounded from above: The class of dense bipartite graphs, i.e. bipartite graphs G = (U, W, E) with |U | = |W | = n and with minimal vertex degree of at least n/2, was shown as a class for which the ratio of near-perfect and perfect matchings is polynomially bounded [16].For this class of graphs, the following upper bound on the total mixing time can be gained [17]: A main problem of Matching Chain One is the fact that it does not only generate perfect matchings but also near-perfect matchings.In many graphs with 2n vertices the ratio of the number of near-perfect and perfect matchings cannot be bounded by a polynomial which is dependent on n.In such cases, the sampling of a perfect matching needs an exponential number of trials to see a perfect matching at all.
Matching Chain Two One way to overcome the problem of an exponential total mixing time has been given by Jerrum, Sinclair and Vigoda [18].Their main idea is to use a carefully chosen weight function w to sample from a nonuniform distribution.This way, they gained a rapidly mixing Markov chain which can be used to sample perfect matchings from any given bipartite graph.
In addition, the transition rules were changed to the following: In a state M ∈ Ω M : 1.If M is a perfect matching, we choose an edge e = {u, v} from M uniformly at random.Remove e from M and select M ′ ← M \ {e}.
2. If M is a near-perfect matching where u and v are unmatched vertices, we choose a vertex z ∈ U ∪ V uniformly at random.The proposal probability κ(M, M ′ ) for two states M, M ′ ∈ Ω M is therefore 1/n when moving between perfect and near-perfect matchings and 1/(2n) when moving between near-perfect matchings.In connection with Algorithm 1, this Markov chain converges to a stationary distribution π which is proportional to the weight function w.Defining M as the set of perfect matchings of G, and N u,v as the set of near perfect matchings in G where u and v are unmatched, the weight function suggested in [18] is defined as: Knowing the values of |M| and |N u,v |, the total mixing time of this chain can be bounded from above by the following polynomial [19]: Unfortunately, |M| and |N u,v | are usually not known in practice.Jerrum, Sinclair and Vigoda gave a description of a procedure for approximating these quantities in polynomial time [18].In our experiments though, we knew the exact values since we had a list of all states.In combination with Algorithm 1, this Markov chain is rapidly mixing for all bipartite graphs and can be used for sampling of perfect matchings.We refer to this chain as Matching Chain Two.

Markov Chains for Sampling of Bipartite Graph Realizations
The bipartite graph realization problem is to find, for a pair of non-increasing integer sequences (a 1 , . . ., a n ), (b 1 , . . ., b n ′ ) with a i , b i > 0, a bipartite graph G = (U, V, E), without loops and parallel edges, such that the vertex degrees of u i ∈ U matches the a i and that the vertex degrees of v i ∈ V match the b i .Such a graph G is then called bipartite graph realization or, shorter still, realization.
The set of all realizations of a given degree sequence pair makes the sampling set Ω R for this sampling problem.The bipartite graph realization problem is so important in many fields that it was reinvented several times.It is also known as the problem of contingency tables with given marginal sums or, as matrices with fixed row and column sums.As a direct application of random bipartite graphs, scientists often use a sampled realization as a null model to prove statistical hypotheses.Since sampling of random bipartite graphs is important in a wide range of fields, we chose it as a second example for our experiments.
In contemporary literature, one can find two possible Markov chains that solve the problem of uniform sampling of a bipartite graph realization.One of these approaches is simply to transform it into the perfect matching sampling problem [20].This way, it is known that the running time of the sampling procedure is bounded polynomially.However, the very simple and intuitive switch chain, presented by Kannan et al. [21], is widely used in practice.
Switch Chain One At a state G = (U, V, E) ∈ Ω R , choose four integers i, j, k, l with 1 ≤ i ≤ k ≤ n and 1 ≤ j ≤ l ≤ n ′ uniformly at random.One of three cases occur: 3. In all other cases, select G ′ ← G.
Such a switch, i.e. changing the endpoints of two edges, does not change the degree sequence and thus constructs a new bipartite graph realization of S. It turns out, all graph realizations can be generated by starting with a given graph realization and applying a number of switches [22].The proposal probability κ(G, G ′ ) of a non-loop transition is 2 n(n+1) • for all x ∈ Ω R Algorithm 1 converges to a uniform stationary distribution.Kannan et al. [21] proved polynomially bounded mixing times of this chain for regular sequence pairs.Miklós et al. [23] extended the proof to half-regular sequence pairs.For all other sequence classes, the problem of the rapid mixing property is still open.We will refer to this Markov chain as Switch Chain One.
Recently, Greenhill gave a proof that the total mixing time of the directed, nonbipartite version of Switch Chain One is bounded by the following polynomial when the largest degree d max of the input sequence lies in the range 3 |E|, where |E| is the number of edges in the realization [24]: Switch Chain Two A modification of Switch Chain One, given by Berger et al. [25] for directed graph sequences, can also be used for the bipartite version.The main idea is to reduce the number of transition loops in the state graph.To avoid non-convergence, an additional additional artificial edge {u 0 , v 0 }, which connects two additional vertices u 0 and v 0 , is added to the set of edges.In a state G = (U, V, E) ∈ Ω R select a pair of non-adjacent edges e 1 = {u i , v j } and e 2 = {u k , v l } from the set E ∪ {{u 0 , v 0 }}.
Without the additional edge, a problem would occur if no adjacent edge pairs existed for each graph realization of a sequence; then, the state graph would not possess loops.Similar arguments like those of Ref. [25] show that this chain converges to the uniform distribution.This chain reduces the average probability of transition loops and, as it will turn out, the average total mixing time.However, the asymptotical total mixing time is the same as for Switch Chain One.We will refer to this chain as Switch Chain Two.

Experiments
We demonstrate our library to experimentally analyse the mixing time behaviour of the Markov chains.The goal of our experiments is to quantify the gap between the total mixing time and the various bounding methods.We are asking the question, whether the bounding techniques are tight or if there is a huge gap between total mixing time and its bounds.The latter case would support the thesis that the total mixing time actually is far smaller than known in theory.This would be a welcomed message for any programmer who wants to implement a sampling method and does not want to choose a highly polynomial (or even exponential) number of steps.We want to make some general remarks before beginning to describe our experiments.
The choice of ǫ ǫ ǫ Recall, that the total mixing time τ (ǫ), as well as its bounds depends on the value of ǫ which defines the distance to the stationary distribution at the end of the random walk.For the purpose of our experiments, we choose ǫ = 10 −3 .Furthermore, for other values of ǫ we get similar results.
Applying canonical paths To compute the congestion bounds, we use the path construction schemes from the original proofs and apply them to concrete state graphs.As those path construction schemes were used to gain theoretical bounds without full knowledge of a Markov chain's structure, applying them to concrete state graphs shows how sharp such bounds could be if full structural information were available.Consequently, the bounds in general cannot be better than the upper bounds that are gained when applying the scheme on an actual state graph.To understand, how sharp the theoretical bounds can be, we analyse the differences to the actual total mixing times.In the case of matching chains, we use the construction scheme presented in [16] which was used to gain the upper bound given in Equation 10.In case of the switch chains we apply the definition of the canonical paths from [21].For the definition of the path construction schemes, we refer to the original articles.We tried as close as possible to implement the original canonical path construction rules.However, the definition often relies on an arbitrary order of the vertices, or orderings of cycles.It turned out that changing these orders has an effect on the quality of the congestion bound.However, as the proofs for the upper bounds are valid for all orderings, this is not a problem for our observations.
Hardware Usage Since we wanted to compute the properties of a large amount of instances, we decided to compute as many properties of a state graph as possible in parallel.As shown before, the computation of the total mixing time is the bottleneck for our computations.For this reason, we decided to transfer this task to the GPU whenever possible.This has the advantage, that the free CPU cores can be used for the computation of other quantities, while the GPU is occupied.With this strategy, we achieve a large throughput of instances.Fig. 2 shows the hardware usage while computing the properties of a Markov chain instance.

Enumeration Experiment
In our first experiment, we wanted to gain an impression concerning the dimension of the total mixing time and its bounds.
Experimental Setup Firstly, we enumerated all small input instances up to a given limit for each chain.In particular, we enumerated 19,378 bipartite degree sequence pairs with up to 6+6 vertices as input for the switch chains and 89,242 connected non-isomorphic bipartite graphs with 6+6 vertices as input for the matching chains.While the first task was done with a simple backtracking procedure, the latter task was done by the command line tool nauty [26] binatorial objects served as input instances for the Markov chains.Even though the size of input is small, the resulting state graphs can be large.For example, the largest state graph processed in this experiment had 297,200 states and corresponds to the regular degree sequence pair ( Since both the number of instances, as well as the size of the state graphs grow exponentially, we were not able to perform such an experiment for a much larger instance size.For each instance, we constructed the corresponding state graph.For all state graphs with a maximum of 20,000 states, we computed the following properties: a) total mixing time, b) spectral bound, and c) canonical path congestion bound.Although we could easily compute the latter bounds for much larger state graphs, we did not do so, since we need the total mixing time for comparison.

Results
In Fig. 3, we show the results of the enumeration experiment.Each input instance corresponds to two data points, showing the ratio of the upper spectral bound (green), and congestion bound (blue) with the total mixing time.We immediately observe that the spectral bound is very close to the total mixing time for all instances, in contrast to the congestion bound.More precisely, in case of Matching Chain Two, the congestion bound is up to 800 times larger than the total mixing time.Similar observations can be made for the other chains.In Fig. 4, we highlight the instances which are known to have a polynomial congestion bound.In the case of the switch chains, these are regular and halfregular degree sequences.In the case of Matching Chain One, these are dense bipartite graphs.Matching Chain Two is known to be rapidly mixing for all instances.We find that the observations can also be confirmed within this set.

Scaling Experiment
We wanted to know whether the observed effects remain valid for larger instances.The main problem is, that the number of instances for our Markov chains, both degree sequences and bipartite graphs, grows exponentially with input size.So, a complete enumeration becomes infeasible when the instance size grows.To gain some insights, we picked selected instances for the switch chains and scaled them to larger size.
Experimental Setup We picked a half-regular sequence pattern (n − 1, n − 2, 2, 1), (2, 2, . . ., 2), where the number of two's in the second sequence is n.The parameter n can be used to scale the instance to different sizes.In doing so, a bipartite graph which realizes this degree sequences has n + 4 vertices and 2n edges.These sequence pairs are half-regular and so, it is known that the switch chains are rapidly mixing.For each n between 4 and 50, we constructed the corresponding state graph and computed the total mixing time and its bounds.Again, due to the large memory consumption, we stopped computing the total mixing time when |Ω| exceeded 20,000.To predict missing values of total mixing time for larger state graphs up to about 600,000 states, we used a linear model based on the following observation.

Results
We observed that the lower spectral bound seems to have a linear relationship with the total mixing time.This allowed us to predict the missing total mixing time values using a simple linear model (see Fig. 5 for details).To quantify the gap between the bounds and total mixing time, we again divided each bound through the corresponding total mixing time or its predicted value.Fig. 6 shows the result of this experiment.We observed that the gap between the congestion bound and the total mixing time grows with input size n.Furthermore, we observed that the same is true for the upper spectral bound, but in a much slower way.We repeated this experiment with other half-regular sequence patterns and observed similar effects.In particular, we changed our sequence pattern ).In the case of Switch Chain Two, we observe similar effects.These results are interesting because these bounds are lower bounds on the mixing time of the switch chains for all half-regular sequence pairs.Since two of them are clearly super-linear, a random walk with a linear number of

Loop Reduction Experiment
From the data of the enumeration experiment, we observed that the total mixing time does not correlate with the size of the state graph (see Fig. 8).Instead, the largest mixing time appears at input instances corresponding to small state graphs.For example, the sequence pair (6, 6, 6, 6, 5, 5), (6,6,6,6,5,5), possesses just two different states but is beneath the instances with largest total mixing  time in both switch chains.To explain this effect, consider sequence pairs of the general form (n, n, . . ., n − 1, n − 1), (n, n, . . ., n − 1, n − 1), where 2n is the number of vertices in the corresponding bipartite graph.It is apparent that such sequence pairs posses two different realizations, meaning, their state graph have two states.Now consider Switch Chain One; for a given n, the number of ways to choose i, j, k, l is (n • (n + 1)/2) 2 ∈ O(n 4 ).Due to the large number of edges in the sequence pairs, all but one of these choices result in loops.As a consequence, the expected number of steps, just to get from one state to the other, is O(n 4 ), which is immediately a lower bound for the mixing time of Switch Chain One.The fact that an instance with polynomially bounded total mixing time makes the largest mixing time over all instances is likely an effect of the small input size.To better understand this effect, we decided to further reduce the influence of the loops.
Experimental Setup We repeated the enumeration experiment but made some artificial modifications on each state graph before computing its properties.
For each state graph we determined the minimal loop probability over all states P min := min i∈Ω P ii .We wanted to reduce the transition probability of each loop in the state graph by this quantity.However, to avoid the removal of all loops and possible problems with convergence, we removed only 99% of this probability from the loops.This way, we gained new transition probabilities of The amount of .99Pmin which was reduced from the probability mass of each state, was restored to the network through rescaling of the remaining transition arcs: By subtracting the same amount of probability from each loop, we kept the symmetry of the transition matrix so that the chain's stationary distribution was still uniform.The result was a state graph with the same structure but drastically reduced loop probability.In such state graphs, the mixing times are much more dependent on the structure of the state graph than on the number of the loops.We repeated the computation of the total mixing time and its upper bounds with those modified state graphs for Switch Chain One.

Results
We first observed that the modification indeed had a strong effect on the total mixing time (see Fig. 9).While the largest total mixing time in our instance set is about 1400, in the set of reduced state graphs, it is 63.In addition, we observed that the total mixing time after the reduction steps no longer correlates with the average loop probability.So, the effects we observe now are more or less detached from the influence of the loops.We observed, that the effect we observed in the enumeration experiment still remain (see Fig. 10).
The quality of the congestion bound is still bad while the largest total mixing time occurs at relatively small state graphs.We finished our experiments with  the observation that the upper spectral bound as well as the congestion bound heavily depend on the average vertex degree of the state graph (see Fig. 11).We concluded that the average degree is a structural property of a state graph which has a stronger influence on the total mixing time than its size.

Results and Discussion
We introduced the marathon library and demonstrated how it can be used to support the analysis of sampling algorithms by a set of experiments.We briefly summarise the most important observations from our experiments: • The congestion bound is much larger than the total mixing time in almost all cases.The quality of the congestion bounds deteriorates with growing input size.This observation is made in reduced and non-reduced state graphs.
• The upper spectral bound keeps close to total mixing time even when the input size grows, although it can be observed, that its quality drops slowly.Again, this observation is made in reduced and non-reduced state graphs.
• The lower spectral bound seems to have an almost linear relationship with the total mixing time.This way, it can be used to precisely predict the total mixing time.The search for an explanation of this effect is likely a good subject for further theoretical studies.
• Maybe the most surprising observation made in our experiments is that large total mixing time tends to occur at small state graphs.
• The average vertex degree of a state graph has a strong relationship on its total mixing time.This observation should be investigated further.
Figuring out which sequences have a tendency towards more diverse tex degrees in their state graphs could lead to further interesting rapidly mixing graph classes.
Our observations indicate that the theoretical bound gained by the canonical path method is likely too pessimistic.Moreover, although we know the exact structure of a state graph in our experiments, which can never be the case in a normal practical scenario, the congestion bound is too large.This leads us to the conclusion that the canonical path method will never yield applicable values.Furthermore, this gives hope to the hypothesis, that the true total mixing time is smaller than existing theory is able to prove.So, we are optimistic about the applicability of the switch chains.We think that a promising next step could be to focus on two fields which need one another: investigating and the proving of special structures for state graphs depending on the problem and figuring out new relationships between structures of graphs and its eigenvalues.

Future Work
One of our next steps is to use our new tool very extensively.We want to investigate further chains and test further hypotheses.Moreover, we plan to add further functionality to the marathon library to enable additional research questions.For example, we want to compute the distribution of the mixing time.
While the total mixing time displays the maximal number of steps necessary to drop below a total variation distance of ǫ, the mixing time as a function of the starting state would lead to further insights, such as average mixing time.
Another additional feature would be the computation of a multi commodity flow congestion scheme, which is a generalization of the canonical path method.Furthermore, we want to improve the performance of existing methods.In particular, we want to add multi GPU support as well as a parallel method for the construction of the state graph.Another topic would be the parallel evaluation of the canonical path congestion method using GPUs.
1 for a comparison of performance).Since most of the work is matrix multiplication, this comparison almost directly maps to a performance comparison between the corresponding libraries for matrix multiplication.The O(log(τ (ǫ))) invocations of the total variation distance computation, each with time complexity of O(|Ω| 2 ), contribute only a little to the running time.

Figure 1 .
Figure 1.Single and double precision performance of the total mixing time computation.The charts show the running time for the computation of the total mixing time on the example of five state graphs of size 8012 to 20358.Due to the relatively small amount of GPU memory on our test system, only the first four (respectively two) state graphs could be processed by the GPU implementation in single precision mode (respectively double precision mode).The running times were measured on an Ubuntu 14.04 system with a Intel Xeon E3-1231, NVIDIA GeForce GTX 970 (4 GB GPU memory) and 16 GB of main memory, using gcc in version 4.8.4 and CUDA in version 7.0.

Figure 3 .
Figure 3.The quality of the upper bounds.The quotient of congestion bound (blue) and total mixing time, respectively upper spectral bound (green) and total mixing time versus the size of the corresponding state graph.

Figure 4 .
Figure 4.The quality of the upper bounds for rapidly mixing instances.The results of Fig. 3 filtered to highlight instances with known polynomial mixing time.Instances with no known polynomial bound are coloured gray.

Figure 5 .
Figure 5. Relationship between the lower spectral bound and the total mixing time.The total mixing time is shown in connection to a corresponding lower spectral bound for sequence pairs of the form (n − 1, n − 2, 2, 1), (2, 2, . . ., 2).We use the displayed formulas to predict missing values for total mixing time.

Figure 6 .
Figure 6.Quality of the upper bounds with growing instance size.The quotient of congestion bound (blue), respectively upper spectral bound (green) with total mixing time is shown as a function of the instance size.A dotted line shows the point where we stop computing the total mixing time and start using the lower spectral bound approximation.

Figure 7 .
Figure 7. Growing asymptotic for the total mixing time of half-regular sequence pairs.The observed mixing time curves for the degree sequence pairs of type A are coloured violet, of type B red, and of type C orange.

Figure 8 .
Figure 8. Number of states versus total mixing time.Each data point represents one input instance from our set of input instance.Instances with known polynomial mixing time are highlighted red.

Figure 9 .
Figure 9. Influence of the average loop probability on the total mixing time.Each data point represents a state graph, for each input instance of Switch Chain One, before and after loop reduction.

Figure 10 .
Figure 10.Properties of reduced state graphs.(A) Size of a state graph versus its total mixing time.Regular and half-regular sequence pairs are highlighted red.(B) Quotient of congestion bound (blue) and upper spectral bound (green) with corresponding total mixing time.

Figure 11 .
Figure 11.Influence of the average vertex degree.Connection between average vertex degree of a state graph and its total mixing time, respectively canonical path bound.
Figure2.Typical hardware usage for our experiments.At first, the state graph is being constructed as a sequential task.Afterwards, the properties of the state graph are being computed in parallel.When each property is computed the next instance is processed in the same manner.