Local Hypergraph Clustering using Capacity Releasing Diffusion

Local graph clustering is an important machine learning task that aims to find a well-connected cluster near a set of seed nodes. Recent results have revealed that incorporating higher order information significantly enhances the results of graph clustering techniques. The majority of existing research in this area focuses on spectral graph theory-based techniques. However, an alternative perspective on local graph clustering arises from applying max-flow and min-cut techniques to the clustering objectives, which offers distinctly different guarantees. For instance, a new method called capacity releasing diffusion (CRD) was recently proposed and shown to preserve local structure around the seeds better than spectral methods. The method was also the first local clustering technique that is not subject to the quadratic Cheeger inequality, under the assumption that a good cluster exists near the seed nodes. In this paper, we propose a local hypergraph clustering technique called hypergraph CRD (HG-CRD) by extending the CRD process to cluster based on higher order patterns, encoded as hyperedges of a hypergraph. Moreover, we theoretically show that HG-CRD gives guarantees in terms of a quantity called motif conductance, rather than the biased version used in previous experiments. Experimental results on synthetic datasets and real world graphs show that HG-CRD improves clustering quality.


introduction
Graph and network mining techniques traditionally experience a variety of issues as they scale to larger data. For instance, methods can take prohibitive amounts of time or memory (or both), or simply return results that are trivial. One important class of methods with a different set of trade-offs is local clustering algorithms [Andersen et al., 2006]. These methods apply a graph mining procedure, clustering in this case, around a seed set of nodes, where we are only interested in the output near the seeds. In this way, local clustering algorithms avoid the memory and time bottlenecks that other algorithms experience. They also tend to produce more useful results, as the seed provides powerful guidance about what is relevant (or not). For more about the trade-offs of local clustering and local graph analysis, we refer readers to the survey [Fountoulakis et al., 2017].
Among local clustering techniques, the two predominant paradigms [Fountoulakis et al., 2017] are (i) spectral algorithms [Andersen et al., 2006; Chung, 2007; Mahoney et al., 2012; Spielman and Teng, 2013], which use random walk, PageRank, and Laplacian methodologies to identify good clusters near the seed, and (ii) mincut and flow algorithms [Lang and Rao, 2004; Andersen and Lang, 2008; Orecchia and Zhu, 2014; Veldt et al., 2016, 2018], which use parametric linear programs as well as max-flow and min-cut methodologies to identify good clusters near the seed. The two paradigms make different trade-offs. Mincut-based techniques often better optimize objectives such as conductance or sparsity, whereas spectral techniques are usually faster but slightly less precise in their answers. These differences often manifest in the types of theoretical guarantees they provide; for spectral algorithms, these are usually called Cheeger inequalities. A recent innovation in this space of algorithms is the capacity releasing diffusion, which can be thought of as a hybrid max-flow and spectral algorithm in that it combines features of both. For example, this procedure provides excellent recovery of many node attributes in labeled graph experiments on Facebook networks.
In a different line of work on graph mining, the importance of using higher-order information embedded within a graph or data has recently been highlighted in a number of venues [Benson et al., 2016; Tsourakakis et al., 2017; Li and Milenkovic, 2017]. In the context of graph mining, this usually takes the form of building a hypergraph from the original graph based on a motif [Li and Milenkovic, 2017]. Here, a motif is just a small pattern: think of a triangle in a social network, or a directed pattern such as a cycle or a feed-forward loop in other networks. The hypergraph has one hyperedge for each instance of the motif in the network. Analyzing these hypergraphs usually gives much more refined insight into the graph. This type of analysis can also be combined with local spectral methods, as in the MAPPR method [Yin et al., 2017]. However, the guarantees of spectral techniques are usually biased for large motifs because they implicitly or explicitly operate on a clique expansion representation of the hypergraph [Li and Milenkovic, 2017].
In this paper, we present HG-CRD, a hypergraph-based implementation of the capacity releasing diffusion hybrid algorithm that combines spectral-like diffusion with flow-like guarantees. In particular, we show that our method provides cluster recovery guarantees in terms of the true motif conductance, not the approximate motif conductance implicit in many spectral hypergraph algorithms [Benson et al., 2016; Yin et al., 2017]. The key insight behind the method is an algorithm that manages flow over the hyperedges induced by the motifs. More precisely, using a triangle as the motif for illustration, if node i wants to send flow to node j during the process, then instead of sending the flow through the edge (i, j), it sends the flow through the hyperedge (i, j, k). This ensures that node j, which receives flow from node i, is connected to i via a motif, and that nodes i, j and k are explored simultaneously. To show why it is important to consider higher order relations, we explore a metabolic network similar to the one explored by Li and Milenkovic [Li and Milenkovic, 2017], where nodes represent metabolites and edges represent interactions between metabolites. These interactions are usually described by equations such as M1 + M2 → M3, where M1 and M2 are the reactants and M3 is the product of the interaction. Our goal here is to group the metabolite represented by node 7 with other nodes based on their metabolic interactions. Figure 1 shows the motif used in HG-CRD and the graph to cluster. By considering this motif, HG-CRD separates three metabolic interactions, while CRD separates six metabolic interactions. Note that, for three node motifs, Benson et al. [Benson et al., 2016] show that a weighted graph called the motif adjacency matrix suffices for many applications. We also consider this weighted approach via an algorithm called CRD-M (where M stands for motif matrix), and find that the HG-CRD approach is typically, but not always, superior.
Additionally, we show that HG-CRD gives a guarantee in terms of the exact motif-conductance score that mirrors the guarantees of the CRD algorithm. More precisely, if there is a good cluster nearby, defined in terms of motif-conductance, and that cluster is well connected internally, then the HG-CRD algorithm will find it. This is formalized in section 4.6.
Finally, experimental results on both synthetic datasets and real world graphs for community detection show that HG-CRD has a lower motif conductance than the original CRD on most datasets and a higher precision on all datasets. To summarize our contributions:
· We propose a local hypergraph clustering technique, HG-CRD, by extending the capacity releasing diffusion process to account for hyperedges (section 4).
· We show that HG-CRD is the first local higher order graph clustering method that is not subject to the quadratic Cheeger inequality, by assuming the existence of a good cluster near the seed nodes (section 4.6).
· We compare HG-CRD to the original CRD and other related work on both synthetic datasets and real world graphs for community detection (section 5).

related work
While there is a tremendous amount of research on local graph mining, we focus our attention on the most closely related ideas that have influenced our ideas in this manuscript. This includes hypergraph clustering and higher-order graph analysis, local clustering, the capacity releasing diffusion method, and the MAPPR method.

HYPERGRAPH CLUSTERING AND HIGHER-ORDER GRAPH ANALYSIS
It is essential to develop graph analysis techniques that exploit higher order structures, as shown by [Benson et al., 2018], because these higher order structures reveal important latent organizational characteristics of graphs that are hard to uncover from the edges alone. Several techniques [Benson et al., 2016; Tsourakakis et al., 2017; Li and Milenkovic, 2017] are available for the general analysis of higher-order clustering or hypergraph clustering. The two are related because the higher-order structures, or motifs, as used in [Benson et al., 2016], can be assembled into a hypergraph representation where each hyperedge expresses the presence of a motif. In all three methods, the hypergraph information is re-encoded into a weighted graph or weighted adjacency matrix. For instance, [Benson et al., 2016] construct an n × n motif adjacency matrix, where each entry (i, j) is the number of motif instances that contain both node i and node j, and then use this matrix as the input to spectral clustering. This approach has good theoretical guarantees only for 3 node motifs. Likewise, Tectonic [Tsourakakis et al., 2017] constructs the same type of motif adjacency matrix. Tectonic then normalizes the edge weights of the motif adjacency matrix by dividing by the degrees of the motif nodes. Finally, to detect the clusters, Tectonic removes all edges with weight less than a threshold θ. More recently, [Li and Milenkovic, 2017] generalize the previous results by assigning different costs to different partitions of a hyperedge before reducing the hypergraph to a matrix. They also show that the conductance of the returned cluster is a quadratic approximation to the optimal conductance, and they give the first performance bound for hyperedges of general size. See another recent paper [Veldt et al., 2020] for additional discussion and ideas regarding hypergraph constructions and cuts.
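As a concrete illustration of the motif adjacency matrix described above, the following Python sketch builds it as a sparse dictionary from a list of motif instances. It is a minimal sketch under our own assumptions: the function name `motif_adjacency` and the dictionary-of-pairs representation are hypothetical, and motif instances are assumed to be given as tuples of node labels.

```python
from collections import defaultdict
from itertools import combinations


def motif_adjacency(hyperedges):
    """Sparse motif adjacency matrix stored as {(i, j): count}.

    Entry (i, j) counts the motif instances (hyperedges) that contain
    both node i and node j; the matrix is symmetric.
    """
    W = defaultdict(int)
    for e in hyperedges:
        for i, j in combinations(e, 2):
            W[(i, j)] += 1
            W[(j, i)] += 1
    return dict(W)


# Usage: the four triangles of a 4-clique on nodes 0..3.
K4_triangles = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
W = motif_adjacency(K4_triangles)
print(W[(0, 1)])  # 2: nodes 0 and 1 share two triangles
```

Storing only nonzero pairs keeps the matrix sparse, which matters because most node pairs share no motif instance.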
In contrast, our goal here is an algorithm, HG-CRD, that uses the hyperedges directly and avoids the motif-adjacency matrix construction entirely. We use the motif-weighted adjacency matrix (CRD-M) solely for comparison.

LOCAL CLUSTERING
As mentioned in the introduction, local clustering largely splits along the lines of spectral approaches (those that use random walks, PageRank, and Laplacian matrices) and mincut approaches (those that use linear programming and max-flow methodologies) [Fountoulakis et al., 2017]. For instance, the approximate personalized PageRank approach of [Andersen et al., 2006] is tremendously successful in revealing local structure in the graph near a seed set of vertices. Examples abound [Leskovec et al., 2009; Kloumann and Kleinberg, 2014; Zhu et al., 2003; Gleich and Mahoney, 2015; Pan et al., 2004]. We elaborate on these below because we use the motif-weighted local clustering algorithm MAPPR as a key point of comparison with this class of techniques.
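For readers less familiar with the spectral side, the sketch below shows the push procedure behind approximate personalized PageRank [Andersen et al., 2006], in its lazy-walk form, for an unweighted graph stored as adjacency lists. It is a minimal illustration and not the implementation used for APPR or MAPPR in our experiments; the function name `appr_push`, the default `alpha` and `eps`, and the queue-based scheduling are our assumptions. MAPPR applies the same kind of diffusion to the motif-weighted graph instead.

```python
from collections import deque


def appr_push(adj, seed, alpha=0.15, eps=1e-4):
    """Approximate personalized PageRank via the push procedure.

    adj maps each node to a list of its neighbors (undirected, unweighted,
    no self-loops). Returns (p, r): the approximate PageRank vector and the
    residual, with r[u] < eps * deg(u) for every non-isolated node on return.
    """
    p, r = {}, {seed: 1.0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        du = len(adj[u])
        ru = r.get(u, 0.0)
        if du == 0:  # isolated seed: all of its mass settles in place
            p[u] = p.get(u, 0.0) + ru
            r[u] = 0.0
            continue
        if ru < eps * du:  # residual too small to push
            continue
        # Lazy-walk push: alpha of the residual settles at u, half of the
        # remainder stays as residual at u, half is spread over neighbors.
        p[u] = p.get(u, 0.0) + alpha * ru
        r[u] = (1 - alpha) * ru / 2
        share = (1 - alpha) * ru / (2 * du)
        for v in adj[u]:
            old = r.get(v, 0.0)
            r[v] = old + share
            if old < eps * len(adj[v]) <= r[v]:  # v just crossed the threshold
                queue.append(v)
        if r[u] >= eps * du:
            queue.append(u)
    return p, r


adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
p, r = appr_push(adj, seed=0)
ranking = sorted(p, key=lambda v: p[v] / len(adj[v]), reverse=True)
```

Each push moves mass from the residual into p without creating or destroying any, so the entries of p and r always sum to one; a local cluster is then typically read off by a sweep over nodes ordered by p(u)/d(u).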
Mincut-based techniques [Lang and Rao, 2004; Andersen and Lang, 2008; Orecchia and Zhu, 2014] solve either a sequence of linear programs or a single parametric linear program that is, under strong assumptions, capable of optimally solving for the best local set in terms of objectives such as set conductance (number of edges leaving the set divided by the volume of the set) and set sparsity (number of edges leaving the set divided by the number of vertices in it). These can be further localized [Orecchia and Zhu, 2014; Veldt et al., 2016, 2018] by incorporating additional objective terms to find smaller sets. As mentioned, a recent innovation is the CRD algorithm, which presents a new set of opportunities. We explain more about that algorithm below. Related ideas exist for metrics such as modularity [Clauset, 2005].

MAPPR
Motif-based approximate personalized PageRank (MAPPR) [Yin et al., 2017] is a local higher order graph clustering technique that generalizes approximate personalized PageRank (APPR) [Andersen et al., 2006] to cluster based on motifs instead of edges. MAPPR was proven to detect clusters with small motif conductance, with a running time that depends on the size of the cluster. Experimental results on community detection show that MAPPR outperforms APPR in the quality of the detected communities. Like the existing work, MAPPR uses the motif-weighted adjacency matrix. Again, our goal is to avoid this construction, although we use it for comparison.
[Wang et al., 2017] proposed a new diffusion technique called capacity releasing diffusion (CRD), which is faster and stays more localized than spectral diffusion methods. Additionally, CRD is the first diffusion method that is not subject to the quadratic Cheeger inequality. The basic idea of the CRD process is to give each node a certain capacity and start by placing excess flow on the seed nodes. The nodes then transmit their excess flow to other nodes using a push/relabel strategy similar to the one proposed by [Goldberg and Tarjan, 1988]. If all the nodes end up with no excess flow, then there was no bottleneck that kept the flow contained, and CRD repeats the same process while doubling the flow at the visited nodes. On the other hand, if at the end of an iteration too many nodes have too much excess (according to some parameter), then we have hit a bottleneck, and we stop and return the cluster. The cluster is identified by the nodes that have excess flow at the end of the iteration. In this work, we extend the CRD process to cluster based on higher order structures.

KEY DIFFERENCES WITH OUR CONTRIBUTION
In this work, we aim to extend the capacity releasing diffusion process (CRD) to cluster based on higher order structures instead of edges. Since CRD was shown to stay more localized than spectral diffusions, we expect higher order techniques based on CRD to also stay more localized than spectral-based higher order techniques such as MAPPR; this is the main motivation for extending the CRD process. We discuss this localization property in more detail in section 4.5.

local cluster quality
Two important measures to quantify a set S as a cluster in the graph G are conductance [Schaeffer, 2007] and motif conductance [Benson et al., 2016]. We define both here and also review our general notation.
Scalars are denoted by small letters (e.g., m, n), sets are shown in capital letters (e.g., X, Y ), vectors are denoted by small bold letters (e.g., f , g), and matrices are capital bold (A, B). Given an undirected graph G = (V, E), where V is the set of nodes and E is the set of edges, we use d(i) to denote the degree of the vertex i, and we use n to be the number of nodes in the graph. Additionally, let e be the vector of all ones of appropriate dimension.
Conductance captures how well-connected the set S is both internally and externally, and it is defined in terms of the volume and cut of the set. The volume of set S is $\mathrm{vol}(S) = \sum_{v \in S} d(v)$. The cut of set S is the set of all edges with one endpoint in S and the other in $\bar{S} = V - S$, that is, $\mathrm{cut}(S) = \{\{u, v\} \in E : u \in S \text{ and } v \in \bar{S}\}$. The conductance of the set S is then defined as
$$\phi(S) = \frac{|\mathrm{cut}(S)|}{\mathrm{minvol}(S)},$$
where $\mathrm{minvol}(S) = \min(\mathrm{vol}(S), \mathrm{vol}(V - S))$. For weighted graphs, the volume uses the weighted degrees and $|\mathrm{cut}(S)|$ is the sum of the weights of the cut edges. Motif conductance, as defined by [Benson et al., 2016], generalizes conductance to measure clustering quality with respect to a specific motif. It is defined as
$$\phi_M(S) = \frac{\mathrm{cut}_M(S)}{\mathrm{minvol}_M(S)},$$
where $\mathrm{cut}_M(S)$ is the number of motif instances that have at least one node in S and at least one node in $\bar{S}$, $\mathrm{vol}_M(S)$ is the number of motif-instance endpoints in S, and $\mathrm{minvol}_M(S) = \min(\mathrm{vol}_M(S), \mathrm{vol}_M(V - S))$. Additionally, we define the size k of a motif as the number of nodes in the motif.
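To make the two definitions concrete, the Python sketch below evaluates both on a toy graph: two triangles joined by a single bridge edge. The function names are hypothetical, the graph is assumed unweighted, and returning infinity when the min-volume is zero is our own convention.

```python
from collections import defaultdict


def conductance(edges, S):
    """|cut(S)| / min(vol(S), vol(V - S)) for an unweighted edge list."""
    S = set(S)
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    cut = sum(1 for u, v in edges if (u in S) != (v in S))
    vol_S = sum(deg[v] for v in S)
    minvol = min(vol_S, sum(deg.values()) - vol_S)
    return cut / minvol if minvol > 0 else float("inf")


def motif_conductance(hyperedges, S):
    """cut_M(S) / min(vol_M(S), vol_M(V - S)) over a list of motif instances."""
    S = set(S)
    # A motif instance is cut if it has nodes both inside and outside S.
    cut_M = sum(1 for e in hyperedges if 0 < sum(v in S for v in e) < len(e))
    # vol_M(S) counts the motif-instance endpoints that fall inside S.
    vol_S = sum(sum(v in S for v in e) for e in hyperedges)
    minvol = min(vol_S, sum(len(e) for e in hyperedges) - vol_S)
    return cut_M / minvol if minvol > 0 else float("inf")


edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
triangles = [(0, 1, 2), (3, 4, 5)]
print(conductance(edges, {0, 1, 2}))            # 1/7: only the bridge (2, 3) is cut
print(motif_conductance(triangles, {0, 1, 2}))  # 0.0: no triangle is cut
```

The bridge edge lies in no triangle, so it counts against edge conductance but is invisible to motif conductance, which is exactly the distinction the motif-based objective is meant to capture.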
The motif hypergraph is built on the same node set as G where each hyperedge represents an instance of a single motif. For instance, the motif hypergraph for Figure

a capacity releasing diffusion for hypergraphs via motif matrices
Our main contribution is the HG-CRD algorithm that avoids the motif-weighted adjacency matrix. We begin, however, by describing how CRD can already be combined with the motif-weighted adjacency matrix in the CRD-M algorithm. For three node motifs, this offers a variety of theoretical guarantees, but there are significant biases for larger motifs that we are able to mitigate with our HG-CRD algorithm.
One straightforward way to extend capacity releasing diffusion to cluster based on higher order patterns is to construct the motif adjacency matrix $W_M$ as described by [Benson et al., 2016], where the entry $W_M(i, j)$ equals the number of motif instances that contain both node i and node j. After constructing the motif adjacency matrix, we run the CRD process on the graph represented by $W_M$. As the CRD process does not take edge weights into account, we consider two variations. The first duplicates each edge (u, v) $W_M(u, v)$ times, which results in a multigraph to which the CRD algorithm can easily be adapted. The second multiplies the weight of the edge (u, v) used in the CRD process by $W_M(u, v)$. We report the second variation in the experimental results as it has the best F1 on all datasets. By the proof in [Benson et al., 2016], when the size of the motif is three, the motif conductance equals the conductance of the graph represented by the motif adjacency matrix. The CRD process guarantees that the conductance of the returned cluster is O(φ), where φ is a parameter input to the algorithm that controls both the maximum flow that can be pushed along each edge and the maximum node level during the push-relabel process. Therefore, this extension guarantees that when the motif size is three, the motif conductance of the returned cluster is O(φ). In the rest of the paper, we propose another extension of capacity releasing diffusion to higher order structures that has a motif conductance guarantee of O(kφ) for any motif size k.
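The equality for three-node motifs, and why it can fail for larger motifs, can be checked one hyperedge at a time. In the weighted graph $W_M$, a motif instance e adds one unit of weight to every node pair it contains. So when S splits e with j of its k nodes inside, e contributes $j(k-j)$ to the weighted cut, while every node of e gains k − 1 units of weighted degree, making the weighted volume exactly (k − 1) times $\mathrm{vol}_M$. The short Python sketch below (the function name is hypothetical) enumerates the possible splits. For triangles every cut instance costs 2, so the $W_M$ conductance equals the motif conductance. For a 4-node motif, a split of 1 and 3 nodes costs 3 but a split of 2 and 2 costs 4, which is the kind of bias mentioned above for larger motifs.

```python
def clique_cut_weight(hyperedge, S):
    """Weight a hyperedge adds to the cut of S in the clique-expanded graph W_M.

    Each of the j * (k - j) node pairs separated by S carries one unit of weight.
    """
    j = sum(1 for v in hyperedge if v in S)
    return j * (len(hyperedge) - j)


for k in (3, 4, 5):
    e = tuple(range(k))
    # Put the first j nodes of e inside S, for every proper split 1 <= j < k.
    costs = sorted({clique_cut_weight(e, set(range(j))) for j in range(1, k)})
    print(k, costs)
# 3 [2]
# 4 [3, 4]
# 5 [4, 6]
```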

A TRUE HYPERGRAPH CRD
Let H = (V, E) be a hypergraph where the hyperedges E represent motifs. For this reason, we use motif and hyperedge largely interchangeably. The basic idea in CRD is to push flow along edges until a bottleneck emerges, and the basic idea in HG-CRD is the same. We push flow across hyperedges as much as we can until there is too much excess flow on the nodes, which means the nodes are unable to push much flow to the hyperedges containing them. This happens when the algorithm has reached a bottleneck, and hence the algorithm has identified the desired cluster with respect to the higher order pattern or motif. As such, HG-CRD is highly algorithmic: it implements a specific procedure for moving flow around the graph to identify this bottleneck.
The input to the HG-CRD algorithm is a hypergraph H, a seed node s, and parameters φ, τ, t, α. The parameter t controls the number of iterations of the procedure, which is largely correlated with how large the final set should be. The parameter τ controls what is too much excess flow on the nodes and the parameter α controls whether we can push flow along the hyperedge or not. The value φ is the target value of motif conductance to obtain. That is, φ is an estimate of the bottleneck. Note that much of the theory is in terms of the value of φ, but the algorithm uses the quantity C = 1/φ in many places instead. The output of the algorithm is the set A representing the cluster of node s.
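Because HG-CRD takes a hypergraph H as input, the motif instances must be enumerated first. A minimal Python sketch for the triangle motif follows: it lists each triangle once as a hyperedge and records, for every node, the indices of the hyperedges containing it. The function name `triangle_hypergraph`, the per-node incidence lists, and the assumption of an undirected simple graph with sortable node labels are ours; other motifs would need their own enumeration.

```python
from collections import defaultdict
from itertools import combinations


def triangle_hypergraph(edges):
    """Build the triangle-motif hypergraph of an undirected simple graph.

    Returns (hyperedges, incidence): one (u, v, w) tuple with u < v < w per
    triangle, and a map from each node to the indices of its hyperedges.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    hyperedges = []
    for u in adj:
        for v, w in combinations(sorted(adj[u]), 2):
            # Report each triangle once, from its smallest node u.
            if u < v and w in adj[v]:
                hyperedges.append((u, v, w))
    incidence = defaultdict(list)
    for idx, e in enumerate(hyperedges):
        for x in e:
            incidence[x].append(idx)
    return hyperedges, dict(incidence)


edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
H, incidence = triangle_hypergraph(edges)
print(H)  # [(0, 1, 2), (3, 4, 5)]
```

Nodes that belong to no triangle simply do not appear in `incidence`, since they cannot send or receive flow through any hyperedge.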
We begin by providing a set of quantities the algorithm manipulates and a simple example.

NODE AND HYPEREDGE VARIABLES IN HG-CRD
Each node v in the graph has four values associated with it: the motif-based degree of the node $d_M(v)$, the flow at the node $m_M(v)$, the excess flow at the node $ex(v)$, and the node level $l(v)$. In the hypergraph, we define the degree as the number of hyperedges containing the node:
$$d_M(v) = |\{e \in E : v \in e\}|.$$
Additionally, we define the excess flow as $ex(v) = \max(m_M(v) - d_M(v), 0)$; therefore, we visualize each node v as having a capacity of $d_M(v)$, and any flow beyond this is excess. A node v with $l(v) < h$ and $ex(v) > 0$ can push flow, and in this case we call node v active. We restrict the node level to a maximum value $h = 3\log(e^T m_M)/\phi$. If $l(v)$ reaches h, then node v can neither send nor receive flow.
Moreover, each hyperedge $e = (v, u_1, \ldots, u_{k-1})$ has a capacity $C = 1/\phi$, where φ is a parameter input to the algorithm, and a flow value $m_M(e)$, which represents how much flow has been pushed along this hyperedge. To determine whether more flow can be pushed through this hyperedge, we associate a residual capacity with each hyperedge, defined as $r(e) = \min(l(v), C) - m_M(e)$. In the process, we can push flow through the hyperedge $e = (v, u_1, \ldots, u_{k-1})$ if and only if it is an eligible hyperedge, where an eligible hyperedge is defined as:
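To make these per-node and per-hyperedge quantities concrete, the Python sketch below computes them for a small set of triangle hyperedges. It only evaluates the quantities defined above; the eligibility test and the push rule are not reproduced. All function names are hypothetical, we assume the natural logarithm in h, and the activity test (l(v) < h and ex(v) > 0) is our assumption, taken from the original CRD definition.

```python
import math
from collections import defaultdict


def motif_degrees(hyperedges):
    """d_M(v): number of hyperedges (motif instances) containing v."""
    d = defaultdict(int)
    for e in hyperedges:
        for v in e:
            d[v] += 1
    return dict(d)


def excess(m, d, v):
    """ex(v) = max(m_M(v) - d_M(v), 0): flow above the node's capacity d_M(v)."""
    return max(m.get(v, 0.0) - d[v], 0.0)


def max_level(m, phi):
    """h = 3 log(e^T m_M) / phi, where e^T m_M is the total flow (natural log assumed)."""
    return 3 * math.log(sum(m.values())) / phi


def is_active(level, h, ex):
    """Assumed, following CRD: a node can push when l(v) < h and ex(v) > 0."""
    return level < h and ex > 0


def residual(level_v, C, flow_e):
    """r(e) = min(l(v), C) - m_M(e) for a hyperedge e pushed from node v."""
    return min(level_v, C) - flow_e


# Usage: seed 0 starts with flow 2 * d_M(0), as in the high-level algorithm.
hyperedges = [(0, 1, 2), (0, 1, 3), (0, 2, 3)]
d = motif_degrees(hyperedges)  # {0: 3, 1: 2, 2: 2, 3: 2}
m = {0: 2.0 * d[0]}
print(excess(m, d, 0))  # 3.0
```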

THE HIGH-LEVEL ALGORITHM
The exact algorithm, adjusted from the pseudocode in [Wang et al., 2017] to account for the impact of hyperedges, is given in the HG-CRD Algorithm. In this section, we summarize the main intuition behind it. At the start, the flow value $m_M(s)$ at the seed node s equals $2d_M(s)$, and in each iteration, HG-CRD doubles the flow on each visited node. Each node v that has excess flow picks an eligible hyperedge e that contains v and sends flow to all other nodes in e. After performing an iteration (which we call HG-CRD-inner in the pseudocode), we remove any excess flow at the nodes (step 8 in the pseudocode). If no node has excess flow in any iteration (no bottleneck has been reached yet), then, since we double the flow at each iteration j, the total sum of flow at the nodes equals $2d_M(s)2^j$. However, if the total sum of flow at the nodes is significantly (according to the parameter τ) less than $2d_M(s)2^j$, then step 8 has removed a lot of excess flow because many nodes had excess flow at the end of the iteration. This indicates that the flow was contained and that we have reached a bottleneck for the cluster. Finally, we obtain the cluster A by performing a sweep-cut procedure: we sort the nodes in descending order of their level values, evaluate the conductance of each prefix set, and return the prefix set with the lowest conductance. This yields a cluster with motif conductance O(kφ), as shown in section 4.6.
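The final sweep-cut step can be sketched in a few lines of Python. The sketch below is a minimal illustration under our own assumptions: it scores each prefix by motif conductance (matching the O(kφ) motif-conductance guarantee), ignores nodes with level 0, breaks ties by node label, and takes hand-picked levels as input rather than levels produced by an actual HG-CRD run. The function names are hypothetical.

```python
def motif_conductance(hyperedges, S):
    """cut_M(S) / min(vol_M(S), vol_M(V - S)); infinity if the denominator is 0."""
    inside = [sum(v in S for v in e) for e in hyperedges]
    cut_M = sum(1 for j, e in zip(inside, hyperedges) if 0 < j < len(e))
    vol_S = sum(inside)
    minvol = min(vol_S, sum(len(e) for e in hyperedges) - vol_S)
    return cut_M / minvol if minvol > 0 else float("inf")


def sweep_cut(levels, hyperedges):
    """Return the prefix set (nodes sorted by decreasing level) with the
    smallest motif conductance, together with that value."""
    order = sorted((v for v in levels if levels[v] > 0),
                   key=lambda v: (-levels[v], v))
    best_set, best_phi = set(), float("inf")
    prefix = set()
    for v in order:
        prefix.add(v)
        phi = motif_conductance(hyperedges, prefix)
        if phi < best_phi:
            best_set, best_phi = set(prefix), phi
    return best_set, best_phi


hyperedges = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3), (3, 4, 5), (4, 5, 6)]
levels = {0: 3, 1: 3, 2: 3, 3: 2, 4: 1}
print(sweep_cut(levels, hyperedges))  # ({0, 1, 2, 3}, 0.2)
```

Recomputing the score for every prefix keeps the sketch short; an efficient version would update the cut and volume incrementally as each node is added.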
FIGURE 2 - Explanation of hypergraph CRD (HG-CRD) steps on a toy example when we start the diffusion process from node 0, where each hyperedge (a triangle in this case) has a maximum capacity of two. Red nodes are nodes with excess flow, while black nodes are nodes with no excess. At the end, HG-CRD identifies the correct cluster containing nodes 0, 1, 2 and 3.
In this section, we describe the hypergraph CRD dynamics on a toy example. Figure 2 shows the motif-based degree and the flow of each node at each iteration. We start the process from the seed node 0 and choose the triangle as the motif to consider in our clustering process. Additionally, we fix the maximum capacity C of each hyperedge to two. At iteration 0, node 0 increases its level by one in order to be able to push flow, then picks hyperedges {0, 1, 3}, {0, 1, 2} and {0, 2, 3} and pushes one unit of flow to each of them. Next, at iteration 1, each node doubles its flow value. In this iteration, only node 0 has excess flow, and therefore it again picks hyperedges {0, 1, 3}, {0, 1, 2} and {0, 2, 3} and pushes one unit of flow to each of them. Similarly, at iteration 2, each node doubles its flow value. In this iteration, nodes 0, 1, 2 and 3 have excess flow, and therefore node 1 sends two units of flow to hyperedge {1, 4, 6} and cannot send more because the hyperedge has reached its maximum capacity (recall that we fixed the maximum capacity of each hyperedge to two). In the rest of the iteration, nodes 0, 1, 2 and 3 exchange flow among themselves until their levels reach the maximum level h, which terminates the iteration. Finally, four nodes have excess, and the detected cluster A is {0, 1, 2, 3}, as these nodes have $m_M \ge d_M$. In the case of the original CRD, the returned cluster A would be {0, 1, 2, 3, 7}, as some flow leaks to node 7.

MOTIVATING EXAMPLE
One of the key advantages of CRD and our HG-CRD generalization is that they can provably explore a vastly smaller region of the graph than random walk or spectral methods, even those based on approximate PageRank. Figure 3 shows a generalization of the graph provided in the original CRD paper [Wang et al., 2017], which we cluster based on the triangle motif. In this graph, there are p paths, each having l triangles, and each path is connected at its end to the vertex u through a triangle. In higher order spectral clustering techniques, the process requires $\Omega(k^2 l^2)$ steps to spread enough mass to cluster B. During these steps, the probability of visiting node v is $\Omega(kl/p)$. If $l = \Omega(p)$, then the random walk escapes from B. In contrast, consider the worst case for HG-CRD, where we start from u. In this case, 1/p of the flow leaks from u to v in each iteration. As HG-CRD doubles the flow at the nodes in each iteration, it needs only log l iterations to spread enough flow to cluster B, and therefore the flow leaking to $\bar{B}$ is $(\log l)/p$ of the total flow in the graph. This means that HG-CRD stays more localized in cluster B and leaks flow to $\bar{B}$ much less than higher order spectral clustering techniques, by a factor of $\Omega(p/\log l)$. HG-CRD is also able to detect the cluster in fewer iterations than higher order spectral clustering.
FIGURE 3 - Generalized example comparing hypergraph CRD (HG-CRD) and higher order spectral clustering. Starting the diffusion process from node u, HG-CRD stays more localized in cluster B and leaks flow to $\bar{B}$ much less than higher order spectral clustering techniques, by a factor of $\Omega(p/\log l)$.
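One way to read the factor off the two leakage estimates above is to treat the spectral visiting probability $\Omega(kl/p)$ and the HG-CRD leaked fraction $(\log l)/p$ as comparable measures of the mass that reaches $\bar{B}$ (a simplification on our part) and take their ratio, using the same assumption $l = \Omega(p)$ as in the text:

```latex
\frac{\Omega(kl/p)}{(\log l)/p}
  = \Omega\!\left(\frac{kl}{\log l}\right)
  = \Omega\!\left(\frac{p}{\log l}\right)
  \qquad \text{when } l = \Omega(p) \text{ and } k \ge 1 .
```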

HG-CRD ANALYSIS
The analysis of CRD proceeds in a target recovery fashion. Suppose there exists a set B with motif conductance φ that is well-connected internally in a manner we will make precise shortly. This notion of internal connectivity is discussed at length in [Wang et al., 2017] and . Suffice it to say that these internal connectivity constraints are likely to hold for much of the social and information networks studied, whereas they are unlikely to hold for planar or grid-like data. We show that HG-CRD, when seeded inside B with appropriate parameters, identifies a set closely related to B (in terms of precision and recall) with motif conductance at most O(kφ), where k is the largest size of a hyperedge. Our theory heavily leverages the results from [Wang et al., 2017]. We restate the theorems here for completeness and provide all the adjusted details of the proofs in the supplementary material. However, these should be understood as mild generalizations of those results; hence, the statements of the theorems are extremely similar to those in [Wang et al., 2017]. Note that there are some issues with directly applying the proof techniques. For instance, some of the case analysis requires new details to handle scenarios that only arise with hyperedges. These scenarios are discussed in section 4.7.
Let us assume that there exists a good cluster B that we are trying to recover. The goodness of cluster B will be captured by the following two assumptions, which are a generalized version of the two assumptions mentioned in CRD analysis: Assumption 1 (Generalization of Wang et al. [2017], Assumption 1).
where φ
HG-CRD works as follows. As in the CRD process, we assume a good cluster B that satisfies Assumptions 1 and 2 and has the following properties: $\mathrm{vol}_M(B) \le \mathrm{vol}_M(G)/2$, the diffusion starts from $v_s \in B$, and we know estimates of φ

THEORETICAL CHALLENGES
Proving the previous theorems for HG-CRD is non-trivial since several cases arise with hyperedges that do not exist in the original CRD. In this subsection, we discuss some of these non-trivial cases (the complete proofs of the theorems are provided in the supplementary material):
· Theorem 1: To prove this theorem, CRD [Wang et al., 2017] groups nodes based on their level values (nodes at level i are placed in group $B_i$) and then considers the nodes with level at least i to be in one cluster ($S_i$). The edges between cluster $S_i$ and $\bar{S}_i$ are then categorized into two groups based on the level values of their endpoints. Since hyperedges do not have two endpoints, we needed to extend this categorization and prove similar properties for the extension. Because of this extension, we obtain a gap of k between the motif conductance and φ: we prove that the motif conductance is O(kφ), whereas the original proof shows that the conductance is O(φ).
· Theorem 2: The proof of CRD [Wang et al., 2017] relies on the following equation: where E(A, B) is the set of edges from cluster A to cluster B. This equation holds only for edges and does not hold for hyperedges with k > 2. Therefore, we provide an entirely different proof that works for hyperedges of any size.

RUNNING TIME AND SPACE DISCUSSION.
The running time of local algorithms is usually stated in terms of the output. Recall that the running time of CRD is O((vol(A) log vol(A))/φ). Because hypergraph CRD replaces the degree of each vertex with its motif-based degree, and the flow in each iteration depends on the motif-based degree, its running time depends on $\mathrm{vol}_M(A)$ instead of vol(A) and is $O((\mathrm{vol}_M(A) \log \mathrm{vol}_M(A))/\phi)$. The space complexity is $O(\mathrm{vol}_M(A))$, as each node v explored by our local clustering stores the hyperedges containing it.

experimental results
In this section, we compare CRD using the motif adjacency matrix (CRD-M) and hypergraph CRD (HG-CRD) to the original CRD on the community detection task using both synthetic datasets and real world graphs. We then compare HG-CRD to other related work, namely motif-based approximate personalized PageRank (MAPPR) and approximate personalized PageRank (APPR), on the community detection task using both undirected and directed graphs.
We use the LFR model [Lancichinetti et al., 2008] to generate the synthetic datasets, as it is widely used to evaluate community detection algorithms. The parameters used for the model are n = 1000, average degree 10, maximum degree 50, minimum community size 20 and maximum community size 100. In LFR, node degrees and community sizes follow power law distributions; we set the exponent for the degree sequence to 2 and the exponent for the community size distribution to 1. We vary the mixing parameter µ from 0.02 to 0.5 in steps of 0.02 and run CRD, CRD using the motif adjacency matrix (CRD-M) and hypergraph CRD (HG-CRD), using a triangle as our desired higher order pattern. We run each technique 100 times from random seed nodes and report the median of the results. The implementation of CRD requires four parameters: the maximum capacity per edge C, the maximum level of a node h, the maximum number of iterations t of CRD-inner, and τ, which controls how much excess flow is too much. Following the setting in [Wang et al., 2017], we use the same parameters for all CRD variations: h = 3, C = 3, τ = 2 and t = 20. For HG-CRD, we set α to 1. As shown in figure 4, HG-CRD has the lowest motif conductance and a better F1 than the original CRD and CRD-M. HG-CRD achieves a higher F1 when communities are harder to recover, that is, when µ is large.
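F1 is the harmonic mean of the precision and recall of a detected community against its ground-truth community. The Python sketch below computes it, together with the median across runs used in the synthetic experiments. The function names and the example sets are hypothetical, and scoring a run with no overlap as F1 = 0 is our convention.

```python
from statistics import median


def precision_recall_f1(found, truth):
    """Precision, recall and F1 of a detected set against a ground-truth set."""
    found, truth = set(found), set(truth)
    tp = len(found & truth)
    if tp == 0:
        return 0.0, 0.0, 0.0
    precision = tp / len(found)
    recall = tp / len(truth)
    return precision, recall, 2 * precision * recall / (precision + recall)


def median_f1(runs):
    """Median F1 over (found, truth) pairs, e.g. one pair per random seed."""
    return median(precision_recall_f1(found, truth)[2] for found, truth in runs)


runs = [({1, 2, 3, 4}, {1, 2, 3, 5, 6, 7}),  # P = 0.75, R = 0.5, F1 = 0.6
        ({1, 2}, {1, 2}),                    # perfect recovery, F1 = 1.0
        ({8, 9}, {1, 2})]                    # no overlap, F1 = 0.0
print(median_f1(runs))  # 0.6
```

Reporting precision separately, as in the tables, shows whether a gain in F1 comes from returning fewer irrelevant nodes or from covering more of the community.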

HYPERGRAPH-CRD COMPARED TO CRD
Local community detection is the task of finding a community given one of its members. In this task, we start the diffusion from the given node. Table 3 shows the community detection results for CRD, CRD using the motif adjacency matrix (CRD-M) and hypergraph CRD (HG-CRD). In these experiments, we select 100 ground-truth communities whose sizes fall in the ranges listed in Table 2. The community sizes are chosen to be similar to those reported in [Yin et al., 2017] and [Wang et al., 2017]. For every algorithm, we start from each node in the community and report the result from the seed node that yields the best F1 measure. The implementation of CRD requires four parameters: the maximum capacity per edge C, the maximum level of a node h, the maximum number of times t that CRD inner is called, and the threshold τ that determines how much excess flow is too much. Following the setting in [Wang et al., 2017], we use the same parameters for all CRD variants: C = 3, h = 3, t = 20 and τ = 2. The only exception is YouTube, where we set the maximum number of iterations t to 5. With the default setting, the returned communities were very large and their precision was therefore low. Increasing the maximum number of iterations enlarges the returned community because it lets the algorithm explore wider regions. We choose the triangle as the motif. For HG-CRD, we try both possible values of α, 1 and 2, and report the results of both versions. As shown in Table 3, HG-CRD has a lower motif conductance than CRD on all datasets, and it has the best or near-best F1 on all datasets except Amazon. Looking closely, HG-CRD also has a higher precision than the original CRD on all datasets. This can be attributed to its use of motifs, which keeps the diffusion more localized within the community.
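Because the comparison above is stated in terms of motif conductance, here is a small sketch of how it can be computed for the triangle motif. It assumes the common definition in which a triangle is cut when its nodes fall on both sides of the cluster boundary. Under that definition, $\mathrm{vol}_M(S)$ sums, over the nodes in $S$, the number of triangles containing them. The paper's exact normalisation may differ, and the function names are our own.

```python
import math
from collections import defaultdict
from itertools import combinations


def triangles(edges):
    """All triangles of an undirected graph, each as a sorted 3-tuple."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return sorted({tuple(sorted((u, v, w)))
                   for u in adj
                   for v, w in combinations(sorted(adj[u]), 2)
                   if w in adj[v]})


def triangle_motif_conductance(edges, cluster):
    """cut_M(S) / min(vol_M(S), vol_M(complement of S)) for the triangle motif."""
    S = set(cluster)
    tris = triangles(edges)
    inside = [sum(v in S for v in t) for t in tris]  # nodes of each triangle inside S
    cut = sum(1 for c in inside if 0 < c < 3)         # triangles spanning both sides
    vol_s = sum(inside)                               # each triangle adds 1 per node in S
    vol_rest = 3 * len(tris) - vol_s
    denom = min(vol_s, vol_rest)
    return cut / denom if denom > 0 else math.inf     # undefined for empty sides


# Two triangles joined by a single edge: the first triangle is a perfect cluster.
two = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(triangle_motif_conductance(two, {0, 1, 2}))  # 0.0
```

Note that the bridging edge (2, 3) is cut in the ordinary edge sense but lies in no triangle, so it does not count against the cluster under this measure.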
CRD using the motif adjacency matrix (CRD-M) also has a lower motif conductance than CRD on four datasets and a higher F1 on three. As with HG-CRD, the higher F1 of CRD-M can be attributed to its higher precision relative to CRD.
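CRD-M runs the original CRD on a weighted graph derived from the motif. A minimal sketch of the triangle motif adjacency matrix is given below. It assumes the usual construction, in which an edge's weight is the number of triangles it participates in; we assume this is the construction intended here. The matrix is stored sparsely as a dict keyed by (u, v) with u < v. Edges that lie in no triangle get weight zero and are dropped. The function name is hypothetical.

```python
from collections import defaultdict


def triangle_motif_adjacency(edges):
    """Sparse W_M: W_M[(u, v)] = number of triangles that contain edge {u, v}."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    weights = {}
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each undirected edge once
                common = len(adj[u] & adj[v])  # each common neighbour closes one triangle
                if common:
                    weights[(u, v)] = common
    return weights


print(triangle_motif_adjacency([(0, 1), (1, 2), (0, 2), (2, 3)]))
```

Dropping zero-weight edges is what lets the motif-weighted graph disconnect regions that are joined only by edges outside any triangle.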

RELATED WORK COMPARISON
In this section, we compare hypergraph CRD with motif-based approximate personalized PageRank (MAPPR) and approximate personalized PageRank (APPR) on the community detection task. We follow the same experimental setup as described in the previous section. Table 4 shows the precision, recall and F1 of HG-CRD, MAPPR and APPR. As shown in Table 4, HG-CRD obtains the best … Furthermore, we compare HG-CRD with CRD, APPR and MAPPR on a directed graph, Email-EU. We use the same parameters as in the previous section and set α to 1. For this task, we try the three directed motifs shown in Figure 5: a triangle with edges in any direction (M1), a cycle (M2), and a feed-forward loop (M3). As shown in Table 5, HG-CRD has the highest F1, exceeding MAPPR by around 10%, which we again attribute to its high precision.
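The three directed motifs can be recognised from the arcs among three mutually linked nodes. The brute-force sketch below is our own illustration. It runs in O(n^3), so it is only suitable for small graphs. It also makes a simplifying assumption: a triangle containing a reciprocated pair is counted as M1 only, whereas the paper's Figure 5 may treat bidirectional edges differently. With one arc per pair, a cycle has within-triangle out-degrees (1, 1, 1) and a feed-forward loop has (2, 1, 0). These are the only two orientations of a triangle.

```python
from itertools import combinations


def directed_triangle_motifs(arcs):
    """Classify node triples as M1 (any direction), M2 (cycle) or M3 (feed-forward loop)."""
    out = {}
    for u, v in arcs:
        if u != v:
            out.setdefault(u, set()).add(v)
    nodes = sorted(set(out) | {v for heads in out.values() for v in heads})

    def linked(a, b):
        # a pair counts as linked if an arc exists in at least one direction
        return b in out.get(a, ()) or a in out.get(b, ())

    m1, m2, m3 = [], [], []
    for tri in combinations(nodes, 3):
        a, b, c = tri
        if not (linked(a, b) and linked(b, c) and linked(a, c)):
            continue
        m1.append(tri)  # triangle in any direction
        inner = [(x, y) for x in tri for y in tri if x != y and y in out.get(x, ())]
        if len(inner) != 3:  # a reciprocated pair: neither a pure cycle nor a pure FFL
            continue
        outdeg = sorted(sum(1 for x, _ in inner if x == n) for n in tri)
        if outdeg == [1, 1, 1]:
            m2.append(tri)  # cycle
        elif outdeg == [0, 1, 2]:
            m3.append(tri)  # feed-forward loop
    return m1, m2, m3


m1, m2, m3 = directed_triangle_motifs([(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (3, 5)])
print(m1, m2, m3)  # [(0, 1, 2), (3, 4, 5)] [(0, 1, 2)] [(3, 4, 5)]
```

Each matched triple would then become one hyperedge of the motif hypergraph for the corresponding choice of M1, M2 or M3.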

RUNNING TIME EXPERIMENTS
We have made no particular effort to optimize running time. Nevertheless, we compare the running times of CRD, CRD-M and HG-CRD on the LFR datasets while varying the mixing parameter µ. Figure 6 shows the running time for detecting the community of a single node. We repeat the run 100 times from random nodes and report the mean running time, with error bars showing the standard deviation. As the figure shows, when the communities are well separated (µ below 0.3), CRD is the fastest technique. In this regime, HG-CRD is slower than CRD by a small gap of 0.1 seconds and faster than CRD-M. When the communities are hard to recover (µ above 0.3), CRD takes longer to recover them. In contrast, HG-CRD and CRD-M recover the communities faster and with higher quality, since they use higher-order patterns.
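The timing protocol (100 runs from random seed nodes, mean with standard-deviation error bars) can be reproduced with a small harness like the one below. It is a sketch under our own assumptions. `detect` is a hypothetical stand-in for whichever CRD variant is being timed, since no such function is defined in the paper. Seeds are sampled without replacement, and `time.perf_counter` measures wall-clock time.

```python
import random
import time
from statistics import mean, stdev


def time_from_random_seeds(detect, nodes, n_runs=100, rng_seed=0):
    """Mean and standard deviation of the wall-clock time of detect(seed) over random seeds."""
    rng = random.Random(rng_seed)
    seeds = rng.sample(list(nodes), n_runs)  # distinct random seed nodes
    times = []
    for s in seeds:
        start = time.perf_counter()
        detect(s)
        times.append(time.perf_counter() - start)
    spread = stdev(times) if len(times) > 1 else 0.0  # stdev needs at least two samples
    return mean(times), spread


# Usage with a dummy workload in place of a real local clustering routine.
avg, sd = time_from_random_seeds(lambda seed: sum(range(1000)), range(1000))
```

Fixing `rng_seed` makes the set of seed nodes reproducible across the three methods, so that they are timed on the same starting nodes.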

discussion
In this paper, we have proposed HG-CRD, a hypergraph-based implementation of the capacity releasing diffusion hybrid algorithm. In future work, we plan to use a similar idea of pushing flow through hyperedges to extend other flow-based methods to cluster based on motifs. Candidates include max-flow quotient-cut improvement (MQI) [Lang and Rao, 2004], Flow-Improve [Andersen and Lang, 2008] and Local Flow-Improve [Orecchia and Zhu, 2014].

Proof. This is just algebra: $\mathrm{vol}_M$ …

Proof of Theorem 3. We extend the original CRD proof to higher-order patterns. We classify the nodes into three cases based on their level values:
· Case 0 ($l(v) = h$): Node $v$ kept increasing its level because it had excess flow, until it reached the maximum level.
· Case 1 ($1 \le l(v) < h$): Node $v$ has no excess at the end; otherwise, its level would have increased to $h$.
· Case 2 ($l(v) = 0$): $m_M(v) < d_M(v) + (k-1)$. Node $v$ never had excess flow to push.

Proof of case 1: If $B_h$ is empty, then the nodes were able to diffuse all of their excess. Then a full HCRD step is done and …

Proof of case 2: In this case, $B_0$ and $B_h$ are not empty. Let $S_i$ be the set of nodes with level at least $i$. We claim that one of the cuts $S_i$ must have motif conductance $O(k\phi)$. We start by dividing the hyperedges between $S_i$ and $\bar{S}_i$ into two groups:
· Group 1: Hyperedges with at least one endpoint in $B_j$ and at least one other endpoint in $B_j$ or $B_{j-1}$, where $j \ge i$.
· Group 2: Hyperedges that span more than one level, that is, the level of the highest-level node in the hyperedge exceeds that of every other node in the hyperedge by at least two.

Additionally, let $z_1(i, j) = (k-1) \times |\text{hyperedges in group 1}|$, $z_2(i) = (k-1) \times |\text{hyperedges in group 2}|$, $\phi_1(i, j) = z_1(i, j)/\mathrm{vol}_M(S_i)$ and $\phi_2(i) = z_2(i)/\mathrm{vol}_M(S_i)$. First, we show that there exists $i^*$ between $h$ and $h/2$ such that $\phi_1(i^*, j) \le \phi_h$. The proof is by contradiction. Suppose that $\phi_1(i, j) > \phi_h$ for all $i = h, \dots, h/2$ and $j \ge i$; then: … As $h = 3\log(e^T m_M)/\phi \le 3\log(\mathrm{vol}_M(G))/\phi$, we get: … (1)

The idea of the remainder of the proof is that the $z_2$ hyperedges certainly push flow out of $S$, while the $z_1$ hyperedges may push flow both into and out of $S$.
Consider any hyperedge counted in $z_2(i)$. In such a hyperedge, the level of the highest-level node exceeds that of every other node by at least two, which means that the residual capacity of the hyperedge is zero. To see this, note that because the level difference is at least two, the highest-level node did not consider pushing flow into the hyperedge. This can happen only if (1) it had no excess flow, (2) a node in the hyperedge had reached its maximum capacity, or (3) the hyperedge had reached its maximum capacity.

Option (1) is ruled out because the level difference is at least two. This means the highest-level node had excess and raised its level to push flow into another hyperedge. Option (2) is also ruled out. In that case, the node at maximum capacity has excess flow, so one of two things happens. Either it ends with level $h$, making it the highest-level node, or it raises its level and, since it has the lowest level, pushes flow into the hyperedge first and frees space. Therefore, option (3) holds, and the residual capacity of the hyperedge is zero. Since $i^* \ge h/2 \ge 1/\phi$, the capacity of the hyperedge is $\min(l(v), C)$, where $v$ is the node pushing the flow across the hyperedge. Every node has level at least $h/2$, which is greater than $1/\phi$, and $C = 1/\phi$. Therefore, the flow $f$ on the hyperedge must be $1/\phi$. Hence, flow of $1/\phi$ per node is pushed out of $S_{i^*}$. However, unlike the edge case, we cannot assume that every node of the $z_2(i)$ hyperedges pushes flow from $S_{i^*}$ to $\bar{S}_{i^*}$, because some of these nodes are actually inside $S_{i^*}$ or inside $\bar{S}_{i^*}$. The $z_1(i)$ hyperedges can push at most $1/\phi$ into $S$, and $2\mathrm{vol}_M(S_{i^*})$ mass can start in $S_{i^*}$. For an upper bound on the flow leaking out of $S_{i^*}$, we assume that all hyperedges of $z_1(i)$ push flow into $S_{i^*}$. Therefore, we have: … where $\alpha$ is the average number of nodes of the $z_2(i^*)$ hyperedges that lie on the other side, $\bar{S}_{i^*}$. Assuming that $S_{i^*}$ is the smaller side of the cut, we get: …
Multiplying both the numerator and the denominator by $k - 1$, we get: … Hence, $\phi_M(S_{i^*})$ is $O(k\phi)$. When $k = 2$, the constant multiplying $\phi$ is 4, which is exactly the constant in the original CRD proof. If $S_{i^*}$ is not the smaller side of the cut, we proceed as in the CRD argument. We run the contradiction argument from $1$ to $h/2$ and note that at most $C \cdot z_2/(k-1)$ flow is pushed into $\bar{S}_i$. This flow either stays in $\bar{S}_i$ or goes back to $S_i$ through the $z_1$ hyperedges. Therefore: … Recalling that $\bar{S}_{i^*}$ is the smaller side of the cut, we get: … Therefore, $\phi_M(S_{i^*}) = O(k\phi)$, which completes the proof.

… The nodes in $S_i$ for $i = 1, \dots, h$ fall in level case 0 or case 1, so they have $m_M(v) \ge d_M(v)$. Therefore, we can use Assumption 2 and get: … which completes the proof of the lemma.

Proof. We have $\phi \ge \Omega(\phi_M(B))/k$, and from Theorem 1, $\phi_M(A) \le O(k\phi)$. Together these give $\phi_M(A) \le \phi_M(B)$. Therefore, the diffusion will not get stuck on any bottleneck subset inside $B$ and will be able to spread the mass over all of $B$.
Before all nodes in $B$ are saturated, the leakage according to Lemma 2 is $O\big(k/(\sigma \log \mathrm{vol}_M(B))\big)$. We need $\log \mathrm{vol}_M(B)$ iterations to saturate all nodes in $B$. Therefore, over all iterations, the leakage to $\bar{B}$ is an $O(k/\sigma)$ fraction of the total mass in the graph, both before and after the nodes in $B$ are saturated.
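The leakage bound in the paragraph above is the product of the bound from Lemma 2 and the number of iterations needed to saturate $B$. Written out as a worked equation using only the quantities stated in the text:

```latex
\underbrace{\log \mathrm{vol}_M(B)}_{\text{iterations to saturate } B}
\times
\underbrace{O\!\left(\frac{k}{\sigma \log \mathrm{vol}_M(B)}\right)}_{\text{leakage bound from Lemma 2}}
= O\!\left(\frac{k}{\sigma}\right)
```

The $\log \mathrm{vol}_M(B)$ factors cancel, which is why the total leakage fraction does not grow with the size of $B$.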
After the nodes in $B$ are saturated, we run a constant number of iterations before terminating. Therefore, at termination, the total mass remaining after removing excess flow is $\Theta(\mathrm{vol}_M(B))$. This holds because at $t = \log \mathrm{vol}_M(B)$ the termination condition is $\tau \cdot 2 d_M(v_s)\,\mathrm{vol}_M(B)$. After $\log \mathrm{vol}_M(B)$ iterations, all the nodes in $B$ are saturated and the leakage is $O(\mathrm{vol}_M(B)/\sigma)$. Therefore, the total mass is at most $2\frac{1+\sigma}{\sigma}\mathrm{vol}_M(B)$, which is below the termination condition for an appropriate choice of $\tau$.