HAT: Hypergraph analysis toolbox

Recent advances in biological technologies, such as multi-way chromosome conformation capture (3C), require development of methods for analysis of multi-way interactions. Hypergraphs are mathematically tractable objects that can be utilized to precisely represent and analyze multi-way interactions. Here we present the Hypergraph Analysis Toolbox (HAT), a software package for visualization and analysis of multi-way interactions in complex systems.


Introduction
Network science is a powerful framework for studying complex systems. However, recent work highlights the limitations of classical network methods, which describe group interactions using only pairwise interactions between nodes. The use of hypergraphs, in which an edge can connect more than two nodes, has therefore emerged as a new frontier in network science [1,2].
Chromosome conformation capture (3C) methods identify physical interactions ("contacts") between genomic loci [3,4]. While classical capture is pairwise, recent advancements capture multi-way chromatin interactions via proximity ligation (Pore-C, Supplementary Information 5.1) [5], split-pool tagmentation (SPRITE) [6], or multi-contact 3C (MC-3C) [7]. However, the investigation and biological interpretation of these multi-way contacts is hampered by a scarcity of methods for multi-way data [5,8]. Hypergraphs are a mathematically tractable extension of graphs that precisely represent multi-way interactions (Supplementary Information 5.2) [2]. We introduce the Hypergraph Analysis Toolbox (HAT), general-purpose software for the analysis of multi-way interactions and higher-order structures. HAT contains both well-studied and novel mathematical methods for hypergraph analysis, implemented in both MATLAB and Python.
Motivated by the need to investigate Pore-C data, we designed HAT as versatile software for hypergraph analysis. While there are several robust libraries for graph analysis, most hypergraph software targets specific problems, such as hypergraph partitioning or clustering, rather than offering a broad set of tools (Table 1). As a general-purpose tool, HAT implements algorithms for hypergraph construction, visualization, and the analysis of structural and dynamic properties. HAT is the first software to utilize tensor algebra for hypergraph analysis [9,10,11], and it contains recently developed hypergraph similarity measures [11]. HAT is open source, standardized across MATLAB (version 2021b onward) and Python (version 3.7 onward) implementations, and documented at https://hypergraph-analysis-toolbox.readthedocs.io, where it will continue to be maintained and developed.

Applications
In the work of [8], methods contained in HAT were utilized to examine Pore-C data (Figure 1a, Supplementary Information 5.1). Hypergraphs were constructed from Pore-C data from multiple cell types. Hypergraph entropy measures the structural organization of the genome. Hypergraph similarity measures were utilized to compare the structural similarity between different regions of the genome and cell types. This hypergraph analysis was integrated with other sequencing modalities to identify transcriptional clusters and elucidate the higher-order organization of the genome. Other applications of HAT include investigating social networks [20,21], supply chain networks [22,23], (bio)chemical reaction networks [24,25], and epidemiological and ecological networks [1,2,26,27].
Construction from Data. There are two approaches for constructing a hypergraph from data (Supplementary Information 5.2). Data formats with explicit multi-way interactions, such as Pore-C, are input directly to HAT for hypergraph construction. However, the vast majority of data either are pairwise observations (e.g., Hi-C) or contain neither pairwise nor multi-way interactions (e.g., sequencing data), so we implemented three multi-correlation measures to infer multi-way relationships [29,30,31]. HAT creates a hyperedge for each group of variables whose multi-correlation meets a minimum threshold.
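The thresholding step can be illustrated with a minimal sketch. The multi-correlation measures of [29,30,31] are not reproduced here; as a loudly labeled assumption, the sketch substitutes the mean absolute pairwise Pearson correlation as a stand-in score. All function names are hypothetical and are not part of the HAT API.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def stand_in_multicorrelation(series):
    """ASSUMPTION: mean |Pearson r| over all pairs, a placeholder score only.

    It is NOT one of the multi-correlation measures cited in the text.
    """
    pairs = list(combinations(series, 2))
    return sum(abs(pearson(x, y)) for x, y in pairs) / len(pairs)


def infer_hyperedges(data, k, threshold):
    """Keep every k-subset of variables whose score meets the threshold."""
    hyperedges = []
    for subset in combinations(sorted(data), k):
        score = stand_in_multicorrelation([data[name] for name in subset])
        if score >= threshold:
            hyperedges.append(frozenset(subset))
    return hyperedges


# Toy observations: a, b, c move together (c inversely); d is unrelated.
data = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.1, 5.9, 8.2, 9.8],
    "c": [5.0, 4.2, 2.9, 2.1, 1.0],
    "d": [1.0, -1.0, 1.0, -1.0, 1.0],
}
edges = infer_hyperedges(data, k=3, threshold=0.9)
```

In this toy example only the triple of mutually correlated variables survives the threshold. Enumerating all k-subsets grows combinatorially, so a practical implementation would need to restrict the candidate sets.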

Expansion and Numerical Representation. For uniform hypergraphs, the adjacency, degree, and Laplacian tensors (Figure 1c) are provided and used in similarity, entropy, and controllability calculations (Supplementary Information 5.3). Such tensor-based calculations are not currently supported for non-uniform hypergraphs and will be pursued in the future. However, both uniform and non-uniform hypergraphs can be expanded to pairwise structures (Figure 1d, Supplementary Information 5.4). HAT generates clique expansions, star expansions, and line graphs. These representations facilitate indirect hypergraph similarity and entropy measures for non-uniform hypergraphs.
Characteristic Structural Properties. The following structural properties of hypergraphs are computed: the average distance between vertices, based on Equation (30) in [11]; the clustering coefficient, based on Equation (11) in [9]; and hypergraph centrality, according to the methods in [35]. For a uniform hypergraph, entropy is computed according to [9], which defines hypergraph entropy based on the higher-order singular values of the Laplacian tensor. For non-uniform hypergraphs, standard graph entropy measures are applied to the aforementioned hypergraph expansions.
Controllability. For a uniform hypergraph, the controllability matrix may be computed given the set of input or control nodes [10]. HAT is the first software to analyze controllability properties of hypergraphs.
Similarity Measures. Hypergraph similarity is measured according to the recent work [11], which distinguishes direct and indirect hypergraph similarity measures. Direct measures use tensor representations of uniform hypergraphs; indirect measures apply graph similarity measures to hypergraph expansions. HAT is the first software to implement hypergraph similarity using a tensor representation, based on the novel methods in [11]. A series of spectral-based measures, as well as the Hamming distance, the Jaccard index, and centrality measures, are provided to measure the similarity between hypergraphs.
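As a rough illustration of an indirect measure, the sketch below compares two hypergraphs through the edge sets of their clique expansions using the textbook Jaccard index and an edge-wise Hamming distance. These are generic definitions under stated assumptions; the normalizations used in [11] and in HAT may differ, and the function names are hypothetical.

```python
from itertools import combinations


def clique_edges(hyperedges):
    """Unordered vertex pairs that share at least one hyperedge."""
    return {frozenset(pair) for h in hyperedges for pair in combinations(h, 2)}


def jaccard_index(edges_a, edges_b):
    """|intersection| / |union| of two edge sets (1.0 if both are empty)."""
    union = edges_a | edges_b
    return len(edges_a & edges_b) / len(union) if union else 1.0


def hamming_distance(edges_a, edges_b):
    """Number of vertex pairs that form an edge in exactly one graph.

    ASSUMPTION: counts unordered pairs; comparing symmetric adjacency
    matrices entry by entry would give twice this value.
    """
    return len(edges_a ^ edges_b)


# Two hypergraphs on vertices {1, 2, 3, 4} that differ in one hyperedge.
A = clique_edges([{1, 2, 3}, {3, 4}])
B = clique_edges([{1, 2, 3}, {2, 4}])
similarity = jaccard_index(A, B)
distance = hamming_distance(A, B)
```

Because both measures operate on expansions, they apply to non-uniform hypergraphs as well, at the cost of the information lost in the expansion.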
For ease of use, the MATLAB and Python implementations are functionally independent but syntactically similar. The software may be installed from the online documentation or GitHub, or via pip for the Python implementation and the MathWorks File Exchange for the MATLAB implementation.

Conclusion
Hypergraphs can represent multi-way relationships unambiguously. HAT offers visualization and a computational framework for studying hypergraphs and Pore-C data. Thus, HAT can advance the study of multi-way interactions in the genome or other complex systems.
5 Supplementary Information

5.1 Pore-C: multi-contact chromosome conformation capture
Pore-C is a long-read sequencing technique designed to capture structural features of genome architecture [5,8]. It is the most recent extension of chromosome conformation capture (3C) technologies [4]. Pore-C data contain multi-way contacts, which indicate sets of genomic loci that are colocalized in the nucleus and thereby provide insight into the higher-order spatial organization of the genome. The Pore-C assay follows steps similar to those of previous 3C methods [5]. First, multi-way contacts between any number of genomic loci are ligated in the nucleus. The genomic loci in these regions are detached from their original chromosomes and chained together. The chained regions are then sequenced to determine the set of genomic loci that were originally colocalized.
Hypergraphs are natural representations of Pore-C data [8]. Individual genomic loci, which can be viewed and binned at any scale for this representation, are represented as vertices in the hypergraph, and the colocalization of multiple loci defines a hyperedge. Hi-C data, which capture analogous pairwise colocalization, are commonly represented as the adjacency matrix of a graph, but the multi-way contacts of Pore-C necessitate representation as a hypergraph. The Pore-C assay has already contributed to the field, and new methods for analyzing these data continue to be developed [5,8].

5.2 Hypergraphs
Hypergraph theory extends graph structures to represent multi-way relationships among elements of a set. Mathematically, a graph $G = \{V, E_g\}$ is a set of vertices $V$ together with a set of edges $E_g$, where each edge $e \in E_g$ is a pair of vertices (i.e., $e = (v_i, v_j)$ where $v_i, v_j \in V$). Graphs are numerically represented as matrices.
Hypergraphs model multi-way relationships by allowing hyperedges to contain any number of vertices, expanding beyond the pairwise restriction of a graph. Formally, a hypergraph $H = \{V, E_h\}$ is a set of vertices $V$ together with a set of hyperedges $E_h$, where each hyperedge $h \in E_h$ is a subset of vertices (i.e., $h \subseteq V$). When all hyperedges of a hypergraph have cardinality $k$, it is referred to as a $k$-uniform hypergraph. The extension from edges to hyperedges makes hypergraphs more precise representations of multi-way data and presents computational advantages.

5.3 Numeric Representations of Hypergraphs
The incidence matrix is the primary numerical representation of hypergraphs (Figure 1b). The incidence matrix $\mathbf{H}$ of a hypergraph $H = \{V, E_h\}$ with $n$ vertices and $m$ hyperedges is an $n \times m$ matrix.
Rows of $\mathbf{H}$ correspond to vertices in the hypergraph, and columns correspond to hyperedges. Each element $\mathbf{H}_{j,i}$ of the incidence matrix is 1 when vertex $j$ is a member of (incident to) hyperedge $i$ and 0 otherwise.
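The incidence matrix definition can be sketched in a few lines of plain Python. The helper name `incidence_matrix` and the toy hypergraph are illustrative assumptions, not HAT functions or data.

```python
def incidence_matrix(vertices, hyperedges):
    """Return the n x m incidence matrix as a list of rows.

    Entry [j][i] is 1 when vertex j belongs to hyperedge i, else 0.
    """
    return [[1 if v in h else 0 for h in hyperedges] for v in vertices]


# Toy non-uniform hypergraph: 4 vertices, 3 hyperedges.
vertices = [1, 2, 3, 4]
hyperedges = [{1, 2, 3}, {2, 3, 4}, {1, 4}]

M = incidence_matrix(vertices, hyperedges)

# Row sums give vertex degrees; column sums give hyperedge cardinalities.
vertex_degrees = [sum(row) for row in M]
cardinalities = [sum(col) for col in zip(*M)]

# A hypergraph is k-uniform when every hyperedge has the same cardinality.
is_uniform = len(set(cardinalities)) == 1
```

Unlike the tensor representation, the incidence matrix handles non-uniform hypergraphs directly, since each column may contain any number of ones.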
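A $k$-uniform hypergraph can also be represented by a tensor (Figure 1c). The adjacency tensor of a hypergraph is the higher-order analogue of a graph adjacency matrix. Mathematically, given a $k$-uniform hypergraph $H = \{V, E_h\}$ with $n$ vertices, the adjacency tensor $\mathsf{A} \in \mathbb{R}^{n \times n \times \cdots \times n}$, a $k$-th order tensor, is defined as
$$\mathsf{A}_{j_1 j_2 \ldots j_k} = \begin{cases} \dfrac{1}{(k-1)!} & \text{if } (j_1, j_2, \ldots, j_k) \in E_h, \\ 0 & \text{otherwise.} \end{cases}$$
Given the adjacency tensor representation, analogous degree and Laplacian tensors are defined by extending their pairwise definitions [9,10,11].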
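A minimal sketch of an adjacency tensor follows, assuming the common convention in which every entry indexed by an ordering of a hyperedge's vertices takes the value 1/(k-1)!. The tensor is stored sparsely as a dictionary, and the function names are hypothetical rather than HAT's API.

```python
from itertools import permutations
from math import factorial, isclose


def adjacency_tensor(hyperedges, k):
    """Sparse adjacency tensor of a k-uniform hypergraph.

    Returns {index tuple: value}; entries not stored are zero.
    """
    value = 1 / factorial(k - 1)
    tensor = {}
    for h in hyperedges:
        if len(h) != k:
            raise ValueError("hypergraph is not k-uniform")
        # The tensor is symmetric: every ordering of the vertices is set.
        for index in permutations(sorted(h)):
            tensor[index] = value
    return tensor


def degree(tensor, vertex):
    """Sum of all entries whose first index is `vertex`."""
    return sum(val for index, val in tensor.items() if index[0] == vertex)


# 3-uniform hypergraph on vertices 0..3 with two hyperedges.
A = adjacency_tensor([{0, 1, 2}, {1, 2, 3}], k=3)
```

With this scaling, summing the entries over all but the first index recovers the number of hyperedges containing a vertex, which mirrors how row sums of a graph adjacency matrix give vertex degrees.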

5.4 Hypergraph Expansions
There are two primary pairwise representations of hypergraphs (Figure 1d). Pairwise representations are often helpful for projecting multi-way interactions onto sets of pairwise interactions or for applying standard graph-theoretic operations to hypergraphs.
Clique Expansion. The clique expansion algorithm constructs a graph on the same set of vertices as the hypergraph by defining an edge set in which every pair of vertices contained within the same hyperedge is joined by an edge in the graph. Given a hypergraph $H = \{V, E_h\}$, the corresponding clique graph is $C = \{V, E_c\}$ where
$$E_c = \{(v_i, v_j) \mid v_i, v_j \in h,\ h \in E_h\}.$$
This is called clique expansion because the vertices contained in each $h \in E_h$ form a clique in $C$. While the map from $H$ to $C$ is well-defined, the transformation to a clique graph is lossy, so the hypergraph structure of $H$ cannot be uniquely recovered from the clique graph $C$ alone [11].
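The lossy nature of clique expansion can be seen in a small sketch: a single 3-way hyperedge and three pairwise hyperedges on the same vertices produce the same clique graph. The function name is a hypothetical helper, not a HAT function.

```python
from itertools import combinations


def clique_expansion(hyperedges):
    """Return the edge set of the clique graph as sorted vertex pairs."""
    edges = set()
    for h in hyperedges:
        # Every pair of distinct vertices in a hyperedge becomes an edge.
        for u, v in combinations(sorted(h), 2):
            edges.add((u, v))
    return edges


# One genuine 3-way interaction versus three pairwise interactions.
H1 = [{1, 2, 3}]
H2 = [{1, 2}, {2, 3}, {1, 3}]

C1 = clique_expansion(H1)
C2 = clique_expansion(H2)
same_graph = C1 == C2  # both hypergraphs collapse to the same triangle
```

Since both inputs yield the same triangle, no procedure operating on the clique graph alone can tell them apart.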
Star Expansion. The star expansion of $H = \{V, E_h\}$ constructs a bipartite graph $S = \{V_s, E_s\}$ by introducing a new vertex set $V_s = V \cup E_h$, in which some vertices represent hyperedges. An edge exists between vertices $v, e \in V_s$ when $v \in V$, $e \in E_h$, and $v \in e$. Each hyperedge in $E_h$ induces a star in $S$. This is a lossless process, so the hypergraph structure of $H$ can be uniquely recovered from the star graph $S$.
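A short sketch of the star expansion and its inversion follows. It assumes hyperedge nodes are labeled with tuples of the form `("e", i)` so that the two sides of the bipartite graph stay distinguishable; this labeling and the function names are illustrative choices, not HAT conventions.

```python
def star_expansion(hyperedges):
    """Return bipartite edges (vertex, hyperedge node) of the star graph."""
    return {(v, ("e", i)) for i, h in enumerate(hyperedges) for v in h}


def recover_hyperedges(star_edges):
    """Rebuild the hyperedges by grouping vertices per hyperedge node."""
    grouped = {}
    for v, e in star_edges:
        grouped.setdefault(e, set()).add(v)
    return [grouped[e] for e in sorted(grouped)]


H = [{1, 2, 3}, {2, 3, 4}, {1, 4}]
S = star_expansion(H)
H_recovered = recover_hyperedges(S)

# Unlike clique expansion, these two hypergraphs stay distinguishable.
S1 = star_expansion([{1, 2, 3}])
S2 = star_expansion([{1, 2}, {2, 3}, {1, 3}])
```

The star graph has one edge per vertex-hyperedge membership, so its size equals the sum of the hyperedge cardinalities. Recovery relies on knowing which nodes represent hyperedges, which the bipartition provides.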

Figure 1: a. The Pore-C assay identifies multi-way chromatin strand colocalization within the nucleus [5]. b. Hypergraph representation of Pore-C data, in which each chromatin strand is a vertex and the multi-way contacts are hyperedges, depicted as both a hypergraph and an incidence matrix. c. For multi-way contacts of uniform size, hypergraphs are numerically represented as an adjacency tensor (a multi-dimensional matrix). d. Multi-way structures are decomposed with clique and star expansions that generate virtual pairwise contacts [5]. e. Flowchart of the HAT workflow: constructing hypergraphs from data, visualizing them, representing them numerically, and the computations available for each representation.