Closed trail distance in a biconnected graph

Graphs describe and represent many complex structures in social networks and in biological, chemical, industrial, and transport systems, among others. These graphs are often not only connected but also k-connected (at least in part). Different metrics are used to determine the distance between two nodes in a graph. In this article, we propose a novel metric that takes into account the higher degree of connectivity in parts of the graph (for example, biconnected fullerene graphs and fulleroids). The designed metric reflects the cyclical interdependencies among the nodes of the graph. Moreover, a new component model is derived, and examples on various types of graphs are presented.


Introduction
Densely interconnected parts of graphs play an essential role in the social and natural sciences. The term "more connected part" can be formalized in many ways. In this article, we focus on generalizing the biconnected components of a graph, and we define a novel metric that considers the higher degree of connectivity in parts of the graph. Biconnected components of a graph do not scale well, and their definition is complicated for weighted graphs. Our approach is based on limiting the cycle length in the definition of biconnected components. The first works devoted to the study of cycles of limited length are [1,2].
Topological data analysis often uses tools developed in algebraic topology [3][4][5]. Cycles play a significant role in algebraic topology. For example, the Poincaré duality theorem [6] shows that the homology group $H_n(M, \mathbb{Z}_2)$ is a space with an inner product over the field $\mathbb{Z}_2$, where the inner product is defined as an intersection index. If M is a continuous manifold, then any homology class $x \in H_1(M, \mathbb{Z}_2)$ can be represented by a closed curve $\gamma \subset M$. In this case, the intersection index $x \cdot x$ is zero if and only if a small neighborhood of the curve $\gamma$ is orientable.
Cycles with limited length play an essential role in the application of algebraic topology [3,4]. When calculating the topological properties of data, it is necessary to look for cycles of limited length [7,8]. Cyclical structures are also very often found in materials research. For example, fullerenes form long cycles [9], in which topological properties play an important role. Complex and social networks are another field in which cyclic structures appear [10].
The partitioning of large complex networks is a challenging task. Such networks are used as a representation of proteins, chemical compounds, co-author networks, social networks, etc. The classical partitioning methods have problems with densely connected subgraphs that cannot be partitioned easily. A list of the largest biconnected component in the selected network was published by Leskovec [11].
In the case of protein interaction networks in computational biology, the authors in [12] found vertices that are articulation points (determined by the computing of biconnected components), but they have a low degree and, therefore, they are unlikely to be essential to the network. In [13] the authors found the biconnected components that enabled further analysis.
Molecular topology [14] is another area where topological and metric distances are used in graphs representing molecules. The fullerenes are cage-like, hollow molecules of pseudospherical symmetry consisting of pentagons and hexagons only, resulting in a trivalent polyhedron with precisely three edges (bonds) joining every vertex occupied by carbon [9]. In graph theoretical terms, fullerenes belong to the class of cubic, planar, three-connected, and simple graphs, see Fig 1. The authors of [15] give an overview of some graph invariants that can possibly correlate with the stability of a fullerene molecule.
Fullerenes have been the subject of intense research for their unique physical, chemical, and biological properties and for their technological applications, especially in materials science, electronics, nanotechnology, and medicine [16][17][18][19].
The measuring of the distances between two nodes in a graph is a difficult task. The standard measure for this distance is the shortest path between two nodes in a graph [20,21]. Another way is the expected lengths of random walks on the graph, which can be used to derive the commute time distance [22]. The authors of [23] examine generalized distances on graphs that interpolate, depending on a defined parameter, between the shortest path distance and the commute time or resistance distance. Variants of node distances are described in detail in [24][25][26][27].
In addition to the node distance measure, the quality of the components (partitions) is also measured by means of several approaches. In 2014, a metric that measured the quality of communities according to the number of 3-cycles split across the communities was published [28]. The idea is based on the four types of directed triangles that contain cycles. These triangles are used to identify communities in directed networks. In [29], a measure that integrates both the concept of closed walks and clustering coefficients to replace the edge betweenness in the divisive hierarchical clustering algorithm (the Girvan and Newman method) was published. Levoranto et al. [30] used the strongly p-connected components for community detection in oriented networks. Community detection in undirected graphs is different. One of the major and well-known approaches uses the union of cliques to define a community [31]. Edaschery et al. [32] defined distance-k clique of a graph G = (V, E) as a subgraph of G with a diameter k. The authors use these distance-k cliques for a clustering.
Our approach defines a new type of metric in a graph based on "cyclical distances". This metric is based on the definition of a biconnected component. The distance between two vertices in the graph is defined as the length of the shortest closed trail that contains these two vertices. The distance defined in this way allows straightforward generalization for weighted graphs and also allows scalability.
In this article, we define a new measure on an undirected connected graph without bridges for the measurement of distances using cyclic subgraphs. This measure satisfies the metric properties. Our innovative measure may be used to define a new type of components that highlight the locally connected subgraphs. Moreover, these components are not based on the biconnectivity property and, therefore, are able to partition densely connected biconnected components easily.
We will first introduce the terminology and the notation which we use in the article. In the next section, we define the new distance in biconnected undirected graphs, and we describe some properties of this distance. In conclusion, we discuss the advantages and limitations of the defined distance.

Terminology and notation
In this section, knowledge of graph theory will be required. The definitions of the following terms were taken from [33]: A loop is an edge (directed or undirected) that joins a single endpoint to itself. A walk on a graph is an alternating series of vertices and edges such that for j = 1, ..., k the vertices $v_{(j-1)}$ and $v_{(j)}$ are the end points of the edge $e_{(j)}$. A closed walk is a walk where the initial vertex is also the final vertex. The length of a walk is the number of edges in this walk. We will denote the length of a walk W(u, v) as |W(u, v)|. A trail is a walk in which no edge occurs more than once. A closed trail (circuit) is a closed walk with no repeating edges. We will denote a closed trail which contains the vertices u, v as CT(u, v). A path is a walk in which no edge or internal vertex occurs more than once (a trail in which all the internal vertices are distinct). We will denote a path with an initial vertex u and a final vertex v as P(u, v). A cycle is a closed path with a length of at least one. We will denote the closed path containing u, v as CP(u, v). A clique is a subgraph where each node is adjacent to every other node. A planar graph is a graph that can be drawn on a sphere or a plane with no edge crossings.
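The distinctions among walks, trails, paths, and their closed variants can be made concrete with a short sketch. It assumes a simple graph, so that a walk is fully described by its vertex sequence; the function names are hypothetical and the code does not check that the edges actually exist in a given graph.

```python
def walk_edges(seq):
    """Undirected edges traversed by a walk given as a vertex sequence."""
    return [frozenset(pair) for pair in zip(seq, seq[1:])]


def is_closed(seq):
    """A closed walk starts and ends in the same vertex."""
    return len(seq) > 1 and seq[0] == seq[-1]


def is_trail(seq):
    """A trail uses no edge more than once."""
    edges = walk_edges(seq)
    return len(set(edges)) == len(edges)


def is_path(seq):
    """A path is a trail whose internal vertices are all distinct.

    Only the two end vertices may coincide (closed path, i.e. cycle).
    """
    head, tail = seq[:-1], seq[1:]
    return (is_trail(seq)
            and len(set(head)) == len(head)
            and len(set(tail)) == len(tail))


def is_cycle(seq):
    """A cycle is a closed path."""
    return is_closed(seq) and is_path(seq)


# Two triangles sharing vertex 0: a closed trail that is not a cycle.
bowtie_walk = [0, 1, 2, 0, 3, 4, 0]
print(is_trail(bowtie_walk), is_cycle(bowtie_walk))
```

The last two lines illustrate why closed trails are a strictly larger class than cycles: the walk repeats the vertex 0 but never repeats an edge.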
A connected graph is a graph such that between every pair of vertices there exists a walk. A graph is called k-connected if the removal of fewer than k vertices leaves neither a non-connected graph nor a trivial one. A component of a graph is a maximal connected subgraph. An edge e is a bridge of the connected graph G if {e} is a disconnecting edge-set of G. An articulation is a vertex of a graph whose removal increases the number of components. A biconnected graph is a connected and "nonseparable" graph, meaning that if any vertex were to be removed, the graph would remain connected. Therefore a biconnected graph has no articulation vertices.
The property of being 2-connected is equivalent to biconnectivity, with the caveat that the complete graph of two vertices is sometimes regarded as biconnected but not 2-connected. This property is especially useful in maintaining a graph with a two-fold redundancy, to prevent disconnection upon the removal of a single edge (or connection).
A biconnected component (or 2-connected component) is a maximal biconnected subgraph.
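The notions of bridge, articulation, and biconnected graph can be illustrated with a deliberately naive sketch that removes one vertex or one edge at a time and re-tests connectivity. It assumes a connected input graph stored as a dictionary of neighbor sets; the function names are hypothetical, and faster depth-first-search methods exist for large graphs.

```python
from collections import deque


def is_connected(adj, skip_vertex=None, skip_edge=None):
    """Breadth-first connectivity test that ignores one vertex or one edge."""
    vertices = [x for x in adj if x != skip_vertex]
    if not vertices:
        return True
    seen = {vertices[0]}
    queue = deque([vertices[0]])
    while queue:
        a = queue.popleft()
        for b in adj[a]:
            if b == skip_vertex or b in seen:
                continue
            if skip_edge is not None and {a, b} == set(skip_edge):
                continue
            seen.add(b)
            queue.append(b)
    return len(seen) == len(vertices)


def articulation_points(adj):
    """Vertices whose removal disconnects a connected graph."""
    return {x for x in adj if not is_connected(adj, skip_vertex=x)}


def bridges(adj):
    """Edges whose removal disconnects a connected graph."""
    return {frozenset((a, b)) for a in adj for b in adj[a]
            if not is_connected(adj, skip_edge=(a, b))}


def is_biconnected(adj):
    """Connected and without articulation vertices."""
    return is_connected(adj) and not articulation_points(adj)


# Two triangles sharing vertex 2: no bridges, but vertex 2 is an articulation.
bowtie = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2, 4}, 4: {2, 3}}
print(articulation_points(bowtie), bridges(bowtie), is_biconnected(bowtie))
```

The example graph is connected and bridgeless without being biconnected, which is exactly the class of graphs where biconnectivity and bridgelessness differ.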

Equivalent characterizations of biconnectivity
Let G = (V, E) be a simple undirected graph (without loops or multiple edges) that contains at least three vertices. Each of the following statements is equivalent to G being biconnected: 2. $E \neq \emptyset$ and for every $v \in V$ and $e \in E$ there is a cycle of G containing v and e; 3. G has no isolated vertices and for every $e_1, e_2 \in E$ there is a cycle containing $e_1$ and $e_2$; 6. $E \neq \emptyset$ and for every $v_1, v_2 \in V$ and $e \in E$ there is a path from $v_1$ to $v_2$ containing e.
The shortest path metric ($d_{sp}$) [25] is the one most commonly used to determine the distance between vertices of a graph. It is a metric on the vertex set V of a connected graph G = (V, E), defined for all $u, v \in V$ as the length of the shortest path P(u, v) in G. This metric does not reflect the greater coherence between vertices in a biconnected graph. Our goals in this article are to define a new metric on undirected biconnected graphs that measures distances using cyclic subgraphs, and to exploit the higher connectivity among the vertices of a biconnected graph.
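For reference, the shortest path distance on an unweighted graph can be computed with a breadth-first search. The sketch below assumes the graph is given as a dictionary of neighbor sets; the function name is hypothetical.

```python
from collections import deque


def shortest_path_distance(adj, u, v):
    """Number of edges on a shortest u-v path, or None if v is unreachable."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        a = queue.popleft()
        if a == v:
            return dist[a]
        for b in adj[a]:
            if b not in dist:
                dist[b] = dist[a] + 1
                queue.append(b)
    return None


# A cycle on six vertices: opposite vertices are three edges apart.
hexagon = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
print(shortest_path_distance(hexagon, 0, 3))
```

On the hexagon the search reports only one of the two routes between opposite vertices; the existence of the second, edge-disjoint route is the kind of information this distance does not capture.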

Closed trail distance in a biconnected graph without loops
In this section, a metric between the vertices in a biconnected graph without loops via a closed trail (circuit) will be defined.
The length of the shortest closed trail that starts in a vertex v and ends in the same vertex is equal to 0. Then $\forall v \in V: d_{ct}(v, v) = 0$. From the definition it also follows that $\forall u, v \in V: d_{ct}(u, v) \geq 0$, because the distance between u and v is the number of edges in the closed trail containing u and v.
Symmetry of the distance is obvious: $d_{ct}(u, v) = d_{ct}(v, u)$. The violation of the triangle inequality by the cycles ($d_{sc}$) may be solved using the closed trails ($d_{ct}$).
In a similar way to that in which we defined the k-CT component, a k-SC component may be defined using cycles of a length up to k. The graph is a k-shortest cycle connected graph (k-SC) if every two vertices lie on a cycle of length at most k. The k-SC component of the graph is a maximal k-SC subgraph.
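The closed trail distance can be computed with standard flow techniques. The following is a minimal sketch, not the authors' algorithm: a closed trail through two distinct vertices u and v splits into two edge-disjoint u-v trails, so on an unweighted graph without loops the shortest closed trail corresponds to a minimum-cost flow of two units from u to v with unit edge capacities. The function name `closed_trail_distance`, the edge-list input, and the queue-based Bellman-Ford search are assumptions made for illustration.

```python
from collections import deque
from math import inf


def closed_trail_distance(edges, u, v):
    """Length of a shortest closed trail through u and v (None if none exists).

    Assumes an undirected graph without loops, given as a list of (a, b) pairs.
    """
    if u == v:
        return 0
    # Residual network: each arc is [head, capacity, cost, index of reverse arc].
    graph = {}

    def add_arc(a, b, cost):
        graph.setdefault(a, [])
        graph.setdefault(b, [])
        graph[a].append([b, 1, cost, len(graph[b])])
        graph[b].append([a, 0, -cost, len(graph[a]) - 1])

    for a, b in edges:
        add_arc(a, b, 1)  # an undirected edge may be traversed
        add_arc(b, a, 1)  # in either direction, but only once
    if u not in graph or v not in graph:
        return None

    total = 0
    for _ in range(2):  # send two units of flow, one cheapest path at a time
        dist = {x: inf for x in graph}
        dist[u] = 0
        prev = {}
        queue, queued = deque([u]), {u}
        while queue:  # queue-based Bellman-Ford handles negative residual costs
            a = queue.popleft()
            queued.discard(a)
            for i, (b, cap, cost, _rev) in enumerate(graph[a]):
                if cap > 0 and dist[a] + cost < dist[b]:
                    dist[b] = dist[a] + cost
                    prev[b] = (a, i)
                    if b not in queued:
                        queue.append(b)
                        queued.add(b)
        if dist[v] == inf:
            return None  # fewer than two edge-disjoint u-v paths
        total += dist[v]
        x = v
        while x != u:  # push one unit along the path found
            a, i = prev[x]
            arc = graph[a][i]
            arc[1] -= 1
            graph[x][arc[3]][1] += 1
            x = a
    return total


# Two triangles sharing vertex 2.
bowtie = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 2)]
print(closed_trail_distance(bowtie, 0, 1), closed_trail_distance(bowtie, 0, 3))
```

A return value of None signals that u and v are separated by a bridge or lie in different components, which is the situation excluded by restricting the metric to connected graphs without bridges.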
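The k-CT condition, read here as "every pair of vertices in the set has closed trail distance at most k", can be checked directly from a table of pairwise distances. The sketch below assumes such a table is already available as a dictionary keyed by unordered vertex pairs; the function names and the greedy growth strategy are illustrative assumptions, and the greedy result is maximal by inclusion but depends on the order of the candidates.

```python
from itertools import combinations


def is_k_ct_set(vertices, dist, k):
    """True if every pair of vertices is within closed trail distance k."""
    return all(dist[frozenset(pair)] <= k
               for pair in combinations(vertices, 2))


def grow_k_ct_set(seed, candidates, dist, k):
    """Greedily extend `seed` to a set that no remaining candidate can join."""
    members = set(seed)
    for x in candidates:
        if x not in members and is_k_ct_set(members | {x}, dist, k):
            members.add(x)
    return members


# Closed trail distances in two triangles {0, 1, 2} and {2, 3, 4} sharing vertex 2:
# 3 inside a triangle, 6 between vertices of different triangles.
dist = {}
for triangle in ({0, 1, 2}, {2, 3, 4}):
    for pair in combinations(sorted(triangle), 2):
        dist[frozenset(pair)] = 3
for a in (0, 1):
    for b in (3, 4):
        dist[frozenset((a, b))] = 6

print(grow_k_ct_set({0}, [1, 2, 3, 4], dist, 3))
print(grow_k_ct_set({0}, [1, 2, 3, 4], dist, 6))
```

With k = 3 the growth stops at one triangle, while k = 6 admits all five vertices, so the parameter k acts as a scale for how large a component may become.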

Example 2. Figs 3 and 4 demonstrate maximal and different k-CT components in the undirected biconnected graph without loops.

Lemma 1. Every 3-CT component is a clique.
Proof. Let Co3 be a 3-CT component. According to Definition 3: $\forall u, v \in Co3: |CT(u, v)| \leq 3$. It is obvious that this CT(u, v) contains only three vertices and three edges and there exists the edge (u, v). Therefore, all the vertices in Co3 are adjacent and create a clique.
Example 3. Fig 5 shows the differences between the closed trail distance and the shortest path distance. We chose a subgraph of the fullerene graph for the comparison between distances.

Lemma 2. Every closed trail with a length 4 or 5 is a cycle with the same length.
Proof. By contradiction. Suppose there exists a closed trail of length 4 (CT4) or of length 5 (CT5) that is not a cycle. Let CP3 be a cycle (closed path) of length 3. It is not possible to create a CT4 (CT5) on a set of one, two, or three vertices, because a CT4 (CT5) contains four (five) different edges. We therefore have at least four vertices, and the CT4 (CT5) has to contain a vertex u that appears at least three times in the sequence of the trail (as the initial, the final, and an inner vertex of the closed trail). The vertex u is the initial and an inner vertex of the trail if the CT4 (CT5) contains a CP3. When we add one (two) edge(s) to the CP3, we obtain a trail of length 4 (5), but it is not possible to close it, because the edge between the last-but-one vertex and the initial (final) vertex is already on the trail (Fig 6). The only closed trails of length 4 (5) are therefore cycles of length 4 (5).
The following sequences demonstrate the impossibility of creating a CT4 or CT5 that is not a cycle, where the vertex u is the initial and an inner vertex but not the final vertex of the trail of length 4 (5): $CT4 = u\,e_1\,x\,e_2\,y\,e_3\,u\,e_4\,z$; $CT5 = u\,e_1\,x\,e_2\,y\,e_3\,u\,e_4\,z\,e_5\,x(y)$. The first closed trail that is not a cycle is CT6 (Fig 6, third picture).
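Lemma 2 can also be checked by brute force. A closed trail of length at most 5 visits at most five distinct vertices, so in a simple graph it is enough to enumerate the closed trails of the complete graph on five vertices. The sketch below does this; the helper names are hypothetical and the enumeration is exponential, so it is meant only for such tiny cases.

```python
def closed_trails(n, length):
    """Yield vertex sequences of all closed trails with `length` edges in K_n."""

    def extend(seq, used):
        if len(seq) - 1 == length:
            if seq[-1] == seq[0]:
                yield tuple(seq)
            return
        for nxt in range(n):
            edge = frozenset((seq[-1], nxt))
            if nxt != seq[-1] and edge not in used:
                yield from extend(seq + [nxt], used | {edge})

    for start in range(n):
        yield from extend([start], frozenset())


def is_cycle(seq):
    """A closed trail is a cycle if no vertex repeats apart from the end point."""
    inner = seq[:-1]
    return len(set(inner)) == len(inner)


def all_closed_trails_are_cycles(n, length):
    return all(is_cycle(trail) for trail in closed_trails(n, length))


for length in (3, 4, 5, 6):
    print(length, all_closed_trails_are_cycles(5, length))
```

For lengths 3, 4, and 5 every closed trail found is a cycle, whereas length 6 admits two triangles glued at one vertex, which is a closed trail but not a cycle.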

Theorem 2. The 3,4,5-CT components are biconnected. They are the same as the 3,4,5-SC (shortest cycle) components.
Proof. It is obvious for 3-CT components. The proof follows from Lemma 1. 3-CT components are cliques and therefore 3-CT components are biconnected.
Suppose the statement is not true for 4(5)-CT components, that is, there exists a connected 4(5)-CT component which is not biconnected. Then the component contains an articulation x (see Fig 7). From Lemma 2 it follows that all closed trails of length 4(5) are cycles, and hence the vertex x is not an articulation. This is a contradiction.
Example 4. A k-CT component is not necessarily a biconnected subgraph for $k \geq 6$. The 6-CT component can be the smallest closed trail-connected component which is not biconnected (Fig 6).

Lemma 3. $d_{ct}(u, v)$ is a metric in any connected graph without bridges and defines the distances between two nodes u and v.
Proof. The graph is connected and has no bridges, so there is a path between any pair of nodes u and v. Let $P_1(u, v) = u\,e_{(1)} \ldots e_{(i)}\,x\,e_{(i+1)} \ldots e_{(k)}\,v$ be such a path. Because the graph has no bridges (its edge connectivity is greater than 1), there exists a path $P_2(u, v)$ that has no common edge with $P_1(u, v)$ (see Fig 8). Joining $P_1(u, v)$ and $P_2(u, v)$ creates a closed trail CT(u, v) on which the nodes u and v lie. Therefore, a closed trail between any two nodes exists.

biconnected graph $\subset$ connected graph without bridges $=$ $\infty$-CT component.

The ordering of the types of components according to their cardinality is defined as follows:

Example 5. Fig 9 shows a 4-CT component. This 4-CT component is biconnected. Table 1 contains the numbers of maximal k-CT components for selected k. Fig 10 shows a sample of k-CT components on the graph of the fulleroid $C_{260}$-I[5,7].

Closed trail distance in a weighted undirected graph
The definition of the CT-distance may be extended to the weighted graph G = (V, E, w), where w is a mapping $w: E \to \mathbb{R}^+$.
It is the 4-CT component with the biggest weight. There exist 4-CT or 5-CT components with smaller weight (red: $\{v_4, v_5, v_6, v_7\}$ and olive: $\{v_4, v_2, v_5, v_6, v_7\}$). The closed trail marked in red contains edges with bigger weights than the closed trail marked in blue. The value of the wCT component is useful for better scaling of components.
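One way to write such an extension, stated here as an assumption (the symbol $d_{wct}$ and the exact form are illustrative and are not taken from the text), is to replace the number of edges of a closed trail by the sum of its edge weights:

```latex
d_{wct}(u, v) \;=\; \min_{CT(u, v)} \; \sum_{e \in CT(u, v)} w(e)
\quad \text{for } u \neq v,
\qquad
d_{wct}(v, v) \;=\; 0 .
```

Under this reading, setting $w(e) = 1$ for every edge reduces the weighted distance to the unweighted closed trail distance $d_{ct}$.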

Conclusion
In this article, we defined a new metric for measuring the distance between two nodes of a biconnected graph. Moreover, the defined distance satisfies the properties of a metric on connected graphs without bridges. The same algorithm works on any connected graph, but the result does not satisfy the metric properties there. The metric reflects the cyclical interdependencies among the vertices of the graph. The metric induces components, called k-CT components: sets of vertices in which the distance between each pair of vertices is less than or equal to k. The 3,4,5-CT components are biconnected because the closed trails of length 3, 4, and 5 are cycles. The defined metric is applicable to both unweighted and weighted graphs.
The paper [34] contains a method for community detection based on network decomposition. Another approach to the detection of overlapping communities is used by Palla et al. [35]. Both approaches use a clique percolation method [36] for community detection; more precisely, they use cliques to specify part of a community. A clique is a 3-CT component according to our approach. Our proposed measure and component definition decompose the graph into overlapping components using different k-CT components.