Graphlet correlation distance to compare small graphs

Graph models are standard for representing mutual relationships between sets of entities. Often, graphs deal with a large number of entities with a small number of connections (e.g. social media relationships, infectious disease spread). The distances or similarities between such large graphs are known to be well captured by the Graphlet Correlation Distance (GCD). This paper deals with small graphs (with potentially high densities of connections) that have been somewhat neglected in the literature but that concern important fields such as sociology, ecology and fisheries, to mention some examples. First, based on numerical experiments, we study the conditions under which Erdős-Rényi, Fitness Scale-Free, Watts-Strogatz small-world and geometric graphs can be distinguished by a specific GCD measure based on 11 orbits, the GCD11. This is done with respect to the density and the order (i.e. the number of nodes) of the graphs when comparing graphs with the same and different orders. Second, we develop a randomization statistical test based on the GCD11 to compare empirical graphs to the four possible null models used in this analysis and apply it to a fishing case study where graphs represent pairwise proximity between fishing vessels. The statistical test rules out independent pairing within the fleet studied, which is a standard assumption in fisheries. It also illustrates the difficulty of identifying similarities between real-world small graphs and graph models.


The author(s) received no specific funding for this work. The authors have declared that no competing interests exist.

April 8, 2022

Interactions are then, graphically, the edges between the nodes of the graph (one node = one individual). Mathematically, a graph is formalised by an adjacency matrix [4], with a number of columns and rows equal to the number of individuals, and elements taking a value equal to 1 if there is an interaction between the individuals and 0 otherwise. While such graphs are a simplistic representation of relational structure, they can provide an essential and formal representation of various complex phenomena from diverse scientific fields, such as protein-protein interactions [5] in biology or the interactions between social animals [6] in ecology. Comparing graphs can therefore allow us to compare groups with respect to the interactions they exhibit. There is an abundant literature in graph theory aimed at comparing graphs [7][8][9][10]. This comparison is often done in a descriptive and qualitative way by comparing synthetic indicators of graph structures [11], for example the distribution of the number of links that each individual has (degree distribution [12]) or the occurrences of certain forms of links between bundles of individuals (motif distribution [13]). These descriptive approaches were first performed in domains such as sociology [14], chemistry [15] and physics in the 1990s, and more recently in neuroscience to compare brain graphs [16], in genomics to compare molecular graphs from different species [17] and in behavioral ecology [18][19][20][21][22].

The shift to quantitative graph comparisons with the introduction of similarity or distance measures is more recent [23] and has resulted in the development of many distances (see [9] for a recent review).
Amongst these, the Graphlet Correlation Distance (GCD) was shown not only to outperform the others but also to be robust to differences in order (number of nodes) and density between the graphs compared [24,25]. Graphlets are small, connected subgraphs [26,27] that extend the concept of motifs [13,28] of a graph and have emerged as an accurate mining tool to provide topological information that is not exclusively local [29]. Graphlets generalize the degree distribution of a graph to the distribution of subgraphs connected to a node, which is assigned a particular role (orbit) [8,30]. Yaveroglu et al. [25] showed that eleven orbits were sufficient to exhaustively describe a graph, so that the topology [11] of the graph, i.e. the configuration by which the individuals of a graph are connected, can be summarized by the correlation matrix between these eleven vectors of orbits' degrees, also called the Graphlet Correlation Matrix (GCM) [25]. The GCD between two graphs is defined as the Euclidean distance between the GCMs of the graphs [25].

To go beyond the comparison of simple descriptors of interactions between individuals, it is appealing to test functional hypotheses about these interactions [23]. One possible approach is to test whether a graph can be considered as an outcome of a specific random graph model (null model). For example, Erdős-Rényi [31] is a graph model where the links between individuals are mutually independent. It can therefore be used as a null model to test the absence of correlation between the interactions of individuals. Some studies based on different graph comparison methods identified the similarities between empirical graphs and the outcomes of some random graph models [30,32]. However, to the best of our knowledge, none of these approaches exploits the strong potential of the GCD.
Most of the studies available in the literature focus on graphs with a large number of nodes (several hundreds or thousands) and very low edge densities (≤ 0.1) [33]. However, these are not the only real-world graphs. In sociology, for example, the classical examples of Zachary's (1977) karate club network [34] and Sampson's (1968) monks' network [35] contain 34 and 18 nodes respectively. In ecology, food webs can be studied at the level of trophic groups rather than at the level of species or individuals [36], with a number of entities from 25 to 172. In fisheries, fleets may consist of only ten or a few dozen interacting actors [37]. Thus, there are multiple applications involving small graphs that deserve dedicated methodological developments.

This paper deals with two main gaps in the literature. First, we assess the performance of the GCD in the small graph domain to extend its domain of applicability. Second, we develop a statistical test based on the GCD to compare empirical graphs to three possible null models for both small and large graphs. In the first part of this paper, we present the method to assess the ability of the GCD to correctly distinguish small simulated graphs from known model types (Erdős-Rényi [31], Barabási-Albert scale free [38] and k-regular [39]) by a clustering approach [25,40] using a numerical experimental design. In these numerical experiments, the orders of the graphs range from 5 to 50 to mimic the range encountered in some real small graphs, while the density range is completely covered, from 0 to 1. We specifically address the problem of the family of k-regular graphs, which are difficult to handle with the GCD. We study the discriminating power of the GCD with respect to the density and order of the graphs, but also with respect to the differences in order and density between the compared graphs. We then propose a statistical test based on the GCD to evaluate whether an empirical graph can be considered as an outcome of a particular random graph model. Finally, we illustrate the relevance of this approach by using two fishing case studies to assess the independence of observed proximities between fishing vessels modeled by graphs. The statistical test does not rule out independent behavior within one of the two studied fleets.

Graphlet Correlation Distance (GCD)

Yaveroglu et al. [25] recently proposed to compare graphs on the basis of the first eleven non-redundant orbits of graphlets of up to 4 nodes. Considering a graph G of order N, they first consider the N × 11 matrix which contains, for each node, its orbit degrees, i.e. the number of times the node is present in each of the eleven orbits. Columns are called Graphlet Degree Distributions (GDD) [30] and the first column is the standard vector of degree values. Then, Spearman's correlation coefficient [41] is computed between all columns of the GDD matrix to build an 11 × 11 matrix called the Graphlet Correlation Matrix (GCM). In this framework, the topology of a given graph G is summarised by its Graphlet Correlation Matrix, denoted $GCM_G$. The GCD11 between two graphs $G_1$ and $G_2$ is defined as the Euclidean distance between the upper triangular parts of their respective GCMs:

$$GCD_{11}(G_1, G_2) = \sqrt{\sum_{i=1}^{10} \sum_{j=i+1}^{11} \left( GCM_{G_1}(i,j) - GCM_{G_2}(i,j) \right)^2}$$

Qualifying GCD11 on small synthetic graphs

The performance of the GCD11 in identifying similarities between small graphs is assessed with an experimental design using three different models of random graphs, namely the Erdős-Rényi (ER) [31], the Barabási-Albert scale free (SF-BA) [42] and the k-regular (REG) [39] models.
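To make the definition of the distance given earlier in this section concrete (Spearman correlations between orbit-degree columns, then a Euclidean distance over the upper triangles), the following minimal sketch may help. It assumes that the orbit-degree matrix of each graph has already been computed by some graphlet counter, which is not shown here. The function names are hypothetical, and the convention used for constant columns (where Spearman's coefficient is undefined) is an assumption of this sketch, not a rule taken from [25].

```python
import math

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman coefficient, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        # Assumption of this sketch: two constant columns count as perfectly
        # correlated, a constant column against a varying one as uncorrelated.
        return 1.0 if (sxx == 0 and syy == 0) else 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    return sxy / math.sqrt(sxx * syy)

def gcm(orbit_degrees):
    """Correlation matrix between the columns of an N x n_orbits matrix."""
    columns = list(zip(*orbit_degrees))
    return [[spearman(ci, cj) for cj in columns] for ci in columns]

def gcd(orbits_1, orbits_2):
    """Euclidean distance between the upper triangles of the two GCMs."""
    m1, m2 = gcm(orbits_1), gcm(orbits_2)
    n = len(m1)
    return math.sqrt(sum((m1[i][j] - m2[i][j]) ** 2
                         for i in range(n) for j in range(i + 1, n)))
```

With eleven orbit columns per graph, `gcd` corresponds to the GCD11; the functions accept any number of columns, which keeps toy checks small.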

The Erdős-Rényi random model is the simplest and most common uncorrelated random graph model. An Erdős-Rényi graph ER(N, d) of order N and edge density $d = 2m / (N(N-1))$ gets m edges that are randomly and uniformly chosen among the $\binom{N}{2}$ possible edges [31]. This simple configuration results in an uncorrelated graph, i.e. one with zero assortativity [43], meaning that there is no preferential attachment among nodes. In other words, the Erdős-Rényi random model generates graphs whose edges are statistically independent of each other (which should not be confused with the notion of an independent set of nodes [44]).
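As an illustration of this construction, the sketch below draws m edges uniformly without replacement among all possible node pairs. The function names are hypothetical, and rounding d times the number of possible pairs to the nearest integer to obtain m is an assumption of this sketch.

```python
import itertools
import random

def erdos_renyi(order, density, seed=None):
    """Return the edge list of an ER(N, d) graph with m uniformly chosen edges."""
    rng = random.Random(seed)
    possible = list(itertools.combinations(range(order), 2))
    m = round(density * len(possible))  # assumption: nearest integer edge count
    return rng.sample(possible, m)

def edge_density(order, edges):
    """d = 2m / (N (N - 1))."""
    return 2 * len(edges) / (order * (order - 1))

def adjacency_matrix(order, edges):
    """Symmetric 0/1 matrix with a 1 wherever two nodes are linked."""
    matrix = [[0] * order for _ in range(order)]
    for i, j in edges:
        matrix[i][j] = matrix[j][i] = 1
    return matrix
```

Sampling without replacement fixes the number of edges exactly, so two simulated graphs of the same order and density always have the same size.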

The Barabási-Albert scale free model accounts for some preferential connectivity, as observed in some real-world graphs [42]. In fact, in many graphs the node degree distribution follows a power law whose exponent γ lies between 2 and 3 [45]. A Barabási-Albert scale free graph SF-BA(N, d, γ) of order N can be viewed as a graph where each of the N nodes and a subset of m edges are added sequentially by an iterative process. Preferential attachment means that the more connected a node is, the more likely it is to receive new edges. This "rich-get-richer" phenomenon [38] results in a graph with particular components called hubs.
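The iterative process can be sketched as follows. This is a plain preferential-attachment generator with a fixed number m of edges per new node; it does not reproduce the SF-BA(N, d, γ) parametrisation used in this paper, and the choice of a complete seed graph on m + 1 nodes as well as the function name are assumptions of this sketch.

```python
import random

def barabasi_albert(order, m, seed=None):
    """Grow a graph to `order` nodes; each new node brings m edges (order > m >= 1)."""
    rng = random.Random(seed)
    # Seed graph: complete graph on m + 1 nodes (an assumption of this sketch).
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Each node appears in `endpoints` once per incident edge, so a uniform
    # draw from this list is a draw proportional to the node degree.
    endpoints = [node for edge in edges for node in edge]
    for new_node in range(m + 1, order):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for target in sorted(targets):
            edges.append((target, new_node))
            endpoints.extend((target, new_node))
    return edges
```

Because well-connected nodes occupy more slots in `endpoints`, they are drawn more often, which is the "rich-get-richer" mechanism that produces hubs.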
The diagonals of $D_{1,1}$ and $D_{2,2}$ are trivial and are not considered (null distance between a graph and itself). To ensure relevant computations of precision and recall, the diagonals of $D_{2,1}$ and $D_{1,2}$ are also removed. Given the symmetry of the GCD11, $D_{1,1}$ and $D_{2,2}$ are also symmetrical and $D_{1,2} = t(D_{2,1})$, where t means transpose. All counts are then twice as large as expected, a factor that cancels out when computing precision and recall. From the precision-recall curve, that is precision P(ϵ) as a function of recall R(ϵ), the AUPR is defined as:

$$AUPR = \sum_{k} P(\epsilon_k) \, \Delta R(\epsilon_k)$$

where $\Delta R(\epsilon_k)$ is the change in recall from rank k − 1 to k. For each combination of order and edge density, the resultant AUPR is used to complete an |N| × |d| matrix of AUPR.

An AUPR score equal to 1 means a perfect distinction, whereas an AUPR score equal to 0.5 represents a baseline which corresponds to the expected score of a random classifier. An AUPR score of 0 occurs when graph topologies are all identical. We arbitrarily consider that an AUPR larger than 0.9 ensures a clear discrimination between two models. In the domain within which AUPR ≥ 0.9, further called the domain of applicability, the GCD11

In this second case, only ER and SF-BA comparisons are considered to test the ability of the GCD11 to assign smaller distances to pairs of graphs coming from the same model than to those coming from different models. We do not include REG in this approach because the topology of graphs coming from REG remains identical regardless of the order.
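Returning to the AUPR score described earlier in this section, the sketch below shows one common way of computing it from a list of pairwise distances and a flag telling whether each pair comes from the same model. It assumes the usual average-precision form (precision at each rank weighted by the change in recall), with pairs retrieved in order of increasing distance. The function name is hypothetical, and ties are broken by input order, which is an assumption of this sketch and does not reproduce the paper's treatment of identical topologies.

```python
def aupr(distances, same_model):
    """Area under the precision-recall curve for 'same model' retrieval.

    Pairs are ranked by increasing distance; at rank k, the k closest pairs
    are predicted to come from the same model.
    """
    ranked = sorted(zip(distances, same_model), key=lambda pair: pair[0])
    positives = sum(1 for _, same in ranked if same)
    if positives == 0:
        raise ValueError("at least one same-model pair is required")
    true_positives = 0
    area = 0.0
    for k, (_, same) in enumerate(ranked, start=1):
        if same:
            true_positives += 1
            precision = true_positives / k
            # Recall only changes (by 1 / positives) at ranks holding a positive.
            area += precision / positives
    return area
```

When every same-model pair is closer than every different-model pair, each term has precision 1 and the score reaches its maximum of 1.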

For all possible pairs of combinations of orders and densities $(N_1, d_1) \times (N_2, d_2)$, we build the following three 100 × 100 GCD11 matrices using the already simulated graphs: $D_{ER,ER}(N_1, d_1, N_2, d_2)$, $D_{SF\text{-}BA,SF\text{-}BA}(N_1, d_1, N_2, d_2)$ and $D_{ER,SF\text{-}BA}(N_1, d_1, N_2, d_2)$. We then compute the percentage of cases where the inter-model distance $D_{ER,SF\text{-}BA}(N_1, d_1, N_2, d_2)$ is larger than either of the two intra-model distances $D_{ER,ER}(N_1, d_1, N_2, d_2)$ and $D_{SF\text{-}BA,SF\text{-}BA}(N_1, d_1, N_2, d_2)$. This percentage is used to complete an $(N_1 \times d_1) \times (N_2 \times d_2)$ asymmetric matrix of probability. To limit computation, and because the outputs change slowly with the order values, the number of possible values for the order is reduced so that $(N_1, N_2) \in \{5, 10, ..., 50\}^2$ and $(d_1, d_2) \in \{0, 0.01, ..., 1\}^2$. We arbitrarily consider that a probability of at least 0.9 is sufficient to ensure a clear discrimination between two models, which is the threshold used to define the domain of applicability of the GCD11.
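The percentage described above can be sketched as follows. The text does not specify how entries of the three matrices are paired, so this sketch assumes an entry-by-entry comparison and counts a case as a success when the inter-model distance exceeds both intra-model distances; both choices, and the function name, are assumptions.

```python
def discrimination_rate(d_inter, d_intra_a, d_intra_b):
    """Share of entries where the inter-model distance exceeds both intra-model ones.

    The three arguments are matrices (lists of rows) of identical shape.
    """
    hits = 0
    total = 0
    for row_inter, row_a, row_b in zip(d_inter, d_intra_a, d_intra_b):
        for inter, a, b in zip(row_inter, row_a, row_b):
            total += 1
            if inter > max(a, b):
                hits += 1
    return hits / total
```

A returned value of at least 0.9 would place the corresponding pair of order and density combinations inside the domain of applicability as defined in the text.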
We denote $GCM_M$ the average Graphlet Correlation Matrix of M and build the test by computing η, the number of times the distance between $GCM(G)$ and $GCM_M$ is smaller than or equal to the distance between $GCM(M_k)$ and $GCM_M$. The p-value [46] is defined by p = (η + 1)/(K + 1). The larger the p-value is, the less evidence there is against $H_0$.
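The p-value computation can be sketched in a few lines, assuming the K distances between each simulated graph and the average GCM of the model have already been computed. The function and argument names are hypothetical.

```python
def randomization_p_value(d_empirical, d_simulated):
    """p = (eta + 1) / (K + 1).

    d_empirical: distance between the empirical GCM and the model's average GCM.
    d_simulated: the K distances between each simulated GCM and that average.
    eta counts the simulated graphs lying at least as far from the average
    as the empirical graph does.
    """
    eta = sum(1 for d in d_simulated if d_empirical <= d)
    return (eta + 1) / (len(d_simulated) + 1)
```

For example, an empirical graph farther from the average than all of K = 99 simulated graphs gets p = 1/100, the smallest value the test can return for that K.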

Empirical graphs
The developments proposed in this paper are illustrated on small graphs describing pairwise relationships (the edges) among a set of vessels (the nodes) identified in a previous work [37] based on joint-movement analysis [47]. Two particular and contrasting fleets (groups of vessels sharing the same technical characteristics) are considered among those studied in [37], with twenty graphs each. Based on pair trawling, Fleet 1 is characterised by strong pairwise collaborative relationships and leads to graphs that are strictly k-regular [39]. Conversely, Fleet 2 is characterised by ephemeral relationships due to encounters at sea that are random or assumed to be so, and provides graphs with unknown topological properties and of unknown types.

When comparing graphs coming from the Erdős-Rényi (ER) and Barabási-Albert scale free (SF-BA) models, the domain of applicability (AUPR ≥ 0.9) of the GCD11 is parabolic with regard to the order and the density (Fig 1a). The range of edge densities allowing a clear discrimination depends on the order and widens as the graph order increases. For instance, for orders of 15 and 30, the domain of applicability spans a range of edge densities from 0.25 to 0.4 and from 0.05 to 0.8, respectively. Furthermore, a perfect discrimination (AUPR = 1) is gradually reached for graphs with more than 30 nodes, increasingly irrespective of the edge density. Overall, the domain of applicability exhibits an asymmetrical surface. For a given order, our results show that the discrimination between ER and SF-BA random graph models is generally better for the lower half of the edge density range.

Fig 1. Quality of clustering (AUPR) for three pairs of models. (a) Erdős-Rényi vs Barabási-Albert scale free, (b) Erdős-Rényi vs 1-regular and (c) Barabási-Albert scale free vs 1-regular. For each pair of models, and for each order (from 4 to 50) and edge density (from 0 to 1) combination, the quality of clustering between 100 graphs of the two models is assessed by the Area Under the Precision Recall curve (AUPR). A maximum value of 1 corresponds to perfect discrimination. Empirical graphs from fleet 1 (red squares) and from fleet 2 (blue triangles) are projected according to their features (order and edge density).
A trivial part of the domain of uncertainty corresponds to combinations of order and edge density that lead to the same graph regardless of the graph model (isomorphic graphs [48]). For instance, densities of 0 and 1 result in empty and complete graphs respectively, and lead to null AUPR values (null distance between each pair of graphs). The trivial part of the domain of uncertainty is indeed symmetrical (black crosses; Fig 1a).

The rest of the domain of uncertainty is rather asymmetric. For very small densities (left side), the number of edges is insufficient to enable the emergence of significantly different topological components. For very high densities (right side), the two topologies gradually converge towards complete graphs. These two effects decrease as graph order increases, and they merge below a certain order threshold (approximately 12-14 nodes).

When comparing graphs originating from the 1-regular model and the Erdős-Rényi or Barabási-Albert scale free models (Fig 1b and Fig 1c), only even values of orders from 4 to 50 are consistent with the 1-regular property, and their densities are totally determined by their orders. A single AUPR is thus attributed to each order. In both cases, the AUPR increases as a function of the order, quickly reaching a perfect value (AUPR = 1) with orders equal to 16 and 10 for the ER and SF-BA cases respectively. The GCD11 can therefore be used with confidence to discriminate a 1-regular graph from an ER or SF-BA random graph for any order above 8 nodes (AUPR ≥ 0.9). The high minimum quality of clustering for all tested orders (at least 0.

When dealing with different orders and densities, the domain of applicability of the GCD11 turns out to depend first on the order. For equal orders (Fig 2b, block diagrams on the first bisector), the surface of the domain of applicability increases from 0.015 to 0.19 when the order increases from 15 to 50. This means that the edge density difference allowing a clear discrimination between ER and SF-BA is larger for "large" graphs.

Compared to the reference cases where the two graphs are of the same order (block diagrams in Fig 2b), an increase of the order of one of the two graphs leads systematically to larger domains of applicability when the increase concerns the ER graph. For instance, starting with the comparison between ER(20, .) and SF-BA(20, .) with a domain of applicability equal to 0.08, the domain of applicability expands from 0.09 to 0.12 when the order of the ER graph increases (in column), while it levels off around 0.09 when the increase of order concerns the SF-BA graph (in row).

The domain of applicability is also systematically asymmetric, favouring situations where the edge density of the SF-BA graph is larger than the edge density of the ER graph it is compared to, whatever their respective orders (Fig 2a). The asymmetry that exists on average is, however, dependent on the edge densities. As a matter of fact, when the orders increase, the domain of applicability acquires a "violin" shape. The violin's body represents the major part of the domain of applicability and concerns the lower half of the edge density range. It is asymmetric with regard to the first bisector, which means that the range of densities allowing ER and SF-BA to be distinguished is larger when their edge densities are small and when SF-BA graphs are denser than ER graphs. The violin's head represents the domain of applicability, also asymmetric, for high or very high edge densities (d ≥ 0.7). However, the asymmetry is reversed, that is, it favours situations where ER graphs are denser than SF-BA graphs. The violin's neck is the thinnest part of the domain of applicability and appears as a transition between the two previous parts (the body and the head). In the violin's neck, the GCD11 is able to distinguish ER and SF-BA graphs with very similar edge densities.

Empirical graphs used in this study are characterised by small orders ranging from 10 to 25 nodes and by edge densities spanning a large range, from 0.05 to 0.61 (Table 1). Graphs of fleet 1 are on average smaller and much less dense than graphs of fleet 2. The two fleets yield substantially different graphs. On the one hand, due to a strong and exclusive collaborative relationship, fleet 1 (Fig 3a) leads to regular graphs of degree 1, i.e. disconnected edges. On the other hand, graphs of fleet 2 (Fig 3c) show a single dense component reflecting multiple relationships.
The peculiar 1-regular topology of graphs of fleet 1 results in a strong negative correlation between order and density, which does not exist in fleet 2. As a matter of fact, 1-regular graphs have an even number of nodes N and a size S = N/2, so that their edge density, 2S/(N(N − 1)) = 1/(N − 1), decreases as the order increases.
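The link between order and density in 1-regular graphs can be checked numerically with a minimal sketch. Drawing the graph as a uniformly shuffled pairing of the nodes is an assumption of this sketch (the empirical graphs come from observed pair trawling, not from this generator), and the function names are hypothetical.

```python
import random

def one_regular(order, seed=None):
    """Pair up the nodes at random so that every node has exactly one edge."""
    if order % 2:
        raise ValueError("a 1-regular graph needs an even number of nodes")
    nodes = list(range(order))
    random.Random(seed).shuffle(nodes)
    return [(nodes[i], nodes[i + 1]) for i in range(0, order, 2)]

def edge_density(order, edges):
    """d = 2S / (N (N - 1)), with S the number of edges."""
    return 2 * len(edges) / (order * (order - 1))
```

For any even order N the generator returns N/2 disjoint edges, so `edge_density` depends on N alone and shrinks as N grows, which is the negative correlation noted in the text.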
Table 1. Main features of empirical graphs: order (number of nodes), size (number of edges) and edge density (ratio between the size and the graph maximum size).

Due to the differences in degree and edge density, their respective GCMs also show major differences. The GCM of fleet 2 (Fig 3d) exhibits a standard shape [25] with strong positive and negative correlations between the first eleven non-redundant orbits. These contrasting correlations capture heterogeneity in the role of vessels (nodes) in the graph. For instance, the negative correlation between orbits {4, 6, 9} and orbits {0, 2, 5, 7, 8, 10, 11} indicates the existence of peripheral nodes [25]. The GCM of fleet 1 (Fig 3b) shows a singular shape with a unit correlation between each pair of orbits. Indeed, in 1-regular graphs, and for all strongly k-regular graphs [49], each node has the same role, leading to the same degrees for the first eleven orbits. This result suggests that regular graphs have the same GCM and consequently cannot be distinguished using this metric.

Testing model type

All graphs of fleet 2 (blue triangles) (Fig 1b) are in the domain of applicability (AUPR ≥ 0.9). However, Graphs 01, 02 and 10 are very close to the boundary of the domain of applicability of the GCD11. The diagrams of AUPR presented in Fig 1b and Fig 1c are specifically relevant for features of fleet 1 graphs, which also lie in the domain of applicability of the GCD11 (red squares). Consequently, it is relevant to use the GCD11 to test whether empirical graphs are outcomes of ER or SF-BA random graph models.

None of the graphs from fleet 1 presents any similarity with Erdős-Rényi or Barabási-Albert scale free graphs of the same order and density (Table 2). Given the 1-regular topology of graphs from fleet 1 and their orders, from 10 to 22, these results were easily predictable from the previous results in Fig 1b and 1c. Conversely, all graphs from fleet 2 are statistically not different from Erdős-Rényi graphs, with estimated p-values from 0.097 to 0.714. This suggests that graphs from fleet 2 and outcomes of the Erdős-Rényi model share similar topological properties. Edges, and by extension the relationships between vessels of fleet 2, may be considered as statistically independent.

Table 2. Estimated p-values. Each empirical graph is associated with an estimated p-value (p) of being an outcome of an Erdős-Rényi or a Barabási-Albert scale free model. As in Table 1

However, Graphs 06 and 10 from fleet 2 also present a significant probability of being an outcome of the Barabási-Albert scale free model (p ≥ 0.16). For Graph 06, the balanced p-values between ER (p = 0.107) and SF-BA (p = 0.16) may suggest that Graph 06 presents an intermediate topology between ER and SF-BA graphs. Indeed, the AUPR (1 > AUPR ≥ 0.9) associated with the features of Graph 06 in Fig 1a implies a small overlap between ER and SF-BA graphs, which does not exclude the existence of "extreme" graphs from these models which might present some similarities. Graph 06 might be one of these "extreme" graphs. For Graph 10, the unbalanced p-values between ER (p = 0.094) and SF-BA (p = 0.572) reflect a different situation. Even if the AUPR associated with the features of Graph 10 (1 > AUPR ≥ 0.9) implies a small overlap between ER and SF-BA graphs, Graph 10 is also the densest empirical graph (d = 0.61). At this density, its small similarity with ER graphs could reflect the beginning of the topology convergence between the two models.

Pair testing

The objective here is to test whether two empirical graphs are outcomes of the same random model or not. This could be helpful if the previous statistical test fails to identify significant similarities with any random graph model. Based on previous results, we first identify the pairs of graphs that, given their respective orders and edge densities, belong to both sides of the domain of applicability of the GCD11. This leads us to consider the following four pairs of graphs: {(03; 08); (04; 05); (04; 08); (05; 08)} (Fig 4). Not surprisingly, these graphs present small densities (from 0.22 to 0.32) and, in each of these pairs, the two graph densities are very similar, with a maximum density variation of 0.07 in pair (05; 08).

Fig 4. Probability to correctly distinguish Erdős-Rényi and Barabási-Albert scale free graphs with orders and edge densities of graphs from fleet 2. Each pair of empirical graphs (i, j) from fleet 2 is associated with a comparison of an ER graph of order $N_i$ and edge density $d_i$ and an SF-BA graph of order $N_j$ and edge density $d_j$. Each cell is colored as the probability that

For each pair of graphs, the two intra-model distance distributions (ER vs ER) and (SF-BA vs SF-BA) are very similar and overlap each other (Fig 5). This suggests that the GCD11 remains almost unchanged when comparing graphs coming from the same graph model, whatever the model. On the other hand, the inter-model distance distribution (ER vs SF-BA) is clearly different from, and greater than, the two intra-model distance distributions. However, there is a small overlap between these three distributions, which is reflected in the probability values 1 > P ≥ 0.9.

Except for the pair (03; 08) (Fig 5a), the GCD11 between empirical graphs (red dotted lines) falls near the mode of the two intra-model distance distributions, indicating that these graphs are likely to come from the same model. It is worth noting that, without the previous statistical test results (Table 2), this second test does not make it possible to identify whether empirical graphs are outcomes of Erdős-Rényi or Barabási-Albert scale free models. However, this approach is relevant if the statistical test fails to identify significant similarities with any random graph model, as it provides an alternative way to assess whether two empirical graphs could be outcomes of the same model.

Fig 5. The dotted red line shows the GCD11 distance between each pair of empirical graphs from fleet 2 which presents suitable features (order and edge density) to be compared. For each comparison, the empirical distance is compared with the two intra-model distance distributions (ER vs ER in white, SF-BA vs SF-BA in black) and the inter-model distance distribution (ER vs SF-BA in grey) computed according to the features of the pairs of empirical graphs.

This work extends the use of the Graphlet Correlation Distance, originally proposed for large real-world graphs, to small real-world graphs. Through a numerical benchmark study, we show the relevance of the GCD11 for comparing graphs with the same order and the same density configuration. The generic statistical test proposed in this study to test the similarity between empirical graphs and graph models regardless of order and edge density can be applied without restriction on the size of the graphs. Some limitations of the GCD11 are highlighted on the basis of the numerical evidence presented here. While k-regular graphs defy any relevant comparison, the performance of the GCD11 deteriorates when the orders and/or the densities differ, especially with large density variations. This work is based on two contrasting and commonly encountered random graph models, the Erdős-Rényi and Barabási-Albert scale free graph models. However, the proposed experimental design and numerical analysis can be directly used with other random graph models to explore new properties of the GCD11 and extend its domain of applicability. For example, it might be interesting to explore the ability of the GCD11 to compare graphs with communities using the Lancichinetti-Fortunato-Radicchi [50] random graph model.

The application of the method developed in this study to fisheries data is particularly suitable for testing whether certain fishing behaviors can be considered independent. This property is generally required to apply statistical inference methods, particularly when estimating population biomasses in marine ecosystems. A very operational goal of the GCD and the associated statistical test developed here could therefore be to identify the subset of the fishing data that satisfies this independence property and to use it to provide an index of population abundance. Finally, by extending the use of the GCD to small real-world graphs, we hope to stimulate research interest in graph-theoretic methods for these small graphs, which are little studied in the literature.