
Measuring group fairness in community detection

  • Elze de Vink,

    Roles Investigation, Methodology, Validation

    Affiliation Leiden Institute of Advanced Computer Science, Leiden University, Leiden, The Netherlands

  • Frank W. Takes,

    Roles Supervision, Writing – review & editing

    Affiliation Leiden Institute of Advanced Computer Science, Leiden University, Leiden, The Netherlands

  • Akrati Saxena

    Roles Conceptualization, Methodology, Supervision, Validation, Writing – original draft, Writing – review & editing

    a.saxena@liacs.leidenuniv.nl

    Affiliation Leiden Institute of Advanced Computer Science, Leiden University, Leiden, The Netherlands

Abstract

Understanding community structures is crucial for analyzing networks, as nodes join communities that collectively shape large-scale networks. In real-world settings, the formation of communities is often impacted by several social factors, such as ethnicity, gender, wealth, or other attributes. These factors may introduce structural inequalities; for instance, real-world networks can have a few majority groups and many minority groups. Community detection algorithms, which identify communities based on network topology, may generate unfair outcomes if they fail to account for existing structural inequalities, particularly affecting underrepresented groups. In this work, we propose a set of novel group fairness metrics to assess the fairness of community detection methods. Additionally, we conduct a comparative evaluation of the most common community detection methods, analyzing the trade-off between performance and fairness. Experiments are performed on synthetic networks generated using LFR, ABCD, and HICH-BA benchmark models, as well as on real-world networks. Our results demonstrate that the fairness-performance trade-off varies widely across methods, with no single class of approaches consistently excelling in both aspects. We observe that Infomap and Significance methods are high-performing and fair with respect to different types of communities across most networks. The proposed metrics and findings provide valuable insights for designing fair and effective community detection algorithms.

Introduction

Social networks are used to represent complex social systems where nodes represent individuals or entities, and the relationships between these entities, such as friendships, collaborations, or shared interests, are denoted by edges. In social network analysis, the objective is to uncover meaningful patterns, understand structural properties within networks, and analyze the dynamic processes taking place on these networks. A key concept in social network analysis is community detection, which identifies groups of nodes that are more densely connected internally and more loosely connected with the rest of the network. In this sense, a community is “a group of nodes that have a higher likelihood of connecting to each other than to nodes from other communities” [1]. Detecting communities is essential for understanding the functional organization of networks, predicting behavior, and improving applications such as recommendation systems, awareness spreading via influence maximization, anomaly detection, and disease outbreak modeling. This is done by community detection algorithms, which take the structure of a social network as input and output a partitioning of the nodes into communities [2]. In the literature, many different community detection algorithms have been proposed to identify meaningful clusters, which can reveal hidden structures driving real-world interactions [3].

One important aspect of network analysis is understanding how the structure of a social network reflects inherent social inequalities [4]. The formation of communities within these networks is influenced by factors such as ethnicity, gender, race, and socioeconomic status, leading to variations in network structure in terms of community size, density, and connectivity [4]. If these structural inequalities are not accounted for in the algorithm design phase, network analysis algorithms might produce biased outcomes, particularly disadvantaging minority groups. To mitigate bias and promote equitable outcomes, it is essential to integrate structural inequalities into the design of network analysis methods, ensuring fair treatment for all users and groups, regardless of their type, size, or any protected attribute. A fair community detection method should accurately identify all types of communities, whether small or large, dense or sparse, or having different connectivity in the network, while ensuring high-quality results. Community detection methods, which leverage network structure to identify communities, often struggle to accurately detect small or sparsely connected groups [5]. This misclassification can further propagate bias in other downstream network analysis tasks, such as influence maximization [6], influence minimization [7], link prediction [8,9], and centrality ranking [10], that utilize the community structure to ensure fairness.

Ghasemian et al. [5] analyzed 16 community detection methods and found significant variation, for example, in the number of identified communities across different methods. However, their study did not explore the impact of these methods on communities of varying sizes and densities. While numerous metrics exist to evaluate the quality of detected communities [11], there is currently no established metric to assess the fairness of a community detection method, particularly in relation to measuring its bias against minority groups. Despite the extensive literature on community detection methods and evaluation metrics [12], fairness remains an underexplored aspect in community detection, lacking clear definitions and comprehensive evaluation frameworks. Yet, its importance is evident in ensuring fair network analysis.

In this work, we introduce a set of group fairness metrics to assess the fairness of community detection methods. These metrics take as input the ground truth communities (community labels of nodes), the communities identified by a detection method, the network, and a fairness criterion, and output a fairness score based on the specified community property. This score indicates the extent to which the employed community detection method provided a fair partitioning of the network given the chosen fairness criterion, such as community size, density, or connectivity. Our approach begins by matching the predicted communities to the ground truth communities. In particular, we introduce three measures, called FCCN, F1, and FCCE, to compute fairness for each ground truth community, which are subsequently used to compute the group fairness score Φ.

In the experiments, we first extensively analyze the behavior of the proposed metric. Next, we conduct a comparative analysis of existing community detection methods, examining the performance-fairness trade-off to determine whether high-performing methods exhibit biases, i.e., we assess the extent to which they are fair. We measure this fairness based on three structural properties of communities: (i) size, (ii) density, and (iii) conductance. Here, size refers to the total number of nodes in a community, density represents the ratio of internal edges to possible internal edges, and conductance measures the fraction of a community’s edge volume that connects to nodes outside the community. We categorize community detection methods based on their algorithmic approach into six classes: (i) Optimization, (ii) Spectral, (iii) Propagation, (iv) Dynamics, (v) Representation Learning, and (vi) Probabilistic. Experiments are performed on synthetic benchmark models, including the LFR [13], ABCD [14], and HICH-BA [7] models, as well as on real-world networks. The performance of community detection methods is measured using the NMI [15], RMI [16], ARI [17], NF1 [18], and PF1 [19] evaluation metrics. Our experiments reveal that no single class of community detection methods consistently outperforms others. The performance-fairness trade-off varies significantly across methods as well as across networks. The analysis highlights that some of the best-performing and fairest approaches include the Infomap [20], RSC-V [21], RSC-K [21], Significance [22], Walktrap [23], SBM [24], and SBM-Nested [25] methods.

The main contributions of our work are mentioned below.

  • The paper introduces a novel set of group fairness measures, denoted as Φ, to evaluate the fairness of community detection methods. These measures quantify how fairly a community detection method identifies communities of varying structural properties, e.g., size, density, and conductance.
  • Extensive experiments are conducted to study the fairness-performance trade-off of 24 community detection methods on synthetic benchmark models (LFR, ABCD, HICH-BA) and real-world networks.
  • Through empirical analysis, the paper highlights the suitability of different methods for different types of networks. We recommend the Infomap and Significance community detection methods, as they achieve a strong fairness-performance trade-off across different networks.

The paper is structured as follows. First, we review related work on evaluation metrics for detected communities and algorithmic fairness in community detection. Next, we introduce the proposed group fairness metric, followed by a description of the experimental setup. We then present the empirical analysis and insights, summarizing the most important findings and giving an outline of potential future research directions, followed by the conclusion.

Related work

In this section, we discuss the existing literature on evaluation metrics and algorithmic fairness for community detection.

Evaluation metrics

Community detection consists of two phases: identifying meaningful community structures by means of a community detection algorithm and evaluating the quality and relevance of the detected communities. Here, we first discuss metrics for evaluating detected communities without relying on ground truth labels, which are also used as optimization criteria for community detection algorithms. Next, we discuss metrics to assess the quality of the identified communities.

Metrics for community detection.

Community detection methods aim to identify meaningful group structures within a network by optimizing quality metrics. These measures compute the goodness of the community on the basis of the connectivity of nodes and the network structure. Such quality metrics can be categorized into four main types: (i) internal connectivity-based, (ii) external connectivity-based, (iii) internal and external connectivity-based, and (iv) network model-based [11].

Internal connectivity-based metrics assess the quality of identified communities using the structure within a community. These include Internal Density, Edge Inside, Average Degree, Fraction Over Median Degree, and Triangle Participation Ratio [26]. They measure factors such as edge density, internal connectivity, node degree distribution, and the prevalence of triangular motifs. For instance, Internal Density measures the density of edges within the community by comparing the actual number of internal edges to the total possible internal edges. Edge Inside counts the total number of internal edges within the community. Overall, these measures analyze how tightly knit nodes are within the identified community compared to the rest of the network.

External connectivity-based metrics evaluate how a community interacts with the rest of the network. Important measures falling under this category include the Cut Ratio [27], which calculates the fraction of outgoing edges relative to all possible edges, and Expansion [26], which measures the number of external edges from the community divided by the size of the community. The next class of metrics considers both the internal and external connections of nodes from the community’s perspective. Common metrics include Conductance [28], Normalized cut [28], and Maximum-ODF (Out Degree Fraction), Average-ODF, and Flake-ODF [29].

Network model-based metrics, such as modularity, compare actual community structures against a randomized null model to determine the strength of detected communities. Modularity [30,31] evaluates community quality by comparing the actual number of internal edges to the expected number in a random graph with the same degree distribution. Higher modularity values indicate stronger community structures. However, modularity optimization faces challenges, including the resolution limit [32], which prevents the detection of smaller communities with varying levels of interconnectedness, and the degeneracy problem [33], where multiple distinct community structures yield similar modularity values. To address these issues, various modifications have been proposed, such as modularity density [34], modularity intensity [35], Adaptive scale modularity [36], Community Score [37], SPart [38], Permanence [39], and Significance [22]. A variety of community detection algorithms have been developed that optimize these measures, aiming to detect better communities [22,40–43].
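To make the null-model comparison concrete, the short sketch below scores a planted partition against a deliberately mismatched split using NetworkX's built-in modularity function; the graph and its parameters are illustrative and not taken from the paper.

```python
import networkx as nx
from networkx.algorithms.community import modularity

# Graph with three planted groups (sizes and edge probabilities are illustrative).
G = nx.random_partition_graph([25, 25, 25], 0.4, 0.02, seed=3)
planted = [set(block) for block in G.graph["partition"]]

# A deliberately mismatched two-block split of the same 75 nodes.
mismatched = [set(range(0, 40)), set(range(40, 75))]

print(modularity(G, planted))     # high: many internal edges, few external ones
print(modularity(G, mismatched))  # noticeably lower: the split cuts through communities
```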

Ground truth-based validation metrics.

The performance of a community detection method is evaluated by comparing the detected communities with a ground truth structure. To compare the detected and ground truth communities, several metrics from the data mining clustering literature have been adapted and reformulated to incorporate network-specific information [44].

Mutual Information (MI) [45], derived from information theory, measures how much one partition tells us about the other. Normalized Mutual Information (NMI) [15] refines MI by incorporating the entropy of the respective community structures, while Reduced Mutual Information (RMI) [16] addresses a flaw in MI by ensuring a value of zero when the predicted partition consists of n communities (each having one node), indicating no meaningful structure. Purity [46] assigns each detected community to the most frequent ground truth label within it. A Purity score of 1 indicates a perfect match. However, Purity is asymmetric, meaning Purity(C, P) and Purity(P, C) are not necessarily equal, where C and P are the sets of ground truth and predicted communities. The former, commonly referred to as “Purity”, is more widely used, while the latter (Purity(P, C)) is known as “Inverse Purity” [47]. The F-Measure [47] overcomes this limitation by computing the harmonic mean of Purity and Inverse Purity.

An alternative way to define a community is as a collection of pairwise decisions for nodes in a network [48]. Two nodes are considered part of the same community if they share the ground truth label. The Rand Index (RI) evaluates accuracy by measuring the proportion of correctly assigned node pairs: true positives (TP) and true negatives (TN) indicate correct assignments, while false positives (FP) and false negatives (FN) represent errors. A true positive (TP) occurs when two nodes belonging to the same ground truth community are correctly grouped within the same detected community. Similarly, we can compute other values of the confusion matrix. RI reflects the overall alignment between detected and ground truth communities. However, it has limitations, which the Adjusted Rand Index (ARI) [17] addresses by reducing sensitivity to the number of communities. Recent metrics in this class include Variation of Information (VI) [49], Edit Distance [50], NF1 [18], and PF1 [19]. In our study, we use a selection of the most commonly used metrics from different categories to evaluate the performance of community detection methods.

Fairness-aware community detection

Fairness-aware network analysis aims to ensure that algorithms used for analyzing social networks produce unbiased and equitable outcomes for all users and groups [4]. The evolution of social networks is impacted by several factors, such as ethnicity, gender, or socioeconomic status, which lead to structural inequalities. Fairness-agnostic methods, such as community detection, link prediction, and influence maximization, often disproportionately favor majority groups and reinforce existing structural inequalities in networks. Fairness-aware network analysis introduces techniques to mitigate structural inequalities by incorporating fairness constraints, redefining evaluation metrics, and adjusting algorithmic decisions to ensure equitable representation and treatment across different social groups. In recent years, fairness-aware methods have been proposed for several downstream network analysis tasks, such as link prediction [8,9], centrality ranking [51], influence maximization [52,53], and influence minimization [7]. However, fairness in community detection is still underexplored.

Community detection aims to uncover structural patterns in networks, but no single algorithm is universally optimal across all inputs, as stated by the No Free Lunch theorem for community detection [54]. Ghasemian et al. [5] analyzed 16 community detection algorithms on a benchmark corpus of 572 diverse real-world networks to examine their over- and underfitting behaviors. The findings reveal that (i) algorithms vary significantly in the number and composition of communities detected, (ii) similar algorithms cluster together based on their outputs, (iii) performance differences impact link-based learning tasks, and (iv) no algorithm consistently outperforms others across all networks. Probabilistic and non-probabilistic methods exhibit distinct behaviors, with spectral techniques producing more similar results to each other than to other approaches. The study highlights the importance of evaluating community detection algorithms across a wide range of networks, as results from small-scale studies may not generalize.

The detectability of communities is a crucial factor in ensuring fairness in community detection. Prior research has established detectability thresholds [2,55,56] that define conditions under which communities become undetectable. Radicchi [57] demonstrated that degree heterogeneity enables modularity-based community detection algorithms to recover network community structures accurately. However, in complex networks, such as those generated by the LFR and ABCD benchmark models, the existence of a well-defined detectability threshold remains uncertain [2]. If some specific community properties hinder detectability, it may introduce bias in community detection outcomes, disproportionately affecting specific groups and leading to unfair representations in social network analysis.

Mehrabi et al. [58] highlight that community detection algorithms, particularly those optimizing modularity, tend to exclude low-degree nodes. To address this issue, the authors proposed the Communities with Lowly-connected Attributed Nodes (CLAN) method, designed for networks with attributed nodes. CLAN incorporates a supervised learning step that reassigns attributed nodes from smaller predicted communities into larger ones. However, this approach assumes that smaller communities are not meaningful and should be merged for downstream tasks, potentially leading to the dissolution of actual minority communities rather than their correct identification. While the study does not explicitly define fairness, it aims to introduce a method for mitigating an observed bias in community detection algorithms.

Manolis et al. [59] introduce two fairness metrics for communities: balance fairness and modularity fairness. Their analysis focuses on networks with two disjoint groups of nodes, represented as blue and red, where red nodes constitute the protected group. Balance fairness quantifies the deviation of the fraction of red nodes in a community from their overall fraction in the entire network. Modularity fairness evaluates how well red and blue nodes are connected within a community, using modularity as a measure of group connectedness. These metrics assess whether the protected group is adequately represented and integrated within each community. Through experiments on synthetic networks, the study finds that group size imbalance has the most significant impact on both fairness metrics. However, these methods do not fully capture the underlying definition of communities and the mesoscale connectivity of networks. Communities in social networks emerge based on human behavior and connection patterns, and enforcing proportional group representation in each detected community may not align with real-world community structures. Consequently, the proposed fairness definitions resemble node clustering rather than true community detection and may be more suitable for applications requiring equitable clustering rather than structural community identification.

The proposed group fairness metric (Φ)

To calculate the proposed fairness metrics, we first map the ground truth communities with the identified communities. We then evaluate the community-wise fairness using the three introduced metrics, and finally, community-wise fairness scores are used to determine the overall fairness of a community detection method. The detailed 3-step methodology is outlined below, followed by an analysis of the proposed metric’s behavior.

1. Community mapping

Consider a network G = (V, E) with m ground truth communities, denoted as C = {c1, c2, …, cm}. A community detection method applied to G produces a set of k predicted communities, represented as P = {p1, p2, …, pk}. To evaluate potential bias in a community detection method, it is crucial to measure the quality of each identified ground truth community. This is accomplished by mapping each ground truth community to the most relevant predicted community.

The mapping process applies the following steps iteratively, as long as at least one ground truth community and at least one predicted community remain unmapped:

  1. Compute the Jaccard similarity for each pair of ground truth and predicted communities as follows: J(ci, pj) = |ci ∩ pj| / |ci ∪ pj|.
  2. Select the pair with the highest similarity score and map the corresponding ground truth and predicted communities. If multiple pairs have the same highest score, the tie is broken by randomly selecting a ground truth and predicted community pair for mapping.

If any ground truth community remains unmapped after this process, it is considered completely misclassified and is mapped to an empty set.
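A minimal sketch of this mapping step is given below; it assumes communities are plain Python sets of node ids, and the function and variable names are illustrative rather than the authors' implementation.

```python
import random

def jaccard(a, b):
    """Jaccard similarity between two node sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def map_communities(ground_truth, predicted, seed=0):
    """Greedily map each ground-truth community to its most similar predicted
    community; any ground-truth community left unmapped maps to an empty set."""
    rng = random.Random(seed)
    mapping = {}                               # gt index -> predicted node set
    free_gt = set(range(len(ground_truth)))
    free_pred = set(range(len(predicted)))
    while free_gt and free_pred:
        scored = [(jaccard(ground_truth[i], predicted[j]), i, j)
                  for i in free_gt for j in free_pred]
        best = max(score for score, _, _ in scored)
        # ties are broken by a random choice, as described above
        _, i, j = rng.choice([t for t in scored if t[0] == best])
        mapping[i] = predicted[j]
        free_gt.remove(i)
        free_pred.remove(j)
    for i in free_gt:                          # completely misclassified communities
        mapping[i] = set()
    return mapping

# toy usage: three ground-truth communities, two predicted ones
gt = [{0, 1, 2, 3}, {4, 5, 6}, {7, 8}]
pred = [{0, 1, 2}, {3, 4, 5, 6, 7, 8}]
print(map_communities(gt, pred))
```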

2. Community-wise performance metrics

We introduce the following three metrics to assess how well a ground truth community is captured by its corresponding predicted community, considering both node membership and structural connectivity through edges; a short code sketch of these scores follows the list.

  1. Fraction of Correctly Classified Nodes (FCCN): A straightforward way to evaluate how well a predicted community (pj) represents the ground truth community (ci) is by measuring the fraction of ground truth nodes correctly captured within the predicted community. This metric, referred to as FCCN, is calculated as follows: FCCN(ci, pj) = |ci ∩ pj| / |ci|. (1)
  2. F1 Score: The FCCN primarily considers the overlap of nodes between the ground truth and predicted communities, but does not penalize the presence of extra nodes in the predicted community. To address this limitation, we introduce the F1 score, inspired by the F1 score used in machine learning [60]. It is computed as the harmonic mean of precision and recall on node membership: F1(ci, pj) = 2 · precision · recall / (precision + recall), where precision = |ci ∩ pj| / |pj| and recall = |ci ∩ pj| / |ci|. (2)
  3. Fraction of Correctly Classified Edges (FCCE): Community structure is primarily driven by its edges, making it essential to evaluate a community detection method based on how well it preserves intra-community connections. To capture this aspect, we introduce the FCCE metric, which measures the proportion of ground truth community edges present in the corresponding predicted community. It is defined as: FCCE(ci, pj) = |E(ci) ∩ E(pj)| / |E(ci)|, (3) where E(ci) represents the set of intra-community edges in the ground truth community ci, computed as E(ci) = {(u, v) ∈ E : u ∈ ci and v ∈ ci}.
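The sketch below computes the three community-wise scores directly from their definitions (Eqs. 1-3); it is an illustrative, set-based rendering rather than the authors' code.

```python
def intra_edges(nodes, edges):
    """Edges with both endpoints inside the given community."""
    return {(u, v) for (u, v) in edges if u in nodes and v in nodes}

def fccn(c, p):
    """Fraction of ground-truth nodes captured by the predicted community (Eq. 1)."""
    return len(c & p) / len(c)

def f1(c, p):
    """Harmonic mean of node-membership precision and recall (Eq. 2)."""
    overlap = len(c & p)
    if overlap == 0:
        return 0.0
    recall = overlap / len(c)
    precision = overlap / len(p)
    return 2 * precision * recall / (precision + recall)

def fcce(c, p, edges):
    """Fraction of ground-truth intra-community edges preserved (Eq. 3)."""
    e_c = intra_edges(c, edges)
    if not e_c:
        return 0.0
    return len(e_c & intra_edges(p, edges)) / len(e_c)

# toy usage
edges = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)}
c, p = {0, 1, 2, 3}, {1, 2, 3, 4}
print(fccn(c, p), f1(c, p), fcce(c, p, edges))
```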

3. Group fairness metric (Φ)

Our goal is to investigate whether a given community detection method favors or exhibits a bias toward communities with specific characteristics, concretely, size, density, and conductance. We do so using the previously defined community-wise performance metrics. Our approach starts by analyzing the relationship between these performance scores and the attribute value of each community. For instance, Fig 1 illustrates FCCN versus normalized community size for a sample network, revealing that larger communities tend to be identified more accurately than smaller ones. To ensure fair comparisons across different networks, we normalize community attribute values between 0 and 1 by applying min-max scaling.

Fig 1. FCCN vs. normalized community size for each community on a small sample network.

The best-fit line shows the trend of a community detection method.

https://doi.org/10.1371/journal.pone.0336212.g001

To compute the group fairness metric (Φ), we fit a linear regression line using least squares approximation [61] on the proposed community-wise fairness metrics (represented by F ∈ {FCCN, F1, FCCE}) versus the normalized community property (p ∈ {size, density, conductance}). For example, in Fig 1, the dashed line represents the best-fit linear line for the community-wise fairness metric (FCCN) versus normalized community size. The group fairness metric is derived from the slope of this regression line and ranges in (–1, 1). The regression line provides the changes along the x-axis (Δx) and y-axis (Δy), and the angle θ is computed using the arctangent function as θ = arctan(Δy / Δx).

To get a fairness metric value ranging in (–1, 1), we multiply the angle by 2/π, as the arctangent angle is in radians within (–π/2, π/2). Finally, the fairness of a community detection method, with respect to a community-wise fairness metric (F) and community property (p), is computed as:

Φ_F^p = (2/π) · arctan(Δy / Δx) (4)

The x-axis values are normalized using min-max scaling, meaning that Δx = 1, and therefore it can be excluded from the calculation. Referring to the example in Fig 1, with Δx = 1, the fairness score reduces to Φ_FCCN^size = (2/π) · arctan(Δy), where Δy is the slope of the dashed best-fit line in the figure.

The resulting value represents the fairness score of the method for the community-wise metric (FCCN) with respect to community size.

The proposed metric ranges from –1 to 1, where a value of 0 (corresponding to a horizontal best-fit line) indicates a fair result, meaning that all communities are identified either equally well or equally poorly. Negative values suggest that the community detection method favors communities with lower property values (p), while positive values indicate a bias toward communities with higher property values (p).
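A minimal sketch of Eq. (4) is shown below; it assumes the community-wise scores and property values have already been computed, and the function name and toy numbers are illustrative.

```python
import math
import numpy as np

def group_fairness(property_values, fairness_scores):
    """Phi: slope of the least-squares fit of community-wise scores vs. the
    min-max normalized community property, mapped into (-1, 1) via arctan."""
    p = np.asarray(property_values, dtype=float)
    f = np.asarray(fairness_scores, dtype=float)
    if p.max() > p.min():                        # min-max scaling of the property
        p = (p - p.min()) / (p.max() - p.min())
    slope, _intercept = np.polyfit(p, f, deg=1)  # least-squares best-fit line
    return (2 / math.pi) * math.atan(slope)

# toy usage: larger communities are scored better -> positive Phi (bias toward large)
sizes = [10, 20, 40, 80, 160]
fccn_scores = [0.30, 0.50, 0.70, 0.85, 0.95]
print(group_fairness(sizes, fccn_scores))
```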

Analyzing metric behavior

In this section, we analyze how the proposed fairness measures respond to node misclassification. To do this, we evaluate the different fairness metrics under various levels of node misclassification to understand its impact on community fairness. We construct a HICH-BA network [7] consisting of a single community with 1,024 nodes and ∼90k edges. The misclassification process gradually removes nodes from this community, from 0 up to all 1,024 nodes. Initially, the classification is entirely accurate, with the predicted community perfectly aligning with the ground truth community. However, as misclassification increases, the prediction deviates by progressively removing nodes. Fig 2 shows how the fairness metric scores change as the number of misclassified nodes increases. For FCCE, the figure highlights the range between the highest and lowest values observed across 20 repetitions of misclassification, with the average score marked.

Fig 2. Community-wise fairness metric score as the number of misclassified nodes in the predicted community increases.

The plot shows the average FCCE values over 20 iterations, along with the highest and lowest recorded values at each point.

https://doi.org/10.1371/journal.pone.0336212.g002

To observe the behavior of the proposed metric in networks where misclassified nodes can be reassigned to different communities, we generate a HICH-BA network [7] with a homophily factor of 0.9. The network consists of two communities: a majority group with 70 nodes and a minority group with 40 nodes, connected by approximately 900 edges. Initially, the predicted communities align perfectly with the ground truth communities. To introduce misclassification, nodes are progressively swapped between the minority and majority communities, ranging from 0 to 40. The resulting predicted and ground truth communities are then mapped for evaluation. Fig 3 presents the fairness scores for individual communities (left) and group fairness (right) as a function of the number of swapped nodes. Since the FCCE score depends on which specific nodes are reassigned, the figure reports the average value along with variations observed over 20 iterations. Due to the homophilic nature of the network, FCCE values tend to be lower than those of FCCN and F1.

Fig 3. Behavior analysis of proposed measures.

(a) the behavior of community-wise performance metrics and (b) group fairness on a HICH-BA network having both minority and majority communities.

https://doi.org/10.1371/journal.pone.0336212.g003

As an equal number of nodes is exchanged between the minority and majority communities, community-wise fairness remains lower for the minority group compared to the majority until a critical threshold (∼0.75). Beyond this point, the mapping between majority and minority communities is switched, leading to an increase in fairness for the minority group relative to the majority. A similar trend is observed in the group fairness score Φ, which initially favors the majority but, after the mapping transition, either favors the minority or remains neutral. Additionally, the F1-based score is fair after this threshold, as it accounts for nodes in the predicted communities that do not appear in the ground truth. It is important to note that this example represents an extreme scenario that may not commonly occur in real-world settings.

The proposed fairness metric (Φ) provides insights into biases, allowing community detection methods to be assessed not only based on overall performance but also in terms of fairness across communities with varying sizes, densities, and conductance.

Computational complexity of the proposed fairness metrics

Here, we discuss the computational complexity of the proposed fairness metrics. Let n = |V| (number of nodes), me = |E| (number of edges), m = |C| (number of ground-truth communities), and k = |P| (number of predicted communities). We analyze the complexity of each step as follows.

1) Community mapping: To compute the overlaps |ci ∩ pj|, we can build a contingency table by scanning all nodes once. This has a time complexity of O(n) and a space complexity of O(mk). We can compute the sizes |ci| and |pj| in the same node pass. The Jaccard similarity of a single pair can then be computed in O(1), and therefore the complexity of computing the Jaccard similarity for all (i, j) pairs is O(mk). Finally, we perform a greedy one-to-one mapping of ground-truth and predicted communities, where in each round we look for the pair with the highest similarity and remove it, which takes O(mk · min(m, k)) time. The greedy one-to-one mapping can also be done using a heap, which may lead to a time complexity of O(mk log(mk)).

2) Community-wise metrics (FCCN, F1, FCCE): FCCN and F1 require only the overlap counts |ci ∩ pj| and the community sizes, which were computed earlier, and thus take O(m) time over all mapped pairs. FCCE requires edge-level information. We compute the ground-truth internal edges E(ci) using one pass over the edges, which has a time complexity of O(me) and a space complexity of O(m). In the same pass, we also count the internal edges shared by both labelings (i, j), with a time complexity of O(me) and a space complexity of O(mk).

3) Community properties (size, density, conductance): The sizes of all communities can be obtained in O(n), or reused from the earlier node pass. Computing density and conductance takes O(me), and both can be tracked during the edge pass already performed for FCCE.

4) Group fairness score Φ: Normalizing the properties using min-max scaling takes O(m), and fitting the least-squares line also takes O(m). Finally, Φ can be computed from the slope of the line in O(1).

Overall complexity: The overall time complexity of the different combinations follows from the steps above and is summarized in Table 1. The space complexity in all cases is dominated by the contingency table and the per-pair edge counters, i.e., O(mk).

Table 1. Time complexity of computing Φ across different combinations of performance metrics and community properties.

https://doi.org/10.1371/journal.pone.0336212.t001

FCCN and F1 do not require edge-level information unless the property involves density or conductance. FCCE always requires an edge pass; the same pass can also compute density and conductance, so all FCCE vs. property combinations share the same complexity. The complexity for computing the fairness Φ is O(m) and does not affect the overall asymptotics.
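To illustrate the single node pass and single edge pass referred to above, the sketch below builds the overlap (contingency) counts and the internal-edge counts; it is a simplified illustration with made-up variable names, not the authors' implementation.

```python
from collections import Counter

def overlap_counts(gt_label, pred_label):
    """One pass over nodes: n_ij = |c_i ∩ p_j| plus community sizes, in O(n)."""
    table, gt_size, pred_size = Counter(), Counter(), Counter()
    for node, ci in gt_label.items():
        pj = pred_label[node]
        table[(ci, pj)] += 1
        gt_size[ci] += 1
        pred_size[pj] += 1
    return table, gt_size, pred_size

def internal_edge_counts(edges, gt_label, pred_label):
    """One pass over edges: ground-truth internal edges per community and the
    internal edges shared with the predicted labeling, in O(m_e)."""
    gt_internal, shared_internal = Counter(), Counter()
    for u, v in edges:
        ci = gt_label[u]
        if ci == gt_label[v]:
            gt_internal[ci] += 1
            if pred_label[u] == pred_label[v]:
                shared_internal[(ci, pred_label[u])] += 1
    return gt_internal, shared_internal

# toy usage
gt = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b"}
pr = {0: "x", 1: "x", 2: "y", 3: "y", 4: "y"}
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(overlap_counts(gt, pr))
print(internal_edge_counts(edges, gt, pr))
```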

Experimental setup

This section outlines the experimental setup, detailing the community detection methods used in the analysis, the datasets (both synthetic and real-world), and the metrics for evaluating the quality of the identified communities, focusing on the performance-fairness trade-off.

Community detection methods

We analyze 24 community detection methods based on their performance and fairness using our proposed group fairness metric, Φ. Table 2 classifies these methods into six classes according to their approach to community partitioning. These approaches are briefly explained below.

  1. Optimization methods optimize a quality function to assess partition or community quality. Most of the methods in this category use the modularity function [30,31], except Significance [22]. Given that modularity optimization is NP-hard, these methods typically use heuristic techniques.
  2. Dynamics methods infer community structure through network traversal, often using random walks. The intuition is that random walks tend to remain within communities, as they are more densely connected than the rest of the graph. For instance, Walktrap [23] proposes a similarity measure based on random walks, which effectively captures dense subgraphs in sparse graphs.
  3. Spectral methods create partitions based on spectral properties of matrices describing the network, such as the adjacency matrix or the Laplacian matrix [2]. These methods analyze the eigenvalues and eigenvectors of these matrices, which provide insights into the network structure. For example, Spectral Clustering [62] makes use of the Fiedler vector, the eigenvector associated with the second smallest eigenvalue of the Laplacian, to construct communities.
  4. Propagation methods initially assign a community label to all nodes and then iteratively update each node's community label based on its neighbors, aiming for a stable configuration that reflects the community structure. The Label Propagation Algorithm (LPA) [63] was the first such method, updating a node's label according to the majority of its neighbors, with ties resolved randomly. The speed and scalability of LPA make it well-suited for large networks, and it serves as the foundation for various extensions, including FLPA [64], LLPA [65], LPA-MNI [66], WSSLPA [67], and DCC [68].
  5. Representation Learning-based methods first generate a network embedding, i.e., a latent representation of the network in a low-dimensional space, and then apply clustering algorithms, such as k-means, to create a partition of the network [69].
  6. Probabilistic methods approach community detection by modeling the network as being generated by an underlying probabilistic process. These models estimate the likelihood that each node belongs to a certain community based on the observed edges in the network. The methods falling under this category include Expectation-Maximization [70], the Stochastic Block Model (SBM) [24], and SBM-Nested [25].
Table 2. Overview of the community detection methods used in the experiments, grouped into six classes.

https://doi.org/10.1371/journal.pone.0336212.t002

Network datasets

To perform our experiments, we use synthetic benchmark network-generating models and real-world networks, which are discussed below.

Synthetic networks.

We use the following benchmark models to generate synthetic networks:

  1. LFR Benchmark Model: The Lancichinetti–Fortunato–Radicchi (LFR) benchmark [13] is a widely used method for generating synthetic networks with an inherent community structure. It improves upon the Girvan-Newman benchmark [79] by incorporating power-law distributions for both degree and community sizes, making the generated networks more representative of real-world structures [80,81]. A key feature of the LFR benchmark is the mixing parameter μ, which determines the fraction of inter-community edges. When μ = 0, all edges are confined within communities, whereas μ = 1 results in no intra-community edges. Other configurable parameters in the LFR model include the number of nodes, average and maximum degree, minimum and maximum community size, and the power-law exponent of the community size distribution (τ), allowing for flexible network generation.
    For our experiments, we primarily set the parameter values used by Lancichinetti et al. [82]. Specifically, we generate networks of 10,000 nodes with the other parameters set as follows: the average degree degavg = 20, the maximum degree degmax = 100, the minimum community size |cmin| = 20, and the power-law exponents of the degree and community size distributions as in [82]. The mixing parameter is varied as μ ∈ {0.2, 0.4, 0.6} to generate different types of networks.
  2. ABCD Model: The Artificial Benchmark for Community Detection (ABCD) model [14] is similar to the LFR benchmark but offers improvements in scalability and interpretability of the mixing parameter. Like LFR, it generates networks where both the degree and community size distributions follow power laws. However, ABCD runs approximately 100 times faster. ABCD introduces ξ as its mixing parameter, providing a more intuitive measure of community strength. When ξ = 0, all edges remain within communities, similar to LFR. However, when ξ = 1, edges are distributed randomly throughout the network, in contrast to LFR, where μ = 1 results in zero intra-community edges. Crucially, ABCD maintains intra-community edges proportional to community size.
    To ensure comparability, we configure the ABCD model parameters to closely align with the graphs generated by the LFR benchmark. Therefore, the corresponding ξ values are determined using the global method from [14]. While ξ plays a role similar to μ in LFR, the two parameters differ slightly in their exact relationship, as described in [14].
  3. HICH-BA Model: HIgh Clustering Homophily Barabási- Albert (HICH-BA) [7] model is a recently proposed extension to the homophily BA model [83]. It was specifically designed to generate homophilic networks with controlled clustering coefficient, density, and community sizes. The HICH-BA model takes the following parameters:
    • n: Total number of nodes in the network.
    • r: This is a list containing the likelihood of assigning a node to each community. The i-th element in r shows the likelihood of adding a node to the ith community ci.
    • h: This is the homophily factor.
    • pN: At each iteration, a node is added to the network with probability pN; otherwise, an edge is added.
    • pT: Probability of forming a closed triad (triangle) connection.
    • pPA: Probability for a new edge to be placed using preferential attachment.

    HICH-BA iteratively adds nodes and edges to the network based on the probabilities provided by the user. HICH-BA allows for a custom community size distribution by setting r. We set r to create two network cases: (i) multiple minority communities with one majority community (MMin), where the parameter r was set as [0.005, 0.005, 0.005, 0.01, 0.01, 0.01, 0.02, 0.02, 0.02, 0.9], and (ii) multiple majority communities (MMaj), where r is set as [0.003, 0.003, 0.003, 0.03, 0.03, 0.03, 0.3, 0.3, 0.3]. The other parameters are the same for both cases: n = 10,000, h = 0.9, pN = 0.1, pT = 0.3, and pPA = 0.8.

Table 3 summarizes the average parameters of the generated networks using different benchmark models. In our experiments, we generate ten networks for each configuration using the chosen generative model. All results are shown by computing the average and standard deviation of the scores obtained across these networks.

Table 3. Synthetic dataset summary: The values are the average of all networks of a given type.

https://doi.org/10.1371/journal.pone.0336212.t003

Structural properties of networks. The networks generated using different benchmark models have distinct structural characteristics, reflecting various aspects of real-world networks. Therefore, it is crucial to analyze the structural properties of these networks, which are later used to assess fairness, including community size, density, and conductance, to gain a deeper understanding of their internal and external connectivity. To achieve this, we generate networks using different models and study the correlation between community size, density, and conductance using the Pearson correlation coefficient. Fig 4 presents these correlations for LFR, ABCD, and HICH-BA models, providing insights into the interplay of community connectivity across different network types.

Fig 4. Correlation between community properties (size, density, and conductance) in LFR, ABCD, and HICH-BA networks.

https://doi.org/10.1371/journal.pone.0336212.g004

For LFR networks (shown in Fig 4(a)), there is a strong negative correlation between size and conductance at μ = 0.2, indicating that larger communities tend to have lower conductance. As μ increases, this correlation weakens, while a new relationship emerges between density and conductance, showing that dense communities have lower conductance. In contrast, ABCD networks (Fig 4(b)) consistently show a negative correlation between density and size across all ξ values. Unlike LFR networks, large communities in ABCD networks have high conductance at low ξ values, suggesting they are well separated. Additionally, the correlation between density and conductance differs across the LFR and ABCD models, with dense communities in ABCD being more distinctly separated for high ξ values. For HICH-BA networks (Fig 4(c)), structural properties depend heavily on the network’s composition, particularly whether multiple majority or minority groups exist. In both MMaj and MMin networks, density and size are negatively correlated. However, in MMin networks, conductance also has a strongly negative correlation with size and a positive correlation with density.
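The per-community properties and the correlations discussed above can be computed with standard library calls, as in the sketch below; the generated graph and its parameters are purely illustrative and not part of the paper's setup.

```python
import networkx as nx
from scipy.stats import pearsonr

def community_properties(G, communities):
    """Size, internal density, and conductance for each community."""
    props = []
    for nodes in communities:
        sub = G.subgraph(nodes)
        props.append({
            "size": len(nodes),
            "density": nx.density(sub),          # internal / possible internal edges
            "conductance": nx.conductance(G, nodes),
        })
    return props

# toy usage on a planted-partition graph with communities of different sizes
G = nx.random_partition_graph([10, 20, 40, 80], 0.5, 0.02, seed=1)
communities = [set(block) for block in G.graph["partition"]]
props = community_properties(G, communities)
sizes = [p["size"] for p in props]
conductances = [p["conductance"] for p in props]
print(pearsonr(sizes, conductances))
```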

Real-world networks.

We use the following real-world networks; summarized in Table 4.

  1. Polbooks [84]: This network represents the co-purchasing of books on US politics, with data collected around the 2004 presidential election. The books are grouped into three communities based on political affiliation: conservative, liberal, and non-partisan.
  2. Football [79]: The US college (American) football network represents the regular-season games of the 2000 NCAA Division I-A football matches. Nodes correspond to football teams, while edges indicate matches played between them. The communities in the network are defined based on the 12 conferences to which the teams belong.
  3. Eu-core [85,86]: This communication network is constructed from emails exchanged between employees of a European research center, where communities correspond to the departments employees belong to. The network is converted into an undirected network, and the largest connected component comprising 98% of the nodes is considered.

All networks used in our experiments are undirected, unweighted, and connected with non-overlapping communities.

Evaluation metrics

We use the following metrics to measure the quality of the identified communities as compared to the ground truth; a short usage sketch with off-the-shelf implementations follows the list. Let us assume that G = (V, E) is the given network, C is the set of ground truth communities, defined as C = {c1, c2, …, cm}, and P = {p1, p2, …, pk} is the set of predicted communities.

  1. Normalized Mutual Information (NMI): NMI [15] is, as the name suggests, a normalized variant of mutual information (MI). The MI is computed as MI(C, P) = H(C) + H(P) − H(C, P), where H(C) represents the Shannon entropy of the clustering size distribution of C, and H(C, P) denotes the Shannon entropy of the joint clustering size distribution. NMI is then computed as NMI(C, P) = 2 · MI(C, P) / (H(C) + H(P)), where H(C) and H(P) are the entropies of the ground truth and predicted partitions, respectively.
  2. Reduced Mutual Information (RMI): The RMI [16] was proposed to address a flaw of MI, namely that the measure should return a value of 0 when the predicted partition consists of n communities (each having one node), indicating no meaningful structure. However, instead of 0, MI returns H(C) [16]. RMI addresses this by introducing a correction term and is defined as RMI(C, P) = MI(C, P) − (1/n) · log Ω(a, b), where Ω(a, b) represents the number of contingency tables with row and column sums equal to a = {|ci|} and b = {|pj|}, respectively. We use the RMI variant proposed in [87], which provides an improved approach for encoding contingency tables, and the computed values are bounded above by 1.
  3. Adjusted Rand Index (ARI): The ARI [17] is a chance-adjusted version of the Rand Index (RI). RI is given by RI = (TP + TN) / (TP + TN + FP + FN).
    Here, TP is the number of true positives, TN is the number of true negatives, FP is the number of false positives, and FN is the number of false negatives. These terms are defined as follows:
    • TP: The number of node pairs that belong to the same community in both C and P.
    • TN: The number of node pairs that belong to a different community in both C and P.
    • FP: The number of node pairs that are in a different community in C and in the same community in P.
    • FN: The number of node pairs that are in the same community in C and in a different community in P.

    Hubert et al. [17] formulated a way to adjust for chance in any measure M as M_adj = (M − E(M)) / (max(M) − E(M)), where E(M) is the expected value of M under some null model. Hubert formulated that if partitions are generated randomly, the expected number of node pairs in a community intersection, with nij = |ci ∩ pj| and C(x, 2) = x(x − 1)/2, is given by E[C(nij, 2)] = C(|ci|, 2) · C(|pj|, 2) / C(n, 2). Using these definitions, the ARI is computed as:
    ARI = ( Σij C(nij, 2) − [Σi C(|ci|, 2) · Σj C(|pj|, 2)] / C(n, 2) ) / ( (1/2)[Σi C(|ci|, 2) + Σj C(|pj|, 2)] − [Σi C(|ci|, 2) · Σj C(|pj|, 2)] / C(n, 2) )
    ARI has an upper bound of 1, which occurs when the predicted partition perfectly matches the ground truth partition. Its lower bound is -1, with negative values indicating that the similarity between the two partitions is lower than what would be expected from randomly assigned partitions.
  4. Average F1 Score (PF1): PF1, introduced by Rossetti et al. [19], evaluates the quality of predicted communities by mapping them to ground truth communities based on the highest label overlap. Multiple predicted communities can map to the same ground truth community. Once mapped, the similarity is assessed using the F1-score, i.e., the harmonic mean of recall and precision. The averaged F1-score provides a basis for comparing partitions. To differentiate it from our proposed measure, we referred to it as PF1.
  5. Normalized F1 Score (NF1): Rossetti et al. [18] refined PF1 by introducing NF1, addressing cases where some ground truth communities remain unmapped. If a ground truth community’s label is not the most frequent in any predicted community, it is excluded from mapping. The set of mapped ground truth communities is denoted as Cid. Two key measures are introduced: coverage, the fraction of ground truth communities that are mapped (Coverage = |Cid| / |C|), and redundancy, the ratio of predicted communities to mapped ground truth communities (Redundancy = |P| / |Cid|). NF1 normalizes PF1 by incorporating both measures and is computed as NF1 = (PF1 · Coverage) / Redundancy.
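For NMI and ARI, off-the-shelf implementations operate on node-wise label vectors rather than community sets; the toy sketch below uses scikit-learn. RMI, PF1, and NF1 are not part of scikit-learn and require dedicated implementations (for instance, those shipped with CDlib or the original authors' code; this is an assumption, not a statement from the paper).

```python
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

# Node-wise community labels for the ground truth (C) and prediction (P);
# position i holds the community label of node i (toy example with 8 nodes).
true_labels = [0, 0, 0, 1, 1, 1, 2, 2]
pred_labels = [0, 0, 1, 1, 1, 1, 2, 2]

print("NMI:", normalized_mutual_info_score(true_labels, pred_labels))
print("ARI:", adjusted_rand_score(true_labels, pred_labels))
```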

Experimental details

LFR networks are generated using NetworkX’s Python library [88]. The code for ABCD [14] and HICH-BA [7] is available on GitHub, with links provided in their respective papers.
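As an illustration, an LFR network close to the described configuration can be generated with NetworkX as sketched below; the power-law exponents tau1 and tau2 are placeholders chosen for the example (the paper follows [82]), and generation may need a different seed or parameters if it fails to converge.

```python
import networkx as nx

# Parameters follow the experimental setup where stated; tau1/tau2 are
# illustrative placeholders, and generation can raise ExceededMaxIterations
# for unlucky seeds or incompatible parameter choices.
G = nx.LFR_benchmark_graph(
    n=10_000, tau1=2.0, tau2=1.5, mu=0.2,
    average_degree=20, max_degree=100, min_community=20, seed=42,
)

# Each node stores its ground-truth community as a set of member nodes.
communities = {frozenset(G.nodes[v]["community"]) for v in G}
print(G.number_of_nodes(), G.number_of_edges(), len(communities))
```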

Most community detection methods are implemented using the CDlib Python library [89], which we use with default parameters unless specific settings are required. Community detection methods that require the number of communities as input include RSC-K, RSC-SSE, RSC-V, Spectral Clustering, DeepWalk, FairWalk, Node2Vec, Fluid, and EM. Since the ground truth number of communities is typically unknown, these methods have an advantage over those that must infer it. For reproducibility, the parameter values used in community detection methods are detailed in the SI (refer to Appendix 1). Our code and data are available at: https://github.com/akratiiet/Group-Fairness-Metrics-for-Community-Detection
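A minimal sketch of how a few of the methods can be run through CDlib and scored against a known partition is given below; the toy graph, the choice of methods, and the default parameters are illustrative and do not reproduce the paper's exact configuration (see Appendix 1 for that).

```python
import networkx as nx
from cdlib import NodeClustering, algorithms, evaluation

# Toy graph with a planted partition (sizes and probabilities are illustrative).
G = nx.random_partition_graph([30, 30, 40], 0.3, 0.02, seed=7)
ground_truth = NodeClustering(
    [list(block) for block in G.graph["partition"]], G, "ground_truth"
)

# Two methods from different classes, run with default parameters.
results = {
    "Louvain (optimization)": algorithms.louvain(G),
    "Label Propagation (propagation)": algorithms.label_propagation(G),
}

for name, predicted in results.items():
    nmi = evaluation.normalized_mutual_information(ground_truth, predicted)
    print(name, round(nmi.score, 3))
```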

Results

In this section, we discuss the fairness of community detection methods with respect to community size, density, and conductance versus the quality of identified communities.

Analysis on LFR networks

Fairness-performance trade-off versus community size.

We begin by evaluating how effectively different community detection methods identify communities of varying sizes. Fig 5 presents the relationship between NMI and the three group fairness metrics (Φ_FCCN^size, Φ_F1^size, and Φ_FCCE^size) for LFR networks with mixing parameters μ = 0.2, 0.4, and 0.6. To conduct these experiments, we generate 10 LFR networks for each configuration and apply the community detection methods. Community detection methods that produce different results across executions are run ten times, and we report the overall average and standard deviation.

Fig 5. NMI vs. Φ^size on LFR networks.

NMI vs. fairness of community detection methods with respect to community size for LFR networks of 10,000 nodes having different μ values.

https://doi.org/10.1371/journal.pone.0336212.g005

For μ = 0.2, most community detection methods, except for EM, Paris, SBM-Nested, and RSC-SSE, tend to favor larger communities across all fairness metrics F*. Methods that achieve both high fairness and good community quality include RSC-K, RSC-V, Infomap, Walktrap, and Significance, all of which exhibit near-optimal NMI (∼1). In contrast, SBM-Nested effectively detects smaller communities (negative Φ^size) due to its hierarchical nature, which allows it to accurately identify smaller communities at lower levels. As the mixing parameter μ increases to 0.4, SBM-Nested continues to favor smaller communities. However, when communities become highly interconnected (μ = 0.6), no method perfectly identifies small groups. With high interconnectivity, all methods show a stronger bias toward larger communities, and the fair methods tend to have very low NMI, indicating that they struggle equally across all community sizes. Notably, across all types of networks, there is no observed correlation between fairness and the performance of community detection methods.

Fairness-performance trade-off versus community density.

We further examine the performance of community detection methods in identifying communities with varying densities. Fig 6 shows that when the mixing parameter is low, community detection methods tend to detect sparse communities more effectively than denser ones. However, as μ increases and inter-community edges become more prevalent, this pattern shifts. At μ = 0.4 and 0.6, methods including Leiden, Louvain, RB-C, RB-ER, and Spinglass have high NMI but negative Φ^density. These methods predict fewer communities than the ground truth, and the predicted communities are often mapped to low-density ground truth communities. At μ = 0.6, methods with high NMI also perform better at detecting denser communities. For instance, the Significance method predicts significantly more communities than the ground truth, effectively identifying dense structures while fragmenting sparser ones in its prediction.

Fig 6. NMI vs. Φ^density on LFR networks.

NMI vs. fairness of community detection methods with respect to community density for LFR networks of 10,000 nodes having different μ values.

https://doi.org/10.1371/journal.pone.0336212.g006

Fairness-performance trade-off versus community conductance.

Fig 7 illustrates NMI versus fairness concerning community conductance. Most community detection methods tend to favor communities with lower conductance, which are more separated from other communities. This bias intensifies as μ increases, leading to lower scores across all fairness metrics. At a medium mixing level (μ = 0.4), methods such as RSC-V, SBM-Nested, Infomap, and Significance achieve both high fairness and good-quality communities. However, at μ = 0.6, a linear relationship emerges between Φ^conductance and NMI, where methods that prioritize fairness tend to have poor predictive performance, while those with higher accuracy show a stronger bias.

Fig 7. NMI vs. Φ^conductance on LFR networks.

NMI vs. fairness of community detection methods with respect to community conductance for LFR networks of 10,000 nodes having different μ values.

https://doi.org/10.1371/journal.pone.0336212.g007

Analysis on ABCD networks

Fairness-performance trade-off versus community size.

Fig 8 shows the trade-off between Normalized Mutual Information (NMI) and fairness for various community detection methods applied to ABCD networks of 10,000 nodes under different values of ξ (0.2, 0.4, and 0.6). The community detection methods perform better on ABCD networks than on comparable LFR networks with high mixing values, as ABCD networks better preserve community structure even under higher interconnectivity. However, as ξ increases (from top to bottom in the figure), the distribution of fairness values becomes more dispersed, suggesting that different methods handle fairness differently at varying levels of structural mixing. As expected, the NMI values of most community detection methods tend to decline as ξ increases.

Fig 8. NMI vs. Φ^size on ABCD networks.

NMI vs. fairness of community detection methods with respect to community size for ABCD networks of 10,000 nodes having different ξ values.

https://doi.org/10.1371/journal.pone.0336212.g008

For ξ = 0.2, community detection methods including Significance, RSC-SSE, Spectral, Infomap, Walktrap, and Label Propagation are fair. However, as ξ increases, Spectral tends to be less fair, and RSC-V emerges as fair. For ξ = 0.4 and 0.6, the communities identified using the SBM-Nested method show high values for the performance metrics; however, its fairness is lower, as the method favors large, high-conductance, and low-density communities. Representation Learning-based methods (DeepWalk, Node2Vec, FairWalk) show fairness with respect to conductance and fairly identify communities across all conductance values. Although FairWalk is a fairness-aware representation learning-based method, it still does not fairly identify communities of all types, and its performance with respect to different validation metrics (refer to S6, S7, and S8 Tables in Appendix 3, SI) is lower compared to many other methods.

Fairness-performance trade-off versus community density.

Fig 9 shows the fairness of community detection methods with respect to density on ABCD networks. The results are quite similar to what we observed for community size, as discussed earlier. However, there are a few key distinctions to note. The Paris method fairly identifies communities of different densities, as reflected in its Φ^density scores. However, as ξ increases to 0.6, Paris becomes fair with respect to Φ_FCCN^density and Φ_FCCE^density. This occurs because Paris effectively detects large, sparse communities but also assigns a significant number of nodes from other communities to these larger groups, which is penalized by the F1-based score. Similarly, the Spectral method consistently detects a comparable number of communities across all ABCD networks. However, its fairness decreases in terms of Φ_FCCN^density and Φ_FCCE^density, while increasing for Φ_F1^density. This is because it tends to identify large, sparse communities while merging other community nodes into them, which is reflected in the F1-based measure.

Fig 9. NMI vs. Φ^density on ABCD networks.

NMI vs. fairness of community detection methods with respect to community density for ABCD networks of 10,000 nodes having different ξ values.

https://doi.org/10.1371/journal.pone.0336212.g009

Overall, most community detection methods favor lower-density communities rather than higher-density ones. Methods such as Combo, Leiden, Louvain, RB-C, RB-ER, and SBM consistently favor lower-density communities across all values of ξ. Unlike the findings for LFR networks, fairness does not necessarily improve as the mixing parameter increases, due to the structural connectivity of ABCD networks.

Fairness-performance trade-off versus community conductance.

Fig 10 presents the fairness of community detection methods with respect to conductance on ABCD networks. As observed previously, the methods that achieve both high fairness and good community quality in this case as well include Significance, RSC-SSE, Infomap, and Walktrap. Another group of methods that demonstrates slightly lower fairness and community quality, but still performs relatively well, includes FairWalk, Node2Vec, Spectral, RSC-V, SBM-Nested, Fluid, and EigenVector.

Fig 10. NMI vs. fairness on ABCD networks.

NMI vs. fairness of community detection methods with respect to community conductance for ABCD networks of 10,000 nodes with different ξ values.

https://doi.org/10.1371/journal.pone.0336212.g010

An important observation is that as ξ increases, methods including Combo, Leiden, Louvain, RB-C, and SBM deviate from the other community detection methods in terms of fairness. At ξ = 0.2, these methods tend to favor high-conductance communities. At ξ = 0.4, their fairness improves, but by ξ = 0.6, they begin favoring low-conductance communities. This shift occurs because, in ABCD networks, the correlation between conductance and community size varies significantly with ξ (see Fig 4). As ξ increases, these methods detect fewer communities, which are often mapped to larger, low-conductance ground-truth communities. Additionally, we observe that community detection methods requiring the number of communities as input have lower fairness: while they effectively identify high-conductance communities, they do not correctly group nodes from lower-conductance communities, leading to discrepancies from the ground truth.

Analysis on HICH-BA networks

We construct two homophilic networks: MMaj, which consists of multiple majority groups (large-sized communities), and MMin, which consists of multiple minority groups (small-sized communities). In both networks, smaller communities have higher density. However, the correlation of size and density with conductance differs significantly between them. Next, we analyze the fairness and performance of various community detection methods on these networks.

MMaj and MMin represent two extreme types of network structures. A key observation is that the RB-ER and Significance methods predict an excessive number of communities (over 3,000) in both networks. This suggests that when small-sized communities are present, these methods also tend to fragment larger communities into numerous smaller ones. Most community detection methods are not inherently designed to handle such extreme cases. Nevertheless, we briefly discuss their results and the fairness-performance trade-off of different methods. Figs 11, 12, and 13 show NMI versus fairness of different community detection methods with respect to size, density, and conductance, respectively.

Fig 11. NMI vs. fairness on HICH-BA networks.

NMI vs. fairness of community detection methods with respect to community size for HICH-BA networks of 10,000 nodes: (i) the MMaj network with multiple majority communities and (ii) the MMin network with multiple minority communities.

https://doi.org/10.1371/journal.pone.0336212.g011

Fig 12. NMI vs. fairness on HICH-BA networks.

NMI vs. fairness of community detection methods with respect to community density for HICH-BA networks of 10,000 nodes: (i) the MMaj network with multiple majority communities and (ii) the MMin network with multiple minority communities.

https://doi.org/10.1371/journal.pone.0336212.g012

Fig 13. NMI vs. fairness on HICH-BA networks.

NMI vs. fairness of community detection methods with respect to community conductance for HICH-BA networks of 10,000 nodes: (i) the MMaj network with multiple majority communities and (ii) the MMin network with multiple minority communities.

https://doi.org/10.1371/journal.pone.0336212.g013

Community detection methods that achieve relatively high NMI scores include Combo, Leiden, Louvain, RB-C, Spinglass, SBM, and SBM-Nested. These methods also predict a reasonable number of communities, maintaining good performance. However, on MMin networks, most community detection methods do not achieve high NMI values. This is largely driven by whether a method can detect the largest ground-truth community, which most methods fail to identify properly. Methods that perform well in terms of NMI on MMin networks include Paris, RSC-V, Label Propagation, SBM, and SBM-Nested. Notably, SBM and SBM-Nested are fair across all types of communities.

A particularly interesting observation is that Walktrap and Infomap are fair and maintain high performance across the different LFR and ABCD networks. They remain fair on HICH-BA networks; however, the quality of the detected communities is poor, because both methods overestimate the number of communities, leading to lower-quality results across varying community properties.

Real-world networks

We perform experiments on three real-world networks: Polbooks, Football, and Eu-core. Fig 14 presents NMI vs. fairness for these networks. All methods, except EM, tend to favor larger groups, with consistent results across the various fairness metrics and community properties. It is important to note that no single method is universally fair across all datasets, though Significance consistently performs well. Detailed results for the Polbooks, Football, and Eu-core networks are provided in SI Appendix 3, in S11, S12, and S13 Tables, respectively.

Fig 14. NMI vs. fairness of community detection methods on real-world networks.

https://doi.org/10.1371/journal.pone.0336212.g014

Fairness versus other evaluation metrics

We also assess fairness in relation to four additional metrics that evaluate the quality of the detected communities, to ensure the robustness of our findings: (i) Adjusted Rand Index (ARI), (ii) Reduced Mutual Information (RMI), (iii) F1 Score (PF1), and (iv) Normalized F1 Score (NF1) (refer to the Evaluation Metrics Section for details). Fig 15 presents these results for the LFR network at a fixed mixing parameter μ, analyzed with respect to size, density, and conductance. This analysis aims to identify community detection methods that achieve both high fairness and high performance across multiple evaluation metrics, making them suitable for broader applications. Additionally, understanding the working dynamics of these methods provides insights for designing new community detection algorithms with a strong fairness-performance trade-off. From this evaluation, the standout community detection methods demonstrating both fairness and high performance are RSC-V, RSC-K, Walktrap, Infomap, and Significance.
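To make the partition-level comparison concrete, the sketch below shows how agreement scores such as NMI and ARI are computed from per-node community assignments using scikit-learn; true_labels and pred_labels are hypothetical arrays of ground-truth and detected community ids in the same node order. RMI, PF1, and NF1 are not available in scikit-learn and require the implementations cited in the Evaluation Metrics Section.

```python
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

# Hypothetical community ids per node (same node ordering in both lists)
true_labels = [0, 0, 0, 1, 1, 1, 2, 2]   # ground-truth communities
pred_labels = [0, 0, 1, 1, 1, 1, 2, 2]   # communities returned by a detection method

nmi = normalized_mutual_info_score(true_labels, pred_labels)
ari = adjusted_rand_score(true_labels, pred_labels)
print(f"NMI = {nmi:.3f}, ARI = {ari:.3f}")
```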

Fig 15. RMI, ARI, PF1, and NF1 vs. fairness on LFR networks at a fixed mixing parameter μ.

https://doi.org/10.1371/journal.pone.0336212.g015

Discussion

Communities within networks vary in size, density, and connectivity. However, most community detection algorithms do not account for structural inequalities among communities and in the network structure, leading to biased detection outcomes and affecting downstream social network analysis tasks [4]. Certain types of communities, particularly those that are small or have high conductance, are often not properly identified by existing methods. In the literature, several metrics exist to quantify the quality of identified communities [12]; however, metrics to compute the fairness of a community detection method remain underexplored.

In this work, we introduce group-fairness metrics Φ that quantify bias concerning a given community property p. These metrics are grounded in the fairness principle that all types of communities should be detected equally well. Community-wise fairness is first computed using the three introduced measures, which are then aggregated into an overall fairness score. Next, we evaluate 24 community detection methods on both real-world and synthetic networks. Synthetic networks are generated using the LFR, ABCD, and HICH-BA benchmark models. For LFR and ABCD networks, we analyze different levels of community mixing, while HICH-BA networks are structured based on the number of majority communities and the overall community composition. We analyze community detection methods based on their fairness, as measured by Φ, and their performance using Normalized Mutual Information (NMI), Reduced Mutual Information (RMI), Adjusted Rand Index (ARI), Average F1 Score (PF1), and Normalized F1 Score (NF1).

Our analysis of the fairness-performance trade-off in community detection reveals key biases across different community properties and levels of network mixing. Considering fairness with respect to community properties: in LFR networks, most methods favor larger communities, except for SBM-Nested, which effectively detects smaller groups due to its hierarchical structure. As the mixing parameter (μ) increases, methods struggle equally across all sizes, leading to lower NMI. Regarding community density, methods with high NMI, such as Leiden, Louvain, and Spinglass, tend to favor sparse communities at low μ but shift toward denser ones as μ increases. The Significance method excels at identifying dense communities but overestimates their number. For community conductance, most methods prefer low-conductance communities, and this bias worsens as μ increases. At lower μ, methods such as RSC-V, SBM-Nested, Infomap, and Significance achieve both fairness and high performance. At higher μ, however, fairness and accuracy trade off almost linearly: fair methods perform poorly, and high-performing methods show stronger bias.

Community detection methods perform better on ABCD networks than on LFR networks, even at high ξ values, but fairness varies with structural mixing. At ξ = 0.2, Significance, RSC-SSE, Spectral, and Infomap show high fairness, but Spectral becomes less fair as ξ increases, while RSC-V improves. SBM-Nested excels in quality but favors large, low-density, high-conductance communities, reducing its fairness. Representation learning-based methods (DeepWalk, Node2Vec, FairWalk) show fairness for conductance but not for other properties, as their likelihood of correctly identifying communities is similar across conductance levels. Regarding density, Paris and Spectral exhibit shifting fairness behavior due to their tendency to merge nodes into larger communities. Most community detection methods favor low-density communities, and unlike in LFR networks, fairness does not necessarily improve with increased mixing. For conductance, Significance, RSC-SSE, Infomap, and Walktrap maintain high fairness and performance, while methods like Combo, Leiden, and SBM shift their conductance bias as ξ increases. Additionally, methods requiring a predefined number of communities tend to have lower fairness, as they misclassify nodes from low-conductance communities.

In contrast to LFR and ABCD networks, fairness and performance trends in HICH-BA networks differ significantly. While smaller communities in both networks have higher density, their relationship with conductance differs. Across both HICH-BA networks, there is a general bias toward low-conductance communities, consistent with LFR and ABCD results. However, biases concerning community size and density vary. In both networks, methods like RB-ER and Significance predict an excessive number of communities, fragmenting larger ones. In the multi-majority (MMaj) structure, methods such as Combo, Leiden, Louvain, RB-C, Spinglass, SBM, and SBM-Nested perform well, but struggle in MMin, where only Paris, RSC-V, Label Propagation, SBM, and SBM-Nested achieve good results. In the MMin network, overall performance is lower, with only seven methods achieving NMI scores above 0.4. Notably, SBM and SBM-Nested are fair across all community types. While Walktrap and Infomap maintain fairness across different network types, they overestimate the number of communities in HICH-BA networks, leading to lower-quality results.

Significance, RSC-K, Infomap, Walktrap, and SBM (including SBM-Nested) are among the top-performing community detection methods across various networks. These methods perform well on LFR and ABCD networks, which closely resemble real-world structures, although Walktrap's performance declines significantly in highly mixed networks. Fairness and performance often correlate (i.e., high-performing methods tend to be fairer), particularly in ABCD networks, where many communities are accurately identified, resulting in high fairness scores even at high mixing levels. High-performing methods generally show biases toward large, dense, and low-conductance communities. Additionally, modularity-based approaches such as Combo, Leiden, Louvain, RB-C, and RB-ER follow similar performance and fairness patterns; while these methods achieve good results, they tend to favor less dense communities, a different bias from that of other community detection methods.

Based on our findings, we recommend Significance and Infomap for general use. These methods consistently achieve high performance and fairness across all networks and, unlike many other high-performing methods, do not require users to specify the number of communities. Both identify many communities well, often ranking among the fairest methods, and Significance outperforms the other methods on real-world networks. While fairness is important, the choice of a community detection method should not be based solely on fairness metrics: a method scoring high on fairness may still detect all communities poorly. Additionally, some methods could achieve better results with optimized parameters rather than the default settings in CDlib; Significance and Infomap, however, require no parameter tuning, which makes their strong fairness-performance trade-off even more notable. Interestingly, Significance does not cluster with the modularity-based methods despite optimizing a goodness score similar to those approaches; because it optimizes significance rather than modularity, it exhibits distinct fairness-performance characteristics. If a single fairness metric has to be chosen, we recommend the one that considers multiple factors when computing the fairness of a community, as it captures the overall fairness of a method very well.
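As a brief usage sketch (not the exact experimental pipeline of this work), both recommended methods can be run with default settings through CDlib; the calls below assume a recent CDlib version that exposes algorithms.infomap and algorithms.significance_communities, with the Infomap backend package installed.

```python
import networkx as nx
from cdlib import algorithms, evaluation

G = nx.karate_club_graph()

# Both methods run with default settings and need no number-of-communities input
infomap_coms = algorithms.infomap(G)
significance_coms = algorithms.significance_communities(G)

print("Infomap found", len(infomap_coms.communities), "communities")
print("Significance found", len(significance_coms.communities), "communities")

# Agreement between the two partitions via CDlib's NMI wrapper
print(evaluation.normalized_mutual_information(infomap_coms, significance_coms))
```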

The choice of a community detection method should also align with the specific network analysis task. Different fairness metrics offer insights relevant to practical applications. For instance, in Influence Maximization, when spreading information to remote parts of a network, selecting a community detection method that identifies low-conductance communities well may be advantageous. The proposed fairness measures and our analysis provide valuable insights for designing fairer community detection methods, improving existing algorithms, and optimizing parameter selection. Additionally, our findings highlight that when extreme cases are created using the HICH-BA model, none of the existing community detection methods perform well. This underscores the need to develop new, well-performing methods specifically designed to handle such challenging network structures.

In this work, we use a Jaccard similarity–based greedy iterative approach to establish a one-to-one correspondence between ground-truth and detected communities. In the proposed mapping, ties are resolved randomly. To evaluate the robustness of the metric, we compare this random tie-breaking with two deterministic strategies that prioritize larger or smaller communities, respectively. The results, reported in SI Appendix 2, S2 Table, indicate that the choice of tie-breaking strategy has no significant effect on the computed fairness score. Furthermore, while the Jaccard similarity–based mapping is intuitive and computationally efficient, it does not guarantee a globally optimal assignment: once a pair is selected, subsequent choices are constrained, which can result in suboptimal alignments. One could explore alternative mapping approaches, such as the Hungarian algorithm [90], which provides an optimal assignment by maximizing the total similarity across all mappings. Studying the effect of such alternatives on the fairness score remains an interesting direction for future work.
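A minimal sketch of such a greedy Jaccard-based one-to-one mapping is shown below; the community lists and variable names are hypothetical, ties are broken deterministically by enumeration order rather than randomly, and the snippet illustrates the general idea rather than reproducing the exact code used in this work.

```python
def greedy_jaccard_mapping(ground_truth, detected):
    """Greedily match ground-truth to detected communities by Jaccard similarity.

    ground_truth, detected: lists of node sets.
    Returns a dict {ground-truth index: detected index}.
    """
    # Score every ground-truth/detected pair by Jaccard similarity
    pairs = []
    for i, gt in enumerate(ground_truth):
        for j, det in enumerate(detected):
            union = len(gt | det)
            jaccard = len(gt & det) / union if union else 0.0
            pairs.append((jaccard, i, j))
    # Take the most similar pairs first; each community can be used only once
    pairs.sort(key=lambda t: t[0], reverse=True)
    mapping, used_gt, used_det = {}, set(), set()
    for jaccard, i, j in pairs:
        if i not in used_gt and j not in used_det:
            mapping[i] = j
            used_gt.add(i)
            used_det.add(j)
    return mapping

# Example: two ground-truth communities matched against three detected ones
gt = [{1, 2, 3, 4}, {5, 6, 7}]
det = [{1, 2, 3}, {4, 5, 6, 7}, {8, 9}]
print(greedy_jaccard_mapping(gt, det))   # {0: 0, 1: 1}
```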

In our framework, fairness trends are modeled by fitting a linear regression between performance measures and community properties (e.g., size, density, conductance). While this yields an interpretable slope-based fairness score, the approach assumes that disparities vary approximately linearly with the chosen property. One could examine whether the relationship is better captured by non-linear curves, or whether performance varies substantially across communities, for instance degrading sharply beyond a size threshold. In such cases, a linear model may underestimate or mischaracterize the true nature of the disparities. Future extensions could incorporate non-linear models or non-parametric techniques to better capture complex fairness–property relationships while preserving interpretability. Additionally, our use of linear regression abstracts away the specific community-wise performance values, and a deeper examination of these values may provide further insights.
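The slope-based idea can be sketched as follows: regress a per-community performance value (for example, the F1 score of each ground-truth community against its mapped detection) on a community property, and read the fitted slope as a bias indicator, with a slope near zero meaning the property has little influence on how well communities are recovered. The snippet below is only a schematic illustration with hypothetical inputs; the exact normalization used to turn such slopes into Φ is the one defined in the Methods.

```python
import numpy as np

def slope_bias(property_values, performance_values):
    """Fit performance ~ property and return the slope as a bias indicator.

    A slope close to 0 suggests communities are recovered equally well
    regardless of the property; a large |slope| indicates a systematic bias.
    """
    x = np.asarray(property_values, dtype=float)
    y = np.asarray(performance_values, dtype=float)
    slope, _intercept = np.polyfit(x, y, deg=1)
    return slope

# Hypothetical per-community values: sizes and F1 of their mapped detections
sizes = [20, 35, 60, 120, 400]
f1_scores = [0.55, 0.60, 0.72, 0.85, 0.95]   # larger communities recovered better
print(f"slope = {slope_bias(sizes, f1_scores):.4f}")  # positive slope -> size bias
```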

This work focused on undirected, unweighted networks with non-overlapping communities. However, many real-world complex networks contain overlapping or hierarchical communities, and future research could explore fairness metrics tailored to such networks, as well as hypergraph-based community detection. Moreover, this work mainly focuses on group fairness; comparing community detection methods from the perspective of individual fairness would provide additional insights at the node level, and developing individual fairness metrics for community detection represents a promising direction. One could also investigate causal fairness, aiming to explain disparities through underlying mechanisms; for example, temporal real-world network datasets could be used to design fairness metrics that capture and measure biases arising from such processes. Finally, the current fairness metrics rely on the availability of ground-truth information, which may not always be available in practice. Designing fairness metrics that are independent of ground-truth structures would therefore be highly valuable for practical applications and should be explored further.

Conclusion

Community detection is crucial for understanding network structures and node behaviors, yet many community detection methods overlook structural inequalities, leading to biased results. In this work, we introduced group-fairness metrics to evaluate the fairness of community detection methods with respect to community properties. Our approach maps ground-truth communities to detected communities and evaluates fairness using three community-wise measures (FCCN, F1, FCCE), which are then aggregated into an overall fairness score Φ. We analyzed the performance-fairness trade-off of 24 community detection methods from six different classes on real-world and synthetic networks (generated with the LFR, ABCD, and HICH-BA benchmark models). Our analysis reveals that no single class of methods consistently outperforms the others, although certain patterns emerge, as explained in the Discussion section. We also observe that community detection methods that take the number of communities as input have an inherent informational advantage, yet they still do not perform as well as other methods. Our analysis highlights that Significance and Infomap are both high-performing and fair across various community properties, making them practical choices that do not require extensive knowledge of the network structure. However, in networks with extreme structures such as MMaj and MMin, no method effectively identifies high-quality communities of all types. In cases with multiple majority groups, Spinglass could be used, while for networks with multiple minority groups, SBM, SBM-Nested, and Label Propagation are suitable options.

In real-world applications such as social media analysis, recommender systems, and online communities, fairness-aware community detection will help ensure that minority or underrepresented groups are not systematically marginalized in how communities are identified or characterized. In existing fairness-aware network analysis algorithms, such as influence maximization or influence blocking, identified communities serve as the basis for designing fair solutions, and therefore, detecting communities fairly will lead to more equitable interventions. By using fairness evaluation metrics, researchers can design algorithms that not only detect meaningful structures but also promote equitable representation and fair treatment of diverse groups in network-driven decision-making.

Supporting information

S1 File. This file contains implementation details of different community detection methods (refer to Appendix 1), robustness results of the fairness metric Φ with respect to different community mapping procedures (refer to Appendix 2), and detailed results on LFR, ABCD, HICH-BA, and real-world networks (refer to Appendix 3).

https://doi.org/10.1371/journal.pone.0336212.s001

(PDF)

References

  1. 1. Barabási AL. Network science book. 2014.
  2. 2. Fortunato S, Hric D. Community detection in networks: a user guide. Physics Reports. 2016;659:1–44.
  3. 3. Fortunato S. Community detection in graphs. Physics Reports. 2010;486(3–5):75–174.
  4. 4. Saxena A, Fletcher G, Pechenizkiy M. FairSNA: algorithmic fairness in social network analysis. ACM Comput Surv. 2024;56(8):1–45.
  5. 5. Ghasemian A, Hosseinmardi H, Clauset A. Evaluating overfit and underfit in models of network community structure. IEEE Trans Knowl Data Eng. 2020;32(9):1722–35.
  6. 6. Farnad G, Babaki B, Gendreau M. A unifying framework for fairness-aware influence maximization. In: Companion Proceedings of the Web Conference 2020 . 2020. p. 714–22. https://doi.org/10.1145/3366424.3383555
  7. 7. Saxena A, Gutiérrez Bierbooms C, Pechenizkiy M. Fairness-aware fake news mitigation using counter information propagation. Appl Intell. 2023;53(22):27483–504.
  8. 8. Saxena A, Fletcher G, Pechenizkiy M. HM-EIICT: fairness-aware link prediction in complex networks using community information. J Comb Optim. 2021;44(4):2853–70.
  9. 9. Saxena A, Fletcher G, Pechenizkiy M. NodeSim: node similarity based network embedding for diverse link prediction. EPJ Data Sci. 2022;11(1):24.
  10. 10. Tsioutsiouliklis S, Pitoura E, Tsaparas P, Kleftakis I, Mamoulis N. Fairness-aware link analysis. CoRR. 2020. https://arxiv.org/abs/2005.14431
  11. 11. Chakraborty T, Dalmia A, Mukherjee A, Ganguly N. Metrics for community analysis. ACM Comput Surv. 2017;50(4):1–37.
  12. 12. Fortunato S, Hric D. Community detection in networks: a user guide. Physics Reports. 2016;659:1–44.
  13. 13. Lancichinetti A, Fortunato S, Radicchi F. Benchmark graphs for testing community detection algorithms. Physical Review E. 2008;78(4).
  14. 14. Kamiński B, Prałat P, Théberge F. Artificial Benchmark for Community Detection (ABCD)—fast random graph model with community structure. Net Sci. 2021;9(2):153–78.
  15. 15. Fred ALN, Jain AK. Robust data clustering. In: Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 2003.
  16. 16. Newman MEJ, Cantwell GT, Young J-G. Improved mutual information measure for clustering, classification, and community detection. Phys Rev E. 2020;101(4–1):042304. pmid:32422767
  17. 17. Hubert L, Arabie P. Comparing partitions. Journal of Classification. 1985;2(1):193–218.
  18. 18. Rossetti G. RDyn: graph benchmark handling community dynamics. Journal of Complex Networks. 2017;5(6):893–912.
  19. 19. Rossetti G, Pappalardo L, Rinzivillo S. A novel approach to evaluate community detection algorithms on ground truth. In: Complex Networks VII: Proceedings of the 7th Workshop on Complex Networks CompleNet 2016 . 2016. p. 133–44.
  20. 20. Rosvall M, Bergstrom CT. Maps of random walks on complex networks reveal community structure. Proc Natl Acad Sci U S A. 2008;105(4):1118–23. pmid:18216267
  21. 21. Zhang Y, Rohe K. Understanding regularized spectral clustering via graph conductance. Advances in Neural Information Processing Systems. 2018;31.
  22. 22. Traag VA, Krings G, Van Dooren P. Significant scales in community structure. Sci Rep. 2013;3:2930. pmid:24121597
  23. 23. Pons P, Latapy M. Computing communities in large networks using random walks. In: Computer and Information Sciences-ISCIS 2005 : 20th International Symposium, Istanbul, Turkey. 2005. p. 284–93.
  24. 24. Peixoto TP. Efficient Monte Carlo and greedy heuristic for the inference of stochastic block models. Phys Rev E Stat Nonlin Soft Matter Phys. 2014;89(1):012804. pmid:24580278
  25. 25. Peixoto TP. Hierarchical block structures and high-resolution model selection in large networks. Physical Review X. 2014;4(1).
  26. 26. Radicchi F, Castellano C, Cecconi F, Loreto V, Parisi D. Defining and identifying communities in networks. Proc Natl Acad Sci U S A. 2004;101(9):2658–63. pmid:14981240
  27. 27. Wei Y-C, Cheng C-K. Towards efficient hierarchical designs by ratio cut partitioning. In: 1989 IEEE International Conference on Computer-Aided Design, Digest of Technical Papers. 1989. p. 298–301. https://doi.org/10.1109/iccad.1989.76957
  28. 28. Shi J, Malik J. Normalized cuts and image segmentation. IEEE Trans Pattern Anal Machine Intell. 2000;22(8):888–905.
  29. 29. Flake GW, Lawrence S, Giles CL. Efficient identification of Web communities. In: Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining. 2000. p. 150–60. https://doi.org/10.1145/347090.347121
  30. 30. Newman MEJ. Modularity and community structure in networks. Proc Natl Acad Sci U S A. 2006;103(23):8577–82. pmid:16723398
  31. 31. Newman MEJ, Girvan M. Finding and evaluating community structure in networks. Phys Rev E Stat Nonlin Soft Matter Phys. 2004;69(2 Pt 2):026113. pmid:14995526
  32. 32. Fortunato S, Barthélemy M. Resolution limit in community detection. Proc Natl Acad Sci U S A. 2007;104(1):36–41. pmid:17190818
  33. 33. Good BH, de Montjoye Y-A, Clauset A. Performance of modularity maximization in practical contexts. Phys Rev E Stat Nonlin Soft Matter Phys. 2010;81(4 Pt 2):046106. pmid:20481785
  34. 34. Li Z, Zhang S, Wang R-S, Zhang X-S, Chen L. Quantitative function for community detection. Phys Rev E Stat Nonlin Soft Matter Phys. 2008;77(3 Pt 2):036109. pmid:18517463
  35. 35. Sun PG, Gao L, Yang Y. Maximizing modularity intensity for community partition and evolution. Information Sciences. 2013;236:83–92.
  36. 36. Van Laarhoven T, Marchiori E. Axioms for graph clustering quality functions. The Journal of Machine Learning Research. 2014;15(1):193–215.
  37. 37. Pizzuti C. Ga-net: a genetic algorithm for community detection in social networks. In: International conference on parallel problem solving from nature. Springer; 2008. p. 1081–90.
  38. 38. Chira C, Gog A, Iclanzan D. Evolutionary detection of community structures in complex networks: a new fitness function. In: 2012 IEEE Congress on Evolutionary Computation. 2012. p. 1–8. https://doi.org/10.1109/cec.2012.6256561
  39. 39. Chakraborty T, Srinivasan S, Ganguly N, Mukherjee A, Bhowmick S. On the permanence of vertices in network communities. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining. 2014. p. 1396–405. https://doi.org/10.1145/2623330.2623707
  40. 40. Blondel VD, Guillaume J-L, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. J Stat Mech. 2008;2008(10):P10008.
  41. 41. Chakraborty T, Srinivasan S, Ganguly N, Bhowmick S, Mukherjee A. Constant communities in complex networks. Sci Rep. 2013;3:1825. pmid:23661107
  42. 42. Traag VA, Waltman L, van Eck NJ. From Louvain to Leiden: guaranteeing well-connected communities. Sci Rep. 2019;9(1):5233. pmid:30914743
  43. 43. Bonald T, Charpentier B, Galland A, Hollocou A. Hierarchical graph clustering using node pair sampling. In: MLG 2018 -14th International Workshop on Mining and Learning with Graphs; 2018.
  44. 44. Berkhin P. A survey of clustering data mining techniques. Grouping multidimensional data: Recent advances in clustering. Springer. 2006. p. 25–71.
  45. 45. Danon L, Díaz-Guilera A, Duch J, Arenas A. Comparing community structure identification. J Stat Mech. 2005;2005(09):P09008–P09008.
  46. 46. Lin TY, Ohsuga S, Liau CJ, Hu X. Foundations and novel approaches in data mining. Springer; 2005.
  47. 47. Artiles J, Gonzalo J, Sekine S. The semeval-2007 weps evaluation: establishing a benchmark for the web people search task. In: Proceedings of the fourth international workshop on semantic evaluations (semeval-2007). 2007. p. 64–9.
  48. 48. Hubert L, Arabie P. Comparing partitions. Journal of Classification. 1985;2(1):193–218.
  49. 49. Meilă M. Comparing clusterings—an information based distance. Journal of Multivariate Analysis. 2007;98(5):873–95.
  50. 50. Aynaud T, Guillaume JL. Static community detection algorithms for evolving networks. In: 8th international symposium on modeling and optimization in mobile, ad hoc, and wireless networks. IEEE; 2010. p. 513–9.
  51. 51. Tsioutsiouliklis S, Pitoura E, Tsaparas P, Kleftakis I, Mamoulis N. Fairness-aware PageRank. In: Proceedings of the Web Conference 2021. WWW ’21. New York, NY, USA: Association for Computing Machinery; 2021. p. 3815–26.
  52. 52. Stoica A-A, Chaintreau A. Fairness in social influence maximization. In: Companion Proceedings of The 2019 World Wide Web Conference. 2019. p. 569–74. https://doi.org/10.1145/3308560.3317588
  53. 53. Feng Y, Patel A, Cautis B, Vahabi H. Influence maximization with fairness at scale. In: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2023. p. 4046–55. https://doi.org/10.1145/3580305.3599847
  54. 54. Peel L, Larremore DB, Clauset A. The ground truth about metadata and community detection in networks. Sci Adv. 2017;3(5):e1602548. pmid:28508065
  55. 55. Decelle A, Krzakala F, Moore C, Zdeborová L. Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Physical Review E. 2011;84(6).
  56. 56. Nadakuditi RR, Newman MEJ. Graph spectra and the detectability of community structure in networks. Phys Rev Lett. 2012;108(18):188701. pmid:22681123
  57. 57. Radicchi F. Detectability of communities in heterogeneous networks. Phys Rev E Stat Nonlin Soft Matter Phys. 2013;88(1):010801. pmid:23944399
  58. 58. Mehrabi N, Morstatter F, Peng N, Galstyan A. Debiasing community detection: the importance of lowly connected nodes. In: Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. 2019. p. 509–12.
  59. 59. Manolis K, Pitoura E. Modularity-based fairness in community detection. In: Proceedings of the 2023 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. 2024. p. 126–30.
  60. 60. Chinchor N. MUC-4 evaluation metrics. In: Proceedings of the 4th conference on Message understanding - MUC4 ’92. 1992. p. 22. https://doi.org/10.3115/1072064.1072067
  61. 61. Hastie T, Tibshirani R, Friedman JH. The elements of statistical learning: data mining, inference, and prediction. Springer; 2009.
  62. 62. Higham DJ, Kalna G, Kibble M. Spectral clustering and its use in bioinformatics. Journal of Computational and Applied Mathematics. 2007;204(1):25–37.
  63. 63. Cordasco G, Gargano L. Community detection via semi-synchronous label propagation algorithms. In: 2010 IEEE international workshop on: business applications of social network analysis (BASNA). 2010. p. 1–8.
  64. 64. Traag VA, Šubelj L. Large network community detection by fast label propagation. Sci Rep. 2023;13(1):2701. pmid:36792915
  65. 65. Hu B, Li W, Huo X, Liang Y, Gao M, Pei P. Improving Louvain algorithm for community detection. In: 2016 International Conference on Artificial Intelligence and Engineering Applications. 2016. p. 110–5.
  66. 66. Li H, Zhang R, Zhao Z, Liu X. LPA-MNI: an improved label propagation algorithm based on modularity and node importance for community detection. Entropy (Basel). 2021;23(5):497. pmid:33919470
  67. 67. Malhotra D, Gera R, Saxena A. Community detection using semilocal topological features, label propagation algorithm. In: Computational Data, Social Networks: 10th International Conference and CSoNet 2021, Virtual Event, November 15–17, 2021, Proceedings. 2021. p. 255–66.
  68. 68. Das S, Biswas A, Saxena A. DCC: a cascade-based approach to detect communities in social networks. In: International Conference on Computer Vision, High-Performance Computing, Smart Devices, and Networks. 2022. p. 381–92.
  69. 69. Arya A, Pandey PK, Saxena A. Node classification using deep learning in social networks. Deep learning for social media data analytics. Springer; 2022. p. 3–26.
  70. 70. Newman MEJ, Leicht EA. Mixture models and exploratory analysis in networks. Proc Natl Acad Sci U S A. 2007;104(23):9564–9. pmid:17525150
  71. 71. Clauset A, Newman MEJ, Moore C. Finding community structure in very large networks. Phys Rev E Stat Nonlin Soft Matter Phys. 2004;70(6 Pt 2):066111. pmid:15697438
  72. 72. Sobolevsky S, Campari R, Belyi A, Ratti C. General optimization technique for high-quality community detection in complex networks. Phys Rev E Stat Nonlin Soft Matter Phys. 2014;90(1):012811. pmid:25122346
  73. 73. Reichardt J, Bornholdt S. Statistical mechanics of community detection. Phys Rev E Stat Nonlin Soft Matter Phys. 2006;74(1 Pt 2):016110. pmid:16907154
  74. 74. Newman MEJ. Finding community structure in networks using the eigenvectors of matrices. Physical Review E. 2006;74(3).
  75. 75. Perozzi B, Al-Rfou R, Skiena S. DeepWalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. 2014.
  76. 76. Rahman T, Surma B, Backes M, Zhang Y. Fairwalk: towards fair graph embedding. In: Proceedings of the 28th International Joint Conference on Artificial Intelligence. IJCAI’19. AAAI Press; 2019. p. 3289–95.
  77. 77. Grover A, Leskovec J. node2vec: scalable feature learning for networks. KDD. 2016;2016:855–64. pmid:27853626
  78. 78. Parés F, Gasulla DG, Vilalta A, Moreno J, Ayguadé E, Labarta J, et al. Fluid communities: a competitive, scalable and diverse community detection algorithm. In: International Conference on Complex Networks and Their Applications. Springer; 2017. p. 229–40.
  79. 79. Girvan M, Newman MEJ. Community structure in social and biological networks. Proc Natl Acad Sci U S A. 2002;99(12):7821–6. pmid:12060727
  80. 80. Arenas A, Danon L, Diaz-Guilera A, Gleiser P, Guimerà R. Community analysis in social networks. European Physical Journal B. 2003;38.
  81. 81. Stegehuis C, van der Hofstad R, van Leeuwaarden JSH. Power-law relations in random networks with communities. Phys Rev E. 2016;94(1–1):012302. pmid:27575143
  82. 82. Lancichinetti A, Fortunato S. Limits of modularity maximization in community detection. Phys Rev E Stat Nonlin Soft Matter Phys. 2011;84(6 Pt 2):066122. pmid:22304170
  83. 83. Karimi F, Génois M, Wagner C, Singer P, Strohmaier M. Homophily influences ranking of minorities in social networks. Sci Rep. 2018;8(1):11077. pmid:30038426
  84. 84. Krebs V. Political book networks.
  85. 85. Leskovec J, Krevl A. SNAP datasets: Stanford large network dataset collection. 2014. http://snap.stanford.edu/data
  86. 86. Yin H, Benson AR, Leskovec J, Gleich DF. Local higher-order graph clustering. KDD. 2017;2017:555–64. pmid:29770258
  87. 87. Jerdee M, Kirkley A, Newman MEJ. Mutual information and the encoding of contingency tables. Phys Rev E. 2024;110(6–1):064306. pmid:39916204
  88. 88. Hagberg A, Swart P, Schult D. Exploring network structure, dynamics, and function using NetworkX. Los Alamos, NM (United States): Los Alamos National Laboratory; 2008.
  89. 89. Rossetti G, Milli L, Cazabet R. CDlib: a python library to extract, compare and evaluate communities from complex networks. Applied Network Science. 2019.
  90. 90. Kuhn HW. The Hungarian method for the assignment problem. Naval Research Logistics. 1955;2(1–2):83–97.