Heatmap centrality: A new measure to identify super-spreader nodes in scale-free networks

The identification of potential super-spreader nodes within a network is a critical part of the study and analysis of real-world networks. Motivated by a new interpretation of the “shortest path” between two nodes, this paper explores the properties of the heatmap centrality by comparing the farness of a node with the average sum of farness of its adjacent nodes in order to identify influential nodes within the network. As many real-world networks are often claimed to be scale-free, numerical experiments based upon both simulated and real-world undirected and unweighted scale-free networks are used to illustrate the effectiveness of the proposed “shortest path” based measure with regard to its CPU run time and ranking of influential nodes.


Introduction
Networks provide a framework to model complex systems, utilizing nodes and edges to depict the interactions between system components. While various tools have been developed to analyze real-world systems, nodal centrality measures are among the more prominently used network analysis techniques to quantify the influence of a node with respect to other nodes within the network [1]. Some of the more well-known centrality measures include the degree centrality ($C_D$) [2], eigenvector centrality ($C_E$) [3], closeness centrality ($C_C$) [4], and betweenness centrality ($C_B$) [5]. The utilization of centrality measures on networks to identify influential nodes can lead to a more comprehensive understanding of the dynamics and behavior of real-world systems, such as the identification of the most influential individuals in a social network, the key airports in a transportation network, or the super-spreaders in a disease outbreak [6]. Past applications of the four well-known centralities, along with various generalizations of the measures, on real-world networks include the Internet [7,8], transportation systems [9,10], biological systems [11][12][13], and social systems [14,15].
However, the identification of influential nodes through centrality measures depends upon the network structure. Generally, a node with a higher centrality value is considered more influential than the other nodes, where the value numerically quantifies the level of influence a node has with respect to the network's topology. For instance, the degree centrality of a node is an indication of the number of neighbors connected to it. An extension of degree centrality,

PLOS ONE | https://doi.org/10.1371/journal.pone.0235690 | July 7, 2020

more influential than the others. For example, if an individual and each of its immediate neighbors are infected, then neither necessarily possesses more infectivity or influence than the other. Yet, if an individual is the only one infected among its immediate neighbors, then the individual possesses more infectivity and influence than the others. Using the above arguments, a node with a large negative heatmap centrality is more likely to control the diffusion of information among other nodes and possess a greater amount of infectivity. As many real-world networks are often claimed to be scale-free [25] (e.g., social networks [26], the Internet [27], protein-protein interaction networks [28,29], and airline networks [30,31]), the properties of the heatmap centrality measure are explored in this paper through experimental studies involving both simulated and real-world scale-free networks. Generally, a network is considered "scale-free" if the fraction of nodes with degree $k$ follows a power-law distribution $k^{-\gamma}$, where $\gamma > 1$ [25]. Yet, alternative versions of the "scale-free hypothesis" have included weaker requirements, such that the power law only needs to hold in the upper tail of the degree distribution [32], or stronger requirements, such as requiring $2 < \gamma < 3$ [33,34]. As a consequence of the various interpretations of the term "scale-free", different researchers can apply the term to slightly different concepts, ultimately complicating efforts to evaluate the relationship and properties between the centrality measures on scale-free networks [25]. To avoid any confusion with the term "scale-free", in this paper, a "scale-free" network is defined as a simple, undirected network whose degree distribution follows a power-law distribution $k^{-\gamma}$, where $\gamma > 1$.
Additionally, any network that is either too dense or too sparse is not considered, as networks with such densities are not plausibly scale-free [25].
To evaluate the algorithmic performance and efficiency on simulated scale-free networks, the CPU (central processing unit) times and the Spearman-rank and Kendall-rank correlation coefficients of the proposed heatmap measure are calculated on networks of various size N and density d. To evaluate the algorithmic efficiency on real-world scale-free networks, three experiments are performed to compare the nodal ranking of the proposed measure with the rankings with respect to the degree, eigenvector, closeness, and betweenness centralities: A comparison of the top-10 ranked nodes, a comparison of both the Spearman-rank and Kendall-rank correlation coefficients between each pair of measures, and a comparison of the spreading capability of the top-10 nodes using a modification of the standard Susceptible-Infected (SI) model [35][36][37][38]. Based upon the results of the experiments performed on both the simulated and real-world scale-free networks, the heatmap centrality may be considered as a potentially viable measure in the identification of super-spreader nodes in scale-free networks.
The rest of the paper is organized as follows. Section 2 provides preliminary material, such as definitions and notations that will be referred to throughout the paper, along with an overview of the four well-known centrality measures. Section 3 introduces the proposed heatmap centrality measure, discusses its time complexity, and provides a toy example to highlight the properties of each centrality measure, as well as the advantages of the proposed measure. Section 4 details the two rank correlation coefficients, Spearman and Kendall, that are used to compare the rankings with respect to two centrality measures. An example calculation of each correlation with respect to the betweenness and heatmap centralities is also included in this section. In Section 5, an overview of the SI model is provided, while Section 6 describes the datasets and introduces the evaluation methodologies used to study the accuracy of each measure. The analysis of the experimental results is also included in that section. Section 7 concludes the paper and details future work.

$= \{e_{j,k} \mid v_j, v_k \in V\}$ by the set of connections between nodes $v_j$ and $v_k$. Because $G$ is undirected, if $e_{j,k} \in E$, then $e_{k,j} \in E$.
If $e_{j,k} \in E$, then nodes $v_j$ and $v_k$ are considered adjacent, where node $v_j$ is known as a neighbor of node $v_k$, and node $v_k$ is a neighbor of node $v_j$. The neighborhood of node $v_j \in V$ is the set of the neighbors of $v_j$.
A path $P(v_j, v_k)$ between nodes $v_j$ and $v_k$ is defined as a sequence of edges that connect adjacent nodes, and the length of a path $P(v_j, v_k)$ is defined as the number of edges in $P(v_j, v_k)$. Let $\mathcal{P}(v_j, v_k)$ be the set of all paths between nodes $v_j$ and $v_k$. Then $P_s(v_j, v_k) \in \mathcal{P}(v_j, v_k)$ is defined as a shortest path (i.e., a path of minimum length) between nodes $v_j$ and $v_k$ with length $s(v_j, v_k)$.
Finally, define a graph as connected if there exists a path between every pair of its nodes. In this paper, only simple, undirected, unweighted, and connected networks are considered, where a simple graph, by definition, does not contain duplicate edges or loops (i.e., an edge that connects a node to itself).

Centrality measures
A centrality measure is a function $C: V \to \mathbb{R}$ that assigns a value to each node within the network. This value numerically quantifies the influence of a node with respect to the network structure, where a node with a higher centrality value is usually considered more influential than the other nodes. Centrality measures do not consider node-level covariate information (i.e., node characteristics that "co-vary" with the network). Although many centrality measures have been proposed to rank nodes within a network according to their level of influence, the well-known measures for characterizing the centrality of a node within a network include the degree, eigenvector, closeness, and betweenness centralities.
Consider a simple, undirected, unweighted, and connected network $G = (V, E)$ with $N = |V|$ nodes and $M = |E|$ edges. Then the network $G$ can be described by an adjacency matrix $A = \{a_{i,j}\}$ of size $N \times N$, where $a_{i,j} = 1$ if nodes $v_i$ and $v_j$ are connected by edge $e_{i,j}$, and $a_{i,j} = 0$ otherwise.

Degree centrality
The degree centrality for node $v_i$, denoted as $C_D(v_i)$, is defined [2] as

$$C_D(v_i) = \sum_{j=1}^{N} a_{i,j}.$$

The degree centrality measures the influence of a node by the number of edges connected to it, where a node with a high degree value is a highly connected node within the network and thus involved in a large number of interactions. The degree centrality is considered a local network measure as it does not take the structure of the rest of the network into account. The time complexity to compute the degree centrality of one node, $C_D(v_i)$, is $O(N)$ since there are $N$ entries in the row corresponding to each node in the adjacency matrix $A$ [6]. Consequently, the calculation of the degree centrality of all $N$ nodes in the network requires $O(N^2)$ time in a dense network (i.e., $M$ on the order of $N^2$), and $O(M)$ time in a sparse network (i.e., $M < N(N-1)/2$).
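As a minimal sketch of the row-sum computation described above, assuming the unnormalized degree and a 0/1 adjacency matrix stored as a list of lists, the degree of every node can be obtained as follows. The function name `degree_centrality` and the 4-node star network are hypothetical and chosen only for illustration.

```python
def degree_centrality(adjacency):
    """Return the degree of every node as the row sums of a 0/1 adjacency matrix.

    Assumption: the degree is left unnormalized (a plain neighbor count).
    """
    return [sum(row) for row in adjacency]


# Hypothetical 4-node star: node 1 is adjacent to nodes 0, 2, and 3.
A = [
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]
degrees = degree_centrality(A)
print(degrees)  # one degree per node, in node order
```

Each row is scanned once, which mirrors the $O(N)$ cost per node and $O(N^2)$ cost for all nodes when an adjacency matrix is used.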

Eigenvector centrality
Let $\lambda_1, \lambda_2, \ldots, \lambda_N$ denote the eigenvalues of the adjacency matrix $A = \{a_{i,j}\}$ of network $G$. Then the largest eigenvalue of matrix $A$ is $\lambda_{\max}$ with corresponding eigenvector $e = [e_1, e_2, \ldots, e_N]^T$ such that $\lambda_{\max} e_i = \sum_{j=1}^{N} a_{i,j} e_j$. Then, the eigenvector centrality for node $v_i$, denoted as $C_E(v_i)$, is defined [3] as

$$C_E(v_i) = e_i = \frac{1}{\lambda_{\max}} \sum_{j=1}^{N} a_{i,j} e_j.$$

Intuitively, the eigenvector centrality quantifies node $v_i$ as influential if it is connected to other influential nodes within a network [17], thus calculating both the direct and indirect influence of the node. As a result, the eigenvector centrality is considered a measure of the global network connectivity. Since the power method is used in the calculation of the eigenvector centrality, there is no "proven" time complexity, as the exact time complexity depends upon a number of factors, such as the speed at which the normalized iterate converges to the eigenvector associated with $\lambda_{\max}$.
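The power method mentioned above can be sketched as follows. This is an illustrative implementation only, not the routine used in the experiments: the function name, the iteration cap, the tolerance, and the shift by the identity matrix (added here so that the iteration also settles on bipartite networks) are all assumptions.

```python
import math


def eigenvector_centrality(adjacency, iterations=1000, tol=1e-10):
    """Approximate the leading eigenvector of a 0/1 adjacency matrix by power iteration."""
    n = len(adjacency)
    x = [1.0] * n
    for _ in range(iterations):
        # Assumption: iterate with (A + I) instead of A. The shift leaves the
        # eigenvectors unchanged but prevents oscillation on bipartite graphs.
        y = [x[i] + sum(adjacency[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(value * value for value in y))
        y = [value / norm for value in y]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x


# Hypothetical 4-node star: node 1 is adjacent to nodes 0, 2, and 3.
A = [
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]
scores = eigenvector_centrality(A)
print(scores)  # the hub (node 1) receives the largest score
```

The number of iterations needed depends on the gap between the two largest eigenvalues in magnitude, which is why no fixed time complexity is quoted for this measure.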

Closeness centrality
The closeness centrality for node $v_i$, denoted as $C_C(v_i)$, is defined [4] as

$$C_C(v_i) = \frac{1}{\sum_{j=1}^{N} s(v_i, v_j)},$$

or, more simply, as the reciprocal of farness, where farness is defined as the sum of the lengths of the shortest paths between node $v_i$ and all other nodes in the network. Intuitively, the closeness centrality measures how quickly information can spread from node $v_i$, utilizing the idea that a node is close to all nodes within the network and not just close to its neighbors. As a result, the closeness centrality is considered a measure of the global network connectivity. Furthermore, the time complexity needed to calculate the closeness centrality of one node, $C_C(v_i)$, is $O(N+M)$ since the centrality utilizes breadth-first search (BFS), an algorithm that runs in $O(N+M)$ time, to determine the shortest path from one node to all other nodes [6]. Therefore, in order to compute the closeness centrality of all $N$ nodes in the network, the BFS algorithm is implemented on each of the $N$ nodes such that the time complexity for the closeness centrality of all nodes is $O(N(N+M)) = O(N^2 + NM)$, or simply $O(NM)$.
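The BFS-based computation of farness and closeness described above can be sketched as follows, assuming a connected, unweighted network stored as a dictionary that maps each node to its neighbors. The function names and the small star network are hypothetical.

```python
from collections import deque


def bfs_distances(graph, source):
    """Return the shortest-path length from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def farness(graph, source):
    """Sum of shortest-path lengths from source to all other nodes."""
    return sum(bfs_distances(graph, source).values())


def closeness_centrality(graph):
    """Reciprocal of farness for every node (assumes a connected network)."""
    return {v: 1.0 / farness(graph, v) for v in graph}


# Hypothetical 4-node star: node 1 is adjacent to nodes 0, 2, and 3.
graph = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
closeness = closeness_centrality(graph)
print(closeness)
```

One BFS is run per node, so the sketch follows the $O(N(N+M))$ cost discussed in the text.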

Betweenness centrality
The betweenness centrality for node $v_i$, denoted as $C_B(v_i)$, is defined [5] as

$$C_B(v_i) = \sum_{v_j \neq v_i \neq v_k} \frac{\sigma_{v_j,v_k}(v_i)}{\sigma_{v_j,v_k}},$$

where $\sigma_{v_j,v_k}$ is the number of shortest paths between nodes $v_j$ and $v_k$, and $\sigma_{v_j,v_k}(v_i)$ is the number of shortest paths between nodes $v_j$ and $v_k$ that pass through node $v_i$. The interactions of two nonadjacent nodes depend upon other nodes, which generally lie on the shortest paths between the two nodes [39]. Thus, the betweenness centrality considers a node influential if it lies on a large fraction of shortest paths between pairs of nodes within the network. Intuitively, the betweenness centrality measures how much information is likely to flow through node $v_i$. Since the betweenness centrality takes the structure of the entire network into account, it is considered a global network measure.
To calculate the betweenness centrality of a node within the network, the fastest known algorithm is that of Brandes [40], which performs two basic steps. First, an augmented BFS starts from node $v_j$ and computes $\sigma_{v_j,v_k}$ for every node $v_k$. Second, during the accumulation phase, the BFS tree constructed in the previous step is utilized to calculate the value of $C_B(v_i)$ for every node [41]. Since this computation is performed for each node, the calculation of the betweenness centrality on all $N$ nodes using the Brandes algorithm requires a time complexity of $O(N(N+M)) = O(N^2 + NM)$, which, for all practical purposes, simplifies to $O(NM)$.
Although the closeness and betweenness centralities each require $O(NM)$ time to calculate the shortest paths for all $N$ nodes in the network, the closeness centrality is a relatively less time-consuming measure in comparison, as it does not require much of the post-processing work (i.e., calculating the number of shortest paths that pass through the node) that is required for the betweenness centrality. In particular, for each node $v_i$, the $N$ shortest-path trees from the BFS algorithm must be traced through in order to determine the number of shortest paths that pass through $v_i$ [6]. Therefore, calculating the betweenness centrality for one node, $C_B(v_i)$, could take an additional $O(NM)$ time such that the overall time complexity to compute the betweenness centrality for all the nodes in the network would be $O(N^2 + 2NM)$, or simply $O(NM)$ [6].
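The two steps of the Brandes algorithm (an augmented BFS that counts shortest paths, followed by an accumulation phase) can be sketched for unweighted networks as follows. The function name and the final halving, which counts each unordered node pair once in an undirected network, are assumptions made for this illustration.

```python
from collections import deque


def betweenness_centrality(graph):
    """Brandes-style betweenness for an unweighted, undirected network."""
    centrality = {v: 0.0 for v in graph}
    for s in graph:
        # Step 1: augmented BFS from s counts shortest paths (sigma)
        # and records predecessors on those paths.
        order = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Step 2: accumulate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in graph}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                centrality[w] += delta[w]
    # Assumption: each unordered pair is counted once, so halve the totals.
    return {v: value / 2.0 for v, value in centrality.items()}


# Hypothetical 4-node star: node 1 is adjacent to nodes 0, 2, and 3.
graph = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
betweenness = betweenness_centrality(graph)
print(betweenness)
```

The accumulation loop is the post-processing work that closeness does not need, which is consistent with betweenness being the slower of the two in practice.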

Heatmap centrality
A new centrality measure, termed the heatmap centrality, utilizes both local and global network information by comparing the farness of each node (i.e., the global network information) with the average sum of the farness of its neighbor nodes (i.e., the local network information). In particular, the heatmap centrality for node $v_i$, denoted as $C_{HM}(v_i)$, is defined formally as

$$C_{HM}(v_i) = \sum_{j=1}^{N} s(v_i, v_j) - \frac{\sum_{j=1}^{N} a_{i,j} \sum_{k=1}^{N} s(v_j, v_k)}{\sum_{j=1}^{N} a_{i,j}}.$$

The heatmap measure identifies the "hot spot" node within its neighborhood, as it considers a node with a smaller farness than the average of its neighbors to be an influential node within the network. Intuitively, a node with a smaller farness than its neighbors is more likely to have information pass specifically through it, rather than through any of the adjacent nodes. When the sign of $C_{HM}(v_i)$ transitions from negative to positive, the average sum of the farness of the neighbors of $v_i$ becomes smaller than that of node $v_i$, decreasing the likelihood of information passing specifically through node $v_i$. Therefore, using this intuition, the heatmap centrality can be considered a "shortest path" based measure and utilized in the identification of super-spreader nodes that control the flow of information within a scale-free network.
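As a hedged illustration of the definition above (farness of a node minus the average farness of its neighbors), the heatmap centrality can be sketched as follows. This is not the pseudocode of Fig 1: the function names are hypothetical, and the sketch computes the farness of every node once and reuses it, rather than rerunning BFS for each neighborhood.

```python
from collections import deque


def farness(graph, source):
    """Sum of BFS shortest-path lengths from source to all other nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return sum(dist.values())


def heatmap_centrality(graph):
    """Farness of each node minus the mean farness of its neighbors.

    Assumption: the network is connected, so every node has a neighbor.
    """
    far = {v: farness(graph, v) for v in graph}
    return {
        v: far[v] - sum(far[u] for u in graph[v]) / len(graph[v])
        for v in graph
    }


# Hypothetical 4-node star: node 1 is adjacent to nodes 0, 2, and 3.
graph = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
heatmap = heatmap_centrality(graph)
# More negative values indicate more influential nodes, so sort ascending.
ranking = sorted(heatmap, key=heatmap.get)
print(heatmap, ranking)
```

In the star example the hub has a farness of 3 while each of its neighbors has a farness of 5, so the hub receives a negative value and is ranked first.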
Consider the pseudocode for the heatmap centrality algorithm in Fig 1 and node v i with, say, k neighbors. In Step 1, the calculation of the farness of node v i and its k neighbors requires O((k+1)(N+M)) time as BFS needs to be executed (k+1)-times. In Step

Toy example
Fig 2 displays a simple, undirected, unweighted, and connected network $G$ with 15 nodes and 19 edges. The advantages of the proposed measure are detailed by comparing the rankings of the nodes in the network with respect to the degree centrality ($C_D$), eigenvector centrality ($C_E$), closeness centrality ($C_C$), betweenness centrality ($C_B$), and heatmap centrality ($C_{HM}$). The centrality value of each node and its ranking with respect to the five measures are provided in Table 1. Unsurprisingly, node 3 is top-ranked by both the degree and eigenvector centralities, as it is the most connected node within the network. In addition, node 6 is top-ranked by the closeness, betweenness, and heatmap measures, since its close proximity to all the nodes within the network allows it to control the diffusion of information. Yet, although node 6 is second-ranked by both the degree and eigenvector measures, and node 8 by the closeness and betweenness measures, the heatmap centrality ranks node 10 second. Nodes 6 and 8 both lie on the path that bridges the two components of network $G$, such that information passing through node 6 must pass through node 8 in order to reach the nodes on the right side of the network. As a result of its positioning on the bridge path, node 8 is given large closeness and betweenness values strictly due to its adjacency to node 6. But the heatmap centrality does not necessarily assign a node a high rank due to a topological advantage. Instead, the proposed measure leverages the farness of a node's neighbors to provide a ranking that is more reflective of how information flows throughout the network.

The rank correlation coefficients
The ranking of a node with respect to a centrality measure is the node's position when the nodes in the network are sorted in decreasing order of their centrality values.

As there are many different centrality measures used to rank the nodes of a network by their level of influence, rank correlation coefficients may be utilized to determine the similarity of two centrality rankings. Intuitively, the correlation coefficient between two variables will be high when two variables have a similar rank, and low when two variables have a dissimilar rank [42]. Additionally, correlation coefficient values closer to 1 indicate a stronger similarity between the two rankings. The Spearman-rank and Kendall-rank are both correlation coefficients that depend only on the ranks of the variables (e.g., the nodes), and not on the observed values (e.g., the centrality values). Spearman's correlation coefficient ($\rho$) is equivalent to the traditional linear correlation coefficient calculated on the rankings of variables, while Kendall's correlation coefficient ($\tau$) is proportional to the number of pairwise adjacent inversions that are required to convert one ranking into the other [43]. Although Kendall's $\tau$ differs in magnitude and is usually smaller when compared to Spearman's $\rho$, it has become a standard statistic to compare the correlation between two rankings for a number of reasons, such as its fast computational time [44]. In this paper, both rank correlation coefficients are used to evaluate the correctness of the rankings to better assess the effectiveness of the centrality measures.

Spearman-rank correlation coefficient
The Spearman-rank correlation coefficient $\rho$ [45] of the rankings with respect to any two centrality measures, say $C_X$ and $C_Y$, is calculated as follows. First, the ranking of nodes in decreasing order of the centrality values is obtained. The index at which a node appears in the list is initially considered the tentative ranking of the node. If two or more nodes have the same centrality value, the tie between the nodes is broken in favor of the node with the smaller numerical node identification label. For example, if nodes 4 and 6 both had the same centrality value, then node 4 is ranked ahead of node 6 due to its identification label of 4 being less than 6. The final ranking for a node with respect to a centrality measure is the same as the tentative ranking for the node only if it does not have a tie with any other node for the centrality measure. If two or more nodes have a tie with respect to a centrality measure, their final ranking is the average of the tentative rankings for the nodes with respect to the measure [46]. Define $d_i$ as the difference in the final ranking for node $v_i$ with respect to the two centrality measures $C_X$ and $C_Y$, where $1 \le i \le N$ and $N$ is the number of nodes in the network. Then the Spearman-rank correlation coefficient of the rankings with respect to two centrality measures, $C_X$ and $C_Y$, is calculated as

$$\rho(C_X, C_Y) = 1 - \frac{6 \sum_{i=1}^{N} d_i^2}{N(N^2 - 1)}.$$

Table 1. The ranking of all 15 nodes in the example network $G$ with respect to the degree ($C_D$), eigenvector ($C_E$), closeness ($C_C$), betweenness ($C_B$), and heatmap ($C_{HM}$) centrality measures.
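The Spearman-rank procedure of this section (tentative ranks by decreasing centrality value, ties first ordered by node label and then averaged, followed by the sum of squared rank differences) can be sketched as follows. The function names and the four-node input values are hypothetical and serve only to illustrate the steps.

```python
def final_ranks(values):
    """Map each node to its final rank; tied nodes share the average tentative rank."""
    # Tentative order: decreasing centrality, ties broken by the smaller node label.
    order = sorted(values, key=lambda node: (-values[node], node))
    ranks = {}
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = ((i + 1) + (j + 1)) / 2  # mean of tentative ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman-rank correlation from the squared differences of final ranks."""
    rank_x, rank_y = final_ranks(x), final_ranks(y)
    n = len(x)
    d_squared = sum((rank_x[node] - rank_y[node]) ** 2 for node in x)
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical centrality values for four nodes under two measures.
x = {1: 0.9, 2: 0.5, 3: 0.5, 4: 0.1}
y = {1: 10, 2: 7, 3: 3, 4: 1}
rho = spearman_rho(x, y)
print(rho)
```

For a measure such as the heatmap centrality, where more negative values indicate more influence, the values would need to be negated before ranking under this sketch.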

Kendall-rank correlation coefficient
For any two centrality measures, $C_X$ and $C_Y$, define a pair of nodes $v_i$ and $v_j$ as concordant with respect to $C_X$ and $C_Y$ if the nodes are arranged in the same order by the measures, and discordant with respect to $C_X$ and $C_Y$ if the nodes are arranged in opposite order [46]. For example, if node 4 is ranked above node 6 by both measures $C_X$ and $C_Y$, then the node pair is concordant. Yet, if node 4 is ranked above node 6 by measure $C_X$, but ranked below node 6 by measure $C_Y$, then the node pair is discordant. If a pair of nodes are assigned the same ranking with respect to $C_X$ and $C_Y$, then the node pair is considered neither concordant nor discordant. Then the Kendall-rank correlation coefficient $\tau$ [47] of the rankings with respect to two centrality measures, $C_X$ and $C_Y$, is calculated as

$$\tau(C_X, C_Y) = \frac{N_C - N_D}{N_C + N_D},$$

where $N_C$ is the total number of concordant pairs and $N_D$ is the total number of discordant pairs.
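The counting of concordant and discordant pairs can be sketched directly from the definitions above. The function name and the example rankings are hypothetical, and the sketch assumes that a pair tied in either ranking counts as neither concordant nor discordant.

```python
from itertools import combinations


def kendall_tau(rank_x, rank_y):
    """Return (tau, concordant, discordant) for two rankings of the same nodes."""
    concordant = discordant = 0
    for u, v in combinations(rank_x, 2):
        product = (rank_x[u] - rank_x[v]) * (rank_y[u] - rank_y[v])
        if product > 0:      # same order under both rankings
            concordant += 1
        elif product < 0:    # opposite order
            discordant += 1
        # Assumption: product == 0 (a tie in either ranking) counts as neither.
    tau = (concordant - discordant) / (concordant + discordant)
    return tau, concordant, discordant


# Hypothetical rankings of four nodes; only nodes "b" and "c" swap places.
rank_x = {"a": 1, "b": 2, "c": 3, "d": 4}
rank_y = {"a": 1, "b": 3, "c": 2, "d": 4}
tau, n_c, n_d = kendall_tau(rank_x, rank_y)
print(tau, n_c, n_d)
```

The pairwise loop examines all $N(N-1)/2$ pairs, which is adequate for small illustrations such as the 15-node example network.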

Example calculation
To quantify the extent of the similarity in the ranking of the nodes with respect to the betweenness and heatmap centrality measures on the example network $G$ in Fig 2, the Spearman-rank correlation is

$$\rho(C_B, C_{HM}) = 1 - \frac{(6)(30)}{15(225 - 1)} = 0.946,$$

where $N = 15$, and the Kendall-rank correlation is

$$\tau(C_B, C_{HM}) = \frac{96 - 9}{96 + 9} = 0.829,$$

where $N_C = 96$ and $N_D = 9$. The details of the calculation of the Spearman-rank correlation are provided in Table 2, while those for the Kendall-rank correlation are provided in Table 3.

Table 2. The Spearman-rank correlation for the betweenness ($C_B$) and heatmap ($C_{HM}$) centrality measures is calculated based upon the difference ($d_i$) in the final ranking with respect to the rankings of $C_B$ and $C_{HM}$ for each node ($v_i$) in network $G$.

To assist in the calculation of the concordant and discordant pairs required for the Kendall-rank correlation used in the example, the nodes $v_i$ in the first column of Table 3 are sorted according to their $C_B$ ranking in the second column. The corresponding $C_{HM}$ ranking of each node is listed in the third column. For example, node 8 is ranked second with respect to $C_B$, but ranked fourth with respect to $C_{HM}$. Since the nodes are already sorted with respect to their $C_B$ ranking, the number of concordant pairs $N_C$ for node $v_i$ equals the total number of larger ranks that exist below its $C_{HM}$ ranking. Similarly, the number of discordant pairs $N_D$ for node $v_i$ equals the total number of smaller ranks that exist below its $C_{HM}$ ranking. For example, consider node 12, which has $C_B$ and $C_{HM}$ rankings of 5 and 7, respectively. There are 8 concordant pairs because nodes 1, 2, 4, 7, 11, 13, 14, and 15 have both a $C_B$ ranking below 5 and a $C_{HM}$ ranking below 7. In addition, there are 2 discordant pairs since only nodes 5 and 9 have both a $C_B$ ranking below 5 and a $C_{HM}$ ranking above 7. Similar calculations are performed to determine the number of concordant and discordant pairs for the remaining nodes in the network, with the results displayed in the fourth and fifth columns of Table 3.
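The row-by-row counting used for Table 3 (sort by the first ranking, then count the larger and smaller second ranks that appear further down the list) can be sketched as follows. The function name and the four rank pairs are hypothetical, and the sketch assumes the first ranking contains no ties.

```python
def concordant_discordant_counts(rank_pairs):
    """For rows sorted by the first rank, count larger/smaller second ranks below each row."""
    rows = sorted(rank_pairs)  # sort by the first ranking (assumed tie-free)
    counts = []
    for index, (_, second_rank) in enumerate(rows):
        below = [other for _, other in rows[index + 1:]]
        concordant = sum(rank > second_rank for rank in below)
        discordant = sum(rank < second_rank for rank in below)
        counts.append((concordant, discordant))
    return counts


# Hypothetical (first rank, second rank) pairs for four nodes.
pairs = [(1, 1), (2, 3), (3, 2), (4, 4)]
counts = concordant_discordant_counts(pairs)
total_concordant = sum(c for c, _ in counts)
total_discordant = sum(d for _, d in counts)
print(counts, total_concordant, total_discordant)
```

Summing the two columns gives the totals $N_C$ and $N_D$ that enter the Kendall-rank formula.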

The SI model
Since the diffusion of information can be likened to the propagation of a disease, an epidemic model is proposed to track the information spreading process and identify potential super-spreaders within the network. In particular, a susceptible-infected (SI) model is utilized to simulate the spreading process and examine the spreading capability (i.e., the spreading efficiency) of the nodes within the network. In theory, the SI model identifies influential nodes based upon the idea that an influential node is more likely to have a role in passing along a disease (or analogously, information), and thus, have a stronger spreading capability [48]. The SI model has been used as a baseline model to compare the rankings of centrality measures [35][36][37][38], where the average infection efficiency of nodes is used as a measure to evaluate the effectiveness of a centrality measure [49].
In the SI model, each node belongs to one of two possible states: susceptible or infected. The susceptible state S(t) represents the number of nodes susceptible to, but not yet infected by, the disease at time step t. The infected state I(t) represents the number of nodes that have been infected and are able to spread the disease to susceptible nodes at time step t. Infected nodes can infect susceptible nodes with a fixed probability, and once a node becomes infected, it remains infected. Initially, all nodes are in the susceptible state, with the exception of one node in the infected state. At each time step t, for each infected node, one randomly selected susceptible neighbor becomes infected with probability β. In this paper, the value of the infection probability is set to β = 1 for simplicity in the subsequent experiments. The cumulative number of infected nodes at time step t, denoted by F(t), is used as a measure of the initially infected node's influence at time t. As the time step t increases, F(t) increases and eventually stabilizes at time step t c , denoted by F(t c ). Therefore, the spreading capability F(t c ) is used to illustrate the influence of a particular node, where a larger F(t c ) indicates a stronger influence.
Although the top-10 nodes of a network may be equivalent among two measures, the ordering of the nodes may still differ. In order to demonstrate the effectiveness of the ranking with respect to one centrality measure over that with respect to another centrality, the spreading capability of the top-10 ranked nodes among each centrality measure is compared. Following a similar direction taken by Qiao et al. [35], the top-10 nodes, collectively, serve as the source of the infection. Although F(t) stabilizes at different values of t c for each real-world network, the average infection capability of the top-10 nodes is calculated at each time step and used to examine the infection ability of the top-10 nodes within each centrality ranking. Because only one randomly selected susceptible neighbor becomes infected with probability β = 1 at each time step, the spreading process is repeated 100 times independently to eliminate any environmental randomness. It is noted that this SI model with β = 1 is a modified version of the standard SI model, in which all the susceptible neighbors of an infected node have a possibility to become infected [7].
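A minimal sketch of the modified SI process described above is given below, in which each infected node attempts to infect one randomly selected susceptible neighbor per time step with probability $\beta$, and the cumulative number of infected nodes is averaged over repeated runs. The function names, the fixed random seed, and the path network are assumptions made for illustration, as is the choice to let two infected nodes select the same susceptible neighbor within a step.

```python
import random


def si_spread(graph, seeds, steps, beta=1.0, rng=None):
    """Return the cumulative number of infected nodes F(t) for t = 0..steps."""
    rng = rng or random.Random()
    infected = set(seeds)
    history = [len(infected)]
    for _ in range(steps):
        newly_infected = set()
        for u in sorted(infected):
            # Neighbors that were still susceptible at the start of this step.
            candidates = [w for w in graph[u] if w not in infected]
            if candidates and rng.random() < beta:
                newly_infected.add(rng.choice(candidates))
        infected |= newly_infected
        history.append(len(infected))
    return history


def average_spread(graph, seeds, steps, runs=100, beta=1.0, seed=0):
    """Average F(t) over independent runs (seed is a hypothetical parameter)."""
    rng = random.Random(seed)
    totals = [0] * (steps + 1)
    for _ in range(runs):
        for t, count in enumerate(si_spread(graph, seeds, steps, beta, rng)):
            totals[t] += count
    return [total / runs for total in totals]


# Hypothetical path network 0-1-2-3-4 with node 0 as the single source.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
curve = average_spread(path, seeds=[0], steps=4)
print(curve)
```

Passing the top-10 nodes of a ranking as `seeds` would correspond to the collective-source setup used in the comparison of centrality measures.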

Simulation and analysis
The codes and the calculation of the heatmap and well-known centrality measures, along with the spreading capability from the SI model, are generated and executed using version 1.2.4.2 of the igraph package in R [50]. The datasets for the real-world scale-free networks are downloaded from version 0.1.3 of the networkdata package [51] and version 3.0.16 of the tnet package [52] in R. Finally, the experiments are run in parallel and performed on the RStudio Virtual Machine provided by Pomona College. There are 2 CPUs (Intel Xeon Processor E5 v4 at 2.20 GHz) on the physical server that the RStudio Virtual Machine resides on, and each CPU has 24 cores. With 2GB provisioned per core, the RStudio server has 96 GB of RAM.

Simulated scale-free networks.
To validate the computational efficiency of the proposed centrality measure, the heatmap centrality is applied to simulated Barabási-Albert (BA) scale-free networks of various sizes $N$ and densities $d$, where $d$ is defined as the ratio of the number of edges present in the network to the number of possible edges in the network. The BA model is an algorithm for generating scale-free networks through the preferential attachment mechanism, in which the more connected a node is, the more likely it is to acquire new edges [53]. In particular, the network begins with $m_0$ (in this paper, $m_0 = 1$) nodes. With linear preferential attachment, at each time step one new node is added to the network with $m \le m_0$ edges that connect it to the existing nodes within the network. After $t$ time steps, the BA algorithm generates a network with $|V| = N = t + m_0$ nodes and $|E| = mt$ edges.
For an undirected network with $|V| = N$ nodes and $|E| = M$ edges, the density $d$ is defined as $d = \frac{2M}{N(N-1)}$. With $m_0 = 1$, such that $t = N - 1$ and $M = m(N-1)$, it can be shown that the density in a BA scale-free network is $d \approx \frac{2m}{N}$ [54]. As the betweenness centrality is the slowest among the four well-known measures (refer to Section 2 for the overview of its time complexity), the betweenness measure is used to gauge whether the proposed measure can be computed in an acceptable amount of time. In order to benchmark the runtimes of both the betweenness and heatmap centralities as both the size $N$ and density $d$ of the network increase, BA scale-free networks of size $N$ and density $d$ are constructed using the function sample_pa from the igraph package by specifying both the size of the network $N$ and the number of edges to add in each time step, $m$, such that $m \approx \frac{dN}{2}$. Table 4 contains the values of $m$ used to simulate the scale-free networks of various size and density in the experiments. For each specified size and density, 100 scale-free networks are simulated. For each network in the set of 100 simulated networks, the runtime to execute each centrality measure on the entire network is calculated. For each set of 100 simulated networks, the mean and standard deviation of the runtimes are calculated for each measure.
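The relationship between the number of edges added per step and the resulting density can be illustrated with a simplified preferential-attachment generator. This sketch is not igraph's sample_pa: the function names are hypothetical, attachment is purely proportional to degree, and early nodes attach to fewer than $m$ targets when fewer than $m$ nodes exist, so the edge count falls slightly below $m(N-1)$.

```python
import random


def barabasi_albert_edges(n, m, seed=None):
    """Grow a simple network from one node by degree-proportional attachment."""
    rng = random.Random(seed)
    edges = []
    endpoints = []  # each edge contributes both end nodes, so sampling is degree-weighted
    for new in range(1, n):
        k = min(m, new)  # assumption: early nodes cannot reach m distinct targets
        targets = set()
        while len(targets) < k:
            targets.add(rng.choice(endpoints) if endpoints else 0)
        for target in sorted(targets):
            edges.append((target, new))
            endpoints.extend((target, new))
    return edges


def density(n, num_edges):
    """Ratio of present edges to possible edges in a simple undirected network."""
    return 2 * num_edges / (n * (n - 1))


n, target_density = 200, 0.03
m = round(target_density * n / 2)  # m is approximately d * N / 2
edges = barabasi_albert_edges(n, m, seed=1)
print(m, len(edges), density(n, len(edges)))
```

For $N = 200$ and a target density of 0.03, the sketch selects $m = 3$ and produces a density close to $2m/N$.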
For each set of 100 simulated networks, the mean and standard deviation of the basic structural features of networks of size $N$ and density $d$ are calculated and are summarized in Tables 5-7, where $\langle C_D \rangle$ denotes the average degree of a node, $\langle cc \rangle$ denotes the clustering coefficient (i.e., a measure of the degree to which nodes in a network tend to cluster together), and diameter denotes the length of the longest shortest path between any two nodes.
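The structural features listed above can be computed for a small network as in the following sketch. The function name and the example network are hypothetical, and the clustering coefficient is taken here to be the global transitivity (closed triples divided by all connected triples), which is an assumption since the text does not state which variant is used.

```python
from collections import deque


def structural_features(graph):
    """Return density, average degree, global clustering coefficient, and diameter."""
    n = len(graph)
    m = sum(len(neighbors) for neighbors in graph.values()) // 2
    closed = triples = 0
    for v in graph:
        neighbors = list(graph[v])
        k = len(neighbors)
        triples += k * (k - 1) // 2  # connected triples centered on v
        for a in range(k):
            for b in range(a + 1, k):
                if neighbors[b] in graph[neighbors[a]]:
                    closed += 1
    diameter = 0
    for source in graph:  # one BFS per node; assumes a connected network
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in graph[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        diameter = max(diameter, max(dist.values()))
    return {
        "density": 2 * m / (n * (n - 1)),
        "average_degree": 2 * m / n,
        "clustering": closed / triples if triples else 0.0,
        "diameter": diameter,
    }


# Hypothetical network: a triangle 0-1-2 with a pendant node 3 attached to node 2.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
features = structural_features(graph)
print(features)
```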

Real-world scale-free networks
To validate its effectiveness in identifying influential nodes, the heatmap centrality is applied to four real-world scale-free networks: (1) Email, the email communication network of a university in Spain [55], (2) Polblogs, the hyperlink network about the political blogs in the United States [56], (3) USFlights, the airline network consisting of the airports in the United States in 2010 [57], and (4) Facebook, the social network containing the online interactions between students at University of California, Irvine [58].
Email is an undirected and unweighted network that depicts the email communication network at the University of Rovira i Virgili in Tarragona in the south of Catalonia in Spain. In the network, each node represents a user and each edge represents that at least one email was sent between two connected users.
Polblogs is an undirected and unweighted network that depicts the United States political blogosphere data compiled by Lada Adamic and Natalie Glance. In the network, each node represents a blog and each edge represents a hyperlink between two blogs. To create the Polblogs network, the largest connected component from the original network was selected, which lowered the number of nodes by 268 from 1490 to 1222. All 268 nodes that were removed were isolated in the original network. In addition, all edges, which were originally directed, were made undirected. Finally, the network was made simple by removing all multi-edges and loops, which lowered the number of edges by 3 from 16717 to 16714.
USFlights is an undirected and unweighted network of the United States' airports in 2010, where each node represents an airport and an edge is a direct route between two airports. To create the USFlights network, the largest connected component from the original network was selected, which lowered the number of nodes by 286 from 1858 to 1572. All 286 nodes that were removed were isolated in the original network. In addition, all edges, which were originally directed, were made undirected. Finally, the network was made simple by removing all multi-edges and loops, which lowered the number of edges by 11020 from 28234 to 17214.
Facebook is an undirected and unweighted network of the online social interactions that originated from a virtual community among students at UC Irvine. Each node represents a user and an edge indicates that at least one message was sent between two users. To create the Facebook network, the largest connected component from the original network was selected, which lowered the number of nodes by 6 from 1899 to 1893. All 6 nodes that were removed were isolated in the original network. In addition, all edges, which were originally directed, were made undirected. Finally, the network was made simple by removing all multi-edges and loops, which lowered the number of edges by 47899 from 61724 to 13835.
Fig 3 displays the degree distribution of each network, where each distribution exhibits a power law P(k) ~ k^(−γ) with γ > 1, such that each network can be conjectured to be scale-free. The basic structural features of the four real-world networks are summarized in Table 8, where N and M denote the number of nodes and edges, respectively, in the network, d denotes the density, <C D> denotes the average degree of a node, <cc> denotes the clustering coefficient, and diameter denotes the length of the longest shortest path between any two nodes.
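One way to check a degree distribution against the power law P(k) ~ k^(−γ) is an ordinary least-squares fit of log P(k) against log k, with R² as the goodness of fit. The Python sketch below is an illustration only: the function name is hypothetical, and it assumes a plain regression on the empirical frequencies with no binning, which may differ from the fitting details behind Fig 3.

```python
import math
from collections import Counter


def fit_power_law(degrees):
    """Fit log P(k) = c - gamma * log k by least squares.

    Returns (gamma, r_squared). Requires at least two distinct
    positive degrees whose frequencies are not all equal.
    """
    counts = Counter(k for k in degrees if k > 0)
    total = sum(counts.values())
    xs = [math.log(k) for k in counts]
    ys = [math.log(counts[k] / total) for k in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # For a one-variable regression, R^2 is the squared correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return -slope, r_squared


# Toy degree sequence (hypothetical) with frequencies proportional to k^(-2):
# 16 nodes of degree 1, 4 nodes of degree 2, 1 node of degree 4.
toy_degrees = [1] * 16 + [2] * 4 + [4] * 1
gamma, r_squared = fit_power_law(toy_degrees)
```

Because the toy frequencies follow k^(−2) exactly, the fit recovers γ = 2 with R² = 1 up to floating-point rounding; real degree sequences will give R² below 1.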

Efficiency analysis on simulated scale-free networks
The efficiency of the heatmap centrality is demonstrated by comparing its CPU (central processing unit) time for the simulated scale-free networks described in Tables 4-7 against the runtime of the betweenness centrality. In addition, the effectiveness of the heatmap centrality is detailed by comparing both the Spearman-rank and Kendall-rank correlations of its ranking with respect to the rankings of the degree, eigenvector, closeness, and betweenness centralities on the same set of simulated scale-free networks.

Experiment 1: compare the CPU time
In this experiment, the function proc.time in version 3.6.0 of the base package in R [59] is used to measure the CPU time (in seconds) required to execute the betweenness and heatmap centrality measures on each simulated scale-free network of size N and density d. The CPU time is defined as the sum of the "user time" and "system time" values, where "user time" is the CPU time charged for the execution of user instructions of the calling process [59], and "system time" is the CPU time charged for execution by the system on behalf of the calling process [59]. The "user + system" value indicates how much CPU time has been used to execute the algorithm and calculate the centrality value of each of the N nodes in the simulated scale-free network. The betweenness centrality is calculated using the function betweenness in the igraph package [50]. The pseudocode for the heatmap centrality is provided in Fig 1. The functions closeness and degree in the igraph package are used to calculate the closeness and degree centrality, respectively, that are required in the calculation of the heatmap centrality. As shown in Fig 4, the heatmap centrality requires nearly half the CPU time of the betweenness centrality. Although the heatmap and betweenness centralities share the same time complexity, it is noted that the magnitude of the observed differences in CPU times may vary with different function implementations and versions of the packages in R.
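To make concrete what is being timed, the following Python sketch computes a heatmap-style score and measures its CPU time. It is an illustration under stated assumptions, not the pseudocode of Fig 1: the score is read from the paper's description as the farness of a node minus the mean farness of its adjacent nodes, farness is obtained directly by breadth-first search rather than from igraph's closeness, and time.process_time (user plus system CPU time of the current process) stands in for the "user + system" value of proc.time. All function names are hypothetical.

```python
import time
from collections import deque


def farness(adj, source):
    """Sum of shortest-path distances from source to every other node.

    adj maps each node to its neighbors; the graph is assumed to be
    connected, undirected, and unweighted.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return sum(dist.values())


def heatmap_centrality(adj):
    """Farness of each node minus the mean farness of its neighbors.

    Assumed reading of the definition; every node needs at least one
    neighbor. More negative values indicate more influential nodes.
    """
    far = {v: farness(adj, v) for v in adj}
    return {
        v: far[v] - sum(far[u] for u in adj[v]) / len(adj[v])
        for v in adj
    }


def cpu_time(func, *args):
    """Run func(*args) and return (result, CPU seconds used)."""
    start = time.process_time()
    result = func(*args)
    return result, time.process_time() - start


# Toy star graph (hypothetical): node 0 is the hub, nodes 1-3 are leaves.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
scores, seconds = cpu_time(heatmap_centrality, star)
```

In the star graph the hub has farness 3 and each leaf has farness 5, so the hub scores 3 − 5 = −2 and each leaf scores 5 − 3 = 2; the hub therefore receives the most negative value.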

Experiment 2: compare the correlation of the rankings
For each simulated scale-free network of size N and density d, the Spearman-rank and Kendall-rank correlations of the heatmap centrality with respect to the degree, eigenvector, closeness, and betweenness measures are calculated. The results of the Spearman-rank correlation are shown in Fig 5, while those of the Kendall-rank correlation are shown in Fig 6. In this experiment, both rank correlation coefficients highlight similar relationships between each pair of centrality measures, although the values of Kendall's τ tend to be smaller than those of Spearman's ρ.
In particular, the Spearman-rank correlation between the rankings of the heatmap and degree centralities in Fig 5 is strong. With the exception of a few network sizes and densities, the value of ρ between the heatmap and closeness centralities, and between the heatmap and eigenvector centralities, increases as both the size and density increase. Finally, the correlation between the rankings of the betweenness and heatmap centralities is the strongest of all, with the value of ρ suggesting a very strong association between the two rankings.

Fig 3. The degree distribution of the four real-world networks. The degree distribution of each real-world network exhibits a power law distribution P(k) ~ k^(−γ), where γ > 1. The R² calculated from the linear regression analysis on log(P(k)) ~ −γ log(k) is provided for each network as a measure of the goodness of fit for the power law model on the degree distribution. The closer R² is to 1, the higher the degree of fit to the power-law distribution. https://doi.org/10.1371/journal.pone.0235690.g003

Experimental analysis on real-world scale-free networks
To verify the efficiency of the heatmap centrality on real-world scale-free networks, several experiments are conducted to examine its relative advantages and disadvantages. To evaluate the proposed measure, four centrality measures, namely degree, eigenvector, closeness, and betweenness, are also applied to the same set of real-world networks for comparison. The experiments include (1) identifying the ten most influential nodes within each network, (2) measuring the effectiveness of the top-10 nodes with respect to each centrality ranking using a modified susceptible-infected (SI) model, and (3) calculating the correlation of the rankings with respect to each pair of centrality measures. The real-world networks are those summarized in Table 8.

Experiment 1: compare the top-10 ranked nodes
In this experiment, the proposed heatmap centrality measure (C HM), degree centrality (C D), eigenvector centrality (C E), closeness centrality (C C), and betweenness centrality (C B) are employed to identify the top-10 nodes of the four real-world networks: Email, Polblogs, USFlights, and Facebook. The results are shown in Tables 9-12. With respect to the centrality measures C D, C E, C C, and C B, the top-ranked node has the most positive value, while the top-ranked node according to the heatmap centrality C HM has the most negative value. In each table, the nodes that have been identified by all five measures are bolded.
According to the results shown in Table 9, in the Email network, the heatmap centrality C HM shares seven, three, eight, and nine of its top-10 ranked nodes with C D, C E, C C, and C B, respectively. Based upon the results shown in Table 10, in the Polblogs network, the number of top-10 ranked nodes that the heatmap centrality shares with C D, C E, C C, and C B is nine, five, ten, and nine, respectively. From the results in Table 11, in the USFlights network, C HM shares four, two, four, and five of its top-10 ranked nodes with C D, C E, C C, and C B, respectively. Finally, based upon the results shown in Table 12, in the Facebook network, the number of top-10 ranked nodes that the heatmap centrality shares with C D, C E, C C, and C B is eight, seven, eight, and eight, respectively. With the exception of the results taken from the USFlights network, the heatmap centrality shares a minimum of seven of the top-10 ranked nodes with each of the degree, closeness, and betweenness centralities. With respect to the eigenvector centrality, the heatmap centrality shares only three to seven of the top-10 ranked nodes in those same three networks, and two in the USFlights network.

Table 9. The top-10 ranked nodes of the Email network with respect to the degree (C D), eigenvector (C E), closeness (C C), betweenness (C B), and heatmap (C HM) centrality measures. The nodes in bold are identified by all five measures.
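The overlap counts reported above amount to intersecting two top-10 lists, with the one twist that the heatmap centrality ranks its most negative value first. A minimal Python sketch follows; the function names and toy scores are hypothetical, and ties are broken by input order, which is an assumption since the tie-breaking rule is not stated.

```python
def top_k(scores, k=10, smallest_first=False):
    """Return the k top-ranked nodes of a {node: score} mapping.

    smallest_first=True ranks the most negative score first, as for
    the heatmap centrality. Ties keep their input order (assumption).
    """
    ordered = sorted(scores, key=scores.get, reverse=not smallest_first)
    return ordered[:k]


def shared_top_k(heatmap_scores, other_scores, k=10):
    """Count nodes in both the heatmap top-k and another measure's top-k."""
    heatmap_top = set(top_k(heatmap_scores, k, smallest_first=True))
    other_top = set(top_k(other_scores, k, smallest_first=False))
    return len(heatmap_top & other_top)


# Toy scores (hypothetical) for four nodes, compared on the top 2.
toy_degree = {"a": 5, "b": 4, "c": 3, "d": 1}
toy_heatmap = {"a": -3.0, "b": -1.0, "c": 2.0, "d": -2.0}
shared = shared_top_k(toy_heatmap, toy_degree, k=2)
```

Here the degree top-2 is {a, b} and the heatmap top-2 is {a, d}, so one node is shared.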

Experiment 2: compare the average spreading capability of the top-10 ranked nodes
In this experiment, a modification of the susceptible-infected (SI) model is used to estimate the spreading capability (i.e., spreading influence) of the top-10 nodes ranked by the degree, eigenvector, closeness, betweenness, and heatmap centrality measures. Inspired by the direction of Qiao et al. [35], the top-10 nodes ranked by each centrality measure collectively serve as the set of initially infected nodes. As the time step t increases, the total number of infected nodes F(t) increases and finally stabilizes at time step t c, at which time there are no susceptible nodes within the network. Due to the size of each real-world network, the value of t c varies such that F(t) eventually stabilizes around t c = 25, 25, 30, and 25 time steps for the Email, Polblogs, USFlights, and Facebook networks, respectively. The mean and standard deviation of F(t) for each centrality measure are calculated over 100 iterations. Figs 7-10 depict the mean spreading capability F(t) of the top-10 nodes with respect to each centrality measure for each real-world network, where the standard deviation is included as error bars to represent the variability. From Fig 7, in the Email network, the proposed measure shows spreading efficiency similar to that of the degree, closeness, and betweenness centralities, while outperforming the eigenvector centrality. In the Polblogs network, the curve generated from the proposed measure in Fig 8 is nearly identical to those from the degree, closeness, and betweenness centralities, while outperforming that of the eigenvector centrality. The result from Fig 8 is not surprising, as the same nine of the top-10 nodes are identified by the degree, closeness, betweenness, and heatmap centralities.
From Fig 9, in the USFlights network, the curve produced by the heatmap centrality is steeper than the curves of the degree, eigenvector, and closeness centralities, but is similar to the curve produced by the betweenness centrality. Finally, from Fig 10, in the Facebook network, a similar spreading efficiency is demonstrated among all five measures, with the curves almost entirely overlapping. In conclusion, the heatmap centrality presents a spreading efficiency that is, overall, similar to that of the other measures.

Table 11. The top-10 ranked nodes of the USFlights network with respect to the degree (C D), eigenvector (C E), closeness (C C), betweenness (C B), and heatmap (C HM) centrality measures. The nodes in bold are identified by all five measures.
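The spreading experiment can be sketched in Python as a discrete-time SI simulation that starts from a seed set and records the number of infected nodes F(t) at each step. The details of the modified SI model are not given in this section, so the sketch assumes a standard rule in which every infected node infects each susceptible neighbor independently with probability beta per time step; beta, the function names, and the toy graph are all hypothetical.

```python
import random
from statistics import mean, stdev


def si_run(adj, seeds, beta, steps, rng):
    """One SI run; returns [F(0), F(1), ..., F(steps)]."""
    infected = set(seeds)
    history = [len(infected)]
    for _ in range(steps):
        newly_infected = set()
        for u in infected:
            for w in adj[u]:
                # Assumed rule: independent infection with probability beta.
                if w not in infected and rng.random() < beta:
                    newly_infected.add(w)
        infected |= newly_infected  # infected nodes never recover in SI
        history.append(len(infected))
    return history


def spreading_curve(adj, seeds, beta=0.1, steps=25, runs=100, seed=0):
    """Mean and standard deviation of F(t) over repeated runs (runs >= 2)."""
    rng = random.Random(seed)
    curves = [si_run(adj, seeds, beta, steps, rng) for _ in range(runs)]
    per_step = list(zip(*curves))  # regroup values by time step
    return [mean(c) for c in per_step], [stdev(c) for c in per_step]


# Toy path graph 0-1-2-3 (hypothetical) seeded at node 0.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
mean_f, sd_f = spreading_curve(path, seeds=[0], beta=0.3, steps=10, runs=100)
```

In an actual comparison, the seed set would be the top-10 nodes of one centrality measure, and the resulting mean curves with their standard deviations would be plotted side by side for the five measures.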

Experiment 3: compare the correlation of the rankings
To quantify the agreement among the rankings produced by the centrality measures in real-world scale-free networks, both the Spearman-rank and Kendall-rank correlation coefficients are adopted, and the results are displayed in Tables 13 and 14, respectively. Among the correlations of the rankings with respect to the centrality measures, the values of both ρ and τ are strongest for the betweenness and heatmap centralities. In particular, the correlations across the four real-world networks are 0.88 < ρ(C HM, C B) < 0.98 and 0.70 < τ(C HM, C B) < 0.88. In a comparison with the remaining three measures, the ranking of the heatmap measure is most highly correlated with that of the degree centrality and least correlated with that of the eigenvector centrality. In summary, while displaying a strong correlation with the degree centrality, the heatmap centrality possesses the strongest correlation with the betweenness measure in each of the real-world networks.
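The two rank correlation coefficients can be computed from first principles, as in the Python sketch below. It is an illustration rather than the R routines used for Tables 13 and 14: Spearman's ρ is taken as the Pearson correlation of average ranks, Kendall's τ is taken as the tie-corrected τ-b (an assumption, since the variant is not stated), and the function names are hypothetical. Because the heatmap centrality ranks its most negative value first, its scores would need to be negated before being compared with the other measures.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def kendall_tau_b(x, y):
    """Kendall's tau-b by direct pair counting (quadratic time)."""
    concordant = discordant = tied_x = tied_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0:
            tied_x += 1
        if dy == 0:
            tied_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / math.sqrt(
        (pairs - tied_x) * (pairs - tied_y)
    )


# Toy scores (hypothetical) for five nodes under two measures.
measure_a = [1, 2, 3, 4, 5]
measure_b = [2, 1, 4, 3, 5]
rho = spearman_rho(measure_a, measure_b)
tau = kendall_tau_b(measure_a, measure_b)
```

For the toy scores, eight of the ten node pairs are concordant and two are discordant, giving τ = 0.6, while the rank differences give ρ = 0.8. This matches the general pattern noted in the text that τ tends to be smaller than ρ for the same pair of rankings.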

Discussion
Motivated by a different interpretation of the "shortest path" between two nodes, this paper aims to explore the properties of a new centrality measure, the heatmap centrality, as a potentially viable measure in the identification of super-spreader nodes in scale-free networks. Although high degree nodes, high betweenness nodes, and high closeness nodes have been identified as good initial spreaders, the heatmap centrality utilizes features from all three measures to strike a balance between accuracy and algorithmic simplicity in the identification of the super-spreader nodes within real-world networks. By definition, the heatmap centrality may be considered a "shortest path" based measure, in that it identifies an influential node as one with a higher likelihood of having information pass through the particular node by considering the difference in the node's farness and the average sum of the farness of its adjacent neighbors. To verify the effectiveness of the heatmap centrality, two experiments based upon simulated scale-free networks and three experiments based upon four real-world scale-free networks are conducted.
The results of the experiments based upon the simulated scale-free networks are:
1. The proposed heatmap centrality on sparse networks can be computed in an acceptable amount of time as both the size and density of the network increase.
2. With the exception of a few network sizes and densities, the heatmap measure exhibits an increasing correlation with respect to the eigenvector and closeness centralities as both the size and density of the network increase.
3. Regardless of the size or density of the network, the correlations of the heatmap centrality with the degree and betweenness centralities are strong.
4. The correlation between the heatmap and betweenness centralities is higher than the correlation between the heatmap centrality and any other measure.
The results of the experiments based upon the real-world scale-free networks are:
1. With the exception of the USFlights network, the heatmap centrality shares at least seven of the top-10 ranked nodes with each of the degree, closeness, and betweenness centralities.
2. Using a modification of the standard SI model, the heatmap centrality presents a spreading efficiency similar to that of the other measures.
3. The degree and betweenness centrality measures each possess a strong correlation with the heatmap centrality.
4. In comparison to the other measures, the heatmap centrality is most strongly correlated with the betweenness centrality.
In summary, the properties of the heatmap centrality as a potential measure to identify super-spreader nodes in scale-free networks are highlighted through the experimental results. In particular, the proposed measure may be executed in an acceptable amount of CPU time, can successfully identify top-10 nodes, and possesses the strongest correlation with the betweenness centrality measure.
In this paper, the proposed heatmap centrality measure is applied to undirected and unweighted scale-free networks. Yet, a real-world scale-free network may be directed and/or weighted. Therefore, future research includes an evaluation of the proposed measure on directed and weighted scale-free networks, as well as on other network models, such as random networks or small-world networks. In addition, future work includes exploring the relationship of the heatmap centrality with other diffusion and flow-based centrality measures, such as random-walk betweenness, flow betweenness, random walk, entropy, and information centrality.