A Bridge Role Metric Model for Nodes in Software Networks

A bridge role metric model is put forward in this paper. Compared with previous metric models, our representation of a large-scale object-oriented software system as a complex network is inherently more realistic. With nodes and links extracted into an undirected network, we present a new model that captures the crucial connectivity of a module or hub, rather than only its centrality as in previous metric models. Two previous metric models are described for comparison. In addition, the relation between the results and the node degrees is well fitted by a power law. The model represents many realistic characteristics of actual software structures, and a hydropower simulation system is taken as an example. This paper contributes to an accurate understanding of module design in software systems and is expected to benefit software engineering practice.


Introduction
Large-scale software systems have grown quickly with the rapid development of software engineering. Understanding, measuring, and controlling their design are therefore significant challenges for designers and have attracted considerable attention. There are many studies on software metric methods, such as property-based [1], junction point [2], productivity [3] and combination [4] methods, but "a common approach of using simple regression models to predict software defects [5][6][7][8] can lead to risk management decisions". In 2002, Valverde et al. applied complex networks to the measurement of software structures, representing the software structure as a complex network. Scale-free and small-world characteristics have been identified [9], and subsequent studies have determined that software networks extracted from various software systems also follow power-law degree distributions [10][11][12][13][14][15][16][17][18][19], exhibit strong community structure [10,20], and show some behavioral characteristics of complex networks [13,21,22,23,24,25,26]. Furthermore, the methodology of dependability in software networks based on three dimensions of structure has been discussed, and structural stability in software has been analyzed along the dimension of composition. Because design patterns are reusable in object-oriented software systems, they are regarded as typical structures that have a greater effect on the whole [27,28]. The authors of [29] studied nine large object-oriented software networks and found that the graphs associated with these networks are self-similar. They also studied the time evolution of the fractal dimension during software system growth and found a significant correlation between complexity metrics and the fractal dimension. A systematic empirical analysis of the statistical properties of communities is presented in [30].
On another research front, some studies have tried to develop a metric for the role of a module in software networks, but few models can describe the "bridge" role of a module accurately. The Weighted OO Software Coupling Network, with weighted nodes, is proposed in [31], where the weight and out-degree follow a power-law distribution. The main metric parameters of software networks are introduced in detail in [32,33] and integrated into a hierarchical metric set. The analytic results in [34] reveal that most of the parameters of complex systems can also be used to represent properties of software structures; some efficient metrics and methods based on basic parameters of other complex systems are introduced, and a practical example demonstrates the validity and effectiveness of the proposed metrics. Reference [35] describes some recent algorithms that appear to work as well as some algorithms based on betweenness, which is one of the most important metrics of the centrality of a module in a software network. Another important metric model, closeness, is introduced in [36], which performs a regular and macroscopic analysis and subsequently uses the method to measure important features and characteristics. The relation between the integral measure and identities facilitates important proofs for the qualification of software qualities. This paper is motivated by the above considerations. The metric parameters and models proposed in [10,32,33] represent properties of software structure that are independent of the connectivity role of a module, and the models in [34][35][36] represent the centrality of a module in software networks, which is different from connectivity. Hence, a new model is proposed from a new perspective. Some modules exhibit stronger connectivity than others: if a fault occurs in such a module, its neighbors can no longer connect to each other.
A "bridge" is used to represent the connectivity of a module in the software network; therefore, a bridge role metric model that can serve more accurately as a metric for this characteristic is proposed. The remainder of this paper is outlined as follows. After describing the bridge role metric model in section 2, we compare this model with two previous models and analyze the correlation between the Bre results and other fundamental metrics. In section 4, an actual hydropower system is taken as an example to demonstrate the validity of the model, and the implications of design principles for software structure are discussed. In section 5, the conclusion is presented and future studies are proposed.

Software networks
Software is a system composed of many interacting and collaborating units that reflect coding, design, and execution. The extraction of a network from code is displayed in Fig. 1. In particular, some modules are reused by or rely on other modules, and the dependency relations between two modules A and B are of two types: inheritance and association. If A makes reference to B (through either association or inheritance) in its definition, there is an edge directed from A to B, and vice versa. Hence, a software network is defined as $G = \langle V, E \rangle$, where $V = \{v_i \mid i = 1, \ldots, N\}$ is the set of nodes, which represent modules, and $E = \{e_{i,j} \mid i = 1, \ldots, m;\ j = 1, \ldots, n\}$ ($|E| = M$) is the set of edges, which denote relations between modules. Repeated edges between modules are not considered. The software is regarded as an undirected network, so $e_{i,j} = e_{j,i}$. Node $i$ is characterized by parameters such as the degree $k_i$, the closeness $C_C(i)$, and the betweenness $C_B(i)$, which are presented in section 3. In this paper, approximately 100 randomly selected software systems (listed in the table in Appendix S1) from the open source community (http://sourceforge.net, http://code.google.com/hosting/ and http://www.oschina.net) are chosen as empirical cases.
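The extraction step described above can be illustrated with a minimal sketch. It assumes the dependencies have already been parsed into (module, referenced module) pairs; the function name `build_software_network` and the decision to drop self-references are our assumptions, not part of the original method.

```python
def build_software_network(dependencies):
    """Build an undirected simple graph from (module, referenced_module) pairs.

    Returns a dict mapping each module to the set of its neighbors.
    """
    adjacency = {}
    for a, b in dependencies:
        # Register both modules so that isolated ones still appear as nodes.
        adjacency.setdefault(a, set())
        adjacency.setdefault(b, set())
        if a != b:  # assumption: a module referencing itself adds no edge
            # Sets discard repeated edges; adding both directions gives e_ij = e_ji.
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def edge_count(adjacency):
    """M = |E|; every undirected edge is stored twice, once per endpoint."""
    return sum(len(neighbors) for neighbors in adjacency.values()) // 2
```

With this representation the degree $k_i$ of a module is simply `len(adjacency[i])`.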

Bridge role metric model
As mentioned above, two metric models can measure the centrality of a node in the software network, i.e., the closeness $C_C(i)$ [36] and the betweenness $C_B(i)$ [35]. In the definition of closeness, $N$ is the number of nodes in the software network, $i \neq j$, and $d_{ij} = d_{ji} = 1$ if there is a direct connection between nodes $i$ and $j$; otherwise, $d_{ij} = d_{ji} = d_{i v_1} + d_{v_1 v_2} + \cdots + d_{v_{m-1} v_m} + d_{v_m j}$, $m \le n$.
In the definition of betweenness, $d_{st}(i) = 1$ if node $i$ is located on the shortest path between nodes $s$ and $t$, and $d_{st}$ is the number of shortest paths between $s$ and $t$.
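The quantities used in these two definitions can be computed with a breadth-first search. The sketch below is illustrative only: it assumes the common textbook forms, closeness as $(N-1)$ divided by the sum of shortest-path distances and betweenness as the unnormalized sum over pairs $(s, t)$ of the fraction of shortest paths passing through node $i$. The exact normalization used in [35] and [36] may differ, and the helper names are ours.

```python
from collections import deque
from itertools import combinations


def bfs(adjacency, source):
    """Return (dist, sigma): hop distances and shortest-path counts from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:          # first time v is reached
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:  # u precedes v on a shortest path
                sigma[v] += sigma[u]
    return dist, sigma


def closeness(adjacency, i):
    """(N - 1) / sum of distances from i (assumed normalization)."""
    dist, _ = bfs(adjacency, i)
    total = sum(dist.values())  # unreachable nodes are simply ignored
    return (len(adjacency) - 1) / total if total else 0.0


def betweenness(adjacency, i):
    """Sum over pairs s, t (both different from i) of the share of shortest paths through i."""
    info = {s: bfs(adjacency, s) for s in adjacency}
    dist_i, sigma_i = info[i]
    total = 0.0
    for s, t in combinations([v for v in adjacency if v != i], 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s or s not in dist_i or t not in dist_i:
            continue  # s, t, or i lie in different components
        if dist_s[i] + dist_i[t] == dist_s[t]:
            total += sigma_s[i] * sigma_i[t] / sigma_s[t]
    return total
```

Dividing the betweenness sum by the number of node pairs gives a value between 0 and 1, which may be closer to the fractions reported in the comparison with Fig. 4.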
However, closeness and betweenness cannot show the connectivity role effectively; therefore, the bridge role metric model is proposed in this paper and compared with the two metric models above.
The definition is as follows, where node $j$ is any neighbor of node $i$, and $d_{ij} = 1$ if there exists an edge between node $i$ and node $j$; otherwise, $d_{ij} = 0$. We now discuss the value of equation (1). Suppose the number of neighbors of node $i$ is $n$ ($1 \le n \le N-1$); node $i$ and all its neighbors can then be considered as a community. We denote the number of neighbors (including node $i$) of node $k$ ($k \neq i$, $k \neq j$) by $n_k$ ($1 \le n_k \le n$) and the number of neighbors of node $j$ by $n_j$ ($1 \le n_j \le n$). We obtain $p_{kj} = \frac{1}{n_k}$ and

$$\mathrm{Bre}_i = \sum_{j \neq i} \left( \frac{1}{n} + \sum_{k,\, k \neq i,\, k \neq j} \frac{1}{n} \cdot \frac{1}{n_k} \right)^2 .$$

One of the two extreme cases is $n_j = 1$, and the other is $n_k = n$ ($k \neq i$, $k \neq j$), i.e., the community is an $(n+1)$-clique. In the clique case we set $Y = \frac{(2n-1)^2}{n^3}$; the solutions of the equation $Y' = 0$ are $n = 0.5$ and $n = 1.5$, but $n$ is an integer; hence, the extremal solution is $n = 2$ and $\mathrm{Bre}_i = 1.125$. When $n \ge 3$, $(2n-1)^2 < n^3$; therefore, $Y < 1$. Finally, another case should be discussed in which the community is neither a star network nor a clique. In these cases, the maximum can be computed with the recurrence method. Suppose that there are $n(n-1)$ […], where $n = 0$ means that the node is an isolated one.
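The two extreme cases can be worked out explicitly from the specialized expression for $\mathrm{Bre}_i$. The derivation below is a sketch under one assumption of ours: the inner sum runs only over those neighbors $k$ of node $i$ that are also adjacent to $j$, so it is empty when $j$ has no neighbor other than $i$.

```latex
\begin{align*}
% Star case: n_j = 1, so the inner sum is empty for every neighbor j.
\mathrm{Bre}_i &= \sum_{j \neq i} \left( \frac{1}{n} \right)^2
               = n \cdot \frac{1}{n^2} = \frac{1}{n} \\[4pt]
% Clique case: n_k = n, and each j shares n - 1 neighbors k with i.
\mathrm{Bre}_i &= \sum_{j \neq i} \left( \frac{1}{n} + (n-1)\,\frac{1}{n}\cdot\frac{1}{n} \right)^2
               = n \left( \frac{2n-1}{n^2} \right)^2
               = \frac{(2n-1)^2}{n^3} = Y \\[4pt]
% Stationary points of Y treated as a function of a real variable n.
Y' &= \frac{4(2n-1)\,n^3 - 3n^2 (2n-1)^2}{n^6}
    = \frac{(2n-1)(3-2n)}{n^4}
    \quad\Rightarrow\quad n = 0.5 \ \text{or}\ n = 1.5 \\[4pt]
% Integer values around the stationary point n = 1.5.
Y(1) &= 1, \qquad Y(2) = \frac{9}{8} = 1.125, \qquad Y(3) = \frac{25}{27} \approx 0.926
\end{align*}
```

Under this reading, the clique value peaks at $n = 2$ and falls below 1 from $n = 3$ onward, in agreement with the values stated in the text.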
Computing the Bre value can be described with the following algorithm. The hierarchical network is created by placing each node in a hierarchy level according to its centralization. For example, the nodes in the center are placed in the innermost level, so that the whole network resembles a multi-ring network. In each iteration, the nodes of the current hierarchy are removed from $|N|$; when the loop ends, the results are returned.
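As an illustration, the Bre value of a single node can also be evaluated directly from the specialized expression, without the hierarchical procedure. The sketch below is not the authors' algorithm. It assumes that $n_k$ counts the neighbors of $k$ inside the community formed by node $i$ and its neighbors, that the inner sum runs over common neighbors of $i$ and $j$, and that an isolated node receives the value 0; the function name `bre` is ours.

```python
def bre(adjacency, i):
    """Bridge role value of node i in an undirected graph given as {node: set(neighbors)}."""
    neighbors = adjacency[i]
    n = len(neighbors)
    if n == 0:
        return 0.0  # assumption: isolated nodes get 0
    community = neighbors | {i}
    total = 0.0
    for j in neighbors:
        indirect = 0.0
        # Common neighbors of i and j; k != i and k != j hold automatically.
        for k in neighbors & adjacency[j]:
            n_k = len(adjacency[k] & community)  # neighbors of k inside the community
            indirect += (1.0 / n) * (1.0 / n_k)
        total += (1.0 / n + indirect) ** 2
    return total
```

On a star with three leaves this gives 1/3 for the center, and on a triangle it gives 1.125, which matches the two extreme cases discussed above.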

Comparisons
How does the bridge role metric model represent the connectivity of a node? Fig. 2 shows several cases in which the number of edges gradually increases, and node 1 is taken as an example to explain the function of the model. In Fig. 2(a), there are no edges between the neighbors of node 1; hence, the neighbors cannot make contact with each other without node 1. In Fig. 2(b), four pairs of nodes can connect to one another without node 1, and in Fig. 2(c), many more such pairs exist. In Fig. 2(d), any node can connect to any other one. It can be concluded that the connectivity of a node in a given community becomes stronger as the $\mathrm{Bre}_i$ value decreases, as has been theoretically proved in section 2.2 with equation (3), where connectivity means the ability to make other nodes communicate with each other.
As mentioned in section 2.2, two previous metric model parameters are the closeness $C_C(i)$ [36] and the betweenness $C_B(i)$ [35]. How does the metric model in this paper work more effectively than these two models? Node 1 is taken as an example, and comparisons are described as follows.
In Fig. 3, there are two networks that have almost the same structure except for a few edges. Node 1 is in the center of (a) and (b). In (a), nodes 2, 6, and 10 can communicate with each other only through node 1; hence, node 1 acts as a "bridge". In (b), all other nodes can connect to each other without node 1; therefore, in the latter network, the role of node 1 is not very important. The closeness of node 1, $C_C(1)$, is 0.5 in both networks, so it cannot reflect the evidently different role of node 1 as an intermediate node.
Nevertheless, $\mathrm{Bre}_1 = 0.3333$ in the former network and $\mathrm{Bre}_1 = 0.6533$ in the latter network. This shows that the bridge role metric model can reflect the connectivity of a node more effectively than the closeness can.
We now concentrate on the other previous metric model parameter: betweenness. Again, there are two networks in Fig. 4, where the latter has two more edges than the former. In (a), there are a total of 66 shortest paths between the nodes excluding node 1, and node 1 is located on 54 of them; therefore, $C_B(1) = 0.8182$. Meanwhile, node 1 acts as a bridge that allows nodes 2, 6, and 10 to communicate with each other, where $\mathrm{Bre}_1 = 0.25$. In (b), the connectivity of node 1 for nodes 2, 6, and 10 does not change, and $\mathrm{Bre}_1 = 0.25$; however, the number of shortest paths on which node 1 is located decreases, and $C_B(1) = 0.6667$. It should be noted that $\mathrm{Bre}_2$, $\mathrm{Bre}_5$, $\mathrm{Bre}_8$ and $\mathrm{Bre}_{11}$ are altered because of the two extra edges.
From the discussion above, it can be concluded that the metric model proposed in this paper can reflect the connectivity of nodes more effectively than the closeness or the betweenness.

Simulations
Some studies have revealed that software networks follow power-law distributions over a range of the degree $K$, which is the number of edges attached to a node: $P(K) \sim K^{\lambda}$ [36]. It is natural to consider the correlations between the bridge role metric model and other metrics. Fig. 5 shows the correlations of Bre, betweenness, and closeness with the degree $K$ in four familiar software networks.
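The power-law relation $P(K) \sim K^{\lambda}$ can be checked on a given network with a simple log-log fit. The following is a rough sketch that assumes an ordinary least-squares fit on the logarithms of the empirical degree distribution; this is a convenient choice of ours rather than the fitting procedure used for the figures, and the function names are hypothetical.

```python
import math
from collections import Counter


def degree_distribution(adjacency):
    """Empirical P(K): fraction of nodes with degree K, for K > 0."""
    counts = Counter(len(neighbors) for neighbors in adjacency.values())
    n = len(adjacency)
    return {k: c / n for k, c in counts.items() if k > 0}


def fit_power_law_exponent(distribution):
    """Least-squares slope of log P(K) against log K (needs >= 2 distinct degrees)."""
    xs = [math.log(k) for k in distribution]
    ys = [math.log(p) for p in distribution.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

The returned slope is the estimate of $\lambda$; for a distribution that decays with the degree it is negative.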
Typically, centrality (closeness or betweenness) has a significant correlation with the degree; nevertheless, it can be seen in Fig. 5 that the closeness and betweenness increase only slightly as the degree $K$ increases, so the centrality of a node does not depend significantly on its degree. In particular, the Bre values are found to be logarithmic in the degree, which indicates that a node plays a less important connectivity role as its degree increases. Meanwhile, there are more edges between the neighbors of the corresponding node. This correlation contributes to a more accurate understanding of the module in software engineering practice. Reusable modules in a software system obey the following engineering principle: if the reuse rate of a module is high, the module is often redesigned as several additional modules, so its neighbors often use or rely on more than one module, which brings the neighbors "closer" to one another.
The Bre values have a close relation with the edges between their neighbors; therefore, there is most likely a correlation between them and another metric model called Clustering Coefficients (CC2) [34], which also depends on the edges.

CC2 is defined in terms of $E(G_k(v))$, the number of edges among nodes in the $k$-neighborhood of node $v$ ($k = 1, 2$). As seen in Fig. 6, there is an approximately linear correlation between CC2 and Bre. The correlation indicates that increased use between parts of the neighbors ($k = 1, 2$) will inevitably lead to a decrease in the connectivity of the corresponding module. Although the degree has the scale-free characteristic ($P(K) \sim K^{\lambda}$) mentioned above, $P(\mathrm{Bre})$ is not proportional to $\mathrm{Bre}^{\lambda}$, which shows the difference between Bre and $K$ from another point of view. The Bre distribution does not reflect the scale-free [26] nature of the software system, as shown in Fig. 7.
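The quantity $E(G_k(v))$ can be counted directly from the adjacency structure. The sketch below covers only this edge count, not the full CC2 formula of [34]. It assumes that the $k$-neighborhood consists of all nodes within $k$ hops of $v$, excluding $v$ itself; both that reading and the function names are our assumptions.

```python
def k_neighborhood(adjacency, v, k):
    """Nodes within k hops of v, excluding v itself (assumed definition)."""
    seen = {v}
    frontier = {v}
    for _ in range(k):
        # Expand one hop and keep only nodes not reached before.
        frontier = {w for u in frontier for w in adjacency[u]} - seen
        seen |= frontier
    return seen - {v}


def edges_in_k_neighborhood(adjacency, v, k):
    """E(G_k(v)): number of edges whose two endpoints both lie in the k-neighborhood."""
    nodes = k_neighborhood(adjacency, v, k)
    # Each edge is seen from both endpoints, hence the division by two.
    return sum(len(adjacency[u] & nodes) for u in nodes) // 2
```

For $k = 1$ this counts the edges between the direct neighbors of $v$, which is the quantity that drives the Bre value up when neighbors are interconnected.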
To verify the validity of the metric model proposed in this paper for software engineering practice, a hydropower simulation system [34] is taken as an example. The architecture and corresponding networks are shown in Fig. 8. The system was developed by the Embedded Technology Lab in Northeastern for the Fengman station, which was the earliest established large hydropower station. The software has obtained two national software copyrights (No. 0009448 and No. 050963) and has been in operation for more than ten years. The metric model in this paper is used for fault detection in developing version 2.0 of the software. First, the modules are sorted by their Bre values; subsequently, the source code of the modules that have lower values and are not isolated is analyzed. The analysis determined that there is fault-proneness [37] in four modules (the XJ, RD, LP and VoltCurr modules), which leads to overall instability. These modules are basic control units and play significant bridge roles in the system because other modules inherit or use them. The analysis facilitated redesigns to reduce the fault-proneness and enhance stability.
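The selection step of this case study can be sketched as a simple ranking. The example assumes that the Bre value and the degree of each module are already available as dictionaries; the function name `rank_review_candidates` and the `limit` parameter are hypothetical.

```python
def rank_review_candidates(bre_values, degrees, limit=None):
    """Return non-isolated modules ordered by ascending Bre value.

    bre_values: {module: Bre value}; degrees: {module: degree in the network}.
    Lower Bre means a stronger bridge role, so those modules come first.
    """
    candidates = [m for m in bre_values if degrees.get(m, 0) > 0]
    ranked = sorted(candidates, key=lambda m: bre_values[m])
    return ranked if limit is None else ranked[:limit]
```

The modules at the head of the returned list are the ones whose source code would be inspected first.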

Conclusions
The contribution of this paper is the proposed bridge role metric model. Because nodes play different connectivity roles in a software network, we use the Bre metric model instead of the two previous metric models, betweenness and closeness. After providing a definition, the range of the metric value was discussed. The function of the metric model is illustrated with different cases as well as theoretically. Comparisons are also carried out, and the analysis indicates that the model can reflect the connectivity more effectively. Furthermore, it is determined that the Bre values are logarithmic in the degree and proportional to another metric, the clustering coefficient CC2, which indicates that a node plays a less important connectivity role as its degree increases. Nevertheless, $P(\mathrm{Bre})$ is not proportional to $\mathrm{Bre}^{\lambda}$. To verify the validity of the model in software engineering practice, a hydropower simulation system is taken as an example to detect fault-proneness in modules.
However, further work is still required to improve the application of the model in software structure design. First, fault-proneness can most likely be detected through a combination of this model and others ($K$, closeness, etc.). Second, the model requires further validation; we will use software engineering metrics such as coupling and cohesion to support the proposed solution. Additionally, further investigations that extend the metric model to macro- and micro-structure should be carried out in order to estimate the role of a node in the entire software network more effectively.
The work in this paper could facilitate a better understanding of the role of modules in systems. In fact, because local instability most likely leads to global failures, the structure is very important for designers who want to predict the fault-proneness of a module. The metric can help to redesign the structure of software, improve its quality, and subsequently shorten the development life cycle.

Supporting Information
Appendix S1 Appendix to the manuscript. (DOC)