Description of network meta-analysis geometry: A metrics design study

Background The conduct and reporting of network meta-analysis (NMA), including the presentation of the network-plot, should be transparent. We aimed to propose metrics adapted from graph theory and social network-analysis literature to numerically describe NMA geometry. Methods A previous systematic review of NMAs of pharmacological interventions was performed. Data on the graph’s presentation were collected. Network-plots were reproduced using Gephi 0.9.1. Eleven geometric metrics were tested. The Spearman test for non-parametric correlation analyses and the Bland-Altman and Lin’s Concordance tests were performed (IBM SPSS Statistics 24.0). Results Of the 477 identified NMAs, only 167 provided enough information on the plot characteristics for their graphs to be reproduced. The median numbers of nodes and edges were 8 (IQR 6–11) and 10 (IQR 6–16), respectively, with 22 included studies (IQR 13–35). Metrics such as density (median 0.39, range 0.07–1.00), median thickness (2.0, IQR 1.0–3.0), percentage of common comparators (median 68%), and percentage of strong edges (median 53%) were found to contribute to the description of NMA geometry. Mean thickness, average weighted degree and average path length produced results similar to those of other metrics, but they can lead to misleading conclusions. Conclusions We suggest the incorporation of seven simple metrics to report NMA geometry. Editors and peer reviewers should ensure that guidelines for NMA reporting are strictly followed before publication.



Methods
Two reviewers performed all of the steps of the systematic review process (i.e. title and abstract reading (screening), full-text appraisal and data extraction) individually, and discrepancies were resolved by a third author (PRISMA checklist-S4 File).
Searches were conducted in two scientific literature database platforms (PubMed and Scopus), without limits for time-frame or language (updated April 25th, 2017). A manual search in the reference lists of included studies and grey literature searches (Google) were also performed. The full search strategies are in the supplementary material (S1 File). We included studies reporting NMAs (e.g. multiple or mixed treatment comparisons/meta-analysis, indirect meta-analysis) comparing any drug therapy intervention head-to-head or against placebo. We considered any type of network (open or closed-loops) of experimental, quasi-experimental, or observational trials. Non-NMAs, study protocols, studies reporting data only on non-pharmacological interventions, and articles written in non-Roman characters were excluded.

Data extraction, metrics proposal and testing
We used a standardized data collection form to extract data on: (i) the general characteristics of the study (authors' names, countries of affiliation, publication year) and (ii) key aspects of the network: presence of a network-plot (graphical representation of comparisons) and description of the geometry, including number of nodes (i.e. interventions), number of edges (i.e. direct comparison evidence), and number of included studies (thickness of the edges).
The network-plots of all included NMA studies were replicated using Gephi 0.9.1 (https://gephi.org/). The network-plot is defined as a graph (G), an ordered pair of nodes (N) or vertices, together with a set of edges (E) or lines. After the replication of NMA plots, we applied eleven adapted descriptive parameters and geometry metrics from previous concepts of social network analyses and graph theory to describe all NMA structures [23][24][25][26]. The definitions of the adapted parameters and metrics are shown in Table 1 (see S2 File for metrics to describe NMAs).

Metrics' statistical analyses and sensitivity
Descriptive analyses were conducted with all parameters and metrics. Variable normality was assessed with the Kolmogorov-Smirnov test and re-evaluated through normal Q-Q plots, which revealed that all variables were non-normally distributed. The variables were then expressed as absolute and relative frequencies.
To test the usability of the eleven proposed parameters and metrics to describe the NMA geometry, we compared the results obtained for each parameter and metric among all the evaluated networks and performed sensitivity analyses including: (i) Comparison of the results obtained for each parameter and metric among networks with different structures, i.e. visual display (geometry), but with the same number of nodes and edges; (ii) Comparison of the results obtained for each parameter and metric among networks with equal structures, i.e. visual display (geometry), but different number of included studies.
Considering the results obtained during the sensitivity analyses, and to explore the relationship between all eleven proposed parameters and metrics, the Spearman test for nonparametric correlations was used. The Bland-Altman plot and Lin's concordance test (concordance correlation coefficient) were used to analyze the agreement between the metrics presenting a moderate-strong correlation. Thus, the aim of these correlation analyses was to evaluate the level of association among metrics and to avoid reporting overlap (i.e. metrics measuring the same characteristic). The parameters and metrics that presented better results during the analyses, identified as relevant to describe NMA geometry, were selected for discussion. All analyses were conducted in IBM SPSS Statistics v. 24.0 (Armonk, NY: IBM Corp.) and probabilities below 5% were considered statistically significant [41][42][43].
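The three analyses named above can be illustrated with a minimal standard-library sketch. It is not the SPSS procedure used in the study: the function names are hypothetical, Lin's coefficient is computed here with population (1/n) moments, and the Bland-Altman limits use the conventional bias ± 1.96 SD of the paired differences.

```python
from statistics import mean, pvariance, stdev


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient (population moments)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return 2 * sxy / (pvariance(x) + pvariance(y) + (mx - my) ** 2)


def bland_altman(x, y):
    """Bias and 95% limits of agreement of the paired differences."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width


# Hypothetical values of two metrics over five networks (not study data).
mean_thickness = [1.0, 1.5, 2.0, 3.0, 5.0]
median_thickness = [1.0, 1.0, 2.0, 2.5, 4.0]
print(spearman(mean_thickness, median_thickness))
print(lin_ccc(mean_thickness, median_thickness))
print(bland_altman(mean_thickness, median_thickness))
```

Correlation and concordance answer different questions: two metrics can rank networks identically (ρ close to 1) while differing systematically in value, which lowers the concordance coefficient.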

Results
The systematic search in PubMed and Scopus yielded 2179 records, of which 690 were fully appraised, and a total of 477 NMAs were considered for the analyses. The network-plot (item S3 of the PRISMA-NMA statement) was displayed by 79.4% of these NMAs, but a minimum description of the network geometry, according to PRISMA-NMA item S4, was presented by only 249 studies (52.2%). Moreover, during the replication of the network-plots, just 167 NMAs (35.0%) provided enough information about the graph geometry to allow its reproduction (e.g. data on the number of studies for each edge). See Fig 1 for the flowchart of this process. The overall results of the geometry of the 167 NMAs, after applying the eleven proposed parameters and metrics, are shown in Table 2. The full assessment of each NMA is available on the Open Science Framework platform (doi: 10.17605/OSF.IO/SP7UM).
Overall, the included networks had a median of 8 'nodes' (IQR 6-11) and 10 'edges' (IQR 6-16) with 22 included 'studies' (IQR 13-35). The 'average degree' (degree of connection) of the networks was 2.55 connections per node (IQR 2.00-3.00). A total of 6 networks presented the lowest value for this metric (1.50), all of them composed of 4 nodes and 3 edges. The highest 'average degree' (5.14 edges per node) was obtained for a network with 7 nodes and 18 edges. The mean 'percentage of common comparators' (nodes with more than one connection) was around 70%, with 38 plots considered strongly connected (100% of nodes as common comparators). Around 35% of networks presented half of their nodes with only one connection ('loose-ends'). The 'density' (total number of connections in the network divided by the number of possible connections) varied from 0.07 for the most poorly connected network (32 nodes and 32 edges) to 1.0 in 12 completely connected networks (e.g. structures with 3 nodes and 3 edges; 4 nodes and 6 edges; 5 nodes and 10 edges). The 'average path length' of the networks (distance between nodes) was 1.69 (IQR 1.50-1.89), varying from 1.00 for small networks (e.g. 3 nodes and 3 edges; 4 nodes and 6 edges) to 5.25 in large networks (plot with 32 nodes and 32 edges).

Table 1. Definitions of the adapted parameters and metrics.
Average degree: The degree of a node is the number of edges incident to the node, with loops counted twice. The total degree of a graph is the sum of the degrees of all nodes. The average degree is a network-level measure. It is calculated as the sum of the degrees of all nodes in the graph, divided by the number of nodes.
Average weighted degree: A graph is a weighted graph if a number is assigned to each edge. In this case, the weight is the number of studies per edge. The weight of the graph is the sum of the weights given to all edges, divided by the total number of nodes.
Density: Density is a measure of the connectedness of a graph, and is defined as the number of connections, divided by the number of possible connections. The graph is dense if the number of edges approaches the maximal number of edges possible (value closer to 1.0); otherwise it is sparse (value closer to 0).
Percentage of common comparators±: Complete graphs have the feature that each pair of nodes has an edge connecting them. In this case, all nodes are directly linked and can be considered 'common comparators'. The higher the percentage of common comparators, the more strongly connected the network is. Unlike density, this metric may better represent the visual display of a network.
Percentage of strong edges±: The number of studies in an edge is proportional to the existing direct evidence between two nodes. Edges with only one study can be considered a weak piece of the network. Strong edges contribute more to the robustness of the evidence. This metric accounts for the percentage of edges with more than one study (named 'strong edges').
Mean thickness±: The thickness of an edge is the number of studies assigned to that edge. The mean thickness of a graph is the total number of studies, divided by the total number of edges. This can be obtained by dividing the average weighted degree by the average degree. However, it does not consider the dispersion of the values.
Median thickness with dispersion value±: Unlike the mean thickness, the median thickness is the median number of studies per edge in a network, reported along with a dispersion measure, the interquartile range (IQR 25% and 75%).
Average path length: The length of a path is the number of edges that the path uses to go from one node to another. The average path length is the average number of steps along the shortest paths for all possible pairs of nodes in the network.
All parameters and metrics were adapted from previous studies on social network analysis and graph theory [23][24][25][26].
±Metrics especially created to support the report of NMA geometry.
https://doi.org/10.1371/journal.pone.0212650.t001
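As an illustration of how the parameters suggested for reporting follow from the Table 1 definitions, the sketch below computes them from a list of direct comparisons and their study counts. It is a minimal sketch, not the Gephi workflow used in the study: the function name, the dictionary layout and the example network are hypothetical, the graph is assumed to be undirected with no repeated edges and at least two edges, and the quartiles use Python's "inclusive" method, which may differ slightly from the values another package would report.

```python
from statistics import median, quantiles


def geometry_metrics(studies_per_edge):
    """Describe a network given {(node_a, node_b): number_of_studies}."""
    degree = {}
    for node_a, node_b in studies_per_edge:
        degree[node_a] = degree.get(node_a, 0) + 1
        degree[node_b] = degree.get(node_b, 0) + 1

    n_nodes = len(degree)
    n_edges = len(studies_per_edge)
    thickness = list(studies_per_edge.values())
    # Quartile conventions differ between packages; "inclusive" is one choice.
    q1, _, q3 = quantiles(thickness, n=4, method="inclusive")

    return {
        "nodes": n_nodes,
        "edges": n_edges,
        "average_degree": 2 * n_edges / n_nodes,
        # Possible connections in an undirected graph: N(N - 1) / 2.
        "density": n_edges / (n_nodes * (n_nodes - 1) / 2),
        # Common comparators: nodes with more than one connection.
        "pct_common_comparators": 100 * sum(d > 1 for d in degree.values()) / n_nodes,
        # Strong edges: edges informed by more than one study.
        "pct_strong_edges": 100 * sum(t > 1 for t in thickness) / n_edges,
        "median_thickness": median(thickness),
        "thickness_iqr": (q1, q3),
    }


# Hypothetical four-treatment network (not taken from the review).
example = {
    ("placebo", "A"): 3,
    ("placebo", "B"): 1,
    ("placebo", "C"): 2,
    ("A", "B"): 4,
}
for name, value in geometry_metrics(example).items():
    print(name, value)
```

In this example treatment C is a loose end (one connection) and the placebo-B edge is weak (one study), so both percentages are 75%.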
The overall 'mean thickness' of the evaluated networks was 2.95 studies per edge. One small network (4 nodes, 5 edges) with 119 trials reached the highest value for this metric (20.00 studies per edge), which corresponded to 13.00 studies per edge (IQR 8.00-34.00) considering the metric 'median thickness'. Eleven networks presented only one study per edge, while 23 networks (plots varying from 3 nodes and 3 edges to plots with 8 nodes and 14 edges) presented all edges (100%) with more than one study ('percentage of strong edges' metric).
The sensitivity analyses highlighted some differences in the metric results for networks with an equal number of nodes and edges, but with different three-dimensional structures (graph display). We have exemplified these differences in Fig 2, using three NMAs included in the systematic review (named A, B, C) that are identical in size, with 5 nodes and 5 edges, because this size was the most frequently reported among the 167 NMAs with graphs provided. Since the total number of included studies in each of these three networks was 5, this variable was not considered in this first sensitivity analysis. 'Density' and 'average degree' values were equal among the three network plots (0.5 and 2.00, respectively). However, differences were noted in the metric 'percentage of common comparators', where networks with more loose-ends (nodes with only one connection) have lower rates of 'common comparators' (60% for networks A and C; 80% for network B). The 'average path length' also differed among these networks, but with a different pattern from the other metrics, with values of 1.50 for structures A and B, and 1.60 for graph C.
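The point of this sensitivity analysis can be reproduced with any pair of networks of the same size. The sketch below uses two hypothetical 5-node, 5-edge graphs, which are not networks A, B and C from Fig 2, and assumes connected, undirected graphs; the function names are illustrative. Both graphs share a density of 5/10 = 0.5 and an average degree of 2 × 5/5 = 2.0, yet differ in the percentage of common comparators and in the average path length, computed here by breadth-first search.

```python
from collections import deque


def neighbours(edges):
    """Adjacency sets of an undirected graph given as (a, b) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def pct_common_comparators(edges):
    """Percentage of nodes with more than one connection."""
    adj = neighbours(edges)
    return 100 * sum(len(n) > 1 for n in adj.values()) / len(adj)


def average_path_length(edges):
    """Mean shortest-path length over all node pairs (connected graph)."""
    adj = neighbours(edges)
    total = 0
    for source in adj:
        distance = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from the source node
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in distance:
                    distance[nxt] = distance[node] + 1
                    queue.append(nxt)
        total += sum(distance.values())
    n = len(adj)
    # Every unordered pair was visited from both ends, hence n * (n - 1).
    return total / (n * (n - 1))


# Hypothetical structures with 5 nodes and 5 edges each.
star_like = [("P", "A"), ("P", "B"), ("P", "C"), ("P", "D"), ("A", "B")]
ring_like = [("P", "A"), ("A", "B"), ("B", "C"), ("C", "P"), ("P", "D")]

for name, graph in (("star-like", star_like), ("ring-like", ring_like)):
    print(name, pct_common_comparators(graph), average_path_length(graph))
```

The star-like graph has two loose ends (60% common comparators) and the ring-like graph has one (80%), while their average path lengths are 1.5 and 1.6, respectively.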
Sensitivity analyses also revealed different metric results for networks with equal geometry structures, but with different numbers of included studies (Fig 3). We have also exemplified this analysis with three similar plots (A, B, C) from our systematic review. In this case, differences were noted in the weight of evidence. 'Average weighted degree', 'mean thickness', and 'median thickness' showed similar performances, presenting higher values
Analyses revealed some strong positive correlations, such as between 'average weighted degree' and 'mean thickness' (Spearman's ρ 0.907; p<0.001) and between 'mean thickness' and 'median thickness' (Spearman's ρ 0.865; p<0.001). 'Percentage of common comparators' also correlated with 'density' (Spearman's ρ 0.626; p<0.001). Negative, but strong, correlations were found for 'percentage of strong edges' with 'average weighted degree' (Spearman's ρ -0.732; p<0.001), with 'mean thickness' (Spearman's ρ -0.867; p<0.001), and with 'median thickness' (Spearman's ρ -0.903; p<0.001) (see Table 3). However, the concordance analyses and Bland-Altman plots showed that 'mean' and 'median thickness' were the only metrics to present substantial agreement (concordance correlation coefficient ρc = 0.820) (see S3 File).

Discussion
Our evaluation of the geometry of 167 NMA plots indicates that the description of some parameters and metrics is crucial to ensure network reproducibility and may help during evidence interpretation, especially because these network plots are readers' first contact with the available evidence. We have adapted and tested the usability of eleven metrics for NMA geometry description, grounded on social network analysis and graph theory literature. Until recently, NMAs were only used by researchers with a strong statistical background, but the development of user-friendly software has popularized this method [2,4]. However, there is a series of conceptual challenges when conducting and reporting an NMA and these should also be considered by clinicians who read such scientific publications [3,6]. Firstly, the presentation of NMA results is not as straightforward as in traditional pairwise meta-analysis [22,44]. The validity of NMA is based on the underlying assumption that there is no imbalance in the distribution of effect modifiers across the different types of direct treatment comparisons, regardless of the structure of the network [8,45].
Previous studies showed that the synthesis methods and analytical processes for NMA conduct and reporting, including the representation of network structure and other diagrams, still need improvement [46,47]. Improvement is also necessary because network structures also seem to have influence on the final results of NMAs. Salanti and collaborators have investigated 18 different NMAs [20] and showed that entirely star shaped networks (or close to this pattern) have only one comparator, typically placebo or no active treatment. This pattern may suggest study treatment preference bias (e.g. publication bias, missing outcome data), with strong or ubiquitous avoidance of head-to-head comparisons of active treatments [21,48], and should be carefully interpreted.
In our analyses, we were able to reproduce only 35% of the NMA-plots found in the systematic review. Part of this issue was due to the lack of a network diagram or minimum description of geometry, as recommended by the PRISMA-NMA statement. Another group of network-plots, although minimally complying with the PRISMA-NMA checklist items, failed to detail some information about the graph (e.g. number of studies included in each edge), which prevented their replication. This highlights the need to review the PRISMA-NMA checklist to standardize the reporting of NMAs, requiring authors to provide a minimum set of parameters and metrics of geometry to allow reproducibility. As we have shown by replicating the network-plots, the graphical presentation of the network provides an accessible and understandable format for describing the evidence, how information flows indirectly, the contribution of certain interventions, and the evidence gaps [37,49]. Usually, the more treatments and studies included in a network, the more clinically informative the NMA is [49]. However, large networks informed by few studies often yield imprecise evidence and may show inconsistencies, whereas a smaller network contains less evidence but may show no clear inconsistencies [50,51]. For this reason, the network graph itself is not enough to provide a complete and transparent picture of the available evidence. Slight modifications in the NMA geometry may also have an impact on the evidence resulting from the analysis and subsequently influence the decision-making process. Thus, in addition to network size, the description of parameters and metrics is useful to supplement graph information [20], especially for distinguishing similar NMAs, as we have demonstrated in our sensitivity analyses. Moreover, a proper geometry description can support the statistical analysis of the NMA, help in obtaining reliable estimates and indicate the need for further trials [37,49].
After testing eleven metrics, we suggest that, besides reporting three obvious items (number of 'nodes', 'edges' and 'number of studies per edge'), four additional metrics should be incorporated in future NMA reports: 'density', 'percentage of common comparators', 'median thickness' (median number of studies per edge with interquartile range) and 'percentage of strong edges'. 'Density' is a measure of the connectedness in a graph, revealing how many edges are needed to complete the network [13,14]. However, in two different NMAs with the same number of nodes and edges, density is identical. This measure is not influenced by the network's three-dimensional display and does not depend on the size of the network. In this case, a complementary measure, the 'percentage of common comparators', was useful for better defining the display of the structure. This metric provides the number of loose-ends (nodes with only one connection) in the network, which represent poorly compared interventions in the literature that should be better investigated in future trials.
On the other hand, the results of 'average path length' were found to be misleading. This metric is commonly used in social network analyses to account for the distance between objects in the network [14,24]. However, for NMAs, the average distance between all of the interventions does not correspond to the number of loose ends or missing edges in the network. Networks with the same number of nodes, edges, and loose ends may have different 'average path length' values that vary according to structure.
Among the measures evaluating the 'weight' of evidence, we found that 'average weighted degree' may also be misleading, since its results are double those obtained by 'mean thickness'. This occurs because the first measure describes the number of studies per connection, while the second shows the number of studies per edge. 'Average degree' and 'average weighted degree' are commonly used in social analyses to report positive and negative edges and their relationships [23,25]. However, since NMA edges have no direction, we suggest the report of 'median thickness' (because it includes a dispersion measure), together with the report of the 'percentage of strong edges'. These metrics seem more reasonable to calculate and interpret, and properly account for the weight of evidence in the network edges.
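The relationship between these weight measures can be written out explicitly. The derivation below is a sketch under one assumption that is not stated in this form in the text: the weighted degree is averaged over nodes, so that the studies of each edge are counted once at each of its two endpoints. The symbols N (nodes), E (edges), w_e (studies on edge e), and the bars for network-level averages are introduced here for illustration only.

```latex
\bar{k} = \frac{2E}{N}, \qquad
\bar{k}_w = \frac{2}{N}\sum_{e=1}^{E} w_e, \qquad
\bar{t} = \frac{1}{E}\sum_{e=1}^{E} w_e
\quad\Longrightarrow\quad
\bar{t} = \frac{\bar{k}_w}{\bar{k}}, \qquad
\bar{k}_w = \bar{k}\,\bar{t}.
```

Under this assumption the 'average weighted degree' equals the 'mean thickness' multiplied by the 'average degree', so it is exactly twice the 'mean thickness' when the network has as many edges as nodes (average degree of 2.00) and close to twice for networks with an average degree near 2.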
Besides the report of these parameters and metrics of NMA geometry, the interpretation of the plots can also benefit from different design approaches. For instance, different colors for the edges can be used to represent the level of confidence of comparisons between treatments (e.g. risk of bias). The amount of evidence can also be weighted in the nodes of the network-plots. Their sizes can proportionally represent the population sample included for each intervention [19]. However, this representation should be carefully evaluated since it can lead to inaccurate conclusions. The final size of a specific node should account for all of the samples of included studies on that specific intervention. There are several graphical tools available for drawing a network-plot and calculating geometric parameters and measures [52,53]. Moreover, software such as R, STATA, or WinBUGS, which is frequently used to perform the NMA statistics, can also be programmed to draw the diagrams and compute network metrics, as well as to improve study reporting [18,27,28,54]. Additionally, authors of NMAs should provide a network graph for each outcome. The certainty of each treatment comparison should be estimated by using a standard approach such as GRADE (Grading of Recommendations Assessment, Development and Evaluation). To facilitate the visualization of the level of evidence (represented by the GRADE panels of outcome-graphs) or the risk of bias (Cochrane risk of bias assessment), different thicknesses or colors for individual edges should be used in NMA graphs [18,31,55,56].
The main strength of our study was to suggest geometry metrics to standardize the report of NMA plot characteristics, aiming to quantitatively measure NMA complexity, which may not be sufficiently evident from visual analysis of the plot alone. These metrics are simple and usable, both for technical and non-technical readers, and may guide further research on this topic. Our study also has some limitations. We included only NMAs of drug interventions and, although our results cannot be immediately translated to other types of NMAs, there is nothing indicating differences among NMAs of different types of interventions. Further research should explore the relationships of network elements and other potential metrics of geometry. Bland-Altman limits of agreement are usually used to assess differences in normally distributed data; however, the literature has shown that this test may be applied to non-normal data without a major impact [41,42].

Conclusions
Overall, seven simple geometry metrics were shown to be useful for describing NMA structure, contributing to data interpretation and reproducibility. Guidelines and recommendations on the conduct and reporting of NMAs should strictly require the display of a network-plot and its complete description based on these metrics. Editors and peer reviewers should also ensure that reporting guidelines, including these items, are followed before publication.