A Bayesian approach to discrete multiple outcome network meta-analysis

In this paper we suggest a new Bayesian approach to network meta-analysis for the case of discrete multiple outcomes. The joint distribution of the discrete outcomes is modeled through a Gaussian copula with binomial marginals. The remaining elements of the hierarchial random effects model are specified in a standard way, with the logit of the success probabilities given by the sum of a baseline log-odds and random effects comparing the log-odds of each treatment against the reference and having a Gaussian distribution centered at the vector of pooled effects. An adaptive Markov Chain Monte Carlo algorithm is devised for running posterior inference. The model is applied to two datasets from Cochrane reviews, already analysed in two papers so to assess and compare its performance. We implemented the model in a freely available R package called netcopula.


Introduction
In the last decades, as the need of evidence based techniques in medical research and clinical practice has been more and more recognized, the use of meta-analysis, introduced with a high level of debate, has become widespread. Nowadays, areas of application of meta-analysis extend beyond medicine and health, being widely used in both natural and social sciences. See [1] for a critical review of the main methodological developments in meta-analysis. Traditionally, meta-analytic techniques make it possible to summarize evidence provided by several studies comparing the same treatments and considering in general one outcome at time. The basic methods combine study-specific treatment effect estimates under a fixed effect or a random effect model (see [2] and [3]). Study specific covariates and individual patients data can be incorporated as well (see for IPD meta-analysis among others [4][5][6] and [7]). Bayesian methods are widely used, making it possible to allow for all parameter uncertainty in the model, to include all relevant information and to extend the models to accommodate more complex scenarios. Advantages of the Bayesian approach are discussed and reviewed in several papers and books, see among others [8] and [9]. The second issue to be addressed in a multivariate generalization of network meta-analytic techniques comes from the fact that not all studies report results on all the considered outcomes. In multiple outcomes meta-analysis, the missing outcomes issue is addressed relying on traditional imputation and data augmentation techniques both in a frequentist and a Bayesian approach ( [29,30]). In a Bayesian approach [26] express the likelihood as a product of marginal distributions over reported outcomes following the approach suggested by Glester and Olkin in [31]. Such techniques make it possible to borrow strength across outcomes and this, as pointed out by [32], reduces the impact of a selective non reporting of the outcomes on the pooled treatment effect estimates.
In this work we adopt a contrast-based perspective to estimate the treatment effects in a network meta-analysis, see [33,34]. Contrast-based models currently represent the most popular methodology in the network meta-analysis literature. However, another approach, called armbased, has also been recently advanced (see [35][36][37][38]). In contrast-based models, a baseline treatment is defined for each study and the focus of the analysis is on the estimation of the relative treatment effects (for example using log odds ratios, or another suitable metric). In this context, the baseline effects are treated as nuisance parameters and they are usually modeled with noninformative prior distributions. This implies that absolute treatment effects cannot be directly obtained unless a reference treatment absolute effect is first estimated using information that are external to the model see among others [39]. On the contrary, arm-based models aim to model the absolute effect of each treatment in a study (for example using the log odds) and the relative treatment effects are then constructed from the arm estimates. Contrast-based models are usually advocated as more theoretically grounded compared to the arm-based approach because the latter discards the randomization structure of the evidence. Moreover, arm-based models are more likely to provide biased estimates of relative treatment effects with increased posterior variances, and they often show a slower convergence especially when some treatments are only included in few studies. On the contrary, arm-based models are more advantageous because they can also incorporate the information provided by single-arm studies. For more details on the pros and cons of the two approaches see [40] and the discussion rejoinder by [41], while for a more technical comparison between arm-based and contrastbased models for network meta-analysis we suggest the recent work by [42].
In this paper we present a new approach to network meta-analysis in the case of discrete multiple outcomes. The model we suggest is a Bayesian hierarchical random effects model that is based on a Gaussian copula likelihood, which allows to incorporate the estimation of withinstudy variances and correlations. Our approach draws on and generalizes the method suggested by [43] based on a Clayton copula model. However the switch from a Clayton copula model to a Gaussian copula model is not straightforward due to the implications in terms of computational problems to be addressed. Indeed in the case of Clayton copula model the correlation between outcomes is modeled through a single univariate parameter, while in the case of a Gaussian copula model the association is modeled through a correlation matrix. Posterior inferences are based on a latent variables adaptive Markov Chain Monte Carlo algorithm, that draws on the suggestions by [44] and [45] for copula regression models and by [46] for the simulation of correlation matrices. The uncertainty due to the missingness problem is addressed and accounted for through a posterior based imputation of missing outcomes at each stage of the algorithm. All codes, data and examples are available in a R package called netcopula, that can be freely downloaded from the following public repository https:// github.com/sergioventurini/netcopula.
The set-up of the paper is as follows. Section 2 provides a description of the suggested model and of the MCMC algorithm devised for running posterior inferences. In Section 3 two applications of the model are provided with a comparison with other two different approaches. Section 4 provides a discussion with final conclusions.

Introduction
Copula models have become widely used in all applied fields since they make it possible to split the specification of a multivariate model into two parts: the marginal distributions on one side and the dependence structure on the other side. In this way, any univariate distribution can be used for modelling the marginal behavior of the considered variables, which can be discrete and continuous. Moreover, marginal distributions belonging to different families can be selected, ensuring a higher flexibility with respect to a traditional modeling with multivariate distributions. The dependency across the variables is then modelled through a copula function that "glues together" the marginal distributions.
In the following, we briefly review the copula based approach for the case of two variables, Y 1 and Y 2 , with marginal cumulative distributions F 1 and F 2 respectively. We want to obtain a bivariate distribution for the vector (Y 1 , Y 2 ) having these two margins. Sklar ([47]) proved that we can always find a function C such that where C(y 1 , y 2 ) is the joint distribution function for a pair of bivariate uniform random variables. Sklar called C copula function and showed three relevant properties (see [48] for a detailed introduction to copula models). The distribution in Eq (1) is constructed from the marginal distributions F 1 and F 2 , while the role of the copula function is to determine the dependence between Y 1 and Y 2 . If the marginal distributions are continuous, differentiating Eq (1) gives the joint density where c(F 1 (y 1 ), F 2 (y 2 )) is the copula density. Eq (2) shows that the copula density controls the level of dependence between Y 1 and Y 2 . The copula function does not determine the distribution of the margins. It merely determines the dependence between the two random variables. There are many copula functions, one of the most popular is the Gaussian copula. In the continuous case a useful way to think at the copula method is that, based on the probability integral transformation on each margin, the original variables are each transformed into uniform random variables U j = F j (Y j ). Indeed no matter is the marginal distribution F j , U j has a uniform distribution and the dependency between the original variables carries through to the transformed uniform distributions. In this way, assuming a copula model as in Eq (1) for the pair (Y 1 , Y 2 ) reduces to considering the following model In the case of discrete marginal distributions, the marginal are steps function, we define U j as associated with the variable Y j through the following inequality Nevertheless this still ensures Y j is uniformly distributed in the interval (0, 1). The method can be easily generalized to the case of more than two variables.

Model specification
We consider a sample of n multi-arm randomized trials and let y ik denote the vector of the number of times each of M outcomes is observed in study i for treatment k, that is y ik = (y ik1 , . . ., y ikM ) > , k 2 T i ¼ f1; . . . ; a i g, where k 2 T i is the set of treatments compared in study i, with the treatment labelled as 1 representing the control (i.e. the baseline) treatment in study i whose efficacy is compared with that of the remaining (a i − 1) treatments. Note that the term "treatment 1" may refer to distinct treatments in the different studies. We suggest to model y ik as the realization of a multivariate discrete random variable Y ik , i = 1, . . ., n and k 2 T i , with distribution built from a Gaussian copula with binomial margins. We assume that for each study i, each arm k and each outcome m where F À 1 ikm ð� j n ik ; p ikm Þ is the inverse cumulative distribution function of a binomial random variable with parameters n ik and p ikm , n ik is the number of patients randomized to arm k in study i, p ikm the treatment-specific probability of an outcome of type m in study i and x ikm is the m-th component of vector x ik , that is the realization of a random vector having a multivariate Gaussian distribution with a arm-specific correlation matrix, Γ k . As previously emphasized, F ikm as the cumulative distribution function of a discrete random variable is a step function, therefore its inverse is a many-to-one function. This indeed complicates the calculations as compared to the continuous case (see [49]). The variables x ikm are latent, not observed variables to be associated with each y ikm . The logistic transformations of the treatment-specific where, μ i = (μ i1 , . . ., μ iM ) > denotes the vector of study-specific baseline effects and δ ik = (δ ik1 , . . ., δ ikM ) > indicates the trial-specific log-odds ratios of treatment k 2 T À i relative to the baseline treatment (i.e. treatment 1) in study i.
Finally, we assume that the random effects have a multivariate normal distribution where t i1 denotes the baseline treatment in study i, t ij is the j-th treatment compared in study i and d j,k = (d j,k,1 , . . ., d j,k,M ) > represents the vector of pooled effects (across trials) of treatment k relative to treatment j. The d j,k are usually the main quantities of interest in a meta-analysis.
The consistency equations where r identifies a treatment chosen as reference, ensure that the correct treatment comparison is used in the network meta-analysis (see [33] and [50]). Note that to guarantee consistency, it is also required that d r,r,1 = � � � = d r,r, M = 0. The matrix S in Eq (3) contains the variances of the random effect δ i,k, j for each treatment k 2 2, . . ., a i and each outcome m = 1. . ., M, and all possible covariances between any two random effects. As the pooled treatment effects, it is common to all studies and for this reason commonly referred to as a matrix that defines the between-study covariance structure, in opposition to the Γ i that models the within-study correlation structure across outcomes. To keep the number of parameters manageable and to allow identifiability of S, we follow [51]and make the following simplifying assumption describes the common between-study covariance structure. In this way, we assume that the variances and covariances of the random effects for each treatment comparison are the same and differ only with respect to the considered outcome. Moreover, such homogeneity assumption along with the structural relationships between the d j,k within trial i imply that the between-arm correlations are assumed to be all equal to 0.5 (see [52]). These correlations between the treatment differences come from the fact that all differences are taken relative to the same control arm, that is, they depend on the same trial baseline effect μ i .

Priors choice
As for the prior assignment, proper priors are selected and we specify them in the application so to be vague. The study-specific baseline effects μ i = (μ i1 , . . ., μ iM ) are assumed to be independent and distributed according to a normal distribution with mean zero and variance s 2 m . As well the pooled (across trial effects) d r,q = (d r,q,1 , . . ., d r,q, M ) are assumed to be independent and identically distributed according to a normal distribution with mean zero and variance s 2 d . As for the matrix S M , it has been shown in the literature that a standard conjugate Wishart prior is overly influential on the corresponding posterior distribution (see among others [26,53] and [54]). Moreover, explicitly representing an informative prior distribution for a covariance matrix is difficult. We therefore follow an alternative strategy by adopting a log-Cholesky parameterization (see [55]) for the precision matrix Σ À 1 M . More specifically, we define with R = {r m,p } being an upper triangular Cholesky factor, with p = 1, . . ., M, m � p. To guarantee that the Cholesky factorization is unique, one has to require the diagonal elements of R to be positive. To avoid constrained estimation, we use the logarithms of the diagonal elements of R. Hence, the covariance matrix is parameterized in terms of the parameter vector β ¼ ð log r 1;1 ; r 1;2 ; log r 2;2 ; r 1;3 ; r 2;3 ; log r 3;3 ; . . . ; log r M;M Þ ð5Þ Finally, we assume that the components of β, β ℓ with ℓ = 1, . . ., M(M+ 1)/2, are a priori independent and all distributed according to a normal distribution with zero mean and variance s 2 ' . The correlation matrices Γ q , with q 2 T are assumed to be independent and uniformly distributed on the space of all correlation matrices.  [45] and [44], we suggest to jointly update all latent variables and the baseline vectors μ i . This is the most delicate step of the algorithm. An adaptive metropolis step is as well foreseen for the updates of the random effects and the baseline log-odds.

Posterior computations
The updates of the copula parameters Γ q are obtained by applying the two-stage parameter expanded reparameterization and Metropolis-Hastings (PX-RPMH) algorithm for simulating a correlation matrix proposed by [46]. Finally, the full conditionals for d and S M are obtained from standard results for Bayesian analysis of multiple regression models, an adaptive metropolis step is used for their simulation.
It is worth emphasizing here that the algorithm also allows for the possibility that the outcomes are reported differently in the studies. In this situation, a simple strategy one can implement consists in analyzing only the subset of events reported by all studies. Even if this suggestion allows to bypass the problem, the risk is that a considerable amount of data may be discarded. In our approach missing data are imputed at each iteration of the algorithm.
A detailed description of devised MCMC algorithm is provided in the Appendix.

Results and discussion
We apply the suggested model to two datasets from two Cochrane reviews. The two datasets have been analysed based on two different models, so that the performance of our model can be assessed and compared against two different approaches. In both cases we assume that the studies share the same Gaussian copula correlation matrix Γ and diffuse priors are chosen for all parameters by setting in particular s 2 m and s 2 d equal to 10 3 and s 2 ' equal to 10 1/8 . In both cases not all outcomes are investigated in all studies. Missing outcomes value y ikm are imputed at each iteration of the algorithm, by imputing the corresponding latent variable value x ikm , from the model predictive distribution. The results come from a long run of the devised algorithm with 350000 iterations of which 300000 are discarded as burn-in. In both cases the convergence of the algorithm is assessed through the R coda package. Both examples can be reproduced since the corresponding scripts are provided as demos.

Home safety
In the first example we consider the data from a subset of a Cochrane review of safety education and provision of safety equipment for injury prevention, see [56] for a description of the methods. The focus is on the evidence relating to the prevention of poisoning injuries. Data come from twenty-two studies on three outcomes are recorded: Medicines Safe storage, Household Products Safe Storage and Poison Center Number Possession. Nine treatments are Overall there are three studies considering all three outcomes, nineteen studies considering two outcomes. All studies but one are two arms. Fig 2 depicts the network graph for the three outcomes.
[51] analyse the same dataset, based on a different approach. Indeed, [51] model the outcomes log odds ratio so that a continuous, Gaussian, multivariate distribution can be used as likelihood. Moreover, in [51] the within-study covariances are taken as known, as they are estimated from the data. In our approach, such matrices are assumed to be unknown so that the model provides as well an estimate of them. The analysis can be replicated by running the script example_homesafety.R to be found in the folder demo of the netcopula package.   Tables 1 to 3 display median estimates along with the highest posterior density (HPD) credibility intervals for the pooled effects estimated according to our model and according to [51], referred to as Model 3 in the original paper. Usual care is taken as baseline effect. In all cases our estimates show a smaller variability, in particular in the estimation of the pooled effect of Education + Home Visit versus Usual Care, of Education + Home Safety Inspection versus Usual Care and of Provision of Free/Low Cost Equipment versus Usual Care. The EDU+HV and EDU+HSI are indeed considered only for one outcome and the effect directly compared against Usual Care. The FE treatment is considered for two outcomes and only indirectly compared against Usual Care. The smaller uncertainty of the median estimates for such treatments

PLOS ONE
shows that our model succeeds in borrowing strength across treatments and outcomes as foreseen. There are differences in the estimates of the pooled-effects for some treatment comparisons. In almost all cases our estimates belong to the corresponding credibility intervals in [51]. A simulation study (not reported here but available as a further demo in the netcopula R package) shows that, for different settings of the true parameters value, our model is able to recover the true values of the pooled effects. Table 4 reports median estimates and highest posterior density (HPD) credibility intervals for the between-study standard deviations and correlations. Again, the estimates produced by the fit of our model show a smaller variability especially in the estimation of the within-study correlations.

Alcohol dependence
In the second example, the data come from a Cochrane systematic review of pharmacology treatments for alcohol dependency. See [57,58] for a detailed description of the methods and [59] for an update. The same data are also analysed in [43]. In particular, the authors model the outcome correlations resorting to a Clayton copula model, with one single parameter then fine-tuning such correlations. Moreover they allow for heterogeneity of the random effects. Again, our analysis can be replicated by running the script example_alcoholdependence.R to be found in the folder demo of the netcopula package.
The data come from forty-one studies and three outcomes are considered: Return to Heavy Drinking, Return to Drinking and Discontinuation. Eleven studies consider all three outcomes, twenty-two studies consider two outcomes and height report results only on one outcome. Three treatments are considered: naltrexone (NAL), acamprosate (ACA) and naltrexone + acamprosate (NAL+ACA). Fig 4 depicts the network graph for the three outcomes.
Tables 5 to 7 reports median estimates along HPD credibility intervals of the treatments pool effects obtained according to our model and [43] model. We can see that in all cases, our estimates show a smaller variability. Table 8 reports median estimates and credibility intervals for the within outcome correlations. Our model estimates a positive weak association between Return to Drinking and Discontinuation, even if there is high uncertainty on the estimation of the correlation between Return to Heavy Drinking and Return to Drinking and Return to Heavy Drinking and Discontinuation.

Conclusion
In this paper we suggest a new model for a multiple outcomes network meta-analysis in the case of discrete outcomes. Our model accounts for both correlation between outcomes and between treatments. Moreover, we deal with the case of missing at random outcomes. In a comparison with two approaches previously proposed in the literature our results show a lower uncertainty. The model we suggest can be extended to the case of outcomes of different kinds, both discrete and continuous. We use here the Gaussian copula, but the model can be easily modified to include a different kind of copula.

PLOS ONE
However, there are some limitations in the suggested approach. The first problem arises in the estimation of the copula correlation matrix. The algorithm should be improved in order to reduce the running time and the uncertainty of the derived estimates. In the proposed approach the correlation matrix is assumed to be unstructured. This in the case of high-dimensional outcome might slow down the convergence of the algorithm. In this case a parametrization of the correlation matrix might be a reasonable choice as investigated in [60] and [61]. Moreover the performance of the model is particularly affected by the number of studies that in general is not high. More specifically, our model includes three orders of latent variables, the variables for the copula, the random effects and the latent variables to be introduced in the imputation step. The smaller number of studies considered in the first example (22) compared those in the second example (41) is in our opinions at the basis of the higher uncertainty in the estimation of the correlation parameters.
Supporting information S1 Appendix. In the following we provide a description of the algorithm used for fitting the multiple outcome network meta-analysis model suggested. For the notation, we refer to

PLOS ONE
the model depicted in Fig 1. (PDF)