Abstract
Differential networks (DN) are important tools for modeling the changes in conditional dependencies between multiple samples. A Bayesian approach for estimating DNs, from the classical viewpoint, is introduced with a computationally efficient threshold selection for graphical model determination. The algorithm separately estimates the precision matrices of the DN using the Bayesian adaptive graphical lasso procedure. Synthetic experiments illustrate that the Bayesian DN performs exceptionally well in numerical accuracy and graphical structure determination in comparison to state-of-the-art methods. The proposed method is applied to South African COVID-19 data to investigate the change in DN structure between various phases of the pandemic.
Citation: Smith J, Arashi M, Bekker A (2022) Empowering differential networks using Bayesian analysis. PLoS ONE 17(1): e0261193. https://doi.org/10.1371/journal.pone.0261193
Editor: Marton Karsai, Central European University, HUNGARY
Received: June 1, 2021; Accepted: November 24, 2021; Published: January 25, 2022
Copyright: © 2022 Smith et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The data underlying the results presented in the study are available from https://archive.ics.uci.edu/ml/datasets/spambase for the spambase dataset. The corresponding COVID-19 data are available from https://www.nicd.ac.za/diseases-a-z-index/disease-index-covid-19/surveillance-reports/ and https://ourworldindata.org/coronavirus/country/south-africa.
Funding: This work was based upon research supported in part by the National Research Foundation (NRF) of South Africa, SARChI Research Chair UID: 71199; Ref.: SRUG190308422768 grant No. 120839. The opinions expressed and conclusions arrived at are those of the authors and are not necessarily to be attributed to the NRF. The research of the corresponding author is supported by a grant from Ferdowsi University of Mashhad (N.2/55265).
Competing interests: The authors have declared that no competing interests exist.
Introduction
Probabilistic networks are becoming ever-present in a multitude of scientific disciplines. These networks aim to illustrate the relationships, if any, between the components of complex systems [1]. If the data are assumed to be Gaussian distributed with mean μ and covariance matrix Σ, then the precision matrix Θ ≔ {θij}, defined as the inverse of the covariance matrix, Θ ≡ Σ−1, directly determines the conditional dependence relations and structure of the Gaussian undirected graphical model [2].
Differential network (DN) analysis is a statistical methodology that involves functions of at least two graphical models. Let G = (V, E) define a graphical model with nodes V = {1, …, p} and a set of edges E ⊆ V × V. The graph visually depicts the conditional dependence structure between the nodes of the system. The adjacency matrix associated with a graphical model is the binary encoded p × p precision matrix, where an entry of the matrix is equal to 1 if the corresponding precision matrix entry is nonzero and 0 otherwise. Nonzero adjacency matrix entries indicate an edge between the corresponding nodes of G. For this work, the focus is on the difference of two Gaussian graphical models (GGMs), G1 and G2, that share the same set of nodes V. In particular, the edge sets given here are equivalent to the adjacency matrices obtained from the GGM estimation. More specifically, assume that the observations X1 and X2 are generated from p-variate Gaussian distributions, Np(μ1, Σ1) and Np(μ2, Σ2), respectively, where n1 and n2 indicate the respective sample sizes that need not be equal. The interest here is estimating the DN, Δ ≔ Θ1 − Θ2, that is, the difference between the two precision matrices. Numerous measures exist for comparing and evaluating the differences between graphical structures [1]. DN analysis is becoming increasingly popular and important, for example in biological systems, where protein interaction networks can be utilised as informative biosignatures for prevalent diseases [3, 4]. The fundamental idea here is that, if two molecules interact with one another, then a statistical dependency between them should be observed. Additionally, another application of DNs is multivariate statistical quadratic discriminant analysis [5, 6], under the Gaussian distribution assumption.
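To make the objects above concrete, the following minimal Python sketch (with toy, hypothetical precision values, not taken from the paper) binary-encodes two precision matrices into adjacency matrices and forms the DN as the difference Δ = Θ1 − Θ2; the function name `adjacency` is illustrative only.

```python
import numpy as np

def adjacency(theta, tol=1e-8):
    # Binary encoding: 1 where an off-diagonal precision entry is nonzero.
    A = (np.abs(theta) > tol).astype(int)
    np.fill_diagonal(A, 0)
    return A

# Two toy precision matrices sharing the same p = 3 nodes (illustrative values).
theta1 = np.array([[1.0, 0.5, 0.0],
                   [0.5, 1.0, 0.3],
                   [0.0, 0.3, 1.0]])
theta2 = np.array([[1.0, 0.5, 0.2],
                   [0.5, 1.0, 0.0],
                   [0.2, 0.0, 1.0]])

delta = theta1 - theta2   # the differential network Delta
print(adjacency(delta))   # edges where the conditional dependence structure changes
```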
A key component of DN analysis is the estimation of the covariance and precision matrix components. Numerous statistical matrix estimation, as well as graphical model determination, methods exist in the literature. In particular, from a frequentist viewpoint, [7] introduce a computationally efficient neighborhood selection procedure. The lasso is used for covariance estimation, which enjoys consistency for sparse high-dimensional graphs. The approach is quite effective, in that the sparse precision matrix is estimated by fitting the lasso to each variable using the remaining variables as predictors. Finally, the estimated precision matrix entry θij is nonzero if the estimated coefficient of i on j, or vice versa, is nonzero. Importantly, their algorithm can consistently estimate the set of nonzero entries in Θ [8]. For a penalised likelihood methodology for sparse precision matrix estimation see [9, 10]. Furthermore, [11] estimate the undirected graphical model using both a block coordinate descent algorithm and Nesterov’s first order method [12]. Additionally, [13] propose an ℓ1-constrained estimation technique for both sparse and non-sparse high-dimensional matrices, applicable to a wide range of sparsity patterns and classes of matrices; precision estimation in GGMs, for example. For a joint graphical model estimation approach see [14, 15].
Fully Bayesian treatments of GGM estimation are also well rooted in the literature. In particular, [16] introduces the Bayesian adaptive graphical lasso (BAGLASSO), which utilises a generalised Pareto distribution in the hierarchical formulation of the Bayesian graphical lasso. [17] provide a method for graphical model determination by placing positive prior mass on the event that there are no conditional dependencies between variables. For joint graphical model inference from a Bayesian perspective see [18]. Lastly, [19] propose using Kullback-Leibler divergence and cross-validation for graphical model structure estimation.
Background
Recently, a plethora of statistical techniques have emerged for estimating DNs. These techniques can largely be classified into two main categories. The first estimates the individual precision matrices Θ1 and Θ2 separately; the estimated DN is then the difference between the estimated precision matrices. For example, the methods and references for GGM estimation outlined in the introduction can be used to directly estimate Δ. The second methodology estimates both precision matrices simultaneously. The approach here typically penalises a joint loss function for both precision matrices. [20] provide a methodology for inference and estimation of functions of GGMs. In particular, their Intertwined Graphical Lasso (IGL) approach biases the estimation of the precision matrices towards a common value, while their Graphical Cooperative Lasso (GCL) utilises a group penalty that favours solutions with a common sparsity pattern. [14, 21] estimate separate graphical models using a joint penalised loss function. [22] propose a method for estimating Δ directly, which relaxes the need for the individual precision matrices to be sparse or estimated directly. Similarly, [6, 23] utilise an alternating direction method of multipliers (ADMM) algorithm for estimating Δ from a joint ℓ1 penalised convex loss function. More recently, [24] introduce a computationally efficient iterative shrinkage-thresholding algorithm for minimising the ℓ1 penalised loss function defined in [6], namely

Δ̂ = argminΔ L(Δ) + ρ‖Δ‖1, (1)

where L(Δ) = tr(ΔS1ΔS2)/2 − tr(Δ(S1 − S2)) is convex and S1 and S2 are the sample covariance matrices. The DN estimate is obtained by minimising the penalised loss in Eq (1). An analogous symmetric convex loss function and estimator is proposed by [23].
The shrinkage-thresholding algorithm proposed by [24], based on the fast iterative shrinkage-thresholding algorithm in [25], aims to minimise Eq (1). The objective function is given by f(Δ) = L(Δ) + ρ‖Δ‖1, with L(Δ) the convex loss defined above. The lasso tuning parameter, ρ, controls the strength of the penalty term and, as a result, the amount of shrinkage (precision matrix entries shrunk towards zero). The optimisation converges to the solution sequentially using a quadratic approximation and a gradient descent step. The efficiency of the procedure rests on this approach, resulting in superior computational complexity in contrast to the ADMM approaches of [6, 23]. To conclude this section, it is worth noting that the iterative shrinkage-thresholding method will be used for experimental comparison later; a proximal-gradient sketch of the scheme follows.
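The sketch below implements a plain (non-accelerated) proximal-gradient iteration for the penalised loss in Eq (1), assuming the loss form L(Δ) = tr(ΔS1ΔS2)/2 − tr(Δ(S1 − S2)) stated above; the fast variant of [24, 25] adds a momentum step that is omitted here for brevity, and the function names are illustrative.

```python
import numpy as np

def soft_threshold(x, kappa):
    # Elementwise soft-thresholding: the proximal operator of kappa*||.||_1.
    return np.sign(x) * np.maximum(np.abs(x) - kappa, 0.0)

def ista_dn(S1, S2, rho, n_iter=500):
    # Proximal-gradient sketch for min_Delta L(Delta) + rho*||Delta||_1 with
    # L(Delta) = 0.5*tr(Delta S1 Delta S2) - tr(Delta (S1 - S2)).
    p = S1.shape[0]
    delta = np.zeros((p, p))
    # Step size from a Lipschitz bound on the gradient of the smooth part.
    t = 1.0 / (np.linalg.norm(S1, 2) * np.linalg.norm(S2, 2))
    for _ in range(n_iter):
        grad = 0.5 * (S1 @ delta @ S2 + S2 @ delta @ S1) - (S1 - S2)
        delta = soft_threshold(delta - t * grad, t * rho)
        delta = (delta + delta.T) / 2  # keep the iterate symmetric
    return delta
```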
The main contributions of this study are as follows.
- A framework for Bayesian DN estimation is developed. That is, the DN is estimated by separately estimating each Gaussian graphical model, referred to as the components.
- The graphical lasso is applied in the Bayesian precision matrix estimation in order to efficiently capture sparse patterns in the DN, hence developing the BAGLASSO. A threshold selection strategy, based on a conjugate Wishart prior, that accommodates both dense and sparse graphical structure determination is explored. This strategy, applied to each component of the DN, ensures an accurately sparse DN estimate.
- The proposed Bayesian DN improves on existing classical DN estimation for a number of known network structures.
- An R package for the BAGLASSO block Gibbs sampler has been developed for the interested practitioner and is available on The Comprehensive R Archive Network (CRAN) as abglasso.
The Bayesian DN
A fully Bayesian treatment of DNs remains unexplored, and the novel methodology here aims to develop a simple yet highly accurate Bayesian DN estimation procedure. The novel contribution utilises the BAGLASSO as a launching point to separately estimate the components of the DN. The subsections that follow develop the framework for individual component estimation from a Bayesian viewpoint. Moreover, the framework has been developed for low (p = 10) to moderate (p = 50–100) dimensions where n ≥ p.
The Bayesian graphical lasso prior
Recall that the graphical lasso objective is maximising the penalised log-likelihood

Θ̂ = argmaxΘ∈M+ {log det(Θ) − tr(SΘ/n) − ρ‖Θ‖1}, (2)

where M+ is the space of positive definite matrices, S is the sample covariance matrix and n the sample size. Moreover, ρ ≥ 0 is the shrinkage parameter and Θ = (θij) is the precision matrix. The Bayesian connection to the graphical lasso problem is the maximum a posteriori (MAP) estimate, assuming a random sample from Np(μ, Θ−1), of the following posterior

p(Θ | Y) ∝ det(Θ)n/2 exp(−tr(SΘ)/2) ∏i<j DE(θij | λ) ∏i EXP(θii | λ/2) 1(Θ ∈ M+). (3)

The prior distribution is given by the product of a double exponential (DE) with form p(y) = λ/2 exp(−λ|y|) for the off-diagonal elements and an exponential (EXP) with form p(y) = λ exp(−λy)1(y > 0), otherwise. The value of Θ which maximises the posterior density is the graphical lasso estimate in Eq (2) when ρ = λ/n. Within the Bayesian context λ is treated as the shrinkage parameter. The formulation and interpretations of the graphical lasso prior in Eq (3) have been studied in [26]. The aim therein is the development of varying regularization to infer block structures within the graphical models and efficiently estimating the maximum a posteriori of the corresponding posterior distribution. [16] makes use of this prior formulation for its convenience (the scale mixture of Gaussians formulation of the double exponential) in the development of an efficient block Gibbs sampler, in addition to allowing for the use of a gamma hyperprior on the shrinkage parameter λ for improved precision matrix estimation.
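The ρ = λ/n correspondence can be checked numerically; a minimal sketch using scikit-learn's GraphicalLasso is given below, with the caveat that scikit-learn penalises only the off-diagonal entries, so the match with Eq (2) is approximate and purely illustrative.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
n, p, lam = 200, 10, 4.0   # lam plays the role of the Bayesian shrinkage parameter
X = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n)

# The MAP estimate under the graphical lasso prior corresponds (approximately,
# given the off-diagonal-only penalty here) to the graphical lasso with rho = lam/n.
fit = GraphicalLasso(alpha=lam / n).fit(X)
theta_map = fit.precision_
```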
Hierarchical representation
The Gibbs sampler for sampling the precision matrix Θ from the posterior distribution, defined below in Eq (5), associated with the prior in Eq (3), is constructed using a hierarchical representation of Eq (3). This particular hierarchical representation is presented by [16], who follows the same approach as in the development of the Gibbs sampler for the Bayesian lasso in [27]. The Gibbs sampler in [27] utilises the structure of the double exponential distribution as a scale mixture of Gaussians, assuming independence of the conditional double exponential priors [28, 29], in its hierarchical representation to simulate regression parameters from the desired posterior distribution. The positive definite constraint on Θ in Eq (3) implies that the Gaussian components for the θij (DE parameters) in the scale mixture formulation are no longer independent given the scale parameters. To address this issue, the hierarchical representation of the graphical lasso prior in Eq (3) is given by

p(θ | τ, λ) = Cτ−1 ∏i<j N(θij | 0, τij) ∏i EXP(θii | λ/2) 1(Θ ∈ M+), p(τ | λ) ∝ Cτ ∏i<j (λ2/2) exp(−λ2τij/2), (4)

where θ ≔ {θij}i≤j is a vector of the upper triangular matrix entries of Θ and τ = {τij}i<j are the scale parameters. The normalising constant, Cτ, has no closed-form solution. To obtain the marginal distribution in Eq (3), [16] proposes a mixing density proportional to an exponential density with rate parameter λ2/2, and simple substitution circumvents the intractable normalising constant. Finally, the hierarchical representation in Eq (4) is used in the development of the block Gibbs sampler, available in the S1 File, with target posterior distribution

p(θ, τ | Y, λ) ∝ det(Θ)n/2 exp(−tr(SΘ)/2) ∏i<j N(θij | 0, τij) ∏i EXP(θii | λ/2) 1(Θ ∈ M+). (5)
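As an illustration of how the scale mixture is exploited in practice, the sketch below draws the latent scales τij given Θ and λ, assuming the same inverse-Gaussian full conditional as in the Bayesian lasso [27]; the remaining, columnwise updates of Θ are given in the S1 File and the abglasso package, and the function name here is illustrative.

```python
import numpy as np
from scipy import stats

def sample_scales(theta, lam, rng):
    # Data-augmentation step: 1/tau_ij | . ~ Inverse-Gaussian(mean = lam/|theta_ij|,
    # shape = lam**2), as in the Bayesian lasso (an assumption carried over here).
    p = theta.shape[0]
    tau = np.zeros((p, p))
    for i in range(p):
        for j in range(i + 1, p):
            mean = lam / max(abs(theta[i, j]), 1e-12)  # guard against zero entries
            # scipy's invgauss(mu, scale) has mean mu*scale and shape equal to scale.
            u = stats.invgauss.rvs(mu=mean / lam**2, scale=lam**2, random_state=rng)
            tau[i, j] = tau[j, i] = 1.0 / u
    return tau
```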
BAGLASSO
It is well known that the double exponential prior in Eq (3) may over-shrink (under-shrink) large (small) coefficients in Θ. The limitations within a regression context have been studied in [30–32], with alternative proposals. The BAGLASSO, the Bayesian analog of the adaptive graphical lasso [33], exploits the framework and flexibility of the hierarchical representation in Eq (4) to address the aforementioned limitation. This extension serves to improve the accuracy of the precision matrix estimates obtained from the posterior in Eq (5) by allowing for a different shrinkage parameter λij for each corresponding off-diagonal precision matrix entry θij. Recall that the adaptive graphical lasso is given by

Θ̂ = argmaxΘ∈M+ {log det(Θ) − tr(SΘ/n) − ρ ∑i≤j wij|θij|}, (6)

where the wij = 1/|θ̃ij|α for α > 0 are the adaptive weights and the weight matrix Θ̃ = (θ̃ij) is the sample precision matrix.
The form of the Bayesian graphical lasso prior in Eq (3) enables the selection of an appropriate hyperprior on the shrinkage parameter λ; recall that ρ = λ/n in the Bayesian formulation of Eq (2). Adhering to the positive definite constraint on Θ, the prior normalising constant in Eq (3), when a single λ is applied to all elements of Θ, can be handled via a simple change of variables. Thereafter, a gamma prior λ ∼ GA(r, s) and corresponding conditional posterior λ | Θ ∼ GA(r + p(p + 1)/2, s + ‖Θ‖1/2) can be obtained and sampled from. When allowing for individual λij’s for different off-diagonal θij’s, the normalising constant C will inevitably depend on the λij. To address this, a hierarchical formulation can be used to construct a set of prior distributions, serving as the extension of the graphical lasso prior in Eq (3), for the various λij, which mitigates the complications associated with posterior simulation due to the intractable normalising constant. This extension is the BAGLASSO and, assuming a random sample from Np(μ, Θ−1), is given by

p(Θ | {λij}i≤j) = C{λij}−1 ∏i<j DE(θij | λij) ∏i EXP(θii | λii/2) 1(Θ ∈ M+), p({λij}i<j | {λii}) ∝ C{λij} ∏i<j GA(λij | r, s). (7)

The normalising constant C{λij} is intractable, as mentioned above, and the set {λii} are hyperparameters for the diagonal elements of Θ. By construction, the intractable normalising constant cancels in the full conditional of each λij, simplifying computation.
The BAGLASSO selects the amount of shrinkage λij adaptively, according to the current value of θij. To see this, [16] demonstrates that the conditional posterior, λij | Θ ∼ GA(r + 1, |θij| + s), has an expected value that is inversely related to the magnitude of θij. The data-augmented block Gibbs sampler for the hierarchical representation in Eq (7) is the fundamental building block upon which the novel Bayesian DN is devised. A sketch of this adaptive update follows.
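A minimal sketch of the adaptive update, using the conditional posterior quoted above and the hyperparameter values r = 10−2 and s = 10−6 adopted in the synthetic experiments later, might look as follows (function name illustrative).

```python
import numpy as np

def sample_adaptive_lambdas(theta, rng, r=1e-2, s=1e-6):
    # lambda_ij | Theta ~ Gamma(shape = r + 1, rate = s + |theta_ij|) for i < j, so
    # E[lambda_ij | Theta] = (r + 1)/(s + |theta_ij|): large entries are shrunk less.
    p = theta.shape[0]
    lam = np.zeros((p, p))
    iu = np.triu_indices(p, k=1)
    lam[iu] = rng.gamma(shape=r + 1, scale=1.0 / (s + np.abs(theta[iu])))
    return lam + lam.T

# Toy usage: a 5 x 5 precision matrix with constant off-diagonal entries.
lam = sample_adaptive_lambdas(np.eye(5) + 0.4, np.random.default_rng(1))
```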
Technicalities on conditional dependencies
Recall that the precision matrix directly determines the conditional dependence relations and structure of the undirected graphical model. Therefore, correctly estimating precision matrices with sparse structures is essential to adequately gauge the conditional dependency relations between variables. Estimating the precision matrix for both n < p and p ≤ n remains challenging, and regularization is often required [34–36]. A popular choice of prior for Bayesian posterior inference regarding network structure is the conjugate Wishart [37]. An alternative thresholding strategy is presented here, adapted from the recommendation by [32]. In particular, the conjugate Wishart W(3, ϵ Ip) prior is used, where ϵ = 0.001 and Ip is the p-dimensional identity matrix. The corresponding posterior is W(3 + n, (S + ϵ Ip)−1). The posterior samples are used to compute the posterior distribution of the p × p partial correlation matrix P ≔ {ρij}, where ρij = −θij/√(θiiθjj). The recommended strategy here suggests θij ≠ 0 for i ≠ j if

Eh(|ρij| | Y) > η, (8)

where η may vary depending on the underlying graph structure. The Bayesian posterior thresholding recommendation by [16] claims that θij ≠ 0 for i ≠ j if and only if

|ρ̄ij| / Eg(|ρij| | Y) > η, (9)

noting that ρ̄ij is the posterior sample mean estimate of the partial correlation under the graphical lasso prior in Eq (3); g is the standard conjugate Wishart W(3, Ip) and h the conjugate Wishart W(3, ϵ Ip). Moreover, η ∈ [0, 1], with the lower and upper bounds resulting in a completely dense or completely sparse estimate, respectively.
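A sketch of this thresholding strategy, assuming Eq (8) compares the posterior mean absolute partial correlation to η as reconstructed above (function name illustrative):

```python
import numpy as np
from scipy import stats

def wishart_partial_corr_threshold(S, n, eta=0.3, eps=1e-3, n_draws=2000, seed=0):
    # Sample Theta from the conjugate Wishart posterior W(3 + n, (S + eps*I)^{-1}),
    # convert each draw to partial correlations rho_ij = -theta_ij/sqrt(theta_ii*theta_jj),
    # and keep edge (i, j) when the posterior mean of |rho_ij| exceeds eta.
    p = S.shape[0]
    post = stats.wishart(df=3 + n, scale=np.linalg.inv(S + eps * np.eye(p)))
    rng = np.random.default_rng(seed)
    abs_rho = np.zeros((p, p))
    for _ in range(n_draws):
        theta = post.rvs(random_state=rng)
        d = np.sqrt(np.diag(theta))
        abs_rho += np.abs(-theta / np.outer(d, d))
    abs_rho /= n_draws
    A = (abs_rho > eta).astype(int)
    np.fill_diagonal(A, 0)
    return A
```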
The original recommendation for η in Eq (9) is 0.5. The forthcoming synthetic data analysis section describes the simulation procedure, as well as illustrates the performance of the Bayesian DN for different graph structures, namely AR(1), AR(2), sparse random, scale-free, band, cluster, star and circle. The goal here is to suggest a suitable sparsity threshold region under the varying graph structures for the recommended sparsity criterion in Eq (8). The Bayesian DN is applied across all graph structures with thresholds, η, in the range 0.2 to 0.6 in increments of 0.02. The absolute sparsity error is computed for each graph structure under each Bayesian sparsity criterion in Eqs (8) and (9), respectively. The results are based on the median of 40 replications, and the Matthews correlation coefficient (MCC), see [38], is used to determine the best performing threshold. Fig (1a)–(1i) display the optimal threshold, based on the top performing MCC, for each graph structure and Bayesian sparsity criterion for p = 10. The optimal threshold plots for p = 30 and p = 100 are available in the S1 File. The optimal threshold based on Eq (8), η*, for the Bayesian DN is, in most cases, in the neighborhood of the minimum absolute sparsity error and in the region η* ∈ [0.2, 0.4]. Both Bayesian sparsity criterion candidates perform comparably well, noting, however, that Eq (8) requires less computation.
Fig 1. The median of the absolute sparsity error and best performing MCC for various graph structures under varying thresholds for each Bayesian sparsity criterion in Eq (9) (dotted) and Eq (8) (dot-dash) for dimension p = 10. The best performing threshold is indicated by a vertical line with the accompanying MCC value displayed in the legend. (a) Model 1: AR(1). (b) Model 2: AR(2). (c) Model 3: at most 80% sparse. (d) Model 4: at most 40% sparse. (e) Model 5: scale-free. (f) Model 6: band. (g) Model 7: cluster. (h) Model 8: star. (i) Model 9: circle.
Synthetic data analysis
The synthetic experiment is designed to test the parameter estimation and graphical structure determination of the DN estimation for both the novel Bayesian approach (referred to as ‘B-net’) and the iterative shrinkage-thresholding estimator (referred to as ‘D-net’) from [24]. The iterative shrinkage-thresholding estimator uses the lasso penalty and the Bayesian information criterion (BIC) for model estimation and selection, respectively. For all simulations, the assumption is that the observations X1 and X2 are generated from Gaussian distributions Np(0, Σ1) and Np(0, Σ2), respectively. The true DN is Δ = Θ1 − Θ2, where the true precision matrices are Θ1 = Σ1−1 and Θ2 = Σ2−1. The Bayesian DN applies the BAGLASSO in Eq (7) to each sample, i.e. it estimates the precision matrices separately. Furthermore, for excellent performance, set r = 10−2 and s = 10−6 (see the S1 File for more details) for the hyperparameters of the prior distributions of λij for i < j, and λii = 1 for i = 1, …, p. The iterative shrinkage-thresholding approach jointly estimates the precision matrices via Eq (1). The following 9 graphical structure variations are considered in the simulation, where the structure of each is applied to each component in the DN’s composition to achieve the desired structure in the DN itself (a data-generation sketch for structure 1 follows the list):
- structure 1: An AR(1) model.
- Component 1: θij = 0.7|i−j|.
- Component 2: θij = 0.75|i−j|.
- structure 2: An AR(2) model.
- Component 1: θii = 0.1, θi,i−1 = θi−1,i = 0.05 and θi,i−2 = θi−2,i = 0.025.
- Component 2: θii = 1, θi,i−1 = θi−1,i = 0.5 and θi,i−2 = θi−2,i = 0.25.
- structure 3: A sparse random model where both components have approximately up to 80% off-diagonal elements set to zero.
- structure 4: A moderately sparse random model where both components have approximately up to 40% off-diagonal elements set to zero.
- structure 5: A scale-free model where the second component is a scalar multiple of the first.
- structure 6: A band or diagonal model.
- Component 1: θii = 1, θij = 0.2 for 1 ≤ i ≠ j ≤ p/2, θij = 0.5 for p/2 + 1 ≤ i ≠ j ≤ p and θij = 0 otherwise.
- Component 2: θii = 1, θij = 0.7 for 1 ≤ i ≠ j ≤ p/2, θij = 0.9 for p/2 + 1 ≤ i ≠ j ≤ p and θij = 0 otherwise.
- structure 7: A cluster model containing two disjoint groups.
- Component 1: θii = 1, θij = 0.5 for 1 ≤ i ≠ j ≤ p/2, θij = 0.5 for p/2 + 1 ≤ i ≠ j ≤ p and θij = 0 otherwise.
- Component 2: θii = 1, θij = 0.9 for 1 ≤ i ≠ j ≤ p/2, θij = 0.9 for p/2 + 1 ≤ i ≠ j ≤ p and θij = 0 otherwise.
- structure 8: A star model with every node connected to the first node.
- Component 1: θii = 1, θ1,i = θi,1 = 0.1 and θij = 0 otherwise.
- Component 2: θii = 1, θ1,i = θi,1 = 2.1 and θij = 0 otherwise.
- structure 9: A circular model.
- Component 1: θii = 2, θi,i−1 = θi−1,i = 1 and θ1,p = θp,1 = 0.45.
- Component 2: θii = 4, θi,i−1 = θi−1,i = 2 and θ1,p = θp,1 = 0.95.
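As referenced above, here is a minimal data-generation sketch for structure 1; the remaining structures follow the same pattern with their respective component definitions, and the function name is illustrative.

```python
import numpy as np

def ar1_precision(p, phi):
    # Component precision matrix for structure 1: theta_ij = phi**|i - j|.
    idx = np.arange(p)
    return phi ** np.abs(idx[:, None] - idx[None, :])

p, n1, n2 = 10, 100, 100
theta1, theta2 = ar1_precision(p, 0.7), ar1_precision(p, 0.75)
delta_true = theta1 - theta2   # true DN for this structure

rng = np.random.default_rng(42)
X1 = rng.multivariate_normal(np.zeros(p), np.linalg.inv(theta1), size=n1)
X2 = rng.multivariate_normal(np.zeros(p), np.linalg.inv(theta2), size=n2)
```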
The sample sizes and dimensions for each model are n1 = n2 ∈ {50, 100, 200} and p1 = p2 ∈ {10, 30, 100}, respectively. The Bayesian estimates are based on 10000 Monte Carlo iterations after 5000 burn-in iterations. To assess the performance of DN matrix estimation, six loss functions are considered, defined in Table 1, where p denotes the dimension and γi the ith eigenvalue. Notice that some loss functions utilise the true DN matrix and its estimates, while others utilise the eigenvalues and their respective estimates. Table 2 reports the median of L1, L2, EL1, EL2, MAXEL1 and MINEL1 for p = 10, 30, 100 in structures 1−9 based on 40 replications. For each scenario, the best performing measure is boldfaced.
The eigenvalue based loss functions are designed to investigate the extremes of the eigenvalue spectrum. In particular, the MAXEL1 loss function highlights which estimator is favourable in a principal component setting [39]. A few observations are worth noting from Tables 2 and 3. First, the D-net estimator performs better with the AR(1) structure. Second, the B-net estimator performs exceptionally well in the remaining structures. Third, the standard errors for both DN estimation techniques remain relatively consistent across the dimensions considered, with the D-net estimator generally yielding smaller standard errors. This may be due to the fact that the best performing tuning parameter in the solution path leads to highly sparse estimates. The B-net estimation procedure inherits the utilisation of multiple penalty parameters in the precision matrix estimation, leading to robust estimation of the precision matrices.
To assess the performance on graphical structure determination, the specificity, sensitivity, false negative rate, F1 score and MCC are computed, as defined in Table 4. Here, TP, TN, FP and FN denote the number of true positives, true negatives, false positives and false negatives, respectively. Values of specificity, sensitivity, F1 score and MCC closer to one imply better classification performance; the closer the false negative rate is to zero, the better. Further insights on the performance metrics are discussed in [40]. The sparsity of the B-net estimator is determined by the thresholding rule in Eq (8) and the thresholds, η, associated with the MCC values in Fig (1a)–(1i). Similarly, the best performing tuning parameter in the solution path of the D-net algorithm determines the sparsity of that estimator. The median performance scores, based on 40 repetitions, for each graphical structure are presented in Table 5. The main diagonals of the adjacency matrices were not included in the scoring; a sketch of the metric computations follows.
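For reproducibility, the metrics of Table 4 can be computed from the edge-wise confusion counts as in the following sketch (standard definitions assumed; function name illustrative).

```python
import numpy as np

def structure_metrics(tp, tn, fp, fn):
    # Classification metrics from edge-wise confusion counts
    # (main diagonal excluded, as in the scoring above).
    specificity = tn / (tn + fp)
    sensitivity = tp / (tp + fn)
    fnr = fn / (fn + tp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    mcc = (tp * tn - fp * fn) / np.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return dict(specificity=specificity, sensitivity=sensitivity,
                fnr=fnr, f1=f1, mcc=mcc)
```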
The B-net estimator generally outperforms the D-net estimator across all models and all dimensions according to the MCC, f1-score, sensitivity and false negative rate, with the exception of the star case for p = 100. Fig (3a)–(3i) display the true and inferred undirected DN graphs for both the B-net and D-net estimators for p = 10; higher dimensions are available in the S1 File. Lastly, Fig (2a)–(2i) display the true and inferred adjacency matrices for p = 10. Both Figs 2 and 3 visually demonstrate the superiority of the B-net estimator.
Fig 2. Comparison of the true DN, B-net and D-net adjacency matrices for an AR(1), AR(2), sparse random, scale-free, band, cluster, star and circle graphical model and p = 10. (a) Model 1: AR(1). (b) Model 2: AR(2). (c) Model 3: at most 80% sparse. (d) Model 4: at most 40% sparse. (e) Model 5: scale-free. (f) Model 6: band. (g) Model 7: cluster. (h) Model 8: star. (i) Model 9: circle.
Fig 3. Comparison of the true DN, B-net and D-net graphical structure estimates for an AR(1), AR(2), sparse random, scale-free, band, cluster, star and circle graphical model and p = 10. (a) Model 1: AR(1). (b) Model 2: AR(2). (c) Model 3: at most 80% sparse. (d) Model 4: at most 40% sparse. (e) Model 5: scale-free. (f) Model 6: band. (g) Model 7: cluster. (h) Model 8: star. (i) Model 9: circle.
Real data analysis
This section focuses on applying the novel Bayesian DN estimator, B-net, as well as the iterative shrinkage-thresholding estimator, D-net, to the spambase dataset, available at https://archive.ics.uci.edu/ml/datasets/spambase, to investigate changes in DN structure between spam and non-spam data. In addition, the B-net estimator is applied to South African COVID-19 data, obtained from https://www.nicd.ac.za/diseases-a-z-index/disease-index-covid-19/surveillance-reports/, https://ourworldindata.org/coronavirus/country/south-africa and https://mediahack.co.za/datastories/coronavirus, to investigate the change in DN structure between various phases of the pandemic.
Spam data
The objective here is to compare the B-net and D-net graphical model estimates for the spam and non-spam emails. The dataset consists of 1813 spam emails and 2788 non-spam emails. The attributes used in this study include, amongst others, the average length of uninterrupted sequences of capital letters; the total number of capital letters in the e-mail; and an indicator denoting whether the e-mail was considered spam or not.
Following the approach of [24], the data is standardised using a nonparanormal transformation in order to satisfy the Gaussian assumption (a rank-based sketch of such a transformation is given after Fig 4). The B-net estimates are based on 10000 iterations of the Monte Carlo sampler after 5000 burn-in iterations. Fig 4 illustrates the difference between the B-net and D-net estimates. Both estimators indicate the presence of several common hub features, namely ‘edu’, ‘original’, ‘direct’, ‘lab’, ‘telnet’ and ‘addresses’. Both panels suggest that the covariance matrix structures of the spam and non-spam emails may very well differ. Furthermore, given that Hewlett-Packard Labs donated the data, words such as ‘telnet’ and ‘hp’ appear more often in the non-spam emails and can be used to distinguish between spam and non-spam emails.
Fig 4. (a) The Bayesian DN for the spam emails dataset. (b) The iterative-shrinkage DN for the spam emails dataset.
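A rank-based sketch of a nonparanormal-style transform is shown below; note that [24] may use a different variant (e.g. the truncated-ECDF version), so this is illustrative only, and the function name is an assumption.

```python
import numpy as np
from scipy import stats

def npn_transform(X):
    # Map each column to Gaussian scores via its empirical CDF, using the
    # rank/(n + 1) convention so the normal quantiles stay finite.
    n = X.shape[0]
    ranks = np.apply_along_axis(stats.rankdata, 0, X)
    Z = stats.norm.ppf(ranks / (n + 1))
    return Z / Z.std(axis=0)  # rescale columns to unit variance
```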
South African COVID-19 data
The 2019 novel coronavirus (COVID-19) has affected more than 180 countries around the world, including South Africa. The current body of knowledge boasts a wealth of statistical literature that aims at empowering researchers to study and alleviate the impact of the disease, see for example [41]. Understanding the interaction of key metrics and attributes between various phases, cycles or waves of the pandemic may prove invaluable in strategic planning and prevention. The goal here is to use the Bayesian DN, B-net, to illustrate that the interactivity of key daily metrics between suspected homogeneous and heterogeneous phases within the pandemic life cycle is ever changing. In particular, the B-net is used to model the interactivity of daily metrics between the first two peaks or waves; between the first wave and the following plateau; and, finally, between the first and second post-wave plateaus. The data consist of 446 observations from the 7th of February 2020 to the 27th of April 2021. The daily metrics include deaths; performed tests; positive test rate; active cases; tests per active case; recoveries; hospital admissions; hospital discharges; ICU admissions and the number of ventilated patients. It should be noted that no sensitive patient information is used; however, the interested reader is referred to [42] for a detailed treatment and framework for dealing with and sanitizing medical data containing sensitive patient information. Due to the irregularities in data capturing and publishing, a seven-day moving average is applied across all daily metrics. The data is standardised using a nonparanormal transformation in order to satisfy the Gaussian assumption (a preprocessing sketch follows). The B-net is applied to the data using 10000 iterations of the Monte Carlo sampler after 5000 burn-in iterations.
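The preprocessing described above might look as follows, assuming a hypothetical date-indexed data frame with one column per daily metric (the column names and random values below are stand-ins, not the real data) and reusing the npn_transform sketch from the spam example.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the daily metrics (dates x metrics); the real data
# come from the NICD and Our World in Data sources listed above.
dates = pd.date_range("2020-02-07", periods=446, freq="D")
df = pd.DataFrame(np.random.default_rng(7).random((446, 3)),
                  index=dates, columns=["deaths", "tests", "active_cases"])

smoothed = df.rolling(window=7, min_periods=1).mean()   # 7-day moving average
Z = npn_transform(smoothed.to_numpy())                  # transform sketched earlier
```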
Fig 5 highlights the temporal nature of the pandemic between suspected homogeneous and heterogeneous phases. In other words, the cyclical behaviour of individual daily metrics may seem clearly distinctive over time: a peak or wave is always followed by a plateau. Furthermore, extrapolation of the temporal behaviour of individual daily metrics may incorrectly allude to distinct multimodality across multiple daily metrics. Upon observing multiple metrics simultaneously, the crisp group-wise multimodality diminishes rather rapidly. The panels in Fig 6 illustrate the higher proportions of hub features present in the DNs. Interestingly, the Bayesian DN provides insight into the change in interaction between daily metrics across perceived homogeneous pandemic phases, that is, comparisons between the two peaks and the two post-peak plateaus. This change in behaviour could be a result of changes in population adherence to public sanitation awareness; weather conditions; virus mutations or complacency over time.
Fig 5. 7-day moving average filled area line plots with standardised counts for daily new cases; deaths; tests; positive test rate; active cases; tests per active case; recoveries; hospital admissions; hospital discharges; ICU admissions and ventilated patients.
Fig 6. The Bayesian DN and corresponding BAGLASSO graphical models between the first two waves; the first wave and the following plateau; and the difference between the first and second post-wave plateaus. The p-values from Box’s M-test for homogeneity of covariance matrices between the contributing precision matrices were all less than 0.001 [43].
Discussion
The Bayesian differential network estimator is the first of its kind, utilising the excellent graphical structure determination and matrix estimation of the Bayesian graphical lasso [16]. In comparison with the state-of-the-art iterative shrinkage-thresholding approach, the Bayesian differential network offers MCMC output that allows the user to gain deeper insight into, and inference on, the estimation procedure. The numerical accuracy of the Bayesian differential network is, in general, superior to that of the iterative shrinkage-thresholding estimator. Moreover, the Bayesian proposal captures both sparse and dense precision matrix patterns in some well-known graphical structures more accurately. The latter is a result of the Wishart prior’s ability to accommodate the variability in, and adjust to, the data. Furthermore, the thresholding technique for sparse estimation is designed such that it accounts for the effect of prior allocation through the posterior expectation.
Graphical structure learning is a crucial component of the Bayesian differential network estimator. The ad hoc approach provided in Eq (8) suggests a suitable sparsity threshold under varying graph structures. The Bayesian differential network also provides key insights into changes in the interactive behaviour of real data metrics, ranging from filtering spam emails to COVID-19 life cycles. For high-dimensional data, the block Gibbs sampler may be adjusted to incorporate the singular normal distribution presented in [44] in the hierarchical representation in Eq (7). Furthermore, research on simultaneous Bayesian estimation and optimisation of both Θ1 and Θ2 in the construction of the differential network is underway.
Supporting information
S1 File. Supplementary material.
Contains the block Gibbs sampler, as well as additional optimal threshold plots, adjacency heatmaps and graphical network figures for dimensions p = 30 and p = 100.
https://doi.org/10.1371/journal.pone.0261193.s001
(PDF)
Acknowledgments
We would like to sincerely thank both anonymous reviewers for their generous comments on the manuscript. The astute feedback was most welcomed, insightful and above all greatly improved the presentation, scientific justification and readability of the paper.
References
- 1. Shojaie A. Differential network analysis: A statistical perspective. Wiley Interdisciplinary Reviews: Computational Statistics. 2020; p. e1508.
- 2. Koller D, Friedman N. Probabilistic Graphical Models: Principles and Techniques. MIT Press; 2009.
- 3. Chuang H, Lee E, Liu Y, Lee D, Ideker T. Network-based classification of breast cancer metastasis. Molecular Systems Biology. 2007;3(1):140. pmid:17940530
- 4. Taylor I, Linding R, Warder-Farley D, Liu Y, Pesquita C, Faria D, et al. Dynamic modularity in protein interaction networks predicts breast cancer outcome. Nature Biotechnology. 2009;27(2):199–204. pmid:19182785
- 5. Li Q, Shao J. Sparse quadratic discriminant analysis for high dimensional data. Statistica Sinica. 2015;25:457–473.
- 6. Jiang B, Wang X, Leng C. A direct approach for sparse quadratic discriminant analysis. The Journal of Machine Learning Research. 2018;19(1):1098–1134.
- 7. Meinshausen N, Bühlmann P. High-dimensional graphs and variable selection with the lasso. The Annals of Statistics. 2006;34(3):1436–1462.
- 8. Lauritzen S. Graphical Models. Oxford: Clarendon Press; 1996.
- 9. Yuan M, Lin Y. Model selection and estimation in the Gaussian graphical model. Biometrika. 2007;94(1):19–35.
- 10. Friedman J, Hastie T, Tibshirani R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics. 2008;9(3):432–441. pmid:18079126
- 11. Banerjee O, Ghaoui LE, d’Aspremont A. Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data. The Journal of Machine Learning Research. 2008;9:485–516.
- 12. Nesterov Y. Smooth minimization of non-smooth functions. Mathematical Programming. 2005;103(1):127–152.
- 13. Cai T, Liu W, Luo X. A constrained ℓ1 minimization approach to sparse precision matrix estimation. Journal of the American Statistical Association. 2011;106(494):594–607.
- 14. Guo J, Levina E, Michailidis G, Zhu J. Joint estimation of multiple graphical models. Biometrika. 2011;98(1):1–15. pmid:23049124
- 15. Danaher P, Wang P, Witten D. The joint graphical lasso for inverse covariance estimation across multiple classes. Journal of the Royal Statistical Society: Series B (Methodological). 2014;76(2):373–397. pmid:24817823
- 16. Wang H. Bayesian graphical lasso models and efficient posterior computation. Bayesian Analysis. 2012;7(4):867–886.
- 17. Banerjee S, Ghosal S. Bayesian structure learning in graphical models. Journal of Multivariate Analysis. 2015;136:147–162.
- 18. Peterson C, Stingo F, Vannucci M. Bayesian inference of multiple Gaussian graphical models. Journal of the American Statistical Association. 2015;110(509):159–174. pmid:26078481
- 19. Williams D, Piironen J, Vehtari A, Rast P. Bayesian estimation of Gaussian graphical models with predictive covariance selection. arXiv preprint arXiv:1801.05725. 2018.
- 20. Chiquet J, Grandvalet Y, Ambroise C. Inferring multiple graphical structures. Statistics and Computing. 2011;21(4):537–553.
- 21. Zhu Y, Li L. Multiple matrix Gaussian graphs estimation. Journal of the Royal Statistical Society: Series B (Methodological). 2018;80(5):927–950. pmid:30505211
- 22. Zhao S, Cai T, Li H. Direct estimation of differential networks. Biometrika. 2014;101(2):253–268. pmid:26023240
- 23. Yuan H, Xi R, Chen C, Deng M. Differential network analysis via lasso penalized D-trace loss. Biometrika. 2017;104(4):755–770.
- 24. Tang Z, Yu Z, Wang C. A fast iterative algorithm for high-dimensional differential network. Computational Statistics. 2020;35(1):95–109.
- 25. Beck A, Teboulle M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences. 2009;2(1):183–202.
- 26. Marlin B, Schmidt M, Murphy K. Group sparse priors for covariance estimation. In: Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence; 2009.
- 27. Park T, Casella G. The Bayesian lasso. Journal of the American Statistical Association. 2008;103(482):681–686.
- 28. Andrews DF, Mallows CL. Scale mixtures of normal distributions. Journal of the Royal Statistical Society: Series B (Methodological). 1974;36(1):99–102.
- 29. West M. On scale mixtures of normal distributions. Biometrika. 1987;74(3):646–648.
- 30. Li Q, Lin N. The Bayesian elastic net. Bayesian Analysis. 2010;5(1):151–170.
- 31. Griffin JE, Brown PJ. Inference with normal-gamma prior distributions in regression problems. Bayesian Analysis. 2010;5(1):171–188.
- 32. Carvalho CM, Polson NG, Scott JG. The horseshoe estimator for sparse signals. Biometrika. 2010;97(2):465–480.
- 33. Fan J, Feng Y, Wu Y. Network exploration via the adaptive LASSO and SCAD penalties. The Annals of Applied Statistics. 2009;3(2):521–541. pmid:21643444
- 34. Dempster AP. Covariance selection. Biometrics. 1972;28(1):157–175.
- 35. Mazumder R, Hastie T. The graphical lasso: New insights and alternatives. Electronic Journal of Statistics. 2012;6:2125–2149. pmid:25558297
- 36. Wang J, Jiang B. An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss. Computational Statistics & Data Analysis. 2020;142(2):106812.
- 37. Letac G, Massam H. Wishart distributions for decomposable graphs. The Annals of Statistics. 2007;35(3):1278–1323.
- 38. Matthews BW. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochimica et Biophysica Acta (BBA)-Protein Structure. 1975;405(2):442–451. pmid:1180967
- 39. Banerjee S, Monni S, Wells M. A regularized profile likelihood approach to covariance matrix estimation. Journal of Statistical Planning and Inference. 2013;179:36–59.
- 40. Iwendi C, Khan S, Anajemba JH, Mittal M, Alenezi M, Alazab M. The use of ensemble models for multiple class and binary class classification for improving intrusion detection systems. Sensors. 2020;20(9):2559. pmid:32365937
- 41. Iwendi C, Bashir AK, Peshkar A, Sujatha R, Chatterjee JM, Pasupuleti S, et al. COVID-19 patient health prediction using boosted random forest algorithm. Frontiers in Public Health. 2020;8(357). pmid:32719767
- 42. Iwendi C, Moqurrab SA, Anjum A, Khan S, Mohan S, Srivastava G. N-sanitization: A semantic privacy-preserving framework for unstructured medical datasets. Computer Communications. 2020;161:160–171.
- 43. Box GE. A general distribution theory for a class of likelihood criteria. Biometrika. 1949;36(3/4):317–346. pmid:15402070
- 44. Bland RP, Owen DB. A note on singular normal distributions. Annals of the Institute of Statistical Mathematics. 1966;18(1):113–116.