Fig 1.
Hypothetical example of a Bayesian network with and without variable grouping.
(A) Example of a detailed, modular Bayesian network with the variables waist circumference (waist_c), body fat percentage (fat_perc), BMI and three blood pressure measurements (blood_pr1, blood_pr2, blood_pr3), as well as a target disease. (B) Possible grouping of the variables in the network. (C) Corresponding group Bayesian network over the two groups and the target variable.
Fig 2.
Schematic outline of the proposed approach to learn group Bayesian networks.
The features of the input data are grouped using hierarchical clustering, and a group Bayesian network is then learned. Based on the accuracy of the resulting model, the grouping is refined adaptively by moving down the dendrogram. The output is an interpretable, disease-specific biomarker network based on feature groups, with high predictive accuracy.
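The grouping step can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the distance 1 - |Pearson r|, average linkage, and a fixed number of groups k are assumptions chosen for the example, and each group is summarised by its medoid (one of the two aggregation options, MED, named in Fig 3). The toy values for the Fig 1 variables are made up.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_distance(columns):
    """Assumed feature distance: 1 - |r| for every pair of columns."""
    names = list(columns)
    d = {(a, a): 0.0 for a in names}
    for a, b in combinations(names, 2):
        d[a, b] = d[b, a] = 1.0 - abs(pearson(columns[a], columns[b]))
    return d

def average_linkage(dist, names, k):
    """Agglomerative clustering: merge the closest pair of clusters
    (mean pairwise distance) until only k clusters remain."""
    clusters = [[n] for n in names]
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            ci, cj = clusters[i], clusters[j]
            avg = sum(dist[a, b] for a in ci for b in cj) / (len(ci) * len(cj))
            if best is None or avg < best[0]:
                best = (avg, i, j)
        _, i, j = best  # i < j, so deleting j leaves index i valid
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def medoid(cluster, dist):
    """Group representative: the member with the smallest summed distance."""
    return min(cluster, key=lambda a: sum(dist[a, b] for b in cluster))

# Made-up toy data for the variables of Fig 1 (six individuals).
data = {
    "waist_c": [80, 85, 90, 95, 100, 105],
    "fat_perc": [20, 22, 25, 27, 30, 33],
    "bmi": [22, 23, 25, 26, 28, 30],
    "blood_pr1": [130, 118, 125, 140, 115, 122],
    "blood_pr2": [131, 119, 124, 141, 116, 121],
    "blood_pr3": [129, 117, 126, 139, 114, 123],
}
dist = correlation_distance(data)
groups = average_linkage(dist, list(data), k=2)
representatives = [medoid(g, dist) for g in groups]
```

With these values the two groups recovered are the body-composition variables and the blood pressure measurements, mirroring panel (B) of Fig 1. Adaptive refinement would correspond to increasing k for selected branches of the dendrogram; that step is not shown here.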
Fig 3.
Results on simulated networks.
(A) The basic model structure used to simulate random networks with latent group structure. Group networks with 20 nodes in layer 1 were learned from data simulated from layer 0 with varying group sizes and noise levels. (B-C) Reconstruction of the variable grouping and of the group networks for varying group sizes; the y-axes show the partition metric and the normalized Hamming distance, respectively. Two types of group network inference, aggregation by principal components (PC) and aggregation by cluster medoids (MED), as well as a standard network inference approach were used. For comparison, network inference was also performed using the ground-truth grouping. (D-E) Reconstruction of the variable grouping and of the group networks for varying noise levels; the y-axes show the partition metric and the normalized Hamming distance, respectively. (F-G) Prediction of a target variable for varying group sizes and noise levels, with the applied noise level shown for comparison; the y-axes show the average prediction error.
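As an illustration of a normalized Hamming distance between a true and an estimated network, the sketch below compares the edge state of every unordered node pair (no edge, a→b, or b→a) and divides the number of mismatches by n(n-1)/2. This normalization, and counting a reversed edge as a single mismatch, are assumptions for the example; the exact definition used for Fig 3 may differ. The node names are hypothetical.

```python
from itertools import combinations

def pair_state(edges, a, b):
    """Edge state of the unordered pair {a, b}: 0 none, 1 a->b, 2 b->a."""
    if (a, b) in edges:
        return 1
    if (b, a) in edges:
        return 2
    return 0

def normalized_hamming(nodes, edges_true, edges_est):
    """Fraction of node pairs whose edge state differs between two DAGs.
    Missing, extra and reversed edges each count as one mismatch."""
    pairs = list(combinations(nodes, 2))
    diff = sum(
        pair_state(edges_true, a, b) != pair_state(edges_est, a, b)
        for a, b in pairs
    )
    return diff / len(pairs)

# Hypothetical group network with two groups and a target.
nodes = ["G1", "G2", "target"]
true_edges = {("G1", "target"), ("G2", "target")}
est_edges = {("G1", "target"), ("target", "G2")}  # one reversed edge
score = normalized_hamming(nodes, true_edges, est_edges)
```

Here one of the three node pairs disagrees, so the score is one third; identical networks score 0.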
Fig 4.
(A) Dendrogram of the wine dataset with 5 groups indicated by colour and the target variable Soil separated. (B) Group Bayesian network learned from the wine dataset with 5 groups; colours refer to the grouping. (C) Group Bayesian network after target-specific refinement.
Table 1.
Prediction results of NAFLD models.
Fig 5.
(A) Structure of the complete, refined group Bayesian network model for hepatic steatosis. (B) Extract from the group network including the target variable steatosis, its Markov blanket and surrounding nodes.
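The Markov blanket highlighted in panel (B) consists of the target's parents, its children, and the other parents of its children; given these nodes, the target is conditionally independent of the rest of the network. A minimal sketch on a toy edge list follows; the group names are hypothetical and not taken from the figure.

```python
def markov_blanket(edges, target):
    """Parents, children and the children's other parents of `target`
    in a DAG given as a set of (parent, child) pairs."""
    parents = {a for a, b in edges if b == target}
    children = {b for a, b in edges if a == target}
    spouses = {a for a, b in edges if b in children and a != target}
    return parents | children | spouses

# Hypothetical group network around the target.
edges = {
    ("G_diet", "G_liver"),
    ("G_liver", "steatosis"),
    ("steatosis", "G_enzymes"),
    ("G_lipids", "G_enzymes"),
}
blanket = markov_blanket(edges, "steatosis")
```

G_diet is excluded: it influences the target only through G_liver, which is already in the blanket.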
Table 2.
Prediction results of hypertension models.
Fig 6.
(A) Structure of the complete, refined group Bayesian network model for hypertension. (B) Extract from the group network including the target variable hypertension, its Markov blanket and surrounding nodes.
Table 3.
Processing times.