
Fig 1.

Schematic representation of the comparison of CBN models.

A Data sets D1 and D2 consist of N1 and N2 genotypes, respectively, and, in this example, p = 4 mutations. We combine both data sets into a single data set D0 with N1 + N2 genotypes. B We randomly split D0 into data sets S1 and S2, repeating this B times. C For each data set, we learn the network structure with the H-CBN2 approach and compute the Jaccard distance between the posets inferred for each pair S1 and S2. D The empirical distribution of the test statistic is obtained by aggregating the distances over all pairs S1 and S2. E We compare the CBN posets inferred from the original data sets D1 and D2 by computing their Jaccard distance and assess its significance against this empirical distribution.
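The resampling scheme in panels B to D can be sketched in a few lines of Python. This is a minimal illustration, not the H-CBN2 implementation: each poset is assumed to be represented as a set of directed edges (i, j), the structure-learning step is a hypothetical callable `learn_poset` standing in for H-CBN2, and we assume that each random split keeps the original group sizes N1 and N2 (that is, group labels are permuted).

```python
import random
from typing import Callable, FrozenSet, List, Sequence, Tuple

Edge = Tuple[int, int]       # (i, j): mutation i must occur before mutation j
Poset = FrozenSet[Edge]      # assumed representation: the set of poset edges
Genotype = Tuple[int, ...]   # binary vector of length p


def jaccard_distance(a: Poset, b: Poset) -> float:
    """1 - |A ∩ B| / |A ∪ B|; two empty edge sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def null_distances(
    d1: Sequence[Genotype],
    d2: Sequence[Genotype],
    learn_poset: Callable[[List[Genotype]], Poset],  # hypothetical stand-in for H-CBN2
    n_splits: int,                                    # B in the caption
    seed: int = 0,
) -> List[float]:
    """Pool D1 and D2 into D0, split it B times, and collect the Jaccard distances."""
    pooled = list(d1) + list(d2)  # D0 with N1 + N2 genotypes
    n1 = len(d1)
    rng = random.Random(seed)
    distances = []
    for _ in range(n_splits):
        rng.shuffle(pooled)               # permute group labels
        s1, s2 = pooled[:n1], pooled[n1:]  # assumed: S1 has N1 genotypes, S2 has N2
        distances.append(jaccard_distance(learn_poset(s1), learn_poset(s2)))
    return distances
```

For example, the posets {(0, 1), (1, 2)} and {(0, 1), (2, 3)} share one of three distinct edges, so their Jaccard distance is 1 - 1/3 ≈ 0.667.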

Fig 2.

Assessment of H-CBN2 on simulated data.

A Box plots of the difference between the true (ϵ) and estimated (ϵ̂) error rates (y-axis) for each evaluated poset size (x-axis). B Box plots of the relative median absolute error (RMAE; y-axis) of the estimated rate parameters. C Average run time of the MCEM/EM step (y-axis, logarithmic scale) for different poset sizes (x-axis, logarithmic scale). The blue dotted line corresponds to linear scaling, whereas the red line corresponds to quadratic scaling. In panels A to C, colors indicate different importance sampling schemes, and we show results for 100 simulated data sets for each combination of simulation settings. The true error rate is ϵ = 0.05, the number of samples drawn from the proposal distribution is L = 1000 unless specified otherwise, and we run 100 iterations of the MCEM/EM algorithm. D Error in the estimated log-likelihood. E Box plots of F1 scores for the reconstructed network edges. In panels D and E, we show results for 20 different networks with 16 mutations and an error rate of 5%. We fix the ideal acceptance rate to 1/p and run 25,000 iterations of the simulated annealing algorithm. The initial temperature is set to Θ0 = 50 for all runs, and for adaptive simulated annealing, three adaptation rates are evaluated (ar = 0.1, 0.3, 0.5). F, G Comparison of H-CBN2 with MC-CBN methods in terms of F the difference in normalized log-likelihood and G F1 scores, for two poset sizes and various error rates. For the H-CBN2 results shown in panels F and G, we employ the ASA algorithm. SA: simulated annealing, ASA: adaptive simulated annealing, +: with additional new moves.
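The two accuracy metrics in panels B and E can be made concrete with a short Python sketch. The edge F1 score is the standard harmonic mean of precision and recall over edge sets. The caption does not spell out the RMAE formula, so the version below is an assumption: the median, over all rate parameters (written λ here, a notation assumed for this sketch), of the relative absolute error |λ̂ - λ| / λ.

```python
from statistics import median
from typing import FrozenSet, Sequence, Tuple

Edge = Tuple[int, int]  # (i, j): mutation i precedes mutation j


def edge_f1(true_edges: FrozenSet[Edge], inferred_edges: FrozenSet[Edge]) -> float:
    """F1 score of the inferred edge set against the true edge set."""
    if not true_edges and not inferred_edges:
        return 1.0  # both empty: treated as perfect agreement (a convention)
    tp = len(true_edges & inferred_edges)  # correctly recovered edges
    if tp == 0:
        return 0.0
    precision = tp / len(inferred_edges)
    recall = tp / len(true_edges)
    return 2 * precision * recall / (precision + recall)


def rmae(true_rates: Sequence[float], estimated_rates: Sequence[float]) -> float:
    """Assumed definition: median over parameters of |lambda_hat - lambda| / lambda."""
    if len(true_rates) != len(estimated_rates):
        raise ValueError("rate vectors must have the same length")
    return median([abs(est - true) / true for true, est in zip(true_rates, estimated_rates)])
```

For instance, recovering two of three true edges while adding one spurious edge gives precision = recall = 2/3 and hence F1 = 2/3.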

Fig 3.

Consensus posets for lopinavir resistance for two different HIV-1 subtype C data sets.

Shown are the consensus posets for A the South African cohort and B the remaining HIV-1 subtype C sequences retrieved from the HIVDB. Nodes in the network correspond to amino acid changes in the HIV-1 protease, where mutations at the same locus are grouped together into one event. Only edges with a bootstrap support greater than 0.7 are shown, and edge thickness indicates the bootstrap support. Nodes with a white background show residues with at least one major PI mutation.
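The edge filtering used for these consensus posets can be sketched in Python. This assumes that bootstrap support is the fraction of bootstrap posets containing a given edge, and that each bootstrap poset is available as a set of edges between locus labels; the labels in the usage note are hypothetical. The strict cutoff (> 0.7) matches this figure, and the `inclusive` flag covers a ≥ 0.7 cutoff.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple

Edge = Tuple[str, str]  # (locus_a, locus_b): the event at locus_a precedes locus_b


def bootstrap_support(posets: List[FrozenSet[Edge]]) -> Dict[Edge, float]:
    """Fraction of bootstrap posets that contain each edge."""
    if not posets:
        raise ValueError("need at least one bootstrap poset")
    counts = Counter(edge for poset in posets for edge in poset)
    return {edge: n / len(posets) for edge, n in counts.items()}


def consensus_edges(
    support: Dict[Edge, float], threshold: float = 0.7, inclusive: bool = False
) -> Dict[Edge, float]:
    """Keep edges with support > threshold (or >= threshold if inclusive)."""
    return {
        edge: s
        for edge, s in support.items()
        if (s >= threshold if inclusive else s > threshold)
    }
```

With 10 bootstrap posets in which a hypothetical edge ("54", "82") appears 8 times and ("10", "54") appears 7 times, only the first edge survives the strict cutoff; both survive when `inclusive=True`.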

Fig 4.

Consensus poset for the accumulation of mutations in HIV-1 subtype B under lopinavir treatment.

The underlying data set contains 470 genotypes retrieved from the HIVDB and SHCS. Nodes in the network correspond to amino acid changes in the HIV-1 protease, and mutations at the same locus are grouped together. Edge labels indicate the bootstrap support, and we show only edges with a bootstrap support greater than or equal to 0.7.
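Grouping mutations at the same locus into one event amounts to mapping each sample to a binary vector over protease positions. The Python sketch below assumes mutation labels in the usual wild-type/position/mutant form (for example "V82A") and a caller-supplied list of loci; both the label format and the helper names are assumptions for illustration, and labels outside that format (such as insertions) are rejected rather than handled.

```python
import re
from typing import Iterable, List, Set

# Assumed label format: wild-type residue, position, mutant residue(s), e.g. "V82A".
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z]+)$")


def locus_of(mutation: str) -> int:
    """Return the protease position encoded in a mutation label."""
    match = MUTATION.match(mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {mutation!r}")
    return int(match.group(2))


def encode_by_locus(samples: Iterable[Set[str]], loci: List[int]) -> List[List[int]]:
    """One binary row per sample: 1 if any mutation hits that locus, else 0."""
    rows = []
    for mutations in samples:
        hit = {locus_of(m) for m in mutations}
        rows.append([1 if locus in hit else 0 for locus in loci])
    return rows
```

Under this encoding, V82A and V82F both set the single event for locus 82, so they are not distinguished in the poset.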

Fig 5.

Empirical null distribution of pairwise Jaccard distances estimated by permuting group labels.

Displayed are the histograms of Jaccard distances for the comparison of subtypes B and C for H-CBN2 posets with A 11 mutations and B 18 mutations, as well as the histograms of Jaccard distances for the comparison of two data sets for subtype C for H-CBN2 posets with C 11 mutations and D 19 mutations. Vertical dotted lines indicate the distance between the CBNs obtained from the observed data.
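Comparing the observed distance (the vertical dotted line) with the permutation histogram yields an empirical p-value. The caption does not state which formula is used, so the Python sketch below assumes the common add-one convention, (1 + #{null distances ≥ observed}) / (1 + B), which avoids reporting a p-value of exactly zero.

```python
from typing import Sequence


def empirical_p_value(observed: float, null: Sequence[float]) -> float:
    """Add-one permutation p-value (assumed convention): (1 + #{d >= observed}) / (1 + B)."""
    if not null:
        raise ValueError("null distribution is empty")
    exceed = sum(1 for d in null if d >= observed)  # null distances at least as extreme
    return (1 + exceed) / (1 + len(null))
```

For example, an observed distance larger than all 99 permuted distances gives p = 1/100 = 0.01.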
