Inferring Intra-Community Microbial Interaction Patterns from Metagenomic Datasets Using Associative Rule Mining Techniques

doi:10.1371/journal.pone.0154493

Fig 1.

Schematic diagram depicting the three strategies employed for indicating the presence/absence of a taxon.

Schematic diagram depicting the three strategies employed for indicating the presence/absence of a taxon in various samples, based on its abundance value in each sample. The first strategy (section A) relies only on the abundance proportion of the taxa in each sample: a taxon whose normalized abundance proportion in a sample exceeds 0.1% is considered 'present' in that sample. In the second strategy (section B), a taxon is reported as 'present' in a sample only if its abundance value in that sample lies between the 2nd and 3rd quartile range of the computed mean/median value. Strategy 3 (section C) involves computing Manhattan distances between the individual abundance values of a taxon in each of the samples and then hierarchically clustering the samples on the basis of the computed distances. Because the hierarchical clustering in this case involves only single abundance values, it can be achieved by progressively merging the sample pairs separated by the least distance. The sorting step indicated in the figure makes the distance calculation less time consuming (i.e. computationally more efficient). Note that the final two clusters obtained indicate that the taxon is reported as 'present' in all samples except Sample S1.
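Strategies 1 and 3 described in this caption can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the example values, and the expression of 0.1% as the fraction 0.001 are assumptions made for illustration. For strategy 3, the sketch further assumes single-linkage merging and assumes that the higher-abundance cluster is the one reported as 'present'; under those assumptions, the two final clusters of one-dimensional values are separated by the largest gap between neighbouring sorted values.

```python
def present_by_threshold(abundances, threshold=0.001):
    """Strategy 1: call a taxon 'present' (1) where its proportion exceeds the threshold."""
    return [1 if a > threshold else 0 for a in abundances]


def present_by_clustering(abundances):
    """Strategy 3 (sketch): split one taxon's samples into two clusters.

    For single values the Manhattan distance is |a - b|. After sorting,
    merging the closest pairs first (single linkage, an assumption) leaves
    two clusters separated by the largest gap between sorted neighbours.
    """
    n = len(abundances)
    if n < 2:
        return [1] * n
    # Sample indices ordered by abundance; sorting avoids all-pairs distances.
    order = sorted(range(n), key=lambda i: abundances[i])
    gaps = [abundances[order[k + 1]] - abundances[order[k]] for k in range(n - 1)]
    split = gaps.index(max(gaps))  # position of the last low-abundance sample
    calls = [0] * n
    for i in order[split + 1:]:
        calls[i] = 1  # assumption: the higher-abundance cluster is 'present'
    return calls


# Hypothetical abundances of one taxon in samples S1..S5.
taxon = [0.01, 0.40, 0.35, 0.50, 0.45]
print(present_by_clustering(taxon))
```

With these made-up values, S1 sits far below the other samples, so it ends up alone in the low-abundance cluster, which mirrors the outcome described for the figure.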

Fig 2.

Schematic work-flow depicting the associative rule mining procedure customised for microbial abundance data.

A schematic work-flow depicting the associative rule mining procedure that has been customised for microbial abundance data. The work-flow is explained using an example abundance matrix that gives the normalized proportions of five distinct microbes in nine microbiome samples (S1 to S9). The Boolean matrix that follows (in which taxa abundances are replaced by presence/absence values, i.e. 1 and 0) was generated by employing strategy I, in which taxa whose normalized abundance was greater than 0.1 are considered 'present'. The subsequent steps represent the process of candidate set generation. The depicted example uses a Support Count Value of 6. Taxa whose Support Count Value exceeded 6 (indicated in green font) eventually constitute the candidate set. The final matrix represents the sole association rule generated after checking the various taxa combinations in the candidate set against the confidence value threshold. Note that this rule is generated only if all possible (indicated) taxa combinations exceed the confidence value threshold.
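The work-flow in this caption (Boolean matrix, support counting, candidate set generation, confidence check) can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code. The Boolean matrix below is invented, the function names are hypothetical, and the sketch treats a set as frequent when its support count is greater than or equal to the threshold, which is the usual Apriori convention and an assumption here. Confidence for a split is taken as support(all taxa) / support(antecedent).

```python
from itertools import combinations


def support_count(rows, itemset):
    """Number of samples (rows) in which every taxon of the itemset is present."""
    return sum(1 for row in rows if all(row[t] for t in itemset))


def frequent_itemsets(rows, taxa, min_support):
    """Level-wise (Apriori) search for taxa sets that meet the support count."""
    level = [frozenset([t]) for t in taxa if support_count(rows, [t]) >= min_support]
    found = list(level)
    while level:
        # Join frequent k-sets that differ by one taxon into (k+1)-candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == len(a) + 1}
        level = [c for c in candidates if support_count(rows, c) >= min_support]
        found.extend(level)
    return found


def passes_confidence(rows, itemset, min_confidence):
    """True only if every antecedent -> remainder split meets the threshold."""
    total = support_count(rows, itemset)
    if total == 0:
        return False
    for size in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), size):
            # Any subset of the itemset occurs at least `total` times, so no zero division.
            if total / support_count(rows, antecedent) < min_confidence:
                return False
    return True


# Hypothetical Boolean matrix: taxa A..E (columns) in samples S1..S9 (rows).
bits = ["11110", "11100", "11101", "11110", "11100",
        "11100", "11010", "10001", "00100"]
rows = [{t: int(b) for t, b in zip("ABCDE", row)} for row in bits]

candidates = frequent_itemsets(rows, "ABCDE", min_support=6)
rules = [s for s in candidates if len(s) >= 3 and passes_confidence(rows, s, 0.7)]
print([sorted(r) for r in rules])
```

In this invented matrix only taxa A, B and C reach a support count of 6, so D and E never enter the candidate set, and the single three-taxon set is kept as a rule only because every one of its antecedent splits clears the confidence threshold.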

Fig 3.

Minimalist graphical representation of associative rules involving 3 or more genera.

A 'minimalist' graphical representation of associative rules (involving 3 or more genera) generated from an example dataset containing 26 genera named alphabetically (A to Z). The rules indicated in this example involve only 13 of the 26 genera. Note that the genera (and/or groups of genera) constituting an individual rule share an all-to-all associative relationship. For example, rule 3 (involving 5 genera, viz. X, Y, Z, H, and O) indicates an associative relationship not only between all possible genera pairs, but also between all possible combinations of genera. For clarity, an exhaustive list of the combinations possible from rule 3 is provided in the table depicted in Fig 3. As shown there, rule 3 indicates (for instance) an association between the abundances of the genera pair (X, Y) and the genera group (Z, H, and O). Because Fig 3 is a 'minimalist' graphical representation of all associative rules, genera X, Y, and Z (common to rules 3 and 4) are shown only once, in the circled portion of the figure. The table depicted in Fig 3 also provides an exhaustive list of taxa and combinations of taxa generated from rule 4.
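To make the all-to-all relationship concrete, the following Python sketch enumerates the ways the genera of a single rule can be divided into two non-empty groups, such as (X, Y) versus (Z, H, O). It is an illustration under stated assumptions: the function name is hypothetical, and the sketch lists only complementary splits of the whole rule, which may not match the exact layout of the table in Fig 3.

```python
from itertools import combinations


def complementary_splits(rule):
    """Yield each unordered split of a rule's genera into two non-empty groups."""
    genera = sorted(rule)
    first, rest = genera[0], genera[1:]
    # Keeping the first genus on the left side avoids listing a split twice.
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            left = (first,) + extra
            right = tuple(g for g in genera if g not in left)
            yield left, right


# Rule 3 from the caption involves the genera X, Y, Z, H and O.
for left, right in complementary_splits(["X", "Y", "Z", "H", "O"]):
    print(left, "<->", right)
```

A rule with n genera yields 2^(n-1) - 1 such splits, so the number of implied group-level associations grows quickly with rule size, which is one reason a compact graphical representation is useful.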

Fig 4.

Associative rules (involving 3 or more genera) generated from the prebiotic datasets.

A graphic representation of associative rules (involving 3 or more genera) generated from the prebiotic datasets. Parts A, B and C depict association rules generated from the Chinese prebiotic datasets [2]. Parts D, E and F depict association rules generated from the Japanese prebiotic datasets [3].

Fig 5.

Associative rules (involving 3 or more genera) generated from the HMP datasets.

A graphic representation of associative rules (involving 3 or more genera) generated from the HMP datasets [4]. Parts A and B depict association rules generated from samples corresponding to male and female subjects respectively.

Table 1.

Number of association rules generated using the Apriori rule mining approach with various datasets.

Summarised information pertaining to (a) the number of samples, (b) the number of generated association rules (total, as well as rules that involve 3 or more genera), (c) the number of unique microbial genera involved in the identified association rules, (d) execution time, and (e) the number of rules generated using an alternative rule mining strategy (detailed in the Discussion section of the manuscript).

Table 2.

Number of association rules generated from the prebiotics dataset with various run-time thresholds.

Number of association rules generated using the Apriori rule mining approach on the prebiotics dataset at various values of the support count and confidence thresholds. The table also shows how the number of rules varies with the strategy used to define the minimum abundance threshold for individual taxa to be considered for rule mining.

Table 3.

Number of association rules generated from the HMP (male) dataset with various run-time thresholds.

Number of association rules generated using the Apriori rule mining approach on the HMP (male) dataset at various values of the support count and confidence thresholds. The table also shows how the number of rules varies with the strategy used to define the minimum abundance threshold for individual taxa to be considered for rule mining.

Table 4.

Number of association rules generated from the HMP (full) dataset with various run-time thresholds.

Number of association rules generated using the Apriori rule mining approach on the HMP (full) dataset at various values of the support count and confidence thresholds. The table also shows how the number of rules varies with the strategy used to define the minimum abundance threshold for individual taxa to be considered for rule mining.

Fig 6.

Comparison of results generated using correlation approach and the Apriori approach.

A comparison of results generated using (i) the correlation approach and (ii) the Apriori approach. The abundance values indicated in part A represent the actual abundances of 4 genera in the various samples constituting the prebiotic datasets [2]. The table shown in part B gives the Spearman correlation values computed between the various taxa pairs. The taxon pair that generated a significant correlation is indicated in green font. Part C depicts the association rules generated using the Apriori approach.

Fig 7.

Steps followed in the correlation-based (alternative) rule mining approach.

A graphical representation of various steps followed in the correlation-based (alternative) rule mining approach.
