Analyzing Large Gene Expression and Methylation Data Profiles Using StatBicRM: Statistical Biclustering-Based Rule Mining | PLOS One


Fig 1.

Flowchart of the proposed methodology (StatBicRM) for rule mining.
Here, the terms TOTALDESET_N, TOTALDESET_NN and TOTALDESET_N+NN are described in the last paragraph of the subsection “Identification of differentially expressed/methylated genes using Statistical tests”. For the methylation dataset, these terms are replaced by TOTALDMSET_N, TOTALDMSET_NN and TOTALDMSET_N+NN, respectively.

Fig 2.

Flowchart of the proposed methodology (StatBicRM) for classification.
Here, the terms TOTALDESET_N, TOTALDESET_NN and TOTALDESET_N+NN are described in the last paragraph of the subsection “Identification of differentially expressed/methylated genes using Statistical tests”. For the methylation dataset, these terms are replaced by TOTALDMSET_N, TOTALDMSET_NN and TOTALDMSET_N+NN, respectively.

Fig 3.

An example of generating special rules from the data matrix of the differentially expressed genes.
Here, up-regulation (‘+’) and down-regulation (‘-’) are denoted by ‘1’ and ‘0’ in (b), and by red and green colors in (c), respectively. s_tr and s_nr denote experimental/diseased/treated and control/normal samples, respectively.

Fig 4.

An example of classification with the evolved rules by majority voting using a weighted sum.
Here, ‘r’ and ‘w’ denote the rank and the weight of the rule (computed by Equation 18), respectively. A tick mark/cross mark in the ‘Q’ column indicates that the test point (ts) satisfies/does not satisfy the corresponding rule.
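
The voting step described in this caption can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the rule weights are taken as given inputs (Equation 18 is not reproduced on this page), and the function name `weighted_vote`, the rule layout, and the gene identifiers `g1`, `g2`, `g3` are hypothetical.

```python
from collections import defaultdict


def weighted_vote(rules, test_point):
    """Return the class label with the largest weighted sum of satisfied rules.

    rules: list of (antecedent, class_label, weight) tuples, where antecedent
           maps a gene to its required state (1 = up, 0 = down).
    test_point: dict mapping gene -> observed state (1 or 0).
    Weights are assumed to be precomputed; how they are derived is not shown here.
    """
    totals = defaultdict(float)
    for antecedent, label, weight in rules:
        # A rule "fires" (tick mark in the Q column) only if every item matches.
        if all(test_point.get(gene) == state for gene, state in antecedent.items()):
            totals[label] += weight
    if not totals:
        return None  # no rule is satisfied by the test point
    return max(totals, key=totals.get)


# Hypothetical rules and test point, for illustration only.
rules = [
    ({"g1": 1, "g2": 0}, "experimental", 0.9),
    ({"g3": 1}, "control", 0.5),
    ({"g1": 1}, "control", 0.3),
]
test_point = {"g1": 1, "g2": 0, "g3": 1}
predicted = weighted_vote(rules, test_point)
```

In this toy case all three rules are satisfied, so the vote compares 0.9 for the experimental class against 0.5 + 0.3 for the control class.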

Table 1.

Information on the real datasets (DS) used.

Table 2.

Number of differentially expressed genes found by different statistical tests for Dataset 1, where #G_up and #G_dw denote up- and down-regulated genes, respectively. Here, Pearson’s correlation test cannot be used because the number of experimental samples is not equal to the number of control samples.

Table 3.

Number of differentially expressed genes found by different statistical tests for Dataset 2, where #G_up and #G_dw denote up- and down-regulated genes, respectively.

Table 4.

Number of differentially methylated genes found by different statistical tests for Dataset 3, where #G_hyper and #G_hypo refer to hyper- and hypo-methylated genes, respectively.

Fig 5.

The clustergram of the common differentially expressed genes (by different statistical tests) for DS1.
Here, red colour denotes up-regulation of genes across the specific samples/conditions, and green colour denotes down-regulation of genes across the specific samples/conditions.

Fig 6.

Volcano plot for identifying differentially up- and down-regulated genes from Dataset 1 by SAM.

Table 5.

Number of differentially expressed genes found by different statistical tests for the artificial Dataset 4, where #G_up and #G_down denote up-regulated and down-regulated genes, respectively.

Table 6.

Comparative performance analysis of the rule-based classifiers on Dataset 1 (4-fold cross-validation repeated 10 times); bold font denotes the highest value in each column.

Table 7.

Comparative performance analysis of the rule-based classifiers on Dataset 2 (4-fold cross-validation repeated 10 times); bold font denotes the highest value in each column.

Table 8.

Comparative performance analysis of the rule-based classifiers on Dataset 3 (4-fold cross-validation repeated 10 times); bold font denotes the highest value in each column.

Table 9.

Comparative performance analysis of the rule-based classifiers on Dataset 4 (4-fold cross-validation repeated 10 times); bold font denotes the highest value in each column.

Table 10.

p-values of one-way ANOVA between the average accuracies of the proposed and other classifiers (pairwise) in DS1, DS2, DS3 and DS4, where ‘S’ and ‘NS’ refer to significant (p-value ≤ 0.05) and non-significant (p-value > 0.05) p-values, respectively.

Fig 7.

A graphical representation of the gene expression of a maximal homogeneous bicluster (i.e., a MFCHOI) over different samples.

Fig 8.

Bar charts: (a) comparison of dataset-wise average accuracies, and (b) comparison of dataset-wise average MCCs, between the proposed and other existing rule-based classifiers for the four datasets.
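
For readers unfamiliar with the two metrics compared here, the sketch below computes accuracy and the Matthews correlation coefficient (MCC) from a binary confusion matrix using their standard definitions. The function names are illustrative, the counts in the usage line are made up, and returning 0 when the MCC denominator vanishes is a common convention assumed here, not something stated in the source.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified samples."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Made-up counts, for illustration only.
acc_example = accuracy(tp=8, tn=7, fp=3, fn=2)
```

Unlike accuracy, MCC ranges from -1 to 1 and stays informative when the two classes have unequal sizes.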

Fig 9.

Boxplots from the significance tests (one-way ANOVA) used to assess the significance levels (p-values) of the differences in accuracy between the proposed and other rule-based classifiers (pairwise) for Dataset 1 [in (a).(i-vi)], Dataset 2 [in (b).(i-vi)], Dataset 3 [in (c).(i-vi)] and Dataset 4 [in (d).(i-vi)], where (i) proposed vs ConjunctiveRule, (ii) proposed vs DecisionTable, (iii) proposed vs JRip, (iv) proposed vs OneR, (v) proposed vs PART and (vi) proposed vs Ridor. The vertical axis denotes the accuracy of the classifier.

Table 11.

Top 10 frequent genes in the evolved rules of the two class-labels for DS1, DS2 and DS3, respectively.
Rule_experimental and Rule_control denote the set of evolved rules of the experimental class-label and the set of evolved rules of the control class-label, respectively.

Fig 10.

Two examples of how significant biomarkers are identified from the maximal homogeneous biclusters (i.e., MFCHOI) for each class-label for each dataset.
Here, the intersection of only four maximal homogeneous biclusters is shown for (a) the class-label AC and (b) the class-label SCC, individually (for Dataset 1). For the class AC, CENPA-, TTK-, KIF11-, KIF18B- and ZNF367- are the top frequent genes because they occur in all four biclusters (see (a)); similarly, for the class SCC, SHROOM3- is the top frequent gene because it occurs in all four biclusters (see (b)).
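
The counting idea behind this figure can be sketched as below: each item (gene plus regulation sign) is counted once per bicluster, and the items present in every bicluster are reported. This is a simplified illustration, not the authors' code. The five items named in the caption are reused, but the bicluster memberships and the extra items `GENE_X-`, `GENE_Y-`, `GENE_Z-` are invented for the example.

```python
from collections import Counter


def item_frequencies(biclusters):
    """Count in how many biclusters each item (e.g. 'CENPA-') occurs."""
    counts = Counter()
    for items in biclusters:
        counts.update(set(items))  # count each item at most once per bicluster
    return counts


def items_in_all(biclusters):
    """Items that occur in every bicluster, i.e. their common intersection."""
    counts = item_frequencies(biclusters)
    return sorted(item for item, c in counts.items() if c == len(biclusters))


# Hypothetical memberships for four biclusters of one class-label.
shared = ["CENPA-", "TTK-", "KIF11-", "KIF18B-", "ZNF367-"]
biclusters = [
    shared + ["GENE_X-"],
    shared + ["GENE_Y-"],
    shared + ["GENE_X-", "GENE_Z-"],
    shared,
]
top_items = items_in_all(biclusters)
```

Sorting `item_frequencies(biclusters).most_common()` instead would give a full frequency ranking rather than only the items shared by all biclusters.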

Table 12.

KEGG pathway, GO:BP, GO:CC and GO:MF analysis of the corresponding genes of the evolved rules from the three datasets.
Here, a ‘satisfiable rule’ (SRule) with respect to a KEGG pathway (Path), GO:BP, GO:CC or GO:MF term means that all the genes of the rule (i.e., its antecedent) occur together in that pathway or GO term.
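
The ‘satisfiable rule’ condition amounts to a subset test, which might be sketched as follows. The function names, the annotation layout, and the term identifiers `PATH_A` and `PATH_B` are hypothetical placeholders, and stripping a trailing ‘+’ or ‘-’ from rule items to recover gene symbols is an assumption about the item format.

```python
def gene_symbol(item):
    """Drop an assumed trailing regulation sign, e.g. 'CENPA-' -> 'CENPA'."""
    return item.rstrip("+-")


def is_satisfiable(antecedent, term_genes):
    """True if every gene of a non-empty antecedent occurs in the term's gene set."""
    genes = {gene_symbol(item) for item in antecedent}
    return bool(genes) and genes <= set(term_genes)


def satisfying_terms(antecedent, annotations):
    """Names of all pathways/GO terms whose gene sets contain the whole antecedent."""
    return sorted(
        term for term, genes in annotations.items()
        if is_satisfiable(antecedent, genes)
    )


# Placeholder annotation sets; these are not real pathway memberships.
annotations = {
    "PATH_A": {"CENPA", "TTK", "KIF11"},
    "PATH_B": {"CENPA"},
}
terms = satisfying_terms(["CENPA-", "TTK-"], annotations)
```

Only `PATH_A` contains both antecedent genes here, so it is the only term under which the example rule counts as satisfiable.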

Table 13.

Some top important rules with respect to their existing KEGG pathways/GO:BPs/GO:CCs/GO:MFs in Dataset 1.

Table 14.

Some top important rules with respect to their existing KEGG pathways/GO:BPs/GO:CCs in Dataset 2.
No significant rule was found with respect to GO:MFs for this dataset.

Table 15.

Top important rules with respect to their existing KEGG pathways/GO:BPs/GO:CCs/GO:MFs in Dataset 3.

Fig 11.

Comparison of the number of significant itemsets between StatBicRM and other existing ARM methods at different minimum support values for the two artificial datasets (viz., ArDS5 and ArDS6).
“Significant itemset” refers to MFCHOI for StatBicRM, and FI for the other methods.
