Fig 1.

Flowchart of the proposed methodology (StatBicRM) for rule mining.

Here, the terms TOTALDESETN, TOTALDESETNN and TOTALDESETN+NN are defined in the last paragraph of the subsection “Identification of differentially expressed/methylated genes using Statistical tests”. For the methylation dataset, these terms are replaced by TOTALDMSETN, TOTALDMSETNN and TOTALDMSETN+NN, respectively.

Fig 2.

Flowchart of the proposed methodology (StatBicRM) for classification.

Here, the terms TOTALDESETN, TOTALDESETNN and TOTALDESETN+NN are defined in the last paragraph of the subsection cited in Fig 1. For the methylation dataset, these terms are replaced by TOTALDMSETN, TOTALDMSETNN and TOTALDMSETN+NN, respectively.

Fig 3.

An example of generating special rules from the data matrix of differentially expressed genes.

Up-regulation (‘+’) and down-regulation (‘-’) are denoted by ‘1’ and ‘0’ in (b), and by red and green in (c), respectively. Here, str denotes experimental (diseased/treated) samples and snr denotes control (normal) samples.
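
To make the encoding in panel (b) concrete, here is a minimal sketch under stated assumptions: the up/down sign of each gene in each sample is taken as given (how it is derived is not reproduced), all gene and sample names are hypothetical, and items are written as signed gene symbols in the style of Fig 10 (e.g. ‘CENPA-’). The helper `common_items` is only an illustration of signed items shared by all samples of one class; it is not the StatBicRM rule-generation procedure.

```python
# Toy input (hypothetical): rows are differentially expressed genes,
# columns are samples; '+' marks up-regulation, '-' down-regulation.
SIGN_MATRIX = {
    "GENE_A": ["+", "+", "-", "-"],
    "GENE_B": ["+", "+", "+", "-"],
    "GENE_C": ["-", "-", "+", "+"],
}
SAMPLES = ["str1", "str2", "snr1", "snr2"]  # two experimental, two control


def binarize(sign_matrix):
    """Encode '+' as 1 and '-' as 0, as in panel (b)."""
    return {gene: [1 if s == "+" else 0 for s in signs]
            for gene, signs in sign_matrix.items()}


def sample_itemsets(sign_matrix, samples):
    """Turn each sample (column) into a set of signed items such as 'GENE_A+'."""
    return {sample: {gene + signs[j] for gene, signs in sign_matrix.items()}
            for j, sample in enumerate(samples)}


def common_items(itemsets, class_samples):
    """Signed items shared by every sample of one class (illustration only)."""
    return set.intersection(*(itemsets[s] for s in class_samples))


if __name__ == "__main__":
    sets = sample_itemsets(SIGN_MATRIX, SAMPLES)
    print(binarize(SIGN_MATRIX)["GENE_A"])               # [1, 1, 0, 0]
    print(sorted(common_items(sets, ["snr1", "snr2"])))  # ['GENE_A-', 'GENE_C+']
```

Representing each sample as a set of signed items lets later steps (rule matching, frequency counting) use plain set operations.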

Fig 4.

An example of classification by weighted-sum majority voting over the evolved rules.

Here, ‘r’ and ‘w’ denote the rank and weight of a rule (computed by Equation 18), respectively. A tick/cross in the ‘Q’ column indicates that the test point (ts) is/is not satisfied by the corresponding rule.
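
The voting scheme in Fig 4 can be sketched as follows, assuming a test point is a set of signed gene items and that a rule is satisfied when its whole antecedent is contained in that set. Equation 18 is not reproduced here, so each rule's `weight` is a placeholder value supplied by the caller; the `Rule` class and `weighted_vote` function are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset  # signed gene items, e.g. {"GENE_A+", "GENE_C-"}
    label: str             # class label in the rule's consequent
    weight: float          # placeholder for the weight w (Equation 18 not reproduced)


def weighted_vote(test_items, rules, default=None):
    """Sum the weights of satisfied rules per class; return (winning label, scores)."""
    scores = {}
    for rule in rules:
        if rule.antecedent <= test_items:  # the test point satisfies the rule
            scores[rule.label] = scores.get(rule.label, 0.0) + rule.weight
    if not scores:
        return default, scores  # no rule fires for this test point
    # Ties go to the label whose first satisfied rule appears earliest.
    return max(scores, key=scores.get), scores
```

For example, if rules for ‘control’ with weights 0.3 and 0.4 and a rule for ‘experimental’ with weight 0.5 are satisfied, the weighted sums are 0.7 and 0.5, so the test point is assigned to ‘control’ even though two of the three satisfied rules do not outnumber the other class by much.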

Table 1.

Information on the real datasets (DS) used.

Table 2.

Number of differentially expressed genes identified by different statistical tests for Dataset 1, where #Gup and #Gdw denote the numbers of up- and down-regulated genes, respectively. Pearson’s correlation test cannot be applied here because the number of experimental samples differs from the number of control samples.

Table 3.

Number of differentially expressed genes identified by different statistical tests for Dataset 2, where #Gup and #Gdw denote the numbers of up- and down-regulated genes, respectively.

Table 4.

Number of differentially methylated genes identified by different statistical tests for Dataset 3, where #Ghyper and #Ghypo denote the numbers of hyper- and hypo-methylated genes, respectively.

Fig 5.

Clustergram of the differentially expressed genes common to the different statistical tests for DS1.

Red denotes up-regulation and green denotes down-regulation of genes across the specific samples/conditions.

Fig 6.

Volcano plot for identifying differentially up- and down-regulated genes in Dataset 1 by SAM.

Table 5.

Number of differentially expressed genes identified by different statistical tests for the artificial Dataset 4, where #Gup and #Gdown denote the numbers of up- and down-regulated genes, respectively.

Table 6.

Comparative performance of the rule-based classifiers on Dataset 1 (4-fold cross-validation repeated 10 times); bold font denotes the highest value in each column.

Table 7.

Comparative performance of the rule-based classifiers on Dataset 2 (4-fold cross-validation repeated 10 times); bold font denotes the highest value in each column.

Table 8.

Comparative performance of the rule-based classifiers on Dataset 3 (4-fold cross-validation repeated 10 times); bold font denotes the highest value in each column.

Table 9.

Comparative performance of the rule-based classifiers on Dataset 4 (4-fold cross-validation repeated 10 times); bold font denotes the highest value in each column.
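
Tables 6 to 9 all use 4-fold cross-validation repeated 10 times. A minimal standard-library sketch of that protocol is shown below; it assumes plain (non-stratified) shuffling with a fixed seed, which may differ from the exact splitting used for the tables, and the function name `repeated_kfold` is hypothetical.

```python
import random


def repeated_kfold(n_samples, k=4, repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for k-fold CV repeated `repeats` times."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    for r in range(repeats):
        rng.shuffle(indices)  # new random partition for every repetition
        folds = [indices[f::k] for f in range(k)]  # k disjoint, near-equal folds
        for f, test_idx in enumerate(folds):
            test_set = set(test_idx)
            train_idx = [i for i in indices if i not in test_set]
            yield r, f, sorted(train_idx), sorted(test_idx)
```

Each of the 10 repetitions yields 4 train/test splits (40 in total), and every sample appears in exactly one test fold per repetition; a classifier's reported value would then be an average over these runs. A stratified variant (keeping class proportions per fold) is a common alternative when class sizes are unbalanced.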

Table 10.

p-values of one-way ANOVA between the average accuracies of the proposed classifier and each other classifier (pairwise) in DS1, DS2, DS3 and DS4; ‘S’ and ‘NS’ denote significant (p-value ≤ 0.05) and non-significant (p-value > 0.05) results, respectively.
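
The pairwise test behind Table 10 can be sketched with the standard library alone. The code below computes the one-way ANOVA F statistic from per-run accuracy lists and its upper-tail p-value via the regularized incomplete beta function (continued-fraction evaluation), then applies the 0.05 threshold to produce ‘S’/‘NS’. The function names are hypothetical; in practice `scipy.stats.f_oneway` returns the same statistic and p-value.

```python
from math import exp, lgamma, log


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))  # even term
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd term
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def one_way_anova(*groups):
    """Return (F, p) for a one-way ANOVA across the given groups of accuracies."""
    k, n = len(groups), sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_b, df_w = k - 1, n - k
    f_stat = (ss_between / df_b) / (ss_within / df_w)
    # Upper-tail probability of F(df_b, df_w): I_{df_w/(df_w + df_b*F)}(df_w/2, df_b/2)
    p_value = reg_inc_beta(df_w / 2.0, df_b / 2.0, df_w / (df_w + df_b * f_stat))
    return f_stat, p_value


def significance_label(p_value, alpha=0.05):
    """'S' if p <= alpha, else 'NS' (the convention used in Table 10)."""
    return "S" if p_value <= alpha else "NS"
```

With two groups (proposed vs one competitor), the F test is equivalent to a two-sample t-test with pooled variance (F = t²).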

Fig 7.

A graphical representation of the gene expression of a maximal homogeneous bicluster (i.e., an MFCHOI) across different samples.

Fig 8.

Bar charts comparing (a) the dataset-wise average accuracies and (b) the dataset-wise average MCCs of the proposed and other existing rule-based classifiers on the four datasets.
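
Both metrics in Fig 8 follow from the binary confusion matrix. The sketch below uses the standard definitions of accuracy and the Matthews correlation coefficient, MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)); returning 0 when the denominator is zero is a common convention assumed here, not something stated in the captions.

```python
from math import sqrt


def confusion_counts(y_true, y_pred, positive):
    """Return (TP, TN, FP, FN) treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 if any marginal total is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

For instance, TP = 5, TN = 3, FP = 1, FN = 1 gives an accuracy of 0.8 but an MCC of 14/24 ≈ 0.58, which is why MCC is reported alongside accuracy.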

Fig 9.

Boxplots of the significance tests (one-way ANOVA) used to assess the significance levels (p-values) of the accuracy differences between the proposed and other rule-based classifiers (pairwise) for Dataset 1 [(a).(i-vi)], Dataset 2 [(b).(i-vi)], Dataset 3 [(c).(i-vi)] and Dataset 4 [(d).(i-vi)], where (i) proposed vs ConjunctiveRule, (ii) proposed vs DecisionTable, (iii) proposed vs JRip, (iv) proposed vs OneR, (v) proposed vs PART and (vi) proposed vs Ridor. The vertical axis denotes classifier accuracy.

Table 11.

Top 10 most frequent genes in the evolved rules of the two class labels for DS1, DS2 and DS3.

Ruleexperimental and Rulecontrol denote the sets of evolved rules for the experimental and control class labels, respectively.
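
Ranking genes by how often they appear in the rules of one class label can be sketched with `collections.Counter`. The rule antecedents below are toy lists of signed items and `top_frequent_genes` is a hypothetical helper; the sketch counts one occurrence per rule antecedent and would be run separately on Ruleexperimental and Rulecontrol.

```python
from collections import Counter


def top_frequent_genes(rule_antecedents, k=10):
    """Return the k most common items across the antecedents of one class's rules."""
    counts = Counter(item for antecedent in rule_antecedents for item in set(antecedent))
    return counts.most_common(k)  # ties keep first-encountered order


if __name__ == "__main__":
    rules_experimental = [["G1+", "G2-"], ["G1+", "G3+"], ["G1+", "G2-", "G4-"]]
    print(top_frequent_genes(rules_experimental, k=2))  # [('G1+', 3), ('G2-', 2)]
```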

Fig 10.

Two examples of how significant biomarkers are identified from the maximal homogeneous biclusters (i.e., MFCHOI) for each class label of each dataset.

Here, we show the intersection of only four maximal homogeneous biclusters for (a) class label AC and (b) class label SCC, separately (Dataset 1). For class AC, CENPA-, TTK-, KIF11-, KIF18B- and ZNF367- are the top frequent genes because they occur in all four biclusters (see (a)); similarly, for class SCC, SHROOM3- is the top frequent gene because it occurs in all four biclusters (see (b)).
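
The selection step described for Fig 10 amounts to counting, for each signed gene, how many of the class's biclusters contain it, and keeping those present in all of them. A minimal sketch, with toy biclusters represented only by their gene sets and hypothetical gene names (the real MFCHOI contents are not reproduced):

```python
from collections import Counter


def bicluster_gene_frequency(biclusters):
    """Count in how many biclusters each signed gene occurs."""
    return Counter(gene for bic in biclusters for gene in set(bic))


def genes_in_every_bicluster(biclusters):
    """Signed genes shared by all biclusters, i.e. their intersection."""
    freq = bicluster_gene_frequency(biclusters)
    return sorted(g for g, c in freq.items() if c == len(biclusters))


if __name__ == "__main__":
    toy = [{"G1-", "G2-", "G3+"}, {"G1-", "G2-", "G4+"},
           {"G1-", "G2-", "G3+", "G5-"}, {"G1-", "G2-"}]
    print(genes_in_every_bicluster(toy))  # ['G1-', 'G2-']
```

Keeping the counts (rather than only the intersection) also shows which genes fall just short, e.g. ‘G3+’ above occurs in two of the four biclusters.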

Table 12.

KEGG pathway, GO:BP, GO:CC and GO:MF analysis of the genes in the evolved rules from the three datasets.

Here, a rule is called a ‘satisfiable rule’ (SRule) for a KEGG pathway (Path), GO:BP, GO:CC or GO:MF term if all genes in its antecedent occur together in that pathway or GO term.
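
The SRule criterion is a subset test between a rule's antecedent genes and a term's gene set. The sketch below assumes antecedent items carry a trailing ‘+’/‘-’ regulation sign that must be dropped before matching against annotation gene symbols; the rule IDs, term names and gene symbols are hypothetical.

```python
def gene_symbol(item):
    """Drop the trailing '+'/'-' regulation sign (assumed item format)."""
    return item.rstrip("+-")


def is_satisfiable(antecedent, term_genes):
    """True if every antecedent gene occurs in the pathway/GO term."""
    return {gene_symbol(i) for i in antecedent} <= set(term_genes)


def satisfiable_rules(rules, annotations):
    """rules: rule_id -> antecedent items; annotations: term -> gene symbols."""
    return {term: sorted(rid for rid, ant in rules.items() if is_satisfiable(ant, genes))
            for term, genes in annotations.items()}
```

For example, a rule with antecedent {G1-, G2+} is an SRule for a term annotated with {G1, G2, G3}, but not for one annotated with {G1, G5}.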

Table 13.

Some of the most important rules with respect to the KEGG pathways/GO:BPs/GO:CCs/GO:MFs in which they occur, for Dataset 1.

Table 14.

Some of the most important rules with respect to the KEGG pathways/GO:BPs/GO:CCs in which they occur, for Dataset 2.

No significant rule was found with respect to GO:MFs for this dataset.

Table 15.

The most important rules with respect to the KEGG pathways/GO:BPs/GO:CCs/GO:MFs in which they occur, for Dataset 3.

Fig 11.

Comparison of the number of significant itemsets found by StatBicRM and by other existing ARM methods at different minimum supports on the two artificial datasets (ArDS5 and ArDS6).

“Significant itemset” refers to MFCHOI for StatBicRM and to FI for the other methods.
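
For the other ARM methods, the count in Fig 11 is the number of itemsets meeting a minimum support. A generic level-wise (Apriori-style) enumeration is sketched below, with support taken as a fraction of transactions and the subset-pruning step omitted for brevity (every candidate is still counted, so the result is correct). This is an assumption-laden illustration of frequent itemsets in general; it does not compute StatBicRM's MFCHOI.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Map each itemset with support >= min_support (a fraction) to its support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    current = [frozenset([i]) for i in sorted({i for t in transactions for i in t})]
    result = {}
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        result.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        keys = list(level)
        size = len(keys[0]) + 1 if keys else 0
        current = list({a | b for a, b in combinations(keys, 2) if len(a | b) == size})
    return result


if __name__ == "__main__":
    toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
    for s in (0.5, 0.75):
        print(s, len(frequent_itemsets(toy, s)))  # 0.5 6, then 0.75 3
```

Raising the minimum support shrinks the number of qualifying itemsets, which is the trend a plot like Fig 11 tracks across support thresholds.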
