Fig 1.

The overall methodology of ProtFus.

The algorithm begins by collecting abstracts and full texts from PubMed, followed by normalization, tokenization, entity recognition, cross-referencing against databases, and a machine learning classifier.
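As a rough illustration of the stages named in this caption, the sketch below chains normalization, tokenization, and a dictionary-based entity lookup in Python. The function names, the toy gene dictionary, and the example sentence are hypothetical placeholders, not the actual ProtFus code.

```python
# Hypothetical sketch of the pipeline stages from Fig 1; the stage functions
# and the toy gene dictionary are placeholders, not the ProtFus implementation.
import re

def normalize(text):
    """Lowercase and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.strip().lower())

def tokenize(text):
    """Split into word-like tokens, keeping hyphenated gene symbols intact."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text)

def recognize_entities(tokens, known_genes):
    """Toy entity recognition: match tokens against a gene-name dictionary."""
    return [t.upper() for t in tokens if t.upper() in known_genes]

abstract = "BCR-JAK2 fusion interacts with STAT5B in leukemia."
genes = {"BCR", "JAK2", "STAT5B", "BCR-JAK2"}
print(recognize_entities(tokenize(normalize(abstract)), genes))  # ['BCR-JAK2', 'STAT5B']
```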

Table 1.

Datasets considered for training (collected from PubMed between January 2013 and April 2017).

Table 2.

Datasets considered for testing ProtFus.

Fig 2.

N-gram model for detecting N-words by ProtFus.

The N-gram model and some possible sets of combinations.
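Since the caption only names the model, here is a minimal, self-contained sketch of how contiguous N-word combinations (N-grams) can be enumerated from a tokenized sentence; the example sentence and helper function are illustrative, not the ProtFus implementation.

```python
# Minimal N-gram enumeration over a token list (illustrative only).
def ngrams(tokens, n):
    """Return all contiguous n-word combinations from the token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "BCR JAK2 fusion interacts with STAT5B".split()
for n in (1, 2, 3):
    print(n, ngrams(tokens, n))
```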

Table 3.

Bag-of-words collection for 10 PubMed ID abstracts.
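As a hedged sketch of how such a bag-of-words table can be produced, the snippet below applies scikit-learn's CountVectorizer to two invented example abstracts; Table 3 itself was built from 10 real PubMed abstracts, which are not reproduced here.

```python
# Bag-of-words construction with scikit-learn (the two abstracts are invented
# placeholders, not the PubMed abstracts behind Table 3).
from sklearn.feature_extraction.text import CountVectorizer

abstracts = [
    "BCR-JAK2 fusion protein activates STAT5B signaling in leukemia.",
    "EML4-ALK fusion is a driver in non-small cell lung cancer.",
]
vectorizer = CountVectorizer(lowercase=True, stop_words="english")
counts = vectorizer.fit_transform(abstracts)
print(vectorizer.get_feature_names_out())  # vocabulary terms
print(counts.toarray())                    # per-abstract word counts
```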

Table 4.

Precision and Recall for the retrieval step.
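For reference, precision and recall for a retrieval step reduce to simple ratios over true positives, false positives, and false negatives; the counts in the sketch below are made up and are not the values reported in Table 4.

```python
# Precision/recall from confusion counts (the numbers are illustrative only).
def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

p, r = precision_recall(tp=90, fp=10, fn=20)
print(f"precision={p:.2f} recall={r:.2f}")
```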

Table 5.

Precision and Recall for named-entity recognition.

Table 6.

Accuracy scores of the classifiers.

Table 7.

Performance of ProtFus compared to other resources.

Fig 3.

ChiPPI analysis (a) and PPI-Fus/ProtFus extraction for the BCR-JAK2 and STAT5B interaction (b), as predicted by ProtFus.

Fig 4.

ROC curves and accuracy for the Naïve Bayes classifier.

For fusions, (a) the ROC curve and (b) Precision, Recall, and F-score; for fusion PPIs, (c) the ROC curve and (d) Precision, Recall, and F-score. Prediction of a cancer type was more accurate for abstracts than for full-text articles, because the feature space becomes too large when full texts are used. For text classification purposes, abstracts may therefore yield better results than full-text scientific articles.
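To make the evaluation concrete, the sketch below fits a multinomial Naïve Bayes text classifier and computes an ROC curve plus precision, recall, and F-score with scikit-learn. The labelled sentences are synthetic placeholders and the model is scored on its own training data purely to keep the example short; this is not the ProtFus evaluation code.

```python
# Illustrative Naive Bayes + ROC sketch (synthetic data, scored on the
# training set only for brevity).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import roc_curve, auc, precision_recall_fscore_support

texts = [
    "BCR-JAK2 fusion interacts with STAT5B",       # fusion PPI
    "EML4-ALK fusion drives lung cancer growth",   # fusion PPI
    "TP53 mutation frequency in breast cancer",    # not a fusion PPI
    "Gene expression profiling of normal tissue",  # not a fusion PPI
]
labels = [1, 1, 0, 0]

vec = CountVectorizer()
X = vec.fit_transform(texts)
clf = MultinomialNB().fit(X, labels)

scores = clf.predict_proba(X)[:, 1]
fpr, tpr, _ = roc_curve(labels, scores)
print("AUC:", auc(fpr, tpr))

prec, rec, f1, _ = precision_recall_fscore_support(
    labels, clf.predict(X), average="binary"
)
print(f"precision={prec:.2f} recall={rec:.2f} F-score={f1:.2f}")
```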
