Fig 1.
The overall methodology of ProtFus.
The pipeline begins by collecting abstracts and full texts from PubMed, followed by normalization, tokenization, and named-entity recognition; cross-referencing against external databases; and classification with a machine-learning classifier.
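A minimal sketch of these text-processing stages is given below, assuming a toy gene lexicon, a simple regex tokenizer, and a naive dictionary lookup; the actual ProtFus dictionaries, cross-references, and classifier are not reproduced here.

```python
import re

# Toy gene lexicon standing in for the cross-referenced databases
# (an assumption, not the actual ProtFus dictionaries).
GENE_LEXICON = {"BCR", "JAK2", "STAT5B"}

def normalize(text):
    """Collapse whitespace as a minimal normalization step."""
    return re.sub(r"\s+", " ", text).strip()

def tokenize(text):
    """Split text into word tokens with a simple regex tokenizer."""
    return re.findall(r"[A-Za-z0-9\-]+", text)

def recognize_entities(tokens):
    """Dictionary lookup for gene names, plus a simple rule that flags
    hyphenated tokens whose parts are all known genes as fusion candidates."""
    genes, fusions = [], []
    for tok in tokens:
        parts = tok.upper().split("-")
        if all(p in GENE_LEXICON for p in parts):
            (fusions if len(parts) > 1 else genes).append(tok)
    return genes, fusions

abstract = "The BCR-JAK2 fusion protein interacts with STAT5B in leukemia."
genes, fusions = recognize_entities(tokenize(normalize(abstract)))
print(genes, fusions)  # ['STAT5B'] ['BCR-JAK2']
```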
Table 1.
Datasets considered for training (collected from PubMed between January 2013 and April 2017).
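A corpus restricted to such a date window could, for example, be retrieved with Biopython's Entrez module; the query term and e-mail address below are placeholders, not the actual search used to build the ProtFus datasets.

```python
from Bio import Entrez  # Biopython

Entrez.email = "user@example.org"  # required by the NCBI E-utilities
handle = Entrez.esearch(
    db="pubmed",
    term="fusion protein AND cancer",     # hypothetical query
    datetype="pdat",
    mindate="2013/01/01",
    maxdate="2017/04/30",
    retmax=100,
)
record = Entrez.read(handle)
print(record["IdList"])  # PubMed IDs whose abstracts would then be fetched
```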
Table 2.
Datasets considered for testing ProtFus.
Fig 2.
N-gram model used by ProtFus for detecting N-word sequences.
The N-gram model and some of the possible combinations it generates.
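As a sketch, the contiguous N-grams of a token list can be enumerated as follows (the token list is illustrative only):

```python
def ngrams(tokens, n):
    """Return all contiguous n-token windows of the token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = ["BCR", "JAK2", "fusion", "protein"]
print(ngrams(tokens, 2))
# [('BCR', 'JAK2'), ('JAK2', 'fusion'), ('fusion', 'protein')]
```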
Table 3.
Bag-of-words collection for the abstracts of 10 PubMed IDs.
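A bag-of-words matrix of this kind can be built, for instance, with scikit-learn's CountVectorizer; the two toy abstracts below stand in for the 10 PubMed abstracts summarized in Table 3 (older scikit-learn versions expose get_feature_names instead of get_feature_names_out).

```python
from sklearn.feature_extraction.text import CountVectorizer

# Two toy abstracts in place of the 10 PubMed abstracts of Table 3.
abstracts = [
    "BCR-JAK2 fusion protein activates STAT5B signaling.",
    "The fusion protein drives leukemia through STAT5B.",
]

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(abstracts)   # document-term count matrix
print(vectorizer.get_feature_names_out())      # the bag-of-words vocabulary
print(counts.toarray())                        # per-abstract word counts
```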
Table 4.
Precision and Recall for the retrieval step.
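Precision and Recall in Tables 4 and 5 are assumed to follow the standard definitions in terms of true positives (TP), false positives (FP), and false negatives (FN):

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}
```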
Table 5.
Precision and Recall for named-entity recognition.
Table 6.
Accuracy scores of the classifiers.
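Accuracy here is taken to be the usual fraction of correctly classified instances:

```latex
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
```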
Table 7.
Performance of ProtFus compared to other resources.
Fig 3.
ChiPPI analysis: (a) PPI-Fus/ProtFus extraction for the BCR-JAK2 and STAT5B interaction; (b) the interaction as predicted by ProtFus.
Fig 4.
ROC curves and accuracy for Naïve Bayes.
For fusions, (a) the ROC curve and (b) Precision, Recall, and F-score; for fusion PPIs, (c) the ROC curve and (d) Precision, Recall, and F-score. Prediction of the cancer type was more accurate for abstracts than for full-text articles, because the feature space of full-text articles is too large; for text-classification purposes, abstracts may therefore yield better results than full-text scientific articles.
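As a sketch of how such ROC and Precision/Recall/F-score values can be computed for a Naïve Bayes text classifier with scikit-learn, see below; the toy corpus and labels are invented for illustration, and the evaluation is done on the training data rather than on held-out sets such as those in Tables 1 and 2.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import auc, precision_recall_fscore_support, roc_curve

# Toy corpus: label 1 = the text mentions a fusion, 0 = it does not
# (invented data, not the ProtFus training or testing sets).
texts = [
    "BCR-JAK2 fusion protein activates STAT5B",
    "EML4-ALK fusion detected in lung cancer",
    "wild-type JAK2 kinase activity",
    "STAT5B expression in normal tissue",
] * 5
labels = np.array([1, 1, 0, 0] * 5)

X = CountVectorizer().fit_transform(texts)     # bag-of-words features
clf = MultinomialNB().fit(X, labels)           # Naïve Bayes classifier

scores = clf.predict_proba(X)[:, 1]            # probability of the fusion class
fpr, tpr, _ = roc_curve(labels, scores)        # points of the ROC curve
print("AUC:", auc(fpr, tpr))

pred = clf.predict(X)
prec, rec, fscore, _ = precision_recall_fscore_support(
    labels, pred, average="binary"
)
print("Precision, Recall, F-score:", prec, rec, fscore)
```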