Thousands of Rab GTPases for the Cell Biologist
(A) summarises the validation in normal mode, i.e. without taking into account the subfamily score produced by the Rabifier, against the Rab families of Trypanosoma brucei, Entamoeba histolytica and Monosiga brevicollis; the last of these we annotated manually (E). Three quantities needed to judge the performance of the Rabifier are shown separately for Rabs belonging to human subfamilies and for Rabs belonging to other subfamilies: sequences that the Rabifier erroneously classified as not being Rabs (red), sequences correctly identified as Rabs but misclassified at the subfamily level (light green), and sequences classified entirely correctly (dark green). (B) displays the distribution of confidence scores associated with each subfamily call, using the same colour code as in (A). The blue line indicates the threshold we propose as the default; subfamily calls scoring below it may be rejected and treated as an undefined RabX. This choice is based on the ROC-curve analysis shown in (C), which plots the true positive rate against the false positive rate for each possible confidence threshold; the area under the curve (AUC) provides a combined measure of the classifier's accuracy. The inset shows the effect on classification accuracy of choosing a 0.4 confidence threshold (blue circle), i.e. of running the Rabifier in high confidence mode. (D) plots the improvement in these three quantities that the Rabifier achieves compared with an alternative strategy (see Results and Discussion for details of its implementation). (E) Phylogenetic tree of the human and M. brevicollis Rab families on which the manual classification of the M. brevicollis Rab family was based (bootstrap support above 70% shown). Colours indicate the result of the automated annotation for each sequence. Abbreviations: subfamily (sf.), annotation (annot.).
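To make the threshold analysis in (B) and (C) concrete, the following is a minimal sketch of how an ROC curve and its AUC can be computed from per-call confidence scores, and how a cut-off turns low-scoring subfamily calls into RabX. It assumes each subfamily call has already been labelled correct or incorrect against a manual annotation. The function names (`roc_points`, `auc`, `apply_threshold`) and the example data are hypothetical; this is an illustration of the general technique, not the Rabifier's own code.

```python
from typing import List, Tuple

def roc_points(scores: List[float], correct: List[bool]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs, one per distinct
    confidence threshold, sweeping from the strictest to the most lenient cut-off."""
    pos = sum(correct)                  # correct subfamily calls
    neg = len(correct) - pos            # incorrect subfamily calls
    points = [(0.0, 0.0)]               # threshold above every score: nothing accepted
    for t in sorted(set(scores), reverse=True):
        # Calls scoring >= t are accepted; accepted correct calls are true positives.
        tp = sum(1 for s, c in zip(scores, correct) if s >= t and c)
        fp = sum(1 for s, c in zip(scores, correct) if s >= t and not c)
        points.append((fp / neg, tp / pos))
    return points

def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

def apply_threshold(call: str, score: float, threshold: float = 0.4) -> str:
    """Reject a subfamily call scoring below the (assumed) threshold as an undefined RabX."""
    return call if score >= threshold else "RabX"

# Hypothetical confidence scores and whether each subfamily call was correct.
scores = [0.9, 0.8, 0.7, 0.3, 0.2]
correct = [True, True, False, True, False]
print(round(auc(roc_points(scores, correct)), 3))   # 0.833
print(apply_threshold("Rab5", 0.35))                # RabX
```

The 0.4 default in `apply_threshold` is taken from the high confidence mode example in this caption; in practice the cut-off would be read off an ROC curve such as the one in (C).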