Reliability of plastid and mitochondrial localisation prediction declines rapidly with the evolutionary distance to the training set increasing

doi:10.1371/journal.pcbi.1012575


Fig 6

A framework for improving localisation prediction algorithms.

Strategies to improve prediction reliability involve changes in the curation of the training data as well as in the training procedure. Ideally, the training data should be collected from a range of diverse species and, for each species, be based on different experimental techniques that support a training protein’s localisation (e.g. reporters, mass spectrometry, coexpression, interactions). Proteins with non-canonical internal motifs, or those that are dually targeted, need to be taken into account (as they help to better distinguish between pNTS and mNTS features), and validated data could be sorted according to whether they belong to a core- or pan-proteome. The classifiers on which the algorithms are trained could include parameters such as the evolutionary distance of a species, non-coding regions, or, as a currently neglected factor, a protein’s abundance. One can expect that combining multi-dimensional parameters from evolutionary biology, cell biology and molecular biology across evolutionarily diverse species will significantly improve the next generation of machine learning algorithms that serve localisation (and function) predictions.

doi: https://doi.org/10.1371/journal.pcbi.1012575.g006
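
The curation step described in the caption could be sketched as follows. This is a minimal illustration, not the authors' procedure: the `Record` type, the `curate` and `species_counts` helpers, the example entries and the threshold of two evidence types are all hypothetical assumptions introduced here. It keeps only proteins whose localisation is supported by more than one experimental technique and then reports how many curated proteins each species contributes, so that a skew towards a single species becomes visible.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical training record; field names are illustrative assumptions."""
    protein_id: str
    species: str
    localisation: str    # e.g. "plastid", "mitochondrion", "dual"
    evidence: frozenset  # experimental techniques supporting the localisation


def curate(records, min_evidence=2):
    """Keep records backed by at least `min_evidence` distinct techniques.

    The default of 2 is an assumed threshold, not a value from the source.
    """
    return [r for r in records if len(r.evidence) >= min_evidence]


def species_counts(records):
    """Count curated records per species to expose sampling bias."""
    counts = defaultdict(int)
    for r in records:
        counts[r.species] += 1
    return dict(counts)


# Made-up example entries, for illustration only.
records = [
    Record("p1", "species_A", "plastid",
           frozenset({"reporter", "mass_spectrometry"})),
    Record("p2", "species_A", "mitochondrion",
           frozenset({"reporter"})),
    Record("p3", "species_B", "dual",
           frozenset({"mass_spectrometry", "coexpression", "interactions"})),
]

kept = curate(records)
summary = species_counts(kept)
```

In this sketch, dually targeted proteins are kept as their own label rather than discarded, in line with the caption's point that they help separate pNTS from mNTS features. Further attributes named in the caption, such as evolutionary distance or protein abundance, could be added as extra fields on the record.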