The author has declared that no competing interests exist.
Conceived and designed the experiments: FO. Performed the experiments: FO. Analyzed the data: FO. Contributed reagents/materials/analysis tools: FO. Wrote the paper: FO.
The introduction of the term ‘Tubulin Polymerization Promoting Protein (TPPP)-like proteins’ is suggested. These proteins constitute a eukaryotic protein superfamily characterized by the presence of the p25alpha domain (Pfam05517, IPR008907) and named after the first identified member, TPPP/p25, which exhibits a microtubule-stabilizing function. TPPP-like proteins can be grouped on the basis of two characteristics: the length of their p25alpha domain, which can be long, short, truncated or partial, and the presence or absence of additional domain(s). TPPPs in the strict sense contain no domain other than a single long or short p25alpha domain (long- and short-type TPPPs, respectively). Proteins possessing a truncated p25alpha domain are first described in this paper. They evolved from the long-type TPPPs and can be considered arthropod-specific paralogs of long-type TPPPs. Phylogenetic analysis shows that the two groups (long-type and truncated TPPPs) split in the common ancestor of arthropods. Incomplete p25alpha domains can be found in multidomain TPPP-like proteins as well. The various subfamilies occur with a characteristic phyletic distribution: e.g., animal genomes/proteomes contain, almost without exception, long-type TPPPs, whereas the multidomain apicortins occur almost exclusively in apicomplexan parasites. There are no data on the physiological function of these proteins, except for two human long-type TPPP paralogs, which are involved in developmental processes of the brain and the musculoskeletal system, respectively. I predict that superfamily members containing a long or partial p25alpha domain are often intrinsically disordered proteins, while those with short or truncated domain(s) are structurally ordered. Interestingly, the members of this superfamily that are connected, or may be connected, to diseases are intrinsically disordered proteins.
The TPPPs, a new eukaryotic protein family, have recently been identified
There are three TPPP paralogs in the human genome, denoted TPPP/p25, TPPP2/p18 and TPPP3/p20 (in short, TPPP1, TPPP2 and TPPP3, respectively), indicating their molecular mass
In this paper I have investigated the conservation of this protein/gene family and the occurrence of the p25alpha domain in a systematic bioinformatics study. I have denoted the proteins/genes containing the p25alpha domain as “
Accession Numbers of protein and EST sequences refer to the NCBI RefSeq and GenBank databases, respectively, except if otherwise stated.
The database search was started with an NCBI BLAST search using the sequences of human TPPP proteins (NP_008961; NP_776245; NP_057048). BLASTP or TBLASTN analysis
In the case of other multidomain proteins a higher threshold (1e−2) was used, but the reciprocal best-hit approach could not be applied. Moreover, the EBI InterPro (
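The reciprocal best-hit idea mentioned above can be illustrated with a minimal sketch. It assumes that BLAST results have already been parsed into simple lists of (subject, E-value) pairs. The identifiers, the data layout and the cutoff passed in the example are hypothetical placeholders, not the values or tools used in this study.

```python
from typing import Dict, List, Optional, Tuple

# A hit is (subject_id, e_value); a lower E-value means a better hit.
Hit = Tuple[str, float]


def best_hit(hits: List[Hit], max_evalue: float) -> Optional[str]:
    """Return the subject with the lowest E-value not exceeding max_evalue."""
    passing = [h for h in hits if h[1] <= max_evalue]
    if not passing:
        return None
    return min(passing, key=lambda h: h[1])[0]


def is_reciprocal_best_hit(
    query: str,
    forward: Dict[str, List[Hit]],
    reverse: Dict[str, List[Hit]],
    max_evalue: float,
) -> bool:
    """True if the best hit of `query` has `query` as its own best hit."""
    candidate = best_hit(forward.get(query, []), max_evalue)
    if candidate is None:
        return False
    return best_hit(reverse.get(candidate, []), max_evalue) == query


# Hypothetical example data (not real search results).
forward_hits = {"query_TPPP": [("cand_A", 1e-30), ("cand_B", 1e-8)]}
reverse_hits = {
    "cand_A": [("query_TPPP", 1e-28), ("other", 1e-5)],
    "cand_B": [("other", 1e-12), ("query_TPPP", 1e-7)],
}
CUTOFF = 1e-5  # illustrative cutoff only; an assumption of this sketch

rbh_a = is_reciprocal_best_hit("query_TPPP", forward_hits, reverse_hits, CUTOFF)
```

In this toy data set, `cand_A` is accepted because the search succeeds in both directions. A single domain inside a larger multidomain protein would often fail the reverse step, because other domains can dominate the reverse search, which is one way to see why the approach is not applied to multidomain proteins.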
Domain | Long p25alpha | Truncated p25alpha | Short p25alpha | Short p25alpha | Partial p25alpha | Partial p25alpha |
Protein | Long-type TPPP | Truncated TPPP | Short-type TPPP | Multidomain proteins | Apicortin | |
|
|
|
|
|
|
|
|
|
|
|
|
||
Choanomonada | 2 | 2 | ||||
Metazoa | 200 (48) | 21 | 1 | |||
Vertebrata | 148 (47) | |||||
Fungi | 3 | 2 | 1 | |||
|
|
|||||
|
|
|||||
|
|
|
|
|
|
|
Glaucophyta | 1 (1) | |||||
Chloroplastida | 2 (2) | 9 (2) | 10 | 3 (1) | 1 (1) | |
Chlorophyta | 7 | 10 | 2 | |||
Charophyta | 2 (2) | 2 (2) | 1 (1) | 1 (1) | ||
|
|
|
|
|||
Stramenopiles | 6 | |||||
Alveolata | 26 | 15 | ||||
|
|
|||||
|
|
|
|
|
||
Fornicata | 2 | |||||
Jakobida | 3 (3) | 2 (2) | ||||
Malawimonas | 1 (1) | |||||
Preaxostyla | 1 (1) | |||||
Heterolobosea | 2 | 1 | ||||
Euglenozoa | 10 (2) |
The numbers of ESTs are in parentheses.
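The grouping used in the table header (type of p25alpha domain, plus the presence or absence of other domains) can be written down as a small lookup. This is only a sketch of the naming scheme: the enum, the function name and the labels "other partial-domain protein" and "unclassified" are assumptions introduced here for illustration, and the type of p25alpha domain is taken as given rather than computed from sequence length.

```python
from enum import Enum
from typing import FrozenSet


class P25Alpha(Enum):
    """The four p25alpha domain types distinguished in the text."""
    LONG = "long"
    TRUNCATED = "truncated"
    SHORT = "short"
    PARTIAL = "partial"


def classify(domain: P25Alpha, other_domains: FrozenSet[str] = frozenset()) -> str:
    """Map a domain type and the set of additional domains to a subfamily name."""
    if domain is P25Alpha.LONG and not other_domains:
        return "long-type TPPP"
    if domain is P25Alpha.TRUNCATED and not other_domains:
        return "truncated TPPP"
    if domain is P25Alpha.SHORT:
        return "multidomain protein" if other_domains else "short-type TPPP"
    if domain is P25Alpha.PARTIAL:
        # Apicortin combines a partial p25alpha domain with a DCX domain.
        if "DCX" in other_domains:
            return "apicortin"
        return "other partial-domain protein"  # label assumed for this sketch
    return "unclassified"  # combinations not described in the text


examples = [
    classify(P25Alpha.LONG),
    classify(P25Alpha.SHORT, frozenset({"EFh"})),
    classify(P25Alpha.PARTIAL, frozenset({"DCX"})),
]
```

Combinations that the text does not describe, such as a long domain accompanied by other domains, deliberately fall through to "unclassified" instead of being forced into one of the named subfamilies.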
Structural similarities were investigated by the PDBeFold (Structure Similarity) server (
The phylogenetic classification and nomenclature applied in Adl et al.
Multiple alignments of sequences were done by the ClustalW program
Sequences were submitted to the IUPRED server freely available at
TPPP-like proteins include TPPPs and other proteins possessing one or more complete or partial p25alpha domains, Pfam05517 or IPR008907 (cf.
The proteins are quasi-aligned, i.e., the length and the position of the domains correspond to the real situation. White boxes and ovals represent p25alpha domains and other kinds of domains, respectively. Black squares show the position of the Rossmann-like motif. The dotted line in short-type TPPP represents the position of amino acids that are present in long-type TPPPs but missing in short-type ones. Apicortin is the
The alignment was refined manually. Long-type TPPPs:
Long-type TPPPs possess the whole p25alpha domain. They occur in all three phylogenetic megagroups (i.e., unikonts, the photosynthetic megagroup and Excavata) and are most abundant in Opisthokonta, especially in animals (Metazoa) (cf.
It is rather rare in the photosynthetic megagroup (Archaeplastida+Rhizaria+Chromalveolata). It can be found at EST level in the Glaucophyta
These proteins are identified in this paper. They are discussed after the long-type TPPPs since it seems that they evolved by the loss of the last exon of long-type TPPPs (see later). They occur only in some animals, mostly in Endopterygota, insects undergoing metamorphosis, e.g., flies, butterflies, ants, beetles. In some cases these proteins might be artifacts due to incomplete sequencing, but this can be excluded in the case of flies (Diptera), including all twelve Drosophila species, for which whole genomes are known. These proteins are listed in
Phylogenetic group | Species | ID | GI | Source |
|
||||
Hexapoda Insecta Endopterygota | ||||
|
NP_648370 | 24662040 | RefSeq | |
|
XP_002029959 | 195326485 | RefSeq | |
|
XP_002084342 | 195589197 | RefSeq | |
|
XP_001972246 | 194868209 | RefSeq | |
|
XP_002094265 | 195493080 | RefSeq | |
|
XP_001957775 | 194750915 | RefSeq | |
|
XP_002062203 | 195428283 | RefSeq | |
|
XP_002025402 | 195169178 | RefSeq | |
|
XP_001353716 | 125979367 | RefSeq | |
|
XP_002007566 | 195126208 | RefSeq | |
|
XP_002047114 | 195376667 | RefSeq | |
|
XP_001983728 | 195012698 | RefSeq | |
|
XP_556944 | 57918257 | RefSeq | |
|
XP_001862283 | 170052572 | RefSeq | |
|
EFN74475 | 307190439 | GenBank | |
|
EFZ11240 | 322784183 | GenBank | |
|
EHJ66593 | 357609707 | GenBank | |
|
EFA09619 | 270013171 | GenBank | |
EEZ98749 | 270002302 | GenBank | ||
Chelicerata Arachnida Acari | ||||
|
XP_002404704 | 241731346 | RefSeq | |
|
XP_003742023 | 391335280 | RefSeq | |
|
||||
Trematoda | ||||
|
GAA47940 | 358339980 | RefSeq |
Phylogenetic analysis makes it questionable whether EFZ11240 and GAA47940 belong to this group.
Short-type TPPPs contain a short p25alpha domain, which corresponds to the whole or major part of their sequences (cf.
This protein is also widely distributed in all the three phyla of Alveolata (Apicomplexa, Ciliophora, Dinozoa), representing its occurrence in the Chromalveolata supergroup (cf.
Interestingly, in many species several paralogs of short-type TPPP can be found. This is the situation in Chlorophyta, Alveolata and Euglenozoa as well. As the phylogenetic analysis has shown (see later), these multiple occurrences are the results of species- and lineage-specific duplications. (The short-type TPPPs are listed in
In addition to its incidence in short-type TPPPs, the short p25alpha domain occurs as a part of larger proteins. The length of the p25alpha domains in these proteins ranges between about 70 and 140 amino acids; thus it is ambiguous whether they should be considered truncated or short domains. The first half of the p25alpha domain is always present but the length of the C-terminal part varies. This kind of occurrence is found mostly in two photosynthetic supergroups, Archaeplastida and Chromalveolata (cf.
Phylogenetic group | Species | ID | Source | Number of short p25alpha domains | Other domain/motif | |
Name | CDD | |||||
|
||||||
|
XP_001691800 (GI:159467228) | RefSeq | 2 | EFh | 28933 | |
|
XP_002948912 (GI:302834700) | RefSeq | 2 | EFh | 28933 | |
|
XP_003058058 (GI:303277529) | RefSeq | 3 | EFh COG4942 | 28933 34550 | |
XP_003063447 (GI:303288317) | RefSeq | 1 | EFh | 28933 | ||
XP_002506378 (GI:255088912) | RefSeq | 2 | EFh | 208857 | ||
XP_002507907 (GI:255081370) | RefSeq | 2 | EFh | 28933 | ||
XP_003061031 (GI:303283480) | RefSeq | 2 | - | - | ||
|
EFN57882 (GI:307109645) | GenBank | 2 | - | - | |
|
EIE25016 (GI:384251539) | GenBank | 2 | EFh | - | |
|
XP_001421186 (GI:145353793) | RefSeq | 2 | - | - | |
|
||||||
Stramenopiles |
|
CCA17632 (GI:325183175) | GenBank | 1 | P-loopNTPase DEXDc HELICc | 208973 197756 28960 |
|
CBN75312 (GI:299117356) | GenBank | 1 | Znf BBOX IQ | 206793 210118 | |
CBJ49059 (GI:298705751) | GenBank | 1 | zf-SNAP50_C Znf BBOX IQ WW | 204865 206793 210118 206869 | ||
|
XP_002905233 (GI:301112308) | RefSeq | 1 | Znf BBOX IQ COG5022 | 206793 210118 34627 | |
XP_002907084 (GI:301118713) | RefSeq | 1 | Mcp5_PH | 206947 | ||
|
EGZ26181 (GI:348686366) | GenBank | 1 | Znf BBOX IQ COG5022 | 206793 210118 34627 | |
|
||||||
Heterolobosea |
|
XP_002683090 (GI:291001047) | RefSeq | 1 | Kelch | 207702 |
XP_002682916 | RefSeq | 1 | PLN02919 PTPc | 29029 206804 |
These kinds of larger proteins of Chlorophyta species generally contain the short p25alpha domain in duplicate, but XP_003078535 of
These kinds of proteins of flagellated stramenopiles, as
Finally, an Excavata species, the Heterolobosea
The partial p25alpha domain, with or without the Rossmann-like motif, can be found in many organisms, in all megagroups, occurring independently of the other parts of the p25alpha domain (
Phylogenetic group | Species | ID | Source | Number of partial p25alpha domains | Rossmann-like motif |
|
|||||
Choanomonada |
|
XP_001750206 (GI:167537072) | RefSeq | 1 | yes |
|
EGD82798 (GI:326437228) | GenBank | 1 | yes | |
Fungi |
|
EGF79566 (GI:328769522) | GenBank | 2 | yes, no |
|
SPPG_08463 | Broad Institute | 2 | yes | |
|
|||||
Mycetozoa |
|
EC854006 | GenBank | 3 | yes |
|
|
AMSG_02233 | Broad Institute | 4 | yes |
|
|||||
Chloroplastida |
|
XP_001690551 (GI:159464643) | RefSeq | 1 | yes |
|
XP_002946586 (GI:302830039) | RefSeq | 2 | yes | |
|
GR509039 | GenBank | 1 | yes | |
|
|||||
Stramenopiles |
|
EGB10333 (GI:323454463) | GenBank | 1 | yes |
|
CBN76131 (GI:299116327) | GenBank | 1 | no | |
|
XP_002907772 (GI:301120089) | RefSeq | 1 | no | |
|
phyra80518 scaffold_50000026 draft genome v1.1 | DOE JGI | 1 | no | |
|
EGZ29591 (GI:348689777) | GenBank | 1 | no | |
|
|||||
Fornicata |
|
XP_001705540 (GI:159110572) | RefSeq | 2 | no, yes |
GL50581_3979 | GiardiaDB | 2 | no, yes | ||
Jakobida |
|
EC691986 | GenBank | 3 | no |
|
EC817264 | GenBank | 3 | no | |
Preaxostyla |
|
EC840067 | GenBank | 3 | yes |
Heterolobosea |
|
D2VER9_NAEGR (EFC44650) | UniProt | 2 | no |
Asterisks indicate ESTs.
A special case of this independent occurrence is apicortin, in which the partial p25alpha domain is combined with a DCX (Pfam03607, IPR003533) domain
The green algae,
The alignment was refined manually. Long-type TPPPs:
EST data revealed that the multiplication of the partial domain occurs in many other genomic sequences in various species: in the flagellated Amoebozoa,
The multiple alignment of the C-termini of short- and long-type TPPPs and the partial p25alpha domains (
Two independent analyses were run with three heated and one cold chain for 2×10⁶ generations, with 1.0×10⁶ generations discarded as burn-in. The numbers at the nodes represent clade credibility values; branches that received maximum support are indicated by full circles. For easier comparison, long-type TPPPs are labeled by name, truncated TPPPs by species code and short-type TPPPs by species code and accession number. All accession numbers are listed in
The detailed phyletic analysis of long-type TPPPs of Opisthokonts was carried out by Stifanic et al.
Truncated TPPPs are embedded as a sub-clade into long-type TPPPs (
Phylogenetic tree of short-type TPPPs (
Of course, multidomain proteins cannot be analyzed in this way, thus in this case only the short p25alpha domains were used in the analysis (
The phylogenetic tree built using the sequences of the partial p25alpha domains shows that short- and long-type TPPPs are separated, as in the case of the whole proteins (
As suggested recently, eukaryotes can be divided into three monophyletic megagroups: unikonts, Archaeplastida+Rhizaria+Chromalveolata, Excavata
Opisthokonta contain almost exclusively long-type TPPPs. Long-type TPPP is present in all the metazoan genomes known but
The “photosynthetic” megagroup (Archaeplastida+Rhizaria+Chromalveolata) is represented mainly by the short-type TPPP, which is present in all three supergroups. In the case of Chromalveolata this holds for the monophyletic clade (stramenopiles and Alveolata including Apicomplexa, Ciliophora, and Dinozoa) but not for the HC group (Haptophyta and Cryptomonads), in which no TPPP-like protein was found, at least in the databases available. The apicomplexan species contain, besides the short form, also a partial p25alpha domain as part of apicortin. For Rhizaria only very few data are available but the biflagellated Rhizarian,
Archaeplastida (besides Excavata) show the most varied picture concerning the distribution of these protein family members. Green algae contain short-type TPPPs, proteins containing a partial p25alpha domain, and multidomain proteins with more than one short p25alpha domain. Multidomain proteins can also be found in stramenopiles. A Glaucophyta (
In Excavata, according to the EST data available, both short- and long-type TPPPs and the partial domain are widely distributed. Euglenozoa on the one hand, and Jakobida and Malawimonadidae on the other, are characterized by the occurrence of the short and long form, respectively. Several proteins/genes in
NMR structures are available only for a few long-type TPPPs: CE32E8.3 of
The long N-terminal tail, present only in TPPP1, is fully disordered (∼50 aa). The remaining part of the molecule, present in all long-type TPPPs, is composed of two distinct regions. The C-terminal, sequentially conserved part is unstructured (∼60 aa) in all cases. The middle, less conserved region is more ordered. In the case of TPPP1 it is rather flexible; the other three proteins possess 5 α-helices in this part, and human TPPP2 also has 2 β-sheets. This region corresponds to the first two coding exons, and the C-terminus to the third one, not only in human but in most of the long-type TPPPs
The disordered regions of human TPPP1 probably have a functional role, since they were suggested to be responsible for the binding of the protein to microtubules
Disorder prediction values for the given residues are plotted against the amino acid residue number. The significance threshold (0.5), above which a residue is considered to be disordered, is shown. A)
Long-type TPPPs are generally predicted to behave similarly to the experimentally characterized cases mentioned above. The C-termini of the TPPPs are predicted to be disordered, as well as the N-terminal tail of the
Short-type and truncated TPPPs are generally predicted to be ordered in their full length. The examples of
Members of another class of TPPP-like proteins contain only (a) partial p25alpha domain(s), the sequence of which is highly conserved and corresponds to the C-terminal part of long TPPPs. Characteristically, the partial p25alpha domain occurs in disordered proteins. Proteins containing this sequence in more than one copy are generally fully disordered (
In conclusion, one can hypothesize that long-type TPPPs and proteins with a partial p25alpha domain have a role in microtubule organization owing to their disordered character, while short-type and truncated TPPPs and proteins with a short p25alpha domain may lack this function. Naturally, experimental verification of this hypothesis is needed.
Interestingly, the members of this superfamily that are connected, or may be connected, to diseases are intrinsically disordered proteins. Apicortins occur almost exclusively in apicomplexan parasites responsible for illnesses such as malaria and toxoplasmosis. It was suggested that they are involved in the so-called apical complex of these protists, which has an important role in pathogen-host interactions. A long-type TPPP (human TPPP1) was shown to be enriched in glial and neuronal inclusions in synucleinopathies such as Parkinson's disease and multiple system atrophy
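The ordered/disordered distinction drawn above rests on per-residue prediction scores compared against the 0.5 significance threshold used in the disorder plots. A minimal sketch of how such a score profile could be summarized is given below. The function names and the toy score list are hypothetical, and the code only post-processes scores that a predictor such as IUPRED would supply; it does not reproduce any predictor.

```python
from typing import List, Tuple

THRESHOLD = 0.5  # residues scoring above this value are considered disordered


def disordered_fraction(scores: List[float], threshold: float = THRESHOLD) -> float:
    """Fraction of residues whose score exceeds the threshold."""
    if not scores:
        return 0.0
    return sum(s > threshold for s in scores) / len(scores)


def disordered_segments(
    scores: List[float], threshold: float = THRESHOLD, min_len: int = 1
) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) runs of disordered residues."""
    segments: List[Tuple[int, int]] = []
    start = None
    for pos, score in enumerate(scores, start=1):
        if score > threshold:
            if start is None:
                start = pos  # a new disordered run begins here
        elif start is not None:
            if pos - start >= min_len:
                segments.append((start, pos - 1))
            start = None
    # Close a run that extends to the C-terminal end of the sequence.
    if start is not None and len(scores) - start + 1 >= min_len:
        segments.append((start, len(scores)))
    return segments


# Toy profile (invented values): disordered N- and C-terminal ends.
toy_scores = [0.9, 0.8, 0.3, 0.2, 0.6, 0.7, 0.75]
toy_fraction = disordered_fraction(toy_scores)
toy_segments = disordered_segments(toy_scores)
```

For the toy profile, five of seven residues lie above the threshold and two disordered runs are reported, at positions 1-2 and 5-7. Raising `min_len` would filter out short runs, which may be preferable when only extended disordered regions are of interest.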
The TPPP gene was considered to be conserved in the genomes of ciliated/flagellated eukaryotes but to be absent from those that are non-ciliated
The truncated TPPPs evolved by the loss of the last exon of long-type TPPPs in some arthropods (Arthropoda), especially in Endopterygota (insects undergoing metamorphosis). They also occur in another arthropod subphylum, Chelicerata (in ticks and mites), and perhaps in a flatworm,
The combination of short and partial p25alpha domains with various other domains is of special interest. Apicortin is a chimeric protein of partial p25alpha and DCX domains. Its evolution is enigmatic because of its very limited and specific phyletic occurrence: outside the phylum Apicomplexa it is present in only a few species. On the other hand, the DCX domain, which is common in Metazoa, was not found in the photosynthetic megagroup
The other multidomain proteins, present mostly in algae and stramenopiles, seem to be of lineage-specific origin.
Multiple sequence alignments of TPPP proteins by ClustalW used for constructing the phylogenetic tree on
(DOC)
Multiple sequence alignments of TPPP-like proteins by ClustalW used for constructing the phylogenetic trees.
(DOC)
Multiple sequence alignment of
(DOC)
Phylogenetic tree of the short-type TPPPs obtained by Bayesian analysis. Two independent analyses were run with three heated and one cold chain for 2×10⁶ generations, with 1.0×10⁶ generations discarded as burn-in. The numbers at the nodes represent clade credibility values; branches that received maximum support are indicated by full circles. Proteins and ESTs (labeled by asterisk) are indicated by species code and database accession number. ETH (
(TIF)
Phylogenetic tree of the short p25alpha domains obtained by Bayesian analysis. Two independent analyses were run with three heated and one cold chain for 2.6×10⁶ generations, with 2.1×10⁵ generations discarded as burn-in. Species codes are the same as in
(TIF)
Phylogenetic tree of the partial p25alpha domains obtained by Bayesian analysis. Two independent analyses were run with three heated and one cold chain for 1.1×10⁶ generations, with 5.5×10⁵ generations discarded as burn-in. Cr hominis and Cr parvum stand for
(TIF)
Disorder prediction of TPPP-like proteins using POODLE-L (solid line) and IUPRED (dotted line) predictors. Disorder prediction values for the given residues are plotted against the amino acid residue number. The significance threshold (0.5), above which a residue is considered to be disordered, is shown. A)
(TIF)
Phyletic distribution of the TPPP-like proteins.
(DOC)