Tollip or Not Tollip: What Are the Evolving Questions behind It?

Tollip plays an important role in the interleukin-1 receptor IL-1R and Toll pathways. As a modulator of the immune pathway, it indirectly controls the amount of antimicrobial peptides. This could indicate a vital step in maintaining animal immune systems and preventing infection. Evolutionary questions are crucial to understanding the conservation and functioning of the biochemical pathways like the Tollip-mediated one. Through an analysis of 36 sequences of the Tollip protein from different animal taxa, downloaded from Kyoto Encyclopedia of Genes and Genomes (KEGG) databank, we inferred diverse evolutionary parameters, such as molecular selection and structure conservation, by analyzing residue by residue, beyond the canonical parameters to this type of study, as maximum likelihood trees. We found that Tollip presented different trends in its evolving history. In primates, the protein is becoming more unstable, just the opposite is observed in the arthropod group. The most interesting finding was the concentration of positively selected residues at amino terminal ends. Some observed topological incongruences in maximum likelihood trees of complete and curated Tollip data sets could be explained through horizontal transfers, evidenced by recombination detection. These results suggest that there is more to be researched and understood about this protein.


Introduction
There is a lot of biological information deposited in online databases, but little of the data is analyzed properly [1,2]. These data are largely used in bioinformatics, covering various areas such as computer science, mathematics and biological engineering several. Thus it is possible to optimize these studies, in a simple way [3]. The bioinformatic data can be used in phylogenetic analysis, as it is used in most branches of biology, such as phylogenetic trees for paralog genes [4], population analysis [5], evolution, epidemiology [6,7], and genomic and metagenomic sequence comparison [8]. Protein phylogeny is used to indicate synonymous and non-synonymous substitutions along with the branches in order to identify cases of rapid changes of amino acids [9]. The analysis of different trees allows for the observation of topological incongruences, differences in the formation of taxa, and the relationship between nodes and trees [10,11]. The complete phylogenetic inference at species level is presented in the Tree of Life (ToL) Web Project. ToL is a collaborative project of hundreds of phylogenetic researchers correlating diverse sources of information, including morphological, physiological, and molecular information. (This project is a work in progress [12]).
The presence of pathogens in the environment can interfere in the survival and reproduction of individuals in a population, leading to new evolutionary trends [13,14]. Multicellular organisms have a rapid immune response to pathogens entering, named innate immunity. This response is performed by specialized cells, which have specific receptors for pathogen-associated molecular patterns (PAMPs) [15,16], the most noticeable are Toll-Like Receptors (TLRs) [17]. Tollip (Toll-interacting protein) participates in the signaling pathway of the TLR with an endogenous modulatory role. Tollip has a target N-terminal Myb1 (Tom1) binding domain (TBD), a conserved core domain 2 (C2) and a Cterminal portion of coupling ubiquitin to endoplasmic reticulum degradation (CUE). In resting cells, Tollip controls the activation pathway of Myeloid differentiation primary response gene (88) (MyD88)-dependent NF-kB in two different ways. First, Tollip associates with IL-1R, TLR4 after LPS activation, inhibiting the immune response mediated by TLR [18,19]. This association requires TLR-TIR domain and intact C-terminal region of Tollip, CUE domain. Second, Tollip binds directly to interleukin-1 receptor-associated kinase-1 (IRAK-1) by inhibiting an autophosphorylation but without promoting its degradation. Overexpression of Tollip leads to inhibition of TLR2, TLR4, and IL-1R signaling, confirming a modulatory role of Tollip in immune responses [20][21][22][23].
The main goal of this paper is to show the topological incongruences between Tollip protein sequence phylogenetic trees using ToL data as reference. Other goals are to determine the diversity in the evolution of this protein in different taxa, the possible horizontal gene transfers, and the correlation of molecular features in the sequences within primates and arthropod groups.

Material and Methods
Thirty-six sequences of Tollip protein were downloaded from KEGG ( Table 1). The phylogenetic reference used was the Tree of Life Web Project, ToL (http://tolweb.org/tree/), which were used in topological comparisons with Tollip generated data.
All evolutionary analyses were carried out on the MEGA 5 software [24]. Maximum likelihood phylogenetic trees were obtained for Tollip using a Muscle alignment and G Blocks curation with default sets at PhyML 3.0 [25], using the most appropriate model of amino acid substitution and likelihood scores assessed by TOPALi V2.5 [26,27]. The best model was determined by using the Akaike Information Criterion (AIC) [28,29]. Supports for the nodes were evaluated by bootstrapping with 1000 pseudoreplicates.
The effect of reticulate evolutionary events was analyzed through a neighbor-net analysis [30] and converted into a splits graph using the drawing algorithms implemented in SplitsTree4 software -version 4.10 [31]. The neighbor-net method was based on the pairwise distance matrices of Tollip complete sequences alignment with deletion of gaps and non-informative parsimony sites; the matrices were calculated and corrected with the Poisson distribution model [32].
The isoletric point (pI) and molecular weight (MW) of each protein sequence was inferred with the Compute pI/Mw tool [33], the variability present in sequences was accessed through the Protein Variability Server [34], and the main protein characteristics as instability index (ININ), aliphatic index (AI) and GRAVY (grand average of hydropathicity) were evaluated with the ProtParam tool [35]. Tests of correlation between collected data and statistical treatments were made with GraphPad Prism version 5.0 software [36].
To check if selection affected the patterns of genetic diversity, we tested if the protein was under positive selection. Tajima's D statistic [37] was calculated by testing the mutation neutrality hypothesis [38], as previously described by Coscollá and colleagues [39]. In order to investigate the presence of positively selected codons, the estimation of both positive and purifying selection at each amino acid sites was calculated from the ratio of non-synonymous to synonymous substitutions, v, as previously described [40]. Analyses were conducted using the Selecton version 2.1 software [41,42]. The significance of scores was obtained by using a Likelihood Ratio Test that compares two nested models: a null model that assumes no selection (M8a) [43] and an alternative model that does (M8) [44].
Several approaches were used to determine the extent of recombination in the Tollip data set. First, Tollip protein sequences previously aligned at Clustal W2 [45] were backtranslated at BioEdit [46] package using standard genetic code, to normalize the codon frequencies and bring/make the comparisons more accurate. Once recombination eventually creates mosaic sequences in which evolutionary history at each site may be different. Then, GARD method [47] available in Datamonkey server [48] was also used to search for evidence of phylogenetic incongruences, and to identify the number and location of breakpoints corresponding to recombination events. In order to confirm GARD results, the recombination was assessed using a recombination cost ''delta dirac'' and mutation cost ''Hamming'', implemented in the Recco program [49]. The gap extension cost was fixed to 0.2 and the statistical significance of the analysis was obtained after 1000 permutations. Validation of the previously obtained results was performed with the six methods implemented in the RDP3 program [50]: RDP [51], GENECONV [52], BootScan [53], Maximum Chisquared Test [54], CHIMAERA [55] and Sister Scan [56]. The analysis was performed with default settings for the detection methods, a Bonferroni corrected P-value cut-off of 0.05, and a requirement that each potential event had to be detected simultaneously by four or more methods.
A tridimensional model was generated starting from hsa protein sequence, evaluating I-Tasser server [57], using default sets. This approach was used in order to assess the potential implications of our findings in the tertiary protein structure. Tests to search for ligands and hot spots in the protein were ran using the Profunc application [58] at PDBSum [59].

Results and Discussion
The maximum likelihood composite trees generated are shown at Figure 1. The incongruent topology is evidenced by different branch sorting between phylogenetic trees of Tollip complete and G blocks trees, and between the current phylogenetic organization available at ToL. The most appropriate model for explaining the evolution of Tollip was found to be mtREV24 [60,61] with following the parameters: BIC of 5424.48, AICc of 5006.19, lnL of -2432.61, Invariant sites n/a, and Gamma parameters of n/a.
The groups were split based on the Tollip protein sequence, confirming the evolutionary relatedness constructed in the ToL project. Despite this, a new phylogenetic array suggests other relationships between protein sequences of these animals. In Figure 1B, notice a node formed by subgroups 13 to 17, which include porifera (subgroup 17) and cnidaria (subgroup 16), together with bilateria, subgroups 13 to 15. The other branch is composed just by vertebrata, group I, which remains like a monophyletic group in all the trees, confirmed by bootstrap values in Figures 1B and 1C, respectively 81% and 89%. The separated groups, based in the Tollip sequence reinforces the evolutionary relatedness constructed in the ToL project. Although, other relationships between the vertebrates are suggested. Indeed, the group I suggests a characteristic function of the protein in order of higher levels of organisms complexity, requiring a less stable molecule for the modulation of information which will be discussed later. In the group I, the primates (subgroup 1) are in two arranges, in Figure 1B the ptr and ggo (bootstrap value of 100%) are far the other primates (bootstrap value of 94%) suggesting one difference in complete sequence, the non conserved sites difference ptr and ggo from the others. The conserved sites separated the two primates too, Figure 1C, the two sequences hold similarities between them in this molecular level.
In Figure 1C, there are two branches, one of them is formed by groups I to V and the other just by group III, except for subgroup 14. This configuration is due to the alignment of conserved sites in the sequences, showing the variability of Tollip sequences between different organisms. When the Tollip complete sequence was analyzed ( Figure 1B), this configuration changed. The group III remains like a monophyletic group, the alignment of non conserved parts does not change significantly the branch, subgroup 14 returns to the group III and the subgroup 15 becomes paraphyletic relatively to group III. This allows us to consider group III as close enough to be relatively consistent throughout the entire analysis.
The average length of Tollip was 262 amino acids with a standard deviation of 64 amino acids; and the molecular weight average was 31.33 kDa with a standard deviation of 6.78 kDa. Through splits decomposition and analysis of alignment, after block curing, we could estimate the proportion of invariant sites as 68.54% and the segregant sites counted was 77 in amino acid base. These findings suggest a tendency of recurrent duplications and/ or insertions, as well as deletions evidenced by variations in the length and mass of protein ranging between approximately 25% and 4.62 fold, respectively. But it is important to stress that the protein activity seems be intact or just slightly reduced, once its activity is essential to keep the health in animals. Tollip participates in several immune pathways, mediated or not by Myd88 [18,19], modulating the responses and the loss or reduction of its affinity to molecular complexes made between it and several other compounds, like IRAK-1 [20][21][22][23], could trigger an exaggerated response of the immune system, leading to death in some cases. Though, there are studies, like Didierlaurent's [62], which affirm that mice lacking Tollip become healthy and fertile. Despite all identified polymorphisms and mutations of Tollip, we could not make any inferences about its role in the TLRtriggering activation of dendritic cells, without more in vitro and in vivo tests. Although, some studies [22,62] have revealed that Tollip does not have a fundamental function in the TLRtriggering activation pathway of dendritic cells. Mutant mice lacking the tollip gene, when compared with wild-type mice, have been shown to not have significant differences. Therefore, mutations in key-residues for Tollip activity does not imply differences at the activation level of dendritic cells.
The characteristics of proteins were evaluated (Table 1) and the distribution of instability index (ININ), aliphatic index (AI) and GRAVY followed the normal distribution (P Kolmogorov Smirnov test .0.05). The molecular weight showed a statically significant correlation coefficient with aliphatic index (p = 0.014; r = 20.4065); another characters showed correlations between the aliphatic index and the instability index (p,0.001; r = 20.5829), and between GRAVY and the aliphatic index (p,,0.001; r = 0.6714). These data are shown in Table 2. The protein variability (Figure 2), was measured by the Shannon coefficient. We observed the Tollip G-block cured proteins, and the regions  that have continuous conserved residues are probably responsible for the catalytic reactions or binding interactions.
The variability of sequence lengths implies a complex organization of Tollip function or adjustment in diverse immune pathways. The standard deviation of the number of residues in a protein probably reflects a process of tertiary structure "stabilization", evidenced by increasing AI values, which were positively correlated (r = 0.366; p = 0.14) with arthropod group. Higher molecular weights showed higher hydrophobicity by AI (p = 20.407; r = 0.007) and GRAVY results showed similar findings, being correlated with AI too (r = 0.671; p = 3.7610 26 ). These aliphatic residues seem to be essential to the evolving process of this protein. The ININ revealed by itself a tendency of accumulation of instabilities in all groups except the arthropods, once a positive correlation of these values was shown between primates group (r = 0.515, p = 0.001) and another negative was shown between arthropods and ININ (r = 20.413; p = 0.006). These tendencies are related with a hydrophobization of the entire molecule, which increase the molecule lifetime, being advantageous for their group due to rapid molecular ratio and molecular turnover [63][64][65].
Primates revealed just a tendency to increase instability of protein (ININ; r = 0.515, p = 0.001); this is related to lower half-life in this protein. It is in agreement with the higher available energy in these animals, in opposite that observed in arthropods or small animals. The cell environment of superior chordata can be very unstable which enables physiological reactions, with rapid and efficient beginnings and ends. The Tollip has sites for ubiquitination [22], which considerably reduces its life-time. In these groups of animals, it seems that Tollip has more sites available or sites with more affinity to ubiquitin.
Starting from a virtual model of this protein, made from hsa sequence, at I-Tasser server using default variables had an estimated accuracy measured through TM-score of 0.3460.12 and a RMSD of 14.163.8 Å . We tried to identify the pattern of hydrophobic pockets, but it was seen that aliphatic residual apolarity is homogenously distributed along the entire sequence ( Figure 3.A and B). The main residues, evidenced by conservation ( Figure 2.A), likely construct the reaction pockets and in the order of the modular characteristics of this protein [66], the alphahelixes and beta-sheets are domains for binding to specific ligands. Some ions showed to be important to conservation of tertiary structure as calcium II (BS Score 1.34-1.39, RMSD 3.00, TM-Score 0.349). It contacts with G69-D74-D121-E122-R123 residues, as can be seen at Figure 3.C, that are relatively conserved. An organic compound, ligand 768 (1-(2,4-dichlorophenoxy)-3-{2imino-3-[2-(1-piperidinyl)ethyl]-2,3-dihydro-1H-benzimidazol-1-yl}-2-propanol), was found to be a ligand which contacts with E118-I131-A132-W133-L154 residues, as can be seen at Figure 3.D, with a RMSD of 5.51, TM-Score of 0.25 and BS Score of 0.86. This ligand 768 is related with inhibition of calcium-dependent membrane binding activity of prothrombin and of factors Va, VIII and Xa of human coagulation pathway [67] by interaction with C2 domain. This interaction is consistent with Tollip modular criteria and its functions, revealing a potential need of Ca 2+ to maintain the C2 domain structure and could be potentially inhibited by 768 ligand.
Splits graph (Figure 4) using a neighbor-net analysis, excluding parsimony uninformative sites, gaps and constant sites, showed a concentrated reticulation in the evolutionary history of Tollip, mainly disposed in superior animals. These groups present a complex diversification history. Indeed, these incongruences evidenced by trees (Figure 1) reveal an interesting clustering pattern in this protein, stressing the diversification of arthropod in detriment of others. This division is due to the diverse pathogenic Table 2. Correlation between main assessed protein characteristics of Tollip.

20.22
These characters were evaluated under ANOVA 1 Factor test, followed by Bartlett's test for equal variances at 5% of significance. To evaluate the correlation coefficients and p-value was made using the Bonferroni   elements that eventually could enter in contact with the arthropod and the ubiquity of their presence in almost all possible environments (earth, air and water) could make this process more efficient and fast. The high bootstrap values evidence the strong support of presented nodes and clusters. GARD found at least 3 breaking-points with statistical significance (p,0.001, KH test) and these findings were supported by Recco analysis with 1000 bootstraps and by at least four Figure 5. Recombinational events involved with Tollip evolution. Each sequence are represented by a color and the recombination is evidenced by donor. All analyses were evaluated with RDP and the most significant P value to support the findings are shown at Table 3. doi:10.1371/journal.pone.0097219.g005 Table 3. Potential recombinant events identified in Tollip with RDP.  (Table 3). RDP showed the same breaking-points ( Figure 5) which comprised the hypothesis that recombinational events generated or could isolate some groups bringing new specific pathways. Some incongruences, as discussed before, could be explained by these events. Owing to a strange pattern of recombination found, the most probable hypothesis to support our findings is the horizontal transfer mediated by viruses or bacteria [68], that lived in the same environment of the two species, as some donors could not be identified with a high-threshold confidence level, these events could take part of very long and intrinsic evolutionary histories. The molecular clock evaluated with the sequences (Table 4) showed that the sequences really presented different evolving patterns, where the sequences have increasing patterns of substitution when the complexity of the organisms become higher. These findings suggested that the Tollip evolutionary pattern is related with successive insertions and deletions that change the protein primary structure in order to bring less stable products; this is explained by the protein turnover that turns higher when the available energy and size of the animal increases [63][64][65].
Tajima's D statistic was 1.8226, meaning a tendency to positive selection. To assess the positive selection, we normalized the sequences using BioEdit through the back translation device, once the problem of codon preferences for each species could interfere in posterior analyses. To evaluate the results showed at Tajima's D statistic, the Selecton server was used and the results ( Figure 6) showed a positive selection operating in almost all residues (49.33%) with statistical significance, M8 versus M8a as null model, evidenced by DLnL value of 179.6 (p,0.001).
These findings are consistent with the data presented by analysis of the segregant and conserved sites, where it was determined that the protein is variable and presents a very active process of restructuring. The protein domains from amino-terminal ends are under a high positive selection, indicating that these parts of protein are variable and become higher adaptative values with more variability. Several consecutive amino acids presented in the second domain in the sequence a relative conservation, including a tendency to negative selection. These residues are related with the activity of the protein. Indeed, they could participate in the Tollip The molecular clock test was performed by comparing the ML value for the given topology with and without the molecular clock constraints under Poisson model [32]. The null hypothesis of equal evolutionary rate throughout the trees were rejected at a 5% significance level (P,,0.05). The analysis involved 36 amino acid sequences. All positions containing gaps and missing data were eliminated. There were a total of 88 positions in the final dataset. Evolutionary analyses were conducted in MEGA5 [24]. doi:10.1371/journal.pone.0097219.t004 protein-protein and protein-lipid interactions, crucial to the right performance on the pathway. The modulatory activity of Tollip is directly related to their association with different intracellular factors, such as other proteins and calcium ions. We have noted that these residues responsible for such interactions suffer broadly neutral to negative selection, which in fact was obviously expected in order to keep their functionality.
Tollip polymorphisms were correlated with several human diseases like atopic dermatitis [69], inflammatory bowel disease [70], tuberculosis [71] and other. In atopic dermatitis (AD), Singlestrand conformation polymorphism (SSCP) of the tollip sequence is correlated to AD. We could infer that amino acid exchanges of A (Ala) to S (Ser) occur at residue 222. Ser222 has a higher correlation to healthier controls (5.4%) when compared with Ala222 (2.7%) [69]. Residue 222 is occupied by A in 52.78% of sequences and is largely distributed in vertebrate sequences (subgroups 1-10), excluding oaa sequences. We found that it suffers a strong positive selection, through Selecton analysis (Fig. 6), shown as residue 245. In this case, seems that residue is conservated through vertebrata, subgroups 1 to 10, and we are inclined to believe that this is a trait which has became from an ancestor at amphibian level, and this mutation could be benefic to them too.
Tollip, in inflammatory bowel disease, suffers an inactivation that makes the intestinal epithelium unable to inhibit LPS-induced NF-kB activation [70], through a mutational amino acid exchange of lys150glu. In this sense, all primates present the residue K (Lys) at this position (except the ggo and ptr, which present R (Arg) and G (Gly), respectively), despite the common trend to present residue E (Glu) in other animals (55.56%). Residue D (Asp) in this position is important for insects (except for tsp and phu sequences, presenting E and Q (Gln) residues, respectively). This residue is under positive selection (position 166 at Selecton file, Figure 6). This mutation is apparently negative to health in primates and is a trait which was largely incorporated by other animals, reflecting a directional selection in this group, as occurs among insects.
In addition, a study involving tuberculosis (TB) and tollip [71] reveals that some synonymous polymorphisms or some that occur at noncoding regions (intronic regions or 3'UTR) could trigger different levels of risk of TB, not identifiable with protein analysis. This study also shows an association between minor homozygote of single-nucleotide polymorphism (SNP), named rs5743899, and a trend of reduced levels of Tollip mRNA in comparison with heterozygotes and major homozygotes, driving down the Tollip expression levels regulating cytokine response. Still, another SNP (rs3750920) was associated with increased levels of Tollip mRNA, providing protection to the organism against TB. The hypofunctional Tollip genotype has an association with increased levels of proinflammatory cytokines and increased risk of TB, as well as production of augmented proinflammatory cytokines. However, the assessed SNPs were related to synonymous variations or mutations in non-coding regions, and therefore, our data could not reveal any kind of correlation with that. Shah et al. [71] finds that tollip's effect in murine models are not applicable because tollip behaves differently in humans. There is a need for more research in this area.

Conclusions
Tollip presents diverse evolutionary tendencies and several of them are indicating successive modifications in the protein structure, in order to stabilize the tertiary structure accumulating aliphatic residues. Primates generally have more unstable proteins, while arthropods have more stability at ININ, AI and GRAVY level. Size was not correlated with any groups and seems to be highly variable in all groups. In/del trends were saw as very frequent. The three dimensional structure analysis revealed the modular characteristic of this protein and the necessity of Ca 2+ to keep the correct pocket of C2 domain. Ligand association studies revealed that 768 ligand probably could inhibit the Tollip activity. Positively selected residues were found in almost all domains, but a considerable part of them are relatively conserved, indicating a conservation of active pockets, which is consistent with maintaining protein right activity. The tested animal groups were differentially grouped, when studied with parsimonious and nonparsimonious residues, and revealed through molecular clock analysis that they present different selection and evolving speeds. The recombination supports diverse incongruences observed in the phylogenetic trees obtained with complete and cured Tollip data sets. There are no evidences that support a homogeneity in this immunologic pathway, once Tollip presented evolving trends that are not constant for all groups. Summarizing, some groups are highly evolutionary closed, as arthropods and primates, but when compared between them are totally non consistent.
In conclusion, differences in Tollip structure among vertebrates could be detected as well as changes occurring in the primary structure through evolutionary processes. Once these changes occur in Tollip's structure, the same must occur with other structures in the IL-1R and TLR pathway. Adaptive immunity is commonly seen as the most evolved aspect of the immune systems of these organisms, but our data suggest that innate immunity in vertebrates could also be evolving differently among the species in order to promote a better adaptation to their reality.