The Evolution and Origin of Animal Toll-Like Receptor Signaling Pathway Revealed by Network-Level Molecular Evolutionary Analyses

Genes carry out their biological functions through pathways in complex networks consisting of many interacting molecules. Studies on the effect of network architecture on the evolution of individual proteins will provide valuable information for understanding the origin and evolution as well as functional conservation of signaling pathways. However, the relationship between the network architecture and the individual protein sequence evolution is yet little known. In current study, we carried out network-level molecular evolution analysis on TLR (Toll-like receptor ) signaling pathway, which plays an important role in innate immunity in insects and mammals, and we found that: 1) The selection constraint of genes was negatively correlated with its position along TLR signaling pathway; 2) all genes in TLR signaling pathway were highly conserved and underwent strong purifying selection; 3) the distribution of selective pressure along the pathway was driven by differential nonsynonymous substitution levels; 4) The TLR signaling pathway might present in a common ancestor of sponges and eumetazoa, and evolve via the TLR, IKK, IκB and NF-κB genes underwent duplication events as well as adaptor molecular enlargement, and gene structure and conservation motif of NF-κB genes shifted in their evolutionary history. Our results will improve our understanding on the evolutionary history of animal TLR signaling pathway as well as the relationship between the network architecture and the sequences evolution of individual protein.


Introduction
Genes perform their biological functions within genetic pathways via interacting with other molecules in networks [1,2]. It may provide important insights into the evolutionary constraints of genes in molecular networks via establishing the genetic variation pattern of genes across networks and revealing the impact of natural selection on such variability. Many studies have shown that proteins in the center of networks experienced strong evolutionary constraints, while proteins in the periphery of networks seemed to undergo positive selection, and physically interacting proteins in networks showed similar evolutionary rates [3,4,5,6]. The distribution of gene selective pressure within networks might depend on network function [7]. However, it's still unclear how the network architecture impacts on gene evolution in networks. Comparison of the nucleotide substitution patterns in multiple networks would possibly discover general rules for gene evolution within genetic networks.
Downstream genes experienced relaxed selection pressure and evolved faster than upstream ones do in several pathways, such as the anthocyanin pathway [8,9,10] and the carotenoid & terpenoid biosynthetic pathway in plants [11,12], and the melanin synthesis pathway in silkworms [13]. In these pathways, a gene nucleotide substitution rate always positively correlates to the gene position in network. In contrast, downstream genes tend to evolve more slowly than upstream ones do in the insulin/Tor signaling pathway in Drosophila and vertebrates [2,14]. In the insulin/ Tor signaling pathway in Caenorhabditis, there is no relationship between the rate of nonsynonymous/synonymous substitution and the position of a gene in network [1], while in the HOG-signaling pathway in yeast [15] and the N-Glycosylation metabolic pathway across primates [16] there is a negative correlation between the nucleotide substitution rate and position of a gene in network.
The TLR signaling pathway, which plays a central role in innate immunity from Drosophila to mammal [17], is highly conserved in structure and function from insects to vertebrates [18]. Many immunomodulatory properties of the TLR signaling pathway have been found in some early metazoan genomes such as Nematostella vectensis [19], Trichoplax [20] and Amphimedon queenslandica [21]. Surveying early-branching metazoans (Nematostella vectensis), and the recent discovery of unusual TLR-like genes in various multicellular animals (Amphimedon queenslandica and Hydra magnipapillata) [22,23,24] would help us to decipher the origin of this toll-like receptor superfamily. Although the NF-kB gene was detected in single-celled eukaryote Capsaspora owczarzaki, the tolllike receptor was not found in single-celled eukaryotes [25], and the NF-kB gene is lost in Caenorhabditis elegans [26]. The evolution history of the animal TLR signaling pathway is still unknown.
Here, we study the evolution and origin of TLR signaling pathway in animals at network-level. Our results showed that the nucleotide substitution rate was negatively correlated with gene position along TLR signaling pathway from receptors to transcription factors, and all genes underwent relatively strong purifying selection. We also found that selective pressures on genes along the pathway were driven by nonsynonymous substitutions. More importantly, we provided evidences to support that the TLR signaling pathway was presented in Porifera, but not in choanoflagellate Monosiga brevicollis, Saccharomyces cerevisiae and Caenorhabditis elegans. In addition, NF-kB genes underwent strong selection pressure in the evolution and showed positive selection in some branches of the evolutionary tree, which might be connected with the gene duplication, gene structure shift and domain lost. Our findings suggested that the TLR signaling pathway might present in a common ancestor of eumetazoa, such as sponge and placozoans.

Data collection
The data set contained the TLR signaling pathway-related genes from eight vertebrates. These genes and their interaction networks were downloaded from the KEGG database (PATH-WAY: map04620). Here, we mainly focused on these genes involved in TLR signaling pathway. We downloaded these protein coding sequences (CDS) and protein sequences of Homo sapiens (human), Pan troglodytes (chimpanzee), Macaca mulatta (macaque), Mus musculus (mouse), Bos taurus (cow), Gallus gallus (chicken), Xenopus tropicalis (western clawed frog), Danio rerio (zebrafish) from KEGG database. Finally, the dataset was composed of 50 genes listed in Table 1 and Table S1.
However, some sequences from non-human vertebrates showed long deletions in the middle of the genes when aligned to their orthologous group, were likely to be artifacts due to a low quality of the sequence or of the annotation. The putative missing parts of incomplete CDS were recovered through a similarity-search-based procedure as follows: first, a BLAT search against the whole genome of interest on the UCSC genome browser (http:// genome.ucsc.edu/cgi-bin/hgBlat) was performed; if not successful, a BLAST search against NCBI Traces (http://blast.ncbi.nlm.nih. gov/Blast.cgi) or Ensemble genome browser (http://www. ensembl.org/info/about/species.html) of whole genomes was performed. When a homologous genomic region could be retrieved, the structure of the gene was predicted with the Wise2 program of the GeneWise tool [27] applied with default options. This program was used to predict the structure of a gene (intronsexons) given a genomic DNA region and a protein sequence of high homology to the putative one. The genomic DNA identified through the BLAT search and the protein sequences of the human reference genes were given as input to the Wise2 which predicted the gene structure. Only good predictions in which there were no internal stop codons or frame shifts were accepted. The longest transcript of a gene with more than one transcription was chosen for analysis (detailed information Table S1).
Moreover, to explore the evolutionary origin of the TLR signaling pathway, tblastn and blastp were employed to search the orthologs genes of TLR signaling pathway in genomes of single-celled eukaryotes (Saccharomyces cerevisiae, Monosiga brevicollis), sponges (Amphimedon queenslandica), sea anemone (Nematostella vectensis), hydra (Hydra magnipapillata), nematode (Caenorhabditis elegans), fruit fly (Drosophila melanogaster), amphioxus (Branchiostoma floridae), zebrafish (Danio rerio), frog (Xenopus tropicalis), and human (Homo sapiens) and so on. All human sequences of TLR signaling pathway-related genes were searched against the Refseq protein data from different organisms. All blast hits were filtered, and only sequences with blast score .150 and length .50 were examined.
After that, sequences were tested using reciprocal blast search. A gene was assigned as a homologous gene if the best hit of one blast search matched the best hit of the other. If the sequences were not found reciprocally in two genomes, a gene was assigned as a homologous gene if it is with the best coverage, with score .150, relative identity .30% and relative similarity .40%. If the parameters of a protein were lower or domains of a protein were not similar to its Refseq protein, we would assign it as a ''non homologous gene''. When we did not find an ortholog, we verified the lack of orthologous sequences with tblastn against the genome sequence and EST libraries of the relevant organism [28,29]. These above dates have been shown in Table S3.

Multiple Sequence Alignment and phylogenetic analysis
Multiple sequence alignment was performed for orthologs sequences of TLR signaling pathway using MUSCLE [30] with default parameters, which were manually curated if necessary. Gene structure and position of motifs were checked by hand using data from Entrez gene, and conservation of motifs was predicted by SMART software (http://smart.embl-heidelberg.de/). Phylogenetic analysis was conducted using MrBayes [31] for bayes analysis with mixed amino acid substitution model, and using PhyML [32] for ML analysis with default parameters. Phylogeny support was verified with the bootstrap consensus tree inferred from 1000 replicates.

Codons-Based sequence Analysis
Nonsynonymous/synonymous substitution rate ratio (v = dn/ ds) can provide a measurement for the change of selective pressures. Respectively, v = 1, ,1 and .1 will indicate neutral evolution, purifying selection, and positive selection on the target gene. Here, we used a codon-substitution model implemented in the CODEML program in the PAML4.4 software package [33] to analyze changes of selective pressure, which allow for variable selection patterns among amino acid sites, M0 (one ratio), M1a (nearly neutral), M2a (positive selection), M7 (beta), M8 (beta & v), M8a to test for the presence of sites under positive selection. We further tested for the presence of codons evolving under positive selection by contrasting the M1a and M2a models, and the M7 and M8, M8 and M8a models by the likelihood ratio tests (LRTs). Otherwise, to explore the divergence of different branch of NF-kB in the evolution history, we also used branch model of codeml soft to compute the nonsynonymous/synonymous substitution rate ratio of different branches.

Multivariate analysis
To analyze the evolution of each gene within the context of the structure in the network, we computed differently topological parameters and explored their correlation with the evolutionary rates of genes given by model M0. Some studies indicated that selective constraint levels of the different position of the biology networks might be affected by different factors, including protein length [34], connectivity [4], codon bias [35], and so on. Thus, we performed a multivariate analysis considering dn, v, the pathway position, and some other parameters influencing selection levels (codon bias, protein length and connectivity). First, we evaluated whether there is significant correlation among these parameters using Spearman's rank correlation coefficient (r). Then, we analyzed the data using partial correlations and path analysis. Path analysis is an extension of multiple regression analysis that allows decomposing the regression coefficients into their direct and indirect components by considering an underlying user-defined causal model, and to assess the statistical significance of the relevantly direct components. We therefore performed path analysis to find which is the main factor influencing the trends of the v values. Connectivity, pathway position, protein length were considered as exogenous variables, whilst the ENC (effective number of codons ), dn and v were considered as endogenous variable. All these analyses were conducted using the PASW statistical software. Connectivity was estimated by the number of PPIs which data from the human interaction network of Bossi and Lehner [36]. The codon usage bias of each orthologous group gene was measured by the median of the ENC across all studied species. ENC values were computed using the DnaSP 5.10.01 software [37].

Results and Discussion
Analysis of the evolutionary rates The selection pressures acting on genes can be inferred by nonsynonymous/synonymous substitution rates. For each gene in the TLR signaling pathway, the M0 model estimated a single nonsynonymous/synonymous substitution rate (v = dn/ds) for all lineages.
The v values of genes in the TLR signaling pathway ranged from 0.00721 (MAPK9) to 0.25817 (TLR4) using the M0 model (Table 1). These results indicated that TLR signaling pathway-related genes have undergone strong purifying selection with strong functional constrains. In addition, we found the variation pattern along the TLR signaling pathway was unlikely to predominantly result from positive selection acting on the upstream genes (Table S2), but may be caused by increasing levels of purifying selection [1].

The strength of the selective constraints and pathway structure
Since genes in the TLR signaling pathway had a clear sequentiality (Fig.1), the correlation between the evolutionary rate of a gene and its position in the pathway was tested. The results demonstrated that the v value of a gene was strongly negatively   Table 2 and Fig. 2), and indicated that the downstream genes evolved more slowly than the upstream ones did. Similar results were also observed along the insulin/Tor pathway in Drosophila [2], and in vertebrates [14], as well as along the N-Glycosylation metabolic pathway in primates [16]. Our findings suggested that the topological structure of the TLR signaling pathway influenced the distributions of selective constraints of genes. One possible explanation was that in signal transduction pathway, evolutionary changes might preferentially localize to the receptor which interacted with the external environment of cells, while downstream elements which were located within a more stable cytoplasmic milieu might be expected to be under stronger purifying selection. Consistently, a global analysis of human-signaling pathways showed that purifying selection increased from the extracellular space to the nucleus [38].
Analyses were further performed to determine the relationship between the nonsynonymous substitution rate (dn) and synonymous substitution rate (ds) of genes and their position in the pathway. Spearman rank's correlation analysis showed that the gene position was significantly correlated with the dn (r = 0.439, P,0.001), but non-significantly correlated with the ds (r = 0.243, P = 0.089). The results implied that nonsynonynous substitution rate changes might be the main contributor to the above negative tendency.
The selective constraints on the pathway genes could be affected by other factors. The association between evolution rate of gene in the TLR signaling pathway with its codon usage, connectivity and protein length were explored (Table 2). A simple Spearman's rank  In addition, as shown in Table 2, the position of gene in the TLR signaling pathway was significantly correlated with the v value, dn (r = 20.439, P,0.001), codon bias (r = 20.321, P = 0.02), protein length (r = 20.447, P,0.001) and connectivity (r = 0.479, P,0.001). And the dn vaule was significantly correlated with protein length (r = 0.48, P,0.001). The ds vaule was also significantly negatively correlated with codon bias (r = 20.306, P = 0.03), and significantly positively correlated with connectivity (r = 0.285, P = 0.05). The protein length was significantly positively correlated with codon bias (r = 0.358, P = 0.01), and significantly negatively correlated with connectivity (r = 20.308, P = 0.03).

Multivariate analysis
To clarify whether the observed correlations resulted from indirect or direct effects, two multivariate analysis techniques were applied (partial correlation analysis and path analysis) to evaluate the association between the pathway position and v, dn, and ds values controlling for the factors discussed above. Partial correlation revealed that when controlling for gene position, the correlation between connectivity and v vaule was non-significant (r = 20.103, P = 0.48), but the correlation between position and v was still significant when controlling for connectivity (r = 20.676, P,0.001). Similarly, when controlling for v, the correlation between position and connectivity was non-significant (r = 0.136, P = 0.353), the correlation between dn and position was also nonsignificant (r = 0.095, P = 0.516). Instead, the correlation between dn and v holded significant when controlling for the position and connectivity (r = 0.738, P,0.001). This result supported that the main observed effect was the correlation between dn and v. The dn and position of genes in TLR signaling pathway were very important factors affecting selection pressure of TLR signaling pathway-related genes.
Recent studies showed that protein length was not the main factor to affect the pathway evolution process [2,14]. However, as shown in table 2, protein length was significantly correlated with all other factors except ds. To address this issue, partial correlation analysis was applied to evaluate the association between protein length and other factors. When controlling for v, the protein length was not correlation with position (r = 20.215, P = 0.138), dn (r = 0.05, P = 0.735), and connectivity (r = 0.001, P = 0.995). Similarly, when controlling for connectivity, codon bias, position, respectively, the correlation between v and protein length consistently showed significant. Therefore, the v was a main factor in our study, and the protein length could correlate with other factors indirectly. Recently, many studies showed that protein length appeared to be an important factor virtually influencing all aspects of molecular evolution [34,39,40]. Thus, protein length may influence the evolution of the TLR signaling pathway of vertebrate.
To better characterize the relationships within these variables, a path analysis under the model was performed and the result was presented in Figure 3. The path analysis revealed that the v value was affected by the dn and position of the gene in the pathway (standardized path coefficient, dn: b = 0.649, P,0.001; position: b = 20.269, P = 0.006), even after removing the effects of putatively relevant factors (codon bias and protein length). The dn value was positive associated with protein length (b = 0.347, P = 0.016) and negative associated with position (b = 20.448, P = 0.002) and codon bias (b = 20.290, P = 0.026). And the protein length could affect the v value through affecting the dn value. Among the features that described the network, the dn was the main factor shaping the rate of the natural selection. Nevertheless, these factors affecting protein evolution in TLR signaling pathway were complex, with interplay of many factors.
The evolutionary origin of the NF-kB-mediated TLR signaling pathway TLR signaling pathway played an important role not only in innate immunity in vertebrate and insect, but also in development of embryos dorso-ventral pattern at Drosophila [18,41]. However, the evolutionary origin of the TLR signaling pathway is still unknown. To explore the evolutionary origin of the TLR signaling pathway, blastp and tblastn were employed to search the orthologs genes of TLR signaling pathway in genomes of single-celled eukaryotes (Saccharomyces cerevisiae, Monosiga brevicollis), sponges (Amphimedon queenslandica), sea anemone (Nematostella vectensis), hydra (Hydra magnipapillata), nematode (Caenorhabditis elegans), fruit fly (Drosophila melanogaster), amphioxus (Branchiostoma floridae), zebrafish (Danio rerio), frog (Xenopus tropicalis), and human (Homo sapiens). Compared to NF-kB which was not been found in prokaryote [25], the MAPK signaling pathway was involved in invasive growth and detected in prokaryote [42]. Thus, MAPK signaling pathway was not taken into account. Figure 4 and Figure 5 depicted the composition of evolution history of the TLR signaling pathway from sponges to vertebrates. In Saccharomyces cerevisiae and Monosiga brevicollis genomes, homol- ogous of the core pathway were not found, such as toll-like receptor genes and NF-kB genes (data not shown). Therefore, convincing evidence was lacking for the TLR signalling pathway in single-celled eukaryotes. Recently studies showed that A20/ AN1 zinc-finger domain-containing family proteins were well characterized and known to play a central role in regulating the immune response in animals and responding to different types of stresses including cold, desiccation, salt, submergence, heavy metals, wounding, and the stress hormone in plants [43,44,45]. Interestingly, three A20/AN1 genes were found in Monosiga brevicollis [46], suggesting Monosiga brevicollis could be resistant to pathogen infections. Meanwhile, a single C1qDC gene, which was regarded as a major connecting link between innate and acquired immunity, and Tyrosine kinases (TyrK) proteins, which was essential for cell-cell communication in animals, mediating hormone, growth factor, immune, and adhesion-based signaling, had been detected in the choanoflagellate Monosiga brevicollis genome [47,48].
In the metazoan clade, a relatively complete TLR signaling pathway was identified in Amphimedon queenslandica genome, including some core genes of pathway, such as 2 TLRs, MyD88, SARM1, TRAF, IKK, IkB and NF-kB. This finding suggested that the TLR signaling pathway might origin from early metazoan. Interestingly, the domain analysis showed that TLRs in sponge contained a TIR domain, a transmembrane anchor and two Ig domains. Unlike vertebrates in which TIR and LRR domains were found in a single protein sequence of TLRs, Amphimedon queenslandica genome codes the TIR and LRR domains were not detected in a single protein sequence [23]. However, SDTLR, a TLR-like gene in the sponge Suberites domuncula [49], only had a TIR domain and a transmembrane anchor, and contained no Ig and LRR domains. Therefore, those TLR-like genes might be the ancestor gene of TLRs in the evolution history. Otherwise, the gene structure and motifs analysis of Rel/NF-kB family showed that NF-kB protein had 7 ANK repeats in the Amphimedon, 6 ANK repeats in vertebrates and none ANK repeats in sea anemone (Fig 6). If orthologs were not found, the ortholog place marked a question mark (?) with a circle. Divergence times of major metazoa groups are presented next to the relevant branches. Our result shows that the complete core pathway known from bilaterians is represented in the sponge Amphimedon queenslandica. The pathway is mostly absent in non-metazoans, Saccharomyces cerevisiae and Monosiga brevicollis. The nematode Caenorhabditis elegans and its relatives Caenorhabditis Brigssae are deficient in several of the pathway components. This may indicate of a functional loss of major components in nematodes. doi:10.1371/journal.pone.0051657.g005 Similar TLR signaling pathway topology was found between Nematostella vectensis and Hydra magnipapillata. The cnidarians Nematostella vectensis is considered to be the closest outgroup to the bilaterians among the known extant taxa [50,51]. At least two TLR-like genes were detected in Nematostella vectensis and Hydra magnipapillata. Like the TLR-like genes in Amphimedon queenslandica, the TLR-like genes in Nematostella vectensis and Hydra magnipapillata had a TIR domain and a transmembrane anchor without any LRR domains. Unexpectedly, IRAK orthologous was not detected in two animals: Nematostella vectensis and Hydra magnipapillata. This phenomenon might be due to either the too low identity to be detected between the IRAK-like gene in these two animals and human IRAK genes using reciprocal blast, or incompleteness of gene ontology of these two animals' genomes or incompleteness of the genomes of these two animals. The IKK (I kappa B kinase) gene had gene duplication in Hydra magnipapillata ( Fig. 4 and Fig. 5). Interestingly, although NF-kB contained no ANK repeats and Death domain in Hydra magnipapillata and Nematostella vectensis, the blast result showed that Hydra magnipapillata and Nematostella vectensis NF-kB genes best matched with human NFKB1 gene, but not Rel subfamily genes. Gene structure analysis showed that RHD domain and IPT domain of Nematostella vectensis NF-kB gene had same number of introns with mammal NF-kB subfamily, with 4 introns and 2 introns, respectively. Further analysis of intron phase showed that the intron phase was also similar with other NF-kB subfamily genes, with 0, 0, 2, 1 and 1, 0 in turn, respectively. However, these two genes had the highest identity and similarity to Rel subfamily genes (Table S4). Especially, the Nematostella vectensis NF-kB gene branch had undergone very strong positive selection (v = 65.918) possibly due to the shift of gene structure and motifs (Fig S1). These structures and motifs change of Hydra magnipapillata NF-kB gene might be one of the reasons for these two genes belong to neither Rel subfamily nor NF-kB subfamily in the evolutionary history. We inferred that although the Hydra magnipapillata and Nematostella vectensis NF-kB genes best matched with NF-kB subfamily genes, it might be an ancestor gene of vertebrate Rel subfamily genes.
Similar to other major pathways, the components and complexity of TLR signaling pathway increased from insect to mammal with the evolution of species and changed in the living environment. Firstly, the TLR, IKK, IkB and NF-kB genes underwent duplication events during their evolutionary history. This might have conferred more flexibility to the pathway as the various paralogs could allow fine-tuning of function in different organs [29]. Secondly, the adaptor molecular genes had been enlarged in the evolution history. Our analysis pointed out that the MAL-like gene and the TRAM-like gene were first detected in amphioxus; and the TRIF was first detected in vertebrate. The diversity of the toll-like receptors and the adaptor molecules might connect with recognition of pathogens. Thirdly, in addition to duplication event of NF-kB genes, the gene structure and conservation motif of NF-kB genes had shifted in the evolution history. For instances, NF-kB genes in Hydra magnipapillata and Nematostella vectensis only contains RHD domain and IPT domain without any ANK repeats and Death domain and Relish gene in Drosophila melanogaster without coding Death domain has only two exons compared to other NF-kB subfamily genes containing at least 18 exons ( Figure 6).
In summary, we found that the selection constraint of TLR signaling pathway-related genes was negatively correlated with its position and all genes were highly conserved and underwent relatively strong purifying selection. The distribution of selective pressure along the pathway was driven by differential nonsynonymous substitution level. More importantly, we provided supporting evidences to show that the TLR signaling pathway might present in a common ancestor of sponges and eumetazoa, with gene duplication events, adaptor molecular enlarged, gene structure and conservation motif of NF-kB genes shifted in the their evolutionary history. In addition, Hydra magnipapillata and Nematostella vectensis NF-kB genes only contained RHD domain and IPT domain, but lost ANK repeats and Death domain; Drosophila melanogaster Relish gene lost Death domain and had only two exons, while other NF-kB subfamily genes had at least 18 exons. Despite NF-kB genes had strong selection pressure in the evolution, they had positive selection in some branches of the evolutionary tree. It might be connected with the genes duplication, genes structure shifted and domains lost. Therefore, the gene duplication might be connection with positive selection of specific branches. These results help us to understand the evolution history of TLR signal pathway and gene structure and conservation motif shifted of NF-kB. Our results are helpful for better understanding the selection constraint and evolutionary history of TLR signaling pathway. Figure S1 Phylogenetic relationship of NF-kB family genes. All sequences were downloaded from NCBI, and aligned with MUSCLE. Trees were constructed as described in methods. The bootstrap consensus tree inferred from 1000 replicates. BI tree and ML tree have similar topology, with two clusters: Rel subfamily and NF-kB subfamily, respectively. The pentagram represents the branch underwent very strong positive selection in the evolutionary history.

Table S4
The identity and similarity between any two NF-kB family genes in different organisms. The upper matrix is identity, and the lower matrix is similarity.