Comparative and Evolutionary Analysis of the Interleukin 17 Gene Family in Invertebrates

Interleukin 17 (IL-17) is an important pro-inflammatory cytokine and plays critical roles in the immune response to pathogens and in the pathogenesis of inflammatory and autoimmune diseases. Despite its important functions, the origin and evolution of IL-17 in animal phyla have not been characterized. As determined in this study, the distribution of the IL-17 family among 10 invertebrate species and 7 vertebrate species suggests that the IL-17 gene may have originated from Nematoda but is absent from Saccoglossus kowalevskii (Hemichordata) and Insecta. Moreover, the gene number, protein length and domain number of IL-17 differ widely. A comparison of IL-17-containing domains and conserved motifs indicated somewhat low amino acid sequence similarity but high conservation at the motif level, although some motifs were lost in certain species. The third disulfide bond for the cystine knot fold is formed by two cysteine residues in invertebrates, but these have been replaced by two serine residues in Chordata and vertebrates. One third of invertebrate IL-17 proteins were found to have no predicted signal peptide. Furthermore, an analysis of phylogenetic trees and exon–intron structures indicated that the IL-17 family lacks conservation and displays high divergence. These results suggest that invertebrate IL-17 proteins have undergone complex differentiation and that their members may have developed novel functions during evolution.


Introduction
Interleukin 17 (IL-17) is an important pro-inflammatory cytokine and is a critical component of the immune response to pathogens and of the pathogenesis of inflammatory and autoimmune diseases [1][2][3]. IL-17 was initially identified as a signature cytokine secreted by T helper 17 (TH17) cells, and recent findings have indicated that IL-17 is also produced by other cell types, particularly by the innate immune cell populations involved in the inflammatory process [4]. IL-17 was first cloned and identified as cytotoxic T-lymphocyte (CTL)-associated antigen 8 (CTLA-8), a T-cell-derived cytokine with 58% identity to predicted open reading frame 13, HSVS13, of the T-lymphotropic Herpesvirus saimiri (known as virus IL-17) [5,6]. Six IL-17 family members, IL-17A (the original IL-17), IL-17B, IL-17C, IL-17D, IL-17E (also known as IL-25) and IL-17F, have since been identified, and these proteins range in size from 20 to 30 kDa [7]. Among these family members, IL-17A and IL-17F share the highest amino acid sequence identity (50%), whereas IL-17E is the most divergent, showing 16% identity with IL-17A. Moreover, a novel type of IL-17 family gene (IL-17N) has recently been identified in teleosts [8]. Amino acid similarity among the family members is higher in the C terminus and in five spatially conserved cysteine residues, four of which form a cystine knot fold containing two intrachain disulfide bonds. This cystine knot fold is similar to the canonical cystine knot observed in growth factors such as transforming growth factor (TGF)-β, endocrine glycoprotein hormones (e.g. chorionic gonadotrophin), platelet-derived growth factors (PDGFs), nerve growth factor (NGF) and other neurotrophins, but the canonical knot is formed by six cysteines rather than four [9,10].
Among the IL-17 family members, IL-17A and IL-17F are the best characterized, followed by IL-17C and IL-17E, while IL-17B and IL-17D have remained understudied [1]. Mechanistically, the biologically active form of IL-17 is a 35-kDa homodimer or heterodimer whose activity is dependent on the single-pass transmembrane receptors, IL-17 receptors (IL-17Rs), which have several conserved structural features, including an extracellular fibronectin III-like domain and a cytoplasmic SEF (similar expression to FGF)/IL-17R (SEFIR) domain. The IL-17Rs, as well as the cognate IL-17 family, have little homology with any other known receptors or ligands and therefore are thought to represent a distinct ligand-receptor signaling system that is highly conserved across vertebrate evolution. However, the exact mechanisms of IL-17 signaling have not been fully elucidated [1,11].
Despite the accumulated knowledge of the functions of IL-17 family members and their regulatory pathways, the number of pathways involving the IL-17 family remains unclear [1,2,12]. Some members of the IL-17 family are highly conserved among vertebrate organisms, but evolutionary analysis of the family has mainly been limited to vertebrates and a handful of invertebrates [8,13], and little is known about its origin and evolution in animal phyla. For example, given that sequence identity among IL-17 family members is only 16-50%, the IL-17A-like genes in some phyla may be too dissimilar to be identified; interestingly, however, IL-17D has shown some degree of homology with IL-17-like proteins in primitive phyla such as worms [4]. The identification of similarities and differences in the IL-17 family among animal phyla, particularly invertebrates, could facilitate the elucidation of the functional evolution of this family, as well as allowing further functional verification. The recent large-scale sequencing of the transcriptomes and genomes of invertebrate species [14], particularly non-model organisms [15][16][17], represents a global resource that can be used to investigate IL-17 family members. For instance, in the purple sea urchin, about thirty IL-17 genes and two receptor genes were identified, and many of the ligands are linked in tandem arrays [18]. In this study, we determined the distribution of the IL-17 family among invertebrates, analyzed their exon-intron structures and phylogenetic trees, and explored their origin and evolutionary history in animal phyla.

Ethics statement
No specific permits were required for the field studies described, and the field studies did not involve endangered or protected species.

Identification of IL-17 genes
BLAST searches were used to identify IL-17 proteins. The amino acid sequences of the IL-17 domain previously identified in P. fucata IL-17 (JX971444) and C. gigas IL-17 (ABO93467) were used as query sequences to BLAST against the protein database of each genome for the species mentioned above [19,20]. The E-value threshold was set between 3 and 10, with a maximum of 50 target sequences, to identify as many candidate sequences as possible; other parameters were left at their default values. After the corresponding hits were downloaded from the BLAST results, the sequences were examined using the NCBI CDS program (Batch CD-search, http://www.ncbi.nlm.nih.gov/Structure/bwrpsb/bwrpsb.cgi) with default cutoff parameters to remove sequences that did not contain the IL-17 domain. Among the target sequences obtained, the hit with the largest E-value was then used as the query sequence for a further BLAST search against the protein database of the corresponding species. The sequences from the genome of each species were analyzed independently using Clustal Omega Multiple Sequence Alignment (http://www.ebi.ac.uk/Tools/msa/clustalo/) to eliminate redundant sequences. To simplify the presentation and subsequent discussion, the longest isoform sequence was retained, and other isoforms were removed. For incomplete sequences containing the complete IL-17 domain, only the longest sequence was retained.
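The filtering steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure actually used in the study: it assumes that BLAST hits are available in the standard 12-column tabular format (subject ID in column 2, E-value in column 11), and the function names `filter_blast_hits` and `keep_longest_isoform`, as well as the protein-to-gene mapping, are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def filter_blast_hits(lines: Iterable[str], max_evalue: float = 10.0,
                      max_targets: int = 50) -> List[Tuple[str, float]]:
    """Return up to max_targets (subject_id, evalue) pairs, best E-value first.

    Assumes 12-column tab-separated BLAST output (hypothetical input here).
    """
    best: Dict[str, float] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        subject, evalue = fields[1], float(fields[10])
        if evalue <= max_evalue:
            # Keep the smallest E-value seen for each subject sequence.
            best[subject] = min(evalue, best.get(subject, evalue))
    ranked = sorted(best.items(), key=lambda kv: kv[1])
    return ranked[:max_targets]


def keep_longest_isoform(proteins: Dict[str, Tuple[str, str]]) -> Dict[str, str]:
    """Map each gene to its longest protein isoform.

    proteins maps protein_id -> (gene_id, sequence); the gene assignment is
    assumed to be known in advance (hypothetical for this sketch).
    """
    longest: Dict[str, str] = {}
    for pid, (gene, seq) in proteins.items():
        current = longest.get(gene)
        if current is None or len(seq) > len(proteins[current][1]):
            longest[gene] = pid
    return longest
```

A permissive E-value cutoff such as the one above deliberately admits weak hits, which is why a subsequent domain check (here, Batch CD-search) is needed to discard false positives.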

Sequence analysis and amino acid alignment
Batch CD-search was used to analyze the domains of the identified IL-17 protein sequences, and MEME 4.9.1 (Motif-based sequence analysis tools, http://meme.nbcr.net/meme/) was used to identify motifs in the IL-17 protein sequences. Signal peptides were predicted using SignalP 4.1 [21]. Sequence comparison and phylogenetic analysis were performed using Clustal Omega multiple sequence alignment and MEGA 6.06, using the neighbor-joining (NJ) method with 10,000 bootstrap replications [22].
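To illustrate what a bootstrap replication involves in a distance-based analysis such as NJ, the following minimal sketch resamples alignment columns with replacement and computes a pairwise distance matrix. It is not MEGA's implementation: the use of the simple p-distance (proportion of differing sites) is an assumption made for illustration, and the function names are hypothetical.

```python
import random
from typing import List


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, ignoring columns where either row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def bootstrap_alignment(alignment: List[str], rng: random.Random) -> List[str]:
    """Resample alignment columns with replacement (one bootstrap pseudo-replicate)."""
    n_cols = len(alignment[0])
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[c] for c in cols) for seq in alignment]


def distance_matrix(alignment: List[str]) -> List[List[float]]:
    """Symmetric matrix of pairwise p-distances for an aligned set of sequences."""
    n = len(alignment)
    return [[p_distance(alignment[i], alignment[j]) for j in range(n)]
            for i in range(n)]
```

In a full analysis, a tree would be built from the distance matrix of each pseudo-replicate, and the support for a branch would be the fraction of replicate trees in which that branch appears.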

Exon-intron structure and location of IL-17 genes
For the IL-17 amino acid sequences, the corresponding nucleotide sequences, including the EST and genomic sequences, were obtained. Spidey, an mRNA-to-genomic alignment program (http://www.ncbi.nlm.nih.gov/spidey/), was used to analyze exon-intron structures. Owing to the use of draft genomes, some IL-17 exon-intron structures were not available. In addition, the genomic locations of IL-17 genes were analyzed using the NCBI Map Viewer browser.
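Once an mRNA-to-genomic alignment has yielded exon coordinates, the intron number follows directly from the gaps between consecutive exons. The sketch below illustrates this step only; it assumes 1-based inclusive coordinates on the forward strand, and the function name and the example coordinates are hypothetical rather than taken from Spidey output.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive genomic coordinates (assumed)


def introns_from_exons(exons: List[Interval]) -> List[Interval]:
    """Return the gaps between consecutive exons, in genomic order."""
    ordered = sorted(exons)
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_end + 1:
            # The intron spans the bases strictly between the two exons.
            introns.append((prev_end + 1, next_start - 1))
    return introns


# Hypothetical three-exon gene: two introns are expected.
example_exons = [(101, 250), (600, 720), (1500, 1810)]
example_introns = introns_from_exons(example_exons)
```

A single-exon gene yields an empty list, corresponding to an intron number of 0.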
Furthermore, this table indicates that the length of invertebrate IL-17 proteins generally ranges from 100 to 250 amino acids but fluctuates greatly when compared with vertebrate homologs. More drastic changes are found at the EST or exon sequence level, ranging from a few hundred to about two thousand base pairs, although the sequence data of some species mentioned above are insufficient or contain errors. The IL-17 domains generally contain approximately 70 amino acids and are located in the C-terminal region of the sequences. Interestingly, there are some exceptions: 1) three IL-17 superfamily domains with repetitive protein sequences in S. purpuratus SPU_019350.1; 2) two IL-17 superfamily domains with different protein sequences in S. purpuratus SPU_022838.1; 3) one IL-17 domain that partially overlaps with an incomplete YccV-like superfamily domain in C. teleta 209749; and 4) multi-domains with the N-terminal anticodon recognition domain of lysyl-tRNA synthetases (LysRS_N), the IL-17 superfamily, incomplete lysyl-tRNA synthetases, and the Class II tRNA amino-acyl synthetase-like catalytic core domain (LysRS_core) in B. floridae 132638. In addition, some IL-17 proteins, including C. elegans protein C44B12.6, isoform a (CDH93392.1), P. fucata 8548.1_09780.t1 and S. purpuratus SPU_030197.1, contain incomplete IL-17 domains and are listed in S1 Dataset but not Table 1. These results suggest that IL-17 protein sequences have undergone rapid and continual changes which may have led to a change in their function.

Conserved residues and motifs in IL-17 proteins
To clarify the relationships among IL-17 proteins from different species, multiple alignment analysis of the IL-17 domains was performed using Clustal Omega. The results indicated that the distribution of amino acid residues is not conserved in IL-17 domains, as illustrated in Fig 1, or in full-length invertebrate IL-17 proteins (data not shown). However, five cysteine residues (marked with arrows) were largely conserved, four (red arrows) of which are important for the cystine knot fold. Remarkably, there is a third disulfide bond for the cystine knot fold that is formed by two cysteine residues in invertebrates, except for Chordata (B. floridae and C. intestinalis), in which the cysteine residues have been replaced by two serine residues (red rhombus). MEME was used to discover conserved motifs within the IL-17 proteins and IL-17 domains. The sequences of the motifs in IL-17 domains are presented in Fig 2 and the combined motif block diagrams are shown in S1 Fig. In addition, a comparison of the motifs in IL-17 proteins and IL-17 domains indicated that these motifs were primarily located in IL-17 domains, suggesting that, although the amino acid sequence identity of IL-17 proteins is rather low, they exhibit greater conservation at the motif level.
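Conserved cysteine positions of the kind described above can be located in a multiple alignment by scanning each column for the residue of interest. The following is a minimal sketch under stated assumptions: the alignment is supplied as equal-length gapped strings, the 80% conservation threshold is arbitrary, and the toy sequences are invented for illustration and are not IL-17 domains.

```python
from typing import List


def conserved_columns(alignment: List[str], residue: str = "C",
                      min_fraction: float = 0.8) -> List[int]:
    """Return 0-based columns where `residue` occurs in at least min_fraction of rows."""
    if not alignment or len({len(s) for s in alignment}) != 1:
        raise ValueError("alignment must be non-empty with equal-length rows")
    n = len(alignment)
    hits = []
    for col in range(len(alignment[0])):
        count = sum(seq[col] == residue for seq in alignment)
        if count / n >= min_fraction:
            hits.append(col)
    return hits


# Toy alignment (hypothetical): column 1 is an invariant cysteine, whereas
# column 3 carries a serine in the last sequence in place of cysteine.
toy = ["ACDCS", "ACECS", "GC-CC", "ACDSS"]
```

Lowering the threshold reveals positions that are conserved in most but not all sequences, such as a cysteine replaced by serine in a subset of species.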
Meanwhile, SignalP was used to predict signal peptides at the N terminus of IL-17 proteins. As shown in Table 1, in vertebrates, 33 out of 34 IL-17 proteins had a predicted signal peptide, the exception being X. tropicalis IL-17F. In contrast, 32 out of 54 invertebrate IL-17 proteins had a predicted signal peptide, while one third (18 out of 54) had none, and 6 IL-17 proteins could not be assessed owing to their incomplete protein sequences. These results indicate that many IL-17 proteins in invertebrates have no predicted signal peptide, suggesting that they might not be secreted proteins.

Phylogenetic analysis and classification of invertebrate IL-17 proteins
To investigate the potential evolutionary relationships of the IL-17 family, phylogenetic trees were constructed based on the amino acid sequences of the full-length proteins. The phylogenetic tree based on full-length sequences in the NJ analysis was divided into many subgroups (Fig 3). Nearly all vertebrate IL-17 proteins were located in one subgroup (the light green area), in agreement with the phylogenetic tree of vertebrate IL-17 proteins presented in S2 Fig. In addition, many of the invertebrate IL-17 proteins form a large group subsequently divided into several subgroups. In general, the IL-17 proteins from a single species were distributed over different groups. These results indicate that, during evolution, invertebrate IL-17 proteins underwent complex differentiation and include far more than the 7 members (IL-17A-F and IL-17N) found in vertebrates, suggesting that these IL-17 proteins may have developed novel functions during evolution.

Exon-intron structure and location of IL-17 genes
The exon-intron structure of IL-17 genes in invertebrates and vertebrates was examined to obtain further insight into the possible structural evolution of these genes. As shown in Table 1 and Fig 3, in vertebrates, 29 out of 34 IL-17 genes had two introns, while three members contained only one intron and two members had three introns. By contrast, in invertebrates, the intron number of IL-17 was more variable, but generally (49 of 54) ranged from 0 to 3. The exceptions were genes with 4 introns (C.

Discussion
As an important regulatory cytokine, IL-17 is involved in and mediates cell-cell communication for many biological processes, particularly host defense responses and inflammatory diseases [1,2]. However, the functions and characteristics of the invertebrate IL-17 family have not been well characterized [13,14,19,23]. The recent release of a number of invertebrate genome databases may provide new insights into the IL-17 family. In the present study, we identified and summarized 54 IL-17-encoding genes in invertebrates and compared them with 28 vertebrate homologs, to investigate their origin and diversification. IL-17 genes were identified in invertebrates including Nematoda (C. briggsae and C. elegans), Annelida (C. teleta), Mollusca (L. gigantea, C. gigas and P. fucata), Arthropoda (D. pulex), Echinodermata (S. purpuratus) and Chordata (C. intestinalis and B. floridae) but were absent from Porifera (A. queenslandica), Cnidaria (N. vectensis and H. magnipapillata), Hemichordata (S. kowalevskii), Placozoa (T. adhaerens) and Insecta (such as A. pisum, A. mellifera, and D. melanogaster), as well as Protozoa. The number of IL-17 genes in each species was highly variable, ranging from 1 (C. briggsae) to 12 (P. fucata), which may reflect their unusually high evolutionary rate (Table 1). While the absence of the IL-17 cytokine family, which functions in cell-cell communication, from Protozoa and simple, ancient lower invertebrates such as A. queenslandica and H. magnipapillata was not unexpected, it is puzzling that IL-17 genes were missing from Hemichordata (S. kowalevskii) and from insects, which are relatively advanced invertebrates. This result is partially supported by a report by Simakov et al. that, although mollusks and annelids are related to flies, nematodes and flatworms within the protostomes, the genome organization, gene structure and functional content of these species are in many ways more similar to those of invertebrate deuterostomes (such as amphioxus and sea urchin) [16]. These similarities include features of bilaterian and/or metazoan genomes that have been lost or diverged in many protostome genomes.
Furthermore, immune gene families are usually under more intense evolutionary pressure, and rapid evolutionary changes are frequently observed for effector proteins such as the cytokine IL-17 [24,25]. In this study, the length and domain number of some IL-17 proteins varied greatly, suggesting broadened or reduced functions. For example, B. floridae 132638 contains not only the IL-17 domain but also the LysRS_N domain and an incomplete LysRS core domain. LysRS_N is a beta-barrel domain (OB fold) involved in binding the tRNA anticodon stem-loop. LysRS enzymes are homodimeric class 2b aminoacyl-tRNA synthetases (aaRSs), which catalyze the specific attachment of amino acids to their cognate tRNAs during protein biosynthesis [26]. IL-17 enhances the expression of multiple pro-inflammatory cytokines, particularly members of the CXC chemokine family, through mRNA stabilization via an AUUUA/tristetraprolin-independent sequence [27,28]. By contrast, some IL-17 proteins contain incomplete IL-17 domains (S1 Dataset). This study also demonstrated that, although the amino acid sequence similarities of the IL-17 proteins were rather low, the motifs were highly conserved, even though some motifs were lost in certain species. Given that these conserved motifs are located, to a great extent, in IL-17 domains, they provide the basis for IL-17 domains and proteins. Significantly, there is a third disulfide bond for the cystine knot fold in invertebrate IL-17 proteins, suggesting that they may possess the canonical disulfides of the cystine knot characteristic of the canonical cystine knot fold superfamily, which includes members such as the NGF subfamily. This holds until Chordata (B. floridae and C. intestinalis), in which the two cysteine residues have been replaced by the corresponding serine residues [29,30]. Unlike almost all vertebrate IL-17 proteins, which contain a predicted signal peptide, a significant proportion of those of invertebrates have no predicted signal peptide. The secretory signal peptide targets its passenger protein for translocation across the endoplasmic reticulum membrane in eukaryotes and the cytoplasmic membrane in prokaryotes [31]. The invertebrate IL-17 proteins without a predicted signal peptide may perform a different function from that of their vertebrate counterparts. Furthermore, some IL-17 genes were found to exhibit conserved synteny, which reveals a close evolutionary relationship between two genes or even two species and suggests that they may be derived from a common ancestor. This may also partially explain why IL-17A-like genes in some phyla may be too dissimilar to be identified. These results suggest that IL-17 proteins and their functions have been continuously undergoing dynamic change through evolution.
Previous studies of genomic organization involving phylogenetic analysis have revealed that the genomic organization of the vertebrate IL-17 family has been basically conserved through evolution [8,13]. In mammals, the IL-17 family is generally divided into six members (IL-17A-F) or subgroups, and IL-17N is also present in fish. Furthermore, each member of the IL-17 family has different functions, with the exception of IL-17A and IL-17F. In this study, phylogenetic analysis indicated that there are many subgroups of the IL-17 family in invertebrates that likely produce numerous IL-17 family members, far more than the 7 known members in vertebrates (IL-17A-F and IL-17N), which suggests that the invertebrate proteins have undergone high divergence, including in their function. Additionally, introns may affect gene expression by increasing the time required to transcribe the gene, and intron-containing and intronless versions of otherwise identical genes can exhibit dramatically different expression profiles [32,33]. While there is no universal intron requirement for eukaryotic gene expression, in many cases transgene expression can be dramatically increased by the addition of just one generic intron to the cDNA [34,35]. This may give a partial explanation for the change in the number of IL-17 introns from invertebrates to vertebrates. Although intron evolution is a dynamic process in eukaryotes [36], the comparison of IL-17 family gene organization revealed that the IL-17 family gene has not been very highly conserved throughout evolution. The more drastic changes in the exons also strengthen this observation. In general, from the perspective of both phylogenetics and genomic organization, the IL-17 family lacks conservation and exhibits high divergence, suggesting that invertebrate IL-17 proteins have undergone complex differentiation and that their members may have developed novel functions during evolution.
In the progression from unicellular protozoans to multicellular animals, the capability for more advanced and complicated communication and cooperation among cells was acquired. Some cytokines, such as tumor necrosis factor (TNF)-α, appeared early in primitive invertebrates [37,38]; it is therefore likely that the emerging IL-17 gene family fulfilled the increased demand for more complex regulation in relatively advanced multicellular animals. New genes must be integrated with other novel and existing genes to evolve expanded or modified biochemical pathways and/or regulatory networks [39]. Accordingly, the IL-17 family functions via its receptor IL-17R, a specific cell surface receptor, thus forming a distinct ligand-receptor signaling system to induce downstream signaling. In mollusks, IL-17 family genes participate in the immune response to stimulation [19,23]. Therefore, IL-17 may also play a vital role in invertebrate inflammatory reactions. Inexplicably, other IL members have only arisen in lower vertebrates and not invertebrates, whereas some ILRs are found only in invertebrates [14]. However, why the IL-17 gene and not another IL member was selected during early evolution remains unclear.
So far, five members of the IL-17R family (IL-17RA-IL-17RE) have been identified, and they are thought to function as homodimers or heterodimers. Among them, the heterodimer of IL-17RA and IL-17RC is a receptor for homodimers and heterodimers of IL-17A and IL-17F, whereas the heterodimer consisting of IL-17RA and IL-17RB serves as a receptor for IL-17E. IL-17B binds to IL-17RB, and IL-17C was recently reported to bind to IL-17RE and to activate NF-κB. The ligand for IL-17RD has yet to be identified [10,40]. Specifically, a mechanism of complex formation has been presented in which two fibronectin-type domains of IL-17RA engage IL-17F in a groove within the IL-17F homodimer interface [41]. The IL-17R family mediates a signal pathway that serves as a bridge between innate and adaptive immune responses [40]. However, these receptors have rarely been isolated from invertebrates. The signal transduction pathway mediated by IL-17 and IL-17R remains poorly defined, particularly in invertebrates, and there is still much to learn about the structures and functions of IL-17 and IL-17R and their characteristics and nature during evolution.
In conclusion, this study provides a global survey of the distribution of the IL-17 family among invertebrates and reveals the features of their motifs and signal peptides. In addition, phylogenetic trees and exon-intron structures were analyzed, and the origin and evolutionary history of the family in animal phyla were explored. The results of this study suggest that, during evolution, invertebrate IL-17 proteins have undergone complex differentiation, and that their members may have developed novel functions. The findings provide direction for future studies of the functions of the IL-17 family.