Adaptive Evolution of Toll-Like Receptors (TLRs) in the Family Suidae

Members of the family Suidae have diverged over extended evolutionary periods in diverse environments, suggesting that adaptation in response to endemic infectious agents may have occurred. Toll-like receptors (TLRs) comprise a multigene family that acts as the first line of defense against infectious microbes at the host-environment interface. We hypothesized that across the Suidae, positive selection mediated by infectious agents has contributed to the evolution of TLR diversity. Thus, we analyzed Sus scrofa, Sus barbatus, Sus verrucosus, Sus celebensis, Sus scebifrons, Babyrousa babyrussa, Potamochoerus larvatus, Potamochoerus porcus and Phacochoerus africanus genomes. Specifically, analyses were performed to identify evidence of positive selection using Maximum likelihood (ML) methods within a phylogenetic framework for bacterial and viral sensing Suidae TLR extracellular domains. Our analyses did not reveal evidence of positive selection for TLR3 and TLR7, suggesting strong functional conservation among these two genes for members of the Suidae. Positive selection was inferred for Suidae TLR1, TLR2, TLR6 and TLR8 evolution. ML methods identified amino acid sites of the bacterial sensing TLR1, TLR2, TLR6 and the viral sensing TLR8 to be under persistent positive selection. Some of these sites are in close proximity to functionally relevant sites, further strengthening the case for pathogen mediated selection for these sites. The branch leading to the genus Sus demonstrated evidence of episodic positive selection for TLR1, indicating selection mediated by infectious agents encountered within the specific geographic origin of the Sus. These results indicate that species of the Suidae have positively selected residues within functional domains of TLRs reflective of prior infections. Thus, TLR genes represent candidates for experimental validation to determine their functional role in antibacterial and antiviral activity within members of the Suidae.


Introduction
Bacterial and viral infectious diseases constitute a significant threat to host survival. Host species have developed various strategies to combat these threats, including the development of innate and acquired immune defenses. The innate immune system provides an immediate defensive response against pathogenic infections while the acquired immune system response to pathogenic infections may require weeks to develop. As pathogens evolve to subvert the host immune response, host immune genes evolve in response. The arms race between hosts and microbial pathogens (host-parasite co-evolution) influences variation in the response to infectious disease agents at individuals, population, species levels and within higher order taxonomic units [1].
Adaptive evolution (positive selective) is selective pressure through a change in environment placed on a protein in order to improve the fitness of the organism in that environment [2]. With respect to vertebrate immune-related genes, studies on adaptive evolution have mainly focused on the major histocompatibility complex (MHC), cell surface glycoproteins of the acquired immune system that mediate presentation of peptides to T-cell receptors [3]. In humans, it has been shown that half of the genetic variability in immune response to infections is accounted for by non-MHC genes [4]. Most of these non-MHC genes seem to belong to the innate immune system [5], indicating that such genes may be under adaptive evolution. Phagocytic cells such as monocytes, macrophages and dendritic cells mediate the recognition of pathogens by the innate immune system through germline encoded receptors known as pattern recognition receptors (PRRs). These PRRs recognize conserved molecular features of microbes called pathogen-associated molecular patterns (PAMPs) [6,7]. Among the numerous PRRs, the Toll-like receptor family is the most widely studied. The Toll-like receptors (TLRs) are innate immunity receptors important during early phase of infections and also serve as a link between the innate and acquired immunity during host immune response [8]. Cell surface expressed TLRs (TLR1, TLR2, TLR4, TLR5 and TLR6) recognize predominantly bacterial ligands and several fungal and parasite ligands while TLR3, TLR7 and TLR8 are expressed within the endosome and recognize single and double-stranded viral RNA [9]. TLRs are type I transmembrane glycoproteins composed of an extracellular domain characterized by a leucine-rich repeat (LRR) motif responsible for binding infectious agents ligands, a transmembrane domain and an intracellular signaling domain.
Previous studies have documented purifying selection [10] and overdominant balancing selection [11] as the dominant selective pressures acting within innate immune genes including TLRs. TLRs might be under positive selection due to a co-evolutionary arms race with their microbial pathogens as they lie directly at the host-environment interface and target microbial molecules [12]. Studies at the interspecies level have found clear signatures of positive selection at codon positions across TLR genes from primate, avian and murinae species [12][13][14][15]. In the context of positive selection at the interspecies level, a distinction can be made between persistent positive selection, where selective pressure at codon positions within a gene remains constant throughout time across species and episodic positive selection where selective pressures act in a lineage specific manner [16]. In the case of persistent positive selection, the selective pressure affects most lineages within a phylogeny and is evident as codons rapidly evolving across the species in a phylogenetic tree. For episodic positive selection, codon positions under positive selective pressure within particular lineages may be neutrally or negatively evolving in other lineages. Regardless of the type of selective pressure (persistent or episodic positive selection), detection of evidence of selection of a gene region suggests a selective advantage in changing amino acid sequence in this region [2].
Members of the family Suidae have a widespread distribution. The natural occurrence of Sus scrofa (wild boar) is across most of Eurasia while all other species of the genus Sus are restricted to Southeast Asia [17]. The Babyrussa babyrussa (babyrussa) is also found in Southeast Asia and Potamochoerus larvatus (bush pig), Potamochoerus porcus (red river hog) and Phacochoerus africanus (common warthog) are restricted to sub-Saharan Africa [18]. Such diverse environments of members of the family Suidae suggests adaptation to endemic infectious disease agents may have occurred, that can be investigated as positive selection within TLR genes. However, information on how positive selection has influenced TLR genes within members of the Suidae is limited.
The aim of this study was to determine whether there is evidence of positive selective pressure in the family Suidae in a phylogenetic framework. We hypothesized that positive selection has contributed to the evolution of bacterial and viral sensing TLRs in the family Suidae. The specific aims of this study were to 1) identify evidence of persistent positive selection at TLRs across members of the family Suidae and 2) to determine whether restricted lineages within the Suidae demonstrate TLR positive selection. We focused on the bacterial sensing TLR1, TLR2 and TLR6 and viral sensing TLR3, TLR7 and TLR8 as viruses and bacteria are the dominant parasites threatening wild mammals [19]. Identifying positively selected residues within the TLR genes of members of the Suidae will yield vital information as to their adaptation to previous bacterial and viral infections. Our findings suggest that positive selection of TLRs amongst members of the Suidae has been mediated by infectious disease agents.

Study animals
Ten animals representing 9 species of the family Suidae were utilized in this study. A range map showing the natural distribution of these species is shown in Fig 1. The species Sus scrofa (wild boar) was represented by a European wild boar (Sus scrofa Europe) and a Asian wild boar (Sus scrofa Asia) to reflect the wide distribution of this species. Southeast Asian suids were represented by Sus verrucosus (javan warty pig), Sus celebensis (sulawesi warty pig), Sus scebifrons (visayan warty pig), Sus barbatus (bearded pig) and Babyrousa babyrussa (babirusa). Suidae species of African origin were represented by Potamochoerus larvatus (bush pig), Potamochoerus porcus (red river hog) and Phacochoerus africanus (common warthog).

Genes analyzed
TLR1, TLR2 and TLR6 encoding receptors for bacterial ligands and TLR3, TLR7 and TLR8 recognizing viral ligands were selected for this study. The extracellular domains were the focus since they encode the functional sites involved in pathogen ligand recognition.

DNA extraction and sequencing
DNA extraction, library preparation and sequencing was performed as previously described [20]. Briefly, DNA was extracted from whole blood by using the QIAamp DNA blood spin kit (Qiagen Sciences) and quantity and quality parameters were performed on the Qubit 2.0 fluorometer (Invitrogen) and run on a 1% agarose gel. Library construction and re-sequencing of individual members of the family Suidae were done with 1-3 ug of genomic DNA according to the Illumina library prepping protocols. The library insert size was 300-500 bp and sequencing was performed using a 100 paired-end sequencing kit [20]. All DNA were sequenced to approximately 8x depth. Quality trimmed reads (phred quality>20, minimum length of pairs of reads = 40bp) were aligned to the Sus scrofa reference genome build 10.2 [21] using the unique alignment option of Mosaik Aligner (V.1.1.0017). The aligned reads from each of the animals together with the Sus scrofa reference genome were stored as bam files for each individual animal.

Orthologs identification and delineation of their extracellular domains
Porcine TLR mRNA sequences were obtained from Ensemble database (http://www.ensemble. org). The accession numbers of sequences obtained from the public databases were TLR1: NM_001031775, TLR2: NM_213761, TLR3: HQ412796, TLR6: NM_213760, TLR7: NM_001097434, TLR8: ENSSSCG00000012118. When a TLR gene was found to have more than one transcript, the longest transcript was chosen. The genomic coordinates of the porcine TLR mRNA sequences within the Sus scrofa genome assembly 10.2 were obtained from Ensemble. Based on these genomic coordinates, sequences of TLR gene orthologs were then retrieved from aligned bam files (illumina resequencing data for family Suidae species aligned against Sus scrofa genome assembly 10.2) of Sus scrofa (Sus scrofa Europe and Sus scrofa Asia), Sus verrucosus, Sus celebensis, Sus scebifrons, Sus barbatus Babyrousa babyrussa, Potamochoerus larvatus, Potamochoerus porcus and Phacochoerus africanus to identify TLR gene orthologs. The resulting sequences for each species were then blast screened against the Sus scrofa genome to ensure similarity with the porcine TLR mRNA sequences. Exonic regions were then obtained from these sequences and concatenated to obtain coding sequences. The coding sequences were further trimmed to obtain sequences of the extracellular domain for each TLR in each species. Sequences were aligned using ClustalW 1.81 [22]. In this study, porcine TLR reference amino acid sequences were aligned to corresponding human and murine sequences in order to delineate the extracellular domains of porcine TLRs and their LRR modules and sub-domains [23][24][25][26][27][28] (S1 Fig). The genomic coordinates of the TLR extracellular domains are provided in S1 Table. ML test for positive selection Comparison of the non-synonymous substitutions per non-synonymous site (dN) with the number of synonymous substitutions per synonymous site (dS) in a maximum likelihood (ML) framework was used to test for positive selection for every codon, defining a dN/dS ratio (ω)>1 in a codon as evidence of positive selection. First, we determined whether ω varied among codon sites for each TLR alignment by comparing CODEML program models in PAML version 4 [29,30] M0 which assumes that ω is constant across all sites in the alignment and M3 which allows ω to vary amongst sites.
Next, 4 site models were employed to detect sites under persistent positive selection. Two models each from the CODEML program in PAML version 4 [29,30] and the Datamonkey web server [31] were utilized. CODEML site model M1a, a nearly neutral evolution model where sites are assumed to be evolving under either purifying selection (ω<1) or neutral evolution (ω = 1) was compared to model M2a that allows positive selection among sites. M7, which allows sites to evolve under either purifying selection or neutrally, was compared to model M8, which allows for positively selected sites. Models M7 and M8 differ from models M1a and M2a in that, the former assume that ω values are drawn from a beta distribution [32]. Models were compared using a likelihood ratio test (LRT). In order to identify positive selection, twice the difference in log-likelihood values (2lnΔL) between models would be significant by chi-square testing. The F3x4 model of codon frequencies was used for the analyses. Models were run in duplicates with ω of 0.5 and 1.5 to increase the probability of convergence of model parameters. The Bayes empirical Bayes (BEB) approach implemented in CODEML was used to identify codons under positive selection. BEB estimates the posterior probability of each site belonging to three selection classes: low, intermediate and high ω. Codon sites with ω>1 and a posterior probability >95% were inferred to be under positive selection. Fixed-effects likelihood (FEL) and Random-effects likelihood (REL) models implemented using the Datamonkey web server were also used to detect positive selection. The FEL model estimates synonymous and nonsynonymous rates directly at each codon site, without assuming an a priori distribution of rates across sites while REL model allows synonymous and nonsynonymous substitution rates to vary among codon sites. Codon sites were considered to be under positive selection at significant levels p<0.1 for FEL and a Bayes factor >50 for REL [33].
To test for episodic positive selection, Branch-Site REL and MEME (mixed effects model of evolution) implemented on the Datamonkey web server were utilized. The Branch-Site REL model estimates proportion of sites under selection along tree branches and allows evolutionary rates to simultaneously vary along phylogenetic branches and sites [16]. The MEME method identifies lineage-specific events of positive selection at sites, even though the same site is under purifying or neutral selection in other lineages [34]. A Suidae species tree (Fig 2) derived from near complete genome sequences for members of the Suidae [17] [L. Frantz, personal communication, September 27, 2014] was used in all analyses.
Positively selected sites detected in this study were compared to human TLR Swiss-Prot database to determine their possible link to function. Sites under positive selection were also mapped to three dimensional (3D) protein structures using MuPIT Interactive [35] in order to examine their functional significance. We also determined the conservative or radical nature of amino acid changes at sites under positive selection within this study.

Results
The sequences (10 sequences within each TLR alignment) of the extracellular domains of bacterial sensing TLR1, TLR2 and TLR6 and viral sensing TLR3, TLR7 and TLR8 from species within the family Suidae were obtained. The length of the extracellular domains in terms of number of nucleotides of the TLRs ranged from 1668 bases for TLR1 to 2445 bases for TLR7. Amino acid length ranged from 556 amino acids for TLR1 to 792 amino acids for TLR7.

Heterogeneity of selective pressure along genes
To determine whether selective pressures varied amongst codon sites for each TLR gene, the M0 and M3 models of CODEML program was utilized. Comparison of M0 vs M3 indicated that dN/dS ratio (ω) of some TLR genes varied among codons, implying that selective constraints were heterogeneous between sites. We detected significant (p<0.01 for 2lnΔl) heterogeneity of ω along TLR1, TLR2, TLR6 and TLR8 (Table 1). For these genes, we found that the proportion of sites with evidence of positive selection (p 2 ) is relatively smaller than the proportion of sites with evidence of purifying (p 0 ) or neutral (p 1 ) selection. Thus, the majority of sites within the proteins of TLR1, TLR2, TLR6 and TLR8 were functionally constrained. TLR3 and TLR7 sequences did not reveal heterogeneity of selection pressure ω among their codons and are thus functionally conserved along their entire extracellular domains within the members of Suidae involved in this study.

Detection of persistent positive selection across members of the Suidae
To detect positive selection pressure that have acted persistently and shared across most Suidae members regardless of their geographic origins, site models implemented in the CODEML  program of the PAML package and on the Datamonkey web server were utilized. Site models permit detection of positive selection within gene codons. Site models detected positively selected codons in bacterial sensing TLR1, TLR2 and TLR6 and viral sensing TLR8. Specifically, comparisons of nested models available in CODEML program indicated that models including codons with ω>1 (M2a and M8) demonstrated a better fit than did neutral models (M1a and M7) for all the four TLR genes (S2 and S3 Tables). Since detecting codons under positive selection using site-based methods have power limitations when analyzing a few closely related species [36], we defined sites under positive selection conservatively as those for which significant results were obtained by more than one site model. Such sites and the properties of their amino acids are shown in Table 2. Three codons were identified for TLR1, 2 codons for TLR2, 7 codons for TLR6 and 2 codons for TLR8 that showed evidence for persistent positive selection. The site based methods did not identify codons under positive selective pressure for TLR3 and TLR7.

Detection of episodic positive selection in particular lineages
To detect signatures of episodic positive selection in specific lineages for each TLR gene, branch-site REL analysis available on the Datamonkey web server were performed. The branch-site REL identify lineages at which a proportion of sites have dN/dS ratios >1without making any assumptions as to which lineages should be analyzed for positive selection. With respect to TLR1, evidence for positive selection in the ancestral lineage of the genus Sus (internal branch leading to the Sus clade) and on TLR2 species branch corresponding to Sus verrucosus (Table 3) were detected. Analyses also indicated that within the TLR2 gene, the species branch corresponding to Potamochoerus porcus is under positive selective pressure (Table 3). Thus, MEME was employed to identify sites under positive selection along branches. One codon position in TLR1 (codon position 434) in the lineage leading to the genus Sus and the species branch corresponding to Sus verrucosus was identified as under positive selection. Another codon position (codon position 338 in TLR2) was found to be under positive selection in the species branch corresponding to Potamochoerus porcus (Table 3).

Functional significance of positively selected sites
To determine functional relevance of positively selected amino acid sites, sites determined to be under positive selection by more than one ML method were compared to human TLR Swiss-Prot entries (Table 4). First, human and porcine TLRs were aligned to determine the equivalent positions of positively selected sites in pigs within humans. Then analysis were performed to determine whether the sites in human TLRs have been implicated as having functional effects or are in close proximity to a functionally annotated site from human Swiss-Prot. Sites that were adjacent to residues and within regions known to affect TLR protein function (Table 4)   radical amino acid changes ( Table 2), suggesting a possible role of such sites in diverse protein functions.

Discussion
The important role of pathogen mediated positive selection pressure in shaping diversity in the TLRs of mammalian species has been documented elsewhere [12,13,37]. The adaptation of the members of the Suidae to different environments presenting numerous bacterial and viral pathogenic challenges, make the family amenable to studies of pathogen mediated selection on immune genes. Results obtained in this study indicate that both persistent and episodic positive selection have shaped TLR evolution and diversity among the Suidae.
Our finding of small proportion of sites of TLR1, TLR2, TLR6 and TLR8 showing evidence of persistent positive selection agrees with the mostly accepted paradigm that purifying selection is the dominant force operating on TLRs [10,12]. As was the case with previous studies [12,15,38], more positively selected sites within bacterial-sensing TLRs than their viral-sensing counterparts were inferred. Viral infections are thought to exert stronger selective pressure than bacterial infections, constraining the evolution of viral-sensing TLRs [38]. In contrast to previous studies done in primates [12] and across rodents, carnivores, lagomorphs and primates [37], fewer sites under persistent positive selection within genes involved in this study were detected. Members of the Suidae represent closely related species and are therefore likely to be affected by fewer related bacteria and viruses than the diverse species involved in previous studies. TLR6 stood out as the gene with the strongest evidence of selection, where more codons were under persistent positive selection. The dimerization interface in TLR6/TLR2 is 80% larger than that of TLR1/TLR2 [39]. Therefore one can speculate that the larger TLR6/TLR2 dimerization surface exposes more codons of TLR6 to positive selective pressure. The finding that among the bacterial sensing TLRs, TLR2 had fewer sites under persistent positive selection despite having similar protein length as TLR1 and TLR6 is suggestive of a stronger selective constraint on TLR2. The TLR2 gene product recognizes a myriad of ligands (microbial triacyl lipoproteins, diacyl lipoproteins found in mycoplasma, lipoteichoic acid of Gram-positive bacteria or Zymosan of yeast) pathogens through heterodimerization with TLR1 and TLR6 [40,41]. TLR2 also been shown to affect IFN production, making the TLR2 gene evolutionary constrained [42].
Apart from persistent pathogen mediated positive selection acting over long evolutionary time across members of the Suidae, the evolutionary histories of members of the Suidae may have been affected by periodic pathogenic infections confined to specific lineages within certain geographic locations, leading to episodic positive selection within such lineages. Such a signal of adaptive evolution is usually masked by a background signal of purifying selection, which makes their identification difficult. Both Branch Site REL and the MEME methodology implemented in Datamonkey revealed the same lineages were evolving under episodic positive selection, suggesting that sites within these branches are under positive selection. MEME is a recently developed method [34] that allows the detection of episodic positive selection even when majority of lineages are evolving under purifying selection.
Our lineage specific analysis showed that the branch leading to the Sus clade was found to have undergone episodic positive selection at TLR1 amino acid site 434 indicating that the ancestors of species within the genus Sus had to undergo adaptive changes at this site in response to their environment. With the exception of Sus verrucosus which had methionine at TLR1 site 434, other Sus species had the leucine residue while the African suids and Babyrousa babyrussa had methionine. This finding suggests a possible selective advantage for leucine at TLR1 site 434 in the environment in which ancestors of the Sus species originated. Indeed, methionine seems to be very rare at TLR1 site 434 within the domesticated breeds of Sus scrofa [43], indicating leucine is preferred at this site. The substitutions of methionine with leucine within the interior of a protein increase protein stability [44] supporting a hypothesis that leucine within the Sus species stabilizes the TLR1 protein prior to heterodimerization with TLR2 for efficient recognition of diverse bacterial ligands (peptidoglycans and triacyl lipoproteins). This finding of positive selection on branches leading to Sus verrucosus for TLR1 and Potamochoerus porcus for TLR2 requires a cautious interpretation, since only one sequence from one animal is involved in each case. Sus verrucosus is thought to represent a distinct lineage following a deep split with other species of the genus Sus [17]. It is possible that bacterial pathogens restricted to Sus verrucosus may have exerted selective pressure on its TLR1 gene. Related to Potamochoerus porcus, positive selection on TLR2 gene could partly be due to adaptation to infectious agents within the African rain forest, a location outside of which they are rarely found [45].
The case for positive selection within TLR amino acid sites involved in this study is strengthened by the location of specific sites in close proximity to functionally relevant regions. Site 117 of TLR1 is within disulphide bonds region. Disulphide bonds are important to the overall function of proteins as they are associated with their folding and stability [46]. Site 434 of TLR1 is adjacent to a glycosylation site. One conclusion would be that positive selection at this site is of consequence as glycosylation of TLRs is thought to influence receptor surface presentation, trafficking and ligand recognition [47]. The positive selection inferred at site 559 of TLR1, adjacent to a site that leads to impairment of NF-kB activation, suggests a role for this site in regulating inflammatory response to bacterial infection.
TLR1/TLR2 heterodimer formation is required for ligand recognition and signal initiation [23]. Thus changes at sites 434 and 559 within TLR1 suggest residues at these sites could be under selective pressure to improve the TLR1/TLR2 heterodimer formation. . .. As was the case with the study of [48] involving RIG-I-like pattern recognition receptors, in this study the majority (10/15 sites) of sites under positive selection involved radical amino acid residue changes across species of the Suidae. This is in agreement with positive selection favoring radical amino changes at sites within particular genes [49]. Such sites may be of functional significance.
Results obtained here have implications for present day domestic pigs. African wild suids are susceptible to some viral, bacterial and parasitic diseases of domestic pigs. As European and Asian wild boars are the progenitors of most domestic pigs, it is likely that species of the genus Sus are also susceptible to diseases of domestic swine [50]. Thus, residues that were under positive selection in the past could still be beneficial to domestic pigs in terms of disease resistance. Evidence for past positive selection influencing resistance or susceptibility to present day pathogens is seen in the Protein Kinase R (PKR) gene, where adaptive changes at important residues, most likely driven by old viruses [51], are important in the ability of PKR to fight infections from present-day poxyviruses [52].

Conclusion
In conclusion, residues within bacterial sensing TLR1, TL2, TLR6 and viral sensing TLR8 of members of the Suidae that have undergone persistent and episodic positive selection were identified. The evidence of positive selection on the TLR genes reveals that pathogen mediated selective pressure has shaped Suidae TLR evolution. The case for positive selective at amino acid sites is strengthened by location of these sites in close proximity to functionally relevant sites and the radical changes in amino acids at some of these sites across members of the Suidae. Sites under positive selection may have aided in the adaptation of the Suidae to infectious agents that evolved rapidly or that were encountered in new environments.