Computational Analyses of an Evolutionary Arms Race between Mammalian Immunity Mediated by Immunoglobulin A and Its Subversion by Bacterial Pathogens

IgA is the predominant immunoglobulin isotype in mucosal tissues and external secretions, playing important roles both in defense against pathogens and in maintenance of the commensal microbiota. Given the complexity of its interactions with the surrounding environment, IgA is a likely target of diversifying or positive selection. To investigate this possibility, the action of natural selection on IgA was examined in depth with six different methods: CODEML from the PAML package and the SLAC, FEL, REL, MEME and FUBAR methods implemented in the Datamonkey web server. Considering primate IgA alone, these analyses show that diversifying selection targeted five positions of the Cα1 and Cα2 domains of IgA. Extending the analysis to include other mammals identified 18 positively selected sites: ten in Cα1, five in Cα2 and three in Cα3. All but one of these positions display variation in polarity and charge. Their structural locations suggest that they indirectly influence the conformation of sites on IgA that are critical for interaction with host IgA receptors and with proteins that mucosal pathogens produce to avoid elimination by IgA-mediated effector mechanisms. Demonstrating the plasticity of IgA during the evolution of different groups of mammals, only two of the eighteen positions selected across mammals are among the five positions selected in primates. That IgA residues subject to positive selection affect sites targeted both by host receptors and by subversive pathogen ligands highlights the evolutionary arms race playing out between mammals and pathogens, and further emphasizes the importance of IgA in protection against mucosal pathogens.


Introduction
Immunoglobulin A (IgA), in the form of dimers or higher polymers (pIgA), particularly tetramers, is the predominant immunoglobulin isotype in mucosal tissues and external secretions, where it provides a major line of defense against pathogens. It also plays a major role in the maintenance of the commensal microbiota in the intestinal tract, where interplay between commensal microorganisms and IgA promotes a mutually beneficial co-existence [1]. Monomeric IgA is present in serum, where it is the second most abundant immunoglobulin after IgG and a critical factor in eliminating pathogens that breach external surfaces [2]. Much energy is expended in producing these serum and mucosal forms of IgA: in humans, for example, more IgA is produced than all the other antibody isotypes combined. Such a high investment presumably reflects the key contribution this isotype makes to immune protection. Like all immunoglobulins, IgA has a basic monomeric structure of two light and two heavy chains, each with a variable and a constant region, linked together by disulphide bridges. Each chain is organized into globular domains of approximately 110-130 amino acids. The light chains (VL and CL domains), together with the variable domain (VH) and first constant domain (Cα1) of the heavy chain, constitute the two Fab regions, which bind antigens. The remaining constant domains of the heavy chain (Cα2 and Cα3) constitute the Fc region, which is responsible for recruiting the mechanisms that lead to pathogen elimination. A flexible hinge region links the Fab and Fc regions. This basic IgA unit can exist as a monomer or be arranged into dimers (dIgA) and higher-order multimers in which the monomers are linked by a J (joining) chain. In secretions, IgA is present as secretory IgA (S-IgA), a complex of dIgA or pIgA with another polypeptide chain, the secretory component (SC) [3], which confers some protection from proteolytic cleavage.
IgA has been identified in all mammals and birds studied [3]. In mammals, differences in gene number and molecular forms define distinct IgA systems. Most mammals have a single IGHA gene, coding for one IgA isotype, and their serum IgA is predominantly dimeric. Humans, chimpanzees, gorillas and gibbons, however, have two IGHA genes, which arose by gene duplication in a common hominoid ancestor and code for the IgA1 and IgA2 subclasses [4]. In hominoids, serum IgA is mainly monomeric. The rabbit has the most complex IgA system observed, with 13 IGHA genes encoding 13 IgA subclasses [5], of which 11 are expressed and differentially distributed among the mucosal tissues [6]. Mammalian IgA subclasses differ mainly in the length and amino acid sequence of the hinge, which affects their susceptibility to cleavage by bacterial proteases [5], [7].
Elimination and destruction of pathogens is facilitated by the binding of Ig-antigen complexes to Ig Fc receptors (FcRs) on effector cells and to soluble effector molecules such as complement. In most mammals, IgA effector functions appear to rely on FcαRI (CD89), the Fc receptor specific for IgA: binding of IgA-antigen complexes to FcαRI can lead to phagocytosis, antibody-dependent cell-mediated cytotoxicity (ADCC) and release of cytokines and inflammatory mediators. FcαRI binds to IgA at the Cα2-Cα3 interface [8], [9], an interaction that has been suggested to have evolved under pressure from pathogen decoy IgA-binding proteins [10]. FcαRI appears to be functional in the majority of mammals, but it is notably absent from mice, rabbits and dogs, owing either to loss of the gene or to its degeneration into a pseudogene.
Other IgA-Fc receptors important for IgA function include the polymeric Ig receptor (pIgR) and the IgA/IgM Fc receptor (Fcα/μR) [11]. The pIgR transports the large quantities of pIgA produced in the mucosae across the epithelial cell layer into mucosal secretions. In the process, pIgR is cleaved to yield the SC, which remains covalently complexed with pIgA to form S-IgA. Binding involves interaction of pIgR with the J chain and with IgA-Fc residues, particularly within the Cα3 domain of IgA. Some of these residues are located at the Cα2-Cα3 interface [12] and overlap with residues critical for binding to FcαRI and Fcα/μR [3]. In addition to transporting free pIgA, pIgR can also transport polymeric IgA immune complexes, including pIgA complexed with viruses, out across the epithelium [2]. Moreover, pIgA transported via pIgR may intercept and neutralize certain viruses inside epithelial cells [2]. In humans, Fcα/μR is present on macrophages and plasma cells, and also on follicular dendritic cells in tonsil and intestinal tissues [11], likely reflecting a role in coordinating the immune response in mucosal tissues. The N-terminal Ig-binding domain of Fcα/μR is similar to domain 1 of pIgR, and the two receptors are presumed to interact with dIgA in similar ways. Consistent with this possibility, mutagenesis mapping indicates a critical role for the Cα2-Cα3 domain interface of the IgA heavy chain in the interaction [13].
To evade elimination by the immune system, numerous pathogens have evolved proteins that target IgA. These include IgA-binding proteins, which block the access of IgA to host IgA receptors, and proteases that cleave the IgA hinge, uncoupling the recognition of foreign antigens from the effector functions that eliminate them. Examples of microbial IgA-binding proteins include the Sir22 and Arp4 proteins of Streptococcus pyogenes, the β protein of Streptococcus agalactiae and the SSL7 toxin of Staphylococcus aureus, all of which bind to residues lying at the Cα2-Cα3 interface of IgA and prevent IgA from interacting with FcαRI [14], [15]. Examples of microbial proteases include the IgA1 proteases secreted by clinically important bacterial pathogens such as Neisseria meningitidis and Haemophilus influenzae, which cleave specifically in the hinge region of IgA1 of humans and great apes. IgA1 proteases are post-proline endopeptidases that cleave either Pro-Ser (type 1 enzymes) or Pro-Thr (type 2 enzymes) peptide bonds within the IgA1 hinge region. To achieve such specific cleavage, these enzymes recognize structural elements within the hinge [16], [17], and some must also contact the Fc region before cleavage can occur [18], [19]. Notably, the type 2 IgA1 protease of N. meningitidis, a causative agent of bacterial meningitis, interacts with the Cα3 residues at the Cα2-Cα3 interface that are also bound by FcαRI, pIgR and Fcα/μR, whereas the type 2 IgA1 protease of H. influenzae contacts a different set of Cα3 residues that are implicated in binding to pIgR [19].
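The bond specificity of IgA1 proteases can be illustrated with a short sketch that scans a hinge sequence for Pro-Ser and Pro-Thr peptide bonds, the bonds cleaved by type 1 and type 2 enzymes respectively. This is a hypothetical illustration: the function name and example sequence are invented for the sketch, and it ignores the structural and Fc-contact requirements of real enzymes, so it lists candidate bonds rather than predicting actual cleavage.

```python
# Hypothetical illustration: list Pro-Ser and Pro-Thr bonds in a hinge sequence.
# Real IgA1 protease cleavage also depends on hinge structure and Fc contacts,
# which this sequence-only scan does not model.

def candidate_cleavage_bonds(hinge: str) -> list[tuple[int, str, str]]:
    """Return (position, bond, enzyme_type) for each Pro-Ser or Pro-Thr bond.

    Positions are 1-based and refer to the proline on the N-terminal side of the bond.
    """
    bond_types = {"PS": "type 1", "PT": "type 2"}
    hits = []
    for i in range(len(hinge) - 1):
        pair = hinge[i:i + 2].upper()
        if pair in bond_types:
            hits.append((i + 1, f"{pair[0]}-{pair[1]}", bond_types[pair]))
    return hits


if __name__ == "__main__":
    # Hypothetical hinge-like fragment, not a curated IgA1 sequence.
    for pos, bond, kind in candidate_cleavage_bonds("VPSTPPTPSPS"):
        print(pos, bond, kind)
```

A sequence-level scan like this only narrows down where a protease could act; whether a given bond is actually cleaved depends on its distance from the Fc and its presentation to the enzyme.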
Over recent years it has become increasingly apparent that S-IgA contributes to mucosal homeostasis through various mechanisms [20]. For example, coating of commensal bacteria by S-IgA may promote gut colonization and survival through biofilm formation. The role of S-IgA in maintaining the commensal microbiota may depend, at least in part, on interactions between IgA glycans and commensal bacteria [20].
Considering the complex interactions of IgA with other components of the immune system, with commensal microorganisms and with the evasion proteins of diverse pathogens, IgA is a likely target for natural selection. Few studies have examined Ig sequences for the impact of natural selection, and those have focused on the IgA or IgG isotype in a limited number of vertebrate taxa [10], [21], [22]. For example, Abi-Rached et al. [10] investigated the pattern of diversification of IgA-Fc using maximum likelihood [23], [24] and pairwise methods, with a focus on primates. To develop a deeper understanding of the issue, we took a broader approach here, encompassing a wider range of methods and mammalian species. In total, 64 sequences from 28 species representing monotremes, marsupials and eight orders of placental mammals were included in the analyses.
For the analysis of the primate datasets and of the placental mammal Cα1 dataset, sequences were aligned using CLUSTAL W [27] as implemented in BioEdit [28] and corrected manually; in particular, the alignments were adjusted to conform to the IMGT unique numbering system. For the mammalian Cα2 and Cα3 datasets, amino acid alignments were first generated using MUSCLE [29] and corrected manually, and these alignments were then used as a guide to prepare codon alignments for the same set of sequences.
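Using an amino acid alignment as a guide for a codon alignment amounts to threading each unaligned coding sequence back onto its aligned protein. The source does not name the tool used for this step; the hypothetical `codon_align` function below is a minimal sketch that assumes each coding sequence lacks a stop codon and encodes exactly the ungapped residues of its aligned protein. Dedicated tools such as PAL2NAL perform the same operation with additional checks.

```python
def codon_align(aligned_protein: str, cds: str) -> str:
    """Thread an unaligned coding sequence onto its aligned protein sequence.

    Each residue in `aligned_protein` consumes the next codon of `cds`;
    each gap character '-' becomes a gap codon '---'.
    Assumes `cds` has no terminal stop codon and encodes exactly the
    ungapped residues of `aligned_protein` (translation is not checked here).
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if n_residues != len(codons):
        raise ValueError(f"{n_residues} residues but {len(codons)} codons")
    next_codon = iter(codons)
    return "".join("---" if aa == "-" else next(next_codon) for aa in aligned_protein)


if __name__ == "__main__":
    # Hypothetical protein alignment and matching coding sequences.
    protein_alignment = {"seq1": "MK-LV", "seq2": "MKALV"}
    coding = {"seq1": "ATGAAACTGGTT", "seq2": "ATGAAAGCTCTGGTT"}
    for name, aligned in protein_alignment.items():
        print(name, codon_align(aligned, coding[name]))
```

Because gaps are inserted only as whole codons, the resulting nucleotide alignment keeps every column in reading frame, which codon-based selection methods require.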
Codon numbering follows the Bur IgA1 numbering; the IMGT unique numbering for C-DOMAIN [26] is also shown in parentheses.

Codon-based Analyses of Positive Diversifying Selection
To investigate positive selection on IgA, we studied the three constant domains (Cα1, Cα2 and Cα3) separately: for each domain we compared the per-site rate of nonsynonymous substitution (dN) with the per-site rate of synonymous substitution (dS) in a maximum likelihood (ML) framework, using six different methods. Because each method has strengths and weaknesses, we followed the approach of Wlasiuk and Nachman [30] to identify the codons with the strongest signal of positive selection: only codons identified by at least two of the ML methods were considered positively selected codons (PSC). Unlike pairwise dN/dS analyses, the methods used here are phylogeny-based and are therefore less sensitive to differences in the number of sequences sampled from each taxonomic group; to increase the resolution of the analysis, we included all available sequences.
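The "at least two methods" rule can be expressed as a simple vote count over the sites each method reports. The sketch below assumes each method's output has already been reduced to a set of codon positions passing that method's own significance cutoff; the function name and the example calls are hypothetical and are not results from this study.

```python
from collections import Counter


def consensus_sites(calls: dict[str, set[int]], min_methods: int = 2) -> list[int]:
    """Return codons reported as positively selected by at least `min_methods` methods.

    `calls` maps a method name to the set of codon positions it flagged
    (after applying that method's own significance cutoff).
    """
    counts = Counter(site for sites in calls.values() for site in sites)
    return sorted(site for site, n in counts.items() if n >= min_methods)


if __name__ == "__main__":
    # Hypothetical per-method calls for illustration only.
    calls = {
        "M8 (BEB)": {10, 20, 30},
        "SLAC": {30},
        "FEL": {10, 30, 40},
        "REL": {20, 40, 50},
        "MEME": {10, 50, 60},
        "FUBAR": {40},
    }
    print(consensus_sites(calls))                 # [10, 20, 30, 40, 50]
    print(consensus_sites(calls, min_methods=3))  # [10, 30, 40]
```

Raising `min_methods` trades sensitivity for stringency, the same trade-off behind the differing cutoffs used by different studies of primate IgA.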
We first compared two alternative models implemented in CODEML (PAML 4.4) [23], [24]: M8, which allows some codons to evolve under positive selection (dN/dS > 1), and M7, which does not (dN/dS ≤ 1). These two nested models were compared using a likelihood ratio test (LRT) with 2 degrees of freedom [31], [32]. The analysis was run twice, using the F3×4 model of codon frequencies. Codons under positive selection in model M8 were identified using the Bayes Empirical Bayes (BEB) approach [33] with a posterior probability cutoff of > 90%. For each analysis, a Neighbour-Joining phylogenetic tree generated in MEGA 5 [34], using the p-distance substitution model and the complete deletion option to handle gaps and missing data, served as the 'working topology'. Overall, the tree topologies used reflected the accepted topology for mammals.
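Once CODEML has reported the log-likelihood of each model, the M7 versus M8 comparison is a short calculation: the statistic 2ΔlnL = 2(lnL_M8 - lnL_M7) is compared with a χ² distribution with 2 degrees of freedom, whose survival function has the closed form exp(-x/2). The sketch below assumes the log-likelihoods and BEB posterior probabilities have already been parsed from the CODEML output (parsing is not shown); the function names and example values are hypothetical.

```python
import math


def lrt_m7_vs_m8(lnl_m7: float, lnl_m8: float) -> tuple[float, float]:
    """Likelihood ratio test of M7 (null) against M8 (alternative).

    Returns (statistic, p_value) where statistic = 2 * (lnL_M8 - lnL_M7).
    With 2 degrees of freedom the chi-square survival function is exp(-x / 2),
    so no statistics library is needed.
    """
    # Clamp tiny negative values that can arise from optimizer noise.
    statistic = max(0.0, 2.0 * (lnl_m8 - lnl_m7))
    return statistic, math.exp(-statistic / 2.0)


def beb_selected(posteriors: dict[int, float], cutoff: float = 0.90) -> list[int]:
    """Codons whose BEB posterior probability of the dN/dS > 1 class exceeds `cutoff`."""
    return sorted(site for site, prob in posteriors.items() if prob > cutoff)


if __name__ == "__main__":
    # Hypothetical CODEML log-likelihoods and BEB posteriors, for illustration only.
    stat, p = lrt_m7_vs_m8(lnl_m7=-2510.40, lnl_m8=-2504.10)
    print(f"2dlnL = {stat:.2f}, p = {p:.4f}")
    print(beb_selected({12: 0.97, 45: 0.88, 78: 0.93}))
```

With 2 degrees of freedom, a 2ΔlnL of about 5.99 corresponds to p = 0.05 and about 9.21 to p = 0.01.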
We also used the five methods for detecting positive selection available from the DATAMONKEY web server [35]: the Single Likelihood Ancestor Counting model (SLAC), the Fixed Effect Likelihood model (FEL), the Random Effect Likelihood model (REL), the Mixed Effects Model of Evolution (MEME) and the Fast, Unconstrained Bayesian AppRoximation (FUBAR). For these analyses, the best-fitting nucleotide substitution model was determined with the automatic model selection tool available on the server.
Because recombination can lead to false inference of positive selection and thus to a high false-positive rate [36], [37], [38], all datasets were screened for recombination using the GARD method [39] available on the DATAMONKEY web server [35]. No evidence of recombination was found.

Location of the PSC in Structural Models of IgA
A molecular model of human IgA1 (MMDB ID: 10546, PDB ID: 1IGA [40]) and the X-ray crystal structure of human IgA1-Fc (PDB ID: 1OW0 [9]) were used to map the amino acids encoded by PSC onto three-dimensional structures of the protein. To investigate their relation to putative sites of interest, the sites of interaction with host receptors (FcαRI, pIgR and Fcα/μR [3], [8], [9], [12], [13]) and bacterial proteins (S. aureus SSL7 protein, streptococcal IgA-binding proteins, and N. meningitidis and H. influenzae type 2 IgA1 proteases [3], [14], [19], [41]) were also mapped onto the structures. For this purpose the NCBI application Cn3D 4.1 (http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml [42]) and the iMol software [43] were used. Although the molecular model of human IgA1 has drawbacks, being based on low-resolution X-ray and neutron scattering data and using the X-ray crystal structure of IgG to model the Fc part of IgA (the IgA-Fc structure was unavailable at the time), it offered the best means to visualize all PSC in one intact structure. The X-ray crystal structure of human IgA1-Fc offers a higher-resolution view and aids understanding of the putative impact of these PSC on IgA-Fc ligand interactions.

Natural Selection Diversified the Cα1 and Cα2 Domains of Primate IgA Sequences
Using the ML approach of PAML [23], [24], evidence for positive diversifying selection was obtained in primates for two of the three IgA constant domains, Cα1 and Cα2: the model allowing sites to evolve under positive selection (M8) fitted significantly better than the model that did not (M7) (α = 0.01-0.05; Table 1). The other five ML methods also identified positively selected sites in IgA Cα1 and Cα2 but not in Cα3. Comparison of the sites identified by each method reveals five codons supported with high confidence (p > 0.9) by at least two methods: of these five positively selected codons (PSC), two are in the Cα1 domain, codons 133 and 166 (Cα1-10 and 45.2), and three are in the Cα2 domain, codons 296, 319 and 326 (Cα2-84, 100 and 107). The natural amino acid variability and characteristics of each of these codons are given in Table 2: at four of the five positions (133, 166, 319 and 326), changes in amino acid properties such as polarity and charge were observed, with the potential to alter protein structure or the capacity for protein-protein interaction.
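Whether the residues observed at a codon differ in polarity and charge can be checked mechanically once a residue classification is fixed. The sketch below assumes one common textbook classification at neutral pH (nonpolar: A, V, L, I, M, F, W, P, G; charged: D and E negative, K, R and H positive; all others polar uncharged). Schemes differ, particularly for Gly, Cys, Tyr and His, so this is not necessarily the scheme behind Table 2; the function names and example residues are hypothetical.

```python
# Assumed classification (one common textbook convention at neutral pH).
NONPOLAR = frozenset("AVLIMFWPG")
CHARGE = {"D": -1, "E": -1, "K": +1, "R": +1, "H": +1}
STANDARD = frozenset("ACDEFGHIKLMNPQRSTVWY")


def properties(aa: str) -> tuple[str, int]:
    """Return (polarity, net charge) for a one-letter amino acid code."""
    aa = aa.upper()
    if aa not in STANDARD:
        raise ValueError(f"not a standard amino acid: {aa!r}")
    polarity = "nonpolar" if aa in NONPOLAR else "polar"
    return polarity, CHARGE.get(aa, 0)


def site_variation(residues: str) -> dict[str, bool]:
    """Report whether the residues observed at one codon differ in polarity and/or charge."""
    props = [properties(aa) for aa in set(residues)]
    return {
        "polarity_varies": len({p for p, _ in props}) > 1,
        "charge_varies": len({c for _, c in props}) > 1,
    }


if __name__ == "__main__":
    # Hypothetical residue sets at two codons, not data from this study.
    print(site_variation("SKEA"))
    print(site_variation("LIV"))
```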

Mammalian IgA Evolution was Marked by Diversifying Selection on the Three Constant Domains
Analysis of IgA from a much broader and larger sample of mammals using the ML approach implemented in PAML [23], [24] revealed significant evidence of diversifying selection for two of the three domains investigated (Cα1 and Cα2) (Table 3). However, the other five ML methods clearly identified positively selected sites in all three domains, consistent with all three having been targets of diversifying selection. Comparing the positively selected sites identified by each method yielded eighteen well-supported PSC (Table 3): ten in the Cα1 domain, five in the Cα2 domain and three in the Cα3 domain (Figure 1). All but one of these residues show variation in both polarity and charge, changes that could alter protein structure or the capacity for protein-protein interaction (Table 4); the exception, PSC 431 (Cα3-103), displays a restricted set of residues that share the same polarity and charge, suggesting that these properties are important at this position. Notably, changes at two PSC in the Cα1 domain, codons 166 and 213 (Cα1-45.2 and 116, IMGT numbering), can generate putative N-glycosylation sites, which could affect protein function: residue 166 is a known N-glycosylation site of primate IgA2, as well as of sheep, panda and alpaca IgA and rabbit IgA7, IgA8, IgA11 and IgA13. In contrast, the putative N-glycosylation site at residue 213 appears only in rabbit IgA7, IgA8, IgA11 and IgA13.
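Putative N-glycosylation sites such as those at codons 166 and 213 are conventionally recognized by the sequon Asn-X-Ser/Thr, where X is any residue except proline. The sketch below scans an ungapped protein sequence for that motif; the function name and example sequence are hypothetical, and a sequon marks only a potential site, since actual glycosylation cannot be inferred from sequence alone.

```python
import re

# N-X-S/T with X != Pro; the lookahead lets overlapping sequons (e.g. "NNST") both be found.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(protein: str) -> list[int]:
    """Return 1-based positions of Asn residues that begin an N-X-S/T sequon (X != P).

    Expects an ungapped, one-letter protein sequence.
    """
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


if __name__ == "__main__":
    # Hypothetical sequence, not an IgA segment.
    print(sequon_positions("ANVTNPSNNSS"))  # [2, 8, 9]
```

Comparing sequon positions between orthologous sequences, after mapping them onto the shared alignment numbering, would show whether a substitution at a given codon creates or removes a putative site.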
The recently developed MEME method [44] can identify both episodic and persistent positive selection, because it allows the dN/dS ratio to vary not only from site to site but also from branch to branch at a given site. The additional positively selected codons identified by MEME but not by the other approaches are therefore likely to have been subject to episodes of positive selection. Consistent with this interpretation, of the six such sites detected by MEME in the Cα3 domain, residues 389 and 442 (Cα3-45.2 and 115, IMGT numbering) are targeted by pathogen IgA-binding proteins.

Positively Selected Codons are Located Near Sites of Interaction with Ligands and Bacterial Proteases
To better understand the possible biological significance of the detected PSC, we mapped the residues they encode onto a molecular model of human IgA1 and the X-ray crystal structure of IgA1-Fc, along with the sites of interaction for host receptors and bacterial proteins (Figure 2). Remarkably, more than half (13 out of 21) of the PSC encode residues located near sites of interaction with ligands and bacterial proteases. Cα1 residues 133, 134, 135, 137 and 221 and Cα2 residues 293 and 296 (Cα1-10, 11, 12, 14 and 124 and Cα2-81 and 84, IMGT numbering) are near the hinge region, the preferred target of some IgA1-specific bacterial proteases. Cα1 residues … [25]. Residue 343 also lies close to the putative interaction site for pIgR [12] and to a region important for interaction with the type 2 IgA1 protease of H. influenzae [19]. Residue 408 (Cα3-85.5) is one of several Cα3 domain residues of human IgA1 that directly influence binding to pIgR; it also lies adjacent to the site where the H. influenzae IgA1 protease is believed to bind. Although position 431 (Cα3-103) in the Cα3 domain is positively selected, it is not close to any known interaction site of the IgA-Fc region; substitutions at this position could exert a functional effect by indirectly influencing the conformation of one or more of the interaction sites.
Table 1. Phylogenetic tests of positive selection in primates.
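The structural mapping in this study was done visually in Cn3D and iMol. A simple quantitative complement, sketched here as an assumption rather than as the authors' procedure, is to flag PSC whose representative atom (for example the alpha-carbon) lies within a distance cutoff of any mapped interaction site. The 10 Å cutoff, the function name and the coordinates below are hypothetical; in practice the coordinates would be read from a structure file such as PDB entry 1OW0.

```python
import math

Coord = tuple[float, float, float]


def sites_near_psc(psc: dict[str, Coord], sites: dict[str, Coord],
                   cutoff: float = 10.0) -> dict[str, list[str]]:
    """Map each PSC to the interaction sites within `cutoff` angstroms of it.

    Each residue is represented by a single atom position (e.g. its alpha-carbon);
    PSC with no site inside the cutoff are omitted from the result.
    """
    result: dict[str, list[str]] = {}
    for psc_name, xyz in psc.items():
        close = sorted(name for name, site_xyz in sites.items()
                       if math.dist(xyz, site_xyz) <= cutoff)
        if close:
            result[psc_name] = close
    return result


if __name__ == "__main__":
    # Hypothetical coordinates in angstroms, not taken from any PDB entry.
    psc = {"PSC_A": (0.0, 0.0, 0.0), "PSC_B": (20.0, 0.0, 0.0)}
    sites = {"receptor_contact": (3.0, 4.0, 0.0), "protease_contact": (0.0, 0.0, 12.0)}
    print(sites_near_psc(psc, sites))  # {'PSC_A': ['receptor_contact']}
```

A single-atom distance is a coarse proxy: side chains can reach several ångströms beyond the backbone, so the choice of atom and cutoff would need justification in a real analysis.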

Discussion
Genes involved in host-pathogen interactions are prone to diversifying selection [45], [46]. As pathogens continuously evolve mechanisms to evade host defenses and cause infectious disease, host species must evolve counter-defense mechanisms if they are to survive. This never-ending arms race subjects the components of the mammalian immune system that recognize pathogens and their products to strong and shifting selection. IgA, the main Ig isotype present in external secretions and at mucosal surfaces, is uniquely exposed to a wide variety of bacteria, viruses, fungi and other infectious microorganisms, which together exert strong selective pressures on this isotype. The results obtained in this study demonstrate the considerable role that positive selection has played in the evolution of IgA in mammals and in the diversity and divergence of IgA among extant mammalian species.

Natural Selection Diversified IgA in Mammals
Consistent with the study of Abi-Rached and coworkers [10], our analysis shows that the Cα2 domain of primate IgA-Fc exhibits evidence of positive diversifying selection, whereas the Cα3 domain does not. Using six different and complementary methods to identify positively selected residues, we found three Cα2 codons identified by at least two of the methods (positions 296, 319 and 326). These three positions correspond to three of the seven codons previously identified as positively selected [10] (Table 1). In contrast, the other four previously identified positions did not reach the detection cutoffs used here, even though two of them appeared in individual analyses (positions 245 and 317; Table 1). Because the goals of the two studies differed (sensitive detection in the earlier study versus detection of the positions with the strongest selection signals here), different cutoffs were applied. Reconciling these apparent discrepancies will require analysis of a much larger dataset of IgA sequences.
To develop a deeper understanding of IgA evolution, we compared IgA across a broad range of mammalian species. Of the eighteen positions selected during mammalian evolution, only two are among the five positions selected during primate evolution. This difference vividly illustrates the evolutionary plasticity of IgA.
We find that diversifying selection has mainly targeted the Cα1 and Cα2 domains of IgA, and the Cα3 domain to a lesser extent: only three of the eighteen selected positions are in Cα3. One of these, position 431, exhibits relatively conservative variation, with only three alternative amino acids of similar polarity and charge. The Cα3 domain, together with the J chain, plays a key role in the binding of pIgA to pIgR. Cα3 is also the main domain of IgA involved in binding to the major IgA-Fc receptors FcαRI and Fcα/μR. These crucial roles, together with its contributions to the assembly and polymerization of IgA, can explain why Cα3 is the most conserved domain of the IgA heavy-chain constant region. In contrast, the ten PSC detected in Cα1 show more varied amino acid substitutions, including changes in polarity and charge. Such variation modulates the structure of Cα1, with potential impact on Fab conformation, the antigen-binding site and the hinge region. Substitutions at residues 166 and 213 could introduce additional N-glycosylation sites: the site at position 166 is present in primate IgA2, in sheep, panda and alpaca IgA and in some rabbit IgA subclasses, whereas the site at position 213 is found only in some rabbit IgA subclasses. N-linked glycans in the Fab region are known to influence antigen binding, either by increasing affinity for antigen or by blocking antigen binding [47]. Since IgA-Fc N-linked glycans could protect IgA from cleavage by bacterial and other proteases [18], we speculate that Fab N-linked glycans may also contribute to such protection. Furthermore, glycans could affect interactions of S-IgA with commensal microorganisms, thereby influencing the composition of the microbiota and gut homeostasis [1], [20].

IgA Diversification in Mammals Targets Sites Involved in the Interaction with Ligands and Bacterial Proteases
Mapping the positively selected sites onto the structures of IgA and IgA-Fc revealed their likely impact on IgA function. Seven such sites, Cα1 residues 133, 134, 135, 137 and 221 and Cα2 residues 293 and 296, are near the hinge, which links the antigen-recognition function of the Fab arms to the effector-recognition function of the Fc region. Because it is accessible, flexible and essential for antibody function, the hinge is a preferred target for bacterial proteases [3], [48]. Hinge structure varies considerably across mammalian species and between subclasses and allotypes. For example, the hinge of hominoid IgA1 is 16 amino acids longer than that of IgA2 and much more susceptible to proteolytic cleavage. The possible advantage of the longer IgA1 hinge is its greater flexibility and potential for cross-linking antigens on the surface of bacteria and other pathogens [2]. Longer hinges are also a feature of most rabbit IgA subclasses. Thus, any variation that protects the IgA hinge from proteolysis is a likely candidate for positive selection. For IgA1 proteases that cleave specifically in the hinge of hominoid IgA1, the distance of the susceptible peptide bond from the "top" of the Fc (where the heavy chain enters the globular Cα2 domain) is critical for efficient cleavage [17]. Indeed, the crystal structure of a bacterial IgA1 protease from H. influenzae suggests that an intricate and coordinated association of the protease with IgA is essential for optimal orientation of the hinge in the enzyme's active site [49]. Substitutions at residues in and around the hinge could therefore increase resistance to proteolytic attack and become targets of positive selection. Three positively selected residues are found near sites of human IgA1 that interact with Fc receptors and bacterial proteins.
Residues 341, 341a and 343 lie in the strand linking the Cα2 and Cα3 domains, in the vicinity of the Cα2 asparagine-histidine motif that participates in the binding of the S. aureus SSL7 molecule to human IgA [25]. The Cα2-Cα3 interface is central to the binding of IgA to several classes of Fc receptor, including FcαRI, Fcα/μR and pIgR [8], [9], [12], [13], and is also targeted by pathogen mechanisms that obstruct IgA function [14], [15], [19].
Table 3. Phylogenetic tests of positive selection in mammals.

Variation at the Cα2-Cα3 interface could prove adaptive by improving the binding of IgA to its Fc receptors, by hampering the binding of pathogen decoy molecules, or both. Such adaptations could be accomplished through changes in the residues that contact Fc receptors or decoy proteins and also in nearby residues with conformational impact. Residues under positive selection have also been described at the Cγ2-Cγ3 interface of IgG in leporids [21]. Residue 408 is one of the positively selected Cα3 residues implicated in the binding of human IgA1 both to pIgR [12] and to the type 2 IgA1 protease of H. influenzae. Substitution at position 408 could therefore provide protection from cleavage by this protease, a possibility consistent with the results of mutagenesis experiments [19]. The MEME method, which detects both episodic and persistent positive selection, identified codons in all three analysed IgA domains that were not revealed by the methods detecting only persistent selection; these PSC are therefore candidates for episodic selection. Among them are residues 389 and 442 in Cα3, which are targets of pathogen IgA-binding proteins. Residue 442, previously shown to be subject to episodes of diversifying selection [10], is a site of N-linked glycosylation in mouse IgA. The glycan attached at asparagine 442 of mouse IgA hinders interaction with the S. aureus SSL7 decoy protein but does not affect the binding of IgA to pIgR [50].
In conclusion, this study identified residues under positive selection in all three IgA heavy-chain constant region domains. Most of the identified residues are located in parts of the molecule that are essential for the functions of IgA in resistance to pathogens. This correlation is consistent with the positively selected residues influencing the interactions of IgA with immune-system receptors and with the microbial proteins that interfere with these interactions. Future functional analyses should determine the mechanisms by which the positively selected residues exert their effects. Such knowledge could assist the design of therapeutic IgA-based monoclonal antibodies that are not susceptible to the pathogen proteins that obstruct the defense functions of IgA.