Biophysical Analysis of Anopheles gambiae Leucine-Rich Repeat Proteins APL1A1, APL1B and APL1C and Their Interaction with LRIM1

Natural infection of Anopheles gambiae by malaria-causing Plasmodium parasites is significantly influenced by the APL1 genetic locus. The locus contains three closely related leucine-rich repeat (LRR) genes, APL1A, APL1B and APL1C. Multiple studies have reported the participation of APL1A—C in the immune response of A. gambiae to invasion by both rodent and human Plasmodium isolates. APL1C forms a heterodimer with the related LRR protein LRIM1 via a C-terminal coiled-coil domain that is also present in APL1A and APL1B. The LRIM1/APL1C heterodimer protects A. gambiae from infection by binding the complement-like protein TEP1 to form a stable and active immune complex. Here we report solution x-ray scatting data for the LRIM1/APL1C heterodimer, the oligomeric state of LRIM1/APL1 LRR domains in solution and the crystal structure of the APL1B LRR domain. The LRIM1/APL1C heterodimeric complex has a flexible and extended structure in solution. In contrast to the APL1A, APL1C and LRIM1 LRR domains, the APL1B LRR domain is a homodimer. The crystal structure of APL1B-LRR shows that the homodimer is formed by an N-terminal helix that complements for the absence of an N-terminal capping motif in APL1B, which is a unique distinction within the LRIM1/APL1 protein family. Full-length APL1A1 and APL1B form a stable complex with LRIM1. These results support a model in which APL1A1, APL1B and APL1C can all form an extended, flexible heterodimer with LRIM1, providing a repertoire of functional innate immune complexes to protect A. gambiae from a diverse array of pathogens.


Introduction
Malaria results from infection with Plasmodium parasites and is exclusively transmitted by Anopheles mosquitoes. Despite being both curable and preventable, malaria caused an estimated 584,000 deaths in 2013, mostly African children living in poverty [1]. Prevention, especially vector control measures such as insecticide-treated bed nets (ITNs) and indoor residual insecticide spraying (IRS), dramatically reduces the malaria burden. However, the effectiveness of vector control is threatened as malaria mosquitoes develop resistance to the insecticides used in ITNs and IRS [1]. The African mosquito vector A. gambiae possesses an immune response that is effective against various pathogens, including malaria parasites. Destruction of parasites by the mosquito's own immune system prevents their further transmission to humans [2,3]. Hence, understanding Anopheles-Plasmodium host-pathogen interactions and the mechanism of parasite killing within the Anopheles mosquito informs both the dynamics of transmission and potentially the development of new malaria control measures.
Genetic analysis of variation in natural infection of A. gambiae populations identified a region on chromosome 2L that is strongly linked to P. falciparum resistance [4][5][6]. Three gene paralogs named APL1 (Anopheles Plasmodium-responsive Leucine-rich repeat protein 1) within an 18 kb locus were identified as resistance candidates: APL1A, APL1B and APL1C [7]. At least three allelic variants of APL1A (APLA 1 , APL1A 2 and APL1A 3 ) have been identified in A. gambiae laboratory strains from Gambia and Cameroon [7,8], and in field-caught mosquitoes from Mali [9]. The APL1 genes are homologous in sequence, polymorphic and under positive selection pressure. The APL1 proteins contain a signal peptide, a leucine-rich repeat (LRR) domain and a cysteine-rich region followed by a C-terminal coiled-coil (CC) domain containing a helix-loop-helix (HLH) motif ( Fig. 1). APL1C forms a disulfide-linked heterodimeric complex with another anti-Plasmodium factor LRIM1 (Leucine-Rich Immune Molecule 1) [10][11][12]. LRIM1 is a paralog of APL1A-C and is structurally homologous to APL1C [13]. LRIM1 and APL1A-C are members of an LRR family, the LRIM family, that includes several dozen genes found within, but not outside, mosquito genomes (family Culicidae) and are believed to play multiple roles within the innate immune system [14].
The LRIM1/APL1C complex binds and stabilizes a specific form of A. gambiae Thioestercontaining Protein 1 (TEP1) [11,12], a structural and functional homolog of vertebrate complement C3 [15,16]. Binding of TEP1 leads to the killing of bacteria by phagocytosis [17], and lysis and melanization of P. berghei parasites [2,18,19]. A strong association has been shown between TEP1 and resistance to P. falciparum in refractory strains of A. gambiae [20,21]. Proteolytic processing of TEP1 within a protease sensitive region is required to produce a cleaved form (TEP1 cut ) that is responsible for the initial attachment to pathogen surfaces. However, TEP1 cut is an unstable species that precipitates over time in the absence of the LRIM1/APL1C complex [11,13].
The anti-Plasmodium phenotype of APL1A-C varies depending on A. gambiae strain and Plasmodium species or isolate used. Studies in the G3 (susceptible) and L3-5 (refractory) strains of A. gambiae [22] knocking down either LRIM1 (dsLRIM1)or all APL1 paralogs (dswAPL1) demonstrated a role in the immune response to the rodent malaria parasite P. berghei [5,[10][11][12]. Subsequently, only dsAPL1C demonstrated a phenotype for P. berghei infection in G3 [7]. Using the recently colonized the Ngousso from Cameroon [23], a dswAPL1 phenotype was observed against P. falciparum isolates resulting from natural infection, but only dsA-PL1A had a phenotype against the cultured P. falciparum isolate NF54 while only dsAPL1C demonstrated a phenotype against P. berghei or P. yoelii [24]. A further analysis of the Ngousso/NF54 infection model suggested the phenotype was specifically due to the APL1A 2 allele, which lacks the C-terminal CC domain and is not constitutively secreted from cells [8]. Studies using an outbred strain, Keele [25], and P. falciparum NF54 had different outcomes depending on infection intensity: a phenotype was only observed at low or medium infection intensities for dswAPL1, and specifically for dsAPL1B or dsAPL1C, but not dsAPL1A [26]. The NF54 isolate is able to infect the otherwise refractory A. gambiae L3-5 strain by evading the TEP1 immune response [21,27]; such adaptation may partly explain the variable phenotype of APL1 knockdown.
The majority of the sequence variation between the APL1 proteins exists at the N-and C-termini of the protein sequences (S1 Fig.). APL1A 2 and APL1C contain an N-terminal lowcomplexity region of variable extent (22-77 amino acids in APL1C) with multiple (P,L)--ANGG-(P,L) repeats [9]. APL1A and APL1C each contain 15 LRR repeats, APL1B has 13 and LRIM1 only has 11 [14]. Allele specific differences in APL1A 2 and APL1A 3 result in premature stop codons upstream of the CC domains. Within the CC domain, APL1B is significantly different from both APL1A and APL1C in the HLH motif. Outside of the HLH, APL1B and APL1C are identical within the CC region except for the last 30 residues, while APL1A 1 diverges from APL1C in the last 60 residues (S1 Fig.).
Extracellular LRR proteins are typically flanked by disulfide-containing capping motifs at the N-and C-termini, called LRRNT and LRRCT, respectively, that protect the hydrophobic cores of the first and last LRRs [28]. The LRIM family has a unique LRRCT that distinguishes it from other LRR families [14]. The LRRCT motif has two disulfide bonds with a conserved structure that is observed in the crystal structures of LRIM1 and APL1C [13]. LRIM1 and APL1C also form an intermolecular disulfide bridge between LRIM1 Cys352 and APL1C Cys562. The latter cysteine is conserved in APL1B and APL1A 1 but not in APL1A 2 (S1 Fig.). The APL1 LRRNT capping motif contains only a single disulfide bond that was resolved in the crystal structure of the APL1C LRR domain (PDB ID 3O6N). While this motif is found in almost all members of the LRIM family, APL1B and LRIM1 do not have a defined LRRNT, which is unusual for extracellular LRR proteins (S1 Fig.).
We have previously hypothesized that the LRR domains of LRIM1/APL1 act as molecular spacers [13], creating a defined distance between the N-terminal and C-terminal variable regions that are the primary binding sites for other immune factors. For instance, the CC domain and the HLH motif in particular, appear crucial for binding and stabilization of TEP1 cut [13,29]. However, while APL1A 1 and APL1B possess CC domains similar to APL1C it is not known whether these proteins form similar complexes with LRIM1, and if so, whether that complex interacts with TEP1 or other A. gambiae TEPs. The answer to this question directly impacts our understanding of the unknown mechanism of APL1A's observed phenotype against human malaria, and whether multiple TEP1-mediated or TEP1-independent mechanisms of Plasmodium killing operate in A. gambiae. Furthermore, the absence of the LRRNT motif in APL1B raises the question as to how the N-terminus of the LRR fold is stabilized and whether this has any relevance to its unknown function.
Here we report further in vitro studies of the APL1A and APL1B proteins. The structure of the LRIM1/APL1C heterodimer is extended and flexible in solution. Structural analysis of APL1B identified that the protein exists as a homodimer that is mediated by a unique N-terminus which has no homology to other LRIM family members [14]. Finally, we show that APL1B forms a disulfide-bridged heterodimer with LRIM1 analogous to APL1C. These results are likely to have direct bearing on the functional role of APL1B in the A. gambiae immune system.

Results
The crystal structure of LRIM1/APL1C does not represent its structure in solution The crystal structures of TEP1 (both Ã R1 and Ã S1 alleles), the LRR domains of LRIM1 (LRIM1-LRR) and APL1C (APL1C-LRR), and the LRIM1/APL1C heterodimer have been previously determined. To test whether these structures were representative of the structure in solution we analyzed TEP1 Ã R1, LRIM1/APL1C, APL1C-LRR and LRIM1-LRR using small-angle x-ray scattering (SAXS). The proteins behaved as homogeneous, monodisperse solutions within the concentration range 1-5 mg/ml (supplementary methods). Primary analysis of the Guinier plot, p(r) distribution, Kratky and Porod plots further confirmed that the samples are generally folded, monodisperse particles (S2 Fig.). A consistent measure of I 0 and R G were determined for all samples (Table 1) with one exception; the R G for LRIM1/APL1C derived from the p(r) distribution (R G = 57.2 ± 0.1 Å) was significantly greater than that determined by Guinier analysis (R G = 54.5 ± 0.3 Å). The LRIM1/APL1C p(r) distribution is also the function with the lowest confidence score in GNOM [30], likely due to the functions extended tail at large interatomic distances up to D max = 210 Å. While consistent with the long rod-like feature of the LRIM1/APL1C coiled-coil, this makes D max difficult to determine. Indeed a function with the same I 0 can be produced up to D max = 250 Å, which is the maximum D max for the implicit Fourier transform given the sampling limit set by smallest q value measured in the experiment. A set of twenty ab initio bead models were generated by DAMMIF [31] for each structure and were generally self-consistent as judged by their normalized structural discrepancy (NSD). The visual fit of the bead models (S3A Fig.) for TEP1 Ã R1 and APL1C-LRR were quite reasonable except for the hole in the β-ring of TEP1 which is counter to one of the assumptions in ab initio models, hence the volume is narrower in that region. For LRIM1-LRR the fit was reasonable but with extra unaccounted volume in the bead model. For LRIM1/APL1C however, the fit was poor. The bead model is essentially a tube that is significantly longer than the crystal structure and, despite thickening at one end, the model does not accommodate the two LRR domains; the ends project out of the volume as does the end of the CC region.
Two approaches were used to compare the known crystal structures of each domain to the solution scattering curve. First, we used the program CRYSOL [32] to calculate the fit of the crystal structure plus a hydration shell to the scattering curve (S3B Fig.). The goodness-of-fit as judged by the χ 2 statistic were reasonable for TEP1 Ã R1 and LRIM1-LRR; less so for APL1C due to deviations at q~0.2 Å -1 values, but the R G of each hydrated crystal structure was similar to the experimental value (Table 1). For LRIM1/APL1C however, the fit was appreciably worse (χ 2 = 6.85), and notably the R G of the hydrated crystal structure (R G = 45.0 Å) was significantly less than the experimental value.
Two sources of error in comparing the crystal structures to solution scattering curves is (i) missing N-and C-terminal residues and internal loops that are disordered in the crystal structure, and (ii) N-linked glycosylation which is disordered and also affects the predicted scattering from the hydration shell surrounding the protein. To model these additional factors we used the software ALLOSMOD-FOXS [33] to generate a series of static models containing multiple conformations of both protein loops and N-linked glycans ( Fig. 2A). For TEP1 Ã R1, APL1C-LRR and LRIM1-LRR this led to a significant improvement in the goodness-of-fit, as judged by the χ 2 statistic, for a single model (Fig. 2B). For LRIM1/APL1C however, the fit remains poor (χ 2 = 5.14), and the model R G = 47.5 Å is still significantly below the experimental value.
Taken together, these results suggest that a single glycosylated conformation based on the crystal structure is consistent with the observed solution scattering curve for TEP1 Ã R1, LRIM1-LRR and APL1C-LRR. Hence, the solution structure is most likely a local ensemble of similar structures as the single best conformation. For LRIM1/APL1C however, the experimental scattering curve cannot be accurately modeled by a conformation based upon the known crystal structure. The experimental evidence suggests the structure in solution is more extended, which requires movement of either or both of the LRR domains away from the CC domain. A satisfactory minimal ensemble of structures for LRIM1/APL1C that fit the experimental curve has not yet been derived by flexible fitting approaches.

The LRR domain of APL1B forms a homodimer in solution
We then examined the state of the APL1A 1 and APL1B LRR domains in solution in comparison to APL1C and LRIM1. The four LRRs were expressed in insect cells, purified to homogeneity and analyzed by size-exclusion chromatography (SEC) (Fig. 3). As expected from their been prior purification [13] and SAXS analysis, LRIM1-LRR and APL1C elute at an apparent molecular weight consistent with a monomer ( Table 2). APL1A 1 -LRR includes a short extension at the N-terminus compared to APL1C-LRR. Partial proteolysis in this region was observed in purified APL1A 1 -LRR, which appears as a doublet on SDS-PAGE (Fig. 3, inset). The N-terminal cleavage site was determined by Edman sequencing to occur at position Gln 42. Nevertheless, APL1A 1 -LRR also elutes at an apparent molecular weight of a monomer. Surprisingly however, APL1B-LRR eluted at a higher apparent molecular weight (MW~90 kDa), consistent with a dimer.
Retention on SEC is related to a protein's hydrodynamic radius, which is influenced by shape as well as molecular weight (MW). We therefore used SEC with multi-angle laser light scattering (SEC-MALLS) or analytical ultracentrifugation (AUC) to verify the apparent MW of each protein from SEC ( Table 2, S4 Fig.). The MW determined for APL1B-LRR was 91 kDa, almost twice that measured for APL1A 1 -LRR (55 kDa), APL1C-LRR (50 kDa) and LRIM1-LRR (36 kDa). The MW of the monomers are several kDa greater than calculated due to N-linked glycosylation (APL1A 1 -3 sites, APL1B-3 sites, APL1C-4 sites, LRIM1-2 sites). Addition of EDTA to the buffer did not dissociate the APL1B-LRR dimer, ruling out metal-induced aggregation via the C-terminal 6×His tag. Since all APL1 LRRs are highly homologous in sequence, we concluded that the dimeric state of APL1B-LRR is due to its unique N-terminal sequence.   Crystal structure of the APL1B-LRR homodimer To determine the molecular basis of APL1B-LRR homodimerization, we crystallized APL1B-LRR and solved the structure to 1.74 Å resolution ( Table 3). The refined model contains two molecules of APL1B-LRR in the asymmetric unit, residues 25-364 for chain A and residues 27-364 for chain B (the crystallized protein comprises APL1B 21-370 and a C-terminal 6×His tag). Of the three predicted N-linked glycosylation sites (Asn166, Asn211, Asn317) a single N-acetylglucosamine was resolved for residue Asn166 of each domain. The overall structure of APL1B-LRR is similar to APL1C-LRR (Cα rmsd 0.605 Å). However, APL1B has two fewer LRRs than APL1C so the overall length of its LRR domain is~70 Å versus~80 Å for APL1C (Fig. 4A). The N-terminus of APL1B is well resolved (Fig. 4B). The LRRCT motif is conserved with APL1C (Fig. 4C). A homodimeric interface exists between the two APL1B-LRR domains in the asymmetric unit that buries an area of~1500 Å 2 representing 9-10% of total solvent-accessible surface area (Fig. 5A). The dimer interface is formed between the N-terminal helix of one domain and the first LRR of the other domain. The two N-terminal helices form a dovetail joint between the solenoid LRRs that hold the two domains at a 90°angle to each other. Residues 31-44 form an amphipathic α-helix such that Leu35, Leu38, Leu39, and Phe42 pack against the first LRR, protecting the hydrophobic surface from the solvent (Fig. 5B). Interestingly, a short, well resolved β-strand on the N-terminal loop of one domain (Met28-Leu30) forms a parallel β-sheet with a β-strand on the first LRR of the second domain (Ile55-Ile58). Backbone interactions are formed between residues Met28 and Leu30 with Gln54, Glu57 and Asp59 on the second domain (Fig. 5C).
We compared the structure of the N-terminus of APL1B to that of APL1C and LRIM1. Several differences exist at the N-termini of the LRR domains. In the absence of an LRRNT, APL1B has an α-helix that protects the first LRR of the second domain within the homodimer (Fig. 5D). In contrast, APL1A and APL1C cap the leading LRR by a canonical LRRNT (Fig. 5E), while a short antiparallel β-strand and an α-helix formed by residues 43-58 protect LRIM1's leading LRR (Fig. 5F).
We compared the specific APL1B residues located at the homodimeric interface with the corresponding residues in APL1C by sequence alignment (Fig. 6). Intriguingly, we found that the secondary structure elements of the APL1B N-terminus are found on the convex face of LRR II of APL1C and LRR I of APL1B is identical to the corresponding LRR III of APL1C (shown in yellow in Figs. 4D, E and Fig. 6). Hence, an N-terminal truncation in the middle of LRR II leads APL1B to form a homodimer to protect the hydrophobic face of the leading LRR via domain swapping with secondary structure complementation [34].
The APL1 genes are highly polymorphic and under positive selection pressure, presumably based on the binding of either other Anopheles proteins or of Plasmodium or other microbial ligands. If a specific portion surface of the LRR domain were responsible for binding a common ligand (e.g. another Anopheles immune factor) we may expect the residues within that region would be conserved between APL1 paralogs. Conversely, if a specific region on the surface of the LRR domain were a site of selection for binding different ligands (e.g. from different Plasmodium species), then mapping polymorphisms within or between APL1A-C onto either the APL1B-LRR or APL1C-LRR structure might reveal a localized hypervariable patch on the surface, as certain hypervariable loops between TEP1 alleles are proximal to the thioester bond [15,16]. Mapping of polymorphisms between APL1A-C onto the APL1C-LRR crystal structure (PDB ID 3O6N) however, shows that the mutations are distributed all over the concave and convex faces of the structure (S5 Fig.). This is consistent with the hypothesis that the LRR domain itself is not the primary site of interaction between LRIM1, APL1A-C and their (unknown) ligands.

Heterodimerization of APL1A-C with LRIM1
Given the similarities between APL1A 1 , APL1B and APL1C, we hypothesized that APL1A 1 and APL1B may form a heterodimer with LRIM1 as seen for APL1C [13]. We tested this hypothesis by generating recombinant baculovirus for dual-expression of 6×His-tagged LRIM1, APL1A 1 , APL1A 2 APL1B, and APL1C with FLAG-tagged LRIM1 (Fig. 7A) in T. ni cells and performed co-immunoprecipitation (coIP) assays from conditioned medium. Based on the qualitatively constant level of LRIM1-FLAG in the supernatant we can establish the qualitative level of expression/stability of the 6×His-tagged proteins as LRIM1, APL1C >> APL1B >> APL1A 1 ; APL1A 2 not secreted. LRIM1 and APL1C are able to homodimerize in vitro [13], but heterodimerization is more efficient. In the case of LRIM1-6×His and LRIM1-FLAG, the 6×His-tagged homodimer is more efficiently formed and/or immunoprecipitated, such that LRIM1-FLAG did not co-immunoprecipiate with LRIM1-6×His in the conditions of our experiment (Fig. 7B). Despite its similarity to APL1A 1 , APL1A 2 is not secreted from the cells [8] hence LRIM1-FLAG was not immunoprecipitated. However, LRIM1-FLAG co-immunoprecipitated with APL1A 1 , APL1B and APL1C.
A further test for the formation of a specific heterodimer between LRIM1 and APL1B is whether an intermolecular disulfide bond is formed similar to LRIM1/APL1C [13]. Hence we co-infected T. ni cells with 6×His-tagged LRIM1 and 6×His-tagged APL1A-C and analyzed reducing and non-reducing SDS-PAGE with Western blotting. Full-length LRIM1 and APL1C, but not LRIM1 C352S, form a heterodimer in the absence of reducing agent (Fig. 7C). Heterodimerization of LRIM1/APL1C is more efficient than homodimerization, but some LRIM1 and APL1C also form homodimers (marked with asterisk) in the absence of reducing agent. The expression level of full-length APL1A 1 -6xHis was too low to confidently detect in this experiment, but LRIM1 and APL1B clearly formed a disulfide-mediated heterodimer.

Discussion
Protein complexes are essential for host innate immune responses to invasion by pathogens [35]. The LRR fold, commonly involved in protein-protein interactions [36], is not surprisingly involved in numerous central functions related to immunity and pathogenesis including cell signaling, platelet aggregation and attachment and invasion of pathogenic bacteria into host cells [37][38][39][40][41][42][43][44]. LRR domains consist of multiple repeats that form a curved solenoid structure with a parallel β-sheet on its concave face, which serves as the canonical binding surface. Numerous LRR proteins are known to form dimers and dimerization is almost always physiologically significant. Decorin, an extracellular matrix proteoglycan, forms a homodimer with an interface that spans over three quarters of the length of the concave face. Dimerization of decorin is coupled to protein folding and the interaction site is also involved in ligand binding [45]. The cell surface molecule AMIGO-1 exists as dimer with an interface of 1350 Å 2 formed by the concave side of the LRR domain as well as the LRRNT and LRRCT capping motifs. Dimerization of AMIGO-1 is necessary for proper cell surface expression and is likely to be involved in cell to cell adhesion [46]. Slits are multi-domain proteins containing four consecutive LRR domains that are important in neuronal development. The crystal structure of Slit2 showed that the protein exists as a homodimer where the entire concave face, including the C-terminal cap, is involved [47]. In contrast, Toll-like receptor (TLR) ectodomains dimerize via their lateral faces, forming an "m-shaped" structure [48]. Ligand-induced dimerization of TLRs is crucial for the recognition and initiation of innate immune responses against a wide variety of pathogens.
We have previously reported the heterodimeric structure of LRIM1 and APL1C, which is not mediated by an interaction between their respective LRR domains but mainly by a separate CC domain. Nevertheless, heterodimerization is also required for the function of LRIM1 and APL1C in stabilizing TEP1 cut . Here we describe the homodimer of APL1B, in which dimerization is mediated by domain-swapping of the leading LRR in the absence of a standard N-terminal capping motif. While domain swapping is a common mode of oligomerization in proteins [49], to our knowledge a domain-swapped oligomer of a LRR protein has not previously been reported. Despite the lack of a known functional role of APL1B within the A. gambiae immune system, we assert that homodimerization of APL1B is likely to be of physiological consequence.
The homodimerization of APL1B is due to the absence of a defined capping motif at the N-terminus, called the LRRNT. The unusual nature of this deficit was previously noted in a comprehensive sequence analysis of LRIM1 and APL1 proteins in the genomes of A. gambiae, Aedes aegyptii and Culex quinquefasciatus [14]. In the 83 proteins that were analyzed only five -A. gambiae LRIM1, APL1B and three LRIM20 orthologs-lacked the two-cysteine motif prior to the leading LRR that represents an LRRNT as found in the structure of APL1C-LRR [13]. To determine if any additional members may recently have been identified, we examined the N-terminal region of all 30 orthologs of APL1B (AGAP007035) listed in Vectorbase (www. vectorbase.org). A one-to-one ortholog for APL1B was not found and all of the orthologs that lack the LRRNT (nine proteins: AATE001420, ADIR007846, ADIR007869, AFAF007474, AFAF009836, AFAF016856, AFUN000279, AFUN000288, AFUN000597 from A. atroparvus, A. dirus, A. farauti and A. funestus) also lacked a signal peptide. Hence, the structure of APL1B may represent a unique feature within the LRIM1 family.
While the N-terminal sequence of APL1B is not conserved, the LRR that it binds to is. LRR III of APL1A and APL1C is generally conserved in long LRIM family members. We note that, if APL1A or APL1C were truncated within LRR II, whether transcriptionally, post-translationally by an endopeptidase, or if APL1B was co-expressed with APL1A and APL1C, the N-terminus of APL1B could theoretically bind to LRR III in an equivalent manner as observed for the APL1B homodimer. Domain swapping is not only associated with oligomer assembly but also with misassembly [50]. Consider that intracellular retention of APL1A 2 is not readily explained (while being >90% identical to APL1C and APL1A 1 , which are both secreted) and that at least some LRIM family proteins lacking a signal peptide are in fact intracellular and not simply misannotated. We speculate that domain swapping could occur during the co-expression of LRIM proteins and be associated with intracellular function or retention. Following secretion however, all evidence suggests that the LRR domains of LRIM1, APL1A 1 and APL1C are quite stable to proteolysis. Domain swapping of a folded LRR would therefore require specific activity of a chaperone or protease complex.
Full-length APL1B may not only form an N-terminal homodimer but is also capable of forming a heterodimer with LRIM1 via its CC domain. It should be noted that neither LRIM1 or APL1B homodimers nor LRIM1/APL1B heterodimers have been detected in vivo, and could reflect an artifact of heterologous expression. If an LRIM1/APL1B complex were formed it could potentially exist as a 2:2 heterotetramer. The CC domains of APL1B and APL1C are identical except for four point mutations within the HLH motif that is inserted within the CC structure, and the final~30 residues of the coil (S1 Fig.). The successful coIP of APL1B with LRIM1 therefore suggests that this structural relationship is preserved within the LRIM1/ APL1B complex along with the intermolecular disulfide bond.
We have not yet purified LRIM1/APL1B to homogeneity or demonstrated any interaction with TEP1, but hypothesize that the diversity in the CC domain serves to ensure the association of the correct N-terminal LRR domains (i.e. LRIM1 with APL1C) and that this diversity, specifically the HLH, may regulate the association with TEP1 or other TEPs (such as TEP3, TEP4, or TEP6). The structural diversity within the APL1 genes is consistent with their being principal mediators of pathogen discrimination and specificity of the immune response to Plasmodium. TEP1 and LRIM1 in contrast are static structural components; variation in these proteins more likely affects affinity and reactivity, but not specificity, of the immune complex. APL1A 1 diverges from APL1C to a greater extent than APL1B in the final~60 residues of the CC domain, yet was observed to interact with LRIM1. Further in vivo studies are required to confirm the association of LRIM1 and APL1B. and APL1A 1 . Our in vitro results are consistent with the observation that APL1A 2 , which confers the most protective effect against P. falciparum and lacks a CC domain, is not constitutively secreted from cells [8]. Since APL1A 2 contains a signal peptide, the lack of secretion could be as a result of the absence of its binding partner in T. ni cells, incorrect protein folding or the absence of specific cofactors required for its secretion upon immune challenge in vivo. We hypothesize that APL1 heterodimers formed with LRIM1 exert their anti-Plasmodium activity via interaction with TEP1 (or other TEPs), but could also include TEP-independent pathways. No function has been attributed to the N-terminal low-complexity region but the absence of these repeats in APL1A 1 and APL1B and their presence in APL1A 2 and APL1C suggests that this region may confer a specific APL1 function and/or adaptive value [9].
In summary, the multivalent interactions identified for APL1B support a general model for the LRIM1/APL1 family as molecular scaffolds capable of generating a wide range of innate immune complexes. In the absence of an adaptive response, a genetically encoded repertoire of protein complexes and a high rate of polymorphism within immune genes can allow mosquito populations to adapt and defend against a diverse and rapidly evolving set of environmental pathogens. Understanding the molecular basis of this immune repertoire is important because the Anopheles immune defense to environmental pathogens is cross-reactive to human pathogens such as P. falciparum of which it is a vector. P. falciparum itself is under constant selective pressure to adapt to and evade the Anopheles immune system. Future studies to determine the function of the APL1 genes may lead to (i) genetic tests to accurately predict the vectoral capacity of field-sampled mosquitoes, and (ii) the development of strategies that enhance the natural immunity of A. gambiae to Plasmodium, thereby reducing transmission.

Protein expression and purification
All proteins were expressed using the Bac-to-Bac system (Invitrogen). Spodoptera frugiperda cells (Sf9, Invitrogen) in Sf900-III medium (Invitrogen) were used for the propagation of the baculoviruses while Trichoplusia ni cells (Expression Systems LLC) in ESF921 medium (Expression Systems LLC) were used for large-scale protein expression at 27°C. Infection with baculovirus constructs was performed at a multiplicity of infection (MOI) of 1.0. Harvesting of cells was optimized for each protein and ranged from 40-72 hours post infection (hpi). Purification and coIP experiments were performed as previously described [13,15,16]. The presence of disulfide-linked dimerization was detected by performing small-scale expression in T.ni cells infected with the full-length LRIM1 or APL1 viruses. Conditioned media were collected 24-72 hpi and evaluated with reducing or non-reducing 4-20% SDS-PAGE and α6×His/HRP (Clontech) or αFLAG-M2 (Sigma-Aldrich) Western blotting.
Purification of the LRR domains were performed as follows: following Talon affinity chromatography (Clontech) and elution in 250 mM NaCl, 20 mM Tris-HCl pH 7.8, 250 mM imidazole, the LRR domains were purified with MonoS ion exchange (GE) with Buffer A (50 mM NaCl, 20 mM HEPES pH 7.5) and a salt gradient with Buffer B (600 mM NaCl, 20 mM HEPES pH 7.5). The proteins were further purified with SEC on a Superdex 75 (16/600) column (GE) equilibrated with 100 mM NaCl, 20 mM HEPES pH 7.5.
Sedimentation velocity was used to accurately determine the MW of APL1A 1 -LRR at a concentration of 0.7 mg/ml in a Beckman XL-1 centrifuge at 20°C and 45,000 rpm. Data was analyzed with SEDFIT [52].

Crystallization
Purified APL1B-LRR at 5 mg/ml rapidly crystallized in various conditions. Diffraction data to a resolution of 1.74 Å were collected from a crystal (space group P2 1 2 1 2 1 ) grown in 0.1 M NaCH 3 COO pH 5.2, 0.1 M Li 2 SO 4 , 1.0 M (NH 4 ) 2 PO 4 at 293 K. The crystal was cryoprotected in the same condition containing 30% glycerol and frozen directly in a nitrogen cryostream at 100 K.
Data collection and refinement X-ray data was collected at NSLS beam line X25C (Brookhaven National Laboratory). Data processing was performed with HKL2000 [53]. Molecular replacement was performed with MOLREP [54] using APL1C-LRR (PDB ID 3O6N) as search model. Subsequent refinement was performed with REFMAC [55] followed by model building in COOT [56]. Side chain clashes were checked by the addition of riding hydrogens and geometric analysis by MOL-PROBITY [57]. The total solvent-accessible surface area was analyzed by PISA [58]. The coordinates and structure factors are deposited in the PDB database with PDB ID 4XGO.
Small-angle x-ray scattering SAXS data was collected at ALS SYBYLS beam line. Frozen aliquots of TEP1 Ã R1, LRIM1/ APL1C, APL1C-LRR and LRIM1-LRR were re-purified by SEC in PBS prior to data collection. Samples ranging from 1-5 mg/ml were interleaved with buffer and subjected to one second exposure. Buffer subtraction was performed by in-house software followed by primary data analysis using the ATSAS software package. Guinier analysis as a function of concentration confirmed each protein behaved as an ideal scatterer (R G constant vs. concentration, I 0 linearly dependent with concentration and proportional to MW).
Three approaches to modeling the solution structure of each sample. First, a set of ab initio bead models were derived for each structure using DAMMIF, superimposed and compared to the crystal structure. Second, the crystal structure is used to model the solution scattering curve in CRYSOL. Third, we used the flexible fitting approach implemented by ALLOSMOD-FOXS to derive a best fit model incorporating missing residues and N-linked glycosylation from SAXS to the crystal structures of TEP1 Ã R1 (4D94), LRIM1/APL1C (3OJA), APL1C-LRR (3O6N) and LRIM1-LRR (3O53) [13].
Supporting Information S1 Fig. Structure-based sequence alignment of the full-length sequences of APL1A 1 . APL1A 2 , APL1B, APL1C and LRIM1. The signal peptide sequences for each are shown in purple and do not form part of the alignment. Sequences that show similarities across a group are boxed in blue, similar residues across a group are boxed and highlighted yellow, similar residues within a group are in bold and residues that are strictly conserved are highlighted in red. Symbols above blocks of sequences correspond to the secondary structure of APL1C (Chain B) of PDB ID 3OJA containing helices (grey), beta sheets (black arrows), 3 10