A Rational Engineering Strategy for Designing Protein A-Binding Camelid Single-Domain Antibodies

Staphylococcal protein A (SpA) and streptococcal protein G (SpG) affinity chromatography are the gold standards for purifying monoclonal antibodies (mAbs) in therapeutic applications. However, camelid VHH single-domain Abs (sdAbs or VHHs) are not bound by SpG and only sporadically bound by SpA. Currently, VHHs require affinity tag-based purification, which limits their therapeutic potential and adds considerable complexity and cost to their production. Here we describe a simple and rapid mutagenesis-based approach designed to confer SpA binding upon a priori non-SpA-binding VHHs. We show that SpA binding of VHHs is determined primarily by the same set of residues as in human mAbs, albeit with an unexpected degree of tolerance to substitutions at certain core and non-core positions and some limited dependence on at least one residue outside the SpA interface, and that SpA binding could be successfully introduced into five VHHs against three different targets with no adverse effects on expression yield or antigen binding. Next-generation sequencing of llama, alpaca and dromedary VHH repertoires suggested that species differences in SpA binding may result from frequency variation in specific deleterious polymorphisms, especially Ile57. Thus, the SpA binding phenotype of camelid VHHs can be easily modulated to take advantage of tag-less purification techniques, although the frequency with which this is required may depend on the source species.


Introduction
Therapeutic antibodies (Abs) represent the fastest-growing class of biologic drugs, with expanding applications in cancer, chronic diseases and autoimmunity (reviewed in [1][2][3]). Currently licensed biologics are most commonly fully human or humanized monoclonal Abs (mAbs), with antigen-binding fragments such as Fab, F(ab')2 and scFv making up a

Phage-displayed libraries and isolation of VHHs
The VHHs described in this report are directed against a variety of antigens and were initially isolated from phage-displayed VHH libraries for reasons other than the study of SpA binding, as described previously [32][33][34][35][36][37][38].

Soluble VHH monomer and pentamer expression and purification
Wild-type or engineered VHH genes bearing BbsI/BamHI or BbsI/ApaI restriction sites were cloned into pSJF2H or pVT2 expression vectors (for monomeric and pentameric expression, respectively) as described [32,33,35]. 6×His- and c-Myc-tagged VHH monomers and pentamers were expressed in E. coli TG1, extracted from the periplasm by osmotic shock and purified by immobilized metal affinity chromatography (IMAC) or protein A chromatography using HisTrap HP or HiTrap Protein A HP columns, respectively (GE Healthcare, Piscataway, NJ; [32,33]). The integrity and aggregation status of soluble VHH monomers and pentamers were assessed by SDS-PAGE, Western blotting and size exclusion chromatography [32,33].

Surface plasmon resonance (SPR)
For screening of VHH monomers and pentamers for binding to immobilized SpA at a single concentration, analyte proteins were used directly after purification by IMAC. For determination of binding affinities to immobilized antigen or SpA, VHH monomers were purified by size exclusion chromatography using Superdex™ 75 or 200 10/300 GL columns (GE Healthcare) on an ÄKTA FPLC protein purification system (GE Healthcare), and the monomer peaks were collected in HBS-EP buffer (10 mM 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid (HEPES), pH 7.4, 150 mM NaCl, 3 mM ethylenediaminetetraacetic acid (EDTA), 0.005% (v/v) surfactant P20; [32,33]). Briefly, SpA (Thermo Fisher, Waltham, MA), human ICAM-1 ectodomain (R&D Systems, Minneapolis, MN) or human IGF1R ectodomain (R&D Systems) was immobilized on CM5 sensor chips using an amine coupling kit (GE Healthcare) in 10 mM acetate buffer, pH 4.5, with surface densities of 609, 1348 and 476 resonance units, respectively. VHH monomers and pentamers were injected at 25°C in HBS-EP buffer at a flow rate of 20 μL/min on a Biacore 3000 instrument (GE Healthcare) at different concentration ranges, depending on the interaction (binding to SpA: 50 nM-25 μM; binding to ICAM-1: 0.5-200 nM; binding to IGF1R: 0.1-10 nM). All surfaces were regenerated using 10 mM glycine, pH 2.0. Data were analyzed using BIAevaluation 4.1 software (GE Healthcare) and, for affinity determinations, fitted to a 1:1 binding model.
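As an illustration of what a 1:1 binding model implies for SPR responses, the following minimal sketch computes the steady-state and association-phase responses for a simple Langmuir interaction. It is not the BIAevaluation implementation; the function names, the rate constants and the Rmax value of 100 RU are hypothetical and chosen only for illustration.

```python
import math

def equilibrium_response(conc_m, kd_m, rmax_ru):
    """Steady-state response (RU) for a 1:1 interaction: Req = Rmax*C / (KD + C)."""
    return rmax_ru * conc_m / (kd_m + conc_m)

def association_response(t_s, conc_m, ka, kdiss, rmax_ru):
    """Response during injection: R(t) = Req * (1 - exp(-(ka*C + kdiss)*t)).

    ka is the association rate constant (1/(M*s)); kdiss is the
    dissociation rate constant (1/s); KD = kdiss / ka.
    """
    k_obs = ka * conc_m + kdiss
    req = rmax_ru * ka * conc_m / k_obs
    return req * (1.0 - math.exp(-k_obs * t_s))

# Hypothetical values: a 9 uM binder injected at 9 uM occupies half of Rmax.
half_max = equilibrium_response(9.0e-6, 9.0e-6, 100.0)
```

With micromolar affinities such as those expected for SpA, the response approaches its plateau quickly, which is why steady-state responses across a concentration series are often sufficient to estimate KD under this model.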

In silico scanning mutagenesis of the SpA:VHH interaction
The co-crystal structure of a human IGHV3-encoded Fab in complex with domain D of SpA (PDB ID: 1DEE) was used as a starting point for virtual scanning mutagenesis. Only one copy of the IGHV3 domain (chain D) and the bound SpA fragment (chain G, corresponding to SpA domain D positions Asp8-Lys58 as numbered by Graille et al. [25]) were retained. The IGHV3 domain was first "camelized" by introducing four mutations in FR2: Val37Phe, Gly44Glu, Leu45Arg and Trp47Ala. Hydrogen atoms were added to the resulting VHH:SpA complex and adjusted to maximize H-bonding interactions. Structural refinement of the complex was then carried out by energy minimization using the AMBER force field [41,42] with a distance-dependent dielectric and infinite cutoff for non-bonded interactions. Non-hydrogen atoms were restrained at their crystallographic positions with harmonic force constants of 20 and 5 kcal/(mol·Å²) for the backbone and side-chain atoms, respectively. The resulting structure was then used for single-point scanning mutagenesis simulations at the following positions of the IGHV3 domain: 15, 17, 19, 57, 59, 64, 65, 66, 68, 70, 75, 81, 82a and 82b (all positions using Kabat numbering). We used three protocols (SIE-SCWRL [43][44][45], FoldX [46,47] and Rosetta [48,49]) for modeling the structures and evaluating the energies of single-point substitutions of the other 17 naturally occurring amino acids (Cys and Pro excluded) at each of these 14 positions relative to the wild-type sequence. A consensus approach over specific versions of these three protocols was applied for building and scoring IGHV3 mutants. Further technical and implementation details of this approach and its component methods can be found in Sulea et al. [50].
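To make the size of the scan concrete, the sketch below enumerates the single-point substitutions implied by the description above: 14 positions, each substituted with the 17 amino acids other than the wild-type residue, Cys and Pro. The wild-type residues used here are the human IGHV3 consensus residues named elsewhere in the text and are an assumption; they are not necessarily identical to the 1DEE sequence at every position. The function name is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
EXCLUDED = {"C", "P"}  # Cys and Pro were not modeled

# Kabat position -> assumed wild-type residue (human IGHV3 consensus).
WILD_TYPE = {
    "15": "G", "17": "S", "19": "R", "57": "T", "59": "Y",
    "64": "K", "65": "G", "66": "R", "68": "T", "70": "S",
    "75": "K", "81": "Q", "82a": "N", "82b": "S",
}

def enumerate_mutants(wild_type):
    """Return all single-point mutants as strings such as 'R19K'."""
    mutants = []
    for position, wt_residue in wild_type.items():
        for residue in AMINO_ACIDS:
            if residue == wt_residue or residue in EXCLUDED:
                continue
            mutants.append(f"{wt_residue}{position}{residue}")
    return mutants

mutants = enumerate_mutants(WILD_TYPE)
print(len(mutants))  # 14 positions x 17 substitutions
```

Each of these mutants would then be built and scored by all three protocols before the consensus step; that part depends on external software and is not sketched here.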

Next-generation DNA sequencing
Phagemid replicative form DNA from naïve and immune phage-displayed VHH libraries was isolated from E. coli TG1 cells using QIAprep spin miniprep kits (QIAGEN, Valencia, CA). Next-generation sequencing libraries were generated by two-step PCR amplification of VHH genes and purified as previously described [51,52]. The final amplicons were pooled and purified from 1% (w/v) agarose gels using a QIAquick gel extraction kit (QIAGEN), desalted using Agencourt AMPure XP beads (Beckman-Coulter, Pasadena, CA), then sequenced on a MiSeq Sequencing System (Illumina, San Diego, CA) using a 500-cycle MiSeq Reagent Kit V2 and a 5% PhiX genomic DNA spike. From each sample, 0.1-2.1 million reads were generated, of which 0.4×10⁵-6.0×10⁵ were used for analysis after assembly using FLASH (default parameters; [53]) and quality filtering using the FASTX-Toolkit with a stringency of Q30 over 95% of each read [54].
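The quality criterion above (at least Q30 over 95% of each read) can be sketched as a simple per-read test. This is a minimal pure-Python illustration, not the FASTX-Toolkit code; it assumes Phred+33 quality encoding and treats both thresholds as inclusive, and the function name is hypothetical.

```python
def passes_quality_filter(quality_string, min_q=30, min_fraction=0.95, offset=33):
    """Return True if at least min_fraction of bases have Phred score >= min_q.

    quality_string is the FASTQ quality line for one read; offset=33
    assumes Phred+33 encoding.
    """
    if not quality_string:
        return False
    good = sum(1 for ch in quality_string if ord(ch) - offset >= min_q)
    return good / len(quality_string) >= min_fraction

# 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
keep = passes_quality_filter("I" * 95 + "#" * 5)   # 95% of bases at Q40
drop = passes_quality_filter("I" * 94 + "#" * 6)   # only 94% of bases at Q40
```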

Definition of VHH residues involved in SpA binding
Previous structural work [25] suggested the existence of a set of seven core SpA binding residues in human IGHV3 heavy-chain variable regions (positions 19, 65, 66, 68, 70, 81 and 82a; all positions using Kabat numbering), with a lesser contribution of six additional contact residues (positions 15, 17, 57, 59, 64 and 82b). Parenthetically, these tend to be mostly conserved in SpA-binding human and murine Abs and altered via germline variation or somatic hypermutation in non-SpA-binding Abs [17][18][19], although neither the importance of each residue nor the spectrum of tolerated substitutions has been rigorously tested by mutagenesis studies. Overlay of the three-dimensional structures of a human IGHV3-encoded IgM heavy-chain variable domain [25] and a llama VHH directed against Clostridium difficile toxin A [55] showed strong overall conservation of these immunoglobulin folds (Fig 1).
To investigate whether the conserved SpA-binding residues in human IGHV3 Abs were also important for VHH binding by SpA, we first determined the sequences of 55 VHHs obtained in our lab along with an IGHV3-encoded human autonomous VH domain, HVH430 [56], and assessed their binding to immobilized SpA at a single concentration (250 nM) by SPR (Fig 2; S1 Table). Many, but not all, camelid VHHs shared the human IGHV3 consensus sequence at all 13 SpA contact residues, suggesting that they may interact in a similar way with SpA. Complete or near-complete sequence conservation at some VHH positions (Gly15, Ser17, Arg66, Ser70 and Gln81) among both SpA-binding and non-SpA-binding VHHs prevented the assessment of their importance. In agreement with the conclusions of structural studies [25] and a previous mutagenesis study [57], the salt bridge formed between the Asp residue at position 36 of SpA domain D and the conserved Arg residue at VHH FR1 position 19 was indispensable, as replacement of Arg19 with Lys, Ser or Thr abrogated SpA binding. Similarly, replacement of core Gly65 with negatively charged Asp, or of core Asn82a with Asp or Ser, had a destructive effect on SpA binding; the latter result is consistent with a previous mutagenesis study [57] which found that substitution of Asn82a with Ala abrogated SpA binding. Surprisingly, one VHH bearing a substitution of core Thr68 with Ala (VHH36) showed residual SpA binding. However, since VHHs encoding Ala68 in combination with other substitutions did not bind SpA, and the more conservative replacement of Thr68 with Ser reduced or ablated SpA binding, we infer that the limited polymorphisms tolerated at this position partially destabilize SpA binding.
In agreement with previous work [18,23], we found that a variety of residues (Thr, Arg and Lys) were tolerated at the non-core VHH CDR2 position 57, although the full spectrum of tolerated residues could not be conclusively identified due to the co-occurrence of substitutions at other positions (Fig 2; S1 Table). Conversely, the presence of Ala57 clearly ablated SpA binding, and surprisingly, two VHHs bearing Ile57, ICAM11-4 and VHH55, did not bind SpA; Ile57 is present in some germline human IGHV3 genes and was previously judged as permissive for SpA binding by some groups [23,25] but not others [18]. There was some indication that VHH Tyr59, which forms an H-bond with SpA Asp37 that was not considered a core interaction by Graille et al. [25], might be essential, as its substitution with His or Val (in combination with other substitutions in VHH26 and IGF1R-4) ablated SpA binding. Both reversal of charge at the non-core VHH position 64 (Lys→Glu) and substitution of non-core Ser82b with Asn or Arg appeared to be well tolerated, with no consistent detrimental effect on SpA binding.
Thus, the overall picture emerging from these data was that VHH interaction with SpA depended primarily on the same set of residues as human IGHV3 Abs, albeit with: (i) an unanticipated but minor degree of tolerance for variation at core position Thr68 and a broader tolerance at non-core positions Thr57, Lys64 and Ser82b; (ii) a potentially critical role for non-core VHH position Tyr59; and (iii) a destructive effect of Ile at VHH position 57, in contrast to the predictions of some previous studies. We attempted to corroborate these data using a set of VHH pentamers [35] but found that the multivalent nature of the pentamer:SpA interaction made it difficult to discriminate very weakly-binding from non-binding VHH pentamers (S1 Fig; S2 Table). We found no evidence to suggest that FR sequence polymorphisms outside the previously defined SpA interface [25] played any role in SpA binding, with one notable exception (VHH39): this VHH did not bind SpA despite bearing the human IGHV3 consensus sequence at all 13 SpA contact positions, but encodes a single-residue deletion at FR3 position 76. This provided the first preliminary evidence that FR3 positions 71-80 may influence structuring of nearby SpA contact residues and play an indirect role in modulating SpA binding.

Fig 2 (caption, opening truncated): [...] Table) were assayed for binding to immobilized SpA at a single concentration (250 nM) by SPR and the number of response units bound at the end of the injection was recorded. The solid line represents binding of HVH430 (17 RUs), an IGHV3-encoded human autonomous domain. For VHHs bearing the human IGHV3 consensus residue (shown on the X-axis) at all 13 SpA contact positions, no data are plotted on the graph; instead, dotted lines are shown representing the 95% confidence interval (CI) for mean SpA binding of "wild-type" VHHs bearing this consensus sequence. For VHHs bearing single amino acid substitutions at any one of the 13 SpA contact positions, the relevant substitution is plotted on the graph in green (substitution tolerated) if SpA binding fell within the 95% CI for wild-type VHHs (10-360 RUs), and red (substitution not tolerated) if it fell below. For VHHs bearing multiple amino acid substitutions at SpA contact sites, substitutions are plotted on the graph in blue.
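The colour-coding rule in the figure caption above amounts to a small decision function, sketched here for clarity. The function name and input format are hypothetical. The caption does not say how responses above the upper CI bound or exactly at the lower bound are treated; this sketch assumes both count as tolerated.

```python
WT_CI_LOW_RU = 10.0    # lower bound of the 95% CI for consensus VHHs
WT_CI_HIGH_RU = 360.0  # upper bound (unused below; see the assumption noted above)

def plot_colour(substitutions, response_ru):
    """Classify one VHH by its substitutions at the 13 SpA contact positions.

    substitutions: list of strings such as ["I57"]; response_ru: SPR
    response at the end of the 250 nM injection.
    """
    if not substitutions:
        return None      # consensus VHH: contributes to the CI, not plotted
    if len(substitutions) > 1:
        return "blue"    # multiple substitutions: effects cannot be separated
    # Assumption: any response at or above the lower CI bound is "tolerated".
    return "green" if response_ru >= WT_CI_LOW_RU else "red"
```

The responses passed in when using this function would be the measured values from S1 Table; none are reproduced here.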

SpA binding is reliably conferred to VHHs by humanization
The simplest possible strategy for rescuing SpA binding in non-SpA-binding VHHs would be to revert any discrepancies at the 13 contact positions defined by Graille et al. [25] to the human IGHV3 consensus sequence. This is always accomplished, by definition, through the process of humanization, in which a llama VHH's FRs are replaced almost entirely with those of the most closely homologous human IGHV3 gene. To confirm that this strategy would be successful, we took four a priori non-SpA-binding VHHs (ICAM11-4 and ICAM34-1, directed against human ICAM-1; IGF1R-4 and IGF1R-5, directed against human IGF1R) and humanized them, thus reverting their FR sequences to human IGHV3 germline both at SpA contact positions (Table 1) and elsewhere (S3 Table). A fifth example, AFAI (a VHH directed against CEACAM6), was not humanized, but instead had two reversions incorporated via site-directed mutagenesis (Glu64Lys, Asp82bSer).
SpA binding by these five camelid VHHs was initially extremely weak or undetectable, as measured using single-concentration injections of 250 nM VHH over immobilized SpA in SPR. However, in each case, VHH humanization (or in the case of AFAI, reversion of both Glu64 to Lys and Asp82b to Ser) conferred SpA binding (S2 Fig) that was sufficient to enable their purification by SpA affinity chromatography (S3 Fig). In three of four cases, expression yields of the humanized VHHs after IMAC purification were reduced by 50% or more compared to their wild-type llama counterparts (Table 2, left vs. middle columns), and in one case (ICAM11-4), the humanized VHH had a ~20-fold loss of affinity for its cognate antigen (Table 3, left vs. middle columns). Thus, perhaps unsurprisingly, SpA binding could be successfully introduced into five a priori non-SpA-binding VHHs by humanization, although this came at the cost of reduced expression yield, and in one case, impaired antigen recognition.

SpA binding can be conferred to VHHs by limited site-directed mutagenesis without negatively impacting expression yield or antigen binding
To determine whether SpA binding could be restored in non-SpA-binding VHHs without incurring the negative consequences of humanization on expression yield and antigen binding, and to better understand the relative impacts of individual substitutions on SpA binding, we took the same five non-SpA-binding VHHs as above and reverted any discrepancies at the 13 SpA contact positions (Table 1) to the human IGHV3 consensus by site-directed mutagenesis.
In one case (ICAM34-1), we also reverted an unusual Pro at position 75 to the human IGHV3 consensus Lys, based on the hypothesis that this Pro residue might affect structuring of surrounding SpA contact residues. Substitutions were incorporated either singly or in all possible combinations, and the SpA-binding affinities of the resulting engineered VHHs were then determined by SPR.
In the case of ICAM11-4, a single substitution (Ile57Thr) was sufficient to restore 9.0 μM affinity for SpA (Table 4; in this table and hereafter, SpA-engineered VHHs (underlined) are defined as the variants bearing the minimal substitutions from wild-type sequences required to restore SpA binding), close to the wild-type affinity range of 1-5 μM described for human IGHV3-encoded VH domains [56]. In the case of ICAM34-1, two reversions (Ala57Thr and Pro75Lys) were necessary to recover any degree of SpA binding (9.6 μM), which was improved slightly by a third reversion (Gly82bSer, 4.5 μM). Likewise, two substitutions (Val59Tyr and Asp65Gly) were required to restore SpA binding (1.5 μM) to IGF1R-4, confirming the critical nature of the consensus residue at both of these positions for interaction with SpA. In the case of IGF1R-5, a single substitution (Ala57Thr) conferred 3.7 μM affinity for SpA, with no further affinity improvement obtained by reversion of position 82b from Asn to Ser. Finally, reversion of Asp82b to Ser in AFAI restored weak SpA binding (11.0 μM), which was further improved with the additional reversion of Glu64 to Lys (0.6 μM). This suggested a cumulative defect of both substitutions in the wild-type VHH, with Asp82b conferring more significant binding impairment than Glu64.
In summary, at least in the five examples above, SpA binding could be conferred to non-SpA-binding VHHs using limited numbers (1-3) of amino acid substitutions. The resulting engineered VHHs bound SpA with affinities similar to those of human IGHV3-encoded Fabs, could be purified by SpA affinity chromatography (S3 Fig), and had expression yields after IMAC purification that were indistinguishable from their non-SpA-binding counterparts (Table 2). The data provided from these mutagenesis studies also expanded our understanding of the spectrum of permissive and non-permissive VHH residues for SpA binding, which are listed in Table 5, and clearly supported the hypothesis that residues outside the SpA contact interface such as Pro75 can indirectly modulate SpA binding.

In silico modeling of the SpA:VHH interaction by virtual mutagenesis
To support and extend the experimental results above, we carried out a computational assessment of the effects of virtual mutagenesis of VHH SpA contact residues on SpA binding (Fig 3 and Table 5). We used a consensus approach for mutant building and scoring that has been found to afford improved ranking of Ab:antigen binding affinities relative to various individual methods when applied to over 200 single-point antibody mutants of the SiPMAB database curated from the literature [50]. We used the published SpA:IGHV3 Fab structure [25] as the basis for in silico analyses, but prior to virtual mutagenesis excluded the light chain from consideration and "camelized" the human IGHV3 domain by substitution of FR2 Val37Phe, Gly44Glu, Leu45Arg and Trp47Ala. The results of in silico mutagenesis provided an explanation for the almost complete conservation of Gly15, core Gly65 and core Arg66 in VHHs, since substitution of these residues was predicted to drastically destabilize VHH folding; the backbone phi-psi dihedral angles of Gly15 and Gly65 are (92, -18) and (92, -20), respectively, in the 4NC0 crystal structure and are high-energy regions for non-glycine amino acids. Conversely, Ser17 and core Ser70 were predicted to tolerate substitution with a variety of residues despite being almost totally conserved in VHHs. In the case of core VHH Gly65, experimental evidence contradicted computational predictions of fold destabilization by substitution with Asp, as this residue was observed in three VHH monomers and one VHH pentamer (S1 and S2 Tables). In agreement with experimental data, core VHH Arg19 was critical for interaction with SpA, and could not be substituted even conservatively by Lys. Also in line with experimental data, moderate but potentially non-destructive reductions of SpA binding were predicted when VHH core Thr68 was substituted with Ser, Ala and several other residues; similar trends were observed for core Gln81 and Asn82a.
Virtual mutagenesis predicted a critical role for non-core VHH positions 57 (Thr, Lys and Arg) and Tyr59, a negligible role for non-core VHH position Lys64 and variable effects of Ser82b substitution, with minor loss of binding resulting from Asn substitution and more pronounced defects resulting from substitution with Gly or Asp. Interestingly, VHH position 75 was predicted to tolerate substitution with most residues except those with beta-branched side chains, which probably disrupt the H-bonded turn of the FR3 loop at this location. We speculate that Pro substitution at this position has a similar effect and that the incurred local misfolding may be propagated to adjacent SpA contact residues. Overall, the experimental and computational data were congruent and together supported the classification of SpA contact residues of VHHs into three categories: (i) critical for folding (Gly15, Gly65, Arg66) and/or binding (Arg19, Tyr59, Gly65); (ii) partially permissive to specific substitutions (Thr68, Gln81, Asn82a, Thr/Lys/Arg57, Ser82b); and (iii) generally tolerant to many substitutions (Ser17, Ser70, Lys64). For VHH residues falling into the second and third categories, however, the spectrum of amino acids predicted computationally to be tolerated at each position could not be validated experimentally using the available data.

Next-generation DNA sequencing of camelid VHH repertoires suggests that species differences in SpA binding may arise from differential frequency of Ile57 polymorphism
SpA binding is observed rarely in VHHs of dromedary origin and more commonly in those of llama origin, although a detailed comparison of the frequency of SpA binding by species has yet to be published [30]. To identify potential explanations at the level of protein sequences that might account for this observation, we used next-generation sequencing technology to interrogate four VHH repertoires to variable depths (0.4-6.0×10⁵ reads; S4 Table). The sources of the VHH repertoires were either lymphocytes derived from three individual llamas or a single pooled sample of alpaca, camel and llama lymphocytes [34].
As shown in Fig 4A, there was no general defect in SpA binding evident from the sequences of VHH repertoires of camels and alpacas. The proportion of VHH sequences bearing the human IGHV3 consensus sequence at SpA contact positions was broadly similar in all four repertoires, except at position 57, where Ile was much more frequently present in dromedary and/or alpaca VHHs (llama repertoires: ~5% vs. mixed llama, alpaca and camel repertoire: 22%). A major difference was observed in the frequency of putative CDR1-CDR3 intradomain disulfide bonds, as indicated by the simultaneous presence of Cys residues at any position within both these regions, which were almost entirely absent in the repertoires of llamas but very frequent in the mixed-species repertoire (Fig 4B). To confirm that CDR1-CDR3 disulfide bridging had no effect on SpA binding, we ablated this disulfide from a non-SpA-binding dromedary VHH and introduced it into the engineered SpA-binding variant of the same VHH (Ile57Thr; S5 Table). Neither introduction nor ablation of the CDR1-CDR3 disulfide bridge had any impact on SpA binding, at least in the case of the single dromedary VHH tested here.

Discussion
SpA affinity chromatography has become the purification method of choice for most antibody manufacturers over the last 15 years, especially in the therapeutic Ab pipeline [8]. For many applications, VHHs are fused to mouse or human Fc regions for mammalian cell production and easily purified using conventional methods [58]. However, in some circumstances (e.g., in vivo imaging [59]; co-crystallization [60]; tumor targeting requiring tissue penetration [61]), the small size of the VHH molecule is essential, and the addition of affinity tags for purification may detract from the stability and homogeneity of the final product. The same is true of VHH pentamers, and the starting point for this work was the observation that His- and/or Myc-tagged pentamers had a greater propensity for aggregation and non-specific binding than their untagged counterparts. Thus, tagless strategies for VHH purification are highly desirable.
Here, we present a detailed description of the sequence features of SpA-binding and non-SpA-binding VHHs, and by extension, the features governing the interaction between SpA and other immunoglobulin VH domains. Our results differ in important ways from the conclusions inferred from the SpA:Fab crystal structure [25]. First, we found that several core positions in the VHH:SpA interface can tolerate a limited degree of polymorphism without ablating SpA binding (shown experimentally: Thr68; predicted computationally: Thr68, Ser70, Gln81, Asn82a). Second, on the basis of both experimental data and computational structural modeling, we found that non-core VHH Thr/Lys/Arg57 and Tyr59 are indispensable for SpA binding, and that Ile57 had a destructive effect on SpA binding. Third, we found that VHH Pro75 exerts a destabilizing effect on SpA binding, potentially by altering the conformation of FR3 sufficiently to displace SpA contact residues some distance away. Given this surprising finding, a potential role for additional VHH polymorphisms outside the SpA interface in determining SpA binding cannot be ruled out. However, the fact that a three amino acid insertion in the FR3 of VHH52 (S1 Table) did not impair SpA binding suggests that epistatic effects involving "action at a distance" may be rare.
In the analyses presented here, we did not attempt to rank the relative effects of individual VHH substitutions on SpA binding, instead opting to categorize them dichotomously as permissive or non-permissive for SpA binding (Table 5). The rationale for this decision was twofold. First, several substitutions were observed experimentally in only a single VHH, and their effects may depend on the sequence background and presence of other polymorphisms. Second, the weak overall affinity of the SpA:VHH interaction makes determinations of monovalent binding strength challenging below a certain threshold, and weak residual binding is difficult to rule out experimentally; this was clearly evident in comparisons of SpA binding of VHH monomers and pentamers bearing similar sequences. Nevertheless, the overall consistency between experimental data and computational predictions, as well as with the limited mutagenesis data of Fridy et al. [57], provides a strong degree of confidence in the general effects of many of the substitutions described here. We caution, however, that computational predictions of minor or moderate reductions in SpA binding were not always accurate in the degree of their effects, and that the safest course is to revert all VHH contact positions back to the human IGHV3 consensus, even if some of the original polymorphisms might have been tolerated.
Using next-generation DNA sequencing of llama, alpaca and dromedary VHH repertoires, we found that the most likely explanation for non-SpA binding of dromedary VHHs was the presence of non-permissive residues at SpA contact positions, especially Ile57. Since the comparison in our analysis was between llama and mixed-species repertoires, the frequency of polymorphisms detrimental to SpA binding, including Ile57, is likely even higher in the repertoires of alpacas and/or dromedaries than shown here. It remains unclear whether such differences arise through germline polymorphism or somatic mutation, although comparison of germline [26] and rearranged [27] camelid VHH sequences favours the latter hypothesis. Also unclear are the reasons why Ile57 should be a hotspot for mutation in dromedaries and/or alpacas, but not llamas.
On the basis of these data, we propose the following general strategy for conferring SpA binding upon camelid VHHs: (i) ensure that FR1 residues Gly15, Ser17 and Arg19 are present and revert any discrepancies to this consensus; (ii) ensure that CDR2 residue Thr/Lys/Arg57 is present and revert any discrepancies to this consensus; (iii) ensure that FR3 residues Tyr59, Lys64, Gly65, Arg66, Thr68, Ser70, Gln81, Asn82a and Ser/Asn82b are present and revert any discrepancies to this consensus; and (iv) closely examine FR3 positions 71-80 for Pro residues (especially at position 75) and unusual deletions and revert these to the nearest human IGHV3 germline residue. We have found no evidence to suggest that any of the substitutions introduced following these rules affect the expression yield, solubility, stability or aggregation status of VHHs, and while some may not be essential for SpA binding, they are also not harmful. While there were no affinity penalties resulting from these substitutions for any of the five VHHs shown here, the necessity of restricting CDR2 position 57 diversity to Thr, Lys or Arg may compromise the affinity of other VHHs.
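The four rules above can be sketched as a simple sequence check. This is an illustrative sketch, not a validated tool: it assumes the VHH has already been Kabat-numbered by some external method and is supplied as a dictionary from position to one-letter residue. The function name, the input format and the placeholder FR3 residues in the example are hypothetical. Where more than one residue is permitted, the first listed is the one suggested for reversion (Thr at 57 and Ser at 82b, as used in the engineered VHHs described here).

```python
# Permitted residues per Kabat position (rules i-iii); first entry is
# the residue suggested when a reversion is needed.
PERMITTED = {
    "15": ("G",), "17": ("S",), "19": ("R",),           # FR1
    "57": ("T", "K", "R"),                               # CDR2
    "59": ("Y",), "64": ("K",), "65": ("G",), "66": ("R",),
    "68": ("T",), "70": ("S",), "81": ("Q",), "82a": ("N",),
    "82b": ("S", "N"),                                   # FR3
}
FR3_LOOP = [str(n) for n in range(71, 81)]               # rule iv: positions 71-80

def suggest_reversions(residues):
    """Return (reversions, flags) for a Kabat-numbered VHH.

    reversions: mutations such as 'I57T'; flags: positions needing
    manual inspection (missing residues, deletions or Pro in 71-80).
    """
    reversions, flags = [], []
    for pos, allowed in PERMITTED.items():
        found = residues.get(pos)
        if found is None:
            flags.append(f"{pos}: residue missing")
        elif found not in allowed:
            reversions.append(f"{found}{pos}{allowed[0]}")
    for pos in FR3_LOOP:
        found = residues.get(pos)
        if found is None:
            flags.append(f"{pos}: deletion")
        elif found == "P":
            flags.append(f"{pos}: Pro")
    return reversions, flags

# Hypothetical example: consensus contacts with placeholder FR3 residues
# ("A" is a stand-in, not a real sequence), then two AFAI-like substitutions.
consensus = {pos: allowed[0] for pos, allowed in PERMITTED.items()}
consensus.update({pos: "A" for pos in FR3_LOOP})
afai_like = {**consensus, "64": "E", "82b": "D"}
print(suggest_reversions(afai_like))
```

Flags under rule (iv) are reported rather than converted into mutations because the target residue depends on the nearest human IGHV3 germline, which this sketch does not attempt to look up.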
In conclusion, we have identified the sequence hallmarks responsible for determining camelid VHH binding by SpA, which provide an explanation for species differences in VHH SpA reactivity. We used this information to develop a strategy for engineering VHHs to introduce SpA binding and enable their tagless purification by SpA chromatography. This strategy may also apply to Ab fragments of other species, or at least those that share homology with human IGHV3 Abs.

S1 Fig (caption, opening truncated): [...] Table) was injected for 2 min and the number of response units bound at the end of the injection was measured. For pentamers bearing the human IGHV3 consensus residue at all 13 SpA contact positions, no residues are plotted on the graph; instead, dotted lines are shown representing the 95% confidence interval (CI) for mean SpA binding of wild-type pentamers bearing this consensus sequence. For pentamers bearing single amino acid substitutions at any one of the 13 SpA contact positions, the relevant substitution is plotted on the graph in green (substitution tolerated) if SpA binding fell within the 95% CI for wild-type pentamers, and red (substitution not tolerated) if not. For pentamers bearing multiple amino acid substitutions at SpA contact sites, substitutions are plotted on the graph in blue. We used a verotoxin B-irrelevant peptide fusion as a negative control to rule out potential interactions between SpA and the pentamerization domain (data not shown).