Fig 1.
Diverse proteins containing a variable number of Ig-like domains chained in tandem.
A) Immune system proteins: the extracellular regions of TCRs, MHCs, antibodies and many other cell surface receptors (CD8, CD4, CD22, CD28, PD1, CTLA4, etc.) contain Ig-domains. B) Nervous system proteins: CEACAMs, contactins, which, like sidekick or DSCAM proteins, contain a 4-Ig horseshoe “superdomain” at the N-terminus (see section on Ig-Ig interfaces). MDGA proteins (MAM domain-containing glycosylphosphatidylinositol anchor) contain 7 Ig-domains followed by a MAM domain that has a jelly roll fold with similarities to the Ig-fold. C) Vascular system proteins: JAM proteins contain 2 Ig-domains in tandem, and VEGFRs act as receptors for VEGF (Vascular endothelial growth factor) proteins. 3D models can be visualized through iCn3D links: Contactin1 https://structure.ncbi.nlm.nih.gov/icn3d/share.html?imRwYoJKmCpCaXYm7&t=Q12860; CEACAM1 https://structure.ncbi.nlm.nih.gov/icn3d/share.html?g89JayoAbbjN7Yg18&t=P13688; VCAM1 https://structure.ncbi.nlm.nih.gov/icn3d/share.html?9edsvYBJ43ZyPv9j8&t=P19320; VEGFR1 https://structure.ncbi.nlm.nih.gov/icn3d/share.html?SbL83DUuwccuJWtz8&t=Q8QHL3; JAM1 https://structure.ncbi.nlm.nih.gov/icn3d/share.html?6mGp5wdHxcxV4Cv98&t=1NBQ.
Fig 2.
Canonical Ig domain variants: Schematic and Ribbon representations.
The IgV domain contains 9 strands, (AA’)BCC’C”DEFG (9s), according to the nomenclature as displayed. The IgI domain variant contains 8 strands, (AA’)BCC’DEFG. IgC1 and IgC2 both contain 7 strands, ABCDEFG and ABCC’EFG, respectively. The A strand splits as A and A’ between the two sheets of the beta sandwich in IgI and IgV domains, as displayed. Some IgV domains can exhibit an A’ strand only. The IgC2 domain can also present an A strand split (see text for details).
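As an illustrative aside, the strand counts quoted above can be checked by tokenizing the topology strings. The sketch below is not part of the paper's tooling: the names `TOPOLOGIES`, `strands` and `strand_count` are hypothetical, primes are written in ASCII (`C'`, `C''`), and a parenthesized split strand such as `(AA')` is assumed to count as one strand, as the caption's counts imply.

```python
import re

# Topologies as quoted in the caption, written with ASCII primes.
TOPOLOGIES = {
    "IgV": "(AA')BCC'C''DEFG",
    "IgI": "(AA')BCC'DEFG",
    "IgC1": "ABCDEFG",
    "IgC2": "ABCC'EFG",
}

# A token is either a parenthesized split strand, e.g. (AA'),
# or a single strand letter followed by zero or more primes.
_TOKEN = re.compile(r"\([^)]*\)|[A-G]'*")


def strands(topology: str) -> list[str]:
    """Split a topology string into strand tokens."""
    return _TOKEN.findall(topology)


def strand_count(topology: str) -> int:
    """Count strands, treating a split strand like (AA') as one."""
    return len(strands(topology))


if __name__ == "__main__":
    for name, topo in TOPOLOGIES.items():
        print(name, strands(topo), strand_count(topo))
```

Under these assumptions the counts come out as 9 (IgV), 8 (IgI) and 7 (IgC1, IgC2), matching the caption.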
Fig 3.
Sequence patterns of Ig-domain variants in WebLogo format.
The anchor residues in each strand are marked by a “#” and are numbered xx50 as described in the Results and Discussion section. a) The IgV-set, most commonly with an (AA’)BCC’C”DEFG topology (9 strands). b) The IgI-set presents an (AA’)BCC’DEFG topology (8 strands) highly similar to IgV domains. c) The IgC1-set exhibits an ABCDEFG topology (7 strands), usually with two hallmarks: a straight A strand, even if some strand breaks can be observed, and a D strand. Some IgC1 domains can also exhibit a small non-conserved C’ strand (denoted by XXX) as in IgI or IgC2 domains. d) The IgC2-set exhibits an ABCC’EFG topology (7 strands). IgC2 domains can exhibit an A/A’ strand split with a non-conserved A’ strand (denoted by XXX) as in LILRs. They also have a C’ strand with no sequence conservation (denoted by XXX). IgC2 differs from IgC1 domains in not presenting a D strand. The first four canonical Ig-domains IgV, IgI, IgC1 and IgC2 present high similarity in sequence, especially the Cysteines in strands B and F forming a Cys-Cys bridge and a Tryptophan in strand C flanking the Cys-bridge. They share specific sequence patterns, for example the Tyrosine corner in strand F (see later in text), and each also has patterns specific to its type, such as R and N residues in IgI strands D and F respectively, or residues Q, RF and D in IgV strands A, D and F respectively. One should note that the G-strand sequence pattern in the WebLogo results from an over-representation of antibody domains in the dataset. e) The Ig-FN3 set presents an ABCC’EFG topology (7 strands). Like any other Ig-domain variant, some FN3 domains can exhibit an A/A’ strand split. f) The Ig-Cadherin set exhibits an A’BCDEFG topology (7 strands); although it shares the Ig-fold, it exhibits a very different sequence pattern and may result from convergent evolution [63]. The A’ strand in Cadherins corresponds to the A strand designation in the literature [66].
Classical cadherins can exhibit an A*/A strand split where its most N-terminal segment, called the “A* strand”, provides an adhesive mechanism between cells by swapping between N-terminal (EC1) domains [67]. Both FN3 and Cadherins, however, show key Tyrosine residues conserved in strands C and F, where the Tyr in F strand might correspond to the Tyr corner of the F strand in the previous four canonical domains. (See supplement files S1 Data to S6 Data for multiple sequence alignments used in producing sequence logos in Fig 3).
Fig 4.
Schematic representation of the Ig-fold structural signature, consisting of strands B-C, E-F, and G arranged as displayed in the central panel, and of the main regions of lateral variability: N-term (strand A), C-term (strand G) and Linker (strands between C and D), as well as an example of a less known lateral strand addition.
(See text).
Table 1.
Reference set of diverse topological and structural variants of Ig domains from the SCOP b.1 Superfamily. (see supplement file S7 Data, for the corresponding spreadsheet). It includes surface receptors in the immune, nervous, and vascular systems, cytoplasmic proteins and enzymes, transcription factors and nuclear proteins.
Fig 5.
A) IgStrand numbers for anchors and additional key residues in important IgV domains. Definitions for Classical Strands (1000’s), Non-classical Strands (100’s) with Anchor numbers (50s). Residue types are shown for well-known IgV examples as in antibodies, where sheet 1 is composed of strands ABED and sheet 2 of strands (A’)GFCC’(C”). The A strand anchor residue is given the number 1550, as A is the first strand on Sheet 1. The A’ strand anchor is given the number 1850, as A’ is the second half of the A strand, which swaps to Sheet 2 and runs parallel to the adjacent G strand. The B strand anchor is 2550. The C strand anchor is 3550 and is on Sheet 2. The C’ strand anchor is 4550 and is on Sheet 2. The C” strand anchor is 5550 on Sheet 2, but it can swap to Sheet 1. The D strand anchor is 6550. The E strand anchor is 7550. The F strand anchor is 8550. The G strand anchor is 9550. Additional, non-classical strands can also appear in Ig-extended domains. This is taken into account by the second digit: for example, the anchor of the strand appearing before the A strand (A- strand) will have the IgStrand number 1450, while the anchor of a strand A+ after the A strand will be 1650, and similarly the anchor of a G+ strand appearing after a G strand will be 9650. The anchors form a network. B) shows the extensive residue interaction network between the anchors core and residues in all strands of the fold. C) shows sidechain and backbone representations of all anchors with respect to the overall fold. D) shows the positions and backbone connections of the anchors in the GFCC’C” Sheet 2. E) shows D 6550 and E 7550 strand anchors vs. A 1550 and B 2550 anchors on the ABED Sheet 1. F) shows the interaction of A strand anchor 1550 with the backbone of 8550, forming the signature bulge in the G strand of IgVs. G) shows the position of 1850 in the A’ strand with respect to the F and G strand residues. (PDBid used: 5ESV_H for heavy chain variable domain).
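The anchor numbering described in this caption can be summarized as a small lookup. This is a minimal sketch that reproduces only the anchor values stated above; it is not the iCn3D implementation, and the function and table names are hypothetical. It covers a single "-" or "+" extension only, because the caption gives no numbers for strands such as A-- or G++.

```python
# First digit of the IgStrand number for each classical strand
# (ASCII primes: C' and C''), as listed in the caption.
CLASSICAL_DIGIT = {
    "A": 1, "B": 2, "C": 3, "C'": 4, "C''": 5,
    "D": 6, "E": 7, "F": 8, "G": 9,
}


def anchor_number(strand: str) -> int:
    """Return the IgStrand anchor number for a strand name.

    Handles the classical strands, the A' split strand, and a single
    '-' or '+' extension (e.g. A-, A+, G+). Anything else raises
    ValueError, since the caption does not define it.
    """
    if strand == "A'":
        return 1850  # stated explicitly in the caption

    offset = 0
    base = strand
    if strand.endswith("-"):
        base, offset = strand[:-1], -1  # strand before: second digit 4
    elif strand.endswith("+"):
        base, offset = strand[:-1], +1  # strand after: second digit 6

    if base not in CLASSICAL_DIGIT:
        raise ValueError(f"no anchor rule known for strand {strand!r}")

    # thousands = strand, hundreds = 5 (classical) +/- 1, anchor = 50
    return CLASSICAL_DIGIT[base] * 1000 + (5 + offset) * 100 + 50


if __name__ == "__main__":
    for name in ["A", "A'", "B", "C", "C'", "C''", "D", "E", "F", "G",
                 "A-", "A+", "G+"]:
        print(name, anchor_number(name))
```

The A’ strand is special-cased because its number (1850) does not follow the single-step second-digit rule used for A- and A+.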
Fig 6.
IgV domain Proteomap A) Sequence/Topology Map (PDBid 1RHH) of the VH domain showing both the sequence and the topology simultaneously
https://structure.ncbi.nlm.nih.gov/icn3d/share.html?DTtCtLWevzwS2XkJ8&t=1RHH. B) IgStrand numbers corresponding to IgV domains with the AA’BCC’C”DEFG topology.
Fig 7.
Ig domain extension and circular permutations A) C2A domain - This type II C2 domain has a G+G++ hairpin extension (shown in cyan) and no A.
The G++ is a circular permutation of the A strand of a Type I C2 domain. B) Synaptotagmin 2 tandem domains C2A - C2B (PDBid: 4P42) in a C2 pseudo-symmetric head to head tandem arrangement. The C2B domain has the A strand circularly permuted w.r.t. the G++ strand in C2A (A and G+ in cyan) https://structure.ncbi.nlm.nih.gov/icn3d/share.html?T2NCEnWVHaoW2AE9A&t=4P42. C) Transthyretin (PDBid 2ROX) domain. Similarly, it shows a G+G++ extension but in a reverse order vs. C2A, forming a G++ strand parallel to the B strand https://structure.ncbi.nlm.nih.gov/icn3d/share.html?wgNVJar5M1ZBwiuL9&t=2ROX. D) Transthyretin tetramer with a D2 symmetry formed by the G+G++ hairpin.
Fig 8.
Head-to-Head tandem Arrestin Structure A) Arrestin (PDBid: 1CF1) https://structure.ncbi.nlm.nih.gov/icn3d/share.html?1LKCe6fyS1fzacVL7&t=1CF1
B) Arrestin and GPCR 7-TM domain interacting in an active conformation (PDBid: 6TKO). The Arrestin_N domain’s Finger loop (CC’) and Middle loop (extended EF loop) interact with the transmembrane helices of activated G-protein-coupled receptors. The additional A- strand (see text) is shown in cyan to highlight its role in enabling a head to head configuration between the two Arrestin Ig domains. https://structure.ncbi.nlm.nih.gov/icn3d/share.html?He7U1h5mUbipbVW58&t=6TKO.
Fig 9.
Example of the Ig-extended Lamin Tail Domain (LTD).
A) Schematic LTD strand topology forming sheets ABED+ and A-GFCC’. B) Pseudo symmetric LTD dimer brings the two A-GFCC’ as one in antiparallel through the additional A- strand at the N terminus. C) LTD dimerization interface using the IgStrand numbers for the A- strand (i=1; j=4 running from 1445 to 1452) and a few residues on strand A and B. Note that the current IgStrand numbering does not cover residues before 1445 (based on current template - see Table 1) https://structure.ncbi.nlm.nih.gov/icn3d/share.html?BRks581vEqzUpENs9&t=7DTG.
Fig 10.
Transcription factors Ig-extended and Ig-like domains.
A) NF-kB RHD and NF-kB IPT dimer. The Rel Homology Domain (RHD) in tandem with the Immunoglobulin-Plexin-Transcription factor (IPT) domain form a dimer that binds DNA https://structure.ncbi.nlm.nih.gov/icn3d/share.html?NVxjekmSMxWnrs5P7&t=1A3Q. The RHD exhibits an insertion between strands C and C’ extending the ABE strand in position of a D strand and one contributing to the GFCC’ sheet in a C” position before the C’ strand in that order. The linker region between C and E strands is therefore inverted as compared to the canonical IgV domains. If we name strands according to the IgV canonical strand names, the topology runs C>DC”C’>E in RHD as opposed to C>C’C”D>E in IgV. Overall the topology can be seen as a permuted IgV-like topology, forming two sheets GFCC’C” and ABED as in IgV domains. It is followed by the IPT (Immunoglobulin, Plexin, Transcription factor) domain, also known as TIG (Transcription factor ImmunoGlobulin), with an IgC2 topology extended by an unusual small D strand and a split AA’ N-Terminal strand. The two IPT domains dimerize through their ABED sheet, in parallel. The two chains bind DNA pseudo-symmetrically. B) P53 DNA Binding Domain possesses, as in NF-kB RHD, a permuted IgV-like topology https://structure.ncbi.nlm.nih.gov/icn3d/share.html?RiTYSGVRTWCA2AyZ8&t=1TUP with an additional N-terminal strand extension before and between A and A’. C) Residue Interactions in the IPT-IPT dimer interface, the two IPT domains - DNA interface, and the two RHD domains - DNA interface.
Fig 11.
A) The Murine Leukaemia virus envelope RBD (PDB id 1AOL), SCOP b.20, shows an extended Ig-domain framework with receptor binding regions called VRA and VRB that correspond to loops BC (CDR1) and DE (HV4) in canonical Ig-domains.
The Ig-domain is extended at the N-terminus with an A’- strand hydrogen bonded to the G-strand in antiparallel and forming an A’-A’ hairpin extending the A’GFCC’C” to A’A-GFCC’C” where the A’- is inserted between the A’ and G strands https://structure.ncbi.nlm.nih.gov/icn3d/share.html?hA7Ftj6A2QEMoCSV7&t=1AOL. The leukemia virus envelopes can be compared to the Feline Leukemia virus RBD as well as the endo-retrovirus EnvP(b)1 https://structure.ncbi.nlm.nih.gov/icn3d/share.html?oFgS8acCM7AW1hZDA&t=6W5Y,1LCS,1AOL. B) The monkeypox virus protein M2 binding CD80 (PDBid: 8HXA). This beta sandwich is similar to the SECRET domain found in other orthopoxviruses binding chemokines. This domain is also called PIE (Poxvirus immune evasion). In this case, the M2 protein modulates T cell co-stimulation by binding CD80. It is missing the A strand altogether at the N terminus. It also presents an insertion of two strands between B and C forming a hairpin C--C-, which takes the place of C” and C’ (in reverse order vs. a canonical IgV domain), extending Sheet 2. The C-terminal extension is composed of 3 strands added after the G-strand: first a G+ intercalating strand forming a GG+ hairpin and a parallel beta sheet interface to strand F, extending the GG+FCC’ sheet, followed by a G++G+++ hairpin extending Sheet 1 as G+++G++BED, the latter G+++ strand replacing the missing A strand, similarly to a circular permutation as found in type II C2 domains. The M2 domain binds the G-strand (red) of CD80 laterally through its C-terminal G-strand extension (red), mainly G++/G+++, to modulate T cell co-stimulation. https://structure.ncbi.nlm.nih.gov/icn3d/share.html?cbid7J2RCJhMNRca8&t=8HXA [110].
Fig 12.
A) IgC2 schematic topology. B) Conceptual A and G strand swap comparing Jelly-roll to Ig-like folds, transforming the AB and GF loops into straddling loops between Sheets 1 and 2 of the sandwich, and changing from ABE and GFCC’ sheets in an IgC2 domain to BGE and AFCC’ sheets respectively in what could therefore be called a C2-roll domain, using the Ig strands nomenclature. C) The Ephrin Receptor (PDBid: 3GXU), a Galactose-binding domain-like fold classified as a jelly-roll in CATH, ECOD and SCOP (b.18), represents a good example of an extended “C2-roll” domain where the N-terminus provides an additional set of strands, extending both sheets and providing the bulk of the interface with its ligand Ephrin. D) Ephrin exhibits an Ig-extended Cupredoxin-like beta sandwich fold (SCOP b.6). The G-strand forms, as in many IgV domains, an FG loop and a parallel A’ strand (A’GFCC’), but it is preceded by an N-terminal extension (in cyan) providing an additional strand A- on Sheet 2 to form A-A’GFCC’ and a strand A-- on Sheet 1 to form A--BE, the interacting surface to the Ephrin Receptor https://structure.ncbi.nlm.nih.gov/icn3d/share.html?kRY8dJJvV4aM7fFx6&t=3GXU. E) The Ephrin Receptor-Ephrin interactome. F) The Galectin (GLECT) domain is a galactose-binding lectin (PDBid 4AGV) that represents yet another Jelly-roll variant topology, classified as Concanavalin A-like (SCOP b.29). GLECT exhibits the A and G strand swap between the sandwich sheets as compared to an IgV domain, extending one sheet with an additional C”’ strand (in grey) after C” to form a sheet AFCC’C”C”’, and with a two-strand E+E++ β-hairpin insertion (in grey) after E to form the GBEE+E++ extended sheet. https://structure.ncbi.nlm.nih.gov/icn3d/share.html?rYBVVeHiBh2TkPVM6&t=4AGV.
G) The Laminin G-like (LamG) domain is also classified as a Concanavalin A-like jelly roll, with a two-strand extension at the N-terminus extending the first sheet to A--A-GBEE+E++ as compared to GLECT https://structure.ncbi.nlm.nih.gov/icn3d/share.html?bmYk9ciuwBQb2td47&t=5DZE.
Fig 13.
Schematic representation of Ig-domains dimerization patterns and contact network A) VH:VL schematic parallel interface and conserved interactions B) CH1:CL schematic antiparallel interface and conserved interactions (see text).
On the left a schematic representation of the domain interface with the C2 pseudo symmetry axis. Note that the symmetry axis for VH:VL is vertical corresponding to a parallel interface, while it is horizontal for CH1:CL as the interface is antiparallel. On the right the residue interaction network. Residue subscripts indicate the percentage of occurrence of that residue at that position, if less than 100% in the dataset. Ig strands are color-coded according to the iCn3D IgStrand scheme (See Methods). Only residue-residue interactions present in 70% of the Fabs in each dataset are shown with solid lines. Dotted lines with a % number represent interactions that are present in at least 70% of the Fabs in the diverse antigen binding dataset, with the % number representing the % of Fabs in which this contact is present in the SARS-CoV-2 antigen binding dataset. Interaction lines between residues in VH:VL and CH1:CL domains are color-coded in red for symmetric contacts, green line for hydrogen bonds and purple for a noticeable conserved ionic interaction in CH1:CL; otherwise, grey indicates van der Waals interactions. The highly common interactions, present in at least 90% of the dataset, are indicated by a star. See Tables 2 and 3 for more details. The VH:VL and CH1:CL interfaces of a specific Fab (PDBid:7LM8) can be visualized and analyzed with the iCn3D link: https://www.ncbi.nlm.nih.gov/Structure/icn3d/share.html?uvawHK16NinhaZL68.
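The 70% and 90% cutoffs used in this figure amount to a frequency filter over per-structure contact sets. The sketch below illustrates that idea only, under stated assumptions: it is not the authors' pipeline, the function name and the toy contact data are hypothetical, and "present in 70% of the Fabs" is read as "present in at least 70%".

```python
from collections import Counter


def conserved_contacts(contacts_per_fab, cutoff=0.70):
    """Return {contact: fraction} for contacts seen in >= cutoff of Fabs.

    contacts_per_fab: iterable of collections of hashable contacts,
    one collection per Fab (e.g. pairs of IgStrand-numbered residues).
    """
    fabs = [set(contacts) for contacts in contacts_per_fab]
    if not fabs:
        return {}
    counts = Counter()
    for contacts in fabs:
        counts.update(contacts)  # each contact counted once per Fab
    total = len(fabs)
    return {c: n / total for c, n in counts.items() if n / total >= cutoff}


if __name__ == "__main__":
    # Toy data (hypothetical): contacts written as (VH position, VL position).
    dataset = [
        {("H:x", "L:x"), ("H:y", "L:y"), ("H:z", "L:z")},
        {("H:x", "L:x"), ("H:y", "L:y")},
        {("H:x", "L:x"), ("H:y", "L:y"), ("H:z", "L:z")},
        {("H:x", "L:x")},
    ]
    print(conserved_contacts(dataset, cutoff=0.70))  # x (4/4) and y (3/4)
    print(conserved_contacts(dataset, cutoff=0.90))  # x only
```

Running the same filter at 0.90 yields the subset marked with a star in the figure, in the sense that only contacts present in at least 90% of the structures survive.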
Table 2.
VH:VL interactions of Fabs binding SARS-CoV-2 spike protein (70% cutoff). Red numbers represent symmetric contacts. Bold contacts represent highly conserved hydrogen bonding contacts. Shaded cells represent five highly conserved contacts (90% cutoff) shared between SARS-CoV-2 antigen binding dataset and diverse antigen binding dataset (Table B in S1 Text).
Table 3.
CH1:CL interactions of Fabs binding SARS-CoV-2 spike protein (70% cutoff). Red numbers represent symmetric contacts. Bold contacts show highly conserved hydrogen bonding, and purple marks a highly conserved ionic contact. Shaded cells represent twelve highly conserved contacts (90% cutoff) shared between SARS-CoV-2 antigen binding dataset and diverse antigen binding dataset (Table C in S1 Text).
Fig 14.
Comparing IgV-IgV interfaces.
A-B) CD96 vs. TIGIT interacting with nectin-like protein-5 (necl-5) (PDBid: 6ARQ, 3UDW) using their N-terminal IgV domain, shown side by side with their residue interaction network, using IgStrand numbering. C) Different residue interaction network in CD96 and in TIGIT https://structure.ncbi.nlm.nih.gov/icn3d/share.html?HppETGxGj1yjQ7md7&t=6ARQ,3UDW. D-E) PD-L1 interacting with PD1 (PDBid 4ZQK) vs. PD-L1 interacting with a nanobody (PDBid 5JDS) (https://structure.ncbi.nlm.nih.gov/icn3d/share.html?ehLCiHmy953yyGMeA&t=4ZQK,5JDS) targeting the same epitope/surface on the PD-L1 GFCC’ sheet, using an elaborate FG (CDR3) loop from igs# 8553 to igs# 9547. F) PD1/Nivolumab (PDBid: 5WT9) interaction. Nivolumab binds to a PD1 epitope composed of the FG loop residues as well as an N-terminal loop that is considered outside of the PD1 domain in IgStrand numbering, hence the residues (in cyan) are not numbered and retain their PDB numbers. This points to the need to consider Ig domain extensions. The interface can be compared to the PD1/PD-L1 interface in D) https://structure.ncbi.nlm.nih.gov/icn3d/share.html?UhSpS48KgM7dsMUD8&t=4ZQK,5WT9.
Fig 15.
Multi Ig-domains Intrachain pseudo quaternary interfaces.
A-B) Tandem IgV domains in human CD226 (DNAM-1) (PDBid 6ISB). The two IgV domains in tandem with a long linker form an antiparallel interface involving a (tertiary) antiparallel strand zipper between IgV1 (strand A’) and IgV2 (strand C’) as well as a few residues in the G and C” strands, respectively https://structure.ncbi.nlm.nih.gov/icn3d/share.html?UM2jEBErouevRpGR8&t=6ISB. C-D) Tandem IgV domains of Vcbp3 (PDBid 2FBO). The variable region-containing chitin-binding protein-3 (VCBP) is an immune-type molecule found in amphioxus (Branchiostoma floridae). The two IgV domains in tandem form an antiparallel interface involving the GFCC’ sheets exhibiting C2 pseudosymmetry, partially resembling an inverted IgV-IgV quaternary interface https://structure.ncbi.nlm.nih.gov/icn3d/share.html?kyVRzncH2ZGzKbQL9&t=2FBO. E-F-G) N-terminal Ig-Horseshoe domain formed by 4 IgI domains in tandem, with a long linker between Ig2 and Ig3, forming an Ig2-Ig3 interface that partially uses the ABED sheets and an Ig1-Ig4 interface that uses in part the GFC sheet and the A strand. The 4 domains are highly superimposable (RMSD 4.20/TM-0.77 using TM-Align) when comparing structures of contactin-1 (PDBid 7OL4) and contactin-2 (PDBid 8A0Y) https://structure.ncbi.nlm.nih.gov/icn3d/share.html?iyZJT4PXqFaWnQhP8&t=7OL4,8A0Y.
Fig 16.
Heterophilic Ig-Horseshoe Zipper interface. A) Contactin-2 Homophilic dimer interface (PDBid 8A0Y). The Contactin-2 extracellular region contains 6 N-terminal Ig domains followed by 4 FN3 domains. The four N-terminal Ig-domains (Ig1-4) form the Ig-Horseshoe. The Ig2 domain forms a dimer through a G-strand antiparallel zipper interface. B) Contactin-2 Homophilic Horseshoe Zipper interactome showing the G-strand zipper (red) as well as residues in the F and C strands of the GFCC’ sheet. C) Contactin-1 - neurofascin-155 Heterophilic dimer interface (PDBid 7OL4). Contactin-1 is highly homologous to Contactin-2 and interacts with neurofascin-155. Both interact through their N-terminal Ig-Horseshoe substructure, also using the Ig2 domain forming a G-strand antiparallel zipper interface. The dimer complexes nonetheless show significant plasticity https://structure.ncbi.nlm.nih.gov/icn3d/share.html?ZnVaH3FLKjYL4Eod8&t=7OL4,8A0Y. D) Contactin-1 - neurofascin-155 Heterophilic Horseshoe Zipper interactome showing the G-strand zipper (red) as well as residues in the F, C and C’ strands of the GFCC’ sheet. E) Conserved positional interaction. The G-strand zipper interface is conserved in terms of its positional residue network. However, one should note that this does not mean that all the atom-level residue interactions are the same. For example, while the symmetric pair igs# 9547-9547 involves a conserved antiparallel backbone-backbone interaction in both the homophilic (S9547-S9547) and heterophilic (S9547-I9547) pairs, the symmetric pair T9545-F9549 in the homophilic dimer (PDBid 8A0Y) forms a side chain (T)-backbone (F) HBond, which is replaced in the heterophilic dimer (PDBid 7OL4) by a backbone-backbone HBond in the T9545-Q9549 pair, while the pseudo symmetric H9545-F9549 pair is reduced to a vdW interaction.
Fig 17.
A) https://structure.ncbi.nlm.nih.gov/icn3d/share.html?6swEjWBjXWhufUycA&t=3DMK. B) Ig2-Ig2, Ig3-Ig3 and Ig7-Ig7 homophilic interfaces using the AG strands, AA’ strands and ABED sheet, respectively. C) Ig2-Ig3 intrachain Horseshoe interface using the ABED Sheet. D) Ig5-Ig6 intrachain Horseshoe interface using the A’GFC Sheet in Ig6 vs. the A and B strands in Ig6.
Fig 18.
Ig Strand color spectrum used for ABCC’C”DEFG strands in Ig-domains.
A) Ig Strand extended rainbow color spectrum for 9 strands – 9 colors as indicated. B) Colored Ig strands according to the Ig Strand Rainbow Spectrum. Additional and split strands at the N-terminus (A’, A-, A+, …) use the same color as A: dark violet, and similarly additional strands at the C-terminus (G+, G++, …) use the same color as G: red. Other inserted strands in Ig-extended domains can appear in white or cyan. C) Ig Protodomain reduced rainbow color spectrum: 4 strand colors (blue, green, yellow, orange) for ABCC’ and DEFG, and a fifth color, red, for the C” strand, if present. Using this color scheme a sheet ABED will be blue green green blue; a sheet GFCC’ will be orange yellow yellow orange. In iCn3D the commands used are “color ig strand” and “color ig protodomain” respectively (lowercase). Loops between strands are in grey. Extensions are in cyan, and insertions in white. The hexadecimal RGB color codes used in iCn3D are indicated (in this paper, we use the pure yellow code (FFFF00) instead). Residue interaction color codes: interactomes use the following colors for interaction types. Green: H-Bonds; Cyan: Salt Bridge/Ionic; Grey: contacts (Van der Waals); Magenta: Halogen Bonds; Red: π-Cation; Blue: π-Stacking. Contacts are displayed as dotted lines in 3D and traced from Ca to Ca between residues, while other types of interactions are atom specific.
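The protodomain color scheme in panel C can be written as a position-based mapping: the same four colors are reused, in order, for ABCC’ and for DEFG. The sketch below only restates the caption and is not iCn3D code; the names are hypothetical and color names stand in for the hexadecimal codes, which are not reproduced here.

```python
# Colors by position within a protodomain, as described in the caption.
PROTODOMAIN_COLORS = ["blue", "green", "yellow", "orange"]

PROTODOMAIN_1 = ["A", "B", "C", "C'"]
PROTODOMAIN_2 = ["D", "E", "F", "G"]

STRAND_COLOR = {}
for protodomain in (PROTODOMAIN_1, PROTODOMAIN_2):
    for strand, color in zip(protodomain, PROTODOMAIN_COLORS):
        STRAND_COLOR[strand] = color
STRAND_COLOR["C''"] = "red"  # fifth color, used only if C'' is present


def sheet_colors(sheet):
    """Return the protodomain colors for a sheet given as strand names."""
    return [STRAND_COLOR[strand] for strand in sheet]


if __name__ == "__main__":
    print(sheet_colors(["A", "B", "E", "D"]))   # sheet ABED
    print(sheet_colors(["G", "F", "C", "C'"]))  # sheet GFCC'
```

With this mapping, sheet ABED reads blue, green, green, blue and sheet GFCC’ reads orange, yellow, yellow, orange, as stated in the caption.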