Fig 1.
Lectin-glycan interaction characterization and comparison.
Features for lectin-glycan interactions (A) are derived from Protein-Ligand Interaction Profiler (PLIP) defined interaction counts (B), voxelized representations of the 3D pocket space occupied by the glycan (C), and binding site residues binned by their minimum distance to the glycan (D). Two types of specificity analyses were conducted. For global specificity (E), binding interaction characteristics from each glycan of interest were compared to the background characteristics of all other lectin-glycan interactions, revealing features that were enriched or depleted in association with the presence of the given glycan relative to all other glycans. For fine specificity (F), characteristics were compared among interactions within a subgroup of similar glycans. In panels A-D, the binding interaction between human lung collectin surfactant protein D and a disaccharide fragment (Hep-Kdo) of a bacterial lipopolysaccharide is used to demonstrate the three categories of interaction features (PDB ID: 4E52). Panel C has additional components illustrating featurization of the voxel point cloud via features describing the D2 distribution of pairwise distances between surface points and computed 3D Zernike descriptors (3DZDs), with the original point cloud in red and the reconstructed shape from the 3DZDs in blue. Panels E & F display schematic results of select features defined in panels B-D that were found to be significantly enriched or depleted in the specified interactions. Structures were rendered using PyMOL and glycan symbols follow the Symbol Nomenclature for Glycans (SNFG) system.
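The D2 distribution mentioned for panel C can be illustrated with a minimal sketch, assuming the pocket surface is already available as a list of 3D points. The function names, the bin width, and the toy cube are hypothetical and are not taken from the authors' pipeline.

```python
import itertools
import math
import statistics


def d2_distances(points):
    """Return all pairwise Euclidean distances between 3D points."""
    return [math.dist(p, q) for p, q in itertools.combinations(points, 2)]


def d2_histogram(distances, bin_width=1.0):
    """Normalized histogram mapping bin index -> fraction of point pairs."""
    counts = {}
    for d in distances:
        b = int(d // bin_width)
        counts[b] = counts.get(b, 0) + 1
    total = len(distances)
    return {b: c / total for b, c in sorted(counts.items())}


# Toy stand-in for pocket surface points: the corners of a unit cube.
cube = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
dists = d2_distances(cube)
median_d2 = statistics.median(dists)        # one possible summary feature
hist = d2_histogram(dists, bin_width=0.5)   # shape of the distribution
```

Summary statistics of such a distribution (for example its median) scale with pocket size, which is why these features can track ligand size.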
Fig 2.
Lectin binding site features have significant associations with the presence of specific glycans.
Volcano plots show that a substantial proportion of features from all three categories are statistically significantly (q < 0.01) enriched (x > 0) or depleted (x < 0) in interaction characterizations for each of the 15 glycans of interest when compared to background interaction characterizations from all other glycans. Pocket-size-correlated D2 distribution and pocket descriptor features (represented by the two lightest blue colored points) are generally enriched for larger glycan ligands (terminal NeuAc, high mannose, 3’-sialyllactose) and depleted for interactions with smaller ligands (monosaccharide glycans). Some glycan-lectin interactions have fewer features that are strongly enriched (terminal fucose, N-acetyllactosamine, and TF antigen), possibly indicating a diversity of interaction mechanisms, or that more common, highly similar glycans in the background are reducing the strength of associations. Significance and direction of association were determined by weighted Wilcoxon-Mann-Whitney (WMW) tests accounting for homologous and redundant lectin structures. The x-axis shows the direction and strength of rank-based enrichment for each feature compared to background. The y-axis indicates the statistical significance (q-values) adjusted by the Benjamini-Hochberg procedure applied separately for each ligand with a significance threshold set for an FDR of 0.01 (represented by the solid horizontal lines). Q-values more significant than 1 × 10⁻¹⁶ (horizontal dotted line) were scattered between 3 × 10⁻¹⁹ and 1 × 10⁻¹⁶. The vertical line (x = 0) divides positive (right) and negative (left) associations. Glycan symbols follow the SNFG system.
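As an illustration of the Benjamini-Hochberg adjustment named above, the following is a minimal sketch of the standard procedure applied to one ligand's p-values. The function name and the example p-values are hypothetical and are not results from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical p-values for five features of a single ligand.
p = [0.001, 0.008, 0.039, 0.041, 0.6]
q = benjamini_hochberg(p)
significant = [qi < 0.01 for qi in q]  # FDR threshold of 0.01
```

Applying the correction separately per ligand, as the caption describes, would amount to calling such a function once per ligand rather than pooling p-values across all ligands.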
Fig 3.
Lectin binding site features can be used to predict the identity of bound glycans.
Random forest models trained for each of the 15 glycans have strong recall performance while predicting whether interactions contain the respective glycan based on the interaction features alone. The models are predictive of glycan identity even when trained only on lectins with less than 50% sequence identity, outperforming identical classifiers trained on data with shuffled labels. Split violin plots show the recall (left-hand distribution and left y-axis) and precision (right-hand distribution and right y-axis) of ligand-specific random forest models measured during leave-one-out cross-validation. The pairs of notched boxplots for each glycan show the performance of classifiers trained on data with shuffled labels, where again the left-hand boxplots depict recall and the right-hand boxplots depict precision. Glycan symbols follow the SNFG system.
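The recall and precision values summarized in the violin plots follow the usual definitions, sketched here for a single binary classifier. The labels and predictions are made-up toy values, and the function name is hypothetical.

```python
def recall_precision(y_true, y_pred):
    """Compute recall and precision for binary labels (1 = glycan present)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision


# Toy held-out labels and predictions for one hypothetical ligand model.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
recall, precision = recall_precision(y_true, y_pred)
```

A shuffled-label baseline would compute the same two metrics for a classifier trained after randomly permuting `y_true` in the training data.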
Fig 4.
Determinants of global lectin specificity are shared for similar glycans.
Similar glycans have similar patterns of enriched and depleted interaction features as observed by Pearson correlations between weighted WMW feature effect sizes. Panel A shows the correlogram from all 221 interaction feature effect sizes, clustered by Pearson correlation coefficient. Panels B-D show heatmaps of the interaction feature effect sizes with features in the columns and ligands in the rows clustered by Pearson correlation. Features that are statistically significant by the weighted WMW tests (q < 0.01) and in at least the 75th percentile of median feature-type-stratified importance from the random forest models are indicated with bullet points. The color bars along the columns indicate the subcategory of the feature and the parameter threshold used when extracting the feature. The color bars along the rows indicate the identity of the terminal saccharide in the glycan and the number of saccharides present. Clusters discussed include sialic acid glycans (purple boxes), mannose and glucose (cyan boxes), lactose and N-acetyllactosamine (orange boxes), and fucose and terminal fucose containing glycans (red boxes). Interestingly, N-acetylglucosamine interactions are more similar to interactions with galactose while N-acetylgalactosamine interactions are more similar to interactions with glucose. The dark green boxes indicate distinct patterns in the 3D pockets of interactions with high mannose. Glycan symbols follow the SNFG system.
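The comparison of glycans by Pearson correlation between effect-size vectors can be sketched as follows. The ligand names and effect-size values are invented for illustration (the real vectors would have 221 entries), and the function name is hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical effect sizes for four features across three ligands.
effects = {
    "ligand_a": [0.30, -0.10, 0.25, -0.40],
    "ligand_b": [0.28, -0.05, 0.20, -0.35],
    "ligand_c": [-0.20, 0.15, -0.30, 0.10],
}

# Pairwise correlation matrix of the kind a correlogram would display.
names = list(effects)
corr = {(a, b): pearson(effects[a], effects[b]) for a in names for b in names}
```

Hierarchical clustering of such a matrix would then place ligands with similar enrichment patterns next to each other.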
Fig 5.
Sialic acid recognizing lectin binding sites are much deeper and more concave than the fairly flat and shallow binding sites of lectins that bind high mannose.
Representative lectin interactions with a terminal NeuAc glycan (panel A, PDB ID: 1SID), NeuAc monosaccharide (panel B, PDB ID: 1HGH), and high mannose (panel C, PDB ID: 1CVN) demonstrate the differences in the 3D interaction site space between NeuAc-binding lectins and high-mannose-binding lectins. Panel D shows the D2 distributions summarizing pocket geometry for each of these representative interactions. The lectin binding sites containing sialic acid glycans are wider and more concave while the high-mannose-accepting binding sites are shallower and more compact, being nearly entirely defined by the lowest threshold used for pocket generation as seen in the inset subpanels in A-C and in the D2 distributions in panel D. In panels A-C, residues are colored by their binned distance from the glycan (red: bin 1, orange: bin 2, sand: bin 3, pale yellow: bin 4), the glycan is colored by atom type with carbons in white, and the rest of the lectin structure is in grey. PLIP interactions are colored blue for hydrogen bonds, pale blue for water bridges, yellow for electrostatic interactions, and grey for hydrophobic interactions. In the insets, 0.5 Å³ spheres were placed at each voxel center in the pocket and colored by the distance threshold used (magenta/red/orange/yellow: 4/6/8/10 Å). In panel D, vertical lines were placed at the median D2 measure from each threshold with the same coloring as used in the insets in panels A-C. All structures were rendered in PyMOL and glycan symbols follow the SNFG system.
Fig 6.
Focused analysis of influenza HA binding sites reveals significant and discriminative features associated with binding of human-like sialoglycans over avian-like sialoglycans.
Clustering HA interactions by significant interaction features discriminates those recognizing 3’ vs. 6’ NeuAc terminal glycans, while clustering by interaction sequence simply recapitulates influenza strain. Panel A shows that clustering the 96 HA-3’/6’ αNeuAc interactions by correlations among the 35 significantly associated features groups interactions by ligand type much more cleanly (upper-right-triangular similarity matrix) than clustering by alignment of binding site residue sequences, which instead groups interactions perfectly by influenza strain and HA subtype (lower-left-triangular similarity matrix). Comparisons between hemagglutinin structures with 6’ αNeuAc-terminal glycans versus 3’ αNeuAc-terminal glycans reveal 35 features that are significantly associated with the presence of 6’ αNeuAc-terminal glycans (panel B), displayed in the same manner as in Fig 2 with points discussed directly in the text bolded and underlined. These features are found in representative interaction structures between the respective glycans and HA proteins in panels C & D. In panel A, the upper-right-triangular matrix was constructed by calculating the pairwise Pearson correlations for all interactions using the scaled values of the 35 significant interaction features. The lower-left-triangular similarity matrix was constructed from sequence similarity scores using Needleman-Wunsch to align binding site sequences with the BLOSUM62 substitution matrix. In panel B, significance and effect size were determined by a Wilcoxon-Mann-Whitney test weighted by influenza strain/hemagglutinin subtype and UniProt ID, with a significance threshold of q < 0.01 (solid horizontal white line) by the Benjamini-Hochberg procedure. Panel C shows HA from H1N1 (Puerto Rico/8/1934) (dual specificity) complexed with an avian sialopentasaccharide, although only the three terminal sugars were resolved (PDB ID: 1RVX).
Panel D shows HA from H1N1 (California/4/09) in complex with a human sialopentasaccharide (PDB ID: 3UBE). Both panels C and D use the same color scheme for lectins, PLIP interactions, and glycans as in Fig 5.
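The Needleman-Wunsch global alignment score used for the sequence similarity matrix in panel A can be sketched as below. This is a simplified illustration: it swaps in a toy match/mismatch scoring function in place of BLOSUM62, and the linear gap penalty, function names, and example residue strings are all assumptions rather than the authors' settings.

```python
def needleman_wunsch_score(a, b, score, gap=-4):
    """Global alignment score of sequences a and b with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = i * gap          # leading gaps in b
    for j in range(1, cols):
        dp[0][j] = j * gap          # leading gaps in a
    for i in range(1, rows):
        for j in range(1, cols):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # align residues
                dp[i - 1][j] + gap,                            # gap in b
                dp[i][j - 1] + gap,                            # gap in a
            )
    return dp[-1][-1]


def toy_score(x, y):
    """Toy substitution score standing in for a BLOSUM62 lookup."""
    return 2 if x == y else -1


# Hypothetical binding site residue strings.
identical = needleman_wunsch_score("YWTLH", "YWTLH", toy_score)
one_gap = needleman_wunsch_score("YWTLH", "YWLH", toy_score)
```

Passing a function that looks up BLOSUM62 values as `score` would move this sketch closer to the setup described in the caption.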