
Human CD33 deficiency is associated with mild alteration of circulating white blood cell counts

Abstract

The single pass transmembrane protein CD33 is enriched in phagocytic and hematopoietic cell types, such as monocytes. CD33 is thought to be associated with immune cell function, susceptibility to Alzheimer’s disease, and rare leukemias. Antagonism or genetic ablation of CD33 has been proposed to treat Alzheimer’s disease, hematological cancers, and as a selection mechanism for enriching genetically altered blood cells. To understand the impact of chronic CD33 loss or ablation, we describe individuals who we confirmed to be missing CD33 due to germline loss of function variants. Through PheWAS-based approaches using existing whole exome biobanks and bespoke phenotyping using recall-by-genotype (RBG) studies, we show that CD33 loss of function alters circulating white blood cell counts and distributions, albeit mildly and with no overt clinical pathology. These findings indicate that chronic CD33 antagonism/ablation is likely to be safe in humans.

Author summary

CD33 is expressed on certain cell types in the human immune system. Its exact role in human health is not clear but it may be linked to Alzheimer’s disease and leukemias. Therapies are being designed to reduce or eliminate CD33 function to treat disease in humans, but the consequences of long term loss of CD33 in humans are uncertain. To help derisk future CD33 lowering therapies, we identified individuals who have been missing either one or two copies of functional CD33 since birth. Loss of CD33 was associated with mild changes in blood cell counts but otherwise no obvious clinical presentations. These findings suggest, but do not conclusively prove, that elimination of CD33 does not lead to severe health issues or chronic infection.

Introduction

CD33 molecule (CD33) is a Type 1 single pass transmembrane protein that belongs to the Siglec (Sialic acid-binding immunoglobulin-type lectins) protein family. This class of protein is broadly characterized by the presence of an N-terminal V-type immunoglobulin domain and the ability to bind sialylated glycan ligands. CD33 may preferentially bind α-2,6-linked sialic acid glycans [1].

CD33 expression is enriched in multiple phagocytic and hematopoietic cell types of mammals, including macrophages, monocytes, microglia, dendritic cells, hematopoietic progenitors, some lymphoid cell types, and myelomonocytic precursor types [1,2]. This enrichment of expression suggests a role in immune system function, consistent with the broader function of Siglec family members in regulating inhibition of immune cell activity. Siglecs possess a cytoplasmic immunoreceptor tyrosine-based inhibition motif (ITIM). Upon ligand binding by the Siglec, the ITIM is phosphorylated and recruits SH2-domain proteins that reduce immune cell activation. For example, Siglec protein CD22 negatively regulates B cell receptor activity, as demonstrated by B cell hyperactivation after challenge with a T cell dependent antigen in Cd22 knockout mice [3].

CD33’s exact function in humans remains to be ascertained. Gleaning insights into CD33 function from mouse has been confounded by the significant protein sequence divergence from human (~50% sequence identity) and the absence, in mouse, of a competent ITIM sequence. Cd33 knockout mice do not show any immune phenotype [4], whereas in vitro studies with human CD33 suggest a role for CD33 in inhibiting immune cell activity. Human monocytes treated with either a CD33 antibody or a CD33 siRNA exhibited elevated cytokine production [5]. Genome- and exome-wide association studies have shown non-coding variants in the CD33 locus to be associated with changes in white blood cell counts, also suggesting that this protein plays a role in immune cell regulation [6–8]. Multiple genome-wide association studies (GWAS) have identified single nucleotide polymorphisms (SNPs) in CD33 that are associated with a change in the risk for Alzheimer’s disease or late-onset dementia [6–9].

Identification and characterization of individuals with biallelic germline loss of CD33 would further the understanding of CD33’s pathophysiological function in humans. To date, only isolated singleton individuals with very low or absent CD33 expression and hematological cancer have been reported, but the small number of cases precludes any generalization of potential biology [10]. Public human genetic repositories, such as the Genome Aggregation Database (gnomAD version 4.1), indicate the existence of germline homozygous carriers of the predicted loss of function (pLOF) variant NP_001763.3:p.G156TfsTer5, present at an allele frequency of ~2.4% in global populations (see Fig 1A for CD33 locus schematic). Although predicted LOF variant carriers exist, there are no reports experimentally confirming these variants to be LOF or characterizing their direct phenotypic impact in humans. Understanding the phenotypic impact of CD33 LOF may inform ongoing industry efforts to antagonize or genetically delete CD33 for Alzheimer’s disease [11] as well as various cell therapies for the treatment of hematological cancers in which CD33 is either genetically deleted as a mechanism to improve therapeutic cell selection and engraftment or CD33 itself is an effector target of the cell therapy [12–15].

Fig 1. In vitro characterization of CD33 variants.

(A) A schematic of the human CD33 gene locus. Exons are overlaid with the protein domains that they encode. Relative positions of the two most abundant frameshift variants from PGR are indicated in red. (B) Synthetic CD33 gene constructs for cell surface expression assays. The relative locations of binding epitopes for anti-CD33 monoclonal antibodies WM53 and HIM3-4 are highlighted. The full-length CD33M polypeptide consisted of the extracellular region, containing Ig-V and Ig-C2 domains, the transmembrane domain (TM), and intracellular domain (ICD). The ICD contains two immunoreceptor tyrosine-based inhibitory motifs (ITIM-1 and ITIM-2). (C) FACS assessment of CD33 variant expression. CD33M is highly expressed on the surface of transfected 293F cells but TM-free truncation variants G156TfsTer5 and P238QfsTer38 are nominally expressed, as assessed by (i) cells stained with Ig-V domain-targeting monoclonal antibody WM53-PE or (ii) Ig-C2 domain-targeting monoclonal antibody HIM3-4 (PE-Cy5). N = 3 independent assays. (D) Immunoblot assays for the expression of CD33M and truncation mutants. Medium and lysate samples were derived from 293T cultures 72 hours post transfection. Protein expression was probed with rabbit anti-human CD33 Ab134115 in combination with goat anti-rabbit IgG Dylight 680. CD33M medium showed no secreted extracellular domain polypeptide fragment. Secreted G156TfsTer5 migrated at a higher molecular weight than predicted, presumably due to glycosylation; P238QfsTer38 migrated closer to its expected molecular weight. CD33M was expressed in the cell lysate at levels higher than the truncation variants. Again, cell-associated G156TfsTer5 migrated at a higher-than-expected molecular weight. P238QfsTer38 exhibited evidence of degradation.

https://doi.org/10.1371/journal.pgen.1011600.g001

To understand CD33 function in humans, we employed the Pakistan Genomic Resource (PGR), which is the world’s largest biobank of human homozygous pLOF carriers (knockouts) identified through whole-exome sequencing of >80,000 participants [16]. With the aid of PGR and its associated infrastructure, we identified homozygous carriers of CD33 pLOF variants, confirmed experimentally in vitro that these variants are loss of function, and evaluated whether CD33 LOF is associated with any overt clinical phenotypes. We complemented these efforts with an analysis of CD33 pLOF variants in UK Biobank (UKBB) and their effect on blood cell counts and other phenotypes.

Results

Identification of CD33 frameshift variants in PGR and their biochemical characterization

A survey of the PGR (see Methods) identified 19 pLOF variants that included protein truncating variants, frameshift variants, and splice disruptors (Table 1). Among these variants, the two most common were NP_001763.3:p.G156TfsTer5 and NP_001763.3:p.P238QfsTer38, with alternate allele frequencies of 0.12% and 0.039%, respectively. These two variants were selected for further in vitro characterization to evaluate the impact of each frameshift on expression. Both variants are predicted to terminate after the IgV domain but before the transmembrane domain of the canonical reference CD33 protein sequence, so there is a theoretical possibility that they could result in soluble, secreted proteins (Fig 1A).

Table 1. List of high confidence pLOF variants in PGR.

https://doi.org/10.1371/journal.pgen.1011600.t001

Mammalian expression constructs corresponding to the predicted cDNAs of the reference allele (CD33M) and frameshift variant-containing human CD33 were generated to test their levels of surface expression and potential secretion (Fig 1B). Transient transfection of HEK293F cells with reference CD33 resulted in significant levels of surface protein expression by FACS when using an IgV domain-targeting antibody, WM53, or an Ig-C2 domain-targeting antibody, HIM3-4 (Fig 1C). By contrast, neither p.G156TfsTer5 nor p.P238QfsTer38 showed significant levels of surface expression when evaluated under the same conditions. Western blots of transiently transfected HEK293F cell lysates and medium were conducted to ascertain total expression of the variants from cDNA cassettes as well as their potential secretion into the medium. Probing of total cell lysates demonstrated expression of reference CD33M as well as the frameshift variants, indicating that cells could support generation of the frameshift variants (Fig 1D). Evaluation of CD33 signal in the medium revealed detectable levels of secreted frameshift variants but not of the CD33M reference allele. In aggregate, these data indicated that frameshift variant proteins were efficiently produced by cells and that, if the frameshift variant transcripts were able to escape nonsense-mediated mRNA decay in human carriers, a secreted protein could be generated. Measurement of CD33 protein levels in both sera and cells of human variant carriers was therefore necessary to resolve whether such a secreted protein exists in vivo.

Recall-by-genotype of CD33 variant carriers from the Pakistan Genomic Resource

The population of Pakistan has a high rate of consanguinity, which can result in homozygous enrichment of otherwise rare variants, including pLOF variants [16]. The Pakistan Genomic Resource (PGR) at the Center for Non-Communicable Diseases in Karachi, Pakistan contains a large biobank of individuals (> 250,000) who have consented to participate in sequencing and recontact studies. With the aid of the whole exome sequencing data from the PGR, carriers of the two most common frameshift variants, p.G156TfsTer5 and p.P238QfsTer38, as well as a less common frameshift variant, p.Q213RfsTer2, were recontacted in a recall-by-genotype (RBG) study. These individuals, along with recruited consenting family members, were (1) genotyped at CD33 by Sanger sequencing to confirm carrier status and zygosity, and (2) phenotypically characterized for a wide array of traits including anthropometry, blood pressure, a focused lipid panel, complete blood counts, and personal and family medical histories.
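To illustrate why consanguinity enriches homozygotes for rare alleles, a minimal sketch follows. It uses the standard population-genetics expression for the expected homozygote frequency under inbreeding, q² + Fq(1 − q). The allele frequency is the 0.12% reported for p.G156TfsTer5 in PGR; the inbreeding coefficient F = 1/16 (offspring of first cousins) is an illustrative assumption and is not a value reported in this study.

```python
def homozygote_frequency(q: float, f: float = 0.0) -> float:
    """Expected frequency of homozygotes for an allele of frequency q,
    given inbreeding coefficient f (f = 0 reduces to Hardy-Weinberg q**2)."""
    if not (0.0 <= q <= 1.0 and 0.0 <= f <= 1.0):
        raise ValueError("q and f must lie in [0, 1]")
    return q * q + f * q * (1.0 - q)


q = 0.0012          # PGR allele frequency of p.G156TfsTer5 (0.12%)
f_assumed = 1 / 16  # ASSUMPTION: offspring of first cousins; not from the study

random_mating = homozygote_frequency(q)
inbred = homozygote_frequency(q, f_assumed)
enrichment = inbred / random_mating

print(f"random mating: {random_mating:.2e}")
print(f"F = 1/16:      {inbred:.2e}")
print(f"enrichment:    {enrichment:.0f}-fold")
```

Under these assumptions the expected homozygote frequency rises by roughly 50-fold relative to random mating, which is the reason a consanguineous cohort is well suited to finding homozygous carriers of rare pLOF variants.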

Through RBG of 14 probands, 114 individuals were identified across 14 different families: 11 families for the variant p.G156TfsTer5, 2 families for the variant p.P238QfsTer38, and 1 family for the variant p.Q213RfsTer2. A breakdown of the variant carrier numbers, as ascertained by Sanger sequencing, is shown in Table 2. Importantly, the recall studies successfully identified 3 homozygous p.G156TfsTer5 variant carriers: 2 brothers, ages 55 (participant ID 1A) and 58 (participant ID 2A), in family A (Fig 2) and a 35 year-old male (participant ID 2B) in family B (Fig 3).

Table 2. A breakdown of CD33 frameshift variant carriers identified through recall by genotype studies.

https://doi.org/10.1371/journal.pgen.1011600.t002

Fig 2. Pedigree of homozygous p.G156TfsTer5 variant carrier family A.

https://doi.org/10.1371/journal.pgen.1011600.g002

Fig 3. Pedigree of homozygous p.G156TfsTer5 variant carrier family B.

https://doi.org/10.1371/journal.pgen.1011600.g003

To ascertain whether the frameshift variants resulted in any secreted CD33 isoforms in the circulation of variant carriers, two sandwich-format MesoScaleDiscovery (MSD) assays for the CD33 ectodomain were established. In the first assay, monoclonal antibody hSGL3, which recognizes the common IgV domain of all isoforms, was used to capture CD33 protein from sera, followed by detection with a monoclonal antibody, 3D6, that binds to the Ig-C2 domain (Fig 4A and 4B). Soluble CD33 protein was detectable in the sera of homozygous reference carriers (Fig 4C, red bar), and levels of free CD33 protein dropped off in a significant, gene dosage-dependent manner for the frameshift variant carriers (Fig 4C, blue and green bars). Homozygous variant carriers had non-detectable levels of protein in circulation. This general pattern was seen across all of the frameshift variants and in all families (Fig 4D). During assay development, it was observed that capture and detection of recombinant purified G156TfsTer5 with the hSGL3/3D6 antibody pair was poor (Fig 5A), perhaps due to reduced ability of this variant to bind to 3D6. As a consequence, a second assay was developed that used rabbit polyclonal antibody rT16-PA to capture CD33 protein and a murine version of hSGL3 to detect G156TfsTer5 (Fig 5B and 5C). hSGL3 was reformatted to a murine isotype to reduce background. A subset of recall-by-genotype sera samples, including all 3 homozygous G156TfsTer5 samples, was tested in this assay (Fig 5D). Total binding signal in this assay format decreased in a gene dosage-dependent manner, with homozygous G156TfsTer5 carriers showing no detectable levels of CD33 protein in sera even at the lowest sample dilution (1:10). These data indicated that, in contrast to the secreted proteins generated by the frameshift variants in vitro, there was no secreted protein produced in vivo, possibly due to nonsense-mediated decay of the variant transcripts.

Fig 4. Summary of participant serum CD33 concentration assays.

(A) Recombinant CD33 protein constructs used to evaluate CD33 detection assays. hSGL3, an anti-CD33 human IgG1, binds to an epitope located at the N-terminus of the Ig-V domain. 3D6, a mouse anti-human CD33 mAb, binds to the Ig-C2 region. (B) Schematic of quantitative binding assay for serum CD33 levels. (C) CD33 serum concentrations across all frameshift variant genotypes: 0 = noncarrier, 1 = heterozygous variant carrier, 2 = homozygous variant carrier. Differences in CD33 serum levels among the three genotype groups were statistically significant (P < 1E-4). (D) CD33 levels for each frameshift variant stratified by genotype and family ID.

https://doi.org/10.1371/journal.pgen.1011600.g004

Fig 5. Independent binding assay confirms that circulating levels of CD33 protein in G156TfsTer5 homozygous carriers are below detection level.

(A) G156TfsTer5His, which is missing most of the Ig-C2 domain that is recognized by 3D6 antibody, is poorly detected in the ELISA format shown in Fig 4B. (B) An ELISA assay designed with rabbit anti-human CD33 polyclonal antibody rT16-PA (Cat# 12238-T16, SinoBiological; immunogen = M1-H259 of CD33) to capture CD33 polypeptides followed by detection with hSGL3 antibody can directly detect G156TfsTer5. Signals from purified, recombinant CD33MECDHis and G156TfsTer5His tag constructs are shown. (C) Reformatting of hSGL3 to a murine constant region (mSGL3) preserves detection of G156TfsTer5 protein. (D) Measurement of G156TfsTer5 levels in select participant sera samples from the recall-by-genotype study. Homozygous reference: “0” genotype; heterozygous carrier: “1” genotype; homozygous G156TfsTer5 carrier: “2” genotype.

https://doi.org/10.1371/journal.pgen.1011600.g005

To confirm that CD33 expression was affected at the surface of relevant immune cell types, peripheral blood mononuclear cells (PBMCs) were isolated from participants in families A and B, which contained heterozygous and homozygous carriers of the p.G156TfsTer5 variant as well as homozygous reference allele carriers (see Figs 2 and 3 for pedigrees comprising these individuals along with sample Sanger sequencing results). Cell surface expression of CD33 protein was evaluated in the PBMCs by FACS using a monoclonal antibody (clone P67.6) that recognizes the common IgV domain of all CD33 isoforms [17]. Surface expression of CD33 on CD45+ cells was clearly detectable in homozygous reference carriers (Genotype 0, Fig 6A). This expression was ablated in PBMCs of homozygous p.G156TfsTer5 variant carriers (Genotype 2, Fig 6A). Heterozygous variant carriers (Genotype 1) had intermediate percentages of CD33+ cells that ranged from those found in homozygous reference carriers to complete loss of expression (Fig 6A). Quantification of surface CD33 expression by mean fluorescence intensity (MFI) showed a significant gene dosage-dependent decrease in expression (Fig 6B).

Fig 6. FACS-based evaluation of CD33 levels and multi-lineages from PBMCs in families A and B.

(A) CD33+ myeloid cells as a percentage of total CD45+ cells. (B) Comparison of levels of surface CD33 expression in p.G156TfsTer5 carrier families. Histograms of CD33-APC for individual participants are shown on the left and statistical analysis of mean fluorescence intensity (MFI) is shown on the right. Histograms are colored by genotype. Unstained control cells are shown in grey. (C) CD14+ monocytes as a percentage of total CD45+ cells. (D) CD3+ T cells as a percentage of total CD45+ cells. (E) CD19+ B cells as a percentage of total CD45+ cells. *P < 5E-2, **P < 1E-2 compared between indicated groups. P values were determined by one-way analysis of variance (ANOVA).

https://doi.org/10.1371/journal.pgen.1011600.g006

Although myeloid cells of homozygous variant carriers showed no detectable expression of CD33, the monocytes in the PBMCs of these carriers still expressed CD14 (Genotype 2, Fig 6C). Moreover, compared to homozygous reference and heterozygous carriers, homozygous variant carriers displayed a slightly but significantly higher frequency of CD14+ monocytes (Fig 6C). Correspondingly, the percentage of peripheral CD3+ T cells was slightly lower in homozygous variant carriers (Genotype 2, Fig 6D). By contrast, CD19+ B cells as a percentage of total CD45+ hematopoietic cells in circulation were equivalent across all family members regardless of carrier status (Fig 6E). Although the number of participants for whom PBMCs could be profiled was limited, the data show that loss of CD33 resulted in a subtle yet significant change in the immune cell compartment.

To determine whether CD33 LOF resulted in any other phenotypic changes across all participants in the RBG, 47 phenotypes were analyzed for potential associations with CD33 LOF carrier status (Table 3). No phenotypes were found to be significantly associated with CD33 LOF after multiple testing correction (corrected significance threshold, P < 1.1E-3). This includes complete blood cell count with differentials, suggesting that the loss of CD33 did not produce a large global perturbation in immune cell distributions, consistent with the findings from the PBMC analyses of a subset of variant carriers. Several nominal and weak associations were found: higher waist-to-hip ratio (P = 4.6E-2), higher cholesterol (P = 1.3E-2), higher triglycerides (P = 3.4E-2), higher LDL cholesterol (LDL-C) (P = 1.3E-2), and higher rates of myocardial infarction (MI, P = 1E-2) and cataracts (P = 4.4E-2). The observation of higher incidence of MI may be driven by ascertainment bias since a subset of CD33 LOF carrier probands were recruited from an MI case cohort within PGR. The identification of three adult homozygous LOF males who have all had children also indicates that complete LOF of CD33 does not result in male sterility.
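The quoted threshold of 1.1E-3 is what a Bonferroni correction would give for 47 tests at a family-wise α of 0.05. The text does not name the correction used for the RBG analysis, so both the method and the α in the sketch below are assumptions; the P values are the nominal associations listed above.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


ALPHA = 0.05       # ASSUMPTION: family-wise error rate, not stated in the text
N_PHENOTYPES = 47  # number of RBG phenotypes tested (from the text)

threshold = bonferroni_threshold(ALPHA, N_PHENOTYPES)

# Nominal associations reported for the RBG participants
nominal = {
    "waist-to-hip ratio": 4.6e-2,
    "cholesterol": 1.3e-2,
    "triglycerides": 3.4e-2,
    "LDL-C": 1.3e-2,
    "myocardial infarction": 1e-2,
    "cataracts": 4.4e-2,
}
survivors = [name for name, p in nominal.items() if p < threshold]

print(f"threshold = {threshold:.1e}")
print("pass correction:", survivors)
```

None of the listed P values falls below the computed threshold, which matches the statement that no phenotype remained significant after correction.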

Table 3. Distribution of baseline characteristics of recall-by-genotype participants per genotype.

https://doi.org/10.1371/journal.pgen.1011600.t003

CD33 PheWAS

To determine whether loss of CD33 was associated with any of the phenotypes assessed as part of the broader PGR cohort, a burden phenome-wide association study (PheWAS) was performed on quantitative (Table 4) and binary traits (Table 5). No associations were observed that were significant after correcting for multiple testing. In the additive model, CD33 loss of function was only nominally associated with lower height (β = -0.2, P = 9.4E-4), lower parathyroid hormone levels (β = -0.48, P = 8E-3), and higher BMI (β = 0.17, P = 1.34E-2). In the recessive model, a nominal association with increasing glucose was observed (β = 0.88, P = 4.5E-2).

Table 4. Results of a PGR burden PheWAS of quantitative traits using pooled high confidence pLOF variants.

https://doi.org/10.1371/journal.pgen.1011600.t004

Table 5. Results of a PGR burden PheWAS of binary traits using pooled high confidence pLOF variants.

https://doi.org/10.1371/journal.pgen.1011600.t005

Given the modest blood cell phenotype observed from the PBMCs collected from the PGR recall study along with previous publications indicating that CD33 may play a role in susceptibility to Alzheimer’s disease and isolated leukemias, a subset of 31 continuous phenotypes and 13 binary outcomes related to hematological clinical endpoints as well as neurological disease were evaluated for association with CD33 LOF in the large whole exome dataset available through UK Biobank. The full list of 27 pLOF variants surveyed as part of the UKBB PheWAS is described in Table 6.

Table 6. List of high-confidence pLOF variants identified in UK Biobank used for PheWAS.

https://doi.org/10.1371/journal.pgen.1011600.t006

Among the quantitative hematological traits queried, plateletcrit (β = -0.027, P = 4.0E-6), leukocyte counts (β = -0.027, P = 2.0E-5), and lymphocyte counts (β = -0.024, P = 1.0E-4) showed small but significantly lower levels in CD33 pLOF carriers after Bonferroni multiple testing correction (Table 7; additive model). Nominally significantly lower counts of platelets (β = -0.018, P = 2.1E-3), monocytes (β = -0.018, P = 2.5E-3), and neutrophils (β = -0.019, P = 2.7E-3) as well as lower platelet/thrombocyte volume (β = -0.012, P = 2.9E-2) were also observed. In contrast, reticulocyte percentage (β = 0.025, P = 6.3E-5), total reticulocyte count (β = 0.023, P = 1.8E-4), and high light scatter reticulocyte percentage (β = 0.023, P = 3.1E-4) were significantly higher among pLOF carriers.

Table 7. Summary of PheWAS findings for high confidence pLOF variants from the UK Biobank.

https://doi.org/10.1371/journal.pgen.1011600.t007

No binary outcomes in UKBB were found to be statistically significant after correction for multiple testing (Table 7). Nominally significant associations found were with a lower vascular dementia risk (OR = 0.80, P = 4E-2; additive model) and a higher rate of cataracts (OR = 1.39, P = 2.5; recessive model). No significant reduction in Alzheimer’s disease risk was observed. There was also no significant association observed between CD33 LOF and myeloid or monocytic leukemias. For these latter phenotypes, the low total number of cases in UKBB may limit the discovery power to detect a signal.

Discussion

To our knowledge, no systematic evaluation has been reported that ascertains whether loss of CD33 is a modulator of disease risk in humans. Rare individuals with very low or absent CD33 expression and concurrent pathology have been reported. One report describes an infant with low levels of surface CD33 expression, copper deficiency, and myelodysplastic syndrome [10]. Because no DNA sequencing was reported, it is unclear whether the pathologies were directly related to CD33, and whether the loss of CD33 expression was due to the dysmyelopoiesis present in myelodysplastic syndrome or to germline CD33 variants carried by the infant. A second report describes a 68-year-old woman with acute myeloid leukemia and no detectable levels of surface CD33 protein [18]. DNA analysis of a bone marrow aspirate of the patient with active disease, and prior to a bone marrow transplant, revealed a homozygous CD33 deletion (19:51225847:CCCGG:C) that would yield the G156TfsTer5 variant. Without sequencing of a separate non-transformed tissue DNA source, however, it is unclear whether this deletion was a germline variant or a variant present only in leukemic cells. To better understand the role of CD33 in human health and disease, we systematically evaluated the impact of germline CD33 LOF variants in humans through a combination of biochemical characterization of LOF variants, recall of confirmed LOF variant carriers for in-depth phenotyping including profiling of PBMCs, and PheWAS of two large whole exome biobanks (PGR and UKBB).

Within the Pakistani population, CD33 LOF alleles were rare (0.4% allele frequency) and were not associated with any overt disease states in the PGR biobank after multiple testing corrections. RBG studies additionally showed no obvious enrichment or depletion of disease in families with heterozygous and homozygous LOF variant carriers. Complete blood cell counts using an automated hematology analyzer also showed no significant changes in circulating cells. FACS-based analysis of PBMCs from consenting participants, however, did show an increase in CD14+ monocytes and a decrease in CD3+ T cells in homozygous LOF carriers. It is unclear if the differences observed between the hematology analyzer and FACS analysis are methodologically based (the hematology analyzer does not use the more precise FACS-based cell surface markers for cell population identification) or due to the smaller number of samples from the FACS study. Despite these changed counts, individuals did not report any notable history of recurrent infectious disease. Additionally, the existence of homozygous LOF males in these recall studies who have successfully had children indicates that CD33 is dispensable for male fertility and that fetal-derived CD33 is not essential for in utero development in males.

In the largely non-Finnish European ancestry of UKBB, CD33 pLOF variants occurred at a much higher overall frequency (3% allele frequency) than in Pakistan (0.4% allele frequency). Whether selective forces are behind the nearly 10-fold higher enrichment of CD33 pLOF variants in non-Finnish Europeans compared to South Asians may be worth investigating. Consistent with the data from the PGR dataset and recall studies, CD33 LOF was not associated with any overt disease states, including leukemias, in UKBB. Additionally, as in the Pakistani cohort, small yet significant changes in circulating cell compartment profiles were observed, including reductions in white blood cell counts. The magnitude of these changes is very small. The β = -0.027 for changes in white blood cells in UKBB, for instance, translates to an absolute cell count change of -5.8E4 cells/ml, which is ~0.8% of the mean total WBC count of 6.9E6 cells/ml.
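The arithmetic behind the ~0.8% figure can be checked in a few lines. The sketch below uses only the three numbers quoted above. It additionally back-calculates the standard deviation that the conversion implies, on the assumption that β is expressed in standard deviation units of the rank-transformed trait; that standard deviation is a derived quantity and is not reported in the text.

```python
beta_sd = -0.027             # per-allele effect on WBC count (from the text)
delta_cells_per_ml = -5.8e4  # absolute change quoted in the text
mean_wbc_per_ml = 6.9e6      # mean total WBC count quoted in the text

# Relative size of the change compared with the cohort mean
percent_change = 100 * delta_cells_per_ml / mean_wbc_per_ml

# ASSUMPTION: beta is in SD units, so delta = beta * SD.
# The SD below is back-calculated and is NOT a reported value.
implied_sd = delta_cells_per_ml / beta_sd

print(f"relative change: {percent_change:.2f}% of mean WBC")
print(f"implied SD:      {implied_sd:.2e} cells/ml")
```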

Our results also show the presence of shed CD33 in otherwise healthy individuals. Confirmed LOF variant carriers had significantly lower levels of shed protein and of CD33 surface expression on PBMCs. These results suggest that circulating shed protein may be a sensitive proxy marker for in situ protein expression in the absence of access to primary tissue [19].

The absence of notable pathology in both confirmed CD33 LOF and pLOF carriers from PGR and UKBB suggests that CD33 is dispensable for normal human health. A possible limitation of this study, however, is that we analyzed biobanks that did not purposefully recruit a large number of individuals with confirmed immune disease. This could result in an ascertainment bias that would leave the study underpowered to detect a pathophysiological role of CD33 on a background of pre-existing immune dysfunction. Future studies with appropriately constructed case-control designs will be needed to address this issue.

Overall, we observed that CD33 LOF produces small, but significant, perturbations in circulating cells within the myeloid cell lineage. These small changes, however, do not appear to result in any adverse phenotypes. This finding is consistent with the phenotype of the Cd33 knockout mouse, which also shows no serious pathology. Importantly, the absence of a remarkable phenotype associated with life-long CD33 LOF strongly suggests but does not conclusively prove that chronic inhibition/ablation of CD33 as a therapeutic intervention is likely to be safe in humans.

Methods

Ethics statement

The Institutional Review Board (IRB) at the Center for Non-Communicable Diseases (IRB: 00007048, IORG0005843, FWAS00014490) approved the study. All participants gave written informed consent.

In vitro expression studies

Gene constructs for CD33 wild-type and truncation variants p.G156TfsTer5 and p.P238QfsTer38.

A synthetic gene construct for full-length, reference allele human CD33, also called CD33M (RefSeq: NP_001763.3), was codon-optimized for expression in human cells. From the CD33M construct we derived expression vectors for p.G156TfsTer5 (G156TfsTer5) and p.P238QfsTer38 (P238QfsTer38). CD33 gene constructs were synthesized at GeneWiz in the pRS5a expression vector.

Cell culture.

Human embryonic kidney HEK293F (293F) cell lines were maintained in suspension culture with FreeStyle 293 Expression Medium (FS293EM) from Invitrogen in a shaking incubator at 37 °C in the presence of 8% CO2 and 80% humidity at 110 revolutions per minute. Cultures with a viability of >96% were utilized for transfection assays. Transfections were conducted in 100 ml HEK293F cells with 100 µg plasmid DNA and 250 µg PEI (Polyethylenimine 25K) from Polysciences (Warrington, PA). Cultures were harvested after 48 h for analyses.

Antibody staining and fluorescence-activated cell sorting (FACS) of transfected HEK293F cells.

To determine cell surface expression of CD33 variants in transfected cells, fluorescence-conjugated anti-CD33 mouse monoclonal antibodies WM53-PE (ab233577) and HIM3-4 (PE-Cy5) (ab169823) from Abcam were used. WM53-PE and HIM3-4 (PE-Cy5) bind to the Ig-V and Ig-C2 domains, respectively (Fig 1B). Cells were harvested 48 h post transfection, washed in DPBS, and resuspended in cold FACS buffer (DPBS supplemented with 1% fetal calf serum and 1% goat serum) at 2.5 million cells/ml. For mouse antibody staining, cell suspensions were blocked for 30 min with 1:10 diluted mouse serum prior to staining for 40 min with a predetermined amount of WM53-PE or HIM3-4 (PE-Cy5). Afterwards, the stained cells were washed in FACS buffer and loaded into 96-well plates for analysis on a Fortessa FACS device.

MSD binding assays for quantification of serum CD33 proteins

A quantitative assay for measuring serum CD33 levels was developed with recombinant monomeric extracellular domain of CD33M (CD33MECDHis, Fig 1B), a commercial anti-CD33 mAb 3D6 (Ab11031, Abcam), and a recombinant anti-CD33 mAb hSGL3 generated from a sequence previously reported [20]. hSGL3 was used for capturing CD33 and 3D6 for detection with standard bind 96-well MSD plates (MesoScaleDiscovery). Briefly, MSD plates were coated with 1.5 µg/ml hSGL3 human IgG1 in phosphate-buffered saline (PBS), pH 7.4, at 4 °C overnight. Similar MSD plates were incubated with PBS as negative controls. Plates were blocked with 150 µl of 1X KPL Milk Diluent/Blocking Solution (Cat# 5140-0010, SeraCare Life Science) diluted in 1X TBS-Tween 20 (Cat# 28360, Thermo Scientific) on a rocking platform for 60 min at room temperature followed by 3 washes with 1X TBS-Tween 20 buffer. A positive standard curve was created with CD33MECDHis in assay buffer (1X TBS-Tween 20 supplemented with 0.2% bovine serum albumin). To test serum samples, 30 µl serum was mixed with 270 µl assay buffer followed by 2.5-fold serial dilutions. Then 35 µl of serially diluted CD33MECDHis and serum samples were incubated with hSGL3-coated and uncoated plates for 60 min followed by 3 washes. The plates were then incubated with 10 nM 3D6 for 60 min. After 3 washes, the plates were further incubated with MSD goat anti-mouse IgG sulfo-tag antibody (R32AC-1, 1:1400 dilution) for 60 min. After 3 washes, plates were read with 50 µl 1X MSD reading buffer T (R92TC-2). Data were processed in Excel and analyzed in GraphPad Prism 9.2.0. The statistical significance of differences in CD33 concentrations among the three genotype groups was analyzed by one-way ANOVA.
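The serum dilution scheme described above (30 µl serum into 270 µl assay buffer, then 2.5-fold serial dilutions) can be written out as cumulative dilution factors. This is a minimal sketch; the number of dilution points is not stated in the text, so the value used here is an assumption chosen only for illustration.

```python
from fractions import Fraction
from typing import List


def dilution_series(sample_ul: int, buffer_ul: int,
                    fold: Fraction, n_points: int) -> List[Fraction]:
    """Cumulative dilution factors (1:x), starting with the initial mix
    of sample and buffer and multiplying by `fold` at each later step."""
    initial = Fraction(sample_ul + buffer_ul, sample_ul)
    return [initial * fold ** step for step in range(n_points)]


N_POINTS = 6  # ASSUMPTION: number of dilution points is not given in the text

factors = dilution_series(30, 270, Fraction(5, 2), N_POINTS)
for factor in factors:
    print(f"1:{float(factor):g}")
```

The first factor, 1:10, corresponds to the lowest sample dilution referred to in the Results.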

FACS analysis

Human PBMCs collected from consenting participants were pre-incubated with Human TruStain FcX (BioLegend Cat. No. 422301) to block Fc-Receptors. Analysis of human CD33 expression levels was performed using the following antibodies: CD45-BUV395 (BD, 563792), CD33-APC (Biolegend, 366626), and Live/Dead Fixable Near IR (Life Technology, L34992). For assessing human PBMC subsets, the following panel was used: CD45-BUV395 (BD, 563792), CD3-BV421 (Biolegend, 317344), CD14-FITC (BD, 561712), CD19-PE (Biolegend, 302208), and Live/Dead Fixable Near IR (Life Technology, L34992). All flow cytometry was performed on a BD Fortessa and analysis was performed using FlowJo V10. The live singlets were first gated by CD45, and the CD45+ population was then further gated by CD33, CD14, CD3 or CD19 to identify the CD33+ myeloid cells, monocytes, T cells or B cells, respectively (see Fig 7 for sample gating strategy).

Fig 7. Representative gating strategy data for PBMC FACS studies from a homozygous reference individual (A), a heterozygous p.G156TFsTer5 carrier (B), and a homozygous p.G156TFsTer5 carrier (C).

https://doi.org/10.1371/journal.pgen.1011600.g007

Variant QC and annotation

Exome sequencing for the PGR samples was performed at two locations: the Broad Institute, as described earlier [16], and the Regeneron Genetics Center (RGC). All samples were sequenced at 30X coverage. Genotype calls with low allele balance (< 0.2) or low depth (< 10) were set to missing, and variants with a missing rate > 5% were removed. UKBB exome sequencing was performed by the RGC as previously described [21]. Additional genotype and sample level QC was performed as described previously [22]. Variant annotation was performed using Variant Effect Predictor (VEP) version NNN with the LOFTEE plugin [23,24]. High confidence pLOF variants were annotated as stop gained, frameshift, splice donor, or splice acceptor variants based on LOFTEE filtering.
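The genotype-level and variant-level filters described above could be expressed roughly as below. This is a sketch under stated assumptions: the data layout (a dict of per-sample call records) and all function and key names are hypothetical, and allele balance is only checked when a value is present. The thresholds (0.2, 10, 5%) are the ones given in the text.

```python
def mask_low_quality(call, min_ab=0.2, min_depth=10):
    """Return the genotype, or None (missing) if the call fails QC."""
    if call["gt"] is None:
        return None
    if call["depth"] < min_depth:
        return None
    ab = call.get("ab")  # allele balance; may be absent for some calls
    if ab is not None and ab < min_ab:
        return None
    return call["gt"]


def filter_variants(calls_by_variant, max_missing=0.05):
    """Mask failing calls, then drop variants whose missing rate exceeds 5%."""
    kept = {}
    for variant, calls in calls_by_variant.items():
        genotypes = [mask_low_quality(c) for c in calls]
        missing_rate = genotypes.count(None) / len(genotypes)
        if missing_rate <= max_missing:
            kept[variant] = genotypes
    return kept


# Hypothetical toy data: 40 samples per variant.
good = {"gt": "0/1", "depth": 30, "ab": 0.5}
low_depth = {"gt": "0/1", "depth": 5, "ab": 0.5}
low_ab = {"gt": "0/1", "depth": 30, "ab": 0.1}
example = {
    "var_a": [good] * 39 + [low_depth],   # 2.5% missing after masking
    "var_b": [good] * 36 + [low_ab] * 4,  # 10% missing after masking
}
kept = filter_variants(example)
print(sorted(kept))
```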

Exome analysis.

All quantitative traits were transformed by the rank-based inverse standard normal function, applied within each genotyping batch. Quantitative traits were analyzed using linear regression as implemented in regenie [25]. All analyses were adjusted for age, sex, age*sex, age^2 and the top 10 genetic PCs generated using common genotyping array SNPs. For exome data, if genotyping array data were not available, PCs were derived from common (MAF > 1%) exome SNPs. Exome and genome data were analyzed separately across sequencing centers and meta-analyzed using inverse variance weighted meta-analysis as implemented in METAL [26]. Binary traits were analyzed using a logistic regression model with Firth fallback.
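For readers unfamiliar with inverse variance weighting, the standard fixed-effect formula for combining per-center estimates is sketched below. This is a generic textbook illustration rather than METAL's own code, and the function name and example numbers are hypothetical.

```python
import math
from statistics import NormalDist


def ivw_meta(estimates):
    """Fixed-effect inverse variance weighted meta-analysis.

    estimates: list of (beta, se) tuples, one per study or sequencing center.
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided p-value from the standard normal distribution.
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return beta, se, z, p


studies = [(0.10, 0.10), (0.30, 0.10)]  # hypothetical per-center (beta, SE)
beta, se, z, p = ivw_meta(studies)
print(f"beta={beta:.3f} se={se:.4f} z={z:.2f} p={p:.3g}")
```

Each estimate is weighted by the reciprocal of its variance, so more precise studies contribute more to the pooled effect.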

Phenotype definitions: LDL-c, calculated using the Friedewald equation, was analyzed in individuals who were not on cholesterol-lowering drugs. Glucose levels were analyzed in participants who were not on oral antidiabetic drugs, and creatinine and eGFR (calculated using the CKD-EPI equation) were analyzed in participants without heart failure. Myocardial infarction cases were enrolled at time of event as described. Type 2 diabetes cases were defined as individuals with HbA1c >= 6.5, self-reported diabetes, or use of oral hypoglycemics. Individuals with a diabetes age of onset less than 30 were excluded from the analysis. Type 2 diabetes controls were ascertained as individuals with HbA1c < 6.5 or no self-reported history of diabetes, and with random blood glucose levels below 150 mg/dl. Stroke, angina, atrial fibrillation/irregular heartbeat, and hypertension were all self-reported. Atherosclerotic cardiovascular disease cases were defined as any individuals with myocardial infarction, stroke, or angina, and controls were healthy individuals without any cardiovascular disease history.
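As an illustration of the LDL-c definition, the Friedewald equation estimates LDL-c as total cholesterol minus HDL-c minus triglycerides divided by 5, with all values in mg/dL. The sketch below makes several assumptions not stated in the text: the units are taken to be mg/dL, the conventional triglyceride limit of 400 mg/dL is applied, and the record field names are hypothetical.

```python
def friedewald_ldl(total_chol, hdl, triglycerides, tg_limit=400.0):
    """Estimate LDL-c (mg/dL) as TC - HDL - TG/5.

    Returns None when triglycerides reach tg_limit, where the equation is
    conventionally considered unreliable (an assumption, not from the text).
    """
    if triglycerides >= tg_limit:
        return None
    return total_chol - hdl - triglycerides / 5.0


def ldl_for_analysis(participant):
    """Apply the stated subsetting: skip people on cholesterol-lowering drugs."""
    if participant["on_cholesterol_lowering_drug"]:  # hypothetical field name
        return None
    return friedewald_ldl(
        participant["total_chol"], participant["hdl"], participant["tg"]
    )


person = {
    "total_chol": 200.0,
    "hdl": 50.0,
    "tg": 150.0,
    "on_cholesterol_lowering_drug": False,
}
print(ldl_for_analysis(person))
```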

Recall by genotype study

Sanger sequencing.

Whole blood-derived DNA from recalled participants was used to confirm carrier status and zygosity for variants of interest via Sanger sequencing. Sanger sequencing was conducted at Macrogen, Inc (South Korea) or in-house at the CNCD. For Macrogen-processed samples, DNA samples were shipped directly to Macrogen for PCR amplification and Sanger sequencing. PCR primers were designed to cover a region of approximately 200 to 300 bases around the variant. For in-house Sanger sequencing, specific primers were designed to amplify the region of interest using Platinum Master Mix (Thermo Scientific, USA). The amplified DNA product was processed using ExoSAP-IT Express PCR Product Cleanup (Thermo Scientific, USA) and BigDye XTerminator (Thermo Scientific, USA), then run on an Applied Biosystems SeqStudio Genetic Analyzer (Thermo Scientific, USA). Manufacturers’ protocols were followed for all kits.

Recall by genotype analyses.

PheWAS for quantitative traits in the recall study was performed using an additive linear mixed model with age and sex as fixed effects and family ID as a random effect. All quantitative traits were transformed by the rank-based inverse standard normal function. Binary traits were analyzed using logistic regression with age and sex as covariates. All analyses were performed as burden analyses, with carriers of different CD33 LOF variants grouped together. All analyses were performed using the statsmodels Python module (version 0.13.0).
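The rank-based inverse normal transform applied to the quantitative traits can be sketched with the Python standard library alone. The offset constant and the average-rank handling of ties are assumptions, since the text does not specify them, and the function name is hypothetical.

```python
from statistics import NormalDist


def rank_inverse_normal(values, c=0.5):
    """Map values to standard normal quantiles based on their ranks.

    Ties receive the average of their ranks. The quantile for rank r among
    n values is (r - c) / (n - 2c + 1); c = 0.5 is an assumed convention.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # average 1-based rank of the run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in ranks]


trait = [3.0, 1.0, 2.0]  # hypothetical raw measurements
transformed = rank_inverse_normal(trait)
print([round(x, 3) for x in transformed])
```

The transform preserves the ordering of the raw values while forcing the result toward a standard normal shape, which limits the influence of outliers in the regression.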

PBMC isolation.

20 ml of whole blood from consenting participants was collected in a citrate phosphate dextrose solution with EDTA and heparin. An equal volume of Ca2+/Mg2+-free DPBS buffer supplemented with 2% FBS was added and mixed. The cell mixture was carefully layered over sterile Ficoll-Paque density 1.077 media (GE Healthcare cat #17-5442-02) and centrifuged at 400 x g for 30 min. Mononuclear cells were isolated and washed in Ca2+/Mg2+-free DPBS buffer supplemented with 2% FBS. After washing, cells were resuspended in wash buffer to a concentration of ~2e7 cells/ml. An equal volume of cell freezing medium (Cryostore CS10, Stemcell Technologies cat 100-1061) was added and cells were slowly frozen at -80 °C in a Mr. Frosty cell freezer (Thermo Scientific cat no 5100-0001). For long-term storage, cells were transferred to liquid nitrogen.

UK Biobank (UKBB) dataset analyses

UK Biobank analysis was performed in a similar manner to the PGR PheWAS analysis. Data were accessed under approved UK Biobank Resource Application Number 59456. Analyses were restricted to a set of unrelated individuals with high quality genetic data (UKBB Data Field 22020). A quality-controlled set of genotyping array data was used in regenie’s whole genome regression step 1. All quantitative traits were transformed by the rank-based inverse standard normal function. Quantitative traits were analyzed using linear regression as implemented in regenie. All analyses were adjusted for age, sex, age*sex, age^2 and the top 10 genetic PCs generated using common genotyping array SNPs. Binary traits were analyzed using a logistic regression model with Firth fallback. Continuous and binary phenotypes related to hematological clinical endpoints as well as neurological disease were selected for evaluation. For quantitative traits, the first available instance was assessed. Binary outcomes were defined based on the presence or absence of ICD10 codes. The specific ICD10 codes evaluated are listed in Table 6.
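To make the covariate set concrete, the adjustment terms listed above can be assembled per participant as follows. This is only an illustrative sketch: the 0/1 coding of sex, the function name, and the example values are assumptions, and the regression itself was run in regenie rather than in code like this.

```python
def covariate_row(age, sex, pcs, n_pcs=10):
    """Build [age, sex, age*sex, age^2, PC1..PCn] for one participant.

    sex is assumed to be coded 0/1; pcs is a sequence of genetic principal
    component scores, of which the first n_pcs are used.
    """
    if len(pcs) < n_pcs:
        raise ValueError(f"expected at least {n_pcs} PCs, got {len(pcs)}")
    return [age, sex, age * sex, age ** 2] + list(pcs[:n_pcs])


row = covariate_row(age=50, sex=1, pcs=[0.1] * 10)  # hypothetical participant
print(len(row))
```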

Acknowledgments

The authors would like to thank Vinney George, Philippe Runge, and Paul Schroeder of Novartis for their contributions to compliance documentation associated with human tissue samples. The authors are additionally grateful to all of the PGR, recall-by-genotype, and UK Biobank participants for their vital contributions to this research.

References

  1. Crocker PR, Paulson JC, Varki A. Siglecs and their roles in the immune system. Nat Rev Immunol. 2007;7(4):255–66. pmid:17380156
  2. Hernández-Caselles T, Martínez-Esparza M, Pérez-Oliva AB, Quintanilla-Cecconi AM, García-Alonso A, Alvarez-López DMR, et al. A study of CD33 (SIGLEC-3) antigen expression and function on activated human T and NK cells: two isoforms of CD33 are generated by alternative splicing. J Leukoc Biol. 2006;79(1):46–58. pmid:16380601
  3. O’Keefe TL, Williams GT, Davies SL, Neuberger MS. Hyperresponsive B cells in CD22-deficient mice. Science. 1996;274(5288):798–801. pmid:8864124
  4. Brinkman-Van der Linden ECM, Angata T, Reynolds SA, Powell LD, Hedrick SM, Varki A. CD33/Siglec-3 binding specificity, expression pattern, and consequences of gene deletion in mice. Mol Cell Biol. 2003;23(12):4199–206. pmid:12773563
  5. Lajaunias F, Dayer J-M, Chizzolini C. Constitutive repressor activity of CD33 on human monocytes requires sialic acid recognition and phosphoinositide 3-kinase-mediated intracellular signaling. Eur J Immunol. 2005;35(1):243–51. pmid:15597323
  6. Logue MW, Schu M, Vardarajan BN, Buros J, Green RC, Go RCP, et al. A comprehensive genetic association study of Alzheimer disease in African Americans. Arch Neurol. 2011;68(12):1569–79. pmid:22159054
  7. Deng Y-L, Liu L-H, Wang Y, Tang H-D, Ren R-J, Xu W, et al. The prevalence of CD33 and MS4A6A variant in Chinese Han population with Alzheimer’s disease. Hum Genet. 2012;131(7):1245–9. pmid:22382309
  8. Hollingworth P, Harold D, Sims R, Gerrish A, Lambert J, Carrasquillo M, et al. Common variants at ABCA7, MS4A6A/MS4A4E, EPHA1, CD33 and CD2AP are associated with Alzheimer’s disease. Nat Genet. 2011;43(5):429–35.
  9. Naj AC, Jun G, Beecham GW, Wang L-S, Vardarajan BN, Buros J, et al. Common variants at MS4A4/MS4A6E, CD2AP, CD33 and EPHA1 are associated with late-onset Alzheimer’s disease. Nat Genet. 2011;43(5):436–41. pmid:21460841
  10. Kohla S, Ali E, Amer A, Yousif T, Yassin MA. A Rare Case of Severe Copper Deficiency in an Infant with Exclusive Breast Feeding Mimicking Myelodysplastic Syndrome. Case Rep Oncol. 2020;13(1):62–8. pmid:32110221
  11. Maslyar D, Paul R, Long H, Rhinn H, Tassi I, Morrison G, et al. A phase 1 study of AL003 in healthy volunteers and participants with Alzheimer’s disease (P5-3.002). Neurology. 2022;98(18 Supplement):3582.
  12. Kim MY, Yu K-R, Kenderian SS, Ruella M, Chen S, Shin T-H, et al. Genetic Inactivation of CD33 in Hematopoietic Stem Cells to Enable CAR T Cell Immunotherapy for Acute Myeloid Leukemia. Cell. 2018;173(6):1439–1453.e19. pmid:29856956
  13. Borot F, Wang H, Ma Y, Jafarov T, Raza A, Ali A, et al. Gene-edited stem cells enable CD33-directed immune therapy for myeloid malignancies. Proc Natl Acad Sci U S A. 2019;116(24):11978–87.
  14. Frankel NW, Deng H, Yucel G, Gainer M, Leemans N, Lam A, et al. Precision off-the-shelf natural killer cell therapies for oncology with logic-gated gene circuits. Cell Rep. 2024;43(5):114145. pmid:38669141
  15. Kenderian SS, Ruella M, Shestova O, Klichinsky M, Aikawa V, Morrissette JJD, et al. CD33-specific chimeric antigen receptor T cells exhibit potent preclinical activity against human acute myeloid leukemia. Leukemia. 2015;29(8):1637–47. pmid:25721896
  16. Saleheen D, Natarajan P, Armean IM, Zhao W, Rasheed A, Khetarpal SA, et al. Human knockouts and phenotypic analysis in a cohort with a high rate of consanguinity. Nature. 2017;544(7649):235–9. pmid:28406212
  17. Pérez-Oliva AB, Martínez-Esparza M, Vicente-Fernández JJ, Corral-San Miguel R, García-Peñarrubia P, Hernández-Caselles T. Epitope mapping, expression and post-translational modifications of two isoforms of CD33 (CD33M and CD33m) on lymphoid and myeloid human cells. Glycobiology. 2011;21(6):757–70. pmid:21278227
  18. Papageorgiou I, Loken MR, Brodersen LE, Gbadamosi M, Uy GL, Meshinchi S, et al. CCGG deletion (rs201074739) in CD33 results in premature termination codon and complete loss of CD33 expression: another key variant with potential impact on response to CD33-directed agents. Leuk Lymphoma. 2019;60(9):2287–90. pmid:30721105
  19. Dhindsa RS, Burren OS, Sun BB, Prins BP, Matelska D, Wheeler E. Rare variant associations with plasma protein levels in the UK Biobank. Nature. 2023;622(7982):339–47.
  20. Konopitzky RBE, Adam P, Heider K-H. Inventor; Boehringer Ingelheim International GmbH, assignee. CD33 binding agents. United States; 2011.
  21. Backman JD, Li AH, Marcketta A, Sun D, Mbatchou J, Kessler MD, et al. Exome sequencing and analysis of 454,787 UK Biobank participants. Nature. 2021;599(7886):628–34. pmid:34662886
  22. Chen S, Francioli LC, Goodrich JK, Collins RL, Kanai M, Wang Q, et al. A genomic mutational constraint map using variation in 76,156 human genomes. Nature. 2024;625(7993):92–100. pmid:38057664
  23. McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GRS, Thormann A, et al. The Ensembl Variant Effect Predictor. Genome Biol. 2016;17.
  24. Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, et al. Author Correction: The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2021;590(7846):E53. pmid:33536625
  25. Mbatchou J, Barnard L, Backman J, Marcketta A, Kosmicki J, Ziyatdinov A. Computationally efficient whole-genome regression for quantitative and binary traits. Nat Genet. 2021;53(7):1097–103.
  26. Willer CJ, Li Y, Abecasis GR. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics. 2010;26(17):2190–1. pmid:20616382