Glycosylation of Immunoglobulin G: Role of Genetic and Epigenetic Influences

Objective To determine the extent to which genetic and epigenetic factors contribute to variations in glycosylation of immunoglobulin G (IgG) in humans. Methods 76 N-glycan traits in circulating IgG were analyzed by UPLC in 220 monozygotic and 310 dizygotic twin pairs from TwinsUK. A classical twin study design was used to derive the additive genetic, common and unique environmental components defining the variance in these traits. Epigenome-wide association analysis was performed using the Illumina 27k chip. Results 51 of the 76 glycan traits studied have an additive genetic component (heritability, h 2)≥ 0.5. In contrast, 12 glycan traits had a low genetic contribution (h2<0.35). We then tested for association between methylation levels and glycan levels (P<2 x10-6). Among glycan traits with low heritability probe cg08392591 maps to a CpG island 5’ from the ANKRD11 gene, a p53 activator on chromosome 16. Probe cg26991199 maps to the SRSF10 gene involved in regulation of RNA splicing and particularly in regulation of splicing of mRNA precursors upon heat shock. Among those with high heritability we found cg13782134 (mapping to the NRN1L gene) and cg16029957 mapping near the QPCT gene to be array-wide significant. The proportion of array-wide epigenetic associations was significantly larger (P<0.005) among glycans with low heritability (42%) than in those with high heritability (6.2%). Conclusions Glycome analyses might provide a useful integration of genetic and non-genetic factors to further our understanding of the role of glycosylation in both normal physiology and disease.


Introduction
Glycans constitute the most abundant and diverse form of the post-translational modifications. All cell surface and secreted glycoproteins that contain appropriate sequences (Asn-X-Ser/Thr where X is any amino acid except proline) can potentially acquire N-linked oligosaccharides (N-glycans) while they travel through the endoplasmic reticulum and the Golgi compartments [1]. Glycans can influence disease development in many syndromes such as congenital disorders of glycosylation, cancer, rheumatoid arthritis and AIDS [2]. Glycans are crucial for the immune system, as some of the most important interactions between the immune system and viruses and bacteria are mediated by protein-glycan interactions. Moreover, glycans are key in the recognition of non-self events and an altered glycome can lead to autoimmune disorders [3]. The biological functions of glycans go from basic structural roles to development, protein folding and immune response. Glycosylation is known to be affected by factors such as sugar nucleotide concentration, type of glyco-enzymes and their expression levels [1].
While genes unequivocally determine the structure of each polypeptide, there is no genetic template for the glycan part [4]. Instead, hundreds of genes and their products interact in the complex pathway of glycan biosynthesis resulting in a very complex biosynthetic pathway that is further complicated by both direct environmental influence (nutrition, hormonal status, etc) and epigenetic memory of past environmental effects (altered gene expression) [5][6][7][8].
Some recent studies [9,10] demonstrated large variability of plasma proteins and immunoglobulin G (IgG) N-glycome composition in a population. However, longitudinal studies in individuals also revealed very high temporal stability of the individual plasma glycome [11] indicating stable long-term regulation of the glycosylation machinery.
Several studies have already shown that there are consistent genetic factors that affect circulating levels of some IgG Nglycans [12][13][14]. However, the extent to which genetic and nongenetic factors affect glycan levels is still unexplored.
In this study we aim to determine the extent to which individual differences in IgG glycosylation pattern reflect genetic versus environmental influences by estimating heritability in a cohort of female twins. We, then, hypothesized that IgG glycosylation pattern with low heritability (h 2 ≤0.35) might be epigenetically mediated and we explored the relationship with epigenetic data.

Results
The study cohort comprised 220 monozygotic (MZ) and 310 dizygotic (DZ) twin pairs, and the baseline characteristics are presented in Table 1. MZ and DZ twin pairs were not significantly different for age and body mass index (BMI), and all of them were females. The number of twin pairs used in this study is sufficient to detect with 95% power heritability of 0.4 or higher with P<0.05 for a range of shared environmental contributions from low (0.1) to high (0.5) [15]. In total 76 glycan traits were studied which were derived from 24 directly measured glycan structures and derived measures. A detailed description of the glycan structures and the derived measures can be found in Lauc et al. [14]. For example, the Fc region of IgG contains two highly conserved asparagine (Asn-297) residues where complex core-fucosylated glycans such as FA2 (GP4) , FA2G1 (GP8, GP9) and FA2G2 (GP14) are found [14]. These carbohydrates are critical components of the Fc-Fcγ receptor interaction [1].
The heritability analyses for the 76 IgG patterns are presented in Table 2. The best fitting model for the majority of the IgG pattern was the AE model (including additive genetic and non-shared environment), ascribing the total variance to additive genetic factors and non-shared environmental factors, with heritability estimates ranging from 0.49 for GP21 (which reflects the proportion of the A2G2S2 glycan, i.e. disialylated, digalactosylated, bi-antennary N-linked glycans; see Table S1 for description of glycan codes) to 0.80 for GP8n (which reflects the proportion of the monogalactosylated isomer FA2 [6]G1 relative to all the neutral glycans). The ACE model (including additive genetic, common environment and unique environments) was the best fitting model for 18 glycan traits with heritability estimates ranging from 0.19 for FtotalS1/ FtotalS2 to 0.52 for FBG0n/G0n. In contrast, 7 glycan traits showed no clear genetic influence and appeared to be affected primarily by common and unique environmental factors. We have classified these 7, added to the 5 glycan traits in Table 2 with an additive component under 0.35, as "low heritability". 51 of the 76 glycan traits (67%) studied have an additive genetic component of 0.5 or greater, meaning that at least half of the variance in their levels is determined by genetic factors, 5 glycan traits had a genetic contribution under 0.35. Heritability estimates for the remaining glycan traits varied between 0.35 and 0.5.
We hypothesized that IgG glycans showing none or low genetic influences could be epigenetically mediated and we compared their levels with genome-wide DNA methylation profiles from the Illumina HumanMethylation27 DNA Analysis BeadChip assay in 127 MZ and DZ female twins [16]. The analyses were adjusted for age, sex, BMI, methylation chip, sample position on methylation chip, and family relatedness.
Among glycan traits with a low heritability we identified 2 CpG-sites (cg08392591, cg26991199) at which DNA methylation levels were associated with levels of 4 IgG glycan traits with P <2 x 10 -6 (the Bonferroni cut off for array-wide significance; see Table 3). Probe cg08392591 maps to a CpG island 5' from the ANKRD11 gene on chromosome 16 while cg26991199 maps to SRSF10 gene on chromosome 1. ANKRD11 is a p53 activator, while SFRS10 is involved in constitutive and regulated RNA splicing and in particular is involved in regulation of splicing of mRNA precursors upon heat shock.
Among the 64 glycan traits that had heritabilities above 0.35 we observed five array-wide significant hits ( Table 3) for the two probes, cg13782134 (mapping to the NRN1L gene) and cg16029957 (mapping near the QPCT gene). NRN1L encodes a neuritin-like protein precursor and its role in immunoglobulin glycosylation is unclear. QPCT encodes the human pituitary glutaminyl cyclase, which is responsible for the presence of pyroglutamyl residues in many neuroendocrine peptides. Overall, the proportion of array-wide epigenetic associations was significantly larger (P<0.005) among glycan traits with low heritability (5/12= 41.6%) than in those showing high heritability (4/64 = 6.2%).
We also tested if the glycan traits measured could have some clinical relevance and we found that 4 of the 76 glycan traits are significantly (after adjustment for multiple tests) associated with circulating levels of triglycerides, a well known risk factor of cardiovascular risk [17] and 3 are associated with circulating levels of C-reactive protein (CRP), a well known marker of systemic inflammation [18] (Table S2). Although the investigation of the role of the various glycans in health and disease is beyond the scope of our study, these data suggest that the molecular mechanisms underlying IgG glycans can yield clinically relevant insights. We note that all the glycans associated with these two traits have high heritabilities and no epigenetic significant associations were found for these. However, we did not carry out a systematic study of the relationship between IgG glycans and cardiovascular or inflammatory traits which may show associations also among glycans with low heritabilities.

Discussion
In this study we have evaluated, using a classical study design, the heritable and non-heritable component of circulating immunoglobulin G glycome composition. The study was sufficiently powered to detect with 95% probability heritabilities above 0.4 even with low-shared environmental contributions and we detected 59 glycan traits with heritabilities above this cut-off.
Our data show that variation in levels of 51 of the 76 IgG glycan traits studied is at least half heritable and only a small proportion of N-glycan traits have a low genetic contribution. Though glycans are produced in a complex biosynthetic pathway [19] and were believed to be significantly affected by many environmental factors [20], we find a high contribution of the genetic component to IgG glycome composition. Compared to GWAS study of the plasma glycome in 2705 individuals [13], the recent GWAS study of the IgG glycome in 2247 [14] has identified five times more genetic loci with genome-wide significant associations. Inter-individual variation of the IgG glycome [9] is more than three-fold larger than the interindividual variation of the plasma glycome [10]. Large heritability and the large number of involved genetic loci suggest that the genetic regulation of the IgG glycome is stronger than the regulation of the total plasma glycome. Fine details of IgG glycan structure significantly affect function of immunoglobulins [21], thus close regulation of IgG glycosylation is required for proper function of the immune system [22]. Genetic loci with variants that affect IgG glycome composition were reported for the majority of glycans with high heritability [12][13][14] (Table S3), and some of the highly heritable glycans seem to be affected by multiple loci (e.g., FG0n/G0n associates with genetic variants in the following genes: FUT8, MGAT3, IKZF1 and SMARCB1-DERL3).
In addition to the genetic contribution to the glycome, an important source of complexity and variability in IgG glycosylation is the interaction with the environment, some of which may be revealed by epigenetic changes [23]. Epigenetic silencing of HNF1A, a known master regulator of plasma protein fucosylation, has been shown to be associated with changes in the composition of the human plasma N-glycome [24]. In this study we find that methylation levels at other genes are also implicated in glycome composition, both in those with high heritabilities and those with a lower genetic contribution. By using a well-characterized cohort with epigenetics data available, such as TwinsUK, it was possible to integrate glycome data with other existing molecular data and adjust for confounders.
The main limitation of the present study is that due to the novelty of the glycan phenotypes we lack the replication for the epigenetic findings in an independent cohort. The fact that we find epigenome-wide significant hits on a relatively small sample suggests that epigenetic factors contribute to IgG glycan levels, although due to the lack of replication we cannot exclude false positive results. Epigenetic factors also play a role in the case of some glycans with a high heritability. However, we only found 5 significant methylation hits (mapping to two probes) for 64 glycan traits with h 2 >0.35 and 5 significant hits (mapping to two probes) for 12 glycan traits with h 2 ≤0.35. The probes associated with highly heritable glycan traits were different to those associated with glycan traits with lower heritabilities.
For glycan traits with lower heritabilities the most significant probe maps to the p53 activator ANKRD11. The tumor suppressor p53 is known to be able to modulate innate immune gene responses. For glycan traits with high heritabilities the two hits appear related to neuroendocrine regulation, in one case directly to a glycosylation enzyme QPCT, in the second case to a neuritin-like protein precursor which has been implicated in neuronal survival [25]. These genes have not been previously reported to have a role in IgG glycosylation. A similar observation was recently reported in a GWA study of the IgG glycome which identified 12 genes not previously known to be involved in IgG glycosylation [14], providing further evidence that IgG glycosylation is a very complex and tightly regulated process. As more epigenetic and genetic data become available for cohorts with IgG N-glycan characterizations it will become possible to elucidate the molecular pathways underlying many complex traits.

Ethic statement
The study was approved by St. Thomas' Hospital Research Ethics Committee, and all twins provided informed written consent.

Study subjects
Study subjects were twins enrolled in the TwinsUK registry, a national register of adult twins. Twins were recruited as volunteers by successive media campaigns without selecting for particular diseases or traits [26]. All twin pairs recruited were of the same sex.
In this study we analysed data from 440 monozygotic and 610 dizygotic female twins with glycomics and epigenomic data available.

Isolation of IgG from human plasma
The IgG was isolated using protein G monolithic plates as described previously [9]. Briefly, 90 µl of plasma was diluted 10x with PBS, applied to the protein G plate (BIA Separations, Ljubljana, Slovenia) and instantly washed. IgGs were eluted with 1 ml of 0.1 M formic acid and neutralized with 1 M ammonium bicarbonate.

N-Glycan Release
Isolated IgG samples were dried in a vacuum centrifuge. After drying, proteins were denatured with addition of 20 μL 2% SDS (w/v) (Invitrogen, Carlsbad, CA, USA) and by incubation at 60 °C for 10 min. Subsequently, 10 μL of 4% Igepal-CA630 (Sigma-Aldrich, St. Louis, MO, USA) and 0.5 mU of PNGase F in 10 μL 5× PBS were added to the samples. The samples were incubated overnight at 37 °C for N-glycan release.

2-aminobenzamide labelling
The released N-glycans were labelled with 2aminobenzamide (2-AB), the fluorescent dye used to make glycans visible in UPLC. The labelling mixture was freshly prepared by dissolving 2-AB (Sigma-Aldrich, St. Louis, MO, USA) in DMSO (Sigma-Aldrich, St. Louis, MO, USA) and glacial acetic acid (Merck, Darmstadt, Germany) mixture (85:15, v/v) to a final concentration of 48 mg/mL. A volume of 25 μL of labelling mixture was added to each N-glycan sample in the 96-well plate. Also, 25 μL of freshly prepared reducing agent solution (2-picoline borane (Sigma-Aldrich, St. Louis, MO, USA) in DMSO -concentration of 106.96 mg/ml) was added and the plate was sealed using adhesive tape. Mixing was achieved by shaking for 10 min, followed by 2 hour incubation at 65 °C. Samples (in a volume of 100 μL) were brought to 80% ACN (v/v) by adding 400 μL of ACN.

Cleaning and elution of labelled glycans using HILIC-Solid Phase Extraction (SPE)
Free label and reducing agent were removed from the samples using HILIC-SPE. An amount of 200 μL of 0.1 g/mL suspension of microcrystalline cellulose (Merck, Darmstadt, Germany) in water was applied to each well of a 0.45 μm GHP filter plate (Pall Corporation, Ann Arbor, MI, USA). Solvent was removed by application of vacuum using a vacuum manifold (Millipore Corporation, Billerica, MA, USA). All wells were prewashed using 5 × 200 μL water, followed by equilibration using 3 × 200 μL acetonitrile/water (80:20, v/v). The samples were loaded to the wells. The wells were subsequently washed 5 × using 200 μL acetonitrile/water (80:20, v/v).
Glycans were eluted 2 × with 100 μL of water and combined elutes were not dried, but either analyzed immediately by UPLC or stored at -20 °C until usage.

Hydrophilic interaction high performance liquid chromatography (HILIC)
Fluorescently labelled N-glycans were separated by hydrophilic interaction chromatography on a Waters Acquity UPLC instrument (Milford, MA, USA) consisting of a quaternary solvent manager, sample manager and a FLR fluorescence detector set with excitation and emission wavelengths of 330 and 420 nm, respectively. The instrument was under the control of Empower 2 software, build 2145 (Waters, Milford, MA, USA). Labeled N-glycans (10 μl) were separated on a Waters BEH Glycan chromatography column, 100 × 2.1 mm i.d., 1.7 μm BEH particles, with 100 mM ammonium formate, pH 4.4, as solvent A and acetonitrile as solvent B. Separation method used linear gradient of 75-62% acetonitrile (v/v) at flow rate of 0.4 ml/min in a 25 min analytical run. Samples were maintained at 5 °C before injection, and the separation temperature was 60 °C. The system was calibrated using an external standard of hydrolyzed and 2-AB labelled glucose oligomers from which the retention times for the individual glycans were converted to glucose units. Data processing was performed using an automatic processing method with a traditional integration algorithm after which each chromatogram was manually corrected to maintain the same intervals of integration for all the samples. The chromatograms obtained were all separated in the same manner into 24 peaks and the amount of glycans in each peak was expressed as % of total integrated area. In addition to 24 directly measured glycan structures, 53 derived traits were calculated as described previously [9] (see Table S1). These derived traits average particular glycosylation features (galactosylation, fucosylation, sialylation) across different individual glycan structures. Consequently, they are more closely related to individual enzymatic activities, and underlying genetic polymorphisms [13].

Epigenetics
DNA methylation levels were obtained using the 27k Illumina CpG methylation probe array in 127 female twins aged 32 to 80 randomly selected from the discovery cohort. QC measure were applied, as previously described [16] and 24,641 autosomal probes passed quality control and were included in the analysis. Probes were standardized to have mean zero and variance 1.

Statistical methods
Statistical analysis was carried out using Stata version 11. The R package OpenMX was used to calculate heritability.
Heritability of glycome composition was estimated using structural equation modelling to decompose the observed phenotypic variance into three latent sources of variation: additive genetic variance (A), shared/common environmental variance (C), and non-shared/unique environmental variance (E) [27] adjusting for age and batch. Additive genetic influences are indicated when monozygotic twins are more similar than dizygotic twins. The common environmental component estimates the contribution of family environment which is assumed to be equal in both MZ and DZ twin pairs [28], whereas the unique environmental component does not contribute to twin similarity, rather it estimates the effects that apply only to each individual and includes measurement error. The equal environment assumption across zygosities implies that any greater similarity between MZ twins than DZ twins is attributed to greater sharing of genetic influences. The best fitting model (among ACE, AE, CE, and E models) was determined by removing each factor sequentially from the full model and testing the deterioration in fit of the various nested models, using the likelihood ratio test (p=0.05). In addition, the Akaike Information Criteria (AIC) was considered, with lower values indicating the most suitable model. The AIC combines the goodness of fit of a model (the discrepancy of expected to observed covariance matrixes) with its simplicity, resulting in a measure of parsimony [27]. The most parsimonious model was then used to estimate heritability, defined as the proportion of the phenotypic variation attributable to genetic factors Random intercept logistic regression was used to test the association between whole-blood DNA methylation patterns and IgG patterns with low heritability, adjusting for age, sex, BMI, methylation chip, sample position on methylation chip, and family relatedness. Adjusting for zygosity did not change the results.