A genome-wide analysis of DNA methylation identifies a novel association signal for Lp(a) concentrations in the LPA promoter

Lipoprotein(a) [Lp(a)] is a major cardiovascular risk factor, which is largely genetically determined by one major gene locus, the LPA gene. Many aspects of the transcriptional regulation of LPA are poorly understood and the role of epigenetics has not been addressed yet. Therefore, we conducted an epigenome-wide analysis of DNA methylation on Lp(a) levels in two population-based studies (total n = 2208). We identified a CpG site in the LPA promoter which was significantly associated with Lp(a) concentrations. Surprisingly, the identified CpG site was found to overlap the SNP rs76735376. We genotyped this SNP de-novo in three studies (total n = 7512). The minor allele of rs76735376 (1.1% minor allele frequency) was associated with increased Lp(a) values (p = 1.01e-59) and explained 3.5% of the variation of Lp(a). Statistical mediation analysis showed that the effect on Lp(a) is rather originating from the base change itself and is not mediated by DNA methylation levels. This finding is supported by eQTL data from 208 liver tissue samples from the GTEx project, which shows a significant association of the rs76735376 minor allele with increased LPA expression. To evaluate, whether the association signal at rs76735376 may actually be derived from a stronger eQTL signal in LD with this SNP, eQTL association results of all correlated SNPs (r2≥0.1) were integrated with genetic association results. This analysis pinpointed to rs10455872 as the potential trigger of the effect of rs76735376. Furthermore, both SNPs coincide with short apo(a) isoforms. Adjusting for both, rs10455872 and the apo(a) isoforms diminished the effect size of rs76735376 to 5.38 mg/dL (p = 0.0463). This indicates that the effect of rs76735376 can be explained by both an independent effect of the SNP and a strong correlation with rs10455872 and apo(a) isoforms.

The apo(a) isoform determined by the KIV-2 repeat number is the strongest contributor to Lp(a) concentration variance and explains ≈30-70 % of the total trait variance, with the complete LPA locus explaining up to 90%(2, 3), Interestingly, Lp(a) levels of the same-sized isoforms vary by 200-fold in the general population(1) but <2.5-fold within families (4), suggesting the existence of genetic variation strongly modifying the impact of the isoform. Indeed, genome-wide SNP-bound heritability has been recently estimated to be ≈49% in a German population (5).
Some high impact loss-of-function variants have been described(6-10) but regulatory variants have been elusive so far. The search for regulatory variants has so far focused on the promoter region (11)(12)(13) and a known enhancer region located 15 to 25 kb upstream of LPA (14), as well as on genome-wide association study efforts (5,15,16). Three variants in the enhancer region have been found to regulate LPA expression in reporter assays (14). Conversely, the promoter region of LPA is less well defined. Depending on whether a reference transcript with a leading non-coding first exon(11) (UCSC Genome browser hg38 annotation) or without it (12,13) (NCBI NM_005577.2) was used, two different genomic regions located ≈4 kb apart from each other have been investigated as putative promoters (11). The most extensively studied region extends about 1.4 kb upstream of the translation start. Some regulatory variants (17,18) and liver-specific regulatory elements (19) have been described in this region, as well as a pentanucleotide repeat (PNR) polymorphism with [5][6][7][8][9][10][11][12] repeats (12,20). The latter is significantly associated with Lp(a) levels in Caucasians (21,22). Early reports indicated a causal impact on promoter activity (12), which was not replicated in later studies (13). Since the association of the PNR with Lp(a) is, however, well replicated (21,22), it is likely that the PNR is simply in linkage disequilibrium with a regulatory variant (22).
Epigenetic regulation presents an additional layer of genetic regulation acting via chemical modifications of either histones (23) or the DNA sequence itself (24). DNA methylation is a process by which methyl groups are added to the DNA molecule (24). 5-Methylcytosine can be activating, when occurring in gene bodies, or repressing, when occurring in promoters(24) (albeit the repressing action of promoter methylation extends into the 5' region of the gene body (25)). DNA methylation may act by modifying the binding affinity of transcription factors (26) or recruiting silencing factors causing heterochromatinization (24). Variation in the methylation level can thus affect both binding affinity of transcription factors and chromatin organization.
Allele-specific methylation (ASM) at SNPs represents a special case of gene regulation via methylation. Up to 50,000, mostly cis-acting, ASM-SNPs have been reported (25,27) so far. The large majority was located in CpG dinucleotides (28), where the SNP either creates or abolishes a CpG site (CpG-SNP). Such ASM-SNPs have been reported for e.g. inflammatory bowel disease (29), osteoarthritis susceptibility (30), atopic dermatitis(31), triglyceride levels(32) and diabetes (33). Even methylation close to a SNP site may already modulate the allele specific binding of transcription factors to the SNP site itself (34).
The impact of methylation on regulation of lipid phenotypes has been studied to a lesser extent than germline SNPs on these phenotypes. The studies performed so far reported clear associations with lipid phenotypes such as VLDL-C, HDL-C, LDL-C, triglycerides and total cholesterol and even lipid-related diseases (reviewed in (35)), but none of these studies has addressed Lp(a). Therefore we report here the first epigenome-wide methylation association study on Lp(a) concentrations.

Study design and populations
The overall study design is outlined in Supplementary Figure 1

Genome-wide DNA methylation
Genome-wide DNA methylation patterns in the discovery and replication cohorts were analyzed using the Infinium HumanMethylation450 BeadChip Array (Illumina) and DNA derived from peripheral blood. KORA F3 and F4 samples were measured and processed separately (37). KORA F3/F4 samples were processed on 20 respectively 7 96-well plates in 9 respectively 4 batches; plate and batch effects were investigated using principal component analysis and eigenR2 analysis (38).
DNA methylation data were preprocessed following the CPACOR pipeline (39). Background correction was performed using the R package minfi, v1.6.0 (40). Signals with detection p-values (the probability of a signal being detected above the background) ≥ 0.01 were removed, as they indicate unreliable signals, as were signals summarized from fewer than three functional beads on the chip.
Observations with less than 95% of CpG sites providing reliable signals were excluded.
To reduce the non-biological variability between observations, data were normalized using quantile normalization (QN) on the raw signal intensities: QN was performed on a stratification of the probe categories into 6 types, based on probe type and color channel, using the R package limma, v3. 16.5(41).
The percentage of methylation at a given cytosine is reported as a beta-value, which is a continuous variable between 0 and 1 corresponding to the ratio of the methylated signal over the sum of the methylated and unmethylated signals of the particular cytosine site. Since it was the primary idea to detect the effect of differential methylation on Lp(a) and since probe binding might be affected by SNPs in the binding area, sites representing or being located in a 50 bp proximity to SNPs with a minor allele frequency (MAF) of at least 5% were excluded from the data set (42). Nine CpG sites in the LPA gene locus were present on the microarray (Supplementary Table 1).

SNP validation and LPA pentanucleotide repeat genotyping
KORA F4 samples were genotyped using the Affymetrix Axiom chip array. Genotypes were called with the Affymetrix software and were annotated to NCBI build 37. Imputation was performed with IMPUTE v2.3.0 using the 1000G phase1 (v3) reference panel (43). Additional de novo genotyping for rs76735376 was performed in KORA F3, KORA F4 and the SAPHIR using a commercial Taqman assay (ThermoFisher Scientific).
Due to the low MAF, samples being heterozygous and homozygous for the rare allele were confirmed by Sanger sequencing using the primers 5'-TACAGGACAGAGACTAACT-3' and  Table 2 for sequences and annealing temperature). All sequencing was done using ABI BigDye 1.1 chemistry.

Measurement of Lp(a) concentrations and apo(a) isoform size
Lp(a) concentration were determined in plasma using the same standardized sandwich ELISA assay in all samples. The assay has been described in detail previously (10,47). In brief, ELISA plates were coated with an affinity-purified rabbit anti-human apo(a) antibody and incubated with human plasma for 1 hour. After several washing steps, the bound Lp(a) was detected using a horseradishperoxidase-conjugated monoclonal antibody (1A2; in 0.1% w/v casein, 1x PBS, pH 7.3) and the Blue Star TMB substrate (Adaltis, Guidonia Montecelio, IT) (30 minutes incubation at room temperature).
Lp(a) isoform was determined by Western blotting in all samples as described (10). In brief, 150 ng Lp(a) as determined in the ELISA were loaded on a 1.46% agarose gel with 0.08% SDS and separated for 18 h at 0.04 A (constant current). A size standard consisting of a mixture of five plasma samples collected from individuals expressing each only one apo(a) isoform (13,19,23,27,35 KIV repeats determined by Pulsed Field Gel Electrophoresis) was applied in every seventh well of the gel. In heterozygous individuals, the smaller (and thus the commonly Lp(a) concentrationdetermining) of the two alleles was used for isoform-adjustment of statistical models.

Genome wide methylation analysis
Linear regression models were applied to evaluate the association between DNA methylation beta values and Lp(a). Lp(a) levels were log-transformed to account for the skewed distribution. Age, sex and the first 20 principal components of the Illumina control probes (to adjust for technical confounding) were included as potential confounders. In addition, since whole blood DNA samples were used, cell heterogeneity had to be considered as a confounder. As no measured cell count information was available, sample-specific estimates of the proportion of the major white blood cell types were obtained using a statistical method described by Houseman et al (48). To correct for multiple comparisons, a Bonferroni corrected significance level of 1.03e-07 (=0.05/485512) was used.
Genome-wide significant findings were replicated in KORA F3 using the same adjustment model as in the discovery stage. Fixed-effects inverse-variance meta-analysis was applied to summarize the findings in KORA F3 and F4. The analyses were repeated additionally adjusting for apo(a) isoforms or in subgroups of apo(a) isoforms.

SNP association and mediation analysis
Association of the identified SNP with log-transformed Lp(a) was evaluated in all three studies individually (KORA F3, KORA F4, SAPHIR) using a linear regression model and in all three studies combined using a linear mixed effects model with "study" as a random effect and adjusting for age and sex. The genotypes of the SNP were additively coded, that is, effect sizes refer to one copy of the minor allele. This analysis was repeated additionally adjusting for apo(a) isoforms or in subgroups of apo(a) isoforms. In addition to the models with log-transformed Lp(a) values as the dependent variable, which were used to ensure the normal-distribution assumption to derive the test-statistic and p-values, Lp(a) on its original scale was used. From these models, beta estimates and standard errors (se) were derived to ease interpretability of the effect sizes.
Due to the close vicinity of the cg-site/SNP with the well-known PNR polymorphism, the association of the PNR with Lp(a) concentrations was evaluated and whether this effect can be explained by the SNP or vice versa.
To assess whether the effect of the identified SNP on Lp(a) is mediated through methylation, a formal mediation analysis was applied using the R library "mediation".

Follow-up in GTEx and transcription factor binding site prediction
The effect of rs76735376 on LPA gene expression in native liver tissue was assessed using the GTEx (49)

Methylation analysis
Descriptive statistics of the participating studies are given in Supplementary Table 3. One CpG site (cg17028067) located in the promoter region of the LPA gene was significantly associated with Lp(a) in the discovery sample KORA F4 (p=6.04e-11, Supplementary Table 1). This site was replicated in KORA F3 with direction-consistent effect size (p=0.0001 in KORA F3, meta-analysis results: p=2.61e-14; Table 1).
Since low methylation beta-values of cg17028067 (mean ± sd in KORA F4 = 0.811 ± 0.068; in KORA F3: 0.782 ± 0.061) were occurring predominantly in low molecular weight apo(a) isoforms with 18 to 20 KIV repeats (Supplementary Figure 4), the analysis was enhanced as follows (Table 1): first additionally adjusting for the lower of both apo(a) isoforms and secondly analyzing only the subgroup of participants with one isoform equal to 18-20 repeats (only in KORA F4 due to low sample size in KORA F3). Although the association was partially attenuated after adjustment for apo(a) isoforms (p=4.20e-7, n=2187) and in the subgroup of isoforms 18-20 (p=0.00265, n=223), it still remained highly significant, indicating that the association is not only a consequence of a correlation of the SNP with small apo(a) isoforms.

SNP genotyping
Surprisingly, the identified cg site was found to overlap a SNP (rs76735376). This C>T (reverse strand, i.e. transcribed strand of LPA) polymorphism is located exactly at the cytosine base of the CpG site cg17028067. The minor allele thus abolishes one potential methylation site by replacing the cytosine base of the CpG by a thymidine, thus effectively reducing the amount of available methylation target sites ( Figure 1). This SNP had not been genotyped directly with the earlier SNP microarray but had been imputed with a low imputation quality (imputation quality info =0.4) and a reported very low minor allele frequency of ≈0.1%. Therefore, this cg site had not been removed from the primary analysis dataset when SNP positions were masked. Even more surprising, subsequent de novo genotyping and confirmatory Sanger sequencing in KORA F3, KORA F4 and SAPHIR revealed that this SNP was indeed considerably more frequent than reported in the imputed data with a true MAF of 1.1% and a carrier frequency of 2.2% (Supplementary Table 4).

Validation by genotyping and bisulfite sequencing
The methylation of the region was validated by bisulfite sequencing in 8 samples of the SAPHIR study (4 homozygotes for the major allele of rs76735376 and 4 heterozygotes, Supplementary Figure   5A). Besides cg17028067 nine additional CpG sites nearby cg17028067 were all nearly 100% methylated (Supplementary Figure 2). None of these sites were represented on the HumanMethylation450 BeadChip array. The methylation level of the cg sites nearby was independent from the status of cg17028067. This indicates that the whole region might contain CpG dinucleotides, which are target of DNA methylation and can thus potentially affect LPA expression. In heterozygous carriers of the SNP rs76735376 the methylation at cg17028067 was reduced by 50% as would be expected (Supplementary Figure 5B).

SNP association analysis
The minor allele of rs76735376 was associated with increased Lp(a) values ( Figure 2) with consistent effect sizes in all three cohorts (p=1.01e-59, Table 2). Each copy of the minor allele was associated with an increase of Lp(a) concentrations by 37 mg/dL. The SNP explained 3.45% of the variance of Lp(a) concentrations. One part of the SNP effect seems to be due to correlation with apo(a) isoforms. As already observed for the methylation levels, isoforms 18-20 are overrepresented in carriers of the minor allele (T) (Supplementary Figure 6). The effect size is reduced in a model adjusted for isoforms and one third to one quarter as high in subgroups with one apo(a) isoform equal to 18, 19 or 20 repeats. However, rs76735376 still explains about 3.2% of the variance of Lp(a) in the latter subgroup (p=9.71e-08, Table 2 and Figure 2).

Association with the pentanucleotide-repeat (PNR) polymorphism
Since rs76735376 is in close vicinity to the well-known LPA pentanucleotide-repeat (PNR) polymorphism (12,(20)(21)(22), we evaluated in KORA F4 and SAPHIR, whether the association of the PNR with Lp(a) concentrations can be explained by rs76735376 or vice versa. In both studies, 89% of all participants present one allele with 8 repeats of this PNR, whereas all carriers of the minor allele of rs76735376 have this repeat number at least once (Supplementary Table 5

Mediation analysis
Since rs76735376 was identified actually by chance in a genome-wide methylation analysis, we investigated whether the effect of the SNP on Lp(a) is mediated through the methylation signal. A formal mediation analysis revealed no significant causal mediation effect of the methylation level of cg17028067 on log-Lp(a) (p=0.13, Supplementary Figure 7). The total effect of the SNP on Lp(a) is only marginally altered by inclusion of the beta-value of cg17028067 in the model. In a regression model with both, rs76735376 and cg17028067, only the SNP remains significant. This might indicate that the true causal factor is the base change at rs76735376 rather than the methylation level at cg17028067.

Effect of rs76735376 on LPA transcription
LPA is expressed exclusively in liver tissue, which is hard to obtain for a large number of samples, especially if they need to be controlled for defined rare LPA genotypes. To assess whether rs76735376 indeed affects LPA expression in vivo, we therefore performed a lookup in the data of the GTEx consortium (51). GTEx is a multinational project providing both genome-wide SNP data and complete transcriptome data for >10,000 individuals and >70 different tissues, including data from 153 liver samples. The minor allele rs76735376 was significantly associated with increased LPA expression in liver tissue (beta(se)=+0.44 (0.20), p=0.0369) despite only 11 heterozygous carriers in the data. Accordingly, bioinformatic transcription factor binding analysis using TRANSFAC (50) revealed a direct DNA transcription factor binding site for Oct factors POU2F1 and POU5F1 (also known as Oct1 and Oct3/4) within the T -allele of the rs76735376 SNP but not the C-or 5-mC allele (Supplementary Figure 3). Moreover, 5 bases besides the Oct binding site, a CEBPB was predicted (Supplementary Figure 3). Pilot experiments using electrophoretic mobility shift assays (EMSA) with probes encompassing the SNP site with C, 5-mC and T allele oligonucleotides suggests that CEBPB but not the Oct factors POU2F1 and POU5F1 indeed bind to both the C and 5-mC allele, but not to the T-allele (Supplementary Figure 8).

Discussion
Lp(a) is a major genetically determined cardiovascular risk factor. Research on Lp(a) has experienced two revivals: the first by genetic studies providing strong support for a causal association with cardiovascular outcomes and the second, more recently, due to Lp(a)-lowering agents becoming available in the form of PCSK9 inhibitors and mRNA antisense therapy. Despite the strong impact of Lp(a) on CVD, many controversies and uncertainties remain about its metabolism and regulation (52).
The association of Lp(a) concentrations with the apo(a) isoform size is well accepted, but far from being simple and linear (1,10). Even within the same isoform group, levels may vary by 200-fold(4), suggesting the existence of modifier variants which act on top of the apo(a) isoforms. Some of these have been identified (7,53,54), but they all act via disturbing protein synthesis.
Regulatory variants and elements are less defined. However, understanding the transcriptional regulation of LPA might be highly relevant for future drug development. We performed the first genome-wide DNA methylation study for Lp(a). This identified the SNP rs76735376 located in a CpG site in the LPA promoter. Each minor allele abolishes a methylation site in a CpG dinucleotide and is associated with an increase in Lp(a) of about 37 mg/dL and 9 mg/dL when restricted to individuals with 18-20 KIV repeats who are the main carriers of this methylation site. To dissect the effect of the base from the methylation site we performed a statistical mediation analysis which indicated that the effect observed in the association study is driven by the base change and not by the methylation status of the locus. This was supported by pilot EMSA experiments using the three possible allele states C, 5m-C and T), which showed no clearly defined difference between the C and the 5m-C allele, but a difference between the C and T alleles.
The location in the promoter suggests an effect on the transcriptional regulation of LPA. While Lp(a) is exclusively expressed in the liver and native tissue is hard to obtain in numbers sufficient to account for the low MAF of rs76735376. The recent large-scale multi-tissue transcriptome and eQTL profiling initiative GTEx, however, provides a convenient way to still investigate the effects of rs76735376 on LPA expression. Indeed, a look-up for in the GTEx raw data revealed that the rs76735376 T-allele is associated with significantly increased LPA expression, which is in line with the observed increase of Lp(a) concentrations in these individuals.
Interestingly, rs76735376 was located close to the LPA pentanucleotide repeat (PNR). This led us to speculate that it might be the causal factor underlying the previously reported(12), but not clearly replicated (13) impact of the PNR on LPA expression. This was, however, not the case as the association signal at rs76735376 was only marginally affected by including the PNR in the model and vice versa. Rs76735376 therefore represents a novel, independent SNP in the LPA promoter strongly associated with Lp(a) concentrations. Although it is rare, it explains more than 3% of the Lp(a) variance, which is remarkable high compared to other complex phenotypes.
Of note, all carriers of the minor allele of rs76735376 carried a PNR allele with 8 copies and virtually all minor allele carriers presented an apo(a) allele with 19 to 20 KIV repeat (Supplementary Figure 6). Rs76735376 therefore is in line with several other putative functional SNPs in LPA, which are confined to certain isoform ranges (18,53,55). This underscores the complex genetic make-up of the Lp(a) trait, consisting of an intricate interplay between the KIV-2 repeat number, transcriptional regulation by enhancer and promoter regions, modifier SNPs, which may be in turn confined only to certain isoform sizes, and trans-regulating factors like apoE (5,56). Given the large impact of the underlying apo(a) isoform on Lp(a) levels, a full assessment of variants influencing Lp(a) is likely to require complete knowledge about the isoforms of the investigated samples. Failure to account for the isoforms has been shown to even potentially mask true genetic effects (18,53).
Our study also highlights an important technical message for the community. Indeed, when we originally set out to investigate the impact of DNA methylation on Lp(a), the quality control procedure was supposed to purge any CpG sites affected by a SNP by using imputed 1000Genomes genotype data filtered at 5% MAF to focus only on differential methylation but not common ASM-SNPs.
However, rs76735376 escaped filtering because it was badly imputed and thus incorrectly reported with a MAF of 0.1%. Since such a low MAF could not have triggered the association signal observed, we genotyped it de-novo, revealing a ten times higher true MAF. This MAF is now confirmed also by new reference datasets that became available only very recently (like TOPMed(57), https://bravo.sph.umich.edu/freeze5/hg38/).
Our study presents some limitations which need to be mentioned. Firstly, the methylation dataset was derived from peripheral blood, while LPA expression is limited to hepatic tissue. This is an unavoidable limitation of population-scale genome wide methylation analyses since hepatic tissue is not accessible at the sample numbers required for genome-wide studies. This applies even more to Lp(a), where a full depiction of the isoform variability is required for proper analysis and some functional variants are rare. Indeed, even the GTEx initiative, which contains data from ≈10,000 individuals and >70 tissues, reports only 175 liver samples (as of April 2019; thereof only 153 having also SNP genotype data available). Nevertheless, while methylation is clearly a tissue-specific marker, studies also have shown that a consistent portion of the DNA methylation patterns is still conserved across tissues (58,59). Therefore, while the use of blood-derived DNA as surrogate for hepatic methylation may have led to missing some very strictly tissue-specific regulators, we can be confident that a consistent portion of methylation-based regulation has still been assayed. Secondly, our dataset was of Caucasian origin. Given the known differences in allele frequencies and effects of LPA SNPs between ethnicities, investigations in other populations are required to establish the meaning of our findings in other ethnicities (60,61). Finally, functional and fine mapping studies that integrate tissue specific transcriptomics and epigenomics data with Lp(a) phenotype data are finally warranted to fully understand the contribution of this SNP to the Lp(a) trait.
In conclusion, this study represents a first step assessing the epigenetic regulation of Lp(a). We showed that the analyzed region of the Lp(a) promoter is subjected to DNA methylation and the identification of rs76735376 located at the CpG site at cg17028067 adds a novel variant to the catalog of genetic variants affecting Lp(a) concentrations. Extending our association results, we found an association between rs76735376 and LPA gene expression levels in public tissue data. Although the elucidation of the precise mechanisms and biological relevance of our observations will require dedicated in-depth studies, investigations on Lp(a) regulation might be fruitful and lead to novel therapeutic options.