Genetic associations between serum low LDL-cholesterol levels and variants in LDLR, APOB, PCSK9 and LDLRAP1 in African populations

Non-communicable diseases, including cardiovascular diseases (CVDs), are increasing in African populations. High serum low density lipoprotein cholesterol (LDL-cholesterol) levels are a known risk factor for CVDs in European populations, but the link remains poorly understood among Africans. This study investigated the associations between serum LDL-cholesterol levels and selected variants in the low density lipoprotein receptor (LDLR), apolipoprotein B (APOB), proprotein convertase subtilisin/kexin type 9 (PCSK9) and low density lipoprotein receptor adaptor protein 1 (LDLRAP1) genes in selected African populations. Nineteen SNPs were selected from publicly available African whole genome sequence data based on functional prediction and allele frequency. SNPs were genotyped in 1000 participants from the AWI-Gen study, selected from the extremes of the LDL-cholesterol level distribution (500 with LDL-cholesterol >3.5 mmol/L and 500 with LDL-cholesterol <1.1 mmol/L). The minor alleles at five of the six associated SNPs were significantly associated (P<0.05) with lower LDL-cholesterol levels: LDLRAP1 rs12071264 (OR 0.56, 95% CI: 0.39–0.75, P = 2.73 × 10⁻⁴) and rs35910270 (OR 0.78, 95% CI: 0.64–0.94, P = 0.008); APOB rs6752026 (OR 0.55, 95% CI: 0.41–0.72, P = 2.82 × 10⁻⁵); LDLR rs72568855 (OR 0.47, 95% CI: 0.27–0.82, P = 0.008); and PCSK9 rs45613943 (OR 0.72, 95% CI: 0.58–0.88, P = 0.001). The minor allele of the sixth variant was associated with higher LDL-cholesterol levels: APOB rs679899 (OR 1.41, 95% CI: 1.06–1.86, P = 0.016). A replication analysis in the Africa America Diabetes Mellitus (AADM) study found the PCSK9 variant to be significantly associated with low LDL-cholesterol levels (Beta = -0.10). Since Africans generally have lower LDL-cholesterol levels, these LDL-cholesterol associated variants may be involved in adaptation due to unique gene-environment interactions.
In conclusion, using a limited number of potentially functional variants in four genes, we identified significant associations with lower LDL-cholesterol levels in sub-Saharan Africans.


Introduction
Wits-INDEPTH Partnership for GENomic studies), an established project investigating genomic and environmental factors that influence cardio-metabolic disease risk in rural and urban Africans [28]. From ~10,000 AWI-Gen participants, 500 participants with high fasting LDL-cholesterol levels (>3.5 mmol/L) and 500 participants with low LDL-cholesterol levels (<1.1 mmol/L) were selected to represent the "cases" and "controls", respectively. The thresholds were set based on the distribution of LDL-cholesterol levels in the AWI-Gen cohort. The rationale was that because all individuals are born with an LDL-cholesterol level of approximately 1.1 mmol/L [29], a value lower than this suggests a genetic aetiology. A recent meta-analysis of African data used a cut-off of 3.3 mmol/L for high LDL-cholesterol [30]; therefore, the high cut-off of >3.5 mmol/L was appropriate for our study. The age range in our study was between 35 and 80 years. Participants were excluded if they had diabetes, a BMI >35 or problematic alcohol use, or were on medication for lipidaemia. The AWI-Gen study was approved by the Human Research Ethics Committee (HREC) (Medical) of the University of the Witwatersrand (Wits), in accordance with the Declaration of Helsinki principles (protocol number M121029), renewed in 2017 (protocol number M170880). This study was approved as an MSc research project by the HREC (Medical) (protocol number M160833).
The data collection for the AWI-Gen study is described by Ali et al., 2018 [31]. Briefly, serum LDL-cholesterol and glucose were analysed with a Randox Daytona Plus Clinical Chemistry analyser (Crumlin, Northern Ireland) using colorimetric assays. The coefficient of variation of the laboratory measurement for lipids and glucose was less than 1.5% and 2.3%, respectively. Body Mass Index (BMI, kg/m²) was calculated from height and weight measurements. Classification of diabetes was guided by the standards set by the American Diabetes Association [32]. It was defined as the presence of one or more of the following conditions: previous diagnosis by a health care provider (which excluded gestational diabetes), taking medication for the condition, or a fasting blood glucose level of ≥7.0 mmol/L. Alcohol consumption was categorised into: never consumed; current non-problematic consumer; current problematic consumer; former consumer. Problematic drinking was determined according to the CAGE questionnaire [33], where four questions related to potential problematic alcohol consumption were asked, and categorised as problematic if the participant answered "yes" to at least two of them.
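The phenotype definitions above can be sketched in a few lines of Python. This is only an illustration of the stated rules, not the study's actual pipeline: the `Participant` record and its field names are hypothetical, and the lipid-medication exclusion is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    # All field names are hypothetical; they are not AWI-Gen variable names.
    previously_diagnosed: bool            # diagnosis by a health care provider (not gestational)
    on_diabetes_medication: bool
    fasting_glucose_mmol_l: Optional[float]
    cage_yes_answers: int                 # number of "yes" answers to the four CAGE questions
    weight_kg: float
    height_m: float


def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index in kg/m^2."""
    return weight_kg / height_m ** 2


def has_diabetes(p: Participant) -> bool:
    """Any one of: prior diagnosis, medication, fasting glucose >= 7.0 mmol/L."""
    high_glucose = (
        p.fasting_glucose_mmol_l is not None and p.fasting_glucose_mmol_l >= 7.0
    )
    return p.previously_diagnosed or p.on_diabetes_medication or high_glucose


def problematic_drinker(p: Participant) -> bool:
    """CAGE rule: at least two "yes" answers out of four."""
    return p.cage_yes_answers >= 2


def is_excluded(p: Participant) -> bool:
    """Exclusion on diabetes, BMI > 35 or problematic alcohol use."""
    return has_diabetes(p) or bmi(p.weight_kg, p.height_m) > 35 or problematic_drinker(p)
```

Keeping each criterion in its own function makes it straightforward to report how many participants each rule removes.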

Candidate gene and variant selection
Variants in LDLR, APOB, PCSK9 and LDLRAP1 were identified using publicly available whole genome sequence (WGS) data from African participants in the 1000 Genomes Project (KGP) and the African Genome Variation Project (AGVP). The variants were selected on the basis of in silico functional prediction and allele frequency in African populations and genotyped in a group of 1000 AWI-Gen participants, half with low and half with high LDL-cholesterol levels. The variants were tested for association in a case-control study design comparing high (cases) with low (controls) LDL-cholesterol levels, correcting for multiple testing and considering potential confounders.
The four genes are known to be associated with monogenic FH, and in some cases also with multifactorial LDL-cholesterol levels. Variants in these genes were extracted in VCF file format from WGS data of African population samples available from KGP and the AGVP. A region including the genomic sequence of each gene, plus a 1000 bp flanking region on either side, was screened for variants. A total of 975 individuals from eight African populations were included in the investigation: 655 WGS from KGP and 320 WGS from AGVP.
A total of 3541 variants were identified. The variants were functionally annotated using CADD [34] and Ensembl's VEP [35] to identify potentially deleterious variants. Sequences from KGP and AGVP were mapped to GRCh37; therefore, VEP was used on Ensembl's archive site for GRCh37. Variants with a CADD score >10, SIFT score <0.05 or PolyPhen score >0.5 were selected as potentially deleterious.
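As a rough illustration, the "at least one deleterious score" rule can be written as a small predicate. This is a sketch under the thresholds stated above. The function name is hypothetical, and treating a missing score (`None`) as "not deleterious" is an assumption.

```python
from typing import Optional


def is_potentially_deleterious(
    cadd: Optional[float] = None,
    sift: Optional[float] = None,
    polyphen: Optional[float] = None,
) -> bool:
    """True if any available score crosses its threshold.

    Thresholds from the text: CADD > 10, SIFT < 0.05, PolyPhen > 0.5.
    A score of None means the variant was not annotated by that tool.
    """
    return (
        (cadd is not None and cadd > 10)
        or (sift is not None and sift < 0.05)
        or (polyphen is not None and polyphen > 0.5)
    )
```

Because the criteria are combined with a logical OR, a variant with no SIFT or PolyPhen annotation can still be selected on its CADD score alone.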
To increase the power of the association analysis, only variants that were observed in at least six of the eight populations were chosen. Furthermore, the variants were filtered in two stages. Firstly, variants with at least one deleterious score, and being either a missense, start/stop, gain/loss, exonic or regulatory variant, were selected. Secondly, variants with a minor allele frequency (MAF) in African populations (according to dbSNP) of between 10% and 45% were selected to boost the power of the analysis. Linkage disequilibrium (LD) was assessed for the selected variants and no pairs were in strong LD (Haploview, r² >0.4) [36].
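The population and frequency criteria can be expressed as a second predicate. This is a minimal sketch: the function and parameter names are hypothetical, and treating the 10% and 45% bounds as inclusive is an assumption, since the text does not say.

```python
def passes_frequency_filter(
    maf: float,
    populations_observed: int,
    min_populations: int = 6,
    maf_low: float = 0.10,
    maf_high: float = 0.45,
) -> bool:
    """Keep variants seen in at least six populations with a MAF of 10-45%.

    The bounds are treated as inclusive here, which is an assumption.
    """
    return populations_observed >= min_populations and maf_low <= maf <= maf_high


# Example: a variant seen in 7 of 8 populations with MAF 0.29 is kept,
# while one with MAF 0.05 is dropped regardless of how widely it is observed.
kept = passes_frequency_filter(0.29, 7)
dropped = passes_frequency_filter(0.05, 8)
```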

Genotyping
The Agena Bioscience MassARRAY genotyping platform was used to genotype 19 selected SNPs. This service was provided by Inqaba Biotech in Pretoria, South Africa. The DNA used for the genotyping was obtained from the Biobank based at the Sydney Brenner Institute for Molecular Bioscience (SBIMB), Johannesburg, South Africa. The DNA concentration for each of the 1000 samples was normalised to ~30 ng/μl and ~10 μl DNA was provided. The MassARRAY system software was used to test whether variants of interest were likely to be successfully genotyped.

Data analysis
PLINK 1.9 [37] was used for data analysis. The genotype data were separated into cases (high LDL-cholesterol levels) and controls (low LDL-cholesterol levels) so that logistic regression could be carried out. Quality control was performed and samples with >17/19 missing SNP data were excluded from further analysis. SNP variants with >104/998 (>10%) missingness, Hardy-Weinberg equilibrium (HWE) P<0.005, differential missingness <1 × 10⁻⁵ and MAF <0.01 were excluded from further analysis. Quality control measures were derived from Marees et al. (2018) [38] with slight modifications to fit this small dataset.
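The SNP-level thresholds can be illustrated with a short Python sketch. It is not a substitute for PLINK: the HWE and differential-missingness P-values are assumed to be computed elsewhere and passed in, genotypes are assumed to be coded as 0/1/2 copies of one allele with `None` for a missing call, and all function names are hypothetical.

```python
from typing import List, Optional

Genotype = Optional[int]  # 0, 1 or 2 copies of the counted allele; None = missing call


def snp_missingness(genotypes: List[Genotype]) -> float:
    """Fraction of samples with no genotype call at this SNP."""
    return sum(g is None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes: List[Genotype]) -> float:
    """MAF among called genotypes (0.0 if nothing was called)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)


def passes_snp_qc(
    genotypes: List[Genotype], hwe_p: float, diff_missing_p: float
) -> bool:
    """Apply the exclusion thresholds quoted in the text."""
    return (
        snp_missingness(genotypes) <= 0.10   # exclude >10% missingness
        and hwe_p >= 0.005                   # exclude HWE P < 0.005
        and diff_missing_p >= 1e-5           # exclude differential missingness P < 1e-5
        and minor_allele_frequency(genotypes) >= 0.01  # exclude MAF < 0.01
    )
```

A SNP has to clear all four thresholds to be retained, which mirrors the way the exclusion criteria are listed above.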

Association analysis
Logistic regression analysis was performed on 14 SNPs in the four genes of interest. Associations were corrected for multiple testing using the Benjamini-Hochberg method. All associations with P<0.05 after adjustment were considered significant. The odds ratios (OR) and 95% confidence intervals (CI) were calculated using the major allele (A2) as a reference for all associations. The logistic regression analysis was adjusted for variables that were identified as potential covariates, namely: sex, BMI, fasting glucose levels and geographical origin of participants.
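For readers unfamiliar with the Benjamini-Hochberg procedure, the adjustment can be sketched as follows. This is a generic textbook implementation, not the code used in the study (which relied on PLINK), and the example P-values are made up.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical P-values for four SNPs.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [p < 0.05 for p in adjusted]
```

Each raw P-value is scaled by the number of tests divided by its rank, so the smallest P-values are penalised most heavily and the largest is left unchanged.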

Polygenic risk score (PRS)
A simple additive PRS for lower LDL-cholesterol was calculated using six variants (P<0.05) that were significant after adjusting for covariates (Fig 3A). A frequency plot with the PRS for cases and controls was generated. A t-test was performed to test for a significant difference between cases and controls. A plot showing the linear correlation of the PRS against the mean LDL-cholesterol level per risk score was generated (Fig 3B).
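An unweighted additive score of this kind can be sketched as a count of LDL-cholesterol-lowering alleles. The sketch assumes each variant contributes 0, 1 or 2 such alleles, and that for the one variant whose minor allele raises LDL-cholesterol the major allele is counted instead. The function name and the genotype coding are hypothetical, and the rs identifiers follow the Results text.

```python
from typing import Dict

# Direction of effect of the minor allele after covariate adjustment
# (True = minor allele associated with lower LDL-cholesterol).
MINOR_ALLELE_LOWERS_LDL: Dict[str, bool] = {
    "rs12071264": True,
    "rs35910270": True,
    "rs6752026": True,
    "rs72658855": True,
    "rs45613943": True,
    "rs679899": False,
}


def additive_prs(minor_allele_counts: Dict[str, int]) -> int:
    """Count LDL-lowering alleles (0-12) across the six variants."""
    score = 0
    for snp, minor_lowers in MINOR_ALLELE_LOWERS_LDL.items():
        count = minor_allele_counts[snp]
        if count not in (0, 1, 2):
            raise ValueError(f"{snp}: genotype must be 0, 1 or 2, got {count}")
        # Where the minor allele raises LDL, the major allele is the lowering one.
        score += count if minor_lowers else 2 - count
    return score


# A participant homozygous for the major allele at every variant scores 2,
# because the major allele of rs679899 is the LDL-lowering allele.
baseline = additive_prs({snp: 0 for snp in MINOR_ALLELE_LOWERS_LDL})
```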

Replication study
Replication analysis of the six variants associated with LDL-cholesterol was performed in the AADM study. This ongoing genetic epidemiology study of diabetes and related traits has been described previously [26,27]. Briefly, individuals attending medical clinics or referred for clinical suspicion of diabetes to university medical centres in urban sites in Nigeria (Enugu, Lagos, and Ibadan), Ghana (Accra and Kumasi), and Kenya (Eldoret) were recruited. Within the AADM study population, 50.2% were found to have Type 2 Diabetes. Genotyping was conducted using two different GWAS arrays: Affymetrix Axiom PANAFR SNP array and the Illumina Consortium Multi-Ethnic Global Array (MEGA). Quality control was conducted separately for each of the resulting datasets. After technical quality control, sample-level genotype call rate was at least 0.95 for all participants. Each SNP dataset was filtered for missingness, HWE and allele frequency. SNPs passing the following filters were retained: missingness <0.05, HWE P>1 × 10⁻⁶ and MAF >0.01. SNPs that passed quality control were used as the basis for imputation. Imputation of all samples was done with the African Genome Resources Haplotype Reference Panel using the Sanger Imputation Server [39]. Analysis was conducted using a linear mixed model of the inverse normal transformations of the age-, age squared-, and sex-adjusted residuals. From prior work [40], the first three principal components (PCs) of the genotypes were found to be statistically significant and were included in the model, along with adjustment for BMI. The model included a genetic relationship matrix to account for the random effect of relatedness, as related individuals were included in AADM. Models were run using EPACTS [41]. Statistical significance was declared at P<0.01 (0.05/5 [variants available in AADM]) with consistent direction of effect.
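The inverse normal transformation applied to the residuals can be illustrated with a rank-based sketch. The text does not state which rank offset was used, so the `(rank - 0.5) / n` form and the average-rank handling of ties below are assumptions. The function name is hypothetical.

```python
from statistics import NormalDist
from typing import List


def inverse_normal_transform(values: List[float]) -> List[float]:
    """Map values to standard normal quantiles of their (average) ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    pos = 0
    while pos < n:
        # Find the end of the current group of tied values.
        end = pos
        while end + 1 < n and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1  # 1-based average rank of the group
        for k in range(pos, end + 1):
            ranks[order[k]] = average_rank
        pos = end + 1
    normal = NormalDist()
    # Assumed offset: (rank - 0.5) / n keeps every quantile strictly inside (0, 1).
    return [normal.inv_cdf((r - 0.5) / n) for r in ranks]


# Hypothetical residuals; the middle-ranked value maps to 0.
z = inverse_normal_transform([3.0, 1.0, 2.0])
```

The transformation preserves the ordering of the residuals while forcing their distribution to be approximately normal, which suits the linear mixed model described above.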

Statistics
A Chi-squared test was used to determine whether there was a significant difference between males and females with regard to LDL-cholesterol levels. The variables age, BMI, fasting glucose levels and LDL-cholesterol levels were all tested for normality. None of the variables fit a normal distribution, and therefore a Mann-Whitney U test was used to determine whether there was a significant difference between the cases and controls. STATA was used for these statistical tests [42].

Results
There were more females in the low LDL-cholesterol group, and the high LDL-cholesterol group was characterised by a higher BMI and higher fasting glucose levels (Table 1).
The distribution of the LDL-cholesterol values in the group with low LDL-cholesterol ranged from 0.4–1.2 mmol/L. The high LDL-cholesterol group had LDL-cholesterol levels ranging from 3.7–14.2 mmol/L. Two individuals were excluded from the analyses due to very high fasting LDL-cholesterol levels of 14.2 mmol/L and 8.23 mmol/L as they may have a monogenic FH aetiology.

Variant filtering and QC
In total, 29 variants were selected for genotyping, but only 19 remained after assay design for final genotyping. Of these, five variants failed quality control parameters (four variants due to high missingness, one variant not in HWE), leaving 14 variants to be analysed (see Table 2 for more information). Seven samples were removed (five due to high missingness and two as they were high LDL-cholesterol outliers who could potentially have monogenic FH), leaving 993 samples to be analysed. Some missense variants had no SIFT or PolyPhen2 scores in the databases we used, since they were not annotated at the time the search was performed.

Association analysis
An allelic association analysis (Table 3) identified six significantly associated loci after correcting for multiple testing (P<0.05). The minor alleles of five variants were associated with low LDL-cholesterol levels and the minor allele of only one variant was associated with high LDL-cholesterol levels. After adjusting for covariates (sex, BMI, fasting glucose and geographic region), logistic regression (Table 4) revealed five variants that were significantly associated with low LDL-cholesterol levels: APOB rs6752026 (OR: 0.55); LDLRAP1 rs12071264 (OR: 0.54) and rs35910270 (OR: 0.78); PCSK9 rs45613943 (OR: 0.72); and LDLR rs72658855 (OR: 0.47). Only one variant was significantly associated with increased levels of LDL-cholesterol: APOB rs679899 (OR: 1.41). A forest plot was generated using the 14 variants from Table 4 (Fig 1). Fig 2 shows the association of the genotypes for six variants significantly associated with low and high LDL-cholesterol levels after adjusting for covariates. For four variants (Fig 2A-2D), the minor allele contributes to lower LDL-cholesterol levels in these populations. There is a decrease in LDL-cholesterol when the minor allele is present (in both the heterozygous and homozygous genotype) for these four variants. This suggests that these alleles may have a gain of function LDL-cholesterol lowering mode of action. For the fifth variant (rs35910270) (Fig 2E), LDL-cholesterol levels decrease only when the homozygous minor allele genotype is present. This suggests a loss of function, recessive mode of action. The major allele is associated with high LDL-cholesterol levels for the five variants. For the final variant (rs679899) (Fig 2F), the minor allele is associated with high LDL-cholesterol levels.
In the PRS, "risk" is depicted by lower LDL-cholesterol (Fig 3A). Therefore, the curve of the controls (low LDL-cholesterol) is shifted to the right (higher risk score for low LDL-cholesterol), as expected. The two groups are significantly different from each other (P = 0.001). Fig 3B shows the correlation of the PRS with LDL-cholesterol levels. It is apparent that individuals with a greater number of LDL-cholesterol reducing alleles have lower LDL-cholesterol levels. Alleles individually have a small effect on the phenotype, but when considering alleles across all five loci, the additive effect is clearly observed.
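To make the odds ratios easier to interpret, the sketch below computes an unadjusted allelic odds ratio with a Wald 95% confidence interval from a 2×2 table of allele counts. It illustrates the general calculation only. The reported estimates come from covariate-adjusted logistic regression in PLINK, and the allele counts used here are invented.

```python
import math
from typing import Tuple


def allelic_odds_ratio(
    case_minor: int,
    case_major: int,
    control_minor: int,
    control_major: int,
    z: float = 1.96,
) -> Tuple[float, float, float]:
    """Return (OR, lower, upper) with the major allele as reference.

    Uses the Wald interval on the log odds ratio; all counts must be > 0.
    """
    odds_ratio = (case_minor * control_major) / (case_major * control_minor)
    standard_error = math.sqrt(
        1 / case_minor + 1 / case_major + 1 / control_minor + 1 / control_major
    )
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * standard_error)
    upper = math.exp(log_or + z * standard_error)
    return odds_ratio, lower, upper


# Invented counts: the minor allele is rarer in cases (high LDL-cholesterol)
# than in controls (low LDL-cholesterol), so the OR falls below 1.
odds_ratio, lower, upper = allelic_odds_ratio(100, 900, 180, 820)
```

An odds ratio below 1 with an interval that excludes 1 means the minor allele is less common among high LDL-cholesterol cases, which is how the associations with lower LDL-cholesterol are read.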

Replication of associated variants in an independent African study
We evaluated our associated variants in an independent sample of West and East Africans drawn from the AADM study (participant characteristics: Table 1). One of the variants, rs35910270 (LDLRAP1), did not pass quality control filters in the replication dataset and was not included in the replication analysis. PCSK9 variant rs45613943 was associated with lower LDL-cholesterol in the AADM data (P<9 × 10⁻⁵; Table 4). The association of both APOB variants was directionally consistent with the main findings but did not reach statistical significance (rs6752026 P = 0.08; rs679899 P = 0.09; statistical significance set at P<0.01). There was no association between rs12071264 or rs72658855 and LDL-cholesterol in the AADM study.

Discussion
The aim of this study was to examine potentially functional variants in four genes for association with LDL-cholesterol levels in black African populations. To increase the power to detect associations we selected participants at the extremes of the LDL-cholesterol distribution with high and low levels. LDL-cholesterol levels are influenced by many genetic variants at different loci and by environmental factors, and lipid levels have an estimated heritability ranging between 40 and 60% [43]. GWAS of very large sample sizes have generally explained only 10-12% of the variability in LDL-cholesterol levels [18]. Some of the missing heritability could be explained by gene-environment interactions and gene-gene interactions [44]. Mutations in the genes investigated also contribute to monogenic dyslipidaemias. Deleterious mutations in LDLR are the most common cause of FH [8,45,46]; loss of function variants in LDLRAP1 have been documented to cause high LDL-cholesterol levels with a recessive form of inheritance (38); variants in APOB have been known to cause both low and high LDL-cholesterol levels [16,47-49]; and loss of function variants that cause low LDL-cholesterol have been identified in PCSK9 [50-52]. In this study, two LDLRAP1 variants were associated with low LDL-cholesterol levels. LDLRAP1 rs12071264 is located in intron 5, close to a splice site [53], and could affect transcription. This variant is absent in European populations. LDLRAP1 rs35910270 is in the 3'UTR and is common in both European (47%) and African (42%) populations.
Two variants in APOB were associated with LDL-cholesterol levels. rs6752026, a missense variant in exon 5, was associated with lower LDL-cholesterol levels. The proline to serine change is predicted to be deleterious by both SIFT and PolyPhen; however, it has not been previously associated with LDL-cholesterol levels. This variant occurs at very low frequencies in European populations (~0.1%) but is common in Africans (~11%). The second, rs679899, is an alanine to valine missense variant associated with higher LDL-cholesterol levels. It is common in European populations (86%), but less common in African populations (13%). The evidence for the deleterious nature of the variant is conflicting, with PolyPhen predicting it to be possibly damaging, while SIFT predicts it to be tolerated. One variant in PCSK9, rs45613943, was associated with low LDL-cholesterol levels and was also significantly associated with low LDL-cholesterol levels in an African replication cohort (Beta: -0.10, P<9 × 10⁻⁵), strengthening its association with low LDL-cholesterol levels across several African populations. The variant allele occurs at low frequencies in European populations (5%) but is more common in Africa (~29%). This is likely to be a loss of function variant, as loss of function of the PCSK9 enzyme increases the number of LDL receptors returning to the surface of the cell, but further functional studies would be required to assess the effect of the variant on the function of the protein. Interestingly, one variant in LDLR, rs72568855, was associated with low LDL-cholesterol levels and this variant has not been reported in European populations.
Associations with four of the variants would not have been detected in studies with participants from Europe as they appear to be African-specific or extremely rare in Europeans. In all cases, the rare allele was associated with lower LDL-cholesterol. This may suggest that the allelic variants, excluding rs45613943, have a gain of function impact, or are in close LD with functional variants that contribute to decreased LDL-cholesterol levels in Africans. The PCSK9 variant, rs45613943, is a regulatory variant. This may decrease transcription, resulting in less production of protein and an increased turnover of the LDL receptors, thereby reducing the serum levels of LDL-cholesterol. The APOB rs679899 rare allele was the only variant that was significantly associated with high LDL-cholesterol levels in this study.
Participants from East, West and South Africa have an LDL-cholesterol distribution favouring lower LDL-cholesterol levels and interestingly, the rare alleles at five loci (Fig 2A-2E) showed association with low LDL-cholesterol levels. Only one variant was associated with high LDL-cholesterol levels (Fig 2F). The PRS shows a modest, but significant (P = 0.001) shift between individuals with high and low LDL-cholesterol levels and a PRS is likely to improve with more markers from a GWAS analysis of the full AWI-Gen cohort.
Fig 3. A: Plot shows the frequency of cases and controls for each score. The curve of the controls is shifted to the right, indicating that in controls the LDL-cholesterol levels decrease with the addition of alleles associated with lower LDL-cholesterol levels (either common or minor allele). B: Plot of risk score against mean LDL-cholesterol level per risk score. It is apparent that with the addition of each allele associated with lower LDL-cholesterol levels (common or minor allele), the mean LDL-cholesterol level of the participants decreased. https://doi.org/10.1371/journal.pone.0229098.g003
The LDL-cholesterol distribution in African populations is generally considered to be lower, compared to non-African populations; therefore, it is counterintuitive that the common alleles at the five associated variants would associate with higher LDL-cholesterol levels in Africans. For five of the six significantly associated variants identified in this study, the major alleles were associated with higher LDL-cholesterol levels. Although this may suggest that the normal distribution of LDL-cholesterol levels in African populations would be expected to be higher, the rare alleles in rs6752026, rs12071264, rs35910270 and rs72658855 may have some gain of function effect that associates them with lower LDL-cholesterol levels.
In addition, gene-environment interactions could play a role, and low-fat diets and high physical activity could also contribute to lower LDL-cholesterol levels in African populations. However, as African populations become more urbanised, a more western lifestyle will follow, which could increase LDL-cholesterol, especially in those with a genetic predisposition for high LDL-cholesterol levels [54].
Detecting hyperlipidaemia early in individuals and administering treatment and lifestyle changes can reduce the number of CVD related events, and subsequently reduce the health burden among Africans [55]. Precision public health is using data to implement intervention strategies that will most efficiently benefit the majority of individuals in a population [56]. Using population specific genetic variants to predict LDL-cholesterol levels will only be effective if they have good predictive potential and the assays are affordable. At present, a serum cholesterol test remains a better and more cost-effective measure of LDL-cholesterol levels.
Intervention strategies, such as lifestyle changes and appropriate prescription of medication for high LDL-cholesterol that is effective for the population in question, could be implemented for a better outcome.

Limitations & future research
Even though the AWI-Gen participants were all African, they were multi-ethnic with an uneven distribution across geographic regions in West, East and South Africa. This could have caused a bias due to population sub-structure, despite adjusting for study site (as proxy for ethnicity) in the logistic regression analysis. Nonetheless, since lipid data on African populations are limited, this study serves as a starting point for subsequent research endeavours on understanding genetic associations with LDL-cholesterol levels in African populations.
Due to funding limitations, only a small number of variants were tested per gene. Ideally, a more representative set of markers to capture all the haplotype blocks across each gene (to account for lower linkage disequilibrium in African populations) would have provided a more accurate indication of the association of variants in these genes with LDL-cholesterol levels. A GWAS analysis for LDL-cholesterol in the AWI-Gen study is in progress.
This study used a basic logistic regression approach to analysing variants, unlike the linear mixed model used by the replication study. Glucose levels were not adjusted for in the replication study.

Conclusion
In selected African populations in four sub-Saharan African countries, we investigated the association of variants in four genes (LDLR, APOB, PCSK9 and LDLRAP1) known to be involved in lipid metabolism. The study identified five variants associated with low LDL-cholesterol levels and one variant associated with high LDL-cholesterol levels. Using a different cohort from West and East Africa, we replicated the association of PCSK9 rs45613943C with low LDL-cholesterol. These data suggest allelic association differences with LDL-cholesterol levels across African populations, which may be influenced by gene-environment interactions.