High prevalence of carriers of variant c.1528G>C of HADHA gene causing long-chain 3-hydroxyacyl-CoA dehydrogenase deficiency (LCHADD) in the population of adult Kashubians from North Poland

Background/Objectives The mitochondrial β-oxidation of fatty acids is a complex catabolic pathway. One of the enzymes of this pathway is the heterooctameric mitochondrial trifunctional protein (MTP), composed of four α- and four β-subunits. Mutations in the MTP genes (HADHA and HADHB), both located on chromosome 2p23, cause MTP deficiency, a rare autosomal recessive metabolic disorder characterized by decreased MTP activity. The most common MTP defect is long-chain 3-hydroxyacyl-CoA dehydrogenase (LCHAD) deficiency, caused by the c.1528G>C (rs137852769, p.Glu510Gln) substitution in exon 15 of the HADHA gene. Subjects/Methods We analyzed the frequency of genetic variants in the HADHA gene in adults of Kashubian origin from North Poland and compared these data with those from other Polish provinces. Results We found a significantly higher frequency of HADHA c.1528G>C (rs137852769, p.Glu510Gln) carriers among Kashubians (1/57) than among subjects from other regions of Poland (1/187). We also found a higher frequency of the c.652G>C (rs71441018, p.Val218Leu) polymorphism of the HADHA gene in the population of Silesia, southern Poland (1/107), than in other regions. Conclusion Our study indicates a high frequency of the c.1528G>C variant of the HADHA gene in the Kashubian population, suggesting a founder effect. For the first time, we have found a high frequency of rs71441018 in the Silesian population of southern Poland.

Piekutowska-Abramczuk et al. found that the carrier frequency in babies born in Poland in 2008 was 1/217, whereas it was significantly higher in the Pomeranian region [23]. Because the majority of carriers were detected in children living in the Kashubian region, the authors suggested a probable Kashubian origin of the prevalent c.1528G>C variant. The estimated frequency of the disease in the Pomeranian region was 1/16,900, whereas in the rest of Poland it was 1/118,336. Kashubians are a relatively small population that inhabits Kashubia, a region of the Pomeranian Province in North Poland. Currently, the number of indigenous Kashubians is estimated at nearly 230,000. The isolation of this population is assumed to be mainly linguistic, cultural and geographical, because its genetic structure has not yet been explored in detail. However, high rates of endogamy, slow population expansion and negligible immigration allow Kashubians to be considered a genetically isolated population [25,26].
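Carrier frequencies and disease frequencies of this kind are linked through Hardy-Weinberg proportions: if the variant allele has frequency q, heterozygous carriers occur at 2q(1 - q) and affected homozygotes at q². The sketch below makes that conversion explicit under the idealized assumptions of random mating and no selection. It is an illustration, not necessarily the method used to derive the cited estimates, and endogamy of the kind described for Kashubians would push the real proportion of homozygotes above q².

```python
import math


def allele_freq_from_carrier_freq(carrier_freq: float) -> float:
    """Solve 2q(1 - q) = carrier_freq for the variant allele frequency q (taking q < 0.5)."""
    # 2q^2 - 2q + f = 0  ->  q = (1 - sqrt(1 - 2f)) / 2
    return (1.0 - math.sqrt(1.0 - 2.0 * carrier_freq)) / 2.0


def expected_incidence(carrier_freq: float) -> float:
    """Expected frequency of affected homozygotes (q^2) under Hardy-Weinberg assumptions."""
    q = allele_freq_from_carrier_freq(carrier_freq)
    return q * q


# Carrier frequencies taken from the text: newborns in Poland (2008) and Kashubians (18/1,023).
for label, f in [("Poland, newborns 2008", 1 / 217), ("Kashubians, this study", 18 / 1023)]:
    inc = expected_incidence(f)
    print(f"{label}: carriers 1/{1 / f:.0f}, expected incidence about 1/{1 / inc:,.0f}")
```

For a carrier ratio of 1/N with N large, the result is close to 1/(4N²), because q is then approximately 1/(2N).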
The aim of our study was to analyze the frequency of genetic variants in the HADHA gene in a sample of adult Kashubians and to compare these data with the frequencies found in other provinces of Poland.

Material and methods
Material
Kashubian population. The study sample was drawn from the isolated population of Kashubians in North Poland between October 2010 and March 2015 [26,27]. It comprised 1,023 adult subjects of Kashubian origin aged 18 to 85 years: 631 (61.7%) men and 392 (38.3%) women. From each person, 5 ml of peripheral blood was collected and stored at -70 °C.
The participants were recruited from different health services in the Kashubian region of Poland (Fig 1). Kashubian origin was confirmed by birth into a Kashubian family (i.e., both parents and all four grandparents were Kashubian) and by command of the Kashubian language.
The study protocol was approved by the Bioethics Commission of the Medical University of Gdańsk, and all subjects gave written informed consent prior to participation in the study.
Populations from other provinces of Poland. The participants were recruited between 2010 and 2012 within the TESTOPLEK research project and registered as the POPULOUS collection at the Biobank Lab of the Department of Molecular Biophysics, University of Lodz [28,29]. Each subject gave written informed consent and completed a questionnaire. Saliva was collected from each individual into Oragene OG-500 DNA collection/storage receptacles (DNA Genotek, Kanata, Canada). Approval for this study was obtained from the University of Lodz Review Board. All procedures were performed in accordance with the Declaration of Helsinki (ethical principles for medical research involving human subjects).
Of more than 10,000 adult individuals from throughout Poland in the POPULOUS collection, 5,901 participants were included in the study group (Fig 1); for these participants, all survey data needed for this study, including the region of saliva collection, were complete. The group comprised 3,009 (50.99%) females and 2,892 (49.01%) males aged 20 to 77 years (mean 43.43 years).

Methods
DNA isolation from blood samples. Genomic DNA was extracted from 100 μl of frozen blood by using the Genomic Micro AX Blood Gravity kit (A&A Biotechnology) according to the manufacturer's protocol.
DNA isolation from saliva samples. Genomic DNA from participants from other regions of Poland was manually isolated from 500 μl of saliva according to the manufacturer's protocol (PrepitL2P, PD-PR-052, DNA Genotek, Kanata, Canada). The elution volume was 50 μl. DNA was quantified using the Quant-iT™ dsDNA Broad Range Assay Kit (Invitrogen™, Carlsbad, CA, USA). All DNA samples underwent quality control by PCR-based sex determination [30].
Pyrosequencing method. The HADHA c.1528G>C variant was analyzed by pyrosequencing in all 1,023 adult Kashubian subjects (Fig 2). The HADHA gene fragment (202 bp) was amplified with primers HADHABiotFor199 (5'-[Btn]CTCACCCGCATTCTCCGAT-3') and HADHARev199 (5'-ACAGCCCCTTACCTTAACCACA-3'). PCR amplification was performed in a 50 μl reaction volume containing 2 μl of genomic DNA, 2x PCR Master Mix Plus (A&A Biotechnology) and 1 μl of each primer (10 μM). The cycling conditions were 94 °C for 3 min, followed by 40 cycles of 94 °C for 15 s, 58 °C for 30 s and 72 °C for 30 s, with a final extension at 72 °C for 5 min. The sequencing primer was 5'-GTTTTCTCGGTCGTGATAA-3', and the nucleotide dispensation order was TCTSCAGC. Sequencing was carried out according to the manufacturer's protocol using the PSQ™ 96MA pyrosequencing apparatus and the PyroMark 1 Gold Q96 kit (Qiagen).
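As a quick sanity check of the oligonucleotides listed above, the sketch below reports length, GC content and a rough melting temperature from the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C. The Wallace rule is only a coarse approximation (it was derived for short oligonucleotides), so a nearest-neighbour model would be preferable for actual primer design. The dictionary keys are just labels for the sequences given in the text, and the 5' biotin label is ignored.

```python
# Primer sequences as given in the text (5'->3'); the 5' biotin on the forward primer is omitted.
PRIMERS = {
    "HADHABiotFor199": "CTCACCCGCATTCTCCGAT",
    "HADHARev199": "ACAGCCCCTTACCTTAACCACA",
    "sequencing": "GTTTTCTCGGTCGTGATAA",
}


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in degrees C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC {100 * gc_fraction(seq):.1f}%, Wallace Tm {wallace_tm(seq)} °C")
```

With these sequences the two amplification primers give Wallace estimates of 60 °C and 66 °C, both above the 58 °C annealing step of the PCR program.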
Microarray analysis. The 5,901 DNA samples were genotyped for 551,945 SNPs using 24x1 Infinium HTS Human Core Exome PLUS microarrays (Illumina Inc., San Diego, CA, USA) according to the manufacturer's protocol. Briefly, DNA samples were amplified, enzymatically fragmented and hybridized to the BeadChips. The BeadChips then underwent extension and staining (XStain) steps and were scanned using iScan (Illumina Inc., San Diego, CA, USA). Raw fluorescence intensities were imported into GenomeStudio V2011.1 with the Genotyping Module v1.9.4 (Illumina Inc., San Diego, CA, USA). All data first underwent stringent quality control, in which a sample was excluded if its call rate was below 0.94 or its 10% GenCall score was below 0.4. We then filtered the data for the HADHA gene region (NC_000002.11 (26413504..26467665)), which yielded 30 SNPs for analysis. All sequence coordinates refer to the GRCh37/hg19 reference sequence and were obtained from GenBank (http://www.ncbi.nlm.nih.gov). The results were exported from GenomeStudio on the forward strand using the PLINK Input Report Plug-in v2.1.3 [31].
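The two filtering steps described above (sample-level QC thresholds and restriction to the HADHA interval on chromosome 2, GRCh37) can be sketched as follows. The sketch assumes that the export follows the PLINK .map layout (chromosome, variant ID, genetic distance, base-pair position) and that each sample's call rate and 10% GenCall score are available as numbers; the function names and the toy records in the usage lines are hypothetical.

```python
from typing import Iterable, List

# HADHA interval given in the text: NC_000002.11 (26413504..26467665), GRCh37/hg19.
HADHA_CHROM = "2"
HADHA_START = 26_413_504
HADHA_END = 26_467_665

# Sample-level QC thresholds given in the text.
MIN_CALL_RATE = 0.94
MIN_P10_GENCALL = 0.4


def passes_sample_qc(call_rate: float, p10_gencall: float) -> bool:
    """Keep a sample only if it meets both the call-rate and the 10% GenCall threshold."""
    return call_rate >= MIN_CALL_RATE and p10_gencall >= MIN_P10_GENCALL


def hadha_variant_ids(map_lines: Iterable[str]) -> List[str]:
    """Return IDs of variants inside the HADHA interval (inclusive) from PLINK .map-style lines."""
    selected = []
    for line in map_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        chrom, variant_id, _genetic_distance, bp = fields[:4]
        # Accept both "2" and "chr2" chromosome codes (assumption about the export).
        if chrom.lower().startswith("chr"):
            chrom = chrom[3:]
        if chrom == HADHA_CHROM and HADHA_START <= int(bp) <= HADHA_END:
            selected.append(variant_id)
    return selected


# Hypothetical toy records, not study data.
toy_map = ["2 snp_a 0 26413504", "2 snp_b 0 26467666", "3 snp_c 0 26420000"]
print(hadha_variant_ids(toy_map), passes_sample_qc(0.97, 0.45))
```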
For each polymorphism detected in this study, the following parameters were assigned: dbSNP ID (rs number), nucleotide position within or relative to the coding sequence based on NM_000182.4, and, for SNPs in the coding sequence, amino acid position in the protein based on the reference sequence NP_000173.2. These data were obtained from GenBank (http://www.ncbi.nlm.nih.gov) and are used in this paper as the variant nomenclature.
Statistical methods. Agreement of the observed genotype distributions with Hardy-Weinberg equilibrium was assessed for all detected polymorphisms with the exact test; a distribution was considered consistent with equilibrium when P > 0.05. Differences in observed minor allele frequencies (MAF) were examined with the chi-square test for alleles (with Yates' correction if any allele count was below 10); P < 0.05 was considered significant. The effects of amino acid and nucleotide substitutions on protein and gene function were predicted with the PredictSNP tool [32].
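The Hardy-Weinberg exact test compares the observed number of heterozygotes with its exact distribution given the allele counts. The sketch below follows the widely used formulation of Wigginton, Cutler and Abecasis (2005), which is also the one implemented in PLINK; the text does not state which implementation the authors used, so treat this as an illustrative stand-in. As example input it uses the rs7593175 genotype counts reported in the Results (3,097 CC, 2,338 TC, 442 TT).

```python
def hwe_exact_p(n_het: int, n_hom1: int, n_hom2: int) -> float:
    """Two-sided Hardy-Weinberg exact test P-value for a biallelic SNP
    (after Wigginton, Cutler & Abecasis, 2005)."""
    n = n_het + n_hom1 + n_hom2
    rare = 2 * min(n_hom1, n_hom2) + n_het  # copies of the rarer allele
    if n == 0 or rare == 0:
        return 1.0  # monomorphic site: nothing to test
    probs = [0.0] * (rare + 1)  # unnormalized P(heterozygote count = index)

    # Start near the expected heterozygote count, with the same parity as `rare`.
    mid = rare * (2 * n - rare) // (2 * n)
    if (rare - mid) % 2:
        mid += 1
    probs[mid] = 1.0
    total = 1.0

    # Walk down: two fewer heterozygotes means one more homozygote of each kind.
    het, hom_r = mid, (rare - mid) // 2
    hom_c = n - het - hom_r
    while het >= 2:
        probs[het - 2] = probs[het] * het * (het - 1) / (4.0 * (hom_r + 1) * (hom_c + 1))
        total += probs[het - 2]
        het, hom_r, hom_c = het - 2, hom_r + 1, hom_c + 1

    # Walk up: two more heterozygotes means one fewer homozygote of each kind.
    het, hom_r = mid, (rare - mid) // 2
    hom_c = n - het - hom_r
    while het <= rare - 2:
        probs[het + 2] = probs[het] * 4.0 * hom_r * hom_c / ((het + 2) * (het + 1))
        total += probs[het + 2]
        het, hom_r, hom_c = het + 2, hom_r - 1, hom_c - 1

    # P-value: total probability of configurations no more likely than the observed one.
    p_obs = probs[n_het]
    p = sum(pr for pr in probs if pr <= p_obs) / total
    return min(1.0, p)


# rs7593175 genotype counts from the Results: 2,338 TC, 3,097 CC, 442 TT.
print(f"rs7593175 HWE exact P = {hwe_exact_p(2338, 3097, 442):.3f}")
```

A P-value above 0.05 would be read, following the criterion stated above, as consistent with Hardy-Weinberg equilibrium.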

Results
Microarray analysis of c.1528G>C in the POPULOUS collection (this study).
Of 5,877 analyzed subjects, 36 (0.61%) were heterozygous carriers of the c.1528G>C variant. No individuals with the homozygous CC genotype were found, and 5,841 (99.39%) were homozygous for the GG genotype. The carrier ratio for this variant in the studied Polish population was 1/163; after exclusion of individuals from the Pomeranian region, it was 1/171. The carrier frequency in the Pomeranian province was 1/103 (Table 1).
Pyrosequencing analysis of c.1528G>C in the Kashubian population (this study). Of 1,023 analyzed subjects, 18 (1.76%) were heterozygous carriers of the c.1528G>C variant. No individuals with the homozygous CC genotype were found, and 1,005 (98.24%) were homozygous for the GG genotype. The carrier ratio for this variant in the Kashubian population was 1/57 (Table 1).
Combined data from this study and Piekutowska-Abramczuk et al. [23] for the c.1528G>C variant. Of 9,334 analyzed subjects from all regions, excluding Pomeranians and defined Kashubians, 50 (0.54%) were heterozygous carriers of the c.1528G>C variant. No individuals with the homozygous CC genotype were found, and 9,284 (99.46%) were homozygous for the GG genotype (Fig 3).
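Carrier ratios of the form 1/N and the accompanying percentages follow directly from the genotype counts. The small helper below (a hypothetical name, not part of the study) recomputes the figures of this subsection from the raw counts.

```python
from typing import Tuple


def carrier_summary(n_carriers: int, n_total: int) -> Tuple[int, float]:
    """Return (N, percent): the carrier ratio expressed as 1/N and carriers as % of subjects."""
    if n_carriers <= 0:
        raise ValueError("at least one carrier is needed to express a 1/N ratio")
    ratio_denominator = round(n_total / n_carriers)
    percent = round(100 * n_carriers / n_total, 2)
    return ratio_denominator, percent


for label, carriers, total in [
    ("POPULOUS, microarray", 36, 5877),
    ("Kashubians, pyrosequencing", 18, 1023),
    ("Combined, excluding Pomerania and Kashubians", 50, 9334),
]:
    n, pct = carrier_summary(carriers, total)
    print(f"{label}: {carriers}/{total} = {pct}% -> 1/{n}")
```

The three rows give carrier ratios of 1/163, 1/57 and 1/187, matching the ratios reported above and in the abstract.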

Comparison of c.1528G>C variant frequency in the different Polish regions.
Based on pairwise chi-square tests for alleles (with Yates' correction if any group count was below 10), no significant differences in c.1528G>C variant frequency were detected between individual regions of Poland (the Pomeranian region was excluded from this comparison). We were therefore able to pool the populations from all these regions and compare the pooled data with the Pomeranian population.
Using this approach, we found that both the population of the Pomeranian region and the Kashubian population alone differed significantly from the rest of the country in the frequency of this variant (chi-square test for alleles, p<0.0001) (Table 2).
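The allele-based comparison can be sketched as a 2x2 chi-square test on variant and reference allele counts, applying Yates' correction when any cell is below 10, as stated in the methods. For illustration it compares the Kashubian counts (18 heterozygotes among 1,023 subjects) with the pooled counts from regions excluding Pomerania and Kashubians (50 among 9,334); the groups in Table 2 may be defined differently, so the statistic is not meant to reproduce that table. Only the standard library is used: for one degree of freedom, the upper-tail probability of the chi-square distribution equals erfc(sqrt(χ²/2)).

```python
import math
from typing import Tuple


def allele_counts(n_subjects: int, n_het: int, n_hom_variant: int = 0) -> Tuple[int, int]:
    """(variant, reference) allele counts for a biallelic site in diploid subjects."""
    variant = n_het + 2 * n_hom_variant
    return variant, 2 * n_subjects - variant


def chi2_alleles(group1: Tuple[int, int], group2: Tuple[int, int], yates_below: int = 10):
    """2x2 chi-square test on allele counts; returns (chi2, p_value, yates_applied).

    Yates' continuity correction is applied if any allele count is below `yates_below`."""
    a, b = group1
    c, d = group2
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    yates = min(a, b, c, d) < yates_below
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    chi2 = n * diff ** 2 / denom
    # Upper tail of the chi-square distribution with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p, yates


kashubian = allele_counts(1023, 18)  # 18 heterozygotes, no CC homozygotes
pooled = allele_counts(9334, 50)     # pooled regions excluding Pomerania and Kashubians
chi2, p, yates = chi2_alleles(kashubian, pooled)
print(f"chi2 = {chi2:.2f}, P = {p:.1e}, Yates applied: {yates}")
```

Here no cell falls below 10, so the uncorrected statistic (about 21) is used and P is well below 0.0001.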
Comparison of c.1528G>C variant frequency between populations from different countries. The frequency of c.1528G>C has previously been reported for several populations (Finland, the Netherlands, Estonia and Poland), as summarized in Table 3. The observed frequencies in these populations did not differ significantly from one another (Table 4). However, the population of the Polish Pomeranian region, whether or not the Kashubian subset was included, differed significantly from the populations of the other countries. The only non-significant difference, at the border of the significance threshold, was between the Dutch population and the Pomeranian population excluding Kashubians. The Kashubian population alone had a higher observed frequency of the examined variant and differed significantly from all compared populations (Table 4).
Other studied polymorphisms of the HADHA gene: microarray data
For the rs146406360 (c.2113G>A, p.Val705Ile) polymorphism, among 5,879 analyzed people (POPULOUS collection) we found 21 (0.36%) persons with the heterozygous GA genotype and none with the AA genotype. The frequency of heterozygotes was 1/280. We did not find any differences in frequency between the compared provinces of Poland. For the rs7593175 (c.572+32T>C) polymorphism, among 5,877 analyzed persons we found 3,097 (52.70%) with the CC genotype, 2,338 (39.78%) with the TC genotype and 442 (7.52%) with the TT genotype. We did not find any differences in frequency between regions of Poland. Interesting data were obtained for the rs71441018 (c.652G>C, p.Val218Leu) polymorphism. Among 5,878 analyzed persons, we found 11 (0.19%) individuals with the heterozygous GC genotype. The frequency of heterozygotes in the Polish population was 1/534. Nine of the 11 (82%) detected heterozygotes were found in the Silesian region. The frequency for this region was 1/107, significantly higher than in other regions of Poland (1/2933) (Pearson chi-square test, p<0.0001).

In silico prediction of SNP and amino acid substitution effects on protein functionality
In silico analysis of SNP effects on protein function showed that all exonic variants, including rs71441018 (c.652G>C, p.Val218Leu), rs137852769 (c.1528G>C, p.Glu510Gln) and rs146406360 (c.2113G>A, p.Val705Ile), were predicted to be deleterious (Table 5). All in silico tools for amino acid changes classified the p.Glu510Gln variant as deleterious, and most of them classified the p.Val218Leu variant as deleterious. The p.Val705Ile substitution was not predicted to be harmful to protein function (Table 6).

Discussion
Kashubians are a relatively small population inhabiting the Pomeranian Province in North Poland. Today, 53,000 native speakers of Kashubian live in Pomerania, although roughly half a million people in Poland claim Kashubian or half-Kashubian ancestry.
In the 2011 census, however, 16,000 individuals declared "Kashubian" as their only identity, and 228,000 declared Kashubian identity either alone or together with Polish ethnicity. In the same census, over 108,000 people declared everyday use of the Kashubian language [34]. Currently, the number of indigenous Kashubians living in this region is estimated at nearly 230,000. Kashubians are considered an isolated population because several lines of evidence indicate that they meet the criteria of such a population: an old settlement, high rates of endogamy with consanguineous marriages between distant relatives, and slow population expansion with negligible immigration, accompanied by the preservation of a strong sociocultural identity, including a distinct dialect and traditional customs [27]. Several recent studies in Poland have made use of this isolated population, which is particularly attractive from a genetic point of view. For instance, Siemińska et al. showed that the variant allele A of rs16969968 in the alpha-5 nicotinic receptor subunit gene (CHRNA5), a polymorphism strongly associated with nicotine dependence, was significantly less frequent in Kashubians than in the HapMap CEU reference population (www.hapmap.org) [26,27,35-37]. Other published genetic studies have indicated that Kashubians are a very old Polish population, with several mutations/genetic variants detected almost exclusively in patients from this region, indicating a founder effect [35,38-43].
Rebala et al., who studied the frequencies of Y-chromosome haplotypes (7 STRs), found that Kashubians are closely related to populations from nearby regions (Kociewie, Kurpie) and to other Polish populations, but differ from other Slavic populations (Lusatia, Czech Republic/Slovakia), from German populations (Mecklenburg, Bavaria), and from Danish and Swedish (Got) populations [38].
In another study, Lipska et al. explored the spectrum of NPHS2 gene mutations causing steroid-resistant nephrotic syndrome in Polish patients and reported that carriers of the c.1032delT allele were found exclusively in the Pomeranian (Kashubian) region, suggesting a founder effect [42]. Studies by Chmara et al. in patients with hypercholesterolaemia suggested that the c.662A>G variant of the LDLR gene is frequent in this population [23,38,42,43].
In a study of Polish newborns born in 2008, the carrier frequency was 1/217 nationwide and significantly higher in the Pomeranian region, at 1/73. Because the majority of carriers were detected in children living in the Kashubian region, the authors suggested a probable Kashubian origin of the prevalent c.1528G>C variant (the Kashubian origin of the children and their parents was not confirmed in that study). The estimated disease frequency in the Pomeranian region was 1/16,900, whereas in the rest of Poland it was 1/118,336 [23].
Our study confirms the high carrier frequency (1/70) of the c.1528G>C variant of the HADHA gene in the Pomeranian region (Table 1), which was significantly higher than in other regions of Poland (Table 2). The highest frequency, 1/57, was observed in individuals of Kashubian origin and is higher than previously described. This frequency was also higher than in neighboring countries (Finland and Estonia). Interestingly, in Finland, southern regions have a higher frequency of this variant than northern regions (North Karelia, Oulu vs Helsinki and Bothnia) (Tables 3 and 4) [33].
Another region of Poland with a high frequency of the c.1528G>C variant in our study was the Lubusz province. This result could be explained by the low number of analyzed persons and must be verified in a larger population. When we analyzed the data from our study together with those of Piekutowska-Abramczuk et al., the frequency of the variant in this region was 1/88, i.e. lower than in Kashubians (Table 1). The frequency observed in Kashubians was significantly higher than in all other regions of Poland. For Poland as a whole, the carrier frequency was 1/122 (113/13,746 analyzed persons) (Table 1).
Among the other analyzed polymorphisms of the HADHA gene, rs71441018 was more prevalent in the Silesian population than in the rest of Poland, which has not been described previously. This finding is all the more interesting because this amino acid substitution was classified as deleterious by in silico tools and could therefore be clinically significant. Further studies are needed to confirm this observation and to explain its possible effects on protein function.
In silico predictions for the c.1528G>C (p.Glu510Gln) variant were consistent with its previously described clinical significance. Although the SNP-level effect of c.2113G>A (p.Val705Ile) was also classified as deleterious, the resulting amino acid substitution was not predicted to impair protein function.

Conclusions
In summary, the results of our study confirm a higher frequency of the c.1528G>C variant of the HADHA gene in the Pomeranian region of North Poland, with the highest frequency in individuals of Kashubian origin, which may indicate a founder effect in this population. For the first time, we have found a high frequency of the rs71441018 polymorphism of the HADHA gene in the Silesian population of southern Poland.